BigQuery & the Data Sync Sequence

What data lives where in BigQuery, how it's grouped, and the daily/weekly/ monthly order in which it refreshes. The Worker reads from BQ; almost nothing writes to BQ from the Worker — the writes come from n8n ETL workflows and BQ scheduled queries (see n8n-workflows.md).

Project: listingmanager-1529856313699. Verified 2026-05-20.

1. Datasets

The Worker references 8 datasets (ref count = how heavily the code uses each):

Dataset	Role
`fcr_operations`	The main operational dataset — 73 tables + 5 views. Everything the dashboard owns.
`SUPERSET_MASTER`	Golden Pages listing master — `dim_listings_with_subscriber_id` (listing ↔ subscriber bridge), `mv_subscriber_service_map` (GBP location mapping).
`DYNAMICS_DATA`	Raw CRM (Dynamics) — `Accounts_and_related_data`, the source the `active_clients` scheduled query reads.
`GOOGLE_ACCOUNT_DATA`	GMB / GA4 / GSC performance feeds (`view_gmb_loc_keywords`, etc.).
`GOOGLE_ADS_TRANSFER_EU`	Google Ads Data Transfer (spend, clicks, conversions).
`GOOGLE_MC_AUTO`	Google Merchant Centre feed health / shopping.
`datastudio`, `VIEW_ALL`	misc / legacy reporting views.

2. `fcr_operations` — tables by domain (78 total)

HubSpot deals & engagements (see hubspot-data-architecture.md for full detail) hubspot_deals (open), hubspot_deals_staging, hubspot_deals_backcatalogue_{18m·,2024_2025,2025_summer,active,2025_2026,staging}, hubspot_deal_property_history_backcatalogue_{18m·,…} (stage-transition history), hubspot_engagements (+ _staging), deal_briefs, graham_hubspot_deals (+_staging). (· = VIEW)

CRM, AM & billing active_clients (daily snapshot — MRR/products/AM, the system of record for billing state), am_attributes, am_email_templates, am_team_email_templates, roam_users (person ↔ Roam ↔ HubSpot identity), advisor_usage_log, bg_replacement_proxy.

Category & keyword intelligence KEYWORD_INTELLIGENCE, KEYWORD_INTELLIGENCE_GP_V2, KEYWORD_INTELLIGENCE_MONTHLY·, CATEGORY_BENCHMARKS, CATEGORY_MAPPING, CATEGORY_NORMALISATION, CATEGORY_TOP_KEYWORDS, CATEGORY_SERVICES_BENCHMARK, CATEGORY_CROSS_HEADINGS, CATEGORY_GBP_ACTIVITY, CATEGORY_CUSTOMER_PRIORITIES, CATEGORY_REVIEW_PRIORITIES.

Note: KEYWORD_INTELLIGENCE / CATEGORY_BENCHMARKS are cumulative, not monthly. CATEGORY_GBP_ACTIVITY is genuinely monthly.

GBP (Google Business Profile) GBP_INFORMATION, GBP_PRODUCTS, GBP_SERVICES, GBP_RECOMMENDED_SERVICES, GBP_REVIEWS, client_reviews_cache, serp_gbp_snapshots.

InSites / prospecting USG_INSITES_AUDIT_INVENTORY (live), USG_INSITES_REPORT_MAP (+_STAGING), USG_INSITES_TEST, insites_audit_relationships (parent↔competitor links), PROSPECT_LISTINGS, prospect_engagement_events.

External caches (refreshed by n8n, read by the Worker) ahrefs_seo_cache, built_websites_cache, yext_listings_cache, sitepro_listing_map, sitepro_megadoc_cache.

Area / census / market (mostly static reference data) cso_small_areas / cso_small_areas_v2, cso_demographics, cso_new_builds, pobal_deprivation, valuation_properties, cro_companies_agg, facebook_reach_estimates, irish_places (gazetteer), sport_ireland_clubs.

Teamwork (projects + tickets) teamwork_open_tickets (+_old,_staging), teamwork_milestones, teamwork_project_milestones, teamwork_project_tasks.

SayMore pilot (90-day program tracking) saymore_pilot_cohort_2026_05, saymore_pilot_gmb_baseline_2026_05, saymore_pilot_keyword_baseline_2026_05, v_saymore_pilot_master_tracker·, v_saymore_program_universe·.

3. The staging→prod pattern

Several syncs write to a _staging table first, then atomically swap into prod (so readers never see a half-loaded table): hubspot_deals_staging, hubspot_engagements_staging, *_backcatalogue_staging, USG_INSITES_REPORT_MAP_STAGING, teamwork_open_tickets_staging. BigQuery loads/queries are atomic, so a reader always sees either the old or the new table, never a partial one.

4. The daily sync sequence (assumed order, UTC)

This is the order things refresh on a normal day. Worker crons are sequenced after the data they depend on has landed.

 ~03:00  (weekly Sun)  HubSpot Deals Back-Catalogue rolling cohort → BQ
 ~03:00  (monthly 8th) GMB performance feeds → GOOGLE_ACCOUNT_DATA
 ~04:00  (weekly Tue)  SitePro MegaDoc → sitepro_megadoc_cache / sitepro_listing_map
 ───────────────────────────────────────────────────────────────────
 ~05:45  hubspot_deals  full truncate+reload  (n8n: HubSpot Deal Sync, twice daily)
 ~06:00  active_clients scheduled query  (DYNAMICS_DATA → fcr_operations)
 07:30   n8n: HubSpot Engagements Daily Delta → hubspot_engagements
 07:30   Worker cron: pre-warm AM portfolio caches
 07:45   Worker cron: pre-warm check-account for open-deal subscribers
 08:00   Worker cron: deal-brief refresh (chunked) — first run AFTER the above
   ⋮     (then every 2h to 20:00)
 every minute  Worker cron: Pleper + SerpAPI enrich + HubSpot engagements drain

Why the order matters: the brief composer reads hubspot_deals, active_clients, and hubspot_engagements. The 08:00 brief cron is placed after the deal sync (~~05:45), the billing snapshot (~~06:00), the engagement delta (07:30), and the cache pre-warms (07:30/07:45) — so when it composes, it reads fresh data and hits warm caches. (See fcr-dashboard-architecture-overview.md for the brief pipeline.)

Cadence notes:

HubSpot deals: twice daily (the early run lands ~05:45). Open-only — closed deals move to the back-catalogue.
active_clients: daily BQ scheduled query (~06:00), snapshot per day.
Back-catalogue cohort: weekly (Sun 03:00) — grows as deals close.
Engagements: daily delta (07:30) + every-minute straggler drain via Worker.
SitePro MegaDoc: weekly (Tue 04:00). GMB perf: monthly (8th, 03:00).
Yext sync: monthly (SayMore universe fan-out).
Census / valuation / Pobal / CRO / irish_places: static (multi-year refresh).

5. Google data transfers & the subscriber-id bridges

The Google datasets are fed by Google's own BigQuery Data Transfer Service (configured in the GCP console, not in this repo — there's no n8n/Worker job for them). They land keyed by Google's IDs (customer_id, property, location), so the subscriber-id bridge tables are how we join them back to an FCR account.

Google Ads — `GOOGLE_ADS_TRANSFER_EU`

The standard Google Ads Data Transfer schema (~60 tables: ads_AccountStats_*, ads_CampaignStats_*, ads_AdGroupStats_*, ads_Campaign_*, ads_Ad_*, ads_Budget_*, geo/audience/asset variants, etc.).
Suffixed with the MCC (manager) customer id 3833870882 — the login customer; individual client accounts sit under it (GOOGLE_ADS_CUSTOMER_ID vars: customer 2935942007, login 3833870882).
Bridge: GOOGLE_ACCOUNT_DATA.GoogleAccount_Subscr_MAP_GAds (+ _GAds_native, _GAdsMgmtfeeChange) maps a Google Ads customer_id → subscriber_id.
Refresh: Google's transfer (daily, Google-managed). Ads spend here is retail.

Merchant Centre — `GOOGLE_MC_AUTO`

Merchant Centre transfer, suffixed with MC account 422504888: ProductPerformance_*, Products_*, PriceInsights_*, PriceCompetitiveness_*, BestSellers*, LocalInventories_*, RegionalInventories_*, ProductPerformance_AllFields_View.
Refresh: Google-managed transfer.

GA4 / GMB / GSC — `GOOGLE_ACCOUNT_DATA`

Monthly partition-style tables (_YYYYMMDD suffix per month):

Surface	Tables	Bridge to subscriber_id
GA4	`GA4_REPORTS_YYYYMMDD` (monthly), `GA4_ACCOUNTS`, `GA4_PROPERTIES`, `GA4_DATASTREAMS`	`GoogleAccount_Subscr_MAP_GAnalytics`
GMB	`GMB_LOCATIONS`, `GMB_KEYWORDS_YYYYMMDD`, `GMB_LOCALPOSTS_YYYYMMDD`, `GMB_ACCOUNTS`	`mv_subscriber_service_map` (service = "Google Business Profile") → `GMB_LOCATIONS`
GSC	`GSC_REPORTS_YYYYMMDD` (monthly), `GSC_SITEMAPS_YYYYMMDD`	`GoogleAccount_Subscr_MAP_GSearchConsole_native`

The GMB perf feeds are loaded monthly (8th, ~03:00 UTC) by an external ETL — not owned by this repo. GA4/GSC partitions also land monthly. So GA4/GSC/GMB data is month-grained — present it as such, never as live/daily.

The bridge tables (how subscriber_id joins everything)

Bridge	Lives in	Maps
`dim_listings_with_subscriber_id`	`SUPERSET_MASTER`	GP listing (place_id, bizname) ↔ subscriber_id ↔ CRM AccountName — the primary name/ID resolver
`mv_subscriber_service_map`	`SUPERSET_MASTER`	subscriber_id ↔ service_id per service (e.g. GBP location_id)
`GoogleAccount_Subscr_MAP_GAds / _GAnalytics / _GSearchConsole_native`	`GOOGLE_ACCOUNT_DATA`	subscriber_id ↔ Google Ads customer / GA4 property / GSC site
`sitepro_listing_map`	`fcr_operations`	FCR-built SitePro site ↔ GP listing_id (93% URL-matched)

So a question like "this client's ads + GBP + GSC" resolves the subscriber_id once (via dim_listings_with_subscriber_id), then fans out through the service-specific maps to the Google customer/property/location ids, then reads the per-service tables. The composite check-enrichment endpoint does exactly this.

6. How the Worker uses BQ

Reads via worker/src/bq/client.js (queryBQ) — BigQuery REST + a service-account JWT (bq/jwt.js). Generic ad-hoc reads go through the /dashboard-bq-execute endpoint (read-only).
Writes are rare and gated: DDL/inserts go through /dashboard-bq-admin (needs x-admin-key) or streaming insertRows (e.g. the deal_briefs audit). The bulk data writes are not the Worker's job — n8n owns ETL.
active_clients MRR trap: SUM(monthly_value) overstates MRR because DEA/Setup one-off lines carry the full amount; filter ri_period > 0 for recurring MRR.

FCR Dashboard documentation · generated from docs/ · keep counts verified, not guessed.