BigQuery & the Data Sync Sequence
What data lives where in BigQuery, how it's grouped, and the daily/weekly/ monthly order in which it refreshes. The Worker reads from BQ; almost nothing writes to BQ from the Worker — the writes come from n8n ETL workflows and BQ scheduled queries (see
n8n-workflows.md).Project:
listingmanager-1529856313699. Verified 2026-05-20.
1. Datasets
The Worker references 8 datasets (ref count = how heavily the code uses each):
| Dataset | Role |
|---|---|
fcr_operations |
The main operational dataset — 73 tables + 5 views. Everything the dashboard owns. |
SUPERSET_MASTER |
Golden Pages listing master — dim_listings_with_subscriber_id (listing ↔ subscriber bridge), mv_subscriber_service_map (GBP location mapping). |
DYNAMICS_DATA |
Raw CRM (Dynamics) — Accounts_and_related_data, the source the active_clients scheduled query reads. |
GOOGLE_ACCOUNT_DATA |
GMB / GA4 / GSC performance feeds (view_gmb_loc_keywords, etc.). |
GOOGLE_ADS_TRANSFER_EU |
Google Ads Data Transfer (spend, clicks, conversions). |
GOOGLE_MC_AUTO |
Google Merchant Centre feed health / shopping. |
datastudio, VIEW_ALL |
misc / legacy reporting views. |
2. fcr_operations — tables by domain (78 total)
HubSpot deals & engagements (see hubspot-data-architecture.md for full detail)
hubspot_deals (open), hubspot_deals_staging,
hubspot_deals_backcatalogue_{18m·,2024_2025,2025_summer,active,2025_2026,staging},
hubspot_deal_property_history_backcatalogue_{18m·,…} (stage-transition history),
hubspot_engagements (+ _staging), deal_briefs, graham_hubspot_deals (+_staging).
(· = VIEW)
CRM, AM & billing
active_clients (daily snapshot — MRR/products/AM, the system of record for billing state),
am_attributes, am_email_templates, am_team_email_templates, roam_users
(person ↔ Roam ↔ HubSpot identity), advisor_usage_log, bg_replacement_proxy.
Category & keyword intelligence
KEYWORD_INTELLIGENCE, KEYWORD_INTELLIGENCE_GP_V2, KEYWORD_INTELLIGENCE_MONTHLY·,
CATEGORY_BENCHMARKS, CATEGORY_MAPPING, CATEGORY_NORMALISATION,
CATEGORY_TOP_KEYWORDS, CATEGORY_SERVICES_BENCHMARK, CATEGORY_CROSS_HEADINGS,
CATEGORY_GBP_ACTIVITY, CATEGORY_CUSTOMER_PRIORITIES, CATEGORY_REVIEW_PRIORITIES.
Note: KEYWORD_INTELLIGENCE / CATEGORY_BENCHMARKS are cumulative, not monthly.
CATEGORY_GBP_ACTIVITYis genuinely monthly.
GBP (Google Business Profile)
GBP_INFORMATION, GBP_PRODUCTS, GBP_SERVICES, GBP_RECOMMENDED_SERVICES,
GBP_REVIEWS, client_reviews_cache, serp_gbp_snapshots.
InSites / prospecting
USG_INSITES_AUDIT_INVENTORY (live), USG_INSITES_REPORT_MAP (+_STAGING),
USG_INSITES_TEST, insites_audit_relationships (parent↔competitor links),
PROSPECT_LISTINGS, prospect_engagement_events.
External caches (refreshed by n8n, read by the Worker)
ahrefs_seo_cache, built_websites_cache, yext_listings_cache,
sitepro_listing_map, sitepro_megadoc_cache.
Area / census / market (mostly static reference data)
cso_small_areas / cso_small_areas_v2, cso_demographics, cso_new_builds,
pobal_deprivation, valuation_properties, cro_companies_agg,
facebook_reach_estimates, irish_places (gazetteer), sport_ireland_clubs.
Teamwork (projects + tickets)
teamwork_open_tickets (+_old,_staging), teamwork_milestones,
teamwork_project_milestones, teamwork_project_tasks.
SayMore pilot (90-day program tracking)
saymore_pilot_cohort_2026_05, saymore_pilot_gmb_baseline_2026_05,
saymore_pilot_keyword_baseline_2026_05, v_saymore_pilot_master_tracker·,
v_saymore_program_universe·.
3. The staging→prod pattern
Several syncs write to a _staging table first, then atomically swap into prod
(so readers never see a half-loaded table): hubspot_deals_staging,
hubspot_engagements_staging, *_backcatalogue_staging,
USG_INSITES_REPORT_MAP_STAGING, teamwork_open_tickets_staging. BigQuery
loads/queries are atomic, so a reader always sees either the old or the new
table, never a partial one.
4. The daily sync sequence (assumed order, UTC)
This is the order things refresh on a normal day. Worker crons are sequenced after the data they depend on has landed.
~03:00 (weekly Sun) HubSpot Deals Back-Catalogue rolling cohort → BQ
~03:00 (monthly 8th) GMB performance feeds → GOOGLE_ACCOUNT_DATA
~04:00 (weekly Tue) SitePro MegaDoc → sitepro_megadoc_cache / sitepro_listing_map
───────────────────────────────────────────────────────────────────
~05:45 hubspot_deals full truncate+reload (n8n: HubSpot Deal Sync, twice daily)
~06:00 active_clients scheduled query (DYNAMICS_DATA → fcr_operations)
07:30 n8n: HubSpot Engagements Daily Delta → hubspot_engagements
07:30 Worker cron: pre-warm AM portfolio caches
07:45 Worker cron: pre-warm check-account for open-deal subscribers
08:00 Worker cron: deal-brief refresh (chunked) — first run AFTER the above
⋮ (then every 2h to 20:00)
every minute Worker cron: Pleper + SerpAPI enrich + HubSpot engagements drain
Why the order matters: the brief composer reads hubspot_deals,
active_clients, and hubspot_engagements. The 08:00 brief cron is placed
after the deal sync (05:45), the billing snapshot (06:00), the engagement
delta (07:30), and the cache pre-warms (07:30/07:45) — so when it composes, it
reads fresh data and hits warm caches. (See fcr-dashboard-architecture-overview.md
for the brief pipeline.)
Cadence notes:
- HubSpot deals: twice daily (the early run lands ~05:45). Open-only — closed deals move to the back-catalogue.
active_clients: daily BQ scheduled query (~06:00), snapshot per day.- Back-catalogue cohort: weekly (Sun 03:00) — grows as deals close.
- Engagements: daily delta (07:30) + every-minute straggler drain via Worker.
- SitePro MegaDoc: weekly (Tue 04:00). GMB perf: monthly (8th, 03:00).
- Yext sync: monthly (SayMore universe fan-out).
- Census / valuation / Pobal / CRO / irish_places: static (multi-year refresh).
5. Google data transfers & the subscriber-id bridges
The Google datasets are fed by Google's own BigQuery Data Transfer Service (configured in the GCP console, not in this repo — there's no n8n/Worker job for them). They land keyed by Google's IDs (customer_id, property, location), so the subscriber-id bridge tables are how we join them back to an FCR account.
Google Ads — GOOGLE_ADS_TRANSFER_EU
- The standard Google Ads Data Transfer schema (~60 tables:
ads_AccountStats_*,ads_CampaignStats_*,ads_AdGroupStats_*,ads_Campaign_*,ads_Ad_*,ads_Budget_*, geo/audience/asset variants, etc.). - Suffixed with the MCC (manager) customer id
3833870882— the login customer; individual client accounts sit under it (GOOGLE_ADS_CUSTOMER_IDvars: customer2935942007, login3833870882). - Bridge:
GOOGLE_ACCOUNT_DATA.GoogleAccount_Subscr_MAP_GAds(+_GAds_native,_GAdsMgmtfeeChange) maps a Google Ads customer_id → subscriber_id. - Refresh: Google's transfer (daily, Google-managed). Ads spend here is retail.
Merchant Centre — GOOGLE_MC_AUTO
- Merchant Centre transfer, suffixed with MC account
422504888:ProductPerformance_*,Products_*,PriceInsights_*,PriceCompetitiveness_*,BestSellers*,LocalInventories_*,RegionalInventories_*,ProductPerformance_AllFields_View. - Refresh: Google-managed transfer.
GA4 / GMB / GSC — GOOGLE_ACCOUNT_DATA
Monthly partition-style tables (_YYYYMMDD suffix per month):
| Surface | Tables | Bridge to subscriber_id |
|---|---|---|
| GA4 | GA4_REPORTS_YYYYMMDD (monthly), GA4_ACCOUNTS, GA4_PROPERTIES, GA4_DATASTREAMS |
GoogleAccount_Subscr_MAP_GAnalytics |
| GMB | GMB_LOCATIONS, GMB_KEYWORDS_YYYYMMDD, GMB_LOCALPOSTS_YYYYMMDD, GMB_ACCOUNTS |
mv_subscriber_service_map (service = "Google Business Profile") → GMB_LOCATIONS |
| GSC | GSC_REPORTS_YYYYMMDD (monthly), GSC_SITEMAPS_YYYYMMDD |
GoogleAccount_Subscr_MAP_GSearchConsole_native |
The GMB perf feeds are loaded monthly (8th, ~03:00 UTC) by an external ETL — not owned by this repo. GA4/GSC partitions also land monthly. So GA4/GSC/GMB data is month-grained — present it as such, never as live/daily.
The bridge tables (how subscriber_id joins everything)
| Bridge | Lives in | Maps |
|---|---|---|
dim_listings_with_subscriber_id |
SUPERSET_MASTER |
GP listing (place_id, bizname) ↔ subscriber_id ↔ CRM AccountName — the primary name/ID resolver |
mv_subscriber_service_map |
SUPERSET_MASTER |
subscriber_id ↔ service_id per service (e.g. GBP location_id) |
GoogleAccount_Subscr_MAP_GAds / _GAnalytics / _GSearchConsole_native |
GOOGLE_ACCOUNT_DATA |
subscriber_id ↔ Google Ads customer / GA4 property / GSC site |
sitepro_listing_map |
fcr_operations |
FCR-built SitePro site ↔ GP listing_id (93% URL-matched) |
So a question like "this client's ads + GBP + GSC" resolves the subscriber_id
once (via dim_listings_with_subscriber_id), then fans out through the
service-specific maps to the Google customer/property/location ids, then reads
the per-service tables. The composite check-enrichment endpoint does exactly this.
6. How the Worker uses BQ
- Reads via
worker/src/bq/client.js(queryBQ) — BigQuery REST + a service-account JWT (bq/jwt.js). Generic ad-hoc reads go through the/dashboard-bq-executeendpoint (read-only). - Writes are rare and gated: DDL/inserts go through
/dashboard-bq-admin(needsx-admin-key) or streaminginsertRows(e.g. thedeal_briefsaudit). The bulk data writes are not the Worker's job — n8n owns ETL. active_clientsMRR trap:SUM(monthly_value)overstates MRR because DEA/Setup one-off lines carry the full amount; filterri_period > 0for recurring MRR.
FCR Dashboard documentation · generated from docs/ · keep counts verified, not guessed.