HubSpot Data Architecture — BQ tables, vector indexes, refresh & schema

Map of how HubSpot deal + engagement data flows from the live CRM into BigQuery, the Vectorize indexes, and the precomputed deal-brief cache that the Roam advisor and HubSpot Prospect Card read from. Includes the 18-month closed-deal history structure.

Counts/ranges verified live 2026-05-20. Dataset: listingmanager-1529856313699.fcr_operations.

Flow diagram

                          ┌─────────────────────────┐
                          │   HubSpot CRM (live)     │
                          │   deals + engagements    │
                          └───────────┬─────────────┘
                    daily sync ~05:45 │ + every-min engagement drain
                          ▼
┌──────────────────────────────────────────────────────────────────────────┐
│ BigQuery  listingmanager-1529856313699.fcr_operations                      │
│                                                                            │
│  OPEN DEALS                          CLOSED HISTORY (18 months)            │
│  ┌────────────────────┐             ┌──────────────────────────────────┐  │
│  │ hubspot_deals       │            │ hubspot_deals_backcatalogue_18m  │  │
│  │ 1,473 rows          │            │ (VIEW) 10,810 rows               │  │
│  │ open only           │            │ 2024-11-01 → 2026-05-15          │  │
│  │ truncate+reload     │            │ Lost 9,603 / Won 1,193 / Open 14 │  │
│  │ daily ~05:45 UTC    │            │  = UNION of 3 six-month cohorts: │  │
│  │ last_modified:STR   │            │   • bc_2024_2025   3,311 (Nov–Apr)│ │
│  │ synced_at:TIMESTAMP │            │   • bc_2025_summer 4,213 (May–Oct)│ │
│  └─────────┬──────────┘             │   • bc_active      3,286 (Nov–now)│ │
│            │                        └──────────────┬───────────────────┘  │
│            │   ┌─ bc_2025_2026 3,133 (Nov25–Apr26) frozen, YoY-paired      │
│            │   │   with bc_2024_2025 (same window, +1yr) for win/loss YoY  │
│            │   └─ hubspot_deal_property_history_backcatalogue_* (same       │
│            │       cohort split) = stage-transition history for velocity   │
│            │                                                                │
│  ENGAGEMENTS                         BRIEF AUDIT                            │
│  ┌────────────────────┐             ┌──────────────────────────────────┐  │
│  │ hubspot_engagements │            │ deal_briefs                      │  │
│  │ 230,342 rows        │            │ audit log, 1 row / compose       │  │
│  │ EMAIL110k CALL80k   │            │ partition composed_at, 365d exp  │  │
│  │ TASK23k NOTE17k MTG │            │ brief_json STRING + sources[]    │  │
│  │ every-min cron drain│            └──────────────┬───────────────────┘  │
│  └─────────┬──────────┘                            │                       │
└────────────┼───────────────────────────────────────┼──────────────────────┘
             │                                         │
   ┌─────────┴──────────┐              compose-deal-brief reads:
   │ deals-vector-reindex│              hubspot_deals + backcatalogue_18m
   │ (MANUAL, paginated) │              + engagements + active_clients
   ▼                     │              + live GBP/crawl  ──────────┐
┌────────────────────┐   │                                         ▼
│ VECTORIZE           │   │                            ┌────────────────────────┐
│ fcr-deal-history    │◄──┘                            │ KV  deal-brief:<id>    │
│ 1536d cosine        │                                │ hot blob, 35d TTL      │
│ embeds open+closed  │                                │ chunked cron 08-20/2h  │
│ meta idx: outcome,  │                                │ + prewarm + self-heal  │
│ close_year, county, │                                └───────────┬────────────┘
│ service_trade, owner│                                            │
└─────────┬──────────┘                                            │
          │              ┌──────────────────────┐                 │
          │              │ fcr-site-portfolio   │ (GP-listing-wide,│
          │              │ VECTORIZE — similar  │  feeds prospect_ │
          │              │ FCR sites/clients)   │  intel block)    │
          │              └──────────┬───────────┘                 │
          ▼                         ▼                             ▼
     ┌──────────────────────────────────────────────────────────────┐
     │  Roam advisor  +  HubSpot Prospect Card  (get_deal_brief, etc) │
     └──────────────────────────────────────────────────────────────┘

The 18-month closed-deal history

hubspot_deals_backcatalogue_18m is a VIEW equal to three sequential, non-overlapping 6-month cohort tables UNIONed (3,311 + 4,213 + 3,286 = 10,810):

Cohort table Window Role
hubspot_deals_backcatalogue_2024_2025 2024-11 → 2025-04 oldest 6m; also the frozen YoY baseline
hubspot_deals_backcatalogue_2025_summer 2025-05 → 2025-10 middle 6m
hubspot_deals_backcatalogue_active 2025-11 → 2026-05-15 current / rolling 6m
hubspot_deals_backcatalogue_2025_2026 (NOT in the view) 2025-11 → 2026-04 frozen snapshot of the current window, paired 1:1 with 2024_2025 for apples-to-apples YoY win/loss conversion
hubspot_deals_backcatalogue_staging load staging

A parallel hubspot_deal_property_history_backcatalogue_* family (same cohort split + its own _18m view) holds property-change history — stage transitions over time — for cycle-time / conversion-velocity analysis, separate from the deal snapshots above.

18m view outcome split (verified 2026-05-20): Lost 9,603 / Won 1,193 / Open 14 (the 14 "Open" are stragglers mid-transition between the open table and the back-catalogue).

Stores

Store Type Rows Role
hubspot_deals BASE TABLE 1,473 Open deals only. Closed deals are NOT here — they live solely in the back-catalogue.
hubspot_deals_backcatalogue_18m VIEW 10,810 Rolling 18m closed history (won + lost). UNION of 3 cohorts.
hubspot_deals_backcatalogue_{2024_2025,2025_summer,active,2025_2026,staging} BASE see above Cohort tables behind the view + the frozen YoY pair.
hubspot_deal_property_history_backcatalogue_* BASE + _18m VIEW Property-change/stage-transition history for velocity analysis.
hubspot_engagements BASE TABLE 230,342 Note/email/call/meeting/task bodies. EMAIL 110k, CALL 80k, TASK 23k, NOTE 17k, MEETING 300.
deal_briefs BASE TABLE grows Audit log of precomputed briefs (1 row/compose). Partitioned composed_at, clustered deal_id, 365-day expiry. brief_json STRING + sources_succeeded ARRAY.
graham_hubspot_deals / _staging BASE Separate working copy (not part of the advisor pipeline).
prospect_engagement_events BASE Telemetry, separate.

Vector indexes (Cloudflare Vectorize) — two distinct, don't confuse

Index Binding Embeds Notes
fcr-deal-history DEAL_VECTORIZE open (hubspot_deals) + closed (backcatalogue_18m) deals 1536-dim, cosine, text-embedding-3-small. Metadata indexes: outcome, close_year, county, service_trade, owner_name. Powers "deals like this one / what happened."
fcr-site-portfolio VECTORIZE GP-listing-wide sites (not deals) Feeds the prospect_intel "similar clients" block. GP-listing-wide → contains FRLs + churned, so consumers must split paying vs not (see prospect-intel).

Third index fcr-company-knowledge (COMPANY_KNOWLEDGE_VECTORIZE) exists but is company-knowledge, not HubSpot.

KV

Key Namespace TTL Role
deal-brief:<deal_id> CACHE 35 days Hot precomputed brief blob. Written by compose-deal-brief, read by get-deal-brief (the advisor's get_deal_brief tool).

Refresh cadence

Store Refresh Mechanism
hubspot_deals (open) daily ~05:45 UTC full truncate+reload (external sync)
bc_active / 18m view rolls forward as deals close external sync
frozen cohorts (bc_2024_2025, bc_2025_2026) static one-time snapshots for YoY
hubspot_engagements continuous every-minute worker cron drains stragglers (batch 3)
deal_briefs + KV deal-brief: chunked cron 08:00–20:00 every 2h + prewarm-enqueue + read self-heal compose-deal-brief via BRIEF_QUEUE
fcr-deal-history (vector) manual only — no cron deals-vector-reindex (paginated)
fcr-site-portfolio (vector) manual reindex n8n

Deal-brief compose inputs

compose-deal-brief (brief v1.2.x) reads, in parallel:

  • hubspot_dealsbackcatalogue_18m — the deal row (open or closed)
  • check-account (live CRM + active_clients) — CRM + active lines + MRR + tickets
  • check-enrichment — cached GBP + keywords + ads + GSC + Ahrefs
  • prospect-intel — category keywords, local area, similar paying clients (split from FRLs)
  • deals-vector-search — similar past won/lost deals (from fcr-deal-history)
  • hubspot_engagements — latest ~8 deduped engagements, real conversation ranked over automated
  • live GBP (SerpAPI + Pleper) for unmapped prospects + home-page crawl

Output → KV deal-brief:<deal_id> (hot read) + deal_briefs BQ row (audit).

Gotchas

  • hubspot_deals is open-only. Closed deals resolve only via the back-catalogue, so get_deal_by_id keeps a live-HubSpot fallback.
  • hubspot_engagements has future-dated rows (max observed 2027-05-10) — a source data-quality artifact. Don't trust raw MAX(created_at).
  • fcr-deal-history reindex is manual, so brand-new deals lag the index; compose-deal-brief tolerates a similar_deals miss.
  • CF Queue auto-pauses the brief consumer if a large backlog (1,000+) of slow messages lands at once — drip in ~80-chunks; clear a pause via consumer delete + redeploy. See the deal-brief design notes.

Generated 2026-05-20. Update the counts/ranges if the cohort windows roll.

FCR Dashboard documentation · generated from docs/ · keep counts verified, not guessed.