HubSpot Data Architecture — BQ tables, vector indexes, refresh & schema
Map of how HubSpot deal + engagement data flows from the live CRM into BigQuery, the Vectorize indexes, and the precomputed deal-brief cache that the Roam advisor and HubSpot Prospect Card read from. Includes the 18-month closed-deal history structure.
Counts/ranges verified live 2026-05-20. Dataset:
listingmanager-1529856313699.fcr_operations.
Flow diagram
┌─────────────────────────┐
│ HubSpot CRM (live) │
│ deals + engagements │
└───────────┬─────────────┘
daily sync ~05:45 │ + every-min engagement drain
▼
┌──────────────────────────────────────────────────────────────────────────┐
│ BigQuery listingmanager-1529856313699.fcr_operations │
│ │
│ OPEN DEALS CLOSED HISTORY (18 months) │
│ ┌────────────────────┐ ┌──────────────────────────────────┐ │
│ │ hubspot_deals │ │ hubspot_deals_backcatalogue_18m │ │
│ │ 1,473 rows │ │ (VIEW) 10,810 rows │ │
│ │ open only │ │ 2024-11-01 → 2026-05-15 │ │
│ │ truncate+reload │ │ Lost 9,603 / Won 1,193 / Open 14 │ │
│ │ daily ~05:45 UTC │ │ = UNION of 3 six-month cohorts: │ │
│ │ last_modified:STR │ │ • bc_2024_2025 3,311 (Nov–Apr)│ │
│ │ synced_at:TIMESTAMP │ │ • bc_2025_summer 4,213 (May–Oct)│ │
│ └─────────┬──────────┘ │ • bc_active 3,286 (Nov–now)│ │
│ │ └──────────────┬───────────────────┘ │
│ │ ┌─ bc_2025_2026 3,133 (Nov25–Apr26) frozen, YoY-paired │
│ │ │ with bc_2024_2025 (same window, +1yr) for win/loss YoY │
│ │ └─ hubspot_deal_property_history_backcatalogue_* (same │
│ │ cohort split) = stage-transition history for velocity │
│ │ │
│ ENGAGEMENTS BRIEF AUDIT │
│ ┌────────────────────┐ ┌──────────────────────────────────┐ │
│ │ hubspot_engagements │ │ deal_briefs │ │
│ │ 230,342 rows │ │ audit log, 1 row / compose │ │
│ │ EMAIL110k CALL80k │ │ partition composed_at, 365d exp │ │
│ │ TASK23k NOTE17k MTG │ │ brief_json STRING + sources[] │ │
│ │ every-min cron drain│ └──────────────┬───────────────────┘ │
│ └─────────┬──────────┘ │ │
└────────────┼───────────────────────────────────────┼──────────────────────┘
│ │
┌─────────┴──────────┐ compose-deal-brief reads:
│ deals-vector-reindex│ hubspot_deals + backcatalogue_18m
│ (MANUAL, paginated) │ + engagements + active_clients
▼ │ + live GBP/crawl ──────────┐
┌────────────────────┐ │ ▼
│ VECTORIZE │ │ ┌────────────────────────┐
│ fcr-deal-history │◄──┘ │ KV deal-brief:<id> │
│ 1536d cosine │ │ hot blob, 35d TTL │
│ embeds open+closed │ │ chunked cron 08-20/2h │
│ meta idx: outcome, │ │ + prewarm + self-heal │
│ close_year, county, │ └───────────┬────────────┘
│ service_trade, owner│ │
└─────────┬──────────┘ │
│ ┌──────────────────────┐ │
│ │ fcr-site-portfolio │ (GP-listing-wide,│
│ │ VECTORIZE — similar │ feeds prospect_ │
│ │ FCR sites/clients) │ intel block) │
│ └──────────┬───────────┘ │
▼ ▼ ▼
┌──────────────────────────────────────────────────────────────┐
│ Roam advisor + HubSpot Prospect Card (get_deal_brief, etc) │
└──────────────────────────────────────────────────────────────┘
The 18-month closed-deal history
hubspot_deals_backcatalogue_18m is a VIEW equal to three sequential,
non-overlapping 6-month cohort tables UNIONed (3,311 + 4,213 + 3,286 = 10,810):
| Cohort table | Window | Role |
|---|---|---|
hubspot_deals_backcatalogue_2024_2025 |
2024-11 → 2025-04 | oldest 6m; also the frozen YoY baseline |
hubspot_deals_backcatalogue_2025_summer |
2025-05 → 2025-10 | middle 6m |
hubspot_deals_backcatalogue_active |
2025-11 → 2026-05-15 | current / rolling 6m |
hubspot_deals_backcatalogue_2025_2026 (NOT in the view) |
2025-11 → 2026-04 | frozen snapshot of the current window, paired 1:1 with 2024_2025 for apples-to-apples YoY win/loss conversion |
hubspot_deals_backcatalogue_staging |
— | load staging |
A parallel hubspot_deal_property_history_backcatalogue_* family (same cohort
split + its own _18m view) holds property-change history — stage
transitions over time — for cycle-time / conversion-velocity analysis,
separate from the deal snapshots above.
18m view outcome split (verified 2026-05-20): Lost 9,603 / Won 1,193 / Open 14 (the 14 "Open" are stragglers mid-transition between the open table and the back-catalogue).
Stores
| Store | Type | Rows | Role |
|---|---|---|---|
hubspot_deals |
BASE TABLE | 1,473 | Open deals only. Closed deals are NOT here — they live solely in the back-catalogue. |
hubspot_deals_backcatalogue_18m |
VIEW | 10,810 | Rolling 18m closed history (won + lost). UNION of 3 cohorts. |
hubspot_deals_backcatalogue_{2024_2025,2025_summer,active,2025_2026,staging} |
BASE | see above | Cohort tables behind the view + the frozen YoY pair. |
hubspot_deal_property_history_backcatalogue_* |
BASE + _18m VIEW |
— | Property-change/stage-transition history for velocity analysis. |
hubspot_engagements |
BASE TABLE | 230,342 | Note/email/call/meeting/task bodies. EMAIL 110k, CALL 80k, TASK 23k, NOTE 17k, MEETING 300. |
deal_briefs |
BASE TABLE | grows | Audit log of precomputed briefs (1 row/compose). Partitioned composed_at, clustered deal_id, 365-day expiry. brief_json STRING + sources_succeeded ARRAY. |
graham_hubspot_deals / _staging |
BASE | — | Separate working copy (not part of the advisor pipeline). |
prospect_engagement_events |
BASE | — | Telemetry, separate. |
Vector indexes (Cloudflare Vectorize) — two distinct, don't confuse
| Index | Binding | Embeds | Notes |
|---|---|---|---|
fcr-deal-history |
DEAL_VECTORIZE |
open (hubspot_deals) + closed (backcatalogue_18m) deals |
1536-dim, cosine, text-embedding-3-small. Metadata indexes: outcome, close_year, county, service_trade, owner_name. Powers "deals like this one / what happened." |
fcr-site-portfolio |
VECTORIZE |
GP-listing-wide sites (not deals) | Feeds the prospect_intel "similar clients" block. GP-listing-wide → contains FRLs + churned, so consumers must split paying vs not (see prospect-intel). |
Third index
fcr-company-knowledge(COMPANY_KNOWLEDGE_VECTORIZE) exists but is company-knowledge, not HubSpot.
KV
| Key | Namespace | TTL | Role |
|---|---|---|---|
deal-brief:<deal_id> |
CACHE |
35 days | Hot precomputed brief blob. Written by compose-deal-brief, read by get-deal-brief (the advisor's get_deal_brief tool). |
Refresh cadence
| Store | Refresh | Mechanism |
|---|---|---|
hubspot_deals (open) |
daily ~05:45 UTC | full truncate+reload (external sync) |
bc_active / 18m view |
rolls forward as deals close | external sync |
frozen cohorts (bc_2024_2025, bc_2025_2026) |
static | one-time snapshots for YoY |
hubspot_engagements |
continuous | every-minute worker cron drains stragglers (batch 3) |
deal_briefs + KV deal-brief: |
chunked cron 08:00–20:00 every 2h + prewarm-enqueue + read self-heal | compose-deal-brief via BRIEF_QUEUE |
fcr-deal-history (vector) |
manual only — no cron | deals-vector-reindex (paginated) |
fcr-site-portfolio (vector) |
manual reindex | n8n |
Deal-brief compose inputs
compose-deal-brief (brief v1.2.x) reads, in parallel:
hubspot_deals∪backcatalogue_18m— the deal row (open or closed)check-account(live CRM +active_clients) — CRM + active lines + MRR + ticketscheck-enrichment— cached GBP + keywords + ads + GSC + Ahrefsprospect-intel— category keywords, local area, similar paying clients (split from FRLs)deals-vector-search— similar past won/lost deals (fromfcr-deal-history)hubspot_engagements— latest ~8 deduped engagements, real conversation ranked over automated- live GBP (SerpAPI + Pleper) for unmapped prospects + home-page crawl
Output → KV deal-brief:<deal_id> (hot read) + deal_briefs BQ row (audit).
Gotchas
hubspot_dealsis open-only. Closed deals resolve only via the back-catalogue, soget_deal_by_idkeeps a live-HubSpot fallback.hubspot_engagementshas future-dated rows (max observed 2027-05-10) — a source data-quality artifact. Don't trust rawMAX(created_at).fcr-deal-historyreindex is manual, so brand-new deals lag the index;compose-deal-brieftolerates asimilar_dealsmiss.- CF Queue auto-pauses the brief consumer if a large backlog (1,000+) of slow messages lands at once — drip in ~80-chunks; clear a pause via consumer delete + redeploy. See the deal-brief design notes.
Generated 2026-05-20. Update the counts/ranges if the cohort windows roll.
FCR Dashboard documentation · generated from docs/ · keep counts verified, not guessed.