SnapSync / SayMore Deep Dive — Process & Methodology

End-to-end process notes for the SnapSync + SayMore + combined product analyses run 2026-05-18 → 2026-05-19. Captures data sources, SQL patterns, keyword-cleaning rules, SerpAPI run pipeline, combined-view merge logic, and the gotchas that took multiple iterations to find.

This doc is the canonical reference for re-running the analysis or extending it. The findings live in the three HTML reports — see Outputs below.


What this analysis produces

Three deep-dive reports treating SnapSync (2017-2018 onboarding) and SayMore (2019+ rebrand) as the same SMB-listings product family:

  1. SayMore standaloneSAYMORE_DEEP_DIVE.html — 1,648 subs, 2018-2026 YTD.
  2. SnapSync standaloneSNAPSYNC_DEEP_DIVE.html — 2,637 subs, 2017-2026 YTD.
  3. CombinedSNAPSYNC_SAYMORE_DEEP_DIVE.html — 4,111 unique subs across both. This is the canonical view for general audiences. The standalones are useful when an audience specifically asks about one product.

Each report has two halves:

  • Part 1: Retention — cohort year, MRR tier, survival curve, tenure bands, categories, prior portfolio, upsell effect, co-churn, locality.
  • Part 2: Ranking — visibility, organic page 1, Local Pack, brand check (proof of digital life), per-cohort breakdowns, named lists (invisible / GBP-fix / upsell-target / proof).

PPT-input briefs for the deck builders:

  • SAYMORE_DEEP_DIVE_PPT_UPDATE_BRIEF.md (declarative slide spec)
  • SNAPSYNC_SAYMORE_PPT_UPDATE_BRIEF.md (combined deck, 19 slides)

Data sources

BigQuery tables

Table Used for
DYNAMICS_DATA.Accounts_and_related_data Authoritative orders + sales-order-details + billing-groups + recurring-invoices — the raw source for first-billing dates, MRR, cancellation dates. The unified KB SQL unnests orders → salesOrderDetails → billingGroup → recurringInvoices.
fcr_operations.active_clients Daily snapshot of currently-billing subscribers per product group. Authoritative for "alive today" via MAX(snapshot_date) + product_group = 'PMS - Saymore' / 'PMS - Snapsync'.
SUPERSET_MASTER.dim_listings_with_subscriber_id Subscriber identity: AccountName (CRM), bizname (GP listing), main_category_name. Daily-refreshed bridge between CRM subscriber_id and GP listing_id.
fcr_operations.sitepro_listing_map SitePro URL by subscriber_id. Built from MegaDoc cache via n8n workflow LVzHQQ7xMQ3moLwO (currently unpublished — manual refresh). Used to identify the "has FCR-built website" cohort.
fcr_operations.GBP_INFORMATION Scraped GBP data fallback (place_id, lat, lng, website). Used when the GA-native mapping table doesn't cover a sub.
GOOGLE_ACCOUNT_DATA.GoogleAccount_Subscr_MAP_GBP_native Canonical GBP location-id ↔ subscriber mapping. Authoritative when present. Columns: locationId, SUBSCR_ID_Match, title.
GOOGLE_ACCOUNT_DATA.GMB_LOCATIONS Full GBP location data per locationId — metadata.placeId, latlng.{latitude,longitude}, websiteUri, categories, etc.
fcr_operations.KEYWORD_INTELLIGENCE GSC search-term data per subscriber. Sources: gsc_organic (SitePro GSC) + gsc_goldenpages (GP listing GSC). Used to pull each sub's top keywords by impressions.

Worker API

All BQ calls go through the worker endpoint /dashboard-bq-execute with x-api-key header. Pattern:

curl -s -X POST "https://fcr-dashboard-api.fcrmedia.workers.dev/dashboard-bq-execute" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $API_KEY" \
  -d '{"sql": "...", "limit": 20000}'

Default limit is 500. For any query that might return more than 500 rows, set limit explicitly. Hitting the default 500 silently truncates the result.

For SerpAPI we hit https://serpapi.com/search.json directly with the SERPAPI_KEY from .env.local. The worker has SERPAPI integration but for batch runs it's faster to bypass.

Local secrets

.env.local keys used:

  • x-api-key (note: leading space, lowercase, hyphens) — worker API key
  • SERPAPI_KEY — SerpAPI direct access

See memory reference_env_local_layout.md for the full layout.


Output artifacts

File What it is
docs/saymore-deep-dive-kb.csv Per-sub SayMore history. 1,648 rows, 25 columns.
docs/saymore-deep-dive-kb.md Methodology + unified SQL (predates this doc but still valid).
docs/snapsync-deep-dive-kb.csv Per-sub SnapSync history. 2,637 rows, 25 columns. Same schema as SayMore (just _sm_ss).
SAYMORE_RANK_KEYWORD_PREVIEW.csv Pre-SerpAPI keyword plan for SayMore. 849 rows with raw kws + cleaned variants + drop reasons.
SAYMORE_RANK_ANALYSIS_V2.csv Per-call SerpAPI results for SayMore alive subs. Long format, ~4,100 rows.
SNAPSYNC_RANK_KEYWORD_PREVIEW.csv Same shape for SnapSync.
SNAPSYNC_RANK_ANALYSIS.csv Same shape for SnapSync. ~4,500 rows.
SAYMORE_DEEP_DIVE.html SayMore-only report.
SNAPSYNC_DEEP_DIVE.html SnapSync-only report.
SNAPSYNC_SAYMORE_DEEP_DIVE.html Combined report (canonical).
SAYMORE_DEEP_DIVE_PPT_UPDATE_BRIEF.md Declarative slide spec for SayMore deck.
SNAPSYNC_SAYMORE_PPT_UPDATE_BRIEF.md Declarative slide spec for combined deck.
.tmp_combined_metrics.json Combined retention + rank metrics (working file).
.tmp_combined_pre_portfolio.json Per-sub prior-portfolio enrichment INCLUDING Print (working file).
.tmp_combined_pre_segments.json Pre-portfolio retention breakdown (working file).

Phase 1 — KB build (per product)

For each product (SnapSync, SayMore), build a per-subscriber history CSV with 25 columns covering identity, first line, lifecycle, portfolio context.

Script: scripts/snapsync-saymore/01-snapsync-kb-build.cjs (template — swap TARGET_GROUP = 'PMS - Snapsync''PMS - Saymore').

SQL pattern (full version in docs/saymore-deep-dive-kb.md):

WITH all_lines AS (
  -- Pulls every subscription/recurring line for every subscriber
  -- with positive MRR (totalLineItemAmount > 0, invoicePeriod > 0).
  SELECT ... FROM Accounts_and_related_data,
    UNNEST(orders), UNNEST(salesOrderDetails),
    UNNEST([billingGroup]), UNNEST(recurringInvoices)
  WHERE product.subscription = TRUE  -- ⚠️ See gotcha below
    AND ri.totalLineItemAmount > 0
    AND ri.invoicePeriod > 0
),
lines_mrr AS (
  -- Proportional MRR split across BG line items
  SELECT lf.*, ri_total * (base_amount / total_base) / ri_period AS monthly_value
  FROM lines_filtered JOIN bg_totals
),
ss_subs AS (
  SELECT subscriber_id, MIN(billing_start) first_ss_start
  FROM lines_mrr WHERE pg = 'PMS - Snapsync' GROUP BY subscriber_id
),
first_ss_line AS (...),   -- Per-sub first line details
all_ss        AS (...),   -- Per-sub aggregates (last_end, last_cancellation)
pre_ss_groups AS (...),   -- Distinct product_groups billing BEFORE first SS line
post_ss_added AS (...),   -- Product groups whose FIRST EVER billing postdates first SS
post_ss_lost  AS (...),   -- Product groups now ended (no open-ended line)
live_ss       AS (...),   -- subscriber_ids alive in active_clients today
biz           AS (...)    -- Identity from dim_listings_with_subscriber_id
SELECT ... FROM ss_subs LEFT JOIN ... 25 columns

Critical gotcha — subscription filter

The product.subscription = TRUE filter is right for MRR / tenure math but wrong for pre/post portfolio classification. It excludes:

Product group Distinct subs excluded
DIR - PRINT 79,314 (the entire legacy print directory book)
DMS - Websites (non-sub) 4,209
DIR - GPI (non-sub) 2,702
DIR - PBI (non-sub) 2,052
DMS - SEA (non-sub) 1,335
PMS - GSV 1,096
PMS - Saymore (non-sub — migration/rebrand lines) 2,945

Impact on the SS/SM combined deep-dive: the subscription-only filter would say 86% of listings sales were "net-new"; with Print included, 76% were INTO existing FCR clients. The original SnapSync 2017-2018 motion was a print-book penetration play, not an acquisition campaign.

Fix pattern when classifying pre/post portfolio: drop the subscription filter, use signed_date as the temporal anchor (because non-subscription products don't have recurring-invoice firstInvoiceOn dates). See Phase 6 / 07-combined-pre-recompute.cjs.

Saved in memory as feedback_kb_subscription_filter_excludes_print.md.

Bridge cohort mapping (SayMore only)

The SayMore KB augments cohort_year with bridge_cohort_year + bridge_class from per-year calls to /dashboard-revenue-bridge?productGroup=PMS - Saymore&fromDate=...&toDate=...&includeDetail=true. Buckets a sub into new / renewed / returned based on the revenue-bridge dashboard logic.

The SnapSync KB skips this step because the bridge endpoint returns zero detail for PMS - Snapsync. SnapSync uses naive cohort_year (first-billing year) only. Acceptable because SnapSync onboarding ended 2018; very few renewal cycles to classify.

Type-mismatch gotcha

active_clients.subscriber_id is STRING. Accounts_and_related_data.accountNumber is INT64. The live_ss / live_sm join needs SAFE_CAST(subscriber_id AS INT64) on the active_clients side or the join silently produces zero matches and currently_alive_* comes out false for every row. Caught this on the first SnapSync KB build run.


Phase 2 — Keyword preview (per product)

For each alive subscriber, pick 1 brand keyword + 3 non-branded service keywords for the SerpAPI test.

Script: 02a-saymore-keyword-preview.cjs / 02b-snapsync-keyword-preview.cjs.

Cohort: alive subs only

Query active_clients for the latest snapshot, filter product_group = 'PMS - Saymore' (or Snapsync). Returns ~822 SayMore / ~844 SnapSync subs. Join to:

  • dim_listings_with_subscriber_id — biz name + category
  • Accounts_and_related_data — city + county
  • sitepro_listing_map — sitepro_url (for website_type)
  • GBP_INFORMATION — gbp_website (fallback)

website_type classification:

  • 'sitepro' if sitepro_url present
  • 'external' if gbp_website present (and no sitepro)
  • 'none' otherwise

Keyword source: KEYWORD_INTELLIGENCE

Pull top 20 keywords per sub by impressions, deduped (SUM(impressions) GROUP BY subscriber_id, keyword). Limit ROW_NUMBER() <= 20 per sub. Chunk the query in batches of 80 sub_ids to avoid hitting the worker's row limit.

Brand / clean classification rules

For each candidate keyword:

  1. Brand kw classification is done at preview time but the preview's brand_kw_maps field is NOT used in production. See Phase 4 — brand maps reruns with the raw account name.

  2. Clean kw rule (used by kw1_maps, kw2_maps, kw3_maps):

    • Tokenize keyword (lowercase, strip punctuation, drop tokens <2 chars).
    • Strip from biz tokens (before classification): BIZ_STOPWORDS (ltd, the, &, co, ...), SERVICE_NOUNS (roofing, plumbing, cleaning, hair, boiler, ...), COMMON_LOCATION_TOKENS (dublin, cork, ballyfermot, kilcock, ...), and main_category tokens. The leftover is the "brand discriminator" set.
    • If every non-stopword kw token is in the brand-discriminator set → brand kw.
    • If kw has 0 brand tokens → clean kw.
    • If kw has SOME but not all brand tokens (mixed brand+service) → drop as contaminated.
  3. maps_kw build (for clean kws): strip city/county/location tokens + BIZ_STOPWORDS. Preserve "near me" as a unit (key gotcha — see below).

  4. serp_kw build: keep original kw, append city if absent. Matches Digital Footprint's "keyword + location" behaviour.

Per-sub picker

After classification, pick up to 3 clean kws by impressions descending, deduped by maps_kw (two raw kws normalising to the same maps query → keep highest-impression one). Status buckets:

Status Meaning
ready 3 clean kws + 1 brand kw available
ready_no_brand 3 clean kws, no proper brand discriminator (biz name was all generic tokens like "Galway Landscaping")
brand_only <3 clean kws but brand kw exists — runs brand only
insufficient_clean_kws <3 clean kws and no brand — skipped
no_ki_data Sub has zero KI rows — skipped

"near me" gotcha

Original cleaner stripped "near" as a location token, leaving "me" stranded, then re-appended "near me" → producing "chiropodist me near me". Fix: treat "near me" as a unit. Strip the pair from working tokens before location-stripping, then re-append if the original kw had it.

Brand-token over-stripping gotcha

Original brand classification used full biz name tokens as the brand set without subtracting service nouns / location / category tokens. Result: subs in service-named verticals (e.g. "Premium Roofing", where "roofing" is a service noun) had their brand set polluted, and every kw containing "roofing" got classified as mixed_brand and dropped. Fix: strip service nouns + category tokens + location tokens from the biz set before computing the brand-token set.


Phase 3 — SerpAPI run (clean kws + brand on SERP)

Script: 03a-saymore-rank-run.cjs / 03b-snapsync-rank-run.cjs.

For each runnable sub (status ∈ ready / ready_no_brand / brand_only):

  • 3 clean kws × google_maps engine = maps queries
  • 3 clean kws × google engine (only if sub has website) = organic queries
  • 1 brand kw × google engine (only if sub has website) = brand organic

Brand maps queries are skipped here — they need different matching logic (Phase 4).

SerpAPI parameters

{
  engine: "google_maps" | "google",
  device: "mobile",
  gl: "ie",
  hl: "en",
  google_domain: "google.ie",
  q: <query>,
  // For maps with lat/lng:
  ll: "@<lat>,<lng>,14z",
  // For maps without lat/lng: omit ll, fall back to gl=ie country scope.
}

Place_id + lat/lng enrichment

Per runnable sub, pull place_id + lat + lng from the GA-native mapping with GBP_INFORMATION fallback:

WITH map AS (
  SELECT SAFE_CAST(SUBSCR_ID_Match AS INT64) AS subscriber_id, ANY_VALUE(locationId) AS locationId
  FROM GoogleAccount_Subscr_MAP_GBP_native WHERE SUBSCR_ID_Match IS NOT NULL GROUP BY SUBSCR_ID_Match
),
loc AS (
  SELECT locationId,
         ANY_VALUE(metadata.placeId) AS place_id,
         ANY_VALUE(latlng.latitude) AS lat,
         ANY_VALUE(latlng.longitude) AS lng
  FROM GMB_LOCATIONS GROUP BY locationId
),
native AS (SELECT m.subscriber_id, l.place_id, l.lat, l.lng FROM map m LEFT JOIN loc l USING(locationId) WHERE m.subscriber_id IN (...)),
fallback AS (SELECT ... FROM GBP_INFORMATION WHERE subscriber_id IN (...) GROUP BY subscriber_id)
SELECT COALESCE(n.subscriber_id, f.subscriber_id) AS subscriber_id,
       COALESCE(n.place_id, f.place_id), COALESCE(n.lat, f.lat), COALESCE(n.lng, f.lng)
FROM native FULL OUTER JOIN fallback ON ...

GMB_LOCATIONS columns to know:

  • metadata.placeId — the Google Place ID (NOT a top-level place_id field)
  • latlng.latitude / latlng.longitude — lat/lng nested struct
  • locationId — the GBP location ID (different from place_id)

GBP_INFORMATION columns:

  • place_id (STRING, top-level)
  • lat (FLOAT64), lng (FLOAT64)

location parameter trap

SerpAPI's location parameter requires a canonical name from their gazetteer. "Ballyfermot, Dublin, Ireland" is rejected with HTTP 400. Don't use location for maps/organic queries. Rely on gl=ie + google_domain=google.ie for country scope. For maps, use ll=@lat,lng,14z when you have coordinates; without coordinates, accept that the local pack is "anywhere in Ireland" rather than city-specific.

Match logic

  • Organic match: by hostname. Strip www., compare against sitepro_url host (if website_type=sitepro) or gbp_website host (if external). Match if equal or one ends with .{other}.
  • Maps match (clean kws): by place_id against local_results[].place_id. Fallback: 60% token-overlap match on local_results[].title vs biz name.

Concurrency

5 SerpAPI calls in flight via a manual worker-pool pattern. Retries 429 / 5xx with exponential backoff (max 2 retries). Append each result to the output CSV as it lands — gives resume-on-restart via the existing-rows check at startup.

Resume logic

On script restart, parse the existing output CSV. Build a done = Set('<subscriber_id>|<kw_label>|<surface>'). Skip any call already in done. This means a partial / crashed run can be resumed by just re-running.

Gotcha: rows with errors (error != '') need to be filtered OUT of the done-set, otherwise transient SerpAPI errors get baked in as "skip this forever". Scrub error rows before resuming.


Phase 4 — Brand maps refix (raw biz name + knowledge-panel matcher)

Script: 04a-saymore-brand-maps-refix.cjs / 04b-snapsync-brand-maps-refix.cjs.

Brand-on-maps needs a separate run with different logic because:

  1. Brand kw = raw account name. For an SME, the brand IS the business name (buildBrandKw(account_name): lowercase, strip first comma onward, strip legal suffixes ltd|limited|the|and|&|co|company|inc|..., collapse whitespace). Do NOT strip locations/service nouns from biz names — "Kilcock Radiators" IS the brand, not "radiators" with "Kilcock" stripped as a location.

  2. Knowledge-panel matcher. For brand-specific queries, Google often doesn't return a local_results array (which would be the 3-pack). Instead it returns a singular place_results knowledge panel describing the specific business. Default SerpAPI matcher checking local_results only would report 76% "invisible" when reality is 96% visible — the data was there, just in a different field.

    Matcher priority:

    if (json.place_results) {
      // Knowledge panel — Google's response for brand searches
      if (place_results.place_id === expected_place_id) → MATCH (rank 1, "knowledge_panel_place_id")
      else if (name_overlap >= 0.6) → MATCH (rank 1, "knowledge_panel_name")
      else → no match
    } else if (json.knowledge_graph?.title) {
      // Knowledge graph fallback
      ...
    } else {
      // local_results scan (place_id then fuzzy name)
      ...
    }
    

Brand+organic queries use the standard hostname matcher from Phase 3 — no special logic needed.

Process

  1. Delete existing brand+maps rows from the rank CSV (so we don't double-count).
  2. For each runnable sub with a buildBrandKw(...) result, run google_maps engine with the raw brand kw + lat/lng.
  3. Apply the knowledge-panel-aware matcher.
  4. Append new rows to the rank CSV.

Saved as memory feedback_sme_brand_is_bizname.md.


Phase 5 — Retention metrics (per product, from KB CSV)

Scripts: 06-snapsync-retention-analyze.cjs (SM retention metrics are computed inline in 05a-saymore-rank-analyze.cjs).

Reads docs/{ss,sm}-deep-dive-kb.csv, computes:

Metric How
Total / alive / retention currently_alive_ss === 'true' (CSV string) — NOT === true (boolean)
Median / avg tenure monthsBetween(first_*_billing_start, NOW if alive else last_*_end), capping 2099 sentinel dates at NOW
Cohort year Naive: year of first billing. For SnapSync, bucket cohort_year < 2019 as "Pre-2019 (legacy carry-in)" because the 2017-18 spike is the order-management migration registering existing print clients, not real signings.
Existing vs new pre_*_groups != '' (from KB) — but this misses Print, see Phase 6 for the corrected combined cut
Standalone / cross-sell / anchor / fully-bundled Boolean cross of had_pre + had_post
MRR tier <€30 / €30-49 / €50-99 / €100+ on first_*_mrr
Survival curve Y+0 → Y+8 For each cohort year y, % of subs still alive at end of (y + offset). Counts a sub as "alive at EOY" if currently_alive OR last_end > EOY
Tenure bands 0-6 mo, 6-12, 12-18, 18-24, 24-36, 3-4yr, 4-5yr, 5-6yr, 6-7yr, 7-8yr, 8yr+
Category × retention Group by main_category n≥10 (or n≥15 for combined). Compute retention + avg tenure + avg MRR per category.
Pre-portfolio × retention KB's pre_*_groups field — classify into segments. Limited because of subscription filter — see Phase 6.
Upsell effect post_*_added_groups != ''
Co-churn For churned subs, count occurrences in post_*_lost_groups
Locality KB's locality field (Metro / Other, computed from city regex)

Output: .tmp_{ss,sm}_retention_metrics.json for the HTML build.


Phase 6 — Combined view

Scripts: 09-combined-analyze.cjs, 07-combined-pre-recompute.cjs, 08-combined-pre-retention.cjs.

Merge step

Build combined[] from union of saymore-deep-dive-kb.csv + snapsync-deep-dive-kb.csv keyed by subscriber_id. Per merged sub:

Combined field Logic
Identity (name, biz, category, city, county) Prefer SM if present (newer data), else SS
first_start min(first_sm_billing_start, first_ss_billing_start)
last_end max(last_sm_end, last_ss_end, last_sm_cancellation, last_ss_cancellation) — drop 2099 sentinels
alive_either alive_sm OR alive_ss
tenure_months From first_start to (NOW if alive_either else last_end) — spans BOTH products for migrators
first_mrr MRR of the EARLIEST product (SM if SM started first, else SS)
cohort_year Year of first_start (earliest product)
product_history One of 8 buckets — see below

Product-history buckets

Bucket Definition
ss_only_alive Has SS KB row, no SM KB row, alive_ss
ss_only_churned Has SS KB row, no SM KB row, not alive_ss
sm_only_alive Has SM KB row, no SS KB row, alive_sm
sm_only_churned Has SM KB row, no SS KB row, not alive_sm
migrated_ss_to_sm Has both KB rows, alive_sm, NOT alive_ss, AND ss_start < sm_start
had_both_now_ss_only Has both KB rows, NOT alive_sm, alive_ss
dual_active Has both KB rows, alive_sm AND alive_ss
had_both_both_churned Has both KB rows, neither alive

The zero-value undercount caveat

CRM active_clients shows 43 subs alive on both products. The combined analysis sees 38 (18 dual_active + 20 had_both_now_ss_only). The 5-sub gap is zero-value billing lines — typically the SayMore Ultimate rebrand layered onto an existing SnapSync line at €0 MRR. The KB SQL filters totalLineItemAmount > 0 (correct for MRR math) so these lines drop out.

Similarly: historical CRM shows ~172 ever-upsold from SS → SM. The combined view sees 88 + 20 + 18 = 126. The 46-sub gap is the same zero-value upgrade pattern.

To dovetail properly (separate exercise): rebuild the KB SQL to retain zero-value lines as marker rows linking SS MRR to the SM product label.

Pre-portfolio with Print (the big methodology correction)

The KB's pre_*_groups field is built from subscription-only lines, which silently excludes the entire DIR - PRINT book — 79,314 subs across CRM. For a "did this customer have a prior FCR relationship?" question, that's wrong.

07-combined-pre-recompute.cjs rebuilds pre-portfolio for every combined sub WITHOUT the subscription filter, using signed_date < first_listings_billing_start as the temporal anchor:

WITH listings_subs AS (
  -- union of SS + SM first-billing dates
),
all_lines AS (
  SELECT a.accountNumber, o.signedDate, ol.product.productGroup.name AS pg
  FROM Accounts_and_related_data a, UNNEST(a.orders) o, UNNEST(o.salesOrderDetails) ol
  WHERE pg IS NOT NULL AND pg NOT IN ('PMS - Saymore', 'PMS - Snapsync')
)
SELECT
  ls.subscriber_id,
  STRING_AGG(DISTINCT al.pg, '; ') AS all_pre_groups,
  LOGICAL_OR(al.pg = 'DIR - PRINT') AS had_print,
  LOGICAL_OR(al.pg LIKE 'DIR -%')   AS had_any_directory,
  LOGICAL_OR(al.pg = 'DMS - Websites') AS had_websites,
  LOGICAL_OR(al.pg = 'DMS - SEA') AS had_sea,
  LOGICAL_OR(al.pg = 'DMS - SEO') AS had_seo
FROM listings_subs ls
LEFT JOIN all_lines al ON al.subscriber_id = ls.subscriber_id
                       AND al.signed_date < ls.first_listings_start
GROUP BY ls.subscriber_id, ls.first_listings_start

Pre-portfolio segments (mutually exclusive, prioritised)

The segments are prioritised by digital depth, so a sub with both Websites AND Print is counted in "Had Websites" — gives the cleanest read on whether digital products lift retention beyond Print alone.

Segment Priority n Retention Tenure
Had SEA (± print/dir) 1 915 49% 41 mo
Had Websites (± print/dir, no SEA) 2 709 41% 35 mo
Had Print + GPI/PBI (no digital sub) 3 1,299 40% 56 mo
Had directory (GPI/PBI, no print) 4 175 28% 26 mo
No prior FCR products 5 990 27% 22 mo
Other (GSV, SEO alone) 6 23 4%

Tally bug to watch for

When tallying boolean flags from BQ JSON, CSV/JSON cells come back as the strings "true" / "false", not booleans. if (r.had_print) is truthy for the string "false". Always explicitly check r.had_print === true || r.had_print === "true". Caught this on the first pre-recompute run when all flag counts came back identical at 3,222.

Standalone vs digitally-bundled

The original KB's pre_*_groups + post_*_added_groups is subscription-only. Use that as-is for the standalone vs digitally-bundled cut (Section 4 of the combined HTML) — because the bundling question is specifically about active digital management, not Print. Print is excluded from this cut by design.

So the combined view has TWO separate cuts:

  • Section 3: Existing vs net-new uses the Phase 6 recompute (includes Print).
  • Section 4: Standalone vs digitally-bundled uses the KB's subscription-filtered fields (excludes Print).

Document the definitional difference on each slide so readers don't confuse them.

Combined rank merge

rankBySub map keyed by subscriber_id. For each row from SAYMORE_RANK_ANALYSIS_V2.csv and SNAPSYNC_RANK_ANALYSIS.csv:

  • Track smSubIds set from the SM rank CSV.
  • For each SS rank row: if sub is in smSubIds, skip (SM data preferred for overlap — newer test, more current). Otherwise append.

Bug fixed mid-analysis: the original merge logic only added the FIRST SS row per sub when the sub had no SM data — because the second-loop conditional checked for a missing _source flag that was never set. Result: ss_only_alive subs had only 1 row of rank data instead of 4-7. Visibility came out at 66% instead of the real 88%. Fix: use a smSubIds set as the dedup key instead of inspecting the first existing row.

Rank by product_history

Joining the merged rank data to the product_history bucket surfaces an important pattern:

Bucket n Visible Local Pack Invisible
sm_only_alive 645 84% 61% 16%
ss_only_alive 696 88% 67% 12%
migrated_ss_to_sm 88 94% 63% 6%
had_both_now_ss_only 20 100% 75% 0%
dual_active 18 89% 67% 11%

Migration cohort (88) has the highest visibility — they got proper management attention. The two big buckets (SM-only-alive and SS-only-alive) are where the long-tail intervention work lives.


Phase 7 — Output (HTML + PPT brief)

HTML structure (mirrors across all three reports):

  1. Header — title, subtitle (totals + date), TOC.
  2. Section 1 — Headline stat-grid (4 cells).
  3. Section 2 — Cohort split (varies by report — bridge cohort year for SM, carry-in vs new for SS, product-history buckets for combined).
  4. Sections 3-12 — Retention dimensions: existing-vs-new, standalone-vs-bundled, MRR tier, survival, tenure, categories, prior portfolio, upsell, co-churn, locality.
  5. Section 13 — Ranking headline + cohort split.
  6. Section 13a — Brand check (proof of digital life).
  7. Sections 14-17 — SitePro vs no-site, by-cohort-year, MRR × rank, vertical winners/losers.
  8. Section 18 — Named lists (4-6 cards).
  9. Section 19 — Strategic takeaways (3 panels: broken / works / discuss).
  10. Footer — methodology + source CSVs.

PPT briefs are declarative slide specs — each section in the HTML maps to one slide. The combined PPT brief has 19 slides + cover + methodology footer.


Reproducing the analysis end-to-end

Scripts live in scripts/snapsync-saymore/ (see that folder's README for the full table). Each defines ROOT = path.resolve(__dirname, "..", "..") so file paths resolve to the repo root regardless of cwd. Run in order:

# 1. KB build per product (~3 min each)
node scripts/snapsync-saymore/01-snapsync-kb-build.cjs      # → docs/snapsync-deep-dive-kb.csv
# SM KB built earlier via a .tmp_sm_kb_build script (body in git history; methodology in
# docs/saymore-deep-dive-kb.md). To rebuild, adapt 01 — swap PMS - Snapsync → PMS - Saymore,
# _ss → _sm column names, and re-enable the bridge-cohort fetch.

# 2. Keyword preview per product (~3 min each)
node scripts/snapsync-saymore/02a-saymore-keyword-preview.cjs   # → SAYMORE_RANK_KEYWORD_PREVIEW.csv
node scripts/snapsync-saymore/02b-snapsync-keyword-preview.cjs  # → SNAPSYNC_RANK_KEYWORD_PREVIEW.csv

# 3. SerpAPI run (clean kws + brand serp). ~25 min each, ~$20 each
node scripts/snapsync-saymore/03a-saymore-rank-run.cjs   # → SAYMORE_RANK_ANALYSIS_V2.csv
node scripts/snapsync-saymore/03b-snapsync-rank-run.cjs  # → SNAPSYNC_RANK_ANALYSIS.csv

# 4. Brand maps refix (raw biz name + knowledge-panel matcher). ~5 min each, ~$4 each
node scripts/snapsync-saymore/04a-saymore-brand-maps-refix.cjs
node scripts/snapsync-saymore/04b-snapsync-brand-maps-refix.cjs

# 5. Rank + retention metrics (<1 min each)
node scripts/snapsync-saymore/05a-saymore-rank-analyze.cjs   # → .tmp_sm_rank_metrics.json
node scripts/snapsync-saymore/05b-snapsync-rank-analyze.cjs  # → .tmp_ss_rank_metrics.json
node scripts/snapsync-saymore/06-snapsync-retention-analyze.cjs  # → .tmp_ss_retention_metrics.json

# 6. Combined view (~3 min)
node scripts/snapsync-saymore/07-combined-pre-recompute.cjs   # → .tmp_combined_pre_portfolio.json (Print-inclusive)
node scripts/snapsync-saymore/08-combined-pre-retention.cjs   # → .tmp_combined_pre_segments.json
node scripts/snapsync-saymore/09-combined-analyze.cjs         # → .tmp_combined_metrics.json

# 7. HTMLs are hand-built from the JSON metrics (no auto-generator).
# Edit SAYMORE_DEEP_DIVE.html / SNAPSYNC_DEEP_DIVE.html / SNAPSYNC_SAYMORE_DEEP_DIVE.html
# with the numbers from the metrics JSON files.

Total wall-clock: ~80-90 minutes for a fresh end-to-end run. SerpAPI cost: ~$45-50 across both products. The .tmp_*.json files are intermediate working artifacts written to the repo root.


Gotchas summary (the lessons earned)

  1. product.subscription = TRUE excludes Print. Right for MRR math, wrong for existing-vs-net-new. The original analysis claimed 86% net-new; with Print included, 76% were existing FCR clients. → feedback_kb_subscription_filter_excludes_print.md

  2. active_clients.subscriber_id is STRING; accountNumber is INT64. Always SAFE_CAST(... AS INT64) on the active_clients side before joining.

  3. product.subscription = TRUE exists on multiple PG names. Both PMS - Saymore (subscription=true) and PMS - Saymore (subscription=false) exist in CRM. The non-subscription rows are zero-value migration/rebrand lines. The KB filter drops them. For dovetail work they need a separate channel.

  4. SerpAPI location parameter requires canonical names. "Ballyfermot, Dublin, Ireland" returns HTTP 400. Use ll=@lat,lng,14z for maps; rely on gl=ie for country scope on organic.

  5. GMB_LOCATIONS schema: metadata.placeId (not top-level), latlng.latitude / latlng.longitude (nested struct).

  6. GoogleAccount_Subscr_MAP_GBP_native column is SUBSCR_ID_Match (not subscriber_id).

  7. Brand kw for an SME is the raw business name. Don't strip locations / service nouns / category — "Kilcock Radiators" IS the brand, not a service. Brand-cleaning logic is for value-test kws only. → feedback_sme_brand_is_bizname.md

  8. SerpAPI returns place_results (singular) for brand-specific maps queries, not local_results. The matcher needs to check place_results.place_id first. Without this fallback, brand-maps hit rate looks like 24% when reality is 96%.

  9. CSV cells come back as strings, not booleans. r.had_print === true || r.had_print === "true". Always.

  10. "near me" must be preserved as a unit in keyword cleaning. Stripping "near" alone leaves "me" stranded, then re-appending "near me" produces "service me near me".

  11. Worker dashboard-bq-execute default limit is 500. Pass limit: 20000 (or more) explicitly for big queries. Hitting the default silently truncates.

  12. CSV parser must handle quoted commas. am field is "Last, First" format — commas inside quotes break naive split-by-comma. Use a proper multiline-aware parser.

  13. Resume logic must filter out error rows. Otherwise transient SerpAPI errors get baked in as "permanent skip."

  14. Combined rank merge needs an explicit overlap-skip. Using "is the first existing row from source X?" as a dedup key fails because no such flag is set on the first append.

  15. Cohort year < 2019 for SnapSync is the order-management migration, not real signings. Bucket as "Pre-2019 (legacy carry-in)" rather than letting 998 + 1,656 dominate the cohort tables.

  16. Pre-portfolio classification needs signed_date as the temporal anchor, not firstInvoiceOn. Non-subscription products don't have recurring invoices.

  17. Pre-portfolio segments should be prioritised by digital depth. A sub with both Websites and Print should be counted in "Had Websites" — the digital product is the stronger anchor.


Future improvements / known gaps

Gap What it would unlock
Zero-value migration lines in KB Proper dovetail count (43 dual-active not 38, ~172 migrations not 126). Needs a KB SQL variant that retains zero-MRR lines as marker rows linking SS MRR to SM product label.
Bridge cohort for SnapSync Match-quality reconciliation between cohort_year and bridge cohort year. Endpoint currently returns zero rows for SS.
Promote .tmp_*.cjs scripts to permanent Survive cleanup; let CI re-run quarterly. Move under scripts/snapsync-saymore-* with a README.
Auto-generated HTML Currently HTMLs are hand-edited from the JSON metrics. A small templater would let the HTMLs regenerate from a single source of truth.
Vertical sample sizes < 5 The "categories" cut requires n≥10 (standalone) or n≥15 (combined) to avoid noise. Higher-n thresholds drop interesting niche verticals. Worth a Bayesian shrinkage estimator for small-n categories.
Print sunset timeline modelling The 1,299 "Print + GPI/PBI" subs (32% of book) are aging out as Print declines. Project the digital-anchor migration runway.

Memory references

  • feedback_kb_subscription_filter_excludes_print.md — the Phase 6 subscription-filter gotcha
  • feedback_sme_brand_is_bizname.md — Phase 4 raw-name brand-check rule
  • feedback_saymore_snapsync.md — pre-existing memory that SnapSync and SayMore are the same product line at different labels
  • reference_env_local_layout.md — .env.local key format
  • reference_sitepro_listing_map.md — sitepro_listing_map table schema
  • reference_dim_listings_with_subscriber_id.md — identity bridge table
  • feedback_lrc_keyword_location_model.md — bare-service-term rule for LRC keywords (related to Phase 2 cleaning)

FCR Dashboard documentation · generated from docs/ · keep counts verified, not guessed.

Ask the docsRAG over this site
Ask anything about the FCR Dashboard platform — architecture, BigQuery, the worker routes, billing rules, the LRC stack, scoring… Answers are grounded in this documentation, with source links.
How does the deal-brief refresh work? Which routes are Worker vs n8n? How is account health scored?