SnapSync / SayMore Deep Dive — Process & Methodology

End-to-end process notes for the SnapSync + SayMore + combined product analyses run 2026-05-18 → 2026-05-19. Captures data sources, SQL patterns, keyword-cleaning rules, SerpAPI run pipeline, combined-view merge logic, and the gotchas that took multiple iterations to find.

This doc is the canonical reference for re-running the analysis or extending it. The findings live in the three HTML reports — see Outputs below.

What this analysis produces

Three deep-dive reports treating SnapSync (2017-2018 onboarding) and SayMore (2019+ rebrand) as the same SMB-listings product family:

SayMore standalone — SAYMORE_DEEP_DIVE.html — 1,648 subs, 2018-2026 YTD.
SnapSync standalone — SNAPSYNC_DEEP_DIVE.html — 2,637 subs, 2017-2026 YTD.
Combined — SNAPSYNC_SAYMORE_DEEP_DIVE.html — 4,111 unique subs across both. This is the canonical view for general audiences. The standalones are useful when an audience specifically asks about one product.

Each report has two halves:

Part 1: Retention — cohort year, MRR tier, survival curve, tenure bands, categories, prior portfolio, upsell effect, co-churn, locality.
Part 2: Ranking — visibility, organic page 1, Local Pack, brand check (proof of digital life), per-cohort breakdowns, named lists (invisible / GBP-fix / upsell-target / proof).

PPT-input briefs for the deck builders:

SAYMORE_DEEP_DIVE_PPT_UPDATE_BRIEF.md (declarative slide spec)
SNAPSYNC_SAYMORE_PPT_UPDATE_BRIEF.md (combined deck, 19 slides)

Data sources

BigQuery tables

Table	Used for
`DYNAMICS_DATA.Accounts_and_related_data`	Authoritative orders + sales-order-details + billing-groups + recurring-invoices — the raw source for first-billing dates, MRR, cancellation dates. The unified KB SQL unnests `orders → salesOrderDetails → billingGroup → recurringInvoices`.
`fcr_operations.active_clients`	Daily snapshot of currently-billing subscribers per product group. Authoritative for "alive today" via `MAX(snapshot_date)` + `product_group = 'PMS - Saymore'` / `'PMS - Snapsync'`.
`SUPERSET_MASTER.dim_listings_with_subscriber_id`	Subscriber identity: AccountName (CRM), bizname (GP listing), main_category_name. Daily-refreshed bridge between CRM `subscriber_id` and GP `listing_id`.
`fcr_operations.sitepro_listing_map`	SitePro URL by subscriber_id. Built from MegaDoc cache via n8n workflow `LVzHQQ7xMQ3moLwO` (currently unpublished — manual refresh). Used to identify the "has FCR-built website" cohort.
`fcr_operations.GBP_INFORMATION`	Scraped GBP data fallback (place_id, lat, lng, website). Used when the GA-native mapping table doesn't cover a sub.
`GOOGLE_ACCOUNT_DATA.GoogleAccount_Subscr_MAP_GBP_native`	Canonical GBP location-id ↔ subscriber mapping. Authoritative when present. Columns: `locationId`, `SUBSCR_ID_Match`, `title`.
`GOOGLE_ACCOUNT_DATA.GMB_LOCATIONS`	Full GBP location data per locationId — `metadata.placeId`, `latlng.{latitude,longitude}`, `websiteUri`, `categories`, etc.
`fcr_operations.KEYWORD_INTELLIGENCE`	GSC search-term data per subscriber. Sources: `gsc_organic` (SitePro GSC) + `gsc_goldenpages` (GP listing GSC). Used to pull each sub's top keywords by impressions.

Worker API

All BQ calls go through the worker endpoint /dashboard-bq-execute with x-api-key header. Pattern:

curl -s -X POST "https://fcr-dashboard-api.fcrmedia.workers.dev/dashboard-bq-execute" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $API_KEY" \
  -d '{"sql": "...", "limit": 20000}'

Default limit is 500. For any query that might return more than 500 rows, set limit explicitly. Hitting the default 500 silently truncates the result.

For SerpAPI we hit https://serpapi.com/search.json directly with the SERPAPI_KEY from .env.local. The worker has SERPAPI integration but for batch runs it's faster to bypass.

Local secrets

.env.local keys used:

x-api-key (note: leading space, lowercase, hyphens) — worker API key
SERPAPI_KEY — SerpAPI direct access

See memory reference_env_local_layout.md for the full layout.

Output artifacts

File	What it is
`docs/saymore-deep-dive-kb.csv`	Per-sub SayMore history. 1,648 rows, 25 columns.
`docs/saymore-deep-dive-kb.md`	Methodology + unified SQL (predates this doc but still valid).
`docs/snapsync-deep-dive-kb.csv`	Per-sub SnapSync history. 2,637 rows, 25 columns. Same schema as SayMore (just `_sm` → `_ss`).
`SAYMORE_RANK_KEYWORD_PREVIEW.csv`	Pre-SerpAPI keyword plan for SayMore. 849 rows with raw kws + cleaned variants + drop reasons.
`SAYMORE_RANK_ANALYSIS_V2.csv`	Per-call SerpAPI results for SayMore alive subs. Long format, ~4,100 rows.
`SNAPSYNC_RANK_KEYWORD_PREVIEW.csv`	Same shape for SnapSync.
`SNAPSYNC_RANK_ANALYSIS.csv`	Same shape for SnapSync. ~4,500 rows.
`SAYMORE_DEEP_DIVE.html`	SayMore-only report.
`SNAPSYNC_DEEP_DIVE.html`	SnapSync-only report.
`SNAPSYNC_SAYMORE_DEEP_DIVE.html`	Combined report (canonical).
`SAYMORE_DEEP_DIVE_PPT_UPDATE_BRIEF.md`	Declarative slide spec for SayMore deck.
`SNAPSYNC_SAYMORE_PPT_UPDATE_BRIEF.md`	Declarative slide spec for combined deck.
`.tmp_combined_metrics.json`	Combined retention + rank metrics (working file).
`.tmp_combined_pre_portfolio.json`	Per-sub prior-portfolio enrichment INCLUDING Print (working file).
`.tmp_combined_pre_segments.json`	Pre-portfolio retention breakdown (working file).

Phase 1 — KB build (per product)

For each product (SnapSync, SayMore), build a per-subscriber history CSV with 25 columns covering identity, first line, lifecycle, portfolio context.

Script: scripts/snapsync-saymore/01-snapsync-kb-build.cjs (template — swap TARGET_GROUP = 'PMS - Snapsync' ↔ 'PMS - Saymore').

SQL pattern (full version in docs/saymore-deep-dive-kb.md):

WITH all_lines AS (
  -- Pulls every subscription/recurring line for every subscriber
  -- with positive MRR (totalLineItemAmount > 0, invoicePeriod > 0).
  SELECT ... FROM Accounts_and_related_data,
    UNNEST(orders), UNNEST(salesOrderDetails),
    UNNEST([billingGroup]), UNNEST(recurringInvoices)
  WHERE product.subscription = TRUE  -- ⚠️ See gotcha below
    AND ri.totalLineItemAmount > 0
    AND ri.invoicePeriod > 0
),
lines_mrr AS (
  -- Proportional MRR split across BG line items
  SELECT lf.*, ri_total * (base_amount / total_base) / ri_period AS monthly_value
  FROM lines_filtered JOIN bg_totals
),
ss_subs AS (
  SELECT subscriber_id, MIN(billing_start) first_ss_start
  FROM lines_mrr WHERE pg = 'PMS - Snapsync' GROUP BY subscriber_id
),
first_ss_line AS (...),   -- Per-sub first line details
all_ss        AS (...),   -- Per-sub aggregates (last_end, last_cancellation)
pre_ss_groups AS (...),   -- Distinct product_groups billing BEFORE first SS line
post_ss_added AS (...),   -- Product groups whose FIRST EVER billing postdates first SS
post_ss_lost  AS (...),   -- Product groups now ended (no open-ended line)
live_ss       AS (...),   -- subscriber_ids alive in active_clients today
biz           AS (...)    -- Identity from dim_listings_with_subscriber_id
SELECT ... FROM ss_subs LEFT JOIN ... 25 columns

Critical gotcha — subscription filter

The product.subscription = TRUE filter is right for MRR / tenure math but wrong for pre/post portfolio classification. It excludes:

Product group	Distinct subs excluded
DIR - PRINT	79,314 (the entire legacy print directory book)
DMS - Websites (non-sub)	4,209
DIR - GPI (non-sub)	2,702
DIR - PBI (non-sub)	2,052
DMS - SEA (non-sub)	1,335
PMS - GSV	1,096
PMS - Saymore (non-sub — migration/rebrand lines)	2,945

Impact on the SS/SM combined deep-dive: the subscription-only filter would say 86% of listings sales were "net-new"; with Print included, 76% were INTO existing FCR clients. The original SnapSync 2017-2018 motion was a print-book penetration play, not an acquisition campaign.

Fix pattern when classifying pre/post portfolio: drop the subscription filter, use signed_date as the temporal anchor (because non-subscription products don't have recurring-invoice firstInvoiceOn dates). See Phase 6 / 07-combined-pre-recompute.cjs.

Saved in memory as feedback_kb_subscription_filter_excludes_print.md.

Bridge cohort mapping (SayMore only)

The SayMore KB augments cohort_year with bridge_cohort_year + bridge_class from per-year calls to /dashboard-revenue-bridge?productGroup=PMS - Saymore&fromDate=...&toDate=...&includeDetail=true. Buckets a sub into new / renewed / returned based on the revenue-bridge dashboard logic.

The SnapSync KB skips this step because the bridge endpoint returns zero detail for PMS - Snapsync. SnapSync uses naive cohort_year (first-billing year) only. Acceptable because SnapSync onboarding ended 2018; very few renewal cycles to classify.

Type-mismatch gotcha

active_clients.subscriber_id is STRING. Accounts_and_related_data.accountNumber is INT64. The live_ss / live_sm join needs SAFE_CAST(subscriber_id AS INT64) on the active_clients side or the join silently produces zero matches and currently_alive_* comes out false for every row. Caught this on the first SnapSync KB build run.

Phase 2 — Keyword preview (per product)

For each alive subscriber, pick 1 brand keyword + 3 non-branded service keywords for the SerpAPI test.

Script: 02a-saymore-keyword-preview.cjs / 02b-snapsync-keyword-preview.cjs.

Cohort: alive subs only

Query active_clients for the latest snapshot, filter product_group = 'PMS - Saymore' (or Snapsync). Returns ~822 SayMore / ~844 SnapSync subs. Join to:

dim_listings_with_subscriber_id — biz name + category
Accounts_and_related_data — city + county
sitepro_listing_map — sitepro_url (for website_type)
GBP_INFORMATION — gbp_website (fallback)

website_type classification:

'sitepro' if sitepro_url present
'external' if gbp_website present (and no sitepro)
'none' otherwise

Keyword source: KEYWORD_INTELLIGENCE

Pull top 20 keywords per sub by impressions, deduped (SUM(impressions) GROUP BY subscriber_id, keyword). Limit ROW_NUMBER() <= 20 per sub. Chunk the query in batches of 80 sub_ids to avoid hitting the worker's row limit.

Brand / clean classification rules

For each candidate keyword:

Brand kw classification is done at preview time but the preview's brand_kw_maps field is NOT used in production. See Phase 4 — brand maps reruns with the raw account name.
Clean kw rule (used by kw1_maps, kw2_maps, kw3_maps):
- Tokenize keyword (lowercase, strip punctuation, drop tokens <2 chars).
- Strip from biz tokens (before classification): BIZ_STOPWORDS (ltd, the, &, co, ...), SERVICE_NOUNS (roofing, plumbing, cleaning, hair, boiler, ...), COMMON_LOCATION_TOKENS (dublin, cork, ballyfermot, kilcock, ...), and main_category tokens. The leftover is the "brand discriminator" set.
- If every non-stopword kw token is in the brand-discriminator set → brand kw.
- If kw has 0 brand tokens → clean kw.
- If kw has SOME but not all brand tokens (mixed brand+service) → drop as contaminated.
maps_kw build (for clean kws): strip city/county/location tokens + BIZ_STOPWORDS. Preserve "near me" as a unit (key gotcha — see below).
serp_kw build: keep original kw, append city if absent. Matches Digital Footprint's "keyword + location" behaviour.

Per-sub picker

After classification, pick up to 3 clean kws by impressions descending, deduped by maps_kw (two raw kws normalising to the same maps query → keep highest-impression one). Status buckets:

Status	Meaning
`ready`	3 clean kws + 1 brand kw available
`ready_no_brand`	3 clean kws, no proper brand discriminator (biz name was all generic tokens like "Galway Landscaping")
`brand_only`	<3 clean kws but brand kw exists — runs brand only
`insufficient_clean_kws`	<3 clean kws and no brand — skipped
`no_ki_data`	Sub has zero KI rows — skipped

"near me" gotcha

Original cleaner stripped "near" as a location token, leaving "me" stranded, then re-appended "near me" → producing "chiropodist me near me". Fix: treat "near me" as a unit. Strip the pair from working tokens before location-stripping, then re-append if the original kw had it.

Brand-token over-stripping gotcha

Original brand classification used full biz name tokens as the brand set without subtracting service nouns / location / category tokens. Result: subs in service-named verticals (e.g. "Premium Roofing", where "roofing" is a service noun) had their brand set polluted, and every kw containing "roofing" got classified as mixed_brand and dropped. Fix: strip service nouns + category tokens + location tokens from the biz set before computing the brand-token set.

Phase 3 — SerpAPI run (clean kws + brand on SERP)

Script: 03a-saymore-rank-run.cjs / 03b-snapsync-rank-run.cjs.

For each runnable sub (status ∈ ready / ready_no_brand / brand_only):

3 clean kws × google_maps engine = maps queries
3 clean kws × google engine (only if sub has website) = organic queries
1 brand kw × google engine (only if sub has website) = brand organic

Brand maps queries are skipped here — they need different matching logic (Phase 4).

SerpAPI parameters

{
  engine: "google_maps" | "google",
  device: "mobile",
  gl: "ie",
  hl: "en",
  google_domain: "google.ie",
  q: <query>,
  // For maps with lat/lng:
  ll: "@<lat>,<lng>,14z",
  // For maps without lat/lng: omit ll, fall back to gl=ie country scope.
}

Place_id + lat/lng enrichment

Per runnable sub, pull place_id + lat + lng from the GA-native mapping with GBP_INFORMATION fallback:

WITH map AS (
  SELECT SAFE_CAST(SUBSCR_ID_Match AS INT64) AS subscriber_id, ANY_VALUE(locationId) AS locationId
  FROM GoogleAccount_Subscr_MAP_GBP_native WHERE SUBSCR_ID_Match IS NOT NULL GROUP BY SUBSCR_ID_Match
),
loc AS (
  SELECT locationId,
         ANY_VALUE(metadata.placeId) AS place_id,
         ANY_VALUE(latlng.latitude) AS lat,
         ANY_VALUE(latlng.longitude) AS lng
  FROM GMB_LOCATIONS GROUP BY locationId
),
native AS (SELECT m.subscriber_id, l.place_id, l.lat, l.lng FROM map m LEFT JOIN loc l USING(locationId) WHERE m.subscriber_id IN (...)),
fallback AS (SELECT ... FROM GBP_INFORMATION WHERE subscriber_id IN (...) GROUP BY subscriber_id)
SELECT COALESCE(n.subscriber_id, f.subscriber_id) AS subscriber_id,
       COALESCE(n.place_id, f.place_id), COALESCE(n.lat, f.lat), COALESCE(n.lng, f.lng)
FROM native FULL OUTER JOIN fallback ON ...

GMB_LOCATIONS columns to know:

metadata.placeId — the Google Place ID (NOT a top-level place_id field)
latlng.latitude / latlng.longitude — lat/lng nested struct
locationId — the GBP location ID (different from place_id)

GBP_INFORMATION columns:

place_id (STRING, top-level)
lat (FLOAT64), lng (FLOAT64)

`location` parameter trap

SerpAPI's location parameter requires a canonical name from their gazetteer. "Ballyfermot, Dublin, Ireland" is rejected with HTTP 400. Don't use location for maps/organic queries. Rely on gl=ie + google_domain=google.ie for country scope. For maps, use ll=@lat,lng,14z when you have coordinates; without coordinates, accept that the local pack is "anywhere in Ireland" rather than city-specific.

Match logic

Organic match: by hostname. Strip www., compare against sitepro_url host (if website_type=sitepro) or gbp_website host (if external). Match if equal or one ends with .{other}.
Maps match (clean kws): by place_id against local_results[].place_id. Fallback: 60% token-overlap match on local_results[].title vs biz name.

Concurrency

5 SerpAPI calls in flight via a manual worker-pool pattern. Retries 429 / 5xx with exponential backoff (max 2 retries). Append each result to the output CSV as it lands — gives resume-on-restart via the existing-rows check at startup.

Resume logic

On script restart, parse the existing output CSV. Build a done = Set('<subscriber_id>|<kw_label>|<surface>'). Skip any call already in done. This means a partial / crashed run can be resumed by just re-running.

Gotcha: rows with errors (error != '') need to be filtered OUT of the done-set, otherwise transient SerpAPI errors get baked in as "skip this forever". Scrub error rows before resuming.

Phase 4 — Brand maps refix (raw biz name + knowledge-panel matcher)

Script: 04a-saymore-brand-maps-refix.cjs / 04b-snapsync-brand-maps-refix.cjs.

Brand-on-maps needs a separate run with different logic because:

Brand kw = raw account name. For an SME, the brand IS the business name (buildBrandKw(account_name): lowercase, strip first comma onward, strip legal suffixes ltd|limited|the|and|&|co|company|inc|..., collapse whitespace). Do NOT strip locations/service nouns from biz names — "Kilcock Radiators" IS the brand, not "radiators" with "Kilcock" stripped as a location.

Knowledge-panel matcher. For brand-specific queries, Google often doesn't return a local_results array (which would be the 3-pack). Instead it returns a singular place_results knowledge panel describing the specific business. Default SerpAPI matcher checking local_results only would report 76% "invisible" when reality is 96% visible — the data was there, just in a different field.

Matcher priority:

if (json.place_results) {
  // Knowledge panel — Google's response for brand searches
  if (place_results.place_id === expected_place_id) → MATCH (rank 1, "knowledge_panel_place_id")
  else if (name_overlap >= 0.6) → MATCH (rank 1, "knowledge_panel_name")
  else → no match
} else if (json.knowledge_graph?.title) {
  // Knowledge graph fallback
  ...
} else {
  // local_results scan (place_id then fuzzy name)
  ...
}

Brand+organic queries use the standard hostname matcher from Phase 3 — no special logic needed.

Process

Delete existing brand+maps rows from the rank CSV (so we don't double-count).
For each runnable sub with a buildBrandKw(...) result, run google_maps engine with the raw brand kw + lat/lng.
Apply the knowledge-panel-aware matcher.
Append new rows to the rank CSV.

Saved as memory feedback_sme_brand_is_bizname.md.

Phase 5 — Retention metrics (per product, from KB CSV)

Scripts: 06-snapsync-retention-analyze.cjs (SM retention metrics are computed inline in 05a-saymore-rank-analyze.cjs).

Reads docs/{ss,sm}-deep-dive-kb.csv, computes:

Metric	How
Total / alive / retention	`currently_alive_ss === 'true'` (CSV string) — NOT `=== true` (boolean)
Median / avg tenure	`monthsBetween(first__billing_start, NOW if alive else last__end)`, capping `2099` sentinel dates at NOW
Cohort year	Naive: year of first billing. For SnapSync, bucket `cohort_year < 2019` as "Pre-2019 (legacy carry-in)" because the 2017-18 spike is the order-management migration registering existing print clients, not real signings.
Existing vs new	`pre__groups != ''` (from KB) — but this misses Print, see Phase 6 for the corrected combined cut*
Standalone / cross-sell / anchor / fully-bundled	Boolean cross of `had_pre` + `had_post`
MRR tier	<€30 / €30-49 / €50-99 / €100+ on `first_*_mrr`
Survival curve Y+0 → Y+8	For each cohort year y, % of subs still alive at end of (y + offset). Counts a sub as "alive at EOY" if `currently_alive` OR `last_end > EOY`
Tenure bands	0-6 mo, 6-12, 12-18, 18-24, 24-36, 3-4yr, 4-5yr, 5-6yr, 6-7yr, 7-8yr, 8yr+
Category × retention	Group by `main_category` n≥10 (or n≥15 for combined). Compute retention + avg tenure + avg MRR per category.
Pre-portfolio × retention	KB's `pre__groups` field — classify into segments. Limited because of subscription filter — see Phase 6.*
Upsell effect	`post_*_added_groups != ''`
Co-churn	For churned subs, count occurrences in `post_*_lost_groups`
Locality	KB's `locality` field (Metro / Other, computed from city regex)

Output: .tmp_{ss,sm}_retention_metrics.json for the HTML build.

Phase 6 — Combined view

Scripts: 09-combined-analyze.cjs, 07-combined-pre-recompute.cjs, 08-combined-pre-retention.cjs.

Merge step

Build combined[] from union of saymore-deep-dive-kb.csv + snapsync-deep-dive-kb.csv keyed by subscriber_id. Per merged sub:

Combined field	Logic
Identity (name, biz, category, city, county)	Prefer SM if present (newer data), else SS
`first_start`	`min(first_sm_billing_start, first_ss_billing_start)`
`last_end`	`max(last_sm_end, last_ss_end, last_sm_cancellation, last_ss_cancellation)` — drop 2099 sentinels
`alive_either`	`alive_sm OR alive_ss`
`tenure_months`	From `first_start` to (NOW if alive_either else last_end) — spans BOTH products for migrators
`first_mrr`	MRR of the EARLIEST product (SM if SM started first, else SS)
`cohort_year`	Year of `first_start` (earliest product)
`product_history`	One of 8 buckets — see below

Product-history buckets

Bucket	Definition
`ss_only_alive`	Has SS KB row, no SM KB row, alive_ss
`ss_only_churned`	Has SS KB row, no SM KB row, not alive_ss
`sm_only_alive`	Has SM KB row, no SS KB row, alive_sm
`sm_only_churned`	Has SM KB row, no SS KB row, not alive_sm
`migrated_ss_to_sm`	Has both KB rows, alive_sm, NOT alive_ss, AND ss_start < sm_start
`had_both_now_ss_only`	Has both KB rows, NOT alive_sm, alive_ss
`dual_active`	Has both KB rows, alive_sm AND alive_ss
`had_both_both_churned`	Has both KB rows, neither alive

The zero-value undercount caveat

CRM active_clients shows 43 subs alive on both products. The combined analysis sees 38 (18 dual_active + 20 had_both_now_ss_only). The 5-sub gap is zero-value billing lines — typically the SayMore Ultimate rebrand layered onto an existing SnapSync line at €0 MRR. The KB SQL filters totalLineItemAmount > 0 (correct for MRR math) so these lines drop out.

Similarly: historical CRM shows ~172 ever-upsold from SS → SM. The combined view sees 88 + 20 + 18 = 126. The 46-sub gap is the same zero-value upgrade pattern.

To dovetail properly (separate exercise): rebuild the KB SQL to retain zero-value lines as marker rows linking SS MRR to the SM product label.

Pre-portfolio with Print (the big methodology correction)

The KB's pre_*_groups field is built from subscription-only lines, which silently excludes the entire DIR - PRINT book — 79,314 subs across CRM. For a "did this customer have a prior FCR relationship?" question, that's wrong.

07-combined-pre-recompute.cjs rebuilds pre-portfolio for every combined sub WITHOUT the subscription filter, using signed_date < first_listings_billing_start as the temporal anchor:

WITH listings_subs AS (
  -- union of SS + SM first-billing dates
),
all_lines AS (
  SELECT a.accountNumber, o.signedDate, ol.product.productGroup.name AS pg
  FROM Accounts_and_related_data a, UNNEST(a.orders) o, UNNEST(o.salesOrderDetails) ol
  WHERE pg IS NOT NULL AND pg NOT IN ('PMS - Saymore', 'PMS - Snapsync')
)
SELECT
  ls.subscriber_id,
  STRING_AGG(DISTINCT al.pg, '; ') AS all_pre_groups,
  LOGICAL_OR(al.pg = 'DIR - PRINT') AS had_print,
  LOGICAL_OR(al.pg LIKE 'DIR -%')   AS had_any_directory,
  LOGICAL_OR(al.pg = 'DMS - Websites') AS had_websites,
  LOGICAL_OR(al.pg = 'DMS - SEA') AS had_sea,
  LOGICAL_OR(al.pg = 'DMS - SEO') AS had_seo
FROM listings_subs ls
LEFT JOIN all_lines al ON al.subscriber_id = ls.subscriber_id
                       AND al.signed_date < ls.first_listings_start
GROUP BY ls.subscriber_id, ls.first_listings_start

Pre-portfolio segments (mutually exclusive, prioritised)

The segments are prioritised by digital depth, so a sub with both Websites AND Print is counted in "Had Websites" — gives the cleanest read on whether digital products lift retention beyond Print alone.

Segment	Priority	n	Retention	Tenure
Had SEA (± print/dir)	1	915	49%	41 mo
Had Websites (± print/dir, no SEA)	2	709	41%	35 mo
Had Print + GPI/PBI (no digital sub)	3	1,299	40%	56 mo
Had directory (GPI/PBI, no print)	4	175	28%	26 mo
No prior FCR products	5	990	27%	22 mo
Other (GSV, SEO alone)	6	23	4%	—

Tally bug to watch for

When tallying boolean flags from BQ JSON, CSV/JSON cells come back as the strings "true" / "false", not booleans. if (r.had_print) is truthy for the string "false". Always explicitly check r.had_print === true || r.had_print === "true". Caught this on the first pre-recompute run when all flag counts came back identical at 3,222.

Standalone vs digitally-bundled

The original KB's pre_*_groups + post_*_added_groups is subscription-only. Use that as-is for the standalone vs digitally-bundled cut (Section 4 of the combined HTML) — because the bundling question is specifically about active digital management, not Print. Print is excluded from this cut by design.

So the combined view has TWO separate cuts:

Section 3: Existing vs net-new uses the Phase 6 recompute (includes Print).
Section 4: Standalone vs digitally-bundled uses the KB's subscription-filtered fields (excludes Print).

Document the definitional difference on each slide so readers don't confuse them.

Combined rank merge

rankBySub map keyed by subscriber_id. For each row from SAYMORE_RANK_ANALYSIS_V2.csv and SNAPSYNC_RANK_ANALYSIS.csv:

Track smSubIds set from the SM rank CSV.
For each SS rank row: if sub is in smSubIds, skip (SM data preferred for overlap — newer test, more current). Otherwise append.

Bug fixed mid-analysis: the original merge logic only added the FIRST SS row per sub when the sub had no SM data — because the second-loop conditional checked for a missing _source flag that was never set. Result: ss_only_alive subs had only 1 row of rank data instead of 4-7. Visibility came out at 66% instead of the real 88%. Fix: use a smSubIds set as the dedup key instead of inspecting the first existing row.

Rank by product_history

Joining the merged rank data to the product_history bucket surfaces an important pattern:

Bucket	n	Visible	Local Pack	Invisible
sm_only_alive	645	84%	61%	16%
ss_only_alive	696	88%	67%	12%
migrated_ss_to_sm	88	94%	63%	6%
had_both_now_ss_only	20	100%	75%	0%
dual_active	18	89%	67%	11%

Migration cohort (88) has the highest visibility — they got proper management attention. The two big buckets (SM-only-alive and SS-only-alive) are where the long-tail intervention work lives.

Phase 7 — Output (HTML + PPT brief)

HTML structure (mirrors across all three reports):

Header — title, subtitle (totals + date), TOC.
Section 1 — Headline stat-grid (4 cells).
Section 2 — Cohort split (varies by report — bridge cohort year for SM, carry-in vs new for SS, product-history buckets for combined).
Sections 3-12 — Retention dimensions: existing-vs-new, standalone-vs-bundled, MRR tier, survival, tenure, categories, prior portfolio, upsell, co-churn, locality.
Section 13 — Ranking headline + cohort split.
Section 13a — Brand check (proof of digital life).
Sections 14-17 — SitePro vs no-site, by-cohort-year, MRR × rank, vertical winners/losers.
Section 18 — Named lists (4-6 cards).
Section 19 — Strategic takeaways (3 panels: broken / works / discuss).
Footer — methodology + source CSVs.

PPT briefs are declarative slide specs — each section in the HTML maps to one slide. The combined PPT brief has 19 slides + cover + methodology footer.

Reproducing the analysis end-to-end

Scripts live in scripts/snapsync-saymore/ (see that folder's README for the full table). Each defines ROOT = path.resolve(__dirname, "..", "..") so file paths resolve to the repo root regardless of cwd. Run in order:

# 1. KB build per product (~3 min each)
node scripts/snapsync-saymore/01-snapsync-kb-build.cjs      # → docs/snapsync-deep-dive-kb.csv
# SM KB built earlier via a .tmp_sm_kb_build script (body in git history; methodology in
# docs/saymore-deep-dive-kb.md). To rebuild, adapt 01 — swap PMS - Snapsync → PMS - Saymore,
# _ss → _sm column names, and re-enable the bridge-cohort fetch.

# 2. Keyword preview per product (~3 min each)
node scripts/snapsync-saymore/02a-saymore-keyword-preview.cjs   # → SAYMORE_RANK_KEYWORD_PREVIEW.csv
node scripts/snapsync-saymore/02b-snapsync-keyword-preview.cjs  # → SNAPSYNC_RANK_KEYWORD_PREVIEW.csv

# 3. SerpAPI run (clean kws + brand serp). ~25 min each, ~$20 each
node scripts/snapsync-saymore/03a-saymore-rank-run.cjs   # → SAYMORE_RANK_ANALYSIS_V2.csv
node scripts/snapsync-saymore/03b-snapsync-rank-run.cjs  # → SNAPSYNC_RANK_ANALYSIS.csv

# 4. Brand maps refix (raw biz name + knowledge-panel matcher). ~5 min each, ~$4 each
node scripts/snapsync-saymore/04a-saymore-brand-maps-refix.cjs
node scripts/snapsync-saymore/04b-snapsync-brand-maps-refix.cjs

# 5. Rank + retention metrics (<1 min each)
node scripts/snapsync-saymore/05a-saymore-rank-analyze.cjs   # → .tmp_sm_rank_metrics.json
node scripts/snapsync-saymore/05b-snapsync-rank-analyze.cjs  # → .tmp_ss_rank_metrics.json
node scripts/snapsync-saymore/06-snapsync-retention-analyze.cjs  # → .tmp_ss_retention_metrics.json

# 6. Combined view (~3 min)
node scripts/snapsync-saymore/07-combined-pre-recompute.cjs   # → .tmp_combined_pre_portfolio.json (Print-inclusive)
node scripts/snapsync-saymore/08-combined-pre-retention.cjs   # → .tmp_combined_pre_segments.json
node scripts/snapsync-saymore/09-combined-analyze.cjs         # → .tmp_combined_metrics.json

# 7. HTMLs are hand-built from the JSON metrics (no auto-generator).
# Edit SAYMORE_DEEP_DIVE.html / SNAPSYNC_DEEP_DIVE.html / SNAPSYNC_SAYMORE_DEEP_DIVE.html
# with the numbers from the metrics JSON files.

Total wall-clock: ~80-90 minutes for a fresh end-to-end run. SerpAPI cost: ~$45-50 across both products. The .tmp_*.json files are intermediate working artifacts written to the repo root.

Gotchas summary (the lessons earned)

product.subscription = TRUE excludes Print. Right for MRR math, wrong for existing-vs-net-new. The original analysis claimed 86% net-new; with Print included, 76% were existing FCR clients. → feedback_kb_subscription_filter_excludes_print.md
active_clients.subscriber_id is STRING; accountNumber is INT64. Always SAFE_CAST(... AS INT64) on the active_clients side before joining.
product.subscription = TRUE exists on multiple PG names. Both PMS - Saymore (subscription=true) and PMS - Saymore (subscription=false) exist in CRM. The non-subscription rows are zero-value migration/rebrand lines. The KB filter drops them. For dovetail work they need a separate channel.
SerpAPI location parameter requires canonical names. "Ballyfermot, Dublin, Ireland" returns HTTP 400. Use ll=@lat,lng,14z for maps; rely on gl=ie for country scope on organic.
GMB_LOCATIONS schema: metadata.placeId (not top-level), latlng.latitude / latlng.longitude (nested struct).
GoogleAccount_Subscr_MAP_GBP_native column is SUBSCR_ID_Match (not subscriber_id).
Brand kw for an SME is the raw business name. Don't strip locations / service nouns / category — "Kilcock Radiators" IS the brand, not a service. Brand-cleaning logic is for value-test kws only. → feedback_sme_brand_is_bizname.md
SerpAPI returns place_results (singular) for brand-specific maps queries, not local_results. The matcher needs to check place_results.place_id first. Without this fallback, brand-maps hit rate looks like 24% when reality is 96%.
CSV cells come back as strings, not booleans. r.had_print === true || r.had_print === "true". Always.
"near me" must be preserved as a unit in keyword cleaning. Stripping "near" alone leaves "me" stranded, then re-appending "near me" produces "service me near me".
Worker dashboard-bq-execute default limit is 500. Pass limit: 20000 (or more) explicitly for big queries. Hitting the default silently truncates.
CSV parser must handle quoted commas. am field is "Last, First" format — commas inside quotes break naive split-by-comma. Use a proper multiline-aware parser.
Resume logic must filter out error rows. Otherwise transient SerpAPI errors get baked in as "permanent skip."
Combined rank merge needs an explicit overlap-skip. Using "is the first existing row from source X?" as a dedup key fails because no such flag is set on the first append.
Cohort year < 2019 for SnapSync is the order-management migration, not real signings. Bucket as "Pre-2019 (legacy carry-in)" rather than letting 998 + 1,656 dominate the cohort tables.
Pre-portfolio classification needs signed_date as the temporal anchor, not firstInvoiceOn. Non-subscription products don't have recurring invoices.
Pre-portfolio segments should be prioritised by digital depth. A sub with both Websites and Print should be counted in "Had Websites" — the digital product is the stronger anchor.

Future improvements / known gaps

Gap	What it would unlock
Zero-value migration lines in KB	Proper dovetail count (43 dual-active not 38, ~172 migrations not 126). Needs a KB SQL variant that retains zero-MRR lines as marker rows linking SS MRR to SM product label.
Bridge cohort for SnapSync	Match-quality reconciliation between cohort_year and bridge cohort year. Endpoint currently returns zero rows for SS.
Promote `.tmp_*.cjs` scripts to permanent	Survive cleanup; let CI re-run quarterly. Move under `scripts/snapsync-saymore-*` with a README.
Auto-generated HTML	Currently HTMLs are hand-edited from the JSON metrics. A small templater would let the HTMLs regenerate from a single source of truth.
Vertical sample sizes < 5	The "categories" cut requires n≥10 (standalone) or n≥15 (combined) to avoid noise. Higher-n thresholds drop interesting niche verticals. Worth a Bayesian shrinkage estimator for small-n categories.
Print sunset timeline modelling	The 1,299 "Print + GPI/PBI" subs (32% of book) are aging out as Print declines. Project the digital-anchor migration runway.

Memory references

feedback_kb_subscription_filter_excludes_print.md — the Phase 6 subscription-filter gotcha
feedback_sme_brand_is_bizname.md — Phase 4 raw-name brand-check rule
feedback_saymore_snapsync.md — pre-existing memory that SnapSync and SayMore are the same product line at different labels
reference_env_local_layout.md — .env.local key format
reference_sitepro_listing_map.md — sitepro_listing_map table schema
reference_dim_listings_with_subscriber_id.md — identity bridge table
feedback_lrc_keyword_location_model.md — bare-service-term rule for LRC keywords (related to Phase 2 cleaning)

FCR Dashboard documentation · generated from docs/ · keep counts verified, not guessed.