SnapSync / SayMore Deep Dive — Process & Methodology
End-to-end process notes for the SnapSync + SayMore + combined product analyses run 2026-05-18 → 2026-05-19. Captures data sources, SQL patterns, keyword-cleaning rules, SerpAPI run pipeline, combined-view merge logic, and the gotchas that took multiple iterations to find.
This doc is the canonical reference for re-running the analysis or extending it. The findings live in the three HTML reports — see Outputs below.
What this analysis produces
Three deep-dive reports treating SnapSync (2017-2018 onboarding) and SayMore (2019+ rebrand) as the same SMB-listings product family:
- SayMore standalone —
SAYMORE_DEEP_DIVE.html— 1,648 subs, 2018-2026 YTD. - SnapSync standalone —
SNAPSYNC_DEEP_DIVE.html— 2,637 subs, 2017-2026 YTD. - Combined —
SNAPSYNC_SAYMORE_DEEP_DIVE.html— 4,111 unique subs across both. This is the canonical view for general audiences. The standalones are useful when an audience specifically asks about one product.
Each report has two halves:
- Part 1: Retention — cohort year, MRR tier, survival curve, tenure bands, categories, prior portfolio, upsell effect, co-churn, locality.
- Part 2: Ranking — visibility, organic page 1, Local Pack, brand check (proof of digital life), per-cohort breakdowns, named lists (invisible / GBP-fix / upsell-target / proof).
PPT-input briefs for the deck builders:
SAYMORE_DEEP_DIVE_PPT_UPDATE_BRIEF.md(declarative slide spec)SNAPSYNC_SAYMORE_PPT_UPDATE_BRIEF.md(combined deck, 19 slides)
Data sources
BigQuery tables
| Table | Used for |
|---|---|
DYNAMICS_DATA.Accounts_and_related_data |
Authoritative orders + sales-order-details + billing-groups + recurring-invoices — the raw source for first-billing dates, MRR, cancellation dates. The unified KB SQL unnests orders → salesOrderDetails → billingGroup → recurringInvoices. |
fcr_operations.active_clients |
Daily snapshot of currently-billing subscribers per product group. Authoritative for "alive today" via MAX(snapshot_date) + product_group = 'PMS - Saymore' / 'PMS - Snapsync'. |
SUPERSET_MASTER.dim_listings_with_subscriber_id |
Subscriber identity: AccountName (CRM), bizname (GP listing), main_category_name. Daily-refreshed bridge between CRM subscriber_id and GP listing_id. |
fcr_operations.sitepro_listing_map |
SitePro URL by subscriber_id. Built from MegaDoc cache via n8n workflow LVzHQQ7xMQ3moLwO (currently unpublished — manual refresh). Used to identify the "has FCR-built website" cohort. |
fcr_operations.GBP_INFORMATION |
Scraped GBP data fallback (place_id, lat, lng, website). Used when the GA-native mapping table doesn't cover a sub. |
GOOGLE_ACCOUNT_DATA.GoogleAccount_Subscr_MAP_GBP_native |
Canonical GBP location-id ↔ subscriber mapping. Authoritative when present. Columns: locationId, SUBSCR_ID_Match, title. |
GOOGLE_ACCOUNT_DATA.GMB_LOCATIONS |
Full GBP location data per locationId — metadata.placeId, latlng.{latitude,longitude}, websiteUri, categories, etc. |
fcr_operations.KEYWORD_INTELLIGENCE |
GSC search-term data per subscriber. Sources: gsc_organic (SitePro GSC) + gsc_goldenpages (GP listing GSC). Used to pull each sub's top keywords by impressions. |
Worker API
All BQ calls go through the worker endpoint /dashboard-bq-execute with x-api-key header. Pattern:
curl -s -X POST "https://fcr-dashboard-api.fcrmedia.workers.dev/dashboard-bq-execute" \
-H "Content-Type: application/json" \
-H "x-api-key: $API_KEY" \
-d '{"sql": "...", "limit": 20000}'
Default limit is 500. For any query that might return more than 500 rows, set limit explicitly. Hitting the default 500 silently truncates the result.
For SerpAPI we hit https://serpapi.com/search.json directly with the SERPAPI_KEY from .env.local. The worker has SERPAPI integration but for batch runs it's faster to bypass.
Local secrets
.env.local keys used:
x-api-key(note: leading space, lowercase, hyphens) — worker API keySERPAPI_KEY— SerpAPI direct access
See memory reference_env_local_layout.md for the full layout.
Output artifacts
| File | What it is |
|---|---|
docs/saymore-deep-dive-kb.csv |
Per-sub SayMore history. 1,648 rows, 25 columns. |
docs/saymore-deep-dive-kb.md |
Methodology + unified SQL (predates this doc but still valid). |
docs/snapsync-deep-dive-kb.csv |
Per-sub SnapSync history. 2,637 rows, 25 columns. Same schema as SayMore (just _sm → _ss). |
SAYMORE_RANK_KEYWORD_PREVIEW.csv |
Pre-SerpAPI keyword plan for SayMore. 849 rows with raw kws + cleaned variants + drop reasons. |
SAYMORE_RANK_ANALYSIS_V2.csv |
Per-call SerpAPI results for SayMore alive subs. Long format, ~4,100 rows. |
SNAPSYNC_RANK_KEYWORD_PREVIEW.csv |
Same shape for SnapSync. |
SNAPSYNC_RANK_ANALYSIS.csv |
Same shape for SnapSync. ~4,500 rows. |
SAYMORE_DEEP_DIVE.html |
SayMore-only report. |
SNAPSYNC_DEEP_DIVE.html |
SnapSync-only report. |
SNAPSYNC_SAYMORE_DEEP_DIVE.html |
Combined report (canonical). |
SAYMORE_DEEP_DIVE_PPT_UPDATE_BRIEF.md |
Declarative slide spec for SayMore deck. |
SNAPSYNC_SAYMORE_PPT_UPDATE_BRIEF.md |
Declarative slide spec for combined deck. |
.tmp_combined_metrics.json |
Combined retention + rank metrics (working file). |
.tmp_combined_pre_portfolio.json |
Per-sub prior-portfolio enrichment INCLUDING Print (working file). |
.tmp_combined_pre_segments.json |
Pre-portfolio retention breakdown (working file). |
Phase 1 — KB build (per product)
For each product (SnapSync, SayMore), build a per-subscriber history CSV with 25 columns covering identity, first line, lifecycle, portfolio context.
Script: scripts/snapsync-saymore/01-snapsync-kb-build.cjs (template — swap TARGET_GROUP = 'PMS - Snapsync' ↔ 'PMS - Saymore').
SQL pattern (full version in docs/saymore-deep-dive-kb.md):
WITH all_lines AS (
-- Pulls every subscription/recurring line for every subscriber
-- with positive MRR (totalLineItemAmount > 0, invoicePeriod > 0).
SELECT ... FROM Accounts_and_related_data,
UNNEST(orders), UNNEST(salesOrderDetails),
UNNEST([billingGroup]), UNNEST(recurringInvoices)
WHERE product.subscription = TRUE -- ⚠️ See gotcha below
AND ri.totalLineItemAmount > 0
AND ri.invoicePeriod > 0
),
lines_mrr AS (
-- Proportional MRR split across BG line items
SELECT lf.*, ri_total * (base_amount / total_base) / ri_period AS monthly_value
FROM lines_filtered JOIN bg_totals
),
ss_subs AS (
SELECT subscriber_id, MIN(billing_start) first_ss_start
FROM lines_mrr WHERE pg = 'PMS - Snapsync' GROUP BY subscriber_id
),
first_ss_line AS (...), -- Per-sub first line details
all_ss AS (...), -- Per-sub aggregates (last_end, last_cancellation)
pre_ss_groups AS (...), -- Distinct product_groups billing BEFORE first SS line
post_ss_added AS (...), -- Product groups whose FIRST EVER billing postdates first SS
post_ss_lost AS (...), -- Product groups now ended (no open-ended line)
live_ss AS (...), -- subscriber_ids alive in active_clients today
biz AS (...) -- Identity from dim_listings_with_subscriber_id
SELECT ... FROM ss_subs LEFT JOIN ... 25 columns
Critical gotcha — subscription filter
The product.subscription = TRUE filter is right for MRR / tenure math but wrong for pre/post portfolio classification. It excludes:
| Product group | Distinct subs excluded |
|---|---|
| DIR - PRINT | 79,314 (the entire legacy print directory book) |
| DMS - Websites (non-sub) | 4,209 |
| DIR - GPI (non-sub) | 2,702 |
| DIR - PBI (non-sub) | 2,052 |
| DMS - SEA (non-sub) | 1,335 |
| PMS - GSV | 1,096 |
| PMS - Saymore (non-sub — migration/rebrand lines) | 2,945 |
Impact on the SS/SM combined deep-dive: the subscription-only filter would say 86% of listings sales were "net-new"; with Print included, 76% were INTO existing FCR clients. The original SnapSync 2017-2018 motion was a print-book penetration play, not an acquisition campaign.
Fix pattern when classifying pre/post portfolio: drop the subscription filter, use signed_date as the temporal anchor (because non-subscription products don't have recurring-invoice firstInvoiceOn dates). See Phase 6 / 07-combined-pre-recompute.cjs.
Saved in memory as feedback_kb_subscription_filter_excludes_print.md.
Bridge cohort mapping (SayMore only)
The SayMore KB augments cohort_year with bridge_cohort_year + bridge_class from per-year calls to /dashboard-revenue-bridge?productGroup=PMS - Saymore&fromDate=...&toDate=...&includeDetail=true. Buckets a sub into new / renewed / returned based on the revenue-bridge dashboard logic.
The SnapSync KB skips this step because the bridge endpoint returns zero detail for PMS - Snapsync. SnapSync uses naive cohort_year (first-billing year) only. Acceptable because SnapSync onboarding ended 2018; very few renewal cycles to classify.
Type-mismatch gotcha
active_clients.subscriber_id is STRING. Accounts_and_related_data.accountNumber is INT64. The live_ss / live_sm join needs SAFE_CAST(subscriber_id AS INT64) on the active_clients side or the join silently produces zero matches and currently_alive_* comes out false for every row. Caught this on the first SnapSync KB build run.
Phase 2 — Keyword preview (per product)
For each alive subscriber, pick 1 brand keyword + 3 non-branded service keywords for the SerpAPI test.
Script: 02a-saymore-keyword-preview.cjs / 02b-snapsync-keyword-preview.cjs.
Cohort: alive subs only
Query active_clients for the latest snapshot, filter product_group = 'PMS - Saymore' (or Snapsync). Returns ~822 SayMore / ~844 SnapSync subs. Join to:
dim_listings_with_subscriber_id— biz name + categoryAccounts_and_related_data— city + countysitepro_listing_map— sitepro_url (forwebsite_type)GBP_INFORMATION— gbp_website (fallback)
website_type classification:
'sitepro'ifsitepro_urlpresent'external'ifgbp_websitepresent (and no sitepro)'none'otherwise
Keyword source: KEYWORD_INTELLIGENCE
Pull top 20 keywords per sub by impressions, deduped (SUM(impressions) GROUP BY subscriber_id, keyword). Limit ROW_NUMBER() <= 20 per sub. Chunk the query in batches of 80 sub_ids to avoid hitting the worker's row limit.
Brand / clean classification rules
For each candidate keyword:
Brand kw classification is done at preview time but the preview's brand_kw_maps field is NOT used in production. See Phase 4 — brand maps reruns with the raw account name.
Clean kw rule (used by
kw1_maps,kw2_maps,kw3_maps):- Tokenize keyword (lowercase, strip punctuation, drop tokens <2 chars).
- Strip from biz tokens (before classification):
BIZ_STOPWORDS(ltd, the, &, co, ...),SERVICE_NOUNS(roofing, plumbing, cleaning, hair, boiler, ...),COMMON_LOCATION_TOKENS(dublin, cork, ballyfermot, kilcock, ...), andmain_categorytokens. The leftover is the "brand discriminator" set. - If every non-stopword kw token is in the brand-discriminator set → brand kw.
- If kw has 0 brand tokens → clean kw.
- If kw has SOME but not all brand tokens (mixed brand+service) → drop as contaminated.
maps_kw build (for clean kws): strip city/county/location tokens + BIZ_STOPWORDS. Preserve "near me" as a unit (key gotcha — see below).
serp_kw build: keep original kw, append city if absent. Matches Digital Footprint's "keyword + location" behaviour.
Per-sub picker
After classification, pick up to 3 clean kws by impressions descending, deduped by maps_kw (two raw kws normalising to the same maps query → keep highest-impression one). Status buckets:
| Status | Meaning |
|---|---|
ready |
3 clean kws + 1 brand kw available |
ready_no_brand |
3 clean kws, no proper brand discriminator (biz name was all generic tokens like "Galway Landscaping") |
brand_only |
<3 clean kws but brand kw exists — runs brand only |
insufficient_clean_kws |
<3 clean kws and no brand — skipped |
no_ki_data |
Sub has zero KI rows — skipped |
"near me" gotcha
Original cleaner stripped "near" as a location token, leaving "me" stranded, then re-appended "near me" → producing "chiropodist me near me". Fix: treat "near me" as a unit. Strip the pair from working tokens before location-stripping, then re-append if the original kw had it.
Brand-token over-stripping gotcha
Original brand classification used full biz name tokens as the brand set without subtracting service nouns / location / category tokens. Result: subs in service-named verticals (e.g. "Premium Roofing", where "roofing" is a service noun) had their brand set polluted, and every kw containing "roofing" got classified as mixed_brand and dropped. Fix: strip service nouns + category tokens + location tokens from the biz set before computing the brand-token set.
Phase 3 — SerpAPI run (clean kws + brand on SERP)
Script: 03a-saymore-rank-run.cjs / 03b-snapsync-rank-run.cjs.
For each runnable sub (status ∈ ready / ready_no_brand / brand_only):
- 3 clean kws ×
google_mapsengine = maps queries - 3 clean kws ×
googleengine (only if sub has website) = organic queries - 1 brand kw ×
googleengine (only if sub has website) = brand organic
Brand maps queries are skipped here — they need different matching logic (Phase 4).
SerpAPI parameters
{
engine: "google_maps" | "google",
device: "mobile",
gl: "ie",
hl: "en",
google_domain: "google.ie",
q: <query>,
// For maps with lat/lng:
ll: "@<lat>,<lng>,14z",
// For maps without lat/lng: omit ll, fall back to gl=ie country scope.
}
Place_id + lat/lng enrichment
Per runnable sub, pull place_id + lat + lng from the GA-native mapping with GBP_INFORMATION fallback:
WITH map AS (
SELECT SAFE_CAST(SUBSCR_ID_Match AS INT64) AS subscriber_id, ANY_VALUE(locationId) AS locationId
FROM GoogleAccount_Subscr_MAP_GBP_native WHERE SUBSCR_ID_Match IS NOT NULL GROUP BY SUBSCR_ID_Match
),
loc AS (
SELECT locationId,
ANY_VALUE(metadata.placeId) AS place_id,
ANY_VALUE(latlng.latitude) AS lat,
ANY_VALUE(latlng.longitude) AS lng
FROM GMB_LOCATIONS GROUP BY locationId
),
native AS (SELECT m.subscriber_id, l.place_id, l.lat, l.lng FROM map m LEFT JOIN loc l USING(locationId) WHERE m.subscriber_id IN (...)),
fallback AS (SELECT ... FROM GBP_INFORMATION WHERE subscriber_id IN (...) GROUP BY subscriber_id)
SELECT COALESCE(n.subscriber_id, f.subscriber_id) AS subscriber_id,
COALESCE(n.place_id, f.place_id), COALESCE(n.lat, f.lat), COALESCE(n.lng, f.lng)
FROM native FULL OUTER JOIN fallback ON ...
GMB_LOCATIONS columns to know:
metadata.placeId— the Google Place ID (NOT a top-levelplace_idfield)latlng.latitude/latlng.longitude— lat/lng nested structlocationId— the GBP location ID (different from place_id)
GBP_INFORMATION columns:
place_id(STRING, top-level)lat(FLOAT64),lng(FLOAT64)
location parameter trap
SerpAPI's location parameter requires a canonical name from their gazetteer. "Ballyfermot, Dublin, Ireland" is rejected with HTTP 400. Don't use location for maps/organic queries. Rely on gl=ie + google_domain=google.ie for country scope. For maps, use ll=@lat,lng,14z when you have coordinates; without coordinates, accept that the local pack is "anywhere in Ireland" rather than city-specific.
Match logic
- Organic match: by hostname. Strip
www., compare againstsitepro_urlhost (if website_type=sitepro) orgbp_websitehost (if external). Match if equal or one ends with.{other}. - Maps match (clean kws): by
place_idagainstlocal_results[].place_id. Fallback: 60% token-overlap match onlocal_results[].titlevs biz name.
Concurrency
5 SerpAPI calls in flight via a manual worker-pool pattern. Retries 429 / 5xx with exponential backoff (max 2 retries). Append each result to the output CSV as it lands — gives resume-on-restart via the existing-rows check at startup.
Resume logic
On script restart, parse the existing output CSV. Build a done = Set('<subscriber_id>|<kw_label>|<surface>'). Skip any call already in done. This means a partial / crashed run can be resumed by just re-running.
Gotcha: rows with errors (error != '') need to be filtered OUT of the done-set, otherwise transient SerpAPI errors get baked in as "skip this forever". Scrub error rows before resuming.
Phase 4 — Brand maps refix (raw biz name + knowledge-panel matcher)
Script: 04a-saymore-brand-maps-refix.cjs / 04b-snapsync-brand-maps-refix.cjs.
Brand-on-maps needs a separate run with different logic because:
Brand kw = raw account name. For an SME, the brand IS the business name (
buildBrandKw(account_name): lowercase, strip first comma onward, strip legal suffixesltd|limited|the|and|&|co|company|inc|..., collapse whitespace). Do NOT strip locations/service nouns from biz names — "Kilcock Radiators" IS the brand, not "radiators" with "Kilcock" stripped as a location.Knowledge-panel matcher. For brand-specific queries, Google often doesn't return a
local_resultsarray (which would be the 3-pack). Instead it returns a singularplace_resultsknowledge panel describing the specific business. Default SerpAPI matcher checkinglocal_resultsonly would report 76% "invisible" when reality is 96% visible — the data was there, just in a different field.Matcher priority:
if (json.place_results) { // Knowledge panel — Google's response for brand searches if (place_results.place_id === expected_place_id) → MATCH (rank 1, "knowledge_panel_place_id") else if (name_overlap >= 0.6) → MATCH (rank 1, "knowledge_panel_name") else → no match } else if (json.knowledge_graph?.title) { // Knowledge graph fallback ... } else { // local_results scan (place_id then fuzzy name) ... }
Brand+organic queries use the standard hostname matcher from Phase 3 — no special logic needed.
Process
- Delete existing brand+maps rows from the rank CSV (so we don't double-count).
- For each runnable sub with a
buildBrandKw(...)result, rungoogle_mapsengine with the raw brand kw + lat/lng. - Apply the knowledge-panel-aware matcher.
- Append new rows to the rank CSV.
Saved as memory feedback_sme_brand_is_bizname.md.
Phase 5 — Retention metrics (per product, from KB CSV)
Scripts: 06-snapsync-retention-analyze.cjs (SM retention metrics are computed inline in 05a-saymore-rank-analyze.cjs).
Reads docs/{ss,sm}-deep-dive-kb.csv, computes:
| Metric | How |
|---|---|
| Total / alive / retention | currently_alive_ss === 'true' (CSV string) — NOT === true (boolean) |
| Median / avg tenure | monthsBetween(first_*_billing_start, NOW if alive else last_*_end), capping 2099 sentinel dates at NOW |
| Cohort year | Naive: year of first billing. For SnapSync, bucket cohort_year < 2019 as "Pre-2019 (legacy carry-in)" because the 2017-18 spike is the order-management migration registering existing print clients, not real signings. |
| Existing vs new | pre_*_groups != '' (from KB) — but this misses Print, see Phase 6 for the corrected combined cut |
| Standalone / cross-sell / anchor / fully-bundled | Boolean cross of had_pre + had_post |
| MRR tier | <€30 / €30-49 / €50-99 / €100+ on first_*_mrr |
| Survival curve Y+0 → Y+8 | For each cohort year y, % of subs still alive at end of (y + offset). Counts a sub as "alive at EOY" if currently_alive OR last_end > EOY |
| Tenure bands | 0-6 mo, 6-12, 12-18, 18-24, 24-36, 3-4yr, 4-5yr, 5-6yr, 6-7yr, 7-8yr, 8yr+ |
| Category × retention | Group by main_category n≥10 (or n≥15 for combined). Compute retention + avg tenure + avg MRR per category. |
| Pre-portfolio × retention | KB's pre_*_groups field — classify into segments. Limited because of subscription filter — see Phase 6. |
| Upsell effect | post_*_added_groups != '' |
| Co-churn | For churned subs, count occurrences in post_*_lost_groups |
| Locality | KB's locality field (Metro / Other, computed from city regex) |
Output: .tmp_{ss,sm}_retention_metrics.json for the HTML build.
Phase 6 — Combined view
Scripts: 09-combined-analyze.cjs, 07-combined-pre-recompute.cjs, 08-combined-pre-retention.cjs.
Merge step
Build combined[] from union of saymore-deep-dive-kb.csv + snapsync-deep-dive-kb.csv keyed by subscriber_id. Per merged sub:
| Combined field | Logic |
|---|---|
| Identity (name, biz, category, city, county) | Prefer SM if present (newer data), else SS |
first_start |
min(first_sm_billing_start, first_ss_billing_start) |
last_end |
max(last_sm_end, last_ss_end, last_sm_cancellation, last_ss_cancellation) — drop 2099 sentinels |
alive_either |
alive_sm OR alive_ss |
tenure_months |
From first_start to (NOW if alive_either else last_end) — spans BOTH products for migrators |
first_mrr |
MRR of the EARLIEST product (SM if SM started first, else SS) |
cohort_year |
Year of first_start (earliest product) |
product_history |
One of 8 buckets — see below |
Product-history buckets
| Bucket | Definition |
|---|---|
ss_only_alive |
Has SS KB row, no SM KB row, alive_ss |
ss_only_churned |
Has SS KB row, no SM KB row, not alive_ss |
sm_only_alive |
Has SM KB row, no SS KB row, alive_sm |
sm_only_churned |
Has SM KB row, no SS KB row, not alive_sm |
migrated_ss_to_sm |
Has both KB rows, alive_sm, NOT alive_ss, AND ss_start < sm_start |
had_both_now_ss_only |
Has both KB rows, NOT alive_sm, alive_ss |
dual_active |
Has both KB rows, alive_sm AND alive_ss |
had_both_both_churned |
Has both KB rows, neither alive |
The zero-value undercount caveat
CRM active_clients shows 43 subs alive on both products. The combined analysis sees 38 (18 dual_active + 20 had_both_now_ss_only). The 5-sub gap is zero-value billing lines — typically the SayMore Ultimate rebrand layered onto an existing SnapSync line at €0 MRR. The KB SQL filters totalLineItemAmount > 0 (correct for MRR math) so these lines drop out.
Similarly: historical CRM shows ~172 ever-upsold from SS → SM. The combined view sees 88 + 20 + 18 = 126. The 46-sub gap is the same zero-value upgrade pattern.
To dovetail properly (separate exercise): rebuild the KB SQL to retain zero-value lines as marker rows linking SS MRR to the SM product label.
Pre-portfolio with Print (the big methodology correction)
The KB's pre_*_groups field is built from subscription-only lines, which silently excludes the entire DIR - PRINT book — 79,314 subs across CRM. For a "did this customer have a prior FCR relationship?" question, that's wrong.
07-combined-pre-recompute.cjs rebuilds pre-portfolio for every combined sub WITHOUT the subscription filter, using signed_date < first_listings_billing_start as the temporal anchor:
WITH listings_subs AS (
-- union of SS + SM first-billing dates
),
all_lines AS (
SELECT a.accountNumber, o.signedDate, ol.product.productGroup.name AS pg
FROM Accounts_and_related_data a, UNNEST(a.orders) o, UNNEST(o.salesOrderDetails) ol
WHERE pg IS NOT NULL AND pg NOT IN ('PMS - Saymore', 'PMS - Snapsync')
)
SELECT
ls.subscriber_id,
STRING_AGG(DISTINCT al.pg, '; ') AS all_pre_groups,
LOGICAL_OR(al.pg = 'DIR - PRINT') AS had_print,
LOGICAL_OR(al.pg LIKE 'DIR -%') AS had_any_directory,
LOGICAL_OR(al.pg = 'DMS - Websites') AS had_websites,
LOGICAL_OR(al.pg = 'DMS - SEA') AS had_sea,
LOGICAL_OR(al.pg = 'DMS - SEO') AS had_seo
FROM listings_subs ls
LEFT JOIN all_lines al ON al.subscriber_id = ls.subscriber_id
AND al.signed_date < ls.first_listings_start
GROUP BY ls.subscriber_id, ls.first_listings_start
Pre-portfolio segments (mutually exclusive, prioritised)
The segments are prioritised by digital depth, so a sub with both Websites AND Print is counted in "Had Websites" — gives the cleanest read on whether digital products lift retention beyond Print alone.
| Segment | Priority | n | Retention | Tenure |
|---|---|---|---|---|
| Had SEA (± print/dir) | 1 | 915 | 49% | 41 mo |
| Had Websites (± print/dir, no SEA) | 2 | 709 | 41% | 35 mo |
| Had Print + GPI/PBI (no digital sub) | 3 | 1,299 | 40% | 56 mo |
| Had directory (GPI/PBI, no print) | 4 | 175 | 28% | 26 mo |
| No prior FCR products | 5 | 990 | 27% | 22 mo |
| Other (GSV, SEO alone) | 6 | 23 | 4% | — |
Tally bug to watch for
When tallying boolean flags from BQ JSON, CSV/JSON cells come back as the strings "true" / "false", not booleans. if (r.had_print) is truthy for the string "false". Always explicitly check r.had_print === true || r.had_print === "true". Caught this on the first pre-recompute run when all flag counts came back identical at 3,222.
Standalone vs digitally-bundled
The original KB's pre_*_groups + post_*_added_groups is subscription-only. Use that as-is for the standalone vs digitally-bundled cut (Section 4 of the combined HTML) — because the bundling question is specifically about active digital management, not Print. Print is excluded from this cut by design.
So the combined view has TWO separate cuts:
- Section 3: Existing vs net-new uses the Phase 6 recompute (includes Print).
- Section 4: Standalone vs digitally-bundled uses the KB's subscription-filtered fields (excludes Print).
Document the definitional difference on each slide so readers don't confuse them.
Combined rank merge
rankBySub map keyed by subscriber_id. For each row from SAYMORE_RANK_ANALYSIS_V2.csv and SNAPSYNC_RANK_ANALYSIS.csv:
- Track
smSubIdsset from the SM rank CSV. - For each SS rank row: if sub is in
smSubIds, skip (SM data preferred for overlap — newer test, more current). Otherwise append.
Bug fixed mid-analysis: the original merge logic only added the FIRST SS row per sub when the sub had no SM data — because the second-loop conditional checked for a missing _source flag that was never set. Result: ss_only_alive subs had only 1 row of rank data instead of 4-7. Visibility came out at 66% instead of the real 88%. Fix: use a smSubIds set as the dedup key instead of inspecting the first existing row.
Rank by product_history
Joining the merged rank data to the product_history bucket surfaces an important pattern:
| Bucket | n | Visible | Local Pack | Invisible |
|---|---|---|---|---|
| sm_only_alive | 645 | 84% | 61% | 16% |
| ss_only_alive | 696 | 88% | 67% | 12% |
| migrated_ss_to_sm | 88 | 94% | 63% | 6% |
| had_both_now_ss_only | 20 | 100% | 75% | 0% |
| dual_active | 18 | 89% | 67% | 11% |
Migration cohort (88) has the highest visibility — they got proper management attention. The two big buckets (SM-only-alive and SS-only-alive) are where the long-tail intervention work lives.
Phase 7 — Output (HTML + PPT brief)
HTML structure (mirrors across all three reports):
- Header — title, subtitle (totals + date), TOC.
- Section 1 — Headline stat-grid (4 cells).
- Section 2 — Cohort split (varies by report — bridge cohort year for SM, carry-in vs new for SS, product-history buckets for combined).
- Sections 3-12 — Retention dimensions: existing-vs-new, standalone-vs-bundled, MRR tier, survival, tenure, categories, prior portfolio, upsell, co-churn, locality.
- Section 13 — Ranking headline + cohort split.
- Section 13a — Brand check (proof of digital life).
- Sections 14-17 — SitePro vs no-site, by-cohort-year, MRR × rank, vertical winners/losers.
- Section 18 — Named lists (4-6 cards).
- Section 19 — Strategic takeaways (3 panels: broken / works / discuss).
- Footer — methodology + source CSVs.
PPT briefs are declarative slide specs — each section in the HTML maps to one slide. The combined PPT brief has 19 slides + cover + methodology footer.
Reproducing the analysis end-to-end
Scripts live in scripts/snapsync-saymore/ (see that folder's README for the full table). Each defines ROOT = path.resolve(__dirname, "..", "..") so file paths resolve to the repo root regardless of cwd. Run in order:
# 1. KB build per product (~3 min each)
node scripts/snapsync-saymore/01-snapsync-kb-build.cjs # → docs/snapsync-deep-dive-kb.csv
# SM KB built earlier via a .tmp_sm_kb_build script (body in git history; methodology in
# docs/saymore-deep-dive-kb.md). To rebuild, adapt 01 — swap PMS - Snapsync → PMS - Saymore,
# _ss → _sm column names, and re-enable the bridge-cohort fetch.
# 2. Keyword preview per product (~3 min each)
node scripts/snapsync-saymore/02a-saymore-keyword-preview.cjs # → SAYMORE_RANK_KEYWORD_PREVIEW.csv
node scripts/snapsync-saymore/02b-snapsync-keyword-preview.cjs # → SNAPSYNC_RANK_KEYWORD_PREVIEW.csv
# 3. SerpAPI run (clean kws + brand serp). ~25 min each, ~$20 each
node scripts/snapsync-saymore/03a-saymore-rank-run.cjs # → SAYMORE_RANK_ANALYSIS_V2.csv
node scripts/snapsync-saymore/03b-snapsync-rank-run.cjs # → SNAPSYNC_RANK_ANALYSIS.csv
# 4. Brand maps refix (raw biz name + knowledge-panel matcher). ~5 min each, ~$4 each
node scripts/snapsync-saymore/04a-saymore-brand-maps-refix.cjs
node scripts/snapsync-saymore/04b-snapsync-brand-maps-refix.cjs
# 5. Rank + retention metrics (<1 min each)
node scripts/snapsync-saymore/05a-saymore-rank-analyze.cjs # → .tmp_sm_rank_metrics.json
node scripts/snapsync-saymore/05b-snapsync-rank-analyze.cjs # → .tmp_ss_rank_metrics.json
node scripts/snapsync-saymore/06-snapsync-retention-analyze.cjs # → .tmp_ss_retention_metrics.json
# 6. Combined view (~3 min)
node scripts/snapsync-saymore/07-combined-pre-recompute.cjs # → .tmp_combined_pre_portfolio.json (Print-inclusive)
node scripts/snapsync-saymore/08-combined-pre-retention.cjs # → .tmp_combined_pre_segments.json
node scripts/snapsync-saymore/09-combined-analyze.cjs # → .tmp_combined_metrics.json
# 7. HTMLs are hand-built from the JSON metrics (no auto-generator).
# Edit SAYMORE_DEEP_DIVE.html / SNAPSYNC_DEEP_DIVE.html / SNAPSYNC_SAYMORE_DEEP_DIVE.html
# with the numbers from the metrics JSON files.
Total wall-clock: ~80-90 minutes for a fresh end-to-end run. SerpAPI cost: ~$45-50 across both products. The .tmp_*.json files are intermediate working artifacts written to the repo root.
Gotchas summary (the lessons earned)
product.subscription = TRUEexcludes Print. Right for MRR math, wrong for existing-vs-net-new. The original analysis claimed 86% net-new; with Print included, 76% were existing FCR clients. →feedback_kb_subscription_filter_excludes_print.mdactive_clients.subscriber_idis STRING;accountNumberis INT64. AlwaysSAFE_CAST(... AS INT64)on the active_clients side before joining.product.subscription = TRUEexists on multiple PG names. BothPMS - Saymore (subscription=true)andPMS - Saymore (subscription=false)exist in CRM. The non-subscription rows are zero-value migration/rebrand lines. The KB filter drops them. For dovetail work they need a separate channel.SerpAPI
locationparameter requires canonical names. "Ballyfermot, Dublin, Ireland" returns HTTP 400. Usell=@lat,lng,14zfor maps; rely ongl=iefor country scope on organic.GMB_LOCATIONS schema:
metadata.placeId(not top-level),latlng.latitude/latlng.longitude(nested struct).GoogleAccount_Subscr_MAP_GBP_nativecolumn isSUBSCR_ID_Match(notsubscriber_id).Brand kw for an SME is the raw business name. Don't strip locations / service nouns / category — "Kilcock Radiators" IS the brand, not a service. Brand-cleaning logic is for value-test kws only. →
feedback_sme_brand_is_bizname.mdSerpAPI returns
place_results(singular) for brand-specific maps queries, notlocal_results. The matcher needs to checkplace_results.place_idfirst. Without this fallback, brand-maps hit rate looks like 24% when reality is 96%.CSV cells come back as strings, not booleans.
r.had_print === true || r.had_print === "true". Always."near me" must be preserved as a unit in keyword cleaning. Stripping "near" alone leaves "me" stranded, then re-appending "near me" produces "service me near me".
Worker
dashboard-bq-executedefaultlimitis 500. Passlimit: 20000(or more) explicitly for big queries. Hitting the default silently truncates.CSV parser must handle quoted commas.
amfield is "Last, First" format — commas inside quotes break naive split-by-comma. Use a proper multiline-aware parser.Resume logic must filter out error rows. Otherwise transient SerpAPI errors get baked in as "permanent skip."
Combined rank merge needs an explicit overlap-skip. Using "is the first existing row from source X?" as a dedup key fails because no such flag is set on the first append.
Cohort year < 2019 for SnapSync is the order-management migration, not real signings. Bucket as "Pre-2019 (legacy carry-in)" rather than letting 998 + 1,656 dominate the cohort tables.
Pre-portfolio classification needs
signed_dateas the temporal anchor, notfirstInvoiceOn. Non-subscription products don't have recurring invoices.Pre-portfolio segments should be prioritised by digital depth. A sub with both Websites and Print should be counted in "Had Websites" — the digital product is the stronger anchor.
Future improvements / known gaps
| Gap | What it would unlock |
|---|---|
| Zero-value migration lines in KB | Proper dovetail count (43 dual-active not 38, ~172 migrations not 126). Needs a KB SQL variant that retains zero-MRR lines as marker rows linking SS MRR to SM product label. |
| Bridge cohort for SnapSync | Match-quality reconciliation between cohort_year and bridge cohort year. Endpoint currently returns zero rows for SS. |
Promote .tmp_*.cjs scripts to permanent |
Survive cleanup; let CI re-run quarterly. Move under scripts/snapsync-saymore-* with a README. |
| Auto-generated HTML | Currently HTMLs are hand-edited from the JSON metrics. A small templater would let the HTMLs regenerate from a single source of truth. |
| Vertical sample sizes < 5 | The "categories" cut requires n≥10 (standalone) or n≥15 (combined) to avoid noise. Higher-n thresholds drop interesting niche verticals. Worth a Bayesian shrinkage estimator for small-n categories. |
| Print sunset timeline modelling | The 1,299 "Print + GPI/PBI" subs (32% of book) are aging out as Print declines. Project the digital-anchor migration runway. |
Memory references
feedback_kb_subscription_filter_excludes_print.md— the Phase 6 subscription-filter gotchafeedback_sme_brand_is_bizname.md— Phase 4 raw-name brand-check rulefeedback_saymore_snapsync.md— pre-existing memory that SnapSync and SayMore are the same product line at different labelsreference_env_local_layout.md— .env.local key formatreference_sitepro_listing_map.md— sitepro_listing_map table schemareference_dim_listings_with_subscriber_id.md— identity bridge tablefeedback_lrc_keyword_location_model.md— bare-service-term rule for LRC keywords (related to Phase 2 cleaning)
FCR Dashboard documentation · generated from docs/ · keep counts verified, not guessed.