PROSPECT_LISTINGS — composition, provenance & resolve-queue state (2026-05-28)

Snapshot audit of what's actually in listingmanager-1529856313699.fcr_operations.PROSPECT_LISTINGS (108,071 rows) — how listings got there, how they reconcile to the master book, what GBP/crawl data we hold, and what SerpAPI resolution work is sitting un-promoted.

Numbers are a point-in-time snapshot (2026-05-28); re-run the queries in §7 to refresh.

Update 2026-06-01 — three figures below have moved materially; corrections:

  • NOFs (net-new, no subscriber_id): now 42,555 total / 29,916 in-universe (was 12,242). The SERP-archive harvest (serp_archive_harvest, 30,313 rows) landed after this snapshot, roughly tripling the net-new/NOF layer.
  • GBP data coverage: now 84,841 place_id rows carry has_serp_data=TRUE (review_count 86,114, rating 82,567) — effectively all listings and place_id NOFs that have been enriched, NOT the 15,360 in §1/§5. SerpAPI + Pleper enrichment ran broadly after 2026-05-28; the GBP basics live on the spine (review_count/rating/has_serp_data), not mainly in GBP_INFORMATION (~13.5K, the description sub-table only). The ~33K real-place_id rows still without it are recent harvest held behind the enrich_enabled gate.
  • The crawl is complete across all live sites, with HTTP response breakouts confirmed: of ~80,270 unique domains, 78% cleanly crawled; failures split 5xx ≈ 8,675 / 4xx ≈ 5,976 / timeout ≈ 2,352 / robots ≈ 573 unique domains. 530/NXDOMAIN dead domains were DoH-confirmed (3,775) and routed to website_action_queue; where a live site returned 403, Ahrefs batch-analysis extracted SEO insights into dim_seo (10,570 domains). Full as-built detail: Prospect Enrichment & Opportunity Index.

§1 below is refreshed to 2026-06-01; §2–§7 remain the original 2026-05-28 snapshot (reconciliation arithmetic, provenance, resolve-queue) — re-run the §7 queries to refresh those.

1. Population matrix (file-status × place_id × web)

Distinct (refreshed 2026-06-01): 152,091 subscribers · 118,606 Google place_ids · 104,226 LST: synthetic · 80,270 web domains.

Cohort place_id / web Listings has GBP data crawled site live
Active client (paying) google place_id + web 3,122 2,961 3,116 2,834
google place_id + no web 138 112
On-file, not active (activation pool) google place_id + web 51,723 41,436 50,791 41,175
google place_id + no web 21,069 10,218
LST: synthetic + web 29,785 0 29,694 22,278
LST: synthetic + no web 74,441 0
NOF (no subscriber_id, net-new) google place_id + web 30,781 20,926 29,013 24,271
google place_id + no web 11,774 9,188

On-file = 180,278 · NOF = 42,555 · active = 3,260 · total = 222,833. "has GBP data" = has_serp_data on the spine (LST: synthetic rows have no real place_id → never GBP-enrichable); "no web" rows are never crawled. Crawl is complete across all live sites; HTTP-response and 403→Ahrefs detail is in Prospect Enrichment & Opportunity Index.

2. Identity keys

  • Primary key = place_id. Real Google place_ids are ChIJ… (88,285 of 88,293).
  • LST: synthetic pid = "LST:" || dim.listing_id (e.g. LST:IE_32671122_468295_13141). Minted for on-file listings with a website but no resolved Google place_id — crawlable (crawl keys on crawl_root), but never GBP-enrichable (no real place_id → has_serp_data=FALSE for all).
  • PROSPECT_LISTINGS has no listing_id column — the only listing_id linkage is via the LST: prefix. Maps-resolved rows link to the book only through the place_id embedded in dim.urls_urls.

3. Master book ↔ prospect

Master = SUPERSET_MASTER.dim_listings_with_subscriber_id (213,835 rows / 180,343 distinct listing_id). The Google place_id lives inside urls_urls (pipe-delimited |, aligned to urls_types; the GOOGLEMAPS value is the ChIJ… place_id — there is no place_id column).

Listings
Distinct listings in master 180,343
→ in prospect via Google place_id 62,452
→ in prospect via LST: listing_id 19,777
In prospect 82,229
Not in prospect 98,114 (54%)
  • Book ChIJ place_ids: 57,524 distinct; 57,510 are in prospect, 14 are not (the "off-by-14").
  • The 98,114 not-in-prospect are overwhelmingly listings with neither a usable website nor a Google place_id — nothing to crawl or enrich, so never ingested.

4. Provenance of prospect's 88,284 Google place_ids (reconciliation)

88,284  =  57,510  book-sourced (in dim.urls_urls; book total 57,524 − 14 not-in-prospect)
        +  19,805  resolve-promoted (distinct ChIJ, prospect_resolve_queue.promoted=TRUE)
        −   1,167  overlap (in BOTH book and promoted — don't double-count)
        +  12,136  net-new harvest (SayMore SERP + competitor enrich — neither book nor resolve)
Provenance place_ids with subscriber NOF (no sub)
Book only 56,343 56,226 117
Promoted only 18,638 18,638 0
Book + promoted (overlap) 1,167 1,167 0
Net-new harvest 12,136 12 12,124

The 12,136 net-new are ~99.9% NOF (no subscriber). So "subscribers (on-file) with a place_id" ≈ book + resolve-promoted − overlap; the net-new harvest is a separate NOF layer. prospect is NOT a strict subset of the book — it's the book's web/maps-resolvable slice plus the net-new harvest.

5. GBP / SerpAPI / Pleper (where we have a place_id)

place_id ≠ GBP data. Only 15,360 of ~88k place_id rows carry has_serp_data=TRUE — the rest have the place_id (the key) but enrichment was never run (enrich_enabled cost gate). What we have is dated and currently all <90 days old (SerpAPI rows span 2026-03-22 → 05-28). 4,688 Pleper errors logged (pleper_error/pleper_error_at).

Table (fcr_operations) Source Date column Key Holds
serp_gbp_snapshots SerpAPI snapshot_date, synced_at place_id place details, rating, review/photo/post counts, hours/desc flags, raw_json
GBP_REVIEWS SerpAPI review_date, crawled_at place_id individual reviews (≤20/pid), owner responses
GBP_SERVICES Pleper scraped_date place_id services list (name/desc/price/category) + photo/video counts

Split: SerpAPI = place details + reviews + hours flag (NOT a services list). Pleper = the services list + authoritative photo/video counts. PROSPECT_LISTINGS.last_updated = GBP refresh stamp (last writer of SerpAPI/Pleper paths wins).

6. Website crawls

One engine (crawlWebsite(), worker/src/handlers/crawl-website.js), two passes + one external audit:

  • (a) Shallow — LIVE. 34 typed crawl_* cols + 8KB crawl_raw JSON (technical/features/technologies/structuredData/sitemap/robots/aiReadiness/social/contact). Includes AI-block cols (crawl_ai_bots_blocked/_edge_blocked/_live_blocked) and crawl_slow_attempts. 65,308 crawled (2026-05-23→28).
  • (b) Deep — DISABLED since 2026-05-24 (tripped bot-detection). Adds crawl_deep_at, crawl_service_page_count, crawl_location_page_count. 8,973 rows got it before shutoff.
  • (c) InSites / Digital Footprint — external (n8n). Off-page (SEO/backlinks/rank/GBP-completeness/social/ads). Not persisted to BQ — full report lives on InSites, KV-cached 1hr; only audit metadata in fcr_operations.USG_INSITES_AUDIT_INVENTORY.

JS / tag-manager gap: the crawler is fetch-only — never executes JS, so tags injected at runtime via Google Tag Manager are missed (crawl_has_analytics/_google_ads_tag/_fb_pixel under-report). Fix is in-house: env.BROWSER (Cloudflare Browser Rendering / @cloudflare/puppeteer) already exists and is proven in serp-screenshot.js — just not wired to the crawler.

7. prospect_resolve_queue — SerpAPI resolution state (101,114 rows)

Working table for resolving on-file listings (keyed by listing_id/subscriber_id) to a Google place_id. A row only reaches PROSPECT_LISTINGS when promoted = TRUE. Key cols: cand_place_id, decision, promoted, status, match signals (match_website/match_phone/match_name_score/match_distance_m/match_category).

decision / status promoted rows candidate pid not in prospect note
promote / resolved TRUE 19,353 upserted ✓
promote_loose / resolved TRUE 1,245 upserted ✓
promote / resolved FALSE 1,281 568 distinct clean missed upsert — high conf (1,127 website matches, name 0.66); promote step never ran
null / resolved FALSE 30,418 16,485 distinct have a candidate but no decision applied; low conf (0 website matches, name 0.34) — need classifier, not blind upsert
added_no_place_id / resolved FALSE 19,704 11,543 distinct added as LST:; candidate not attached. 7,363 had a website match → re-review
null / pending_phase2_nocat FALSE 28,785 0 (no candidate yet) never SerpAPI-resolved — carved out for having NO primary_category (see 08-carve-no-category.cjs)
null / pending_clientfix FALSE 136 0 needs client data fix
no_match / error FALSE 188

Action items:

  1. Promote the 568 (promote/promoted=FALSE) — already-decided confident matches, just never written. Run 03-promote.cjs.
  2. 28,785 pending_phase2_nocat — the next SerpAPI run. They lack primary_category, which the category-class match gate needs → must promote the first category from the listing's category list into primary_category before/at resolve time. Multi-parallel SerpAPI run.
  3. 30,418 undecided + 7,363 website-matched LST adds — re-run classifier / re-review before promoting.

Toolchain

scripts/prospect-resolve/: 01-create-and-seed-queue02-resolve-batch (SerpAPI) → _classify / 09-reclassify (CATEGORY_MAPPING-driven) → 03-promote (upsert to PROSPECT_LISTINGS). 04-add-no-placeid-prospects (LST adds), 06-seed-phase2-nap, 08-carve-no-category, 13-export-pending-phase2-nocat.

FCR Dashboard documentation · generated from docs/ · keep counts verified, not guessed.

Ask the docsRAG over this site
Ask anything about the FCR Dashboard platform — architecture, BigQuery, the worker routes, billing rules, the LRC stack, scoring… Answers are grounded in this documentation, with source links.
How does the deal-brief refresh work? Which routes are Worker vs n8n? How is account health scored?