KEYWORD_INTELLIGENCE — Structure, Maintenance & Downstream Impact

1. Table Overview

listingmanager-1529856313699.fcr_operations.KEYWORD_INTELLIGENCE is the central keyword data store for FCR's digital marketing intelligence. It holds 28.7 million rows of search term data across 8 sources, representing what people search for when looking for services in the categories FCR clients operate in.

Schema (17 columns)

Column	Type	Description
`subscriber_id`	STRING	FCR subscriber ID, or directory listing ID for GP/GetLocal sources
`business_name`	STRING	Business name
`category`	STRING	Raw category (as received from source — often messy)
`city`	STRING	Business city
`county`	STRING	Business county
`keyword`	STRING	The search term
`source`	STRING	Data origin (8 sources — see below)
`month`	DATE	Month partition
`impressions`	INT64	Search impressions
`clicks`	INT64	Clicks (null for GMB source)
`avg_cpc`	FLOAT64	Average cost-per-click (Google Ads only)
`conversions`	FLOAT64	Conversions (Google Ads only)
`ctr`	FLOAT64	Click-through rate
`normalized_category`	STRING	NEW — Clean canonical category name
`keyword_intent`	STRING	NEW — `commercial`, `informational`, `navigational`, or `brand`
`is_fcr_client`	BOOL	NEW — TRUE if subscriber is an active FCR client
`source_type`	STRING	NEW — `paid`, `owned_organic`, `owned_gbp`, `directory`, `third_party`

8 Data Sources

Source	Source Type	What It Is	Rows	Subscribers
`gsc_organic`	owned_organic	Google Search Console — FCR-built websites	12.6M	1,702
`google_ads`	paid	Google Ads — actual search terms that triggered ads	7.0M	703
`gsc_goldenpages`	directory	GSC for Golden Pages — all GP directory listings	3.5M	86,944
`gsc_getlocal`	directory	GSC for GetLocal directory	2.7M	1
`gmb_search`	owned_gbp	Google Business Profile — discovery keywords	2.5M	4,717
`gsc_phonebook`	directory	GSC for Phonebook directory	315K	1
`ahrefs_organic`	third_party	Ahrefs — organic keyword rankings	5.7K	267
`ahrefs_tracked`	third_party	Ahrefs — manually tracked keywords	3.0K	225

Important distinctions:

gsc_goldenpages subscriber_id values are GP listing IDs (86K+), not FCR client IDs. Use is_fcr_client to distinguish.
gmb_search has no click data — the GBP Performance API doesn't provide keyword-level clicks.
google_ads data is search terms (what people actually typed), not campaign keywords.
All data is cumulative over time, not monthly snapshots. Use GROUP BY month for monthly breakdowns.

2. New Columns (Added March 2026)

`normalized_category`

Problem solved: The raw category field had 12,565 distinct values — many were business names with postcodes ("Cunninghams Funeral Directors Dublin D15"), duplicates with different casing ("Funeral director" vs "Funeral Directors"), or synonyms ("Funeral home" = "Funeral director").

How it works: A lookup table CATEGORY_NORMALISATION maps every raw category to a clean canonical name via rules:

Strip eircode patterns (e.g. D15, A91, T12)
Strip business names ending with Irish town/city names
Filter out very long entries (>50 chars) that are clearly business names
Split comma-separated categories (take first segment)
Normalise case (Title Case)
Merge synonyms (Funeral Home → Funeral Director, Tyre Shop → Tire Shop, etc.)

Result: 12,565 raw categories → 9,447 normalised categories. Unknown/unmappable entries tagged as "Unknown".

`keyword_intent`

Problem solved: Informational queries (death notices, "what to wear to a funeral") were mixed with commercial queries ("funeral directors near me"), making keyword lists misleading for sales and LRC.

Classification rules (in priority order):

navigational — references rip.ie, goldenpages, google maps, tripadvisor, etc.
informational — death notices, obituaries, how-to/what-is patterns
commercial — "near me", cost/price/quote signals, locality patterns, Google Ads clicks
commercial — all Google Ads impressions (even without clicks)
commercial — all GMB discovery queries (someone looking for the service)
informational — everything else (safe default)

Impact: For Funeral Directors, this separated 8.4M informational impressions (death notices) from 3.4M commercial impressions — the data that actually matters for selling.

`is_fcr_client`

Problem solved: The gsc_goldenpages source showed 418 "subscribers" for funeral directors — but only ~95 are actual FCR clients. The rest are GP directory listing IDs.

How it works: JOIN against active_clients on subscriber_id. TRUE = active FCR client, FALSE = directory listing or non-client.

`source_type`

Groups the 8 sources into 5 meaningful tiers for reporting: paid, owned_organic, owned_gbp, directory, third_party.

3. Monthly Maintenance Pipeline

When to Run

After the monthly data load — when new GSC, GA, GMB performance, and Ahrefs data has been inserted into KEYWORD_INTELLIGENCE. The underlying data lands monthly, so normalisation runs monthly.

What to Run

Open the BQ console and run data/keyword-intelligence/00_run_all.sql as a single script. On subsequent runs after the first, it only processes new/unclassified rows (WHERE column IS NULL), so it's fast.

The script executes 6 steps:

Step	What	Touches
0	Add columns (IF NOT EXISTS — no-op after first run)	KEYWORD_INTELLIGENCE schema
1	Update CATEGORY_NORMALISATION + backfill `normalized_category`	CATEGORY_NORMALISATION table + KI rows
2	Classify `keyword_intent` on new rows	KI rows
3	Set `is_fcr_client` + `source_type` on new rows	KI rows
4	Recreate `KEYWORD_INTELLIGENCE_MONTHLY` view	View (auto-reflects changes)
5	Rebuild `CATEGORY_BENCHMARKS` + `CATEGORY_TOP_KEYWORDS`	These tables are fully replaced

Adding New Synonyms

When a new category synonym is discovered (e.g. "Heating Engineer" should map to "Plumber"), add a WHEN clause to the synonym CASE block in step 1 of 00_run_all.sql, then re-run. The MERGE will update existing mappings.

Adding New Intent Rules

When a keyword pattern is misclassified (e.g. "dublin deaths" tagged as commercial because it contains a locality), add the pattern to the appropriate REGEXP_CONTAINS block in step 2. New rules apply to all unclassified rows, but to reclassify existing rows, you'd need to NULL out keyword_intent for the affected rows first.

4. Downstream Impact

4.1 Local Rank Check (LRC) — Keyword Selection

Where keywords are chosen: The /prospect skill selects exactly 5 keywords for LRC submission.

Before normalisation: Keywords were selected from categoryKeywords.topKeywords sorted by raw impressions. This meant death notices, rip.ie queries, and other informational noise dominated the top keyword lists for categories like Funeral Directors, Solicitors, and Medical Clinics. The LRC would check rankings for keywords nobody uses to find a funeral director.

After normalisation: The keyword_intent field allows the prospect skill to filter for commercial intent keywords only. The top keywords by impressions are now genuinely the terms people use when looking for the service:

Before (raw)	After (intent-filtered)
"rip ie recent deaths south county dublin" (10,800 imp)	"funeral directors dublin" (56,965 imp)
"death notices dublin" (9,550 imp)	"funeral and cremation price list" (36,462 imp)
"latest death notices" (9,000 imp)	"funeral home near me" (23,691 imp)

Action needed: Update the prospect skill to prefer keyword_intent = 'commercial' when selecting keywords. See Report 2.

4.2 Digital Footprint (InSites) — Keywords Parameter

Where keywords are used: When submitting a DF audit via the n8n webhook, the products field takes 3+ keywords. These determine what InSites evaluates the website against.

Impact: Same as LRC — better keyword selection means InSites evaluates the site against commercially relevant terms, not informational noise.

4.3 Ahrefs — Keyword Volume Lookups

Where keywords are used: After LRC keywords are confirmed, the prospect skill queries Ahrefs for national monthly search volume:

mcp__claude_ai_Ahrefs__keywords-explorer-overview
  keywords: {confirmed keywords}
  country: ie

Impact: The keywords sent to Ahrefs are now commercially relevant, so the volume and difficulty data returned is meaningful for the sales pitch. Previously, sending "death notices dublin" to Ahrefs would return high volume but completely irrelevant data.

4.4 Category Benchmarks

Before: CATEGORY_BENCHMARKS was built from raw category values. "Funeral director" and "Funeral home" were separate benchmark entries. Peer subscriber counts included GP directory listings.

After: Benchmarks are built from normalized_category. One "Funeral Director" benchmark row with:

total_fcr_subscribers: 95 (actual FCR clients)
commercial_keywords: 19,516 / commercial_impressions: 3,369,704
informational_keywords: 24,769 / informational_impressions: 8,397,493

This allows the category-keywords skill and prospect-intel endpoint to present accurate peer comparisons.

4.5 Keyword Gap Analysis

Before: Gap analysis compared subscriber keywords against noisy category top keywords. A subscriber might show 90% "capture rate" because they ranked for death notices, but miss the commercial terms that matter.

After: CATEGORY_TOP_KEYWORDS now includes keyword_intent per keyword. Gap analysis can focus on commercial keyword gaps — the ones that actually represent missed business opportunities.

4.6 Google Keyword Planner Seeds

Where used: prospect-intel.js takes top BQ keywords and sends them to Google Keyword Planner for county-level monthly volumes.

Impact: KP seed keywords are now cleaner. Previously, informational queries could leak into the seed list, returning irrelevant local volume data.

5. Data Quality Notes

Known Limitations

GMB has no click data. Every gmb_search row has clicks = null. This is a Google API limitation, not a data issue.
Ahrefs coverage is thin. Only 267 subscribers with 1 month of data (Feb 2026). Will improve over time.
Intent classification is rule-based. Some edge cases will be misclassified (e.g. "kildare deaths" tagged commercial because it contains a locality). Refine rules monthly as patterns are discovered.
"Unknown" categories. 2.7M rows couldn't be normalised (empty raw category or business-name entries). These are excluded from benchmarks and views but the underlying data is preserved.

Validation Query

Run this monthly after the pipeline to check health:

SELECT
  COUNTIF(normalized_category IS NULL) as null_norm,
  COUNTIF(keyword_intent IS NULL) as null_intent,
  COUNTIF(is_fcr_client IS NULL) as null_fcr,
  COUNTIF(source_type IS NULL) as null_source,
  COUNT(*) as total
FROM `listingmanager-1529856313699.fcr_operations.KEYWORD_INTELLIGENCE`

All four null counts should be 0 after a complete run.

FCR Dashboard documentation · generated from docs/ · keep counts verified, not guessed.