KEYWORD_INTELLIGENCE — Monthly Normalisation Pipeline

Overview

These SQL scripts add 5 columns to the KEYWORD_INTELLIGENCE table and rebuild downstream benchmark tables from the normalised data. Run monthly after the GSC/GA/GMB data load.

Columns

Column Type Description
normalized_category STRING Clean canonical category name (via CATEGORY_NORMALISATION lookup)
keyword_intent STRING commercial, informational, navigational, or brand
is_fcr_client BOOL TRUE if subscriber_id is an active FCR client
source_type STRING paid, owned_organic, owned_gbp, directory, third_party
keyword_location STRING Irish city/county extracted from the keyword, or NULL

Execution Order

Run in BQ console in this order, monthly after data load:

1. ALTER TABLE — add columns (one-time only, first run)
2. 01_category_normalisation.sql — populate CATEGORY_NORMALISATION table, backfill normalized_category
3. 02_keyword_intent.sql — classify keyword_intent on new rows (navigational, brand, commercial, informational)
4. 02b_brand_and_location.sql — improved brand detection + keyword_location extraction
5. 03_fcr_client_and_source_type.sql — set is_fcr_client and source_type on new rows
6. 04_monthly_view.sql — create/replace the KEYWORD_INTELLIGENCE_MONTHLY view
7. 05_rebuild_benchmarks.sql — rebuild CATEGORY_BENCHMARKS and CATEGORY_TOP_KEYWORDS

One-Time Setup (First Run)

Before running the scripts for the first time, add the columns:

ALTER TABLE `listingmanager-1529856313699.fcr_operations.KEYWORD_INTELLIGENCE`
  ADD COLUMN IF NOT EXISTS normalized_category STRING,
  ADD COLUMN IF NOT EXISTS keyword_intent STRING,
  ADD COLUMN IF NOT EXISTS is_fcr_client BOOL,
  ADD COLUMN IF NOT EXISTS source_type STRING,
  ADD COLUMN IF NOT EXISTS keyword_location STRING;

Then run all scripts. The first run will backfill all existing rows. Subsequent monthly runs only process new/unclassified rows (WHERE column IS NULL).

What Each Script Does

01 — Category Normalisation

Creates CATEGORY_NORMALISATION lookup table. Strips eircodes, business names, normalises case, merges synonyms (Funeral Directors → Funeral Director etc.). Updates normalized_category on KEYWORD_INTELLIGENCE.

02 — Keyword Intent (Initial Pass)

Classifies keywords by intent using rules:

  • navigational: rip.ie, goldenpages, google maps, yelp, tripadvisor
  • brand: business name match (full name LIKE) or >85% subscriber click concentration
  • commercial: "near me", cost/price patterns, locality + service, paid clicks, GBP discovery
  • informational: how-to/what-is patterns, death notices, default fallback

02b — Brand Refinement + Location Extraction

Two-phase refinement run AFTER step 02:

Phase 1 — Improved brand detection: Uses subscriber click concentration (>70%, <=3 subs) with commercial pattern exclusion. Catches keywords like "frank buttimer solicitor cork" and "valentine ladders" that step 02's business name matching misses (because the keyword contains only part of the business name). Reclassifies ~4M rows from commercial/informational to brand.

Phase 2 — keyword_location extraction: Matches keywords against 730+ Irish cities and counties (from active_clients). Longest match wins — "electrician naas kildare" → "Kildare". Tags ~6M rows. NULL means no location in the keyword (e.g. "solicitor near me").

03 — FCR Client + Source Type

Flags FCR clients (JOIN against active_clients snapshot) and maps sources to tiers:

  • paid = google_ads
  • owned_organic = gsc_organic
  • owned_gbp = gmb_search
  • directory = gsc_goldenpages
  • third_party = everything else

04 — Monthly View

Creates a pre-aggregated monthly view for fast category-level queries.

05 — Rebuild Benchmarks

Rebuilds CATEGORY_BENCHMARKS and CATEGORY_TOP_KEYWORDS from normalised data, now including:

  • total_fcr_subscribers — real FCR client count (not inflated GP directory counts)
  • commercial_keywords / commercial_impressions — commercial intent only
  • informational_keywords / informational_impressions
  • brand_keywords / navigational_keywords
  • keyword_intent per keyword in CATEGORY_TOP_KEYWORDS
  • appearing_in_fcr_subs — FCR clients ranking for each keyword

Known Issues

Category Mapping Mismatch

Three naming systems exist:

  • GP: "Dental Surgeons", "Funeral Directors", "Solicitors"
  • Google GBP: "Dental Clinic", "Funeral Director", "Legal Services"
  • KI normalized: "Dentist", "Funeral Director", "Law Firm"

CATEGORY_MAPPING bridges GP↔Google but normalized_category is a third set of names. The worker handlers now match against BOTH gp_category_name AND google_category_name from CATEGORY_MAPPING, plus the raw category column, to ensure results are found regardless of which naming system applies. Long-term fix: align step 01 normalisation to use CATEGORY_MAPPING as the reference.

Keyword Planner CPC

Google returns zero bid estimates for keywords with too few advertisers in a location. The worker handler returns estimated_cpc_eur: null in these cases — the advisor should omit CPC rather than show EUR0.00.

Maintaining the Synonym List

The synonym mapping in 01_category_normalisation.sql covers the most common duplicates. To add new synonyms:

  1. Query for the raw category values you want to merge
  2. Add a new WHEN clause to the synonym CASE block
  3. Re-run step 01 — the MERGE will update existing mappings

Maintaining the Brand Detection

The brand detection in 02b_brand_and_location.sql uses subscriber click concentration. If you find false positives (generic keywords misclassified as brand), increase the concentration threshold (currently 0.70) or add patterns to the commercial exclusion regex. If you find false negatives (brand keywords still showing as commercial), lower the threshold or increase the subscriber count limit (currently <=3).

Maintaining the Location List

Location names come from active_clients cities and counties. As new clients are added in new towns, the location list grows automatically. Excluded short/ambiguous names (new, old, cross, bridge, hill, park, etc.) are hardcoded in 02b_brand_and_location.sql — add to this list if new false positives appear.

FCR Dashboard documentation · generated from docs/ · keep counts verified, not guessed.

Ask the docsRAG over this site
Ask anything about the FCR Dashboard platform — architecture, BigQuery, the worker routes, billing rules, the LRC stack, scoring… Answers are grounded in this documentation, with source links.
How does the deal-brief refresh work? Which routes are Worker vs n8n? How is account health scored?