KEYWORD_INTELLIGENCE — Monthly Normalisation Pipeline
Overview
These SQL scripts add 5 columns to the KEYWORD_INTELLIGENCE table and rebuild downstream benchmark tables from the normalised data. Run monthly after the GSC/GA/GMB data load.
Columns
| Column | Type | Description |
|---|---|---|
normalized_category |
STRING | Clean canonical category name (via CATEGORY_NORMALISATION lookup) |
keyword_intent |
STRING | commercial, informational, navigational, or brand |
is_fcr_client |
BOOL | TRUE if subscriber_id is an active FCR client |
source_type |
STRING | paid, owned_organic, owned_gbp, directory, third_party |
keyword_location |
STRING | Irish city/county extracted from the keyword, or NULL |
Execution Order
Run in BQ console in this order, monthly after data load:
1. ALTER TABLE — add columns (one-time only, first run)
2. 01_category_normalisation.sql — populate CATEGORY_NORMALISATION table, backfill normalized_category
3. 02_keyword_intent.sql — classify keyword_intent on new rows (navigational, brand, commercial, informational)
4. 02b_brand_and_location.sql — improved brand detection + keyword_location extraction
5. 03_fcr_client_and_source_type.sql — set is_fcr_client and source_type on new rows
6. 04_monthly_view.sql — create/replace the KEYWORD_INTELLIGENCE_MONTHLY view
7. 05_rebuild_benchmarks.sql — rebuild CATEGORY_BENCHMARKS and CATEGORY_TOP_KEYWORDS
One-Time Setup (First Run)
Before running the scripts for the first time, add the columns:
ALTER TABLE `listingmanager-1529856313699.fcr_operations.KEYWORD_INTELLIGENCE`
ADD COLUMN IF NOT EXISTS normalized_category STRING,
ADD COLUMN IF NOT EXISTS keyword_intent STRING,
ADD COLUMN IF NOT EXISTS is_fcr_client BOOL,
ADD COLUMN IF NOT EXISTS source_type STRING,
ADD COLUMN IF NOT EXISTS keyword_location STRING;
Then run all scripts. The first run will backfill all existing rows. Subsequent monthly runs only process new/unclassified rows (WHERE column IS NULL).
What Each Script Does
01 — Category Normalisation
Creates CATEGORY_NORMALISATION lookup table. Strips eircodes, business names, normalises case, merges synonyms (Funeral Directors → Funeral Director etc.). Updates normalized_category on KEYWORD_INTELLIGENCE.
02 — Keyword Intent (Initial Pass)
Classifies keywords by intent using rules:
- navigational: rip.ie, goldenpages, google maps, yelp, tripadvisor
- brand: business name match (full name LIKE) or >85% subscriber click concentration
- commercial: "near me", cost/price patterns, locality + service, paid clicks, GBP discovery
- informational: how-to/what-is patterns, death notices, default fallback
02b — Brand Refinement + Location Extraction
Two-phase refinement run AFTER step 02:
Phase 1 — Improved brand detection: Uses subscriber click concentration (>70%, <=3 subs) with commercial pattern exclusion. Catches keywords like "frank buttimer solicitor cork" and "valentine ladders" that step 02's business name matching misses (because the keyword contains only part of the business name). Reclassifies ~4M rows from commercial/informational to brand.
Phase 2 — keyword_location extraction: Matches keywords against 730+ Irish cities and counties (from active_clients). Longest match wins — "electrician naas kildare" → "Kildare". Tags ~6M rows. NULL means no location in the keyword (e.g. "solicitor near me").
03 — FCR Client + Source Type
Flags FCR clients (JOIN against active_clients snapshot) and maps sources to tiers:
paid= google_adsowned_organic= gsc_organicowned_gbp= gmb_searchdirectory= gsc_goldenpagesthird_party= everything else
04 — Monthly View
Creates a pre-aggregated monthly view for fast category-level queries.
05 — Rebuild Benchmarks
Rebuilds CATEGORY_BENCHMARKS and CATEGORY_TOP_KEYWORDS from normalised data, now including:
total_fcr_subscribers— real FCR client count (not inflated GP directory counts)commercial_keywords/commercial_impressions— commercial intent onlyinformational_keywords/informational_impressionsbrand_keywords/navigational_keywordskeyword_intentper keyword in CATEGORY_TOP_KEYWORDSappearing_in_fcr_subs— FCR clients ranking for each keyword
Known Issues
Category Mapping Mismatch
Three naming systems exist:
- GP: "Dental Surgeons", "Funeral Directors", "Solicitors"
- Google GBP: "Dental Clinic", "Funeral Director", "Legal Services"
- KI normalized: "Dentist", "Funeral Director", "Law Firm"
CATEGORY_MAPPING bridges GP↔Google but normalized_category is a third set of names. The worker handlers now match against BOTH gp_category_name AND google_category_name from CATEGORY_MAPPING, plus the raw category column, to ensure results are found regardless of which naming system applies. Long-term fix: align step 01 normalisation to use CATEGORY_MAPPING as the reference.
Keyword Planner CPC
Google returns zero bid estimates for keywords with too few advertisers in a location. The worker handler returns estimated_cpc_eur: null in these cases — the advisor should omit CPC rather than show EUR0.00.
Maintaining the Synonym List
The synonym mapping in 01_category_normalisation.sql covers the most common duplicates. To add new synonyms:
- Query for the raw category values you want to merge
- Add a new WHEN clause to the synonym CASE block
- Re-run step 01 — the MERGE will update existing mappings
Maintaining the Brand Detection
The brand detection in 02b_brand_and_location.sql uses subscriber click concentration. If you find false positives (generic keywords misclassified as brand), increase the concentration threshold (currently 0.70) or add patterns to the commercial exclusion regex. If you find false negatives (brand keywords still showing as commercial), lower the threshold or increase the subscriber count limit (currently <=3).
Maintaining the Location List
Location names come from active_clients cities and counties. As new clients are added in new towns, the location list grows automatically. Excluded short/ambiguous names (new, old, cross, bridge, hill, park, etc.) are hardcoded in 02b_brand_and_location.sql — add to this list if new false positives appear.
FCR Dashboard documentation · generated from docs/ · keep counts verified, not guessed.