Building a touchpoint taxonomy for lead source accuracy
Why a taxonomy matters
Lead source accuracy depends on consistent naming. Without a taxonomy, one team may use utm_medium=social, another may send utm_medium=paid-social, and a third may omit the parameter entirely. Each variation fragments reporting, inflates direct traffic, and hides the channels that drive awareness. A simple, enforceable taxonomy turns scattered labels into a single source of truth.
Core design principles
- Define the smallest set of allowed values that meets current needs.
- Prefer prescriptive lists over free text for source and medium.
- Separate how campaigns are tagged from how they are grouped into channels.
- Enforce normalization at ingest and produce clean data in the warehouse.
Standard UTM policy
Use lowercase, hyphen-separated words. Keep fields short and unambiguous.
- utm_source: vendor or property, e.g., google, instagram, newsletter
- utm_medium: one of cpc, email, social, referral, affiliate, display, video, audio, organic, direct
- utm_campaign: marketing initiative, e.g., fall-sale-202509
- utm_content: creative or placement id, e.g., carousel-a
- utm_term: paid search keyword when applicable
If a parameter is missing, supply a default at collection time. For example, an empty medium for a known paid partner can be mapped to display. Do not perform these fixes inside dashboards; apply them once during ingestion so downstream consumers see the same values.
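One way to apply this policy at ingestion is sketched below. It is a minimal illustration, not a reference implementation: the function names, the DEFAULT_MEDIUM_BY_SOURCE table, and the example-ad-network partner are assumptions made up for the example, and the cleaning rule (anything that is not a letter or digit becomes a hyphen) is one possible reading of "lowercase, hyphen-separated".

```python
import re
from urllib.parse import urlsplit, parse_qs

# Allowed utm_medium values, taken from the policy list.
ALLOWED_MEDIUMS = {
    "cpc", "email", "social", "referral", "affiliate",
    "display", "video", "audio", "organic", "direct",
}

# Hypothetical table: known paid partners whose links sometimes omit utm_medium.
DEFAULT_MEDIUM_BY_SOURCE = {"example-ad-network": "display"}

UTM_FIELDS = ("utm_source", "utm_medium", "utm_campaign", "utm_content", "utm_term")


def clean_value(raw: str) -> str:
    """Lowercase the value and collapse non-alphanumeric runs into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", raw.strip().lower()).strip("-")


def normalize_utms(url: str) -> dict:
    """Extract and clean UTM parameters from a landing URL."""
    query = parse_qs(urlsplit(url).query)
    utms = {field: clean_value(query.get(field, [""])[0]) for field in UTM_FIELDS}
    # Supply a default medium only for sources explicitly listed as paid partners.
    if not utms["utm_medium"]:
        utms["utm_medium"] = DEFAULT_MEDIUM_BY_SOURCE.get(utms["utm_source"], "")
    # Flag rather than silently rewrite values outside the allowed list.
    utms["medium_valid"] = utms["utm_medium"] in ALLOWED_MEDIUMS
    return utms
```

Flagging an invalid medium instead of guessing a replacement keeps the correction decision with whoever owns the dictionary.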
Channel mapping rules
Translate source and medium pairs into a canonical channel dimension. Keep mappings explicit and audited in version control.
Examples:
- google + cpc -> paid search
- bing + cpc -> paid search
- instagram + social -> paid social
- newsletter + email -> email
- empty source + direct -> direct
Ambiguous or missing inputs should map to unclassified until governance decisions are made. This makes gaps visible so the team can fix upstream links or expand the dictionary.
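The mapping rules can be kept as a plain lookup table, which is easy to review in version control. The sketch below uses only the example pairs listed above; the names CHANNEL_MAP and map_channel are illustrative, and a real dictionary would carry more entries.

```python
# Explicit (source, medium) -> channel pairs, mirroring the examples above.
CHANNEL_MAP = {
    ("google", "cpc"): "paid search",
    ("bing", "cpc"): "paid search",
    ("instagram", "social"): "paid social",
    ("newsletter", "email"): "email",
    ("", "direct"): "direct",
}


def map_channel(source, medium):
    """Return the canonical channel, or 'unclassified' when no rule matches."""
    key = ((source or "").strip().lower(), (medium or "").strip().lower())
    return CHANNEL_MAP.get(key, "unclassified")
```

Because there is no fuzzy matching, a pair such as tiktok + social stays unclassified until someone adds it to the table on purpose.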
Event and property schema
Adopt a compact set of events for acquisition and conversion. Recommended names:
- page_view
- lead_form_viewed
- lead_form_started
- lead_form_submitted
- signup_started
- signup_submitted
- account_created
Common properties:
- source, medium, campaign, content, term
- channel
- session_id, device_id, user_id when available
- touchpoint_id (hash of device, timestamp, and URL)
- first_touch_ref, last_touch_ref objects when known
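The text specifies touchpoint_id only as a hash of device, timestamp, and URL. The sketch below fills in the rest with assumptions: SHA-256 as the hash, millisecond precision for the timestamp, and a unit-separator character between the fields so that adjacent values cannot run together.

```python
import hashlib
from datetime import datetime, timezone


def touchpoint_id(device_id: str, timestamp: datetime, url: str) -> str:
    """Derive a deterministic id from device, timestamp, and URL."""
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Convert to UTC first so the same instant always hashes to the same id.
    ts = timestamp.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    # Assumed field separator: the ASCII unit separator, unlikely to occur in inputs.
    payload = "\x1f".join((device_id, ts, url))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A deterministic id means a retried or replayed event produces the same touchpoint_id, which is what makes later deduplication possible.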
Naming conventions
- Keep event names lowercase with underscores.
- Use ISO timestamps in UTC.
- Avoid spaces and punctuation in parameter values.
- For recurring campaigns, append a period marker like yyyymm or qN.
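These conventions are simple enough to check mechanically. The patterns below are one possible encoding, not a standard: they assume the period marker is attached with a hyphen (as in fall-sale-202509) and that qN means q1 through q4.

```python
import re
from datetime import datetime, timezone

# Lowercase words joined by underscores, e.g. lead_form_submitted.
EVENT_NAME = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")
# Lowercase words joined by hyphens, with no spaces or punctuation.
PARAM_VALUE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
# Assumed suffix forms: -yyyymm (month 01-12) or -q1 .. -q4.
PERIOD_SUFFIX = re.compile(r"-(?:\d{4}(?:0[1-9]|1[0-2])|q[1-4])$")


def is_valid_event_name(name: str) -> bool:
    return EVENT_NAME.fullmatch(name) is not None


def is_valid_param_value(value: str) -> bool:
    return PARAM_VALUE.fullmatch(value) is not None


def has_period_marker(campaign: str) -> bool:
    return PERIOD_SUFFIX.search(campaign) is not None


def utc_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 string in UTC."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return moment.astimezone(timezone.utc).isoformat()
```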
Governance workflow
Data quality improves when ownership is clear and feedback is quick. A lightweight process is sufficient for most teams:
- Maintain the UTM and channel dictionary in a shared repository.
- Validate incoming parameters at the collector or ETL tier.
- Reject or quarantine events with invalid values.
- Review weekly: unclassified shares, top sources, and new patterns.
- Announce changes with effective dates to keep dashboards consistent.
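The validate-then-quarantine steps might look like the following sketch. The event shape (a dict with source and medium keys) and the function names are assumptions for illustration. Direct traffic is allowed to have an empty source, matching the "empty source + direct" mapping rule.

```python
ALLOWED_MEDIUMS = frozenset({
    "cpc", "email", "social", "referral", "affiliate",
    "display", "video", "audio", "organic", "direct",
})


def validate_event(event):
    """Return a list of problems; an empty list means the event passes."""
    problems = []
    source = event.get("source") or ""
    medium = event.get("medium") or ""
    if not medium:
        problems.append("missing medium")
    elif medium not in ALLOWED_MEDIUMS:
        problems.append(f"invalid medium: {medium}")
    # Direct traffic legitimately has no source; everything else needs one.
    if not source and medium != "direct":
        problems.append("missing source")
    return problems


def route_events(events):
    """Split events into accepted ones and quarantined ones with reasons attached."""
    accepted, quarantined = [], []
    for event in events:
        problems = validate_event(event)
        if problems:
            quarantined.append({"event": event, "problems": problems})
        else:
            accepted.append(event)
    return accepted, quarantined
```

Keeping the reasons next to each quarantined event gives the weekly review a ready-made list of new patterns to resolve.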
Implementation checklist
- Create a campaign dictionary with allowed values and examples.
- Add a server endpoint that normalizes UTMs and enriches channel.
- Store a durable first_touch_ref and pass it through conversion events.
- Dedupe events by touchpoint_id plus event key.
- Build a QA report that flags unclassified or missing medium.
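The deduplication item can be sketched as a first-seen filter. The text does not define "event key", so this example assumes it is the event name stored under an event field; substitute whatever key your pipeline uses.

```python
def dedupe_events(events):
    """Keep the first event seen for each (touchpoint_id, event name) pair."""
    seen = set()
    unique = []
    for event in events:
        # Assumption: the "event key" is the event name in the "event" field.
        key = (event["touchpoint_id"], event["event"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(event)
    return unique
```

The same touchpoint can still carry different events (for example page_view and lead_form_submitted), since only exact repeats of the pair are dropped.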
Measuring success
Track three ratios over time:
- the share of conversions with a valid source and medium
- the share of traffic mapped to a canonical channel
- the size of the unclassified and direct buckets
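A QA report could compute these ratios along the following lines. The record shape (dicts with source, medium, and channel keys) and the names share and taxonomy_health are assumptions for the sketch. "Valid" is taken here to mean a non-empty source plus a medium from the allowed list.

```python
ALLOWED_MEDIUMS = frozenset({
    "cpc", "email", "social", "referral", "affiliate",
    "display", "video", "audio", "organic", "direct",
})


def share(records, predicate):
    """Fraction of records that satisfy the predicate; 0.0 for an empty input."""
    records = list(records)
    if not records:
        return 0.0
    return sum(1 for record in records if predicate(record)) / len(records)


def taxonomy_health(conversions, traffic):
    """Compute the tracking ratios from conversion and traffic records."""
    return {
        "valid_source_medium": share(
            conversions,
            lambda r: bool(r.get("source")) and r.get("medium") in ALLOWED_MEDIUMS,
        ),
        "mapped_channel": share(
            traffic, lambda r: r.get("channel") not in (None, "", "unclassified")
        ),
        "unclassified": share(traffic, lambda r: r.get("channel") == "unclassified"),
        "direct": share(traffic, lambda r: r.get("channel") == "direct"),
    }
```

Running this on a fixed schedule and storing the result gives a trend line rather than a one-off snapshot.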
When the first two shares rise, the unclassified and direct buckets shrink, and the numbers hold steady, the taxonomy is working. Campaign review becomes faster, budget shifts are grounded in consistent data, and teams develop a shared language for acquisition and retention.