Building a touchpoint taxonomy for lead source accuracy
Why a taxonomy matters
Lead source accuracy depends on consistent naming. Without a taxonomy, one team may use utm_medium=social, another may send utm_medium=paid-social, and a third may omit the parameter entirely. Each variation fragments reporting, inflates direct traffic, and hides the channels that drive awareness. A simple, enforceable taxonomy turns scattered labels into a single source of truth.
Core design principles
- Define the smallest set of allowed values that meets current needs.
- Prefer prescriptive lists over free text for source and medium.
- Separate how campaigns are tagged from how they are grouped into channels.
- Enforce normalization at ingest and produce clean data in the warehouse.
Standard UTM policy
Use lowercase, hyphen-separated words. Keep fields short and unambiguous.
- utm_source: vendor or property, e.g., google, instagram, newsletter
- utm_medium: one of cpc, email, social, referral, affiliate, display, video, audio, organic, direct
- utm_campaign: marketing initiative, e.g., fall-sale-202509
- utm_content: creative or placement id, e.g., carousel-a
- utm_term: paid search keyword when applicable
If a parameter is missing, supply a default at collection time. For example, empty medium for a known paid partner can be mapped to display. Do not perform these fixes inside dashboards; apply them once during ingestion so downstream consumers see the same values.
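One way to apply this policy at ingestion is sketched below. It is a minimal Python example, not a reference implementation: the names `slugify` and `normalize_utms`, and the `PAID_PARTNER_DEFAULT_MEDIUM` table with its placeholder partner `example-ad-network`, are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse, parse_qs

UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_content", "utm_term")

# Hypothetical lookup: known paid partners and the medium to assume
# when their links arrive without utm_medium.
PAID_PARTNER_DEFAULT_MEDIUM = {"example-ad-network": "display"}


def slugify(value: str) -> str:
    """Lowercase the value and collapse runs of other characters into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", value.strip().lower()).strip("-")


def normalize_utms(url: str) -> dict:
    """Extract the five UTM fields from a landing URL and normalize them."""
    query = parse_qs(urlparse(url).query)
    utms = {key: slugify(query.get(key, [""])[0]) for key in UTM_KEYS}
    # Supply a default medium only for sources we explicitly know about.
    if not utms["utm_medium"]:
        utms["utm_medium"] = PAID_PARTNER_DEFAULT_MEDIUM.get(utms["utm_source"], "")
    return utms
```

Calling `normalize_utms` on a URL tagged with `utm_source=Google&utm_medium=CPC&utm_campaign=Fall%20Sale%20202509` would yield google, cpc, and fall-sale-202509, with the missing content and term fields left as empty strings.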
Channel mapping rules
Translate source and medium pairs into a canonical channel dimension. Keep mappings explicit and audited in version control.
Examples:
- google + cpc -> paid search
- bing + cpc -> paid search
- instagram + social -> paid social
- newsletter + email -> email
- empty source + direct -> direct
Ambiguous or missing inputs should map to unclassified until governance decisions are made. This makes gaps visible so the team can fix upstream links or expand the dictionary.
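The example pairs above can be held in an explicit lookup table that lives in version control. The sketch below assumes inputs are already normalized to lowercase; `CHANNEL_MAP` and `map_channel` are illustrative names, not part of any particular tool.

```python
# Explicit (source, medium) -> channel pairs, taken from the examples above.
CHANNEL_MAP = {
    ("google", "cpc"): "paid search",
    ("bing", "cpc"): "paid search",
    ("instagram", "social"): "paid social",
    ("newsletter", "email"): "email",
    ("", "direct"): "direct",  # empty source with medium=direct
}


def map_channel(source: str, medium: str) -> str:
    """Return the canonical channel, or 'unclassified' when no rule matches."""
    return CHANNEL_MAP.get((source, medium), "unclassified")
```

Because the fallback is a fixed label rather than a guess, any pair missing from the dictionary shows up in reporting instead of being silently folded into another channel.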
Event and property schema
Adopt a compact set of events for acquisition and conversion. Recommended names:
- page_view
- lead_form_viewed
- lead_form_started
- lead_form_submitted
- signup_started
- signup_submitted
- account_created
Common properties:
- source, medium, campaign, content, term
- channel
- session_id, device_id, user_id when available
- touchpoint_id (hash of device, timestamp, and URL)
- first_touch_ref, last_touch_ref objects when known
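The touchpoint_id property is described only as a hash of device, timestamp, and URL, so the details below are assumptions: SHA-256 as the hash, a millisecond-precision UTC timestamp, and a unit-separator character between fields. The function name `make_touchpoint_id` is also hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def make_touchpoint_id(device_id: str, timestamp: datetime, url: str) -> str:
    """Derive a deterministic id from device, timestamp, and URL."""
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Convert to UTC first so the same instant always hashes the same way.
    ts = timestamp.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    # A separator that cannot appear in normal field values avoids
    # collisions between differently split inputs.
    payload = "\x1f".join((device_id, ts, url))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A deterministic id means a retried or replayed event produces the same value, which is what makes later deduplication possible.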
Naming conventions
- Keep event names lowercase with underscores.
- Use ISO timestamps in UTC.
- Avoid spaces and punctuation other than hyphens in parameter values.
- For recurring campaigns, append a period marker like yyyymm or qN.
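These conventions are simple enough to check with regular expressions. The patterns below are one possible reading, not a fixed standard: they assume the period marker is a hyphen-prefixed suffix, that yyyymm uses months 01 to 12, and that qN means q1 through q4.

```python
import re

# Lowercase words joined by underscores, e.g. lead_form_submitted.
EVENT_NAME = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")
# Lowercase words joined by hyphens, e.g. fall-sale-202509.
PARAM_VALUE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")
# Trailing period marker: -yyyymm or -qN (assumed formats).
PERIOD_SUFFIX = re.compile(r"-(\d{4}(0[1-9]|1[0-2])|q[1-4])$")


def is_valid_event_name(name: str) -> bool:
    return EVENT_NAME.fullmatch(name) is not None


def is_valid_param_value(value: str) -> bool:
    return PARAM_VALUE.fullmatch(value) is not None


def has_period_marker(campaign: str) -> bool:
    return PERIOD_SUFFIX.search(campaign) is not None
```

Checks like these can run in a link builder or in CI against the campaign dictionary, so bad names are caught before a campaign launches.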
Governance workflow
Data quality improves when ownership is clear and feedback is quick. A lightweight process is sufficient for most teams:
- Maintain the UTM and channel dictionary in a shared repository.
- Validate incoming parameters at the collector or ETL tier.
- Reject or quarantine events with invalid values.
- Review weekly: unclassified shares, top sources, and new patterns.
- Announce changes with effective dates to keep dashboards consistent.
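The validate-and-quarantine steps might look like the following at the collector or ETL tier. This is a sketch under assumptions: events are plain dictionaries with source and medium keys, and `validate_event` and `route_events` are hypothetical names.

```python
ALLOWED_MEDIUMS = frozenset({
    "cpc", "email", "social", "referral", "affiliate",
    "display", "video", "audio", "organic", "direct",
})


def validate_event(event: dict) -> list:
    """Return a list of human-readable problems; empty means the event is valid."""
    errors = []
    medium = event.get("medium", "")
    if not medium:
        errors.append("missing medium")
    elif medium not in ALLOWED_MEDIUMS:
        errors.append(f"invalid medium: {medium}")
    # Direct traffic legitimately has no source; everything else needs one.
    if not event.get("source") and medium != "direct":
        errors.append("missing source")
    return errors


def route_events(events: list) -> tuple:
    """Split events into accepted ones and quarantined ones with their errors."""
    accepted, quarantined = [], []
    for event in events:
        errors = validate_event(event)
        if errors:
            quarantined.append({"event": event, "errors": errors})
        else:
            accepted.append(event)
    return accepted, quarantined
```

Keeping the error list alongside each quarantined event gives the weekly review something concrete to act on, such as a partner that keeps sending paid-social as a medium.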
Implementation checklist
- Create a campaign dictionary with allowed values and examples.
- Add a server endpoint that normalizes UTMs and enriches channel.
- Store a durable first_touch_ref and pass it through conversion events.
- Dedupe events by touchpoint_id plus event key.
- Build a QA report that flags unclassified or missing medium.
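The deduplication item can be sketched in a few lines. The checklist does not define "event key", so this example assumes it means the event name stored under an `event` field; `dedupe_events` is a hypothetical helper.

```python
def dedupe_events(events: list) -> list:
    """Keep the first occurrence of each (touchpoint_id, event) pair, preserving order."""
    seen = set()
    unique = []
    for event in events:
        key = (event["touchpoint_id"], event["event"])  # assumed field names
        if key in seen:
            continue  # a retry or replay of an event we already kept
        seen.add(key)
        unique.append(event)
    return unique
```

In a warehouse the same rule would usually be expressed as a uniqueness constraint or a window function over the two columns; the in-memory version here only illustrates the key.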
Measuring success
Track three ratios over time:
- the share of conversions with a valid source and medium
- the share of traffic mapped to a canonical channel
- the size of the unclassified and direct buckets
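As a rough illustration, the ratios above can be computed from a list of records that carry source, medium, and channel. The sketch makes several assumptions: "valid" means a non-empty source plus a medium from the allowed list, all ratios are computed over the same record set, and `taxonomy_ratios` is a hypothetical name.

```python
ALLOWED_MEDIUMS = frozenset({
    "cpc", "email", "social", "referral", "affiliate",
    "display", "video", "audio", "organic", "direct",
})


def taxonomy_ratios(records: list) -> dict:
    """Compute taxonomy health shares over a list of event records."""
    total = len(records)

    def share(predicate) -> float:
        # Guard against division by zero on an empty reporting window.
        if total == 0:
            return 0.0
        return sum(1 for r in records if predicate(r)) / total

    return {
        "valid_source_medium": share(
            lambda r: bool(r.get("source")) and r.get("medium") in ALLOWED_MEDIUMS
        ),
        "mapped_channel": share(
            lambda r: r.get("channel") not in (None, "", "unclassified")
        ),
        "unclassified": share(lambda r: r.get("channel") == "unclassified"),
        "direct": share(lambda r: r.get("channel") == "direct"),
    }
```

Running this on a fixed schedule and storing the output gives the trend lines that the weekly review needs.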
When the first two shares rise, the unclassified and direct buckets shrink, and the numbers hold over time, the taxonomy is working. Campaign review becomes faster, budget shifts are grounded in consistent data, and teams develop a shared language for acquisition and retention.