Building a touchpoint taxonomy for lead source accuracy

Why a taxonomy matters

Lead source accuracy depends on consistent naming. Without a taxonomy, one team may use utm_medium=social, another may send utm_medium=paid-social, and a third may omit the parameter entirely. Each variation fragments reporting, inflates direct traffic, and hides the channels that drive awareness. A simple, enforceable taxonomy turns scattered labels into a single source of truth.

Core design principles

  • Define the smallest set of allowed values that meets current needs.
  • Prefer prescriptive lists over free text for source and medium.
  • Separate how campaigns are tagged from how they are grouped into channels.
  • Enforce normalization at ingest so the warehouse holds only clean values.

Standard UTM policy

Use lowercase, hyphen-separated words. Keep fields short and unambiguous.

  • utm_source: vendor or property, e.g., google, instagram, newsletter
  • utm_medium: one of cpc, email, social, referral, affiliate, display, video, audio, organic, direct
  • utm_campaign: marketing initiative, e.g., fall-sale-202509
  • utm_content: creative or placement id, e.g., carousel-a
  • utm_term: paid search keyword when applicable

If a parameter is missing, supply a default at collection time. For example, an empty medium from a known paid partner can be mapped to display. Do not perform these fixes inside dashboards; apply them once during ingestion so every downstream consumer sees the same values.
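
As a rough illustration of the policy above, here is a minimal Python sketch of an ingest-time normalizer. The function names, the DEFAULT_MEDIUM_BY_SOURCE table, and the example-ad-network partner are hypothetical and chosen only for illustration; a real collector would load its defaults from the shared dictionary.

```python
import re

UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_content", "utm_term")

# Hypothetical table: known paid partners and the medium to assume
# when their links arrive without one.
DEFAULT_MEDIUM_BY_SOURCE = {"example-ad-network": "display"}


def slugify(value):
    """Lowercase, trim, and collapse runs of non-alphanumerics into single hyphens."""
    value = (value or "").strip().lower()
    return re.sub(r"[^a-z0-9]+", "-", value).strip("-")


def normalize_utms(params):
    """Return all five UTM keys in normalized form; missing keys become ''."""
    clean = {key: slugify(params.get(key)) for key in UTM_KEYS}
    if not clean["utm_medium"]:
        # Supply a default only for sources listed in the table.
        clean["utm_medium"] = DEFAULT_MEDIUM_BY_SOURCE.get(clean["utm_source"], "")
    return clean
```

Note that this sketch also hyphenates utm_term, so a keyword such as "running shoes" becomes running-shoes. Teams that need the raw keyword may prefer to store it unmodified in a separate field.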

Channel mapping rules

Translate source and medium pairs into a canonical channel dimension. Keep mappings explicit and audited in version control.

Examples:

  • google + cpc -> paid search
  • bing + cpc -> paid search
  • instagram + social -> paid social
  • newsletter + email -> email
  • empty source + direct -> direct

Ambiguous or missing inputs should map to unclassified until governance decisions are made. This makes gaps visible so the team can fix upstream links or expand the dictionary.
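
One way to keep the mapping explicit is a plain lookup table that lives in version control. The sketch below encodes only the example pairs listed above; the names CHANNEL_RULES and map_channel are assumptions made for illustration.

```python
# Explicit (source, medium) -> channel rules, taken from the examples above.
CHANNEL_RULES = {
    ("google", "cpc"): "paid search",
    ("bing", "cpc"): "paid search",
    ("instagram", "social"): "paid social",
    ("newsletter", "email"): "email",
    ("", "direct"): "direct",
}

UNCLASSIFIED = "unclassified"


def map_channel(source, medium):
    """Look up the canonical channel; anything without a rule is unclassified."""
    key = ((source or "").strip().lower(), (medium or "").strip().lower())
    return CHANNEL_RULES.get(key, UNCLASSIFIED)
```

Because there is no wildcard or fuzzy matching, a new pair such as tiktok + social lands in unclassified until someone adds a rule, which is the visibility the policy asks for.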

Event and property schema

Adopt a compact set of events for acquisition and conversion. Recommended names:

  • page_view
  • lead_form_viewed
  • lead_form_started
  • lead_form_submitted
  • signup_started
  • signup_submitted
  • account_created

Common properties:

  • source, medium, campaign, content, term
  • channel
  • session_id, device_id, user_id when available
  • touchpoint_id (hash of device, timestamp, and URL)
  • first_touch_ref, last_touch_ref objects when known
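
The touchpoint_id property can be derived deterministically so that retries of the same hit produce the same id. The following is a sketch only: the choice of SHA-256, the millisecond precision, and the field separator are assumptions, not requirements of the schema.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def touchpoint_id(device_id, timestamp, url):
    """Hash device, UTC timestamp, and URL into a stable hex identifier."""
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Convert to UTC first so the same instant always hashes identically.
    ts = timestamp.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    # The unit-separator character keeps field boundaries unambiguous.
    payload = "\x1f".join([device_id, ts, url])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The same click expressed in two time zones yields one id.
utc_click = datetime(2025, 9, 1, 12, 0, tzinfo=timezone.utc)
local_click = utc_click.astimezone(timezone(timedelta(hours=2)))
same_id = touchpoint_id("d1", utc_click, "https://example.com/") == touchpoint_id(
    "d1", local_click, "https://example.com/"
)
```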

Naming conventions

  • Keep event names lowercase with underscores.
  • Use ISO timestamps in UTC.
  • Avoid spaces and punctuation in parameter values.
  • For recurring campaigns, append a period marker like yyyymm or qN.
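
These conventions are easy to check mechanically. The sketch below shows one possible set of helpers; the regular expressions and function names are assumptions for illustration, and the quarter marker follows the qN form above without a year.

```python
import re
from datetime import date, datetime, timezone

# Lowercase words joined by underscores, e.g. lead_form_submitted.
EVENT_NAME_RE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")
# Lowercase words joined by hyphens, no spaces or punctuation.
PARAM_VALUE_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def campaign_name(base, when, period="month"):
    """Append a yyyymm or qN period marker to a recurring campaign name."""
    if not PARAM_VALUE_RE.match(base):
        raise ValueError(f"invalid campaign base: {base!r}")
    if period == "month":
        return f"{base}-{when:%Y%m}"
    if period == "quarter":
        return f"{base}-q{(when.month - 1) // 3 + 1}"
    raise ValueError(f"unknown period: {period!r}")


def utc_now_iso():
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()
```

For example, campaign_name("fall-sale", date(2025, 9, 1)) produces fall-sale-202509, matching the utm_campaign example earlier in this post.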

Governance workflow

Data quality improves when ownership is clear and feedback is quick. A lightweight process is sufficient for most teams:

  1. Maintain the UTM and channel dictionary in a shared repository.
  2. Validate incoming parameters at the collector or ETL tier.
  3. Reject or quarantine events with invalid values.
  4. Review weekly: the unclassified share, top sources, and new patterns.
  5. Announce changes with effective dates to keep dashboards consistent.
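
Steps 2 and 3 can be sketched as a small validation pass at the collector or ETL tier. This is an illustration under assumptions: events are plain dicts with source and medium keys, and quarantined events are simply collected in a list along with the reasons they failed.

```python
# Allowed values copied from the utm_medium policy.
ALLOWED_MEDIUMS = frozenset({
    "cpc", "email", "social", "referral", "affiliate",
    "display", "video", "audio", "organic", "direct",
})


def validate_event(event):
    """Return a list of problems; an empty list means the event passes."""
    problems = []
    medium = event.get("medium", "")
    if not medium:
        problems.append("missing medium")
    elif medium not in ALLOWED_MEDIUMS:
        problems.append(f"invalid medium: {medium}")
    # Direct traffic legitimately has no source; everything else needs one.
    if not event.get("source") and medium != "direct":
        problems.append("missing source")
    return problems


def route(events):
    """Split events into accepted and quarantined, keeping the reasons."""
    accepted, quarantined = [], []
    for event in events:
        problems = validate_event(event)
        if problems:
            quarantined.append({"event": event, "problems": problems})
        else:
            accepted.append(event)
    return accepted, quarantined
```

Keeping the reasons alongside each quarantined event gives the weekly review something concrete to work from.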

Implementation checklist

  • Create a campaign dictionary with allowed values and examples.
  • Add a server endpoint that normalizes UTMs and enriches channel.
  • Store a durable first_touch_ref and pass it through conversion events.
  • Dedupe events by touchpoint_id plus event key.
  • Build a QA report that flags unclassified or missing medium.
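
The dedupe item can be illustrated with a few lines of Python. The sketch assumes that "event key" means the event name stored under an event field; if your pipeline uses a different key, swap it in.

```python
def dedupe(events):
    """Keep the first event seen for each (touchpoint_id, event name) pair."""
    seen = set()
    unique = []
    for event in events:
        key = (event["touchpoint_id"], event["event"])
        if key in seen:
            continue  # A retry or double-fire of an event already kept.
        seen.add(key)
        unique.append(event)
    return unique
```

Keeping the first occurrence preserves arrival order, which matters if later steps read first-touch data from the earliest event.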

Measuring success

Track three ratios over time:

  • the share of conversions with a valid source and medium
  • the share of traffic mapped to a canonical channel
  • the share of traffic left in the unclassified and direct buckets

When the first two shares rise, the unclassified and direct buckets shrink, and the numbers hold steady, the taxonomy is working. Campaign review becomes faster, budget shifts are grounded in consistent data, and teams develop a shared language for acquisition and retention.
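
A simple way to compute these figures for a reporting period is sketched below. It assumes conversions and sessions are lists of dicts, treats "valid" as a non-empty source and medium, and reports the unclassified and direct buckets as separate shares; all names are illustrative.

```python
def share(part, whole):
    """Fraction of whole, or 0.0 when there is no data."""
    return part / whole if whole else 0.0


def taxonomy_metrics(conversions, sessions):
    """Compute the tracking ratios for one reporting period."""
    valid = sum(1 for c in conversions if c.get("source") and c.get("medium"))
    unclassified = sum(
        1 for s in sessions if s.get("channel", "unclassified") == "unclassified"
    )
    direct = sum(1 for s in sessions if s.get("channel") == "direct")
    total = len(sessions)
    return {
        "valid_conversion_share": share(valid, len(conversions)),
        "mapped_traffic_share": share(total - unclassified, total),
        "unclassified_share": share(unclassified, total),
        "direct_share": share(direct, total),
    }
```

Running this on each week's data and charting the four values side by side makes the trend visible.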
