Designing a customer journey data model that survives channel changes
Modeling goals
A durable journey model preserves meaning when channels, tools, and identifiers change. The aim is simple: link the touches that introduce a user to a brand with the events that create value, then keep that link as users shift devices, clear cookies, or move between owned and paid surfaces.
Core entities
At minimum, define these tables:
users
orleads
: first party identifier, signup state, lifecycle metadata.devices
: device-level identifier derived from first party storage.sessions
: time-bounded group of page views and micro-events.touchpoints
: normalized campaign context captured at interaction time.events
: atomic actions such assignup_started
oraccount_created
.
Keys that matter
Stable joins keep the model consistent:
user_id
: assigned when a user authenticates or submits a form.device_id
: created on first visit; survives most page refreshes.session_id
: resets on inactivity or new campaign context.touchpoint_id
: hash of device, canonical URL, and rounded timestamp.
When a user logs in or signs up, link the current device and session to the new user_id
. Backfill earlier touches for that device to create a complete journey.
Event schema
Represent events with a compact schema:
events(
event_id string,
event_name string,
occurred_at timestamp,
user_id string null,
device_id string,
session_id string,
touchpoint_id string,
source string,
medium string,
campaign string,
content string null,
term string null,
channel string,
properties variant
)
This structure keeps acquisition context available for any downstream analysis, including attribution, cohorting, or retention.
First and last touch preservation
Store two reference objects at the moment of conversion:
first_touch_ref
: the earliest touch tied to this user or device.last_touch_ref
: the most recent qualifying touch before conversion.
These can be computed at collection time and stored on the event. Attribution models can then weigh these references without re-scanning all historical rows.
Offline and partner events
Journeys frequently cross systems. Add imports for offline or partner data:
- in-person signup records
- phone sales logs
- affiliate S2S postbacks
Assign synthetic touchpoint_id
values for these rows and map them into the same channel dictionary. This keeps comparisons valid across online and offline sources.
Data hygiene and QA
Protect the model with routine checks:
- fraction of events with valid
source
andmedium
- ratio of
direct
vs. mappable channels - orphaned events without
session_id
ordevice_id
- duplicate
event_id
counts by day
Invest in these basics before tuning attribution weights. Clean inputs matter more than complex scoring.
Practical outputs
With the above in place, teams can publish views most stakeholders recognize:
- lead source by signup date
- assisted channel share for conversions
- time to convert from first touch to signup
- multi-touch attribution comparisons by campaign and week
The data model stays useful as tools come and go because its rules are simple: consistent keys, normalized campaign context, and clear event semantics.
Related posts:
Server-side vs client-side tracking for touchpoint integrity
Case study: how touchpoint stats improved online sales and advertising efficiency
How touchpoint data shapes online sales and advertising spend
Customer journey mapping through touchpoint analysis
Case study: improving signup source tracking for VEGAS.XYZ