Designing a customer journey data model that survives channel changes

Modeling goals

A durable journey model preserves meaning when channels, tools, and identifiers change. The aim is simple: link the touches that introduce a user to a brand with the events that create value, then keep that link as users shift devices, clear cookies, or move between owned and paid surfaces.

Core entities

At minimum, define these tables:

users or leads: first party identifier, signup state, lifecycle metadata.
devices: device-level identifier derived from first party storage.
sessions: time-bounded group of page views and micro-events.
touchpoints: normalized campaign context captured at interaction time.
events: atomic actions such as signup_started or account_created.

Keys that matter

Stable joins keep the model consistent:

user_id: assigned when a user authenticates or submits a form.
device_id: created on first visit; survives most page refreshes.
session_id: resets on inactivity or new campaign context.
touchpoint_id: hash of device, canonical URL, and rounded timestamp.

When a user logs in or signs up, link the current device and session to the new user_id. Backfill earlier touches for that device to create a complete journey.

Event schema

Represent events with a compact schema:

events(
  event_id string,
  event_name string,
  occurred_at timestamp,
  user_id string null,
  device_id string,
  session_id string,
  touchpoint_id string,
  source string,
  medium string,
  campaign string,
  content string null,
  term string null,
  channel string,
  properties variant
)

This structure keeps acquisition context available for any downstream analysis, including attribution, cohorting, or retention.

First and last touch preservation

Store two reference objects at the moment of conversion:

first_touch_ref: the earliest touch tied to this user or device.
last_touch_ref: the most recent qualifying touch before conversion.

These can be computed at collection time and stored on the event. Attribution models can then weigh these references without re-scanning all historical rows.

Offline and partner events

Journeys frequently cross systems. Add imports for offline or partner data:

in-person signup records
phone sales logs
affiliate S2S postbacks

Assign synthetic touchpoint_id values for these rows and map them into the same channel dictionary. This keeps comparisons valid across online and offline sources.

Data hygiene and QA

Protect the model with routine checks:

fraction of events with valid source and medium
ratio of direct vs. mappable channels
orphaned events without session_id or device_id
duplicate event_id counts by day

Invest in these basics before tuning attribution weights. Clean inputs matter more than complex scoring.

Practical outputs

With the above in place, teams can publish views most stakeholders recognize:

lead source by signup date
assisted channel share for conversions
time to convert from first touch to signup
multi-touch attribution comparisons by campaign and week

The data model stays useful as tools come and go because its rules are simple: consistent keys, normalized campaign context, and clear event semantics.