2026-04-23 / 8 MIN READ

Event schema design for DTC: naming that survives replatform

Pattern library for DTC event schema design. Canonical event names, typed payloads, versioning, and naming rules that survive a replatform.

Event schemas are the contract between your ecommerce stack and your warehouse. Get the contract right and every downstream dashboard, attribution model, and email flow is stable. Get it wrong and every replatform forces a rewrite, every new ad platform breaks a join, and every integration produces a subtly different version of "purchase."

This is the pattern library I use on every warehouse-first engagement. Canonical event names, typed payloads, versioning, and the naming rules that survive a Shopify-to-Hydrogen replatform or a Klaviyo-to-Customer.io migration.

Fits into the warehouse-first analytics rebuild hub at stage 1, the foundation. Every other stage depends on this.

Figure: event catalog (22 events). Shared envelope, typed properties per event. Same shape works across Shopify, Hydrogen, or BigCommerce.

The DTC event catalog

A mid-market DTC brand needs surprisingly few event names. 18 to 22 core events cover the operator surface. Here is the canonical list I start with.

Commerce events (11)

page_view
product_view
collection_view
add_to_cart
remove_from_cart
view_cart
begin_checkout
add_shipping_info
add_payment_info
purchase
refund

Lifecycle events (6)

subscription_created
subscription_renewed
subscription_cancelled
customer_identified
newsletter_subscribed
newsletter_unsubscribed

Email events (5)

email_sent
email_opened
email_clicked
email_bounced
email_unsubscribed

That is 22 events. Most brands have many more in their source systems, but 22 is what the mart layer actually needs. Everything in the source data can be mapped to one of these or dropped as not operator-relevant.
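To make the list enforceable in code, one option is a closed TypeScript union plus a lookup that maps source-system names onto it and returns null for anything to drop. This is a minimal sketch. The entries in the hypothetical `SOURCE_TO_CANONICAL` map are illustrative source names (Klaviyo-style metric names, a GA4 event, a camelCase variant), not a complete or verified mapping for any stack.

```typescript
// The 22 canonical events from the catalog, as a closed list.
const CANONICAL_EVENTS = [
  // Commerce (11)
  "page_view", "product_view", "collection_view", "add_to_cart",
  "remove_from_cart", "view_cart", "begin_checkout", "add_shipping_info",
  "add_payment_info", "purchase", "refund",
  // Lifecycle (6)
  "subscription_created", "subscription_renewed", "subscription_cancelled",
  "customer_identified", "newsletter_subscribed", "newsletter_unsubscribed",
  // Email (5)
  "email_sent", "email_opened", "email_clicked", "email_bounced",
  "email_unsubscribed",
] as const;

type CanonicalEventName = (typeof CANONICAL_EVENTS)[number];

const CANONICAL_SET = new Set<string>(CANONICAL_EVENTS);

// Hypothetical mapping from source-system names to canonical names.
const SOURCE_TO_CANONICAL = new Map<string, CanonicalEventName>([
  ["Viewed Product", "product_view"], // Klaviyo-style metric name
  ["Placed Order", "purchase"],       // Klaviyo-style metric name
  ["view_item", "product_view"],      // GA4 recommended event
  ["productView", "product_view"],    // a camelCase variant
]);

// Canonical names pass through; known source names are mapped; anything else
// returns null, meaning "drop as not operator-relevant".
function mapSourceEvent(sourceName: string): CanonicalEventName | null {
  if (CANONICAL_SET.has(sourceName)) return sourceName as CanonicalEventName;
  return SOURCE_TO_CANONICAL.get(sourceName) ?? null;
}
```

Returning null explicitly, rather than passing unknown names through, is what keeps the mart layer at 22 events.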

The canonical envelope

Every event, regardless of name, carries the same envelope fields.

interface EventEnvelope {
  event_id: string;          // UUID v4, unique per event
  event_name: string;        // from the canonical list above
  event_version: string;     // semver-style major.minor, e.g., "2.0"
  occurred_at: string;       // ISO 8601 with TZ offset
  received_at: string;       // ISO 8601, ingest-time
  anonymous_id: string;      // cookie/device-level id
  customer_id: string | null; // canonical customer id (not Shopify's)
  session_id: string;        // session correlation id
  source: "web" | "server" | "email" | "mobile";
  properties: Record<string, unknown>; // event-specific, typed per event
  context: {
    user_agent?: string;
    ip_hash?: string;
    locale?: string;
    currency?: string;
    page?: { url: string; referrer?: string };
  };
}
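Because the envelope is the stable part, it is the part worth checking at runtime before anything is written to the warehouse. Below is a minimal hand-rolled sketch that assumes no validation library. `validateEnvelope` and `isEventEnvelope` are hypothetical names. The checks (UUID v4 shape, ISO 8601 with an explicit offset, major.minor version, closed `source` enum) are read off the field comments above, and `context` is only checked for being an object.

```typescript
// Local copy of the envelope so the sketch is self-contained.
interface EventEnvelope {
  event_id: string;
  event_name: string;
  event_version: string;
  occurred_at: string;
  received_at: string;
  anonymous_id: string;
  customer_id: string | null;
  session_id: string;
  source: "web" | "server" | "email" | "mobile";
  properties: Record<string, unknown>;
  context: Record<string, unknown>;
}

const SOURCES = new Set<string>(["web", "server", "email", "mobile"]);
// UUID v4: version digit 4, variant digit 8, 9, a, or b.
const UUID_V4 = /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
// ISO 8601 date-time with an explicit offset: "Z", "+hh:mm", or "-hh:mm".
const ISO_WITH_OFFSET = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;
// Versions as used in this post: major.minor, e.g. "2.0".
const MAJOR_MINOR = /^\d+\.\d+$/;

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);
const isNonEmptyString = (v: unknown): v is string => typeof v === "string" && v.length > 0;

// Returns every problem found; an empty array means the envelope is valid.
function validateEnvelope(e: unknown): string[] {
  if (!isObject(e)) return ["envelope: expected an object"];
  const errors: string[] = [];
  const id = e.event_id;
  if (!isNonEmptyString(id) || !UUID_V4.test(id)) errors.push("event_id: expected UUID v4");
  if (!isNonEmptyString(e.event_name)) errors.push("event_name: expected non-empty string");
  const version = e.event_version;
  if (!isNonEmptyString(version) || !MAJOR_MINOR.test(version)) errors.push("event_version: expected major.minor");
  for (const key of ["occurred_at", "received_at"]) {
    const ts = e[key];
    if (!isNonEmptyString(ts) || !ISO_WITH_OFFSET.test(ts)) errors.push(`${key}: expected ISO 8601 with offset`);
  }
  for (const key of ["anonymous_id", "session_id"]) {
    if (!isNonEmptyString(e[key])) errors.push(`${key}: expected non-empty string`);
  }
  // customer_id is required but nullable: absent or non-string values are rejected.
  if (e.customer_id !== null && !isNonEmptyString(e.customer_id)) errors.push("customer_id: expected string or null");
  const source = e.source;
  if (!isNonEmptyString(source) || !SOURCES.has(source)) errors.push("source: expected web | server | email | mobile");
  if (!isObject(e.properties)) errors.push("properties: expected an object");
  if (!isObject(e.context)) errors.push("context: expected an object");
  return errors;
}

function isEventEnvelope(e: unknown): e is EventEnvelope {
  return validateEnvelope(e).length === 0;
}
```

Returning a list of problems rather than a bare boolean makes the rejection reason loggable at the ingestion edge.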

The envelope is the part that does not change. The properties blob is event-specific and strictly typed per event name.

Typed payloads per event

Every event name has a specific properties shape. Getting these shapes consistent up front is what prevents rewrites later.

// product_view
interface ProductViewProperties {
  product_id: string;
  product_title: string;
  variant_id: string | null;
  variant_title: string | null;
  sku: string | null;
  price: number;          // decimal, not string
  compare_at_price: number | null;
  currency: string;       // ISO 4217
  category: string | null;
  brand: string | null;
  collections: string[];
}

// purchase
interface PurchaseProperties {
  order_id: string;       // canonical, not Shopify's numeric id
  order_number: string;   // human-readable order number
  transaction_id: string; // same as order_id for dedup
  subtotal: number;
  tax: number;
  shipping: number;
  discount: number;
  total: number;
  currency: string;
  payment_method: string;
  line_items: LineItem[];
  shipping_address: AddressShape;
  billing_address: AddressShape;
  attribution: AttributionShape | null;
  first_order: boolean;   // is this the customer's first paid order?
}

interface LineItem {
  line_item_id: string;
  product_id: string;
  variant_id: string;
  sku: string;
  title: string;
  quantity: number;
  price: number;
  subtotal: number;
  total_discount: number;
  properties: Record<string, unknown>; // custom properties, product-specific
}

Three rules the shapes enforce.

Prices are numbers. Not strings. Decimal handling is a source of silent reconciliation bugs; locking it at the schema level prevents the string-vs-number question from creeping back in.

IDs are strings. Even when Shopify returns a numeric id, cast it to a string at the staging layer. Joins between integer and string id columns break in BigQuery, and casting once in staging keeps every downstream model on the same type.

Line items are a separate type, reused across add_to_cart, purchase, and refund. If the shape evolves, it evolves in one place.
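The first two rules can be applied once, in the staging layer, by a normalizer like the one below. It is a sketch that assumes a Shopify REST-style raw line item: numeric ids, money as decimal strings such as "19.99", and custom properties as name/value pairs. `RawLineItem`, `toMoney`, and `normalizeLineItem` are hypothetical. Computing `subtotal` as price times quantity, and rounding to cents (a two-decimal currency), are also assumptions.

```typescript
// Assumed raw shape (Shopify REST-style); adjust to what your source actually sends.
interface RawLineItem {
  id: number | string;
  product_id: number | string;
  variant_id: number | string;
  sku: string | null;
  title: string;
  quantity: number;
  price: string | number;          // money often arrives as a string, e.g. "19.99"
  total_discount: string | number;
  properties?: Array<{ name: string; value: unknown }>;
}

interface LineItem {
  line_item_id: string;
  product_id: string;
  variant_id: string;
  sku: string;
  title: string;
  quantity: number;
  price: number;
  subtotal: number;
  total_discount: number;
  properties: Record<string, unknown>;
}

// IDs are strings: 12345 becomes "12345"; strings pass through unchanged.
const toId = (v: number | string): string => String(v);

// Prices are numbers: parse once here and fail loudly on garbage.
// Rounds to cents, which assumes a two-decimal currency.
function toMoney(v: string | number): number {
  if (typeof v === "string" && v.trim() === "") throw new Error("empty money value");
  const n = typeof v === "number" ? v : Number(v);
  if (!Number.isFinite(n)) throw new Error(`invalid money value: ${String(v)}`);
  return Math.round(n * 100) / 100;
}

function normalizeLineItem(raw: RawLineItem): LineItem {
  const price = toMoney(raw.price);
  const props: Record<string, unknown> = {};
  for (const p of raw.properties ?? []) props[p.name] = p.value;
  return {
    line_item_id: toId(raw.id),
    product_id: toId(raw.product_id),
    variant_id: toId(raw.variant_id),
    sku: raw.sku ?? "",
    title: raw.title,
    quantity: raw.quantity,
    price,
    subtotal: toMoney(price * raw.quantity), // assumption: pre-discount line subtotal
    total_discount: toMoney(raw.total_discount),
    properties: props,
  };
}
```

Because every model downstream reads `LineItem`, the string-vs-number decision is made exactly once, here.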

The naming rules

Five rules. They look obvious written out and they are obvious only until the first time someone breaks one.

Rule 1: snake_case. Not camelCase, not kebab-case. snake_case is the SQL-native convention and dbt models work without quoting. product_view, not productView.

Rule 2: past tense for actions, present for views. purchase, refund, subscription_created are things that happened. product_view, collection_view are states the user entered. Enforced consistently, the distinction reads naturally in queries.

Rule 3: noun-first for objects, verb-first for actions. subscription_created (noun-verb) for lifecycle events where the object is the subject. add_to_cart (verb-object) for action events where the user is the subject. This matches GA4's convention for action events such as add_to_cart and begin_checkout, so translating to GA4 is close to free.

Rule 4: no platform names in event names. Not shopify_purchase, not klaviyo_email_opened. Just purchase and email_opened. Platform details live in source and in the properties blob. Replatforming Shopify to Hydrogen should not require renaming 22 events.

Rule 5: version the event when its shape changes. Adding a new optional property is a minor version bump (purchase at version 2.1). Renaming or removing a property is a major version bump (purchase at version 3.0). Keep both versions live in the ingestion service for a migration window, then deprecate the old one.
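Rules 1 and 4 are mechanical enough to check in CI against the catalog. Below is a minimal sketch. The `PLATFORM_WORDS` list is an assumption to extend for your own stack, and `lintEventName` is a hypothetical name. Rules 2, 3, and 5 still need human review.

```typescript
// Rule 1: snake_case means lowercase letters and digits, words joined by single underscores.
const SNAKE_CASE = /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/;

// Rule 4: platform names that must not appear as a word in an event name.
// Illustrative list only; extend it for your stack.
const PLATFORM_WORDS = ["shopify", "hydrogen", "bigcommerce", "klaviyo", "customerio", "ga4", "segment"];

// Returns one message per rule violation; an empty array means the name passes.
function lintEventName(name: string): string[] {
  const problems: string[] = [];
  if (!SNAKE_CASE.test(name)) problems.push(`"${name}" is not snake_case`);
  // Split on any non-alphanumeric run so "klaviyo-email" and "klaviyo_email" both match.
  const words = name.toLowerCase().split(/[^a-z0-9]+/);
  for (const platform of PLATFORM_WORDS) {
    if (words.indexOf(platform) !== -1) problems.push(`"${name}" contains platform name "${platform}"`);
  }
  return problems;
}
```

Run against the whole catalog, the linter turns the rules into a failing check rather than a code-review comment.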

The versioning pattern

Schemas evolve. They should evolve gracefully.

purchase@1.0  initial shape
purchase@1.1  added line_items[].properties (optional)
purchase@2.0  renamed subtotal_price to subtotal
purchase@2.1  added attribution (optional)
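Rule 5 can be applied mechanically if each version's properties are described as a flat spec. The sketch below is hypothetical (`PropertySpec`, `classifyChange`, and `bumpVersion` are illustrative names). Following the rule, it treats a removed key (which includes the old side of a rename) as major and a new optional key as minor. It additionally assumes that a new required key, or an optional-to-required change, is major, since producers still sending the old shape would fail validation. Type changes are not detected.

```typescript
type Bump = "none" | "minor" | "major";

// Flat description of one version's properties: key -> required or optional.
// Nested fields can use path-like keys such as "line_items[].properties".
type PropertySpec = Record<string, { required: boolean }>;

function classifyChange(prev: PropertySpec, next: PropertySpec): Bump {
  for (const key of Object.keys(prev)) {
    // Removed key (a rename shows up as remove + add): breaks existing readers.
    if (!(key in next)) return "major";
    // Assumption: tightening optional -> required also breaks old producers.
    if (!prev[key].required && next[key].required) return "major";
  }
  let bump: Bump = "none";
  for (const key of Object.keys(next)) {
    if (key in prev) continue;
    // Assumption: a brand-new required key is breaking; a new optional key is minor.
    if (next[key].required) return "major";
    bump = "minor";
  }
  return bump;
}

// Apply a bump to a major.minor version string, e.g. ("2.0", "minor") -> "2.1".
function bumpVersion(version: string, bump: Bump): string {
  const [major, minor] = version.split(".").map(Number);
  if (bump === "major") return `${major + 1}.0`;
  if (bump === "minor") return `${major}.${minor + 1}`;
  return version;
}
```

Applied to the history above, the 1.1-to-2.0 rename of subtotal_price to subtotal comes out as major, and the optional additions in 1.1 and 2.1 come out as minor.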

The ingestion service reads event_version from the envelope, applies a version-specific transformer, and writes to a single canonical table in the warehouse. Historical events in 1.x formats still work; the transformer handles the reshape.

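-- stg_events.sql (BigQuery)
SELECT
  event_id,
  event_name,
  event_version,
  -- ... other fields
  -- subtotal_price was renamed to subtotal in 2.0, so every 1.x event (1.0 and 1.1) uses the old key
  SAFE_CAST(
    CASE
      WHEN STARTS_WITH(event_version, '1.') THEN JSON_EXTRACT_SCALAR(properties, '$.subtotal_price')
      ELSE JSON_EXTRACT_SCALAR(properties, '$.subtotal')
    END AS NUMERIC
  ) AS subtotal  -- prices are numbers, so cast the extracted string
FROM raw.events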
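The ingestion-side half of this can be sketched as a chain of per-version upcasters that lift each raw purchase event one step at a time until it reaches the latest shape. `PURCHASE_UPCASTERS`, `upcastPurchase`, and the 2.0-to-2.1 default of `attribution: null` are assumptions for illustration, built from the version history above.

```typescript
type Props = Record<string, unknown>;

interface RawEvent {
  event_name: string;
  event_version: string;
  properties: Props;
}

// Keyed by the version being upgraded FROM; each step produces the next version's shape.
const PURCHASE_UPCASTERS: Record<string, { to: string; apply: (p: Props) => Props }> = {
  // 1.1 only added an optional nested field, so 1.0 payloads are already valid 1.1.
  "1.0": { to: "1.1", apply: (p) => p },
  // 2.0 renamed subtotal_price to subtotal.
  "1.1": { to: "2.0", apply: ({ subtotal_price, ...rest }) => ({ ...rest, subtotal: subtotal_price }) },
  // 2.1 added optional attribution; default it to null unless already present.
  "2.0": { to: "2.1", apply: (p) => ({ attribution: null, ...p }) },
};

const LATEST_PURCHASE_VERSION = "2.1";

function upcastPurchase(e: RawEvent): RawEvent {
  if (e.event_name !== "purchase") return e;
  let version = e.event_version;
  let properties = e.properties;
  while (version !== LATEST_PURCHASE_VERSION) {
    const step = PURCHASE_UPCASTERS[version];
    if (!step) throw new Error(`no upcaster for purchase@${version}`);
    properties = step.apply(properties);
    version = step.to;
  }
  return { ...e, event_version: version, properties };
}
```

Chaining single steps means each new version adds one entry, instead of a transformer from every old version straight to the newest.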

Deprecate old versions after 90 days of no events at that version. The warehouse stays clean and the source of truth is always the latest schema.
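The 90-day rule can be checked from a per-version last-seen timestamp. Below is a minimal sketch. It assumes the ingestion service records the most recent `received_at` per version, and it reads "deprecate" as "stop accepting new events at that version", while historical rows keep flowing through the staging transformer. `versionsToDeprecate` is a hypothetical name.

```typescript
const DEPRECATION_WINDOW_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// lastSeen maps each version to the time its most recent event was received.
// The latest version is never deprecated, however quiet it is.
function versionsToDeprecate(lastSeen: Record<string, Date>, latest: string, now: Date): string[] {
  return Object.keys(lastSeen)
    .filter((version) => version !== latest)
    .filter((version) => {
      const idleDays = (now.getTime() - lastSeen[version].getTime()) / MS_PER_DAY;
      return idleDays >= DEPRECATION_WINDOW_DAYS;
    })
    .sort();
}
```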

The survive-replatform test

The final test for an event schema: can it survive a replatform? If you moved the brand from Shopify to Hydrogen (or from Shopify to BigCommerce, or from Klaviyo to Customer.io), which events would need to change?

If the answer is "most of them," the schema is platform-coupled and will cost months of rework at the replatform. If the answer is "the source column value and a few properties map differently," the schema is platform-agnostic and the replatform is mostly an ingestion change.

A good test: sketch the same 22 events for a Hydrogen + Customer.io stack. If the event names and envelope are identical and only the properties mapping differs, you are where you want to be. If the event names or envelope have to change, rethink the schema before going live on the first platform.
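That sketch can be made checkable by writing each stack's event spec as data and diffing the two. Below is a minimal sketch under stated assumptions. `EventSpec` records envelope field names and, for each canonical event, where each canonical property comes from on that stack. The source paths are placeholders for whatever your ingestion code reads, and `replatformDiff` and `survivesReplatform` are hypothetical names. For the schema to pass, only property remaps may differ.

```typescript
// A stack description: envelope field names, plus, per canonical event,
// a map from canonical property to where that stack's data provides it.
interface EventSpec {
  envelopeFields: string[];
  events: Record<string, Record<string, string>>;
}

interface ReplatformDiff {
  eventNameChanges: string[];   // event names present in only one of the two specs
  envelopeChanged: boolean;
  remappedProperties: string[]; // "event.property" pairs whose source mapping differs
}

function replatformDiff(current: EventSpec, target: EventSpec): ReplatformDiff {
  const onlyIn = (a: object, b: object): string[] => Object.keys(a).filter((k) => !(k in b));
  const sameEnvelope =
    current.envelopeFields.length === target.envelopeFields.length &&
    current.envelopeFields.every((f) => target.envelopeFields.indexOf(f) !== -1);

  const remapped: string[] = [];
  for (const name of Object.keys(current.events)) {
    const targetProps = target.events[name];
    if (targetProps === undefined) continue; // reported as a name change instead
    const currentProps = current.events[name];
    for (const prop of Object.keys(currentProps)) {
      if (currentProps[prop] !== targetProps[prop]) remapped.push(`${name}.${prop}`);
    }
  }

  return {
    eventNameChanges: onlyIn(current.events, target.events).concat(onlyIn(target.events, current.events)),
    envelopeChanged: !sameEnvelope,
    remappedProperties: remapped,
  };
}

// The test from the text: names and envelope identical; property mappings may differ.
function survivesReplatform(diff: ReplatformDiff): boolean {
  return diff.eventNameChanges.length === 0 && !diff.envelopeChanged;
}
```

The `remappedProperties` list is then roughly the ingestion work a replatform would cost.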

The schema is the artifact that lasts. Platforms churn every two to three years; the canonical event catalog should outlast all of them.

FAQ

Do I need this if I am a small brand on default Shopify?

Less urgently than a mid-market brand, but still yes. The default Shopify + Klaviyo + GA4 event naming is messy (mixed camelCase and snake_case, some events missing properties). Even a small brand benefits from landing a canonical schema; it just takes a day instead of a week.

What about the Segment / RudderStack spec?

Segment's industry-standard spec is a reasonable starting point. Most of my canonical events line up with Segment's ecommerce spec (Product Viewed, Cart Viewed, Order Completed). The main difference is naming style: I prefer snake_case names like product_view, while Segment's spec uses Title Case object-action names in past tense. Either works; pick one and stick with it.

How do I handle brand-specific custom events?

Same envelope, same naming rules, and a typed properties shape per event: gift_card_redeemed, product_reviewed, subscription_box_customized. Keep custom events to the minimum needed; every one is additional schema surface area that has to be maintained.
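In TypeScript, one way to keep custom events on the same contract is a generic envelope plus a discriminated union on `event_name`, so each name carries its own typed properties. This is a sketch with a trimmed envelope. `GiftCardRedeemedProps`, `DtcEvent`, and `summarize` are hypothetical.

```typescript
// Trimmed generic envelope: the name is a literal type, properties are typed per name.
interface Envelope<Name extends string, Props> {
  event_id: string;
  event_name: Name;
  event_version: string;
  occurred_at: string;
  properties: Props;
}

interface PurchaseProps {
  order_id: string;
  total: number;
  currency: string;
}

// Hypothetical brand-specific event: same envelope, its own property shape.
interface GiftCardRedeemedProps {
  gift_card_id: string;
  order_id: string;
  amount: number;
  currency: string;
}

type DtcEvent =
  | Envelope<"purchase", PurchaseProps>
  | Envelope<"gift_card_redeemed", GiftCardRedeemedProps>;

// Switching on event_name narrows properties to the matching shape.
function summarize(e: DtcEvent): string {
  switch (e.event_name) {
    case "purchase":
      return `purchase ${e.properties.order_id}: ${e.properties.total.toFixed(2)} ${e.properties.currency}`;
    case "gift_card_redeemed":
      return `gift card ${e.properties.gift_card_id} applied ${e.properties.amount.toFixed(2)} ${e.properties.currency} to ${e.properties.order_id}`;
    default: {
      // If a new event joins DtcEvent but is not handled above, this line stops compiling.
      const unhandled: never = e;
      throw new Error(`unhandled event: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

Adding a custom event means adding one union member, and the `never` check makes the compiler flag every switch that does not handle it yet.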

Who maintains the schema document?

One person should own it. At a mid-market brand, this is usually the data engineer or the senior developer. The document lives in the repo as a Markdown file or an OpenAPI-style spec. Every change goes through PR review. Treat it like an API schema, because that is what it is.

Should I use JSON Schema or TypeScript types?

Either works. TypeScript types are easier for a JavaScript-heavy team, and pairing them with Zod schemas adds runtime validation at the ingestion edge (TypeScript types alone are erased at runtime). JSON Schema has better cross-language portability and works with validators like AJV. I default to Zod with TypeScript; the validation happens at the edge and the errors are human-readable.

What to try this week

Open a blank document. Write down the 22 canonical events from above. Next to each, write the properties that matter for your brand. Compare to what Shopify + Klaviyo + GA4 are actually sending today. The gaps (events you thought you had and do not, properties that are mis-shaped) are your schema-rebuild scope.

If the gap list is substantial and you are not sure where to start, a DTC Stack Audit maps the current state against a canonical schema and tells you which events are costing you the most downstream cleanup. Downstream of the schema, see BigQuery for Shopify data for the storage shape and server-side GA4 via Measurement Protocol for the reporting layer.
