Skip to content
bizurk
← ALL WRITING

2026-06-03 / 16 MIN READ

Klaviyo revenue attribution in your warehouse: the two-key join

Tutorial for tying Klaviyo events to Shopify orders inside BigQuery. Hashed email plus order_id with a 48-hour window survives consent updates.

A DTC operator opened our first call by reading off Klaviyo's flow-revenue dashboard. The welcome series had attributed $387,442 in the last 90 days. The CFO had asked her to defend that number. She could not, because the number lived inside Klaviyo's own attribution model and had never touched a Shopify order in the warehouse. The question she actually wanted to answer was simpler than the dashboard pretended: what did our welcome series make us last quarter, in dollars that hit the bank.

This is the tutorial for answering that question properly. Klaviyo events go into BigQuery, Shopify orders are already there, and a two-key join (hashed email plus order_id, inside a forty-eight-hour window) ties them together in a way that survives consent updates and right-to-erase requests. By the end you have a fact_klaviyo_orders model that lets you report flow revenue in the same dollars your CFO sees in the Shopify finance export.

I shipped this pattern at a mid-market DTC operator in Q1 2026 as part of an analytics engine rebuild, where the warehouse fed six dashboards and the email team's flow attribution lived against settled revenue, not Klaviyo's modeled number.

Why Klaviyo's attributed-revenue number won't survive a board review

Klaviyo computes flow and campaign revenue using its own attribution model. Last-click within a configurable window, default five days for emails and one day for SMS, with a tiebreaker that favors the most recent qualifying touch. That number is fine for inside-Klaviyo decisions like which flow variant to keep. It falls apart when the CFO opens the deck.

The first reason is reconciliation. Klaviyo's number does not equal Shopify's order subtotal for the same date range. Klaviyo includes orders that fired the Placed Order metric but were later refunded, partially refunded, cancelled, or never settled. Shopify reports settled net revenue. The two numbers will diverge by 3 to 12 percent every month, and the gap is not auditable from inside Klaviyo.

The second reason is the attribution model itself. Klaviyo's UI lets you change the attribution window, and any change retroactively rewrites historical reports. A finance team cannot accept a revenue number that retro-changes when someone clicks a setting. The board deck has to come from a system where last quarter's number is locked.

The third reason is the field most operators end up using, attributed_revenue on the Placed Order event. That field is Klaviyo's modeled credit assignment for that order, not the order's real total. Aggregating it across a flow gives you Klaviyo's opinion of what the flow earned. It is not what the flow earned.

Prerequisites

You need a few things in place before any of this works. None of them are exotic, but each one is a real prerequisite. Skip one and the tutorial breaks at the join.

A Klaviyo plan with API access. Free plan users can still pull events through the public API but rate limits will hurt during backfill. Any paid plan and the three Klaviyo to BigQuery ETL paths all work for this tutorial; pick the one that fits your volume.

Shopify orders already landing in BigQuery. The join target is a stg_shopify__orders (or equivalent) model with one row per order, including order_id, email, created_at, and total_price. If your Shopify schema does not look like that yet, the schema-that-does-not-regret-you reference is where to start.

A dbt project with at least staging models for both Klaviyo and Shopify, running against the same warehouse. The two-key join lives in an intermediate model that depends on both staging layers.

A service account with BigQuery Data Editor on the dataset and BigQuery User on the project. dbt-bigquery uses this for runs.

A SHA-256 hashing function available in your warehouse. BigQuery has it built in as SHA256(). Snowflake has SHA2(). Both produce identical output when fed the same input bytes, so the hash is portable across warehouses if you ever migrate.

Single jagged glass fragment isolated on a dark studio backdrop, broken edge under cold electric-blue rim, hot-pink dispersion bleeding through the inner face.
// the fragment · broken edge under twin rim lights

Step 1: pull Klaviyo events into BigQuery raw

The shape you want in BigQuery raw is one row per metric occurrence, with the original properties JSON preserved. The native Klaviyo BigQuery connector, Fivetran, Airbyte, and a DIY ingestion service all produce roughly the same shape. The differences are cost and latency, not the data.

The minimum schema for klaviyo_raw.events:

CREATE TABLE klaviyo_raw.events (
  id              STRING NOT NULL,
  metric_id       STRING NOT NULL,
  profile_id      STRING NOT NULL,
  datetime        TIMESTAMP NOT NULL,
  properties      JSON,
  email           STRING,
  ingested_at     TIMESTAMP NOT NULL
)
PARTITION BY DATE(datetime)
CLUSTER BY profile_id, metric_id;

Two things matter for the join later. Partitioning by DATE(datetime) keeps the join cost predictable as the table grows. Clustering by profile_id makes the email-hash lookup fast.

The properties blob is JSON because the shape varies per metric. A Placed Order event has OrderId, Total, Items, BillingAddress.Email, and a few dozen other fields. An Opened Email event has CampaignId, MessageId, Subject. The staging model extracts the fields the downstream models need.

If you are running the ingestion yourself, the Klaviyo events endpoint is /api/events. Cursor-paginate, filter by filter=greater-than(datetime,"<last-cursor>"), sleep on 429s. Backfill takes a few hours for a typical mid-market account. The pattern library for the three ETL paths covers the trade-offs in detail.

Step 2: build the hashed-email join key

The temptation is to join Klaviyo to Shopify on raw email. Do not do that. The hash is the resilient key, and the reason matters.

Email addresses in your warehouse are PII. When a profile asks for right-to-erase, you have to clear the raw email field. If you are joining downstream models on raw email, that join breaks the moment you honor the request. The hash, on the other hand, is one-way and is not PII once the source row is cleared. You can keep the hash in the fact tables, lose the raw email at staging, and the historical attribution still works.

The standard hash is SHA-256 of the lowercased and trimmed email. This is the same normalization Meta CAPI uses for its em user identifier, which means if you also pipe data into a server-side CAPI flow with consistent hashing, the same hashed email is your join key across Klaviyo, Shopify, and Meta.

Build the hash as a dbt macro so it is identical wherever you call it.

-- macros/hash_email.sql
{% macro hash_email(column) %}
  CASE
    WHEN {{ column }} IS NULL THEN NULL
    WHEN TRIM({{ column }}) = '' THEN NULL
    ELSE TO_HEX(SHA256(LOWER(TRIM({{ column }}))))
  END
{% endmacro %}

Then reference it in both staging models.

-- models/staging/klaviyo/stg_klaviyo__events.sql
SELECT
  id AS event_id,
  metric_id,
  profile_id,
  datetime AS occurred_at,
  {{ hash_email('email') }} AS email_hash,
  properties,
  ingested_at
FROM {{ source('klaviyo_raw', 'events') }}
-- models/staging/shopify/stg_shopify__orders.sql
SELECT
  id AS order_id,
  {{ hash_email('email') }} AS email_hash,
  created_at,
  total_price,
  financial_status,
  -- ... rest of order fields
FROM {{ source('shopify_raw', 'orders') }}

Now both sides of the join speak the same hashed-email language and a right-to-erase request only has to land at the staging layer.

Hashed email is one-way and is not PII once the source row is cleared. The hash stays in the fact tables; the raw email leaves at staging.

Ultra-wide distant view of a solitary translucent monolith on a dim plain at dusk, vanishing horizon, electric-blue atmospheric haze, hot-pink rim on the far edge.
// the distant form · monolith on vanishing plain

Step 3: build the order_id secondary key

Klaviyo's Placed Order event ships the Shopify OrderId inside properties. Extract it into a typed column at the staging layer so the join can lean on it.

-- models/staging/klaviyo/stg_klaviyo__placed_orders.sql
SELECT
  event_id,
  email_hash,
  occurred_at,
  CAST(JSON_EXTRACT_SCALAR(properties, '$.OrderId') AS STRING)
    AS shopify_order_id,
  CAST(JSON_EXTRACT_SCALAR(properties, '$.$flow') AS STRING)
    AS flow_id,
  CAST(JSON_EXTRACT_SCALAR(properties, '$.$message') AS STRING)
    AS message_id,
  CAST(JSON_EXTRACT_SCALAR(properties, '$.$campaign_id') AS STRING)
    AS campaign_id
FROM {{ ref('stg_klaviyo__events') }}
WHERE metric_id = '{{ var('klaviyo_placed_order_metric_id') }}'

The $flow and $message fields are how Klaviyo tags which flow or campaign generated the click that led to the conversion. Those are the dimensions you actually want to report on, not Klaviyo's attributed_revenue rollup.

There is a real edge case where OrderId is null on a Placed Order event. It happens when the order came from a checkout that did not fire Klaviyo's pixel or when the Shopify-Klaviyo integration was misconfigured at the time. Roughly 1 to 3 percent of Placed Order events have a null OrderId in the accounts I have audited. The two-key join handles this case by falling back to the hashed-email plus timestamp window match.

Macro close-up of crystalline dispersion on a single glass slab, refractive bands of cold blue splitting into hot-pink across the chipped corner, fine surface detail.
// the dispersion · refractive bands close up

Step 4: write the two-key join

The join is one CTE per match strategy plus a coalesce. Match strategy A is the strict order_id match, which is the strongest signal and you should always prefer it. Match strategy B is the hashed-email plus timestamp window fallback, used when order_id is missing.

-- models/intermediate/int_klaviyo_orders_joined.sql
{{
  config(
    materialized='incremental',
    unique_key='join_id',
    partition_by={'field': 'order_created_at', 'data_type': 'timestamp'},
    cluster_by=['flow_id', 'campaign_id']
  )
}}

WITH klaviyo_placed AS (
  SELECT *
  FROM {{ ref('stg_klaviyo__placed_orders') }}
  {% if is_incremental() %}
    WHERE occurred_at > (
      SELECT TIMESTAMP_SUB(MAX(klaviyo_event_at), INTERVAL 7 DAY)
      FROM {{ this }}
    )
  {% endif %}
),

shopify_orders AS (
  SELECT
    order_id,
    email_hash,
    created_at,
    total_price,
    financial_status
  FROM {{ ref('stg_shopify__orders') }}
  WHERE financial_status IN ('paid', 'partially_refunded')
),

-- Strategy A: strict order_id match
match_a AS (
  SELECT
    k.event_id,
    k.flow_id,
    k.campaign_id,
    k.message_id,
    k.email_hash AS klaviyo_email_hash,
    k.occurred_at AS klaviyo_event_at,
    s.order_id,
    s.created_at AS order_created_at,
    s.total_price AS order_total,
    'A: order_id' AS match_strategy
  FROM klaviyo_placed k
  INNER JOIN shopify_orders s
    ON CAST(k.shopify_order_id AS STRING) = CAST(s.order_id AS STRING)
  WHERE k.shopify_order_id IS NOT NULL
),

-- Strategy B: hashed-email plus 48h window fallback
match_b AS (
  SELECT
    k.event_id,
    k.flow_id,
    k.campaign_id,
    k.message_id,
    k.email_hash AS klaviyo_email_hash,
    k.occurred_at AS klaviyo_event_at,
    s.order_id,
    s.created_at AS order_created_at,
    s.total_price AS order_total,
    'B: hash+window' AS match_strategy
  FROM klaviyo_placed k
  INNER JOIN shopify_orders s
    ON k.email_hash = s.email_hash
    AND ABS(TIMESTAMP_DIFF(s.created_at, k.occurred_at, MINUTE)) <= 2880
  WHERE k.shopify_order_id IS NULL
),

unioned AS (
  SELECT * FROM match_a
  UNION ALL
  SELECT * FROM match_b
)

SELECT
  TO_HEX(SHA256(CONCAT(event_id, '|', order_id))) AS join_id,
  *
FROM unioned

The forty-eight-hour window (2,880 minutes) on strategy B is empirical. I tested 12, 24, 48, and 72-hour windows on a year of historical data at the analytics engine client. Twelve hours under-matched. Seventy-two over-matched, picking up unrelated orders the customer placed days later that had nothing to do with the email. Forty-eight was the sweet spot where false positives stayed under one percent and false negatives stayed under three.

The unique_key on join_id (a hash of event_id and order_id) is what lets the model run incrementally without duplicating matches across runs. The seven-day lookback in the is_incremental() clause covers the case where a refund or cancellation flips an order's financial_status after the original ingestion.

Step 5: turn the joined model into a flow-revenue report

The fact table is a small step beyond the intermediate join. Group by flow_id and campaign_id, sum the order totals, and the row count is the order count.

-- models/marts/fact_klaviyo_flow_revenue.sql
{{
  config(
    materialized='incremental',
    unique_key=['flow_id', 'campaign_id', 'order_date'],
    partition_by={'field': 'order_date', 'data_type': 'date'}
  )
}}

SELECT
  flow_id,
  campaign_id,
  DATE(order_created_at) AS order_date,
  COUNT(DISTINCT order_id) AS order_count,
  SUM(order_total) AS revenue_usd,
  COUNTIF(match_strategy = 'A: order_id') AS strict_matches,
  COUNTIF(match_strategy = 'B: hash+window') AS fallback_matches
FROM {{ ref('int_klaviyo_orders_joined') }}
WHERE order_created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 400 DAY)
GROUP BY 1, 2, 3

The revenue_usd column is the number that goes on the board deck. It is the sum of actual settled Shopify order totals, attributable to a Klaviyo flow or campaign by the two-key join. The strict_matches and fallback_matches columns let you sanity-check the ratio. In a healthy account, strict matches should be 95 percent or more of the total. Anything lower means the Klaviyo-Shopify integration is dropping OrderId on more events than it should.

A useful weekly variance check, run side by side with the Shopify-GA4-Meta reconciliation variance report:

SELECT
  order_date,
  SUM(revenue_usd) AS warehouse_klaviyo_revenue,
  -- compare against Klaviyo's own attributed_revenue from the events table
  (
    SELECT SUM(CAST(JSON_EXTRACT_SCALAR(e.properties, '$.$value') AS NUMERIC))
    FROM {{ ref('stg_klaviyo__events') }} e
    WHERE DATE(e.occurred_at) = f.order_date
      AND e.metric_name = 'Placed Order'
  ) AS klaviyo_ui_attributed_revenue
FROM {{ ref('fact_klaviyo_flow_revenue') }} f
WHERE order_date >= CURRENT_DATE() - 30
GROUP BY 1
ORDER BY 1 DESC

The two columns will diverge. The warehouse number will be lower, because it excludes refunded and cancelled orders. The Klaviyo number will be higher and harder to defend. That is the entire point.

Ultra-wide backlit silhouette of a single tall translucent slab against a deep electric-blue evening sky, hot-pink magic-hour band along the low horizon.
// the silhouette · slab against deep dusk sky

Common mistakes

Joining on raw email instead of the hash. The model works on day one. It breaks the first time a customer requests right-to-erase and the email field gets cleared at staging. Future runs miss those customers' historical orders. The hash never has this problem.

Skipping the timestamp window on strategy B. Without the 48-hour bound, a customer who clicked an email six months ago and finally placed an order will get matched to that ancient flow. The flow looks like it converted them, when really they were a long-tail organic conversion. The window is what keeps the attribution honest.

Trusting Klaviyo's attributed_revenue field as the truth. It is fine for inside-Klaviyo flow optimization. It is not the number a CFO will accept on a quarterly review. The whole reason this pipeline exists is that the attributed_revenue field cannot be reconciled to settled Shopify revenue.

Materializing the intermediate model as a table instead of incremental. On a year of order history at a mid-market account, the full rebuild scans tens of gigabytes every dbt run. With the incremental config above, daily runs scan a few hundred megabytes. The cost difference is real money over a quarter.

What to try next

The fact table is now the primary input for two more things you probably want.

Cohort retention. Once you have flow-revenue tied to settled orders, the same join shape supports the cohort retention SQL pattern. Cohort customers by their first order, then measure how much of their repeat revenue came through Klaviyo flows versus organic versus paid.

A multi-touch attribution model that includes email. Email click events from stg_klaviyo__events become touchpoints in the same int_touchpoints table that feeds the BigQuery attribution SQL models. Last-click, linear, and time-decay all run against a touchpoint table that finally has email represented.

A Looker Studio (or Hex, or Lightdash) view for the operator. The fact_klaviyo_flow_revenue table is small and fast to query, so a flow-level dashboard with date pickers and per-flow trend lines costs almost nothing to build on top.

FAQ

Why hash the email at all if both Klaviyo and Shopify have the raw value?

Three reasons. Right-to-erase requests clear the raw email, but the hash stays in the fact tables and historical attribution survives. The hash is portable across systems (it matches Meta CAPI's em identifier byte-for-byte), so the same key works for cross-channel attribution. And keeping raw email out of intermediate and fact models reduces your PII surface area, which makes audits and BAAs simpler.

What hashing function should I use?

SHA-256 of the lowercased and trimmed email, hex-encoded. Both BigQuery (TO_HEX(SHA256(...))) and Snowflake (SHA2(...)) produce the same output. This is the format Meta CAPI specifies and the one Klaviyo's own API uses internally for consent-mode profile resolution. Using anything else (MD5, SHA-1, plain SHA-256 without normalization) means the hash from one system will not match the hash from another.

What happens to the join when a customer changes their email?

The Shopify order created before the change has the old hash. The Klaviyo profile after the change has the new hash. They will not match on strategy B, so the model relies on strategy A (order_id) for those orders. The Klaviyo-Shopify integration usually carries OrderId correctly even across email changes, so the strict match handles it. If you want to be extra safe, build a slowly-changing-dimension table of Klaviyo profile email history and use the historical hash at the time of the order.

How do I handle anonymous-then-identified order flow?

Klaviyo identifies a profile only after a click on a Klaviyo-tagged email link or a manual identify call. An anonymous Shopify checkout that never opened an email will not have a Klaviyo profile to attribute to. Those orders fall out of the join entirely, which is correct. They are not attributable to a flow.

Should I use this instead of Klaviyo's attribution settings or alongside?

Alongside. Keep Klaviyo's settings as they are for inside-Klaviyo flow optimization, where the team is comparing variant A versus variant B and Klaviyo's last-click model is fine. The warehouse fact table is the source for board reporting, finance, and any external review. Both numbers exist; they answer different questions.

Does this work for SMS attribution too?

Yes, with the same shape. Klaviyo's SMS events use Received SMS, Clicked SMS, and the same Placed Order metric for conversions. Phone hash (E.164 normalized, then SHA-256) becomes the secondary join key for SMS-only profiles. The two-key join becomes a three-key join: order_id, then email hash, then phone hash, in that order of preference.

Sources and specifics

  • The two-key join pattern was shipped at a mid-market DTC operator in Q1 2026 as part of a multi-service warehouse rebuild.
  • The 48-hour timestamp window on the email-hash fallback was tuned empirically against a year of historical orders. Twelve, 24, 48, and 72-hour windows were tested; 48 had the best false-positive and false-negative balance.
  • Klaviyo's Placed Order metric carries OrderId in the event properties payload. Klaviyo's API documentation at developers.klaviyo.com confirms the field name and its position in the JSON.
  • SHA-256 of a lowercased, trimmed email is the normalization Meta CAPI specifies for the em user identifier (Meta Marketing API documentation, "Customer Information Parameters").
  • The dbt incremental config (partition by order_created_at, cluster by flow_id and campaign_id) is from a year of operating the model at the analytics engine client. Daily runs scan a few hundred megabytes; full rebuilds scan tens of gigabytes.
  • If you are scoping a warehouse-first rebuild and want this exact join validated against your live Klaviyo and Shopify, a DTC Stack Audit runs the check and tells you whether the attributed_revenue and warehouse numbers diverge enough to require the rebuild.

// related

Let us talk

If something in here connected, feel free to reach out. No pitch deck, no intake form. Just a direct conversation.

>Get in touch

Tell me what you’re trying to ship.

Send a quick message and I read it within a day, or talk to AI Michael first if you want to feel out your project before you write to me.