
2026-05-09 / 14 MIN READ

BigQuery cost optimization for DTC: under $100 setup

BigQuery setup for $2-10M DTC stores under $100 a month. Partitioning, clustering, query habits that control cost, and when to migrate off the warehouse.

A $2-10M Shopify brand should not be paying more than a hundred dollars a month for BigQuery. Most of the operators I've reviewed are paying nowhere near that, but the bill is unpredictable, queries take 40 seconds, and the analyst is afraid to run anything new without asking permission in Slack. The root cause sits in the setup, and it traces back to the same three or four moves every single time.

This is the setup I install on a DTC warehouse-first rebuild. Partition shape, cluster keys, the query habits that blow the bill up versus the ones that flatten it, the cost ledger from a real engagement, and the migration-off question I get asked every time and answer the same way. It pairs with the three-layer Shopify schema in BigQuery; this article is about the cost surface around that schema, not the schema itself.

[Interactive cost model: schema × query habit × monthly cost, for a $2-10M DTC store at ~400 queries/mo. Mid-range pattern: 50 MB billed per query, 19.5 GB scanned per month, projected bill $0.11. Fix the schema first; the query habits do less if the underlying table forces a full scan.]

Schema choice and one query habit drive almost the entire BigQuery bill for a $2-10M DTC brand.

The price you actually pay for BigQuery

Two prices. That is all there is.

Scan price. $6.25 per TB scanned, on-demand pricing, as of May 2026 (cloud.google.com/bigquery/pricing). Every query you run scans some number of bytes from the columns it touches in the partitions it touches. You pay for those bytes regardless of whether the query returns one row or a million.

Storage price. $0.02 per GB per month for active storage (data modified within the last 90 days), $0.01 per GB per month for long-term (older than 90 days, automatic). For a DTC warehouse that lands Shopify, ad data, email engagement, and GA4 events, total storage is usually 3-15 GB. Storage cost is rounding error.

The free tier is the part operators usually do not budget around. The first 1 TB scanned per project per month is free. The first 10 GB stored per month is free. For a $2-10M brand with a sensible schema, you will use a few percent of the scan free tier and rarely exceed the storage free tier. The all-in BigQuery bill from a real engagement I shipped in Q1 2026 was $4.12 in month one, $3.87 in month two. Ingestion (Fivetran, Airbyte, or self-hosted) is the dominant warehouse cost; everything else rounds to zero.
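If you want to sanity-check your own project against those numbers, a query along these lines sums the month-to-date billed bytes and projects the on-demand cost before the free tier is applied. This is a sketch, not your invoice: the region-us qualifier assumes your jobs run in the US multi-region, and the conversion uses decimal GB/TB to match the query at the end of this article.

-- Month-to-date scan, projected against the $6.25/TB on-demand rate (before free tier).
SELECT
  ROUND(SUM(total_bytes_billed) / POW(10, 9), 1) AS gb_billed_mtd,
  ROUND(SUM(total_bytes_billed) / POW(10, 12) * 6.25, 2) AS projected_dollars
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), MONTH)
  AND state = 'DONE';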

The trap is a brand whose dashboard reads from raw, unpartitioned tables, and whose analyst runs SELECT * exploration queries. Those two habits alone can take a $4 monthly bill to $400 in a week.

The schema choices that control BigQuery cost

Three settings on the table itself do most of the cost work.

Partition by DATE(occurred_at), not _ingested_at. Operator queries always filter by event date or order date. If your table is partitioned by ingestion date, a "Q2 orders" query has to scan every partition that contains a Q2 order, which is most of them. If it is partitioned by order date, the same query scans one partition. The difference is roughly 30 to 90x on read cost, depending on how spread out your ingestions are.

Cluster by customer_id and one low-cardinality status column. Partitioning slices the table by date; clustering sorts within each partition by the columns you put in the cluster spec. For Shopify orders, customer_id plus financial_status is the right answer 90 percent of the time. Cohort queries filter by customer, dashboards filter by status, both run inside a single partition without scanning the rest. BigQuery surfaces this in the query planner; if a query that should touch one 200 MB partition reports "estimated bytes processed: 2.1 GB", either the partition filter is not pruning or the cluster keys are not doing the work.
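As a concrete shape, here is a minimal DDL sketch with both settings applied. The dataset, table, and column names (analytics.stg_orders, occurred_at, and so on) are illustrative assumptions, not the schema from the warehouse article.

-- Hypothetical orders staging table: date-partitioned, clustered for cohort and status filters.
CREATE TABLE IF NOT EXISTS analytics.stg_orders (
  order_id STRING NOT NULL,
  customer_id STRING,
  financial_status STRING,
  source_name STRING,
  occurred_at TIMESTAMP,
  total_price NUMERIC,
  line_items ARRAY<STRUCT<sku STRING, quantity INT64, price NUMERIC>>
)
PARTITION BY DATE(occurred_at)
CLUSTER BY customer_id, financial_status;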

Nested STRUCTs over star joins for line items. A Shopify order has a one-to-many relationship to line items. The standard relational answer is two tables, joined on order_id. The BigQuery answer is one table with line_items ARRAY<STRUCT<...>> as a column on the order. Joins become UNNEST(line_items), which scans only the line items inside the rows the query already filtered to. A typical cohort query that needs both order and SKU detail scans one third the bytes of the equivalent two-table join.
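A cohort-style query against that hypothetical table then reads SKU detail without a join. A sketch, assuming the nested line_items column from the DDL above:

-- SKU-level revenue for April: scans only the filtered partitions, line items included.
SELECT
  li.sku,
  SUM(li.quantity * li.price) AS gross_revenue
FROM analytics.stg_orders AS o,
  UNNEST(o.line_items) AS li
WHERE o.occurred_at >= TIMESTAMP('2026-04-01')
  AND o.occurred_at < TIMESTAMP('2026-05-01')
GROUP BY li.sku
ORDER BY gross_revenue DESC;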

The mistake I see most: teams import Shopify with a connector that lands raw JSON in a single column, then write dashboards directly against that raw column. Every query parses every JSON blob in every partition the query touches. The fix is a typed staging layer between raw and dashboard, which I covered in the warehouse schema piece for Shopify ingestion. That single layer is usually the difference between a $4 bill and a $40 bill.
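For the connector-lands-raw-JSON case, the staging layer is nothing exotic. A minimal sketch, assuming the raw table is raw.shopify_orders with a single payload column; the names and JSON paths are illustrative, not any specific connector's layout:

-- Parse the raw JSON once, here, so dashboards never touch the blob.
SELECT
  JSON_VALUE(payload, '$.id') AS order_id,
  JSON_VALUE(payload, '$.customer.id') AS customer_id,
  JSON_VALUE(payload, '$.financial_status') AS financial_status,
  TIMESTAMP(JSON_VALUE(payload, '$.created_at')) AS occurred_at,
  CAST(JSON_VALUE(payload, '$.total_price') AS NUMERIC) AS total_price
FROM raw.shopify_orders;

Materialize that SELECT into the partitioned, clustered table above and every downstream query pays typed-column prices instead of JSON-parsing prices.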

Query habits that blow up the bill

Five habits. The first two account for almost every blown bill I have audited.

SELECT * on a wide table. Shopify orders have 60+ columns, many of them nested. A SELECT * scans all of them, even if your downstream code only uses three. Project the columns you actually need at the staging layer or in the query itself. This alone is often a 5-10x reduction.

No partition filter, on a partitioned table. This is the silent killer. If you wrote WHERE created_at >= '2026-04-01', BigQuery uses the partition pruner. If you wrote WHERE EXTRACT(MONTH FROM created_at) = 4, the function call defeats the pruner and you scan the whole table. The fix: always filter on the raw partition column, with no functions wrapping it. There is also a require_partition_filter table option you can set if you want BigQuery to refuse queries that do not have a partition filter at all.
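A side-by-side sketch of the two filter shapes, plus the table option, again using the hypothetical analytics.stg_orders table:

-- Prunes: the raw partition column compared to constants.
SELECT order_id, total_price
FROM analytics.stg_orders
WHERE occurred_at >= TIMESTAMP('2026-04-01')
  AND occurred_at < TIMESTAMP('2026-05-01');

-- Scans the whole table: the function call around the column defeats the pruner.
-- WHERE EXTRACT(MONTH FROM occurred_at) = 4

-- Make BigQuery refuse queries that arrive with no partition filter at all.
ALTER TABLE analytics.stg_orders
SET OPTIONS (require_partition_filter = TRUE);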

Dashboards reading raw tables instead of marts. A live operator dashboard refreshing every 10 minutes against a raw Shopify table is a recipe for a $200 bill nobody anticipated. Dashboards should read from materialized mart models, which are pre-aggregated and small. The mart for a "today's revenue by channel" tile is usually under 50 KB; reading from raw, the same tile scans 800 MB.
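The mart itself is small and boring. A sketch of the kind of model the "today's revenue by channel" tile should read; analytics.mart_revenue_by_channel and the source_name column are assumed names, not a prescribed schema:

-- Pre-aggregated mart the dashboard tile reads instead of raw orders.
CREATE OR REPLACE TABLE analytics.mart_revenue_by_channel AS
SELECT
  DATE(occurred_at) AS order_date,
  source_name AS channel,
  COUNT(*) AS orders,
  SUM(total_price) AS gross_revenue
FROM analytics.stg_orders
WHERE occurred_at >= TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY))
GROUP BY order_date, channel;

Rebuild it on a schedule (hourly is plenty for most operators) and the 10-minute dashboard refresh scans kilobytes instead of the raw table.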

Repeated full-table joins instead of materialized incrementals. If your dbt model is materialized as a view, every query against it re-runs the underlying SQL. If it is materialized as incremental, dbt only processes new rows on each run. For tables that grow daily, the difference compounds: a view-materialized cohort model that ran fine at 3 months of history will be 4x as expensive at 12 months and 16x at 4 years. Materialize anything that joins more than two tables as incremental, with unique_key set to the natural key.
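In dbt terms, that looks roughly like the config below. This is a sketch assuming dbt-bigquery; cohort_orders, stg_orders, and the column names are placeholders rather than models from the warehouse article.

-- models/marts/cohort_orders.sql (hypothetical model)
{{ config(
    materialized='incremental',
    unique_key='order_id',
    partition_by={'field': 'occurred_at', 'data_type': 'timestamp', 'granularity': 'day'},
    cluster_by=['customer_id', 'financial_status']
) }}

SELECT order_id, customer_id, financial_status, occurred_at, total_price
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
  -- On incremental runs, only process rows newer than what the mart already holds.
  WHERE occurred_at > (SELECT MAX(occurred_at) FROM {{ this }})
{% endif %}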

No project-level cap. This is the guardrail every BigQuery project should have on day one. In the Cloud Console, under IAM and Admin, Quotas, set a daily query bytes-billed quota. I default to 100 GB per day for a new DTC project, which is well above any reasonable workload but well below the "I left a SELECT * running in a notebook overnight" disaster scenario. When the cap is hit, queries fail loudly instead of silently billing through.

A real cost ledger from a $2-10M brand

Numbers from a Q1 2026 engagement, anonymized to a $2-10M Shopify brand running roughly 8K orders per month with two-year history.

Storage was 4.5 GB total across all warehouse tables. Shopify orders, customers, and products took 1.2 GB. GA4 events at one row per session took 1.8 GB. Klaviyo events took 0.6 GB. Meta and Google Ads spend tables took the rest. Monthly storage cost: $0.09.

Compute was about 380 analyst queries per month at 30-80 MB scanned each (partitioned and clustered as described above), plus 6 dashboards refreshing every 10 minutes against mart models. Total monthly scan: roughly 18-22 GB, all under the 1 TB free tier. Monthly compute cost: $0.

Ingestion was Fivetran for Shopify and Klaviyo at $480 per month combined, plus a custom Cloud Run job that pushed Meta and Google Ads spend data nightly for $14 per month in Cloud Run costs. Total ingestion: $494.

The total warehouse line on this brand's monthly bill was $494.09. Of that, BigQuery itself was nine cents. The visible warehouse cost is almost entirely ingestion. Operators who shop by "BigQuery vs Snowflake vs Postgres" without understanding that are optimizing the wrong line. See the field notes on running a DTC data warehouse for $1,000 a month for the complete budget breakdown including non-warehouse line items.

When to migrate off BigQuery (and when not to)

I get this question on every audit. The answer is almost always "do not migrate." But there are three legitimate cases.

Snowflake makes sense when you have a real cross-organization data-sharing need. Snowflake's data sharing is the best in the industry; if a fulfillment partner or a media agency wants live read access to a slice of your warehouse without exporting it, Snowflake's secure data sharing solves that without a custom export pipeline. For a $2-10M DTC brand with no external data-sharing partners, this is irrelevant.

DuckDB or MotherDuck make sense when your total warehouse is under 100 GB and you have no concurrent users. A solo operator running ad-hoc analyses on a laptop or a single shared notebook can do everything BigQuery does, locally, for free. The moment you have two analysts hitting the warehouse at the same time, or a dashboard that needs to be online 24/7, BigQuery's managed concurrency wins.

Postgres plus Materialize makes sense for sub-second operator dashboards on rapidly changing data. BigQuery is not built for sub-second query latency; queries usually take 1-3 seconds for warm cache, longer for cold. If you need a live-updating ops dashboard that reflects orders within 5 seconds of placement, you want a Postgres-with-Materialize setup or a streaming dashboard tool, not BigQuery.

For everyone else: stay on BigQuery. The cost is not why brands move off the warehouse. They move off because of unfamiliarity with the schema discipline that keeps it cheap, and they discover that Snowflake is no different on that front. A new platform is a new month of figuring out partitioning all over again. Spend that month on the schema you already have.


The cost guardrails I install on every new project

Four settings, in the order I do them on a new BigQuery project.

Daily project query cap. In Cloud Console, IAM and Admin, Quotas, set "Query usage per day per project" to 100 GB for a new DTC project. Adjust upward only if a real workload pushes it. This single setting prevents the runaway-notebook scenario.

Cost-attribution query labels. Every dbt model run, every dashboard query, every analyst notebook session sets a --label on its queries. A typical label set: team=analytics, dashboard=cohort_ltv, dbt_run_id=2026-05-02-0900. After 30 days, an INFORMATION_SCHEMA.JOBS query joined on labels tells you exactly which dashboard and which model is using the most scan budget. Without labels, the cost view is opaque; with labels, an analyst can answer "what query did I run last week that scanned 80 GB" in 30 seconds.
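The payoff query is a short one against the jobs view. A sketch, assuming the dashboard label key from the example set above and the US multi-region:

-- Scan spend by dashboard label, last 30 days.
SELECT
  l.value AS dashboard,
  ROUND(SUM(total_bytes_billed) / POW(10, 9), 1) AS gb_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
  UNNEST(labels) AS l
WHERE l.key = 'dashboard'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY dashboard
ORDER BY gb_billed DESC;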

The hot-query report. A scheduled dbt query (or just a saved query in BigQuery itself) that runs daily against INFORMATION_SCHEMA.JOBS_BY_PROJECT, surfacing the top 10 most expensive queries from the previous day. I drop it in a private Slack channel. Most weeks the channel is silent. The week it surfaces a 40 GB scan from a notebook nobody remembers, the cost-watch pays for itself.

Reservation slots, only if usage justifies it. BigQuery on-demand pricing (the $6.25 per TB) is the default. BigQuery Editions / reservations bills you for compute capacity instead of bytes scanned. The break-even is around $2,000 per month in on-demand spend, which a $2-10M DTC brand will never hit. Stay on on-demand. Reservations only make sense at the $20M+ DTC scale where the warehouse is processing serious data engineering loads, not operator analytics.

This guardrail set takes about 90 minutes to install and removes the entire category of "BigQuery surprise bill" risk for the rest of the brand's life on the platform.

FAQ

Can a $2-10M DTC brand really run BigQuery for under $5 a month?

Yes, when partitioning, clustering, and the staging-to-mart layering are correct. BigQuery itself is nine cents in storage and zero in compute under the free tier on a brand that size. The total warehouse bill is dominated by ingestion (Fivetran, Airbyte, or equivalent) at $300-1,200 per month depending on how many sources you run. The BigQuery line is a rounding error in that total.

What is the single most expensive BigQuery mistake I should avoid?

A live dashboard that reads from a raw, unpartitioned table without column projection. That one pattern can turn a $5 monthly bill into a $300 weekly one. Fix it by adding partitioning, building a typed mart model, and pointing the dashboard at the mart, not the raw table.

Should I use BigQuery on-demand or BigQuery Editions reservations?

On-demand for any DTC brand under roughly $20M GMV. The break-even where reservations become cheaper is around $2,000 per month in on-demand scan spend, and most $2-10M brands run their warehouse for under $20 per month. Reservations add operational complexity (slot management, commit terms) that is not worth it at small scale.

How do I know if my partitioning is working?

Look at the BigQuery query planner before running a query. If "estimated bytes processed" matches roughly the size of one partition (a few MB to tens of MB on a date-partitioned table), partitioning is working. If it matches the size of the full table, your filter is defeated, usually because of a function wrapping the partition column.

Do I still need dbt for a brand this small?

For 1-2 dashboards and a single analyst, raw SQL in BigQuery is fine. Once you have 4+ mart models, multiple dashboards, and incremental builds, dbt-core (free, self-hosted) earns its keep through model lineage, testing, and incremental materialization. dbt Cloud at $100 per developer per month is rarely worth it under $5M GMV; dbt-core is.

What about Shopify Plus's native BigQuery connector?

The native Shopify Plus to BigQuery connector is solid for Plus merchants and free, which makes the all-in warehouse bill genuinely closer to $5 per month including ingestion. For non-Plus merchants, Fivetran or Airbyte is still the realistic answer. As of May 2026 the native connector handles orders, customers, and products well; it is thinner on metafields and discount codes than third-party connectors.

What to try this week

Open BigQuery in the Cloud Console and run this query against your project to find the most expensive queries from the last 30 days:

SELECT
  user_email,
  query,
  total_bytes_billed / POW(10, 9) AS gb_billed,
  ROUND((total_bytes_billed / POW(10, 12)) * 6.25, 2) AS dollars,
  creation_time
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND state = 'DONE'
  AND total_bytes_billed > 0
ORDER BY total_bytes_billed DESC
LIMIT 20;

If the top result scanned more than 5 GB and ran more than once, you have a partition or column-projection problem on that query. The fix is usually a 30-second edit. If the top result is a Looker Studio refresh, point that chart at a mart model rather than the raw table.

If the warehouse is not yet stood up at all, a DTC Stack Audit will diagnose whether the right next move is a warehouse rebuild or a tracking fix before you commit budget to either.

Sources and specifics

  • BigQuery on-demand pricing: $6.25 per TB scanned, $0.02 per GB per month active storage, $0.01 per GB per month long-term storage, as of May 2026 (cloud.google.com/bigquery/pricing).
  • Free tier: first 1 TB scanned per project per month and first 10 GB storage per month are free.
  • Cost ledger drawn from a Q1 2026 production reporting build for a $2-10M Shopify brand; the multi-source warehouse is documented in the analytics engine work I shipped.
  • BigQuery Editions / reservations break-even is around $2,000 per month in on-demand spend, per Google's published pricing examples.
  • INFORMATION_SCHEMA.JOBS_BY_PROJECT requires bigquery.resourceViewer role at the project level; the column set above runs against the standard schema.


Let us talk

If something in here connected, feel free to reach out. No pitch deck, no intake form. Just a direct conversation.

Get in touch