Running analytics on a regulated site when GA4 is off the table

A founder I was advising asked whether she could keep GA4 on the marketing homepage and strip it from the member area. The honest answer is that the HIPAA Privacy Rule does not care where the script lives if any of it ends up on a page that reveals a user is using your service. That is OCR's December 2022 bulletin on tracking technologies in one sentence. Most regulated sites are not wrong to fear GA4. They are wrong to assume that fear means they get no analytics at all.

The question I keep getting is not whether to strip GA4. That part is usually already decided by the time someone calls me. The question is what to replace it with when the business still needs to answer "where are members dropping off" and "which funnel is converting" without landing a URL and an IP address in Google's infrastructure.

I have shipped three patterns against this problem, and they are not substitutes for each other. They stack. Analytics is one surface inside the regulated DTC healthcare cluster index, which links the attribution boundary to the auth, audit, and vendor pieces that share the same privacy perimeter.

Pattern 1: Server-side measurement with a first-party warehouse

The first pattern replaces browser-side tags with server-to-server events. When a member completes an action that matters to the business (sign-up, intake submitted, subscription started, prescription fulfilled), the server emits an event to a warehouse you control. No JavaScript ships that information to a third party.

I have built this layer as part of a unified analytics suite shipped in 30 days for a mid-market operator, where the requirement was "every question the marketing team used to ask now has a single URL where the answer lives." The pattern is an ingestion layer that writes raw events to Postgres, dbt transformations that shape those events into dimensions and facts, and a read API that serves dashboards. The browser never sees the measurement code.

The tradeoff is that server-side events only exist for things the server knows about. You lose page-level scroll depth on marketing pages, time on site for anonymous visitors, and the easy funnel visualization that GA4 gives you out of the box. In exchange, you get events that are accurate to the minute, that survive ad blockers, and that do not depend on a vendor's definitions of "conversion" staying stable.

For regulated products, the upside is larger than the downside. The member area was never the place you were going to learn about drop-off anyway, because the answer is never "improve the headline." It is always "fix the onboarding flow," and that data comes from event logs the application already writes.
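To make "the data comes from event logs" concrete, here is a minimal sketch of a drop-off calculation over logged events. The `LoggedEvent` shape, the event names, and the `funnelCounts` helper are all assumptions for illustration; in production this would more likely be a warehouse query than application code.

```typescript
// Hypothetical shape of one row from the application's event log.
interface LoggedEvent {
  actorId: string;
  eventName: string;
  eventTime: string; // ISO 8601, UTC
}

// For each funnel step, count the actors who reached it in order.
// A step only counts if every earlier step already happened for that actor.
function funnelCounts(events: LoggedEvent[], steps: string[]): number[] {
  const counts: number[] = steps.map(() => 0);
  const byActor = new Map<string, LoggedEvent[]>();
  for (const e of events) {
    const list = byActor.get(e.actorId) ?? [];
    list.push(e);
    byActor.set(e.actorId, list);
  }
  byActor.forEach((list) => {
    // Walk each actor's events in time order, advancing one step at a time.
    list.sort((a, b) => Date.parse(a.eventTime) - Date.parse(b.eventTime));
    let next = 0;
    for (const e of list) {
      if (next < steps.length && e.eventName === steps[next]) {
        counts[next] += 1;
        next += 1;
      }
    }
  });
  return counts;
}

const sampleLog: LoggedEvent[] = [
  { actorId: "a", eventName: "sign_up", eventTime: "2024-05-01T10:00:00Z" },
  { actorId: "a", eventName: "intake_submitted", eventTime: "2024-05-01T10:05:00Z" },
  { actorId: "a", eventName: "subscription_started", eventTime: "2024-05-02T09:00:00Z" },
  { actorId: "b", eventName: "sign_up", eventTime: "2024-05-01T11:00:00Z" },
  { actorId: "b", eventName: "intake_submitted", eventTime: "2024-05-01T11:20:00Z" },
  { actorId: "c", eventName: "sign_up", eventTime: "2024-05-01T12:00:00Z" },
];

const reached = funnelCounts(sampleLog, ["sign_up", "intake_submitted", "subscription_started"]);
// reached is [3, 2, 1]: one member stopped after sign-up, one after intake
```

The counts read straight off as drop-off between onboarding steps, which is the question a headline test on the marketing page cannot answer.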

A minimal server-side event shape:

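create table analytics.events (
  id              bigserial primary key,
  event_time      timestamptz not null default now(),  -- set by the server, not the browser
  actor_id        uuid,                                 -- internal member ID; null for anonymous traffic
  event_name      text not null,
  properties      jsonb,
  session_id      uuid,
  source          text,
  -- boundary column: not null, so no row can slip past both the marketing and member filters
  surface         text not null check (surface in ('marketing','member','admin'))
);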

Notice the surface column. This is the boundary column. Marketing rollup queries filter to surface = 'marketing', and product-usage rollups filter to surface = 'member'. You can answer "how did last month's campaign convert" without joining any identifier that crosses the boundary.
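One way to keep the boundary column honest is to validate it where events are built, before anything reaches the database. The sketch below is an assumption-heavy illustration: `buildEvent`, `toInsert`, and the rule that marketing events never carry a member ID are my own choices, not part of the schema above. `toInsert` only prepares a parameterized statement; running it is left to whatever Postgres client you use.

```typescript
type Surface = "marketing" | "member" | "admin";
type PropertyValue = string | number | boolean;

// Hypothetical in-memory form of one analytics.events row.
interface AnalyticsEvent {
  eventTime: string; // ISO 8601, stamped on the server
  actorId: string | null;
  eventName: string;
  properties: Record<string, PropertyValue>;
  sessionId: string | null;
  source: string | null;
  surface: Surface;
}

const SURFACES: readonly Surface[] = ["marketing", "member", "admin"];

function buildEvent(input: {
  eventName: string;
  surface: string;
  actorId?: string;
  sessionId?: string;
  source?: string;
  properties?: Record<string, PropertyValue>;
  now?: Date;
}): AnalyticsEvent {
  if (input.eventName.trim() === "") throw new Error("eventName is required");
  const surface = SURFACES.find((s) => s === input.surface);
  if (surface === undefined) throw new Error(`unknown surface: ${input.surface}`);
  // Assumed policy: a marketing-surface event must not carry a member identifier.
  if (surface === "marketing" && input.actorId !== undefined) {
    throw new Error("marketing events must not carry actorId");
  }
  return {
    eventTime: (input.now ?? new Date()).toISOString(),
    actorId: input.actorId ?? null,
    eventName: input.eventName,
    properties: input.properties ?? {},
    sessionId: input.sessionId ?? null,
    source: input.source ?? null,
    surface,
  };
}

// Parameterized INSERT matching the column order of analytics.events.
function toInsert(e: AnalyticsEvent): { text: string; values: (string | null)[] } {
  return {
    text:
      "insert into analytics.events " +
      "(event_time, actor_id, event_name, properties, session_id, source, surface) " +
      "values ($1, $2, $3, $4, $5, $6, $7)",
    values: [
      e.eventTime,
      e.actorId,
      e.eventName,
      JSON.stringify(e.properties),
      e.sessionId,
      e.source,
      e.surface,
    ],
  };
}
```

Rejecting a bad event at build time is cheaper than discovering a mixed row in a rollup later; the database check constraint stays as the second line of defense.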

The warehouse itself is still inside your compliance perimeter. It is your database, with the same roles and access controls you already apply to PHI, rather than a third-party tool. The analytics event log inherits the same posture as the audit log I described for regulated Next.js apps: a separate schema, a writer role with INSERT only, a reader role with SELECT only.

Pattern 2: Privacy-safe session replay and heatmaps

The second pattern is the hardest, because most session replay vendors are disqualified by default and the ones that qualify are expensive. You need replay of the member area when a support case escalates. A screenshot in a ticket is not enough; you need to see the exact sequence that produced the broken state. GA4 does not do this. Hotjar and FullStory will, but only under a signed business associate agreement (BAA), and the pricing for the BAA tier is typically four to ten times the list price.

The patterns I have seen work in production, in order of preference:

First, self-hosted open-source tools with input masking at the library level. rrweb (the open-source recording library that most commercial replay tools are built on) lets you mask inputs by selector or attribute. You tag every input that could contain PHI with a data-sensitive attribute, the recorder replaces the value with a mask, and the recording that gets stored on your infrastructure is safe to share with support engineers. This works because the mask is applied on the client, before the data is ever transmitted.

Second, commercial vendors on their BAA tier. This is the "pay for the compliance version" path. The tooling is identical to the consumer tier; the difference is the contract. Some vendors publish their BAA tier pricing, some do not. The vendor-evaluation work I walk through in the BAA and vendor-risk questions for a small team applies here unchanged: does the vendor sign a BAA, does the sub-processor chain stay inside the BAA, and is the audit log of access to the replay data itself discoverable.

Third, no replay at all. For small regulated teams, this is a legitimate choice. A well-instrumented error log plus a small number of custom events (this component rendered, this form was submitted, this button was clicked) gets you 80 percent of the diagnostic value of replay without the compliance weight.
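For the no-replay option, the custom events can be as simple as a bounded trail of breadcrumbs attached to each error report. This is a sketch under my own assumptions: the `BreadcrumbTrail` class, the three event kinds, and the report shape are hypothetical, and the only hard rule it encodes is that breadcrumbs record names of components and controls, never field values.

```typescript
type BreadcrumbKind = "render" | "submit" | "click";

interface Breadcrumb {
  kind: BreadcrumbKind;
  target: string; // component or control name, never a field value
  at: number; // epoch milliseconds
}

// Keeps only the most recent `capacity` breadcrumbs.
class BreadcrumbTrail {
  private readonly capacity: number;
  private items: Breadcrumb[] = [];

  constructor(capacity: number = 50) {
    this.capacity = capacity;
  }

  add(kind: BreadcrumbKind, target: string, at: number = Date.now()): void {
    this.items.push({ kind, target, at });
    if (this.items.length > this.capacity) this.items.shift();
  }

  snapshot(): Breadcrumb[] {
    return this.items.slice();
  }
}

interface ErrorReport {
  name: string;
  message: string;
  breadcrumbs: Breadcrumb[];
}

// Pairs an error with the sequence of UI events that led up to it.
// Note: error messages themselves can contain user data and may need
// scrubbing before the report leaves the browser; that step is not shown.
function buildErrorReport(error: Error, trail: BreadcrumbTrail): ErrorReport {
  return { name: error.name, message: error.message, breadcrumbs: trail.snapshot() };
}
```

A support engineer reading "render IntakeForm, click SubmitButton, submit IntakeForm, then TypeError" gets the sequence that produced the broken state without any recording of what the member typed.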

“Privacy-safe replay is not about finding a compliant vendor. It is about making the data non-sensitive before it leaves the browser.”

The mistake I keep seeing is teams that deploy session replay, assume masking will happen automatically, and ship. Masking is not automatic. Every regulated vendor I have evaluated has the same rule in their documentation: you tag the sensitive inputs, or the recording is not compliant. The tagging is the work.
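Because the tagging is manual, it helps to have a check that fails when a new input ships without a decision. The sketch below is library-agnostic and is not rrweb's API: `CapturedInput` is a hypothetical description of a form field, `data-sensitive` is the attribute from the text, and `data-replay-safe` is an attribute I am inventing to mean "reviewed and deliberately left unmasked."

```typescript
// Hypothetical description of one input as a recorder or a test might see it.
interface CapturedInput {
  name: string;
  value: string;
  attributes: Record<string, string>;
}

const SENSITIVE_ATTR = "data-sensitive";
const REVIEWED_ATTR = "data-replay-safe"; // assumed convention, not a standard
const MASK = "****"; // fixed length, so the mask does not reveal the value's length

// Replace the value of every input tagged as sensitive.
function maskInputs(inputs: CapturedInput[]): CapturedInput[] {
  return inputs.map((input) =>
    SENSITIVE_ATTR in input.attributes ? { ...input, value: MASK } : input
  );
}

// List inputs nobody has made a decision about: neither masked nor reviewed.
function findUntagged(inputs: CapturedInput[]): string[] {
  return inputs
    .filter((input) => !(SENSITIVE_ATTR in input.attributes))
    .filter((input) => !(REVIEWED_ATTR in input.attributes))
    .map((input) => input.name);
}
```

Run something like `findUntagged` in a test over each member-area form and fail the build on a non-empty result; that turns "remember to tag new inputs" into something the pipeline enforces.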

Pattern 3: Marketing-surface analytics with a hard architectural boundary

The third pattern solves a different problem. You have a marketing site that needs real marketing analytics (traffic by source, campaign conversion, landing-page A/B tests), and you have a regulated product that cannot touch any of that infrastructure. The pattern is to split the domain.

www.example.com runs on marketing-site infrastructure with whatever measurement stack makes sense for marketing. GA4, a Hotjar tier that is not under BAA, PostHog with a marketing-only bucket. This surface never sees an authenticated member. The moment a user clicks "Log in" or "Sign up," the traffic goes to app.example.com, which runs a different stack with different cookies and no third-party scripts at all.

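The split domain matters because of how cookies and measurement IDs work. A host-only cookie set on www.example.com (one set without a Domain attribute) is never sent to app.example.com. A cookie scoped to the parent domain example.com is sent to every subdomain, and many tags, GA4's default configuration included, pick the broadest domain they are allowed to set, so pin the marketing stack's cookie domain to www.example.com explicitly. With that in place, marketing analytics ends the session when the user crosses the boundary, and app analytics starts fresh under a different instrumentation stack (the one from Pattern 1).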
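Whether a given cookie crosses from one subdomain to the other depends on its Domain attribute, and the rule is small enough to write down. The sketch below follows the domain-matching behavior described in RFC 6265; the `cookieSentTo` helper and its input shape are hypothetical, and it ignores Path, Secure, SameSite, and the browser's own rejection of invalid Domain values.

```typescript
interface CookieScope {
  setBy: string; // host that set the cookie, e.g. "www.example.com"
  domainAttr?: string; // value of the Domain attribute, if one was given
}

// Returns true if a browser would attach the cookie to a request for requestHost.
function cookieSentTo(cookie: CookieScope, requestHost: string): boolean {
  const host = requestHost.toLowerCase();
  if (cookie.domainAttr === undefined) {
    // Host-only cookie: exact host match, subdomains and siblings excluded.
    return host === cookie.setBy.toLowerCase();
  }
  // Domain cookie: matches the domain itself and every subdomain of it.
  const domain = cookie.domainAttr.replace(/^\./, "").toLowerCase();
  return host === domain || host.endsWith("." + domain);
}

const hostOnly: CookieScope = { setBy: "www.example.com" };
const parentScoped: CookieScope = { setBy: "www.example.com", domainAttr: "example.com" };

const hostOnlyCrosses = cookieSentTo(hostOnly, "app.example.com"); // false
const parentScopedCrosses = cookieSentTo(parentScoped, "app.example.com"); // true
```

The second case is the one to audit for: a marketing tag that writes its ID cookie to the parent domain puts that ID on every request to the app subdomain.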

Instrument the handoff event carefully. When a user clicks a CTA on the marketing surface that takes them to the app (a checkout link, a sign-up button), the marketing stack records "user left the marketing surface with intent X." On the other side, the app stack records "user arrived with intent X." Correlation happens through a UTM-style parameter that both sides agree on; the user identity itself never crosses. This is the attribution question I covered in more detail in the post about attribution windows across iOS and Android, but in a regulated context.
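The handoff contract can be sketched as two small functions, one per side. Everything named here is an assumption: the `handoff_intent` parameter, the allowed intents, and the campaign format are placeholders for whatever the two teams agree on. The point of the sketch is the allow-list: the app side accepts only values it already knows, so nothing resembling an identity can ride along in the URL.

```typescript
// Assumed contract: the only intents the marketing site may hand over.
const INTENTS = ["signup", "checkout", "login"] as const;
type Intent = (typeof INTENTS)[number];

// Campaign labels are restricted to a short slug, which rules out emails and free text.
const CAMPAIGN_PATTERN = /^[a-z0-9_-]{1,64}$/i;

// Marketing side: build the link that sends the visitor to the app.
function buildHandoffUrl(appOrigin: string, path: string, intent: Intent, campaign: string): string {
  const url = new URL(path, appOrigin);
  url.searchParams.set("handoff_intent", intent);
  url.searchParams.set("utm_campaign", campaign);
  return url.toString();
}

// App side: read the handoff, dropping anything outside the contract.
function readHandoff(requestUrl: string): { intent: Intent; campaign: string | null } | null {
  const params = new URL(requestUrl).searchParams;
  const rawIntent = params.get("handoff_intent");
  const intent = INTENTS.find((candidate) => candidate === rawIntent);
  if (intent === undefined) return null;
  const rawCampaign = params.get("utm_campaign");
  const campaign =
    rawCampaign !== null && CAMPAIGN_PATTERN.test(rawCampaign) ? rawCampaign : null;
  return { intent, campaign };
}

const link = buildHandoffUrl("https://app.example.com", "/signup", "signup", "spring_launch");
// link is "https://app.example.com/signup?handoff_intent=signup&utm_campaign=spring_launch"
```

The marketing stack logs the intent when the link is clicked, the app stack logs the result of `readHandoff` on arrival, and the two event streams are joined on campaign and intent in aggregate rather than per user.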

Aspect	Server-side warehouse	Self-hosted replay	Split-domain
Primary use	Funnel metrics, usage	Support diagnostics	Campaign attribution
Vendor BAA needed	Only for warehouse host	Only if commercial tier	Only on app side
Build time	2 to 4 weeks	1 to 2 weeks	1 week for routing
Ongoing work	Event schema discipline	Mask-tag every new input	UTM parameter contract

The pattern underneath all three

The three patterns look different on the surface. They share one idea: the privacy boundary and the architecture boundary are the same line.

GA4 and similar tools fail on regulated sites because they assume one measurement layer spans everything, from anonymous visitor to authenticated member. That assumption is incompatible with OCR's reading of the Privacy Rule, under which tracking data that ties an identifier such as an IP address to the use of a health service is treated as PHI. You cannot solve this by configuring the tool harder. You solve it by drawing the boundary first and then picking a tool that respects the boundary.

Each pattern draws the boundary somewhere different: at the server (Pattern 1), at the mask (Pattern 2), at the domain (Pattern 3). All three are saying "decide where the regulated surface starts, and do not let measurement cross that line accidentally." The rest is plumbing.

For teams further along the stack, the same discipline applies to tracking and commerce attribution; I walk through the commerce side of the boundary in the productized DTC stack diagnostic, which reviews server-side tagging and event dedup under the same lens.

FAQ

Does the OCR bulletin apply to unauthenticated pages?

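It can. The December 2022 OCR bulletin, updated in March 2024, says an unauthenticated page can still create PHI if the page itself reveals a health condition (for example, a page about a specific treatment). For a general marketing homepage with no specific condition content, the risk is lower. For a product-specific page inside a healthcare funnel, assume authentication is not the line: the Office for Civil Rights treats the "combined with IP" bar as low. In June 2024 a federal district court in Texas vacated part of that guidance as it applies to unauthenticated public pages (American Hospital Association v. Becerra), so confirm the current status of the bulletin before relying on it in either direction.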

Can I use Google Analytics 4 with IP anonymization on a marketing site?

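GA4 does not log or store IP addresses, so there is no separate anonymization setting to switch on as there was in Universal Analytics. On a pure marketing surface with no condition-specific content, that reduces risk significantly. It does not eliminate it: each request still reaches Google carrying the page URL, and Google does not offer a BAA for GA4. Consider Plausible or a self-hosted stack if the content skews toward specific conditions or services. The cost difference is minimal; the compliance posture is much cleaner.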

What about Meta Pixel on healthcare marketing pages?

Meta Pixel is the vendor most often named in OCR enforcement actions against healthcare organizations. Meta does not offer a BAA. The safest posture is to keep Meta Pixel off any surface where a visitor might be identified as a current or prospective member of a regulated service. Server-side CAPI (Meta's Conversions API) can help for conversion reporting, but it still delivers event data to Meta, so the BAA question applies to it just as it does to the browser pixel.

Is Plausible compliant with HIPAA out of the box?

Plausible Self-Hosted keeps data on your infrastructure, which removes the third-party sub-processor issue. It does not automatically make the analytics data non-PHI. If a Plausible event is paired with a user's IP and a URL that identifies them as a member, you still have PHI, just in your database. Plausible Cloud does not currently offer a BAA. Treat self-hosted Plausible as a useful tool inside the compliance perimeter, not as a "HIPAA-compliant analytics solution."
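One way to keep a self-hosted pageview log from pairing an IP with a member-identifying URL is to reduce each request before it is stored. The sketch below is an illustration under my own assumptions, not a Plausible feature: `templatePath` and `toStoredPageview` are hypothetical helpers, and the only ID formats it recognizes are UUIDs and plain integers. It lowers the identifying detail in the log; it does not by itself make the data de-identified.

```typescript
const UUID_SEGMENT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC_SEGMENT = /^\d+$/;

// Drop the query string and fragment, then replace ID-like segments with a placeholder.
function templatePath(path: string): string {
  const cut = path.search(/[?#]/);
  const pathOnly = cut === -1 ? path : path.slice(0, cut);
  return pathOnly
    .split("/")
    .map((segment) =>
      UUID_SEGMENT.test(segment) || NUMERIC_SEGMENT.test(segment) ? ":id" : segment
    )
    .join("/");
}

// Keep only what the dashboard needs; the IP and user agent are never written.
function toStoredPageview(request: { path: string; ip: string; userAgent: string }): { path: string } {
  return { path: templatePath(request.path) };
}

const stored = toStoredPageview({
  path: "/members/8f3a1c2e-1b2c-4d5e-8f90-123456789abc/prescriptions/42?tab=refills",
  ip: "203.0.113.7",
  userAgent: "ExampleBrowser/1.0",
});
// stored is { path: "/members/:id/prescriptions/:id" }
```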

How do I answer the CEO who wants a funnel dashboard?

The warehouse path from Pattern 1 gives you every funnel metric a GA4 dashboard would. Sign-ups, activations, conversions, retention cohorts, period-over-period comparisons. The build is 2 to 4 weeks for the first version. Most of the value comes from the first pass; iteration happens over the following quarters. The dashboard tells the CEO what GA4 would have, minus the UI chrome, plus the assurance that no PHI is leaving the perimeter.

What about GDPR and CCPA on the same site?

These patterns stack on top of GDPR and CCPA. Consent management still applies. The "legitimate interests" basis that justifies analytics in some EU contexts is harder to argue for a tool that sends data to a US-based vendor, and a BAA does not help there: it is a HIPAA instrument, while GDPR looks for a data processing agreement and a valid transfer mechanism. The split-domain pattern in particular maps cleanly to GDPR: the marketing domain can carry a consent banner with standard analytics choices, while the app domain has a different and narrower posture.

Sources and specifics

OCR guidance on tracking technologies: HHS Office for Civil Rights, "Use of Online Tracking Technologies by HIPAA Covered Entities and Business Associates," original December 2022, updated March 2024.
Safe Harbor reference: 45 CFR 164.514(b) for de-identification standards; IP address is one of the 18 identifiers that must be stripped for safe harbor.
OCR enforcement actions against healthcare organizations for Meta Pixel usage have been public since 2023; the posture in this article treats Meta Pixel as high-risk on any authenticated surface.
Warehouse event schema derives from production analytics platforms built for mid-market operators, with PHI-adjacent data kept inside the customer's compliance perimeter.
Session replay masking pattern references rrweb, the open-source library behind most commercial replay vendors; input masking via selector is the supported mechanism.
Split-domain pattern maps to standard subdomain cookie isolation; verification against your hosting provider's cookie behavior is recommended before rollout.