
2026-06-22 / 18 MIN READ

PHI Boundary Test Coverage: A CI Gate That Catches Leaks

How to wire PHI boundary test coverage into a Next.js HIPAA build: assertion patterns for logs, errors, URLs, and the encryption boundary, run per PR.

The PHI boundary tests are the ones in a regulated build that pass by failing. They sit on top of the unit, integration, and end-to-end suites, and their job is to fail loudly the moment a harmless-looking change starts leaking protected health information across a boundary the rest of the suite does not watch. I shipped a regulated member compliance platform with more than 1,185 tests in production. The four PHI assertions below are the ones that caught the leaks the team would otherwise have missed.

This is a tutorial about the assertion patterns and the CI wiring, not the boundary mechanics. The mechanics are covered in the App Router PHI surfaces survey at the top of this cluster and in the deeper four-surfaces breakdown. The assertions below assume you already have the boundaries drawn somewhere in your codebase. The patterns run on Next.js 16, Vitest, and Playwright.

Prerequisites: what your stack needs before the test patterns work

The boundary tests assume a few foundations: a Next.js 16 App Router app on TypeScript strict, Vitest or Jest in Node mode for the server-side suites, and a Playwright project for the browser-side coverage. The logger needs a redaction layer already in front of any PHI write path; the assertions below catch leaks the redactor missed, not leaks the redactor was supposed to catch. The redaction patterns I wrote up separately cover the logger side, and the encrypted-fields walkthrough covers the storage side.

You also need a CI runner with parallel jobs and per-job timeouts, a way to provision a real Postgres test container (not a mock), and a project policy that blocks merge on a failing required check. GitHub Actions handles all of this without ceremony.

The only piece I had to build from scratch was the PHI fixture layer. That is step 1, and the rest of the suite reads from it.

Step 1: tag synthetic PHI with a sentinel so assertions can find it

The first move is to stop treating test fixtures as opaque blobs. The boundary assertions need to find PHI in the wrong place, and they can only find what they recognize. The pattern that worked in production was to bake a sentinel string into every synthetic PHI value, then write a single regex that matches the sentinel anywhere it surfaces.

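// test/support/phi-fixtures.ts
// MemberRecord is the application's own record type (import omitted here).
export const PHI_SENTINEL = "ZZPHI_TEST_SENTINEL_8f3c";
// Matches the prefix plus the optional per-value counter suffix, e.g. ..._8f3c_2a.
export const PHI_SENTINEL_RX = /ZZPHI_TEST_SENTINEL_[0-9a-f]+(?:_[0-9a-f]+)?/;

let counter = 0;
// Each call consumes one counter value, so every tagged field gets a unique ID.
const tag = () => `${PHI_SENTINEL}_${(++counter).toString(16)}`;

export function buildMember(overrides: Partial<MemberRecord> = {}): MemberRecord {
  return {
    id: crypto.randomUUID(),
    legalName: `Member ${tag()}`,
    dateOfBirth: `1980-01-${(counter % 28 + 1).toString().padStart(2, "0")}`,
    email: `${tag()}@example.test`,
    phone: `+1555${(7000000 + counter).toString().padStart(7, "0")}`,
    // Use the whole tag; a truncated tag would no longer match the sentinel regex.
    diagnosisCodes: [`G47.${tag()}`],
    ...overrides,
  };
}

The sentinel is a constant prefix plus a per-value hex counter. The prefix is unusual enough that nothing in real code will produce it accidentally, and because it uses only letters, digits, and underscores, the regex still matches after JSON serialization and URL-encoding. It does not match base64 or other re-encoded output, so decode those payloads before asserting. The factory returns a fully populated record with the free-text PHI fields (legal name, email, diagnosis codes) tagged; date of birth and phone are format-constrained, so they are derived from the counter instead of carrying the sentinel. Tests that need specific values pass overrides, tests that just need any member pass nothing. The sentinel discipline holds because the default is always tagged.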

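The counter is global within the test process, so each leaked sentinel ID points back to a specific fixture value. When an assertion fails with ZZPHI_TEST_SENTINEL_8f3c_2a, the leak source is the 42nd tagged value generated in that run (0x2a = 42). Each record consumes several counter values, so the ID identifies both the record and the field, and the source is on screen in under a minute.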
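
As a minimal sketch of that lookup, assuming the prefix_counter shape described above: the helper below pulls the counter out of a leaked ID and shows which encodings leave the sentinel searchable. The names parseSentinel and SENTINEL_ID_RX are hypothetical and not part of the fixture file.

```typescript
// Hypothetical helper: recover the hex counter from a leaked sentinel ID.
const SENTINEL_ID_RX = /ZZPHI_TEST_SENTINEL_[0-9a-f]+_([0-9a-f]+)/;

function parseSentinel(haystack: string): { id: string; index: number } | null {
  const match = haystack.match(SENTINEL_ID_RX);
  if (!match) return null;
  return { id: match[0], index: parseInt(match[1] ?? "0", 16) };
}

const leaked = "Member ZZPHI_TEST_SENTINEL_8f3c_2a";

// JSON and percent-encoding leave the sentinel characters untouched.
const viaJson = JSON.stringify({ message: leaked });
const viaUrl = `/members?name=${encodeURIComponent(leaked)}`;

// Base64 rewrites every byte, so the regex cannot see the sentinel any more.
const viaBase64 = Buffer.from(leaked, "utf8").toString("base64");

console.log(parseSentinel(viaJson));   // id ends in _2a, index 42
console.log(parseSentinel(viaUrl));    // same ID, same index
console.log(parseSentinel(viaBase64)); // null
```

If a payload is known to be base64 or compressed, decoding it before running the regex restores the check.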

Step 2: assert no PHI lands in application logs

The first boundary assertion is the log layer. It catches the most common leak shape I see in regulated builds: a developer logs an error, the error includes a record fragment, the redactor missed it. The assertion runs at the unit and integration level, exercises a code path under test, captures the emitted log stream, and fails if the sentinel regex matches anywhere in the captured records.

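// test/support/log-capture.ts
import { logger } from "@/lib/logger";
import { PHI_SENTINEL_RX } from "./phi-fixtures";

type Captured = { level: string; message: string; meta: unknown };

export function captureLogs(): { records: Captured[]; restore: () => void } {
  const records: Captured[] = [];
  const original = logger.transport;
  // Replace only the transport for the duration of the test.
  logger.transport = (rec) => {
    records.push(rec);
  };
  return { records, restore: () => { logger.transport = original; } };
}

// JSON.stringify serializes Error instances as "{}", so expand them explicitly:
// own enumerable fields (such as detail) plus name, message, and stack.
const expandErrors = (_key: string, value: unknown): unknown =>
  value instanceof Error
    ? { ...value, name: value.name, message: value.message, stack: value.stack }
    : value;

export function assertNoPhiInLogs(records: Captured[]): void {
  const flat = JSON.stringify(records, expandErrors);
  const match = flat.match(PHI_SENTINEL_RX);
  if (match) {
    throw new Error(
      `PHI sentinel found in log stream: ${match[0]}\n` +
      `Logged records: ${flat.slice(0, 800)}...`
    );
  }
}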

The capture function replaces the logger's transport for the test, collects every record, and restores the original afterward. The assertion serializes the captured records and runs the sentinel regex against the output, failing with the matched sentinel ID if anything hits.

The two failure modes this catches in practice both look like simple developer habits. The first is template-string interpolation. Code that builds a log message with logger.error and a backtick template that drops member.legalName into the string lands the legal name in the log even if the redactor only inspects the metadata object, because the interpolation happens before the redactor runs. The second is error-object stringification: a library throws a PostgresError whose detail field includes the conflicting row, the application logs error.detail, and the redactor passes the string through untouched because it does not know which fields of which error types contain PHI.
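
The first failure mode can be reproduced in a few lines. This is a sketch with a hypothetical metadata-only redactor (redactMeta, makeLogger, and the PHI_KEYS list are stand-ins, not the logger described in this article); it shows the same value being redacted when passed as metadata and leaking when interpolated into the message.

```typescript
type LogRecord = { level: string; message: string; meta: Record<string, unknown> };

const SENTINEL_RX = /ZZPHI_TEST_SENTINEL_[0-9a-f]+(?:_[0-9a-f]+)?/;
const PHI_KEYS = new Set(["legalName", "email", "dateOfBirth"]);

// Hypothetical redactor: inspects metadata keys only, never the message string.
function redactMeta(meta: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(meta)) {
    out[key] = PHI_KEYS.has(key) ? "[REDACTED]" : value;
  }
  return out;
}

function makeLogger(sink: LogRecord[]) {
  return {
    error(message: string, meta: Record<string, unknown> = {}): void {
      sink.push({ level: "error", message, meta: redactMeta(meta) });
    },
  };
}

const member = { id: "m_1", legalName: "Member ZZPHI_TEST_SENTINEL_8f3c_1" };

// Passed as metadata: the redactor sees the key and masks the value.
const safeSink: LogRecord[] = [];
makeLogger(safeSink).error("preference update failed", { legalName: member.legalName });

// Interpolated: the string is built before the logger is called.
const leakySink: LogRecord[] = [];
makeLogger(leakySink).error(`preference update failed for ${member.legalName}`);

const safeLeaks = SENTINEL_RX.test(JSON.stringify(safeSink));   // false
const leakyLeaks = SENTINEL_RX.test(JSON.stringify(leakySink)); // true
```

The redactor did its job in both calls; the second call simply never gave it the value in a form it inspects, which is why the check has to run on the emitted records.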

Step 3: assert no PHI in error responses crossing server-to-client

The second boundary assertion is the server-to-client error edge. Server actions and route handlers return errors to the client when they throw. If the team's error handling re-emits the original exception, or includes a record fragment in the error message, that fragment ships to the browser the first time the action throws under load. This is a boundary failure I describe in the App Router survey and the test for it is small.

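// test/support/action-errors.ts
import { PHI_SENTINEL_RX } from "./phi-fixtures";

// JSON.stringify serializes Error instances as "{}", which would hide a
// sentinel in error.message. Expand errors (and nested causes) explicitly.
const expandErrors = (_key: string, value: unknown): unknown =>
  value instanceof Error
    ? {
        ...value, // own enumerable fields, e.g. a driver's detail
        name: value.name,
        message: value.message,
        stack: value.stack,
        cause: (value as { cause?: unknown }).cause,
      }
    : value;

export async function assertActionErrorIsOpaque<T>(
  action: () => Promise<T>,
): Promise<void> {
  let result: unknown = null;
  let thrown: unknown = null;
  try {
    result = await action();
  } catch (e) {
    thrown = e;
  }
  const payload = JSON.stringify({ result, thrown }, expandErrors);
  const match = payload.match(PHI_SENTINEL_RX);
  if (match) {
    throw new Error(
      `PHI sentinel reached the client error boundary: ${match[0]}\n` +
      `Payload: ${payload.slice(0, 800)}...`
    );
  }
}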

The driver invokes the action under conditions designed to make it throw. A common shape is to insert a fixture, then call the action with a value that violates a unique constraint, then assert that the resulting error or returned payload contains no sentinel.

// test/actions/update-member-preferences.test.ts
import { describe, it } from "vitest";
import { buildMember } from "../support/phi-fixtures";
import { assertActionErrorIsOpaque } from "../support/action-errors";
import { updateMemberPreferences } from "@/app/(member)/actions";
// seedMember and conflictingPrefs are project test helpers (not shown here).

describe("updateMemberPreferences boundary", () => {
  it("returns an opaque error ID on constraint violation", async () => {
    const member = await seedMember(buildMember());
    await assertActionErrorIsOpaque(() =>
      updateMemberPreferences(member.id, conflictingPrefs(member)),
    );
  });
});

This catches the leak shapes the team did not write themselves. A Postgres unique-constraint violation often quotes the conflicting value in detail. An ORM-level error can include the failing input. An HTTP client error from an external API call can carry the request body. The opaque-error pattern handles all of them by default, and the boundary test confirms the pattern is applied at every action boundary. A natural extension is a sweep test that iterates every server action and asserts the contract holds, paired with the audit-log four-field shape so a single failure surfaces in both the test report and the audit stream.
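
For readers who have not seen the opaque-error pattern, here is one possible shape, offered as a sketch only. The wrapper name withOpaqueErrors, the result types, and the report callback are all hypothetical; the point is that the caller receives an ID and nothing else.

```typescript
import { randomUUID } from "node:crypto";

type Success<T> = { ok: true; data: T };
type OpaqueFailure = { ok: false; errorId: string };

// Hypothetical wrapper: hands the original error to a server-side sink,
// keyed by errorId, and returns only that ID to the caller.
async function withOpaqueErrors<T>(
  run: () => Promise<T>,
  report: (errorId: string, error: unknown) => void,
): Promise<Success<T> | OpaqueFailure> {
  try {
    return { ok: true, data: await run() };
  } catch (error) {
    const errorId = randomUUID();
    report(errorId, error);
    return { ok: false, errorId };
  }
}

// Usage: the thrown message contains a sentinel, the returned payload does not.
const reported: Array<{ errorId: string; error: unknown }> = [];
const outcome = withOpaqueErrors(
  async () => {
    throw new Error("duplicate key: Member ZZPHI_TEST_SENTINEL_8f3c_2a");
  },
  (errorId, error) => {
    reported.push({ errorId, error });
  },
);
```

The report sink still receives the raw error, so it needs the same redaction and log assertions as any other server-side write path.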

Step 4: assert no PHI in URLs, redirects, or query strings

The third boundary assertion lives in the browser. URLs are the boundary developers most often forget because they look like infrastructure. They are not. Every URL ends up in browser history, in referer headers, in CDN access logs, in third-party analytics if any of those scripts are present, and in screenshots members occasionally share with support. Anything in the URL is leaked to surfaces you do not control.

The Playwright pattern listens to every navigation and request, captures the URL, and asserts the sentinel regex never matches.

// test/support/playwright-phi.ts
import { expect, test as base } from "@playwright/test";
import { PHI_SENTINEL_RX } from "./phi-fixtures";

export const test = base.extend<{ phiUrlGuard: void }>({
  phiUrlGuard: [async ({ page }, use) => {
    const offending: string[] = [];
    page.on("framenavigated", (frame) => {
      const url = frame.url();
      if (PHI_SENTINEL_RX.test(url)) offending.push(url);
    });
    page.on("request", (req) => {
      if (PHI_SENTINEL_RX.test(req.url())) offending.push(req.url());
    });
    await use();
    expect(offending, "PHI sentinel found in URL").toEqual([]);
  }, { auto: true }],
});

The fixture wires itself into every browser test automatically via auto: true, so individual tests do not have to remember to enable the guard. Every navigation and every request is inspected against the sentinel regex; offending URLs are accumulated and the test fails on teardown.

This catches the "I put the member ID in the URL because it was easier" anti-pattern and the related "the redirect after form submission included the form values as query parameters" leak. Neither is caught by unit tests; both ship to production unless someone is asserting on URL contents. The cost is one fixture file. The benefit is a regression-proof contract across every end-to-end test in the project.
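
One gap worth sketching: a hand-rolled encoder can percent-escape the underscores, which hides the sentinel from a regex run on the raw URL. The helper below is a hypothetical addition (findSentinelInUrl is not part of the fixture above) that checks both the raw and the percent-decoded form.

```typescript
const SENTINEL_RX = /ZZPHI_TEST_SENTINEL_[0-9a-f]+(?:_[0-9a-f]+)?/;

// Hypothetical helper: returns the first sentinel found in a URL, or null.
function findSentinelInUrl(rawUrl: string): string | null {
  const candidates = [rawUrl];
  try {
    candidates.push(decodeURIComponent(rawUrl));
  } catch {
    // Malformed escape sequence: fall back to the raw string only.
  }
  for (const candidate of candidates) {
    const match = candidate.match(SENTINEL_RX);
    if (match) return match[0];
  }
  return null;
}

const clean = findSentinelInUrl("https://app.example.test/members/123/preferences");
const queryLeak = findSentinelInUrl(
  "https://app.example.test/welcome?name=Member%20ZZPHI_TEST_SENTINEL_8f3c_2a",
);
const escapedLeak = findSentinelInUrl(
  "https://app.example.test/welcome?name=ZZPHI%5FTEST%5FSENTINEL%5F8f3c%5F2a",
);
```

Inside the fixture, the two listeners could call a helper like this in place of the bare regex test and push the returned ID alongside the URL.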

Step 5: contract-test the encryption boundary with property-based assertions

The fourth boundary assertion is the storage layer. The contract is straightforward to state: every column declared as PHI is encrypted on write, decrypted on read, and never round-trips as plaintext at the storage layer. The pattern that has held up across schema changes uses property-based testing to round-trip a representative sample of synthetic records and assert the contract at every step.

// test/encryption-boundary.test.ts
import { afterAll, describe, it, expect } from "vitest";
import fc from "fast-check";
import { Pool } from "pg";
import { buildMember, PHI_SENTINEL_RX } from "./support/phi-fixtures";
import { writeMember, readMember } from "@/lib/db/members";
// seedToUuid is a project helper (not shown here) that maps an integer to a UUID.

const pool = new Pool({ connectionString: process.env.TEST_DATABASE_URL });
afterAll(() => pool.end()); // release connections so the test process can exit

describe("PHI encryption boundary", () => {
  it("never stores PHI as plaintext at rest", async () => {
    await fc.assert(
      fc.asyncProperty(fc.integer({ min: 0, max: 1_000_000 }), async (seed) => {
        const member = buildMember({ id: seedToUuid(seed) });
        await writeMember(member);

        // Read the raw row, bypassing the application's decryption layer.
        // SELECT * so a newly added plaintext column is inspected as well.
        const { rows } = await pool.query(
          "SELECT * FROM member_profile WHERE id = $1",
          [member.id],
        );
        // bytea values arrive as Buffers, which JSON.stringify turns into byte
        // arrays; decode them so plaintext bytes stay visible to the regex.
        const raw = JSON.stringify(rows, (_key, v) =>
          v?.type === "Buffer" && Array.isArray(v.data)
            ? Buffer.from(v.data).toString("latin1")
            : v,
        );
        expect(raw.match(PHI_SENTINEL_RX)).toBeNull();

        // Read through the application layer.
        const round = await readMember(member.id);
        expect(round.legalName).toBe(member.legalName);
      }),
      { numRuns: 200 },
    );
  });
});

The property is one statement run 200 times with different fixture inputs. Each run writes a synthetic member through the application path, queries the raw bytes from Postgres without going through the decryption layer, asserts the sentinel is absent, then reads the member back and asserts the round-trip is faithful. If anyone introduces a column that bypasses the encryption hook, removes a pgp_sym_encrypt call from a write path, or adds a PHI field without registering it, this test fails on its next run.

Property-based testing is the right shape here because the regression is structural, not data-specific. A test that round-trips one fixture passes if the developer happened to encrypt that one fixture's fields. A property-based test samples many inputs per run and covers whatever shapes its generators produce, including edge cases your team did not think to write a unit test for. As written, the property varies only the record ID and the counter-tagged fields; widening the generators to vary field lengths and character sets widens the coverage. The container takes about eight seconds to spin up on a moderate runner, which is acceptable in nightly runs and tolerable in PR runs when gated behind a path filter.
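
To make the "structural, not data-specific" point concrete without a database, here is a dependency-free sketch. It swaps in a hand-rolled seeded loop for fast-check and an in-memory AES-256-GCM round-trip for the Postgres write path; every name in it (encrypt, decrypt, arbitraryName, checkProperty) is hypothetical and stands in for the real layers.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const SENTINEL = "ZZPHI_TEST_SENTINEL_8f3c";
const SENTINEL_RX = /ZZPHI_TEST_SENTINEL_[0-9a-f]+/;
const key = randomBytes(32);

// Stand-in for the write path: AES-256-GCM, fresh IV per value.
// Layout of the stored blob: 12-byte IV, 16-byte auth tag, ciphertext.
function encrypt(plain: string): Buffer {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plain, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

function decrypt(blob: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, blob.subarray(0, 12));
  decipher.setAuthTag(blob.subarray(12, 28));
  return Buffer.concat([decipher.update(blob.subarray(28)), decipher.final()]).toString("utf8");
}

// Small deterministic PRNG so a failing run can be replayed from its seed.
function prng(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generates tagged names of varying shape: no padding, whitespace, non-ASCII, long.
function arbitraryName(next: () => number, i: number): string {
  const pads = ["", " ", "é".repeat(1 + Math.floor(next() * 40)), "x".repeat(Math.floor(next() * 500))];
  const pad = pads[Math.floor(next() * pads.length)] ?? "";
  return `${pad}${SENTINEL}_${i.toString(16)}${pad}`;
}

function checkProperty(runs: number, seed: number): number {
  const next = prng(seed);
  for (let i = 1; i <= runs; i++) {
    const name = arbitraryName(next, i);
    const stored = encrypt(name);
    // latin1 maps each byte to one character, so plaintext bytes stay searchable.
    if (SENTINEL_RX.test(stored.toString("latin1"))) {
      throw new Error(`plaintext sentinel at rest (seed ${seed}, run ${i})`);
    }
    if (decrypt(stored) !== name) {
      throw new Error(`round-trip mismatch (seed ${seed}, run ${i})`);
    }
  }
  return runs;
}

checkProperty(200, 42);
```

The same two checks (sentinel absent in stored bytes, round-trip faithful) are what the fast-check property asserts against the real table; a library like fast-check adds shrinking and better generators on top.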

Step 6: wire the suite into CI as a per-PR gate

The four assertions are individually small. The system that makes them effective is the CI wiring that runs them on every PR and blocks merge on failure.

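# .github/workflows/phi-boundary.yml
name: phi-boundary
on:
  pull_request:
    branches: [main]
  schedule:
    - cron: "0 7 * * *"

jobs:
  phi-fast:
    # PR leg only; the nightly full-suite job is not shown here.
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    timeout-minutes: 8
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: test
        ports: ["5432:5432"]
        options: --health-cmd pg_isready
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 } # full history so --changed can diff against main
      # pnpm must be installed before setup-node can cache it; this action reads
      # the version from the packageManager field in package.json.
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: pnpm }
      - run: pnpm install --frozen-lockfile
      - run: pnpm db:migrate
        env:
          DATABASE_URL: postgres://postgres:test@localhost:5432/postgres
      - name: PHI boundary suite (changed paths)
        run: pnpm vitest run --changed origin/main --reporter=github-actions
        env:
          TEST_DATABASE_URL: postgres://postgres:test@localhost:5432/postgres
      - name: Install Playwright browser
        run: pnpm playwright install --with-deps chromium
      - name: Playwright URL guard (tagged subset)
        run: pnpm playwright test --grep="@phi-url"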
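The PR job runs the boundary suite filtered to changed paths. Vitest's --changed mode selects the test files affected by changed source files, which keeps the PR run under five minutes on a moderate codebase. In CI it needs a base ref to diff against (for example --changed origin/main, with enough git history fetched), because a bare --changed only looks at uncommitted changes and a fresh checkout has none. The Playwright leg uses a @phi-url tag on the URL-guard tests so the subset runs without the full browser suite.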

The nightly job runs the full suite without the path filter, with the property-based encryption test getting its 200 runs across the whole fixture space. PR runs use a smaller numRuns to stay inside the time budget. The phi-boundary workflow is set as a required status check on main, and failures block merge regardless of whether the unit and integration suites pass. That last part is what makes the suite a gate. A non-blocking PHI suite produces a green report that everyone learns to ignore.

The reporting pattern that helped during the first quarter was a single workflow summary collapsing the four assertion categories into a four-row table with pass count, fail count, and the first matching sentinel ID per category. Reviewers could see the boundary state at a glance.
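
A sketch of how such a table could be produced, assuming the test run has already been reduced to one result object per category. The CategoryResult shape and renderBoundarySummary are hypothetical; GITHUB_STEP_SUMMARY is the file path GitHub Actions exposes for job summaries.

```typescript
import { appendFileSync } from "node:fs";

type Category = "logs" | "errors" | "urls" | "encryption";
type CategoryResult = {
  category: Category;
  passed: number;
  failed: number;
  firstSentinel?: string; // first matching sentinel ID, if any assertion failed
};

// Renders one Markdown table row per assertion category.
function renderBoundarySummary(results: CategoryResult[]): string {
  const header = ["| Category | Pass | Fail | First sentinel |", "| --- | --- | --- | --- |"];
  const rows = results.map(
    (r) => `| ${r.category} | ${r.passed} | ${r.failed} | ${r.firstSentinel ?? "-"} |`,
  );
  return [...header, ...rows].join("\n") + "\n";
}

// Illustrative numbers only, not results from a real run.
const results: CategoryResult[] = [
  { category: "logs", passed: 41, failed: 1, firstSentinel: "ZZPHI_TEST_SENTINEL_8f3c_2a" },
  { category: "errors", passed: 18, failed: 0 },
  { category: "urls", passed: 27, failed: 0 },
  { category: "encryption", passed: 1, failed: 0 },
];

// Only written when running inside GitHub Actions, where the variable is set.
const summaryPath = process.env.GITHUB_STEP_SUMMARY;
if (summaryPath) appendFileSync(summaryPath, renderBoundarySummary(results));
```

How the per-category counts are collected depends on the reporter in use; a custom reporter or a post-processing step over the JSON report are both plausible routes.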

Common mistakes that defeat the suite

Four mistakes silently neutralize the boundary check.

The first is mocking the logger globally. A vi.mock("@/lib/logger") at the top of the test file replaces the logger with a stub that records nothing, the capture helper sees no records, and the assertion passes vacuously. Replace the transport instead, which is what the helper in step 2 does.
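
One way to make the vacuous pass visible is to require that the code under test logged something at all. This is a sketch of a stricter variant of the log assertion; the name assertNoPhiInLogsStrict and the minRecords parameter are hypothetical.

```typescript
type Captured = { level: string; message: string; meta: unknown };

const SENTINEL_RX = /ZZPHI_TEST_SENTINEL_[0-9a-f]+(?:_[0-9a-f]+)?/;

// Fails when nothing was captured, so a globally mocked logger cannot
// turn the assertion into a silent no-op.
function assertNoPhiInLogsStrict(records: Captured[], minRecords = 1): void {
  if (records.length < minRecords) {
    throw new Error(
      `expected at least ${minRecords} captured log record(s), got ${records.length}; ` +
      `is the logger mocked?`,
    );
  }
  const match = JSON.stringify(records).match(SENTINEL_RX);
  if (match) {
    throw new Error(`PHI sentinel found in log stream: ${match[0]}`);
  }
}

// Passes: one record was captured and it carries no sentinel.
assertNoPhiInLogsStrict([{ level: "error", message: "update failed", meta: { errorId: "e_1" } }]);
```

This only suits code paths that are expected to log; for paths that are legitimately silent, pass minRecords as 0.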

The second is using random UUIDs as fixture data. A fixture that returns crypto.randomUUID() for the legal name field will never trigger the sentinel regex, but a real production write of the same code path would leak whatever real value flowed through it. The boundary test passes, the leak is real, and the team never finds out until a privacy review surfaces it.

The third is skipping Playwright on the URL surface with the argument that "staging will catch it." Staging will not. Nobody asserts on URL contents in staging, so a leaked value there looks like any other query parameter, and production then exercises the same path with real PHI. The PR-time URL guard is the only place to catch this before it ships.

The fourth is running the suite only nightly. A regression introduced at 9am ships at 11am and stays in production until the next nightly run flags it, most of a day later. The PR-time fast leg is what makes the suite a gate rather than a notification.

What to try next

The fifth assertion that pairs naturally is the audit log contract: every PHI access produces an audit record with the four-field shape, and the audit record itself contains no PHI. The mechanics live in the audit log walkthrough, and the assertion is structurally similar to step 2.

The App Router PHI surfaces survey catalogs the six primitives where PHI can leak; the assertions here cover four of them directly. The remaining two (the request cache layer and the proxy file) lean more on code review than automated tests for now. If your stack includes a clinical CRM, the integration boundary needs its own contract tests, covered in the clinical CRM integration writeup.

The diagnostic that pulls this together for an operator who wants to know whether their existing build has the right scaffolding is the productized stack audit. The audit catches the build-level gap; the boundary suite above catches the per-PR regressions inside it.

Will this suite slow PR feedback to the point that the team disables it?

The PR-time leg runs the boundary suite filtered to changed paths, which keeps it under five minutes on a moderate codebase. The full property-based encryption test and the full Playwright sweep run nightly. If the PR leg starts creeping over five minutes, look for a fixture file that pulls in a giant module graph or an unfiltered Playwright project.

Why a sentinel string instead of a class instance or a typed marker?

The sentinel has to survive serialization. A class instance becomes a plain object after JSON.stringify. A symbol disappears entirely. A typed marker requires the receiving code to know about the type, which the logger transport and the URL inspector deliberately do not. A regex against the serialized output is the lowest-context check that covers logs, error payloads, URLs, and raw database bytes with one expression.
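
The three cases are easy to see side by side. A small sketch (the PhiString class and MARK symbol are made up for the comparison):

```typescript
class PhiString {
  constructor(readonly value: string) {}
}
const MARK = Symbol("phi");

// A class instance loses its type and serializes as a plain object.
const asClass = JSON.stringify({ name: new PhiString("Ada") });

// A symbol-keyed marker is dropped from the output entirely.
const asSymbolKey = JSON.stringify({ name: "Ada", [MARK]: true });

// A sentinel inside the string value is still there after serialization.
const asSentinel = JSON.stringify({ name: "Member ZZPHI_TEST_SENTINEL_8f3c_1" });

console.log(asClass);     // {"name":{"value":"Ada"}}
console.log(asSymbolKey); // {"name":"Ada"}
console.log(/ZZPHI_TEST_SENTINEL_[0-9a-f]+/.test(asSentinel)); // true
```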

How is this different from a redaction layer in the logger?

The redactor is the first defense; the boundary tests are the second. The redactor handles known shapes (a member record passed as the metadata argument). It does not handle a stringified error whose detail field contains a record fragment, or a template-string interpolation that lands the legal name in the message before the redactor sees it.

Do I need a real Postgres test container, or is an in-memory mock enough?

For the encryption boundary, you need real Postgres. The contract is that the bytes at rest never contain plaintext PHI, and "the bytes at rest" only means anything against the actual storage engine. A mock can be made to satisfy any contract you write against it because you control the mock. A test container with your real migrations is the only setup that proves the encryption hook is wired correctly.

How does this fit alongside audit log assertions?

The audit assertion is structurally similar to the log assertion in step 2, run against the audit stream rather than the application log stream. The audit log should contain the four-field shape (actor, action, target, outcome) and references to records by opaque identifier. The same sentinel regex finds violations the same way. Adding it as a fifth assertion is a small change once the first four are in place.
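
A sketch of what that fifth assertion might look like, assuming audit records are plain objects with the four named fields as non-empty strings. The function name and the exact validation rules are hypothetical.

```typescript
const SENTINEL_RX = /ZZPHI_TEST_SENTINEL_[0-9a-f]+(?:_[0-9a-f]+)?/;
const AUDIT_FIELDS = ["actor", "action", "target", "outcome"] as const;

// Checks the four-field shape, then runs the same sentinel scan used for logs.
function assertAuditRecord(record: Record<string, unknown>): void {
  for (const field of AUDIT_FIELDS) {
    const value = record[field];
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`audit record is missing "${field}"`);
    }
  }
  const match = JSON.stringify(record).match(SENTINEL_RX);
  if (match) {
    throw new Error(`PHI sentinel found in audit record: ${match[0]}`);
  }
}

// Passes: the target is an opaque identifier, not a name.
assertAuditRecord({
  actor: "user_7f2",
  action: "member.preferences.update",
  target: "member_41c",
  outcome: "denied",
});
```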

Can these tests catch a misuse of use cache in Next.js 16?

Indirectly. The cache misuse pattern, where two members share a cache entry because the key does not include the member identifier, surfaces as the wrong record returning from a cached function. The URL guard and the error-payload guard will not catch it on their own. A targeted test that calls a cached function as member A and then as member B and asserts the returned record matches the caller is the addition that closes the gap.
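
The shape of that targeted test can be sketched without Next.js. The cachedBy helper below is a hypothetical in-memory stand-in for a cached function, not the use cache directive itself; it exists only to show the A-then-B assertion failing on a shared key and passing on a per-member key.

```typescript
type Profile = { memberId: string; plan: string };

// Hypothetical stand-in for a cached loader: memoizes load under keyOf(memberId).
function cachedBy(
  keyOf: (memberId: string) => string,
  load: (memberId: string) => Profile,
): (memberId: string) => Profile {
  const store = new Map<string, Profile>();
  return (memberId) => {
    const key = keyOf(memberId);
    const hit = store.get(key);
    if (hit) return hit;
    const fresh = load(memberId);
    store.set(key, fresh);
    return fresh;
  };
}

const load = (memberId: string): Profile => ({ memberId, plan: `plan-for-${memberId}` });

// Calls the loader as member A, then as member B, and checks that each
// caller receives its own record.
function assertCallerIsolation(getProfile: (memberId: string) => Profile): void {
  for (const caller of ["member-a", "member-b"]) {
    const profile = getProfile(caller);
    if (profile.memberId !== caller) {
      throw new Error(`cache returned ${profile.memberId}'s record to ${caller}`);
    }
  }
}

const sharedKey = cachedBy(() => "profile", load);            // bug: key ignores the member
const perMemberKey = cachedBy((id) => `profile:${id}`, load); // key includes the member

assertCallerIsolation(perMemberKey); // passes
// assertCallerIsolation(sharedKey) throws: member-b receives member-a's record.
```

Against a real cached function, the same two calls would run under two authenticated sessions, with the assertion comparing the returned record's owner to the caller.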

Sources and specifics

  • The 1,185+ test count comes from one operator's regulated build shipped during 2024 and 2025; it is not a public benchmark and the shape varies by codebase.
  • The patterns described run on Next.js 16 with the App Router, Vitest 1.x on the server-side suites, Playwright 1.x on the browser-side suites, and Postgres 16 in a CI service container.
  • The four assertion categories (logs, error responses, URLs, encryption boundary) are the four points one operator asserted at build time. The choice of categories is editorial, not a published regulatory standard. The Security Rule at 45 CFR Part 164 Subpart C does not name test patterns; the mapping from regulation to assertion is one engineer's working interpretation.
  • The PHI fixture pattern uses synthetic data tagged with a sentinel string. It is suitable for in-development assertion, not for de-identification of a dataset under HIPAA Safe Harbor at 164.514.
  • The CI wiring uses GitHub Actions; the same shape works on other runners that support service containers, parallel jobs, and required status checks.
