Encrypted fields in PostgreSQL for HIPAA: a working pattern

The PHI columns in a regulated Postgres database have to be encrypted at rest, queryable in production, and rotatable without a six-week rewrite. pgcrypto, envelope encryption, and Transparent Data Encryption all sound like they answer that question. Only one of them does, and which one depends on how the rest of the system is built. An approach to PHI storage on Postgres that has held up across backup-restore drills, regional failover, and a key rotation cycle looks like envelope encryption sitting on top of pgcrypto, with a blind-index column on the lookups that have to survive.

This is the storage-layer companion to the four App Router PHI surfaces breakdown. That post covers boundaries between server and client. This one covers what happens to PHI once it lands in the database.

The choice nobody phrases correctly: pgcrypto vs envelope vs TDE

The first mistake I made on this work was treating Transparent Data Encryption, pgcrypto, and envelope encryption as three vendors selling the same product. They answer three different threat questions and they stack rather than substitute.

Transparent Data Encryption, the kind RDS and most managed Postgres providers ship by default, encrypts the data files on disk and the backups. It protects against someone walking out of the data center with a drive. It does not protect against an application bug, a leaked database password, a logged ciphertext, or a backup restore that lands plaintext in a non-production environment. Memory is plaintext. Query results are plaintext. TDE is a perimeter, not a column-level control.

pgcrypto is a Postgres extension that encrypts inside the column. The application or a trigger calls pgp_sym_encrypt with a key, and the column stores ciphertext. Reads call pgp_sym_decrypt with the same key. The DBA looking at the table sees ciphertext. The application that holds the key sees plaintext. This is the right default for small surfaces, and the layer where most teams stop and pay for it later.

Envelope encryption wraps the column-encryption key (the DEK) with a master key (the KEK) held by a key management service. The DEK travels with the row, encrypted. KMS decrypts the DEK only on demand, only after an IAM check, and every decryption is logged. The plaintext PHI is encrypted by a key the database never sees in cleartext, and the master key never leaves the KMS boundary. This is what scales across regions, tenants, and compliance audits.

A regulated app needs all three, layered. TDE protects backups. pgcrypto encrypts columns. Envelope encryption manages keys. This layered stack is what carries three portals over one encrypted data layer on a HIPAA-grade platform I shipped. The interesting design choice is where pgcrypto stops and envelope encryption starts, which is what the next sections work through.

Single jagged glass fragment isolated on a dark studio backdrop, broken edge under cold electric-blue rim, hot-pink dispersion bleeding across the inner face. — // the fragment · broken edge under twin rim lights

When pgcrypto is enough (and when it lies to you)

The simplest case: one small Postgres database, one tenant, one application, a handful of clearly-marked PHI fields, a single AES key in an environment variable, and a workload that fits inside a single team's heads.

-- One-time install
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- The columns
CREATE TABLE member_profile (
  id              uuid PRIMARY KEY,
  email_ct        bytea NOT NULL,           -- pgp_sym_encrypt
  legal_name_ct   bytea NOT NULL,
  date_of_birth_ct bytea,
  created_at      timestamptz NOT NULL DEFAULT now()
);

// app/lib/phi.ts, server-only
import "server-only";
import { sql } from "@/lib/db";

const KEY = process.env.PHI_COLUMN_KEY!; // 32 bytes, hex-encoded

export async function insertMember(input: {
  id: string;
  email: string;
  legalName: string;
  dateOfBirth: string | null;
}) {
  await sql`
    INSERT INTO member_profile (id, email_ct, legal_name_ct, date_of_birth_ct)
    VALUES (
      ${input.id},
      pgp_sym_encrypt(${input.email}, ${KEY}),
      pgp_sym_encrypt(${input.legalName}, ${KEY}),
      ${input.dateOfBirth
        ? sql`pgp_sym_encrypt(${input.dateOfBirth}, ${KEY})`
        : sql`NULL`}
    )
  `;
}

This works. It encrypts at the column boundary, the DBA cannot read values, and the audit story is "the application has the key, the database does not." For a small regulated surface, this is the right starting point.

The hidden cost is the key. PHI_COLUMN_KEY lives in an environment variable, which means it lives in the platform's secret store, which means it lives in every node that runs the app. Rotating it requires coordinating every running process and re-encrypting every existing row. The first time you have to rotate (because a contractor left, because a node leaked an env dump into a log, because a quarterly auditor asked) you discover the rotation is a six-week project on a system that was supposed to be done.

The second hidden cost is the randomized-versus-deterministic split. pgp_sym_encrypt is randomized: the same plaintext produces different ciphertext on every call. That is the right default for confidentiality and the wrong default for any column you need to look up by exact value. The standard fix is the blind index: a separate column that stores an HMAC of a normalized plaintext. The HMAC is deterministic for the same input plus the same secret, so equality lookups work, and the HMAC does not reveal plaintext to anyone without the secret.

ALTER TABLE member_profile ADD COLUMN email_blind_idx bytea;
CREATE INDEX member_profile_email_blind_idx ON member_profile(email_blind_idx);

import { createHmac } from "crypto";

const BLIND_INDEX_SECRET = process.env.PHI_BLIND_INDEX_SECRET!;

function blindIndex(value: string): Buffer {
  const normalized = value.trim().toLowerCase();
  return createHmac("sha256", BLIND_INDEX_SECRET).update(normalized).digest();
}

export async function findMemberByEmail(email: string) {
  const idx = blindIndex(email);
  return sql`
    SELECT id FROM member_profile WHERE email_blind_idx = ${idx} LIMIT 1
  `;
}

The blind index secret is a separate key from the column key, on a different rotation schedule, in a different secret store. Picking equal-strength secrets in separate stores is the part teams skip on the first pass.

Ultra-wide distant view of a solitary translucent monolith on a dim plain at dusk, vanishing horizon, electric-blue atmospheric haze, hot-pink rim on the far edge. — // the distant form · monolith on vanishing plain

When envelope encryption beats column encryption

Single-key column encryption stops scaling the moment you need different blast radii: one tenant compromised should not leak another, one region breached should not open the others. That requires different keys, which requires a place to keep keys that is not a long list of environment variables.

Envelope encryption answers this with a two-layer key hierarchy. Each row (or each tenant, or each table partition) has its own DEK. The DEK encrypts the plaintext. The DEK itself is encrypted by a KEK that lives in a key management service like AWS KMS, GCP KMS, or HashiCorp Vault Transit. The wrapped DEK travels with the row. To read the plaintext, the application asks KMS to decrypt the DEK, then uses that DEK to decrypt the column. The KEK never leaves the KMS boundary. The DEK exists in plaintext only briefly in application memory.

CREATE TABLE member_profile (
  id                uuid PRIMARY KEY,
  email_ct          bytea NOT NULL,
  email_dek_ct      bytea NOT NULL,           -- DEK wrapped by KMS
  email_key_version int  NOT NULL,
  email_blind_idx   bytea,
  legal_name_ct     bytea NOT NULL,
  legal_name_dek_ct bytea NOT NULL,
  legal_name_key_version int NOT NULL,
  created_at        timestamptz NOT NULL DEFAULT now()
);

import "server-only";
import { randomBytes, createCipheriv, createDecipheriv } from "crypto";
import { kmsClient } from "@/lib/kms";

const KEK_ID = process.env.PHI_KEK_ID!;     // KMS key alias, not a secret
const CURRENT_VERSION = 1;

async function wrapDek(plaintext: Buffer): Promise<Buffer> {
  const out = await kmsClient.encrypt({ KeyId: KEK_ID, Plaintext: plaintext });
  return Buffer.from(out.CiphertextBlob!);
}

async function unwrapDek(ciphertext: Buffer): Promise<Buffer> {
  const out = await kmsClient.decrypt({
    KeyId: KEK_ID,
    CiphertextBlob: ciphertext,
  });
  return Buffer.from(out.Plaintext!);
}

export async function encryptField(plaintext: string) {
  const dek = randomBytes(32);                    // 256-bit DEK
  const iv = randomBytes(12);                     // GCM nonce
  const cipher = createCipheriv("aes-256-gcm", dek, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  const wrapped = await wrapDek(dek);
  dek.fill(0);                                    // best-effort scrub
  return {
    ciphertext: Buffer.concat([iv, tag, ct]),
    wrappedDek: wrapped,
    keyVersion: CURRENT_VERSION,
  };
}

export async function decryptField(
  ciphertext: Buffer,
  wrappedDek: Buffer,
): Promise<string> {
  const iv = ciphertext.subarray(0, 12);
  const tag = ciphertext.subarray(12, 28);
  const ct = ciphertext.subarray(28);
  const dek = await unwrapDek(wrappedDek);
  try {
    const decipher = createDecipheriv("aes-256-gcm", dek, iv);
    decipher.setAuthTag(tag);
    const pt = Buffer.concat([decipher.update(ct), decipher.final()]);
    return pt.toString("utf8");
  } finally {
    dek.fill(0);
  }
}

Three things this pattern buys you. KMS logs every decryption, so the "who decrypted PHI when" audit trail is the KMS log itself. The DEK is unique per field write, so a row-level breach does not give attackers a corpus to attack the master key. Per-tenant or per-region scoping is a question of which KEK you wrap with; the row layout does not change.

The cost is operational. Every read crosses a KMS boundary, adding a few milliseconds plus IAM and rate-limit considerations. At read-heavy scale, cache the unwrapped DEK in memory for the lifetime of a request and let it die with the response, never the process. KMS rate limits bite first; benchmark before you ship.

Query patterns that survive encryption

Encrypting columns breaks four query patterns and preserves one. The preserved pattern is exact-equality lookup on a blind index. The broken ones are substring search, range queries, sort orders, and joins on encrypted values. The mistake to avoid is reaching for a tool that "decrypts on read" inside the database; that undoes the encryption boundary you just built.

The working playbook for each pattern. Equality on an encrypted column: blind index, HMAC the normalized value with a separate key, index the HMAC. Range or sort on an encrypted value: don't. Compute a non-PHI surrogate (year-of-birth band, postal prefix, age bracket) and sort on that, decrypt the rest at the read edge. Joins between two encrypted-PHI tables: never join on the PHI directly. Store a pseudonymous member ID in both tables, join on the pseudonym, decrypt at the read edge after the join is materialized.

-- Equality with blind index, fine
SELECT id, email_ct, email_dek_ct
FROM member_profile
WHERE email_blind_idx = $1
LIMIT 1;

-- Range, do not
-- (Imagine SELECT WHERE date_of_birth_ct < '2000-01-01' going wrong.)

-- Sort on a derived non-PHI surrogate, fine
SELECT id, year_of_birth_band
FROM member_profile
ORDER BY year_of_birth_band DESC;

-- Join on pseudonym, fine
SELECT mp.id, mp.email_ct, mp.email_dek_ct, ar.appointment_at
FROM member_profile mp
JOIN appointment_record ar ON ar.member_pseudonym = mp.member_pseudonym
WHERE mp.member_pseudonym = $1;

Full-text search catches teams off guard. Postgres tsvector over a ciphertext column is meaningless. The pattern is a separate search index holding only redacted, non-PHI tokens (intake keywords, plan codes, never names or contact info), with the application enforcing redaction at write time. The PHI columns themselves never feed the search path. The healthcare-cluster post on shipping audit-trail tables that hold up under inspection covers the parallel trail you build alongside this so reads of decrypted PHI leave a record.

“The mistake is treating TDE, pgcrypto, and envelope encryption as substitutes. They are not.”

Macro close-up of crystalline dispersion on a single glass slab, refractive bands of cold blue splitting into hot-pink across the chipped corner, fine surface detail. — // the dispersion · refractive bands close up

Key rotation strategy without a 30-day rewrite

The key rotation question decides whether your encryption story is real. A pattern that cannot rotate keys is a pattern with one expiration date. The honest version is that rotation never finishes; you just stop depending on the old key in any code path that runs in production.

The pattern that holds is per-field key versioning plus lazy rotation on next write. Every encrypted field carries its key_version. The decrypt helper dispatches on version: v1 unwraps with the v1 KEK alias, v2 with the v2 alias. New writes use the current version. Reads of old rows decrypt with the old version. A background job re-encrypts cold rows on a budget. The rotation is "complete" when no row carries the old version, which it never quite is, and that is fine; the old key still works until you actually retire the KEK alias.

type KeyVersion = 1 | 2;

const CURRENT_VERSION: KeyVersion = 2;

const KEK_BY_VERSION: Record<KeyVersion, string> = {
  1: process.env.PHI_KEK_ID_V1!,
  2: process.env.PHI_KEK_ID_V2!,
};

async function unwrapDekV(
  ciphertext: Buffer,
  version: KeyVersion,
): Promise<Buffer> {
  const out = await kmsClient.decrypt({
    KeyId: KEK_BY_VERSION[version],
    CiphertextBlob: ciphertext,
  });
  return Buffer.from(out.Plaintext!);
}

export async function decryptFieldV(
  ciphertext: Buffer,
  wrappedDek: Buffer,
  version: KeyVersion,
): Promise<string> {
  const dek = await unwrapDekV(wrappedDek, version);
  // ...same AES-GCM decrypt path as before
  // ...lazy-rewrite path: if version !== CURRENT_VERSION, schedule a re-encrypt
}

The lazy-rewrite path is the part that actually moves rotation forward. When the app reads a row whose key_version is below the current version, the read path enqueues a background re-encrypt of that row with the current version. After ninety days, the long tail is small enough that you can run a full sweep without hurting production, and after that you can revoke the old KEK with a clean audit trail.

The blind-index secret rotates the same way, with the same complication: rotating it invalidates the existing index column, so you build an email_blind_idx_v2 column alongside the v1 column, dual-write during the rotation, and drop v1 once cutover is complete.

Backup and restore implications nobody plans for

A regulated app's backup story has three pieces: the database backup, the KMS key, and the runbook to bring them together. The backup itself, taken with pg_dump or a managed snapshot, dumps ciphertext, which is the right behavior. The restore is where it falls apart.

The first failure mode is restoring to a region without the KMS key. The KEK is regional. A snapshot restored elsewhere cannot decrypt the wrapped DEKs without first replicating or recreating the KEK in the new region. The fix is multi-region KMS keys, used deliberately, with restore drills that include the key-decrypt step and a verification read of a known-good encrypted row.

The second failure mode is restoring to a non-production environment for debugging. Encrypted columns are inert in staging if staging cannot reach the production KEK. That is the right compliance posture and the wrong outcome for the engineer trying to reproduce a bug. The pattern is a separate "synthetic PHI" key in non-production, with a script that decrypts production data, applies a deterministic anonymization (HMAC the SSN with a staging-only secret, replace email domains), and re-encrypts under the staging KEK. That script is its own attack surface and sits behind the same review gates as production access.

The third failure mode is cross-region replication. Postgres replicates ciphertext fine and the wrapped DEK travels with the row, but the KEK has to exist in the replica region or the KEK's multi-region setting has to be on. Easy to forget on the day of a failover.

Ultra-wide backlit silhouette of a single tall translucent slab against a deep electric-blue evening sky, hot-pink magic-hour band along the low horizon. — // the silhouette · slab against deep dusk sky

How to set this up on a fresh schema

Six steps, in this order, before any application code touches the database.

Step 1: classify the columns. Walk every table with a privacy lens. Mark each column as PHI, quasi-PHI (identifiers that combine into PHI), or non-PHI. Only PHI gets envelope-encrypted. Quasi-PHI gets a pseudonymization decision (HMAC, tokenize, leave alone). Skipping this produces "encrypt everything" budgets and "encrypt nothing usable" outcomes.

Step 2: pick AES-256-GCM and write it down. GCM is authenticated encryption; the decrypt step verifies the ciphertext has not been tampered with. CBC without an HMAC is not a defense against active attackers. Pick GCM, document it in the architecture notes, and do not let anyone add a "lighter" cipher later.

Step 3: split into a dedicated phi schema. Postgres schemas are namespaces. Put encrypted-PHI tables in a phi schema with a restricted role grant: only the application service account can SELECT and INSERT, only the app role's session can call the decrypt helpers, the read-replica replication role gets SELECT on ciphertext columns only. The principle of least privilege becomes visible in the grants themselves.

Step 4: standardize the column quartet for every encrypted field. For an email field: email_ct, email_dek_ct, email_key_version, and optionally email_blind_idx. The naming is load-bearing; a future engineer should recognize the pattern at a glance.

Step 5: keep the encrypt and decrypt helpers in the application data-access layer, not in SQL. The database has no business calling KMS or holding the unwrapped DEK in memory. The encryption boundary belongs where you can review it, test it, and audit it.

Step 6: write a CI test that round-trips every encrypted field with both the current and previous key version. This is the test that catches the "we shipped a v2 key but forgot to keep v1 in the dispatcher" bug. It runs on every pull and is the cheapest insurance you can build. The same kind of round-trip thinking shows up in the productized stack audit I run for DTC and regulated clients when the encryption layer is in place but no one has tested whether it survives a key rotation under load.

If those six are in place before the first row writes, the pattern holds when the system grows. If they are bolted on later, the rewrite is the project. The field guide for logging in HIPAA-regulated apps without leaking sensitive fields and the working pattern for wiring Salesforce Health Cloud into a Next.js app are the two adjacent practices that round out this storage layer into a defensible compliance posture.

FAQ

Is pgcrypto still recommended in 2026, or is it considered legacy?

pgcrypto is fine for column-cipher mechanics, especially for small surfaces. The issue is the key management around it. If the key sits in an env var and rotation is manual, you have outgrown pgcrypto-alone the moment you need to rotate. Combining pgcrypto's primitives with KMS-managed envelope encryption keeps the column work simple and the key work scalable.

Why not use Postgres at-rest encryption or AWS RDS encryption alone?

Those features encrypt the data files at rest. That is necessary and not sufficient under HIPAA's encryption guidance. They do not protect against application-level breaches, leaked DB credentials, logged ciphertext, or a pg_dump copy restored to staging. Column-level encryption sits on top of the at-rest layer; it does not replace it.

Do I need a blind index for every encrypted column?

No. Add a blind index only for columns you actually look up by exact value. Every blind index is another secret to manage and another data-leak surface (the HMAC reveals plaintext distribution). A common rule of thumb: blind-index email and external-identifier columns, load names and addresses by primary key.

What about searching encrypted columns with order-preserving or homomorphic encryption?

Order-preserving encryption leaks order, which leaks distribution, which is enough to identify members in any dataset over a few hundred rows. Homomorphic encryption is real and slow; it is rarely the right answer for a transactional regulated app. The pattern that holds in production is to compute non-PHI surrogates for queries that need ordering or aggregation, and decrypt PHI only at the read edge.

How do I handle PHI in JSONB columns?

Treat the JSONB blob as a single encrypted unit unless individual fields need independent access. If they do, split them into separate typed columns. JSONB-with-mixed-PHI makes per-field encryption awkward and per-field redaction in logs error-prone. Splitting the schema costs migration work; not splitting costs you on every audit.

Does this pattern work the same way on Supabase or Neon as on RDS?

The Postgres-side primitives (pgcrypto, schema grants, GCM) are the same. The KMS pattern depends on the host. Supabase exposes Vault for app-managed secrets and integrates with provider KMS in some tiers; Neon offers similar primitives at the project level. Confirm the host's BAA covers the workload and that the KMS used is compliance-eligible before adopting either.

Sources and specifics

pgcrypto extension reference: PostgreSQL official docs, postgresql.org/docs, pgcrypto module.
AES-256-GCM and authenticated encryption: NIST Special Publication 800-38D.
HIPAA Security Rule encryption guidance: 45 CFR 164.312(a)(2)(iv); HHS guidance on rendering PHI unusable, unreadable, or indecipherable.
AWS KMS envelope encryption pattern: AWS KMS Developer Guide, "Envelope encryption".
Patterns observed across PHI storage-layer engineering in regulated Next.js engagements; no client-identifying schema names, record counts, or infrastructure identifiers are included.
Code samples target Node.js 20+, TypeScript strict, PostgreSQL 15+ with pgcrypto. Forward-compatible with Postgres 16 and 17.
Nothing here is legal advice. Engage privacy counsel and a qualified compliance reviewer before applying these patterns in production.