Solo review loops have blind spots. I review my own drafts, find a few issues, ship, and then a client emails three days later to flag the typo I missed. For the last two months I have been running Claude as a second pair of eyes on every brand deliverable before it ships. These are the field notes on what the AI review catches, what it still misses, and how the review loop fits into the day without becoming another piece of ceremony nobody actually uses.
This is not a claim that AI reviewers replace senior human reviewers. They do not. The claim is narrower: on the specific failure modes a tired solo practitioner creates on their own output, an AI second pair of eyes catches a meaningful fraction before anything ships.
A sample of the flags the reviewer raised over those two months:

| Flag | Category | Example |
| --- | --- | --- |
| F3 | factual | $28K in P2, $32K in P5 |
| F4 | factual | brand name capitalized two ways |
| F5 | voice | 'leverage' violates voice dossier |
| F6 | voice | em dash on line 44 |
| F7 | structure | memo answers a different question |
| F8 | craft | parallel list broken mid-paragraph |
2026-02-08: the self-review blind spot
Shipped a brand deck for a client midweek. Late Thursday afternoon, standing at the mailbox, I realized I had used the wrong brand name on slide 7. The deck had been in front of me for three days. I had reviewed it at least four times. I did not see the error.
This was not a new pattern. Self-review failures for solo operators cluster around a specific shape: things my eye skates over because my brain already knows what is supposed to be there. Wrong names, duplicate words, missing Oxford commas in a list that usually has them, sentences that technically parse but mean the wrong thing. A second reader sees the page as it is, not as I expect it to be.
I have known this was a problem for a while. What changed this week is I decided to do something about it rather than just note it again.
2026-02-22: the first AI review pass
Built a reviewer skill: a Claude Code skill that takes a draft deliverable (markdown, a deck content doc, a PDF content export) plus a set of reference files (the brief, the brand voice dossier, a banned-phrases list) and returns a structured review.
The review has four sections. Fidelity-to-brief: did the draft honor what the brief actually asked for. Voice-calibration: did the draft stay inside the brand voice dossier. Factual-consistency: did names, numbers, and dates stay consistent across the draft. Craft: typos, duplicate words, awkward constructions, broken parallelism.
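For concreteness, here is the shape of that review expressed as Python types. This is my sketch of the structure, not the skill's literal output format; the section names follow the four categories above, and the field names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Flag:
    flag_id: str        # e.g. "F3"
    category: str       # "fidelity" | "voice" | "factual" | "craft"
    location: str       # paragraph, slide, or line reference
    issue: str          # what the reviewer found
    suggested_fix: str  # proposed edit; the human decides whether to accept it

@dataclass
class Review:
    fidelity_to_brief: list[Flag] = field(default_factory=list)
    voice_calibration: list[Flag] = field(default_factory=list)
    factual_consistency: list[Flag] = field(default_factory=list)
    craft: list[Flag] = field(default_factory=list)
```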
Ran it on three backlog drafts I had already shipped. It found seven issues across the three drafts, two of which had gone out the door and needed a v2 to fix. One of those was a brand name inconsistency on a 40-slide deck, exactly the class of error that shipped on 02-08. The review caught it in eleven seconds.
2026-03-10: what it caught that I missed
Three weeks in. The reviewer has become part of the ship checklist on every brand deliverable. I looked back at what it has caught:
Factual drift. In 8 of 11 reviewed drafts, the reviewer flagged at least one internal inconsistency I had not noticed: a number that appeared as $28K in one paragraph and $32K in another, a product name capitalized two different ways, a date that moved by a week between the brief and the draft. All of these are the kind of error a human reader would catch if they read the doc with full attention; the AI catches them without attention fatigue.
Voice drift. In 6 of 11 drafts, the reviewer caught specific phrases that violated the brand voice dossier. "Leverage" on a brand that bans it. A triple-beat rhythm on a brand whose voice is plain and declarative. Em dashes on a brand where they are banned. These are small but they are the exact places where AI-drafted text betrays itself as AI-drafted, and catching them before shipping keeps the output feeling hand-written.
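The banned-phrase end of voice drift is the most mechanical of these checks. As a rough illustration of that class of check (the skill does this inside its own prompt, not with a script), here is a few lines of Python against an assumed plain-text banned-phrases file; file names are illustrative:

```python
import re
from pathlib import Path

# Illustrative file names; the real skill reads these as reference files.
banned = [p.strip() for p in Path("banned_phrases.txt").read_text().splitlines() if p.strip()]
draft = Path("draft.md").read_text()

for phrase in banned:
    for match in re.finditer(re.escape(phrase), draft, flags=re.IGNORECASE):
        # Convert the character offset to a 1-based line number for the flag.
        line_no = draft.count("\n", 0, match.start()) + 1
        print(f"banned phrase {phrase!r} on line {line_no}")
```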
Structural misalignment. On a few longer drafts, the reviewer flagged that the draft did not actually answer the question the brief asked. This is the subtlest class of catch and it is where the reviewer earned its keep. A 2,000-word memo that is technically well-written but answers the wrong question is a worse outcome than a 400-word memo that answers the right question, and the reviewer caught two of those in eight weeks.
2026-03-24: what it missed that a human caught
The reviewer is not perfect, and the places it fails are consistent.
Judgment calls about tone in context. The reviewer, working from the voice dossier, catches the obvious violations. It does not catch whether a specific paragraph lands right for the specific executive who is going to read it. A senior human reviewer who knows the audience catches tone-in-context in a way the AI does not.
Strategic misalignment with client internal politics. A brand architecture deliverable I shipped in March passed every reviewer pass but turned out to have framed a product in a way the client's CEO was actively moving away from. The reviewer did not catch it because the reviewer did not know about the internal shift. A human collaborator who was closer to the client would have.
Craft judgment on stylistic choices. When the reviewer flags a stylistic issue, it is usually right. When it decides not to flag something because the draft is "within voice," it sometimes misses that the voice itself is slightly flat in that section. A human reviewer reads for energy, not just fidelity, and the AI is worse at that dimension.
The pattern: AI reviewers are strong on rule-based checks (is this phrase banned, is this number consistent, does this match the brief) and weak on judgment-based checks (does this land, does this match where the client is going). The lesson is not to stop using AI review; it is to keep the scheduled human reviewer in the loop for the deliverables where judgment matters more than rule-checking.
2026-04-07: the current review loop
The loop I settled on, running now:
Every brand deliverable gets a self-review pass. Same as before. I catch the large structural things myself.
Then the AI reviewer runs against the draft, the brief, and the voice dossier. I described the voice dossier structure in the brand voice prompt library walkthrough. The reviewer returns the four-section review. I address its flags, push back on any I disagree with (about 20 percent of the time), and keep a log of the pushbacks so I can see when the reviewer is systematically wrong about something.
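The actual reviewer is a Claude Code skill, but the shape of one pass is easy to sketch with the Anthropic Python SDK: load the draft, the brief, and the dossier, ask for the four sections, and read back the flags. Paths and the model name here are placeholders, not the skill's real configuration.

```python
from pathlib import Path
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

draft = Path("deliverable.md").read_text()
brief = Path("brief.md").read_text()
dossier = Path("voice_dossier.md").read_text()

prompt = (
    "Review the draft against the brief and the voice dossier. Return four "
    "sections: fidelity-to-brief, voice-calibration, factual-consistency, "
    "craft. For every flag give a location, the issue, and a suggested fix.\n\n"
    f"<brief>\n{brief}\n</brief>\n\n"
    f"<voice_dossier>\n{dossier}\n</voice_dossier>\n\n"
    f"<draft>\n{draft}\n</draft>"
)

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder; use whichever model the skill runs on
    max_tokens=4000,
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)
```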
For a subset of deliverables I tag as review-critical (brand-defining work, work for new clients, anything with external distribution), I add a human reviewer on top. Scheduled in advance, paid for their attention, 30 to 45 minutes of dedicated read time. The human catches the judgment-based things the AI misses.
That is the complete loop. Self, AI, human for the critical subset. It takes longer than self-review alone, and it catches substantially more than self-review alone. The trade is worth it for anything going to a client.
What the two months taught me
AI reviewers are not replacing senior collaborators. They are replacing the "second glance" I was not doing at all. Before I wired this up, there was a bunch of work that just got one pass of self-review and shipped. That work is now getting two passes, and the second pass is surfacing issues consistently. The alternative was not "junior designer reviews my work"; the alternative was "nobody reviews my work."
The reviewer is only as good as the reference material it gets. A reviewer pass without the voice dossier produces generic feedback. A reviewer pass with a dense dossier and the actual brief produces specific, actionable feedback. The mechanic is the same one that makes the role-conditioned drafting skill work. Density of reference material is the lever.
Track the pushbacks. When I disagree with a reviewer flag, I log it. Patterns have already emerged: the reviewer is slightly over-sensitive to long sentences, and slightly under-sensitive to passive voice. Knowing that lets me weight its feedback appropriately rather than just agreeing with every flag.
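The pushback log itself is nothing fancy; one JSON line per rejected flag is enough to tally patterns later. A minimal sketch, with illustrative field names and path, and a hypothetical example call:

```python
import json
from datetime import date
from pathlib import Path

def log_pushback(deliverable: str, flag_id: str, category: str, reason: str) -> None:
    """Append one rejected reviewer flag to a JSON-lines log."""
    entry = {
        "date": date.today().isoformat(),
        "deliverable": deliverable,
        "flag_id": flag_id,
        "category": category,  # fidelity | voice | factual | craft
        "reason": reason,       # why the flag was rejected
    }
    with Path("pushback_log.jsonl").open("a") as f:
        f.write(json.dumps(entry) + "\n")

# Hypothetical example: rejecting a voice flag because the em dash sits in a verbatim client quote.
log_pushback("acme-brand-deck", "F6", "voice", "em dash is inside a quoted client line")
```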
The review loop is one piece of the larger hybrid role I write about in the creative-tech operator playbook, where carrying more context per unit of time is only sustainable if the shipping surface has a second pair of eyes somewhere in the loop.
Frequently asked questions
Can the reviewer see visual design, or only text?
Claude with vision can read the images in a PDF or a deck export, and it does a reasonable job of flagging obvious visual issues (text overflow, inconsistent spacing, mismatched color usage against the brand palette). It is not replacing a designer's review pass. For visual craft I still want a human. For catching the obvious misses, vision review adds real value.
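For reference, this is roughly what a vision pass on one exported slide looks like with the Anthropic Python SDK. The slide path and model name are placeholders, and the real pass runs inside the reviewer skill rather than as a standalone script:

```python
import base64
from pathlib import Path
import anthropic

client = anthropic.Anthropic()
# Exported slide image, base64-encoded for the API's image content block.
slide_png = base64.standard_b64encode(Path("slide_07.png").read_bytes()).decode()

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder
    max_tokens=2000,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": slide_png}},
            {"type": "text",
             "text": "Flag obvious visual issues on this slide: text overflow, "
                     "inconsistent spacing, colors off the brand palette."},
        ],
    }],
)
print(response.content[0].text)
```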
How long does one review pass take?
A typical document or deck takes 8 to 15 minutes of wall-clock time. The reviewer runs in the background while I move on to the next thing. Processing the feedback and making the edits takes another 10 to 20 minutes depending on how much the reviewer flagged. On an ugly draft it can climb to 45 minutes, but those are the cases where the review was most valuable.
Does the reviewer make decisions, or just flag issues?
It flags issues and suggests fixes. I decide what to accept. I ignore roughly 20 percent of flags either because I disagree or because the reviewer is wrong about the context. The design principle is that the reviewer never ships anything; it surfaces work for me to ship.
Could this work for a team of two or three, not just solo?
Yes. On a small team the reviewer plays the role of the cheap first-pass review: catching the things that do not need a senior human's attention. The senior reviewer then focuses on judgment-based feedback. The split lets a small team get larger-team review coverage without the larger-team cost. I wrote about the broader context of this in the concept to production same person retrospective.
What is the worst case if the reviewer flags something incorrectly?
I waste five minutes investigating a false positive. That is the worst case. The cost of a false positive is tiny; the cost of a missed real issue is large. The asymmetry is why running the reviewer on every deliverable is worth it even when its hit rate is imperfect.
Sources and specifics
- Pattern observed across 11 brand deliverables reviewed in February and March 2026; the reviewer flagged 23 issues that self-review missed.
- Review loop grounded in brand engagements including the brand architecture case study, extended across later brand work.
- The reviewer is a Claude Code skill that loads the draft, the brief, and the voice dossier; architecturally similar to the voice library described in the brand voice prompt library walkthrough.
- Human reviewers are kept in the loop for a subset of deliverables flagged as review-critical; the AI reviewer does not replace them for judgment-based feedback.
- The full review loop is packaged in the solo operator’s ship stack for practices running concept-to-ship without a collaborator.
