Things move fast. A few weeks ago I wrote about using adversarial models with my feature-workflow plugin: ripping out the internal self-review skill and replacing it with external reviewers. Different model, read-only, posting structured critiques on the PR. The core insight was that a different model catches things the implementing model doesn't, because it isn't agreeable to its own work. That part was right and hasn't changed.
But the process was cumbersome and time-consuming. To get a review I ran /feature-submit in Claude Code, switched to a second terminal where gemini was waiting, pasted the PR URL, read the findings, switched back to Claude, described what needed to change, and iterated. With three or four rounds per feature, it became difficult to keep track of terminals, even with cmux.
The plugin is now at v9.2.3. The reviewer runs in GitHub Actions, triggered by a PR label, and I don't have to switch terminals anymore.
Setup
One command wires the whole thing up. /feature-init asks for a branch prefix (feature/, feat/, whatever), a target branch (dev or main), a reviewer (gemini, codex, or none), and an API key if you picked a reviewer. It writes .feature-workflow.yml with those settings, drops .github/workflows/feature-review.yml plus the plan and impl review prompts into the repo, uploads the API key as a GitHub repo secret via gh secret set, and flips on the repo-level "Allow GitHub Actions to approve pull requests" setting so the bot's review approvals actually land. Every skill that manages branches — /feature-review-plan, /feature-review-impl, /feature-ship — reads the prefix and target from .feature-workflow.yml, so feature branches get named consistently and PRs target the right base without me typing it each time.
There's also /feature-init --update, which refreshes just the workflow file and the review prompts from the current plugin templates without touching config, the API secret, or existing features. I've used it several times as I've iterated on the prompts and the workflow gating — edit the templates in claude-code-plugins, run --update in my project, commit the refreshed .github/ files, and the next PR gets the new behavior.
Labels as the Trigger
/feature-review-plan and /feature-review-impl still do the git work — branch off dev, commit, push, open or update a draft PR. The new step is that they add a label at the end.
gh pr edit <pr-number> --add-label plan-review
A GitHub Actions workflow listens for pull_request labeled events and runs the reviewer inside the Action runner.
on:
  pull_request:
    types: [labeled, synchronize]
jobs:
  plan-review:
    if: contains(github.event.pull_request.labels.*.name, 'plan-review') && !contains(github.event.pull_request.labels.*.name, 'impl-review')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0
      # Excerpt: the step with id "prompt" that loads the review prompt is omitted here.
      - id: gemini
        uses: google-github-actions/run-gemini-cli@v0
        with:
          prompt: ${{ steps.prompt.outputs.value }}
          gemini_api_key: ${{ secrets.GOOGLE_API_KEY }}
      - name: Post review
        if: always() && steps.gemini.outputs.summary != ''
        run: bash .github/scripts/post-review.sh
Labels specifically, not workflow_dispatch or commit triggers. They're scoped to the PR, visible in the GitHub UI, and remove-then-add is a cheap way to re-run a review without pushing a new commit or filling out a dispatch form.
Two Phases, Two Labels
The plan/impl split already existed in the previous version. plan-review runs against a PR containing only idea.md and plan.md. impl-review runs against the same PR once implementation lands. The two jobs in the workflow have mutually exclusive label guards. This is the guard on the plan-review job:
if: contains(github.event.pull_request.labels.*.name, 'plan-review') && !contains(github.event.pull_request.labels.*.name, 'impl-review')
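To make the mutual exclusion concrete, here is a minimal sketch that models the guards as a plain function over a PR's label names. The plan-review condition mirrors the expression above. The impl-review condition is an assumption (the post only shows the plan-review guard), and jobsFor is a hypothetical helper, not part of the plugin.

```typescript
type Job = "plan-review" | "impl-review";

// Returns the review jobs that would run for a PR carrying these labels.
function jobsFor(labels: string[]): Job[] {
  const has = (name: string): boolean => labels.indexOf(name) !== -1;
  const jobs: Job[] = [];
  // Mirrors: contains(..., 'plan-review') && !contains(..., 'impl-review')
  if (has("plan-review") && !has("impl-review")) {
    jobs.push("plan-review");
  }
  // Assumed guard for the second job; the real expression is not shown in the post.
  if (has("impl-review")) {
    jobs.push("impl-review");
  }
  return jobs;
}

console.log(jobsFor(["plan-review"]));                // only the plan review runs
console.log(jobsFor(["plan-review", "impl-review"])); // only the impl review runs
console.log(jobsFor([]));                             // nothing runs
```

Under these assumptions, a PR that still carries a leftover plan-review label never gets both reviews at once, because the impl-review label switches the plan job off.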
The Submit Skill Stops
/feature-review-impl does its git work, swaps labels, and stops. The skill has explicit language telling Claude not to start reviewing the code locally after submitting:
Display the following to the user, then **STOP**. Do NOT launch any code review agents, do NOT run any review skills, do NOT analyze the code further. Your job is done.
What It's Caught
The clearest example I have was a potential P0 privacy leak. Private and unlisted records were surfacing in a public feed. The feature went through three rounds of plan review before a single line of code was written, and each round was blocked with a different correctness finding.
Round one: the plan fixed the write path in one Lambda and claimed a sibling Lambda was already correct. Gemini pointed at the exact check, !== 'private', and noted that unlisted records are not 'private', so they leak. The plan was updated to tighten the gate to === 'public'.
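A minimal sketch of why the deny-list check leaks, assuming the three visibility values named in this post. The function names are hypothetical, and this is not the actual Lambda code.

```typescript
type Visibility = "public" | "unlisted" | "private";

// Deny-list gate: anything that is not private is treated as publishable.
const leakyGate = (v: Visibility): boolean => v !== "private";

// Allow-list gate: only explicitly public records are publishable.
const strictGate = (v: Visibility): boolean => v === "public";

const allValues: Visibility[] = ["public", "unlisted", "private"];

// Values the old gate let through that the tightened gate blocks.
const leaked = allValues.filter((v) => leakyGate(v) && !strictGate(v));

console.log(leaked); // only "unlisted"
```

The allow-list form also fails closed: if a fourth visibility value is ever added, it stays out of the public feed until someone opts it in.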
Round two: a third leak path in a third Lambda on the client reconnect path, same bug. Also a DynamoDB semantics issue in the cleanup script — visibility <> :public evaluates to UNKNOWN against rows where the visibility attribute is missing entirely, so legacy rows without the attribute would stay leaked.
Fix: attribute_exists(gsi2pk) AND (attribute_not_exists(visibility) OR visibility <> :public).
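The fixed filter can be read as a plain predicate over a row. The sketch below models it in memory and, following the description above, treats the bare comparison as not matching rows where visibility is absent. The Row type and function names are hypothetical, and this is not the cleanup script itself.

```typescript
// Hypothetical row shape: only the two attributes the expression mentions.
type Row = { gsi2pk?: string; visibility?: string };

// Models the bare comparison: visibility <> :public
// Assumption (per the review finding): a missing attribute never matches.
function bareNotPublic(row: Row): boolean {
  return row.visibility !== undefined && row.visibility !== "public";
}

// Models the fix:
// attribute_exists(gsi2pk) AND (attribute_not_exists(visibility) OR visibility <> :public)
function needsCleanup(row: Row): boolean {
  const inPublicIndex = row.gsi2pk !== undefined;
  const visibilityMissing = row.visibility === undefined;
  return inPublicIndex && (visibilityMissing || bareNotPublic(row));
}

const legacyRow: Row = { gsi2pk: "PUBLIC" }; // no visibility attribute at all
const unlistedRow: Row = { gsi2pk: "PUBLIC", visibility: "unlisted" };
const publicRow: Row = { gsi2pk: "PUBLIC", visibility: "public" };

console.log(bareNotPublic(legacyRow)); // false: the legacy row is skipped
console.log(needsCleanup(legacyRow));  // true: the fixed filter picks it up
console.log(needsCleanup(unlistedRow), needsCleanup(publicRow)); // true false
```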
Round three: a spread-inheritance bug. The write-path fix was structured as {...existing, ...(visibility === 'public' ? {gsi2pk: 'PUBLIC', ...} : {})}. If existing already held a leaked gsi2pk='PUBLIC' from a prior retry, the conditional spread is empty for non-public records, so the stale key gets inherited from ...existing.
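Fix: explicitly set gsi2pk: undefined for non-public records and rely on removeUndefinedValues: true (a marshalling option in the AWS SDK's DynamoDB document client), so the key is omitted from the item that gets written.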
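The spread behavior is easy to reproduce without DynamoDB. In this sketch, stripUndefined is a hypothetical stand-in for what removeUndefinedValues: true does at marshalling time, and the item shape is reduced to the fields named above (the real object has more keys, which are elided in the plan).

```typescript
type Visibility = "public" | "unlisted" | "private";
type Item = { id: string; visibility: Visibility; gsi2pk?: string };

// Stand-in for removeUndefinedValues: drop keys whose value is undefined.
function stripUndefined(item: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  Object.keys(item).forEach((key) => {
    if (item[key] !== undefined) out[key] = item[key];
  });
  return out;
}

// Buggy shape: for non-public records the conditional spread is empty,
// so whatever gsi2pk is already on `existing` survives.
function buggyWrite(existing: Item, visibility: Visibility): Record<string, unknown> {
  return {
    ...existing,
    visibility,
    ...(visibility === "public" ? { gsi2pk: "PUBLIC" } : {}),
  };
}

// Fixed shape: always assign gsi2pk, using undefined to mean "remove".
function fixedWrite(existing: Item, visibility: Visibility): Record<string, unknown> {
  return stripUndefined({
    ...existing,
    visibility,
    gsi2pk: visibility === "public" ? "PUBLIC" : undefined,
  });
}

// A non-public record that picked up a leaked key on an earlier attempt.
const stale: Item = { id: "r1", visibility: "unlisted", gsi2pk: "PUBLIC" };

console.log(buggyWrite(stale, "unlisted").gsi2pk);      // "PUBLIC": stale key inherited
console.log("gsi2pk" in fixedWrite(stale, "unlisted")); // false: key removed
```

The general point is that a conditional spread can only add keys. Removing one requires an explicit assignment that overrides whatever came in through ...existing.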
Three correctness bugs, all against a markdown file, with no code written yet. Any one of these could have been annoying to deal with down the road and all were caught just by pushing the plan to GitHub.
In another scenario, I ran the impl-review while finishing up a different feature. An upstream client had changed its event payload between versions, renaming a boolean field. The Lambda I'd just written was still keying off the old field name and defaulting to false when it was missing, so every event from the new client version was being persisted with the wrong value. The same review pass flagged a routing bug where a special-case event type was being tagged as a generic event because the kind switch ran before the special-case check, a hydration-on-load call passing a field that's always null at init time, and a reorder-buffer livelock where unsequenced events could strand behind a missing sequence number.
All four issues were caught and corrected just by going through my standard Claude Code process.
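The renamed-field bug is worth a small illustration because it fails silently. The sketch below uses hypothetical field names (isActive for the old payload, active for the new one), since the post does not name the real ones, and shows one possible fix rather than the one that was actually applied.

```typescript
// Hypothetical payload shapes; the real field names are not given in the post.
type EventV1 = { isActive: boolean };
type EventV2 = { active: boolean };
type IncomingEvent = Partial<EventV1> & Partial<EventV2>;

// Buggy: only reads the old name, so every event from the new client
// version falls through to the default and is persisted as false.
function readFlagBuggy(e: IncomingEvent): boolean {
  return e.isActive ?? false;
}

// One possible fix: accept either name, preferring the new one.
function readFlag(e: IncomingEvent): boolean {
  return e.active ?? e.isActive ?? false;
}

const fromNewClient: IncomingEvent = { active: true };
const fromOldClient: IncomingEvent = { isActive: true };

console.log(readFlagBuggy(fromNewClient)); // false: the wrong value gets persisted
console.log(readFlag(fromNewClient));      // true
console.log(readFlag(fromOldClient));      // true
```

Nothing throws in the buggy version, which is why a reviewer reading the diff against the upstream payload is more likely to spot it than a test written against the old client.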
What Changed About How I Use It
The review is asynchronous now. I run /feature-review-impl, the skill stops, and I go do something else while the Action runs. When the review comments land on the PR, I pull them into Claude's context with gh pr view <n> --comments. The PR is the single record of the review history — commits and comments in one place, not scattered across terminal scrollback. None of the review state is local. If I switch machines mid-feature the PR is still there with the review still attached.
The v9.x Hardening
Most of what changed between v9.0.0 and v9.2.3 was fixes for bugs I hit while running features through the system. v9.1.0 moved the review-posting step into the workflow itself via post-review.sh because the Gemini action's built-in PR-review posting produced malformed output on certain verdicts. v9.2.1 fixed verdict parsing and added the plan-review gating from earlier in this post, both after hitting the bugs in slay-the-spire and upstreaming the fixes the same day. v9.2.3 synced the review prompts with prettier formatting.
Sequence Flow
sequenceDiagram
participant CC as Claude Code
participant GH as GitHub (PR)
participant GA as GitHub Actions
participant GM as Gemini CLI
CC->>GH: git push feature/<id>
CC->>GH: gh pr create --draft (or update)
CC->>GH: gh pr edit --add-label plan-review
Note right of CC: Skill stops here
GH-->>GA: pull_request labeled event
GA->>GA: checkout repo (fetch-depth: 0)
GA->>GA: load .github/review-prompt-plan.md
GA->>GM: run-gemini-cli with prompt + PR ref
GM->>GH: gh pr view <n> --json (diff, plan.md)
GM-->>GA: review summary + verdict
GA->>GH: post-review.sh → gh pr review --comment
Note over CC,GH: Async — I go do something else
CC->>GH: gh pr view <n> --comments
GH-->>CC: review findings
CC->>CC: iterate, re-submit