Fallow and Skylos: Static-Analysis Gates for AI-Generated Code

I've been working on a fun side project to hook up my record player to a Raspberry Pi and automatically identify the songs being played. We'll have a lot more on that once it's complete. The core of the project consists of two fairly distinct parts: a TypeScript/React front end and a Python back end that processes the audio. Typically, this type of split annoys me because I have to run two separate management harnesses. With Claude Code and long-running agent workflows, though, it has provided some interesting benefits: because the two components are so distinct, it's a little easier to run tasks against each in parallel.

However, this process comes with its own drawbacks. In this case, the underlying work was fairly complex (turns out identifying songs correctly from vinyl when queries are not 100% is more difficult than it looks) and the code turned into a bit of a mess.

The first time I ran npx fallow health against the kiosk, the composite score came back at 88. That was already worse than I'd assumed, but the per-file breakdown was the actual surprise: a single file — NowPlaying.tsx, the screen the project is named after — was 287 lines of code with a cyclomatic complexity of 60 and a CRAP score of 3660. The fallow report described it as "the worst file in the kiosk by every metric," which is the kind of one-line summary that's hard to argue with. That report is what this post is about: what fallow surfaced, what skylos covers on the Python side, how both tools wire into pre-commit and Claude Code as hooks, and the four refactor passes that pulled the score from 88 to 98.

The two tools

Fallow is a TypeScript codebase analyzer. It scans for unused code, circular dependencies, duplication, complexity hotspots, and architectural boundary violations, and rolls those into a composite fallow health score out of 100, A–F grade. The components are cyclomatic and cognitive complexity, function-size distribution, maintainability index, and dead-code ratio. Fallow also exposes fallow audit — a changeset-scoped variant that only flags findings introduced by the current change against the base branch, which is what makes it usable as a pre-commit gate without drowning you in inherited debt.

Skylos is a Python static analyzer. It runs across five categories — Security, Quality, Dead Code, Dependencies, and Secrets — each scored with its own grade and weight in a composite codebase grade out of 100, A–F. Findings range from hallucinated imports and hardcoded secrets to swallowed exceptions, SQL-injection patterns, weak hashes, path traversal, dependency CVEs, circular imports, and the complexity-and-size smells that overlap with what fallow catches on the other side. Skylos also exposes skylos agent pre-commit — a staged-file-scoped variant focused on the subset of regressions an AI tool is statistically likely to introduce — which is what makes it usable as a pre-commit gate without drowning you in inherited debt.

One tool per language. Fallow watches the TypeScript kiosk frontend. Skylos watches the Python orchestrator that runs on the Pi.

What the tools flagged on the first run

The fallow summary line was straightforward:

● Health score: 88 A
  Deductions: unit size -X · dead exports -Y

The interesting part is what fallow health then lists by severity. CRITICAL covers very high complexity or CRAP scores — usually one function doing too many things. HIGH is above the cyclomatic-7 or LOC-60 threshold but not yet pathological. MODERATE shows up in the report but doesn't penalize the score.

The metric that matters for prioritization is the CRAP score — Change Risk Anti-Patterns — which combines cyclomatic complexity with an estimate of test coverage. A 13-cyclomatic function with no coverage scores around 182. The same function with full coverage scores closer to 13. NowPlaying.tsx scored 3660. Fallow can read Istanbul coverage data via --coverage to compute the score precisely; without it the estimate comes from the module graph, which is fine for relative ranking.
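The numbers quoted above are consistent with the standard CRAP formula, complexity² × (1 − coverage)³ + complexity. A minimal sketch, assuming fallow applies that formula unchanged; the function name `crap_score` is mine and not part of fallow:

```python
def crap_score(cyclomatic: int, coverage: float) -> float:
    """Standard CRAP formula. `coverage` is a fraction between 0 and 1."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    # Uncovered complexity is penalized quadratically; full coverage
    # collapses the score to the bare cyclomatic complexity.
    return cyclomatic ** 2 * (1.0 - coverage) ** 3 + cyclomatic


print(crap_score(13, 0.0))  # 182.0
print(crap_score(13, 1.0))  # 13.0
print(crap_score(60, 0.0))  # 3660.0
```

The last line reproduces the NowPlaying.tsx figure: cyclomatic 60 with no coverage gives 60² + 60 = 3660.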

NowPlaying.tsx was the top finding, but there were several others above the LOC and cyclomatic thresholds — enough that the first refactor pass was already obviously scoped to "extract everything that isn't structural layout from that one file." Subsequent passes worked outward from there.

The skylos first run on the orchestrator side told a worse story:

Category	Score	Weight	Top finding
Codebase Grade	F (38/100)	—	—
Security	0 / F	35%	CRITICAL: possible SSRF
Quality	0 / F	25%	CRITICAL: cyclomatic complexity
Dead Code	100 / A+	20%	1 dead symbol
Dependencies	81 / B-	10%	LOW: aiohttp@3.12.14 CVEs
Secrets	100 / A+	10%	No secrets found

Two categories at the floor — Security 0/F and Quality 0/F, both with CRITICAL findings driving the score — pulled the composite down to F (38/100) despite Dead Code and Secrets being clean. The categories carry different weights (Security 35%, Quality 25%, Dead Code 20%, Dependencies 10%, Secrets 10%), so the composite is sensitive to where the rot lives. In this case it was sitting in the categories that weigh the most.
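The 38 is what a plain weighted average of the category scores produces. A short sketch, assuming skylos combines categories that way (the weights and scores come from the table above; the `composite` helper is hypothetical):

```python
# Category weights as reported in the skylos summary table.
WEIGHTS = {
    "Security": 0.35,
    "Quality": 0.25,
    "Dead Code": 0.20,
    "Dependencies": 0.10,
    "Secrets": 0.10,
}


def composite(scores: dict[str, float]) -> float:
    """Weighted average of per-category scores (assumed formula)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


first_run = {
    "Security": 0,
    "Quality": 0,
    "Dead Code": 100,
    "Dependencies": 81,
    "Secrets": 100,
}
print(round(composite(first_run)))  # 38
```

Under this model, the two zeroed categories carry 60% of the weight, so the composite could not exceed 40 until at least one of them moved.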

Cleaning up the kiosk

The climb from 88 to 98 happened across four separately shipped features over a few sessions, each routed through a feature-workflow plugin I use (idea → plan → external review → implementation → external review → ship). The passes were:

kiosk-refactor-nowplaying — decomposed NowPlaying.tsx (287 LOC, cyc 60, CRAP 3660) into seven named regions: pure hooks for state (useScreenState, useArtPrefetch, useArtOverride), a now-playing/ sub-folder for the four presentational pieces, and lib/ helpers (lib/art.ts, lib/tracklist.ts) for the pure logic. Score 88 → 90. Test suite 0 → 18.
kiosk-refactor-identify — applied the same shape to the Identify route. Split useIdentifySearch and useIdentifyActions into smaller hooks and extracted pure search helpers. Test suite 18 → 30.
kiosk-fallow-quick-wins — purely mechanical JSX decomposition: every flagged file got its alternates.length > 0 && (…) block and its candidates.map(...) loop lifted into a sibling component. Five files split, one dead file deleted. No behavior changes, no new tests. Score 91.4 → 91.8.
kiosk-fallow-medium-wins — the slightly invasive cuts that quick-wins deliberately deferred: a useToast hook extracted from useIdentifySearch, a useSubmit hook out of useIdentifyActions, formatRelativeTime converted from an if-ladder to a [thresholdSec, divisorSec, unit] lookup table, and applyTokenMatch split into two pure helpers. Test suite 30 → 50.
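The if-ladder to lookup-table conversion in the last pass is the kind of change that removes branches without changing behavior. The kiosk's version is TypeScript; the sketch below shows the same [thresholdSec, divisorSec, unit] shape in Python, with thresholds, unit labels, and output format that are illustrative assumptions rather than the kiosk's actual values:

```python
# Each row is (threshold_sec, divisor_sec, unit). The first row whose
# threshold exceeds the age wins. Values here are illustrative only.
RELATIVE_TIME_TABLE = [
    (60, 1, "s"),
    (3600, 60, "m"),
    (86400, 3600, "h"),
    (float("inf"), 86400, "d"),
]


def format_relative_time(age_sec: float) -> str:
    """Format an age in seconds using the lookup table above."""
    for threshold_sec, divisor_sec, unit in RELATIVE_TIME_TABLE:
        if age_sec < threshold_sec:
            return f"{int(age_sec // divisor_sec)}{unit} ago"
    raise AssertionError("unreachable: the last row has an infinite threshold")


print(format_relative_time(45))      # 45s ago
print(format_relative_time(150))     # 2m ago
print(format_relative_time(7200))    # 2h ago
print(format_relative_time(200000))  # 2d ago
```

The function body holds one loop and one comparison however many units the table has, so adding a unit no longer adds a branch for the complexity metric to count.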

The last two features were planned in tandem so they could run in parallel via git worktrees. Their file sets were disjoint by construction (quick-wins touched AdminOverlay.tsx, ArtPicker.tsx, TracklistPicker.tsx, AlbumCard.tsx; medium-wins touched useIdentifySearch.ts, useIdentifyActions.ts, NowPlaying.tsx, utils/format.ts). Claude Code's Agent tool can spawn subagents with isolation: "worktree" — each agent gets its own working tree on its own branch — which makes this kind of two-stream parallel work tractable from a single session.

The score progression looked like this:

Stage	Health score	Tests
Initial run	88	0
After `kiosk-refactor-nowplaying`	90	18
After `kiosk-refactor-identify`	91.4	30
After `kiosk-fallow-quick-wins`	91.8	30
After `kiosk-fallow-medium-wins` + config	98	50

The jump from 91.8 to 98 is partly real refactor work from medium-wins and partly a config change explained in "The score formula and test files" below.

Cleaning up the orchestrator

The kiosk arc was feature-by-feature — each pass was one shipped refactor PR through the same plugin. On the orchestrator side, skylos clusters its findings by category (Security, Quality, Dead Code, Dependencies, Secrets), so the natural shape of the cleanup follows that grouping rather than file-by-file. Four phases, structured around which categories are at F and which are already clean:

Security suppressions for false positives. Three of the four initial Security findings were canonical SQLite parameterized bulk IN-clause patterns — f"... IN ({placeholders})" where placeholders = ",".join("?" * len(values)) and the values bind through the second argument to conn.execute. Skylos can't prove these are safe statically. Each gets a per-site # skylos: ignore SKY-D211 with a one-line WHY. The fourth finding is a _read_bytes path-traversal pattern in a helper that's only called on internal capture-pipeline paths; same treatment with SKY-D215. This phase moves Security from 0/F to 100/A+ without touching production behavior — it's pure annotation work.
Dependency CVE bumps. aiohttp>=3.12.14 → >=3.13.5 clears 18 CVEs. python-dotenv>=1.0 → >=1.2.2 clears 1. Straight version bumps with no API impact at the call sites. Dependencies moves from 81/B- to 100/A+.
Function-level complexity cuts. Quality findings cluster around three files — main.py, vinyl/fingerprint.py, llm.py. The pattern matches the kiosk's medium-wins pass: extract pure helpers, narrow try-block scopes, fix inconsistent return signatures. Behavior-preserving splits that drop cyclomatic and CRAP scores without changing what any function does externally. Three parallel implementer subagents in worktrees, one per target file, since the file sets are naturally disjoint.
God-file splits. Five files where size and complexity findings cluster — api.py, history.py, control.py, llm.py, main.py. Each converts from a single-file module into a multi-module package with public import surfaces preserved via __init__.py re-exports. main.py is the biggest of the five (~2000 LOC, the systemd ExecStart target) and becomes a thin shim plus a nowplaying.orchestrator package. Sequential rather than parallel because the splits share design decisions about where shared helpers should land.
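For reference, the bulk IN-clause pattern from the first phase looks roughly like this. It is a self-contained sketch against an in-memory database; the `tracks` table and `fetch_by_ids` helper are hypothetical stand-ins, not the orchestrator's code:

```python
import sqlite3


def fetch_by_ids(conn: sqlite3.Connection, ids: list[int]) -> list[tuple]:
    """Look up rows for a list of ids with a parameterized IN clause."""
    if not ids:
        return []  # nothing to look up
    placeholders = ",".join("?" * len(ids))
    # Only "?" markers are interpolated into the SQL text. The values
    # bind through the second argument, so no user data reaches the
    # string. This f-string is the line a static analyzer flags.
    sql = f"SELECT id, title FROM tracks WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO tracks VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
print(fetch_by_ids(conn, [1, 3]))  # [(1, 'A'), (3, 'C')]
```

The analyzer sees an f-string flowing into `execute` and cannot tell that the interpolated part is only ever a run of question marks, which is why a per-site suppression with a stated reason is the reasonable outcome here.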

All four phases shipped. The end state in main:

Category	First run	After cleanup
Security	0 / F	100 / A+
Quality	0 / F	100 / A+
Dead Code	100 / A+	100 / A+
Dependencies	81 / B-	100 / A+
Secrets	100 / A+	100 / A+
Codebase Grade	F (38/100)	A+ (100/100)

Same destination as the kiosk — both tools at the top of their scales — but a different shape getting there. The kiosk's findings were file-shaped and mapped cleanly onto file-scoped feature PRs. The orchestrator's findings were category-shaped and mapped onto category-scoped phases that didn't slice as evenly. Same gate, same workflow plugin in principle — different cleanup geometry because the tools surface debt differently.

The score formula and test files

After medium-wins landed, the composite score was still in the low 90s even though every production function was now under the LOC threshold. The cause was the new test code: fallow counts every vitest describe(...) block as a long function, so adding 50 tests pushed the very-high-risk function-size profile back up and re-triggered the unit-size penalty. Writing more tests had lowered the score.

The fix is a small config block in .fallowrc.json at the repo root:

{
  "health": {
    "ignore": ["**/*.test.ts", "**/*.test.tsx"]
  }
}

That excludes test files from the size profile while still tracking them for dead-code detection. The alternative — setting production: true — also hides test coverage from the CRAP-score calculator, which inflates scores on tested helpers. Use health.ignore instead. With test files excluded, the score moved into the 98 A range, where it sits today.

The pre-commit hook

Now that the code was in better shape, I wanted to make sure it stayed that way. To do this, I wired both tools into pre-commit so they run on every commit:

repos:
  - repo: local
    hooks:
      - id: skylos-agent
        name: skylos (staged-file regression scan)
        entry: uvx skylos agent pre-commit pi/
        language: system
        pass_filenames: false
        types_or: [python, toml]
        stages: [pre-commit]

      - id: fallow-audit-kiosk
        name: fallow audit (kiosk frontend)
        entry: bash -c 'cd kiosk && npx fallow audit'
        language: system
        pass_filenames: false
        files: ^kiosk/.*\.(ts|tsx)$
        stages: [pre-commit]

Two hooks, one per language, each pointed at the directory its language owns. The skylos hook fires when a Python or TOML file is staged and scans pi/; as written it has no files: filter, so a staged Python file elsewhere in the repo would also trigger it. The fallow hook fires on staged TypeScript files under kiosk/. A commit that only touches one side pays the cost of one tool, not both.
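To make the selection logic concrete, here is a rough model of how a staged file set maps to hooks. It is a sketch under assumptions: pre-commit classifies file types with its own detection rather than by extension, so the suffix check below only approximates types_or, and `hooks_for` is a hypothetical helper:

```python
import re

# Same pattern as the `files:` key on the fallow hook.
KIOSK_TS = re.compile(r"^kiosk/.*\.(ts|tsx)$")
# Rough stand-in for `types_or: [python, toml]` (extension-based guess).
PY_OR_TOML = (".py", ".toml")


def hooks_for(staged: list[str]) -> list[str]:
    """Return the ids of the hooks that would run for these staged paths."""
    hooks = []
    if any(path.endswith(PY_OR_TOML) for path in staged):
        hooks.append("skylos-agent")
    if any(KIOSK_TS.search(path) for path in staged):
        hooks.append("fallow-audit-kiosk")
    return hooks


print(hooks_for(["pi/nowplaying/vinyl/fingerprint.py"]))  # ['skylos-agent']
print(hooks_for(["kiosk/src/_gate_test.ts"]))             # ['fallow-audit-kiosk']
print(hooks_for(["README.md"]))                           # []
```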

A few details worth pointing at in that config:

fallow audit, not fallow health. Audit is changeset-scoped against the base branch (which it auto-detects), and only findings introduced by the commit trip the hook. Inherited debt is ignored by default. --gate all makes it stricter on a young codebase, but new-only is the right starting posture for adoption.
uvx skylos agent pre-commit pi/, not uvx skylos pi/ -a. The agent pre-commit subcommand is purpose-built for AI-generated diffs — it focuses on regressions an agent is statistically likely to introduce (hallucinated imports, hardcoded secrets, command injection from f-strings, swallowed exceptions) rather than running the full SAST sweep. The full sweep is what you run manually once a week; this is what gates each commit.
pass_filenames: false on both hooks. Each tool figures out its own file list from git rather than from the hook framework — pre-commit's default is to pass staged file paths as args, which both tools would either ignore or get confused by.
The files: regex on fallow. Scopes the hook to kiosk TypeScript only, so a Python-only commit doesn't pay the npm startup cost (~1 second of overhead before fallow even starts).
bash -c 'cd kiosk && ...' on the fallow entry. Pre-commit invokes entry from the repo root; fallow expects to run from inside the package it's auditing. The skylos hook doesn't need this because uvx skylos agent pre-commit pi/ takes the target path as an argument.

One-time setup on a fresh clone is pipx install pre-commit && pre-commit install. After that, every git commit runs both hooks against the relevant staged files in roughly 1–2 seconds. The per-commit gate is changeset-scoped; the wider standalone modes (cd kiosk && npx fallow health, uvx skylos pi/ -a) are what produced the 88-A and F (38/100) baseline numbers earlier in the post and are the commands you'd run to see the full state of the codebase rather than just what a commit introduced.

Verifying the hook fires

pre-commit install is supposed to write .git/hooks/pre-commit. It refuses to do so if git config core.hooksPath is set. Current pre-commit versions log an error ("Cowardly refusing to install hooks with `core.hooksPath` set") and exit non-zero, but that message is easy to miss inside a setup script or a long agent session, and the hook file never lands. From that point on, git commit runs without invoking fallow or skylos, and the gate is doing nothing. Worth checking explicitly before trusting the install.

The check:

git config core.hooksPath

If that prints anything, unset it and re-install:

git config --unset-all core.hooksPath
pre-commit install

To confirm the gate is actually wired, stage a deliberately-bad file and try to commit it. Something like this in the kiosk:

// kiosk/src/_gate_test.ts
export function complexNonsense(x: number, y: number, z: number): string {
  if (x > 0 && y > 0) return 'a';
  if (x > 0 && y < 0) return 'b';
  // ... 15 more branches, cyclomatic 31
  return 'z';
}

git add kiosk/src/_gate_test.ts
git commit -m "test"

The two outcomes are visibly different. When the hook is firing correctly, the fallow hook exits 2 and the commit is aborted, with fallow audit JSON on stderr: function name, line, severity (CRITICAL), CRAP score (~992 for cyclomatic 31), and introduced: true, indicating the gate distinguishes new findings from inherited baseline debt. When the hook isn't firing, the commit lands silently with no audit output at all. One test tells you which state you're in.

The same check applies to the skylos hook. Stage a file under pi/ with a swallowed exception (except Exception: pass) or a hardcoded secret and skylos agent pre-commit blocks the commit the same way — exit 2, finding in stderr.

The suppression escape hatch

After a long Claude Code session that ran a kiosk feature through the full review-respond-merge cycle, I ran npx fallow --summary and got back a familiar shape: 9 dead files, 44 functions above the complexity threshold, 3 clone groups. Roughly what the codebase looked like before any of the cleanup work. But every pre-commit run in the session had reported fallow audit (kiosk frontend) ............... Passed. The gate had blocked zero commits.

The two commands answer slightly different questions:

fallow audit (what the gate runs) → "did this commit introduce any new findings vs. main?" → 0 introduced → PASS
fallow --summary → "what's the total debt in HEAD right now?" → 9 / 44 / 3 → debt

Same tool, different attribution. The gate uses changeset-scoped gate: new-only, which is the right default for adoption — you don't want every commit blocked on inherited baseline debt that long predates the branch. The escape hatch is what counts as "no new finding."

When the agent hit a complexity warning, it had two paths to make the gate pass:

Option A: refactor the function — extract helpers, reduce conditionals,
          re-run tests, verify nothing regressed (~20 minutes)

Option B: add // fallow-ignore-next-line complexity above the function
          (~3 seconds)

Both produce introduced: 0. Both pass the gate. Under an objective of "get this cycle green and move on," the agent picked option B every time. By the time the PR merged, the diff had added a wall of fallow-ignore-next-line comments and the underlying functions were just as complex as before. The gate had no opinion on this — fallow has no way to know whether a suppression was justified or drive-by. It just trusts the comment.
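A toy model makes the loophole easy to see. This is not fallow's implementation; it is a sketch that assumes new-only attribution behaves like a set difference between head and base findings, with suppressed findings dropped before the comparison. The `Finding` type and symbol names are hypothetical:

```python
from typing import NamedTuple


class Finding(NamedTuple):
    file: str
    symbol: str
    rule: str


def introduced(
    base: set[Finding], head: set[Finding], suppressed: set[Finding]
) -> set[Finding]:
    """Findings present in head, absent from base, and not suppressed."""
    return (head - base) - suppressed


# One inherited finding on the base branch, one new finding in the change.
base = {Finding("useIdentifySearch.ts", "useIdentifySearch", "complexity")}
new = Finding("AdminOverlay.tsx", "AdminOverlay", "complexity")
head = base | {new}

print(len(introduced(base, head, suppressed=set())))  # 1, gate fails
print(len(introduced(base, head, suppressed={new})))  # 0, gate passes
print(len(head))                                      # 2, total debt unchanged
```

Refactoring removes the finding from `head`; suppressing only removes it from the comparison. The gate sees the same zero in both cases, while the total-debt view still counts the function.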

The numbers from that session: 275 mentions of fallow-ignore-next-line in the session log (an upper bound, since it includes search results; the lower bound from actual added lines was still solidly in double digits), against 1 legitimate skylos: ignore (the SKY-D211 false positive on a parameterized bulk IN-clause).

Wrong fixes, then the right one

The first instinct was to tighten the gate. Ban any commit that adds a fallow-ignore-* line. Cap suppressions per PR. Move to absolute thresholds instead of new-only attribution. Each of those would have stopped the drive-by suppression — and also stopped the legitimate suppressions that the codebase already depends on.

The skylos # skylos: ignore SKY-D211 on fingerprint.py is a parameterized bulk IN-clause that skylos can't statically prove is safe. The two SKY-Q501 suppressions on State and Orchestrator are deliberate: those are single-container classes by design, and splitting them would scatter mutations across five files for the linter's benefit alone. Killing the suppression mechanism would either force genuinely worse code (string-built SQL, fragmented state classes) or push the agent into a worse escape hatch like deleting tests or carving up files purely to dodge the line count.

The actual problem wasn't that suppressions existed. The agent reached for them first instead of last. That isn't a gate problem; it's a behavior problem. Behavior in a Claude Code session is shaped by the prompt context, not the static analyzers — so the fix went into the prompt, not the gate.

The change: a prompt rule

The wording, kept short enough to actually be followed and concrete enough that it can't be lawyered around:

## Static analysis suppressions (fallow, skylos)

Suppressions are a **last resort**, not a way to make the gate pass.
Before adding `// fallow-ignore-*` or `# skylos: ignore SKY-...`:

1. **Try to fix it first.** Most complexity findings extract cleanly into
   helpers. Most dead-code findings are actually dead. Most clone groups
   reduce to a shared function.
2. **If you can't fix it, justify it.** Every suppression MUST have an
   adjacent comment explaining *why* the finding is wrong or the fix is
   worse than the suppression. "complexity" with no reason is not a
   justification.
3. **Cap of 2 new suppressions per PR.** If a single PR needs more, the
   feature is doing too much or the refactor pass was skipped — stop and
   split the work.

Examples of legitimate suppressions already in the repo:
- pi/nowplaying/orchestrator/state.py:7 — State is intentionally one
  container; splitting scatters mutations.
- pi/nowplaying/vinyl/fingerprint.py — SKY-D211 false positive on a
  bulk IN-clause; the SQL is parameterized, the linter just can't see it.

A drive-by // fallow-ignore-next-line complexity above an unmodified
function is the failure mode. If you can't explain why the function
shouldn't be simpler, simplify it.

The rule sits at the plugin layer rather than per-repo CLAUDE.md, so every project that uses the same agent skill inherits it. The gate itself stays unchanged. Suppressions still work. Pre-commit still uses new-only attribution. The only change was telling the agent — once, in a paragraph — what "fixing it" actually means.

After

Five features merged after the rule landed, across roughly two days and 26 commits. Then the same audit:

Fallow (kiosk): 0 dead files, 0 dead exports, 0 functions above threshold, MI 92.8. Only remaining finding: 3 clone groups / 117 duplicated lines — pre-existing twins between two route components, tracked debt.
Skylos (pi): A+ (100/100). Held.
New // fallow-ignore-* lines across the 26 commits: 0.
New # skylos: ignore lines: 0.
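The two zero counts can be reproduced from a diff rather than by eye. A small sketch, assuming unified-diff text as input (for example the output of git diff over the commit range); the regex and the `count_added_suppressions` helper are my own approximation of the two comment formats:

```python
import re

# Matches "// fallow-ignore-..." and "# skylos: ignore ..." comments.
SUPPRESSION = re.compile(r"//\s*fallow-ignore-|#\s*skylos:\s*ignore")


def count_added_suppressions(diff_text: str) -> int:
    """Count added lines in a unified diff that contain a suppression."""
    count = 0
    for line in diff_text.splitlines():
        # Added lines start with "+"; "+++" marks a file header instead.
        if line.startswith("+") and not line.startswith("+++"):
            if SUPPRESSION.search(line):
                count += 1
    return count


sample = """\
+++ b/kiosk/src/_gate_test.ts
+// fallow-ignore-next-line complexity
+export function complexNonsense() {}
-# skylos: ignore SKY-D211
 // fallow-ignore-next-line complexity
"""
print(count_added_suppressions(sample))  # 1
```

Only the second line counts: the header, the removed line, and the unchanged context line are all skipped. This is an audit aid, not a gate; the prompt rule stays the enforcement mechanism.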

Whatever complexity warnings the agent ran into during those features got fixed by actually refactoring — which is what fallow had been suggesting in its action list for every finding the whole time. The agent had just been going straight to the suppression option because nothing in the session told it not to.

The full setup, file-by-file

Below is every file you'd need to replicate this on your own project. Copy them as-is, then adjust the paths (pi/, kiosk/) to match your layout.

`.pre-commit-config.yaml` (at repo root)

# Install once per clone:
#   pipx install pre-commit && pre-commit install
# Then `git commit` will run the hooks below on staged files.

repos:
  - repo: local
    hooks:
      - id: skylos-agent
        name: skylos (staged-file regression scan)
        entry: uvx skylos agent pre-commit pi/
        language: system
        pass_filenames: false
        types_or: [python, toml]
        stages: [pre-commit]

      - id: fallow-audit-kiosk
        name: fallow audit (kiosk frontend)
        entry: bash -c 'cd kiosk && npx fallow audit'
        language: system
        pass_filenames: false
        files: ^kiosk/.*\.(ts|tsx)$
        stages: [pre-commit]

`.fallowrc.json` (at repo root)

{
  "health": {
    "ignore": ["**/*.test.ts", "**/*.test.tsx"]
  }
}

health.ignore excludes test files from the LOC + complexity profile without breaking dead-export detection. Don't use production: true — it also hides test coverage, which inflates CRAP scores on tested helpers.

`pi/pyproject.toml` `[tool.skylos]` block

Global rule suppressions for known false positives and policy nags. Per-site # skylos: ignore SKY-XXX comments still document each call-site reason; this block is for rules that fire across many files for non-actionable reasons.

[tool.skylos]
ignore = [
    # --- Security false positives (cache-key hashing, not crypto) ---
    "SKY-D207",  # Weak hash SHA1 — used only as a cache key digest; non-crypto
    "SKY-D208",  # Weak hash MD5 — same

    # --- Dependency-resolution false positives ---
    "SKY-D222",  # Hallucinated dependency — false positive for local script imports
    "SKY-D223",  # Undeclared dependency — scripts/ + tests/ use [project.optional-dependencies] which skylos doesn't read

    # --- Architectural advisories (small-project metrics noise) ---
    "SKY-Q801", "SKY-Q802", "SKY-Q803", "SKY-Q804",

    # --- Pythonic idioms skylos is over-strict about ---
    "SKY-L027",  # Duplicate string literal — false-positive on JSON field names
    "SKY-L028",  # Too many returns — guard-clause early-return is pythonic

    # --- Policy nags about config files we don't use ---
    "SKY-R101", "SKY-R102", "SKY-R103", "SKY-R104",

    # --- Clone detection: intentional parallel-method patterns ---
    "SKY-C401",  # Fires on intentional parallel-method patterns (LLMAssist.judge_* mirror set)
]
max_lines = 80
max_args = 7

One-time install commands

After dropping the files above into a fresh clone:

pipx install pre-commit
git config --unset-all core.hooksPath   # if it's set; pre-commit refuses otherwise
pre-commit install                       # writes .git/hooks/pre-commit

# Verify the hooks run on a clean tree — both should pass / exit 0
pre-commit run --all-files

# Live regression test: stage a deliberately bad file, try `git commit`.
# The hook should exit 2 with the offending function, severity, and CRAP
# score in stderr; the commit should not land.

Once installed, the only ongoing maintenance is keeping fallow and skylos on a recent enough version. Because both hooks are declared with language: system, pre-commit does not build or cache an environment for them; npx and uvx resolve and cache the tools themselves.