What We Found
The Silent Trap: exit(1) Is Not exit(2)
We discovered that 3 of our PreToolUse hooks were using sys.exit(1) to signal violations. They printed error messages and looked like they were working, but exit(1) is a non-blocking hook error: the tool call proceeds, and stderr is shown only in verbose mode. Only exit(2) actually blocks.
This matters because exit(1) is the conventional Unix exit code for a generic error, and developers writing enforcement hooks will naturally reach for it. The hook still runs and produces output, so nothing signals a failure. The enforcement gap is completely silent.
The hooks were behaving exactly as written: running, printing error output, exiting non-zero. Everything looked correct. The only way to discover the problem was to observe that the tool calls the hooks were supposed to block were still proceeding. Three enforcement hooks in a production pipeline — a Class A gate, a README-before-code check, and a test-before-stack check — had been silently non-enforcing from the moment they were deployed.
```python
# This looks like enforcement — it is NOT
sys.exit(1)  # Non-blocking. Tool call proceeds. Violation slips through.

# This is actual enforcement
sys.exit(2)  # Blocking. Tool call prevented. Stderr fed to Claude as context.
```
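To make the exit-code semantics concrete, here is a minimal sketch of what a blocking PreToolUse hook can look like. The rule, the `kb/class_a/` path, and the event field names used in the demonstration are illustrative assumptions, not our production code; the exit-code behavior is the documented one.

```python
import json
import sys

def decide(event: dict) -> tuple[int, str]:
    """Map a PreToolUse event to (exit_code, stderr_message).
    Exit code 2 blocks the tool call; 0 allows it; 1 would NOT block."""
    tool = event.get("tool_name", "")
    path = event.get("tool_input", {}).get("file_path", "")
    # Hypothetical rule: direct writes into the Class A store are forbidden.
    if tool in ("Write", "Edit") and path.startswith("kb/class_a/"):
        return 2, "BLOCKED: Class A writes must go through the librarian gate."
    return 0, ""

# In the real hook, the event arrives as JSON on stdin:
#   code, msg = decide(json.load(sys.stdin))
#   if msg: print(msg, file=sys.stderr)
#   sys.exit(code)

# Demonstration with a hypothetical event:
code, msg = decide({"tool_name": "Write",
                    "tool_input": {"file_path": "kb/class_a/entry-001.md"}})
print(code, msg)  # → 2 BLOCKED: Class A writes must go through the librarian gate.
```

Keeping the decision logic in a pure function like `decide` also makes the hook testable without spawning the CLI.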
If your enforcement hook uses exit(1), it is silently not enforcing. The hook runs, produces output, exits non-zero — and the tool call proceeds anyway.
The Claude Code hooks reference does document this behavior correctly. But it doesn't warn that the intuitive choice — the one any developer with Unix experience will reach for — is the wrong one for enforcement. That gap is what we filed.
What We Filed
Contribution 1: Comment on #37550 — Hook Architecture Evidence
We contributed quantified production evidence to an existing discussion about hook enforcement. The core question being discussed was whether CLAUDE.md rules or PreToolUse hooks were more reliable for enforcing pipeline governance. We had the data.
We audited which of the 7 HARD RULES in our KB pipeline were actually enforced versus behavioral-only. The result: 2 of 7 were enforced via PreToolUse hooks with exit(2). The remaining 5 of 7 were CLAUDE.md-only — and all 5 were observed to be violated at least once during autonomous operation.
The reason is structural. CLAUDE.md is context window text. It raises salience, increases compliance probability, and is the right tool for guidance. But it operates inside the model's reasoning loop. PreToolUse hooks with exit(2) operate outside the model's reasoning loop entirely — they block at the OS level before the tool call completes. The model doesn't get a chance to reason its way around them.
Contribution 2: New Issue — exit(1) Documentation Gap
We filed a new issue requesting a prominent warning that exit(1) is silently non-blocking in PreToolUse hook enforcement contexts.
Three of our own hooks were silently failing. We are not casual users — we had read the hooks documentation carefully and had built a 10-hook enforcement pipeline with explicit governance rules. We still made this mistake because the documentation doesn't warn against it clearly enough. The behavior is technically documented; the footgun is not called out.
The request: add a prominent callout near the exit code behavior documentation explicitly stating that exit(1) will not block tool calls and that exit(2) is required for enforcement. Ideally with a warning pattern example that mirrors how developers naturally write enforcement code.
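One shape such a pattern example could take is a pair of tiny helpers that make the blocking intent explicit, so exit(1) never creeps in by habit. These helper names are our own suggestion, not part of any Claude Code API:

```python
import sys

def block(reason: str) -> None:
    """Deny the pending tool call. exit(2) is the blocking exit code;
    exit(1) would print the reason and still let the tool call proceed."""
    print(reason, file=sys.stderr)  # with exit(2), stderr is fed back to Claude
    sys.exit(2)

def allow() -> None:
    """Approve the pending tool call."""
    sys.exit(0)
```

Routing every enforcement decision through `block()` and `allow()` leaves exactly one place in the codebase where the exit code can be wrong.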
Contribution 3: New Issue — 8 Undocumented CLI Flags
Running claude --help on v2.1.86 revealed flags not present in the official documentation. We catalogued them with exact flag names, observed behavior, and platform context (Windows 11, bash shell).
Undocumented flags found: --brief, --debug-file, --file, --replay-user-messages, --mcp-debug (deprecated). Undocumented subcommands: claude doctor, claude install, claude setup-token. We also documented observed --bare flag behavior that exceeds what the docs describe — in bare mode, authentication is restricted to ANTHROPIC_API_KEY only, but skills still resolve, which is a non-obvious interaction worth noting explicitly.
Each flag was documented with its exact help text as it appears in claude --help output on the tested version, making the contribution directly verifiable.
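The comparison step can be mechanized. A rough sketch of the audit, assuming long-form flags follow the usual `--kebab-case` shape; the sample help text below is an illustrative stand-in, since a real run would capture the output of `claude --help` itself:

```python
import re

def extract_flags(help_text: str) -> set[str]:
    """Pull long-form flags (--foo, --foo-bar) out of CLI help output."""
    return set(re.findall(r"--[a-z][a-z0-9-]*", help_text))

def undocumented(help_text: str, documented: set[str]) -> set[str]:
    """Flags present in the help output but absent from the docs."""
    return extract_flags(help_text) - documented

# Illustrative stand-in; a real audit would use
# subprocess.run(["claude", "--help"], capture_output=True, text=True).stdout
sample_help = """
  --brief                  ...
  --debug-file <path>      ...
  --verbose                ...
"""
documented_flags = {"--verbose"}
print(sorted(undocumented(sample_help, documented_flags)))  # → ['--brief', '--debug-file']
```

Re-running the same script after each CLI release turns a one-off finding into a regression check on the documentation.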
Why Class A Evidence Matters for Open Source Contributions
Every claim in these contributions was backed by a specific Class A knowledge base entry — executed commands with observed output, not speculation. This is what makes contributions credible.
The difference between a useful issue and a noise issue is specificity: specific version (v2.1.86), specific platform (Windows 11), specific hooks with specific exit codes at specific line numbers, specific observed behavior versus expected behavior. When you file an issue with "I discovered this while operating a 10-hook enforcement pipeline on version 2.1.86 and here's the exact line in my code where the exit code was wrong" — that's a different signal than a general report.
The Class A trust tier in the AWACS knowledge pipeline exists precisely because of this. Class A entries are validated operational truth: a command was run, output was observed, the entry records exactly what happened. You can trace any claim back to a specific command log entry with a timestamp. That audit trail is what turns operational experience into credible evidence.
None of these contributions required speculation or inference. The exit code behavior was discovered by observing that tool calls proceeded when they shouldn't have, tracing the hooks to find the sys.exit(1) calls, changing them to sys.exit(2), and confirming the blocking behavior. The CLI flags were discovered by running claude --help and comparing output to documentation. Executed commands, observed results, documented evidence.
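The tracing step generalizes to a quick audit script. A sketch, where the regex and the sample hook source are ours for illustration:

```python
import re

def find_silent_exits(source: str) -> list[int]:
    """Return 1-based line numbers where a hook calls sys.exit(1),
    the non-blocking exit code that silently lets tool calls through."""
    return [i for i, line in enumerate(source.splitlines(), start=1)
            if re.search(r"sys\.exit\(\s*1\s*\)", line)]

sample_hook = (
    "print('VIOLATION: write preceded README', file=sys.stderr)\n"
    "sys.exit(1)  # looks right, does not block\n"
)
print(find_silent_exits(sample_hook))  # → [2]
```

Running this across a hooks directory is how a silently non-enforcing gate gets caught before an observed violation does it for you.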
The Layered Architecture That Works
CLAUDE.md isn't useless — it's the wrong tool for hard enforcement but the right tool for behavioral guidance. The pattern that emerged from building and operating this pipeline is a three-layer governance architecture.
The first layer: CLAUDE.md rules raise salience and increase compliance probability for rules that can't be mechanically enforced. The 5 pipeline rules we couldn't wrap in hooks still benefit from their presence in CLAUDE.md. They're violated less often with the rules present. But "less often" isn't "never," and for must-not-fail rules, less often is not good enough.
The second layer: PreToolUse hooks with exit(2) for must-not-fail rules. These operate outside the model's reasoning loop. The model cannot reason its way around a hook that blocks at the OS level. This is where the critical pipeline invariants live: Class A writes require a prior candidate file and a librarian decision. No hook bypass, no exception, no context where the rule is relaxed.
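A sketch of what that second-layer invariant could look like inside such a hook. The entry-id scheme and the two-set representation are hypothetical simplifications of the real pipeline state:

```python
import sys

def class_a_write_allowed(entry_id: str,
                          candidates: set[str],
                          decisions: set[str]) -> bool:
    """The invariant: a Class A write requires a prior candidate file
    and a recorded librarian decision for the same entry."""
    return entry_id in candidates and entry_id in decisions

def gate(entry_id: str, candidates: set[str], decisions: set[str]) -> None:
    """Exit 2 (block) unless the invariant holds; otherwise fall through."""
    if not class_a_write_allowed(entry_id, candidates, decisions):
        print(f"BLOCKED: Class A write '{entry_id}' lacks a candidate "
              "or librarian decision.", file=sys.stderr)
        sys.exit(2)  # outside the reasoning loop: the write never happens
```

Because the check runs in a separate process and blocks via the exit code, there is no prompt, rule phrasing, or context-window state that can relax it.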
The third layer: human-in-the-loop for high-risk operations. Some operations are consequential enough that no automated gate is sufficient on its own. The pipeline flags these explicitly and routes them for confirmation before proceeding.
The silent enforcement trap — hooks that look like they're working but aren't — collapses the second layer without any visible signal. That's the actual risk. Not that the hooks are hard to write, but that they can fail in a way that produces no error, no warning, and no indication that anything is wrong until you observe a violation that should have been blocked.