Decision log

| When | Decision | Rationale |
| --- | --- | --- |
| Initial | bash + claude as the runtime | No agent-runtime opinion. Drop-in replaceable. ~1500 lines of supervisor that anyone can read in an afternoon. |
| Initial | Disk-only state | Any role can be invoked from a fresh shell at any time and produce the same output. Snapshots take iteration boundaries with no quiescence. |
| Initial | Fresh context per role | No drift across iterations. Anthropic prompt cache absorbs the cost (99.94% hit rate observed). See Why fresh contexts. |
| Initial | Per-role markdown prompts | Improvements compound without code changes. Branch-mergeable. Cache-key-stable. |
| Initial | Curator role | Without compaction raw.md grows unboundedly. Curator owns memory pruning. See Why a curator. |
| Mid | Plan-reviewer gate | Borrowed the structural intuition from gstack’s /plan-ceo-review but made it autonomous and gating. Catches scope drift before any worker fires. |
| Mid | Debugger role | Stuck features at attempts ≥ 3 deserve root-cause analysis, not blind retry. Fires once, writes notes, next worker reads them. |
| Mid | Crosscheck role | Different model re-validates the validator’s pass. Catches “confidently wrong” validator decisions. Optional, gated by config. |
| Late | Cost-cap removed | Was exiting at $25 vs real $135 spend (miscount). User wanted unbounded cost. Replaced with cost-soft-warn that paused via SIGSTOP. Even that was disabled. |
| Late | TDD iron law in validator | Superpowers-inspired. Behavioural assertions need a test that failed before the change. ~20-line prompt change, biggest quality lever we have. |
| Late | Boil the lake principle | gstack-inspired. Roles that sprawled (debugger, ui-qa, curator, product, architect) got an explicit “do fewer things perfectly” section. |
| Late | Evidence rule | claudecode-orchestrator-inspired. Every PASS claim must quote source output. Reports without evidence revert to failing. |
| Late | IDENTITY.md per role | Agent-Swarm-inspired. Cross-mission append-only memory. Curator promotes recurring patterns. |
| Late | TRICK: convention | MOLTRON-inspired. Worker-tagged generalisable observations get promoted to identity/worker.md after recurring. |
| Late | Named checkpoints | Hermes-inspired. Pre-agreed milestones gate progress. Distinct from escalations. Auto-fire on triggerOn: post-planner. |
| Late | Git worktrees | Composio-inspired. Each parallel lane gets its own filesystem dir. No git race possible. |
| Late | Service smoke-test | claudecode-orchestrator-inspired. Real curl + service startup before declaring DONE. Reopens features on failure. |
| Late | Two-mode parallelism | Conductor-inspired. lane (different features) vs competition (same feature, validator picks winner). |
| Late | Decisions timeline + ghost rate | Genuine differentiator. No other framework exposes orchestrator parse-ghost telemetry. |
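
The Debugger role entry describes a gate: fire once when a feature reaches three attempts, write notes, and let the next worker read them. A minimal bash sketch of that gate, assuming a hypothetical per-feature directory that holds an `attempts` counter file and, after the debugger has run, a `debugger-notes.md` file. Both file names and the function name are illustrative assumptions, not the harness's actual layout.

```shell
#!/usr/bin/env bash
# Sketch only: the directory layout and file names below are assumptions.

# should_run_debugger FEATURE_DIR
# Exits 0 when the feature has at least 3 recorded attempts and the
# debugger has not yet written its notes (so it fires at most once).
should_run_debugger() {
  local dir="$1"
  local attempts=0

  # State lives on disk, so any fresh shell reaches the same answer.
  if [ -f "$dir/attempts" ]; then
    attempts="$(cat "$dir/attempts")"
  fi

  [ "$attempts" -ge 3 ] && [ ! -e "$dir/debugger-notes.md" ]
}
```

A supervisor loop could call `should_run_debugger "$feature_dir"` before dispatching a worker. Because the notes file doubles as the "already fired" marker, no in-memory flag is needed, which fits the disk-only state decision.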
| Rejected | Why |
| --- | --- |
| AutoGPT-style infinite loop | Drift catastrophe. Rejected in favour of supervised iterations with explicit decision verbs. |
| LangChain orchestration | Heavyweight Python runtime + agent abstractions we didn’t need. Bash+files won. |
| Linear/Notion task source integration | User explicitly said no external task source. |
| Vector DB memory (LanceDB, etc) | Markdown files are fine until they aren’t. Identity files stay sub-100KB. |
| Discord/Slack control surface | Coupled the harness to a particular UI chat surface. Rejected in favour of HTTP API + Next dashboard. |
| Tmux-based agent messaging (claude-swarm) | Coupled state to a terminal multiplexer. Files-on-disk are simpler. |
| Subdomain-based public site rewrite | Considered Cloudflare-rewrite of /p/papercup → public domain. Rejected: not isolated enough. Built apps/public-site instead. |
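
The vector DB rejection rests on identity files staying under 100KB. One way to notice when that stops being true is a size guard like the sketch below. The function name is hypothetical, and reading "100KB" as 100 × 1024 bytes is an assumption, not a documented threshold.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and the exact byte limit are assumptions.

# identity_within_budget FILE
# Exits 0 while FILE is strictly smaller than 100 KiB, non-zero otherwise.
identity_within_budget() {
  local file="$1"
  local limit=$((100 * 1024))
  local size

  # Arithmetic expansion strips any padding some wc implementations emit.
  size=$(( $(wc -c < "$file") ))

  [ "$size" -lt "$limit" ]
}
```

A check such as `identity_within_budget identity/worker.md || echo "identity file over budget" >&2` would flag the point at which markdown-only memory may deserve a second look.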