
## What we kept, dropped, and invented

| From | Pattern | Our adoption |
| --- | --- | --- |
| Superpowers | TDD iron law | Validator HARD RULE: behavioural assertions need a failing-then-passing test |
| gstack | Boil the lake | Added to debugger / ui-qa / curator / product / architect prompts |
| claudecode-orchestrator | Quality through truth | Validator EVIDENCE RULE: quote source output to claim PASS |
| claudecode-orchestrator | Service smoke-test | `bin/service-smoke-test.sh` + `smokeTest.onDone`/`onFeaturePass` |
| Hermes | Named checkpoints | `CHECKPOINT <name>` decision verb + `triggerOn:` post-planner auto-fire |
| Composio | Git worktree per agent | `branchIsolation.useWorktrees` + `worktree_*` fns |
| Conductor | Two-mode parallelism | `parallelWorkers.mode: "lane"\|"competition"` |
| Agent Swarm | Per-agent IDENTITY.md | `~/autonomous-harness/identity/<role>.md`, cross-mission, append-only |
| MOLTRON | Self-evolving learnings | Worker TRICK: convention promoted by curator |
| Ralph Loop | 4-layer memory | L0 runtime / L1 raw / L2 summary / L3 MEMORY / L4 identity |
| GSD | Per-phase orchestrators with state-to-disk | Fresh context per role per iteration |
| Agentwise | Real-time dashboard | Next.js `/harness` UI |
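
Several of these adoptions reduce to small shell helpers. As one illustration, the worktree-per-agent row can be sketched like this (the function bodies, branch naming, and paths below are our guesses, not the harness's actual `worktree_*` implementations):

```shell
# Sketch of worktree-per-agent isolation: one worktree on its own branch
# per (role, feature) pair, so parallel agents never share a checkout.

worktree_create() {  # worktree_create <repo> <role> <fid> -> prints worktree dir
  local repo="$1" role="$2" fid="$3"
  local dir="$repo/.worktrees/$role-$fid"
  git -C "$repo" worktree add -b "agent/$role/$fid" "$dir" >/dev/null 2>&1
  echo "$dir"
}

worktree_remove() {  # tear down once the agent's branch is merged or abandoned
  local repo="$1" dir="$2"
  git -C "$repo" worktree remove --force "$dir"
}
```

Each agent then runs with `cd` into its own directory, and branch cleanup is a plain `git worktree remove`.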
| From | Pattern | Why rejected |
| --- | --- | --- |
| OpenSwarm | Linear/Notion task source | User explicit: no external task source |
| OpenSwarm | LanceDB vector memory | Overkill at our scale |
| Claude-Swarm | Tmux-based agent messaging | File-based is simpler |
| Agentwise | Discord/Slack control | UI-coupled |
| MOLTRON | Workers rewrite their own prompts | Audit / determinism preferred |
| AutoGPT | Infinite loop, no decision verbs | Drift catastrophe |
| LangChain | Heavyweight Python runtime | Bash + claude won |
| All “skill packs” | Human as orchestrator | We needed an autonomous loop |
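
The tmux rejection is easiest to see in code: file-based messaging is just a directory of timestamped files, inspectable with `ls` and replayable after a crash. The mailbox layout and message schema below are illustrative, not the harness's actual format:

```shell
# Sketch of file-based agent messaging: a mailbox is a directory,
# a message is a timestamped JSON file. No escaping is attempted here;
# a real implementation would serialize properly.

msg_send() {  # msg_send <inbox_dir> <from> <text>
  local inbox="$1" from="$2" text="$3"
  mkdir -p "$inbox"
  printf '{"from":"%s","text":"%s"}\n' "$from" "$text" \
    > "$inbox/$(date +%s%N)-$from.json"
}

msg_drain() {  # print all pending messages oldest-first, then delete them
  local inbox="$1" f
  for f in "$inbox"/*.json; do
    [ -e "$f" ] || continue
    cat "$f" && rm "$f"
  done
}
```

Lexicographic glob order over the timestamp prefix gives oldest-first delivery for free, which is exactly the property a tmux pane cannot offer once it scrolls away.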

## What we invented (or at least: didn’t find elsewhere)

| Pattern | Why we needed it | Where it lives |
| --- | --- | --- |
| Decisions timeline + ghost rate | Detect orchestrator parser regressions | `/api/harness/:slug/decisions` + dashboard panel |
| Cost-per-feature attribution | Per-session cost tracking is too coarse | Filename tag pattern `<ts>-<role>-<fid>.jsonl` + `/usage` byFeature |
| Plan-reviewer as autonomous gate | gstack’s `/plan-ceo-review` is human-triggered; ours fires automatically | `prompts/plan-reviewer.md` + `run.sh` gate |
| Autonomous Product role | MOLTRON evolves capabilities; we wanted scope expansion | `prompts/product.md` + GOAL.md vs SPEC.md diff + proposals CRUD |
| Proposals CRUD with bulk accept/reject | Operator triage workflow | `.harness/proposals/*.md` + dashboard 🪄 tab |
| Mission cost cap with SIGSTOP auto-pause (later removed) | Soft warn before hard cap | Removed at user request after a $135 mission |
| Per-feature timeline aggregating snapshots + agent runs + debug + PR | Forensics replacement for log scrolling | `/api/harness/:slug/features/:id/timeline` |
| Health endpoint with 9 composite checks | One-glance system health | `/api/harness/:slug/health` |
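
The cost-per-feature row can be made concrete. Assuming each session transcript lands on disk as `<ts>-<role>-<fid>.jsonl` and each line carries a `cost_usd` field (the field name is our assumption, not the harness's real schema), attribution is a pure filename-plus-line scan:

```shell
# Sketch of filename-tag cost attribution: the feature id is the trailing
# tag of each transcript filename, so per-feature cost needs no database.

cost_by_feature() {  # cost_by_feature <logs_dir> -> "<fid> <total>" per feature
  local f fid
  for f in "$1"/*.jsonl; do
    [ -e "$f" ] || continue
    fid=$(basename "$f" .jsonl | awk -F- '{print $NF}')   # trailing -<fid> tag
    # sum the cost_usd field across this session's JSONL lines
    awk -v fid="$fid" -F'"cost_usd":' \
      'NF>1 {split($2,a,/[,}]/); sum+=a[1]} END {printf "%s %.2f\n", fid, sum}' "$f"
  done | awk '{sum[$1]+=$2} END {for (k in sum) printf "%s %.2f\n", k, sum[k]}'
}
```

Because the role is also in the filename, the same scan keyed on the middle tag yields cost per role instead.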

We bet that:

  1. A small bash supervisor with no agent-runtime opinion is the right unit of generality.
  2. Stable per-role markdown prompts give us prompt-cache-dominance (99.94% hit rate).
  3. Files-on-disk are the right state model — any role can re-derive from them.
  4. A curator role doing memory compaction prevents unbounded raw.md growth.
  5. Observability beats autonomy — we’d rather see the orchestrator’s parse-ghost rate than have a more autonomous orchestrator we can’t observe.
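
As a sketch of bet 5, a parse-ghost rate can be computed from the decision log alone: the fraction of decision lines whose verb the parser would not recognize. The verb list and log format here are illustrative, not the orchestrator's actual grammar:

```shell
# Sketch of the parse-ghost rate: count decision lines whose verb falls
# outside the known set; those are the "ghosts" the parser silently drops.

ghost_rate() {  # ghost_rate <decisions_log> -> prints ratio, e.g. 0.0333
  awk '
    /^DECISION / {
      total++
      # illustrative verb whitelist; anything else is a ghost
      if ($2 !~ /^(CHECKPOINT|PASS|FAIL|RETRY|DONE)$/) ghosts++
    }
    END { printf "%.4f\n", (total ? ghosts / total : 0) }
  ' "$1"
}
```

A rising ratio after a prompt or parser change is the regression signal the decisions-timeline panel surfaces.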

So far the bet is paying off: a 99.94% cache-hit rate, 22/37 features green at iteration 5 of the live mission, $135 per mission, and an effective cost of $0.05 per role invocation.