
Decision log

When	Decision	Rationale
Initial	bash + claude as the runtime	No agent-runtime opinion. Drop-in replaceable. ~1500 lines of supervisor that anyone can read in an afternoon.
Initial	Disk-only state	Any role can be invoked from a fresh shell at any time and produce the same output. Snapshots can be taken at any iteration boundary without quiescing the system.
Initial	Fresh context per role	No drift across iterations. Anthropic prompt cache absorbs the cost (99.94% hit rate observed). See Why fresh contexts.
Initial	Per-role markdown prompts	Improvements compound without code changes. Branch-mergeable. Cache-key-stable.
Initial	Curator role	Without compaction `raw.md` grows unboundedly. Curator owns memory pruning. See Why a curator.
Mid	Plan-reviewer gate	Borrowed the structural intuition from gstack’s `/plan-ceo-review` but made it autonomous and gating. Catches scope drift before any worker fires.
Mid	Debugger role	Features stuck at attempts ≥ 3 deserve root-cause analysis, not blind retry. It fires once, writes notes, and the next worker reads them.
Mid	Crosscheck role	Different model re-validates the validator’s pass. Catches “confidently wrong” validator decisions. Optional, gated by config.
Late	Cost-cap removed	The cap exited at a counted $25 while real spend was $135 (a miscount). The user wanted unbounded cost, so it was replaced with a cost soft-warn that paused execution via SIGSTOP. Even that was later disabled.
Late	TDD iron law in validator	Superpowers-inspired. Behavioural assertions need a test that failed before the change. A ~20-line prompt change, and the biggest quality lever we have.
Late	Boil the lake principle	gstack-inspired. Roles that sprawled (debugger, ui-qa, curator, product, architect) got an explicit “do fewer things perfectly” section.
Late	Evidence rule	claudecode-orchestrator-inspired. Every PASS claim must quote source output. Reports without evidence revert to failing.
Late	IDENTITY.md per role	Agent-Swarm-inspired. Cross-mission append-only memory. Curator promotes recurring patterns.
Late	`TRICK:` convention	MOLTRON-inspired. Worker-tagged generalisable observations get promoted to `identity/worker.md` once they recur.
Late	Named checkpoints	Hermes-inspired. Pre-agreed milestones gate progress. Distinct from escalations. Checkpoints with `triggerOn: post-planner` auto-fire.
Late	Git worktrees	Composio-inspired. Each parallel lane gets its own working directory, so no two lanes race on the same git checkout.
Late	Service smoke-test	claudecode-orchestrator-inspired. Starts the service and hits it with real curl requests before declaring DONE. Reopens features on failure.
Late	Two-mode parallelism	Conductor-inspired. `lane` (different features) vs `competition` (same feature, validator picks winner).
Late	Decisions timeline + ghost rate	Genuine differentiator. No other framework exposes orchestrator parse-ghost telemetry.
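The evidence rule and the TDD iron law in the table above both reduce to checks a validator verdict can be held to. The TypeScript below is a minimal sketch under stated assumptions: the `Claim` shape, its `behavioural` and `failedBeforeChange` fields, and the `gateClaim`/`gateReport` helpers are hypothetical, and the log describes the TDD rule as a prompt change rather than code.

```typescript
// Hypothetical report shape; the harness's real validator output format is not shown here.
type Verdict = "PASS" | "FAIL";

interface Claim {
  verdict: Verdict;
  behavioural: boolean;         // asserts runtime behaviour, not just structure
  evidence?: string;            // output the validator quotes to back the verdict
  failedBeforeChange?: boolean; // did a covering test fail before the change?
}

// Apply the evidence rule and the TDD iron law to a single claim.
// `observedOutput` is the raw source output the validator actually captured.
function gateClaim(claim: Claim, observedOutput: string): Verdict {
  if (claim.verdict !== "PASS") return claim.verdict;
  // Evidence rule: the quoted text must really appear in the captured output.
  const quote = (claim.evidence ?? "").trim();
  if (quote === "" || !observedOutput.includes(quote)) return "FAIL";
  // TDD iron law: a behavioural PASS needs a test that failed before the change.
  if (claim.behavioural && claim.failedBeforeChange !== true) return "FAIL";
  return "PASS";
}

// A report passes only if it has claims and every one survives the gate
// (assumption: an empty report counts as a report without evidence).
function gateReport(claims: Claim[], observedOutput: string): Verdict {
  if (claims.length === 0) return "FAIL";
  return claims.every((c) => gateClaim(c, observedOutput) === "PASS") ? "PASS" : "FAIL";
}

const observed = "adds two numbers ... ok\n1 passed, 0 failed";
console.log(gateReport([{ verdict: "PASS", behavioural: true, evidence: "1 passed", failedBeforeChange: true }], observed)); // PASS
console.log(gateReport([{ verdict: "PASS", behavioural: true, evidence: "3 passed", failedBeforeChange: true }], observed)); // FAIL (quote not in output)
```

Reverting to FAIL rather than raising an error matches the log's wording that reports without evidence "revert to failing".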

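The `TRICK:` convention in the decision log implies a small counting step for the curator: collect worker-tagged lines across missions and promote the ones that recur. A sketch, assuming tags are lines beginning with `TRICK:` in per-mission worker notes and that "recurring" means seen in at least `minMissions` distinct missions (default 2); `MissionNotes`, `tricksToPromote` and the threshold are hypothetical, not the curator's actual logic.

```typescript
// Hypothetical input shape: one entry per mission with that mission's worker notes.
interface MissionNotes {
  missionId: string;
  notes: string;
}

// Collapse whitespace and case so near-identical observations count as the same trick.
function normalise(text: string): string {
  return text.trim().replace(/\s+/g, " ").toLowerCase();
}

// Return TRICK: observations seen in at least `minMissions` distinct missions that the
// identity file does not already contain. The caller appends them; nothing is rewritten.
function tricksToPromote(missions: MissionNotes[], identityFile: string, minMissions = 2): string[] {
  const seen = new Map<string, { text: string; missionIds: Set<string> }>();
  for (const { missionId, notes } of missions) {
    for (const line of notes.split("\n")) {
      const match = /^\s*TRICK:\s*(.+)$/.exec(line);
      if (!match) continue;
      const key = normalise(match[1]);
      if (key === "") continue;
      // Keep the first-seen wording; track which missions mentioned it.
      const entry = seen.get(key) ?? { text: match[1].trim(), missionIds: new Set<string>() };
      entry.missionIds.add(missionId);
      seen.set(key, entry);
    }
  }
  // Crude duplicate check: skip tricks whose text already appears in the identity file.
  const known = normalise(identityFile);
  const promoted: string[] = [];
  seen.forEach((entry, key) => {
    if (entry.missionIds.size >= minMissions && !known.includes(key)) promoted.push(entry.text);
  });
  return promoted;
}

const sample: MissionNotes[] = [
  { missionId: "m1", notes: "built the form\nTRICK: run migrations before seeding" },
  { missionId: "m2", notes: "TRICK:  Run migrations before seeding\nTRICK: one-off hack" },
];
console.log(tricksToPromote(sample, "# worker identity\n").join("\n")); // run migrations before seeding
```

Returning candidates instead of editing `identity/worker.md` in place keeps the identity file append-only, as the IDENTITY.md decision requires.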
Org-layer decisions

The coding harness predated the org layer. These decisions were made when adding the company-of-directors layer on top.

When	Decision	Rationale
Org-init	5 fixed departments, no runtime mutability	Adding or retiring directors at runtime would break the message contract (each kind has a from-dept allowlist). Five felt right (Business, R&D, Tech, Mgmt, Marketing), and changing the set deliberately requires a code change.
Org-init	CEO is a mode of Business, not a 6th department	A sixth department would expand the contract surface and require new `outboxKinds`. Modes already exist (orchestrator, worker, validator); adding “ceo” as a worker variant that Business uses when handling Directives kept the department count fixed.
Org-init	File-based storage (`~/.restart-org/.json`/`.jsonl`)	Mirrors the existing harness pattern. No DB migration. Any role can read full state from a fresh shell. The `audit.jsonl` is the durable record.
Org-init	Typed messages over conversational	Conversational orgs drift catastrophically. 17 named kinds, each with explicit from/to/projectId rules, validated server-side. Workers can’t invent new kinds.
Org-init	All actions live in a project	Reserved projects `platform-reserved` (Tech cross-cutting) and `org-ops-reserved` (governance) catch messages without a specific project so `projectId` is always set.
Org-init	ACTIONS contract instead of network access for workers	Workers run as `claude -p` subprocesses with no API access. They emit a structured `actions` block; the backend executes ops sequentially with charter validation. Failed actions abort the iteration cleanly.
Mid-org	Server-side schedulers, not browser-side	An auto-loop that only runs while the operator has a browser tab open is useless. Schedulers live in `globalThis.__papercupSchedulers`, survive HMR, and run even with all browsers closed.
Mid-org	Director memory injection	Each director’s prompt now includes a synthesized `## Recent decisions (your own memory, newest first)` block from the last 12 run records. Lets directors notice patterns like “I keep deferring this.”
Mid-org	ProgressUpdate side-effect → project status	Management can advance project state with a single `send_message` action (`metadata.statusUpdate: "in_progress"`). Cleaner autonomous loop: no separate PATCH op needed.
Mid-org	Reserved-project guard on delete	`DELETE /projects/:slug` rejects `platform-reserved` / `org-ops-reserved`. Reserved projects can still be wiped via direct file edit; the guard is for accidental UI/API deletes.
Late-org	Health folded into Harnesses	`/papercup/health` was a fourth view of the same 5 dept cards already shown on Organization and Harnesses. Its live ops controls were folded onto Harnesses (now the single ops cockpit); the old cockpit component is preserved at `_discarded/HarnessOpsCockpit.tsx` for possible revival.
Late-org	Documentation-first tab order	New operators learn the model before acting. Docs leftmost; Harnesses second (daily-use); reference tabs after.
Late-org	Shared lib for Papercup views	Both `apps/web/app/papercup/` and `apps/public-site/` render identical components from `libs/papercup-shared/`. Single source of truth; `readOnly` prop disables edit affordances on the public render.
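The org-layer message contract in the table above (fixed departments, typed kinds with from/to allowlists, a `projectId` on every message, reserved projects that cannot be deleted) can be sketched as a server-side validator. Everything here is illustrative: the department ids, the two rules shown (the real contract has 17 kinds; `Directive` is inferred from the CEO row), and the fallback from sender to reserved project are assumptions. Only `ProgressUpdate`, the reserved project slugs and `DELETE /projects/:slug` come from the log.

```typescript
// Hypothetical ids for the five fixed departments (Business, R&D, Tech, Mgmt, Marketing).
const DEPARTMENTS = ["business", "rnd", "tech", "mgmt", "marketing"] as const;
type Dept = (typeof DEPARTMENTS)[number];

const RESERVED_PROJECTS: readonly string[] = ["platform-reserved", "org-ops-reserved"];

interface OrgMessage {
  kind: string;
  from: Dept;
  to: Dept;
  projectId?: string;
}

interface Rule {
  from: readonly Dept[];
  to: readonly Dept[];
}

// Illustrative rules only; the real contract defines its own allowlists per kind.
const CONTRACT = new Map<string, Rule>([
  ["ProgressUpdate", { from: ["mgmt"], to: DEPARTMENTS }],
  ["Directive", { from: ["business"], to: ["rnd", "tech", "mgmt", "marketing"] }],
]);

type Validation = { ok: true; message: Required<OrgMessage> } | { ok: false; error: string };

function validateMessage(msg: OrgMessage): Validation {
  const rule = CONTRACT.get(msg.kind);
  if (!rule) return { ok: false, error: `unknown kind ${msg.kind}` }; // workers cannot invent kinds
  if (!rule.from.includes(msg.from)) return { ok: false, error: `${msg.from} may not send ${msg.kind}` };
  if (!rule.to.includes(msg.to)) return { ok: false, error: `${msg.kind} may not go to ${msg.to}` };
  // Every message lives in a project. Assumed fallback: Tech -> platform, everyone else -> org-ops.
  const projectId = msg.projectId ?? (msg.from === "tech" ? "platform-reserved" : "org-ops-reserved");
  return { ok: true, message: { ...msg, projectId } };
}

// Guard for DELETE /projects/:slug; direct file edits bypass it, as the log notes.
function canDeleteProject(slug: string): boolean {
  return !RESERVED_PROJECTS.includes(slug);
}

console.log(validateMessage({ kind: "Directive", from: "tech", to: "rnd" })); // ok: false, tech is not on the sender allowlist
```

Keeping the rules in one table means adding a kind is a reviewed code change, which is the point of the "no runtime mutability" decision.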

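The ACTIONS contract and the ProgressUpdate side-effect in the org-layer table describe a sequential executor. A sketch, assuming a hypothetical `Action` union (only the `send_message` op name and `metadata.statusUpdate` come from the log), an in-memory `OrgState` standing in for the real JSON files, and one reading of "abort the iteration cleanly": actions run against a draft copy that is committed only if every action passes the charter check.

```typescript
// Hypothetical action shapes; only `send_message` and `metadata.statusUpdate` appear in the log.
type Action =
  | { op: "send_message"; kind: string; projectId: string; metadata?: { statusUpdate?: string } }
  | { op: "note"; text: string };

interface OrgState {
  projectStatus: Map<string, string>;
  outbox: { kind: string; projectId: string }[];
  notes: string[];
}

// Charter check for the acting worker: returns an error string, or null if allowed.
type Charter = (action: Action) => string | null;

type IterationResult = { ok: true; applied: number } | { ok: false; failedAt: number; error: string };

function runActions(state: OrgState, actions: Action[], charter: Charter): IterationResult {
  // Work on a draft so a failure part-way through leaves `state` untouched.
  const draft: OrgState = {
    projectStatus: new Map(state.projectStatus),
    outbox: [...state.outbox],
    notes: [...state.notes],
  };
  for (let i = 0; i < actions.length; i++) {
    const action = actions[i];
    const error = charter(action);
    if (error) return { ok: false, failedAt: i, error }; // abort the iteration
    if (action.op === "send_message") {
      draft.outbox.push({ kind: action.kind, projectId: action.projectId });
      // ProgressUpdate side-effect: a status in metadata advances the project, no separate PATCH.
      const status = action.metadata?.statusUpdate;
      if (action.kind === "ProgressUpdate" && status) draft.projectStatus.set(action.projectId, status);
    } else {
      draft.notes.push(action.text);
    }
  }
  // Commit only after every action passed.
  state.projectStatus = draft.projectStatus;
  state.outbox = draft.outbox;
  state.notes = draft.notes;
  return { ok: true, applied: actions.length };
}

const state: OrgState = { projectStatus: new Map([["shop", "planned"]]), outbox: [], notes: [] };
// Hypothetical Management charter: the only message kind it may send is ProgressUpdate.
const mgmtCharter: Charter = (a) =>
  a.op === "send_message" && a.kind !== "ProgressUpdate" ? `kind ${a.kind} not in charter` : null;
const result = runActions(
  state,
  [{ op: "send_message", kind: "ProgressUpdate", projectId: "shop", metadata: { statusUpdate: "in_progress" } }],
  mgmtCharter,
);
console.log(result.ok, state.projectStatus.get("shop")); // true in_progress
```

Whether the real backend rolls back earlier actions or simply stops at the failing one is not stated in the log; the draft-and-commit choice here is the stricter of the two readings.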
Decisions we explicitly rejected

Rejected	Why
AutoGPT-style infinite loop	Drift catastrophe. Rejected in favour of supervised iterations with explicit decision verbs.
LangChain orchestration	Heavyweight Python runtime + agent abstractions we didn’t need. Bash+files won.
Linear/Notion task source integration	User explicitly said no external task source.
Vector DB memory (LanceDB, etc.)	Markdown files are fine until they aren’t. Identity files stay under 100KB.
Discord/Slack control surface	Coupled the harness to a particular chat UI. Rejected in favour of HTTP API + Next dashboard.
Tmux-based agent messaging (claude-swarm)	Coupled state to a terminal multiplexer. Files-on-disk are simpler.
Subdomain-based public site rewrite	Considered a Cloudflare rewrite of `/p/papercup` → public domain. Rejected: not isolated enough. Built `apps/public-site` instead.