
Decisions timeline

The orchestrator emits one decision per iteration, e.g. NEXT_WORKER F-023, NEXT_VALIDATOR F-022, DONE, ESCALATE …, or CHECKPOINT …. The decisions timeline is the audit log of every verb the orchestrator has emitted across the mission, with three things per row:

  • Verb — recognised verbs are coloured; unrecognised verbs are flagged as ghosts (the parser failed to recognise them).
  • Iteration — which iteration emitted it.
  • Args — feature id, reason, etc.
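A timeline row can be modelled as a small record type. A minimal sketch, with field names assumed to match the /decisions payload (ts, verb, args, iteration, isGhost); the verb list is our reading of the examples above, not an exhaustive set:

```typescript
// Assumed recognised-verb set, taken from the examples in this page.
const RECOGNISED_VERBS = ["NEXT_WORKER", "NEXT_VALIDATOR", "DONE", "ESCALATE", "CHECKPOINT"] as const;
type Verb = (typeof RECOGNISED_VERBS)[number];

interface DecisionRow {
  ts: number;        // emit timestamp
  iteration: number; // which iteration emitted it
  verb: string;      // raw verb text; coloured in the UI when recognised
  args: string;      // feature id, reason, etc.
  isGhost: boolean;  // true when the parser failed to recognise the verb
}

// A verb is a ghost when it is not in the recognised set.
const isGhost = (verb: string): boolean =>
  !RECOGNISED_VERBS.includes(verb as Verb);
```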

The orchestrator sometimes wraps its decision in a markdown fence or adds **bold** decoration. The parser strips these, but it can still fail; a failed parse means the harness saw no decision, so the loop stalls.
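The stripping step can be sketched roughly as follows. This is our own illustration, not the harness's parser: the regexes and the verb/args split are assumptions, modelled on decisions shaped like NEXT_WORKER F-023:

```typescript
// Hypothetical sketch: strip markdown decoration, then try to extract a
// verb and its arguments. Returns null on failure, which the harness
// would classify as a ghost (no decision seen, loop stalls).
function parseDecision(raw: string): { verb: string; args: string } | null {
  const cleaned = raw
    .replace(/```[a-z]*\n?|```/g, "") // drop markdown fences
    .replace(/\*\*/g, "")             // drop bold markers
    .trim();
  const match = cleaned.match(/^([A-Z_]+)\s*(.*)$/);
  if (!match) return null;            // parse failed → ghost
  return { verb: match[1], args: match[2] };
}
```

For example, parseDecision("**NEXT_WORKER F-023**") recovers the verb NEXT_WORKER with args F-023 despite the bold decoration.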

Tracking ghosts as a share of total decisions surfaced a real bug we’d been masking: 8% of orchestrator outputs were being silently ghost-classified. Hardening the orchestrator prompt with an anti-fence rule dropped the rate below 1%.

```
GET /api/harness/:slug/decisions
→ {
  decisions: [{ ts, verb, args, iteration, isGhost }, …],
  total: 63,
  recognized: 57,
  ghosts: 6,
  ghostRate: 0.095,
  byVerb: { NEXT_WORKER: 25, NEXT_VALIDATOR: 29, … }
}
```
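The summary fields in that payload are derivable from the rows alone. A minimal sketch of the fold, assuming our own helper name (summarise) and only the verb/isGhost fields:

```typescript
interface Row { verb: string; isGhost: boolean; }

// Fold timeline rows into the summary the endpoint reports:
// total / recognized / ghosts / ghostRate / byVerb.
function summarise(rows: Row[]) {
  const ghosts = rows.filter((r) => r.isGhost).length;
  const byVerb: Record<string, number> = {};
  for (const r of rows) {
    if (!r.isGhost) byVerb[r.verb] = (byVerb[r.verb] ?? 0) + 1;
  }
  return {
    total: rows.length,
    recognized: rows.length - ghosts,
    ghosts,
    ghostRate: rows.length ? ghosts / rows.length : 0,
    byVerb,
  };
}
```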
  • Modal panel opened from the harness dashboard (palette → 🧠 Decisions).
  • Live table: timestamp, iteration, verb pill, args.
  • Toggle to hide ghost rows when reviewing only valid decisions.
  • Summary strip: total / recognised / ghosts / ghost rate / by-verb chips.

We surveyed 11+ frameworks; none of them expose orchestrator parse-ghost telemetry. Orchestrator output is usually treated as opaque. Making it observable is one of our genuine differentiators — it lets us catch prompt regressions in a single dashboard glance.

The low_ghost_rate check is one of nine signals in the harness /health endpoint.
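The signal itself reduces to a threshold check. A sketch assuming a 1% cutoff — the page only says the hardened rate dropped below 1%, so the exact threshold and the function name are our assumptions:

```typescript
// Hypothetical low_ghost_rate signal: pass when the mission's ghost rate
// stays under an assumed 1% threshold.
const GHOST_RATE_THRESHOLD = 0.01; // assumed; the real cutoff isn't stated

function lowGhostRate(ghosts: number, total: number): boolean {
  if (total === 0) return true; // no decisions yet → nothing to flag
  return ghosts / total < GHOST_RATE_THRESHOLD;
}
```

Against the example payload above (6 ghosts of 63 decisions, rate 0.095), this check would fail, flagging the mission in /health.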