Erebus

An R&D control plane for agentic engineering

Memory · Learning · Isolation · Convergence — compliant from the ground up.

lifetime
31,093 LLM dispatches routed
38,915 task state transitions
105,339 events stored
live data fetched 2026-05-23 18:55 UTC

Surface inventory

What's actually running

Live scrape of the control plane — current state of the system as of 2026-05-23 18:55 UTC.

lifetime
1,066 tasks in store
615 routing outcomes
103 arch docs ingested
16 task types tracked

Swarm mode paused · bandit strategy thompson

[Diagram: erebus-surface. Control Plane (router + DOs) · Task Store (DO) · Worker Fleet · Vector Memory (768 · 384 · 1024D) · Event Store (append-only SQLite) · MicroVM Pool (sandbox)]

Pillar 1 — Memory

Three vector indices, not one chat scrollback

  • Episodic (768D) — task histories, decisions, outcomes across runs.
  • Code-semantic (384D) — functions, modules, architecture notes from repos.
  • Cross-project docs (1024D) — specs, ADRs, design docs.
  • Handlers read/write to memory automatically — no manual copy-paste.
The same prompt fired twice doesn't re-learn the lesson — the result is durable, not chat-window-bound.
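
As a rough illustration of why the three indices stay separate, the sketch below keeps one store per index and rejects embeddings of the wrong width. Only the dimensions (768, 384, 1024) come from the slide; the index names, the `VectorMemory` class, and the dot-product lookup are assumptions made for the example, not the deck's actual API.

```python
from dataclasses import dataclass, field

# Dimensions come from the slide; the index names and this API are hypothetical.
INDEX_DIMS = {"episodic": 768, "code_semantic": 384, "cross_project_docs": 1024}


@dataclass
class VectorMemory:
    """Toy in-memory stand-in for three separate vector indices."""

    store: dict = field(default_factory=lambda: {name: {} for name in INDEX_DIMS})

    def _check(self, index: str, vector: list) -> None:
        expected = INDEX_DIMS[index]
        if len(vector) != expected:
            raise ValueError(f"{index} expects {expected}D, got {len(vector)}D")

    def write(self, index: str, key: str, embedding: list) -> None:
        """Store an embedding under a key, enforcing the index's dimension."""
        self._check(index, embedding)
        self.store[index][key] = embedding

    def nearest(self, index: str, query: list):
        """Return the key with the highest dot product, or None if the index is empty."""
        self._check(index, query)
        best_key, best_score = None, float("-inf")
        for key, embedding in self.store[index].items():
            score = sum(q * e for q, e in zip(query, embedding))
            if score > best_score:
                best_key, best_score = key, score
        return best_key
```

A handler would call `write` after a task finishes and `nearest` before the next one starts, which is the sense in which a result outlives the chat window.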

live prompt-variant A/B (per task type)

Task type    Prompt variant  Samples  Win
code-change  test-aware            9  45.5%
code-change  plan-first           10  41.7%
code-change  minimal              10  41.7%
code-change  diff-focused          9  36.4%
feature      scaffold              6  30.0%
feature      default              10  25.0%
bug-fix      test-aware            6  25.0%
feature      plan-first            7  22.2%

Pillar 2 — Learning

Bandit picks the model — these are today's picks

Top arms by Thompson-sampling win rate from 30,687 observed task outcomes across 651 arms. No synthetic data on this slide.

Task type    Model arm                             Pulls  Win    Latency
CI-gate fix  OpenAI sonar-pro                         65  97.2%  10.6s
CI-gate fix  openai_native:gpt-4.1                    77  97.2%  15.4s
code-change  DeepSeek deepseek-chat                  908  97.1%  73.9s
code-change  Gemini gemini-3.1-flash-lite-preview  2,141  97.1%  24.3s
code-change  groq:qwen3-32b                        5,015  97.1%  7.3s
code-change  groq:gpt-oss-120b                     5,068  97.1%  4.8s
CI-gate fix  DeepSeek deepseek-chat                  732  97.1%  34.2s
CI-gate fix  Gemini gemini-pro-latest                392  97.1%  54.8s

Strategy: thompson. Arm cold-start at 10 pulls; rebalance runs continuously.
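
The strategy line above can be sketched as a Beta-Bernoulli Thompson sampler with a cold-start floor. This is a minimal illustration, not the router's code: the 10-pull floor is from the slide, while the `Arm` class, the uniform Beta(1, 1) prior, and the binary win/loss reward are assumptions.

```python
import random

COLD_START_PULLS = 10  # floor stated on the slide


class Arm:
    """One (task type, model) arm; the fields are hypothetical."""

    def __init__(self, name: str, wins: int = 0, pulls: int = 0):
        self.name = name
        self.wins = wins
        self.pulls = pulls


def pick_arm(arms, rng: random.Random) -> Arm:
    """Explore under-sampled arms first, then Thompson-sample the rest."""
    cold = [arm for arm in arms if arm.pulls < COLD_START_PULLS]
    if cold:
        # Least-pulled cold arm goes first so every arm reaches the floor.
        return min(cold, key=lambda arm: arm.pulls)
    # Draw one sample per arm from Beta(1 + wins, 1 + losses); highest draw wins.
    return max(
        arms,
        key=lambda arm: rng.betavariate(1 + arm.wins, 1 + arm.pulls - arm.wins),
    )


def record(arm: Arm, won: bool) -> None:
    """Update the arm's counts after a task outcome is observed."""
    arm.pulls += 1
    arm.wins += int(won)
```

Because the draw is random, a weaker arm is still picked occasionally, which is one way a continuously running rebalance could keep estimates fresh.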

Pillar 3 — Isolation

Side effects run in microVMs, never on the host

  • Shell, fetch, test, lint handlers all run inside microVMs.
  • Per-task lifecycle: create → checkpoint → terminate. Hourly orphan reaper.
  • Pool allocator (Durable Object) coordinates VM creation, reuse, cleanup.
  • Per-task network + filesystem policy, independent of the developer machine.
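
The per-task lifecycle and the orphan reaper in the list above can be sketched as a small state machine. The three states mirror the slide's create → checkpoint → terminate; the `MicroVm` class, the allowed-transition table, and the definition of an orphan (a non-terminated VM whose task is no longer live) are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class VmState(Enum):
    CREATED = "created"
    CHECKPOINTED = "checkpointed"
    TERMINATED = "terminated"


# Assumed transition table: a VM may checkpoint repeatedly and terminate once.
ALLOWED = {
    VmState.CREATED: {VmState.CHECKPOINTED, VmState.TERMINATED},
    VmState.CHECKPOINTED: {VmState.CHECKPOINTED, VmState.TERMINATED},
    VmState.TERMINATED: set(),
}


@dataclass
class MicroVm:
    vm_id: str
    task_id: str
    state: VmState = VmState.CREATED

    def transition(self, target: VmState) -> None:
        """Move to target, refusing transitions the table does not allow."""
        if target not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {target.value} not allowed")
        self.state = target


def reap_orphans(vms, live_task_ids) -> list:
    """Terminate VMs whose task is gone; return the reaped VM ids."""
    reaped = []
    for vm in vms:
        if vm.state is not VmState.TERMINATED and vm.task_id not in live_task_ids:
            vm.transition(VmState.TERMINATED)
            reaped.append(vm.vm_id)
    return reaped
```

In this sketch an hourly scheduler would call `reap_orphans` with the current set of live task ids.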

live fleet — registered workers

Worker           Type   Capacity  Last seen
sidhe-spark      light      5000  4613s
sluagh-d9278465  heavy        10  131s

[Diagram: erebus-microvm. Nodes: Task Handler, Pool Allocator DO, VM 1, VM 2, VM 3, Host machine. Three edges labeled "never".]

Pillar 4 — Convergence

Tasks have states, and the loop closes itself

Live snapshot of the task store — same store the bandit, verifiers, and failure-triage coach all read from:

ID                   Type      Status     Title
d462a277-4a50-435e…  research  completed  [lore] X-ray: hf-trl 1.0.0 — medium-risk (score:50)
task_verify_532126…  verify    completed  Verify: Update lint-staged: 16.4.0 → 17.0.5 (major)
task_verify_532126…  verify    completed  Verify: Update eslint-plugin-jsdoc: 62.9.0 → 63.0.0 (major)
task_verify_532126…  verify    completed  Verify: Update @types/node: 24.12.4 → 25.9.1 (major)
task_verify_532126…  verify    completed  Verify: Update @google/genai: 1.50.1 → 2.6.0 (major)
task_verify_532126…  verify    completed  Verify: Update zod: 4.3.6 → 4.4.3 (minor)

Self-healing — built in, not bolted on

Failures don't disappear and they don't page anyone first. The control plane has four loops that try to repair before a human is involved:

  • Attention recycler — stuck tasks re-enqueued automatically.
  • Orphan reaper — hourly cron sweeps dead microVMs + workers.
  • Failure-triage coach — failed tasks routed to a learning handler; pattern feeds the anti-pattern detector.
  • Alignment auto-rollback — misaligned prompt variants get reverted by the gate, not by an operator.
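
As one example of the loops listed above, an attention recycler could look roughly like the sketch below. Everything here is hypothetical: the status names, the heartbeat field, and the 15-minute threshold are placeholders chosen for the example, since the slide only says that stuck tasks are re-enqueued automatically.

```python
from dataclasses import dataclass

STUCK_AFTER_S = 900  # hypothetical threshold; the slide does not give one


@dataclass
class Task:
    task_id: str
    status: str            # assumed values: "queued", "running", "completed"
    last_heartbeat: float  # seconds since epoch, assumed field
    attempts: int = 0


def recycle_stuck(tasks, now: float) -> list:
    """Re-enqueue running tasks whose heartbeat is older than the threshold."""
    requeued = []
    for task in tasks:
        if task.status == "running" and now - task.last_heartbeat > STUCK_AFTER_S:
            task.status = "queued"
            task.attempts += 1
            requeued.append(task.task_id)
    return requeued
```

Tracking `attempts` gives a later loop, such as a triage handler, a signal for tasks that keep getting stuck.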

Calibration

Every task type, every arm explored

Cold-start floor is 10 pulls per arm. The bandit doesn't get to pick until each option has been tried at least that many times, so the win rates on the Pillar 2 slide are earned, not assumed.

Calibration state on every task type:

  • code-change — 47/47 arms explored
  • CI-gate fix — 47/47 arms explored
  • failure-triage — 47/47 arms explored
  • planning — 47/47 arms explored
  • research — 47/47 arms explored
  • troubleshooting — 47/47 arms explored
  • context-analysis — 47/47 arms explored

Fully calibrated: true

If a new model is added, the calibration counter drops and the router automatically explores it before letting it influence routing decisions.
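
The calibration counter can be read as a simple count of arms at or above the cold-start floor. The sketch below assumes that reading; the function name and the dictionary of pull counts are hypothetical, and only the 10-pull floor and the 47-arm total come from the slides.

```python
COLD_START_PULLS = 10  # floor stated on the slide


def calibration(pulls_by_arm: dict) -> tuple:
    """Return (explored, total, fully_calibrated) for one task type."""
    explored = sum(1 for pulls in pulls_by_arm.values() if pulls >= COLD_START_PULLS)
    total = len(pulls_by_arm)
    return explored, total, explored == total


# 47 arms, each already past the floor (pull counts are made up).
arms = {f"arm-{i}": 25 for i in range(47)}
before = calibration(arms)

# Registering a new model adds an arm with zero pulls, so calibration drops.
arms["new-model"] = 0
after = calibration(arms)
```

Here `before` reports 47 of 47 explored and `after` reports 47 of 48, which is the drop that would send the router back into exploration for the new arm.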

Cost & throughput

Lifetime cost, priced per dispatch

Every dispatch is priced from provider token usage and written to a reward/cost ledger — the same ledger the bandit reads to update arm rewards.

$55.174 lifetime spend
$0.002 avg per dispatch
0 fallback events ever
23.6s avg latency

lifetime cost by task type

Task type         Dispatches  Lifetime cost  Avg latency
code-change           18,597         $27.55  16.4s
CI-gate fix            5,218         $12.35  20.9s
failure-triage         3,377          $7.96  24.5s
context-analysis       2,595          $5.00  10.2s
research                 424          $1.51  22.3s

Zero lifetime fallbacks = bandit's first pick succeeded on every routed task; no degraded-mode dispatch.
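
The per-dispatch pricing and the headline average can be checked with a few lines of arithmetic. The totals below are the figures from this deck; the `dispatch_cost` function and its per-million-token rates are assumptions, since the slides do not show how provider usage is converted to dollars.

```python
def dispatch_cost(prompt_tokens: int, completion_tokens: int,
                  usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Cost in USD for one dispatch; the rates are hypothetical inputs."""
    return (prompt_tokens * usd_per_m_in + completion_tokens * usd_per_m_out) / 1_000_000


# Figures from the deck.
LIFETIME_SPEND_USD = 55.174
LIFETIME_DISPATCHES = 31_093
avg_per_dispatch = LIFETIME_SPEND_USD / LIFETIME_DISPATCHES  # about 0.00177

# Costs of the five task types listed in the table.
LISTED_COSTS = {
    "code-change": 27.55,
    "CI-gate fix": 12.35,
    "failure-triage": 7.96,
    "context-analysis": 5.00,
    "research": 1.51,
}
listed_total = round(sum(LISTED_COSTS.values()), 2)
```

The average works out to roughly $0.0018, which rounds to the $0.002 shown. The five listed task types sum to $54.37, so the remainder of the $55.174 total presumably belongs to task types not shown in the table.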

Securing the surface — 1/2

Zero-local-secrets

  • 82 secrets in the KV vault — 72 KV-backed, none on disk.
  • Bootstrap-of-bootstrap: initial config carries pointers; real credentials fetched at runtime.
  • Provider keys (LLMs, GitHub, Linear) never reach clients — every call proxies through internal APIs.
  • Bearer PROXY_API_KEY required on internal hops — no anonymous calls.
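
The bootstrap-of-bootstrap idea in the list above can be sketched as a config that holds only pointers, resolved against the vault at call time. The `KvVault` class, the `secret_ref` convention, and the config shape are hypothetical; the key name is taken from the vault sample on this slide, and the values are dummies.

```python
class KvVault:
    """In-memory stand-in for the KV vault; the real store is not shown in the deck."""

    def __init__(self, secrets: dict):
        self._secrets = dict(secrets)

    def get(self, name: str) -> str:
        return self._secrets[name]

    def rotate(self, name: str, value: str) -> None:
        """Replace a secret in place; callers pick it up on the next resolve."""
        self._secrets[name] = value


# The initial config carries a pointer, never the credential itself.
BOOTSTRAP_CONFIG = {
    "provider": "gemini",
    "api_key": {"secret_ref": "GEMINI_API_KEY"},
}


def resolve(config: dict, vault: KvVault) -> dict:
    """Late-bind every secret_ref pointer to its current vault value."""
    resolved = {}
    for key, value in config.items():
        if isinstance(value, dict) and "secret_ref" in value:
            resolved[key] = vault.get(value["secret_ref"])
        else:
            resolved[key] = value
    return resolved
```

Because resolution happens per call, rotating a key in the vault takes effect without redeploying or rewriting any config on disk.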

sample of vault keys (live)

CLOUDFLARE_ACCOUNT_ID · CF_WORKERS_AI_TOKEN · CF_WORKERS_TOKEN · CF_DNS_TOKEN · DEEPSEEK_API_KEY · ANTHROPIC_API_KEY · GEMINI_API_KEY · PERPLEXITY_API_KEY

[Diagram: erebus-secrets. Nodes: Client (no keys, ever), API Gateway (bearer PROXY_API_KEY), Control Plane, KV Vault, LLM Providers. Edge labels: read, rotate in place, late-bind.]

Securing the surface — 2/2

Access + webhooks

  • Admin surface fronted by zero-trust access (CF Access service-to-service).
  • Internal routing via Flycast — admin endpoints never reach the public internet.
  • Every internal call carries a bearer token — missing or mismatched → reject.
  • Incoming webhooks (GitHub, Linear, Codex) HMAC-verified before any work is enqueued.
  • 5-minute webhook-dedup window backed by a dedicated KV namespace.
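
The webhook path in the list above (verify the HMAC, then deduplicate, then enqueue) can be sketched with Python's standard `hmac` module. The 5-minute window is from the slide. The SHA-256 digest, the hex signature encoding, the delivery-id key, and the in-memory dictionary standing in for the KV namespace are all assumptions.

```python
import hashlib
import hmac

DEDUP_WINDOW_S = 300  # 5-minute window stated on the slide


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time check of an HMAC-SHA256 hex signature (algorithm assumed)."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


class DedupWindow:
    """In-memory stand-in for the dedicated KV namespace."""

    def __init__(self):
        self._seen = {}

    def first_time(self, delivery_id: str, now: float) -> bool:
        """True if this delivery id was not seen within the window."""
        # Drop entries older than the window before checking.
        self._seen = {k: t for k, t in self._seen.items() if now - t < DEDUP_WINDOW_S}
        if delivery_id in self._seen:
            return False
        self._seen[delivery_id] = now
        return True


def accept(secret: bytes, body: bytes, signature_hex: str,
           delivery_id: str, dedup: DedupWindow, now: float) -> bool:
    """Verify first so unauthenticated requests never touch the dedup store."""
    if not verify_signature(secret, body, signature_hex):
        return False
    return dedup.first_time(delivery_id, now)
```

Checking the signature before the dedup lookup means a forged request cannot block a later genuine delivery that reuses its id.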

Closes the three places agentic stacks usually leak:

keys in code · open admin · untrusted callers

Compliance

Compliant from the ground up — not bolted on after

The same architecture that runs inside firewalled networks maps directly to SOC 2 / GDPR / HIPAA / ISO 27001 controls. Each property is a property of the build, not a checkbox added later:

Framework  Control area                                  How it's already satisfied
SOC 2      CC6 / CC7 — access & monitoring               Zero-trust access on admin surface; append-only event store for every dispatch.
GDPR       Art. 25 / 32 — data minimization, integrity   No client-side keys, no shadow copies; provider calls proxied through one audited gateway.
HIPAA      §164.312 — audit, transmission security       HMAC-verified ingress; bearer-token egress; per-task microVM isolation.
ISO 27001  A.5 / A.8 / A.12 — policy, asset, ops         KV vault as single source of truth (82 keys); rotation in-place; outbound restricted to webhook-pull.

Side effect: the system runs where most agent stacks can't — locked-down corporate networks, regulated environments, air-gapped review.

Where it's going

Direction

currently hardening

  • Tightening the alignment gate — richer policy language, auto-rollback of misaligned prompt variants.
  • Surface inventory operator console — the live data you just saw, refreshable on demand.
  • Observability dashboards over the event store + routing decisions.
  • Soaking the secure surface before loosening the swarm.

on the roadmap

  • More task handlers — infra changes, performance tuning, cross-repo refactors.
  • Cross-project bandit-arm exchange propagating winning patterns.
  • Headless audit captures (PDF + screenshot) of every routed decision.
  • Wider provider matrix as new model arms come online.

This deck is informational — a snapshot of what's built and where it's heading. Not a proposal.