ActiveGraph continuity layer
Evolving the Gitwami agent architecture toward one typed graph, an append-only event log, and behaviors that react to change.
A plan to evolve gitoku's agent architecture from stateless, per-table job queues toward an ActiveGraph-style continuity layer: one evolving graph of typed nodes and edges, an append-only event log, behaviors that react to changes (instead of hardcoded triggers), forkable sandboxes for agent code execution, and the surrounding NixOS services imported as first-class context.
Status (2026-06-06): Phases 1, 2a, 2b, 2c, 4b-metadata, 5a, 5b, 5c shipped;
prompt-context (Stage 0c Pillar 1) and per-repo agent manifest +
fork-agent-run (Stages 0 + 3) shipped alongside. The graph triple
(graph_nodes/graph_edges/graph_events) is live, behaviors fire via
the dispatcher, provenance dual-writes into the graph, semantic-memory
injection at prompt render is on, and the trace-as-product Activity UI
ships. Remaining unbuilt: NixOS sandbox-host + microVM fleet (Stage 4a)
and the Stage 5d continuity capstones. See
continuity-completion for the active
stage-by-stage execution roadmap with commit hashes.
1. Motivation
Today an agent run is disposable: a fresh clone into an ephemeral workspace, destroyed on teardown, leaving only a commit + a summary string. There is no carried-over memory, no typed relationships between facts, and no generic event log. The ActiveGraph thesis is that the hard part of long-running agents is not the loop — it's the continuity layer around it: a shared evolving state where tasks, claims, evidence, decisions, risks, failures, tools, and relations coexist; events record what changed; behaviors react; and a trace explains how anything came to exist.
The good news: gitoku already has three of the four pillars in primitive form. This plan generalizes them rather than rewriting.
2. Current state (baseline)
| ActiveGraph pillar | Gitoku today | Location |
|---|---|---|
| World state | Flat relational domain tables, no typed edges | repositories, pull_requests, pull_request_reviews, issues |
| Behaviors (reaction) | Per-table IHP job queues, each hardwired to one trigger | prompt_pull_request_jobs (Schema.sql:774), pull_request_review_jobs (:810), pull_request_conflict_resolution_jobs, pull_request_diff_ai_response_jobs, pull_request_form_suggestion_jobs |
| Trace / lineage | Commit-keyed AI provenance + per-step run trace. Capture-side fidelity is heuristic-windowed, not full-thread — Stage 2d unifies this so each commit's stored context is the full delta of the agent session since the prior commit. | commit_ai_contexts (:746) via AiContextLookup.hs; workflow_runs (:1082) + workflow_run_steps (:1133); rgh capture in rgh/src/Rgh/Capture.hs |
| Unified graph + event log | Shipped (43b62840, Phase 1) — graph_nodes / graph_edges / graph_events triple in Postgres, plus Application.Graph writer + Stage 2a provenance dual-write from commit_ai_contexts. | Application/Schema.sql graph tables; Application/Graph.hs; Application/GraphProvenance.hs |
| Carried-over agent memory | Shipped (5ac4719d, Stage 5b) — recent memory-flavored graph nodes (claims / decisions / failures / evidence) project into every AI surface's prompt via Application.GraphMemory + Application.PromptContextBlock. Per-surface byte budgets cap the injection. | Application/GraphMemory.hs; Application/PromptContextBlock.hs; commit_ai_contexts projected by GraphProvenance |
| Prompt context layering | Unified end-to-end across all 5 AI surfaces (Stage 0c Pillar 1, shipped) — every prompt-PR / PR review / conflict resolution / diff-thread reply / form-suggestion run loads from Application.PromptContextBlock with per-surface byte budgets. UI visibility: prompt-PR card footer, per-PR AI activity panel on the conversation tab, in-comment footer on the AI diff-reply, create-form footer on the AI form suggestion. guidance_bytes / memory_bytes persisted on every job row. | Application/PromptContextBlock.hs; Application/Jobs/*Run.hs (5 surfaces); Application/Jobs/PullRequestDiffAiResponse/Run.hs appendContextBytesFooter; Web/View/Repositories/Show.hs renderPullRequestSuggestionContextFooter; Web/View/PullRequests/Show.hs renderPullRequestAiContextSummary |
| LLM provider | Hardcoded per surface — Codex CLI for the three agentic surfaces (prompt-PR, review, conflict); OpenAI gpt-5-mini direct for the two HTTP surfaces (diff-reply, form-suggestion). Claude Code credentials exist (Application.ClaudeCodeCredentials) but have no consumer. No OpenRouter / Anthropic-direct path. Stage 0c Pillar 2 unifies this (CLI runner dispatch + chat-provider abstraction + tool-call harmonization). | Application/CodexCredentials.hs, Application/ClaudeCodeCredentials.hs, IHP.OpenAI call-sites in PullRequestDiffAiResponse/, PullRequestFormSuggestion/ |
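The memory and prompt-context rows above mention per-surface byte budgets. Below is one plausible way such a budget could cap memory injection. It is not taken from Application.PromptContextBlock. The MemoryNode type, the newest-first greedy policy, and the one-byte newline separator are assumptions for illustration.

```haskell
import Data.Char (ord)
import Data.List (sortOn)

-- Hypothetical shape of a memory-flavored graph node (claim / decision / failure / evidence).
data MemoryNode = MemoryNode
  { memKind :: String -- node_type, e.g. "claim" or "decision"
  , memAgeSeconds :: Int -- age of the node; smaller means more recent
  , memText :: String -- the line that would be rendered into the prompt
  }
  deriving (Show, Eq)

-- UTF-8 encoded length, so the budget is counted in bytes rather than characters.
utf8Length :: String -> Int
utf8Length = sum . map charBytes
  where
    charBytes c
      | n < 0x80 = 1
      | n < 0x800 = 2
      | n < 0x10000 = 3
      | otherwise = 4
      where
        n = ord c

-- Newest-first greedy selection: keep nodes until the next one would exceed the
-- surface's byte budget, then stop. Returns the kept nodes and the bytes used.
selectMemory :: Int -> [MemoryNode] -> ([MemoryNode], Int)
selectMemory budget = go 0 [] . sortOn memAgeSeconds
  where
    go used acc [] = (reverse acc, used)
    go used acc (n : rest)
      | used + cost n <= budget = go (used + cost n) (n : acc) rest
      | otherwise = (reverse acc, used)
    cost n = utf8Length (memText n) + 1 -- +1 for the separating newline
```

Stopping at the first node that does not fit keeps the injected block strictly recency-ordered; a variant that skips oversized nodes and keeps scanning would pack the budget tighter at the cost of gaps in recency.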
Sandbox model. Isolation is an enum dispatched by case, not an interface
(IsolatedExecution.hs:91): Local (in-process), LocalRunner (child process,
same host — what git.lazare.ai uses), AwsEcsTask, AwsEc2Vm. Inside the
runner it's plain bash -lc, docker run, or a codex subprocess. Swapping a
backend touches three sites: parseExecutionBackend, WorkflowRun.hs:84,
PromptPullRequest/Run.hs:128. The payload contract (WorkflowVmPayload,
PromptRunnerEnvironment) and the HTTP callback protocol are already
backend-agnostic — the interface is latent but unnamed.
3. Target architecture
A JSONB node/edge/event triple in the existing Postgres (no graph DB), layered alongside domain tables, which become projections.
graph_nodes (id, repository_id, node_type, status, payload JSONB, created_at, updated_at)
node_type ∈ {task, claim, evidence, decision, risk, failure, memory, tool, service, agent}
graph_edges (id, src_node_id, dst_node_id, relation, payload JSONB, created_at)
relation ∈ {supports, contradicts, depends_on, derived_from, blocks, exposes, uses_secret}
graph_events (id, node_id, edge_id, event_type, actor, before JSONB, after JSONB, run_id, created_at)
-- append-only; node_id/edge_id nullable, exactly one set
Principles:
- Postgres + JSONB, not Neo4j — right weight for an IHP app.
- Domain tables stay; nodes are projections/backfills, so nothing relational is lost.
- The event log is the product: any node's lineage reconstructs by traversal.
- "Next step emerges from what changed," not from a hardcoded DAG.
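To make the "exactly one set" rule concrete, here is a sketch of how Application/Graph.hs could type the schema's relation vocabulary and the event target. The names (EventTarget, targetFromColumns, relationName) are hypothetical, not the shipped module's API, and payloads and other columns are omitted.

```haskell
-- Vocabularies from the schema sketch; the DB stores them as text columns.
data NodeType = Task | Claim | Evidence | Decision | Risk | Failure | Memory | Tool | Service | Agent
  deriving (Show, Eq, Enum, Bounded)

data Relation = Supports | Contradicts | DependsOn | DerivedFrom | Blocks | Exposes | UsesSecret
  deriving (Show, Eq, Enum, Bounded)

-- An event targets exactly one of a node or an edge. A sum type makes the
-- "node_id/edge_id nullable, exactly one set" rule impossible to violate in Haskell.
data EventTarget = OnNode Int | OnEdge Int
  deriving (Show, Eq)

-- Decode the two nullable columns, rejecting rows that break the invariant.
targetFromColumns :: Maybe Int -> Maybe Int -> Either String EventTarget
targetFromColumns (Just n) Nothing = Right (OnNode n)
targetFromColumns Nothing (Just e) = Right (OnEdge e)
targetFromColumns Nothing Nothing = Left "event has neither node_id nor edge_id"
targetFromColumns (Just _) (Just _) = Left "event has both node_id and edge_id"

-- Encode back to (node_id, edge_id) for insertion.
targetToColumns :: EventTarget -> (Maybe Int, Maybe Int)
targetToColumns (OnNode n) = (Just n, Nothing)
targetToColumns (OnEdge e) = (Nothing, Just e)

-- Text form stored in graph_edges.relation.
relationName :: Relation -> String
relationName r = case r of
  Supports -> "supports"
  Contradicts -> "contradicts"
  DependsOn -> "depends_on"
  DerivedFrom -> "derived_from"
  Blocks -> "blocks"
  Exposes -> "exposes"
  UsesSecret -> "uses_secret"

relationFromName :: String -> Maybe Relation
relationFromName s = lookup s [(relationName r, r) | r <- [minBound .. maxBound]]
```

On the database side the same rule could be enforced with a constraint such as CHECK (num_nonnulls(node_id, edge_id) = 1), so that neither the decoder nor ad-hoc SQL can write a row that targets both or neither.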
4. Phased plan
Phase 1 — Graph alongside, backfilled from projections
Goal: introduce the triple without disrupting anything.
- Add graph_nodes, graph_edges, graph_events via a new forward-only migration (Application/Migration/<ts>.sql); update Application/Schema.sql. Make it idempotent (IF NOT EXISTS).
- New module Application/Graph.hs: node/edge/event types + insert/query helpers (use sqlQueryTyped/sqlExecTyped, per repo rule; never sqlQuery/sqlExec).
- Backfill mapping: PR → task node; review comment → claim node; each commit_ai_contexts row → evidence node + derived_from edge to the commit/task.
- Read path: a "lineage" view that traverses edges for a node ("why does this exist?").
- Done when: an existing PR renders its provenance as a graph traversal with no behavior change.
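The read path can be sketched as a breadth-first walk over edges. This assumes an edge reading in which derived_from/depends_on point from a node to what it came from, and supports points from the supporting node to what it supports. The names explainers and lineage are hypothetical, and edges are an in-memory list rather than a query.

```haskell
import Data.List (nub)

data Relation = Supports | Contradicts | DependsOn | DerivedFrom | Blocks
  deriving (Show, Eq)

-- Simplified graph_edges row: src_node_id, dst_node_id, relation.
data Edge = Edge {edgeSrc :: Int, edgeDst :: Int, edgeRel :: Relation}
  deriving (Show, Eq)

-- Direct "explainers" of node n under the assumed edge reading:
--   src derived_from/depends_on dst  => dst explains src
--   src supports dst                 => src explains dst
explainers :: [Edge] -> Int -> [Int]
explainers edges n =
  [edgeDst e | e <- edges, edgeSrc e == n, edgeRel e `elem` [DerivedFrom, DependsOn]]
    ++ [edgeSrc e | e <- edges, edgeDst e == n, edgeRel e == Supports]

-- Breadth-first lineage of a node, excluding the node itself.
-- The `seen` list makes the walk terminate even if the graph has cycles.
lineage :: [Edge] -> Int -> [Int]
lineage edges root = bfs [root] [root]
  where
    bfs _ [] = []
    bfs seen (x : queue) =
      let fresh = nub [p | p <- explainers edges x, p `notElem` seen]
       in fresh ++ bfs (seen ++ fresh) (queue ++ fresh)
```

Against Postgres the same walk would more naturally be a WITH RECURSIVE query over graph_edges; the in-memory version only pins down the traversal semantics and cycle handling.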
Phase 2 — Event-driven behaviors (highest leverage)
Goal: replace bespoke per-table triggering with one dispatcher over events.
- Define a Behavior registry: predicate (event_type, node predicate) → action. Start as a record-of-functions in Application/Behaviors.hs.
- One dispatcher job reads new graph_events and fans out to matching behaviors (reuse the IHP job runner as the execution substrate).
- Re-express existing jobs as behaviors:
  - claim with no supports edge → spawn research task
  - two contradicts-linked claims → spawn review
  - depends_on satisfied → unblock dependent task
  - PR opened → existing review / diff / form-suggestion behaviors
- Register per-repo agents as behaviors: an [agent.*] manifest's events/branches declarations (see developer-features §5) become a behavior predicate → invoke that named agent. This is the roadmap's "AI router dispatching to agents," and it reuses workflowMatchesEvent for matching. Per-agent instructions (gitoku.toml-wired, opt-in) compose with ambient repo guidance (Stage 0b: every branch's AGENTS.md / CLAUDE.md / *.md, opt-out). Both layers feed the same renderCodexPrompt seam, ambient first, per-agent second.
- Keep the old *_jobs tables as the execution mechanism; only the triggering moves to events.
- Done when: PR review fires from a graph event, not a direct enqueue, with identical output.
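A minimal sketch of the record-of-functions registry and the fan-out dispatcher follows. The event type strings ("pr_opened", "node_created"), the Action constructors, and the GraphView accessor are assumptions; in the plan, actions would enqueue rows into the existing *_jobs tables rather than return values.

```haskell
-- Simplified graph_events row as seen by the dispatcher.
data GraphEvent = GraphEvent
  { evType :: String -- hypothetical event_type values, e.g. "pr_opened"
  , evNodeType :: Maybe String -- node_type of the affected node, if any
  , evNodeId :: Maybe Int
  }
  deriving (Show, Eq)

-- Read-only view of the graph that predicates may consult.
newtype GraphView = GraphView {hasSupportingEdge :: Int -> Bool}

-- What a behavior asks for; in the plan these become rows in existing *_jobs tables.
data Action = EnqueueReview Int | SpawnResearchTask Int
  deriving (Show, Eq)

-- A registry entry: a record of functions, not a typeclass instance.
data Behavior = Behavior
  { behaviorName :: String
  , matches :: GraphView -> GraphEvent -> Bool
  , react :: GraphEvent -> Maybe Action
  }

-- "PR opened -> existing review behavior"
reviewOnPrOpened :: Behavior
reviewOnPrOpened =
  Behavior
    { behaviorName = "review-on-pr-opened"
    , matches = \_ ev -> evType ev == "pr_opened"
    , react = fmap EnqueueReview . evNodeId
    }

-- "claim with no supports edge -> spawn research task"
researchUnsupportedClaim :: Behavior
researchUnsupportedClaim =
  Behavior
    { behaviorName = "research-unsupported-claim"
    , matches = \g ev ->
        evType ev == "node_created"
          && evNodeType ev == Just "claim"
          && maybe False (not . hasSupportingEdge g) (evNodeId ev)
    , react = fmap SpawnResearchTask . evNodeId
    }

-- Fan one event out to every matching behavior, tagging each action with its behavior.
dispatch :: GraphView -> [Behavior] -> GraphEvent -> [(String, Action)]
dispatch g registry ev =
  [(behaviorName b, a) | b <- registry, matches b g ev, Just a <- [react b ev]]
```

Because dispatch is a pure function of (graph view, registry, event), it can be unit-tested without the job runner; the IHP job that reads new graph_events would call it and enqueue each returned action.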
Phase 3 — Patches: propose vs. accept
Goal: risky mutations stay proposed until approved.
- Add a proposed/accepted status to graph mutations; reuse pull_request_reviews / review-governance for approval.
- Risky memory/decision writes land as proposed → approval queue; accept promotes them.
- Done when: a memory-write behavior produces a proposed node a human can approve/reject.
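The propose/accept lifecycle reduces to a small state machine. This sketch assumes that only memory and decision writes count as risky (per the bullet above), that non-risky writes are accepted immediately, and that accepted and rejected are terminal. PatchStatus, initialStatus, and review are hypothetical names.

```haskell
data PatchStatus = Proposed | Accepted | Rejected
  deriving (Show, Eq)

data Patch = Patch {patchNodeType :: String, patchStatus :: PatchStatus}
  deriving (Show, Eq)

-- Which node types are "risky" is a policy choice; memory and decision follow this plan.
isRisky :: String -> Bool
isRisky t = t `elem` ["memory", "decision"]

-- Risky writes wait in the approval queue; everything else lands accepted.
initialStatus :: String -> PatchStatus
initialStatus t = if isRisky t then Proposed else Accepted

data Review = Approve | Reject
  deriving (Show, Eq)

-- Only proposed patches can transition; accepted and rejected are terminal.
review :: Review -> Patch -> Either String Patch
review Approve p | patchStatus p == Proposed = Right p {patchStatus = Accepted}
review Reject p | patchStatus p == Proposed = Right p {patchStatus = Rejected}
review _ p = Left ("cannot review a patch in status " ++ show (patchStatus p))
```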
Phase 4 — Trace as product
Goal: unify the lineage fragments.
- Fold commit_ai_contexts + workflow_run_steps writes into graph_events (dual-write first, then read from events).
- Reconstruct any node's full history: which behavior created it, what evidence supports it, what changed after.
- Done when: a PR's "how did this conclusion form" view is built purely from the event log.
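Reconstructing a node's history from the log amounts to filtering its events, ordering them, and folding. In this sketch payloads are plain strings standing in for JSONB, the actor string is assumed to name the behavior or human that wrote the event, and "created by" is taken to be the actor of the first event whose before is empty. chainIsConsistent is a hypothetical check for the dual-write period: if each event's before matches the previous event's after, the event log alone reproduces the node's state.

```haskell
import Data.List (foldl', sortOn)

-- Simplified graph_events row; JSONB before/after payloads are plain strings here.
data Event = Event
  { evSeq :: Int -- position in the append-only log
  , evNode :: Int -- node_id the event targets
  , evActor :: String -- behavior or human that wrote it
  , evBefore :: Maybe String -- Nothing on creation
  , evAfter :: Maybe String -- Nothing on deletion
  }
  deriving (Show, Eq)

data History = History
  { createdBy :: Maybe String
  , changes :: [(String, Maybe String, Maybe String)] -- (actor, before, after), oldest first
  , current :: Maybe String
  }
  deriving (Show, Eq)

-- The events for one node, in log order.
eventsFor :: Int -> [Event] -> [Event]
eventsFor n = sortOn evSeq . filter ((== n) . evNode)

-- Replay a node's events into its history and current payload.
nodeHistory :: Int -> [Event] -> History
nodeHistory n = foldl' step (History Nothing [] Nothing) . eventsFor n
  where
    step h e = History (creator h e) (changes h ++ [(evActor e, evBefore e, evAfter e)]) (evAfter e)
    creator h e
      | Just c <- createdBy h = Just c
      | evBefore e == Nothing = Just (evActor e) -- first event with an empty "before" is the create
      | otherwise = Nothing

-- Dual-write check: each event's "before" must equal the previous event's "after".
chainIsConsistent :: Int -> [Event] -> Bool
chainIsConsistent n evs = and (zipWith linked ordered (drop 1 ordered))
  where
    ordered = eventsFor n evs
    linked prev next = evAfter prev == evBefore next
```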
5. Sandboxes for agent code execution
Unlocks the self-improvement loop. Two changes:
5a. Name the backend interface
- Turn the IsolatedExecutionBackend enum into a record-of-functions / typeclass: provision → injectPayload → run → collect → teardown. The interface is already latent in the backend-agnostic payload + callback protocol.
- Collapse the three dispatch sites (parseExecutionBackend, WorkflowRun.hs:84, PromptPullRequest/Run.hs:128) onto the interface.
- Injection point: runLocalRunnerProcess / localRunnerProcessCommand (IsolatedExecution.hs:683/1146).
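A sketch of the named interface as a Haskell record of functions, with bracket guaranteeing teardown. Payload and RunResult are placeholders for WorkflowVmPayload / PromptRunnerEnvironment and the callback result; SandboxBackend, runInSandbox, and fakeBackend are hypothetical names, and the real seam may need more steps (callbacks, timeouts).

```haskell
import Control.Exception (bracket)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Placeholders for the real payload and result types.
newtype Payload = Payload {payloadScript :: String}
  deriving (Show)

data RunResult = RunResult {exitCode :: Int, output :: String}
  deriving (Show, Eq)

-- One record per backend; `h` is the backend's handle (container name, VM id, ...).
data SandboxBackend h = SandboxBackend
  { provision :: IO h
  , injectPayload :: h -> Payload -> IO ()
  , run :: h -> IO Int
  , collect :: h -> IO String
  , teardown :: h -> IO ()
  }

-- The single call site that replaces per-backend `case` dispatch.
-- bracket runs teardown even if inject/run/collect throws.
runInSandbox :: SandboxBackend h -> Payload -> IO RunResult
runInSandbox b payload =
  bracket (provision b) (teardown b) $ \h -> do
    injectPayload b h payload
    code <- run b h
    out <- collect b h
    pure (RunResult code out)

-- In-memory fake that records the lifecycle order, for tests.
fakeBackend :: IORef [String] -> SandboxBackend Int
fakeBackend logRef =
  SandboxBackend
    { provision = record "provision" >> pure 1
    , injectPayload = \_ p -> record ("inject " ++ payloadScript p)
    , run = \_ -> record "run" >> pure 0
    , collect = \_ -> record "collect" >> pure "ok"
    , teardown = \_ -> record "teardown"
    }
  where
    record s = modifyIORef logRef (++ [s])
```

A backend chosen at runtime from configuration would need an existential wrapper (or a fixed handle type such as a text id), since h differs per backend.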
5b. NixOS-native forkable backend
- Add a systemd-nspawn (or microvm.nix/firecracker) backend where the sandbox image is a Nix derivation: reproducible toolchain, single-host, no AWS.
- Forkability: a fork = a fresh nspawn from the same snapshot.
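Under the "fork = fresh nspawn from the same snapshot" model, each fork could be an ephemeral container over one shared root directory. The sketch only builds the argument vector (first element is the program name). It assumes snapshotDir is a root filesystem produced from the Nix derivation; the fork naming scheme is hypothetical. --ephemeral, --directory, --machine, and --quiet are documented systemd-nspawn options; --ephemeral runs the container on a temporary snapshot of its file system that is removed when the container exits, so sibling forks never see each other's writes.

```haskell
-- Argument vector for one fork: an ephemeral systemd-nspawn container over a shared root.
nspawnForkArgs :: FilePath -> String -> [String] -> [String]
nspawnForkArgs snapshotDir forkName cmd =
  [ "systemd-nspawn"
  , "--quiet"
  , "--ephemeral" -- writes go to a throwaway snapshot, removed on exit
  , "--directory=" ++ snapshotDir -- assumed: root filesystem built from the Nix derivation
  , "--machine=" ++ forkName -- unique name per fork
  , "--"
  ]
    ++ cmd

-- Hypothetical naming scheme: base-fork-1, base-fork-2, ...
forkNames :: String -> Int -> [String]
forkNames base n = [base ++ "-fork-" ++ show i | i <- [1 .. n]]
```

If a fork's filesystem must outlive the container (for example, to fork again from its end state), cloning a ZFS snapshot per fork and pointing --directory at the clone is an alternative to --ephemeral.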
5c. Self-modification loop (depends on Phases 2–4 + 5b)
- Run a behavior in sandbox A; record the trace as events.
- Propose a prompt/rule patch as a proposed graph node.
- Fork sandbox B with the patch; run the same task.
- Diff the two trace subgraphs — reuse gitoku's existing PR-diff machinery.
- Accept the winning node to promote the change.
"Self-modification with lineage" falls out of graph + fork + diff.
Note: the ambient repo guidance files (Stage 0b — AGENTS.md, CLAUDE.md,
.gitoku/instructions.md, etc.) are committed to the branch, so the most
natural patches the agent can propose are edits to its own ambient
guidance. A patch to AGENTS.md is just a normal commit — the fork on the
patched branch automatically renders the new guidance into the next
prompt, and the trace-diff measures whether it helped. No new wiring
needed beyond what 0b + 5c already provide.
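The trace-diff step of the 5c loop can be pictured with a deliberately simple stand-in: treat each trace subgraph as a list of (node_type, label) nodes and compare them as multisets. This is not gitoku's PR-diff machinery, which the plan intends to reuse, and the "fewer failure nodes wins" rule in pickWinner is an illustrative assumption, not a stated acceptance criterion.

```haskell
import Data.List ((\\))

-- One node of a run's trace subgraph, reduced to (node_type, label).
data TraceNode = TraceNode {tnType :: String, tnLabel :: String}
  deriving (Show, Eq)

data TraceDiff = TraceDiff {onlyInA :: [TraceNode], onlyInB :: [TraceNode]}
  deriving (Show, Eq)

-- Multiset difference both ways: (\\) removes one occurrence per element of its right argument.
diffTraces :: [TraceNode] -> [TraceNode] -> TraceDiff
diffTraces a b = TraceDiff (a \\ b) (b \\ a)

data Winner = Baseline | Patched
  deriving (Show, Eq)

-- Illustrative rule: the patched run (sandbox B) wins only with strictly fewer failure nodes.
pickWinner :: [TraceNode] -> [TraceNode] -> Winner
pickWinner a b
  | failures b < failures a = Patched
  | otherwise = Baseline
  where
    failures = length . filter ((== "failure") . tnType)
```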
6. NixOS services as context
The services in ~/nixos-config are richly described but have no generated
inventory. Metadata lives in modules/ports.nix (~48 ports),
modules/resource-budget.nix (tiers/slices), per-service modules (sops secrets,
nginx vhost, deps), and the configuration.nix import list.
- Harvest (static): nix eval the config → JSON {name, port, tier, user, secret-refs, vhost, deps, ExecStart}.
- Harvest (runtime): systemctl --user list-units --output=json + MCP /health and /mcp probes for live status.
- Model: each service → service node with depends_on (postgres, redis), exposes (port/vhost), and uses_secret edges. The ~13 MCP servers → tool nodes (they are the agent's callable tools; the cleanest fit).
- Generator: one more NixOS timer service writes the inventory into graph_nodes/graph_edges on a schedule.
- Staleness: a unit going inactive emits an event → behaviors mark dependent nodes stale (the article's "stale source → stale memo" pattern, applied to infra).
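The model and staleness bullets can be sketched together. ServiceInfo mirrors a subset of the JSON fields the static harvest would emit; representing ports and vhosts as string node ids such as "port:8000" is an assumption. staleFrom walks depends_on edges in reverse, so a unit going inactive marks every transitive dependent stale.

```haskell
import Data.List (nub)

-- Hypothetical subset of one harvested service (fields from the static nix eval).
data ServiceInfo = ServiceInfo
  { svcName :: String
  , svcPort :: Maybe Int
  , svcVhost :: Maybe String
  , svcDeps :: [String]
  , svcSecrets :: [String]
  }
  deriving (Show, Eq)

-- Simplified graph_edges row keyed by node names instead of ids.
data InfraEdge = InfraEdge {ieSrc :: String, ieRel :: String, ieDst :: String}
  deriving (Show, Eq)

-- depends_on / exposes / uses_secret edges for one service node.
serviceEdges :: ServiceInfo -> [InfraEdge]
serviceEdges s =
  [InfraEdge name "depends_on" d | d <- svcDeps s]
    ++ [InfraEdge name "exposes" ("port:" ++ show p) | Just p <- [svcPort s]]
    ++ [InfraEdge name "exposes" ("vhost:" ++ v) | Just v <- [svcVhost s]]
    ++ [InfraEdge name "uses_secret" sec | sec <- svcSecrets s]
  where
    name = svcName s

-- Transitive dependents of a unit that went inactive, via reversed depends_on edges.
staleFrom :: [InfraEdge] -> String -> [String]
staleFrom edges down = go [down] []
  where
    dependents x = [ieSrc e | e <- edges, ieRel e == "depends_on", ieDst e == x]
    go [] stale = stale
    go (x : queue) stale =
      let new = [d | d <- nub (dependents x), d `notElem` stale, d /= down]
       in go (queue ++ new) (stale ++ new)
```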
7. Risks & non-goals
- Non-goal: a graph database. Postgres + JSONB is the right weight; Neo4j is overkill for an IHP app.
- Non-goal: rip-and-replace. Domain tables and the IHP job runner stay; we generalize triggering, not execution.
- Carried-over agent memory: originally framed as the biggest unbuilt concept and sequenced last, it shipped at Stage 5b (5ac4719d) on top of the Phase 1 graph and the Stage 2a provenance dual-write. Recall is now a tunable per-surface knob (Application.PromptContextBlock's SurfaceContextOptions) rather than a missing capability. Next memory-side work is recall quality (Stage 5d capstones), not existence.
- Migration discipline: forward-only, idempotent migrations; never edit an applied revision (per AGENTS.md migration rules).
- Don't over-trigger. Event-driven behaviors can cascade (cf. roadmap: "cascading loop"). Add per-behavior concurrency/budget guards (the RepositoryAiContextBudget pattern already exists) before enabling autonomous chains.
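One shape a per-behavior guard could take is sketched below. It does not reproduce RepositoryAiContextBudget; the per-window firing budget, the cascade-depth counter assumed to travel with the triggering event, and the names Guard/admit are assumptions. Resetting firings at each window boundary is left out.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical guard state for one window.
data Guard = Guard
  { maxFiringsPerWindow :: Int -- per-behavior budget inside the window
  , maxCascadeDepth :: Int -- allowed behavior-to-behavior hops
  , firings :: [(String, Int)] -- behavior name -> firings so far this window
  }
  deriving (Show, Eq)

data Verdict = Allow | DenyBudget | DenyDepth
  deriving (Show, Eq)

-- depth: behavior hops behind the triggering event (0 = human or external).
-- Returns the verdict and the updated guard (unchanged when denied).
admit :: String -> Int -> Guard -> (Verdict, Guard)
admit name depth g
  | depth >= maxCascadeDepth g = (DenyDepth, g)
  | used >= maxFiringsPerWindow g = (DenyBudget, g)
  | otherwise = (Allow, g {firings = (name, used + 1) : filter ((/= name) . fst) (firings g)})
  where
    used = fromMaybe 0 (lookup name (firings g))
```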
8. Sequencing (status snapshot)
The original plan recommended this order; the actual delivery has tracked it closely. Statuses below reflect 2026-06-06.
- Phase 1 (graph triple + backfill): ✅ shipped (43b62840).
- Phase 5a (SandboxBackend interface): ✅ shipped (43b62840) alongside Phase 1.
- Phase 2 (event-driven behaviors): ✅ shipped across Stage 2a (af443555, provenance dual-write), 2b (1367e19b, behavior registry/dispatcher), and 2c (84db920e, agents-as-behaviors).
- Phase 6 (NixOS service nodes): 📋 planned, not started.
- Phase 5b (NixOS sandbox backend / Stage 4a microVM fleet): 📋 deferred to a dedicated multi-week infra push. The Stage 4b metadata side shipped (d2b036e8); ZFS bytes await 4a.
- Phases 3 + 4 (propose/accept + trace-as-product): ✅ shipped at Stage 5c (56686c02, propose/accept) and Stage 5a (079634c1, Activity tab trace UI).
- Phase 5c (self-improvement loop / Stage 5d capstones): 📋 next on the spine.
The original plan estimated that Phases 1 and 5a carry ~80% of the value and could start in parallel; both shipped together in 43b62840.
9. Business model & positioning
The "holy trinity": git + filesystem + logs
Every agent action gitoku runs produces three kinds of state. Today each is owned by a different vendor category, and nobody binds them coherently:
| Leg | What it is | Who owns it today |
|---|---|---|
| git | Intentional state — committed code, branches, PRs, merge history | GitHub / GitLab / Forgejo |
| filesystem | Working state — uncommitted edits, dep caches, build artifacts, runtime env ("filesystem as memory"); durable + forkable as ZFS snapshots | e2b / Daytona / Modal |
| logs | Reasoning + execution state — prompt, thinking, tool calls, step trace, the ActiveGraph claims/evidence/decisions graph, lineage | LangSmith / agent frameworks |
An agent action you can actually reproduce, fork, and audit needs all three
coherently linked: this commit ⟷ this filesystem snapshot ⟷ this reasoning
trace. gitoku is the only system already on git that is also growing the
stateful sandbox (filesystem, see nixos-sandbox-fleet-plan.md)
and the event log (logs, this plan). The coherence is the product. The
sellable/forkable unit is the content-addressed tuple
(git ref, fs snapshot, trace subgraph) — i.e. a run/snapshot node in the graph.
Open-core split (maps 1:1 onto billing)
- OSS core: adoption, trust, self-hosting. The forge, the ActiveGraph graph/event model, the local-runner/nspawn sandbox backend, and open formats for all three legs (git, files, an open event-log schema). No-lock-in is the feature, not a concession: it is what the BYOS future demands and what enterprises require before adopting.
- Paid: operational hard parts + scale. "Own provisioning" = the nixos-anywhere/Hetzner sandbox fleet run as a managed service (flashing hosts, ZFS snapshot GC, warm pools, cross-host snapshot transport, KVM, egress firewall), which is pure toil nobody wants to self-run. Plus hosted control plane, snapshot storage/retention, sandbox-compute hours, and enterprise audit/SSO/governance.
The trinity is also the meter
| Leg | Pricing role |
|---|---|
| git | cheap / free → drives adoption |
| filesystem | storage-metered (GB-month of snapshots) — it genuinely costs us, so meter it |
| logs | enterprise / audit upsell (retention, search, propose/accept governance) — the logs are the compliance product |
| compute | sandbox-hours, usage-based |
Defensible without lock-in
Open formats mean switching cost ≈ 0, so we can't and shouldn't sell lock-in. We sell operations + coherence: the reproducible/forkable/auditable triple, and the managed fleet that produces it. Both are expensive to run and hard to replicate; the file formats are not.
Enterprise angle: provision the fleet into the customer's own cloud
account (their Hetzner/AWS, via nixos-anywhere). Single-tenant, data never
leaves, but gitoku operates it — the OSS-trust + managed-ops combination GitHub
can't match (closed) and sandbox startups can't match (no git/logs coherence).
Discipline / risk
The filesystem leg is where we both differentiate and bleed money. Keep git cheap, make the filesystem the metered resource, make logs the upsell — and resist becoming a general sandbox platform. BYOS the commodity case (e2b/Modal/Daytona behind the §5a backend interface); only operate the fleet for paying "managed provisioning" customers. That keeps us out of the commoditizing middle.
Positioning line: GitHub for agents — every action is a reproducible (code, environment, reasoning) triple you own in open formats and we operate at scale.