Gitwami

ActiveGraph continuity layer

Evolving the Gitwami agent architecture toward one typed graph, an append-only event log, and behaviors that react to change.

A plan to evolve gitoku's agent architecture from stateless, per-table job queues toward an ActiveGraph-style continuity layer: one evolving graph of typed nodes and edges, an append-only event log, behaviors that react to changes (instead of hardcoded triggers), forkable sandboxes for agent code execution, and the surrounding NixOS services imported as first-class context.

Status (2026-06-06): Phase 1 and Stages 2a, 2b, 2c, 4b (metadata side), 5a, 5b, and 5c have shipped. Prompt-context (Stage 0c Pillar 1) and the per-repo agent manifest plus fork-agent-run (Stages 0 and 3) shipped alongside them. The graph triple (graph_nodes/graph_edges/graph_events) is live, behaviors fire via the dispatcher, provenance dual-writes into the graph, semantic-memory injection at prompt render is on, and the trace-as-product Activity UI ships. Still unbuilt: the NixOS sandbox-host + microVM fleet (Stage 4a), the NixOS service inventory (§6), and the Stage 5d continuity capstones. See continuity-completion for the active stage-by-stage execution roadmap with commit hashes.


1. Motivation

Before this plan, an agent run was disposable: a fresh clone into an ephemeral workspace, destroyed on teardown, leaving only a commit and a summary string. There was no carried-over memory, no typed relationships between facts, and no generic event log. The ActiveGraph thesis is that the hard part of long-running agents is not the loop but the continuity layer around it. That layer is a shared, evolving state where tasks, claims, evidence, decisions, risks, failures, tools, and relations coexist. In it, events record what changed, behaviors react, and a trace explains how anything came to exist.

The good news: gitoku already had three of the four pillars in primitive form, so this plan generalizes them rather than rewriting them.


2. Current state (baseline)

| ActiveGraph pillar | Gitoku today | Location |
|---|---|---|
| World state | Flat relational domain tables, no typed edges | repositories, pull_requests, pull_request_reviews, issues |
| Behaviors (reaction) | Per-table IHP job queues, each hardwired to one trigger | prompt_pull_request_jobs (Schema.sql:774), pull_request_review_jobs (:810), pull_request_conflict_resolution_jobs, pull_request_diff_ai_response_jobs, pull_request_form_suggestion_jobs |
| Trace / lineage | Commit-keyed AI provenance + per-step run trace. Capture-side fidelity is heuristic-windowed, not full-thread. Stage 2d unifies this so each commit's stored context is the full delta of the agent session since the prior commit. | commit_ai_contexts (:746) via AiContextLookup.hs; workflow_runs (:1082) + workflow_run_steps (:1133); rgh capture in rgh/src/Rgh/Capture.hs |
| Unified graph + event log | Shipped (43b62840, Phase 1): graph_nodes / graph_edges / graph_events triple in Postgres, plus the Application.Graph writer and the Stage 2a provenance dual-write from commit_ai_contexts. | Application/Schema.sql graph tables; Application/Graph.hs; Application/GraphProvenance.hs |
| Carried-over agent memory | Shipped (5ac4719d, Stage 5b): recent memory-flavored graph nodes (claims / decisions / failures / evidence) project into every AI surface's prompt via Application.GraphMemory + Application.PromptContextBlock. Per-surface byte budgets cap the injection. | Application/GraphMemory.hs; Application/PromptContextBlock.hs; commit_ai_contexts projected by GraphProvenance |
| Prompt context layering | Unified end-to-end across all 5 AI surfaces (Stage 0c Pillar 1, shipped). Every prompt-PR / PR review / conflict resolution / diff-thread reply / form-suggestion run loads from Application.PromptContextBlock with per-surface byte budgets. UI visibility: prompt-PR card footer, per-PR AI activity panel on the conversation tab, in-comment footer on the AI diff-reply, create-form footer on the AI form suggestion. guidance_bytes / memory_bytes are persisted on every job row. | Application/PromptContextBlock.hs; Application/Jobs/*Run.hs (5 surfaces); Application/Jobs/PullRequestDiffAiResponse/Run.hs appendContextBytesFooter; Web/View/Repositories/Show.hs renderPullRequestSuggestionContextFooter; Web/View/PullRequests/Show.hs renderPullRequestAiContextSummary |
| LLM provider | Hardcoded per surface: Codex CLI for the three agentic surfaces (prompt-PR, review, conflict); OpenAI gpt-5-mini direct for the two HTTP surfaces (diff-reply, form-suggestion). Claude Code credentials exist (Application.ClaudeCodeCredentials) but have no consumer. No OpenRouter / Anthropic-direct path. Stage 0c Pillar 2 unifies this (CLI runner dispatch + chat-provider abstraction + tool-call harmonization). | Application/CodexCredentials.hs, Application/ClaudeCodeCredentials.hs, IHP.OpenAI call-sites in PullRequestDiffAiResponse/, PullRequestFormSuggestion/ |
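The memory and prompt-context rows above both rely on per-surface byte budgets. At its core, that is a packing problem: fit the most useful items into a fixed number of bytes. The Haskell sketch below shows one way to do it. It assumes newest-first recall and a greedy "skip it if it doesn't fit" rule. MemoryItem, utf8Length, and packWithinBudget are hypothetical names. The real SurfaceContextOptions in Application.PromptContextBlock may rank and truncate differently.

```haskell
import Data.Char (ord)
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Hypothetical view of a memory-flavored graph node (claim, decision, failure, evidence)
-- as the prompt renderer might see it.
data MemoryItem = MemoryItem
  { itemKind :: String -- node_type, e.g. "claim"
  , itemCreatedAt :: Int -- stand-in for created_at; larger means newer
  , itemText :: String -- the line rendered into the prompt
  }
  deriving (Show, Eq)

-- Size of a String once encoded as UTF-8, so the budget is in bytes, not characters.
utf8Length :: String -> Int
utf8Length = sum . map charBytes
  where
    charBytes c
      | n < 0x80 = 1
      | n < 0x800 = 2
      | n < 0x10000 = 3
      | otherwise = 4
      where
        n = ord c

-- Keep the newest items that fit in the byte budget. An item that does not fit is skipped
-- rather than ending the scan, so a shorter, older item can still be included.
-- Returns the kept items (newest first) and the bytes used (a memory_bytes-style figure).
packWithinBudget :: Int -> [MemoryItem] -> ([MemoryItem], Int)
packWithinBudget budget items = go budget (sortOn (Down . itemCreatedAt) items) []
  where
    go remaining [] kept = (reverse kept, budget - remaining)
    go remaining (x : xs) kept
      | cost <= remaining = go (remaining - cost) xs (x : kept)
      | otherwise = go remaining xs kept
      where
        cost = utf8Length (itemText x) + 1 -- +1 for the newline separating items
```

Counting UTF-8 bytes rather than characters matters because the budgets are expressed in bytes. A claim that quotes non-ASCII text costs more than its character count suggests.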

Sandbox model. Isolation is an enum dispatched by case, not an interface (IsolatedExecution.hs:91): Local (in-process), LocalRunner (child process, same host — what git.lazare.ai uses), AwsEcsTask, AwsEc2Vm. Inside the runner it's plain bash -lc, docker run, or a codex subprocess. Swapping a backend touches three sites: parseExecutionBackend, WorkflowRun.hs:84, PromptPullRequest/Run.hs:128. The payload contract (WorkflowVmPayload, PromptRunnerEnvironment) and the HTTP callback protocol are already backend-agnostic — the interface is latent but unnamed.


3. Target architecture

A JSONB node/edge/event triple in the existing Postgres (no graph DB), layered alongside the domain tables. Graph nodes are projections of those tables, not replacements for them.

graph_nodes  (id, repository_id, node_type, status, payload JSONB, created_at, updated_at)
             node_type ∈ {task, claim, evidence, decision, risk, failure, memory, tool, service, agent}
graph_edges  (id, src_node_id, dst_node_id, relation, payload JSONB, created_at)
             relation  ∈ {supports, contradicts, depends_on, derived_from, blocks, exposes, uses_secret}
graph_events (id, node_id, edge_id, event_type, actor, before JSONB, after JSONB, run_id, created_at)
             -- append-only; node_id/edge_id nullable, exactly one set
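The triple maps naturally onto Haskell types. In the sketch below, the node_type and relation vocabularies become closed enums. The graph_events rule that exactly one of node_id/edge_id is set becomes a sum type, checked once when a row is read. The names here are hypothetical and do not describe the shipped Application.Graph. In Postgres, the same rule could also be enforced with a CHECK (num_nonnulls(node_id, edge_id) = 1) constraint.

```haskell
import Data.Char (toLower)

newtype NodeId = NodeId Int deriving (Show, Eq, Ord)
newtype EdgeId = EdgeId Int deriving (Show, Eq, Ord)

-- Closed vocabularies mirroring the node_type and relation sets in the schema sketch.
data NodeType
  = Task | Claim | Evidence | Decision | Risk | Failure | Memory | Tool | Service | Agent
  deriving (Show, Eq, Enum, Bounded)

data Relation
  = Supports | Contradicts | DependsOn | DerivedFrom | Blocks | Exposes | UsesSecret
  deriving (Show, Eq, Enum, Bounded)

-- Text stored in graph_nodes.node_type (every constructor is a single word).
nodeTypeText :: NodeType -> String
nodeTypeText = map toLower . show

-- Text stored in graph_edges.relation.
relationText :: Relation -> String
relationText r = case r of
  Supports -> "supports"
  Contradicts -> "contradicts"
  DependsOn -> "depends_on"
  DerivedFrom -> "derived_from"
  Blocks -> "blocks"
  Exposes -> "exposes"
  UsesSecret -> "uses_secret"

-- Parse column text back, rejecting anything outside the vocabulary.
parseNodeType :: String -> Maybe NodeType
parseNodeType t = lookup t [(nodeTypeText n, n) | n <- [minBound .. maxBound]]

parseRelation :: String -> Maybe Relation
parseRelation t = lookup t [(relationText r, r) | r <- [minBound .. maxBound]]

-- A graph_events row concerns exactly one node or exactly one edge; the sum type makes
-- "exactly one set" hold by construction instead of by convention.
data EventSubject = OnNode NodeId | OnEdge EdgeId
  deriving (Show, Eq)

-- Validate the nullable (node_id, edge_id) pair when reading a row.
subjectFromColumns :: Maybe Int -> Maybe Int -> Either String EventSubject
subjectFromColumns (Just n) Nothing = Right (OnNode (NodeId n))
subjectFromColumns Nothing (Just e) = Right (OnEdge (EdgeId e))
subjectFromColumns Nothing Nothing = Left "graph_events row has neither node_id nor edge_id"
subjectFromColumns (Just _) (Just _) = Left "graph_events row has both node_id and edge_id"

-- The column pair to write on insert.
subjectColumns :: EventSubject -> (Maybe Int, Maybe Int)
subjectColumns (OnNode (NodeId n)) = (Just n, Nothing)
subjectColumns (OnEdge (EdgeId e)) = (Nothing, Just e)
```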

Principles:

  • Postgres + JSONB, not Neo4j — right weight for an IHP app.
  • Domain tables stay; nodes are projections/backfills, so nothing relational is lost.
  • The event log is the product: any node's lineage reconstructs by traversal.
  • "Next step emerges from what changed," not from a hardcoded DAG.

4. Phased plan

Phase 1 — Graph alongside, backfilled from projections

Goal: introduce the triple without disrupting anything.

  • Add graph_nodes, graph_edges, graph_events via a new forward-only migration (Application/Migration/<ts>.sql) and update Application/Schema.sql. Make the migration idempotent (IF NOT EXISTS).
  • New module Application/Graph.hs: node/edge/event types + insert/query helpers (use sqlQueryTyped/sqlExecTyped, per repo rule — never sqlQuery/sqlExec).
  • Backfill mapping: PR → task node; review comment → claim node; each commit_ai_contexts row → evidence node + derived_from edge to the commit/task.
  • Read path: a "lineage" view that traverses edges for a node ("why does this exist?").
  • Done when: an existing PR renders its provenance as a graph traversal with no behavior change.
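The lineage read path is a graph walk. Here is a minimal sketch. It assumes edges point from the derived node to its source (evidence derived_from task) and from the supporter to the supported node (evidence supports claim). Under that assumption, answering "why does this exist?" means following outgoing derived_from edges and incoming supports edges, breadth-first. A visited list ensures that cycles terminate. Edge, LineageStep, explanations, and lineage are illustrative names. In gitoku, the edge list would come from graph_edges rows.

```haskell
import Data.List (nubBy)

data Relation = Supports | Contradicts | DependsOn | DerivedFrom
  deriving (Show, Eq)

-- src --relation--> dst, e.g. evidence --derived_from--> task, evidence --supports--> claim.
data Edge = Edge {edgeSrc :: Int, edgeDst :: Int, edgeRelation :: Relation}
  deriving (Show, Eq)

data LineageStep = LineageStep
  { stepDepth :: Int -- hops from the node being explained
  , stepNode :: Int -- the ancestor reached
  , stepVia :: Relation -- the edge kind that led there
  }
  deriving (Show, Eq)

-- Direct answers to "why does n exist?": what n was derived from, and what supports n.
explanations :: [Edge] -> Int -> [(Int, Relation)]
explanations edges n =
  [(edgeDst e, DerivedFrom) | e <- edges, edgeRelation e == DerivedFrom, edgeSrc e == n]
    ++ [(edgeSrc e, Supports) | e <- edges, edgeRelation e == Supports, edgeDst e == n]

-- Breadth-first walk from the root; the visited list stops cycles and repeated ancestors.
lineage :: [Edge] -> Int -> [LineageStep]
lineage edges root = go [(root, 0)] [root]
  where
    go [] _ = []
    go ((n, depth) : queue) visited =
      let fresh =
            nubBy (\a b -> fst a == fst b)
              [(p, rel) | (p, rel) <- explanations edges n, p `notElem` visited]
          steps = [LineageStep (depth + 1) p rel | (p, rel) <- fresh]
          queue' = queue ++ [(p, depth + 1) | (p, _) <- fresh]
       in steps ++ go queue' (visited ++ map fst fresh)
```

For a decision derived from a review-comment claim, the walk reports the claim at depth 1. At depth 2, it reports the PR task and any evidence supporting the claim. That is the "why does this exist?" answer the Phase 1 view needs.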

Phase 2 — Event-driven behaviors (highest leverage)

Goal: replace bespoke per-table triggering with one dispatcher over events.

  • Define a Behavior registry: predicate (event_type, node predicate) → action. Start as a record-of-functions in Application/Behaviors.hs.
  • One dispatcher job reads new graph_events and fans out to matching behaviors (reuse the IHP job runner as the execution substrate).
  • Re-express existing jobs as behaviors:
    • claim with no supports edge → spawn research task
    • two contradicts-linked claims → spawn review
    • depends_on satisfied → unblock dependent task
    • PR opened → existing review / diff / form-suggestion behaviors
  • Register per-repo agents as behaviors: the events/branches declarations of an [agent.*] manifest (see developer-features §5) become a behavior predicate whose action invokes that named agent. This is the roadmap's "AI router dispatching to agents," and it reuses workflowMatchesEvent for matching. Per-agent instructions (gitoku.toml-wired, opt-in) compose with ambient repo guidance (Stage 0b: every branch's AGENTS.md / CLAUDE.md / *.md, opt-out). Both layers feed the same renderCodexPrompt seam, ambient first, per-agent second.
  • Keep the old *_jobs tables as the execution mechanism; only the triggering moves to events.
  • Done when: PR review fires from a graph event, not a direct enqueue, with identical output.
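The registry and dispatcher can be sketched as a record of functions plus one pure fan-out pass. Everything here is hypothetical, including the event shape and the "node_created"/"status_changed" event names. The same goes for the Action constructors: in gitoku, an action would enqueue a row into the existing *_jobs tables rather than run anything directly. Plain string equality stands in for workflowMatchesEvent.

```haskell
import Data.List (sortOn)

-- Hypothetical, simplified view of a graph_events row.
data GraphEvent = GraphEvent
  { eventId :: Int
  , eventType :: String -- e.g. "node_created", "status_changed"
  , eventNodeType :: Maybe String -- node_type of the subject, when it is a node
  , eventNodeRef :: Int -- id of the node the event concerns
  }
  deriving (Show, Eq)

-- Hypothetical actions; each would enqueue into an existing *_jobs table.
data Action
  = EnqueuePullRequestReview Int
  | InvokeAgent String Int
  deriving (Show, Eq)

-- A behavior: a predicate over events plus the reaction to produce when it matches.
data Behavior = Behavior
  { behaviorName :: String
  , matches :: GraphEvent -> Bool
  , react :: GraphEvent -> [Action]
  }

-- "PR opened" modeled as a task node being created (PR -> task in the Phase 1 backfill).
prOpenedReview :: Behavior
prOpenedReview =
  Behavior
    { behaviorName = "pr-opened-review"
    , matches = \e -> eventType e == "node_created" && eventNodeType e == Just "task"
    , react = \e -> [EnqueuePullRequestReview (eventNodeRef e)]
    }

-- An [agent.*] manifest entry turned into a behavior over its declared events.
agentBehavior :: String -> [String] -> Behavior
agentBehavior name subscribed =
  Behavior
    { behaviorName = "agent:" ++ name
    , matches = \e -> eventType e `elem` subscribed
    , react = \e -> [InvokeAgent name (eventNodeRef e)]
    }

-- One dispatcher pass: take events after the cursor in id order, fan out to every matching
-- behavior, and return the advanced cursor plus the (behavior, action) pairs to enqueue.
dispatchSince :: Int -> [Behavior] -> [GraphEvent] -> (Int, [(String, Action)])
dispatchSince cursor behaviors events = (newCursor, fired)
  where
    pending = sortOn eventId (filter ((> cursor) . eventId) events)
    newCursor = if null pending then cursor else eventId (last pending)
    fired = [(behaviorName b, a) | e <- pending, b <- behaviors, matches b e, a <- react b e]
```

Keeping dispatchSince pure (events in, actions and an advanced cursor out) leaves execution and retries to the IHP job runner. That matches the rule above: only the triggering moves to events.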

Phase 3 — Patches: propose vs. accept

Goal: risky mutations stay proposed until approved.

  • Add a proposed/accepted status to graph mutations; reuse pull_request_reviews / review-governance for approval.
  • Risky memory/decision writes land as proposed → approval queue; accept promotes them.
  • Done when: a memory-write behavior produces a proposed node a human can approve/reject.
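Propose/accept is a small state machine. This sketch rests on three illustrative policy choices: memory and decision writes are the risky ones, only proposed mutations can be decided, and prompt-time recall ignores anything not yet accepted. MutationStatus, ReviewDecision, and decide are hypothetical names. In the plan, each transition would itself be recorded as a graph_events row.

```haskell
data MutationStatus = Proposed | Accepted | Rejected
  deriving (Show, Eq)

data ReviewDecision = Approve | Reject
  deriving (Show, Eq)

-- Hypothetical policy: which node types count as risky writes that need approval.
isRiskyNodeType :: String -> Bool
isRiskyNodeType nodeType = nodeType `elem` ["memory", "decision"]

-- The status a behavior-authored write starts in.
initialStatus :: String -> MutationStatus
initialStatus nodeType
  | isRiskyNodeType nodeType = Proposed
  | otherwise = Accepted

-- Only proposed mutations can be decided, so a stale approval cannot flip a settled node.
decide :: ReviewDecision -> MutationStatus -> Either String MutationStatus
decide Approve Proposed = Right Accepted
decide Reject Proposed = Right Rejected
decide d s = Left ("cannot " ++ show d ++ " a mutation that is already " ++ show s)

-- Assumed rule: prompt-time memory recall only reads accepted nodes.
visibleToRecall :: MutationStatus -> Bool
visibleToRecall = (== Accepted)
```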

Phase 4 — Trace as product

Goal: unify the lineage fragments.

  • Fold commit_ai_contexts + workflow_run_steps writes into graph_events (dual-write first, then read from events).
  • Reconstruct any node's full history: which behavior created it, what evidence supports it, what changed after.
  • Done when: a PR's "how did this conclusion form" view is built purely from the event log.
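Each graph_events row carries both before and after, so a node's history can be replayed rather than stored separately. The sketch below uses hypothetical types, with payloads as plain strings standing in for JSONB. It folds a node's events in id order and reports the first event whose before does not match the state built so far. It answers "which behavior created it" from the first event's actor.

```haskell
import Data.List (sortOn)

-- Hypothetical, simplified graph_events row for one node.
data NodeEvent = NodeEvent
  { neId :: Int
  , neType :: String
  , neActor :: String -- e.g. "behavior:memory-write", "user:alice"
  , neBefore :: Maybe String -- JSONB stand-in; Nothing = node did not exist yet
  , neAfter :: Maybe String -- Nothing = node deleted
  }
  deriving (Show, Eq)

-- Replay in id order. Left is the first event whose `before` disagrees with the state
-- produced by the events preceding it, i.e. a gap or inconsistency in the log.
replay :: [NodeEvent] -> Either NodeEvent (Maybe String)
replay = go Nothing . sortOn neId
  where
    go state [] = Right state
    go state (e : rest)
      | neBefore e == state = go (neAfter e) rest
      | otherwise = Left e

-- Who created the node: the actor of the earliest event, if that event created it.
createdBy :: [NodeEvent] -> Maybe String
createdBy events = case sortOn neId events of
  (e : _) | neBefore e == Nothing -> Just (neActor e)
  _ -> Nothing

-- One line per change, oldest first, for a "how did this form" view.
history :: [NodeEvent] -> [String]
history = map line . sortOn neId
  where
    line e = "#" ++ show (neId e) ++ " " ++ neType e ++ " by " ++ neActor e
```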

5. Sandboxes for agent code execution

Sandboxing is what unlocks the self-improvement loop (5c). It takes two changes:

5a. Name the backend interface

  • Turn the IsolatedExecutionBackend enum into a record-of-functions / typeclass: provision → injectPayload → run → collect → teardown. The interface is already latent in the backend-agnostic payload + callback protocol.
  • Collapse the three dispatch sites (parseExecutionBackend, WorkflowRun.hs:84, PromptPullRequest/Run.hs:128) onto the interface.
  • Injection point: runLocalRunnerProcess / localRunnerProcessCommand (IsolatedExecution.hs:683/1146).
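Here is how a record-of-functions version of the backend might look. The field names mirror the provision → injectPayload → run → collect → teardown lifecycle above. The handle type h, RunResult, runInSandbox, and recordingBackend are assumptions for illustration. bracket from Control.Exception runs teardown even if injection or the run throws, so each backend only has to implement its own steps.

```haskell
import Control.Exception (bracket)
import Data.IORef

data RunResult = RunResult {exitCode :: Int, output :: String}
  deriving (Show, Eq)

-- Hypothetical interface replacing the enum. `h` is the backend's own handle type
-- (a child process, an ECS task, an nspawn machine name, ...).
data SandboxBackend h payload = SandboxBackend
  { backendName :: String
  , provision :: IO h
  , injectPayload :: h -> payload -> IO ()
  , run :: h -> IO Int
  , collect :: h -> Int -> IO RunResult
  , teardown :: h -> IO ()
  }

-- The single entry point every dispatch site would call instead of casing on the enum.
runInSandbox :: SandboxBackend h payload -> payload -> IO RunResult
runInSandbox b payload =
  bracket (provision b) (teardown b) $ \h -> do
    injectPayload b h payload
    code <- run b h
    collect b h code

-- In-memory stand-in backend that records each lifecycle step it performs.
recordingBackend :: IORef [String] -> SandboxBackend Int String
recordingBackend logRef =
  SandboxBackend
    { backendName = "recording"
    , provision = note "provision" >> pure 1
    , injectPayload = \_ p -> note ("inject " ++ p)
    , run = \_ -> note "run" >> pure 0
    , collect = \_ code -> note "collect" >> pure (RunResult code "ok")
    , teardown = \_ -> note "teardown"
    }
  where
    note step = modifyIORef logRef (++ [step])
```

The recording backend stands in for Local or LocalRunner. An nspawn or ECS backend would supply its own handle type and steps, while every caller keeps using runInSandbox.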

5b. NixOS-native forkable backend

  • Add a systemd-nspawn (or microvm.nix/firecracker) backend where the sandbox image is a Nix derivation — reproducible toolchain, single-host, no AWS.
  • Forkability: a fork = a fresh nspawn from the same snapshot.

5c. Self-modification loop (depends on Phases 2–4 + 5b)

  1. Run a behavior in sandbox A; record the trace as events.
  2. Propose a prompt/rule patch as a proposed graph node.
  3. Fork sandbox B with the patch; run the same task.
  4. Diff the two trace subgraphs — reuse gitoku's existing PR-diff machinery.
  5. Accept the winning node to promote the change.

"Self-modification with lineage" falls out of graph + fork + diff.

Note: the ambient repo guidance files (Stage 0b — AGENTS.md, CLAUDE.md, .gitoku/instructions.md, etc.) are committed to the branch, so the most natural patches the agent can propose are edits to its own ambient guidance. A patch to AGENTS.md is just a normal commit — the fork on the patched branch automatically renders the new guidance into the next prompt, and the trace-diff measures whether it helped. No new wiring needed beyond what 0b + 5c already provide.
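Steps 4 and 5 of the loop need a comparison and a decision rule. The sketch below compares a baseline trace and a patched trace step by step. It prefers the patched run only if the patched run strictly improves a score: fewer failed steps first, then fewer steps. The step model, StepChange, and the scoring rule are all hypothetical. The plan itself proposes reusing gitoku's PR-diff machinery, which would diff rendered traces as text rather than compare step records.

```haskell
data StepOutcome = Passed | Failed
  deriving (Show, Eq)

data TraceStep = TraceStep {stepName :: String, stepOutcome :: StepOutcome}
  deriving (Show, Eq)

data StepChange
  = OnlyInBaseline String
  | OnlyInPatched String
  | OutcomeChanged String StepOutcome StepOutcome
  deriving (Show, Eq)

-- Step-level comparison keyed by step name (the first occurrence wins on duplicates).
compareTraces :: [TraceStep] -> [TraceStep] -> [StepChange]
compareTraces baseline patched =
  [OnlyInBaseline (stepName s) | s <- baseline, stepName s `notElem` map stepName patched]
    ++ [OnlyInPatched (stepName s) | s <- patched, stepName s `notElem` map stepName baseline]
    ++ [ OutcomeChanged (stepName s) (stepOutcome s) o
       | s <- baseline
       , Just o <- [lookup (stepName s) patchedOutcomes]
       , o /= stepOutcome s
       ]
  where
    patchedOutcomes = [(stepName s, stepOutcome s) | s <- patched]

data Winner = KeepBaseline | AcceptPatch
  deriving (Show, Eq)

-- Hypothetical rule: the patch must strictly improve (failures, then step count); ties keep A.
pickWinner :: [TraceStep] -> [TraceStep] -> Winner
pickWinner baseline patched
  | score patched < score baseline = AcceptPatch
  | otherwise = KeepBaseline
  where
    score trace = (length (filter ((== Failed) . stepOutcome) trace), length trace)
```

Because ties keep the baseline, a patch to AGENTS.md has to show a measurable improvement before its proposed node is worth accepting.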


6. NixOS services as context

The services in ~/nixos-config are richly described but have no generated inventory. Metadata lives in modules/ports.nix (~48 ports), modules/resource-budget.nix (tiers/slices), per-service modules (sops secrets, nginx vhost, deps), and the configuration.nix import list.

  • Harvest (static): nix eval the config → JSON {name, port, tier, user, secret-refs, vhost, deps, ExecStart}.
  • Harvest (runtime): systemctl --user list-units --output=json + MCP /health and /mcp probes for live status.
  • Model: each service → service node with depends_on (postgres, redis), exposes (port/vhost), uses_secret edges. The ~13 MCP servers → tool nodes (they are the agent's callable tools — cleanest fit).
  • Generator: one more NixOS timer service writes the inventory into graph_nodes/graph_edges on a schedule.
  • Staleness: a unit going inactive emits an event → behaviors mark dependent nodes stale (the article's "stale source → stale memo" pattern, applied to infra).
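The static harvest and the staleness rule can be sketched together. ServiceInfo is a hypothetical, trimmed version of the JSON the nix eval step would emit. serviceEdges turns it into depends_on / exposes / uses_secret triples. staleAfter finds every service that transitively depends on a unit that went inactive. Port and secret targets are encoded as "port:N" and "secret:NAME" strings because the node_type vocabulary does not yet name them. That encoding is an assumption of this sketch.

```haskell
import Data.List (nub)

-- Hypothetical, trimmed shape of one service in the harvested inventory.
data ServiceInfo = ServiceInfo
  { svcName :: String
  , svcPort :: Maybe Int
  , svcDeps :: [String]
  , svcSecrets :: [String]
  }
  deriving (Show, Eq)

data Triple = Triple {tripleSrc :: String, tripleRelation :: String, tripleDst :: String}
  deriving (Show, Eq)

-- Project one service into graph edges.
serviceEdges :: ServiceInfo -> [Triple]
serviceEdges s =
  [Triple (svcName s) "depends_on" dep | dep <- svcDeps s]
    ++ [Triple (svcName s) "exposes" ("port:" ++ show p) | Just p <- [svcPort s]]
    ++ [Triple (svcName s) "uses_secret" ("secret:" ++ ref) | ref <- svcSecrets s]

-- Breadth-first over reversed depends_on edges: everything downstream of the
-- inactive unit, in discovery order, is marked stale.
staleAfter :: [Triple] -> String -> [String]
staleAfter edges inactive = go [inactive] []
  where
    dependents n = nub [tripleSrc e | e <- edges, tripleRelation e == "depends_on", tripleDst e == n]
    go [] seen = reverse seen
    go (n : queue) seen =
      let new = [d | d <- dependents n, d `notElem` seen, d /= inactive]
       in go (queue ++ new) (reverse new ++ seen)
```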

7. Risks & non-goals

  • Non-goal: a graph database. Postgres + JSONB is the right weight; Neo4j is overkill for an IHP app.
  • Non-goal: rip-and-replace. Domain tables and the IHP job runner stay; we generalize triggering, not execution.
  • Carried-over agent memory: originally framed as the biggest unbuilt concept and sequenced last, it shipped at Stage 5b (5ac4719d) on top of the Phase 1 graph and the Stage 2a provenance dual-write. Recall is now a tunable per-surface knob (Application.PromptContextBlock's SurfaceContextOptions) rather than a missing capability. The next memory-side work is recall quality (Stage 5d capstones), not existence.
  • Migration discipline: forward-only, idempotent migrations; never edit an applied revision (per AGENTS.md migration rules).
  • Don't over-trigger. Event-driven behaviors can cascade (cf. roadmap: "cascading loop"). Add per-behavior concurrency/budget guards (the RepositoryAiContextBudget pattern already exists) before enabling autonomous chains.
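A per-behavior guard can combine a rate budget with a cap on cascade depth. This sketch makes two assumptions. First, each event records how many behavior hops produced it (0 for a human or external trigger). Second, the dispatcher tracks recent fire times per behavior. Guard, Verdict, and checkGuard are hypothetical and are not modeled on the actual RepositoryAiContextBudget implementation.

```haskell
data Guard = Guard
  { maxFiresPerWindow :: Int
  , windowSeconds :: Int
  , maxCascadeDepth :: Int
  }
  deriving (Show, Eq)

data Verdict = Allowed | DeniedCascadeDepth | DeniedBudget
  deriving (Show, Eq)

-- `now` and `fireTimes` are epoch seconds; `depth` is the triggering event's hop count.
-- The depth cap stops runaway chains; the window budget stops one behavior flooding a repo.
checkGuard :: Guard -> Int -> [Int] -> Int -> Verdict
checkGuard g now fireTimes depth
  | depth >= maxCascadeDepth g = DeniedCascadeDepth
  | recentFires >= maxFiresPerWindow g = DeniedBudget
  | otherwise = Allowed
  where
    recentFires = length [t | t <- fireTimes, t > now - windowSeconds g, t <= now]
```

Returning a Verdict rather than a Bool lets the dispatcher log why a behavior was held back, which keeps suppressed cascades visible in the trace.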

8. Sequencing (status snapshot)

The original plan recommended this order; the actual delivery has tracked it closely. Statuses below reflect 2026-06-06.

  1. Phase 1 (graph triple + backfill) — ✅ shipped (43b62840).
  2. Phase 5a (SandboxBackend interface) — ✅ shipped (43b62840) alongside Phase 1.
  3. Phase 2 (event-driven behaviors) — ✅ shipped across Stage 2a (af443555, provenance dual-write), 2b (1367e19b, behavior registry/dispatcher), 2c (84db920e, agents-as-behaviors).
  4. Phase 6 (NixOS service nodes) — 📋 planned, not started.
  5. Phase 5b (NixOS sandbox backend / Stage 4a microVM fleet) — 📋 deferred to a dedicated multi-week infra push. Stage 4b metadata side shipped (d2b036e8); ZFS bytes await 4a.
  6. Phase 3 + 4 (propose/accept + trace-as-product) — ✅ shipped at Stage 5c (56686c02, propose/accept) and Stage 5a (079634c1, Activity tab trace UI).
  7. Phase 5c (self-improvement loop / Stage 5d capstones) — 📋 next on the spine.

The original plan judged that Phases 1 and 5a carry ~80% of the value and could start in parallel; in practice they shipped together (43b62840).


9. Business model & positioning

The "holy trinity": git + filesystem + logs

Every agent action gitoku runs produces three kinds of state. Today each is owned by a different vendor category, and nobody binds them coherently:

| Leg | What it is | Who owns it today |
|---|---|---|
| git | Intentional state: committed code, branches, PRs, merge history | GitHub / GitLab / Forgejo |
| filesystem | Working state: uncommitted edits, dep caches, build artifacts, runtime env ("filesystem as memory"); durable + forkable as ZFS snapshots | e2b / Daytona / Modal |
| logs | Reasoning + execution state: prompt, thinking, tool calls, step trace, the ActiveGraph claims/evidence/decisions graph, lineage | LangSmith / agent frameworks |

An agent action you can actually reproduce, fork, and audit needs all three coherently linked: this commit ⟷ this filesystem snapshot ⟷ this reasoning trace. gitoku is the only system already on git that is also growing the stateful sandbox (filesystem, see nixos-sandbox-fleet-plan.md) and the event log (logs, this plan). The coherence is the product. The sellable/forkable unit is the content-addressed tuple (git ref, fs snapshot, trace subgraph) — i.e. a run/snapshot node in the graph.

Open-core split (maps 1:1 onto billing)

  • OSS core — adoption, trust, self-hosting. The forge, the ActiveGraph graph/event model, the local-runner/nspawn sandbox backend, and open formats for all three legs (git, files, an open event-log schema). No-lock-in is the feature, not a concession — it is what the BYOS future demands and what enterprises require before adopting.
  • Paid — operational hard parts + scale. "Own provisioning" = the nixos-anywhere/Hetzner sandbox fleet run as a managed service (flashing hosts, ZFS snapshot GC, warm pools, cross-host snapshot transport, KVM, egress firewall) — pure toil nobody wants to self-run. Plus hosted control plane, snapshot storage/retention, sandbox-compute hours, and enterprise audit/SSO/governance.

The trinity is also the meter

| Leg | Pricing role |
|---|---|
| git | cheap / free → drives adoption |
| filesystem | storage-metered (GB-month of snapshots); it genuinely costs us, so meter it |
| logs | enterprise / audit upsell (retention, search, propose/accept governance); the logs are the compliance product |
| compute | sandbox-hours, usage-based |

Defensible without lock-in

Open formats mean switching cost ≈ 0, so we can't and shouldn't sell lock-in. We sell operations + coherence: the reproducible/forkable/auditable triple, and the managed fleet that produces it. Both are expensive to run and hard to replicate; the file formats are not.

Enterprise angle: provision the fleet into the customer's own cloud account (their Hetzner/AWS, via nixos-anywhere). Single-tenant, data never leaves, but gitoku operates it — the OSS-trust + managed-ops combination GitHub can't match (closed) and sandbox startups can't match (no git/logs coherence).

Discipline / risk

The filesystem leg is where we both differentiate and bleed money. Keep git cheap, make the filesystem the metered resource, make logs the upsell — and resist becoming a general sandbox platform. BYOS the commodity case (e2b/Modal/Daytona behind the §5a backend interface); only operate the fleet for paying "managed provisioning" customers. That keeps us out of the commoditizing middle.

Positioning line: GitHub for agents — every action is a reproducible (code, environment, reasoning) triple you own in open formats and we operate at scale.