The most common architecture mistake in agentic SOC is to build the agent first and the data layer second. The order matters. The substrate the agent reasons against decides everything downstream — the latency budget, the false-positive rate, the tractable scope of an autonomous action, and whether the analyst's audit log makes sense after the fact. Get the substrate wrong and you spend the next eighteen months tuning prompts to compensate.
This post is about the choice we made early in AiSOC: a Neo4j-backed
graph materialised at ingest time, written from the Go ingester at
services/ingest/internal/graph/, with a fixed 17-label, 14-edge
schema gated in CI. Not a graph query layer over a row store. Not an
on-demand graph projection. The graph is the canonical model; the row
stores live alongside it for the things relational stores are good at
(time-series rollups, full-text grep, audit-log immutability).
I'll work through why graph-at-ingest is the right call for an agentic loop, what the schema actually looks like, the substrate-level numbers we measure today, and where this architecture pushes back on us.
The row-store SIEM as an LLM substrate is a poor fit
Most of the SIEMs an agent will inherit are row-store warehouses. They optimise for two access patterns: append fast, scan ranges fast. Both are great for analyst workflows that look like "give me every event from this host between 02:00 and 02:15". Both are terrible for the access pattern an LLM agent actually needs.
The agent's access pattern, when you watch one drive a real investigation, is graph traversal disguised as a sequence of questions:
- Start at an alert (Alert{id: a-7102}).
- Walk to the targeted asset ((Alert)-[:ASSERTS]->(Host{id: h-44})).
- Walk to the identity that touched it ((Host)<-[:LOGGED_IN_FROM]-(Identity)).
- Walk to other assets that identity touched in the last 30 minutes.
- Walk to the IOCs observed on those assets.
- Walk to the playbook history for any IOC seen before.
A row-store SIEM answers each of those steps with a fresh query. Six steps, six queries, six round trips. Every round trip costs the agent budget on two axes: the latency of the query itself, and the context budget of the LLM that has to read the result and decide what to ask next.
In the agent's view of the world, the cheapest unit of evidence is a materialised neighbourhood, not a query. If the model can ask "give me the 2-hop neighbourhood around this host, filtered to the last 30 minutes" and get back a small typed subgraph, the cost of an investigation drops by an order of magnitude relative to the query-by-query approach. That's why graph-at-ingest matters.
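To make the "materialised neighbourhood" idea concrete, here is a minimal in-memory sketch of a time-filtered 2-hop neighbourhood. It is not the AiSOC implementation: the `Edge` type, the `neighbourhood` function, the node-ID strings, and the `OBSERVED` relationship are all assumptions made up for illustration; only `ASSERTS` and `LOGGED_IN_FROM` come from the traversal described above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str  # node id, e.g. "Host:h-44" (hypothetical ID format)
    rel: str  # relationship type
    dst: str
    ts: int   # event time, epoch seconds


def neighbourhood(edges, start, max_hops, since_ts):
    """Return (nodes, edges) reachable within max_hops of start.

    Direction is ignored; edges older than since_ts are dropped first.
    """
    adjacent = {}
    for e in edges:
        if e.ts >= since_ts:
            adjacent.setdefault(e.src, []).append(e)
            adjacent.setdefault(e.dst, []).append(e)

    depth = {start: 0}
    kept = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand past the hop limit
        for e in adjacent.get(node, []):
            kept.add(e)
            other = e.dst if e.src == node else e.src
            if other not in depth:
                depth[other] = depth[node] + 1
                queue.append(other)
    return set(depth), kept


NOW = 1_700_000_000
EDGES = [
    Edge("Alert:a-7102", "ASSERTS", "Host:h-44", NOW - 60),
    Edge("Identity:u-9", "LOGGED_IN_FROM", "Host:h-44", NOW - 300),
    Edge("Identity:u-9", "LOGGED_IN_FROM", "Host:h-51", NOW - 900),
    Edge("Identity:u-9", "LOGGED_IN_FROM", "Host:h-12", NOW - 7200),  # outside window
    Edge("Host:h-51", "OBSERVED", "IOC:ip-1", NOW - 100),  # three hops out
]

nodes, kept = neighbourhood(EDGES, "Host:h-44", max_hops=2, since_ts=NOW - 1800)
```

With a 30-minute window, `nodes` contains the alert, the identity, and the second host, but neither the stale login to `h-12` nor the IOC three hops away. The agent receives one small typed subgraph instead of six query results.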
What "graph at ingest" actually means
The temptation, when you read "graph for SOC", is to bolt a graph projection onto the existing pipeline: keep the row store as the source of truth, run a streaming job that builds the graph on a delay, serve the agent from the graph. We tried it. It doesn't work, for two reasons:
- The lag kills the loop. Streaming projections lag the row store by seconds-to-minutes. The agent's investigation is sub-minute. By the time the projection catches up, the alert is closed.
- The schema drifts. Two write paths means two opinions on what the right node label is, what the right relationship type is, what the right property name is. Schema drift between the row store and the graph projection is a guaranteed source of agent hallucinations: the agent sees the projection's view, the analyst sees the row store's view, and the two diverge.
So we picked the other answer: the ingester writes the graph
directly, in the same transaction that writes the row record. The
extractor in services/ingest/internal/graph/extractor.go reads the
incoming OCSF event, projects it to a (node, edge) set, and the
writer in services/ingest/internal/graph/writer.go upserts it into
Neo4j with a single MERGE per node and per relationship. The
underlying row store sees the same event in the same transaction.
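As a rough illustration of the extractor-then-writer split, the sketch below projects a simplified event into a (node, edge) set and renders one parameterised MERGE per node and per relationship. It is written in Python for brevity and does not reproduce extractor.go or writer.go. The event field names (`alert_id`, `host_id`, `event_id`, `ts`) are hypothetical flattenings rather than real OCSF attribute paths, and `extract`, `to_cypher`, and the allow-lists are names invented here.

```python
ALLOWED_LABELS = {"Alert", "Host"}  # stand-in for the schema's label enum
ALLOWED_RELS = {"ASSERTS"}          # stand-in for the relationship enum


def extract(event):
    """Project a simplified event dict into (nodes, edges)."""
    alert = ("Alert", event["alert_id"])
    host = ("Host", event["host_id"])
    props = {"ts": event["ts"], "source_event_id": event["event_id"]}
    return [alert, host], [(alert, "ASSERTS", host, props)]


def to_cypher(nodes, edges):
    """Render one parameterised MERGE per node and per relationship."""
    statements = []
    for label, node_id in nodes:
        if label not in ALLOWED_LABELS:
            raise ValueError(f"label not in schema: {label}")
        statements.append((f"MERGE (n:{label} {{id: $id}})", {"id": node_id}))
    for (src_label, src_id), rel, (dst_label, dst_id), props in edges:
        if rel not in ALLOWED_RELS:
            raise ValueError(f"relationship not in schema: {rel}")
        query = (
            f"MATCH (a:{src_label} {{id: $src}}), (b:{dst_label} {{id: $dst}}) "
            f"MERGE (a)-[r:{rel} {{source_event_id: $source_event_id}}]->(b) "
            "SET r.ts = $ts"
        )
        statements.append((query, {"src": src_id, "dst": dst_id, **props}))
    return statements


event = {"event_id": "e-1", "alert_id": "a-7102", "host_id": "h-44", "ts": 1_700_000_000}
statements = to_cypher(*extract(event))
```

Labels and relationship types are interpolated into the query text here, which is why the sketch checks them against a fixed allow-list first; property values always travel as parameters. Because MERGE is an upsert, replaying the same event yields the same nodes and the same edge.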
There is exactly one source of truth for the schema —
schemas/graph-schema.yaml, mirrored to a Go enum in
services/ingest/internal/graph/schema.go, and gated in CI by
scripts/export_graph_schema.py --check. A schema PR that touches
one without the other fails the build.
That last point — the CI drift gate — is the part most teams skip. Without it, "graph at ingest" decays back into "graph projection" within a quarter, because every drift bug looks like a one-off until the fifth one in a row.
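A drift gate of this kind can be very small. The sketch below is not scripts/export_graph_schema.py; it only shows the shape of the check under stated assumptions: the schema is represented as an already-parsed dict, the Go enum is assumed to be a block of `Label = "..."` constants, and `go_labels` and `drift` are hypothetical helper names.

```python
import re

# Assumed shape of the Go enum; the real schema.go may look different.
GO_SOURCE = '''
package graph

const (
    LabelAlert    Label = "Alert"
    LabelHost     Label = "Host"
    LabelIdentity Label = "Identity"
)
'''

# Stand-in for the parsed contents of the YAML schema file.
SCHEMA = {"version": "v1.0", "labels": ["Alert", "Host", "Identity", "Resource"]}


def go_labels(source):
    """Collect the string values of every `Label = "..."` constant."""
    return set(re.findall(r'Label\s*=\s*"([^"]+)"', source))


def drift(schema, source):
    """Return (labels missing from Go, labels missing from the schema)."""
    declared = set(schema["labels"])
    mirrored = go_labels(source)
    return sorted(declared - mirrored), sorted(mirrored - declared)


missing_in_go, missing_in_schema = drift(SCHEMA, GO_SOURCE)
exit_code = 1 if (missing_in_go or missing_in_schema) else 0  # non-zero fails CI
```

Here the schema declares `Resource` but the Go side does not mirror it, so the check reports drift and the build would fail.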
The schema, drawn
Seventeen node labels, fourteen relationship types. The schema fits on a slide deliberately; an agent that reasons over a graph the size of a phonebook is going to make bad decisions. The diagram below is the v1.0 schema as of 2026-05-13.
The schema lives in schemas/graph-schema.yaml with a one-paragraph
prose entry per label and per relationship. The contract is:
- Every label declares its required and optional properties, the ID convention ({provider}:{external_id} for IdP-anchored labels, {tenant}:{kind}:{uuid} for tenant-internal ones), and the retention policy.
- Every relationship is either an event edge (carries ts, source_event_id, snapshot_id; written from observed events) or a structural edge (carries snapshot_id, valid_from, valid_to; reconciled from configuration snapshots). The convention is enforced at the schema level so the agent always knows whether a given walk is "what happened" or "what was true at time T".
- The schema version is in the file (v1.0) and is bumped by semver rules: additive change is a minor bump, anything else is a major bump.
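Two parts of this contract are mechanical enough to sketch: classifying an edge by the properties it carries, and deciding the version bump. The code below is an illustration under assumptions, not the project's validator. `edge_kind` and `next_version` are hypothetical names, an edge mixing both property families is assumed to be invalid, and the bump rule compares label names only, whereas a real check would also compare relationships and properties.

```python
EVENT_PROPS = {"ts", "source_event_id", "snapshot_id"}
STRUCTURAL_PROPS = {"snapshot_id", "valid_from", "valid_to"}


def edge_kind(props):
    """Classify an edge as 'event' or 'structural' from its property keys."""
    keys = set(props)
    is_event = EVENT_PROPS <= keys
    is_structural = STRUCTURAL_PROPS <= keys
    if is_event == is_structural:
        # Either both families are present or neither is complete.
        raise ValueError(f"edge matches neither contract cleanly: {sorted(keys)}")
    return "event" if is_event else "structural"


def next_version(current, old_labels, new_labels):
    """Additive change bumps minor; any other change bumps major."""
    major, minor = (int(part) for part in current.lstrip("v").split("."))
    old, new = set(old_labels), set(new_labels)
    if new == old:
        return current
    if old < new:  # strict superset: nothing removed or renamed
        return f"v{major}.{minor + 1}"
    return f"v{major + 1}.0"


kind_a = edge_kind({"ts": 1, "source_event_id": "e-1", "snapshot_id": "s-1"})
kind_b = edge_kind({"snapshot_id": "s-1", "valid_from": 1, "valid_to": None})
added = next_version("v1.0", {"Alert", "Host"}, {"Alert", "Host", "Resource"})
removed = next_version("v1.0", {"Alert", "Host"}, {"Alert"})
```

Under these rules, adding a label to v1.0 gives v1.1, and removing one gives v2.0.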
That last point matters more than the schema itself. A graph schema
without a version is a graph schema that drifts every quarter, and an
agent reasoning over a drifting graph silently regresses. The
/sovereign page for AiSOC lists the same drift gate
under "audit-grade graph"; this is what backs the claim.
The substrate eval — how we measure the layer beneath the agent
We keep substrate self-checks visually distinct from live agent benchmarks. The numbers below are from the public eval harness running in CI on every PR, against a fixed 200-incident corpus. They measure the substrate (the in-harness fusion grouper, the extractors, the deterministic templates), not the live LLM agent. The distinction is on the public benchmark page and we maintain it religiously: the moment a substrate number gets quoted as agent latency, trust falls off a cliff.
What the substrate eval tells us about the graph layer:
- Extractor coverage. 100 % of the 200-incident corpus produces a non-empty (node, edge) set on the first ingest pass. The fail-open behaviour we used to have (skip the graph write if any property is missing) was replaced with fail-closed in v1.4: the ingester now refuses to commit the row record if the graph write fails the schema check, because a partial graph is worse than no graph for an agent reasoning over it.
- Graph-walk completeness. For 187 of the 200 incidents (93.5 %), the agent's first canned 2-hop traversal (alert → asserted asset → identity → other assets touched in the last 30 minutes) returns a non-empty result. For the remaining 13, the alert is a cloud-control-plane event with no asset side, and the schema routes the traversal through Resource rather than Host. Both shapes are covered.
- Substrate latency. Median substrate-eval graph build is 0.8 ms per incident on a laptop-class run. Substrate again: this is the in-harness fusion grouper assembling the same (node, edge) set the production extractor would emit, not Neo4j round-trip latency. The point is to gate algorithmic regressions in CI, not to claim production performance.
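The fail-closed rule from the extractor-coverage item can be sketched as "validate first, then write both or neither". This is an in-memory illustration only: the two stores are plain lists, `REQUIRED_PROPS` (including the `severity` property) is a hypothetical stand-in for the schema, and `SchemaError`, `validate`, and `ingest` are names invented here. Real atomicity across Neo4j and a row store takes more than this.

```python
class SchemaError(Exception):
    """Raised when a projected node does not satisfy the schema."""


# Hypothetical required properties per label.
REQUIRED_PROPS = {"Alert": {"id", "severity"}, "Host": {"id"}}


def validate(nodes):
    """Reject unknown labels and nodes with missing required properties."""
    for label, props in nodes:
        if label not in REQUIRED_PROPS:
            raise SchemaError(f"unknown label: {label}")
        missing = REQUIRED_PROPS[label] - set(props)
        if missing:
            raise SchemaError(f"{label} missing {sorted(missing)}")


def ingest(event_row, nodes, row_store, graph_store):
    """Fail closed: nothing is written unless the graph side validates."""
    validate(nodes)  # raises before either store is touched
    row_store.append(event_row)
    graph_store.extend(nodes)


row_store, graph_store = [], []
ingest(
    {"event_id": "e-1"},
    [("Alert", {"id": "a-7102", "severity": "high"}), ("Host", {"id": "h-44"})],
    row_store,
    graph_store,
)
try:
    ingest({"event_id": "e-2"}, [("Alert", {"id": "a-7103"})], row_store, graph_store)
    rejected = None
except SchemaError as exc:
    rejected = str(exc)
```

The second event is rejected as a whole: its row record is not written either, which is the difference from the earlier fail-open behaviour.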
The wet-eval numbers — actual Neo4j p50/p95 round trips for the agent's canned traversals — are a different report. We publish them on the benchmark page under the wet-eval section, and they're the ones I'd cite in a procurement conversation. Substrate numbers are for the engineering team; wet numbers are for the operator. Mixing them is the most common mistake I see in vendor benchmarks.
What the graph enables for the agent
Graph-at-ingest pays for itself in three places that an agent loop actually feels:
- The first context bundle is one query. The ContextBundle work in T2.1 (described in the next post in this series) collapses the agent's first "what's around this alert?" question into one Cypher call. On the row-store path that was six. The real win is not the round-trip savings (those are measured in milliseconds); it's the context budget the agent doesn't burn reasoning over six unrelated query results.
- Cross-source correlation is free. When an EDR alert and an IdP sign-in event both reference the same Identity{id: …} node, the correlation happens at ingest time, not at query time. The agent that reads Alert{id: a-7102} already sees the IdP context as a neighbour edge. No fan-out query, no cross-index join in the warehouse.
- The investigation ledger is graph-native. Every action the agent takes is written as an Action node attached to the Incident node it acted on. The audit replay ("show me what the agent considered before it suspended this session") is a single Cypher walk from the action back through the agent's read set. This is what the L0–L4 maturity model post means by an audit-loggable gate; the auditability is structural, not bolted on.
The fourth, less obvious win is what graph-at-ingest doesn't do: it doesn't try to be a feature store, a metrics store, a log archive, or a search index. Those live in their native systems. The graph is the agent's working memory and the audit ledger; everything else is one hop away.
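The graph-native ledger from the third item above can be pictured with a small in-memory model. Everything here is an assumption for illustration: the `Ledger` class, the `READ` and `ACTED_ON` relationship names, and the ID strings are invented, and the real ledger lives in Neo4j rather than a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Append-only edge list; each edge is (src, rel, dst, seq)."""

    edges: list = field(default_factory=list)
    _seq: int = 0

    def _add(self, src, rel, dst):
        self._seq += 1
        self.edges.append((src, rel, dst, self._seq))

    def record_action(self, action_id, incident_id, read_ids):
        """Attach an action to its incident and to every node it read."""
        for node_id in read_ids:
            self._add(action_id, "READ", node_id)
        self._add(action_id, "ACTED_ON", incident_id)

    def replay(self, action_id):
        """Walk back from an action to its read set, in read order."""
        ordered = sorted(self.edges, key=lambda edge: edge[3])
        return [dst for src, rel, dst, _ in ordered if src == action_id and rel == "READ"]


ledger = Ledger()
ledger.record_action(
    "Action:suspend-session-1",
    "Incident:i-12",
    ["Alert:a-7102", "Host:h-44", "Identity:u-9"],
)
considered = ledger.replay("Action:suspend-session-1")
```

The replay is a walk over edges that already exist, so there is no separate audit log to reconstruct or reconcile.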
What we got wrong, and what's still open
The honest version of this story includes the wrong turns.
- We over-modelled at first. The v0.x schema had 31 labels and 44 relationships. The agent's behaviour got worse as the schema grew, because the LLM had to reason about more shape every step. Cutting to 17/14 in v1.0 was a hard pruning: we collapsed User and ServiceAccount under a shared Identity superclass, collapsed RoleAssignment into a structural HAS_ROLE edge with validity windows, and removed three "future-use" labels that no extractor was writing yet. The agent's investigation completeness on the substrate eval went up, not down, after the cut.
- Cross-tenant reasoning isn't there yet. The graph is tenant-scoped by design: all queries are gated by tenant ID at the driver level. That's the right call for isolation, but it means the agent can't do "have we seen this IOC across other tenants?" today. The work to expose a federated, hashed view of cross-tenant IOC observation is on the v8.0 roadmap and not yet in code.
- Snapshot reconciliation is harder than event ingest. Event-edge writes are easy: take the event, write the edge. Structural-edge writes, the ones that need a valid_from/valid_to window, require us to diff the latest snapshot against the previous one and emit edge updates only where the diff matters. The diff algorithm is the noisiest part of the pipeline and the most frequent source of regressions. It's documented under "open questions" in the graph schema reference.
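The core of snapshot reconciliation is a set difference between the edges currently open in the graph and the edges present in the latest snapshot. The sketch below shows only that core, under assumptions: `reconcile` is a hypothetical name, edges are keyed by an (identity, relationship, target) tuple, and the hard cases (renames, late snapshots, partial snapshots) are ignored entirely.

```python
def reconcile(open_edges, snapshot, snapshot_ts):
    """Diff a snapshot against currently open structural edges.

    open_edges: dict mapping edge key -> valid_from, for edges with no valid_to yet
    snapshot:   set of edge keys present in the latest configuration snapshot
    Returns a list of (operation, key, timestamp) updates; unchanged edges emit nothing.
    """
    current = set(open_edges)
    updates = []
    for key in sorted(snapshot - current):
        updates.append(("open", key, snapshot_ts))   # set valid_from
    for key in sorted(current - snapshot):
        updates.append(("close", key, snapshot_ts))  # set valid_to
    return updates


open_edges = {
    ("u-9", "HAS_ROLE", "admin"): 1_700_000_000,
    ("u-9", "HAS_ROLE", "viewer"): 1_700_000_000,
}
snapshot = {
    ("u-9", "HAS_ROLE", "viewer"),   # unchanged
    ("u-9", "HAS_ROLE", "auditor"),  # newly granted
}
updates = reconcile(open_edges, snapshot, snapshot_ts=1_700_003_600)
```

The unchanged `viewer` assignment produces no write, the new `auditor` assignment opens a validity window, and the revoked `admin` assignment gets its window closed at the snapshot time.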
None of these are deal-breakers; all of them are engineering work, not architecture work. The architecture — graph as canonical model, written at ingest, with a CI-gated schema — has held up across v1, v1.4, and v8.0.
What I'd tell another team picking the substrate
If you're building an agentic SOC, or any agentic system that has to reason over a network-shaped domain, three principles from this work generalise:
- The substrate decides the agent's latency budget. A graph substrate buys you sub-second context bundling. A row-store substrate forces multi-query fan-out, and you'll pay for that every investigation, forever.
- Pin the schema and gate it in CI. Whatever schema you pick, write it down, version it, and fail the build on drift. The second-most-common cause of agent hallucination, after model inconsistency, is the agent reading a schema the docs claim exists but the data doesn't follow.
- Materialise once at the seam. Don't run two parallel writes to two parallel stores. Pick the seam (for us, it's the OCSF normalisation step in the ingester) and emit the graph and the row record from the same transaction. Lag and drift are not things you can tune away.
The next post in this series — Latency budget for sub-minute investigation — picks up where this one ends: given the graph is in place, how do you spend the 30-second budget between alert and verdict? The third post — L0 → L4 SOC automation maturity — builds on both: once the agent can reason in 30 seconds, what is it allowed to do at the end?
The schema lives at schemas/graph-schema.yaml. The Go ingester lives at services/ingest/internal/graph/. The drift gate lives at scripts/export_graph_schema.py. All of it is MIT-licensed; pull requests against the schema are welcome, especially from teams running this kind of substrate in production.