mcp-graph v12 stops being folklore: it becomes an AISE method

“It’s just a task tracker with AI”

That was the critique I got last week, in a closed group of engineers who use AI heavily day to day. They looked at mcp-graph, saw nodes, edges, phases, and delivered the verdict: “okay, but this is just a smart Jira with an LLM hook”.

I get the read. From a distance, anything with stateful tasks looks like a tracker. The problem is that from a distance, nothing interesting looks interesting. Up close, mcp-graph v12 is something else: it is the local-first implementation of two canonical methodologies of the discipline that is forming right now, in 2026, under the name AISE (AI Software Engineering). I am not making this up to defend the project. It is in the literature, it is in the DORA Report 2025, and now it is inside the project repository itself, after I decided to stop calling mcp-graph a “tool” and start calling it what it is: a runtime layer.

Let me unpack. By the end of this post, I want you to be able to answer the critique above in one sentence, with a source.

What AISE is, according to the 2026 literature

AISE is not a marketing term. It is the emerging category that describes the practice of delivering software via AI agents combining two complementary pillars: Agentic AI (autonomous systems with goal-oriented decision-making) and Generative AI (artifact production: code, text, design).

What changed in 2025-26 was the register. Until 2024, AI in software engineering was basically “AI-assisted coding”: Copilot completing lines, Cursor suggesting refactors. Useful, but marginal. The inflection came when large companies started reporting that the bottleneck was no longer “the agent writes code fast”. The bottleneck became “the system surrounding the agent”. That is when the term AISE stops being a synonym for autocomplete and becomes a discipline, with its own methodologies, metrics, and gates.

Who is writing about this in 2026: Pragmatic Engineer in a series of predictions for the year, Anthropic’s 2026 Agentic Coding Trends Report, Thoughtworks’ research team (Birgitta Böckeler has dense material on spec-driven development), and the already-mentioned DORA Report 2025, which is the most quantitative instrument of the bunch. There is also peer-reviewed work in MDPI under the heading “AI-Driven Innovations in Software Engineering”. The idea is mature enough to have literature, fresh enough that few people are aware of it.

Inside this emerging category, two methodologies are crystallizing as operational pillars. Those are the ones that matter here.

The two canonical methodologies of AISE

The first is Specification-Driven Development (SDD).

Write precise, machine-readable specifications before code. Roots in formal methods, BDD, API design. The spec is the canonical artifact, the code is a verifiable translation of it.

Conceptually, nothing new. What is new is the reason SDD matters again. When you have an AI agent writing code at 200 lines per minute, the clarity of the specification stops being hygiene and becomes the absolute bottleneck. Without a spec, the agent guesses. With a loose spec, the agent faithfully reproduces the looseness. With an executable spec, the agent has something to reproduce and something to validate against.

The second is Context-Driven Engineering (CDE), sometimes simply called “context engineering”.

Provide the agent with the complete context in which it will operate (intent, constraints, prior decisions, relevant code) instead of loose prompts. Reduces the non-deterministic fraction of the output.

CDE is what separates “I pasted my PRD into Claude” from “the agent opens a session and already knows who it is, what it is doing, what the limit is, and where it left off”. The name covers both what goes into the prompt and what persists across sessions.

Both pillars show up together in nearly every serious piece of AISE literature in 2026. SDD answers “what to build, with what precision”. CDE answers “in what substrate the agent operates”. Treating them in isolation falls into the classic anti-pattern: spec without context becomes paper, context without spec becomes organized noise.

The DORA 2025 insight that changes the game

The DORA Report 2025 (summarized in detail by InfoQ and dozens of engineering blogs in March 2026) brought the sentence that changed how I explain mcp-graph to skeptics:

AI amplifies the quality of the engineering system it operates in. Organizations with mature DevOps, well-defined workflows, and platform capabilities convert AI productivity gains into measurable delivery improvements.

Practical translation: 75% of engineers use AI, according to the report itself. But most organizations saw no measurable improvement in delivery. Why? Because AI alone is an amplifier. If the system is chaotic, AI amplifies chaos. The agent gets faster, but cycle time does not drop, because rework increases at the same rate.

The inverse reading is what matters: organizations that do have mature platform capability, with well-defined workflows, are the ones that manage to convert agent productivity gains into delivery gains. AI stacks on top of a structure that already works. Without that structure, the stacking does not happen.

mcp-graph is, quite literally, a proposed “platform capability” for small teams and individuals. It is the part of the system that DORA says is a prerequisite.

v12: what changed

Before getting to the proof, a factual update. Version 12.0.0 shipped on April 26, 2026. Compared to v11, what changed:

Single bin mcp-graph replaces the fragmentation of @mcp-graph-workflow/cli and v11 binaries.
24 consolidated subcommands (8 server, 16 lifecycle/ops).
CLI workspace folded into a single bundle dist/v11-cli.mjs at build time.
Cross-platform compatibility (Windows, macOS, Linux) stabilized.
Full PT-BR documentation.
PHASE_GATES formalized as a canonical table.

Not a revolutionary release on the surface. The important thing is that v12 closed the rough edges that still made the project look like “academic experimentation”. Today it is a binary, with unified docs, with explicit gates. It is the version where you can stop calling it folklore.

SDD in mcp-graph: PRD becomes an executable graph

Here the proof starts. SDD says “spec before code, machine-readable”. I will show where that lives in the repo.

The entry point is src/core/importer/prd-to-graph.ts. It is a parser that takes a PRD file in Markdown, DOCX, or PDF, and emits typed nodes into a SQLite graph. Node types are fixed: epic, task, subtask, requirement, constraint, milestone, and acceptance_criteria. No free-form. AC is a first-class node type, not a loose field inside a task.

It runs like this:

$ mcp-graph import docs/PRD-checkout.md
✓ Parsed PRD (4.2KB, 14 sections)
✓ Created 1 epic (E-001: "Checkout flow rewrite")
✓ Created 7 tasks under E-001
✓ Created 23 subtasks
✓ Extracted 11 acceptance_criteria nodes
✓ Linked 4 constraints (LGPD, latency<200ms, idempotency, audit)
✓ Graph persisted at workflow-graph/graph.db

The difference from “paste PRD into chat” is that each of these nodes becomes a stable address. AC #7 three weeks from now is still AC #7. You can trace which code was born from which AC. That is spec-driven in its simplest operational form.
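To make the “stable address” idea concrete, here is a minimal TypeScript sketch. It is not the code in prd-to-graph.ts: the names NODE_TYPES, SpecNode, parseNodeType, and makeId, the AC-007 ID format, and the example title are all assumptions made for illustration. The only thing taken from the text above is the fixed list of seven node types.

```typescript
// Hypothetical model of the fixed node vocabulary described above.
// Names, fields, and the ID format are illustrative assumptions,
// not mcp-graph's actual source.
const NODE_TYPES = [
  "epic", "task", "subtask", "requirement",
  "constraint", "milestone", "acceptance_criteria",
] as const;

type NodeType = (typeof NODE_TYPES)[number];

interface SpecNode {
  id: string;              // stable address, e.g. "AC-007" (assumed format)
  type: NodeType;
  title: string;
  parentId: string | null; // null for a root node such as an epic
}

// Reject anything outside the fixed vocabulary: no free-form types.
function parseNodeType(raw: string): NodeType {
  const found = NODE_TYPES.find((t) => t === raw);
  if (found === undefined) {
    throw new Error(`Unknown node type: ${raw}`);
  }
  return found;
}

// The ID is derived from a prefix and an ordinal, never from the title,
// so rewording a criterion does not change its address.
function makeId(prefix: string, ordinal: number): string {
  return `${prefix}-${String(ordinal).padStart(3, "0")}`;
}

const ac7: SpecNode = {
  id: makeId("AC", 7),
  type: parseNodeType("acceptance_criteria"),
  title: "Duplicate submit must not charge twice", // made-up example
  parentId: "T-003",                               // made-up parent task
};
```

The design point is that the address is independent of the wording: editing the title of ac7 leaves its ID untouched, which is what lets code be traced back to it weeks later.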

The second piece is enforcement. In docs/reference/LIFECYCLE.md, the PHASE_GATES table defines that to leave IMPLEMENT and enter VALIDATE, at least 50% of the tasks in the graph must be done with testable AC. This is not a convention, it is a gate. The pre-tool-use hook blocks invalid transitions before the agent even tries.
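A gate like this reduces to a small predicate over graph state. The sketch below is my own illustration of the idea, following the “≥50% tasks done with testable AC” summary in the gates table. TaskState, GateResult, canEnterValidate, and the treatment of an empty graph are assumptions, not the project's real API.

```typescript
// Hypothetical sketch of the IMPLEMENT -> VALIDATE gate.
// Field and function names are assumptions for illustration.
interface TaskState {
  id: string;
  done: boolean;
  hasTestableAc: boolean;
}

interface GateResult {
  allowed: boolean;
  ratio: number;
  reason: string;
}

const MIN_READY_RATIO = 0.5; // the 50% threshold quoted in the post

function canEnterValidate(tasks: TaskState[]): GateResult {
  // Assumption: an empty graph never passes the gate.
  if (tasks.length === 0) {
    return { allowed: false, ratio: 0, reason: "no tasks in graph" };
  }
  const ready = tasks.filter((t) => t.done && t.hasTestableAc).length;
  const ratio = ready / tasks.length;
  const allowed = ratio >= MIN_READY_RATIO;
  return {
    allowed,
    ratio,
    reason: allowed
      ? "gate passed"
      : `only ${ready}/${tasks.length} tasks done with testable AC`,
  };
}
```

Returning a reason alongside the boolean matters for agents: a refusal that explains itself gives the agent something to act on instead of retrying blindly.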

The third piece is TDD. In src/core/pipeline/start-task.ts, the start_task call checks test prerequisites before releasing implementation. No declared test, no code. The test becomes the executable form of the spec. It is BDD reincarnated in a runtime that the agent is forced to respect.
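The “no declared test, no code” rule can be sketched in a few lines. This is not the content of start-task.ts: the TaskSpec shape, the declaredTests field, and the result type are hypothetical, chosen only to show the check happening before any implementation is released.

```typescript
// Hypothetical sketch: refuse to start a task that has no declared test.
// The TaskSpec shape and the declaredTests field are assumptions.
interface TaskSpec {
  id: string;
  declaredTests: string[]; // test files promised for this task
}

type StartResult =
  | { ok: true; taskId: string }
  | { ok: false; taskId: string; error: string };

function startTask(task: TaskSpec): StartResult {
  if (task.declaredTests.length === 0) {
    return {
      ok: false,
      taskId: task.id,
      error: "no declared test: implementation blocked",
    };
  }
  return { ok: true, taskId: task.id };
}

const blocked = startTask({ id: "T-003", declaredTests: [] });
const released = startTask({
  id: "T-004",
  declaredTests: ["checkout/submit.test.ts"], // made-up path
});
```

Here blocked carries an error and released does not. The discriminated union forces the caller to handle the refusal explicitly rather than assume success.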

This is SDD applied, not SDD described in a Medium post.

CDE in mcp-graph: context that survives a session

Now the second pillar.

CDE says: give the agent the complete informational environment, not just the prompt. mcp-graph implements this in three layers:

Graph persistence. The SQLite at workflow-graph/graph.db survives a reload. If you close the terminal today and open it tomorrow, the agent knows which task it was on, which subtask failed, which decision was made. There is no prompt-side reconstruction, because the state was not lost. This makes a concrete difference: the most common post-mortem of agents in production is “forgot what it was doing in the next session”.

RAG over the repository. Inside src/core/rag/ live 50+ specialized indexers: code-context, entity, decision, journey, knowledge linker, adaptive router, semantic cache. It is not a generic text-embedding RAG. It is a RAG that knows code has an AST, decisions have rationale, journeys have steps. Constraint discovery stops depending on the agent “remembering” that a constraint exists in a doc. It actively retrieves.

Sibling context assembly. Documented in ADR-0047. When the agent is about to touch a subtask, the runtime assembles a context containing the already-completed sibling subtasks, in topological order, truncated to a token budget (default 4000, configurable up to 8000). The same triple (epicId, subtaskId, tokenBudget) always produces the same markdown. Reproducible. This is the opposite of traditional prompt-soup, where what enters the context depends on the human’s mood.
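The sibling-context layer is the easiest of the three to show in code. What follows is a minimal sketch of the idea, not the ADR-0047 implementation: the Subtask shape, the four-characters-per-token estimate, the id-sorted tie-break, and the markdown layout are all assumptions. What it demonstrates is the property claimed above: completed siblings only, topological order, cut at a token budget, and the same inputs always giving the same output.

```typescript
// Hypothetical sketch of deterministic sibling-context assembly.
interface Subtask {
  id: string;
  title: string;
  summary: string;
  done: boolean;
  dependsOn: string[]; // ids of sibling subtasks that must come first
}

// Crude estimate, roughly 4 characters per token. An assumption for the sketch.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Kahn's algorithm with id-sorted tie-breaking, so the order is reproducible.
function topoOrder(items: Subtask[]): Subtask[] {
  const known = new Set<string>(items.map((s) => s.id));
  const indegree = new Map<string, number>();
  for (const s of items) {
    // Dependencies outside the given set are ignored.
    indegree.set(s.id, s.dependsOn.filter((d) => known.has(d)).length);
  }
  const ordered: Subtask[] = [];
  const placed = new Set<string>();
  while (ordered.length < items.length) {
    const ready = items
      .filter((s) => !placed.has(s.id) && indegree.get(s.id) === 0)
      .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
    const next = ready[0];
    if (next === undefined) throw new Error("dependency cycle among siblings");
    ordered.push(next);
    placed.add(next.id);
    for (const s of items) {
      if (s.dependsOn.includes(next.id)) {
        indegree.set(s.id, (indegree.get(s.id) ?? 0) - 1);
      }
    }
  }
  return ordered;
}

function assembleSiblingContext(
  epicId: string,
  subtaskId: string,
  siblings: Subtask[],
  tokenBudget = 4000, // default quoted in the post
): string {
  const completed = siblings.filter((s) => s.done && s.id !== subtaskId);
  const header = `# Context for ${subtaskId} (epic ${epicId})`;
  const parts: string[] = [header];
  let used = estimateTokens(header);
  for (const s of topoOrder(completed)) {
    const entry = `## ${s.id}: ${s.title}\n${s.summary}`;
    const cost = estimateTokens(entry);
    if (used + cost > tokenBudget) break; // truncate at the budget
    parts.push(entry);
    used += cost;
  }
  return parts.join("\n\n");
}
```

There is no clock, randomness, or iteration over unordered state anywhere in the function. That is what makes the output a pure function of the graph and the budget.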

Together, this is operational Context-Driven Engineering. It is not chat-with-clever-memory. It is an information substrate that the agent queries by rules.

The 9-phase loop as executable spec

I have mentioned phases several times. Worth making them explicit:

ANALYZE → DESIGN → PLAN → IMPLEMENT → VALIDATE → REVIEW → HANDOFF → DEPLOY → LISTENING

Each transition has a gate. I list the most relevant ones (docs/reference/LIFECYCLE.md):

| Transition | Gate (summary) |
| --- | --- |
| `ANALYZE → DESIGN` | ≥1 mapped epic |
| `DESIGN → PLAN` | design-ready analysis + ADR challenge passed |
| `PLAN → IMPLEMENT` | ≥1 task with sprint assigned |
| `IMPLEMENT → VALIDATE` | ≥50% tasks done with testable AC |
| `VALIDATE → REVIEW` | green test suite + minimum coverage |
| `REVIEW → HANDOFF` | review recorded with decision |

Three stacked layers of enforcement guarantee the gate is not decoration:

Out-of-process hook (pre-tool-use): blocks invalid action before the MCP is even called.
MCP server (in-process): validates deprecation, runs the handler, accounts for tokens.
PHASE_GATES table in lifecycle-phase.ts: source of truth for transitions.

If you want to see SDD in concrete form, this loop is it. The spec is not a static doc: it is an automaton that refuses illegal states.
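“An automaton that refuses illegal states” can be sketched as a phase list plus a gate lookup. The code below is an illustration under stated assumptions, not the contents of lifecycle-phase.ts: the GraphFacts fields and the rule that phases advance exactly one step at a time are mine. Gates that this post does not summarize are left unmodeled and refused rather than invented.

```typescript
// Hypothetical sketch of the phase automaton. Phase names come from the post;
// everything else (fact fields, one-step rule) is an assumption.
const PHASES = [
  "ANALYZE", "DESIGN", "PLAN", "IMPLEMENT", "VALIDATE",
  "REVIEW", "HANDOFF", "DEPLOY", "LISTENING",
] as const;

type Phase = (typeof PHASES)[number];

interface GraphFacts {
  mappedEpics: number;
  designReady: boolean;
  adrChallengePassed: boolean;
  tasksWithSprint: number;
  doneTestableRatio: number; // 0..1
  testsGreen: boolean;
  coverageOk: boolean;
  reviewDecisionRecorded: boolean;
}

type Gate = (facts: GraphFacts) => boolean;

// Only the six gates summarized in the post are modeled here.
const GATES = new Map<string, Gate>([
  ["ANALYZE->DESIGN", (f: GraphFacts) => f.mappedEpics >= 1],
  ["DESIGN->PLAN", (f: GraphFacts) => f.designReady && f.adrChallengePassed],
  ["PLAN->IMPLEMENT", (f: GraphFacts) => f.tasksWithSprint >= 1],
  ["IMPLEMENT->VALIDATE", (f: GraphFacts) => f.doneTestableRatio >= 0.5],
  ["VALIDATE->REVIEW", (f: GraphFacts) => f.testsGreen && f.coverageOk],
  ["REVIEW->HANDOFF", (f: GraphFacts) => f.reviewDecisionRecorded],
]);

function tryTransition(
  from: Phase,
  to: Phase,
  facts: GraphFacts,
): { ok: boolean; reason: string } {
  // Assumption: a transition is legal only to the immediately following phase.
  if (PHASES.indexOf(to) !== PHASES.indexOf(from) + 1) {
    return { ok: false, reason: `illegal transition ${from} -> ${to}` };
  }
  const gate = GATES.get(`${from}->${to}`);
  if (gate === undefined) {
    return { ok: false, reason: "gate not modeled in this sketch" };
  }
  return gate(facts)
    ? { ok: true, reason: "gate passed" }
    : { ok: false, reason: `gate not satisfied for ${from} -> ${to}` };
}
```

Keeping the gates in one table, separate from the code that consults them, is what allows a hook and a server to enforce the same rules without drifting apart.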

Empirical proof: the v11 benchmark that backs v12

The hardest part to defend in technical conversations is not “this exists”, it is “this pays off”. Here comes the benchmark documented in docs/_internal/BENCHMARK-v11.md, replicated independently via OpenRouter (run id db6e57871911) under the codename H12-faithful.

| Arm | Model + structure | Pass rate | Cost (USD) | Wall time |
| --- | --- | --- | --- | --- |
| A (Haiku mono baseline) | Haiku, no graph | 60% | $0.0110 | 10,668 ms |
| B-v2 (Haiku + decomp) | Haiku with graph + sibling context | 100% parse | $0.0257 | measured |
| C (Sonnet baseline) | Sonnet, no graph | 80% | $0.0327 | measured |

Three things to notice:

First: Haiku alone has a 60% pass rate. Haiku with mcp-graph structure hits 100% on the parse gate. The structure compensated for the smaller model. Not marginal: 40 percentage points.

Second: cost. Arm B-v2 (Haiku + structure) costs $0.0257 per execution. Arm C (Sonnet alone) costs $0.0327. The structured arm comes in at 79% of the Sonnet cost with equivalent or higher parse pass rate. You pay less than you would for the bigger model, and what fills the gap is spec + context.

Third: H12-faithful, run on a different provider (OpenRouter) with the same protocol, replicated the result independently. This kills the most obvious hypothesis (“ah, the result is just a bug in Diego’s implementation”). The protocol reproduces, so the protocol is doing the work.
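For anyone who wants to check the arithmetic behind the first two points, it fits in a few lines. The figures are copied from the benchmark table in this post; the variable names are mine.

```typescript
// Figures copied from the benchmark table in this post.
const haikuPassRate = 0.6;        // arm A
const haikuGraphParseRate = 1.0;  // arm B-v2, parse gate
const costHaikuGraphUsd = 0.0257; // arm B-v2
const costSonnetUsd = 0.0327;     // arm C

// Gap between arm A and arm B-v2, in percentage points.
const gainPoints = Math.round((haikuGraphParseRate - haikuPassRate) * 100);

// Cost of arm B-v2 as a percentage of arm C, rounded to the nearest integer.
const costRatioPct = Math.round((costHaikuGraphUsd / costSonnetUsd) * 100);

// Saving relative to arm C, in percent.
const savingPct = 100 - costRatioPct;
```

This gives gainPoints = 40, costRatioPct = 79 (0.0257 / 0.0327 ≈ 0.786), and savingPct = 21.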

For ballast: the repository has 1,413 .test.ts tests validating per-layer behaviors, plus 6 .bench.ts files holding the performance line. This is not vibes.

Repositioning: the AISE taxonomy of mcp-graph

Putting it all in one picture: imagine a rectangle. Each side is one of the transformations mcp-graph makes happen:

PRD → Spec (SDD). mcp-graph import turns a human document into an executable spec graph with AC.
Spec → Context (CDE). Graph + RAG + memory give the agent complete, reproducible context, within a token budget.
Context → Code (Agentic). The agent navigates the graph, but can only leave IMPLEMENT after green TDD.
Code → Audit (DORA). Each line of code was born from an AC, born from a requirement, born from an epic. End-to-end traceability becomes a natural delivery metric.
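The “born from” chain in the last item is a walk up parent links. Here is a minimal sketch, assuming each node stores a single bornFrom reference. The TraceNode shape, the lineage function, and every ID except E-001 are hypothetical.

```typescript
// Hypothetical sketch of end-to-end traceability: code -> AC -> requirement -> epic.
interface TraceNode {
  id: string;
  kind: "epic" | "requirement" | "acceptance_criteria" | "code";
  bornFrom: string | null; // id of the node this one was derived from
}

// Walk the bornFrom links from a starting node up to the root.
function lineage(startId: string, graph: Map<string, TraceNode>): string[] {
  const chain: string[] = [];
  const seen = new Set<string>();
  let cursor: string | null = startId;
  while (cursor !== null) {
    if (seen.has(cursor)) throw new Error(`cycle at ${cursor}`);
    seen.add(cursor);
    const node: TraceNode | undefined = graph.get(cursor);
    if (node === undefined) throw new Error(`dangling reference: ${cursor}`);
    chain.push(node.id);
    cursor = node.bornFrom;
  }
  return chain;
}

const traceGraph = new Map<string, TraceNode>([
  ["E-001", { id: "E-001", kind: "epic", bornFrom: null }],
  ["R-002", { id: "R-002", kind: "requirement", bornFrom: "E-001" }],
  ["AC-007", { id: "AC-007", kind: "acceptance_criteria", bornFrom: "R-002" }],
  ["src/checkout/submit.ts", { id: "src/checkout/submit.ts", kind: "code", bornFrom: "AC-007" }],
]);

const auditTrail = lineage("src/checkout/submit.ts", traceGraph);
```

For this made-up graph, auditTrail is the file path followed by AC-007, R-002, and E-001. A dangling or cyclic reference throws instead of returning a partial trail, since a partial audit trail is worse than none.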

Here lives the sentence that now sits inside the project’s README and glossary, and that holds up the repositioning:

mcp-graph is the AISE runtime layer, where Specification-Driven Development meets Context-Driven Engineering, on a local-first substrate.

That is academically defensible without losing the direct voice. It names exactly what the project is, in terms the 2026 literature already uses. The project stops competing with “task tracker” and starts occupying a slot in the new discipline.

Vaccine against the critic (“this is just a task tracker”)

Back to the critique from the start. When someone says “but this is just a Jira with AI”, the answer, with sources, is:

No. It is the local-first implementation of the two canonical AISE methodologies: Specification-Driven Development and Context-Driven Engineering. The graph is the spec. The context is the substrate. TDD is the gate. Memory is the continuity. Without these four properties together, the agent does not do engineering, it does expensive vibe-coding.

And if the person insists, four more bricks:

package.json declares v12.0.0 with mcp-graph as the single bin. Not a prototype.
The repository ships a CITATION.cff with ORCID 0009-0002-1117-9571, tied to ongoing master’s research at UNOPAR. Folklore is not cited in CFF.
The benchmark was independently replicated via OpenRouter (H12-faithful). Folklore does not pass replication.
The license is AGPL v3, with a separate commercial licensing track documented in COMMERCIAL.md. Folklore has neither dual licensing nor a network copyleft clause.

Four bricks in, we stop calling it folklore.

Where this goes

The question that matters in 2026 stopped being “which model do I use?”. Models come and go, get cheaper every quarter, gain 10 percentage points and lose 10 in a different domain. The model is commodity.

The question that matters became: what is your AISE runtime layer?

Whoever runs an agent in production without an executable spec and reproducible context is paying the price of not having an answer to that question, even if they do not know it. The DORA Report 2025 already showed the aggregate numbers: agent productivity goes up, delivery does not. The difference is exactly the slice that SDD + CDE fill.

Concrete call. Open the mcp-graph repo, grab a PRD of yours (real, not example), and run mcp-graph import on it. Watch the graph come to life. Look at GLOSSARY.md, which now has dedicated entries for SDD and CDE as sub-pillars of AISE. Look at the README, which now declares which “platform capability” it is proposing to be. And decide whether you want to keep pushing the agent without a substrate, or start treating AI software engineering as engineering.

The difference, in the end, is not in the model. It is in the maze you build for the agent to run. Engineering, unlike models, is 100% under your control.