
From Zero to Product in 9 Days: What Happens When a Dev Actually Uses AI

A firsthand case study of how one person and an AI built a complete software product in just 9 days: the process, the numbers, and the lessons.

AI · software development · productivity · case study · Claude/Copilot

I built a complete piece of software in 9 days. Not a throwaway prototype. A real product, published on npm, with 2,023 passing tests, 23 releases, and over 60,000 effective lines of code. Just me and an AI. From first commit to npm publish.

I know it sounds like a sales pitch. I would have doubted it myself two months ago. But it happened, all the data is public on GitHub, and in this article I’m going to break down exactly how it went. No sugar-coating.

The problem that bugged me

By March 2026, AI coding tools were everywhere. Claude, Copilot, Cursor. Everybody was using them. But something kept nagging at me: none of these tools addressed how to organize your work when AI is your coding partner.

Think about it. Jira, Linear, Trello. They were all built for human teams working in two-week sprints. But when you code with AI, the rhythm changes completely. You work in hour-long cycles, not week-long sprints. Tasks need clear criteria the AI can validate, not vague story point estimates that nobody gets right anyway.

If you’re not in the field: Story points are how dev teams estimate task effort. Instead of saying “this takes 3 days,” the team assigns an abstract number (like 3, 5, or 8) for relative complexity. In practice, it’s an educated guess.

So the question became: what if I built a management tool designed specifically for people who code with AI?

The bet I made

The idea was bold for two reasons:

  1. Build the tool using AI itself. Claude/Copilot would be the copilot for the entire development.
  2. Use the tool to manage its own construction. If it could manage itself, it would work for anything.
```mermaid
flowchart LR
    DEV[Dev defines\narchitecture] --> CLAUDE[Claude/Copilot\nimplements]
    CLAUDE --> MCPGRAPH[mcp-graph\nmanages tasks]
    MCPGRAPH --> NEXT[next task\nrecommended]
    NEXT --> DEV
    MCPGRAPH --> |accelerates| CLAUDE

    style DEV fill:#2196f3,color:#fff
    style CLAUDE fill:#7c3aed,color:#fff
    style MCPGRAPH fill:#10b981,color:#fff
    style NEXT fill:#f59e0b,color:#000
```
And something interesting happened: the more features the tool gained, the faster development became. Like a snowball. Each new feature fed the next cycle.

If you’re not in the field: MCP (Model Context Protocol) is an open protocol that lets AI agents connect to external tools. Think of it as a “USB port” for artificial intelligence. A universal standard for plugging different capabilities into any compatible agent.
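To make the "USB port" idea concrete, here is a toy sketch of what a tool looks like to an MCP agent. This is illustrative TypeScript, not the official MCP SDK; the `next` tool, its schema shape, and the registry are all invented to mirror the description of mcp-graph.

```typescript
// Conceptual sketch only — NOT the official MCP SDK API.
// An MCP server exposes named "tools" the AI agent can discover and call.
type ToolResult = { content: string };

interface Tool {
  name: string;
  description: string;
  // JSON-Schema-like shape describing the accepted input
  inputSchema: Record<string, string>;
  handler: (args: Record<string, unknown>) => ToolResult;
}

// A hypothetical task-management tool, in the spirit of mcp-graph's `next`:
const nextTask: Tool = {
  name: "next",
  description: "Recommend the next task to work on",
  inputSchema: { project: "string" },
  handler: (args) => ({ content: `Next task for ${args.project}: write failing test` }),
};

// The agent discovers tools by name and invokes them with structured input:
const registry = new Map<string, Tool>([[nextTask.name, nextTask]]);
const result = registry.get("next")!.handler({ project: "demo" });
console.log(result.content);
```

The real protocol adds transport, capability negotiation, and structured schemas, but the core contract is exactly this: named tools with typed inputs that any compatible agent can call.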

The journey, day by day

Days 1-2: When it all started (30 commits)

It was a Sunday, 1:16 AM. I couldn’t sleep, kept thinking about the architecture. Opened my terminal and pushed the first commit. And it wasn’t a timid start. Within 48 hours, the project already had:

  • A full MCP server with 10 tools
  • A parser that turned requirement docs into structured tasks
  • SQLite database with automatic migrations
  • Web dashboard with 5 tabs (visual graph, backlog, code analysis, knowledge base, insights)
  • Full REST API with CRUD operations
  • Real-time event system via SSE
  • 54 automated tests
  • CI pipeline on GitHub Actions

If you’re not in the field: A REST API is basically the language systems use to talk to each other over the internet. When your phone app fetches data from a server, it’s probably using a REST API. SSE (Server-Sent Events) lets the server push updates to the browser in real time, without the browser constantly asking “anything new?”
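The SSE wire format is simple enough to show in a few lines. This is a sketch (the function and event names are mine, not mcp-graph's): each event is plain text lines over a long-lived HTTP response, terminated by a blank line.

```typescript
// SSE events are plain text: "event:" and "data:" lines ending in a blank line.
// Minimal formatter for the wire format (names here are illustrative):
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// What the server would write to the open response stream:
const frame = formatSseEvent("task-updated", { id: 42, status: "done" });
console.log(frame);

// In the browser, the receiving end is just the built-in EventSource:
//   const es = new EventSource("/api/events");
//   es.addEventListener("task-updated", (e) => console.log(e.data));
```

That is the whole trick behind "real-time" dashboards without polling: the connection stays open and the server writes frames like this whenever something changes.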

The first stable version shipped on day 1. By the next day it was already at v2.1.0.

What made this possible? Discipline. Sounds cliché, but every feature followed a strict cycle: test first, then minimum code to pass, then refactor. The AI wasn’t “vibing” code into existence. It followed very specific instructions I documented in a file called CLAUDE.md. Without that, it would have been chaos by day two.
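Here is that cycle in miniature, with a throwaway function (illustrative only, not actual mcp-graph code):

```typescript
// Step 1: the test exists before the implementation — it defines the behavior.
// Step 2: minimum code to make the test pass, nothing more:
function slugify(title: string): string {
  return title.toLowerCase().trim().replace(/\s+/g, "-");
}

// The test written first; it fails until slugify behaves as specified:
console.assert(slugify("  Fix Safari Bug ") === "fix-safari-bug");

// Step 3: refactor freely — the test pins the behavior down while you do.
```

The point isn't the function; it's the order. The AI receives the failing test as the specification, which is far less ambiguous than a prose description.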

Days 3-4: The craziest day (40 commits)

March 11th. This was the wildest day of the whole project: 35 commits in 24 hours. Five versions released.

And here’s the funny part: instead of piling on more stuff, I did the opposite. Consolidated the tools from 31 down to 26. Fewer buttons, more power in each one. Like when you have 5 remote controls in the living room and finally buy a universal one.

If you’re not in the field: A commit is like a “save point” in software development. Each commit records what changed in the code with a description. 35 commits in a day means 35 significant changes saved and documented.

The key milestones:

  • v3.0.0: Package identity change (in software we call this a “breaking change,” meaning the update requires existing users to adapt)
  • v4.0.0: Complete lifecycle management. The project now knew what phase each task was in and enforced discipline.

Days 5-6: Reality knocked on the door (25 commits)

Remember that initial excitement? Days 5 and 6 were the cold shower. Shipping fast is easy. Shipping with quality is a different story.

Six consecutive bug fixes showed me problems no unit test in the world would catch:

  • Safari decided to show a blank screen on one of the tabs. Only Safari. Of course.
  • Icons returning 404 errors. Just vanishing.
  • The dashboard wouldn’t switch databases when I changed projects.

It stung in the moment, but each bug taught me something that prevented bigger problems later. Speed without quality is just technical debt piling up.

If you’re not in the field: A unit test checks if a small isolated piece of code works. But some bugs only appear when the system runs for real, in an actual browser, with real data. That’s why integration and end-to-end tests exist too. They look at the whole system.

Days 7-8: Things got serious (29 commits)

This is where the project leveled up. Instead of relying on external tools, I started bringing everything in-house:

Code analysis engine: It could read the project’s source code, map relationships between files, and calculate the impact of any change. Before touching a function, the system already told me what would break. Game changer.
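The core idea of impact analysis fits in a few lines once you assume the dependency graph has already been extracted. A toy version (the real engine parses source code; these file names and the map are invented for illustration):

```typescript
// Reverse dependency graph: file -> files that import it.
const dependents: Record<string, string[]> = {
  "db.ts": ["api.ts", "dashboard.ts"],
  "api.ts": ["dashboard.ts", "cli.ts"],
  "dashboard.ts": [],
  "cli.ts": [],
};

// Everything that could break if `file` changes = its transitive dependents.
function impactOf(file: string, seen = new Set<string>()): Set<string> {
  for (const dep of dependents[file] ?? []) {
    if (!seen.has(dep)) {
      seen.add(dep);
      impactOf(dep, seen);
    }
  }
  return seen;
}

// Touching db.ts ripples out to api.ts, dashboard.ts, and cli.ts:
console.log([...impactOf("db.ts")]);
```

Run before every change, this is what turns "I hope nothing breaks" into "these three files need re-testing."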

Native memory system: Previously the project depended on an external service to “remember” past decisions. Now it ran internally, with integrated smart search. Less dependency, more speed.

```mermaid
flowchart TB
    subgraph "Before (v4.x)"
        A1[mcp-graph] --> A2[Serena MCP\nexternal]
        A1 --> A3[GitNexus MCP\nexternal]
    end

    subgraph "After (v5.1+)"
        B1[mcp-graph] --> B2[Native Memories\nbuilt-in]
        B1 --> B3[Code Intelligence\nbuilt-in]
    end

    style A2 fill:#f44336,color:#fff
    style A3 fill:#f44336,color:#fff
    style B2 fill:#10b981,color:#fff
    style B3 fill:#10b981,color:#fff
    style B1 fill:#4263eb,color:#fff
    style A1 fill:#4263eb,color:#fff
```

Version 5.2.0 brought something I’m quite proud of: phase-aware context. If you’re implementing, the system prioritizes code examples. If you’re reviewing, it prioritizes impact analysis. The context adapts to what you need at that moment.
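A sketch of what phase-aware selection could look like. The types, phase names, and ranking rule below are my illustration of the concept, not mcp-graph's real API:

```typescript
type Phase = "implement" | "review";

interface ContextItem {
  kind: "code-example" | "impact-analysis";
  text: string;
}

// Rank context items so the kind matching the current phase comes first:
function selectContext(phase: Phase, items: ContextItem[]): ContextItem[] {
  const preferred = phase === "implement" ? "code-example" : "impact-analysis";
  return [...items].sort(
    (a, b) => Number(b.kind === preferred) - Number(a.kind === preferred)
  );
}

const items: ContextItem[] = [
  { kind: "impact-analysis", text: "changing db.ts breaks 3 files" },
  { kind: "code-example", text: "how to add a migration" },
];

console.log(selectContext("implement", items)[0].kind); // "code-example"
console.log(selectContext("review", items)[0].kind);    // "impact-analysis"
```

Same knowledge base, different ordering: the agent's limited context window gets filled with whatever the current phase actually needs.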

Day 9: The final sprint (17 commits)

Last day. All-in on tests and quality:

  • 391 new tests in a single day. Coverage jumped from 78% to 91%.
  • Skills system for automating common workflows.
  • Insights dashboard with automatic bottleneck detection.

If you’re not in the field: Test coverage measures what percentage of your code has automated verification. 91% means almost everything is tested. Above 80% is considered excellent in the industry.

```mermaid
xychart-beta
    title "Test Growth Over the 9 Days"
    x-axis ["D1-2", "D3-4", "D5-6", "D7-8", "D9"]
    y-axis "Tests" 0 --> 2100
    line [54, 200, 600, 1632, 2023]
    bar [54, 200, 600, 1632, 2023]
```

The consolidated numbers

| Metric | Value |
| --- | --- |
| Total duration | 9 days, 7h47min |
| Commits | 142 |
| Releases (published versions) | 23 |
| Effective code | ~60,000 lines |
| Tests | 2,023 passing (100% green) |
| Test coverage | 91% |
| MCP tools | 30 |
| API routes | 20 endpoints |
| Dashboard | 8+ tabs |
| Average commits per day | 15.8 |

When the code happened

```mermaid
xychart-beta
    title "Commits per Day"
    x-axis ["Mar 9", "Mar 10", "Mar 11", "Mar 12", "Mar 13", "Mar 14", "Mar 15", "Mar 16", "Mar 17", "Mar 18"]
    y-axis "Commits" 0 --> 40
    bar [24, 6, 35, 5, 0, 17, 8, 20, 9, 17]
```

Look at March 13: zero commits. Not laziness. After 4 straight days with 70 commits, my brain called it quits. AI doesn’t need rest, but I do. And you know what happened? The next 3 days produced 45 commits of noticeably better quality. Rest was part of the process.

When I was most productive

```mermaid
xychart-beta
    title "Commits by Hour of Day"
    x-axis ["0h", "1h", "2h", "3h", "4h", "5h", "6h", "7h", "8h", "9h", "10h", "11h", "12h", "13h", "14h", "15h", "16h", "17h", "18h", "19h", "20h", "21h", "22h", "23h"]
    y-axis "Commits" 0 --> 25
    bar [8, 23, 5, 3, 2, 0, 0, 0, 1, 5, 8, 6, 4, 15, 15, 8, 6, 10, 5, 3, 4, 2, 3, 5]
```

23 commits at 1 AM. No meetings, no notifications, no interruptions. Just me, the AI, and the terminal. Everybody has their peak hours and mine, clearly, are late at night. AI amplifies your productive moments. It doesn’t create new ones.

Who did what

```mermaid
pie title Commits by Author
    "Diego Nogueira" : 118
    "github-actions[bot]" : 17
    "dependabot[bot]" : 4
    "Claude" : 3
```

83% of commits came from my hands. Every piece of AI-generated code went through my review before being merged. The automation bots handled releases and security updates on their own.

All 23 releases on the map

```mermaid
timeline
    title 23 Releases in 9 Days
    section Day 1 (Mar 9)
        v2.0.0 : Core MCP + Parser + Dashboard
        v2.0.1 : Fix shebang
        v2.1.0 : All edge types active
    section Day 3 (Mar 11)
        v3.0.0 : npm scope rename (BREAKING)
        v4.0.0 : Lifecycle + dashboard (BREAKING)
        v4.1.0 : CI security pipeline
        v4.2.0 : Real-time logs tab
        v4.3.0 : Lifecycle auto-detection
    section Day 4 (Mar 12)
        v4.3.1 : Config path fix
    section Day 5 (Mar 14)
        v5.0.0 : npm scope migration (BREAKING)
    section Day 6 (Mar 15)
        v5.0.1 : Safari blank screen fix
        v5.0.2 : Favicon + canvas fix
        v5.0.3 : Dashboard bundle cleanup
        v5.0.4 : Runtime DB swap
        v5.0.5 : Benchmark calculation fix
    section Day 7 (Mar 16)
        v5.1.0 : Doctor command
        v5.1.1 : CI benchmark threshold
        v5.1.2 : CI timeout fix
        v5.1.3 : Release workflow fixes
        v5.1.4 : Release PAT fix
        v5.1.5 : Lifecycle strict mode
    section Day 8 (Mar 17)
        v5.2.0 : Knowledge mesh phase-aware
    section Day 9 (Mar 18)
        v5.3.0 : Skills system + Code Intel + Memories
        v5.4.0 : Test coverage 91%
```

When you look at the map, you can see the natural rhythm of the project:

  • Day 1: Feature explosion. That early-project energy.
  • Day 3: Two big structural changes on the same day. Rearranging the house while building it.
  • Days 5-6: Six bug-fix releases. Reality collecting the bill for all that speed.
  • Day 7: Five CI/CD infrastructure tweaks. Boring but necessary.
  • Days 8-9: Advanced features and quality. The project maturing.

The snowball effect

This was the coolest part of the whole thing. Every feature I added to mcp-graph made developing the next feature faster. Compound acceleration.

```mermaid
flowchart TD
    A[Day 1: import_prd\nTransforms requirements into tasks] --> B[Day 2: next task\nRecommends the next task]
    B --> C[Day 3: context\nCompresses context by 73%]
    C --> D[Day 4: lifecycle\nEnforces process discipline]
    D --> E[Day 5: knowledge store\nAccumulates knowledge]
    E --> F[Day 7: Code Intelligence\nAnalyzes change impact]
    F --> G[Day 8: knowledge mesh\nPhase-aware context]
    G --> H[Day 9: skills system\nWorkflow automation]

    A --> |used to\nbuild| B
    B --> |used to\nbuild| C
    C --> |used to\nbuild| D

    style A fill:#2196f3,color:#fff
    style B fill:#4263eb,color:#fff
    style C fill:#7c3aed,color:#fff
    style D fill:#9c27b0,color:#fff
    style E fill:#e91e63,color:#fff
    style F fill:#f44336,color:#fff
    style G fill:#ff5722,color:#fff
    style H fill:#ff9800,color:#000
```

Concrete examples:

  1. Day 1: The requirements parser was already being used to organize mcp-graph’s own tasks. It managed itself from day one.
  2. Day 3: The context compressor cut 73% of the information volume. The AI could now process way more per interaction.
  3. Day 7: Impact analysis checked, before each change, what would break. No more fear of touching something and blowing up something else.
  4. Day 8: Adaptive context delivered different information depending on the phase. Implementing? Code examples. Reviewing? Risk analysis.
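The real compressor is far richer than anything I can show here (the 73% figure comes from semantic mechanisms, not string matching), but a toy keyword filter illustrates the principle: drop everything the current task doesn't need. All names below are made up.

```typescript
// Toy context compression: keep only lines relevant to the task's keywords.
function compress(context: string[], keywords: string[]): string[] {
  return context.filter((line) => keywords.some((k) => line.includes(k)));
}

const full = [
  "db schema uses SQLite WAL mode",
  "dashboard tabs are lazy-loaded",
  "parser expects markdown headings",
  "CI runs on three OSes",
];

// Working on the parser? Only parser-related lines survive:
const compressed = compress(full, ["parser", "markdown"]);
console.log(compressed.length / full.length); // 0.25 — a 75% reduction here
```

Less context per interaction means the AI spends its window on what matters, which is exactly why compression translated directly into speed.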

The tool didn’t just manage development. It accelerated development. And the more it accelerated, the more features I shipped for it to accelerate even further. A self-reinforcing loop.

Comparing with real-world projects (no AI)

To check whether these numbers make sense, I compared mcp-graph with open-source projects built before the AI era (pre-2015). All with public data on GitHub.

If you’re not in the field: LOC stands for “Lines of Code.” It’s a simple metric counting how many lines a project has. Not perfect, since a line can be trivial or deeply complex, but it works as a scale reference.

How long each project took to reach v1.0

```mermaid
xychart-beta
    title "Days to first stable version"
    x-axis ["mcp-graph", "Mocha", "Gulp", "PM2", "Webpack"]
    y-axis "Days" 0 --> 800
    bar [9, 120, 150, 210, 730]
```

| Project | What it does | Time to v1.0 | Devs | Approx. LOC |
| --- | --- | --- | --- | --- |
| mcp-graph | CLI + API + Dashboard + graph + AI | 9 days | 1 (+AI) | ~60k |
| Mocha | JS testing framework | ~3-4 months | 1 | ~5-8k |
| Gulp | Build automation | ~4-5 months | 1-2 | ~3-5k |
| PM2 | Process manager | ~6-8 months | 1-2 | ~15-20k |
| Webpack | Module bundler | ~2 years | 1 | ~30-40k |

Lines of code per day

```mermaid
xychart-beta
    title "LOC per effective development day"
    x-axis ["mcp-graph", "PM2", "Mocha", "Webpack", "Gulp"]
    y-axis "LOC/day" 0 --> 7000
    bar [6600, 95, 75, 65, 35]
```

The raw number suggests a 60-100x gain. But I know raw comparisons are unfair, so let me discount the distorting factors:

| Factor | Discount | Reason |
| --- | --- | --- |
| Modern tooling (2026 vs 2012) | ÷2.0 | TypeScript, ESM, mature npm, much better IDEs |
| AI generates more verbose code | ÷1.5 | Not every generated line carries the same weight as a hand-written one |
| Unsustainable pace | ÷1.3 | I worked through the night, on weekends. This is not replicable. |

Doing the honest math:

Raw gain: ~6,600 / ~80 ≈ 82x
With discounts: 82 / (2.0 × 1.5 × 1.3) = 82 / 3.9 ≈ 21x

Estimated range: 15 to 30 times more productive
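The arithmetic above, reproduced in code. The inputs are the article's own estimates, not measurements:

```typescript
// Inputs are the rough figures used in the comparison above:
const locPerDayWithAI = 6600;
const locPerDayBaseline = 80; // approximate pre-AI solo-dev pace

const rawGain = locPerDayWithAI / locPerDayBaseline; // ≈ 82.5x

// Discount factors from the table above:
const discounts = [2.0, 1.5, 1.3];
const totalDiscount = discounts.reduce((a, b) => a * b, 1); // 3.9

const honestGain = rawGain / totalDiscount; // ≈ 21x
console.log(honestGain.toFixed(1));
```

Nudge the baseline or the discounts within reason and you stay inside the 15-30x band, which is why I quote a range rather than a single number.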

If you’re not in the field: COCOMO II is a mathematical model that estimates how much time and how many people it takes to build software based on code size. According to this model, mcp-graph’s ~60k lines would take about 174 person-months without AI. That means a team of 5 would work for nearly 3 years.
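Spelling out the conversion (174 person-months is the model's output quoted above; what follows is just arithmetic):

```typescript
// COCOMO II estimate for ~60k LOC, per the figure cited above:
const personMonths = 174;
const teamSize = 5;

const calendarMonths = personMonths / teamSize; // 34.8 months
const years = calendarMonths / 12;              // ≈ 2.9 years

console.log(`${calendarMonths} months ≈ ${years.toFixed(1)} years`);
```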

But wait, don’t studies say 1.2-2x?

Yes. Studies from GitHub, Google, and McKinsey report 1.2x to 2x gains with AI. So why am I talking about 15-30x? Because the context is completely different:

  1. Full agent vs autocomplete. Tools like Copilot suggest isolated lines. Agents like Claude implement entire features from a spec. It’s like comparing a spell checker to a ghostwriter.

  2. The meta-recursive effect. The tool manages its own development. Each new feature improves the process that builds it. External tools don’t create this loop.

  3. New project, solo dev. Zero meetings. Zero communication overhead. Zero external code review. I decide, the AI implements. No friction.

Being honest about the limitations

I need to be transparent here. This comparison has problems, and I know what they are:

  • Different eras. Programming in 2012 was harder because of limited tooling, not just the absence of AI.
  • Lines of code don’t measure complexity. 60k lines of mcp-graph don’t compare to 60k lines of Webpack. Webpack solves much harder algorithmic problems.
  • Survivorship bias. I’m comparing with projects that succeeded. Thousands died before v1.0.
  • A pace nobody should maintain. 23 commits at 1 AM is not a healthy lifestyle. It works for 9 days, not 9 months.
  • Greenfield project. No legacy, no users, no backward compatibility. Any existing project would be much slower.
  • Long-term quality is a different conversation. 9 days of building doesn’t compare to 10+ years of maintenance.

How I kept the AI from generating junk

That’s the question I get asked the most. Short answer: a methodology I call Anti-Vibe-Coding, based on Extreme Programming adapted for AI-assisted work.

If you’re not in the field: XP (Extreme Programming) is a methodology that prioritizes automated testing, frequent releases, and simple code. TDD (Test-Driven Development) is an XP practice where you write the test before the code. You define what you expect first, then implement until the test passes.

Every task went through 8 phases:

```mermaid
graph LR
    A[ANALYZE\nRequirements] --> D[DESIGN\nArchitecture]
    D --> P[PLAN\nDecompose tasks]
    P --> I[IMPLEMENT\nTDD]
    I --> V[VALIDATE\nFull tests]
    V --> R[REVIEW\nImpact analysis]
    R --> H[HANDOFF\nDocumentation]
    H --> L[LISTENING\nFeedback]
    L --> |new cycle| A

    style A fill:#2196f3,color:#fff
    style D fill:#7c3aed,color:#fff
    style P fill:#f59e0b,color:#000
    style I fill:#4caf50,color:#fff
    style V fill:#06b6d4,color:#fff
    style R fill:#ec4899,color:#fff
    style H fill:#10b981,color:#fff
    style L fill:#9e9e9e,color:#fff
```

Day to day, it looked like this:

  1. next: get the next task the engine recommended
  2. context: receive a compressed summary of what mattered
  3. Write a failing test: the test defines expected behavior
  4. Minimum code to pass: just enough, nothing more
  5. Refactor: improve without changing behavior
  6. Mark as done: update the status in the graph
  7. Repeat
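The loop above, as a sketch. The task titles, statuses, and helper names are all made up; the point is the shape of the cycle:

```typescript
type Status = "todo" | "done";
interface Task { id: number; title: string; status: Status }

const backlog: Task[] = [
  { id: 1, title: "parser: handle empty docs", status: "done" },
  { id: 2, title: "api: add /tasks endpoint", status: "todo" },
  { id: 3, title: "dashboard: insights tab", status: "todo" },
];

// Step 1 — `next`: the engine recommends the first open task
const next = () => backlog.find((t) => t.status === "todo");

// Step 6 — mark as done: update the status in the graph
const markDone = (id: number) => {
  const found = backlog.find((t) => t.id === id);
  if (found) found.status = "done";
};

// Step 7 — repeat. Steps 2-5 (context, failing test, minimal code,
// refactor) happen between these two calls on every iteration:
let task = next();
while (task) {
  // ...write failing test, implement, refactor...
  markDone(task.id);
  task = next();
}
console.log(backlog.every((t) => t.status === "done")); // true
```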

No “let me just quickly code this.” The AI followed the process I documented. And that made all the difference between code that works and code that looks like it works.

10 things I learned from this

1. AI doesn’t replace architecture

I define the WHAT and the HOW. The AI executes. Every time I asked it to “create a system for X,” the result was generic. When I described the architecture in detail (schemas, modules, interfaces), the result was precise.

2. Documenting patterns is the highest-return investment

Every hour I spent documenting conventions and mistakes in CLAUDE.md saved hours of corrections. The file grew from 20 to 400+ lines over the 9 days. By day 9, the AI barely made mistakes anymore.

3. With AI, tests matter more, not less

Without tests, AI generates code that “looks like it works.” With tests, it generates code that provably works. The 2,023 tests weren’t bureaucracy. They were the safety net that let me go fast without fear.

4. Rest is part of the process

The zero-commit day wasn’t weakness. The 3 days that followed produced 45 commits that were better than everything before them. My brain processes information while resting. AI doesn’t need to stop, but I do.

5. Big changes, when planned, aren’t scary

Three breaking changes in 9 days sounds insane. But each came with automatic migration. The fear of changing does more damage than the change itself.

6. Production bugs teach more than new features

The 6 hotfixes showed me things unit tests would never catch: blank screen in Safari, vanishing icons, database not switching at runtime. Real bugs are unforgiving teachers.

7. Simplifying is harder than adding

Cutting from 31 to 26 tools wasn’t a loss. It was a gain. Each surviving tool became more powerful and easier to use.

8. If you built the system, trust it

Using the tool to manage its own development only works if you respect what it recommends. If the engine says “do X first” and you ignore it, the system loses meaning.

9. Know your hours

23 commits at 1 AM. Zero between 5 and 7 AM. I’m a night owl. You might be a morning person. AI amplifies your natural rhythm, it doesn’t create a new one.

10. Automate on the third time

Automatic releases, security updates, multi-OS testing. Each automation cost 30 minutes and saved hours. The rule is simple: if you’ve done it 3 times by hand, automate it.

What I take away from all this

It’s not that AI replaces developers. It’s that AI changes what a dev can accomplish alone.

I couldn’t have written 60,000 lines in 9 days without AI. But I also couldn’t have done it with AI if I didn’t know how to design systems, ensure quality, manage complexity, and most importantly, know when to stop.

AI is an amplifier. It amplifies competence and it amplifies incompetence. Those who know architecture build robust systems much faster. Those who don’t create much more mess.

Tomorrow’s tools will be built by today’s AI. But throughout that cycle, one thing doesn’t change: someone needs to decide what to build and why. That person is the dev. AI is the fastest, most patient copilot that has ever existed. But a copilot is not a pilot.


Final numbers

Start:        March 9, 2026, 01:16 AM
Last commit:  March 18, 2026, 09:03 AM
Duration:     9 days, 7 hours, 47 minutes

142 commits | 23 releases | 461 files | ~60k effective lines
2,023 tests | 91% coverage | 30 MCP tools | 20 API routes

1 developer + 1 AI = 1 complete product

This study is based on real data extracted from the mcp-graph-workflow repository. All numbers were verified via git log, git shortlog, and source code analysis.
