What 3,800 Lines of Deadline Code Looks Like When AI Writes It

AI under deadline pressure produces the same technical debt humans produce: god objects, triple bookkeeping, swallowed exceptions. The difference is that it produces a working system faster on the way there.

I know because I just paid for the audit. My company workspace was built incrementally over months, mostly by earlier-generation AI models, under what I’d honestly call “make it work tonight” pressure. That framing is mine, and I stand by the tradeoff. The pressure produced a real operating system: a canonical Postgres data layer, an autonomous task loop, a daemon fleet, vector retrieval, and a dozen internal dashboards. It runs my business every day.

In June 2026 I had the current frontier model run three parallel deep audits over the whole thing: one on the data layer, one on the runtime, one on the UI and servers. Every finding had to cite file and line. No vibes, no “this feels messy.” Evidence or it didn’t count. Here’s what working-but-indebted looks like in numbers.

The Debt, Itemized

The database engine is one Python class: 3,831 lines, 108 methods, spanning roughly 12 domains. Inside it, 91 call sites each open a fresh database connection, because the cursor helper was written in one night and never pooled. Every query pays connection setup. Every future change pays the tax of reading a class that does twelve jobs.
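To make the pooling point concrete, here is a minimal sketch of the idea, not the code from my workspace. The ConnectionPool class and its connection() helper are hypothetical names, and sqlite3 stands in for Postgres so the example runs anywhere. The point is only that 91 call sites can share a handful of connections instead of opening 91.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, connect, size=4):
        self._idle = queue.Queue(maxsize=size)
        self.opened = 0  # how many real connections were ever created
        for _ in range(size):
            self._idle.put(connect())
            self.opened += 1

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks (up to `timeout` seconds) if every connection is in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Hand the connection back to the pool instead of closing it.
            self._idle.put(conn)

    def close(self):
        while not self._idle.empty():
            self._idle.get_nowait().close()


# Assumption: sqlite3 is a stand-in here; the real system uses Postgres.
pool = ConnectionPool(
    lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2
)

for _ in range(91):  # 91 call sites, still only 2 connections opened
    with pool.connection() as conn:
        conn.execute("SELECT 1").fetchone()
```

A production version would also roll back or discard a connection that comes back in a failed state, and most Postgres drivers ship their own pool, which is likely a better starting point than a hand-written one.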

The internal API server is 2,283 lines of hand-rolled JSON-RPC where every tool gets registered in three places. The audit counted 44 schemas against 43 handlers. The drift that triple bookkeeping “will eventually” cause had already happened. One tool existed on paper and nowhere else.
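One way to make that kind of drift structurally impossible is to register the schema and the handler in a single step. The sketch below is an illustration under my own assumptions: the tool decorator, the TOOLS registry, and find_drift are hypothetical names, not the server's actual code. The find_drift helper shows how a 44-versus-43 mismatch can be surfaced in legacy code that still keeps separate registries.

```python
from typing import Callable

# Single source of truth: one entry holds both the schema and the handler.
TOOLS: dict[str, dict] = {}


def tool(name: str, schema: dict) -> Callable:
    """Register a handler and its schema together, in one place."""
    def decorator(fn: Callable) -> Callable:
        if name in TOOLS:
            raise ValueError(f"duplicate tool: {name}")
        TOOLS[name] = {"schema": schema, "handler": fn}
        return fn
    return decorator


@tool("echo", schema={"type": "object", "properties": {"text": {"type": "string"}}})
def echo(params: dict) -> dict:
    return {"text": params["text"]}


def dispatch(request: dict) -> dict:
    """Minimal JSON-RPC 2.0 style dispatch over the single registry."""
    entry = TOOLS.get(request.get("method"))
    if entry is None:
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    result = entry["handler"](request.get("params", {}))
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


def find_drift(schemas: dict, handlers: dict) -> dict:
    """For legacy code with separate registries: names present in only one."""
    return {
        "schema_without_handler": sorted(set(schemas) - set(handlers)),
        "handler_without_schema": sorted(set(handlers) - set(schemas)),
    }
```

Running find_drift at startup, or in a test, turns "a tool that exists on paper" into a failure you see the day it appears.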

The importers carry roughly 470 duplicated lines: three near-identical CRUD blocks for three staging tables, each copied from the last and lightly edited. One file holds 16 bare “except Exception” blocks. The most expensive one wraps vector search: when it fails, the system silently degrades to keyword search and swallows the exception. The most differentiated feature in the stack could have been broken for weeks and the logs would have shown nothing.

My favorite finding is archaeological. One file contains a 70-line comment documenting a real production outage caused by sys.path import hacks. The bug report lives as a comment inside the file that caused the bug. Someone, or something, chose to document the landmine instead of removing it, which is exactly what a tired human does at 2 a.m.

Then the surfaces. Five hand-rolled HTTP servers on five ports with five divergent security implementations. One dashboard is 1,809 lines of HTML living inside Python f-strings. Across every surface: 844 scattered CSS background declarations and zero CSS custom properties. And the one finding that actually scared me: secrets, including a GitHub token and an API key, injected straight into served HTML.
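For the secrets finding, the durable fix is to keep credentials on the server and never interpolate them into a page at all. As a cheap guard while that work is scheduled, a response check can refuse to serve any body that contains a known secret value. This is a sketch under my own assumptions; render_dashboard, leaked_secret_names, and safe_response are hypothetical helpers, not the dashboards' real code.

```python
import html


def render_dashboard(title: str, rows: list[str]) -> str:
    """Build HTML from display data only, escaping everything interpolated."""
    items = "".join(f"<li>{html.escape(r)}</li>" for r in rows)
    return f"<h1>{html.escape(title)}</h1><ul>{items}</ul>"


def leaked_secret_names(body: str, secrets: dict[str, str]) -> list[str]:
    """Return the names of secrets whose values appear verbatim in body."""
    return sorted(name for name, value in secrets.items() if value and value in body)


def safe_response(body: str, secrets: dict[str, str]) -> str:
    """Last-line guard: fail closed rather than serve a page with a secret in it."""
    leaked = leaked_secret_names(body, secrets)
    if leaked:
        # Report names only; never echo the secret values into logs or errors.
        raise RuntimeError(f"refusing to serve page containing secrets: {leaked}")
    return body
```

A substring check only catches secrets served verbatim, so it complements removing them from templates; it does not replace it.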

None of this is broken today. All of it taxes every future change.

That sentence is the audit’s own framing, and it’s the most precise definition of technical debt I’ve read. Debt that’s broken gets fixed. Debt that works gets compounded.

The Part Where the Auditor Was Wrong

Here’s the section most audit stories skip, and it’s the one that decided what happened next.

I forced each audit to end with a list of what it got wrong. The list was real. The audit assumed the project had no migrations framework: Alembic existed with 38 revisions. It assumed there were no tests: the data layer had 30+ passing test files. It assumed vector search was missing entirely: pgvector with an HNSW index was live in production, doing the work.

Notice the shape of those errors. Every one of them, taken at face value, argues for a rewrite. No migrations, no tests, no search? Burn it down. With the corrections in hand, the same evidence argues for an upgrade: the foundations exist, the debt sits on top of them, and the debt can be scheduled.

An auditor that can’t be corrected produces rewrites; an auditor forced to list its own stale assumptions produces an upgrade path.

So that’s what we did. The fix plan kept the architecture and scheduled the debt: connection pool first, because it’s the smallest diff with the biggest win across all 91 call sites. Then the queue. Then the gateway. Each phase ships on its own and reverses on its own. The 3,831-line class doesn’t get rewritten; it gets unbundled one domain at a time, with the tests that already exist standing guard.

Every V1 Was Built by a Weaker Model

“Weaker models under deadline pressure” sounds like a confession about AI. It also describes every junior team that ever shipped a v1.

Capable people, real constraints, success defined as “it works tonight.” You get god objects because decomposition takes time you don’t have. You get triple bookkeeping because adding a third registry was faster than refactoring the first two. You get swallowed exceptions because the demo is in an hour and an error dialog would ruin it. I’ve watched human teams produce every single pattern on my audit list. The models just produced them at a fraction of the cost, and handed me a running business while they did it.

Which is why I don’t regret a line of it. The regret in these stories never comes from the v1. It comes from building v2 on top of a v1 nobody ever counted. The numbers were sitting there the whole time: 91 unpooled connections, 44 schemas against 43 handlers, 16 silent failure points, one token in the HTML. None of them required genius to find. They required someone deciding the count was worth a day.
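Some of that counting is mechanical. As an illustration, and not the audit's actual tooling, here is a small sketch that uses Python's ast module to list every bare except and every "except Exception" handler with file and line. The broad_excepts function name is my own.

```python
import ast


def broad_excepts(source: str, filename: str = "<string>") -> list[str]:
    """Return 'file:line' for each bare `except:` or `except Exception` handler."""
    lines = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        is_bare = node.type is None
        is_exception = isinstance(node.type, ast.Name) and node.type.id == "Exception"
        if is_bare or is_exception:
            lines.append(node.lineno)
    # ast.walk does not guarantee source order, so sort by line number.
    return [f"{filename}:{lineno}" for lineno in sorted(lines)]
```

It deliberately ignores handlers that catch a tuple of exception types, so treat its count as a floor. The output is the kind of evidence I asked the audit for: a location, not an impression.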

AI didn’t change the physics of technical debt. It changed the speed at which you can afford to measure it, and the speed at which you accumulate it if you don’t. The audit that took the frontier model an afternoon would have taken a consultant a month, and the consultant wouldn’t have cited file and line for every claim.

So the operating rule I’ve landed on is simple. Ship the v1 under pressure, on purpose, with your eyes open. Then audit it before you build on it, force the auditor to show its evidence, and force it to say what it got wrong. The teams that get hurt by AI-written code won’t be hurt by the code. They’ll be hurt by never counting it.

I’d rather be the operator who counts.