In May, a few-line fix cost me most of a day. The broken thing was a small worker that sends a thank-you email when a form submission arrives.
To fix it, I had to:
- reread every file in the service
- probe a live API to learn what its response shape had quietly become
- reconstruct a policy decision made six weeks earlier, in a chat session that no longer existed
The bug was schema drift. The cost was discovery.
The graph nobody drew
In an AI-augmented workspace you ship small services fast: a worker that polls an API, a worker that sends email, a job scheduler, a database gateway. Each is a few hundred lines and takes an afternoon. At that speed, everyone skips the architecture diagram and the runbook. Each piece feels too small to deserve them.
Three forces compound:
- AI collapsed the cost of writing code and left the cost of writing about it untouched. The docs-to-code ratio falls off a cliff.
- Each service is locally simple but globally connected. That small worker was wired to nine other things: the form, its API, a schedule trigger, a dedupe store, and an email API. Plus DNS records, a stored token, a dashboard, and one policy rule about who gets which email.
- Chat-scoped context evaporates. “Only video submissions get the email” was decided once, out loud, in a session that’s gone.
Two months of building like that and a workspace holds 20 to 30 small services. Each is readable in minutes. Together they’re a system nobody fully understands. Including me. I built most of them.
A card per service
What holds is a doc that lives next to the code and changes in the same commit. One markdown file per service, fixed structure. Co-location is the forcing function. The review that catches a code typo catches a stale card.
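Review is the only thing enforcing that pairing, so a pre-merge check could back it up. Here is a minimal sketch, not something this workspace runs today. It assumes a hypothetical layout of `services/<name>/wiring-card.md` and takes the list of paths changed in a commit (for example, the output of `git diff --name-only`):

```python
from pathlib import PurePosixPath

# Both names are assumptions for illustration; adjust to the real layout.
CARD_NAME = "wiring-card.md"
SERVICES_ROOT = "services"


def services_missing_card_update(changed_paths):
    """Return services whose files changed while their card did not."""
    code_changed = set()
    card_changed = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        # Only consider files under services/<name>/...
        if len(parts) < 3 or parts[0] != SERVICES_ROOT:
            continue
        service = parts[1]
        if len(parts) == 3 and parts[2] == CARD_NAME:
            card_changed.add(service)
        else:
            code_changed.add(service)
    return sorted(code_changed - card_changed)
```

Not every code change needs a card change, so a result from this function probably works better as a review prompt than as a hard failure.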
The template, ready to lift:
# wiring-card: <service-name>
## Trigger
How this service starts: cron, webhook, queue, manual.
## Inputs
External APIs (with a sample response shape), secrets, stores, env vars.
## Side effects
Emails sent, files written, rows inserted, downstream services invoked.
## Schematic
A Mermaid diagram of the flow. Ten boxes or fewer.
## Dependents
Services that break if this one changes.
## Change checklist
"If you change X, also touch Y." The institutional rules.
## Failure modes
| Date | What broke | Root cause | Prevented by this card now? |
|------|------------|------------|------------------------------|
## Policy rules
Deliberate operational decisions this service's output must respect.
Eight fields. Each one a question I actually asked during that lost day. Inputs, with a real sample response pasted in, would have caught the schema drift at read time. Side effects answers the blast-radius question.
| Field on the card | Failure it catches early |
|---|---|
| Inputs | A service changing its response shape while the local code still looks readable. |
| Side effects | A tiny edit quietly sending email, writing records, or touching a downstream service. |
| Dependents | A helper service turning out to be load-bearing for something nobody had open. |
| Policy rules | A deliberate human decision disappearing because the code alone looks permissive. |
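The Inputs row can also be checked mechanically: reduce the sample response pasted on the card and a fresh live response to their shapes, then compare. The sketch below assumes both are already decoded JSON. The field names in the example are invented for illustration and are not the real form API's.

```python
def shape(value):
    """Reduce a decoded JSON value to its structure: keys and type names only."""
    if isinstance(value, dict):
        return {key: shape(item) for key, item in value.items()}
    if isinstance(value, list):
        # Assumes homogeneous lists; describes them by their first element.
        return [shape(value[0])] if value else []
    return type(value).__name__


def shape_drift(expected, actual, path="$"):
    """List readable differences between the card's shape and the live shape."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        problems = []
        for key in sorted(expected.keys() - actual.keys()):
            problems.append(f"{path}.{key}: missing from live response")
        for key in sorted(actual.keys() - expected.keys()):
            problems.append(f"{path}.{key}: not on the card")
        for key in sorted(expected.keys() & actual.keys()):
            problems.extend(shape_drift(expected[key], actual[key], f"{path}.{key}"))
        return problems
    if isinstance(expected, list) and isinstance(actual, list):
        if expected and actual:
            return shape_drift(expected[0], actual[0], f"{path}[]")
        return []  # an empty list carries no element shape to compare
    if expected != actual:
        return [f"{path}: card says {expected!r}, live response has {actual!r}"]
    return []


# Hypothetical payloads: the card's pasted sample versus today's response.
card_sample = {"id": 1, "answers": [{"field": "video", "value": "https://example.com/v"}]}
live = {"id": "1", "responses": [{"field": "video", "value": "https://example.com/v"}]}
drift = shape_drift(shape(card_sample), shape(live))
```

An empty `drift` list means the card's sample still matches. Anything else is a cue to update the card and reread the code that parses the response.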
Policy rules is where “only video submissions get the email” stops being scrollback. Without it, the next session reads the code as the whole truth and re-enables what was deliberately turned off. I wrote about that failure mode in my reliability patterns.
Where to start
Skip cards for one-off scripts, vendor SaaS, and pure libraries. Then:
- Write the template once. It doubles as the new-service checklist.
- Backfill the one service that most recently bit you. Check that the card would have prevented that bug.
- Backfill the rest opportunistically: any session that touches a cardless service writes the card in the same edit.
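During backfill it helps to know which cards are still incomplete. The sketch below checks a card's text against the eight section names from the template. It only looks for `## ` headings, so it would not notice an empty section, and it is one possible approach rather than an established tool.

```python
# Section names taken from the wiring-card template.
REQUIRED_SECTIONS = [
    "Trigger",
    "Inputs",
    "Side effects",
    "Schematic",
    "Dependents",
    "Change checklist",
    "Failure modes",
    "Policy rules",
]


def missing_sections(card_text):
    """Return template sections absent from a card, in template order."""
    present = set()
    for line in card_text.splitlines():
        if line.startswith("## "):
            # Compare case-insensitively so "## side effects" still counts.
            present.add(line[3:].strip().lower())
    return [name for name in REQUIRED_SECTIONS if name.lower() not in present]
```

Run over every card file in the workspace, this gives a rough backlog: any service that returns a non-empty list still owes part of its card.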
The honest cost is 15 to 20 minutes per card, call it 8 to 10 hours across 30 services. The honest risk is drift: a card and its code can disagree the moment one is edited alone, and nothing enforces the pairing yet. The next cross-cutting bug will report on how much that was worth.