In May, a few-line fix cost me most of a day. The broken thing was a small worker that sends a thank-you email when a form submission arrives.

To fix it, I had to:

  • reread every file in the service
  • probe a live API to learn what its response shape had quietly become
  • reconstruct a policy decision made six weeks earlier, in a chat session that no longer existed

The bug was schema drift. The cost was discovery.

The graph nobody drew

In an AI-augmented workspace you ship small services fast: a worker that polls an API, a worker that sends email, a job scheduler, a database gateway. Each is a few hundred lines and takes an afternoon. At that speed, everyone skips the architecture diagram and the runbook. Each piece feels too small to deserve them.

Three forces compound:

  1. AI collapsed the cost of writing code and left the cost of writing about it untouched. The docs-to-code ratio falls off a cliff.
  2. Each service is locally simple but globally connected. That small worker was wired to nine other pieces: the form, its API, a schedule trigger, a dedupe store, and an email API, plus DNS records, a stored token, a dashboard, and one policy rule about who gets which email.
  3. Chat-scoped context evaporates. “Only video submissions get the email” was decided once, out loud, in a session that’s gone.

Two months of building like that and a workspace holds 20 to 30 small services. Each is readable in minutes. Together they’re a system nobody fully understands. Including me. I built most of them.

A card per service

What holds is a doc that lives next to the code and changes in the same commit. One markdown file per service, fixed structure. Co-location is the forcing function. The review that catches a code typo catches a stale card.
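The same-commit rule can also be checked mechanically. Here's a minimal sketch, assuming a hypothetical layout where each service lives in `services/<name>/` with its card at `services/<name>/WIRING.md`. Both the layout and the file name are assumptions for illustration. Given the files a commit touches, it reports services whose code changed while their card did not.

```python
from pathlib import PurePosixPath

# Hypothetical layout: services/<name>/... with the card at services/<name>/WIRING.md
CARD_NAME = "WIRING.md"


def services_missing_card_update(changed_files: list[str]) -> list[str]:
    """Return services whose code changed in this commit but whose card did not."""
    code_touched: set[str] = set()
    card_touched: set[str] = set()
    for f in changed_files:
        parts = PurePosixPath(f).parts
        # Only paths shaped like services/<name>/<file...> belong to a service.
        if len(parts) < 3 or parts[0] != "services":
            continue
        service = parts[1]
        if len(parts) == 3 and parts[-1] == CARD_NAME:
            card_touched.add(service)
        else:
            code_touched.add(service)
    return sorted(code_touched - card_touched)
```

The input could come from `git diff --cached --name-only`, split into lines, inside a pre-commit hook. A check like this only catches a missing card edit, not a wrong one; the review still has to read the card.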

The template, ready to lift:

# wiring-card: <service-name>

## Trigger
How this service starts: cron, webhook, queue, manual.

## Inputs
External APIs (with a sample response shape), secrets, stores, env vars.

## Side effects
Emails sent, files written, rows inserted, downstream services invoked.

## Schematic
A mermaid diagram of the flow. Ten boxes or fewer.

## Dependents
Services that break if this one changes.

## Change checklist
"If you change X, also touch Y." The institutional rules.

## Failure modes
| Date | What broke | Root cause | Prevented by this card now? |
|------|------------|------------|------------------------------|

## Policy rules
Deliberate operational decisions this service's output must respect.
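A fixed structure is easy to lint. A minimal sketch, assuming cards are plain markdown files that use `## ` headings exactly as in the template above: it lists which of the eight sections a card is missing.

```python
import re

# The eight section headings from the wiring-card template.
REQUIRED_SECTIONS = [
    "Trigger", "Inputs", "Side effects", "Schematic",
    "Dependents", "Change checklist", "Failure modes", "Policy rules",
]

# A level-2 heading: exactly "##", then spaces or tabs, then the name.
HEADING = re.compile(r"^##[ \t]+(.+)$", re.MULTILINE)


def missing_sections(card_text: str) -> list[str]:
    """Return template sections absent from a card, in template order."""
    found = {m.group(1).strip() for m in HEADING.finditer(card_text)}
    return [s for s in REQUIRED_SECTIONS if s not in found]
```

It checks headings only; an empty section still passes, which is a deliberate trade to keep the check cheap.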

Eight fields. Each one a question I actually asked during that lost day. Inputs, with a real sample response pasted in, would have caught the schema drift at read time. Side effects answers the blast-radius question.
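The pasted sample can also be compared against a live response instead of only read by eye. A minimal sketch, assuming both are JSON already parsed into Python values; it reduces each to its shape (keys and value types) and lists the differences. The field names in the usage note are hypothetical.

```python
def shape(value):
    """Reduce a JSON-like value to its structure: dict keys and value type names."""
    if isinstance(value, dict):
        return {k: shape(v) for k, v in value.items()}
    if isinstance(value, list):
        # Assumes homogeneous lists; describes them by their first element.
        return [shape(value[0])] if value else []
    return type(value).__name__


def drift(expected, actual, path="$"):
    """List human-readable differences between two shapes."""
    problems = []
    if isinstance(expected, dict) and isinstance(actual, dict):
        for key in sorted(expected.keys() - actual.keys()):
            problems.append(f"{path}.{key}: missing")
        for key in sorted(actual.keys() - expected.keys()):
            problems.append(f"{path}.{key}: new field")
        for key in sorted(expected.keys() & actual.keys()):
            problems.extend(drift(expected[key], actual[key], f"{path}.{key}"))
    elif isinstance(expected, list) and isinstance(actual, list):
        if expected and actual:
            problems.extend(drift(expected[0], actual[0], f"{path}[0]"))
    elif expected != actual:
        problems.append(f"{path}: {expected} -> {actual}")
    return problems
```

For a hypothetical card sample `{"id": 1, "answers": {"type": "video"}}` and a live response `{"id": 1, "fields": [{"name": "type"}]}`, `drift(shape(sample), shape(live))` returns `["$.answers: missing", "$.fields: new field"]`.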

| Field on the card | Failure it catches early |
|-------------------|--------------------------|
| Inputs | A service changing its response shape while the local code still looks readable. |
| Side effects | A tiny edit quietly sending email, writing records, or touching a downstream service. |
| Dependents | A helper service turning out to be load-bearing for something nobody had open. |
| Policy rules | A deliberate human decision disappearing because the code alone looks permissive. |
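Once every card lists its Dependents, the blast radius of a change is a graph walk. A minimal sketch, assuming the Dependents sections have already been collected into a dict from each service to the services that break if it changes; the service names are hypothetical.

```python
from collections import deque


def transitive_dependents(dependents: dict[str, list[str]], service: str) -> set[str]:
    """Everything that can break, directly or indirectly, if `service` changes."""
    seen: set[str] = set()
    queue = deque(dependents.get(service, []))
    while queue:
        name = queue.popleft()
        # Skip repeats and cycles that lead back to the starting service.
        if name in seen or name == service:
            continue
        seen.add(name)
        queue.extend(dependents.get(name, []))
    return seen
```

With `{"dedupe-store": ["thanks-mailer", "digest-job"], "thanks-mailer": ["dashboard"]}`, changing `dedupe-store` reaches `thanks-mailer`, `digest-job`, and `dashboard`, even though only two are listed on its own card.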

Policy rules is where “only video submissions get the email” stops being scrollback. Without it, the next session reads the code as the whole truth and re-enables what was deliberately turned off. I wrote about that failure mode in my post on reliability patterns.
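A card entry is stronger when the code names the same rule. A minimal sketch, assuming a hypothetical `submission_type` field on each submission: the rule becomes an explicit, commented predicate instead of an incidental filter.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    email: str
    submission_type: str  # hypothetical field name, e.g. "video" or "text"


# Policy rule (mirrors the card's "Policy rules" section):
# only video submissions get the thank-you email. Deliberate, not an oversight.
EMAIL_ELIGIBLE_TYPES = frozenset({"video"})


def should_send_thank_you(sub: Submission) -> bool:
    """True only for submission types the policy rule allows."""
    return sub.submission_type in EMAIL_ELIGIBLE_TYPES
```

A session that wants to widen the rule now has to edit a constant with a comment pointing at the card, which is harder to do by accident than deleting an unexplained `if`.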

Where to start

Skip cards for one-off scripts, vendor SaaS, and pure libraries. Then:

  1. Write the template once. It doubles as the new-service checklist.
  2. Backfill the one service that most recently bit you. Check that the card would have prevented that bug.
  3. Backfill the rest opportunistically: any session that touches a cardless service writes the card in the same edit.
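Finding the backfill queue is a directory scan. A minimal sketch, assuming the same hypothetical `services/<name>/WIRING.md` layout and a skip set for the one-off scripts, vendor SaaS, and libraries that don't get cards.

```python
from pathlib import Path

CARD_NAME = "WIRING.md"  # hypothetical file name for the co-located card


def cardless_services(services_root: Path, skip: frozenset[str] = frozenset()) -> list[str]:
    """Service directories under services_root that have no wiring card yet."""
    return sorted(
        d.name
        for d in services_root.iterdir()
        if d.is_dir() and d.name not in skip and not (d / CARD_NAME).is_file()
    )
```

Running it at the start of a session tells you whether the service you're about to touch still owes a card.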

The honest cost is 15 to 20 minutes per card, or roughly 8 to 10 hours across 30 services. The honest risk is drift: a card and its code can disagree the moment one is edited alone, and nothing enforces the pairing yet. The next cross-cutting bug will tell me whether the cards were worth it.