
How We Build a Website That Can't Ship Broken

John Liddy

Operators monitoring rows of screens in a mission control room
Pexels / Pixabay · Pexels License

On June 10 we made the test suite behind our build gates public. Four CI runs later, it failed us.

Not a customer site. Not a staging server. The gate engine's own README. I had added a line describing our fleet-wide standard, and the line used a punctuation mark our style guard forbids. The commit went up, the public run went red, and the fix shipped twenty minutes later. Both runs are still in the public log, dated, for anyone to read.

That is not an embarrassing story. That is the product working. A quality system that only catches other people's mistakes is a sales pitch. One that blocks its own maintainers, in public, on its fourth run, is a standard.
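If you are wondering what a style guard like that looks like, here is a minimal TypeScript sketch. It is not the engine's code. The function name checkStyle and the forbidden set are hypothetical, and the em dash and en dash in it are assumed examples, because the post does not say which mark the real guard blocks. The guard scans every line and reports each forbidden character with its position. That is enough for a build step to fail and point at the offending line.

```typescript
// Hypothetical forbidden set: the article does not name the mark the real guard blocks.
const FORBIDDEN: ReadonlyMap<string, string> = new Map([
  ["\u2014", "em dash"],
  ["\u2013", "en dash"],
]);

interface StyleViolation {
  line: number;   // 1-based line number
  column: number; // 1-based column
  mark: string;   // human-readable name of the forbidden mark
}

// Scan every line of a text file and report each forbidden character it contains.
function checkStyle(text: string): StyleViolation[] {
  const violations: StyleViolation[] = [];
  text.split("\n").forEach((lineText, i) => {
    for (let col = 0; col < lineText.length; col++) {
      const name = FORBIDDEN.get(lineText[col]);
      if (name !== undefined) {
        violations.push({ line: i + 1, column: col + 1, mark: name });
      }
    }
  });
  return violations;
}
```

A README line such as "Fleet standard \u2014 four gates" on line 2 would come back as one violation at line 2, column 16. Any non-empty result fails the build.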

This post is about how we build websites so they cannot ship broken, and how we prove it.

The standard, in numbers

Every website on the Site Clinic stack runs the same four build-time enforcement gates. Nine sites. Zero opt-outs. That includes siteclinic.io itself, our client builds, and the sample site whose only job is to prove the system works.

Together, the four gates check three things that quietly rot on most websites: accessibility, Google indexing rules, and the AI-discoverability baseline. A failing gate fails the build. A failing build does not deploy. There is no bypass flag and no "ship it anyway" path.
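Mechanically, "no bypass" can be as simple as a runner with no escape hatch. The sketch below uses a hypothetical Gate type and runGates function, not the published package's API. Every gate runs, all failures are printed, and the result is a process exit code that is 0 only when every gate passed. A build script would assign that code to process.exitCode, and CI treats any nonzero exit as a failed build.

```typescript
// Hypothetical types for illustration; not the published package's API.
interface GateResult {
  gate: string;
  passed: boolean;
  messages: string[];
}

type Gate = () => GateResult;

// Run every gate instead of stopping at the first failure, so the log shows all
// problems at once. Return exit code 0 only if every gate passed.
function runGates(gates: Gate[]): { exitCode: number; results: GateResult[] } {
  const results = gates.map((gate) => gate());
  const failed = results.filter((r) => !r.passed);
  for (const r of failed) {
    console.error(`[${r.gate}] FAILED`);
    for (const m of r.messages) console.error(`  ${m}`);
  }
  // Deliberately no override: no flag or environment variable turns a failure into 0.
  return { exitCode: failed.length === 0 ? 0 : 1, results };
}
```

The design choice that matters is what is missing: there is no parameter that downgrades a failure to a warning. Removing a gate would require a visible code change, not a quiet flag.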

We covered what each gate checks in detail when we open-sourced the package. The short version: WCAG 2.1 AA via axe-core on every route, the exact indexing failures Search Console flags after the fact, and the per-bot robots.txt rules, llms.txt file, and JSON-LD markup that AI crawlers read. The engine is MIT-licensed and free to drop into any Next.js or JavaScript site.
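For the accessibility gate, axe-core reports findings as a list of violations, and each violation is tagged with the WCAG criteria it maps to. The sketch below assumes the results were already collected per route, for example by calling axe.run in a headless browser. It mirrors only a small subset of axe-core's result shape. The a11yFailures function is hypothetical. It keeps violations tagged wcag2a, wcag2aa, wcag21a, or wcag21aa, which together cover WCAG 2.1 AA, and turns each affected element into a failure message.

```typescript
// Minimal subset of the result shape axe-core returns from axe.run(); only fields used here.
// (Real axe targets can be nested for shadow DOM; flattened to string[] for simplicity.)
interface AxeNode { target: string[]; html: string; }
interface AxeViolation { id: string; impact?: string | null; tags: string[]; nodes: AxeNode[]; }
interface AxeRouteResults { url: string; violations: AxeViolation[]; }

// axe-core tags covering WCAG 2.0 and 2.1 at levels A and AA.
const WCAG21_AA_TAGS = ["wcag2a", "wcag2aa", "wcag21a", "wcag21aa"];

// Hypothetical helper: turn per-route axe results into gate failure messages.
function a11yFailures(routes: AxeRouteResults[]): string[] {
  const failures: string[] = [];
  for (const route of routes) {
    for (const v of route.violations) {
      // Skip best-practice rules that are not tied to a WCAG 2.1 A/AA criterion.
      if (!v.tags.some((t) => WCAG21_AA_TAGS.includes(t))) continue;
      for (const node of v.nodes) {
        failures.push(`${route.url}: ${v.id} at ${node.target.join(" ")}`);
      }
    }
  }
  return failures;
}
```

You could also restrict the scan itself by passing runOnly: { type: "tag", values: WCAG21_AA_TAGS } to axe.run. Filtering afterward, as shown here, keeps the raw report complete while the gate decides only on the AA rules.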

One honest boundary. The AI-instrumentation gate enforces hygiene, not magic. Having llms.txt and JSON-LD in place does not make AI assistants cite you. The published research on that is clear, and we will not tell you otherwise. What the gate guarantees is narrower and more useful: the surfaces AI systems do read are present, well-formed, and cannot silently vanish in a refactor.
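A hygiene gate of that kind can be sketched as a pure function over files that have already been fetched. Everything here is an illustrative assumption:

- the function name aiSurfaceFailures
- the three crawler user agents it requires robots.txt to name
- the rule that llms.txt opens with a Markdown H1, as the llms.txt proposal describes
- the requirement that each JSON-LD block parses and declares @context plus either @type or @graph

None of it checks whether an assistant will cite the site. It only checks that the surfaces exist and are well-formed.

```typescript
// Assumed list of AI crawler user agents; the real gate's list may differ.
const AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"];

// Hypothetical hygiene check over already-fetched file contents.
function aiSurfaceFailures(site: {
  robotsTxt: string | null;
  llmsTxt: string | null;
  jsonLdBlocks: string[]; // raw text of each <script type="application/ld+json">
}): string[] {
  const failures: string[] = [];

  // 1. robots.txt must exist and contain a User-agent group for each AI crawler.
  if (site.robotsTxt === null) {
    failures.push("robots.txt missing");
  } else {
    const agents = site.robotsTxt
      .split("\n")
      .map((l) => l.replace(/#.*$/, "").trim()) // drop comments and stray \r
      .filter((l) => /^user-agent:/i.test(l))
      .map((l) => l.slice(l.indexOf(":") + 1).trim().toLowerCase());
    for (const bot of AI_BOTS) {
      if (!agents.includes(bot.toLowerCase())) failures.push(`robots.txt: no group for ${bot}`);
    }
  }

  // 2. llms.txt must exist and start with a Markdown H1 title.
  if (site.llmsTxt === null || !/^# \S/.test(site.llmsTxt.trimStart())) {
    failures.push("llms.txt missing or has no H1 title");
  }

  // 3. Every JSON-LD block must parse and declare @context and @type (or @graph).
  site.jsonLdBlocks.forEach((raw, i) => {
    try {
      const data: unknown = JSON.parse(raw);
      if (typeof data !== "object" || data === null || !("@context" in data) ||
          !("@type" in data || "@graph" in data)) {
        failures.push(`JSON-LD block ${i}: missing @context or @type`);
      }
    } catch {
      failures.push(`JSON-LD block ${i}: invalid JSON`);
    }
  });

  return failures;
}
```

This simplified version would flag a top-level JSON-LD array, which is valid. A fuller check would accept that form too.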

Claims are cheap. Proof has a green checkmark.

Anyone can write "we test our code" on a marketing page. So in June we moved the claims somewhere we cannot edit after the fact.

The gate engine's test suite, 55 detection-pattern tests as of this writing, runs in public GitHub Actions on every commit. Every release from v0.2.0 to v0.4.1 is tagged and published with changelog notes. If a test breaks, the badge on our own developers page turns red. We wired it that way on purpose. A proof surface you can quietly unpublish is not a proof surface.

Then there is the sample site.

The from-scratch build

bwt-sample-site is a complete Next.js 16 marketing build with one job: prove the gates pass end to end on a site that started from nothing.

No private templates. No internal Site Clinic scaffolding. No copied scripts. The git history is public, so you can watch it get built commit by commit, wired to the package exactly as the published README describes. Eight routes, each authored to pass all four gates.

Every Monday, a scheduled public CI run re-verifies the whole thing. It boots the site, loads every route in a real browser, runs the accessibility scan, checks the indexing rules, and validates the AI-instrumentation surfaces. The logs are dated and public. You do not have to trust that the gates passed once on my laptop in 2026. You can check what they did this week.

If you want to go one step further, clone it, run npm install, then run npm run gate:all. The same four gates that block our deploys will run on your machine in a few minutes.

Why enforcement beats checklists

Most agencies will tell you they follow accessibility and SEO best practices. Many of them mean it. The gap is never intent. The gap is that websites drift.

A vendor component ships a button with no accessible name. A refactor moves a route handler and the canonical tag stops matching. The llms.txt file you added in March is gone by June because nobody knew the new code path skipped it. Each change passed review. The site looks fine. The standard is quietly dead.
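The canonical drift in that list is the kind of failure a gate catches mechanically. Here is a minimal sketch, assuming the rendered HTML for each route is available as a string. The hypothetical canonicalFailure function extracts the canonical link with a regular expression, which is a simplification; a production check would use an HTML parser. It resolves the link against the route and compares it with the URL the route should declare. It returns null when the tag matches and a failure message otherwise.

```typescript
// Hypothetical indexing check: the canonical tag must point at the route's own URL.
// Regex extraction is a simplification; a real check would parse the HTML.
function canonicalFailure(origin: string, route: string, html: string): string | null {
  const tag = html.match(/<link\b[^>]*\brel=["']canonical["'][^>]*>/i);
  if (!tag) return `${route}: no canonical tag`;
  const href = tag[0].match(/\bhref=["']([^"']+)["']/i);
  if (!href) return `${route}: canonical tag has no href`;
  const expected = new URL(route, origin).href;
  // Resolve relative hrefs so "/pricing" and "https://example.com/pricing" compare equal.
  const actual = new URL(href[1], expected).href;
  return actual === expected ? null : `${route}: canonical is ${actual}, expected ${expected}`;
}
```

After the refactor described above, the check would return a message such as "/pricing: canonical is https://example.com/plans, expected https://example.com/pricing" instead of letting the change pass review silently.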

A checklist describes the standard. A gate enforces it on every single deploy, forever, including the deploys nobody thought were risky. That difference is the whole product decision behind how we build.

It is also why the README incident matters more than any case study we could write. The guard did not know it was blocking the founder. It does not know the difference between a customer site and our own sales page. That indifference is exactly what you want from a quality system.

What the gates cannot do

The gates run once per deploy and exit. They prove the build that went out met the standard. They cannot see what happens after.

Production drift is a different problem. Search Console flags a new exclusion three weeks later. An AI crawler stops fetching your llms.txt. A DNS change quietly breaks a subdomain. The build was clean and the live site degraded anyway.

That after-deploy layer is what Site Clinic monitoring exists for, and it is the paid half of the boundary. The gates are free, for anyone, forever. The watching is the service.

Use it

If you build websites, take the gates. The repo has copyable config templates per site type, an AGENTS.md for AI-assisted onboarding, and the sample site as a working reference. Pin a tagged release, add four lines to package.json, and your next build is held to the same standard ours are.

If you would rather someone else hold the standard for you, that is the other thing we do.