Design-to-code Pipeline v2

A single-command pipeline that turns design agency deliverables into deterministic AI build rules


Most teams treat visual consistency as a discipline problem. You hire careful developers, schedule design reviews, run periodic audits. It works until attention lapses — which it always does. Visual debt accumulates silently. Dark mode overrides get hardcoded. Token systems drift from the spec. By the time anyone notices, hundreds of components need manual correction.

This pipeline treats consistency differently: as a context problem. AI agents produce inconsistent UI not because they lack judgment, but because they lack the structured context to make the right call without guessing. Give them that context in the right form — encoded once, machine-readable, always consulted before anything is built — and consistency becomes a structural property of the system rather than something you maintain through effort.

This is an updated version of From Design Intent to Working Components, which introduced that concept and the two-index architecture underlying it. v2 makes it operational: a portable, self-configuring pack that runs the full pipeline from a single command, handles the messy reality of what design agencies actually deliver, and ships a fillable brief you can hand to the agency before work begins.


The So What — What This Actually Gives You

Before the detail: what does running this pipeline produce, and why does it matter?

You get two files that the rest of the system runs on.

The first is a design JSON — every visual decision the agency made, encoded with purpose, location, rationale, and a machine-readable usage instruction. Not just “this hex is the primary brand color” but “use this token for primary action surfaces — buttons, active nav indicators, links — not for backgrounds or decorative elements, because it was chosen for maximum contrast against neutral surfaces.” An agent reading this file knows not just the value but the intent.
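As a rough sketch, one entry in that file might look like the following. The field names (`token`, `purpose`, `usage`, and so on) are illustrative, not the pack's actual schema, and the hex value is invented:

```javascript
// Hypothetical shape of a single design JSON entry.
// Field names and the hex value are illustrative, not the pack's schema.
const brandPrimary = {
  token: "brand-500",
  value: "#6C3CE9",
  purpose: "Primary action surfaces",
  locations: ["buttons", "active nav indicators", "links"],
  rationale: "Chosen for maximum contrast against neutral surfaces",
  usage: "Use for primary actions only; never for backgrounds or decorative elements.",
};

// An agent consults the intent, not just the raw value:
console.log(`${brandPrimary.token} → ${brandPrimary.usage}`);
```

The point is that every record carries a machine-readable instruction, so the agent never has to infer why a value exists.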

The second is a component index — a permanent map of every component in the codebase, what level it sits at in the atomic hierarchy, what it’s composed from, and where it’s used. When an agent needs to build something, it looks up what already exists before writing a line.
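A single index record, sketched with hypothetical field names (the article describes `composed_of` and `used_in`; the rest is assumed for illustration):

```javascript
// Sketch of one component-index record. composed_of points down the
// atomic hierarchy; used_in points up. cid values are illustrative.
const indexEntry = {
  cid: "molecule/search-input",
  level: "molecule",
  composed_of: ["atom/input", "atom/icon-button"],
  used_in: ["organism/top-nav", "organism/command-palette"],
};

// Before building, an agent checks whether a match already exists:
function lookup(index, cid) {
  return index.find((entry) => entry.cid === cid) ?? null;
}
```

`lookup([indexEntry], "molecule/search-input")` returns the record; a miss returns `null`, which is the agent's cue that something genuinely new is being built.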

The practical result: every AI agent that touches your UI reads these two files before acting. Visual decisions stop being per-session judgment calls and become retrieved context. Consistency stops being something you audit after the fact and starts being something the system produces.

Why this matters for teams running lean: you don’t need a dedicated design engineering function. You need the pipeline run once, the files committed, and agents that consult them. The design intent encoded by the agency scales across every component, including ones the agency never explicitly designed.


What’s New in v2

The conceptual architecture from v1 is unchanged. What changed is how you get from zero to a running pipeline — and how the system handles the gap between what you want from an agency and what agencies actually deliver.

1. Intake is now format-agnostic. The original pipeline assumed a well-structured set of deliverable files. Real agencies deliver Figma exports, PDFs, styled markdown guides, screenshots, or some combination. The Intake Agent now reads whatever is in the agency-deliverables/ folder, classifies each source by format, and infers design intent from whatever it finds. If all you have is screenshots, it runs a visual analysis pass and flags everything it infers for human review.

2. There’s a fillable brief to give the agency. Rather than hoping the delivery lands right, you hand the agency a structured document — DESIGN.md — before work begins. It tells them exactly what to fill in and why each section matters. Completed and returned, it becomes the primary Intake input. The brief is covered in detail below, including what went wrong in practice when specific sections were missing.

3. One command runs the full pipeline. In v1 you loaded agent files manually, pasted prompts, reviewed output, and repeated. The pipeline now has an orchestrator: run-agent.mjs. You invoke it once and it drives the full sequence end-to-end — stopping only at the three gates that genuinely require human input.

4. The pack is portable and self-configuring. A one-time init script walks you through your project structure and writes a pack.config.json that all subsequent scripts read. Runtime agent files are materialised from templates with your actual paths resolved — agents receive instructions that reference real files, not placeholders.

5. The pipeline now knows what it can’t do autonomously. Organisms and templates with complex runtime dependencies — real-time sockets, collaborative editing, AI streaming — get design tokens applied in-place but are explicitly deferred rather than silently broken. The Curator produces a curator-report.md after each run: a sorted work queue of → HUMAN ACTION items for developers, not a raw log.


The Agency Brief — What to Ask For and Why

The single most valuable addition in v2 is getting the brief right before the agency starts. Running this pipeline against a real codebase revealed exactly what breaks when specific sections are missing. Those lessons are built into the brief.

The downloadable template is the clean version to hand your agency. What follows is the annotated version — with the reasoning behind each section and what actually went wrong when it was incomplete.

D1 — Example screens in light and dark mode

Pixel-identical pairs of the same screens in both modes. Not different screens — the exact same layouts, side by side. Minimum 6–8 screens covering common page types.

From a real run: Without screen pairs, the Intake Agent infers dark mode surface values from description alone. It produces values that look plausible but are wrong, with no detection mechanism. Screen pairs give the agent ground truth: “sidebar background in light = #F5F5F5, sidebar background in dark = #1A1A1A” — verifiable, not inferred.

D2 — Color palettes, full 50–950 scales

Every color family as a complete scale. Every shade annotated with its intended use.

Why the full scale matters: Without it, hover states, disabled states, and background tints have to be guessed. The agent can’t derive brand-600 (hover) from brand-500 (default) without the scale telling it that 600 is the hover shade. With the full scale, every interactive derivative is predetermined and consistent.
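A minimal sketch of what an annotated scale buys you. The hex values and role labels are illustrative, not from any real palette:

```javascript
// A 50–950 scale excerpt with role annotations (values illustrative).
const brand = {
  50:  { hex: "#F3EEFD", role: "background tint" },
  300: { hex: "#B79AF4", role: "disabled" },
  500: { hex: "#6C3CE9", role: "default" },
  600: { hex: "#5A2FC7", role: "hover" },
  700: { hex: "#4A26A4", role: "active" },
};

// With roles declared, interactive derivatives are looked up, never guessed:
function shadeFor(scale, role) {
  const hit = Object.entries(scale).find(([, v]) => v.role === role);
  return hit ? hit[0] : null;
}
```

`shadeFor(brand, "hover")` resolves to `"600"` deterministically; without the annotated scale, that mapping would be a per-session guess.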

D3 — Color inventory with why field

Every color used in the screens, documented with purpose, location, and rationale. References palette names, not raw hex.

From a real run: When the why field is absent, agents produce technically correct but contextually wrong choices. They match a hex value but don’t understand the intent — so they apply a sidebar background color to a modal overlay because the values happen to be close. The why field is what lets the agent reason about components the agency never explicitly designed.

D4 — Semantic surface annotations

Every surface in the screens named by role — bg-surface-card, bg-surface-sidebar, text-on-surface-muted — with both light and dark values mapped to palette references.
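A sketch of a semantic surface map. The role names mirror the brief's examples; the palette references are illustrative:

```javascript
// Semantic surface map: role → palette reference per mode (values assumed).
const surfaces = {
  "bg-surface-card":       { light: "neutral-50",  dark: "neutral-900" },
  "bg-surface-sidebar":    { light: "neutral-100", dark: "neutral-950" },
  "text-on-surface-muted": { light: "neutral-500", dark: "neutral-400" },
};

// A new component is styled by role, not by raw hex:
function resolve(role, mode) {
  const surface = surfaces[role];
  if (!surface) throw new Error(`Unknown surface role: ${role}`);
  return surface[mode];
}
```

An agent styling a component it has never seen asks "what role is this surface playing?" and resolves both modes from one answer.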

Why naming matters: Named surfaces are the bridge between the example screens and the entire rest of the product. Without them, the agent can style what it has seen but cannot reliably style anything new. With them, it can map any new component to the right surface role without having seen that component explicitly.

D5 — Dark mode: the most commonly underprovided section

Three sub-sections: surface model decision (opaque or glass), shadow behavior per elevation level in both modes, and semantic token shifts between modes.

From a real run: A production codebase with a partially tokenised design system had 1,544 hardcoded dark mode overrides that bypassed the token system entirely — accumulated over time because the dark token set was never fully specified. Wherever the spec had gaps, developers hardcoded values directly. The more complete the dark token set, the fewer manual overrides accumulate.
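A crude detector for that class of debt is easy to sketch. This assumes Tailwind-style arbitrary-value classes (e.g. `dark:bg-[#1A1A1A]`); the audit that produced the 1,544 figure is not part of the pack, and real overrides take more forms than this one pattern:

```javascript
// Rough sketch: count Tailwind-style dark-mode overrides that bypass
// the token system by embedding raw hex values, e.g. dark:bg-[#1A1A1A].
const HARDCODED_DARK = /dark:[a-z-]+-\[#[0-9A-Fa-f]{3,8}\]/g;

function countHardcodedDarkOverrides(source) {
  return (source.match(HARDCODED_DARK) ?? []).length;
}
```

Token-based classes like `dark:bg-surface-card` pass through untouched; only raw hex escapes are counted.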

On shadow behavior specifically: Light mode and dark mode shadows are fundamentally different. Light mode uses dark pigment drop shadows. Dark mode typically uses inset light edges combined with deep drop shadows — not just a darker version of the same shadow. Without explicit dark shadow values, the pipeline falls back to inverted light values, which look wrong.

D6 — State transformation rules

One universal set: hover, focus, active, disabled, selected. Stated once. We apply them to every interactive component downstream.

From a real run: When state rules are absent, the Intake Agent correctly skips the interactive state tokenisation pass rather than guessing. The consequence: every button, input, link, and nav item in the product needs its hover/focus/active/disabled states specified manually. One page of state rules unblocks the entire interactive layer.
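To make "stated once, applied everywhere" concrete, here is a sketch of a universal state map. The shade offsets and rule shapes are illustrative, not the pack's format:

```javascript
// One universal state map (offsets illustrative), applied to every
// interactive component downstream instead of restating per component.
const stateRules = {
  hover:    { shadeShift: 100 },          // brand-500 → brand-600
  active:   { shadeShift: 200 },
  focus:    { ring: "brand-300" },
  disabled: { opacity: 0.5, shadeShift: 0 },
};

function stateShade(baseToken, state) {
  const [family, shade] = baseToken.split("-");
  const shift = stateRules[state]?.shadeShift ?? 0;
  return `${family}-${Number(shade) + shift}`;
}
```

One page of rules like this parameterises hover, focus, active, disabled, and selected for every button, input, link, and nav item at once.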

D7 — Systematic rules

Spacing scale, border-radius scale, shadow/elevation scale, typography scale. Every value with the context it applies to.

Why context annotations matter: A spacing value without context (“16px”) is nearly useless for composition decisions. A spacing value with context (“16px — section padding, compact”) gives the agent enough to make correct decisions without needing explicit instruction for every case.
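A tiny sketch of the difference an annotation makes. The values and context strings are invented for illustration:

```javascript
// Spacing scale with context annotations (values and contexts illustrative).
const spacing = [
  { px: 4,  context: "icon-to-label gap" },
  { px: 8,  context: "control padding, dense" },
  { px: 16, context: "section padding, compact" },
  { px: 24, context: "section padding, comfortable" },
];

// An agent can select by context instead of guessing among bare numbers:
function spacingFor(context) {
  return spacing.find((s) => s.context.includes(context))?.px ?? null;
}
```

`spacingFor("compact")` resolves to `16` without any per-case instruction.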

D8 — Aesthetic description

300–600 words of prose covering: overall aesthetic, spatial philosophy, surface hierarchy, motion feel, typography voice, and dark mode character.

Why prose, not a checklist: This is fallback context for edge cases the token maps don’t cover. A list of properties doesn’t capture “this dark mode is pure black canvas with vibrant accents — it’s not an inversion of light mode, it’s a distinct personality.” That qualitative distinction changes hundreds of micro-decisions across the product. Prose captures what values cannot.

D9–D11 — Font licensing, gradients, icon style

Font licensing is a hard blocker — the pipeline cannot deploy a font without confirmed web licensing. Gradients documented only as visual elements get hardcoded at implementation time and are nearly impossible to update systematically later; documented as CSS strings with palette anchors, they stay inside the token system.
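The gradient point can be sketched concretely. This assumes palette tokens are exposed as CSS custom properties; the token shape is illustrative:

```javascript
// Sketch: a gradient documented as a CSS string anchored to palette
// tokens via custom properties stays updatable from the token system.
const gradientToken = {
  token: "gradient-hero",
  css: "linear-gradient(135deg, var(--brand-500), var(--brand-700))",
  anchors: ["brand-500", "brand-700"],
};

// Changing brand-500 in the palette updates the gradient everywhere.
// A hardcoded linear-gradient(135deg, #6C3CE9, #4A26A4) would not move.
```

The `anchors` list also lets tooling answer "which gradients break if this palette stop changes?" without parsing CSS.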


How the Pipeline Runs

The pipeline runs through an AI coding agent — Claude Code or equivalent. You don’t execute Node commands yourself. You point the agent at the pack, give it the invocation, and it handles the full sequence end-to-end. Your role is directing, not operating.

Setup — once per project

Tell your agent to run:

```
node design-agent-pack/scripts/init.mjs
```

The agent works through an interactive questionnaire — where your design system lives, where shared components go, which product roots to include, what framework you’re on — and produces a pack.config.json. Runtime agent files are materialised from templates with your actual paths resolved. Finally, the agent validates the setup and scans component roots before the first Census.
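A pack.config.json produced by the questionnaire might look something like this. The field names here are illustrative, inferred from the questions listed above, not the pack's actual schema:

```json
{
  "framework": "next",
  "designSystemRoot": "src/design-system",
  "sharedComponentsRoot": "src/components/shared",
  "productRoots": ["src/app"],
  "deliverables": "agency-deliverables/"
}
```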

Running the pipeline

Tell your agent to run:

```
node design-agent-pack/scripts/run-agent.mjs --config <path-to-pack.config.json>
```

The agent executes each stage in sequence and stops only at three genuine gate conditions where a human decision is actually required:

  • After Intake conflict audit — if token naming collisions need a decision
  • After Census — if duplicate component resolution is required before the Atomizer proceeds
  • After Atomizer diagnosis — for remediation plan approval before any files are touched

Everything else runs unattended. Pipeline state is written to pipeline-state.json after each stage so resuming after a gate is seamless.
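The article doesn't show the state file's shape, but a resume helper might look like this. The stage list beyond INTAKE is assumed from the pipeline's phase names:

```javascript
// Sketch of resuming from pipeline-state.json. Stage names beyond
// INTAKE are assumed from the article's phase names, not the pack.
const STAGES = ["INTAKE", "CENSUS", "ATOMIZER", "UI_BUILDER", "UX_AUDIT"];

function nextStage(state) {
  // An unknown or missing lastCompleted stage starts from the beginning
  // (indexOf returns -1, so STAGES[0] is selected).
  const lastIdx = STAGES.indexOf(state.lastCompleted);
  return STAGES[lastIdx + 1] ?? null; // null when the pipeline is done
}
```

Because the file is rewritten after each stage, a run interrupted at a gate picks up exactly where it stopped.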

Single-stage re-runs are supported when you need to repeat just one phase:

```
node design-agent-pack/scripts/run-agent.mjs --config <path> --stage INTAKE
```

The Full Pipeline — v2

```mermaid
flowchart TD
    subgraph SETUP["Setup (One-Time)"]
        S1[init.mjs] --> S2[pack.config.json]
        S2 --> S3[materialize-agents.mjs]
        S3 --> S4[validate-setup.mjs]
        S4 --> S5[scan.mjs]
    end

    subgraph PHASE_1["Phase 1 — Intake"]
        A1["agency-deliverables/\n(DESIGN.md, markdown, images)"] --> A2[Pass 0: Format Detection + Conflict Scan]
        A2 --> A3[Pass 1: Extraction]
        A3 --> A4[Pass 2: Enrichment]
        A4 --> A5[Human Audit]
        A5 --> A6[/"Design JSON + CSS Tokens + Tailwind Extension"/]
        A6 --> A7[[Schema validation]]
    end

    subgraph PHASE_2["Phase 2 — Census"]
        B1[Scope Manifest] --> B2[Component Inventory]
        A6 --> B2
        B2 --> B3[Duplicate Scoring]
        B3 --> B4[Migration Ledger]
        B4 --> B5[/"CENSUS_STATUS: READY_FOR_ATOMIZER"/]
        B5 --> B6[[Schema validation]]
    end

    subgraph PHASE_3["Phase 3 — Atomizer"]
        C1[Codebase Scan] --> C2[Diagnosis Report]
        C2 -->|HARD GATE| C3[Remediation]
        C3 --> C4[Component Mapping]
        C4 --> C5[/"Component Index"/]
        C5 --> C6[[Schema validation]]
    end

    subgraph PHASE_4["Phase 4 — UI Builder"]
        D1[Design JSON + Index] --> D2[Replace Styling with Tokens]
        D2 --> D3[Zero Functionality Changes]
    end

    subgraph PHASE_5["Phase 5 — UX Audit"]
        E1[Storybook Review] --> E2[Functionality Fixes]
        E2 --> E3[Index Validation]
    end

    subgraph PHASE_6["Phase 6 — Feature Builder (Ongoing)"]
        F1[Feature Spec] --> F2{Component exists?}
        F2 -->|Yes| F3[Use existing]
        F2 -->|No| F4{Meets shared criteria?}
        F4 -->|Shared| F5[Atomizer creates + registers]
        F4 -->|One-off| F6[In-page + data-cid stamp]
        F5 --> F7[Curator updates counts]
        F6 --> F7
    end

    subgraph STANDING["Standing Agents"]
        G1[Curator] --> G2{One-off 2+ locations?}
        G2 -->|Yes| G3[Flag for approval]
        G2 -->|No| G4[Hold]
        G3 --> G5[Atomizer promotes]
        H1[Postmortem Governor] --> H2[Archive + Track findings]
    end

    SETUP --> PHASE_1
    PHASE_1 --> PHASE_2
    PHASE_2 --> PHASE_3
    PHASE_3 --> PHASE_4
    PHASE_4 --> PHASE_5
    PHASE_5 --> PHASE_6
    PHASE_6 --> STANDING
    STANDING -->|"Index stays current"| PHASE_6
```

| Phase | Agent | Mode | What it produces |
| --- | --- | --- | --- |
| Setup | init + materialize | One-time | pack.config.json, runtime agents with resolved paths |
| 1 | Intake | One-time | Design JSON, CSS tokens, Tailwind extension, conflict audit. Schema-validated. |
| 2 | Census | One-time per wave | Component inventory, duplicate scoring, migration ledger. Gates Atomizer. Schema-validated. |
| 3 | Atomizer | One-time + standing | Component index, atomic structure. DEFER_ORGANISM for high-coupling components. Schema-validated. |
| 4 | UI Builder | One-time | Token-styled codebase. Zero functionality changes. |
| 5 | UX Audit | Manual | Validated components, corrected index. |
| 6 | Feature Builder | Ongoing | New UI from spec, using both indexes. |
|  | Curator | Standing | Usage telemetry, one-off tracking. Emits curator-report.md as developer work queue. |
|  | Postmortem Governor | Standing | Report archiving, finding lifecycle, drift detection. |
|  | Validation Layer | Per-run | JSON Schema enforcement on all structured outputs. P0 on failure. |

Where the Pipeline Hands Off to Developers

The pipeline is fully autonomous for atoms and molecules. Organisms and templates with complex runtime dependencies are handled differently — and the system is explicit about where its autonomy ends.

DEFER_ORGANISM — when the Atomizer steps back

When the Atomizer classifies a component as an organism or template, it checks for dependency patterns that make extraction to the shared library unsafe to do automatically:

| Trigger | Why it blocks extraction |
| --- | --- |
| Socket.IO import | Real-time state is tied to the component’s location |
| TipTap / Hocuspocus import | Collaborative editing context can’t be naively moved |
| AI streaming import | Streaming lifecycle coupled to parent page/route |
| Store imports | Store binding depends on the component’s context |
| Data-fetching lifecycle imports | Query scope may be tied to a route |
| 200+ lines AND 4+ imports | Complexity heuristic — likely a god component |

When any of these fire, the Atomizer applies design tokens in-place (TOKEN_PASS) but does not extract the component to the shared library. The deferral is recorded with the specific trigger. The developer handoff message: “Design tokens have been applied in-place. Moving this component to the shared library requires untangling its dependency first — that’s an architectural decision the pipeline can’t make safely.”
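A minimal sketch of that check, under stated assumptions: import matching is done by package-name substring, and the exact heuristics and package lists the Atomizer uses are not shown here:

```javascript
// Sketch of the DEFER_ORGANISM check. Trigger names follow the table
// above; the matching logic and package list are illustrative.
const BLOCKING_IMPORTS = ["socket.io", "@tiptap", "@hocuspocus"];

function shouldDefer({ imports, lineCount }) {
  const blocked = imports.some((name) =>
    BLOCKING_IMPORTS.some((pkg) => name.includes(pkg))
  );
  // Complexity heuristic from the table: 200+ lines AND 4+ imports.
  const tooComplex = lineCount >= 200 && imports.length >= 4;
  return blocked || tooComplex;
}
```

A deferred component still gets its TOKEN_PASS; only the extraction step is withheld.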

The developer decides: refactor and re-run, leave in-place permanently, or extract manually with full knowledge of the consequences. This is not a failure mode — it’s the pipeline correctly identifying its own boundary.

The Curator Report — a work queue, not a log

After each Curator run, curator-report.md is produced. It contains usage counts for all shared components, anomalies (zero-usage, unregistered, broken imports), promotion candidates, and explicit → HUMAN ACTION items for deferred organisms and anything else the pipeline flagged. It’s the document you send to the dev team after a pipeline run: a sorted work queue, not a debug log.


What Stayed the Same

The two indexes are structurally identical to v1. The design JSON is still a shallow index for visual intent: every token carrying a usage instruction written for an agent, not a human. The component index is still a shallow index for structural composition: composed_of pointing downward, used_in pointing upward, the Curator owning telemetry writes, the Atomizer owning structural writes.

The data-cid one-off tracking mechanism is unchanged. The atomic hierarchy governs component creation. The Postmortem Governor enforces governance across all stage runs. The human gates are unchanged: design decisions flagged by agents, one-off promotion approvals, and the UX audit all still require human judgment.
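The Curator's promotion rule from the pipeline diagram (a one-off appearing in two or more locations gets flagged) can be sketched like this. The counting logic is illustrative; only the threshold comes from the diagram:

```javascript
// Sketch of the Curator's promotion rule: a data-cid one-off seen in
// 2+ locations is flagged for promotion approval. Logic illustrative.
function promotionCandidates(stamps) {
  const counts = new Map();
  for (const { cid } of stamps) {
    counts.set(cid, (counts.get(cid) ?? 0) + 1);
  }
  return [...counts].filter(([, n]) => n >= 2).map(([cid]) => cid);
}
```

Flagged candidates still pass through the human approval gate before the Atomizer promotes them.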


Resources


This pipeline is part of ongoing work on codebase-first design tooling. The pack is freely available — use it, adapt it, and if you build something with it, reach out.