What is a prompt registry?

A canonical store of prompt IDs, versions, environment pins, owners, and eval links—so production loads addressable templates, not wiki copy that drifts.

Who owns prompt versions in production?

The line-of-business process owner is accountable for outcomes and approves wording changes; IT pins versions in integrations; Legal consults on policy-bearing prompts.

When should we block a prompt release?

When smoke eval fails, eval set ID is missing, policy pack version is out of sync, or no named approver is listed for customer-facing changes.

The Prompt Registry Playbook

Production prompts fail quietly when they live in shared docs, chat threads, and developer laptops. Someone “just fixes” wording during a Friday incident; three teams fork different versions; Legal discovers customer-facing language that never went through review. A prompt registry is the operating answer: versioned templates, named owners, environment pins, and releases tied to evaluation hooks—the same discipline you expect from application code, applied to the text that frames model behavior.

This playbook is the operational guide for teams that already understand the blueprint in Structured Prompt System Blueprint. It walks through registry schema, promotion paths, rollback, and the release rituals that keep prompts auditable under real compliance pressure. Pair it with context architecture so prompt versions and policy packs move together, and with governance roles so someone can say no to scope creep. Registry terms: Glossary.

Why shared docs are not a registry

Shared documents fail for three predictable reasons. Drift: the wiki says one thing, the integration loads another, and nobody notices until a customer does. Audit: you cannot answer who changed customer-facing language on May 14 or roll back to the prior wording without archaeology in chat logs. Ownership: “everyone can edit” means no one is accountable when eval pass rate drops after a “small tweak.”

A registry makes prompts addressable: stable prompt_id, semver or calendar versions, environment pins (dev / staging / prod), and an explicit link to the eval set that gates promotion. Process owners see the same artifact IT deploys; Legal can approve a version hash, not a moving Google Doc.

Northline B2B blocked a second copilot purchase until support-reply-v3 held ninety-two percent eval pass for four weeks—not because they lacked models, but because they lacked a single prod pin and changelog. The registry was the artifact that made that conversation factual.

Registry schema (minimum viable)

Each live workflow step gets one row—not prose in a deck. Extend with your ITSM or git tooling as needed.

Field	Purpose
`prompt_id`	Stable identifier per workflow step (task framing, checker, retrieval query)
`workflow_id`	Parent workflow for RACI and audit correlation
`owner`	Accountable human for business wording and release approval
`version`	Semver or calendar version; never edit prod in place
`environment`	`dev` / `staging` / `prod` / `retired`
`template_hash`	Hash of deployed text for replay and audit
`eval_set_id`	Held-out cases that gate promotion
`min_pass_rate`	Threshold for pilot and prod promotion
`approved_by`	Legal or sponsor for policy-bearing prompts
`changelog_ref`	Ticket or commit linking to why version bumped

prompt_id is stable across versions—like an API route. version bumps when wording, constraints, or output contract change. template_hash lets audit prove which text ran on ticket #4821 six months later. eval_set_id and min_pass_rate connect to smoke, pilot, and scale gates documented in evaluation hooks.

Sample production row

prompt_id: support-reply-v3/task-framing
workflow_id: support-reply-v3
owner: support-ops@northline.example
version: 1.4.2
environment: prod
template_hash: sha256:a1b2c3…
eval_set_id: support-reply-eval-25
min_pass_rate: 0.92
last_release: 2026-05-10
approved_by: legal-counsel@northline.example
changelog_ref: RISK-2041

Promotion path at Northline: change in dev → smoke on ten cases → pilot threshold for two weeks → risk forum vote for shadow traffic increase → prod pin update with changelog line.

Version bump vs new prompt_id

Teams accumulate sprawl when every experiment gets a new ID. Use these rules to keep the registry legible.

Bump version when the workflow step is the same but wording, constraints, or output contract changed—task framing 1.4.2 → 1.4.3. Re-run eval; update template_hash; log in changelog.

Add a new prompt_id when the step is genuinely new—e.g., adding a checker/refund-policy step that did not exist, or splitting retrieval query from task framing because they now have different owners and release cadences.

Do not fork IDs for A/B model tests; use environment pins and feature flags on the same ID where possible so audit and eval stay comparable.

Retire old versions to archive, not delete—regulators and customers may ask what ran on a date six months ago. Northline keeps retired rows read-only with environment: retired.

If you cannot explain in one sentence why a new ID exists, you probably only needed a version bump.

Release checklist (operational)

Treat promotion like a software release because it is one.

Bump version in registry; never edit prod in place.
Run eval smoke plus pilot thresholds documented for the workflow.
Log release in changelog with ticket link and approver names.
Notify process owner and risk forum if customer-facing or policy-bearing.
Spot-check ten audit trail rows after promotion—inputs, policy_pack_version, sent text.

Block Friday prod releases without a named on-call owner. Pair registry changes with context pack version bumps when Legal publishes new policy text—prompt and context move together or eval lies.

Use the AI workflow eval checklist as the copy-paste gate document for go-live reviews.

Rollout without boiling the ocean

Northline’s first registry mistake would have been entering every chat template in the company on day one. They started with one workflow: support-reply-v3. Support ops owned three prompt IDs (task framing, retrieval query template, checker). IT pinned prod hashes in integration config—wiki copy is reference only, not authoritative.

Week 1–2: Workflow canvas completed; context spec linked from architecture guide; eval set of twenty-five cases drafted.

Week 3–4: Registry rows in dev; smoke gate enforced in CI for pull requests touching prompt files.

Week 5–8: Pilot at fifty percent tier-2 queue; weekly pass rate in risk forum; no Friday prod releases without on-call owner listed in changelog.

After pass rate held: Prod pin updated; shadow traffic increased per forum decision; policy_pack_version added to logs after a near-miss.

Second workflow (routing agent) cloned the matrix with a narrower allow list and its own eval set—expanded only after fifty-case pass, per data boundaries.

Rollback and incident response

When eval pass rate drops after a release, rollback is a registry operation, not a panic edit in prod.

Pin prior version and template_hash in prod environment.
Log rollback in changelog with incident ticket ID.
Add failing cases to eval set within one week.
Risk forum reviews root cause—prompt drift, context pack mismatch, or model change.

Northline rolled back task framing 1.4.3 to 1.4.2 in eleven minutes because prod pins lived in config tied to registry rows—not in a shared doc someone was editing concurrently.

Anti-patterns

Registry as documentation only while prod loads prompts from a developer’s laptop export.
Shared doc “source of truth” with no hash or version in logs—audit cannot replay.
Eval optional for “internal” workflows that later face customers after a rebrand.
Multiple prompt_ids for the same step because teams could not agree on semver.
Prompt changes without context pack version when Legal updated policy last week.

Where to go next

Document the workflow on the canvas before adding registry rows. Wire evaluation hooks before scaling traffic. Align context layers in what is context architecture so prompt versions and policy packs version together. When ownership is unclear, start with governance roles RACI so someone can say no to scope creep—not another tool purchase.

For regression testing practice, see Prompt Regression Testing in One Week. For RACE vs TAG framework choice in registry entries, see Prompt Frameworks for Business. For agent tool integration that loads prompts at runtime, pair this playbook with Model Context Protocol for Enterprise Teams when agents call external systems.