
Phase 18 — Prompt Library

Adds a versioned, searchable prompt library that lets agents discover task-specific prompts before constructing them from scratch, and contribute effective prompts back to a shared catalog. Modeled after the tool registry's immutable-versioned pattern (internal/registry/) with embedding-based semantic search via the existing vector infrastructure and a salience-inspired ranking formula.

Status: Completed (2026-02-09)
Depends on: Phases 1-14 complete
Migrations: 0025_prompt_library (Phase 18A)
Branch: dev


Why Now

With Phases 1-14 complete, Cruvero agents have a mature tool registry, memory system, and embedding infrastructure — but prompt construction remains entirely hardcoded:

  1. No prompt reuse — Every agent run constructs prompts from scratch in internal/agent/activities.go (LLMDecideActivity). There is no mechanism to search for previously successful prompts for similar tasks.
  2. Hardcoded prompt builders — System prompts, repair prompts, and routing prompts are built inline with string concatenation. Variations require code changes, not catalog updates.
  3. No quality tracking — There is no feedback loop recording which prompts produce good outcomes. Agents cannot learn from past prompt effectiveness across runs.
  4. No parameterization — Prompt templates with variable interpolation would allow a single prompt definition to serve multiple contexts (e.g., different tool sets, domains, or agent personalities).

Phase 18 solves all four by introducing internal/promptlib/ as a prompt catalog with content-hashed versioning, embedding-based search, quality metrics, and text/template rendering.


Architecture

New package: internal/promptlib/

All prompt storage, search, ranking, and rendering consolidates here. Agent tool executors (prompt_search, prompt_create) provide the agent-facing interface.

┌───────────────────────────────────────────────────────────────────┐
│ promptlib.Store │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ PostgresStore │ │ MetricsStore │ │ Renderer │ │
│ │ (prompt CRUD, │ │ (usage counts, │ │ (text/template│ │
│ │ immutable hash) │ │ success rate, │ │ interpolation│ │
│ │ │ │ LLM ratings) │ │ + validation)│ │
│ └────────┬─────────┘ └────────┬─────────┘ └────────┬───────┘ │
│ │ │ │ │
│ └─────────┬───────────┘ │ │
│ │ │ │
│ ┌──────▼──────┐ │ │
│ │ Searcher │ │ │
│ │ (3-stage) │ │ │
│ └──────┬──────┘ │ │
│ │ │ │
│ ┌────────────┼────────────┐ │ │
│ │ │ │ │ │
│ ┌─────▼─────┐ ┌────▼────┐ ┌────▼─────┐ │ │
│ │ Vector │ │ Ranking │ │ Result │ │ │
│ │ Retrieval │ │ Scorer │ │ Assembly │ │ │
│ │ (embed + │ │ (quality│ │ (render, │ │ │
│ │ search) │ │ +recen-│ │ format) │ │ │
│ │ │ │ cy+use)│ │ │ │ │
│ └───────────┘ └─────────┘ └──────────┘ │ │
│ │ │
│ External deps (reused, not owned): │ │
│ ├─ internal/embedding/Embedder │ │
│ ├─ internal/vectorstore/VectorStore (collection: │ │
│ │ "prompt_library") │ │
│ ├─ internal/memory/salience.go (ComputeRecency, │ │
│ │ ComputeUsageFrequency) │ │
│ └─ internal/tenant/ (multi-tenant isolation) │ │
└───────────────────────────────────────────────────────────────────┘

Core API

// Store manages prompt CRUD with content-hash immutability.
type Store interface {
	Get(ctx context.Context, id string, version int) (Prompt, error)
	GetByHash(ctx context.Context, hash string) (Prompt, error)
	GetLatest(ctx context.Context, id string) (Prompt, error)
	Put(ctx context.Context, prompt Prompt) error
	List(ctx context.Context, filter ListFilter) ([]Prompt, error)
}

// Searcher finds prompts by semantic similarity + quality ranking.
type Searcher interface {
	Search(ctx context.Context, query SearchQuery) ([]ScoredPrompt, error)
}

// Renderer applies text/template interpolation to prompt content.
type Renderer interface {
	Render(prompt Prompt, params map[string]interface{}) (string, error)
	ValidateParams(prompt Prompt, params map[string]interface{}) error
}

// MetricsStore tracks mutable quality signals separately from immutable prompts.
type MetricsStore interface {
	RecordUsage(ctx context.Context, promptHash string, outcome UsageOutcome) error
	RecordFeedback(ctx context.Context, promptHash string, feedback Feedback) error
	GetMetrics(ctx context.Context, promptHash string) (PromptMetrics, error)
}

Key Types

type Prompt struct {
	ID          string          `json:"id"`
	Version     int             `json:"version"`
	Hash        string          `json:"hash"`
	Type        PromptType      `json:"type"`
	Name        string          `json:"name"`
	Description string          `json:"description"`
	Content     string          `json:"content"`
	Parameters  []ParamDef      `json:"parameters,omitempty"`
	Tags        []string        `json:"tags,omitempty"`
	Author      string          `json:"author"`
	TenantID    string          `json:"tenant_id"`
	CreatedAt   time.Time       `json:"created_at"`
	Metadata    json.RawMessage `json:"metadata,omitempty"`
}

type PromptType string

const (
	PromptTypeSystem         PromptType = "system"
	PromptTypeUser           PromptType = "user"
	PromptTypeTask           PromptType = "task"
	PromptTypeRepair         PromptType = "repair"
	PromptTypeRouting        PromptType = "routing"
	PromptTypeToolDesc       PromptType = "tool_description"
	PromptTypeChainOfThought PromptType = "chain_of_thought"
	PromptTypeCustom         PromptType = "custom"
)

type ParamDef struct {
	Name        string `json:"name"`
	Type        string `json:"type"`
	Required    bool   `json:"required"`
	Default     string `json:"default,omitempty"`
	Description string `json:"description,omitempty"`
}

type PromptMetrics struct {
	PromptHash   string    `json:"prompt_hash"`
	UsageCount   int       `json:"usage_count"`
	SuccessCount int       `json:"success_count"`
	FailureCount int       `json:"failure_count"`
	AvgLLMRating float64   `json:"avg_llm_rating"`
	LastUsedAt   time.Time `json:"last_used_at"`
}

type ScoredPrompt struct {
	Prompt     Prompt          `json:"prompt"`
	Score      float64         `json:"score"`
	Components ScoreComponents `json:"components"`
}

type ScoreComponents struct {
	Similarity float64 `json:"similarity"`
	Quality    float64 `json:"quality"`
	Recency    float64 `json:"recency"`
	Usage      float64 `json:"usage"`
}

Content Hashing (Immutability)

Mirrors registry.ComputeHash (internal/registry/types.go:65-78):

func ComputeHash(id string, version int, content string, promptType PromptType) (string, error) {
	payload := hashInput{ID: id, Version: version, Content: content, Type: string(promptType)}
	b, err := json.Marshal(payload)
	if err != nil {
		return "", fmt.Errorf("marshal hash input: %w", err)
	}
	h := sha256.Sum256(b)
	return hex.EncodeToString(h[:]), nil
}

Store uses INSERT ... ON CONFLICT DO NOTHING + hash verification — same pattern as registry.PostgresStore.Put (internal/registry/store.go:108-144).


Search Pipeline

Three-stage pipeline:

Stage 1: Vector Retrieval

  1. Embed query text using embedding.Embedder.Embed() (internal/embedding/embedder.go:23)
  2. Search prompt_library collection via vectorstore.VectorStore.Search() (internal/vectorstore/store.go:35)
  3. Apply tenant isolation filter (internal/tenant/)
  4. Retrieve top-K candidates (default K=20)
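The similarity signal this stage produces is ordinary cosine similarity over embedding vectors; a self-contained sketch (the actual computation happens inside the vector store, not in promptlib):

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
// This is the raw similarity signal consumed by Stage 2 re-ranking.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0 // degenerate zero vector: no meaningful similarity
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	fmt.Println(cosine([]float64{1, 0}, []float64{1, 0})) // identical direction -> 1
	fmt.Println(cosine([]float64{1, 0}, []float64{0, 1})) // orthogonal -> 0
}
```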

Stage 2: Re-Ranking

Score each candidate using a weighted formula adapted from memory.SalienceScorer (internal/memory/salience.go:51-65):

score = W_sim * similarity + W_qual * quality + W_rec * recency + W_use * usage
| Weight | Default | Source |
|---|---|---|
| W_sim (similarity) | 0.4 | Vector cosine similarity from Stage 1 |
| W_qual (quality) | 0.3 | success_rate * avg_llm_rating from prompt_metrics |
| W_rec (recency) | 0.2 | ComputeRecency(created_at, now, half_life) from memory/salience.go:155 |
| W_use (usage) | 0.1 | ComputeUsageFrequency(usage_count, max_count) from memory/salience.go:187 |
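The formula and its default weights reduce to a small pure function; a sketch with illustrative names (the real `PromptScorer` reads its weights from config and also populates `ScoreComponents` for transparency):

```go
package main

import "fmt"

// Weights mirrors the four ranking weights from config (defaults shown in main).
type Weights struct {
	Sim, Qual, Rec, Use float64
}

// Score combines the four Stage 2 signals into one composite ranking value:
// score = W_sim*similarity + W_qual*quality + W_rec*recency + W_use*usage
func Score(w Weights, similarity, quality, recency, usage float64) float64 {
	return w.Sim*similarity + w.Qual*quality + w.Rec*recency + w.Use*usage
}

func main() {
	w := Weights{Sim: 0.4, Qual: 0.3, Rec: 0.2, Use: 0.1}
	// A perfect candidate on every signal.
	fmt.Printf("%.2f\n", Score(w, 1, 1, 1, 1)) // 1.00
	// High similarity but no quality history yet: ranks lower than a
	// proven prompt with similar relevance.
	fmt.Printf("%.2f\n", Score(w, 0.9, 0, 0.5, 0)) // 0.46
}
```

Stage 3 then sorts candidates by this composite score and truncates to the result limit.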

Stage 3: Result Assembly

  1. Sort by composite score
  2. Truncate to requested limit (default 5)
  3. Optionally render templates with provided parameters
  4. Return []ScoredPrompt with score components for transparency

Feedback System

LLM Auto-Feedback

After each agent run that used a library prompt, the LLM self-assesses prompt effectiveness:

type UsageOutcome struct {
	PromptHash string  `json:"prompt_hash"`
	RunID      string  `json:"run_id"`
	StepIdx    int     `json:"step_idx"`
	Success    bool    `json:"success"`
	LLMRating  float64 `json:"llm_rating"` // 0.0-1.0
	TenantID   string  `json:"tenant_id"`
}

Recorded as a Temporal activity (non-blocking, fire-and-forget). Updates prompt_metrics table via MetricsStore.RecordUsage().

Optional User Feedback

Non-blocking user feedback via CLI/API signal — never blocks workflow:

type Feedback struct {
	PromptHash string  `json:"prompt_hash"`
	UserID     string  `json:"user_id"`
	Rating     float64 `json:"rating"` // 0.0-1.0
	Comment    string  `json:"comment,omitempty"`
	TenantID   string  `json:"tenant_id"`
}

Recorded via MetricsStore.RecordFeedback(). Feedback is additive — it adjusts the running average but cannot delete or modify prompt content (immutable).
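The running-average bookkeeping implied by the `total_rating` and `rating_count` columns of `prompt_metrics` can be sketched in memory (the `metrics` type here is a stand-in, not the actual `PostgresMetricsStore`):

```go
package main

import "fmt"

// metrics mirrors the mutable rating columns of prompt_metrics.
type metrics struct {
	totalRating float64
	ratingCount int
}

// recordRating is additive: it adjusts the running average and can
// never touch the immutable prompt content.
func (m *metrics) recordRating(rating float64) {
	m.totalRating += rating
	m.ratingCount++
}

// avg derives avg_llm_rating; zero when no ratings exist yet.
func (m *metrics) avg() float64 {
	if m.ratingCount == 0 {
		return 0
	}
	return m.totalRating / float64(m.ratingCount)
}

func main() {
	var m metrics
	m.recordRating(1.0)
	m.recordRating(0.5)
	fmt.Printf("%.2f\n", m.avg()) // 0.75
}
```

Storing the total and count (rather than the average itself) keeps each update a cheap increment and avoids compounding rounding error.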


Template Rendering

Prompts use Go text/template for parameterized content:

// Example prompt content:
// "You are a {{.Role}} agent. Your task is to {{.Task}}. Available tools: {{range .Tools}}{{.Name}}, {\{end\}}"

func (r *TemplateRenderer) Render(prompt Prompt, params map[string]interface{}) (string, error) {
	tmpl, err := template.New(prompt.ID).Parse(prompt.Content)
	if err != nil {
		return "", fmt.Errorf("invalid template: %w", err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, params); err != nil {
		return "", fmt.Errorf("template execution failed: %w", err)
	}
	return buf.String(), nil
}

Parameter validation checks required params are present and types match ParamDef definitions before rendering.
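The required-parameter half of that check might look like this (a simplified sketch; the real validator would also match each value against `ParamDef.Type`):

```go
package main

import "fmt"

// ParamDef matches the promptlib type, reduced to the fields this check needs.
type ParamDef struct {
	Name     string
	Required bool
}

// validateParams ensures every required parameter is present before rendering.
func validateParams(defs []ParamDef, params map[string]interface{}) error {
	for _, d := range defs {
		if _, ok := params[d.Name]; d.Required && !ok {
			return fmt.Errorf("missing required param %q", d.Name)
		}
	}
	return nil
}

func main() {
	defs := []ParamDef{{Name: "Role", Required: true}, {Name: "Task", Required: true}}
	fmt.Println(validateParams(defs, map[string]interface{}{"Role": "coder"}))
	fmt.Println(validateParams(defs, map[string]interface{}{"Role": "coder", "Task": "refactor"}))
}
```

Validating before `Execute` matters because `text/template` renders missing map keys as `<no value>` by default rather than failing.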


Agent Access via Tools

Two tool executors following the memory_read/memory_write pattern (internal/tools/memory_read.go, internal/tools/memory_write.go):

prompt_search Tool

type PromptSearchTool struct {
	searcher Searcher
	renderer Renderer
}

func (t *PromptSearchTool) Name() string { return "prompt_search" }

// Schema:
// {
// "type": "object",
// "properties": {
// "query": {"type": "string"},
// "type": {"type": "string", "enum": ["system","user","task","repair","routing","tool_description","chain_of_thought","custom"]},
// "tags": {"type": "array", "items": {"type": "string"}},
// "params": {"type": "object"},
// "limit": {"type": "integer"}
// },
// "required": ["query"]
// }

prompt_create Tool

type PromptCreateTool struct {
	store    Store
	embedder embedding.Embedder
	vs       vectorstore.VectorStore
}

func (t *PromptCreateTool) Name() string { return "prompt_create" }

// Schema:
// {
// "type": "object",
// "properties": {
// "name": {"type": "string"},
// "type": {"type": "string", "enum": ["system","user","task","repair","routing","tool_description","chain_of_thought","custom"]},
// "description": {"type": "string"},
// "content": {"type": "string"},
// "parameters": {"type": "array", "items": {"type": "object"}},
// "tags": {"type": "array", "items": {"type": "string"}}
// },
// "required": ["name", "type", "content"]
// }

Both tools are explicitly invoked by the agent — not auto-injected. Every access appears in the decision log via normal tool execution audit trail.


Sub-Phases

| Sub-Phase | Name | Prompts | Depends On |
|---|---|---|---|
| 18A | Foundation: Types, Store, Hash, Renderer, Migration | 5 | (none) |
| 18B | Search + Ranking: Embedder Wiring, Searcher, Scorer | 5 | 18A |
| 18C | Agent Integration + Feedback: Tools, Activities, Wiring | 4 | 18B |
| 18D | CLI, Testing & Ops: Seed, Query, Feedback CLIs, Tests | 4 | 18C |

Total: 4 sub-phases, 18 prompts, 9 documentation files

Dependency Graph

18A (Foundation) → 18B (Search/Ranking) → 18C (Agent Integration) → 18D (CLI/Testing)

Strictly sequential: each sub-phase builds on the previous.


Environment Variables

| Variable | Default | Description |
|---|---|---|
| CRUVERO_PROMPTLIB_ENABLED | true | Enable prompt library |
| CRUVERO_PROMPTLIB_COLLECTION | prompt_library | Vector store collection name |
| CRUVERO_PROMPTLIB_SEARCH_K | 20 | Vector retrieval candidates (Stage 1) |
| CRUVERO_PROMPTLIB_RESULT_LIMIT | 5 | Max results returned to agent |
| CRUVERO_PROMPTLIB_W_SIMILARITY | 0.4 | Ranking weight: vector similarity |
| CRUVERO_PROMPTLIB_W_QUALITY | 0.3 | Ranking weight: quality score |
| CRUVERO_PROMPTLIB_W_RECENCY | 0.2 | Ranking weight: recency decay |
| CRUVERO_PROMPTLIB_W_USAGE | 0.1 | Ranking weight: usage frequency |
| CRUVERO_PROMPTLIB_HALF_LIFE | 168h | Recency decay half-life (7 days) |
| CRUVERO_PROMPTLIB_FEEDBACK_ENABLED | true | Enable user feedback recording |
| CRUVERO_PROMPTLIB_AUTO_FEEDBACK | true | Enable LLM self-assessment after prompt use |

Files Overview

New Files

| File | Sub-Phase | Description |
|---|---|---|
| internal/promptlib/types.go | 18A | Prompt, PromptType, ParamDef, PromptMetrics, ScoredPrompt, ScoreComponents |
| internal/promptlib/store.go | 18A | Store interface + PostgresStore (CRUD, hash immutability) |
| internal/promptlib/metrics_store.go | 18A | MetricsStore interface + PostgresMetricsStore |
| internal/promptlib/hash.go | 18A | ComputeHash (SHA256, mirrors registry pattern) |
| internal/promptlib/renderer.go | 18A | Renderer interface + TemplateRenderer (text/template) |
| internal/promptlib/searcher.go | 18B | Searcher interface + DefaultSearcher (3-stage pipeline) |
| internal/promptlib/scorer.go | 18B | PromptScorer (ranking formula, weight config) |
| internal/promptlib/indexer.go | 18B | Indexer (embed + upsert to vector store on Put) |
| internal/promptlib/config.go | 18B | Config wiring + component assembly from env vars |
| internal/tools/prompt_search.go | 18C | PromptSearchTool executor |
| internal/tools/prompt_create.go | 18C | PromptCreateTool executor |
| internal/promptlib/feedback.go | 18C | Feedback types + RecordUsageActivity (Temporal) |
| cmd/prompt-seed/main.go | 18D | CLI to seed prompt library from YAML/JSON files |
| cmd/prompt-query/main.go | 18D | CLI to search prompt library |
| cmd/prompt-feedback/main.go | 18D | CLI to submit user feedback |
| migrations/0025_prompt_library.up.sql | 18A | Create prompts + prompt_metrics tables |
| migrations/0025_prompt_library.down.sql | 18A | Drop tables |
| internal/promptlib/types_test.go | 18D | Type validation and JSON round-trip tests |
| internal/promptlib/hash_test.go | 18D | Hash determinism and uniqueness tests |
| internal/promptlib/store_test.go | 18D | PostgresStore tests (sqlmock) |
| internal/promptlib/metrics_store_test.go | 18D | PostgresMetricsStore tests (sqlmock) |
| internal/promptlib/renderer_test.go | 18D | TemplateRenderer tests |
| internal/promptlib/indexer_test.go | 18D | Indexer tests (mock embedder + vector store) |
| internal/promptlib/scorer_test.go | 18D | PromptScorer tests |
| internal/promptlib/searcher_test.go | 18D | DefaultSearcher pipeline tests |
| internal/promptlib/config_test.go | 18D | Config loading and validation tests |
| internal/promptlib/feedback_test.go | 18D | Feedback activity tests |
| docs/manual/prompt-library.md | 18D | Feature manual page |

Modified Files

| File | Sub-Phase | Change |
|---|---|---|
| internal/tools/manager.go | 18C | Register prompt_search and prompt_create executors |
| internal/agent/activities.go | 18C | Wire optional prompt library lookup before LLM prompt construction |
| internal/config/config.go | 18B | Add promptlib config fields + env var loading |

Migration: 0025_prompt_library

-- 0025_prompt_library.up.sql

CREATE TABLE IF NOT EXISTS prompts (
    tenant_id TEXT NOT NULL DEFAULT '_global',
    id TEXT NOT NULL,
    version INTEGER NOT NULL,
    hash TEXT NOT NULL,
    type TEXT NOT NULL,
    name TEXT NOT NULL,
    description TEXT NOT NULL DEFAULT '',
    content TEXT NOT NULL,
    parameters JSONB,
    tags TEXT[] DEFAULT '{}',
    author TEXT NOT NULL DEFAULT '',
    metadata JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (tenant_id, id, version),
    UNIQUE (tenant_id, hash)
);

CREATE INDEX idx_prompts_type ON prompts (tenant_id, type);
CREATE INDEX idx_prompts_tags ON prompts USING GIN (tags);
CREATE INDEX idx_prompts_hash ON prompts (hash);

CREATE TABLE IF NOT EXISTS prompt_metrics (
    prompt_hash TEXT NOT NULL PRIMARY KEY,
    tenant_id TEXT NOT NULL DEFAULT '_global',
    usage_count INTEGER NOT NULL DEFAULT 0,
    success_count INTEGER NOT NULL DEFAULT 0,
    failure_count INTEGER NOT NULL DEFAULT 0,
    total_rating DOUBLE PRECISION NOT NULL DEFAULT 0,
    rating_count INTEGER NOT NULL DEFAULT 0,
    last_used_at TIMESTAMPTZ,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_prompt_metrics_tenant ON prompt_metrics (tenant_id);

Success Metrics

| Metric | Target |
|---|---|
| Prompt type coverage | 8 types (system, user, task, repair, routing, tool_description, chain_of_thought, custom) |
| Search latency (vector + re-rank) | < 50ms p99 |
| Store immutability | Hash verification on every Put (0 content mutations) |
| Template rendering | < 1ms p99 |
| Feedback recording | Non-blocking, < 5ms fire-and-forget |
| Agent tool integration | prompt_search + prompt_create registered and functional |
| Multi-tenant isolation | All queries scoped by tenant_id |
| Quality signal accuracy | LLM auto-rating within 0.15 of user rating (when both present) |
| Backward compatibility | Existing prompt construction in activities.go unchanged when library disabled |
| Test coverage | >= 80% for internal/promptlib/ (enforced by scripts/check-coverage.sh) |

Risk Mitigation

| Risk | Mitigation |
|---|---|
| Cold start (empty library) | cmd/prompt-seed CLI pre-loads curated prompts. Library search returns empty gracefully; the agent falls back to hardcoded builders. |
| Low-quality prompt proliferation | Quality score incorporates success rate + LLM rating. Low-quality prompts naturally sink in rankings. |
| Template injection via parameters | text/template interpolates parameters as data values; they are never re-parsed as template syntax. Parameter validation enforces types. Content is immutable (can't be modified post-creation). |
| Vector search latency at scale | Stage 1 retrieval bounded by K=20. Re-ranking is in-memory, O(K log K). Collection uses existing vector infrastructure. |
| Embedding cost for indexing | Embeddings generated once on Put, cached in vector store. Search embeds query only (single call). |
| Breaking existing prompt construction | Library is opt-in. When disabled (CRUVERO_PROMPTLIB_ENABLED=false), activities.go prompt builders are unchanged. |

Relationship to Other Phases

| Phase | Relationship |
|---|---|
| Phase 5 (Memory) | 18B reuses memory.ComputeRecency and ComputeUsageFrequency for ranking |
| Phase 6 (Tool Registry) | 18A mirrors registry.Store immutability pattern (hash, ON CONFLICT, tenant isolation) |
| Phase 8 (Embeddings + Vector) | 18B reuses embedding.Embedder and vectorstore.VectorStore with new collection |
| Phase 14 (API) | API endpoints can expose prompt library search/create via existing route patterns |
| Phase 17 (PII Guard) | PII filtering applies to prompt content at output boundary (no special handling needed) |

Progress Notes

(none yet)