Phase 9: Enterprise Hardening

Status: Completed (2026-02-09)

Makes Cruvero production-ready for multi-tenant workloads, with security, compliance, and operational resilience delivered as infrastructure guarantees.

Why This Phase Matters

Cruvero's core value proposition is "production survival." Phases 1–8 built the runtime, tools, memory, multi-agent coordination, and observability. Phase 9 ensures the platform can be operated by teams you don't control, for workloads you didn't anticipate, under compliance regimes you must satisfy.

This is the difference between "works on my machine" and "SOC 2 auditor approved."

Design Philosophy

Tenant isolation is not a feature — it's a property of the architecture. Every boundary (namespace, quota, network, audit) is enforced at the infrastructure layer (Temporal namespaces, Postgres row-level security, network policies) rather than application-level checks that can be bypassed.

Zero-trust by default. Every tool call, LLM invocation, and state mutation is authenticated, authorized, and auditable. Security may be explicitly relaxed for development, but production should never depend on someone remembering to switch it on.

Compliance as code. Audit trails, PII detection, and export formats are automated pipelines — not manual processes bolted on after the fact.

Subphases

| Subphase | Scope | Est. Duration |
|---|---|---|
| 9A | Multi-Tenancy & Namespace Isolation | 2 weeks |
| 9B | Rate Limiting, Quotas & Cost Guardrails | 1.5 weeks |
| 9C | Audit Logging & Compliance | 2 weeks |
| 9D | Security Hardening & I/O Sanitization | 2 weeks |
| 9E | High Availability & Disaster Recovery | 1.5 weeks |

Total estimate: 8–10 weeks (the subphase estimates sum to 9 weeks; some subphases can run in parallel).

Subphase Index

| Subphase | Title | Key Deliverable | Prompts |
|---|---|---|---|
| 9A | Multi-Tenancy & Namespace Isolation | Tenant CRUD, Temporal namespaces, RLS, memory/registry scoping | 4 prompts |
| 9B | Rate Limiting, Quotas & Cost Guardrails | Token bucket limiter, cost caps, model downgrade, quota dashboard | 3 prompts |
| 9C | Audit Logging & Compliance | Hash-chained audit trail, PII detection, SOC 2/HIPAA exports | 3 prompts |
| 9D | Security Hardening & I/O Sanitization | gVisor/nsjail sandbox, prompt injection defense, network policies, Vault | 4 prompts |
| 9E | High Availability & Disaster Recovery | Health checks, LLM failover, K8s manifests, DR playbook, runbooks | 3 prompts |

Dependencies

  • Phase 2 (signals, queries, decision log) — required
  • Phase 4 (memory) — required for tenant-scoped memory isolation
  • Phase 5 (supervisor) — required for multi-tenant agent coordination
  • Phase 6B (cost tracking) — required for quota enforcement
  • Phase 8C (observability, auth) — required for OIDC integration and OTEL pipeline

Architecture Decisions

Tenant Model

One Temporal namespace per tenant. This gives hard isolation at the workflow engine level — tenants cannot see, signal, or query each other's workflows. The alternative (shared namespace with workflow-ID prefixing) was rejected because it relies on application-level enforcement and breaks Temporal's native access controls.
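A minimal sketch of how a request could be routed to its tenant's namespace. The `tenant-<id>` naming scheme, the ID validation rule, and the `NamespaceFor` helper are hypothetical and not taken from `internal/tenant/namespace.go`. Isolation itself comes from Temporal; this only shows the resolution step, including the `single` mode fallback to the default namespace (`CRUVERO_TENANT_DEFAULT_NAMESPACE`).

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// TenantMode mirrors the values of CRUVERO_TENANT_MODE.
type TenantMode string

const (
	ModeSingle TenantMode = "single"
	ModeMulti  TenantMode = "multi"
)

// ErrInvalidTenant is returned for IDs that are unsafe to embed in a namespace name.
var ErrInvalidTenant = errors.New("invalid tenant id")

// tenantIDPattern is a hypothetical rule: lowercase alphanumerics and dashes,
// starting with an alphanumeric, at most 63 characters.
var tenantIDPattern = regexp.MustCompile(`^[a-z0-9][a-z0-9-]{0,62}$`)

// NamespaceFor resolves the Temporal namespace for a tenant's workflows.
// Single mode: every caller shares defaultNS.
// Multi mode: one namespace per tenant, named "tenant-<id>" (hypothetical scheme).
func NamespaceFor(mode TenantMode, defaultNS, tenantID string) (string, error) {
	switch mode {
	case ModeSingle:
		return defaultNS, nil
	case ModeMulti:
		if !tenantIDPattern.MatchString(tenantID) {
			return "", fmt.Errorf("%w: %q", ErrInvalidTenant, tenantID)
		}
		return "tenant-" + tenantID, nil
	default:
		return "", fmt.Errorf("unknown tenant mode %q", mode)
	}
}
```

The resolved name would then be passed as the `Namespace` field of the Temporal Go SDK's `client.Options` when dialing, so each tenant's client is scoped to its own namespace. Validating the ID before building the name keeps caller input from selecting an arbitrary namespace.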

Quota Enforcement Layer

Quotas are enforced via a middleware activity wrapper that checks tenant limits before every LLM call and tool execution. This is not a rate limiter in front of the API — it's baked into the workflow execution path, so even replayed or continued-as-new workflows respect current quotas.

Audit Storage

Audit events go to an append-only Postgres table with hash chaining: each event includes the hash of the previous event. This provides tamper evidence without requiring external blockchain infrastructure. Export pipelines produce SOC 2- and HIPAA-compatible formats.
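A minimal sketch of the hash chain, assuming SHA-256 over each event's JSON encoding; the actual hash function and field set in `internal/audit/chain.go` are not specified here. `AuditEvent`, `Append`, and `Verify` are hypothetical, simplified names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// AuditEvent is a simplified, hypothetical event shape.
type AuditEvent struct {
	Seq      int64  `json:"seq"`
	TenantID string `json:"tenant_id"`
	Action   string `json:"action"`
	PrevHash string `json:"prev_hash"`
	Hash     string `json:"-"` // excluded from the hashed payload
}

// computeHash hashes the event's JSON encoding, which includes PrevHash.
func computeHash(e AuditEvent) string {
	payload, _ := json.Marshal(e) // plain strings/ints: Marshal cannot fail here
	sum := sha256.Sum256(payload)
	return hex.EncodeToString(sum[:])
}

// Append links a new event to the tail of the chain.
func Append(chain []AuditEvent, tenantID, action string) []AuditEvent {
	prev := ""
	if n := len(chain); n > 0 {
		prev = chain[n-1].Hash
	}
	e := AuditEvent{Seq: int64(len(chain)), TenantID: tenantID, Action: action, PrevHash: prev}
	e.Hash = computeHash(e)
	return append(chain, e)
}

// Verify returns the index of the first broken link, or -1 if the chain is intact.
func Verify(chain []AuditEvent) int {
	prev := ""
	for i, e := range chain {
		if e.PrevHash != prev || computeHash(e) != e.Hash {
			return i
		}
		prev = e.Hash
	}
	return -1
}
```

Editing any stored event makes its recomputed hash disagree with the stored one, so `Verify` reports it. Someone able to rewrite every later row could recompute the rest of the chain, which is why the append-only property of the table matters as much as the hashing.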

Security Layers

| Layer | Mechanism |
|---|---|
| Tool sandbox | gVisor/nsjail for python_exec/bash_exec |
| Input sanitization | Pre-LLM prompt injection detection |
| Output filtering | PII redaction, sensitive data masking |
| Network policies | Per-tool egress rules, deny-by-default |
| Secret injection | Vault/OIDC per-tenant, no env vars in prod |
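The deny-by-default rule for per-tool egress can be sketched as an allow-list lookup. The `EgressPolicy` type and its `*.domain` wildcard syntax are assumptions; the real rules are defined in `internal/security/network_policy.go`. The sketch covers only the decision logic, not how the decision is enforced on the network.

```go
package main

import (
	"net/url"
	"strings"
)

// EgressPolicy maps a tool name to the hosts it may contact (hypothetical shape).
type EgressPolicy map[string][]string

// Allowed is deny-by-default: unknown tools, unparsable URLs, and hosts not on
// the tool's list are all rejected. A "*.example.com" rule matches subdomains
// of example.com but not example.com itself.
func (p EgressPolicy) Allowed(tool, rawURL string) bool {
	rules, ok := p[tool]
	if !ok {
		return false // tool has no egress rules at all
	}
	u, err := url.Parse(rawURL)
	if err != nil || u.Hostname() == "" {
		return false
	}
	host := strings.ToLower(u.Hostname())
	for _, rule := range rules {
		rule = strings.ToLower(rule)
		if strings.HasPrefix(rule, "*.") {
			// rule[1:] keeps the leading dot, so "evilexample.com" cannot match.
			if strings.HasSuffix(host, rule[1:]) {
				return true
			}
		} else if host == rule {
			return true
		}
	}
	return false
}
```

For example, a policy of `{"http_fetch": {"api.github.com"}}` for a hypothetical `http_fetch` tool would allow `https://api.github.com/...` for that tool and deny every URL for any tool that is not listed.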

Key Files (New)

internal/tenant/
config.go # TenantConfig, ResourceQuotas, RateLimits
store.go # TenantStore interface
postgres_store.go # Postgres implementation
middleware.go # Activity middleware for quota enforcement
namespace.go # Temporal namespace management

internal/quota/
limiter.go # Token bucket + sliding window
policy.go # QuotaPolicy evaluation
store.go # Quota state persistence

internal/audit/
event.go # AuditEvent types
logger.go # Append-only audit writer
chain.go # Hash chain verification
export.go # SOC2/HIPAA export
pii.go # PII detection + redaction

internal/security/
sanitizer.go # Input sanitization
output_filter.go # Output filtering
network_policy.go # Per-tool egress rules
sandbox.go # Enhanced sandbox (gVisor/nsjail)

migrations/
0013_tenants.up.sql / down.sql
0014_tenant_usage_daily.up.sql / down.sql
0015_quotas.up.sql / down.sql
0016_audit_log.up.sql / down.sql

Exit Criteria (Phase 9 Complete)

  • Tenants fully isolated at Temporal namespace level
  • Per-tenant rate limits enforced without race conditions
  • Audit log tamper-evident with hash chain verification
  • PII detection and redaction operational
  • Compliance exports (SOC 2, HIPAA) passing validation
  • Tool sandbox hardened with gVisor or nsjail
  • Input sanitization blocks prompt injection patterns
  • Network policies enforced per-tool
  • HA deployment guide validated in staging
  • DR playbook tested with failover scenario

Closeout Gaps and Future-Proof Backlog (2026-02-07)

  1. Audit UI surface tracked in Phase 7F (docs/phases/PHASE7F.md) and implemented in legacy UI bridge pages.
  2. Security alerts UI surface tracked in Phase 7F (docs/phases/PHASE7F.md) and implemented in legacy UI bridge pages.
  3. Host-level sandbox integration tests added (tagged security,integration; opt-in via CRUVERO_RUN_HOST_SANDBOX_TESTS=true).
  4. Alert rules as code added under deploy/monitoring/ (Prometheus + Loki).
  5. DR and HA drill automation scripts added under scripts/ops/.
  6. Security posture and DR readiness checklists added under docs/operations/checklists/.
  7. Execute staged HA/DR drills and attach reports to release evidence.

Environment Variables (New)

# Tenancy
CRUVERO_TENANT_MODE=single|multi # default: single
CRUVERO_TENANT_STORE=postgres # default: postgres
CRUVERO_TENANT_DEFAULT_NAMESPACE=default

# Quotas
CRUVERO_QUOTA_ENABLED=true|false # default: true
CRUVERO_QUOTA_DEFAULT_RPM=60 # requests per minute
CRUVERO_QUOTA_DEFAULT_TPD=1000000 # tokens per day
CRUVERO_QUOTA_DEFAULT_COST_USD=100.0 # max daily cost

# Audit
CRUVERO_AUDIT_ENABLED=true|false # default: false
CRUVERO_AUDIT_PII_DETECTION=true|false # default: false
CRUVERO_AUDIT_EXPORT_FORMAT=soc2|hipaa|json
CRUVERO_AUDIT_RETENTION_DAYS=365

# Security
CRUVERO_SANDBOX_MODE=process|gvisor|nsjail # default: process
CRUVERO_INPUT_SANITIZATION=true|false # default: false
CRUVERO_OUTPUT_PII_REDACTION=true|false # default: true
CRUVERO_NETWORK_POLICY_ENABLED=true|false # default: false
