108. DEPLOYMENT MODELS & HOSTING STRATEGY
From Hosted to Self-Hosted Without Rewriting: The Zero-Regret Architecture
"Start on HUMAN-hosted in 5 minutes. Move to your own infra in 5 days, without rewriting your app."
That's the barrier killer.
This document defines how HUMAN supports every deployment model, from 10-person teams to regulated enterprises, through clean architectural boundaries and zero-regret migration paths.
THE DESIGN PRINCIPLE: ZERO-REGRET HOSTING
We keep our core invariants:
- Keys live on devices (Passport rooted in hardware, not our cloud)
- Cloud is coordination + proofs, not raw user data
- Storage is pluggable behind clean adapters
The mantra becomes:
"Start on HUMAN-hosted in 5 minutes. Move to your own infra in 5 days, without rewriting your app."
This solves the fundamental tension:
- SMBs need: "Please don't make me think about infra. Just make it work."
- Enterprises need: "Cool idea, but it has to run in our VPC / DB / SIEM / whatever."
One architecture. Multiple deployment profiles. Same APIs. Same semantics.
WHAT HUMAN HOSTS VS WHAT WE REFUSE TO HOST
Think in 3 layers:
Layer 0: Devices (non-negotiable)
Passport keys live on:
- iPhones, Macs, laptops, Android devices, etc.
We never host:
- Root identity keys
- Raw biometric / behavioral signals
We may host:
- Public keys
- Signed assertions ("this key belongs to Org X, role Y"), but not the keys themselves
This is structural: HUMAN cannot hold your identity even if we wanted to.
Layer 1: HUMAN Control Plane (this is the "managed hosting" we do want)
This is the stuff we can run as HUMAN Cloud or you can self-host:
What Lives Here
- Identity federation: Mapping "IdP user 123" to "Passport subject ABC"
- Capability Graph: Roles, permissions, what a given human/agent is allowed to approve
- HumanOS policy engine: "When an AI tries to do X, require human Y or escalate to group Z" (see the sketch below)
- Attestation / ledger interface: Where "AI did X under these conditions with these humans" gets recorded
This is metadata about trust, not the org's documents/emails/PHI.
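To make that escalation rule concrete, here is a minimal sketch of a policy evaluation, assuming a hypothetical rule shape (the field names are illustrative, not the HumanOS schema):
// Hypothetical policy shape -- field names are illustrative, not the HumanOS schema.
type PolicyDecision = 'allow' | 'require_human_approval' | 'escalate_to_group';

interface PolicyRule {
  action: string;            // e.g. "crm.delete_account"
  maxAutonomousRisk: number; // 0-1: above this, a human must be involved
  approver?: string;         // specific human role, e.g. "account-owner"
  escalationGroup?: string;  // fallback group, e.g. "revops-leads"
}

function evaluate(rule: PolicyRule, riskScore: number, approverAvailable: boolean): PolicyDecision {
  if (riskScore <= rule.maxAutonomousRisk) return 'allow';
  if (rule.approver && approverAvailable) return 'require_human_approval';
  return 'escalate_to_group';
}

// "When an AI tries to do X, require human Y or escalate to group Z"
const rule: PolicyRule = {
  action: 'crm.delete_account',
  maxAutonomousRisk: 0.2,
  approver: 'account-owner',
  escalationGroup: 'revops-leads',
};
console.log(evaluate(rule, 0.7, false)); // "escalate_to_group"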
Deployment Options
We:
- Offer this as managed for SMB / fast starts (HUMAN Cloud)
- Offer hybrid / self-hosted for enterprises who want it in their VPC
Layer 2: Data & Systems (we stay aggressively out of here)
Where the data lives:
- Salesforce, Google Workspace, O365, Epic, internal DBs, S3 buckets, etc.
We don't become their database.
We connect to these via:
- OAuth / service accounts / private links
We only store:
- IDs, hashes, and pointers needed for provenance and policy decisions (see the sketch below)
Result:
- Yes to hosting the trust fabric
- No to becoming their data warehouse
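As an illustration of "IDs, hashes, and pointers", a minimal sketch of a pointer-only record, with hypothetical field names (the real schema lives in the Capability Graph and ledger specs):
// Illustrative only: HUMAN stores references and hashes, never the underlying document.
import { createHash } from 'crypto';

interface ProvenancePointer {
  system: string;        // where the real data lives, e.g. "salesforce", "gdrive", "s3"
  externalId: string;    // opaque ID in that system
  contentHash: string;   // SHA-256 of the content, for tamper-evidence
  policyContext: string[]; // which policies referenced this object
}

function pointerFor(system: string, externalId: string, content: Buffer): ProvenancePointer {
  return {
    system,
    externalId,
    contentHash: createHash('sha256').update(content).digest('hex'),
    policyContext: [],
  };
}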
THE THREE DEPLOYMENT PROFILES
We make this a first-class concept:
Profile 1: Hosted (SMB Default)
What it means:
Everything in HUMAN Cloud, except:
- Keys on devices
- Primary data in their SaaS tools
Who it's for:
- 10–200 person companies
- Teams without dedicated infra people
- Organizations prioritizing speed over control
Setup time: 5 minutes
Monthly cost: Plan-based ($X–$XX per user + governed events)
Migration path: Can move to Hybrid or Self-hosted later with export/import tooling
Profile 2: Hybrid (Enterprise Common)
What it means:
HUMAN control plane in our cloud:
- Policy engine
- Capability Graph
- Federation of identities
Data plane in their infra:
- They run the ledger node(s)
- They host any caches / sensitive stores
Connect via:
- Outbound-only secure tunnels
- Or VPC peering, depending on taste
Who it's for:
- Mid-market to enterprise (200–10,000 employees)
- Organizations with infra teams that still want operational simplicity
- Companies with data residency requirements but flexible on control plane
Setup time: 1β2 days
Monthly cost: Platform fee + usage + optional support
Migration path: Can move to full Self-hosted when compliance requires it
Profile 3: Self-Hosted (Regulated / Gov)
What it means:
We give them:
- Helm charts / Terraform / Ansible
- Reference architecture
- AI-powered installation automation (see "AI-Powered Installation Automation" section)
- Compliance templates (HIPAA, FedRAMP, PCI-DSS, GDPR)
They run (self-hosted in their infrastructure):
- HumanOS orchestration engine
  - Policy engine (escalation rules, safety boundaries)
  - Routing logic (capability-first task assignment)
  - Approval queue service
- Capability Graph (org-scoped view)
  - Internal employee and agent capabilities
  - Skill tracking and growth
  - Capability attestations (org-namespaced)
- Workforce Cloud Runtime (internal routing layer)
  - Agent-to-agent task orchestration
  - Agent-to-human escalation (employees or customers)
  - Capability-based task assignment within organization
  - Internal workflow execution engine
- MARA Runtime (agent execution environment)
  - Agent pods and workloads
  - Agent registry service
  - Execution monitoring
- Ledger nodes (local audit trail)
  - Provenance logs
  - Attestation storage
  - Immutable audit trail
  - Optional federation with HUMAN's public ledger
- All storage (databases, object storage, caches)
  - PostgreSQL (policies, workflows, history)
  - Vector store (agent memory, capability embeddings)
  - Object storage (MinIO, S3)
  - Redis (caching, sessions)
- Monitoring & observability stack
  - Prometheus, Grafana
  - Loki (log aggregation)
  - Tempo (distributed tracing)
They access via API (HUMAN-hosted, optional services):
- Workforce Cloud Global Marketplace (optional, pay-per-task)
  - For routing to external trained humans beyond internal staff
  - 24/7 coverage, surge capacity, specialized expertise
  - Academy-trained workers globally
  - Pricing: Usage-based ($50 per escalation + $75 per human-hour)
  - Use cases: Overflow beyond org's staff, specialized skills, 24/7 ops
- Academy Training Platform (free for individuals, volume pricing for enterprises)
  - Web-based access for employee training
  - Always free for displaced workers (Zero Barriers principle)
  - Enterprise bulk programs: $500/employee/year (volume discounts available)
  - Requirements: Internet connectivity (no self-hosted option by design)
  - Integration: SSO, custom learning paths, capability sync via API
  - See: KB 24 (Academy) for full deployment model
- Global Capability Federation (optional, subscription)
  - Cross-org credential verification
  - Interoperability with other HUMAN deployments
  - Verify capabilities from external organizations
  - Pricing: $10K/year
- Public Ledger Anchoring (included in platform license)
  - Global attestation root of trust
  - Distributed ledger for cross-org verification
  - Customer can run federated ledger nodes
  - Anchoring to HUMAN's public ledger for global validity
Deployment modes:
- Air-Gapped (Full Isolation):
  - Runs WCR, MARA, HumanOS, Capability Graph fully offline
  - No access to Global Marketplace or Academy
  - Local-only ledger (no global verification)
  - Org-only capability tracking
  - Requires internal image registries, Helm mirrors
  Canonical note (Passport growth still happens): Even in full isolation, governed work events still update the local Capability Graph and create local attestations. The Passport evolves in place via an updated CapabilityGraphRoot (pointer to the head of the personal graph in the on-prem vault) and new LedgerRefs (attestation anchors on the on-prem ledger).
  Example data placement (air-gapped):
    personalGraphs:
      storage: on_prem_vault
    evidence:
      storage: on_prem_vault
    attestations:
      storage: on_prem_ledger
    federation:
      enabled: false
- Hybrid (Internal + Global Services):
  - Self-hosted control plane + optional HUMAN Cloud services
  - Internal routing for org's tasks
  - API access to Global Marketplace for overflow
  - Employee access to Academy for training
  - Most common for regulated enterprises
Who it's for:
- Regulated industries (healthcare, finance, government)
- Organizations with strict data sovereignty requirements
- Companies with mature platform engineering teams
- Air-gapped environments (defense, intelligence)
- Multi-national corporations with data residency compliance
Setup time:
- With AI installation assistant: 5-15 minutes (automated)
- With intelligent CLI: 1-2 hours (semi-automated)
- Manual Helm/Terraform: 1-2 weeks (depends on infra complexity)
Monthly cost:
- Platform license: $30K-$150K+/year (based on scale, see KB 34)
- Support contract: Included (24/7 for Enterprise Elite)
- Optional services: Usage-based (Workforce Cloud, Academy bulk, Federation)
- Infrastructure costs: Customer responsibility (compute, storage, networking, AI tokens)
Migration path: This is the end state; no further migration needed
SELF-HOSTED SECURITY BOUNDARIES
Overview
Self-hosted deployments provide maximum control and data sovereignty, but they do not grant identity minting authority or bypass cryptographic safeguards.
This section explicitly defines what self-hosted infrastructure CAN and CANNOT do, and why infrastructure compromise doesn't threaten human sovereignty.
Key Principle: Trust derives from cryptography, not operational control.
What Self-Hosted Infrastructure CAN Do
✅ Identity Verification
- Verify Passport signatures cryptographically
- Validate DID resolution and key ownership
- Check delegation chains for authenticity
- Verify attestation signatures
Why Safe: Verification requires only public keys. No private key access needed.
β Org-Scoped Attestations
- Issue attestations within organizational namespace
- Attest to employment, roles, permissions within the org
- Sign attestations with org's private key (held in org HSM)
Why Safe: Org attestations are namespaced. They don't affect other organizations or create global identity.
✅ Policy Engine & Agent Runtime Hosting
- Run HumanOS policy engine
- Host agent execution environments
- Enforce escalation rules and safety boundaries
- Route tasks based on capability requirements
Why Safe: Policy enforcement is read-only verification. Cannot override cryptographic constraints.
✅ Org and Agent Key Custody
- Hold Org Passport keys in organizational HSMs
- Custody Agent Passport keys under policy constraints
- Manage agent delegation certificates
Why Safe: These are delegated identities, not sovereign identities. They derive authority from humans, not from infrastructure.
What Self-Hosted Infrastructure CANNOT Do
These constraints are cryptographically enforced, not policy-based. Violating them renders the deployment non-compliant with the HUMAN Protocol.
❌ Server-Side Human Passport Creation
Forbidden:
- Minting Human Passports on servers
- Generating human identity keys in infrastructure
- Creating "admin" identities that impersonate humans
Why Forbidden:
- Human Passports MUST be created on-device (Secure Enclave, TEE, hardware key)
- Private keys MUST NEVER leave the device
- Only devices can prove human presence (biometric, passkey)
Technical Enforcement:
- Device attestation required for Human Passport minting
- Ledger rejects Human Passports without device signature
- Other deployments reject server-minted identities
Result: Self-hosted infrastructure physically cannot create human identities that other systems will accept.
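A hedged sketch of the check a compliant ledger node could apply to a mint request; the request shape and attestation fields are assumptions, not the protocol's wire format:
// Hypothetical request shape -- real device attestation formats (Secure Enclave,
// Android TEE, FIDO2) differ, but the enforcement logic is the same: no device
// proof over the new public key, no Human Passport.
interface PassportMintRequest {
  did: string;
  publicKey: Uint8Array;
  deviceAttestation?: {
    platform: 'secure_enclave' | 'android_tee' | 'fido2';
    signature: Uint8Array; // produced on-device over the new public key
  };
}

function acceptMint(
  req: PassportMintRequest,
  verifyDeviceSignature: (req: PassportMintRequest) => boolean,
): boolean {
  if (!req.deviceAttestation) return false;      // server-minted: no device proof at all
  if (!verifyDeviceSignature(req)) return false; // forged or replayed attestation
  return true;                                   // only device-rooted keys are accepted
}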
❌ Admin-Minted "Human" Identities
Forbidden:
- Admins creating "human" accounts for convenience
- Shared credentials representing multiple humans
- Service accounts masquerading as humans
Why Forbidden:
- Violates identity sovereignty (humans own their identity)
- Breaks provenance (can't distinguish human from admin action)
- Creates liability (who is responsible for actions?)
Technical Enforcement:
- Human Passports require device-rooted keys
- Capability Graph rejects capability updates from non-device sources
- Attestations require human signature, not admin signature
Result: Admin convenience cannot override identity architecture.
❌ Shared or Pooled Human Signing Keys
Forbidden:
- Multiple humans sharing one private key
- "Team" identities with shared credentials
- Delegating human signing authority to infrastructure
Why Forbidden:
- Destroys accountability (who signed this?)
- Breaks provenance chain (no attribution)
- Enables impersonation (anyone with key = "you")
Technical Enforcement:
- Each human has unique DID and keypair
- Private keys never exported from device
- Signature verification checks specific DID
Result: Infrastructure cannot hold human private keys, even if admins request it.
❌ Identity Recovery Performed by Infrastructure
Forbidden:
- Admins "recovering" human identity without human authorization
- Resetting human private keys from servers
- Backdoor recovery mechanisms
Why Forbidden:
- Recovery without human = identity theft
- Breaks trust (infrastructure can impersonate)
- Creates legal liability (unauthorized access)
Technical Enforcement:
- Recovery requires guardian quorum (other humans, not servers)
- Recovery process uses threshold cryptography (no single point of failure)
- Ledger logs all recovery attempts
Result: Only humans (via guardian network) can recover human identity. Infrastructure cannot override.
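A minimal sketch of a guardian-quorum check, assuming Ed25519 signatures verified with the tweetnacl package; the threshold, payload, and key encoding are illustrative:
import nacl from 'tweetnacl';

// A recovery request is only honored when at least `threshold` distinct registered
// guardians have signed it. Infrastructure can count signatures; it cannot produce them.
function quorumMet(
  recoveryPayload: Uint8Array,
  signatures: { guardianKeyHex: string; sig: Uint8Array }[],
  registeredGuardianKeysHex: string[],
  threshold: number,
): boolean {
  const valid = new Set<string>();
  for (const { guardianKeyHex, sig } of signatures) {
    if (!registeredGuardianKeysHex.includes(guardianKeyHex)) continue; // unknown signer
    const key = Uint8Array.from(Buffer.from(guardianKeyHex, 'hex'));
    if (nacl.sign.detached.verify(recoveryPayload, sig, key)) valid.add(guardianKeyHex);
  }
  return valid.size >= threshold; // e.g. 3-of-5 guardians
}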
❌ Silent Identity Creation or Modification
Forbidden:
- Creating identities without human approval
- Modifying identity records without signed consent
- "Backdating" identity changes
Why Forbidden:
- Violates consent (humans must approve)
- Breaks provenance (no audit trail)
- Enables fraud (who made this change?)
Technical Enforcement:
- All identity changes require signature from identity owner
- Ledger anchors record creation and modification timestamps
- Unsigned changes rejected by protocol
Result: Infrastructure cannot modify identity, even with "good intentions."
Breach Blast Radius Analysis
Understanding what an attacker gains by compromising different deployment types:
Hosted Profile Breach (HUMAN Cloud Compromise)
What Attacker Gains:
- Disruption of service (DoS)
- Metadata about API usage (traffic patterns)
- Ability to issue fake attestations (rejected by verification)
What Attacker CANNOT Gain:
- Human private keys (never stored server-side)
- Ability to mint Human Passports (device-only)
- Ability to impersonate humans (no private keys)
- Ledger modification (distributed, immutable)
Customer Impact:
- Hosted customers: Service interruption (failover to backup region)
- Self-hosted customers: Zero impact (independent deployments)
Mitigation:
- Multi-region active-active (automatic failover)
- Keys on devices (zero server-side exposure)
- Ledger distribution (no single point of truth)
Hybrid Profile Breach (Data Plane Compromise)
What Attacker Gains:
- Access to customer's ledger nodes (can disrupt sync)
- Access to org attestations (can view org-specific data)
- Potential ability to issue fake org attestations (namespaced)
What Attacker CANNOT Gain:
- Human private keys (on devices)
- Ability to mint Human Passports (device-only)
- Access to other orgs' data (namespace isolation)
- Ability to override human decisions (cryptographically enforced)
Customer Impact:
- Affected customer: Must revoke org key and re-issue attestations
- Other customers: Zero impact (namespace isolation)
- Humans: Zero impact (keys on devices)
Mitigation:
- Org key revocation via ledger broadcast
- Namespace isolation prevents cross-org contamination
- Audit trail reveals all actions during compromise window
Self-Hosted Profile Breach (Full Infrastructure Compromise)
What Attacker Gains:
- Full access to org's deployment (database, services, keys)
- Ability to issue fake org-scoped attestations
- Ability to disrupt org's operations
- Metadata about org's agent usage
What Attacker CANNOT Gain:
- Human private keys (on devices, not in infrastructure)
- Ability to mint Human Passports (device-only)
- Ability to impersonate humans in other orgs (namespace isolation)
- Ability to modify capability records for humans (requires human signature)
- Access to distributed ledger state (replicated across network)
Customer Impact:
- Affected org: Must revoke org key, rebuild infrastructure
- Other orgs: Zero impact (namespace isolation)
- Humans: Identity intact (keys on devices)
Mitigation:
- Human keys never in infrastructure (zero exposure)
- Org key revocation invalidates all attestations
- Other orgs' verification rejects compromised attestations
- Humans can revoke consent and move to new org deployment
Critical Insight: Even complete infrastructure compromise doesn't grant attacker human identity authority.
Open Source Safety Guarantees
Q: Can self-hosted customers modify HUMAN code to bypass these restrictions?
A: No. Protocol compliance is mathematically enforced, not code-enforced.
Why Code Modification Doesn't Grant Authority
- Cryptographic Verification is Protocol-Level
  - Even modified code must verify Ed25519 signatures
  - Invalid signatures are rejected by other nodes
  - Forked implementations cannot interoperate without compliance
- Network Effects Enforce Standards
  - Distributed ledger rejects non-compliant attestations
  - Other deployments ignore invalid signatures
  - Humans choose which implementations to trust
- Device Keys are the Source of Truth
  - Human identity keys live on devices, not in code
  - Infrastructure verifies signatures, doesn't create them
  - Modified infrastructure cannot access device keys
- Interoperability Requires Compliance
  - Non-compliant forks cannot participate in the ledger
  - Attestations from non-compliant deployments are rejected
  - Enterprise customers lose certification
Example Attack (Why It Fails):
// Malicious self-hosted deployment tries to mint a human identity.
// (generateKeyPair and db stand in for the deployment's own crypto library and database.)
async function evilAdminMintHuman() {
  const fakePassport = {
    did: 'did:human:evil-admin-123',
    publicKey: generateKeyPair().publicKey,
    // ... other fields
  };
  await db.passports.insert(fakePassport);
  // ❌ This fails because:
  // 1. No device attestation (requires Secure Enclave signature)
  // 2. DID not registered on distributed ledger
  // 3. Cannot sign with fake private key (device holds real key)
  // 4. Other deployments reject attestations from this DID
  // 5. Humans won't trust this "passport" (no provenance)
  return fakePassport; // Locally stored, but useless
}
Result: Modified code can create database records, but not valid identities.
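For completeness, the verification side that makes the attack fail everywhere else: a sketch of how any compliant node could check an attestation's Ed25519 signature against the DID's ledger-registered key (tweetnacl assumed; structures are illustrative):
import nacl from 'tweetnacl';

interface SignedAttestation {
  issuerDid: string;
  payload: Uint8Array;
  signature: Uint8Array;
}

// resolveKey stands in for a lookup against the distributed ledger's DID registry.
function acceptAttestation(
  att: SignedAttestation,
  resolveKey: (did: string) => Uint8Array | undefined,
): boolean {
  const publicKey = resolveKey(att.issuerDid);
  if (!publicKey) return false; // DID was never registered on the ledger -> reject
  return nacl.sign.detached.verify(att.payload, att.signature, publicKey);
}
// The "evil admin" passport above fails here: its DID resolves to nothing, and it
// cannot produce a signature that verifies against a key held only on a device.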
Comparison: Self-Hosted HUMAN vs Self-Hosted Traditional Identity
| Dimension | Traditional IdP (Okta, Auth0) | HUMAN Self-Hosted |
|---|---|---|
| Identity Creation | Admin creates users | Only devices create humans |
| Key Storage | Server-side (HSM) | Device-only (Secure Enclave) |
| Admin Override | Admins can reset passwords | Admins cannot access human keys |
| Impersonation Risk | Admin can impersonate users | Cryptographically impossible |
| Compromise Blast Radius | All users (admin has master keys) | Org only (humans unaffected) |
| Recovery | Admin initiates | Guardian quorum (other humans) |
| Portability | Vendor lock-in | Globally portable DID |
Why This Matters:
Traditional self-hosted identity gives administrators god-mode access. HUMAN self-hosted gives administrators operational control without identity authority.
This is the architectural innovation that makes self-hosted deployments safe at scale.
Compliance Statement
For regulated industries:
Self-hosted HUMAN deployments comply with:
- HIPAA (patient identity sovereignty)
- GDPR (data subject rights, right to portability)
- eIDAS (qualified electronic signatures)
- SOC 2 (cryptographic key management)
- Zero Trust Architecture (continuous verification, no implicit trust)
Certification:
Self-hosted deployments that violate identity minting rules:
- Lose HUMAN Protocol certification
- Cannot interoperate with HUMAN Cloud or other compliant deployments
- Lose vendor support and updates
- Risk regulatory non-compliance
Audit Trail:
All self-hosted deployments must:
- Log all identity verification events
- Maintain provenance for all attestations
- Participate in distributed ledger (or run private ledger node)
- Submit to periodic compliance audits (for certification)
Why This Architecture Wins
For Enterprises:
- Self-hosting gives control without creating liability
- Infrastructure compromise doesn't expose human identity
- Clear blast radius (org only, not global)
- Regulatory compliance by design
For Humans:
- Identity sovereignty maintained even in self-hosted deployments
- Can leave org without losing identity
- Cannot be impersonated by admins
- Portable across all deployment types
For HUMAN:
- Open source doesn't compromise security
- Self-hosted doesn't fragment protocol
- Network effects reinforce standards
- Trust derives from cryptography, not vendor control
Result: Self-hosted deployments are safe, compliant, and strategically valuable β not a security liability.
AI-POWERED INSTALLATION AUTOMATION
The Problem with Traditional Self-Hosting:
- 1-2 weeks to deploy
- Requires Kubernetes expertise
- 1266 lines of manual YAML
- High error rate, high support burden
- Blocks SMBs from self-hosting
HUMAN's Solution: Installation as a Conversation
Self-hosted HUMAN installs in 5-15 minutes through three automated paths (plus a manual path for advanced cases):
Installation Path 1: Companion Installer (Conversational)
Natural language installation for technical decision-makers:
User: "I want to self-host HUMAN in our AWS VPC for 50 agents with HIPAA compliance"
Companion Installer:
- Detects environment (AWS EKS, RDS available, VPC config)
- Asks 5 clarifying questions (HA requirements, air-gap, integrations)
- Generates optimal configuration (capacity planning, compliance hardening)
- Executes automated installation with human approval at critical steps
- Validates deployment health
- Provides dashboard access + admin credentials
Time: 8-15 minutes
Human involvement: Approve 3-5 critical decisions
Expertise required: Understand business requirements (not YAML)
Installation Path 2: Intelligent CLI
For engineers who prefer CLI:
$ npx @human/installer init
Detecting environment...
✅ AWS EKS cluster detected (us-east-1)
✅ kubectl configured (v1.28)
✅ PostgreSQL RDS available
Configuration wizard (5 questions):
? Agent capacity: 50 agents
? High availability: Yes (multi-AZ)
? Compliance: HIPAA
? Air-gapped: No
Installing HUMAN...
[Progress bars for each component]
Complete (12m 34s)
Time: 10-20 minutes
Human involvement: Answer 5 questions
Expertise required: Basic cloud/k8s familiarity
Installation Path 3: Cloud Marketplace
One-click deployment for enterprises:
- AWS Marketplace → Click "Launch" → HUMAN deployed in 10 minutes
- GCP Marketplace → Same experience
- Azure Marketplace → Same experience
Time: 5-10 minutes (fully automated)
Human involvement: Click "Subscribe"
Expertise required: None
Installation Path 4: Manual (Advanced)
For maximum customization or air-gapped environments with no installer access:
- Follow the detailed implementation spec: setup/agent_deployment_selfhosted_spec.md
- Manual Helm/kubectl commands
- Full control over every configuration detail
Time: 1-2 weeks
Human involvement: Full manual configuration
Expertise required: Deep Kubernetes/infrastructure knowledge
How AI-Powered Installation Works
Environment Detection:
- Cloud provider (AWS, GCP, Azure, bare metal)
- Kubernetes version and capabilities
- Existing infrastructure (databases, storage, monitoring)
- Network configuration (VPC, subnets, security groups)
- Compliance posture (encryption, audit logs)
Configuration Generation:
- Capacity planning (CPU, memory, storage based on agent count)
- Compliance templates (HIPAA, FedRAMP, PCI hardening)
- High availability (multi-AZ, failover, backup)
- Cost optimization (minimum viable resources)
- Security best practices (CIS benchmarks, zero-trust)
Automated Installation:
- Pre-flight validation (capacity, permissions, connectivity)
- Kubernetes resource creation (namespaces, deployments, services)
- Database schema migration
- Secrets management (encryption at rest)
- Network policies (zero-trust networking)
- Monitoring stack deployment (Prometheus, Grafana)
Post-Install Validation:
- All pods healthy
- Database connectivity
- API responsiveness
- Agent registration works
- Storage accessible
- Monitoring operational
Human-in-the-Loop:
- Approve critical decisions (database connection, secrets creation)
- Review generated configuration before apply
- Escalation on errors (with remediation suggestions)
Why This Matters
Before AI-powered installation:
- Self-hosting = enterprise-only (requires dedicated ops team)
- SMBs blocked from data sovereignty
- High support burden for HUMAN
- Slow adoption, high friction
After AI-powered installation:
- Self-hosting = accessible to SMBs
- 5-15 minute setup (vs 1-2 weeks)
- 95% success rate (vs ~60% manual)
- Low support burden
- Fast adoption, low friction
This is Living HAIO: AI agents installing and configuring AI agent infrastructure.
Status: Vision documented (this PRD). Implementation: Q1 2026.
See: KB 50 (Human Agent Design) for agent architecture.
SELF-HOSTED ENTERPRISE REQUIREMENTS
Licensing & Enforcement
License Types:
| License Type | Annual Price | Agent Limit | Support Level | Use Case |
|---|---|---|---|---|
| Development | $0 | 5 | Community | Testing, staging environments |
| Production (Node-Locked) | $30,000 | 200 | Standard | Single datacenter deployment |
| Production (Floating) | $50,000 | 200 | Standard | Multi-datacenter with failover |
| Enterprise (Unlimited) | $100,000+ | Unlimited | Enterprise + TAM | Global deployments, MSPs |
Enforcement Mechanism:
- License key validated on control plane startup
- Cryptographic signature verification
- Phone-home validation (once per 24hr, optional for air-gapped)
- Grace period: 30 days after expiry (with warnings)
- Air-gapped: Offline license validation via signed JWT
License Renewal:
- Automated renewal reminders (90, 60, 30, 7 days)
- Zero-downtime renewal (hot-swap license keys)
- Volume discounts for multi-year contracts
Support & Service Level Agreements
Support Tiers:
| Severity | Response Time | Resolution Target | Channels | Included In |
|---|---|---|---|---|
| P0 (System Down) | <1 hour | <4 hours | Phone, Slack, Email | Enterprise |
| P1 (Critical Impact) | <4 hours | <24 hours | Slack, Email | Standard+ |
| P2 (Moderate Impact) | <8 hours | <3 days | Email, Portal | Standard+ |
| P3 (Low Impact) | <24 hours | <7 days | Portal | All (incl Community) |
Support Access Requirements:
- Standard: Business hours (9-5 local time), email + portal
- Enterprise: 24/7, dedicated Slack channel, phone, TAM assigned
- Community: Forums, GitHub issues, community Slack (best-effort)
Enterprise Support Add-Ons:
- Technical Account Manager (TAM): +$10k/year
- Professional Services: $250/hour
- Onsite Training: $2k/person (2-day workshop)
- Compliance Certification Support: $15k/year (HIPAA, FedRAMP guidance)
Total Cost of Ownership (TCO) Analysis
TCO Comparison: Hosted vs Self-Hosted (50 agents, 3 years)
| Cost Component | HUMAN-Hosted | Self-Hosted |
|---|---|---|
| Software License | $0 (usage-based) | $30k/yr × 3 = $90k |
| Infrastructure | Included | $3.2k/mo × 36 = $115k |
| Operational Labor | Included | 0.5 FTE × 3 yr = $180k |
| Support | Included | $0 (Standard incl) |
| Upgrades | Automated | Included |
| Total (3yr) | ~$180k | ~$385k |
Break-Even Analysis:
- Self-hosted TCO higher for <100 agents
- Break-even at ~150-200 agents (3-year horizon)
- Self-hosted wins for >200 agents OR data sovereignty required
When Self-Hosted Makes Sense:
- Regulated industries (HIPAA, FedRAMP, PCI)
- Air-gapped environments (defense, classified)
- Data sovereignty requirements (EU, China, government)
- Very high scale (>200 agents)
- Existing infrastructure (sunk costs in datacenter)
When Hosted Makes Sense:
- Small deployments (<50 agents)
- Fast time-to-value (no infrastructure burden)
- Variable workloads (pay-as-you-go)
- No ops team available
Reference Architectures
Small Enterprise (5-20 agents):
- Kubernetes: 3 nodes, 4vCPU, 8GB each
- Database: PostgreSQL (8vCPU, 32GB, Multi-AZ)
- Storage: 500GB SSD
- Estimated cost: $1,050/month infrastructure + $30k/yr license
Medium Enterprise (20-100 agents):
- Kubernetes: 10 nodes, 8vCPU, 16GB each
- Database: PostgreSQL (16vCPU, 64GB, Multi-AZ + replicas)
- Storage: 2TB SSD
- Estimated cost: $3,200/month infrastructure + $50k/yr license
Large Enterprise (100-500 agents):
- Kubernetes: 30 nodes, 16vCPU, 32GB each
- Database: PostgreSQL (32vCPU, 128GB, Multi-AZ + read replicas)
- Storage: 10TB SSD
- Multi-region deployment (primary + DR)
- Estimated cost: $12k/month infrastructure + $100k/yr license
Global Deployment (500+ agents, multi-region):
- Kubernetes: 100+ nodes across 3+ regions
- Database: Distributed PostgreSQL (CitusDB or similar)
- Storage: 50TB+ distributed
- Multi-cloud (AWS + Azure for resilience)
- Estimated cost: $50k+/month infrastructure + custom licensing
COMPLIANCE READINESS FOR SELF-HOSTED
HIPAA Compliance
HUMAN provides:
- Encryption at rest (database, storage, secrets)
- Encryption in transit (TLS 1.3)
- Audit logging (all access, all actions)
- Access controls (RBAC, MFA)
- Business Associate Agreement (BAA) template
Customer responsible for:
- Administrative safeguards (policies, training)
- Physical safeguards (datacenter security)
- Technical safeguards (network security, backups)
HIPAA-Specific Configuration:
compliance:
  hipaa:
    enabled: true
    auditLogging:
      retention: 6years # HIPAA requirement
      immutable: true
    encryption:
      algorithm: AES-256-GCM
      keyRotation: 90days
    accessControls:
      mfaRequired: true
      sessionTimeout: 15min
HIPAA Checklist: See compliance document docs/compliance/self-hosted-checklists.md
FedRAMP Compliance (Moderate Baseline)
HUMAN provides:
- Automated compliance configuration templates
- Control implementation documentation
- Continuous monitoring dashboards
- Incident response runbooks
Customer responsible for:
- Full FedRAMP authorization package
- Third-party assessment organization (3PAO) audit
- Continuous monitoring (ConMon) program
FedRAMP Support:
- HUMAN can provide FedRAMP compliance support: $15k/year
- Includes: Control mapping, documentation templates, audit support
Note: Full FedRAMP authorization is a 12-18 month process. HUMAN provides technical controls; customer owns authorization.
PCI-DSS Compliance
Applicable if: Processing, storing, or transmitting cardholder data
HUMAN provides:
- Network segmentation (Kubernetes network policies)
- Encrypted storage and transmission
- Access control and logging
- Vulnerability management guidance
Customer responsible for:
- PCI-DSS compliance validation (QSA or SAQ)
- Cardholder data environment (CDE) segmentation
- Regular penetration testing
GDPR Compliance
HUMAN provides:
- Data portability (export APIs)
- Right to erasure (deletion APIs)
- Data processing agreements (DPA)
- Privacy-by-design architecture
Customer responsible for:
- Lawful basis for processing
- Data subject consent management
- Data protection impact assessments (DPIA)
- GDPR compliance program
AIR-GAPPED OPERATIONS (EXTENDED)
Update Distribution Methods
For environments with no external connectivity:
Method 1: USB Transfer
- Download update bundle from HUMAN portal (authenticated)
- Transfer via USB to air-gapped environment
- Verify cryptographic signature
- Apply via installer CLI
Method 2: Secure FTP
- HUMAN pushes updates to customer-controlled SFTP
- Customer pulls to air-gapped environment
- Signature verification required
Method 3: Courier (High-Security)
- Physical media shipment for classified environments
- Tamper-evident packaging
- Chain-of-custody documentation
Update Bundle Contents
human-v1.2.0-airgapped.tar.gz (signed)
├── helm-charts/          # Versioned Helm charts
├── container-images/     # All Docker images (no registry pulls)
├── database-migrations/  # SQL migration scripts
├── installer-cli/        # Offline installer binary
├── license-validator/    # Offline license validation
├── checksums.txt         # SHA256 of all files
└── signature.sig         # GPG signature for verification
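A sketch of the SHA-256 step of that verification, assuming checksums.txt uses the common "hash  path" line format; GPG verification of signature.sig would happen before this:
import { createHash } from 'crypto';
import { readFileSync } from 'fs';

// checksums.txt lines are assumed to look like: "<sha256-hex>  <relative-path>"
function verifyChecksums(checksumFile: string, bundleRoot: string): string[] {
  const failures: string[] = [];
  for (const line of readFileSync(checksumFile, 'utf8').split('\n')) {
    if (!line.trim()) continue;
    const [expected, relPath] = line.trim().split(/\s+/);
    const actual = createHash('sha256')
      .update(readFileSync(`${bundleRoot}/${relPath}`))
      .digest('hex');
    if (actual !== expected) failures.push(relPath);
  }
  return failures; // an empty array means every file in the bundle matched
}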
Local LLM Integration
For air-gapped environments requiring AI capabilities:
Supported Local LLM Providers:
- Ollama (easiest setup)
- vLLM (high performance)
- LocalAI (model-agnostic)
Configuration:
llm:
  provider: ollama
  endpoint: http://ollama.internal:11434
  model: llama2:70b
  airgapped: true
  fallback: none # No external API calls
Model Distribution:
- Models included in air-gapped bundle OR
- Customer downloads separately and transfers
Air-Gapped Certificate Management
Challenge: No external Certificate Authority (CA) access
Solution: Internal CA
tls:
  ca: internal
  certPath: /etc/human/certs/
  keyPath: /etc/human/keys/
  renewalStrategy: manual # No ACME in air-gapped
Process:
- Generate internal CA (one-time)
- Issue certificates for HUMAN components
- Distribute CA cert to all clients
- Manual renewal before expiry (alerts at 30/60/90 days)
Offline License Validation
Standard licensing: Phone-home validation (24hr interval)
Air-gapped licensing: Signed JWT with long expiry
license:
  type: airgapped
  key: <signed-jwt-with-6month-expiry>
  validation: offline
  renewal: manual # Requires new JWT from HUMAN
Renewal process:
- Generate renewal request (includes deployment ID)
- Transfer request to connected environment
- Submit to HUMAN portal
- Receive new signed JWT
- Transfer back to air-gapped environment
- Apply new license (zero-downtime)
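A sketch of how the offline validation and 30-day grace period could be checked at startup, assuming the jsonwebtoken package and an RS256-signed license (claim names and grace-period math are illustrative):
import jwt from 'jsonwebtoken';

const GRACE_PERIOD_DAYS = 30;

// The HUMAN-issued public key ships with the installer; no network call is needed.
function validateLicense(licenseJwt: string, humanPublicKeyPem: string): 'valid' | 'grace' | 'expired' {
  try {
    // Throws if the signature is invalid or the token is malformed.
    jwt.verify(licenseJwt, humanPublicKeyPem, { algorithms: ['RS256'], ignoreExpiration: true });
  } catch {
    return 'expired'; // a bad signature is treated the same as no license
  }
  const { exp } = jwt.decode(licenseJwt) as { exp: number };
  const now = Date.now() / 1000;
  if (now <= exp) return 'valid';
  if (now <= exp + GRACE_PERIOD_DAYS * 86400) return 'grace'; // keep running, warn loudly
  return 'expired';
}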
Fallback & Degraded Mode
If critical services are unavailable in an air-gapped deployment:
- LLM unavailable → Route to human-only workflow
- Monitoring unavailable → Local logging only
- License validation unavailable → Grace period (30-day warning)
Principle: The system remains operational and degrades gracefully.
PERFORMANCE BENCHMARKS & CAPACITY PLANNING
Capacity Planning Formulas
Kubernetes Nodes:
nodes_required = ceil(agent_count / 10)  # 10 agents per node (8vCPU, 16GB)
               + 3                       # Control plane nodes (HA)
               + 2                       # Monitoring nodes
Database:
db_cpu = max(8, agent_count / 25) # 1 vCPU per 25 agents
db_memory_gb = max(32, agent_count * 0.5) # 500MB per agent
db_storage_gb = max(100, agent_count * 2) # 2GB per agent (logs, history)
Redis:
redis_memory_gb = agent_count * 0.1 # 100MB per agent (session cache)
Network Bandwidth:
bandwidth_mbps = agent_count * 5 # 5 Mbps per active agent
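The same formulas as a small helper, for teams that want to script sizing (a sketch; the constants simply mirror the formulas above):
interface CapacityPlan {
  k8sNodes: number; // workers + HA control plane + monitoring
  dbVcpu: number;
  dbMemoryGb: number;
  dbStorageGb: number;
  redisMemoryGb: number;
  bandwidthMbps: number;
}

function planCapacity(agentCount: number): CapacityPlan {
  return {
    k8sNodes: Math.ceil(agentCount / 10) + 3 + 2, // 10 agents/node + 3 control plane + 2 monitoring
    dbVcpu: Math.max(8, agentCount / 25),
    dbMemoryGb: Math.max(32, agentCount * 0.5),
    dbStorageGb: Math.max(100, agentCount * 2),
    redisMemoryGb: agentCount * 0.1,
    bandwidthMbps: agentCount * 5,
  };
}

console.log(planCapacity(200)); // matches the 200-agent worked example below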
Example: 200 Agent Deployment
Kubernetes:
- Nodes: ceil(200/10) + 3 + 2 = 25 nodes (8vCPU, 16GB each)
- Total: 200 vCPU, 400GB RAM
Database:
- CPU: max(8, 200/25) = 8 vCPU
- Memory: max(32, 200*0.5) = 100GB
- Storage: max(100, 200*2) = 400GB
Redis:
- Memory: 200*0.1 = 20GB (clustered)
Network:
- Bandwidth: 200*5 = 1 Gbps
Estimated Infrastructure Cost:
- ~$8k/month (AWS pricing)
Performance Targets
| Metric | Target | Measurement |
|---|---|---|
| API Latency (p50) | <100ms | Time from request to response |
| API Latency (p99) | <500ms | 99th percentile |
| Agent Registration | <5s | Time to register new agent |
| Task Assignment | <2s | Time from task creation to assignment |
| Database Query (p95) | <50ms | 95th percentile query time |
| Failover Time | <30s | Primary node failure to recovery |
| Throughput | 10k req/s | Sustained request rate (per region) |
Load Testing Recommendations
Before production launch:
# Install k6 load testing tool
$ helm install k6-operator k6/k6-operator
# Run load test (simulates 100 agents)
$ k6 run --vus 100 --duration 30m load-test.js
Checks:
✅ API latency p95 < 200ms
✅ Error rate < 0.1%
✅ Database connections stable
✅ Memory usage < 80%
✅ CPU usage < 70%
Performance Tuning
Database Tuning (PostgreSQL):
-- Increase connection pool
max_connections = 500
-- Optimize for read-heavy workload
shared_buffers = 8GB
effective_cache_size = 24GB
Kubernetes Tuning:
# Horizontal Pod Autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Redis Tuning:
# Redis cluster mode for >10GB data
redis:
  cluster:
    enabled: true
    nodes: 6 # 3 masters + 3 replicas
  maxmemory: 20gb
  maxmemory-policy: allkeys-lru
Monitoring Key Metrics
Infrastructure:
- CPU utilization (target: <70%)
- Memory utilization (target: <80%)
- Disk IOPS (target: <80% capacity)
- Network throughput
Application:
- API request rate
- API error rate (target: <0.1%)
- API latency (p50, p95, p99)
- Database query time
- Cache hit rate (target: >90%)
Business:
- Active agents
- Tasks completed per hour
- Agent utilization rate
- Escalation rate
MULTI-TENANCY IN SELF-HOSTED DEPLOYMENTS
Use Cases
Managed Service Providers (MSPs):
- MSP operates single HUMAN deployment
- Serves multiple client organizations
- Full isolation between clients
System Integrators (SIs):
- SI deploys HUMAN for multiple divisions/subsidiaries
- Shared infrastructure, isolated data
Holding Companies:
- Parent company runs HUMAN
- Subsidiaries use as tenants
- Centralized billing, distributed usage
Architecture: Namespace Isolation
graph TB
  subgraph HUMAN_Control_Plane [HUMAN Control Plane]
    TenantRouter[Tenant Router]
  end
  subgraph Tenant_A [Tenant A: Acme Corp]
    NS_A[Namespace: tenant-acme]
    Agents_A[Agents 1-50]
    DB_A[Database Schema: acme]
  end
  subgraph Tenant_B [Tenant B: GlobalCo]
    NS_B[Namespace: tenant-globalco]
    Agents_B[Agents 1-100]
    DB_B[Database Schema: globalco]
  end
  TenantRouter --> NS_A
  TenantRouter --> NS_B
  NS_A --> Agents_A
  NS_A --> DB_A
  NS_B --> Agents_B
  NS_B --> DB_B
Isolation Guarantees
Network Isolation:
- Kubernetes NetworkPolicy (deny-all by default)
- Traffic between tenants blocked
- Ingress only via tenant-specific endpoints
Compute Isolation:
- Separate namespaces per tenant
- ResourceQuotas enforced (CPU, memory, pods)
- No shared pods between tenants
Data Isolation:
- Separate database schemas per tenant
- Row-level security (RLS) for shared tables
- Encryption keys unique per tenant
Access Isolation:
- Separate RBAC policies per tenant
- Tenant admins cannot access other tenants
- MSP admin has cross-tenant visibility (audit only)
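A sketch of how the Tenant Router might resolve a request to a tenant namespace and database schema; the subdomain convention and lookup table are assumptions, not the shipped router:
// Illustrative only: maps an incoming request to a tenant namespace and DB schema,
// and refuses anything that does not carry a known tenant identity (deny-by-default).
interface TenantContext {
  tenantId: string;
  namespace: string; // e.g. "tenant-acme"
  dbSchema: string;  // e.g. "acme"
}

const TENANTS: Record<string, TenantContext> = {
  acme: { tenantId: 'acme', namespace: 'tenant-acme', dbSchema: 'acme' },
  globalco: { tenantId: 'globalco', namespace: 'tenant-globalco', dbSchema: 'globalco' },
};

function resolveTenant(hostHeader: string): TenantContext {
  // Assumes tenant-per-subdomain, e.g. acme.human.msp.example
  const sub = hostHeader.split('.')[0];
  const tenant = TENANTS[sub];
  if (!tenant) throw new Error(`Unknown tenant: ${sub}`);
  return tenant;
}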
Kubernetes Configuration
Namespace per Tenant:
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-acme
  labels:
    tenant-id: acme
    msp-managed: "true"
ResourceQuota:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: acme-quota
  namespace: tenant-acme
spec:
  hard:
    requests.cpu: "50"      # 50 vCPU
    requests.memory: 100Gi  # 100GB RAM
    pods: "100"             # Max 100 pods
    persistentvolumeclaims: "10"
NetworkPolicy:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-tenant
  namespace: tenant-acme
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              tenant-id: acme # Only same tenant
Licensing for Multi-Tenant MSPs
MSP License:
- Unlimited tenants
- Agent count = sum across all tenants
- Pricing: $100k/yr base + $200/agent/yr
Example:
- MSP serves 10 clients
- Total agents: 500
- Cost: $100k + (500 × $200) = $200k/yr
Alternative: Per-Tenant Licensing
- Each tenant purchases own license
- MSP provides infrastructure only
- HUMAN bills tenants directly
Security Considerations
MSP Responsibilities:
- Network segmentation enforcement
- Resource quota management
- Monitoring and alerting (per-tenant dashboards)
- Backup and disaster recovery (tenant data isolated)
Tenant Responsibilities:
- Application-level access control (who can use agents)
- Compliance with regulations (HIPAA, etc.)
- Agent configuration and management
HUMAN's Role:
- Provide secure multi-tenant architecture
- License enforcement (per tenant)
- Support MSP and tenants (tiered support model)
ENTERPRISE INTEGRATION PATTERNS
Identity Federation
Supported Protocols:
| Protocol | Use Case | Complexity | Recommended For |
|---|---|---|---|
| SAML 2.0 | Enterprise SSO | Medium | Large enterprises, government |
| OAuth2/OIDC | Modern apps | Low | Tech companies, SaaS |
| LDAP/AD | Legacy systems | High | Traditional enterprises |
SAML 2.0 Configuration:
auth:
  provider: saml
  saml:
    entryPoint: https://idp.acme.com/sso
    issuer: https://human.acme.internal
    cert: /etc/human/saml/idp-cert.pem
    identifierFormat: urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress
    attributeMapping:
      email: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress
      firstName: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname
      lastName: http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname
LDAP Configuration:
auth:
  provider: ldap
  ldap:
    url: ldaps://ldap.acme.com:636
    bindDN: cn=human-service,ou=services,dc=acme,dc=com
    bindPassword: <secret>
    searchBase: ou=users,dc=acme,dc=com
    searchFilter: (uid={{username}})
    groupSearchBase: ou=groups,dc=acme,dc=com
    groupMemberAttribute: memberOf
Corporate Proxy Support
For enterprises with mandatory proxy:
network:
  proxy:
    http: http://proxy.acme.com:8080
    https: http://proxy.acme.com:8080
    noProxy:
      - localhost
      - 127.0.0.1
      - .acme.internal
      - .svc.cluster.local
    caCerts:
      - /etc/ssl/certs/acme-root-ca.crt
VPN & Private Connectivity
AWS Direct Connect:
- Private connection to HUMAN-hosted (hybrid deployment)
- Latency: <10ms
- Bandwidth: 1-100 Gbps
Azure ExpressRoute:
- Private peering to HUMAN control plane
- Redundant connections across regions
GCP Cloud Interconnect:
- Dedicated interconnect for high throughput
Site-to-Site VPN:
- IPsec tunnels for smaller deployments
- Encrypted traffic over internet
Custom Certificate Authority
For enterprises with internal CA:
tls:
  ca: custom
  customCA:
    rootCert: /etc/human/ca/root.crt
    intermediateCerts:
      - /etc/human/ca/intermediate1.crt
      - /etc/human/ca/intermediate2.crt
  certManager:
    enabled: true
    issuer: acme-internal-ca
SIEM Integration
Supported SIEM Platforms:
Splunk:
logging:
  siem:
    provider: splunk
    endpoint: https://splunk.acme.com:8088
    token: <hec-token>
    index: human_logs
    sourcetype: human:json
Microsoft Sentinel:
logging:
  siem:
    provider: sentinel
    workspaceId: <workspace-id>
    sharedKey: <shared-key>
    logType: HumanAgentLogs
IBM QRadar:
logging:
  siem:
    provider: qradar
    endpoint: https://qradar.acme.com
    syslogPort: 514
    protocol: tcp
DLP Integration
For enterprises with Data Loss Prevention:
security:
  dlp:
    enabled: true
    provider: symantec # or forcepoint, mcafee
    endpoint: https://dlp.acme.com/api
    scanOutbound: true
    blockOnViolation: true
    alertOnSuspicious: true
UPGRADE STRATEGY & BREAKING CHANGES
Release Cadence
| Release Type | Frequency | Version Change | Contents |
|---|---|---|---|
| Major | Annual | 1.0 → 2.0 | Breaking changes, new features |
| Minor | Quarterly | 1.1 → 1.2 | New features, no breaking changes |
| Patch | Monthly | 1.1.1 → 1.1.2 | Bug fixes, security patches |
Semantic Versioning
Format: MAJOR.MINOR.PATCH (e.g., 1.2.3)
- MAJOR: Breaking API changes, requires migration
- MINOR: New features, backward compatible
- PATCH: Bug fixes, security patches
Breaking Change Policy
Announcement: 90 days before release
Migration Guide: Published with announcement
Support: Old version supported for 12 months after new major release
Example Timeline:
- Day 0: Announce v2.0 (breaking changes)
- Day 90: Release v2.0
- Day 90-Day 455: Support both v1.x and v2.x
- Day 455: End support for v1.x
Upgrade Process (Zero-Downtime)
Blue-Green Deployment:
# 1. Deploy new version (green) alongside old (blue)
$ helm install human-v2 human/control-plane \
--namespace human-green \
--set version=2.0.0
# 2. Validate green environment
$ human-installer validate --namespace human-green
✅ All health checks passed
# 3. Switch traffic to green (gradual)
$ kubectl patch ingress human --type merge \
-p '{"spec":{"rules":[{"host":"human.acme.internal","http":{"paths":[{"path":"/","pathType":"Prefix","backend":{"service":{"name":"api-gateway-green","port":{"number":8080}}}}]}}]}}'
# 4. Monitor for 24 hours
# 5. Decommission blue environment
$ helm uninstall human-v1 --namespace human
Compatibility Matrix
| Control Plane | Agent SDK | Database Schema | Supported |
|---|---|---|---|
| 1.2.x | 1.2.x | v1.2 | ✅ Yes |
| 1.2.x | 1.1.x | v1.2 | ✅ Yes (N-1 support) |
| 1.2.x | 1.0.x | v1.2 | ❌ No (upgrade agents) |
| 2.0.x | 1.2.x | v2.0 | ❌ No (breaking change) |
Policy: Control plane supports agent SDK from previous minor version (N-1).
Database Migration Safety
Automated Migrations:
- All migrations tested against copy of production data
- Rollback plan for every migration
- Execution time estimated and validated
Example Migration:
-- Migration: v1.2.0 → v1.3.0
-- Estimated time: 15 minutes (10M rows)
-- Rollback: Available (DROP COLUMN is reversible)
BEGIN;
-- Add new column
ALTER TABLE agents ADD COLUMN status_v2 VARCHAR(50);
-- Migrate data (batched)
UPDATE agents SET status_v2 = status WHERE status_v2 IS NULL;
-- Once validated, drop old column (future migration)
-- ALTER TABLE agents DROP COLUMN status;
COMMIT;
Automated Upgrade Testing
Pre-release validation:
# Run upgrade test suite
$ human-test-upgrade --from 1.2.0 --to 1.3.0
Tests:
✅ Database migration (15m 32s)
✅ API compatibility (all endpoints)
✅ Agent SDK compatibility (1.2.x → 1.3.x)
✅ Zero-downtime switchover
✅ Rollback procedure
✅ Performance benchmarks (no regression)
Result: Safe to upgrade
Upgrade Checklist
Pre-Upgrade:
- Review release notes and migration guide
- Backup all data (database, configs, secrets)
- Test upgrade in staging environment
- Schedule maintenance window (or plan zero-downtime)
- Notify users of potential disruption
During Upgrade:
- Deploy new version (blue-green)
- Run database migrations
- Validate new environment health
- Switch traffic gradually (10% → 50% → 100%)
- Monitor error rates and latency
Post-Upgrade:
- Validate all critical workflows
- Check monitoring dashboards
- Verify agent registration
- Confirm database performance
- Decommission old environment (after 24hr)
DAY 2 OPERATIONS FOR SELF-HOSTED
Operational Responsibilities
| Task | Frequency | Owner | Automation |
|---|---|---|---|
| Database backups | Daily | Customer Ops | Automated (Velero) |
| Security patches | Weekly | Customer Ops | Semi-automated (Helm) |
| Certificate renewal | 30 days before expiry | Customer Ops | Automated (cert-manager) |
| Capacity review | Monthly | Customer Ops | Dashboard-driven |
| Performance tuning | Quarterly | Customer Ops + HUMAN TAM | Guided |
| Disaster recovery drill | Quarterly | Customer Ops | Scripted |
| Compliance audit | Annual | Customer Compliance | HUMAN support available |
Automated Backup Strategy
Velero Configuration:
# Backup schedule
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: human-daily-backup
spec:
  schedule: "0 2 * * *" # 2 AM daily
  template:
    includedNamespaces:
      - human
      - human-runtime
    storageLocation: aws-s3
    volumeSnapshotLocations:
      - aws-ebs
    ttl: 720h # 30 days retention
Database Backup:
# PostgreSQL automated backup (via cron)
0 2 * * * pg_dump -h postgres.acme.internal -U human human | \
gzip > /backups/human-$(date +\%Y\%m\%d).sql.gz
# Retention: 30 days local, 1 year S3
Disaster Recovery Procedures
Recovery Time Objective (RTO): 1 hour
Recovery Point Objective (RPO): 24 hours
DR Scenario 1: Database Failure
# 1. Promote read replica to primary
$ aws rds promote-read-replica --db-instance-identifier human-db-replica-1
# 2. Update connection string
$ kubectl patch configmap human-config \
-p '{"data":{"DB_HOST":"human-db-replica-1.xyz.rds.amazonaws.com"}}'
# 3. Restart affected pods
$ kubectl rollout restart deployment --all -n human
# RTO: ~15 minutes
DR Scenario 2: Complete Region Failure
# 1. Failover to DR region
$ kubectl config use-context human-dr-us-west-2
# 2. Restore from backup
$ velero restore create --from-backup human-daily-backup-20250101
# 3. Update DNS (Route53 or equivalent)
$ aws route53 change-resource-record-sets --hosted-zone-id Z123 \
--change-batch file://failover-dns.json
# RTO: ~1 hour
DR Drill Procedure
Quarterly drill (4 hours):
- Hour 1: Simulate region failure
  - Take primary region offline (controlled)
  - Measure detection time (<5 min target)
- Hour 2: Execute failover
  - Promote DR region
  - Restore from backup
  - Validate data integrity
- Hour 3: Validate DR environment
  - Run health checks
  - Test agent registration
  - Verify API functionality
- Hour 4: Failback to primary
  - Sync data from DR to primary
  - Switch back to primary region
  - Debrief and document improvements
Monitoring & Alerting
Critical Alerts (Page on-call):
| Alert | Threshold | Response Time |
|---|---|---|
| Control plane down | >50% pods unhealthy | <5 min |
| Database connection failure | >10% error rate | <5 min |
| Disk space critical | >90% full | <15 min |
| Certificate expiring soon | <7 days to expiry | <24 hr |
| License expiring | <30 days to expiry | <24 hr |
Warning Alerts (Review next business day):
| Alert | Threshold | Response Time |
|---|---|---|
| High CPU usage | >80% for 30min | <4 hr |
| High memory usage | >85% for 30min | <4 hr |
| Slow API responses | p95 >500ms | <4 hr |
| Failed backups | 2 consecutive failures | <12 hr |
Performance Tuning (Monthly Review)
Checklist:
- Review database slow query log (optimize queries >100ms)
- Check cache hit rate (target: >90%)
- Analyze resource utilization (CPU, memory, disk)
- Review pod auto-scaling behavior (scale-up/down frequency)
- Check for pod restarts (investigate if >5/day)
- Review API error logs (investigate 4xx/5xx patterns)
Common Operations
Scale Up (Add Capacity):
# Add 10 more nodes
$ eksctl scale nodegroup --cluster=human-prod --nodes=35 --name=human-workers
# Adjust HPA max replicas
$ kubectl patch hpa human-api --patch '{"spec":{"maxReplicas":30}}'
Add Region (Multi-Region):
# Deploy to new region
$ human-installer install --region eu-west-1 --profile multi-region
# Configure cross-region replication
$ human-installer configure-replication \
--primary us-east-1 \
--replica eu-west-1 \
--mode async
Rotate Secrets:
# Rotate database password (zero-downtime)
$ human-installer rotate-secret --name DB_PASSWORD --zero-downtime
Steps:
1. Generate new password
2. Add to database (secondary user)
3. Update application to use new password
4. Remove old password from database
5. Validate no errors
IMPLEMENTATION DETAILS
For engineers building or deploying HUMAN agents, detailed implementation specs are available:
Setup Specifications
Each deployment profile has a complete implementation spec with infrastructure configs, monitoring setup, and deployment procedures:
- Hosted Profile: setup/agent_deployment_hosted_spec.md
  - Zero-config deployment flow
  - What HUMAN manages (infrastructure, monitoring, security)
  - API access and authentication
  - Cost structure and visibility
- Hybrid Profile: setup/agent_deployment_hybrid_spec.md
  - Control plane in HUMAN Cloud, execution in customer VPC
  - Secure tunnel configuration (mTLS, no inbound firewall rules)
  - Monitoring options (push to HUMAN Cloud OR self-hosted)
  - Data residency guarantees
- Self-Hosted Profile: setup/agent_deployment_selfhosted_spec.md
  - Complete infrastructure requirements
  - Helm charts and Terraform modules
  - Database setup and network topology
  - Air-gapped deployment support
Monitoring Configurations
Comprehensive, copy-paste configs for all profiles:
setup/monitoring_configurations.md
- Prometheus scraping configs (self-hosted)
- Grafana dashboard JSONs (fleet overview, cost analytics, audit trail)
- Alert rules (agent down, high error rate, budget alerts)
- Distributed tracing setup (Tempo integration)
- Log aggregation (Loki configuration)
Also see: KB 103 (Monitoring & Observability) for architectural overview and best practices.
Control Plane Architecture
setup/mara_humanos_control_plane_v0.2.md
- Control plane deployment by profile
- Routing, policy engine, approval queue
- Async job system and workflow DAG construction
- Cross-profile consistency guarantees
Agent SDK Patterns
setup/human_agent_sdk_patterns_v0.2.md
- human.call() primitive (works identically across all profiles; see the sketch below)
- Delegation and risk classification
- Context propagation and attestation generation
- Profile-aware SDK configuration
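A hypothetical shape for the human.call() primitive referenced above; the authoritative signature lives in the SDK patterns spec, so treat the names here as assumptions:
// Hypothetical shape only -- the authoritative signature is in the SDK patterns spec.
interface GovernedCallResult {
  status: 'completed' | 'pending_approval' | 'escalated';
  attestationId?: string; // reference recorded on the ledger for the audit trail
}

interface HumanCallClient {
  call(action: string, args: Record<string, unknown>): Promise<GovernedCallResult>;
}

async function refundOrder(human: HumanCallClient, orderId: string, amountUsd: number) {
  // The same call runs against Hosted, Hybrid, or Self-Hosted; only the base URL differs.
  return human.call('payments.refund', { orderId, amountUsd });
}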
Quick Reference
| Need | Document |
|---|---|
| Deploy to Hosted (zero-config) | setup/agent_deployment_hosted_spec.md |
| Deploy to Hybrid (data sovereignty) | setup/agent_deployment_hybrid_spec.md |
| Deploy Self-Hosted (full control) | See implementation spec setup/agent_deployment_selfhosted_spec.md |
| Configure Prometheus/Grafana | setup/monitoring_configurations.md |
| Understand control plane | setup/mara_humanos_control_plane_v0.2.md |
| Build agents | KB 105 (Agent SDK Architecture), KB 130 (Design Patterns) |
MIGRATION PATHS: BORING AND REVERSIBLE
We bake migration paths in from day one:
From Hosted → Hybrid
Trigger: "We need attestations in our data lake" or "Compliance wants ledger in our region"
Process:
- Export Capability Graph state
- Export active policies
- Stand up ledger nodes in their VPC
- Configure HUMAN control plane to point to their ledger
- Test with shadow traffic
- Cut over
Downtime: Minutes (not hours)
Code changes required: Zero (just config)
From Hybrid → Self-Hosted
Trigger: "Audit says we need full operational control" or "We're going multi-cloud"
Process:
- Deploy HumanOS services via Helm/Terraform
- Deploy Capability Graph nodes
- Configure storage adapters (their RDS, S3, etc.)
- Migrate control plane state via export/import
- Point apps to the new HUMAN_BASE_URL
- Decommission hosted control plane
Downtime: Hours (planned maintenance window)
Code changes required: URL + credential changes only
From Hosted → Self-Hosted (Skip Hybrid)
Trigger: "We're a 50-person healthcare startup, just got our first enterprise customer, need HIPAA self-hosted"
Process:
Same as Hosted → Hybrid → Self-Hosted, but done in one shot with migration automation
Downtime: 1 day (planned)
Support: We provide migration engineer + runbooks
STORAGE ADAPTER ARCHITECTURE
Everything that persists state in HUMAN goes through a narrow interface:
The Three Stores
1. GraphStore
Stores: Capability Graph nodes and edges
Interface:
interface GraphStore {
  addNode(node: CapabilityNode): Promise<void>;
  addEdge(edge: CapabilityEdge): Promise<void>;
  queryCapabilities(query: CapabilityQuery): Promise<Capability[]>;
  updateCapability(id: string, update: CapabilityUpdate): Promise<void>;
}
Adapters:
- HumanCloudGraphStore (our multi-tenant infra)
- PostgresGraphStore (customer RDS/Aurora)
- Neo4jGraphStore (native graph DB)
- TigerGraphStore (high-performance alternative)
2. PolicyStore
Stores: HumanOS policies, rules, escalation configs
Interface:
interface PolicyStore {
  storePolicy(policy: Policy): Promise<void>;
  getPolicy(id: string): Promise<Policy>;
  evaluatePolicy(context: PolicyContext): Promise<PolicyDecision>;
  listPolicies(filter: PolicyFilter): Promise<Policy[]>;
}
Adapters:
- HumanCloudPolicyStore
- PostgresPolicyStore
- S3PolicyStore (for large orgs with many policies)
3. LedgerStore
Stores: Attestations, provenance records, audit logs
Interface:
interface LedgerStore {
  anchor(attestation: Attestation): Promise<AnchorReceipt>;
  verify(id: string): Promise<VerificationResult>;
  query(filter: AttestationFilter): Promise<Attestation[]>;
  export(range: TimeRange): Promise<AuditExport>;
}
Adapters:
- HumanCloudLedgerStore (hosted distributed ledger)
- LocalLedgerStore (dev/test)
- PrivateLedgerStore (customer-operated nodes)
- SnowflakeLedgerStore (enterprise data lake integration)
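A sketch of how a deployment could select adapters per profile; the mapping follows the profiles described in this document, while the factory shape itself is an assumption:
// Illustrative only: which adapter backs each store in each profile. The adapter
// class names come from the lists above; the selection table is an assumption.
type StorageProfile = 'hosted' | 'hybrid' | 'self_hosted';

interface AdapterSelection {
  graphStore: string;
  policyStore: string;
  ledgerStore: string;
}

const ADAPTERS: Record<StorageProfile, AdapterSelection> = {
  hosted: {
    graphStore: 'HumanCloudGraphStore',
    policyStore: 'HumanCloudPolicyStore',
    ledgerStore: 'HumanCloudLedgerStore',
  },
  hybrid: {
    graphStore: 'HumanCloudGraphStore',
    policyStore: 'HumanCloudPolicyStore',
    ledgerStore: 'PrivateLedgerStore', // ledger nodes run in the customer VPC
  },
  self_hosted: {
    graphStore: 'PostgresGraphStore', // customer RDS/Aurora
    policyStore: 'PostgresPolicyStore',
    ledgerStore: 'PrivateLedgerStore',
  },
};

console.log(ADAPTERS[(process.env.HUMAN_PROFILE as StorageProfile) ?? 'hosted']);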
ONBOARDING FLOWS BY PROFILE
Hosted Onboarding (SMB)
Step 1: Sign up with Google / O365
- Auto-create HUMAN workspace tied to domain
Step 2: Install Companion
- Browser extension + desktop app
- Generates Passport keys locally on device
Step 3: Pick a starter pack
- "AI customer support with human escalation"
- "AI sales assistant with approval gates"
- "AI recruiting assistant with human screen"
Step 4: Connect existing tools
- OAuth to Gmail, Slack, CRM, etc.
- We store pointers, not content
From their POV: No talk of VPC, DBs, S3 buckets. It just works.
Hybrid Onboarding (Enterprise)
Step 1: Start with Hosted for pilot
- Prove value with real workflows
- Security evaluates during pilot
Step 2: Deploy data plane components
- We provide Terraform modules
- They deploy ledger + caches in VPC
- Establish secure tunnel to HUMAN Cloud
Step 3: Migrate attestations
- Historical data exports to their ledger
- New attestations route to their infra
Step 4: Connect enterprise systems
- SSO integration (Okta, Azure AD)
- Private connectors to internal apps
- VPC peering for sensitive workloads
From their POV: Same app experience, but the control plane stays in our cloud while attestations and the data plane live in theirs.
Self-Hosted Onboarding (Regulated)
Step 1: Architecture review
- HUMAN solutions architect + their platform team
- Define: regions, storage, networking, compliance requirements
Step 2: Deploy via IaC
- Helm charts for Kubernetes
- Terraform for AWS/GCP/Azure
- Ansible for on-prem
Step 3: Configure storage adapters
- Point to their RDS, S3, Neo4j, Snowflake, etc.
- Set retention policies, backup strategies
Step 4: Load test and validate
- Run simulated governance load
- Validate attestation integrity
- Test failover scenarios
Step 5: Cut over production apps
- Update HUMAN_BASE_URL in app configs
- Monitor dashboards for anomalies
From their POV: Full control, full visibility, HUMAN becomes infrastructure they operate.
WHAT STAYS THE SAME ACROSS ALL PROFILES
No matter which deployment profile, these don't change:
1. API Surface
Same REST/GraphQL/gRPC endpoints:
- /v1/passport/*
- /v1/capabilities/*
- /v1/humanos/*
- /v1/attestations/*
2. SDKs
Same client libraries:
import { HumanClient } from '@human/sdk';
const client = new HumanClient({
baseUrl: process.env.HUMAN_BASE_URL // <-- only thing that changes
});
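For example, the same call runs unchanged on every profile (the attestations.query method shown here is illustrative, not a documented SDK method):

// Identical application code on Hosted, Hybrid, and Self-hosted;
// only HUMAN_BASE_URL in the environment differs.
const recentAttestations = await client.attestations.query({
  actor: 'agent:support-bot', // illustrative filter
});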
3. Semantics
Same policy language, same attestation format, same capability model
4. Developer Experience
Same docs, same examples, same onboarding tutorials
Result: Moving between profiles is a URL change, not a rewrite.
PRICING IMPLICATIONS BY PROFILE
Updated: 2025-12-19
The Principle: Self-Hosting Changes Margin Mix, Not Core Engines
Whether HumanOS runs fully on HUMAN Cloud, hybrid, or fully self-hosted, we still charge for governed infrastructure, workforce access, and network effects.
What changes: Who pays for infra and our margin per customer
What doesn't change: Whether we get paid
The sovereign cockpit model means orgs pay for:
- The Platform (HumanOS license, Policy Engine, Reasoning Service)
- The Standards (certification, attestation formats, compliance)
- The Network (optional: workforce services, marketplace, cross-org governance)
They DON'T pay for "permission to make decisions."
See 34_revenue_engines_and_tam.md for complete pricing tiers and revenue model.
Hosted (HUMAN Cloud)
What customer pays us:
- Platform license (based on tier: agents + instance capacity)
  - Free: $0/month (3 agents, 10 instances)
  - Starter: $49/month (10 agents, 50 instances)
  - Professional: $199/month (50 agents, 200 instances)
  - Business: $799/month (200 agents, 800 instances)
  - Enterprise: $2,500+/month (custom)
- Infrastructure included (we run the compute)
- Optional: HUMAN-managed reasoning (we front AI token costs)
- Optional: Workforce services (when available in Phase 2)
What we pay (our COGS):
- Compute per instance-hour (infrastructure costs)
- AI tokens (if HUMAN-managed reasoning)
- Support overhead
⚠️ Pricing Validation Note:
Hosted Infrastructure Costs: Instance-hour allowances and overage pricing require validation against production AWS/GCP costs. The tier structure and features are validated. Self-hosted pricing (below) is fully validated.
Economics:
- Highest touch (we run everything)
- Target margin: 60-70% after scale
- Revenue: Platform license + infrastructure bundled
Customer profile:
- Small businesses (5-100 people)
- No IT/DevOps team
- Want "it just works"
- Comfortable with HUMAN-hosted
Example: 15-person law firm at $199/month Professional tier
- Gets 50 agents, 200 concurrent instances
- HUMAN handles all infrastructure
- Firm focuses on using agents, not running them
Hybrid (HUMAN Cloud + Customer Infrastructure)
What customer pays us:
- Platform license (same tiers as Hosted)
- Partial infrastructure (we host some, they host sensitive workloads)
- BYO keys (typically for on-prem reasoning)
- Optional: Workforce services
What we pay:
- Compute for HUMAN-hosted portion only
- Zero costs for their self-hosted portion
What customer pays (their costs):
- Their own infrastructure (VPC, compute for on-prem agents)
- Their own AI token costs (for on-prem reasoning)
Economics:
- Mixed margins (lower than pure hosted, higher than pure self-hosted)
- Revenue: Platform license + partial infrastructure
- Lower compute costs for us (they run sensitive stuff)
Customer profile:
- Mid-size orgs (100-500 people)
- Some IT capability
- Mix of sensitive and non-sensitive workloads
- Want flexibility (cloud for convenience, on-prem for compliance)
Example: 50-person hospital at $799/month Business tier
- Runs PHI-touching agents on-prem (clinical notes, patient data)
- Runs non-PHI agents on HUMAN Cloud (scheduling, billing)
- Gets HIPAA compliance built-in
- Hybrid = best of both worlds
Self-Hosted (Customer Infrastructure)
What customer pays us:
- Platform license only (based on agents/users/scale)
- No infrastructure fees (they run it)
- No per-instance charges (they pay their own compute)
- Support & certification (annual contract)
  - Premium support included
  - Quarterly business reviews
  - Certification services
- Optional: Workforce services (when available)
- Optional: Marketplace (we take rev share on installed agents)
What we pay:
- Minimal control plane infrastructure (metadata only)
- Support team costs
What customer pays (their costs):
- All infrastructure (VPC, Kubernetes, databases, compute)
- All AI token costs (their BYO keys)
- Their own DevOps/SRE team
Economics:
- Lowest touch for us (they run it)
- Pure software licensing margins (80%+)
- Revenue: License + support + optional services
- Highest ACVs (enterprises pay more for control)
Customer profile:
- Large enterprises (500+ people)
- Mature platform engineering team
- Regulated industries (finance, healthcare, government)
- Want full control and data sovereignty
Example: 500-person bank at $30k/year Enterprise license
- Runs everything on their AWS
- Uses their own LLM cluster
- HUMAN provides: software license, certification, support
- Bank's total cost: $30k license + ~$40k their infra = $70k/year
- Bank's value: Replaced $500k BPO contract + $2M fraud savings = 40x ROI
Pricing Summary Table
| Deployment | Platform License | Infrastructure | Support | Our Margin | Customer ACV |
|---|---|---|---|---|---|
| Hosted | $49-799/mo tiers | Included | Email/Phone | 60-70% | $588-9.6k/year |
| Hybrid | Same tiers | Partial (we host some) | Business | 50-60% | $1k-15k/year |
| Self-Hosted | $30k+/year | Customer pays | Enterprise + TAM | 80%+ | $30k-100k+/year |
Key insight: Self-hosted has highest margin (pure software) but requires enterprise sales motion. Hosted has lower margin but scales via self-serve.
Cost Flows by Deployment Mode
Hosted:
Customer pays: $799/month (Business tier)
- To HUMAN: $799/month
  - Platform license: $799
  - Infrastructure: Included
  - AI tokens: Included (up to allowance)
HUMAN pays:
- Compute: ~$200/month (infrastructure for their agents)
- AI tokens: ~$150/month (reasoning calls)
- Support: ~$50/month (allocated)
- Margin: ~$400/month (50%)
Hybrid:
Customer pays: $799/month + their AWS costs
- To HUMAN: $799/month
  - Platform license: $799
  - Infrastructure: Partial (non-sensitive agents)
  - BYO keys for on-prem reasoning
- To AWS (their bill): ~$300/month
  - VPC for on-prem agents
  - Compute for sensitive workloads
  - Their LLM endpoints
HUMAN pays:
- Compute: ~$100/month (only non-sensitive portion)
- Support: ~$50/month
- Margin: ~$650/month (81%)
Customer total cost: $1,099/month
Self-Hosted:
Customer pays: $30k/year license + their infrastructure
- To HUMAN: $30k/year ($2,500/month)
  - Platform license: $30k
  - Support & certification: Included
  - Infrastructure: $0 (they run it)
- To their cloud provider: ~$40k/year
  - Kubernetes cluster
  - Databases
  - Compute for agents
  - Their LLM cluster
HUMAN pays:
- Support team: ~$300/month (allocated)
- Minimal infrastructure: ~$50/month (control plane metadata)
- Margin: ~$2,150/month (86%)
Customer total cost: $70k/year
Customer value delivered: $2.8M/year (savings + revenue)
ROI: 40x
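The margin arithmetic in the three flows above, as a quick sketch (figures are the illustrative ones from this section, not validated pricing):

// Gross margin per deployment mode, using the illustrative monthly figures above.
interface CostFlow {
  revenue: number; // what the customer pays HUMAN per month
  costs: number[]; // HUMAN's allocated monthly costs for that customer
}

const flows: Record<string, CostFlow> = {
  hosted: { revenue: 799, costs: [200, 150, 50] },     // compute, AI tokens, support
  hybrid: { revenue: 799, costs: [100, 50] },          // partial compute, support
  selfHosted: { revenue: 2500, costs: [300, 50] },     // support team, control plane
};

for (const [mode, { revenue, costs }] of Object.entries(flows)) {
  const margin = revenue - costs.reduce((sum, c) => sum + c, 0);
  const pct = Math.round((margin / revenue) * 100);
  console.log(`${mode}: ~$${margin}/month margin (${pct}%)`);
}
// hosted: ~$399/month (50%), hybrid: ~$649/month (81%), selfHosted: ~$2150/month (86%)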
Why This Model Works
For Small Businesses (Hosted):
- Zero infrastructure burden
- Predictable monthly cost
- Scale up as they grow
- Can migrate to hybrid/self-hosted later if needed
For Mid-Market (Hybrid):
- Best of both worlds
- Keep sensitive data on-prem
- Use cloud for convenience
- Optimize costs (don't pay us for compute they can run cheaper)
For Enterprises (Self-Hosted):
- Full control and sovereignty
- Data never leaves their infrastructure
- Regulatory compliance built-in
- Still get platform innovation (we ship updates)
For HUMAN:
- Hosted = lower margin, higher volume (SMB focus)
- Self-hosted = higher margin, lower volume (enterprise focus)
- Both are profitable at scale
- Revenue model survives regardless of deployment choice
Revenue Impact: Deployment Mix Over Time
Year 1 (Platform Launch):
- 80% Hosted (SMBs discovering product)
- 15% Hybrid (early mid-market)
- 5% Self-Hosted (pilot enterprises)
Year 2 (Enterprise Adoption):
- 60% Hosted (SMB growth continues)
- 25% Hybrid (mid-market standard)
- 15% Self-Hosted (enterprise momentum)
Year 3 (Enterprise Dominance):
- 40% Hosted (by customer count, but lower ACVs)
- 30% Hybrid (sweet spot for many)
- 30% Self-Hosted (by revenue, highest ACVs)
Revenue distribution shifts even as customer mix doesn't:
- Hosted customers: Many, but $49-799/month each
- Self-hosted customers: Few, but $30k-100k/year each
By Year 3:
- 10,000 hosted customers × $200/month avg = $24M ARR
- 500 hybrid customers × $1,000/month avg = $6M ARR
- 200 self-hosted customers × $50k/year avg = $10M ARR
- Total Platform Revenue: $40M ARR
Self-hosted is 2% of customers but 25% of platform revenue (and highest margin).
DECISION CRITERIA: WHICH PROFILE SHOULD A CUSTOMER CHOOSE?
| Factor | Hosted | Hybrid | Self-Hosted |
|---|---|---|---|
| Team Size | <200 | 200–5,000 | >1,000 or regulated |
| Infra Team | None / small | Exists | Mature platform eng |
| Data Sensitivity | Low–Medium | Medium–High | Highest |
| Compliance | General | Industry-specific | Regulated (HIPAA, FedRAMP) |
| Speed to Value | Minutes | Days | Weeks |
| OpEx Preference | High (pay us) | Mixed | Low (run it themselves) |
| CapEx Willingness | None | Some | High |
| Vendor Lock-in Concern | Low | Medium | High |
AVOIDING HOSTING AS A BARRIER
The traditional problem:
- Big enterprises: "Cool idea, but it has to run in our VPC"
- SMB: "Please don't make me think about any of that"
Our solution:
- SMB: "It just works, you never see infra"
- Enterprise: "Same APIs, runs in your VPC when you're ready"
The messaging becomes:
For SMB:
"Start here, you never have to touch infra."
For Enterprise:
"Start here, prove value, then shift into your VPC with the same code."
STORAGE AS NON-ISSUE
When an enterprise says: "We only use Snowflake / RDS / Azure SQL / Splunk"
We say: "Cool β here are the adapters, here's a reference deployment, your apps don't change."
The adapter pattern means:
- HUMAN Cloud: optimized multi-tenant storage
- Customer-hosted: we support their preferred vendors
- Migration: export from ours, import to theirs, done
Result: Storage preference becomes a config option, not a deal-breaker.
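A minimal sketch of what "storage preference as a config option" could look like (the adapter keys and environment variables here are illustrative, not a documented config schema):

// Storage backends selected by configuration; swapping Postgres for Neo4j,
// or adding a Snowflake export, changes this object and the adapter it names,
// not the application code that calls GraphStore / PolicyStore / LedgerStore.
const storageConfig = {
  graphStore: { adapter: 'postgres', dsn: process.env.GRAPH_DB_URL },
  policyStore: { adapter: 'postgres', dsn: process.env.POLICY_DB_URL },
  ledgerStore: { adapter: 'private-node', endpoint: process.env.LEDGER_NODE_URL },
};

export default storageConfig;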
WHY THIS ARCHITECTURE WORKS
1. Clean Boundaries
- Devices own keys (Layer 0)
- HUMAN owns coordination (Layer 1)
- Customers own data (Layer 2)
These layers never blur.
2. Pluggable Storage
- Everything behind narrow interfaces
- Swap PostgreSQL for Neo4j? Config change.
- Add Snowflake export? New adapter.
3. Same Semantics Everywhere
- Hosted, Hybrid, Self-hosted: same protocol
- No "enterprise edition" with different behavior
- Migration is boring (the best kind of boring)
4. Revenue Flexibility
- SMB: SaaS economics (high margin)
- Enterprise: Mixed (medium margin, high ACV)
- Self-hosted: Services (lower margin, highest ACV)
Every segment has a profitable path.
HUMAN'S OWN PRODUCTION INFRASTRUCTURE (4-NINES ARCHITECTURE)
This section describes how HUMAN operates its own Hosted profile infrastructure to achieve 99.99% availability.
Multi-Region Active-Active Architecture
HUMAN targets 99.99% (4 nines) availability = 4.3 minutes downtime/month.
To achieve this, HUMAN operates multi-region active-active (not active-passive):
+--------------------------------------------------------------------+
|                        Global DNS (Route53)                        |
|        Latency-based routing + health checks (10s interval)        |
+------------------+--------------------------------+----------------+
                   |                                |
        +----------v--------+  Replication  +-------v-----------+
        |     US-East-1     |<------------->|     US-West-2     |
        |      (Active)     |    <1s lag    |      (Active)     |
        +-------------------+               +-------------------+
        | - EKS: 10 pods    |               | - EKS: 10 pods    |
        | - Load: 50%       |               | - Load: 50%       |
        | - RDS: Primary    |               | - RDS: Replica    |
        | - Redis: Primary  |               | - Redis: Replica  |
        +---------+---------+               +---------+---------+
                  |                                   |
                  +-----------------+-----------------+
                                    |
                       +------------v------------+
                       |       Global State      |
                       |  - DynamoDB (global)    |
                       |  - S3 (multi-region)    |
                       +-------------------------+
Key Characteristics:
- Both regions serve live traffic (50% each)
- Either region can handle 100% load (capacity buffer)
- Automated failover <30 seconds if one region fails
- No single points of failure (distributed across 3+ AZs per region)
- Data replicated in real-time (<1s lag)
Cost Impact:
- Single region: ~$3,500/month
- Multi-region active-active: ~$7,500/month
- Additional cost: $4,000/month for 4-nines availability
- ROI: Prevents customer SLA breaches and reputation damage
Regional Failover Automation
Terraform Configuration:
# terraform/modules/region/main.tf
module "us_east_1" {
source = "./modules/region"
region = "us-east-1"
environment = "production"
is_primary = true # RDS primary (write)
eks_node_count = 10
rds_instance_class = "db.r6g.2xlarge"
rds_multi_az = true
replicate_to = ["us-west-2"]
}
module "us_west_2" {
source = "./modules/region"
region = "us-west-2"
environment = "production"
is_primary = false # RDS read replica (can be promoted)
eks_node_count = 10
rds_instance_class = "db.r6g.2xlarge"
rds_multi_az = true
replicate_from = "us-east-1"
}
# Route53 health checks
resource "aws_route53_health_check" "us_east_1" {
fqdn = "api.us-east-1.human.ai"
port = 443
type = "HTTPS"
resource_path = "/health"
request_interval = 10 # Check every 10 seconds
failure_threshold = 2 # Fail after 2 consecutive failures (20s)
tags = {
Name = "US-East-1 Health Check"
}
}
resource "aws_route53_health_check" "us_west_2" {
fqdn = "api.us-west-2.human.ai"
port = 443
type = "HTTPS"
resource_path = "/health"
request_interval = 10
failure_threshold = 2
tags = {
Name = "US-West-2 Health Check"
}
}
# Global DNS with latency-based routing
resource "aws_route53_record" "api" {
zone_id = aws_route53_zone.human_ai.id
name = "api.human.ai"
type = "A"
set_identifier = "us-east-1"
latency_routing_policy {
region = "us-east-1"
}
health_check_id = aws_route53_health_check.us_east_1.id
alias {
name = module.us_east_1.load_balancer_dns
zone_id = module.us_east_1.load_balancer_zone_id
evaluate_target_health = true
}
}
resource "aws_route53_record" "api_west" {
zone_id = aws_route53_zone.human_ai.id
name = "api.human.ai"
type = "A"
set_identifier = "us-west-2"
latency_routing_policy {
region = "us-west-2"
}
health_check_id = aws_route53_health_check.us_west_2.id
alias {
name = module.us_west_2.load_balancer_dns
zone_id = module.us_west_2.load_balancer_zone_id
evaluate_target_health = true
}
}
Automated RDS Failover Lambda:
// lambda/regional-failover.ts
// Note: rds, route53, logger, pagerduty, provenance, and ZONE_ID are assumed to be
// pre-initialized AWS clients / internal helpers imported elsewhere in this bundle.
export async function handleRegionalFailover(event: CloudWatchEvent) {
const unhealthyRegion = event.detail.region;
logger.critical({ unhealthyRegion }, 'Regional failover triggered');
if (unhealthyRegion === 'us-east-1') {
// Promote the us-west-2 RDS cluster replica to primary
// (cluster-level promotion call, since a DBClusterIdentifier is passed)
await rds.promoteReadReplicaDBCluster({
DBClusterIdentifier: 'human-production-us-west-2',
});
logger.info('RDS replica promoted to primary');
// Update Route53 weights (100% to us-west-2)
await route53.changeResourceRecordSets({
HostedZoneId: ZONE_ID,
ChangeBatch: {
Changes: [
{
Action: 'UPSERT',
ResourceRecordSet: {
Name: 'api.human.ai',
Type: 'A',
SetIdentifier: 'us-east-1',
Weight: 0, // Stop sending to us-east-1
},
},
{
Action: 'UPSERT',
ResourceRecordSet: {
Name: 'api.human.ai',
Type: 'A',
SetIdentifier: 'us-west-2',
Weight: 100, // Send 100% to us-west-2
},
},
],
},
});
logger.info('Route53 updated to route to us-west-2');
}
// Page on-call
await pagerduty.trigger({
severity: 'critical',
summary: `AUTOMATED REGIONAL FAILOVER: ${unhealthyRegion} -> healthy region`,
details: {
unhealthyRegion,
estimatedDowntime: '20-30 seconds',
rdsPromoted: true,
dnsUpdated: true,
},
});
// Log to provenance
await provenance.log({
actor: 'automation:regional-failover',
action: 'promote_secondary_region',
from: unhealthyRegion,
automated: true,
});
}
Database Multi-Region Strategy
Aurora Global Database:
# Primary cluster (us-east-1)
resource "aws_rds_cluster" "primary" {
cluster_identifier = "human-production-primary"
engine = "aurora-postgresql"
engine_version = "15.3"
engine_mode = "provisioned"
master_username = "human_admin"
master_password = data.aws_secretsmanager_secret_version.db_password.secret_string
# Multi-AZ within region
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
# Global database for cross-region replication
global_cluster_identifier = "human-production-global"
# Automated backups
backup_retention_period = 30
preferred_backup_window = "03:00-04:00"
# Connection pooling
db_cluster_parameter_group_name = aws_rds_cluster_parameter_group.human_pg.name
# Enable Performance Insights
enabled_cloudwatch_logs_exports = ["postgresql"]
}
# Secondary cluster (us-west-2) - read replica
resource "aws_rds_cluster" "secondary" {
provider = aws.us_west_2
cluster_identifier = "human-production-secondary"
engine = "aurora-postgresql"
engine_version = "15.3"
# Replicate from primary
replication_source_identifier = aws_rds_cluster.primary.arn
# Can be promoted to primary on failover
global_cluster_identifier = "human-production-global"
# Multi-AZ
availability_zones = ["us-west-2a", "us-west-2b", "us-west-2c"]
}
# Connection pooling with pgbouncer
resource "aws_ecs_service" "pgbouncer" {
name = "pgbouncer"
cluster = aws_ecs_cluster.human_production.id
task_definition = aws_ecs_task_definition.pgbouncer.arn
desired_count = 3
load_balancer {
target_group_arn = aws_lb_target_group.pgbouncer.arn
container_name = "pgbouncer"
container_port = 6432
}
}
Replication Lag Monitoring:
# Prometheus alert for replication lag
- alert: RDSReplicationLagHigh
expr: |
aws_rds_replica_lag_seconds{cluster="human-production"} > 5
for: 2m
labels:
severity: warning
annotations:
summary: "RDS replication lag >5 seconds"
description: "Replication lag {{ $value }}s may impact failover RTO"
action: "Investigate replication performance"
Zero-Downtime Deployment with Terraform
Kubernetes Blue/Green via Terraform:
# Blue deployment (current production)
resource "kubernetes_deployment" "companion_api_blue" {
metadata {
name = "companion-api-blue"
labels = {
app = "companion-api"
deployment = "blue"
}
}
spec {
replicas = 10
selector {
match_labels = {
app = "companion-api"
deployment = "blue"
}
}
template {
metadata {
labels = {
app = "companion-api"
deployment = "blue"
version = var.current_version
}
}
spec {
container {
name = "companion-api"
image = "human/companion-api:${var.current_version}"
resources {
requests {
cpu = "500m"
memory = "512Mi"
}
limits {
cpu = "1000m"
memory = "1Gi"
}
}
}
}
}
}
}
# Green deployment (new version, starts at 0 replicas)
resource "kubernetes_deployment" "companion_api_green" {
metadata {
name = "companion-api-green"
labels = {
app = "companion-api"
deployment = "green"
}
}
spec {
replicas = var.deploy_active ? 10 : 0 # Controlled by deploy script
selector {
match_labels = {
app = "companion-api"
deployment = "green"
}
}
template {
metadata {
labels = {
app = "companion-api"
deployment = "green"
version = var.new_version
}
}
spec {
container {
name = "companion-api"
image = "human/companion-api:${var.new_version}"
resources {
requests {
cpu = "500m"
memory = "512Mi"
}
limits {
cpu = "1000m"
memory = "1Gi"
}
}
}
}
}
}
}
# Service points to blue or green
resource "kubernetes_service" "companion_api" {
metadata {
name = "companion-api"
}
spec {
selector = {
app = "companion-api"
deployment = var.active_deployment # "blue" or "green"
}
port {
port = 80
target_port = 3000
}
type = "ClusterIP"
}
}
Deploy Script with Automated Rollback:
#!/bin/bash
# scripts/deploy-production.sh
set -e
NEW_VERSION=$1
# 1. Update green deployment to new version
terraform apply \
-var="new_version=${NEW_VERSION}" \
-var="deploy_active=true" \
-target=kubernetes_deployment.companion_api_green
# 2. Wait for green pods ready
kubectl wait --for=condition=available \
deployment/companion-api-green \
--timeout=5m
# 3. Smoke tests
curl -f http://companion-api-green:3000/health || exit 1
# 4. Switch traffic (instant)
terraform apply \
-var="active_deployment=green"
# 5. Monitor for 5 minutes
sleep 300
ERROR_RATE=$(curl -s "http://prometheus:9090/api/v1/query" \
--data-urlencode 'query=sum(rate(http_requests_total{version="'${NEW_VERSION}'",status=~"5.."}[5m])) / sum(rate(http_requests_total{version="'${NEW_VERSION}'"}[5m]))' \
| jq -r '.data.result[0].value[1]')
if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
echo "β Rollback: error rate ${ERROR_RATE} > 1%"
# Instant rollback
terraform apply -var="active_deployment=blue"
exit 1
fi
# 6. Success - scale down blue
terraform apply -var="deploy_active=false" \
-target=kubernetes_deployment.companion_api_blue
echo "β
Deploy complete"
Infrastructure State Management
Remote State Backend:
# terraform/backend.tf
terraform {
backend "s3" {
bucket = "human-terraform-state"
key = "production/terraform.tfstate"
region = "us-east-1"
# State locking
dynamodb_table = "terraform-state-lock"
encrypt = true
# Versioning enabled on S3 bucket
}
}
# State locking table
resource "aws_dynamodb_table" "terraform_state_lock" {
name = "terraform-state-lock"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
tags = {
Name = "Terraform State Lock"
Environment = "production"
}
}
Deployment Rollout Timeline
| Month | Milestone | Status |
|---|---|---|
| Month 1 | Deploy US-East + US-West simultaneously | Pending approval (+$4k/month cost) |
| Month 1 | Terraform IaC for all infrastructure | Pending |
| Month 1 | Blue/green deployment automation | Pending |
| Month 2 | Regional failover tested monthly | Pending |
| Month 3 | Chaos engineering: kill region monthly | Pending |
| Month 6 | Add EU-West (3-region active-active) | Planning |
See Also:
- kb/102_performance_engineering_guide.md - 4-nines architecture overview
- kb/103_monitoring_and_observability_setup.md - Multi-region observability
- kb/129_ai_driven_operations_strategy.md - AI-driven deployment automation
CROSS-REFERENCES
- See: 26_hybrid_stack_architecture.md - Conceptual architecture and design philosophy
- See: 49_devops_and_infrastructure_model.md - Operational infrastructure and multi-cloud strategy
- See: 11_engineering_blueprint.md - System layers and component architecture
- See: 107_developer_adoption_playbook.md - How deployment flexibility supports developer GTM
- See: 109_pricing_mechanics_and_billing.md - How deployment profiles affect pricing
- See: 43_haio_developer_architecture.md - API architecture that works across all profiles
Metadata
Created: November 26, 2025
Version: 1.0
Strategic Purpose: Enable every customer segment with zero-regret hosting
Audience: Technical decision-makers, solutions architects, platform teams
Related Docs: 26, 49, 11, 107, 109, 43
Line Count: ~590 lines
Status: Complete - Deployment Models and Hosting Strategy