Deployment

latchgate up starts the gate in embedded mode (SQLite + embedded policy) with zero external dependencies. For production deployments that need highly available replay protection and defense-in-depth egress filtering, either use latchgate up --infra or run Redis, OPA, and Squid yourself and start the gate with latchgate serve.

Production checklist

Before deploying LatchGate to production, verify:

Pre-flight check

Run latchgate doctor before starting the gate to verify all dependencies and configuration are correct:

latchgate doctor

This checks Redis connectivity, OPA reachability, egress proxy reachability, provider module digests, manifest integrity, SOPS binary availability (when sops_secrets_file is configured), and WASM host capabilities. See CLI Reference for details.

Recommended architecture

┌────────────────────────────────────────┐
│  Agent container / VM                  │
│                                        │
│  agent process                         │
│    │                                   │
│    │ UDS                               │
│    ▼                                   │
│  /run/latchgate/gate.sock              │
│    │                                   │
│  latchgate serve                       │
│    │                                   │
│    ├── Redis (replay, budgets)         │
│    ├── OPA (policy)                    │
│    └── Squid (egress proxy -- required │
│              for proxy_allowlist)      │
└────────────────────────────────────────┘
         │
         │ host I/O (HTTP via Squid in v0.1; SMTP, SQL, AMQP, S3 planned)
         ▼
    external systems

The agent process communicates with LatchGate exclusively over a Unix domain socket. The agent has no direct network access to external systems — all side effects go through LatchGate’s host I/O layer.

Transport

Client socket

Agent processes connect to the client socket:

listen_uds_path = "/run/latchgate/gate.sock"

This exposes: POST /v1/leases, GET /.well-known/jwks.json, POST /v1/actions/{id}/execute, GET /v1/actions, GET /v1/actions/{id}, GET /v1/actions/{id}/schema/request, GET /v1/approvals/{id}/poll, GET /v1/receipts/{id}, and health endpoints.
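The following is a minimal Python sketch of talking to the client socket. It assumes the gate serves ordinary HTTP/1.x over the UDS; the HTTP-style paths above suggest this, but the framing is an assumption. UnixHTTPConnection and get_json are hypothetical helpers, not part of any LatchGate SDK. The sketch only calls /healthz, which it assumes needs no lease. The lease and action endpoints also require DPoP headers, which this sketch does not construct.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that dials a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "localhost" only fills the Host header; no DNS lookup happens.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def get_json(socket_path: str, path: str) -> tuple[int, dict]:
    """GET `path` over the UDS and decode the JSON response body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read() or b"{}")
    finally:
        conn.close()
```

Against a running gate, get_json("/run/latchgate/gate.sock", "/healthz") should return status 200 with the {"status":"ok"} body described under Monitoring.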

Admin socket

Operator tools (CLI, dashboards) connect to the admin socket:

listen_admin_uds_path = "/run/latchgate/gate-admin.sock"

This exposes: approval endpoints, audit queries, receipt retrieval (with operator auth), revocation, receipt key export, domain management, path management, policy ACL management, and metrics. Agent processes cannot reach admin APIs.

Rate limits: 20 req/s on operator write endpoints, 100 req/s on operator read endpoints (token-bucket, per-process).
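The limits are described as token buckets. Below is a minimal sketch of that technique under two assumptions, neither of which is stated above:

- The burst capacity equals the per-second rate.
- "Per-process" means one bucket per gate process, shared by all operators.

TokenBucket and the limiter names are hypothetical. The injectable clock exists only to make the behavior testable.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, capped at `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full, so an idle caller may burst
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # a rejected request would typically get HTTP 429 (assumed)


# Assumed reading of "per-process": one bucket per endpoint class per gate process.
write_limiter = TokenBucket(rate=20, capacity=20)
read_limiter = TokenBucket(rate=100, capacity=100)
```

Because the bucket starts full, an idle operator could burst up to 20 writes at once. After that, writes are admitted at one every 50 ms. Under the per-process assumption, each gate process enforces its own budget, so the aggregate limit grows with the number of processes.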

Note: The receipt endpoint is available on both sockets with different auth models. The client socket uses lease-based DPoP auth; the admin socket uses operator auth. Both return identical response bodies.

Why UDS?

Unix domain sockets provide kernel-enforced caller identity via SO_PEERCRED. The kernel records the peer's credentials when the connection is established, so the client cannot forge its UID. This is the foundation for peercred identity: mapping UIDs to principals without any client-side authentication.
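To show what the kernel provides, here is a minimal Python sketch (Linux only) that reads SO_PEERCRED from a connected Unix socket. It is not LatchGate's code, and peer_credentials is a hypothetical helper. The returned UID is the effective UID of the connecting process at connect() time.

```python
import socket
import struct

# struct ucred { pid_t pid; uid_t uid; gid_t gid; }: pid is signed,
# uid and gid are unsigned 32-bit integers on Linux.
UCRED = struct.Struct("iII")


def peer_credentials(conn: socket.socket) -> tuple[int, int, int]:
    """Return (pid, uid, gid) of the process on the other end of `conn`.

    The kernel captured these values when the connection was made,
    so the peer cannot choose what the server sees.
    """
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, UCRED.size)
    return UCRED.unpack(raw)
```

A server would call peer_credentials on each accepted connection and look the UID up in the principal map described in the next section.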

Identity: peercred setup

Map each agent’s Unix UID to a principal name and scope set:

[identity]
provider = "peercred"

[identity.peercred]
allow_unmapped = false

[identity.peercred.principals]
1001 = { principal = "agent-support", scopes = ["tools:call"], owner = "alice@company.com" }
1002 = { principal = "agent-ops", scopes = ["tools:call", "db:query"], owner = "bob@company.com" }

With allow_unmapped = false, any UID not in the map is denied at lease issuance.
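The lookup this configuration implies can be sketched as follows. The dictionary mirrors the TOML above. Principal, LeaseDenied, and resolve_principal are hypothetical names. The sketch covers only the documented allow_unmapped = false behavior and deliberately refuses to guess what allow_unmapped = true grants.

```python
from dataclasses import dataclass

# Mirrors [identity.peercred] above. TOML always parses table keys as
# strings, so the UID keys are "1001" and "1002" and lookups use str(uid).
PEERCRED_CONFIG = {
    "allow_unmapped": False,
    "principals": {
        "1001": {"principal": "agent-support", "scopes": ["tools:call"],
                 "owner": "alice@company.com"},
        "1002": {"principal": "agent-ops", "scopes": ["tools:call", "db:query"],
                 "owner": "bob@company.com"},
    },
}


@dataclass(frozen=True)
class Principal:
    name: str
    scopes: frozenset
    owner: str


class LeaseDenied(Exception):
    """Hypothetical error for a refused lease issuance."""


def resolve_principal(uid: int, config: dict = PEERCRED_CONFIG) -> Principal:
    entry = config["principals"].get(str(uid))
    if entry is None:
        if not config["allow_unmapped"]:
            raise LeaseDenied(f"UID {uid} is not mapped to a principal")
        # Behavior for allow_unmapped = true is not documented here.
        raise NotImplementedError("allow_unmapped = true is out of scope")
    return Principal(entry["principal"], frozenset(entry["scopes"]), entry["owner"])
```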

Key management

Signing keys

LatchGate uses two Ed25519 signing keys:

Receipt signing key — signs ExecutionReceipts for the evidence ledger
Grant signing key — signs ExecutionGrants (separate key for defense-in-depth)

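Keys are auto-generated on first run (32-byte seed, mode 0600). Back them up. Verifying a receipt needs only the public verifying key. However, if a receipt key is lost before its verifying key has been published or archived, receipts signed with it become unverifiable.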

On load, LatchGate checks that key files have not been widened beyond 0600 (owner-only). If any group or world permission bit is set, a SECURITY warning is emitted to the structured logs. The check mirrors OpenSSH's private key permission check, although OpenSSH goes further and refuses to load a key that fails it.
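A minimal sketch of the two file-level properties described above:

- creating a 32-byte seed that is never readable by group or world;
- the group/world-bit check.

It assumes the key file holds the raw 32-byte seed; the on-disk encoding is not documented here. create_seed_file and permissions_too_open are hypothetical names. Deriving the Ed25519 key pair from the seed needs a cryptography library and is omitted.

```python
import os
import stat


def create_seed_file(path: str) -> None:
    """Write a fresh random 32-byte seed to a new file created with mode 0600."""
    # O_EXCL refuses to overwrite an existing key. The mode is applied at
    # creation; the umask can only remove bits from it, never add them.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, os.urandom(32))
    finally:
        os.close(fd)


def permissions_too_open(path: str) -> bool:
    """True if any group or world permission bit is set on `path`."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))
```

Passing the mode to os.open at creation time, rather than calling chmod afterwards, avoids a window in which the freshly written seed is readable under a permissive umask.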

Key rotation

When the receipt signing key is rotated, the old verifying key is appended to the JWKS file. Receipts carry a signing_key_id (kid) and the /v1/receipt-keys endpoint returns all historical verifying keys. Old receipts remain verifiable.

Never delete a verifying key from receipt-keys.jwks unless every receipt signed with that key has been externally verified and archived.
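A verifier resolves a receipt's signing_key_id against the JWKS before checking the signature. The sketch below assumes the JWKS entries use the standard OKP/Ed25519 JWK format from RFC 8037 (kty, crv, x, kid). The exact layout of receipt-keys.jwks is not shown here, and verifying_key_for is a hypothetical helper. The final Ed25519 signature check needs a cryptography library and is omitted.

```python
import base64


def b64url_decode(s: str) -> bytes:
    # JWK members use unpadded base64url; restore the padding first.
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def verifying_key_for(jwks: dict, kid: str) -> bytes:
    """Return the raw 32-byte Ed25519 public key whose "kid" matches."""
    for jwk in jwks.get("keys", []):
        if (jwk.get("kid") == kid and jwk.get("kty") == "OKP"
                and jwk.get("crv") == "Ed25519"):
            key = b64url_decode(jwk["x"])
            if len(key) != 32:
                raise ValueError(f"malformed Ed25519 key for kid {kid!r}")
            return key
    # No matching kid: the key was deleted or never published, so receipts
    # signed with it can no longer be verified.
    raise KeyError(kid)
```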

Secrets

Secrets for action execution are stored in a SOPS-encrypted file and decrypted just-in-time. See Secrets Management for setup, rotation, and encryption backend options.

Egress proxy

For defense-in-depth, configure a Squid forward proxy for outbound HTTP from WASM providers. When actions use proxy_allowlist but no proxy is configured, the gate starts with a warning and uses kernel-only enforcement (Layer 1: sink validation + SSRF protection + manifest domain allowlists). The proxy adds an independent Layer 2 backstop.

egress_proxy_url = "http://squid.internal:3128"

See Egress Proxy for the full setup: allowlist generation, Squid configuration, troubleshooting, and how the kernel + proxy layers cooperate.
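Layer 1 is described only at a high level above. The following Python sketch illustrates the general shape of a domain allowlist combined with an SSRF guard. It is a generic technique, not LatchGate's implementation. The exact-match allowlist, the choice to reject every non-global address, and the egress_allowed name are all assumptions.

```python
import ipaddress
from urllib.parse import urlsplit


def egress_allowed(url: str, allowlist: set[str], resolved_ips: list[str]) -> bool:
    """Allowlist + SSRF check for one outbound request (illustrative only).

    `resolved_ips` are the addresses the host name resolved to; the caller
    must then connect to exactly these addresses.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname.lower().rstrip(".")
    if host not in allowlist:  # exact match, no wildcards
        return False
    for ip in resolved_ips:
        # Reject loopback, private, link-local (e.g. cloud metadata) and
        # other non-global addresses, even for an allowlisted name.
        if not ipaddress.ip_address(ip).is_global:
            return False
    return bool(resolved_ips)
```

Checking the resolved addresses, not just the host name, stops an allowlisted name that resolves to an internal address. The connection must then go to those same addresses; otherwise a second DNS lookup could return a different answer (DNS rebinding). The Squid layer enforces its own allowlist independently of this check.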

Docker

Pre-built images are published to GHCR on every release:

docker pull ghcr.io/latchgate-ai/latchgate:latest
docker pull ghcr.io/latchgate-ai/latchgate:0.1.0     # pinned version

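The runtime image includes a Docker HEALTHCHECK instruction that polls /healthz every 10 seconds. Docker Swarm uses this status to replace unhealthy tasks automatically. Other environments need their own configuration:

- Amazon ECS only monitors health checks declared in the task definition.
- Kubernetes ignores HEALTHCHECK in favor of its own probes.
- Plain Docker and Docker Compose report the status but do not restart an unhealthy container on their own.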

Docker Compose profiles

The docker-compose.yml at the repo root uses Compose profiles to opt into optional services. Pick one based on what you need:

docker compose up                              # core deps only: redis + opa
docker compose --profile dev up                # core deps + Squid + Prometheus (gate runs on host)
docker compose --profile quickstart up         # full self-contained stack: gate + redis + opa

Default (no profile) — starts only Redis and OPA. Use this when you run latchgate serve on the host and only need its dependencies.
--profile dev — adds the Squid egress proxy and Prometheus alongside the core deps. The gate itself still runs on the host, exercising the egress proxy locally.
--profile quickstart — runs the gate inside Docker too, in a single self-contained stack. Enables HTTP transport for easy demo access. Not for production — production deployments must use UDS with no HTTP exposure.

To build the image from source instead of pulling from GHCR:

docker build -t latchgate .

Kill switch

In an emergency, revoke all active leases and grants:

latchgate revoke

The kill switch requires operator DPoP authentication. Use the CLI, which constructs the DPoP proof automatically, or call the API with Authorization: DPoP <key> and DPoP: <proof> headers.

This advances the revocation epoch. All leases and grants from prior epochs are immediately invalid. Agents must re-authenticate.
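The epoch mechanism makes revocation a single counter increment instead of a per-credential blocklist. Here is a minimal in-memory sketch, assuming each lease and grant records the epoch current at issuance. Credential and RevocationState are hypothetical names. A multi-instance deployment would need the epoch in shared storage, which this sketch does not model.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """A lease or grant stamped with the epoch in force when it was issued."""
    id: str
    epoch: int


class RevocationState:
    def __init__(self) -> None:
        self._epoch = 0
        self._lock = threading.Lock()

    def issue(self, cred_id: str) -> Credential:
        with self._lock:
            return Credential(cred_id, self._epoch)

    def revoke_all(self) -> int:
        """Kill switch: one increment invalidates every outstanding credential."""
        with self._lock:
            self._epoch += 1
            return self._epoch

    def is_valid(self, cred: Credential) -> bool:
        # Only credentials from the current epoch are accepted.
        with self._lock:
            return cred.epoch == self._epoch
```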

Monitoring

/healthz — liveness probe (returns {"status":"ok"})
/readyz — readiness probe (returns 503 until all startup checks pass)
/v1/admin/status — operational status snapshot: version, uptime, dependency health, pending approvals, unresolved intents, revocation epoch, webhook state (admin socket, operator auth required)
/metrics — Prometheus-format metrics (admin socket only)
JSONL audit export for SIEM integration
Outbound webhooks for real-time alerting on approvals, denials, revocations, and failures

Key metrics to alert on

latchgate_unresolved_intents — should be 0 in steady state; non-zero indicates evidence gaps
latchgate_webhook_outbox_pending — growing trend indicates webhook delivery issues
latchgate_oldest_pending_approval_seconds — growing trend indicates operator response delays
latchgate_audit_write_error_total — any increment is a critical incident
latchgate_budget_exhausted_total — indicates undersized budgets or runaway agents
readyz_degraded_total{reason="..."} — per-cause degradation counters
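These rules would normally be written as Prometheus alerting rules, for example increase(latchgate_audit_write_error_total[5m]) > 0. For a quick script-level check, the following Python sketch parses two scrapes of /metrics and applies the rules above. It makes these assumptions and simplifications:

- The listed latchgate_ metrics are exported without labels.
- The text-format parser is simplified: label values containing } are not handled.
- Comparing two scrapes is a crude stand-in for a proper trend query.

parse_samples and alerts are hypothetical names.

```python
import re

# One sample line of the Prometheus text format: name{labels} value [timestamp]
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_samples(text: str) -> dict[str, float]:
    """Map "name{labels}" to its value, skipping blank and # HELP/# TYPE lines."""
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE.match(line)
        if m:
            out[m.group(1) + (m.group(2) or "")] = float(m.group(3))
    return out


def alerts(current: dict[str, float], previous: dict[str, float]) -> list[str]:
    """Apply the steady-state rules listed above to two successive scrapes."""
    found = []
    if current.get("latchgate_unresolved_intents", 0) > 0:
        found.append("unresolved intents: evidence gap")
    for name in ("latchgate_audit_write_error_total",
                 "latchgate_budget_exhausted_total"):
        if current.get(name, 0) > previous.get(name, 0):
            found.append(f"{name} increased")
    for name in ("latchgate_webhook_outbox_pending",
                 "latchgate_oldest_pending_approval_seconds"):
        if current.get(name, 0) > previous.get(name, 0):
            found.append(f"{name} growing")
    return found
```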

Graceful shutdown

Orchestrators (systemd, Kubernetes) should follow this sequence:

Call POST /v1/admin/drain (the gate refuses new requests with 503)
Poll /v1/admin/status until in_flight_executions == 0
Send SIGTERM

If the gate receives SIGTERM without a prior drain, it waits up to 30 seconds for in-flight executions to complete and then aborts any that are still running. Aborted executions leave unresolved intents (surfaced by latchgate_unresolved_intents), so avoid this path.
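The orchestration sequence above can be sketched with the HTTP calls and the signal abstracted into callables. The abstraction is there because the admin endpoints need operator DPoP authentication, which the CLI normally handles. drain_and_stop, its parameters, and the 60-second default timeout are hypothetical. The sleep and clock parameters are injectable only for testing.

```python
import time
from typing import Callable


def drain_and_stop(post_drain: Callable[[], None],
                   in_flight: Callable[[], int],
                   send_sigterm: Callable[[], None],
                   timeout_s: float = 60.0,
                   poll_interval_s: float = 0.5,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> bool:
    """Drain, wait for in-flight executions to reach zero, then SIGTERM.

    Returns True if the gate drained before SIGTERM was sent, False if the
    timeout expired and SIGTERM went out with work still in flight.
    """
    post_drain()                   # step 1: POST /v1/admin/drain
    deadline = clock() + timeout_s
    clean = False
    while clock() < deadline:
        if in_flight() == 0:       # step 2: in_flight_executions from /v1/admin/status
            clean = True
            break
        sleep(poll_interval_s)
    send_sigterm()                 # step 3: SIGTERM
    return clean
```

Under systemd, the drain-and-wait part fits in ExecStop= with a no-op send_sigterm, because systemd sends SIGTERM itself once ExecStop= finishes. A Kubernetes preStop hook plays the same role. A False return means SIGTERM went out with work still in flight, which is the path that produces unresolved intents.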

For configuration reference, see Configuration. For the full threat model, see Security Model. For secrets setup, see Secrets Management. For egress proxy setup, see Egress Proxy.