Deployment

latchgate up starts the gate in embedded mode (SQLite + embedded policy) with zero external dependencies. For production with HA replay and defense-in-depth egress, either run latchgate up --infra, or manage Redis, OPA, and Squid yourself and start the gate with latchgate serve.

Before deploying LatchGate to production, verify:

  • Identity provider is peercred (or OIDC/mTLS when available), not none
  • Named operator credentials with DPoP ([operator_credentials.NAME]), not a shared key
  • Signing keys are persisted to disk (receipt_signing_key_path, grant_signing_key_path)
  • Receipt keys JWKS file is persisted (receipt_keys_jwks_path)
  • response_schema_enforcement = "deny"
  • TCP listener is disabled (no unsafe_expose_http)
  • Binary is a release build (not compiled with --features unsafe-dev)
  • Redis is reachable and persistent (appendonly yes)
  • OPA is reachable with the policy bundle loaded
  • Signing key files are backed up
  • sops_secrets_file is set for actions that declare secrets (see Secrets Management)
  • sops_key_file is set (or SOPS backend auth is configured via environment)
  • Age key file has restricted permissions (chmod 600)
  • egress_proxy_url is set AND Squid is running, if any action uses proxy_allowlist (strongly recommended — see Egress Proxy). Without it, kernel-only enforcement (Layer 1) applies and a startup warning is emitted.
  • latchgate doctor passes all checks including SOPS and egress proxy
  • Evidence ledger backup schedule configured (SQLite online backup)
  • Monitoring alerts on latchgate_unresolved_intents > 0 (see Troubleshooting)
  • Monitoring alerts on latchgate_webhook_outbox_pending growth (see Webhooks)
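
Several of the checklist items above can be checked mechanically before startup. The following is a minimal sketch, not a LatchGate tool: it assumes the configuration has already been parsed into a Python dict, and that every key other than identity.provider sits at the top level, which this page does not confirm. check_production_config is a hypothetical helper name.

```python
def check_production_config(cfg: dict) -> list[str]:
    """Return checklist violations found in a parsed config dict.

    Key placement (top level vs. nested) is an assumption for illustration.
    """
    problems = []
    if cfg.get("identity", {}).get("provider", "none") == "none":
        problems.append("identity provider must not be 'none'")
    if cfg.get("response_schema_enforcement") != "deny":
        problems.append("response_schema_enforcement must be 'deny'")
    if cfg.get("unsafe_expose_http"):
        problems.append("unsafe_expose_http must not be set")
    for key in ("receipt_signing_key_path", "grant_signing_key_path",
                "receipt_keys_jwks_path"):
        if not cfg.get(key):
            problems.append(f"{key} must be set so keys persist to disk")
    return problems
```

An empty list means these particular items pass. It does not replace latchgate doctor, which also checks live dependencies.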

Run latchgate doctor before starting the gate to verify all dependencies and configuration are correct:

latchgate doctor

This checks Redis connectivity, OPA reachability, egress proxy reachability, provider module digests, manifest integrity, SOPS binary availability (when sops_secrets_file is configured), and WASM host capabilities. See CLI Reference for details.

┌────────────────────────────────────────┐
│  Agent container / VM                  │
│                                        │
│  agent process                         │
│       │                                │
│       │ UDS                            │
│       ▼                                │
│  /run/latchgate/gate.sock              │
│       │                                │
│  latchgate serve                       │
│       │                                │
│       ├── Redis (replay, budgets)      │
│       ├── OPA (policy)                 │
│       └── Squid (egress proxy,         │
│           required for proxy_allowlist)│
└────────────────────────────────────────┘
        │ host I/O (HTTP via Squid in v0.1; SMTP, SQL, AMQP, S3 planned)
        ▼
  external systems

The agent process communicates with LatchGate exclusively over a Unix domain socket. The agent has no direct network access to external systems — all side effects go through LatchGate’s host I/O layer.

Agent processes connect to the client socket:

listen_uds_path = "/run/latchgate/gate.sock"

This exposes: POST /v1/leases, GET /.well-known/jwks.json, POST /v1/actions/{id}/execute, GET /v1/actions, GET /v1/actions/{id}, GET /v1/actions/{id}/schema/request, GET /v1/approvals/{id}/poll, GET /v1/receipts/{id}, and health endpoints.

Operator tools (CLI, dashboards) connect to the admin socket:

listen_admin_uds_path = "/run/latchgate/gate-admin.sock"

This exposes: approval endpoints, audit queries, receipt retrieval (with operator auth), revocation, receipt key export, domain management, path management, policy ACL management, and metrics. Agent processes cannot reach admin APIs.

Rate limits: 20 req/s on operator write endpoints, 100 req/s on operator read endpoints (token-bucket, per-process).
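
For readers unfamiliar with token-bucket limiting, here is a minimal sketch of the general technique. It is not LatchGate's implementation: the burst capacity, the class name, and the injectable clock are assumptions chosen for illustration.

```python
import time


class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full, allowing an initial burst
        self.last = clock()

    def allow(self):
        """Consume one token if available; return whether the request passes."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With rate=20 and capacity=20, a caller can burst 20 write requests and then sustain 20 per second. Whether the gate permits a burst of that size is not stated here.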

Note: The receipt endpoint is available on both sockets with different auth models. Client socket uses lease-based DPoP auth. Admin socket uses operator auth. Both return identical response bodies.

Unix domain sockets provide kernel-enforced caller identity via SO_PEERCRED. The kernel guarantees the peer UID — it cannot be forged. This is the foundation for peercred identity (mapping UIDs to principals without any client-side authentication).
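
To illustrate what the kernel exposes, the sketch below reads SO_PEERCRED from a connected Unix socket using only the Python standard library. It is Linux-only and uses a socket pair inside one process, so the peer it reports is the process itself. The helper name peer_credentials is hypothetical.

```python
import os
import socket
import struct


def peer_credentials(conn: socket.socket) -> tuple[int, int, int]:
    """Return (pid, uid, gid) of the process at the other end (Linux only)."""
    fmt = "iII"  # struct ucred: pid_t pid, uid_t uid, gid_t gid
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


# Demo: both ends of the pair belong to this process.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(a)
a.close()
b.close()
```

The values come from the kernel, not from anything the peer sent, which is why no client-side credential is needed.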

Map each agent’s Unix UID to a principal name and scope set:

[identity]
provider = "peercred"
[identity.peercred]
allow_unmapped = false
[identity.peercred.principals]
1001 = { principal = "agent-support", scopes = ["tools:call"], owner = "alice@company.com" }
1002 = { principal = "agent-ops", scopes = ["tools:call", "db:query"], owner = "bob@company.com" }

With allow_unmapped = false, any UID not in the map is denied at lease issuance.
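
The lookup this configuration implies can be sketched as follows. This is an illustration of the deny-by-default behavior only, with hypothetical names (Principal, resolve_principal). It models allow_unmapped = false and makes no claim about what the gate does when that flag is true.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    name: str
    scopes: tuple
    owner: str


# Mirrors the [identity.peercred.principals] table above.
PRINCIPALS = {
    1001: Principal("agent-support", ("tools:call",), "alice@company.com"),
    1002: Principal("agent-ops", ("tools:call", "db:query"), "bob@company.com"),
}


def resolve_principal(uid: int, principals=PRINCIPALS) -> Principal:
    """Map a kernel-reported UID to a principal; deny anything unmapped."""
    principal = principals.get(uid)
    if principal is None:
        raise PermissionError(f"uid {uid} is not mapped to a principal")
    return principal
```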

LatchGate uses two Ed25519 signing keys:

  • Receipt signing key — signs ExecutionReceipts for the evidence ledger
  • Grant signing key — signs ExecutionGrants (separate key for defense-in-depth)

Keys are auto-generated on first run (32-byte seed, mode 0600). Back them up. If a receipt key is lost, receipts signed with it become unverifiable.

On load, LatchGate checks that key files have not been widened beyond 0600 (owner-only). If group or world bits are set, a SECURITY warning is emitted to structured logs. This matches OpenSSH’s private key permission check behavior.
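
The same check is easy to run from a deployment script. The sketch below is an independent illustration using the Python standard library, not LatchGate's own code, and key_file_too_permissive is a hypothetical name.

```python
import os
import stat
import tempfile


def key_file_too_permissive(path: str) -> bool:
    """True if any group or world permission bit is set on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & 0o077)


# Demo on a throwaway file standing in for a signing key.
with tempfile.NamedTemporaryFile() as f:
    os.chmod(f.name, 0o600)
    owner_only = key_file_too_permissive(f.name)      # owner-only: passes
    os.chmod(f.name, 0o640)
    group_readable = key_file_too_permissive(f.name)  # group bit set: flagged
```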

When the receipt signing key is rotated, the old verifying key is appended to the JWKS file. Receipts carry a signing_key_id (kid) and the /v1/receipt-keys endpoint returns all historical verifying keys. Old receipts remain verifiable.

Never delete a verifying key from receipt-keys.jwks unless every receipt signed with that key has been externally verified and archived.
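
The sketch below shows why a deleted key makes old receipts unverifiable: a verifier selects the key by kid, and the lookup fails if the entry is gone. The JWKS layout follows the standard JWK shape for Ed25519 keys. The kid values and the helper name are made up, and the exact fields LatchGate writes are an assumption.

```python
def find_verifying_key(jwks: dict, kid: str) -> dict:
    """Return the JWK whose kid matches the receipt's signing_key_id."""
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    raise KeyError(f"no verifying key with kid {kid!r}; receipt cannot be verified")


# Hypothetical JWKS after one rotation: the old key stays alongside the new one.
jwks = {
    "keys": [
        {"kty": "OKP", "crv": "Ed25519", "kid": "receipt-old", "x": "<public key>"},
        {"kty": "OKP", "crv": "Ed25519", "kid": "receipt-new", "x": "<public key>"},
    ]
}
```

Signature verification itself needs an Ed25519 library and is omitted here.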

Secrets for action execution are stored in a SOPS-encrypted file and decrypted just-in-time. See Secrets Management for setup, rotation, and encryption backend options.

For defense-in-depth, configure a Squid forward proxy for outbound HTTP from WASM providers. When actions use proxy_allowlist but no proxy is configured, the gate starts with a warning and uses kernel-only enforcement (Layer 1: sink validation + SSRF protection + manifest domain allowlists). The proxy adds an independent Layer 2 backstop.

egress_proxy_url = "http://squid.internal:3128"

See Egress Proxy for the full setup: allowlist generation, Squid configuration, troubleshooting, and how the kernel + proxy layers cooperate.

Pre-built images are published to GHCR on every release:

docker pull ghcr.io/latchgate-ai/latchgate:latest
docker pull ghcr.io/latchgate-ai/latchgate:0.1.0 # pinned version

The runtime image includes a Docker HEALTHCHECK instruction that polls /healthz every 10 seconds. Container orchestrators (ECS, Compose, Swarm) use this to detect unresponsive instances and trigger restarts automatically.

The docker-compose.yml at the repo root uses Compose profiles to opt into optional services. Pick one based on what you need:

docker compose up # core deps only: redis + opa
docker compose --profile dev up # core deps + Squid + Prometheus (gate runs on host)
docker compose --profile quickstart up # full self-contained stack: gate + redis + opa

  • Default (no profile) — starts only Redis and OPA. Use this when you run latchgate serve on the host and only need its dependencies.
  • --profile dev — adds the Squid egress proxy and Prometheus alongside the core deps. The gate itself still runs on the host, exercising the egress proxy locally.
  • --profile quickstart — runs the gate inside Docker too, in a single self-contained stack. Enables HTTP transport for easy demo access. Not for production — production deployments must use UDS with no HTTP exposure.

To build the image from source instead of pulling from GHCR:

docker build -t latchgate .

In an emergency, revoke all active leases and grants:

latchgate revoke

The kill-switch requires operator DPoP authentication. Use the CLI, which handles DPoP proof construction automatically, or call the API with Authorization: DPoP <key> and DPoP: <proof> headers.

This advances the revocation epoch. All leases and grants from prior epochs are immediately invalid. Agents must re-authenticate.
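
A revocation epoch can be modeled in a few lines. The sketch below is a conceptual illustration with hypothetical names (Lease, RevocationState). It assumes each lease records the epoch at issuance, which is one common way to get this behavior and is not confirmed as LatchGate's internal design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    principal: str
    epoch: int  # revocation epoch at the time of issuance


class RevocationState:
    def __init__(self):
        self.epoch = 0

    def issue(self, principal: str) -> Lease:
        return Lease(principal, self.epoch)

    def revoke_all(self) -> None:
        """Kill switch: one increment invalidates every outstanding lease."""
        self.epoch += 1

    def is_valid(self, lease: Lease) -> bool:
        return lease.epoch == self.epoch
```

The appeal of this approach is that revocation is a single counter update, with no need to enumerate or track individual leases.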

  • /healthz — liveness probe (returns {"status":"ok"})
  • /readyz — readiness probe (returns 503 until all startup checks pass)
  • /v1/admin/status — operational status snapshot: version, uptime, dependency health, pending approvals, unresolved intents, revocation epoch, webhook state (admin socket, operator auth required)
  • /metrics — Prometheus-format metrics (admin socket only)
  • JSONL audit export for SIEM integration
  • Outbound webhooks for real-time alerting on approvals, denials, revocations, and failures

Key metrics to watch:

  • latchgate_unresolved_intents — should be 0 in steady state; non-zero indicates evidence gaps
  • latchgate_webhook_outbox_pending — growing trend indicates webhook delivery issues
  • latchgate_oldest_pending_approval_seconds — growing trend indicates operator response delays
  • latchgate_audit_write_error_total — any increment is a critical incident
  • latchgate_budget_exhausted_total — indicates undersized budgets or runaway agents
  • readyz_degraded_total{reason="..."} — per-cause degradation counters
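
If no alerting stack is in place yet, a scrape of /metrics can be checked with a small script. The sketch below parses the Prometheus text format and applies two of the thresholds listed above. The sample text is made up for the demo (including the reason label value) and is not real gate output. Fetching /metrics over the admin socket is left out.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comment lines
            continue
        head, brace, tail = line.rpartition("}")
        if brace:  # labeled series: name{...} value
            name, rest = head + "}", tail
        else:      # unlabeled series: name value
            name, _, rest = line.partition(" ")
        values[name] = float(rest.split()[0])
    return values


def firing_alerts(values: dict[str, float]) -> list[str]:
    alerts = []
    if values.get("latchgate_unresolved_intents", 0) > 0:
        alerts.append("unresolved intents: possible evidence gap")
    if values.get("latchgate_audit_write_error_total", 0) > 0:
        alerts.append("audit write errors: critical incident")
    return alerts


SAMPLE = """\
# made-up sample values, not real output
latchgate_unresolved_intents 2
latchgate_audit_write_error_total 0
readyz_degraded_total{reason="redis"} 1
"""
alerts = firing_alerts(parse_metrics(SAMPLE))
```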

Orchestrators (systemd, Kubernetes) should follow this sequence:

  1. Call POST /v1/admin/drain (the gate refuses new requests with 503)
  2. Poll /v1/admin/status until in_flight_executions == 0
  3. Send SIGTERM

If the gate receives SIGTERM without a prior drain, it waits up to 30 seconds for in-flight executions to complete and then aborts whatever is still running. Aborted executions produce unresolved intents, so avoid this path.
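
The drain-then-terminate sequence can be sketched as an orchestrator-side helper. The three callables are placeholders the caller must supply (for example, an HTTP call to POST /v1/admin/drain over the admin socket, a read of in_flight_executions from /v1/admin/status, and a signal send). The function name, poll interval, and timeout are assumptions.

```python
import time


def graceful_shutdown(start_drain, in_flight, send_sigterm,
                      poll_interval=1.0, timeout=300.0, sleep=time.sleep):
    """Drain the gate, wait for in-flight executions to finish, then SIGTERM."""
    start_drain()                      # step 1: gate now refuses new requests
    waited = 0.0
    while in_flight() > 0:             # step 2: poll until nothing is running
        if waited >= timeout:
            raise TimeoutError("executions still in flight; not sending SIGTERM")
        sleep(poll_interval)
        waited += poll_interval
    send_sigterm()                     # step 3: safe to terminate
```

Raising on timeout, rather than sending SIGTERM anyway, leaves the decision to an operator, since terminating with work in flight is the path that produces unresolved intents.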

For configuration reference, see Configuration. For the full threat model, see Security Model. For secrets setup, see Secrets Management. For egress proxy setup, see Egress Proxy.