Scaling and High Availability

Read this if you need the deployment mental model from laptop install to multi-instance cluster. Skip this if you need table-level mechanics first; use the drill-down docs.

Tyrum keeps one logical architecture across deployment sizes. The difference between single host and cluster is replica count and infrastructure shape, not control-plane semantics.

What This Page Covers

Deployment roles and how they map to single-host vs cluster.
The durability and coordination primitives that must remain true at every size.
High-level failure posture for restart, failover, and reconnect scenarios.

Deployment Shapes

Single-host

Typical form: one machine with co-located gateway edge, turn processor, worker, and scheduler roles.
Durable state: SQLite StateStore on local persistent disk.
Backplane/lease behavior still exists conceptually, but usually in uncontested form.

Clustered

Typical form: replicated gateway edges, replicated workers, lease-coordinated schedulers, optional desktop-runtime role.
Durable state: HA Postgres StateStore.
Backplane/outbox handles cross-edge event delivery and directed routing.
Workers and schedulers coordinate with durable leases/claims instead of in-memory ownership.

Runtime Roles

gateway-edge: long-lived WebSocket/HTTP edge, routing, auth, control-plane coordination.
worker: claims and executes work attempts through tools, nodes, and MCP.
scheduler: emits cron/watch/heartbeat triggers under DB lease coordination.
desktop-runtime (optional): reconciles gateway-managed desktop environments.

Short Invariants (Must Hold Everywhere)

Durable state is the source of truth; restarts cannot erase execution-critical state.
Requests/events are retry-safe under at-least-once delivery.
Execution that must be serialized stays serialized per conversation.
Secrets remain handles until trusted execution boundaries resolve them.
Workspace durability is explicit; ToolRunner is the writer boundary for workspace-backed steps.

Coordination Primitives

StateStore durability: conversation, approval, turn, artifact, audit, and routing state survives process churn.
Outbox/backplane delivery: cross-instance event routing stays at-least-once with dedupe safety.
Claim/lease execution: workers claim work atomically and take over safely on expiry.
Scheduler DB leases: trigger ownership is durable and failover-safe.
Idempotency + retries: side-effecting work does not duplicate outcomes during retries.

Failure and Recovery Posture

Edge failure: connections reconnect to healthy edges; state is re-derived from durable records.
Worker failure: lease expiry enables takeover without replaying completed side effects.
Scheduler failure: lease transfer prevents double-firing under multi-instance operation.
Database/transient network issues: retries + durable contracts restore service without changing policy semantics.

Not In Scope Here

Protocol wire contracts and payload catalogs.
Component-level execution/approval internals.
Table-level retention, schema, and index tuning mechanics.

What This Page Covers​

Deployment Shapes​

Single-host​

Clustered​

Runtime Roles​

Short Invariants (Must Hold Everywhere)​

Coordination Primitives​

Failure and Recovery Posture​

Not In Scope Here​

Drill-down​