Operating Exodus
Exodus is an operational product. Run it with clear ownership, observability, approval gates, and rollback criteria.
Required Owners
Every production migration should identify:
- migration operator,
- database owner,
- application owner,
- rollback approver,
- business decision owner,
- and incident escalation contact.
Do not run a production migration unless someone is authorized to make the rollback decision and the commit decision.
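The ownership rule above can be enforced mechanically before a run starts. The following is a minimal sketch, not an Exodus API: the `MigrationOwners` record, its field names, and both helper functions are hypothetical and only mirror the roles listed in this section.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


# Hypothetical record of the roles listed above; not part of Exodus itself.
@dataclass
class MigrationOwners:
    migration_operator: Optional[str] = None
    database_owner: Optional[str] = None
    application_owner: Optional[str] = None
    rollback_approver: Optional[str] = None
    business_decision_owner: Optional[str] = None
    incident_escalation_contact: Optional[str] = None


def missing_owners(owners: MigrationOwners) -> List[str]:
    """Return the role names that have no person assigned."""
    return [
        f.name
        for f in fields(owners)
        if not (getattr(owners, f.name) or "").strip()
    ]


def assert_ready_for_production(owners: MigrationOwners) -> None:
    """Refuse to proceed while any required role is unassigned."""
    missing = missing_owners(owners)
    if missing:
        raise ValueError("unassigned migration roles: " + ", ".join(missing))
```

A team might call `assert_ready_for_production` from its own change-management tooling so that a missing rollback approver blocks the run instead of surfacing mid-migration.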
Preflight Checklist
Before execution:
- source and target endpoints are configured,
- credentials and TLS settings are verified,
- endpoint health checks pass,
- interlays are reachable from application networks,
- telemetry export is working,
- rollback path is documented,
- source backup or provider recovery path is known,
- target capacity is validated,
- known incompatibilities are documented,
- and the maintenance or communication plan is approved.
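One way to keep the checklist above honest is to express each item as a named check and refuse to start while any item fails. This is a sketch under assumptions: the item names and the `run_preflight` function are illustrative, and the individual probes are placeholders that each team would supply. Nothing here is an Exodus command or API.

```python
from typing import Callable, Dict, List

# Item names mirror the checklist above. They are illustrative labels only.
PREFLIGHT_ITEMS = [
    "endpoints_configured",
    "credentials_and_tls_verified",
    "endpoint_health_checks_pass",
    "interlays_reachable",
    "telemetry_export_working",
    "rollback_path_documented",
    "source_recovery_path_known",
    "target_capacity_validated",
    "incompatibilities_documented",
    "communication_plan_approved",
]


def run_preflight(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Return checklist items that failed, raised, or have no check registered."""
    failures = []
    for item in PREFLIGHT_ITEMS:
        check = checks.get(item)
        try:
            ok = check is not None and bool(check())
        except Exception:
            ok = False  # a probe that crashes counts as a failed item
        if not ok:
            failures.append(item)
    return failures
```

Treating a missing or crashing probe as a failure is a deliberate choice in this sketch: an item that cannot be verified should block the run just as a failing one does.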
During the Run
Watch these product states:
| Area | What to monitor |
|---|---|
| Workflow | Current phase, approval state, errors, retry state. |
| Source | Health, latency, CPU, memory, connection count, provider limits. |
| Target | Health, latency, capacity, connection count, write acceptance. |
| Data movement | Progress, retry rate, failed keys or records, unresolved divergence. |
| Live traffic | Request rate, write latency, read latency, errors, policy blocks. |
| Rollback | Whether source remains current enough to resume authority. |
Rollback Criteria
Define rollback criteria before migration starts. Examples:
- target error rate exceeds the agreed threshold,
- p99 latency exceeds the agreed threshold for the agreed window,
- divergence count grows instead of shrinking,
- compatibility checks find a critical issue,
- application smoke tests fail,
- the operator loses observability,
- or the business owner rejects the current risk level.
Rollback is a normal product path, not a failure of the migration process.
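Agreeing on criteria in advance is easier when they are written as an explicit evaluation rather than prose. The sketch below assumes hypothetical `RollbackThresholds` and `RunSnapshot` structures filled in from the team's own telemetry. The field names, the sample-based latency window, and the "last sample higher than first" divergence rule are illustrative choices, not Exodus behavior.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RollbackThresholds:
    max_target_error_rate: float  # fraction, e.g. 0.01 for 1%
    max_p99_latency_ms: float
    latency_window_samples: int   # consecutive samples that must all breach


@dataclass
class RunSnapshot:
    target_error_rate: float
    p99_latency_ms: Sequence[float]   # most recent sample last
    divergence_counts: Sequence[int]  # most recent sample last
    critical_compat_issue: bool = False
    smoke_tests_passed: bool = True
    observability_ok: bool = True
    business_owner_accepts_risk: bool = True


def rollback_reasons(s: RunSnapshot, t: RollbackThresholds) -> List[str]:
    """Return every agreed rollback criterion that the snapshot currently meets."""
    reasons = []
    if s.target_error_rate > t.max_target_error_rate:
        reasons.append("target error rate above threshold")
    n = t.latency_window_samples
    window = list(s.p99_latency_ms)[-n:] if n > 0 else []
    if n > 0 and len(window) == n and all(v > t.max_p99_latency_ms for v in window):
        reasons.append("p99 latency above threshold for the full window")
    counts = list(s.divergence_counts)
    if len(counts) >= 2 and counts[-1] > counts[0]:
        reasons.append("divergence count is growing")
    if s.critical_compat_issue:
        reasons.append("critical compatibility issue")
    if not s.smoke_tests_passed:
        reasons.append("application smoke tests failed")
    if not s.observability_ok:
        reasons.append("operator lost observability")
    if not s.business_owner_accepts_risk:
        reasons.append("business owner rejected the risk level")
    return reasons
```

An empty list means no agreed criterion is met. A non-empty list is input for the rollback approver, who still makes the decision.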
Commit Criteria
Commit only when:
- target is healthy,
- validation checks pass,
- divergence is resolved or accepted,
- application owners approve,
- rollback window requirements are satisfied,
- and telemetry confirms target authority after cutover.
After commit, keep the source available until the customer's decommission policy allows removal.
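The commit conditions above can be treated as a gate that lists what is still blocking. This is a minimal sketch with a hypothetical `CommitEvidence` structure. How each field is populated (health probes, validation reports, recorded approvals) is left to the team and is not something this document specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommitEvidence:
    target_healthy: bool
    validation_passed: bool
    unresolved_divergence: int
    divergence_accepted: bool  # explicit sign-off on any remaining divergence
    application_owners_approved: bool
    rollback_window_satisfied: bool
    telemetry_confirms_target_authority: bool


def commit_blockers(e: CommitEvidence) -> List[str]:
    """Return the commit conditions that are not yet met."""
    blockers = []
    if not e.target_healthy:
        blockers.append("target is not healthy")
    if not e.validation_passed:
        blockers.append("validation checks have not passed")
    if e.unresolved_divergence > 0 and not e.divergence_accepted:
        blockers.append("divergence is neither resolved nor accepted")
    if not e.application_owners_approved:
        blockers.append("application owners have not approved")
    if not e.rollback_window_satisfied:
        blockers.append("rollback window requirements are not satisfied")
    if not e.telemetry_confirms_target_authority:
        blockers.append("telemetry does not confirm target authority")
    return blockers
```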
Incident Notes
If a migration enters incident response:
- preserve workflow run ID and migration ID,
- snapshot source and target health,
- export failed keys or records,
- record last phase transition,
- capture policy and compatibility issues,
- and avoid destructive cleanup before the incident owner approves.
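Capturing the evidence listed above in one file makes it harder to lose during an incident. The following sketch writes a JSON bundle with the Python standard library. The function name, the bundle layout, and the file naming scheme are assumptions for illustration, and the health snapshots and failed keys are expected to come from whatever exports the team already has.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List


def write_incident_bundle(
    directory: Path,
    workflow_run_id: str,
    migration_id: str,
    source_health: Dict[str, Any],
    target_health: Dict[str, Any],
    failed_keys: List[str],
    last_phase_transition: str,
    issues: List[str],
) -> Path:
    """Write one JSON evidence file and return its path."""
    bundle = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "workflow_run_id": workflow_run_id,
        "migration_id": migration_id,
        "source_health": source_health,
        "target_health": target_health,
        "failed_keys": failed_keys,
        "last_phase_transition": last_phase_transition,
        "policy_and_compatibility_issues": issues,
    }
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"incident-{migration_id}-{workflow_run_id}.json"
    # Mode "x" fails if the file exists, so earlier evidence is never overwritten.
    with path.open("x", encoding="utf-8") as fh:
        json.dump(bundle, fh, indent=2, sort_keys=True)
    return path
```

Refusing to overwrite an existing bundle follows the same principle as the last item above: nothing destructive happens until the incident owner approves.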
Last updated: October 20, 2018