Operating Exodus

Exodus is an operational product. It should be run with clear ownership, observability, approval gates, and rollback criteria.

Required Owners

Every production migration should identify:

  • migration operator,
  • database owner,
  • application owner,
  • rollback approver,
  • business decision owner,
  • and incident escalation contact.

Do not run a production migration unless someone is authorized to decide whether to roll back or commit.
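
The ownership requirement above can be checked mechanically before a run is scheduled. The following is a minimal sketch, not part of Exodus: the `MigrationOwners` class and `missing_owners` function are hypothetical names introduced here for illustration, and the roles are taken directly from the list above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MigrationOwners:
    """Hypothetical record of the roles listed above (not an Exodus API)."""
    migration_operator: Optional[str] = None
    database_owner: Optional[str] = None
    application_owner: Optional[str] = None
    rollback_approver: Optional[str] = None
    business_decision_owner: Optional[str] = None
    incident_escalation_contact: Optional[str] = None


def missing_owners(owners: MigrationOwners) -> List[str]:
    """Return the role names that have no person assigned, in declaration order."""
    return [
        f.name
        for f in fields(owners)
        if not (getattr(owners, f.name) or "").strip()
    ]


if __name__ == "__main__":
    owners = MigrationOwners(migration_operator="alice", database_owner="bob")
    unassigned = missing_owners(owners)
    if unassigned:
        print("Do not start the migration; unassigned roles:", ", ".join(unassigned))
```

Treating a blank or whitespace-only name as unassigned keeps a placeholder entry from silently satisfying the check.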

Preflight Checklist

Before execution:

  • source and target endpoints are configured,
  • credentials and TLS settings are verified,
  • endpoint health checks pass,
  • interlays are reachable from application networks,
  • telemetry export is working,
  • rollback path is documented,
  • source backup or provider recovery path is known,
  • target capacity is validated,
  • known incompatibilities are documented,
  • and the maintenance or communication plan is approved.
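
One way to make the checklist enforceable is to express each item as a named check and refuse to proceed if any check fails. This is a sketch under assumptions: `run_preflight` and the check names are hypothetical, and the lambdas stand in for whatever real probes an operator has available.

```python
from typing import Callable, Dict, List, Tuple


def run_preflight(checks: Dict[str, Callable[[], bool]]) -> Tuple[bool, List[str]]:
    """Run every named check and return (all_passed, names_of_failed_checks).

    A check that raises is treated as failed, so a broken probe cannot
    be mistaken for a passing one.
    """
    failed: List[str] = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return (not failed, failed)


if __name__ == "__main__":
    # Placeholder checks; replace each lambda with a real probe.
    checks = {
        "endpoints_configured": lambda: True,
        "tls_verified": lambda: True,
        "telemetry_export_working": lambda: False,
        "rollback_path_documented": lambda: True,
    }
    passed, failed = run_preflight(checks)
    print("preflight passed" if passed else f"preflight failed: {failed}")
```

All checks run even after the first failure, so the operator sees the full list of gaps in one pass.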

During The Run

Watch these product states:

Area            What to monitor
Workflow        Current phase, approval state, errors, retry state.
Source          Health, latency, CPU, memory, connection count, provider limits.
Target          Health, latency, capacity, connection count, write acceptance.
Data movement   Progress, retry rate, failed keys or records, unresolved divergence.
Live traffic    Request rate, write latency, read latency, errors, policy blocks.
Rollback        Whether source remains current enough to resume authority.

Rollback Criteria

Define rollback criteria before migration starts. Examples:

  • target error rate exceeds the agreed threshold,
  • p99 latency exceeds the agreed threshold for the agreed window,
  • divergence count grows instead of shrinking,
  • compatibility checks find a critical issue,
  • application smoke tests fail,
  • operator loses observability,
  • or business owner rejects the current risk level.

Rollback is a normal product path, not a failure of the migration process.
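
The measurable criteria above can be written down as thresholds and evaluated against live metrics. The sketch below is illustrative only: `RollbackThresholds`, `rollback_reasons`, and the sample values are assumptions, and "the agreed window" is modeled as a fixed number of consecutive p99 samples.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RollbackThresholds:
    """Hypothetical thresholds agreed before the migration starts."""
    max_target_error_rate: float   # fraction of requests, e.g. 0.01 for 1%
    max_p99_latency_ms: float
    latency_window_samples: int    # consecutive samples that must all breach


def rollback_reasons(
    thresholds: RollbackThresholds,
    target_error_rate: float,
    p99_samples_ms: Sequence[float],
    divergence_counts: Sequence[int],
) -> List[str]:
    """Return the rollback criteria that are currently met (empty if none)."""
    reasons: List[str] = []
    if target_error_rate > thresholds.max_target_error_rate:
        reasons.append("target error rate above threshold")

    n = thresholds.latency_window_samples
    window = list(p99_samples_ms)[-n:] if n > 0 else []
    # Only trigger when a full window exists and every sample breaches.
    if n > 0 and len(window) == n and all(
        s > thresholds.max_p99_latency_ms for s in window
    ):
        reasons.append("p99 latency above threshold for the full window")

    # Compare the newest divergence count with the oldest one observed.
    if len(divergence_counts) >= 2 and divergence_counts[-1] > divergence_counts[0]:
        reasons.append("divergence count is growing")
    return reasons


if __name__ == "__main__":
    t = RollbackThresholds(0.01, 250.0, 3)
    print(rollback_reasons(t, 0.02, [100, 300, 310, 320], [50, 40, 30]))
```

Criteria that depend on human judgment, such as the business owner rejecting the risk level, stay outside a function like this and remain an explicit approval step.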

Commit Criteria

Commit only when:

  • target is healthy,
  • validation checks pass,
  • divergence is resolved or accepted,
  • application owners approve,
  • rollback window requirements are satisfied,
  • and telemetry confirms target authority after cutover.

After commit, keep source available until the customer's decommission policy allows removal.
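
The commit criteria above can be modeled as a gate that lists every blocker instead of returning a bare yes or no. This is a sketch with hypothetical names (`CommitState`, `commit_blockers`); it assumes the inputs are collected from validation results, approvals, and telemetry by some other means.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CommitState:
    """Hypothetical snapshot of the commit criteria listed above."""
    target_healthy: bool
    validation_passed: bool
    unresolved_divergence: int
    divergence_accepted: bool
    application_owners_approved: bool
    rollback_window_satisfied: bool
    telemetry_confirms_target_authority: bool


def commit_blockers(state: CommitState) -> List[str]:
    """Return the reasons a commit must wait; an empty list means commit may proceed."""
    blockers: List[str] = []
    if not state.target_healthy:
        blockers.append("target is not healthy")
    if not state.validation_passed:
        blockers.append("validation checks have not passed")
    # Divergence blocks only when it is both present and not explicitly accepted.
    if state.unresolved_divergence > 0 and not state.divergence_accepted:
        blockers.append("divergence is neither resolved nor accepted")
    if not state.application_owners_approved:
        blockers.append("application owners have not approved")
    if not state.rollback_window_satisfied:
        blockers.append("rollback window requirements are not satisfied")
    if not state.telemetry_confirms_target_authority:
        blockers.append("telemetry does not confirm target authority")
    return blockers


if __name__ == "__main__":
    state = CommitState(True, True, 3, False, False, True, True)
    for reason in commit_blockers(state):
        print("blocked:", reason)
```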

Incident Notes

If a migration enters incident response:

  • preserve workflow run ID and migration ID,
  • snapshot source and target health,
  • export failed keys or records,
  • record last phase transition,
  • capture policy and compatibility issues,
  • and avoid destructive cleanup before the incident owner approves.
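
The evidence listed above is easier to preserve if it is captured into a single record at the moment the incident is declared. The sketch below assumes nothing about Exodus's own export formats: `build_incident_record` and its field names are hypothetical, and the health snapshots are plain dictionaries supplied by the caller.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def build_incident_record(
    workflow_run_id: str,
    migration_id: str,
    source_health: Dict[str, Any],
    target_health: Dict[str, Any],
    failed_keys: List[str],
    last_phase_transition: str,
    issues: List[str],
    captured_at: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Collect incident evidence into one JSON-serializable dictionary."""
    captured_at = captured_at or datetime.now(timezone.utc)
    return {
        "workflow_run_id": workflow_run_id,
        "migration_id": migration_id,
        "captured_at": captured_at.isoformat(),
        "source_health": dict(source_health),
        "target_health": dict(target_health),
        "failed_keys": list(failed_keys),
        "last_phase_transition": last_phase_transition,
        "policy_and_compatibility_issues": list(issues),
        # Stays False until the incident owner explicitly approves cleanup.
        "destructive_cleanup_approved": False,
    }


if __name__ == "__main__":
    record = build_incident_record(
        workflow_run_id="run-example",      # placeholder identifiers
        migration_id="migration-example",
        source_health={"healthy": True},
        target_health={"healthy": False},
        failed_keys=["example-key"],
        last_phase_transition="example phase transition",
        issues=["example policy block"],
    )
    print(json.dumps(record, indent=2))
```

Defaulting `destructive_cleanup_approved` to `False` mirrors the last item in the list: cleanup tooling can check the flag and refuse to run until the incident owner sets it.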
Last updated: October 20, 2018