Skip to main content

Change Playbook Data (Safe)

This playbook must be used for any schema/data changes that impact production.

1) Impact Analysis

Before writing migration, answer this:

  1. Which tables/columns have changed?
  2. Which endpoint/service is reading/writing that column?
  3. Are the changes breaking for the running application?
  4. Estimated data volume for backfill?
  5. Risk of locking, downtime, or data race?

Concise template:

Perubahan:
Entitas terdampak:
API/modul terdampak:
Potensi breaking:
Strategi mitigasi:
Rollback strategy:
Owner verifikasi:

2) Expand/Contract Migration Strategy

Use this pattern for secure changes without significant downtime.

Expand phase (safe, backward compatible)

  • Add new columns/tables.
  • Keep old columns working.
  • Update the code so that it can read from the old + new path (dual-read if necessary).
  • Start writing to the new structure (dual-write if needed).

Backfill Phase

  • Fill old data into the new structure gradually (batch/chunk).
  • Monitor errors and performance.
  • Don't run large backfills during peak hours.

Contract Phase

  • After complete verification, stop reading/writing the old path.
  • Drop old columns/tables in separate release.

3) Backfill Strategy

Choose a strategy according to volume:

  • Small (<100k rows): simple batch + checkpoint.
  • Medium (100k-5M): chunk + progress log + retry.
  • Large (>5M): scheduled workers + throttle + strict observability.

Backfill principle:

  • Idempotent (safe to rerun).
  • Can pause/resume.
  • Save last checkpoint offset/id.
  • Record metrics: processed, success, failed.

4) Rollback Plan

Rollback must be written before deploy:

  1. Rollback code: safe commit/tag version.
  2. Rollback scheme: is migration down safe? If not, write a manual rollback.
  3. Data rollback: snapshot/backup used + restore procedure.
  4. Decision gate: when rollback is decided (error rate, data mismatch, latency).

If the schema rollback is high risk, mark requires verification and requires senior approval.

5) Post-Deploy Validation Checklist

  • Primary endpoint success (2xx) and no 5xx spikes.
  • Error logs related to migration/backfill do not increase significantly.
  • Number of old vs new records is consistent (sampling + aggregate).
  • New data write path running.
  • Dashboard/monitoring does not show severe degradation.
  • Business stakeholders confirm critical flow remains normal.

Example verification query (customize table name):

-- 1) Cek data null yang seharusnya sudah terisi setelah backfill
SELECT COUNT(*) AS kosong
FROM target_table
WHERE new_column IS NULL;

-- 2) Cek konsistensi jumlah berdasarkan status
SELECT status, COUNT(*)
FROM target_table
GROUP BY status
ORDER BY status;

-- 3) Sampling silang kolom lama vs baru
SELECT id, old_column, new_column
FROM target_table
WHERE old_column IS NOT NULL
ORDER BY id DESC
LIMIT 50;

6) When to stop deployment

Stop/rollback immediately if:

  • error rate increases sharply,
  • critical queries timeout many times,
  • significant data mismatch,
  • major business transactions fail.

7) Note uncertainty

If there are assumptions that have not been validated by the production environment, write clearly: requires verification.