# Session 0017: Modular Sync/Promote/Rebuild Architecture

**Date:** 2026-02-22
**Status:** Paused (context checkpoint)
**Origin:** MDF Webseiten session 0032

---

## Work Done

- [x] Fixed SL `detect_env()`: was returning "seriousletter" instead of the env name; now scans path components for first match after "data"
- [x] Fixed MDF `list_backups()` indentation bug: try block was at same level as for loop, only parsed the last backup file
- [x] Added `promote` config to `registry.yaml` for mdf (rsync), seriousletter (git), ringsaday (git); each defines promote type, branch mapping, post-pull behavior
- [x] Added `promote` Typer command to SL `sync.py`: git fetch, diff preview, git pull, Dockerfile change detection, container rebuild/restart, health check; only dev→int and int→prod allowed
- [x] Added `cmd_promote` to ops CLI: delegates to project CLI with `--from`/`--to` args
- [x] Added `cmd_rebuild` to ops CLI: starts containers, waits for health, restores latest backup
- [x] Created 4 new FastAPI routers in ops-dashboard:
  - `promote.py`: SSE streaming promote endpoint
  - `sync_data.py`: SSE streaming sync endpoint
  - `registry.py`: exposes project list + environments + promote config as JSON
  - `rebuild.py`: SSE streaming rebuild/disaster-recovery endpoint
- [x] Updated `backups.py` to read project list from registry API instead of hardcoding
- [x] Added "Operations" page to dashboard sidebar with three sections: Promote Code, Sync Data, Rebuild (Disaster Recovery)
- [x] Operations page uses SSE modal with dry-run toggle; project/direction buttons populated dynamically from `/api/registry/`
- [x] Verified all 7 test categories pass

## Key Decisions / Learnings

- All long-running ops commands (promote, sync, rebuild) use SSE streaming, consistent with the existing backup/restore pattern. The `stream_ops_host()` helper is the standard interface.
- Registry is the single source of truth for project/environment/promote config. Dashboard reads it dynamically; no hardcoded project names in API routers.
- Promote direction validation lives in the project CLI (`sync.py`), not in the ops CLI or dashboard; this keeps enforcement close to the implementation.
- `ops rebuild` is the disaster recovery entry point: bring up containers → wait for healthy → restore latest backup. Simple, composable.
- `detect_env()` path parsing must handle the full `/opt/data/seriousletter/{env}/code/...` structure; scanning for VALID_ENVS after "data" in path components is robust.

## Files Changed

- `/opt/data/seriousletter/{dev,int,prod}/code/scripts/sync/sync.py`: fix `detect_env`, add `promote` command
- `Code/mdf-system/scripts/sync/sync.py` (local + deployed to dev): fix `list_backups` indentation
- `/opt/infrastructure/servers/hetzner-vps/registry.yaml`: add `promote` config per project
- `/opt/infrastructure/ops`: add `cmd_promote`, `cmd_rebuild`
- `/opt/data/ops-dashboard/app/routers/promote.py`: new SSE promote endpoint
- `/opt/data/ops-dashboard/app/routers/sync_data.py`: new SSE sync endpoint
- `/opt/data/ops-dashboard/app/routers/registry.py`: new registry JSON endpoint
- `/opt/data/ops-dashboard/app/routers/rebuild.py`: new SSE rebuild endpoint
- `/opt/data/ops-dashboard/app/routers/backups.py`: dynamic project list from registry
- `/opt/data/ops-dashboard/app/main.py`: register 4 new routers
- `/opt/data/ops-dashboard/static/js/app.js`: Operations page UI + SSE modal
- `/opt/data/ops-dashboard/static/index.html`: nav link + ops-modal HTML

## Next Steps (at time of pause)

- [ ] Test backup creation from dashboard UI
- [ ] Test full promote dry-run via dashboard (Operations page)
- [ ] Test sync dry-run via dashboard
- [ ] Commit infrastructure and code repo changes on server
- [ ] DNS cutover mdf-system.de → .ch
- [ ] Disaster recovery test (destroy + rebuild SL dev)

---

**Tags:** #Session #OpsDashboard #OpsCLI #Promote #Sync #Rebuild #Registry
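
For reference, the `detect_env()` fix noted in this session (scan path components for the first valid environment after "data") could look roughly like the sketch below. This is an illustration, not the code in `sync.py`: the contents of `VALID_ENVS` are assumed from the dev/int/prod environments mentioned in this note, and the signature and `None` fallback are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumption: the three environments referenced in this note.
VALID_ENVS = ("dev", "int", "prod")


def detect_env(path: str) -> Optional[str]:
    """Return the first VALID_ENVS entry that appears after "data" in the path."""
    parts = PurePosixPath(path).parts
    try:
        # Only look at components that come after the "data" directory.
        start = parts.index("data") + 1
    except ValueError:
        return None  # no "data" component, so the environment is unknown
    for part in parts[start:]:
        if part in VALID_ENVS:
            return part
    return None


print(detect_env("/opt/data/seriousletter/dev/code/scripts/sync/sync.py"))
```

Because the scan skips over the project directory instead of taking a fixed index, the project name ("seriousletter") is never mistaken for the environment.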
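
The promote direction rule recorded in this session (only dev→int and int→prod allowed, enforced in the project CLI) can be sketched as a small allow-list check. The names `ALLOWED_PROMOTIONS` and `validate_promotion` are hypothetical and chosen only for illustration; the real `promote` command in `sync.py` may structure this differently.

```python
# Assumption: the only permitted promotion directions, as stated in this note.
ALLOWED_PROMOTIONS = {("dev", "int"), ("int", "prod")}


def validate_promotion(source: str, target: str) -> None:
    """Raise ValueError unless source -> target is an allowed promotion."""
    if (source, target) not in ALLOWED_PROMOTIONS:
        allowed = ", ".join(f"{s}->{t}" for s, t in sorted(ALLOWED_PROMOTIONS))
        raise ValueError(
            f"promotion {source}->{target} is not allowed (allowed: {allowed})"
        )


validate_promotion("dev", "int")  # passes silently
try:
    validate_promotion("dev", "prod")  # skipping int is rejected
except ValueError as exc:
    print(exc)
```

Keeping this check in the project CLI means the ops CLI and the dashboard can pass `--from`/`--to` through unchanged and rely on the CLI to reject invalid directions.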