Matthias Nott
2026-02-26 365acb650ff1f669d0c6057d22ac69f7ef656619
docs: bootstrap OPS project with CLAUDE.md, session history, and TODO

- CLAUDE.md with project context, paths, architecture, deploy commands
- 12 session notes extracted from MDF Webseiten project (sessions 0022-0055)
- TODO.md with remaining items
- Notes organized in YYYY/MM date hierarchy
14 files added
changed files
CLAUDE.md
Notes/2026/02/0001 - 2026-02-22 - Ops Dashboard Core Fixes.md
Notes/2026/02/0002 - 2026-02-22 - Backup Page Redesign v5-v6.md
Notes/2026/02/0003 - 2026-02-22 - Backup v8-v9 Delete Multi-Select URL Routing.md
Notes/2026/02/0004 - 2026-02-22 - Adjacent Env Restriction & Lifecycle Operations.md
Notes/2026/02/0005 - 2026-02-22 - Rebuild.py Rewrite & App Volume Mount.md
Notes/2026/02/0006 - 2026-02-22 - SSE Streaming Backup & Upload Endpoints.md
Notes/2026/02/0007 - 2026-02-22 - FTP Progress Callbacks & Upload Button Fix.md
Notes/2026/02/0008 - 2026-02-23 - Schedule Management & Backup Coverage System.md
Notes/2026/02/0009 - 2026-02-23 - Dashboard Rewrite Committed & Gen-Timers Migration.md
Notes/2026/02/0010 - 2026-02-25 - Sync Router Bidirectional Fix.md
Notes/2026/02/0011 - 2026-02-25 - v15 Deploy Debug & Full Verification.md
Notes/2026/02/0012 - 2026-02-26 - No-Backup Option for Promote & Sync.md
Notes/TODO.md
CLAUDE.md
@@ -0,0 +1,64 @@
+# Ops Dashboard - Project Context
+
+## Key Paths
+
+| What | Path |
+|------|------|
+| **Code repo (local)** | `~/dev/ai/OPS/` |
+| **Code repo (server)** | `/opt/data/ops-dashboard/` |
+| **Code repo (remote)** | `git.mnsoft.org/git/APPS/ops-dashboard.git` |
+| **Infrastructure repo (server)** | `/opt/infrastructure/` |
+| **Infrastructure repo (remote)** | `git.mnsoft.org/git/APPS/infrastructure.git` |
+| **Notes** | `~/dev/ai/OPS/Notes/` |
+| **TODO** | `~/dev/ai/OPS/Notes/TODO.md` |
+
+## Server Access
+
+```bash
+ssh mdf-system.ch # root, port 99 (via ~/.ssh/config)
+```
+
+## Application
+
+| What | Detail |
+|------|--------|
+| **URL** | https://ops.tekmidian.com |
+| **Auth token** | `ops-mdf-2026-secure` |
+| **Container** | `ops-dashboard` |
+| **Stack** | FastAPI backend, vanilla JS frontend, SSE for real-time ops |
+
+## Architecture
+
+- Container mounts: `/opt/data`, `/opt/infrastructure`, `/var/run/docker.sock` + app source
+- Container has pyyaml 6.0.3 — can import toolkit directly
+- nsenter bridge for host operations (backup, restore, sync, promote, gen-timers)
+- `OPS_CLI` = `/usr/local/bin/ops` on host (bash shim -> `python3 -m toolkit.cli`)
+- Toolkit: `/opt/infrastructure/toolkit/` — 12 Python modules
+- Registry: `project.yaml` descriptors at `/opt/data/{project}/project.yaml` (source of truth)
+
+## Dashboard Pages
+
+| Page | Purpose |
+|------|---------|
+| **Dashboard** | Status tiles, project drill-down |
+| **Services** | Container cards, restart / logs / terminal |
+| **Backups** | Date-grouped, local + offsite, restore modal, multi-select delete |
+| **Operations** | Promote, sync, rebuild — SSE streaming modals |
+| **Schedules** | Backup timer management, edit modal |
+| **System** | CPU / mem / disk, health checks, timers |
+
+## Deploy
+
+```bash
+rsync -avz --delete ~/dev/ai/OPS/app/ mdf-system.ch:/opt/data/ops-dashboard/app/
+rsync -avz --delete ~/dev/ai/OPS/static/ mdf-system.ch:/opt/data/ops-dashboard/static/
+ssh mdf-system.ch 'docker restart ops-dashboard'
+```
+
+## Mandatory Rules
+
+- **Never edit files directly on the server** — always rsync from local, then restart
+- **project.yaml is source of truth** — never hardcode project/env lists in the dashboard
+- **toolkit is importable** — use `from toolkit.X import Y` directly inside the container; no subprocess ops calls for data reads
+- **nsenter for mutations** — backup, restore, sync, promote must go through the nsenter bridge to the host Python venv, not direct toolkit calls
+- **SSE for long ops** — any operation that may take >2s must stream progress via SSE, never block the HTTP response
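The nsenter-bridge rule above can be sketched in Python. This is a minimal illustration, not the dashboard's actual `ops_runner.py`: `NSENTER_PREFIX` and `build_host_command` are hypothetical names, the prefix copies the bridge pattern recorded in session 0001, `OPS_CLI` is the host path listed above, and prepending `env PYTHONUNBUFFERED=1` is only one assumed way to get the unbuffered output that session 0011 found necessary. The `backup mdf dev` arguments are illustrative.

```python
import shlex

# Assumption: the prefix copies the nsenter bridge pattern from session 0001.
NSENTER_PREFIX = [
    "docker", "run", "--rm", "--privileged", "--pid=host", "alpine",
    "nsenter", "-t", "1", "-m", "-u", "-i", "-n", "-p", "--",
]
OPS_CLI = "/usr/local/bin/ops"  # host path of the ops shim (from CLAUDE.md)


def build_host_command(*ops_args: str, unbuffered: bool = True) -> list[str]:
    """Hypothetical helper: argv that runs `ops <ops_args>` in the host namespaces."""
    cmd = list(NSENTER_PREFIX)
    if unbuffered:
        # Assumed approach: the host's `env` sets PYTHONUNBUFFERED for the ops
        # process so its output is not held back in a pipe buffer.
        cmd += ["env", "PYTHONUNBUFFERED=1"]
    # Always prepend OPS_CLI; a bare "backup" would make nsenter look for a
    # binary called "backup" (the bug recorded in session 0011).
    cmd += [OPS_CLI, *ops_args]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_host_command("backup", "mdf", "dev")))
```

Returning an argv list rather than a shell string means the result can be passed straight to `asyncio.create_subprocess_exec` without quoting concerns; `shlex.join` is used only for display.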
Notes/2026/02/0001 - 2026-02-22 - Ops Dashboard Core Fixes.md
@@ -0,0 +1,33 @@
+# Session 0001: Ops Dashboard Core Fixes
+
+**Date:** 2026-02-22
+**Status:** Completed
+**Origin:** MDF Webseiten session 0024
+
+---
+
+## Work Done
+
+- [x] Removed load averages tile — replaced with Containers (running/total) + Processes tiles
+- [x] Fixed Health Checks section — was broken inside Docker, now runs via nsenter bridge on host
+- [x] Fixed Timers section — was broken (no systemd in container), now uses nsenter on host
+- [x] Added `run_command_host()` to `ops_runner.py` for arbitrary host commands via nsenter
+- [x] Rewrote timer parser — anchors on timestamp patterns instead of fragile column splitting
+- [x] Fixed ops CLI health check — removed stale /opt/data2 reference, added [OK]/[FAIL] output format
+- [x] Added Docker daemon running check to `ops health` (reports container count)
+
+## Key Decisions / Learnings
+
+- Dashboard container uses COPY (not volume mount) — requires `docker build` + recreate for changes to take effect
+- nsenter bridge pattern for host commands: `docker run --rm --privileged --pid=host alpine nsenter -t 1 -m -u -i -n -p --`
+- `ops health` must exit 0 always — returning issue count breaks callers using `set -euo pipefail`
+
+## Files Changed
+
+- `app/routers/system.py` — nsenter for health+timers, containers/processes tiles
+- `app/ops_runner.py` — added `run_command_host()`
+- `static/js/app.js` — replaced Load tile with Containers + Processes tiles
+
+---
+
+**Tags:** #Session #OpsDashboard
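Session 0001 anchors the timer parser on timestamps rather than column offsets. A minimal sketch of that idea, assuming the default English `systemctl list-timers` layout (`NEXT LEFT LAST PASSED UNIT ACTIVATES`) and timestamps like `Mon 2026-02-23 03:15:00 CET`; `parse_timer_line` is a hypothetical helper and the sample rows in the tests are made up:

```python
import re
from typing import Optional

# A systemd timestamp, e.g. "Mon 2026-02-23 03:15:00 CET".
TIMESTAMP_RE = re.compile(r"[A-Z][a-z]{2} \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+")
# The last two columns: the timer unit and the unit it activates.
UNIT_RE = re.compile(r"(\S+\.timer)\s+(\S+)\s*$")


def parse_timer_line(line: str) -> Optional[dict]:
    """Parse one row of `systemctl list-timers` output; return None for other lines."""
    line = line.strip()
    unit = UNIT_RE.search(line)
    if unit is None:
        return None  # header, "N timers listed." footer, or blank line
    head = line[: unit.start()]
    stamps = [(m.start(), m.group()) for m in TIMESTAMP_RE.finditer(head)]
    # NEXT is present only if the row starts with a timestamp ("n/a" otherwise).
    next_run = stamps[0][1] if stamps and stamps[0][0] == 0 else None
    # LAST is the first timestamp that is not at the start of the row.
    last_run = next((s for pos, s in stamps if pos != 0), None)
    return {
        "next": next_run,
        "last": last_run,
        "unit": unit.group(1),
        "activates": unit.group(2),
    }
```

Free-text columns such as LEFT (`9h left`, `1 day 2h left`) vary in width and word count, so only the timestamp shape and the trailing timer/service pair are treated as reliable anchors.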
Notes/2026/02/0002 - 2026-02-22 - Backup Page Redesign v5-v6.md
@@ -0,0 +1,30 @@
+# Session 0002: Backup Page Redesign v5-v6
+
+**Date:** 2026-02-22
+**Status:** Completed
+**Origin:** MDF Webseiten session 0026
+
+---
+
+## Work Done
+
+- [x] **v5**: Rewrote backup page from flat 65+ row table to date-grouped collapsible sections
+  - Summary stat tiles (local count, offsite count, latest backup, total size)
+  - Today/yesterday auto-expanded, older collapsed with chevron toggle animation
+  - Combined local+offsite view with type badges
+- [x] **v6**: Deduplication + inline restore
+  - Same filename in both local+offsite locations → single row with "local + offsite" badge
+  - Removed separate Restore page from sidebar
+  - Added Restore button per row with confirmation modal + SSE streaming output
+  - Dry-run checkbox (default on) in restore modal
+
+## Key Decisions / Learnings
+
+- Inline restore replaces the separate Restore page — backups and restores live on one page
+- Dry-run default-on prevents accidental destructive restores
+- SSE streaming for restore output enables real-time feedback in the modal
+- Dedup by filename keeps the UI clean when the same backup exists locally and offsite
+
+---
+
+**Tags:** #Session #OpsDashboard #Backups
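The dedup-by-filename rule from session 0002 reduces to a set union plus a per-name location check. This Python sketch shows only the merge logic (the dashboard's own implementation is not part of these notes); `merge_backups` is a hypothetical name, and the filenames imitate the `prod_backup_20260219_164913.tar.gz` pattern mentioned in session 0007:

```python
def merge_backups(local_names: list[str], offsite_names: list[str]) -> list[dict]:
    """Collapse local and offsite listings into one row per filename, newest first."""
    local, offsite = set(local_names), set(offsite_names)
    rows = []
    # Assuming a shared prefix and a fixed-width embedded timestamp,
    # reverse name order is newest first.
    for name in sorted(local | offsite, reverse=True):
        where = [label for label, names in (("local", local), ("offsite", offsite)) if name in names]
        rows.append({"name": name, "badge": " + ".join(where)})
    return rows
```

Grouping rows into the date-based collapsible sections would parse the embedded timestamp rather than rely on name order.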
Notes/2026/02/0003 - 2026-02-22 - Backup v8-v9 Delete Multi-Select URL Routing.md
@@ -0,0 +1,36 @@
+# Session 0003: Backup v8-v9 Delete, Multi-Select, URL Routing
+
+**Date:** 2026-02-22
+**Status:** Completed
+**Origin:** MDF Webseiten session 0031
+
+---
+
+## Work Done
+
+### v8: Delete + Granular Restore
+- [x] `DELETE /api/backups/{project}/{env}/{name}` endpoint with path traversal validation
+- [x] Restore `mode` query param (full/db/wp) → passes `--db-only`/`--wp-only` to ops CLI
+- [x] Delete button on every backup row in Level 2 drill-down
+- [x] Restore Mode radio buttons (Full / Database only / WP-Content only) in restore modal
+
+### v9: Multi-Select, Upload, Source Selector, URL Routing
+- [x] URL hash routing — `#/backups/mdf/dev`, `#/dashboard/table`, `#/system` — browser refresh preserves location
+- [x] Multi-select delete — checkboxes per row, select-all header, blue selection bar, bulk delete
+- [x] Upload to offsite — purple "Upload" button on local-only backups
+- [x] Restore source selector — Local/Offsite radio buttons when backup exists in both locations
+
+## Key Decisions / Learnings
+
+- Path traversal validation is required on the delete endpoint, because the filename comes from the URL and is user-supplied
+- Static files are volume-mounted (not COPY'd) — frontend changes don't require container rebuild
+- URL hash routing lets users bookmark specific dashboard views and keeps the current view across page refreshes
+- Granular restore (db-only / wp-only) avoids full restore when only one component needs recovery
+
+## Pending
+
+- Selection bar spacing CSS gap not taking effect (possible browser cache issue)
+
+---
+
+**Tags:** #Session #OpsDashboard #Backups
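A minimal sketch of the path traversal check session 0003 calls for on `DELETE /api/backups/{project}/{env}/{name}`. It assumes backups live under `<root>/<project>/<env>/<name>`, which is a guess at the layout; `resolve_backup_path` is hypothetical, and `Path.is_relative_to` needs Python 3.9+:

```python
from pathlib import Path


def resolve_backup_path(root: Path, project: str, env: str, name: str) -> Path:
    """Map URL parameters to a file under root, refusing anything that escapes it."""
    for part in (project, env, name):
        # Each URL parameter must be a single, ordinary path component.
        if part in ("", ".", "..") or any(ch in part for ch in ("/", "\\", "\x00")):
            raise ValueError(f"invalid path component: {part!r}")
    base = root.resolve()
    candidate = (base / project / env / name).resolve()
    # resolve() also follows symlinks, so a link pointing outside root is caught here.
    if not candidate.is_relative_to(base):
        raise ValueError(f"{candidate} is outside {base}")
    return candidate
```

Rejecting separators up front and then re-checking the resolved path covers both `../` sequences and symlinks that lead outside the backup root; a FastAPI handler would translate the `ValueError` into an HTTP 400.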
Notes/2026/02/0004 - 2026-02-22 - Adjacent Env Restriction & Lifecycle Operations.md
@@ -0,0 +1,35 @@
+# Session 0004: Adjacent Env Restriction & Lifecycle Operations
+
+**Date:** 2026-02-22
+**Status:** Completed
+**Origin:** MDF Webseiten session 0034
+
+---
+
+## Work Done
+
+### Adjacent Environment Restriction
+- [x] Removed direct prod↔dev sync/promote paths from UI and API
+- [x] Only adjacent pairs allowed: dev↔int, int↔prod
+- [x] Backend returns HTTP 400 for invalid environment pairs
+
+### Container Lifecycle Operations (Rebuild/Recreate)
+- [x] Implemented three lifecycle operations (discovered Coolify API caused duplicate containers):
+  - **Restart** — `docker restart` via SSH (safe, no image changes)
+  - **Rebuild** — stop → build image → start (keeps data volumes)
+  - **Recreate** — stop → wipe data → build image → start (full disaster recovery)
+- [x] Color-coded UI: green (restart), yellow (rebuild), red (recreate)
+- [x] Type-to-confirm dialog for destructive Recreate operation
+- [x] Fixed EventSource auto-reconnect causing duplicate banners across operations
+- [x] Fixed graceful handling of the "already stopped" case, a NameError crash, and the container filter (OR vs AND)
+
+## Key Decisions / Learnings
+
+- A direct prod↔dev path skips review in the intermediate env (int), so adjacent-only is enforced at the API level, not just in the UI
+- Coolify stop prunes local Docker images — cannot use Coolify API to stop services with locally-built images
+- The EventSource must be closed explicitly once the operation completes; otherwise its auto-reconnect produces duplicate banners
+- Type-to-confirm for Recreate is appropriate UX — wipes data volumes, no undo
+
+---
+
+**Tags:** #Session #OpsDashboard #Lifecycle
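The adjacent-only rule from session 0004 can be enforced with a small lookup table. In this sketch the names are hypothetical and the `dev`/`int`/`prod` ordering is assumed; the allowed pairs are generated in both directions, which also matches the bidirectional-pairs lesson from session 0010. In a FastAPI router the `ValueError` would be mapped to `HTTPException(status_code=400, ...)`:

```python
ENV_ORDER = ("dev", "int", "prod")  # assumed promotion order, lowest to highest

# Only neighbours in ENV_ORDER may sync or promote, in either direction.
ALLOWED_PAIRS = frozenset(
    pair
    for lower, upper in zip(ENV_ORDER, ENV_ORDER[1:])
    for pair in ((lower, upper), (upper, lower))
)


def check_env_pair(source: str, target: str) -> None:
    """Raise ValueError for non-adjacent pairs such as dev -> prod."""
    if (source, target) not in ALLOWED_PAIRS:
        allowed = ", ".join(f"{a}->{b}" for a, b in sorted(ALLOWED_PAIRS))
        raise ValueError(f"{source}->{target} is not allowed; use one of: {allowed}")
```

Deriving the pairs from one ordered tuple keeps the UI and API in agreement, provided both read the same list.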
Notes/2026/02/0005 - 2026-02-22 - Rebuild.py Rewrite & App Volume Mount.md
@@ -0,0 +1,28 @@
+# Session 0005: rebuild.py Rewrite & App Volume Mount
+
+**Date:** 2026-02-22
+**Status:** Completed
+**Origin:** MDF Webseiten session 0035
+
+---
+
+## Work Done
+
+- [x] Diagnosed root cause: dashboard was calling Coolify API stop/start on a placeholder test-nginx, not actual MDF containers
+- [x] Rewrote `rebuild.py` to use `ops rebuild` CLI via host nsenter bridge (no Coolify API)
+- [x] Updated `ops rebuild` to do `docker compose down` before `up -d --build`
+- [x] Added safety backup step to Recreate operation
+- [x] Added `app/` directory as volume mount to ops-dashboard compose (enables live edits without rebuild)
+- [x] Added ops-dashboard git remote at `git.mnsoft.org/git/APPS/ops-dashboard.git`
+- [x] Committed and pushed all server repos (MDF, infrastructure, ops-dashboard)
+
+## Key Decisions / Learnings
+
+- `app/` as volume mount is essential for iterating on dashboard backend without container rebuilds
+- `static/` was already volume-mounted; `app/` mount completes the live-edit setup
+- Coolify API is unreliable for locally-built images — ops CLI via nsenter bridge is the correct pattern
+- Safety backup before Recreate ensures data can be recovered if the restore fails
+
+---
+
+**Tags:** #Session #OpsDashboard #Infrastructure
Notes/2026/02/0006 - 2026-02-22 - SSE Streaming Backup & Upload Endpoints.md
@@ -0,0 +1,24 @@
+# Session 0006: SSE Streaming Backup & Upload Endpoints
+
+**Date:** 2026-02-22
+**Status:** Completed
+**Origin:** MDF Webseiten session 0036
+
+---
+
+## Work Done
+
+- [x] Added `upload` subcommand to `offsite.py` CLI (function existed but wasn't wired)
+- [x] Converted `create_backup` endpoint from plain JSON to SSE streaming
+- [x] Converted `upload_offsite` endpoint from plain JSON to SSE streaming
+- [x] Changed both endpoints to accept GET+POST (EventSource API requires GET)
+
+## Key Decisions / Learnings
+
+- EventSource (SSE) requires GET requests — endpoints serving streaming output must accept GET
+- Converting to SSE streaming gives real-time feedback for long-running backup and upload operations
+- offsite.py had an upload function but it was never exposed as a CLI subcommand — easy fix, high value
+
+---
+
+**Tags:** #Session #OpsDashboard #Backups #SSE
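Session 0006 converts the backup and upload endpoints to SSE. The wire format itself is simple: each message is one or more `data:` lines followed by a blank line, optionally preceded by an `event:` line. A minimal framing sketch with hypothetical helper names; the final `done` event is an assumption, not necessarily what the dashboard sends:

```python
from typing import Iterable, Iterator, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Frame one Server-Sent Events message for a text/event-stream response."""
    lines = [f"event: {event}"] if event else []
    # Every payload line needs its own "data:" field; a blank line ends the message.
    lines += [f"data: {chunk}" for chunk in data.split("\n")]
    return "\n".join(lines) + "\n\n"


def stream_output(output_lines: Iterable[str]) -> Iterator[str]:
    """Turn command output lines into SSE messages, then signal completion."""
    for line in output_lines:
        yield sse_event(line.rstrip("\n"))
    yield sse_event("done", event="done")  # hypothetical end-of-stream marker
```

In FastAPI such a generator would typically be wrapped in `StreamingResponse(..., media_type="text/event-stream")` on a route that accepts GET, since the browser's `EventSource` only issues GET requests.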
Notes/2026/02/0007 - 2026-02-22 - FTP Progress Callbacks & Upload Button Fix.md
@@ -0,0 +1,30 @@
+# Session 0007: FTP Progress Callbacks & Upload Button Fix
+
+**Date:** 2026-02-22
+**Status:** Completed
+**Origin:** MDF Webseiten session 0038
+
+---
+
+## Work Done
+
+- [x] Added FTP upload/download progress callbacks to `offsite.py` — prints every 5% with size info
+- [x] Increased FTP block size to 256KB for better throughput
+- [x] Added `flush=True` and `sys.stdout.reconfigure(line_buffering=True)` for SSE streaming compatibility
+- [x] Fixed Upload button — now passes exact filename through frontend → API (`?name=` param) → ops CLI; previously always uploaded the latest backup regardless of which row was clicked
+- [x] Added `cache: 'no-store'` to all `fetch()` calls in `app.js` to prevent stale UI state
+- [x] Added `renderBackups()` call after upload success and on upload modal close
+- [x] Added `/etc/tmpfiles.d/mdf-cleanup.conf` to auto-clean orphan `/tmp/tmp*` dirs older than 1 day
+- [x] Increased FTP data socket timeout to 300s for large transfers
+- [x] Verified via Playwright: LOCAL + OFFSITE badges display correctly, merge-by-filename works
+
+## Key Decisions / Learnings
+
+- `sys.stdout.reconfigure(line_buffering=True)` is required for progress output to stream through SSE; otherwise block-buffered stdout holds the output back until the buffer is flushed
+- The upload endpoint must accept a `name=` param; generic "upload latest" is wrong UX when user clicks a specific row
+- `cache: 'no-store'` on all fetches is necessary — stale backup list after upload is confusing
+- Corrupt backup (`prod_backup_20260219_164913.tar.gz`) failed FTP at ~5% consistently — safe to delete
+
+---
+
+**Tags:** #Session #OpsDashboard #Backups #FTP
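A minimal sketch of the every-5% progress callback from session 0007. `ftplib.FTP.storbinary` calls its `callback` with each block after sending it, so a callable object can track the bytes sent. `FtpProgress` is a hypothetical name and the output format is assumed; `flush=True` mirrors the note's SSE-compatibility fix:

```python
import sys


class FtpProgress:
    """Callable for ftplib's storbinary(callback=...): prints every `step` percent."""

    def __init__(self, total_bytes: int, step: int = 5, out=None):
        self.total = max(total_bytes, 1)  # avoid division by zero for empty files
        self.step = step
        self.done = 0
        self.next_mark = step
        self.out = out if out is not None else sys.stdout

    def __call__(self, block: bytes) -> None:
        self.done += len(block)
        pct = self.done * 100 // self.total
        if pct >= self.next_mark:
            mib = 1024 * 1024
            print(
                f"{pct}% ({self.done / mib:.1f} of {self.total / mib:.1f} MiB)",
                file=self.out,
                flush=True,  # push each line through the pipe immediately
            )
            # Skip past any marks that a large block jumped over.
            self.next_mark = (pct // self.step + 1) * self.step
```

For an upload this would be passed as `ftp.storbinary(f"STOR {name}", fp, blocksize=256 * 1024, callback=FtpProgress(size))`. For downloads, `retrbinary`'s callback is also the data sink, so a wrapper has to write each block to the local file before updating the counter.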
Notes/2026/02/0008 - 2026-02-23 - Schedule Management & Backup Coverage System.md
@@ -0,0 +1,34 @@
+# Session 0008: Schedule Management & Backup Coverage System
+
+**Date:** 2026-02-23
+**Status:** Completed
+**Origin:** MDF Webseiten session 0040
+
+---
+
+## Work Done
+
+- [x] Fixed `backup-all.sh` — was appending env suffix twice → files landed in `.../dev/dev/`
+- [x] Moved stranded double-nested backup files to correct directories
+- [x] Version-controlled `offsite.py` and `backup-all.sh` into infrastructure repo
+- [x] Added `_backup_generic()` function to ops CLI — tar-based fallback for projects without a dedicated CLI
+- [x] Added `backup:` config blocks to `registry.yaml` for MDF (03:15), SeriousLetter (03:00), Coolify (04:00)
+- [x] Created `gen-timers.py` — reads registry, generates systemd `.service` + `.timer` units automatically
+- [x] Added `ops gen-timers [--dry-run]` command — replaces legacy backup-all, mdf-backup, seriousletter-backup timers
+- [x] Created `schedule.py` FastAPI router:
+  - `GET /api/schedule/` — returns backup config for all projects
+  - `PUT /api/schedule/{project}` — updates config, writes registry via nsenter, regenerates timers
+- [x] Added "Schedules" nav item to dashboard sidebar (clock icon)
+- [x] Schedule page: table showing all projects with enabled/schedule/envs/offsite/retention columns
+- [x] Schedule edit modal: toggle, time picker, env checkboxes, offsite section, retention fields
+
+## Key Decisions / Learnings
+
+- registry.yaml drives both systemd timers and the dashboard schedule UI — single source of truth
+- `gen-timers` must auto-remove orphan timers (e.g. `backup-coolify.timer`) — prevents ghost schedules
+- `PUT /api/schedule/{project}` writes via nsenter (not inside container) because systemd lives on host
+- `backup-all.sh` must NOT append `/$env` suffix if the CLI already appends it internally
+
+---
+
+**Tags:** #Session #OpsDashboard #Backups #Scheduling
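A minimal sketch of the two jobs `gen-timers` performs according to session 0008: render a `.service`/`.timer` pair per project, and find orphan timers to remove. The `backup-<project>` naming follows the `backup-coolify.timer` example; the unit options, `render_units`, `orphan_timers`, and the `ExecStart` command in the tests are assumptions. Writing the files and running `systemctl daemon-reload` would still have to happen on the host through the nsenter bridge.

```python
def render_units(project: str, schedule: str, command: str) -> dict[str, str]:
    """Return {filename: contents} for one project's backup .service/.timer pair."""
    unit = f"backup-{project}"
    service = (
        "[Unit]\n"
        f"Description=Backup for {project}\n"
        "\n"
        "[Service]\n"
        "Type=oneshot\n"
        f"ExecStart={command}\n"
    )
    timer = (
        "[Unit]\n"
        f"Description=Daily backup timer for {project}\n"
        "\n"
        "[Timer]\n"
        f"OnCalendar=*-*-* {schedule}:00\n"  # schedule assumed to be "HH:MM"
        "Persistent=true\n"  # catch up on a run missed while the host was down
        "\n"
        "[Install]\n"
        "WantedBy=timers.target\n"
    )
    return {f"{unit}.service": service, f"{unit}.timer": timer}


def orphan_timers(installed: set[str], wanted: set[str]) -> set[str]:
    """backup-*.timer units on the host that no project configuration asks for."""
    return {t for t in installed if t.startswith("backup-") and t.endswith(".timer")} - wanted
```

Limiting orphan detection to the `backup-` prefix keeps unrelated system timers (such as `fstrim.timer`) out of the cleanup set.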
Notes/2026/02/0009 - 2026-02-23 - Dashboard Rewrite Committed & Gen-Timers Migration.md
@@ -0,0 +1,46 @@
+# Session 0009: Dashboard Rewrite Committed & Gen-Timers Migration
+
+**Date:** 2026-02-23
+**Status:** Completed
+**Origin:** MDF Webseiten session 0047
+
+---
+
+## Work Done
+
+- [x] Committed ops-dashboard rewrite (8 files, rebuild.py reduced from 707 to 348 lines)
+- [x] Browser-tested all 5 dashboard pages: Dashboard, Backups, Schedules, Operations, MDF drill-down
+
+### Frontend Fixes (app.js)
+- [x] Fixed environment parsing — `cfg.environments` returns objects, needed `.map(e => e.name)` for promote/sync/lifecycle sections
+- [x] Removed `has_coolify` gate — Container Lifecycle section was incorrectly hidden entirely
+- [x] Changed all "Coolify API" text references to "docker compose"
+- [x] Fixed leftover banner bug — "Go to Backups" banner from Recreate persisted across subsequent Restart/Rebuild operations
+
+### Gen-Timers Migration
+- [x] Rewrote `cmd_gen_timers` to read from `all_projects()` descriptors instead of `registry.yaml`
+- [x] Orphan timer auto-cleanup (removed `backup-coolify.timer`)
+- [x] Schedule `PUT` endpoint now writes to `project.yaml` (not `registry.yaml`) — registry.yaml is now dead code
+
+### Seafile Healthchecks
+- [x] Added Docker HEALTHCHECK to `prod-mdf-seafile` (curl localhost:80, 60s start_period)
+- [x] Added Docker HEALTHCHECK to `prod-mdf-seafile-redis` (redis-cli ping)
+- [x] All 3 Seafile containers now report `healthy` in dashboard
+
+### Backup Bug Fixes
+- [x] Fixed single-env backup — MDF CLI `--all` flag was always backing up all envs even when one env was requested
+- [x] Fixed `bk.create` delegation — only delegates when command template contains `{env}`
+
+### Known Issues Remaining
+- Restore chain broken: offsite downloaded file path not reaching actual restore (shows wrong filename)
+- Backups page shows all entries as "Remote" (local/remote distinction broken in frontend)
+
+## Key Decisions / Learnings
+
+- `project.yaml` descriptors replace `registry.yaml` as source of truth for all ops commands
+- `has_coolify` gate should be removed — dashboard should always show lifecycle section
+- Browser testing after every batch of changes is essential — environment parsing bug only visible in browser
+
+---
+
+**Tags:** #Session #OpsDashboard #Refactor
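Sessions 0009 and 0011 both turn on the `{env}` placeholder: delegation should only fan out per environment when the command template contains `{env}`, and `gen-timers` must substitute it for each env. A hypothetical helper showing that rule; the command templates in the tests are invented:

```python
def expand_backup_commands(template: str, envs: list[str]) -> list[str]:
    """One command per env when the template has {env}; otherwise run it once."""
    if "{env}" not in template:
        # The command handles environments itself (or has none): do not fan out.
        return [template]
    # str.replace leaves any other braces in the shell command alone.
    return [template.replace("{env}", env) for env in envs]
```

Using `str.replace` rather than `str.format` avoids a `KeyError` or `IndexError` when a shell command contains other braces, and matches the `.replace("{env}", env)` fix recorded in session 0011.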
Notes/2026/02/0010 - 2026-02-25 - Sync Router Bidirectional Fix.md
@@ -0,0 +1,20 @@
+# Session 0010: Sync Router Bidirectional Fix
+
+**Date:** 2026-02-25
+**Status:** Completed
+**Origin:** MDF Webseiten session 0052
+
+---
+
+## Work Done
+
+- [x] Fixed `sync_data.py`: added sync pairs for the opposite direction (int->prod, dev->int). Pairs had been defined in one direction only, so triggering an int->prod sync from the dashboard failed with a "Connection lost" error
+
+## Key Decisions / Learnings
+
+- Sync pairs must be defined in both directions in the router, even if data normally flows one way (prod→int→dev), because the UI may request either direction depending on user intent
+- The bug was trivial to fix but produced a visible "Connection lost" failure in the dashboard
+
+---
+
+**Tags:** #Session #OpsDashboard #BugFix
Notes/2026/02/0011 - 2026-02-25 - v15 Deploy Debug & Full Verification.md
@@ -0,0 +1,45 @@
+# Session 0011: v15 Deploy, Debug & Full Verification
+
+**Date:** 2026-02-25
+**Status:** Completed
+**Origin:** MDF Webseiten session 0054
+
+---
+
+## Work Done
+
+- [x] Deployed ops dashboard v15 to server (rsync + container rebuild)
+- [x] Fixed missing `OPS_CLI` path in `run_job()` — nsenter couldn't find the `ops` command
+- [x] Fixed same `OPS_CLI` bug in `restore.py` `_stream_to_job()`
+- [x] Fixed Python stdout buffering through nsenter pipe — added `PYTHONUNBUFFERED=1` to `_NSENTER_PREFIX`
+- [x] Fixed stdin inheritance — added `stdin=asyncio.subprocess.DEVNULL` to prevent `docker run -i` blocking
+- [x] Fixed `schedule.py` hard-coded `/usr/local/bin/ops` path — replaced with `OPS_CLI` constant
+- [x] Added logging to `run_job()` (command start, subprocess PID, exit code)
+- [x] Fixed terminal `docker exec` missing `-it` flags — shell was exiting immediately with code 0
+- [x] Fixed MDF backup timer — `gen-timers` wasn't expanding `{env}` in custom command templates
+- [x] Verified backup (dev + int): lines streaming in real-time
+- [x] Verified disconnect/reconnect: output replay from offset works
+- [x] Verified restart mdf/dev: 2 containers restarted successfully
+- [x] Verified terminal: WebSocket handshake + interactive shell working
+
+## Key Decisions / Learnings
+
+| Bug | Root Cause | Fix |
+|-----|-----------|-----|
+| `nsenter: can't execute 'backup'` | `run_job()` missing OPS_CLI prefix | Added `[OPS_CLI]` to `full_args` |
+| Backup produces 0 lines | Python stdout buffered through pipe | Added `PYTHONUNBUFFERED=1` to nsenter prefix |
+| `docker run -i` hangs | stdin inherited from server process | `stdin=asyncio.subprocess.DEVNULL` |
+| Terminal exits immediately (code 0) | `docker exec` missing `-it` flags | Added `-it` to exec command |
+| MDF backups not running (2 nights) | `gen-timers`: `{env}` never expanded in custom command | Loop over envs + `.replace("{env}", env)` |
+
+- `PYTHONUNBUFFERED=1` is essential whenever running Python via nsenter pipe; otherwise block buffering silently holds back all output until the buffer fills or the process exits
+- `stdin=asyncio.subprocess.DEVNULL` is required for non-interactive subprocess calls from async context
+
+## Commits
+
+- `9e13f76` — feat: ops dashboard v15 — persistent jobs + container terminal (Webseiten repo)
+- `4e65e9e` — fix: gen-timers expand {env} placeholder in custom backup commands (infrastructure repo)
+
+---
+
+**Tags:** #Session #OpsDashboard #Debug #Deployment
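The two subprocess fixes in session 0011 (`stdin=asyncio.subprocess.DEVNULL` and `PYTHONUNBUFFERED=1`) fit into one small async runner. This sketch runs a local Python child, so the variable is passed via `env=`; when the child runs behind the docker/nsenter bridge, the variable must instead travel with the bridged command, as the session did in `_NSENTER_PREFIX`. `stream_lines` is a hypothetical name, and a real `run_job()` would forward each line to the SSE stream instead of collecting it:

```python
import asyncio
import os
import sys


async def stream_lines(argv: list[str]) -> tuple[list[str], int]:
    """Run argv and collect its combined output line by line as it is produced."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdin=asyncio.subprocess.DEVNULL,  # never inherit the server's stdin
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # interleave stderr with stdout
        env={**os.environ, "PYTHONUNBUFFERED": "1"},  # child Python writes unbuffered
    )
    lines: list[str] = []
    async for raw in proc.stdout:  # yields one line at a time as it arrives
        # A real run_job() would push each line to the SSE stream here.
        lines.append(raw.decode(errors="replace").rstrip("\r\n"))
    return lines, await proc.wait()


if __name__ == "__main__":
    print(asyncio.run(stream_lines([sys.executable, "-c", "print('step 1'); print('step 2')"])))
```

With stdin attached to `/dev/null`, a child that tries to read input gets an immediate end-of-file instead of blocking on the server's own stdin.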
Notes/2026/02/0012 - 2026-02-26 - No-Backup Option for Promote & Sync.md
@@ -0,0 +1,25 @@
+# Session 0012: No-Backup Option for Promote & Sync
+
+**Date:** 2026-02-26
+**Status:** Completed
+**Origin:** MDF Webseiten session 0055
+
+---
+
+## Work Done
+
+- [x] Added "Skip safety backup" checkbox to promote modal and sync modal
+  - Backend: `no_backup` query param on `promote.py` and `sync_data.py`
+  - Frontend: amber-colored checkbox, hidden for lifecycle operations (restart/rebuild/recreate)
+  - Deployed and verified (200 OK)
+
+## Key Decisions / Learnings
+
+- Safety backup before promote/sync is the default — skip is opt-in, not opt-out
+- Amber color signals caution without being as severe as red
+- The checkbox is only shown for promote/sync, not for container lifecycle ops (different risk profile)
+- Useful when iterating quickly on dev/int where the overhead of a safety backup is unnecessary
+
+---
+
+**Tags:** #Session #OpsDashboard #Promote #Sync
Notes/TODO.md
@@ -0,0 +1,17 @@
+# TODO
+
+## Open
+
+- [ ] Simple WordPress backup plugin (FTP + WebDAV only, UpdraftPlus alternative)
+
+## Ideas
+
+- [ ] Dark mode toggle
+- [ ] Mobile-responsive improvements
+- [ ] Log viewer with search/filter
+- [ ] Alerting rules configuration page
+
+## Completed (Summary)
+
+Dashboard feature-complete as of 2026-02-26. See session notes in Notes/2026/02/ for full history.
+Originated from MDF Webseiten project (sessions 0022-0055).