15 files added
changed files
Notes/2026/02/0013 - 2026-02-20 - Infrastructure Repo & Ops CLI Bootstrap.md
Notes/2026/02/0014 - 2026-02-20 - Registry Naming & Backup System.md
Notes/2026/02/0015 - 2026-02-22 - Offsite Backup Dashboard Fix & Status Format.md
Notes/2026/02/0016 - 2026-02-22 - Backup Drill-Down Redesign & Restore Fix.md
Notes/2026/02/0017 - 2026-02-22 - Modular Sync Promote Rebuild Architecture.md
Notes/2026/02/0018 - 2026-02-22 - CLI Contract Spec, Sync Compliance, Dashboard Bidirectional UI.md
Notes/2026/02/0019 - 2026-02-22 - Offsite Download Feature Added to Dashboard.md
Notes/2026/02/0020 - 2026-02-23 - Backup Coverage Audit, Registry Fixes, Container Resolution.md
Notes/2026/02/0021 - 2026-02-23 - Rebuild.py Coolify-Only Lifecycle, SSE Keepalive, Traefik Flush.md
Notes/2026/02/0022 - 2026-02-23 - Post-Coolify Architecture Context for Ops Toolkit.md
Notes/2026/02/0023 - 2026-02-23 - Toolkit Bootstrap Starting Point.md
Notes/2026/02/0024 - 2026-02-23 - Toolkit and CLI Rewrite and Dashboard Migration.md
Notes/2026/02/0025 - 2026-02-24 - Dashboard Bugs and SL Routing Fixes.md
Notes/2026/02/0026 - 2026-02-25 - Persistent Jobs and Container Terminal.md
Notes/2026/02/0027 - 2026-02-26 - Dynamic Backup Buttons & TEKMidian Registration.md
Notes/2026/02/0013 - 2026-02-20 - Infrastructure Repo & Ops CLI Bootstrap.md
# Session 0013: Infrastructure Repo & Ops CLI Bootstrap

**Date:** 2026-02-20
**Status:** Completed
**Origin:** MDF Webseiten session 0018

---

## Work Done

- [x] Created infrastructure repo at `git.mnsoft.org/git/APPS/infrastructure.git`
- [x] Local clone: `/Users/i052341/Daten/Cloud/08 - Others/MDF/Infrastruktur/Code/infrastructure/`
- [x] Server clone: `/opt/infrastructure/`
- [x] Wrote `ops` CLI (bash, ~250 lines) — symlinked to `/usr/local/bin/ops`
- [x] Created `servers/hetzner-vps/registry.yaml` — single source of truth for 5 projects
- [x] Captured 5 Traefik dynamic configs from server into git
- [x] Wrote `monitoring/healthcheck.sh` — container health + disk checks → ntfy
- [x] Installed `ops-healthcheck.timer` (every 5 min) on server
- [x] Added Docker labels (`ops.project`, `ops.environment`, `ops.service`) to all MDF compose files
- [x] Replaced hardcoded `container_name()` in `sync.py` with label-based discovery + UUID suffix fallback
- [x] Verified: `ops status`, `ops health`, `ops disk`, `ops backup mdf prod` all working

## Repo Structure Created

```
infrastructure/
├── ops                        # The ops CLI (bash)
├── servers/hetzner-vps/
│   ├── registry.yaml          # 5 projects defined
│   ├── traefik/dynamic/       # Traefik configs captured
│   ├── bootstrap/             # Coolify service payloads
│   ├── scaffolding/           # Shell aliases, SSH hardening, venv setup
│   ├── systemd/               # 6 timer/service units
│   └── install.sh             # Full fresh server setup script
├── monitoring/
│   ├── healthcheck.sh
│   ├── ops-healthcheck.service
│   └── ops-healthcheck.timer
└── docs/architecture.md
```

## Key Decisions / Learnings

- `ops` CLI uses `SCRIPT_DIR` with `readlink -f` for symlink-safe path resolution
- `registry.yaml` uses a `name_prefix` field; container matching uses `grep` with word anchoring to prevent substring false matches
- Label-based discovery is primary; prefix search on the Coolify UUID-suffixed name is the fallback
- Docker labels added to compose files do not take effect on running containers until they are restarted — noted as a gap

## Files Changed

- `/opt/infrastructure/ops` — new ops CLI (bash)
- `/opt/infrastructure/servers/hetzner-vps/registry.yaml` — new registry
- `/opt/infrastructure/monitoring/healthcheck.sh` — new healthcheck script
- `Code/mdf-system/docker-compose.yaml` — added `ops.*` Docker labels
- `Code/mdf-system/scripts/sync/sync.py` — label-based container discovery, domain map fix

---

**Tags:** #Session #OpsCLI #Infrastructure
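The label-first, prefix-fallback resolution described in this session can be sketched as a small pure function. This is illustrative only — the function name, arguments, and the shape of the container data are assumptions, not the actual `sync.py` code:

```python
def resolve_container(containers, project, environment, prefix):
    """containers: list of (name, labels-dict) pairs, e.g. collected from
    `docker ps` output. Returns the matching container name or None."""
    # Primary: match the ops.* labels added to the compose files.
    for name, labels in containers:
        if (labels.get("ops.project") == project
                and labels.get("ops.environment") == environment):
            return name
    # Fallback: Coolify appends a UUID suffix, so match on the name prefix.
    # Anchoring on a trailing '-' avoids substring false matches
    # (the same concern as the grep word anchoring in the registry matching).
    for name, _labels in containers:
        if name == prefix or name.startswith(prefix + "-"):
            return name
    return None
```

The two-step order matters: labels are authoritative once containers have been restarted with them; the prefix search covers containers still running without labels.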
Notes/2026/02/0014 - 2026-02-20 - Registry Naming & Backup System.md
# Session 0014: Registry Naming & Backup System

**Date:** 2026-02-20
**Status:** Completed
**Origin:** MDF Webseiten session 0019

---

## Work Done

- [x] Fixed `sl-website` registry placement — moved under `seriousletter.services.website` to resolve prefix collision
- [x] Renamed all 7 Coolify services to consistent `{project}-{env/purpose}` lowercase naming
- [x] Deleted stale stopped MDF Dev duplicate from Coolify (UUID: qw8wso0ckskccoo0kcog84c0)
- [x] Fixed `ops backup/restore/sync` argument validation (was crashing on unbound variable)
- [x] Fixed SL CLI path in `registry.yaml` (pointed to wrong location)
- [x] Added `container_name()` to SL `sync.py` with label + prefix fallback (mirrors MDF pattern)
- [x] Made `ops backup <project>` work without env arg (passes `--all` to CLI)
- [x] Added backup summary to `ops status` — latest backup per project/env, size, age with color coding
- [x] Consolidated backup dirs to `/opt/data/backups/{project}/{env}/` across all projects
- [x] Updated both MDF and SL CLIs for per-env backup subdirectory structure
- [x] Volume consolidation: all data migrated from 10GB to 50GB volume at `/opt/data`
- [x] Updated all path references across compose files, CLIs, systemd units, registry, ops CLI

## Key Decisions / Learnings

- Registry was initially ambiguous about where `sl-website` lived — prefix collision with other SL services caused matching bugs. Moving it under a `services.website` key made the prefix unique.
- Per-env backup subdirs (`/opt/data/backups/{project}/{env}/`) are the correct structure — flat dirs were the source of orphaned files.
- `ops backup <project>` without env should be a valid shorthand — it delegates `--all` to the project CLI rather than requiring explicit env arg.
- Container name resolution logic must be identical across project CLIs — label-based primary, prefix fallback secondary. Divergence causes mysterious "container not found" bugs.
- Old 10GB volume was kept mounted during migration to avoid cwd-in-mountpoint issues during `umount`.

## Files Changed

- `/opt/infrastructure/servers/hetzner-vps/registry.yaml` — fixed sl-website placement, naming consistency
- `/opt/infrastructure/ops` — fixed arg validation, `cmd_backup` without env, backup summary in status
- `/opt/data/seriousletter/{dev,int,prod}/code/scripts/sync/sync.py` — added `container_name()` with fallback
- `Code/mdf-system/scripts/sync/sync.py` — per-env backup subdirectory paths
- All compose files, systemd units — `/opt/data2` → `/opt/data` path updates

---

**Tags:** #Session #OpsCLI #BackupSystem #Registry
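The "latest backup per project/env" summary added to `ops status` reduces to a small grouping step. This is a hypothetical sketch of that selection logic, not the actual `ops` implementation; the tuple shape is an assumption:

```python
def latest_per_env(backups):
    """backups: iterable of (project, env, filename, mtime) tuples.
    Returns {(project, env): (filename, mtime)} keeping only the newest
    backup for each project/env pair."""
    latest = {}
    for project, env, name, mtime in backups:
        key = (project, env)
        # Keep the entry with the highest modification time per key.
        if key not in latest or mtime > latest[key][1]:
            latest[key] = (name, mtime)
    return latest
```

Age-based color coding then becomes a simple threshold on `now - mtime` per entry.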
Notes/2026/02/0015 - 2026-02-22 - Offsite Backup Dashboard Fix & Status Format.md
# Session 0015: Offsite Backup Dashboard Fix & Status Format

**Date:** 2026-02-22
**Status:** Completed
**Origin:** MDF Webseiten session 0025

---

## Work Done

- [x] Fixed offsite backups not showing in ops dashboard
  - `/api/backups/offsite` was calling `run_ops_json()` (in-container execution) but `ops offsite list` requires the host Python venv
  - Added `run_ops_host_json()` helper to `ops_runner.py` using `nsenter`-based host execution
  - Updated `backups.py` router to use `run_ops_host_json()` for offsite listing
  - Rebuilt and restarted ops-dashboard container
- [x] Reformatted backup list in `ops status` CLI output
  - Changed from flat table sorted by project to date-grouped boxes
  - Each date gets its own Rich table: project / env / time / size / total columns
  - Latest backup per project/env shown, grouped by date descending, sorted by project then env within each date
- [x] Fixed SeriousLetter backup path bug (CLI-level fix, required for dashboard data correctness)
  - SL CLI was dumping backups flat into `/opt/data/backups/` — changed `backup-all.sh` to call SL CLI per-env with explicit `--backup-dir`
  - Moved 15 orphaned backup files to correct per-env directories
- [x] Ran full backup cycle across all 6 environments (MDF + SL x dev/int/prod), verified offsite upload

## Key Decisions / Learnings

- Dashboard containers cannot use in-process `ops` commands that require host-side Python venvs — must use `nsenter` bridge. This is a recurring pattern: the in-container vs host execution boundary is an important architectural distinction in the ops-dashboard.
- Two execution helpers needed: `run_ops_json()` (in-container, fast) and `run_ops_host_json()` (host via nsenter, required for backup/offsite commands).
- Date-grouped backup status is more readable than a flat project-sorted table — groups make it obvious if a date was missed entirely.

## Files Changed

- `/opt/data/ops-dashboard/app/ops_runner.py` — added `run_ops_host_json()` helper
- `/opt/data/ops-dashboard/app/routers/backups.py` — use host execution for offsite listing
- `/opt/infrastructure/ops` — reformatted backup summary with date-grouped Rich tables

---

**Tags:** #Session #OpsDashboard #BackupSystem #Offsite
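A minimal sketch of the `nsenter` bridge idea behind `run_ops_host_json()`. The helper name, flag set, and wrapper shape here are assumptions, not the actual `ops_runner.py` code; the pattern requires a privileged container that shares the host PID namespace so the host's PID 1 is visible:

```python
def host_cmd(args):
    """Wrap a command so it executes in the host's namespaces instead of the
    container's. Targets PID 1 and enters the mount, UTS, network, and IPC
    namespaces, which makes host paths (like the ops venv) resolvable."""
    return ["nsenter", "-t", "1", "-m", "-u", "-n", "-i", "--"] + list(args)
```

A `run_ops_host_json()`-style helper would then pass `host_cmd(["ops", ...])` to `subprocess.run` and parse the JSON output, exactly as the in-container variant does for commands that don't need the host venv.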
Notes/2026/02/0016 - 2026-02-22 - Backup Drill-Down Redesign & Restore Fix.md
# Session 0016: Backup Drill-Down Redesign & Restore Fix

**Date:** 2026-02-22
**Status:** Completed
**Origin:** MDF Webseiten session 0030

---

## Work Done

- [x] Fixed restore API call — `mdf` CLI was falling into interactive selection because no backup filename was passed
  - `app.js`: `startRestore()` now includes `&name=...` from `restoreCtx` in the API URL
- [x] Implemented backups drill-down redesign (deployed as v7)
  - Replaced flat filter state with 3-level drill state (project → env → backup file)
  - Added cached backups to avoid re-fetching on drill-back
  - Extracted `mergeBackups()` helper function
  - Implemented all 13 changes from the redesign plan
- [x] Fixed browser cache problem preventing new JS from loading after rebuild
  - Rebuilt image and restarted container to force cache bust

## Key Decisions / Learnings

- Restore API must include the backup filename explicitly — passing only project/env and letting the CLI choose interactively breaks in non-TTY server context.
- 3-level drill state (project → env → file) is the right UX pattern for hierarchical backup selection; flat filter state made navigation confusing and state management error-prone.
- Caching fetched backup lists at each level avoids latency on drill-back and reduces server load.
- Browser cache busting on vanilla JS apps requires either cache-control headers or a version query param — container restart alone does not always clear client caches.

## Files Changed

- `/opt/data/ops-dashboard/static/js/app.js` — `startRestore()` fix, 3-level drill state, `mergeBackups()` helper
- Docker image rebuilt and container restarted

---

**Tags:** #Session #OpsDashboard #BackupSystem
Notes/2026/02/0017 - 2026-02-22 - Modular Sync Promote Rebuild Architecture.md
# Session 0017: Modular Sync/Promote/Rebuild Architecture

**Date:** 2026-02-22
**Status:** Paused (context checkpoint)
**Origin:** MDF Webseiten session 0032

---

## Work Done

- [x] Fixed SL `detect_env()` — was returning "seriousletter" instead of the env name; now scans path components for first match after "data"
- [x] Fixed MDF `list_backups()` indentation bug — try block was at same level as for loop, only parsed the last backup file
- [x] Added `promote` config to `registry.yaml` for mdf (rsync), seriousletter (git), ringsaday (git) — each defines promote type, branch mapping, post-pull behavior
- [x] Added `promote` Typer command to SL `sync.py` — git fetch, diff preview, git pull, Dockerfile change detection, container rebuild/restart, health check; only dev→int and int→prod allowed
- [x] Added `cmd_promote` to ops CLI — delegates to project CLI with `--from`/`--to` args
- [x] Added `cmd_rebuild` to ops CLI — starts containers, waits for health, restores latest backup
- [x] Created 4 new FastAPI routers in ops-dashboard:
  - `promote.py` — SSE streaming promote endpoint
  - `sync_data.py` — SSE streaming sync endpoint
  - `registry.py` — exposes project list + environments + promote config as JSON
  - `rebuild.py` — SSE streaming rebuild/disaster-recovery endpoint
- [x] Updated `backups.py` to read project list from registry API instead of hardcoding
- [x] Added "Operations" page to dashboard sidebar with three sections: Promote Code, Sync Data, Rebuild (Disaster Recovery)
- [x] Operations page uses SSE modal with dry-run toggle; project/direction buttons populated dynamically from `/api/registry/`
- [x] Verified all 7 test categories pass

## Key Decisions / Learnings

- All long-running ops commands (promote, sync, rebuild) use SSE streaming — consistent with existing backup/restore pattern. The `stream_ops_host()` helper is the standard interface.
- Registry is the single source of truth for project/environment/promote config. Dashboard reads it dynamically — no hardcoded project names in API routers.
- Promote direction validation lives in the project CLI (`sync.py`), not in the ops CLI or dashboard — keeps enforcement close to the implementation.
- `ops rebuild` is the disaster recovery entry point: bring up containers → wait for healthy → restore latest backup. Simple, composable.
- `detect_env()` path parsing must handle the full `/opt/data/seriousletter/{env}/code/...` structure — scanning for VALID_ENVS after "data" in path components is robust.

## Files Changed

- `/opt/data/seriousletter/{dev,int,prod}/code/scripts/sync/sync.py` — fix `detect_env`, add `promote` command
- `Code/mdf-system/scripts/sync/sync.py` (local + deployed to dev) — fix `list_backups` indentation
- `/opt/infrastructure/servers/hetzner-vps/registry.yaml` — add `promote` config per project
- `/opt/data/ops-dashboard/app/routers/promote.py` — new SSE promote endpoint
- `/opt/data/ops-dashboard/app/routers/sync_data.py` — new SSE sync endpoint
- `/opt/data/ops-dashboard/app/routers/registry.py` — new registry JSON endpoint
- `/opt/data/ops-dashboard/app/routers/rebuild.py` — new SSE rebuild endpoint
- `/opt/data/ops-dashboard/app/routers/backups.py` — dynamic project list from registry
- `/opt/data/ops-dashboard/app/main.py` — register 4 new routers
- `/opt/data/ops-dashboard/static/js/app.js` — Operations page UI + SSE modal
- `/opt/data/ops-dashboard/static/index.html` — nav link + ops-modal HTML

## Next Steps (at time of pause)

- [ ] Test backup creation from dashboard UI
- [ ] Test full promote dry-run via dashboard (Operations page)
- [ ] Test sync dry-run via dashboard
- [ ] Commit infrastructure and code repo changes on server
- [ ] DNS cutover mdf-system.de → .ch
- [ ] Disaster recovery test (destroy + rebuild SL dev)

---

**Tags:** #Session #OpsDashboard #OpsCLI #Promote #Sync #Rebuild #Registry
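The `detect_env()` fix described in this session (scan path components for the first valid env after the "data" component) might look like the following. The `VALID_ENVS` constant and exact function shape are assumptions based on the note:

```python
VALID_ENVS = {"dev", "int", "prod"}

def detect_env(path):
    """Return the environment name embedded in a deployment path, matching
    the /opt/data/seriousletter/{env}/code/... layout. Scanning only the
    components *after* "data" avoids the original bug, where the project
    directory name ("seriousletter") was returned instead of the env."""
    parts = path.strip("/").split("/")
    try:
        start = parts.index("data") + 1
    except ValueError:
        return None  # not under a .../data/... tree
    for part in parts[start:]:
        if part in VALID_ENVS:
            return part
    return None
```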
Notes/2026/02/0018 - 2026-02-22 - CLI Contract Spec, Sync Compliance, Dashboard Bidirectional UI.md
# Session 0018: CLI Contract Spec, Sync Compliance, Dashboard Bidirectional UI

**Date:** 2026-02-22
**Status:** Completed
**Origin:** MDF Webseiten session 0033

---

## Work Done

- [x] Defined project CLI contract (`infrastructure/docs/cli-contract.md`, 514 lines): 4 required commands (backup, restore, sync, promote), exact flags, exit codes, output format, compliance checklist, minimal shell CLI example for new projects
- [x] MDF sync.py contract compliance: ANSI suppression (NO_COLOR env var + TTY detection), `--yes` flag for backup, 6 cancellation paths changed exit 0 → exit 2, `[error]` prefix helper for stderr
- [x] SL sync.py contract compliance: ANSI suppression, `error_exit()` helper, backup now uses per-env subdirectories, absolute path output after backup
- [x] Ops CLI de-hardcoding: removed stale `/opt/data2` from disk checks and healthcheck.sh, generalized hardcoded MDF-specific comments, added `find_registry()` multi-server comment
- [x] Disaster recovery docs: fixed `install.sh` (single-volume layout, auto-detection), fixed `bootstrap.sh` (network pre-creation, local image builds, restore instructions), wrote `docs/disaster-recovery.md` (10-phase runbook)
- [x] Dashboard JS fix: fixed syntax errors in Operations page onclick handlers (nested quotes)
- [x] Permanent cache fix: content-hashed asset URLs so manual `?v=XX` bumps are no longer needed
- [x] Bidirectional sync UI: `prod ↔ dev` with direction picker modal ("content flows down" / "content flows up")
- [x] Deployed to server: ops CLI, registry, healthcheck, install.sh, bootstrap.sh, both sync.py scripts (all 3 envs), dashboard rebuilt with content hashing
- [x] Verified: ops status, ops health, promote dry-run, restore --list, dashboard SSE streaming

## Key Decisions / Learnings

- CLI contract enforces: ANSI off via `NO_COLOR` or non-TTY detection; exit codes 0 (success), 1 (error), 2 (cancelled by user); `[error]` prefix on stderr; `--yes` flag to skip prompts in automation
- Cancellation paths must exit 2, not 0 — exit 0 was masking user-cancelled operations in the dashboard
- Content hashing (not version query params) is the correct long-term cache-busting solution
- `find_registry()` multi-server support is documented but not yet implemented — placeholder for future
- DR runbook is 10 phases: verify backups → restore server → install deps → clone repo → restore data → start services → verify

## Files Changed

- `infrastructure/docs/cli-contract.md` — new, 514 lines, defines the full CLI contract
- `infrastructure/docs/disaster-recovery.md` — new, 10-phase DR runbook
- `infrastructure/install.sh` — single-volume layout with auto-detection
- `infrastructure/bootstrap.sh` — network pre-creation, local image builds, restore instructions
- `infrastructure/ops` — removed `/opt/data2`, generalized hardcoded comments, `find_registry()` note
- `infrastructure/healthcheck.sh` — removed stale `/opt/data2` disk check
- `Code/mdf-system/scripts/sync/sync.py` — ANSI suppression, `--yes`, exit 2 cancellations, `[error]` helper
- `Code/seriousletter-sync/sync.py` — ANSI suppression, `error_exit()`, per-env backup dirs, absolute path output
- `Code/ops-dashboard/` — JS onclick fix, content-hashed assets, bidirectional sync UI

---

**Tags:** #Session #OpsToolkit #OpsDashboard #CliContract #DisasterRecovery
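The contract's ANSI and exit-code rules can be sketched as follows. Helper names are illustrative, not taken from the actual `cli-contract.md`; the `NO_COLOR` handling follows the common convention of disabling color whenever the variable is present, regardless of its value:

```python
import os
import sys

# Exit codes required by the contract.
EXIT_OK = 0         # success
EXIT_ERROR = 1      # failure
EXIT_CANCELLED = 2  # cancelled by user (never 0 — would mask cancellation)

def use_color(stream=None, environ=None):
    """Emit ANSI sequences only when NO_COLOR is unset AND output is a TTY.
    Both checks matter: dashboards capture output via pipes (non-TTY), and
    automation can force plain output by exporting NO_COLOR."""
    environ = os.environ if environ is None else environ
    stream = sys.stdout if stream is None else stream
    if "NO_COLOR" in environ:
        return False
    return bool(getattr(stream, "isatty", lambda: False)())

def error(msg, stream=None):
    """Write a contract-compliant error line to stderr."""
    print(f"[error] {msg}", file=stream or sys.stderr)
```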
Notes/2026/02/0019 - 2026-02-22 - Offsite Download Feature Added to Dashboard.md
# Session 0019: Offsite Download Feature Added to Dashboard

**Date:** 2026-02-22
**Status:** Completed
**Origin:** MDF Webseiten session 0039

---

## Work Done

- [x] Added offsite download feature to ops dashboard: per-row download buttons on the Backups page plus action bar buttons
- [x] Offsite download uses SSE streaming (consistent with existing backup/restore/upload patterns)
- [x] Updated ops registry with Seafile services (adds ops-visible services to status output)

## Key Decisions / Learnings

- Offsite download follows the same SSE streaming pattern as backup upload — consistency across all long-running operations
- Per-row buttons (individual file download) and action bar buttons (bulk/selected) both supported

## Files Changed

- `Code/ops-dashboard/` — offsite download UI (per-row + action bar) with SSE streaming
- `infrastructure/servers/hetzner-vps/registry.yaml` — added Seafile services

---

**Tags:** #Session #OpsDashboard #Offsite #SSE
Notes/2026/02/0020 - 2026-02-23 - Backup Coverage Audit, Registry Fixes, Container Resolution.md
# Session 0020: Backup Coverage Audit, Registry Fixes, Container Resolution

**Date:** 2026-02-23
**Status:** Completed
**Origin:** MDF Webseiten session 0041

---

## Work Done

- [x] Fixed ringsaday backup error: added `backup_sources` (volumes, keys, server, website, .env) and `backup` config to registry; changed `backup_dir` to `/opt/data/backups/ringsaday`; fixed `_backup_generic()` — changed `-d` to `-e` flag so individual files (not just directories) can be backed up; tested: 689 MB backup created successfully
- [x] Full backup coverage audit: identified kioskpilot (1.3 MB) and ops-dashboard (1.5 MB) as missing backups
- [x] Added kioskpilot backup (03:45, 30-day retention)
- [x] Added ops-dashboard to registry + nightly backup (04:15, 30-day retention)
- [x] Now 6 nightly backup timers: mdf, seriousletter, ringsaday, kioskpilot, ops-dashboard, coolify
- [x] Fixed ringsaday container resolution: was showing duplicated entries in `ops status`
  - Added `{prefix}-{env}-` matching pattern to `find_containers()` (handles ringsaday-dev-UUID style names)
  - Added ringsaday-website as sub-service with `environments: [prod]`
- [x] Deployed registry.yaml and ops CLI to server; 6 systemd timers active; backup dirs created

## Key Decisions / Learnings

- `_backup_generic()` used `-d` (directory flag) which silently skipped individual files like `.env` and SSL keys — the fix to `-e` (existence check) makes it handle both files and directories
- Container naming for ringsaday uses `{prefix}-{env}-UUID` (Coolify-managed), different from other projects — `find_containers()` needed a second pattern to match these
- ops-dashboard itself must be backed up — it holds its own config and data, easy to overlook
- Backup coverage audit should be a recurring check whenever new projects are added

## Files Changed

- `infrastructure/servers/hetzner-vps/registry.yaml` — kioskpilot backup, ops-dashboard entry, ringsaday website sub-service, ringsaday backup_sources
- `infrastructure/ops` — `_backup_generic()` -d→-e fix, `find_containers()` new UUID-style pattern

---

**Tags:** #Session #OpsToolkit #Backup #Registry #ContainerResolution
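The `-d` → `-e` lesson translates to "check existence, not directory-ness" when collecting backup sources. A hypothetical Python equivalent of that check (the function and its `exists` hook are illustrative, not the bash `_backup_generic()` code):

```python
import os

def collect_sources(paths, exists=os.path.exists):
    """Split backup source paths into (present, missing).
    Uses an existence check — the Python analogue of bash `-e` — so
    individual files like .env and SSL keys are included; a directory-only
    check (bash `-d`, os.path.isdir) would silently skip them."""
    present, missing = [], []
    for p in paths:
        (present if exists(p) else missing).append(p)
    return present, missing
```

Reporting `missing` instead of silently dropping it is what surfaces misconfigured `backup_sources` entries.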
Notes/2026/02/0021 - 2026-02-23 - Rebuild.py Coolify-Only Lifecycle, SSE Keepalive, Traefik Flush.md
# Session 0021: Rebuild.py Coolify-Only Lifecycle, SSE Keepalive, Traefik Flush

**Date:** 2026-02-23
**Status:** Completed
**Origin:** MDF Webseiten session 0044 (part 1)

---

## Work Done

- [x] rebuild.py — removed all docker compose fallbacks; recreate is now Coolify stop → wipe → Coolify start; rebuild is Coolify stop → docker build → Coolify start; restart stays as `docker restart` (Coolify restart prunes local images — intentional exception)
- [x] Fixed build step: changed from `docker compose --profile {env} build` (requires all Coolify env vars) to `docker build -t {image}:{env} {context}` using registry `build_context` and `image_name` directly — no env vars needed
- [x] Added `_coolify_start_with_retry()`: polls 60s after API call, retries up to 3 times — handles Coolify silently dropping start requests
- [x] Container stabilization polling: `_poll_until_running` now waits for container count to be stable for 2 consecutive polls (10s) before declaring success — previously returned success on first container appearance
- [x] "Already running/stopped" handling: Coolify API HTTP 400 with that message now treated as success, not error
- [x] SSE keepalive for restore: restore connections were dropping during DB import (~60s silence); added `_stream_with_keepalive()` wrapper in `restore.py` — sends SSE comment `: keepalive` every 15s
- [x] Added `responseForwarding.flushInterval: "-1"` to ops-dashboard Traefik dynamic config — Traefik was buffering SSE responses, causing keepalives to not reach the client

## Key Decisions / Learnings

- Coolify `restart` prunes locally-built images — `docker restart` (bypassing Coolify) is the correct approach for services with local images; this is a documented exception in rebuild.py
- Coolify can silently queue-and-never-execute start requests — retry logic with polling is mandatory, not optional
- "Already running" from Coolify API is a valid state (idempotent), not an error — treat HTTP 400 with that message as success
- SSE keepalive must happen at the application level (`: keepalive` comment) AND Traefik must be configured to flush immediately (`flushInterval: "-1"`) — both are required; one alone is not enough
- Stable polling (2 consecutive matching counts) is more reliable than "at least one container appeared"

## Files Changed

- `Code/ops-dashboard/app/routers/rebuild.py` — Coolify-only lifecycle, `docker build` from registry config, `_coolify_start_with_retry()`, stable container polling, HTTP 400 success handling
- `Code/ops-dashboard/app/routers/restore.py` — `_stream_with_keepalive()` SSE keepalive wrapper
- Server: `/data/coolify/proxy/dynamic/ops-dashboard.yaml` — added `responseForwarding.flushInterval: "-1"`

---

**Tags:** #Session #OpsDashboard #Rebuild #SSE #Traefik #Coolify
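A sketch of the keepalive-wrapper idea from `restore.py`. The real `_stream_with_keepalive()` signature is not shown in the note; this version, an assumption, re-arms a timeout around each upstream event and emits an SSE comment line (ignored by EventSource clients) whenever the source is silent for longer than the interval:

```python
import asyncio

async def stream_with_keepalive(source, interval=15.0):
    """Yield events from an async iterator `source`, inserting the SSE
    comment `: keepalive` whenever no event arrives within `interval`
    seconds. Keeps proxies and clients from dropping an idle connection
    during long silent phases (e.g. a ~60s DB import)."""
    it = source.__aiter__()
    while True:
        # Fetch the next upstream event as a task so we can time-box it.
        task = asyncio.ensure_future(it.__anext__())
        while True:
            try:
                # shield() keeps the inner fetch alive across timeouts.
                event = await asyncio.wait_for(asyncio.shield(task), interval)
                break
            except asyncio.TimeoutError:
                yield ": keepalive\n\n"   # SSE comment, ignored by clients
            except StopAsyncIteration:
                return                    # upstream finished
        yield event
```

Note this only helps if the proxy actually forwards the comment bytes promptly, hence the paired Traefik `flushInterval: "-1"` change.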
Notes/2026/02/0022 - 2026-02-23 - Post-Coolify Architecture Context for Ops Toolkit.md
# Session 0022: Post-Coolify Architecture Context for Ops Toolkit

**Date:** 2026-02-23
**Status:** Completed
**Origin:** MDF Webseiten session 0044 (Coolify Removal Complete)

---

## Work Done

- [x] Coolify fully removed from server (6 containers, 18 UUID networks, /data/coolify/ directory)
- [x] Standalone Traefik v3.6 confirmed as proxy layer (was coolify-proxy, now independent at /opt/data/traefik/)
- [x] All 28 containers verified operational post-removal; 17/17 domains tested
- [x] Dynamic configs migrated: seriousletter.yaml, ringsaday.yaml moved to /opt/data/traefik/dynamic/
- [x] SSL certificates preserved: acme.json migrated to /opt/data/traefik/acme.json
- [x] Coolify archive retained: /opt/data/backups/coolify-final-20260223.tar.gz (125KB, 30-day window)

## Key Decisions / Learnings

- **Ops toolkit no longer depends on Coolify API** — all lifecycle management (start/stop/rebuild/recreate) must use Docker CLI and docker compose directly against project compose files at `/opt/data/{project}/`
- **Container naming is now clean** — no more UUID suffixes. Pattern: `{env}-{project}-{service}` (e.g. `prod-mdf-wordpress`, `dev-seriousletter-backend`)
- **Proxy network is `proxy`** (replaces old `coolify` network) — all Traefik-exposed containers connect to it
- **Project descriptors at `/opt/data/{project}/project.yaml`** are the new source of truth for container config — registry.yaml is deprecated (used only by gen-timers and schedule PUT)
- **Docker provider + file provider** coexist in Traefik: MDF services use Docker labels; SeriousLetter, RingsADay, KioskPilot use file provider configs
- metro.ringsaday.com returns 502 — pre-existing issue unrelated to Coolify removal (no metro service in compose)
- Docker system cleanup freed ~9GB of unused images and volumes during removal

## Architecture Reference (Post-Coolify)

```
Proxy:          Traefik v3.6 at /opt/data/traefik/
Config:         traefik.yaml (static), dynamic/ (file provider)
Certs:          /opt/data/traefik/acme.json
Proxy network:  proxy

Projects:
  MDF prod:       /opt/data/mdf/prod/ — WordPress, MySQL, Mail, PostfixAdmin, Roundcube, Seafile
  MDF int/dev:    /opt/data/mdf/{int,dev}/ — WordPress + MySQL
  SeriousLetter:  /opt/data/seriousletter/{dev,int,prod}/
  RingsADay:      /opt/data/ringsaday/
  KioskPilot:     /opt/data/kioskpilot/
  Ops Dashboard:  /opt/data/ops-dashboard/
```

## Files Changed

- Server: `/data/coolify/` — deleted (backed up first)
- Server: `/opt/data/traefik/dynamic/` — received migrated seriousletter.yaml and ringsaday.yaml

---

**Tags:** #Session #OpsToolkit #Architecture #Traefik #PostCoolify
Notes/2026/02/0023 - 2026-02-23 - Toolkit Bootstrap Starting Point.md
# Session 0023: Toolkit Bootstrap Starting Point

**Date:** 2026-02-23
**Status:** Completed
**Origin:** MDF Webseiten session 0045

---

## Work Done

- [x] Created `project.yaml` descriptors for all 5 projects (mdf, seriousletter, ringsaday, kioskpilot, ops-dashboard)
- [x] Updated `ops-dashboard` docker-compose.yaml: network `coolify` → `proxy`
- [x] Added Alpine pre-pull with retry (4 attempts, 15s delays) to `rebuild.py` — note: this was a pre-redesign patch, superseded by Phase 5 rewrite in session 0046
- [x] Added image verification after build to `rebuild.py`
- [x] Identified Phase 3+4 toolkit work as next immediate task (was interrupted this session)

## Context / Background

This session was primarily about removing Coolify and migrating all projects to standalone Docker Compose. The OPS-relevant outcome is:

- All 5 `project.yaml` descriptors now exist and are the source of truth for the toolkit
- The `proxy` Docker network replaces the old `coolify` network — all Traefik-exposed containers connect to it
- The toolkit build (Phase 3+4) was planned but interrupted mid-session — completed in session 0046
- The plan was documented at: `Notes/swarm/plan.md` (since cleaned up)

## Key Decisions / Learnings

- `container_prefix` in `project.yaml` uses `{env}` placeholder (e.g. `"{env}-mdf"`) — the toolkit must expand this at runtime
- SeriousLetter uses `"{env}-seriousletter"` as prefix (not `sl`)
- ops-dashboard gets its own `project.yaml` like all other projects

## Files Changed

- `/opt/data/mdf/project.yaml` — created
- `/opt/data/seriousletter/project.yaml` — created
- `/opt/data/ringsaday/project.yaml` — created
- `/opt/data/kioskpilot/project.yaml` — created
- `/opt/data/ops-dashboard/project.yaml` — created
- `/opt/data/ops-dashboard/docker-compose.yml` — network coolify→proxy
- `app/routers/rebuild.py` — Alpine retry + image verify (pre-redesign, superseded)

---

**Tags:** #Session #OpsToolkit #Infrastructure
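The `{env}` placeholder expansion in `container_prefix`, combined with the post-Coolify `{env}-{project}-{service}` naming from session 0022, reduces to two trivial helpers. These are illustrative, not the toolkit's actual code:

```python
def expand_prefix(container_prefix, env):
    """Expand the {env} placeholder from project.yaml at runtime,
    e.g. "{env}-seriousletter" + "dev" -> "dev-seriousletter"."""
    return container_prefix.format(env=env)

def container_name(container_prefix, env, service):
    """Full post-Coolify container name: {env}-{project}-{service}."""
    return f"{expand_prefix(container_prefix, env)}-{service}"
```

Doing the expansion at lookup time (rather than storing expanded names) keeps `project.yaml` environment-agnostic.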
Notes/2026/02/0024 - 2026-02-23 - Toolkit and CLI Rewrite and Dashboard Migration.md
....@@ -0,0 +1,65 @@

# Session 0024: Toolkit and CLI Rewrite and Dashboard Migration

**Date:** 2026-02-23
**Status:** Completed
**Origin:** MDF Webseiten session 0046

---

## Work Done

### Phase 3: Shared Toolkit

- [x] Completed 5 missing toolkit modules at `/opt/infrastructure/toolkit/`:
  - `cli.py` — main CLI entry point with all commands (status, start, stop, build, rebuild, destroy, backup, restore, sync, promote, logs, health, disk, backups, offsite, gen-timers, init)
  - `output.py` — formatted output (Rich tables, JSON mode, plain-text fallback)
  - `restore.py` — restore operations with CLI delegation support
  - `sync.py` — data sync between environments with CLI delegation
  - `promote.py` — code promotion (git, rsync, script) with adjacency enforcement
- [x] 7 modules already existed from prior sessions: `__init__.py`, `descriptor.py`, `docker.py`, `backup.py`, `database.py`, `health.py`, `discovery.py`

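As a rough illustration of that command surface, a subcommand dispatcher of the following shape covers the listed commands (argparse is used here for the sketch; the real `cli.py` may be organized differently, and the positional arguments are assumptions):

```python
import argparse

COMMANDS = ("status", "start", "stop", "build", "rebuild", "destroy",
            "backup", "restore", "sync", "promote", "logs", "health",
            "disk", "backups", "offsite", "gen-timers", "init")

def build_parser() -> argparse.ArgumentParser:
    # One subparser per command; most commands take optional project/env args.
    parser = argparse.ArgumentParser(prog="ops")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in COMMANDS:
        cmd = sub.add_parser(name)
        cmd.add_argument("project", nargs="?")
        cmd.add_argument("env", nargs="?")
    return parser
```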
### Phase 4: Ops CLI Rewrite

- [x] Replaced the 950-line bash ops CLI with a 7-line bash shim → `python3 -m toolkit.cli`
- [x] Old CLI backed up as `ops.bak.20260223`
- [x] New commands added: `start`, `stop`, `build`, `destroy`, `logs`, `restart`, `init`
- [x] All commands read from `project.yaml` descriptors — no `registry.yaml` dependency
- [x] Container prefix matching fixed: handles `{env}` placeholder expansion in `container_prefix`

### Phase 5: Dashboard Adaptation

- [x] Rewrote 4 dashboard routers to use `project.yaml`:
  - `registry.py` — imports `toolkit.discovery.all_projects()` instead of parsing registry.yaml
  - `services.py` — uses `toolkit.descriptor.find()` for container name resolution
  - `rebuild.py` — massive rewrite: 707 → 348 lines, removed ALL Coolify API code, uses direct docker compose
  - `schedule.py` — reads from descriptors for GET, still writes to registry.yaml for PUT (gen-timers compatibility)
- [x] Verified all API endpoints working:
  - `/api/registry/` — returns all 5 projects from descriptors
  - `/api/status/` — shows 25 containers
  - `/api/schedule/` — shows backup schedules for all 5 projects
  - `/api/services/logs/mdf/prod/wordpress` — correctly resolves container name

## Key Decisions / Learnings

- `rebuild.py` now uses a `_compose_cmd()` helper that finds the compose file (`.yaml`/`.yml`) and the env file (`.env.{env}`/`.env`), and adds `--profile {env}` — removes all Coolify API dependency
- The dashboard container has `/opt/infrastructure` mounted → it can import the toolkit directly via Python
- pyyaml 6.0.3 confirmed available in the dashboard container
- `schedule.py` still writes to `registry.yaml` for PUT/gen-timers — full descriptor migration is a future task
- `container_prefix_for(env)` expands `{env}` in the prefix, then matches `{prefix}-*` containers

## Files Changed

- `/opt/infrastructure/toolkit/cli.py` — new (all CLI commands)
- `/opt/infrastructure/toolkit/output.py` — new (Rich/JSON/plain output)
- `/opt/infrastructure/toolkit/restore.py` — new
- `/opt/infrastructure/toolkit/sync.py` — new
- `/opt/infrastructure/toolkit/promote.py` — new
- `/usr/local/bin/ops` — rewritten as 7-line bash shim
- `app/routers/registry.py` — uses toolkit.discovery
- `app/routers/services.py` — uses toolkit.descriptor
- `app/routers/rebuild.py` — 707→348 lines, Coolify removed
- `app/routers/schedule.py` — descriptor-backed GET

---

**Tags:** #Session #OpsToolkit #OpsCLI #OpsDashboard
Notes/2026/02/0025 - 2026-02-24 - Dashboard Bugs and SL Routing Fixes.md

# Session 0025: Dashboard Bugs and SL Routing Fixes

**Date:** 2026-02-24
**Status:** Completed
**Origin:** MDF Webseiten session 0048 (Part 2 only — DNS cutover and mail recovery sections skipped)

---

## Work Done

### Operations Page: Recreate Replaced by Backup + Restore

- [x] Removed the "Recreate" lifecycle action (redundant with Rebuild for bind-mount projects)
- [x] Added **Backup** button (blue): opens the lifecycle modal with SSE streaming to `/api/backups/stream/{project}/{env}`
- [x] Added **Restore** button (purple): navigates to the Backups page at drill level 2 for that project/env
- [x] Added cache invalidation on backup success

### SeriousLetter Bad Gateway Fix

- [x] Diagnosed the root cause: SL containers were only on `seriousletter-network`, not on the `proxy` network that Traefik uses
- [x] Permanent fix: added the `proxy` network to docker-compose.yaml for all 3 SL envs (prod/int/dev)
  - `backend` and `frontend` services get `proxy` in their networks list
  - `proxy: external: true` added to the networks section
- [x] Added health checks for both services:
  - Backend: a `python3 -c` one-liner calling `urllib.request.urlopen("http://localhost:8000/docs")`
  - Frontend: `wget --spider -q http://127.0.0.1:3000/` (explicitly `127.0.0.1`, not `localhost` — Alpine resolves `localhost` to IPv6 `::1`)

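In compose terms, the permanent fix looks roughly like this (service and network names are from the note; everything else in each service definition is omitted, so this is a fragment, not the full file):

```yaml
services:
  backend:
    networks: [seriousletter-network, proxy]
  frontend:
    networks: [seriousletter-network, proxy]

networks:
  seriousletter-network: {}
  proxy:
    external: true
```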
### Sync Routing Bug Fix

- [x] Fixed the sync section only showing MDF (not SeriousLetter)
- [x] Root cause (two-part):
  1. `registry.py` had `desc.sync.get("type") == "cli"` — SL had `sync.type: toolkit`, which evaluated to `False`
  2. SL's `toolkit` type was itself wrong — it should be `cli` with a CLI path
- [x] Fix in `registry.py`: `"has_cli": desc.sync.get("type") == "cli"` → `"has_cli": bool(desc.sync.get("type"))`
- [x] Fix in `/opt/data/seriousletter/project.yaml`: `sync.type: toolkit` → `type: cli` with a `cli:` path

### Backup Date Inconsistency Fix

- [x] Fixed the overview card showing a stale "INT Latest" date while the drill-down showed the correct newer backups
- [x] Root cause: string comparison between incompatible date formats:
  - Compact (MDF CLI): `20260220_195300`
  - ISO (toolkit): `2026-02-24T03:00:42`
  - In ASCII, `'0' > '-'`, so compact dates always "won" the `>` comparison
- [x] Fix: added a `normalizeBackupDate()` function to convert all dates to ISO format at merge time in `mergeBackups()`

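The normalization idea, sketched in Python (the real `normalizeBackupDate()` lives in app.js; this regex-based version is an illustration of the same conversion, not the actual code):

```python
import re

# Compact CLI timestamps look like 20260220_195300
_COMPACT = re.compile(r"^(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})$")

def normalize_backup_date(value: str) -> str:
    """Convert compact dates to ISO so string comparison orders correctly;
    ISO inputs pass through unchanged."""
    m = _COMPACT.match(value)
    if not m:
        return value
    y, mo, d, h, mi, s = m.groups()
    return f"{y}-{mo}-{d}T{h}:{mi}:{s}"
```

After normalization, plain string `>` gives the right answer, whereas the raw compact form always sorts above ISO because `'0' > '-'`.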
## Key Decisions / Learnings

- When adding a container to a new network, an ad-hoc `docker network connect` is lost on restart — the fix must go in the compose file
- Alpine resolves `localhost` to `::1` (IPv6). Services binding only IPv4 `0.0.0.0` won't respond. Use `127.0.0.1` explicitly in health checks.
- For the `has_cli` logic: any truthy `sync.type` value means the project has ops CLI support — don't compare against one specific string
- Date normalization must happen at merge time, not display time, so the `max()` comparisons are correct

## Files Changed

- `static/js/app.js` — removed recreate modal/handler, added backup modal, URL routing for the restore button, cache invalidation, `normalizeBackupDate()` + `mergeBackups()` fix
- `app/routers/registry.py` — `has_cli` logic fix
- `/opt/data/seriousletter/project.yaml` — `sync.type` corrected
- `/opt/data/seriousletter/{prod,int,dev}/code/docker-compose.yaml` — proxy network + health checks

---

**Tags:** #Session #OpsDashboard #BugFix
Notes/2026/02/0026 - 2026-02-25 - Persistent Jobs and Container Terminal.md

# Session 0026: Persistent Jobs and Container Terminal

**Date:** 2026-02-25
**Status:** Completed
**Origin:** MDF Webseiten session 0053

---

## Work Done

### Feature 1: Persistent/Reconnectable Jobs

- [x] New `app/job_store.py` — in-memory job store that decouples the subprocess from the SSE connection
- [x] New `app/routers/jobs.py` — job management endpoints
- [x] New endpoints: `GET /api/jobs/`, `GET /api/jobs/{op_id}`, `GET /api/jobs/{op_id}/stream?from=N`
- [x] Added `run_job()` to `ops_runner.py` — runs the subprocess writing to the job store, NOT killed on browser disconnect
- [x] Added `job_sse_stream()` to `job_store.py` — shared SSE wrapper with keepalive
- [x] Rewrote 6 routers to use the job store pattern: backups.py, restore.py, sync_data.py, promote.py, rebuild.py, schedule.py
- [x] All routers follow the pattern: `create_job()` → `asyncio.create_task(run_job())` → `return StreamingResponse(job_sse_stream())`
- [x] Background cleanup task removes expired jobs every 5 minutes (1-hour TTL)
- [x] Frontend: auto-reconnect on SSE error via `/api/jobs/{op_id}/stream?from=N` (3 retries)
- [x] Frontend: checks for running jobs on page load and shows a reconnect banner

### Feature 2: Container Terminal

- [x] New `app/routers/terminal.py` — WebSocket endpoint with a PTY via `docker exec`
- [x] Protocol: `{"type":"input","data":"..."}` / `{"type":"resize","cols":80,"rows":24}` / `{"type":"output","data":"..."}`
- [x] Frontend: xterm.js 5.5.0 + addon-fit from CDN, terminal modal, Console button on the services page
- [x] Security: token auth, container name validation (regex allowlist), running check via `docker inspect`

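The protocol frames and the name allowlist are simple enough to sketch. The regex below is an assumed allowlist matching Docker's container-name character set, not necessarily the exact pattern in `terminal.py`:

```python
import json
import re

# Docker container names use [a-zA-Z0-9][a-zA-Z0-9_.-]*, so an allowlist of
# that shape rejects shell metacharacters before the name reaches docker exec.
CONTAINER_NAME = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_.-]*$")

def valid_container_name(name: str) -> bool:
    return bool(CONTAINER_NAME.match(name))

def encode_frame(frame_type: str, **fields) -> str:
    """Serialize one WebSocket message, e.g. {"type":"resize","cols":80,...}."""
    return json.dumps({"type": frame_type, **fields})

def decode_frame(raw: str) -> dict:
    frame = json.loads(raw)
    if frame.get("type") not in {"input", "resize", "output"}:
        raise ValueError(f"unknown frame type: {frame.get('type')}")
    return frame
```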
### Fixes Applied

- [x] Restored bidirectional sync pairs in `sync_data.py` (regression from an engineer rewrite)
- [x] Restored multi-compose support in `rebuild.py` (`_all_compose_dirs`, `_compose_cmd_for` for Seafile)
- [x] Updated `main.py` with the jobs + terminal routers, cleanup task in lifespan
- [x] Bumped APP_VERSION to v15-20260225
- [x] Also committed + pushed the `sync_data.py` bidirectional fix (git commit 31ac43f) and stabilization checks

## Key Decisions / Learnings

- Decoupling the subprocess from SSE via a job store is the correct pattern — a browser disconnect should never kill a running backup/restore
- The job store is in-memory (not persisted) — a server restart loses job history, which is acceptable
- xterm.js from CDN (not bundled) keeps the container image lean
- Container name validation via a regex allowlist prevents command injection through the WebSocket terminal endpoint
- The `from=N` query param on the stream endpoint enables replay from any position — the client tracks the last received line index

## Files Changed

- `app/job_store.py` — new (315 lines)
- `app/routers/jobs.py` — new (186 lines)
- `app/routers/terminal.py` — new (287 lines)
- `app/ops_runner.py` — added `run_job()` (388 lines total)
- `app/main.py` — added routers + cleanup task (138 lines)
- `app/routers/backups.py` — job store integration (287 lines)
- `app/routers/restore.py` — job store integration (290 lines)
- `app/routers/sync_data.py` — job store + bidirectional fix (71 lines)
- `app/routers/promote.py` — job store integration (69 lines)
- `app/routers/rebuild.py` — job store + multi-compose (365 lines)
- `static/js/app.js` — v15: reconnect + terminal (2355 lines)
- `static/index.html` — xterm.js CDN + terminal modal
- `static/css/style.css` — terminal styles

## State at Session End

Code written locally at `/Users/i052341/Daten/Cloud/08 - Others/MDF/Infrastruktur/Code/ops-dashboard/`. Not yet deployed to the server at the time of note creation. Deploy + verification is the next session's starting task.

---

**Tags:** #Session #OpsDashboard #PersistentJobs #Terminal
Notes/2026/02/0027 - 2026-02-26 - Dynamic Backup Buttons & TEKMidian Registration.md

# 0027 - 2026-02-26 - Dynamic Backup Buttons & TEKMidian Registration

## Context

Changes made from the TEKMidian project session while registering TEKMidian in the ops dashboard.

## Changes

### 1. Dynamic "Create Backup" Buttons (app.js)

**Problem:** The "Create Backup" buttons on the Backups page were hardcoded to only `mdf` and `seriousletter`:
```javascript
for (const p of ['mdf', 'seriousletter']) {
    for (const e of ['dev', 'int', 'prod']) {
        // ...hardcoded button per project/env...
    }
}
```

**Fix:** Made the buttons dynamic from the `/api/schedule/` endpoint. Now all backup-enabled projects get buttons automatically, based on their configured environments:
```javascript
for (const s of (cachedSchedules || [])) {
    if (!s.enabled) continue;
    const envs = s.backup_environments || s.environments || [];
    // render a button per environment
}
```

Also added a schedule-data fetch in `renderBackups()` alongside the existing backup/offsite fetches:
```javascript
const [local, offsite, schedules] = await Promise.all([
    api('/api/backups/'),
    api('/api/backups/offsite').catch(() => []),
    cachedSchedules ? Promise.resolve(cachedSchedules) : api('/api/schedule/').catch(() => []),
]);
```

### 2. Empty-State Project Cards (app.js)

**Problem:** Projects with backup config but no backups yet didn't appear in the project cards grid (only projects with existing backup files showed up).

**Fix:** After the existing project-cards loop, added a second loop over `cachedSchedules` that shows backup-configured projects with 0 backups as dashed-border cards:
```javascript
for (const s of (cachedSchedules || [])) {
    if (!s.enabled || projects[s.project]) continue;
    // render a dashed card with "0 backups" and "No backups yet"
}
```

### 3. Cache Busting

- Bumped `APP_VERSION` from `v15-20260225` to `v16-20260226`
- Updated `index.html`: `app.js?v=15` to `app.js?v=16`

## Files Modified (on server)

- `/opt/data/ops-dashboard/static/js/app.js` — dynamic backup buttons, schedule fetch, empty-state cards, version bump
- `/opt/data/ops-dashboard/static/index.html` — cache bust `?v=16`

## Notes

- Edits require `sudo` — the file is owned by uid 501 (macOS user via scp)
- No container restart needed — static files are bind-mounted (`./static:/app/static`)
- First TEKMidian backup triggered via the schedule API (926K tar.gz)