# Session 0026: Persistent Jobs and Container Terminal

**Date:** 2026-02-25
**Status:** Completed
**Origin:** MDF Webseiten session 0053

---

## Work Done

### Feature 1: Persistent/Reconnectable Jobs

- [x] New `app/job_store.py`: in-memory job store decouples subprocess from SSE connection
- [x] New `app/routers/jobs.py`: job management endpoints
- [x] New endpoints: `GET /api/jobs/`, `GET /api/jobs/{op_id}`, `GET /api/jobs/{op_id}/stream?from=N`
- [x] Added `run_job()` to `ops_runner.py`: runs subprocess writing to job store, NOT killed on browser disconnect
- [x] Added `job_sse_stream()` to `job_store.py`: shared SSE wrapper with keepalive
- [x] Rewrote 6 routers to use job store pattern: backups.py, restore.py, sync_data.py, promote.py, rebuild.py, schedule.py
- [x] All routers follow pattern: `create_job()` → `asyncio.create_task(run_job())` → `return StreamingResponse(job_sse_stream())`
- [x] Background cleanup task removes expired jobs every 5 minutes (1 hour TTL)
- [x] Frontend: auto-reconnect on SSE error via `/api/jobs/{op_id}/stream?from=N` (3 retries)
- [x] Frontend: check for running jobs on page load, show reconnect banner

### Feature 2: Container Terminal

- [x] New `app/routers/terminal.py`: WebSocket endpoint with PTY via `docker exec`
- [x] Protocol: `{"type":"input","data":"..."}` / `{"type":"resize","cols":80,"rows":24}` / `{"type":"output","data":"..."}`
- [x] Frontend: xterm.js 5.5.0 + addon-fit from CDN, terminal modal, Console button on services page
- [x] Security: token auth, container name validation (regex allowlist), running check via docker inspect

### Fixes Applied

- [x] Restored bidirectional sync pairs in `sync_data.py` (regression from engineer rewrite)
- [x] Restored multi-compose support in `rebuild.py` (`_all_compose_dirs`, `_compose_cmd_for` for Seafile)
- [x] Updated `main.py` with jobs + terminal routers, cleanup task in lifespan
- [x] Bumped APP_VERSION to v15-20260225
- [x] Also committed + pushed `sync_data.py` bidirectional fix (git commit 31ac43f) and stabilization checks

## Key Decisions / Learnings

- Decoupling subprocess from SSE via a job store is the correct pattern: browser disconnect should never kill a running backup/restore
- Job store is in-memory (not persisted): server restart loses job history, which is acceptable
- xterm.js from CDN (not bundled) keeps the container image lean
- Container name validation via regex allowlist prevents command injection through the WebSocket terminal endpoint
- `from=N` query param on stream endpoint enables replay from any position; client tracks last received line index

## Files Changed

- `app/job_store.py`: new (315 lines)
- `app/routers/jobs.py`: new (186 lines)
- `app/routers/terminal.py`: new (287 lines)
- `app/ops_runner.py`: added `run_job()` (388 lines total)
- `app/main.py`: added routers + cleanup task (138 lines)
- `app/routers/backups.py`: job store integration (287 lines)
- `app/routers/restore.py`: job store integration (290 lines)
- `app/routers/sync_data.py`: job store + bidirectional fix (71 lines)
- `app/routers/promote.py`: job store integration (69 lines)
- `app/routers/rebuild.py`: job store + multi-compose (365 lines)
- `static/js/app.js`: v15, reconnect + terminal (2355 lines)
- `static/index.html`: xterm.js CDN + terminal modal
- `static/css/style.css`: terminal styles

## State at Session End

Code written locally at `/Users/i052341/Daten/Cloud/08 - Others/MDF/Infrastruktur/Code/ops-dashboard/`. Not yet deployed to server at time of note creation. Deploy + verification is the next session's starting task.

---

**Tags:** #Session #OpsDashboard #PersistentJobs #Terminal
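
**Sketch: job store with replay.** For reference, the job store pattern from Feature 1 can be approximated with the standard library alone. This is a minimal sketch, not the contents of `app/job_store.py`: the function names `create_job`, `run_job`, and `job_sse_stream` come from the notes above, but their signatures, the `Job` dataclass, the `JOBS` dict, `cleanup_expired`, and the polling approach are all assumptions made for illustration. The keepalive mentioned above is left out to keep the sketch short.

```python
import asyncio
import time
import uuid
from dataclasses import dataclass, field
from typing import AsyncIterator, Dict, List, Optional

JOB_TTL_SECONDS = 3600  # 1 hour TTL, as in the notes


@dataclass
class Job:
    """Hypothetical job record: output lines are buffered so they can be replayed."""
    op_id: str
    lines: List[str] = field(default_factory=list)
    done: bool = False
    finished_at: Optional[float] = None


JOBS: Dict[str, Job] = {}  # in-memory only; lost on server restart


def create_job() -> Job:
    job = Job(op_id=uuid.uuid4().hex)
    JOBS[job.op_id] = job
    return job


async def run_job(job: Job, cmd: List[str]) -> None:
    """Run the subprocess and write its output into the job, not into a connection.

    Started via asyncio.create_task(), so it keeps running when a reader goes away.
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    assert proc.stdout is not None
    async for raw in proc.stdout:
        job.lines.append(raw.decode(errors="replace").rstrip("\r\n"))
    await proc.wait()
    job.done = True
    job.finished_at = time.monotonic()


async def job_sse_stream(job: Job, start: int = 0, poll: float = 0.05) -> AsyncIterator[str]:
    """Yield SSE events from line index `start` onward (the `from=N` replay)."""
    index = start
    while True:
        while index < len(job.lines):
            # The event id is the line index, so a client can resume with from=id+1.
            yield f"id: {index}\ndata: {job.lines[index]}\n\n"
            index += 1
        if job.done:
            return
        await asyncio.sleep(poll)


def cleanup_expired(now: float) -> int:
    """Drop finished jobs older than the TTL; returns how many were removed."""
    expired = [
        op_id
        for op_id, job in JOBS.items()
        if job.finished_at is not None and now - job.finished_at > JOB_TTL_SECONDS
    ]
    for op_id in expired:
        del JOBS[op_id]
    return len(expired)
```

Because readers only ever read from `job.lines`, a reconnecting client that last saw line index 41 would request `?from=42` and receive the rest without the subprocess noticing the disconnect.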
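
**Sketch: terminal message handling and name allowlist.** The WebSocket protocol and container name check from Feature 2 might look roughly like the following. This is a hedged illustration, not the code in `app/routers/terminal.py`: the regex, the 128-character cap, the 500 cols/rows limit, and the helper names `is_valid_container_name`, `parse_client_message`, and `output_message` are all assumptions. Only the three message shapes are taken from the notes.

```python
import json
import re
from typing import Tuple, Union

# Assumed allowlist: starts with an alphanumeric, then alphanumerics, "_", "." or "-".
# The actual pattern used in terminal.py is not recorded in these notes.
CONTAINER_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]{0,127}")

ClientMessage = Union[Tuple[str, str], Tuple[str, Tuple[int, int]]]


def is_valid_container_name(name: str) -> bool:
    # fullmatch rejects anything with spaces, ";", "$", newlines, or a leading "-",
    # so the name cannot smuggle extra arguments into a `docker exec` call.
    return CONTAINER_NAME_RE.fullmatch(name) is not None


def parse_client_message(raw: str) -> ClientMessage:
    """Return ("input", data) or ("resize", (cols, rows)); raise ValueError otherwise."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("message is not valid JSON") from exc
    if not isinstance(msg, dict):
        raise ValueError("message must be a JSON object")

    kind = msg.get("type")
    if kind == "input" and isinstance(msg.get("data"), str):
        return "input", msg["data"]
    if kind == "resize":
        cols, rows = msg.get("cols"), msg.get("rows")
        # type(...) is int excludes booleans, which are ints in Python.
        if type(cols) is int and type(rows) is int and 0 < cols <= 500 and 0 < rows <= 500:
            return "resize", (cols, rows)
    raise ValueError(f"unsupported or malformed message: {kind!r}")


def output_message(data: str) -> str:
    """Wrap PTY output in the server-to-client envelope."""
    return json.dumps({"type": "output", "data": data})
```

Validating the name before it reaches any subprocess call, and passing it as a separate argv element rather than through a shell, are two independent safeguards; the sketch covers only the first.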