
Fix timelapse baseline race condition (#315)

The snapshot-diff timelapse detection captured its baseline of existing
MP4 files at print completion time inside a background task. Fast-
encoding printers could finish writing the timelapse before the baseline
was taken, so the new file appeared in the baseline and was never
detected as "new" — resulting in no timelapse attached.

Move baseline capture to print start time, when the timelapse file
cannot possibly exist yet. The baseline is stored in a module-level dict
keyed by printer_id and popped in on_print_complete, which passes it to
the scan task. If no stored baseline exists (e.g. the app was restarted
mid-print), the scan falls back to a completion-time baseline.
maziggy 3 months ago
parent
commit
0fe2bb1342
2 changed files with 45 additions and 13 deletions
  1. CHANGELOG.md (+1 -0)
  2. backend/app/main.py (+44 -13)

CHANGELOG.md (+1 -0)

@@ -42,6 +42,7 @@ All notable changes to Bambuddy will be documented in this file.
 - **Virtual Printer IP Override for Server Mode** ([#52](https://github.com/maziggy/bambuddy/issues/52)) — The `remote_interface_ip` setting (network interface override) was only used in proxy mode, but users with multiple network interfaces (LAN + Tailscale, Docker bridges) also needed it in server modes (immediate/review/print_queue). Auto-detected IP from `_get_local_ip()` followed the OS default route, causing wrong IP in TLS certificate SAN (handshake failures) and SSDP broadcasts (slicer can't discover printer). Now the interface override applies to all modes: included in certificate SAN, passed to SSDP server as advertise IP, and triggers service restart on change. UI dropdown shown for all modes when enabled (not just proxy).
 - **Wrong Thumbnail When Reprinting Same Project** ([#314](https://github.com/maziggy/bambuddy/issues/314)) — Reprinting a project with the same name but a different bed layout showed the old thumbnail during printing. The cover image cache was keyed by `subtask_name` and never invalidated between prints, so a cache hit returned the stale first-print thumbnail. Now the cover cache is cleared on every print start.
 - **Wrong Timelapse Attached to Archive** ([#315](https://github.com/maziggy/bambuddy/issues/315)) — After a print, the archive could receive a timelapse from a previous print instead of the just-completed one. The auto-scan sorted MP4 files by mtime and grabbed the "most recent," but in LAN-only mode (no NTP) the printer's clock is wrong, making mtime unreliable. Replaced with a snapshot-diff approach: baseline existing files before waiting, then detect the new file that appears after encoding. Falls back to print-name matching if no new file is found after retries.
+- **Timelapse Not Attached — Baseline Race Condition** ([#315](https://github.com/maziggy/bambuddy/issues/315)) — Follow-up to the snapshot-diff timelapse fix: the baseline of existing MP4 files was captured at print completion time inside a background task, but fast-encoding printers could finish writing the timelapse before the baseline was taken, causing the new file to appear in the baseline and never be detected as "new." Moved baseline capture to print start time, when the timelapse file cannot possibly exist yet. Falls back to completion-time baseline if the app was restarted mid-print.
 - **Calibration Prints Archived** ([#315](https://github.com/maziggy/bambuddy/issues/315)) — Standalone calibration prints (flow, vibration, bed leveling) were being archived as regular prints. The calibration gcode (`/usr/etc/print/auto_cali_for_user.gcode`) and other internal printer files under `/usr/` are now detected and skipped during print start.
 - **Camera Stop 401 When Auth Enabled** — Camera stop requests (`sendBeacon`) failed with 401 Unauthorized when authentication was enabled because `sendBeacon` cannot send auth headers. Replaced with `fetch` + `keepalive: true` which supports Authorization headers while remaining reliable during page unload.
 - **Spoolman Creates Duplicate Spools on Startup** ([#295](https://github.com/maziggy/bambuddy/pull/295)) — Each AMS tray independently fetched all spools from Spoolman, causing redundant API calls and duplicate spool creation with large databases (300+ spools). Now fetches spools once and reuses cached data across all tray operations. Added retry logic (3 attempts, 500ms delay) with connection recreation for transient network errors.

backend/app/main.py (+44 -13)

@@ -253,6 +253,10 @@ _last_progress_milestone: dict[int, int] = {}
 # This prevents sending duplicate notifications for the same error
 _notified_hms_errors: dict[int, set[str]] = {}
 
+# Track timelapse file baselines at print start: {printer_id: set of MP4 filenames}
+# Used for snapshot-diff detection at print completion
+_timelapse_baselines: dict[int, set[str]] = {}
+
 
 async def _get_plug_energy(plug, db) -> dict | None:
     """Get energy from plug regardless of type (Tasmota, Home Assistant, or MQTT).
@@ -1404,6 +1408,18 @@ async def on_print_start(printer_id: int, data: dict):
                     await _store_spoolman_print_data(printer_id, archive.id, archive.file_path, db, printer_manager)
                 except Exception as e:
                     logger.warning("[SPOOLMAN] Failed to store tracking data: %s", e)
+
+                # Capture timelapse file baseline for snapshot-diff on completion
+                try:
+                    baseline_files, _ = await _list_timelapse_mp4s(printer)
+                    _timelapse_baselines[printer_id] = {f.get("name", "") for f in baseline_files}
+                    logger.info(
+                        "[TIMELAPSE] Baseline at print start: %s MP4 files for printer %s",
+                        len(_timelapse_baselines[printer_id]),
+                        printer_id,
+                    )
+                except Exception as e:
+                    logger.warning("[TIMELAPSE] Failed to capture baseline at print start: %s", e)
         finally:
             if temp_path and temp_path.exists():
                 temp_path.unlink()
@@ -1435,7 +1451,7 @@ async def _list_timelapse_mp4s(printer) -> tuple[list[dict], str | None]:
     return [], None
 
 
-async def _scan_for_timelapse_with_retries(archive_id: int):
+async def _scan_for_timelapse_with_retries(archive_id: int, baseline_names: set[str] | None = None):
     """
     Scan for timelapse with retries using a snapshot-diff approach.
 
@@ -1443,6 +1459,10 @@ async def _scan_for_timelapse_with_retries(archive_id: int):
     clock is wrong in LAN-only mode), we snapshot existing MP4 filenames BEFORE
     waiting, then look for any NEW filename that appears after each delay.
 
+    If baseline_names is provided (captured at print start), it is used directly.
+    Otherwise falls back to taking a baseline at completion time (best-effort
+    for prints started before app restart).
+
     Falls back to name-matching (print name contained in MP4 filename) if no
     new file appears after all retries.
     """
@@ -1468,18 +1488,28 @@ async def _scan_for_timelapse_with_retries(archive_id: int):
                 logger.warning("[TIMELAPSE] Archive %s has no printer, aborting", archive_id)
                 return
 
-            result = await db.execute(select(Printer).where(Printer.id == archive.printer_id))
-            printer = result.scalar_one_or_none()
-            if not printer:
-                logger.warning("[TIMELAPSE] Printer not found for archive %s, aborting", archive_id)
-                return
+            if baseline_names is not None:
+                # Use pre-captured baseline from print start (no race condition)
+                logger.info(
+                    "[TIMELAPSE] Using print-start baseline: %s existing MP4 files for archive %s",
+                    len(baseline_names),
+                    archive_id,
+                )
+            else:
+                # Fallback: take baseline now (e.g. app restarted mid-print)
+                result = await db.execute(select(Printer).where(Printer.id == archive.printer_id))
+                printer = result.scalar_one_or_none()
+                if not printer:
+                    logger.warning("[TIMELAPSE] Printer not found for archive %s, aborting", archive_id)
+                    return
 
-            # Snapshot current MP4 filenames as baseline
-            baseline_files, _ = await _list_timelapse_mp4s(printer)
-            baseline_names: set[str] = {f.get("name", "") for f in baseline_files}
-            logger.info(
-                "[TIMELAPSE] Baseline snapshot: %s existing MP4 files for archive %s", len(baseline_names), archive_id
-            )
+                baseline_files, _ = await _list_timelapse_mp4s(printer)
+                baseline_names = {f.get("name", "") for f in baseline_files}
+                logger.info(
+                    "[TIMELAPSE] Baseline snapshot (fallback): %s existing MP4 files for archive %s",
+                    len(baseline_names),
+                    archive_id,
+                )
 
             # Derive base_name for name-matching fallback
             base_name = Path(archive.filename).stem if archive.filename else ""
@@ -2179,7 +2209,8 @@ async def on_print_complete(printer_id: int, data: dict):
         logger.info("[TIMELAPSE] Timelapse was active during print, scheduling auto-scan for archive %s", archive_id)
         # Schedule timelapse scan as background task with retries
         # The printer needs time to encode the video after print completion
-        asyncio.create_task(_scan_for_timelapse_with_retries(archive_id))
+        baseline = _timelapse_baselines.pop(printer_id, None)
+        asyncio.create_task(_scan_for_timelapse_with_retries(archive_id, baseline))
         log_timing("Timelapse scan scheduled")
 
     # Update queue item if this was a scheduled print