Browse Source

fix(obico): clear Status banner on next successful detection cycle (#172)

maziggy 1 month ago
parent
commit
ea78fe720c

+ 1 - 0
CHANGELOG.md

@@ -5,6 +5,7 @@ All notable changes to Bambuddy will be documented in this file.
 ## [0.2.4b1] - Unreleased
 
 ### Fixed
+- **Obico: Cold-Start Capture Timeout Sticks in Status Banner** ([#172](https://github.com/maziggy/bambuddy/issues/172)) — On the very first detection poll after a restart, the initial RTSP snapshot capture occasionally exceeded the 20 s `SNAPSHOT_CAPTURE_TIMEOUT` (the first keyframe from the printer's camera can take a while on a cold RTSP connection). Subsequent polls every ~8 s recovered and captured in ~1.2 s, but the red `× Failed to capture snapshot for printer N` banner in Settings → Failure Detection → Status stayed up forever because `ObicoDetectionService._last_error` was written on failure and never cleared on the next successful poll. The successful branch in `_check_printer` now clears `_last_error` to `None` once a capture + ML call + classification complete, so the banner reflects only errors from recent cycles. Configuration-level errors (missing `external_url`, missing `ml_url`) still persist because they return before the clearing line — users still see them until they fix the setting. Regression test covers: seed `_last_error`, run one successful `_check_printer`, assert `_last_error is None`. Thanks to @fblix for the reproduction and screenshot.
 - **Printer Card Controls Row Overflows in Chrome** — At Medium card size on a wide viewport, the printer-card controls row (fan badges, airduct mode, print speed, bed jog, then Stop / Pause on the right) visibly overlapped in Chrome while rendering fine in Firefox and Safari. The controls-row layout had a `max-[550px]:flex-wrap` rule on the left badge group that only fires below 550 **viewport** pixels, so on a wide viewport with a narrow card the left group never wrapped — and since its badges don't truncate, Chrome painted the overflowing speed/bed-jog badges on top of the right-pinned Stop/Pause buttons. German locales made it obvious ("Pausieren" is 9 characters). The left group now uses unconditional `flex-wrap`, so when badges don't all fit on one line they wrap inside the left cell instead of colliding with the right cell; the parent row also wraps `gap-y` so Stop/Pause drops to a new line in the worst case. Pre-existing (commit `4ff3e2a6`, Feb 2026), surfaced while testing #939.
 - **MQTT Smart Plug Subscription Lost After Every Restart** ([#1010](https://github.com/maziggy/bambuddy/issues/1010)) — Users integrating a Shelly (or any other) plug through an external MQTT broker (e.g. ioBroker, Zigbee2MQTT, Home Assistant's MQTT broker) saw the plug's power / state / energy readings go dark after every Bambuddy restart, and the only fix was to open Settings → Smart Plugs, rename the topic to a dummy value, save, rename it back and save again. Root cause: the startup restore path in `main.py` (~line 4120) still used the legacy single-topic model (`mqtt_topic` plus `*_path` kwargs), while the Settings UI save path had been upgraded to the newer per-type model (`mqtt_power_topic` / `mqtt_energy_topic` / `mqtt_state_topic` each with their own paths, multipliers and `mqtt_state_on_value`). Plugs configured entirely with the new per-type fields got skipped at startup because the `if plug.mqtt_topic:` guard short-circuited — which is exactly what a Shelly-via-ioBroker setup looks like, since those publish power and state on separate topics. The "rename, save, rename back" workaround triggered the update endpoint, which was using the correct per-type code and re-established the subscription. Fix: extracted the topic-resolution + `service.subscribe()` call into a single `subscribe_plug_to_mqtt(service, plug)` helper in `backend/app/services/mqtt_smart_plug.py` that preserves legacy fallback, and routed the startup restore, create, and update routes all through it so future schema changes can't cause the three paths to drift again. Regression tests cover: per-type topics restored without a legacy topic set, legacy single-topic backward compat, per-type multipliers overriding legacy, per-type winning when both are set, the empty-config skip case, and topic-list de-duplication. Thanks to @saint-hh for the clear repro steps.
 - **Large 3MF Uploads Archived as Corrupted ZIPs** ([#1032](https://github.com/maziggy/bambuddy/issues/1032)) — On bare-metal Raspberry Pi installs (armv7l / Python 3.11 / Bookworm), 3MF files larger than a few MB arrived complete via the virtual-printer FTP server but the copy into `data/archives/` ended up not being a valid ZIP. The archive row was still written, the printer card looked fine, and the problem only surfaced later when opening the archive in the UI, where `GET /archives/{id}/plates` logged `Failed to parse plates from archive N: File is not a zip file` and the thumbnail / plate / filament panels came up blank. Two things conspired: `shutil.copy2` takes the Linux `sendfile()` fast path on Python ≥ 3.8, and a partial-return from that syscall silently truncated the destination for the upload sizes users hit; and `ThreeMFParser.parse()` had a bare `except: pass` around its `zipfile.ZipFile` open, so the archive pipeline kept going with empty metadata and left the bad file on disk. The copy is now an explicit chunked read/write with `fsync()` — no sendfile involved — with a post-condition `zipfile.is_zipfile()` check that refuses to create the archive row (and cleans up the archive directory) when the source was a valid ZIP and the destination isn't, logging both sizes at `ERROR`. The parser's silent catch now logs at `WARNING` so corrupted 3MFs are visible in support bundles instead of disappearing into empty metadata. Regression tests cover small / multi-chunk copies, ZIP roundtrips, the post-copy `is_zipfile` sentinel on a truncated file, and the new parser WARNING. Thanks to @saint-hh for the detailed diagnosis.

+ 4 - 0
backend/app/services/obico_detection.py

@@ -253,6 +253,10 @@ class ObicoDetectionService:
         score = state.update(current_p)
         verdict = classify(score, settings["sensitivity"])
         self._last_class[printer_id] = verdict
+        # A successful capture + ML call clears any transient error from previous
+        # polls (typical case: cold-start RTSP timeout on first frame after startup,
+        # followed by healthy polls that otherwise leave the banner stuck in the UI).
+        self._last_error = None
 
         # Log every non-safe sample — safe samples would flood history
         if verdict != "safe" or detections:

+ 39 - 0
backend/tests/unit/test_obico_detection.py

@@ -430,3 +430,42 @@ class TestCheckPrinterUsesCachedFrameUrl:
         mock_client.get.assert_not_called()
         assert svc._last_error is not None
         assert "external_url" in svc._last_error
+
+    @pytest.mark.asyncio
+    async def test_successful_cycle_clears_previous_error(self):
+        """A cold-start RTSP timeout sets _last_error; the next successful poll must clear it.
+
+        Regression for #172: the Status card banner ("Failed to capture snapshot for
+        printer 1") stuck around after a one-off cold-start failure even though every
+        subsequent poll captured + detected successfully.
+        """
+        svc = ObicoDetectionService()
+        settings = {
+            "enabled": True,
+            "ml_url": "http://obico:3333",
+            "sensitivity": "medium",
+            "action": "notify",
+            "poll_interval": 10,
+            "enabled_printers": None,
+            "external_url": "http://bambuddy:8000",
+        }
+        status = MagicMock(state="RUNNING", task_name="job", subtask_name="")
+
+        # Seed a prior transient error, as would be left by a cold-start capture timeout.
+        svc._last_error = "Failed to capture snapshot for printer 1"
+
+        mock_response = MagicMock()
+        mock_response.json.return_value = {"detections": []}
+        mock_response.raise_for_status = MagicMock()
+        mock_client = MagicMock()
+        mock_client.get = AsyncMock(return_value=mock_response)
+        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
+        mock_client.__aexit__ = AsyncMock(return_value=False)
+
+        with (
+            patch("backend.app.services.obico_detection.httpx.AsyncClient", return_value=mock_client),
+            patch.object(svc, "_capture_frame", new=AsyncMock(return_value=FAKE_JPEG)),
+        ):
+            await svc._check_printer(1, status, settings)
+
+        assert svc._last_error is None