Browse Source

Fix developer mode probe destabilizing MQTT on auto-reconnect (#887)

  The dev mode probe (ams_filament_setting to ext slot) fired on every
  auto-reconnect, which destabilized some firmware MQTT brokers (A1/P1)
  causing a reconnect-probe-disconnect feedback loop. Now caches the
  probe result across reconnects and only probes once on first connect,
  with a 5s delay to let the session stabilize.
maziggy 1 tháng trước cách đây
mục cha
commit
113bae4c63

+ 1 - 1
CHANGELOG.md

@@ -31,7 +31,7 @@ All notable changes to Bambuddy will be documented in this file.
 - **Thumbnails Broken After Backend Restart** — Archive and library thumbnails returned 401 Unauthorized after a backend restart because stream tokens are stored in memory and lost on restart. The frontend now detects failed token-protected image loads and automatically refreshes the stream token, so thumbnails recover without a page reload.
 - **Thumbnails Broken After Backend Restart** — Archive and library thumbnails returned 401 Unauthorized after a backend restart because stream tokens are stored in memory and lost on restart. The frontend now detects failed token-protected image loads and automatically refreshes the stream token, so thumbnails recover without a page reload.
 - **SpoolBuddy Kiosk Screen Blanks on Boot** — The touchscreen display would blank immediately after the RPi booted, requiring a touch to wake. Added `consoleblank=0` to the kernel cmdline to disable Linux console blanking during the Plymouth-to-labwc transition, and changed the `wlr-randr` anti-blank loop to fire immediately instead of sleeping 60 seconds first.
 - **SpoolBuddy Kiosk Screen Blanks on Boot** — The touchscreen display would blank immediately after the RPi booted, requiring a touch to wake. Added `consoleblank=0` to the kernel cmdline to disable Linux console blanking during the Plymouth-to-labwc transition, and changed the `wlr-randr` anti-blank loop to fire immediately instead of sleeping 60 seconds first.
 - **Queue Widget Ignores Plate-Clear Setting** ([#752](https://github.com/maziggy/bambuddy/issues/752)) — The "Clear Plate & Start Next" button on printer cards appeared even when "Require plate-clear confirmation" was disabled in Settings → Queue. The backend correctly auto-dispatched without waiting, but the frontend widget always showed the prompt. The widget now respects the setting and shows a passive queue link instead when plate-clear confirmation is disabled.
 - **Queue Widget Ignores Plate-Clear Setting** ([#752](https://github.com/maziggy/bambuddy/issues/752)) — The "Clear Plate & Start Next" button on printer cards appeared even when "Require plate-clear confirmation" was disabled in Settings → Queue. The backend correctly auto-dispatched without waiting, but the frontend widget always showed the prompt. The widget now respects the setting and shows a passive queue link instead when plate-clear confirmation is disabled.
-- **AMS Slot Changes Fail Until Reconnect** ([#887](https://github.com/maziggy/bambuddy/issues/887)) — After a keep-alive timeout, paho-mqtt auto-reconnects but the new session can be half-broken: the printer continues sending status updates but silently ignores commands. The developer mode probe detected this (no response, leaving `developer_mode` as `null`), but had no timeout or recovery — one unanswered probe permanently blocked retries. Added a 10-second probe timeout with one retry; after two consecutive unanswered probes, Bambuddy force-closes the socket to trigger a clean reconnect with a fresh session.
+- **AMS Slot Changes Fail Until Reconnect** ([#887](https://github.com/maziggy/bambuddy/issues/887)) — After a keep-alive timeout, paho-mqtt auto-reconnects but the new session can be half-broken: the printer continues sending status updates but silently ignores commands. The developer mode probe detected this (no response, leaving `developer_mode` as `null`), but had no timeout or recovery — one unanswered probe permanently blocked retries. Added a 10-second probe timeout with one retry; after two consecutive unanswered probes, Bambuddy force-closes the socket to trigger a clean reconnect with a fresh session. Additionally, the developer mode probe was firing on every auto-reconnect, which destabilized some firmware MQTT brokers (A1/P1 series) — causing a reconnect → probe → disconnect feedback loop. The probe result is now cached across reconnects and only runs once on the first connection, with a 5-second delay after connect to let the session stabilize.
 - **WebSocket Crash on Printers Without `fun` Field** ([#873](https://github.com/maziggy/bambuddy/issues/873)) — Connecting to printers that don't send the MQTT `fun` field (A1, P1 series, X1Plus firmware) caused a repeating `'str' object has no attribute 'get'` crash in the WebSocket handler, showing the printer as offline with missing AMS and SD card info. The developer mode probe introduced in 0.2.3b1 published an MQTT message inside `_update_state()` between overwriting `raw_data` with the full MQTT dict (where `vt_tray` is a raw dict) and restoring the previously normalized list — the `publish()` call released the GIL, letting the event loop read the un-normalized dict and iterate over string keys instead of spool dicts. Fixed by normalizing `vt_tray` dict→list in the MQTT data before assignment, and moving preserved field restoration before the probe. Added defensive normalization in `printer_state_to_dict` as a belt-and-suspenders guard.
 - **WebSocket Crash on Printers Without `fun` Field** ([#873](https://github.com/maziggy/bambuddy/issues/873)) — Connecting to printers that don't send the MQTT `fun` field (A1, P1 series, X1Plus firmware) caused a repeating `'str' object has no attribute 'get'` crash in the WebSocket handler, showing the printer as offline with missing AMS and SD card info. The developer mode probe introduced in 0.2.3b1 published an MQTT message inside `_update_state()` between overwriting `raw_data` with the full MQTT dict (where `vt_tray` is a raw dict) and restoring the previously normalized list — the `publish()` call released the GIL, letting the event loop read the un-normalized dict and iterate over string keys instead of spool dicts. Fixed by normalizing `vt_tray` dict→list in the MQTT data before assignment, and moving preserved field restoration before the probe. Added defensive normalization in `printer_state_to_dict` as a belt-and-suspenders guard.
 
 
 
 

+ 27 - 7
backend/app/services/bambu_mqtt.py

@@ -353,9 +353,11 @@ class BambuMQTTClient:
         # we probe by sending an ams_filament_setting and checking the response.
         # we probe by sending an ams_filament_setting and checking the response.
         # "mqtt message verify failed" → dev mode OFF, success → dev mode ON.
         # "mqtt message verify failed" → dev mode OFF, success → dev mode ON.
         self._dev_mode_probed: bool = False
         self._dev_mode_probed: bool = False
+        self._dev_mode_needs_probe: bool = False  # True after seeing a pushall without "fun"
         self._dev_mode_probe_seq: str | None = None
         self._dev_mode_probe_seq: str | None = None
         self._dev_mode_probe_time: float = 0.0  # monotonic timestamp when probe was sent
         self._dev_mode_probe_time: float = 0.0  # monotonic timestamp when probe was sent
         self._dev_mode_probe_failures: int = 0  # consecutive unanswered probes
         self._dev_mode_probe_failures: int = 0  # consecutive unanswered probes
+        self._connect_time: float = 0.0  # monotonic timestamp of last _on_connect
 
 
         # Set when check_staleness() force-closes the socket to trigger reconnect.
         # Set when check_staleness() force-closes the socket to trigger reconnect.
         # Prevents _on_disconnect from redundantly broadcasting state (already done).
         # Prevents _on_disconnect from redundantly broadcasting state (already done).
@@ -425,12 +427,18 @@ class BambuMQTTClient:
             self._stale_reconnecting = False  # Clear stale-reconnect flag on successful connect
             self._stale_reconnecting = False  # Clear stale-reconnect flag on successful connect
             # Reset per-connection warning state so warnings fire once per (re)connection
             # Reset per-connection warning state so warnings fire once per (re)connection
             self._ams_version_warned = set()
             self._ams_version_warned = set()
-            # Reset developer mode detection so we re-check on each connection
-            self.state.developer_mode = None
+            # Preserve cached developer_mode across auto-reconnects to avoid
+            # re-probing on every reconnect.  The probe (ams_filament_setting to
+            # ext slot) can destabilize some firmware MQTT brokers, causing a
+            # reconnect → probe → disconnect feedback loop (#887).  Only probe
+            # once when developer_mode is truly unknown (first connect).
+            # Reset probe tracking so stale timeout state doesn't carry over.
             self._dev_mode_probed = False
             self._dev_mode_probed = False
+            self._dev_mode_needs_probe = False
             self._dev_mode_probe_seq = None
             self._dev_mode_probe_seq = None
             self._dev_mode_probe_time = 0.0
             self._dev_mode_probe_time = 0.0
             self._dev_mode_probe_failures = 0
             self._dev_mode_probe_failures = 0
+            self._connect_time = time.monotonic()
             client.subscribe(self.topic_subscribe)
             client.subscribe(self.topic_subscribe)
             # Subscribe to request topic for ams_mapping capture (if supported by broker)
             # Subscribe to request topic for ams_mapping capture (if supported by broker)
             if self._request_topic_supported:
             if self._request_topic_supported:
@@ -2440,11 +2448,23 @@ class BambuMQTTClient:
                 self.state.developer_mode = (fun_int & 0x20000000) == 0
                 self.state.developer_mode = (fun_int & 0x20000000) == 0
             except (ValueError, TypeError):
             except (ValueError, TypeError):
                 pass
                 pass
-        elif self.state.developer_mode is None and not self._dev_mode_probed and len(data) > 30:
-            # No "fun" field in this full status message (A1/P1 series never send it).
-            # Only probe after a full pushall response (30+ keys) to avoid firing on
-            # partial incremental updates that arrive before the pushall with "fun".
-            self._probe_developer_mode()
+        elif self.state.developer_mode is None and not self._dev_mode_probed:
+            # No "fun" field — A1/P1 series never send it, so we need to probe.
+            # Two gates: (1) wait for a full pushall (30+ keys) so we don't probe
+            # before a pushall that might contain "fun" arrives, and (2) delay 5s
+            # after connect to let the MQTT session stabilize — probing too early
+            # can destabilize some firmware MQTT brokers (#887).
+            if not self._dev_mode_needs_probe and len(data) > 30:
+                # First full status without "fun" — mark that probe is needed
+                self._dev_mode_needs_probe = True
+            if self._dev_mode_needs_probe and time.monotonic() - self._connect_time >= 5.0:
+                self._probe_developer_mode()
+            elif self._dev_mode_needs_probe:
+                logger.debug(
+                    "[%s] Deferring developer mode probe (%.1fs since connect, need 5s)",
+                    self.serial_number,
+                    time.monotonic() - self._connect_time,
+                )
         elif self._dev_mode_probed and self._dev_mode_probe_seq is not None:
         elif self._dev_mode_probed and self._dev_mode_probe_seq is not None:
             # Probe was sent but no response yet — check for timeout.
             # Probe was sent but no response yet — check for timeout.
             # A half-broken MQTT session (e.g. after keep-alive timeout reconnect)
             # A half-broken MQTT session (e.g. after keep-alive timeout reconnect)

+ 83 - 3
backend/tests/unit/services/test_bambu_mqtt.py

@@ -2798,6 +2798,7 @@ class TestDeveloperModeProbeTimeout:
 
 
     @pytest.fixture
     @pytest.fixture
     def mqtt_client(self):
     def mqtt_client(self):
+        import time
         from unittest.mock import MagicMock
         from unittest.mock import MagicMock
 
 
         from backend.app.services.bambu_mqtt import BambuMQTTClient
         from backend.app.services.bambu_mqtt import BambuMQTTClient
@@ -2812,6 +2813,8 @@ class TestDeveloperModeProbeTimeout:
         mock_paho = MagicMock()
         mock_paho = MagicMock()
         mock_paho.socket.return_value = MagicMock()
         mock_paho.socket.return_value = MagicMock()
         client._client = mock_paho
         client._client = mock_paho
+        # Set connect time in the past so the 5s probe delay is satisfied
+        client._connect_time = time.monotonic() - 10.0
         return client
         return client
 
 
     def _make_pushall_data(self):
     def _make_pushall_data(self):
@@ -2908,8 +2911,8 @@ class TestDeveloperModeProbeTimeout:
         mqtt_client._update_state(data)
         mqtt_client._update_state(data)
         assert mqtt_client._dev_mode_probe_failures == 0
         assert mqtt_client._dev_mode_probe_failures == 0
 
 
-    def test_on_connect_resets_probe_state(self, mqtt_client):
-        """_on_connect resets all probe-related state for fresh detection."""
+    def test_on_connect_resets_probe_state_but_preserves_developer_mode(self, mqtt_client):
+        """_on_connect resets probe tracking but preserves cached developer_mode."""
         import time
         import time
 
 
         mqtt_client._dev_mode_probed = True
         mqtt_client._dev_mode_probed = True
@@ -2922,11 +2925,88 @@ class TestDeveloperModeProbeTimeout:
         mqtt_client._client.subscribe.return_value = (0, 1)
         mqtt_client._client.subscribe.return_value = (0, 1)
         mqtt_client._on_connect(mqtt_client._client, None, None, 0)
         mqtt_client._on_connect(mqtt_client._client, None, None, 0)
 
 
-        assert mqtt_client.state.developer_mode is None
+        # developer_mode is preserved across reconnects (#887)
+        assert mqtt_client.state.developer_mode is True
         assert mqtt_client._dev_mode_probed is False
         assert mqtt_client._dev_mode_probed is False
         assert mqtt_client._dev_mode_probe_seq is None
         assert mqtt_client._dev_mode_probe_seq is None
         assert mqtt_client._dev_mode_probe_time == 0.0
         assert mqtt_client._dev_mode_probe_time == 0.0
         assert mqtt_client._dev_mode_probe_failures == 0
         assert mqtt_client._dev_mode_probe_failures == 0
+        assert mqtt_client._connect_time > 0
+
+    def test_probe_deferred_when_connect_too_recent(self, mqtt_client):
+        """Probe is deferred if less than 5s have passed since _on_connect."""
+        import time
+
+        data = self._make_pushall_data()
+
+        # Set connect time to 1 second ago — too recent for probe
+        mqtt_client._connect_time = time.monotonic() - 1.0
+
+        mqtt_client._update_state(data)
+        # Pushall seen, so needs_probe is set, but probe NOT fired yet
+        assert mqtt_client._dev_mode_needs_probe is True
+        assert mqtt_client._dev_mode_probed is False
+        assert mqtt_client._dev_mode_probe_seq is None
+
+    def test_probe_fires_after_delay(self, mqtt_client):
+        """Probe fires once 5s have passed since _on_connect."""
+        import time
+
+        data = self._make_pushall_data()
+
+        # Set connect time to 6 seconds ago — delay satisfied
+        mqtt_client._connect_time = time.monotonic() - 6.0
+
+        mqtt_client._update_state(data)
+        # Probe should have fired
+        assert mqtt_client._dev_mode_needs_probe is True
+        assert mqtt_client._dev_mode_probed is True
+        assert mqtt_client._dev_mode_probe_seq is not None
+
+    def test_probe_fires_on_incremental_after_delay(self, mqtt_client):
+        """After seeing a pushall within 5s, probe fires on later incremental message."""
+        import time
+
+        pushall_data = self._make_pushall_data()
+        incremental_data = {"gcode_state": "IDLE", "mc_percent": 0}  # < 30 keys
+
+        # Pushall arrives 1s after connect — too early for probe
+        mqtt_client._connect_time = time.monotonic() - 1.0
+        mqtt_client._update_state(pushall_data)
+        assert mqtt_client._dev_mode_needs_probe is True
+        assert mqtt_client._dev_mode_probed is False
+
+        # 5s later, an incremental update arrives — probe fires now
+        mqtt_client._connect_time = time.monotonic() - 6.0
+        mqtt_client._update_state(incremental_data)
+        assert mqtt_client._dev_mode_probed is True
+        assert mqtt_client._dev_mode_probe_seq is not None
+
+    def test_no_reprobe_when_developer_mode_cached(self, mqtt_client):
+        """Auto-reconnect preserves developer_mode, skipping reprobe."""
+        import time
+
+        data = self._make_pushall_data()
+
+        # Simulate known developer_mode from previous connection
+        mqtt_client.state.developer_mode = True
+        mqtt_client._connect_time = time.monotonic() - 10.0
+
+        mqtt_client._update_state(data)
+        # Should NOT probe — developer_mode is already known
+        assert mqtt_client._dev_mode_needs_probe is False
+        assert mqtt_client._dev_mode_probed is False
+        assert mqtt_client._dev_mode_probe_seq is None
+        assert mqtt_client.state.developer_mode is True
+
+    def test_on_connect_resets_needs_probe(self, mqtt_client):
+        """_on_connect resets _dev_mode_needs_probe for a clean start."""
+        mqtt_client._dev_mode_needs_probe = True
+
+        mqtt_client._client.subscribe.return_value = (0, 1)
+        mqtt_client._on_connect(mqtt_client._client, None, None, 0)
+
+        assert mqtt_client._dev_mode_needs_probe is False
 
 
 
 
 class TestVtTrayNormalization:
 class TestVtTrayNormalization: