
Fix stale MQTT connection not triggering reconnect (#813)

  Two issues prevented recovery from stale printer MQTT connections:

  1. The spurious-disconnect filter (30s window) could suppress real
     keepalive-timeout disconnects (which fire at ~22.5s with keepalive=15s).
     Now checks rc.is_failure so error disconnects are never suppressed,
     and tightens the window from 30s to 10s.

  2. Stale detection (60s no messages) set state.connected=False but left
     the TCP socket open, so paho kept publishing into the void. Now
     force-closes the socket so paho's loop thread detects the break and
     auto-reconnects.
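The first fix can be sketched as a pure decision function. This is a minimal illustration, not the code from the commit: `should_ignore_disconnect` and its parameters are hypothetical names, and the 22.5s figure is taken from the commit message (it matches 1.5 times the 15s keepalive).

```python
SPURIOUS_WINDOW_S = 10.0  # tightened from 30.0 by this commit
KEEPALIVE_S = 15.0
# Assumption: timeout disconnects fire at about 1.5x keepalive,
# which gives the ~22.5s quoted in the commit message.
KEEPALIVE_TIMEOUT_S = 1.5 * KEEPALIVE_S


def should_ignore_disconnect(
    is_failure: bool,
    seconds_since_last_message: float,
    has_received_message: bool,
    window_s: float = SPURIOUS_WINDOW_S,
) -> bool:
    """Return True if a disconnect callback should be treated as spurious.

    Hypothetical helper that mirrors the condition in the patch:
    error disconnects are never suppressed, and clean ones are only
    suppressed if a message arrived within the window.
    """
    if is_failure:
        return False
    return has_received_message and seconds_since_last_message < window_s


# A disconnect arriving 22.5s after the last message fell inside the old
# 30s window (and the old code did not look at rc at all).
old_suppressed = should_ignore_disconnect(False, KEEPALIVE_TIMEOUT_S, True, window_s=30.0)
# With the 10s window the same event is no longer suppressed.
new_suppressed = should_ignore_disconnect(False, KEEPALIVE_TIMEOUT_S, True)
```

With the 10s window, a disconnect at 22.5s is processed even if its reason code is not flagged as a failure, so the two changes guard against the same problem independently.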
maziggy 2 months ago
parent
commit
6ac375a700
2 changed files with 18 additions and 3 deletions
  1. CHANGELOG.md (+1 -0)
  2. backend/app/services/bambu_mqtt.py (+17 -3)

+ 1 - 0
CHANGELOG.md

@@ -47,6 +47,7 @@ All notable changes to Bambuddy will be documented in this file.
 - **SpoolBuddy Daemon Reports Stale Version** — The SpoolBuddy daemon maintained its own hardcoded `__version__` that was never bumped to `0.2.3b1`, causing the update check to incorrectly show an update from `0.2.2b1` to the latest release. Fixed by reading the version at import time from the backend's `APP_VERSION` in `backend/app/core/config.py` — the single source of truth — so the daemon version is always in sync.
 - **SpoolBuddy Update Columns Missing from Database** — The OTA update feature added `update_status` and `update_message` to the device model but was missing the database migration, causing "no such column" errors on existing installations.
 - **Queue Print Command Not Reaching Printer** ([#778](https://github.com/maziggy/bambuddy/issues/778)) — When a queue item targeted a specific printer and the scheduler's power-on-wait loop triggered, each reconnection attempt created a new MQTT client that re-attempted subscribing to the request topic. On printers whose broker rejects this subscription (e.g. A1), this caused repeated connect/disconnect cycles for up to 170 seconds, leaving the MQTT connection in a fragile state where the print command could silently fail to reach the printer. Fixed by caching request topic support state per serial number at the class level, so new client instances skip the subscription immediately instead of rediscovering the rejection. Reported by @RubenKremer.
+- **Stale MQTT Connection Not Recovering** ([#813](https://github.com/maziggy/bambuddy/issues/813)) — When a printer's MQTT connection went stale (no messages for 60+ seconds), Bambuddy marked it as disconnected but did not force the underlying TCP socket closed, so paho-mqtt's auto-reconnect never triggered and print commands were silently published into a dead connection. Additionally, the spurious-disconnect filter (designed to ignore false disconnect callbacks from paho) used a 30-second window that could suppress real keepalive-timeout disconnects (which fire at ~22.5s with keepalive=15s). Fixed by: (1) never suppressing error disconnects (`rc.is_failure`) regardless of timing, and tightening the spurious filter window from 30s to 10s; (2) force-closing the socket on stale detection so paho's loop thread detects the break and auto-reconnects. Reported by @inkdawgz.
 - **AMS Slot Search Shows Unrelated Profiles** ([#681](https://github.com/maziggy/bambuddy/issues/681)) — Searching for a non-existent filament profile in the AMS slot configuration showed unrelated profiles instead of an empty result. The saved preset bypassed the search filter entirely, so stale mappings (e.g. a slot previously configured with "Bambu PLA Matte" that now holds a Silk spool) would always appear regardless of the search query. The saved preset now only bypasses the printer model filter, not the search filter. Reported by @RosdasHH.
 - **Virtual Printer FTP Routed to Wrong VP** ([#735](https://github.com/maziggy/bambuddy/issues/735)) — When running multiple virtual printers with different access codes on separate bind IPs, FTP connections were routed to the wrong VP. Root cause: the iptables `REDIRECT` rule rewrites the destination IP to the incoming interface's primary address, so all FTP traffic went to the first VP regardless of the intended target. Fix: FTP server now binds directly to port 990 (standard implicit FTPS), eliminating the need for iptables redirect. Requires `CAP_NET_BIND_SERVICE` (already set in the systemd service and Docker image). Also removed a global `set_exception_handler()` in the MQTT server that caused spurious error messages when running multiple VPs. See `docs/migration-vp-ftp-port.md` for migration steps. Reported by @VREmma.
 - **X1C Virtual Printer Not Accepting Sends** ([#735](https://github.com/maziggy/bambuddy/issues/735)) — X1C (and X1) virtual printers were advertised with legacy SSDP model codes (`3DPrinter-X1-Carbon` / `3DPrinter-X1`) that BambuStudio doesn't recognize, causing "incompatible printer preset" when sending. Fixed to use the correct codes (`BL-P001` / `BL-P002`). Also fixed proxy mode auto-inherit storing the printer's display name (e.g. `X1C`) instead of the SSDP code. Existing VPs are automatically migrated on startup. Reported by @RosdasHH.
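The changelog entry for #813 relies on one property of sockets: once a socket is closed locally, any further I/O on it fails immediately instead of silently queueing data. The sketch below shows that property using only the standard library. It does not use paho-mqtt, and `force_close` and `io_fails_after_close` are hypothetical helper names. How paho's loop thread reacts to the failure is as described in the commit message and is not demonstrated here.

```python
import socket


def force_close(sock) -> None:
    """Best-effort close, similar in spirit to the try/except in the patch."""
    try:
        if sock is not None:
            sock.close()
    except Exception:
        pass  # Closing is best-effort; errors here are not actionable


def io_fails_after_close() -> bool:
    """Return True if sending on a locally closed socket raises OSError."""
    local, remote = socket.socketpair()
    try:
        force_close(local)
        try:
            local.send(b"ping")
        except OSError:
            # The closed socket no longer has a valid file descriptor,
            # so the send fails instead of going "into the void".
            return True
        return False
    finally:
        remote.close()
```

Calling `io_fails_after_close()` should return `True` on a standard CPython build, which is the kind of hard failure a network loop can detect and act on.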

+ 17 - 3
backend/app/services/bambu_mqtt.py

@@ -366,11 +366,22 @@ class BambuMQTTClient:
         """Check staleness and update connected state if stale. Returns True if connected."""
         if self.state.connected and self.is_stale():
             logger.warning(
-                f"[{self.serial_number}] Connection stale - no message for {time.time() - self._last_message_time:.1f}s"
+                f"[{self.serial_number}] Connection stale - no message for {time.time() - self._last_message_time:.1f}s, forcing reconnect"
             )
             self.state.connected = False
             if self.on_state_change:
                 self.on_state_change(self.state)
+            # Force-close the underlying socket so paho's loop thread detects
+            # the broken connection and triggers auto-reconnect.  We don't call
+            # client.disconnect() because that's a clean disconnect and paho
+            # would NOT auto-reconnect afterwards.
+            if self._client:
+                try:
+                    sock = self._client.socket()
+                    if sock:
+                        sock.close()
+                except Exception:
+                    pass  # Best-effort; paho loop will reconnect on next iteration
         return self.state.connected
 
     def _on_connect(self, client, userdata, flags, rc, properties=None):
@@ -434,9 +445,12 @@ class BambuMQTTClient:
 
     def _on_disconnect(self, client, userdata, disconnect_flags=None, rc=None, properties=None):
         # Ignore spurious disconnect callbacks if we've received a message recently
-        # Paho-mqtt sometimes fires disconnect callbacks while the connection is still active
+        # Paho-mqtt sometimes fires disconnect callbacks while the connection is still active.
+        # BUT: never suppress error disconnects (keepalive timeout, connection lost, etc.)
+        # — only suppress when rc indicates a clean/normal disconnect.
+        is_error_disconnect = rc is not None and hasattr(rc, "is_failure") and rc.is_failure
         time_since_last_message = time.time() - self._last_message_time
-        if time_since_last_message < 30.0 and self._last_message_time > 0:
+        if not is_error_disconnect and time_since_last_message < 10.0 and self._last_message_time > 0:
             logger.debug(
                 f"[{self.serial_number}] Ignoring spurious disconnect (last message {time_since_last_message:.1f}s ago)"
             )