
fix(mqtt): self-heal half-broken session on dispatch timeout (#936)

  When a project_file command went unacknowledged for 15s, Bambuddy
  previously logged "printer may need restart" and left the broken MQTT
  session in place, so the user had to power-cycle the printer. The
  existing half-broken-session recovery only ran via the developer-mode
  probe path, which skips printers with a known developer_mode value.

  Extract the existing force-reconnect logic into a reusable helper and
  call it from _verify_print_response on dispatch timeout. The next
  dispatch attempt then lands on a fresh MQTT session without a reboot.
maziggy 1 month ago
commit 4a2b9bc5d4
3 changed files with 27 additions and 18 deletions
  CHANGELOG.md (+1 -0)
  backend/app/services/background_dispatch.py (+8 -0)
  backend/app/services/bambu_mqtt.py (+18 -18)

CHANGELOG.md (+1 -0)

@@ -19,6 +19,7 @@ All notable changes to Bambuddy will be documented in this file.
 - **Plate-Clear Confirmation Disabled by Default** — New installs ship with Settings → Workflow → "Require Plate-Clear Confirmation" off. Multiple new users reported queued prints appearing to not start because the prompt was waiting for acknowledgement; opt in from Workflow if you want the confirmation gate.
 
 ### Fixed
+- **Queued Prints Require Printer Reboot to Start** ([#936](https://github.com/maziggy/bambuddy/issues/936)) — On some printers, a queued print would be uploaded via FTP and the `project_file` MQTT command would be sent, but the printer never transitioned out of `FINISH`/`IDLE` and required a power cycle to unstick — after which it often started a previously cancelled print rather than the intended one. Root cause is a half-broken MQTT session (same shape as #887): the printer keeps publishing telemetry so Bambuddy reports it as connected, but our publishes on the command topic never reach the firmware. Existing recovery only triggered via the developer-mode probe path, which skips printers that already have a known `developer_mode` value. The print-dispatch verifier now treats an unacknowledged `project_file` (state unchanged after 15 s) as the same "commands not reaching printer" signal and forces a fresh MQTT session so the next dispatch can land without a printer reboot. The existing dev-mode probe path is refactored to share the same helper.
 - **Clear Plate Confirmation Bypassed on Power Cycle** ([#961](https://github.com/maziggy/bambuddy/issues/961)) — With Auto Off enabled and another job queued, the smart plug would cut power when a print finished and immediately re-power when the scheduler saw the queue, at which point the printer booted fresh into `IDLE` and the next job auto-dispatched without the "Clear Plate & Start Next" confirmation. Root cause: the plate-cleared gate lived only in the in-memory `PrinterManager._plate_cleared` set, and the scheduler's idle check treated `IDLE` as always-idle regardless of whether a previous finish had been acknowledged — so the gate was lost across both Bambuddy restarts and the IDLE-on-boot state transition. The gate is now an `awaiting_plate_clear` column on the `printers` table, set by `on_print_complete` when a print finishes or fails, cleared by the `/printers/{id}/clear-plate` endpoint and by the scheduler when it dispatches the next job, and rehydrated from the DB into `PrinterManager` on startup. `_is_printer_idle` now short-circuits to not-idle whenever `require_plate_clear` is on and the printer is awaiting ack, regardless of the currently reported state — so the prompt survives Auto Off cycles, Bambuddy restarts, and the printer booting back into `IDLE`. The clear-plate endpoint no longer requires the printer to currently report `FINISH`/`FAILED` (it accepts the ack whenever the awaiting flag is set), and the Printers page widget prompts based on the flag rather than the reported state. Thanks to @miaopas for reporting.
 - **Insecure Temp File Creation in Backup Export** — The manual backup download endpoint used `tempfile.mktemp()`, which is vulnerable to a symlink race condition (CWE-377). Replaced with `tempfile.mkstemp()` which atomically creates the file, eliminating the TOCTOU window.
 - **Spoolman Iframe Blocked After 0.2.3b4 Security Headers** — The Spoolman page (Inventory → Spoolman iframe) failed to load when Spoolman was served from the same host as Bambuddy via a reverse proxy. The security-headers middleware added in 0.2.3b4 set `X-Frame-Options: DENY` on every response, which blocked even same-origin iframing. Relaxed to `SAMEORIGIN` so Spoolman (and any other same-origin tool behind the same reverse proxy) can be embedded again, while still preventing cross-origin clickjacking.
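
The #961 entry above describes a persisted gate that overrides whatever state the printer reports. A minimal sketch of that gating logic follows, assuming a plain in-memory object in place of the `printers` table row. `PrinterRow`, `IDLE_STATES`, `on_print_complete`, `clear_plate`, `dispatch_next` and `is_printer_idle` are hypothetical stand-ins for the real hook, endpoint and scheduler check, and the set of states treated as idle is an assumption.

```python
from dataclasses import dataclass

# Assumption: which reported states the scheduler treats as idle.
IDLE_STATES = {"IDLE", "FINISH", "FAILED"}


@dataclass
class PrinterRow:
    """Hypothetical model of one row of the `printers` table."""
    id: int
    state: str
    awaiting_plate_clear: bool = False


def on_print_complete(printer: PrinterRow) -> None:
    # Persisted flag: survives Bambuddy restarts and printer power cycles.
    printer.awaiting_plate_clear = True


def clear_plate(printer: PrinterRow) -> bool:
    # Accept the ack whenever the flag is set, regardless of reported state.
    if not printer.awaiting_plate_clear:
        return False
    printer.awaiting_plate_clear = False
    return True


def is_printer_idle(printer: PrinterRow, require_plate_clear: bool) -> bool:
    # Short-circuit: an unacknowledged finish blocks dispatch even if the
    # printer has rebooted and now reports IDLE.
    if require_plate_clear and printer.awaiting_plate_clear:
        return False
    return printer.state in IDLE_STATES


def dispatch_next(printer: PrinterRow, require_plate_clear: bool) -> bool:
    # Scheduler side: only dispatch when idle, and clear the gate on dispatch.
    if not is_printer_idle(printer, require_plate_clear):
        return False
    printer.awaiting_plate_clear = False
    return True
```

Simulating the Auto Off cycle (finish, then the printer boots back into `IDLE`) shows the gate holding until `clear_plate` is called.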

backend/app/services/background_dispatch.py (+8 -0)

@@ -890,6 +890,14 @@ class BackgroundDispatchService:
             timeout,
             pre_state,
         )
+        # Strong signal the MQTT session is half-broken (#887, #936): telemetry
+        # still arrives but our publishes don't reach the printer. Force a fresh
+        # session so the next dispatch can land without a power cycle.
+        client = printer_manager.get_client(printer_id)
+        if client:
+            client.force_reconnect_stale_session(
+                f"print command unacknowledged after {timeout:.0f}s (state still {pre_state})"
+            )
 
     @staticmethod
     async def _cleanup_sd_card_file(
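
A minimal sketch of the verify-then-self-heal flow, assuming a simple polling loop. `verify_print_response`, `get_state`, `poll` and `FakeClient` are hypothetical names for illustration; the body of the real `_verify_print_response` is not shown in this diff and may wait for state changes differently. Only the `force_reconnect_stale_session(reason)` call and its message format come from the change above.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FakeClient:
    """Hypothetical stand-in for BambuMQTTClient; only records reconnect calls."""
    reconnect_reasons: List[str] = field(default_factory=list)

    def force_reconnect_stale_session(self, reason: str) -> None:
        # The real helper closes the paho socket; here we just record the reason.
        self.reconnect_reasons.append(reason)


async def verify_print_response(
    get_state: Callable[[], str],
    client: Optional[FakeClient],
    pre_state: str,
    timeout: float,
    poll: float = 0.05,
) -> bool:
    """Return True if the printer leaves pre_state within `timeout` seconds.

    On timeout, treat the silence as a half-broken MQTT session and ask the
    client for a fresh session, mirroring the behaviour added in this commit.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        if get_state() != pre_state:
            return True  # the printer reacted to the command
        await asyncio.sleep(poll)
    if client:
        client.force_reconnect_stale_session(
            f"print command unacknowledged after {timeout:.0f}s (state still {pre_state})"
        )
    return False
```

With `get_state=lambda: "IDLE"`, `pre_state="IDLE"` and a short timeout, the call returns `False` and the fake client records one reconnect reason; if the state changes before the deadline it returns `True` and no reconnect is requested.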

backend/app/services/bambu_mqtt.py (+18 -18)

@@ -422,6 +422,23 @@ class BambuMQTTClient:
                     pass  # Best-effort; paho loop will reconnect on next iteration
         return self.state.connected
 
+    def force_reconnect_stale_session(self, reason: str) -> None:
+        # Heals the #887 half-broken session: telemetry keeps arriving but our
+        # publishes no longer reach the printer. Closing the socket makes paho
+        # drop and re-establish with a fresh session.
+        logger.warning("[%s] Forcing MQTT reconnect: %s", self.serial_number, reason)
+        self._stale_reconnecting = True
+        self.state.connected = False
+        if self.on_state_change:
+            self.on_state_change(self.state)
+        if self._client:
+            try:
+                sock = self._client.socket()
+                if sock:
+                    sock.close()
+            except Exception:
+                pass
+
     def _on_connect(self, client, userdata, flags, rc, properties=None):
         if rc == 0:
             self.state.connected = True
@@ -2527,24 +2544,7 @@ class BambuMQTTClient:
                 )
                 self._dev_mode_probe_seq = None
                 if self._dev_mode_probe_failures >= 2:
-                    # Two consecutive unanswered probes — the MQTT session is likely
-                    # half-broken (printer sends status but ignores commands).
-                    # Force-close the socket so paho reconnects with a fresh session.
-                    logger.warning(
-                        "[%s] MQTT session appears broken (commands not reaching printer), forcing reconnect",
-                        self.serial_number,
-                    )
-                    self._stale_reconnecting = True
-                    self.state.connected = False
-                    if self.on_state_change:
-                        self.on_state_change(self.state)
-                    if self._client:
-                        try:
-                            sock = self._client.socket()
-                            if sock:
-                                sock.close()
-                        except Exception:
-                            pass
+                    self.force_reconnect_stale_session("developer mode probe unanswered 2×")
                 else:
                     # Allow retry on next full status message
                     self._dev_mode_probed = False
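
After the refactor, the socket-closing sequence lives in one helper that both callers share. The sketch below shows that shape with paho replaced by fakes so it runs without a broker. `FakePaho`, `FakeSocket`, `SessionHealer` and `on_probe_timeout` are hypothetical. The point where the failure counter is incremented is assumed, since it falls outside this hunk, and `on_state_change` receives a bool rather than the real `self.state` object.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


class FakeSocket:
    """Hypothetical socket stand-in; records whether close() was called."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class FakePaho:
    """Hypothetical stand-in for the paho client; only socket() is modelled."""

    def __init__(self) -> None:
        self._sock = FakeSocket()

    def socket(self) -> FakeSocket:
        return self._sock


class SessionHealer:
    """Hypothetical, trimmed-down client holding the shared reconnect helper."""

    def __init__(self, client: Optional[FakePaho],
                 on_state_change: Optional[Callable[[bool], None]] = None) -> None:
        self._client = client
        self.on_state_change = on_state_change
        self.connected = True
        self._stale_reconnecting = False
        self._dev_mode_probe_failures = 0
        self._dev_mode_probed = True

    def force_reconnect_stale_session(self, reason: str) -> None:
        # Same order as the helper in the diff: flag, mark disconnected,
        # notify listeners, then close the socket so the loop reconnects.
        logger.warning("Forcing MQTT reconnect: %s", reason)
        self._stale_reconnecting = True
        self.connected = False
        if self.on_state_change:
            self.on_state_change(self.connected)
        if self._client:
            try:
                sock = self._client.socket()
                if sock:
                    sock.close()
            except Exception:
                pass  # best-effort, as in the original helper

    def on_probe_timeout(self) -> None:
        # Caller 1: the developer-mode probe path (threshold of 2 from the diff).
        # Caller 2, the dispatch verifier, calls the helper directly.
        self._dev_mode_probe_failures += 1
        if self._dev_mode_probe_failures >= 2:
            self.force_reconnect_stale_session("developer mode probe unanswered 2×")
        else:
            self._dev_mode_probed = False  # allow retry on next status message
```

One unanswered probe only re-arms the probe; the second one runs the shared helper, which is the same call the dispatch verifier now makes on timeout.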