Browse Source

Fix virtual printer FTP transfer failure with connection reset (#58)

Large 3MF uploads intermittently failed with [Errno 104] Connection
reset by peer while the 16-byte verify_job always succeeded. The
_handle_data_connection callback returned immediately, letting the
asyncio task complete while cmd_STOR was still reading from the data
connection. The passive port listener also stayed open during transfers.

- Keep _handle_data_connection alive via _transfer_done event so the
  asyncio task holds strong references throughout the transfer
- Close passive port listener after accepting the data connection
- Reject duplicate data connections with a warning log
- Add drain timeout (5s) to MQTT status pushes to prevent blocking
  when the slicer is busy with FTP upload
- Improve error logging with bytes received and exception type
maziggy 3 months ago
parent
commit
3f7d2bf619

+ 1 - 0
CHANGELOG.md

@@ -17,6 +17,7 @@ All notable changes to Bambuddy will be documented in this file.
 - **Japanese Locale Complete Overhaul** — Restructured `ja.ts` from a divergent format (different key structure, 12 structural conflicts, 1,366 missing translations) to match the English/German locale structure exactly. Translated all 2,083 keys into Japanese, achieving full parity with EN/DE. Zero structural divergences, zero missing keys.
 
 ### Fixed
+- **Virtual Printer FTP Transfer Fails With Connection Reset** ([#58](https://github.com/maziggy/bambuddy/issues/58)) — Large 3MF uploads to the virtual printer intermittently failed with `[Errno 104] Connection reset by peer` while the small verify_job always succeeded. The `_handle_data_connection` callback returned immediately, allowing the asyncio server-handler task to complete while the data connection was still in active use. The passive port listener also stayed open during transfers, risking duplicate data connections. Fixed by keeping the callback alive until the transfer completes (`_transfer_done` event), closing the passive listener after accepting the connection, and rejecting duplicate data connections. Also added a 5-second drain timeout to MQTT status pushes to prevent blocking when the slicer is busy uploading.
 - **Camera Stop 401 When Auth Enabled** — Camera stop requests (`sendBeacon`) failed with 401 Unauthorized when authentication was enabled because `sendBeacon` cannot send auth headers. Replaced with `fetch` + `keepalive: true` which supports Authorization headers while remaining reliable during page unload.
 - **Spoolman Creates Duplicate Spools on Startup** ([#295](https://github.com/maziggy/bambuddy/pull/295)) — Each AMS tray independently fetched all spools from Spoolman, causing redundant API calls and duplicate spool creation with large databases (300+ spools). Now fetches spools once and reuses cached data across all tray operations. Added retry logic (3 attempts, 500ms delay) with connection recreation for transient network errors.
 - **Filament Usage Charts Inflated by Quantity Multiplier** ([#229](https://github.com/maziggy/bambuddy/issues/229)) — Daily, weekly, and filament-type charts were multiplying `filament_used_grams` by print quantity, even though the value already represents the total for the entire job. A 26-object print using 126g was counted as 3,276g. Removed the erroneous multiplier from three aggregations in `FilamentTrends.tsx`.

+ 52 - 8
backend/app/services/virtual_printer/ftp_server.py

@@ -55,6 +55,7 @@ class FTPSession:
         self._data_reader: asyncio.StreamReader | None = None
         self._data_writer: asyncio.StreamWriter | None = None
         self._data_connected = asyncio.Event()
+        self._transfer_done = asyncio.Event()
 
         peername = writer.get_extra_info("peername")
         self.remote_ip = peername[0] if peername else "unknown"
@@ -118,6 +119,9 @@ class FTPSession:
 
     async def _cleanup(self) -> None:
         """Clean up session resources."""
+        # Release any waiting data connection callback
+        self._transfer_done.set()
+
         if self.data_server:
             self.data_server.close()
             try:
@@ -233,10 +237,11 @@ class FTPSession:
         # Close any existing data connection/server
         await self._close_data_connection()
 
-        # Reset connection state
+        # Reset connection state for the new transfer
         self._data_connected.clear()
         self._data_reader = None
         self._data_writer = None
+        self._transfer_done = asyncio.Event()
 
         if await self._bind_passive_port():
             # EPSV response format: 229 Entering Extended Passive Mode (|||port|)
@@ -255,10 +260,11 @@ class FTPSession:
         # Close any existing data connection/server
         await self._close_data_connection()
 
-        # Reset connection state
+        # Reset connection state for the new transfer
         self._data_connected.clear()
         self._data_reader = None
         self._data_writer = None
+        self._transfer_done = asyncio.Event()
 
         if await self._bind_passive_port():
             # Determine the IP to advertise in PASV response
@@ -288,7 +294,24 @@ class FTPSession:
             await self.send(425, "Cannot open data connection")
 
     async def _handle_data_connection(self, reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
-        """Handle incoming data connection (used by PASV)."""
+        """Handle incoming data connection (used by PASV/EPSV).
+
+        This callback stays alive until the transfer completes to ensure the
+        asyncio task holds strong references to the reader/writer throughout
+        the data transfer.  If the callback returned immediately, the task
+        would complete and the StreamReaderProtocol could release its strong
+        reader reference, potentially destabilising the connection.
+        """
+        # Reject duplicate connections — only one data connection per transfer
+        if self._data_reader is not None:
+            logger.warning("FTP rejecting duplicate data connection from %s", self.remote_ip)
+            try:
+                writer.close()
+                await writer.wait_closed()
+            except OSError:
+                pass
+            return
+
         # Log TLS details for debugging
         ssl_obj = writer.get_extra_info("ssl_object")
         if ssl_obj:
@@ -302,13 +325,26 @@ class FTPSession:
         logger.info("FTP data connection established from %s", self.remote_ip)
         self._data_reader = reader
         self._data_writer = writer
+
+        # Stop accepting further connections on the passive port
+        if self.data_server:
+            self.data_server.close()
+
         self._data_connected.set()
-        # Don't close - let the transfer command handle it
+
+        # Keep this callback alive until the transfer command (STOR/RETR)
+        # finishes. This ensures the asyncio server-handler task holds strong
+        # references to reader/writer for the entire transfer lifetime.
+        await self._transfer_done.wait()
 
     async def _close_data_connection(self) -> None:
         """Close the data connection and server."""
         had_connection = self._data_writer is not None or self.data_server is not None
 
+        # Signal the _handle_data_connection callback to return, allowing
+        # its asyncio task to complete cleanly.
+        self._transfer_done.set()
+
         if self._data_writer:
             try:
                 self._data_writer.close()
@@ -336,7 +372,7 @@ class FTPSession:
             await self.send(530, "Not logged in")
             return
 
-        if not self.data_server:
+        if not self.data_server and not self._data_connected.is_set():
             await self.send(425, "Use PASV first")
             return
 
@@ -363,20 +399,28 @@ class FTPSession:
 
         # Receive data
         data_content: list[bytes] = []
+        total_received = 0
         try:
             while True:
                 chunk = await asyncio.wait_for(self._data_reader.read(65536), timeout=60)
                 if not chunk:
                     break
                 data_content.append(chunk)
-                logger.debug("FTP received chunk: %s bytes", len(chunk))
+                total_received += len(chunk)
+                logger.debug("FTP received chunk: %s bytes (total: %s)", len(chunk), total_received)
         except TimeoutError:
-            logger.error("FTP data transfer timeout")
+            logger.error("FTP data transfer timeout after %s bytes for %s", total_received, filename)
             await self.send(426, "Transfer timeout")
             await self._close_data_connection()
             return
         except Exception as e:
-            logger.error("FTP data transfer error: %s", e)
+            logger.error(
+                "FTP data transfer error after %s bytes for %s: %s(%s)",
+                total_received,
+                filename,
+                type(e).__name__,
+                e,
+            )
             await self.send(426, f"Transfer failed: {e}")
             await self._close_data_connection()
             return

+ 6 - 1
backend/app/services/virtual_printer/mqtt_server.py

@@ -696,7 +696,12 @@ class SimpleMQTTServer:
         packet += message_bytes
 
         writer.write(packet)
-        await writer.drain()
+        # Timeout the drain to prevent blocking the event loop if the
+        # MQTT client stops reading (e.g. slicer busy with FTP upload).
+        try:
+            await asyncio.wait_for(writer.drain(), timeout=5)
+        except TimeoutError:
+            logger.debug("MQTT drain timeout for %s — client may be busy", topic)
 
     async def _send_print_response(self, writer: asyncio.StreamWriter, sequence_id: str, filename: str) -> None:
         """Send project_file acknowledgment matching real Bambu printer behavior."""