Browse Source

fix(archive): truncation of large 3MF uploads on sendfile short-return (#1032)

  On bare-metal Raspberry Pi OS bookworm / armv7l / Python 3.11, 3MF
  files larger than a few megabytes arrived complete via the
  virtual-printer FTP server but the copy into data/archives/ was
  silently truncated. The archive row was still written, the printer
  card looked fine, and the problem only surfaced later when opening
  the archive — the subsequent zipfile.ZipFile() in
  GET /archives/{id}/plates raised BadZipFile and the UI came up blank
  with no thumbnail, plate list, or filament data.

  Two things conspired:

  1. archive_print() used shutil.copy2, which takes Python's sendfile()
     fast path on Linux. On the reporter's kernel/fs combination
     sendfile returned a short count on the first call for the upload
     sizes hit in practice and the destination ended up truncated.
     Small files completed in one syscall and were fine.
  2. ThreeMFParser.parse() caught the resulting BadZipFile in a bare
     `except Exception: pass`, so the archive pipeline kept going with
     empty metadata and left the bad file on disk — nothing in the
     logs hinted anything had gone wrong until a support bundle came
     in and the "Failed to parse plates" warning fired much later.

  The archive copy is now an explicit chunked read/write with fsync —
  sendfile is not in the path. After the copy, if the source was a
  valid ZIP but the destination isn't, we refuse to create the archive
  row, remove only the truncated file (and the archive directory if
  empty — archive_dir is created with exist_ok=True so rmtree would be
  unsafe if a same-second same-filename collision happened), and log
  both sizes at ERROR so the condition is obvious in future support
  bundles. The parser's silent catch now logs at WARNING for the same
  reason.

  All nine archive_print() call sites already check `if archive:` or
  `if not archive:`, so returning None for corrupted ZIPs propagates
  cleanly without behaviour changes elsewhere.

  Regression tests cover single-chunk and multi-chunk copies, mtime
  preservation via copystat, overwrite of an existing destination, a
  ZIP roundtrip through a multi-megabyte 3MF, the new parser WARNING,
  and a truncation sentinel verifying that zipfile.is_zipfile() flips
  to False on a half-written ZIP — the exact post-condition
  archive_print now trusts.
maziggy 1 month ago
parent
commit
936b748127
3 changed files with 205 additions and 4 deletions
  1. 1 0
      CHANGELOG.md
  2. 71 4
      backend/app/services/archive.py
  3. 133 0
      backend/tests/unit/services/test_archive_copy.py

+ 1 - 0
CHANGELOG.md

@@ -5,6 +5,7 @@ All notable changes to Bambuddy will be documented in this file.
 ## [0.2.4b1] - Unreleased
 ## [0.2.4b1] - Unreleased
 
 
 ### Fixed
 ### Fixed
+- **Large 3MF Uploads Archived as Corrupted ZIPs** ([#1032](https://github.com/maziggy/bambuddy/issues/1032)) — On bare-metal Raspberry Pi installs (armv7l / Python 3.11 / Bookworm), 3MF files larger than a few MB arrived complete via the virtual-printer FTP server but the copy into `data/archives/` ended up not being a valid ZIP. The archive row was still written, the printer card looked fine, and the problem only surfaced later when opening the archive in the UI, where `GET /archives/{id}/plates` logged `Failed to parse plates from archive N: File is not a zip file` and the thumbnail / plate / filament panels came up blank. Two things conspired: `shutil.copy2` takes the Linux `sendfile()` fast path on Python ≥ 3.8, and a partial-return from that syscall silently truncated the destination for the upload sizes users hit; and `ThreeMFParser.parse()` had a bare `except: pass` around its `zipfile.ZipFile` open, so the archive pipeline kept going with empty metadata and left the bad file on disk. The copy is now an explicit chunked read/write with `fsync()` — no sendfile involved — with a post-condition `zipfile.is_zipfile()` check that refuses to create the archive row (and cleans up the archive directory) when the source was a valid ZIP and the destination isn't, logging both sizes at `ERROR`. The parser's silent catch now logs at `WARNING` so corrupted 3MFs are visible in support bundles instead of disappearing into empty metadata. Regression tests cover small / multi-chunk copies, ZIP roundtrips, the post-copy `is_zipfile` sentinel on a truncated file, and the new parser WARNING. Thanks to @saint-hh for the detailed diagnosis.
 - **Thumbnails Blank Until Reload After Sign-In** — On auth-enabled instances, signing out and back in left the File Manager (and occasionally the Archives page) full of broken thumbnails until the page was manually reloaded. Thumbnail URLs are gated by a short-lived camera-stream token that `<img>` tags can't send via `Authorization` headers, so the token is appended as `?token=…` at render time. Two race conditions conspired to break this: (1) the token query was keyed only on `['camera-stream-token']` and fired while the user was still on the login page, 401'd, and stayed cached — after sign-in nothing invalidated it; (2) when the token did eventually arrive, the global variable holding it was not reactive, so any File Manager / Archives page that had already rendered kept serving image URLs with no token. The token query now includes the user id in its key and is gated on `!!user`, so a new login always triggers a fresh fetch; and when the token transitions from null to a value, `useStreamTokenSync` walks the DOM once and updates `src` on every already-rendered `<img>`/`<video>` pointing at `/api/v1/` without the current token, reloading them in place.
 - **Thumbnails Blank Until Reload After Sign-In** — On auth-enabled instances, signing out and back in left the File Manager (and occasionally the Archives page) full of broken thumbnails until the page was manually reloaded. Thumbnail URLs are gated by a short-lived camera-stream token that `<img>` tags can't send via `Authorization` headers, so the token is appended as `?token=…` at render time. Two race conditions conspired to break this: (1) the token query was keyed only on `['camera-stream-token']` and fired while the user was still on the login page, 401'd, and stayed cached — after sign-in nothing invalidated it; (2) when the token did eventually arrive, the global variable holding it was not reactive, so any File Manager / Archives page that had already rendered kept serving image URLs with no token. The token query now includes the user id in its key and is gated on `!!user`, so a new login always triggers a fresh fetch; and when the token transitions from null to a value, `useStreamTokenSync` walks the DOM once and updates `src` on every already-rendered `<img>`/`<video>` pointing at `/api/v1/` without the current token, reloading them in place.
 - **P2S Firmware Check Shows Stale "Latest" Version** ([#1030](https://github.com/maziggy/bambuddy/issues/1030)) — On P2S (and X2D) the Firmware Info modal reported `01.01.01.00` as the newest available release even though `01.02.00.00` had shipped on the Bambu Lab wiki weeks earlier, so the "update available" badge never appeared. Two silent regex mismatches in the wiki scraper caused `_fetch_all_versions_from_wiki()` to return an empty list: (1) the section-heading anchor parser required a dash between the version bytes and the release date (`id="h-01020000-20260409"`), but P2S and X2D publish anchors without the dash (`id="h-0102000020260409"`); (2) the text-based fallback only accepted ASCII parens around the date, while P2S, X2D, A1 and A1-mini headings render dates in full-width `(YYYYMMDD)` (U+FF08/U+FF09). When both paths failed, the code silently fell back to the Bambu Lab download page, which still lagged at `01.01.01.00`. The anchor regex now accepts an optional dash and the fallback accepts both paren styles; added regression tests for the no-dash anchor and full-width paren shapes. Thanks to @Minebuddy for reporting.
 - **P2S Firmware Check Shows Stale "Latest" Version** ([#1030](https://github.com/maziggy/bambuddy/issues/1030)) — On P2S (and X2D) the Firmware Info modal reported `01.01.01.00` as the newest available release even though `01.02.00.00` had shipped on the Bambu Lab wiki weeks earlier, so the "update available" badge never appeared. Two silent regex mismatches in the wiki scraper caused `_fetch_all_versions_from_wiki()` to return an empty list: (1) the section-heading anchor parser required a dash between the version bytes and the release date (`id="h-01020000-20260409"`), but P2S and X2D publish anchors without the dash (`id="h-0102000020260409"`); (2) the text-based fallback only accepted ASCII parens around the date, while P2S, X2D, A1 and A1-mini headings render dates in full-width `(YYYYMMDD)` (U+FF08/U+FF09). When both paths failed, the code silently fell back to the Bambu Lab download page, which still lagged at `01.01.01.00`. The anchor regex now accepts an optional dash and the fallback accepts both paren styles; added regression tests for the no-dash anchor and full-width paren shapes. Thanks to @Minebuddy for reporting.
 - **Library File Print-Usage Tracking** ([#1008](https://github.com/maziggy/bambuddy/issues/1008)) — `LibraryFile.print_count` and `last_printed_at` are now updated on every successful queued print completion. Previously both fields were defined on the model and displayed in the File Manager, but nothing ever wrote to them — every file in every library showed as never printed. Now counts increment cumulatively and `last_printed_at` stamps the completion timestamp (UTC). Failed, cancelled and user-aborted prints are intentionally excluded, so the fields represent "successful usage" rather than "attempted usage." This unblocks sorting the File Manager by last-printed date and is a prerequisite for the scheduled-purge feature requested in #1008. Thanks to @cadtoolbox for the report.
 - **Library File Print-Usage Tracking** ([#1008](https://github.com/maziggy/bambuddy/issues/1008)) — `LibraryFile.print_count` and `last_printed_at` are now updated on every successful queued print completion. Previously both fields were defined on the model and displayed in the File Manager, but nothing ever wrote to them — every file in every library showed as never printed. Now counts increment cumulatively and `last_printed_at` stamps the completion timestamp (UTC). Failed, cancelled and user-aborted prints are intentionally excluded, so the fields represent "successful usage" rather than "attempted usage." This unblocks sorting the File Manager by last-printed date and is a prerequisite for the scheduled-purge feature requested in #1008. Thanks to @cadtoolbox for the report.

+ 71 - 4
backend/app/services/archive.py

@@ -1,6 +1,7 @@
 import hashlib
 import hashlib
 import json
 import json
 import logging
 import logging
+import os
 import re
 import re
 import shutil
 import shutil
 import zipfile
 import zipfile
@@ -19,6 +20,27 @@ from backend.app.models.printer import Printer
 logger = logging.getLogger(__name__)
 logger = logging.getLogger(__name__)
 
 
 
 
+def _copy_and_fsync(src: Path, dst: Path, chunk_size: int = 1024 * 1024) -> None:
+    """Copy src to dst with an explicit chunked read/write and fsync the dst.
+
+    Replacement for shutil.copy2 in the archive pipeline. shutil.copy2 uses
+    Linux sendfile(), which on some kernels/filesystems has returned a short
+    count on the first call and truncated the destination for larger 3MF
+    uploads (#1032, observed on Raspberry Pi OS bookworm / armv7l). An
+    explicit loop with fsync avoids that path and guarantees the dest bytes
+    are on disk before the caller inspects them as a ZIP.
+    """
+    with src.open("rb") as rf, dst.open("wb") as wf:
+        while True:
+            buf = rf.read(chunk_size)
+            if not buf:
+                break
+            wf.write(buf)
+        wf.flush()
+        os.fsync(wf.fileno())
+    shutil.copystat(src, dst)
+
+
 class ThreeMFParser:
 class ThreeMFParser:
     """Parser for Bambu Lab 3MF files."""
     """Parser for Bambu Lab 3MF files."""
 
 
@@ -56,8 +78,16 @@ class ThreeMFParser:
                 self.metadata.pop("_slice_filament_type", None)
                 self.metadata.pop("_slice_filament_type", None)
                 self.metadata.pop("_slice_filament_color", None)
                 self.metadata.pop("_slice_filament_color", None)
                 self.metadata.pop("_plate_index", None)
                 self.metadata.pop("_plate_index", None)
-        except Exception:
-            pass  # Return whatever metadata was extracted before the error
+        except Exception as e:
+            # Return whatever metadata was extracted before the error, but
+            # surface the failure so corrupted / truncated 3MF archives are
+            # visible in support bundles (#1032).
+            logger.warning(
+                "ThreeMFParser: failed to parse %s: %s(%s) — returning partial metadata",
+                self.file_path,
+                type(e).__name__,
+                e,
+            )
         return self.metadata
         return self.metadata
 
 
     def _parse_slice_info(self, zf: zipfile.ZipFile):
     def _parse_slice_info(self, zf: zipfile.ZipFile):
@@ -888,9 +918,46 @@ class ArchiveService:
         archive_dir = settings.archive_dir / printer_folder / archive_name
         archive_dir = settings.archive_dir / printer_folder / archive_name
         archive_dir.mkdir(parents=True, exist_ok=True)
         archive_dir.mkdir(parents=True, exist_ok=True)
 
 
-        # Copy 3MF file
+        # Copy 3MF file with an explicit fsync'd loop (avoids a sendfile
+        # short-read quirk that silently truncated 3MF archives on some
+        # platforms — see _copy_and_fsync and #1032).
         dest_file = archive_dir / source_file.name
         dest_file = archive_dir / source_file.name
-        shutil.copy2(source_file, dest_file)
+        _copy_and_fsync(source_file, dest_file)
+
+        # If we just archived a 3MF, verify the dest is a valid ZIP before
+        # going any further. Staying quiet here is how #1032 escaped review —
+        # the archive row was written but every later zipfile.ZipFile() call
+        # on the dest failed with "File is not a zip file".
+        if (
+            source_file.suffix.lower() == ".3mf"
+            and zipfile.is_zipfile(source_file)
+            and not zipfile.is_zipfile(dest_file)
+        ):
+            try:
+                src_size = source_file.stat().st_size
+                dst_size = dest_file.stat().st_size
+            except OSError:
+                src_size = dst_size = -1
+            logger.error(
+                "Archive copy corrupted 3MF: src=%s (%s bytes, valid ZIP) -> dst=%s (%s bytes, NOT a ZIP). Refusing to create archive row.",
+                source_file,
+                src_size,
+                dest_file,
+                dst_size,
+            )
+            # Narrow cleanup: remove only the truncated file and the archive
+            # directory if it's now empty. archive_dir was created with
+            # exist_ok=True so it could in theory pre-date this call (e.g.
+            # same-second same-filename collision); rmtree would be too broad.
+            try:
+                dest_file.unlink()
+            except OSError:
+                pass
+            try:
+                archive_dir.rmdir()
+            except OSError:
+                pass  # directory not empty — leave untouched
+            return None
 
 
         # Compute content hash for duplicate detection
         # Compute content hash for duplicate detection
         content_hash = self.compute_file_hash(dest_file)
         content_hash = self.compute_file_hash(dest_file)

+ 133 - 0
backend/tests/unit/services/test_archive_copy.py

@@ -0,0 +1,133 @@
+"""
+Tests for the 3MF archive copy path.
+
+Regression guards for #1032 where large 3MF files were silently truncated
+during archiving on Raspberry Pi OS / armv7l, leaving the archive row in
+place but the on-disk file no longer a valid ZIP.
+"""
+
+import io
+import logging
+import os
+import zipfile
+from pathlib import Path
+
+import pytest
+
+from backend.app.services.archive import ThreeMFParser, _copy_and_fsync
+
+
+def _make_3mf(path: Path, payload_size: int = 0) -> None:
+    """Write a minimal valid 3MF (ZIP) file with an optional large payload."""
+    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
+        zf.writestr("Metadata/slice_info.config", "<config/>")
+        if payload_size:
+            # Uncompressible payload forces real bytes on disk.
+            zf.writestr("blob.bin", os.urandom(payload_size))
+
+
+class TestCopyAndFsync:
+    def test_copies_small_file_byte_for_byte(self, tmp_path: Path) -> None:
+        src = tmp_path / "src.bin"
+        dst = tmp_path / "dst.bin"
+        src.write_bytes(b"hello world")
+
+        _copy_and_fsync(src, dst)
+
+        assert dst.read_bytes() == b"hello world"
+
+    def test_copies_large_file_byte_for_byte(self, tmp_path: Path) -> None:
+        """Spans multiple 1 MiB chunks to exercise the copy loop."""
+        src = tmp_path / "src.bin"
+        dst = tmp_path / "dst.bin"
+        payload = os.urandom(5 * 1024 * 1024 + 123)  # 5 MiB + change
+        src.write_bytes(payload)
+
+        _copy_and_fsync(src, dst)
+
+        assert dst.stat().st_size == len(payload)
+        assert dst.read_bytes() == payload
+
+    def test_preserves_mtime_via_copystat(self, tmp_path: Path) -> None:
+        src = tmp_path / "src.bin"
+        dst = tmp_path / "dst.bin"
+        src.write_bytes(b"x")
+        os.utime(src, (1_700_000_000, 1_700_000_000))
+
+        _copy_and_fsync(src, dst)
+
+        assert int(dst.stat().st_mtime) == 1_700_000_000
+
+    def test_overwrites_existing_destination(self, tmp_path: Path) -> None:
+        src = tmp_path / "src.bin"
+        dst = tmp_path / "dst.bin"
+        src.write_bytes(b"new")
+        dst.write_bytes(b"old old old")
+
+        _copy_and_fsync(src, dst)
+
+        assert dst.read_bytes() == b"new"
+
+    def test_produces_valid_zip_on_3mf(self, tmp_path: Path) -> None:
+        """The whole point of #1032: copy of a valid 3MF stays a valid ZIP."""
+        src = tmp_path / "src.3mf"
+        dst = tmp_path / "dst.3mf"
+        _make_3mf(src, payload_size=2 * 1024 * 1024)  # 2 MiB, multi-chunk
+        assert zipfile.is_zipfile(src)
+
+        _copy_and_fsync(src, dst)
+
+        assert zipfile.is_zipfile(dst)
+
+
+class TestThreeMFParserErrorVisibility:
+    def test_parse_logs_warning_on_corrupted_zip(
+        self,
+        tmp_path: Path,
+        caplog: pytest.LogCaptureFixture,
+    ) -> None:
+        """Silent `except Exception: pass` was how #1032 escaped detection;
+        parse() must now surface the failure at WARNING."""
+        corrupted = tmp_path / "bad.3mf"
+        corrupted.write_bytes(b"not a zip")
+
+        with caplog.at_level(logging.WARNING, logger="backend.app.services.archive"):
+            result = ThreeMFParser(corrupted).parse()
+
+        assert result == {}
+        assert any("failed to parse" in rec.message and str(corrupted) in rec.message for rec in caplog.records), (
+            "Expected a WARNING mentioning the failed parse and file path"
+        )
+
+    def test_parse_returns_partial_metadata_without_raising(
+        self,
+        tmp_path: Path,
+    ) -> None:
+        """A valid-but-minimal 3MF must still parse without raising."""
+        p = tmp_path / "ok.3mf"
+        with zipfile.ZipFile(p, "w") as zf:
+            zf.writestr("Metadata/slice_info.config", "<config/>")
+
+        result = ThreeMFParser(p).parse()
+
+        # No assertions about which keys are present — just that it didn't blow up.
+        assert isinstance(result, dict)
+
+
+class TestZipFileSentinel:
+    """Sanity check the sentinel the archive pipeline relies on."""
+
+    def test_is_zipfile_on_truncated_zip_returns_false(self, tmp_path: Path) -> None:
+        """Truncating a valid ZIP mid-stream must flip is_zipfile() to False.
+        This is the exact post-condition archive_print now trusts."""
+        src = tmp_path / "src.3mf"
+        _make_3mf(src, payload_size=1024 * 1024)
+        full = src.read_bytes()
+        assert zipfile.is_zipfile(io.BytesIO(full))
+
+        truncated = tmp_path / "truncated.3mf"
+        # Strip the trailing end-of-central-directory record — exactly what a
+        # short sendfile return would leave behind.
+        truncated.write_bytes(full[: len(full) // 2])
+
+        assert not zipfile.is_zipfile(truncated)