6 روز پیش · 774eba73c8
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,7 @@ All notable changes to Bambuddy will be documented in this file.
 
				 ## [0.2.5b1] - Unreleased
			
 
				 
			
 
				 ### Added
			
 
				+- **Event-loop stall watchdog — makes a frozen backend self-diagnose (#1486 groundwork)** — Several "container hangs after adding a printer" reports share a signature that leaves nothing to act on: the HTTP server goes silent, `/health` hangs, the process may stop responding to SIGTERM — and the logs just stop mid-stream with no traceback, because a frozen asyncio event loop cannot log anything. New `backend/app/services/loop_watchdog.py` closes that blind spot: an async heartbeat re-arms `faulthandler.dump_traceback_later()` every 10s, always 30s ahead. While the loop ticks, the timer is cancelled and re-armed before it can fire; if the loop stalls, the heartbeat can't re-arm and faulthandler's dedicated C-level timer thread — which runs independently of the frozen loop — dumps **every thread's stack to stderr**. The blocked frame then appears in `docker compose logs`, turning an un-diagnosable freeze into a one-command capture. Started in the app lifespan after migrations, stopped cleanly on shutdown; 30s threshold is well above any legitimate on-loop operation, so a trip always means a real bug. 5 unit tests in `test_loop_watchdog.py` (arms the timer, idempotent start, stop disarms + cancels, heartbeat interval below the threshold, survives a re-arm error). Backend ruff clean; full app lifespan verified via the integration suite.
			
 
				 - **Slicer: process & filament profiles filtered by the selected printer (#1325, requested by @IndividualGhost1905)** — In the server-side Slice dialog, picking a printer profile now filters the Process and Filament dropdowns to presets compatible with that printer; presets that resolve to a different Bambu model drop into a trailing "Other printers" group instead of cluttering the main list. Matching uses the slicer's own `compatible_printers` list for imported (local) presets, and falls back to the `@BBL <model>` name suffix for cloud and standard presets, so all three tiers are covered. Compatibility-unknown presets (custom or untagged) are never hidden. Defaults follow suit — the pre-picked process and per-slot filament now prefer a printer-compatible preset, and switching the printer re-picks any selection left incompatible. The printer and process dropdowns also default to the preset names embedded in the source 3MF's `project_settings.config` when those presets are available, instead of always taking the first listed preset. New `frontend/src/utils/slicerPrinterMatch.ts` (11 unit tests) and `extract_embedded_presets_from_3mf` (5 unit tests); `UnifiedPreset` now carries `compatible_printers`, exposed for the local tier (`backend/app/api/routes/slicer_presets.py`); the plates endpoints return `embedded_printer` / `embedded_process`. Parity green, build clean.
			
 
				 - **Spanish (es) translation (#1243, requested by @MiguelAngelLV)** — Bambuddy now ships a full European Spanish locale. New `frontend/src/i18n/locales/es.ts` translates all 4899 keys with placeholders, plural forms, and inline markup preserved; registered in `frontend/src/i18n/index.ts` and selectable as "Español" in the language picker. The parity checker auto-discovers the file — `frontend/scripts/check-i18n-parity.mjs` gained an `ES_COGNATES` allow-list for genuine Spanish cognates and brand/format tokens. Brings the supported-language count to 9 (en / de / es / fr / it / ja / pt-BR / zh-CN / zh-TW). Parity green, frontend build clean.
			
 
				 - **Currency: Belize Dollars (BZD) added to the Settings → Cost currency dropdown (#1454, requested by @PLGuerraDesigns)** — Reporter accurately tracks 3D-printing filament costs in his local currency and BZD wasn't selectable, forcing a manual 2:1 mental conversion from USD. Added `BZD: 'BZ$'` to `frontend/src/utils/currency.ts` next to MXN (Americas dollar-prefix grouping); `getCurrencySymbol('BZD')` returns `'BZ$'` and the SUPPORTED_CURRENCIES list now has 30 entries. Unit test added in `frontend/src/__tests__/utils/currency.test.ts` covering the symbol lookup and presence in SUPPORTED_CURRENCIES; entry-count assertion bumped to 30 so any future additions/removals are caught immediately. 14 currency tests green; frontend build clean.
			
--- a/backend/app/main.py
+++ b/backend/app/main.py
@@ -4933,6 +4933,12 @@ async def lifespan(app: FastAPI):
 
				     # L-2: Start periodic auth cleanup (stale TOTP + expired revoked JTIs)
			
 
				     start_auth_cleanup()
			
 
				 
			
 
				+    # Event-loop stall watchdog: dumps all thread stacks to stderr if the loop
			
 
				+    # freezes (#1486 — silent "container hangs after adding a printer" reports).
			
 
				+    from backend.app.services.loop_watchdog import start_loop_watchdog
			
 
				+
			
 
				+    start_loop_watchdog()
			
 
				+
			
 
				     # Initialize virtual printer manager and sync from DB
			
 
				     from backend.app.services.virtual_printer import virtual_printer_manager
			
 
				 
			
@@ -4960,6 +4966,9 @@ async def lifespan(app: FastAPI):
 
				     stop_runtime_tracking()
			
 
				     stop_spoolbuddy_watchdog()
			
 
				     stop_camera_cleanup()
			
 
				+    from backend.app.services.loop_watchdog import stop_loop_watchdog
			
 
				+
			
 
				+    stop_loop_watchdog()
			
 
				     # Tear down all camera fan-out broadcasters (#1089) so subscribers exit
			
 
				     # cleanly rather than waiting on a queue that nothing will ever fill.
			
 
				     try:
			
--- a/backend/app/services/loop_watchdog.py
+++ b/backend/app/services/loop_watchdog.py
@@ -0,0 +1,77 @@
 
				+"""Event-loop stall watchdog (#1486).
			
 
				+
			
 
				+A frozen asyncio event loop is invisible: it produces no log line and no
			
 
				+traceback — the HTTP server just goes silent, ``/health`` hangs, and the
			
 
				+process can stop responding to SIGTERM. Several "container hangs after adding
			
 
				+a printer" reports had exactly this shape, with nothing in the logs to act on.
			
 
				+
			
 
				+This watchdog makes such a freeze diagnosable. An async heartbeat re-arms
			
 
				+``faulthandler.dump_traceback_later()`` every ``HEARTBEAT_INTERVAL`` seconds,
			
 
				+always ``STALL_THRESHOLD`` seconds ahead. While the loop keeps ticking the
			
 
				+timer is cancelled and re-armed before it can fire. If the loop stalls, the
			
 
				+heartbeat can't re-arm — and faulthandler's timer runs in a dedicated C-level
			
 
				+thread that fires regardless of the frozen loop, dumping *every* thread's
			
 
				+stack to stderr. The blocked frame then shows up in ``docker compose logs``.
			
 
				+"""
			
 
				+
			
 
				+import asyncio
			
 
				+import faulthandler
			
 
				+import logging
			
 
				+
			
 
				+logger = logging.getLogger(__name__)
			
 
				+
			
 
				+# How often the heartbeat cancels + re-arms the faulthandler timer. Must be
			
 
				+# comfortably below STALL_THRESHOLD so a healthy loop always re-arms in time.
			
 
				+HEARTBEAT_INTERVAL = 10.0
			
 
				+
			
 
				+# The loop must be unresponsive for at least this long before thread stacks
			
 
				+# are dumped. Generous on purpose: no legitimate on-loop operation should
			
 
				+# block for 30s, so anything that does is itself a bug worth a stack dump.
			
 
				+STALL_THRESHOLD = 30.0
			
 
				+
			
 
				+_watchdog_task: asyncio.Task | None = None
			
 
				+
			
 
				+
			
 
				+async def _heartbeat_loop() -> None:
			
 
				+    """Re-arm the faulthandler stall timer on every tick."""
			
 
				+    while True:
			
 
				+        try:
			
 
				+            faulthandler.cancel_dump_traceback_later()
			
 
				+            # repeat=False: one dump pinpoints a hard freeze. If the loop
			
 
				+            # recovers and stalls again, the next heartbeat re-arms anyway.
			
 
				+            faulthandler.dump_traceback_later(STALL_THRESHOLD, repeat=False)
			
 
				+        except Exception as e:  # never let the watchdog itself crash the app
			
 
				+            logger.warning("Loop watchdog re-arm failed: %s", e)
			
 
				+        try:
			
 
				+            await asyncio.sleep(HEARTBEAT_INTERVAL)
			
 
				+        except asyncio.CancelledError:
			
 
				+            break
			
 
				+
			
 
				+
			
 
				+def start_loop_watchdog() -> None:
			
 
				+    """Start the event-loop stall watchdog. Idempotent."""
			
 
				+    global _watchdog_task
			
 
				+    if _watchdog_task is not None:
			
 
				+        return
			
 
				+    if not faulthandler.is_enabled():
			
 
				+        # Also installs handlers for fatal signals (SIGSEGV etc.) — harmless
			
 
				+        # and useful; the dump_traceback_later timer works either way.
			
 
				+        faulthandler.enable()
			
 
				+    _watchdog_task = asyncio.create_task(_heartbeat_loop())
			
 
				+    logger.info(
			
 
				+        "Event-loop stall watchdog started — dumps all thread stacks to stderr if the loop stalls for more than %.0fs",
			
 
				+        STALL_THRESHOLD,
			
 
				+    )
			
 
				+
			
 
				+
			
 
				+def stop_loop_watchdog() -> None:
			
 
				+    """Stop the watchdog and disarm the pending stall timer."""
			
 
				+    global _watchdog_task
			
 
				+    if _watchdog_task is not None:
			
 
				+        _watchdog_task.cancel()
			
 
				+        _watchdog_task = None
			
 
				+    try:
			
 
				+        faulthandler.cancel_dump_traceback_later()
			
 
				+    except Exception:
			
 
				+        pass
			
 
				+    logger.info("Event-loop stall watchdog stopped")
			
--- a/backend/tests/unit/services/test_loop_watchdog.py
+++ b/backend/tests/unit/services/test_loop_watchdog.py
@@ -0,0 +1,72 @@
 
				+"""Tests for the event-loop stall watchdog (#1486)."""
			
 
				+
			
 
				+import asyncio
			
 
				+import faulthandler
			
 
				+from unittest.mock import patch
			
 
				+
			
 
				+import pytest
			
 
				+
			
 
				+from backend.app.services import loop_watchdog
			
 
				+
			
 
				+
			
 
				+@pytest.fixture(autouse=True)
			
 
				+def _mock_faulthandler():
			
 
				+    """Patch faulthandler so tests never arm a real 30s stall timer that
			
 
				+    could fire mid-suite. Yields (arm_mock, cancel_mock)."""
			
 
				+    with (
			
 
				+        patch.object(faulthandler, "dump_traceback_later") as arm,
			
 
				+        patch.object(faulthandler, "cancel_dump_traceback_later") as cancel,
			
 
				+    ):
			
 
				+        yield arm, cancel
			
 
				+    # Safety net: make sure no test leaves the watchdog task running.
			
 
				+    loop_watchdog.stop_loop_watchdog()
			
 
				+
			
 
				+
			
 
				+async def test_start_arms_the_stall_timer(_mock_faulthandler):
			
 
				+    arm, cancel = _mock_faulthandler
			
 
				+    loop_watchdog.start_loop_watchdog()
			
 
				+    await asyncio.sleep(0)  # let the heartbeat run its first iteration
			
 
				+
			
 
				+    assert cancel.called, "previous timer must be cancelled before re-arming"
			
 
				+    assert arm.called
			
 
				+    # Armed STALL_THRESHOLD seconds ahead, single-shot.
			
 
				+    args, kwargs = arm.call_args
			
 
				+    assert args[0] == loop_watchdog.STALL_THRESHOLD
			
 
				+    assert kwargs.get("repeat") is False
			
 
				+
			
 
				+
			
 
				+async def test_start_is_idempotent(_mock_faulthandler):
			
 
				+    loop_watchdog.start_loop_watchdog()
			
 
				+    first = loop_watchdog._watchdog_task
			
 
				+    loop_watchdog.start_loop_watchdog()
			
 
				+    assert loop_watchdog._watchdog_task is first, "second start must not spawn a task"
			
 
				+
			
 
				+
			
 
				+async def test_stop_cancels_the_task_and_disarms(_mock_faulthandler):
			
 
				+    _arm, cancel = _mock_faulthandler
			
 
				+    loop_watchdog.start_loop_watchdog()
			
 
				+    task = loop_watchdog._watchdog_task
			
 
				+    assert task is not None
			
 
				+
			
 
				+    cancel.reset_mock()
			
 
				+    loop_watchdog.stop_loop_watchdog()
			
 
				+
			
 
				+    assert loop_watchdog._watchdog_task is None
			
 
				+    assert cancel.called, "stop must disarm the pending faulthandler timer"
			
 
				+    await asyncio.sleep(0)
			
 
				+    assert task.cancelled() or task.done()
			
 
				+
			
 
				+
			
 
				+async def test_heartbeat_interval_is_below_stall_threshold():
			
 
				+    """A healthy loop must always re-arm before the timer can fire."""
			
 
				+    assert loop_watchdog.HEARTBEAT_INTERVAL < loop_watchdog.STALL_THRESHOLD
			
 
				+
			
 
				+
			
 
				+async def test_rearm_failure_does_not_crash_the_watchdog(_mock_faulthandler):
			
 
				+    """A faulthandler hiccup must not take down the heartbeat task."""
			
 
				+    arm, _cancel = _mock_faulthandler
			
 
				+    arm.side_effect = RuntimeError("boom")
			
 
				+    loop_watchdog.start_loop_watchdog()
			
 
				+    await asyncio.sleep(0)
			
 
				+    task = loop_watchdog._watchdog_task
			
 
				+    assert task is not None and not task.done(), "watchdog must survive a re-arm error"