Jelajahi Sumber

Fix SpoolBuddy update still failing in Docker after keypair fix

  Commit 67749565 eliminated ssh-keygen from the SpoolBuddy remote-update
  flow, but the update path still shelled out to the OpenSSH `ssh` client
  for every command. Like ssh-keygen, the `ssh` binary calls
  getpwuid(getuid()) during startup and aborts with "No user exists for
  uid <N>" when the container runs under an arbitrary PUID that isn't in
  /etc/passwd (python:3.13-slim only ships a root entry, so any
  `user: "1000:1000"` compose setup trips the same error).

  detect_current_branch() had a related problem: when the git repo is
  bind-mounted into the container, .git exists inside Docker, so the code
  tried to run `git rev-parse`. Git isn't in the image, so the subprocess
  silently fell back to the GIT_BRANCH env var — and if git ever were
  added, it could hit the same getpwuid trap.

  The entire update path is now subprocess-free:

  - _run_ssh_command uses asyncssh (pure-Python, built on the already
    installed cryptography library). Connection errors map to rc=255 to
    match `ssh`'s convention; asyncio.timeout handles the timeout path.
  - detect_current_branch reads .git/HEAD directly (handling git-worktree
    `gitdir:` pointer files too), keeping the same GIT_BRANCH → "main"
    fallback chain.
  - shutil and the inline `import subprocess` are gone from the module.

  Regression tests assert that neither keypair creation, branch
  detection, nor command execution spawns any subprocess. Native installs
  are unaffected.
maziggy 1 bulan lalu
induk
melakukan
a78a4bff2e

+ 1 - 1
CHANGELOG.md

@@ -13,7 +13,7 @@ All notable changes to Bambuddy will be documented in this file.
 ### Fixed
 - **External Sidebar Link Icon Not Showing** ([#878](https://github.com/maziggy/bambuddy/issues/878)) — Custom icons uploaded for external sidebar links rendered correctly in the edit dialog but were missing from the sidebar itself, and opening the icon URL directly returned `{"detail":"Valid camera stream token required..."}`. The sidebar `<img>` tag in `Layout.tsx` used a raw `/api/v1/external-links/{id}/icon` URL, but that endpoint is protected by a query-string stream token (the same mechanism used for camera streams and archive thumbnails, because `<img>` tags cannot send Authorization headers). The edit dialog already routed through `api.getExternalLinkIconUrl()`, which wraps the URL via `withStreamToken()`; the sidebar now does the same, so icons appear when auth is enabled.
 - **Shortest Job First Toggle Disappears After Clicking** ([#879](https://github.com/maziggy/bambuddy/issues/879)) — The SJF toggle badge on the queue page was rendered inside the Pending Queue section header, which is only shown when there is at least one pending item and the list view is active. Clicking the toggle often coincided with the scheduler starting the only pending print, at which point the Pending section unmounted and the toggle vanished along with it — making it look like the button had disappeared after clicking. The toggle has been moved to the top of the queue page, next to the list/timeline view switcher, so it stays reachable regardless of pending-item count, active filters, or the selected view mode.
-- **SpoolBuddy Update Fails in Docker with "no user exists for uid 1001"** — The SpoolBuddy remote-update flow shelled out to `ssh-keygen` to create its update keypair on first use. Inside the Docker container the process runs under an arbitrary PUID (default 1000, also seen as 1001) that is not listed in `/etc/passwd`, so `ssh-keygen` aborted at the `getpwuid()` home-directory lookup and the update button reported `ssh-keygen failed. no user exists for uid 1001`. The keypair is now generated in-process via the `cryptography` library (already a dependency), which has no user-database lookup and produces the same OpenSSH-format files. Native installs are unaffected — they already worked because the running user was always in `/etc/passwd`.
+- **SpoolBuddy Update Fails in Docker with "no user exists for uid 1000"** — The SpoolBuddy remote-update flow shelled out to the OpenSSH `ssh-keygen` and `ssh` binaries for keypair creation and command execution. Both of those binaries call `getpwuid(getuid())` during startup and abort with `No user exists for uid <N>` when the container runs under an arbitrary PUID that is not listed in `/etc/passwd` (the default `python:3.13-slim` image only has an entry for root, so running with `user: "1000:1000"` — or any non-root user — tripped the same error). The entire SpoolBuddy update path is now subprocess-free: keypairs are generated in-process via the `cryptography` library (already a dependency), SSH commands run through the pure-Python `asyncssh` client, and git-branch detection reads `.git/HEAD` directly instead of shelling out to `git`. Native installs behave identically — they already worked because the running user was always in `/etc/passwd`. A regression test asserts that neither keypair creation nor command execution spawns any subprocess.
 - **Camera Stream "6 of 5" Reconnect Counter + ffmpeg Log Flood** ([#925](https://github.com/maziggy/bambuddy/issues/925)) — Two bugs surfaced while investigating camera reconnect behaviour. First, the camera page briefly displayed "Reconnecting attempt 6 of 5" before giving up, because the attempt counter could be incremented to the maximum while the reconnect banner was still rendering. The displayed value is now clamped to the configured maximum. Second, every failed ffmpeg spawn logged the full ~20-line ffmpeg version/configuration banner, producing hundreds of lines of noise per failed camera click (one reported click produced 555 log lines across 30 retries). A new stderr summarizer strips the ffmpeg banner before logging so only the actual error lines remain. The underlying "camera service stops accepting new connections after prolonged uptime" behaviour in the X1C firmware is still under investigation.
 - **LDAP POSIX Primary Group Ignored** — LDAP authentication only looked at groups that listed the user explicitly via `memberUid` (supplementary group membership). A user's POSIX primary group — referenced by the `gidNumber` attribute on the user object and matching the `gidNumber` on a `posixGroup` — was ignored entirely, so users whose role came from their primary group landed without the expected permissions. The authenticator now also searches for `posixGroup` entries whose `gidNumber` matches the user's primary `gidNumber`, and dedupes DNs case-insensitively before resolving the group mapping (LDAP DNs are case-insensitive by spec).
 - **Support Bundle Leaks Virtual Printer IP Address** — The debug support bundle included the `virtual_printer_remote_interface_ip` setting value unmasked in `support-info.json`. The setting key didn't match any of the existing sensitive-key filters, so the raw IP address was included in the bundle. Added `_ip` to the sensitive key filter so IP address settings are excluded from support bundles. Log file content was already covered by the existing IPv4 regex redaction.

+ 56 - 46
backend/app/services/spoolbuddy_ssh.py

@@ -3,14 +3,21 @@
 Instead of the daemon updating itself (fragile: permission issues, self-modifying
 code, hardcoded branch), Bambuddy SSHes into the SpoolBuddy Pi and drives the
 update remotely: git fetch/checkout, pip install, systemctl restart.
+
+Uses `asyncssh` (pure-Python async SSH client) rather than shelling out to the
+OpenSSH `ssh` binary. The subprocess approach fails in Docker: both `ssh` and
+`ssh-keygen` call `getpwuid(getuid())` during startup and abort with
+"No user exists for uid <N>" when the container runs under a UID that is not
+listed in /etc/passwd (e.g. PUID=1000 on python:3.13-slim, which only has
+entries for root). asyncssh does all of its work in-process.
 """
 
 import asyncio
 import logging
 import os
-import shutil
 from pathlib import Path
 
+import asyncssh
 from cryptography.hazmat.primitives import serialization
 from cryptography.hazmat.primitives.asymmetric import ed25519
 
@@ -80,26 +87,32 @@ async def get_public_key() -> str:
 def detect_current_branch() -> str:
     """Detect the git branch Bambuddy is running on.
 
-    For native installs, reads from the .git directory.
-    For Docker (no .git), falls back to GIT_BRANCH env var, then "main".
+    Reads `.git/HEAD` directly rather than shelling out to `git`. This keeps
+    the behaviour identical for native installs, bare Docker containers
+    (no .git — fall through to the env var), and Docker containers that
+    bind-mount the repo (.git is present, no `git` binary required, and no
+    `getpwuid()` call that could fail under an arbitrary PUID).
+
+    Fallback order: `.git/HEAD` → `GIT_BRANCH` env var → `"main"`.
     """
-    git_dir = settings.base_dir / ".git"
-    if git_dir.exists():
-        git_path = shutil.which("git") or "/usr/bin/git"
-        try:
-            import subprocess
-
-            result = subprocess.run(
-                [git_path, "rev-parse", "--abbrev-ref", "HEAD"],
-                cwd=str(settings.base_dir),
-                capture_output=True,
-                text=True,
-                timeout=5,
-            )
-            if result.returncode == 0 and result.stdout.strip():
-                return result.stdout.strip()
-        except Exception:
-            pass
+    git_path = settings.base_dir / ".git"
+    try:
+        if git_path.exists():
+            # Git worktrees use a file containing `gitdir: <path>` instead of a dir.
+            if git_path.is_file():
+                content = git_path.read_text(encoding="utf-8").strip()
+                if content.startswith("gitdir:"):
+                    git_path = (settings.base_dir / content.removeprefix("gitdir:").strip()).resolve()
+
+            head_file = git_path / "HEAD"
+            if head_file.is_file():
+                head = head_file.read_text(encoding="utf-8").strip()
+                # Normal case: `ref: refs/heads/<branch>`.
+                # Detached HEAD stores a raw commit hash — fall through to env var.
+                if head.startswith("ref: refs/heads/"):
+                    return head.removeprefix("ref: refs/heads/").strip()
+    except OSError as exc:
+        logger.debug("Could not read .git/HEAD, falling back: %s", exc)
 
     return os.environ.get("GIT_BRANCH", "main")
 
@@ -112,36 +125,33 @@ async def _run_ssh_command(
 ) -> tuple[int, str, str]:
     """Execute a command on a SpoolBuddy device via SSH.
 
-    Returns (returncode, stdout, stderr).
+    Uses asyncssh rather than the OpenSSH `ssh` binary — see module docstring
+    for the Docker/PUID rationale.
+
+    Returns (returncode, stdout, stderr). On connection failure the return
+    code is 255 (matching `ssh`'s own convention) and stderr carries the
+    asyncssh error message. On timeout the return code is -1.
     """
-    ssh_path = shutil.which("ssh") or "/usr/bin/ssh"
-    proc = await asyncio.create_subprocess_exec(
-        ssh_path,
-        "-i",
-        str(private_key),
-        "-o",
-        "StrictHostKeyChecking=no",
-        "-o",
-        "UserKnownHostsFile=/dev/null",
-        "-o",
-        "ConnectTimeout=10",
-        "-o",
-        "BatchMode=yes",
-        "-o",
-        "LogLevel=ERROR",
-        f"{SSH_USER}@{ip}",
-        command,
-        stdout=asyncio.subprocess.PIPE,
-        stderr=asyncio.subprocess.PIPE,
-    )
     try:
-        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
+        async with asyncio.timeout(timeout):
+            async with asyncssh.connect(
+                host=ip,
+                username=SSH_USER,
+                client_keys=[str(private_key)],
+                known_hosts=None,  # equivalent to StrictHostKeyChecking=no + UserKnownHostsFile=/dev/null
+                connect_timeout=10,
+            ) as conn:
+                result = await conn.run(command, check=False)
     except TimeoutError:
-        proc.kill()
-        await proc.communicate()
         return -1, "", "SSH command timed out"
-
-    return proc.returncode, stdout.decode(), stderr.decode()
+    except (asyncssh.Error, OSError) as exc:
+        return 255, "", str(exc)
+
+    stdout = result.stdout if isinstance(result.stdout, str) else (result.stdout or b"").decode(errors="replace")
+    stderr = result.stderr if isinstance(result.stderr, str) else (result.stderr or b"").decode(errors="replace")
+    # asyncssh's exit_status is None when the remote closed without setting one
+    returncode = result.exit_status if result.exit_status is not None else 0
+    return returncode, stdout, stderr
 
 
 async def perform_ssh_update(device_id: str, ip_address: str, install_path: str | None = None) -> None:

+ 130 - 29
backend/tests/unit/services/test_spoolbuddy_ssh.py

@@ -1,5 +1,6 @@
 """Unit tests for SpoolBuddy SSH update service."""
 
+import asyncio
 import os
 from unittest.mock import AsyncMock, MagicMock, patch
 
@@ -113,15 +114,49 @@ async def test_get_public_key(tmp_path):
 # -- detect_current_branch ----------------------------------------------------
 
 
-def test_detect_branch_from_git(tmp_path):
-    (tmp_path / ".git").mkdir()
+def test_detect_branch_from_git_head(tmp_path):
+    """Read branch directly from .git/HEAD — no subprocess."""
+    git_dir = tmp_path / ".git"
+    git_dir.mkdir()
+    (git_dir / "HEAD").write_text("ref: refs/heads/dev\n")
+
     with (
         patch("backend.app.services.spoolbuddy_ssh.settings") as mock_settings,
+        patch("asyncio.create_subprocess_exec") as mock_exec,
         patch("subprocess.run") as mock_run,
     ):
         mock_settings.base_dir = tmp_path
-        mock_run.return_value = MagicMock(returncode=0, stdout="dev\n")
         assert detect_current_branch() == "dev"
+        # Regression guard: must not shell out (fails with getpwuid under
+        # arbitrary Docker PUIDs if ever reintroduced).
+        mock_exec.assert_not_called()
+        mock_run.assert_not_called()
+
+
+def test_detect_branch_worktree_gitdir_file(tmp_path):
+    """Git worktrees store a `gitdir:` pointer instead of a dir — follow it."""
+    real_git_dir = tmp_path / "real-git"
+    real_git_dir.mkdir()
+    (real_git_dir / "HEAD").write_text("ref: refs/heads/feature-x\n")
+    (tmp_path / ".git").write_text(f"gitdir: {real_git_dir}\n")
+
+    with patch("backend.app.services.spoolbuddy_ssh.settings") as mock_settings:
+        mock_settings.base_dir = tmp_path
+        assert detect_current_branch() == "feature-x"
+
+
+def test_detect_branch_detached_head_falls_back(tmp_path):
+    """Detached HEAD (raw commit hash) should fall through to the env var."""
+    git_dir = tmp_path / ".git"
+    git_dir.mkdir()
+    (git_dir / "HEAD").write_text("deadbeef1234\n")
+
+    with (
+        patch("backend.app.services.spoolbuddy_ssh.settings") as mock_settings,
+        patch.dict(os.environ, {"GIT_BRANCH": "release"}),
+    ):
+        mock_settings.base_dir = tmp_path
+        assert detect_current_branch() == "release"
 
 
 def test_detect_branch_env_fallback(tmp_path):
@@ -145,6 +180,12 @@ def test_detect_branch_default_main(tmp_path):
 
 
 # -- _run_ssh_command ----------------------------------------------------------
+#
+# _run_ssh_command uses asyncssh (pure Python) rather than the OpenSSH `ssh`
+# binary. Both `ssh` and `ssh-keygen` call getpwuid(getuid()) during startup
+# and abort with "No user exists for uid <N>" when the container runs under
+# an arbitrary PUID that is not listed in /etc/passwd — asyncssh avoids the
+# subprocess entirely.
 
 
 @pytest.mark.asyncio
@@ -152,61 +193,121 @@ async def test_run_ssh_command_success(tmp_path):
     key_file = tmp_path / "key"
     key_file.write_text("KEY")
 
-    mock_proc = AsyncMock()
-    mock_proc.communicate = AsyncMock(return_value=(b"hello\n", b""))
-    mock_proc.returncode = 0
+    mock_result = MagicMock()
+    mock_result.stdout = "hello\n"
+    mock_result.stderr = ""
+    mock_result.exit_status = 0
+
+    mock_conn = AsyncMock()
+    mock_conn.run = AsyncMock(return_value=mock_result)
+    mock_conn.__aenter__ = AsyncMock(return_value=mock_conn)
+    mock_conn.__aexit__ = AsyncMock(return_value=False)
 
-    with patch("asyncio.create_subprocess_exec", return_value=mock_proc) as mock_exec:
+    with patch("backend.app.services.spoolbuddy_ssh.asyncssh.connect", return_value=mock_conn) as mock_connect:
         rc, stdout, stderr = await _run_ssh_command("10.0.0.1", "echo hello", key_file)
 
     assert rc == 0
     assert stdout == "hello\n"
     assert stderr == ""
-    args = mock_exec.call_args[0]
-    assert "spoolbuddy@10.0.0.1" in args
-    assert "echo hello" in args
-    assert "BatchMode=yes" in args
+    kwargs = mock_connect.call_args.kwargs
+    assert kwargs["host"] == "10.0.0.1"
+    assert kwargs["username"] == "spoolbuddy"
+    assert kwargs["client_keys"] == [str(key_file)]
+    # Host-key verification is disabled (equivalent to StrictHostKeyChecking=no)
+    assert kwargs["known_hosts"] is None
+    mock_conn.run.assert_awaited_once()
+    run_args = mock_conn.run.call_args
+    assert run_args.args[0] == "echo hello"
+    # check=False — we handle non-zero exit codes ourselves
+    assert run_args.kwargs.get("check") is False
 
 
 @pytest.mark.asyncio
-async def test_run_ssh_command_failure(tmp_path):
+async def test_run_ssh_command_no_subprocess(tmp_path):
+    """Regression guard: _run_ssh_command must not spawn any subprocess.
+
+    The whole point of switching to asyncssh is to avoid `ssh`/`ssh-keygen`
+    calling getpwuid() inside Docker containers with arbitrary PUIDs.
+    """
     key_file = tmp_path / "key"
     key_file.write_text("KEY")
 
-    mock_proc = AsyncMock()
-    mock_proc.communicate = AsyncMock(return_value=(b"", b"Connection refused"))
-    mock_proc.returncode = 255
+    mock_result = MagicMock()
+    mock_result.stdout = ""
+    mock_result.stderr = ""
+    mock_result.exit_status = 0
+
+    mock_conn = AsyncMock()
+    mock_conn.run = AsyncMock(return_value=mock_result)
+    mock_conn.__aenter__ = AsyncMock(return_value=mock_conn)
+    mock_conn.__aexit__ = AsyncMock(return_value=False)
 
-    with patch("asyncio.create_subprocess_exec", return_value=mock_proc):
+    with (
+        patch("backend.app.services.spoolbuddy_ssh.asyncssh.connect", return_value=mock_conn),
+        patch("asyncio.create_subprocess_exec") as mock_exec,
+    ):
+        await _run_ssh_command("10.0.0.1", "echo hi", key_file)
+
+    mock_exec.assert_not_called()
+
+
+@pytest.mark.asyncio
+async def test_run_ssh_command_connection_failure(tmp_path):
+    """Connection errors should surface as rc=255 with the asyncssh message."""
+    import asyncssh
+
+    key_file = tmp_path / "key"
+    key_file.write_text("KEY")
+
+    with patch(
+        "backend.app.services.spoolbuddy_ssh.asyncssh.connect",
+        side_effect=asyncssh.Error(code=0, reason="Connection refused"),
+    ):
         rc, stdout, stderr = await _run_ssh_command("10.0.0.1", "echo hello", key_file)
 
     assert rc == 255
+    assert stdout == ""
     assert "Connection refused" in stderr
 
 
+@pytest.mark.asyncio
+async def test_run_ssh_command_os_error(tmp_path):
+    """OS-level connection errors (DNS, route) also map to rc=255."""
+    key_file = tmp_path / "key"
+    key_file.write_text("KEY")
+
+    with patch(
+        "backend.app.services.spoolbuddy_ssh.asyncssh.connect",
+        side_effect=OSError("Network is unreachable"),
+    ):
+        rc, _, stderr = await _run_ssh_command("10.0.0.1", "echo hello", key_file)
+
+    assert rc == 255
+    assert "Network is unreachable" in stderr
+
+
 @pytest.mark.asyncio
 async def test_run_ssh_command_timeout(tmp_path):
+    """asyncio.timeout should convert long-running commands into rc=-1."""
     key_file = tmp_path / "key"
     key_file.write_text("KEY")
 
-    mock_proc = AsyncMock()
-    mock_proc.communicate = AsyncMock(return_value=(b"", b""))
-    mock_proc.kill = MagicMock()
+    # asyncssh.connect() returns a _ConnectionManager synchronously; the hang
+    # must happen inside __aenter__ so the surrounding asyncio.timeout can
+    # cancel it.
+    mock_conn = AsyncMock()
 
-    async def fake_wait_for(coro, timeout):
-        # Consume the coroutine to avoid warning
-        coro.close()
-        raise TimeoutError
+    async def hang_enter():
+        await asyncio.sleep(10)
 
-    with (
-        patch("asyncio.create_subprocess_exec", return_value=mock_proc),
-        patch("backend.app.services.spoolbuddy_ssh.asyncio.wait_for", side_effect=fake_wait_for),
-    ):
-        rc, stdout, stderr = await _run_ssh_command("10.0.0.1", "sleep 999", key_file, timeout=1)
+    mock_conn.__aenter__ = AsyncMock(side_effect=hang_enter)
+    mock_conn.__aexit__ = AsyncMock(return_value=False)
+
+    with patch("backend.app.services.spoolbuddy_ssh.asyncssh.connect", return_value=mock_conn):
+        rc, _, stderr = await _run_ssh_command("10.0.0.1", "sleep 999", key_file, timeout=0.05)
 
     assert rc == -1
     assert "timed out" in stderr
-    mock_proc.kill.assert_called_once()
 
 
 # -- perform_ssh_update --------------------------------------------------------

+ 5 - 0
requirements.txt

@@ -20,6 +20,11 @@ aioftp>=0.22.0
 pyftpdlib>=2.0.0
 cryptography>=46.0.5
 
+# SpoolBuddy remote SSH updates (pure-Python SSH client; avoids the
+# OpenSSH `ssh` binary which calls getpwuid() and fails in Docker when
+# the container UID isn't in /etc/passwd)
+asyncssh>=2.18.0
+
 # 3MF Processing (standard zipfile is sufficient for Bambu 3MF files)
 defusedxml>=0.7.0  # Safe XML parsing (prevents XXE attacks)