Browse Source

Fix SpoolBuddy update failing in Docker with "no user exists for uid"

  The SpoolBuddy remote-update flow shelled out to `ssh-keygen` to create
  its update keypair on first use. Inside the Docker container the process
  runs under an arbitrary PUID that is not listed in /etc/passwd, so
  ssh-keygen aborted at the getpwuid() home-directory lookup with
  "no user exists for uid 1001" and the update button failed.

  Generate the ed25519 keypair in-process via the `cryptography` library
  (already a dependency) and serialize it in OpenSSH format. No subprocess,
  no /etc/passwd lookup. Native installs are unaffected.

  Added a regression test that asserts no subprocess is spawned during
  keypair creation so this can't come back.
maziggy 1 tháng trước cách đây
mục cha
commit
6774956565

+ 1 - 0
CHANGELOG.md

@@ -8,6 +8,7 @@ All notable changes to Bambuddy will be documented in this file.
 - **LDAP Default Fallback Group** — Settings → Authentication → LDAP → Advanced now has a "Default group" selector. When an LDAP user authenticates but is not listed in any mapped LDAP group, they are automatically assigned to this fallback group instead of being left without permissions. Previously such users could log in successfully but landed on empty pages because every permission check failed. Leave the setting empty to preserve the old behavior. A warning is logged each time the fallback is applied so administrators can spot missing group assignments.
 
 ### Fixed
+- **SpoolBuddy Update Fails in Docker with "no user exists for uid 1001"** — The SpoolBuddy remote-update flow shelled out to `ssh-keygen` to create its update keypair on first use. Inside the Docker container the process runs under an arbitrary PUID (default 1000, also seen as 1001) that is not listed in `/etc/passwd`, so `ssh-keygen` aborted at the `getpwuid()` home-directory lookup and the update button reported `ssh-keygen failed. no user exists for uid 1001`. The keypair is now generated in-process via the `cryptography` library (already a dependency), which has no user-database lookup and produces the same OpenSSH-format files. Native installs are unaffected — they already worked because the running user was always in `/etc/passwd`.
 - **Camera Stream "6 of 5" Reconnect Counter + ffmpeg Log Flood** ([#925](https://github.com/maziggy/bambuddy/issues/925)) — Two bugs surfaced while investigating camera reconnect behaviour. First, the camera page briefly displayed "Reconnecting attempt 6 of 5" before giving up, because the attempt counter could be incremented to the maximum while the reconnect banner was still rendering. The displayed value is now clamped to the configured maximum. Second, every failed ffmpeg spawn logged the full ~20-line ffmpeg version/configuration banner, producing hundreds of lines of noise per failed camera click (one reported click produced 555 log lines across 30 retries). A new stderr summarizer strips the ffmpeg banner before logging so only the actual error lines remain. The underlying "camera service stops accepting new connections after prolonged uptime" behaviour in the X1C firmware is still under investigation.
 - **LDAP POSIX Primary Group Ignored** — LDAP authentication only looked at groups that listed the user explicitly via `memberUid` (supplementary group membership). A user's POSIX primary group — referenced by the `gidNumber` attribute on the user object and matching the `gidNumber` on a `posixGroup` — was ignored entirely, so users whose role came from their primary group landed without the expected permissions. The authenticator now also searches for `posixGroup` entries whose `gidNumber` matches the user's primary `gidNumber`, and dedupes DNs case-insensitively before resolving the group mapping (LDAP DNs are case-insensitive by spec).
 - **Support Bundle Leaks Virtual Printer IP Address** — The debug support bundle included the `virtual_printer_remote_interface_ip` setting value unmasked in `support-info.json`. The setting key didn't match any of the existing sensitive-key filters, so the raw IP address was included in the bundle. Added `_ip` to the sensitive key filter so IP address settings are excluded from support bundles. Log file content was already covered by the existing IPv4 regex redaction.

+ 28 - 16
backend/app/services/spoolbuddy_ssh.py

@@ -11,6 +11,9 @@ import os
 import shutil
 from pathlib import Path
 
+from cryptography.hazmat.primitives import serialization
+from cryptography.hazmat.primitives.asymmetric import ed25519
+
 from backend.app.core.config import settings
 
 logger = logging.getLogger(__name__)
@@ -28,7 +31,14 @@ def _get_ssh_key_dir() -> Path:
 
 
 async def get_or_create_keypair() -> tuple[Path, Path]:
-    """Return (private_key_path, public_key_path), generating if missing."""
+    """Return (private_key_path, public_key_path), generating if missing.
+
+    Uses the in-process `cryptography` library instead of shelling out to
+    `ssh-keygen`. The subprocess approach fails inside Docker containers when
+    the image runs under an arbitrary UID (e.g. PUID=1001) that is not listed
+    in /etc/passwd — `ssh-keygen` calls `getpwuid()` for the current user's
+    home directory and aborts with "no user exists for uid <N>".
+    """
     key_dir = _get_ssh_key_dir()
     private_key = key_dir / "id_ed25519"
     public_key = key_dir / "id_ed25519.pub"
@@ -37,24 +47,26 @@ async def get_or_create_keypair() -> tuple[Path, Path]:
         return private_key, public_key
 
     logger.info("Generating SSH keypair for SpoolBuddy updates")
-    proc = await asyncio.create_subprocess_exec(
-        "ssh-keygen",
-        "-t",
-        "ed25519",
-        "-f",
-        str(private_key),
-        "-N",
-        "",  # no passphrase
-        "-C",
-        "bambuddy-spoolbuddy",
-        stdout=asyncio.subprocess.PIPE,
-        stderr=asyncio.subprocess.PIPE,
+    priv_obj = ed25519.Ed25519PrivateKey.generate()
+    pub_obj = priv_obj.public_key()
+
+    private_bytes = priv_obj.private_bytes(
+        encoding=serialization.Encoding.PEM,
+        format=serialization.PrivateFormat.OpenSSH,
+        encryption_algorithm=serialization.NoEncryption(),
     )
-    _, stderr = await proc.communicate()
-    if proc.returncode != 0:
-        raise RuntimeError(f"ssh-keygen failed: {stderr.decode()[:200]}")
+    public_bytes = pub_obj.public_bytes(
+        encoding=serialization.Encoding.OpenSSH,
+        format=serialization.PublicFormat.OpenSSH,
+    )
+    # OpenSSH public format has no comment field by default; append one to match
+    # the previous ssh-keygen output so the authorized_keys line is identifiable.
+    public_line = public_bytes + b" bambuddy-spoolbuddy\n"
 
+    private_key.write_bytes(private_bytes)
     private_key.chmod(0o600)
+    public_key.write_bytes(public_line)
+
     logger.info("SSH keypair generated at %s", key_dir)
     return private_key, public_key
 

+ 29 - 29
backend/tests/unit/services/test_spoolbuddy_ssh.py

@@ -53,45 +53,45 @@ async def test_get_or_create_keypair_returns_existing(tmp_path):
 
 @pytest.mark.asyncio
 async def test_get_or_create_keypair_generates_new(tmp_path):
+    """Key generation runs in-process via `cryptography` — no ssh-keygen subprocess.
+
+    This matters in Docker: when the container runs under an arbitrary PUID
+    that isn't in /etc/passwd, `ssh-keygen` aborts with "no user exists for uid
+    <N>". Generating the keypair in-process avoids the getpwuid() lookup.
+    """
+    from cryptography.hazmat.primitives import serialization
+    from cryptography.hazmat.primitives.asymmetric import ed25519
+
     with patch("backend.app.services.spoolbuddy_ssh.settings") as mock_settings:
         mock_settings.base_dir = tmp_path
 
-        ssh_dir = tmp_path / "spoolbuddy" / "ssh"
+        priv, pub = await get_or_create_keypair()
 
-        async def fake_keygen(*args, **kwargs):
-            # Simulate ssh-keygen creating the files
-            ssh_dir.mkdir(parents=True, exist_ok=True)
-            (ssh_dir / "id_ed25519").write_text("PRIVATE")
-            (ssh_dir / "id_ed25519.pub").write_text("PUBLIC")
-            mock_proc = AsyncMock()
-            mock_proc.communicate = AsyncMock(return_value=(b"", b""))
-            mock_proc.returncode = 0
-            return mock_proc
+        assert priv.exists()
+        assert pub.exists()
+        # Private key permissions — no world/group access
+        assert (priv.stat().st_mode & 0o077) == 0
 
-        with patch("asyncio.create_subprocess_exec", side_effect=fake_keygen) as mock_exec:
-            priv, pub = await get_or_create_keypair()
+        # Public key is a valid OpenSSH ed25519 key with our comment
+        pub_text = pub.read_text()
+        assert pub_text.startswith("ssh-ed25519 ")
+        assert pub_text.rstrip().endswith("bambuddy-spoolbuddy")
 
-            mock_exec.assert_called_once()
-            args = mock_exec.call_args[0]
-            assert "ssh-keygen" in args
-            assert "-t" in args
-            assert "ed25519" in args
+        # Private key is a valid OpenSSH-format ed25519 key we can load back
+        loaded = serialization.load_ssh_private_key(priv.read_bytes(), password=None)
+        assert isinstance(loaded, ed25519.Ed25519PrivateKey)
 
 
 @pytest.mark.asyncio
-async def test_get_or_create_keypair_raises_on_failure(tmp_path):
-    with patch("backend.app.services.spoolbuddy_ssh.settings") as mock_settings:
+async def test_get_or_create_keypair_does_not_shell_out(tmp_path):
+    """Regression guard: must not invoke any subprocess (fixes Docker PUID bug)."""
+    with (
+        patch("backend.app.services.spoolbuddy_ssh.settings") as mock_settings,
+        patch("asyncio.create_subprocess_exec") as mock_exec,
+    ):
         mock_settings.base_dir = tmp_path
-
-        mock_proc = AsyncMock()
-        mock_proc.communicate = AsyncMock(return_value=(b"", b"keygen error"))
-        mock_proc.returncode = 1
-
-        with (
-            patch("asyncio.create_subprocess_exec", return_value=mock_proc),
-            pytest.raises(RuntimeError, match="ssh-keygen failed"),
-        ):
-            await get_or_create_keypair()
+        await get_or_create_keypair()
+        mock_exec.assert_not_called()
 
 
 # -- get_public_key ------------------------------------------------------------