Browse Source

fix(test): stop sys.modules-deleting backend.app.main in test_code_quality
+ ci: shard backend tests 4-way + drop -v for ~3.5x wall-clock speedup

Root cause of the 4 CI failures on PR #1514 (all in
test_print_start_assigns_printer_id_to_vp_archive.py +
test_timelapse_baseline_restart_recovery.py): test_all_modules_importable
in test_code_quality.py was deleting backend.app.main from sys.modules
and re-importing it via importlib.import_module. That created NEW
module-level dicts (_timelapse_baselines, _expected_prints,
_active_prints, …) and re-ran root_logger.addHandler — hence the
duplicate log lines at the same microsecond in captured stderr.

Any sibling test that bound those names via "from backend.app.main
import _timelapse_baselines" before the reimport now held a reference
to the OLD dict; production code (reached via "from backend.app.main
import on_print_start") resolved the symbol through the NEW module
instance. Production mutated the new dict, the test read the old one,
the assertion saw None / un-mutated mock_archive.

Locally with -n 30, xdist load-balanced test_code_quality.py to a
different worker process so the collision never happened (which is
why the suite was green for me). CI's -n auto = -n 2 on ubuntu-latest
made the collision deterministic.

Fix: drop the "del sys.modules[name]" step. importlib.import_module
already returns the cached module if cached, or runs the import
machinery if not — either way, any import-time error surfaces. The
"fresh import" framing was theatre; in practice every module in the
list is already imported by other tests/fixtures before this test
runs, so we were never actually getting a fresh import anyway — just
destruction.

CI workflow tightening (separate concern, same PR since both touch
the test infrastructure):

- Dropped -v from the pytest invocation. 5300+ "PASSED foo::bar"
lines per worker were eating ~30-60s of stdout I/O on 2-vCPU
runners. --tb=short is sufficient for failure context.
- Sharded backend-tests into a 4-way matrix via pytest-split (new
dev dep). Each shard runs ~1326 tests in ~95s on a 2-vCPU runner;
all 4 run in parallel so wall-clock drops from 362s -> ~100s.
- fail-fast: false on the matrix so a single failing shard doesn't
hide failures in the other three — PRs see the complete failure
picture in one push.

maziggy 3 days ago
parent
commit
fdf818e54c
4 changed files with 44 additions and 7 deletions
  1. 21 3
      .github/workflows/ci.yml
  2. 0 0
      backend/=0.9.0
  3. 19 4
      backend/tests/unit/test_code_quality.py
  4. 4 0
      requirements-dev.txt

+ 21 - 3
.github/workflows/ci.yml

@@ -84,10 +84,17 @@ jobs:
             --ignore-vuln CVE-2025-45768
 
   backend-tests:
-    name: Backend Tests
+    name: Backend Tests (shard ${{ matrix.shard }}/4)
     runs-on: ubuntu-latest
     if: github.event_name == 'push' || github.event.pull_request.user.login != github.repository_owner
     needs: backend-lint
+    strategy:
+      # Don't cancel sibling shards if one fails — we want every shard's
+      # failure list, not just the first one, so a single PR push shows
+      # all broken tests in one go.
+      fail-fast: false
+      matrix:
+        shard: [1, 2, 3, 4]
     steps:
       - uses: actions/checkout@v4
 
@@ -110,11 +117,22 @@ jobs:
           pip install -r requirements.txt
           pip install -r requirements-dev.txt
 
-      - name: Run tests
+      - name: Run tests (shard ${{ matrix.shard }}/4)
         timeout-minutes: 10
         run: |
           cd backend
-          python -m pytest tests/ -v --tb=short --timeout=60 --timeout-method=thread -n auto
+          # -v dropped: 5300+ "PASSED foo::bar" lines per worker eat 30-60s
+          # of stdout I/O time on 2-vCPU runners. --tb=short is enough.
+          # --splits 4 --group N uses pytest-split to slice the collected
+          # test set roughly evenly across the 4 matrix shards; first run
+          # is name-hash-based, subsequent runs improve via .test_durations
+          # if you ever commit one (we don't — even the naive hash split
+          # gets us ≈25% per shard given the test mix here).
+          python -m pytest tests/ \
+            --tb=short \
+            --timeout=60 --timeout-method=thread \
+            -n auto \
+            --splits 4 --group ${{ matrix.shard }}
 
   # ============================================================================
   # Frontend Checks

+ 0 - 0
backend/=0.9.0


+ 19 - 4
backend/tests/unit/test_code_quality.py

@@ -218,9 +218,27 @@ class TestModuleImports:
         """Verify all Python modules can be imported without errors.
 
         This catches syntax errors and missing dependencies.
+
+        IMPORTANT: We must NOT ``del sys.modules[name]`` to force a fresh
+        import here. ``backend.app.main`` is a stateful module — re-importing
+        it builds NEW module-level dicts (_timelapse_baselines,
+        _expected_prints, _active_prints, …) and re-runs ``root_logger.
+        addHandler(console_handler)``. Any test that already bound those
+        names via ``from backend.app.main import _timelapse_baselines`` now
+        holds a stale reference, while production code resolves the symbol
+        through the new module instance — they're two different dicts. CI
+        under -n 2 puts test_code_quality.py on the same worker as
+        test_print_start_assigns_printer_id_to_vp_archive.py and
+        test_timelapse_baseline_restart_recovery.py, and those tests see
+        their mock_archive un-mutated / their baseline dict empty even
+        though production logged the mutations went through. Local -n 30
+        spreads the tests across workers and the collision never happens.
+
+        ``importlib.import_module`` already covers the "is this importable"
+        check — it returns the cached module if cached, or runs the import
+        machinery if not. Either way, an import-time error surfaces here.
         """
         import importlib
-        import sys
 
         # Modules to test importing
         modules = [
@@ -235,9 +253,6 @@ class TestModuleImports:
         errors = []
         for module_name in modules:
             try:
-                # Remove from cache first to ensure fresh import
-                if module_name in sys.modules:
-                    del sys.modules[module_name]
                 importlib.import_module(module_name)
             except Exception as e:
                 errors.append(f"{module_name}: {type(e).__name__}: {e}")

+ 4 - 0
requirements-dev.txt

@@ -4,6 +4,10 @@ pytest-asyncio>=0.23.0
 pytest-cov>=4.1.0
 pytest-xdist>=3.5.0
 pytest-timeout>=2.4.0
+# Test-suite sharding for CI matrix. Splits the test set evenly across N
+# shards via --splits/--group; falls back to test-name hashing on the first
+# run, and uses recorded durations on subsequent runs (.test_durations).
+pytest-split>=0.9.0
 httpx>=0.27.0
 ruff>=0.8.0
 pre-commit>=4.0