Преглед изворни кода

fix(test): stop sys.modules-deleting backend.app.main in test_code_quality
+ ci: shard backend tests 4-way + drop -v for ~3.5x wall-clock speedup

Root cause of the 4 CI failures on PR #1514 (all in
test_print_start_assigns_printer_id_to_vp_archive.py +
test_timelapse_baseline_restart_recovery.py): test_all_modules_importable
in test_code_quality.py was deleting backend.app.main from sys.modules
and re-importing it via importlib.import_module. That created NEW
module-level dicts (_timelapse_baselines, _expected_prints,
_active_prints, …) and re-ran root_logger.addHandler — hence the
duplicate log lines at the same microsecond in captured stderr.

Any sibling test that bound those names via "from backend.app.main
import _timelapse_baselines" before the reimport now held a reference
to the OLD dict; production code (reached via "from backend.app.main
import on_print_start") resolved the symbol through the NEW module
instance. Production mutated the new dict, the test read the old one,
the assertion saw None / un-mutated mock_archive.

Locally with -n 30, xdist load-balanced test_code_quality.py to a
different worker process so the collision never happened (which is
why the suite was green for me). CI's -n auto = -n 2 on ubuntu-latest
made the collision deterministic.

Fix: drop the "del sys.modules[name]" step. importlib.import_module
already returns the cached module if cached, or runs the import
machinery if not — either way, any import-time error surfaces. The
"fresh import" framing was theatre; in practice every module in the
list is already imported by other tests/fixtures before this test
runs, so we were never actually getting a fresh import anyway — just
destruction.

CI workflow tightening (separate concern, same PR since both touch
the test infrastructure):

- Dropped -v from the pytest invocation. 5300+ "PASSED foo::bar"
lines per worker were eating ~30-60s of stdout I/O on 2-vCPU
runners. --tb=short is sufficient for failure context.
- Sharded backend-tests into a 4-way matrix via pytest-split (new
dev dep). Each shard runs ~1326 tests in ~95s on a 2-vCPU runner;
all 4 run in parallel so wall-clock drops from 362s -> ~100s.
- fail-fast: false on the matrix so a single failing shard doesn't
hide failures in the other three — PRs see the complete failure
picture in one push.

maziggy пре 3 дана
родитељ
комит
ca08f1f340
4 измењених фајлова са 44 додато и 7 уклоњено
  1. 21 3
      .github/workflows/ci.yml
  2. 0 0
      backend/=0.9.0
  3. 19 4
      backend/tests/unit/test_code_quality.py
  4. 4 0
      requirements-dev.txt

+ 21 - 3
.github/workflows/ci.yml

@@ -84,10 +84,17 @@ jobs:
             --ignore-vuln CVE-2025-45768
             --ignore-vuln CVE-2025-45768
 
 
   backend-tests:
   backend-tests:
-    name: Backend Tests
+    name: Backend Tests (shard ${{ matrix.shard }}/4)
     runs-on: ubuntu-latest
     runs-on: ubuntu-latest
     if: github.event_name == 'push' || github.event.pull_request.user.login != github.repository_owner
     if: github.event_name == 'push' || github.event.pull_request.user.login != github.repository_owner
     needs: backend-lint
     needs: backend-lint
+    strategy:
+      # Don't cancel sibling shards if one fails — we want every shard's
+      # failure list, not just the first one, so a single PR push shows
+      # all broken tests in one go.
+      fail-fast: false
+      matrix:
+        shard: [1, 2, 3, 4]
     steps:
     steps:
       - uses: actions/checkout@v4
       - uses: actions/checkout@v4
 
 
@@ -110,11 +117,22 @@ jobs:
           pip install -r requirements.txt
           pip install -r requirements.txt
           pip install -r requirements-dev.txt
           pip install -r requirements-dev.txt
 
 
-      - name: Run tests
+      - name: Run tests (shard ${{ matrix.shard }}/4)
         timeout-minutes: 10
         timeout-minutes: 10
         run: |
         run: |
           cd backend
           cd backend
-          python -m pytest tests/ -v --tb=short --timeout=60 --timeout-method=thread -n auto
+          # -v dropped: 5300+ "PASSED foo::bar" lines per worker eat 30-60s
+          # of stdout I/O time on 2-vCPU runners. --tb=short is enough.
+          # --splits 4 --group N uses pytest-split to slice the collected
+          # test set roughly evenly across the 4 matrix shards; first run
+          # is name-hash-based, subsequent runs improve via .test_durations
+          # if you ever commit one (we don't — even the naive hash split
+          # gets us ≈25% per shard given the test mix here).
+          python -m pytest tests/ \
+            --tb=short \
+            --timeout=60 --timeout-method=thread \
+            -n auto \
+            --splits 4 --group ${{ matrix.shard }}
 
 
   # ============================================================================
   # ============================================================================
   # Frontend Checks
   # Frontend Checks

+ 0 - 0
backend/=0.9.0


+ 19 - 4
backend/tests/unit/test_code_quality.py

@@ -218,9 +218,27 @@ class TestModuleImports:
         """Verify all Python modules can be imported without errors.
         """Verify all Python modules can be imported without errors.
 
 
         This catches syntax errors and missing dependencies.
         This catches syntax errors and missing dependencies.
+
+        IMPORTANT: We must NOT ``del sys.modules[name]`` to force a fresh
+        import here. ``backend.app.main`` is a stateful module — re-importing
+        it builds NEW module-level dicts (_timelapse_baselines,
+        _expected_prints, _active_prints, …) and re-runs ``root_logger.
+        addHandler(console_handler)``. Any test that already bound those
+        names via ``from backend.app.main import _timelapse_baselines`` now
+        holds a stale reference, while production code resolves the symbol
+        through the new module instance — they're two different dicts. CI
+        under -n 2 puts test_code_quality.py on the same worker as
+        test_print_start_assigns_printer_id_to_vp_archive.py and
+        test_timelapse_baseline_restart_recovery.py, and those tests see
+        their mock_archive un-mutated / their baseline dict empty even
+        though production logged the mutations went through. Local -n 30
+        spreads the tests across workers and the collision never happens.
+
+        ``importlib.import_module`` already covers the "is this importable"
+        check — it returns the cached module if cached, or runs the import
+        machinery if not. Either way, an import-time error surfaces here.
         """
         """
         import importlib
         import importlib
-        import sys
 
 
         # Modules to test importing
         # Modules to test importing
         modules = [
         modules = [
@@ -235,9 +253,6 @@ class TestModuleImports:
         errors = []
         errors = []
         for module_name in modules:
         for module_name in modules:
             try:
             try:
-                # Remove from cache first to ensure fresh import
-                if module_name in sys.modules:
-                    del sys.modules[module_name]
                 importlib.import_module(module_name)
                 importlib.import_module(module_name)
             except Exception as e:
             except Exception as e:
                 errors.append(f"{module_name}: {type(e).__name__}: {e}")
                 errors.append(f"{module_name}: {type(e).__name__}: {e}")

+ 4 - 0
requirements-dev.txt

@@ -4,6 +4,10 @@ pytest-asyncio>=0.23.0
 pytest-cov>=4.1.0
 pytest-cov>=4.1.0
 pytest-xdist>=3.5.0
 pytest-xdist>=3.5.0
 pytest-timeout>=2.4.0
 pytest-timeout>=2.4.0
+# Test-suite sharding for CI matrix. Splits the test set evenly across N
+# shards via --splits/--group; falls back to test-name hashing on the first
+# run, and uses recorded durations on subsequent runs (.test_durations).
+pytest-split>=0.9.0
 httpx>=0.27.0
 httpx>=0.27.0
 ruff>=0.8.0
 ruff>=0.8.0
 pre-commit>=4.0
 pre-commit>=4.0