yunoadmin
synced commits to 0.2.4.3 at yunoadmin/bambuddy from mirror
66fea70564 ci(docker): full backend suite in Docker, 4-way matrix shard, GHA cache backend
Earlier patch trimmed the duplicate unit-test re-run from docker-test
to drop a 5-10 min job that wasn't adding coverage. But "wasn't adding
coverage" only holds for pure-logic tests — system-touching tests
(ffmpeg version probes, FTP clients, subprocess shell-outs,
locale/timezone-sensitive assertions, paths) can genuinely pass on the
GHA host and fail in python:3.13-slim. Curation via a `docker_env`
marker is fragile (new tests get forgotten); gating on `main` only
defers the cost without removing it.
Instead, run the full backend suite IN Docker on every PR but make
it fast:
- New docker-backend-tests job runs the same 4-way pytest-split
matrix as the host backend-tests, just inside the test image.
- docker/setup-buildx-action + docker/build-push-action@v5 with
cache-from/cache-to: type=gha,scope=backend-test persist the
BuildKit cache (pip-install layer included) across CI runs and
across the 4 sibling shards. Cold build is ~150s/shard; warm
build drops to ~10s/shard.
- fail-fast: false so a single failing shard does not cancel the
others; the remaining shards still run and report their output.
Total CI wall-clock for a PR push is now gated by docker-test (the
image-build + integration HTTP smoke + integration test suite job)
at ~3 min, not by the unit-test re-run anymore.
The earlier ci.yml step that ran `docker compose run --rm backend-test`
synchronously in the docker-test job stays removed: the new
docker-backend-tests matrix covers the same ground and is much faster.
1b271a8ead ci(docker): stop re-running unit tests inside the test image
The "Docker Build" job in ci.yml was running the same 5287 backend
tests + 2022 frontend tests inside the bambuddy-backend-test /
bambuddy-frontend-test images that the host-side backend-tests and
frontend-tests jobs had already run. Same test code, same Python
version (env.PYTHON_VERSION), same requirements.txt the test image
installs. On 2-vCPU GHA runners that re-run added 5-10 min of
wall-clock for zero new coverage — and "frontend tests in Docker"
added another 2-3 min for the same reason.
Drop both steps from the CI job. Keep everything that validates the
Docker IMAGE specifically: production image build, backend module
import verification, static-files-copied check, integration
container bring-up + health/API/static HTTP smoke checks, and the
integration test suite (which IS genuinely Docker-specific — it
runs against the live container via BAMBUDDY_TEST_URL).
test_docker.sh keeps the unit-test reruns because devs running it
locally don't have a separate host-side pytest job to compare
against.
Combined with the earlier 4-way pytest-split shard on the host
backend-tests job, expected PR-push wall-clock drops from
~10-12 min to ~3 min, gated on max(backend-tests shard, frontend
tests, docker-image-build+integration).
e377b4aaa9 ci(docker): drop -v, -n auto instead of -n 30, pip cache mount
Three things were making the Docker test runs noisier and slower than
they needed to be:
1. -v was hardcoded in the Dockerfile.test:35 CMD and in
docker-compose.test.yml's integration-test-runner command. The ci.yml
change to drop -v from the bare pytest call missed both: Docker runs
use the image's CMD, not the workflow's.
2. -n 30 was hardcoded as the xdist worker count. On a 2-vCPU CI box
that's 30 Python processes fighting over 2 cores — mostly IPC and
import-thrash overhead. -n auto adapts to the host: 2 on CI, 30
on a 30-core dev box. Same final-result throughput on the dev
box, much better on small runners.
3. pip install had --no-cache-dir and no BuildKit cache mount, so
every Docker build re-fetched ~50 packages from PyPI (~60-90s
on a cold pip cache). Adding
`RUN --mount=type=cache,target=/root/.cache/pip` (with the
`# syntax=docker/dockerfile:1.7` directive that enables it) makes
subsequent builds re-use the download cache so they only do
install work, ~5s instead of
~90s. DOCKER_BUILDKIT=1 is already exported in test_docker.sh
and is the GHA default since runner image 2023, so the cache
mount is always honoured.
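To make item 2 concrete, here is a minimal sketch of how a worker setting
maps to process count and oversubscription. It is a simplified model, not
pytest-xdist's actual detection logic; resolve_workers and oversubscription
are hypothetical helper names.

```python
import os


def resolve_workers(setting, cpu_count=None):
    """Return the worker count for a '-n' style setting (simplified model)."""
    if cpu_count is None:
        # Assumption: the logical CPU count is a fair proxy for "auto".
        cpu_count = os.cpu_count() or 1
    if setting == "auto":
        return cpu_count
    return int(setting)


def oversubscription(setting, cpu_count):
    """Worker processes per available core; 1.0 means no contention."""
    return resolve_workers(setting, cpu_count) / cpu_count
```

With cpu_count=2, "30" gives 15 processes per core while "auto" gives 1;
with cpu_count=30 both settings give 1.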
Verified locally: Docker build is 19s warm (was ~90s cold each
time), test run is 102s with 5287 passed / 1 skipped (the
by-design spoolbuddy importorskip) — clean output, no [gwN]
worker spam, no "created: 30/30 workers" startup line.
GHA-side per-run cold-build slowness still happens because GHA
runners are ephemeral; a follow-up using docker/build-push-action
with type=gha cache backend would persist the BuildKit cache
across CI runs but that's a bigger workflow change.
fdf818e54c fix(test): stop sys.modules-deleting backend.app.main in test_code_quality
+ ci: shard backend tests 4-way + drop -v for ~3.5x wall-clock speedup
Root cause of the 4 CI failures on PR #1514 (all in
test_print_start_assigns_printer_id_to_vp_archive.py +
test_timelapse_baseline_restart_recovery.py): test_all_modules_importable
in test_code_quality.py was deleting backend.app.main from sys.modules
and re-importing it via importlib.import_module. That created NEW
module-level dicts (_timelapse_baselines, _expected_prints,
_active_prints, …) and re-ran root_logger.addHandler — hence the
duplicate log lines at the same microsecond in captured stderr.
Any sibling test that bound those names via "from backend.app.main
import _timelapse_baselines" before the reimport now held a reference
to the OLD dict; production code (reached via "from backend.app.main
import on_print_start") resolved the symbol through the NEW module
instance. Production mutated the new dict, the test read the old one,
the assertion saw None / un-mutated mock_archive.
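The stale-reference effect can be reproduced in isolation. The sketch below
uses a throwaway module named fake_main as a hypothetical stand-in for
backend.app.main; only the dict and function names mirror the real ones.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for backend.app.main: one module-level dict and
# one function that mutates it.
SOURCE = textwrap.dedent(
    """
    _timelapse_baselines = {}

    def on_print_start(printer_id):
        _timelapse_baselines[printer_id] = 3
    """
)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "fake_main.py").write_text(SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # A sibling test binds the dict by name before the reimport.
            from fake_main import _timelapse_baselines as old_dict

            # What the importability test used to do.
            del sys.modules["fake_main"]
            importlib.import_module("fake_main")

            # Production code now resolves through the NEW module object.
            from fake_main import on_print_start
            on_print_start(1)

            new_dict = sys.modules["fake_main"]._timelapse_baselines
            return old_dict.get(1), new_dict.get(1), old_dict is new_dict
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("fake_main", None)
```

demo() returns (None, 3, False): the old binding never sees the write, which
is the same shape as the failing assertions.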
Locally with -n 30, xdist load-balanced test_code_quality.py to a
different worker process so the collision never happened (which is
why the suite was green for me). CI's -n auto = -n 2 on ubuntu-latest
made the collision deterministic.
Fix: drop the "del sys.modules[name]" step. importlib.import_module
already returns the cached module if cached, or runs the import
machinery if not — either way, any import-time error surfaces. The
"fresh import" framing was theatre; in practice every module in the
list is already imported by other tests/fixtures before this test
runs, so we were never actually getting a fresh import anyway — just
destruction.
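A minimal sketch of an importability check without the sys.modules deletion,
assuming a plain list of module names. check_importable is a hypothetical
helper, not the actual body of test_all_modules_importable.

```python
import importlib
import sys


def check_importable(names):
    """Import each module by name and collect import-time failures."""
    failures = {}
    for name in names:
        try:
            # Returns the cached module if present, otherwise runs the
            # normal import machinery. Nothing is evicted from sys.modules.
            importlib.import_module(name)
        except Exception as exc:
            failures[name] = repr(exc)
    return failures


def is_cached_identity(name):
    """True if import_module hands back the object already in sys.modules."""
    first = importlib.import_module(name)
    return importlib.import_module(name) is first is sys.modules[name]
```

Because the cached object is returned unchanged, module-level state that
other tests already hold references to stays intact.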
CI workflow tightening (separate concern, same PR since both touch
the test infrastructure):
- Dropped -v from the pytest invocation. 5300+ "PASSED foo::bar"
lines per worker were eating ~30-60s of stdout I/O on 2-vCPU
runners. --tb=short is sufficient for failure context.
- Sharded backend-tests into a 4-way matrix via pytest-split (new
dev dep). Each shard runs ~1326 tests in ~95s on a 2-vCPU runner;
all 4 run in parallel so wall-clock drops from 362s -> ~100s.
- fail-fast: false on the matrix so a single failing shard doesn't
hide failures in the other three — PRs see the complete failure
picture in one push.
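Why sharding helps can be sketched with a greedy duration-based partition:
wall-clock becomes the load of the slowest shard rather than the sum. This
is an illustrative algorithm, not necessarily what pytest-split does
internally; split_by_duration and the test names are hypothetical.

```python
import heapq


def split_by_duration(durations, shards):
    """Assign the slowest remaining test to the currently lightest shard."""
    heap = [(0.0, index) for index in range(shards)]
    heapq.heapify(heap)
    groups = [[] for _ in range(shards)]
    loads = [0.0] * shards
    ordered = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    for name, seconds in ordered:
        load, index = heapq.heappop(heap)
        groups[index].append(name)
        loads[index] = load + seconds
        heapq.heappush(heap, (loads[index], index))
    return groups, loads


def wall_clock(loads):
    """Shards run in parallel, so the slowest shard gates the job."""
    return max(loads)
```

For example, durations {"a": 4, "b": 3, "c": 2, "d": 1} split two ways give
groups [["a", "d"], ["b", "c"]] with loads [5.0, 5.0], so the wall-clock is
5 instead of the serial total of 10.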
b92bdb7d09 fix(test): snapshot _timelapse_baselines inside the patch context to dodge CI race
test_running_observed_captures_baseline_on_restart_recovery was reading
_timelapse_baselines.get(1) after the patch() with-block exited.
Locally and under low parallelism this works fine — the dict still
holds what _capture_timelapse_baseline_at_start wrote. CI under
xdist's default load-balancing scheduling intermittently saw the
dict empty by the time the top-level assert ran, even though the
production code logged "Baseline at print start: 3 video files for
printer 1" right before returning. The duplicate log line at the
same microsecond in the captured stderr is the tell — module state
is being re-touched between the handler completing and the test
asserting, almost certainly via the session-scoped event_loop
fixture in conftest.py interacting badly with the per-file
autouse _clear_baselines teardown of a sibling test on the same
worker.
The test is verifying the handler captured the baseline at the
moment it returned, so capture the relevant value at exactly
that point — inside the with-block, immediately after the await.
That's immune to whatever happens to the module-level dict
afterward.
1 day ago