tractor/.claude/skills/run-tests/SKILL.md


Run the tractor test suite using pytest. Follow this process:

1. Parse user intent

From the user's message and any arguments, determine:

  • scope: full suite, specific file(s), specific test(s), or a keyword pattern (-k).
  • transport: which IPC transport protocol to test against (default: tcp, also: uds).
  • options: any extra pytest flags the user wants (e.g. --ll debug, --tpdb, -x, -v).

If the user provides a bare path or pattern as argument, treat it as the test target. Examples:

  • /run-tests → full suite
  • /run-tests test_local.py → single file
  • /run-tests test_registrar -v → file + verbose
  • /run-tests -k cancel → keyword filter
  • /run-tests tests/ipc/ --tpt-proto uds → subdir + UDS
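The intent-parsing step above can be sketched in Python. This is an illustrative helper, not part of the skill itself; the function name and the set of value-taking flags are assumptions based on the option list later in this document.

```python
import shlex

# Hypothetical sketch: split a raw `/run-tests` argument string into
# bare test targets (paths/patterns) and passthrough pytest flags.
VALUE_FLAGS = {'-k', '--ll', '--tpt-proto', '--spawn-backend'}

def parse_run_tests_args(raw: str) -> tuple[list[str], list[str]]:
    targets: list[str] = []
    flags: list[str] = []
    tokens = shlex.split(raw)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith('-'):
            flags.append(tok)
            # these flags consume the next token as their value
            if tok in VALUE_FLAGS and i + 1 < len(tokens):
                i += 1
                flags.append(tokens[i])
        else:
            targets.append(tok)
        i += 1
    return targets, flags

# e.g. `/run-tests tests/ipc/ --tpt-proto uds -v`
targets, flags = parse_run_tests_args('tests/ipc/ --tpt-proto uds -v')
```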

2. Construct the pytest command

Base command:

python -m pytest

Default flags (always include unless user overrides):

  • -x (stop on first failure)
  • --tb=short (concise tracebacks)
  • --no-header (reduce noise)

Path resolution:

  • If the user gives a bare filename like test_local.py, resolve it under tests/.
  • If the user gives a subdirectory like ipc/, resolve under tests/ipc/.
  • Glob if needed: tests/**/test_*<pattern>*.py
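The three resolution rules above can be sketched as a small helper. Purely illustrative: the function name is made up, and the glob-fallback ordering is an assumption about how the rules compose.

```python
from pathlib import Path

# Hypothetical sketch of the path-resolution rules: already-rooted
# paths pass through, bare filenames/subdirs resolve under tests/,
# anything else is treated as a glob pattern.
def resolve_target(arg: str, root: Path = Path('tests')) -> list[str]:
    p = Path(arg)
    # already rooted under tests/ -> leave as-is
    if p.parts and p.parts[0] == 'tests':
        return [str(p)]
    # bare filename or subdirectory -> resolve under tests/
    if arg.endswith('.py') or arg.endswith('/'):
        return [str(root / p)]
    # otherwise glob: tests/**/test_*<pattern>*.py
    matches = sorted(str(m) for m in root.glob(f'**/test_*{arg}*.py'))
    return matches or [str(root / p)]
```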

Key pytest options for this project:

| Flag | Purpose |
| --- | --- |
| --ll <level> | Set tractor log level (e.g. debug, info, runtime) |
| --tpdb / --debug-mode | Enable tractor's multi-proc debugger |
| --tpt-proto <key> | IPC transport: tcp (default) or uds |
| --spawn-backend <be> | Spawn method: trio (default), mp_spawn, mp_forkserver |
| -k <expr> | pytest keyword filter |
| -v / -vv | Verbosity |
| -s | No output capture (useful with --tpdb) |
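Combining the default flags with user-supplied targets and options can be sketched as below. The helper name is hypothetical; the "skip a default when the user passed that flag" rule is an assumption about how overrides should behave.

```python
# Sketch: build the final pytest argv from resolved targets and user
# flags, appending each default unless the user already set that flag.
DEFAULTS = ['-x', '--tb=short', '--no-header']

def build_cmd(targets: list[str], user_flags: list[str]) -> list[str]:
    flags = list(user_flags)
    for d in DEFAULTS:
        name = d.split('=')[0]
        # skip a default if the user already passed that flag
        # (e.g. a user --tb=long suppresses the default --tb=short)
        if not any(f.split('=')[0] == name for f in flags):
            flags.append(d)
    return ['python', '-m', 'pytest', *targets, *flags]
```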

Common combos:

# quick smoke test of core modules
python -m pytest tests/test_local.py tests/test_rpc.py -x --tb=short --no-header

# full suite, stop on first failure
python -m pytest tests/ -x --tb=short --no-header

# specific test with debug
python -m pytest tests/discovery/test_registrar.py::test_reg_then_unreg -x -s --tpdb --ll debug

# run with UDS transport
python -m pytest tests/ -x --tb=short --no-header --tpt-proto uds

# keyword filter
python -m pytest tests/ -x --tb=short --no-header -k "cancel and not slow"

3. Pre-flight: venv detection (MANDATORY)

Always verify a uv venv is active before running python or pytest. This project uses UV_PROJECT_ENVIRONMENT=py<MINOR> naming (e.g. py313) — never .venv.

Step 1: detect active venv

Run this check first:

python -c "
import sys, os
venv = os.environ.get('VIRTUAL_ENV', '')
prefix = sys.prefix
print(f'VIRTUAL_ENV={venv}')
print(f'sys.prefix={prefix}')
print(f'executable={sys.executable}')
"

Step 2: interpret results

Case A — venv is active (VIRTUAL_ENV is set and points to a py<MINOR>/ dir under the project root or worktree):

Use bare python / python -m pytest for all commands. This is the normal, fast path.

Case B — no venv active (VIRTUAL_ENV is empty or sys.prefix points to a system Python):

Use AskUserQuestion to ask the user:

“No uv venv is active. Should I activate one via UV_PROJECT_ENVIRONMENT=py<MINOR> uv sync, or would you prefer to activate your shell venv first?”

Options:

  1. "Create/sync venv" — run UV_PROJECT_ENVIRONMENT=py<MINOR> uv sync where <MINOR> is detected from python --version (e.g. 313 for 3.13). Then use py<MINOR>/bin/python for all subsequent commands in this session.
  2. "I'll activate it myself" — stop and let the user source py<MINOR>/bin/activate or similar.

Case C — inside a git worktree (git rev-parse --git-common-dir differs from --git-dir):

Verify Python resolves from the worktree's own venv, not the main repo's:

python -c "import tractor; print(tractor.__file__)"

If the path points outside the worktree, create a worktree-local venv:

UV_PROJECT_ENVIRONMENT=py<MINOR> uv sync

Then use py<MINOR>/bin/python for all commands.

Why this matters: without the correct venv, subprocesses spawned by tractor resolve modules from the wrong editable install, causing spurious AttributeError / ModuleNotFoundError.

Fallback: uv run

If the user can't or won't activate a venv, all python and pytest commands can be prefixed with UV_PROJECT_ENVIRONMENT=py<MINOR> uv run:

# instead of: python -m pytest tests/ -x
UV_PROJECT_ENVIRONMENT=py313 uv run pytest tests/ -x

# instead of: python -c 'import tractor'
UV_PROJECT_ENVIRONMENT=py313 uv run python -c 'import tractor'

uv run auto-discovers the project and venv, but is slower than a pre-activated venv due to lock-file resolution on each invocation. Prefer activating the venv when possible.

Step 3: import + collection checks

After venv is confirmed, always run these (especially after refactors or module moves):

# 1. package import smoke check
python -c 'import tractor; print(tractor)'

# 2. verify all tests collect (no import errors)
python -m pytest tests/ -x -q --co 2>&1 | tail -5

If either fails, fix the import error before running any actual tests.

Step 4: zombie-actor / stale-registry check (MANDATORY)

The tractor runtimes default registry address is 127.0.0.1:1616 (TCP) / /tmp/registry@1616.sock (UDS). Whenever any prior test run — especially one using a fork-based backend like subint_forkserver — leaks a child actor process, that zombie keeps the registry port bound and every subsequent test session fails to bind, often presenting as 50+ unrelated failures (“all tests broken”!) across backends.

This has to be checked before the first run AND after any cancelled/SIGINT'd run — a signal landing mid-test can leave orphan children.

# 1. TCP registry — any listener on :1616? (primary signal)
ss -tlnp 2>/dev/null | grep ':1616' || echo 'TCP :1616 free'

# 2. leftover actor/forkserver procs — scoped to THIS
#    repo's python path, so we don't false-flag legit
#    long-running tractor-using apps (e.g. `piker`,
#    downstream projects that embed tractor).
pgrep -af "$(pwd)/py[0-9]*/bin/python.*_actor_child_main|subint-forkserv" \
  | grep -v 'grep\|pgrep' \
  || echo 'no leaked actor procs from this repo'

# 3. stale UDS registry sockets
ls -la /tmp/registry@*.sock 2>/dev/null \
  || echo 'no leaked UDS registry sockets'

Interpretation:

  • TCP :1616 free AND no stale sockets → clean, proceed. The actor-procs probe is secondary — false positives are common (piker, any other tractor-embedding app); only clean up if :1616 is bound or sockets linger.

  • TCP :1616 bound OR stale sockets present → surface PIDs + cmdlines to the user, offer cleanup:

    # 1. kill test zombies scoped to THIS repo's python only
    #    (don't pkill by bare pattern — that'd nuke legit
    #    long-running tractor apps like piker)
    pkill -f "$(pwd)/py[0-9]*/bin/python.*_actor_child_main|subint-forkserv"
    sleep 0.3
    pkill -9 -f "$(pwd)/py[0-9]*/bin/python.*_actor_child_main|subint-forkserv" 2>/dev/null
    
    # 2. if a test zombie holds :1616 specifically and doesn't
    #    match the above pattern, find its PID the hard way:
    ss -tlnp 2>/dev/null | grep ':1616'   # prints `users:(("<name>",pid=NNNN,...))`
    # then: kill <NNNN>
    
    # 3. remove stale UDS sockets
    rm -f /tmp/registry@*.sock
    
    # 4. re-verify
    ss -tlnp 2>/dev/null | grep ':1616' || echo 'TCP :1616 now free'

Never ignore stale registry state. If you see the "all tests failing" pattern — especially trio.TooSlowError / connection refused / address in use on many unrelated tests — check registry state before spelunking into test code. The failure signature will be identical across backends because they're all fighting for the same port.
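As a cross-check when ss is unavailable, the ":1616 bound?" probe can be done from Python by attempting a bind on the default registry address cited above. This is a sketch; the function name is made up.

```python
import socket

# Try to bind the default tractor TCP registry address; an OSError
# (EADDRINUSE) means some process -- possibly a zombie actor -- is
# already holding the port.
def registry_port_free(host: str = '127.0.0.1', port: int = 1616) -> bool:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        return True
    except OSError:
        return False
    finally:
        s.close()
```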

False-positive warning for step 2: a plain pgrep -af '_actor_child_main' will also match legit long-running tractor-embedding apps (e.g. piker at ~/repos/piker/py*/bin/python3 -m tractor._child ...). Always scope to the current repos python path, or only use step 1 (:1616) as the authoritative signal.

4. Run and report

  • Run the constructed command.
  • Use a timeout of 600000ms (10min) for full suite runs, 120000ms (2min) for single-file runs.
  • If the suite is large (full tests/), consider running in the background and checking output when done.
  • Use --lf (last-failed) to re-run only previously failing tests when iterating on a fix.

On failure:

  • Show the failing test name(s) and short traceback.
  • If the failure looks related to recent changes, point out the likely cause and suggest a fix.
  • Check the known-flaky list (section 8) before investigating — don't waste time on pre-existing timeout issues.
  • NEVER auto-commit fixes. If you apply a code fix during test iteration, leave it unstaged. Tell the user what changed and suggest they review the worktree state, stage files manually, and use /commit-msg (inline or in a separate session) to generate the commit message. The human drives all git add and git commit operations.

On success:

  • Report the pass/fail/skip counts concisely.

5. Test directory layout (reference)

tests/
├── conftest.py          # root fixtures, daemon, signals
├── devx/                # debugger/tooling tests
├── ipc/                 # transport protocol tests
├── msg/                 # messaging layer tests
├── discovery/           # discovery subsystem tests
│   ├── test_multiaddr.py  # multiaddr construction
│   └── test_registrar.py  # registry/discovery protocol
├── test_local.py        # registrar + local actor basics
├── test_rpc.py          # RPC error handling
├── test_spawning.py     # subprocess spawning
├── test_multi_program.py  # multi-process tree tests
├── test_cancellation.py # cancellation semantics
├── test_context_stream_semantics.py  # ctx streaming
├── test_inter_peer_cancellation.py   # peer cancel
├── test_infected_asyncio.py  # trio-in-asyncio
└── ...

6. Change-type → test mapping

After modifying specific modules, run the corresponding test subset first for fast feedback:

| Changed module(s) | Run these tests first |
| --- | --- |
| runtime/_runtime.py, runtime/_state.py | test_local.py test_rpc.py test_spawning.py test_root_runtime.py |
| discovery/ (_registry, _discovery, _addr) | tests/discovery/ test_multi_program.py test_local.py |
| _context.py, _streaming.py | test_context_stream_semantics.py test_advanced_streaming.py |
| ipc/ (_chan, _server, _transport) | tests/ipc/ test_2way.py |
| runtime/_portal.py, runtime/_rpc.py | test_rpc.py test_cancellation.py |
| spawn/ (_spawn, _entry) | test_spawning.py test_multi_program.py |
| devx/debug/ | tests/devx/test_debugger.py (slow!) |
| to_asyncio.py | test_infected_asyncio.py test_root_infect_asyncio.py |
| msg/ | tests/msg/ |
| _exceptions.py | test_remote_exc_relay.py test_inter_peer_cancellation.py |
| runtime/_supervise.py | test_cancellation.py test_spawning.py |

7. Quick-check shortcuts

After refactors (fastest first-pass):

# import + collect check
python -c 'import tractor' && python -m pytest tests/ -x -q --co 2>&1 | tail -3

# core subset (~10s)
python -m pytest tests/test_local.py tests/test_rpc.py tests/test_spawning.py tests/discovery/test_registrar.py -x --tb=short --no-header

Inspect last failures (without re-running):

When the user asks “what failed?”, “show failures”, or wants to check the last-failed set before re-running — read the pytest cache directly. This is instant and avoids test collection overhead.

python -c "
import json, pathlib, sys
p = pathlib.Path('.pytest_cache/v/cache/lastfailed')
if not p.exists():
    print('No lastfailed cache found.'); sys.exit()
data = json.loads(p.read_text())
# filter to real test node IDs (ignore junk
# entries that can accumulate from system paths)
tests = sorted(k for k in data if k.startswith('tests/'))
if not tests:
    print('No failures recorded.')
else:
    print(f'{len(tests)} last-failed test(s):')
    for t in tests:
        print(f'  {t}')
"

Why not --cache-show or --co --lf?

  • pytest --cache-show 'cache/lastfailed' works but dumps raw dict repr including junk entries (stale system paths that leak into the cache).
  • pytest --co --lf actually collects tests, which triggers import resolution and is slow (~0.5s+). Worse, when cached node IDs don't exactly match current parametrize IDs (e.g. param names changed between runs), pytest falls back to collecting the entire file, giving false positives.
  • Reading the JSON directly is instant, filterable to tests/-prefixed entries, and shows exactly what pytest recorded — no interpretation.

After inspecting, re-run the failures:

python -m pytest --lf -x --tb=short --no-header

Full suite in background:

When core tests pass and you want full coverage while continuing other work, run in background:

python -m pytest tests/ -x --tb=short --no-header -q

(use run_in_background=true on the Bash tool)

8. Known flaky tests

These tests have pre-existing timing/environment sensitivity. If they fail with TooSlowError or pexpect TIMEOUT, they are almost certainly NOT caused by your changes — note them and move on.

| Test | Typical error | Notes |
| --- | --- | --- |
| devx/test_debugger.py::test_multi_nested_subactors_error_through_nurseries | pexpect TIMEOUT | Debugger pexpect timing |
| test_cancellation.py::test_cancel_via_SIGINT_other_task | TooSlowError | Signal handling race |
| test_inter_peer_cancellation.py::test_peer_spawns_and_cancels_service_subactor | TooSlowError | Async timing (both param variants) |
| test_docs_examples.py::test_example[we_are_processes.py] | assert None == 0 | __main__ missing __file__ in subproc |

Rule of thumb: if a test fails with TooSlowError, trio.TooSlowError, or pexpect.TIMEOUT and you didn't touch the relevant code path, it's flaky — skip it.