tractor

Commit Graph

Author	SHA1	Message	Date
Gud Boi	781abf7558	Flip back to default `pytest` capture for CI (cherry picked from commit `22cdf15b73`)	2026-06-09 20:27:26 -04:00
Gud Boi	23b8a80e15	Add posix-multithreaded-`fork()` explainer doc (cherry picked from commit `532a9834f3`)	2026-06-09 20:27:26 -04:00
Gud Boi	d4e4062bbd	Add todo for running `test_debugger` suite on forkserver spawner (cherry picked from commit `2917b74ba4`)	2026-06-09 20:27:26 -04:00
Gud Boi	e3834f2d95	Route `stackscope` SIGUSR1 onto trio loop Signal handlers fire in a non-trio stack frame; calling `stackscope.extract(recurse_child_tasks=True)` from there only walks the `<init>` task and misses everything inside `async_main`'s nurseries — exactly the part you want to see during a hang. Fix: capture `trio.lowlevel.current_trio_token()` at `enable_stack_on_sig()` time and stash it as a module-level `_trio_token`. The SIGUSR1 handler then dispatches the dump onto the trio loop via `_trio_token.run_sync_soon(_safe_dump_task_tree)`, so `stackscope.extract` runs from a real trio-task context and walks the full nursery tree. Late-binding: pytest's `pytest_configure` calls `enable_stack_on_sig()` outside any `trio.run`, so token capture there raises `RuntimeError` and the token is left at `None`. The runtime re-calls `enable_stack_on_sig()` from inside `async_main` (subactor side) where the token IS available, so subactors get the full-tree path. `dump_tree_on_sig` falls back to a direct call when `_trio_token is None` (parent process pre-trio.run, or signal delivered after `trio.run` returns). `_safe_dump_task_tree()` is a `run_sync_soon`-friendly wrapper that swallows any exception from `dump_task_tree()` — trio prints + crashes on uncaught exceptions in scheduled callbacks; better to log + keep the run alive so the user can re-trigger. Other, - emit `capture-bypass tee: <fpath>` line + `tail -f` hint in the rendered dump header so users know where to find the artifact even when stdio is captured. - swap the inline `f' |_{actor}'` line for a `_pformat.nest_from_op` rendering of `actor_repr` (matches the rest of the runtime's nested-op style). - log lines on handler install + already-installed branches now note `(trio_token captured: <bool>)` so it's obvious from the log whether the full-tree path is wired. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `2d4995e08d`)	2026-06-09 20:27:26 -04:00
Gud Boi	b0bca7c81e	Add `--enable-stackscope` pytest plugin flag New `--enable-stackscope` CLI flag installs a SIGUSR1 → trio-task-tree-dump handler in pytest itself + every spawned subactor for live stack visibility during hang investigations. Lighter than `--tpdb` (no pdb machinery / tty-lock contention) — pure stack-only triage. Plumbing: - `_testing.pytest.pytest_addoption()` adds the flag. - `_testing.pytest.pytest_configure()` (when flag set): * exports `TRACTOR_ENABLE_STACKSCOPE=1` so fork-children inherit it via environ, * installs the handler in pytest itself via `enable_stack_on_sig()`. - `runtime._runtime.Actor.async_main()` extends the existing `_debug_mode` gate to ALSO fire when `TRACTOR_ENABLE_STACKSCOPE` is in env — so subactors install the same handler at runtime startup. Capture-bypass tee in `dump_task_tree()`: Pytest's default `--capture=fd` swallows `log.devx()` output, making SIGUSR1 dumps invisible right when you need them. Render the dump once to a `full_dump` str, then unconditionally tee to: - `/tmp/tractor-stackscope-<pid>.log` (append-mode, always written) — guaranteed-readable artifact even under CI / `nohup` / no-tty. `tail -f` to follow. - `/dev/tty` (best-effort) — pytest never captures the tty; ignored if device is missing. Other, - squelch the benign `RuntimeWarning` ("coroutine method 'asend'/'athrow' was never awaited") from `stackscope._glue`'s import-time async-gen type introspection so `--enable-stackscope` setup stays quiet. - log msg in the `_runtime` ImportError branch now mentions `--enable-stackscope` alongside debug-mode. Usage, pytest --enable-stackscope -k <hang-test> # in another shell, find the pid + signal: kill -USR1 <pytest-or-subactor-pid> # tail the artifact: tail -f /tmp/tractor-stackscope-<pid>.log (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `5418f2dc3c`)	2026-06-09 20:27:26 -04:00
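The capture-bypass tee described in the `--enable-stackscope` commit can be sketched roughly as below. This is a minimal illustration and not the project's `dump_task_tree()`: the helper name `tee_dump` and its `log_dir` parameter are hypothetical. Only the `tractor-stackscope-<pid>.log` file name, the append mode, and the best-effort `/dev/tty` write are taken from the commit message.

```python
import os
from pathlib import Path


def tee_dump(full_dump: str, log_dir: str = "/tmp") -> Path:
    """Append a rendered dump to a per-pid log file, then best-effort
    echo it to the controlling tty. (Hypothetical helper name.)"""
    fpath = Path(log_dir) / f"tractor-stackscope-{os.getpid()}.log"

    # Append mode, so repeated signals accumulate in one artifact that
    # can be followed with `tail -f`.
    with fpath.open("a") as f:
        f.write(full_dump + "\n")

    # The tty device is outside pytest's capture; ignore any failure
    # (no controlling terminal, CI, `nohup`, ...).
    try:
        with open("/dev/tty", "w") as tty:
            tty.write(full_dump + "\n")
    except OSError:
        pass

    return fpath
```

Rendering the dump once to a string and writing it to both sinks keeps the file artifact and the terminal output identical.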
Gud Boi	ff7acfcbd6	Backend-aware `fail_after` in pub/sub test Mirror `060f7d24`'s pattern (backend-aware timeout in `maybe_expect_raises`) for `test_dynamic_pub_sub`'s hard `trio.fail_after` cap. Fork-based backends pay per-spawn fork+IPC-handshake cost which stacks over `cpus - 1` sequential `n.run_in_actor()` calls; empirically 12s flakes on `main_thread_forkserver` under UDS cross-pytest contention (#451 / #452). Defaults: - `main_thread_forkserver` → 30s - everything else → 12s (unchanged) Hoist the timeout-pick out of the `main()` closure so the dispatch happens once in the trio task rather than re-evaluating per spawn. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `383b0fdd75`)	2026-06-09 20:27:26 -04:00
Gud Boi	6f003d7efd	Backend-aware timeout in `maybe_expect_raises` Default `timeout` from `int = 3` → `int|None = None`; when unset, pick a backend-aware value. Fork-based backends (`main_thread_forkserver`) need real headroom bc actor spawn + IPC ctx-exit + msg-validation error path is much heavier than under `trio` backend — especially under cross-pytest-stream contention (#451). Defaults: - `main_thread_forkserver` → 30s - everything else → 3s (unchanged) Empirical flake history that motivated 30s as the floor on fork backends (all from `test_basic_payload_spec`): - 3s → all-valid variant flaked w/ `TooSlowError` - 8s → `invalid-return` variant flaked w/ `Cancelled` (surfaced instead of `MsgTypeError` bc the outer `fail_after` fired mid-error-path) - 15s → flaked under cross-pytest-stream contention 30s gives plenty of headroom while still failing-loud on a genuine hang. Callers can opt out by passing an explicit `timeout=` kw. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `060f7d24c4`)	2026-06-09 20:27:26 -04:00
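The backend-aware default in the commit above amounts to a small lookup with an explicit-override escape hatch. The sketch below assumes a hypothetical helper `pick_timeout` and a hypothetical `_BACKEND_TIMEOUTS` table. Only the backend name and the 30s / 3s values come from the commit message.

```python
from typing import Optional

# Hypothetical table; the values are the defaults quoted in the commit.
_BACKEND_TIMEOUTS: dict[str, float] = {
    "main_thread_forkserver": 30.0,
}
_DEFAULT_TIMEOUT: float = 3.0


def pick_timeout(start_method: str, timeout: Optional[float] = None) -> float:
    """Return the caller's explicit timeout, else a per-backend default."""
    if timeout is not None:
        # An explicit `timeout=` always wins (the opt-out path).
        return timeout
    return _BACKEND_TIMEOUTS.get(start_method, _DEFAULT_TIMEOUT)
```

Using `None` as the "unset" sentinel is what lets a caller still pass a small explicit value on a fork-based backend.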
Gud Boi	bbf4fe66e3	Drop global `pytest-timeout` cap from `pyproject.toml` `timeout = 200` was firing via SIGALRM (the default `method='signal'`) which synchronously raises `Failed` in trio's main thread mid-`epoll.poll()`, abandoning trio's runner mid-flight and leaving `GLOBAL_RUN_CONTEXT` half- installed. EVERY subsequent `trio.run()` in the same pytest session then bails with `RuntimeError: Attempted to call run() from inside a run()`. Empirical impact: a session that hits a single 200s hang cascades into 30-40 false-positive failures across every downstream test file that uses `trio.run`. Recent UDS run saw 1 real timeout (`test_unregistered_err_still_relayed`) poison 38 sibling tests with cascade-fails — a debugging nightmare. Same architectural bug we already documented in `tests/test_advanced_streaming.py::test_dynamic_pub_sub` (see its module-level NOTE) — both `pytest-timeout` enforcement modes are incompatible with trio under fork- based spawn backends. Now scoped session-wide. For tests that legitimately need a wall-clock cap, the canonical pattern is `with trio.fail_after(N):` INSIDE the test — trio's own `Cancelled` machinery cleanly unwinds the actor nursery without disturbing global state. For CI: rely on job-level wall-clock timeouts (e.g. GitHub Actions `timeout-minutes`) to abort genuinely-stuck suites. `pyproject.toml` comment block spells this all out so a future contributor doesn't reach back for `timeout =` and re-introduce the bug. ALSO, bump `xonsh` to at least `0.23.0` release. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `3c366cac13`)	2026-06-09 20:27:26 -04:00
Gud Boi	28ad06be8c	Return parent `pid: int` from new `reap_subactors_per_test` fixture (cherry picked from commit `f8178df0fd`)	2026-06-09 20:27:26 -04:00
Gud Boi	2d9a95d13a	Use `trio.fail_after` cap in `test_dynamic_pub_sub` Drop `@pytest.mark.timeout(...)` for the per-test wall-clock cap on `test_dynamic_pub_sub`; rely on `trio.fail_after(12)` inside `main()` instead. Both pytest-timeout enforcement modes are incompatible with trio under fork-based backends: - `method='signal'` (SIGALRM) synchronously raises `Failed` in trio's main thread mid-`epoll.poll()`, leaving `GLOBAL_RUN_CONTEXT` half-installed ("Trio guest run got abandoned") so EVERY subsequent `trio.run()` in the same pytest process bails with `RuntimeError: Attempted to call run() from inside a run()` — full-session poison. - `method='thread'` calls `_thread.interrupt_main()` which can let the KBI escape trio's `KIManager` under fork- cascade teardown races and bubble out of pytest entirely — kills the whole session. `trio.fail_after()` keeps cancellation inside the trio loop: - Raises `TooSlowError` cleanly through the open-nursery's cancel cascade. - Doesn't disturb any out-of-band signal/thread state. - Failure stays scoped to the single test — no cross-test global state corruption either way. Verified empirically: 10 hammer-runs of `test_dynamic_pub_sub` go from 5/10 fail (with global-state poison) to 3/10 fail (no poison, all sibling tests still pass). The ~30% remaining flake rate is a genuine fork-cancel-cascade hang — separate from this fix but no longer contaminates. Module-level NOTE comment explains the rationale so future readers don't re-introduce the bug. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `530160fa69`)	2026-06-09 20:27:26 -04:00
Gud Boi	3315a8a292	Add opt-in `reap_subactors_per_test` fixture Function-scoped, NON-autouse zombie-subactor reaper for modules whose teardown is known-leaky enough to cascade- fail every following test in a session. Sibling to the autouse session-scoped `_reap_orphaned_subactors`. The session-scoped one fires at session end — too late to save tests that follow a hung/leaky test in the suite. The new fixture, opted into via `pytestmark = pytest.mark.usefixtures(...)`, runs between tests in a problem-module so a leftover subactor from test N can't squat on registrar ports / UDS paths / shm segments needed by tests N+1, N+2, ... Intentionally NOT autouse — the fixture's presence on a module signals "this module's teardown leaks; please root-cause instead of relying forever on cleanup". A visibility-vs-convenience trade picked in favor of the former. Apply to `tests/test_infected_asyncio.py` since both recent full-suite runs (parallel-tpt-proto + TCP-only) showed the cascade originating in this file's KBI- and SIGINT-flavored tests under `main_thread_forkserver`. Module-comment names the specific offenders so future de-flake work has a starting point. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `b376eb0332`)	2026-06-09 20:27:26 -04:00
Gud Boi	b7056d8da9	Fix `_testing.addr.get_rando_addr` cross-process collisions Previously the random port was a default-arg expression (`_rando_port: str = random.randint(1000, 9999)`) — evaluated ONCE at module import time, making it a per-process singleton. Two parallel pytest sessions had a 1/9000 birthday-pair chance of picking the same port; when it hit, every `reg_addr`-using test in BOTH runs would cascade-fail with "Address already in use". Switch to per-call `random.randint()` salted with `os.getpid()` so: - within one session: two calls return distinct ports — e.g. `test_tpt_bind_addrs::bind-subset-reg` now actually gets two different reg addrs on the TCP backend (it was silently duplicating before), - across parallel sessions: pid salt biases each process's port choices apart, making cross-run collisions vanishingly rare. Drop the bogus `: str` annotation (was always `int`). UDS already gets per-process isolation via `UDSAddress.get_random()`'s `@<pid>` socket-path suffix, so no change needed there. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `7c5dd4d033`)	2026-06-09 20:27:26 -04:00
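The default-argument pitfall behind the `get_rando_addr` fix is easy to reproduce in isolation. The function names below are hypothetical, and the way the pid is mixed into the draw is only one possible reading of "salted with `os.getpid()`", not necessarily what the project does.

```python
import os
import random


def rando_port_buggy(_port: int = random.randint(1000, 9999)) -> int:
    # The default expression ran once, when the `def` statement executed,
    # so every call in this process returns the same number.
    return _port


def rando_port_fixed(lo: int = 1000, hi: int = 9999) -> int:
    # Draw per call, then offset by the pid (assumed salting scheme) and
    # wrap back into the inclusive range [lo, hi].
    span = hi - lo + 1
    return lo + (random.randrange(span) + os.getpid()) % span
```

Calling `rando_port_buggy()` repeatedly yields a single value per process, while `rando_port_fixed()` yields a fresh in-range port each time.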
Gud Boi	5b08c6b034	Sweep `subint_forkserver` → `main_thread_forkserver` in code After the variant-1 / variant-2 backend split, update remaining string-match refs to the variant-1 backend so user-visible gates + skip-marks + comments name the working backend correctly: - `tractor._root._DEBUG_COMPATIBLE_BACKENDS`: include `main_thread_forkserver`, drop the stub-only `subint_forkserver` entry. - `tests/test_spawning.py::test_loglevel_propagated_to_subactor`: capfd-skip flips to `main_thread_forkserver`. - `tests/test_infected_asyncio.py::test_sigint_closes_lifetime_stack`: xfail-condition flips to `main_thread_forkserver`. - `tests/test_shm.py`: drop stale "broken on `main_thread_forkserver`" reason-text since the `mp.SharedMemory(track=False)` + resource-tracker monkey-patch in `.ipc._mp_bs` makes the tests pass; the skip-mark only fires on plain `subint` now. - Comment / docstring sweep: `runtime._state`, `runtime._runtime`, `_testing.pytest`, `_subint.py`, `pyproject.toml`, `test_cancellation.py`, `test_registrar.py` — refs to variant-1 backend updated. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `205382a39b`) (factored: dropped spawn-backend-only path: tractor/spawn/_subint.py)	2026-06-09 20:27:26 -04:00
Gud Boi	1f7403abc2	Wire `reg_addr` into `test_context_stream_semantics` Same wire-up pattern as the prior `test_dynamic_pub_sub` commit: each test that already pulled in `debug_mode` now also pulls in `reg_addr` and passes `registry_addrs=[reg_addr]` into `tractor.open_nursery()`, so the suite's standard registry-addr conventions apply. Tests touched: - `test_started_misuse` - `test_simple_context` - `test_parent_cancels` - `test_one_end_stream_not_opened` - `test_maybe_allow_overruns_stream` - `test_ctx_with_self_actor` (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `66f1941f46`)	2026-06-09 20:27:26 -04:00
Gud Boi	ed00b75a7b	Wire `test_dynamic_pub_sub` to standard fixtures Pull in the `reg_addr`, `debug_mode`, and `test_log` fixtures so this test follows the same conventions as the rest of the suite: - pass `registry_addrs=[reg_addr]` + `debug_mode` into `tractor.open_nursery()` (so `--tpdb` etc work). - after the `pytest.raises` block, add `assert err` + `test_log.exception('Timed out AS EXPECTED')` so the expected timeout is logged explicitly instead of swallowed. Also, - drop whitespace-only blank lines around the `subs` param of `consumer()` and `ctx` param of `one_task_streams_and_one_handles_reqresp()`. - promote `test_sigint_both_stream_types`'s one-line docstring to multi-line form. (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `9b05f659b3`)	2026-06-09 20:27:26 -04:00
Gud Boi	8daf8eeaca	Bump `test_stale_entry_is_deleted`'s timeout to 30 Seems that when run in-suite it delays more than the so-measured "happy path" timing; better to have no suite-global interruption than to assert a fast single-test run. (cherry picked from commit `65fcfbf224`)	2026-06-09 20:27:26 -04:00
Gud Boi	8598da2d3a	Add `--shm` orphan sweep to `tractor-reap` Since `tractor.ipc._mp_bs.disable_mantracker()` turns off `mp.resource_tracker` entirely (see the conc-anal doc `subint_forkserver_mp_shared_memory_issue.md`), a hard-crashing actor can leave `/dev/shm/<key>` segments that nothing else GCs. New `tractor-reap` phase 2 sweeps them. Deats, - `tractor/_testing/_reap.py`: add `find_orphaned_shm()` + `reap_shm()` helpers. Match criteria: regular file under `/dev/shm`, owned by current uid, AND no live proc has it open (mmap'd or fd-held). In-use enumeration via `psutil.Process.memory_maps()` + `.open_files()` — xplatform, kernel-canonical (same answer `lsof` would give), no reliance on tractor-specific shm-key naming. - `_ensure_shm_supported()` guard: helpers raise `NotImplementedError` outside Linux/FreeBSD bc macOS POSIX shm has no fs-visible path (`shm_open` only) and Windows is a different story. - `scripts/tractor-reap`: new `--shm` (run after process reap) and `--shm-only` (skip process phase) flags. `-n` dry-runs both phases. Exit code is `1` if either phase had survivors/errors. - `pyproject.toml` + `uv.lock`: add `psutil>=7.0.0` to the `testing` dep group; lazy-imported in `_reap.py` so the process-reap path stays import-clean without it. Also, - doc `--shm` in `.claude/skills/run-tests/SKILL.md` (new section 10c) — covers match criteria + the preservation guarantee for unrelated apps. - flip mitigation status in `subint_forkserver_mp_shared_memory_issue.md` from "could extend `tractor-reap`" to "implemented", with a note that callers should still UUID-pin shm keys to avoid cross-session collisions. Verified locally vs 81 in-use segments held by `piker`, `lttng-ust-`, `aja-shm-` — all preserved; only the genuinely-orphaned tractor segments got unlinked. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `4f12d69b41`) (factored: dropped subint_forkserver conc-anal doc update)	2026-06-09 20:27:26 -04:00
Gud Boi	2e2977b74c	Fix `SharedMemory` under `subint_forkserver` Implements the resolution described in c99d475d's `subint_forkserver_mp_shared_memory_issue.md` (now updated with the resolution post-mortem). Two-part fix that side-steps `mp.resource_tracker` entirely rather than try to make it fork-safe — turns out that's both simpler AND more correct given tractor already SC-manages allocation lifetimes. Deats, - `tractor/ipc/_mp_bs.py::disable_mantracker()`: drop the `platform.python_version_tuple()[:-1] >= ('3', '13')` branch — patches now run unconditionally: * monkey-patch `mp.resource_tracker._resource_tracker` to a no-op `ManTracker` subclass (empty `register` / `unregister` / `ensure_running`). * return `partial(SharedMemory, track=False)` for the per-allocation opt-out. * belt + suspenders: even if something dodges the wrapper, the singleton can't talk to the inherited (broken) parent fd. - `tractor/ipc/_shm.py::open_shm_list()`: drop the 3.13+ conditional skip of the unlink-callback; install a `try_unlink()` wrapper that swallows `FileNotFoundError` (sibling-already-cleaned race in shared-key setups). Without `mp.resource_tracker` doing it for us, we own the unlink — `actor.lifetime_stack` is the right place since tractor already controls actor lifecycle. - `tests/test_shm.py`: uncomment-out `subint_forkserver` from the module-level skip-list (tests pass now). Inline comment cross-refs the two `_mp_bs` / `_shm` workarounds. - `ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md`: heavy rewrite — flips status from "open / unresolvable in tractor" to "resolved, kept as decision record". Adds Resolution section, "Why this is the right call" rationale (mp tracker is widely criticized; tractor already owns lifecycle), trade-offs (crash-leaked segments, lost mp leak warning), verification (7 passed under both `subint_forkserver` and `trio` backends), and upstream issue links. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `aa3e230926`) (factored: dropped subint_forkserver conc-anal doc update)	2026-06-09 20:27:26 -04:00
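The `try_unlink()` idea from the commit above (tolerate a sibling having already removed the segment) can be shown with a plain file standing in for a shared-memory segment. This is a sketch under that assumption. The project's wrapper operates on its shm objects, and the path-based signature and boolean return value here are hypothetical.

```python
import os


def try_unlink(path: str) -> bool:
    """Unlink `path`; return False if it was already gone."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        # A sibling sharing the same key already cleaned up; with no
        # resource tracker involved this is benign, not an error.
        return False
    return True
```

Swallowing only `FileNotFoundError` keeps real failures, such as permission errors, visible to the caller.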
Gud Boi	da0c457ff7	Document `SharedMemory` × `subint_forkserver` incompat New `ai/conc-anal/` doc: `mp.SharedMemory` is fork-without-exec unsafe — child inherits parent's `resource_tracker` fd → EBADF on first shm op; leaked `/shm_list` cascades `FileExistsError` across parametrize variants. Canonical CPython issue class, NOT a tractor bug. Includes two longer-term mitigation paths (reset inherited tracker fd vs migrate off `mp.shared_memory`). Also, update `tests/test_shm.py`: - comment out `subint_forkserver` from skip list - rewrite reason with precise failure-mode descriptions + link to the analysis doc (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `c99d475d03`) (factored: dropped spawn-backend-only paths: ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md)	2026-06-09 20:27:26 -04:00
Gud Boi	352adc64a8	Add `tractor-reap` CLI + document auto-reap New `scripts/tractor-reap` CLI wraps the `_testing._reap` mod for manual zombie-subactor cleanup after crashed pytest sessions. Two modes: - orphan-mode (default): finds PPid==1 procs with cwd matching repo root + `python` in cmdline. - descendant-mode (`--parent <pid>`): scoped sweep under a still-live supervisor. SC-polite: SIGINT with bounded grace window (default 3s) before escalating to SIGKILL. Exit code signals whether escalation was needed (useful for CI health-checks). Also, document both the auto-reap fixture and the CLI in `/run-tests` SKILL.md (section 10). (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `6d76b60404`)	2026-06-09 20:27:26 -04:00
Gud Boi	2df05e8225	Add `_testing._reap` + auto-reap fixture Zombie-subactor cleanup for the test suite, SC-polite discipline (`SIGINT` first, bounded grace, `SIGKILL` only on survivors). Two parts: a shared reaper module + an autouse session-end fixture that runs it. Deats, - new `tractor/_testing/_reap.py` (+230 LOC) — Linux- only reaper using `/proc/<pid>/{status,cwd,cmdline}` inspection. Two detection modes: - `find_descendants(parent_pid)` for the in-session case (PPid-direct-match while pytest is still alive). - `find_orphans(repo_root)` for the CLI / post- mortem case (`PPid==1` reparented to init + `cwd` filter to repo root + `python` cmdline filter). - `reap(pids, *, grace=3.0, poll=0.25)` does the signal ladder: SIGINT all, poll up to `grace` for exit, SIGKILL any survivors. Returns `(signalled, killed)` for caller-side reporting. - new `_reap_orphaned_subactors` session-scoped autouse fixture in `tractor/_testing/pytest.py` — after `yield`, runs `find_descendants(os.getpid())` + `reap(...)` so each pytest session leaves no surviving forks. - companion CLI scaffolding lives at `scripts/tractor-reap` (separate commit) for the pytest-died-mid-session case where the in-session fixture didn't get to run. Also, - promote `from tractor.spawn._spawn import SpawnMethodKey` to module-top in `pytest.py` (was inline-imported inside `pytest_generate_tests`), and reuse it in `pytest_collection_modifyitems` to assert each `skipon_spawn_backend` mark arg is a valid spawn-method literal — catches typos at collection time. - inline `# ?TODO` flags running these through the `try_set_backend` checker for stronger validation. Cross-refs `feedback_sc_graceful_cancel_first.md` for the SIGINT-before-SIGKILL discipline rationale. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `eae478f3d5`)	2026-06-09 20:27:26 -04:00
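The SIGINT-then-SIGKILL ladder described for `reap(pids, *, grace=3.0, poll=0.25)` might look roughly like the sketch below. It is not the project's implementation: the commit says the real module inspects `/proc`, whereas this version uses a signal-0 probe, and the `send` / `alive` parameters are hypothetical additions so the ladder can be exercised without real processes.

```python
import os
import signal
import time
from typing import Callable, Iterable


def pid_alive(pid: int) -> bool:
    """Signal-0 liveness probe (also true for unreaped zombie children)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def reap(
    pids: Iterable[int],
    *,
    grace: float = 3.0,
    poll: float = 0.25,
    send: Callable[[int, int], None] = os.kill,  # hypothetical test hook
    alive: Callable[[int], bool] = pid_alive,    # hypothetical test hook
) -> tuple[list[int], list[int]]:
    # Step 1: polite request, SIGINT to every target that still exists.
    signalled: list[int] = []
    for pid in pids:
        try:
            send(pid, signal.SIGINT)
            signalled.append(pid)
        except ProcessLookupError:
            pass  # already gone

    # Step 2: poll for exits until the grace window closes.
    deadline = time.monotonic() + grace
    survivors = [p for p in signalled if alive(p)]
    while survivors and time.monotonic() < deadline:
        time.sleep(poll)
        survivors = [p for p in survivors if alive(p)]

    # Step 3: escalate to SIGKILL for survivors only.
    killed: list[int] = []
    for pid in survivors:
        try:
            send(pid, signal.SIGKILL)
            killed.append(pid)
        except ProcessLookupError:
            pass  # exited between the last poll and now
    return signalled, killed
```

A non-empty `killed` list is the signal that escalation was needed, which matches the commit's description of returning `(signalled, killed)` for caller-side reporting.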
Gud Boi	13053f9cbe	Skip `test_loglevel_propagated_to_subactor` on subint forkserver too (cherry picked from commit `2ca0f41e61`)	2026-06-09 20:22:23 -04:00
Gud Boi	a199aa5096	Wire `reg_addr` through infected-asyncio tests Continues the hygiene pattern from `de601676` (cancel tests) into `tests/test_infected_asyncio.py`: many tests here were calling `tractor.open_nursery()` w/o `registry_addrs=[reg_addr]` and thus racing on the default `:1616` registry across sessions. Thread the session-unique `reg_addr` through so leaked or slow-to-teardown subactors from a prior test can't cross-pollute. Deats, - add `registry_addrs=[reg_addr]` to `open_nursery()` calls in suite where missing. - `test_sigint_closes_lifetime_stack`: - add `reg_addr`, `debug_mode`, `start_method` fixture params - `delay` now reads the `debug_mode` param directly instead of calling `tractor.debug_mode()` (fires slightly earlier in the test lifecycle) - sanity assert `if debug_mode: assert tractor.debug_mode()` after nursery open - new print showing SIGINT target (`send_sigint_to` + resolved pid) - catch `trio.TooSlowError` around `ctx.wait_for_result()` and conditionally `pytest.xfail` when `send_sigint_to == 'child' and start_method == 'subint_forkserver'` — the known orphan-SIGINT limitation tracked in `ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md` - parametrize id typo fix: `'just_trio_slee'` → `'just_trio_sleep'` (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `b350aa09ee`)	2026-06-09 20:22:23 -04:00
Gud Boi	ba2e474d9d	Import-or-skip `.devx.` tests requiring `greenback` Which is for sure true on py3.14+ rn since `greenlet` didn't want to build for us (yet). (cherry picked from commit `d6e70e9de4`)	2026-06-09 20:22:23 -04:00
Gud Boi	e4c7ac34db	Default `pytest` to use `--capture=sys` Lands the capture-pipe workaround from the prior cluster of diagnosis commits: switch pytest's `--capture` mode from the default `fd` (redirects fd 1,2 to temp files, which fork children inherit and can deadlock writing into) to `sys` (only `sys.stdout` / `sys.stderr` — fd 1,2 left alone). Trade-off documented inline in `pyproject.toml`: - LOST: per-test attribution of raw-fd output (C-ext writes, `os.write(2, ...)`, subproc stdout). Still goes to terminal / CI capture, just not per-test-scoped in the failure report. - KEPT: `print()` + `logging` capture per-test (tractor's logger uses `sys.stderr`). - KEPT: `pytest -s` debugging behavior. This allows us to re-enable `test_nested_multierrors` without skip-marking + clears the class of pytest-capture-induced hangs for any future fork-based backend tests. Deats, - `pyproject.toml`: `'--capture=sys'` added to `addopts` w/ ~20 lines of rationale comment cross-ref'ing the post-mortem doc - `test_cancellation`: drop `skipon_spawn_backend('subint_forkserver')` from `test_nested_multierrors` — no longer needed. * file-level `pytestmark` covers any residual. - `tests/spawn/test_subint_forkserver.py`: orphan-SIGINT test's xfail mark loosened from `strict=True` to `strict=False` + reason rewritten. * it passes in isolation but is session-env-pollution sensitive (leftover subactor PIDs competing for ports / inheriting harness FDs). * tolerate both outcomes until suite isolation improves. - `test_shm`: extend the existing `skipon_spawn_backend('subint', ...)` to also skip `'subint_forkserver'`. * Different root cause from the cancel-cascade class: `multiprocessing.SharedMemory`'s `resource_tracker` + internals assume fresh-process state, don't survive fork-without-exec cleanly - `tests/discovery/test_registrar.py`: bump timeout 3→7s on one test (unrelated to forkserver; just a flaky-under-load bump). - `tractor.spawn._subint_forkserver`: inline comment-only future-work marker right before `_actor_child_main()` describing the planned conditional stdout/stderr-to-`/dev/null` redirect for cases where `--capture=sys` isn't enough (no code change — the redirect logic itself is deferred). EXTRA NOTEs ----------- The `--capture=sys` approach is the minimum-invasive fix: just a pytest ini change, no runtime code change, works for all fork-based backends, trade-offs well-understood (terminal-level capture still happens, just not pytest's per-test attribution of raw-fd output). (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `4c133ab541`) (factored: dropped spawn-backend-only paths: tests/spawn/test_subint_forkserver.py + tractor/spawn/_subint_forkserver.py; the xfail-loosening bullet above no longer applies)	2026-06-09 20:22:23 -04:00
Gud Boi	45c442060b	Codify capture-pipe hang lesson in skills Encode the hard-won lesson from the forkserver cancel-cascade investigation into two skill docs so future sessions grep-find it before spelunking into trio internals. Deats, - `.claude/skills/conc-anal/SKILL.md`: - new "Unbounded waits in cleanup paths" section — rule: bound every `await X.wait()` in cleanup paths with `trio.move_on_after()` unless the setter is unconditionally reachable. Recent example: `ipc_server.wait_for_no_more_peers()` in `async_main`'s finally (was unbounded, deadlocked when any peer handler stuck) - new "The capture-pipe-fill hang pattern" section — mechanism, grep-pointers to the existing `conftest.py` guards (`tests/conftest .py:258`, `:316`), cross-ref to the full post-mortem doc, and the grep-note: "if a multi-subproc tractor test hangs, `pytest -s` first, conc-anal second" - `.claude/skills/run-tests/SKILL.md`: new "Section 9: The pytest-capture hang pattern (CHECK THIS FIRST)" with symptom / cause / pre-existing guards to grep / three-step debug recipe (try `-s`, lower loglevel, redirect stdout/stderr) / signature of this bug vs. a real code hang / historical reference Cost several investigation sessions before the capture-pipe issue surfaced — it was masked by deeper cascade deadlocks. Once the cascades were fixed, the tree tore down enough to generate pipe-filling log volume. Lesson: grep this pattern first when any multi-subproc tractor test hangs under default pytest but passes with `-s`. (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `4106ba73ea`)	2026-06-09 20:21:58 -04:00
Gud Boi	828df7df79	Update `subint_forkserver` skip reason: capture-pipe Refresh the `test_nested_multierrors` skip-mark reason to the final diagnosis: the hang is pytest's default `--capture=fd` pipe filling from high-volume subactor traceback output inherited via fds 1,2 in fork children — `pytest -s` passes cleanly. Records the fix direction (redirect child stdio to `/dev/null` in the fork-child prelude) for whoever lands the backend. (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `eceed29d4a`) (factored: kept only the tests/test_cancellation.py skip-reason update of "Pin forkserver hang to pytest `--capture=fd`"; dropped the subint conc-anal doc + tests/spawn/test_subint_forkserver.py)	2026-06-09 20:21:58 -04:00
Gud Boi	41813ac9a0	Bound peer-clear wait in `async_main` finally Fifth diagnostic pass pinpointed the hang to `async_main`'s finally block — every stuck actor reaches `FINALLY ENTER` but never `RETURNING`. Specifically `await ipc_server.wait_for_no_more_peers()` never returns when a peer-channel handler is stuck: the `_no_more_peers` Event is set only when `server._peers` empties, and stuck handlers keep their channels registered. Wrap the call in `trio.move_on_after(3.0)` + a warning-log on timeout that records the still-connected peer count. 3s is enough for any graceful cancel-ack round-trip; beyond that we're in bug territory and need to proceed with local teardown so the parent's `_ForkedProc.wait()` can unblock. Defensive-in-depth regardless of the underlying bug — a local finally shouldn't block on remote cooperation forever. Verified: with this fix, ALL 15 actors reach `async_main: RETURNING` (up from 10/15 before). Test still hangs past 45s though — there's at least one MORE unbounded wait downstream of `async_main`. Candidates enumerated in the doc update (`open_root_actor` finally / `actor.cancel()` internals / trio.run bg tasks / `_serve_ipc_eps` finally). Skip-mark stays on `test_nested_multierrors[subint_forkserver]`. Also updates `subint_forkserver_test_cancellation_leak_issue.md` with the new pinpoint + summary of the 6-item investigation win list: 1. FD hygiene fix (`_close_inherited_fds`) — orphan-SIGINT closed 2. pidfd-based `_ForkedProc.wait` — cancellable 3. `_parent_chan_cs` wiring — shielded parent-chan loop now breakable 4. `wait_for_no_more_peers` bound — THIS commit 5. Ruled-out hypotheses: tree-kill missing, stuck socket recv, capture-pipe fill (all wrong) 6. Remaining unknown: at least one more unbounded wait in the teardown cascade above `async_main` (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `e312a68d8a`) (factored: dropped subint_forkserver conc-anal doc update)	2026-06-09 20:21:26 -04:00
Gud Boi	1d70d33d9a	Claude-perms: ensure /commit-msg files can be written! (cherry picked from commit `76d12060aa`)	2026-06-09 20:20:52 -04:00
Gud Boi	555f64fdf2	Skip-mark `subint_forkserver` nested-multierror hang Skip-mark the still-hanging `test_nested_multierrors[subint_forkserver]` via `@pytest.mark.skipon_spawn_backend('subint_forkserver', reason=...)` so it stops blocking the test matrix while the remaining bug is being chased. The mark is an inert no-op until that (in-dev) backend lands. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `506617c695`) (factored: kept only the tests/test_cancellation.py skip-mark; dropped the subint_forkserver conc-anal doc update)	2026-06-09 20:20:29 -04:00
Gud Boi	d9a99d9c48	Break parent-chan shield during teardown Completes the nested-cancel deadlock fix started in `0cd0b633` (fork-child FD scrub) and `fe540d02` (pidfd- cancellable wait). The remaining piece: the parent- channel `process_messages` loop runs under `shield=True` (so normal cancel cascades don't kill it prematurely), and relies on EOF arriving when the parent closes the socket to exit naturally. Under exec-spawn backends (`trio_proc`, mp) that EOF arrival is reliable — parent's teardown closes the handler-task socket deterministically. But fork- based backends like `subint_forkserver` share enough process-image state that EOF delivery becomes racy: the loop parks waiting for an EOF that only arrives after the parent finishes its own teardown, but the parent is itself blocked on `os.waitpid()` for THIS actor's exit. Mutual wait → deadlock. Deats, - `async_main` stashes the cancel-scope returned by `root_tn.start(...)` for the parent-chan `process_messages` task onto the actor as `_parent_chan_cs` - `Actor.cancel()`'s teardown path (after `ipc_server.cancel()` + `wait_for_shutdown()`) calls `self._parent_chan_cs.cancel()` to explicitly break the shield — no more waiting for EOF delivery, unwinding proceeds deterministically regardless of backend - inline comments on both sites explain the mutual- wait deadlock + why the explicit cancel is backend-agnostic rather than a forkserver-specific workaround With this + the prior two fixes, the `subint_forkserver` nested-cancel cascade unwinds cleanly end-to-end. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `8ac3dfeb85`)	2026-06-09 20:19:56 -04:00
Gud Boi	222784ccc8	Use SIGINT-first ladder in `run-tests` cleanup The previous cleanup recipe went straight to SIGTERM+SIGKILL, which hides bugs: tractor is structured concurrent — `_trio_main` catches SIGINT as an OS-cancel and cascades `Portal.cancel_actor` over IPC to every descendant. So a graceful SIGINT exercises the actual SC teardown path; if it hangs, that's a real bug to file (the forkserver `:1616` zombie was originally suspected to be one of these but turned out to be a teardown gap in `_ForkedProc.kill()` instead). Deats, - step 1: `pkill -INT` scoped to `$(pwd)/py*` — no sleep yet, just send the signal - step 2: bounded wait loop (10 × 0.3s = ~3s) using `pgrep` to poll for exit. Loop breaks early on clean exit - step 3: `pkill -9` only if graceful timed out, w/ a logged escalation msg so it's obvious when SC teardown didn't complete - step 4: same SIGINT-first ladder for the rare `:1616`-holding zombie that doesn't match the cmdline pattern (find PID via `ss -tlnp`, then `kill -INT NNNN; sleep 1; kill -9 NNNN`) - steps 5-6: UDS-socket `rm -f` + re-verify unchanged Goal: surface real teardown bugs through the test- cleanup workflow instead of papering over them with `-9`. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `70d58c4bd2`)	2026-06-09 20:19:56 -04:00
Gud Boi	9c3fc19f35	Wire `reg_addr` through leaky cancel tests Stopgap companion to `d0121960` (`subint_forkserver` test-cancellation leak doc): five tests in `tests/test_cancellation.py` were running against the default `:1616` registry, so any leaked `subint-forkserv` descendant from a prior test holds the port and blows up every subsequent run with `TooSlowError` / "address in use". Thread the session-unique `reg_addr` fixture through so each run picks its own port — zombies can no longer poison other tests (they'll only cross-contaminate whatever happens to share their port, which is now nothing). Deats, - add `reg_addr: tuple` fixture param to: - `test_cancel_infinite_streamer` - `test_some_cancels_all` - `test_nested_multierrors` - `test_cancel_via_SIGINT` - `test_cancel_via_SIGINT_other_task` - explicitly pass `registry_addrs=[reg_addr]` to the two `open_nursery()` calls that previously had no kwargs at all (in `test_cancel_via_SIGINT` and `test_cancel_via_SIGINT_other_task`) - add bounded `@pytest.mark.timeout(7, method='thread')` to `test_nested_multierrors` so a hung run doesn't wedge the whole session Still doesn't close the real leak — the `subint_forkserver` backend's `_ForkedProc.kill()` is PID-scoped not tree-scoped, so grandchildren survive teardown regardless of registry port. This commit is just blast-radius containment until that fix lands. See `ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md`. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `1af2121057`)	2026-06-09 20:19:56 -04:00
Gud Boi	1ebe15db3b	Add zombie-actor check to `run-tests` skill Fork-based backends (esp. `subint_forkserver`) can leak child actor processes on cancelled / SIGINT'd test runs; the zombies keep the tractor default registry (`127.0.0.1:1616` / `/tmp/registry@1616.sock`) bound, so every subsequent session can't bind and 50+ unrelated tests fail with the same `TooSlowError` / "address in use" signature. Document the pre-flight + post-cancel check as a mandatory step 4. Deats, - primary signal: `ss -tlnp \| grep ':1616'` for a bound TCP registry listener — the authoritative check since :1616 is unique to our runtime - `pgrep -af` scoped to `$(pwd)/py[0-9]/bin/python. _actor_child_main\|subint-forkserv` for leftover actor/forkserver procs — scoped deliberately so we don't false-flag legit long-running tractor- embedding apps like `piker` - `ls /tmp/registry@.sock` for stale UDS sockets - scoped cleanup recipe (SIGTERM + SIGKILL sweep using the same `$(pwd)/py` pattern, UDS `rm -f`, re-verify) plus a fallback for when a zombie holds :1616 but doesn't match the pattern: `ss -tlnp` → kill by PID - explicit false-positive warning calling out the `piker` case (`~/repos/piker/py*/bin/python3 -m tractor._child ...`) so a bare `pgrep` doesn't lead to nuking unrelated apps Goal: short-circuit the "spelunking into test code" rabbit-hole when the real cause is just a leaked PID from a prior session, without collateral damage to other tractor-embedding projects on the same box. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `d093c31979`)	2026-06-09 20:19:56 -04:00
Gud Boi	5ab2739b40	Enable `debug_mode` for `subint_forkserver` The `subint_forkserver` backend's child runtime is trio-native (uses `_trio_main` + receives `SpawnSpec` over IPC just like `trio`/`subint`), so `tractor.devx.debug._tty_lock` works in those subactors. Wire the runtime gates that historically hard-coded `_spawn_method == 'trio'` to recognize this third backend. Deats, - new `_DEBUG_COMPATIBLE_BACKENDS` module-const in `tractor._root` listing the spawn backends whose subactor runtime is trio-native (`'trio'`, `'subint_forkserver'`). Both the enable-site (`_runtime_vars['_debug_mode'] = True`) and the cleanup-site reset key off the same tuple — keep them in lockstep when adding backends - `open_root_actor`'s `RuntimeError` for unsupported backends now reports the full compatible-set + the rejected method instead of the stale "only `trio`" msg. - `runtime._runtime.Actor._from_parent`'s SpawnSpec-recv gate adds `'subint_forkserver'` to the existing `('trio', 'subint')` tuple — fork child-side runtime receives the same SpawnSpec IPC handshake as the others. - `subint_forkserver_proc` child-target now passes `spawn_method='subint_forkserver'` (was hard-coded `'trio'`) so `Actor.pformat()` / log lines reflect the actual parent-side spawn mechanism rather than masquerading as plain `trio`. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `8bcbe730bf`)	2026-06-09 20:19:56 -04:00
Gud Boi	fc049abe2a	Refactor `_runtime_vars` into pure get/set API Resetting `_runtime_vars` post-(forking-)spawn was previously only possible via direct mutation of `_state._runtime_vars` from an external module + an inline default dict duplicating the `_state.py`-internal defaults. Split the access surface into a pure getter + explicit setter so such a reset call site becomes a one-liner composition: `set_runtime_vars(get_runtime_vars(clear_values=True))`. Deats `tractor/runtime/_state.py`, - extract initial values into a module-level `_RUNTIME_VARS_DEFAULTS: dict[str, Any]` constant; the live `_runtime_vars` is now initialised from `dict(_RUNTIME_VARS_DEFAULTS)` - `get_runtime_vars()` grows a `clear_values: bool = False` kwarg. When True, returns a fresh copy of `_RUNTIME_VARS_DEFAULTS` instead of the live dict — still a pure read, never mutates anything - new `set_runtime_vars(rtvars: dict \| RuntimeVars)` — atomic replacement of the live dict's contents via `.clear()` + `.update()`, so existing references to the same dict object remain valid. Accepts either the historical dict form or the `RuntimeVars` struct (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit 7804a9fe57693dd5e15bee6a08e7d2fa14b6a98a) (factored: kept only the tractor/runtime/_state.py part; dropped tractor/spawn/_subint_forkserver.py call-site rewire)	2026-06-09 20:19:56 -04:00
Gud Boi	668ad69fd2	Mark `subint`-hanging tests with `skipon_spawn_backend` Adopt the `@pytest.mark.skipon_spawn_backend('subint', reason=...)` marker (`a617b521`) across the suites reproducing the `subint` GIL-contention / starvation hang classes doc'd in `ai/conc-anal/subint_*_issue.md`. Deats, - Module-level `pytestmark` on full-file-hanging suites: - `tests/test_cancellation.py` - `tests/test_inter_peer_cancellation.py` - `tests/test_pubsub.py` - `tests/test_shm.py` - Per-test decorator where only one test in the file hangs: - `tests/discovery/test_registrar.py ::test_stale_entry_is_deleted` — replaces the inline `if start_method == 'subint': pytest.skip` branch with a declarative skip. - `tests/test_subint_cancellation.py ::test_subint_non_checkpointing_child`. - A few per-test decorators are left commented-in- place as breadcrumbs for later finer-grained unskips. Also, some nearby tidying in the affected files: - Annotate loose fixture / test params (`pytest.FixtureRequest`, `str`, `tuple`, `bool`) in `tests/conftest.py`, `tests/devx/conftest.py`, and `tests/test_cancellation.py`. - Normalize `"""..."""` → `'''...'''` docstrings per repo convention on a few touched tests. - Add `timeout=6` / `timeout=10` to `@tractor_test(...)` on `test_cancel_infinite_streamer` and `test_some_cancels_all`. - Drop redundant `spawn_backend` param from `test_cancel_via_SIGINT`; use `start_method` in the `'mp' in ...` check instead. (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `4b2a0886c3`) (factored: dropped spawn-backend-only path: tests/test_subint_cancellation.py)	2026-06-09 20:19:26 -04:00
Gud Boi	b71a21b598	Add `skipon_spawn_backend` pytest marker A reusable `@pytest.mark.skipon_spawn_backend( '<backend>' [, ...], reason='...')` marker for backend-specific known-hang / -borked cases — avoids scattering `@pytest.mark.skipif(lambda ...)` branches across tests that misbehave under a particular `--spawn-backend`. Deats, - `pytest_configure()` registers the marker via `addinivalue_line('markers', ...)`. - New `pytest_collection_modifyitems()` hook walks each collected item with `item.iter_markers( name='skipon_spawn_backend')`, checks whether the active `--spawn-backend` appears in `mark.args`, and if so injects a concrete `pytest.mark.skip( reason=...)`. `iter_markers()` makes the decorator work at function, class, or module (`pytestmark = [...]`) scope transparently. - First matching mark wins; default reason is `f'Borked on --spawn-backend={backend!r}'` if the caller doesn't supply one. Also, tighten type annotations on nearby `pytest` integration points — `pytest_configure`, `debug_mode`, `spawn_backend`, `tpt_protos`, `tpt_proto` — now taking typed `pytest.Config` / `pytest.FixtureRequest` params. (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `3b26b59dad`)	2026-06-09 20:19:11 -04:00
Gud Boi	33f1257721	Skip `test_stale_entry_is_deleted` hanger with `subint`s (cherry picked from commit `985ea76de5`)	2026-06-09 20:19:11 -04:00
Gud Boi	19dd6fc739	Add global 200s `pytest-timeout` (cherry picked from commit `5998774535`)	2026-06-09 20:19:11 -04:00
Gud Boi	b3536b755a	Bump lock-file for `pytest-timeout` + 3.13 gated wheel-deps (cherry picked from commit `a6cbac954d`)	2026-06-09 20:19:11 -04:00
Gud Boi	154cba86ac	Wall-cap `test_stale_entry_is_deleted` via `pytest-timeout` Add a hard process-level wall-clock bound on a test known to wedge un-Ctrl-C-ably under an in-dev spawn backend, so an unattended suite run can't hang indefinitely. Deats, - New `testing` dep: `pytest-timeout>=2.3`. - `test_stale_entry_is_deleted`: `@pytest.mark.timeout(3, method='thread')`. The `method='thread'` choice is deliberate — `method='signal'` routes via `SIGALRM` which can be starved by the same GIL-hostage path that drops `SIGINT`, so it'd never actually fire in the starvation case. At timeout, `pytest-timeout` hard-kills the pytest process itself — that's the intended behavior here; the alternative is the suite never returning. (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit 189f4e3f72e9f1eda5d24bcbab5743f7e35bd913) (factored: kept pyproject + tests/discovery/test_registrar.py parts of "Wall-cap `subint` audit tests via `pytest-timeout`"; dropped tests/test_subint_cancellation.py)	2026-06-09 20:19:11 -04:00
Gud Boi	d60cf23659	Arm `dump_on_hang` on `test_stale_entry_is_deleted` Wrap the test's `trio.run(main)` in `dump_on_hang(seconds=20)` so any future hang regression captures a stack dump for triage instead of wedging CI silently; under the default backends it's a no-op safety net. Includes a "KNOWN ISSUE" comment block documenting the (future) `subint` backend hang classes observed against this test during Phase B bringup (#379). (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `4a3254583b`) (factored: kept only the tests/discovery/test_registrar.py part of "Doc `subint` backend hang classes + arm `dump_on_hang`"; dropped subint conc-anal docs + tests/test_subint_cancellation.py)	2026-06-09 20:18:44 -04:00
Gud Boi	ab6796dd45	Split py-version-gated uv dependency-groups Reshuffle `pyproject.toml` deps into per-python-version `[tool.uv.dependency-groups]`: - `subints` group: `msgspec>=0.21.0`, py>=3.14 - `eventfd` group: `cffi>=1.17.1`, py>=3.13,<3.14 - `sync_pause` group: `greenback`, py>=3.13,<3.14 (was in `devx`; moved out bc no 3.14 yet) Bump top-level `msgspec>=0.20.0` too. (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `34d9d482e4`) (factored: kept only the pyproject dep-group parts of "Raise `subint` floor to py3.14 and split dep-groups"; dropped tractor/spawn/_spawn.py + tractor/spawn/_subint.py)	2026-06-09 20:18:04 -04:00
Gud Boi	e2cc5d150e	Add `._debug_hangs` to `.devx` for hang triage Bottle up the diagnostic primitives that actually cracked the silent mid-suite hangs in the `subint` spawn-backend bringup (issue there" session has them on the shelf instead of reinventing from scratch. Deats, - `dump_on_hang(seconds, , path)` — context manager wrapping `faulthandler.dump_traceback_later()`. Critical gotcha baked in: dumps go to a file, not `sys.stderr`, bc pytest's stderr capture silently eats the output and you can spend an hour convinced you're looking at the wrong thing - `track_resource_deltas(label, , writer)` — context manager logging per-block `(threading.active_count(), len(_interpreters.list_all()))` deltas; quickly rules out leak-accumulation theories when a suite progressively worsens (if counts don't grow, it's not a leak, look for a race on shared cleanup instead) - `resource_delta_fixture(*, autouse, writer)` — factory returning a `pytest` fixture wrapping `track_resource_deltas` per-test; opt in by importing into a `conftest.py`. Kept as a factory (not a bare fixture) so callers own `autouse` / `writer` wiring Also, - export the three names from `tractor.devx` - dep-free on py<3.13 (swallows `ImportError` for `_interpreters`) - link back to the provenance in the module docstring (issue #379 / commit `26fb820`) (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `09466a1e9d`)	2026-06-09 20:17:32 -04:00
Gud Boi	9157f58c15	Avoid skip `.ipc._ringbuf` import when no `cffi` (cherry picked from commit `03bf2b931e`)	2026-06-09 20:17:32 -04:00
Gud Boi	8726323170	Extract `_actor_child_main()` as shared child entry Pull the `_child.py` `__main__` block body out into a callable `_actor_child_main()` so alternate spawn backends can bootstrap a subactor without going through the CLI entrypoint. Deats, - new `_actor_child_main(uid, loglevel, parent_addr, infect_asyncio, spawn_method='trio')` holds the full child-side runtime startup previously inlined under `if __name__ == '__main__':` - `__main__` block reduces to arg-parsing + a call into the new func - add `"subint"` to the `_runtime.py` spawn-method check so a child accepts `SpawnSpec` from that (future) backend; inert str-compare w/o it (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `b8f243e98d`) (factored: kept only the `_child.py`/`_runtime.py` entry-extraction parts of "Impl min-viable `subint` spawn backend (B.2)"; dropped tractor/spawn/_subint.py + subint prompt-io logs)	2026-06-09 20:17:20 -04:00
Gud Boi	4052c5b562	Handle py3.14+ incompats as test skips Since we're devving subints we require the 3.14+ stdlib API and a couple compiled libs don't support it yet, namely: - `cffi`, which we're only using for the `.ipc._linux` eventfd stuff (now factored into `hotbaud` anyway). - `greenback`, which requires `greenlet` which doesn't seem to be wheeled yet * on nixos the sdist build was failing due to lack of `g++` which i don't care to figure out rn since we don't need `.devx` stuff immediately for this subints prototype. * [ ] we still need to adjust any dependent suites to skip. Adjust `test_ringbuf` to skip on import failure. Also project wide, - pin us to py 3.13+ in prep for last-2-minor-version policy. - drop `msgspec>=0.20.0`, the first release with py3.14 support. (cherry picked from commit `d2ea8aa2de`)	2026-06-09 20:17:20 -04:00
Gud Boi	d905c08f82	Open py-version range + harness gate for py3.14 backends (#379 ) Prep for a future sub-interpreter (PEP 734 `concurrent.interpreters`) spawn backend per issue #379 — land just the py-version range bump and the test-harness error-gating; the backend itself comes later. Deats, - bump `pyproject.toml` `requires-python` to `>=3.12, <3.15` and list the `3.14` classifier — the new stdlib `concurrent.interpreters` module only ships on 3.14 - `_testing.pytest.pytest_configure` wraps `try_set_start_method()` in a `pytest.UsageError` handler so an unsupported `--spawn-backend` on the running py-version prints a clean banner instead of a traceback (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `d318f1f8f4`) (factored: kept only the pyproject + `_testing/pytest.py` parts of "Add `'subint'` spawn backend scaffold (#379)"; dropped tractor/spawn/_spawn.py + tractor/spawn/_subint.py)	2026-06-09 20:17:20 -04:00
Gud Boi	c4951c86ec	Pin `xonsh` to GH `main` in editable mode (cherry picked from commit `64ddc42ad8`)	2026-06-09 20:15:31 -04:00

2599 Commits (781abf75587d2d9c74db8860ae24fa846ae068cf)
