Commit Graph

661 Commits (2d9a95d13a2d042fc3b5c7b7a951f60efe0fa85e)

Author SHA1 Message Date
Gud Boi 2d9a95d13a Use `trio.fail_after` cap in `test_dynamic_pub_sub`
Drop `@pytest.mark.timeout(...)` for the per-test wall-clock
cap on `test_dynamic_pub_sub`; rely on `trio.fail_after(12)`
inside `main()` instead.

Both pytest-timeout enforcement modes are incompatible with
trio under fork-based backends:

- `method='signal'` (SIGALRM) synchronously raises `Failed`
  in trio's main thread mid-`epoll.poll()`, leaving
  `GLOBAL_RUN_CONTEXT` half-installed ("Trio guest run got
  abandoned") so EVERY subsequent `trio.run()` in the same
  pytest process bails with
  `RuntimeError: Attempted to call run() from inside a run()`
  — full-session poison.
- `method='thread'` calls `_thread.interrupt_main()` which
  can let the KBI escape trio's `KIManager` under fork-
  cascade teardown races and bubble out of pytest entirely
  — kills the whole session.

`trio.fail_after()` keeps cancellation inside the trio loop:
- Raises `TooSlowError` cleanly through the open-nursery's
  cancel cascade.
- Doesn't disturb any out-of-band signal/thread state.
- Failure stays scoped to the single test — no cross-test
  global state corruption either way.

Verified empirically: 10 hammer-runs of `test_dynamic_pub_sub`
go from 5/10 fail (with global-state poison) to 3/10 fail
(no poison, all sibling tests still pass). The ~30%
remaining flake rate is a genuine fork-cancel-cascade
hang — separate from this fix but no longer contaminates.

Module-level NOTE comment explains the rationale so future
readers don't re-introduce the bug.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 530160fa69)
2026-06-09 20:27:26 -04:00
Gud Boi 3315a8a292 Add opt-in `reap_subactors_per_test` fixture
Function-scoped, NON-autouse zombie-subactor reaper for
modules whose teardown is known-leaky enough to cascade-
fail every following test in a session.

Sibling to the autouse session-scoped `_reap_orphaned_subactors`. The
session-scoped one fires at session end — too late to save tests that
follow a hung/leaky test in the suite. The new fixture, opted into via
`pytestmark = pytest.mark.usefixtures(...)`, runs between tests in
a problem-module so a leftover subactor from test N can't squat on
registrar ports / UDS paths / shm segments needed by tests N+1,
N+2, ...

Intentionally NOT autouse — the fixture's presence on a module signals
"this module's teardown leaks; please root-cause instead of relying
forever on cleanup". A visibility-vs-convenience trade picked in favor
of the former.

Apply to `tests/test_infected_asyncio.py` since both recent full-suite
runs (parallel-tpt-proto + TCP-only) showed the cascade originating in
this file's KBI- and SIGINT-flavored tests under
`main_thread_forkserver`. Module-comment names the specific offenders so
future de-flake work has a starting point.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit b376eb0332)
2026-06-09 20:27:26 -04:00
Gud Boi 5b08c6b034 Sweep `subint_forkserver` → `main_thread_forkserver` in code
After the variant-1 / variant-2 backend split, update remaining
string-match refs to the variant-1 backend so user-visible gates
+ skip-marks + comments name the working backend correctly:

- `tractor._root._DEBUG_COMPATIBLE_BACKENDS`: include
  `main_thread_forkserver`, drop the stub-only `subint_forkserver`
  entry.
- `tests/test_spawning.py::test_loglevel_propagated_to_subactor`:
  capfd-skip flips to `main_thread_forkserver`.
- `tests/test_infected_asyncio.py::test_sigint_closes_lifetime_stack`:
  xfail-condition flips to `main_thread_forkserver`.
- `tests/test_shm.py`: drop stale "broken on `main_thread_forkserver`"
  reason-text since the `mp.SharedMemory(track=False)`
  + resource-tracker monkey-patch in `.ipc._mp_bs` makes the tests pass;
  the skip-mark only fires on plain `subint` now.
- Comment / docstring sweep: `runtime._state`, `runtime._runtime`,
  `_testing.pytest`, `_subint.py`, `pyproject.toml`,
  `test_cancellation.py`, `test_registrar.py` — refs to variant-1
  backend updated.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 205382a39b)
(factored: dropped spawn-backend-only path: tractor/spawn/_subint.py)
2026-06-09 20:27:26 -04:00
Gud Boi 1f7403abc2 Wire `reg_addr` into `test_context_stream_semantics`
Same wire-up pattern as the prior `test_dynamic_pub_sub`
commit: each test that already pulled in `debug_mode`
now also pulls in `reg_addr` and passes
`registry_addrs=[reg_addr]` into `tractor.open_nursery()`,
so the suite's standard registry-addr conventions apply.

Tests touched:
- `test_started_misuse`
- `test_simple_context`
- `test_parent_cancels`
- `test_one_end_stream_not_opened`
- `test_maybe_allow_overruns_stream`
- `test_ctx_with_self_actor`

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 66f1941f46)
2026-06-09 20:27:26 -04:00
Gud Boi ed00b75a7b Wire `test_dynamic_pub_sub` to standard fixtures
Pull in the `reg_addr`, `debug_mode`, and `test_log`
fixtures so this test follows the same conventions as
the rest of the suite:

- pass `registry_addrs=[reg_addr]` + `debug_mode` into
  `tractor.open_nursery()` (so `--tpdb` etc work).
- after the `pytest.raises` block, add `assert err` +
  `test_log.exception('Timed out AS EXPECTED')` so the
  expected timeout is logged explicitly instead of
  swallowed.

Also,
- drop whitespace-only blank lines around the
  `subs` param of `consumer()` and `ctx` param of
  `one_task_streams_and_one_handles_reqresp()`.
- promote `test_sigint_both_stream_types`'s one-line
  docstring to multi-line form.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 9b05f659b3)
2026-06-09 20:27:26 -04:00
Gud Boi 8daf8eeaca Bump `test_stale_entry_is_deleted`'s timeout to 30
Seems that when run in-suite it delays more then the so-measured "happy
path" timing; better to have no suite-global interruption then asserting
a fast single test's run.

(cherry picked from commit 65fcfbf224)
2026-06-09 20:27:26 -04:00
Gud Boi 2e2977b74c Fix `SharedMemory` under `subint_forkserver`
Implements the resolution described in c99d475d's
`subint_forkserver_mp_shared_memory_issue.md` (now
updated with the resolution post-mortem). Two-part
fix that side-steps `mp.resource_tracker` entirely
rather than try to make it fork-safe — turns out
that's both simpler AND more correct given tractor
already SC-manages allocation lifetimes.

Deats,
- `tractor/ipc/_mp_bs.py::disable_mantracker()`: drop the
  `platform.python_version_tuple()[:-1] >= ('3', '13')` branch — patches
  now run unconditionally:
  * monkey-patch `mp.resource_tracker. _resource_tracker` to a no-op
    `ManTracker` subclass (empty `register` / `unregister`
    / `ensure_running`).
  * return `partial(SharedMemory, track=False)` for the per-allocation
    opt-out.
  * belt + suspenders: even if something dodges the wrapper, the
    singleton can't talk to the inherited (broken) parent fd.

- `tractor/ipc/_shm.py::open_shm_list()`: drop the 3.13+ conditional
  skip of the unlink-callback; install a `try_unlink()` wrapper that
  swallows `FileNotFoundError` (sibling-already-cleaned race in
  shared-key setups). Without `mp.resource_tracker` doing it for us, we
  own the unlink — `actor. lifetime_stack` is the right place since
  tractor already controls actor lifecycle.

- `tests/test_shm.py`: uncomment-out `subint_forkserver` from the
  module-level skip- list (tests pass now). Inline comment cross-refs
  the two `_mp_bs` / `_shm` workarounds.

- `ai/conc-anal/subint_forkserver_mp_shared_memory_ issue.md`: heavy
  rewrite — flips status from "open / unresolvable in tractor" to
  "resolved, kept as decision record". Adds Resolution section, "Why
  this is the right call" rationale (mp tracker is widely criticized;
  tractor already owns lifecycle), trade-offs (crash-leaked segments,
  lost mp leak warning), verification (7 passed under both
  `subint_forkserver` and `trio` backends), and upstream issue links

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit aa3e230926)
(factored: dropped subint_forkserver conc-anal doc update)
2026-06-09 20:27:26 -04:00
Gud Boi da0c457ff7 Document `SharedMemory` × `subint_forkserver` incompat
New `ai/conc-anal/` doc: `mp.SharedMemory` is
fork-without-exec unsafe — child inherits parent's
`resource_tracker` fd → EBADF on first shm op;
leaked `/shm_list` cascades `FileExistsError`
across parametrize variants. Canonical CPython
issue class, NOT a tractor bug. Includes two
longer-term mitigation paths (reset inherited
tracker fd vs migrate off `mp.shared_memory`).

Also, update `tests/test_shm.py`:
- comment out `subint_forkserver` from skip list
- rewrite reason with precise failure-mode
  descriptions + link to the analysis doc

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit c99d475d03)
(factored: dropped spawn-backend-only paths: ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md)
2026-06-09 20:27:26 -04:00
Gud Boi 13053f9cbe Skip `test_loglevel_propagated_to_subactor` on subint forkserver too
(cherry picked from commit 2ca0f41e61)
2026-06-09 20:22:23 -04:00
Gud Boi a199aa5096 Wire `reg_addr` through infected-asyncio tests
Continues the hygiene pattern from de601676 (cancel tests) into
`tests/test_infected_asyncio.py`: many tests here were calling
`tractor.open_nursery()` w/o `registry_addrs=[reg_addr]` and thus racing
on the default `:1616` registry across sessions. Thread the
session-unique `reg_addr` through so leaked or slow-to-teardown
subactors from a prior test can't cross-pollute.

Deats,
- add `registry_addrs=[reg_addr]` to `open_nursery()`
  calls in suite where missing.
- `test_sigint_closes_lifetime_stack`:
  - add `reg_addr`, `debug_mode`, `start_method`
    fixture params
  - `delay` now reads the `debug_mode` param directly
    instead of calling `tractor.debug_mode()` (fires
    slightly earlier in the test lifecycle)
  - sanity assert `if debug_mode: assert
    tractor.debug_mode()` after nursery open
  - new print showing SIGINT target
    (`send_sigint_to` + resolved pid)
  - catch `trio.TooSlowError` around
    `ctx.wait_for_result()` and conditionally
    `pytest.xfail` when `send_sigint_to == 'child'
    and start_method == 'subint_forkserver'` — the
    known orphan-SIGINT limitation tracked in
    `ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`
- parametrize id typo fix: `'just_trio_slee'` → `'just_trio_sleep'`

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit b350aa09ee)
2026-06-09 20:22:23 -04:00
Gud Boi ba2e474d9d Import-or-skip `.devx.` tests requiring `greenback`
Which is for sure true on py3.14+ rn since `greenlet` didn't want to
build for us (yet).

(cherry picked from commit d6e70e9de4)
2026-06-09 20:22:23 -04:00
Gud Boi e4c7ac34db Default `pytest` to use `--capture=sys`
Lands the capture-pipe workaround from the prior cluster of diagnosis
commits: switch pytest's `--capture` mode from the default `fd`
(redirects fd 1,2 to temp files, which fork children inherit and can
deadlock writing into) to `sys` (only `sys.stdout` / `sys.stderr` — fd
1,2 left alone).

Trade-off documented inline in `pyproject.toml`:
- LOST: per-test attribution of raw-fd output (C-ext writes,
  `os.write(2, ...)`, subproc stdout). Still goes to terminal / CI
  capture, just not per-test-scoped in the failure report.
- KEPT: `print()` + `logging` capture per-test (tractor's logger uses
  `sys.stderr`).
- KEPT: `pytest -s` debugging behavior.

This allows us to re-enable `test_nested_multierrors` without
skip-marking + clears the class of pytest-capture-induced hangs for any
future fork-based backend tests.

Deats,
- `pyproject.toml`: `'--capture=sys'` added to `addopts` w/ ~20 lines of
  rationale comment cross-ref'ing the post-mortem doc

- `test_cancellation`: drop `skipon_spawn_backend('subint_forkserver')`
  from `test_nested_ multierrors` — no longer needed.
  * file-level `pytestmark` covers any residual.

- `tests/spawn/test_subint_forkserver.py`: orphan-SIGINT test's xfail
  mark loosened from `strict=True` to `strict=False` + reason rewritten.
  * it passes in isolation but is session-env-pollution sensitive
    (leftover subactor PIDs competing for ports / inheriting harness
    FDs).
  * tolerate both outcomes until suite isolation improves.

- `test_shm`: extend the existing
  `skipon_spawn_backend('subint', ...)` to also skip
  `'subint_forkserver'`.
  * Different root cause from the cancel-cascade class:
    `multiprocessing.SharedMemory`'s `resource_tracker` + internals
    assume fresh- process state, don't survive fork-without-exec cleanly

- `tests/discovery/test_registrar.py`: bump timeout 3→7s on one test
  (unrelated to forkserver; just a flaky-under-load bump).

- `tractor.spawn._subint_forkserver`: inline comment-only future-work
  marker right before `_actor_child_main()` describing the planned
  conditional stdout/stderr-to-`/dev/null` redirect for cases where
  `--capture=sys` isn't enough (no code change — the redirect logic
  itself is deferred).

EXTRA NOTEs
-----------
The `--capture=sys` approach is the minimum- invasive fix: just a pytest
ini change, no runtime code change, works for all fork-based backends,
trade-offs well-understood (terminal-level capture still happens, just
not pytest's per-test attribution of raw-fd output).

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 4c133ab541)
(factored: dropped spawn-backend-only paths: tests/spawn/test_subint_forkserver.py + tractor/spawn/_subint_forkserver.py; the xfail-loosening bullet above no longer applies)
2026-06-09 20:22:23 -04:00
Gud Boi 828df7df79 Update `subint_forkserver` skip reason: capture-pipe
Refresh the `test_nested_multierrors` skip-mark
reason to the final diagnosis: the hang is pytest's
default `--capture=fd` pipe filling from high-volume
subactor traceback output inherited via fds 1,2 in
fork children — `pytest -s` passes cleanly. Records
the fix direction (redirect child stdio to
`/dev/null` in the fork-child prelude) for whoever
lands the backend.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit eceed29d4a)
(factored: kept only the tests/test_cancellation.py skip-reason update of
 "Pin forkserver hang to pytest `--capture=fd`"; dropped the subint
 conc-anal doc + tests/spawn/test_subint_forkserver.py)
2026-06-09 20:21:58 -04:00
Gud Boi 555f64fdf2 Skip-mark `subint_forkserver` nested-multierror hang
Skip-mark the still-hanging
`test_nested_multierrors[subint_forkserver]` via
`@pytest.mark.skipon_spawn_backend('subint_forkserver',
reason=...)` so it stops blocking the test matrix
while the remaining bug is being chased. The mark is
an inert no-op until that (in-dev) backend lands.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 506617c695)
(factored: kept only the tests/test_cancellation.py skip-mark; dropped
 the subint_forkserver conc-anal doc update)
2026-06-09 20:20:29 -04:00
Gud Boi 9c3fc19f35 Wire `reg_addr` through leaky cancel tests
Stopgap companion to d0121960 (`subint_forkserver`
test-cancellation leak doc): five tests in
`tests/test_cancellation.py` were running against the
default `:1616` registry, so any leaked
`subint-forkserv` descendant from a prior test holds
the port and blows up every subsequent run with
`TooSlowError` / "address in use". Thread the
session-unique `reg_addr` fixture through so each run
picks its own port — zombies can no longer poison
other tests (they'll only cross-contaminate whatever
happens to share their port, which is now nothing).

Deats,
- add `reg_addr: tuple` fixture param to:
  - `test_cancel_infinite_streamer`
  - `test_some_cancels_all`
  - `test_nested_multierrors`
  - `test_cancel_via_SIGINT`
  - `test_cancel_via_SIGINT_other_task`
- explicitly pass `registry_addrs=[reg_addr]` to the
  two `open_nursery()` calls that previously had no
  kwargs at all (in `test_cancel_via_SIGINT` and
  `test_cancel_via_SIGINT_other_task`)
- add bounded `@pytest.mark.timeout(7, method='thread')`
  to `test_nested_multierrors` so a hung run doesn't
  wedge the whole session

Still doesn't close the real leak — the
`subint_forkserver` backend's `_ForkedProc.kill()` is
PID-scoped not tree-scoped, so grandchildren survive
teardown regardless of registry port. This commit is
just blast-radius containment until that fix lands.
See `ai/conc-anal/
subint_forkserver_test_cancellation_leak_issue.md`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 1af2121057)
2026-06-09 20:19:56 -04:00
Gud Boi 668ad69fd2 Mark `subint`-hanging tests with `skipon_spawn_backend`
Adopt the `@pytest.mark.skipon_spawn_backend('subint',
reason=...)` marker (a617b521) across the suites
reproducing the `subint` GIL-contention / starvation
hang classes doc'd in `ai/conc-anal/subint_*_issue.md`.

Deats,
- Module-level `pytestmark` on full-file-hanging suites:
  - `tests/test_cancellation.py`
  - `tests/test_inter_peer_cancellation.py`
  - `tests/test_pubsub.py`
  - `tests/test_shm.py`
- Per-test decorator where only one test in the file
  hangs:
  - `tests/discovery/test_registrar.py
    ::test_stale_entry_is_deleted` — replaces the
    inline `if start_method == 'subint': pytest.skip`
    branch with a declarative skip.
  - `tests/test_subint_cancellation.py
    ::test_subint_non_checkpointing_child`.
- A few per-test decorators are left commented-in-
  place as breadcrumbs for later finer-grained unskips.

Also, some nearby tidying in the affected files:
- Annotate loose fixture / test params
  (`pytest.FixtureRequest`, `str`, `tuple`, `bool`) in
  `tests/conftest.py`, `tests/devx/conftest.py`, and
  `tests/test_cancellation.py`.
- Normalize `"""..."""` → `'''...'''` docstrings per
  repo convention on a few touched tests.
- Add `timeout=6` / `timeout=10` to
  `@tractor_test(...)` on `test_cancel_infinite_streamer`
  and `test_some_cancels_all`.
- Drop redundant `spawn_backend` param from
  `test_cancel_via_SIGINT`; use `start_method` in the
  `'mp' in ...` check instead.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 4b2a0886c3)
(factored: dropped spawn-backend-only path: tests/test_subint_cancellation.py)
2026-06-09 20:19:26 -04:00
Gud Boi 33f1257721 Skip `test_stale_entry_is_deleted` hanger with `subint`s
(cherry picked from commit 985ea76de5)
2026-06-09 20:19:11 -04:00
Gud Boi 154cba86ac Wall-cap `test_stale_entry_is_deleted` via `pytest-timeout`
Add a hard process-level wall-clock bound on a test
known to wedge un-Ctrl-C-ably under an in-dev spawn
backend, so an unattended suite run can't hang
indefinitely.

Deats,
- New `testing` dep: `pytest-timeout>=2.3`.
- `test_stale_entry_is_deleted`:
  `@pytest.mark.timeout(3, method='thread')`. The
  `method='thread'` choice is deliberate —
  `method='signal'` routes via `SIGALRM` which can be
  starved by the same GIL-hostage path that drops
  `SIGINT`, so it'd never actually fire in the
  starvation case.

At timeout, `pytest-timeout` hard-kills the pytest
process itself — that's the intended behavior here;
the alternative is the suite never returning.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 189f4e3f72e9f1eda5d24bcbab5743f7e35bd913)
(factored: kept pyproject + tests/discovery/test_registrar.py parts of
 "Wall-cap `subint` audit tests via `pytest-timeout`"; dropped
 tests/test_subint_cancellation.py)
2026-06-09 20:19:11 -04:00
Gud Boi d60cf23659 Arm `dump_on_hang` on `test_stale_entry_is_deleted`
Wrap the test's `trio.run(main)` in
`dump_on_hang(seconds=20)` so any future hang
regression captures a stack dump for triage instead
of wedging CI silently; under the default backends
it's a no-op safety net.

Includes a "KNOWN ISSUE" comment block documenting
the (future) `subint` backend hang classes observed
against this test during Phase B bringup (#379).

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 4a3254583b)
(factored: kept only the tests/discovery/test_registrar.py part of
 "Doc `subint` backend hang classes + arm `dump_on_hang`"; dropped
 subint conc-anal docs + tests/test_subint_cancellation.py)
2026-06-09 20:18:44 -04:00
Gud Boi 9157f58c15 Avoid skip `.ipc._ringbuf` import when no `cffi`
(cherry picked from commit 03bf2b931e)
2026-06-09 20:17:32 -04:00
Gud Boi 4052c5b562 Handle py3.14+ incompats as test skips
Since we're devving subints we require the 3.14+ stdlib API
and a couple compiled libs don't support it yet, namely:
- `cffi`, which we're only using for the `.ipc._linux` eventfd
  stuff (now factored into `hotbaud` anyway).
- `greenback`, which requires `greenlet` which doesn't seem to be
  wheeled yet
  * on nixos the sdist build was failing due to lack of `g++` which
    i don't care to figure out rn since we don't need `.devx` stuff
    immediately for this subints prototype.
  * [ ] we still need to adjust any dependent suites to skip.

Adjust `test_ringbuf` to skip on import failure.

Also project wide,
- pin us to py 3.13+ in prep for last-2-minor-version policy.
- drop `msgspec>=0.20.0`, the first release with py3.14 support.

(cherry picked from commit d2ea8aa2de)
2026-06-09 20:17:20 -04:00
Gud Boi 3867403fab Scale `test_open_local_sub_to_stream` timeout by CPU factor
Import and apply `cpu_scaling_factor()` from
`conftest`; bump base from 3.6 -> 4 and multiply
through so CI boxes with slow CPUs don't flake.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-16 20:03:32 -04:00
Gud Boi ed65301d32 Fix misc bugs caught by Copilot review
Deats,
- use `proc.poll() is None` in `sig_prog()` to
  distinguish "still running" from exit code 0;
  drop stale `breakpoint()` from fallback kill
  path (would hang CI).
- add missing `raise` on the `RuntimeError` in
  `async_main()` when no tpt bind addrs given.
- clean up stale uid entries from the registrar
  `_registry` when addr eviction empties the
  addr list.
- update `discovery.__init__` docstring to match
  the new eager `._multiaddr` import.
- fix `registar` -> `registrar` typo in teardown
  report log msg.

Review: PR #429 (Copilot)
https://github.com/goodboy/tractor/pull/429

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:15 -04:00
Gud Boi cd287c7e93 Fix `test_registrar_merge_binds_union` for UDS collision
`get_random()` can produce the same UDS filename for a given
pid+actor-state, so the "disjoint addrs" premise doesn't always hold.
Gate the `len(bound) >= 2` assertion on whether the registry and bind
addrs actually differ via `expect_disjoint`.

Also,
- drop unused `partial` import

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:15 -04:00
Gud Boi 86d4e0d3ed Harden `sig_prog()` retries, adjust debugger test timeouts
Retry signal delivery in `sig_prog()` up to `tries`
times (default 3) w/ `canc_timeout` sleep between
attempts; only fall back to `_KILL_SIGNAL` after all
retries exhaust. Bump default timeout 0.1 -> 0.2.

Also,
- `test_multi_nested_subactors_error_through_nurseries`
  gives the first prompt iteration a 5s timeout even
  on linux bc the initial crash sequence can be slow
  to arrive at a `pdb` prompt

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi c3d6cc9007 Rename `discovery._discovery` to `._api`
Adjust all imports to match.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi e90241baaa Add `parse_endpoints()` to `_multiaddr`
Provide a service-table parsing API for downstream projects (like
`piker`) to declare per-actor transport bind addresses as a config map
of actor-name -> multiaddr strings (e.g. from a TOML `[network]`
section).

Deats,
- `EndpointsTable` type alias: input `dict[str, list[str|tuple]]`.
- `ParsedEndpoints` type alias: output `dict[str, list[Address]]`.
- `parse_endpoints()` iterates the table and delegates each entry to the
  existing `tractor.discovery._discovery.wrap_address()` helper, which
  handles maddr strings, raw `(host, port)` tuples, and pre-wrapped
  `Address` objs.
- UDS maddrs use the multiaddr spec name `/unix/...` (not tractor's
  internal `/uds/` proto_key)

Also add new tests,
- 7 new pure unit tests (no trio runtime): TCP-only, mixed tpts,
  unwrapped tuples, mixed str+tuple, unsupported proto (`/udp/`),
  empty table, empty actor list
- all 22 multiaddr tests pass rn.

Prompt-IO:
ai/prompt-io/claude/20260413T205048Z_269d939c_prompt_io.md

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 7079a597c5 Add `test_tpt_bind_addrs.py` + fix type-mixing bug
Add 9 test variants (6 fns) covering all three
`tpt_bind_addrs` code paths in `open_root_actor()`:
- registrar w/ explicit bind (eq, subset, disjoint)
- non-registrar w/ explicit bind (same/diff
  bindspace) using `daemon` fixture
- non-registrar default random bind (baseline)
- maddr string input parsing
- registrar merge produces union
- `open_nursery()` forwards `tpt_bind_addrs`

Fix type-mixing bug at `_root.py:446` where the
registrar merge path did `set(Address + tuple)`,
preventing dedup and causing double-bind `OSError`.
Wrap `uw_reg_addrs` before the set union so both
sides are `Address` objs.

Also,
- add prompt-io output log for this session
- stage original prompt input for tracking

Prompt-IO: ai/prompt-io/claude/20260413T192116Z_f851f28_prompt_io.md

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi f881683c97 Tweak timeouts and rm `arbiter_addr` in tests
Use `cpu_scaling_factor()` headroom in
`test_peer_spawns_and_cancels_service_subactor`'s `fail_after` to avoid
flaky timeouts on throttled CI runners. Rename `arbiter_addr=` ->
`registry_addrs=[..]` throughout `test_spawning` and
`test_task_broadcasting` suites to match the current `open_root_actor()`
/ `open_nursery()` API.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 490fac432c Preserve absolute UDS paths in `parse_maddr()`
Drop the `.lstrip('/')` on the unix protocol value
so the lib-prepended `/` restores the absolute-path
semantics that `mk_maddr()` strips when encoding.
Pass `Path` components (not `str`) to `UDSAddress`.

Also, update all UDS test params to use absolute
paths (`/tmp/tractor_test/...`, `/tmp/tractor_rt/...`)
matching real runtime sockpath behavior; tighten
`test_parse_maddr_uds` to assert exact `filedir`.

Review: PR #429 (copilot-pull-request-reviewer[bot])
https://github.com/goodboy/tractor/pull/429#pullrequestreview-4018448152

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 5c4438bacc Add `parse_maddr()` tests + registrar maddr integ test
Cover `parse_maddr()` with unit tests for tcp/ipv4,
tcp/ipv6, uds, and unsupported-protocol error paths,
plus full `addr -> mk_maddr -> str -> parse_maddr`
roundtrip verification.

Adds,
- a `_maddr_to_tpt_proto` inverse-mapping assertion.
- an `wrap_address()` maddr-string acceptance test.
- a `test_reg_then_unreg_maddr` end-to-end suite which audits passing
  the registry addr as multiaddr str through the entire runtime.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 1f1e09a786 Move `test_discovery` to `tests/discovery/test_registrar`
All tests are registrar-actor integration scenarios
sharing intertwined helpers + `enable_modules=[__name__]`
task fns, so keep as one mod but rename to reflect
content. Now lives alongside `test_multiaddr.py` in
the new `tests/discovery/` subpkg.

Also,
- update 5 refs in `/run-tests` SKILL.md to match
  the new path
- add `discovery/` subdir to the test directory
  layout tree

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 7cf3b5d00d Add `test_multiaddr.py` suite for `mk_maddr()`
Cover `_tpt_proto_to_maddr` mapping, TCP (ipv4/ipv6),
UDS, unsupported `proto_key` error, and round-trip
re-parse for both transport types.

Deats,
- new `tests/discovery/` subpkg w/ empty `__init__.py`
- `test_tpt_proto_to_maddr_mapping`: verify `tcp` and
  `uds` entries
- `test_mk_maddr_tcp_ipv4`: full assertion on
  `/ip4/127.0.0.1/tcp/1234` incl protocol iteration
- `test_mk_maddr_tcp_ipv6`: verify `/ip6/::1/tcp/5678`
- `test_mk_maddr_uds`: relative `filedir` bc the
  multiaddr parser rejects double-slash from abs paths
- `test_mk_maddr_unsupported_proto_key`: `ValueError`
  on `proto_key='quic'` via `SimpleNamespace` mock
- `test_mk_maddr_roundtrip`: parametrized over tcp +
  uds, re-parse `str(maddr)` back through `Multiaddr`

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 570c975f14 Fix test typo + use `wait_for_result()` API
- "boostrap" → "bootstrap" in mod docstring
- replace deprecated `portal.result()` with
  `portal.wait_for_result()` + value assertion
  inside the nursery block

Review: PR #1 (Copilot)
https://github.com/mahmoudhas/tractor/pull/1#pullrequestreview-4091096072

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-10 20:58:54 -04:00
Gud Boi e8f1eca8d2 Tighten `test_spawning` types, parametrize loglevel
Parametrize `test_loglevel_propagated_to_subactor`
across `'debug'`, `'cancel'`, `'critical'` levels
(was hardcoded to just `'critical'`) and move it
above the parent-main tests for logical grouping.

Also,
- add `start_method: str` annotations throughout
- use `portal.wait_for_result()` in
  `test_most_beautiful_word` (replaces `.result()`)
- expand mod docstring to describe test coverage
- reformat `check_parent_main_inheritance` docstr

Review: PR #438 (Copilot)
https://github.com/goodboy/tractor/pull/438

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-10 20:58:54 -04:00
Gud Boi c6c591e61a Drop `spawn_test_support` pkg, inline parent-main tests
Replace the subproc-based test harness with inline
`tractor.open_nursery()` calls that directly check
`actor._parent_main_data` instead of comparing
`__main__.__name__` across a process boundary
(which is a no-op under pytest bc the parent
`__main__` is `pytest.__main__`).

Deats,
- delete `tests/spawn_test_support/` pkg (3 files)
- add `check_parent_main_inheritance()` helper fn
  that asserts on `_parent_main_data` emptiness
- rewrite both `run_in_actor` and `start_actor`
  parent-main tests as inline async fns
- drop `tmp_path` fixture and unused imports

Review: PR #434 (goodboy, Copilot)
https://github.com/goodboy/tractor/pull/434

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-10 20:58:54 -04:00
mahmoud b883b27646 Exercise parent-main inheritance through spawn test support
Move the subprocess probe into dedicated spawn test support files so the inheritance tests cover the real __main__ replay path without monkeypatching or inline script strings.
2026-04-10 20:58:54 -04:00
mahmoud 00637764d9 Address review follow-ups for parent-main inheritance opt-out
Clean up mutable defaults, give parent-main bootstrap data a named type, and add direct start_actor coverage so the opt-out change is clearer to review.
2026-04-10 20:58:54 -04:00
mahmoud ea971d25aa Rename parent-main inheritance flag.
Use `inherit_parent_main` across the actor APIs and helper to better describe the behavior, and restore the reviewer note at child bootstrap where the inherited `__main__` data is copied from `SpawnSpec`.
2026-04-10 20:58:54 -04:00
mahmoud 83b6c4270a Simplify parent-main replay opt-out.
Keep actor-owned parent-main capture and let `_mp_figure_out_main()` decide whether to return `__main__` bootstrap data, avoiding the extra SpawnSpec plumbing while preserving the per-actor flag.
2026-04-10 20:58:54 -04:00
mahmoud 6309c2e6fc Route parent-main replay through SpawnSpec
Keep trio child bootstrap data in the spawn handshake instead of stashing it on Actor state so the replay opt-out stays explicit and avoids stale-looking runtime fields.
2026-04-10 20:58:54 -04:00
mahmoud f5301d3fb0 Add per-actor parent-main replay opt-out
Let actor callers skip replaying the parent __main__ during child startup so downstream integrations can avoid inheriting incompatible bootstrap state without changing the default spawn behavior.
2026-04-10 20:58:54 -04:00
Gud Boi febe587c6c Drop `xfail` from `test_moc_reentry_during_teardown`
The per-`ctx_key` locking fix in f086222d intended to resolve the
teardown race reproduced by the new test suite, so the test SHOULD now
pass. TLDR, it doesn't Bp

Also add `collapse_eg()` to the test's ctx-manager stack so that when
run with `pytest <...> --tpdb` we'll actually `pdb`-REPL the RTE when it
hits (previously an assert-error).

(this commit-msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-09 14:42:42 -04:00
Gud Boi cab366cd65 Add xfail test for `_Cache.run_ctx` teardown race
Reproduce the piker `open_cached_client('kraken')` scenario: identical
`ctx_key` callers share one cached resource, and a new task re-enters
during `__aexit__` — hitting `assert not resources.get()` bc `values`
was popped but `resources` wasn't yet.

Deats,
- `test_moc_reentry_during_teardown` uses an `in_aexit` event to
  deterministically land in the teardown window.
- marked `xfail(raises=AssertionError)` against unpatched code (fix in
  `9e49eddd` or wtv lands on the `maybe_open_ctx_locking` or thereafter
  patch branch).

Also, add prompt-io log for the session.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

Prompt-IO: ai/prompt-io/claude/20260406T193125Z_85f9c5d_prompt_io.md
2026-04-06 18:17:04 -04:00
Gud Boi 85f9c5df6f Add per-`ctx_key` isolation tests for `maybe_open_context()`
Add `test_per_ctx_key_resource_lifecycle` to verify that per-key user
tracking correctly tears down resources independently - exercises the
fix from 02b2ef18 where a global `_Cache.users` counter caused stale
cache hits when the same `acm_func` was called with different kwargs.

Also, add a paired `acm_with_resource()` helper `@acm` that yields its
`resource_id` for per-key testing in the above suite.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

Prompt-IO: ai/prompt-io/claude/20260406T172848Z_02b2ef1_prompt_io.md
2026-04-06 14:37:47 -04:00
Gud Boi ebe9d5e4b5 Parametrize `test_resource_cache.test_open_local_sub_to_stream`
Namely with multiple pre-sleep `delay`-parametrizations before either,

- parent-scope cancel-calling (as originally) or,
- depending on the  new `cancel_by_cs: bool` suite parameter, optionally
  just immediately exiting from (the newly named)
  `maybe_cancel_outer_cs()` a checkpoint.

In the latter case we ensure we **don't** inf sleep to avoid leaking
those tasks into the `Actor._service_tn` (though we should really have
a better soln for this)..

Deats,
- make `cs` args optional and adjust internal logic to match.
- add some notes around various edge cases and issues with using the
  actor-service-tn as the scope by default.
2026-04-06 14:37:47 -04:00
Gud Boi 8f44efa327 Drop stale `.cancel()`, fix docstring typo in tests
- Remove leftover `await an.cancel()` in
  `test_registered_custom_err_relayed`; the
  nursery already cancels on scope exit.
- Fix `This document` -> `This documents` typo in
  `test_unregistered_err_still_relayed` docstring.

Review: PR #426 (Copilot)
https://github.com/goodboy/tractor/pull/426

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi 5968a3c773 Use `'<unknown>'` for unresolvable `.boxed_type_str`
Add a teensie unit test to match.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi 80597b80bf Add passing test for unregistered err relay
Drop the `xfail` test and instead add a new one that ensures the
`tractor._exceptions` fixes enable graceful relay of
remote-but-unregistered error types via the unboxing of just the
`rae.src_type_str/boxed_type_str` content. The test also ensures
a warning is included with remote error content indicating the user
should register their error type for effective cross-actor re-raising.

Deats,
- add `test_unregistered_err_still_relayed`: verify the
  `RemoteActorError` IS raised with `.boxed_type`
  as `None` but `.src_type_str`, `.boxed_type_str`,
  and `.tb_str` all preserved from the IPC msg.
- drop `test_unregistered_boxed_type_resolution_xfail`
  since the new above case covers it and we don't need to have
  an effectively entirely repeated test just with an inverse assert
  as it's last line..

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi 9c37b3f956 Add `reg_err_types()` test suite for remote exc relay
Verify registered custom error types round-trip correctly over IPC via
`reg_err_types()` + `get_err_type()`.

Deats,
- `TestRegErrTypesPlumbing`: 5 unit tests for the type-registry plumbing
  (register, lookup, builtins, tractor-native types, unregistered
  returns `None`)
- `test_registered_custom_err_relayed`: IPC end-to-end for a registered
  `CustomAppError` checking `.boxed_type`, `.src_type`, and `.tb_str`
- `test_registered_another_err_relayed`: same for `AnotherAppError`
  (multi-type coverage)
- `test_unregistered_custom_err_fails_lookup`: `xfail` documenting that
  `.boxed_type` can't resolve without `reg_err_types()` registration

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00