Continues the hygiene pattern from de601676 (cancel tests) into
`tests/test_infected_asyncio.py`: many tests here were calling
`tractor.open_nursery()` w/o `registry_addrs=[reg_addr]` and thus racing
on the default `:1616` registry across sessions. Thread the
session-unique `reg_addr` through so leaked or slow-to-teardown
subactors from a prior test can't cross-pollute.
Deats,
- add `registry_addrs=[reg_addr]` to `open_nursery()`
calls in suite where missing.
- `test_sigint_closes_lifetime_stack`:
- add `reg_addr`, `debug_mode`, `start_method`
fixture params
- `delay` now reads the `debug_mode` param directly
instead of calling `tractor.debug_mode()` (fires
slightly earlier in the test lifecycle)
- sanity assert `if debug_mode: assert
tractor.debug_mode()` after nursery open
- new print showing SIGINT target
(`send_sigint_to` + resolved pid)
- catch `trio.TooSlowError` around
`ctx.wait_for_result()` and conditionally
`pytest.xfail` when `send_sigint_to == 'child'
and start_method == 'subint_forkserver'` — the
known orphan-SIGINT limitation tracked in
`ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`
- parametrize id typo fix: `'just_trio_slee'` → `'just_trio_sleep'`
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit b350aa09ee)
Lands the capture-pipe workaround from the prior cluster of diagnosis
commits: switch pytest's `--capture` mode from the default `fd`
(redirects fd 1,2 to temp files, which fork children inherit and can
deadlock writing into) to `sys` (only `sys.stdout` / `sys.stderr` — fd
1,2 left alone).
Trade-off documented inline in `pyproject.toml`:
- LOST: per-test attribution of raw-fd output (C-ext writes,
`os.write(2, ...)`, subproc stdout). Still goes to terminal / CI
capture, just not per-test-scoped in the failure report.
- KEPT: `print()` + `logging` capture per-test (tractor's logger uses
`sys.stderr`).
- KEPT: `pytest -s` debugging behavior.
This allows us to re-enable `test_nested_multierrors` without
skip-marking + clears the class of pytest-capture-induced hangs for any
future fork-based backend tests.
Deats,
- `pyproject.toml`: `'--capture=sys'` added to `addopts` w/ ~20 lines of
rationale comment cross-ref'ing the post-mortem doc
- `test_cancellation`: drop `skipon_spawn_backend('subint_forkserver')`
from `test_nested_ multierrors` — no longer needed.
* file-level `pytestmark` covers any residual.
- `tests/spawn/test_subint_forkserver.py`: orphan-SIGINT test's xfail
mark loosened from `strict=True` to `strict=False` + reason rewritten.
* it passes in isolation but is session-env-pollution sensitive
(leftover subactor PIDs competing for ports / inheriting harness
FDs).
* tolerate both outcomes until suite isolation improves.
- `test_shm`: extend the existing
`skipon_spawn_backend('subint', ...)` to also skip
`'subint_forkserver'`.
* Different root cause from the cancel-cascade class:
`multiprocessing.SharedMemory`'s `resource_tracker` + internals
assume fresh- process state, don't survive fork-without-exec cleanly
- `tests/discovery/test_registrar.py`: bump timeout 3→7s on one test
(unrelated to forkserver; just a flaky-under-load bump).
- `tractor.spawn._subint_forkserver`: inline comment-only future-work
marker right before `_actor_child_main()` describing the planned
conditional stdout/stderr-to-`/dev/null` redirect for cases where
`--capture=sys` isn't enough (no code change — the redirect logic
itself is deferred).
EXTRA NOTEs
-----------
The `--capture=sys` approach is the minimum- invasive fix: just a pytest
ini change, no runtime code change, works for all fork-based backends,
trade-offs well-understood (terminal-level capture still happens, just
not pytest's per-test attribution of raw-fd output).
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 4c133ab541)
(factored: dropped spawn-backend-only paths: tests/spawn/test_subint_forkserver.py + tractor/spawn/_subint_forkserver.py; the xfail-loosening bullet above no longer applies)
Refresh the `test_nested_multierrors` skip-mark
reason to the final diagnosis: the hang is pytest's
default `--capture=fd` pipe filling from high-volume
subactor traceback output inherited via fds 1,2 in
fork children — `pytest -s` passes cleanly. Records
the fix direction (redirect child stdio to
`/dev/null` in the fork-child prelude) for whoever
lands the backend.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit eceed29d4a)
(factored: kept only the tests/test_cancellation.py skip-reason update of
"Pin forkserver hang to pytest `--capture=fd`"; dropped the subint
conc-anal doc + tests/spawn/test_subint_forkserver.py)
Skip-mark the still-hanging
`test_nested_multierrors[subint_forkserver]` via
`@pytest.mark.skipon_spawn_backend('subint_forkserver',
reason=...)` so it stops blocking the test matrix
while the remaining bug is being chased. The mark is
an inert no-op until that (in-dev) backend lands.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 506617c695)
(factored: kept only the tests/test_cancellation.py skip-mark; dropped
the subint_forkserver conc-anal doc update)
Stopgap companion to d0121960 (`subint_forkserver`
test-cancellation leak doc): five tests in
`tests/test_cancellation.py` were running against the
default `:1616` registry, so any leaked
`subint-forkserv` descendant from a prior test holds
the port and blows up every subsequent run with
`TooSlowError` / "address in use". Thread the
session-unique `reg_addr` fixture through so each run
picks its own port — zombies can no longer poison
other tests (they'll only cross-contaminate whatever
happens to share their port, which is now nothing).
Deats,
- add `reg_addr: tuple` fixture param to:
- `test_cancel_infinite_streamer`
- `test_some_cancels_all`
- `test_nested_multierrors`
- `test_cancel_via_SIGINT`
- `test_cancel_via_SIGINT_other_task`
- explicitly pass `registry_addrs=[reg_addr]` to the
two `open_nursery()` calls that previously had no
kwargs at all (in `test_cancel_via_SIGINT` and
`test_cancel_via_SIGINT_other_task`)
- add bounded `@pytest.mark.timeout(7, method='thread')`
to `test_nested_multierrors` so a hung run doesn't
wedge the whole session
Still doesn't close the real leak — the
`subint_forkserver` backend's `_ForkedProc.kill()` is
PID-scoped not tree-scoped, so grandchildren survive
teardown regardless of registry port. This commit is
just blast-radius containment until that fix lands.
See `ai/conc-anal/
subint_forkserver_test_cancellation_leak_issue.md`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 1af2121057)
Adopt the `@pytest.mark.skipon_spawn_backend('subint',
reason=...)` marker (a617b521) across the suites
reproducing the `subint` GIL-contention / starvation
hang classes doc'd in `ai/conc-anal/subint_*_issue.md`.
Deats,
- Module-level `pytestmark` on full-file-hanging suites:
- `tests/test_cancellation.py`
- `tests/test_inter_peer_cancellation.py`
- `tests/test_pubsub.py`
- `tests/test_shm.py`
- Per-test decorator where only one test in the file
hangs:
- `tests/discovery/test_registrar.py
::test_stale_entry_is_deleted` — replaces the
inline `if start_method == 'subint': pytest.skip`
branch with a declarative skip.
- `tests/test_subint_cancellation.py
::test_subint_non_checkpointing_child`.
- A few per-test decorators are left commented-in-
place as breadcrumbs for later finer-grained unskips.
Also, some nearby tidying in the affected files:
- Annotate loose fixture / test params
(`pytest.FixtureRequest`, `str`, `tuple`, `bool`) in
`tests/conftest.py`, `tests/devx/conftest.py`, and
`tests/test_cancellation.py`.
- Normalize `"""..."""` → `'''...'''` docstrings per
repo convention on a few touched tests.
- Add `timeout=6` / `timeout=10` to
`@tractor_test(...)` on `test_cancel_infinite_streamer`
and `test_some_cancels_all`.
- Drop redundant `spawn_backend` param from
`test_cancel_via_SIGINT`; use `start_method` in the
`'mp' in ...` check instead.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 4b2a0886c3)
(factored: dropped spawn-backend-only path: tests/test_subint_cancellation.py)
Add a hard process-level wall-clock bound on a test
known to wedge un-Ctrl-C-ably under an in-dev spawn
backend, so an unattended suite run can't hang
indefinitely.
Deats,
- New `testing` dep: `pytest-timeout>=2.3`.
- `test_stale_entry_is_deleted`:
`@pytest.mark.timeout(3, method='thread')`. The
`method='thread'` choice is deliberate —
`method='signal'` routes via `SIGALRM` which can be
starved by the same GIL-hostage path that drops
`SIGINT`, so it'd never actually fire in the
starvation case.
At timeout, `pytest-timeout` hard-kills the pytest
process itself — that's the intended behavior here;
the alternative is the suite never returning.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 189f4e3f72e9f1eda5d24bcbab5743f7e35bd913)
(factored: kept pyproject + tests/discovery/test_registrar.py parts of
"Wall-cap `subint` audit tests via `pytest-timeout`"; dropped
tests/test_subint_cancellation.py)
Wrap the test's `trio.run(main)` in
`dump_on_hang(seconds=20)` so any future hang
regression captures a stack dump for triage instead
of wedging CI silently; under the default backends
it's a no-op safety net.
Includes a "KNOWN ISSUE" comment block documenting
the (future) `subint` backend hang classes observed
against this test during Phase B bringup (#379).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 4a3254583b)
(factored: kept only the tests/discovery/test_registrar.py part of
"Doc `subint` backend hang classes + arm `dump_on_hang`"; dropped
subint conc-anal docs + tests/test_subint_cancellation.py)
Since we're devving subints we require the 3.14+ stdlib API
and a couple compiled libs don't support it yet, namely:
- `cffi`, which we're only using for the `.ipc._linux` eventfd
stuff (now factored into `hotbaud` anyway).
- `greenback`, which requires `greenlet` which doesn't seem to be
wheeled yet
* on nixos the sdist build was failing due to lack of `g++` which
i don't care to figure out rn since we don't need `.devx` stuff
immediately for this subints prototype.
* [ ] we still need to adjust any dependent suites to skip.
Adjust `test_ringbuf` to skip on import failure.
Also project wide,
- pin us to py 3.13+ in prep for last-2-minor-version policy.
- drop `msgspec>=0.20.0`, the first release with py3.14 support.
(cherry picked from commit d2ea8aa2de)
Import and apply `cpu_scaling_factor()` from
`conftest`; bump base from 3.6 -> 4 and multiply
through so CI boxes with slow CPUs don't flake.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Deats,
- use `proc.poll() is None` in `sig_prog()` to
distinguish "still running" from exit code 0;
drop stale `breakpoint()` from fallback kill
path (would hang CI).
- add missing `raise` on the `RuntimeError` in
`async_main()` when no tpt bind addrs given.
- clean up stale uid entries from the registrar
`_registry` when addr eviction empties the
addr list.
- update `discovery.__init__` docstring to match
the new eager `._multiaddr` import.
- fix `registar` -> `registrar` typo in teardown
report log msg.
Review: PR #429 (Copilot)
https://github.com/goodboy/tractor/pull/429
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
`get_random()` can produce the same UDS filename for a given
pid+actor-state, so the "disjoint addrs" premise doesn't always hold.
Gate the `len(bound) >= 2` assertion on whether the registry and bind
addrs actually differ via `expect_disjoint`.
Also,
- drop unused `partial` import
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Retry signal delivery in `sig_prog()` up to `tries`
times (default 3) w/ `canc_timeout` sleep between
attempts; only fall back to `_KILL_SIGNAL` after all
retries exhaust. Bump default timeout 0.1 -> 0.2.
Also,
- `test_multi_nested_subactors_error_through_nurseries`
gives the first prompt iteration a 5s timeout even
on linux bc the initial crash sequence can be slow
to arrive at a `pdb` prompt
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Adjust all imports to match.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Provide a service-table parsing API for downstream projects (like
`piker`) to declare per-actor transport bind addresses as a config map
of actor-name -> multiaddr strings (e.g. from a TOML `[network]`
section).
Deats,
- `EndpointsTable` type alias: input `dict[str, list[str|tuple]]`.
- `ParsedEndpoints` type alias: output `dict[str, list[Address]]`.
- `parse_endpoints()` iterates the table and delegates each entry to the
existing `tractor.discovery._discovery.wrap_address()` helper, which
handles maddr strings, raw `(host, port)` tuples, and pre-wrapped
`Address` objs.
- UDS maddrs use the multiaddr spec name `/unix/...` (not tractor's
internal `/uds/` proto_key)
Also add new tests,
- 7 new pure unit tests (no trio runtime): TCP-only, mixed tpts,
unwrapped tuples, mixed str+tuple, unsupported proto (`/udp/`),
empty table, empty actor list
- all 22 multiaddr tests pass rn.
Prompt-IO:
ai/prompt-io/claude/20260413T205048Z_269d939c_prompt_io.md
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add 9 test variants (6 fns) covering all three
`tpt_bind_addrs` code paths in `open_root_actor()`:
- registrar w/ explicit bind (eq, subset, disjoint)
- non-registrar w/ explicit bind (same/diff
bindspace) using `daemon` fixture
- non-registrar default random bind (baseline)
- maddr string input parsing
- registrar merge produces union
- `open_nursery()` forwards `tpt_bind_addrs`
Fix type-mixing bug at `_root.py:446` where the
registrar merge path did `set(Address + tuple)`,
preventing dedup and causing double-bind `OSError`.
Wrap `uw_reg_addrs` before the set union so both
sides are `Address` objs.
Also,
- add prompt-io output log for this session
- stage original prompt input for tracking
Prompt-IO: ai/prompt-io/claude/20260413T192116Z_f851f28_prompt_io.md
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Use `cpu_scaling_factor()` headroom in
`test_peer_spawns_and_cancels_service_subactor`'s `fail_after` to avoid
flaky timeouts on throttled CI runners. Rename `arbiter_addr=` ->
`registry_addrs=[..]` throughout `test_spawning` and
`test_task_broadcasting` suites to match the current `open_root_actor()`
/ `open_nursery()` API.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Drop the `.lstrip('/')` on the unix protocol value
so the lib-prepended `/` restores the absolute-path
semantics that `mk_maddr()` strips when encoding.
Pass `Path` components (not `str`) to `UDSAddress`.
Also, update all UDS test params to use absolute
paths (`/tmp/tractor_test/...`, `/tmp/tractor_rt/...`)
matching real runtime sockpath behavior; tighten
`test_parse_maddr_uds` to assert exact `filedir`.
Review: PR #429 (copilot-pull-request-reviewer[bot])
https://github.com/goodboy/tractor/pull/429#pullrequestreview-4018448152
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Cover `parse_maddr()` with unit tests for tcp/ipv4,
tcp/ipv6, uds, and unsupported-protocol error paths,
plus full `addr -> mk_maddr -> str -> parse_maddr`
roundtrip verification.
Adds,
- a `_maddr_to_tpt_proto` inverse-mapping assertion.
- an `wrap_address()` maddr-string acceptance test.
- a `test_reg_then_unreg_maddr` end-to-end suite which audits passing
the registry addr as multiaddr str through the entire runtime.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
All tests are registrar-actor integration scenarios
sharing intertwined helpers + `enable_modules=[__name__]`
task fns, so keep as one mod but rename to reflect
content. Now lives alongside `test_multiaddr.py` in
the new `tests/discovery/` subpkg.
Also,
- update 5 refs in `/run-tests` SKILL.md to match
the new path
- add `discovery/` subdir to the test directory
layout tree
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Cover `_tpt_proto_to_maddr` mapping, TCP (ipv4/ipv6),
UDS, unsupported `proto_key` error, and round-trip
re-parse for both transport types.
Deats,
- new `tests/discovery/` subpkg w/ empty `__init__.py`
- `test_tpt_proto_to_maddr_mapping`: verify `tcp` and
`uds` entries
- `test_mk_maddr_tcp_ipv4`: full assertion on
`/ip4/127.0.0.1/tcp/1234` incl protocol iteration
- `test_mk_maddr_tcp_ipv6`: verify `/ip6/::1/tcp/5678`
- `test_mk_maddr_uds`: relative `filedir` bc the
multiaddr parser rejects double-slash from abs paths
- `test_mk_maddr_unsupported_proto_key`: `ValueError`
on `proto_key='quic'` via `SimpleNamespace` mock
- `test_mk_maddr_roundtrip`: parametrized over tcp +
uds, re-parse `str(maddr)` back through `Multiaddr`
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Parametrize `test_loglevel_propagated_to_subactor`
across `'debug'`, `'cancel'`, `'critical'` levels
(was hardcoded to just `'critical'`) and move it
above the parent-main tests for logical grouping.
Also,
- add `start_method: str` annotations throughout
- use `portal.wait_for_result()` in
`test_most_beautiful_word` (replaces `.result()`)
- expand mod docstring to describe test coverage
- reformat `check_parent_main_inheritance` docstr
Review: PR #438 (Copilot)
https://github.com/goodboy/tractor/pull/438
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Replace the subproc-based test harness with inline
`tractor.open_nursery()` calls that directly check
`actor._parent_main_data` instead of comparing
`__main__.__name__` across a process boundary
(which is a no-op under pytest bc the parent
`__main__` is `pytest.__main__`).
Deats,
- delete `tests/spawn_test_support/` pkg (3 files)
- add `check_parent_main_inheritance()` helper fn
that asserts on `_parent_main_data` emptiness
- rewrite both `run_in_actor` and `start_actor`
parent-main tests as inline async fns
- drop `tmp_path` fixture and unused imports
Review: PR #434 (goodboy, Copilot)
https://github.com/goodboy/tractor/pull/434
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Move the subprocess probe into dedicated spawn test support files so the inheritance tests cover the real __main__ replay path without monkeypatching or inline script strings.
Clean up mutable defaults, give parent-main bootstrap data a named type, and add direct start_actor coverage so the opt-out change is clearer to review.
Use `inherit_parent_main` across the actor APIs and helper to better describe the behavior, and restore the reviewer note at child bootstrap where the inherited `__main__` data is copied from `SpawnSpec`.
Keep actor-owned parent-main capture and let `_mp_figure_out_main()` decide whether to return `__main__` bootstrap data, avoiding the extra SpawnSpec plumbing while preserving the per-actor flag.
Keep trio child bootstrap data in the spawn handshake instead of stashing it on Actor state so the replay opt-out stays explicit and avoids stale-looking runtime fields.
Let actor callers skip replaying the parent __main__ during child startup so downstream integrations can avoid inheriting incompatible bootstrap state without changing the default spawn behavior.
The per-`ctx_key` locking fix in f086222d intended to resolve the
teardown race reproduced by the new test suite, so the test SHOULD now
pass. TLDR, it doesn't Bp
Also add `collapse_eg()` to the test's ctx-manager stack so that when
run with `pytest <...> --tpdb` we'll actually `pdb`-REPL the RTE when it
hits (previously an assert-error).
(this commit-msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Reproduce the piker `open_cached_client('kraken')` scenario: identical
`ctx_key` callers share one cached resource, and a new task re-enters
during `__aexit__` — hitting `assert not resources.get()` bc `values`
was popped but `resources` wasn't yet.
Deats,
- `test_moc_reentry_during_teardown` uses an `in_aexit` event to
deterministically land in the teardown window.
- marked `xfail(raises=AssertionError)` against unpatched code (fix in
`9e49eddd` or wtv lands on the `maybe_open_ctx_locking` or thereafter
patch branch).
Also, add prompt-io log for the session.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Prompt-IO: ai/prompt-io/claude/20260406T193125Z_85f9c5d_prompt_io.md
Add `test_per_ctx_key_resource_lifecycle` to verify that per-key user
tracking correctly tears down resources independently - exercises the
fix from 02b2ef18 where a global `_Cache.users` counter caused stale
cache hits when the same `acm_func` was called with different kwargs.
Also, add a paired `acm_with_resource()` helper `@acm` that yields its
`resource_id` for per-key testing in the above suite.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Prompt-IO: ai/prompt-io/claude/20260406T172848Z_02b2ef1_prompt_io.md
Namely with multiple pre-sleep `delay`-parametrizations before either,
- parent-scope cancel-calling (as originally) or,
- depending on the new `cancel_by_cs: bool` suite parameter, optionally
just immediately exiting from (the newly named)
`maybe_cancel_outer_cs()` a checkpoint.
In the latter case we ensure we **don't** inf sleep to avoid leaking
those tasks into the `Actor._service_tn` (though we should really have
a better soln for this)..
Deats,
- make `cs` args optional and adjust internal logic to match.
- add some notes around various edge cases and issues with using the
actor-service-tn as the scope by default.
- Remove leftover `await an.cancel()` in
`test_registered_custom_err_relayed`; the
nursery already cancels on scope exit.
- Fix `This document` -> `This documents` typo in
`test_unregistered_err_still_relayed` docstring.
Review: PR #426 (Copilot)
https://github.com/goodboy/tractor/pull/426
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add a teensie unit test to match.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Drop the `xfail` test and instead add a new one that ensures the
`tractor._exceptions` fixes enable graceful relay of
remote-but-unregistered error types via the unboxing of just the
`rae.src_type_str/boxed_type_str` content. The test also ensures
a warning is included with remote error content indicating the user
should register their error type for effective cross-actor re-raising.
Deats,
- add `test_unregistered_err_still_relayed`: verify the
`RemoteActorError` IS raised with `.boxed_type`
as `None` but `.src_type_str`, `.boxed_type_str`,
and `.tb_str` all preserved from the IPC msg.
- drop `test_unregistered_boxed_type_resolution_xfail`
since the new above case covers it and we don't need to have
an effectively entirely repeated test just with an inverse assert
as it's last line..
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Verify registered custom error types round-trip correctly over IPC via
`reg_err_types()` + `get_err_type()`.
Deats,
- `TestRegErrTypesPlumbing`: 5 unit tests for the type-registry plumbing
(register, lookup, builtins, tractor-native types, unregistered
returns `None`)
- `test_registered_custom_err_relayed`: IPC end-to-end for a registered
`CustomAppError` checking `.boxed_type`, `.src_type`, and `.tb_str`
- `test_registered_another_err_relayed`: same for `AnotherAppError`
(multi-type coverage)
- `test_unregistered_custom_err_fails_lookup`: `xfail` documenting that
`.boxed_type` can't resolve without `reg_err_types()` registration
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
The `open_actor_cluster()` teardown hangs
intermittently on UDS when `gather_contexts(mngrs=())`
raises `ValueError` mid-setup; likely a race in the
actor-nursery cleanup vs UDS socket shutdown. TCP
passes reliably (5/5 runs).
- Add `tpt_proto` fixture param to the test
- `pytest.skip()` on UDS with a TODO for deeper
investigation of `._clustering`/`._supervise`
teardown paths
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Factor the CPU-freq-scaling helper out of
`test_legacy_one_way_streaming` into `conftest.py`
alongside a new `cpu_scaling_factor()` convenience fn
that returns a latency-headroom multiplier (>= 1.0).
Apply it to the two other flaky-timeout tests,
- `test_cancel_via_SIGINT_other_task`: 2s -> scaled
- `test_example[we_are_processes.py]`: 16s -> scaled
Deats,
- add `get_cpu_state()` + `cpu_scaling_factor()` to
`conftest.py` so all test mods can share the logic.
- catch `IndexError` (empty glob) in addition to
`FileNotFoundError`.
- rename `factor` var -> `headroom` at call sites for
clarity on intent.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add `get_cpu_state()` helper to read CPU freq settings
from `/sys/devices/system/cpu/` and use it to compensate
the perf time-limit when `auto-cpufreq` (or similar)
scales down the max frequency.
Deats,
- read `*_pstate_max_freq` and `scaling_max_freq`
to compute a `cpu_scaled` ratio.
- when `cpu_scaled != 1.`, increase `this_fast` limit
proportionally (factoring dual-threaded cores).
- log a warning via `test_log` when compensating.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Move the `Arbiter` class out of `runtime._runtime` into its
logical home at `discovery._registry` as `Registrar(Actor)`.
This completes the long-standing terminology migration from
"arbiter" to "registrar/registry" throughout the codebase.
Deats,
- add new `discovery/_registry.py` mod with `Registrar`
class + backward-compat `Arbiter = Registrar` alias.
- rename `Actor.is_arbiter` attr -> `.is_registrar`;
old attr now a `@property` with `DeprecationWarning`.
- `_root.py` imports `Registrar` directly for
root-actor instantiation.
- export `Registrar` + `Arbiter` from `tractor.__init__`.
- `_runtime.py` re-imports from `discovery._registry`
for backward compat.
Also,
- update all test files to use `.is_registrar`
(`test_local`, `test_rpc`, `test_spawning`,
`test_discovery`, `test_multi_program`).
- update "arbiter" -> "registrar" in comments/docstrings
across `_discovery.py`, `_server.py`, `_transport.py`,
`_testing/pytest.py`, and examples.
- drop resolved TODOs from `_runtime.py` and `_root.py`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Adjust all `tractor._state`, `tractor._addr`,
`tractor._supervise`, etc. refs in tests and examples
to use the new `runtime/`, `discovery/`, `spawn/` paths.
Also,
- use `tractor.debug_mode()` pub API instead of
`tractor._state.debug_mode()` in a few test mods
- add explicit `timeout=20` to `test_respawn_consumer_task`
`@tractor_test` deco call
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
- Use `bidict.forceput()` in `register_actor()` to handle
duplicate addr values from stale entries or actor restarts.
- Fix `uid` annotation to `tuple[str, str]|None` in
`maybe_open_portal()` and handle the `None` return from
`delete_addr()` in log output.
- Pass explicit `registry_addrs=[reg_addr]` to `open_nursery()`
and `find_actor()` in `test_stale_entry_is_deleted` to ensure
the test uses the remote registrar.
- Update `query_actor()` docstring to document the new
`(addr, reg_portal)` yield shape.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
By spawning an actor task that immediately shuts down the transport
server and then sleeps, verify that attempting to connect via the
`._discovery.find_actor()` helper delivers `None` for the `Portal`
value.
Relates to #184 and #216
- `test_inter_peer_cancellation`: swap all `.uid` refs
on `Actor`, `Channel`, and `Portal` to `.aid.uid`
- `test_legacy_one_way_streaming`: same + fix `print()`
to multiline style
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Use 6s timeout on non-linux (vs 4s) in
`test_cancel_while_childs_child_in_sync_sleep()` to avoid
flaky `TooSlowError` on slower CI runners.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Removes the `pytest` deprecation warns and attempts to avoid
some de-registration raciness, though i'm starting to think the
real issue is due to not having the fixes from #366 (which handle
the new dereg on `OSError` case from UDS)?
- use `.channel.aid.uid` over deprecated `.channel.uid`
throughout `test_discovery.py`.
- add polling loop (up to 5s) for subactor de-reg check
in `spawn_and_check_registry()` to handle slower
transports like UDS where teardown takes longer.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add UDS skip-guard to `test_streaming_to_actor_cluster()`
and plumb `tpt_proto` through the `@tractor_test` wrapper
so transport-parametrized tests can receive it.
Deats,
- skip cluster test when `tpt_proto == 'uds'` with
descriptive msg, add TODO about `@pytest.mark.no_tpt`.
- add `tpt_proto: str|None` param to inner wrapper in
`tractor_test()`, forward to decorated fn when its sig
accepts it.
- register custom `no_tpt` marker via `pytest_configure()`
to avoid unknown-marker warnings.
- add masked todo for `no_tpt` marker-check code in `tpt_proto` fixture
(needs fn-scope to work, left as TODO).
- add `request` param to `tpt_proto` fixture for future
marker inspection.
Also,
- add doc-string to `test_streaming_to_actor_cluster()`.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Namely the workaround expected exc branches added in ef7ed7a for the UDS
parametrization. With the new boxing of the underlying CREs as
tpt-closed, we can expect the same exc outcomes as in the TCP cases.
Also this tweaks some error report logging content used while debugging
this,
- properly `repr()` the `TransportClosed.src_exc`-type from
the maybe emit in `.report_n_maybe_raise()`.
- remove the redudant `chan.raddr` from the "closed abruptly"
header in the tpt-closed handler of `._rpc.process_messages()`,
the `Channel.__repr__()` now contains it by default.
Forward the `tpt_proto` fixture val into spawned daemon
subprocesses via `run_daemon(enable_transports=..)` and
sync `_runtime_vars['_enable_tpts']` in the `tpt_proto`
fixture so sub-actors inherit the transport setting.
Deats,
- add `enable_transports={enable_tpts}` to the daemon
spawn-cmd template in `tests/conftest.py`.
- set `_state._runtime_vars['_enable_tpts']` in the
`tpt_proto` fixture in `_testing/pytest.py`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
For more reliability with the oob registrar using tests
via the `daemon` fixture,
- increase spawn-wait to `2` in CI, `1` OW; drop
the old py<3.7 branch.
- move `_ci_env` to module-level (above `_non_linux`)
so `_PROC_SPAWN_WAIT` can reference it at parse time.
- add `test_log` fixture param to `daemon()`, use it
for the error-on-exit log line instead of bare `log`.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add `expect_timeout: float` param to `_spawn()`
so individual tests can tune `pexpect` timeouts
instead of relying on the hard-coded 3/10 split.
Deats,
- default to 4s, bump by +6 on non-linux CI.
- use walrus `:=` to capture resolved timeout and assert
`spawned.timeout == timeout` for sanity.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Event on linux i was noticing lotsa false negatives based on sub
teardown race conditions, so this tries to both make way for
(eventually?) expanding the set of suite cases and ensure the current
ones are more reliable on every run.
The main change is to hange the `error_in_child=False` case to use
parent-side-cancellation via a new `trio.move_on_after(timeout)` instead
of `actor.cancel_soon()` (which is now toggled by a new `self_cancel:
bool` but unused rn), and add better teardown assertions.
Low level deats,
- add `rent_cancel`/`self_cancel` params to
`crash_and_clean_tmpdir()` for different cancel paths;
default to `rent_cancel=True` which just sleeps forever
letting the parent's timeout do the work.
- use `trio.move_on_after()` with longer timeouts per
case: 1.6s for error, 1s for cancel.
- use the `.move_on_after()` cancel-scope to assert `.cancel_called`
pnly when `error_in_child=False`, indicating we
parent-graceful-cancelled the sub.
- add `loglevel` fixture, pass to `open_nursery()`.
- log caught `RemoteActorError` via console logger.
- add `ids=` to parametrize for readable test names.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add a 6s timeout guard around `test_streaming_to_actor_cluster()`
to catch hangs, and nest the `async with` block inside it.
Found this when running `pytest tests/ --tpt-proto uds`.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Deliver `(LinkedTaskChannel, Any)` instead of the prior `(first, chan)`
order from `open_channel_from()` to match the type annotation and be
consistent with `trio.open_*_channel()` style where the channel obj
comes first.
- flip `yield first, chan` -> `yield chan, first`
- update type annotation + docstring to match
- swap all unpack sites in tests and examples
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Convert every remaining `to_trio`/`from_trio` fn-sig style
to the new unified `chan: LinkedTaskChannel` iface added in
prior commit (c46e9ee8).
Deats,
- `to_trio.send_nowait(val)` (1st call) -> `chan.started_nowait(val)`
- `to_trio.send_nowait(val)` (subsequent) -> `chan.send_nowait(val)`
- `await from_trio.get()` -> `await chan.get()`
Converted fns,
- `sleep_and_err()`, `push_from_aio_task()` in
`tests/test_infected_asyncio.py`
- `sync_and_err()` in `tests/test_root_infect_asyncio.py`
- `aio_streamer()` in
`tests/test_child_manages_service_nursery.py`
- `aio_echo_server()` in
`examples/infected_asyncio_echo_server.py`
- `bp_then_error()` in `examples/debugging/asyncio_bp.py`
Also,
- drop stale comments referencing old param names.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
With methods to comms similar to those that exist for the `trio` side,
- `.get()` which proxies verbatim to the `._to_aio: asyncio.Queue`,
- `.send_nowait()` which thin-wraps to `._to_trio: trio.MemorySendChannel`.
Obviously the more correct design is to break up the channel type into
a pair of handle types, one for each "side's" task in each event-loop,
that's hopefully coming shortly in a follow up patch B)
Also,
- fill in some missing doc strings, tweak some explanation comments and
update todos.
- adjust the `test_aio_errors_and_channel_propagates_and_closes()` suite
to use the new `chan` fn-sig-API with `.open_channel_from()` including
the new methods for msg comms; ensures everything added here works e2e.
Reorganize existing msg-related test suites under
a new `tests/msg/` subdir (matching `tests/devx/`
and `tests/ipc/` convention) and add unit tests for
the `_`-prefixed field filtering in `pformat()`.
Deats,
- `git mv` `test_ext_types_msgspec` and `test_pldrx_limiting` into
`tests/msg/`.
- add `__init__.py` + `conftest.py` for the new test sub-pkg.
- add new `test_pretty_struct.py` suite with 8 unit tests:
- parametrized field visibility (public shown, `_`-private hidden,
mixed)
- direct `iter_struct_ppfmt_lines()` assertion
- nested struct recursion filtering
- empty struct edge case
- real `MsgDec` via `mk_dec()` hiding `_dec`
- `repr()` integration via `Struct.__repr__()`
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
It seems something is up with their VM-img or wtv bc i keep increasing
the subproc timeout and nothing is changing. Since i can't try
a `-xlarge` one without paying i'm just muting this test for now.
- convert all doc-strings to `'''` multiline style.
- rename `nursery` -> `an`, `n` -> `tn` to match
project-wide conventions.
- add type annotations to fn params (fixtures, test
helpers).
- break long lines into multiline style for fn calls,
assertions, and `parametrize` decorator lists.
- add `ids=` to `@pytest.mark.parametrize`.
- use `'` over `"` for string literals.
- add `from typing import Callable` import.
- drop spurious blank lines inside generators.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Via ensuring `all(mark.args)` on wtv expressions are arg-passed to the
mark decorator; use it to skip the `test_subactor_breakpoint` suite when
`ctlc=True` since it seems too unreliable in CI.
There's a very sloppy registrar-actor-bootup syncing approach used in
this fixture (basically just guessing how long to sleep to wait for it
to init and bind the registry socket) using a `global _PROC_SPAWN_WAIT`
that needs to be made more reliable. But, for now i'm just playing along
with what's there to try and make less CI runs flaky by,
- sleeping *another* 1s when run from non-linux CI.
- reporting stdout (if any) alongside stderr on teardown.
- not strictly requiring a `proc.returncode == -2` indicating successful
graceful cancellation via SIGINT; instead we now error-log and only
raise the RTE on `< 0` exit code.
* though i can't think of why this would happen other then an
underlying crash which should propagate.. but i don't think any test
suite does this intentionally rn?
* though i don't think it should ever happen, having a CI run
"error"-fail bc of this isn't all that illuminating, if there is
some weird `.returncode == 0` termination case it's likely not
a failure?
For later, see the new todo list; we should sync to some kind of "ping"
polling of the tpt address if possible which is already easy enough for
TCP reusing an internal closure from `._root.open_root_actor()`.
Namely, after trying to get `test_multi_daemon_subactors` to work for
the `ctlc=True` case (for way too long), give up on that (see
todo/comments) and skip it; the normal case works just fine. Also tweak
the `test_ctxep_pauses_n_maybe_ipc_breaks` pattern matching for
non-`'UDS'` per the previous script commit; we can't use UDS alongside
`pytest`'s tmp dir generation, mega lulz.
To be a null default and set to `0.1` when not passed by the caller so
as to avoid having to pass `0.1` if you wanted the
param-defined-default.
Also,
- in the `spawn()` fixtures's `unset_colors()` closure, add in a masked
`os.environ['NO_COLOR'] = '1'` since i found it while trying to debug
debugger tests.
- always return the `child.before` content from `assert_before()`
helper; again it comes in handy when debugging console matching tests.
Improve the `spawn` fixture teardown logic in
`tests/devx/conftest.py` fixing the while-else bug, and fix
`test_advanced_faults` genexp for `TransportClosed` exc type
checking.
Deats,
- replace broken `while-else` pattern with direct
`if ptyproc.isalive()` check after the SIGINT loop.
- fix undefined `spawned` ref -> `ptyproc.isalive()` in
while condition.
- improve walrus expr formatting in timeout check (multiline
style).
Also fix `test_ipc_channel_break_during_stream()` assertion,
- wrap genexp in `all()` call so it actually checks all excs
are `TransportClosed` instead of just creating an unused
generator.
(this patch was suggested by copilot in,
https://github.com/goodboy/tractor/pull/411)
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Adjust `basic_echo_server()` default sequence len to avoid the race
where the 'tell_little_bro()` finished streaming **before** the
echo-server sub is cancelled by its peer subactor (which is the whole
thing we're testing!).
Deats,
- bump `rng_seed` default from 50 -> 100 to ensure peer
cancel req arrives before echo dialog completes on fast hw.
- add `trio.sleep(0.001)` between send/receive in msg loop on the
"client" streamer side to give cancel request transit more time to
arrive.
Also,
- add more native `tractor`-type hints.
- reflow `basic_echo_server()` doc-string for 67 char limit
- add masked `pause()` call with comment about unreachable
code path
- alphabetize imports: mv `current_actor` and `open_nursery`
below typed imports
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add `TransportClosed` to except clauses where `trio`'s own
resource-closed errors are already caught, ensuring our
higher-level tpt exc is also tolerated in those same spots.
Likely i will follow up with a removal of the `trio` variants since most
*should be* caught and re-raised as tpt-closed out of the `.ipc` stack
now?
Add `TransportClosed` to various handler blocks,
- `._streaming.MsgStream.aclose()/.send()` except blocks.
- the broken-channel except in `._context.open_context_from_portal()`.
- obvi import it where necessary in those ^ mods.
Adjust `test_advanced_faults` suite + exs-script to match,
- update `ipc_failure_during_stream.py` example to catch
`TransportClosed` alongside `trio.ClosedResourceError`
in both the break and send-check paths.
- shield the `trio.sleep(0.01)` after tpt close in example to avoid
taskc-raise/masking on that checkpoint since we want to simulate
waiting for a user to send a KBI.
- loosen `ExceptionGroup` assertion to `len(excs) <= 2` and ensure all
excs are `TransportClosed`.
- improve multi-line formatting, minor style/formatting fixes in
condition expressions.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code