Commit Graph

649 Commits (subint_fork_proto)

Author SHA1 Message Date
Gud Boi 43bd6a6410 Wire `subint_forkserver` as first-class backend
Promote `_subint_forkserver` from primitives-only into a
registered spawn backend: `'subint_forkserver'` is now a
`SpawnMethodKey` literal, dispatched via `_methods` to
the new `subint_forkserver_proc()` target, feature-gated
under the existing `subint`-family py3.14+ case, and
selectable via `--spawn-backend=subint_forkserver`.

Deats,
- new `subint_forkserver_proc()` spawn target in
  `_subint_forkserver`:
  - mirrors `trio_proc()`'s supervision model — real OS
    subprocess so `Portal.cancel_actor()` + `soft_kill()`
    on graceful teardown, `os.kill(SIGKILL)` on hard-reap
    (no `_interpreters.destroy()` race to fuss over bc the
    child lives in its own process)
  - only real diff from `trio_proc` is the spawn mechanism:
    fork from a main-interp worker thread via
    `fork_from_worker_thread()` (off-loaded to trio's
    thread pool) instead of `trio.lowlevel.open_process()`
  - child-side `_child_target` closure runs
    `tractor._child._actor_child_main()` with
    `spawn_method='trio'` — the child is a regular trio
    actor, "subint_forkserver" names how the parent
    spawned, not what the child runs
- new `_ForkedProc` class — thin `trio.Process`-compatible
  shim around a raw OS pid: `.poll()` via
  `waitpid(WNOHANG)`, async `.wait()` off-loaded to a trio
  cache thread, `.kill()` via `SIGKILL`, `.returncode`
  cached for repeat calls. `.stdin`/`.stdout`/`.stderr`
  are `None` (fork-w/o-exec inherits parent FDs; we don't
  marshal them) which matches `soft_kill()`'s `is not None`
  guards

Also, new backend-tier test
`test_subint_forkserver_spawn_basic` drives the registered
backend end-to-end via `open_root_actor` + `open_nursery` +
`run_in_actor` w/ a trivial portal-RPC round-trip. Uses a
`forkserver_spawn_method` fixture to flip
`_spawn_method`/`_ctx` for the test's duration + restore on
teardown (so other session-level tests don't observe the
global flip). Test module docstring reworked to describe
the three tiers now covered: (1) primitive-level, (2)
parent-trio-driven primitives, (3) full registered backend.

Status: still-open work (tracked on `tractor#379`) doc'd
inline in the module docstring — no cancel/hard-kill stress
coverage yet, child-side subint-hosted root runtime still
future (gated on `msgspec#563`), thread-hygiene audit
pending the same unblock.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 18:49:23 -04:00
Gud Boi 1d4867e51c Add trio-parent tests for `_subint_forkserver`
New pytest module `tests/spawn/test_subint_forkserver.py`
drives the forkserver primitives from inside a real
`trio.run()` in the parent — the runtime shape tractor will
actually use when we wire up a `subint_forkserver` spawn
backend proper. Complements the standalone no-trio-in-parent
`ai/conc-anal/subint_fork_from_main_thread_smoketest.py`.

Deats,
- new test pkg `tests/spawn/` (+ empty `__init__.py`)
- two tests, both `@pytest.mark.timeout(30, method='thread')`
  for the GIL-hostage safety reason doc'd in
  `ai/conc-anal/subint_sigint_starvation_issue.md`:
  - `test_fork_from_worker_thread_via_trio` — parent-side
    plumbing baseline. `trio.run()` off-loads forkserver
    prims via `trio.to_thread.run_sync()` + asserts the
    child reaps cleanly
  - `test_fork_and_run_trio_in_child` — end-to-end: forked
    child calls `run_subint_in_worker_thread()` with a
    bootstrap str that does `trio.run()` in a fresh subint
- both tests wrap the inner `trio.run()` in a
  `dump_on_hang()` for post-mortem if the outer
  `pytest-timeout` fires
- intentionally NOT using `--spawn-backend` — the tests
  drive the primitives directly rather than going through
  tractor's spawn-method registry (which the forkserver
  isn't plugged into yet)

Also, rename `run_trio_in_subint()` →
`run_subint_in_worker_thread()` for naming consistency with
the sibling `fork_from_worker_thread()`. The action is really
"host a subint on a worker thread", not specifically "run
trio" — trio just happens to be the typical payload.
Propagate the rename to the smoketest.

Further, add a "TODO — cleanup gated on msgspec PEP 684
support" section to the `_subint_forkserver` module
docstring: flags the dedicated-`threading.Thread` design as
potentially-revisable once isolated-mode subints are viable
in tractor. Cross-refs `msgspec#563` + `tractor#379` and
points at an audit-plan conc-anal doc we'll add next.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-22 18:00:06 -04:00
Gud Boi 99d70337b7 Mark `subint`-hanging tests with `skipon_spawn_backend`
Adopt the `@pytest.mark.skipon_spawn_backend('subint',
reason=...)` marker (a617b521) across the suites
reproducing the `subint` GIL-contention / starvation
hang classes doc'd in `ai/conc-anal/subint_*_issue.md`.

Deats,
- Module-level `pytestmark` on full-file-hanging suites:
  - `tests/test_cancellation.py`
  - `tests/test_inter_peer_cancellation.py`
  - `tests/test_pubsub.py`
  - `tests/test_shm.py`
- Per-test decorator where only one test in the file
  hangs:
  - `tests/discovery/test_registrar.py
    ::test_stale_entry_is_deleted` — replaces the
    inline `if start_method == 'subint': pytest.skip`
    branch with a declarative skip.
  - `tests/test_subint_cancellation.py
    ::test_subint_non_checkpointing_child`.
- A few per-test decorators are left commented-in-
  place as breadcrumbs for later finer-grained unskips.

Also, some nearby tidying in the affected files:
- Annotate loose fixture / test params
  (`pytest.FixtureRequest`, `str`, `tuple`, `bool`) in
  `tests/conftest.py`, `tests/devx/conftest.py`, and
  `tests/test_cancellation.py`.
- Normalize `"""..."""` → `'''...'''` docstrings per
  repo convention on a few touched tests.
- Add `timeout=6` / `timeout=10` to
  `@tractor_test(...)` on `test_cancel_infinite_streamer`
  and `test_some_cancels_all`.
- Drop redundant `spawn_backend` param from
  `test_cancel_via_SIGINT`; use `start_method` in the
  `'mp' in ...` check instead.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-21 21:33:15 -04:00
Gud Boi e2b1035ff0 Skip `test_stale_entry_is_deleted` hanger with `subint`s 2026-04-21 13:36:41 -04:00
Gud Boi 5ea5fb211d Wall-cap `subint` audit tests via `pytest-timeout`
Add a hard process-level wall-clock bound on the two
known-hanging subint-backend tests so an unattended
suite run can't wedge indefinitely in either of the
hang classes doc'd in `ai/conc-anal/`.

Deats,
- New `testing` dep: `pytest-timeout>=2.3`.
- `test_stale_entry_is_deleted`:
  `@pytest.mark.timeout(3, method='thread')`. The
  `method='thread'` choice is deliberate —
  `method='signal'` routes via `SIGALRM` which is
  starved by the same GIL-hostage path that drops
  `SIGINT` (see `subint_sigint_starvation_issue.md`),
  so it'd never actually fire in the starvation case.
- `test_subint_non_checkpointing_child`: same
  decorator, same reasoning (defense-in-depth over
  the inner `trio.fail_after(15)`).

At timeout, `pytest-timeout` hard-kills the pytest
process itself — that's the intended behavior here;
the alternative is the suite never returning.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-20 20:45:56 -04:00
Gud Boi 35796ec8ae Doc `subint` backend hang classes + arm `dump_on_hang`
Classify and write up the two distinct hang modes hit during Phase
B subint bringup (issue #379) so future triage doesn't re-derive them
from scratch.

Deats, two new `ai/conc-anal/` docs,
- `subint_sigint_starvation_issue.md`: abandoned legacy-subint thread
  + shared GIL → main trio loop starves → signal-wakeup-fd pipe fills
  → `SIGINT` silently dropped (`strace` shows `write() = EAGAIN` on the
  wakeup-fd). Un- Ctrl-C-able. Structurally a CPython limit; blocked on
  `msgspec` PEP 684 (jcrist/msgspec#563)

- `subint_cancel_delivery_hang_issue.md`: parent-side trio task parks on
  an orphaned IPC channel after subint teardown — no clean EOF delivered
  to the waiting receive. Ctrl-C-able (main loop iterates fine); OUR bug
  to fix. Candidate fix: explicit parent-side channel abort in
  `subint_proc`'s hard-kill teardown

Cross-link the docs from their test reproducers,
- `test_stale_entry_is_deleted` (→ starvation class): wrap
  `trio.run(main)` in `dump_on_hang(seconds=20)` so a future regression
  captures a stack dump. Kept un- skipped so the dump file is
  inspectable

- `test_subint_non_checkpointing_child` (→ delivery class): extend
  docstring with a "KNOWN ISSUE" block pointing at the analysis

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-20 16:41:07 -04:00
Gud Boi baf7ec54ac Add `subint` cancellation + hard-kill test audit
Lock in the escape-hatch machinery added to `tractor.spawn._subint`
during the Phase B.2/B.3 bringup (issue #379) so future stdlib
regressions or our own refactors don't silently re-introduce the
mid-suite hangs.

Deats,
- `test_subint_happy_teardown`: baseline — spawn a subactor, one portal
  RPC, clean teardown. If this breaks, something's wrong unrelated to
  the hard-kill shields.
- `test_subint_non_checkpointing_child`: cancel a subactor stuck in
  a non-checkpointing Python loop (`threading.Event.wait()` releases the
  GIL but never inserts a trio checkpoint). Validates the bounded-shield
  + daemon-driver-thread combo abandons the thread after
    `_HARD_KILL_TIMEOUT`.

Every test is wrapped in `trio.fail_after()` for a deterministic
per-test wall-clock ceiling (an unbounded audit would defeat itself) and
arms `tractor.devx.dump_on_hang()` so a hang captures a stack dump
— pytest's stderr capture swallows `faulthandler` output by default.

Gated via `pytest.importorskip('concurrent.interpreters')` and
a module-level skip when `--spawn-backend` isn't `'subint'`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-20 16:41:07 -04:00
Gud Boi 1037c05eaf Avoid skip `.ipc._ringbuf` import when no `cffi` 2026-04-20 16:39:32 -04:00
Gud Boi 2e9dbc5b12 Handle py3.14+ incompats as test skips
Since we're devving subints we require the 3.14+ stdlib API
and a couple compiled libs don't support it yet, namely:
- `cffi`, which we're only using for the `.ipc._linux` eventfd
  stuff (now factored into `hotbaud` anyway).
- `greenback`, which requires `greenlet` which doesn't seem to be
  wheeled yet
  * on nixos the sdist build was failing due to lack of `g++` which
    i don't care to figure out rn since we don't need `.devx` stuff
    immediately for this subints prototype.
  * [ ] we still need to adjust any dependent suites to skip.

Adjust `test_ringbuf` to skip on import failure.

Also project wide,
- pin us to py 3.13+ in prep for last-2-minor-version policy.
- drop `msgspec>=0.20.0`, the first release with py3.14 support.
2026-04-20 16:30:46 -04:00
Gud Boi 3867403fab Scale `test_open_local_sub_to_stream` timeout by CPU factor
Import and apply `cpu_scaling_factor()` from
`conftest`; bump base from 3.6 -> 4 and multiply
through so CI boxes with slow CPUs don't flake.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-16 20:03:32 -04:00
Gud Boi ed65301d32 Fix misc bugs caught by Copilot review
Deats,
- use `proc.poll() is None` in `sig_prog()` to
  distinguish "still running" from exit code 0;
  drop stale `breakpoint()` from fallback kill
  path (would hang CI).
- add missing `raise` on the `RuntimeError` in
  `async_main()` when no tpt bind addrs given.
- clean up stale uid entries from the registrar
  `_registry` when addr eviction empties the
  addr list.
- update `discovery.__init__` docstring to match
  the new eager `._multiaddr` import.
- fix `registar` -> `registrar` typo in teardown
  report log msg.

Review: PR #429 (Copilot)
https://github.com/goodboy/tractor/pull/429

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:15 -04:00
Gud Boi cd287c7e93 Fix `test_registrar_merge_binds_union` for UDS collision
`get_random()` can produce the same UDS filename for a given
pid+actor-state, so the "disjoint addrs" premise doesn't always hold.
Gate the `len(bound) >= 2` assertion on whether the registry and bind
addrs actually differ via `expect_disjoint`.

Also,
- drop unused `partial` import

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:15 -04:00
Gud Boi 86d4e0d3ed Harden `sig_prog()` retries, adjust debugger test timeouts
Retry signal delivery in `sig_prog()` up to `tries`
times (default 3) w/ `canc_timeout` sleep between
attempts; only fall back to `_KILL_SIGNAL` after all
retries exhaust. Bump default timeout 0.1 -> 0.2.

Also,
- `test_multi_nested_subactors_error_through_nurseries`
  gives the first prompt iteration a 5s timeout even
  on linux bc the initial crash sequence can be slow
  to arrive at a `pdb` prompt

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi c3d6cc9007 Rename `discovery._discovery` to `._api`
Adjust all imports to match.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi e90241baaa Add `parse_endpoints()` to `_multiaddr`
Provide a service-table parsing API for downstream projects (like
`piker`) to declare per-actor transport bind addresses as a config map
of actor-name -> multiaddr strings (e.g. from a TOML `[network]`
section).

Deats,
- `EndpointsTable` type alias: input `dict[str, list[str|tuple]]`.
- `ParsedEndpoints` type alias: output `dict[str, list[Address]]`.
- `parse_endpoints()` iterates the table and delegates each entry to the
  existing `tractor.discovery._discovery.wrap_address()` helper, which
  handles maddr strings, raw `(host, port)` tuples, and pre-wrapped
  `Address` objs.
- UDS maddrs use the multiaddr spec name `/unix/...` (not tractor's
  internal `/uds/` proto_key)

Also add new tests,
- 7 new pure unit tests (no trio runtime): TCP-only, mixed tpts,
  unwrapped tuples, mixed str+tuple, unsupported proto (`/udp/`),
  empty table, empty actor list
- all 22 multiaddr tests pass rn.

Prompt-IO:
ai/prompt-io/claude/20260413T205048Z_269d939c_prompt_io.md

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 7079a597c5 Add `test_tpt_bind_addrs.py` + fix type-mixing bug
Add 9 test variants (6 fns) covering all three
`tpt_bind_addrs` code paths in `open_root_actor()`:
- registrar w/ explicit bind (eq, subset, disjoint)
- non-registrar w/ explicit bind (same/diff
  bindspace) using `daemon` fixture
- non-registrar default random bind (baseline)
- maddr string input parsing
- registrar merge produces union
- `open_nursery()` forwards `tpt_bind_addrs`

Fix type-mixing bug at `_root.py:446` where the
registrar merge path did `set(Address + tuple)`,
preventing dedup and causing double-bind `OSError`.
Wrap `uw_reg_addrs` before the set union so both
sides are `Address` objs.

Also,
- add prompt-io output log for this session
- stage original prompt input for tracking

Prompt-IO: ai/prompt-io/claude/20260413T192116Z_f851f28_prompt_io.md

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi f881683c97 Tweak timeouts and rm `arbiter_addr` in tests
Use `cpu_scaling_factor()` headroom in
`test_peer_spawns_and_cancels_service_subactor`'s `fail_after` to avoid
flaky timeouts on throttled CI runners. Rename `arbiter_addr=` ->
`registry_addrs=[..]` throughout `test_spawning` and
`test_task_broadcasting` suites to match the current `open_root_actor()`
/ `open_nursery()` API.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 490fac432c Preserve absolute UDS paths in `parse_maddr()`
Drop the `.lstrip('/')` on the unix protocol value
so the lib-prepended `/` restores the absolute-path
semantics that `mk_maddr()` strips when encoding.
Pass `Path` components (not `str`) to `UDSAddress`.

Also, update all UDS test params to use absolute
paths (`/tmp/tractor_test/...`, `/tmp/tractor_rt/...`)
matching real runtime sockpath behavior; tighten
`test_parse_maddr_uds` to assert exact `filedir`.

Review: PR #429 (copilot-pull-request-reviewer[bot])
https://github.com/goodboy/tractor/pull/429#pullrequestreview-4018448152

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 5c4438bacc Add `parse_maddr()` tests + registrar maddr integ test
Cover `parse_maddr()` with unit tests for tcp/ipv4,
tcp/ipv6, uds, and unsupported-protocol error paths,
plus full `addr -> mk_maddr -> str -> parse_maddr`
roundtrip verification.

Adds,
- a `_maddr_to_tpt_proto` inverse-mapping assertion.
- an `wrap_address()` maddr-string acceptance test.
- a `test_reg_then_unreg_maddr` end-to-end suite which audits passing
  the registry addr as multiaddr str through the entire runtime.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 1f1e09a786 Move `test_discovery` to `tests/discovery/test_registrar`
All tests are registrar-actor integration scenarios
sharing intertwined helpers + `enable_modules=[__name__]`
task fns, so keep as one mod but rename to reflect
content. Now lives alongside `test_multiaddr.py` in
the new `tests/discovery/` subpkg.

Also,
- update 5 refs in `/run-tests` SKILL.md to match
  the new path
- add `discovery/` subdir to the test directory
  layout tree

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 7cf3b5d00d Add `test_multiaddr.py` suite for `mk_maddr()`
Cover `_tpt_proto_to_maddr` mapping, TCP (ipv4/ipv6),
UDS, unsupported `proto_key` error, and round-trip
re-parse for both transport types.

Deats,
- new `tests/discovery/` subpkg w/ empty `__init__.py`
- `test_tpt_proto_to_maddr_mapping`: verify `tcp` and
  `uds` entries
- `test_mk_maddr_tcp_ipv4`: full assertion on
  `/ip4/127.0.0.1/tcp/1234` incl protocol iteration
- `test_mk_maddr_tcp_ipv6`: verify `/ip6/::1/tcp/5678`
- `test_mk_maddr_uds`: relative `filedir` bc the
  multiaddr parser rejects double-slash from abs paths
- `test_mk_maddr_unsupported_proto_key`: `ValueError`
  on `proto_key='quic'` via `SimpleNamespace` mock
- `test_mk_maddr_roundtrip`: parametrized over tcp +
  uds, re-parse `str(maddr)` back through `Multiaddr`

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 570c975f14 Fix test typo + use `wait_for_result()` API
- "boostrap" → "bootstrap" in mod docstring
- replace deprecated `portal.result()` with
  `portal.wait_for_result()` + value assertion
  inside the nursery block

Review: PR #1 (Copilot)
https://github.com/mahmoudhas/tractor/pull/1#pullrequestreview-4091096072

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-10 20:58:54 -04:00
Gud Boi e8f1eca8d2 Tighten `test_spawning` types, parametrize loglevel
Parametrize `test_loglevel_propagated_to_subactor`
across `'debug'`, `'cancel'`, `'critical'` levels
(was hardcoded to just `'critical'`) and move it
above the parent-main tests for logical grouping.

Also,
- add `start_method: str` annotations throughout
- use `portal.wait_for_result()` in
  `test_most_beautiful_word` (replaces `.result()`)
- expand mod docstring to describe test coverage
- reformat `check_parent_main_inheritance` docstr

Review: PR #438 (Copilot)
https://github.com/goodboy/tractor/pull/438

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-10 20:58:54 -04:00
Gud Boi c6c591e61a Drop `spawn_test_support` pkg, inline parent-main tests
Replace the subproc-based test harness with inline
`tractor.open_nursery()` calls that directly check
`actor._parent_main_data` instead of comparing
`__main__.__name__` across a process boundary
(which is a no-op under pytest bc the parent
`__main__` is `pytest.__main__`).

Deats,
- delete `tests/spawn_test_support/` pkg (3 files)
- add `check_parent_main_inheritance()` helper fn
  that asserts on `_parent_main_data` emptiness
- rewrite both `run_in_actor` and `start_actor`
  parent-main tests as inline async fns
- drop `tmp_path` fixture and unused imports

Review: PR #434 (goodboy, Copilot)
https://github.com/goodboy/tractor/pull/434

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-10 20:58:54 -04:00
mahmoud b883b27646 Exercise parent-main inheritance through spawn test support
Move the subprocess probe into dedicated spawn test support files so the inheritance tests cover the real __main__ replay path without monkeypatching or inline script strings.
2026-04-10 20:58:54 -04:00
mahmoud 00637764d9 Address review follow-ups for parent-main inheritance opt-out
Clean up mutable defaults, give parent-main bootstrap data a named type, and add direct start_actor coverage so the opt-out change is clearer to review.
2026-04-10 20:58:54 -04:00
mahmoud ea971d25aa Rename parent-main inheritance flag.
Use `inherit_parent_main` across the actor APIs and helper to better describe the behavior, and restore the reviewer note at child bootstrap where the inherited `__main__` data is copied from `SpawnSpec`.
2026-04-10 20:58:54 -04:00
mahmoud 83b6c4270a Simplify parent-main replay opt-out.
Keep actor-owned parent-main capture and let `_mp_figure_out_main()` decide whether to return `__main__` bootstrap data, avoiding the extra SpawnSpec plumbing while preserving the per-actor flag.
2026-04-10 20:58:54 -04:00
mahmoud 6309c2e6fc Route parent-main replay through SpawnSpec
Keep trio child bootstrap data in the spawn handshake instead of stashing it on Actor state so the replay opt-out stays explicit and avoids stale-looking runtime fields.
2026-04-10 20:58:54 -04:00
mahmoud f5301d3fb0 Add per-actor parent-main replay opt-out
Let actor callers skip replaying the parent __main__ during child startup so downstream integrations can avoid inheriting incompatible bootstrap state without changing the default spawn behavior.
2026-04-10 20:58:54 -04:00
Gud Boi febe587c6c Drop `xfail` from `test_moc_reentry_during_teardown`
The per-`ctx_key` locking fix in f086222d intended to resolve the
teardown race reproduced by the new test suite, so the test SHOULD now
pass. TLDR, it doesn't Bp

Also add `collapse_eg()` to the test's ctx-manager stack so that when
run with `pytest <...> --tpdb` we'll actually `pdb`-REPL the RTE when it
hits (previously an assert-error).

(this commit-msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-09 14:42:42 -04:00
Gud Boi cab366cd65 Add xfail test for `_Cache.run_ctx` teardown race
Reproduce the piker `open_cached_client('kraken')` scenario: identical
`ctx_key` callers share one cached resource, and a new task re-enters
during `__aexit__` — hitting `assert not resources.get()` bc `values`
was popped but `resources` wasn't yet.

Deats,
- `test_moc_reentry_during_teardown` uses an `in_aexit` event to
  deterministically land in the teardown window.
- marked `xfail(raises=AssertionError)` against unpatched code (fix in
  `9e49eddd` or wtv lands on the `maybe_open_ctx_locking` or thereafter
  patch branch).

Also, add prompt-io log for the session.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

Prompt-IO: ai/prompt-io/claude/20260406T193125Z_85f9c5d_prompt_io.md
2026-04-06 18:17:04 -04:00
Gud Boi 85f9c5df6f Add per-`ctx_key` isolation tests for `maybe_open_context()`
Add `test_per_ctx_key_resource_lifecycle` to verify that per-key user
tracking correctly tears down resources independently - exercises the
fix from 02b2ef18 where a global `_Cache.users` counter caused stale
cache hits when the same `acm_func` was called with different kwargs.

Also, add a paired `acm_with_resource()` helper `@acm` that yields its
`resource_id` for per-key testing in the above suite.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

Prompt-IO: ai/prompt-io/claude/20260406T172848Z_02b2ef1_prompt_io.md
2026-04-06 14:37:47 -04:00
Gud Boi ebe9d5e4b5 Parametrize `test_resource_cache.test_open_local_sub_to_stream`
Namely with multiple pre-sleep `delay`-parametrizations before either,

- parent-scope cancel-calling (as originally) or,
- depending on the  new `cancel_by_cs: bool` suite parameter, optionally
  just immediately exiting from (the newly named)
  `maybe_cancel_outer_cs()` a checkpoint.

In the latter case we ensure we **don't** inf sleep to avoid leaking
those tasks into the `Actor._service_tn` (though we should really have
a better soln for this)..

Deats,
- make `cs` args optional and adjust internal logic to match.
- add some notes around various edge cases and issues with using the
  actor-service-tn as the scope by default.
2026-04-06 14:37:47 -04:00
Gud Boi 8f44efa327 Drop stale `.cancel()`, fix docstring typo in tests
- Remove leftover `await an.cancel()` in
  `test_registered_custom_err_relayed`; the
  nursery already cancels on scope exit.
- Fix `This document` -> `This documents` typo in
  `test_unregistered_err_still_relayed` docstring.

Review: PR #426 (Copilot)
https://github.com/goodboy/tractor/pull/426

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi 5968a3c773 Use `'<unknown>'` for unresolvable `.boxed_type_str`
Add a teensie unit test to match.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi 80597b80bf Add passing test for unregistered err relay
Drop the `xfail` test and instead add a new one that ensures the
`tractor._exceptions` fixes enable graceful relay of
remote-but-unregistered error types via the unboxing of just the
`rae.src_type_str/boxed_type_str` content. The test also ensures
a warning is included with remote error content indicating the user
should register their error type for effective cross-actor re-raising.

Deats,
- add `test_unregistered_err_still_relayed`: verify the
  `RemoteActorError` IS raised with `.boxed_type`
  as `None` but `.src_type_str`, `.boxed_type_str`,
  and `.tb_str` all preserved from the IPC msg.
- drop `test_unregistered_boxed_type_resolution_xfail`
  since the new above case covers it and we don't need to have
  an effectively entirely repeated test just with an inverse assert
  as it's last line..

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi 9c37b3f956 Add `reg_err_types()` test suite for remote exc relay
Verify registered custom error types round-trip correctly over IPC via
`reg_err_types()` + `get_err_type()`.

Deats,
- `TestRegErrTypesPlumbing`: 5 unit tests for the type-registry plumbing
  (register, lookup, builtins, tractor-native types, unregistered
  returns `None`)
- `test_registered_custom_err_relayed`: IPC end-to-end for a registered
  `CustomAppError` checking `.boxed_type`, `.src_type`, and `.tb_str`
- `test_registered_another_err_relayed`: same for `AnotherAppError`
  (multi-type coverage)
- `test_unregistered_custom_err_fails_lookup`: `xfail` documenting that
  `.boxed_type` can't resolve without `reg_err_types()` registration

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi b14dbde77b Skip `test_empty_mngrs_input_raises` on UDS tpt
The `open_actor_cluster()` teardown hangs
intermittently on UDS when `gather_contexts(mngrs=())`
raises `ValueError` mid-setup; likely a race in the
actor-nursery cleanup vs UDS socket shutdown. TCP
passes reliably (5/5 runs).

- Add `tpt_proto` fixture param to the test
- `pytest.skip()` on UDS with a TODO for deeper
  investigation of `._clustering`/`._supervise`
  teardown paths

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi 93d99ed2eb Move `get_cpu_state()` to `conftest` as shared latency headroom
Factor the CPU-freq-scaling helper out of
`test_legacy_one_way_streaming` into `conftest.py`
alongside a new `cpu_scaling_factor()` convenience fn
that returns a latency-headroom multiplier (>= 1.0).

Apply it to the two other flaky-timeout tests,
- `test_cancel_via_SIGINT_other_task`: 2s -> scaled
- `test_example[we_are_processes.py]`: 16s -> scaled

Deats,
- add `get_cpu_state()` + `cpu_scaling_factor()` to
  `conftest.py` so all test mods can share the logic.
- catch `IndexError` (empty glob) in addition to
  `FileNotFoundError`.
- rename `factor` var -> `headroom` at call sites for
  clarity on intent.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi 6215e3b2dd Adjust `test_a_quadruple_example` time-limit for CPU scaling
Add `get_cpu_state()` helper to read CPU freq settings
from `/sys/devices/system/cpu/` and use it to compensate
the perf time-limit when `auto-cpufreq` (or similar)
scales down the max frequency.

Deats,
- read `*_pstate_max_freq` and `scaling_max_freq`
  to compute a `cpu_scaled` ratio.
- when `cpu_scaled != 1.`, increase `this_fast` limit
  proportionally (factoring dual-threaded cores).
- log a warning via `test_log` when compensating.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi 21ed181835 Filter `__pycache__` from example discovery in tests
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi 9ec2749ab7 Rename `Arbiter` -> `Registrar`, mv to `discovery._registry`
Move the `Arbiter` class out of `runtime._runtime` into its
logical home at `discovery._registry` as `Registrar(Actor)`.
This completes the long-standing terminology migration from
"arbiter" to "registrar/registry" throughout the codebase.

Deats,
- add new `discovery/_registry.py` mod with `Registrar`
  class + backward-compat `Arbiter = Registrar` alias.
- rename `Actor.is_arbiter` attr -> `.is_registrar`;
  old attr now a `@property` with `DeprecationWarning`.
- `_root.py` imports `Registrar` directly for
  root-actor instantiation.
- export `Registrar` + `Arbiter` from `tractor.__init__`.
- `_runtime.py` re-imports from `discovery._registry`
  for backward compat.

Also,
- update all test files to use `.is_registrar`
  (`test_local`, `test_rpc`, `test_spawning`,
  `test_discovery`, `test_multi_program`).
- update "arbiter" -> "registrar" in comments/docstrings
  across `_discovery.py`, `_server.py`, `_transport.py`,
  `_testing/pytest.py`, and examples.
- drop resolved TODOs from `_runtime.py` and `_root.py`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi f3441a6790 Update tests+examples imports for new subpkgs
Adjust all `tractor._state`, `tractor._addr`,
`tractor._supervise`, etc. refs in tests and examples
to use the new `runtime/`, `discovery/`, `spawn/` paths.

Also,
- use `tractor.debug_mode()` pub API instead of
  `tractor._state.debug_mode()` in a few test mods
- add explicit `timeout=20` to `test_respawn_consumer_task`
  `@tractor_test` deco call

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi 85457cb839 Address Copilot review suggestions on PR #366
- Use `bidict.forceput()` in `register_actor()` to handle
  duplicate addr values from stale entries or actor restarts.
- Fix `uid` annotation to `tuple[str, str]|None` in
  `maybe_open_portal()` and handle the `None` return from
  `delete_addr()` in log output.
- Pass explicit `registry_addrs=[reg_addr]` to `open_nursery()`
  and `find_actor()` in `test_stale_entry_is_deleted` to ensure
  the test uses the remote registrar.
- Update `query_actor()` docstring to document the new
  `(addr, reg_portal)` yield shape.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 00:21:09 -04:00
Tyler Goodlet d929fb75b5 Rename `.delete_sockaddr()` -> `.delete_addr()` 2026-03-13 21:51:15 -04:00
Tyler Goodlet 528012f35f Add stale entry deleted from registrar test
By spawning an actor task that immediately shuts down the transport
server and then sleeps, verify that attempting to connect via the
`._discovery.find_actor()` helper delivers `None` for the `Portal`
value.

Relates to #184 and #216
2026-03-13 21:51:15 -04:00
Gud Boi 5b8f6cf4c7 Use `.aid.uid` to avoid deprecation warns in tests
- `test_inter_peer_cancellation`: swap all `.uid` refs
  on `Actor`, `Channel`, and `Portal` to `.aid.uid`
- `test_legacy_one_way_streaming`: same + fix `print()`
  to multiline style

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-13 21:10:52 -04:00
Gud Boi 066011b83d Bump `fail_after` delay on non-linux for sync-sleep test
Use 6s timeout on non-linux (vs 4s) in
`test_cancel_while_childs_child_in_sync_sleep()` to avoid
flaky `TooSlowError` on slower CI runners.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-13 21:10:52 -04:00
Gud Boi 8991ec2bf5 Fix warns and de-reg race in `test_discovery`
Removes the `pytest` deprecation warns and attempts to avoid
some de-registration raciness, though i'm starting to think the
real issue is due to not having the fixes from #366 (which handle
the new dereg on `OSError` case from UDS)?

- use `.channel.aid.uid` over deprecated `.channel.uid`
  throughout `test_discovery.py`.
- add polling loop (up to 5s) for subactor de-reg check
  in `spawn_and_check_registry()` to handle slower
  transports like UDS where teardown takes longer.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-13 21:10:52 -04:00