tractor

Commit Graph

Author	SHA1	Message	Date
Gud Boi	eceed29d4a	Pin forkserver hang to pytest `--capture=fd` Sixth and final diagnostic pass: after all 4 cascade fixes landed (FD hygiene, pidfd wait, `_parent_chan_cs` wiring, bounded peer-clear), the actual last gate on `test_nested_multierrors[subint_forkserver]` turned out to be pytest's default `--capture=fd` stdout/stderr capture, not anything in the runtime cascade. Empirical result: `pytest -s` → test PASSES in 6.20s. Default `--capture=fd` → hangs forever. Mechanism: pytest replaces the parent's fds 1,2 with pipe write-ends it reads from. Fork children inherit those pipes (since `_close_inherited_fds` correctly preserves stdio). The error-propagation cascade in a multi-level cancel test generates 7+ actors each logging multiple `RemoteActorError` / `ExceptionGroup` tracebacks, enough output to fill Linux's 64KB pipe buffer. Writes block, subactors can't progress, processes don't exit, `_ForkedProc.wait` hangs. Self-critical aside: I earlier tested w/ and w/o `-s` and both hung, concluding "capture-pipe ruled out". That was wrong: at that time fixes 1-4 weren't all in place, so the test was failing at deeper levels long before reaching the "produce lots of output" phase. Once the cascade could actually tear down cleanly, enough output flowed to hit the pipe limit. Order-of-operations mistake: ruling something out based on a test that was failing for a different reason. Deats, - `subint_forkserver_test_cancellation_leak_issue.md`: new section "Update — VERY late: pytest capture pipe IS the final gate" w/ DIAG timeline showing `trio.run` fully returns, diagnosis of pipe-fill mechanism, retrospective on the earlier wrong ruling-out, and fix direction (redirect subactor stdout/stderr to `/dev/null` in fork-child prelude, conditional on pytest-detection or opt-in flag) - `tests/test_cancellation.py`: skip-mark reason rewritten to describe the capture-pipe gate specifically; cross-refs the new doc section - `tests/spawn/test_subint_forkserver.py`: the orphan-SIGINT test regresses back to xfail. Previously passed after the FD-hygiene fix, but the new `wait_for_no_more_peers(move_on_after=3.0)` bound in `async_main`'s teardown added up to 3s latency, pushing orphan-subactor exit past the test's 10s poll window. Real fix: faster orphan-side teardown OR extend poll window to 15s No runtime code changes in this commit, just test-mark adjustments + doc wrap-up. (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code	2026-04-23 23:18:14 -04:00
Gud Boi	506617c695	Skip-mark + narrow `subint_forkserver` cancel hang Two-part stopgap for the still-hanging `test_nested_multierrors[subint_forkserver]`: 1. Skip-mark the test via `@pytest.mark.skipon_spawn_backend('subint_forkserver', reason=...)` so it stops blocking the test matrix while the remaining bug is being chased. The reason string cross-refs the conc-anal doc for full context. 2. Update the conc-anal doc (`subint_forkserver_test_cancellation_leak_issue.md`) with the empirical state after the three nested-cancel fix commits (`0cd0b633` FD scrub + `fe540d02` pidfd wait + `57935804` parent-chan shield break) landed, narrowing the remaining hang from "everything broken" to "peer-channel loops don't exit on `service_tn` cancel". Deats from the DIAGDEBUG instrumentation pass, - 80 `process_messages` ENTERs, 75 EXITs → 5 stuck - ALL 40 `shield=True` ENTERs matched EXIT: the `_parent_chan_cs.cancel()` wiring from `57935804` works as intended for shielded loops. - the 5 stuck loops are all `shield=False` peer-channel handlers in `handle_stream_from_peer` (inbound connections handled by `stream_handler_tn`, which IS `service_tn` in the current config). - after `_parent_chan_cs.cancel()` fires, NEW shielded loops appear on the session reg_addr port, probably discovery-layer reconnection; doesn't block teardown but indicates the cascade has more moving parts than expected. The remaining unknown: why don't the 5 peer-channel loops exit when `service_tn.cancel_scope.cancel()` fires? They're not shielded, they're inside the service_tn scope, a standard cancel should propagate through. Some fork-config-specific divergence keeps them alive. Doc lists three follow-up experiments (stackscope dump, side-by-side `trio_proc` comparison, audit of the `tractor/ipc/_server.py:448` `except trio.Cancelled:` path). (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code	2026-04-23 18:48:34 -04:00
Gud Boi	1af2121057	Wire `reg_addr` through leaky cancel tests Stopgap companion to `d0121960` (`subint_forkserver` test-cancellation leak doc): five tests in `tests/test_cancellation.py` were running against the default `:1616` registry, so any leaked `subint-forkserv` descendant from a prior test holds the port and blows up every subsequent run with `TooSlowError` / "address in use". Thread the session-unique `reg_addr` fixture through so each run picks its own port, so zombies can no longer poison other tests (they'll only cross-contaminate whatever happens to share their port, which is now nothing). Deats, - add `reg_addr: tuple` fixture param to: - `test_cancel_infinite_streamer` - `test_some_cancels_all` - `test_nested_multierrors` - `test_cancel_via_SIGINT` - `test_cancel_via_SIGINT_other_task` - explicitly pass `registry_addrs=[reg_addr]` to the two `open_nursery()` calls that previously had no kwargs at all (in `test_cancel_via_SIGINT` and `test_cancel_via_SIGINT_other_task`) - add bounded `@pytest.mark.timeout(7, method='thread')` to `test_nested_multierrors` so a hung run doesn't wedge the whole session Still doesn't close the real leak: the `subint_forkserver` backend's `_ForkedProc.kill()` is PID-scoped not tree-scoped, so grandchildren survive teardown regardless of registry port. This commit is just blast-radius containment until that fix lands. See `ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md`. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code	2026-04-23 18:48:34 -04:00
Gud Boi	4b2a0886c3	Mark `subint`-hanging tests with `skipon_spawn_backend` Adopt the `@pytest.mark.skipon_spawn_backend('subint', reason=...)` marker (`a617b521`) across the suites reproducing the `subint` GIL-contention / starvation hang classes doc'd in `ai/conc-anal/subint_*_issue.md`. Deats, - Module-level `pytestmark` on full-file-hanging suites: - `tests/test_cancellation.py` - `tests/test_inter_peer_cancellation.py` - `tests/test_pubsub.py` - `tests/test_shm.py` - Per-test decorator where only one test in the file hangs: - `tests/discovery/test_registrar.py::test_stale_entry_is_deleted`: replaces the inline `if start_method == 'subint': pytest.skip` branch with a declarative skip. - `tests/test_subint_cancellation.py::test_subint_non_checkpointing_child`. - A few per-test decorators are left commented-in-place as breadcrumbs for later finer-grained unskips. Also, some nearby tidying in the affected files: - Annotate loose fixture / test params (`pytest.FixtureRequest`, `str`, `tuple`, `bool`) in `tests/conftest.py`, `tests/devx/conftest.py`, and `tests/test_cancellation.py`. - Normalize `"""..."""` → `'''...'''` docstrings per repo convention on a few touched tests. - Add `timeout=6` / `timeout=10` to `@tractor_test(...)` on `test_cancel_infinite_streamer` and `test_some_cancels_all`. - Drop redundant `spawn_backend` param from `test_cancel_via_SIGINT`; use `start_method` in the `'mp' in ...` check instead. (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code	2026-04-23 18:47:49 -04:00
Gud Boi	93d99ed2eb	Move `get_cpu_state()` to `conftest` as shared latency headroom Factor the CPU-freq-scaling helper out of `test_legacy_one_way_streaming` into `conftest.py` alongside a new `cpu_scaling_factor()` convenience fn that returns a latency-headroom multiplier (>= 1.0). Apply it to the two other flaky-timeout tests, - `test_cancel_via_SIGINT_other_task`: 2s -> scaled - `test_example[we_are_processes.py]`: 16s -> scaled Deats, - add `get_cpu_state()` + `cpu_scaling_factor()` to `conftest.py` so all test mods can share the logic. - catch `IndexError` (empty glob) in addition to `FileNotFoundError`. - rename `factor` var -> `headroom` at call sites for clarity on intent. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code	2026-04-02 17:59:13 -04:00
Gud Boi	066011b83d	Bump `fail_after` delay on non-linux for sync-sleep test Use 6s timeout on non-linux (vs 4s) in `test_cancel_while_childs_child_in_sync_sleep()` to avoid flaky `TooSlowError` on slower CI runners. (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code	2026-03-13 21:10:52 -04:00
Gud Boi	6ee0149e8d	Another cancellation test timeout bump for non-linux	2026-03-09 19:46:42 -04:00
Tyler Goodlet	88c1c083bd	Add timeout to inf-streamer test	2025-08-18 13:31:15 -04:00
Tyler Goodlet	b096867d40	Remove lingering seg=False-flags from tests	2025-08-18 12:03:32 -04:00
Tyler Goodlet	d2ac9ecf95	Resolve `test_cancel_while_childs_child_in_sync_sleep` Was failing due to the `.fail_after()` timeout being too short and somehow the new interplay of that with strict-exception groups resulting in the `TooSlowError` never raising but instead an eg with the embedded `AssertionError`?? I still don't really get it honestly.. I've written up lengthy notes around the different `delay` settings that can be used to see the diff outcomes, the failing case being the one i still don't really grok and think is justification for `trio` to bubble inner `Cancelled`s differently possibly? For now i've included the original failing case as an `xfail` parametrization which will hopefully drive a follow-up lowlevel `trio` test in `test_trioisms`!	2025-08-18 10:46:37 -04:00
Tyler Goodlet	8218f0f51f	Bit of multi-line styling / name tweaks in cancellation suites	2025-08-18 10:46:37 -04:00
Tyler Goodlet	8f7c022afe	Various test tweaks related to 3.13 egs Including changes like, - loose eg flagging in various test embedded `trio.open_nursery()`s. - changes to eg handling (like using `except*`). - added `debug_mode` integration to tests that needed some REPLing in order to figure out appropriate updates.	2025-03-27 13:38:47 -04:00
Tyler Goodlet	41a3297b9f	Tweak some test asserts to better `is` style	2025-03-27 13:24:25 -04:00
Tyler Goodlet	683288c8db	Update tests for `PldRx` and `Context` changes Mostly adjustments for the new pld-receiver semantics/shim-layer which results more often in the direct delivery of `RemoteActorError`s from IPC API primitives (like `Portal.result()`) instead of being embedded in an `ExceptionGroup` bundled from an embedded nursery. Tossed usage of the `debug_mode: bool` fixture to a couple problematic tests while i was working on them. Also includes detailed assertion updates to the inter-peer cancellation suite in terms of, - `Context.canceller` state correctly matching the true src actor when expecting a ctxc. - any rxed `ContextCancelled` should instance match the `Context._local/remote_error` as should the `.msgdata` and `._ipc_msg`.	2025-03-24 14:04:51 -04:00
Tyler Goodlet	dd9fe0b043	Add `tests/__init__.py` for `.conftest` imports I must have had a local touched file but never committed or something? Seems that new `pytest` requires a top level `tests` pkg in order for relative `.conftest` imports to work.	2025-03-20 20:53:54 -04:00
Tyler Goodlet	5bf550b64a	Adjust all `RemoteActorError.type` using tests To instead use the new `.boxed_type` B)	2025-03-20 20:35:02 -04:00
Tyler Goodlet	dec2b1f0f5	Reapply "Port all tests to new `reg_addr` fixture name" This reverts-the-revert of commit `bc13599e1f` which was needed to land pre `multihomed` feat branch history.	2025-03-20 19:50:31 -04:00
Tyler Goodlet	bc13599e1f	Revert "Port all tests to new `reg_addr` fixture name" This reverts commit `715348c5c2`.	2025-03-19 15:34:30 -04:00
Tyler Goodlet	a87df3009f	Drop now-deprecated deps on modern `trio`/Python - `trio_typing` is nearly obsolete since `trio >= 0.23` - `exceptiongroup` is built-in to python 3.11 - `async_generator` primitives have lived in `contextlib` for quite a while!	2025-03-16 16:06:24 -04:00
Tyler Goodlet	389b305d3b	Add (back) a `tractor._testing` sub-pkg Since importing from our top level `conftest.py` is not scalable or as "future forward thinking" in terms of: - LoC-wise (it's only one file), - prevents "external" (aka non-test) example scripts from importing content easily, - seemingly(?) can't be used via abs-import if using a `[tool.pytest.ini_options]` in a `pyproject.toml` vs. a `pytest.ini`, see: https://docs.pytest.org/en/8.0.x/reference/customize.html#pyproject-toml => Go back to having an internal "testing" pkg like `trio` (kinda) does. Deats: - move generic top level helpers into pkg-mod including the new `expect_ctxc()` (which i needed in the advanced faults testing script). - move `@tractor_test` into `._testing.pytest` sub-mod. - adjust all the helper imports to be a `from tractor._testing import <..>` Rework `test_ipc_channel_break_during_stream()` and backing script: - make test(s) pull `debug_mode` from new fixture (which is now controlled manually from `--tpdb` flag) and drop the previous parametrized input. - update logic in ^ test for "which-side-fails" cases to better match recently updated/stricter cancel/failure semantics in terms of `ClosedResourceError` vs. `EndOfChannel` expectations. - handle `ExceptionGroup`s with expected embedded errors in test. - better pedantics around whether to expect a user simulated KBI. - for `examples/advanced_faults/ipc_failure_during_stream.py` script: - generalize ipc breakage in new `break_ipc()` with support for diff internal `trio` methods and a #TODO for future disti frameworks - only make one sub-actor task break and the other just stream. - use new `._testing.expect_ctxc()` around ctx block. - add a bit of exception handling with `print()`s around ctxc (unused except if 'msg' break method is set) and eoc cases. - don't break parent side ipc in loop any more than once after first break, checked via flag var. - add a `pre_close: bool` flag to control whether `MsgStream.aclose()` is called before any ipc breakage method. Still TODO: - drop `pytest.ini` and add the alt section to `pyproject.toml`. -> currently can't get `--rootdir=` opt to work.. not showing in console header. -> ^ also breaks on 'tests' `enable_modules` imports in subactors during discovery tests?	2025-03-16 15:28:28 -04:00
Tyler Goodlet	664ae87588	Make `@context`-cancelled tests more pedantic In order to match a very significant and coming-soon patch set to the IPC `Context` and `Channel` cancellation semantics with significant but subtle changes to the primitives and runtime logic: - a new set of `Context` state pub meth APIs for checking exact inter-actor-linked-task outcomes such as `.outcome`, `.maybe_error`, and `.cancel_acked`. - trying to move away from `Context.cancelled_caught` usage since the semantics from `trio` don't really map well (in terms of cancel requests and how they result in cancel-scope graceful closure) and `.cancel_acked: bool` is a better approach for IPC req-resp msging. - change test usage to access `._scope.cancelled_caught` directly. - more pedantic ctxc-raising expects around the "type of self cancellation" and final outcome in ctxc cases: - `ContextCancelled` is raised by ctx (`Context.result()`) consumer methods when `Portal.cancel_actor()` is called (since it's an out-of-band request) despite `Channel._cancel_called` being set. - also raised by `.open_context().__aexit__()` on close. - `.outcome` is always `.maybe_error` is always one of `._local/remote_error`.	2025-03-14 22:18:31 -04:00
Tyler Goodlet	715348c5c2	Port all tests to new `reg_addr` fixture name	2025-03-14 13:42:15 -04:00
Tyler Goodlet	347591c348	Expect egs in tests which retrieve portal results	2022-10-14 19:42:23 -04:00
Tyler Goodlet	0f523b65fb	Change cancel test over the exception group	2022-10-14 18:16:51 -04:00
Tyler Goodlet	d24fae8381	'Rename mp spawn methods to have a `'mp_'` prefix'	2022-10-09 17:54:55 -04:00
Tyler Goodlet	b3ff4b7804	Increase some timeouts for windows	2022-01-21 12:20:06 -05:00
Tyler Goodlet	d65912e1ae	Increase kbi delay in remote cancel test	2021-12-17 09:38:04 -05:00
Tyler Goodlet	a29924f330	Don't assume exception order from nursery	2021-12-02 08:45:58 -05:00
Tyler Goodlet	16a3321a38	Increase timeout for windows..	2021-11-29 21:52:30 -05:00
Tyler Goodlet	121f7fd844	Draft test that shows a slow daemon cancellation Currently if the spawn task is waiting on a daemon actor it is likely in `await proc.wait()`, however, if the actor nursery is subsequently cancelled this checkpoint will be abandoned and the hard proc reaping sequence will execute which results in a up to 3 second wait before a "hard" system signal is sent to the child. Ideally such a cancelled-during-daemon-actor-wait condition is instead handled by first trying to cancel the remote actor using `Portal.cancel_actor()` (a "graceful" remote cancel request) which should (presuming normal runtime operation) result in an immediate collection of the process after normal actor (remotely triggered) runtime cancellation.	2021-11-29 16:03:14 -05:00
Tyler Goodlet	4f222a5f9c	Use type match of expected error	2021-10-15 10:25:50 -04:00
Tyler Goodlet	533457c64d	Handle nested multierror case on windows	2021-10-15 09:16:51 -04:00
Tyler Goodlet	7ee121aeaf	Try to handle variable windows errors	2021-10-14 13:39:46 -04:00
Tyler Goodlet	b372f4c92b	Handle top level multierror that presents now?	2021-07-02 11:55:16 -04:00
Tyler Goodlet	2efd8ed167	Drop run and rpc_module_paths from cancel tests	2021-05-07 11:21:40 -04:00
Tyler Goodlet	2498a4963b	Update all tests to new streaming API	2021-04-28 12:23:14 -04:00
Tyler Goodlet	1f1619c730	Convert all test suite sync funcs	2021-04-27 12:08:30 -04:00
Tyler Goodlet	0eba5f4708	Port remaining tests to pass func refs	2020-12-22 10:39:47 -05:00
Tyler Goodlet	a668f714d5	Allow passing function refs to `Portal.run()` This resolves and completes #69 allowing all RPC invocation APIs to pass function references directly instead of explicit `str` names for the target namespace and function (this is still done implicitly underneath). This brings us closer to `trio`'s task running API as well as acknowledges that any inter-host RPC system (and API) will likely need to be implemented on top of local RPC primitives anyway. Even if this ends up not being true we can always go to "function stubs" as part of our IAC protocol or, add a new method to do explicit namespace calls: `.run_from_module()` or whatever everyone votes on. Resolves #69 Further, this commit drops `Actor.statespace` from the entire system since a user can easily get this same functionality using module level variables. Fix docs to match all these changes (luckily mostly already done due to example scripts referencing).	2020-12-21 09:09:55 -05:00
Tyler Goodlet	1b6ee2ecf6	Skip sync sleep test on windows	2020-10-13 15:26:46 -04:00
Tyler Goodlet	666966097a	Revert "Change to relative conftest.py imports" This reverts commit `2b53c74b1c`.	2020-10-13 14:42:02 -04:00
Tyler Goodlet	24ef919334	Skip sync sleep test on mp backend	2020-10-13 14:16:20 -04:00
Tyler Goodlet	0e344eead8	Add a "cancel arrives during a sync sleep in child" test This appears to demonstrate the same bug found in #156. It looks like cancelling a subactor with a child, while that child is running sync code, can result in the child never getting cancelled due to some strange condition where the internal nurseries aren't being torn down as expected when a `trio.Cancelled` is raised.	2020-10-12 23:25:22 -04:00
Tyler Goodlet	2b53c74b1c	Change to relative conftest.py imports	2020-10-05 11:58:58 -04:00
Tyler Goodlet	da56d0f043	Add slight delays to SIGINT tests on mp	2020-07-29 13:27:15 -04:00
Tyler Goodlet	e8a38e4d15	Fix cancelled type handling	2020-07-27 11:15:05 -04:00
Tyler Goodlet	3c7ec72f8e	Fix SIGINT test names	2020-07-26 23:37:44 -04:00
Tyler Goodlet	5a27065a10	Finally tame the super flaky tests - ease up on first stream test run deadline - skip streaming tests in CI for mp backend, period - give up on > 1 depth nested spawning with mp - completely give up on slow spawning on windows	2020-07-26 22:53:40 -04:00
Tyler Goodlet	891edbab5f	Run the trio spawner in nested tests	2020-07-25 18:19:17 -04:00
Tyler Goodlet	dddbeb0e71	Run Windows on trio and mp backends The new pure trio spawning backend uses `subprocess` internally which is also supported on windows so let's run it in CI.	2020-07-25 13:41:48 -04:00
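
The pipe-fill mechanism described in commit `eceed29d4a` (writers stalling once an unread pipe's kernel buffer is full) can be reproduced in isolation. The following is a minimal sketch, not code from the repository: `fill_pipe` is a hypothetical helper name, it assumes a POSIX platform, and it uses a non-blocking write end so that the point where a blocking writer would stall surfaces as `BlockingIOError` instead of a hang.

```python
import os


def fill_pipe(chunk_size: int = 4096) -> int:
    """Write into a pipe that nobody reads until the kernel refuses more data.

    Returns the number of bytes accepted before a write would have blocked.
    `fill_pipe` is an illustrative helper, not part of tractor or pytest.
    """
    read_fd, write_fd = os.pipe()
    # Non-blocking mode turns "the writer hangs" into a catchable error.
    os.set_blocking(write_fd, False)
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # Buffer is full: a blocking writer would stall right here
                # until some reader drained the pipe.
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(f"pipe accepted {fill_pipe()} bytes before a write would block")
```

The reported capacity depends on the platform and kernel settings; the commit message cites 64KB for Linux. With a blocking write end and no reader draining the pipe, the same loop would simply never return, which is the hang shape the commit describes for subactors logging large tracebacks.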


67 Commits (eceed29d4a6b04edb5b04c555666944aac3b79d9)