tractor/tests
Gud Boi fe89169f1c Skip-mark + narrow `subint_forkserver` cancel hang
Two-part stopgap for the still-hanging
`test_nested_multierrors[subint_forkserver]`:

1. Skip-mark the test via
   `@pytest.mark.skipon_spawn_backend('subint_forkserver',
   reason=...)` so it stops blocking the test
   matrix while the remaining bug is being chased.
   The reason string cross-refs the conc-anal doc
   for full context.

2. Update the conc-anal doc
   (`subint_forkserver_test_cancellation_leak_issue.md`) with the
   empirical state after the three nested-cancel fix commits
   (`0cd0b633` FD scrub + `fe540d02` pidfd wait + `57935804` parent-chan
   shield break) landed, narrowing the remaining hang from "everything
   broken" to "peer-channel loops don't exit on `service_tn` cancel".

Deats from the DIAGDEBUG instrumentation pass:
- 80 `process_messages` ENTERs, 75 EXITs → 5 stuck
- ALL 40 `shield=True` ENTERs matched EXIT — the
  `_parent_chan_cs.cancel()` wiring from `57935804`
  works as intended for shielded loops.
- the 5 stuck loops are all `shield=False` peer-
  channel handlers in `handle_stream_from_peer`
  (inbound connections handled by
  `stream_handler_tn`, which IS `service_tn` in the
  current config).
- after `_parent_chan_cs.cancel()` fires, NEW
  shielded loops appear on the session reg_addr
  port — probably discovery-layer reconnection;
  doesn't block teardown but indicates the cascade
  has more moving parts than expected.
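
The ENTER/EXIT accounting above can be reproduced with a small tally
script. This is only a sketch: the `ENTER <id> shield=<bool>` line format
and the `find_stuck_loops()` helper are hypothetical, not the actual
DIAGDEBUG output, and the sample data is synthesized to match the counts
reported above.

```python
def find_stuck_loops(lines):
    """Return {loop_id: shield_flag} for loops that logged ENTER but never EXIT.

    Assumes a hypothetical log format: '<ENTER|EXIT> <loop_id> shield=<True|False>'.
    """
    still_open = {}
    for line in lines:
        kind, loop_id, flag = line.split()
        if kind == "ENTER":
            still_open[loop_id] = (flag == "shield=True")
        elif kind == "EXIT":
            still_open.pop(loop_id, None)
    return still_open


# Synthetic sample mirroring the reported numbers: 40 shielded loops that
# all exit, plus 40 unshielded loops of which only 35 exit.
sample = []
for i in range(40):
    sample.append(f"ENTER s{i} shield=True")
    sample.append(f"EXIT s{i} shield=True")
for i in range(40):
    sample.append(f"ENTER p{i} shield=False")
    if i >= 5:
        sample.append(f"EXIT p{i} shield=False")

stuck = find_stuck_loops(sample)
n_enter = sum(1 for ln in sample if ln.startswith("ENTER"))
n_exit = sum(1 for ln in sample if ln.startswith("EXIT"))

if __name__ == "__main__":
    print(f"{n_enter} ENTERs, {n_exit} EXITs, {len(stuck)} stuck")
    print("stuck loops all unshielded:", not any(stuck.values()))
```

Keying on a per-loop id (rather than only comparing totals) is what lets
the tally attribute every unmatched ENTER to a `shield=` flavor.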

The remaining unknown: why don't the 5 peer-channel loops exit when
`service_tn.cancel_scope.cancel()` fires? They're not shielded and they're
inside the `service_tn` scope, so a standard cancel should propagate through.
Some fork-config-specific divergence keeps them alive. Doc lists three
follow-up experiments (stackscope dump, side-by-side `trio_proc`
comparison, audit of the `tractor/ipc/_server.py:448` `except
trio.Cancelled:` path).

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 16:44:15 -04:00
devx Mark `subint`-hanging tests with `skipon_spawn_backend` 2026-04-21 21:33:15 -04:00
discovery Mark `subint`-hanging tests with `skipon_spawn_backend` 2026-04-21 21:33:15 -04:00
ipc Update tests+examples imports for new subpkgs 2026-04-02 17:59:13 -04:00
msg Update tests+examples imports for new subpkgs 2026-04-02 17:59:13 -04:00
spawn Use `pidfd` for cancellable `_ForkedProc.wait` 2026-04-23 16:06:45 -04:00
__init__.py Add `tests/__init__.py` for `.conftest` imports 2025-03-20 20:53:54 -04:00
conftest.py Mark `subint`-hanging tests with `skipon_spawn_backend` 2026-04-21 21:33:15 -04:00
test_2way.py Tidy a typing-typo, add explicit `ids=` for paramed suites 2026-03-09 19:35:47 -04:00
test_advanced_faults.py Revert advanced-fault UDS edge case handling 2026-03-13 21:10:52 -04:00
test_advanced_streaming.py Remove lingering seg=False-flags from tests 2025-08-18 12:03:32 -04:00
test_cancellation.py Skip-mark + narrow `subint_forkserver` cancel hang 2026-04-23 16:44:15 -04:00
test_child_manages_service_nursery.py Swap `open_channel_from()` to yield `(chan, first)` 2026-03-13 19:28:57 -04:00
test_clustering.py Skip `test_empty_mngrs_input_raises` on UDS tpt 2026-04-02 17:59:13 -04:00
test_context_stream_semantics.py Update tests+examples imports for new subpkgs 2026-04-02 17:59:13 -04:00
test_docs_examples.py Move `get_cpu_state()` to `conftest` as shared latency headroom 2026-04-02 17:59:13 -04:00
test_infected_asyncio.py Update tests+examples imports for new subpkgs 2026-04-02 17:59:13 -04:00
test_inter_peer_cancellation.py Mark `subint`-hanging tests with `skipon_spawn_backend` 2026-04-21 21:33:15 -04:00
test_legacy_one_way_streaming.py Move `get_cpu_state()` to `conftest` as shared latency headroom 2026-04-02 17:59:13 -04:00
test_local.py Rename `Arbiter` -> `Registrar`, mv to `discovery._registry` 2026-04-02 17:59:13 -04:00
test_log_sys.py Mk `test_implicit_mod_name_applied_for_child()` check init-mods 2026-02-11 21:43:37 -05:00
test_multi_program.py Rename `discovery._discovery` to `._api` 2026-04-14 19:54:14 -04:00
test_oob_cancellation.py Woops, fix missing `assert` thanks to copilot 2025-09-11 13:13:18 -04:00
test_pubsub.py Mark `subint`-hanging tests with `skipon_spawn_backend` 2026-04-21 21:33:15 -04:00
test_reg_err_types.py Drop stale `.cancel()`, fix docstring typo in tests 2026-04-02 18:21:19 -04:00
test_remote_exc_relay.py Adjust ep-masking-suite for the real-use-case 2025-07-15 07:23:21 -04:00
test_resource_cache.py Scale `test_open_local_sub_to_stream` timeout by CPU factor 2026-04-16 20:03:32 -04:00
test_ringbuf.py Avoid skip `.ipc._ringbuf` import when no `cffi` 2026-04-20 16:39:32 -04:00
test_root_infect_asyncio.py Swap `open_channel_from()` to yield `(chan, first)` 2026-03-13 19:28:57 -04:00
test_root_runtime.py Update tests+examples imports for new subpkgs 2026-04-02 17:59:13 -04:00
test_rpc.py Rename `Arbiter` -> `Registrar`, mv to `discovery._registry` 2026-04-02 17:59:13 -04:00
test_runtime.py Repair lifetime-stack suite's flakiness 2026-03-13 21:10:52 -04:00
test_shm.py Mark `subint`-hanging tests with `skipon_spawn_backend` 2026-04-21 21:33:15 -04:00
test_spawning.py Tweak timeouts and rm `arbiter_addr` in tests 2026-04-14 19:54:14 -04:00
test_task_broadcasting.py Tweak timeouts and rm `arbiter_addr` in tests 2026-04-14 19:54:14 -04:00
test_trioisms.py Tweaks from copilot, type fix, typos, language. 2025-09-11 10:01:25 -04:00