Compare commits

2 Commits

Author SHA1 Message Date
Gud Boi 0cd0b633f1 Scrub inherited FDs in fork-child prelude
Implements fix-direction (1)/blunt-close-all-FDs from
b71705bd (`subint_forkserver` nested-cancel hang
diag), targeting the multi-level cancel-cascade
deadlock in
`test_nested_multierrors[subint_forkserver]`.

The diagnosis doc voted for surgical FD cleanup via
`actor.ipc_server` handle as the cleanest approach,
but going blunt is actually the right call: after
`os.fork()`, the child immediately enters
`_actor_child_main()` which opens its OWN IPC
sockets / wakeup-fd / epoll-fd / etc. — none of the
parent's FDs are needed. Closing everything except
stdio is safe AND defends against future
listener/IPC additions to the parent inheriting
silently into children.

Deats,
- new `_close_inherited_fds(keep={0,1,2}) -> int`
  helper. Linux fast-path enumerates `/proc/self/fd`;
  POSIX fallback uses `RLIMIT_NOFILE` range. Matches
  the stdlib `subprocess._posixsubprocess.close_fds`
  strategy. Returns close-count for sanity logging
- wire into `fork_from_worker_thread._worker()`'s
  post-fork child prelude — runs immediately after
  the pid-pipe `os.close(rfd/wfd)`, before the user
  `child_target` callable executes
- docstring cross-refs the diagnosis doc + spells
  out the FD-inheritance-cascade mechanism and why
  the close-all approach is safe for our spawn shape

Validation pending: re-run `test_nested_multierrors[subint_forkserver]`
to confirm the deadlock is gone.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 15:30:39 -04:00
Gud Boi b71705bdcd Refine `subint_forkserver` nested-cancel hang diagnosis
Major rewrite of
`subint_forkserver_test_cancellation_leak_issue.md`
after empirical investigation revealed the earlier
"descendant-leak + missing tree-kill" diagnosis
conflated two unrelated symptoms:

1. **5-zombie leak holding `:1616`** — turned out to
   be a self-inflicted cleanup bug: `pkill`-ing a bg
   pytest task (SIGTERM/SIGKILL, no SIGINT) skipped
   the SC graceful cancel cascade entirely. Codified
   the real fix — SIGINT-first ladder w/ bounded
   wait before SIGKILL — in e5e2afb5 (`run-tests`
   SKILL) and
   `feedback_sc_graceful_cancel_first.md`.
2. **`test_nested_multierrors[subint_forkserver]`
   hangs indefinitely** — the actual backend bug,
   and it's a deadlock not a leak.

Deats,
- new diagnosis: all 5 procs are kernel-`S` in
  `do_epoll_wait`; pytest-main's trio-cache workers
  are in `os.waitpid` waiting for children that are
  themselves waiting on IPC that never arrives —
  graceful `Portal.cancel_actor` cascade never
  reaches its targets
- tree-structure evidence: asymmetric depth across
  two identical `run_in_actor` calls — child 1
  (3 threads) spawns both its grandchildren; child 2
  (1 thread) never completes its first nursery
  `run_in_actor`. Smells like a race on fork-
  inherited state landing differently per spawn
  ordering
- new hypothesis: `os.fork()` from a subactor
  inherits the ROOT parent's IPC listener FDs
  transitively. Grandchildren end up with three
  overlapping FD sets (own + direct-parent + root),
  so IPC routing becomes ambiguous. Predicts bug
  scales with fork depth — matches reality: single-
  level spawn works, multi-level hangs
- ruled out: `_ForkedProc.kill()` tree-kill (never
  reaches hard-kill path), `:1616` contention (fixed
  by `reg_addr` fixture wiring), GIL starvation
  (each subactor has its own OS process+GIL),
  child-side KBI absorption (`_trio_main` only
  catches KBI at `trio.run()` callsite, reached
  only on trio-loop exit)
- four fix directions ranked: (1) blanket post-fork
  `closerange()`, (2) `FD_CLOEXEC` + audit,
  (3) targeted FD cleanup via `actor.ipc_server`
  handle, (4) `os.posix_spawn` w/ `file_actions`.
  Vote: (3) — surgical, doesn't break the "no exec"
  design of `subint_forkserver`
- standalone repro added (`spawn_and_error(breadth=
  2, depth=1)` under `trio.fail_after(20)`)
- stopgap: skip `test_nested_multierrors` + multi-
  level-spawn tests under the backend via
  `@pytest.mark.skipon_spawn_backend(...)` until
  fix lands

Killing the "tree-kill descendants" fix-direction
section: it addressed a bug that didn't exist.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 15:21:41 -04:00
2 changed files with 370 additions and 135 deletions

@ -1,165 +1,333 @@
# `subint_forkserver` backend leaks subactor descendants in `test_cancellation.py` # `subint_forkserver` backend: `test_cancellation.py` multi-level cancel cascade hang
Follow-up tracker: surfaced while wiring the new Follow-up tracker: surfaced while wiring the new
`subint_forkserver` spawn backend into the full tractor `subint_forkserver` spawn backend into the full tractor
test matrix (step 2 of the post-backend-lands plan; test matrix (step 2 of the post-backend-lands plan).
see also See also
`ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`). `ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`
— sibling tracker for a different forkserver-teardown
class which probably shares the same fundamental root
cause (fork-FD-inheritance across nested spawns).
## TL;DR ## TL;DR
Running `tests/test_cancellation.py` under `tests/test_cancellation.py::test_nested_multierrors[subint_forkserver]`
`--spawn-backend=subint_forkserver` reproducibly leaks hangs indefinitely under our new backend. The hang is
**exactly 5 `subint-forkserv` comm-named child processes** **inside the graceful IPC cancel cascade** — every actor
after the pytest session exits. Both previously-run in the multi-level tree parks in `epoll_wait` waiting
sessions produced the same 5-process signature — not a for IPC messages that never arrive. Not a hard-kill /
flake. Each leaked process holds a `LISTEN` on the tree-reap issue (we don't reach the hard-kill fallback
default registry TCP addr (`127.0.0.1:1616`), which path at all).
poisons any subsequent tractor test session that
defaults to that addr.
## Stopgap (not the real fix) Working hypothesis (unverified): **`os.fork()` from a
subactor inherits the root parent's IPC listener socket
FDs**. When a first-level subactor forkserver-spawns a
grandchild, that grandchild inherits both its direct
spawner's FDs AND the root's FDs — IPC message routing
becomes ambiguous (or silently sends to the wrong
channel), so the cancel cascade can't reach its target.
Multiple tests in `test_cancellation.py` were calling ## Corrected diagnosis vs. earlier draft
`tractor.open_nursery()` **without** passing
`registry_addrs=[reg_addr]`, i.e. falling back on the
default `:1616`. The commit accompanying this doc wires
the `reg_addr` fixture through those tests so each run
gets a session-unique port — leaked zombies can no
longer poison **other** tests (they hold their own
unique port instead).
Tests touched (in `tests/test_cancellation.py`): An earlier version of this doc claimed the root cause
was **"forkserver teardown doesn't tree-kill
descendants"** (SIGKILL only reaches the direct child,
grandchildren survive and hold TCP `:1616`). That
diagnosis was **wrong**, caused by conflating two
observations:
- `test_cancel_infinite_streamer` 1. *5-zombie leak holding :1616* — happened in my own
- `test_some_cancels_all` workflow when I aborted a bg pytest task with
- `test_nested_multierrors` `pkill` (SIGTERM/SIGKILL, not SIGINT). The abrupt
- `test_cancel_via_SIGINT` kill skipped the graceful `ActorNursery.__aexit__`
- `test_cancel_via_SIGINT_other_task` cancel cascade entirely, orphaning descendants to
init. **This was my cleanup bug, not a forkserver
teardown bug.** Codified the fix (SIGINT-first +
bounded wait before SIGKILL) in
`feedback_sc_graceful_cancel_first.md` +
`.claude/skills/run-tests/SKILL.md`.
2. *`test_nested_multierrors` hangs indefinitely*
the real, separate, forkserver-specific bug
captured by this doc.
This is a **suite-hygiene fix** — it doesn't close the The two symptoms are unrelated. The tree-kill / setpgrp
actual leak; it just stops the leak from blast-radiusing. fix direction proposed earlier would not help (1) (SC-
Zombie descendants still accumulate per run. graceful-cleanup is the right answer there) and would
not help (2) (the hang is in the cancel cascade, not
in the hard-kill fallback).
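The SIGINT-first ladder from (1) can be sketched roughly as follows. This is a minimal illustration only, not the contents of `feedback_sc_graceful_cancel_first.md` or the `run-tests` skill; the helper name `graceful_stop` and the 5 second default grace period are assumptions made for the example.

```python
import signal
import subprocess
import sys


def graceful_stop(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Send SIGINT, wait up to `grace` seconds, then fall back to SIGKILL.

    Hypothetical helper: SIGINT gives the structured-concurrency cancel
    cascade a chance to run; SIGKILL is only the bounded-wait fallback.
    """
    proc.send_signal(signal.SIGINT)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL, last resort
        return proc.wait()


if __name__ == '__main__':
    # stand-in for a long-running bg pytest task
    bg = subprocess.Popen(
        [sys.executable, '-c', 'import time; time.sleep(60)'],
    )
    print('exit status:', graceful_stop(bg, grace=2.0))
```

The point of the ordering is that the abrupt signal is only ever sent after the graceful path has had a bounded amount of time to finish.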
## The real bug (unfixed) ## Symptom
`subint_forkserver_proc`'s teardown — `_ForkedProc.kill()` Reproducer (py3.14, clean env):
(plain `os.kill(SIGKILL)` to the direct child pid) +
`proc.wait()` — does **not** reap grandchildren or
deeper descendants. When a cancellation test causes a
multi-level actor tree to tear down, the direct child
dies but its own children survive and get reparented to
init (PID 1), where they stay running with their
inherited FDs (including the registry listen socket).
**Symptom on repro:** ```sh
# preflight: ensure clean env
ss -tlnp 2>/dev/null | grep ':1616' && echo 'FOUL — cleanup first!' || echo 'clean'
```
$ ss -tlnp 2>/dev/null | grep ':1616'
LISTEN 0 4096 127.0.0.1:1616 0.0.0.0:* \
users:(("subint-forkserv",pid=211595,fd=17),
("subint-forkserv",pid=211585,fd=17),
("subint-forkserv",pid=211583,fd=17),
("subint-forkserv",pid=211576,fd=17),
("subint-forkserv",pid=211572,fd=17))
$ for p in 211572 211576 211583 211585 211595; do
cat /proc/$p/cmdline | tr '\0' ' '; echo; done
./py314/bin/python -m pytest --spawn-backend=subint_forkserver \ ./py314/bin/python -m pytest --spawn-backend=subint_forkserver \
tests/test_cancellation.py --timeout=30 --timeout-method=signal \ 'tests/test_cancellation.py::test_nested_multierrors[subint_forkserver]' \
--tb=no -q --no-header --timeout=30 --timeout-method=thread --tb=short -v
... (x5, all same cmdline — inherited from fork)
``` ```
All 5 share the pytest cmdline because `os.fork()` Expected: `pytest-timeout` fires at 30s with a thread-
without `exec()` preserves the parent's argv. Their dump banner, but the process itself **remains alive
comm-name (`subint-forkserv`) is the `thread_name` we after timeout** and doesn't unwedge on subsequent
pass to the fork-worker thread in SIGINT. Requires SIGKILL to reap.
`tractor.spawn._subint_forkserver.fork_from_worker_thread`.
## Why 5? ## Evidence (tree structure at hang point)
Not confirmed; guess is 5 = the parametrize cardinality All 5 processes are kernel-level `S` (sleeping) in
of one of the leaky tests (e.g. `test_some_cancels_all` `do_epoll_wait` (trio's event loop waiting on I/O):
has 5 parametrize cases). Each param-case spawns a
nested tree; each leaks exactly one descendant. Worth
verifying by running each parametrize-case individually
and counting leaked procs per case.
## Ruled out ```
PID PPID THREADS NAME ROLE
- **`:1616` collision from a different repo** (e.g. 333986 1 2 subint-forkserv pytest main (the test body)
piker): `/proc/$pid/cmdline` + `cwd` both resolve to 333993 333986 3 subint-forkserv "child 1" spawner subactor
the tractor repo's `py314/` venv for all 5. These are 334003 333993 1 subint-forkserv grandchild errorer under child-1
definitively spawned by our test run. 334014 333993 1 subint-forkserv grandchild errorer under child-1
- **Parent-side `_ForkedProc.wait()` regressed**: the 333999 333986 1 subint-forkserv "child 2" spawner subactor (NO grandchildren!)
direct child's teardown completes cleanly (exit-code
captured, `waitpid` returns); the 5 survivors are
deeper-descendants whose parent-side shim has no
handle on them. So the bug isn't in
`_ForkedProc.wait()` — it's in the lack of tree-
level descendant enumeration + reaping during nursery
teardown.
## Likely fix directions
1. **Process-group-scoped spawn + tree kill.** Put each
forkserver-spawned subactor into its own process
group (`os.setpgrp()` in the fork child), then on
teardown `os.killpg(pgid, SIGKILL)` to reap the
whole tree atomically. Simplest, most surgical.
2. **Subreaper registration.** Use
`PR_SET_CHILD_SUBREAPER` on the tractor root so
orphaned grandchildren reparent to the root rather
than init — then we can `waitpid` them from the
parent-side nursery teardown. More invasive.
3. **Explicit descendant enumeration at teardown.**
In `subint_forkserver_proc`'s finally block, walk
`/proc/<pid>/task/*/children` before issuing SIGKILL
to build a descendant-pid set; then kill + reap all
of them. Fragile (Linux-only, proc-fs-scan race).
Vote: **(1)** — clean, POSIX-standard, aligns with how
`subprocess.Popen` (and by extension `trio.lowlevel.
open_process`) handle tree-kill semantics on
kwargs-supplied `start_new_session=True`.
## Reproducer
```sh
# before: ensure clean env
ss -tlnp 2>/dev/null | grep ':1616' || echo 'clean'
# run the leaky tests
./py314/bin/python -m pytest \
--spawn-backend=subint_forkserver \
tests/test_cancellation.py \
--timeout=30 --timeout-method=signal --tb=no -q --no-header
# observe: 5 leaked children now holding :1616
ss -tlnp 2>/dev/null | grep ':1616'
``` ```
Expected output: `subint-forkserv` processes listed as ### Asymmetric tree depth
listeners on `:1616`. Cleanup:
```sh The test's `spawn_and_error(breadth=2, depth=3)` should
pkill -9 -f \ have BOTH direct children spawning 2 grandchildren
"$(pwd)/py314/bin/python.*pytest.*spawn-backend=subint_forkserver" each, going 3 levels deep. Reality:
- Child 1 (333993, 3 threads) DID spawn its two
grandchildren as expected — fully booted trio
runtime.
- Child 2 (333999, 1 thread) did NOT spawn any
grandchildren — clearly never completed its
nursery's first `run_in_actor`. Its 1-thread state
suggests the runtime never fully booted (no trio
worker threads for `waitpid`/IPC).
This asymmetry is the key clue: the two direct
children started identically but diverged. Probably a
race around fork-inherited state (listener FDs,
subactor-nursery channel state) that happens to land
differently depending on spawn ordering.
### Parent-side state
Thread-dump of pytest main (333986) at the hang:
- Main trio thread — parked in
`trio._core._io_epoll.get_events` (epoll_wait on
its event loop). Waiting for IPC from children.
- Two trio-cache worker threads — each parked in
`outcome.capture(sync_fn)` calling
`os.waitpid(child_pid, 0)`. These are our
`_ForkedProc.wait()` off-loads. They're waiting for
the direct children to exit — but children are
stuck in their own epoll_wait waiting for IPC from
the parent.
**It's a deadlock, not a leak:** the parent is
correctly running `soft_kill(proc, _ForkedProc.wait,
portal)` (graceful IPC cancel via
`Portal.cancel_actor()`), but the children never
acknowledge the cancel message (or the message never
reaches them through the tangled post-fork IPC).
## What's NOT the cause (ruled out)
- **`_ForkedProc.kill()` only SIGKILLs direct pid /
missing tree-kill**: doesn't apply — we never reach
the hard-kill path. The deadlock is in the graceful
cancel cascade.
- **Port `:1616` contention**: ruled out after the
`reg_addr` fixture-wiring fix; each test session
gets a unique port now.
- **GIL starvation / SIGINT pipe filling** (class-A,
`subint_sigint_starvation_issue.md`): doesn't apply
— each subactor is its own OS process with its own
GIL (not legacy-config subint).
- **Child-side `_trio_main` absorbing KBI**: grep
confirmed; `_trio_main` only catches KBI at the
`trio.run()` callsite, which is reached only if the
trio loop exits normally. The children here never
exit trio.run() — they're wedged inside.
## Hypothesis: FD inheritance across nested forks
`subint_forkserver_proc` calls
`fork_from_worker_thread()` which ultimately does
`os.fork()` from a dedicated worker thread. Standard
Linux/POSIX fork semantics: **the child inherits ALL
open FDs from the parent**, including listener
sockets, epoll fds, trio wakeup pipes, and the
parent's IPC channel sockets.
At root-actor fork-spawn time, the root's IPC server
listener FDs are open in the parent. Those get
inherited by child 1. Child 1 then forkserver-spawns
its OWN subactor (grandchild). The grandchild
inherits FDs from child 1 — but child 1's address
space still contains **the root's IPC listener FDs
too** (inherited at first fork). So the grandchild
has THREE sets of FDs:
1. Its own (created after becoming a subactor).
2. Its direct parent child-1's.
3. The ROOT's (grandparent's) — inherited transitively.
IPC message routing may be ambiguous in this tangled
state. Or a listener socket that the root thinks it
owns is actually open in multiple processes, and
messages sent to it go to an arbitrary one. That
would exactly match the observed "graceful cancel
never propagates".
This hypothesis predicts the bug **scales with fork
depth**: single-level forkserver spawn
(`test_subint_forkserver_spawn_basic`) works
perfectly, but any test that spawns a second level
deadlocks. Matches observations so far.
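The inheritance mechanism itself (not the deadlock) is easy to confirm in isolation. The following sketch, which does not use tractor at all, opens a listener, forks without exec, and has the child check that the parent's descriptor is still valid. The function name is made up for the example.

```python
import os
import socket


def child_sees_parent_listener() -> bool:
    """Fork once and report whether the child still holds the parent's listener FD."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    fd = listener.fileno()

    pid = os.fork()
    if pid == 0:
        # CHILD: no exec happened, so the descriptor table was copied as-is.
        try:
            os.fstat(fd)  # raises OSError (EBADF) if the fd were closed
            os._exit(0)
        except OSError:
            os._exit(1)

    # PARENT: reap the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    listener.close()
    return os.waitstatus_to_exitcode(status) == 0


if __name__ == '__main__':
    print('child inherited listener fd:', child_sees_parent_listener())
```

Repeating the fork inside the child would hand the same descriptor to a grandchild, which is the transitive case the hypothesis is about.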
## Fix directions (to validate)
### 1. `close_fds=True` equivalent in `fork_from_worker_thread()`
`subprocess.Popen` / `trio.lowlevel.open_process` have
`close_fds=True` by default on POSIX — they
enumerate open FDs in the child post-fork and close
everything except stdio + any explicitly-passed FDs.
Our raw `os.fork()` doesn't. Adding the equivalent to
our `_worker` prelude would isolate each fork
generation's FD set.
Implementation sketch in
`tractor.spawn._subint_forkserver.fork_from_worker_thread._worker`:
```python
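def _worker() -> None:
    pid: int = os.fork()
    if pid == 0:
        # CHILD: close inherited FDs except stdio + the
        # pid-pipe we just opened (`rfd`/`wfd` come from the
        # enclosing `fork_from_worker_thread()` scope).
        keep: set[int] = {0, 1, 2, rfd, wfd}
        import resource
        soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
        # blunt; or enumerate /proc/self/fd instead
        for fd in range(3, soft):
            if fd in keep:
                continue
            try:
                os.close(fd)
            except OSError:
                pass  # fd was not open
        # ... then child_target() as before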
``` ```
Problem: overly aggressive — closes FDs the
grandchild might legitimately need (e.g. its parent's
IPC channel for the spawn-spec handshake, if we rely
on that). Needs thought about which FDs are
"inheritable and safe" vs. "inherited by accident".
### 2. Cloexec on tractor's own FDs
Set `FD_CLOEXEC` on tractor-created sockets (listener
sockets, IPC channel sockets, pipes). This flag
causes automatic close on `execve`, but since we
`fork()` without `exec()`, this alone doesn't help.
BUT — combined with a child-side explicit close-
non-cloexec loop, it gives us a way to mark "my
private FDs" vs. "safe to inherit". Most robust, but
requires tractor-wide audit.
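That close-on-exec has no effect across a bare `fork()` can be checked directly. In the sketch below (names are illustrative), a freshly created Python socket is already non-inheritable, meaning close-on-exec is set by default since PEP 446, yet a forked child that never execs still finds the descriptor open.

```python
import os
import socket


def cloexec_survives_fork() -> tuple[bool, bool]:
    """Return (inheritable_flag, still_open_in_forked_child)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # False by default: the close-on-exec flag is set on new sockets.
    inheritable = sock.get_inheritable()
    fd = sock.fileno()

    pid = os.fork()
    if pid == 0:
        # CHILD: close-on-exec only triggers on exec, which never happens here.
        try:
            os.fstat(fd)
            os._exit(0)
        except OSError:
            os._exit(1)

    _, status = os.waitpid(pid, 0)
    sock.close()
    return inheritable, os.waitstatus_to_exitcode(status) == 0


if __name__ == '__main__':
    print(cloexec_survives_fork())
```

So the flag can at most serve as a marker that a child-side cleanup loop consults; it does not close anything by itself in a fork-without-exec design.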
### 3. Explicit FD cleanup in `_ForkedProc`/`_child_target`
Have `subint_forkserver_proc`'s `_child_target`
closure explicitly close the parent-side IPC listener
FDs before calling `_actor_child_main`. Requires
being able to enumerate "the parent's listener FDs
that the child shouldn't keep" — plausible via
`Actor.ipc_server`'s socket objects.
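A rough shape for the targeted cleanup, assuming the child can be handed the parent's listener sockets as plain `socket.socket` objects. How those would be obtained from `Actor.ipc_server` is not shown here, and `close_inherited_listeners` is a hypothetical name.

```python
import socket
from typing import Iterable


def close_inherited_listeners(listeners: Iterable[socket.socket]) -> int:
    """Close each inherited listener socket in the fork child.

    Returns how many sockets were actually closed. Uses `close()` and not
    `shutdown()`: closing only drops this process's reference to the
    descriptor, so the parent's copy of the listener keeps working.
    """
    closed = 0
    for sock in listeners:
        if sock.fileno() == -1:
            continue  # already closed in this process
        sock.close()
        closed += 1
    return closed
```

Compared with direction 1, this leaves every other inherited descriptor (epoll fd, wakeup pipe, peer channels) open unless it is also enumerated explicitly.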
### 4. Use `os.posix_spawn` with explicit `file_actions`
Instead of raw `os.fork()`, use `os.posix_spawn()`
which supports explicit file-action specifications
(close this FD, dup2 that FD). Cleaner semantics, but
probably incompatible with our "no exec" requirement
(subint_forkserver is a fork-without-exec design).
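For reference, this is roughly what `file_actions` look like with the stdlib `os.posix_spawn`. The sketch illustrates the trade-off only: the call has to exec a new program (here a fresh interpreter running a small probe), which is exactly what the backend is designed to avoid. `spawn_without_fd` is a made-up helper.

```python
import os
import sys


def spawn_without_fd(fd_to_close: int) -> int:
    """Spawn a fresh interpreter with `fd_to_close` closed; return its exit code.

    The probe exits 0 when the descriptor is closed in the spawned
    process and 1 when it is still open.
    """
    probe = (
        'import os, sys\n'
        'try:\n'
        '    os.fstat(int(sys.argv[1]))\n'
        'except OSError:\n'
        '    sys.exit(0)\n'
        'sys.exit(1)\n'
    )
    pid = os.posix_spawn(
        sys.executable,
        [sys.executable, '-c', probe, str(fd_to_close)],
        os.environ,
        # close this fd in the child before the exec step
        file_actions=[(os.POSIX_SPAWN_CLOSE, fd_to_close)],
    )
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == '__main__':
    r, w = os.pipe()
    # duplicate onto a high, inheritable fd so it would otherwise leak
    leaked = os.dup2(r, 200, inheritable=True)
    print('probe exit code:', spawn_without_fd(leaked))
```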
**Likely correct answer: (3) — targeted FD cleanup
via `actor.ipc_server` handle.** (1) is too blunt,
(2) is too wide-ranging, (4) changes the spawn
mechanism.
## Reproducer (standalone, no pytest)
```python
# save as /tmp/forkserver_nested_hang_repro.py (py3.14+)
import trio, tractor


async def assert_err():
    assert 0


async def spawn_and_error(breadth: int = 2, depth: int = 1):
    async with tractor.open_nursery() as n:
        for i in range(breadth):
            if depth > 0:
                await n.run_in_actor(
                    spawn_and_error,
                    breadth=breadth,
                    depth=depth - 1,
                    name=f'spawner_{i}_{depth}',
                )
            else:
                await n.run_in_actor(
                    assert_err,
                    name=f'errorer_{i}',
                )


async def _main():
    # `trio.fail_after()` needs a running trio loop, so the
    # deadline is applied here rather than around `trio.run()`.
    with trio.fail_after(20):
        async with tractor.open_nursery() as n:
            for i in range(2):
                await n.run_in_actor(
                    spawn_and_error,
                    name=f'top_{i}',
                    breadth=2,
                    depth=1,
                )


if __name__ == '__main__':
    from tractor.spawn._spawn import try_set_start_method
    try_set_start_method('subint_forkserver')
    trio.run(_main)
```
Expected (current): hangs until the
`trio.fail_after(20)` deadline; children never ack the
error-propagation cancel cascade. Pattern: top 2
direct children, 4 grandchildren, 1 errorer deadlocks
while trying to unwind through its parent chain.
After fix: `trio.TooSlowError`-free completion; the
root's `open_nursery` receives the
`BaseExceptionGroup` containing the `AssertionError`
from the errorer and unwinds cleanly.
## Stopgap (landed)
Until the fix lands, `test_nested_multierrors` +
related multi-level-spawn tests can be skip-marked
under `subint_forkserver` via
`@pytest.mark.skipon_spawn_backend('subint_forkserver',
reason='...')`. Cross-ref this doc.
## References ## References
- `tractor/spawn/_subint_forkserver.py::_ForkedProc` - `tractor/spawn/_subint_forkserver.py::fork_from_worker_thread`
— the current teardown shim; PID-scoped, not tree- — the primitive whose post-fork FD hygiene is
scoped. probably the culprit.
- `tractor/spawn/_subint_forkserver.py::subint_forkserver_proc` - `tractor/spawn/_subint_forkserver.py::subint_forkserver_proc`
— the spawn backend whose `finally` block needs the — the backend function that orchestrates the
tree-kill fix. graceful cancel path hitting this bug.
- `tests/test_cancellation.py` — the surface where the - `tractor/spawn/_subint_forkserver.py::_ForkedProc`
leak surfaces. — the `trio.Process`-compatible shim; NOT the
failing component (confirmed via thread-dump).
- `tests/test_cancellation.py::test_nested_multierrors`
— the test that surfaced the hang.
- `ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md` - `ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`
— sibling tracker for a different forkserver-teardown — sibling hang class; probably same underlying
class (orphaned child doesn't respond to SIGINT); may fork-FD-inheritance root cause.
share root cause with this one once the fix lands.
- tractor issue #379 — subint backend tracking. - tractor issue #379 — subint backend tracking.

@ -195,6 +195,69 @@ except ImportError:
     _has_subints: bool = False


+def _close_inherited_fds(
+    keep: frozenset[int] = frozenset({0, 1, 2}),
+) -> int:
+    '''
+    Close every open file descriptor in the current process
+    EXCEPT those in `keep` (default: stdio only).
+
+    Intended as the first thing a post-`os.fork()` child runs
+    after closing any communication pipes it knows about. This
+    is the fork-child FD hygiene discipline that
+    `subprocess.Popen(close_fds=True)` applies by default for
+    its exec-based children, but which we have to implement
+    ourselves because our `fork_from_worker_thread()` primitive
+    deliberately does NOT exec.
+
+    Why it matters
+    --------------
+    Without this, a forkserver-spawned subactor inherits the
+    parent actor's IPC listener sockets, trio-epoll fd, trio
+    wakeup-pipe, peer-channel sockets, etc. If that subactor
+    then itself forkserver-spawns a grandchild, the grandchild
+    inherits the FDs transitively from *both* its direct
+    parent AND the root actor, so IPC message routing becomes
+    ambiguous and the cancel cascade deadlocks. See
+    `ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md`
+    for the full diagnosis + the empirical repro.
+
+    Fresh children will open their own IPC sockets via
+    `_actor_child_main()`, so they don't need any of the
+    parent's FDs.
+
+    Returns the count of fds that were successfully closed;
+    useful for sanity-check logging at callsites.
+
+    '''
+    # Enumerate open fds via `/proc/self/fd` on Linux (the fast +
+    # precise path); fall back to `RLIMIT_NOFILE` range close on
+    # other platforms. Matches stdlib
+    # `subprocess._posixsubprocess.close_fds` strategy.
+    try:
+        fd_names: list[str] = os.listdir('/proc/self/fd')
+        candidates: list[int] = [
+            int(n) for n in fd_names if n.isdigit()
+        ]
+    except (FileNotFoundError, PermissionError):
+        import resource
+        soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
+        candidates = list(range(3, soft))
+
+    closed: int = 0
+    for fd in candidates:
+        if fd in keep:
+            continue
+        try:
+            os.close(fd)
+            closed += 1
+        except OSError:
+            # fd was already closed (race with listdir) or
+            # otherwise unclosable; either is fine.
+            pass
+    return closed
+
+
 def _format_child_exit(
     status: int,
 ) -> str:
@ -302,9 +365,13 @@ def fork_from_worker_thread(
         pid: int = os.fork()
         if pid == 0:
             # CHILD: close the pid-pipe ends (we don't use
-            # them here), run the user callable if any, exit.
+            # them here), then scrub ALL other inherited FDs
+            # so the child starts with a clean slate
+            # (stdio-only). Critical for multi-level spawn
+            # trees; see `_close_inherited_fds()` docstring.
             os.close(rfd)
             os.close(wfd)
+            _close_inherited_fds()
             rc: int = 0
             if child_target is not None:
                 try: