tractor

Commit Graph

Author	SHA1	Message	Date
Gud Boi	57b3ea59ea	Bump trio depth=3 cancel timeout 6→12s trio 0.29 → 0.33 lock bump (`c7741bba`) slowed the depth=3 cancel-cascade in `test_nested_multierrors` from <6s to ~7-8s; the 6s deadline was firing and its `Cancelled(source='deadline')` (trio 0.33's new cancel-reason metadata) collapsed a BEG branch, breaking the `RemoteActorError` assertion downstream. - Split the `('trio', _)` case-match into per-depth arms: `('trio', 1)` keeps 6s (still finishes in ~3s); `('trio', 3)` → 12s. - Updated inline NOTE explains the version pivot + links the tracking issue `ai/conc-anal/trio_033_cancel_cascade_slowdown_depth3_issue.md`. - Existing MTF/`subint_forkserver` budgets unchanged. (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `ea67f1b67b`)	2026-06-09 20:28:42 -04:00
Gud Boi	6df9ee11bc	Add `supervise_run_process` to `trionics._subproc` A `trio.Nursery.start()`-style wrapper around `trio.run_process()` that surfaces rc!=0 errors deterministically, ALWAYS isolates the parent controlling-tty, and optionally live-relays the child's std-streams to `log.<level>` per-line. Suits both short-lived test-runners + long-lived daemons. `supervise_run_process()`, - Deterministic rc!=0: pass `check=False` to `trio` and do our OWN post-drain rc-check from the supervisor coro body AFTER `own_tn.__aexit__` — NOT inside the internal nursery, since that would race-cancel the still-draining relay reader and lose stderr lines. (Re)build + raise a BARE `subprocess.CalledProcessError`: `.stderr=` for programmatic callers + an `add_note()`'d `\|_.stderr:` block for human teardown logs. No nursery-eg-wrapped CPE to `collapse_eg` around. - Parent controlling-tty isolation: `stdin=DEVNULL` always, `stdout=DEVNULL` unless relayed/overridden (via `stdout=` kwarg w/ `_UNSET` sentinel so explicit `None` = inherit still works). Prevents a spawned program from clobbering the launching tty's scrollback w/ control-seqs. - Live per-line relay: `relay_stdout=True`/ `relay_stderr=True` → relayed to `log.<relay_level>` (default `'io'`, our custom level 21). Picked to sort just above stdlib `INFO`=20 so it shows at usual `info`/`devx` levels yet stays separately filterable; `runtime`=15 was REJECTED as a default since it'd be silently filtered at usual verbosity — footgun for daemon supervisors whose whole point is visibility. STREAMED, not buffered-until-exit. - Non-blocking `tn.start()` semantics: live `trio.Process` handed up via `task_status.started()` immediately (else `tn.start()` would block till child exit, losing the long-lived-daemon use case). Supervise/relay bg tasks run to completion in this coro. - `*run_process_kwargs` forwarded verbatim (env, shell, cwd, start_new_session, executable, ...); MANAGED keys (`stdin`/`stdout`/`stderr`/`check`) win on conflict. - Crash-handling layer intentionally NOT baked in — compose `maybe_open_crash_handler()` ON TOP at the call-site. `_relay_stream_lines()` helper, - Concurrent pipe-drain reader. MANDATORY whenever piping w/o `capture_` since nothing else drains the OS pipe — child blocks on `write()` once kernel buf (~64KiB) fills → deadlock. - Modes (combine freely): `emit`-only live relay, `accum`-only silent drain+capture (for the CPE note), or both. Per-line splitting handles cross-chunk residuals + flushes any trailing un-newline-term'd line at EOF. `_add_stderr_note()` helper, - Attaches an indented `\|_.stderr:` note to a CPE via `add_note()` for legible rc!=0 reporting at teardown. Tests (`tests/trionics/test_subproc.py`), - Hermetic `trio`-only (no actor-runtime). - `test_stdout_relayed_per_line`: per-line stdout relay. - `test_parent_tty_isolated`: child fd1 is OUR pipe (no `/dev/pts/*`), fd0 pinned to `/dev/null`. - `test_no_deadlock_on_big_unnewlined_output`: 200KiB no-newline output completes under `fail_after(2)` — exercises the concurrent drain (without it, the child blocks at ~64KiB). - `test_stderr_relay_and_cpe_rebuild`: rc!=0 w/ `relay_stderr=True` → bare `CalledProcessError` w/ the `.stderr` note + per-line live relay. - `test_nonrelay_cpe_note`: rc!=0 w/o relay → same deterministic post-drain CPE w/ `.stderr` note (silent drain+capture path). Re-export `supervise_run_process` from `tractor.trionics`. Prompt-IO: ai/prompt-io/claude/20260601T231429Z_0e3e008b_prompt_io.md (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `f595acc76c`)	2026-06-09 20:28:42 -04:00
Gud Boi	232d6ccfbf	Add `logspec` leaf-mod Route B follow-up doc Follow-up note documenting why the deeper "Route B" fix for `LogSpec`/`apply_logspec()` true per-leaf-MODULE level control was NOT taken — in favor of the smaller sub-PACKAGE fix that shipped in `9c36363b`. Doc covers, - Status: what `9c36363b` already gives (per-sub-pkg control at any nesting depth, `devx.debug` ≠ `devx`) vs. what remains unaddressed (per-leaf-mod levels, top-level lib mods like `tractor.to_asyncio` on the root logger). - "Route B" sketch: make logger identity the full dotted module path; mv the cosmetic leaf-trim out of logger-naming into the formatter's `{name}` rendering. - 6 breaking-change costs: every logger name changes, formatter rewrite, propagation/double-emit surface grows, level-inheritance semantics shift, `modden`/`piker` contract churn, `get_logger()` refactor risk. - Migration plan if pursued: extract a pure `_mk_logger_name()` helper w/ an exhaustive name-shape test matrix, swap `get_logger()` to use it for identity, swap formatter to use the display string, golden-diff rendered headers, coordinate w/ downstreams. - "Route A" alternative: a `logging.Filter` keyed on `record.module`/`pathname` for per-leaf control w/o name churn — lower risk, narrower power. - Recommendation: defer Route B; prefer Route A if per-leaf is needed soon; the shipped sub-PKG fix covers the common ask. Lives under `ai/tooling-todos/` since it's a deferred- work decision record, not a triage/conc-anal doc. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `5b3c2e3762`)	2026-06-09 20:28:42 -04:00
Gud Boi	30591e79c2	Add boot-race conc-anal, widen `xfail` to `n_dups=8` New `ai/conc-anal/spawn_time_boot_death_dup_name_issue.md` documenting the spawn-time rc=2 race under rapid same-name spawning against a forkserver + registrar — the `wait_for_peer_or_proc_death` helper now surfaces the death instead of parking forever on the handshake wait. Also, - extract inline `xfail` into module-level `_DOGGY_BOOT_RACE_XFAIL` marker. - apply it to `n_dups=8` too (previously bare) bc larger N widens the race window enough to fire occasionally. - link to tracking issue #456. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `92443dc4ef`)	2026-06-09 20:28:05 -04:00
Gud Boi	15c3b670e1	Add `test_register_duplicate_name` race analysis Document the intermittent connect-refused failure in the registrar daemon test — root cause is the `daemon` fixture's blind `time.sleep()` readiness gate racing against the subproc's `bind()`/ `listen()` completion. Distinct from the cancel- cascade `TooSlowError` flake class. (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `29f9928524`)	2026-06-09 20:28:04 -04:00
Gud Boi	4f042ded23	Add `tractor.trionics.patches` subpkg + first fix With a seminal patch fixing `trio`'s `WakeupSocketpair.drain()` which can busy-loop due to lack of handling `EOF`. New `tractor.trionics.patches` subpkg housing defensive monkey-patches for upstream `trio` bugs we've encountered while running `tractor` — particularly, as of late, fork-survival edge cases that haven't been filed/fixed upstream yet. Each patch is idempotent, version-gated via `is_needed()`, and carries a `# REMOVE WHEN:` marker pointing at the upstream release whose adoption allows deletion. Subpkg layout + per-patch contract documented in `tractor/trionics/patches/README.md` — `apply()` / `is_needed()` / `repro()` API, registry pattern via `_PATCHES` in `__init__.py`, single-call entry point `apply_all()`. First patch, `_wakeup_socketpair`: - `trio`'s `WakeupSocketpair.drain()` loops on `recv(64KB)` and exits ONLY on `BlockingIOError`, NEVER on `recv() == b''` (peer-closed FIN). - under `fork()`-spawning backends the COW-inherited socketpair fds & `_close_inherited_fds()` teardown can leave a `WakeupSocketpair` instance whose write-end is closed, and `drain()` then spins forever in C with no Python checkpoints, - this burns 100% CPU and blocks signal delivery. Standalone repro: from trio._core._wakeup_socketpair import WakeupSocketpair ws = WakeupSocketpair() ws.write_sock.close() ws.drain() # spins forever Patch is one-line — break the drain loop on b'' EOF. Manifested as two distinct test failures: - `tests/test_multi_program.py::test_register_duplicate_name` hung at 100% CPU on the busy-loop directly (fork child's worker thread) - `tests/test_infected_asyncio.py::test_aio_simple_error` Mode-A deadlock — busy-loop wedged trio's scheduler inside `start_guest_run`, both threads parked in `epoll_wait`, no TCP connect-back to parent ever happened. Same patch fixes both. Restored 99.7% pass rate on full suite under `--spawn-backend=main_thread_forkserver` (was hanging indefinitely before). Wired into `tractor._child._actor_child_main` via `apply_all()` BEFORE any trio runtime init. Harmless on non-fork backends. Conc-anal write-ups, including strace + py-spy evidence: - `ai/conc-anal/trio_wakeup_socketpair_busy_loop_under_fork_issue.md` - `ai/conc-anal/infected_asyncio_under_main_thread_forkserver_hang_issue.md` Regression tests in `tests/trionics/test_patches.py`: each test asserts (a) the bug exists pre-patch (or is fixed upstream — skip cleanly), (b) the patch fixes it with a SIGALRM wall-clock cap so a regression hangs loud instead of silently. TODO: - [ ] file the upstream `python-trio/trio` issue + PR. - [ ] use the `repro()` callable in `_wakeup_socketpair.py` as the issue body's evidence section. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code (cherry picked from commit `0ef549fadb`) (factored: dropped spawn-backend-only paths: ai/conc-anal/infected_asyncio_under_main_thread_forkserver_hang_issue.md)	2026-06-09 20:28:04 -04:00
Gud Boi	23b8a80e15	Add posix-multithreaded-`fork()` explainer doc (cherry picked from commit `532a9834f3`)	2026-06-09 20:27:26 -04:00
Gud Boi	a7b1ee34ef	Restore fn-arg `_runtime_vars` in `trio_proc` teardown During the Phase A extraction of `trio_proc()` out of `spawn._spawn` into its own submod, the `debug.maybe_wait_for_debugger(child_in_debug=...)` call site in the hard-reap `finally` got refactored from the original `_runtime_vars.get('_debug_mode', ...)` (the fn parameter — the dict that was constructed by the parent for the child's `SpawnSpec`) to `get_runtime_vars().get(...)` (a global getter that returns the parent's live `_state`). Those are semantically different — the first asks "is the child we just spawned in debug mode?", the second asks "are we in debug mode?". Under mixed-debug-mode trees the swap can incorrectly skip (or unnecessarily delay) the debugger-lock wait during teardown. Revert to the fn-parameter lookup and add an inline `NOTE` comment calling out the distinction so it's harder to regress again. Deats, - `spawn/_trio.py`: `child_in_debug=get_runtime_vars().get(...)` → `child_in_debug=_runtime_vars.get(...)` at the `debug.maybe_wait_for_debugger(...)` call in the hard-reap block; add 4-line `NOTE` explaining the parent-vs-child distinction. - `spawn/__init__.py`: drop trailing whitespace after the `'mp_forkserver'` docstring bullet. - `ai/prompt-io/prompts/subints_spawner.md`: drop duplicated `with` in `"as with with subprocs"` prose (copilot grammar catch). Review: PR #444 (Copilot) https://github.com/goodboy/tractor/pull/444#pullrequestreview-4165928469 (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code	2026-04-23 18:30:11 -04:00
Gud Boi	e0b8f23cbc	Add prompt-io files for "phase-A", fix typos caught by copilot	2026-04-17 18:26:41 -04:00
Gud Boi	b5b0504918	Add prompt-IO log for subint spawner design kickoff Log the `claude-opus-4-7` design session that produced the phased plan (A: modularize `_spawn`, B: `_subint` backend, C: harness) and concrete Phase A file-split for #379. Substantive bc the plan directly drives upcoming impl. Prompt-IO: ai/prompt-io/claude/20260417T034918Z_9703210_prompt_io.md (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code	2026-04-17 16:48:22 -04:00
Gud Boi	de78a6445b	Initial prompt to vibe subint support Bo	2026-04-17 16:48:18 -04:00
Gud Boi	3152f423d8	Condense `.raw.md` prompt-IO logs, add `diff_cmd` refs Replace verbose inline code dumps in `.raw.md` entries with terse summaries and `git diff` cmd references. Add `diff_cmd` metadata to each entry's YAML frontmatter so readers can reproduce the actual output diff. Also, - rename `multiaddr_declare_eps.md_` -> `.md` (drop trailing `_` suffix) (this commit msg was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code	2026-04-16 17:44:14 -04:00
Gud Boi	ccb013a615	Add `prefer_addr()` transport selection to `_api` New locality-aware addr preference for multihomed actors: UDS > local TCP > remote TCP. Uses `ipaddress` + `socket.getaddrinfo()` to detect whether a `TCPAddress` is on the local host. Deats, - `_is_local_addr()` checks loopback or same-host IPs via interface enumeration - `prefer_addr()` classifies an addr list into three tiers and picks the latest entry from the highest-priority non-empty tier - `query_actor()` and `wait_for_actor()` now call `prefer_addr()` instead of grabbing `addrs[-1]` or a single pre-selected addr Also, - `Registrar.find_actor()` returns full `list[UnwrappedAddress]\|None` so callers can apply transport preference Prompt-IO: ai/prompt-io/claude/20260414T163300Z_befedc49_prompt_io.md (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code	2026-04-14 19:54:14 -04:00
Gud Boi	e90241baaa	Add `parse_endpoints()` to `_multiaddr` Provide a service-table parsing API for downstream projects (like `piker`) to declare per-actor transport bind addresses as a config map of actor-name -> multiaddr strings (e.g. from a TOML `[network]` section). Deats, - `EndpointsTable` type alias: input `dict[str, list[str\|tuple]]`. - `ParsedEndpoints` type alias: output `dict[str, list[Address]]`. - `parse_endpoints()` iterates the table and delegates each entry to the existing `tractor.discovery._discovery.wrap_address()` helper, which handles maddr strings, raw `(host, port)` tuples, and pre-wrapped `Address` objs. - UDS maddrs use the multiaddr spec name `/unix/...` (not tractor's internal `/uds/` proto_key) Also add new tests, - 7 new pure unit tests (no trio runtime): TCP-only, mixed tpts, unwrapped tuples, mixed str+tuple, unsupported proto (`/udp/`), empty table, empty actor list - all 22 multiaddr tests pass rn. Prompt-IO: ai/prompt-io/claude/20260413T205048Z_269d939c_prompt_io.md (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code	2026-04-14 19:54:14 -04:00
Gud Boi	7079a597c5	Add `test_tpt_bind_addrs.py` + fix type-mixing bug Add 9 test variants (6 fns) covering all three `tpt_bind_addrs` code paths in `open_root_actor()`: - registrar w/ explicit bind (eq, subset, disjoint) - non-registrar w/ explicit bind (same/diff bindspace) using `daemon` fixture - non-registrar default random bind (baseline) - maddr string input parsing - registrar merge produces union - `open_nursery()` forwards `tpt_bind_addrs` Fix type-mixing bug at `_root.py:446` where the registrar merge path did `set(Address + tuple)`, preventing dedup and causing double-bind `OSError`. Wrap `uw_reg_addrs` before the set union so both sides are `Address` objs. Also, - add prompt-io output log for this session - stage original prompt input for tracking Prompt-IO: ai/prompt-io/claude/20260413T192116Z_f851f28_prompt_io.md (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code	2026-04-14 19:54:14 -04:00
Gud Boi	cd1cd03725	Add prompt-io log for `run_ctx` teardown analysis Documents the diagnostic session tracing why per-`ctx_key` locking alone doesn't close the `_Cache.run_ctx` teardown race — the lock pops in the exiting caller's task but resource cleanup runs in the `run_ctx` task inside `service_tn`. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code	2026-04-09 14:42:42 -04:00
Gud Boi	cab366cd65	Add xfail test for `_Cache.run_ctx` teardown race Reproduce the piker `open_cached_client('kraken')` scenario: identical `ctx_key` callers share one cached resource, and a new task re-enters during `__aexit__` — hitting `assert not resources.get()` bc `values` was popped but `resources` wasn't yet. Deats, - `test_moc_reentry_during_teardown` uses an `in_aexit` event to deterministically land in the teardown window. - marked `xfail(raises=AssertionError)` against unpatched code (fix in `9e49eddd` or wtv lands on the `maybe_open_ctx_locking` or thereafter patch branch). Also, add prompt-io log for the session. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code Prompt-IO: ai/prompt-io/claude/20260406T193125Z_85f9c5d_prompt_io.md	2026-04-06 18:17:04 -04:00
Gud Boi	85f9c5df6f	Add per-`ctx_key` isolation tests for `maybe_open_context()` Add `test_per_ctx_key_resource_lifecycle` to verify that per-key user tracking correctly tears down resources independently - exercises the fix from 02b2ef18 where a global `_Cache.users` counter caused stale cache hits when the same `acm_func` was called with different kwargs. Also, add a paired `acm_with_resource()` helper `@acm` that yields its `resource_id` for per-key testing in the above suite. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code Prompt-IO: ai/prompt-io/claude/20260406T172848Z_02b2ef1_prompt_io.md	2026-04-06 14:37:47 -04:00
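
The busy-loop described in commit `4f042ded23` can be illustrated without `trio` at all. The sketch below is not the actual patch and does not touch `trio` internals: it uses a plain stdlib `socket.socketpair()` on a POSIX host, and the names `drain()` and `demo()` are hypothetical. It only shows why a non-blocking drain loop has to treat `recv() == b''` as a stop condition in addition to `BlockingIOError`.

```python
import socket


def drain(sock: socket.socket, bufsize: int = 2**16) -> int:
    """Discard everything readable on a non-blocking socket.

    Returns the number of bytes discarded. Hypothetical helper,
    written only to illustrate the EOF handling.
    """
    drained = 0
    while True:
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:
            # Nothing left to read right now: the normal exit.
            return drained
        if not data:
            # b'' means the peer closed its end. Without this check the
            # loop would keep receiving b'' forever and never block.
            return drained
        drained += len(data)


def demo() -> tuple[int, int]:
    read_sock, write_sock = socket.socketpair()
    read_sock.setblocking(False)
    write_sock.setblocking(False)

    write_sock.send(b"xx")
    n_while_open = drain(read_sock)   # exits via BlockingIOError

    write_sock.close()
    n_after_close = drain(read_sock)  # exits via the b'' EOF check

    read_sock.close()
    return n_while_open, n_after_close


if __name__ == "__main__":
    print(demo())
```

Removing the `if not data` branch reproduces the failure mode in miniature: once the write end is closed, every `recv()` returns `b''` immediately and the loop never reaches the `BlockingIOError` exit.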

18 Commits (0952b33a9e6d06ec8f68340a014c4ae90c44d214)
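
Commit `6df9ee11bc` mentions that per-line relay has to handle lines that straddle read chunks and flush a trailing line with no newline at EOF. A minimal sketch of that splitting logic follows, assuming chunks arrive as `bytes`; the function name `iter_lines()` is hypothetical and this is not the `_relay_stream_lines()` implementation, which additionally reads from a live `trio` stream.

```python
from typing import Iterable, Iterator


def iter_lines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield newline-delimited lines from arbitrarily sized chunks."""
    residual = b""
    for chunk in chunks:
        residual += chunk
        # Everything before the last b"\n" is a complete line; the final
        # piece is carried over since its newline has not arrived yet.
        *complete, residual = residual.split(b"\n")
        yield from complete
    if residual:
        # EOF: flush a trailing line that never got its newline.
        yield residual


if __name__ == "__main__":
    for line in iter_lines([b"ab", b"c\nde\nf", b"g"]):
        print(line)
```

Carrying the residual across iterations is what keeps a line such as `abc` intact when it arrives as `ab` followed by `c\n`.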