Compare commits

...

4 Commits

Author SHA1 Message Date
Gud Boi f42dfe50a9 TOSQUASH 3a45dbd5, sync proctitle tests+doc to `_def_prefix` 2026-06-02 00:51:52 -04:00
Gud Boi d7da502d93 Bump trio `echoserver` cancel timeout 1→4s
Same trio 0.29 → 0.33 cancel-cascade slowdown that hit
`test_nested_multierrors` (ea67f1b6) — bumps the
`trio`-backend (non-debug, non-forking) budget in
`test_echoserver_detailed_mechanics` from 1s → 4s.

- The 1s budget raced the ~1s teardown deadline. On a
  deadline-fire trio 0.33 injects
  `Cancelled(source='deadline')` (cancel-reason
  metadata) that wraps the mid-stream KBI in a
  `BaseExceptionGroup`, breaking the bare
  `pytest.raises(KeyboardInterrupt)` below.
- Bump matches the forking-spawner branch (4s).
- Inline NOTE references the tracking issue
  `ai/conc-anal/trio_033_cancel_cascade_slowdown_depth3_issue.md`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-06-02 00:49:54 -04:00
Gud Boi ea67f1b67b Bump trio depth=3 cancel timeout 6→12s
trio 0.29 → 0.33 lock bump (c7741bba) slowed the
depth=3 cancel-cascade in `test_nested_multierrors`
from <6s to ~7-8s; the 6s deadline was firing and its
`Cancelled(source='deadline')` (trio 0.33's new
cancel-reason metadata) collapsed a BEG branch,
breaking the `RemoteActorError` assertion downstream.

- Split the `('trio', _)` case-match into per-depth
  arms: `('trio', 1)` keeps 6s (still finishes in
  ~3s); `('trio', 3)` → 12s.
- Updated inline NOTE explains the version pivot +
  links the tracking issue
  `ai/conc-anal/trio_033_cancel_cascade_slowdown_depth3_issue.md`.
- Existing MTF/`subint_forkserver` budgets unchanged.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-06-02 00:47:45 -04:00
Gud Boi 3854cf5ecb Strip ANSI + accept `_create(...)` in devx tests
Two version-compat fixes for the `devx` debugger
test-suite, both about matching upstream output that
got more verbose w/ recent lib releases.

ANSI stripping (`tests/devx/conftest.py`),
- Add `ansi_strip(text)` helper + `_ansi_re` pattern
  (regex per https://stackoverflow.com/a/14693789).
- Apply inside `in_prompt_msg()` + `assert_before()` so
  substring matches against REPL/traceback output stay
  robust to color leakage.
- Motivated by py3.13's colored tracebacks +
  `pdbp`/pygments highlighting leaking ANSI even when
  `PYTHON_COLORS=0` is set in the `spawn` fixture (not
  every renderer in the spawned subproc honors it).
- Replaces the longstanding inline TODO that linked
  the SO answer w/o impl'ing.

trio 0.30+ `Cancelled._create(` match (`test_debugger`),
- In `test_shield_pause` swap the two
  `"raise Cancelled._create()"` assertion patterns →
  `"raise Cancelled._create("` (open-paren form, no
  closing).
- trio >=0.30 raises a multi-line
  `raise Cancelled._create(source=.., reason=..,
  source_task=..)` w/ cancel-reason metadata, so the
  legacy bare-`()` form no longer matches. Inline
  comment documents the trio-version pivot.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-06-02 00:41:06 -04:00
7 changed files with 195 additions and 33 deletions

View File

@ -0,0 +1,102 @@
# `trio` 0.29 -> 0.33 slows the depth=3 cancel-cascade
## Symptom
After locking to `trio==0.33.0` (commit `c7741bba`, was
`0.29.0`), this test reliably trips its `fail_after`
deadline on the **`trio`** backend:
```
FAILED tests/test_cancellation.py::test_nested_multierrors[start_method=trio-depth=3]
- AssertionError: assert False
where False = isinstance(
Cancelled(source='deadline', source_task=None, reason=None),
tractor.RemoteActorError,
)
```
A `fail_after_w_trace` hang-snapshot is captured for the
test each run (deadline-injected `Cancelled` wrapped into
the actor-nursery `BaseExceptionGroup`).
## Root cause (immediate)
The test budgets `fail_after(6)` for the `trio` backend.
That 6s was chosen (commit `32955db0`, while `trio==0.29`)
with the assertion that trio finishes "well under" 6s.
The `trio` 0.29 -> 0.33 bump slowed the depth=3 cascade
past that budget, so the 6s deadline now fires mid-cascade.
trio 0.33 added **cancel-reason tracking** — every
`Cancelled` now carries `(source=, reason=, source_task=)`.
The injected exc is `Cancelled(source='deadline')`, i.e.
trio itself naming our `fail_after(6)` scope as the cancel
origin. When that `Cancelled` collapses one branch of the
nursery BEG, the test's `isinstance(subexc,
RemoteActorError)` assertion fails. The healthy outcome is
`BEG = [RemoteActorError, RemoteActorError]`; the
`Cancelled` is purely an artifact of the deadline cutting
the cascade short.
## Measurements (standalone, this machine)
```
depth=1 trio ~3.15s PASS (keeps 6s budget)
depth=3 trio ~6.8-8.2s FAIL @ 6s (now bumped to 12s)
```
depth=1 still fits comfortably; only depth=3 (deeper
recursive spawn-and-error tree => more actors to reap)
exceeds the old budget. The ~2s/depth-level cost looks
like serialized per-actor reap / `terminate_after` waits.
## Mitigation applied
`test_nested_multierrors` now splits the `trio` budget:
```python
case ('trio', 1):
timeout = 6
case ('trio', 3):
timeout = 12 # was 6; see this doc
```
This stops the deadline from firing so the cascade
completes naturally to `[RAE, RAE]`.
## Also affected — same root cause, different test
`test_echoserver_detailed_mechanics[trio-raise_error=KeyboardInterrupt]`
(`tests/test_infected_asyncio.py`) tripped the *same*
slowdown via its much tighter `trio` budget of `1s`. The
single-aio-subactor teardown now takes ~1s, so the `1s`
`fail_after` raced the deadline (PASS at 0.99s / FAIL at
1.03s across back-to-back standalone runs). On a deadline-
fire the injected `Cancelled(source='deadline')` wraps the
mid-stream `KeyboardInterrupt` into a `BaseExceptionGroup`,
which is NOT a `KeyboardInterrupt` so the bare
`pytest.raises(KeyboardInterrupt)` fails. (The sibling
`raise_error=Exception` variant only "passes" by accident:
an `ExceptionGroup` *is-a* `Exception`, so its
`pytest.raises(Exception)` still matches even when wrapped.)
Mitigation: bump that `trio` budget `1 -> 4s` (matching the
forking-spawner case). Without a deadline-fire the KBI
propagates bare and the assertion passes.
## Open follow-up (the actual regression)
The budget bump is a band-aid — the underlying question is
**why** the depth=3 `trio` cancel-cascade went from <6s to
~7-8s across `trio` 0.29 -> 0.33. Candidate avenues:
- which scope owns the per-actor `terminate_after` wait,
and are the tree's reaps concurrent or serialized?
- did trio 0.33's abort/reschedule or cancel-reason
bookkeeping change checkpoint timing on the cancel path?
If/when the cascade speeds back up under-budget, depth=3
will start completing well under 12s — at which point the
budget can be tightened back toward 6s as a regression
tripwire. Related (different backend, same cascade class):
`cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`.

View File

@ -5,6 +5,7 @@
from __future__ import annotations
import platform
import os
import re
import signal
import time
from typing import (
@ -294,6 +295,26 @@ def expect(
PROMPT = r"\(Pdb\+\)"
# Strip terminal color / ANSI-VT100 escape sequences so
# substring matching against REPL + traceback output stays
# robust to color leakage — Python 3.13's colored tracebacks,
# `pdbp`'s pygments highlighting, etc. — even when
# `PYTHON_COLORS=0` (set in the `spawn` fixture) isn't honored
# by every renderer in the spawned subproc.
# Regex per https://stackoverflow.com/a/14693789
_ansi_re: re.Pattern = re.compile(
r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])'
)
def ansi_strip(text: str) -> str:
'''
Remove ANSI/VT100 escape sequences from `text`.
'''
return _ansi_re.sub('', text)
def in_prompt_msg(
child: SpawnBase,
parts: list[str],
@ -313,7 +334,7 @@ def in_prompt_msg(
'''
__tracebackhide__: bool = False
before: str = str(child.before.decode())
before: str = ansi_strip(str(child.before.decode()))
for part in parts:
if part not in before:
if pause_on_false:
@ -333,9 +354,9 @@ def in_prompt_msg(
return True
# TODO: todo support terminal color-chars stripping so we can match
# against call stack frame output from the the 'll' command the like!
# -[ ] SO answer for stipping ANSI codes: https://stackoverflow.com/a/14693789
# NB: color-char stripping (so we can match against call-stack
# frame output from the `ll` command and the like) is handled by
# `ansi_strip()` applied inside `in_prompt_msg()` + below.
def assert_before(
child: SpawnBase,
patts: list[str],
@ -356,7 +377,7 @@ def assert_before(
err_on_false=True,
**kwargs
)
before: str = str(child.before.decode())
before: str = ansi_strip(str(child.before.decode()))
return before

View File

@ -1186,7 +1186,12 @@ def test_shield_pause(
"('cancelled_before_pause'", # actor name
_repl_fail_msg,
"trio.Cancelled",
"raise Cancelled._create()",
# trio >=0.30 raises via a multi-line
# `raise Cancelled._create(source=.., reason=..,
# source_task=..)` (cancel-reason metadata), so
# match the open-paren form only, NOT the legacy
# bare `()`.
"raise Cancelled._create(",
# we should be handling a taskc inside
# the first `.port_mortem()` sin-shield!
@ -1204,7 +1209,12 @@ def test_shield_pause(
"('root'", # actor name
_repl_fail_msg,
"trio.Cancelled",
"raise Cancelled._create()",
# trio >=0.30 raises via a multi-line
# `raise Cancelled._create(source=.., reason=..,
# source_task=..)` (cancel-reason metadata), so
# match the open-paren form only, NOT the legacy
# bare `()`.
"raise Cancelled._create(",
# handling a taskc inside the first unshielded
# `.port_mortem()`.

View File

@ -8,9 +8,9 @@ after `Actor` construction, so any spawned sub-actor process
should:
- have `argv[0]` (== `/proc/<pid>/cmdline`) start with
`tractor[<aid.reprol()>]`
- have `/proc/<pid>/comm` start with `tractor[` (kernel
truncates to ~15 bytes)
`<_def_prefix>[<aid.reprol()>]` (currently `_subactor[]`)
- have `/proc/<pid>/comm` start with `<_def_prefix>[`
(kernel truncates to ~15 bytes)
- be detected as a tractor sub-actor by
`_is_tractor_subactor(pid)` via the cmdline marker.
@ -27,7 +27,10 @@ import trio
import tractor
from tractor.runtime._runtime import Actor
from tractor.devx._proctitle import set_actor_proctitle
from tractor.devx._proctitle import (
set_actor_proctitle,
_def_prefix,
)
from tractor._testing._reap import (
_is_tractor_subactor,
_read_cmdline,
@ -41,8 +44,9 @@ _non_linux: bool = platform.system() != 'Linux'
def test_set_actor_proctitle_format():
'''
`set_actor_proctitle()` returns the canonical
`tractor[<aid.reprol()>]` form and actually mutates
the running proc's title.
`<_def_prefix>[<aid.reprol()>]` form (currently
`_subactor[]`) and actually mutates the running
proc's title.
'''
pytest.importorskip(
@ -60,12 +64,14 @@ def test_set_actor_proctitle_format():
)
title: str = set_actor_proctitle(actor)
# canonical wrapping: `tractor[<aid.reprol()>]`. We
# compare against the runtime-computed `reprol()`
# rather than a hard-coded value so the test stays
# decoupled from `Aid.reprol()`'s internal format
# (currently `<name>@<pid>`, but could evolve).
expected: str = f'tractor[{actor.aid.reprol()}]'
# canonical wrapping: `<_def_prefix>[<aid.reprol()>]`.
# We source BOTH the prefix (`_def_prefix`) and the
# runtime-computed `reprol()` rather than hard-coding,
# so the test stays decoupled from the prefix shape
# (flipped to `_subactor` in `3a45dbd5`) AND from
# `Aid.reprol()`'s internal format (currently
# `<name>@<pid>`, but could evolve).
expected: str = f'{_def_prefix}[{actor.aid.reprol()}]'
assert title == expected
# sanity: the actor's name must be in the title
# somewhere (so a future `reprol()` change that
@ -140,15 +146,17 @@ def test_subactor_proctitle_visible_via_proc():
)
pid, info = matched[0]
# canonical proctitle prefix in cmdline (full form)
assert info['cmdline'].startswith('tractor[proctitle_boi@'), (
f'cmdline missing `tractor[proctitle_boi@…]` prefix: '
# canonical proctitle prefix in cmdline (full form);
# prefix sourced from `_def_prefix` so it tracks the
# `3a45dbd5` flip (`tractor[` -> `_subactor[`).
assert info['cmdline'].startswith(f'{_def_prefix}[proctitle_boi@'), (
f'cmdline missing `{_def_prefix}[proctitle_boi@…]` prefix: '
f'{info["cmdline"]!r}'
)
# comm is kernel-truncated to ~15 bytes — just check the
# `tractor[` prefix made it.
assert info['comm'].startswith('tractor['), (
f'comm missing `tractor[` prefix: {info["comm"]!r}'
# `<_def_prefix>[` prefix made it.
assert info['comm'].startswith(f'{_def_prefix}['), (
f'comm missing `{_def_prefix}[` prefix: {info["comm"]!r}'
)
# intrinsic-signal detector should match.
assert info['is_tractor'] is True

View File

@ -605,17 +605,28 @@ async def test_nested_multierrors(
)
)
# 6s budget: in the non-hang case (and on the trio
# backend) the whole spawn + cancel-cascade should
# complete in well under that. On the borderline hang
# case the `fail_after_w_trace` fires `TooSlowError`
# AND captures a ptree/wchan/py-spy snapshot to
# Per-backend/-depth budgets: in the non-hang case the
# whole spawn + cancel-cascade should complete in well
# under these. On the borderline hang case the
# `fail_after_w_trace` fires `TooSlowError` AND captures a
# ptree/wchan/py-spy snapshot to
# `$XDG_CACHE_HOME/tractor/hung-dumps/` for offline
# inspection. See
# `ai/conc-anal/cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`.
#
# NOTE: the `trio` depth=3 budget was bumped 6 -> 12s after
# the `trio` 0.29 -> 0.33 lock bump (commit c7741bba) slowed
# the depth-3 cancel-cascade from <6s to ~7-8s; the 6s
# deadline was firing and its `Cancelled(source='deadline')`
# (trio 0.33 cancel-reason metadata) collapsed a BEG branch,
# breaking the `RemoteActorError` assertion below. depth=1
# still finishes in ~3s so keeps the 6s budget. See
# `ai/conc-anal/trio_033_cancel_cascade_slowdown_depth3_issue.md`.
match (start_method, depth):
case ('trio', _):
case ('trio', 1):
timeout = 6
case ('trio', 3):
timeout = 12
case ('main_thread_forkserver', 1):
timeout = 16
case ('main_thread_forkserver', 3):

View File

@ -860,7 +860,14 @@ def test_echoserver_detailed_mechanics(
timeout: float = (
999 if tractor.debug_mode()
else 4 if is_forking_spawner
else 1
# was 1; the `trio` 0.29 -> 0.33 bump slowed the
# cancel-cascade so a 1s budget raced the ~1s teardown
# deadline. On a deadline-fire the injected
# `Cancelled(source='deadline')` wraps the mid-stream
# KBI in a `BaseExceptionGroup`, breaking the bare
# `pytest.raises(KeyboardInterrupt)` below. See
# `ai/conc-anal/trio_033_cancel_cascade_slowdown_depth3_issue.md`.
else 4
)
# body factored out so the `fail_after_w_trace`-wrapping

View File

@ -24,7 +24,10 @@ which" at a glance without needing to read full
`/proc/<pid>/cmdline`.
Format:
``tractor[<aid.reprol()>]`` e.g. ``tractor[doggy@1027301b]``
``<_def_prefix>[<aid.reprol()>]`` e.g. ``_subactor[doggy@1027301b]``
(prefix from the `_def_prefix` const, flipped `tractor` ->
`_subactor` so sub-actor procs are visually distinct from the
root in `ps`/`htop` and the reap-recognition markers.)
Uses the canonical `Aid.reprol()` form
(``<name>@<uuid_short>``) so the proc-title matches the