Compare commits
No commits in common. "wkt/tooling_enhancements_from_mtf_spawner" and "main" have entirely different histories.
@ -1,38 +0,0 @@
|
|||
# Docs TODOs
|
||||
|
||||
## Auto-sync README code examples with source
|
||||
|
||||
The `docs/README.rst` has inline code blocks that
|
||||
duplicate actual example files (e.g.
|
||||
`examples/infected_asyncio_echo_server.py`). Every time
|
||||
the public API changes we have to manually sync both.
|
||||
|
||||
Sphinx's `literalinclude` directive can pull code directly
|
||||
from source files:
|
||||
|
||||
```rst
.. literalinclude:: ../examples/infected_asyncio_echo_server.py
   :language: python
   :caption: examples/infected_asyncio_echo_server.py
```
|
||||
|
||||
Or to include only a specific function/section:
|
||||
|
||||
```rst
.. literalinclude:: ../examples/infected_asyncio_echo_server.py
   :language: python
   :pyobject: aio_echo_server
```
|
||||
|
||||
This way the docs always reflect the actual code without
|
||||
manual syncing.
|
||||
|
||||
### Considerations
|
||||
- `README.rst` is also rendered on GitHub/PyPI which do
|
||||
NOT support `literalinclude` - so we'd need a build
|
||||
step or a separate `_sphinx_readme.rst` (which already
|
||||
exists at `docs/github_readme/_sphinx_readme.rst`).
|
||||
- Could use a pre-commit hook or CI step to extract code
|
||||
from examples into the README for GitHub rendering.
|
||||
- Another option: a `sphinx.ext.autodoc`-style approach where
  docstrings from the actual module are pulled in.
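
The pre-commit/CI option could be sketched roughly as
follows. This is a minimal sketch, not an existing tool: the
`.. BEGIN-SYNC: <path>` / `.. END-SYNC` marker comments and
the `sync_readme()` / `render_block()` helpers are
hypothetical names invented for illustration.

```python
import re
from pathlib import Path

# Hypothetical rst comment markers (NOT an existing convention):
#
#   .. BEGIN-SYNC: examples/some_example.py
#   <generated code block, rewritten on every run>
#   .. END-SYNC
_SYNC_RE = re.compile(
    r'(?P<begin>^\.\. BEGIN-SYNC: (?P<path>\S+)\n)'
    r'.*?'
    r'(?P<end>^\.\. END-SYNC$)',
    flags=re.DOTALL | re.MULTILINE,
)


def render_block(code: str) -> str:
    '''Render source text as an indented rst `code:: python` block.'''
    body = '\n'.join(
        ('    ' + line) if line else ''
        for line in code.rstrip('\n').splitlines()
    )
    return f'\n.. code:: python\n\n{body}\n\n'


def sync_readme(readme_text: str, root: Path) -> str:
    '''Replace every marked region with the current file content.'''
    def _sub(match: re.Match) -> str:
        code = (root / match['path']).read_text()
        return match['begin'] + render_block(code) + match['end']

    return _SYNC_RE.sub(_sub, readme_text)
```

Since the markers are plain rst comments they should stay
invisible on GitHub/PyPI, and re-running the sync on already
synced text is a no-op, which suits a pre-commit hook.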
|
||||
|
|
@ -1,125 +0,0 @@
|
|||
# `RuntimeVars` env-var lift — design plan
|
||||
|
||||
Status: **draft, awaiting user edits**
|
||||
|
||||
## Goal
|
||||
|
||||
Consolidate the sprawl of pytest CLI flags + ad-hoc env vars +
|
||||
hardcoded fixture defaults into a *single* env-var-encoded
|
||||
runtime-vars envelope, with a typed in-memory representation
|
||||
(`tractor.runtime._state.RuntimeVars`) as the sole source of
|
||||
truth.
|
||||
|
||||
## Why now
|
||||
|
||||
- `--tpt-proto`, `--spawn-backend`, `--diag-on-hang`,
|
||||
`--diag-capture-delay` and (soon) `TRACTOR_REG_ADDR` etc. are
|
||||
proliferating. Each adds a parsing seam.
|
||||
- `tests/devx/test_debugger.py` invokes example scripts as
|
||||
separate subprocesses; they currently can't see the
|
||||
fixture-allocated `reg_addr` at all (root cause of why
|
||||
parametrizing devx scripts on `reg_addr` is on your TODO).
|
||||
- Concurrent pytest sessions on the same host collide on
|
||||
shared defaults (the `registry@1616` race we just fixed is
|
||||
one symptom; per-session unique addr is the structural
|
||||
fix).
|
||||
- `tractor.runtime._state.RuntimeVars: Struct` is already
|
||||
defined and **unused** — its docstring even says it
|
||||
"should be utilized as possible for future calls."
|
||||
|
||||
## Design
|
||||
|
||||
### Module: `tractor/_testing/_rtvars.py`
|
||||
|
||||
Lifted from `modden.runtime.env`, ~50 LOC, no new deps.
|
||||
|
||||
```python
_TRACTOR_RT_VARS_OSENV: str = '_TRACTOR_RT_VARS'

def dump_rtvars(rtvars: RuntimeVars|dict) -> tuple[str, str]:
    '''str-serialize via `str(dict)`; ast.literal_eval-able'''

def load_rtvars(env: dict) -> RuntimeVars:
    '''ast.literal_eval the env-var value, hydrate to struct'''

def get_rtvars(proc: psutil.Process|None = None) -> RuntimeVars:
    '''read the var from a target proc's env (or current)'''

def update_rtvars(
    rtvars: RuntimeVars|dict|None = None,
    update_osenv: bool|dict = True,
) -> tuple[str, str]:
    '''mutate + re-encode + (optionally) write to os.environ'''
```
|
||||
|
||||
### Encoding choice: `str(dict)` + `ast.literal_eval`
|
||||
|
||||
Pros:
|
||||
- stdlib only
|
||||
- handles all the types tractor's tests need: `str`, `int`,
|
||||
`float`, `bool`, `None`, `list`, `tuple`, `dict`
|
||||
- human-readable in the env (greppable, inspectable via
|
||||
`cat /proc/<pid>/environ | tr '\0' '\n'`)
|
||||
|
||||
Cons:
|
||||
- non-stdlib types (msgspec Structs, `Path`, custom classes)
|
||||
must be lowered first — fine for the test fixture set
|
||||
- not stable across Python versions for esoteric repr cases
|
||||
(we don't hit any)
|
||||
|
||||
Alternatives considered:
|
||||
- **msgpack**: adds a dep + binary form is ungreppable
|
||||
- **json**: doesn't preserve tuples (becomes lists), which is
|
||||
a common type for `reg_addr`
|
||||
- **toml/yaml**: heavier deps, no real benefit
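
The tuple-preservation argument can be checked with a small
round trip. This is a sketch under assumptions: it uses a
plain `dict` instead of the `RuntimeVars` struct, the payload
keys/values are illustrative, and it assumes the
`tuple[str, str]` returned by `dump_rtvars()` is the
(env-var-name, encoded-value) pair.

```python
import ast
import json

ENV_KEY: str = '_TRACTOR_RT_VARS'

# Illustrative payload only; the real field set lives on
# `RuntimeVars` and is not reproduced here.
rtvars: dict = {
    '_registry_addrs': [('127.0.0.1', 1616)],
    'debug_mode': False,
    'loglevel': None,
}


def dump_rtvars(rtvars: dict) -> tuple[str, str]:
    '''Encode as an (env-var-name, value) pair via `str(dict)`.'''
    return ENV_KEY, str(rtvars)


def load_rtvars(env: dict) -> dict:
    '''Decode the env-var value back into a plain `dict`.'''
    return ast.literal_eval(env[ENV_KEY])


key, val = dump_rtvars(rtvars)
env: dict = {key: val}  # stand-in for `os.environ`

# tuples survive the `str()` + `literal_eval()` round trip..
assert load_rtvars(env) == rtvars
assert isinstance(load_rtvars(env)['_registry_addrs'][0], tuple)

# ..but come back as lists after a json round trip.
via_json = json.loads(json.dumps(rtvars))
assert isinstance(via_json['_registry_addrs'][0], list)
```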
|
||||
|
||||
### `RuntimeVars` becomes the single source of truth
|
||||
|
||||
The legacy `_runtime_vars: dict[str, Any]` global in
|
||||
`runtime/_state.py` becomes a *cached view* of a
|
||||
`RuntimeVars` singleton instance:
|
||||
|
||||
- `get_runtime_vars()` returns either the struct or a
|
||||
`.to_dict()` view depending on caller's preference
|
||||
- `set_runtime_vars(...)` validates against the struct schema
|
||||
- spawn-time SpawnSpec sends the struct (already does
|
||||
conceptually — just gets typed)
|
||||
- `__setattr__` `breakpoint()` debug instrumentation gets
|
||||
removed (unrelated cleanup, mentioned in conversation)
|
||||
|
||||
### Migration path
|
||||
|
||||
**Phase 0** *(prep)*: strip the stray `breakpoint()` from
|
||||
`RuntimeVars.__setattr__`.
|
||||
|
||||
**Phase 1**: land `_rtvars.py` as a leaf module, used only by
|
||||
test infra. Subprocess-spawned scripts in `tests/devx/`
|
||||
read `_TRACTOR_RT_VARS` on startup → reconstruct
|
||||
`RuntimeVars` → call `tractor.open_root_actor(**rtvars.as_kwargs())`.
|
||||
Concurrent runs become deterministic-isolated because each
|
||||
session writes a unique `_registry_addrs` into the env.
|
||||
|
||||
**Phase 2**: migrate runtime callers (`_state.get_runtime_vars`,
|
||||
spawn `SpawnSpec`, `Actor.async_main`) to operate on the
|
||||
struct directly, with the dict as a compat view that gets
|
||||
deprecated.
|
||||
|
||||
**Phase 3** *(structural)*: per-session bindspace subdir
|
||||
`/run/user/<uid>/tractor/<session_uuid>/` — encoded in the
|
||||
rt-vars envelope, picked up by every subactor automatically.
|
||||
Obsoletes the entire bindspace-leak warning class.
|
||||
|
||||
## Open design questions (user input wanted)
|
||||
|
||||
- (placeholder for your edits)
|
||||
- (placeholder)
|
||||
- (placeholder)
|
||||
|
||||
## Out-of-scope for this lift
|
||||
|
||||
- Anything in `modden.runtime.env` related to `Spawn`,
|
||||
`WmCtl`, `Wks` — that's a workspace orchestration layer,
|
||||
not an env-var helper. We only lift the four utility
|
||||
functions + the var name constant.
|
||||
- Switching to msgpack/json — explicitly chosen against
|
||||
above.
|
||||
|
|
@ -1,42 +0,0 @@
|
|||
{
  "permissions": {
    "allow": [
      "Bash(cp .claude/*)",
      "Read(.claude/**)",
      "Read(.claude/skills/run-tests/**)",
      "Write(.claude/**/*commit_msg*)",
      "Write(.claude/git_commit_msg_LATEST.md)",
      "Skill(run-tests)",
      "Skill(close-wkt)",
      "Skill(open-wkt)",
      "Skill(prompt-io)",
      "Bash(date *)",
      "Bash(git diff *)",
      "Bash(git log *)",
      "Bash(git status)",
      "Bash(git remote:*)",
      "Bash(git stash:*)",
      "Bash(git mv:*)",
      "Bash(git rev-parse:*)",
      "Bash(test:*)",
      "Bash(ls:*)",
      "Bash(grep:*)",
      "Bash(find:*)",
      "Bash(ln:*)",
      "Bash(cat:*)",
      "Bash(mkdir:*)",
      "Bash(gh pr:*)",
      "Bash(gh api:*)",
      "Bash(gh issue:*)",
      "Bash(UV_PROJECT_ENVIRONMENT=py* uv sync:*)",
      "Bash(UV_PROJECT_ENVIRONMENT=py* uv run:*)",
      "Bash(echo EXIT:$?:*)",
      "Bash(echo \"EXIT=$?\")",
      "Read(//tmp/**)"
    ],
    "deny": [],
    "ask": []
  },
  "prefersReducedMotion": false,
  "outputStyle": "default"
}
|
||||
|
|
@ -1,225 +0,0 @@
|
|||
# Commit Message Style Guide for `tractor`
|
||||
|
||||
Analysis based on 500 recent commits from the `tractor` repository.
|
||||
|
||||
## Core Principles
|
||||
|
||||
Write commit messages that are technically precise yet casual in
|
||||
tone. Use abbreviations and informal language while maintaining
|
||||
clarity about what changed and why.
|
||||
|
||||
## Subject Line Format
|
||||
|
||||
### Length and Structure
|
||||
- Target: ~50 chars with a hard-max of 67.
|
||||
- Use backticks around code elements (72.2% of commits)
|
||||
- Rarely use colons (5.2%), except for file prefixes
|
||||
- End with '?' for uncertain changes (rare: 0.8%)
|
||||
- End with '!' for important changes (rare: 2.0%)
|
||||
|
||||
### Opening Verbs (Present Tense)
|
||||
|
||||
Most common verbs from analysis:
|
||||
- `Add` (14.4%) - wholly new features/functionality
|
||||
- `Use` (4.4%) - adopt new approach/tool
|
||||
- `Drop` (3.6%) - remove code/feature
|
||||
- `Fix` (2.4%) - bug fixes
|
||||
- `Move`/`Mv` (3.6%) - relocate code
|
||||
- `Adjust` (2.0%) - minor tweaks
|
||||
- `Update` (1.6%) - enhance existing feature
|
||||
- `Bump` (1.2%) - dependency updates
|
||||
- `Rename` (1.2%) - identifier changes
|
||||
- `Set` (1.2%) - configuration changes
|
||||
- `Handle` (1.0%) - add handling logic
|
||||
- `Raise` (1.0%) - add error raising
|
||||
- `Pass` (0.8%) - pass parameters/values
|
||||
- `Support` (0.8%) - add support for something
|
||||
- `Hide` (1.4%) - make private/internal
|
||||
- `Always` (1.4%) - enforce consistent behavior
|
||||
- `Mk` (1.4%) - make/create (abbreviated)
|
||||
- `Start` (1.0%) - begin implementation
|
||||
|
||||
Other frequent verbs: `More`, `Change`, `Extend`, `Disable`, `Log`,
|
||||
`Enable`, `Ensure`, `Expose`, `Allow`
|
||||
|
||||
### Backtick Usage
|
||||
|
||||
Always use backticks for:
|
||||
- Module names: `trio`, `asyncio`, `msgspec`, `greenback`, `stackscope`
|
||||
- Class names: `Context`, `Actor`, `Address`, `PldRx`, `SpawnSpec`
|
||||
- Method names: `.pause_from_sync()`, `._pause()`, `.cancel()`
|
||||
- Function names: `breakpoint()`, `collapse_eg()`, `open_root_actor()`
|
||||
- Decorators: `@acm`, `@context`
|
||||
- Exceptions: `Cancelled`, `TransportClosed`, `MsgTypeError`
|
||||
- Keywords: `finally`, `None`, `False`
|
||||
- Variable names: `tn`, `debug_mode`
|
||||
- Complex expressions: `trio.Cancelled`, `asyncio.Task`
|
||||
|
||||
Most backticked terms in tractor:
|
||||
`trio`, `asyncio`, `Context`, `.pause_from_sync()`, `tn`,
|
||||
`._pause()`, `breakpoint()`, `collapse_eg()`, `Actor`, `@acm`,
|
||||
`.cancel()`, `Cancelled`, `open_root_actor()`, `greenback`
|
||||
|
||||
### Examples
|
||||
|
||||
Good subject lines:
|
||||
```
|
||||
Add `uds` to `._multiaddr`, tweak typing
|
||||
Drop `DebugStatus.shield` attr, add `.req_finished`
|
||||
Use `stackscope` for all actor-tree rendered "views"
|
||||
Fix `.to_asyncio` inter-task-cancellation!
|
||||
Bump `ruff.toml` to target py313
|
||||
Mv `load_module_from_path()` to new `._code_load` submod
|
||||
Always use `tuple`-cast for singleton parent addrs
|
||||
```
|
||||
|
||||
## Body Format
|
||||
|
||||
### General Structure
|
||||
- 43.2% of commits have no body (simple changes)
|
||||
- Use blank line after subject
|
||||
- Max line length: 67 chars
|
||||
- Use `-` bullets for lists (28.0% of commits)
|
||||
- Rarely use `*` bullets (2.4%)
|
||||
|
||||
### Section Markers
|
||||
|
||||
Use these markers to organize longer commit bodies:
|
||||
- `Also,` (most common: 26 occurrences)
|
||||
- `Other,` (13 occurrences)
|
||||
- `Deats,` (11 occurrences) - for implementation details
|
||||
- `Further,` (7 occurrences)
|
||||
- `TODO,` (3 occurrences)
|
||||
- `Impl details,` (2 occurrences)
|
||||
- `Notes,` (1 occurrence)
|
||||
|
||||
### Common Abbreviations
|
||||
|
||||
Use these freely (sorted by frequency):
|
||||
- `msg` (63) - message
|
||||
- `bg` (37) - background
|
||||
- `ctx` (30) - context
|
||||
- `impl` (27) - implementation
|
||||
- `mod` (26) - module
|
||||
- `obvi` (17) - obviously
|
||||
- `tn` (16) - task nursery (a `trio.Nursery`)
|
||||
- `fn` (15) - function
|
||||
- `vs` (15) - versus
|
||||
- `bc` (14) - because
|
||||
- `var` (14) - variable
|
||||
- `prolly` (9) - probably
|
||||
- `ep` (6) - entry point
|
||||
- `OW` (5) - otherwise
|
||||
- `rn` (4) - right now
|
||||
- `sig` (4) - signal/signature
|
||||
- `deps` (3) - dependencies
|
||||
- `iface` (2) - interface
|
||||
- `subproc` (2) - subprocess
|
||||
- `tho` (2) - though
|
||||
- `ofc` (2) - of course
|
||||
|
||||
### Tone and Style
|
||||
|
||||
- Casual but technical (use `XD` for humor: 23 times)
|
||||
- Use `..` for trailing thoughts (108 occurrences)
|
||||
- Use `Woops,` to acknowledge mistakes (4 subject lines)
|
||||
- Don't be afraid to show personality while being precise
|
||||
|
||||
### Example Bodies
|
||||
|
||||
Simple with bullets:
|
||||
```
|
||||
Add `multiaddr` and bump up some deps
|
||||
|
||||
Since we're planning to use it for (discovery)
|
||||
addressing, allowing replacement of the hacky (pretend)
|
||||
attempt in `tractor._multiaddr` Bp
|
||||
|
||||
Also pin some deps,
|
||||
- make us py312+
|
||||
- use `pdbp` with my frame indexing fix.
|
||||
- mv to latest `xonsh` for fancy cmd/suggestion injections.
|
||||
|
||||
Bump lock file to match obvi!
|
||||
```
|
||||
|
||||
With section markers:
|
||||
```
|
||||
Use `stackscope` for all actor-tree rendered "views"
|
||||
|
||||
Instead of the (much more) limited and hacky `.devx._code`
|
||||
impls, move to using the new `.devx._stackscope` API which
|
||||
wraps the `stackscope` project.
|
||||
|
||||
Deats,
|
||||
- make new `stackscope.extract_stack()` wrapper
|
||||
- port over frame-descing to `_stackscope.pformat_stack()`
|
||||
- move `PdbREPL` to use `stackscope` render approach
|
||||
- update tests for new stack output format
|
||||
|
||||
Also,
|
||||
- tweak log formatting for consistency
|
||||
- add typing hints throughout
|
||||
```
|
||||
|
||||
## Special Patterns
|
||||
|
||||
### WIP Commits
|
||||
Rare (0.2%) - avoid committing WIP if possible
|
||||
|
||||
### Merge Commits
|
||||
Auto-generated (4.4%), don't worry about style
|
||||
|
||||
### File References
|
||||
- Use `module.py` or `.submodule` style
|
||||
- Rarely use `file.py:line` references (0 in analysis)
|
||||
|
||||
### Links
|
||||
- GitHub links used sparingly (3 total)
|
||||
- Prefer code references over external links
|
||||
|
||||
## Footer
|
||||
|
||||
The default footer should credit `claude` (you) for helping generate
|
||||
the commit msg content:
|
||||
|
||||
```
|
||||
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
|
||||
[claude-code-gh]: https://github.com/anthropics/claude-code
|
||||
```
|
||||
|
||||
Further, if the patch was solely or in part written
|
||||
by `claude`, instead add:
|
||||
|
||||
```
|
||||
(this patch was generated in some part by [`claude-code`][claude-code-gh])
|
||||
[claude-code-gh]: https://github.com/anthropics/claude-code
|
||||
```
|
||||
|
||||
## Summary Checklist
|
||||
|
||||
Before committing, verify:
|
||||
- [ ] Subject line uses present tense verb
|
||||
- [ ] Subject line ~50 chars (hard max 67)
|
||||
- [ ] Code elements wrapped in backticks
|
||||
- [ ] Body lines ≤67 chars
|
||||
- [ ] Abbreviations used where natural
|
||||
- [ ] Casual yet precise tone
|
||||
- [ ] Section markers if body >3 paragraphs
|
||||
- [ ] Technical accuracy maintained
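
The mechanical items on this checklist (subject length, blank
line after the subject, body line length) could be checked
automatically. A minimal sketch follows; `check_commit_msg()`
and the two constants are hypothetical names, and it assumes
the check runs before any footer is appended. Tone, tense and
backtick usage still need a human (or model) eye.

```python
SUBJECT_HARD_MAX: int = 67
BODY_LINE_MAX: int = 67


def check_commit_msg(msg: str) -> list[str]:
    '''Return a list of style problems; an empty list means OK.'''
    lines: list[str] = msg.splitlines()
    if not lines or not lines[0].strip():
        return ['missing subject line']

    problems: list[str] = []
    subject, body = lines[0], lines[1:]
    if len(subject) > SUBJECT_HARD_MAX:
        problems.append(
            f'subject is {len(subject)} chars (hard max {SUBJECT_HARD_MAX})'
        )
    # the subject must be followed by a blank separator line
    if body and body[0].strip():
        problems.append('no blank line after the subject')

    # line 1 is the subject, so body lines are numbered from 2
    for lineno, line in enumerate(body, start=2):
        if len(line) > BODY_LINE_MAX:
            problems.append(
                f'line {lineno} is {len(line)} chars (max {BODY_LINE_MAX})'
            )
    return problems
```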
|
||||
|
||||
## Analysis Metadata
|
||||
|
||||
```
|
||||
Source: tractor repository
|
||||
Commits analyzed: 500
|
||||
Date range: 2019-2025
|
||||
Analysis date: 2026-02-08
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
(this style guide was generated by [`claude-code`][claude-code-gh]
|
||||
analyzing commit history)
|
||||
|
||||
[claude-code-gh]: https://github.com/anthropics/claude-code
|
||||
|
|
@ -1,297 +0,0 @@
|
|||
---
|
||||
name: conc-anal
|
||||
description: >
|
||||
Concurrency analysis for tractor's trio-based
|
||||
async primitives. Trace task scheduling across
|
||||
checkpoint boundaries, identify race windows in
|
||||
shared mutable state, and verify synchronization
|
||||
correctness. Invoke on code segments the user
|
||||
points at, OR proactively when reviewing/writing
|
||||
concurrent cache, lock, or multi-task acm code.
|
||||
argument-hint: "[file:line-range or function name]"
|
||||
allowed-tools:
|
||||
- Read
|
||||
- Grep
|
||||
- Glob
|
||||
- Task
|
||||
---
|
||||
|
||||
Perform a structured concurrency analysis on the
|
||||
target code. This skill should be invoked:
|
||||
|
||||
- **On demand**: user points at a code segment
|
||||
(file:lines, function name, or pastes a snippet)
|
||||
- **Proactively**: when writing or reviewing code
|
||||
that touches shared mutable state across trio
|
||||
tasks — especially `_Cache`, locks, events, or
|
||||
multi-task `@acm` lifecycle management
|
||||
|
||||
## 0. Identify the target
|
||||
|
||||
If the user provides a file:line-range or function
|
||||
name, read that code. If not explicitly provided,
|
||||
identify the relevant concurrent code from context
|
||||
(e.g. the current diff, a failing test, or the
|
||||
function under discussion).
|
||||
|
||||
## 1. Inventory shared mutable state
|
||||
|
||||
List every piece of state that is accessed by
|
||||
multiple tasks. For each, note:
|
||||
|
||||
- **What**: the variable/dict/attr (e.g.
|
||||
`_Cache.values`, `_Cache.resources`,
|
||||
`_Cache.users`)
|
||||
- **Scope**: class-level, module-level, or
|
||||
closure-captured
|
||||
- **Writers**: which tasks/code-paths mutate it
|
||||
- **Readers**: which tasks/code-paths read it
|
||||
- **Guarded by**: which lock/event/ordering
|
||||
protects it (or "UNGUARDED" if none)
|
||||
|
||||
Format as a table:
|
||||
|
||||
```
| State               | Writers         | Readers         | Guard          |
|---------------------|-----------------|-----------------|----------------|
| _Cache.values       | run_ctx, moc¹   | moc             | ctx_key lock   |
| _Cache.resources    | run_ctx, moc    | moc, run_ctx    | UNGUARDED      |
```
|
||||
|
||||
¹ `moc` = `maybe_open_context`
|
||||
|
||||
## 2. Map checkpoint boundaries
|
||||
|
||||
For each code path through the target, mark every
|
||||
**checkpoint** — any `await` expression where trio
|
||||
can switch to another task. Use line numbers:
|
||||
|
||||
```
L325: await lock.acquire()          ← CHECKPOINT
L395: await service_tn.start(...)   ← CHECKPOINT
L411: lock.release()                ← (not a checkpoint, but changes lock state)
L414: yield (False, yielded)        ← SUSPEND (caller runs)
L485: no_more_users.set()           ← (wakes run_ctx, no switch yet)
```
|
||||
|
||||
**Key trio scheduling rules to apply:**
|
||||
- `Event.set()` makes waiters *ready* but does NOT
|
||||
switch immediately
|
||||
- `lock.release()` is not a checkpoint
|
||||
- `await sleep(0)` IS a checkpoint
|
||||
- Code in `finally` blocks CAN have checkpoints, and
  under a cancelled scope each unshielded one raises
  `trio.Cancelled` again
|
||||
- `await` inside `except` blocks can be
|
||||
`trio.Cancelled`-masked
|
||||
|
||||
## 3. Trace concurrent task schedules
|
||||
|
||||
Write out the **interleaved execution trace** for
|
||||
the problematic scenario. Number each step and tag
|
||||
which task executes it:
|
||||
|
||||
```
[Task A]   1. acquires lock
[Task A]   2. cache miss → allocates resources
[Task A]   3. releases lock
[Task A]   4. yields to caller
[Task A]   5. caller exits → finally runs
[Task A]   6. users-- → 0, sets no_more_users
[Task A]   7. pops lock from _Cache.locks
[run_ctx]  8. wakes from no_more_users.wait()
[run_ctx]  9. values.pop(ctx_key)
[run_ctx] 10. acm __aexit__ → CHECKPOINT
[Task B]  11. creates NEW lock (old one popped)
[Task B]  12. acquires immediately
[Task B]  13. values[ctx_key] → KeyError
[Task B]  14. resources[ctx_key] → STILL EXISTS
[Task B]  15. 💥 RuntimeError
```
|
||||
|
||||
Identify the **race window**: the range of steps
|
||||
where state is inconsistent. In the example above,
|
||||
steps 9–10 are the window (values gone, resources
|
||||
still alive).
|
||||
|
||||
## 4. Classify the bug
|
||||
|
||||
Categorize what kind of concurrency issue this is:
|
||||
|
||||
- **TOCTOU** (time-of-check-to-time-of-use): state
|
||||
changes between a check and the action based on it
|
||||
- **Stale reference**: a task holds a reference to
|
||||
state that another task has invalidated
|
||||
- **Lifetime mismatch**: a synchronization primitive
|
||||
(lock, event) has a shorter lifetime than the
|
||||
state it's supposed to protect
|
||||
- **Missing guard**: shared state is accessed
|
||||
without any synchronization
|
||||
- **Atomicity gap**: two operations that should be
|
||||
atomic have a checkpoint between them
|
||||
|
||||
## 5. Propose fixes
|
||||
|
||||
For each proposed fix, provide:
|
||||
|
||||
- **Sketch**: pseudocode or diff showing the change
|
||||
- **How it closes the window**: which step(s) from
|
||||
the trace it eliminates or reorders
|
||||
- **Tradeoffs**: complexity, perf, new edge cases,
|
||||
impact on other code paths
|
||||
- **Risk**: what could go wrong (deadlocks, new
|
||||
races, cancellation issues)
|
||||
|
||||
Rate each fix: `[simple|moderate|complex]` impl
|
||||
effort.
|
||||
|
||||
## 6. Output format
|
||||
|
||||
Structure the full analysis as:
|
||||
|
||||
```markdown
|
||||
## Concurrency analysis: `<target>`
|
||||
|
||||
### Shared state
|
||||
<table from step 1>
|
||||
|
||||
### Checkpoints
|
||||
<list from step 2>
|
||||
|
||||
### Race trace
|
||||
<interleaved trace from step 3>
|
||||
|
||||
### Classification
|
||||
<bug type from step 4>
|
||||
|
||||
### Fixes
|
||||
<proposals from step 5>
|
||||
```
|
||||
|
||||
## Tractor-specific patterns to watch
|
||||
|
||||
These are known problem areas in tractor's
|
||||
concurrency model. Flag them when encountered:
|
||||
|
||||
### `_Cache` lock vs `run_ctx` lifetime
|
||||
|
||||
The `_Cache.locks` entry is managed by
|
||||
`maybe_open_context` callers, but `run_ctx` runs
|
||||
in `service_tn` — a different task tree. Lock
|
||||
pop/release in the caller's `finally` does NOT
|
||||
wait for `run_ctx` to finish tearing down. Any
|
||||
state that `run_ctx` cleans up in its `finally`
|
||||
(e.g. `resources.pop()`) is vulnerable to
|
||||
re-entry races after the lock is popped.
|
||||
|
||||
### `values.pop()` → acm `__aexit__` → `resources.pop()` gap
|
||||
|
||||
In `_Cache.run_ctx`, the inner `finally` pops
|
||||
`values`, then the acm's `__aexit__` runs (which
|
||||
has checkpoints), then the outer `finally` pops
|
||||
`resources`. This creates a window where `values`
|
||||
is gone but `resources` still exists — a classic
|
||||
atomicity gap.
|
||||
|
||||
### Global vs per-key counters
|
||||
|
||||
`_Cache.users` as a single `int` (pre-fix) meant
|
||||
that users of different `ctx_key`s inflated each
|
||||
other's counts, preventing teardown when one key's
|
||||
users hit zero. Always verify that per-key state
|
||||
(`users`, `locks`) is actually keyed on `ctx_key`
|
||||
and not on `fid` or some broader key.
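
The counter problem can be replayed without any trio
machinery. A minimal sketch: `released_keys()` and the event
tuples are hypothetical, and a single shared slot stands in
for the pre-fix global `int`; it is not the actual `_Cache`
code.

```python
from collections import Counter


def released_keys(
    events: list[tuple[str, str]],
    per_key: bool,
) -> list[str]:
    '''
    Replay ('enter'|'exit', ctx_key) events and return the keys
    whose teardown fires, i.e. whose user count drops to zero.

    '''
    users: Counter = Counter()
    torn_down: list[str] = []
    for action, ctx_key in events:
        # a single shared slot models the pre-fix global `int`
        slot: str = ctx_key if per_key else '<global>'
        users[slot] += 1 if action == 'enter' else -1
        if action == 'exit' and users[slot] == 0:
            torn_down.append(ctx_key)
    return torn_down


events = [
    ('enter', 'key_a'),
    ('enter', 'key_b'),
    ('exit', 'key_a'),  # key_a has no users left..
]
# ..but with a shared count it is never torn down
assert released_keys(events, per_key=False) == []
assert released_keys(events, per_key=True) == ['key_a']
```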
|
||||
|
||||
### `Event.set()` wakes but doesn't switch
|
||||
|
||||
`trio.Event.set()` makes waiting tasks *ready* but
|
||||
the current task continues executing until its next
|
||||
checkpoint. Code between `.set()` and the next
|
||||
`await` runs atomically from the scheduler's
|
||||
perspective. Use this to your advantage (or watch
|
||||
for bugs where code assumes the woken task runs
|
||||
immediately).
|
||||
|
||||
### `except` block checkpoint masking
|
||||
|
||||
`await` expressions inside `except` handlers can
|
||||
be masked by `trio.Cancelled`. If a `finally`
|
||||
block runs from an `except` and contains
|
||||
`lock.release()`, the release happens — but any
|
||||
`await` after it in the same `except` may be
|
||||
swallowed. This is why `maybe_open_context`'s
|
||||
cache-miss path does `lock.release()` in a
|
||||
`finally` inside the `except KeyError`.
|
||||
|
||||
### Cancellation in `finally`
|
||||
|
||||
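Checkpoints are allowed in `finally` blocks, and
trio's cancellation is level-triggered: once the
enclosing cancel scope is cancelled, every unshielded
`await` in the `finally` raises `trio.Cancelled`
again (asyncio delivers its `CancelledError` only
once per `.cancel()` call). This means `finally`
cleanup that does `await` can itself be cancelled
(e.g. by nursery shutdown). Watch for cleanup code
that assumes it will run to completion.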
|
||||
|
||||
### Unbounded waits in cleanup paths
|
||||
|
||||
Any `await <event>.wait()` in a teardown path is
|
||||
a latent deadlock unless the event's setter is
|
||||
GUARANTEED to fire. If the setter depends on
|
||||
external state (peer disconnects, child process
|
||||
exit, subsequent task completion) that itself
|
||||
depends on the current task's progress, you have
|
||||
a mutual wait.
|
||||
|
||||
Rule: **bound every `await X.wait()` in cleanup
|
||||
paths with `trio.move_on_after()`** unless you
|
||||
can prove the setter is unconditionally reachable
|
||||
from the state at the await site. Concrete recent
|
||||
example: `ipc_server.wait_for_no_more_peers()` in
|
||||
`async_main`'s finally (see
|
||||
`ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md`
|
||||
"probe iteration 3") — it was unbounded, and when
|
||||
one peer-handler was stuck the wait-for-no-more-
|
||||
peers event never fired, deadlocking the whole
|
||||
actor-tree teardown cascade.
|
||||
|
||||
### The capture-pipe-fill hang pattern (grep this first)
|
||||
|
||||
When investigating any hang in the test suite
|
||||
**especially under fork-based backends**, first
|
||||
check whether the hang reproduces under `pytest
|
||||
-s` (`--capture=no`). If `-s` makes it go away
|
||||
you're not looking at a trio concurrency bug —
|
||||
you're looking at a Linux pipe-buffer fill.
|
||||
|
||||
Mechanism: pytest replaces fds 1,2 with pipe
|
||||
write-ends. Fork-child subactors inherit those
|
||||
fds. High-volume error-log tracebacks (cancel
|
||||
cascade spew) fill the 64KB pipe buffer. Child
|
||||
`write()` blocks. Child can't exit. Parent's
|
||||
`waitpid`/pidfd wait blocks. Deadlock cascades up
|
||||
the tree.
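
The buffer limit is easy to observe directly. The sketch
below (stdlib only, POSIX; `measure_pipe_capacity()` is a
hypothetical helper) fills a pipe that nobody reads and
reports how many bytes fit. The write end is non-blocking
here purely so the script can report instead of hanging; a
blocking writer, like a subactor logging to an inherited
capture pipe, would simply stall at that point. The exact
capacity is system-dependent.

```python
import os


def measure_pipe_capacity(chunk: int = 4096) -> int:
    '''Count the bytes a pipe accepts while nobody reads from it.'''
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)
    written: int = 0
    try:
        while True:
            try:
                written += os.write(write_fd, b'x' * chunk)
            except BlockingIOError:
                # buffer full: a *blocking* writer would hang
                # here until some reader drains the pipe.
                return written
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == '__main__':
    print(f'pipe filled after {measure_pipe_capacity()} bytes')
```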
|
||||
|
||||
Pre-existing guards in `tests/conftest.py` encode
|
||||
this knowledge — grep these BEFORE blaming
|
||||
concurrency:
|
||||
|
||||
```python
# tests/conftest.py:258
if loglevel in ('trace', 'debug'):
    # XXX: too much logging will lock up the subproc (smh)
    loglevel: str = 'info'

# tests/conftest.py:316
# can lock up on the `_io.BufferedReader` and hang..
stderr: str = proc.stderr.read().decode()
```
|
||||
|
||||
See
`ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md`
for the full post-mortem and the canonical
reproduction. It cost several investigation sessions
to catch because the capture-pipe symptom was masked
by deeper cascade-deadlocks. Once the cascades were
fixed, the tree tore down far enough to generate
pipe-filling log volume, and the capture-pipe hang
finally surfaced. Grep-note for future-self: **if a
multi-subproc tractor test hangs, `pytest -s`
first, conc-anal second.**
|
||||
|
|
@ -1,241 +0,0 @@
|
|||
# PR/Patch-Request Description Format Reference
|
||||
|
||||
Canonical structure for `tractor` patch-request
|
||||
descriptions, designed to work across GitHub,
|
||||
Gitea, SourceHut, and GitLab markdown renderers.
|
||||
|
||||
**Line length: wrap at 72 chars** for all prose
|
||||
content (Summary bullets, Motivation paragraphs,
|
||||
Scopes bullets, etc.). Fill lines *to* 72 — don't
|
||||
stop short at 50-65. Only raw URLs in
|
||||
reference-link definitions may exceed this.
|
||||
|
||||
## Template
|
||||
|
||||
```markdown
|
||||
<!-- pr-msg-meta
|
||||
branch: <branch-name>
|
||||
base: <base-branch>
|
||||
submitted:
|
||||
github: ___
|
||||
gitea: ___
|
||||
srht: ___
|
||||
-->
|
||||
|
||||
## <Title: present-tense verb + backticked code>
|
||||
|
||||
### Summary
|
||||
- [<hash>][<hash>] Description of change ending
|
||||
with period.
|
||||
- [<hash>][<hash>] Another change description
|
||||
ending with period.
|
||||
- [<hash>][<hash>] [<hash>][<hash>] Multi-commit
|
||||
change description.
|
||||
|
||||
### Motivation
|
||||
<1-2 paragraphs: problem/limitation first,
|
||||
then solution. Hard-wrap at 72 chars.>
|
||||
|
||||
### Scopes changed
|
||||
- [<hash>][<hash>] `pkg.mod.func()` — what
|
||||
changed.
|
||||
* [<hash>][<hash>] Also adjusts
|
||||
`.related_thing()` in same module.
|
||||
- [<hash>][<hash>] `tests.test_mod` — new/changed
|
||||
test coverage.
|
||||
|
||||
<!--
|
||||
### Cross-references
|
||||
Also submitted as
|
||||
[github-pr][] | [gitea-pr][] | [srht-patch][].
|
||||
|
||||
### Links
|
||||
- [relevant-issue-or-discussion](url)
|
||||
- [design-doc-or-screenshot](url)
|
||||
-->
|
||||
|
||||
(this pr content was generated in some part by
|
||||
[`claude-code`][claude-code-gh])
|
||||
|
||||
[<hash>]: https://<service>/<owner>/<repo>/commit/<hash>
|
||||
[claude-code-gh]: https://github.com/anthropics/claude-code
|
||||
|
||||
<!-- cross-service pr refs (fill after submit):
|
||||
[github-pr]: https://github.com/<owner>/<repo>/pull/___
|
||||
[gitea-pr]: https://<host>/<owner>/<repo>/pulls/___
|
||||
[srht-patch]: https://git.sr.ht/~<owner>/<repo>/patches/___
|
||||
-->
|
||||
```
|
||||
|
||||
## Markdown Reference-Link Strategy
|
||||
|
||||
Use reference-style links for ALL commit hashes
|
||||
and cross-service PR refs to ensure cross-service
|
||||
compatibility:
|
||||
|
||||
**Inline usage** (in bullets):
|
||||
```markdown
|
||||
- [f3726cf9][f3726cf9] Add `reg_err_types()`
|
||||
for custom exc lookup.
|
||||
```
|
||||
|
||||
**Definition** (bottom of document):
|
||||
```markdown
|
||||
[f3726cf9]: https://github.com/goodboy/tractor/commit/f3726cf9
|
||||
```
|
||||
|
||||
### Why reference-style?
|
||||
- Keeps prose readable without long inline URLs.
|
||||
- All URLs in one place — trivially swappable
|
||||
per-service.
|
||||
- Most git services auto-link bare SHAs anyway,
|
||||
but explicit refs guarantee it works in *any*
|
||||
md renderer.
|
||||
- The `[hash][hash]` form is self-documenting —
|
||||
display text matches the ref ID.
|
||||
- Cross-service PR refs use the same mechanism:
|
||||
`[github-pr][]` resolves via a ref-link def
|
||||
at the bottom, trivially fillable post-submit.
|
||||
|
||||
## Cross-Service PR Placeholder Mechanism
|
||||
|
||||
The generated description includes three layers
|
||||
of cross-service support, all using native md
|
||||
reference-links:
|
||||
|
||||
### 1. Metadata comment (top of file)
|
||||
|
||||
```markdown
<!-- pr-msg-meta
branch: remote_exc_type_registry
base: main
submitted:
  github: ___
  gitea: ___
  srht: ___
-->
```
|
||||
|
||||
A YAML-ish HTML comment block. The `___`
|
||||
placeholders get filled with PR/patch numbers
|
||||
after submission. Machine-parseable for tooling
|
||||
(e.g. `gish`) but invisible in rendered md.
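
As a rough illustration of "machine-parseable", the block
can be read with a few lines of stdlib Python. This is a
sketch, not how `gish` does it: `parse_pr_meta()` is a
hypothetical helper, and it assumes the per-service keys are
indented under `submitted:`.

```python
import re

_META_RE = re.compile(
    r'<!--\s*pr-msg-meta\n(?P<body>.*?)-->',
    flags=re.DOTALL,
)


def parse_pr_meta(md: str) -> dict:
    '''Parse the `pr-msg-meta` comment into a (nested) dict.'''
    match = _META_RE.search(md)
    if match is None:
        return {}

    meta: dict = {}
    section = None  # currently open nested mapping, if any
    for raw in match['body'].splitlines():
        if not raw.strip():
            continue
        key, _, value = raw.strip().partition(':')
        value = value.strip()
        indented: bool = raw[0] in ' \t'
        if not indented:
            section = None  # a top-level key closes the section
        target: dict = (
            section if (indented and section is not None) else meta
        )
        if value:
            # unfilled `___` placeholders are mapped to `None`
            target[key] = None if value == '___' else value
        else:
            # a bare `key:` opens a nested section
            section = target[key] = {}
    return meta
```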
|
||||
|
||||
### 2. Cross-references section (in body)
|
||||
|
||||
```markdown
|
||||
<!--
|
||||
### Cross-references
|
||||
Also submitted as
|
||||
[github-pr][] | [gitea-pr][] | [srht-patch][].
|
||||
-->
|
||||
```
|
||||
|
||||
Commented out at generation time. After submitting
|
||||
to multiple services, uncomment and the ref-links
|
||||
resolve via the stubs at the bottom.
|
||||
|
||||
### 3. Ref-link stubs (bottom of file)
|
||||
|
||||
```markdown
|
||||
<!-- cross-service pr refs (fill after submit):
|
||||
[github-pr]: https://github.com/goodboy/tractor/pull/___
|
||||
[gitea-pr]: https://pikers.dev/goodboy/tractor/pulls/___
|
||||
[srht-patch]: https://git.sr.ht/~goodboy/tractor/patches/___
|
||||
-->
|
||||
```
|
||||
|
||||
Commented out with `___` number placeholders.
|
||||
After submission: uncomment, replace `___` with
|
||||
the actual number. Each service-specific copy
|
||||
fills in all services' numbers so any copy can
|
||||
cross-reference the others.
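
The post-submit fill-in could be scripted roughly like this. It's a sketch under assumptions: `fill_ref_stubs()` is a made-up helper name, and it expects the stub block to start with the exact `<!-- cross-service pr refs` opener shown above:

```python
import re


def fill_ref_stubs(md_text: str, numbers: dict[str, int]) -> str:
    """Uncomment the ref-link stub block and fill in known numbers.

    `numbers` maps a ref id (e.g. 'github-pr') to its PR/patch
    number; refs without an entry keep their `___` placeholder.
    """
    out = []
    in_stubs = False
    for line in md_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("<!-- cross-service pr refs"):
            in_stubs = True  # drop the comment opener
            continue
        if in_stubs and stripped == "-->":
            in_stubs = False  # drop the matching closer
            continue
        if in_stubs:
            m = re.match(r"\[([\w-]+)\]:\s+\S+", stripped)
            if m and m.group(1) in numbers:
                line = line.replace("___", str(numbers[m.group(1)]))
        out.append(line)
    return "\n".join(out)
```

Only the stub block is touched; other HTML comments (like the metadata block) pass through unchanged.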
|
||||
|
||||
### Post-submission file layout
|
||||
|
||||
```
|
||||
pr_msg_LATEST.md # latest draft (skill root)
|
||||
msgs/
|
||||
20260325T002027Z_mybranch_pr_msg.md # timestamped
|
||||
github/
|
||||
42_pr_msg.md # github PR #42
|
||||
gitea/
|
||||
17_pr_msg.md # gitea PR #17
|
||||
srht/
|
||||
5_pr_msg.md # srht patch #5
|
||||
```
|
||||
|
||||
Each `<service>/<num>_pr_msg.md` is a copy with:
|
||||
- metadata `submitted:` fields filled in
|
||||
- cross-references section uncommented
|
||||
- ref-link stubs uncommented with real numbers
|
||||
- all services cross-linked in each copy
|
||||
|
||||
This mirrors the `gish` skill's
|
||||
`<backend>/<num>.md` pattern.
|
||||
|
||||
## Commit-Link URL Patterns by Service
|
||||
|
||||
| Service | Pattern |
|
||||
|-----------|-------------------------------------|
|
||||
| GitHub | `https://github.com/<o>/<r>/commit/<h>` |
|
||||
| Gitea | `https://<host>/<o>/<r>/commit/<h>` |
|
||||
| SourceHut | `https://git.sr.ht/~<o>/<r>/commit/<h>` |
|
||||
| GitLab | `https://gitlab.com/<o>/<r>/-/commit/<h>` |
|
||||
|
||||
## PR/Patch URL Patterns by Service
|
||||
|
||||
| Service | Pattern |
|
||||
|-----------|-------------------------------------|
|
||||
| GitHub | `https://github.com/<o>/<r>/pull/<n>` |
|
||||
| Gitea | `https://<host>/<o>/<r>/pulls/<n>` |
|
||||
| SourceHut | `https://git.sr.ht/~<o>/<r>/patches/<n>` |
|
||||
| GitLab | `https://gitlab.com/<o>/<r>/-/merge_requests/<n>` |
|
||||
|
||||
## Scope Naming Convention
|
||||
|
||||
Use Python namespace-resolution syntax for
|
||||
referencing changed code scopes:
|
||||
|
||||
| File path | Scope reference |
|
||||
|---------------------------|-------------------------------|
|
||||
| `tractor/_exceptions.py` | `tractor._exceptions` |
|
||||
| `tractor/_state.py` | `tractor._state` |
|
||||
| `tests/test_foo.py` | `tests.test_foo` |
|
||||
| Function in module | `tractor._exceptions.func()` |
|
||||
| Method on class | `.RemoteActorError.src_type` |
|
||||
| Class | `tractor._exceptions.RAE` |
|
||||
|
||||
Prefix with the package path for top-level refs;
|
||||
use leading-dot shorthand (`.ClassName.method()`)
|
||||
for sub-bullets where the parent module is already
|
||||
established.
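
The path-to-scope mapping is mechanical enough to sketch in a few lines. The helper names (`module_scope()`, `qualified()`, `shorthand()`) are hypothetical and exist only for illustration:

```python
from pathlib import PurePosixPath


def module_scope(file_path: str) -> str:
    """'tractor/_exceptions.py' -> 'tractor._exceptions'"""
    return ".".join(PurePosixPath(file_path).with_suffix("").parts)


def qualified(file_path: str, *names: str, call: bool = False) -> str:
    """Fully-qualified ref for a top-level bullet."""
    ref = ".".join((module_scope(file_path), *names))
    return f"{ref}()" if call else ref


def shorthand(*names: str, call: bool = False) -> str:
    """Leading-dot ref for sub-bullets under an established module."""
    ref = "." + ".".join(names)
    return f"{ref}()" if call else ref
```

For example, `qualified("tractor/_exceptions.py", "func", call=True)` gives the "Function in module" form from the table, while `shorthand("RemoteActorError", "src_type")` gives the "Method on class" form.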
|
||||
|
||||
## Title Conventions
|
||||
|
||||
Same verb vocabulary as commit messages:
|
||||
- `Add` — wholly new feature/API
|
||||
- `Fix` — bug fix
|
||||
- `Drop` — removal
|
||||
- `Use` — adopt new approach
|
||||
- `Move`/`Mv` — relocate code
|
||||
- `Adjust` — minor tweak
|
||||
- `Update` — enhance existing feature
|
||||
- `Support` — add support for something
|
||||
|
||||
Target 50 chars, hard max 70. Always backtick
|
||||
code elements.
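
A rough lint for the verb and length rules, as a sketch only: `check_title()` is a made-up helper, and the backtick rule isn't checked since "code element" detection would need real heuristics:

```python
TITLE_VERBS = (
    "Add", "Fix", "Drop", "Use", "Move", "Mv",
    "Adjust", "Update", "Support",
)
TARGET_LEN = 50
MAX_LEN = 70


def check_title(title: str) -> list[str]:
    """Return a list of convention problems (empty means ok)."""
    problems = []
    words = title.split()
    verb = words[0] if words else ""
    if verb not in TITLE_VERBS:
        problems.append(f"unknown leading verb: {verb!r}")
    if len(title) > MAX_LEN:
        problems.append(
            f"{len(title)} chars exceeds hard max of {MAX_LEN}"
        )
    elif len(title) > TARGET_LEN:
        problems.append(
            f"{len(title)} chars is over the {TARGET_LEN} char target"
        )
    return problems
```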
|
||||
|
||||
## Tone
|
||||
|
||||
Casual yet technically precise — matching the
|
||||
project's commit-msg style. Terse but every bullet
|
||||
carries signal. Use project abbreviations freely
|
||||
(msg, bg, ctx, impl, mod, obvi, fn, bc, var,
|
||||
prolly, ep, etc.).
|
||||
|
||||
---
|
||||
|
||||
(this format reference was generated by
|
||||
[`claude-code`][claude-code-gh])
|
||||
[claude-code-gh]: https://github.com/anthropics/claude-code
|
||||
|
|
@ -1,625 +0,0 @@
|
|||
---
|
||||
name: run-tests
|
||||
description: >
|
||||
Run tractor test suite (or subsets). Use when the user wants
|
||||
to run tests, verify changes, or check for regressions.
|
||||
argument-hint: "[test-path-or-pattern] [--opts]"
|
||||
allowed-tools:
|
||||
- Bash(python -m pytest *)
|
||||
- Bash(python -c *)
|
||||
- Bash(python --version *)
|
||||
- Bash(UV_PROJECT_ENVIRONMENT=py* uv run python *)
|
||||
- Bash(UV_PROJECT_ENVIRONMENT=py* uv run pytest *)
|
||||
- Bash(UV_PROJECT_ENVIRONMENT=py* uv sync *)
|
||||
- Bash(UV_PROJECT_ENVIRONMENT=py* uv pip show *)
|
||||
- Bash(git rev-parse *)
|
||||
- Bash(ls *)
|
||||
- Bash(cat *)
|
||||
- Bash(jq * .pytest_cache/*)
|
||||
- Read
|
||||
- Grep
|
||||
- Glob
|
||||
- Task
|
||||
- AskUserQuestion
|
||||
---
|
||||
|
||||
Run the `tractor` test suite using `pytest`. Follow this
|
||||
process:
|
||||
|
||||
## 1. Parse user intent
|
||||
|
||||
From the user's message and any arguments, determine:
|
||||
|
||||
- **scope**: full suite, specific file(s), specific
|
||||
test(s), or a keyword pattern (`-k`).
|
||||
- **transport**: which IPC transport protocol to test
|
||||
against (default: `tcp`, also: `uds`).
|
||||
- **options**: any extra pytest flags the user wants
|
||||
(e.g. `--ll debug`, `--tpdb`, `-x`, `-v`).
|
||||
|
||||
If the user provides a bare path or pattern as argument,
|
||||
treat it as the test target. Examples:
|
||||
|
||||
- `/run-tests` → full suite
|
||||
- `/run-tests test_local.py` → single file
|
||||
- `/run-tests test_registrar -v` → file + verbose
|
||||
- `/run-tests -k cancel` → keyword filter
|
||||
- `/run-tests tests/ipc/ --tpt-proto uds` → subdir + UDS
|
||||
|
||||
## 2. Construct the pytest command
|
||||
|
||||
Base command:
|
||||
```
|
||||
python -m pytest
|
||||
```
|
||||
|
||||
### Default flags (always include unless user overrides):
|
||||
- `-x` (stop on first failure)
|
||||
- `--tb=short` (concise tracebacks)
|
||||
- `--no-header` (reduce noise)
|
||||
|
||||
### Path resolution:
|
||||
- If the user gives a bare filename like `test_local.py`,
|
||||
resolve it under `tests/`.
|
||||
- If the user gives a subdirectory like `ipc/`, resolve
|
||||
under `tests/ipc/`.
|
||||
- Glob if needed: `tests/**/test_*<pattern>*.py`
|
||||
|
||||
### Key pytest options for this project:
|
||||
|
||||
| Flag | Purpose |
|
||||
|---|---|
|
||||
| `--ll <level>` | Set tractor log level (e.g. `debug`, `info`, `runtime`) |
|
||||
| `--tpdb` / `--debug-mode` | Enable tractor's multi-proc debugger |
|
||||
| `--tpt-proto <key>` | IPC transport: `tcp` (default) or `uds` |
|
||||
| `--spawn-backend <be>` | Spawn method: `trio` (default), `mp_spawn`, `mp_forkserver` |
|
||||
| `-k <expr>` | pytest keyword filter |
|
||||
| `-v` / `-vv` | Verbosity |
|
||||
| `-s` | No output capture (useful with `--tpdb`) |
|
||||
|
||||
### Common combos:
|
||||
```sh
|
||||
# quick smoke test of core modules
|
||||
python -m pytest tests/test_local.py tests/test_rpc.py -x --tb=short --no-header
|
||||
|
||||
# full suite, stop on first failure
|
||||
python -m pytest tests/ -x --tb=short --no-header
|
||||
|
||||
# specific test with debug
|
||||
python -m pytest tests/discovery/test_registrar.py::test_reg_then_unreg -x -s --tpdb --ll debug
|
||||
|
||||
# run with UDS transport
|
||||
python -m pytest tests/ -x --tb=short --no-header --tpt-proto uds
|
||||
|
||||
# keyword filter
|
||||
python -m pytest tests/ -x --tb=short --no-header -k "cancel and not slow"
|
||||
```
|
||||
|
||||
## 3. Pre-flight: venv detection (MANDATORY)
|
||||
|
||||
**Always verify a `uv` venv is active before running
|
||||
`python` or `pytest`.** This project uses
|
||||
`UV_PROJECT_ENVIRONMENT=py<MINOR>` naming (e.g.
|
||||
`py313`) — never `.venv`.
|
||||
|
||||
### Step 1: detect active venv
|
||||
|
||||
Run this check first:
|
||||
|
||||
```sh
|
||||
python -c "
|
||||
import sys, os
|
||||
venv = os.environ.get('VIRTUAL_ENV', '')
|
||||
prefix = sys.prefix
|
||||
print(f'VIRTUAL_ENV={venv}')
|
||||
print(f'sys.prefix={prefix}')
|
||||
print(f'executable={sys.executable}')
|
||||
"
|
||||
```
|
||||
|
||||
### Step 2: interpret results
|
||||
|
||||
**Case A — venv is active** (`VIRTUAL_ENV` is set
|
||||
and points to a `py<MINOR>/` dir under the project
|
||||
root or worktree):
|
||||
|
||||
Use bare `python` / `python -m pytest` for all
|
||||
commands. This is the normal, fast path.
|
||||
|
||||
**Case B — no venv active** (`VIRTUAL_ENV` is empty
|
||||
or `sys.prefix` points to a system Python):
|
||||
|
||||
Use `AskUserQuestion` to ask the user:
|
||||
|
||||
> "No uv venv is active. Should I activate one
|
||||
> via `UV_PROJECT_ENVIRONMENT=py<MINOR> uv sync`,
|
||||
> or would you prefer to activate your shell venv
|
||||
> first?"
|
||||
|
||||
Options:
|
||||
1. **"Create/sync venv"** — run
|
||||
`UV_PROJECT_ENVIRONMENT=py<MINOR> uv sync` where
|
||||
`<MINOR>` is detected from `python --version`
|
||||
(e.g. `313` for 3.13). Then use
|
||||
`py<MINOR>/bin/python` for all subsequent
|
||||
commands in this session.
|
||||
2. **"I'll activate it myself"** — stop and let the
|
||||
user `source py<MINOR>/bin/activate` or similar.
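
The Case A / Case B split can be sketched as a pure function. Assumptions: `venv_name()` and `classify_env()` are illustrative names, and a venv that is active but lives outside the project root is treated as Case B here, which the text above doesn't spell out:

```python
import re
import sys
from pathlib import Path


def venv_name(version_info=sys.version_info) -> str:
    """(3, 13, ...) -> 'py313', matching the `py<MINOR>` naming."""
    return f"py{version_info[0]}{version_info[1]}"


def classify_env(virtual_env: str, project_root: str) -> str:
    """Return 'A' for an active project venv, else 'B'."""
    if not virtual_env:
        return "B"  # nothing activated
    venv = Path(virtual_env)
    under_root = Path(project_root) in venv.parents
    looks_right = re.fullmatch(r"py\d+", venv.name) is not None
    return "A" if (under_root and looks_right) else "B"
```

In practice you'd call it as `classify_env(os.environ.get("VIRTUAL_ENV", ""), repo_root)`, with `repo_root` being the project root or worktree dir.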
|
||||
|
||||
**Case C — inside a git worktree** (`git rev-parse
|
||||
--git-common-dir` differs from `--git-dir`):
|
||||
|
||||
Verify Python resolves from the **worktree's own
|
||||
venv**, not the main repo's:
|
||||
|
||||
```sh
|
||||
python -c "import tractor; print(tractor.__file__)"
|
||||
```
|
||||
|
||||
If the path points outside the worktree, create a
|
||||
worktree-local venv:
|
||||
|
||||
```sh
|
||||
UV_PROJECT_ENVIRONMENT=py<MINOR> uv sync
|
||||
```
|
||||
|
||||
Then use `py<MINOR>/bin/python` for all commands.
|
||||
|
||||
**Why this matters**: without the correct venv,
|
||||
subprocesses spawned by tractor resolve modules
|
||||
from the wrong editable install, causing spurious
|
||||
`AttributeError` / `ModuleNotFoundError`.
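
A sketch of the worktree check in Python. The helper names are made up; the only real commands used are the two `git rev-parse` flags mentioned above:

```python
import subprocess
from pathlib import Path


def git_path(flag: str, cwd: str = ".") -> Path:
    """Resolve the dir printed by `git rev-parse <flag>`."""
    out = subprocess.run(
        ["git", "rev-parse", flag],
        cwd=cwd,
        check=True,
        capture_output=True,
        text=True,
    ).stdout.strip()
    # git may print a path relative to `cwd`
    return (Path(cwd) / out).resolve()


def in_linked_worktree(cwd: str = ".") -> bool:
    """True when `--git-dir` and `--git-common-dir` differ."""
    return git_path("--git-dir", cwd) != git_path("--git-common-dir", cwd)


def resolves_inside(module_file: str, worktree_root: str) -> bool:
    """Does an imported module's file live under the worktree?"""
    root = Path(worktree_root).resolve()
    return root in Path(module_file).resolve().parents
```

Feeding `tractor.__file__` into `resolves_inside()` gives the same answer as eyeballing the printed path: `False` means the import came from some other checkout's editable install.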
|
||||
|
||||
### Fallback: `uv run`
|
||||
|
||||
If the user can't or won't activate a venv, all
|
||||
`python` and `pytest` commands can be prefixed
|
||||
with `UV_PROJECT_ENVIRONMENT=py<MINOR> uv run`:
|
||||
|
||||
```sh
|
||||
# instead of: python -m pytest tests/ -x
|
||||
UV_PROJECT_ENVIRONMENT=py313 uv run pytest tests/ -x
|
||||
|
||||
# instead of: python -c 'import tractor'
|
||||
UV_PROJECT_ENVIRONMENT=py313 uv run python -c 'import tractor'
|
||||
```
|
||||
|
||||
`uv run` auto-discovers the project and venv,
|
||||
but is slower than a pre-activated venv due to
|
||||
lock-file resolution on each invocation. Prefer
|
||||
activating the venv when possible.
|
||||
|
||||
### Step 3: import + collection checks
|
||||
|
||||
After venv is confirmed, always run these
|
||||
(especially after refactors or module moves):
|
||||
|
||||
```sh
|
||||
# 1. package import smoke check
|
||||
python -c 'import tractor; print(tractor)'
|
||||
|
||||
# 2. verify all tests collect (no import errors)
|
||||
python -m pytest tests/ -x -q --co 2>&1 | tail -5
|
||||
```
|
||||
|
||||
If either fails, fix the import error before running
|
||||
any actual tests.
|
||||
|
||||
### Step 4: zombie-actor / stale-registry check (MANDATORY)
|
||||
|
||||
The tractor runtime's default registry address is
|
||||
**`127.0.0.1:1616`** (TCP) / `/tmp/registry@1616.sock`
|
||||
(UDS). Whenever any prior test run — especially one
|
||||
using a fork-based backend like `subint_forkserver` —
|
||||
leaks a child actor process, that zombie keeps the
|
||||
registry port bound and **every subsequent test
|
||||
session fails to bind**, often presenting as 50+
|
||||
unrelated failures ("all tests broken"!) across
|
||||
backends.
|
||||
|
||||
**This has to be checked before the first run AND
|
||||
after any cancelled/SIGINT'd run** — signal failures
|
||||
in the middle of a test can leave orphan children.
|
||||
|
||||
```sh
|
||||
# 1. TCP registry — any listener on :1616? (primary signal)
|
||||
ss -tlnp 2>/dev/null | grep ':1616' || echo 'TCP :1616 free'
|
||||
|
||||
# 2. leftover actor/forkserver procs — scoped to THIS
|
||||
# repo's python path, so we don't false-flag legit
|
||||
# long-running tractor-using apps (e.g. `piker`,
|
||||
# downstream projects that embed tractor).
|
||||
pgrep -af "$(pwd)/py[0-9]*/bin/python.*_actor_child_main|subint-forkserv" \
|
||||
| grep -v 'grep\|pgrep' \
|
||||
|| echo 'no leaked actor procs from this repo'
|
||||
|
||||
# 3. stale UDS registry sockets
|
||||
ls -la /tmp/registry@*.sock 2>/dev/null \
|
||||
|| echo 'no leaked UDS registry sockets'
|
||||
```
|
||||
|
||||
**Interpretation:**
|
||||
|
||||
- **TCP :1616 free AND no stale sockets** → clean,
|
||||
proceed. The actor-procs probe is secondary — false
|
||||
positives are common (piker, any other tractor-
|
||||
embedding app); only cleanup if `:1616` is bound or
|
||||
sockets linger.
|
||||
- **TCP :1616 bound OR stale sockets present** →
|
||||
surface PIDs + cmdlines to the user, offer cleanup:
|
||||
|
||||
```sh
|
||||
# 1. GRACEFUL FIRST (tractor is structured concurrent — it
|
||||
# catches SIGINT as an OS-cancel in `_trio_main` and
|
||||
# cascades Portal.cancel_actor via IPC to every descendant.
|
||||
# So always try SIGINT first with a bounded timeout; only
|
||||
# escalate to SIGKILL if graceful cleanup doesn't complete).
|
||||
pkill -INT -f "$(pwd)/py[0-9]*/bin/python.*_actor_child_main|subint-forkserv"
|
||||
|
||||
# 2. bounded wait for graceful teardown (usually sub-second).
|
||||
# Loop until the processes exit, or timeout. Keep the
|
||||
# bound tight — hung/abrupt-killed descendants usually
|
||||
# hang forever, so don't wait more than a few seconds.
|
||||
for i in $(seq 1 10); do
|
||||
pgrep -f "$(pwd)/py[0-9]*/bin/python.*_actor_child_main|subint-forkserv" >/dev/null || break
|
||||
sleep 0.3
|
||||
done
|
||||
|
||||
# 3. ESCALATE TO SIGKILL only if graceful didn't finish.
|
||||
if pgrep -f "$(pwd)/py[0-9]*/bin/python.*_actor_child_main|subint-forkserv" >/dev/null; then
|
||||
echo 'graceful teardown timed out — escalating to SIGKILL'
|
||||
pkill -9 -f "$(pwd)/py[0-9]*/bin/python.*_actor_child_main|subint-forkserv"
|
||||
fi
|
||||
|
||||
# 4. if a test zombie holds :1616 specifically and doesn't
|
||||
# match the above pattern, find its PID the hard way:
|
||||
ss -tlnp 2>/dev/null | grep ':1616' # prints `users:(("<name>",pid=NNNN,...))`
|
||||
# then (same SIGINT-first ladder):
|
||||
# kill -INT <NNNN>; sleep 1; kill -9 <NNNN> 2>/dev/null
|
||||
|
||||
# 5. remove stale UDS sockets
|
||||
rm -f /tmp/registry@*.sock
|
||||
|
||||
# 6. re-verify
|
||||
ss -tlnp 2>/dev/null | grep ':1616' || echo 'TCP :1616 now free'
|
||||
```
|
||||
|
||||
**Never ignore stale registry state.** If you see the
|
||||
"all tests failing" pattern — especially
|
||||
`trio.TooSlowError` / connection refused / address in
|
||||
use on many unrelated tests — check registry **before**
|
||||
spelunking into test code. The failure signature will
|
||||
be identical across backends because they're all
|
||||
fighting for the same port.
|
||||
|
||||
**False-positive warning for step 2:** a plain
|
||||
`pgrep -af '_actor_child_main'` will also match
|
||||
legit long-running tractor-embedding apps (e.g.
|
||||
`piker` at `~/repos/piker/py*/bin/python3 -m
|
||||
tractor._child ...`). Always scope to the current
|
||||
repo's python path, or only use step 1 (`:1616`) as
|
||||
the authoritative signal.
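
If `ss` isn't available, a plain connect probe answers the same "is `:1616` bound?" question. This is a sketch, not part of the skill: `registry_port_bound()` is a hypothetical helper, and unlike `ss -tlnp` it can't tell you which PID holds the port:

```python
import socket


def registry_port_bound(
    host: str = "127.0.0.1",
    port: int = 1616,  # default registry port quoted above
    timeout: float = 0.5,
) -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex() returns 0 on success, an errno otherwise
        return sock.connect_ex((host, port)) == 0
```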
|
||||
|
||||
## 4. Run and report
|
||||
|
||||
- Run the constructed command.
|
||||
- Use a timeout of **600000ms** (10min) for full suite
|
||||
runs, **120000ms** (2min) for single-file runs.
|
||||
- If the suite is large (full `tests/`), consider running
|
||||
in the background and checking output when done.
|
||||
- Use `--lf` (last-failed) to re-run only previously
|
||||
failing tests when iterating on a fix.
|
||||
|
||||
### On failure:
|
||||
- Show the failing test name(s) and short traceback.
|
||||
- If the failure looks related to recent changes, point
|
||||
out the likely cause and suggest a fix.
|
||||
- **Check the known-flaky list** (section 8) before
|
||||
investigating — don't waste time on pre-existing
|
||||
timeout issues.
|
||||
- **NEVER auto-commit fixes.** If you apply a code fix
|
||||
during test iteration, leave it unstaged. Tell the
|
||||
user what changed and suggest they review the
|
||||
worktree state, stage files manually, and use
|
||||
`/commit-msg` (inline or in a separate session) to
|
||||
generate the commit message. The human drives all
|
||||
`git add` and `git commit` operations.
|
||||
|
||||
### On success:
|
||||
- Report the pass/fail/skip counts concisely.
|
||||
|
||||
## 5. Test directory layout (reference)
|
||||
|
||||
```
|
||||
tests/
|
||||
├── conftest.py # root fixtures, daemon, signals
|
||||
├── devx/ # debugger/tooling tests
|
||||
├── ipc/ # transport protocol tests
|
||||
├── msg/ # messaging layer tests
|
||||
├── discovery/ # discovery subsystem tests
|
||||
│ ├── test_multiaddr.py # multiaddr construction
|
||||
│ └── test_registrar.py # registry/discovery protocol
|
||||
├── test_local.py # registrar + local actor basics
|
||||
├── test_rpc.py # RPC error handling
|
||||
├── test_spawning.py # subprocess spawning
|
||||
├── test_multi_program.py # multi-process tree tests
|
||||
├── test_cancellation.py # cancellation semantics
|
||||
├── test_context_stream_semantics.py # ctx streaming
|
||||
├── test_inter_peer_cancellation.py # peer cancel
|
||||
├── test_infected_asyncio.py # trio-in-asyncio
|
||||
└── ...
|
||||
```
|
||||
|
||||
## 6. Change-type → test mapping
|
||||
|
||||
After modifying specific modules, run the corresponding
|
||||
test subset first for fast feedback:
|
||||
|
||||
| Changed module(s) | Run these tests first |
|
||||
|---|---|
|
||||
| `runtime/_runtime.py`, `runtime/_state.py` | `test_local.py test_rpc.py test_spawning.py test_root_runtime.py` |
|
||||
| `discovery/` (`_registry`, `_discovery`, `_addr`) | `tests/discovery/ test_multi_program.py test_local.py` |
|
||||
| `_context.py`, `_streaming.py` | `test_context_stream_semantics.py test_advanced_streaming.py` |
|
||||
| `ipc/` (`_chan`, `_server`, `_transport`) | `tests/ipc/ test_2way.py` |
|
||||
| `runtime/_portal.py`, `runtime/_rpc.py` | `test_rpc.py test_cancellation.py` |
|
||||
| `spawn/` (`_spawn`, `_entry`) | `test_spawning.py test_multi_program.py` |
|
||||
| `devx/debug/` | `tests/devx/test_debugger.py` (slow!) |
|
||||
| `to_asyncio.py` | `test_infected_asyncio.py test_root_infect_asyncio.py` |
|
||||
| `msg/` | `tests/msg/` |
|
||||
| `_exceptions.py` | `test_remote_exc_relay.py test_inter_peer_cancellation.py` |
|
||||
| `runtime/_supervise.py` | `test_cancellation.py test_spawning.py` |
|
||||
|
||||
## 7. Quick-check shortcuts
|
||||
|
||||
### After refactors (fastest first-pass):
|
||||
```sh
|
||||
# import + collect check
|
||||
python -c 'import tractor' && python -m pytest tests/ -x -q --co 2>&1 | tail -3
|
||||
|
||||
# core subset (~10s)
|
||||
python -m pytest tests/test_local.py tests/test_rpc.py tests/test_spawning.py tests/discovery/test_registrar.py -x --tb=short --no-header
|
||||
```
|
||||
|
||||
### Inspect last failures (without re-running):
|
||||
|
||||
When the user asks "what failed?", "show failures",
|
||||
or wants to check the last-failed set before
|
||||
re-running — read the pytest cache directly. This
|
||||
is instant and avoids test collection overhead.
|
||||
|
||||
```sh
|
||||
python -c "
|
||||
import json, pathlib, sys
|
||||
p = pathlib.Path('.pytest_cache/v/cache/lastfailed')
|
||||
if not p.exists():
|
||||
print('No lastfailed cache found.'); sys.exit()
|
||||
data = json.loads(p.read_text())
|
||||
# filter to real test node IDs (ignore junk
|
||||
# entries that can accumulate from system paths)
|
||||
tests = sorted(k for k in data if k.startswith('tests/'))
|
||||
if not tests:
|
||||
print('No failures recorded.')
|
||||
else:
|
||||
print(f'{len(tests)} last-failed test(s):')
|
||||
for t in tests:
|
||||
print(f' {t}')
|
||||
"
|
||||
```
|
||||
|
||||
**Why not `--cache-show` or `--co --lf`?**
|
||||
|
||||
- `pytest --cache-show 'cache/lastfailed'` works
|
||||
but dumps raw dict repr including junk entries
|
||||
(stale system paths that leak into the cache).
|
||||
- `pytest --co --lf` actually *collects* tests which
|
||||
triggers import resolution and is slow (~0.5s+).
|
||||
Worse, when cached node IDs don't exactly match
|
||||
current parametrize IDs (e.g. param names changed
|
||||
between runs), pytest falls back to collecting
|
||||
the *entire file*, giving false positives.
|
||||
- Reading the JSON directly is instant, filterable
|
||||
to `tests/`-prefixed entries, and shows exactly
|
||||
what pytest recorded — no interpretation.
|
||||
|
||||
**After inspecting**, re-run the failures:
|
||||
```sh
|
||||
python -m pytest --lf -x --tb=short --no-header
|
||||
```
|
||||
|
||||
### Full suite in background:
|
||||
When core tests pass and you want full coverage while
|
||||
continuing other work, run in background:
|
||||
```sh
|
||||
python -m pytest tests/ -x --tb=short --no-header -q
|
||||
```
|
||||
(use `run_in_background=true` on the Bash tool)
|
||||
|
||||
## 8. Known flaky tests
|
||||
|
||||
These tests have **pre-existing** timing/environment
|
||||
sensitivity. If they fail with `TooSlowError` or
|
||||
pexpect `TIMEOUT`, they are almost certainly NOT caused
|
||||
by your changes — note them and move on.
|
||||
|
||||
| Test | Typical error | Notes |
|
||||
|---|---|---|
|
||||
| `devx/test_debugger.py::test_multi_nested_subactors_error_through_nurseries` | pexpect TIMEOUT | Debugger pexpect timing |
|
||||
| `test_cancellation.py::test_cancel_via_SIGINT_other_task` | TooSlowError | Signal handling race |
|
||||
| `test_inter_peer_cancellation.py::test_peer_spawns_and_cancels_service_subactor` | TooSlowError | Async timing (both param variants) |
|
||||
| `test_docs_examples.py::test_example[we_are_processes.py]` | `assert None == 0` | `__main__` missing `__file__` in subproc |
|
||||
|
||||
**Rule of thumb**: if a test fails with `trio.TooSlowError`
|
||||
or `pexpect.TIMEOUT` and you didn't touch the relevant
|
||||
code path, it's flaky: skip it.
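
The triage rule can be written down as a tiny classifier. This is a sketch with made-up names (`KNOWN_FLAKY`, `triage()`); the node IDs are copied from the table, and the substring matching on error text is an assumption about what the failure output contains:

```python
KNOWN_FLAKY = (
    "devx/test_debugger.py"
    "::test_multi_nested_subactors_error_through_nurseries",
    "test_cancellation.py::test_cancel_via_SIGINT_other_task",
    "test_inter_peer_cancellation.py"
    "::test_peer_spawns_and_cancels_service_subactor",
    "test_docs_examples.py::test_example[we_are_processes.py]",
)
TIMING_MARKERS = ("TooSlowError", "TIMEOUT")


def triage(
    node_id: str,
    error_text: str,
    touched_related_code: bool = False,
) -> str:
    """Classify a failure per the known-flaky list + rule of thumb."""
    if any(known in node_id for known in KNOWN_FLAKY):
        return "known-flaky"
    timing = any(marker in error_text for marker in TIMING_MARKERS)
    if timing and not touched_related_code:
        return "probably-flaky"
    return "investigate"
```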
|
||||
|
||||
## 9. The pytest-capture hang pattern (CHECK THIS FIRST)
|
||||
|
||||
**Symptom:** a tractor test hangs indefinitely under
|
||||
default `pytest` but passes instantly when you add
|
||||
`-s` (`--capture=no`).
|
||||
|
||||
**Cause:** tractor subactors (especially under fork-
|
||||
based backends) inherit pytest's stdout/stderr
|
||||
capture pipes via fds 1,2. Under high-volume error
|
||||
logging (e.g. multi-level cancel cascade, nested
|
||||
`run_in_actor` failures, anything triggering
|
||||
`RemoteActorError` + `ExceptionGroup` traceback
|
||||
spew), the **64KB Linux pipe buffer fills** faster
|
||||
than pytest drains it. Subactor writes block → can't
|
||||
finish exit → parent's `waitpid`/pidfd wait blocks →
|
||||
deadlock cascades up the tree.
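
To see the pipe-fill behaviour in isolation, the sketch below writes into a pipe nobody reads until the kernel refuses more data. `pipe_capacity()` is an illustrative helper; it uses a non-blocking write end so it reports the limit instead of hanging the way a subactor would:

```python
import os


def pipe_capacity(chunk: int = 4096) -> int:
    """Bytes an unread pipe accepts before a write would block."""
    read_fd, write_fd = os.pipe()
    # non-blocking: a full pipe raises instead of stalling us
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            try:
                written += os.write(write_fd, b"x" * chunk)
            except BlockingIOError:
                # a blocking writer (i.e. a subactor logging to
                # inherited fds 1/2) would be stuck right here
                return written
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

With a blocking fd, which is what a subactor inherits, that final `os.write()` simply never returns until the reader drains the pipe, and that is the hang.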
|
||||
|
||||
**Pre-existing guards in the tractor harness** that
|
||||
encode this same knowledge — grep these FIRST
|
||||
before spelunking:
|
||||
|
||||
- `tests/conftest.py:258-260` (in the `daemon`
|
||||
fixture): `# XXX: too much logging will lock up
|
||||
the subproc (smh)` — downgrades `trace`/`debug`
|
||||
loglevel to `info` to prevent the hang.
|
||||
- `tests/conftest.py:316`: `# can lock up on the
|
||||
_io.BufferedReader and hang..` — noted on the
|
||||
`proc.stderr.read()` post-SIGINT.
|
||||
|
||||
**Debug recipe (in priority order):**
|
||||
|
||||
1. **Try `-s` first.** If the hang disappears with
|
||||
`pytest -s`, you've confirmed it's capture-pipe
|
||||
fill. Skip spelunking.
|
||||
2. **Lower the loglevel.** Default `--ll=error` on
|
||||
this project; if you've bumped it to `debug` /
|
||||
`info`, try dropping back. Each log level
|
||||
multiplies pipe-pressure under fault cascades.
|
||||
3. **If you MUST use default capture + high log
|
||||
volume**, redirect subactor stdout/stderr in the
|
||||
child prelude (e.g.
|
||||
`tractor.spawn._subint_forkserver._child_target`
|
||||
post-`_close_inherited_fds`) to `/dev/null` or a
|
||||
file.
|
||||
|
||||
**Signature tells you it's THIS bug (vs. a real
|
||||
code hang):**
|
||||
|
||||
- Multi-actor test under fork-based backend
|
||||
(`subint_forkserver`, eventually `trio_proc` too
|
||||
under enough log volume).
|
||||
- Multiple `RemoteActorError` / `ExceptionGroup`
|
||||
tracebacks in the error path.
|
||||
- Test passes with `-s` in the 5-10s range, hangs
|
||||
past pytest-timeout (usually 30+ s) without `-s`.
|
||||
- Subactor processes visible via `pgrep -af
|
||||
subint-forkserv` or similar after the hang —
|
||||
they're alive but blocked on `write()` to an
|
||||
inherited stdout fd.
|
||||
|
||||
**Historical reference:** this deadlock cost a
|
||||
multi-session investigation (4 genuine cascade
|
||||
fixes landed along the way) that only surfaced the
|
||||
capture-pipe issue AFTER the deeper fixes let the
|
||||
tree actually tear down enough to produce pipe-
|
||||
filling log volume. Full post-mortem in
|
||||
`ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md`.
|
||||
Lesson codified here so future-me grep-finds the
|
||||
workaround before digging.
|
||||
|
||||
## 10. Reaping zombie subactors (`tractor-reap`)
|
||||
|
||||
**Symptom:** after a `pytest` run crashes, times out,
|
||||
or is `Ctrl+C`'d, subactor forks (esp. under
|
||||
`subint_forkserver`) can be reparented to `init`
|
||||
(PPid==1) and linger. They hold onto ports, inherit
|
||||
pytest's capture-pipe fds, and flakify later
|
||||
sessions.
|
||||
|
||||
**Two layers of defense:**
|
||||
|
||||
### a) Session-scoped auto-fixture (always on)
|
||||
|
||||
`tractor/_testing/pytest.py::_reap_orphaned_subactors`
|
||||
runs at pytest session teardown. It walks `/proc` for
|
||||
direct descendants of the pytest pid, SIGINTs them,
|
||||
waits up to 3s, then SIGKILLs survivors. SC-polite:
|
||||
gives the subactor runtime a chance to run its trio
|
||||
cancel shield + IPC teardown before escalation.
|
||||
|
||||
This is *autouse* and session-scoped — you don't need
|
||||
to do anything. It just runs.
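
The SIGINT-then-SIGKILL ladder looks roughly like the sketch below. To be clear about assumptions: this is not the fixture's actual code. It takes `subprocess.Popen` handles instead of walking `/proc`, and `reap()` is a made-up name:

```python
import signal
import subprocess
import time


def reap(
    procs: list[subprocess.Popen],
    grace: float = 3.0,
) -> list[subprocess.Popen]:
    """SIGINT everything, wait up to `grace` secs, SIGKILL the rest.

    Returns the procs that needed the hard kill.
    """
    # 1. polite request: lets the runtime cancel + tear down IPC
    for proc in procs:
        if proc.poll() is None:
            proc.send_signal(signal.SIGINT)

    # 2. bounded wait for graceful exit
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if all(proc.poll() is not None for proc in procs):
            break
        time.sleep(0.05)

    # 3. escalate only for whatever is still running
    survivors = [proc for proc in procs if proc.poll() is None]
    for proc in survivors:
        proc.kill()
        proc.wait()
    return survivors
```

An empty return value means every child honored the graceful cancel; a non-empty one is worth logging since it hints at a teardown bug.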
|
||||
|
||||
### b) `scripts/tractor-reap` CLI (manual reap)
|
||||
|
||||
For the **pytest-died-mid-session** case (Ctrl+C, OOM
|
||||
kill, hung process you had to `kill -9`), the fixture
|
||||
never ran. Reach for the CLI:
|
||||
|
||||
```sh
|
||||
# default: orphans (PPid==1, cwd==repo, cmd contains python)
|
||||
scripts/tractor-reap
|
||||
|
||||
# descendant-mode: from a still-live supervisor
|
||||
scripts/tractor-reap --parent <pytest-pid>
|
||||
|
||||
# see what would be reaped, don't signal
|
||||
scripts/tractor-reap -n
|
||||
|
||||
# tune the SIGINT → SIGKILL grace window
|
||||
scripts/tractor-reap --grace 5
|
||||
```
|
||||
|
||||
Exit code: `0` if everyone exited on SIGINT, `1` if
|
||||
SIGKILL had to escalate — so you can chain it in CI
|
||||
health-checks (`scripts/tractor-reap || <alert>`).
|
||||
|
||||
**What it matches** (orphan-mode):
|
||||
- `PPid == 1` (reparented to init → definitely
|
||||
orphaned, not just a currently-running child)
|
||||
- `cwd == <repo-root>` (keeps the sweep scoped; won't
|
||||
touch unrelated init-children elsewhere)
|
||||
- `python` in cmdline
|
||||
|
||||
**What it does not do:** kill anything whose PPid is
|
||||
still a live tractor parent. If the parent is alive
|
||||
it's not an orphan; use `--parent <pid>` if you need
|
||||
to force-reap under a still-live supervisor.
|
||||
|
||||
**When NOT to run it:** while a pytest session is
|
||||
active in another terminal. It's safe (won't touch
|
||||
that session's live children in orphan-mode) but can
|
||||
race if the target session is mid-teardown.
|
||||
|
||||
### c) `--shm` / `--shm-only`: orphan-segment sweep
|
||||
|
||||
Because `tractor.ipc._mp_bs.disable_mantracker()`
|
||||
turns off `mp.resource_tracker` (see
|
||||
`ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md`),
|
||||
a hard-crashing actor can leave `/dev/shm/<key>`
|
||||
segments behind that nothing else GCs.
|
||||
|
||||
```sh
|
||||
# process reap THEN shm sweep
|
||||
scripts/tractor-reap --shm
|
||||
|
||||
# shm sweep only (skip process phase)
|
||||
scripts/tractor-reap --shm-only
|
||||
|
||||
# dry-run: list candidates, don't unlink
|
||||
scripts/tractor-reap --shm -n
|
||||
```
|
||||
|
||||
**Match criteria** (very conservative — this is a
|
||||
shared-system path, can't be wrong):
|
||||
- segment is a regular file under `/dev/shm`,
|
||||
- owned by the **current uid** (`stat.st_uid`),
|
||||
- AND **no live process holds it open** —
|
||||
enumerated by walking every readable
|
||||
`/proc/<pid>/maps` (post-mmap mappings) AND
|
||||
`/proc/<pid>/fd/*` (pre-mmap shm-opened fds).
|
||||
|
||||
The "nobody has it open" check is the
|
||||
kernel-canonical "is this leaked?" test — same
|
||||
answer `lsof /dev/shm/<key>` would give. No
|
||||
reliance on tractor-specific naming, so it works
|
||||
for any tractor app. Critically, it WILL NOT touch
|
||||
segments held by other apps you have running
|
||||
(e.g. `piker`, `lttng-ust-*`, `aja-shm-*` —
|
||||
verified locally with 81 in-use segments correctly
|
||||
preserved).
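
The `/proc/<pid>/maps` half of the "nobody has it open" check can be sketched as a pure parser. The helper names are illustrative, and the `/proc/<pid>/fd/*` half (a `readlink` per fd) is omitted here:

```python
def shm_paths_in_maps(maps_text: str) -> set[str]:
    """Collect `/dev/shm/...` paths mapped by one process.

    Each maps line is: address perms offset dev inode [pathname]
    """
    held = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/dev/shm/"):
            held.add(fields[5])
    return held


def leaked(candidates: list[str], held: set[str]) -> list[str]:
    """Segments owned by us that no live process has mapped."""
    return sorted(set(candidates) - held)
```

A real sweep would union `shm_paths_in_maps()` over every readable pid (plus the fd scan) before calling `leaked()`, so one process holding a segment is enough to spare it.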
|
||||
|
|
@ -1,18 +1,10 @@
|
|||
name: CI
|
||||
|
||||
# NOTE distilled from,
|
||||
# https://github.com/orgs/community/discussions/26276
|
||||
on:
|
||||
# any time a new update to 'main'
|
||||
# any time someone pushes a new branch to origin
|
||||
push:
|
||||
branches:
|
||||
- main
|
||||
|
||||
# on all (forked) PRs to the repo
|
||||
# NOTE, use a draft PR if you just want CI triggered..
|
||||
pull_request:
|
||||
|
||||
# to run workflow manually from the "Actions" tab
|
||||
# Allows you to run this workflow manually from the Actions tab
|
||||
workflow_dispatch:
|
||||
|
||||
jobs:
|
||||
|
|
@ -82,44 +74,24 @@ jobs:
|
|||
# run: mypy tractor/ --ignore-missing-imports --show-traceback
|
||||
|
||||
|
||||
testing:
|
||||
name: '${{ matrix.os }} Python${{ matrix.python-version }} spawn_backend=${{ matrix.spawn_backend }} tpt_proto=${{ matrix.tpt_proto }}'
|
||||
timeout-minutes: 16
|
||||
testing-linux:
|
||||
name: '${{ matrix.os }} Python ${{ matrix.python }} - ${{ matrix.spawn_backend }}'
|
||||
timeout-minutes: 10
|
||||
runs-on: ${{ matrix.os }}
|
||||
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
os: [
|
||||
ubuntu-latest,
|
||||
macos-latest,
|
||||
]
|
||||
python-version: [
|
||||
'3.13',
|
||||
# '3.14',
|
||||
]
|
||||
os: [ubuntu-latest]
|
||||
python-version: ['3.13']
|
||||
spawn_backend: [
|
||||
'trio',
|
||||
# 'mp_spawn',
|
||||
# 'mp_forkserver',
|
||||
# ?TODO^ is it worth it to get these running again?
|
||||
#
|
||||
# - [ ] next-gen backends, on 3.13+
|
||||
# https://github.com/goodboy/tractor/issues/379
|
||||
# 'subinterpreter',
|
||||
# 'subint',
|
||||
]
|
||||
tpt_proto: [
|
||||
'tcp',
|
||||
'uds',
|
||||
]
|
||||
# https://github.com/orgs/community/discussions/26253#discussioncomment-3250989
|
||||
exclude:
|
||||
# don't do UDS run on macOS (for now)
|
||||
- os: macos-latest
|
||||
tpt_proto: 'uds'
|
||||
|
||||
steps:
|
||||
|
||||
- uses: actions/checkout@v4
|
||||
|
||||
- name: 'Install uv + py-${{ matrix.python-version }}'
|
||||
|
|
@ -146,15 +118,7 @@ jobs:
|
|||
run: uv tree
|
||||
|
||||
- name: Run tests
|
||||
run: >
|
||||
uv run
|
||||
pytest
|
||||
tests/
|
||||
-rsx
|
||||
--spawn-backend=${{ matrix.spawn_backend }}
|
||||
--tpt-proto=${{ matrix.tpt_proto }}
|
||||
--capture=fd
|
||||
# ^XXX^ can't work with --spawn-backend=main_thread_forkserver
|
||||
run: uv run pytest tests/ --spawn-backend=${{ matrix.spawn_backend }} -rsx
|
||||
|
||||
# XXX legacy NOTE XXX
|
||||
#
|
||||
|
|
|
|||
|
|
@ -102,69 +102,3 @@ venv.bak/
|
|||
|
||||
# mypy
|
||||
.mypy_cache/
|
||||
|
||||
# all files under
|
||||
.git/
|
||||
|
||||
# require very explicit staging for anything we **really**
|
||||
# want put/kept in repo.
|
||||
notes_to_self/
|
||||
snippets/
|
||||
|
||||
# ------- AI shiz -------
|
||||
# `ai.skillz` symlinks,
|
||||
# (machine-local, deploy via deploy-skill.sh)
|
||||
.claude/skills/py-codestyle
|
||||
.claude/skills/close-wkt
|
||||
.claude/skills/plan-io
|
||||
.claude/skills/prompt-io
|
||||
.claude/skills/resolve-conflicts
|
||||
.claude/skills/inter-skill-review
|
||||
|
||||
# /open-wkt specifics
|
||||
.claude/skills/open-wkt
|
||||
.claude/wkts/
|
||||
claude_wkts
|
||||
|
||||
# /code-review-changes specifics
|
||||
.claude/skills/code-review-changes
|
||||
# review-skill ephemeral ctx (per-PR, single-use)
|
||||
.claude/review_context.md
|
||||
.claude/review_regression.md
|
||||
|
||||
# /pr-msg specifics
|
||||
.claude/skills/pr-msg/*
|
||||
# repo-specific
|
||||
!.claude/skills/pr-msg/format-reference.md
|
||||
# XXX, so u can nvim-telescope this file.
|
||||
# !.claude/skills/pr-msg/pr_msg_LATEST.md
|
||||
|
||||
# /commit-msg specifics
|
||||
# - any commit-msg gen tmp files
|
||||
.claude/*_commit_*.md
|
||||
.claude/*_commit*.txt
|
||||
.claude/skills/commit-msg/*
|
||||
!.claude/skills/commit-msg/style-duie-reference.md
|
||||
|
||||
# use prompt-io instead?
|
||||
.claude/plans
|
||||
|
||||
# nix develop --profile .nixdev
|
||||
.nixdev*
|
||||
|
||||
# :Obsession .
|
||||
Session.vim
|
||||
|
||||
# `gish` local `.md`-files
|
||||
# TODO? better all around automation!
|
||||
# -[ ] it'd be handy to also commit and sync with wtv git service?
|
||||
# -[ ] everything should be put under a `.gish/` no?
|
||||
gitea/
|
||||
gh/
|
||||
|
||||
# ------ macOS ------
|
||||
# Finder metadata
|
||||
**/.DS_Store
|
||||
|
||||
# LLM conversations that should remain private
|
||||
docs/conversations/
|
||||
|
|
|
|||
|
|
@ -1,281 +0,0 @@
|
|||
# `fork()` in a multi-threaded program — execution-side vs. memory-side of the same coin
|
||||
|
||||
A reference doc for readers who've encountered one of two
|
||||
opposite-sounding framings of POSIX `fork()` semantics in a
|
||||
multi-threaded program and are confused by the other.
|
||||
|
||||
This is a sibling to
|
||||
`subint_fork_blocked_by_cpython_post_fork_issue.md` — that
|
||||
doc covers a CPython-level refusal of fork-from-subint;
|
||||
this one covers the more general POSIX layer, since
|
||||
tractor's main-thread forkserver design rests on it.
|
||||
|
||||
## TL;DR
|
||||
|
||||
POSIX `fork()` only preserves the *calling* thread as a
|
||||
runnable thread in the child — every other thread in the
|
||||
parent simply never executes another instruction in the
|
||||
child. trio's docs call this "leaked"; tractor's
|
||||
`_main_thread_forkserver.py` docstring calls it "gone".
|
||||
Both are correct: "gone" is the *execution* side (no
|
||||
scheduler entry, no instructions retired), "leaked" is the
|
||||
*memory* side (the dead threads' stacks and per-thread
|
||||
heap structures still ride into the child's address space
|
||||
as orphaned COW pages with no owner and no cleanup hook).
|
||||
Same POSIX reality, two halves of the same coin.
|
||||
|
||||
## The two framings
|
||||
|
||||
[python-trio/trio#1614][trio-1614] (the canonical "trio +
|
||||
fork" hazards thread) puts it this way:
|
||||
|
||||
> If you use `fork()` in a process with multiple threads,
|
||||
> all the other thread stacks are just leaked: there's
|
||||
> nothing else you can reasonably do with them.
|
||||
|
||||
`tractor.spawn._main_thread_forkserver`'s module docstring
|
||||
(specifically the "What survives the fork? — POSIX
|
||||
semantics" section) puts it this way:
|
||||
|
||||
> POSIX `fork()` only preserves the *calling* thread as a
|
||||
> runnable thread in the child. Every other thread in the
|
||||
> parent — trio's runner thread, any `to_thread` cache
|
||||
> threads, anything else — never executes another
|
||||
> instruction post-fork.
|
||||
|
||||
A reader bouncing between the two can be forgiven for
|
||||
asking: well, *which* is it — leaked or gone?
|
||||
|
||||
The answer is "yes". They're describing the same POSIX
|
||||
behavior from two different angles:
|
||||
|
||||
- trio is talking about the **bytes** the dead threads
|
||||
leave behind — stacks, TLS slots, per-thread arena
|
||||
metadata — and the fact that nothing in the child can
|
||||
drive them forward, free them, or even safely walk
|
||||
them. That's a memory leak in the strict sense: held
|
||||
but unreachable.
|
||||
- tractor is talking about the **execution** side
|
||||
relevant to the forkserver design: which threads
|
||||
retire instructions in the child? Exactly one — the
|
||||
one that called `fork()`. Everything else, regardless
|
||||
of the bytes left behind, is dead in a scheduler
|
||||
sense.
|
||||
|
||||
Neither framing is wrong; they're just answering
|
||||
different questions.
|
||||
|
||||
## POSIX `fork()` in a multi-threaded program — what actually happens
|
||||
|
||||
Per POSIX (and concretely on Linux glibc), the contract
|
||||
of `fork()` in a multi-threaded process is:
|
||||
|
||||
1. The kernel creates a new process whose virtual
|
||||
address space is a COW copy of the parent's. *All*
|
||||
pages map across — code, heap, every thread's stack,
|
||||
every malloc arena, every mmap region.
|
||||
2. Of the parent's N threads, exactly **one** is
|
||||
reified in the child as a runnable kernel task: the
|
||||
thread that called `fork()`. The other N-1 threads
|
||||
have *no* corresponding task in the child kernel. They
|
||||
were never scheduled, never `clone()`d for the child,
|
||||
never exist as runnable entities.
|
||||
3. Their **memory artifacts** — pthread stacks, TLS,
|
||||
`pthread_t` structures, glibc per-thread arena
|
||||
bookkeeping — are still mapped in the child's address
|
||||
space, because (1) duplicates *everything* page-wise.
|
||||
They sit there as inert COW bytes.
|
||||
4. The kernel does not clean those bytes up. There is no
|
||||
"phantom-thread cleanup" pass post-fork. The kernel
|
||||
doesn't know which mapped pages "belonged to" which
|
||||
thread — at the kernel level mappings are
|
||||
process-scoped, not thread-scoped.
|
||||
5. The surviving thread (the caller of `fork()`) cannot
|
||||
safely access those leaked bytes either. Any state
|
||||
they encoded — held mutexes, in-flight syscalls,
|
||||
half-updated invariants — is frozen at whatever
|
||||
instant the parent's fork-syscall observed it. Some
|
||||
of those mutexes may even still be locked from the
|
||||
child's POV (the canonical "fork-in-multithreaded-
|
||||
program-deadlocks" hazard; see `man pthread_atfork`).
|
||||
|
||||
So: from the kernel's PoV, the child has one thread.
|
||||
From the address-space's PoV, the child has all the
|
||||
parent's bytes — including the corpses of the N-1 dead
|
||||
threads' stacks. Both true simultaneously.
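
The execution-side half of this can be observed from
Python itself. The sketch below is illustrative only (the
helper name `thread_counts_across_fork` is made up for
this doc and is not part of tractor or trio): it parks a
background thread, forks from the calling thread, and
reports how many threads `threading` sees on each side.
It only shows scheduler-level liveness; the leaked stack
pages are not observable this way.

```python
import os
import threading
import warnings


def thread_counts_across_fork() -> tuple[int, int]:
    '''
    Return `(parent_count, child_count)` of live threads as
    seen by `threading.active_count()` around an `os.fork()`.

    '''
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()

    rfd, wfd = os.pipe()
    parent_count: int = threading.active_count()
    with warnings.catch_warnings():
        # py3.12+ warns when forking a multi-threaded process
        warnings.simplefilter('ignore', DeprecationWarning)
        pid: int = os.fork()

    if pid == 0:  # child: only the forking thread runs here
        os.write(wfd, str(threading.active_count()).encode())
        os._exit(0)

    os.close(wfd)
    child_count = int(os.read(rfd, 16))
    os.close(rfd)
    os.waitpid(pid, 0)  # reap the child
    stop.set()
    worker.join()
    return parent_count, child_count
```

On a POSIX system the parent count should be at least 2
(caller plus the parked worker) while the child reports
exactly 1.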
|
||||
|
||||
## Why trio says "leaked"
|
||||
|
||||
trio's framing makes sense from the parent's
|
||||
PoV, looking at *what those threads were doing*. In a
|
||||
running `trio.run()` process you typically have:
|
||||
|
||||
- The trio runner thread itself — owns the `selectors`
|
||||
epoll fd, the signal-wakeup-fd, the run-queue.
|
||||
- Threadpool worker threads (`trio.to_thread`'s cache)
|
||||
— blocked in `wait()` on the threadpool's work
|
||||
condvar.
|
||||
- Whatever other ad-hoc threads the application
|
||||
started.
|
||||
|
||||
Each of those threads owns *real work-state*: epoll
|
||||
registrations, file descriptors held in
|
||||
soon-to-be-completed reads, half-released locks, posted
|
||||
but unconsumed wakeups. After fork, that state is still
|
||||
encoded in the child's memory. None of it is invalid in
|
||||
a well-formed-bytes sense. It's just that:
|
||||
|
||||
- The thread that was driving it is gone.
|
||||
- Nothing else in the child knows the layout well
|
||||
enough to take over.
|
||||
- Even if it did, the kernel objects backing the work
|
||||
(epoll fd, signalfd) have separate post-fork
|
||||
semantics that don't compose with userland trio
|
||||
state.
|
||||
|
||||
So the bytes are *held* (they're in the child's
|
||||
address space, they count against RSS, they survive
|
||||
until something clobbers them), and they're
|
||||
*unreachable* in any meaningful sense — no thread can
|
||||
safely drive them forward. That is the textbook
|
||||
definition of a leak.
|
||||
|
||||
trio's quote is reminding the user that `fork()` from a
|
||||
multi-threaded process is a one-way memory hazard:
|
||||
whatever those threads were doing, that work-state is
|
||||
now garbage you happen to still be carrying.
|
||||
|
||||
## Why tractor says "gone"
|
||||
|
||||
tractor's `_main_thread_forkserver` framing is concerned
|
||||
with a different question: *which thread executes in the
|
||||
child, and is it safe?*
|
||||
|
||||
The forkserver design rests on POSIX's "calling thread
|
||||
is the sole survivor" guarantee. We pick that calling
|
||||
thread very deliberately: a dedicated worker that has
|
||||
provably never entered trio. So the thread that *does*
|
||||
run in the child is one whose locals, TLS, and stack
|
||||
contain nothing trio-related. Trio's runner thread —
|
||||
the one that owned the epoll fd and the run-queue — is
|
||||
*gone* from the child in the execution sense. It will
|
||||
never run another instruction. The fact that its stack
|
||||
bytes still exist in the child's address space (the
|
||||
"leaked" view) is irrelevant to the forkserver, because
|
||||
nothing in the child reads or writes those pages.
|
||||
|
||||
So when the docstring says "Every other thread … is
|
||||
gone the instant `fork()` returns in the child", it's
|
||||
being precise about the surface that matters for the
|
||||
backend: scheduler-level liveness. Nothing schedules
|
||||
those threads ever again. Whether their bytes are
|
||||
hanging around is a separate (and, for the design,
|
||||
non-load-bearing) fact.
|
||||
|
||||
## Cross-table
|
||||
|
||||
The same tabular layout the `_main_thread_forkserver`
|
||||
docstring uses, expanded with a fourth "what handles
|
||||
it" column:
|
||||
|
||||
| thread              | parent    | child (executing) | child (memory)              | what handles it                                     |
|---------------------|-----------|-------------------|-----------------------------|-----------------------------------------------------|
| forkserver worker   | continues | sole survivor     | live stack                  | runs the child's bootstrap                          |
| `trio.run()` thread | continues | not running       | leaked stack (zombie bytes) | overwritten by child's fresh `trio.run()`           |
| any other thread    | continues | not running       | leaked stack (zombie bytes) | overwritten / GC'd / clobbered by `exec()` if used  |
|
||||
|
||||
The "child (executing)" column is the *execution* side
|
||||
of the coin — what tractor cares about. The "child
|
||||
(memory)" column is the *memory* side — what trio
|
||||
cares about.
|
||||
|
||||
The "what handles it" column is the deliberate punchline
|
||||
of the design: nothing has to handle the leaked bytes
|
||||
*explicitly*. They get clobbered by ordinary forward
|
||||
progress in the child:
|
||||
|
||||
- The fresh `trio.run()` the child boots up allocates
|
||||
its own stack, scheduler, and run-queue, which over
|
||||
time overlaps and overwrites the inherited zombie
|
||||
pages.
|
||||
- Python's GC walks live objects only; the dead-thread
|
||||
Python frames aren't reachable from any
|
||||
`PyThreadState`, so they get freed at the next
|
||||
collection cycle.
|
||||
- If the child eventually `exec()`s, the entire address
|
||||
space is replaced and the leak vanishes.
|
||||
|
||||
## What this means for the forkserver design
|
||||
|
||||
The crucial point is that **the design doesn't and
|
||||
*can't* prevent the leak**. There is no userland fix
|
||||
for COW thread stacks. The kernel hands the child a
|
||||
duplicated address space; that's what `fork()` *is*. No
|
||||
amount of pre-fork hookery, `pthread_atfork()`
|
||||
gymnastics, or post-fork cleanup can un-COW the dead
|
||||
threads' pages without unmapping them, and unmapping
|
||||
arbitrary regions of a duplicated address space is
|
||||
neither portable nor safe.
|
||||
|
||||
What the design *does* ensure is the orthogonal
|
||||
property: the survivor thread is one that doesn't need
|
||||
any of that leaked state to function. Concretely:
|
||||
|
||||
- Survivor is the forkserver worker thread.
|
||||
- That worker has provably never imported, called into,
|
||||
or held any reference to `trio`. (Enforced by keeping
|
||||
the worker's lifecycle entirely in
|
||||
`_main_thread_forkserver.py` and never letting trio
|
||||
task-state cross into it.)
|
||||
- So the leaked pages — trio runner stack, threadpool
|
||||
caches, etc. — are inert relative to the survivor.
|
||||
No code path in the child references them.
|
||||
- The child then boots its own fresh `trio.run()`,
|
||||
which allocates new state in new pages. Over the
|
||||
child's lifetime the COW'd zombie pages get
|
||||
overwritten, GC'd, or (if the child eventually
|
||||
`exec()`s) discarded wholesale.
|
||||
|
||||
The "leak" is real but inert. It costs RSS until
|
||||
clobbered; it doesn't cost correctness. That's exactly
|
||||
the property the forkserver pattern is built on, and
|
||||
it's also why the design needs the "calling thread is
|
||||
trio-free" precondition to be airtight: if the survivor
|
||||
were a trio thread, it *would* try to drive the leaked
|
||||
trio state, and the leak would no longer be inert.
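
The "pick the survivor deliberately" idea can be sketched
with the stdlib alone. This is a hypothetical toy (the
names `fork_from_worker` and `_worker` are invented here;
it is not the code in `_main_thread_forkserver.py`): a
dedicated thread performs the `os.fork()`, so in the child
that thread is the only one that keeps running, whatever
other threads the parent had.

```python
import os
import queue
import threading
import warnings


def _worker(requests: queue.Queue, replies: queue.Queue) -> None:
    # dedicated forking thread: it only ever touches the two
    # queues and `os`, never any event-loop state.
    while requests.get() is not None:
        with warnings.catch_warnings():
            # py3.12+ warns when forking a multi-threaded process
            warnings.simplefilter('ignore', DeprecationWarning)
            pid = os.fork()
        if pid == 0:
            # child: this (forking) thread is the sole survivor
            alone = threading.active_count() == 1
            os._exit(0 if alone else 1)
        replies.put(pid)


def fork_from_worker() -> int:
    '''
    Fork once via the dedicated worker and return the child's
    exit code (0 means the child saw exactly one live thread).

    '''
    requests: queue.Queue = queue.Queue()
    replies: queue.Queue = queue.Queue()
    t = threading.Thread(target=_worker, args=(requests, replies))
    t.start()
    requests.put('fork')
    pid = replies.get()
    _, status = os.waitpid(pid, 0)
    requests.put(None)  # shut the worker down
    t.join()
    return os.waitstatus_to_exitcode(status)
```

The requesting thread (standing in for the trio side) never
calls `fork()` itself; it only hands a request to the
worker and reaps the resulting pid.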
|
||||
|
||||
## See also
|
||||
|
||||
- `tractor/spawn/_main_thread_forkserver.py` — module
|
||||
docstring's "What survives the fork? — POSIX
|
||||
semantics" section is the in-tree, code-adjacent
|
||||
prose this doc expands on. The cross-table here is a
|
||||
fourth-column expansion of the table there.
|
||||
|
||||
- [python-trio/trio#1614][trio-1614] — the trio issue
|
||||
with the "leaked" framing, and the canonical thread
|
||||
for trio + `fork()` hazards more broadly.
|
||||
|
||||
- [`subint_fork_blocked_by_cpython_post_fork_issue.md`](./subint_fork_blocked_by_cpython_post_fork_issue.md)
|
||||
— sibling analysis covering CPython's *post-fork*
|
||||
hooks (`PyOS_AfterFork_Child`,
|
||||
`_PyInterpreterState_DeleteExceptMain`) and why
|
||||
fork-from-non-main-subint is a CPython-level hard
|
||||
refusal. Complementary axis: this doc is about POSIX
|
||||
semantics; that doc is about the CPython runtime
|
||||
layer that runs *after* POSIX `fork()` returns in
|
||||
the child.
|
||||
|
||||
- `man pthread_atfork(3)` — canonical "fork in a
|
||||
multithreaded process is dangerous" reference.
|
||||
Especially the rationale section, which is the
|
||||
closest thing to a normative statement of "the
|
||||
surviving thread cannot safely use anything the dead
|
||||
threads were touching."
|
||||
|
||||
- `man fork(2)` (Linux) — "Other than [the calling
|
||||
thread], … no other threads are replicated …"
|
||||
paragraph is the kernel-side statement of the
|
||||
execution-side framing this doc opens with.
|
||||
|
||||
[trio-1614]: https://github.com/python-trio/trio/issues/1614
|
||||
|
|
@ -1,142 +0,0 @@
|
|||
# Spawn-time boot-death (`rc=2`) under rapid same-name spawn against a registrar
|
||||
|
||||
## Symptom
|
||||
|
||||
Spawning N (≥4) sub-actors with the **same name** in tight
|
||||
succession against a daemon registrar surfaces as
|
||||
`ActorFailure: Sub-actor (...) died during boot (rc=2)
|
||||
before completing parent-handshake`.
|
||||
|
||||
```
|
||||
tests/discovery/test_multi_program.py
|
||||
::test_dup_name_cancel_cascade_escalates_to_hard_kill[n_dups=4]
|
||||
```
|
||||
|
||||
```
|
||||
tractor._exceptions.ActorFailure:
|
||||
Sub-actor ('doggy', '<uuid>') died during boot (rc=2)
|
||||
before completing parent-handshake.
|
||||
proc: <_ForkedProc pid=<n> returncode=None>
|
||||
```
|
||||
|
||||
The `proc` repr shows `returncode=None` because the repr is
captured before `proc.wait()` returns; the actual exit
status (`os.WEXITSTATUS(status) == 2`) is reported via
`result['died']` in the race-helper.
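
For reference, the raw status returned by `os.waitpid()`
is not the exit code itself; it has to be decoded. A small
stdlib-only sketch (the helper name `exit_code_of_child` is
illustrative, not something from `tractor.spawn`):

```python
import os


def exit_code_of_child(code: int) -> tuple[int, int]:
    '''
    Fork a child that exits with `code`, reap it, and decode
    the wait-status two equivalent ways.

    '''
    pid = os.fork()
    if pid == 0:
        os._exit(code)  # child: exit immediately

    _, status = os.waitpid(pid, 0)  # reap, so no zombie is left
    assert os.WIFEXITED(status)  # exited normally, not by signal
    return (
        os.WEXITSTATUS(status),
        os.waitstatus_to_exitcode(status),  # py3.9+
    )
```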
|
||||
|
||||
## When it surfaces
|
||||
|
||||
- N=2 (`n_dups=2`): **always passes**.
|
||||
- N=4 (`n_dups=4`): **consistent fail** under both `tpt-proto=tcp`
|
||||
and `tpt-proto=uds`, MTF backend.
|
||||
- N=8 (`n_dups=8`): **passes** (counter-intuitive — see "racing
|
||||
windows").
|
||||
- Non-MTF backends: not yet exercised systematically.
|
||||
|
||||
## What previously masked it
|
||||
|
||||
Before the spawn-time `wait_for_peer_or_proc_death`
race-helper (in `tractor.spawn._spawn`) existed, the
parent's `start_actor` flow ended with a bare:
|
||||
|
||||
```python
|
||||
event, chan = await ipc_server.wait_for_peer(uid)
|
||||
```
|
||||
|
||||
That awaits an unsignalled `trio.Event` on `_peer_connected[uid]`.
|
||||
If the sub-actor process **dies during boot** (before its
|
||||
runtime executes the parent-callback handshake that sets the
|
||||
event), the wait parks forever. The dead proc becomes a zombie
|
||||
because no one ever calls `proc.wait()` to reap it.
|
||||
|
||||
In test contexts the failure presented as a hang or a much
|
||||
later `trio.TooSlowError` from an outer `fail_after`. In
|
||||
production it'd present as a parent that never makes progress
|
||||
past `start_actor`. The death itself was silently masked.
|
||||
|
||||
## What surfaces it now
|
||||
|
||||
`tractor.spawn._spawn.wait_for_peer_or_proc_death` (used by
|
||||
`_main_thread_forkserver_proc`) races the handshake-wait
|
||||
against `proc.wait()`. The race-helper raises `ActorFailure`
|
||||
on death-first instead of parking, exposing the rc=2.
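
The shape of such a race can be sketched without tractor
or trio. The following is a stdlib `asyncio` analog, not
the real helper: `BootDeath`, `wait_for_peer_or_death` and
`demo` are hypothetical names, and the "handshake" is just
an `asyncio.Event` that nobody sets.

```python
import asyncio
import sys


class BootDeath(RuntimeError):
    '''Hypothetical stand-in for tractor's `ActorFailure`.'''


async def wait_for_peer_or_death(
    connected: asyncio.Event,
    proc: asyncio.subprocess.Process,
) -> None:
    # race "peer connected" against "process exited"
    peer = asyncio.create_task(connected.wait())
    death = asyncio.create_task(proc.wait())
    done, pending = await asyncio.wait(
        {peer, death},
        return_when=asyncio.FIRST_COMPLETED,
    )
    for task in pending:
        task.cancel()  # the loser is no longer needed
    if peer in done:
        return
    # death won: the exit code is known and the proc is reaped
    raise BootDeath(f'child died during boot (rc={death.result()})')


async def demo() -> str:
    # child exits with 2 and never "connects back"
    proc = await asyncio.create_subprocess_exec(
        sys.executable, '-c', 'import sys; sys.exit(2)',
    )
    try:
        await wait_for_peer_or_death(asyncio.Event(), proc)
    except BootDeath as err:
        return str(err)
    return 'connected'
```

With a bare `await connected.wait()` instead of the race,
`demo()` would park forever, which is the masking behavior
described in the previous section.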
|
||||
|
||||
## Hypothesis: registrar-side same-name contention
|
||||
|
||||
The test spawns N actors with name `doggy` sequentially:
|
||||
|
||||
```python
for i in range(n_dups):
    p: Portal = await an.start_actor('doggy')
    portals.append(p)
```
|
||||
|
||||
Each spawned doggy:
|
||||
|
||||
1. Forks via the forkserver.
|
||||
2. Boots its runtime in `_actor_child_main`.
|
||||
3. Connects back to the parent for handshake.
|
||||
4. Connects to the daemon registrar to call `register_actor`.
|
||||
5. Enters its RPC msg-loop.
|
||||
|
||||
Step (4) is where the same-name contention lives. The
|
||||
registrar's `register_actor` (in
|
||||
`tractor.discovery._registry`) accepts duplicate names
|
||||
(stores `(name, uuid) -> addr`), but its internal bookkeeping
|
||||
may have a non-trivial check (e.g. `wait_for_actor` resolution,
|
||||
`_addrs2aids` map updates) that errors out under specific
|
||||
ordering between the existing entry and the incoming one.
|
||||
|
||||
`rc=2` (i.e. `os.WEXITSTATUS(status) == 2`) corresponds to
`sys.exit(2)` in the doggy process. An ordinary unhandled
exception makes CPython exit with status 1, so a 2 instead
points at an explicit `SystemExit(2)` (e.g. `argparse`
usage errors exit with 2). So the doggy is hitting an
explicit exit path during `register_actor` or just-after.
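
The exit-code distinction is easy to check with a fresh
interpreter. A minimal sketch (the helper `child_rc` is
made up for illustration; none of this touches tractor):

```python
import subprocess
import sys


def child_rc(src: str) -> int:
    '''Run `src` in a fresh interpreter, return its exit code.'''
    proc = subprocess.run(
        [sys.executable, '-c', src],
        stderr=subprocess.DEVNULL,  # hide the traceback noise
    )
    return proc.returncode


# ordinary unhandled exception
unhandled: int = child_rc('raise RuntimeError("boom")')
# explicit exit request
explicit: int = child_rc('import sys; sys.exit(2)')
# argparse rejecting an unknown flag
argparse_err: int = child_rc(
    'import argparse; '
    'argparse.ArgumentParser().parse_args(["--nope"])'
)
```

Here `unhandled` is 1 while `explicit` and `argparse_err`
are both 2, which is why an rc of 2 hints at a deliberate
`SystemExit` rather than a plain crash.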
|
||||
|
||||
The non-monotonic shape (N=2 OK, N=4 BAD, N=8 OK) suggests a
|
||||
specific timing window — likely "the 3rd register-RPC arrives
|
||||
while the 1st-or-2nd is in some intermediate state". With
|
||||
N=8, the additional procs widen the registration spread
|
||||
enough that no two land in the conflicting window.
|
||||
|
||||
## Where to dig next
|
||||
|
||||
- Add per-actor logging in `_actor_child_main` and
|
||||
`register_actor` to surface the actual exception that
|
||||
triggers the rc=2 exit. Currently the doggy dies before
|
||||
the parent ever sees its stderr (forkserver doesn't
|
||||
marshal child stdio back).
|
||||
- Race-test the registrar's `register_actor` /
|
||||
`unregister_actor` / `wait_for_actor` against same-name
|
||||
concurrent calls in isolation (no spawn).
|
||||
- Consider whether `register_actor` should be idempotent
|
||||
under same-name re-register or should explicitly reject
|
||||
same-name (and ideally with a clear `RemoteActorError`,
|
||||
not `sys.exit(2)`).
|
||||
|
||||
## Test-suite handling
|
||||
|
||||
Currently:
|
||||
|
||||
- `tests/discovery/test_multi_program.py
|
||||
::test_dup_name_cancel_cascade_escalates_to_hard_kill[n_dups=4]`
|
||||
is `pytest.mark.xfail(strict=False, reason=...)` to keep
|
||||
the suite green while this issue is investigated.
|
||||
- `n_dups=2` and `n_dups=8` continue to validate the
|
||||
cancel-cascade hard-kill escalation.
|
||||
|
||||
Once the underlying race is understood + fixed, drop the
|
||||
xfail.
|
||||
|
||||
## Related work
|
||||
|
||||
- The cancel-cascade fix that introduced this regression
|
||||
test:
|
||||
`tractor/_exceptions.py:ActorTooSlowError`,
|
||||
`tractor/runtime/_supervise.py:_try_cancel_then_kill`,
|
||||
`tractor/runtime/_portal.py:Portal.cancel_actor(
|
||||
raise_on_timeout=...)`.
|
||||
- The spawn-time death-detection that exposed this:
|
||||
`tractor/spawn/_spawn.py:wait_for_peer_or_proc_death`,
|
||||
used by `tractor/spawn/_main_thread_forkserver.py`.
|
||||
|
|
@ -1,273 +0,0 @@
|
|||
# `test_register_duplicate_name` racy connect-failure on `daemon` fixture readiness
|
||||
|
||||
## Symptom
|
||||
|
||||
`tests/test_multi_program.py::test_register_duplicate_name`
|
||||
fails intermittently under BOTH transports + ALL spawn
|
||||
backends with connect-refused errors:
|
||||
|
||||
```
|
||||
# under --tpt-proto=uds
|
||||
FAILED tests/test_multi_program.py::test_register_duplicate_name
|
||||
- ConnectionRefusedError: [Errno 111] Connection refused
|
||||
( ^^^ this exc was collapsed from a group ^^^ )
|
||||
|
||||
# under --tpt-proto=tcp
|
||||
FAILED tests/test_multi_program.py::test_register_duplicate_name
|
||||
- OSError: all attempts to connect to 127.0.0.1:36003 failed
|
||||
( ^^^ this exc was collapsed from a group ^^^ )
|
||||
```
|
||||
|
||||
Distinct from the cancel-cascade `TooSlowError` flake
|
||||
class — see
|
||||
`cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`.
|
||||
This is a **connect-time race** before the daemon is
|
||||
fully ready to `accept()`, not a teardown-cascade
|
||||
slowness.
|
||||
|
||||
## Root cause: blind `time.sleep()` in `daemon` fixture
|
||||
|
||||
`tests/conftest.py::daemon` boots a sub-py-process via
|
||||
`subprocess.Popen([python, '-c', 'tractor.run_daemon(...)'])`,
|
||||
then **blindly sleeps** a fixed delay before yielding
|
||||
`proc` to the test:
|
||||
|
||||
```python
# excerpt from tests/conftest.py::daemon
proc = subprocess.Popen([
    sys.executable, '-c', code,
])

bg_daemon_spawn_delay: float = _PROC_SPAWN_WAIT  # 0.6
if tpt_proto == 'uds':
    bg_daemon_spawn_delay += 1.6
if _non_linux and ci_env:
    bg_daemon_spawn_delay += 1

# XXX, allow time for the sub-py-proc to boot up.
# !TODO, see ping-polling ideas above!
time.sleep(bg_daemon_spawn_delay)

assert not proc.returncode
yield proc
```
|
||||
|
||||
Inherent fragility: the delay is "long enough on dev
|
||||
boxes most of the time" but has no actual
|
||||
synchronization with the daemon's `bind()` + `listen()`
|
||||
completion. Under any of:
|
||||
|
||||
- Loaded box (CI parallelism, big rebuild in
|
||||
background, low-cpu-freq)
|
||||
- Cold first-run (`importlib` cache miss, JIT warmup)
|
||||
- Higher-than-expected `tractor` import cost
|
||||
- Filesystem latency (UDS sockfile create, slow
|
||||
tmpfs)
|
||||
|
||||
...the sleep finishes BEFORE the daemon has bound its
listen socket. The first test-client call that connects
(`tractor.find_actor()`, `wait_for_actor()`, or the
implicit connect in
`open_nursery(registry_addrs=[reg_addr])`) then fails with
`ConnectionRefusedError` (TCP) or
`FileNotFoundError`/`ConnectionRefusedError` (UDS).
|
||||
|
||||
## Reproducer
|
||||
|
||||
Easiest: run the suite under load.
|
||||
|
||||
```bash
|
||||
# create CPU pressure on another core in parallel
|
||||
stress-ng --cpu 2 --timeout 600s &
|
||||
|
||||
./py313/bin/python -m pytest \
|
||||
tests/test_multi_program.py::test_register_duplicate_name \
|
||||
--spawn-backend=main_thread_forkserver \
|
||||
--tpt-proto=tcp -v
|
||||
```
|
||||
|
||||
Reproduces ~30-50% of the time on a dev laptop. On a
|
||||
quiet idle box, may need 5-10 runs to hit.
|
||||
|
||||
## Why the existing `_PROC_SPAWN_WAIT` tuning is inadequate
|
||||
|
||||
The recently shipped `bg_daemon_spawn_delay` rename (the
de-monotonic-grow fix) removed the *accumulation* bug
where each invocation also lengthened the NEXT test's
wait. Net effect: every invocation now uses the SAME
sleep, `0.6 + 1.6` (UDS) or `0.6` (TCP), with no growth.
That is good, but it does NOTHING for the underlying
race: each individual test still relies on a blind sleep
that may or may not be sufficient.
|
||||
|
||||
Bumping the constant higher pushes flake rate down
|
||||
but never to zero AND adds dead time to every
|
||||
non-flaking run. Not a fix, just a knob.
|
||||
|
||||
## Side effects
|
||||
|
||||
- **Inter-test cascade**: a single failure can cascade
|
||||
via leaked subprocesses (the `daemon` fixture's
|
||||
cleanup may not fully tear down a daemon that never
|
||||
reached "ready"). The `_reap_orphaned_subactors`
|
||||
session-end + `_track_orphaned_uds_per_test`
|
||||
per-test fixtures handle most of this now, but the
|
||||
affected test itself still fails.
|
||||
- **Worsens under fork-spawn backends**: the daemon
|
||||
has more init work
|
||||
(`_main_thread_forkserver`-coordinator-thread
|
||||
startup, etc.) so the sleep has to cover MORE.
|
||||
|
||||
## Fix design — replace blind sleep with active poll
|
||||
|
||||
The right primitive is **poll the daemon's bind
|
||||
address until it accepts a connection or we time
|
||||
out**, with the timeout being a hard ceiling rather
|
||||
than a baseline. Two implementation paths:
|
||||
|
||||
### Path A — TCP/UDS connect-poll loop
|
||||
|
||||
Try `socket.connect(reg_addr)` in a tight loop with
|
||||
short backoff (~50ms), succeed on the first non-error
|
||||
return, fail-loud on a hard cap (e.g. 10s). Same
|
||||
primitive works for both transports because both use
|
||||
`socket.connect()` semantics.
|
||||
|
||||
Rough shape:
|
||||
|
||||
```python
import os
import socket
import time


def _wait_for_daemon_ready(
    reg_addr,
    tpt_proto: str,
    timeout: float = 10.0,
    poll_interval: float = 0.05,
) -> None:
    deadline = time.monotonic() + timeout
    while True:
        if tpt_proto == 'tcp':
            sock = socket.socket(socket.AF_INET)
            target = reg_addr  # (host, port)
        else:  # uds
            sock = socket.socket(socket.AF_UNIX)
            target = os.path.join(*reg_addr)
        try:
            sock.settimeout(poll_interval)
            sock.connect(target)
        except (
            ConnectionRefusedError,
            FileNotFoundError,
            socket.timeout,
        ) as exc:
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f'Daemon never accepted on {target!r} '
                    f'within {timeout}s'
                ) from exc
            time.sleep(poll_interval)
        else:
            return
        finally:
            # always release the probe socket, on success or
            # failure, so retries don't leak fds
            sock.close()
```
|
||||
|
||||
Pros: trivial primitive, no tractor-runtime
|
||||
dependency, works pre-yield in the fixture body,
|
||||
fail-fast on truly-broken daemon.
|
||||
Cons: doesn't actually do an IPC handshake, just
|
||||
proves listen-side is up. A daemon that bound but
|
||||
hasn't initialized its registrar table yet would
|
||||
still race.
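
To see the connect-poll primitive work in isolation, here
is a TCP-only sketch that does not depend on tractor or
the `daemon` fixture. `wait_for_tcp_listener` is a
hypothetical cut-down variant of the helper above, and the
"daemon" is just a socket that is bound immediately but
only starts listening 0.2s later (connects are refused
until then).

```python
import socket
import threading
import time


def wait_for_tcp_listener(
    addr: tuple[str, int],
    timeout: float = 5.0,
    poll_interval: float = 0.05,
) -> int:
    '''
    Poll `addr` until a connect succeeds; return the number of
    attempts made, or raise `TimeoutError` past the deadline.

    '''
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            socket.create_connection(addr, timeout=poll_interval).close()
            return attempts
        except OSError as exc:  # refused, timed out, unreachable
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f'nothing accepted on {addr!r} within {timeout}s'
                ) from exc
            time.sleep(poll_interval)


# stand-in "daemon": bound right away, listening only later
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(('127.0.0.1', 0))  # port 0: let the OS pick one
threading.Timer(0.2, srv.listen).start()
attempts = wait_for_tcp_listener(srv.getsockname())
srv.close()
```

The poll returns as soon as the listener is up instead of
waiting out a fixed delay, and the timeout acts only as a
ceiling for the truly-broken case.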
|
||||
|
||||
### Path B — `tractor.find_actor()` poll
|
||||
|
||||
Use the actual discovery API the test would call:
|
||||
|
||||
```python
import tractor
import trio


async def _wait_for_daemon_ready_via_discovery(
    reg_addr,
    timeout: float = 10.0,
    poll_interval: float = 0.05,
):
    deadline = trio.current_time() + timeout
    async with tractor.open_root_actor(
        registry_addrs=[reg_addr],
        # ephemeral root just for the probe
    ):
        while True:
            try:
                async with tractor.find_actor(
                    'registrar',  # daemon's own name
                    registry_addrs=[reg_addr],
                ) as portal:
                    if portal is not None:
                        return
            except Exception:
                pass
            if trio.current_time() >= deadline:
                raise TimeoutError(...)
            await trio.sleep(poll_interval)
```
|
||||
|
||||
Pros: actually proves the discovery path works,
|
||||
handles the "bound but not ready" case naturally.
|
||||
Cons: requires booting an ephemeral root actor JUST
|
||||
for the probe (overhead), more code, and runs in trio
|
||||
which complicates the sync-fixture context. Need a
|
||||
`trio.run()` wrapper.
|
||||
|
||||
### Recommended: Path A with optional handshake check
|
||||
|
||||
Path A is much simpler + handles 95% of the bug
|
||||
class. If "bound-but-not-ready" turns out to still
|
||||
race (it shouldn't — `tractor.run_daemon` doesn't
|
||||
return from `bind()` until the registrar is
|
||||
fully populated), escalate to Path B as a focused
|
||||
follow-up.
|
||||
|
||||
## Workarounds (until fix lands)
|
||||
|
||||
1. **Bump `_PROC_SPAWN_WAIT`** higher (current: 0.6).
|
||||
2.0–3.0 hides most flakes at the cost of adding
|
||||
dead time to every test. Not a fix but reduces
|
||||
blast radius while the proper poll lands.
|
||||
2. **`pytest-rerunfailures`** with `reruns=1` on the
|
||||
`daemon` fixture's tests specifically. Hides the
|
||||
flake but doesn't address it.
|
||||
3. **Mark known-affected tests as `xfail(strict=False)`**
|
||||
under `--ci`. Lets CI go green at the cost of
|
||||
silently hiding regressions.
|
||||
|
||||
(Recommend skipping all three — implement the active
|
||||
poll instead.)
|
||||
|
||||
## Investigation next steps
|
||||
|
||||
1. Implement Path A as a `_wait_for_daemon_ready()`
|
||||
helper in `tests/conftest.py`. Replace the
|
||||
`time.sleep(bg_daemon_spawn_delay)` call with it.
|
||||
2. Drop the `_PROC_SPAWN_WAIT` constant entirely
|
||||
(active poll obsoletes blind sleep).
|
||||
3. Run the suite 5-10 times to validate flake rate
|
||||
drops to 0.
|
||||
4. If flakes persist, profile whether the daemon
|
||||
process exits with non-zero before the poll's
|
||||
deadline hits — that'd be a different bug
|
||||
(daemon startup crash) that the blind sleep was
|
||||
masking.
|
||||
5. Cross-check `tests/test_multi_program.py::test_*`
|
||||
— multiple tests use the `daemon` fixture; all
|
||||
should benefit from the same poll primitive.
|
||||
|
||||
## Related
|
||||
|
||||
- `tests/conftest.py::daemon` — the fixture under
|
||||
fix
|
||||
- `tests/conftest.py::_PROC_SPAWN_WAIT` — the
|
||||
constant to drop
|
||||
- `cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`
|
||||
— distinct flake class (cancel-cascade
|
||||
`TooSlowError` at teardown, not connect-time race)
|
||||
- `trio_wakeup_socketpair_busy_loop_under_fork_issue.md`
|
||||
— different bug entirely; this race was masked
|
||||
pre-WakeupSocketpair-patch by the busy-loop
|
||||
hangs.
|
||||
|
|
@ -1,102 +0,0 @@
|
|||
# `trio` 0.29 -> 0.33 slows the depth=3 cancel-cascade
|
||||
|
||||
## Symptom
|
||||
|
||||
After locking to `trio==0.33.0` (commit `c7741bba`, was
|
||||
`0.29.0`), this test reliably trips its `fail_after`
|
||||
deadline on the **`trio`** backend:
|
||||
|
||||
```
|
||||
FAILED tests/test_cancellation.py::test_nested_multierrors[start_method=trio-depth=3]
|
||||
- AssertionError: assert False
|
||||
where False = isinstance(
|
||||
Cancelled(source='deadline', source_task=None, reason=None),
|
||||
tractor.RemoteActorError,
|
||||
)
|
||||
```
|
||||
|
||||
A `fail_after_w_trace` hang-snapshot is captured for the
|
||||
test each run (deadline-injected `Cancelled` wrapped into
|
||||
the actor-nursery `BaseExceptionGroup`).
|
||||
|
||||
## Root cause (immediate)
|
||||
|
||||
The test budgets `fail_after(6)` for the `trio` backend.
|
||||
That 6s was chosen (commit `32955db0`, while `trio==0.29`)
|
||||
with the assertion that trio finishes "well under" 6s.
|
||||
The `trio` 0.29 -> 0.33 bump slowed the depth=3 cascade
|
||||
past that budget, so the 6s deadline now fires mid-cascade.
|
||||
|
||||
trio 0.33 added **cancel-reason tracking** — every
|
||||
`Cancelled` now carries `(source=, reason=, source_task=)`.
|
||||
The injected exc is `Cancelled(source='deadline')`, i.e.
|
||||
trio itself naming our `fail_after(6)` scope as the cancel
|
||||
origin. When that `Cancelled` collapses one branch of the
|
||||
nursery BEG, the test's `isinstance(subexc,
|
||||
RemoteActorError)` assertion fails. The healthy outcome is
|
||||
`BEG = [RemoteActorError, RemoteActorError]`; the
|
||||
`Cancelled` is purely an artifact of the deadline cutting
|
||||
the cascade short.
|
||||
|
||||
## Measurements (standalone, this machine)
|
||||
|
||||
```
depth=1  trio  ~3.15s     PASS       (keeps 6s budget)
depth=3  trio  ~6.8-8.2s  FAIL @ 6s  (now bumped to 12s)
```
|
||||
|
||||
depth=1 still fits comfortably; only depth=3 (deeper
|
||||
recursive spawn-and-error tree => more actors to reap)
|
||||
exceeds the old budget. The ~2s/depth-level cost looks
|
||||
like serialized per-actor reap / `terminate_after` waits.
|
||||
|
||||
## Mitigation applied
|
||||
|
||||
`test_nested_multierrors` now splits the `trio` budget:
|
||||
|
||||
```python
case ('trio', 1):
    timeout = 6
case ('trio', 3):
    timeout = 12  # was 6; see this doc
```
|
||||
|
||||
This stops the deadline from firing so the cascade
|
||||
completes naturally to `[RAE, RAE]`.
|
||||
|
||||
## Also affected — same root cause, different test
|
||||
|
||||
`test_echoserver_detailed_mechanics[trio-raise_error=KeyboardInterrupt]`
|
||||
(`tests/test_infected_asyncio.py`) tripped the *same*
|
||||
slowdown via its much tighter `trio` budget of `1s`. The
|
||||
single-aio-subactor teardown now takes ~1s, so teardown races
the `1s` `fail_after` deadline (PASS at 0.99s / FAIL at
|
||||
1.03s across back-to-back standalone runs). On a deadline-
|
||||
fire the injected `Cancelled(source='deadline')` wraps the
|
||||
mid-stream `KeyboardInterrupt` into a `BaseExceptionGroup`,
|
||||
which is NOT a `KeyboardInterrupt` so the bare
|
||||
`pytest.raises(KeyboardInterrupt)` fails. (The sibling
|
||||
`raise_error=Exception` variant only "passes" by accident:
|
||||
an `ExceptionGroup` *is-a* `Exception`, so its
|
||||
`pytest.raises(Exception)` still matches even when wrapped.)
|
||||
|
||||
Mitigation: bump that `trio` budget `1 -> 4s` (matching the
|
||||
forking-spawner case). Without a deadline-fire the KBI
|
||||
propagates bare and the assertion passes.
|
||||
|
||||
## Open follow-up (the actual regression)
|
||||
|
||||
The budget bump is a band-aid — the underlying question is
|
||||
**why** the depth=3 `trio` cancel-cascade went from <6s to
|
||||
~7-8s across `trio` 0.29 -> 0.33. Candidate avenues:
|
||||
|
||||
- which scope owns the per-actor `terminate_after` wait,
|
||||
and are the tree's reaps concurrent or serialized?
|
||||
- did trio 0.33's abort/reschedule or cancel-reason
|
||||
bookkeeping change checkpoint timing on the cancel path?
|
||||
|
||||
If/when the cascade speeds back up under-budget, depth=3
|
||||
will start completing well under 12s — at which point the
|
||||
budget can be tightened back toward 6s as a regression
|
||||
tripwire. Related (different backend, same cascade class):
|
||||
`cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`.
|
||||
|
|
@ -1,221 +0,0 @@
|
|||
# trio `WakeupSocketpair.drain()` busy-loop in forked child (peer-closed missed-EOF)
|
||||
|
||||
## Reproducer
|
||||
|
||||
```bash
./py313/bin/python -m pytest \
  tests/test_multi_program.py::test_register_duplicate_name \
  --tpt-proto=tcp \
  --spawn-backend=main_thread_forkserver \
  -v --capture=sys
```
|
||||
|
||||
Subactor pegs a CPU core indefinitely; parent test
|
||||
hangs waiting for the subactor.
|
||||
|
||||
## Empirical evidence (caught alive)
|
||||
|
||||
```
$ sudo strace -p <subactor-pid>
recvfrom(6, "", 65536, 0, NULL, NULL) = 0
recvfrom(6, "", 65536, 0, NULL, NULL) = 0
recvfrom(6, "", 65536, 0, NULL, NULL) = 0
... (no `epoll_wait`, no other syscalls, just this back-to-back)
```
|
||||
|
||||
Pattern: tight C-level `recvfrom` loop returning 0
|
||||
each call. No `epoll_wait` between iterations →
|
||||
**not trio's task scheduler**. Pure synchronous C
|
||||
loop.
|
||||
|
||||
```
$ sudo readlink /proc/<subactor-pid>/fd/6
socket:[<inode>]

$ sudo lsof -p <subactor-pid> | grep ' 6u'
<cmd> <pid> goodboy 6u unix 0xffff... 0t0 <inode> type=STREAM (CONNECTED)
```
|
||||
|
||||
fd=6 is an **AF_UNIX socket** in CONNECTED state.
|
||||
Even though the test uses `--tpt-proto=tcp`, this fd
|
||||
is NOT a tractor IPC channel — it's an internal
|
||||
trio socketpair.
|
||||
|
||||
## Root-cause: `WakeupSocketpair.drain()`
|
||||
|
||||
`/site-packages/trio/_core/_wakeup_socketpair.py`:
|
||||
|
||||
```python
class WakeupSocketpair:
    def __init__(self) -> None:
        self.wakeup_sock, self.write_sock = socket.socketpair()
        self.wakeup_sock.setblocking(False)
        self.write_sock.setblocking(False)
        ...

    def drain(self) -> None:
        try:
            while True:
                self.wakeup_sock.recv(2**16)
        except BlockingIOError:
            pass
```
|
||||
|
||||
`socket.socketpair()` on Linux defaults to AF_UNIX
|
||||
SOCK_STREAM. Both ends non-blocking. Normal flow:
|
||||
|
||||
1. Signal/wake event → `write_sock.send(b'\x00')`
|
||||
queues a byte.
|
||||
2. `wakeup_sock` becomes readable → trio's epoll
|
||||
triggers.
|
||||
3. Trio calls `drain()` to flush the buffer.
|
||||
4. drain loops on `wakeup_sock.recv(64KB)`.
|
||||
5. Eventually buffer empty → non-blocking socket
|
||||
raises `BlockingIOError` → except → break.
|
||||
|
||||
**Bug surface — peer-closed missed-EOF**:
|
||||
|
||||
Non-blocking socket semantics:
|
||||
- buffer has data → `recv` returns N>0 bytes (loop continues)
|
||||
- buffer empty → `recv` raises `BlockingIOError`
|
||||
- **peer FIN'd → `recv` returns 0 bytes (NEITHER exception NOR
|
||||
break — infinite tight loop)**
|
||||
|
||||
`drain()` does not handle the `b''` return-value
|
||||
(EOF) case. If `write_sock` has been closed (or the
|
||||
process holding it is gone), every iteration returns
|
||||
0 → infinite loop → 100% CPU on a single core.
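
The three `recv` outcomes listed above can be observed with the standard library alone. This is a minimal sketch assuming a POSIX platform where `socket.socketpair()` returns connected AF_UNIX stream sockets; `classify_recv` is a hypothetical helper written for this illustration, not a trio API:

```python
import socket


def classify_recv(sock: socket.socket) -> str:
    """Name which non-blocking `recv` outcome occurred on `sock`."""
    try:
        data = sock.recv(2**16)
    except BlockingIOError:
        return "empty"  # nothing buffered, peer still open
    # `b''` is the EOF marker: the peer end has been closed.
    return "data" if data else "eof"


reader, writer = socket.socketpair()
reader.setblocking(False)
writer.setblocking(False)

writer.send(b"\x00")
first = classify_recv(reader)   # the queued byte is returned
second = classify_recv(reader)  # buffer is now drained
writer.close()
third = classify_recv(reader)   # peer is gone
fourth = classify_recv(reader)  # EOF is reported again on every call
reader.close()

print(first, second, third, fourth)
```

The last two calls are the relevant ones: once the peer is closed, `recv` neither blocks nor raises, so a `while True` loop that only exits on `BlockingIOError` has no way out.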
|
||||
|
||||
## Why this triggers under `main_thread_forkserver`
|
||||
|
||||
Under `os.fork()` from the forkserver-worker thread:
|
||||
|
||||
1. Parent has a `WakeupSocketpair` instance with
|
||||
`wakeup_sock=fdN`, `write_sock=fdM`. Both fds
|
||||
open in parent.
|
||||
2. Fork → child inherits BOTH fds (kernel-level fd
|
||||
table dup).
|
||||
3. `_close_inherited_fds()` runs in child →
|
||||
closes everything except stdio. `wakeup_sock` and
|
||||
`write_sock` of the parent's `WakeupSocketpair`
|
||||
ARE closed in child.
|
||||
4. Child's trio (running fresh) creates its OWN
|
||||
`WakeupSocketpair` → NEW fd numbers (e.g. fd 6, 7).
|
||||
5. **In `infect_asyncio` mode** the asyncio loop is
|
||||
the host; trio runs as guest via
|
||||
`start_guest_run`. trio still creates its
|
||||
`WakeupSocketpair` in the I/O manager but its
|
||||
role is different.
|
||||
|
||||
The race window: somewhere between (3) and (5), a
`WakeupSocketpair` Python object inherited via COW
(from the parent's pre-fork heap) may survive long
enough for `drain()` to be called on it after its fds
were closed. If the kernel has by then recycled those
fd numbers for the child's NEW socketpair, the stale
object ends up reading from one of the new socketpair's
ends, whose peer might be FIN-flagged (e.g. the
parent-process peer end is closed).
|
||||
|
||||
Or simpler: the `wait_for_actor`/`find_actor` discovery
|
||||
flow in `test_register_duplicate_name` triggers an
|
||||
unusual code path where a stale `WakeupSocketpair`
|
||||
gets `drain()`-called on a fd whose peer has already
|
||||
closed.
|
||||
|
||||
## Why `drain()` shouldn't loop indefinitely on EOF
|
||||
(upstream trio bug)
|
||||
|
||||
Even WITHOUT fork, `drain()` should treat `b''` as
|
||||
EOF and break. The current code is correct for the
|
||||
"buffer drained on a healthy socketpair" scenario but
|
||||
incorrect for the "peer is gone" scenario. It's a
|
||||
defensive-programming gap in trio.
|
||||
|
||||
A one-line patch upstream:
|
||||
|
||||
```python
def drain(self) -> None:
    try:
        while True:
            data = self.wakeup_sock.recv(2**16)
            if not data:
                break  # peer-closed; nothing more to drain
    except BlockingIOError:
        pass
```
|
||||
|
||||
## Workarounds (until an upstream fix lands)
|
||||
|
||||
1. **Skip-mark on the fork backend**:
|
||||
`tests/test_multi_program.py` →
|
||||
`pytest.mark.skipon_spawn_backend('main_thread_forkserver',
|
||||
reason='trio WakeupSocketpair.drain busy-loop, see ai/conc-anal/trio_wakeup_socketpair_busy_loop_under_fork_issue.md')`.
|
||||
|
||||
2. **Defensive monkey-patch in tractor's
|
||||
forkserver-child prelude** — wrap
|
||||
`WakeupSocketpair.drain` to handle `b''`:
|
||||
|
||||
```python
# in `_actor_child_main` or `_close_inherited_fds`'s
# post-fork prelude:
from trio._core._wakeup_socketpair import WakeupSocketpair

_orig_drain = WakeupSocketpair.drain

def _safe_drain(self):
    try:
        while True:
            data = self.wakeup_sock.recv(2**16)
            if not data:
                return  # peer closed
    except BlockingIOError:
        pass

WakeupSocketpair.drain = _safe_drain
```
|
||||
|
||||
Tracks upstream — remove once trio fixes.
|
||||
|
||||
3. **Upstream the fix**: 1-line PR to `python-trio/trio`
|
||||
adding `if not data: break` to `drain()`.
|
||||
|
||||
## Investigation next steps
|
||||
|
||||
1. **Confirm via py-spy**: when caught alive, detach
|
||||
strace first then
|
||||
`sudo py-spy dump --pid <subactor> --locals`. The
|
||||
busy thread should show `drain` from `WakeupSocketpair`
|
||||
in the call chain.
|
||||
2. **Identify which write-end peer is closed**: from
|
||||
the inode of fd 6, look up the matching peer
|
||||
inode via `ss -xp` and see whose process it
|
||||
was/is.
|
||||
3. **Verify the missed-EOF hypothesis**: hand-craft a
|
||||
minimal `WakeupSocketpair` repro:
|
||||
|
||||
```python
from trio._core._wakeup_socketpair import WakeupSocketpair

ws = WakeupSocketpair()
ws.write_sock.close()  # simulate peer-gone
ws.drain()  # expected to spin forever (busy-loop) if the hypothesis holds
```
|
||||
|
||||
## Sibling bug
|
||||
|
||||
`tests/test_infected_asyncio.py::test_aio_simple_error`
|
||||
hangs under the same backend with a DIFFERENT
|
||||
fingerprint (Mode-A deadlock, both parties in
|
||||
`epoll_wait`, no busy-loop). Distinct root cause —
|
||||
see `infected_asyncio_under_main_thread_forkserver_hang_issue.md`.
|
||||
|
||||
Both share the broader theme: **trio internal-state
|
||||
initialization isn't fully fork-safe under
|
||||
`main_thread_forkserver`** for the more exotic
|
||||
dispatch paths.
|
||||
|
||||
## See also
|
||||
|
||||
- [#379](https://github.com/goodboy/tractor/issues/379) — subint umbrella
|
||||
- python-trio/trio#1614 — trio + fork hazards
|
||||
- `trio._core._wakeup_socketpair.WakeupSocketpair`
|
||||
source (the smoking gun)
|
||||
- `ai/conc-anal/fork_thread_semantics_execution_vs_memory.md`
|
||||
- `ai/conc-anal/infected_asyncio_under_main_thread_forkserver_hang_issue.md`
|
||||
|
|
@ -1,54 +0,0 @@
|
|||
---
|
||||
model: claude-opus-4-6
|
||||
service: claude
|
||||
session: (ad-hoc, not tracked via conf.toml)
|
||||
timestamp: 2026-04-06T17:28:48Z
|
||||
git_ref: 02b2ef1
|
||||
scope: tests
|
||||
substantive: true
|
||||
raw_file: 20260406T172848Z_02b2ef1_prompt_io.raw.md
|
||||
---
|
||||
|
||||
## Prompt
|
||||
|
||||
User asked to extend `tests/test_resource_cache.py` with a test
|
||||
that reproduces the edge case fixed in commit `02b2ef18` (per-key
|
||||
locking+user tracking in `maybe_open_context()`). The bug was
|
||||
originally triggered in piker's `brokerd.kraken` backend where the
|
||||
same `acm_func` was called with different kwargs, and the old
|
||||
global `_Cache.users` counter caused:
|
||||
|
||||
- teardown skipped for one `ctx_key` bc another key's users kept
|
||||
the global count > 0
|
||||
- re-entry hitting `assert not resources.get(ctx_key)` during the
|
||||
teardown window
|
||||
|
||||
User requested a test that would fail under the old code and pass
|
||||
with the fix.
|
||||
|
||||
## Response summary
|
||||
|
||||
Designed and implemented `test_per_ctx_key_resource_lifecycle`
|
||||
which verifies per-`ctx_key` resource isolation by:
|
||||
|
||||
1. Holding resource `'a'` open in a bg task
|
||||
2. Opening+closing resource `'b'` (same `acm_func`, different
|
||||
kwargs) while `'a'` is still alive
|
||||
3. Re-opening `'b'` and asserting cache MISS — proving `'b'` was
|
||||
torn down independently despite `'a'` keeping its own user
|
||||
count > 0
|
||||
|
||||
With the old global counter, phase 3 would produce a stale cache
|
||||
HIT (leaked resource) or crash on the assert.
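
The difference between the two counting schemes can be shown with a toy model. This is a simplified sketch, not tractor's `_Cache`: `GlobalCount`, `PerKeyCount` and `reopen_b_is_hit` are hypothetical names, and the real code is async and keyed on `ctx_key` tuples rather than plain strings:

```python
from collections import Counter


class GlobalCount:
    """One shared user counter for every key (the old shape)."""

    def __init__(self) -> None:
        self.users = 0
        self.resources: dict[str, object] = {}

    def enter(self, key: str) -> bool:
        """Register a user; return True on a cache hit."""
        hit = key in self.resources
        self.resources.setdefault(key, object())
        self.users += 1
        return hit

    def exit(self, key: str) -> None:
        self.users -= 1
        if self.users == 0:  # only fires when nobody uses anything
            self.resources.pop(key, None)


class PerKeyCount:
    """One user counter per key (the fixed shape)."""

    def __init__(self) -> None:
        self.users: Counter[str] = Counter()
        self.resources: dict[str, object] = {}

    def enter(self, key: str) -> bool:
        hit = key in self.resources
        self.resources.setdefault(key, object())
        self.users[key] += 1
        return hit

    def exit(self, key: str) -> None:
        self.users[key] -= 1
        if self.users[key] == 0:  # fires as soon as this key is unused
            self.resources.pop(key, None)


def reopen_b_is_hit(cache) -> bool:
    """Replay the three test phases; a hit on re-open means 'b' leaked."""
    cache.enter("a")         # phase 1: 'a' is held open
    cache.enter("b")         # phase 2: open ...
    cache.exit("b")          # ... and close 'b' while 'a' is alive
    return cache.enter("b")  # phase 3: re-open 'b'


print(reopen_b_is_hit(GlobalCount()), reopen_b_is_hit(PerKeyCount()))
```

With the shared counter, `'a'` keeps the count above zero, so `'b'` is never torn down and the re-open is a stale hit. With per-key counts the re-open is a miss.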
|
||||
|
||||
Also added a trivial `acm_with_resource(resource_id)` ACM helper
|
||||
at module level.
|
||||
|
||||
## Files changed
|
||||
|
||||
- `tests/test_resource_cache.py` — add `acm_with_resource` ACM +
|
||||
`test_per_ctx_key_resource_lifecycle` test fn
|
||||
|
||||
## Human edits
|
||||
|
||||
None — committed as generated (pending user review).
|
||||
|
|
@ -1,57 +0,0 @@
|
|||
---
|
||||
model: claude-opus-4-6
|
||||
service: claude
|
||||
session: (ad-hoc, not tracked via conf.toml)
|
||||
timestamp: 2026-04-06T19:31:25Z
|
||||
git_ref: 85f9c5d
|
||||
scope: tests
|
||||
substantive: true
|
||||
raw_file: 20260406T193125Z_85f9c5d_prompt_io.raw.md
|
||||
---
|
||||
|
||||
## Prompt
|
||||
|
||||
User asked to reproduce the exact `assert not resources.get(ctx_key)`
|
||||
crash originally seen in piker's `brokerd.kraken` backend via
|
||||
`open_cached_client('kraken')`. Key constraints from user:
|
||||
|
||||
- In piker, kwargs were the **same** (empty) so all callers
|
||||
share one `ctx_key = (fid, ())`
|
||||
- The root issue is `_Cache.locks` being indexed by `fid`
|
||||
rather than `ctx_key`, plus the race window between
|
||||
`values.pop()` and `resources.pop()` in `_Cache.run_ctx`
|
||||
(the acm `__aexit__` has checkpoints in between)
|
||||
- Mark the test `@pytest.mark.xfail` so it can be committed
|
||||
against the unpatched code and later adjusted once the fix
|
||||
from `9e49eddd` is applied
|
||||
|
||||
User pointed to piker's `open_cached_client('kraken')` usage
|
||||
as the real-world pattern to model.
|
||||
|
||||
## Response summary
|
||||
|
||||
Added `test_moc_reentry_during_teardown` which deterministically
|
||||
reproduces the teardown race:
|
||||
|
||||
1. A `cached_client()` acm (no params, like
|
||||
`kraken.api.get_client()`) signals via `in_aexit` event
|
||||
when its `__aexit__` starts, then sleeps to hold the
|
||||
window open
|
||||
2. `use_and_exit` task enters+exits the cached resource
|
||||
3. `reenter_during_teardown` waits on `in_aexit` then
|
||||
re-enters — landing exactly in the window where
|
||||
`values` is popped but `resources` still exists
|
||||
4. The old `assert not resources.get(ctx_key)` fires
|
||||
5. `@pytest.mark.xfail(raises=AssertionError)` catches it
|
||||
|
||||
Test confirmed: `XFAIL` with the exact traceback at
|
||||
`_mngrs.py:327`.
|
||||
|
||||
## Files changed
|
||||
|
||||
- `tests/test_resource_cache.py` — add
|
||||
`test_moc_reentry_during_teardown` xfail test fn
|
||||
|
||||
## Human edits
|
||||
|
||||
None — committed as generated (pending user review).
|
||||
|
|
@ -1,52 +0,0 @@
|
|||
---
|
||||
model: claude-opus-4-6
|
||||
service: claude
|
||||
session: (ad-hoc, not tracked via conf.toml)
|
||||
timestamp: 2026-04-06T22:26:48Z
|
||||
git_ref: f086222
|
||||
scope: code
|
||||
substantive: true
|
||||
raw_file: 20260406T222648Z_f086222_prompt_io.raw.md
|
||||
---
|
||||
|
||||
## Prompt
|
||||
|
||||
User applied the per-key locking patch (`f086222d`),
|
||||
removed the `@pytest.mark.xfail` from
|
||||
`test_moc_reentry_during_teardown`, and found the
|
||||
test STILL fails — now with `RuntimeError` instead
|
||||
of `AssertionError`. Asked to diagnose why the fix
|
||||
doesn't actually resolve this race.
|
||||
|
||||
## Response summary
|
||||
|
||||
Traced the exact scheduling sequence across three
|
||||
concurrent tasks to show that the per-`ctx_key`
|
||||
locking change doesn't close the fundamental gap:
|
||||
|
||||
- `maybe_open_context`'s finally pops the lock and
|
||||
sets `no_more_users` in the **exiting caller's
|
||||
task**
|
||||
- But the actual resource cleanup (`values.pop`,
|
||||
acm `__aexit__`, `resources.pop`) runs in the
|
||||
**`run_ctx` task** inside `service_tn`
|
||||
- A re-entering task creates a fresh lock (old one
|
||||
popped), acquires immediately, and hits stale
|
||||
`resources[ctx_key]` while `run_ctx` is still in
|
||||
`__aexit__`
|
||||
|
||||
Identified three possible fix strategies:
|
||||
1. Let `run_ctx` own lock release/pop after
|
||||
`resources.pop()`
|
||||
2. Add a "teardown complete" event that new callers
|
||||
wait on
|
||||
3. Move `resources.pop()` into `run_ctx`'s inner
|
||||
finally (before acm `__aexit__`)
|
||||
|
||||
## Files changed
|
||||
|
||||
No code changes — analysis only.
|
||||
|
||||
## Human edits
|
||||
|
||||
N/A — diagnostic output, no code generated.
|
||||
|
|
@ -1,32 +0,0 @@
|
|||
---
|
||||
model: claude-opus-4-6
|
||||
service: claude
|
||||
timestamp: 2026-04-06T22:26:48Z
|
||||
git_ref: f086222
|
||||
diff_cmd: git diff f086222~1..f086222
|
||||
---
|
||||
|
||||
Here's the problem. The per-key locking doesn't fix this race because the **lock lifetime is decoupled from `run_ctx`'s lifetime**. Trace through what happens:
|
||||
|
||||
**`maybe_open_context` finally** (lines 468-495):
|
||||
1. `users[ctx_key] -= 1` → 0
|
||||
2. `no_more_users.set()` → wakes `run_ctx` (but it doesn't run yet, just becomes ready)
|
||||
3. `_Cache.locks.pop(ctx_key)` → **lock is gone**
|
||||
4. `maybe_open_context` exits
|
||||
|
||||
**`run_ctx`** now gets scheduled (lines 237-241):
|
||||
5. Inner finally: `values.pop(ctx_key)` → values gone
|
||||
6. `mng.__aexit__()` runs → `in_aexit.set()`, `await trio.sleep(10)` → **CHECKPOINT**
|
||||
|
||||
**`reenter_during_teardown`** wakes:
|
||||
7. Enters `maybe_open_context` → no lock exists for `ctx_key` → creates **fresh** lock, acquires immediately
|
||||
8. `values[ctx_key]` → `KeyError`
|
||||
9. `resources.get(ctx_key)` → **still exists!** (step 6 hasn't reached `resources.pop()` in `run_ctx`'s outer finally yet)
|
||||
10. `raise RuntimeError('Caching resources ALREADY exist?!')`
|
||||
|
||||
The core issue: `no_more_users.set()` (step 2) and `locks.pop()` (step 3) happen in the **exiting caller's task**, but the actual resource cleanup (`values.pop`, acm `__aexit__`, `resources.pop`) happens in the **`run_ctx` task** inside `service_tn`. There's no synchronization between "lock removed, new callers welcome" and "run_ctx has actually finished tearing down". The per-`ctx_key` lock change just made the lock more granular — it didn't close this gap.
|
||||
|
||||
The fix needs to ensure a new caller can't acquire a lock for `ctx_key` until `run_ctx` has fully exited (including `resources.pop()`). Either:
|
||||
- **don't pop the lock in `maybe_open_context`** — let `run_ctx` release/pop it after `resources.pop()`
|
||||
- **wait on a "teardown complete" event** in the cache-miss path when stale `resources` exist
|
||||
- **pop `resources` before the acm `__aexit__`** (move `resources.pop()` into the inner finally alongside `values.pop()`)
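
The first option can be illustrated with a toy model. This is a sketch under stated assumptions: it uses stdlib `asyncio` rather than `trio`, collapses `maybe_open_context` and `run_ctx` into one task, and every name in it (`demo`, `teardown`, `first_user`, `reenter`) is hypothetical. It only shows why holding the lock until `resources.pop()` has run closes the window:

```python
import asyncio


async def demo(hold_lock_through_teardown: bool) -> str:
    resources: dict[str, str] = {}
    lock = asyncio.Lock()
    in_aexit = asyncio.Event()

    async def teardown() -> None:
        # Stand-in for the acm `__aexit__`: it yields to the
        # scheduler before the resource entry is finally popped.
        in_aexit.set()
        await asyncio.sleep(0.01)
        resources.pop("key")

    async def first_user() -> None:
        async with lock:
            resources["key"] = "live"
        if hold_lock_through_teardown:
            # Lock stays held until `resources.pop()` has run.
            async with lock:
                await teardown()
        else:
            # Lock already released: re-entry can slip in mid-teardown.
            await teardown()

    async def reenter() -> str:
        await in_aexit.wait()
        async with lock:
            return "stale" if "key" in resources else "clean"

    task = asyncio.create_task(first_user())
    result = await reenter()
    await task
    return result


unsynced = asyncio.run(demo(hold_lock_through_teardown=False))
synced = asyncio.run(demo(hold_lock_through_teardown=True))
print(unsynced, synced)
```

In the unsynchronized run the re-entering task acquires the free lock while the entry still exists, which is the condition that raises `RuntimeError` in the trace above. In the synchronized run it blocks until teardown has finished.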
|
||||
|
|
@ -1,74 +0,0 @@
|
|||
---
|
||||
model: claude-opus-4-6
|
||||
service: claude
|
||||
session: (ad-hoc, not tracked via conf.toml)
|
||||
timestamp: 2026-04-13T19:21:16Z
|
||||
git_ref: f851f28
|
||||
scope: code
|
||||
substantive: true
|
||||
---
|
||||
|
||||
## Prompt
|
||||
|
||||
User provided a detailed implementation plan for
|
||||
a new `test_tpt_bind_addrs.py` test suite covering
|
||||
`open_root_actor(tpt_bind_addrs=...)` — the three
|
||||
runtime code paths in `_root.py:385-450`:
|
||||
|
||||
1. Non-registrar, no explicit bind -> random addrs
|
||||
2. Registrar, no explicit bind -> registry_addrs
|
||||
3. Explicit bind given -> `wrap_address()` + merge
|
||||
|
||||
Plan specified 6 test functions (~10 parametrized
|
||||
variants), predicted a type-mixing bug at line 446,
|
||||
and asked for an in-flight fix if confirmed.
|
||||
|
||||
## Response summary
|
||||
|
||||
Created `tests/discovery/test_tpt_bind_addrs.py`
|
||||
with 9 collected test variants across 6 functions:
|
||||
|
||||
- `test_registrar_root_tpt_bind_addrs` (3 variants:
|
||||
`bind-eq-reg`, `bind-subset-reg`,
|
||||
`bind-disjoint-reg`)
|
||||
- `test_non_registrar_root_tpt_bind_addrs`
|
||||
(2 variants: `bind-same-bindspace`,
|
||||
`bind-disjoint`)
|
||||
- `test_non_registrar_default_random_bind`
|
||||
(baseline, no explicit bind)
|
||||
- `test_tpt_bind_addrs_as_maddr_str`
|
||||
(multiaddr string input)
|
||||
- `test_registrar_merge_binds_union`
|
||||
(registrar + disjoint bind -> union)
|
||||
- `test_open_nursery_forwards_tpt_bind_addrs`
|
||||
(`open_nursery(**kwargs)` forwarding)
|
||||
|
||||
Confirmed and fixed the predicted bug at
|
||||
`_root.py:446`: the registrar merge path mixed
|
||||
`Address` objects (`tpt_bind_addrs`) with raw tuples
|
||||
(`uw_reg_addrs`) inside `set()`, preventing
|
||||
deduplication and causing double-bind `OSError`.
|
||||
|
||||
Fix: wrap `uw_reg_addrs` before the set union:
|
||||
```python
# before (broken)
tpt_bind_addrs = list(set(
    tpt_bind_addrs + uw_reg_addrs
))
# after (fixed)
tpt_bind_addrs = list(set(
    tpt_bind_addrs
    + [wrap_address(a) for a in uw_reg_addrs]
))
```
|
||||
|
||||
All 9 tests pass after the fix.
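
Why the mixed-type `set()` fails to deduplicate can be reproduced without tractor. This is a minimal sketch: `ToyAddress` and `wrap` are hypothetical stand-ins for tractor's `Address` types and `wrap_address()`, whose real behavior is richer than this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAddress:
    """Hashable stand-in for a typed address object."""
    host: str
    port: int


def wrap(addr) -> ToyAddress:
    """Pass typed addresses through; wrap raw `(host, port)` tuples."""
    return addr if isinstance(addr, ToyAddress) else ToyAddress(*addr)


bind_addrs = [ToyAddress("127.0.0.1", 1616)]
raw_reg_addrs = [("127.0.0.1", 1616)]  # same endpoint, raw tuple form

# A typed object and a tuple never compare equal, so both survive.
mixed = set(bind_addrs + raw_reg_addrs)

# Normalizing to one type first lets the set collapse the duplicate.
wrapped = set(bind_addrs + [wrap(a) for a in raw_reg_addrs])

print(len(mixed), len(wrapped))
```

Two surviving entries for one endpoint is what leads to the second bind attempt on an already-bound address.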
|
||||
|
||||
## Files changed
|
||||
|
||||
- `tests/discovery/test_tpt_bind_addrs.py` (new)
|
||||
- `tractor/_root.py:446` (bug fix, 1 line)
|
||||
|
||||
## Human edits
|
||||
|
||||
N/A — pending review.
|
||||
|
|
@ -1,50 +0,0 @@
|
|||
---
|
||||
model: claude-opus-4-6
|
||||
service: claude
|
||||
session: 76154e65-d8e1-4b5f-9275-0ea45ba7e98a
|
||||
timestamp: 2026-04-13T20:50:48Z
|
||||
git_ref: 269d939c
|
||||
scope: code
|
||||
substantive: true
|
||||
raw_file: 20260413T205048Z_269d939c_prompt_io.raw.md
|
||||
---
|
||||
|
||||
## Prompt
|
||||
|
||||
Implement a `parse_endpoints()` API in
|
||||
`tractor.discovery._multiaddr` that lets downstream
|
||||
projects declare per-actor transport bind addresses
|
||||
as a service table of actor-name -> multiaddr
|
||||
strings (loaded from config, e.g. TOML `[network]`
|
||||
section). Add type aliases `EndpointsTable` and
|
||||
`ParsedEndpoints`, plus 7 unit tests covering TCP,
|
||||
UDS, tuples, mixed input, unsupported protocols, and
|
||||
edge cases.
|
||||
|
||||
A detailed implementation plan was provided
|
||||
specifying insertion points, type signatures, test
|
||||
names, and verification commands.
|
||||
|
||||
## Response summary
|
||||
|
||||
Add `parse_endpoints()` function with two supporting
|
||||
type aliases to `_multiaddr.py`. The function
|
||||
iterates a `dict[str, list[str|tuple]]` service
|
||||
table and delegates each entry to the existing
|
||||
`wrap_address()` helper, which already handles maddr
|
||||
strings, raw tuples, and pre-wrapped `Address`
|
||||
objects. Seven pure unit tests (no trio runtime)
|
||||
validate all input variants and edge cases. All 22
|
||||
multiaddr tests pass.
|
||||
|
||||
## Files changed
|
||||
|
||||
- `tractor/discovery/_multiaddr.py` — add
|
||||
`EndpointsTable`, `ParsedEndpoints` type aliases
|
||||
and `parse_endpoints()` function after line 133
|
||||
- `tests/discovery/test_multiaddr.py` — add
|
||||
`parse_endpoints` import and 7 new test functions
|
||||
|
||||
## Human edits
|
||||
|
||||
None — committed as generated.
|
||||
|
|
@ -1,32 +0,0 @@
|
|||
---
|
||||
model: claude-opus-4-6
|
||||
service: claude
|
||||
timestamp: 2026-04-13T20:50:48Z
|
||||
git_ref: 269d939c
|
||||
diff_cmd: git diff 269d939c~1..269d939c
|
||||
---
|
||||
|
||||
## Implementation output
|
||||
|
||||
### `tractor/discovery/_multiaddr.py`
|
||||
|
||||
> `git diff 269d939c~1..269d939c -- tractor/discovery/_multiaddr.py`
|
||||
|
||||
Type aliases `EndpointsTable` and `ParsedEndpoints`
|
||||
plus `parse_endpoints()` helper that wraps each
|
||||
entry via `wrap_address()`.
|
||||
|
||||
### `tests/discovery/test_multiaddr.py`
|
||||
|
||||
> `git diff 269d939c~1..269d939c -- tests/discovery/test_multiaddr.py`
|
||||
|
||||
Add import of `parse_endpoints` to existing import
|
||||
block, then 7 new test functions.
|
||||
|
||||
### Test run output
|
||||
|
||||
```
|
||||
22 passed, 1 warning in 0.05s
|
||||
```
|
||||
|
||||
All 22 tests pass (15 existing + 7 new).
|
||||
|
|
@ -1,38 +0,0 @@
|
|||
---
|
||||
model: claude-opus-4-6
|
||||
service: claude
|
||||
session: multiaddr-support-rename-prefer
|
||||
timestamp: 2026-04-14T16:33:00Z
|
||||
git_ref: befedc49
|
||||
scope: code
|
||||
substantive: true
|
||||
raw_file: 20260414T163300Z_befedc49_prompt_io.raw.md
|
||||
---
|
||||
|
||||
## Prompt
|
||||
|
||||
Create a helper function that determines the best transport given
|
||||
actor locality (distributed vs same host). Use PID/hostname
|
||||
comparison for locality detection, apply at registry addr selection
|
||||
only (not spawn-time).
|
||||
|
||||
## Response summary
|
||||
|
||||
New `prefer_addr()` + `_is_local_addr()` helpers
|
||||
in `_api.py` using `socket.getaddrinfo()` and
|
||||
`ipaddress` for PID/hostname locality detection.
|
||||
Preference: UDS > local TCP > remote TCP.
|
||||
Integrated into `query_actor()` and
|
||||
`wait_for_actor()`. Also changed
|
||||
`Registrar.find_actor()` to return full addr list
|
||||
so callers can apply preference.
|
||||
|
||||
## Files changed
|
||||
|
||||
- `tractor/discovery/_discovery.py` → `_api.py`
|
||||
— renamed + added `prefer_addr()`,
|
||||
`_is_local_addr()`; updated `query_actor()` and
|
||||
`wait_for_actor()` call sites
|
||||
- `tractor/discovery/_registry.py`
|
||||
— `Registrar.find_actor()` returns
|
||||
`list[UnwrappedAddress]|None`
|
||||
|
|
@ -1,62 +0,0 @@
|
|||
---
|
||||
model: claude-opus-4-6
|
||||
service: claude
|
||||
timestamp: 2026-04-14T16:33:00Z
|
||||
git_ref: befedc49
|
||||
diff_cmd: git diff befedc49~1..befedc49
|
||||
---
|
||||
|
||||
### `tractor/discovery/_api.py`
|
||||
|
||||
> `git diff befedc49~1..befedc49 -- tractor/discovery/_api.py`
|
||||
|
||||
Add `_is_local_addr()` and `prefer_addr()` transport
|
||||
preference helpers.
|
||||
|
||||
#### `_is_local_addr(addr: Address) -> bool`
|
||||
|
||||
Determines whether an `Address` is reachable on the
|
||||
local host:
|
||||
|
||||
- `UDSAddress`: always returns `True`
|
||||
(filesystem-bound, inherently local)
|
||||
- `TCPAddress`: checks if `._host` is a loopback IP
|
||||
via `ipaddress.ip_address().is_loopback`, then
|
||||
falls back to comparing against the machine's own
|
||||
interface IPs via
|
||||
`socket.getaddrinfo(socket.gethostname(), None)`
|
||||
|
||||
#### `prefer_addr(addrs: list[UnwrappedAddress]) -> UnwrappedAddress`
|
||||
|
||||
Selects the "best" transport address from a
|
||||
multihomed actor's address list. Wraps each
|
||||
candidate via `wrap_address()` to get typed
|
||||
`Address` objects, then classifies into three tiers:
|
||||
|
||||
1. **UDS** (same-host guaranteed, lowest overhead)
|
||||
2. **TCP loopback / same-host IP** (local network)
|
||||
3. **TCP remote** (only option for distributed)
|
||||
|
||||
Within each tier, the last-registered (latest) entry
|
||||
is preferred. Falls back to `addrs[-1]` if no
|
||||
heuristic matches.
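
The locality check and the three-tier preference described above can be sketched with the standard library. This is an illustration only, not the code in `_api.py`: `is_local_host` and `prefer` are hypothetical names, and the toy convention that a `str` is a UDS path while a `(host, port)` tuple is TCP is an assumption made for this example:

```python
import ipaddress
import socket


def is_local_host(host: str) -> bool:
    """Loopback check first, then compare against this machine's IPs."""
    try:
        if ipaddress.ip_address(host).is_loopback:
            return True
    except ValueError:
        pass  # not an IP literal (e.g. a hostname)
    try:
        infos = socket.getaddrinfo(socket.gethostname(), None)
    except OSError:
        return False  # hostname lookup failed; treat as non-local
    return host in {info[4][0] for info in infos}


def prefer(addrs: list):
    """Pick UDS > local TCP > remote TCP; latest entry wins per tier."""

    def tier(addr) -> int:
        if isinstance(addr, str):  # toy convention: str == UDS path
            return 0
        host, _port = addr
        return 1 if is_local_host(host) else 2

    best = min(tier(a) for a in addrs)
    return [a for a in addrs if tier(a) == best][-1]


candidates = [
    ("203.0.113.7", 1616),  # documentation-range IP
    ("127.0.0.1", 1617),
    "/tmp/toy_actor.sock",
    ("127.0.0.1", 1618),
]
print(prefer(candidates))
print(prefer([("127.0.0.1", 1617), ("127.0.0.1", 1618)]))
```

The sketch does not handle an empty address list; the description above says the real helper falls back to `addrs[-1]` when no heuristic matches.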
|
||||
|
||||
### `tractor/discovery/_registry.py`
|
||||
|
||||
> `git diff befedc49~1..befedc49 -- tractor/discovery/_registry.py`
|
||||
|
||||
`Registrar.find_actor()` return type broadened from
|
||||
single addr to `list[UnwrappedAddress]|None` — full
|
||||
addr list lets callers apply transport preference.
|
||||
|
||||
#### Integration
|
||||
|
||||
`query_actor()` and `wait_for_actor()` now call
|
||||
`prefer_addr(addrs)` instead of `addrs[-1]`.
|
||||
|
||||
### Verification
|
||||
|
||||
All discovery tests pass (13/13 non-daemon).
|
||||
`test_local.py` and `test_multi_program.py` also
|
||||
pass (daemon fixture teardown failures are
|
||||
pre-existing and unrelated).
|
||||
|
|
@ -1,101 +0,0 @@
|
|||
---
|
||||
model: claude-opus-4-7[1m]
|
||||
service: claude
|
||||
session: subints-spawner-design-kickoff
|
||||
timestamp: 2026-04-17T03:49:18Z
|
||||
git_ref: 9703210
|
||||
scope: docs
|
||||
substantive: true
|
||||
raw_file: 20260417T034918Z_9703210_prompt_io.raw.md
|
||||
---
|
||||
|
||||
## Prompt
|
||||
|
||||
Drive the "first big boi, from GH issue" task seeded by
|
||||
`ai/prompt-io/prompts/subints_spawner.md`: design, plan
|
||||
and implement sub-interpreter (subint) spawn-backend
|
||||
support per issue #379, including (1) modularizing
|
||||
`tractor.spawn._spawn` into per-backend submods, (2) a new
|
||||
`._subint` backend, and (3) harness parametrization via the
|
||||
existing `--spawn-backend` / `start_method` pytest fixture
|
||||
in `tractor._testing.pytest`.
|
||||
|
||||
Follow-up clarifications from the user (this turn):
|
||||
1. Pin `<3.15` on this dev branch and feature-gate subint
|
||||
tests — chose option (a).
|
||||
2. Split Phase A (modularization) into its own PR first.
|
||||
3. Defer the `fork()`-via-subint hack to a follow-up.
|
||||
4. Harness flag is `pytest --spawn-backend <key>` CLI →
|
||||
`start_method` fixture (prompt file updated to match).
|
||||
|
||||
## Response summary
|
||||
|
||||
Produced a three-phase plan and a concrete Phase A (pure
|
||||
modularization) file-split plan for user review; no code
|
||||
written yet — the green-light to start Phase A was given
|
||||
in this same turn conditional on logging this prompt-io
|
||||
entry first.
|
||||
|
||||
Phases:
|
||||
- **A — modularize** `tractor/spawn/_spawn.py` (847 LOC):
|
||||
keep generic machinery in `_spawn.py`, extract
|
||||
`trio_proc` → `spawn/_trio.py`, `mp_proc` →
|
||||
`spawn/_mp.py`. No pin bump.
|
||||
- **B — `_subint` backend**: bump `pyproject.toml`
|
||||
`requires-python` upper to `<3.15`; add `'subint'` to
|
||||
`SpawnMethodKey`; reuse existing UDS transport; shm
|
||||
escape-hatch deferred.
|
||||
- **C — harness**: drive the valid-backend tuple in
|
||||
`tractor/_testing/pytest.py:345-349` from
|
||||
`typing.get_args(SpawnMethodKey)`; skip subint tests on
|
||||
Python < 3.14.
|
||||
|
||||
Key findings surfaced to the user:
|
||||
- `pyproject.toml:12` currently pins `<3.14`; PEP 734
|
||||
`concurrent.interpreters` only ships in 3.14 — the
|
||||
load-bearing constraint.
|
||||
- `_testing/pytest.py:345-349` hardcodes valid backends
|
||||
as a string tuple (`'mp_spawn'`, `'mp_forkserver'`,
|
||||
`'trio'`) — should be `get_args(SpawnMethodKey)`.
|
||||
- `_testing/pytest.py:228` already imports
|
||||
`try_set_start_method` from `tractor.spawn._spawn` —
|
||||
keeping the `_spawn.py` path as the "core" module
|
||||
avoids breaking external refs during Phase A.
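
Deriving the valid-backend tuple from the type alias, as the second finding suggests, relies on `typing.get_args`. This is a minimal sketch: `ToySpawnMethodKey` and `check_backend` are hypothetical, and the real `SpawnMethodKey` definition in tractor is assumed here to be a `Literal` of backend names:

```python
from typing import Literal, get_args

# Hypothetical stand-in for tractor's `SpawnMethodKey` alias.
ToySpawnMethodKey = Literal['trio', 'mp_spawn', 'mp_forkserver']

# One source of truth: the tuple follows the alias automatically,
# so adding a member to the Literal needs no second edit here.
valid_backends: tuple[str, ...] = get_args(ToySpawnMethodKey)


def check_backend(name: str) -> str:
    """Validate a CLI-provided backend name against the alias."""
    if name not in valid_backends:
        raise ValueError(
            f'unknown spawn backend {name!r}, expected one of {valid_backends}'
        )
    return name


print(valid_backends)
```

With this shape, a new backend key added to the alias becomes a valid harness option without touching the validation code.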

Phase A commit breakdown (each green via `/run-tests`):
1. Drive `_testing/pytest.py` backend-tuple from
   `get_args(SpawnMethodKey)` — prep.
2. Extract `trio_proc` → `spawn/_trio.py`; `_methods`
   registry imports it.
3. Extract `mp_proc` → `spawn/_mp.py`; `_methods` imports
   it.
4. Tidy `spawn/__init__.py` docstring for the new layout.

Workflow: Phase A runs in a new worktree via `/open-wkt
spawn_modularize` (branch `wkt/spawn_modularize`) and
lands as its own PR before Phase B/C returns to the
current `claude_bun_opus4.7_subint_spawner_backend`
branch.

## Files changed

None this turn — design/planning only. The user updated
`ai/prompt-io/prompts/subints_spawner.md` themselves to
complete the truncated sentence on line 32 (harness-flag
reference).

Upcoming (Phase A, next turn):
- `tractor/spawn/_spawn.py` — trimmed to "core"
- `tractor/spawn/_trio.py` — new; holds `trio_proc()`
- `tractor/spawn/_mp.py` — new; holds `mp_proc()`
- `tractor/spawn/__init__.py` — docstring refresh
- `tractor/_testing/pytest.py` — backend tuple from
  `get_args(SpawnMethodKey)`

## Human edits

None — committed as generated. User completed the
truncated sentence on line 32 of
`ai/prompt-io/prompts/subints_spawner.md` themselves
*before* this prompt-io log was written; that edit was
the user's authoring, not an edit to AI output.

@ -1,117 +0,0 @@
---
model: claude-opus-4-7[1m]
service: claude
timestamp: 2026-04-17T03:49:18Z
git_ref: 9703210
diff_cmd: git diff HEAD~1..HEAD
---

No code was produced in this turn — this is a design/planning
exchange for subint spawner support per issue #379. Captured
verbatim below are the two substantive AI responses, which
together fix the phased plan (A: modularize, B: subint, C:
harness) and the concrete Phase A file split before any
implementation begins.

## Turn 1 — initial triage + gating questions

> Read: issue #379, current `spawn/_spawn.py` (847 lines),
> and pyproject pins.
>
> ## Task summary
> Design/plan/implement subint backend per #379, plus
> modularize `spawn/_spawn.py` into per-backend submods, plus
> harness flag for test parametrization.
>
> ## Key finding
> `pyproject.toml:12` pins `requires-python = ">=3.12, <3.14"`
> — but `concurrent.interpreters` / PEP 734 **only lands in
> 3.14**. This is the load-bearing design question before
> coding.
>
> ## Proposed phasing (incremental commits)
> - **Phase A — modularize** `_spawn.py`: extract `trio_proc`
>   → `spawn/_trio.py`, `mp_proc` → `spawn/_mp.py`, keep
>   generic helpers (`hard_kill`/`soft_kill`/`new_proc`
>   dispatcher/`SpawnMethodKey`) in a slim core. Tests green
>   at each step.
> - **Phase B — `_subint` backend**: extend `SpawnMethodKey`
>   with `'subint'`; reuse UDS transport by default; shm is
>   escape-hatch follow-up.
> - **Phase C — harness**: add backend-parametrize flag, skip
>   subint when Python < 3.14.
>
> ## Gating questions before I start
> 1. **Python pin**: bump `<3.15` and feature-gate subint
>    tests (a), require 3.14 outright (b), or backport shim
>    (c)? I'd recommend (a).
> 2. **PR strategy**: one branch, commits per phase (current
>    `claude_bun_opus4.7_subint_spawner_backend`)? Or split
>    Phase A as its own PR first?
> 3. **`fork()`-via-subint hack** from the issue's "Our own
>    thoughts" section — v1 or follow-up? I'd defer to
>    follow-up.
> 4. **Prompt file line 32 is truncated** — what's the
>    existing harness flag name you had in mind ("using the
>    existing ___")?
>
> Which of these are already decided vs. open?

## User answers (for the raw record)

1. bump `<3.15`, feature-gate — option (a).
2. split Phase A as its own PR first.
3. defer `fork()`-via-subint hack.
4. harness flag is `--spawn-backend` CLI → `start_method`
   fixture, per `tractor._testing._pytest`. Prompt file
   updated accordingly.

## Turn 2 — concrete Phase A plan

> Good — prompt file now ends cleanly; harness flag
> confirmed as `--spawn-backend` CLI → `start_method`
> fixture (via `_testing/pytest.py:193,355`). Also noted
> `pytest.py:345-349` hardcodes valid backends as a string
> tuple — `typing.get_args(SpawnMethodKey)` should drive
> that list.
>
> ## Phase A — concrete plan (separate PR)
>
> **File split** in `tractor/spawn/`:
> - `_spawn.py` stays as "core": keeps `SpawnMethodKey`,
>   `_spawn_method`, `proc_waiter`, `try_set_start_method`,
>   `exhaust_portal`, `cancel_on_completion`, `hard_kill`,
>   `soft_kill`, `new_proc` dispatcher, `_methods`
>   registry. Kept at this path so `_testing/pytest.py:228`
>   and other external imports don't break.
> - `_trio.py` ← `trio_proc()` (lines 448–671, ~224 lines)
> - `_mp.py` ← `mp_proc()` (lines 672–842, ~171 lines)
> - `_methods` dict imports the two backend fns from the
>   new modules.
>
> **Commits** (each green via `/run-tests`):
> 1. Drive `_testing/pytest.py` backend-tuple from
>    `get_args(SpawnMethodKey)` — tiny prep.
> 2. Extract `trio_proc` → `spawn/_trio.py`; `_methods`
>    imports it.
> 3. Extract `mp_proc` → `spawn/_mp.py`; `_methods`
>    imports it.
> 4. Tidy `spawn/__init__.py` docstring to describe new
>    layout.
>
> No pyproject pin bump in Phase A (that's Phase B).
>
> **Workflow**: I'd run this in a new worktree via
> `/open-wkt spawn_modularize` (branch
> `wkt/spawn_modularize`), then open a PR before returning
> to the current `claude_bun_opus4.7_subint_spawner_backend`
> branch for Phase B/C.
>
> Green-light to `/open-wkt` and start Phase A?

## User final response

> all this first then yes, green light.

— log this `/prompt-io` entry first, then proceed with
Phase A in a worktree.

@ -1,86 +0,0 @@
---
model: claude-opus-4-7[1m]
service: claude
session: subints-spawner-phase-a-impl
timestamp: 2026-04-17T03:58:05Z
git_ref: 61a73ba
scope: code
substantive: true
raw_file: 20260417T035800Z_61a73ba_prompt_io.raw.md
---

## Prompt

Follow-up from the design session logged in
`20260417T034918Z_9703210_prompt_io.md`: the user
greenlit Phase A after `/prompt-io` tracking, so
execute the Phase A "modularize `spawn._spawn`" plan
end-to-end in a new worktree — four logical steps
(pytest harness prep, `trio_proc` extraction,
`mp_proc` extraction, docstring refresh), each green
via `/run-tests`.

User directives during execution:
- **Option 3** for the test cadence ("continue A.2–A.4
  first, then run the full suite once at the end of
  Phase A").
- **One commit** for the whole phase ("can't we just
  commit the whole patch in one commit?") instead of
  the 3/4-commit split I initially proposed.
- **Don't pre-draft** commit messages — wait for the
  user to invoke `/commit-msg` (captured as feedback
  memory `feedback_no_auto_draft_commit_msgs.md`).

## Response summary

Produced the cohesive Phase A modularization patch,
landed as commit `61a73bae` (subject: "Mv
`trio_proc`/`mp_proc` to per-backend submods"). Five
files changed, +565 / -418 lines.

Key pieces of the patch (generated by claude,
reviewed by the human before commit):
- `tractor/spawn/_trio.py` — **new**; receives
  `trio_proc()` verbatim from `_spawn.py`; imports
  cross-backend helpers back from `._spawn`.
- `tractor/spawn/_mp.py` — **new**; receives
  `mp_proc()` verbatim; uses `from . import _spawn`
  for late-binding access to the mutable `_ctx` /
  `_spawn_method` globals (design decision made
  during impl, not the original plan).
- `tractor/spawn/_spawn.py` — shrunk 847 → 448 LOC;
  import pruning; bottom-of-module late imports for
  `trio_proc` / `mp_proc` with a one-line comment
  explaining the circular-dep reason.
- `tractor/spawn/__init__.py` — docstring refresh
  describing the new layout.
- `tractor/_testing/pytest.py` — the valid-backend
  set now comes from `typing.get_args(SpawnMethodKey)`
  so future additions (`'subint'`) don't need harness
  edits.

## Files changed

See `git diff 61a73ba~1..61a73ba --stat`:

```
 tractor/_testing/pytest.py |  12 +-
 tractor/spawn/__init__.py  |  31 +++-
 tractor/spawn/_mp.py       | 235 ++++++++++++++++++++++++
 tractor/spawn/_spawn.py    | 413 +-------------------------------
 tractor/spawn/_trio.py     | 292 ++++++++++++++++++++++++++++
 5 files changed, 565 insertions(+), 418 deletions(-)
```

Validation:
- import probe + `_methods` wiring check — OK
- spawn-relevant test subset — 37 passed, 1 skipped
- full suite — 350 passed, 14 skipped, 7 xfailed, 1
  xpassed

## Human edits

None — committed as generated by claude (no diff
between `.claude/git_commit_msg_LATEST.md` and the
committed body, as far as the assistant could
observe).

@ -1,138 +0,0 @@
---
model: claude-opus-4-7[1m]
service: claude
timestamp: 2026-04-17T03:58:05Z
git_ref: 61a73ba
diff_cmd: git diff 61a73ba~1..61a73ba
---

Code generated in this turn was committed verbatim as
`61a73bae` ("Mv `trio_proc`/`mp_proc` to per-backend
submods"). Per diff-ref mode, per-file code is captured
via the pointers below, each followed by a prose
summary of what the AI generated. Non-code output
(sanity-check results, design rationale) is included
verbatim.

## Per-file generated content

### `tractor/spawn/_trio.py` (new, 292 lines)

> `git diff 61a73ba~1..61a73ba -- tractor/spawn/_trio.py`

Pure lift-and-shift of `trio_proc()` out of
`tractor/spawn/_spawn.py` (previously lines 448–670).
Added AGPL header + module docstring describing the
backend; imports include local `from ._spawn import
cancel_on_completion, hard_kill, soft_kill` which
creates the bottom-of-module late-import pattern in
the core file to avoid a cycle. All call sites,
log-format strings, and body logic are byte-identical
to the originals — no semantic change.

### `tractor/spawn/_mp.py` (new, 235 lines)

> `git diff 61a73ba~1..61a73ba -- tractor/spawn/_mp.py`

Pure lift-and-shift of `mp_proc()` out of
`tractor/spawn/_spawn.py` (previously lines 672–842).
Same AGPL header convention. Key difference from
`_trio.py`: uses `from . import _spawn` (module
import, not from-import) for `_ctx` and
`_spawn_method` references — these are mutated at
runtime by `try_set_start_method()`, so late binding
via `_spawn._ctx` / `_spawn._spawn_method` is required
for correctness. Also imports `cancel_on_completion`,
`soft_kill`, `proc_waiter` from `._spawn`.
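
Why the module import matters can be shown with a small stand-alone sketch. Everything here is a stand-in: `demo_core` is a throwaway module built in memory, and its `_spawn_method` / `try_set_start_method()` only imitate the shape of the real globals.

```python
import sys
import types

# Build a throwaway "core" module holding a global that gets
# rebound at runtime (hypothetical names throughout).
core = types.ModuleType('demo_core')
exec(
    "_spawn_method = 'trio'\n"
    "def try_set_start_method(key):\n"
    "    global _spawn_method\n"
    "    _spawn_method = key\n",
    core.__dict__,
)
sys.modules['demo_core'] = core

# from-import: copies the *current* binding into this namespace.
from demo_core import _spawn_method as early_bound

# module import: the attribute lookup is deferred to each access.
import demo_core

demo_core.try_set_start_method('mp_spawn')

stale: str = early_bound              # still 'trio'
fresh: str = demo_core._spawn_method  # sees 'mp_spawn'
```

The from-imported name keeps pointing at the old object after the rebind, which is the failure mode the `_spawn._ctx` / `_spawn._spawn_method` access style sidesteps.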

### `tractor/spawn/_spawn.py` (modified, 847 → 448 LOC)

> `git diff 61a73ba~1..61a73ba -- tractor/spawn/_spawn.py`

- removed `trio_proc()` body (moved to `_trio.py`)
- removed `mp_proc()` body (moved to `_mp.py`)
- pruned imports now unused in core: `sys`,
  `is_root_process`, `current_actor`,
  `is_main_process`, `_mp_main`, `ActorFailure`,
  `pretty_struct`, `_pformat`
- added bottom-of-file late imports
  `from ._trio import trio_proc` and
  `from ._mp import mp_proc` with a one-line
  comment explaining why (circular dep)
- `_methods` dict unchanged structurally; still binds
  `'trio' → trio_proc`, `'mp_spawn' → mp_proc`,
  `'mp_forkserver' → mp_proc`
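
The bottom-of-file import pattern can be reproduced in miniature. The sketch below writes a hypothetical two-module package (`demo_spawn_pkg`, with `_core.py` and `_backend.py` standing in for `_spawn.py` and a backend module) to a temp dir and imports the core first; it is not the actual `tractor` layout or code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# "Core" module: defines helpers and the registry first, then
# pulls in the backend at the very bottom.
CORE_SRC = '''
def soft_kill(proc):
    return f'soft-killed {proc}'

_methods = {}

# Late import: `soft_kill` already exists at this point, so the
# backend's `from ._core import soft_kill` succeeds mid-cycle.
from ._backend import demo_proc  # noqa: E402
_methods['demo'] = demo_proc
'''

# "Backend" module: needs a helper back from the core.
BACKEND_SRC = '''
from ._core import soft_kill

def demo_proc(name):
    return soft_kill(name)
'''


def build_and_import() -> dict:
    pkg_root = Path(tempfile.mkdtemp())
    pkg = pkg_root / 'demo_spawn_pkg'
    pkg.mkdir()
    (pkg / '__init__.py').write_text('')
    (pkg / '_core.py').write_text(CORE_SRC)
    (pkg / '_backend.py').write_text(BACKEND_SRC)
    sys.path.insert(0, str(pkg_root))
    try:
        core = importlib.import_module('demo_spawn_pkg._core')
    finally:
        sys.path.remove(str(pkg_root))
    return core._methods


methods = build_and_import()
```

Moving the `from ._backend import ...` line to the top of `_core.py` would make the backend's import fail, since `soft_kill` would not yet exist on the partially initialized core module.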

### `tractor/spawn/__init__.py` (modified)

> `git diff 61a73ba~1..61a73ba -- tractor/spawn/__init__.py`

Docstring rewrite only — no code. New text describes
the per-backend `Layout` with `- ._spawn` (core),
`- ._trio` (default), `- ._mp` (two mp variants), and
reiterates the "no eager import" NOTE.

### `tractor/_testing/pytest.py` (modified)

> `git diff 61a73ba~1..61a73ba -- tractor/_testing/pytest.py`

Two-line logical change (plus a comment):
- `from typing import` grows `get_args`
- `pytest_generate_tests()` replaces the hardcoded
  `('mp_spawn','mp_forkserver','trio')` tuple with
  `get_args(SpawnMethodKey)` via lazy import
  `from tractor.spawn._spawn import SpawnMethodKey`
  inside the function body (matching the project's
  existing lazy-import convention for the spawn
  module)

## Non-code output (verbatim)

### Design rationale for the file split

From my Phase A plan turn (already captured in
`20260417T034918Z_9703210_prompt_io.md`):
- `_spawn.py` stays as the "core" rather than renamed
  to `_core.py` so external imports
  (`_testing/pytest.py:228` imports
  `try_set_start_method` from `tractor.spawn._spawn`)
  keep working without churn.
- Per-backend extraction chosen over alternatives
  (e.g. splitting generic helpers further) because
  the immediate motivation is hosting a 3rd
  `_subint.py` sibling cleanly in Phase B.

### Sanity-check output (verbatim terminal excerpts)

Post-extraction import probe:
```
extraction OK
_methods: {'trio': 'tractor.spawn._trio.trio_proc',
'mp_spawn': 'tractor.spawn._mp.mp_proc',
'mp_forkserver': 'tractor.spawn._mp.mp_proc'}
```

Spawn-relevant test subset (`tests/test_local.py
test_rpc.py test_spawning.py test_multi_program.py
test_discovery.py`):
```
37 passed, 1 skipped, 14 warnings in 55.37s
```

Full suite:
```
350 passed, 14 skipped, 7 xfailed, 1 xpassed,
151 warnings in 437.73s (0:07:17)
```

No regressions vs. `main`. One transient `-x`
early-stop `ERROR` on
`test_close_channel_explicit_remote_registrar[trio-True]`
was flaky (passed solo, passed without `-x`), not
caused by this refactor.

### Commit message

Also AI-drafted (via `/commit-msg`) — the 40-line
message on commit `61a73bae` itself. Not reproduced
here; see `git log -1 61a73bae`.

@ -1,146 +0,0 @@
---
model: claude-opus-4-7[1m]
service: claude
session: trio-0.33-subproc-supervisor-retroactive
timestamp: 2026-06-01T23:14:29Z
git_ref: 0e3e008b
scope: code
substantive: true
raw_file: 20260601T231429Z_0e3e008b_prompt_io.raw.md
---

## Prompt

**RETROACTIVE LOG** — original session prompts not
preserved; reconstructed from the staged work product.

The work designs a `trio.Nursery.start()`-style wrapper
around `trio.run_process()` for SC-friendly subprocess
supervision. From the resulting code shape, the
prompting intent was:

1. Surface rc!=0 `CalledProcessError` DETERMINISTICALLY,
   without the nursery-eg-wrapping that complicates
   `collapse_eg()` usage and races the relay reader on
   trio's `check=True`-driven cancel cascade.
2. ALWAYS isolate the parent controlling-tty so a
   spawned child can't emit terminal control-seqs onto
   the launching tty (clobbering scrollback). Default
   `stdin=DEVNULL`; default `stdout=DEVNULL` unless
   explicitly relayed/overridden; distinguish "caller
   passed nothing" from "caller passed `None` for
   inherit".
3. Optional live per-line relay of child std-streams to
   the `tractor` log — STREAMED (not
   buffered-until-exit) so long-lived daemon output is
   visible during the run. Pick a custom log level that
   shows at usual `info`/`devx` console levels but is
   separately filterable.
4. Concurrent pipe-drain reader MANDATORY when piping
   without `capture_*` — without it the child blocks on
   `write()` once the OS pipe buffer fills (~64KiB),
   causing deadlocks on output bursts.
5. Non-blocking `tn.start()` semantics: hand the live
   `trio.Process` to the parent immediately;
   supervise/relay run to completion in the supervisor
   coro.
6. Hermetic `trio`-only unit tests (no actor-runtime)
   covering each of: per-line relay, tty isolation,
   no-deadlock on >64KiB unnewlined output, CPE
   rebuild w/ stderr relay, CPE rebuild on the silent
   drain+capture path.

## Response summary

Adds `tractor/trionics/_subproc.py` (296 LOC) +
`tests/trionics/test_subproc.py` (230 LOC) + a
re-export in `tractor/trionics/__init__.py`.

**`supervise_run_process()`** (public, re-exported)
- `check=False` is forced to `trio.run_process`; the
  rc-check runs in the supervisor coro AFTER `own_tn`
  unwinds (both the child AND the relay readers have
  hit EOF + fully drained). A BARE
  `subprocess.CalledProcessError` is rebuilt + raised
  from there, with `.stderr` bytes passed in the
  constructor AND attached as an `add_note()`'d
  `|_.stderr:` block for legible teardown logs.
- `stdin=DEVNULL` always. `stdout` default chosen via a
  `_UNSET` sentinel: `relay_stdout=True` → PIPE,
  explicit `stdout=...` → as given, else `DEVNULL`.
  `stderr` defaults to PIPE whenever we relay OR need
  the CPE note (when `check=True`), else `DEVNULL`.
- `relay_level='io'` (custom level 21; sorts just
  above stdlib `INFO`=20 so it shows at usual
  `info`/`devx` levels and stays separately
  filterable). `runtime`=15 would silently filter at
  default levels, so it's rejected as a default.
- `task_status.started(trio_proc)` delivers the live
  process immediately. The internal `own_tn`
  supervises `trio.run_process` + any relay readers to
  completion.
- `**run_process_kwargs` forward verbatim;
  `stdin/stdout/stderr/check` are MANAGED keys
  (override on conflict).
- Crash-handling deliberately NOT baked in — compose
  `maybe_open_crash_handler()` on top at the call-site.
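
The stdio-defaulting rules in the second bullet might look something like the following sketch. `resolve_stdio()` is a hypothetical helper written only from the summary above; in particular, letting `relay_stdout=True` win over an explicit `stdout=` and honoring an explicit `stderr=` are assumptions, not confirmed behavior of `_subproc.py`.

```python
import subprocess
from typing import Any

# Module-private sentinel: distinguishes "argument omitted" from
# an explicit `None` (which `subprocess` treats as "inherit").
_UNSET: Any = object()


def resolve_stdio(
    stdout: Any = _UNSET,
    stderr: Any = _UNSET,
    relay_stdout: bool = False,
    relay_stderr: bool = False,
    check: bool = True,
) -> dict[str, Any]:
    '''
    Hypothetical helper mirroring the defaulting rules
    summarized above; not the actual `_subproc.py` code.

    '''
    if relay_stdout:
        stdout = subprocess.PIPE
    elif stdout is _UNSET:
        stdout = subprocess.DEVNULL
    # else: keep whatever the caller passed, `None` included.

    if stderr is _UNSET:
        needs_pipe: bool = relay_stderr or check
        stderr = (
            subprocess.PIPE if needs_pipe
            else subprocess.DEVNULL
        )

    return {
        'stdin': subprocess.DEVNULL,  # always isolated
        'stdout': stdout,
        'stderr': stderr,
    }
```

With a plain `None` default the `stdout=None` (inherit) request would be indistinguishable from "nothing passed", which is the case the sentinel exists to separate.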

**`_relay_stream_lines()`** (internal helper)
- Three modes (combinable): `emit`-only (live per-line
  relay), `accum`-only (silent drain+capture for a CPE
  note), or both (live relay AND capture).
- Per-line split handles cross-chunk residuals via a
  rolling `residual` bytes buffer; flushes any trailing
  un-newline-term'd line at EOF.
- `async with stream:` ensures aclose at EOF/cancel
  (mirrors trio's internal `_subprocess` drain idiom).
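
The rolling-`residual` split can be illustrated with a synchronous stand-in. `iter_lines()` below is a hypothetical generator, not the async helper itself: an iterable of byte chunks plays the role of successive stream reads.

```python
from collections.abc import Iterable, Iterator


def iter_lines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    '''
    Yield complete lines from arbitrarily-sized byte chunks.

    A synchronous stand-in for the async reader: `chunks`
    plays the role of successive stream reads.

    '''
    residual: bytes = b''
    for chunk in chunks:
        # Prepend whatever partial line the last chunk left
        # over; the final split element is the new remainder
        # (empty when the chunk ended on a newline).
        *lines, residual = (residual + chunk).split(b'\n')
        yield from lines

    # EOF: flush a trailing line that never got its newline.
    if residual:
        yield residual
```

A line that straddles two reads is only emitted once its newline arrives, and output that never ends in a newline is still delivered at EOF.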

**`_add_stderr_note()`** (internal helper)
- `add_note()`s a `textwrap.indent(...)`'d
  `|_.stderr:` block onto a `CalledProcessError` for
  teardown logs.
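
A rough sketch of that rebuild-plus-note step, using only the stdlib. `build_cpe()` is a hypothetical name, and the exact note layout (two-space indent under a `|_.stderr:` header) is assumed for illustration.

```python
import subprocess
import textwrap


def build_cpe(
    returncode: int,
    cmd: list[str],
    stderr: bytes,
) -> subprocess.CalledProcessError:
    '''
    Hypothetical helper: rebuild a bare `CalledProcessError`
    carrying `.stderr`, plus a readable note for tracebacks.

    '''
    cpe = subprocess.CalledProcessError(
        returncode,
        cmd,
        stderr=stderr,
    )
    text: str = stderr.decode(errors='replace')
    if text:
        # `BaseException.add_note()` needs Python >= 3.11.
        cpe.add_note(
            '|_.stderr:\n'
            + textwrap.indent(text, prefix='  ')
        )
    return cpe
```

Programmatic callers read `cpe.stderr`, while the note is what appears under the traceback when the exception is rendered.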

**Tests** (5 hermetic, trio-only) — `_capture_relay`
fixture monkeypatches `_subproc.log.<level>` to a list:
- `test_stdout_relayed_per_line`: per-line stdout
  relay carries each `line=N` to the records.
- `test_parent_tty_isolated`: `readlink /proc/self/fd/0`
  and `fd/1` from the child show `pipe:` (fd1) +
  `/dev/null` (fd0); NO `/dev/pts/*`.
- `test_no_deadlock_on_big_unnewlined_output`: 200KiB
  of `x` with no newlines completes inside
  `fail_after(2)` — exercises the concurrent drain.
- `test_stderr_relay_and_cpe_rebuild`: rc=3 with
  `relay_stderr=True` raises bare CPE
  (via `collapse_eg()`) with `b'boom' in cpe.stderr`,
  the note attached, AND per-line live relay.
- `test_nonrelay_cpe_note`: rc=7 with no relay still
  produces CPE with `.stderr` + note via the silent
  drain+capture path.

## Files changed

- `tractor/trionics/_subproc.py` — NEW. Public
  `supervise_run_process()` + helpers
  `_relay_stream_lines()` / `_add_stderr_note()` + the
  `_UNSET` sentinel.
- `tests/trionics/test_subproc.py` — NEW. 5 hermetic
  trio-only tests + `_capture_relay` monkeypatch
  fixture.
- `tractor/trionics/__init__.py` — re-export
  `supervise_run_process`.

## Human edits

**RETROACTIVE**: this log is being written from the
staged diff, not from a live session. The code as
staged is the canonical artifact; any human edits the
user made during the originating design session are
already integrated and cannot be separated post-hoc.
The `.raw.md` sibling is a diff-pointer placeholder,
NOT a pre-edit transcript.

Future prompt-io entries for in-flight work should be
written DURING the design session per the skill
contract so the pre-edit `.raw.md` captures the
unedited model output for genuine provenance.

@ -1,106 +0,0 @@
---
model: claude-opus-4-7[1m]
service: claude
timestamp: 2026-06-01T23:14:29Z
git_ref: 0e3e008b
diff_cmd: git diff HEAD~1..HEAD
---

# RETROACTIVE — original model output not preserved

This `.raw.md` would normally contain the verbatim
pre-human-edit response from the design session that
produced the staged `_subproc.py` module + tests. That
session's transcript is not available, so this file
serves as a diff-pointer placeholder + transparency
note.

## Authoritative artifact

The committed code IS the artifact of record. Once the
companion commit lands, the unified diff is:

> `git diff HEAD~1..HEAD -- tractor/trionics/_subproc.py`
> `git diff HEAD~1..HEAD -- tests/trionics/test_subproc.py`
> `git diff HEAD~1..HEAD -- tractor/trionics/__init__.py`

Before committing, substitute `--cached` for the
pre-commit form.

## What is NOT here

Because this is retroactive:
- No verbatim chain-of-thought / discussion prose from
  the design session.
- No rejected alternatives the model considered before
  arriving at the final shape (e.g. whether the
  rc-check should live inside `own_tn` vs after it; the
  `_UNSET` sentinel vs a `None`-means-DEVNULL
  convention; `io` vs `info` as the default relay
  level).
- No pre-edit code blocks as the model first emitted
  them, separable from any user cleanup applied before
  the diff was staged.

## Inferred design choices visible in the final code

(Documented here because they're the kind of decision
detail an unedited raw transcript would have captured.)

1. **Post-drain rc-check in the supervisor coro body,
   AFTER `own_tn.__aexit__`.** Placing the
   `CalledProcessError` raise here (not inside
   `own_tn`) means the EG-unwrap happens at the OUTER
   `tn.start()` boundary — callers do `collapse_eg()`
   if they want bare. Doing the raise INSIDE `own_tn`
   would cancel the still-draining relay reader
   mid-flight and lose stderr lines.

2. **`_UNSET` sentinel for `stdout`.** A plain default
   of `None` couldn't distinguish "use the safe
   `DEVNULL` default" from "caller explicitly passed
   `None` (inherit, presumably knowingly)". The
   sentinel keeps the SAFE default while letting power
   users opt into inherit.

3. **`relay_level='io'` (custom level 21).** Chosen to
   sort just above stdlib `INFO`=20 so a default
   `--ll info` shows the relay, but it remains a
   distinct level so users can filter
   `tractor.trionics:io` separately. Picking
   `runtime`=15 would have made the relay invisible at
   default verbosity (a footgun for daemon supervisors
   whose whole point is "I want to see this output").

4. **Reader is MANDATORY, not opt-in cosmetic.** With
   `stdout=PIPE` / `stderr=PIPE` we OWN the drain
   responsibility — there's no `trio.capture_*` running
   under the hood here. The ~64KiB OS pipe buffer
   means a child writing more than that without us
   reading hangs at `write()` — a deadlock that won't
   show up in small-output tests, which is why the
   200KiB-no-newline test is in the suite.

5. **`task_status.started(trio_proc)` BEFORE the
   `own_tn` exits.** Without this, `tn.start()` would
   block until the child exits — losing the "start a
   long-lived daemon and continue with parent work"
   use case. With it, the parent gets the live process
   handle immediately and the supervise+relay tasks
   run in the supervisor coro until the child exits.

6. **`__notes__` via `add_note()` for the CPE
   `.stderr`.** The `.stderr` attribute is what
   `subprocess` callers expect; the `add_note()` is
   what trio's exception-rendering shows. Both wired so
   programmatic AND human consumers see the stderr at
   teardown.
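
The level-ordering argument in item 3 can be checked with stdlib `logging` alone. The sketch assumes the numeric values quoted above (21 and 15); registering the names through `logging.addLevelName()` and the `OnlyIO` filter are illustrative stand-ins, not how `tractor.log` necessarily defines or filters its custom levels.

```python
import logging

# Numeric values as quoted in the notes; the registration via
# stdlib `logging` is an illustrative stand-in.
IO: int = 21
RUNTIME: int = 15
logging.addLevelName(IO, 'IO')
logging.addLevelName(RUNTIME, 'RUNTIME')

log = logging.getLogger('demo.subproc_relay')
log.setLevel(logging.INFO)  # a typical "default verbosity"

io_visible: bool = log.isEnabledFor(IO)            # 21 >= 20
runtime_visible: bool = log.isEnabledFor(RUNTIME)  # 15 < 20


class OnlyIO(logging.Filter):
    '''
    Pass only relay records, showing the level stays
    separately filterable from plain `INFO`.

    '''
    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno == IO
```

Sitting one notch above `INFO` keeps relay output visible by default while still giving handlers a distinct `levelno` to match on.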

## Honesty statement

This file's content is RECONSTRUCTED from the staged
code, not extracted from a verbatim model transcript.
The prompt-io skill's intent is for the `.raw.md` to
be a pre-edit fossil; that's not possible here. Future
work should write the prompt-io entry DURING the
design session.

@ -1,27 +0,0 @@
# AI Prompt I/O Log — claude

This directory tracks prompt inputs and model
outputs for AI-assisted development using
`claude` (Claude Code).

## Policy

Prompt logging follows the
[NLNet generative AI policy][nlnet-ai].
All substantive AI contributions are logged
with:
- Model name and version
- Timestamps
- The prompts that produced the output
- Unedited model output (`.raw.md` files)

[nlnet-ai]: https://nlnet.nl/foundation/policies/generativeAI/

## Usage

Entries are created by the `/prompt-io` skill
or automatically via `/commit-msg` integration.

Human contributors remain accountable for all
code decisions. AI-generated content is never
presented as human-authored work.

@ -1,76 +0,0 @@
ok now i want you to take a look at the most recent commit adding
a `tpt_bind_addrs` to `open_root_actor()` and extend the existing
tests/discovery/test_multiaddr* and friends to use this new param in
at least one suite with parametrizations over,

- `registry_addrs == tpt_bind_addrs`, as in both inputs are the same.
- `set(registry_addrs) >= set(tpt_bind_addrs)`, as in the registry
  addrs include the bind set.
- `registry_addrs != tpt_bind_addrs`, where the reg set is disjoint from
  the bind set in all possible combos you can imagine.

All of the ^above cases should further be parametrized over,
- the root being the registrar,
- a non-registrar root using our bg `daemon` fixture.

once we have a fairly thorough test suite and have flushed out all
bugs and edge cases we want to design a wrapping API which allows
declaring full trees of actors' tpt endpoints using multiaddrs such
that a `dict[str, list[str]]` of actor-name -> multiaddr can be used
to configure a tree of actors-as-services given such an input
"endpoints-table" can be matched with the number of appropriately
named subactor spawns in a `tractor` user-app.

Here is a small example from piker,

- in piker's root conf.toml we define a `[network]` section which can
  define various actor-service-daemon names set to a maddr
  (multiaddress str).

- each actor whether part of the `pikerd` tree (as a sub) or spawned
  in other non-registrar rooted trees (such as `piker chart`) should
  be configurable in terms of its `tractor` tpt bind addresses via
  a simple service lookup table,

```toml
[network]
pikerd = [
    '/ip4/127.0.0.1/tcp/6116', # std localhost daemon-actor tree
    '/uds/run/user/1000/piker/pikerd@6116.sock', # same but serving UDS
]
chart = [
    '/ip4/127.0.0.1/tcp/3333', # std localhost daemon-actor tree
    '/uds/run/user/1000/piker/chart@3333.sock',
]
```

We should take whatever common API is needed to support this and
distill it into a
```python
tractor.discovery.parse_endpoints(
) -> dict[
    str,
    list[Address]
    |dict[str, list[Address]]
    # ^recursive case, see below
]:
```

style API which can,

- be re-used easily across dependent projects.
- correctly raise tpt-backend support errors when a maddr specifying
  an unsupported proto is passed.
- be used to handle "tunnelled" maddrs per
  https://github.com/multiformats/py-multiaddr/#tunneling such that
  for any such tunneled maddr-`str`-entry we deliver a data-structure
  which can easily be passed to nested `@acm`s which consecutively
  setup nested net bindspaces for binding the endpoint addrs using
  a combo of our `.ipc.*` machinery and, say for example something like
  https://github.com/svinota/pyroute2, more precisely say for
  managing tunnelled wireguard eps within network-namespaces,
  * https://docs.pyroute2.org/
  * https://docs.pyroute2.org/netns.html

remember to include use of all default `.claude/skills` throughout
this work!

@ -1,34 +0,0 @@
This is your first big boi, "from GH issue" design, plan and
implement task.

We need to try and add sub-interpreter (aka subint) support per the
issue,

https://github.com/goodboy/tractor/issues/379

Part of this work should include,

- modularizing and thus better organizing the `.spawn.*` subpkg by
  breaking up various backends currently in `spawn._spawn` into
  separate submods where it makes sense.

- add a new `._subint` backend which tries to keep as much of the
  inter-process-isolation machinery in use as possible but with plans
  to optimize for localhost only benefits as offered by python's
  subints where possible.

  * utilizing localhost-only tpts like UDS, shm-buffers for
    performant IPC between subactors but also leveraging the benefits from
    the traditional OS subprocs mem/storage-domain isolation, linux
    namespaces where possible and as available/permitted by whatever
    is happening under the hood with how cpython implements subints.

  * default configuration should encourage state isolation as with
    subprocs, but explicit public escape hatches to enable rigorously
    managed shm channels for high performance apps.

- all tests should be (able to be) parameterized to use the new
  `subints` backend and enabled by flag in the harness using the
  existing `pytest --spawn-backend <spawn-backend>` support offered in
  the `open_root_actor()` and `.testing._pytest` harness override
  fixture.

@ -1,159 +0,0 @@
|
|||
# Logging-spec leaf-module granularity — "Route B" (decouple
|
||||
# logger-*identity* from console-*display*)
|
||||
|
||||
Follow-up notes recording the breaking-changes / costs of the
|
||||
deeper fix that would give the `tractor.log` logging-spec (see
|
||||
`LogSpec`/`apply_logspec()`) true **per-leaf-MODULE** level
|
||||
control — deliberately *not* taken (for now) in favour of the
|
||||
smaller sub-PACKAGE fix already landed.
|
||||
|
||||
## Status / what already shipped
|
||||
|
||||
The cheap, contained fix is **done**: `get_logger()`'s "strip
|
||||
#2" (`log.py`, the `pkg_path = subpkg_path` collapse) no longer
|
||||
eats a real sub-package component. It now strips the trailing
|
||||
token *only* when it duplicates the caller's leaf-*module*
|
||||
filename (which the header already shows via `{filename}`).
|
||||
|
||||
Result:
|
||||
|
||||
- `devx.debug` resolves to `tractor.devx.debug`, **distinct**
|
||||
from a bare `devx` -> `tractor.devx` (its parent). So the
|
||||
logging-spec can dial sub-package levels at any nesting depth
|
||||
(`devx.debug:runtime` ≠ `devx:cancel`).
|
||||
- The `get_logger(__name__)` cosmetic ("don't repeat the leaf
|
||||
module in `{name}` since `{filename}` shows it") is preserved.
|
||||
|
||||
What is **still NOT addressable** after that fix:
|
||||
|
||||
- **Per-leaf-MODULE** levels. Every module in a (sub-)pkg shares
|
||||
that pkg's logger, because `get_logger()` drops the leaf
|
||||
module-name from the logger key by design.
|
||||
- **Top-level lib modules** (eg. `tractor.to_asyncio`,
|
||||
`__package__ == 'tractor'`) emit on the *root* `tractor`
|
||||
logger, so a `to_asyncio:<lvl>` spec entry hits a phantom
|
||||
child -> no-op.
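
The "phantom child" case can be reproduced with the stdlib alone. The following is a minimal sketch, not `tractor.log` code: the `demo_pkg` logger name and the `ListHandler` class are hypothetical stand-ins, used so that no real logger config is touched.

```python
import logging

# Hypothetical stand-in for the package-level logger that every
# top-level module currently shares.
pkg_log = logging.getLogger('demo_pkg')
pkg_log.setLevel(logging.WARNING)
pkg_log.propagate = False  # keep the demo away from the stdlib root

records: list[str] = []


class ListHandler(logging.Handler):
    # collect `<logger-name>:<msg>` pairs instead of printing
    def emit(self, record: logging.LogRecord) -> None:
        records.append(f'{record.name}:{record.getMessage()}')


pkg_log.addHandler(ListHandler())

# A spec entry like `to_asyncio:debug` ends up setting the level on
# a child logger..
phantom = logging.getLogger('demo_pkg.to_asyncio')
phantom.setLevel(logging.DEBUG)

# ..but the module emits on the *parent*, so the child's level is
# never consulted and the record is dropped by the parent's level.
pkg_log.debug('from to_asyncio, emitted on the pkg logger')

# Only an emit on the child itself is gated by the child's level.
phantom.debug('emitted on the child')
```

After running this, `records` holds only the entry emitted on the child logger, which is why a spec entry for a module that never emits on its own logger has no visible effect.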
|
||||
|
||||
## What "Route B" is
|
||||
|
||||
Make the logger's *identity* the **full dotted module path**
|
||||
(incl. the leaf module + top-level modules), eg.
|
||||
`tractor.devx.debug._tty_lock` and `tractor.to_asyncio`, and
|
||||
move the cosmetic leaf-trim out of logger-naming and into the
|
||||
**formatter's `{name}` rendering**.
|
||||
|
||||
Net effect:
|
||||
|
||||
- Real per-module `Logger` nodes exist in the hierarchy ->
|
||||
the spec can target ANY module; stdlib level-inheritance and
|
||||
propagation "just work" top-down.
|
||||
- Console headers stay clean because the formatter computes a
|
||||
trimmed display string (drop the trailing token that equals
|
||||
`{filename}`'s stem) instead of the logger doing it.
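
A rough sketch of the display-side trim, assuming it is done with a stdlib `logging.Filter` that injects a record attribute. The `TrimLeafFilter` class, the `render()` helper, the `demo_pkg` names and the header layout are all hypothetical; the real `LOG_FORMAT` and formatter class are not reproduced here.

```python
import logging


class TrimLeafFilter(logging.Filter):
    '''
    Hypothetical filter: inject a trimmed display name as
    `record.pkg_ns` without touching the logger's real name.

    '''
    def filter(self, record: logging.LogRecord) -> bool:
        head, _, leaf = record.name.rpartition('.')
        # `record.module` is the stdlib-computed stem of `{filename}`
        if head and leaf == record.module:
            record.pkg_ns = head
        else:
            record.pkg_ns = record.name
        return True


# assumed header layout for the demo, NOT the real `LOG_FORMAT`
formatter = logging.Formatter(
    '{pkg_ns} {filename} {message}',
    style='{',
)


def render(name: str, pathname: str, msg: str) -> str:
    # build a record as if `name`'s logger emitted from `pathname`
    record = logging.LogRecord(
        name, logging.INFO, pathname, 1, msg, (), None,
    )
    TrimLeafFilter().filter(record)
    return formatter.format(record)


leaf_header = render(
    'demo_pkg.devx.debug._tty_lock',
    '/src/demo_pkg/devx/debug/_tty_lock.py',
    'acquired',
)
pkg_header = render(
    'demo_pkg.devx',
    '/src/demo_pkg/devx/__init__.py',
    'hi',
)
```

The logger keeps its full dotted identity while the header shows the package path only; a package `__init__` is left untrimmed since its last token does not match the file stem.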
|
||||
|
||||
## Why it's "broad" — breaking changes / costs
|
||||
|
||||
The logger *name* is currently load-bearing well beyond
|
||||
display; changing it ripples:
|
||||
|
||||
1. **Every logger name changes.**
|
||||
Today (post sub-pkg fix) names collapse to the sub-package;
|
||||
Route B = full module path. This touches:
|
||||
- handler attachment points + the `getChild()` hierarchy,
|
||||
- any `logging.getLogger('tractor.X')` string lookups,
|
||||
- any name-based filtering,
|
||||
- the dedup / `_strict_debug` warning logic *inside*
|
||||
`get_logger()` itself — the `pkg_name in name`,
|
||||
`leaf_mod in pkg_path`, "duplicate pkg-name" branches all
|
||||
key off the *name shape* and would need re-derivation.
|
||||
|
||||
2. **Formatter rewrite.**
|
||||
`LOG_FORMAT` uses `{name}` == `record.name` (the full logger
|
||||
name). To keep headers clean we must compute a *display*
|
||||
name and inject it as a record attr (eg. `record.pkg_ns`)
|
||||
via a `logging.Filter` or a `colorlog.ColoredFormatter`
|
||||
subclass overriding `.format()`, then point `LOG_FORMAT` at
|
||||
that field. The `{filename}` vs `{name}` de-dup intent has
|
||||
to be re-implemented per-record rather than per-logger.
|
||||
|
||||
3. **Propagation / double-emit surface grows.**
|
||||
Full-depth loggers mean more intermediate nodes
|
||||
(`...debug._tty_lock` -> `.debug` -> `.devx` -> `tractor`).
|
||||
If more than one level carries a handler (spec sub-handlers
|
||||
+ a root console), records double-emit. The
|
||||
`propagate=False` trick we already use for filter-targeted
|
||||
sub-loggers (`apply_logspec()`) must be applied carefully
|
||||
across a deeper tree — more levels == more places to leak a
|
||||
dup.
|
||||
|
||||
4. **Level-inheritance semantics shift.**
|
||||
Today setting a level on `tractor.devx` gates *all* devx
|
||||
emits (they share that logger). Post-Route-B,
|
||||
`tractor.devx.debug._tty_lock` is its own `NOTSET` logger
|
||||
that *inherits* the effective level from ancestors —
|
||||
functionally similar via inheritance, BUT any code that does
|
||||
`log.setLevel(...)` / reads `log.level` on a (previously
|
||||
collapsed) logger now only affects that exact node. All
|
||||
`setLevel`/`.level =` call sites need an audit (eg.
|
||||
`get_logger()`'s own `log.level = rlog.level` line).
|
||||
|
||||
5. **Downstream contract churn.**
|
||||
`modden` / `piker` call `get_logger()` / `get_console_log()`
|
||||
and may depend on current names — including
|
||||
`modden.runtime.daemon.setup_tractor_logging()` which
|
||||
asserts `'tractor' not in name` on spec parts. The header
|
||||
`{name}` field is user-visible in everyone's logs + CI
|
||||
output. Changing the canonical names is a public-ish
|
||||
behavior change -> needs a version note + downstream
|
||||
coordination (or a formatter trim that keeps the *displayed*
|
||||
string byte-identical to today).
|
||||
|
||||
6. **`get_logger()` refactor risk.**
|
||||
The fn tangles two concerns: compute logger *identity* and
|
||||
compute the *display* string. Route B forces splitting them
|
||||
inside a ~300-line fn with multiple `_strict_debug`
|
||||
branches, dup-warnings, and the `name=__name__` convenience.
|
||||
High chance of subtle regressions without an exhaustive
|
||||
name-derivation test matrix.
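
Points 3 and 4 above can be illustrated with plain stdlib loggers. This is a sketch only: the `demo_b` names and the `Collect` handler are hypothetical and no `apply_logspec()` behavior is assumed.

```python
import logging

emitted: list[str] = []


class Collect(logging.Handler):
    # record *which* handler fired rather than printing
    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag

    def emit(self, record: logging.LogRecord) -> None:
        emitted.append(self.tag)


top = logging.getLogger('demo_b')
top.setLevel(logging.INFO)
top.propagate = False  # isolate the demo from the stdlib root
top.addHandler(Collect('root-console'))

sub = logging.getLogger('demo_b.devx')
sub.addHandler(Collect('spec-sub-handler'))

leaf = logging.getLogger('demo_b.devx.debug._tty_lock')

# (4) a full-depth leaf logger is `NOTSET` and inherits its
# effective level from the nearest ancestor with a level set.
inherited = (leaf.level, leaf.getEffectiveLevel())

# (3) two ancestors carry a handler -> one record, two emits.
leaf.info('one record')
double = list(emitted)

# cutting propagation at the sub-logger removes the dup.
emitted.clear()
sub.propagate = False
leaf.info('one record')
single = list(emitted)

# (4) a `setLevel()` on an intermediate node only changes what is
# inherited below that node.
sub.setLevel(logging.ERROR)
after_set = (leaf.getEffectiveLevel(), top.getEffectiveLevel())
```

With a deeper tree every handler-carrying ancestor is a potential second emit, so each such node needs an explicit `propagate` decision.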
|
||||
|
||||
## Migration / test plan (if pursued)
|
||||
|
||||
- Extract a pure helper
|
||||
`_mk_logger_name(pkg_name, mod_name, mod_pkg) -> (logger_name,
|
||||
display_name)` and cover it with an exhaustive unit matrix:
|
||||
auto vs explicit vs `__name__`; package-`__init__` vs leaf
|
||||
module; nested vs flat; `pkg_name in name` vs not; top-level
|
||||
module (`__package__ == pkg_name`).
|
||||
- Switch `get_logger()` to use it for *identity*; switch the
|
||||
formatter to use `display_name` (via a record attr).
|
||||
- Re-run the full suite + golden-diff a sample of rendered log
|
||||
headers to confirm zero cosmetic churn.
|
||||
- Coordinate the name change with `modden`/`piker`; bump +
|
||||
CHANGES note.
|
||||
|
||||
## Cheaper alternative — "Route A" (record-filter)
|
||||
|
||||
If per-leaf control is wanted *before* committing to Route B:
|
||||
keep names collapsed, add a `logging.Filter` on the configured
|
||||
handler keyed on `record.module` / `record.pathname` that maps
|
||||
each record's source module -> its spec level. Set the base
|
||||
logger to the *minimum* level in the spec (so records aren't
|
||||
pre-dropped by the logger), and let the filter discriminate
|
||||
up/down within that floor.
|
||||
|
||||
- Pros: no name churn, no formatter change, fully contained
|
||||
next to `apply_logspec()`.
|
||||
- Cons: a filter can only discriminate *within* what the logger
|
||||
admits -> base must be permissive, so `at_least_level()`
|
||||
expensive-work guards over-admit; matching dotted spec names
|
||||
to a `pathname` is fiddly; doesn't clean up the hierarchy
|
||||
itself.
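
A minimal Route A sketch using only the stdlib. Everything named here (`LeafLevelFilter`, `source_module()`, `emit_from()`, the `demo_a` logger and the spec keys) is hypothetical, and the suffix matching of spec keys against `record.pathname` is just one simple choice for the "fiddly" mapping mentioned above.

```python
import logging


def source_module(pathname: str) -> str:
    # '/src/demo_a/devx/_tty_lock.py' -> 'src.demo_a.devx._tty_lock'
    stem = pathname.replace('\\', '/').removesuffix('.py')
    return '.'.join(part for part in stem.split('/') if part)


class LeafLevelFilter(logging.Filter):
    # hypothetical handler-side filter: per-source-module levels
    def __init__(self, spec: dict[str, int], default: int) -> None:
        super().__init__()
        self.spec = spec
        self.default = default

    def filter(self, record: logging.LogRecord) -> bool:
        src = source_module(record.pathname)
        level, best = self.default, -1
        for key, lvl in self.spec.items():
            # longest spec key that is a dotted suffix of the source
            matches = src == key or src.endswith('.' + key)
            if matches and len(key) > best:
                level, best = lvl, len(key)
        return record.levelno >= level


seen: list[str] = []


class Collect(logging.Handler):
    def emit(self, record: logging.LogRecord) -> None:
        seen.append(record.getMessage())


def emit_from(log: logging.Logger, pathname: str, level: int, msg: str) -> None:
    # mimic `log.log(level, msg)` being called from `pathname`
    if log.isEnabledFor(level):
        log.handle(
            logging.LogRecord(log.name, level, pathname, 1, msg, (), None)
        )


spec = {'devx._tty_lock': logging.DEBUG}
default = logging.WARNING

log = logging.getLogger('demo_a')
log.propagate = False
handler = Collect()
handler.addFilter(LeafLevelFilter(spec, default))
log.addHandler(handler)
# base logger must admit the *minimum* level in the spec
log.setLevel(min([default, *spec.values()]))

emit_from(log, '/src/demo_a/devx/_tty_lock.py', logging.DEBUG, 'lock')
emit_from(log, '/src/demo_a/devx/_repl.py', logging.DEBUG, 'repl')
emit_from(log, '/src/demo_a/devx/_repl.py', logging.ERROR, 'boom')
```

Only `'lock'` and `'boom'` reach the handler, yet `log.isEnabledFor(logging.DEBUG)` is true for every module, which is the over-admit cost listed under "Cons".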
|
||||
|
||||
## Recommendation
|
||||
|
||||
- Defer Route B unless true per-module loggers are wanted as a
|
||||
first-class feature.
|
||||
- If per-leaf control is needed soon, prefer **Route A**
|
||||
(filter) — lower risk.
|
||||
- The shipped sub-PACKAGE fix already covers the common ask
|
||||
(`devx.debug` vs `devx`).
|
||||
|
|
@ -420,17 +420,20 @@ Check out our experimental system for `guest`_-mode controlled
|
|||
|
||||
|
||||
async def aio_echo_server(
|
||||
chan: tractor.to_asyncio.LinkedTaskChannel,
|
||||
to_trio: trio.MemorySendChannel,
|
||||
from_trio: asyncio.Queue,
|
||||
) -> None:
|
||||
|
||||
# a first message must be sent **from** this ``asyncio``
|
||||
# task or the ``trio`` side will never unblock from
|
||||
# ``tractor.to_asyncio.open_channel_from():``
|
||||
chan.started_nowait('start')
|
||||
to_trio.send_nowait('start')
|
||||
|
||||
# XXX: this uses a ``from_trio: asyncio.Queue`` currently but we
|
||||
# should probably offer something better.
|
||||
while True:
|
||||
# echo the msg back
|
||||
chan.send_nowait(await chan.get())
|
||||
to_trio.send_nowait(await from_trio.get())
|
||||
await asyncio.sleep(0)
|
||||
|
||||
|
||||
|
|
@ -442,7 +445,7 @@ Check out our experimental system for `guest`_-mode controlled
|
|||
# message.
|
||||
async with tractor.to_asyncio.open_channel_from(
|
||||
aio_echo_server,
|
||||
) as (chan, first):
|
||||
) as (first, chan):
|
||||
|
||||
assert first == 'start'
|
||||
await ctx.started(first)
|
||||
|
|
@ -501,10 +504,8 @@ Yes, we spawn a python process, run ``asyncio``, start ``trio`` on the
|
|||
``asyncio`` loop, then send commands to the ``trio`` scheduled tasks to
|
||||
tell ``asyncio`` tasks what to do XD
|
||||
|
||||
The ``asyncio``-side task receives a single
|
||||
``chan: LinkedTaskChannel`` handle providing a ``trio``-like
|
||||
API: ``.started_nowait()``, ``.send_nowait()``, ``.get()``
|
||||
and more. Feel free to sling your opinion in `#273`_!
|
||||
We need help refining the `asyncio`-side channel API to be more
|
||||
`trio`-like. Feel free to sling your opinion in `#273`_!
|
||||
|
||||
|
||||
.. _#273: https://github.com/goodboy/tractor/issues/273
|
||||
|
|
@ -640,15 +641,13 @@ Help us push toward the future of distributed `Python`.
|
|||
- Typed capability-based (dialog) protocols ( see `#196
|
||||
<https://github.com/goodboy/tractor/issues/196>`_ with draft work
|
||||
started in `#311 <https://github.com/goodboy/tractor/pull/311>`_)
|
||||
- **macOS is now officially supported** and tested in CI
|
||||
alongside Linux!
|
||||
- We **recently disabled CI-testing on windows** and need
|
||||
help getting it running again! (see `#327
|
||||
<https://github.com/goodboy/tractor/pull/327>`_). **We do
|
||||
have windows support** (and have for quite a while) but
|
||||
since no active hacker exists in the user-base to help
|
||||
test on that OS, for now we're not actively maintaining
|
||||
testing due to the added hassle and general latency..
|
||||
- We **recently disabled CI-testing on windows** and need help getting
|
||||
it running again! (see `#327
|
||||
<https://github.com/goodboy/tractor/pull/327>`_). **We do have windows
|
||||
support** (and have for quite a while) but since no active hacker
|
||||
exists in the user-base to help test on that OS, for now we're not
|
||||
actively maintaining testing due to the added hassle and general
|
||||
latency..
|
||||
|
||||
|
||||
Feel like saying hi?
|
||||
|
|
|
|||
|
|
@ -17,7 +17,6 @@ from tractor import (
|
|||
MsgStream,
|
||||
_testing,
|
||||
trionics,
|
||||
TransportClosed,
|
||||
)
|
||||
import trio
|
||||
import pytest
|
||||
|
|
@ -209,16 +208,12 @@ async def main(
|
|||
# TODO: is this needed or no?
|
||||
raise
|
||||
|
||||
except (
|
||||
trio.ClosedResourceError,
|
||||
TransportClosed,
|
||||
) as _tpt_err:
|
||||
except trio.ClosedResourceError:
|
||||
# NOTE: don't send if we already broke the
|
||||
# connection to avoid raising a closed-error
|
||||
# such that we drop through to the ctl-c
|
||||
# mashing by user.
|
||||
with trio.CancelScope(shield=True):
|
||||
await trio.sleep(0.01)
|
||||
await trio.sleep(0.01)
|
||||
|
||||
# timeout: int = 1
|
||||
# with trio.move_on_after(timeout) as cs:
|
||||
|
|
@ -252,7 +247,6 @@ async def main(
|
|||
await stream.send(i)
|
||||
pytest.fail('stream not closed?')
|
||||
except (
|
||||
TransportClosed,
|
||||
trio.ClosedResourceError,
|
||||
trio.EndOfChannel,
|
||||
) as send_err:
|
||||
|
|
|
|||
|
|
@ -18,14 +18,15 @@ async def aio_sleep_forever():
|
|||
|
||||
|
||||
async def bp_then_error(
|
||||
chan: to_asyncio.LinkedTaskChannel,
|
||||
to_trio: trio.MemorySendChannel,
|
||||
from_trio: asyncio.Queue,
|
||||
|
||||
raise_after_bp: bool = True,
|
||||
|
||||
) -> None:
|
||||
|
||||
# sync with `trio`-side (caller) task
|
||||
chan.started_nowait('start')
|
||||
to_trio.send_nowait('start')
|
||||
|
||||
# NOTE: what happens here inside the hook needs some refinement..
|
||||
# => seems like it's still `.debug._set_trace()` but
|
||||
|
|
@ -59,7 +60,7 @@ async def trio_ctx(
|
|||
to_asyncio.open_channel_from(
|
||||
bp_then_error,
|
||||
# raise_after_bp=not bp_before_started,
|
||||
) as (chan, first),
|
||||
) as (first, chan),
|
||||
|
||||
trio.open_nursery() as tn,
|
||||
):
|
||||
|
|
|
|||
|
|
@ -20,7 +20,7 @@ async def sleep(
|
|||
|
||||
|
||||
async def open_ctx(
|
||||
n: tractor.runtime._supervise.ActorNursery
|
||||
n: tractor._supervise.ActorNursery
|
||||
):
|
||||
|
||||
# spawn both actors
|
||||
|
|
|
|||
|
|
@ -27,9 +27,12 @@ async def main():
|
|||
'''
|
||||
async with tractor.open_nursery(
|
||||
debug_mode=True,
|
||||
) as an:
|
||||
p0 = await an.start_actor('bp_forever', enable_modules=[__name__])
|
||||
p1 = await an.start_actor('name_error', enable_modules=[__name__])
|
||||
loglevel='cancel',
|
||||
# loglevel='devx',
|
||||
) as n:
|
||||
|
||||
p0 = await n.start_actor('bp_forever', enable_modules=[__name__])
|
||||
p1 = await n.start_actor('name_error', enable_modules=[__name__])
|
||||
|
||||
# retrieve results
|
||||
async with p0.open_stream_from(breakpoint_forever) as stream:
|
||||
|
|
|
|||
|
|
@ -67,7 +67,7 @@ async def main():
|
|||
"""
|
||||
async with tractor.open_nursery(
|
||||
debug_mode=True,
|
||||
loglevel='pdb',
|
||||
# loglevel='cancel',
|
||||
) as n:
|
||||
|
||||
# spawn both actors
|
||||
|
|
|
|||
|
|
@ -39,8 +39,8 @@ async def main():
|
|||
'''
|
||||
async with tractor.open_nursery(
|
||||
debug_mode=True,
|
||||
enable_transports=['uds'], # TODO, pass this via osenv?
|
||||
loglevel='devx', # XXX, required for test!
|
||||
loglevel='devx',
|
||||
enable_transports=['uds'],
|
||||
) as n:
|
||||
|
||||
# spawn both actors
|
||||
|
|
|
|||
|
|
@ -1,3 +1,4 @@
|
|||
|
||||
import trio
|
||||
import tractor
|
||||
|
||||
|
|
@ -8,22 +9,16 @@ async def key_error():
|
|||
|
||||
|
||||
async def main():
|
||||
'''
|
||||
Root is fail-after-cancelled while blocking and child RPC fails
|
||||
simultaneously.
|
||||
"""Root dies
|
||||
|
||||
'''
|
||||
"""
|
||||
async with tractor.open_nursery(
|
||||
debug_mode=True,
|
||||
# loglevel='debug' # ?XXX required?
|
||||
loglevel='debug'
|
||||
) as n:
|
||||
|
||||
# spawn both actors
|
||||
portal = await n.run_in_actor(key_error)
|
||||
print(
|
||||
f'Child is up @ {portal.chan.aid.reprol()}'
|
||||
)
|
||||
|
||||
|
||||
# XXX: originally a bug caused by this is where root would enter
|
||||
# the debugger and clobber the tty used by the repl even though
|
||||
|
|
|
|||
|
|
@ -3,7 +3,6 @@ Verify we can dump a `stackscope` tree on a hang.
|
|||
|
||||
'''
|
||||
import os
|
||||
import platform
|
||||
import signal
|
||||
|
||||
import trio
|
||||
|
|
@ -32,28 +31,13 @@ async def main(
|
|||
from_test: bool = False,
|
||||
) -> None:
|
||||
|
||||
if platform.system() != 'Darwin':
|
||||
tpt = 'uds'
|
||||
else:
|
||||
# XXX, precisely we can't use pytest's tmp-path generation
|
||||
# for tests.. apparently because:
|
||||
#
|
||||
# > The OSError: AF_UNIX path too long in macOS Python occurs
|
||||
# > because the path to the Unix domain socket exceeds the
|
||||
# > operating system's maximum path length limit (around 104
|
||||
#
|
||||
# WHICH IS just, wtf hilarious XD
|
||||
tpt = 'tcp'
|
||||
|
||||
async with (
|
||||
tractor.open_nursery(
|
||||
debug_mode=True,
|
||||
enable_stack_on_sig=True,
|
||||
loglevel='devx', # XXX REQUIRED log level!
|
||||
enable_transports=[tpt],
|
||||
# maybe_enable_greenback=True,
|
||||
# ^TODO? maybe a "smarter" way todo all this is how
|
||||
# `modden` does with a rtv serialized through the osenv?
|
||||
# maybe_enable_greenback=False,
|
||||
loglevel='devx',
|
||||
enable_transports=['uds'],
|
||||
) as an,
|
||||
):
|
||||
ptl: tractor.Portal = await an.start_actor(
|
||||
|
|
@ -65,9 +49,7 @@ async def main(
|
|||
start_n_shield_hang,
|
||||
) as (ctx, cpid):
|
||||
|
||||
_, proc, _ = an._children[
|
||||
ptl.chan.aid.uid
|
||||
]
|
||||
_, proc, _ = an._children[ptl.chan.uid]
|
||||
assert cpid == proc.pid
|
||||
|
||||
print(
|
||||
|
|
|
|||
|
|
@ -1,5 +1,3 @@
|
|||
import platform
|
||||
|
||||
import tractor
|
||||
import trio
|
||||
|
||||
|
|
@ -36,27 +34,9 @@ async def just_bp(
|
|||
|
||||
async def main():
|
||||
|
||||
# !TODO, parametrize the --tpt-proto={key} with osenv vars just
|
||||
# like we do for loglevel/spawn-backend!
|
||||
# - [ ] run on both tpts for all such debugger tests?
|
||||
# - [ ] special skip for macos!
|
||||
#
|
||||
if platform.system() != 'Darwin':
|
||||
tpt = 'uds'
|
||||
else:
|
||||
# XXX, precisely we can't use pytest's tmp-path generation
|
||||
# for tests.. apparently because:
|
||||
#
|
||||
# > The OSError: AF_UNIX path too long in macOS Python occurs
|
||||
# > because the path to the Unix domain socket exceeds the
|
||||
# > operating system's maximum path length limit (around 104
|
||||
#
|
||||
# WHICH IS just, wtf hilarious XD
|
||||
tpt = 'tcp'
|
||||
|
||||
async with tractor.open_nursery(
|
||||
debug_mode=True,
|
||||
enable_transports=[tpt],
|
||||
enable_transports=['uds'],
|
||||
loglevel='devx',
|
||||
) as n:
|
||||
p = await n.start_actor(
|
||||
|
|
|
|||
|
|
@ -9,6 +9,7 @@ async def name_error():
|
|||
async def main():
|
||||
async with tractor.open_nursery(
|
||||
debug_mode=True,
|
||||
# loglevel='transport',
|
||||
) as an:
|
||||
|
||||
# TODO: ideally the REPL arrives at this frame in the parent,
|
||||
|
|
|
|||
|
|
@ -1,22 +1,9 @@
|
|||
from functools import partial
|
||||
import os
|
||||
import time
|
||||
|
||||
# ?TODO? how to make `pdbp` enforce this?
|
||||
# os.environ['PYTHON_COLORS'] = '0'
|
||||
# os.environ['NO_COLOR'] = '1'
|
||||
|
||||
import trio
|
||||
import tractor
|
||||
|
||||
# disable `pdbp` prompt colors
|
||||
# for prompt matching in test.
|
||||
def disable_pdbp_color():
|
||||
if os.environ['PYTHON_COLORS'] == '0':
|
||||
from tractor.devx.debug import _repl
|
||||
_repl.TractorConfig.use_pygments = False
|
||||
|
||||
|
||||
# TODO: only import these when not running from test harness?
|
||||
# can we detect `pexpect` usage maybe?
|
||||
# from tractor.devx.debug import (
|
||||
|
|
@ -55,7 +42,6 @@ async def start_n_sync_pause(
|
|||
ctx: tractor.Context,
|
||||
):
|
||||
actor: tractor.Actor = tractor.current_actor()
|
||||
disable_pdbp_color()
|
||||
|
||||
# sync to parent-side task
|
||||
await ctx.started()
|
||||
|
|
@ -66,15 +52,13 @@ async def start_n_sync_pause(
|
|||
|
||||
|
||||
async def main() -> None:
|
||||
disable_pdbp_color()
|
||||
async with (
|
||||
tractor.open_nursery(
|
||||
debug_mode=True,
|
||||
maybe_enable_greenback=True,
|
||||
|
||||
# XXX flags required for test pattern matching.
|
||||
loglevel='pdb',
|
||||
# enable_stack_on_sig=True,
|
||||
enable_stack_on_sig=True,
|
||||
# loglevel='warning',
|
||||
# loglevel='devx',
|
||||
) as an,
|
||||
trio.open_nursery() as tn,
|
||||
):
|
||||
|
|
@ -84,8 +68,8 @@ async def main() -> None:
|
|||
p: tractor.Portal = await an.start_actor(
|
||||
'subactor',
|
||||
enable_modules=[__name__],
|
||||
debug_mode=True,
|
||||
# infect_asyncio=True,
|
||||
debug_mode=True,
|
||||
)
|
||||
|
||||
# TODO: 3 sub-actor usage cases:
|
||||
|
|
|
|||
|
|
@ -90,7 +90,7 @@ async def main() -> list[int]:
|
|||
# yes, a nursery which spawns `trio`-"actors" B)
|
||||
an: ActorNursery
|
||||
async with tractor.open_nursery(
|
||||
loglevel='error',
|
||||
loglevel='cancel',
|
||||
# debug_mode=True,
|
||||
) as an:
|
||||
|
||||
|
|
@ -118,10 +118,8 @@ async def main() -> list[int]:
|
|||
cancelled: bool = await portal.cancel_actor()
|
||||
assert cancelled
|
||||
|
||||
print(
|
||||
f"STREAM TIME = {time.time() - start}\n"
|
||||
f"STREAM + SPAWN TIME = {time.time() - pre_start}\n"
|
||||
)
|
||||
print(f"STREAM TIME = {time.time() - start}")
|
||||
print(f"STREAM + SPAWN TIME = {time.time() - pre_start}")
|
||||
assert result_stream == list(range(seed))
|
||||
return result_stream
|
||||
|
||||
|
|
|
|||
|
|
@ -11,17 +11,21 @@ import tractor
|
|||
|
||||
|
||||
async def aio_echo_server(
|
||||
chan: tractor.to_asyncio.LinkedTaskChannel,
|
||||
to_trio: trio.MemorySendChannel,
|
||||
from_trio: asyncio.Queue,
|
||||
|
||||
) -> None:
|
||||
|
||||
# a first message must be sent **from** this ``asyncio``
|
||||
# task or the ``trio`` side will never unblock from
|
||||
# ``tractor.to_asyncio.open_channel_from():``
|
||||
chan.started_nowait('start')
|
||||
to_trio.send_nowait('start')
|
||||
|
||||
# XXX: this uses a ``from_trio: asyncio.Queue`` currently but we
|
||||
# should probably offer something better.
|
||||
while True:
|
||||
# echo the msg back
|
||||
chan.send_nowait(await chan.get())
|
||||
to_trio.send_nowait(await from_trio.get())
|
||||
await asyncio.sleep(0)
|
||||
|
||||
|
||||
|
|
@ -33,7 +37,7 @@ async def trio_to_aio_echo_server(
|
|||
# message.
|
||||
async with tractor.to_asyncio.open_channel_from(
|
||||
aio_echo_server,
|
||||
) as (chan, first):
|
||||
) as (first, chan):
|
||||
|
||||
assert first == 'start'
|
||||
await ctx.started(first)
|
||||
|
|
|
|||
|
|
@ -1,5 +0,0 @@
|
|||
import os
|
||||
|
||||
|
||||
async def child_fn() -> str:
|
||||
return f"child OK pid={os.getpid()}"
|
||||
|
|
@ -1,50 +0,0 @@
|
|||
"""
|
||||
Integration test: spawning tractor actors from an MPI process.
|
||||
|
||||
When a parent is launched via ``mpirun``, Open MPI sets ``OMPI_*`` env
|
||||
vars that bind ``MPI_Init`` to the ``orted`` daemon. Tractor children
|
||||
inherit those env vars, so if ``inherit_parent_main=True`` (the default)
|
||||
the child re-executes ``__main__``, re-imports ``mpi4py``, and
|
||||
``MPI_Init_thread`` fails because the child was never spawned by
|
||||
``orted``::
|
||||
|
||||
getting local rank failed
|
||||
--> Returned value No permission (-17) instead of ORTE_SUCCESS
|
||||
|
||||
Passing ``inherit_parent_main=False`` and placing RPC functions in a
|
||||
separate importable module (``_child``) avoids the re-import entirely.
|
||||
|
||||
Usage::
|
||||
|
||||
mpirun --allow-run-as-root -np 1 python -m \
|
||||
examples.integration.mpi4py.inherit_parent_main
|
||||
"""
|
||||
|
||||
from mpi4py import MPI
|
||||
|
||||
import os
|
||||
import trio
|
||||
import tractor
|
||||
|
||||
from ._child import child_fn
|
||||
|
||||
|
||||
async def main() -> None:
|
||||
rank = MPI.COMM_WORLD.Get_rank()
|
||||
print(f"[parent] rank={rank} pid={os.getpid()}", flush=True)
|
||||
|
||||
async with tractor.open_nursery(start_method='trio') as an:
|
||||
portal = await an.start_actor(
|
||||
'mpi-child',
|
||||
enable_modules=[child_fn.__module__],
|
||||
# Without this the child replays __main__, which
|
||||
# re-imports mpi4py and crashes on MPI_Init.
|
||||
inherit_parent_main=False,
|
||||
)
|
||||
result = await portal.run(child_fn)
|
||||
print(f"[parent] got: {result}", flush=True)
|
||||
await portal.cancel_actor()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
trio.run(main)
|
||||
|
|
@ -10,7 +10,7 @@ async def main(service_name):
|
|||
await an.start_actor(service_name)
|
||||
|
||||
async with tractor.get_registry() as portal:
|
||||
print(f"Registrar is listening on {portal.channel}")
|
||||
print(f"Arbiter is listening on {portal.channel}")
|
||||
|
||||
async with tractor.wait_for_actor(service_name) as sockaddr:
|
||||
print(f"my_service is found at {sockaddr}")
|
||||
|
|
|
|||
27
flake.lock
|
|
@ -1,27 +0,0 @@
|
|||
{
|
||||
"nodes": {
|
||||
"nixpkgs": {
|
||||
"locked": {
|
||||
"lastModified": 1769018530,
|
||||
"narHash": "sha256-MJ27Cy2NtBEV5tsK+YraYr2g851f3Fl1LpNHDzDX15c=",
|
||||
"owner": "nixos",
|
||||
"repo": "nixpkgs",
|
||||
"rev": "88d3861acdd3d2f0e361767018218e51810df8a1",
|
||||
"type": "github"
|
||||
},
|
||||
"original": {
|
||||
"owner": "nixos",
|
||||
"ref": "nixos-unstable",
|
||||
"repo": "nixpkgs",
|
||||
"type": "github"
|
||||
}
|
||||
},
|
||||
"root": {
|
||||
"inputs": {
|
||||
"nixpkgs": "nixpkgs"
|
||||
}
|
||||
}
|
||||
},
|
||||
"root": "root",
|
||||
"version": 7
|
||||
}
|
||||
70
flake.nix
|
|
@ -1,70 +0,0 @@
|
|||
# An "impure" template thx to `pyproject.nix`,
|
||||
# https://pyproject-nix.github.io/pyproject.nix/templates.html#impure
|
||||
# https://github.com/pyproject-nix/pyproject.nix/blob/master/templates/impure/flake.nix
|
||||
{
|
||||
description = "An impure overlay (w dev-shell) using `uv`";
|
||||
|
||||
inputs = {
|
||||
nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
|
||||
};
|
||||
|
||||
outputs =
|
||||
{ nixpkgs, ... }:
|
||||
let
|
||||
inherit (nixpkgs) lib;
|
||||
forAllSystems = lib.genAttrs lib.systems.flakeExposed;
|
||||
in
|
||||
{
|
||||
devShells = forAllSystems (
|
||||
system:
|
||||
let
|
||||
pkgs = nixpkgs.legacyPackages.${system};
|
||||
|
||||
# XXX NOTE XXX, for now we overlay specific pkgs via
|
||||
# a major-version-pinned-`cpython`
|
||||
cpython = "python313";
|
||||
venv_dir = "py313";
|
||||
pypkgs = pkgs."${cpython}Packages";
|
||||
in
|
||||
{
|
||||
default = pkgs.mkShell {
|
||||
|
||||
packages = [
|
||||
# XXX, ensure sh completions activate!
|
||||
pkgs.bashInteractive
|
||||
pkgs.bash-completion
|
||||
|
||||
# XXX, on nix(os), use pkgs version to avoid
|
||||
# build/sys-sh-integration issues
|
||||
pkgs.ruff
|
||||
|
||||
pkgs.uv
|
||||
pkgs.${cpython}# ?TODO^ how to set from `cpython` above?
|
||||
];
|
||||
|
||||
shellHook = ''
|
||||
# unmask to debug **this** dev-shell-hook
|
||||
# set -e
|
||||
|
||||
# link-in c++ stdlib for various AOT-ext-pkgs (numpy, etc.)
|
||||
LD_LIBRARY_PATH="${pkgs.stdenv.cc.cc.lib}/lib:$LD_LIBRARY_PATH"
|
||||
|
||||
export LD_LIBRARY_PATH
|
||||
|
||||
# RUNTIME-SETTINGS
|
||||
# ------ uv ------
|
||||
# - always use the ./py313/ venv-subdir
|
||||
# - sync env with all extras
|
||||
export UV_PROJECT_ENVIRONMENT=${venv_dir}
|
||||
uv sync --dev --all-extras
|
||||
|
||||
# ------ TIPS ------
|
||||
# NOTE, to launch the py-venv installed `xonsh` (like @goodboy)
|
||||
# run the `nix develop` cmd with,
|
||||
# >> nix develop -c uv run xonsh
|
||||
'';
|
||||
};
|
||||
}
|
||||
);
|
||||
};
|
||||
}
|
||||
153
pyproject.toml
|
|
@ -9,7 +9,7 @@ name = "tractor"
|
|||
version = "0.1.0a6dev0"
|
||||
description = 'structured concurrent `trio`-"actors"'
|
||||
authors = [{ name = "Tyler Goodlet", email = "goodboy_foss@protonmail.com" }]
|
||||
requires-python = ">=3.13, <3.15"
|
||||
requires-python = ">= 3.11"
|
||||
readme = "docs/README.rst"
|
||||
license = "AGPL-3.0-or-later"
|
||||
keywords = [
|
||||
|
|
@ -24,14 +24,11 @@ keywords = [
|
|||
classifiers = [
|
||||
"Development Status :: 3 - Alpha",
|
||||
"Operating System :: POSIX :: Linux",
|
||||
"Operating System :: MacOS",
|
||||
"Framework :: Trio",
|
||||
"License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)",
|
||||
"Programming Language :: Python :: Implementation :: CPython",
|
||||
"Programming Language :: Python :: 3 :: Only",
|
||||
"Programming Language :: Python :: 3.12",
|
||||
"Programming Language :: Python :: 3.13",
|
||||
"Programming Language :: Python :: 3.14",
|
||||
"Programming Language :: Python :: 3.11",
|
||||
"Topic :: System :: Distributed Computing",
|
||||
]
|
||||
dependencies = [
|
||||
|
|
@ -45,115 +42,48 @@ dependencies = [
|
|||
"wrapt>=1.16.0,<2",
|
||||
"colorlog>=6.8.2,<7",
|
||||
# built-in multi-actor `pdb` REPL
|
||||
"pdbp>=1.8.2,<2", # windows only (from `pdbp`)
|
||||
"pdbp>=1.6,<2", # windows only (from `pdbp`)
|
||||
# typed IPC msging
|
||||
"msgspec>=0.20.0",
|
||||
"msgspec>=0.19.0",
|
||||
"cffi>=1.17.1",
|
||||
"bidict>=0.23.1",
|
||||
"multiaddr>=0.2.0",
|
||||
"platformdirs>=4.4.0",
|
||||
# per-actor `argv[0]` proc-title for OS-level diag tools
|
||||
# (`ps`, `top`, `psutil`-backed tooling like `acli.pytree`).
|
||||
# Optional at runtime — guarded by `try/except ImportError` in
|
||||
# `tractor.devx._proctitle` — but listed here so default
|
||||
# installs benefit from it. See tracking issue for follow-ups
|
||||
# (e.g. richer formats, per-backend overrides).
|
||||
"setproctitle>=1.3,<2",
|
||||
]
|
||||
|
||||
# ------ project ------
|
||||
|
||||
[dependency-groups]
|
||||
dev = [
|
||||
{include-group = 'devx'},
|
||||
{include-group = 'testing'},
|
||||
{include-group = 'repl'},
|
||||
{include-group = 'sync_pause'},
|
||||
]
|
||||
devx = [
|
||||
# `tractor.devx` tooling
|
||||
"stackscope>=0.2.2,<0.3",
|
||||
# ^ requires this?
|
||||
"typing-extensions>=4.14.1",
|
||||
# {include-group = 'sync_pause'}, # XXX, no 3.14 yet!
|
||||
]
|
||||
sync_pause = [
|
||||
"greenback>=1.2.1,<2", # TODO? 3.14 greenlet on nix?
|
||||
]
|
||||
testing = [
|
||||
# test suite
|
||||
# TODO: maybe some of these layout choices?
|
||||
# https://docs.pytest.org/en/8.0.x/explanation/goodpractices.html#choosing-a-test-layout-import-rules
|
||||
# bumped 8.3.5 → 9.0 per upstream security advisory + our
|
||||
# local-only reliance on the post-9.0 capture-machinery shape
|
||||
# (the `sys.__stderr__`-bypass print in
|
||||
# `tractor._testing.trace._do_capture_snapshot` works on 8.x
|
||||
# too, but standardizing on 9.x here ensures `--show-capture`
|
||||
# interactions stay predictable across dev installs).
|
||||
"pytest>=9.0",
|
||||
"pytest>=8.3.5",
|
||||
"pexpect>=4.9.0,<5",
|
||||
# per-test wall-clock bound (used via
|
||||
# `@pytest.mark.timeout(..., method='thread')` on the
|
||||
# known-hanging `subint`-backend audit tests; see
|
||||
# `ai/conc-anal/subint_*_issue.md`).
|
||||
"pytest-timeout>=2.3",
|
||||
# used by `tractor._testing._reap` for the
|
||||
# `tractor-reap` zombie-subactor + leaked-shm
|
||||
# cleanup utility (xplatform `Process.memory_maps`,
|
||||
# `Process.open_files`).
|
||||
"psutil>=7.0.0",
|
||||
]
|
||||
repl = [
|
||||
# `tractor.devx` tooling
|
||||
"greenback>=1.2.1,<2",
|
||||
"stackscope>=0.2.2,<0.3",
|
||||
# ^ requires this?
|
||||
"typing-extensions>=4.14.1",
|
||||
|
||||
"pyperclip>=1.9.0",
|
||||
"prompt-toolkit>=3.0.50",
|
||||
"xonsh>=0.23.8",
|
||||
"xonsh>=0.19.2",
|
||||
"psutil>=7.0.0",
|
||||
]
|
||||
lint = [
|
||||
"ruff>=0.9.6"
|
||||
]
|
||||
# XXX, used for linux-only hi perf eventfd+shm channels
|
||||
# now mostly moved over to `hotbaud`.
|
||||
eventfd = [
|
||||
"cffi>=1.17.1",
|
||||
]
|
||||
subints = [
|
||||
"msgspec>=0.21.0",
|
||||
]
|
||||
# TODO, add these with sane versions; were originally in
|
||||
# `requirements-docs.txt`..
|
||||
# docs = [
|
||||
# "sphinx>="
|
||||
# "sphinx_book_theme>="
|
||||
# ]
|
||||
|
||||
# ------ dependency-groups ------
|
||||
|
||||
[tool.uv.dependency-groups]
|
||||
# for subints, we require 3.14+ due to 2 issues,
|
||||
# - hanging behaviour for various multi-task teardown cases (see
|
||||
# "Availability" section in the `tractor.spawn._subints` doc string).
|
||||
# - `msgspec` support which is outstanding per PEP 684 upstream tracker:
|
||||
# https://github.com/jcrist/msgspec/issues/563
|
||||
#
|
||||
# https://docs.astral.sh/uv/concepts/projects/dependencies/#group-requires-python
|
||||
subints = {requires-python = ">=3.14"}
|
||||
eventfd = {requires-python = ">=3.13, <3.14"}
|
||||
sync_pause = {requires-python = ">=3.13, <3.14"}
|
||||
# ------ dependency-groups ------
|
||||
|
||||
[tool.uv.sources]
|
||||
# XXX NOTE, only for @goodboy's hacking on `pprint(sort_dicts=False)`
|
||||
# for the `pp` alias..
|
||||
# ------ gh upstream ------
|
||||
# xonsh = { git = 'https://github.com/anki-code/xonsh.git', branch = 'prompt_next_suggestion' }
|
||||
# ^ https://github.com/xonsh/xonsh/pull/6048
|
||||
# xonsh = { git = 'https://github.com/xonsh/xonsh.git', branch = 'main' }
|
||||
# xonsh = { path = "../xonsh", editable = true }
|
||||
|
||||
# [tool.uv.sources.pdbp]
|
||||
# XXX, in case we need to tmp patch again.
|
||||
# git = "https://github.com/goodboy/pdbp.git"
|
||||
# branch ="repair_stack_trace_frame_indexing"
|
||||
# path = "../pdbp"
|
||||
# editable = true
|
||||
# pdbp = { path = "../pdbp", editable = true }
|
||||
|
||||
# ------ tool.uv.sources ------
|
||||
# TODO, distributed (multi-host) extensions
|
||||
|
|
@ -215,69 +145,20 @@ all_bullets = true
|
|||
|
||||
[tool.pytest.ini_options]
|
||||
minversion = '6.0'
|
||||
# NOTE: `pytest-timeout`'s global per-test cap is intentionally
|
||||
# NOT set — both of its enforcement methods break trio's
|
||||
# runtime under our fork-based spawn backends:
|
||||
#
|
||||
# - `method='signal'` (the default; SIGALRM) raises `Failed`
|
||||
# synchronously from the signal handler in trio's main
|
||||
# thread, which leaves `GLOBAL_RUN_CONTEXT` half-installed
|
||||
# ("Trio guest run got abandoned"). EVERY subsequent
|
||||
# `trio.run()` in the same pytest session then bails with
|
||||
# `RuntimeError: Attempted to call run() from inside a
|
||||
# run()` — full-session poison: a single 200s hang
|
||||
# cascades into 30+ false-positive failures across
|
||||
# downstream test files.
|
||||
#
|
||||
# - `method='thread'` calls `_thread.interrupt_main()` which
|
||||
# can let the resulting `KeyboardInterrupt` escape trio's
|
||||
# `KIManager` under fork-cascade teardown races, killing
|
||||
# the whole pytest session.
|
||||
#
|
||||
# For tests that legitimately need a wall-clock cap, use
|
||||
# `with trio.fail_after(N):` INSIDE the test — trio's own
|
||||
# Cancelled machinery handles the timeout cleanly through
|
||||
# the actor nursery without disturbing global state. See
|
||||
# `tests/test_advanced_streaming.py::test_dynamic_pub_sub`'s
|
||||
# module-level NOTE for the canonical pattern.
|
||||
#
|
||||
# CI environments should rely on job-level wall-clock
|
||||
# timeouts (e.g. GitHub Actions `timeout-minutes`) for an
|
||||
# escape hatch on genuinely-stuck suites.
|
||||
# https://docs.pytest.org/en/stable/reference/reference.html#configuration-options
|
||||
testpaths = [
|
||||
'tests'
|
||||
]
|
||||
addopts = [
|
||||
# TODO: figure out why this isn't working..
|
||||
'--rootdir=./tests',
|
||||
|
||||
'--import-mode=importlib',
|
||||
# don't show frickin captured logs AGAIN in the report..
|
||||
'--show-capture=no',
|
||||
|
||||
# load builtin plugin since we need a bootstrapping hook,
|
||||
# `pytest_load_initial_conftests()` for `--capture=` per:
|
||||
# https://docs.pytest.org/en/stable/reference/reference.html#bootstrapping-hooks
|
||||
'-p tractor._testing.pytest',
|
||||
|
||||
# disable `xonsh` plugin
|
||||
# https://docs.pytest.org/en/stable/how-to/plugins.html#disabling-plugins-from-autoloading
|
||||
# https://docs.pytest.org/en/stable/how-to/plugins.html#deactivating-unregistering-a-plugin-by-name
|
||||
'-p no:xonsh',
|
||||
|
||||
# XXX default on non-forking spawners
|
||||
'--capture=fd',
|
||||
# '--capture=sys',
|
||||
# ^XXX NOTE^ ALWAYS SET THIS for `*_forkserver` spawner
|
||||
# backends! see details @
|
||||
# `tractor._testing.pytest.pytest_load_initial_conftests()`
|
||||
|
||||
]
|
||||
log_cli = false
|
||||
# TODO: maybe some of these layout choices?
|
||||
# https://docs.pytest.org/en/8.0.x/explanation/goodpractices.html#choosing-a-test-layout-import-rules
|
||||
# pythonpath = "src"
|
||||
|
||||
# https://docs.pytest.org/en/stable/reference/reference.html#confval-console_output_style
|
||||
console_output_style = 'progress'
|
||||
# ------ tool.pytest ------
|
||||
|
|
|
|||
|
|
@ -0,0 +1,8 @@
|
|||
# vim: ft=ini
|
||||
# pytest.ini for tractor
|
||||
|
||||
[pytest]
|
||||
# don't show frickin captured logs AGAIN in the report..
|
||||
addopts = --show-capture='no'
|
||||
log_cli = false
|
||||
; minversion = 6.0
|
||||
|
|
@ -35,8 +35,8 @@ exclude = [
|
|||
line-length = 88
|
||||
indent-width = 4
|
||||
|
||||
# assume latest minor cpython
|
||||
target-version = "py313"
|
||||
# Assume Python 3.11
|
||||
target-version = "py311"
|
||||
|
||||
[lint]
|
||||
# Enable Pyflakes (`F`) and a subset of the pycodestyle (`E`) codes by default.
|
||||
|
|
|
|||
|
|
@ -1,237 +0,0 @@
|
|||
#!/usr/bin/env python3
|
||||
# tractor: structured concurrent "actors".
|
||||
# Copyright 2018-eternity Tyler Goodlet.
|
||||
#
|
||||
# SPDX-License-Identifier: AGPL-3.0-or-later
|
||||
'''
|
||||
`tractor-reap` — SC-polite zombie-subactor reaper +
|
||||
optional `/dev/shm/` orphan-segment sweep.
|
||||
|
||||
Three cleanup phases (run in order when enabled):
|
||||
|
||||
1. **process reap** — finds `tractor` subactor processes
|
||||
left alive after a `pytest` (or any tractor-app) run
|
||||
that failed to fully cancel its actor tree, then sends
|
||||
SIGINT with a bounded grace window before escalating
|
||||
to SIGKILL.
|
||||
|
||||
2. **shm sweep** (`--shm` / `--shm-only`) — unlinks
|
||||
`/dev/shm/<file>` entries owned by the current uid
|
||||
that no live process has open (mmap'd or fd-held).
|
||||
Needed because `tractor` disables
|
||||
`mp.resource_tracker` (see `tractor.ipc._mp_bs`), so a
|
||||
hard-crashing actor leaves leaked segments that
|
||||
nothing else GCs.
|
||||
|
||||
3. **UDS sweep** (`--uds` / `--uds-only`) — unlinks
|
||||
`${XDG_RUNTIME_DIR}/tractor/<name>@<pid>.sock` files
|
||||
whose binder pid is dead (or the `1616` registry
|
||||
sentinel). Needed because the IPC server's
|
||||
`os.unlink()` cleanup lives in a `finally:` block
|
||||
that doesn't always run on hard exits (SIGKILL,
|
||||
escaped `KeyboardInterrupt`, etc.) — see issue #452.
|
||||
|
||||
Process-reap detection modes (auto-selected):
|
||||
|
||||
--parent <pid> : descendant-mode — kill procs whose
|
||||
PPid == <pid>. Use when a parent
|
||||
is still alive and you want to
|
||||
scope the sweep precisely (e.g.
|
||||
CI wrapper calling in from outside
|
||||
pytest).
|
||||
|
||||
(default) : orphan-mode — kill procs with
|
||||
PPid==1 (init-reparented) whose
|
||||
cwd matches the repo root AND
|
||||
whose cmdline contains `python`.
|
||||
The cwd filter is what prevents
|
||||
sweeping unrelated init-children.
|
||||
|
||||
Usage:
|
||||
|
||||
# process reap only (default)
|
||||
scripts/tractor-reap
|
||||
|
||||
# process reap + shm sweep
|
||||
scripts/tractor-reap --shm
|
||||
|
||||
# only the shm sweep, skip process reap
|
||||
scripts/tractor-reap --shm-only
|
||||
|
||||
# process reap + shm + UDS sweep (the works)
|
||||
scripts/tractor-reap --shm --uds
|
||||
|
||||
# only UDS sweep
|
||||
scripts/tractor-reap --uds-only
|
||||
|
||||
# from inside a still-live supervisor
|
||||
scripts/tractor-reap --parent 12345
|
||||
|
||||
# dry-run: list what would be reaped, don't act
|
||||
scripts/tractor-reap -n
|
||||
scripts/tractor-reap --shm --uds -n
|
||||
|
||||
'''
|
||||
import argparse
|
||||
import pathlib
|
||||
import subprocess
|
||||
import sys
|
||||
|
||||
|
||||
def _repo_root() -> pathlib.Path:
|
||||
'''
|
||||
Use `git rev-parse --show-toplevel` when available;
|
||||
fall back to the repo this script lives in.
|
||||
|
||||
'''
|
||||
try:
|
||||
out: str = subprocess.check_output(
|
||||
['git', 'rev-parse', '--show-toplevel'],
|
||||
stderr=subprocess.DEVNULL,
|
||||
text=True,
|
||||
).strip()
|
||||
return pathlib.Path(out)
|
||||
except (subprocess.CalledProcessError, FileNotFoundError):
|
||||
return pathlib.Path(__file__).resolve().parent.parent
|
||||
|
||||
|
||||
def main() -> int:
|
||||
parser = argparse.ArgumentParser(
|
||||
prog='tractor-reap',
|
||||
description=__doc__,
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
)
|
||||
parser.add_argument(
|
||||
'--parent', '-p',
|
||||
type=int,
|
||||
default=None,
|
||||
help='descendant-mode: reap procs with PPid==<pid>',
|
||||
)
|
||||
parser.add_argument(
|
||||
'--grace', '-g',
|
||||
type=float,
|
||||
default=3.0,
|
||||
help='SIGINT grace window in seconds (default 3.0)',
|
||||
)
|
||||
parser.add_argument(
|
||||
'--dry-run', '-n',
|
||||
action='store_true',
|
||||
help='list matched pids/paths but do not signal/unlink',
|
||||
)
|
||||
parser.add_argument(
|
||||
'--shm',
|
||||
action='store_true',
|
||||
help=(
|
||||
'after process reap, also unlink orphaned '
|
||||
'/dev/shm segments owned by the current user '
|
||||
'that no live process is mapping or holding open'
|
||||
),
|
||||
)
|
||||
parser.add_argument(
|
||||
'--shm-only',
|
||||
action='store_true',
|
||||
help='skip process reap; only do the shm sweep',
|
||||
)
|
||||
parser.add_argument(
|
||||
'--uds',
|
||||
action='store_true',
|
||||
help=(
|
||||
'after process reap, also unlink orphaned '
|
||||
'${XDG_RUNTIME_DIR}/tractor/*.sock files '
|
||||
'whose binder pid is dead (or the 1616 '
|
||||
'registry sentinel). See issue #452.'
|
||||
),
|
||||
)
|
||||
parser.add_argument(
|
||||
'--uds-only',
|
||||
action='store_true',
|
||||
help='skip process reap + shm; only do the UDS sweep',
|
||||
)
|
||||
args = parser.parse_args()
|
||||
# any *-only flag also skips the process reap phase
|
||||
skip_proc_reap: bool = (
|
||||
args.shm_only
|
||||
or
|
||||
args.uds_only
|
||||
)
|
||||
|
||||
# import lazily so `--help` doesn't require the tractor
|
||||
# package to be importable (e.g. when running from a
|
||||
# shell not inside a venv).
|
||||
repo = _repo_root()
|
||||
sys.path.insert(0, str(repo))
|
||||
from tractor._testing._reap import (
|
||||
find_descendants,
|
||||
find_orphans,
|
||||
find_orphaned_shm,
|
||||
find_orphaned_uds,
|
||||
reap,
|
||||
reap_shm,
|
||||
reap_uds,
|
||||
)
|
||||
|
||||
rc: int = 0
|
||||
|
||||
# --- phase 1: process reap (skipped under --*-only) ---
|
||||
if not skip_proc_reap:
|
||||
if args.parent is not None:
|
||||
pids: list[int] = find_descendants(args.parent)
|
||||
mode: str = f'descendants of PPid={args.parent}'
|
||||
else:
|
||||
pids = find_orphans(repo)
|
||||
mode = f'orphans (PPid=1, cwd={repo})'
|
||||
|
||||
if not pids:
|
||||
print(f'[tractor-reap] no {mode} to reap')
|
||||
elif args.dry_run:
|
||||
print(
|
||||
f'[tractor-reap] dry-run — {mode}:\n {pids}'
|
||||
)
|
||||
else:
|
||||
_, survivors = reap(pids, grace=args.grace)
|
||||
if survivors:
|
||||
rc = 1
|
||||
|
||||
# --- phase 2: shm sweep (opt-in) ---
|
||||
if args.shm or args.shm_only:
|
||||
leaked: list[str] = find_orphaned_shm()
|
||||
if not leaked:
|
||||
print(
|
||||
'[tractor-reap] no orphaned /dev/shm '
|
||||
'segments to sweep'
|
||||
)
|
||||
elif args.dry_run:
|
||||
print(
|
||||
f'[tractor-reap] dry-run — {len(leaked)} '
|
||||
f'orphaned shm segment(s):\n {leaked}'
|
||||
)
|
||||
else:
|
||||
_, errors = reap_shm(leaked)
|
||||
if errors:
|
||||
rc = 1
|
||||
|
||||
# --- phase 3: UDS sweep (opt-in) ---
|
||||
if args.uds or args.uds_only:
|
||||
leaked_uds: list[str] = find_orphaned_uds()
|
||||
if not leaked_uds:
|
||||
print(
|
||||
'[tractor-reap] no orphaned UDS sock-files '
|
||||
'to sweep'
|
||||
)
|
||||
elif args.dry_run:
|
||||
print(
|
||||
f'[tractor-reap] dry-run — {len(leaked_uds)} '
|
||||
f'orphaned UDS sock-file(s):\n {leaked_uds}'
|
||||
)
|
||||
else:
|
||||
_, errors = reap_uds(leaked_uds)
|
||||
if errors:
|
||||
rc = 1
|
||||
|
||||
# exit 0 if everything cleaned cleanly, else 1 — useful
|
||||
# for CI health-check chaining.
|
||||
return rc
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
raise SystemExit(main())
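The "SIGINT with a bounded grace window before escalating to SIGKILL" policy described in the script's docstring can be sketched with only the stdlib. This is a minimal illustration under stated assumptions, not the actual `tractor._testing._reap.reap()` implementation: the `reap_with_grace()` helper, its parameters and the polling interval are hypothetical names chosen for this example, and it targets POSIX signals only.

```python
import signal
import subprocess
import sys
import time


def reap_with_grace(
    proc: subprocess.Popen,
    grace: float = 3.0,
    poll_interval: float = 0.05,
) -> int:
    '''
    Hypothetical helper: politely request exit via SIGINT,
    wait up to `grace` seconds, then escalate to SIGKILL.

    Returns the process' final return code.

    '''
    proc.send_signal(signal.SIGINT)

    # poll until the grace window expires
    deadline: float = time.monotonic() + grace
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return proc.returncode
        time.sleep(poll_interval)

    # grace window expired: escalate to a hard kill
    proc.kill()
    return proc.wait()


if __name__ == '__main__':
    child = subprocess.Popen(
        [sys.executable, '-c', 'import time; time.sleep(60)'],
    )
    time.sleep(0.5)  # let the child start up
    print(reap_with_grace(child, grace=2.0))
```

A monotonic clock is used for the deadline so that wall-clock adjustments cannot shorten or extend the grace window.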
|
||||
|
|
@ -9,11 +9,8 @@ import os
|
|||
import signal
|
||||
import platform
|
||||
import time
|
||||
from pathlib import Path
|
||||
from typing import Literal
|
||||
|
||||
import pytest
|
||||
import tractor
|
||||
from tractor._testing import (
|
||||
examples_dir as examples_dir,
|
||||
tractor_test as tractor_test,
|
||||
|
|
@ -22,111 +19,58 @@ from tractor._testing import (
|
|||
|
||||
pytest_plugins: list[str] = [
|
||||
'pytester',
|
||||
# NOTE, now loaded in `pytest-ini` section of `pyproject.toml`
|
||||
# 'tractor._testing.pytest',
|
||||
'tractor._testing.pytest',
|
||||
]
|
||||
|
||||
_ci_env: bool = os.environ.get('CI', False)
|
||||
_non_linux: bool = platform.system() != 'Linux'
|
||||
|
||||
# Sending signal.SIGINT on subprocess fails on windows. Use CTRL_* alternatives
|
||||
if platform.system() == 'Windows':
|
||||
_KILL_SIGNAL = signal.CTRL_BREAK_EVENT
|
||||
_INT_SIGNAL = signal.CTRL_C_EVENT
|
||||
_INT_RETURN_CODE = 3221225786
|
||||
_PROC_SPAWN_WAIT = 2
|
||||
else:
|
||||
_KILL_SIGNAL = signal.SIGKILL
|
||||
_INT_SIGNAL = signal.SIGINT
|
||||
_INT_RETURN_CODE = 1 if sys.version_info < (3, 8) else -signal.SIGINT.value
|
||||
_PROC_SPAWN_WAIT = (
|
||||
0.6
|
||||
if sys.version_info < (3, 7)
|
||||
else 0.4
|
||||
)
|
||||
|
||||
|
||||
no_windows = pytest.mark.skipif(
|
||||
platform.system() == "Windows",
|
||||
reason="Test is unsupported on windows",
|
||||
)
|
||||
no_macos = pytest.mark.skipif(
|
||||
platform.system() == "Darwin",
|
||||
reason="Test is unsupported on MacOS",
|
||||
)
|
||||
|
||||
|
||||
def get_cpu_state(
|
||||
icpu: int = 0,
|
||||
setting: Literal[
|
||||
'scaling_governor',
|
||||
'*_pstate_max_freq',
|
||||
'scaling_max_freq',
|
||||
# 'scaling_cur_freq',
|
||||
] = '*_pstate_max_freq',
|
||||
) -> tuple[
|
||||
Path,
|
||||
str|int,
|
||||
]|None:
|
||||
'''
|
||||
Attempt to read the (first) CPU's setting according
|
||||
to the given `setting` from the sysfs path,
|
||||
|
||||
/sys/devices/system/cpu/cpu0/cpufreq/{setting}
|
||||
|
||||
Useful to determine latency headroom for various perf affected
|
||||
test suites.
|
||||
|
||||
'''
|
||||
try:
|
||||
# Read the requested `setting` for core `icpu` (usually same for all cores)
|
||||
setting_path: Path = list(
|
||||
Path(f'/sys/devices/system/cpu/cpu{icpu}/cpufreq/')
|
||||
.glob(f'{setting}')
|
||||
)[0] # <- XXX must be single match!
|
||||
with open(
|
||||
setting_path,
|
||||
'r',
|
||||
) as f:
|
||||
return (
|
||||
setting_path,
|
||||
f.read().strip(),
|
||||
)
|
||||
except (FileNotFoundError, IndexError):
|
||||
return None
|
||||
def pytest_addoption(
|
||||
parser: pytest.Parser,
|
||||
):
|
||||
# ?TODO? should this be exposed from our `._testing.pytest`
|
||||
# plugin or should we make it more explicit with `--tl` for
|
||||
# tractor logging like we do in other client projects?
|
||||
parser.addoption(
|
||||
"--ll",
|
||||
action="store",
|
||||
dest='loglevel',
|
||||
default='ERROR', help="logging level to set when testing"
|
||||
)
|
||||
|
||||
|
||||
def cpu_scaling_factor() -> float:
|
||||
'''
|
||||
Return a latency-headroom multiplier (>= 1.0) reflecting how
|
||||
much to inflate time-limits when CPU-freq scaling is active on
|
||||
linux.
|
||||
|
||||
When no scaling info is available (non-linux, missing sysfs),
|
||||
returns 1.0 (i.e. no headroom adjustment needed).
|
||||
|
||||
'''
|
||||
if _non_linux:
|
||||
return 1.
|
||||
|
||||
mx = get_cpu_state()
|
||||
cur = get_cpu_state(setting='scaling_max_freq')
|
||||
if mx is None or cur is None:
|
||||
return 1.
|
||||
|
||||
_mx_pth, max_freq = mx
|
||||
_cur_pth, cur_freq = cur
|
||||
cpu_scaled: float = int(cur_freq) / int(max_freq)
|
||||
|
||||
if cpu_scaled != 1.:
|
||||
return 1. / (
|
||||
cpu_scaled * 2 # <- bc likely "dual threaded"
|
||||
)
|
||||
|
||||
return 1.
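To make the arithmetic in `cpu_scaling_factor()` concrete, the sketch below isolates the formula into a pure function and evaluates it for a few frequency pairs. The `headroom_from_freqs()` name and the sample frequencies are assumptions made for illustration only; the real function reads both values from sysfs via `get_cpu_state()`.

```python
def headroom_from_freqs(
    cur_freq: int,
    max_freq: int,
) -> float:
    '''
    Hypothetical pure-function version of the ratio math
    used by `cpu_scaling_factor()`.

    '''
    cpu_scaled: float = int(cur_freq) / int(max_freq)
    if cpu_scaled != 1.:
        # same expression as the original, including the
        # `* 2` "dual threaded" fudge factor
        return 1. / (cpu_scaled * 2)

    return 1.


if __name__ == '__main__':
    for cur, mx in [
        (4_000_000, 4_000_000),  # no scaling
        (1_000_000, 4_000_000),  # capped at 25%
        (3_000_000, 4_000_000),  # capped at 75%
    ]:
        print(cur, mx, headroom_from_freqs(cur, mx))
```

Note that with the formula as written, a ratio strictly between 0.5 and 1.0 (such as the 75% case) produces a multiplier below 1.0, which appears to differ from the ">= 1.0" wording in the docstring; callers may want to clamp the result if a lower bound is required.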
|
||||
@pytest.fixture(scope='session', autouse=True)
|
||||
def loglevel(request):
|
||||
import tractor
|
||||
orig = tractor.log._default_loglevel
|
||||
level = tractor.log._default_loglevel = request.config.option.loglevel
|
||||
tractor.log.get_console_log(level)
|
||||
yield level
|
||||
tractor.log._default_loglevel = orig
|
||||
|
||||
|
||||
# NOTE, the `--ll`/`--tl` CLI flags + the `loglevel`, `test_log`
|
||||
# and `testing_pkg_name` fixtures have been factored into the
|
||||
# `tractor._testing.pytest` plugin (loaded via the `-p` entry in
|
||||
# `pyproject.toml`'s `[tool.pytest.ini_options]`) so downstream
|
||||
# consuming projects (eg. `modden`) inherit them for free. The
|
||||
# plugin's `testing_pkg_name` fixture defaults to `'tractor'`, so
|
||||
# this suite keeps treating `--ll` as the runtime loglevel.
|
||||
_ci_env: bool = os.environ.get('CI', False)
|
||||
|
||||
|
||||
@pytest.fixture(scope='session')
|
||||
|
|
@ -141,51 +85,92 @@ def ci_env() -> bool:
|
|||
def sig_prog(
|
||||
proc: subprocess.Popen,
|
||||
sig: int,
|
||||
canc_timeout: float = 0.2,
|
||||
tries: int = 3,
|
||||
canc_timeout: float = 0.1,
|
||||
) -> int:
|
||||
'''
|
||||
Kill the actor-process with `sig`.
|
||||
|
||||
Prefer to kill with the provided signal and
|
||||
failing a `canc_timeout`, send a `SIGKILL`-like signal
|
||||
to ensure termination.
|
||||
|
||||
'''
|
||||
for i in range(tries):
|
||||
proc.send_signal(sig)
|
||||
if proc.poll() is None:
|
||||
print(
|
||||
f'WARNING, proc still alive after,\n'
|
||||
f'canc_timeout={canc_timeout!r}\n'
|
||||
f'sig={sig!r}\n'
|
||||
f'\n'
|
||||
f'{proc.args!r}\n'
|
||||
)
|
||||
time.sleep(canc_timeout)
|
||||
else:
|
||||
"Kill the actor-process with ``sig``."
|
||||
proc.send_signal(sig)
|
||||
time.sleep(canc_timeout)
|
||||
if not proc.poll():
|
||||
# TODO: why sometimes does SIGINT not work on teardown?
|
||||
# seems to happen only when trace logging enabled?
|
||||
if proc.poll() is None:
|
||||
print(
|
||||
f'XXX WARNING KILLING PROG WITH SIGKILL XXX\n'
|
||||
f'canc_timeout={canc_timeout!r}\n'
|
||||
f'{proc.args!r}\n'
|
||||
)
|
||||
proc.send_signal(_KILL_SIGNAL)
|
||||
|
||||
proc.send_signal(_KILL_SIGNAL)
|
||||
ret: int = proc.wait()
|
||||
assert ret
|
||||
|
||||
|
||||
# NOTE, the `daemon` fixture (+ its `_wait_for_daemon_ready`
|
||||
# helper + the post-yield teardown drain logic) has been
|
||||
# moved to `tests/discovery/conftest.py` since 100% of its
|
||||
# consumers are discovery-protocol tests now living under
|
||||
# that subdir. See:
|
||||
# - `tests/discovery/test_multi_program.py`
|
||||
# - `tests/discovery/test_registrar.py`
|
||||
# - `tests/discovery/test_tpt_bind_addrs.py`
|
||||
# TODO: factor into @cm and move to `._testing`?
|
||||
@pytest.fixture
|
||||
def daemon(
|
||||
debug_mode: bool,
|
||||
loglevel: str,
|
||||
testdir: pytest.Pytester,
|
||||
reg_addr: tuple[str, int],
|
||||
tpt_proto: str,
|
||||
|
||||
) -> subprocess.Popen:
|
||||
'''
|
||||
Run a daemon root actor as a separate actor-process tree and
|
||||
"remote registrar" for discovery-protocol related tests.
|
||||
|
||||
'''
|
||||
if loglevel in ('trace', 'debug'):
|
||||
# XXX: too much logging will lock up the subproc (smh)
|
||||
loglevel: str = 'info'
|
||||
|
||||
code: str = (
|
||||
"import tractor; "
|
||||
"tractor.run_daemon([], "
|
||||
"registry_addrs={reg_addrs}, "
|
||||
"debug_mode={debug_mode}, "
|
||||
"loglevel={ll})"
|
||||
).format(
|
||||
reg_addrs=str([reg_addr]),
|
||||
ll="'{}'".format(loglevel) if loglevel else None,
|
||||
debug_mode=debug_mode,
|
||||
)
|
||||
cmd: list[str] = [
|
||||
sys.executable,
|
||||
'-c', code,
|
||||
]
|
||||
# breakpoint()
|
||||
kwargs = {}
|
||||
if platform.system() == 'Windows':
|
||||
# without this, tests hang on windows forever
|
||||
kwargs['creationflags'] = subprocess.CREATE_NEW_PROCESS_GROUP
|
||||
|
||||
proc: subprocess.Popen = testdir.popen(
|
||||
cmd,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
# UDS sockets are **really** fast to bind()/listen()/connect()
|
||||
# so it's often required that we delay a bit more starting
|
||||
# the first actor-tree..
|
||||
if tpt_proto == 'uds':
|
||||
global _PROC_SPAWN_WAIT
|
||||
_PROC_SPAWN_WAIT = 0.6
|
||||
|
||||
time.sleep(_PROC_SPAWN_WAIT)
|
||||
|
||||
assert not proc.returncode
|
||||
yield proc
|
||||
sig_prog(proc, _INT_SIGNAL)
|
||||
|
||||
# XXX! yeah.. just be reaaal careful with this bc sometimes it
|
||||
# can lock up on the `_io.BufferedReader` and hang..
|
||||
stderr: str = proc.stderr.read().decode()
|
||||
if stderr:
|
||||
print(
|
||||
f'Daemon actor tree produced STDERR:\n'
|
||||
f'{proc.args}\n'
|
||||
f'\n'
|
||||
f'{stderr}\n'
|
||||
)
|
||||
if proc.returncode != -2:
|
||||
raise RuntimeError(
|
||||
'Daemon actor tree failed !?\n'
|
||||
f'{proc.args}\n'
|
||||
)
|
||||
|
||||
|
||||
# @pytest.fixture(autouse=True)
|
||||
|
|
|
|||
|
|
@ -3,10 +3,6 @@
|
|||
|
||||
'''
|
||||
from __future__ import annotations
|
||||
import platform
|
||||
import os
|
||||
import re
|
||||
import signal
|
||||
import time
|
||||
from typing import (
|
||||
Callable,
|
||||
|
|
@ -36,29 +32,14 @@ if TYPE_CHECKING:
|
|||
from pexpect import pty_spawn
|
||||
|
||||
|
||||
_non_linux: bool = platform.system() != 'Linux'
|
||||
|
||||
|
||||
def pytest_configure(config):
|
||||
# register custom marks to avoid warnings see,
|
||||
# https://docs.pytest.org/en/stable/how-to/writing_plugins.html#registering-custom-markers
|
||||
config.addinivalue_line(
|
||||
'markers',
|
||||
'ctlcs_bish: test will (likely) not behave under SIGINT..'
|
||||
)
|
||||
|
||||
# a fn that sub-instantiates a `pexpect.spawn()`
|
||||
# and returns it.
|
||||
type PexpectSpawner = Callable[
|
||||
[str],
|
||||
pty_spawn.spawn,
|
||||
]
|
||||
type PexpectSpawner = Callable[[str], pty_spawn.spawn]
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def spawn(
|
||||
start_method: str,
|
||||
loglevel: str,
|
||||
testdir: pytest.Pytester,
|
||||
reg_addr: tuple[str, int],
|
||||
|
||||
|
|
@ -68,19 +49,9 @@ def spawn(
|
|||
run an `./examples/..` script by name.
|
||||
|
||||
'''
|
||||
supported_spawners: set[str] = {
|
||||
'trio',
|
||||
# `examples/debugging/<script>.py` picks up the spawn
|
||||
# backend via the `TRACTOR_SPAWN_METHOD` env-var which
|
||||
# is honored inside `tractor._root.open_root_actor()`,
|
||||
# so no per-script edits are required.
|
||||
'main_thread_forkserver',
|
||||
'subint_forkserver',
|
||||
}
|
||||
if start_method not in supported_spawners:
|
||||
if start_method != 'trio':
|
||||
pytest.skip(
|
||||
f'`pexpect` based tests NOT supported on spawning-backend: {start_method!r}\n'
|
||||
f'supported-spawners: {supported_spawners!r}'
|
||||
'`pexpect` based tests only supported on `trio` backend'
|
||||
)
|
||||
|
||||
def unset_colors():
|
||||
|
|
@ -92,117 +63,27 @@ def spawn(
|
|||
https://docs.python.org/3/using/cmdline.html#using-on-controlling-color
|
||||
|
||||
'''
|
||||
# disable colored tbs
|
||||
import os
|
||||
os.environ['PYTHON_COLORS'] = '0'
|
||||
# disable all ANSI color output
|
||||
# os.environ['NO_COLOR'] = '1'
|
||||
# ?TODO, doesn't seem to disable prompt color
|
||||
# for `pdbp`?
|
||||
|
||||
def set_spawn_method(
|
||||
start_method: str,
|
||||
):
|
||||
'''
|
||||
Drive the actor-spawn backend inside the spawned
|
||||
`examples/debugging/<script>.py` subproc via env-var
|
||||
(consumed by `tractor._root.open_root_actor()`),
|
||||
without requiring per-script CLI plumbing.
|
||||
|
||||
'''
|
||||
os.environ['TRACTOR_SPAWN_METHOD'] = start_method
|
||||
|
||||
def set_loglevel(
|
||||
loglevel: str|None,
|
||||
):
|
||||
'''
|
||||
Forward the test-suite parametrized `loglevel` into the
|
||||
spawned `examples/debugging/<script>.py` subproc via
|
||||
env-var (consumed by `tractor._root.open_root_actor()`),
|
||||
so console verbosity can be cranked or silenced from
|
||||
the test harness without per-script edits.
|
||||
|
||||
'''
|
||||
if loglevel:
|
||||
os.environ['TRACTOR_LOGLEVEL'] = loglevel
|
||||
else:
|
||||
os.environ.pop('TRACTOR_LOGLEVEL', None)
|
||||
|
||||
spawned: PexpectSpawner|None = None
|
||||
|
||||
def _spawn(
|
||||
cmd: str,
|
||||
expect_timeout: float = 4,
|
||||
start_method: str = start_method,
|
||||
loglevel: str|None = None,
|
||||
**mkcmd_kwargs,
|
||||
) -> pty_spawn.spawn:
|
||||
'''
|
||||
Inner closure handed to consumer tests to invoke
|
||||
`pytest.Pytester.spawn`
|
||||
|
||||
'''
|
||||
nonlocal spawned
|
||||
unset_colors()
|
||||
set_spawn_method(start_method=start_method)
|
||||
set_loglevel(
|
||||
loglevel=loglevel,
|
||||
# ?TODO^ when should this be set by `--ll <level>` ?
|
||||
# by default we apply 'error' but there should be a diff
|
||||
# vs. when the flag IS NOT passed?
|
||||
)
|
||||
spawned = testdir.spawn(
|
||||
return testdir.spawn(
|
||||
cmd=mk_cmd(
|
||||
cmd,
|
||||
**mkcmd_kwargs,
|
||||
),
|
||||
expect_timeout=(timeout:=(
|
||||
expect_timeout + 6
|
||||
if _non_linux and _ci_env
|
||||
else expect_timeout
|
||||
)),
|
||||
expect_timeout=3,
|
||||
# preexec_fn=unset_colors,
|
||||
# ^TODO? get `pytest` core to expose underlying
|
||||
# `pexpect.spawn()` stuff?
|
||||
)
|
||||
# sanity
|
||||
assert spawned.timeout == timeout
|
||||
return spawned
|
||||
|
||||
# such that test-dep can pass input script name.
|
||||
yield _spawn # the `PexpectSpawner`, type alias.
|
||||
|
||||
if (
|
||||
spawned
|
||||
and
|
||||
(ptyproc := spawned.ptyproc)
|
||||
):
|
||||
start: float = time.time()
|
||||
timeout: float = 5
|
||||
while (
|
||||
ptyproc.isalive()
|
||||
and
|
||||
(
|
||||
(_time_took := (time.time() - start))
|
||||
<
|
||||
timeout
|
||||
)
|
||||
):
|
||||
ptyproc.kill(signal.SIGINT)
|
||||
time.sleep(0.01)
|
||||
|
||||
if ptyproc.isalive():
|
||||
ptyproc.kill(signal.SIGKILL)
|
||||
|
||||
# Scope our env-var mutations to this single fixture invocation
|
||||
# — both `TRACTOR_SPAWN_METHOD` and `TRACTOR_LOGLEVEL` are
|
||||
# honored by `tractor._root.open_root_actor()` so leaking them
|
||||
# past this test could inadvertently re-route a later in-process
|
||||
# tractor test's spawn-backend / loglevel.
|
||||
os.environ.pop('TRACTOR_SPAWN_METHOD', None)
|
||||
os.environ.pop('TRACTOR_LOGLEVEL', None)
|
||||
|
||||
# TODO? ensure we've cleaned up any UDS-paths?
|
||||
# breakpoint()
|
||||
return _spawn # the `PexpectSpawner`, type alias.
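The env-var scoping done by the `spawn` fixture (set `TRACTOR_SPAWN_METHOD` / `TRACTOR_LOGLEVEL` before spawning, pop them on teardown) could also be expressed as a small context manager. The following is a sketch under assumptions, not code from this changeset: `scoped_env()` is a hypothetical helper, and unlike the fixture (which unconditionally pops the keys) it restores whatever value was present beforehand.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def scoped_env(**overrides: Optional[str]) -> Iterator[None]:
    '''
    Hypothetical helper: apply env-var overrides for the
    duration of the `with` block; a `None` value unsets
    the variable. Prior values are restored on exit.

    '''
    saved: dict[str, Optional[str]] = {
        key: os.environ.get(key)
        for key in overrides
    }
    try:
        for key, val in overrides.items():
            if val is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = val
        yield
    finally:
        # restore the pre-existing state, even on error
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old


if __name__ == '__main__':
    with scoped_env(TRACTOR_SPAWN_METHOD='trio'):
        print(os.environ['TRACTOR_SPAWN_METHOD'])
    print('TRACTOR_SPAWN_METHOD' in os.environ)
```

Restoring rather than popping matters when the variable was already exported in the developer's shell before the test session started.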
|
||||
|
||||
|
||||
@pytest.fixture(
|
||||
|
|
@ -210,47 +91,25 @@ def spawn(
|
|||
ids='ctl-c={}'.format,
|
||||
)
|
||||
def ctlc(
|
||||
request: pytest.FixtureRequest,
|
||||
request,
|
||||
ci_env: bool,
|
||||
start_method: str,
|
||||
|
||||
) -> bool:
|
||||
'''
|
||||
Parametrize and optionally skip tests which handle
|
||||
ctlc-in-`pdbp`-REPL testing scenarios; certain spawners and actor-tree depths
|
||||
cope very poorly with this..
|
||||
|
||||
In particular the spawning backends from `multiprocessing` are
|
||||
fragile, as can be the default `trio` spawner under certain
|
||||
conditions where SIGINT is relayed down the entire subproc tree.
|
||||
use_ctlc = request.param
|
||||
|
||||
'''
|
||||
use_ctlc: bool = request.param
|
||||
node = request.node
|
||||
markers = node.own_markers
|
||||
for mark in markers:
|
||||
if (
|
||||
mark.name == 'has_nested_actors'
|
||||
and
|
||||
start_method not in {
|
||||
# TODO, any spawners we should try again?
|
||||
# - [ ] 'trio' but WITHOUT the SIGINT handler setup
|
||||
# per subproc?
|
||||
# 'main_thread_forkserver',
|
||||
}
|
||||
):
|
||||
if mark.name == 'has_nested_actors':
|
||||
pytest.skip(
|
||||
f'Test {node} has nested actors and fails with Ctrl-C.\n'
|
||||
f'The test can sometimes run fine locally but until'
|
||||
' we solve this issue this CI test will be xfail:\n'
|
||||
'https://github.com/goodboy/tractor/issues/320'
|
||||
)
|
||||
if (
|
||||
mark.name == 'ctlcs_bish'
|
||||
and
|
||||
use_ctlc
|
||||
and
|
||||
all(mark.args)
|
||||
):
|
||||
|
||||
if mark.name == 'ctlcs_bish':
|
||||
pytest.skip(
|
||||
f'Test {node} prolly uses something from the stdlib (namely `asyncio`..)\n'
|
||||
f'The test and/or underlying example script can *sometimes* run fine '
|
||||
|
|
@ -270,10 +129,13 @@ def ctlc(
|
|||
|
||||
def expect(
|
||||
child,
|
||||
patt: str, # often a `pdbp`-prompt
|
||||
|
||||
# normally a `pdb` prompt by default
|
||||
patt: str,
|
||||
|
||||
**kwargs,
|
||||
|
||||
) -> str:
|
||||
) -> None:
|
||||
'''
|
||||
Expect wrapper that prints last seen console
|
||||
data before failing.
|
||||
|
|
@ -284,8 +146,6 @@ def expect(
|
|||
patt,
|
||||
**kwargs,
|
||||
)
|
||||
before = str(child.before.decode())
|
||||
return before
|
||||
except TIMEOUT:
|
||||
before = str(child.before.decode())
|
||||
print(before)
|
||||
|
|
@ -295,26 +155,6 @@ def expect(
|
|||
PROMPT = r"\(Pdb\+\)"
|
||||
|
||||
|
||||
# Strip terminal color / ANSI-VT100 escape sequences so
|
||||
# substring matching against REPL + traceback output stays
|
||||
# robust to color leakage — Python 3.13's colored tracebacks,
|
||||
# `pdbp`'s pygments highlighting, etc. — even when
|
||||
# `PYTHON_COLORS=0` (set in the `spawn` fixture) isn't honored
|
||||
# by every renderer in the spawned subproc.
|
||||
# Regex per https://stackoverflow.com/a/14693789
|
||||
_ansi_re: re.Pattern = re.compile(
|
||||
r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])'
|
||||
)
|
||||
|
||||
|
||||
def ansi_strip(text: str) -> str:
|
||||
'''
|
||||
Remove ANSI/VT100 escape sequences from `text`.
|
||||
|
||||
'''
|
||||
return _ansi_re.sub('', text)
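As a quick check of what `ansi_strip()` does, the standalone snippet below re-declares the same regex and applies it to a few colorized strings. The sample inputs are made up for illustration; only the pattern itself is taken from the diff above.

```python
import re

# same pattern as in the diff (per the linked SO answer)
_ansi_re: re.Pattern = re.compile(
    r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])'
)


def ansi_strip(text: str) -> str:
    '''
    Remove ANSI/VT100 escape sequences from `text`.

    '''
    return _ansi_re.sub('', text)


if __name__ == '__main__':
    # illustrative inputs: a red word and a bold-green word
    colored: str = '\x1b[31mRuntimeError\x1b[0m in \x1b[1;32mname_error\x1b[0m'
    print(ansi_strip(colored))
```

Stripping before substring matching means a test can assert on plain text such as an exception name regardless of whether the subprocess emitted color codes.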
|
||||
|
||||
|
||||
def in_prompt_msg(
|
||||
child: SpawnBase,
|
||||
parts: list[str],
|
||||
|
|
@ -334,7 +174,7 @@ def in_prompt_msg(
|
|||
'''
|
||||
__tracebackhide__: bool = False
|
||||
|
||||
before: str = ansi_strip(str(child.before.decode()))
|
||||
before: str = str(child.before.decode())
|
||||
for part in parts:
|
||||
if part not in before:
|
||||
if pause_on_false:
|
||||
|
|
@ -354,19 +194,16 @@ def in_prompt_msg(
|
|||
return True
|
||||
|
||||
|
||||
# NB: color-char stripping (so we can match against call-stack
|
||||
# frame output from the `ll` command and the like) is handled by
|
||||
# `ansi_strip()` applied inside `in_prompt_msg()` + below.
|
||||
# TODO: support terminal color-chars stripping so we can match
|
||||
# against call stack frame output from the 'll' command and the like!
|
||||
# -[ ] SO answer for stripping ANSI codes: https://stackoverflow.com/a/14693789
|
||||
def assert_before(
|
||||
child: SpawnBase,
|
||||
patts: list[str],
|
||||
**kwargs,
|
||||
) -> str:
|
||||
'''
|
||||
Assert a pattern is in `child.before.decode() -> str`,
|
||||
return the full `.before` output on success.
|
||||
|
||||
'''
|
||||
**kwargs,
|
||||
|
||||
) -> None:
|
||||
__tracebackhide__: bool = False
|
||||
|
||||
assert in_prompt_msg(
|
||||
|
|
@ -377,14 +214,12 @@ def assert_before(
|
|||
err_on_false=True,
|
||||
**kwargs
|
||||
)
|
||||
before: str = ansi_strip(str(child.before.decode()))
|
||||
return before
|
||||
|
||||
|
||||
def do_ctlc(
|
||||
child,
|
||||
count: int = 3,
|
||||
delay: float|None = None,
|
||||
delay: float = 0.1,
|
||||
patt: str|None = None,
|
||||
|
||||
# expect repl UX to reprint the prompt after every
|
||||
|
|
@ -396,7 +231,6 @@ def do_ctlc(
|
|||
) -> str|None:
|
||||
|
||||
before: str|None = None
|
||||
delay = delay or 0.1
|
||||
|
||||
# make sure ctl-c sends don't do anything but repeat output
|
||||
for _ in range(count):
|
||||
|
|
@ -407,10 +241,7 @@ def do_ctlc(
|
|||
# if you run this test manually it works just fine..
|
||||
if expect_prompt:
|
||||
time.sleep(delay)
|
||||
child.expect(
|
||||
PROMPT,
|
||||
timeout=(child.timeout * 2) if _ci_env else child.timeout,
|
||||
)
|
||||
child.expect(PROMPT)
|
||||
before = str(child.before.decode())
|
||||
time.sleep(delay)
|
||||
|
||||
|
|
|
|||
|
|
@ -24,7 +24,6 @@ from pexpect.exceptions import (
|
|||
TIMEOUT,
|
||||
EOF,
|
||||
)
|
||||
import tractor
|
||||
|
||||
from .conftest import (
|
||||
do_ctlc,
|
||||
|
|
@ -38,9 +37,6 @@ from .conftest import (
|
|||
in_prompt_msg,
|
||||
assert_before,
|
||||
)
|
||||
from ..conftest import (
|
||||
_ci_env,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from ..conftest import PexpectSpawner
|
||||
|
|
@ -55,14 +51,13 @@ if TYPE_CHECKING:
|
|||
# - recurrent root errors
|
||||
|
||||
|
||||
_non_linux: bool = platform.system() != 'Linux'
|
||||
|
||||
if platform.system() == 'Windows':
|
||||
pytest.skip(
|
||||
'Debugger tests have no windows support (yet)',
|
||||
allow_module_level=True,
|
||||
)
|
||||
|
||||
|
||||
# TODO: was trying to do this xfail style but some weird bug i see in CI
|
||||
# that's happening at collect time.. pretty soon gonna dump actions i'm
|
||||
# thinkin...
|
||||
|
|
@ -198,11 +193,6 @@ def test_root_actor_bp_forever(
|
|||
child.expect(EOF)
|
||||
|
||||
|
||||
# skip on non-Linux CI
|
||||
@pytest.mark.ctlcs_bish(
|
||||
_non_linux,
|
||||
_ci_env,
|
||||
)
|
||||
@pytest.mark.parametrize(
|
||||
'do_next',
|
||||
(True, False),
|
||||
|
|
@ -268,11 +258,6 @@ def test_subactor_error(
|
|||
child.expect(EOF)
|
||||
|
||||
|
||||
# skip on non-Linux CI
|
||||
@pytest.mark.ctlcs_bish(
|
||||
_non_linux,
|
||||
_ci_env,
|
||||
)
|
||||
def test_subactor_breakpoint(
|
||||
spawn,
|
||||
ctlc: bool,
|
||||
|
|
@ -344,7 +329,6 @@ def test_subactor_breakpoint(
|
|||
def test_multi_subactors(
|
||||
spawn,
|
||||
ctlc: bool,
|
||||
set_fork_aware_capture,
|
||||
):
|
||||
'''
|
||||
Multiple subactors, both erroring and
|
||||
|
|
@ -489,32 +473,15 @@ def test_multi_subactors(
|
|||
def test_multi_daemon_subactors(
|
||||
spawn,
|
||||
loglevel: str,
|
||||
ctlc: bool,
|
||||
set_fork_aware_capture,
|
||||
ctlc: bool
|
||||
):
|
||||
'''
|
||||
Multiple daemon subactors, both erroring and breakpointing within
|
||||
a stream.
|
||||
Multiple daemon subactors, both erroring and breakpointing within a
|
||||
stream.
|
||||
|
||||
'''
|
||||
non_linux = _non_linux
|
||||
if non_linux and ctlc:
|
||||
pytest.skip(
|
||||
'Ctl-c + MacOS is too unreliable/racy for this test..\n'
|
||||
)
|
||||
# !TODO, if someone with more patience than i wants to muck
|
||||
# with the timings on this please feel free to see all the
|
||||
# `non_linux` branching logic i added on my first attempt
|
||||
# below!
|
||||
#
|
||||
# my conclusion was that if i were to run the script
|
||||
# manually, and thus as slowly as a human would, the test
|
||||
# would and should pass as described in this test fn, however
|
||||
# after fighting with it for >= 1hr. i decided more than
|
||||
# likely the more extensive `linux` testing should cover most
|
||||
# regressions.
|
||||
|
||||
child = spawn('multi_daemon_subactors')
|
||||
|
||||
child.expect(PROMPT)
|
||||
|
||||
# there can be a race for which subactor will acquire
|
||||
|
|
@ -544,19 +511,8 @@ def test_multi_daemon_subactors(
|
|||
else:
|
||||
raise ValueError('Neither log msg was found !?')
|
||||
|
||||
non_linux_delay: float = 0.3
|
||||
if ctlc:
|
||||
do_ctlc(
|
||||
child,
|
||||
delay=(
|
||||
non_linux_delay
|
||||
if non_linux
|
||||
else None
|
||||
),
|
||||
)
|
||||
|
||||
if non_linux:
|
||||
time.sleep(1)
|
||||
do_ctlc(child)
|
||||
|
||||
# NOTE: previously since we did not have clobber prevention
|
||||
# in the root actor this final resume could result in the debugger
|
||||
|
|
@ -587,69 +543,33 @@ def test_multi_daemon_subactors(
|
|||
# assert "in use by child ('bp_forever'," in before
|
||||
|
||||
if ctlc:
|
||||
do_ctlc(
|
||||
child,
|
||||
delay=(
|
||||
non_linux_delay
|
||||
if non_linux
|
||||
else None
|
||||
),
|
||||
)
|
||||
|
||||
if non_linux:
|
||||
time.sleep(1)
|
||||
do_ctlc(child)
|
||||
|
||||
# expect another breakpoint actor entry
|
||||
child.sendline('c')
|
||||
child.expect(PROMPT)
|
||||
|
||||
try:
|
||||
before: str = assert_before(
|
||||
assert_before(
|
||||
child,
|
||||
bp_forev_parts,
|
||||
)
|
||||
except (
|
||||
# AssertionError, # TODO? rm since never raised?
|
||||
ValueError,
|
||||
):
|
||||
before: str = assert_before(
|
||||
except AssertionError:
|
||||
assert_before(
|
||||
child,
|
||||
name_error_parts,
|
||||
)
|
||||
|
||||
else:
|
||||
if ctlc:
|
||||
before: str = do_ctlc(
|
||||
child,
|
||||
delay=(
|
||||
non_linux_delay
|
||||
if non_linux
|
||||
else None
|
||||
),
|
||||
)
|
||||
|
||||
if non_linux:
|
||||
time.sleep(1)
|
||||
do_ctlc(child)
|
||||
|
||||
# should crash with the 2nd name error (simulates
|
||||
# a retry) and then the root eventually (boxed) errors
|
||||
# after 1 or more further bp actor entries.
|
||||
|
||||
child.sendline('c')
|
||||
try:
|
||||
child.expect(
|
||||
PROMPT,
|
||||
timeout=3,
|
||||
)
|
||||
except EOF:
|
||||
before: str = child.before.decode()
|
||||
print(
|
||||
f'\n'
|
||||
f'??? NEVER RXED `pdb` PROMPT ???\n'
|
||||
f'\n'
|
||||
f'{before}\n'
|
||||
)
|
||||
raise
|
||||
|
||||
child.expect(PROMPT)
|
||||
assert_before(
|
||||
child,
|
||||
name_error_parts,
|
||||
|
|
@ -769,10 +689,7 @@ def test_multi_subactors_root_errors(
|
|||
|
||||
@has_nested_actors
|
||||
def test_multi_nested_subactors_error_through_nurseries(
|
||||
ci_env: bool,
|
||||
spawn: PexpectSpawner,
|
||||
is_forking_spawner: bool,
|
||||
test_log: tractor.log.StackLevelAdapter,
|
||||
spawn,
|
||||
|
||||
# TODO: address debugger issue for nested tree:
|
||||
# https://github.com/goodboy/tractor/issues/320
|
||||
|
|
@ -789,105 +706,51 @@ def test_multi_nested_subactors_error_through_nurseries(
|
|||
# A test (below) has now been added to explicitly verify this is
|
||||
# fixed.
|
||||
|
||||
child = spawn(
|
||||
'multi_nested_subactors_error_up_through_nurseries',
|
||||
loglevel='pdb',
|
||||
)
|
||||
last_send_char: str|None = None
|
||||
for (
|
||||
i,
|
||||
send_char,
|
||||
) in enumerate(itertools.cycle(['c', 'q'])):
|
||||
child = spawn('multi_nested_subactors_error_up_through_nurseries')
|
||||
|
||||
timeout: float = child.timeout
|
||||
if (
|
||||
_non_linux
|
||||
and
|
||||
ci_env
|
||||
):
|
||||
timeout: float = 6
|
||||
|
||||
# XXX linux but the first crash sequence
|
||||
# can take longer to arrive at a prompt.
|
||||
elif i == 0:
|
||||
timeout = 5
|
||||
|
||||
# XXX forking backends may take longer due to
|
||||
# deterministic IPC cancellation.
|
||||
if is_forking_spawner:
|
||||
timeout += 4
|
||||
# timed_out_early: bool = False
|
||||
|
||||
for send_char in itertools.cycle(['c', 'q']):
|
||||
try:
|
||||
child.expect(
|
||||
PROMPT,
|
||||
timeout=timeout,
|
||||
)
|
||||
delay: float = 0.1
|
||||
test_log.info(f'Sleeping {delay!r} before next send-char..')
|
||||
time.sleep(delay)
|
||||
last_send_char: str = send_char
|
||||
child.expect(PROMPT)
|
||||
child.sendline(send_char)
|
||||
time.sleep(delay)
|
||||
time.sleep(0.01)
|
||||
|
||||
# script finally exited with tb on console.
|
||||
except EOF:
|
||||
test_log.info(
|
||||
f'Breaking from send-char loop\n'
|
||||
f'last_send_char: {last_send_char!r}\n'
|
||||
)
|
||||
break
|
||||
|
||||
# boxed source errors
|
||||
expect_patts: list[str] = [
|
||||
"NameError: name 'doggypants' is not defined",
|
||||
"tractor._exceptions.RemoteActorError:",
|
||||
"('name_error'",
|
||||
|
||||
# first level subtrees
|
||||
# "tractor._exceptions.RemoteActorError: ('spawner0'",
|
||||
"src_uid=('spawner0'",
|
||||
|
||||
# "tractor._exceptions.RemoteActorError: ('spawner1'",
|
||||
|
||||
# propagation of errors up through nested subtrees
|
||||
# "tractor._exceptions.RemoteActorError: ('spawn_until_0'",
|
||||
# "tractor._exceptions.RemoteActorError: ('spawn_until_1'",
|
||||
# "tractor._exceptions.RemoteActorError: ('spawn_until_2'",
|
||||
# ^-NOTE-^ old RAE repr, new one is below with a field
|
||||
# showing the src actor's uid.
|
||||
"src_uid=('spawn_until_2'",
|
||||
]
|
||||
# XXX, I HAVE NO IDEA why these patts only show on the
|
||||
# `trio`-spawner but it seems to have something to do with
|
||||
# what gets dumped in prior-prompt latches somehow??
|
||||
# TODO for claude, explain and or work through how this is
|
||||
# happening but ONLY WHEN RUN FROM THE TEST, bc when i try to
|
||||
# run the test script manually the correct output ALWAYS seems
|
||||
# to be in the last `str(child.before.decode())` output !?!?
|
||||
if (
|
||||
not is_forking_spawner
|
||||
and
|
||||
last_send_char == 'q'
|
||||
):
|
||||
expect_patts += [
|
||||
# expect the pdb-quit exc.
|
||||
"bdb.BdbQuit",
|
||||
# BUT WHY these dude!?
|
||||
"src_uid=('spawn_until_0'",
|
||||
"relay_uid=('spawn_until_1'",
|
||||
]
|
||||
|
||||
assert_before(
|
||||
child,
|
||||
expect_patts,
|
||||
[ # boxed source errors
|
||||
"NameError: name 'doggypants' is not defined",
|
||||
"tractor._exceptions.RemoteActorError:",
|
||||
"('name_error'",
|
||||
"bdb.BdbQuit",
|
||||
|
||||
# first level subtrees
|
||||
# "tractor._exceptions.RemoteActorError: ('spawner0'",
|
||||
"src_uid=('spawner0'",
|
||||
|
||||
# "tractor._exceptions.RemoteActorError: ('spawner1'",
|
||||
|
||||
# propagation of errors up through nested subtrees
|
||||
# "tractor._exceptions.RemoteActorError: ('spawn_until_0'",
|
||||
# "tractor._exceptions.RemoteActorError: ('spawn_until_1'",
|
||||
# "tractor._exceptions.RemoteActorError: ('spawn_until_2'",
|
||||
# ^-NOTE-^ old RAE repr, new one is below with a field
|
||||
# showing the src actor's uid.
|
||||
"src_uid=('spawn_until_0'",
|
||||
"relay_uid=('spawn_until_1'",
|
||||
"src_uid=('spawn_until_2'",
|
||||
]
|
||||
)
|
||||
expect(child, EOF)
|
||||
|
||||
|
||||
# @pytest.mark.timeout(15)
|
||||
@pytest.mark.timeout(15)
|
||||
@has_nested_actors
|
||||
def test_root_nursery_cancels_before_child_releases_tty_lock(
|
||||
spawn,
|
||||
start_method,
|
||||
ctlc: bool,
|
||||
):
|
||||
'''
|
||||
|
|
@ -1026,11 +889,6 @@ def test_different_debug_mode_per_actor(
|
|||
)
|
||||
|
||||
|
||||
# skip on non-Linux CI
|
||||
@pytest.mark.ctlcs_bish(
|
||||
_non_linux,
|
||||
_ci_env,
|
||||
)
|
||||
def test_post_mortem_api(
|
||||
spawn,
|
||||
ctlc: bool,
|
||||
|
|
@ -1186,12 +1044,7 @@ def test_shield_pause(
|
|||
"('cancelled_before_pause'", # actor name
|
||||
_repl_fail_msg,
|
||||
"trio.Cancelled",
|
||||
# trio >=0.30 raises via a multi-line
|
||||
# `raise Cancelled._create(source=.., reason=..,
|
||||
# source_task=..)` (cancel-reason metadata), so
|
||||
# match the open-paren form only, NOT the legacy
|
||||
# bare `()`.
|
||||
"raise Cancelled._create(",
|
||||
"raise Cancelled._create()",
|
||||
|
||||
# we should be handling a taskc inside
|
||||
# the first `.post_mortem()` sin-shield!
|
||||
|
|
@ -1209,12 +1062,7 @@ def test_shield_pause(
|
|||
"('root'", # actor name
|
||||
_repl_fail_msg,
|
||||
"trio.Cancelled",
|
||||
# trio >=0.30 raises via a multi-line
|
||||
# `raise Cancelled._create(source=.., reason=..,
|
||||
# source_task=..)` (cancel-reason metadata), so
|
||||
# match the open-paren form only, NOT the legacy
|
||||
# bare `()`.
|
||||
"raise Cancelled._create(",
|
||||
"raise Cancelled._create()",
|
||||
|
||||
# handling a taskc inside the first unshielded
|
||||
# `.post_mortem()`.
|
||||
|
|
@ -1239,11 +1087,7 @@ def test_ctxep_pauses_n_maybe_ipc_breaks(
|
|||
mashed and zombie reaper kills sub with no hangs.
|
||||
|
||||
'''
|
||||
child = spawn(
|
||||
'subactor_bp_in_ctx',
|
||||
loglevel='devx'
|
||||
# ^XXX REQUIRED for below patt matching!
|
||||
)
|
||||
child = spawn('subactor_bp_in_ctx')
|
||||
child.expect(PROMPT)
|
||||
|
||||
# 3 iters for the `gen()` pause-points
|
||||
|
|
@ -1289,21 +1133,12 @@ def test_ctxep_pauses_n_maybe_ipc_breaks(
|
|||
# closed so verify we see error reporting as well as
|
||||
# a failed crash-REPL request msg and can CTL-c our way
|
||||
# out.
|
||||
|
||||
# ?TODO, match depending on `tpt_proto(s)`?
|
||||
# - [ ] how can we pass it into the script tho?
|
||||
tpt: str = 'UDS'
|
||||
if _non_linux:
|
||||
tpt: str = 'TCP'
|
||||
|
||||
assert_before(
|
||||
child,
|
||||
['peer IPC channel closed abruptly?',
|
||||
'another task closed this fd',
|
||||
'Debug lock request was CANCELLED?',
|
||||
f"'Msgpack{tpt}Stream' was already closed locally?",
|
||||
f"TransportClosed: 'Msgpack{tpt}Stream' was already closed 'by peer'?",
|
||||
]
|
||||
"TransportClosed: 'MsgpackUDSStream' was already closed locally ?",]
|
||||
|
||||
# XXX races on whether these show/hit?
|
||||
# 'Failed to REPl via `_pause()` You called `tractor.pause()` from an already cancelled scope!',
|
||||
|
|
@ -1333,11 +1168,7 @@ def test_crash_handling_within_cancelled_root_actor(
|
|||
call.
|
||||
|
||||
'''
|
||||
child = spawn(
|
||||
'root_self_cancelled_w_error',
|
||||
loglevel='cancel',
|
||||
# ^XXX REQUIRED for below patt matching!
|
||||
)
|
||||
child = spawn('root_self_cancelled_w_error')
|
||||
child.expect(PROMPT)
|
||||
|
||||
assert_before(
|
||||
|
|
|
|||
|
|
@ -63,31 +63,19 @@ def test_pause_from_sync(
|
|||
`examples/debugging/sync_bp.py`
|
||||
|
||||
'''
|
||||
# XXX required for `breakpoint()` overload and
|
||||
# thus `tractor.devx.pause_from_sync()`.
|
||||
pytest.importorskip('greenback')
|
||||
child = spawn(
|
||||
'sync_bp',
|
||||
loglevel='pdb', # XXX pattern matching
|
||||
)
|
||||
child = spawn('sync_bp')
|
||||
|
||||
# first `sync_pause()` after nurseries open
|
||||
child.expect(PROMPT)
|
||||
_before: str = assert_before(
|
||||
assert_before(
|
||||
child,
|
||||
[
|
||||
# devx-loglevel
|
||||
# "imported <module 'greenback' from",
|
||||
# "successfully scheduled `._pause()` in `trio` thread on behalf of <Task",
|
||||
|
||||
_pause_msg, # pre-prompt line
|
||||
"('root'",
|
||||
# pre-prompt line
|
||||
_pause_msg,
|
||||
"<Task '__main__.main'",
|
||||
"tractor.pause_from_sync()",
|
||||
"('root'",
|
||||
]
|
||||
)
|
||||
# XXX `enable_stack_on_sig=False` in script
|
||||
assert 'stackscope' not in _before
|
||||
if ctlc:
|
||||
do_ctlc(child)
|
||||
# ^NOTE^ subactor not spawned yet; don't need extra delay.
|
||||
|
|
@ -97,18 +85,18 @@ def test_pause_from_sync(
|
|||
# first `await tractor.pause()` inside `p.open_context()` body
|
||||
child.expect(PROMPT)
|
||||
|
||||
# XXX shouldn't see gb loaded message with PDB loglevel!
|
||||
# assert not in_prompt_msg(
|
||||
# child,
|
||||
# ['`greenback` portal opened!'],
|
||||
# )
|
||||
# should be same root task
|
||||
assert_before(
|
||||
child,
|
||||
[
|
||||
# XXX should see gb loaded with devx-loglevel.
|
||||
# "`greenback` portal opened!",
|
||||
# "Activated `greenback` for `tractor.pause_from_sync()` support!",
|
||||
|
||||
_pause_msg,
|
||||
"('root'",
|
||||
"<Task '__main__.main'",
|
||||
"tractor.pause()",
|
||||
"('root'",
|
||||
]
|
||||
)
|
||||
|
||||
|
|
@ -139,17 +127,17 @@ def test_pause_from_sync(
|
|||
# `Lock.acquire()`-ed
|
||||
# (NOT both, which will result in REPL clobbering!)
|
||||
attach_patts: dict[str, list[str]] = {
|
||||
"|_<Task 'start_n_sync_pause'": [
|
||||
"|_('subactor'",
|
||||
"tractor.pause_from_sync()",
|
||||
'subactor': [
|
||||
"'start_n_sync_pause'",
|
||||
"('subactor'",
|
||||
],
|
||||
"|_<Thread(inline_root_bg_thread": [
|
||||
'inline_root_bg_thread': [
|
||||
"<Thread(inline_root_bg_thread",
|
||||
"('root'",
|
||||
"breakpoint(hide_tb=hide_tb)",
|
||||
],
|
||||
"|_<Thread(start_soon_root_bg_thread": [
|
||||
"|_('root'",
|
||||
"tractor.pause_from_sync()",
|
||||
'start_soon_root_bg_thread': [
|
||||
"<Thread(start_soon_root_bg_thread",
|
||||
"('root'",
|
||||
],
|
||||
}
|
||||
conts: int = 0 # for debugging below matching logic on failure
|
||||
|
|
@ -272,9 +260,6 @@ def test_sync_pause_from_aio_task(
|
|||
`examples/debugging/asyncio_bp.py`
|
||||
|
||||
'''
|
||||
# XXX required for `breakpoint()` overload and
|
||||
# thus `tractor.devx.pause_from_sync()`.
|
||||
pytest.importorskip('greenback')
|
||||
child = spawn('asyncio_bp')
|
||||
|
||||
# RACE on whether trio/asyncio task bps first
|
||||
|
|
|
|||
|
|
@ -1,178 +0,0 @@
|
|||
'''
|
||||
Tests for `tractor.devx._proctitle` (per-actor `setproctitle`)
|
||||
and the intrinsic-signal sub-actor detection in
|
||||
`tractor._testing._reap`.
|
||||
|
||||
The proctitle is set in `tractor._child._actor_child_main()`
|
||||
after `Actor` construction, so any spawned sub-actor process
|
||||
should:
|
||||
|
||||
- have `argv[0]` (== `/proc/<pid>/cmdline`) start with
|
||||
`<_def_prefix>[<aid.reprol()>]` (currently `_subactor[…]`)
|
||||
- have `/proc/<pid>/comm` start with `<_def_prefix>[`
|
||||
(kernel truncates to ~15 bytes)
|
||||
- be detected as a tractor sub-actor by
|
||||
`_is_tractor_subactor(pid)` via the cmdline marker.
|
||||
|
||||
`set_actor_proctitle()` itself is also unit-tested in-process
|
||||
to verify the format string.
|
||||
|
||||
'''
|
||||
from __future__ import annotations
|
||||
import platform
|
||||
|
||||
import psutil
|
||||
import pytest
|
||||
import trio
|
||||
import tractor
|
||||
|
||||
from tractor.runtime._runtime import Actor
|
||||
from tractor.devx._proctitle import (
|
||||
set_actor_proctitle,
|
||||
_def_prefix,
|
||||
)
|
||||
from tractor._testing._reap import (
|
||||
_is_tractor_subactor,
|
||||
_read_cmdline,
|
||||
_read_comm,
|
||||
)
|
||||
|
||||
|
||||
_non_linux: bool = platform.system() != 'Linux'
|
||||
|
||||
|
||||
def test_set_actor_proctitle_format():
|
||||
'''
|
||||
`set_actor_proctitle()` returns the canonical
|
||||
`<_def_prefix>[<aid.reprol()>]` form (currently
|
||||
`_subactor[…]`) and actually mutates the running
|
||||
proc's title.
|
||||
|
||||
'''
|
||||
pytest.importorskip(
|
||||
'setproctitle',
|
||||
reason='`setproctitle` is an optional runtime dep',
|
||||
)
|
||||
import setproctitle
|
||||
|
||||
# save + restore so we don't pollute pytest's own title
|
||||
saved: str = setproctitle.getproctitle()
|
||||
try:
|
||||
actor = Actor(
|
||||
name='unit_test_actor',
|
||||
uuid='1027301b-a0e3-430e-8806-a5279f21abe6',
|
||||
)
|
||||
title: str = set_actor_proctitle(actor)
|
||||
|
||||
# canonical wrapping: `<_def_prefix>[<aid.reprol()>]`.
|
||||
# We source BOTH the prefix (`_def_prefix`) and the
|
||||
# runtime-computed `reprol()` rather than hard-coding,
|
||||
# so the test stays decoupled from the prefix shape
|
||||
# (flipped to `_subactor` in `3a45dbd5`) AND from
|
||||
# `Aid.reprol()`'s internal format (currently
|
||||
# `<name>@<pid>`, but could evolve).
|
||||
expected: str = f'{_def_prefix}[{actor.aid.reprol()}]'
|
||||
assert title == expected
|
||||
# sanity: the actor's name must be in the title
|
||||
# somewhere (so a future `reprol()` change that
|
||||
# drops the name is also caught).
|
||||
assert 'unit_test_actor' in title
|
||||
|
||||
# actually set on the running proc
|
||||
assert setproctitle.getproctitle() == title
|
||||
|
||||
finally:
|
||||
setproctitle.setproctitle(saved)
|
||||
|
||||
|
||||
@pytest.mark.skipif(
|
||||
_non_linux,
|
||||
reason=(
|
||||
'detection helpers read `/proc/<pid>/{cmdline,comm}` '
|
||||
'which is Linux-specific'
|
||||
),
|
||||
)
|
||||
def test_subactor_proctitle_visible_via_proc():
|
||||
'''
|
||||
Spawn a sub-actor and verify its proc-title is visible
|
||||
via both `/proc/<pid>/cmdline` AND `/proc/<pid>/comm`,
|
||||
AND that `_is_tractor_subactor()` correctly identifies
|
||||
it.
|
||||
|
||||
'''
|
||||
pytest.importorskip('setproctitle')
|
||||
|
||||
async def main() -> dict:
|
||||
async with tractor.open_nursery() as an:
|
||||
portal = await an.start_actor('proctitle_boi')
|
||||
# let the child finish setproctitle in
|
||||
# `_actor_child_main`
|
||||
await trio.sleep(0.3)
|
||||
|
||||
# the sub-actor's pid is on the portal's chan
|
||||
# repr; psutil-walk `me.children()` is simpler.
|
||||
me = psutil.Process()
|
||||
sub_pids: list[int] = [
|
||||
p.pid for p in me.children(recursive=True)
|
||||
]
|
||||
assert sub_pids, (
|
||||
'expected at least one spawned sub-actor pid'
|
||||
)
|
||||
|
||||
results: dict = {}
|
||||
for pid in sub_pids:
|
||||
results[pid] = {
|
||||
'cmdline': _read_cmdline(pid),
|
||||
'comm': _read_comm(pid),
|
||||
'is_tractor': _is_tractor_subactor(pid),
|
||||
}
|
||||
|
||||
await portal.cancel_actor()
|
||||
return results
|
||||
|
||||
found: dict = trio.run(main)
|
||||
|
||||
# at least one of the spawned procs should match the
|
||||
# `proctitle_boi` actor we started; assert the proc-
|
||||
# title shape on it specifically.
|
||||
matched: list[tuple[int, dict]] = [
|
||||
(pid, info)
|
||||
for pid, info in found.items()
|
||||
if 'proctitle_boi' in info['cmdline']
|
||||
]
|
||||
assert matched, (
|
||||
f'no sub-actor pid had a `proctitle_boi` cmdline; '
|
||||
f'all={found}'
|
||||
)
|
||||
|
||||
pid, info = matched[0]
|
||||
# canonical proctitle prefix in cmdline (full form);
|
||||
# prefix sourced from `_def_prefix` so it tracks the
|
||||
# `3a45dbd5` flip (`tractor[` -> `_subactor[`).
|
||||
assert info['cmdline'].startswith(f'{_def_prefix}[proctitle_boi@'), (
|
||||
f'cmdline missing `{_def_prefix}[proctitle_boi@…]` prefix: '
|
||||
f'{info["cmdline"]!r}'
|
||||
)
|
||||
# comm is kernel-truncated to ~15 bytes — just check the
|
||||
# `<_def_prefix>[` prefix made it.
|
||||
assert info['comm'].startswith(f'{_def_prefix}['), (
|
||||
f'comm missing `{_def_prefix}[` prefix: {info["comm"]!r}'
|
||||
)
|
||||
# intrinsic-signal detector should match.
|
||||
assert info['is_tractor'] is True
|
||||
|
||||
|
||||
@pytest.mark.skipif(
|
||||
_non_linux,
|
||||
reason='reads /proc/<pid>/{cmdline,comm}',
|
||||
)
|
||||
def test_is_tractor_subactor_negative():
|
||||
'''
|
||||
`_is_tractor_subactor()` returns False for non-tractor
|
||||
procs (e.g. the pytest test-runner pid itself, which
|
||||
is `python -m pytest …` — no `tractor[` proctitle, no
|
||||
`tractor._child` cmdline).
|
||||
|
||||
'''
|
||||
import os
|
||||
assert _is_tractor_subactor(os.getpid()) is False
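
The tests above rely on helpers that read `/proc/<pid>/cmdline` and check for the `_subactor[` title prefix. The sketch below shows how such a read and prefix check could look. The names `parse_cmdline`, `read_cmdline` and `looks_like_subactor` are hypothetical stand-ins, not the actual `tractor._testing._reap` implementations, which are not shown in this diff.

```python
from pathlib import Path


def parse_cmdline(raw: bytes) -> str:
    '''
    Decode the raw content of `/proc/<pid>/cmdline`, in which
    argv entries are separated (and terminated) by NUL bytes.

    '''
    parts: list[bytes] = [p for p in raw.split(b'\x00') if p]
    return ' '.join(p.decode(errors='replace') for p in parts)


def read_cmdline(pid: int) -> str:
    '''
    Best-effort read; returns '' when the proc is gone or when
    `/proc` is unavailable (e.g. on non-Linux platforms).

    '''
    try:
        return parse_cmdline(Path(f'/proc/{pid}/cmdline').read_bytes())
    except OSError:
        return ''


def looks_like_subactor(
    cmdline: str,
    prefix: str = '_subactor',  # assumed value of `_def_prefix`
) -> bool:
    '''
    Hypothetical marker check: does the title start with
    `<prefix>[`?

    '''
    return cmdline.startswith(f'{prefix}[')


if __name__ == '__main__':
    import os
    print(repr(read_cmdline(os.getpid())))
```

Keeping the byte parsing separate from the file read means the decoding logic can be exercised on any platform, while only the thin `read_cmdline()` wrapper depends on Linux's `/proc`.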
|
||||
|
|
@ -21,7 +21,6 @@ import os
|
|||
import signal
|
||||
import time
|
||||
from typing import (
|
||||
Callable,
|
||||
TYPE_CHECKING,
|
||||
)
|
||||
|
||||
|
|
@ -32,9 +31,6 @@ from .conftest import (
|
|||
PROMPT,
|
||||
_pause_msg,
|
||||
)
|
||||
from ..conftest import (
|
||||
no_macos,
|
||||
)
|
||||
|
||||
import pytest
|
||||
from pexpect.exceptions import (
|
||||
|
|
@ -46,14 +42,8 @@ if TYPE_CHECKING:
|
|||
from ..conftest import PexpectSpawner
|
||||
|
||||
|
||||
@no_macos
|
||||
def test_shield_pause(
|
||||
spawn: Callable[
|
||||
...,
|
||||
PexpectSpawner,
|
||||
],
|
||||
start_method: str,
|
||||
request: pytest.FixtureRequest,
|
||||
spawn: PexpectSpawner,
|
||||
):
|
||||
'''
|
||||
Verify the `tractor.pause()/.post_mortem()` API works inside an
|
||||
|
|
@ -61,15 +51,12 @@ def test_shield_pause(
|
|||
next checkpoint wherein the cancelled will get raised.
|
||||
|
||||
'''
|
||||
child: PexpectSpawner = spawn(
|
||||
'shield_hang_in_sub',
|
||||
loglevel='devx',
|
||||
# ^XXX REQUIRED for below patt matching!
|
||||
child = spawn(
|
||||
'shield_hang_in_sub'
|
||||
)
|
||||
expect(
|
||||
child,
|
||||
'Yo my child hanging..?',
|
||||
timeout=3,
|
||||
)
|
||||
assert_before(
|
||||
child,
|
||||
|
|
@ -94,82 +81,38 @@ def test_shield_pause(
|
|||
# end-of-tree delimiter
|
||||
r"end-of-\('root'",
|
||||
)
|
||||
_before: str = assert_before(
|
||||
assert_before(
|
||||
child,
|
||||
[
|
||||
# 'Trying to dump `stackscope` tree..',
|
||||
# 'Dumping `stackscope` tree for actor',
|
||||
"('root'", # uid line
|
||||
|
||||
# TODO!? this in-task-code used to show??
|
||||
# TODO!? this used to show?
|
||||
# -[ ] mk reproducible for @oremanj?
|
||||
# => SOLVED? by our `trio_token.run_sync_soon()`
|
||||
# approach?
|
||||
#
|
||||
# parent block point (non-shielded)
|
||||
# 'await trio.sleep_forever() # in root',
|
||||
]
|
||||
)
|
||||
expect(
|
||||
child,
|
||||
# end-of-tree delimiter
|
||||
r"end-of-\('hanger'",
|
||||
)
|
||||
assert_before(
|
||||
child,
|
||||
[
|
||||
# relay to the sub should be reported
|
||||
'Relaying `SIGUSR1`[10] to sub-actor',
|
||||
|
||||
# NOTE, hierarchical-ordering invariant restored by
|
||||
# `_dump_then_relay` (co-scheduled dump+relay on the
|
||||
# trio loop, see `tractor.devx._stackscope`): the
|
||||
# parent's full task-tree prints BEFORE the 'Relaying
|
||||
# `SIGUSR1`' log msg, which prints BEFORE any sub-
|
||||
# actor receives the signal and dumps its own tree.
|
||||
# So the relay log appears BETWEEN `end-of-('root'`
|
||||
# (above) and `end-of-('hanger'` (below).
|
||||
handle_out_of_order: bool = False
|
||||
|
||||
# XXX, when capfd is NOT used we don't expect to
|
||||
# see the logging output from the subactor.
|
||||
if (no_capfd := (start_method in [
|
||||
'main_thread_forkserver',
|
||||
])
|
||||
):
|
||||
opts = request.config.option
|
||||
assert opts.spawn_backend == start_method
|
||||
# ?XXX? i guess the `testdir` fixture "pretends to" reset
|
||||
# this to the default 'fd'??
|
||||
# assert opts.capture in [
|
||||
# 'sys',
|
||||
# 'no',
|
||||
# ]
|
||||
|
||||
if (
|
||||
handle_out_of_order
|
||||
and
|
||||
"end-of-('hanger'" in _before
|
||||
):
|
||||
assert "('hanger'" in _before
|
||||
assert 'Relaying `SIGUSR1`[10] to sub-actor' in _before
|
||||
|
||||
else:
|
||||
_before = expect(
|
||||
child,
|
||||
'Relaying `SIGUSR1`\\[10\\] to sub-actor',
|
||||
)
|
||||
# _before: str = assert_before(
|
||||
# child,
|
||||
# ["('hanger'",] # uid line
|
||||
# )
|
||||
if not no_capfd:
|
||||
expect(
|
||||
child,
|
||||
# end-of-subactor's-tree delimiter
|
||||
r"end-of-\('hanger'",
|
||||
)
|
||||
_before: str = assert_before(
|
||||
child,
|
||||
[
|
||||
"('hanger'", # uid line
|
||||
|
||||
# TODO!? SEE ABOVE
|
||||
# hanger LOC where it's shield-halted
|
||||
# 'await trio.sleep_forever() # in subactor',
|
||||
]
|
||||
)
|
||||
"('hanger'", # uid line
|
||||
|
||||
# TODO!? SEE ABOVE
|
||||
# hanger LOC where it's shield-halted
|
||||
# 'await trio.sleep_forever() # in subactor',
|
||||
]
|
||||
)
|
||||
|
||||
# simulate the user sending a ctl-c to the hanging program.
|
||||
# this should result in the terminator kicking in since
|
||||
|
|
@ -178,26 +121,21 @@ def test_shield_pause(
|
|||
child.pid,
|
||||
signal.SIGINT,
|
||||
)
|
||||
from tractor.runtime._supervise import _shutdown_msg
|
||||
from tractor._supervise import _shutdown_msg
|
||||
expect(
|
||||
child,
|
||||
# 'Shutting down actor runtime',
|
||||
_shutdown_msg,
|
||||
timeout=6,
|
||||
)
|
||||
expect_on_teardown: list[str] = [
|
||||
'raise KeyboardInterrupt',
|
||||
'Root actor terminated',
|
||||
]
|
||||
if not no_capfd:
|
||||
expect_on_teardown += [
|
||||
assert_before(
|
||||
child,
|
||||
[
|
||||
'raise KeyboardInterrupt',
|
||||
# 'Shutting down actor runtime',
|
||||
'#T-800 deployed to collect zombie B0',
|
||||
"'--uid', \"('hanger',",
|
||||
]
|
||||
assert_before(
|
||||
child,
|
||||
expect_on_teardown,
|
||||
)
|
||||
|
||||
|
||||
|
|
@ -213,10 +151,8 @@ def test_breakpoint_hook_restored(
|
|||
calls used.
|
||||
|
||||
'''
|
||||
# XXX required for `breakpoint()` overload and
|
||||
# thus `tractor.devx.pause_from_sync()`.
|
||||
pytest.importorskip('greenback')
|
||||
child = spawn('restore_builtin_breakpoint')
|
||||
|
||||
child.expect(PROMPT)
|
||||
try:
|
||||
assert_before(
|
||||
|
|
|
|||
|
|
@ -1,223 +0,0 @@
|
|||
'''
|
||||
Discovery-suite fixtures, including the `daemon`
|
||||
remote-registrar subprocess used by the multi-program
|
||||
discovery tests.
|
||||
|
||||
Lives here (vs. the parent `tests/conftest.py`)
|
||||
because `daemon` is a discovery-protocol primitive —
|
||||
boots a separate `tractor.run_daemon()` process whose
|
||||
sole purpose is to serve as a registrar peer for
|
||||
discovery-roundtrip tests. Pytest fixtures inherit
|
||||
DOWNWARD through conftest hierarchy, so anything
|
||||
under `tests/discovery/` automatically picks this up.
|
||||
|
||||
'''
|
||||
from __future__ import annotations
|
||||
import os
|
||||
import platform
|
||||
import socket
|
||||
import subprocess
|
||||
import sys
|
||||
import time
|
||||
|
||||
import pytest
|
||||
import tractor
|
||||
|
||||
from ..conftest import (
|
||||
sig_prog,
|
||||
_INT_SIGNAL,
|
||||
_non_linux,
|
||||
)
|
||||
|
||||
|
||||
def _wait_for_daemon_ready(
|
||||
reg_addr: tuple,
|
||||
tpt_proto: str,
|
||||
*,
|
||||
deadline: float = 10.0,
|
||||
poll_interval: float = 0.05,
|
||||
proc: subprocess.Popen|None = None,
|
||||
) -> None:
|
||||
'''
|
||||
Active-poll the daemon's bind address until it
|
||||
accepts a connection (proving it has called
|
||||
`bind() + listen()` and is ready to handle IPC).
|
||||
|
||||
Replaces the historical blind `time.sleep()` in the
|
||||
`daemon` fixture which was racy under load — see
|
||||
`ai/conc-anal/test_register_duplicate_name_daemon_connect_race_issue.md`.
|
||||
|
||||
Uses stdlib `socket` directly (no trio runtime
|
||||
bootstrap cost) — sufficient because
|
||||
`tractor.run_daemon()` doesn't return from
|
||||
bootstrap until the runtime is fully ready to
|
||||
accept IPC.
|
||||
|
||||
Raises `TimeoutError` on `deadline` exceeded. If
|
||||
`proc` is given, ALSO raises early if the daemon
|
||||
process exits non-zero before the deadline (catches
|
||||
daemon-startup-crash that the blind sleep used to
|
||||
silently mask).
|
||||
|
||||
'''
|
||||
end: float = time.monotonic() + deadline
|
||||
last_exc: Exception|None = None
|
||||
while time.monotonic() < end:
|
||||
# Daemon-died-during-startup early-exit. Without
|
||||
# this, a crashed-on-import daemon would just
|
||||
# eat the full deadline before raising opaque
|
||||
# TimeoutError.
|
||||
if proc is not None and proc.poll() is not None:
|
||||
raise RuntimeError(
|
||||
f'Daemon proc exited (rc={proc.returncode}) '
|
||||
f'before becoming ready to accept on '
|
||||
f'{reg_addr!r}'
|
||||
)
|
||||
try:
|
||||
if tpt_proto == 'tcp':
|
||||
# `socket.create_connection` does the
|
||||
# `socket() + connect()` dance with a
|
||||
# builtin timeout — perfect primitive
|
||||
# for a one-shot probe.
|
||||
with socket.create_connection(
|
||||
reg_addr,
|
||||
timeout=poll_interval,
|
||||
):
|
||||
return
|
||||
else:
|
||||
# UDS — `reg_addr` is a `(filedir, sockname)`
|
||||
# tuple per `tractor.ipc._uds.UDSAddress.unwrap`.
|
||||
sockpath: str = os.path.join(*reg_addr)
|
||||
sock = socket.socket(socket.AF_UNIX)
|
||||
try:
|
||||
sock.settimeout(poll_interval)
|
||||
sock.connect(sockpath)
|
||||
return
|
||||
finally:
|
||||
sock.close()
|
||||
except (
|
||||
ConnectionRefusedError,
|
||||
FileNotFoundError,
|
||||
OSError,
|
||||
socket.timeout,
|
||||
) as exc:
|
||||
last_exc = exc
|
||||
time.sleep(poll_interval)
|
||||
raise TimeoutError(
|
||||
f'Daemon never accepted on {reg_addr!r} within '
|
||||
f'{deadline}s (last connect-attempt exc: '
|
||||
f'{last_exc!r})'
|
||||
)
|
||||
|
||||
|
||||
# TODO: factor into @cm and move to `._testing`?
|
||||
@pytest.fixture
|
||||
def daemon(
|
||||
debug_mode: bool,
|
||||
loglevel: str,
|
||||
testdir: pytest.Pytester,
|
||||
reg_addr: tuple[str, int],
|
||||
tpt_proto: str,
|
||||
ci_env: bool,
|
||||
test_log: tractor.log.StackLevelAdapter,
|
||||
|
||||
) -> subprocess.Popen:
|
||||
'''
|
||||
Run a daemon root actor as a separate actor-process
|
||||
tree and "remote registrar" for discovery-protocol
|
||||
related tests.
|
||||
|
||||
'''
|
||||
# XXX: too much logging will lock up the subproc (smh)
|
||||
if loglevel in ('trace', 'debug'):
|
||||
test_log.warning(
|
||||
f'Test harness log level is too verbose: {loglevel!r}\n'
|
||||
f'Reducing to INFO level..'
|
||||
)
|
||||
loglevel: str = 'info'
|
||||
|
||||
code: str = (
|
||||
"import tractor; "
|
||||
"tractor.run_daemon([], "
|
||||
"registry_addrs={reg_addrs}, "
|
||||
"enable_transports={enable_tpts}, "
|
||||
"debug_mode={debug_mode}, "
|
||||
"loglevel={ll})"
|
||||
).format(
|
||||
reg_addrs=str([reg_addr]),
|
||||
enable_tpts=str([tpt_proto]),
|
||||
ll="'{}'".format(loglevel) if loglevel else None,
|
||||
debug_mode=debug_mode,
|
||||
)
|
||||
cmd: list[str] = [
|
||||
sys.executable,
|
||||
'-c', code,
|
||||
]
|
||||
kwargs = {}
|
||||
if platform.system() == 'Windows':
|
||||
# without this, tests hang on windows forever
|
||||
kwargs['creationflags'] = subprocess.CREATE_NEW_PROCESS_GROUP
|
||||
|
||||
proc: subprocess.Popen = testdir.popen(
|
||||
cmd,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
# Active-poll the daemon's bind address until it's
|
||||
# ready to accept connections — replaces the legacy
|
||||
# blind `time.sleep(2.2)` which was racy under load
|
||||
# (see
|
||||
# `ai/conc-anal/test_register_duplicate_name_daemon_connect_race_issue.md`).
|
||||
#
|
||||
# Per-test deadline scales with platform: macOS/CI
|
||||
# gets extra headroom; Linux dev boxes need very
|
||||
# little.
|
||||
deadline: float = (
|
||||
15.0 if (_non_linux and ci_env)
|
||||
else 10.0
|
||||
)
|
||||
_wait_for_daemon_ready(
|
||||
reg_addr=reg_addr,
|
||||
tpt_proto=tpt_proto,
|
||||
deadline=deadline,
|
||||
proc=proc,
|
||||
)
|
||||
|
||||
assert not proc.returncode
|
||||
yield proc
|
||||
sig_prog(proc, _INT_SIGNAL)
|
||||
|
||||
# XXX! yeah.. just be reaaal careful with this bc
|
||||
# sometimes it can lock up on the `_io.BufferedReader`
|
||||
# and hang..
|
||||
#
|
||||
# NB, drain happens at TEARDOWN (post-yield), so the
|
||||
# test body has its chance to read `proc.stderr`
|
||||
# FIRST. Draining before the `yield` would silently swallow
|
||||
# the daemon's stderr output and break tests that
|
||||
# assert on it (e.g. `test_abort_on_sigint`).
|
||||
stderr: str = proc.stderr.read().decode()
|
||||
stdout: str = proc.stdout.read().decode()
|
||||
if (
|
||||
stderr
|
||||
or
|
||||
stdout
|
||||
):
|
||||
print(
|
||||
f'Daemon actor tree produced output:\n'
|
||||
f'{proc.args}\n'
|
||||
f'\n'
|
||||
f'stderr: {stderr!r}\n'
|
||||
f'stdout: {stdout!r}\n'
|
||||
)
|
||||
|
||||
if (rc := proc.returncode) != -2:
|
||||
msg: str = (
|
||||
f'Daemon actor tree was not cancelled !?\n'
|
||||
f'proc.args: {proc.args!r}\n'
|
||||
f'proc.returncode: {rc!r}\n'
|
||||
)
|
||||
if rc < 0:
|
||||
raise RuntimeError(msg)
|
||||
|
||||
test_log.error(msg)
|
||||
|
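The active-poll readiness check that the `daemon` fixture above relies on (in place of the legacy blind `time.sleep(2.2)`) can be sketched in isolation as below. This is a minimal, TCP-only illustration under stated assumptions, not the project's `_wait_for_daemon_ready()`: the function name `wait_for_tcp_accept` and its defaults are hypothetical, and the UDS branch and the `proc` liveness check are left out.

```python
import socket
import time


def wait_for_tcp_accept(
    addr: tuple[str, int],
    deadline: float = 10.0,
    poll_interval: float = 0.05,
) -> None:
    '''
    Poll `addr` until something accepts a TCP connection or
    `deadline` seconds elapse (hypothetical helper).

    '''
    end: float = time.monotonic() + deadline
    last_exc: OSError | None = None
    while time.monotonic() < end:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.settimeout(poll_interval)
            sock.connect(addr)
            return  # listener is up and accepting
        except OSError as exc:
            # covers refused connections and connect timeouts,
            # both of which are `OSError` subclasses
            last_exc = exc
            time.sleep(poll_interval)
        finally:
            sock.close()

    raise TimeoutError(
        f'Nothing accepted on {addr!r} within {deadline}s '
        f'(last connect-attempt exc: {last_exc!r})'
    )
```

Catching the broad `OSError` keeps the retry loop simple; the last exception is carried into the final `TimeoutError` so a failed startup still reports why the connect attempts were rejected.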
|
@ -1,355 +0,0 @@
|
|||
"""
|
||||
Multiple python programs invoking the runtime.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
import platform
|
||||
import subprocess
|
||||
import time
|
||||
from typing import (
|
||||
TYPE_CHECKING,
|
||||
)
|
||||
|
||||
import pytest
|
||||
import trio
|
||||
import tractor
|
||||
from tractor._testing import (
|
||||
tractor_test,
|
||||
)
|
||||
from tractor import (
|
||||
current_actor,
|
||||
Actor,
|
||||
Context,
|
||||
Portal,
|
||||
)
|
||||
from tractor.runtime import _state
|
||||
from ..conftest import (
|
||||
sig_prog,
|
||||
_INT_SIGNAL,
|
||||
_INT_RETURN_CODE,
|
||||
)
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from tractor.msg import Aid
|
||||
from tractor.discovery._addr import (
|
||||
UnwrappedAddress,
|
||||
)
|
||||
|
||||
|
||||
_non_linux: bool = platform.system() != 'Linux'
|
||||
|
||||
|
||||
# NOTE, multi-program tests historically triggered both
|
||||
# UDS sock-file leaks (daemon-subproc SIGKILL paths) AND
|
||||
# trio `WakeupSocketpair.drain()` busy-loops
|
||||
# (`test_register_duplicate_name`). Track + detect
|
||||
# per-test as a regression net.
|
||||
pytestmark = pytest.mark.usefixtures(
|
||||
'track_orphaned_uds_per_test',
|
||||
'detect_runaway_subactors_per_test',
|
||||
)
|
||||
|
||||
|
||||
def test_abort_on_sigint(
|
||||
daemon: subprocess.Popen,
|
||||
):
|
||||
assert daemon.returncode is None
|
||||
time.sleep(0.1)
|
||||
sig_prog(daemon, _INT_SIGNAL)
|
||||
assert daemon.returncode == _INT_RETURN_CODE
|
||||
|
||||
# XXX: oddly, couldn't get capfd.readouterr() to work here?
|
||||
if platform.system() != 'Windows':
|
||||
# don't check stderr on windows as it's empty when sending CTRL_C_EVENT
|
||||
assert "KeyboardInterrupt" in str(daemon.stderr.read())
|
||||
|
||||
|
||||
@tractor_test
|
||||
async def test_cancel_remote_registrar(
|
||||
daemon: subprocess.Popen,
|
||||
reg_addr: UnwrappedAddress,
|
||||
):
|
||||
assert not current_actor().is_registrar
|
||||
async with tractor.get_registry(reg_addr) as portal:
|
||||
await portal.cancel_actor()
|
||||
|
||||
await trio.sleep(0.1)
|
||||
# the registrar channel server is cancelled but not its main task
|
||||
assert daemon.returncode is None
|
||||
|
||||
# no registrar socket should exist
|
||||
with pytest.raises(OSError):
|
||||
async with tractor.get_registry(reg_addr) as portal:
|
||||
pass
|
||||
|
||||
|
||||
def test_register_duplicate_name(
|
||||
daemon: subprocess.Popen,
|
||||
reg_addr: UnwrappedAddress,
|
||||
):
|
||||
# bug-class-3 breadcrumbs: the *last* `[CANCEL]` line that
|
||||
# appears under `--ll cancel`/`TRACTOR_LOG_FILE=...` names the
|
||||
# cancel-cascade boundary that's parked. Pair with
|
||||
# `_trio_main` entry/exit breadcrumbs in
|
||||
# `tractor/spawn/_entry.py` to triangulate the swallow point.
|
||||
log = tractor.log.get_logger('tractor.tests.test_multi_program')
|
||||
|
||||
async def main():
|
||||
log.cancel('test_register_duplicate_name: enter `main()`')
|
||||
try:
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
) as an:
|
||||
log.cancel(
|
||||
'test_register_duplicate_name: '
|
||||
'actor nursery opened'
|
||||
)
|
||||
|
||||
assert not current_actor().is_registrar
|
||||
|
||||
p1 = await an.start_actor('doggy')
|
||||
log.cancel(
|
||||
'test_register_duplicate_name: '
|
||||
'spawned doggy #1'
|
||||
)
|
||||
p2 = await an.start_actor('doggy')
|
||||
log.cancel(
|
||||
'test_register_duplicate_name: '
|
||||
'spawned doggy #2'
|
||||
)
|
||||
|
||||
async with tractor.wait_for_actor('doggy') as portal:
|
||||
log.cancel(
|
||||
'test_register_duplicate_name: '
|
||||
'`wait_for_actor` returned'
|
||||
)
|
||||
assert portal.channel.uid in (p2.channel.uid, p1.channel.uid)
|
||||
|
||||
log.cancel(
|
||||
'test_register_duplicate_name: '
|
||||
'ABOUT TO CALL `an.cancel()`'
|
||||
)
|
||||
await an.cancel()
|
||||
log.cancel(
|
||||
'test_register_duplicate_name: '
|
||||
'`an.cancel()` returned'
|
||||
)
|
||||
finally:
|
||||
log.cancel(
|
||||
'test_register_duplicate_name: '
|
||||
'`open_nursery.__aexit__` returned, leaving `main()`'
|
||||
)
|
||||
|
||||
# XXX, run manually since we want to start this root **after**
|
||||
# the other "daemon" program with its own root.
|
||||
trio.run(main)
|
||||
|
||||
|
||||
# `n_dups` in {4, 8} both expose the SAME pre-existing race:
|
||||
# under rapid same-name spawning against a forkserver +
|
||||
# registrar, ONE of the spawned doggies `sys.exit(2)`s during
|
||||
# boot before completing parent-handshake. Surfaces now (post
|
||||
# the spawn-time `wait_for_peer_or_proc_death` fix) as
|
||||
# `ActorFailure rc=2`; previously it was silently masked by
|
||||
# the handshake-wait parking forever.
|
||||
#
|
||||
# Larger `n_dups` widens the race window so the boot-race
|
||||
# fires more often — n_dups=4 hits ~always, n_dups=8 hits
|
||||
# occasionally. Both xfail(strict=False) so the cancel-cascade
|
||||
# regression-check still passes when the boot-race happens
|
||||
# NOT to fire.
|
||||
#
|
||||
# Tracked separately in,
|
||||
# https://github.com/goodboy/tractor/issues/456
|
||||
_DOGGY_BOOT_RACE_XFAIL = pytest.mark.xfail(
|
||||
strict=False,
|
||||
reason=(
|
||||
'doggy boot-race rc=2 under rapid same-name '
|
||||
'spawn — separate bug from cancel-cascade'
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
'n_dups',
|
||||
[
|
||||
2,
|
||||
pytest.param(4, marks=_DOGGY_BOOT_RACE_XFAIL),
|
||||
pytest.param(8, marks=_DOGGY_BOOT_RACE_XFAIL),
|
||||
],
|
||||
ids=lambda n: f'n_dups={n}',
|
||||
)
|
||||
def test_dup_name_cancel_cascade_escalates_to_hard_kill(
|
||||
daemon: subprocess.Popen,
|
||||
reg_addr: UnwrappedAddress,
|
||||
n_dups: int,
|
||||
):
|
||||
'''
|
||||
Regression for the duplicate-name cancel-cascade hang under
|
||||
`tcp+main_thread_forkserver`.
|
||||
|
||||
When N actors share a single name and the parent calls
|
||||
`an.cancel()`, the daemon registrar gets N `register_actor` RPCs
|
||||
in tight succession. Under TCP+MTF, kernel-level socket-buffer
|
||||
contention can push at least one sub-actor's cancel-RPC ack past
|
||||
`Portal.cancel_timeout` (default 0.5s).
|
||||
|
||||
Pre-fix, `Portal.cancel_actor()` silently returned `False` on
|
||||
that timeout, the supervisor's outer `move_on_after(3)` never
|
||||
fired (each per-portal task always returned ≤0.5s, never
|
||||
exceeded 3s), and `soft_kill()`'s `await wait_func(proc)` parked
|
||||
forever — deadlocking nursery `__aexit__`.
|
||||
|
||||
Post-fix, `Portal.cancel_actor()` raises `ActorTooSlowError` on
|
||||
the bounded-wait timeout, and `ActorNursery.cancel()`'s
|
||||
per-child wrapper escalates to `proc.terminate()` (hard-kill).
|
||||
The full nursery teardown therefore stays bounded even under
|
||||
pathological timing.
|
||||
|
||||
`n_dups` is parametrized to widen the race window — more
|
||||
same-name siblings = more concurrent register-RPCs at the
|
||||
daemon = higher probability of hitting the contention path.
|
||||
|
||||
'''
|
||||
log = tractor.log.get_logger(
|
||||
'tractor.tests.test_multi_program'
|
||||
)
|
||||
|
||||
# outer hard ceiling: a regression should fail-fast, NOT hang
|
||||
# the test session for minutes. Budget scales with `n_dups`
|
||||
# since each extra same-name sibling adds ~spawn-cost +
|
||||
# potential cancel-ack-timeout escalation latency under
|
||||
# TCP+forkserver. ~5s/sibling + 15s baseline gives plenty of
|
||||
# headroom while still failing-loud on a real hang.
|
||||
fail_after_s: int = 15 + (5 * n_dups)
|
||||
|
||||
async def main():
|
||||
log.cancel(
|
||||
f'enter `main()` n_dups={n_dups}'
|
||||
)
|
||||
with trio.fail_after(fail_after_s):
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
) as an:
|
||||
portals: list[Portal] = []
|
||||
for i in range(n_dups):
|
||||
p: Portal = await an.start_actor('doggy')
|
||||
portals.append(p)
|
||||
log.cancel(
|
||||
f'spawned doggy #{i + 1}/{n_dups}'
|
||||
)
|
||||
|
||||
# at least one of the N must be discoverable by
|
||||
# name; doesn't matter which one (registrar will
|
||||
# have last-wins semantics under same-name).
|
||||
async with tractor.wait_for_actor('doggy') as portal:
|
||||
expected_uids = {p.channel.uid for p in portals}
|
||||
assert portal.channel.uid in expected_uids
|
||||
|
||||
# critical section: this MUST return within
|
||||
# `fail_after_s` even when one or more cancel-RPC
|
||||
# acks time out. Pre-fix, this hangs forever.
|
||||
log.cancel('about to call `an.cancel()`')
|
||||
await an.cancel()
|
||||
log.cancel('`an.cancel()` returned')
|
||||
|
||||
# post-teardown sanity: every child proc must be reaped.
|
||||
# If escalation worked, even timed-out cancel-RPCs would
|
||||
# have triggered `proc.terminate()` and the procs are dead.
|
||||
for p in portals:
|
||||
# `Portal.channel.connected()` -> False once the
|
||||
# underlying chan disconnected (clean exit OR
|
||||
# hard-killed proc both produce disconnect).
|
||||
assert not p.channel.connected(), (
|
||||
f'Portal chan still connected post-teardown?\n'
|
||||
f'{p.channel}'
|
||||
)
|
||||
|
||||
trio.run(main)
|
||||
|
||||
|
||||
@tractor.context
|
||||
async def get_root_portal(
|
||||
ctx: Context,
|
||||
):
|
||||
'''
|
||||
Connect back to the root actor manually (using `._discovery` API)
|
||||
and ensure its contact info is the same as our immediate parent.
|
||||
|
||||
'''
|
||||
sub: Actor = current_actor()
|
||||
rtvs: dict = _state._runtime_vars
|
||||
raddrs: list[UnwrappedAddress] = rtvs['_root_addrs']
|
||||
|
||||
# await tractor.pause()
|
||||
# XXX, in case the sub->root discovery breaks you might need
|
||||
# this (i know i did Xp)!!
|
||||
# from tractor.devx import mk_pdb
|
||||
# mk_pdb().set_trace()
|
||||
|
||||
assert (
|
||||
len(raddrs) == 1
|
||||
and
|
||||
list(sub._parent_chan.raddr.unwrap()) in raddrs
|
||||
)
|
||||
|
||||
# connect back to our immediate parent which should also
|
||||
# be the actor-tree's root.
|
||||
from tractor.discovery._api import get_root
|
||||
ptl: Portal
|
||||
async with get_root() as ptl:
|
||||
root_aid: Aid = ptl.chan.aid
|
||||
parent_ptl: Portal = current_actor().get_parent()
|
||||
assert (
|
||||
root_aid.name == 'root'
|
||||
and
|
||||
parent_ptl.chan.aid == root_aid
|
||||
)
|
||||
await ctx.started()
|
||||
|
||||
|
||||
def test_non_registrar_spawns_child(
|
||||
daemon: subprocess.Popen,
|
||||
reg_addr: UnwrappedAddress,
|
||||
loglevel: str,
|
||||
debug_mode: bool,
|
||||
ci_env: bool,
|
||||
):
|
||||
'''
|
||||
Ensure a non-registrar (serving) root actor can spawn a sub and
|
||||
that sub can connect back (manually) to its parent that is the
|
||||
root without issue.
|
||||
|
||||
More or less this audits the global contact info in
|
||||
`._state._runtime_vars`.
|
||||
|
||||
'''
|
||||
async def main():
|
||||
|
||||
# XXX, since apparently on macos in GH's CI it can be a race
|
||||
# with the `daemon` registrar on grabbing the socket-addr..
|
||||
if ci_env and _non_linux:
|
||||
await trio.sleep(.5)
|
||||
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
loglevel=loglevel,
|
||||
debug_mode=debug_mode,
|
||||
) as an:
|
||||
|
||||
actor: Actor = tractor.current_actor()
|
||||
assert not actor.is_registrar
|
||||
sub_ptl: Portal = await an.start_actor(
|
||||
name='sub',
|
||||
enable_modules=[__name__],
|
||||
)
|
||||
|
||||
async with sub_ptl.open_context(
|
||||
get_root_portal,
|
||||
) as (ctx, _):
|
||||
print('Waiting for `sub` to connect back to us..')
|
||||
|
||||
await an.cancel()
|
||||
|
||||
# XXX, run manually since we want to start this root **after**
|
||||
# the other "daemon" program with its own root.
|
||||
trio.run(main)
|
||||
|
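The bounded-wait-then-hard-kill escalation described in the `test_dup_name_cancel_cascade_escalates_to_hard_kill` docstring above can be illustrated with plain `subprocess` calls. This is only a sketch of the general pattern, assuming a POSIX host: it is not how `Portal.cancel_actor()` or `ActorNursery.cancel()` are implemented, and `cancel_with_escalation`, `spawn_stubborn_child` and the SIGINT-ignoring child are hypothetical stand-ins for a sub-actor whose cancel ack never arrives.

```python
import signal
import subprocess
import sys


def cancel_with_escalation(
    proc: subprocess.Popen,
    soft_timeout: float = 0.5,
) -> bool:
    '''
    Ask `proc` to exit gracefully, wait a bounded time, then
    hard-kill it. Returns `True` when escalation was needed.

    '''
    proc.send_signal(signal.SIGINT)  # graceful request (POSIX)
    try:
        proc.wait(timeout=soft_timeout)
        return False
    except subprocess.TimeoutExpired:
        # no exit within the bound: escalate so teardown
        # stays bounded instead of parking forever
        proc.kill()
        proc.wait()
        return True


# hypothetical child which ignores SIGINT, standing in for a
# peer that never acks a cancel request
_STUBBORN_CHILD: str = (
    "import signal, time; "
    "signal.signal(signal.SIGINT, signal.SIG_IGN); "
    "print('ready', flush=True); "
    "time.sleep(60)"
)


def spawn_stubborn_child() -> subprocess.Popen:
    proc = subprocess.Popen(
        [sys.executable, '-c', _STUBBORN_CHILD],
        stdout=subprocess.PIPE,
    )
    # block until the child has installed its signal handler
    proc.stdout.readline()
    return proc
```

The key property is the same one the test asserts: the caller always regains control within roughly `soft_timeout` plus the kill latency, whether or not the child cooperates.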
|
@ -1,376 +0,0 @@
|
|||
'''
|
||||
Multiaddr construction, parsing, and round-trip tests for
|
||||
`tractor.discovery._multiaddr.mk_maddr()` and
|
||||
`tractor.discovery._multiaddr.parse_maddr()`.
|
||||
|
||||
'''
|
||||
from pathlib import Path
|
||||
from types import SimpleNamespace
|
||||
|
||||
import pytest
|
||||
from multiaddr import Multiaddr
|
||||
|
||||
from tractor.ipc._tcp import TCPAddress
|
||||
from tractor.ipc._uds import UDSAddress
|
||||
from tractor.discovery._multiaddr import (
|
||||
mk_maddr,
|
||||
parse_maddr,
|
||||
parse_endpoints,
|
||||
_tpt_proto_to_maddr,
|
||||
_maddr_to_tpt_proto,
|
||||
)
|
||||
from tractor.discovery._addr import wrap_address
|
||||
|
||||
|
||||
def test_tpt_proto_to_maddr_mapping():
|
||||
'''
|
||||
`_tpt_proto_to_maddr` maps all supported `proto_key`
|
||||
values to their correct multiaddr protocol names.
|
||||
|
||||
'''
|
||||
assert _tpt_proto_to_maddr['tcp'] == 'tcp'
|
||||
assert _tpt_proto_to_maddr['uds'] == 'unix'
|
||||
assert len(_tpt_proto_to_maddr) == 2
|
||||
|
||||
|
||||
def test_mk_maddr_tcp_ipv4():
|
||||
'''
|
||||
`mk_maddr()` on a `TCPAddress` with an IPv4 host
|
||||
produces the correct `/ip4/<host>/tcp/<port>` multiaddr.
|
||||
|
||||
'''
|
||||
addr = TCPAddress('127.0.0.1', 1234)
|
||||
result: Multiaddr = mk_maddr(addr)
|
||||
|
||||
assert isinstance(result, Multiaddr)
|
||||
assert str(result) == '/ip4/127.0.0.1/tcp/1234'
|
||||
|
||||
protos = result.protocols()
|
||||
assert protos[0].name == 'ip4'
|
||||
assert protos[1].name == 'tcp'
|
||||
|
||||
assert result.value_for_protocol('ip4') == '127.0.0.1'
|
||||
assert result.value_for_protocol('tcp') == '1234'
|
||||
|
||||
|
||||
def test_mk_maddr_tcp_ipv6():
|
||||
'''
|
||||
`mk_maddr()` on a `TCPAddress` with an IPv6 host
|
||||
produces the correct `/ip6/<host>/tcp/<port>` multiaddr.
|
||||
|
||||
'''
|
||||
addr = TCPAddress('::1', 5678)
|
||||
result: Multiaddr = mk_maddr(addr)
|
||||
|
||||
assert str(result) == '/ip6/::1/tcp/5678'
|
||||
|
||||
protos = result.protocols()
|
||||
assert protos[0].name == 'ip6'
|
||||
assert protos[1].name == 'tcp'
|
||||
|
||||
|
||||
def test_mk_maddr_uds():
|
||||
'''
|
||||
`mk_maddr()` on a `UDSAddress` produces a `/unix/<path>`
|
||||
multiaddr containing the full socket path.
|
||||
|
||||
'''
|
||||
# NOTE, use an absolute `filedir` to match real runtime
|
||||
# UDS paths; `mk_maddr()` strips the leading `/` to avoid
|
||||
# the double-slash `/unix//run/..` that py-multiaddr
|
||||
# rejects as "empty protocol path".
|
||||
filedir = '/tmp/tractor_test'
|
||||
filename = 'test_sock.sock'
|
||||
addr = UDSAddress(
|
||||
filedir=filedir,
|
||||
filename=filename,
|
||||
)
|
||||
result: Multiaddr = mk_maddr(addr)
|
||||
|
||||
assert isinstance(result, Multiaddr)
|
||||
|
||||
result_str: str = str(result)
|
||||
assert result_str.startswith('/unix/')
|
||||
# verify the leading `/` was stripped to avoid double-slash
|
||||
assert '/unix/tmp/tractor_test/' in result_str
|
||||
|
||||
sockpath_rel: str = str(
|
||||
Path(filedir) / filename
|
||||
).lstrip('/')
|
||||
unix_val: str = result.value_for_protocol('unix')
|
||||
assert unix_val.endswith(sockpath_rel)
|
||||
|
||||
|
||||
def test_mk_maddr_unsupported_proto_key():
|
||||
'''
|
||||
`mk_maddr()` raises `ValueError` for an unsupported
|
||||
`proto_key`.
|
||||
|
||||
'''
|
||||
fake_addr = SimpleNamespace(proto_key='quic')
|
||||
with pytest.raises(
|
||||
ValueError,
|
||||
match='Unsupported proto_key',
|
||||
):
|
||||
mk_maddr(fake_addr)
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
'addr',
|
||||
[
|
||||
pytest.param(
|
||||
TCPAddress('127.0.0.1', 9999),
|
||||
id='tcp-ipv4',
|
||||
),
|
||||
pytest.param(
|
||||
UDSAddress(
|
||||
filedir='/tmp/tractor_rt',
|
||||
filename='roundtrip.sock',
|
||||
),
|
||||
id='uds',
|
||||
),
|
||||
],
|
||||
)
|
||||
def test_mk_maddr_roundtrip(addr):
|
||||
'''
|
||||
`mk_maddr()` output is valid multiaddr syntax that the
|
||||
library can re-parse back into an equivalent `Multiaddr`.
|
||||
|
||||
'''
|
||||
maddr: Multiaddr = mk_maddr(addr)
|
||||
reparsed = Multiaddr(str(maddr))
|
||||
|
||||
assert reparsed == maddr
|
||||
assert str(reparsed) == str(maddr)
|
||||
|
||||
|
||||
# ------ parse_maddr() tests ------
|
||||
|
||||
def test_maddr_to_tpt_proto_mapping():
|
||||
'''
|
||||
`_maddr_to_tpt_proto` is the exact inverse of
|
||||
`_tpt_proto_to_maddr`.
|
||||
|
||||
'''
|
||||
assert _maddr_to_tpt_proto == {
|
||||
'tcp': 'tcp',
|
||||
'unix': 'uds',
|
||||
}
|
||||
|
||||
|
||||
def test_parse_maddr_tcp_ipv4():
|
||||
'''
|
||||
`parse_maddr()` on an IPv4 TCP multiaddr string
|
||||
produces a `TCPAddress` with the correct host and port.
|
||||
|
||||
'''
|
||||
result = parse_maddr('/ip4/127.0.0.1/tcp/1234')
|
||||
|
||||
assert isinstance(result, TCPAddress)
|
||||
assert result.unwrap() == ('127.0.0.1', 1234)
|
||||
|
||||
|
||||
def test_parse_maddr_tcp_ipv6():
|
||||
'''
|
||||
`parse_maddr()` on an IPv6 TCP multiaddr string
|
||||
produces a `TCPAddress` with the correct host and port.
|
||||
|
||||
'''
|
||||
result = parse_maddr('/ip6/::1/tcp/5678')
|
||||
|
||||
assert isinstance(result, TCPAddress)
|
||||
assert result.unwrap() == ('::1', 5678)
|
||||
|
||||
|
||||
def test_parse_maddr_uds():
|
||||
'''
|
||||
`parse_maddr()` on a `/unix/...` multiaddr string
|
||||
produces a `UDSAddress` with the correct dir and filename,
|
||||
preserving absolute path semantics.
|
||||
|
||||
'''
|
||||
result = parse_maddr('/unix/tmp/tractor_test/test.sock')
|
||||
|
||||
assert isinstance(result, UDSAddress)
|
||||
filedir, filename = result.unwrap()
|
||||
assert filename == 'test.sock'
|
||||
assert str(filedir) == '/tmp/tractor_test'
|
||||
|
||||
|
||||
def test_parse_maddr_unsupported():
|
||||
'''
|
||||
`parse_maddr()` raises `ValueError` for an unsupported
|
||||
protocol combination like UDP.
|
||||
|
||||
'''
|
||||
with pytest.raises(
|
||||
ValueError,
|
||||
match='Unsupported multiaddr protocol combo',
|
||||
):
|
||||
parse_maddr('/ip4/127.0.0.1/udp/1234')
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
'addr',
|
||||
[
|
||||
pytest.param(
|
||||
TCPAddress('127.0.0.1', 9999),
|
||||
id='tcp-ipv4',
|
||||
),
|
||||
pytest.param(
|
||||
UDSAddress(
|
||||
filedir='/tmp/tractor_rt',
|
||||
filename='roundtrip.sock',
|
||||
),
|
||||
id='uds',
|
||||
),
|
||||
],
|
||||
)
|
||||
def test_parse_maddr_roundtrip(addr):
|
||||
'''
|
||||
Full round-trip: `addr -> mk_maddr -> str -> parse_maddr`
|
||||
produces an `Address` whose `.unwrap()` matches the original.
|
||||
|
||||
'''
|
||||
maddr: Multiaddr = mk_maddr(addr)
|
||||
maddr_str: str = str(maddr)
|
||||
parsed = parse_maddr(maddr_str)
|
||||
|
||||
assert type(parsed) is type(addr)
|
||||
assert parsed.unwrap() == addr.unwrap()
|
||||
|
||||
|
||||
def test_wrap_address_maddr_str():
|
||||
'''
|
||||
`wrap_address()` accepts a multiaddr-format string and
|
||||
returns the correct `Address` type.
|
||||
|
||||
'''
|
||||
result = wrap_address('/ip4/127.0.0.1/tcp/9999')
|
||||
|
||||
assert isinstance(result, TCPAddress)
|
||||
assert result.unwrap() == ('127.0.0.1', 9999)
|
||||
|
||||
|
||||
# ------ parse_endpoints() tests ------
|
||||
|
||||
def test_parse_endpoints_tcp_only():
|
||||
'''
|
||||
`parse_endpoints()` with a single TCP maddr per actor
|
||||
produces the correct `TCPAddress` instances.
|
||||
|
||||
'''
|
||||
table = {
|
||||
'registry': ['/ip4/127.0.0.1/tcp/1616'],
|
||||
'data_feed': ['/ip4/0.0.0.0/tcp/5555'],
|
||||
}
|
||||
result = parse_endpoints(table)
|
||||
|
||||
assert set(result.keys()) == {'registry', 'data_feed'}
|
||||
|
||||
reg_addr = result['registry'][0]
|
||||
assert isinstance(reg_addr, TCPAddress)
|
||||
assert reg_addr.unwrap() == ('127.0.0.1', 1616)
|
||||
|
||||
feed_addr = result['data_feed'][0]
|
||||
assert isinstance(feed_addr, TCPAddress)
|
||||
assert feed_addr.unwrap() == ('0.0.0.0', 5555)
|
||||
|
||||
|
||||
def test_parse_endpoints_mixed_tpts():
|
||||
'''
|
||||
`parse_endpoints()` with both TCP and UDS maddrs for
|
||||
the same actor produces the correct mixed `Address` list.
|
||||
|
||||
'''
|
||||
table = {
|
||||
'broker': [
|
||||
'/ip4/127.0.0.1/tcp/4040',
|
||||
'/unix/tmp/tractor/broker.sock',
|
||||
],
|
||||
}
|
||||
result = parse_endpoints(table)
|
||||
addrs = result['broker']
|
||||
|
||||
assert len(addrs) == 2
|
||||
assert isinstance(addrs[0], TCPAddress)
|
||||
assert addrs[0].unwrap() == ('127.0.0.1', 4040)
|
||||
|
||||
assert isinstance(addrs[1], UDSAddress)
|
||||
filedir, filename = addrs[1].unwrap()
|
||||
assert filename == 'broker.sock'
|
||||
assert str(filedir) == '/tmp/tractor'
|
||||
|
||||
|
||||
def test_parse_endpoints_unwrapped_tuples():
|
||||
'''
|
||||
`parse_endpoints()` accepts raw `(host, port)` tuples
|
||||
and wraps them as `TCPAddress`.
|
||||
|
||||
'''
|
||||
table = {
|
||||
'ems': [('127.0.0.1', 6666)],
|
||||
}
|
||||
result = parse_endpoints(table)
|
||||
|
||||
addr = result['ems'][0]
|
||||
assert isinstance(addr, TCPAddress)
|
||||
assert addr.unwrap() == ('127.0.0.1', 6666)
|
||||
|
||||
|
||||
def test_parse_endpoints_mixed_str_and_tuple():
|
||||
'''
|
||||
`parse_endpoints()` accepts a mix of maddr strings and
|
||||
raw tuples in the same actor entry list.
|
||||
|
||||
'''
|
||||
table = {
|
||||
'quoter': [
|
||||
'/ip4/127.0.0.1/tcp/7777',
|
||||
('127.0.0.1', 8888),
|
||||
],
|
||||
}
|
||||
result = parse_endpoints(table)
|
||||
addrs = result['quoter']
|
||||
|
||||
assert len(addrs) == 2
|
||||
assert isinstance(addrs[0], TCPAddress)
|
||||
assert addrs[0].unwrap() == ('127.0.0.1', 7777)
|
||||
|
||||
assert isinstance(addrs[1], TCPAddress)
|
||||
assert addrs[1].unwrap() == ('127.0.0.1', 8888)
|
||||
|
||||
|
||||
def test_parse_endpoints_unsupported_proto():
|
||||
'''
|
||||
`parse_endpoints()` raises `ValueError` when a maddr
|
||||
string uses an unsupported protocol like `/udp/`.
|
||||
|
||||
'''
|
||||
table = {
|
||||
'bad_actor': ['/ip4/127.0.0.1/udp/9999'],
|
||||
}
|
||||
with pytest.raises(
|
||||
ValueError,
|
||||
match='Unsupported multiaddr protocol combo',
|
||||
):
|
||||
parse_endpoints(table)
|
||||
|
||||
|
||||
def test_parse_endpoints_empty_table():
|
||||
'''
|
||||
`parse_endpoints()` on an empty table returns an empty
|
||||
dict.
|
||||
|
||||
'''
|
||||
assert parse_endpoints({}) == {}
|
||||
|
||||
|
||||
def test_parse_endpoints_empty_actor_list():
|
||||
'''
|
||||
`parse_endpoints()` with an actor mapped to an empty
|
||||
list preserves the key with an empty list value.
|
||||
|
||||
'''
|
||||
result = parse_endpoints({'x': []})
|
||||
assert result == {'x': []}
|
||||
|
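The string-to-address mapping that the `parse_maddr()` tests above exercise can be approximated with the standard library alone. The sketch below is a hypothetical stand-in written only to make the expected input/output shapes concrete: `parse_maddr_sketch` is not part of `tractor`, it does not use the `multiaddr` package, and it returns plain tuples (the `.unwrap()` shapes seen in the tests) rather than `TCPAddress` or `UDSAddress` instances.

```python
from pathlib import PurePosixPath


def parse_maddr_sketch(maddr: str) -> tuple:
    '''
    Map a multiaddr-style string to an unwrapped address
    tuple (hypothetical, stdlib-only helper).

    '''
    parts: list[str] = maddr.strip('/').split('/')

    # `/ip4/<host>/tcp/<port>` or `/ip6/<host>/tcp/<port>`
    if (
        len(parts) == 4
        and parts[0] in ('ip4', 'ip6')
        and parts[2] == 'tcp'
    ):
        return (parts[1], int(parts[3]))

    # `/unix/<path>`: the leading `/` of the socket path is
    # absent from the string form, so restore it here
    if len(parts) >= 2 and parts[0] == 'unix':
        sockpath = PurePosixPath('/', *parts[1:])
        return (str(sockpath.parent), sockpath.name)

    raise ValueError(
        f'Unsupported multiaddr protocol combo: {maddr!r}'
    )
```

For example, `parse_maddr_sketch('/unix/tmp/tractor_test/test.sock')` yields the `(filedir, filename)` pair that `test_parse_maddr_uds` checks, with the absolute directory restored.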
|
@ -1,673 +0,0 @@
|
|||
'''
|
||||
Discovery subsystem via a "registrar" actor scenarios.
|
||||
|
||||
'''
|
||||
import os
|
||||
import signal
|
||||
import platform
|
||||
from functools import partial
|
||||
import itertools
|
||||
import time
|
||||
from typing import Callable
|
||||
|
||||
import psutil
|
||||
import pytest
|
||||
import subprocess
|
||||
import tractor
|
||||
from tractor.devx import dump_on_hang
|
||||
from tractor.trionics import collapse_eg
|
||||
from tractor._testing import tractor_test
|
||||
from tractor.discovery._addr import wrap_address
|
||||
from tractor.discovery._multiaddr import mk_maddr
|
||||
import trio
|
||||
|
||||
|
||||
pytestmark = pytest.mark.usefixtures(
|
||||
'reap_subactors_per_test',
|
||||
# NOTE, registrar tests stress the discovery
|
||||
# roundtrip (find_actor / wait_for_actor) which
|
||||
# historically left orphaned UDS sock-files when
|
||||
# subactor `hard_kill` SIGKILL'd, and which
|
||||
# exercises the same trio `WakeupSocketpair`
|
||||
# peer-disconnect path that triggered the
|
||||
# busy-loop bug class.
|
||||
'track_orphaned_uds_per_test',
|
||||
'detect_runaway_subactors_per_test',
|
||||
)
|
||||
|
||||
|
||||
@tractor_test
|
||||
async def test_reg_then_unreg(
|
||||
reg_addr: tuple,
|
||||
):
|
||||
actor = tractor.current_actor()
|
||||
assert actor.is_registrar
|
||||
assert len(actor._registry) == 1 # only self is registered
|
||||
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
) as n:
|
||||
|
||||
portal = await n.start_actor('actor', enable_modules=[__name__])
|
||||
uid = portal.channel.aid.uid
|
||||
|
||||
async with tractor.get_registry(reg_addr) as aportal:
|
||||
# this local actor should be the registrar
|
||||
assert actor is aportal.actor
|
||||
|
||||
async with tractor.wait_for_actor('actor'):
|
||||
# sub-actor uid should be in the registry
|
||||
assert uid in aportal.actor._registry
|
||||
sockaddrs = actor._registry[uid]
|
||||
# XXX: can we figure out what the listen addr will be?
|
||||
assert sockaddrs
|
||||
|
||||
await n.cancel() # tear down nursery
|
||||
|
||||
await trio.sleep(0.1)
|
||||
assert uid not in aportal.actor._registry
|
||||
sockaddrs = actor._registry.get(uid)
|
||||
assert not sockaddrs
|
||||
|
||||
|
||||
@tractor_test
|
||||
async def test_reg_then_unreg_maddr(
|
||||
reg_addr: tuple,
|
||||
):
|
||||
'''
|
||||
Same as `test_reg_then_unreg` but pass the registry
|
||||
address as a multiaddr string to verify `wrap_address()`
|
||||
multiaddr parsing end-to-end through the runtime.
|
||||
|
||||
'''
|
||||
# tuple -> Address -> multiaddr string
|
||||
addr_obj = wrap_address(reg_addr)
|
||||
maddr_str: str = str(mk_maddr(addr_obj))
|
||||
|
||||
actor = tractor.current_actor()
|
||||
assert actor.is_registrar
|
||||
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[maddr_str],
|
||||
) as n:
|
||||
|
||||
portal = await n.start_actor(
|
||||
'actor_maddr',
|
||||
enable_modules=[__name__],
|
||||
)
|
||||
uid = portal.channel.aid.uid
|
||||
|
||||
async with tractor.get_registry(maddr_str) as aportal:
|
||||
assert actor is aportal.actor
|
||||
|
||||
async with tractor.wait_for_actor('actor_maddr'):
|
||||
assert uid in aportal.actor._registry
|
||||
sockaddrs = actor._registry[uid]
|
||||
assert sockaddrs
|
||||
|
||||
await n.cancel()
|
||||
|
||||
await trio.sleep(0.1)
|
||||
assert uid not in aportal.actor._registry
|
||||
sockaddrs = actor._registry.get(uid)
|
||||
assert not sockaddrs
|
||||
|
||||
|
||||
the_line = 'Hi my name is {}'
|
||||
|
||||
|
||||
async def hi():
|
||||
return the_line.format(tractor.current_actor().name)
|
||||
|
||||
|
||||
async def say_hello_use_wait(
|
||||
other_actor: str,
|
||||
reg_addr: tuple[str, int],
|
||||
):
|
||||
async with tractor.wait_for_actor(
|
||||
other_actor,
|
||||
registry_addr=reg_addr,
|
||||
) as portal:
|
||||
assert portal is not None
|
||||
result = await portal.run(__name__, 'hi')
|
||||
return result
|
||||
|
||||
|
||||
@tractor_test(
|
||||
timeout=7,
|
||||
)
|
||||
@pytest.mark.parametrize(
|
||||
'ria_fn',
|
||||
[
|
||||
say_hello_use_wait,
|
||||
]
|
||||
)
|
||||
async def test_trynamic_trio(
|
||||
ria_fn: Callable,
|
||||
start_method: str,
|
||||
reg_addr: tuple,
|
||||
):
|
||||
'''
|
||||
Root actor acting as the "director" and running one-shot-task-actors
|
||||
for the directed subs.
|
||||
|
||||
'''
|
||||
async with tractor.open_nursery() as n:
|
||||
print("Alright... Action!")
|
||||
|
||||
donny = await n.run_in_actor(
|
||||
ria_fn,
|
||||
other_actor='gretchen',
|
||||
reg_addr=reg_addr,
|
||||
name='donny',
|
||||
)
|
||||
gretchen = await n.run_in_actor(
|
||||
ria_fn,
|
||||
other_actor='donny',
|
||||
reg_addr=reg_addr,
|
||||
name='gretchen',
|
||||
)
|
||||
print(await gretchen.result())
|
||||
print(await donny.result())
|
||||
print("CUTTTT CUUTT CUT!!?! Donny!! You're supposed to say...")
|
||||
|
||||
|
||||
async def stream_forever():
|
||||
for i in itertools.count():
|
||||
yield i
|
||||
await trio.sleep(0.01)
|
||||
|
||||
|
||||
async def cancel(
|
||||
use_signal: bool,
|
||||
delay: float = 0,
|
||||
):
|
||||
# hold on there sally
|
||||
await trio.sleep(delay)
|
||||
|
||||
# trigger cancel
|
||||
if use_signal:
|
||||
if platform.system() == 'Windows':
|
||||
pytest.skip("SIGINT not supported on windows")
|
||||
os.kill(os.getpid(), signal.SIGINT)
|
||||
else:
|
||||
raise KeyboardInterrupt
|
||||
|
||||
|
||||
async def stream_from(portal: tractor.Portal):
|
||||
async with portal.open_stream_from(stream_forever) as stream:
|
||||
async for value in stream:
|
||||
print(value)
|
||||
|
||||
|
||||
async def unpack_reg(
|
||||
actor_or_portal: tractor.Portal|tractor.Actor,
|
||||
):
|
||||
'''
|
||||
Get and unpack a "registry" RPC request from the registrar
|
||||
system.
|
||||
|
||||
'''
|
||||
if getattr(actor_or_portal, 'get_registry', None):
|
||||
msg = await actor_or_portal.get_registry()
|
||||
else:
|
||||
msg = await actor_or_portal.run_from_ns('self', 'get_registry')
|
||||
|
||||
return {
|
||||
tuple(key.split('.')): val
|
||||
for key, val in msg.items()
|
||||
}
|
||||
|
||||
|
||||
async def spawn_and_check_registry(
|
||||
reg_addr: tuple,
|
||||
use_signal: bool,
|
||||
debug_mode: bool = False,
|
||||
remote_arbiter: bool = False,
|
||||
with_streaming: bool = False,
|
||||
maybe_daemon: tuple[
|
||||
subprocess.Popen,
|
||||
psutil.Process,
|
||||
]|None = None,
|
||||
|
||||
) -> None:
|
||||
|
||||
if maybe_daemon:
|
||||
popen, proc = maybe_daemon
|
||||
# breakpoint()
|
||||
|
||||
async with tractor.open_root_actor(
|
||||
registry_addrs=[reg_addr],
|
||||
debug_mode=debug_mode,
|
||||
):
|
||||
async with tractor.get_registry(
|
||||
addr=reg_addr,
|
||||
) as portal:
|
||||
# runtime needs to be up to call this
|
||||
actor = tractor.current_actor()
|
||||
|
||||
if remote_arbiter:
|
||||
assert not actor.is_registrar
|
||||
|
||||
if actor.is_registrar:
|
||||
extra = 1 # registrar is local root actor
|
||||
get_reg = partial(unpack_reg, actor)
|
||||
|
||||
else:
|
||||
get_reg = partial(unpack_reg, portal)
|
||||
extra = 2 # local root actor + remote registrar
|
||||
|
||||
# ensure current actor is registered
|
||||
registry: dict = await get_reg()
|
||||
assert actor.aid.uid in registry
|
||||
|
||||
try:
|
||||
async with tractor.open_nursery() as an:
|
||||
async with (
|
||||
collapse_eg(),
|
||||
trio.open_nursery() as trion,
|
||||
):
|
||||
portals = {}
|
||||
for i in range(3):
|
||||
name = f'a{i}'
|
||||
if with_streaming:
|
||||
portals[name] = await an.start_actor(
|
||||
name=name, enable_modules=[__name__])
|
||||
|
||||
else: # no streaming
|
||||
portals[name] = await an.run_in_actor(
|
||||
trio.sleep_forever, name=name)
|
||||
|
||||
# wait on last actor to come up
|
||||
async with tractor.wait_for_actor(name):
|
||||
registry = await get_reg()
|
||||
for uid in an._children:
|
||||
assert uid in registry
|
||||
|
||||
assert len(portals) + extra == len(registry)
|
||||
|
||||
if with_streaming:
|
||||
await trio.sleep(0.1)
|
||||
|
||||
pts = list(portals.values())
|
||||
for p in pts[:-1]:
|
||||
trion.start_soon(stream_from, p)
|
||||
|
||||
# stream for 1 sec
|
||||
trion.start_soon(cancel, use_signal, 1)
|
||||
|
||||
last_p = pts[-1]
|
||||
await stream_from(last_p)
|
||||
|
||||
else:
|
||||
await cancel(use_signal)
|
||||
|
||||
finally:
|
||||
await trio.sleep(0.5)
|
||||
|
||||
# all subactors should have de-registered
|
||||
registry = await get_reg()
|
||||
start: float = time.time()
|
||||
while (
|
||||
not (len(registry) == extra)
|
||||
and
|
||||
(time.time() - start) < 5
|
||||
):
|
||||
print(
|
||||
f'Waiting for remaining subs to dereg..\n'
|
||||
f'{registry!r}\n'
|
||||
)
|
||||
await trio.sleep(0.3)
# re-query so the loop condition sees fresh registry state
registry = await get_reg()
|
||||
else:
|
||||
assert len(registry) == extra
|
||||
|
||||
assert actor.aid.uid in registry
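
The teardown check above is a bounded polling loop: keep re-checking a condition until it holds or a deadline passes, then assert on the final state. A synchronous sketch of that pattern follows; `wait_until` and the fake `entries` list are hypothetical, and it uses `time.sleep()` where the test uses `trio.sleep()`.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 5.0,
    poll: float = 0.01,
) -> bool:
    # hypothetical helper: poll `predicate` until it holds or
    # `timeout` seconds have elapsed
    start = time.monotonic()
    while (
        not predicate()
        and
        (time.monotonic() - start) < timeout
    ):
        time.sleep(poll)

    # report the final state either way
    return predicate()


# stand-in for a registry that shrinks as subactors deregister
entries = ['root', 'a0', 'a1', 'a2']


def only_root_left() -> bool:
    if len(entries) > 1:
        entries.pop()  # simulate one deregistration per poll
    return len(entries) == 1


done = wait_until(only_root_left, timeout=1.0)
```

The important detail is that the condition is re-evaluated against fresh state on every iteration; polling a value captured once before the loop can only ever time out.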
|
||||
|
||||
|
||||
async def with_timeout(
    main: Callable,
    timeout: float = 6,
):
    with trio.fail_after(timeout):
        await main()
|
||||
|
||||
|
||||
@pytest.mark.parametrize('use_signal', [False, True])
|
||||
@pytest.mark.parametrize('with_streaming', [False, True])
|
||||
def test_subactors_unregister_on_cancel(
|
||||
debug_mode: bool,
|
||||
start_method: str,
|
||||
use_signal: bool,
|
||||
reg_addr: tuple,
|
||||
with_streaming: bool,
|
||||
):
|
||||
'''
|
||||
Verify that cancelling a nursery results in all subactors
|
||||
deregistering themselves with the registrar.
|
||||
|
||||
'''
|
||||
with pytest.raises(KeyboardInterrupt):
|
||||
trio.run(
|
||||
# with_timeout,
|
||||
partial(
|
||||
spawn_and_check_registry,
|
||||
reg_addr,
|
||||
use_signal,
|
||||
debug_mode=debug_mode,
|
||||
remote_arbiter=False,
|
||||
with_streaming=with_streaming,
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.parametrize('use_signal', [False, True])
|
||||
@pytest.mark.parametrize('with_streaming', [False, True])
|
||||
def test_subactors_unregister_on_cancel_remote_daemon(
|
||||
daemon: subprocess.Popen,
|
||||
debug_mode: bool,
|
||||
start_method: str,
|
||||
use_signal: bool,
|
||||
reg_addr: tuple,
|
||||
with_streaming: bool,
|
||||
):
|
||||
'''
|
||||
Verify that cancelling a nursery results in all subactors
|
||||
deregistering themselves with a **remote** (not in the local
|
||||
process tree) registrar.
|
||||
|
||||
'''
|
||||
with pytest.raises(KeyboardInterrupt):
|
||||
trio.run(
|
||||
with_timeout,
|
||||
partial(
|
||||
spawn_and_check_registry,
|
||||
reg_addr,
|
||||
use_signal,
|
||||
debug_mode=debug_mode,
|
||||
remote_arbiter=True,
|
||||
with_streaming=with_streaming,
|
||||
maybe_daemon=(
|
||||
daemon,
|
||||
psutil.Process(daemon.pid)
|
||||
),
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
async def streamer(agen):
    async for item in agen:
        print(item)
|
||||
|
||||
|
||||
async def close_chans_before_nursery(
|
||||
reg_addr: tuple,
|
||||
use_signal: bool,
|
||||
remote_arbiter: bool = False,
|
||||
) -> None:
|
||||
|
||||
# logic for how many actors should still be
|
||||
# in the registry at teardown.
|
||||
if remote_arbiter:
|
||||
entries_at_end = 2
|
||||
else:
|
||||
entries_at_end = 1
|
||||
|
||||
async with tractor.open_root_actor(
|
||||
registry_addrs=[reg_addr],
|
||||
):
|
||||
async with tractor.get_registry(reg_addr) as aportal:
|
||||
try:
|
||||
get_reg = partial(unpack_reg, aportal)
|
||||
|
||||
async with tractor.open_nursery() as an:
|
||||
portal1 = await an.start_actor(
|
||||
name='consumer1',
|
||||
enable_modules=[__name__],
|
||||
)
|
||||
portal2 = await an.start_actor(
|
||||
'consumer2',
|
||||
enable_modules=[__name__],
|
||||
)
|
||||
|
||||
async with (
|
||||
portal1.open_stream_from(
|
||||
stream_forever
|
||||
) as agen1,
|
||||
portal2.open_stream_from(
|
||||
stream_forever
|
||||
) as agen2,
|
||||
):
|
||||
async with (
|
||||
collapse_eg(),
|
||||
trio.open_nursery() as tn,
|
||||
):
|
||||
tn.start_soon(streamer, agen1)
|
||||
tn.start_soon(cancel, use_signal, .5)
|
||||
try:
|
||||
await streamer(agen2)
|
||||
finally:
|
||||
# Kill the root nursery thus resulting in
|
||||
# normal registrar channel ops to fail during
|
||||
# teardown. It doesn't seem like this is
|
||||
# reliably triggered by an external SIGINT.
|
||||
# tractor.current_actor()._root_nursery.cancel_scope.cancel()
|
||||
|
||||
# XXX: THIS IS THE KEY THING that
|
||||
# happens **before** exiting the
|
||||
# actor nursery block
|
||||
|
||||
# also kill off channels cuz why not
|
||||
await agen1.aclose()
|
||||
await agen2.aclose()
|
||||
|
||||
finally:
|
||||
with trio.CancelScope(shield=True):
|
||||
await trio.sleep(1)
|
||||
|
||||
# all subactors should have de-registered
|
||||
registry = await get_reg()
|
||||
assert portal1.channel.aid.uid not in registry
|
||||
assert portal2.channel.aid.uid not in registry
|
||||
assert len(registry) == entries_at_end
|
||||
|
||||
|
||||
@pytest.mark.parametrize('use_signal', [False, True])
|
||||
def test_close_channel_explicit(
|
||||
start_method: str,
|
||||
use_signal: bool,
|
||||
reg_addr: tuple,
|
||||
):
|
||||
'''
|
||||
Verify that closing a stream explicitly and killing the actor's
|
||||
"root nursery" **before** the containing nursery tears down also
|
||||
results in subactor(s) deregistering from the registrar.
|
||||
|
||||
'''
|
||||
with pytest.raises(KeyboardInterrupt):
|
||||
trio.run(
|
||||
partial(
|
||||
close_chans_before_nursery,
|
||||
reg_addr,
|
||||
use_signal,
|
||||
remote_arbiter=False,
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.parametrize('use_signal', [False, True])
|
||||
def test_close_channel_explicit_remote_registrar(
|
||||
daemon: subprocess.Popen,
|
||||
start_method: str,
|
||||
use_signal: bool,
|
||||
reg_addr: tuple,
|
||||
):
|
||||
'''
|
||||
Verify that closing a stream explicitly and killing the actor's
|
||||
"root nursery" **before** the containing nursery tears down also
|
||||
results in subactor(s) deregistering from the registrar.
|
||||
|
||||
'''
|
||||
with pytest.raises(KeyboardInterrupt):
|
||||
trio.run(
|
||||
partial(
|
||||
close_chans_before_nursery,
|
||||
reg_addr,
|
||||
use_signal,
|
||||
remote_arbiter=True,
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
@tractor.context
async def kill_transport(
    ctx: tractor.Context,
) -> None:

    await ctx.started()
    actor: tractor.Actor = tractor.current_actor()
    actor.ipc_server.cancel()
    await trio.sleep_forever()
|
||||
|
||||
|
||||
|
||||
# ?TODO, do a OSc style signalling test on this?
|
||||
# -[ ] doesn't work for fork backends
|
||||
# @pytest.mark.parametrize('use_signal', [False, True])
|
||||
#
|
||||
# Wall-clock bound via `pytest-timeout` (`method='thread'`).
|
||||
# Under `--spawn-backend=subint` this test can wedge in an
|
||||
# un-Ctrl-C-able state (abandoned-subint + shared-GIL
|
||||
# starvation → signal-wakeup-fd pipe fills → SIGINT silently
|
||||
# dropped; see `ai/conc-anal/subint_sigint_starvation_issue.md`).
|
||||
# `method='thread'` is specifically required because `signal`-
|
||||
# method SIGALRM suffers the same GIL-starvation path and
|
||||
# wouldn't fire the Python-level handler.
|
||||
# At timeout the plugin hard-kills the pytest process — that's
|
||||
# the intended behavior here; the alternative is an unattended
|
||||
# suite run that never returns.
|
||||
# @pytest.mark.timeout(
|
||||
# 30,
|
||||
# # NOTE should be a 2.1s happy path.
|
||||
# # XXX for `main_thread_forkserver` this is SUPER SENSITIVE
|
||||
# # so keep it higher to avoid flaky runs..
|
||||
# method='thread',
|
||||
# )
|
||||
@pytest.mark.skipon_spawn_backend(
|
||||
'subint',
|
||||
# 'main_thread_forkserver',
|
||||
reason=(
|
||||
'XXX SUBINT HANGING TEST XXX\n'
|
||||
'See outstanding issue(s)\n'
|
||||
# TODO, put issue link!
|
||||
)
|
||||
)
|
||||
def test_stale_entry_is_deleted(
|
||||
debug_mode: bool,
|
||||
daemon: subprocess.Popen,
|
||||
start_method: str,
|
||||
reg_addr: tuple,
|
||||
# set_fork_aware_capture,
|
||||
):
|
||||
'''
|
||||
    Ensure that when a stale entry is detected in the registrar's
    table, the `find_actor()` API deletes that entry and does not
    deliver a bad portal.
|
||||
|
||||
'''
|
||||
async def main():
|
||||
name: str = 'transport_fails_actor'
|
||||
_reg_ptl: tractor.Portal
|
||||
an: tractor.ActorNursery
|
||||
async with (
|
||||
tractor.open_nursery(
|
||||
debug_mode=debug_mode,
|
||||
registry_addrs=[reg_addr],
|
||||
) as an,
|
||||
tractor.get_registry(reg_addr) as _reg_ptl,
|
||||
):
|
||||
ptl: tractor.Portal = await an.start_actor(
|
||||
name,
|
||||
enable_modules=[__name__],
|
||||
)
|
||||
async with ptl.open_context(
|
||||
kill_transport,
|
||||
) as (first, ctx):
|
||||
async with tractor.find_actor(
|
||||
name,
|
||||
registry_addrs=[reg_addr],
|
||||
) as maybe_portal:
|
||||
# because the transitive
|
||||
# `._api.maybe_open_portal()` call should
|
||||
# fail and implicitly call `.delete_addr()`
|
||||
assert maybe_portal is None
|
||||
registry: dict = await unpack_reg(_reg_ptl)
|
||||
assert ptl.chan.aid.uid not in registry
|
||||
|
||||
# should fail since we knocked out the IPC tpt XD
|
||||
await ptl.cancel_actor()
|
||||
await an.cancel()
|
||||
|
||||
# XXX, for tracing if this starts being flaky again..
|
||||
#
|
||||
timeout: float = 4
|
||||
async def _timeout_main():
|
||||
with trio.move_on_after(timeout) as cs:
|
||||
await main()
|
||||
|
||||
if (
|
||||
cs.cancel_called
|
||||
and
|
||||
debug_mode
|
||||
):
|
||||
await tractor.pause()
|
||||
|
||||
# TODO, remove once the `[subint]` variant no longer hangs.
|
||||
#
|
||||
# Status (as of Phase B hard-kill landing):
|
||||
#
|
||||
    # - `[trio]`/`[mp_*]` variants: complete normally; `dump_on_hang`
    #   is a no-op safety net here.
|
||||
#
|
||||
# - `[subint]` variant: hangs indefinitely AND is un-Ctrl-C-able.
|
||||
# `strace -p <pytest_pid>` while in the hang reveals a silently-
|
||||
# dropped SIGINT — the C signal handler tries to write the
|
||||
# signum byte to Python's signal-wakeup fd and gets `EAGAIN`,
|
||||
# meaning the pipe is full (nobody's draining it).
|
||||
#
|
||||
# Root-cause chain: our hard-kill in `spawn._subint` abandoned
|
||||
# the driver OS-thread (which is `daemon=True`) after the soft-
|
||||
# kill timeout, but the *sub-interpreter* inside that thread is
|
||||
# still running `trio.run()` — `_interpreters.destroy()` can't
|
||||
# force-stop a running subint (raises `InterpreterError`), and
|
||||
# legacy-config subints share the main GIL. The abandoned subint
|
||||
# starves the parent's trio event loop from iterating often
|
||||
# enough to drain its wakeup pipe → SIGINT silently drops.
|
||||
#
|
||||
# This is structurally a CPython-level limitation: there's no
|
||||
# public force-destroy primitive for a running subint. We
|
||||
# escape on the harness side via a SIGINT-loop in the `daemon`
|
||||
# fixture teardown (killing the bg registrar subproc closes its
|
||||
# end of the IPC, which eventually unblocks a recv in main trio,
|
||||
# which lets the loop drain the wakeup pipe). Long-term fix path:
|
||||
# msgspec PEP 684 support (jcrist/msgspec#563) → isolated-mode
|
||||
# subints with per-interp GIL.
|
||||
#
|
||||
# Full analysis:
|
||||
# `ai/conc-anal/subint_sigint_starvation_issue.md`
|
||||
#
|
||||
# See also the *sibling* hang class documented in
|
||||
# `ai/conc-anal/subint_cancel_delivery_hang_issue.md` — same
|
||||
# subint backend, different root cause (Ctrl-C-able hang, main
|
||||
# trio loop iterating fine; ours to fix, not CPython's).
|
||||
# Reproduced by `tests/test_subint_cancellation.py
|
||||
# ::test_subint_non_checkpointing_child`.
|
||||
#
|
||||
# Kept here (and not behind a `pytestmark.skip`) so we can still
|
||||
# inspect the dump file if the hang ever returns after a refactor.
|
||||
# `pytest`'s stderr capture eats `faulthandler` output otherwise,
|
||||
# so we route `dump_on_hang` to a file.
|
||||
with dump_on_hang(
|
||||
seconds=timeout*2,
|
||||
path=f'/tmp/test_stale_entry_is_deleted_{start_method}.dump',
|
||||
):
|
||||
trio.run(_timeout_main)
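
The comment above explains why the hang dump is routed to a file rather than stderr. As a rough sketch of how such a watchdog can be built on the stdlib `faulthandler` module: `dump_on_hang_sketch` is a hypothetical stand-in written for illustration, not tractor's actual `dump_on_hang` implementation, and the temp-file path is likewise made up.

```python
import faulthandler
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def dump_on_hang_sketch(seconds: float, path: str):
    # hypothetical stand-in, NOT tractor's `dump_on_hang`
    with open(path, 'w') as f:
        # schedule a traceback dump of all threads into `f` if
        # the wrapped block is still running after `seconds`
        faulthandler.dump_traceback_later(
            seconds,
            file=f,
            exit=False,
        )
        try:
            yield path
        finally:
            # block finished (or raised): disarm the watchdog
            # before the file gets closed
            faulthandler.cancel_dump_traceback_later()


dump_path = os.path.join(tempfile.mkdtemp(), 'hang.dump')
with dump_on_hang_sketch(seconds=5.0, path=dump_path):
    pass  # a fast body: the watchdog never fires
```

Because the dump is written by a watchdog thread at the C level, it does not depend on the main thread reaching a Python-level signal handler, which is the property the comment above relies on.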
|
||||
|
|
@ -1,345 +0,0 @@
|
|||
'''
`open_root_actor(tpt_bind_addrs=...)` test suite.

Verify all three runtime code paths for explicit IPC-server
bind-address selection in `_root.py`:

1. Non-registrar, no explicit bind -> random addrs from registry proto
2. Registrar, no explicit bind -> binds to registry_addrs
3. Explicit bind given -> wraps via `wrap_address()` and uses them

'''
import pytest
import trio
import tractor
from tractor.discovery._addr import (
    wrap_address,
)
from tractor.discovery._multiaddr import mk_maddr
from tractor._testing.addr import get_rando_addr
|
||||
|
||||
|
||||
# ------------------------------------------------------------------
# helpers
# ------------------------------------------------------------------
def _bound_bindspaces(
    actor: tractor.Actor,
) -> set[str]:
    '''
    Collect the set of bindspace strings from the actor's
    currently bound IPC-server accept addresses.

    '''
    return {
        wrap_address(a).bindspace
        for a in actor.accept_addrs
    }


def _bound_wrapped(
    actor: tractor.Actor,
) -> list:
    '''
    Return the actor's accept addrs as wrapped `Address` objects.

    '''
    return [
        wrap_address(a)
        for a in actor.accept_addrs
    ]
|
||||
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# 1) Registrar + explicit tpt_bind_addrs
|
||||
# ------------------------------------------------------------------
|
||||
@pytest.mark.parametrize(
|
||||
'addr_combo',
|
||||
[
|
||||
'bind-eq-reg',
|
||||
'bind-subset-reg',
|
||||
'bind-disjoint-reg',
|
||||
],
|
||||
ids=lambda v: v,
|
||||
)
|
||||
def test_registrar_root_tpt_bind_addrs(
|
||||
reg_addr: tuple,
|
||||
tpt_proto: str,
|
||||
debug_mode: bool,
|
||||
addr_combo: str,
|
||||
):
|
||||
'''
|
||||
Registrar root-actor with explicit `tpt_bind_addrs`:
|
||||
bound set must include all registry + all bind addr bindspaces
|
||||
(merge behavior).
|
||||
|
||||
'''
|
||||
reg_wrapped = wrap_address(reg_addr)
|
||||
|
||||
if addr_combo == 'bind-eq-reg':
|
||||
bind_addrs = [reg_addr]
|
||||
# no extra (secondary) reg addr here; only the subset case adds one
|
||||
extra_reg = []
|
||||
|
||||
elif addr_combo == 'bind-subset-reg':
|
||||
second_reg = get_rando_addr(tpt_proto)
|
||||
bind_addrs = [reg_addr]
|
||||
extra_reg = [second_reg]
|
||||
|
||||
elif addr_combo == 'bind-disjoint-reg':
|
||||
# port=0 on same host -> completely different addr
|
||||
rando = wrap_address(reg_addr).get_random(
|
||||
bindspace=reg_wrapped.bindspace,
|
||||
)
|
||||
bind_addrs = [rando.unwrap()]
|
||||
extra_reg = []
|
||||
|
||||
all_reg = [reg_addr] + extra_reg
|
||||
|
||||
async def _main():
|
||||
async with tractor.open_root_actor(
|
||||
registry_addrs=all_reg,
|
||||
tpt_bind_addrs=bind_addrs,
|
||||
debug_mode=debug_mode,
|
||||
):
|
||||
actor = tractor.current_actor()
|
||||
assert actor.is_registrar
|
||||
|
||||
bound = actor.accept_addrs
|
||||
bound_bs = _bound_bindspaces(actor)
|
||||
|
||||
# all registry bindspaces must appear in bound set
|
||||
for ra in all_reg:
|
||||
assert wrap_address(ra).bindspace in bound_bs
|
||||
|
||||
# all bind-addr bindspaces must appear
|
||||
for ba in bind_addrs:
|
||||
assert wrap_address(ba).bindspace in bound_bs
|
||||
|
||||
# registry addr must appear verbatim in bound
|
||||
# (after wrapping both sides for comparison)
|
||||
bound_w = _bound_wrapped(actor)
|
||||
assert reg_wrapped in bound_w
|
||||
|
||||
if addr_combo == 'bind-disjoint-reg':
|
||||
assert len(bound) >= 2
|
||||
|
||||
trio.run(_main)
|
||||
|
||||
|
||||
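# ------------------------------------------------------------------
# 2) Non-registrar + explicit tpt_bind_addrs
# ------------------------------------------------------------------
@pytest.mark.parametrize(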
|
||||
'addr_combo',
|
||||
[
|
||||
'bind-same-bindspace',
|
||||
'bind-disjoint',
|
||||
],
|
||||
ids=lambda v: v,
|
||||
)
|
||||
def test_non_registrar_root_tpt_bind_addrs(
|
||||
daemon,
|
||||
reg_addr: tuple,
|
||||
tpt_proto: str,
|
||||
debug_mode: bool,
|
||||
addr_combo: str,
|
||||
):
|
||||
'''
|
||||
Non-registrar root with explicit `tpt_bind_addrs`:
|
||||
bound set must exactly match the requested bind addrs
|
||||
(no merge with registry).
|
||||
|
||||
'''
|
||||
reg_wrapped = wrap_address(reg_addr)
|
||||
|
||||
if addr_combo == 'bind-same-bindspace':
|
||||
# same bindspace as reg but port=0 so we get a random port
|
||||
rando = reg_wrapped.get_random(
|
||||
bindspace=reg_wrapped.bindspace,
|
||||
)
|
||||
bind_addrs = [rando.unwrap()]
|
||||
|
||||
elif addr_combo == 'bind-disjoint':
|
||||
rando = reg_wrapped.get_random(
|
||||
bindspace=reg_wrapped.bindspace,
|
||||
)
|
||||
bind_addrs = [rando.unwrap()]
|
||||
|
||||
async def _main():
|
||||
async with tractor.open_root_actor(
|
||||
registry_addrs=[reg_addr],
|
||||
tpt_bind_addrs=bind_addrs,
|
||||
debug_mode=debug_mode,
|
||||
):
|
||||
actor = tractor.current_actor()
|
||||
assert not actor.is_registrar
|
||||
|
||||
bound = actor.accept_addrs
|
||||
assert len(bound) == len(bind_addrs)
|
||||
|
||||
# bindspaces must match
|
||||
bound_bs = _bound_bindspaces(actor)
|
||||
for ba in bind_addrs:
|
||||
assert wrap_address(ba).bindspace in bound_bs
|
||||
|
||||
# TCP port=0 should resolve to a real port
|
||||
for uw_addr in bound:
|
||||
w = wrap_address(uw_addr)
|
||||
if w.proto_key == 'tcp':
|
||||
_host, port = uw_addr
|
||||
assert port > 0
|
||||
|
||||
trio.run(_main)
|
||||
|
||||
|
||||
# ------------------------------------------------------------------
# 3) Non-registrar, default random bind (baseline)
# ------------------------------------------------------------------
def test_non_registrar_default_random_bind(
    daemon,
    reg_addr: tuple,
    debug_mode: bool,
):
    '''
    Baseline: no `tpt_bind_addrs`, daemon running.
    Bound bindspace matches registry bindspace,
    but bound addr differs from reg_addr (random).

    '''
    reg_wrapped = wrap_address(reg_addr)

    async def _main():
        async with tractor.open_root_actor(
            registry_addrs=[reg_addr],
            debug_mode=debug_mode,
        ):
            actor = tractor.current_actor()
            assert not actor.is_registrar

            bound_bs = _bound_bindspaces(actor)
            assert reg_wrapped.bindspace in bound_bs

            # bound addr should differ from the registry addr
            # (the runtime picks a random port/path)
            bound_w = _bound_wrapped(actor)
            assert reg_wrapped not in bound_w

    trio.run(_main)
|
||||
|
||||
|
||||
# ------------------------------------------------------------------
# 4) Multiaddr string input
# ------------------------------------------------------------------
def test_tpt_bind_addrs_as_maddr_str(
    reg_addr: tuple,
    debug_mode: bool,
):
    '''
    Pass multiaddr strings as `tpt_bind_addrs`.
    Runtime should parse and bind successfully.

    '''
    reg_wrapped = wrap_address(reg_addr)
    # build a port-0 / random maddr string for binding
    rando = reg_wrapped.get_random(
        bindspace=reg_wrapped.bindspace,
    )
    maddr_str: str = str(mk_maddr(rando))

    async def _main():
        async with tractor.open_root_actor(
            registry_addrs=[reg_addr],
            tpt_bind_addrs=[maddr_str],
            debug_mode=debug_mode,
        ):
            actor = tractor.current_actor()
            assert actor.is_registrar

            for uw_addr in actor.accept_addrs:
                w = wrap_address(uw_addr)
                if w.proto_key == 'tcp':
                    _host, port = uw_addr
                    assert port > 0

    trio.run(_main)
|
||||
|
||||
|
||||
# ------------------------------------------------------------------
# 5) Registrar merge produces union of binds
# ------------------------------------------------------------------
def test_registrar_merge_binds_union(
    tpt_proto: str,
    debug_mode: bool,
):
    '''
    Registrar + disjoint bind addr: bound set must include
    both registry and explicit bind addresses.

    '''
    reg_addr = get_rando_addr(tpt_proto)
    reg_wrapped = wrap_address(reg_addr)

    rando = reg_wrapped.get_random(
        bindspace=reg_wrapped.bindspace,
    )
    bind_addrs = [rando.unwrap()]

    # NOTE: for UDS, `get_random()` produces the same
    # filename for the same pid+actor-state, so the
    # "disjoint" premise only holds when the addrs
    # actually differ (always true for TCP, may
    # collide for UDS).
    expect_disjoint: bool = (
        tuple(reg_addr) != rando.unwrap()
    )

    async def _main():
        async with tractor.open_root_actor(
            registry_addrs=[reg_addr],
            tpt_bind_addrs=bind_addrs,
            debug_mode=debug_mode,
        ):
            actor = tractor.current_actor()
            assert actor.is_registrar

            bound = actor.accept_addrs
            bound_w = _bound_wrapped(actor)

            if expect_disjoint:
                # must have at least 2 (registry + bind)
                assert len(bound) >= 2

            # registry addr must appear in bound set
            assert reg_wrapped in bound_w

    trio.run(_main)
|
||||
|
||||
|
||||
# ------------------------------------------------------------------
# 6) open_nursery forwards tpt_bind_addrs
# ------------------------------------------------------------------
def test_open_nursery_forwards_tpt_bind_addrs(
    reg_addr: tuple,
    debug_mode: bool,
):
    '''
    `open_nursery(tpt_bind_addrs=...)` forwards through
    `**kwargs` to `open_root_actor()`.

    '''
    reg_wrapped = wrap_address(reg_addr)
    rando = reg_wrapped.get_random(
        bindspace=reg_wrapped.bindspace,
    )
    bind_addrs = [rando.unwrap()]

    async def _main():
        async with tractor.open_nursery(
            registry_addrs=[reg_addr],
            tpt_bind_addrs=bind_addrs,
            debug_mode=debug_mode,
        ):
            actor = tractor.current_actor()
            bound_bs = _bound_bindspaces(actor)

            for ba in bind_addrs:
                assert wrap_address(ba).bindspace in bound_bs

    trio.run(_main)
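
Several tests in this file assert `port > 0` after binding a TCP address whose requested port is `0`. The OS behavior they depend on can be sketched with the stdlib `socket` module alone; `bind_ephemeral` is a hypothetical helper and involves none of tractor's address types.

```python
import socket


def bind_ephemeral(host: str = '127.0.0.1') -> tuple:
    # port 0 asks the OS to pick any currently free port
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # the concrete port is only known after `bind()`, so it
        # has to be read back from the socket
        return sock.getsockname()


host, port = bind_ephemeral()
```

This is why those tests inspect `actor.accept_addrs` after startup instead of comparing against the requested bind address: the requested `(host, 0)` pair never matches what was actually bound.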
|
||||
|
|
@ -8,16 +8,17 @@ from pathlib import Path
|
|||
import pytest
|
||||
import trio
|
||||
import tractor
|
||||
from tractor import Actor
|
||||
from tractor.runtime import _state
|
||||
from tractor.discovery import _addr
|
||||
from tractor import (
|
||||
Actor,
|
||||
_state,
|
||||
_addr,
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def bindspace_dir_str() -> str:
|
||||
|
||||
from tractor.runtime._state import get_rt_dir
|
||||
rt_dir: Path = get_rt_dir()
|
||||
rt_dir: Path = tractor._state.get_rt_dir()
|
||||
bs_dir: Path = rt_dir / 'doggy'
|
||||
bs_dir_str: str = str(bs_dir)
|
||||
assert not bs_dir.is_dir()
|
||||
|
|
|
|||
|
|
@ -13,9 +13,9 @@ from tractor import (
|
|||
Portal,
|
||||
ipc,
|
||||
msg,
|
||||
_state,
|
||||
_addr,
|
||||
)
|
||||
from tractor.runtime import _state
|
||||
from tractor.discovery import _addr
|
||||
|
||||
@tractor.context
|
||||
async def chk_tpts(
|
||||
|
|
@ -59,19 +59,9 @@ async def chk_tpts(
|
|||
)
|
||||
def test_root_passes_tpt_to_sub(
|
||||
tpt_proto_key: str,
|
||||
tpt_proto: str,
|
||||
reg_addr: tuple,
|
||||
debug_mode: bool,
|
||||
):
|
||||
# `reg_addr` is sourced from the CLI `--tpt-proto={tpt_proto}`,
|
||||
# so when the parametrized `tpt_proto_key` differs, the test
|
||||
# asks the runtime to `enable_transports=[<other_proto>]` while
|
||||
# pointing `registry_addrs` at a `reg_addr` of the wrong proto.
|
||||
# The layer-2 guard in `open_root_actor` is expected to fail
|
||||
# fast with `ValueError` on this mismatch (rather than the prior
|
||||
# silent hang during the registrar handshake).
|
||||
proto_mismatch: bool = (tpt_proto_key != tpt_proto)
|
||||
|
||||
async def main():
|
||||
async with tractor.open_nursery(
|
||||
enable_transports=[tpt_proto_key],
|
||||
|
|
@ -102,14 +92,4 @@ def test_root_passes_tpt_to_sub(
|
|||
# shutdown sub-actor(s)
|
||||
await an.cancel()
|
||||
|
||||
if proto_mismatch:
|
||||
# mismatched proto must raise `ValueError` from the
|
||||
# `open_root_actor` runtime guard before any subactor spawn.
|
||||
with pytest.raises(ValueError) as excinfo:
|
||||
trio.run(main)
|
||||
msg: str = str(excinfo.value)
|
||||
assert 'enable_transports' in msg
|
||||
assert 'registry_addrs' in msg
|
||||
assert tpt_proto_key in msg or tpt_proto in msg
|
||||
else:
|
||||
trio.run(main)
|
||||
trio.run(main)
|
||||
|
|
|
|||
|
|
@ -1,4 +0,0 @@
|
|||
'''
|
||||
`tractor.msg.*` sub-sys test suite.
|
||||
|
||||
'''
|
||||
|
|
@ -1,4 +0,0 @@
|
|||
'''
|
||||
`tractor.msg.*` test sub-pkg conf.
|
||||
|
||||
'''
|
||||
|
|
@ -1,240 +0,0 @@
|
|||
'''
Unit tests for `tractor.msg.pretty_struct`
private-field filtering in `pformat()`.

'''
import pytest

from tractor.msg.pretty_struct import (
    Struct,
    pformat,
    iter_struct_ppfmt_lines,
)
from tractor.msg._codec import (
    MsgDec,
    mk_dec,
)
|
||||
|
||||
|
||||
# ------ test struct definitions ------ #

class PublicOnly(Struct):
    '''
    All-public fields for baseline testing.

    '''
    name: str = 'alice'
    age: int = 30


class PrivateOnly(Struct):
    '''
    Only underscore-prefixed (private) fields.

    '''
    _secret: str = 'hidden'
    _internal: int = 99


class MixedFields(Struct):
    '''
    Mix of public and private fields.

    '''
    name: str = 'bob'
    _hidden: int = 42
    value: float = 3.14
    _meta: str = 'internal'


class Inner(
    Struct,
    frozen=True,
):
    '''
    Frozen inner struct with a private field,
    for nesting tests.

    '''
    x: int = 1
    _secret: str = 'nope'


class Outer(Struct):
    '''
    Outer struct nesting an `Inner`.

    '''
    label: str = 'outer'
    inner: Inner = Inner()


class EmptyStruct(Struct):
    '''
    Struct with zero fields.

    '''
    pass
|
||||
|
||||
|
||||
# ------ tests ------ #
|
||||
|
||||
@pytest.mark.parametrize(
    'struct_and_expected',
    [
        (
            PublicOnly(),
            {
                'shown': ['name', 'age'],
                'hidden': [],
            },
        ),
        (
            MixedFields(),
            {
                'shown': ['name', 'value'],
                'hidden': ['_hidden', '_meta'],
            },
        ),
        (
            PrivateOnly(),
            {
                'shown': [],
                'hidden': ['_secret', '_internal'],
            },
        ),
    ],
    ids=[
        'all-public',
        'mixed-pub-priv',
        'all-private',
    ],
)
def test_field_visibility_in_pformat(
    struct_and_expected: tuple[
        Struct,
        dict[str, list[str]],
    ],
):
    '''
    Verify `pformat()` shows public fields
    and hides `_`-prefixed private fields.

    '''
    (
        struct,
        expected,
    ) = struct_and_expected
    output: str = pformat(struct)

    for field_name in expected['shown']:
        assert field_name in output, (
            f'{field_name!r} should appear in:\n'
            f'{output}'
        )

    for field_name in expected['hidden']:
        assert field_name not in output, (
            f'{field_name!r} should NOT appear in:\n'
            f'{output}'
        )
|
||||
|
||||
|
||||
def test_iter_ppfmt_lines_skips_private():
    '''
    Directly verify `iter_struct_ppfmt_lines()`
    never yields tuples with `_`-prefixed field
    names.

    '''
    struct = MixedFields()
    lines: list[tuple[str, str]] = list(
        iter_struct_ppfmt_lines(
            struct,
            field_indent=2,
        )
    )
    # should have lines for public fields only
    assert len(lines) == 2

    for _prefix, line_content in lines:
        field_name: str = (
            line_content.split(':')[0].strip()
        )
        assert not field_name.startswith('_'), (
            f'private field leaked: {field_name!r}'
        )
|
||||
|
||||
|
||||
def test_nested_struct_filters_inner_private():
    '''
    Verify that nested struct's private fields
    are also filtered out during recursion.

    '''
    outer = Outer()
    output: str = pformat(outer)

    # outer's public field
    assert 'label' in output

    # inner's public field (recursed into)
    assert 'x' in output

    # inner's private field must be hidden
    assert '_secret' not in output
|
||||
|
||||
|
||||
def test_empty_struct_pformat():
    '''
    An empty struct should produce a valid
    `pformat()` result with no field lines.

    '''
    output: str = pformat(EmptyStruct())
    assert 'EmptyStruct(' in output
    assert output.rstrip().endswith(')')

    # no field lines => only struct header+footer
    lines: list[tuple[str, str]] = list(
        iter_struct_ppfmt_lines(
            EmptyStruct(),
            field_indent=2,
        )
    )
    assert lines == []
|
||||
|
||||
|
||||
def test_real_msgdec_pformat_hides_private():
    '''
    Verify `pformat()` on a real `MsgDec`
    hides the `_dec` internal field.

    NOTE: `MsgDec.__repr__` is custom and does
    NOT call `pformat()`, so we call it directly.

    '''
    dec: MsgDec = mk_dec(spec=int)
    output: str = pformat(dec)

    # the private `_dec` field should be filtered
    assert '_dec' not in output

    # but the struct type name should be present
    assert 'MsgDec(' in output
|
||||
|
||||
|
||||
def test_pformat_repr_integration():
    '''
    Verify that `Struct.__repr__()` (which calls
    `pformat()`) also hides private fields for
    custom structs that do NOT override `__repr__`.

    '''
    mixed = MixedFields()
    output: str = repr(mixed)

    assert 'name' in output
    assert 'value' in output
    assert '_hidden' not in output
    assert '_meta' not in output
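
The behavior these tests pin down is a filter on field names: anything starting with `_` is skipped when formatting. The rule can be sketched with plain `dataclasses`, swapped in here for `tractor.msg.pretty_struct.Struct` so the example has no third-party dependency; `MixedSketch` and `public_field_lines` are hypothetical names, and the output format is not meant to match `pformat()`.

```python
from dataclasses import dataclass, fields


@dataclass
class MixedSketch:
    # mirrors the field layout of `MixedFields` above
    name: str = 'bob'
    _hidden: int = 42
    value: float = 3.14
    _meta: str = 'internal'


def public_field_lines(obj) -> list:
    # keep only fields whose name does NOT start with an
    # underscore, in declaration order
    return [
        f'{f.name}: {getattr(obj, f.name)!r}'
        for f in fields(obj)
        if not f.name.startswith('_')
    ]


lines = public_field_lines(MixedSketch())
```

Filtering by name prefix keeps the rule independent of field types, which is presumably why the nested-struct test expects the same filtering to apply at every recursion level.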
|
||||
|
|
@ -1,12 +1,7 @@
|
|||
'''
|
||||
Audit the simplest inter-actor bidirectional (streaming)
|
||||
msg patterns.
|
||||
"""
|
||||
Bidirectional streaming.
|
||||
|
||||
'''
|
||||
from __future__ import annotations
|
||||
from typing import (
|
||||
Callable,
|
||||
)
|
||||
"""
|
||||
import pytest
|
||||
import trio
|
||||
import tractor
|
||||
|
|
@ -14,8 +9,10 @@ import tractor
|
|||
|
||||
@tractor.context
|
||||
async def simple_rpc(
|
||||
|
||||
ctx: tractor.Context,
|
||||
data: int,
|
||||
|
||||
) -> None:
|
||||
'''
|
||||
Test a small ping-pong server.
|
||||
|
|
@ -42,13 +39,15 @@ async def simple_rpc(
|
|||
|
||||
@tractor.context
|
||||
async def simple_rpc_with_forloop(
|
||||
|
||||
ctx: tractor.Context,
|
||||
data: int,
|
||||
) -> None:
|
||||
'''
|
||||
Same as previous test but using `async for` syntax/api.
|
||||
|
||||
'''
|
||||
) -> None:
|
||||
"""Same as previous test but using ``async for`` syntax/api.
|
||||
|
||||
"""
|
||||
|
||||
# signal to parent that we're up
|
||||
await ctx.started(data + 1)
|
||||
|
||||
|
|
@ -69,78 +68,62 @@ async def simple_rpc_with_forloop(
|
|||
|
||||
@pytest.mark.parametrize(
|
||||
'use_async_for',
|
||||
[
|
||||
True,
|
||||
False,
|
||||
],
|
||||
ids='use_async_for={}'.format,
|
||||
[True, False],
|
||||
)
|
||||
@pytest.mark.parametrize(
|
||||
'server_func',
|
||||
[
|
||||
simple_rpc,
|
||||
simple_rpc_with_forloop,
|
||||
],
|
||||
ids='server_func={}'.format,
|
||||
[simple_rpc, simple_rpc_with_forloop],
|
||||
)
|
||||
def test_simple_rpc(
|
||||
server_func: Callable,
|
||||
use_async_for: bool,
|
||||
loglevel: str,
|
||||
debug_mode: bool,
|
||||
):
|
||||
def test_simple_rpc(server_func, use_async_for):
|
||||
'''
|
||||
The simplest request response pattern.
|
||||
|
||||
'''
|
||||
async def main():
|
||||
with trio.fail_after(6):
|
||||
async with tractor.open_nursery(
|
||||
loglevel=loglevel,
|
||||
debug_mode=debug_mode,
|
||||
) as an:
|
||||
portal: tractor.Portal = await an.start_actor(
|
||||
'rpc_server',
|
||||
enable_modules=[__name__],
|
||||
)
|
||||
async with tractor.open_nursery() as n:
|
||||
|
||||
async with portal.open_context(
|
||||
server_func, # taken from pytest parameterization
|
||||
data=10,
|
||||
) as (ctx, sent):
|
||||
portal = await n.start_actor(
|
||||
'rpc_server',
|
||||
enable_modules=[__name__],
|
||||
)
|
||||
|
||||
assert sent == 11
|
||||
async with portal.open_context(
|
||||
server_func, # taken from pytest parameterization
|
||||
data=10,
|
||||
) as (ctx, sent):
|
||||
|
||||
async with ctx.open_stream() as stream:
|
||||
assert sent == 11
|
||||
|
||||
if use_async_for:
|
||||
async with ctx.open_stream() as stream:
|
||||
|
||||
count = 0
|
||||
# receive msgs using async for style
|
||||
if use_async_for:
|
||||
|
||||
count = 0
|
||||
# receive msgs using async for style
|
||||
print('ping')
|
||||
await stream.send('ping')
|
||||
|
||||
async for msg in stream:
|
||||
assert msg == 'pong'
|
||||
print('ping')
|
||||
await stream.send('ping')
|
||||
count += 1
|
||||
|
||||
async for msg in stream:
|
||||
assert msg == 'pong'
|
||||
print('ping')
|
||||
await stream.send('ping')
|
||||
count += 1
|
||||
if count >= 9:
|
||||
break
|
||||
|
||||
if count >= 9:
|
||||
break
|
||||
else:
|
||||
# classic send/receive style
|
||||
for _ in range(10):
|
||||
|
||||
else:
|
||||
# classic send/receive style
|
||||
for _ in range(10):
|
||||
print('ping')
|
||||
await stream.send('ping')
|
||||
assert await stream.receive() == 'pong'
|
||||
|
||||
print('ping')
|
||||
await stream.send('ping')
|
||||
assert await stream.receive() == 'pong'
|
||||
# stream should terminate here
|
||||
|
||||
# stream should terminate here
|
||||
# final context result(s) should be consumed here in __aexit__()
|
||||
|
||||
# final context result(s) should be consumed here in __aexit__()
|
||||
|
||||
await portal.cancel_actor()
|
||||
await portal.cancel_actor()
|
||||
|
||||
trio.run(main)
|
||||
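A note on the `ids='...{}'.format` idiom in the `parametrize` calls above: pytest accepts a callable for `ids` and calls it once per parameter value. The sketch below only mimics that call in plain Python (no pytest needed) to show what each style produces. The stand-in `simple_rpc` and the helper names `fmt_id` and `name_id` are assumptions for illustration and are not part of the diff.

```python
# Stand-in for the real `@tractor.context` endpoint (hypothetical).
def simple_rpc():
    ...


# Style used in the diff: a bound `str.format` as the id callable.
fmt_id = 'server_func={}'.format

# Alternative: build the id from the function's name instead of its repr.
def name_id(fn) -> str:
    return f'server_func={fn.__name__}'


bool_ids = ['use_async_for={}'.format(v) for v in (True, False)]
print(bool_ids)             # readable ids for plain values
print(fmt_id(simple_rpc))   # embeds the function repr, including an address
print(name_id(simple_rpc))  # stable, readable id
```

For plain values such as booleans the `.format` form is compact and readable. For function parameters it embeds the object's repr, so a name-based callable (like the `item.__name__` lambda used for `expect_cancel_exc` elsewhere in this diff) may give steadier test ids.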
|
|
|
|||
|
|
@ -98,8 +98,7 @@ def test_ipc_channel_break_during_stream(
|
|||
expect_final_exc = TransportClosed
|
||||
|
||||
mod: ModuleType = import_path(
|
||||
examples_dir()
|
||||
/ 'advanced_faults'
|
||||
examples_dir() / 'advanced_faults'
|
||||
/ 'ipc_failure_during_stream.py',
|
||||
root=examples_dir(),
|
||||
consider_namespace_packages=False,
|
||||
|
|
@ -114,9 +113,8 @@ def test_ipc_channel_break_during_stream(
|
|||
if (
|
||||
# only expect EoC if trans is broken on the child side,
|
||||
ipc_break['break_child_ipc_after'] is not False
|
||||
and
|
||||
# AND we tell the child to call `MsgStream.aclose()`.
|
||||
pre_aclose_msgstream
|
||||
and pre_aclose_msgstream
|
||||
):
|
||||
# expect_final_exc = trio.EndOfChannel
|
||||
# ^XXX NOPE! XXX^ since now `.open_stream()` absorbs this
|
||||
|
|
@ -146,6 +144,9 @@ def test_ipc_channel_break_during_stream(
|
|||
# a user sending ctl-c by raising a KBI.
|
||||
if pre_aclose_msgstream:
|
||||
expect_final_exc = KeyboardInterrupt
|
||||
if tpt_proto == 'uds':
|
||||
expect_final_exc = TransportClosed
|
||||
expect_final_cause = trio.BrokenResourceError
|
||||
|
||||
# XXX OLD XXX
|
||||
# if child calls `MsgStream.aclose()` then expect EoC.
|
||||
|
|
@ -159,13 +160,16 @@ def test_ipc_channel_break_during_stream(
|
|||
ipc_break['break_child_ipc_after'] is not False
|
||||
and (
|
||||
ipc_break['break_parent_ipc_after']
|
||||
>
|
||||
ipc_break['break_child_ipc_after']
|
||||
> ipc_break['break_child_ipc_after']
|
||||
)
|
||||
):
|
||||
if pre_aclose_msgstream:
|
||||
expect_final_exc = KeyboardInterrupt
|
||||
|
||||
if tpt_proto == 'uds':
|
||||
expect_final_exc = TransportClosed
|
||||
expect_final_cause = trio.BrokenResourceError
|
||||
|
||||
# NOTE when the parent IPC side dies (even if the child does as well
|
||||
# but the child fails BEFORE the parent) we always expect the
|
||||
# IPC layer to raise a closed-resource, NEVER do we expect
|
||||
|
|
@ -244,15 +248,8 @@ def test_ipc_channel_break_during_stream(
|
|||
# get raw instance from pytest wrapper
|
||||
value = excinfo.value
|
||||
if isinstance(value, ExceptionGroup):
|
||||
excs: tuple[Exception, ...] = value.exceptions
|
||||
assert (
|
||||
len(excs) <= 2
|
||||
and
|
||||
all(
|
||||
isinstance(exc, TransportClosed)
|
||||
for exc in excs
|
||||
)
|
||||
)
|
||||
excs = value.exceptions
|
||||
assert len(excs) == 1
|
||||
final_exc = excs[0]
|
||||
assert isinstance(final_exc, expect_final_exc)
|
||||
|
||||
|
|
|
|||
|
|
@ -5,15 +5,10 @@ Advanced streaming patterns using bidirectional streams and contexts.
|
|||
from collections import Counter
|
||||
import itertools
|
||||
import platform
|
||||
from typing import Type
|
||||
|
||||
import pytest
|
||||
import trio
|
||||
import tractor
|
||||
from tractor._testing.trace import (
|
||||
AfkAlarmWTraceFactory,
|
||||
FailAfterWTraceFactory,
|
||||
)
|
||||
|
||||
|
||||
def is_win():
|
||||
|
|
@ -81,7 +76,9 @@ async def subscribe(
|
|||
|
||||
|
||||
async def consumer(
|
||||
|
||||
subs: list[str],
|
||||
|
||||
) -> None:
|
||||
|
||||
uid = tractor.current_actor().uid
|
||||
|
|
@ -111,193 +108,59 @@ async def consumer(
|
|||
print(f'{uid} got: {value}')
|
||||
|
||||
|
||||
# NOTE: deliberately NOT using `@pytest.mark.timeout(...)` —
|
||||
# both pytest-timeout enforcement modes break trio under
|
||||
# fork-based backends:
|
||||
#
|
||||
# - `method='signal'` (SIGALRM): the handler synchronously
|
||||
# raises `Failed` in trio's main thread mid-`epoll.poll()`,
|
||||
# leaves `GLOBAL_RUN_CONTEXT` half-installed ("Trio guest
|
||||
# run got abandoned"), and EVERY subsequent `trio.run()`
|
||||
# in the same pytest process bails with
|
||||
# `RuntimeError: Attempted to call run() from inside a
|
||||
# run()` — session-wide poison.
|
||||
#
|
||||
# - `method='thread'`: calls `_thread.interrupt_main()`
|
||||
# raising `KeyboardInterrupt` into the main thread. Under
|
||||
# fork-based backends with mid-cascade fd-juggling the KBI
|
||||
# can escape trio's `KIManager` and bubble out of pytest
|
||||
# itself — kills the WHOLE session.
|
||||
#
|
||||
# Instead we use `trio.fail_after()` INSIDE `main()` below:
|
||||
# trio's own `Cancelled`/`TooSlowError` machinery handles the
|
||||
# timeout, cleanly unwinds the actor nursery's cancel
|
||||
# cascade, and only fails the single test (no cross-test
|
||||
# state corruption either way).
|
||||
#
|
||||
# `pyproject.toml`'s default `timeout = 200` is still a
|
||||
# last-resort safety net.
|
||||
@pytest.mark.parametrize(
|
||||
'expect_cancel_exc', [
|
||||
KeyboardInterrupt,
|
||||
trio.TooSlowError,
|
||||
],
|
||||
ids=lambda item:
|
||||
f'expect_user_exc_raised={item.__name__}'
|
||||
)
|
||||
def test_dynamic_pub_sub(
|
||||
reg_addr: tuple,
|
||||
debug_mode: bool,
|
||||
test_log: tractor.log.StackLevelAdapter,
|
||||
reap_subactors_per_test: int,
|
||||
expect_cancel_exc: Type[BaseException],
|
||||
|
||||
is_forking_spawner: bool,
|
||||
set_fork_aware_capture,
|
||||
|
||||
fail_after_w_trace: FailAfterWTraceFactory,
|
||||
afk_alarm_w_trace: AfkAlarmWTraceFactory,
|
||||
):
|
||||
failed_to_raise_report: str = (
|
||||
f'Never got a {expect_cancel_exc!r} ??'
|
||||
)
|
||||
def test_dynamic_pub_sub():
|
||||
|
||||
global _registry
|
||||
|
||||
from multiprocessing import cpu_count
|
||||
cpus = cpu_count()
|
||||
|
||||
# Hard safety cap via trio's own cancellation. NOTE see the
|
||||
# module-level note on why we avoid `pytest-timeout` for this
|
||||
# test. Picked backend-aware: under `trio` backend spawn is
|
||||
# cheap (~1s for `cpus` actors) but fork-based backends pay
|
||||
# a per-spawn cost (forkserver round-trip + IPC peer-handshake)
|
||||
# that can stack up over `cpus - 1` sequential `n.run_in_actor()`
|
||||
# calls — especially on UDS under cross-pytest contention
|
||||
# (#451 / #452). 4s was flaking right at the edge under fork
|
||||
# backends — bumped to 8s with diag-snapshot-on-timeout via
|
||||
# `fail_after_w_trace` so a borderline run still fails loud
|
||||
# but lands a ptree/wchan/py-spy dump in
|
||||
# `$XDG_CACHE_HOME/tractor/hung-dumps/` for inspection.
|
||||
#
|
||||
# XXX caveat: this is an *inner* trio cancel — its `Cancelled`
|
||||
# cannot reach a task parked in a shielded `await` (e.g. inside
|
||||
# actor-nursery teardown). When the in-band cancel path is
|
||||
# itself buggy (the bug-class-3 `raise KBI` swallow we're
|
||||
# currently chasing) this guard does NOT fire and the test
|
||||
# sits forever until external SIGINT. The `afk_alarm_w_trace`
|
||||
# outer guard below is the AFK-safety counterpart (SIGALRM
|
||||
# raises in the main thread regardless of trio scope state).
|
||||
fail_after_s: int = (
|
||||
8
|
||||
if is_forking_spawner
|
||||
else 20
|
||||
)
|
||||
|
||||
async def main():
|
||||
# bug-class-3 breadcrumb: tag each level of the cancel path
|
||||
# so when the run hangs and we capture cancel-level logs, the
|
||||
# *last* breadcrumb that fired names the swallow point.
|
||||
test_log.cancel('test_dynamic_pub_sub: enter main()')
|
||||
try:
|
||||
async with fail_after_w_trace(fail_after_s):
|
||||
test_log.cancel(
|
||||
f'test_dynamic_pub_sub: '
|
||||
f'enter `fail_after_w_trace({fail_after_s})` scope'
|
||||
async with tractor.open_nursery() as n:
|
||||
|
||||
# name of this actor will be same as target func
|
||||
await n.run_in_actor(publisher)
|
||||
|
||||
for i, sub in zip(
|
||||
range(cpus - 2),
|
||||
itertools.cycle(_registry.keys())
|
||||
):
|
||||
await n.run_in_actor(
|
||||
consumer,
|
||||
name=f'consumer_{sub}',
|
||||
subs=[sub],
|
||||
)
|
||||
try:
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
debug_mode=debug_mode,
|
||||
) as n:
|
||||
test_log.cancel(
|
||||
'test_dynamic_pub_sub: '
|
||||
'actor nursery opened'
|
||||
)
|
||||
|
||||
# name of this actor will be same as target func
|
||||
await n.run_in_actor(publisher)
|
||||
|
||||
for i, sub in zip(
|
||||
range(cpus - 2),
|
||||
itertools.cycle(_registry.keys())
|
||||
):
|
||||
await n.run_in_actor(
|
||||
consumer,
|
||||
name=f'consumer_{sub}',
|
||||
subs=[sub],
|
||||
)
|
||||
|
||||
# make one dynamic subscriber
|
||||
await n.run_in_actor(
|
||||
consumer,
|
||||
name='consumer_dynamic',
|
||||
subs=list(_registry.keys()),
|
||||
)
|
||||
|
||||
# block until "cancelled by user"
|
||||
await trio.sleep(3)
|
||||
test_log.warning(
|
||||
f'Raising user cancel exc: '
|
||||
f'{expect_cancel_exc!r}'
|
||||
)
|
||||
test_log.cancel(
|
||||
f'test_dynamic_pub_sub: '
|
||||
f'ABOUT TO RAISE {expect_cancel_exc!r}'
|
||||
)
|
||||
raise expect_cancel_exc('simulate user cancel!')
|
||||
finally:
|
||||
test_log.cancel(
|
||||
'test_dynamic_pub_sub: '
|
||||
'actor nursery `__aexit__` returned'
|
||||
)
|
||||
test_log.cancel(
|
||||
'test_dynamic_pub_sub: `fail_after` scope exited'
|
||||
)
|
||||
finally:
|
||||
test_log.cancel(
|
||||
'test_dynamic_pub_sub: leaving `main()`'
|
||||
# make one dynamic subscriber
|
||||
await n.run_in_actor(
|
||||
consumer,
|
||||
name='consumer_dynamic',
|
||||
subs=list(_registry.keys()),
|
||||
)
|
||||
|
||||
def _run_and_match():
|
||||
try:
|
||||
trio.run(main)
|
||||
pytest.fail(failed_to_raise_report)
|
||||
except expect_cancel_exc:
|
||||
# parent-side raised the user-cancel exc directly and
|
||||
# it propagated unwrapped; clean path.
|
||||
test_log.exception('Got user-cancel exc AS EXPECTED')
|
||||
except BaseExceptionGroup as err:
|
||||
# under fork-based backends the user-raised cancel
|
||||
# can race with subactor-side stream teardown
|
||||
# (`trio.EndOfChannel` from a publisher's `send()`
|
||||
# whose remote half got cut). The expected exc may
|
||||
# then be nested deeper in the group rather than at
|
||||
# the top level. `BaseExceptionGroup.split()` walks
|
||||
# the exc tree recursively (Python 3.11+).
|
||||
matched, _ = err.split(expect_cancel_exc)
|
||||
if matched is None:
|
||||
pytest.fail(failed_to_raise_report)
|
||||
# block until cancelled by user
|
||||
with trio.fail_after(3):
|
||||
await trio.sleep_forever()
|
||||
|
||||
test_log.exception('Got user-cancel exc AS EXPECTED')
|
||||
|
||||
# outer SIGALRM-based guard — survives a shielded-await
|
||||
# deadlock since `signal.alarm` raises in the main thread
|
||||
# regardless of trio's scope state, AND captures a full diag
|
||||
# snapshot to `$XDG_CACHE_HOME/tractor/hung-dumps/` before
|
||||
# re-raising. ONLY armed under fork-based backends since the
|
||||
# bug we're chasing is MTF-specific. Cap = `fail_after_s + 5`
|
||||
# so the trio-native path always wins when it works.
|
||||
if is_forking_spawner:
|
||||
with afk_alarm_w_trace(fail_after_s + 5):
|
||||
_run_and_match()
|
||||
else:
|
||||
_run_and_match()
|
||||
try:
|
||||
trio.run(main)
|
||||
except (
|
||||
trio.TooSlowError,
|
||||
ExceptionGroup,
|
||||
) as err:
|
||||
if isinstance(err, ExceptionGroup):
|
||||
for suberr in err.exceptions:
|
||||
if isinstance(suberr, trio.TooSlowError):
|
||||
break
|
||||
else:
|
||||
pytest.fail('Never got a `TooSlowError` ?')
|
||||
|
||||
|
||||
@tractor.context
|
||||
async def one_task_streams_and_one_handles_reqresp(
|
||||
|
||||
ctx: tractor.Context,
|
||||
|
||||
) -> None:
|
||||
|
||||
await ctx.started()
|
||||
|
|
@ -394,8 +257,7 @@ async def echo_ctx_stream(
|
|||
|
||||
|
||||
def test_sigint_both_stream_types():
|
||||
'''
|
||||
Verify that running a bi-directional and recv only stream
|
||||
'''Verify that running a bi-directional and recv only stream
|
||||
side-by-side will cancel correctly from SIGINT.
|
||||
|
||||
'''
|
||||
|
|
@ -425,11 +287,9 @@ def test_sigint_both_stream_types():
|
|||
assert resp == msg
|
||||
raise KeyboardInterrupt
|
||||
|
||||
# TODO, use pytest.raises() here instead?
|
||||
# (why weren't we originally?)
|
||||
try:
|
||||
trio.run(main)
|
||||
pytest.fail("Didn't receive KBI!?")
|
||||
assert 0, "Didn't receive KBI!?"
|
||||
except KeyboardInterrupt:
|
||||
pass
|
||||
|
||||
|
|
@ -496,12 +356,7 @@ async def inf_streamer(
|
|||
print('streamer exited .open_streamer() block')
|
||||
|
||||
|
||||
# @pytest.mark.timeout(
|
||||
# 6,
|
||||
# method='signal',
|
||||
# )
|
||||
def test_local_task_fanout_from_stream(
|
||||
reg_addr: tuple,
|
||||
debug_mode: bool,
|
||||
):
|
||||
'''
|
||||
|
|
@ -566,9 +421,4 @@ def test_local_task_fanout_from_stream(
|
|||
|
||||
await p.cancel_actor()
|
||||
|
||||
async def w_timeout():
|
||||
with trio.fail_after(6):
|
||||
await main()
|
||||
|
||||
# trio.run(main)
|
||||
trio.run(w_timeout)
|
||||
trio.run(main)
|
||||
|
|
|
|||
|
|
@ -7,7 +7,6 @@ import signal
|
|||
import platform
|
||||
import time
|
||||
from itertools import repeat
|
||||
from typing import Type
|
||||
|
||||
import pytest
|
||||
import trio
|
||||
|
|
@ -15,52 +14,11 @@ import tractor
|
|||
from tractor._testing import (
|
||||
tractor_test,
|
||||
)
|
||||
from tractor._testing.trace import FailAfterWTraceFactory
|
||||
from .conftest import no_windows
|
||||
|
||||
|
||||
_non_linux: bool = platform.system() != 'Linux'
|
||||
_friggin_windows: bool = platform.system() == 'Windows'
|
||||
|
||||
|
||||
pytestmark = [
|
||||
# Multi-actor cancel cascades under
|
||||
# `--spawn-backend=subint` trip the abandoned-subint
|
||||
# GIL-hostage class — a stuck subint can starve the
|
||||
# parent's trio loop and block cancel-delivery.
|
||||
# Apply the skip module-wide rather than per-test
|
||||
# since every test here exercises the same cascade.
|
||||
pytest.mark.skipon_spawn_backend(
|
||||
'subint',
|
||||
reason=(
|
||||
'XXX SUBINT GIL-CONTENTION HANGING TEST XXX\n'
|
||||
'Cancel cascades under '
|
||||
'`--spawn-backend=subint` trip the abandoned-subint '
|
||||
'GIL-hostage class — see\n'
|
||||
' - `ai/conc-anal/subint_sigint_starvation_issue.md` '
|
||||
'(GIL-hostage, SIGINT-unresponsive)\n'
|
||||
' - `ai/conc-anal/subint_cancel_delivery_hang_issue.md` '
|
||||
'(sibling: parent parks on dead chan)\n'
|
||||
' - https://github.com/goodboy/tractor/issues/379 '
|
||||
'(subint umbrella)\n'
|
||||
)
|
||||
),
|
||||
pytest.mark.usefixtures(
|
||||
'reap_subactors_per_test',
|
||||
# NOTE, cancellation tests stress the SIGKILL
|
||||
# `hard_kill` path which leaks UDS sock-files when
|
||||
# the subactor's IPC server `finally:` cleanup
|
||||
# doesn't run. Track per-test for blame attribution.
|
||||
'track_orphaned_uds_per_test',
|
||||
# NOTE, cancel-cascade timing races (see
|
||||
# `test_nested_multierrors`) can also leave a
|
||||
# subactor spinning at 100% CPU when its cancel
|
||||
# signal got swallowed mid-handshake. Catches the
|
||||
# runaway-loop class that doesn't leak UDS socks
|
||||
# but burns the box.
|
||||
'detect_runaway_subactors_per_test',
|
||||
),
|
||||
]
|
||||
def is_win():
|
||||
return platform.system() == 'Windows'
|
||||
|
||||
|
||||
async def assert_err(delay=0):
|
||||
|
|
@ -87,11 +45,7 @@ async def do_nuthin():
|
|||
],
|
||||
ids=['no_args', 'unexpected_args'],
|
||||
)
|
||||
def test_remote_error(
|
||||
reg_addr: tuple,
|
||||
args_err: tuple[dict, Type[Exception]],
|
||||
set_fork_aware_capture,
|
||||
):
|
||||
def test_remote_error(reg_addr, args_err):
|
||||
'''
|
||||
Verify an error raised in a subactor that is propagated
|
||||
to the parent nursery, contains the underlying boxed builtin
|
||||
|
|
@ -158,8 +112,6 @@ def test_remote_error(
|
|||
|
||||
def test_multierror(
|
||||
reg_addr: tuple[str, int],
|
||||
start_method: str, # parametrized
|
||||
set_fork_aware_capture, #: Callable,
|
||||
):
|
||||
'''
|
||||
Verify we raise a ``BaseExceptionGroup`` out of a nursery where
|
||||
|
|
@ -189,68 +141,31 @@ def test_multierror(
|
|||
trio.run(main)
|
||||
|
||||
|
||||
@pytest.mark.parametrize('delay', (0, 0.5))
|
||||
@pytest.mark.parametrize(
|
||||
'delay',
|
||||
(0, 0.5),
|
||||
ids='delays={}'.format,
|
||||
'num_subactors', range(25, 26),
|
||||
)
|
||||
@pytest.mark.parametrize(
|
||||
'num_subactors',
|
||||
range(25, 26),
|
||||
ids= 'num_subs={}'.format,
|
||||
)
|
||||
def test_multierror_fast_nursery(
|
||||
reg_addr: tuple,
|
||||
start_method: str,
|
||||
num_subactors: int,
|
||||
delay: float,
|
||||
set_fork_aware_capture,
|
||||
fail_after_w_trace: FailAfterWTraceFactory,
|
||||
):
|
||||
'''
|
||||
Verify we raise a ``BaseExceptionGroup`` out of a nursery where
|
||||
def test_multierror_fast_nursery(reg_addr, start_method, num_subactors, delay):
|
||||
"""Verify we raise a ``BaseExceptionGroup`` out of a nursery where
|
||||
more than one actor errors and also with a delay before failure
|
||||
to test failure during an ongoing spawning.
|
||||
|
||||
'''
|
||||
"""
|
||||
async def main():
|
||||
# budget = 2× natural trio-backend cascade time for
|
||||
# 25 errorer subactors (~14s observed). on-timeout
|
||||
# diag snapshot → if the cancel cascade hangs
|
||||
# (observed under MTF backend with N>=14 errorer
|
||||
# subactors) we get a fresh ptree/wchan/py-spy dump
|
||||
# on disk INSTEAD of an opaque pytest timeout-kill.
|
||||
# See `tractor/_testing/trace.py` for the helper.
|
||||
async with fail_after_w_trace(30.0):
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
) as nursery:
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
) as nursery:
|
||||
|
||||
for i in range(num_subactors):
|
||||
await nursery.run_in_actor(
|
||||
assert_err,
|
||||
name=f'errorer{i}',
|
||||
delay=delay
|
||||
)
|
||||
for i in range(num_subactors):
|
||||
await nursery.run_in_actor(
|
||||
assert_err,
|
||||
name=f'errorer{i}',
|
||||
delay=delay
|
||||
)
|
||||
|
||||
# with pytest.raises(trio.MultiError) as exc_info:
|
||||
# NOTE, `trio.TooSlowError` from `fail_after_w_trace`
|
||||
# bubbles UN-wrapped if `open_nursery.__aexit__` never
|
||||
# gets re-entered; wrapped inside a `BaseExceptionGroup`
|
||||
# if it did. Accept both shapes so the matcher itself
|
||||
# doesn't lie about *what* failed.
|
||||
with pytest.raises(
|
||||
(BaseExceptionGroup, trio.TooSlowError),
|
||||
) as exc_info:
|
||||
with pytest.raises(BaseExceptionGroup) as exc_info:
|
||||
trio.run(main)
|
||||
|
||||
if isinstance(exc_info.value, trio.TooSlowError):
|
||||
pytest.fail(
|
||||
f'cancel cascade hung past 30s '
|
||||
f'(num_subactors={num_subactors}, delay={delay}); '
|
||||
f'see stderr for `fail_after_w_trace` snapshot path'
|
||||
)
|
||||
|
||||
assert exc_info.type == ExceptionGroup
|
||||
err = exc_info.value
|
||||
exceptions = err.exceptions
|
||||
|
|
@ -274,15 +189,8 @@ async def do_nothing():
|
|||
pass
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
'mechanism', [
|
||||
'nursery_cancel',
|
||||
KeyboardInterrupt,
|
||||
])
|
||||
def test_cancel_single_subactor(
|
||||
reg_addr: tuple,
|
||||
mechanism: str|KeyboardInterrupt,
|
||||
):
|
||||
@pytest.mark.parametrize('mechanism', ['nursery_cancel', KeyboardInterrupt])
|
||||
def test_cancel_single_subactor(reg_addr, mechanism):
|
||||
'''
|
||||
Ensure an ``ActorNursery.start_actor()`` spawned subactor
|
||||
cancels when the nursery is cancelled.
|
||||
|
|
@ -324,14 +232,9 @@ async def stream_forever():
|
|||
await trio.sleep(0.01)
|
||||
|
||||
|
||||
@tractor_test(
|
||||
timeout=6,
|
||||
)
|
||||
async def test_cancel_infinite_streamer(
|
||||
reg_addr: tuple,
|
||||
start_method: str,
|
||||
set_fork_aware_capture,
|
||||
):
|
||||
@tractor_test
|
||||
async def test_cancel_infinite_streamer(start_method):
|
||||
|
||||
# stream for at most 1 second
|
||||
with (
|
||||
trio.fail_after(4),
|
||||
|
|
@ -383,15 +286,11 @@ async def test_cancel_infinite_streamer(
|
|||
'no_daemon_actors_fail_all_run_in_actors_sleep_then_fail',
|
||||
],
|
||||
)
|
||||
@tractor_test(
|
||||
timeout=10,
|
||||
)
|
||||
@tractor_test
|
||||
async def test_some_cancels_all(
|
||||
num_actors_and_errs: tuple,
|
||||
reg_addr: tuple,
|
||||
start_method: str,
|
||||
loglevel: str,
|
||||
set_fork_aware_capture, #: Callable,
|
||||
):
|
||||
'''
|
||||
Verify a subset of failed subactors causes all others in
|
||||
|
|
@ -471,10 +370,7 @@ async def test_some_cancels_all(
|
|||
pytest.fail("Should have gotten a remote assertion error?")
|
||||
|
||||
|
||||
async def spawn_and_error(
|
||||
breadth: int,
|
||||
depth: int,
|
||||
) -> None:
|
||||
async def spawn_and_error(breadth, depth) -> None:
|
||||
name = tractor.current_actor().name
|
||||
async with tractor.open_nursery() as nursery:
|
||||
for i in range(breadth):
|
||||
|
|
@ -499,140 +395,28 @@ async def spawn_and_error(
|
|||
await nursery.run_in_actor(*args, **kwargs)
|
||||
|
||||
|
||||
# NOTE: `main_thread_forkserver` capture-fd hang class is no
|
||||
# longer skipped here — `--capture=sys` (the new `pyproject.toml`
|
||||
# default) sidesteps the pipe-buffer-fill deadlock for
|
||||
# `test_nested_multierrors`. See
|
||||
# `ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md`
|
||||
# / #449 for the post-mortem.
|
||||
# @pytest.mark.timeout(
|
||||
# 10,
|
||||
# method='thread',
|
||||
# )
|
||||
@pytest.mark.parametrize(
|
||||
'depth',
|
||||
[1, 3],
|
||||
ids='depth={}'.format,
|
||||
)
|
||||
@tractor_test(
|
||||
# bumped from the 30s default to cover fork-based
|
||||
# cancel-cascade flakes; 2 spawners × 2 errorers × depth 1+
|
||||
# cascade through 6 portal-wait_for_result paths each
|
||||
# paying `terminate_after=1.6s` + UDS sock-unlink under
|
||||
# MTF/UDS contention can easily blow past 30s.
|
||||
# Trio backend is fast and won't notice the extra budget.
|
||||
# See `ai/conc-anal/cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`.
|
||||
timeout=10,
|
||||
)
|
||||
async def test_nested_multierrors(
|
||||
reg_addr: tuple,
|
||||
loglevel: str,
|
||||
start_method: str,
|
||||
set_fork_aware_capture,
|
||||
fail_after_w_trace: FailAfterWTraceFactory,
|
||||
request: pytest.FixtureRequest,
|
||||
depth: int,
|
||||
):
|
||||
@tractor_test
|
||||
async def test_nested_multierrors(loglevel, start_method):
|
||||
'''
|
||||
Test that failed actor sets are wrapped in `BaseExceptionGroup`s.
|
||||
|
||||
Parametrized over recursion `depth ∈ {1, 3}`:
|
||||
|
||||
- `depth=1`: shallow tree (2 spawners × 2 errorers, 2
|
||||
levels). Cascade completes well within budget on ALL
|
||||
backends including MTF — regression-safety green case.
|
||||
|
||||
- `depth=3`: deep tree (2 spawners × recursive depth-3
|
||||
spawn-and-error). On `main_thread_forkserver` this
|
||||
trips the cancel-cascade shape-mismatch bug class
|
||||
(see `ai/conc-anal/cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`)
|
||||
— xfailed below.
|
||||
Test that failed actor sets are wrapped in `BaseExceptionGroup`s. This
|
||||
test goes only 2 nurseries deep but we should eventually have tests
|
||||
for arbitrary n-depth actor trees.
|
||||
|
||||
'''
|
||||
# XXX: `multiprocessing.forkserver` can't handle nested
|
||||
# spawning at any depth — hangs / broken-pipes. Pre-existing
|
||||
# backend limitation, NOT depth-specific.
|
||||
if start_method == 'forkserver':
|
||||
pytest.skip("Forkserver sux hard at nested spawning...")
|
||||
if start_method == 'trio':
|
||||
depth = 3
|
||||
subactor_breadth = 2
|
||||
else:
|
||||
# XXX: multiprocessing can't seem to handle any more than 2 depth
|
||||
# process trees for whatever reason.
|
||||
# Any more process levels than this and we see bugs that cause
|
||||
# hangs and broken pipes all over the place...
|
||||
if start_method == 'forkserver':
|
||||
pytest.skip("Forkserver sux hard at nested spawning...")
|
||||
depth = 1 # means an additional actor tree of spawning (2 levels deep)
|
||||
subactor_breadth = 2
|
||||
|
||||
subactor_breadth = 2
|
||||
|
||||
# MTF backend trips a probabilistic timing race in the
|
||||
# cancel-cascade — NOT depth-gated; depth amplifies the
|
||||
# variance so depth=3 misses nearly every run while
|
||||
# depth=1 misses occasionally. Both get the xfail mark
|
||||
# (with `strict=False`) since the bug class can fire at
|
||||
# either depth.
|
||||
#
|
||||
# The scenario in detail:
|
||||
#
|
||||
# T=0 spawn spawner_0 + spawner_1 in parallel
|
||||
# T=t1 spawner_0's child errors →
|
||||
# RemoteActorError reaches root nursery
|
||||
# T=t1+ε root nursery starts cancelling
|
||||
# spawner_1's portal-wait
|
||||
# T=t2 spawner_1's child errors → tries to send
|
||||
# RemoteActorError back
|
||||
#
|
||||
# if t2 < t1+ε: BEG = [RAE, RAE] ← clean (xpass)
|
||||
# if t2 > t1+ε: BEG = [RAE, Cancelled] ← race tripped (xfail)
|
||||
#
|
||||
# i.e. the assertion below (`isinstance(_, RemoteActorError)`)
|
||||
# fails iff cancel-delivery beats the other tree's natural
|
||||
# error-propagation. Depth amplifies `t2-t1` variance
|
||||
# (longer per-tree paths = more skew); under MTF the
|
||||
# fork-spawn jitter + UDS-contention widens both `t1` and
|
||||
# `t2` further.
|
||||
#
|
||||
# With `strict=False` the clean-cascade cases (most
|
||||
# depth=1 runs, rare depth=3 runs) report as `xpassed`
|
||||
# while the race-tripped cases report as `xfailed` —
|
||||
# neither flakes `--lf`. When MTF cancel-cascade
|
||||
# eventually speeds up enough to close the race even at
|
||||
# depth=3, BOTH variants will reliably `xpass` and
|
||||
# pytest will yell — our signal to drop the marker. See
|
||||
# `ai/conc-anal/cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`.
|
||||
if start_method == 'main_thread_forkserver':
|
||||
request.node.add_marker(
|
||||
pytest.mark.xfail(
|
||||
strict=False,
|
||||
reason=(
|
||||
f'MTF cancel-cascade shape-mismatch at '
|
||||
f'depth={depth} (Cancelled races '
|
||||
f'RemoteActorError in BEG); see conc-anal/'
|
||||
'cancel_cascade_too_slow_under_main_thread_forkserver_issue.md'
|
||||
),
|
||||
)
|
||||
)
|
||||
|
||||
# Per-backend/-depth budgets: in the non-hang case the
|
||||
# whole spawn + cancel-cascade should complete in well
|
||||
# under these. On the borderline hang case the
|
||||
# `fail_after_w_trace` fires `TooSlowError` AND captures a
|
||||
# ptree/wchan/py-spy snapshot to
|
||||
# `$XDG_CACHE_HOME/tractor/hung-dumps/` for offline
|
||||
# inspection. See
|
||||
# `ai/conc-anal/cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`.
|
||||
#
|
||||
# NOTE: the `trio` depth=3 budget was bumped 6 -> 12s after
|
||||
# the `trio` 0.29 -> 0.33 lock bump (commit c7741bba) slowed
|
||||
# the depth-3 cancel-cascade from <6s to ~7-8s; the 6s
|
||||
# deadline was firing and its `Cancelled(source='deadline')`
|
||||
# (trio 0.33 cancel-reason metadata) collapsed a BEG branch,
|
||||
# breaking the `RemoteActorError` assertion below. depth=1
|
||||
# still finishes in ~3s so keeps the 6s budget. See
|
||||
# `ai/conc-anal/trio_033_cancel_cascade_slowdown_depth3_issue.md`.
|
||||
match (start_method, depth):
|
||||
case ('trio', 1):
|
||||
timeout = 6
|
||||
case ('trio', 3):
|
||||
timeout = 12
|
||||
case ('main_thread_forkserver', 1):
|
||||
timeout = 16
|
||||
case ('main_thread_forkserver', 3):
|
||||
timeout = 30
|
||||
|
||||
async with fail_after_w_trace(timeout):
|
||||
with trio.fail_after(120):
|
||||
try:
|
||||
async with tractor.open_nursery() as nursery:
|
||||
for i in range(subactor_breadth):
|
||||
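One observation on the `match (start_method, depth)` block above: as shown it has no catch-all `case`, so if any other backend or depth combination reached that point, `timeout` would be left unbound. A table lookup with an explicit fallback is one way to sketch the same budgets. The values are copied from the diff; the helper name `pick_timeout` and the fallback of 30 seconds are assumptions made for illustration.

```python
# Per-backend/-depth budgets in seconds, copied from the `match`
# statement in the diff.
_BUDGETS: dict[tuple[str, int], float] = {
    ('trio', 1): 6,
    ('trio', 3): 12,
    ('main_thread_forkserver', 1): 16,
    ('main_thread_forkserver', 3): 30,
}


def pick_timeout(
    start_method: str,
    depth: int,
    default: float = 30,  # assumed fallback, not taken from the source
) -> float:
    '''
    Look up the cancel-cascade budget for a backend/depth pair,
    falling back to `default` for combinations not listed.

    '''
    return _BUDGETS.get((start_method, depth), default)


print(pick_timeout('trio', 3))
print(pick_timeout('some_other_backend', 1))
```

Whether a silent fallback or a loud failure is preferable for an unlisted combination is a judgment call for the test authors; the sketch only shows the lookup shape.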
|
|
@ -647,7 +431,7 @@ async def test_nested_multierrors(
|
|||
for subexc in err.exceptions:
|
||||
|
||||
# verify first level actor errors are wrapped as remote
|
||||
if _friggin_windows:
|
||||
if is_win():
|
||||
|
||||
# windows is often too slow and cancellation seems
|
||||
# to happen before an actor is spawned
|
||||
|
|
@ -680,7 +464,7 @@ async def test_nested_multierrors(
|
|||
# XXX not sure what's up with this..
|
||||
# on windows sometimes spawning is just too slow and
|
||||
# we get back the (sent) cancel signal instead
|
||||
if _friggin_windows:
|
||||
if is_win():
|
||||
if isinstance(subexc, tractor.RemoteActorError):
|
||||
assert subexc.boxed_type in (
|
||||
BaseExceptionGroup,
|
||||
|
|
@ -699,24 +483,20 @@ async def test_nested_multierrors(
|
|||
|
||||
@no_windows
|
||||
def test_cancel_via_SIGINT(
|
||||
reg_addr: tuple,
|
||||
loglevel: str,
|
||||
start_method: str,
|
||||
loglevel,
|
||||
start_method,
|
||||
spawn_backend,
|
||||
):
|
||||
'''
|
||||
Ensure that a control-C (SIGINT) signal cancels both the parent and
|
||||
"""Ensure that a control-C (SIGINT) signal cancels both the parent and
|
||||
child processes in trionic fashion
|
||||
|
||||
'''
|
||||
pid: int = os.getpid()
|
||||
"""
|
||||
pid = os.getpid()
|
||||
|
||||
async def main():
|
||||
with trio.fail_after(2):
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
) as tn:
|
||||
async with tractor.open_nursery() as tn:
|
||||
await tn.start_actor('sucka')
|
||||
if 'mp' in start_method:
|
||||
if 'mp' in spawn_backend:
|
||||
time.sleep(0.1)
|
||||
os.kill(pid, signal.SIGINT)
|
||||
await trio.sleep_forever()
|
||||
|
|
@@ -727,38 +507,23 @@ def test_cancel_via_SIGINT(
|
|||
|
||||
@no_windows
|
||||
def test_cancel_via_SIGINT_other_task(
|
||||
reg_addr: tuple,
|
||||
loglevel: str,
|
||||
start_method: str,
|
||||
spawn_backend: str,
|
||||
loglevel,
|
||||
start_method,
|
||||
spawn_backend,
|
||||
):
|
||||
'''
|
||||
Ensure that a control-C (SIGINT) signal cancels both the parent
|
||||
and child processes in trionic fashion even a subprocess is
|
||||
started from a seperate ``trio`` child task.
|
||||
|
||||
'''
|
||||
from .conftest import cpu_scaling_factor
|
||||
|
||||
pid: int = os.getpid()
|
||||
timeout: float = (
|
||||
4 if _non_linux
|
||||
else 2
|
||||
)
|
||||
if _friggin_windows: # smh
|
||||
"""Ensure that a control-C (SIGINT) signal cancels both the parent
|
||||
and child processes in trionic fashion even a subprocess is started
|
||||
from a seperate ``trio`` child task.
|
||||
"""
|
||||
pid = os.getpid()
|
||||
timeout: float = 2
|
||||
if is_win(): # smh
|
||||
timeout += 1
|
||||
|
||||
# add latency headroom for CPU freq scaling (auto-cpufreq et al.)
|
||||
headroom: float = cpu_scaling_factor()
|
||||
if headroom != 1.:
|
||||
timeout *= headroom
|
||||
|
||||
async def spawn_and_sleep_forever(
|
||||
task_status=trio.TASK_STATUS_IGNORED
|
||||
):
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
) as tn:
|
||||
async with tractor.open_nursery() as tn:
|
||||
for i in range(3):
|
||||
await tn.run_in_actor(
|
||||
sleep_forever,
|
||||
|
|
@@ -822,7 +587,7 @@ async def spawn_sub_with_sync_blocking_task():
|
|||
def test_cancel_while_childs_child_in_sync_sleep(
|
||||
loglevel: str,
|
||||
start_method: str,
|
||||
is_forking_spawner: bool,
|
||||
spawn_backend: str,
|
||||
debug_mode: bool,
|
||||
reg_addr: tuple,
|
||||
man_cancel_outer: bool,
|
||||
|
|
@@ -838,10 +603,7 @@ def test_cancel_while_childs_child_in_sync_sleep(
|
|||
|
||||
'''
|
||||
if start_method == 'forkserver':
|
||||
pytest.skip(
|
||||
"`multiprocessing`'s forkserver sux hard at "
|
||||
"resuming from sync sleep..."
|
||||
)
|
||||
pytest.skip("Forksever sux hard at resuming from sync sleep...")
|
||||
|
||||
async def main():
|
||||
#
|
||||
|
|
@@ -882,15 +644,7 @@ def test_cancel_while_childs_child_in_sync_sleep(
|
|||
#
|
||||
# delay = 1 # no AssertionError in eg, TooSlowError raised.
|
||||
# delay = 2 # is AssertionError in eg AND no TooSlowError !?
|
||||
# is AssertionError in eg AND no _cs cancellation.
|
||||
delay = (
|
||||
6 if (
|
||||
_non_linux
|
||||
or
|
||||
is_forking_spawner
|
||||
)
|
||||
else 4
|
||||
)
|
||||
delay = 4 # is AssertionError in eg AND no _cs cancellation.
|
||||
|
||||
with trio.fail_after(delay) as _cs:
|
||||
# with trio.CancelScope() as cs:
|
||||
|
|
@@ -924,7 +678,7 @@ def test_cancel_while_childs_child_in_sync_sleep(
|
|||
|
||||
|
||||
def test_fast_graceful_cancel_when_spawn_task_in_soft_proc_wait_for_daemon(
|
||||
start_method: str,
|
||||
start_method,
|
||||
):
|
||||
'''
|
||||
This is a very subtle test which demonstrates how cancellation
|
||||
|
|
@@ -942,7 +696,7 @@ def test_fast_graceful_cancel_when_spawn_task_in_soft_proc_wait_for_daemon(
|
|||
kbi_delay = 0.5
|
||||
timeout: float = 2.9
|
||||
|
||||
if _friggin_windows: # smh
|
||||
if is_win(): # smh
|
||||
timeout += 1
|
||||
|
||||
async def main():
|
||||
|
|
|
|||
|
|
@@ -18,15 +18,16 @@ from tractor import RemoteActorError
|
|||
|
||||
|
||||
async def aio_streamer(
|
||||
chan: tractor.to_asyncio.LinkedTaskChannel,
|
||||
from_trio: asyncio.Queue,
|
||||
to_trio: trio.abc.SendChannel,
|
||||
) -> trio.abc.ReceiveChannel:
|
||||
|
||||
# required first msg to sync caller
|
||||
chan.started_nowait(None)
|
||||
to_trio.send_nowait(None)
|
||||
|
||||
from itertools import cycle
|
||||
for i in cycle(range(10)):
|
||||
chan.send_nowait(i)
|
||||
to_trio.send_nowait(i)
|
||||
await asyncio.sleep(0.01)
|
||||
|
||||
|
||||
|
|
@@ -68,7 +69,7 @@ async def wrapper_mngr(
|
|||
else:
|
||||
async with tractor.to_asyncio.open_channel_from(
|
||||
aio_streamer,
|
||||
) as (from_aio, first):
|
||||
) as (first, from_aio):
|
||||
assert not first
|
||||
|
||||
# cache it so next task uses broadcast receiver
|
||||
|
|
|
|||
|
|
@@ -10,19 +10,7 @@ from tractor._testing import tractor_test
|
|||
MESSAGE = 'tractoring at full speed'
|
||||
|
||||
|
||||
def test_empty_mngrs_input_raises(
|
||||
tpt_proto: str,
|
||||
) -> None:
|
||||
# TODO, the `open_actor_cluster()` teardown hangs
|
||||
# intermittently on UDS when `gather_contexts(mngrs=())`
|
||||
# raises `ValueError` mid-setup; likely a race in the
|
||||
# actor-nursery cleanup vs UDS socket shutdown. Needs
|
||||
# a deeper look at `._clustering`/`._supervise` teardown
|
||||
# paths with the UDS transport.
|
||||
if tpt_proto == 'uds':
|
||||
pytest.skip(
|
||||
'actor-cluster teardown hangs intermittently on UDS'
|
||||
)
|
||||
def test_empty_mngrs_input_raises() -> None:
|
||||
|
||||
async def main():
|
||||
with trio.fail_after(3):
|
||||
|
|
@@ -68,44 +56,25 @@ async def worker(
|
|||
print(msg)
|
||||
assert msg == MESSAGE
|
||||
|
||||
# ?TODO, does this ever cause a hang?
|
||||
# TODO: does this ever cause a hang
|
||||
# assert 0
|
||||
|
||||
|
||||
# ?TODO, but needs a fn-scoped tpt_proto fixture..
|
||||
# @pytest.mark.no_tpt('uds')
|
||||
@tractor_test
|
||||
async def test_streaming_to_actor_cluster(
|
||||
tpt_proto: str,
|
||||
is_forking_spawner: bool,
|
||||
):
|
||||
'''
|
||||
Open an actor "cluster" using the (experimental) `._clustering`
|
||||
API and conduct standard inter-task-ctx streaming.
|
||||
async def test_streaming_to_actor_cluster() -> None:
|
||||
|
||||
'''
|
||||
if tpt_proto == 'uds':
|
||||
pytest.skip(
|
||||
f'Test currently fails with tpt-proto={tpt_proto!r}\n'
|
||||
)
|
||||
async with (
|
||||
open_actor_cluster(modules=[__name__]) as portals,
|
||||
|
||||
delay: float = (
|
||||
10 if is_forking_spawner
|
||||
else 6
|
||||
)
|
||||
with trio.fail_after(delay):
|
||||
async with (
|
||||
open_actor_cluster(modules=[__name__]) as portals,
|
||||
gather_contexts(
|
||||
mngrs=[p.open_context(worker) for p in portals.values()],
|
||||
) as contexts,
|
||||
|
||||
gather_contexts(
|
||||
mngrs=[p.open_context(worker) for p in portals.values()],
|
||||
) as contexts,
|
||||
gather_contexts(
|
||||
mngrs=[ctx[0].open_stream() for ctx in contexts],
|
||||
) as streams,
|
||||
|
||||
gather_contexts(
|
||||
mngrs=[ctx[0].open_stream() for ctx in contexts],
|
||||
) as streams,
|
||||
|
||||
):
|
||||
with trio.move_on_after(1):
|
||||
for stream in itertools.cycle(streams):
|
||||
await stream.send(MESSAGE)
|
||||
):
|
||||
with trio.move_on_after(1):
|
||||
for stream in itertools.cycle(streams):
|
||||
await stream.send(MESSAGE)
|
||||
|
|
|
|||
|
|
@@ -9,7 +9,6 @@ from itertools import count
|
|||
import math
|
||||
import platform
|
||||
from pprint import pformat
|
||||
import sys
|
||||
from typing import (
|
||||
Callable,
|
||||
)
|
||||
|
|
@@ -26,7 +25,7 @@ from tractor._exceptions import (
|
|||
StreamOverrun,
|
||||
ContextCancelled,
|
||||
)
|
||||
from tractor.runtime._state import current_ipc_ctx
|
||||
from tractor._state import current_ipc_ctx
|
||||
|
||||
from tractor._testing import (
|
||||
tractor_test,
|
||||
|
|
@@ -115,12 +114,10 @@ async def not_started_but_stream_opened(
|
|||
)
|
||||
def test_started_misuse(
|
||||
target: Callable,
|
||||
reg_addr: tuple,
|
||||
debug_mode: bool,
|
||||
):
|
||||
async def main():
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
debug_mode=debug_mode,
|
||||
) as an:
|
||||
portal = await an.start_actor(
|
||||
|
|
@@ -186,24 +183,15 @@ def test_simple_context(
|
|||
error_parent,
|
||||
child_blocks_forever,
|
||||
pointlessly_open_stream,
|
||||
reg_addr: tuple,
|
||||
debug_mode: bool,
|
||||
is_forking_spawner: bool,
|
||||
):
|
||||
|
||||
timeout: float = 1.5
|
||||
# windows and forking-spawner both have "slower but more
|
||||
# deterministic" cancel teardown.
|
||||
if platform.system() == 'Windows':
|
||||
timeout = 4
|
||||
elif is_forking_spawner:
|
||||
timeout = 3
|
||||
timeout = 1.5 if not platform.system() == 'Windows' else 4
|
||||
|
||||
async def main():
|
||||
|
||||
with trio.fail_after(timeout):
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
debug_mode=debug_mode,
|
||||
) as an:
|
||||
portal = await an.start_actor(
|
||||
|
|
@@ -289,7 +277,6 @@ def test_parent_cancels(
|
|||
cancel_method: str,
|
||||
chk_ctx_result_before_exit: bool,
|
||||
child_returns_early: bool,
|
||||
reg_addr: tuple,
|
||||
debug_mode: bool,
|
||||
):
|
||||
'''
|
||||
|
|
@@ -367,7 +354,6 @@ def test_parent_cancels(
|
|||
async def main():
|
||||
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
debug_mode=debug_mode,
|
||||
) as an:
|
||||
portal = await an.start_actor(
|
||||
|
|
@@ -944,7 +930,6 @@ async def keep_sending_from_child(
|
|||
)
|
||||
def test_one_end_stream_not_opened(
|
||||
overrun_by: tuple[str, int, Callable],
|
||||
reg_addr: tuple,
|
||||
debug_mode: bool,
|
||||
):
|
||||
'''
|
||||
|
|
@@ -953,17 +938,11 @@ def test_one_end_stream_not_opened(
|
|||
|
||||
'''
|
||||
overrunner, buf_size_increase, entrypoint = overrun_by
|
||||
from tractor.runtime._runtime import Actor
|
||||
from tractor._runtime import Actor
|
||||
buf_size = buf_size_increase + Actor.msg_buffer_size
|
||||
|
||||
timeout: float = (
|
||||
1 if sys.platform == 'linux'
|
||||
else 3
|
||||
)
|
||||
|
||||
async def main():
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
debug_mode=debug_mode,
|
||||
) as an:
|
||||
portal = await an.start_actor(
|
||||
|
|
@@ -971,7 +950,7 @@ def test_one_end_stream_not_opened(
|
|||
enable_modules=[__name__],
|
||||
)
|
||||
|
||||
with trio.fail_after(timeout):
|
||||
with trio.fail_after(1):
|
||||
async with portal.open_context(
|
||||
entrypoint,
|
||||
) as (ctx, sent):
|
||||
|
|
@@ -1128,7 +1107,6 @@ def test_maybe_allow_overruns_stream(
|
|||
|
||||
# conftest wide
|
||||
loglevel: str,
|
||||
reg_addr: tuple,
|
||||
debug_mode: bool,
|
||||
):
|
||||
'''
|
||||
|
|
@@ -1149,7 +1127,6 @@ def test_maybe_allow_overruns_stream(
|
|||
'''
|
||||
async def main():
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
debug_mode=debug_mode,
|
||||
) as an:
|
||||
portal = await an.start_actor(
|
||||
|
|
@@ -1266,7 +1243,6 @@ def test_maybe_allow_overruns_stream(
|
|||
|
||||
def test_ctx_with_self_actor(
|
||||
loglevel: str,
|
||||
reg_addr: tuple,
|
||||
debug_mode: bool,
|
||||
):
|
||||
'''
|
||||
|
|
@@ -1281,7 +1257,6 @@ def test_ctx_with_self_actor(
|
|||
'''
|
||||
async def main():
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
debug_mode=debug_mode,
|
||||
enable_modules=[__name__],
|
||||
) as an:
|
||||
|
|
|
|||
|
|
@@ -0,0 +1,415 @@
|
|||
"""
|
||||
Actor "discovery" testing
|
||||
"""
|
||||
import os
|
||||
import signal
|
||||
import platform
|
||||
from functools import partial
|
||||
import itertools
|
||||
|
||||
import psutil
|
||||
import pytest
|
||||
import subprocess
|
||||
import tractor
|
||||
from tractor.trionics import collapse_eg
|
||||
from tractor._testing import tractor_test
|
||||
import trio
|
||||
|
||||
|
||||
@tractor_test
|
||||
async def test_reg_then_unreg(reg_addr):
|
||||
actor = tractor.current_actor()
|
||||
assert actor.is_arbiter
|
||||
assert len(actor._registry) == 1 # only self is registered
|
||||
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
) as n:
|
||||
|
||||
portal = await n.start_actor('actor', enable_modules=[__name__])
|
||||
uid = portal.channel.uid
|
||||
|
||||
async with tractor.get_registry(reg_addr) as aportal:
|
||||
# this local actor should be the arbiter
|
||||
assert actor is aportal.actor
|
||||
|
||||
async with tractor.wait_for_actor('actor'):
|
||||
# sub-actor uid should be in the registry
|
||||
assert uid in aportal.actor._registry
|
||||
sockaddrs = actor._registry[uid]
|
||||
# XXX: can we figure out what the listen addr will be?
|
||||
assert sockaddrs
|
||||
|
||||
await n.cancel() # tear down nursery
|
||||
|
||||
await trio.sleep(0.1)
|
||||
assert uid not in aportal.actor._registry
|
||||
sockaddrs = actor._registry.get(uid)
|
||||
assert not sockaddrs
|
||||
|
||||
|
||||
the_line = 'Hi my name is {}'
|
||||
|
||||
|
||||
async def hi():
|
||||
return the_line.format(tractor.current_actor().name)
|
||||
|
||||
|
||||
async def say_hello(
|
||||
other_actor: str,
|
||||
reg_addr: tuple[str, int],
|
||||
):
|
||||
await trio.sleep(1) # wait for other actor to spawn
|
||||
async with tractor.find_actor(
|
||||
other_actor,
|
||||
registry_addrs=[reg_addr],
|
||||
) as portal:
|
||||
assert portal is not None
|
||||
return await portal.run(__name__, 'hi')
|
||||
|
||||
|
||||
async def say_hello_use_wait(
|
||||
other_actor: str,
|
||||
reg_addr: tuple[str, int],
|
||||
):
|
||||
async with tractor.wait_for_actor(
|
||||
other_actor,
|
||||
registry_addr=reg_addr,
|
||||
) as portal:
|
||||
assert portal is not None
|
||||
result = await portal.run(__name__, 'hi')
|
||||
return result
|
||||
|
||||
|
||||
@tractor_test
|
||||
@pytest.mark.parametrize('func', [say_hello, say_hello_use_wait])
|
||||
async def test_trynamic_trio(
|
||||
func,
|
||||
start_method,
|
||||
reg_addr,
|
||||
):
|
||||
'''
|
||||
Root actor acting as the "director" and running one-shot-task-actors
|
||||
for the directed subs.
|
||||
|
||||
'''
|
||||
async with tractor.open_nursery() as n:
|
||||
print("Alright... Action!")
|
||||
|
||||
donny = await n.run_in_actor(
|
||||
func,
|
||||
other_actor='gretchen',
|
||||
reg_addr=reg_addr,
|
||||
name='donny',
|
||||
)
|
||||
gretchen = await n.run_in_actor(
|
||||
func,
|
||||
other_actor='donny',
|
||||
reg_addr=reg_addr,
|
||||
name='gretchen',
|
||||
)
|
||||
print(await gretchen.result())
|
||||
print(await donny.result())
|
||||
print("CUTTTT CUUTT CUT!!?! Donny!! You're supposed to say...")
|
||||
|
||||
|
||||
async def stream_forever():
|
||||
for i in itertools.count():
|
||||
yield i
|
||||
await trio.sleep(0.01)
|
||||
|
||||
|
||||
async def cancel(use_signal, delay=0):
|
||||
# hold on there sally
|
||||
await trio.sleep(delay)
|
||||
|
||||
# trigger cancel
|
||||
if use_signal:
|
||||
if platform.system() == 'Windows':
|
||||
pytest.skip("SIGINT not supported on windows")
|
||||
os.kill(os.getpid(), signal.SIGINT)
|
||||
else:
|
||||
raise KeyboardInterrupt
|
||||
|
||||
|
||||
async def stream_from(portal):
|
||||
async with portal.open_stream_from(stream_forever) as stream:
|
||||
async for value in stream:
|
||||
print(value)
|
||||
|
||||
|
||||
async def unpack_reg(actor_or_portal):
|
||||
'''
|
||||
Get and unpack a "registry" RPC request from the "arbiter" registry
|
||||
system.
|
||||
|
||||
'''
|
||||
if getattr(actor_or_portal, 'get_registry', None):
|
||||
msg = await actor_or_portal.get_registry()
|
||||
else:
|
||||
msg = await actor_or_portal.run_from_ns('self', 'get_registry')
|
||||
|
||||
return {tuple(key.split('.')): val for key, val in msg.items()}
|
||||
|
||||
|
||||
async def spawn_and_check_registry(
|
||||
reg_addr: tuple,
|
||||
use_signal: bool,
|
||||
debug_mode: bool = False,
|
||||
remote_arbiter: bool = False,
|
||||
with_streaming: bool = False,
|
||||
maybe_daemon: tuple[
|
||||
subprocess.Popen,
|
||||
psutil.Process,
|
||||
]|None = None,
|
||||
|
||||
) -> None:
|
||||
|
||||
if maybe_daemon:
|
||||
popen, proc = maybe_daemon
|
||||
# breakpoint()
|
||||
|
||||
async with tractor.open_root_actor(
|
||||
registry_addrs=[reg_addr],
|
||||
debug_mode=debug_mode,
|
||||
):
|
||||
async with tractor.get_registry(reg_addr) as portal:
|
||||
# runtime needs to be up to call this
|
||||
actor = tractor.current_actor()
|
||||
|
||||
if remote_arbiter:
|
||||
assert not actor.is_arbiter
|
||||
|
||||
if actor.is_arbiter:
|
||||
extra = 1 # arbiter is local root actor
|
||||
get_reg = partial(unpack_reg, actor)
|
||||
|
||||
else:
|
||||
get_reg = partial(unpack_reg, portal)
|
||||
extra = 2 # local root actor + remote arbiter
|
||||
|
||||
# ensure current actor is registered
|
||||
registry: dict = await get_reg()
|
||||
assert actor.uid in registry
|
||||
|
||||
try:
|
||||
async with tractor.open_nursery() as an:
|
||||
async with (
|
||||
collapse_eg(),
|
||||
trio.open_nursery() as trion,
|
||||
):
|
||||
portals = {}
|
||||
for i in range(3):
|
||||
name = f'a{i}'
|
||||
if with_streaming:
|
||||
portals[name] = await an.start_actor(
|
||||
name=name, enable_modules=[__name__])
|
||||
|
||||
else: # no streaming
|
||||
portals[name] = await an.run_in_actor(
|
||||
trio.sleep_forever, name=name)
|
||||
|
||||
# wait on last actor to come up
|
||||
async with tractor.wait_for_actor(name):
|
||||
registry = await get_reg()
|
||||
for uid in an._children:
|
||||
assert uid in registry
|
||||
|
||||
assert len(portals) + extra == len(registry)
|
||||
|
||||
if with_streaming:
|
||||
await trio.sleep(0.1)
|
||||
|
||||
pts = list(portals.values())
|
||||
for p in pts[:-1]:
|
||||
trion.start_soon(stream_from, p)
|
||||
|
||||
# stream for 1 sec
|
||||
trion.start_soon(cancel, use_signal, 1)
|
||||
|
||||
last_p = pts[-1]
|
||||
await stream_from(last_p)
|
||||
|
||||
else:
|
||||
await cancel(use_signal)
|
||||
|
||||
finally:
|
||||
await trio.sleep(0.5)
|
||||
|
||||
# all subactors should have de-registered
|
||||
registry = await get_reg()
|
||||
assert len(registry) == extra
|
||||
assert actor.uid in registry
|
||||
|
||||
|
||||
@pytest.mark.parametrize('use_signal', [False, True])
|
||||
@pytest.mark.parametrize('with_streaming', [False, True])
|
||||
def test_subactors_unregister_on_cancel(
|
||||
debug_mode: bool,
|
||||
start_method,
|
||||
use_signal,
|
||||
reg_addr,
|
||||
with_streaming,
|
||||
):
|
||||
'''
|
||||
Verify that cancelling a nursery results in all subactors
|
||||
deregistering themselves with the arbiter.
|
||||
|
||||
'''
|
||||
with pytest.raises(KeyboardInterrupt):
|
||||
trio.run(
|
||||
partial(
|
||||
spawn_and_check_registry,
|
||||
reg_addr,
|
||||
use_signal,
|
||||
debug_mode=debug_mode,
|
||||
remote_arbiter=False,
|
||||
with_streaming=with_streaming,
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.parametrize('use_signal', [False, True])
|
||||
@pytest.mark.parametrize('with_streaming', [False, True])
|
||||
def test_subactors_unregister_on_cancel_remote_daemon(
|
||||
daemon: subprocess.Popen,
|
||||
debug_mode: bool,
|
||||
start_method,
|
||||
use_signal,
|
||||
reg_addr,
|
||||
with_streaming,
|
||||
):
|
||||
"""Verify that cancelling a nursery results in all subactors
|
||||
deregistering themselves with a **remote** (not in the local process
|
||||
tree) arbiter.
|
||||
"""
|
||||
with pytest.raises(KeyboardInterrupt):
|
||||
trio.run(
|
||||
partial(
|
||||
spawn_and_check_registry,
|
||||
reg_addr,
|
||||
use_signal,
|
||||
debug_mode=debug_mode,
|
||||
remote_arbiter=True,
|
||||
with_streaming=with_streaming,
|
||||
maybe_daemon=(
|
||||
daemon,
|
||||
psutil.Process(daemon.pid)
|
||||
),
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
async def streamer(agen):
|
||||
async for item in agen:
|
||||
print(item)
|
||||
|
||||
|
||||
async def close_chans_before_nursery(
|
||||
reg_addr: tuple,
|
||||
use_signal: bool,
|
||||
remote_arbiter: bool = False,
|
||||
) -> None:
|
||||
|
||||
# logic for how many actors should still be
|
||||
# in the registry at teardown.
|
||||
if remote_arbiter:
|
||||
entries_at_end = 2
|
||||
else:
|
||||
entries_at_end = 1
|
||||
|
||||
async with tractor.open_root_actor(
|
||||
registry_addrs=[reg_addr],
|
||||
):
|
||||
async with tractor.get_registry(reg_addr) as aportal:
|
||||
try:
|
||||
get_reg = partial(unpack_reg, aportal)
|
||||
|
||||
async with tractor.open_nursery() as tn:
|
||||
portal1 = await tn.start_actor(
|
||||
name='consumer1', enable_modules=[__name__])
|
||||
portal2 = await tn.start_actor(
|
||||
'consumer2', enable_modules=[__name__])
|
||||
|
||||
# TODO: compact this back as was in last commit once
|
||||
# 3.9+, see https://github.com/goodboy/tractor/issues/207
|
||||
async with portal1.open_stream_from(
|
||||
stream_forever
|
||||
) as agen1:
|
||||
async with portal2.open_stream_from(
|
||||
stream_forever
|
||||
) as agen2:
|
||||
async with (
|
||||
collapse_eg(),
|
||||
trio.open_nursery() as tn,
|
||||
):
|
||||
tn.start_soon(streamer, agen1)
|
||||
tn.start_soon(cancel, use_signal, .5)
|
||||
try:
|
||||
await streamer(agen2)
|
||||
finally:
|
||||
# Kill the root nursery thus resulting in
|
||||
# normal arbiter channel ops to fail during
|
||||
# teardown. It doesn't seem like this is
|
||||
# reliably triggered by an external SIGINT.
|
||||
# tractor.current_actor()._root_nursery.cancel_scope.cancel()
|
||||
|
||||
# XXX: THIS IS THE KEY THING that
|
||||
# happens **before** exiting the
|
||||
# actor nursery block
|
||||
|
||||
# also kill off channels cuz why not
|
||||
await agen1.aclose()
|
||||
await agen2.aclose()
|
||||
finally:
|
||||
with trio.CancelScope(shield=True):
|
||||
await trio.sleep(1)
|
||||
|
||||
# all subactors should have de-registered
|
||||
registry = await get_reg()
|
||||
assert portal1.channel.uid not in registry
|
||||
assert portal2.channel.uid not in registry
|
||||
assert len(registry) == entries_at_end
|
||||
|
||||
|
||||
@pytest.mark.parametrize('use_signal', [False, True])
|
||||
def test_close_channel_explicit(
|
||||
start_method,
|
||||
use_signal,
|
||||
reg_addr,
|
||||
):
|
||||
"""Verify that closing a stream explicitly and killing the actor's
|
||||
"root nursery" **before** the containing nursery tears down also
|
||||
results in subactor(s) deregistering from the arbiter.
|
||||
"""
|
||||
with pytest.raises(KeyboardInterrupt):
|
||||
trio.run(
|
||||
partial(
|
||||
close_chans_before_nursery,
|
||||
reg_addr,
|
||||
use_signal,
|
||||
remote_arbiter=False,
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.parametrize('use_signal', [False, True])
|
||||
def test_close_channel_explicit_remote_arbiter(
|
||||
daemon: subprocess.Popen,
|
||||
start_method,
|
||||
use_signal,
|
||||
reg_addr,
|
||||
):
|
||||
"""Verify that closing a stream explicitly and killing the actor's
|
||||
"root nursery" **before** the containing nursery tears down also
|
||||
results in subactor(s) deregistering from the arbiter.
|
||||
"""
|
||||
with pytest.raises(KeyboardInterrupt):
|
||||
trio.run(
|
||||
partial(
|
||||
close_chans_before_nursery,
|
||||
reg_addr,
|
||||
use_signal,
|
||||
remote_arbiter=True,
|
||||
),
|
||||
)
|
||||
|
|
@@ -9,17 +9,12 @@ import sys
|
|||
import subprocess
|
||||
import platform
|
||||
import shutil
|
||||
from typing import Callable
|
||||
|
||||
import pytest
|
||||
import tractor
|
||||
from tractor._testing import (
|
||||
examples_dir,
|
||||
)
|
||||
|
||||
_non_linux: bool = platform.system() != 'Linux'
|
||||
_friggin_macos: bool = platform.system() == 'Darwin'
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def run_example_in_subproc(
|
||||
|
|
@@ -94,10 +89,8 @@ def run_example_in_subproc(
|
|||
for f in p[2]
|
||||
|
||||
if (
|
||||
'__' not in f # ignore any pkg-mods
|
||||
# ignore any `__pycache__` subdir
|
||||
and '__pycache__' not in str(p[0])
|
||||
and f[0] != '_' # ignore any WIP "examplel mods"
|
||||
'__' not in f
|
||||
and f[0] != '_'
|
||||
and 'debugging' not in p[0]
|
||||
and 'integration' not in p[0]
|
||||
and 'advanced_faults' not in p[0]
|
||||
|
|
@@ -108,10 +101,8 @@ def run_example_in_subproc(
|
|||
ids=lambda t: t[1],
|
||||
)
|
||||
def test_example(
|
||||
run_example_in_subproc: Callable,
|
||||
example_script: str,
|
||||
test_log: tractor.log.StackLevelAdapter,
|
||||
ci_env: bool,
|
||||
run_example_in_subproc,
|
||||
example_script,
|
||||
):
|
||||
'''
|
||||
Load and run scripts from this repo's ``examples/`` dir as a user
|
||||
|
|
@@ -125,39 +116,9 @@ def test_example(
|
|||
'''
|
||||
ex_file: str = os.path.join(*example_script)
|
||||
|
||||
if (
|
||||
'rpc_bidir_streaming' in ex_file
|
||||
and
|
||||
sys.version_info < (3, 9)
|
||||
):
|
||||
if 'rpc_bidir_streaming' in ex_file and sys.version_info < (3, 9):
|
||||
pytest.skip("2-way streaming example requires py3.9 async with syntax")
|
||||
|
||||
if (
|
||||
'full_fledged_streaming_service' in ex_file
|
||||
and
|
||||
_friggin_macos
|
||||
and
|
||||
ci_env
|
||||
):
|
||||
pytest.skip(
|
||||
'Streaming example is too flaky in CI\n'
|
||||
'AND their competitor runs this CI service..\n'
|
||||
'This test does run just fine "in person" however..'
|
||||
)
|
||||
|
||||
from .conftest import cpu_scaling_factor
|
||||
|
||||
timeout: float = (
|
||||
60
|
||||
if ci_env and _non_linux
|
||||
else 16
|
||||
)
|
||||
|
||||
# add latency headroom for CPU freq scaling (auto-cpufreq et al.)
|
||||
headroom: float = cpu_scaling_factor()
|
||||
if headroom != 1.:
|
||||
timeout *= headroom
|
||||
|
||||
with open(ex_file, 'r') as ex:
|
||||
code = ex.read()
|
||||
|
||||
|
|
@@ -165,12 +126,9 @@ def test_example(
|
|||
err = None
|
||||
try:
|
||||
if not proc.poll():
|
||||
_, err = proc.communicate(timeout=timeout)
|
||||
_, err = proc.communicate(timeout=15)
|
||||
|
||||
except subprocess.TimeoutExpired as e:
|
||||
test_log.exception(
|
||||
f'Example failed to finish within {timeout}s ??\n'
|
||||
)
|
||||
proc.kill()
|
||||
err = e.stderr
|
||||
|
||||
|
|
|
|||
|
|
@@ -57,7 +57,6 @@ from tractor.msg._ops import (
|
|||
limit_plds,
|
||||
)
|
||||
|
||||
|
||||
def enc_nsp(obj: Any) -> Any:
|
||||
actor: Actor = tractor.current_actor(
|
||||
err_on_no_runtime=False,
|
||||
|
|
@@ -618,17 +617,6 @@ def test_ext_types_over_ipc(
|
|||
debug_mode: bool,
|
||||
pld_spec: Union[Type],
|
||||
add_hooks: bool,
|
||||
|
||||
set_fork_aware_capture,
|
||||
# ^^XXX? for forking spawners
|
||||
|
||||
# capfd: pytest.CaptureFixture,
|
||||
# ^^NOTE, super interesting that if
|
||||
# we disable this below then the tpt-layer
|
||||
# suffers as an "unclean EOF"??
|
||||
# ?TODO, determine why/how that mks sense when addressing,
|
||||
# https://github.com/pytest-dev/pytest/issues/14444
|
||||
#
|
||||
):
|
||||
'''
|
||||
Ensure we can support extension types coverted using
|
||||
|
|
@@ -737,26 +725,18 @@ def test_ext_types_over_ipc(
|
|||
|
||||
await p.cancel_actor()
|
||||
|
||||
async def fa_main():
|
||||
with (
|
||||
trio.fail_after(2),
|
||||
# ?TODO, investigate? see NOTE above..
|
||||
# capfd.disabled(),
|
||||
):
|
||||
await main()
|
||||
|
||||
if (
|
||||
NamespacePath in pld_types
|
||||
and
|
||||
add_hooks
|
||||
):
|
||||
trio.run(fa_main)
|
||||
trio.run(main)
|
||||
|
||||
else:
|
||||
with pytest.raises(
|
||||
expected_exception=tractor.RemoteActorError,
|
||||
) as excinfo:
|
||||
trio.run(fa_main)
|
||||
trio.run(main)
|
||||
|
||||
exc = excinfo.value
|
||||
# bc `.started(nsp: NamespacePath)` will raise
|
||||
|
|
@@ -26,36 +26,10 @@ from tractor import (
|
|||
to_asyncio,
|
||||
RemoteActorError,
|
||||
ContextCancelled,
|
||||
_state,
|
||||
)
|
||||
from tractor.runtime import _state
|
||||
from tractor.trionics import BroadcastReceiver
|
||||
from tractor._testing import expect_ctxc
|
||||
from tractor._testing.trace import (
|
||||
AfkAlarmWTraceFactory,
|
||||
FailAfterWTraceFactory,
|
||||
)
|
||||
|
||||
|
||||
# Per-test zombie-subactor reaper. Opt-in (NOT autouse) —
|
||||
# see `tractor._testing.pytest.reap_subactors_per_test`'s
|
||||
# docstring for the full rationale. This module specifically
|
||||
# needs it because tests like
|
||||
# `test_echoserver_detailed_mechanics[KeyboardInterrupt]`
|
||||
# and the `test_sigint_closes_lifetime_stack[*]` matrix have
|
||||
# been observed to hang past pytest's wall-clock under
|
||||
# `main_thread_forkserver`, leaving subactor forks that
|
||||
# squat on registrar resources and cascade-fail every
|
||||
# subsequent test (`test_inter_peer_cancellation`,
|
||||
# `test_legacy_one_way_streaming`, etc.).
|
||||
pytestmark = pytest.mark.usefixtures(
|
||||
'reap_subactors_per_test',
|
||||
# NOTE, asyncio cancel cascade has historically
|
||||
# triggered both UDS sockfile leaks (SIGKILL path)
|
||||
# AND the trio `WakeupSocketpair.drain()` busy-loop
|
||||
# — see `test_aio_simple_error`'s history.
|
||||
'track_orphaned_uds_per_test',
|
||||
'detect_runaway_subactors_per_test',
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture(
|
||||
|
|
@@ -73,11 +47,12 @@ async def sleep_and_err(
|
|||
|
||||
# just signature placeholders for compat with
|
||||
# ``to_asyncio.open_channel_from()``
|
||||
chan: to_asyncio.LinkedTaskChannel|None = None,
|
||||
to_trio: trio.MemorySendChannel|None = None,
|
||||
from_trio: asyncio.Queue|None = None,
|
||||
|
||||
):
|
||||
if chan:
|
||||
chan.started_nowait('start')
|
||||
if to_trio:
|
||||
to_trio.send_nowait('start')
|
||||
|
||||
await asyncio.sleep(sleep_for)
|
||||
assert 0
|
||||
|
|
@@ -209,7 +184,6 @@ def test_tractor_cancels_aio(
|
|||
async def main():
|
||||
async with tractor.open_nursery(
|
||||
debug_mode=debug_mode,
|
||||
registry_addrs=[reg_addr],
|
||||
) as an:
|
||||
portal = await an.run_in_actor(
|
||||
asyncio_actor,
|
||||
|
|
@@ -232,11 +206,11 @@ def test_trio_cancels_aio(
|
|||
|
||||
'''
|
||||
async def main():
|
||||
# cancel the nursery shortly after boot
|
||||
|
||||
with trio.move_on_after(1):
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
) as tn:
|
||||
# cancel the nursery shortly after boot
|
||||
|
||||
async with tractor.open_nursery() as tn:
|
||||
await tn.run_in_actor(
|
||||
asyncio_actor,
|
||||
target='aio_sleep_forever',
|
||||
|
|
@@ -264,7 +238,7 @@ async def trio_ctx(
|
|||
trio.open_nursery() as tn,
|
||||
tractor.to_asyncio.open_channel_from(
|
||||
sleep_and_err,
|
||||
) as (chan, first),
|
||||
) as (first, chan),
|
||||
):
|
||||
|
||||
assert first == 'start'
|
||||
|
|
@@ -304,9 +278,7 @@ def test_context_spawns_aio_task_that_errors(
|
|||
'''
|
||||
async def main():
|
||||
with trio.fail_after(1 + delay):
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
) as an:
|
||||
async with tractor.open_nursery() as an:
|
||||
p = await an.start_actor(
|
||||
'aio_daemon',
|
||||
enable_modules=[__name__],
|
||||
|
|
@@ -389,9 +361,7 @@ def test_aio_cancelled_from_aio_causes_trio_cancelled(
|
|||
async def main():
|
||||
|
||||
an: tractor.ActorNursery
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
) as an:
|
||||
async with tractor.open_nursery() as an:
|
||||
p: tractor.Portal = await an.run_in_actor(
|
||||
asyncio_actor,
|
||||
target='aio_cancel',
|
||||
|
|
@@ -429,7 +399,7 @@ async def no_to_trio_in_args():
|
|||
|
||||
async def push_from_aio_task(
|
||||
sequence: Iterable,
|
||||
chan: to_asyncio.LinkedTaskChannel,
|
||||
to_trio: trio.abc.SendChannel,
|
||||
expect_cancel: False,
|
||||
fail_early: bool,
|
||||
exit_early: bool,
|
||||
|
|
@@ -437,12 +407,15 @@ async def push_from_aio_task(
|
|||
) -> None:
|
||||
|
||||
try:
|
||||
# print('trying breakpoint')
|
||||
# breakpoint()
|
||||
|
||||
# sync caller ctx manager
|
||||
chan.started_nowait(True)
|
||||
to_trio.send_nowait(True)
|
||||
|
||||
for i in sequence:
|
||||
print(f'asyncio sending {i}')
|
||||
chan.send_nowait(i)
|
||||
to_trio.send_nowait(i)
|
||||
await asyncio.sleep(0.001)
|
||||
|
||||
if (
|
||||
|
|
@@ -505,7 +478,7 @@ async def stream_from_aio(
|
|||
trio_exit_early
|
||||
))
|
||||
|
||||
) as (chan, first):
|
||||
) as (first, chan):
|
||||
|
||||
assert first is True
|
||||
|
||||
|
|
@@ -600,9 +573,7 @@ def test_basic_interloop_channel_stream(
|
|||
async def main():
|
||||
# TODO, figure out min timeout here!
|
||||
with trio.fail_after(6):
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
) as an:
|
||||
async with tractor.open_nursery() as an:
|
||||
portal = await an.run_in_actor(
|
||||
stream_from_aio,
|
||||
infect_asyncio=True,
|
||||
|
|
@ -615,13 +586,9 @@ def test_basic_interloop_channel_stream(
|
|||
|
||||
|
||||
# TODO: parametrize the above test and avoid the duplication here?
|
||||
def test_trio_error_cancels_intertask_chan(
|
||||
reg_addr: tuple[str, int],
|
||||
):
|
||||
def test_trio_error_cancels_intertask_chan(reg_addr):
|
||||
async def main():
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
) as an:
|
||||
async with tractor.open_nursery() as an:
|
||||
portal = await an.run_in_actor(
|
||||
stream_from_aio,
|
||||
trio_raise_err=True,
|
||||
|
|
@ -656,7 +623,6 @@ def test_trio_closes_early_causes_aio_checkpoint_raise(
|
|||
async with tractor.open_nursery(
|
||||
debug_mode=debug_mode,
|
||||
# enable_stack_on_sig=True,
|
||||
registry_addrs=[reg_addr],
|
||||
) as an:
|
||||
portal = await an.run_in_actor(
|
||||
stream_from_aio,
|
||||
|
|
@ -705,7 +671,6 @@ def test_aio_exits_early_relays_AsyncioTaskExited(
|
|||
async def main():
|
||||
with trio.fail_after(1 + delay):
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
debug_mode=debug_mode,
|
||||
# enable_stack_on_sig=True,
|
||||
) as an:
|
||||
|
|
@ -746,7 +711,6 @@ def test_aio_errors_and_channel_propagates_and_closes(
|
|||
):
|
||||
async def main():
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
debug_mode=debug_mode,
|
||||
) as an:
|
||||
portal = await an.run_in_actor(
|
||||
|
|
@ -768,21 +732,15 @@ def test_aio_errors_and_channel_propagates_and_closes(
|
|||
|
||||
|
||||
async def aio_echo_server(
|
||||
chan: to_asyncio.LinkedTaskChannel,
|
||||
to_trio: trio.MemorySendChannel,
|
||||
from_trio: asyncio.Queue,
|
||||
) -> None:
|
||||
'''
|
||||
An IPC-msg "echo server" with msgs received and relayed by
|
||||
a parent `trio.Task` into a child `asyncio.Task`
|
||||
and then repeated back to that local parent (`trio.Task`)
|
||||
and sent again back to the original calling remote actor.
|
||||
|
||||
'''
|
||||
# same semantics as `trio.TaskStatus.started()`
|
||||
chan.started_nowait('start')
|
||||
to_trio.send_nowait('start')
|
||||
|
||||
while True:
|
||||
try:
|
||||
msg = await chan.get()
|
||||
msg = await from_trio.get()
|
||||
except to_asyncio.TrioTaskExited:
|
||||
print(
|
||||
'breaking aio echo loop due to `trio` exit!'
|
||||
|
|
@ -790,7 +748,7 @@ async def aio_echo_server(
|
|||
break
|
||||
|
||||
# echo the msg back
|
||||
chan.send_nowait(msg)
|
||||
to_trio.send_nowait(msg)
|
||||
|
||||
# if we get the terminate sentinel
|
||||
# break the echo loop
|
||||
|
|
@ -807,10 +765,7 @@ async def trio_to_aio_echo_server(
|
|||
):
|
||||
async with to_asyncio.open_channel_from(
|
||||
aio_echo_server,
|
||||
) as (
|
||||
chan,
|
||||
first, # value from `chan.started_nowait()` above
|
||||
):
|
||||
) as (first, chan):
|
||||
assert first == 'start'
|
||||
|
||||
await ctx.started(first)
|
||||
|
|
@ -821,8 +776,7 @@ async def trio_to_aio_echo_server(
|
|||
await chan.send(msg)
|
||||
|
||||
out = await chan.receive()
|
||||
|
||||
# echo back to parent-actor's remote parent-ctx-task!
|
||||
# echo back to parent actor-task
|
||||
await stream.send(out)
|
||||
|
||||
if out is None:
|
||||
|
|
@ -836,47 +790,16 @@ async def trio_to_aio_echo_server(
|
|||
|
||||
@pytest.mark.parametrize(
|
||||
'raise_error_mid_stream',
|
||||
[
|
||||
False,
|
||||
Exception,
|
||||
KeyboardInterrupt,
|
||||
],
|
||||
[False, Exception, KeyboardInterrupt],
|
||||
ids='raise_error={}'.format,
|
||||
)
|
||||
def test_echoserver_detailed_mechanics(
|
||||
reg_addr: tuple[str, int],
|
||||
debug_mode: bool,
|
||||
raise_error_mid_stream,
|
||||
|
||||
is_forking_spawner: bool,
|
||||
fail_after_w_trace: FailAfterWTraceFactory,
|
||||
):
|
||||
# NOTE: under fork-based backends the cancel-cascade
|
||||
# path is structurally slower than `trio`'s subproc-exec
|
||||
# (per-spawn forkserver-handshake compounds during
|
||||
# teardown). Bump the cap so cross-test contamination
|
||||
# doesn't flake this — see
|
||||
# `ai/conc-anal/cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`.
|
||||
timeout: float = (
|
||||
999 if tractor.debug_mode()
|
||||
else 4 if is_forking_spawner
|
||||
# was 1; the `trio` 0.29 -> 0.33 bump slowed the
|
||||
# cancel-cascade so a 1s budget raced the ~1s teardown
|
||||
# deadline. On a deadline-fire the injected
|
||||
# `Cancelled(source='deadline')` wraps the mid-stream
|
||||
# KBI in a `BaseExceptionGroup`, breaking the bare
|
||||
# `pytest.raises(KeyboardInterrupt)` below. See
|
||||
# `ai/conc-anal/trio_033_cancel_cascade_slowdown_depth3_issue.md`.
|
||||
else 4
|
||||
)
|
||||
|
||||
# body factored out so the `fail_after_w_trace`-wrapping
|
||||
# `main()` stays a 2-liner — keeps the deep `open_nursery`
|
||||
# /`open_context`/`open_stream` block at its natural indent
|
||||
# level instead of pushing it under yet another `async with`.
|
||||
async def _body():
|
||||
async def main():
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
debug_mode=debug_mode,
|
||||
) as an:
|
||||
p = await an.start_actor(
|
||||
|
|
@ -920,15 +843,6 @@ def test_echoserver_detailed_mechanics(
|
|||
# is cancelled by kbi or out of task cancellation
|
||||
await p.cancel_actor()
|
||||
|
||||
async def main():
|
||||
# on-timeout diag snapshot via `fail_after_w_trace`
|
||||
# — when the cancel cascade hangs under MTF we get a
|
||||
# fresh `ptree`/`wchan`/`py-spy` dump on disk INSTEAD
|
||||
# of an opaque pytest timeout-kill. See
|
||||
# `tractor/_testing/trace.py`.
|
||||
async with fail_after_w_trace(timeout):
|
||||
await _body()
|
||||
|
||||
if raise_error_mid_stream:
|
||||
with pytest.raises(raise_error_mid_stream):
|
||||
trio.run(main)
|
||||
|
|
@ -1064,7 +978,7 @@ async def manage_file(
|
|||
],
|
||||
ids=[
|
||||
'bg_aio_task',
|
||||
'just_trio_sleep',
|
||||
'just_trio_slee',
|
||||
],
|
||||
)
|
||||
@pytest.mark.parametrize(
|
||||
|
|
@ -1080,15 +994,11 @@ async def manage_file(
|
|||
)
|
||||
def test_sigint_closes_lifetime_stack(
|
||||
tmp_path: Path,
|
||||
reg_addr: tuple,
|
||||
debug_mode: bool,
|
||||
|
||||
wait_for_ctx: bool,
|
||||
bg_aio_task: bool,
|
||||
trio_side_is_shielded: bool,
|
||||
debug_mode: bool,
|
||||
send_sigint_to: str,
|
||||
is_forking_spawner: bool,
|
||||
afk_alarm_w_trace: AfkAlarmWTraceFactory,
|
||||
):
|
||||
'''
|
||||
Ensure that an infected child can use the `Actor.lifetime_stack`
|
||||
|
|
@ -1098,30 +1008,12 @@ def test_sigint_closes_lifetime_stack(
|
|||
'''
|
||||
async def main():
|
||||
|
||||
delay: float = (
|
||||
999
|
||||
if debug_mode
|
||||
else 1
|
||||
)
|
||||
# pre-init so the `except (KeyboardInterrupt, ContextCancelled)`
|
||||
# handler below doesn't `UnboundLocalError` if KBI fires BEFORE
|
||||
# we ever enter the `as (ctx, first)` body (e.g. when
|
||||
# `p.open_context().__aenter__` is hung waiting for the
|
||||
# subactor's `StartAck` due to a fork-child IPC race —
|
||||
# see `dynamic_pub_sub_spawn_time_transport_close_under_mtf_issue.md`).
|
||||
tmp_file: Path|None = None
|
||||
ctx: tractor.Context|None = None
|
||||
delay = 999 if tractor.debug_mode() else 1
|
||||
try:
|
||||
an: tractor.ActorNursery
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
debug_mode=debug_mode,
|
||||
) as an:
|
||||
|
||||
# sanity
|
||||
if debug_mode:
|
||||
assert tractor.debug_mode()
|
||||
|
||||
p: tractor.Portal = await an.start_actor(
|
||||
'file_mngr',
|
||||
enable_modules=[__name__],
|
||||
|
|
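The pre-init comment in the hunk above describes a general pattern: bind names to `None` before the `try` so that an interrupt arriving early reaches the handler with a checkable value rather than an `UnboundLocalError`. Here is a small sketch of that idea outside of any actor runtime. The `run` function and the example path are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def run(fail_early: bool) -> str:
    # pre-init so the handler can tell "never assigned" apart
    # from a real value instead of hitting `UnboundLocalError`.
    tmp_file: Path | None = None
    try:
        if fail_early:
            # simulates an interrupt arriving before assignment
            raise KeyboardInterrupt

        tmp_file = Path('/tmp/example')
        raise KeyboardInterrupt

    except KeyboardInterrupt:
        if tmp_file is None:
            return 'interrupted before setup finished'

        return f'interrupted after setup: {tmp_file.name}'


early: str = run(fail_early=True)
late: str = run(fail_early=False)
```

The handler can then report the "too early" case explicitly, which is what the `pytest.fail()` branch later in this test does.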
@ -1136,7 +1028,7 @@ def test_sigint_closes_lifetime_stack(
|
|||
) as (ctx, first):
|
||||
|
||||
path_str, cpid = first
|
||||
tmp_file = Path(path_str)
|
||||
tmp_file: Path = Path(path_str)
|
||||
assert tmp_file.exists()
|
||||
|
||||
# XXX originally to simulate what (hopefully)
|
||||
|
|
@ -1156,10 +1048,6 @@ def test_sigint_closes_lifetime_stack(
|
|||
cpid if send_sigint_to == 'child'
|
||||
else os.getpid()
|
||||
)
|
||||
print(
|
||||
f'Sending SIGINT to {send_sigint_to!r}\n'
|
||||
f'pid: {pid!r}\n'
|
||||
)
|
||||
os.kill(
|
||||
pid,
|
||||
signal.SIGINT,
|
||||
|
|
@ -1170,37 +1058,13 @@ def test_sigint_closes_lifetime_stack(
|
|||
# timeout should trigger!
|
||||
if wait_for_ctx:
|
||||
print('waiting for ctx outcome in parent..')
|
||||
|
||||
if debug_mode:
|
||||
assert delay == 999
|
||||
|
||||
try:
|
||||
with trio.fail_after(
|
||||
1 + delay
|
||||
):
|
||||
with trio.fail_after(1 + delay):
|
||||
await ctx.wait_for_result()
|
||||
except tractor.ContextCancelled as ctxc:
|
||||
assert ctxc.canceller == ctx.chan.uid
|
||||
raise
|
||||
|
||||
except trio.TooSlowError:
|
||||
if (
|
||||
send_sigint_to == 'child'
|
||||
and
|
||||
is_forking_spawner
|
||||
):
|
||||
pytest.xfail(
|
||||
reason=(
|
||||
'SIGINT delivery to fork-child subactor is known '
|
||||
'to NOT SUCCEED, precisely bc we have not wired up a '
|
||||
'"trio SIGINT mode" in the child pre-fork.\n'
|
||||
'Also see `test_orphaned_subactor_sigint_cleanup_DRAFT` for '
|
||||
'a dedicated suite demonstrating this expected limitation as '
|
||||
'well as the detailed doc:\n'
|
||||
'`ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`.\n'
|
||||
),
|
||||
)
|
||||
|
||||
# XXX CASE 2: this seems to be the source of the
|
||||
# original issue which exhibited BEFORE we put
|
||||
# a `Actor.cancel_soon()` inside
|
||||
|
|
@ -1214,21 +1078,6 @@ def test_sigint_closes_lifetime_stack(
|
|||
KeyboardInterrupt,
|
||||
ContextCancelled,
|
||||
):
|
||||
# If we got here BEFORE entering the ctx body (e.g.
|
||||
# spawn-time IPC race hung `open_context.__aenter__` and
|
||||
# the AFK-guard `signal.alarm` fired KBI from outside the
|
||||
# trio loop), `tmp_file`/`ctx` are still `None` — surface
|
||||
# that fact directly instead of `UnboundLocalError`.
|
||||
if tmp_file is None:
|
||||
pytest.fail(
|
||||
'KBI/ctxc fired BEFORE `p.open_context()` returned '
|
||||
"the child's `started` value — likely fork-child "
|
||||
'IPC race; see '
|
||||
'`ai/conc-anal/'
|
||||
'dynamic_pub_sub_spawn_time_transport_close_'
|
||||
'under_mtf_issue.md`'
|
||||
)
|
||||
|
||||
# XXX CASE 2: without the bug fixed, in the
|
||||
# KBI-raised-in-parent case, the actor teardown should
|
||||
# never get run (silently abandoned by `asyncio`..) and
|
||||
|
|
@ -1236,45 +1085,29 @@ def test_sigint_closes_lifetime_stack(
|
|||
assert not tmp_file.exists()
|
||||
assert ctx.maybe_error
|
||||
|
||||
# outer hard wall-clock backstop via `afk_alarm_w_trace`:
|
||||
# when the in-band trio cancel path doesn't fire (e.g.
|
||||
# parent is parked in a shielded `await` inside actor-
|
||||
# nursery teardown, or `open_context.__aenter__` hangs
|
||||
# waiting for a child's `StartAck` that never comes), the
|
||||
# `signal.alarm` inside the CM raises `AFKAlarmTimeout`
|
||||
# in the main thread regardless of trio's scope state —
|
||||
# AND captures a full diag snapshot to
|
||||
# `$XDG_CACHE_HOME/tractor/hung-dumps/` before re-raising.
|
||||
# Only armed under fork-based backends since this hang-
|
||||
# class is MTF-specific.
|
||||
if (
|
||||
not debug_mode
|
||||
and
|
||||
is_forking_spawner
|
||||
):
|
||||
with afk_alarm_w_trace(10):
|
||||
trio.run(main)
|
||||
else:
|
||||
trio.run(main)
|
||||
trio.run(main)
|
||||
|
||||
|
||||
|
||||
# ?TODO asyncio.Task fn-deco?
|
||||
# -[ ] do sig checkingat import time like @context?
|
||||
# -[ ] maybe name it @aio_task ??
|
||||
# -[ ] chan: to_asyncio.InterloopChannel ??
|
||||
# -[ ] do fn-sig checking at import time like @context?
|
||||
# |_[ ] maybe name it @a(sync)io_task ??
|
||||
# @asyncio_task <- not bad ??
|
||||
async def raise_before_started(
|
||||
# from_trio: asyncio.Queue,
|
||||
# to_trio: trio.abc.SendChannel,
|
||||
chan: to_asyncio.LinkedTaskChannel,
|
||||
|
||||
) -> None:
|
||||
'''
|
||||
`asyncio.Task` entry point which RTEs before calling
|
||||
`chan.started_nowait()`.
|
||||
`to_trio.send_nowait()`.
|
||||
|
||||
'''
|
||||
await asyncio.sleep(0.2)
|
||||
raise RuntimeError('Some shite went wrong before `.send_nowait()`!!')
|
||||
|
||||
# to_trio.send_nowait('Uhh we shouldve RTE-d ^^ ??')
|
||||
chan.started_nowait('Uhh we shouldve RTE-d ^^ ??')
|
||||
await asyncio.sleep(float('inf'))
|
||||
|
||||
|
|
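The backstop comment in the hunk above describes a wall-clock guard built on `signal.alarm`, which fires in the main thread regardless of what the event loop is doing. The sketch below shows only that core mechanism, assuming a Unix host and a main-thread caller. `alarm_backstop` and `AlarmTimeout` are hypothetical names, and the diagnostic-snapshot step performed by the real `afk_alarm_w_trace` fixture is not reproduced.

```python
import signal
import time
from contextlib import contextmanager


class AlarmTimeout(Exception):
    'Raised in the main thread when the wall-clock alarm fires.'


@contextmanager
def alarm_backstop(seconds: int):
    '''
    Hypothetical helper: raise `AlarmTimeout` in the main thread
    after `seconds`, no matter what the guarded block is doing.

    Unix-only and main-thread-only since it relies on `SIGALRM`.

    '''
    def _on_alarm(signum, frame):
        raise AlarmTimeout(f'no progress after {seconds}s')

    prev_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        yield
    finally:
        # always disarm and restore the previous handler
        signal.alarm(0)
        signal.signal(signal.SIGALRM, prev_handler)


def guarded_sleep(duration: float, limit: int) -> str:
    try:
        with alarm_backstop(limit):
            time.sleep(duration)
        return 'finished'
    except AlarmTimeout:
        return 'timed out'


fast: str = guarded_sleep(0.01, limit=1)
slow: str = guarded_sleep(3.0, limit=1)
```

Because the exception is raised from a signal handler, it interrupts blocking calls that an in-band cancel scope would never reach, which matches the hang class the comment is guarding against.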
@ -1334,7 +1167,6 @@ def test_aio_side_raises_before_started(
|
|||
with trio.fail_after(3):
|
||||
an: tractor.ActorNursery
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
debug_mode=debug_mode,
|
||||
loglevel=loglevel,
|
||||
) as an:
|
||||
|
|
|
|||
|
|
@ -11,46 +11,18 @@ import trio
|
|||
import tractor
|
||||
from tractor import ( # typing
|
||||
Actor,
|
||||
Context,
|
||||
ContextCancelled,
|
||||
MsgStream,
|
||||
Portal,
|
||||
RemoteActorError,
|
||||
current_actor,
|
||||
open_nursery,
|
||||
Portal,
|
||||
Context,
|
||||
ContextCancelled,
|
||||
RemoteActorError,
|
||||
)
|
||||
from tractor._testing import (
|
||||
# tractor_test,
|
||||
expect_ctxc,
|
||||
)
|
||||
|
||||
from .conftest import cpu_scaling_factor
|
||||
|
||||
pytestmark = [
|
||||
pytest.mark.skipon_spawn_backend(
|
||||
'subint',
|
||||
reason=(
|
||||
'XXX SUBINT GIL-CONTENTION HANGING TEST XXX\n'
|
||||
'Inter-peer cancel cascades under '
|
||||
'`--spawn-backend=subint` trip the abandoned-subint '
|
||||
'GIL-hostage class — see\n'
|
||||
' - `ai/conc-anal/subint_sigint_starvation_issue.md` '
|
||||
'(GIL-hostage, SIGINT-unresponsive)\n'
|
||||
' - `ai/conc-anal/subint_cancel_delivery_hang_issue.md` '
|
||||
'(sibling: parent parks on dead chan)\n'
|
||||
' - https://github.com/goodboy/tractor/issues/379 '
|
||||
'(subint umbrella)\n'
|
||||
)
|
||||
),
|
||||
# NOTE, inter-peer cancellation tests stress the
|
||||
# multi-actor cancel cascade which under SIGKILL
|
||||
# leaves UDS sock-files orphaned. Track per-test
|
||||
# for blame attribution.
|
||||
pytest.mark.usefixtures(
|
||||
'track_orphaned_uds_per_test',
|
||||
),
|
||||
]
|
||||
|
||||
# XXX TODO cases:
|
||||
# - [x] WE cancelled the peer and thus should not see any raised
|
||||
# `ContextCancelled` as it should be reaped silently?
|
||||
|
|
@ -228,7 +200,7 @@ async def stream_from_peer(
|
|||
) -> None:
|
||||
|
||||
# sanity
|
||||
assert tractor.debug_mode() == debug_mode
|
||||
assert tractor._state.debug_mode() == debug_mode
|
||||
|
||||
peer: Portal
|
||||
try:
|
||||
|
|
@ -608,7 +580,7 @@ def test_peer_canceller(
|
|||
assert (
|
||||
re.canceller
|
||||
==
|
||||
root.aid.uid
|
||||
root.uid
|
||||
)
|
||||
|
||||
else: # the other 2 ctxs
|
||||
|
|
@ -617,7 +589,7 @@ def test_peer_canceller(
|
|||
and (
|
||||
re.canceller
|
||||
==
|
||||
canceller.channel.aid.uid
|
||||
canceller.channel.uid
|
||||
)
|
||||
)
|
||||
|
||||
|
|
@ -772,7 +744,7 @@ def test_peer_canceller(
|
|||
# -> each context should have received
|
||||
# a silently absorbed context cancellation
|
||||
# in its remote nursery scope.
|
||||
# assert ctx.chan.aid.uid == ctx.canceller
|
||||
# assert ctx.chan.uid == ctx.canceller
|
||||
|
||||
# NOTE: when an inter-peer cancellation
|
||||
# occurred, we DO NOT expect this
|
||||
|
|
@ -824,12 +796,12 @@ async def basic_echo_server(
|
|||
|
||||
) -> None:
|
||||
'''
|
||||
Just the simplest `MsgStream` echo server which resays what you
|
||||
told it but with its uid in front ;)
|
||||
Just the simplest `MsgStream` echo server which resays what
|
||||
you told it but with its uid in front ;)
|
||||
|
||||
'''
|
||||
actor: Actor = tractor.current_actor()
|
||||
uid: tuple = actor.aid.uid
|
||||
uid: tuple = actor.uid
|
||||
await ctx.started(uid)
|
||||
async with ctx.open_stream() as ipc:
|
||||
async for msg in ipc:
|
||||
|
|
@ -868,7 +840,7 @@ async def serve_subactors(
|
|||
async with open_nursery() as an:
|
||||
|
||||
# sanity
|
||||
assert tractor.debug_mode() == debug_mode
|
||||
assert tractor._state.debug_mode() == debug_mode
|
||||
|
||||
await ctx.started(peer_name)
|
||||
async with ctx.open_stream() as ipc:
|
||||
|
|
@ -884,7 +856,7 @@ async def serve_subactors(
|
|||
f'|_{peer}\n'
|
||||
)
|
||||
await ipc.send((
|
||||
peer.chan.aid.uid,
|
||||
peer.chan.uid,
|
||||
peer.chan.raddr.unwrap(),
|
||||
))
|
||||
|
||||
|
|
@ -907,7 +879,7 @@ async def client_req_subactor(
|
|||
) -> None:
|
||||
# sanity
|
||||
if debug_mode:
|
||||
assert tractor.debug_mode()
|
||||
assert tractor._state.debug_mode()
|
||||
|
||||
# TODO: other cases to do with sub lifetimes:
|
||||
# -[ ] test that we can have the server spawn a sub
|
||||
|
|
@ -994,14 +966,9 @@ async def tell_little_bro(
|
|||
|
||||
caller: str = '',
|
||||
err_after: float|None = None,
|
||||
rng_seed: int = 100,
|
||||
# NOTE, ensure ^ is large enough (on fast hw anyway)
|
||||
# to ensure the peer cancel req arrives before the
|
||||
# echoing dialog does itself Bp
|
||||
rng_seed: int = 50,
|
||||
):
|
||||
# contact target actor, do a stream dialog.
|
||||
lb: Portal
|
||||
echo_ipc: MsgStream
|
||||
async with (
|
||||
tractor.wait_for_actor(
|
||||
name=actor_name
|
||||
|
|
@ -1016,17 +983,17 @@ async def tell_little_bro(
|
|||
else None
|
||||
),
|
||||
) as (sub_ctx, first),
|
||||
|
||||
sub_ctx.open_stream() as echo_ipc,
|
||||
):
|
||||
actor: Actor = current_actor()
|
||||
uid: tuple = actor.aid.uid
|
||||
uid: tuple = actor.uid
|
||||
for i in range(rng_seed):
|
||||
msg: tuple = (
|
||||
uid,
|
||||
i,
|
||||
)
|
||||
await echo_ipc.send(msg)
|
||||
await trio.sleep(0.001)
|
||||
resp = await echo_ipc.receive()
|
||||
print(
|
||||
f'{caller} => {actor_name}: {msg}\n'
|
||||
|
|
@ -1039,9 +1006,6 @@ async def tell_little_bro(
|
|||
assert sub_uid != uid
|
||||
assert _i == i
|
||||
|
||||
# XXX, usually should never get here!
|
||||
# await tractor.pause()
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
'raise_client_error',
|
||||
|
|
@ -1056,10 +1020,6 @@ def test_peer_spawns_and_cancels_service_subactor(
|
|||
raise_client_error: str,
|
||||
reg_addr: tuple[str, int],
|
||||
raise_sub_spawn_error_after: float|None,
|
||||
loglevel: str,
|
||||
test_log: tractor.log.StackLevelAdapter,
|
||||
# ^XXX, set to 'warning' to see masked-exc warnings
|
||||
# that may transpire during actor-nursery teardown.
|
||||
):
|
||||
# NOTE: this tests for the modden `mod wks open piker` bug
|
||||
# discovered as part of implementing workspace ctx
|
||||
|
|
@ -1089,7 +1049,6 @@ def test_peer_spawns_and_cancels_service_subactor(
|
|||
# NOTE: to halt the peer tasks on ctxc, uncomment this.
|
||||
debug_mode=debug_mode,
|
||||
registry_addrs=[reg_addr],
|
||||
loglevel=loglevel,
|
||||
) as an:
|
||||
server: Portal = await an.start_actor(
|
||||
(server_name := 'spawn_server'),
|
||||
|
|
@ -1125,7 +1084,7 @@ def test_peer_spawns_and_cancels_service_subactor(
|
|||
) as (client_ctx, client_says),
|
||||
):
|
||||
root: Actor = current_actor()
|
||||
spawner_uid: tuple = spawn_ctx.chan.aid.uid
|
||||
spawner_uid: tuple = spawn_ctx.chan.uid
|
||||
print(
|
||||
f'Server says: {first}\n'
|
||||
f'Client says: {client_says}\n'
|
||||
|
|
@ -1144,7 +1103,7 @@ def test_peer_spawns_and_cancels_service_subactor(
|
|||
print(
|
||||
'Sub-spawn came online\n'
|
||||
f'portal: {sub}\n'
|
||||
f'.uid: {sub.actor.aid.uid}\n'
|
||||
f'.uid: {sub.actor.uid}\n'
|
||||
f'chan.raddr: {sub.chan.raddr}\n'
|
||||
)
|
||||
|
||||
|
|
@ -1178,7 +1137,7 @@ def test_peer_spawns_and_cancels_service_subactor(
|
|||
|
||||
assert isinstance(res, ContextCancelled)
|
||||
assert client_ctx.cancel_acked
|
||||
assert res.canceller == root.aid.uid
|
||||
assert res.canceller == root.uid
|
||||
assert not raise_sub_spawn_error_after
|
||||
|
||||
# cancelling the spawner sub should
|
||||
|
|
@ -1212,8 +1171,8 @@ def test_peer_spawns_and_cancels_service_subactor(
|
|||
# little_bro: a `RuntimeError`.
|
||||
#
|
||||
check_inner_rte(rae)
|
||||
assert rae.relay_uid == client.chan.aid.uid
|
||||
assert rae.src_uid == sub.chan.aid.uid
|
||||
assert rae.relay_uid == client.chan.uid
|
||||
assert rae.src_uid == sub.chan.uid
|
||||
|
||||
assert not client_ctx.cancel_acked
|
||||
assert (
|
||||
|
|
@ -1242,12 +1201,12 @@ def test_peer_spawns_and_cancels_service_subactor(
|
|||
except ContextCancelled as ctxc:
|
||||
_ctxc = ctxc
|
||||
print(
|
||||
f'{root.aid.uid} caught ctxc from ctx with {client_ctx.chan.aid.uid}\n'
|
||||
f'{root.uid} caught ctxc from ctx with {client_ctx.chan.uid}\n'
|
||||
f'{repr(ctxc)}\n'
|
||||
)
|
||||
|
||||
if not raise_sub_spawn_error_after:
|
||||
assert ctxc.canceller == root.aid.uid
|
||||
assert ctxc.canceller == root.uid
|
||||
else:
|
||||
assert ctxc.canceller == spawner_uid
|
||||
|
||||
|
|
@ -1278,20 +1237,9 @@ def test_peer_spawns_and_cancels_service_subactor(
|
|||
|
||||
# assert spawn_ctx.cancelled_caught
|
||||
|
||||
|
||||
async def _main():
|
||||
headroom: float = cpu_scaling_factor()
|
||||
this_fast_on_linux: float = 3
|
||||
this_fast = this_fast_on_linux * headroom
|
||||
if headroom != 1.:
|
||||
test_log.warning(
|
||||
f'Adding latency headroom on linux bc CPU scaling,\n'
|
||||
f'headroom: {headroom}\n'
|
||||
f'this_fast_on_linux: {this_fast_on_linux} -> {this_fast}\n'
|
||||
)
|
||||
with trio.fail_after(
|
||||
this_fast
|
||||
if not debug_mode
|
||||
3 if not debug_mode
|
||||
else 999
|
||||
):
|
||||
await main()
|
||||
|
|
|
|||
|
|
@ -1,22 +1,15 @@
|
|||
'''
|
||||
Streaming via the, now legacy, "async-gen API".
|
||||
|
||||
'''
|
||||
"""
|
||||
Streaming via async gen api
|
||||
"""
|
||||
import time
|
||||
from functools import partial
|
||||
import platform
|
||||
from typing import Callable
|
||||
|
||||
import trio
|
||||
import tractor
|
||||
import pytest
|
||||
|
||||
from tractor._testing import tractor_test
|
||||
from tractor._exceptions import ActorTooSlowError
|
||||
|
||||
_non_linux: bool = (
|
||||
_sys := platform.system()
|
||||
) != 'Linux'
|
||||
|
||||
|
||||
def test_must_define_ctx():
|
||||
|
|
@ -26,11 +19,7 @@ def test_must_define_ctx():
|
|||
async def no_ctx():
|
||||
pass
|
||||
|
||||
assert (
|
||||
"no_ctx must be `ctx: tractor.Context"
|
||||
in
|
||||
str(err.value)
|
||||
)
|
||||
assert "no_ctx must be `ctx: tractor.Context" in str(err.value)
|
||||
|
||||
@tractor.stream
|
||||
async def has_ctx(ctx):
|
||||
|
|
@ -73,23 +62,21 @@ async def stream_from_single_subactor(
|
|||
start_method,
|
||||
stream_func,
|
||||
):
|
||||
'''
|
||||
Verify we can spawn a daemon actor and retrieve streamed data.
|
||||
|
||||
'''
|
||||
"""Verify we can spawn a daemon actor and retrieve streamed data.
|
||||
"""
|
||||
# only one per host address, spawns an actor if None
|
||||
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
start_method=start_method,
|
||||
) as an:
|
||||
) as nursery:
|
||||
|
||||
async with tractor.find_actor('streamerd') as portals:
|
||||
|
||||
if not portals:
|
||||
|
||||
# no brokerd actor found
|
||||
portal = await an.start_actor(
|
||||
portal = await nursery.start_actor(
|
||||
'streamerd',
|
||||
enable_modules=[__name__],
|
||||
)
|
||||
|
|
@ -129,22 +116,11 @@ async def stream_from_single_subactor(
|
|||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
'stream_func',
|
||||
[
|
||||
async_gen_stream,
|
||||
context_stream,
|
||||
],
|
||||
ids='stream_func={}'.format
|
||||
'stream_func', [async_gen_stream, context_stream]
|
||||
)
|
||||
def test_stream_from_single_subactor(
|
||||
reg_addr: tuple,
|
||||
start_method: str,
|
||||
stream_func: Callable,
|
||||
):
|
||||
'''
|
||||
Verify streaming from a spawned async generator.
|
||||
|
||||
'''
|
||||
def test_stream_from_single_subactor(reg_addr, start_method, stream_func):
|
||||
"""Verify streaming from a spawned async generator.
|
||||
"""
|
||||
trio.run(
|
||||
partial(
|
||||
stream_from_single_subactor,
|
||||
|
|
@ -156,9 +132,10 @@ def test_stream_from_single_subactor(
|
|||
|
||||
|
||||
# this is the first 2 actors, streamer_1 and streamer_2
|
||||
async def stream_data(seed: int):
|
||||
async def stream_data(seed):
|
||||
|
||||
for i in range(seed):
|
||||
|
||||
yield i
|
||||
|
||||
# trigger scheduler to simulate practical usage
|
||||
|
|
@ -166,17 +143,15 @@ async def stream_data(seed: int):
|
|||
|
||||
|
||||
# this is the third actor; the aggregator
|
||||
async def aggregate(seed: int):
|
||||
'''
|
||||
Ensure that the two streams we receive match but only stream
|
||||
async def aggregate(seed):
|
||||
"""Ensure that the two streams we receive match but only stream
|
||||
a single set of values to the parent.
|
||||
|
||||
'''
|
||||
async with tractor.open_nursery() as an:
|
||||
"""
|
||||
async with tractor.open_nursery() as nursery:
|
||||
portals = []
|
||||
for i in range(1, 3):
|
||||
# fork point
|
||||
portal = await an.start_actor(
|
||||
portal = await nursery.start_actor(
|
||||
name=f'streamer_{i}',
|
||||
enable_modules=[__name__],
|
||||
)
|
||||
|
|
@ -189,28 +164,20 @@ async def aggregate(seed: int):
|
|||
async with send_chan:
|
||||
|
||||
async with portal.open_stream_from(
|
||||
stream_data,
|
||||
seed=seed,
|
||||
stream_data, seed=seed,
|
||||
) as stream:
|
||||
|
||||
async for value in stream:
|
||||
# leverage trio's built-in backpressure
|
||||
await send_chan.send(value)
|
||||
|
||||
print(
|
||||
f'FINISHED ITERATING!\n'
|
||||
f'peer: {portal.channel.aid.uid}'
|
||||
)
|
||||
print(f"FINISHED ITERATING {portal.channel.uid}")
|
||||
|
||||
# spawn 2 trio tasks to collect streams and push to a local queue
|
||||
async with trio.open_nursery() as tn:
|
||||
async with trio.open_nursery() as n:
|
||||
|
||||
for portal in portals:
|
||||
tn.start_soon(
|
||||
push_to_chan,
|
||||
portal,
|
||||
send_chan.clone(),
|
||||
)
|
||||
n.start_soon(push_to_chan, portal, send_chan.clone())
|
||||
|
||||
# close this local task's reference to send side
|
||||
await send_chan.aclose()
|
||||
|
|
@ -227,21 +194,20 @@ async def aggregate(seed: int):
|
|||
|
||||
print("FINISHED ITERATING in aggregator")
|
||||
|
||||
await an.cancel()
|
||||
await nursery.cancel()
|
||||
print("WAITING on `ActorNursery` to finish")
|
||||
print("AGGREGATOR COMPLETE!")
|
||||
|
||||
|
||||
async def a_quadruple_example() -> list[int]:
|
||||
'''
|
||||
Open the root-actor which is also a "registrar".
|
||||
# this is the main actor and *arbiter*
|
||||
async def a_quadruple_example():
|
||||
# a nursery which spawns "actors"
|
||||
async with tractor.open_nursery() as nursery:
|
||||
|
||||
'''
|
||||
async with tractor.open_nursery() as an:
|
||||
seed = int(1e3)
|
||||
pre_start = time.time()
|
||||
|
||||
portal = await an.start_actor(
|
||||
portal = await nursery.start_actor(
|
||||
name='aggregator',
|
||||
enable_modules=[__name__],
|
||||
)
|
||||
|
|
@ -249,45 +215,23 @@ async def a_quadruple_example() -> list[int]:
|
|||
start = time.time()
|
||||
# the portal call returns exactly what you'd expect
|
||||
# as if the remote "aggregate" function was called locally
|
||||
result_stream: list[int] = []
|
||||
result_stream = []
|
||||
|
||||
async with portal.open_stream_from(
|
||||
aggregate,
|
||||
seed=seed,
|
||||
) as stream:
|
||||
async with portal.open_stream_from(aggregate, seed=seed) as stream:
|
||||
async for value in stream:
|
||||
result_stream.append(value)
|
||||
|
||||
print(
|
||||
f"STREAM TIME = {time.time() - start}\n"
|
||||
f"STREAM + SPAWN TIME = {time.time() - pre_start}\n"
|
||||
)
|
||||
print(f"STREAM TIME = {time.time() - start}")
|
||||
print(f"STREAM + SPAWN TIME = {time.time() - pre_start}")
|
||||
assert result_stream == list(range(seed))
|
||||
await portal.cancel_actor()
|
||||
return result_stream
|
||||
|
||||
|
||||
async def cancel_after(
|
||||
wait: float,
|
||||
reg_addr: tuple,
|
||||
expect_cancel: bool,
|
||||
) -> list[int]:
|
||||
|
||||
async with tractor.open_root_actor(
|
||||
registry_addrs=[reg_addr],
|
||||
):
|
||||
res: list[int]|None = None
|
||||
with trio.move_on_after(wait) as cs:
|
||||
res: list[int] = await a_quadruple_example()
|
||||
return res
|
||||
|
||||
if (
|
||||
not expect_cancel
|
||||
and
|
||||
cs.cancelled_caught
|
||||
):
|
||||
assert not res
|
||||
raise ActorTooSlowError
|
||||
async def cancel_after(wait, reg_addr):
|
||||
async with tractor.open_root_actor(registry_addrs=[reg_addr]):
|
||||
with trio.move_on_after(wait):
|
||||
return await a_quadruple_example()
|
||||
|
||||
|
||||
@pytest.fixture(scope='module')
|
||||
|
|
@ -295,16 +239,7 @@ def time_quad_ex(
|
|||
reg_addr: tuple,
|
||||
ci_env: bool,
|
||||
spawn_backend: str,
|
||||
is_forking_spawner: bool,
|
||||
tpt_proto: str,
|
||||
):
|
||||
if (
|
||||
ci_env
|
||||
and
|
||||
_non_linux
|
||||
):
|
||||
pytest.skip(f'Test is too flaky on {_sys!r} in CI')
|
||||
|
||||
if spawn_backend == 'mp':
|
||||
'''
|
||||
no idea but the mp *nix runs are flaking out here often...
|
||||
|
|
@ -312,79 +247,32 @@ def time_quad_ex(
|
|||
'''
|
||||
pytest.skip("Test is too flaky on mp in CI")
|
||||
|
||||
timeout: float = (
|
||||
7 if _non_linux
|
||||
else 4
|
||||
)
|
||||
|
||||
if (
|
||||
is_forking_spawner
|
||||
and
|
||||
tpt_proto in [
|
||||
'uds',
|
||||
]
|
||||
):
|
||||
timeout += 1
|
||||
|
||||
start: float = time.time()
|
||||
results: list[int] = trio.run(partial(
|
||||
cancel_after,
|
||||
wait=timeout,
|
||||
reg_addr=reg_addr,
|
||||
expect_cancel=True,
|
||||
))
|
||||
diff: float = time.time() - start
|
||||
if results is None:
|
||||
raise ActorTooSlowError(
|
||||
f'Streaming example took longer then timeout ??\n'
|
||||
f'timeout={timeout!r}\n'
|
||||
f'diff={diff!r}\n'
|
||||
f'results={results!r}\n'
|
||||
)
|
||||
|
||||
timeout = 7 if platform.system() in ('Windows', 'Darwin') else 4
|
||||
start = time.time()
|
||||
results = trio.run(cancel_after, timeout, reg_addr)
|
||||
diff = time.time() - start
|
||||
assert results
|
||||
return results, diff
|
||||
|
||||
|
||||
def test_a_quadruple_example(
|
||||
time_quad_ex: tuple[list[int], float],
|
||||
time_quad_ex: tuple,
|
||||
ci_env: bool,
|
||||
spawn_backend: str,
|
||||
test_log: tractor.log.StackLevelAdapter,
|
||||
):
|
||||
'''
|
||||
This also serves as a "we'd like to be this fast" smoke test
|
||||
given past empirical eval of this suite.
|
||||
This also serves as a kind of "we'd like to be this fast test".
|
||||
|
||||
'''
|
||||
|
||||
this_fast_on_linux: float = 3
|
||||
this_fast = (
|
||||
6 if _non_linux
|
||||
else this_fast_on_linux
|
||||
)
|
||||
# ^ XXX NOTE,
|
||||
# i've noticed that tweaking the CPU governor setting
|
||||
# to not "always" enable "turbo" mode can result in latency
|
||||
# which causes this limit to be too little. Not sure if it'd
|
||||
# be worth it to adjust the linux value based on reading the
|
||||
# CPU conf from the sys?
|
||||
#
|
||||
# For ex, see the `auto-cpufreq` docs on such settings,
|
||||
# https://github.com/AdnanHodzic/auto-cpufreq?tab=readme-ov-file#example-config-file-contents
|
||||
#
|
||||
# HENCE this below latency-headroom compensation logic..
|
||||
from .conftest import cpu_scaling_factor
|
||||
headroom: float = cpu_scaling_factor()
|
||||
if headroom != 1.:
|
||||
this_fast = this_fast_on_linux * headroom
|
||||
test_log.warning(
|
||||
f'Adding latency headroom on linux bc CPU scaling,\n'
|
||||
f'headroom: {headroom}\n'
|
||||
f'this_fast_on_linux: {this_fast_on_linux} -> {this_fast}\n'
|
||||
)
|
||||
|
||||
results, diff = time_quad_ex
|
||||
assert results
|
||||
this_fast = (
|
||||
6 if platform.system() in (
|
||||
'Windows',
|
||||
'Darwin',
|
||||
)
|
||||
else 3
|
||||
)
|
||||
assert diff < this_fast
|
||||
|
||||
|
||||
|
|
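One side of the hunk above picks a per-platform latency budget and then rescales it when `cpu_scaling_factor()` reports a value other than 1. The selection logic on its own can be sketched as follows. `latency_budget` is a hypothetical helper, and the headroom is passed in as a plain argument because the semantics of the repo's `cpu_scaling_factor()` are not shown in this diff.

```python
def latency_budget(
    non_linux: bool,
    headroom: float,
    base_linux: float = 3,
    base_other: float = 6,
) -> float:
    '''
    Mirror the budget selection in the hunk above: start from
    a per-platform base and, when a scaling factor other than
    1 is reported, replace it with the linux base * headroom.

    '''
    budget: float = base_other if non_linux else base_linux
    if headroom != 1.:
        budget = base_linux * headroom

    return budget


linux_default: float = latency_budget(non_linux=False, headroom=1.)
other_default: float = latency_budget(non_linux=True, headroom=1.)
linux_scaled: float = latency_budget(non_linux=False, headroom=1.5)
```

Note that, as written in the hunk, a non-unit headroom overrides the non-Linux base as well; the sketch keeps that behaviour rather than guessing at the intent.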
@ -393,77 +281,43 @@ def test_a_quadruple_example(
|
|||
list(map(lambda i: i/10, range(3, 9)))
|
||||
)
|
||||
def test_not_fast_enough_quad(
|
||||
reg_addr: tuple,
|
||||
time_quad_ex: tuple[list[int], float],
|
||||
cancel_delay: float,
|
||||
|
||||
ci_env: bool,
|
||||
spawn_backend: str,
|
||||
is_forking_spawner: bool,
|
||||
tpt_proto: str,
|
||||
test_log: tractor.log.StackLevelAdapter,
|
||||
reg_addr, time_quad_ex, cancel_delay, ci_env, spawn_backend
|
||||
):
|
||||
'''
|
||||
Verify we can cancel midway through `a_quadruple_example()`, at
|
||||
various delays, and all subactors cancel gracefully.
|
||||
|
||||
'''
|
||||
"""Verify we can cancel midway through the quad example and all actors
|
||||
cancel gracefully.
|
||||
"""
|
||||
results, diff = time_quad_ex
|
||||
delay = max(diff - cancel_delay, 0)
|
||||
results: list[int] = trio.run(partial(
|
||||
cancel_after,
|
||||
wait=delay,
|
||||
reg_addr=reg_addr,
|
||||
expect_cancel=True,
|
||||
))
|
||||
system: str = platform.system()
|
||||
if (
|
||||
system in ('Windows', 'Darwin')
|
||||
and
|
||||
results is not None
|
||||
):
|
||||
results = trio.run(cancel_after, delay, reg_addr)
|
||||
system = platform.system()
|
||||
if system in ('Windows', 'Darwin') and results is not None:
|
||||
# In CI environments it seems later runs are quicker than the first
|
||||
# so just ignore these
|
||||
print(f'Woa there {system} caught your breath eh?')
|
||||
print(f"Woa there {system} caught your breath eh?")
|
||||
else:
|
||||
if (
|
||||
results
|
||||
and
|
||||
is_forking_spawner
|
||||
and
|
||||
tpt_proto in [
|
||||
'uds',
|
||||
]
|
||||
):
|
||||
pytest.xfail(
|
||||
f'Spawning backend + tpt-proto is too fast XD\n'
|
||||
f'{spawn_backend!r} + {tpt_proto!r}\n'
|
||||
)
|
||||
|
||||
# should be cancelled mid-streaming
|
||||
assert results is None
|
||||
|
||||
|
||||
@tractor_test(timeout=20)
|
||||
@tractor_test
|
||||
async def test_respawn_consumer_task(
|
||||
reg_addr: tuple,
|
||||
spawn_backend: str,
|
||||
loglevel: str,
|
||||
reg_addr,
|
||||
spawn_backend,
|
||||
loglevel,
|
||||
):
|
||||
'''
|
||||
Verify that ``._portal.ReceiveStream.shield()``
|
||||
"""Verify that ``._portal.ReceiveStream.shield()``
|
||||
successfully protects the underlying IPC channel from being closed
|
||||
when cancelling and respawning a consumer task.
|
||||
|
||||
This also serves to verify that all values from the stream can be
|
||||
received despite the respawns.
|
||||
|
||||
'''
|
||||
"""
|
||||
stream = None
|
||||
|
||||
async with tractor.open_nursery() as an:
|
||||
async with tractor.open_nursery() as n:
|
||||
|
||||
portal = await an.start_actor(
|
||||
portal = await n.start_actor(
|
||||
name='streamer',
|
||||
enable_modules=[__name__]
|
||||
)
|
||||
|
|
|
|||
|
|
@ -1,5 +1,5 @@
|
|||
"""
|
||||
Registrar and "local" actor api
|
||||
Arbiter and "local" actor api
|
||||
"""
|
||||
import time
|
||||
|
||||
|
|
@ -10,28 +10,24 @@ import tractor
|
|||
from tractor._testing import tractor_test
|
||||
|
||||
|
||||
def test_no_runtime():
|
||||
'''
|
||||
A registrar must be established before any nurseries
|
||||
@pytest.mark.trio
|
||||
async def test_no_runtime():
|
||||
"""An arbitter must be established before any nurseries
|
||||
can be created.
|
||||
|
||||
(In other words ``tractor.open_root_actor()`` must be
|
||||
engaged at some point?)
|
||||
|
||||
'''
|
||||
async def main():
|
||||
(In other words ``tractor.open_root_actor()`` must be engaged at
|
||||
some point?)
|
||||
"""
|
||||
with pytest.raises(RuntimeError) :
|
||||
async with tractor.find_actor('doggy'):
|
||||
pass
|
||||
|
||||
with pytest.raises(tractor._exceptions.NoRuntime) :
|
||||
trio.run(main)
|
||||
|
||||
|
||||
@tractor_test
|
||||
async def test_self_is_registered(reg_addr):
|
||||
"Verify waiting on the registrar to register itself using the standard api."
|
||||
"Verify waiting on the arbiter to register itself using the standard api."
|
||||
actor = tractor.current_actor()
|
||||
assert actor.is_registrar
|
||||
assert actor.is_arbiter
|
||||
with trio.fail_after(0.2):
|
||||
async with tractor.wait_for_actor('root') as portal:
|
||||
assert portal.channel.uid[0] == 'root'
|
||||
|
|
@ -39,11 +35,11 @@ async def test_self_is_registered(reg_addr):
|
|||
|
||||
@tractor_test
|
||||
async def test_self_is_registered_localportal(reg_addr):
|
||||
"Verify waiting on the registrar to register itself using a local portal."
|
||||
"Verify waiting on the arbiter to register itself using a local portal."
|
||||
actor = tractor.current_actor()
|
||||
assert actor.is_registrar
|
||||
assert actor.is_arbiter
|
||||
async with tractor.get_registry(reg_addr) as portal:
|
||||
assert isinstance(portal, tractor.runtime._portal.LocalPortal)
|
||||
assert isinstance(portal, tractor._portal.LocalPortal)
|
||||
|
||||
with trio.fail_after(0.2):
|
||||
sockaddr = await portal.run_from_ns(
|
||||
|
|
@ -61,8 +57,8 @@ def test_local_actor_async_func(reg_addr):
|
|||
async with tractor.open_root_actor(
|
||||
registry_addrs=[reg_addr],
|
||||
):
|
||||
# registrar is started in-proc if dne
|
||||
assert tractor.current_actor().is_registrar
|
||||
# arbiter is started in-proc if dne
|
||||
assert tractor.current_actor().is_arbiter
|
||||
|
||||
for i in range(10):
|
||||
nums.append(i)
|
||||
|
|
|
|||
|
|
@ -1,260 +0,0 @@
|
|||
'''
|
||||
`tractor.log`-wrapping unit tests.
|
||||
|
||||
'''
|
||||
from pathlib import Path
|
||||
import shutil
|
||||
from types import ModuleType
|
||||
|
||||
import pytest
|
||||
import tractor
|
||||
from tractor import (
|
||||
_code_load,
|
||||
log,
|
||||
)
|
||||
|
||||
|
||||
def test_root_pkg_not_duplicated_in_logger_name():
|
||||
'''
|
||||
When both `pkg_name` and `name` are passed and they have
|
||||
a common `<root_name>.< >` prefix, ensure that it is not
|
||||
duplicated in the child's `StackLevelAdapter.name: str`.
|
||||
|
||||
Also pins the explicit-`name` contract: an explicitly passed
|
||||
dotted `name` is treated as a *literal* sub-logger path and is
|
||||
NOT leaf-collapsed. The leaf-module is only dropped when the
|
||||
trailing token duplicates the *caller's own* `__name__` leaf (the
|
||||
`{filename}` field) — see `test_implicit_mod_name_applied_for_child`
|
||||
for that (auto-naming) path. This is what keeps a real (possibly
|
||||
nested) sub-PACKAGE like `subpkg.mod` -> `devx.debug` addressable
|
||||
by the `tractor.log` logging-spec, instead of collapsing to its
|
||||
parent.
|
||||
|
||||
'''
|
||||
project_name: str = 'pylib'
|
||||
pkg_path: str = 'pylib.subpkg.mod'
|
||||
|
||||
assert not tractor.current_actor(
|
||||
err_on_no_runtime=False,
|
||||
)
|
||||
proj_log = log.get_logger(
|
||||
pkg_name=project_name,
|
||||
mk_sublog=False,
|
||||
)
|
||||
|
||||
sublog = log.get_logger(
|
||||
pkg_name=project_name,
|
||||
name=pkg_path,
|
||||
)
|
||||
|
||||
assert proj_log is not sublog
|
||||
# the root pkg-name appears exactly once (no `pylib.pylib...`)
|
||||
assert sublog.name.count(proj_log.name) == 1
|
||||
# explicit dotted `name` is preserved literally (NOT collapsed);
|
||||
# the trailing token survives since it's not the *caller's* own
|
||||
# leaf-module (`test_log_sys`), so this is treated as a literal
|
||||
# sub-pkg path.
|
||||
assert sublog.name == f'{project_name}.subpkg.mod'
|
||||
|
||||
|
||||
def test_implicit_mod_name_applied_for_child(
|
||||
testdir: pytest.Pytester,
|
||||
loglevel: str,
|
||||
):
|
||||
'''
|
||||
Verify that when `.log.get_logger(pkg_name='pylib')` is called
|
||||
from a given sub-mod from within the `pylib` pkg-path, we
|
||||
implicitly set the equiv of `name=__name__` from the caller's
|
||||
module.
|
||||
|
||||
'''
|
||||
# tractor.log.get_console_log(level=loglevel)
|
||||
proj_name: str = 'snakelib'
|
||||
mod_code: str = (
|
||||
f'import tractor\n'
|
||||
f'\n'
|
||||
# if you need to trace `testdir` stuff @ import-time..
|
||||
# f'breakpoint()\n'
|
||||
f'log = tractor.log.get_logger(pkg_name="{proj_name}")\n'
|
||||
)
|
||||
|
||||
# create a sub-module for each pkg layer
|
||||
_lib = testdir.mkpydir(proj_name)
|
||||
pkg: Path = Path(_lib)
|
||||
pkg_init_mod: Path = pkg / "__init__.py"
|
||||
pkg_init_mod.write_text(mod_code)
|
||||
|
||||
subpkg: Path = pkg / 'subpkg'
|
||||
subpkg.mkdir()
|
||||
subpkgmod: Path = subpkg / "__init__.py"
|
||||
subpkgmod.touch()
|
||||
subpkgmod.write_text(mod_code)
|
||||
|
||||
_submod: Path = testdir.makepyfile(
|
||||
_mod=mod_code,
|
||||
)
|
||||
|
||||
pkg_submod = pkg / 'mod.py'
|
||||
pkg_subpkg_submod = subpkg / 'submod.py'
|
||||
shutil.copyfile(
|
||||
_submod,
|
||||
pkg_submod,
|
||||
)
|
||||
shutil.copyfile(
|
||||
_submod,
|
||||
pkg_subpkg_submod,
|
||||
)
|
||||
testdir.chdir()
|
||||
# NOTE, to introspect the py-file-module-layout use (in .xsh
|
||||
# syntax): `ranger @str(testdir)`
|
||||
|
||||
# XXX NOTE, once the "top level" pkg mod has been
|
||||
# imported, we can then use `import` syntax to
|
||||
# import its sub-pkgs and modules.
|
||||
subpkgmod: ModuleType = _code_load.load_module_from_path(
|
||||
Path(pkg / '__init__.py'),
|
||||
module_name=proj_name,
|
||||
)
|
||||
|
||||
pkg_root_log = log.get_logger(
|
||||
pkg_name=proj_name,
|
||||
mk_sublog=False,
|
||||
)
|
||||
# the top level pkg-mod, created just now,
|
||||
# by above API call.
|
||||
assert pkg_root_log.name == proj_name
|
||||
assert not pkg_root_log.logger.getChildren()
|
||||
#
|
||||
# ^TODO! test this same output but created via a `get_logger()`
|
||||
# call in the `snakelib.__init__.py`!!
|
||||
|
||||
# NOTE, the pkg-level "init mod" should of course
|
||||
# have the same name as the package ns-path.
|
||||
import snakelib as init_mod
|
||||
assert init_mod.log.name == proj_name
|
||||
|
||||
# NOTE, a first-pkg-level sub-module should only
|
||||
# use the package-name since the leaf-node-module
|
||||
# will be included in log headers by default.
|
||||
from snakelib import mod
|
||||
assert mod.log.name == proj_name
|
||||
|
||||
from snakelib import subpkg
|
||||
assert (
|
||||
subpkg.log.name
|
||||
==
|
||||
subpkg.__package__
|
||||
==
|
||||
f'{proj_name}.subpkg'
|
||||
)
|
||||
|
||||
from snakelib.subpkg import submod
|
||||
assert (
|
||||
submod.log.name
|
||||
==
|
||||
submod.__package__
|
||||
==
|
||||
f'{proj_name}.subpkg'
|
||||
)
|
||||
|
||||
sub_logs = pkg_root_log.logger.getChildren()
|
||||
assert len(sub_logs) == 1 # only one nested sub-pkg module
|
||||
assert submod.log.logger in sub_logs
|
||||
|
||||
|
||||
def test_io_custom_level_registered():
|
||||
'''
|
||||
The `IO`(21) level (registered via `add_log_level()` at
|
||||
import, for `tractor.trionics._subproc`'s std-stream relay)
|
||||
is fully wired and SHOWN BY DEFAULT at `info`-level consoles
|
||||
since `21 >= INFO(20)`.
|
||||
|
||||
'''
|
||||
import logging
|
||||
assert log.CUSTOM_LEVELS.get('IO') == 21
|
||||
assert logging.getLevelName(21) == 'IO'
|
||||
assert log.STD_PALETTE.get('IO')
|
||||
assert log.BOLD_PALETTE['bold'].get('IO')
|
||||
|
||||
iolog = log.get_logger('io_lvl_test')
|
||||
assert callable(getattr(iolog, 'io', None))
|
||||
# emit must not raise
|
||||
iolog.io('hello from the IO level')
|
||||
|
||||
# 21 >= INFO(20) -> shown when console set to `info`
|
||||
assert 21 >= logging.INFO
|
||||
|
||||
|
||||
def test_add_log_level_pluggable():
|
||||
'''
|
||||
`add_log_level()` is the single pluggable entry-point: one
|
||||
call wires `CUSTOM_LEVELS` + `addLevelName` + both palettes +
|
||||
a same-named `StackLevelAdapter` emit method (so
|
||||
`get_logger()`'s per-level audit passes).
|
||||
|
||||
'''
|
||||
import logging
|
||||
name: str = 'XLVL'
|
||||
val: int = 19
|
||||
try:
|
||||
log.add_log_level(name, val, 'cyan')
|
||||
|
||||
assert log.CUSTOM_LEVELS[name] == val
|
||||
assert logging.getLevelName(val) == name
|
||||
assert log.STD_PALETTE[name] == 'cyan'
|
||||
assert log.BOLD_PALETTE['bold'][name] == 'bold_cyan'
|
||||
|
||||
# the audit in `get_logger()` (asserts a method per
|
||||
# `CUSTOM_LEVELS` entry) must still pass.
|
||||
xlog = log.get_logger('xlvl_test')
|
||||
emit = getattr(xlog, name.lower(), None)
|
||||
assert callable(emit)
|
||||
emit('hello from a plugged-in level')
|
||||
|
||||
finally:
|
||||
# best-effort cleanup of our module-global mutations so
|
||||
# later `get_logger()` audits don't see a half-removed
|
||||
# level.
|
||||
log.CUSTOM_LEVELS.pop(name, None)
|
||||
log.STD_PALETTE.pop(name, None)
|
||||
log.BOLD_PALETTE['bold'].pop(name, None)
|
||||
if hasattr(log.StackLevelAdapter, name.lower()):
|
||||
delattr(log.StackLevelAdapter, name.lower())
|
||||
|
||||
|
||||
# TODO, moar tests against existing feats:
|
||||
# ------ - ------
|
||||
# - [ ] color settings?
|
||||
# - [ ] header contents like,
|
||||
# - actor + thread + task names from various conc-primitives,
|
||||
# - [ ] `StackLevelAdapter` extensions,
|
||||
# - our custom levels/methods: `transport|runtime|cancel|pdb|devx`
|
||||
# - [ ] custom-headers support?
|
||||
#
|
||||
|
||||
# TODO, test driven dev of new-ideas/long-wanted feats,
|
||||
# ------ - ------
|
||||
# - [ ] https://github.com/goodboy/tractor/issues/244
|
||||
# - [ ] @catern mentioned using a sync / deterministic sys
|
||||
# and in particular `svlogd`?
|
||||
# |_ https://smarden.org/runit/svlogd.8
|
||||
|
||||
# - [ ] using adapter vs. filters?
|
||||
# - https://stackoverflow.com/questions/60691759/add-information-to-every-log-message-in-python-logging/61830838#61830838
|
||||
|
||||
# - [ ] `.at_least_level()` optimization which short circuits wtv
|
||||
# `logging` is doing behind the scenes when the level filters
|
||||
# the emission..?
|
||||
|
||||
# - [ ] use of `.log.get_console_log()` in subactors and the
|
||||
# subtleties of ensuring it actually emits from a subproc.
|
||||
|
||||
# - [ ] this idea of activating per-subsys emissions with some
|
||||
# kind of `.name` filter passed to the runtime or maybe configured
|
||||
# via the root `StackLevelAdapter`?
|
||||
|
||||
# - [ ] use of `logging.config.dictConfig()` to simplify the impl
|
||||
# of any of ^^ ??
|
||||
# - https://stackoverflow.com/questions/7507825/where-is-a-complete-example-of-logging-config-dictconfig
|
||||
# - https://docs.python.org/3/library/logging.config.html#configuration-dictionary-schema
|
||||
# - https://docs.python.org/3/library/logging.config.html#logging.config.dictConfig
|
||||
|
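The custom-level tests above check that one registration call wires up both a level name and a same-named emit method. The same idea can be sketched with only the stdlib. This is not `tractor.log.add_log_level()`: `register_level` is a hypothetical helper, it patches `logging.Logger` globally (an assumption made for brevity), and it skips the palette wiring that the tests also cover.

```python
import logging

IO_LEVEL = 21  # one above `logging.INFO` (20), as in the tests above


def register_level(name: str, value: int) -> None:
    """Register `name` with `logging` and add a lowercase emit method.

    ASSUMPTION: patching `logging.Logger` process-wide is acceptable
    here; a real library would more likely extend its own adapter.
    """
    logging.addLevelName(value, name)

    def emit(self, msg, *args, **kwargs):
        # Mirror the stdlib's own `Logger.info()` style guard.
        if self.isEnabledFor(value):
            self._log(value, msg, args, **kwargs)

    setattr(logging.Logger, name.lower(), emit)


register_level('IO', IO_LEVEL)
```

Because 21 is at or above `INFO`, a logger set to `INFO` emits `IO` records, while one set to `WARNING` drops them.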
|
@ -0,0 +1,68 @@
|
|||
"""
|
||||
Multiple python programs invoking the runtime.
|
||||
"""
|
||||
import platform
|
||||
import time
|
||||
|
||||
import pytest
|
||||
import trio
|
||||
import tractor
|
||||
from tractor._testing import (
|
||||
tractor_test,
|
||||
)
|
||||
from .conftest import (
|
||||
sig_prog,
|
||||
_INT_SIGNAL,
|
||||
_INT_RETURN_CODE,
|
||||
)
|
||||
|
||||
|
||||
def test_abort_on_sigint(daemon):
|
||||
assert daemon.returncode is None
|
||||
time.sleep(0.1)
|
||||
sig_prog(daemon, _INT_SIGNAL)
|
||||
assert daemon.returncode == _INT_RETURN_CODE
|
||||
|
||||
# XXX: oddly, couldn't get capfd.readouterr() to work here?
|
||||
if platform.system() != 'Windows':
|
||||
# don't check stderr on windows as it's empty when sending CTRL_C_EVENT
|
||||
assert "KeyboardInterrupt" in str(daemon.stderr.read())
|
||||
|
||||
|
||||
@tractor_test
|
||||
async def test_cancel_remote_arbiter(daemon, reg_addr):
|
||||
assert not tractor.current_actor().is_arbiter
|
||||
async with tractor.get_registry(reg_addr) as portal:
|
||||
await portal.cancel_actor()
|
||||
|
||||
time.sleep(0.1)
|
||||
# the arbiter channel server is cancelled but not its main task
|
||||
assert daemon.returncode is None
|
||||
|
||||
# no arbiter socket should exist
|
||||
with pytest.raises(OSError):
|
||||
async with tractor.get_registry(reg_addr) as portal:
|
||||
pass
|
||||
|
||||
|
||||
def test_register_duplicate_name(daemon, reg_addr):
|
||||
|
||||
async def main():
|
||||
|
||||
async with tractor.open_nursery(
|
||||
registry_addrs=[reg_addr],
|
||||
) as n:
|
||||
|
||||
assert not tractor.current_actor().is_arbiter
|
||||
|
||||
p1 = await n.start_actor('doggy')
|
||||
p2 = await n.start_actor('doggy')
|
||||
|
||||
async with tractor.wait_for_actor('doggy') as portal:
|
||||
assert portal.channel.uid in (p2.channel.uid, p1.channel.uid)
|
||||
|
||||
await n.cancel()
|
||||
|
||||
# run it manually since we want to start **after**
|
||||
# the other "daemon" program
|
||||
trio.run(main)
|
||||
|
|
@ -55,38 +55,13 @@ async def maybe_expect_raises(
|
|||
raises: BaseException|None = None,
|
||||
ensure_in_message: list[str]|None = None,
|
||||
post_mortem: bool = False,
|
||||
# NOTE, `None` selects a backend-aware default below —
|
||||
# see `_BACKEND_TIMEOUT_DEFAULTS` for rationale. Caller
|
||||
# can override with an explicit value to opt out.
|
||||
timeout: int|None = None,
|
||||
timeout: int = 3,
|
||||
) -> None:
|
||||
'''
|
||||
Async wrapper for ensuring errors propagate from the inner scope.
|
||||
|
||||
'''
|
||||
if timeout is None:
|
||||
# Pick a backend-aware default. Fork-based backends
|
||||
# (`main_thread_forkserver`) need much more headroom
|
||||
# because actor spawn + IPC ctx-exit + msg-validation
|
||||
# error path takes longer than under `trio` backend
|
||||
# — especially under cross-pytest-stream contention
|
||||
# (#451). `test_basic_payload_spec` empirically:
|
||||
# - 3s flaked all-valid variant (`TooSlowError`)
|
||||
# - 8s flaked `invalid-return` variant
|
||||
# (`Cancelled` surfaced instead of `MsgTypeError`
|
||||
# because `fail_after` fired mid-error-path)
|
||||
# - 15s flaked under cross-stream contention
|
||||
# 30s for fork-based gives plenty of headroom while
|
||||
# still failing-loud on a genuine hang. Other
|
||||
# backends keep the original 3s.
|
||||
from tractor.spawn import _spawn as _spawn_mod
|
||||
timeout = (
|
||||
30
|
||||
if _spawn_mod._spawn_method == 'main_thread_forkserver'
|
||||
else 3
|
||||
)
|
||||
|
||||
if tractor.debug_mode():
|
||||
if tractor._state.debug_mode():
|
||||
timeout += 999
|
||||
|
||||
with trio.fail_after(timeout):
|
||||
|
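The backend-aware default in `maybe_expect_raises()` above reduces to a small pure function, which makes the selection rule easy to check without spawning actors. A minimal sketch, assuming the spawn method is passed in as a plain string: `pick_timeout` is a hypothetical name, and the 30s, 3s and +999 values are copied from the hunk above.

```python
from __future__ import annotations


def pick_timeout(
    spawn_method: str,
    timeout: float | None = None,
    debug_mode: bool = False,
) -> float:
    """Choose a timeout, honoring an explicit caller override.

    `None` selects the backend-aware default: fork-based spawning
    gets extra headroom, everything else keeps the short limit.
    """
    if timeout is None:
        timeout = 30 if spawn_method == 'main_thread_forkserver' else 3

    # A human may be sitting in a debugger REPL, so effectively disable
    # the limit.
    if debug_mode:
        timeout += 999

    return timeout
```

The result would then be handed to something like `trio.fail_after()`, as the original wrapper does.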
|
@ -284,11 +259,6 @@ def test_basic_payload_spec(
|
|||
return_value: str|None,
|
||||
started_value: int|PldMsg,
|
||||
pld_check_started_value: bool,
|
||||
|
||||
set_fork_aware_capture,
|
||||
# ^XXX TODO? for forking spawners, seems to prevent hangs when
|
||||
# --capture=sys not set, but only for a while then the problem
|
||||
# accumulates?
|
||||
):
|
||||
'''
|
||||
Validate the most basic `PldRx` msg-type-spec semantics around
|
||||
|
|
@ -7,14 +7,6 @@ import tractor
|
|||
from tractor.experimental import msgpub
|
||||
from tractor._testing import tractor_test
|
||||
|
||||
pytestmark = pytest.mark.skipon_spawn_backend(
|
||||
'subint',
|
||||
reason=(
|
||||
'XXX SUBINT HANGING TEST XXX\n'
|
||||
'See outstanding issue(s)\n'
|
||||
# TODO, put issue link!
|
||||
)
|
||||
)
|
||||
|
||||
def test_type_checks():
|
||||
|
||||
|
|
|
|||
|
|
@ -1,333 +0,0 @@
|
|||
'''
|
||||
Verify that externally registered remote actor error
|
||||
types are correctly relayed, boxed, and re-raised across
|
||||
IPC actor hops via `reg_err_types()`.
|
||||
|
||||
Also ensure that when custom error types are NOT registered
|
||||
the framework indicates the lookup failure to the user.
|
||||
|
||||
'''
|
||||
import pytest
|
||||
import trio
|
||||
import tractor
|
||||
from tractor import (
|
||||
Context,
|
||||
Portal,
|
||||
RemoteActorError,
|
||||
)
|
||||
from tractor._exceptions import (
|
||||
get_err_type,
|
||||
reg_err_types,
|
||||
)
|
||||
|
||||
|
||||
# -- custom app-level errors for testing --
|
||||
class CustomAppError(Exception):
|
||||
'''
|
||||
A hypothetical user-app error that should be
|
||||
boxed+relayed by `tractor` IPC when registered.
|
||||
|
||||
'''
|
||||
|
||||
|
||||
class AnotherAppError(Exception):
|
||||
'''
|
||||
A second custom error for multi-type registration.
|
||||
|
||||
'''
|
||||
|
||||
|
||||
class UnregisteredAppError(Exception):
|
||||
'''
|
||||
A custom error that is intentionally NEVER
|
||||
registered via `reg_err_types()` so we can
|
||||
verify the framework's failure indication.
|
||||
|
||||
'''
|
||||
|
||||
|
||||
# -- remote-task endpoints --
|
||||
@tractor.context
|
||||
async def raise_custom_err(
|
||||
ctx: Context,
|
||||
) -> None:
|
||||
'''
|
||||
Remote ep that raises a `CustomAppError`
|
||||
after sync-ing with the caller.
|
||||
|
||||
'''
|
||||
await ctx.started()
|
||||
raise CustomAppError(
|
||||
'the app exploded remotely'
|
||||
)
|
||||
|
||||
|
||||
@tractor.context
|
||||
async def raise_another_err(
|
||||
ctx: Context,
|
||||
) -> None:
|
||||
'''
|
||||
Remote ep that raises `AnotherAppError`.
|
||||
|
||||
'''
|
||||
await ctx.started()
|
||||
raise AnotherAppError(
|
||||
'another app-level kaboom'
|
||||
)
|
||||
|
||||
|
||||
@tractor.context
|
||||
async def raise_unreg_err(
|
||||
ctx: Context,
|
||||
) -> None:
|
||||
'''
|
||||
Remote ep that raises an `UnregisteredAppError`
|
||||
which has NOT been `reg_err_types()`-registered.
|
||||
|
||||
'''
|
||||
await ctx.started()
|
||||
raise UnregisteredAppError(
|
||||
'this error type is unknown to tractor'
|
||||
)
|
||||
|
||||
|
||||
# -- unit tests for the type-registry plumbing --
|
||||
|
||||
class TestRegErrTypesPlumbing:
|
||||
'''
|
||||
Low-level checks on `reg_err_types()` and
|
||||
`get_err_type()` without requiring IPC.
|
||||
|
||||
'''
|
||||
|
||||
def test_unregistered_type_returns_none(self):
|
||||
'''
|
||||
An unregistered custom error name should yield
|
||||
`None` from `get_err_type()`.
|
||||
|
||||
'''
|
||||
result = get_err_type('CustomAppError')
|
||||
assert result is None
|
||||
|
||||
def test_register_and_lookup(self):
|
||||
'''
|
||||
After `reg_err_types()`, the custom type should
|
||||
be discoverable via `get_err_type()`.
|
||||
|
||||
'''
|
||||
reg_err_types([CustomAppError])
|
||||
result = get_err_type('CustomAppError')
|
||||
assert result is CustomAppError
|
||||
|
||||
def test_register_multiple_types(self):
|
||||
'''
|
||||
Registering a list of types should make each
|
||||
one individually resolvable.
|
||||
|
||||
'''
|
||||
reg_err_types([
|
||||
CustomAppError,
|
||||
AnotherAppError,
|
||||
])
|
||||
assert (
|
||||
get_err_type('CustomAppError')
|
||||
is CustomAppError
|
||||
)
|
||||
assert (
|
||||
get_err_type('AnotherAppError')
|
||||
is AnotherAppError
|
||||
)
|
||||
|
||||
def test_builtin_types_always_resolve(self):
|
||||
'''
|
||||
Builtin error types like `RuntimeError` and
|
||||
`ValueError` should always be found without
|
||||
any prior registration.
|
||||
|
||||
'''
|
||||
assert (
|
||||
get_err_type('RuntimeError')
|
||||
is RuntimeError
|
||||
)
|
||||
assert (
|
||||
get_err_type('ValueError')
|
||||
is ValueError
|
||||
)
|
||||
|
||||
def test_tractor_native_types_resolve(self):
|
||||
'''
|
||||
`tractor`-internal exc types (e.g.
|
||||
`ContextCancelled`) should always resolve.
|
||||
|
||||
'''
|
||||
assert (
|
||||
get_err_type('ContextCancelled')
|
||||
is tractor.ContextCancelled
|
||||
)
|
||||
|
||||
def test_boxed_type_str_without_ipc_msg(self):
|
||||
'''
|
||||
When a `RemoteActorError` is constructed
|
||||
without an IPC msg (and no resolvable type),
|
||||
`.boxed_type_str` should return `'<unknown>'`.
|
||||
|
||||
'''
|
||||
rae = RemoteActorError('test')
|
||||
assert rae.boxed_type_str == '<unknown>'
|
||||
|
||||
|
||||
# -- IPC-level integration tests --
|
||||
|
||||
def test_registered_custom_err_relayed(
|
||||
debug_mode: bool,
|
||||
tpt_proto: str,
|
||||
):
|
||||
'''
|
||||
When a custom error type is registered via
|
||||
`reg_err_types()` on BOTH sides of an IPC dialog,
|
||||
the parent should receive a `RemoteActorError`
|
||||
whose `.boxed_type` matches the original custom
|
||||
error type.
|
||||
|
||||
'''
|
||||
reg_err_types([CustomAppError])
|
||||
|
||||
async def main():
|
||||
async with tractor.open_nursery(
|
||||
debug_mode=debug_mode,
|
||||
enable_transports=[tpt_proto],
|
||||
) as an:
|
||||
ptl: Portal = await an.start_actor(
|
||||
'custom-err-raiser',
|
||||
enable_modules=[__name__],
|
||||
)
|
||||
async with ptl.open_context(
|
||||
raise_custom_err,
|
||||
) as (ctx, sent):
|
||||
assert not sent
|
||||
try:
|
||||
await ctx.wait_for_result()
|
||||
except RemoteActorError as rae:
|
||||
assert rae.boxed_type is CustomAppError
|
||||
assert rae.src_type is CustomAppError
|
||||
assert 'the app exploded remotely' in str(
|
||||
rae.tb_str
|
||||
)
|
||||
raise
|
||||
|
||||
with pytest.raises(RemoteActorError) as excinfo:
|
||||
trio.run(main)
|
||||
|
||||
rae = excinfo.value
|
||||
assert rae.boxed_type is CustomAppError
|
||||
|
||||
|
||||
def test_registered_another_err_relayed(
|
||||
debug_mode: bool,
|
||||
tpt_proto: str,
|
||||
):
|
||||
'''
|
||||
Same as above but for a different custom error
|
||||
type to verify multi-type registration works
|
||||
end-to-end over IPC.
|
||||
|
||||
'''
|
||||
reg_err_types([AnotherAppError])
|
||||
|
||||
async def main():
|
||||
async with tractor.open_nursery(
|
||||
debug_mode=debug_mode,
|
||||
enable_transports=[tpt_proto],
|
||||
) as an:
|
||||
ptl: Portal = await an.start_actor(
|
||||
'another-err-raiser',
|
||||
enable_modules=[__name__],
|
||||
)
|
||||
async with ptl.open_context(
|
||||
raise_another_err,
|
||||
) as (ctx, sent):
|
||||
assert not sent
|
||||
try:
|
||||
await ctx.wait_for_result()
|
||||
except RemoteActorError as rae:
|
||||
assert (
|
||||
rae.boxed_type
|
||||
is AnotherAppError
|
||||
)
|
||||
raise
|
||||
|
||||
await an.cancel()
|
||||
|
||||
with pytest.raises(RemoteActorError) as excinfo:
|
||||
trio.run(main)
|
||||
|
||||
rae = excinfo.value
|
||||
assert rae.boxed_type is AnotherAppError
|
||||
|
||||
|
||||
def test_unregistered_err_still_relayed(
|
||||
debug_mode: bool,
|
||||
tpt_proto: str,
|
||||
):
|
||||
'''
|
||||
Verify that even when a custom error type is NOT registered via
|
||||
`reg_err_types()`, the remote error is still relayed as
|
||||
a `RemoteActorError` with all string-level info preserved
|
||||
(traceback, type name, source actor uid).
|
||||
|
||||
The `.boxed_type` will be `None` (type obj can't be resolved) but
|
||||
`.boxed_type_str` and `.src_type_str` still report the original
|
||||
type name from the IPC msg.
|
||||
|
||||
This documents the expected limitation: without `reg_err_types()`
|
||||
the `.boxed_type` property can NOT resolve to the original Python
|
||||
type.
|
||||
|
||||
'''
|
||||
# NOTE: intentionally do NOT call
|
||||
# `reg_err_types([UnregisteredAppError])`
|
||||
|
||||
async def main():
|
||||
async with tractor.open_nursery(
|
||||
debug_mode=debug_mode,
|
||||
enable_transports=[tpt_proto],
|
||||
) as an:
|
||||
ptl: Portal = await an.start_actor(
|
||||
'unreg-err-raiser',
|
||||
enable_modules=[__name__],
|
||||
)
|
||||
async with ptl.open_context(
|
||||
raise_unreg_err,
|
||||
) as (ctx, sent):
|
||||
assert not sent
|
||||
await ctx.wait_for_result()
|
||||
|
||||
await an.cancel()
|
||||
|
||||
with pytest.raises(RemoteActorError) as excinfo:
|
||||
trio.run(main)
|
||||
|
||||
rae = excinfo.value
|
||||
|
||||
# the error IS relayed even without
|
||||
# registration; type obj is unresolvable but
|
||||
# all string-level info is preserved.
|
||||
assert rae.boxed_type is None # NOT `UnregisteredAppError`
|
||||
assert rae.src_type is None
|
||||
|
||||
# string names survive the IPC round-trip
|
||||
# via the `Error` msg fields.
|
||||
assert (
|
||||
rae.src_type_str
|
||||
==
|
||||
'UnregisteredAppError'
|
||||
)
|
||||
assert (
|
||||
rae.boxed_type_str
|
||||
==
|
||||
'UnregisteredAppError'
|
||||
)
|
||||
|
||||
# original traceback content is preserved
|
||||
assert 'this error type is unknown' in rae.tb_str
|
||||
assert 'UnregisteredAppError' in rae.tb_str
|
||||
|
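The plumbing tests above describe a name-to-type lookup in which registered custom errors and builtin errors resolve, and anything else yields `None`. A minimal stdlib-only sketch of that behaviour follows. It is not `tractor._exceptions`: `register_err_types`, `lookup_err_type` and `_registry` are hypothetical names, and it leaves out the framework's own exception types.

```python
from __future__ import annotations

import builtins

# Hypothetical module-level registry: error type name -> error type.
_registry: dict[str, type[BaseException]] = {}


def register_err_types(types: list[type[BaseException]]) -> None:
    """Make each type resolvable by its `__name__`."""
    for exc_type in types:
        _registry[exc_type.__name__] = exc_type


def lookup_err_type(name: str) -> type[BaseException] | None:
    """Resolve a type name received over the wire, or return `None`."""
    if name in _registry:
        return _registry[name]

    # Builtin errors need no registration, but only accept real
    # exception classes (not e.g. `print` or `int`).
    candidate = getattr(builtins, name, None)
    if isinstance(candidate, type) and issubclass(candidate, BaseException):
        return candidate

    return None
```

Returning `None` instead of raising matches the "still relayed" case above: the receiver can keep the string name and traceback even when the type object cannot be rebuilt.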
|
@ -12,14 +12,14 @@ import trio
|
|||
import tractor
|
||||
from tractor.trionics import (
|
||||
maybe_open_context,
|
||||
collapse_eg,
|
||||
)
|
||||
from tractor.log import (
|
||||
get_console_log,
|
||||
get_logger,
|
||||
)
|
||||
log = get_logger(__name__)
|
||||
|
||||
|
||||
log = get_logger()
|
||||
|
||||
_resource: int = 0
|
||||
|
||||
|
|
@ -213,12 +213,9 @@ def test_open_local_sub_to_stream(
|
|||
N local tasks using `trionics.maybe_open_context()`.
|
||||
|
||||
'''
|
||||
from .conftest import cpu_scaling_factor
|
||||
timeout: float = (
|
||||
4
|
||||
if not platform.system() == "Windows"
|
||||
else 10
|
||||
) * cpu_scaling_factor()
|
||||
timeout: float = 3.6
|
||||
if platform.system() == "Windows":
|
||||
timeout: float = 10
|
||||
|
||||
if debug_mode:
|
||||
timeout = 999
|
||||
|
|
@ -322,7 +319,7 @@ def test_open_local_sub_to_stream(
|
|||
|
||||
|
||||
@acm
|
||||
async def maybe_cancel_outer_cs(
|
||||
async def cancel_outer_cs(
|
||||
cs: trio.CancelScope|None = None,
|
||||
delay: float = 0,
|
||||
):
|
||||
|
|
@ -336,31 +333,12 @@ async def maybe_cancel_outer_cs(
|
|||
if cs:
|
||||
log.info('task calling cs.cancel()')
|
||||
cs.cancel()
|
||||
|
||||
trio.lowlevel.checkpoint()
|
||||
yield
|
||||
|
||||
if cs:
|
||||
await trio.sleep_forever()
|
||||
|
||||
# XXX, if not cancelled we'll leak this inf-blocking
|
||||
# subtask to the actor's service tn..
|
||||
else:
|
||||
await trio.lowlevel.checkpoint()
|
||||
await trio.sleep_forever()
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
'delay',
|
||||
[0.05, 0.5, 1],
|
||||
ids="pre_sleep_delay={}".format,
|
||||
)
|
||||
@pytest.mark.parametrize(
|
||||
'cancel_by_cs',
|
||||
[True, False],
|
||||
ids="cancel_by_cs={}".format,
|
||||
)
|
||||
def test_lock_not_corrupted_on_fast_cancel(
|
||||
delay: float,
|
||||
cancel_by_cs: bool,
|
||||
debug_mode: bool,
|
||||
loglevel: str,
|
||||
):
|
||||
|
|
@ -377,14 +355,17 @@ def test_lock_not_corrupted_on_fast_cancel(
|
|||
due to it having erroneously exited without calling
|
||||
`lock.release()`.
|
||||
|
||||
|
||||
'''
|
||||
delay: float = 1.
|
||||
|
||||
async def use_moc(
|
||||
cs: trio.CancelScope|None,
|
||||
delay: float,
|
||||
cs: trio.CancelScope|None = None,
|
||||
):
|
||||
log.info('task entering moc')
|
||||
async with maybe_open_context(
|
||||
maybe_cancel_outer_cs,
|
||||
cancel_outer_cs,
|
||||
kwargs={
|
||||
'cs': cs,
|
||||
'delay': delay,
|
||||
|
|
@ -395,13 +376,7 @@ def test_lock_not_corrupted_on_fast_cancel(
|
|||
else:
|
||||
log.info('1st task entered')
|
||||
|
||||
if cs:
|
||||
await trio.sleep_forever()
|
||||
|
||||
else:
|
||||
await trio.sleep(delay)
|
||||
|
||||
# ^END, exit shared ctx.
|
||||
await trio.sleep_forever()
|
||||
|
||||
async def main():
|
||||
with trio.fail_after(delay + 2):
|
||||
|
|
@ -410,7 +385,6 @@ def test_lock_not_corrupted_on_fast_cancel(
|
|||
debug_mode=debug_mode,
|
||||
loglevel=loglevel,
|
||||
),
|
||||
# ?TODO, pass this as the parent tn?
|
||||
trio.open_nursery() as tn,
|
||||
):
|
||||
get_console_log('info')
|
||||
|
|
@ -418,206 +392,15 @@ def test_lock_not_corrupted_on_fast_cancel(
|
|||
cs = tn.cancel_scope
|
||||
tn.start_soon(
|
||||
use_moc,
|
||||
cs,
|
||||
delay,
|
||||
cs if cancel_by_cs else None,
|
||||
name='child',
|
||||
)
|
||||
with trio.CancelScope() as rent_cs:
|
||||
await use_moc(
|
||||
cs=rent_cs,
|
||||
delay=delay,
|
||||
cs=rent_cs if cancel_by_cs else None,
|
||||
)
|
||||
|
||||
trio.run(main)
|
||||
|
||||
|
||||
@acm
|
||||
async def acm_with_resource(resource_id: str):
|
||||
'''
|
||||
Yield `resource_id` as the cached value.
|
||||
|
||||
Used to verify per-`ctx_key` isolation when the same
|
||||
`acm_func` is called with different kwargs.
|
||||
|
||||
'''
|
||||
yield resource_id
|
||||
|
||||
|
||||
def test_per_ctx_key_resource_lifecycle(
|
||||
debug_mode: bool,
|
||||
loglevel: str,
|
||||
):
|
||||
'''
|
||||
Verify that `maybe_open_context()` correctly isolates resource
|
||||
lifecycle **per `ctx_key`** when the same `acm_func` is called
|
||||
with different kwargs.
|
||||
|
||||
Previously `_Cache.users` was a single global `int` and
|
||||
`_Cache.locks` was keyed on `fid` (function ID), so calling
|
||||
the same `acm_func` with different kwargs (producing different
|
||||
`ctx_key`s) meant:
|
||||
|
||||
- teardown for one key was skipped bc the *other* key's users
|
||||
kept the global count > 0,
|
||||
- and re-entry could hit the old
|
||||
`assert not resources.get(ctx_key)` crash during the
|
||||
teardown window.
|
||||
|
||||
This was the root cause of a long-standing bug in piker's
|
||||
`brokerd.kraken` backend.
|
||||
|
||||
'''
|
||||
timeout: float = 6
|
||||
if debug_mode:
|
||||
timeout = 999
|
||||
|
||||
async def main():
|
||||
a_ready = trio.Event()
|
||||
a_exit = trio.Event()
|
||||
|
||||
async def hold_resource_a():
|
||||
'''
|
||||
Open resource 'a' and keep it alive until signalled.
|
||||
|
||||
'''
|
||||
async with maybe_open_context(
|
||||
acm_with_resource,
|
||||
kwargs={'resource_id': 'a'},
|
||||
) as (cache_hit, value):
|
||||
assert not cache_hit
|
||||
assert value == 'a'
|
||||
log.info("resource 'a' entered (holding)")
|
||||
a_ready.set()
|
||||
await a_exit.wait()
|
||||
log.info("resource 'a' exiting")
|
||||
|
||||
with trio.fail_after(timeout):
|
||||
async with (
|
||||
tractor.open_root_actor(
|
||||
debug_mode=debug_mode,
|
||||
loglevel=loglevel,
|
||||
),
|
||||
trio.open_nursery() as tn,
|
||||
):
|
||||
# Phase 1: bg task holds resource 'a' open.
|
||||
tn.start_soon(hold_resource_a)
|
||||
await a_ready.wait()
|
||||
|
||||
# Phase 2: open resource 'b' (different kwargs,
|
||||
# same acm_func) then exit it while 'a' is still
|
||||
# alive.
|
||||
async with maybe_open_context(
|
||||
acm_with_resource,
|
||||
kwargs={'resource_id': 'b'},
|
||||
) as (cache_hit, value):
|
||||
assert not cache_hit
|
||||
assert value == 'b'
|
||||
log.info("resource 'b' entered")
|
||||
|
||||
log.info("resource 'b' exited, waiting for teardown")
|
||||
await trio.lowlevel.checkpoint()
|
||||
|
||||
# Phase 3: re-open 'b'; must be a fresh cache MISS
|
||||
# proving 'b' was torn down independently of 'a'.
|
||||
#
|
||||
# With the old global `_Cache.users` counter this
|
||||
# would be a stale cache HIT (leaked resource) or
|
||||
# trigger `assert not resources.get(ctx_key)`.
|
||||
async with maybe_open_context(
|
||||
acm_with_resource,
|
||||
kwargs={'resource_id': 'b'},
|
||||
) as (cache_hit, value):
|
||||
assert not cache_hit, (
|
||||
"resource 'b' was NOT torn down despite "
|
||||
"having zero users! (global user count bug)"
|
||||
)
|
||||
assert value == 'b'
|
||||
log.info(
|
||||
"resource 'b' re-entered "
|
||||
"(cache miss, correct)"
|
||||
)
|
||||
|
||||
# Phase 4: let 'a' exit, clean shutdown.
|
||||
a_exit.set()
|
||||
|
||||
trio.run(main)
|
||||
|
||||
|
||||
def test_moc_reentry_during_teardown(
|
||||
debug_mode: bool,
|
||||
loglevel: str,
|
||||
):
|
||||
'''
|
||||
Reproduce the piker `open_cached_client('kraken')` race:
|
||||
|
||||
- same `acm_func`, NO kwargs (identical `ctx_key`)
|
||||
- multiple tasks share the cached resource
|
||||
- all users exit -> teardown starts
|
||||
- a NEW task enters during `_Cache.run_ctx.__aexit__`
|
||||
- `values[ctx_key]` is gone (popped in inner finally)
|
||||
but `resources[ctx_key]` still exists (outer finally
|
||||
hasn't run yet bc the acm cleanup has checkpoints)
|
||||
- old code: `assert not resources.get(ctx_key)` FIRES
|
||||
|
||||
This models the real-world scenario where `brokerd.kraken`
|
||||
tasks concurrently call `open_cached_client('kraken')`
|
||||
(same `acm_func`, empty kwargs, shared `ctx_key`) and
|
||||
the teardown/re-entry race triggers intermittently.
|
||||
|
||||
'''
|
||||
async def main():
|
||||
in_aexit = trio.Event()
|
||||
|
||||
@acm
|
||||
async def cached_client():
|
||||
'''
|
||||
Simulates `kraken.api.get_client()`:
|
||||
- no params (all callers share one `ctx_key`)
|
||||
- slow-ish cleanup to widen the race window
|
||||
between `values.pop()` and `resources.pop()`
|
||||
inside `_Cache.run_ctx`.
|
||||
|
||||
'''
|
||||
yield 'the-client'
|
||||
# Signal that we're in __aexit__ — at this
|
||||
# point `values` has already been popped by
|
||||
# `run_ctx`'s inner finally, but `resources`
|
||||
# is still alive (outer finally hasn't run).
|
||||
in_aexit.set()
|
||||
await trio.sleep(10)
|
||||
|
||||
first_done = trio.Event()
|
||||
|
||||
async def use_and_exit():
|
||||
async with maybe_open_context(
|
||||
cached_client,
|
||||
) as (cache_hit, value):
|
||||
assert value == 'the-client'
|
||||
first_done.set()
|
||||
|
||||
async def reenter_during_teardown():
|
||||
'''
|
||||
Wait for the acm's `__aexit__` to start (meaning
|
||||
`values` is popped but `resources` still exists),
|
||||
then re-enter — triggering the assert.
|
||||
|
||||
'''
|
||||
await in_aexit.wait()
|
||||
async with maybe_open_context(
|
||||
cached_client,
|
||||
) as (cache_hit, value):
|
||||
assert value == 'the-client'
|
||||
|
||||
with trio.fail_after(5):
|
||||
async with (
|
||||
tractor.open_root_actor(
|
||||
debug_mode=debug_mode,
|
||||
loglevel=loglevel,
|
||||
),
|
||||
collapse_eg(),
|
||||
trio.open_nursery() as tn,
|
||||
):
|
||||
tn.start_soon(use_and_exit)
|
||||
tn.start_soon(reenter_during_teardown)
|
||||
|
||||
trio.run(main)
|
||||
|
|
|
|||
|
|
@ -4,10 +4,6 @@ import trio
|
|||
import pytest
|
||||
|
||||
import tractor
|
||||
|
||||
# XXX `cffi` dun build on py3.14 yet..
|
||||
cffi = pytest.importorskip("cffi")
|
||||
|
||||
from tractor.ipc._ringbuf import (
|
||||
open_ringbuf,
|
||||
RBToken,
|
||||
|
|
@ -18,7 +14,7 @@ from tractor._testing.samples import (
|
|||
generate_sample_messages,
|
||||
)
|
||||
|
||||
# XXX, in case you want to melt your cores, comment this skip line XD
|
||||
# in case you don't want to melt your cores, uncomment dis!
|
||||
pytestmark = pytest.mark.skip
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -49,7 +49,7 @@ def test_infected_root_actor(
|
|||
),
|
||||
to_asyncio.open_channel_from(
|
||||
aio_echo_server,
|
||||
) as (chan, first),
|
||||
) as (first, chan),
|
||||
):
|
||||
assert first == 'start'
|
||||
|
||||
|
|
@ -91,12 +91,13 @@ def test_infected_root_actor(
|
|||
async def sync_and_err(
|
||||
# just signature placeholders for compat with
|
||||
# ``to_asyncio.open_channel_from()``
|
||||
chan: tractor.to_asyncio.LinkedTaskChannel,
|
||||
to_trio: trio.MemorySendChannel,
|
||||
from_trio: asyncio.Queue,
|
||||
ev: asyncio.Event,
|
||||
|
||||
):
|
||||
if chan:
|
||||
chan.started_nowait('start')
|
||||
if to_trio:
|
||||
to_trio.send_nowait('start')
|
||||
|
||||
await ev.wait()
|
||||
raise RuntimeError('asyncio-side')
|
||||
|
|
@ -173,7 +174,7 @@ def test_trio_prestarted_task_bubbles(
|
|||
sync_and_err,
|
||||
ev=aio_ev,
|
||||
)
|
||||
) as (chan, first),
|
||||
) as (first, chan),
|
||||
):
|
||||
|
||||
for i in range(5):
|
||||
|
|
|
|||
|
|
@ -94,15 +94,15 @@ def test_runtime_vars_unset(
|
|||
after the root actor-runtime exits!
|
||||
|
||||
'''
|
||||
assert not tractor.runtime._state._runtime_vars['_debug_mode']
|
||||
assert not tractor._state._runtime_vars['_debug_mode']
|
||||
async def main():
|
||||
assert not tractor.runtime._state._runtime_vars['_debug_mode']
|
||||
assert not tractor._state._runtime_vars['_debug_mode']
|
||||
async with tractor.open_nursery(
|
||||
debug_mode=True,
|
||||
):
|
||||
assert tractor.runtime._state._runtime_vars['_debug_mode']
|
||||
assert tractor._state._runtime_vars['_debug_mode']
|
||||
|
||||
# after runtime closure, should be reverted!
|
||||
assert not tractor.runtime._state._runtime_vars['_debug_mode']
|
||||
assert not tractor._state._runtime_vars['_debug_mode']
|
||||
|
||||
trio.run(main)
|
||||
|
|
|
|||
|
|
@ -110,7 +110,7 @@ def test_rpc_errors(
|
|||
) as n:
|
||||
|
||||
actor = tractor.current_actor()
|
||||
assert actor.is_registrar
|
||||
assert actor.is_arbiter
|
||||
await n.run_in_actor(
|
||||
sleep_back_actor,
|
||||
actor_name=subactor_requests_to,
|
||||
|
|
|
|||
|
|
@ -22,10 +22,6 @@ def unlink_file():
|
|||
async def crash_and_clean_tmpdir(
|
||||
tmp_file_path: str,
|
||||
error: bool = True,
|
||||
rent_cancel: bool = True,
|
||||
|
||||
# XXX unused, but do we really need to test these cases?
|
||||
self_cancel: bool = False,
|
||||
):
|
||||
global _file_path
|
||||
_file_path = tmp_file_path
|
||||
|
|
@ -36,75 +32,43 @@ async def crash_and_clean_tmpdir(
|
|||
assert os.path.isfile(tmp_file_path)
|
||||
await trio.sleep(0.1)
|
||||
if error:
|
||||
print('erroring in subactor!')
|
||||
assert 0
|
||||
|
||||
elif self_cancel:
|
||||
print('SELF-cancelling subactor!')
|
||||
else:
|
||||
actor.cancel_soon()
|
||||
|
||||
elif rent_cancel:
|
||||
await trio.sleep_forever()
|
||||
|
||||
print('subactor exiting task!')
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
'error_in_child',
|
||||
[True, False],
|
||||
ids='error_in_child={}'.format,
|
||||
)
|
||||
@tractor_test
|
||||
async def test_lifetime_stack_wipes_tmpfile(
|
||||
tmp_path,
|
||||
error_in_child: bool,
|
||||
loglevel: str,
|
||||
# log: tractor.log.StackLevelAdapter,
|
||||
# ^TODO, once landed via macos support!
|
||||
):
|
||||
child_tmp_file = tmp_path / "child.txt"
|
||||
child_tmp_file.touch()
|
||||
assert child_tmp_file.exists()
|
||||
path = str(child_tmp_file)
|
||||
|
||||
# NOTE, this is expected to cancel the sub
|
||||
# in the `error_in_child=False` case!
|
||||
timeout: float = (
|
||||
1.6 if error_in_child
|
||||
else 1
|
||||
)
|
||||
try:
|
||||
with trio.move_on_after(timeout) as cs:
|
||||
async with tractor.open_nursery(
|
||||
loglevel=loglevel,
|
||||
) as an:
|
||||
await ( # inlined `tractor.Portal`
|
||||
await an.run_in_actor(
|
||||
crash_and_clean_tmpdir,
|
||||
tmp_file_path=path,
|
||||
error=error_in_child,
|
||||
)
|
||||
).result()
|
||||
with trio.move_on_after(0.5):
|
||||
async with tractor.open_nursery() as n:
|
||||
await ( # inlined portal
|
||||
await n.run_in_actor(
|
||||
crash_and_clean_tmpdir,
|
||||
tmp_file_path=path,
|
||||
error=error_in_child,
|
||||
)
|
||||
).result()
|
||||
|
||||
except (
|
||||
tractor.RemoteActorError,
|
||||
# tractor.BaseExceptionGroup,
|
||||
BaseExceptionGroup,
|
||||
) as _exc:
|
||||
exc = _exc
|
||||
from tractor.log import get_console_log
|
||||
log = get_console_log(
|
||||
level=loglevel,
|
||||
name=__name__,
|
||||
)
|
||||
log.exception(
|
||||
f'Subactor failed as expected with {type(exc)!r}\n'
|
||||
)
|
||||
):
|
||||
pass
|
||||
|
||||
# tmp file should have been wiped by
|
||||
# teardown stack.
|
||||
assert not child_tmp_file.exists()
|
||||
|
||||
if error_in_child:
|
||||
assert not cs.cancel_called
|
||||
else:
|
||||
# expect timeout in some cases?
|
||||
assert cs.cancel_called
|
||||
|
|
|
|||
|
|
@ -2,7 +2,6 @@
|
|||
Shared mem primitives and APIs.
|
||||
|
||||
"""
|
||||
import platform
|
||||
import uuid
|
||||
|
||||
# import numpy
|
||||
|
|
@ -14,20 +13,6 @@ from tractor.ipc._shm import (
|
|||
attach_shm_list,
|
||||
)
|
||||
|
||||
pytestmark = pytest.mark.skipon_spawn_backend(
|
||||
'subint',
|
||||
# NOTE, `main_thread_forkserver` works for these tests
|
||||
# via the `mp.SharedMemory(track=False)` +
|
||||
# `mp.resource_tracker` monkey-patch in `.ipc._mp_bs`.
|
||||
# Without that workaround the fork-inherited
|
||||
# `resource_tracker` fd would EBADF on first shm op +
|
||||
# cascade into `FileExistsError` across parametrize
|
||||
# variants. Tracker doc:
|
||||
# `ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md`.
|
||||
reason=(
|
||||
'subint: GIL-contention hanging class.\n'
|
||||
)
|
||||
)
|
||||
|
||||
@tractor.context
|
||||
async def child_attach_shml_alot(
|
||||
|
|
@ -68,18 +53,7 @@ def test_child_attaches_alot():
|
|||
shm_key=shml.key,
|
||||
) as (ctx, start_val),
|
||||
):
|
||||
assert (_key := shml.key) == start_val
|
||||
|
||||
if platform.system() != 'Darwin':
|
||||
# XXX, macOS has a char limit..
|
||||
# see `ipc._shm._shorten_key_for_macos`
|
||||
assert (
|
||||
start_val
|
||||
==
|
||||
key
|
||||
==
|
||||
_key
|
||||
)
|
||||
assert start_val == key
|
||||
await ctx.result()
|
||||
|
||||
await portal.cancel_actor()
|
||||
|
|
|
|||
Some files were not shown because too many files have changed in this diff.
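The docstring of `test_per_ctx_key_resource_lifecycle` in the diff above attributes the bug to a single global user count shared by every `ctx_key`. The following is a minimal, synchronous sketch of that reasoning only; `GlobalCountCache`, `PerKeyCountCache` and `reopen_b_is_cache_hit` are hypothetical names invented for illustration and are not part of tractor's `_Cache` implementation, which is async and is not reproduced here.

```python
from collections import Counter


class GlobalCountCache:
    """Hypothetical model of the old scheme: one user count for all keys."""

    def __init__(self) -> None:
        self.users = 0
        self.values: dict[str, str] = {}

    def enter(self, key: str) -> bool:
        cache_hit = key in self.values
        self.values.setdefault(key, key)
        self.users += 1
        return cache_hit

    def exit(self, key: str) -> None:
        self.users -= 1
        # Teardown only happens once *every* key has zero users.
        if self.users == 0:
            self.values.pop(key, None)


class PerKeyCountCache:
    """Hypothetical model of the fix: one user count per key."""

    def __init__(self) -> None:
        self.users: Counter[str] = Counter()
        self.values: dict[str, str] = {}

    def enter(self, key: str) -> bool:
        cache_hit = key in self.values
        self.values.setdefault(key, key)
        self.users[key] += 1
        return cache_hit

    def exit(self, key: str) -> None:
        self.users[key] -= 1
        # Teardown depends only on this key's own users.
        if self.users[key] == 0:
            del self.users[key]
            self.values.pop(key, None)


def reopen_b_is_cache_hit(cache) -> bool:
    """Replay the test's phases: hold 'a', open and close 'b', reopen 'b'."""
    cache.enter('a')
    cache.enter('b')
    cache.exit('b')
    return cache.enter('b')


stale_hit = reopen_b_is_cache_hit(GlobalCountCache())
fresh_miss = reopen_b_is_cache_hit(PerKeyCountCache())
```

Under these assumptions the global-count model leaves `'b'` cached while `'a'` is still held, so re-opening `'b'` is a stale hit, whereas the per-key model tears `'b'` down independently and the re-open is a miss, which is the outcome the test's Phase 3 asserts.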