---
name: piker-conc-expert
description: >
  Distributed-runtime and structured-concurrency
  expertise for piker's `tractor` actor-tree. Apply
  when working on daemon/service architecture, actor
  spawning/discovery, cross-actor RPC (ctx/stream
  eps), `to_asyncio` integration, cancellation
  semantics, or debugging hangs/wedges/skews in the
  actor system.
user-invocable: false
---

# Piker Concurrency & Runtime Expertise

The distilled mental model for piker's distributed
runtime: a `trio`-structured actor tree supervised by
`tractor` (pinned to git main) where every long-lived
subsystem is a named daemon-actor talking over
ctx/stream IPC.

## Actor tree & daemon taxonomy

```
pikerd                      root supervisor + registry
├── datad.<broker>          feed bus, shm writers, tsp
│                           history, symbol search
├── brokerd.<broker>        live order-ctl ONLY; lazily
│                           spawned by emsd, credentialed
├── emsd                    dark-clearing + order routing
│   └── paperboi.<broker>   sim-clearing (paper mode)
└── samplerd                singleton OHLC clock/increment
```

Key invariants:

- `datad` hosts all `piker.data.validate._eps['datad']`
  eps; `brokerd` hosts only the `['brokerd']`
  (order-ctl) ones. The `_eps` table in
  `piker/data/validate.py` is the authoritative
  contract; `get_eps(mod, kind)` introspects a
  backend's support.
- `brokerd.<broker>` is booted in EXACTLY one place:
  `open_brokerd_dialog()` in `piker/clearing/_ems.py`
  (with a `portal:` override for the `piker ledger`
  ad-hoc actor). Chart-only + paper sessions run with
  ZERO brokerd procs. Never add a data-path spawn!
- backends declare per-daemon-kind submods via
  `_datad_mods`/`_brokerd_mods` in their
  `__init__.py` (fallback: `__enable_modules__`).
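
A minimal sketch of the kind of introspection the `_eps` table enables, plus the per-kind submod lookup with its `__enable_modules__` fallback. Everything here is hypothetical: `EPS_SKETCH` is an illustrative subset (not copied from `piker/data/validate.py`), `get_eps_sketch()` and `daemon_mods()` are made-up names whose return shapes may not match the real `get_eps()`, and it assumes `_datad_mods`/`_brokerd_mods` list submodule names relative to the backend package.

```python
from types import ModuleType

# HYPOTHETICAL, abbreviated stand-in for `piker.data.validate._eps`;
# the real table's keys/entries may differ.
EPS_SKETCH: dict[str, list[str]] = {
    'datad': [
        'stream_quotes',
        'open_history_client',
        'open_symbol_search',
        'get_mkt_info',
    ],
    'brokerd': [
        'open_trade_dialog',
    ],
}


def get_eps_sketch(mod: ModuleType, kind: str) -> dict[str, bool]:
    '''
    Map each `kind` endpoint name to whether `mod` defines it.
    '''
    return {ep: hasattr(mod, ep) for ep in EPS_SKETCH[kind]}


def daemon_mods(pkg: ModuleType, kind: str) -> list[str]:
    '''
    Resolve the RPC-enabled submods for daemon `kind`
    (eg. 'datad' -> `_datad_mods`), falling back to the
    package's `__enable_modules__` when no per-kind list
    is declared. ASSUMES entries are relative submod names.
    '''
    subs = getattr(pkg, f'_{kind}_mods', None)
    if subs is None:
        subs = getattr(pkg, '__enable_modules__', [])
    return [f'{pkg.__name__}.{sub}' for sub in subs]


if __name__ == '__main__':
    # a fake backend pkg declaring per-kind submods
    be = ModuleType('piker.brokers.fakebe')
    be._datad_mods = ['api', 'feed', 'symbols']
    be._brokerd_mods = ['api', 'broker']
    be.stream_quotes = be.get_mkt_info = be.open_trade_dialog = object()
    print(get_eps_sketch(be, 'datad'))
    print(daemon_mods(be, 'brokerd'))
```

Keeping the per-kind lists separate is what lets `brokerd`'s enable set stay limited to order-ctl submods while `datad` gets the feed ones.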

## Daemon lifecycle conventions

Every daemon-kind follows the same trio of fns (see
`piker/brokers/_daemon.py` + `piker/data/_daemon.py`
as the canonical pair):

- `_setup_persistent_<kind>()`: a `@tractor.context`
  "lifetime fixture" run via
  `Services.start_service_task()`; does console-log
  setup ONCE for the actor, allocs any actor-global
  state (eg. datad's `_FeedsBus`), then
  `await ctx.started()` + `trio.sleep_forever()`.
- `<kind>_init()`: builds `enable_modules` + actor
  name `f'<kind>.{brokername}'` and copies backend
  `_spawn_kwargs` (CRITICAL: `ib` needs
  `infect_asyncio=True` in EVERY daemon-kind).
- `spawn_<kind>()` + `maybe_spawn_<kind>()`: thin
  wrappers over `Services.actor_n.start_actor()` and
  `piker.service.maybe_spawn_daemon()` (registry
  find-or-spawn w/ per-service-name locking).
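
The find-or-spawn-with-locking behaviour attributed to `maybe_spawn_daemon()` can be modelled in a few lines of plain `asyncio`. This is a toy under stated assumptions: `ToyServices`, its in-memory `registry` dict and the fake `_spawn()` are hypothetical stand-ins for the real registry actor and `Services.actor_n.start_actor()`, and none of it is piker's actual code. It shows why the lock is keyed per service name: concurrent requests for the same `'<kind>.<broker>'` collapse onto one spawn, while different services still boot in parallel.

```python
import asyncio
from collections import defaultdict


class ToyServices:
    '''
    HYPOTHETICAL in-proc model of a service registry with
    find-or-spawn semantics; not the `piker.service` API.
    '''

    def __init__(self) -> None:
        self.registry: dict[str, str] = {}  # name -> fake addr
        self.locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self.spawned: list[str] = []

    async def _spawn(self, name: str) -> str:
        # stand-in for actually starting a subactor proc
        await asyncio.sleep(0.01)  # simulated boot latency
        self.spawned.append(name)
        addr = f'fake-addr://{name}'
        self.registry[name] = addr
        return addr

    async def maybe_spawn_daemon(self, kind: str, brokername: str) -> str:
        name = f'{kind}.{brokername}'
        # lock per service name: a 2nd caller for the same name
        # blocks until the 1st has registered, then finds it.
        async with self.locks[name]:
            addr = self.registry.get(name)
            if addr is None:
                addr = await self._spawn(name)
            return addr


async def demo() -> tuple[list[str], list[str]]:
    svcs = ToyServices()
    addrs = await asyncio.gather(
        svcs.maybe_spawn_daemon('datad', 'binance'),
        svcs.maybe_spawn_daemon('datad', 'binance'),
        svcs.maybe_spawn_daemon('datad', 'binance'),
        svcs.maybe_spawn_daemon('datad', 'kraken'),
    )
    return list(addrs), sorted(svcs.spawned)


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

A single global lock would also prevent duplicate spawns, but it would serialize unrelated daemons' boot latency; keying by name avoids that.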

Caps-sec model: `enable_modules` gates RPC entry ONLY;
Python imports are unrestricted in-proc. Keep each
daemon's enable set minimal; the (credentialed)
`brokerd` must never RPC-enable `piker.data.*` feed
mods.
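
One way to keep that rule from regressing is a small check over each daemon's enable set, eg. in a test. A hedged sketch: `audit_enable_modules()` is a hypothetical helper (not part of piker) and the sample module lists are illustrative only.

```python
def audit_enable_modules(kind: str, enable_modules: list[str]) -> list[str]:
    '''
    Return RPC-enabled modules that violate the caps-sec rule:
    the credentialed `brokerd` must not expose `piker.data.*`.
    Other daemon kinds are not checked here.
    '''
    if kind != 'brokerd':
        return []
    return [
        mod for mod in enable_modules
        # match the pkg itself or a true submodule, not eg. 'piker.database'
        if mod == 'piker.data' or mod.startswith('piker.data.')
    ]


if __name__ == '__main__':
    bad = audit_enable_modules(
        'brokerd',
        ['piker.brokers.ib.api', 'piker.brokers.ib.broker', 'piker.data.feed'],
    )
    print(bad)  # ['piker.data.feed']
```

Since imports stay unrestricted, a check like this only guards the RPC surface, which is exactly what `enable_modules` controls.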

## Actor-local state: the #1 split hazard

Module-globals and instance caches are PER-ACTOR.
Anything that "just worked" because two subsystems
shared a process will break when they're split into
sibling actors. Canonical example: `ib`'s
`Client._contracts` was warmed by feed-side
`get_mkt_info()` in-proc; post datad/brokerd-split
the trading actor must warm it itself (eagerly at
`open_trade_dialog()` startup for open pps/orders +
lazily per order request via
`symbols.cache_contract()`).

When moving code across actor boundaries ALWAYS audit:

- module-global registries (`feed._bus`,
  `_accounts2clients`, `_client_cache`, ..)
- `@async_lifo_cache`/`maybe_open_context` caches
  (NOTE: `async_lifo_cache` keys on POSITIONAL args
  only; a cache-hit SKIPS the fn body and thus any
  side-effect writes!)
- logging handler placement (see gotchas.md)
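
The cache hazard is easy to reproduce. This toy `positional_only_cache` decorator is hypothetical: it only mimics the two properties noted above (positional-only key, body skipped on a hit), not the real `async_lifo_cache` implementation or its eviction policy. `load_mkt()` is likewise made up; a later call requesting the side effect via a kwarg is served from the cache, so the module-global (i.e. per-actor) `_contracts` is never written.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable

# module-global == per-actor (per-process) state
_contracts: dict[str, str] = {}


def positional_only_cache(
    fn: Callable[..., Awaitable[Any]],
) -> Callable[..., Awaitable[Any]]:
    '''
    TOY async cache keyed on POSITIONAL args only.
    '''
    cache: dict[tuple, Any] = {}

    @functools.wraps(fn)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        if args in cache:
            # hit: the body, and any side effects in it, is skipped
            return cache[args]
        result = await fn(*args, **kwargs)
        cache[args] = result
        return result

    return wrapper


@positional_only_cache
async def load_mkt(fqme: str, *, warm_contract: bool = False) -> str:
    if warm_contract:
        _contracts[fqme] = f'contract<{fqme}>'  # side-effect write
    return f'mkt<{fqme}>'


async def main() -> tuple[str, str, dict[str, str]]:
    first = await load_mkt('mnq.cme.ib')
    # kwarg is not part of the key -> cache hit -> no warm-up!
    second = await load_mkt('mnq.cme.ib', warm_contract=True)
    return first, second, dict(_contracts)


if __name__ == '__main__':
    print(asyncio.run(main()))
```

This is why the text has the trading actor warm its own contract cache explicitly instead of relying on a cached call's side effects.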

## tractor primitives as used here

- `@tractor.context` eps: `await ctx.started(val)`
  unblocks the caller w/ `val`; long-lived eps then
  `ctx.open_stream()` or `sleep_forever()`.
- discovery: `tractor.find_actor()` via
  `piker.service.find_service()`;
  `wait_for_actor(name, registry_addr=...)`;
  `query_actor(name, regaddr=...)` yields
  `(sockaddr, portal)`. Addrs are wrapped
  `tractor.discovery._addr.Address` types; use
  `wrap_address()` to normalize raw tuples and
  `.unwrap()` for comparisons.
- runtime-vars: `_runtime_vars['piker_vars']` is
  inherited down the spawn tree; used eg. for
  `piker_test_dir` config isolation. Read it LAZILY at
  use-time, never at import time (subactors only get
  vars post runtime-boot).
- cancellation semantics (modern tractor): a
  `ContextCancelled` whose `.canceller` is your own
  actor is ABSORBED (clean exit, nothing raised);
  single-exc groups collapse (`collapse_eg`) so eg.
  a KBI propagates bare. Exc attrs:
  `RemoteActorError.boxed_type` (not `.type`).

## `to_asyncio` (infect-asyncio) integration

For `ib` (and `deribit`) the backend client runs on
an embedded `asyncio` loop via
`tractor.to_asyncio.open_channel_from()` +
`LinkedTaskChannel`.

Rules learned the hard way:

- a shared req/resp channel MUST correlate responses
  to requests (see `MethodProxy._run_method()`'s
  `mid` protocol in `piker/brokers/ib/api.py`);
  otherwise caller cancellation (eg. `move_on_after`
  timeouts) orphans a response and silently skews
  every later result off-by-one.
- the aio-side relay must catch + ship back ALL
  (non-cancel) exceptions as `{'exception': err}`
  resps; an escaping error kills the relay task ->
  channel -> proxy nursery -> the whole dialog,
  bypassing every caller-side guard.
- `TrioTaskExited` ("child asyncio task is still
  running?") on teardown is a known wart family;
  prefer upstream `tractor` fixes over piker-side
  bandaids.
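
The first two rules can be modelled with plain `asyncio` queues. A hedged sketch only: `CorrelatedProxy` is a hypothetical toy, not `MethodProxy._run_method()`, and its two queues stand in for a `LinkedTaskChannel`. Each request carries a `mid`, the relay always replies (boxing non-cancel errors as `{'exception': err}`), and a router drops replies whose caller already gave up. Without the `mid`, the timed-out `echo(x=1)` reply in the demo would be handed to whichever caller read next, which is exactly the off-by-one skew described above.

```python
import asyncio
import itertools
from typing import Any, Awaitable, Callable


class CorrelatedProxy:
    '''
    TOY req/resp proxy over one shared pair of queues. Every
    request carries a msg id (`mid`) which its response
    echoes, so each reply is routed to the caller that asked.
    '''

    def __init__(self, handlers: dict[str, Callable[..., Awaitable[Any]]]) -> None:
        self._handlers = handlers
        self._reqs: asyncio.Queue = asyncio.Queue()
        self._resps: asyncio.Queue = asyncio.Queue()
        self._waiters: dict[int, asyncio.Future] = {}
        self._mids = itertools.count()

    async def relay(self) -> None:
        # "aio-side" relay: ALWAYS replies; non-cancel errors are
        # boxed and shipped back instead of killing this task.
        while True:
            mid, meth, kwargs = await self._reqs.get()
            try:
                result = await self._handlers[meth](**kwargs)
            except Exception as err:
                await self._resps.put({'mid': mid, 'exception': err})
            else:
                await self._resps.put({'mid': mid, 'result': result})

    async def route(self) -> None:
        # caller-side router: match each reply to its waiter by `mid`
        while True:
            msg = await self._resps.get()
            fut = self._waiters.pop(msg['mid'], None)
            if fut is None or fut.done():
                continue  # orphaned reply (caller gave up): drop it
            if 'exception' in msg:
                fut.set_exception(msg['exception'])
            else:
                fut.set_result(msg['result'])

    async def run_method(self, meth: str, **kwargs: Any) -> Any:
        mid = next(self._mids)
        fut = asyncio.get_running_loop().create_future()
        self._waiters[mid] = fut
        await self._reqs.put((mid, meth, kwargs))
        try:
            return await fut
        finally:
            # on timeout/cancel, forget the waiter so a late
            # reply is recognised as orphaned
            self._waiters.pop(mid, None)


async def demo() -> list[Any]:
    async def echo(x: int, delay: float = 0) -> int:
        await asyncio.sleep(delay)
        return x

    async def boom() -> None:
        raise ValueError('backend error')

    proxy = CorrelatedProxy({'echo': echo, 'boom': boom})
    bg = [asyncio.create_task(proxy.relay()), asyncio.create_task(proxy.route())]
    out: list[Any] = []
    try:
        try:
            # caller times out; its late reply must NOT reach anyone else
            await asyncio.wait_for(proxy.run_method('echo', x=1, delay=0.2), timeout=0.05)
        except asyncio.TimeoutError:
            out.append('timeout')
        out.append(await proxy.run_method('echo', x=2))
        try:
            await proxy.run_method('boom')
        except ValueError as err:
            out.append(f'remote error: {err}')
        out.append(await proxy.run_method('echo', x=3))
    finally:
        for task in bg:
            task.cancel()
        await asyncio.gather(*bg, return_exceptions=True)
    return out


if __name__ == '__main__':
    print(asyncio.run(demo()))
    # ['timeout', 2, 'remote error: backend error', 3]
```

Note the relay catches `Exception`, not `BaseException`, so cancellation still propagates normally while backend errors surface at the caller that issued the request.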

See [gotchas.md](gotchas.md) for the symptom->cause
registry and [debug-recipes.md](debug-recipes.md) for
forensics techniques.