Piker Concurrency & Runtime Expertise
The distilled mental model for piker’s distributed runtime: a trio-structured actor tree supervised by tractor (pinned to git main) where every long-lived subsystem is a named daemon-actor talking over ctx/stream IPC.
Actor tree & daemon taxonomy
pikerd                      root supervisor + registry
├── datad.<broker>          feed bus, shm writers, tsp
│                           history, symbol search
├── brokerd.<broker>        live order-ctl ONLY; lazily
│                           spawned by emsd, credentialed
├── emsd                    dark-clearing + order routing
│   └── paperboi.<broker>   sim-clearing (paper mode)
└── samplerd                singleton OHLC clock/increment
Key invariants:
- datad hosts all piker.data.validate._eps['datad'] eps; brokerd hosts only the ['brokerd'] (order-ctl) ones. The _eps table in piker/data/validate.py is the authoritative contract; get_eps(mod, kind) introspects a backend’s support.
- brokerd.<broker> is booted in EXACTLY one place: open_brokerd_dialog() in piker/clearing/_ems.py (with a portal: override for the piker ledger ad-hoc actor). Chart-only + paper sessions run with ZERO brokerd procs. Never add a data-path spawn!
- backends declare per-daemon-kind submods via _datad_mods/_brokerd_mods in their __init__.py (fallback: __enable_modules__).
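The per-kind endpoint contract above can be pictured with a toy table and introspection helper. This is a minimal sketch only: the endpoint names in the table are illustrative placeholders (assumptions, not the real contents of piker/data/validate.py), and this get_eps() is a simplified stand-in for the real one.

```python
from types import SimpleNamespace

# ASSUMPTION: a toy stand-in for the `_eps` table; the endpoint names
# are illustrative placeholders, not the real table contents.
_eps: dict[str, list[str]] = {
    'datad': ['stream_quotes', 'open_history_client', 'open_symbol_search'],
    'brokerd': ['open_trade_dialog'],
}


def get_eps(mod, kind: str) -> dict[str, object]:
    '''
    Return the subset of `kind` endpoints which `mod` actually defines.

    '''
    return {
        name: getattr(mod, name)
        for name in _eps[kind]
        if getattr(mod, name, None) is not None
    }


# a fake data-only backend: it defines no order-ctl endpoint at all
fake_backend = SimpleNamespace(
    stream_quotes=lambda: None,
    open_history_client=lambda: None,
)

datad_eps = get_eps(fake_backend, 'datad')
brokerd_eps = get_eps(fake_backend, 'brokerd')

print(sorted(datad_eps))  # the two feed eps the backend supports
print(brokerd_eps)        # empty: nothing to host in a brokerd
```

An empty result for the 'brokerd' kind is the signal that a backend has nothing to host in a brokerd actor.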
Daemon lifecycle conventions
Every daemon-kind follows the same trio of fns (see piker/brokers/_daemon.py + piker/data/_daemon.py as the canonical pair):
- _setup_persistent_<kind>(): a @tractor.context “lifetime fixture” run via Services.start_service_task(); does console-log setup ONCE for the actor, allocs any actor-global state (eg. datad’s _FeedsBus), then await ctx.started() + trio.sleep_forever().
- <kind>_init(): builds enable_modules + actor name f'<kind>.{brokername}' and copies backend _spawn_kwargs (CRITICAL: ib needs infect_asyncio=True in EVERY daemon-kind).
- spawn_<kind>() + maybe_spawn_<kind>(): thin wrappers over Services.actor_n.start_actor() and piker.service.maybe_spawn_daemon() (registry find-or-spawn w/ per-service-name locking).
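The <kind>_init() step can be sketched as a pure function. This is a hedged illustration, not the code in either _daemon.py: daemon_init_kwargs() is a hypothetical helper, and the backend submodule names passed in are assumed for the example.

```python
from copy import deepcopy


def daemon_init_kwargs(
    kind: str,
    brokername: str,
    service_mod: str,
    backend_mods: list[str],
    spawn_kwargs: dict,
) -> dict:
    '''
    Build the actor name + enable set and carry over the backend's
    spawn kwargs for ONE daemon-kind (hypothetical helper).

    '''
    kwargs = {
        'name': f'{kind}.{brokername}',
        'enable_modules': [service_mod, *backend_mods],
    }
    # copy so a per-kind tweak never mutates the backend's own dict
    kwargs.update(deepcopy(spawn_kwargs))
    return kwargs


# the backend-level spawn kwargs are applied to EVERY daemon-kind
ib_spawn_kwargs = {'infect_asyncio': True}

datad_kwargs = daemon_init_kwargs(
    'datad', 'ib',
    'piker.data._daemon',
    ['piker.brokers.ib.feed'],    # ASSUMPTION: illustrative submod name
    ib_spawn_kwargs,
)
brokerd_kwargs = daemon_init_kwargs(
    'brokerd', 'ib',
    'piker.brokers._daemon',
    ['piker.brokers.ib.broker'],  # ASSUMPTION: illustrative submod name
    ib_spawn_kwargs,
)
print(datad_kwargs['name'], brokerd_kwargs['name'])
```

Routing both kinds through one helper is what keeps a flag like infect_asyncio from being dropped when a new daemon-kind is added.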
Caps-sec model: enable_modules gates RPC entry ONLY — python imports are unrestricted in-proc. Keep each daemon’s enable set minimal; the (credentialed) brokerd must never RPC-enable piker.data.* feed mods.
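One way to keep the brokerd rule honest is a small audit over the enable set before spawning. The helper below is a hypothetical sketch (no such guard is claimed to exist in piker) and the module names are illustrative.

```python
def leaked_data_mods(enable_modules: list[str]) -> list[str]:
    '''
    Return any `piker.data` (sub)modules found in an enable set.

    '''
    return [
        mod for mod in enable_modules
        if mod == 'piker.data' or mod.startswith('piker.data.')
    ]


def check_brokerd_enable_set(enable_modules: list[str]) -> None:
    '''
    Raise if a (credentialed) brokerd would RPC-enable feed mods.

    '''
    leaked = leaked_data_mods(enable_modules)
    if leaked:
        raise ValueError(
            f'brokerd must not RPC-enable feed mods: {leaked}'
        )


# ASSUMPTION: illustrative module names only
good = ['piker.brokers._daemon', 'piker.brokers.ib.broker']
bad = good + ['piker.data.feed']

check_brokerd_enable_set(good)  # passes silently

try:
    check_brokerd_enable_set(bad)
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```

Note this only audits the RPC surface; as stated above, in-proc imports of piker.data.* remain unrestricted.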
Actor-local state: the #1 split hazard
Module-globals and instance caches are PER-ACTOR. Anything that “just worked” because two subsystems shared a process will break when they’re split into sibling actors. Canonical example: ib’s Client._contracts was warmed by feed-side get_mkt_info() in-proc; post datad/brokerd-split the trading actor must warm it itself (eagerly at open_trade_dialog() startup for open pps/orders + lazily per order request via symbols.cache_contract()).
When moving code across actor boundaries ALWAYS audit:
- module-global registries (feed._bus, _accounts2clients, _client_cache, ..)
- @async_lifo_cache/maybe_open_context caches (NOTE: async_lifo_cache keys on POSITIONAL args only; a cache-hit SKIPS the fn body and thus any side-effect writes!)
- logging handler placement (see gotchas.md)
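The cache hazard is easy to reproduce with a toy decorator. The sketch below is plain asyncio and only mimics the two properties named above (positional-only keys, fn body skipped on a hit); toy_async_lifo_cache, its eviction policy and the get_info() example are assumptions, not piker's implementation.

```python
import asyncio
from collections import OrderedDict
from functools import wraps


def toy_async_lifo_cache(maxsize: int = 128):
    '''
    Toy async cache keyed on POSITIONAL args only (hypothetical).

    '''
    def deco(fn):
        cache: OrderedDict = OrderedDict()

        @wraps(fn)
        async def wrapper(*args, **kwargs):
            key = args  # NOTE: kwargs are NOT part of the key
            if key in cache:
                # cache hit: fn body + its side effects are skipped
                return cache[key]
            if len(cache) >= maxsize:
                cache.popitem()  # ASSUMPTION: evict the newest entry
            result = await fn(*args, **kwargs)
            cache[key] = result
            return result

        return wrapper
    return deco


registry: dict[str, str] = {}  # stands in for a module-global table
calls: list[tuple[str, str]] = []


@toy_async_lifo_cache()
async def get_info(fqme: str, *, venue: str = 'default') -> str:
    calls.append((fqme, venue))
    registry[fqme] = venue  # side-effect write
    return f'{fqme}@{venue}'


async def main():
    first = await get_info('btcusdt', venue='binance')
    registry.clear()  # simulate state that must be re-warmed
    second = await get_info('btcusdt', venue='kraken')
    return first, second


first, second = asyncio.run(main())
print(first, second, calls, registry)
```

The second call returns the stale 'binance' result and never rewrites the registry: both the differing kwarg and the side effect are silently lost.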
tractor primitives as used here
- @tractor.context eps: await ctx.started(val) unblocks the caller w/ val; long-lived eps then ctx.open_stream() or sleep_forever().
- discovery: tractor.find_actor() via piker.service.find_service(); wait_for_actor(name, registry_addr=...); query_actor(name, regaddr=...) yields (sockaddr, portal). Addrs are wrapped tractor.discovery._addr.Address types: use wrap_address() to normalize raw tuples and .unwrap() for comparisons.
- runtime-vars: _runtime_vars['piker_vars'] is inherited down the spawn tree; used eg. for piker_test_dir config isolation. Read it LAZILY at use-time, never at import time (subactors only get vars post runtime-boot).
- cancellation semantics (modern tractor): a ContextCancelled whose .canceller is your own actor is ABSORBED (clean exit, nothing raised); single-exc groups collapse (collapse_eg) so eg. a KBI propagates bare. Exc attrs: RemoteActorError.boxed_type (not .type).
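The "read LAZILY" rule for runtime-vars can be shown without tractor at all. In this minimal sketch the module-level _runtime_vars dict and boot_runtime() are stand-ins (assumptions) for the real runtime state and its post-spawn delivery to a subactor.

```python
# ASSUMPTION: a plain dict standing in for the actor's runtime vars,
# which are only filled in once the (sub)actor runtime has booted.
_runtime_vars: dict = {}


def get_test_dir():
    '''
    Lazy accessor: look the value up at call time, every time.

    '''
    return _runtime_vars.get('piker_vars', {}).get('piker_test_dir')


# BAD: an import-time snapshot, taken before the runtime has booted
TEST_DIR_AT_IMPORT = get_test_dir()


def boot_runtime(inherited: dict) -> None:
    '''
    Hypothetical stand-in for vars arriving from the parent actor.

    '''
    _runtime_vars.update(inherited)


boot_runtime({'piker_vars': {'piker_test_dir': '/tmp/piker_test'}})

print(TEST_DIR_AT_IMPORT)  # None: captured too early
print(get_test_dir())      # the inherited test dir
```

Any module-level constant derived from runtime-vars has the same flaw as TEST_DIR_AT_IMPORT, so keep the lookup inside a function.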
to_asyncio (infect-asyncio) integration
For ib (and deribit) the backend client runs on an embedded asyncio loop via tractor.to_asyncio.open_channel_from() + LinkedTaskChannel.
Rules learned the hard way:
- a shared req/resp channel MUST correlate responses to requests (see MethodProxy._run_method()’s mid protocol in piker/brokers/ib/api.py): caller cancellation (eg. move_on_after timeouts) otherwise orphans a response and silently skews every later result off-by-one.
- the aio-side relay must catch + ship back ALL (non-cancel) exceptions as {'exception': err} resps; an escaping error kills the relay task -> channel -> proxy nursery -> the whole dialog, bypassing every caller-side guard.
- TrioTaskExited (“child asyncio task is still running?”) on teardown is a known wart family; prefer upstream tractor fixes over piker-side bandaids.
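The first two rules can be sketched with plain asyncio queues. This is an illustration under stated assumptions, not the MethodProxy code: the real proxy spans a trio/asyncio boundary over a LinkedTaskChannel, whereas relay(), Proxy and the message shapes here are hypothetical and assume a single caller at a time.

```python
import asyncio
import itertools


async def relay(requests, responses, methods):
    '''
    aio-side relay: run each requested method and ALWAYS respond.

    '''
    while True:
        mid, name, args = await requests.get()
        try:
            resp = {'mid': mid, 'result': await methods[name](*args)}
        except Exception as err:
            # `CancelledError` is a `BaseException`: it still escapes
            resp = {'mid': mid, 'exception': err}
        await responses.put(resp)


class Proxy:
    def __init__(self, requests, responses):
        self._requests = requests
        self._responses = responses
        self._mids = itertools.count()

    async def call(self, name, *args):
        mid = next(self._mids)
        await self._requests.put((mid, name, args))
        while True:
            resp = await self._responses.get()
            if resp['mid'] != mid:
                continue  # stale reply to an abandoned request
            if 'exception' in resp:
                raise resp['exception']
            return resp['result']


async def slow(x):
    await asyncio.sleep(0.05)
    return x


async def fast(x):
    return x


async def boom():
    raise ValueError('bad request')


async def main():
    requests, responses = asyncio.Queue(), asyncio.Queue()
    methods = {'slow': slow, 'fast': fast, 'boom': boom}
    task = asyncio.create_task(relay(requests, responses, methods))
    proxy = Proxy(requests, responses)
    error = None
    try:
        try:
            # caller gives up early; the reply still arrives later
            await asyncio.wait_for(proxy.call('slow', 'stale'), 0.01)
        except asyncio.TimeoutError:
            pass
        fresh = await proxy.call('fast', 'fresh')
        try:
            await proxy.call('boom')
        except ValueError as err:
            error = str(err)
        alive = not task.done()
    finally:
        task.cancel()
    return fresh, error, alive


fresh, error, alive = asyncio.run(main())
print(fresh, error, alive)
```

Without the mid check the second call would have returned 'stale', and without the except clause the ValueError would have killed the relay task instead of reaching the caller.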
See gotchas.md for the symptom->cause registry and debug-recipes.md for forensics techniques.