diff --git a/.claude/skills/piker-conc-expert/SKILL.md b/.claude/skills/piker-conc-expert/SKILL.md
new file mode 100644
index 00000000..550087d0
--- /dev/null
+++ b/.claude/skills/piker-conc-expert/SKILL.md
@@ -0,0 +1,150 @@
+---
+name: piker-conc-expert
+description: >
+  Distributed-runtime and structured-concurrency
+  expertise for piker's `tractor` actor-tree. Apply
+  when working on daemon/service architecture, actor
+  spawning/discovery, cross-actor RPC (ctx/stream
+  eps), `to_asyncio` integration, cancellation
+  semantics, or debugging hangs/wedges/skews in the
+  actor system.
+user-invocable: false
+---
+
+# Piker Concurrency & Runtime Expertise
+
+The distilled mental model for piker's distributed
+runtime: a `trio`-structured actor tree supervised by
+`tractor` (pinned to git main) where every long-lived
+subsystem is a named daemon-actor talking over
+ctx/stream IPC.
+
+## Actor tree & daemon taxonomy
+
+```
+pikerd                  root supervisor + registry
+├── datad.<broker>      feed bus, shm writers, tsp
+│                       history, symbol search
+├── brokerd.<broker>    live order-ctl ONLY; lazily
+│                       spawned by emsd, credentialed
+├── emsd                dark-clearing + order routing
+│   └── paperboi.<broker>  sim-clearing (paper mode)
+└── samplerd            singleton OHLC clock/increment
+```
+
+Key invariants:
+- `datad` hosts all `piker.data.validate._eps['datad']`
+  eps; `brokerd` only the `['brokerd']` (order-ctl)
+  ones. The `_eps` table in `piker/data/validate.py`
+  is the authoritative contract; `get_eps(mod, kind)`
+  introspects a backend's support.
+- `brokerd.<broker>` is booted in EXACTLY one place:
+  `open_brokerd_dialog()` in `piker/clearing/_ems.py`
+  (with a `portal:` override for the `piker ledger`
+  ad-hoc actor). Chart-only + paper sessions run with
+  ZERO brokerd procs. Never add a data-path spawn!
+- backends declare per-daemon-kind submods via
+  `_datad_mods`/`_brokerd_mods` in their
+  `__init__.py` (fallback: `__enable_modules__`).
+
+## Daemon lifecycle conventions
+
+Every daemon-kind follows the same trio of fns (see
+`piker/brokers/_daemon.py` + `piker/data/_daemon.py`
+as the canonical pair):
+
+- `_setup_persistent_<kind>()`: a `@tractor.context`
+  "lifetime fixture" run via
+  `Services.start_service_task()`; does console-log
+  setup ONCE for the actor, allocs any actor-global
+  state (eg. datad's `_FeedsBus`), then
+  `await ctx.started()` + `trio.sleep_forever()`.
+- `<kind>_init()`: builds `enable_modules` + actor
+  name `f'<kind>.{brokername}'` and copies backend
+  `_spawn_kwargs` (CRITICAL: `ib` needs
+  `infect_asyncio=True` in EVERY daemon-kind).
+- `spawn_<kind>()` + `maybe_spawn_<kind>()`: thin
+  wrappers over `Services.actor_n.start_actor()` and
+  `piker.service.maybe_spawn_daemon()` (registry
+  find-or-spawn w/ per-service-name locking).
+
+Caps-sec model: `enable_modules` gates RPC entry ONLY
+— python imports are unrestricted in-proc. Keep each
+daemon's enable set minimal; the (credentialed)
+`brokerd` must never RPC-enable `piker.data.*` feed
+mods.
+
+## Actor-local state: the #1 split hazard
+
+Module-globals and instance caches are PER-ACTOR.
+Anything that "just worked" because two subsystems
+shared a process will break when they're split into
+sibling actors. Canonical example: `ib`'s
+`Client._contracts` was warmed by feed-side
+`get_mkt_info()` in-proc; post datad/brokerd-split
+the trading actor must warm it itself (eagerly at
+`open_trade_dialog()` startup for open pps/orders +
+lazily per order request via
+`symbols.cache_contract()`).
+
+When moving code across actor boundaries ALWAYS audit:
+- module-global registries (`feed._bus`,
+  `_accounts2clients`, `_client_cache`, ..)
+- `@async_lifo_cache`/`maybe_open_context` caches
+  (NOTE: `async_lifo_cache` keys on POSITIONAL args
+  only; a cache-hit SKIPS the fn body and thus any
+  side-effect writes!)
+- logging handler placement (see gotchas.md)
+
+## tractor primitives as used here
+
+- `@tractor.context` eps: `await ctx.started(val)`
+  unblocks the caller w/ `val`; long-lived eps then
+  `ctx.open_stream()` or `sleep_forever()`.
+- discovery: `tractor.find_actor()` via
+  `piker.service.find_service()`;
+  `wait_for_actor(name, registry_addr=...)`;
+  `query_actor(name, regaddr=...)` yields
+  `(sockaddr, portal)`. Addrs are wrapped
+  `tractor.discovery._addr.Address` types — use
+  `wrap_address()` to normalize raw tuples and
+  `.unwrap()` for comparisons.
+- runtime-vars: `_runtime_vars['piker_vars']` is
+  inherited down the spawn tree; used eg. for
+  `piker_test_dir` config isolation — read LAZILY at
+  use-time, never at import time (subactors only get
+  vars post runtime-boot).
+- cancellation semantics (modern tractor): a
+  `ContextCancelled` whose `.canceller` is your own
+  actor is ABSORBED (clean exit, nothing raised);
+  single-exc groups collapse (`collapse_eg`) so eg.
+  a KBI propagates bare. Exc attrs:
+  `RemoteActorError.boxed_type` (not `.type`).
+
+## `to_asyncio` (infect-asyncio) integration
+
+For `ib` (and `deribit`) the backend client runs on
+an embedded `asyncio` loop via
+`tractor.to_asyncio.open_channel_from()` +
+`LinkedTaskChannel`.
+
+Rules learned the hard way:
+- a shared req/resp channel MUST correlate responses
+  to requests (see `MethodProxy._run_method()`'s
+  `mid` protocol in `piker/brokers/ib/api.py`):
+  caller cancellation (eg. `move_on_after` timeouts)
+  otherwise orphans a response and silently skews
+  every later result off-by-one.
+- the aio-side relay must catch + ship back ALL
+  (non-cancel) exceptions as `{'exception': err}`
+  resps; an escaping error kills the relay task ->
+  channel -> proxy nursery -> the whole dialog,
+  bypassing every caller-side guard.
+- `TrioTaskExited` ("child asyncio task is still
+  running?") on teardown is a known wart family;
+  prefer upstream `tractor` fixes over piker-side
+  bandaids.
+
+See [gotchas.md](gotchas.md) for the symptom->cause
+registry and [debug-recipes.md](debug-recipes.md) for
+forensics techniques.
diff --git a/.claude/skills/piker-conc-expert/debug-recipes.md b/.claude/skills/piker-conc-expert/debug-recipes.md
new file mode 100644
index 00000000..6ef61517
--- /dev/null
+++ b/.claude/skills/piker-conc-expert/debug-recipes.md
@@ -0,0 +1,100 @@
+# Debug recipes: actor-system forensics
+
+Field-tested techniques for diagnosing hangs, wedges
+and cross-actor state bugs WITHOUT a debugger attached
+(or when `py-spy` ain't installed).
+
+## Wedged actor triage (no REPL)
+
+1. find the tree:
+   `ps -eo pid,etime,args | grep -E 'pytest|tractor._child'`
+   — long-`etime` `tractor._child` procs w/ a stuck
+   parent = wedge.
+2. kernel state:
+   `cat /proc/<pid>/wchan` + `status | grep -E
+   'State|Threads'` — `do_epoll_wait` + sleeping =
+   idle event loop, NOT cpu-spin.
+3. **the money read** — socket queues:
+   `ss -tnp | grep <pid>`
+   - `Recv-Q > 0` on the parent-IPC conn = the actor
+     STOPPED CONSUMING its msg loop (runtime bug),
+     parent is waiting on it.
+   - zero external (api/ws) conns = wedged before/
+     without provider IO; don't blame the network.
+   - `CLOSE-WAIT` lingerers = unclean peer teardown.
+4. cleanup: `pkill -f tractor._child` (NB: in
+   compound shell cmds `pkill`'s exit code poisons
+   `&&` chains — run it standalone).
+
+## Hang-proof test gating
+
+- per-suite, never combined (cross-suite session
+  state interacts w/ the 2nd-boot wedge):
+  `timeout -k 5 300 python -m pytest tests/<one>.py -q`
+- rc 124/143 = hang-kill -> retry ONCE before
+  investigating.
+- isolate a flaky test w/ a 3x loop; ~50% hit-rate
+  signatures match the known 2nd-boot wedge (see
+  gotchas.md).
+
+## Regression vs pre-existing attribution
+
+When a failure appears mid-refactor:
+1. `git stash -u` (or checkout the file subset) and
+   re-run the EXACT failing case at baseline.
+2. if baseline can't even run, selectively revert
+   ONLY the suspect layer:
+   `git diff <files> > /tmp/x.patch;
+    git checkout <files>` -> test ->
+   `git apply /tmp/x.patch`.
+3. flake-rate compare (3x runs) beats single-shot
+   conclusions.
+
+## Off-by-one / stale IPC resp detection
+
+Mismatched query->result content in logs (resp
+payload obviously for a prior request) = shared
+req/resp channel w/o correlation + a cancelled
+caller. Grep the ep for `move_on_after`/`fail_after`
+around proxied calls. Fix = req-id (`mid`) tagging,
+never "just a lock" (cancellation still orphans).
+
+## Logging-chain audits
+
+When records double-print or go bare (see gotchas.md):
+
+```python
+import logging
+l = logging.getLogger('piker.brokers.ib.broker')
+while l:
+    print(l.name, l.level, l.handlers, l.propagate)
+    l = l.parent
+```
+
+Exactly ONE stderr handler should exist in the chain,
+attached by the actor's daemon fixture.
+
+## Live actor-tree smoke (headless)
+
+Boot against an ALT registry port so a user's running
+stack is untouched; script in a REAL file (tractor
+children re-exec `__main__` from path — stdin scripts
+crash w/ `FileNotFoundError: .../<stdin>`):
+
+```python
+async with maybe_open_pikerd(
+    registry_addrs=[('127.0.0.1', 6979)],
+):
+    async with open_feed(['xbtusdt.kraken']) as feed:
+        assert await check_for_service('datad.kraken')
+        assert not await check_for_service(
+            'brokerd.kraken'
+        )
+```
+
+## In-proc fail-fast unit checks
+
+Spawn-path guards that raise BEFORE touching the
+runtime can be tested w/ a bare `trio.run()` (eg
+`spawn_brokerd('kucoin')` raising the datad-only
+error) — no pikerd needed.
diff --git a/.claude/skills/piker-conc-expert/gotchas.md b/.claude/skills/piker-conc-expert/gotchas.md
new file mode 100644
index 00000000..d1519c97
--- /dev/null
+++ b/.claude/skills/piker-conc-expert/gotchas.md
@@ -0,0 +1,116 @@
+# Known gotchas: symptom -> cause -> fix
+
+A registry of distributed-runtime failure modes hit
+(and diagnosed) in the field; check here FIRST when a
+log/traceback matches.
+
+## "Can not order ..., no qualified contract cached"
+
+- **Symptom**: `RuntimeError` from
+  `ib.api.Client.submit_limit()` w/ empty
+  `Client._contracts` in `brokerd.ib`.
+- **Cause**: per-actor cache never warmed; feed-side
+  qualification now lives in `datad.ib`.
+- **Fix(ed)**: eager warmup at `open_trade_dialog()`
+  start + lazy per-order `get_mkt_info()` +
+  `cache_contract()` (writes BOTH `mkt.bs_fqme` and
+  `mkt.fqme` keys; different consumers read each!).
+
+## Search returns results for the WRONG pattern
+
+- **Symptom**: fqme search for 'gld' returns nvda
+  results; next query returns the prior query's set.
+- **Cause**: `MethodProxy` channel off-by-one — a
+  caller cancelled (search `move_on_after` timeout)
+  after sending its request orphans the response;
+  every later caller consumes the previous resp.
+- **Fix(ed)**: `mid` req-id correlation in
+  `_run_method()` + relay; stale resps are dropped w/
+  a "Dropping stale method-resp" warning. If that
+  warning spams, some caller is being cancelled
+  mid-call habitually — find + fix its timeout.
+
+## One bad request crashes a whole dialog/actor
+
+- **Symptom**: `TrioTaskExited` storm + nursery
+  teardown after a single method error (eg ambiguous
+  contract `AttributeError`).
+- **Cause**: exception escaped the aio-side relay
+  loop (`open_aio_client_method_relay()`) killing
+  channel + proxy nursery; caller-side `try/except`
+  CANNOT catch it.
+- **Fix(ed)**: relay catches `Exception` -> ships
+  `{'exception': err, 'mid': ...}` resp; order
+  handler converts to EMS `BrokerdError` msgs.
+
+## Ambiguous ib contracts -> `NoneType` attr errors
+
+- **Symptom**: `'NoneType' object has no attribute
+  'primaryExchange'` in `find_contracts()`.
+- **Cause**: `qualifyContractsAsync()` returns `None`
+  entries for ambiguous (eg venue-less stonk fqme
+  matching multiple listings: 'gld' -> ARCA/USD +
+  VENTURE/CAD).
+- **Fix(ed)**: filter `None`s + raise descriptive
+  `ValueError` ("use 'gld.arca.ib'").
+
+## Double-printed log records (same task id, 2x)
+
+- **Symptom**: every record from some subsys printed
+  twice w/ identical task ids.
+- **Cause**: stderr handlers attached at TWO levels
+  of one logger-propagation chain (eg daemon fixture
+  on `piker.brokers.ib` + an ep calling
+  `get_console_log(name=__name__)` on the child).
+  tractor's handler-dedup only checks the SAME
+  logger, not ancestors.
+- **Rule**: console handlers are attached ONCE per
+  actor in the `_setup_persistent_*()` fixture; eps
+  needing a different level use `log.setLevel()`
+  ONLY, never `get_console_log()`.
+
+## Bare/non-colorized log lines
+
+- **Symptom**: records w/ no timestamp/actor prefix.
+- **Cause**: NO handler anywhere in the emitting
+  logger's chain -> stdlib `logging.lastResort`. Post
+  actor-splits, a daemon fixture may only cover its
+  own subsys subtree (eg datad's `piker.data.*` but
+  not the backend's `piker.brokers.<broker>.*`).
+- **Fix(ed)**: `_setup_persistent_datad()` enables
+  BOTH `piker.data.<broker>` and
+  `piker.brokers.<broker>` subtrees.
+
+## 2nd in-proc runtime boot wedges (~50%)
+
+- **Symptom**: test hangs when one test proc boots a
+  2nd `pikerd` (eg `test_multi_fill_positions`'s
+  persistence re-check); a zombie `*.{broker}` child
+  lingers w/ unread bytes in its parent-IPC Recv-Q.
+- **Cause**: pre-existing `tractor`-main runtime
+  teardown bug (confirmed independent of piker-layer
+  changes via revert-testing 2026-06).
+- **Mitigation**: run suites per-file wrapped in
+  `timeout -k 5 300 ...`; retry once on rc 124/143.
+  Do NOT chase as a regression of unrelated changes.
+
+## ib client-id collisions post-split
+
+- **Symptom**: 2nd ib daemon burns the full
+  conn-timeout retry cycle connecting to gw/tws.
+- **Cause**: `datad.ib` + `brokerd.ib` both default
+  `client_id=6116` w/ linear `+i` retries.
+- **Fix(ed)**: role-based offsets in
+  `load_aio_clients()`: datad +16, ad-hoc (test/cli)
+  actors +32.
+
+## `async_lifo_cache` skipped side-effects
+
+- **Symptom**: a fn's cache-write side effect
+  (eg `get_mkt_info()` -> `_contracts`) missing for
+  a 2nd client/proxy.
+- **Cause**: cache keys on POSITIONAL args only; a
+  hit skips the body entirely.
+- **Rule**: never rely on cached-fn side effects;
+  perform required writes explicitly at the call
+  site (eg `cache_contract()` after `get_mkt_info`).