# Known gotchas: symptom -> cause -> fix

A registry of distributed-runtime failure modes hit (and diagnosed) in the field; check here FIRST when a log/traceback matches.

## "Can not order ..., no qualified contract cached"

- **Symptom**: `RuntimeError` from `ib.api.Client.submit_limit()` w/ empty `Client._contracts` in `brokerd.ib`.
- **Cause**: per-actor cache never warmed; feed-side qualification now lives in `datad.ib`.
- **Fix(ed)**: eager warmup at `open_trade_dialog()` start + lazy per-order `get_mkt_info()` + `cache_contract()` (writes BOTH `mkt.bs_fqme` and `mkt.fqme` keys; different consumers read each!).

## Search returns results for the WRONG pattern

- **Symptom**: fqme search for 'gld' returns nvda results; next query returns the prior query's set.
- **Cause**: `MethodProxy` channel off-by-one: a caller cancelled (search `move_on_after` timeout) after sending its request orphans the response; every later caller consumes the previous resp.
- **Fix(ed)**: `mid` req-id correlation in `_run_method()` + relay; stale resps are dropped w/ a "Dropping stale method-resp" warning. If that warning spams, some caller is being cancelled mid-call habitually; find + fix its timeout.

## One bad request crashes a whole dialog/actor

- **Symptom**: `TrioTaskExited` storm + nursery teardown after a single method error (eg ambiguous contract `AttributeError`).
- **Cause**: exception escaped the aio-side relay loop (`open_aio_client_method_relay()`) killing channel + proxy nursery; caller-side `try/except` CANNOT catch it.
- **Fix(ed)**: relay catches `Exception` -> ships `{'exception': err, 'mid': ...}` resp; order handler converts to EMS `BrokerdError` msgs.

## Ambiguous ib contracts -> `NoneType` attr errors

- **Symptom**: `'NoneType' object has no attribute 'primaryExchange'` in `find_contracts()`.
- **Cause**: `qualifyContractsAsync()` returns `None` entries for ambiguous (eg venue-less stonk fqme matching multiple listings: 'gld' -> ARCA/USD + VENTURE/CAD).
- **Fix(ed)**: filter `None`s + raise descriptive `ValueError` ("use 'gld.arca.ib'").

## Double-printed log records (same task id, 2x)

- **Symptom**: every record from some subsys printed twice w/ identical task ids.
- **Cause**: stderr handlers attached at TWO levels of one logger-propagation chain (eg daemon fixture on `piker.brokers.ib` + an ep calling `get_console_log(name=__name__)` on the child). tractor's handler-dedup only checks the SAME logger, not ancestors.
- **Rule**: console handlers are attached ONCE per actor in the `_setup_persistent_*()` fixture; eps needing a different level use `log.setLevel()` ONLY, never `get_console_log()`.

## Bare/non-colorized log lines

- **Symptom**: records w/ no timestamp/actor prefix.
- **Cause**: NO handler anywhere in the emitting logger's chain -> stdlib `logging.lastResort`. Post actor-splits, a daemon fixture may only cover its own subsys subtree (eg datad's `piker.data.*` but not the backend's `piker.brokers..*`).
- **Fix(ed)**: `_setup_persistent_datad()` enables BOTH `piker.data.` and `piker.brokers.` subtrees.

## 2nd in-proc runtime boot wedges (~50%)

- **Symptom**: test hangs when one test proc boots a 2nd `pikerd` (eg `test_multi_fill_positions`'s persistence re-check); a zombie `*.{broker}` child lingers w/ unread bytes in its parent-IPC Recv-Q.
- **Cause**: pre-existing `tractor`-main runtime teardown bug (confirmed independent of piker-layer changes via revert-testing 2026-06).
- **Mitigation**: run suites per-file wrapped in `timeout -k 5 300 ...`; retry once on rc 124/143. Do NOT chase as a regression of unrelated changes.

## ib client-id collisions post-split

- **Symptom**: 2nd ib daemon burns the full conn-timeout retry cycle connecting to gw/tws.
- **Cause**: `datad.ib` + `brokerd.ib` both default `client_id=6116` w/ linear `+i` retries.
- **Fix(ed)**: role-based offsets in `load_aio_clients()`: datad +16, ad-hoc (test/cli) actors +32.
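
The role-based offsets above can be sketched roughly as follows. This is a minimal illustration, NOT the actual `load_aio_clients()` code: the `candidate_client_ids()` helper, the role names, and the assumption that brokerd keeps offset 0 are all hypothetical; only the `6116` base, the `+16`/`+32` offsets and the linear `+i` retries come from the entry above.

```python
BASE_CLIENT_ID = 6116  # shared default named in the entry above

# Offsets per actor role: datad +16 and ad-hoc +32 are from the text;
# brokerd staying at +0 is an assumption made for this sketch.
ROLE_OFFSETS = {
    'brokerd': 0,
    'datad': 16,
    'adhoc': 32,
}


def candidate_client_ids(
    role: str,
    retries: int = 3,
    base: int = BASE_CLIENT_ID,
) -> list[int]:
    '''
    Return the client ids a (hypothetical) connect loop would try,
    in order: a role-specific start followed by linear `+i` retries.

    '''
    start = base + ROLE_OFFSETS[role]
    return [start + i for i in range(retries)]


brokerd_ids = candidate_client_ids('brokerd')
datad_ids = candidate_client_ids('datad')
adhoc_ids = candidate_client_ids('adhoc')

# With distinct starting points the retry windows no longer overlap,
# so a 2nd daemon does not collide on its very first attempt.
assert not set(brokerd_ids) & set(datad_ids)
assert not set(datad_ids) & set(adhoc_ids)
```

Note that with offsets spaced 16 apart, the windows only stay disjoint while each role retries fewer than 16 times.
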
## `async_lifo_cache` skipped side-effects

- **Symptom**: a fn's cache-write side effect (eg `get_mkt_info()` -> `_contracts`) missing for a 2nd client/proxy.
- **Cause**: cache keys on POSITIONAL args only; a hit skips the body entirely.
- **Rule**: never rely on cached-fn side effects; perform required writes explicitly at the call site (eg `cache_contract()` after `get_mkt_info`).
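
The failure mode and the rule can be reproduced with a toy stand-in. The `positional_lifo_cache()` decorator and `get_info()` fn below are hypothetical simplifications written for this sketch (including the eviction policy); they are not the real `async_lifo_cache` or `get_mkt_info()`, and only mirror the two properties named above: the key is built from positional args only, and a hit returns without running the body.

```python
import asyncio
from collections import OrderedDict
from functools import wraps
from typing import Optional


def positional_lifo_cache(maxsize: int = 8):
    '''
    Toy async cache keyed on POSITIONAL args only (kwargs are ignored
    when building the key); evicts the most recently added entry when
    full (an assumption for this sketch).

    '''
    def decorator(fn):
        cache: OrderedDict = OrderedDict()

        @wraps(fn)
        async def wrapper(*args, **kwargs):
            key = args
            if key in cache:
                # cache hit: the wrapped body (and its side effects)
                # never runs.
                return cache[key]

            if len(cache) >= maxsize:
                cache.popitem()  # drop the last-inserted entry

            cache[key] = result = await fn(*args, **kwargs)
            return result

        return wrapper

    return decorator


@positional_lifo_cache()
async def get_info(fqme: str, contracts: Optional[dict] = None) -> str:
    info = f'info:{fqme}'
    if contracts is not None:
        contracts[fqme] = info  # side effect: warm the caller's table
    return info


async def demo():
    first: dict = {}
    second: dict = {}
    await get_info('gld.arca.ib', contracts=first)  # miss: body runs
    info = await get_info('gld.arca.ib', contracts=second)  # hit: skipped
    missed = dict(second)  # snapshot: the 2nd table was never written

    # the rule: do the required write explicitly at the call site.
    second['gld.arca.ib'] = info
    return first, missed, second


first, missed, second = asyncio.run(demo())
assert missed == {}
assert first == second
```

The second caller's table stays empty after the cache hit, which is the same shape as a 2nd client/proxy missing its `_contracts` entry; the explicit write after the call is what restores it.
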