Exit-path failure modes
Compiled truth (2026-04-27, post-83ba9eb): A “successful exit”
requires three things to be true simultaneously: (1) the engine
decided to exit, (2) a SELL landed at IBKR with a working orderId,
(3) the broker position is actually flat AND the engine-side
tracker reflects that. Each is its own break point. Three trust
classes have been documented and fixed:
Class 1 - Alert without action (downstream)
The engine thinks it exited but no SELL ever reached IBKR. User is told a position is closing while it is silently bleeding.
Concrete: OVERSELL_GUARD_BLOCK reason=tp_cancel_rejected aborted
the SELL because canceling a REJECTED upstream order returned
REJECTED, and the cancel-guard treated that as “still active, do
not oversell.” The guard exists for the legitimate
TP-partial-fill-then-cancel race; it does not apply when the TP
never rested. Fixed in cd48ecf and hardened in 9a118b1.
Fix: classify order states as ACTIVE (cancel needed) vs
TERMINAL (skip cancel, proceed to SELL). Active set: PENDING,
CANCELLING, UNKNOWN, PARTIAL. Terminal set: FILLED,
CANCELLED, REJECTED, INACTIVE, ApiCancelled. Case-normalized.
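The classification above can be sketched as a small lookup. This is a minimal illustration, not the shipped code: the function name needs_cancel and the choice to treat unrecognized statuses as active are assumptions; only the two state sets come from the text.

```python
# State sets as listed above, stored upper-cased so lookups are case-normalized.
ACTIVE_STATES = frozenset({"PENDING", "CANCELLING", "UNKNOWN", "PARTIAL"})
TERMINAL_STATES = frozenset(
    {"FILLED", "CANCELLED", "REJECTED", "INACTIVE", "APICANCELLED"}
)


def needs_cancel(status: str) -> bool:
    """Return True if the TP must be cancelled before the SELL may proceed.

    Hypothetical helper: terminal orders skip the cancel and fall through
    to the SELL; everything else blocks until it becomes terminal.
    """
    normalized = status.strip().upper()
    if normalized in TERMINAL_STATES:
        return False
    # Assumption: a status outside both sets is handled like UNKNOWN (active),
    # which keeps the oversell protection for the partial-fill race.
    return True
```

With this shape, a TP that was REJECTED at placement never enters the cancel path, so the cancel-guard cannot abort the SELL on its behalf.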
Class 2 - Status without truth (engine tracker drift)
Broker is flat but engine UI says EXIT_PENDING with growing “unrealized PnL.” User makes decisions off a UI that has been lying for hours. Same kind of trust failure as Class 1, in the opposite direction.
Concrete: 2026-04-24, IBKR updatePortfolio reported position=0 realizedPNL=5625.58 continuously from 10:36:45 onward. Engine-side
tracker never finalized - kept emitting EXIT_PENDING and
recomputing ghost unrealized as last_known_qty * marketPrice_tick.
As the underlying moved 5.04 over four hours, the ghost unrealized
PnL “climbed” 11K even though the broker truth was
position=0.0 marketValue=0.0.
Fix landed (fdcf6ad): when updatePortfolio reports
position=0 for a tracked symbol/strike/right/expiry, the
engine-side tracker finalizes to CLOSED with
realized=updatePortfolio.realizedPNL. Drops the EXIT_PENDING
sentinel.
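A minimal sketch of that finalization rule, under stated assumptions: the TrackedPosition class, its field names, and the on_update_portfolio handler are hypothetical stand-ins for the engine-side tracker, and the restriction to OPEN positions follows the acceptance bar rather than the shipped code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (symbol, strike, right, expiry), matching the key named in the text.
ContractKey = Tuple[str, float, str, str]


@dataclass
class TrackedPosition:
    state: str = "OPEN"
    last_known_qty: int = 0
    exit_pending: bool = False
    realized_pnl: Optional[float] = None


def on_update_portfolio(
    tracker: Dict[ContractKey, TrackedPosition],
    key: ContractKey,
    position: float,
    realized_pnl: float,
) -> None:
    """Finalize the tracked position when the broker reports it flat."""
    pos = tracker.get(key)
    if pos is None or pos.state != "OPEN":
        return  # untracked, still entering, or already closed
    if position == 0:
        pos.state = "CLOSED"
        pos.realized_pnl = realized_pnl  # broker figure, not a recomputation
        pos.exit_pending = False         # drop the EXIT_PENDING sentinel
        pos.last_known_qty = 0           # nothing left to price as "unrealized"
```

Zeroing the quantity matters as much as the state change: the ghost PnL came from multiplying a stale quantity by fresh ticks.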
Class 3 - Entry-window races (NEW, 2026-04-27)
The reconciler and watchdog acted on positions that were still in
the entry-fill window - broker_qty was changing or _last_tick_at
was 0 because subscription hadn’t returned a tick yet. Treating
those transient states as “OPEN and behaving abnormally” caused
two distinct failure modes in two consecutive trades.
Concrete failure A - orphan from reconcile-during-entry (#186):
- 09:36:30 BUY 100 SPY 713C @ $1.75 placed.
- 09:36:33 reconciler ran 3s later, broker hadn’t filled yet, saw broker_qty=0. Called _finalize_close(EXTERNAL_CLOSE) on the PENDING_ENTRY position.
- 09:36:40 BUY actually filled. Engine had no record. Orphan: 100 contracts at IBKR with NO engine management. Position would have ridden 0DTE theta to expiry worthless absent intervention.
Concrete failure B - orphan-contaminated TP qty + dead-man’s-switch hair-trigger (#187):
- 09:38:30 BUY 100 SPY 713C @ $1.59 placed (SAME contract as #186 orphan).
- 09:38:43-44 reconciler saw broker_qty climbing 101→131→200 (the 100 from #186 orphan + new fills landing). Took broker as truth → set engine pos.contracts_remaining = 200 for #187.
- 09:38:44 TP placed for qty=200 @ $1.74 (wrong; #187 is a 100-contract position; 200 was contaminated by orphan).
- 09:38:47 watchdog DEAD MAN’S SWITCH fired ~7s after entry. The brand-new contract had _last_tick_at=0 (subscription pending), so age = now - 0 = inf, inf > 120 is True, market_price > 0 was True (initialized from entry $1.59). Force-closed all 200 contracts at MKT.
- Net broker realized: -1,015 (computed off #187’s $1.59 cost only - wrong).
Fix landed (83ba9eb):
- confirm_fill() no longer flips state to OPEN on partial fills. PENDING_ENTRY persists until full fill.
- reconcile_with_broker() skips positions where state != PositionState.OPEN. Reconciler is a no-op during entry.
- Watchdog DEAD MAN’S SWITCH requires state == OPEN AND _last_tick_at > 0. Brand-new positions are not “blind”, they’re just not subscribed yet.
- Reconciler now detects orphan contamination: when broker_qty > sum(engine.contracts_remaining) for a contract, log ORPHAN_DELTA (CRITICAL) - do NOT credit delta to any single position. Operator decides.
- Brand-new broker positions with no engine match: log ORPHAN_BROKER_POSITION (CRITICAL). No auto-adopt.
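The two state gates can be sketched as follows. This is an illustration under assumptions, not the code in 83ba9eb: the Position dataclass, the function names, and the reading of 120 as seconds are hypothetical; the conditions themselves (state == OPEN, _last_tick_at > 0, age > 120, market_price > 0) are taken from the text.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PositionState(Enum):
    PENDING_ENTRY = "PENDING_ENTRY"
    OPEN = "OPEN"
    CLOSED = "CLOSED"


STALE_TICK_SECONDS = 120  # assumed unit for the "> 120" comparison


@dataclass
class Position:
    state: PositionState
    market_price: float
    last_tick_at: float = 0.0  # 0 means the subscription has not ticked yet


def reconciler_should_touch(pos: Position) -> bool:
    """Reconciler gate: a no-op for anything that is not fully OPEN."""
    return pos.state is PositionState.OPEN


def dead_mans_switch_should_fire(pos: Position, now: Optional[float] = None) -> bool:
    """Watchdog gate: only an OPEN position that HAS ticked can go stale."""
    now = time.time() if now is None else now
    if pos.state is not PositionState.OPEN:
        return False
    if pos.last_tick_at <= 0:
        return False  # not blind, just not subscribed yet
    age = now - pos.last_tick_at
    return age > STALE_TICK_SECONDS and pos.market_price > 0
```

The explicit last_tick_at check is what removes the hair-trigger: without it, subtracting a zero timestamp yields an age that always exceeds the threshold.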
The 8-item audit surface (all P0/P1 fixed)
1. TP placement not verified → fixed: get_order_status() poll within 500ms after placeOrder; on REJECTED/Inactive clear tp_order_id, alert once.
2. Cancel-guard treats terminal states as active → fixed via Class 1 ACTIVE/TERMINAL classifier.
3. Stop-loss had the same class of bug → eliminated by removing the broker-resting STOP path entirely. SL is software-only at -30%.
4. No retry cap, no market-sell escalation → fixed: 5 cycles / 30s cap, then reqGlobalCancel() + Telegram escalation.
5. Tracker drift → fixed via Class 2 (updatePortfolio callback finalizes tracker).
6. Alert dedup → fixed: edge-triggered, suppressed during exit-in-progress.
7. Margin/BP forensics → fixed: BP precheck via broker.get_account_summary() before TP placement. Skip TP on insufficient funds, software TP fallback covers +10%.
8. Startup state reconciliation → fixed: reconcile_with_broker() exists, gated on OPEN state, orphan-aware.
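The retry cap with escalation might look like the sketch below. It assumes "5 cycles / 30s" means stopping at whichever limit is reached first; exit_with_retry_cap, try_sell, and escalate are hypothetical names, with escalate standing in for the reqGlobalCancel() + Telegram step.

```python
import time
from typing import Callable

MAX_CYCLES = 5       # from the text
MAX_SECONDS = 30.0   # from the text; "whichever comes first" is an assumption


def exit_with_retry_cap(
    try_sell: Callable[[], bool],
    escalate: Callable[[str], None],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    pause: float = 1.0,
) -> bool:
    """Attempt the market SELL a bounded number of times, then escalate."""
    started = clock()
    attempt = 0
    for attempt in range(1, MAX_CYCLES + 1):
        if try_sell():
            return True
        if clock() - started >= MAX_SECONDS:
            break  # time budget exhausted before the cycle budget
        if attempt < MAX_CYCLES:
            sleep(pause)
    # Hand off to a human instead of looping forever.
    escalate(f"exit failed after {attempt} attempts")
    return False
```

Injecting the clock and sleep functions keeps the loop testable without waiting on real time.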
Plus Class 3 (new, 2026-04-27):
9. Entry-window race in reconciler → fixed: state gate.
10. Orphan contamination of new positions’ TP qty → fixed:
ORPHAN_DELTA logging, no contamination of live qty.
11. Watchdog DEAD MAN’S SWITCH hair-trigger → fixed: state gate
plus _last_tick_at > 0 requirement.
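Item 10's orphan check reduces to one comparison per contract. The sketch below is illustrative only: the function names and the log message layout are assumptions; the rule itself (broker_qty greater than the sum of engine contracts_remaining, log CRITICAL, credit no position) is from the Class 3 fix.

```python
import logging
from typing import List

log = logging.getLogger("reconciler")


def orphan_delta(broker_qty: int, engine_remaining: List[int]) -> int:
    """Contracts held at the broker that no engine position accounts for."""
    return max(0, broker_qty - sum(engine_remaining))


def check_contract(contract: str, broker_qty: int, engine_remaining: List[int]) -> int:
    """Log orphans for one contract; never adjusts any engine quantity."""
    delta = orphan_delta(broker_qty, engine_remaining)
    if delta == 0:
        return 0
    if not engine_remaining:
        # Broker holds a position the engine has no record of at all.
        log.critical("ORPHAN_BROKER_POSITION contract=%s broker_qty=%d",
                     contract, broker_qty)
    else:
        # Excess on top of tracked positions: report it, credit no one.
        log.critical("ORPHAN_DELTA contract=%s broker_qty=%d engine_qty=%d delta=%d",
                     contract, broker_qty, sum(engine_remaining), delta)
    return delta
```

Applied to #187, broker_qty=200 against a single 100-contract engine position gives a delta of 100, which is reported rather than folded into the TP quantity.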
Acceptance bar (all met in 83ba9eb)
- ✅ TP-rejected-at-placement → tp_order_id cleared, alerts once.
- ✅ TP filled → position finalizes CLOSED with broker realized PnL.
- ✅ Price SL trigger → cancels TP, market sells, finalizes.
- ✅ Thesis-invalid (BULL ≤30, BEAR ≥70) → same exit path.
- ✅ Time close at 14:15 CT → same exit path.
- ✅ Cancel-TP terminal states (5 cases) → fall through to SELL.
- ✅ Cancel-TP active states → block SELL until terminal.
- ✅ Market-SELL rejected → retry up to 5, then escalate.
- ✅ updatePortfolio position=0 (OPEN position) → finalize within one tick.
- ✅ Startup drift → broker-as-truth, orphans don’t auto-adopt.
- ✅ THESIS_INVALID → no spam during exit-in-progress.
- ✅ Reconciler during entry-fill window → no-op (state == PENDING_ENTRY).
- ✅ Orphan contamination → ORPHAN_DELTA log, no contamination.
- ✅ Watchdog before first tick → no-op (_last_tick_at == 0).
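The "alerts once" and "no spam during exit-in-progress" criteria both describe edge-triggered alerting. A minimal sketch, assuming a per-condition tracker object; the class and method names are hypothetical.

```python
class EdgeTriggeredAlert:
    """Fire only on the inactive-to-active transition of a condition."""

    def __init__(self) -> None:
        self._was_active = False

    def should_alert(self, condition_active: bool, exit_in_progress: bool) -> bool:
        rising_edge = condition_active and not self._was_active
        self._was_active = condition_active
        # Suppress while an exit is already underway; the edge is still
        # consumed, so the alert does not fire late once the exit finishes.
        return rising_edge and not exit_in_progress
```

A condition such as THESIS_INVALID that stays true across many evaluation cycles produces a single alert, and a new one only after it has cleared and re-triggered.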
State:
- GH #46: implementation landed at 83ba9eb (post 9a118b1 and fdcf6ad). All P0/P1 audit items + the Class 3 entry-window races are closed. Issue stays open pending end-of-day Monday validation.
- Postmortem: postmortems/tp-rejected-infinite-exit-loop-2026-04-24.md
- See also: concepts/position-state-machine, concepts/position-lifecycle.
- Project memory: project_pm_ibkr_exit_invariant.
Open threads:
- End-of-day Monday validation report (paper market until 15:00 CT).
- BP-precheck math uses qty * limit_price * 100. Should query IBKR’s actual init_margin computation if available.
- Circuit-breaker pause window for UW endpoint (currently 600s, too long; 60s with exponential backoff would recover faster). Not blocking; documented as follow-up.
- EXIT_PENDING as an explicit 4th state vs current _exit_in_progress flag (see position-state-machine open threads).
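For reference, the current BP-precheck arithmetic from the second open thread can be written out as below. This is a sketch of the qty * limit_price * 100 approximation only; the function names and the available_funds parameter are assumptions, and it does not reflect IBKR's actual init_margin computation.

```python
OPTION_MULTIPLIER = 100  # the "* 100" in the precheck formula


def notional_required(qty: int, limit_price: float) -> float:
    """Approximate funds needed: contracts x limit price x multiplier."""
    return qty * limit_price * OPTION_MULTIPLIER


def has_buying_power(qty: int, limit_price: float, available_funds: float) -> bool:
    """Hypothetical precheck: compare the approximation to available funds."""
    return available_funds >= notional_required(qty, limit_price)
```

For the #187 numbers, 100 contracts at $1.74 works out to about 17,400, while the contaminated qty=200 doubles that to about 34,800, so a wrong quantity also distorts the precheck.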
Timeline:
2026-04-27 | Class 3 (entry-window races) added after #186 +
#187 incidents. Both caused by reconciler / watchdog acting on
positions during the entry-fill window. 83ba9eb ships state
gates and orphan-delta logging. New companion pages:
position-state-machine, position-lifecycle.
2026-04-27 (earlier) | 9a118b1 shipped fixes for audit
items 1-8 (BP precheck, software TP fallback, qty reconcile, etc.).
2026-04-27 (still earlier) | Concept formalized after the 2026-04-24 #184 incident + the 2026-04-24 TRACKER_DRIFT_UNRESOLVED ghost-PnL discovery. Two distinct trust classes named so they could be tested for and asserted against, not re-discovered every quarter.
2026-04-24 | Original incident - see postmortem.