Exit-path failure modes

Compiled truth (2026-04-27, post-83ba9eb): A “successful exit” requires three things to be true simultaneously: (1) the engine decided to exit, (2) a SELL landed at IBKR with a working orderId, (3) the broker position is actually flat AND the engine-side tracker reflects that. Each is its own break point. Three trust classes have been documented and fixed:

Class 1 - Alert without action (downstream)

The engine thinks it exited but no SELL ever reached IBKR. User is told a position is closing while it is silently bleeding.

Concrete: OVERSELL_GUARD_BLOCK reason=tp_cancel_rejected aborted the SELL because canceling a REJECTED upstream order returned REJECTED, and the cancel-guard treated that as “still active, do not oversell.” The guard exists for the legitimate TP-partial-fill-then-cancel race; it does not apply when the TP never rested. Fixed in cd48ecf and hardened in 9a118b1.

Fix: classify order states as ACTIVE (cancel needed) vs TERMINAL (skip cancel, proceed to SELL). Active set: PENDING, CANCELLING, UNKNOWN, PARTIAL. Terminal set: FILLED, CANCELLED, REJECTED, INACTIVE, ApiCancelled. Case-normalized.
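
A minimal sketch of the ACTIVE/TERMINAL split described above, not the engine's actual code. The names OrderClass, classify_order_status and may_proceed_to_sell are hypothetical, and treating an unrecognized status as ACTIVE is an assumption (it mirrors UNKNOWN being in the active set):

```python
from enum import Enum


class OrderClass(Enum):
    ACTIVE = "active"      # cancel needed before SELL
    TERMINAL = "terminal"  # skip cancel, proceed to SELL


# Sets taken from the fix description; stored upper-cased for case-normalized lookup.
ACTIVE_STATES = frozenset({"PENDING", "CANCELLING", "UNKNOWN", "PARTIAL"})
TERMINAL_STATES = frozenset(
    {"FILLED", "CANCELLED", "REJECTED", "INACTIVE", "APICANCELLED"}
)


def classify_order_status(status: str) -> OrderClass:
    """Classify a broker order status string, ignoring case and whitespace."""
    normalized = status.strip().upper()
    if normalized in TERMINAL_STATES:
        return OrderClass.TERMINAL
    # Assumption: anything not known to be terminal is treated as ACTIVE,
    # which is the conservative choice for an oversell guard.
    return OrderClass.ACTIVE


def may_proceed_to_sell(tp_status: str) -> bool:
    """True when the TP order no longer rests, so the SELL cannot oversell."""
    return classify_order_status(tp_status) is OrderClass.TERMINAL
```

With this shape, a TP that was REJECTED at placement falls through to the SELL, while a PARTIAL TP still blocks until it reaches a terminal state.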

Class 2 - Status without truth (engine tracker drift)

Broker is flat but engine UI says EXIT_PENDING with growing “unrealized PnL.” User makes decisions off a UI that has been lying for hours. Same kind of trust failure as Class 1, in the opposite direction.

Concrete: 2026-04-24, IBKR updatePortfolio reported position=0 realizedPNL=5625.58 continuously from 10:36:45 onward. Engine-side tracker never finalized - it kept emitting EXIT_PENDING and recomputing ghost unrealized as last_known_qty * marketPrice_tick. As the underlying moved 5.04 over four hours, the ghost “climbed” 11K even though position=0.0 marketValue=0.0 was the broker truth.

Fix landed (fdcf6ad): when updatePortfolio reports position=0 for a tracked symbol/strike/right/expiry, the engine-side tracker finalizes to CLOSED with realized=updatePortfolio.realizedPNL. Drops the EXIT_PENDING sentinel.
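
The finalize-on-flat rule can be sketched as below. This is an illustration under assumptions, not the fdcf6ad code: TrackedPosition, on_update_portfolio and the string state names are hypothetical, and the OPEN-only gate follows the acceptance bar ("updatePortfolio position=0 (OPEN position)") so that a position still in PENDING_ENTRY is never finalized by a pre-fill zero.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (symbol, strike, right, expiry), matching the tracked key named in the fix.
ContractKey = Tuple[str, float, str, str]


@dataclass
class TrackedPosition:
    state: str = "OPEN"
    exit_pending: bool = False
    realized_pnl: Optional[float] = None


def on_update_portfolio(
    tracker: Dict[ContractKey, TrackedPosition],
    key: ContractKey,
    position: float,
    realized_pnl: float,
) -> bool:
    """Finalize the engine-side tracker when the broker reports a flat position.

    Returns True only when a tracked OPEN position was finalized by this call.
    """
    pos = tracker.get(key)
    if pos is None or pos.state != "OPEN":
        return False  # untracked, still entering, or already closed
    if position != 0:
        return False  # broker still holds contracts; nothing to finalize
    pos.state = "CLOSED"
    pos.realized_pnl = realized_pnl  # broker realized PnL is the truth
    pos.exit_pending = False         # drop the EXIT_PENDING sentinel
    return True
```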

Class 3 - Entry-window races (NEW, 2026-04-27)

The reconciler and watchdog acted on positions that were still in the entry-fill window - broker_qty was changing or _last_tick_at was 0 because subscription hadn’t returned a tick yet. Treating those transient states as “OPEN and behaving abnormally” caused two distinct failure modes in two consecutive trades.

Concrete failure A - orphan from reconcile-during-entry (#186):

  • 09:36:30 BUY 100 SPY 713C @ $1.75 placed
  • 09:36:33 reconciler ran 3s later, before the broker had filled, and saw broker_qty=0. It called _finalize_close(EXTERNAL_CLOSE) on the PENDING_ENTRY position.
  • 09:36:40 BUY actually filled. Engine had no record. Orphan: 100 contracts at IBKR with NO engine management. Position would have ridden 0DTE theta to expiry worthless absent intervention.

Concrete failure B - orphan-contaminated TP qty + dead-man’s-switch hair-trigger (#187):

  • 09:38:30 BUY 100 SPY 713C @ $1.59 placed (SAME contract as #186 orphan).
  • 09:38:43-44 reconciler saw broker_qty climbing 101→131→200 (the 100 from #186 orphan + new fills landing). Took broker as truth → set engine pos.contracts_remaining = 200 for #187.
  • 09:38:44 TP placed for qty=200 @ $1.74 (wrong; #187 is a 100-contract position; 200 was contaminated by orphan).
  • 09:38:47 watchdog DEAD MAN’S SWITCH fired ~7s after entry. The brand-new contract had _last_tick_at=0 (subscription pending), so age = now - 0 was effectively infinite and age > 120 was True; market_price > 0 was also True (initialized from entry $1.59). Force-closed all 200 contracts at MKT.
  • Net broker realized: -1,015 (computed off #187’s $1.59 cost only - wrong).
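
The 09:38:47 hair-trigger can be reproduced in a few lines. This is a sketch under assumptions: the function names and string state names are hypothetical, and the timestamp is an arbitrary epoch-style value. The first function mirrors the check as described in the incident; the second adds the two conditions the fix requires (state == OPEN and _last_tick_at > 0).

```python
STALE_AFTER_S = 120  # staleness threshold from the incident ("age > 120")


def dead_mans_switch_naive(last_tick_at: float, market_price: float, now: float) -> bool:
    """Pre-fix check: with last_tick_at == 0 the age is the whole clock value."""
    age = now - last_tick_at
    return age > STALE_AFTER_S and market_price > 0


def dead_mans_switch_gated(
    state: str, last_tick_at: float, market_price: float, now: float
) -> bool:
    """Post-fix check: only OPEN positions that have received at least one tick."""
    if state != "OPEN" or last_tick_at <= 0:
        return False  # not subscribed yet, or still in the entry-fill window
    return dead_mans_switch_naive(last_tick_at, market_price, now)


now = 1_777_300_000.0  # hypothetical wall-clock seconds
fired_before_fix = dead_mans_switch_naive(0.0, 1.59, now)        # True
fired_after_fix = dead_mans_switch_gated("OPEN", 0.0, 1.59, now)  # False
```

A position that has genuinely gone quiet (last tick more than 120s old, state OPEN) still trips the gated check, so the switch keeps its purpose.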

Fix landed (83ba9eb):

  • confirm_fill() no longer flips state to OPEN on partial fills. PENDING_ENTRY persists until full fill.
  • reconcile_with_broker() skips positions where state != PositionState.OPEN. Reconciler is a no-op during entry.
  • Watchdog DEAD MAN’S SWITCH requires state == OPEN AND _last_tick_at > 0. Brand-new positions are not “blind”; they’re just not subscribed yet.
  • Reconciler now detects orphan contamination: when broker_qty > sum(engine.contracts_remaining) for a contract, log ORPHAN_DELTA (CRITICAL) - do NOT credit delta to any single position. Operator decides.
  • Brand-new broker positions with no engine match: log ORPHAN_BROKER_POSITION (CRITICAL). No auto-adopt.
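
The reconciler rules above might look roughly like this. It is a sketch, not the 83ba9eb implementation: the Position fields, the contract string key and the returned event tuples are hypothetical, and leaving a contract alone while any position on it is PENDING_ENTRY is an assumption about how the state gate extends to the per-contract comparison. The broker-flat case is covered by Class 2 and is out of scope here.

```python
import logging
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple

log = logging.getLogger("reconciler")


class PositionState(Enum):
    PENDING_ENTRY = "PENDING_ENTRY"
    OPEN = "OPEN"
    CLOSED = "CLOSED"


@dataclass
class Position:
    trade_id: str
    contract: str
    state: PositionState
    contracts_remaining: int


def reconcile_with_broker(
    positions: List[Position], broker_qty: Dict[str, int]
) -> List[Tuple[str, str, int]]:
    """Log orphan conditions without ever rewriting engine quantities."""
    open_qty: Dict[str, int] = defaultdict(int)
    pending = set()
    for pos in positions:
        if pos.state is PositionState.PENDING_ENTRY:
            pending.add(pos.contract)  # entry-fill window: hands off
        elif pos.state is PositionState.OPEN:
            open_qty[pos.contract] += pos.contracts_remaining

    events: List[Tuple[str, str, int]] = []
    for contract, qty in sorted(broker_qty.items()):
        if qty <= 0 or contract in pending:
            continue
        if contract not in open_qty:
            events.append(("ORPHAN_BROKER_POSITION", contract, qty))
        elif qty > open_qty[contract]:
            # Do NOT credit the delta to any single position; operator decides.
            events.append(("ORPHAN_DELTA", contract, qty - open_qty[contract]))

    for kind, contract, qty in events:
        log.critical("%s contract=%s qty=%d", kind, contract, qty)
    return events
```

Replaying failure B against this shape: while #187 is PENDING_ENTRY the reconciler does nothing; once it is OPEN with 100 contracts and the broker shows 200, the result is one ORPHAN_DELTA of 100 and contracts_remaining stays at 100.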

The 8-item audit surface (all P0/P1 fixed)

  1. TP placement not verified → fixed: get_order_status() poll within 500ms after placeOrder; on REJECTED/Inactive clear tp_order_id, alert once.
  2. Cancel-guard treats terminal states as active → fixed via Class 1 ACTIVE/TERMINAL classifier.
  3. Stop-loss had the same class of bug → eliminated by removing the broker-resting STOP path entirely. SL is software-only at -30%.
  4. No retry cap, no market-sell escalation → fixed: 5 cycles / 30s cap, then reqGlobalCancel() + Telegram escalation.
  5. Tracker drift → fixed via Class 2 (updatePortfolio callback finalizes tracker).
  6. Alert dedup → fixed: edge-triggered, suppressed during exit-in-progress.
  7. Margin/BP forensics → fixed: BP precheck via broker.get_account_summary() before TP placement. Skip TP on insufficient funds; the software TP fallback covers +10%.
  8. Startup state reconciliation → fixed: reconcile_with_broker() exists, gated on OPEN state, orphan-aware.

Plus Class 3 (new, 2026-04-27):

  9. Entry-window race in reconciler → fixed: state gate.
  10. Orphan contamination of new positions’ TP qty → fixed: ORPHAN_DELTA logging, no contamination of live qty.
  11. Watchdog DEAD MAN’S SWITCH hair-trigger → fixed: state gate plus _last_tick_at > 0 requirement.
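
Audit item 4 (retry cap, then escalation) can be sketched as a bounded loop. The function name, callbacks and pause length are hypothetical, and reading "5 cycles / 30s" as "whichever limit is hit first" is an assumption. In the engine the escalate step would correspond to reqGlobalCancel() plus the Telegram alert; here it is just a callback.

```python
import time
from typing import Callable

MAX_CYCLES = 5        # retry cap from the audit item
MAX_ELAPSED_S = 30.0  # time cap from the audit item


def exit_with_retry_cap(
    try_sell: Callable[[], bool],
    escalate: Callable[[], None],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    pause_s: float = 1.0,
) -> bool:
    """Attempt the SELL a bounded number of times, then escalate exactly once."""
    started = clock()
    for cycle in range(1, MAX_CYCLES + 1):
        if try_sell():
            return True  # SELL accepted; no escalation
        if clock() - started >= MAX_ELAPSED_S:
            break  # time cap hit before the cycle cap
        if cycle < MAX_CYCLES:
            sleep(pause_s)
    escalate()
    return False
```

Injecting clock and sleep keeps the loop testable without waiting in real time.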

Acceptance bar (all met in 83ba9eb)

  • ✅ TP-rejected-at-placement → tp_order_id cleared, alerts once.
  • ✅ TP filled → position finalizes CLOSED with broker realized PnL.
  • ✅ Price SL trigger → cancels TP, market sells, finalizes.
  • ✅ Thesis-invalid (BULL ≤30, BEAR ≥70) → same exit path.
  • ✅ Time close at 14:15 CT → same exit path.
  • ✅ Cancel-TP terminal states (5 cases) → fall through to SELL.
  • ✅ Cancel-TP active states → block SELL until terminal.
  • ✅ Market-SELL rejected → retry up to 5, then escalate.
  • ✅ updatePortfolio position=0 (OPEN position) → finalize within one tick.
  • ✅ Startup drift → broker-as-truth, orphans don’t auto-adopt.
  • ✅ THESIS_INVALID → no spam during exit-in-progress.
  • ✅ Reconciler during entry-fill window → no-op (state == PENDING_ENTRY).
  • ✅ Orphan contamination → ORPHAN_DELTA log, no contamination.
  • ✅ Watchdog before first tick → no-op (_last_tick_at == 0).
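
The "no spam during exit-in-progress" behavior (edge-triggered alert dedup) can be illustrated with a small latch. The class and method names are hypothetical, and dropping an edge that occurs while an exit is in progress, rather than deferring it, is an assumption of this sketch.

```python
class EdgeTriggeredAlert:
    """Fire once when a condition turns on, not on every cycle it stays on."""

    def __init__(self) -> None:
        self._was_active = False

    def should_alert(self, condition_active: bool, exit_in_progress: bool) -> bool:
        rising_edge = condition_active and not self._was_active
        self._was_active = condition_active
        # Suppressed while an exit is already in progress.
        return rising_edge and not exit_in_progress


alert = EdgeTriggeredAlert()
decisions = [
    alert.should_alert(True, False),   # condition turns on: alert
    alert.should_alert(True, False),   # still on: silent
    alert.should_alert(True, True),    # exit in progress: silent
    alert.should_alert(False, False),  # condition clears
    alert.should_alert(True, False),   # turns on again: alert
]
```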

State:

  • GH #46: implementation landed at 83ba9eb (post 9a118b1 and fdcf6ad). All P0/P1 audit items + the Class 3 entry-window races are closed. Issue stays open pending end-of-day Monday validation.
  • Postmortem: postmortems/tp-rejected-infinite-exit-loop-2026-04-24.md
  • See also: concepts/position-state-machine, concepts/position-lifecycle.
  • Project memory: project_pm_ibkr_exit_invariant.

Open threads:

  • End-of-day Monday validation report (paper market until 15:00 CT).
  • BP-precheck math uses qty * limit_price * 100. Should query IBKR’s actual init_margin computation if available.
  • Circuit-breaker pause window for UW endpoint (currently 600s, too long; 60s with exponential backoff would recover faster). Not blocking; documented as follow-up.
  • EXIT_PENDING as an explicit 4th state vs current _exit_in_progress flag (see position-state-machine open threads).
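
For reference, the current BP-precheck arithmetic from the second open thread (qty * limit_price * 100) works out as follows. This is a sketch only: the function names and the available_funds parameter are hypothetical, and Decimal is used here purely to keep the example arithmetic exact.

```python
from decimal import Decimal

OPTION_MULTIPLIER = 100  # the "* 100" in the precheck formula


def bp_required(qty: int, limit_price: Decimal) -> Decimal:
    """Notional used by the precheck: qty * limit_price * 100."""
    return Decimal(qty) * limit_price * OPTION_MULTIPLIER


def bp_precheck(qty: int, limit_price: Decimal, available_funds: Decimal) -> bool:
    """True when available funds cover the notional; False means skip the TP."""
    return available_funds >= bp_required(qty, limit_price)


# Example with the TP price from the #187 timeline: 100 contracts @ $1.74.
required = bp_required(100, Decimal("1.74"))  # 17400.00
```

This is a notional figure, not a margin figure, which is why the open thread suggests querying IBKR's own init_margin computation if it is available.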

Timeline:

2026-04-27 | Class 3 (entry-window races) added after #186 + #187 incidents. Both caused by reconciler / watchdog acting on positions during the entry-fill window. 83ba9eb ships state gates and orphan-delta logging. New companion pages: position-state-machine, position-lifecycle.

2026-04-27 (earlier) | 9a118b1 shipped fixes for the audit items 1-8 (BP precheck, software TP fallback, qty reconcile, etc).

2026-04-27 (still earlier) | Concept formalized after the 2026-04-24 #184 incident + the 2026-04-24 TRACKER_DRIFT_UNRESOLVED ghost-PnL discovery. Two distinct trust classes named so they could be tested for and asserted against, not re-discovered every quarter.

2026-04-24 | Original incident - see postmortem.