Nautilus Execution

Nautilus’s execution stack is a single deterministic command/event pipeline: Strategy → (OrderEmulator | ExecAlgorithm) → RiskEngine → ExecutionEngine → ExecutionClient → Venue, with events streaming back through the ExecutionEngine to update Cache, Portfolio, and Strategy. The pipeline is identical in backtest, sandbox, and live; only the ExecutionClient and the presence of a reconciliation loop differ. Every order goes through every stage by construction - there is no “fast path” that bypasses the RiskEngine, no PM-side handle that talks directly to the broker, no place a SELL command can be issued without an ExecutionClient route to a venue. This is the structural property that closes MK2’s exit-path failure classes (alert-without-action GH #46, updatePortfolio reconciliation drift, tracker drift, dead-code meta sizing GH #88). The doc explicitly defines RiskEngine, ExecutionEngine, ExecutionClient contracts, OMS handling, overfill detection, four reconciliation report variants, the external_order_claims mechanism for venue-initiated fills, and the LiveExecEngineConfig knobs (open_check_interval_secs, inflight_check_threshold_ms, reconciliation_startup_delay_secs, allow_overfills) that govern continuous reconciliation against broker truth.

Cody’s question - answered

Q: Does the Nautilus execution architecture make MK2’s exit-path bug classes (GH #46, tracker drift, updatePortfolio race, GH #88 dead-code sizing) structurally impossible?

A: Yes for #46, tracker drift, and the updatePortfolio race - prevented by construction. Conditionally yes for #88 - dead-code meta sizing is impossible if and only if meta-prob is implemented as a RiskEngine rule rather than inline in the Strategy, because every order routes through the RiskEngine by construction. Whether it is implemented as a rule is on us; Nautilus provides the seam, not the rule.

The execution doc’s verbatim claim on RiskEngine routing: “Cancel and query commands route directly to other execution components and do not pass through the RiskEngine” - but submit and modify commands always do. That asymmetry is correct (you don’t want a cancel blocked by a stale risk rule) and load-bearing for our use case.

Core claim

The concepts/execution/ page presents execution as a typed pipeline of components communicating via the MessageBus. The components are explicit:

  1. Strategy - the only thing allowed to originate order commands.
  2. OrderEmulator - handles order types the venue does not natively support (e.g., trailing stops on a venue without them); transforms them into supported types when the trigger fires.
  3. ExecAlgorithm - splits one primary order into many spawned secondary orders (TWAP is the built-in example); the execution algorithm is itself an Actor and can subscribe to data, set timers, and read the Cache.
  4. RiskEngine - pre-trade validation gate; the only place size / notional / precision / state checks live.
  5. ExecutionEngine (LiveExecutionEngine in live mode) - routes commands to the right ExecutionClient, applies fill events to orders, resolves position IDs, emits OrderFilled / PositionOpened / PositionChanged / PositionClosed, runs continuous reconciliation against venue truth in live mode.
  6. ExecutionClient (LiveExecutionClient in live) - the venue adapter; wire-format-specific code that talks REST/WebSocket to the exchange and emits typed events back into the engine.

Routing per command type, verbatim from the doc:

  • “submit_order(...) routes to OrderEmulator for emulated orders, to an ExecAlgorithm when exec_algorithm_id is set, and to the RiskEngine otherwise.”
  • “submit_order_list(...) follows the same branching behavior based on emulation and exec_algorithm_id.”
  • “modify_order(...) routes to the OrderEmulator for emulated orders and to the RiskEngine otherwise.”
  • “Cancel and query commands can route directly to the OrderEmulator, ExecAlgorithm, or ExecutionEngine, depending on the command and order state.”

For new order submission the typical first hop is Strategy → (OrderEmulator | ExecAlgorithm | RiskEngine). Downstream, the hops are OrderEmulator → (ExecAlgorithm | ExecutionEngine) and ExecAlgorithm → RiskEngine → ExecutionEngine → ExecutionClient. In all new-submission paths the RiskEngine runs before the order leaves Nautilus - even an emulated order, when released, transforms into a basic order type and is then sent through the standard pipeline (which includes RiskEngine validation).
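The first-hop branching can be sketched as a pure function. This is a minimal illustration, not Nautilus code: the Order dataclass and route_submit are hypothetical stand-ins, and emulation_trigger is modeled as a plain string where the library uses an enum.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Order:
    """Hypothetical stand-in carrying only the two fields that drive routing."""
    client_order_id: str
    emulation_trigger: str = "NO_TRIGGER"
    exec_algorithm_id: Optional[str] = None


def route_submit(order: Order) -> str:
    """Return the name of the first component that receives a new submission."""
    if order.emulation_trigger != "NO_TRIGGER":
        return "OrderEmulator"
    if order.exec_algorithm_id is not None:
        return "ExecAlgorithm"
    return "RiskEngine"
```

Emulation is checked first, so an emulated order that also names an exec algorithm still lands on the OrderEmulator first.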

ExecutionEngine - responsibilities

From nautilus-architecture.md and the execution doc combined:

“Manages order lifecycle and execution: routes trading commands to the appropriate adapter clients, tracks order and position states, coordinates with risk management systems, handles execution reports and fills from venues, handles reconciliation of external execution state.”

Concrete responsibilities:

  1. Command routing - accepts validated SubmitOrder / SubmitOrderList / ModifyOrder / CancelOrder / CancelAllOrders / QueryOrder / QueryAccount commands and dispatches each to the ExecutionClient registered for the order’s Venue.
  2. Event ingestion - receives OrderEvent and reconciliation reports from the ExecutionClient, applies them to the order in the Cache, and re-publishes on the bus.
  3. Position resolution - on OrderFilled, resolves which position (existing or new) the fill belongs to and emits the corresponding PositionOpened / PositionChanged / PositionClosed event.
  4. OMS adjudication - when strategy and venue OMS types differ (NETTING vs HEDGING), assigns or overrides position_id values on incoming fills to maintain the strategy’s view of positions.
  5. Overfill detection - before applying any fill, compares filled_qty + last_qty against quantity; behavior controlled by allow_overfills (False by default, rejects + logs; True applies the fill and tracks overfill_qty).
  6. Duplicate-fill detection - Order.is_duplicate_fill() checks (trade_id, order_side, last_px, last_qty) before apply(); exact replays log a warning and skip; the Order.apply() trade_id invariant is the hard backstop.
  7. External-order creation - on receipt of a report referencing an unknown order (venue-initiated ADL/liquidation/settlement, restart, another API client), creates an external order owned by the EXTERNAL strategy or the strategy that has called register_external_order_claims(), then plays the lifecycle events through the normal pipeline.
  8. Reconciliation (live only) - startup snapshot of broker truth, plus a continuous reconciliation loop that polls the venue at open_check_interval_secs and matches against the Cache. Events synthesized from reconciliation carry reconciliation=True so downstream consumers can distinguish them from venue-originated events.
  9. Cache writes for execution events - note the asymmetry vs Data: “For execution events, the Cache update is asynchronous in live; rely on the event payload directly if you need exact-at-event state.”

RiskEngine - responsibilities and pre-trade checks

The execution doc enumerates the RiskEngine’s checks verbatim. Unless specifically bypassed in RiskEngineConfig, the engine validates:

  1. Price and trigger-price precision for the instrument.
  2. Positive prices, unless the instrument class allows negative prices.
  3. Quantity precision and base-quantity min/max bounds.
  4. GTD orders have not already expired.
  5. reduce_only orders do not increase the referenced position.
  6. max_notional_per_order engine-level limits and instrument max_notional limits.
  7. Cash-account balance impact for non-margin accounts.
  8. Submit and modify rate limits.
  9. Trading-state restrictions (ACTIVE, HALTED, REDUCING).

On failure:

  • A submit-time failure produces an OrderDenied event with a human-readable reason; the order never reaches the venue.
  • A modify-time failure produces OrderModifyRejected.

The OrderDenied event is the load-bearing audit row: every blocked order produces a typed event with reason: str, so a logger Actor subscribed to on_order_denied (or on_order_event) captures all risk-engine rejections without per-rule plumbing.
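The "every blocked order yields a typed event with a reason" property can be sketched as a chain of checks where the first failure becomes the denial. All names here (OrderRequest, Check, validate, the local OrderDenied dataclass) are hypothetical and only mirror the shape described above; the real RiskEngine runs many more checks and accounts for instrument multipliers.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Optional


@dataclass(frozen=True)
class OrderRequest:
    """Hypothetical order fields needed by the two checks below."""
    price: Decimal
    quantity: Decimal


@dataclass(frozen=True)
class OrderDenied:
    """Hypothetical mirror of the denial event: a typed record with a reason."""
    reason: str


Check = Callable[[OrderRequest], Optional[str]]  # returns a reason, or None if OK


def check_positive_price(order: OrderRequest) -> Optional[str]:
    return None if order.price > 0 else f"price {order.price} is not positive"


def make_max_notional_check(limit: Decimal) -> Check:
    def check(order: OrderRequest) -> Optional[str]:
        notional = order.price * order.quantity
        return None if notional <= limit else f"notional {notional} exceeds {limit}"
    return check


def validate(order: OrderRequest, checks: list[Check]) -> Optional[OrderDenied]:
    """Run checks in order; the first failure becomes the denial event."""
    for check in checks:
        reason = check(order)
        if reason is not None:
            return OrderDenied(reason=reason)
    return None
```

Because each check only returns a reason string, an audit logger needs no knowledge of individual rules; it records whatever reason the denial carries.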

TradingState - kill-switch

Three states control the entire submission path:

  • ACTIVE - submit and modify operate normally.
  • HALTED - new submit and modify commands are denied. Cancels still pass through.
  • REDUCING - cancels allowed; submits or modifies that increase exposure are rejected; reduce-only operations pass.

This is the venue-agnostic equivalent of MK2’s “hit the kill switch” button. HALTED is the right state for an on-call human to flip when the engine is misbehaving: resting orders can still be canceled but no new entries can be opened. This maps to feedback_no_kill_with_open_positions.md with one caveat that follows from the state definitions above: HALTED permits only cancels, so flattening open positions requires REDUCING, which lets reduce-only submits through.
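The three states reduce to a small decision table. The sketch below encodes only the rules listed above; the local TradingState enum, the is_allowed function, and the string command labels are hypothetical stand-ins for the library's types.

```python
from enum import Enum


class TradingState(Enum):
    ACTIVE = "ACTIVE"
    HALTED = "HALTED"
    REDUCING = "REDUCING"


def is_allowed(state: TradingState, command: str, increases_exposure: bool = False) -> bool:
    """Decide whether a command passes the trading-state gate.

    `command` is one of "submit", "modify", "cancel" (hypothetical labels).
    """
    if command == "cancel":
        return True  # cancels pass in every state
    if state is TradingState.ACTIVE:
        return True
    if state is TradingState.HALTED:
        return False  # no new submits or modifies
    # REDUCING: only operations that do not increase exposure pass
    return not increases_exposure
```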

RiskEngineConfig - known knobs

The doc references RiskEngineConfig but does not enumerate every option on this page; the API reference is the source of truth. From the execution doc plus nautilus-architecture.md:

  • bypass: bool - disables all checks (testing only).
  • max_order_submit_rate: str - a "limit/interval" string, e.g., "100/00:00:01" (100 submits per second).
  • max_order_modify_rate: str - same form as the submit rate.
  • max_notional_per_order: dict[str, int] - keyed by instrument ID string.
  • Per-instrument max_notional via the Instrument itself.
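A limit of N commands per HH:MM:SS interval can be read as a sliding window. The sketch below is one plausible interpretation for reasoning about the knob, not the engine's throttler; parse_interval and SlidingWindowLimiter are hypothetical names, and the real implementation may buffer or drop commands differently.

```python
from collections import deque
from datetime import timedelta


def parse_interval(text: str) -> timedelta:
    """Parse an "HH:MM:SS" interval string such as "00:00:01"."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


class SlidingWindowLimiter:
    """Allow at most `limit` commands within any window of length `interval`."""

    def __init__(self, limit: int, interval: timedelta) -> None:
        self.limit = limit
        self.interval_ns = int(interval.total_seconds() * 1_000_000_000)
        self._sent = deque()  # timestamps (ns) of commands that were allowed

    def allow(self, ts_now_ns: int) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._sent and ts_now_ns - self._sent[0] >= self.interval_ns:
            self._sent.popleft()
        if len(self._sent) >= self.limit:
            return False
        self._sent.append(ts_now_ns)
        return True
```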

Custom RiskEngine rules - the Q5 question

The concepts/execution/ page does NOT document a public extension API for adding a custom rule (e.g., a meta-prob veto/scaler) to the RiskEngine pipeline. The page describes the built-in checks and the RiskEngineConfig knobs but stops short of “here is how you write your own rule.” Three paths exist in principle, none explicitly documented on this page:

  1. Subclass LiveRiskEngine and override the validation method. Heavyweight; requires re-doing the entire registration / wiring path in TradingNode; almost certainly not the intended seam.
  2. Pre-submit Actor that subscribes to a custom EntryIntent event from the Strategy, applies the meta-prob check, and either re-publishes EntryApproved (Strategy actually calls submit_order on receipt) or EntryDenied (Strategy logs and skips). The RiskEngine is then a second line of defense, not the meta-gate. This is a clean pattern but it is not “the RiskEngine evaluates meta-prob”; it’s “an Actor gates entry intents before they become orders.”
  3. OrderDenied from a custom validation Actor that intercepts the submit command on the bus before the RiskEngine sees it. Possible but fragile (depends on subscription order).

Spike-day actions for Q5 (this doc resolves the question partially):

  • Confirmed: every order routes through RiskEngine by construction. This is the structural property we needed.
  • Confirmed: OrderDenied is the standard event emitted by any pre-trade rejection.
  • Open: read crates/risk/src/engine.rs for the actual rule-registration API. Specifically look for a register_rule(...) or trait-based extension point.
  • Open: look at nautilus_trader/risk/engine.py (Python wrapper) and nautilus_trader/risk/sizing.py for sizing extension points.
  • Open: check whether RiskEngineConfig accepts a list of custom rule callables/objects or is fixed to the built-in set.
  • Backup plan if no extension API: implement meta-gate as the pre-submit Actor pattern above. Less elegant, still structural - the EntryIntent → EntryApproved/Denied events become the audit trail and meta-prob lives in one Actor that every strategy must consult by bus topology.
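The backup plan (pre-submit Actor) can be sketched with a tiny in-memory bus. Everything here is hypothetical scaffolding for the pattern: Bus stands in for the MessageBus, MetaGate for the Actor, and the meta_prob field and threshold are assumptions about how the intent would carry the model output.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EntryIntent:
    strategy_id: str
    instrument_id: str
    meta_prob: float  # model probability attached by the strategy


@dataclass(frozen=True)
class EntryApproved:
    intent: EntryIntent


@dataclass(frozen=True)
class EntryDenied:
    intent: EntryIntent
    reason: str


class Bus:
    """Tiny in-memory publish/subscribe bus keyed by event type."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        for handler in list(self._handlers[type(event)]):
            handler(event)


class MetaGate:
    """Approves or denies entry intents against a probability threshold."""

    def __init__(self, bus: Bus, min_prob: float) -> None:
        self._bus = bus
        self._min_prob = min_prob
        bus.subscribe(EntryIntent, self.on_intent)

    def on_intent(self, intent: EntryIntent) -> None:
        if intent.meta_prob >= self._min_prob:
            self._bus.publish(EntryApproved(intent))
        else:
            reason = f"meta_prob {intent.meta_prob} < {self._min_prob}"
            self._bus.publish(EntryDenied(intent, reason))
```

In this shape the Strategy would call submit_order only from its EntryApproved handler, so the approve/deny events double as the audit trail.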

ExecutionClient - the adapter contract

What an adapter must implement (compiled from the execution doc plus nautilus-integrations.md):

  1. Connection lifecycle - connect(), disconnect(), watchdog + reconnect with IB_MAX_CONNECTION_ATTEMPTS-style retries.
  2. Account / position bootstrap - pull balances, margins, positions on connect; emit one or more AccountState events per venue (complete margin snapshots - partial snapshots overwrite per nautilus-events.md).
  3. Order command handlers - submit_order, submit_order_list, modify_order, cancel_order, cancel_all_orders, query_order, query_account. Each translates from Nautilus’s domain order to the venue’s wire format.
  4. Event emitters - for every venue lifecycle event, emit the corresponding Nautilus event: OrderSubmitted, OrderAccepted, OrderRejected, OrderTriggered, OrderUpdated, OrderModifyRejected, OrderCancelRejected, OrderCanceled, OrderExpired, OrderFilled. Timestamps must be UNIX-epoch nanoseconds (UTC).
  5. Reconciliation report emission - produce one of the four report variants on demand:
    • OrderStatusReport - standalone order state update.
    • FillReport - standalone execution.
    • OrderWithFills - bundled status + fills (atomic).
    • PositionStatusReport - position snapshot.
  6. Continuous reconciliation hook - answer the engine’s periodic poll for open orders / open positions / account state.
  7. Symbology translation - venue-native symbols ↔ Nautilus InstrumentId. IBKR exposes IB_SIMPLIFIED and IB_RAW modes.

The adapter is responsible for emitting complete margin snapshots, deterministic trade_id values for reconciliation-synthesized fills (documented requirement: “deterministic hashes of the reconciliation fill inputs, so a restart that replays reconciliation produces the same trade_id and is deduped”), and timestamps in UTC nanoseconds.

The adapter is not responsible for:

  • Position tracking (the engine owns positions; adapter emits fills).
  • Risk validation (the RiskEngine owns this).
  • Cache writes (the engine owns the Cache).
  • Order ID synthesis when client_order_id is supplied (the engine generates it via OrderFactory); adapter generates venue_order_id from venue acks.
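The deterministic trade_id requirement for reconciliation-synthesized fills can be met by hashing the fill inputs. This is a sketch under assumptions: the choice of input fields, the SHA-256 digest, and the RECON- prefix are illustrative, not the format any Nautilus adapter is known to use.

```python
import hashlib


def reconciliation_trade_id(
    venue_order_id: str,
    order_side: str,
    last_qty: str,
    last_px: str,
    ts_event_ns: int,
) -> str:
    """Derive a stable trade_id from the inputs of a synthesized fill."""
    payload = "|".join([venue_order_id, order_side, last_qty, last_px, str(ts_event_ns)])
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"RECON-{digest[:32]}"  # hypothetical prefix and truncation
```

Replaying reconciliation after a restart yields the same ID for the same inputs, so the duplicate-fill checks downstream can drop the replay.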

Reconcile-on-startup pattern (live only)

The doc and nautilus-integrations.md together describe live reconciliation as a two-phase mechanism:

Phase 1 - startup snapshot

On connect():

  1. Adapter pulls all open orders, all positions, account balances/margins from the venue.
  2. Engine compares against the Cache (which may have been rehydrated from Redis if Cache database is configured).
  3. For every venue order/position not in the Cache: emit synthesized events (OrderAccepted, OrderFilled, PositionOpened) with reconciliation=True.
  4. For every Cache order/position not in venue truth: this is the harder case - the doc implies the engine treats venue truth as authoritative; the cached order is marked appropriately (OrderCanceled synthesized, or the cache entry pruned).
  5. reconciliation_startup_delay_secs (default 10) is the window given to WebSocket connections to stabilize before continuous reconciliation begins.

Phase 2 - continuous reconciliation

While running:

  1. LiveExecutionEngine periodically polls the venue at open_check_interval_secs (default not stated on this page; see API reference).
  2. For each open order in Cache, asks the adapter for current status.
  3. If adapter returns a status that differs from Cache: synthesize the delta event (OrderFilled for missed fills, OrderCanceled for missed cancels, etc.) - reconciliation=True.
  4. open_check_threshold_ms (default 5000ms) - engine waits at least this long before acting on a discrepancy, to allow real-time events to land first. Lowering this risks duplicate fills via real-time vs reconciliation race.
  5. inflight_check_threshold_ms (default 5000ms) - same idea for in-flight orders awaiting venue acknowledgment.
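The polling steps above amount to a diff between Cache state and venue state, gated by a staleness threshold. The sketch below is illustrative only: SynthesizedEvent and reconcile_open_orders are hypothetical names, statuses are plain strings, and orders missing from the venue response are skipped here rather than resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesizedEvent:
    client_order_id: str
    new_status: str
    reconciliation: bool = True


def reconcile_open_orders(
    cached: dict,   # client_order_id -> (status, ts_last_event_ns)
    venue: dict,    # client_order_id -> status reported by the venue
    ts_now_ns: int,
    threshold_ms: int = 5_000,
) -> list:
    """Emit a delta event for each order whose venue status differs from the cache."""
    threshold_ns = threshold_ms * 1_000_000
    events = []
    for order_id, (status, ts_last_ns) in cached.items():
        venue_status = venue.get(order_id)
        if venue_status is None or venue_status == status:
            continue
        if ts_now_ns - ts_last_ns < threshold_ns:
            continue  # too recent: give the real-time event time to arrive
        events.append(SynthesizedEvent(order_id, venue_status))
    return events
```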

Race-condition handling

The doc explicitly calls out that real-time fill events and reconciliation polling can both deliver the same fill, especially during startup. Three lines of defense:

  1. Live reconciliation sanitizer - LiveExecutionEngine pre-filters on trade_id alone. If a report’s trade_id already exists on the order, skipped. (Even noisy duplicates with different last_px/last_qty are skipped at this layer; a warning is logged to flag potential venue data quality issues.)
  2. Core engine 4-field check - Order.is_duplicate_fill() checks (trade_id, order_side, last_px, last_qty). Exact replays log a warning and are skipped.
  3. Order.apply() invariant - hard error if trade_id already exists. The engine catches the error, logs full context, drops the fill, does not crash - this is the difference between a defensive hard-fail and a brittle one.
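The second and third lines of defense can be sketched together. The classes below are hypothetical miniatures (OrderState is not the Nautilus Order); they show how an exact replay is skipped by the 4-field check while a noisy duplicate falls through to the trade_id invariant and is dropped without crashing.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fill:
    trade_id: str
    order_side: str
    last_px: str
    last_qty: str


@dataclass
class OrderState:
    fills: list = field(default_factory=list)

    def is_duplicate_fill(self, fill: Fill) -> bool:
        """Exact replay: all four fields match a fill already applied."""
        return fill in self.fills

    def apply(self, fill: Fill) -> None:
        """Hard invariant: a trade_id may be applied at most once."""
        if any(f.trade_id == fill.trade_id for f in self.fills):
            raise ValueError(f"duplicate trade_id {fill.trade_id}")
        self.fills.append(fill)


def process_fill(order: OrderState, fill: Fill) -> str:
    """Return "applied", "skipped" (exact replay) or "dropped" (noisy duplicate)."""
    if order.is_duplicate_fill(fill):
        return "skipped"
    try:
        order.apply(fill)
    except ValueError:
        return "dropped"  # the engine would log full context here; no crash
    return "applied"
```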

Why this directly answers project_pm_ibkr_exit_invariant.md

The MK2 invariant: PM exit intent → SELL at IBKR → position actually closes. Nautilus’s reconcile-on-startup + continuous reconciliation + the OrderDenied / OrderRejected event surface enforces a stronger property:

Cache state and broker state cannot durably diverge.

If a cached SELL is missing from the venue, the next reconciliation poll detects it and emits the missing event. If the engine “thinks” a position is closed but the broker still reports position > 0, the position is reopened from venue truth. Alert-without-action is impossible because the alert would have to be issued without an emitted event, and the engine emits an event for every state change the venue confirms.

Order routing flow - Strategy.submit_order() to OrderFilled

End-to-end with branch points:

1.  strategy.submit_order(order)
        |
        | (publishes SubmitOrder command on the bus)
        v
2.  Routing decision (one of three branches):
       - if order.emulation_trigger != NO_TRIGGER  -> OrderEmulator
       - elif order.exec_algorithm_id != None      -> ExecAlgorithm
       - else                                       -> RiskEngine
        |
        | (RiskEngine validation in all paths after emulation/algo
        |  release back to the engine path)
        v
3.  RiskEngine.handle_submit_order()
       - precision / quantity / price / GTD / reduce_only / notional
       - rate limits / TradingState
       - if violated: emit OrderDenied (terminal); STOP
       - if passed:   forward to ExecutionEngine
        |
        v
4.  ExecutionEngine.handle_submit_order()
       - emit OrderInitialized (if not already)
       - identify ExecutionClient by Venue
       - emit OrderSubmitted (engine-side; before adapter ack)
       - dispatch to adapter
        |
        v
5.  ExecutionClient.submit_order()
       - translate to venue wire format
       - HTTP/WebSocket call
       - on adapter receiving venue ack -> emit OrderAccepted
        |
        v
6.  ExecutionClient receives venue execution report:
       - emit OrderFilled(last_qty, last_px, trade_id, commission)
        |
        v
7.  ExecutionEngine.handle_order_filled()
       - is_duplicate_fill check (4-field)
       - overfill check
       - Order.apply(event) -> hard trade_id invariant
       - resolve position_id (NETTING vs HEDGING; OMS override)
       - emit PositionOpened | PositionChanged | PositionClosed
        |
        v
8.  MessageBus dispatches to subscribers:
       - Strategy.on_order_filled, on_position_opened, etc.
       - Audit logger Actor: on_event
       - Portfolio: net_exposure / unrealized / realized PnL update

Branch points worth memorizing:

  • (1→2) Routing is parameter-driven; same submit_order call, different downstream path based on order metadata.
  • (3 → OrderDenied) Every reject produces a typed event with reason. Strategy’s on_order_denied handler runs; nothing reached the venue.
  • (4 → OrderRejected) Venue can also reject an order after OrderSubmitted (e.g., bad symbol, venue-side risk). Different event, same audit semantics.
  • (7) Position flips (long→short or short→long in a single fill) are split into close-then-open events so each event has clean semantics (per nautilus-events.md).

OMS - Order Management System interactions

The execution doc devotes a section to OMS handling. The OmsType enum has three variants:

  • UNSPECIFIED - defaults based on application context.
  • NETTING - one position per instrument ID.
  • HEDGING - multiple positions per instrument ID; supports both LONG and SHORT simultaneously.

OMS applies both to the strategy and to the venue. When they differ, the engine adjudicates by overriding or assigning position_id values:

| Strategy OMS | Venue OMS | Effect |
|---|---|---|
| NETTING | NETTING | Native; one position ID per instrument. |
| HEDGING | HEDGING | Native; multiple position IDs per instrument. |
| NETTING | HEDGING | Engine collapses venue’s multiple positions into a single Nautilus position ID. |
| HEDGING | NETTING | Engine maintains multiple “virtual” positions inside Nautilus; venue tracks one. |

Cortana implication: SPY 0DTE on IBKR. IBKR’s effective OMS is NETTING for equities (one net position per contract). Cortana wants NETTING strategy OMS too - one Position object per (symbol, strike, right, expiry) tuple. The default UNSPECIFIED will inherit NETTING from IBKR. No explicit configuration needed unless we ever want HEDGING (we don’t, since 0DTE chain has separate strikes that already partition exposure).

Execution algorithms - TWAP and the spawning model

ExecAlgorithm is an Actor that splits a primary order into spawned secondary orders. Built-in: TWAP. Custom is supported via subclassing.

Key mechanics:

  • A primary order arrives in on_order(order) when the strategy submits with exec_algorithm_id="TWAP".
  • The algorithm calls spawn_market(...), spawn_limit(...), or spawn_market_to_limit(...) to issue secondary orders. Each takes the primary as the first argument.
  • By default, reduce_primary=True decrements the primary’s leaves_qty by the spawned quantity. Spawned quantity must not exceed primary’s leaves_qty.
  • Spawned orders carry exec_spawn_id = primary.client_order_id and their own client_order_id is {exec_spawn_id}-E{spawn_sequence} (e.g., O-20230404-001-000-E1).
  • Cache provides orders_for_exec_algorithm(...) and orders_for_exec_spawn(...) for tracking.
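The spawn ID scheme and the reduce_primary accounting can be sketched as follows. PrimaryOrder and spawn are hypothetical stand-ins for illustration; the real spawn_market / spawn_limit methods take more arguments and return order objects.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class PrimaryOrder:
    client_order_id: str
    leaves_qty: Decimal
    spawn_sequence: int = 0


def spawn(primary: PrimaryOrder, quantity: Decimal, reduce_primary: bool = True) -> dict:
    """Create a spawned-order record and optionally reduce the primary."""
    if quantity > primary.leaves_qty:
        raise ValueError("spawned quantity exceeds primary leaves_qty")
    primary.spawn_sequence += 1
    if reduce_primary:
        primary.leaves_qty -= quantity
    return {
        "exec_spawn_id": primary.client_order_id,
        "client_order_id": f"{primary.client_order_id}-E{primary.spawn_sequence}",
        "quantity": quantity,
    }
```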

Cortana relevance: 0DTE position sizes are small (5-25 contracts), so TWAP/VWAP slicing is unnecessary at our scale today. But this is the mechanism we’d reach for if we ever start running larger sizes or want participation-rate execution. The “primary + spawned” pattern is also exactly how a defense-in-depth TP fallback could be modeled (primary = bracket TP, spawned = software-fallback market exit) - though Cortana’s actual fallback design uses a separate submit_order(reduce_only=True) path in the Strategy on on_quote_tick, not an exec algorithm.

Own order books

A new concept we don’t have in MK2: per-instrument L3 book of only your own orders, organized by price level. Updated automatically by the engine on submit/accept/modify/fill/cancel.

Use cases listed in the doc:

  • Real-time monitoring of your orders within the venue’s public book.
  • Validating order placement (liquidity check before submit).
  • Self-trade prevention (don’t place a buy at a price where your own sell is resting).
  • Queue position management.
  • Reconciliation between internal state and venue state.

Caveats from the doc:

  • Only orders with explicit prices can be in own books - market orders are excluded.
  • Safe cancellation queries: when querying for orders to cancel, use a status filter that excludes PENDING_CANCEL. Otherwise you risk duplicate cancel attempts and inflated open-order counts.
  • accepted_buffer_ns parameter on many query methods - only return orders whose ts_accepted is at least N nanoseconds in the past. When > 0, you must also pass ts_now. Pre-acceptance orders have ts_accepted = 0 so they enter the result once the buffer elapses; pair with an explicit status filter (ACCEPTED / PARTIALLY_FILLED) to exclude in-flight orders.
  • Audit interval: own_books_audit_interval_secs periodically cross-checks own-book state against the Cache’s open/inflight indexes.
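The safe-cancellation and accepted_buffer_ns caveats combine into one filter. The sketch below models that filter on plain records; OwnOrder and cancellable are hypothetical names, not the own-book query API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnOrder:
    client_order_id: str
    status: str
    ts_accepted: int  # nanoseconds; 0 until the venue accepts the order


def cancellable(
    orders: list,
    accepted_buffer_ns: int,
    ts_now: int,
    statuses: frozenset = frozenset({"ACCEPTED", "PARTIALLY_FILLED"}),
) -> list:
    """Orders safe to cancel: allowed status and accepted long enough ago."""
    return [
        o for o in orders
        if o.status in statuses and ts_now - o.ts_accepted >= accepted_buffer_ns
    ]
```

The explicit status filter matters because an in-flight order with ts_accepted = 0 would otherwise pass the buffer test once enough time has elapsed.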

Cortana relevance: nice-to-have. We don’t currently have self-trade risk (single strategy, single instrument family per process), but the own-order-book audit is another structural defense against state drift.

Overfill detection and handling

How overfills happen

Two fundamentally different causes:

  1. Genuine overfills at the matching engine - the venue actually filled more than requested. Causes per the doc:
    • Race conditions in fast markets (multiple counterparties match before the order is removed from the book).
    • Minimum lot size constraints (venue fills the min lot rather than leaving an untradeable remainder).
    • DEX/AMM mechanics (fill ≠ requested due to price impact).
    • Multi-fill non-atomicity at the venue.
  2. Duplicate fill events - the same fill is delivered more than once. Causes: WebSocket reconnection replays, venue retry/delivery guarantees, API timing issues, or reconciliation polling racing against real-time WebSocket fills.

System behavior

allow_overfills: bool config option on LiveExecEngineConfig:

| Setting | Behavior |
|---|---|
| False (default) | Logs and rejects the fill; preserves order state. |
| True | Logs a warning, applies the fill, tracks excess in overfill_qty. |

When True, order transitions to FILLED and leaves_qty clamps to 0.
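The two allow_overfills behaviors can be sketched on a minimal quantity record. OrderQty and apply_fill are hypothetical; they only reproduce the filled_qty + last_qty versus quantity comparison and the clamping described above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class OrderQty:
    quantity: Decimal
    filled_qty: Decimal = Decimal("0")
    overfill_qty: Decimal = Decimal("0")

    @property
    def leaves_qty(self) -> Decimal:
        # Clamped at zero once the order is filled or overfilled.
        return max(self.quantity - self.filled_qty, Decimal("0"))


def apply_fill(order: OrderQty, last_qty: Decimal, allow_overfills: bool = False) -> bool:
    """Apply a fill; return False if it was rejected as an overfill."""
    excess = order.filled_qty + last_qty - order.quantity
    if excess > 0 and not allow_overfills:
        return False  # rejected and logged; order state is preserved
    order.filled_qty += last_qty
    order.overfill_qty = max(excess, Decimal("0"))  # total excess over quantity
    return True
```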

When to enable

The doc’s recommendation: enable True on venues known to emit duplicate fills, or when reconciliation races are expected. For IBKR, genuine overfills are rare; reconciliation races are more common. Cortana spike-day action: leave default False and watch for rejected-fill warnings during paper trading. If we see them, switch to True plus monitoring rather than have orders silently in inconsistent state.

Reconciliation report variants

The execution engine consumes four reconciliation report variants from adapters in live trading:

| Variant | Use case | If order missing from cache |
|---|---|---|
| OrderStatusReport | Standalone order state update. | External order created from the report; if status is PartiallyFilled / Filled, an inferred fill is synthesised from avg_px / filled_qty. |
| FillReport | Standalone execution. | External order created from the fill (Market type, qty last_qty); the real fill is then applied so trade_id and commission are preserved. |
| OrderWithFills | Status update bundled with fills. | External order created without an inferred fill; supplied fills applied first; residual gap closed with inferred fill. |
| PositionStatusReport | Position snapshot from venue. | Logged; positions are derived from fills, not bootstrapped from this report. |

When to use each, per the doc:

  • OrderStatusReport: ordinary lifecycle (Accepted, PartiallyFilled, Canceled, Expired) where fill detail arrives separately.
  • FillReport: venues that surface a fill for venue-initiated closures without opening a user-level order. Canonical example: Hyperliquid liquidations (userFills entry with liquidation metadata but no entry on the orders stream).
  • OrderWithFills: when a single venue event maps to both a status update and one or more fills atomically. Binance Futures uses this for ADL, liquidation, and settlement orders via dispatch_exchange_generated_fill.
  • PositionStatusReport: position snapshots are advisory; the engine logs them but does not bootstrap positions from them.

External order creation

When a report references an order not in the cache:

  1. Venue-initiated event (ADL, liquidation, settlement).
  2. Order placed by a different process (other API client on the account; IBKR’s fetch_all_open_orders=True covers this).
  3. Order not yet observed locally (race during startup).

Engine creates an external order, routing ownership to:

  • The strategy that has called register_external_order_claims(...) for the instrument, or
  • The EXTERNAL strategy as a default fallback.

client_order_id comes from the report when present, else derived from venue_order_id. Order is added to the cache, the venue order ID index is registered, lifecycle events (OrderAccepted, OrderFilled, OrderCanceled, OrderExpired) are emitted so positions update through the normal event pipeline.
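The ownership and ID rules for external orders reduce to two lookups. This sketch is an assumption-laden illustration: resolve_owner and resolve_client_order_id are hypothetical helpers, claims is modeled as a plain dict from instrument to strategy, and the O-{venue_order_id} derivation is a placeholder format rather than a confirmed one.

```python
from typing import Optional

EXTERNAL = "EXTERNAL"


def resolve_owner(instrument_id: str, claims: dict) -> str:
    """Return the strategy that owns an external order for this instrument."""
    return claims.get(instrument_id, EXTERNAL)


def resolve_client_order_id(reported: Optional[str], venue_order_id: str) -> str:
    """Prefer the report's client_order_id; otherwise derive one from the venue ID."""
    # Placeholder derivation; the real format is not stated in these notes.
    return reported if reported else f"O-{venue_order_id}"
```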

Cortana relevance: if a human (or a bug) ever places an SPY option order on the same paper account outside Nautilus, Nautilus will adopt it as an external order rather than ignore it. We need to decide whether Cortana’s strategy should register_external_order_claims(...) for SPY options or whether we want unknown orders to land on the EXTERNAL strategy and be handled by an audit Actor.

Paper vs sim vs live - execution behavior unification

The execution doc describes a single conceptual pipeline; the differences between contexts live below the ExecutionClient line:

| Aspect | Backtest | Sandbox | Live |
|---|---|---|---|
| ExecutionEngine | ExecutionEngine | ExecutionEngine | LiveExecutionEngine |
| ExecutionClient | BacktestExecClient (matching engine) | Sandbox sim client | Venue adapter (IBKR, Binance, …) |
| Reconciliation | None - engine owns truth | None - sim owns truth | Startup snapshot + continuous polling |
| Fills | Simulated by matching engine + fill model (probabilistic limit fills, slippage, optional ThreeTierFillModel) | Simulated | Real, from venue |
| Clock | Data-driven, deterministic | Wall-clock (or accelerated) | Wall-clock |
| Random seed | Pinned for determinism | Pinned for determinism | n/a |
| OrderDenied from RiskEngine | Yes | Yes | Yes |
| OrderRejected from venue | Simulated by matching engine | Simulated | Real |

Strategy code is identical across all three contexts. The only behavioral differences a strategy author should know:

  1. Reconciliation events fire only in live (and carry reconciliation=True so a logger can distinguish them).
  2. Backtest fills are deterministic; live fills depend on real venue queue position.
  3. Backtest is replayable bit-identically given same seed + same data + same config. Live is not (latency, async ordering).

This unification is what makes backtest results predictive of live behavior - and what makes Nautilus’s claim of “backtest-live parity by construction” defensible (cf. nautilus-architecture.md).

Error and rejection handling

The taxonomy of failures, with the event each one produces:

| Failure | Event emitted | Origin |
|---|---|---|
| RiskEngine pre-trade reject | OrderDenied | RiskEngine |
| RiskEngine modify-time reject | OrderModifyRejected | RiskEngine |
| Venue submit reject | OrderRejected | ExecutionClient (translation of venue reject) |
| Venue modify reject | OrderModifyRejected | ExecutionClient |
| Venue cancel reject | OrderCancelRejected | ExecutionClient |
| GTD/DAY/IOC/FOK expiration | OrderExpired | ExecutionClient (or matching engine in backtest) |
| Overfill rejection (allow_overfills=False) | (logged, fill dropped) | ExecutionEngine |
| Duplicate trade_id exact replay | (logged warning, fill skipped) | ExecutionEngine 4-field check |
| Duplicate trade_id noisy replay | (logged error, fill dropped) | ExecutionEngine Order.apply() |
| Reconciliation discrepancy | Synthesized event (reconciliation=True) | LiveExecutionEngine |

Every typed event has reason or equivalent fields. A logger Actor subscribed to on_event captures all of these in causal order with zero per-rule plumbing. This is the structural basis for replacing MK2’s decisions.db (cf. nautilus-events.md).

The doc explicitly distinguishes “skip gracefully” from “drop with error” for fill processing: exact replays log a warning and skip; noisy duplicates (same trade_id, different qty/px) drop with full-context error log and DO NOT crash the engine. This is the “crash-only-for-invariants” posture from nautilus-architecture.md applied to execution: bad data is dropped, not panicked on, unless it violates a true invariant (e.g., applying a duplicate trade_id would double-count, which is detected and rejected).

Cortana MK3 implications - MK2 failure mode mapping

Each MK2 exit-path failure class, mapped to the Nautilus mechanism that prevents it.

GH #46 - Alert-without-action (project_pm_ibkr_exit_invariant.md)

MK2 failure: PM decided to exit, alerted Telegram, but no SELL landed at IBKR (or the SELL was rejected and the engine treated the rejection as success). Position bled to zero on theta.

Nautilus prevention: structural, by construction.

  1. The Strategy is the only thing that can submit orders. A “PM exit decision” is implemented as self.close_position(...) or a reduce-only self.submit_order(...) inside the Strategy. There is no “alert” code path that doesn’t also produce a SubmitOrder command on the bus.
  2. Every SubmitOrder produces typed events at every stage: OrderInitialized → (OrderDenied if RiskEngine blocks) → OrderSubmitted → (OrderRejected if venue blocks) → OrderAccepted → OrderFilled. An “alert” without a corresponding event sequence is impossible - there is no place for an alert to be issued from except a handler on one of these events.
  3. OrderRejected is a first-class event, not a string the adapter can swallow. The Strategy’s on_order_rejected handler runs; the audit logger’s on_event handler runs. Denial and rejection reasons are both permanently in the event stream.
  4. Continuous reconciliation detects “I think the position is closed but the broker reports qty > 0” within open_check_interval_secs and synthesizes the events to bring Cache and broker into agreement. The “alert lied for 4 hours” scenario described in exit-path-failure-modes.md Class 2 is bounded by the reconciliation cadence.

The structural property: alert iff event iff broker action. All three are coupled by construction.
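A minimal sketch of the "alert iff event" half of that coupling, assuming alerts are only ever produced inside an event handler. Event and AuditLogger are hypothetical stand-ins; the real Nautilus events carry far more fields.

```python
from dataclasses import dataclass, field


@dataclass
class Event:  # hypothetical minimal event
    kind: str  # e.g. "OrderSubmitted", "OrderRejected", "OrderFilled"
    client_order_id: str
    reason: str = ""


@dataclass
class AuditLogger:
    """Toy subscriber: alerts exist only as a function of received events."""

    events: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

    def on_event(self, event: Event) -> None:
        self.events.append(event)
        if event.kind == "OrderFilled":
            self.alerts.append(f"EXIT FILLED {event.client_order_id}")
        elif event.kind in ("OrderDenied", "OrderRejected"):
            self.alerts.append(f"EXIT FAILED {event.client_order_id}: {event.reason}")
```

Because the alert text is derived from the event kind, a rejected SELL cannot be reported as a completed exit.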

Spike-day verification: write a paper-mode test where a Strategy calls close_position() while the IBKR adapter is configured to drop the SELL command (mock the adapter). Expected behavior: OrderRejected or reconciliation discrepancy event emitted; Strategy notices via on_order_rejected; no silent state divergence.
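The shape of that test can be roughed out without Nautilus at all. The sketch below is an assumption-laden stand-in: DroppingAdapter, InflightOrder, and stale_inflight are hypothetical names, and threshold_ms only mirrors the idea behind inflight_check_threshold_ms. The real spike-day test would run against the Nautilus harness and assert on emitted events.

```python
from dataclasses import dataclass


@dataclass
class InflightOrder:  # hypothetical record of a submitted, unacknowledged order
    client_order_id: str
    submitted_ms: int


class DroppingAdapter:
    """Mock adapter that silently drops SELL commands (the failure under test)."""

    def __init__(self) -> None:
        self.sent: list = []

    def submit(self, client_order_id: str, side: str) -> bool:
        if side == "SELL":
            return False  # dropped: the venue never sees it
        self.sent.append(client_order_id)
        return True


def stale_inflight(orders: list, now_ms: int, threshold_ms: int) -> list:
    """Return IDs of orders with no venue response for longer than the threshold."""
    return [o.client_order_id for o in orders if now_ms - o.submitted_ms > threshold_ms]
```

The pass condition is that the dropped SELL shows up in the stale list once the threshold elapses, rather than being assumed filled.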

updatePortfolio position=0 reconciliation drift (fixed in fdcf6ad)

MK2 failure: IBKR reported position=0 realizedPNL=5625.58 continuously from 10:36:45 onward. Engine-side tracker never finalized; kept emitting EXIT_PENDING and recomputing ghost unrealized as last_known_qty * marketPrice_tick. Ghost “climbed” 11K over 4 hours.

Nautilus prevention: structural.

  1. Position state is owned by the ExecutionEngine, not by a parallel “tracker” object. Position.is_closed flips when net signed_qty == 0; PositionClosed event fires; realized_pnl finalizes; duration_ns populates. No “ghost unrealized” path exists.
  2. updatePortfolio-equivalent input is a PositionStatusReport from the IBKR adapter. Per the doc, the engine “logs” it but derives positions from fills, not from these reports. So a position report saying qty=0 doesn’t directly close the Nautilus position, but if the underlying fill events haven’t arrived, continuous reconciliation will detect the discrepancy and synthesize the missing fill events. Either way, the engine converges on broker truth.
  3. Cache writes for execution events are asynchronous - the doc warns “you might see a brief delay between an event and its appearance in the Cache” for execution events. Strategy handlers should rely on the event payload, not re-read Cache mid-handler. This is the discipline that prevents “I see qty=N in cache but the event said qty=0” races.
  4. No “tracker” exists. The position state is one object; nothing parallels it. The MK2 split between position_state and position_tracker cannot be reproduced.
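To make the "positions derive from fills" point concrete, here is a toy position whose only mutator is apply_fill. ToyPosition and its fields are hypothetical, not the Nautilus Position class; flips are deliberately out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPosition:
    """Fill-derived position state; a toy model with hypothetical names."""

    signed_qty: int = 0
    avg_px: float = 0.0
    realized_pnl: float = 0.0
    events: list = field(default_factory=list)

    def apply_fill(self, side: str, qty: int, px: float) -> None:
        signed = qty if side == "BUY" else -qty
        was_flat = self.signed_qty == 0
        if was_flat or (self.signed_qty > 0) == (signed > 0):
            # Opening or adding: update the average entry price.
            held = abs(self.signed_qty)
            self.avg_px = (self.avg_px * held + px * qty) / (held + qty)
        else:
            if qty > abs(self.signed_qty):
                raise ValueError("position flips are not modeled in this sketch")
            # Reducing: realize PnL on the closed quantity.
            direction = 1 if self.signed_qty > 0 else -1
            self.realized_pnl += direction * (px - self.avg_px) * qty
        self.signed_qty += signed
        if self.signed_qty == 0:
            self.events.append("PositionClosed")
        else:
            self.events.append("PositionOpened" if was_flat else "PositionChanged")

    def unrealized(self, mark: float) -> float:
        # Zero quantity means zero unrealized PnL, whatever the mark does.
        return (mark - self.avg_px) * self.signed_qty
```

With a single state object, the MK2 ghost (last_known_qty * marketPrice_tick on a closed position) has nowhere to live: unrealized is computed from the same signed_qty that the closing fill zeroed.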

Cite: position-state-machine.md, exit-path-failure-modes.md (Class 2 - Status without truth).

Tracker drift between position_state and position_tracker

MK2 failure: two parallel state stores for a position; one updated on engine action, the other on broker callback; they could drift.

Nautilus prevention: structural. There is no parallel store. The Position object is owned by the engine and lives in the Cache. Anything that wants to know about a position queries the cache; there is no “second source of truth.” Strategy queries via self.cache.position(position_id) or self.portfolio.net_position(...) read the same underlying state.

The closest analog to MK2’s tracker drift would be: “Cache says X but what about the venue?” - and the answer is the reconciliation mechanism, which is one-directional (venue truth → cache state). Drift cannot durably persist past one reconciliation cycle.
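The one-directional nature of reconciliation can be expressed as a pure function from (cache quantity, venue quantity) to the fill that closes the gap. SyntheticFill and reconcile are hypothetical names for illustration; only the reconciliation flag is taken from the page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyntheticFill:  # hypothetical stand-in for a reconciliation-generated fill
    side: str
    qty: int
    reconciliation: bool = True


def reconcile(cache_qty: int, venue_qty: int) -> Optional[SyntheticFill]:
    """Return the fill that moves cache state to venue truth, or None if aligned."""
    diff = venue_qty - cache_qty
    if diff == 0:
        return None
    return SyntheticFill(side="BUY" if diff > 0 else "SELL", qty=abs(diff))
```

Note that venue_qty is never written to; the function only ever proposes a change to the cache side.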

Cite: exit-path-failure-modes.md (Class 2), position-state-machine.md.

GH #88 - dead-code meta-prob sizing (project_codex_review_p2s.md)

MK2 failure: meta-prob sizing was defined in scoring code but silently never called by the position-sizing path. A refactor dropped the call, no test caught it, the gate became dead code.

Nautilus prevention: conditional on implementation choice.

If meta-prob sizing is implemented as a RiskEngine rule (or as a custom rule per the open extension question), then:

  1. Every order routes through the RiskEngine by construction. No submit_order call bypasses it.
  2. Risk rules are configured centrally on RiskEngineConfig, not embedded per-strategy. A strategy can’t accidentally fail to reference the rule because it doesn’t reference rules at all.
  3. The rule receives every order before it leaves Nautilus. Its evaluation is deterministic; it can scale quantity, deny via OrderDenied, or pass.

If meta-prob is instead implemented as inline strategy logic, GH #88 is not prevented - the same kind of dead-code refactor regression remains possible.

Recommendation, restated from nautilus-strategies.md: meta-prob lives in the RiskEngine. Strategy submits at unweighted base size; the custom rule scales or vetoes. This makes the gate impossible to bypass.
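A rough sketch of what "the rule scales or vetoes" could look like, independent of whatever the real extension API turns out to be (that is still the open Q5). OrderRequest, meta_prob_rule, submit, and the 0.5 floor are all assumptions for illustration, not Nautilus types or defaults.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class OrderRequest:  # hypothetical; not a Nautilus type
    strategy_id: str
    qty: int
    meta_prob: float


def meta_prob_rule(order: OrderRequest, floor: float = 0.5) -> Optional[OrderRequest]:
    """Hypothetical rule: deny below the floor, otherwise scale qty by meta_prob."""
    if order.meta_prob < floor:
        return None  # would surface as OrderDenied with a reason
    scaled = max(1, int(order.qty * order.meta_prob))
    return replace(order, qty=scaled)


def submit(order: OrderRequest, rules=(meta_prob_rule,)) -> Optional[OrderRequest]:
    """Single choke point: every order passes every rule before leaving."""
    for rule in rules:
        order = rule(order)
        if order is None:
            return None
    return order
```

The strategy never names the rule; it only calls submit. A refactor inside the strategy therefore cannot drop the gate, which is the property GH #88 lacked.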

Open dependency: extension API for custom rules (Q5 from the spike plan; see § RiskEngine § “Custom RiskEngine rules” above). Spike-day code reading is required to pin down the actual API. The backup plan (pre-submit Actor) achieves the same routing-by-construction property if the RiskEngine path turns out to require subclassing.

Cite: project_codex_review_p2s.md, GH #88, nautilus-strategies.md.

EOD-flatten with open positions (feedback_no_kill_with_open_positions.md)

MK2 caution: never kill the engine while a position is open.

Nautilus alignment: Strategy.market_exit() is the supported graceful path. From nautilus-strategies.md:

  1. Cancels all open and in-flight orders.
  2. Closes all open positions with reduce-only market orders.
  3. Periodically re-checks (market_exit_interval_ms, market_exit_max_attempts).
  4. Calls post_market_exit once flat or after max attempts.
  5. Non-reduce-only orders are denied during exit, structurally preventing a race where on_data fires a fresh entry mid-exit.
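The five steps above can be modeled as a small loop. This is a long-only toy with hypothetical names (ToyExit, fills_per_attempt); it is not the Strategy.market_exit() implementation, and it collapses the periodic re-check into a bounded for-loop.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExit:
    """Toy model of the graceful-exit loop; names are hypothetical."""

    open_orders: list = field(default_factory=list)
    position_qty: int = 0
    max_attempts: int = 3
    exiting: bool = False
    log: list = field(default_factory=list)

    def submit(self, qty: int, reduce_only: bool) -> bool:
        if self.exiting and not reduce_only:
            self.log.append("denied")  # step 5: fresh entries refused mid-exit
            return False
        self.position_qty += qty
        return True

    def market_exit(self, fills_per_attempt: int) -> bool:
        self.exiting = True
        self.open_orders.clear()  # step 1: cancel working orders
        for _ in range(self.max_attempts):  # step 3: bounded re-checks
            if self.position_qty == 0:
                break
            close = -min(self.position_qty, fills_per_attempt)
            self.submit(close, reduce_only=True)  # step 2: reduce-only close
        self.log.append("post_market_exit")  # step 4: flat or attempts exhausted
        return self.position_qty == 0
```

The exiting flag is what closes the on_data race: once it is set, the only orders that can change position_qty are the reduce-only closes.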

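The TradingState setting adds a complementary safety. HALTED refuses all submits while allowing cancels, so set it only once the exit has gone flat; if it is set while market_exit() is still working, the reduce-only closes are refused as well. To block new entries while positions are still being closed, REDUCING is the state that permits cancels and position-reducing orders only.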

Cite: feedback_no_kill_with_open_positions.md, feedback_dual_tp_defense_in_depth.md, project_eod_power_hour.md.

Open question #5 resolution status

Q5: Does Nautilus expose a clean extension point for a per-strategy meta_prob veto/scaler, or does it require subclassing LiveRiskEngine?

This page resolves: PARTIAL.

What the page confirms:

  • Every order goes through the RiskEngine by construction (✓ structural property required for “GH #88 impossible”).
  • OrderDenied is the standard event for any pre-trade rejection (✓ a rule can deny cleanly with a typed event + reason).
  • RiskEngineConfig is the central knob bag (✓ knobs exist; enumeration of which knobs is in API ref, not on this concept page).

What the page does not answer:

  • Whether RiskEngineConfig accepts user-defined rule callables / RiskRule trait objects, or whether the rule set is fixed.
  • The signature of a custom rule (does it see the order? the strategy ID? the cache?).
  • Whether nautilus_trader.risk.sizing provides a sizing-extension point distinct from validation rules.

Spike Saturday action items:

  1. Read crates/risk/src/engine.rs - look for register_rule(...) or a Vec<Box<dyn RiskRule>> field on the engine struct.
  2. Read nautilus_trader/risk/engine.py - look for the Python wrapper on rule registration; check whether RiskEngineConfig has a custom_rules: list[RiskRule] parameter or similar.
  3. Read nautilus_trader/risk/sizing.py - see if there’s a sizing extension point distinct from the rule pipeline.
  4. If no public API exists: implement meta-gate as the pre-submit Actor pattern (EntryIntent → EntryApproved/Denied) and treat the RiskEngine as the second line of defense. Document the choice in a nautilus-meta-gate.md brain page.
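For the fallback in item 4, the message flow might look like the following. Everything here is hypothetical: EntryIntent, EntryApproved, and EntryDenied are the message names from the item above, while MetaGate, GatedStrategy, and the floor parameter are invented for the sketch. In a real build the gate would be a Nautilus Actor exchanging these messages over the bus.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class EntryIntent:  # hypothetical message: strategy wants to enter
    symbol: str
    qty: int
    meta_prob: float


@dataclass(frozen=True)
class EntryApproved:  # hypothetical reply: gate approved
    symbol: str
    qty: int


@dataclass(frozen=True)
class EntryDenied:  # hypothetical reply: gate vetoed, with a reason
    symbol: str
    reason: str


class MetaGate:
    """Toy pre-submit gate standing in for an Actor on the message bus."""

    def __init__(self, floor: float) -> None:
        self.floor = floor

    def on_intent(self, intent: EntryIntent) -> Union[EntryApproved, EntryDenied]:
        if intent.meta_prob < self.floor:
            reason = f"meta_prob {intent.meta_prob:.2f} < {self.floor:.2f}"
            return EntryDenied(intent.symbol, reason)
        return EntryApproved(intent.symbol, intent.qty)


class GatedStrategy:
    """Submits only from the approval handler, so no entry can skip the gate."""

    def __init__(self) -> None:
        self.submitted: list = []

    def on_reply(self, reply: Union[EntryApproved, EntryDenied]) -> None:
        if isinstance(reply, EntryApproved):
            self.submitted.append((reply.symbol, reply.qty))
```

The denial carries a typed reason, so the audit trail looks the same as it would for an OrderDenied from the RiskEngine.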

This is exactly the kind of “concept doc tells me the seam exists, code reading confirms the API shape” question: pre-spike-day prep can resolve roughly 80% of it, and the remaining 20% has to wait for source reading.

Caveats and gotchas

  • Cache update lag for execution events. The doc explicitly warns that “you might see a brief delay between an event and its appearance in the Cache” in live trading. In a handler, prefer the event payload to a re-read of the Cache for exact-at-event state.
  • Reconciliation events look real. Always check event.reconciliation before treating an event as fresh broker action; otherwise, expect alert spam during startup.
  • Partial AccountState overwrites. The adapter must emit complete margin snapshots; partial snapshots wipe out account-wide entries silently until the next full snapshot. Per nautilus-events.md.
  • Emulated orders transform on release. Hold object references at your peril - query the cache by client_order_id. Events for emulated orders include OrderEmulated (intake) and OrderReleased (release).
  • Position flips emit two events (close-then-open). Audit consumers must treat them as a pair, not a single transition.
  • Cancel/query commands skip the RiskEngine. This is correct (you don’t want a stale rule blocking a needed cancel) but worth knowing if you’re tempted to put a rule on the cancel path.
  • OCO on IBKR doesn’t auto-create OCA groups. Setting ContingencyType.OCO on a Nautilus order does not create the IB-side OCA group. Use IBOrderTags(ocaGroup=..., ocaType=...). Per nautilus-integrations.md.
  • fetch_all_open_orders=True on IBKR pulls orders placed by any API client on the account. Useful for surviving restarts; default False. If True, expect to see external orders adopted as EXTERNAL strategy on startup.
  • Reconciliation race conditions scale with reconciliation frequency. Defaults: open_check_threshold_ms=5000, inflight_check_threshold_ms=5000, reconciliation_startup_delay_secs=10. Reducing any of them increases duplicate-fill probability.
  • PositionStatusReport is not used to bootstrap positions - positions derive from fills. A position-only snapshot from the adapter is logged but ignored for state mutation. (Adapter authors must not rely on PositionStatusReport to “fix” missing fills; the right tool is OrderStatusReport + FillReport / OrderWithFills.)
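As an illustration of the reconciliation-flag caveat, an alerting consumer might filter like this. The events are SimpleNamespace stand-ins and the kind field is hypothetical; only the reconciliation attribute comes from the page.

```python
from types import SimpleNamespace


def fresh_fills(events):
    """Keep fills that reflect new broker action, skipping reconciliation-generated ones."""
    return [
        e for e in events
        if e.kind == "OrderFilled" and not getattr(e, "reconciliation", False)
    ]


# Hypothetical event stream: one fill synthesized at startup, one live fill.
events = [
    SimpleNamespace(kind="OrderFilled", client_order_id="O-1", reconciliation=True),
    SimpleNamespace(kind="OrderFilled", client_order_id="O-2", reconciliation=False),
    SimpleNamespace(kind="OrderAccepted", client_order_id="O-3", reconciliation=False),
]
alertable = [e.client_order_id for e in fresh_fills(events)]
```

The synthesized fill still belongs in the audit log; it is only the alerting path that should skip it.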

When this concept applies

  • Designing the MK3 IBKR adapter integration (or evaluating the shipped one).
  • Deciding where meta-prob lives (RiskEngine rule vs Strategy inline vs pre-submit Actor).
  • Designing the EOD-flatten / kill-switch path (market_exit() + HALTED TradingState).
  • Reasoning about whether a proposed change preserves the “alert iff event iff broker action” invariant.
  • Auditing whether Cortana’s exit path closes the GH #46 trust class.
  • Reading reconciliation race / overfill / external-order behavior during paper-trading bring-up.

When it breaks / does not apply

  • The page does not document the RiskEngineConfig field set in detail; refer to API docs.
  • The page does not document the public API for adding custom risk rules; this requires source reading (Q5).
  • Performance characterization - latency budgets per stage, queue depths, throughput limits - is not on this page; check nautilus-architecture.md and the live-trading concept page.
  • Specific venue quirks (IBKR OCA, Binance ADL events) live in nautilus-integrations.md, not here.

See Also

  • Nautilus Architecture - the kernel topology that hosts the execution components.
  • Nautilus Strategies - the only component that originates orders; lifecycle, handlers, market_exit().
  • Nautilus Events - the 17 order-lifecycle events + 3 position events; what every stage emits.
  • Nautilus Orders (parallel agent - order types, TIF, contingency).
  • Nautilus Positions (parallel agent - position lifecycle, OMS effects).
  • Nautilus Integrations - IBKR adapter detail; UW custom-adapter sketch.
  • Exit-path failure modes - MK2 trust classes (alert-without-action, status-without-truth, entry-window races).
  • Position state machine - MK2 PM state-gate enforcement.
  • 2026-05-09 Nautilus Spike Plan: ~/conductor/workspaces/cortanaroi-mk2/belo-horizonte/plans/2026-05-09-nautilus-spike.md
  • project_pm_ibkr_exit_invariant.md - MK2 invariant Nautilus enforces by construction.
  • project_codex_review_p2s.md - GH #88 dead-code meta sizing context.
  • feedback_dual_tp_defense_in_depth.md - TP fallback pattern that survives migration via reduce-only submit_order in on_quote_tick.
  • feedback_no_kill_with_open_positions.md - market_exit() plus HALTED TradingState as the supported graceful path.

Timeline

2026-05-07 | Cody - Filed during pre-spike concept mastery sweep batch 2.