Nautilus Execution
Nautilus’s execution stack is a single deterministic command/event pipeline:
Strategy → (OrderEmulator | ExecAlgorithm) → RiskEngine → ExecutionEngine → ExecutionClient → Venue, with events streaming back through the ExecutionEngine to update Cache, Portfolio, and Strategy. The pipeline is identical in backtest, sandbox, and live; only the ExecutionClient and the presence of a reconciliation loop differ. Every order goes through every stage by construction - there is no “fast path” that bypasses the RiskEngine, no PM-side handle that talks directly to the broker, no place a SELL command can be issued without an ExecutionClient route to a venue. This is the structural property that closes MK2’s exit-path failure classes (alert-without-action GH #46, updatePortfolio reconciliation drift, tracker drift, dead-code meta sizing GH #88). The doc explicitly defines RiskEngine, ExecutionEngine, ExecutionClient contracts, OMS handling, overfill detection, four reconciliation report variants, the external_order_claims mechanism for venue-initiated fills, and the LiveExecEngineConfig knobs (open_check_interval_secs, inflight_check_threshold_ms, reconciliation_startup_delay_secs, allow_overfills) that govern continuous reconciliation against broker truth.
Cody’s question - answered
Q: Does the Nautilus execution architecture make MK2’s exit-path bug
classes (GH #46, tracker drift, updatePortfolio race, GH #88 dead-code
sizing) structurally impossible?
A: Yes for #46, tracker drift, and the updatePortfolio race -
prevented by construction. Yes for #88 conditionally - dead-code meta
sizing is impossible IFF meta-prob is implemented as a RiskEngine rule
rather than inline in the Strategy, because every order routes through
the RiskEngine by construction. The “is it implemented as a rule” choice
is on us; Nautilus provides the seam, not the rule.
The execution doc’s verbatim claim on RiskEngine routing: “Cancel and query commands route directly to other execution components and do not pass through the RiskEngine” - but submit and modify commands always do. That asymmetry is correct (you don’t want a cancel blocked by a stale risk rule) and load-bearing for our use case.
Core claim
The concepts/execution/ page presents execution as a typed pipeline of
components communicating via the MessageBus. The components are explicit:
- Strategy - the only thing allowed to originate order commands.
- OrderEmulator - handles order types the venue does not natively support (e.g., trailing stops on a venue without them); transforms them into supported types when the trigger fires.
- ExecAlgorithm - splits one primary order into many spawned secondary orders (TWAP is the built-in example); the execution algorithm is itself an Actor and can subscribe to data, set timers, and read the Cache.
- RiskEngine - pre-trade validation gate; the only place size / notional / precision / state checks live.
- ExecutionEngine (LiveExecutionEngine in live mode) - routes commands to the right ExecutionClient, applies fill events to orders, resolves position IDs, emits OrderFilled/PositionOpened/PositionChanged/PositionClosed, runs continuous reconciliation against venue truth in live mode.
- ExecutionClient (LiveExecutionClient in live) - the venue adapter; wire-format-specific code that talks REST/WebSocket to the exchange and emits typed events back into the engine.
Routing per command type, verbatim from the doc:
- “submit_order(...) routes to OrderEmulator for emulated orders, to an ExecAlgorithm when exec_algorithm_id is set, and to the RiskEngine otherwise.”
- “submit_order_list(...) follows the same branching behavior based on emulation and exec_algorithm_id.”
- “modify_order(...) routes to the OrderEmulator for emulated orders and to the RiskEngine otherwise.”
- “Cancel and query commands can route directly to the OrderEmulator, ExecAlgorithm, or ExecutionEngine, depending on the command and order state.”
For new order submission the typical chain is
Strategy → OrderEmulator | ExecAlgorithm | RiskEngine then downstream
OrderEmulator → ExecAlgorithm | ExecutionEngine and
ExecAlgorithm → RiskEngine → ExecutionEngine → ExecutionClient. In all
new-submission paths the RiskEngine runs before the order leaves
Nautilus - even an emulated order, when released, transforms into a
basic order type and is then sent through the standard pipeline (which
includes RiskEngine validation).
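The three-way branch for new submissions can be sketched in a few lines. This is a toy model, not Nautilus code: OrderStub, NO_TRIGGER, and route_submit are hypothetical names, and only the two fields the quoted routing rules mention are modeled.

```python
from dataclasses import dataclass
from typing import Optional

NO_TRIGGER = "NO_TRIGGER"  # hypothetical stand-in for the "not emulated" value


@dataclass
class OrderStub:
    """Hypothetical stand-in holding only the fields that drive routing."""
    emulation_trigger: str = NO_TRIGGER
    exec_algorithm_id: Optional[str] = None


def route_submit(order: OrderStub) -> str:
    """Return the first component a new submit command is sent to."""
    if order.emulation_trigger != NO_TRIGGER:
        return "OrderEmulator"
    if order.exec_algorithm_id is not None:
        return "ExecAlgorithm"
    return "RiskEngine"


print(route_submit(OrderStub()))
print(route_submit(OrderStub(exec_algorithm_id="TWAP")))
print(route_submit(OrderStub(emulation_trigger="BID_ASK")))
```

The point of the sketch is that the call site never changes; only order metadata selects the path.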
ExecutionEngine - responsibilities
From nautilus-architecture.md and the execution doc combined:
“Manages order lifecycle and execution: routes trading commands to the appropriate adapter clients, tracks order and position states, coordinates with risk management systems, handles execution reports and fills from venues, handles reconciliation of external execution state.”
Concrete responsibilities:
- Command routing - accepts validated SubmitOrder/SubmitOrderList/ModifyOrder/CancelOrder/CancelAllOrders/QueryOrder/QueryAccount commands and dispatches each to the ExecutionClient registered for the order’s Venue.
- Event ingestion - receives OrderEvent and reconciliation reports from the ExecutionClient, applies them to the order in the Cache, and re-publishes on the bus.
- Position resolution - on OrderFilled, resolves which position (existing or new) the fill belongs to and emits the corresponding PositionOpened/PositionChanged/PositionClosed event.
- OMS adjudication - when strategy and venue OMS types differ (NETTING vs HEDGING), assigns or overrides position_id values on incoming fills to maintain the strategy’s view of positions.
- Overfill detection - before applying any fill, compares filled_qty + last_qty against quantity; behavior controlled by allow_overfills (False by default, rejects + logs; True applies the fill and tracks overfill_qty).
- Duplicate-fill detection - Order.is_duplicate_fill() checks (trade_id, order_side, last_px, last_qty) before apply(); exact replays log a warning and skip; the Order.apply() trade_id invariant is the hard backstop.
- External-order creation - on receipt of a report referencing an unknown order (venue-initiated ADL/liquidation/settlement, restart, another API client), creates an external order owned by the EXTERNAL strategy or the strategy that has called register_external_order_claims(), then plays the lifecycle events through the normal pipeline.
- Reconciliation (live only) - startup snapshot of broker truth, plus a continuous reconciliation loop that polls the venue at open_check_interval_secs and matches against the Cache. Events synthesized from reconciliation carry reconciliation=True so downstream consumers can distinguish them from venue-originated events.
- Cache writes for execution events - note the asymmetry vs Data: “For execution events, the Cache update is asynchronous in live; rely on the event payload directly if you need exact-at-event state.”
RiskEngine - responsibilities and pre-trade checks
The execution doc enumerates the RiskEngine’s checks verbatim. Unless
specifically bypassed in RiskEngineConfig, the engine validates:
- Price and trigger-price precision for the instrument.
- Positive prices, unless the instrument class allows negative prices.
- Quantity precision and base-quantity min/max bounds.
- GTD orders have not already expired.
- reduce_only orders do not increase the referenced position.
- max_notional_per_order engine-level limits and instrument max_notional limits.
- Cash-account balance impact for non-margin accounts.
- Submit and modify rate limits.
- Trading-state restrictions (ACTIVE, HALTED, REDUCING).
On failure:
- A submit-time failure produces an OrderDenied event with a human-readable reason; the order never reaches the venue.
- A modify-time failure produces OrderModifyRejected.
The OrderDenied event is the load-bearing audit row: every blocked
order produces a typed event with reason: str, so a logger Actor
subscribed to on_order_denied (or on_order_event) captures all
risk-engine rejections without per-rule plumbing.
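A minimal sketch of the "every failed check becomes a typed denial with a reason" idea, covering only three of the listed checks. DeniedStub and pre_trade_check are hypothetical names; the real RiskEngine check order and messages are not reproduced here, and the negative-price exception for some instrument classes is ignored.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class DeniedStub:
    """Hypothetical stand-in for an OrderDenied event."""
    client_order_id: str
    reason: str


def pre_trade_check(
    client_order_id: str,
    price: Decimal,
    quantity: Decimal,
    price_precision: int,
    max_notional: Decimal,
) -> Optional[DeniedStub]:
    """Return a denial carrying a reason, or None when the order may proceed."""
    decimals = max(-price.as_tuple().exponent, 0)  # digits after the decimal point
    if decimals > price_precision:
        return DeniedStub(client_order_id, f"price {price} exceeds precision {price_precision}")
    if price <= 0:
        return DeniedStub(client_order_id, f"price {price} is not positive")
    if price * quantity > max_notional:
        return DeniedStub(client_order_id, f"notional {price * quantity} exceeds {max_notional}")
    return None


denied = pre_trade_check("O-1", Decimal("1.234"), Decimal("10"), 2, Decimal("1000"))
print(denied.reason if denied else "passed")
```

Because each branch returns the same event shape, an audit logger needs one handler regardless of which rule fired.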
TradingState - kill-switch
Three states control the entire submission path:
- ACTIVE - submit and modify operate normally.
- HALTED - new submit and modify commands are denied. Cancels still pass through.
- REDUCING - cancels allowed; submits or modifies that increase exposure are rejected; reduce-only operations pass.
This is the venue-agnostic equivalent of MK2’s “hit the kill switch”
button. HALTED is exactly the right state for an on-call human to flip
when the engine is misbehaving - pending orders can still be canceled
but no new entries can be opened. Maps cleanly to
feedback_no_kill_with_open_positions.md: a HALTED engine with open
positions can still cancel its resting orders via the (allowed) cancel
path. Closing the positions themselves requires a submit, so REDUCING
(reduce-only submits pass) is the state to use when exits must still go out.
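The three states reduce to a small truth table. The sketch below is a toy model of that table as described above; the enum and is_allowed are hypothetical names defined locally, not imported from Nautilus.

```python
from enum import Enum


class TradingState(Enum):
    ACTIVE = 1
    HALTED = 2
    REDUCING = 3


def is_allowed(state: TradingState, command: str, increases_exposure: bool = False) -> bool:
    """command is 'submit', 'modify', or 'cancel' (hypothetical labels)."""
    if command == "cancel":
        return True  # cancels pass in every state
    if state is TradingState.ACTIVE:
        return True
    if state is TradingState.HALTED:
        return False  # all new submits and modifies are denied
    return not increases_exposure  # REDUCING: only exposure-reducing commands pass


print(is_allowed(TradingState.HALTED, "cancel"))
print(is_allowed(TradingState.HALTED, "submit"))
print(is_allowed(TradingState.REDUCING, "submit", increases_exposure=False))
```

Note that under HALTED a reduce-only submit is denied as well, which is why REDUCING is the state that still lets exits through.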
RiskEngineConfig - known knobs
The doc references RiskEngineConfig but does not enumerate every option
on this page; the API reference is the source of truth. From the
execution doc plus nautilus-architecture.md:
- bypass: bool - disables all checks (testing only).
- max_order_submit_rate: tuple[int, str] - e.g., (100, "00:00:01").
- max_order_modify_rate: tuple[int, str].
- max_notional_per_order: dict[InstrumentId, Money].
- Per-instrument max_notional via the Instrument itself.
Custom RiskEngine rules - the Q5 question
The concepts/execution/ page does NOT document a public extension
API for adding a custom rule (e.g., a meta-prob veto/scaler) to the
RiskEngine pipeline. The page describes the built-in checks and the
RiskEngineConfig knobs but stops short of “here is how you write your
own rule.” Three paths exist in principle, none explicitly documented
on this page:
- Subclass LiveRiskEngine and override the validation method. Heavyweight; requires re-doing the entire registration / wiring path in TradingNode; almost certainly not the intended seam.
- Pre-submit Actor that subscribes to a custom EntryIntent event from the Strategy, applies the meta-prob check, and either re-publishes EntryApproved (Strategy actually calls submit_order on receipt) or EntryDenied (Strategy logs and skips). The RiskEngine is then a second line of defense, not the meta-gate. This is a clean pattern but it is not “the RiskEngine evaluates meta-prob”; it’s “an Actor gates entry intents before they become orders.”
- OrderDenied from a custom validation Actor that intercepts the submit command on the bus before the RiskEngine sees it. Possible but fragile (depends on subscription order).
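The pre-submit Actor pattern can be sketched without any Nautilus imports. Everything below is hypothetical scaffolding: TinyBus stands in for the MessageBus, MetaGate for the gating Actor, and the topic names and 0.6 threshold are placeholders chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


class TinyBus:
    """Minimal synchronous pub/sub, standing in for the MessageBus."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, message):
        for handler in list(self._handlers[topic]):
            handler(message)


@dataclass(frozen=True)
class EntryIntent:
    symbol: str
    meta_prob: float


class MetaGate:
    """Approves or denies intents; it never submits orders itself."""

    def __init__(self, bus, threshold):
        self._bus = bus
        self._threshold = threshold
        bus.subscribe("EntryIntent", self.on_intent)

    def on_intent(self, intent):
        approved = intent.meta_prob >= self._threshold
        self._bus.publish("EntryApproved" if approved else "EntryDenied", intent)


bus = TinyBus()
audit, submitted = [], []
bus.subscribe("EntryApproved", lambda i: audit.append(("approved", i.symbol)))
bus.subscribe("EntryDenied", lambda i: audit.append(("denied", i.symbol)))
# The Strategy would call submit_order(...) here; the list stands in for that.
bus.subscribe("EntryApproved", lambda i: submitted.append(i.symbol))
gate = MetaGate(bus, threshold=0.6)

bus.publish("EntryIntent", EntryIntent("SPY-CALL", 0.72))
bus.publish("EntryIntent", EntryIntent("SPY-PUT", 0.41))
print(submitted)
print(audit)
```

The design choice worth noting: the Strategy only ever submits from the approval handler, so the approve/deny events double as the audit trail.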
Spike-day actions for Q5 (this doc resolves the question partially):
- Confirmed: every order routes through RiskEngine by construction. This is the structural property we needed.
- Confirmed: OrderDenied is the standard event emitted by any pre-trade rejection.
- Open: read crates/risk/src/engine.rs for the actual rule-registration API. Specifically look for a register_rule(...) or trait-based extension point.
- Open: look at nautilus_trader/risk/engine.py (Python wrapper) and nautilus_trader/risk/sizing.py for sizing extension points.
- Open: check whether RiskEngineConfig accepts a list of custom rule callables/objects or is fixed to the built-in set.
- Backup plan if no extension API: implement meta-gate as the pre-submit Actor pattern above. Less elegant, still structural - the EntryIntent → EntryApproved/Denied events become the audit trail and meta-prob lives in one Actor that every strategy must consult by bus topology.
ExecutionClient - the adapter contract
What an adapter must implement (compiled from the execution doc plus
nautilus-integrations.md):
- Connection lifecycle - connect(), disconnect(), watchdog + reconnect with IB_MAX_CONNECTION_ATTEMPTS-style retries.
- Account / position bootstrap - pull balances, margins, positions on connect; emit one or more AccountState events per venue (complete margin snapshots - partial snapshots overwrite per nautilus-events.md).
- Order command handlers - submit_order, submit_order_list, modify_order, cancel_order, cancel_all_orders, query_order, query_account. Each translates from Nautilus’s domain order to the venue’s wire format.
- Event emitters - for every venue lifecycle event, emit the corresponding Nautilus event: OrderSubmitted, OrderAccepted, OrderRejected, OrderTriggered, OrderUpdated, OrderModifyRejected, OrderCancelRejected, OrderCanceled, OrderExpired, OrderFilled. Timestamps must be UNIX-epoch nanoseconds (UTC).
- Reconciliation report emission - produce one of the four report variants on demand:
  - OrderStatusReport - standalone order state update.
  - FillReport - standalone execution.
  - OrderWithFills - bundled status + fills (atomic).
  - PositionStatusReport - position snapshot.
- Continuous reconciliation hook - answer the engine’s periodic poll for open orders / open positions / account state.
- Symbology translation - venue-native symbols ↔ Nautilus InstrumentId. IBKR exposes IB_SIMPLIFIED and IB_RAW modes.
The adapter is responsible for emitting complete margin snapshots,
deterministic trade_id values for reconciliation-synthesized fills
(documented requirement: “deterministic hashes of the reconciliation
fill inputs, so a restart that replays reconciliation produces the same
trade_id and is deduped”), and timestamps in UTC nanoseconds.
The adapter is not responsible for:
- Position tracking (the engine owns positions; adapter emits fills).
- Risk validation (the RiskEngine owns this).
- Cache writes (the engine owns the Cache).
- Order ID synthesis when client_order_id is supplied (the engine generates it via OrderFactory); adapter generates venue_order_id from venue acks.
Reconcile-on-startup pattern (live only)
The doc and nautilus-integrations.md together describe live
reconciliation as a two-phase mechanism:
Phase 1 - startup snapshot
On connect():
- Adapter pulls all open orders, all positions, account balances/margins from the venue.
- Engine compares against the Cache (which may have been rehydrated from Redis if Cache database is configured).
- For every venue order/position not in the Cache: emit synthesized events (OrderAccepted, OrderFilled, PositionOpened) with reconciliation=True.
- For every Cache order/position not in venue truth: this is the harder case - the doc implies the engine treats venue truth as authoritative; the cached order is marked appropriately (OrderCanceled synthesized, or the cache entry pruned).
- reconciliation_startup_delay_secs (default 10) is the window given to WebSocket connections to stabilize before continuous reconciliation begins.
Phase 2 - continuous reconciliation
While running:
- LiveExecutionEngine periodically polls the venue at open_check_interval_secs (default not stated on this page; see API reference).
- For each open order in Cache, asks the adapter for current status.
- If adapter returns a status that differs from Cache: synthesize the delta event (OrderFilled for missed fills, OrderCanceled for missed cancels, etc.) - reconciliation=True.
- open_check_threshold_ms (default 5000 ms) - engine waits at least this long before acting on a discrepancy, to allow real-time events to land first. Lowering this risks duplicate fills via real-time vs reconciliation race.
- inflight_check_threshold_ms (default 5000 ms) - same idea for in-flight orders awaiting venue acknowledgment.
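The polling loop above amounts to a diff between cached and venue status, gated by a quiet-period threshold. The following is a rough sketch of that logic only; SyntheticEvent, reconcile, and the status strings are hypothetical, and the "order in Cache but absent at the venue" case is deliberately left out because its exact handling is not pinned down above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticEvent:
    kind: str
    client_order_id: str
    reconciliation: bool = True  # marks the event as reconciliation-synthesized


def reconcile(cache_status, venue_status, last_event_ms, now_ms, threshold_ms=5000):
    """Emit delta events for orders whose venue status differs from the Cache."""
    delta_kind = {"FILLED": "OrderFilled", "CANCELED": "OrderCanceled"}
    events = []
    for order_id, cached in cache_status.items():
        venue = venue_status.get(order_id)
        if venue is None or venue == cached:
            continue  # nothing reported, or already in agreement
        if now_ms - last_event_ms.get(order_id, 0) < threshold_ms:
            continue  # too recent: give the real-time event time to land
        kind = delta_kind.get(venue)
        if kind is not None:
            events.append(SyntheticEvent(kind, order_id))
    return events


cache = {"O-1": "ACCEPTED", "O-2": "ACCEPTED", "O-3": "ACCEPTED"}
venue = {"O-1": "FILLED", "O-2": "ACCEPTED", "O-3": "CANCELED"}
last_seen = {"O-1": 1_000, "O-2": 1_000, "O-3": 9_000}
print(reconcile(cache, venue, last_seen, now_ms=10_000))
```

In this run only O-1 produces an event: O-2 already agrees, and O-3 changed too recently to act on.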
Race-condition handling
The doc explicitly calls out that real-time fill events and reconciliation polling can both deliver the same fill, especially during startup. Three lines of defense:
- Live reconciliation sanitizer - LiveExecutionEngine pre-filters on trade_id alone. If a report’s trade_id already exists on the order, it is skipped. (Even noisy duplicates with different last_px/last_qty are skipped at this layer; a warning is logged to flag potential venue data quality issues.)
- Core engine 4-field check - Order.is_duplicate_fill() checks (trade_id, order_side, last_px, last_qty). Exact replays are skipped with a logged warning.
- Order.apply() invariant - hard error if trade_id already exists. The engine catches the error, logs full context, drops the fill, does not crash - this is the difference between a defensive hard-fail and a brittle one.
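The three layers can be modeled as successive guards in one handler. This is an illustrative toy: Fill, FilledOrderStub, and handle_fill are hypothetical, and the return strings merely label which layer acted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fill:
    trade_id: str
    order_side: str
    last_px: str
    last_qty: str


@dataclass
class FilledOrderStub:
    fills: list = field(default_factory=list)

    def is_duplicate_fill(self, fill):
        # Frozen-dataclass equality compares all four fields.
        return fill in self.fills

    def apply(self, fill):
        if any(f.trade_id == fill.trade_id for f in self.fills):
            raise ValueError(f"duplicate trade_id {fill.trade_id}")
        self.fills.append(fill)


def handle_fill(order, fill, from_reconciliation=False):
    # Layer 1: reconciliation reports are pre-filtered on trade_id alone.
    if from_reconciliation and any(f.trade_id == fill.trade_id for f in order.fills):
        return "skipped: sanitizer"
    # Layer 2: exact four-field replay.
    if order.is_duplicate_fill(fill):
        return "skipped: exact replay"
    # Layer 3: hard invariant, caught so the engine does not crash.
    try:
        order.apply(fill)
    except ValueError:
        return "dropped: trade_id invariant"
    return "applied"


order = FilledOrderStub()
first = Fill("T-1", "BUY", "1.25", "5")
print(handle_fill(order, first))
print(handle_fill(order, first))
print(handle_fill(order, Fill("T-1", "BUY", "1.30", "5")))
```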
Why this directly answers project_pm_ibkr_exit_invariant.md
The MK2 invariant: PM exit intent → SELL at IBKR → position actually
closes. Nautilus’s reconcile-on-startup + continuous reconciliation +
the OrderDenied / OrderRejected event surface enforces a stronger
property:
Cache state and broker state cannot durably diverge.
If a cached SELL is missing from the venue, the next reconciliation
poll detects it and emits the missing event. If the engine “thinks” a
position is closed but the broker still reports position > 0, the
position is reopened from venue truth. Alert-without-action is
impossible because the alert would have to be issued without an
emitted event, and the engine emits an event for every state change
the venue confirms.
Order routing flow - Strategy.submit_order() to OrderFilled
End-to-end with branch points:
1. strategy.submit_order(order)
|
| (publishes SubmitOrder command on the bus)
v
2. Routing decision (one of three branches):
- if order.emulation_trigger != NO_TRIGGER -> OrderEmulator
- elif order.exec_algorithm_id != None -> ExecAlgorithm
- else -> RiskEngine
|
| (RiskEngine validation in all paths after emulation/algo
| release back to the engine path)
v
3. RiskEngine.handle_submit_order()
- precision / quantity / price / GTD / reduce_only / notional
- rate limits / TradingState
- if violated: emit OrderDenied (terminal); STOP
- if passed: forward to ExecutionEngine
|
v
4. ExecutionEngine.handle_submit_order()
- emit OrderInitialized (if not already)
- identify ExecutionClient by Venue
- emit OrderSubmitted (engine-side; before adapter ack)
- dispatch to adapter
|
v
5. ExecutionClient.submit_order()
- translate to venue wire format
- HTTP/WebSocket call
- on adapter receiving venue ack -> emit OrderAccepted
|
v
6. ExecutionClient receives venue execution report:
- emit OrderFilled(last_qty, last_px, trade_id, commission)
|
v
7. ExecutionEngine.handle_order_filled()
- is_duplicate_fill check (4-field)
- overfill check
- Order.apply(event) -> hard trade_id invariant
- resolve position_id (NETTING vs HEDGING; OMS override)
- emit PositionOpened | PositionChanged | PositionClosed
|
v
8. MessageBus dispatches to subscribers:
- Strategy.on_order_filled, on_position_opened, etc.
- Audit logger Actor: on_event
- Portfolio: net_exposure / unrealized / realized PnL update
Branch points worth memorizing:
- (1→2) Routing is parameter-driven; same submit_order call, different downstream path based on order metadata.
- (3 → OrderDenied) Every reject produces a typed event with reason. Strategy’s on_order_denied handler runs; nothing reached the venue.
- (4 → OrderRejected) Venue can also reject an order after OrderSubmitted (e.g., bad symbol, venue-side risk). Different event, same audit semantics.
- (7) Position flips (long→short or short→long in a single fill) are split into close-then-open events so each event has clean semantics (per nautilus-events.md).
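The flip-splitting rule in the last bullet can be illustrated with signed quantities. This is a simplified model with hypothetical names; it returns only event names and ignores how the fill quantity itself is apportioned between the closing and opening legs.

```python
from typing import List


def position_events(signed_qty_before: int, signed_fill_qty: int) -> List[str]:
    """Event names produced when a fill moves a net position (long > 0, short < 0)."""
    after = signed_qty_before + signed_fill_qty
    if signed_qty_before == 0:
        return ["PositionOpened"] if after != 0 else []
    if after == 0:
        return ["PositionClosed"]
    if (signed_qty_before > 0) != (after > 0):
        # Sign changed in one fill: close the old side, then open the new one.
        return ["PositionClosed", "PositionOpened"]
    return ["PositionChanged"]


print(position_events(5, -8))  # long 5, sell 8: flips to short 3
print(position_events(5, -2))  # long 5, sell 2: still long
```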
OMS - Order Management System interactions
The execution doc devotes a section to OMS handling. Three OMS variants
on OmsType enum:
- UNSPECIFIED - defaults based on application context.
- NETTING - one position per instrument ID.
- HEDGING - multiple positions per instrument ID; supports both LONG and SHORT simultaneously.
OMS applies both to the strategy and to the venue. When they differ,
the engine adjudicates by overriding or assigning position_id values:
| Strategy OMS | Venue OMS | Effect |
|---|---|---|
| NETTING | NETTING | Native; one position ID per instrument. |
| HEDGING | HEDGING | Native; multiple position IDs per instrument. |
| NETTING | HEDGING | Engine collapses venue’s multiple positions into a single Nautilus position ID. |
| HEDGING | NETTING | Engine maintains multiple “virtual” positions inside Nautilus; venue tracks one. |
Cortana implication: SPY 0DTE on IBKR. IBKR’s effective OMS is NETTING
for equities (one net position per contract). Cortana wants NETTING
strategy OMS too - one Position object per (symbol, strike, right,
expiry) tuple. The default UNSPECIFIED will inherit NETTING from IBKR.
No explicit configuration needed unless we ever want HEDGING (we don’t,
since 0DTE chain has separate strikes that already partition exposure).
Execution algorithms - TWAP and the spawning model
ExecAlgorithm is an Actor that splits a primary order into spawned
secondary orders. Built-in: TWAP. Custom is supported via subclassing.
Key mechanics:
- A primary order arrives in on_order(order) when the strategy submits with exec_algorithm_id="TWAP".
- The algorithm calls spawn_market(...), spawn_limit(...), or spawn_market_to_limit(...) to issue secondary orders. Each takes the primary as the first argument.
- By default, reduce_primary=True decrements the primary’s leaves_qty by the spawned quantity. Spawned quantity must not exceed primary’s leaves_qty.
- Spawned orders carry exec_spawn_id = primary.client_order_id and their own client_order_id is {exec_spawn_id}-E{spawn_sequence} (e.g., O-20230404-001-000-E1).
- Cache provides orders_for_exec_algorithm(...) and orders_for_exec_spawn(...) for tracking.
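The ID scheme and the reduce_primary bookkeeping are easy to model in isolation. PrimaryStub and spawn below are hypothetical stand-ins written for this note; they mirror the naming rule and the leaves_qty constraint stated above and nothing else.

```python
from dataclasses import dataclass


@dataclass
class PrimaryStub:
    """Hypothetical stand-in for a primary order handed to an exec algorithm."""
    client_order_id: str
    leaves_qty: int
    spawn_sequence: int = 0


def spawn(primary: PrimaryStub, quantity: int, reduce_primary: bool = True) -> str:
    """Return the client_order_id a spawned secondary order would carry."""
    if quantity > primary.leaves_qty:
        raise ValueError("spawned quantity exceeds primary leaves_qty")
    primary.spawn_sequence += 1
    if reduce_primary:
        primary.leaves_qty -= quantity
    return f"{primary.client_order_id}-E{primary.spawn_sequence}"


primary = PrimaryStub("O-20230404-001-000", leaves_qty=20)
print(spawn(primary, 5))
print(spawn(primary, 5))
print(primary.leaves_qty)
```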
Cortana relevance: 0DTE position sizes are small (5-25 contracts),
so TWAP/VWAP slicing is unnecessary at our scale today. But this is the
mechanism we’d reach for if we ever start running larger sizes or want
participation-rate execution. The “primary + spawned” pattern is also exactly
how a defense-in-depth TP fallback could be modeled (primary = bracket
TP, spawned = software-fallback market exit) - though Cortana’s actual
fallback design uses a separate submit_order(reduce_only=True) path
in the Strategy’s on_quote_tick handler, not an exec algorithm.
Own order books
A new concept we don’t have in MK2: per-instrument L3 book of only your own orders, organized by price level. Updated automatically by the engine on submit/accept/modify/fill/cancel.
Use cases listed in the doc:
- Real-time monitoring of your orders within the venue’s public book.
- Validating order placement (liquidity check before submit).
- Self-trade prevention (don’t place a buy at a price where your own sell is resting).
- Queue position management.
- Reconciliation between internal state and venue state.
Caveats from the doc:
- Only orders with explicit prices can be in own books - market orders are excluded.
- Safe cancellation queries: when querying for orders to cancel, use a status filter that excludes PENDING_CANCEL. Otherwise duplicate cancel attempts and inflated open-order counts.
- accepted_buffer_ns parameter on many query methods - only return orders whose ts_accepted is at least N nanoseconds in the past. When > 0, you must also pass ts_now. Pre-acceptance orders have ts_accepted = 0 so they enter the result once the buffer elapses; pair with an explicit status filter (ACCEPTED / PARTIALLY_FILLED) to exclude in-flight orders.
- Audit interval: own_books_audit_interval_secs periodically cross-checks own-book state against the Cache’s open/inflight indexes.
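Why the buffer needs a status filter alongside it is clearer with numbers. The sketch below is a hypothetical model of the filtering rule only (OwnOrderStub and cancel_candidates are invented names); it is not the own-book query API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnOrderStub:
    client_order_id: str
    status: str
    ts_accepted: int  # nanoseconds; 0 until the venue has accepted the order


def cancel_candidates(orders, ts_now, accepted_buffer_ns,
                      statuses=frozenset({"ACCEPTED", "PARTIALLY_FILLED"})):
    """Orders accepted at least accepted_buffer_ns ago whose status is allowed."""
    return [
        o.client_order_id
        for o in orders
        if o.status in statuses and ts_now - o.ts_accepted >= accepted_buffer_ns
    ]


orders = [
    OwnOrderStub("A", "ACCEPTED", 1_000),
    OwnOrderStub("B", "ACCEPTED", 9_500),        # accepted too recently
    OwnOrderStub("C", "SUBMITTED", 0),           # in flight, ts_accepted still 0
    OwnOrderStub("D", "PENDING_CANCEL", 1_000),  # cancel already requested
]
print(cancel_candidates(orders, ts_now=10_000, accepted_buffer_ns=1_000))
# Widening the status filter lets the in-flight order through the buffer test:
print(cancel_candidates(orders, 10_000, 1_000, statuses=frozenset({"ACCEPTED", "SUBMITTED"})))
```

Order C passes the time test purely because its ts_accepted is 0, which is the trap the caveat describes.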
Cortana relevance: nice-to-have. We don’t currently have self-trade risk (single strategy, single instrument family per process), but the own-order-book audit is another structural defense against state drift.
Overfill detection and handling
How overfills happen
Two fundamentally different causes:
- Genuine overfills at the matching engine - the venue actually
filled more than requested. Causes per the doc:
- Race conditions in fast markets (multiple counterparties match before the order is removed from the book).
- Minimum lot size constraints (venue fills the min lot rather than leaving an untradeable remainder).
- DEX/AMM mechanics (fill ≠ requested due to price impact).
- Multi-fill non-atomicity at the venue.
- Duplicate fill events - the same fill is delivered more than once. Causes: WebSocket reconnection replays, venue retry/delivery guarantees, API timing issues, or reconciliation polling racing against real-time WebSocket fills.
System behavior
allow_overfills: bool config option on LiveExecEngineConfig:
| Setting | Behavior |
|---|---|
| False (default) | Logs and rejects the fill; preserves order state. |
| True | Logs a warning, applies the fill, tracks excess in overfill_qty. |
When True, order transitions to FILLED and leaves_qty clamps to 0.
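Both settings can be exercised on a toy order. OrderQtyStub and apply_fill are hypothetical names for this note; the sketch models only the quantity arithmetic described in the table, not logging or state transitions.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class OrderQtyStub:
    quantity: Decimal
    filled_qty: Decimal = Decimal(0)
    overfill_qty: Decimal = Decimal(0)

    @property
    def leaves_qty(self) -> Decimal:
        # Clamped at zero once the order is filled or overfilled.
        return max(self.quantity - self.filled_qty, Decimal(0))


def apply_fill(order: OrderQtyStub, last_qty: Decimal, allow_overfills: bool = False) -> bool:
    """Return True if the fill was applied, False if it was rejected."""
    excess = order.filled_qty + last_qty - order.quantity
    if excess > 0 and not allow_overfills:
        return False  # rejected; order state is left untouched
    order.filled_qty += last_qty
    if excess > 0:
        order.overfill_qty = excess  # total quantity filled beyond the order size
    return True


order = OrderQtyStub(quantity=Decimal(10), filled_qty=Decimal(8))
print(apply_fill(order, Decimal(5)))                        # default: rejected
print(apply_fill(order, Decimal(5), allow_overfills=True))  # applied with excess tracked
print(order.filled_qty, order.overfill_qty, order.leaves_qty)
```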
When to enable
The doc’s recommendation: enable True on venues known to emit
duplicate fills, or when reconciliation races are expected. For IBKR,
genuine overfills are rare; reconciliation races are more common.
Cortana spike-day action: leave default False and watch for
rejected-fill warnings during paper trading. If we see them, switch to True
plus monitoring rather than leave orders silently in an inconsistent state.
Reconciliation report variants
The execution engine consumes four reconciliation report variants from adapters in live trading:
| Variant | Use case | If order missing from cache |
|---|---|---|
| OrderStatusReport | Standalone order state update. | External order created from the report; if status is PartiallyFilled / Filled, an inferred fill is synthesised from avg_px / filled_qty. |
| FillReport | Standalone execution. | External order created from the fill (Market type, qty last_qty); the real fill is then applied so trade_id and commission are preserved. |
| OrderWithFills | Status update bundled with fills. | External order created without an inferred fill; supplied fills applied first; residual gap closed with inferred fill. |
| PositionStatusReport | Position snapshot from venue. | Logged; positions are derived from fills, not bootstrapped from this report. |
When to use each, per the doc:
- OrderStatusReport: ordinary lifecycle (Accepted, PartiallyFilled, Canceled, Expired) where fill detail arrives separately.
- FillReport: venues that surface a fill for venue-initiated closures without opening a user-level order. Canonical example: Hyperliquid liquidations (userFills entry with liquidation metadata but no entry on the orders stream).
- OrderWithFills: when a single venue event maps to both a status update and one or more fills atomically. Binance Futures uses this for ADL, liquidation, and settlement orders via dispatch_exchange_generated_fill.
- PositionStatusReport: position snapshots are advisory; the engine logs them but does not bootstrap positions from them.
External order creation
When a report references an order not in the cache:
- Venue-initiated event (ADL, liquidation, settlement).
- Order placed by a different process (other API client on the account; IBKR’s fetch_all_open_orders=True covers this).
- Order not yet observed locally (race during startup).
Engine creates an external order, routing ownership to:
- The strategy that has called register_external_order_claims(...) for the instrument, or
- The EXTERNAL strategy as a default fallback.
client_order_id comes from the report when present, else derived from
venue_order_id. Order is added to the cache, the venue order ID index
is registered, lifecycle events (OrderAccepted, OrderFilled,
OrderCanceled, OrderExpired) are emitted so positions update through
the normal event pipeline.
Cortana relevance: if a human (or a bug) ever places an SPY option order
on the same paper account outside Nautilus, Nautilus will adopt it as
an external order rather than ignore it. We need to decide whether
Cortana’s strategy should register_external_order_claims(...) for SPY
options or whether we want unknown orders to land on the EXTERNAL
strategy and be handled by an audit Actor.
Paper vs sim vs live - execution behavior unification
The execution doc describes a single conceptual pipeline; the
differences between contexts live below the ExecutionClient line:
| Aspect | Backtest | Sandbox | Live |
|---|---|---|---|
| ExecutionEngine | ExecutionEngine | ExecutionEngine | LiveExecutionEngine |
| ExecutionClient | BacktestExecClient (matching engine) | Sandbox sim client | Venue adapter (IBKR, Binance, …) |
| Reconciliation | None - engine owns truth | None - sim owns truth | Startup snapshot + continuous polling |
| Fills | Simulated by matching engine + fill model (probabilistic limit fills, slippage, optional ThreeTierFillModel) | Simulated | Real, from venue |
| Clock | Data-driven, deterministic | Wall-clock (or accelerated) | Wall-clock |
| Random seed | Pinned for determinism | Pinned for determinism | n/a |
| OrderDenied from RiskEngine | Yes | Yes | Yes |
| OrderRejected from venue | Simulated by matching engine | Simulated | Real |
Strategy code is identical across all three contexts. The only behavioral differences a strategy author should know:
- Reconciliation events fire only in live (and carry reconciliation=True so a logger can distinguish them).
- Backtest fills are deterministic; live fills depend on real venue queue position.
- Backtest is replayable bit-identically given same seed + same data + same config. Live is not (latency, async ordering).
This unification is what makes backtest results predictive of live
behavior - and what makes Nautilus’s claim of “backtest-live parity by
construction” defensible (cf. nautilus-architecture.md).
Error and rejection handling
The taxonomy of failures, with the event each one produces:
| Failure | Event emitted | Origin |
|---|---|---|
| RiskEngine pre-trade reject | OrderDenied | RiskEngine |
| RiskEngine modify-time reject | OrderModifyRejected | RiskEngine |
| Venue submit reject | OrderRejected | ExecutionClient (translation of venue reject) |
| Venue modify reject | OrderModifyRejected | ExecutionClient |
| Venue cancel reject | OrderCancelRejected | ExecutionClient |
| GTD/DAY/IOC/FOK expiration | OrderExpired | ExecutionClient (or matching engine in backtest) |
| Overfill rejection (allow_overfills=False) | (logged, fill dropped) | ExecutionEngine |
| Duplicate trade_id exact replay | (logged warning, fill skipped) | ExecutionEngine 4-field check |
| Duplicate trade_id noisy replay | (logged error, fill dropped) | ExecutionEngine Order.apply() |
| Reconciliation discrepancy | Synthesized event (reconciliation=True) | LiveExecutionEngine |
Every typed event has reason or equivalent fields. A logger Actor
subscribed to on_event captures all of these in causal order with
zero per-rule plumbing. This is the structural basis for replacing MK2’s
decisions.db (cf. nautilus-events.md).
The doc explicitly distinguishes “skip gracefully” from “drop with
error” for fill processing: exact replays log a warning and skip; noisy
duplicates (same trade_id, different qty/px) drop with full-context
error log and DO NOT crash the engine. This is the
“crash-only-for-invariants” posture from nautilus-architecture.md
applied to execution: bad data is dropped, not panicked on, unless it
violates a true invariant (e.g., applying a duplicate trade_id would
double-count, which is detected and rejected).
Cortana MK3 implications - MK2 failure mode mapping
Each MK2 exit-path failure class, mapped to the Nautilus mechanism that prevents it.
GH #46 - Alert-without-action (project_pm_ibkr_exit_invariant.md)
MK2 failure: PM decided to exit, alerted Telegram, but no SELL landed at IBKR (or the SELL was rejected and the engine treated the rejection as success). Position bled to zero on theta.
Nautilus prevention: structural, by construction.
- The Strategy is the only thing that can submit orders. A “PM exit decision” is implemented as self.close_position(...) or a reduce-only self.submit_order(...) inside the Strategy. There is no “alert” code path that doesn’t also produce a SubmitOrder command on the bus.
- Every SubmitOrder produces typed events at every stage: OrderInitialized → (OrderDenied if RiskEngine blocks) → OrderSubmitted → (OrderRejected if venue blocks) → OrderAccepted → OrderFilled. An “alert” without a corresponding event sequence is impossible - there is no place for an alert to be issued from except a handler on one of these events.
- OrderRejected is a first-class event, not a string the adapter can swallow. The Strategy’s on_order_rejected handler runs; the audit logger’s on_event handler runs. Both reject reasons are permanently in the event stream.
- Continuous reconciliation detects “I think the position is closed but the broker reports qty > 0” within open_check_interval_secs and synthesizes the events to bring Cache and broker into agreement. The “alert lied for 4 hours” scenario described in exit-path-failure-modes.md Class 2 is bounded by the reconciliation cadence.
The structural property: alert iff event iff broker action. All three are coupled by construction.
Spike-day verification: write a paper-mode test where a Strategy
calls close_position() while the IBKR adapter is configured to drop
the SELL command (mock the adapter). Expected behavior: OrderRejected
or reconciliation discrepancy event emitted; Strategy notices via
on_order_rejected; no silent state divergence.
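Before wiring the real adapter mock, the shape of that test can be rehearsed against a toy model. The sketch below assumes nothing about the Nautilus or IBKR APIs: ToyVenue, ToyEngine, check_inflight, and the event tuples are hypothetical, and the in-flight timeout is modeled as a single threshold check rather than the engine's actual retry logic.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyVenue:
    """Hypothetical broker stand-in that can silently drop SELL commands."""
    position_qty: int
    drop_sells: bool = False
    known_orders: set = field(default_factory=set)

    def submit_sell(self, order_id: str, qty: int) -> None:
        if self.drop_sells:
            return  # command vanishes: no ack, no reject
        self.known_orders.add(order_id)
        self.position_qty -= qty


@dataclass
class ToyEngine:
    venue: ToyVenue
    on_order_rejected: Callable[[str], None]  # stands in for the Strategy handler
    inflight_threshold_ms: int = 5000
    events: list = field(default_factory=list)
    inflight: dict = field(default_factory=dict)  # order_id -> submit time (ms)

    def close_position(self, order_id: str, qty: int, now_ms: int) -> None:
        self.events.append(("OrderSubmitted", order_id))
        self.inflight[order_id] = now_ms
        self.venue.submit_sell(order_id, qty)
        if order_id in self.venue.known_orders:
            # Venue acknowledged and (in this toy) filled immediately.
            del self.inflight[order_id]
            self.events.append(("OrderFilled", order_id))

    def check_inflight(self, now_ms: int) -> None:
        # Any order still unacknowledged past the threshold is resolved
        # against venue truth instead of being left in limbo.
        for order_id, submitted_ms in list(self.inflight.items()):
            overdue = now_ms - submitted_ms >= self.inflight_threshold_ms
            if overdue and order_id not in self.venue.known_orders:
                del self.inflight[order_id]
                self.events.append(("OrderRejected", order_id))
                self.on_order_rejected(order_id)
```

With drop_sells=True the only possible outcomes are an explicit OrderRejected or an order still visibly in flight; there is no path on which the toy engine records a fill the venue never saw.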
updatePortfolio position=0 reconciliation drift (fixed in fdcf6ad)
MK2 failure: IBKR reported position=0 realizedPNL=5625.58
continuously from 10:36:45 onward. Engine-side tracker never finalized;
kept emitting EXIT_PENDING and recomputing ghost unrealized as
last_known_qty * marketPrice_tick. Ghost “climbed” 11K over
4 hours.
Nautilus prevention: structural.
- Position state is owned by the ExecutionEngine, not by a parallel “tracker” object. Position.is_closed flips when net signed_qty == 0; PositionClosed event fires; realized_pnl finalizes; duration_ns populates. No “ghost unrealized” path exists.
- updatePortfolio-equivalent input is a PositionStatusReport from the IBKR adapter. Per the doc, the engine “logs” it but derives positions from fills, not from these reports. So a position report saying qty=0 doesn’t directly close the Nautilus position - but if the underlying fill events haven’t arrived, continuous reconciliation will detect the discrepancy and synthesize the missing fill events. Either way, the engine converges on broker truth.
- Cache writes for execution events are asynchronous - the doc warns “you might see a brief delay between an event and its appearance in the Cache” for execution events. Strategy handlers should rely on the event payload, not re-read Cache mid-handler. This is the discipline that prevents “I see qty=N in cache but the event said qty=0” races.
- No “tracker” exists. The position state is one object; nothing parallels it. The MK2 split between position_state and position_tracker cannot be reproduced.
Cite: position-state-machine.md, exit-path-failure-modes.md
(Class 2 - Status without truth).
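Why a fill-derived position leaves no room for ghost unrealized PnL can be seen in a minimal sketch. ToyPosition is a hypothetical, long-only stand-in written for this page, not the Nautilus Position class; only the field names signed_qty, realized_pnl, and is_closed echo the text above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class ToyPosition:
    signed_qty: Decimal = Decimal("0")
    avg_px_open: Decimal = Decimal("0")
    realized_pnl: Decimal = Decimal("0")
    events: list[str] = field(default_factory=list)

    @property
    def is_closed(self) -> bool:
        # Closed means: it was opened at some point and is now net flat.
        return bool(self.events) and self.signed_qty == 0

    def apply_fill(self, side: str, qty: Decimal, px: Decimal) -> None:
        if side == "BUY":
            cost = self.avg_px_open * self.signed_qty + px * qty
            self.signed_qty += qty
            self.avg_px_open = cost / self.signed_qty
            self.events.append("PositionChanged" if len(self.events) else "PositionOpened")
        elif side == "SELL":
            if qty > self.signed_qty:
                raise ValueError("toy model is long-only; no flips")
            self.realized_pnl += (px - self.avg_px_open) * qty
            self.signed_qty -= qty
            self.events.append("PositionClosed" if self.signed_qty == 0 else "PositionChanged")
        else:
            raise ValueError(f"unknown side: {side}")

    def unrealized_pnl(self, mark_px: Decimal) -> Decimal:
        # Quantity is the only multiplier, so a flat position is always zero.
        return (mark_px - self.avg_px_open) * self.signed_qty
```

Because unrealized PnL is computed from the same signed_qty that the fills maintain, there is no last_known_qty to go stale once the closing fill has been applied.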
Tracker drift between position_state and position_tracker
MK2 failure: two parallel state stores for a position; one updated on engine action, the other on broker callback; they could drift.
Nautilus prevention: structural. There is no parallel store.
The Position object is owned by the engine and lives in the Cache.
Anything that wants to know about a position queries the cache; there
is no “second source of truth.” Strategy queries via
self.cache.position(position_id) or self.portfolio.net_position(...)
read the same object.
The closest analog to MK2’s tracker drift would be: “Cache says X but what about the venue?” - and the answer is the reconciliation mechanism, which is one-directional (venue truth → cache state). Drift cannot durably persist past one reconciliation cycle.
Cite: exit-path-failure-modes.md (Class 2), position-state-machine.md.
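The one-directional rule (venue truth → cache state) reduces to a small diff-and-synthesize step. The sketch below is an illustration under assumptions: SyntheticFill, reconcile, and apply are hypothetical helpers, and real reconciliation works from order and fill reports rather than a bare quantity comparison.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class SyntheticFill:
    side: str
    qty: Decimal
    reconciliation: bool = True  # marks the fill as generated, not fresh broker action


def reconcile(cache_qty: Decimal, venue_qty: Decimal) -> Optional[SyntheticFill]:
    """Return the fill that moves the cache to the venue's quantity, if any."""
    diff = venue_qty - cache_qty
    if diff == 0:
        return None
    return SyntheticFill(side="BUY" if diff > 0 else "SELL", qty=abs(diff))


def apply(cache_qty: Decimal, fill: SyntheticFill) -> Decimal:
    """Apply a fill to the cached quantity; the venue side is never written to."""
    return cache_qty + fill.qty if fill.side == "BUY" else cache_qty - fill.qty
```

Running reconcile again after apply returns None, which is the sense in which drift cannot outlive one reconciliation cycle in this model.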
GH #88 - dead-code meta-prob sizing (project_codex_review_p2s.md)
MK2 failure: meta-prob sizing was defined in scoring code but silently never called by the position-sizing path. A refactor dropped the call, no test caught it, the gate became dead code.
Nautilus prevention: conditional on implementation choice.
If meta-prob sizing is implemented as a RiskEngine rule (or as a
custom rule per the open extension question), then:
- Every order routes through the RiskEngine by construction. No submit_order call bypasses it.
- Risk rules are configured centrally on RiskEngineConfig, not embedded per-strategy. A strategy can’t accidentally fail to reference the rule because it doesn’t reference rules at all.
- The rule receives every order before it leaves Nautilus. Its evaluation is deterministic; it can scale quantity, deny via OrderDenied, or pass.
If meta-prob is instead implemented as inline strategy logic, GH #88 is not prevented - the same kind of dead-code refactor regression remains possible.
Recommendation, restated from nautilus-strategies.md: meta-prob
lives in the RiskEngine. Strategy submits at unweighted base size; the
custom rule scales or vetoes. This makes the gate impossible to bypass.
Open dependency: extension API for custom rules (Q5 from the spike plan; see § RiskEngine § “Custom RiskEngine rules” above). Spike-day code reading is required to pin down the actual API. The backup plan (pre-submit Actor) achieves the same routing-by-construction property if the RiskEngine path turns out to require subclassing.
Cite: project_codex_review_p2s.md, GH #88,
nautilus-strategies.md.
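Since the custom-rule API is still an open question, the routing property can only be sketched in the abstract. Everything below is hypothetical (Order, Denied, MetaProbRule, ToyRiskGate, the floor parameter) and says nothing about how RiskEngineConfig actually accepts rules; letting reduce-only orders through unscaled is also an assumption made here so that exits are never shrunk or blocked.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Order:
    strategy_id: str
    instrument: str
    qty: int
    reduce_only: bool = False


@dataclass(frozen=True)
class Denied:
    reason: str


class MetaProbRule:
    """Hypothetical rule: scale entries by meta-prob, veto below a floor."""

    def __init__(self, probs: dict[str, float], floor: float = 0.5) -> None:
        self.probs = probs
        self.floor = floor

    def __call__(self, order: Order) -> Order | Denied:
        if order.reduce_only:
            return order  # assumption: exits pass through untouched
        p = self.probs.get(order.instrument, 0.0)
        if p < self.floor:
            return Denied(f"meta_prob {p:.2f} below floor {self.floor:.2f}")
        scaled = int(order.qty * p)  # truncate toward zero
        if scaled == 0:
            return Denied("scaled quantity rounds to zero")
        return replace(order, qty=scaled)


class ToyRiskGate:
    """The only object holding `send`; strategies can reach the venue only via submit()."""

    def __init__(self, rules: list, send: Callable[[Order], None]) -> None:
        self._rules = rules
        self._send = send

    def submit(self, order: Order) -> Order | Denied:
        for rule in self._rules:
            result = rule(order)
            if isinstance(result, Denied):
                return result
            order = result
        self._send(order)
        return order
```

The strategy side submits at base size and never mentions the rule, so a refactor in strategy code cannot drop the gate; removing it would require editing the central rule list.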
EOD-flatten with open positions (feedback_no_kill_with_open_positions.md)
MK2 caution: never kill the engine while a position is open.
Nautilus alignment: Strategy.market_exit() is the supported
graceful path. From nautilus-strategies.md:
- Cancels all open and in-flight orders.
- Closes all open positions with reduce-only market orders.
- Periodically re-checks (market_exit_interval_ms, market_exit_max_attempts).
- Calls post_market_exit once flat or after max attempts.
- Non-reduce-only orders are denied during exit, structurally preventing a race where on_data fires a fresh entry mid-exit.
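The exit sequence listed above can be walked through with a toy controller. This is a long-only sketch under stated assumptions, not the Strategy implementation: ToyExitController and its fill callback are hypothetical, order cancellation and the re-check timer are omitted, and the toy simply stays in exit mode when the loop ends.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyExitController:
    positions: dict[str, int]            # instrument -> open long quantity
    fill: Callable[[str, int], int]      # hypothetical venue: returns qty actually filled
    max_attempts: int = 3
    exiting: bool = False
    log: list[tuple] = field(default_factory=list)

    def is_flat(self) -> bool:
        return all(qty == 0 for qty in self.positions.values())

    def submit(self, instrument: str, qty: int, reduce_only: bool) -> bool:
        if self.exiting and not reduce_only:
            # Fresh entries are denied for the whole duration of the exit.
            self.log.append(("OrderDenied", instrument, qty))
            return False
        held = self.positions.get(instrument, 0)
        if reduce_only:
            qty = min(qty, held)  # a reduce-only order can never flip the position
        filled = self.fill(instrument, qty)
        self.positions[instrument] = held - filled if reduce_only else held + filled
        return True

    def market_exit(self) -> int:
        self.exiting = True
        attempts = 0
        while not self.is_flat() and attempts < self.max_attempts:
            attempts += 1
            for instrument, qty in list(self.positions.items()):
                if qty > 0:
                    self.submit(instrument, qty, reduce_only=True)
        # Fires once, whether flat or out of attempts.
        self.log.append(("post_market_exit", self.is_flat(), attempts))
        return attempts
```

A fill callback that tries to open a new position mid-exit only produces an OrderDenied entry in the log, which is the race the last bullet describes.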
The HALTED TradingState adds a complementary safety: HALTED refuses all
submits while allowing cancels, so new entries are impossible. Because a
reduce-only close is itself a submit, the state that still permits
position-reducing orders is REDUCING; use REDUCING while market_exit() is
flattening and set HALTED once flat.
Cite: feedback_no_kill_with_open_positions.md,
feedback_dual_tp_defense_in_depth.md,
project_eod_power_hour.md.
Open question #5 resolution status
Q5: Does Nautilus expose a clean extension point for a per-strategy
meta_prob veto/scaler, or does it require subclassing
LiveRiskEngine?
This page resolves: PARTIAL.
What the page confirms:
- Every order goes through the RiskEngine by construction (✓ structural property required for “GH #88 impossible”).
- OrderDenied is the standard event for any pre-trade rejection (✓ a rule can deny cleanly with a typed event + reason).
- RiskEngineConfig is the central knob bag (✓ knobs exist; enumeration of which knobs is in API ref, not on this concept page).
What the page does not answer:
- Whether RiskEngineConfig accepts user-defined rule callables / RiskRule trait objects, or whether the rule set is fixed.
- The signature of a custom rule (does it see the order? the strategy ID? the cache?).
- Whether nautilus_trader.risk.sizing provides a sizing-extension point distinct from validation rules.
Spike Saturday action items:
- Read crates/risk/src/engine.rs - look for register_rule(...) or a Vec<Box<dyn RiskRule>> field on the engine struct.
- Read nautilus_trader/risk/engine.py - look for the Python wrapper on rule registration; check whether RiskEngineConfig has a custom_rules: list[RiskRule] parameter or similar.
- Read nautilus_trader/risk/sizing.py - see if there’s a sizing extension point distinct from the rule pipeline.
- If no public API exists: implement meta-gate as the pre-submit Actor pattern (EntryIntent → EntryApproved/Denied) and treat the RiskEngine as the second line of defense. Document the choice in a nautilus-meta-gate.md brain page.
This is exactly the kind of “concept doc tells me the seam exists, code reading confirms the API shape” question: pre-spike-day prep can resolve roughly 80% of it, and the remaining 20% has to wait for source.
Caveats and gotchas
- Cache update lag for execution events. The doc explicitly warns that “you might see a brief delay between an event and its appearance in the Cache” in live trading. In a handler, prefer the event payload to a re-read of the Cache for exact-at-event state.
- Reconciliation events look real. Always check event.reconciliation before treating an event as fresh broker action. Otherwise, expect alert spam during startup.
- Partial AccountState overwrites. The adapter must emit complete margin snapshots; partial snapshots wipe out account-wide entries silently until the next full snapshot. Per nautilus-events.md.
- Emulated orders transform on release. Hold object references at your peril - query the cache by client_order_id. Events for emulated orders include OrderEmulated (intake) and OrderReleased (release).
- Position flips emit two events (close-then-open). Audit consumers must treat them as a pair, not a single transition.
- Cancel/query commands skip the RiskEngine. This is correct (you don’t want a stale rule blocking a needed cancel) but worth knowing if you’re tempted to put a rule on the cancel path.
- OCO on IBKR doesn’t auto-create OCA groups. Setting ContingencyType.OCO on a Nautilus order does not create the IB-side OCA group. Use IBOrderTags(ocaGroup=..., ocaType=...). Per nautilus-integrations.md.
- fetch_all_open_orders=True on IBKR pulls orders placed by any API client on the account. Useful for surviving restarts; default False. If True, expect to see external orders adopted under the EXTERNAL strategy on startup.
- Reconciliation race conditions scale with reconciliation frequency. Defaults: open_check_threshold_ms=5000, inflight_check_threshold_ms=5000, reconciliation_startup_delay_secs=10. Reducing any of them increases duplicate-fill probability.
- PositionStatusReport is not used to bootstrap positions - positions derive from fills. A position-only snapshot from the adapter is logged but ignored for state mutation. (Adapter authors must not rely on PositionStatusReport to “fix” missing fills; the right tool is OrderStatusReport + FillReport / OrderWithFills.)
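Two of the caveats above (prefer the event payload; check the reconciliation flag before alerting) combine into one handler pattern. The sketch uses hypothetical stand-ins: FillEvent mimics only the fields needed here, and AlertingHandler is not a Nautilus class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FillEvent:
    client_order_id: str
    last_qty: int
    reconciliation: bool  # True when the event was synthesized by reconciliation


class AlertingHandler:
    """Hypothetical fill handler that audits everything but alerts selectively."""

    def __init__(self) -> None:
        self.audit: list = []
        self.alerts: list = []

    def on_order_filled(self, event: FillEvent) -> None:
        # The audit trail records every event, synthesized or not.
        self.audit.append(event)
        if event.reconciliation:
            return  # replayed or synthesized state, not fresh broker action
        # Build the alert from the event payload, not from a cache re-read.
        self.alerts.append(f"FILLED {event.client_order_id} qty={event.last_qty}")
```

On startup, a burst of reconciliation fills then lands in the audit list without generating a single alert.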
When this concept applies
- Designing the MK3 IBKR adapter integration (or evaluating the shipped one).
- Deciding where meta-prob lives (RiskEngine rule vs Strategy inline vs pre-submit Actor).
- Designing the EOD-flatten / kill-switch path (market_exit() + HALTED TradingState).
- Reasoning about whether a proposed change preserves the “alert iff event iff broker action” invariant.
- Auditing whether Cortana’s exit path closes the GH #46 trust class.
- Reading reconciliation race / overfill / external-order behavior during paper-trading bring-up.
When it breaks / does not apply
- The page does not document the RiskEngineConfig field set in detail; refer to API docs.
- The page does not document the public API for adding custom risk rules; this requires source reading (Q5).
- Performance characterization - latency budgets per stage, queue depths, throughput limits - is not on this page; check nautilus-architecture.md and the live-trading concept page.
- Specific venue quirks (IBKR OCA, Binance ADL events) live in nautilus-integrations.md, not here.
See Also
- Nautilus Architecture - the kernel topology that hosts the execution components.
- Nautilus Strategies - the only component that originates orders; lifecycle, handlers, market_exit().
- Nautilus Events - the 17 order-lifecycle events + 3 position events; what every stage emits.
- Nautilus Orders (parallel agent - order types, TIF, contingency).
- Nautilus Positions (parallel agent - position lifecycle, OMS effects).
- Nautilus Integrations - IBKR adapter detail; UW custom-adapter sketch.
- Exit-path failure modes - MK2 trust classes (alert-without-action, status-without-truth, entry-window races).
- Position state machine - MK2 PM state-gate enforcement.
- 2026-05-09 Nautilus Spike Plan: ~/conductor/workspaces/cortanaroi-mk2/belo-horizonte/plans/2026-05-09-nautilus-spike.md
- project_pm_ibkr_exit_invariant.md - MK2 invariant Nautilus enforces by construction.
- project_codex_review_p2s.md - GH #88 dead-code meta sizing context.
- feedback_dual_tp_defense_in_depth.md - TP fallback pattern that survives migration via reduce-only submit_order in on_quote_tick.
- feedback_no_kill_with_open_positions.md - market_exit() plus HALTED TradingState as the supported graceful path.
Timeline
2026-05-07 | Cody - Filed during pre-spike concept mastery sweep batch 2.