Nautilus Live Trading
The Nautilus concepts/live/ page is execution-reconciliation-centric. It documents how the LiveExecutionEngine aligns the venue’s actual order and position state with the system’s internal state, at startup and continuously thereafter. It also pins down four invariants (position quantity, average entry price, PnL integrity, ID determinism) that hold even when reconciliation windows miss fill history. The page is the canonical answer to MK2’s “open positions on restart” failure class, as well as to the 2026-04-22 data-loss class and the 2026-05-06 power-outage state-divergence class. Reconciliation is not opt-in: “Unless reconciliation is set to false, the execution engine reconciles state for each venue at startup.” If reconciliation fails, “the system logs an error and does not start” - fail-fast at boot is the default. TradingNode lifecycle, signal handling, in-flight-order drain semantics, and Redis cache config are NOT on this page; they live in getting_started/live_trading/, the LiveExecEngineConfig API reference, and nautilus-cache.md. This page covers the reconciliation loop, the in-flight check loop, the open-orders poll, the own-books audit, and the partial-window adjustment scenarios.
Why this page exists (vs. siblings)
- nautilus-architecture.md covers the kernel, the FSM (PRE_INITIALIZED → READY → RUNNING → STOPPED), the crash-only design (“Startup and crash recovery share the same code path”), and the production stance (panic = abort).
- nautilus-execution.md covers the command/event pipeline, the RiskEngine, the ExecutionClient adapter contract, the four reconciliation report variants, and race-condition handling. It also covers the LiveExecEngineConfig knobs that govern continuous reconciliation (open_check_interval_secs, inflight_check_threshold_ms, reconciliation_startup_delay_secs, allow_overfills).
- nautilus-cache.md covers Redis externalization, CacheConfig, flush_on_start, use_instance_id, and the cache-then-publish invariant.
- nautilus-positions.md covers the Position object, OMS adjudication, NETTING vs HEDGING, and how reconciled positions land.
- nautilus-message-bus.md covers the bus, external Redis Streams, and the producer/consumer pattern.
This page is the operational handbook for what happens
between the moment a TradingNode boots and the moment it is ready
to take a fresh entry order. It is what an operator (or a launchd
preflight check) needs to know about boot, reconnect, in-flight
drift, and shutdown.
Core claim
Live trading in Nautilus is backtested-strategy code, deployed
unchanged, behind a LiveExecutionEngine that owns reconciliation.
The strategy author writes one program; the framework rebuilds the
strategy’s view of reality from venue truth on every startup, every
reconnect, and continuously thereafter.
“NautilusTrader deploys backtested strategies to live markets with no code changes. The same actors, strategies, and execution algorithms run against both the backtest engine and a live trading node.”
“Live trading involves real financial risk. Before deploying to production, understand system configuration, node operations, execution reconciliation, and the differences between backtesting and live trading.”
The author of this page wants you to internalize four things before flipping the live switch:
- Configuration (a TradingNodeConfig with data + exec clients).
- Node operations (build → start → run → stop → dispose).
- Execution reconciliation (this page’s main subject).
- Backtest-vs-live differences (latency, real fills, real risk).
TradingNode lifecycle
The concepts/live/ page itself does not document the
TradingNode lifecycle methods. Instead it points to the
Configure a live trading node how-to and the configuration
concept guide. The lifecycle below combines the FSM from
nautilus-architecture.md with the standard Nautilus example
shape:
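```python
from nautilus_trader.config import (
    LiveDataEngineConfig,
    LiveExecEngineConfig,
    RiskEngineConfig,
    TradingNodeConfig,
)
from nautilus_trader.live.node import TradingNode
from nautilus_trader.model.identifiers import TraderId

# data_config, exec_config, cache_config and message_bus_config are built
# elsewhere (adapter- and deployment-specific).
config = TradingNodeConfig(
    trader_id=TraderId("CORTANA-PAPER"),
    data_clients={"IB": data_config},
    exec_clients={"IB": exec_config},
    cache=cache_config,              # optional Redis backing
    message_bus=message_bus_config,  # optional external streams
    data_engine=LiveDataEngineConfig(),
    exec_engine=LiveExecEngineConfig(reconciliation=True),  # True is the default
    risk_engine=RiskEngineConfig(),
)
node = TradingNode(config=config)
# Register the "IB" data/exec client factories here, before build().
node.trader.add_strategy(MyStrategy(...))  # MyStrategy / AuditLogger: your own classes
node.trader.add_actor(AuditLogger(...))
node.build()        # wire components, register adapters
try:
    node.run()      # blocks; performs reconciliation, then dispatches
                    # SIGINT / SIGTERM trigger graceful shutdown
finally:
    node.dispose()  # tear down clients, flush writers, release resources
```

State machine (from nautilus-architecture.md)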
Stable states: PRE_INITIALIZED, READY, RUNNING, STOPPED,
DEGRADED, FAULTED, DISPOSED.
Transitional states (brief): STARTING, STOPPING, RESUMING,
RESETTING, DISPOSING, DEGRADING, FAULTING.
The architecture page commits to:
“The system does provide graceful shutdown flows (stop, dispose) for normal operation. These tear down clients, persist state, and flush writers. The crash-only philosophy applies specifically to unrecoverable faults where attempting graceful cleanup could cause further damage.”
So the intended shutdown path on SIGTERM is:
- STOPPING - stop accepting new commands (the engine no longer dispatches new strategy events).
  - Tear down clients - disconnect from venues.
  - Persist state - Cache flush to Redis (if configured).
  - Flush writers - Parquet catalog, audit sinks.
- STOPPED - components are quiescent.
- DISPOSING → DISPOSED - release resources.
Open question on this page (and the architecture page): does
stop() block on draining in-flight orders, or does it terminate
abruptly leaving in-flight orders unresolved? See “Shutdown
semantics - does it respect open positions?” below.
Reconnection behavior
The concepts/live/ page does not document reconnection mechanics
end-to-end, but several signals indicate the model:
- Continuous reconciliation IS the reconnect-recovery story. Per the page: “These [in-flight] orders are monitored by the continuous reconciliation loop to detect stale or lost messages.” When the WebSocket drops and reconnects, the open-orders poll and the in-flight-order check together detect any state divergence that occurred during the gap and synthesize the missing events.
- Adapter-level reconnect lives in the ExecutionClient/DataClient. Per nautilus-integrations.md (IBKR): “Watchdog loop monitors the socket; auto-reconnect with IB_MAX_CONNECTION_ATTEMPTS retries.” Each adapter is responsible for its own reconnect logic. The engine then re-runs reconciliation against the now-connected venue.
- No explicit “reconnect” event is documented on this page. The reconciliation events that fire after a reconnect carry reconciliation=True, so audit consumers can distinguish them from fresh broker activity (per nautilus-execution.md).
For Cortana: reconnect-driven reconciliation means the engine self-heals from network drops. The MK2 model - where a Telegram alert might be the first signal that “we’re disconnected and the position is bleeding” - is replaced by structural automatic recovery.
Reconcile-on-startup - the canonical pattern
This is the most load-bearing section. Verbatim from the doc:
“Execution reconciliation aligns the venue’s actual order and position state with the system’s internal state built from events. Only the LiveExecutionEngine performs reconciliation, since backtesting controls both sides.”
“Unless reconciliation is set to false, the execution engine reconciles state for each venue at startup.”
“If reconciliation fails, the system logs an error and does not start.”
The engine fails to start if reconciliation fails. This is the
fail-fast at boot posture from nautilus-architecture.md applied
operationally. There is no “alert and degrade” path - the engine
either reconciles cleanly or refuses to come up.
Two startup scenarios
“Two scenarios:
- Cached state exists: report data generates missing events to align the state.
- No cached state: all orders and positions at the venue are generated from scratch.”
The first case is the normal restart (Redis-backed Cache survives the process exit; reconciliation just patches deltas). The second is a clean-room boot - no Cache, the engine rebuilds the order and position graph entirely from venue mass status reports.
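To make the two scenarios concrete, here is a minimal pure-Python sketch. It models each order as a client order ID mapped to a status string. The function plan_startup_events and its event labels are hypothetical illustrations, not Nautilus APIs; the real engine works on OrderStatusReport objects produced by the adapter.

```python
from typing import Optional


def plan_startup_events(
    cached: Optional[dict[str, str]],
    venue: dict[str, str],
) -> list[tuple[str, str, str]]:
    """Return (kind, client_order_id, status) tuples that align cache to venue.

    cached=None models the clean-room boot (no cached state). Orders that are
    cached but absent from the venue reports are ignored in this sketch.
    """
    events: list[tuple[str, str, str]] = []
    for coid, status in sorted(venue.items()):
        if cached is None:
            # No cached state: every venue order is generated from scratch.
            events.append(("generate", coid, status))
        elif coid not in cached:
            # Unknown client order ID: surfaces as an external order.
            events.append(("external", coid, status))
        elif cached[coid] != status:
            # Cached state exists: emit only the missing transition.
            events.append(("transition", coid, status))
    return events
```

With a warm cache the output is a short delta. With no cache it is the whole venue book, which is why the cached path is the cheap one.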
“Persist all execution events to the cache database. This reduces reliance on venue history and allows full recovery even with short lookback windows.”
This is the doc’s recommendation: always persist execution events to the cache database. Two things together make reconciliation cheap and bounded: the Cache is backed by Redis, and the cache database persists execution events.
Reconciliation procedure - the three calls
“All adapter execution clients follow the same reconciliation procedure, calling three methods to produce an execution mass status:
- generate_order_status_reports
- generate_fill_reports
- generate_position_status_reports”
Each adapter implements these three methods. The engine then walks the reports through the procedure below.
Reconciliation procedure - the steps (verbatim)
“Duplicate check: Deduplicates order reports within the batch and logs warnings. Logs duplicate trade IDs as warnings for investigation.”
“Order reconciliation: Generates and applies events to move orders from cached state to current state. Infers OrderFilled events for missing trade reports. Generates external order events for unrecognized client order IDs or reports missing a client order ID. Verifies fill report data consistency with tolerance-based price and commission comparisons.”
“Position reconciliation: Matches the net position per instrument against venue position reports using instrument precision. Generates external order events when order reconciliation leaves a position that differs from the venue. When generate_missing_orders is enabled (default: True), generates orders with strategy ID EXTERNAL and tag RECONCILIATION to align discrepancies.”
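The duplicate check can be pictured as a first-wins filter over the batch. This is a minimal sketch that assumes fills are keyed by trade ID. FillReport and dedupe_fills are hypothetical stand-ins for the adapter's report objects, and the warning text is illustrative, not Nautilus's actual log line.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("reconciliation-sketch")


@dataclass(frozen=True)
class FillReport:
    """Hypothetical stand-in for a venue fill report."""
    trade_id: str
    client_order_id: str
    qty: int
    price: float


def dedupe_fills(batch: list[FillReport]) -> list[FillReport]:
    """Keep the first report per trade_id, warn on repeats, preserve order."""
    seen: set[str] = set()
    unique: list[FillReport] = []
    for report in batch:
        if report.trade_id in seen:
            log.warning("Duplicate trade_id %s in batch", report.trade_id)
            continue
        seen.add(report.trade_id)
        unique.append(report)
    return unique
```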
The price hierarchy used by reconciliation orders, in order of preference:
- Calculated reconciliation price (preferred) - targets the correct average position.
- Market mid-price - uses the current bid-ask midpoint.
- Current position average - uses the existing position’s average price.
- MARKET order (last resort) - “used only when no price data exists (no positions, no market data).”
“Uses LIMIT orders when a price can be determined (cases 1-3) to preserve PnL accuracy. Skips zero quantity differences after precision rounding.”
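The hierarchy, plus the precision-rounding skip, can be sketched in plain Python. For the calculated price, this sketch assumes the simplest weighted-average case: the synthetic fill adds to a flat or same-side position, so venue_qty * venue_avg = internal_qty * internal_avg + diff * price. The doc does not publish Nautilus's exact formula, and the function names below are hypothetical.

```python
from decimal import Decimal
from typing import Optional


def calculated_price(q_int: Decimal, p_int: Decimal,
                     q_ven: Decimal, p_ven: Decimal) -> Optional[Decimal]:
    """Price of one synthetic fill moving (q_int @ p_int) to (q_ven @ p_ven).

    Handles only the case where the fill adds to a flat or same-side position.
    Other cases return None and fall through to the next tier.
    """
    d = q_ven - q_int
    adds_same_side = q_int * q_ven >= 0 and abs(q_ven) > abs(q_int)
    if d == 0 or not adds_same_side:
        return None
    return (q_ven * p_ven - q_int * p_int) / d


def choose_order(calc: Optional[Decimal], bid: Optional[Decimal],
                 ask: Optional[Decimal], position_avg: Optional[Decimal]):
    """Apply the four-tier hierarchy; returns (order_type, limit_price)."""
    if calc is not None:
        return ("LIMIT", calc)             # 1. calculated reconciliation price
    if bid is not None and ask is not None:
        return ("LIMIT", (bid + ask) / 2)  # 2. market mid-price
    if position_avg is not None:
        return ("LIMIT", position_avg)     # 3. current position average
    return ("MARKET", None)                # 4. last resort: no price data


def quantity_diff(q_int: Decimal, q_ven: Decimal, size_precision: int) -> Decimal:
    """Difference rounded to instrument precision; zero means skip."""
    step = Decimal(1).scaleb(-size_precision)
    return (q_ven - q_int).quantize(step)
```

For example, an internal 100 @ 10 against a venue 200 @ 11 needs one synthetic buy of 100 @ 12, since (100 * 10 + 100 * 12) / 200 = 11.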
What signals “reconciliation complete, ready to trade”?
The page does not name a dedicated event such as
ReconciliationComplete. The canonical signal is the engine
transitioning to RUNNING state (per the architecture FSM). The
mechanism:
- start() is called. The LiveExecutionEngine begins reconciliation against each registered venue.
- Reconciliation runs through the procedure above. Any synthesized events flow through the normal pipeline carrying reconciliation=True (per nautilus-execution.md).
- If reconciliation fails: the system logs an error and does not start. The engine never reaches RUNNING. Boot aborts.
- If reconciliation succeeds: the engine transitions to RUNNING, strategies begin receiving fresh data and events, and the Strategy.on_start() lifecycle hook fires. That hook is the operational signal that “reconciliation is complete and we are ready to trade.”
So the “reconciliation complete” signal is implicit:
Strategy.on_start() running ≡ reconciliation has succeeded ≡
engine is RUNNING. Anything an operator (or audit logger) wants
to gate on “reconciliation done” should subscribe to the
ComponentStateChanged event for the LiveExecutionEngine, or
hook off the strategy’s on_start() lifecycle method.
A reconciliation_startup_delay_secs window (default 10s; per
nautilus-execution.md) is provided for WebSocket connections to
stabilize before continuous reconciliation begins. This is a buffer
between “data clients connected” and “in-flight check loop active”,
not a signal an external consumer should wait on.
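Because the signal is implicit, a consumer has to encode the gating itself. The sketch below is a minimal audit consumer. It assumes events expose a boolean reconciliation flag (as nautilus-execution.md describes) and that an on_start() call marks readiness. AuditGate and Event are hypothetical classes, not Nautilus types. Every event is recorded, but only fresh, post-startup events raise alerts.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for an order event."""
    name: str
    reconciliation: bool = False


@dataclass
class AuditGate:
    ready: bool = False  # flipped by on_start(), i.e. after reconciliation
    alerts: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def on_start(self) -> None:
        # on_start running implies startup reconciliation succeeded.
        self.ready = True

    def on_event(self, event: Event) -> None:
        self.audit_log.append(event.name)  # everything is recorded
        if event.reconciliation or not self.ready:
            return  # synthesized or pre-start: record only, no alert
        self.alerts.append(event.name)
```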
Reconciliation invariants (verbatim)
“The reconciliation system maintains four invariants:
- Position quantity: the final quantity matches the venue within instrument precision.
- Average entry price: the position’s average entry price matches the venue’s reported price within tolerance (default 0.01%).
- PnL integrity: all generated fills, including synthetic fills, use calculated prices that preserve correct unrealized PnL.
- ID determinism: synthetic trade_id and venue_order_id values emitted during reconciliation are deterministic functions of the logical event. The same logical fill or position-adjustment order produces the same ID across restarts, so replayed reconciliation events dedupe against earlier runs instead of being treated as new.”
“These hold even when:
- The reconciliation window misses complete fill history.
- Fills are missing from venue reports.
- Position lifecycles span beyond the lookback window.
- Multiple zero-crossings have occurred.”
The fourth invariant (ID determinism) is the structural property that makes “restart twice in five minutes” safe - replayed reconciliation events dedupe against earlier runs.
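The page does not say how Nautilus derives its deterministic IDs. One way to get the property is to hash the fields that identify the logical event, as in this sketch. The field choice, the RECON- prefix, and the function names are assumptions for illustration. The tolerance helper applies the documented 0.01% default as a relative bound on the average entry price.

```python
import hashlib


def synthetic_trade_id(instrument_id: str, account_id: str, side: str,
                       qty: str, price: str, ts_event_ns: int) -> str:
    """Deterministic ID: the same logical fill always hashes to the same ID."""
    key = "|".join([instrument_id, account_id, side, qty, price, str(ts_event_ns)])
    return "RECON-" + hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def avg_px_within_tolerance(internal: float, venue: float,
                            tolerance: float = 0.0001) -> bool:
    """Relative check; the default 0.0001 is the doc's 0.01%."""
    return abs(internal - venue) <= tolerance * abs(venue)
```

A second run that rebuilds the same logical fill produces an ID already present in the first run's set, so it dedupes instead of double-counting.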
In-flight order checks - the runtime continuous loop
The doc names the three in-flight states verbatim:
“An in-flight order is one awaiting venue acknowledgement:
- SUBMITTED - initial submission, awaiting accept/reject.
- PENDING_UPDATE - modification requested, awaiting confirmation.
- PENDING_CANCEL - cancellation requested, awaiting confirmation.”
“These orders are monitored by the continuous reconciliation loop to detect stale or lost messages.”
The runtime check table (verbatim):
| Scenario | System behavior |
|---|---|
| In-flight order timeout - order remains unconfirmed beyond threshold | After inflight_check_retries, resolves to REJECTED. |
| Open orders check discrepancy - periodic poll detects a venue state change | Confirms status at open_check_interval_secs and applies transitions. |
| Own books audit mismatch - own order books diverge from venue public books | Audits at own_books_audit_interval_secs, logs inconsistencies. |
This is the structural fix to MK2’s “alert without action” class.
A SUBMITTED order that never gets a venue ack does not silently
sit forever - after inflight_check_retries polls, the engine
treats it as REJECTED, emits the typed event, and the strategy’s
on_order_rejected fires.
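The timeout-to-REJECTED rule can be modeled as a polling pass over cached in-flight orders. This simplified sketch assumes a status query that either returns a newer venue status or nothing. TrackedOrder and check_inflight are hypothetical. The retries argument stands in for inflight_check_retries; its real default is in the API reference, so the caller must pass it. threshold_ms defaults to the 5000ms cited from nautilus-execution.md.

```python
from dataclasses import dataclass

IN_FLIGHT = {"SUBMITTED", "PENDING_UPDATE", "PENDING_CANCEL"}


@dataclass
class TrackedOrder:
    """Hypothetical stand-in for a cached order awaiting venue acknowledgement."""
    client_order_id: str
    status: str
    last_event_ms: int
    checks: int = 0


def check_inflight(orders: list[TrackedOrder], now_ms: int,
                   venue_status: dict[str, str], retries: int,
                   threshold_ms: int = 5000) -> list[tuple[str, str]]:
    """One pass of a simplified in-flight check; returns (id, new_status) pairs.

    venue_status holds what a venue status query returned; a missing key means
    the venue had nothing new for that order.
    """
    resolved: list[tuple[str, str]] = []
    for order in orders:
        if order.status not in IN_FLIGHT:
            continue
        if now_ms - order.last_event_ms < threshold_ms:
            continue  # still inside the wait threshold: do not act yet
        reported = venue_status.get(order.client_order_id)
        if reported is not None and reported != order.status:
            order.status = reported  # venue answered: apply the transition
            resolved.append((order.client_order_id, reported))
            continue
        order.checks += 1
        if order.checks >= retries:
            order.status = "REJECTED"  # give up after `retries` failed re-checks
            resolved.append((order.client_order_id, "REJECTED"))
    return resolved
```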
Configuration knobs (LiveExecEngineConfig)
The page mentions these knobs by name but defers the full
enumeration to the API reference. Compiled list (this page +
nautilus-execution.md):
| Knob | Default | Effect |
|---|---|---|
| reconciliation | True | Master switch - turn off only for testing |
| reconciliation_lookback_mins | unset (= max venue history) | History window the adapter requests |
| generate_missing_orders | True | Generate EXTERNAL/RECONCILIATION orders to align position discrepancies |
| external_order_claims | empty | Per-strategy claims, by instrument ID, for orders that pre-existed reconciliation |
| filter_unclaimed_external_orders | (default unspecified on this page) | Whether to filter unclaimed externals from the strategy event stream |
| filtered_client_order_ids | empty | Specific client order IDs to skip during reconciliation |
| inflight_check_interval_ms | (per API ref) | Cadence of the in-flight order check |
| inflight_check_threshold_ms | 5000ms (per nautilus-execution.md) | Wait threshold before acting on a discrepancy |
| inflight_check_retries | (per API ref) | After this many failed re-checks, an in-flight order resolves to REJECTED |
| open_check_interval_secs | (per API ref) | Cadence of the open-orders venue poll |
| open_check_threshold_ms | 5000ms (per nautilus-execution.md) | Wait threshold before acting on an open-order discrepancy |
| own_books_audit_interval_secs | (per API ref) | Cadence of the own-order-book audit |
| reconciliation_startup_delay_secs | 10 | Buffer to let WebSocket connections stabilize before continuous reconciliation begins |
| allow_overfills | False | If True, applies overfills with overfill_qty tracking; if False, rejects + logs |
“For all live trading options, see the LiveExecEngineConfig API Reference.”
The doc explicitly notes:
“Leave reconciliation_lookback_mins unset. This lets the engine request the maximum execution history the venue provides. Executions before the lookback window still generate alignment events, but with some information loss that a longer window would avoid. Some venues also filter or drop older execution data. Persisting all events to the cache database prevents both issues.”
So the recommended posture: leave lookback unset, persist all events to the Cache database (Redis).
External order claims - surviving a clean-room boot
“Each strategy can claim external orders for an instrument ID generated during reconciliation via the external_order_claims config parameter. This lets a strategy resume managing open orders when no cached state exists.”
“Orders generated with strategy ID EXTERNAL and tag RECONCILIATION during position reconciliation are internal to the engine. They cannot be claimed via external_order_claims and should not be managed by user strategies. To detect external orders in your strategy, check order.strategy_id.value == "EXTERNAL". These orders participate in portfolio calculations and position tracking like any other order.”
The two-tag taxonomy:
- EXTERNAL + tag VENUE - pre-existed at the venue; the engine adopted them. Claimable by a strategy via external_order_claims.
- EXTERNAL + tag RECONCILIATION - synthesized by the engine to align position discrepancies. NOT claimable - engine-internal.
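The claim rule reduces to a small predicate. This sketch assumes orders expose a strategy ID string, a tuple of tags, and an instrument ID. OrderView, claimable_by, and the instrument IDs used with them are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderView:
    """Hypothetical stand-in for a reconciled Nautilus order."""
    strategy_id: str
    tags: tuple
    instrument_id: str


def claimable_by(order: OrderView, claims: set) -> bool:
    """True if a strategy claiming these instrument IDs may adopt the order."""
    if order.strategy_id != "EXTERNAL":
        return False  # already owned by a real strategy
    if "RECONCILIATION" in order.tags:
        return False  # engine-internal alignment order: never claimable
    return order.instrument_id in claims
```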
For Cortana’s “clean-room boot after a workspace archive” scenario:
the strategy’s config can list the SPY 0DTE instrument ID in
external_order_claims. Pre-existing open orders for that instrument
are then adopted by the strategy on restart, and the strategy’s
on_order_event / on_position_event handlers fire as if they
were its own orders.
Common reconciliation issues (verbatim)
“Missing trade reports: Some venues filter out older trades. Increase reconciliation_lookback_mins or cache all events locally.”
“Position mismatches: External orders that predate the lookback window cause position drift. Flatten the account before restarting to reset state.”
“Duplicate order IDs: Deduplicated with warnings logged. Frequent duplicates may indicate venue data integrity issues.”
“Precision differences: Small decimal differences are handled using instrument precision. Large discrepancies may indicate missing orders.”
“Out-of-order reports: Fill reports arriving before order status reports are deferred until order state is available.”
“For persistent issues, drop cached state or flatten accounts before restarting.”
The “flatten the account before restarting” suggestion is the doc’s escape hatch when reconciliation can’t converge. For Cortana this maps cleanly to MK2’s “kill engine, manual flatten, restart” recovery playbook - same operational pattern, but Nautilus makes it the exception, not the rule.
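The out-of-order rule is a classic park-until-parent-exists buffer. In this minimal sketch, a hypothetical FillBuffer class parks fills for any order that has no status report yet. It releases them in arrival order once the status report lands. Real reconciliation would carry full report objects rather than ID strings.

```python
from collections import defaultdict


class FillBuffer:
    """Defers fill reports until an order status report for that order exists."""

    def __init__(self) -> None:
        self.known_orders: set[str] = set()
        self.pending: defaultdict[str, list[str]] = defaultdict(list)
        self.applied: list[tuple[str, str]] = []

    def on_order_status(self, client_order_id: str) -> None:
        self.known_orders.add(client_order_id)
        # Release any fills that arrived early, in arrival order.
        for trade_id in self.pending.pop(client_order_id, []):
            self.applied.append((client_order_id, trade_id))

    def on_fill(self, client_order_id: str, trade_id: str) -> None:
        if client_order_id in self.known_orders:
            self.applied.append((client_order_id, trade_id))
        else:
            self.pending[client_order_id].append(trade_id)  # park until known
```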
Partial window adjustment scenarios
When reconciliation_lookback_mins is set, the window may miss
opening fills. The doc enumerates lifecycle reconstruction
scenarios verbatim:
| Scenario | System behavior |
|---|---|
| Complete lifecycle - all fills captured | No adjustment. |
| Incomplete single lifecycle - window misses opening fills, no zero-crossings | Adds synthetic opening fill with calculated price. |
| Multiple lifecycles, current matches venue | Filters out old lifecycles, returns current only. |
| Multiple lifecycles, current mismatch | Replaces current lifecycle with a single synthetic fill. |
| Flat position | No adjustment. |
| No fills | No adjustment, empty result. |
Key concepts (verbatim):
- Zero-crossing: position quantity crosses through zero (FLAT), marking a lifecycle boundary.
- Lifecycle: a sequence of fills between zero-crossings representing one open-close cycle.
- Synthetic fill: a calculated fill report representing missing activity, priced to achieve the correct average position.
- Tolerance: position matching uses configurable price tolerance (default 0.0001 = 0.01%) to absorb minor calculation differences.
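The lifecycle and zero-crossing definitions translate directly into code. This simplified sketch works over signed fill quantities. split_lifecycles cuts the history at zero-crossings: a fill that flips through zero closes one lifecycle and opens the next with the residual. classify maps the result onto the scenario table above. Both names are hypothetical, and real reconciliation also tracks prices and timestamps.

```python
from decimal import Decimal


def split_lifecycles(fills: list[Decimal]) -> tuple[list[list[Decimal]], Decimal]:
    """Split signed fill quantities into lifecycles at zero-crossings.

    A fill that flips the position through zero (e.g. +2 then -4) closes the
    old lifecycle and opens a new one with the residual quantity.
    Returns (lifecycles, final_position).
    """
    lifecycles: list[list[Decimal]] = []
    current: list[Decimal] = []
    pos = Decimal(0)
    for qty in fills:
        new_pos = pos + qty
        crosses = pos != 0 and (new_pos == 0 or (new_pos > 0) != (pos > 0))
        if crosses:
            current.append(-pos)  # the part of the fill that closes
            lifecycles.append(current)
            current = [new_pos] if new_pos != 0 else []  # residual opens next
        else:
            current.append(qty)
        pos = new_pos
    if current:
        lifecycles.append(current)
    return lifecycles, pos


def classify(fills: list[Decimal], venue_qty: Decimal) -> str:
    """Map a fill history onto the partial-window scenario table (simplified)."""
    if not fills:
        return "no fills"
    if venue_qty == 0:
        return "flat"
    lifecycles, internal_qty = split_lifecycles(fills)
    if len(lifecycles) == 1:
        return "complete" if internal_qty == venue_qty else "incomplete single"
    if internal_qty == venue_qty:
        return "multiple, current matches"
    return "multiple, current mismatch"
```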
For Cortana: SPY 0DTE positions are short-lived; a single trading day rarely has a position older than the lookback window. The partial-window adjustment is more relevant for restart sequences that span multiple days (overnight crash recovery), which is rare in 0DTE but can happen during a Friday-night-into-Monday-morning incident.
Shutdown semantics - does it respect open positions?
Direct answer: NO, not by default. The framework’s shutdown
flow is mechanical - stop() tears down clients, persists state,
and flushes writers (per nautilus-architecture.md). It does NOT
intrinsically:
- Block waiting for open positions to be closed.
- Issue reduce-only market exits before shutdown.
- Drain in-flight orders to a terminal state before disposing.
- Refuse to shut down while any position has
signed_qty != 0.
The concepts/live/ page is silent on shutdown semantics
specifically. The architecture page states:
“These tear down clients, persist state, and flush writers.”
That’s the entire shutdown contract. There is no documented position-respecting safety gate.
Implication for feedback_no_kill_with_open_positions.md
The MK2 invariant - never kill the engine while a position is
open - is NOT structurally enforced by Nautilus’s default
shutdown path. We must wire it explicitly. Two options, both
documented in nautilus-execution.md:
- Strategy.market_exit() - the supported graceful flatten path:
  - Cancels all open and in-flight orders.
  - Closes all open positions with reduce-only market orders.
  - Periodically re-checks (market_exit_interval_ms, market_exit_max_attempts).
  - Calls post_market_exit once flat or after max attempts.
  - Denies non-reduce-only orders during exit, structurally preventing a fresh entry mid-exit.
- HALTED TradingState - flips the RiskEngine to refuse all new submits while still allowing cancels:
  - Submit and modify denied.
  - Cancels pass through.
  - Open positions can still be closed via reduce-only.
The Cortana MK3 shutdown sequence must therefore be:
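```python
from nautilus_trader.model.enums import TradingState

# Pre-shutdown hook (e.g., on SIGTERM). `strategy` is the primary Cortana
# strategy; `risk_engine` is the node's RiskEngine instance.
strategy.market_exit()                              # initiates flatten
risk_engine.set_trading_state(TradingState.HALTED)  # refuses new entries
# Wait for `post_market_exit` to fire (positions are flat OR
# market_exit_max_attempts exceeded - surface to operator, do NOT stop).
node.stop()     # only after flat
node.dispose()
```

The operator (or launchd preflight) must orchestrate this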
sequence. The framework provides the components; it does not
provide the policy. This is consistent with nautilus-execution.md’s
guidance that the “alert iff event iff broker action” property is
about routing, not about policy.
For feedback_no_kill_with_open_positions.md adherence:
- Pre-flight check: cache.positions_open_count() == 0 AND cache.orders_inflight_count() == 0 before allowing node.stop().
- If non-zero: invoke market_exit() and wait for the reduce-only flatten to complete.
- Only after is_completely_flat() is True does the operator shut down.
The hard rule from feedback_no_kill_with_open_positions.md is
preserved: position-state safety check is its own step, ABORT if
count > 0 unless explicitly approved (in which case market_exit()
runs first). Nautilus’s market_exit() is the safe-by-construction
implementation of “explicitly approved” - but the operator still
has to invoke it.
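The preflight rule can be written as a gate that only stops the node when both counters are zero. This sketch assumes the gate is handed callables for the real actions:
- market_exit() plus HALTED,
- waiting on post_market_exit,
- node.stop()/node.dispose(),
- a Telegram alert.

preflight_shutdown and FakeCache are hypothetical. Only positions_open_count() and orders_inflight_count() mirror Cache methods named on this page.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class CacheLike(Protocol):
    """The two Cache counters the preflight relies on."""
    def positions_open_count(self) -> int: ...
    def orders_inflight_count(self) -> int: ...


def preflight_shutdown(cache: CacheLike,
                       start_market_exit: Callable[[], None],
                       wait_for_exit: Callable[[], None],
                       stop_node: Callable[[], None],
                       alert: Callable[[str], None]) -> str:
    """Return "stopped" or "aborted"; never stops while anything is open."""
    def is_flat() -> bool:
        return (cache.positions_open_count() == 0
                and cache.orders_inflight_count() == 0)

    if not is_flat():
        start_market_exit()  # e.g. strategy.market_exit() (+ HALTED)
        wait_for_exit()      # e.g. block until post_market_exit fires
    if not is_flat():
        alert("Not flat after market_exit; refusing to stop")
        return "aborted"     # ABORT and ask the operator
    stop_node()              # e.g. node.stop(); node.dispose()
    return "stopped"


@dataclass
class FakeCache:
    """In-memory stand-in used to exercise the gate without a live node."""
    positions: int = 0
    inflight: int = 0

    def positions_open_count(self) -> int:
        return self.positions

    def orders_inflight_count(self) -> int:
        return self.inflight
```

Injecting the actions keeps the policy testable offline: the same function can drive a real node in the SIGTERM handler and a FakeCache in CI.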
Watchdog patterns and healthchecks
The page does not document watchdog/healthcheck patterns explicitly. The continuous reconciliation loop is itself the liveness mechanism - if the loop is running and reconciliation keeps converging, the engine is healthy. If reconciliation starts emitting persistent discrepancies, that’s the canary.
Per nautilus-integrations.md, the IBKR adapter has its own
watchdog:
“Watchdog loop monitors the socket; auto-reconnect with IB_MAX_CONNECTION_ATTEMPTS retries.”
Per nautilus-architecture.md’s production guidance:
“In production deployments, the system is typically configured with panic = abort in release builds, ensuring that any panic results in a clean process termination that can be handled by process supervisors or orchestration systems.”
So the framework’s stance is: the engine is responsible for fail-fast on invariant violations; the process supervisor (launchd/systemd/k8s) is responsible for restart. Nautilus does not ship a healthcheck endpoint or a watchdog daemon - that’s the supervisor’s job.
For Cortana MK3:
- Replace MK2’s feedback_watchdog_to_telegram.md AI-meta watchdog with a launchd KeepAlive plist that restarts on exit.
- Trading-event Telegrams (signals/fills/errors) come from an Audit Logger Actor subscribed to on_event (per nautilus-message-bus.md), not from a separate watchdog process.
- Healthcheck = “is the LiveExecutionEngine in RUNNING state and not emitting persistent reconciliation discrepancies?” This can be a custom Actor that subscribes to ComponentStateChanged and to reconciliation events, and exposes status via a Redis key the dashboard reads.
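A healthcheck Actor along these lines mostly needs a state field and a sliding window of discrepancy timestamps. In this framework-free sketch, “persistent” means more than max_discrepancies reconciliation discrepancies within window_s seconds. Both thresholds are hypothetical, as is the HealthMonitor class. Wiring it to ComponentStateChanged and publishing the result to a Redis key are left out.

```python
from collections import deque


class HealthMonitor:
    """Healthy iff the engine is RUNNING and recent discrepancies stay low."""

    def __init__(self, window_s: float = 300.0, max_discrepancies: int = 3) -> None:
        self.state = "PRE_INITIALIZED"
        self.window_s = window_s
        self.max_discrepancies = max_discrepancies
        self._discrepancy_times: deque = deque()

    def on_state_changed(self, state: str) -> None:
        self.state = state

    def on_reconciliation_discrepancy(self, ts: float) -> None:
        self._discrepancy_times.append(ts)

    def is_healthy(self, now: float) -> bool:
        # Drop discrepancies older than the sliding window.
        while self._discrepancy_times and now - self._discrepancy_times[0] > self.window_s:
            self._discrepancy_times.popleft()
        return (self.state == "RUNNING"
                and len(self._discrepancy_times) <= self.max_discrepancies)
```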
Backtest vs live differences (this page’s scope)
The page calls out the high-level posture:
“The same actors, strategies, and execution algorithms run against both the backtest engine and a live trading node. Live trading involves real financial risk.”
Differences specific to reconciliation:
| Aspect | Backtest | Live |
|---|---|---|
| Reconciliation runs? | No - simulator owns truth on both sides | Yes - LiveExecutionEngine reconciles at startup + continuously |
| reconciliation=True events emitted? | No | Yes, after each reconciliation cycle that finds a discrepancy |
| In-flight check loop active? | No | Yes |
| Open-orders poll active? | No | Yes |
| Own-books audit active? | No | Yes |
| Adapter-level reconnect? | No (no adapter, simulator) | Yes |
The strategy code is identical; the framework runs different machinery underneath.
Cortana MK3 implications
Direct mapping from MK2’s three live-failure modes to the Nautilus mechanisms that prevent them.
(a) Power outage state divergence (2026-05-06)
MK2 failure: power outage during a trading session; on restart,
the engine’s view of position state diverged from IBKR’s because
multiple parallel state stores (in-memory dicts, SQLite, Pickle
blobs) all rebooted from incomplete persisted snapshots and never
fully reconciled against IBKR updatePortfolio.
Nautilus prevention: structural.
- The Cache is externalized to Redis (per nautilus-cache.md) and survives the process exit. It rehydrates from Redis on boot.
- LiveExecutionEngine reconciliation then runs the three adapter calls (generate_order_status_reports, generate_fill_reports, generate_position_status_reports) against IBKR.
- Four invariants (position quantity, avg entry price, PnL integrity, ID determinism) hold by construction. The engine converges on broker truth before transitioning to RUNNING.
- If reconciliation fails, the engine refuses to start. No “boot into a half-state and hope” path exists.
The MK2 power-outage incident is structurally impossible because the cache rebuild + venue reconciliation + fail-fast-on-failure chain runs every single boot. There is no “happy path” that skips reconciliation; reconciliation IS the boot path.
(b) Workspace archive data loss (2026-04-22)
MK2 failure: gwangju workspace archive honored .gitignore,
silently destroying gitignored runtime state (decisions.db, signal
weights, .env, .venv). Phase 1 fix shipped same day:
three-tier externalization (config in repo, secrets in
~/.config/cortanaroi/, runtime state in ~/cortanaroi-data/).
Nautilus prevention: structural for trading state, partial for secrets.
- Trading state is in Redis. Workspace archive cannot touch it. The Cache rebuilds from Redis on boot, regardless of what happens to the workspace directory.
- Reconciliation against IBKR patches any gap between the most recent Redis snapshot and the venue’s actual state.
- Custom data (cooldown state, scoring history) lives in the Cache too, so it survives. Unstructured data goes in via cache.add(key, bytes); typed snapshots go through Strategy.on_save()/on_load().
- What’s NOT solved by Nautilus: secrets (IBKR credentials, UW API key) still need three-tier externalization (~/.config/cortanaroi/). Nautilus doesn’t prescribe a secret store; that’s our operational responsibility.
- What’s NOT solved: if Redis itself runs on the same workspace volume that gets archived, we still lose state. Mitigation: run Redis on a separate host or volume, or use managed Redis with cross-zone replication.
The 2026-04-22 class is structurally addressed for trading state but operationally must be paired with a Redis topology that survives workspace archives.
(c) Launchd restart with open position (feedback_no_kill_with_open_positions.md)
MK2 caution: never kill the engine while a position is open.
Nautilus partial alignment:
- The default node.stop() does NOT block on open positions - this is operator/policy responsibility.
- BUT Nautilus provides the components to make this safe: Strategy.market_exit() (graceful flatten), the HALTED TradingState (refuse new entries), and the is_completely_flat() predicate.
Cortana MK3 must wire the no-kill-with-open-positions invariant explicitly, exactly as MK2 does today:
- The pre-shutdown SIGTERM handler invokes market_exit() on the primary Cortana strategy.
- Trading state is flipped to HALTED to refuse new entries.
- Wait for post_market_exit to fire (positions flat) or for market_exit_max_attempts to elapse.
- Check that cache.positions_open_count() == 0 and cache.orders_inflight_count() == 0.
- If still non-flat: surface to operator (Telegram) and do NOT auto-stop. This is the “ABORT and ask the user” branch from feedback_no_kill_with_open_positions.md.
- If flat: node.stop(), then node.dispose().
This is policy code we own. Nautilus provides the primitives; Cortana provides the orchestration. The launchd preflight check that runs before the SIGTERM is sent should also enforce the invariant - exactly as MK2’s preflight does today.
Net assessment: Nautilus is safer than MK2’s default launchd behavior because the engine self-reconciles on every boot. The “refuse to kill while open positions exist” gate, however, is still ours to implement. The key win is that the recovery path is now structural, even if the prevention path remains operator policy.
Caveats and gotchas
- Reconciliation events look real. Always check event.reconciliation before treating an event as fresh broker action; otherwise expect alert spam during startup. Per nautilus-execution.md.
- reconciliation_lookback_mins defaults to “max venue history”. Setting a low value to “speed up boot” is a footgun. Partial window adjustment kicks in and synthesizes fills, which is correct but generates noisier reconciliation events.
- generate_missing_orders=False is dangerous. If you turn it off, position discrepancies are logged but not auto-aligned, and the engine starts with a known-divergent state. The default True is the right answer.
- External orders are a real category. A human trade on the same paper account, or a leftover order from a previous session, becomes an EXTERNAL strategy order on boot. Your audit logger will see it; your strategy will not unless it declares external_order_claims.
- reconciliation_startup_delay_secs=10 (default) - boot is at least 10 seconds slower than backtest because of this WebSocket-stabilization buffer. Lowering it risks duplicate fills via a race between real-time and reconciliation events.
- Shutdown is NOT position-respecting by default. Wire market_exit() + HALTED explicitly.
- No documented healthcheck endpoint. Build one as a custom Actor if needed.
- inflight_check_threshold_ms=5000 (default) is the wait before acting on a discrepancy. Lowering it raises the probability of duplicate fills.
- The framework stops if reconciliation fails at startup (verbatim: “the system logs an error and does not start”). The operator runbook must include “reconciliation failed → check venue connectivity, check account ID, consider flatten + retry” as a top-level recovery path.
- market_exit() denies non-reduce-only orders during exit. This is a feature: it prevents a fresh on_data entry mid-flatten. But strategy code that submits a non-reduce-only order after market_exit() gets an OrderDenied event instead of an exception, so the failure is easy to miss. Test the path.
When this concept applies
- Designing the MK3 boot/shutdown sequence (launchd plist, preflight check, SIGTERM handler).
- Reasoning about reconnect behavior during a network partition.
- Auditing whether MK2’s three live-failure modes (power outage, workspace archive, launchd restart with open position) survive migration.
- Configuring LiveExecEngineConfig defaults for paper-trade bring-up.
- Wiring an operator dashboard or healthcheck endpoint.
When it breaks / does not apply
- The page does not enumerate LiveExecEngineConfig fields in full - refer to the API reference.
- Lifecycle methods on TradingNode (build, run, dispose, signal handling) are not on this page; see getting_started/live_trading/ and the architecture page.
- Redis cache config is in nautilus-cache.md, not here.
- The page is silent on multi-tenant boot orchestration; see nautilus-message-bus.md for the producer/consumer pattern.
See Also
- Nautilus Architecture - kernel FSM, crash-only design, panic = abort production stance.
- Nautilus Execution - RiskEngine, ExecutionClient, four reconciliation report variants, LiveExecEngineConfig knobs, race-condition handling, market_exit() mechanism.
- Nautilus Cache - Redis externalization, CacheConfig, flush_on_start, use_instance_id, cache-then-publish.
- Nautilus Message Bus - external Redis Streams, producer/consumer pattern for multi-tenant.
- Nautilus Positions - Position object, OMS adjudication, NETTING vs HEDGING, reconciled positions.
- Nautilus Integrations - IBKR adapter watchdog, IB_MAX_CONNECTION_ATTEMPTS, paper account configuration.
- 2026-05-09 Nautilus Spike Plan: ~/conductor/workspaces/cortanaroi-mk2/belo-horizonte/plans/2026-05-09-nautilus-spike.md
- project_data_loss_april22 - workspace archive failure class, three-tier externalization Phase 1.
- feedback_no_kill_with_open_positions - invariant Nautilus partially supports (components yes, default policy no).
- project_pm_ibkr_exit_invariant - broker-truth alignment that Nautilus reconciliation enforces by construction.
Timeline
2026-05-07 | Cody - Filed during pre-spike concept mastery sweep batch 3.