Nautilus Live Trading

The Nautilus concepts/live/ page is execution-reconciliation-centric: it documents how the LiveExecutionEngine aligns the venue’s actual order and position state with the system’s internal state at startup and continuously thereafter, and pins down four invariants (position quantity, average entry price, PnL integrity, ID determinism) that hold even when reconciliation windows miss fill history. The page is the canonical answer to MK2’s “open positions on restart” failure class - and to the 2026-04-22 data-loss class and the 2026-05-06 power-outage state-divergence class. Reconciliation is not an opt-in feature: “Unless reconciliation is set to false, the execution engine reconciles state for each venue at startup.” If reconciliation fails, “the system logs an error and does not start” - fail-fast at boot is the default. TradingNode lifecycle, signal handling, in-flight-order drain semantics, and Redis cache config are NOT on this page (they live in getting_started/live_trading/, the LiveExecEngineConfig API reference, and nautilus-cache.md); this page covers the reconciliation loop, the in-flight check loop, the open-orders poll, the own-books audit, and the partial-window adjustment scenarios.

Why this page exists (vs. siblings)

  • nautilus-architecture.md covers the kernel, the FSM (PRE_INITIALIZED → READY → RUNNING → STOPPED), the crash-only design (“Startup and crash recovery share the same code path”), and the production stance (panic = abort).
  • nautilus-execution.md covers the command/event pipeline, the RiskEngine, the ExecutionClient adapter contract, the four reconciliation report variants, race-condition handling, and the LiveExecEngineConfig knobs that govern continuous reconciliation (open_check_interval_secs, inflight_check_threshold_ms, reconciliation_startup_delay_secs, allow_overfills).
  • nautilus-cache.md covers Redis externalization, CacheConfig, flush_on_start, use_instance_id, and the cache-then-publish invariant.
  • nautilus-positions.md covers the Position object, OMS adjudication, NETTING vs HEDGING, and how reconciled positions land.
  • nautilus-message-bus.md covers the bus, external Redis Streams, and the producer/consumer pattern.

This page is the operational handbook for what happens between the moment a TradingNode boots and the moment it is ready to take a fresh entry order. It is what an operator (or a launchd preflight check) needs to know about boot, reconnect, in-flight drift, and shutdown.

Core claim

Live trading in Nautilus is backtested-strategy code, deployed unchanged, behind a LiveExecutionEngine that owns reconciliation. The strategy author writes one program; the framework rebuilds the strategy’s view of reality from venue truth on every startup, every reconnect, and continuously thereafter.

“NautilusTrader deploys backtested strategies to live markets with no code changes. The same actors, strategies, and execution algorithms run against both the backtest engine and a live trading node.”

“Live trading involves real financial risk. Before deploying to production, understand system configuration, node operations, execution reconciliation, and the differences between backtesting and live trading.”

The author of this page wants you to internalize four things before flipping the live switch:

  1. Configuration (a TradingNodeConfig with data + exec clients).
  2. Node operations (build → start → run → stop → dispose).
  3. Execution reconciliation (this page’s main subject).
  4. Backtest-vs-live differences (latency, real fills, real risk).

TradingNode lifecycle

The concepts/live/ page itself does not document the TradingNode lifecycle methods - it points to the Configure a live trading node how-to and the configuration concept guide. Pulling the lifecycle from nautilus-architecture.md (FSM) plus the standard Nautilus example shape:

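from nautilus_trader.config import LiveDataEngineConfig
from nautilus_trader.config import LiveExecEngineConfig
from nautilus_trader.config import LiveRiskEngineConfig
from nautilus_trader.config import TradingNodeConfig
from nautilus_trader.live.node import TradingNode

# data_config, exec_config, cache_config, and message_bus_config are built
# elsewhere (adapter-specific); MyStrategy and AuditLogger are user classes.
config = TradingNodeConfig(
    trader_id="CORTANA-PAPER",
    data_clients={"IB": data_config},
    exec_clients={"IB": exec_config},
    cache=cache_config,                 # optional Redis backing
    message_bus=message_bus_config,     # optional external streams
    data_engine=LiveDataEngineConfig(...),
    exec_engine=LiveExecEngineConfig(...),
    risk_engine=LiveRiskEngineConfig(...),
)

node = TradingNode(config=config)
node.trader.add_strategy(MyStrategy(...))   # strategies and actors register on the trader
node.trader.add_actor(AuditLogger(...))

# Adapter client factories must be registered before build().
node.build()            # wire components, register adapters
try:
    node.run()          # blocks; performs reconciliation, then dispatches
                        # SIGINT / SIGTERM trigger graceful shutdown
finally:
    node.dispose()      # tear down clients, flush writers, release resources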

State machine (from nautilus-architecture.md)

Stable states: PRE_INITIALIZED, READY, RUNNING, STOPPED, DEGRADED, FAULTED, DISPOSED.

Transitional states (brief): STARTING, STOPPING, RESUMING, RESETTING, DISPOSING, DEGRADING, FAULTING.

The architecture page commits to:

“The system does provide graceful shutdown flows (stop, dispose) for normal operation. These tear down clients, persist state, and flush writers. The crash-only philosophy applies specifically to unrecoverable faults where attempting graceful cleanup could cause further damage.”

So the intended shutdown path on SIGTERM is:

  1. STOPPING - stop accepting new commands (the engine no longer dispatches new strategy events).
  2. Tear down clients - disconnect from venues.
  3. Persist state - Cache flush to Redis (if configured).
  4. Flush writers - Parquet catalog, audit sinks.
  5. STOPPED - components are quiescent.
  6. DISPOSING → DISPOSED - release resources.

Open question on this page (and the architecture page): does stop() block on draining in-flight orders, or does it terminate abruptly leaving in-flight orders unresolved? See “Shutdown semantics - does it respect open positions?” below.

Reconnection behavior

The concepts/live/ page does not document reconnection mechanics end-to-end, but several signals indicate the model:

  1. Continuous reconciliation IS the reconnect-recovery story. Per the page: “These [in-flight] orders are monitored by the continuous reconciliation loop to detect stale or lost messages.” When the WebSocket drops and reconnects, the open-orders poll and the in-flight-order check together detect any state divergence that occurred during the gap and synthesize the missing events.
  2. Adapter-level reconnect lives in the ExecutionClient / DataClient. Per nautilus-integrations.md (IBKR): “Watchdog loop monitors the socket; auto-reconnect with IB_MAX_CONNECTION_ATTEMPTS retries.” Each adapter is responsible for its own reconnect logic; the engine then re-runs reconciliation against the now-connected venue.
  3. No explicit “reconnect” event documented on this page. The reconciliation events that fire after a reconnect carry reconciliation=True so audit consumers can distinguish them from fresh broker activity (per nautilus-execution.md).

For Cortana: reconnect-driven reconciliation means the engine self-heals from network drops. The MK2 model - where a Telegram alert might be the first signal that “we’re disconnected and the position is bleeding” - is replaced by structural automatic recovery.

Reconcile-on-startup - the canonical pattern

This is the most load-bearing section. Verbatim from the doc:

“Execution reconciliation aligns the venue’s actual order and position state with the system’s internal state built from events. Only the LiveExecutionEngine performs reconciliation, since backtesting controls both sides.”

“Unless reconciliation is set to false, the execution engine reconciles state for each venue at startup.”

“If reconciliation fails, the system logs an error and does not start.”

The engine fails to start if reconciliation fails. This is the fail-fast at boot posture from nautilus-architecture.md applied operationally. There is no “alert and degrade” path - the engine either reconciles cleanly or refuses to come up.

Two startup scenarios

“Two scenarios:

  • Cached state exists: report data generates missing events to align the state.
  • No cached state: all orders and positions at the venue are generated from scratch.”

The first case is the normal restart (Redis-backed Cache survives the process exit; reconciliation just patches deltas). The second is a clean-room boot - no Cache, the engine rebuilds the order and position graph entirely from venue mass status reports.

“Persist all execution events to the cache database. This reduces reliance on venue history and allows full recovery even with short lookback windows.”

This is the doc’s recommendation: always persist execution events to the cache database. The combination of “Cache backed by Redis” + “cache database persists execution events” is what makes reconciliation cheap and bounded.

Reconciliation procedure - the three calls

“All adapter execution clients follow the same reconciliation procedure, calling three methods to produce an execution mass status:

  • generate_order_status_reports
  • generate_fill_reports
  • generate_position_status_reports”

Each adapter implements these three methods. The engine then walks the reports through the procedure below.

Reconciliation procedure - the steps (verbatim)

“Duplicate check: Deduplicates order reports within the batch and logs warnings. Logs duplicate trade IDs as warnings for investigation.”

“Order reconciliation: Generates and applies events to move orders from cached state to current state. Infers OrderFilled events for missing trade reports. Generates external order events for unrecognized client order IDs or reports missing a client order ID. Verifies fill report data consistency with tolerance-based price and commission comparisons.”

“Position reconciliation: Matches the net position per instrument against venue position reports using instrument precision. Generates external order events when order reconciliation leaves a position that differs from the venue. When generate_missing_orders is enabled (default: True), generates orders with strategy ID EXTERNAL and tag RECONCILIATION to align discrepancies.”

The price hierarchy used by reconciliation orders, in order of preference:

  1. Calculated reconciliation price (preferred) - targets the correct average position.
  2. Market mid-price - uses the current bid-ask midpoint.
  3. Current position average - uses the existing position’s average price.
  4. MARKET order (last resort) - “used only when no price data exists (no positions, no market data).”

“Uses LIMIT orders when a price can be determined (cases 1-3) to preserve PnL accuracy. Skips zero quantity differences after precision rounding.”
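The price hierarchy above can be sketched as a small selection function. This is a minimal illustration, not Nautilus's implementation: choose_reconciliation_order and its parameters are hypothetical names, and the real engine works with its own Price types rather than Decimal.

```python
from decimal import Decimal
from typing import Optional, Tuple


def choose_reconciliation_order(
    calculated_price: Optional[Decimal],
    bid: Optional[Decimal],
    ask: Optional[Decimal],
    position_avg_px: Optional[Decimal],
) -> Tuple[str, Optional[Decimal]]:
    """Return (order_type, price) following the documented preference order."""
    # 1. Calculated reconciliation price (preferred).
    if calculated_price is not None:
        return "LIMIT", calculated_price
    # 2. Market mid-price from the current bid/ask.
    if bid is not None and ask is not None:
        return "LIMIT", (bid + ask) / 2
    # 3. Existing position's average price.
    if position_avg_px is not None:
        return "LIMIT", position_avg_px
    # 4. Last resort: no price data exists at all.
    return "MARKET", None
```

Cases 1-3 all yield a LIMIT order, which is what lets the synthesized fill carry a known price and keep PnL accurate; only the no-data case falls through to MARKET.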

What signals “reconciliation complete, ready to trade”?

The page does not name a single event such as ReconciliationComplete. The canonical signal is the engine transitioning to RUNNING state (per the architecture FSM). The mechanism:

  1. start() is called.
  2. LiveExecutionEngine begins reconciliation against each registered venue.
  3. Reconciliation runs through the procedure above; any synthesized events flow through the normal pipeline carrying reconciliation=True (per nautilus-execution.md).
  4. If reconciliation fails: the system logs an error and does not start. The engine never reaches RUNNING. Boot aborts.
  5. If reconciliation succeeds: the engine transitions to RUNNING, strategies begin receiving fresh data and events, and the Strategy.on_start() lifecycle hook fires - which is the operational signal that “reconciliation is complete and we are ready to trade.”

So the “reconciliation complete” signal is implicit: Strategy.on_start() running ≡ reconciliation has succeeded ≡ engine is RUNNING. Anything an operator (or audit logger) wants to gate on “reconciliation done” should subscribe to the ComponentStateChanged event for the LiveExecutionEngine, or hook off the strategy’s on_start() lifecycle method.

A reconciliation_startup_delay_secs window (default 10s; per nautilus-execution.md) is provided for WebSocket connections to stabilize before continuous reconciliation begins. This is a buffer between “data clients connected” and “in-flight check loop active”, not a signal an external consumer should wait on.

Reconciliation invariants (verbatim)

“The reconciliation system maintains four invariants:

  • Position quantity: the final quantity matches the venue within instrument precision.
  • Average entry price: the position’s average entry price matches the venue’s reported price within tolerance (default 0.01%).
  • PnL integrity: all generated fills, including synthetic fills, use calculated prices that preserve correct unrealized PnL.
  • ID determinism: synthetic trade_id and venue_order_id values emitted during reconciliation are deterministic functions of the logical event. The same logical fill or position-adjustment order produces the same ID across restarts, so replayed reconciliation events dedupe against earlier runs instead of being treated as new.”

“These hold even when:

  • The reconciliation window misses complete fill history.
  • Fills are missing from venue reports.
  • Position lifecycles span beyond the lookback window.
  • Multiple zero-crossings have occurred.”

The fourth invariant (ID determinism) is the structural property that makes “restart twice in five minutes” safe - replayed reconciliation events dedupe against earlier runs.
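Why deterministic IDs make a double restart safe can be shown with a toy model. The page does not document how Nautilus derives these IDs, so the SHA-256 scheme, the RECON- prefix, and the names synthetic_trade_id and Deduper below are assumptions for illustration only.

```python
import hashlib


def synthetic_trade_id(instrument_id: str, venue_order_id: str,
                       side: str, qty: str, px: str) -> str:
    """Derive an ID purely from the logical fill (hypothetical scheme).

    No wall-clock time or random value is mixed in, so the same logical
    fill yields the same ID on every restart.
    """
    canonical = "|".join((instrument_id, venue_order_id, side, qty, px))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "RECON-" + digest[:16]


class Deduper:
    """Tracks IDs already applied, standing in for persisted event history."""

    def __init__(self) -> None:
        self._seen = set()

    def apply_once(self, trade_id: str) -> bool:
        """Return True if the event is new, False if it is a replay."""
        if trade_id in self._seen:
            return False
        self._seen.add(trade_id)
        return True
```

Two boots that reconcile the same missing fill compute the same ID, so the second apply_once call returns False and the fill is not double-counted. A random or time-based ID would break exactly this property.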

In-flight order checks - the runtime continuous loop

The doc names three in-flight states verbatim:

“An in-flight order is one awaiting venue acknowledgement:

  • SUBMITTED - initial submission, awaiting accept/reject.
  • PENDING_UPDATE - modification requested, awaiting confirmation.
  • PENDING_CANCEL - cancellation requested, awaiting confirmation.”

“These orders are monitored by the continuous reconciliation loop to detect stale or lost messages.”

The runtime check table (verbatim):

Scenario | System behavior
In-flight order timeout - order remains unconfirmed beyond threshold | After inflight_check_retries, resolves to REJECTED.
Open orders check discrepancy - periodic poll detects a venue state change | Confirms status at open_check_interval_secs and applies transitions.
Own books audit mismatch - own order books diverge from venue public books | Audits at own_books_audit_interval_secs, logs inconsistencies.

This is the structural fix to MK2’s “alert without action” class. A SUBMITTED order that never gets a venue ack does not silently sit forever - after inflight_check_retries polls, the engine treats it as REJECTED, emits the typed event, and the strategy’s on_order_rejected fires.
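The in-flight timeout row can be modeled as one pass of a check function. This is a simplified sketch of the documented behavior, not the engine's code: TrackedOrder and check_inflight are hypothetical names, statuses are plain strings, and max_retries stands in for inflight_check_retries (whose default this page does not state, so it is a required argument here).

```python
from dataclasses import dataclass
from typing import Optional

IN_FLIGHT = {"SUBMITTED", "PENDING_UPDATE", "PENDING_CANCEL"}


@dataclass
class TrackedOrder:
    client_order_id: str
    status: str
    last_update_ms: int
    retries: int = 0


def check_inflight(order: TrackedOrder, now_ms: int, venue_status: Optional[str],
                   *, max_retries: int, threshold_ms: int = 5000) -> str:
    """Run one check pass; venue_status is None when the venue has no answer."""
    if order.status not in IN_FLIGHT:
        return order.status              # already confirmed or terminal
    if now_ms - order.last_update_ms < threshold_ms:
        return order.status              # still inside the wait threshold
    if venue_status is not None:
        order.status = venue_status      # venue confirmed a state: apply it
        order.retries = 0
        return order.status
    order.retries += 1                   # venue still silent: count a failed re-check
    if order.retries >= max_retries:
        order.status = "REJECTED"        # resolution per the table above
    return order.status
```

The point of the threshold is to avoid racing a slow but legitimate acknowledgement; the point of the retry cap is that the order always reaches a state the strategy can react to.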

Configuration knobs (LiveExecEngineConfig)

The page mentions these knobs by name but defers the full enumeration to the API reference. Compiled list (this page + nautilus-execution.md):

Knob | Default | Effect
reconciliation | True | Master switch - turn off only for testing
reconciliation_lookback_mins | unset (= max venue history) | History window the adapter requests
generate_missing_orders | True | Generate EXTERNAL/RECONCILIATION orders to align position discrepancies
external_order_claims | empty | Per-strategy (strategy_id, instrument_id) claims for orders that pre-existed reconciliation
filter_unclaimed_external_orders | (default unspecified on this page) | Whether to filter unclaimed externals from the strategy event stream
filtered_client_order_ids | empty | Specific client order IDs to skip during reconciliation
inflight_check_interval_ms | (per API ref) | Cadence of the in-flight order check
inflight_check_threshold_ms | 5000ms (per nautilus-execution.md) | Wait threshold before acting on a discrepancy
inflight_check_retries | (per API ref) | After this many failed re-checks, in-flight order resolves to REJECTED
open_check_interval_secs | (per API ref) | Cadence of the open-orders venue poll
open_check_threshold_ms | 5000ms (per nautilus-execution.md) | Wait threshold before acting on an open-order discrepancy
own_books_audit_interval_secs | (per API ref) | Cadence of the own-order-book audit
reconciliation_startup_delay_secs | 10 | Buffer to let WebSocket connections stabilize before continuous reconciliation begins
allow_overfills | False | If True, applies overfills with overfill_qty tracking; if False, rejects + logs

“For all live trading options, see the LiveExecEngineConfig API Reference.”

The doc explicitly notes:

“Leave reconciliation_lookback_mins unset. This lets the engine request the maximum execution history the venue provides. Executions before the lookback window still generate alignment events, but with some information loss that a longer window would avoid. Some venues also filter or drop older execution data. Persisting all events to the cache database prevents both issues.”

So the recommended posture: leave lookback unset, persist all events to the Cache database (Redis).
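A preflight lint over the intended engine settings is one way to keep that posture from drifting. The sketch below is an assumption-heavy illustration: lint_exec_engine_kwargs is a hypothetical helper, and the dict keys simply mirror the knob names listed on this page (the intent is that the dict would later be passed as keyword arguments to LiveExecEngineConfig, which should be verified against the API reference).

```python
from typing import Any, Dict, List


def lint_exec_engine_kwargs(kwargs: Dict[str, Any]) -> List[str]:
    """Return a list of posture violations; an empty list means OK."""
    problems = []
    if kwargs.get("reconciliation", True) is not True:
        problems.append("reconciliation disabled (allowed only for testing)")
    if kwargs.get("reconciliation_lookback_mins") is not None:
        problems.append("reconciliation_lookback_mins set (leave unset)")
    if kwargs.get("generate_missing_orders", True) is not True:
        problems.append("generate_missing_orders disabled (state may stay divergent)")
    return problems


# Values taken from the defaults documented in the table above.
recommended = {
    "reconciliation": True,
    "reconciliation_lookback_mins": None,   # unset = max venue history
    "generate_missing_orders": True,
    "reconciliation_startup_delay_secs": 10,
    "allow_overfills": False,
}
```

Running the lint in the launchd preflight means a well-meaning "speed up boot" tweak to the lookback window fails loudly before the node starts.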

External order claims - surviving a clean-room boot

“Each strategy can claim external orders for an instrument ID generated during reconciliation via the external_order_claims config parameter. This lets a strategy resume managing open orders when no cached state exists.”

“Orders generated with strategy ID EXTERNAL and tag RECONCILIATION during position reconciliation are internal to the engine. They cannot be claimed via external_order_claims and should not be managed by user strategies. To detect external orders in your strategy, check order.strategy_id.value == "EXTERNAL". These orders participate in portfolio calculations and position tracking like any other order.”

The two-tag taxonomy:

  • EXTERNAL + tag VENUE - pre-existed at the venue, the engine adopted them. Claimable by a strategy via external_order_claims.
  • EXTERNAL + tag RECONCILIATION - synthesized by the engine to align position discrepancies. NOT claimable - engine-internal.

For Cortana’s “clean-room boot after a workspace archive” scenario: the strategy can declare external_order_claims = {(StrategyId, InstrumentId)} so that any pre-existing open SPY 0DTE positions are adopted by the strategy on restart, and the strategy’s on_order_event / on_position_event handlers fire as if they were its own orders.
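The two-tag taxonomy reduces to a short classification rule. The sketch below models it with plain strings; OrderView and classify_order are hypothetical names, and claimed_instruments stands in for whatever the strategy declared via external_order_claims.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class OrderView:
    strategy_id: str
    instrument_id: str
    tags: Tuple[str, ...] = ()


def classify_order(order: OrderView, claimed_instruments: FrozenSet[str]) -> str:
    """Classify an order seen after reconciliation (simplified model)."""
    if order.strategy_id != "EXTERNAL":
        return "OWN"                     # submitted by one of our strategies
    if "RECONCILIATION" in order.tags:
        return "ENGINE_INTERNAL"         # synthesized to align positions; never claimable
    if order.instrument_id in claimed_instruments:
        return "CLAIMED"                 # pre-existing venue order a strategy has claimed
    return "UNCLAIMED_EXTERNAL"          # visible to audit, not managed by any strategy
```

The ordering of the checks matters: the RECONCILIATION tag is tested before the claim lookup, so an engine-internal alignment order is never handed to a strategy even when its instrument is claimed.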

Common reconciliation issues (verbatim)

“Missing trade reports: Some venues filter out older trades. Increase reconciliation_lookback_mins or cache all events locally.”

“Position mismatches: External orders that predate the lookback window cause position drift. Flatten the account before restarting to reset state.”

“Duplicate order IDs: Deduplicated with warnings logged. Frequent duplicates may indicate venue data integrity issues.”

“Precision differences: Small decimal differences are handled using instrument precision. Large discrepancies may indicate missing orders.”

“Out-of-order reports: Fill reports arriving before order status reports are deferred until order state is available.”

“For persistent issues, drop cached state or flatten accounts before restarting.”

The “flatten the account before restarting” suggestion is the doc’s escape hatch when reconciliation can’t converge. For Cortana this maps cleanly to MK2’s “kill engine, manual flatten, restart” recovery playbook - same operational pattern, but Nautilus makes it the exception, not the rule.

Partial window adjustment scenarios

When reconciliation_lookback_mins is set, the window may miss opening fills. The doc enumerates lifecycle reconstruction scenarios verbatim:

Scenario | System behavior
Complete lifecycle - all fills captured | No adjustment.
Incomplete single lifecycle - window misses opening fills, no zero-crossings | Adds synthetic opening fill with calculated price.
Multiple lifecycles, current matches venue | Filters out old lifecycles, returns current only.
Multiple lifecycles, current mismatch | Replaces current lifecycle with a single synthetic fill.
Flat position | No adjustment.
No fills | No adjustment, empty result.

Key concepts (verbatim):

  • Zero-crossing: position quantity crosses through zero (FLAT), marking a lifecycle boundary.
  • Lifecycle: a sequence of fills between zero-crossings representing one open-close cycle.
  • Synthetic fill: a calculated fill report representing missing activity, priced to achieve the correct average position.
  • Tolerance: position matching uses configurable price tolerance (default 0.0001 = 0.01%) to absorb minor calculation differences.
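The four concepts above fit together in a short worked sketch. It is a simplified model, not the engine's algorithm: all function names are hypothetical, fills are (signed_qty, price) pairs, lifecycles are assumed to land exactly on flat (a fill that flips through zero would need splitting), the synthetic price assumes same-direction fills, and the tolerance is treated as relative to the venue price.

```python
from decimal import Decimal


def split_lifecycles(signed_fills):
    """Group fills into lifecycles, closing one each time net qty returns to zero."""
    lifecycles, current, net = [], [], Decimal(0)
    for qty, px in signed_fills:
        current.append((qty, px))
        net += qty
        if net == 0:                      # zero-crossing: lifecycle boundary
            lifecycles.append(current)
            current = []
    if current:                           # still-open lifecycle
        lifecycles.append(current)
    return lifecycles


def synthetic_opening_fill(venue_qty, venue_avg_px, window_fills):
    """Price the missing quantity so the blended average equals the venue's."""
    known_qty = sum((q for q, _ in window_fills), Decimal(0))
    known_value = sum((q * p for q, p in window_fills), Decimal(0))
    missing_qty = venue_qty - known_qty
    if missing_qty == 0:
        return None                       # complete lifecycle: no adjustment
    px = (venue_qty * venue_avg_px - known_value) / missing_qty
    return missing_qty, px


def within_tolerance(calc_px, venue_px, tol=Decimal("0.0001")):
    """Relative price check using the documented default of 0.01%."""
    return abs(calc_px - venue_px) <= tol * abs(venue_px)
```

For example, if the venue reports 10 @ 500.00 and the window only captured a fill of 4 @ 503.00, the synthetic opening fill is 6 @ 498, since (6 × 498 + 4 × 503) / 10 = 500.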

For Cortana: SPY 0DTE positions are short-lived; a single trading day rarely has a position older than the lookback window. The partial-window adjustment is more relevant for restart sequences that span multiple days (overnight crash recovery), which is rare in 0DTE but can happen during a Friday-night-into-Monday-morning incident.

Shutdown semantics - does it respect open positions?

Direct answer: NO, not by default. The framework’s shutdown flow is mechanical - stop() tears down clients, persists state, and flushes writers (per nautilus-architecture.md). It does NOT intrinsically:

  • Block waiting for open positions to be closed.
  • Issue reduce-only market exits before shutdown.
  • Drain in-flight orders to a terminal state before disposing.
  • Refuse to shut down while any position has signed_qty != 0.

The concepts/live/ page is silent on shutdown semantics specifically. The architecture page states:

“These tear down clients, persist state, and flush writers.”

That’s the entire shutdown contract. There is no documented position-respecting safety gate.

Implication for feedback_no_kill_with_open_positions.md

The MK2 invariant - never kill the engine while a position is open - is NOT structurally enforced by Nautilus’s default shutdown path. We must wire it explicitly. Two options, both documented in nautilus-execution.md:

  1. Strategy.market_exit() - the supported graceful flatten path:

    • Cancels all open and in-flight orders.
    • Closes all open positions with reduce-only market orders.
    • Periodically re-checks (market_exit_interval_ms, market_exit_max_attempts).
    • Calls post_market_exit once flat or after max attempts.
    • Non-reduce-only orders are denied during exit, structurally preventing a fresh entry mid-exit.
  2. HALTED TradingState - flips the RiskEngine to refuse all new submits while still allowing cancels:

    • Submit and modify denied.
    • Cancels pass through.
    • Open positions can still be closed via reduce-only.

The Cortana MK3 shutdown sequence must therefore be:

# Pre-shutdown hook (e.g., on SIGTERM):
strategy.market_exit()                  # initiates flatten
risk_engine.set_trading_state(HALTED)   # refuses new entries
# Wait for `post_market_exit` to fire (positions are flat OR
# market_exit_max_attempts exceeded - surface to operator)
node.stop()                             # only after flat
node.dispose()

The operator (or launchd preflight) must orchestrate this sequence. The framework provides the components; it does not provide the policy. This is consistent with nautilus-execution.md’s guidance that the “alert iff event iff broker action” property is about routing, not about policy.

For feedback_no_kill_with_open_positions.md adherence:

  • Pre-flight check: cache.positions_open_count() == 0 AND cache.orders_inflight_count() == 0 before allowing node.stop().
  • If non-zero: invoke market_exit() and wait for the reduce-only flatten to complete.
  • Only after is_completely_flat() is True does the operator shut down.

The hard rule from feedback_no_kill_with_open_positions.md is preserved: position-state safety check is its own step, ABORT if count > 0 unless explicitly approved (in which case market_exit() runs first). Nautilus’s market_exit() is the safe-by-construction implementation of “explicitly approved” - but the operator still has to invoke it.
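The gate itself is small enough to sketch as a pure decision function. This is policy code Cortana would own, written here under assumptions: shutdown_decision and FakeCache are hypothetical, and the only Cache methods relied on are the two counters named in the pre-flight check above, called without filters.

```python
from typing import Protocol


class CacheLike(Protocol):
    def positions_open_count(self) -> int: ...
    def orders_inflight_count(self) -> int: ...


def shutdown_decision(cache: CacheLike, exit_attempted: bool) -> str:
    """Decide the next shutdown step without performing any side effects."""
    flat = cache.positions_open_count() == 0 and cache.orders_inflight_count() == 0
    if flat:
        return "STOP"                    # safe to call node.stop() then node.dispose()
    if not exit_attempted:
        return "MARKET_EXIT"             # invoke market_exit() and wait for post_market_exit
    return "ABORT_AND_ESCALATE"          # still not flat: surface to operator, do not stop


class FakeCache:
    """Test double standing in for the live Cache."""

    def __init__(self, open_positions: int, inflight_orders: int) -> None:
        self._open = open_positions
        self._inflight = inflight_orders

    def positions_open_count(self) -> int:
        return self._open

    def orders_inflight_count(self) -> int:
        return self._inflight
```

Keeping the decision separate from the side effects makes the invariant unit-testable: the SIGTERM handler calls shutdown_decision, acts on the result, and calls it again after post_market_exit fires.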

Watchdog patterns and healthchecks

The page does not document watchdog/healthcheck patterns explicitly. The continuous reconciliation loop is itself the liveness mechanism - if the loop is running and reconciliation keeps converging, the engine is healthy. If reconciliation starts emitting persistent discrepancies, that’s the canary.

Per nautilus-integrations.md, the IBKR adapter has its own watchdog:

“Watchdog loop monitors the socket; auto-reconnect with IB_MAX_CONNECTION_ATTEMPTS retries.”

Per nautilus-architecture.md’s production guidance:

“In production deployments, the system is typically configured with panic = abort in release builds, ensuring that any panic results in a clean process termination that can be handled by process supervisors or orchestration systems.”

So the framework’s stance is: the engine is responsible for fail-fast on invariant violations; the process supervisor (launchd/systemd/k8s) is responsible for restart. Nautilus does not ship a healthcheck endpoint or a watchdog daemon - that’s the supervisor’s job.

For Cortana MK3:

  • Replace MK2’s feedback_watchdog_to_telegram.md AI-meta watchdog with a launchd KeepAlive plist that restarts on exit.
  • Trading-event Telegrams (signals/fills/errors) come from an Audit Logger Actor subscribed to on_event (per nautilus-message-bus.md), not from a separate watchdog process.
  • Healthcheck = “is the LiveExecutionEngine in RUNNING state and not emitting persistent reconciliation discrepancies?” This can be a custom Actor that subscribes to ComponentStateChanged and to reconciliation events, and exposes status via a Redis key the dashboard reads.
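The healthcheck definition in the last bullet can be sketched as a sliding-window counter. Everything here is an assumption for illustration: ReconciliationHealth is a hypothetical class, "persistent" is approximated as more than max_discrepancies events inside window_secs, and in a real Actor the inputs would come from ComponentStateChanged and reconciliation events.

```python
from collections import deque


class ReconciliationHealth:
    """Derives HEALTHY / DEGRADED / DOWN from engine state and recent discrepancies."""

    def __init__(self, window_secs: float, max_discrepancies: int) -> None:
        self.window_secs = window_secs
        self.max_discrepancies = max_discrepancies
        self._events = deque()           # timestamps of observed discrepancies

    def record_discrepancy(self, ts: float) -> None:
        self._events.append(ts)

    def status(self, engine_state: str, now: float) -> str:
        # Drop discrepancies that have aged out of the window.
        while self._events and now - self._events[0] > self.window_secs:
            self._events.popleft()
        if engine_state != "RUNNING":
            return "DOWN"
        if len(self._events) > self.max_discrepancies:
            return "DEGRADED"            # reconciliation keeps finding drift
        return "HEALTHY"
```

The status string is what would be written to the Redis key the dashboard reads; an isolated discrepancy after a reconnect stays HEALTHY, while a sustained stream flips to DEGRADED.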

Backtest vs live differences (this page’s scope)

The page calls out the high-level posture:

“The same actors, strategies, and execution algorithms run against both the backtest engine and a live trading node. Live trading involves real financial risk.”

Differences specific to reconciliation:

Aspect | Backtest | Live
Reconciliation runs? | No - simulator owns truth on both sides | Yes - LiveExecutionEngine reconciles at startup + continuously
reconciliation=True events emitted? | No | Yes, after each reconciliation cycle that finds a discrepancy
In-flight check loop active? | No | Yes
Open-orders poll active? | No | Yes
Own-books audit active? | No | Yes
Adapter-level reconnect? | No (no adapter, simulator) | Yes

The strategy code is identical; the framework runs different machinery underneath.

Cortana MK3 implications

Direct mapping from MK2’s three live-failure modes to the Nautilus mechanisms that prevent them.

(a) Power outage state divergence (2026-05-06)

MK2 failure: power outage during a trading session; on restart, the engine’s view of position state diverged from IBKR’s because multiple parallel state stores (in-memory dicts, SQLite, Pickle blobs) all rebooted from incomplete persisted snapshots and never fully reconciled against IBKR updatePortfolio.

Nautilus prevention: structural.

  1. Cache externalized to Redis (per nautilus-cache.md) survives the process exit. The Cache rehydrates from Redis on boot.
  2. LiveExecutionEngine reconciliation then runs the three adapter calls (generate_order_status_reports, generate_fill_reports, generate_position_status_reports) against IBKR.
  3. Four invariants (position quantity, avg entry price, PnL integrity, ID determinism) hold by construction. The engine converges on broker truth before transitioning to RUNNING.
  4. If reconciliation fails, the engine refuses to start. No “boot into a half-state and hope” path exists.

The MK2 power-outage incident is structurally impossible because the cache rebuild + venue reconciliation + fail-fast-on-failure chain runs every single boot. There is no “happy path” that skips reconciliation; reconciliation IS the boot path.

(b) Workspace archive data loss (2026-04-22)

MK2 failure: gwangju workspace archive honored .gitignore, silently destroying gitignored runtime state (decisions.db, signal weights, .env, .venv). Phase 1 fix shipped same day: three-tier externalization (config in repo, secrets in ~/.config/cortanaroi/, runtime state in ~/cortanaroi-data/).

Nautilus prevention: structural for trading state, partial for secrets.

  1. Trading state is in Redis. Workspace archive cannot touch it. The Cache rebuilds from Redis on boot, regardless of what happens to the workspace directory.
  2. Reconciliation against IBKR patches any gap between the most recent Redis snapshot and the venue’s actual state.
  3. Custom data (cooldown state, scoring history) lives in the Cache too (via cache.add(key, bytes) for unstructured data or Strategy.on_save()/on_load() for typed snapshots), so it survives.
  4. What’s NOT solved by Nautilus: secrets (IBKR credentials, UW API key) still need three-tier externalization (~/.config/cortanaroi/). Nautilus doesn’t prescribe a secret store - that’s our operational responsibility.
  5. What’s NOT solved: if Redis itself runs on the same workspace volume that gets archived, we still lose state. Mitigation: run Redis on a separate host or volume, or use managed Redis with cross-zone replication.

The 2026-04-22 class is structurally addressed for trading state but operationally must be paired with a Redis topology that survives workspace archives.

(c) Launchd restart with open position (feedback_no_kill_with_open_positions.md)

MK2 caution: never kill the engine while a position is open.

Nautilus partial alignment:

  • The default node.stop() does NOT block on open positions - this is operator/policy responsibility.
  • BUT Nautilus provides the components to make this safe: Strategy.market_exit() (graceful flatten), HALTED TradingState (refuse new entries), is_completely_flat() predicate.

Cortana MK3 must wire the no-kill-with-open-positions invariant explicitly, exactly as MK2 does today:

  1. Pre-shutdown SIGTERM handler invokes market_exit() on the primary Cortana strategy.
  2. Trading state flipped to HALTED to refuse new entries.
  3. Wait for post_market_exit to fire (positions flat) or for market_exit_max_attempts to elapse.
  4. cache.positions_open_count() == 0 and cache.orders_inflight_count() == 0 are checked.
  5. If still non-flat: surface to operator (Telegram), do NOT auto-stop. This is the “ABORT and ask the user” branch from feedback_no_kill_with_open_positions.md.
  6. If flat: node.stop() then node.dispose().

This is policy code we own. Nautilus provides the primitives; Cortana provides the orchestration. The launchd preflight check that runs before the SIGTERM is sent should also enforce the invariant - exactly as MK2’s preflight does today.

Net assessment: Nautilus is safer than MK2’s default launchd behavior because the engine self-reconciles on every boot, but the “refuse to kill while open positions exist” gate is still ours to implement. The key win is that the recovery path is now structural, even if the prevention path remains operator policy.

Caveats and gotchas

  • Reconciliation events look real. Always check event.reconciliation before treating an event as fresh broker action; otherwise startup reconciliation produces alert spam (per nautilus-execution.md).
  • reconciliation_lookback_mins defaults to “max venue history”. Setting a low value to “speed up boot” is a footgun: partial window adjustment kicks in and synthesizes fills, which is correct but generates noisier reconciliation events.
  • generate_missing_orders=False is dangerous. If you turn it off, position discrepancies are logged but not auto-aligned. The engine will start with a known-divergent state. Default True is the right answer.
  • External orders are a real category. A human trade on the same paper account, or a leftover order from a previous session, becomes an EXTERNAL strategy order on boot. Your audit logger will see it; your strategy will not unless you declare external_order_claims.
  • reconciliation_startup_delay_secs=10 (default) - boot is at least 10 seconds slower than backtest because of this WebSocket-stabilization buffer. Lowering it risks duplicate fills via real-time vs reconciliation race.
  • Shutdown is NOT position-respecting by default. Wire market_exit() + HALTED explicitly.
  • No documented healthcheck endpoint. Build one as a custom Actor if needed.
  • inflight_check_threshold_ms=5000ms (default) is the wait before acting on a discrepancy - lowering it raises duplicate-fill probability.
  • The framework stops if reconciliation fails at startup (verbatim: “the system logs an error and does not start”). Operator runbook must include “reconciliation failed → check venue connectivity, check account ID, consider flatten + retry” as a top-level recovery path.
  • During market_exit(), non-reduce-only orders are denied. This is a feature: it prevents a fresh on_data entry mid-flatten. But strategy code that calls submit_order(reduce_only=False) after market_exit() will silently fail with OrderDenied. Test the path.
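The first caveat (reconciliation events look real) is the one most likely to bite the audit logger, so here is a minimal routing sketch. EventView and route_event are hypothetical names; the only property assumed of real events is the boolean reconciliation flag described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventView:
    name: str
    reconciliation: bool = False


def route_event(event) -> str:
    """Send reconciliation-synthesized events to the audit log only."""
    # getattr keeps the router safe for event types that lack the flag.
    if getattr(event, "reconciliation", False):
        return "AUDIT_ONLY"              # record it, but do not page anyone
    return "AUDIT_AND_ALERT"             # fresh broker activity: log and notify
```

Every event is still recorded, so the audit trail stays complete; only the Telegram path is suppressed for events synthesized during startup or continuous reconciliation.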

When this concept applies

  • Designing the MK3 boot/shutdown sequence (launchd plist, preflight check, SIGTERM handler).
  • Reasoning about reconnect behavior during a network partition.
  • Auditing whether MK2’s three live-failure modes (power outage, workspace archive, launchd restart with open position) survive migration.
  • Configuring LiveExecEngineConfig defaults for paper-trade bring-up.
  • Wiring an operator dashboard or healthcheck endpoint.

When it breaks / does not apply

  • The page does not enumerate LiveExecEngineConfig fields in full - refer to API reference.
  • Lifecycle methods on TradingNode (build, run, dispose, signal handling) are not on this page; see getting_started/live_trading/ and the architecture page.
  • Redis cache config is in nautilus-cache.md, not here.
  • The page is silent on multi-tenant boot orchestration; see nautilus-message-bus.md for the producer/consumer pattern.

See Also

  • Nautilus Architecture - kernel FSM, crash-only design, panic = abort production stance.
  • Nautilus Execution - RiskEngine, ExecutionClient, four reconciliation report variants, LiveExecEngineConfig knobs, race-condition handling, market_exit() mechanism.
  • Nautilus Cache - Redis externalization, CacheConfig, flush_on_start, use_instance_id, cache-then-publish.
  • Nautilus Message Bus - external Redis Streams, producer/consumer pattern for multi-tenant.
  • Nautilus Positions - Position object, OMS adjudication, NETTING vs HEDGING, reconciled positions.
  • Nautilus Integrations - IBKR adapter watchdog, IB_MAX_CONNECTION_ATTEMPTS, paper account configuration.
  • 2026-05-09 Nautilus Spike Plan: ~/conductor/workspaces/cortanaroi-mk2/belo-horizonte/plans/2026-05-09-nautilus-spike.md
  • project_data_loss_april22 - workspace archive failure class, three-tier externalization Phase 1.
  • feedback_no_kill_with_open_positions - invariant Nautilus partially supports (components yes, default policy no).
  • project_pm_ibkr_exit_invariant - broker-truth alignment that Nautilus reconciliation enforces by construction.

Timeline

2026-05-07 | Cody - Filed during pre-spike concept mastery sweep batch 3.