How-To - Run Live Trading

URL resolution (Spike Plan Step 0): The user’s prompt pasted write_rust_strategy/ as the Rust how-to URL; this was assumed to be a copy-paste typo. Probe results:

  • https://nautilustrader.io/docs/latest/how_to/run_rust_live_trading/ → HTTP 200 (the natural parallel slug; canonical).
  • https://nautilustrader.io/docs/latest/how_to/run_python_live_trading/ → HTTP 404.
  • Fallback slugs live_trading/ and running_live/ → HTTP 404.

Conclusion: as of 2026-05-07 there is no Python “Run Live Trading” how-to published in the Nautilus docs. The Python lifecycle story is split across (a) concepts/live/ (the reconciliation-centric concept page, captured in nautilus-live.md) and (b) the how_to/configure_live_trading recipe (captured at index level in nautilus-how-to.md).

This page therefore extracts the Rust how-to verbatim, then translates the lifecycle, signal-handling, and shutdown patterns to Python (Cortana’s actual deployment language) by cross-referencing nautilus-live.md, nautilus-execution.md, and nautilus-architecture.md. Every point where Python and Rust diverge, or where the Python answer is inferred rather than taken verbatim from a how-to, is called out. The Spike Plan Step 0 deliverable (“does a Python how-to exist?”) is therefore answered NO: the spike must rely on the concept guide, the configure-node how-to, and source-code examples to reconstruct the Python orchestration shape.

Core claim

The Rust how-to commits to a single shape: build a LiveNode via its builder, register data and exec client factories, register strategies, call run().await, and let tokio block until SIGINT or programmatic shutdown. The Python equivalent (from nautilus-live.md and the standard examples) has the same shape: build a TradingNode from a TradingNodeConfig, register adapters via factories, call node.trader.add_strategy(...), node.build(), then node.run(), which blocks until SIGINT or a programmatic stop. Reconciliation runs at startup; the engine refuses to start if it fails. There is no documented healthcheck endpoint, no documented watchdog daemon, and no position-respecting shutdown gate - those are operator policy on top of the framework primitives.

What “ready to trade” means (reconcile-on-startup signal)

Per nautilus-live.md: the framework does NOT emit a single named event such as ReconciliationComplete. The canonical signal is the engine transitioning to RUNNING state in the FSM (nautilus-architecture.md). Operationally:

  1. node.run() (Python) / node.run().await (Rust) is called.
  2. LiveExecutionEngine runs the three reconciliation calls against each registered venue: generate_order_status_reports, generate_fill_reports, generate_position_status_reports.
  3. The engine walks the reports through duplicate-check → order reconciliation → position reconciliation.
  4. If reconciliation fails: “the system logs an error and does not start” (verbatim from concepts/live/). The engine never reaches RUNNING. Boot aborts.
  5. If reconciliation succeeds: the engine transitions to RUNNING, strategies receive their first events, and Strategy.on_start() fires. on_start() running ≡ reconciliation succeeded ≡ ready to trade.

A reconciliation_startup_delay_secs window (default 10s) buffers WebSocket stabilization before the continuous reconciliation loops (in-flight check, open-orders poll, own-books audit) begin. Boot is therefore at least 10 seconds slower than backtest.

For Cortana: the operator dashboard / Telegram alert wiring should gate “Cortana is up” on a ComponentStateChanged event for the LiveExecutionEngine reaching RUNNING, OR equivalently on the strategy’s on_start() executing. Anything earlier is premature.
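A minimal sketch of that gate, assuming the AuditLoggerActor forwards each ComponentStateChanged to a plain Python object as a (component, state) pair. StateChange, ReadinessGate, and the "LiveExecEngine" component name are hypothetical stand-ins, not Nautilus types; the only point is that readiness flips on RUNNING and on nothing earlier. Treating DEGRADED, STOPPING, STOPPED, and FAULTED as "not up" is a Cortana policy choice made in this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class StateChange:
    """Hypothetical stand-in for a ComponentStateChanged payload."""

    component: str
    state: str


# Engine states that mean Cortana is no longer "up" (policy choice).
NOT_UP_STATES = {"STOPPING", "STOPPED", "DEGRADED", "FAULTED"}


@dataclass
class ReadinessGate:
    """Reports "up" only after the execution engine reaches RUNNING."""

    engine_component: str = "LiveExecEngine"  # assumed name; check your version
    ready: bool = False
    history: List[StateChange] = field(default_factory=list)

    def on_state_change(self, event: StateChange) -> bool:
        self.history.append(event)
        if event.component != self.engine_component:
            return self.ready  # other components never flip readiness
        if event.state == "RUNNING":
            self.ready = True  # RUNNING implies reconciliation succeeded
        elif event.state in NOT_UP_STATES:
            self.ready = False
        return self.ready
```

A STARTING event, or a RUNNING event from some other component, leaves the gate closed, which is exactly the "anything earlier is premature" rule.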

Rust how-to - verbatim shape

The Rust how-to (the only published one) walks through an OKX example. Salient verbatim pieces:

Dependencies (Cargo.toml):

[dependencies]
nautilus-common  = "0.55"
nautilus-live    = "0.55"
nautilus-model   = "0.55"
nautilus-okx     = "0.55"
nautilus-trading = { version = "0.55", features = ["examples"] }
 
anyhow   = "1"
dotenvy  = "0.15"
log      = "0.4"
tokio    = { version = "1", features = ["full"] }

Build the node (builder pattern):

use log::LevelFilter;
use nautilus_common::{enums::Environment, logging::logger::LoggerConfig};
use nautilus_live::node::LiveNode;
use nautilus_model::identifiers::{AccountId, TraderId};
use nautilus_okx::{
    common::enums::OKXInstrumentType,
    config::{OKXDataClientConfig, OKXExecClientConfig},
    factories::{OKXDataClientFactory, OKXExecutionClientFactory},
};
 
let trader_id  = TraderId::from("TESTER-001");
let account_id = AccountId::from("OKX-001");
 
let data_config = OKXDataClientConfig {
    instrument_types: vec![OKXInstrumentType::Swap],
    ..Default::default()
};
 
let exec_config = OKXExecClientConfig {
    trader_id,
    account_id,
    instrument_types: vec![OKXInstrumentType::Swap],
    ..Default::default()
};
 
let log_config = LoggerConfig {
    stdout_level: LevelFilter::Info,
    ..Default::default()
};
 
let mut node = LiveNode::builder(trader_id, Environment::Live)?
    .with_name("MY-NODE-001".to_string())
    .with_logging(log_config)
    .add_data_client(
        None,
        Box::new(OKXDataClientFactory::new()),
        Box::new(data_config),
    )?
    .add_exec_client(
        None,
        Box::new(OKXExecutionClientFactory::new()),
        Box::new(exec_config),
    )?
    .with_reconciliation(false)        // simplified; enable in production
    .with_delay_post_stop_secs(5)
    .build()?;

The doc explicitly warns: “This example disables reconciliation for simplicity. In production, remove .with_reconciliation(false) so the engine aligns cached state with the venue on startup.” For Cortana, never set with_reconciliation(false) - the entire reason to migrate is the structural reconciliation guarantee.

Add strategies and run:

use nautilus_model::{identifiers::InstrumentId, types::Quantity};
use nautilus_trading::examples::strategies::{
    GridMarketMaker, GridMarketMakerConfig,
};
 
let mut config = GridMarketMakerConfig::new(
    InstrumentId::from("ETH-USDT-SWAP.OKX"),
    Quantity::from("0.10"),
)
    .with_num_levels(3)
    .with_grid_step_bps(100)
    .with_skew_factor(0.5)
    .with_requote_threshold_bps(10)
    .with_expire_time_secs(8)
    .with_on_cancel_resubmit(true);
 
config.base.use_hyphens_in_client_order_ids = false; // OKX-specific
 
let strategy = GridMarketMaker::new(config);
 
node.add_strategy(strategy)?;
node.run().await?;

“The node runs until interrupted (Ctrl+C) or shut down programmatically.”

Async runtime requirement (Rust):

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    dotenvy::dotenv().ok();
    // ... node setup ...
    node.run().await?;
    Ok(())
}

Adapter examples directory (verbatim list - Cortana cares about the IBKR equivalent, but it’s not in this list because IBKR is Python-only as of 0.55):

Adapter       | Example directory
Architect AX  | crates/adapters/architect_ax/examples/
Betfair       | crates/adapters/betfair/examples/
Binance       | crates/adapters/binance/examples/
BitMEX        | crates/adapters/bitmex/examples/
Bybit         | crates/adapters/bybit/examples/
Databento     | crates/adapters/databento/examples/
Deribit       | crates/adapters/deribit/examples/
dYdX          | crates/adapters/dydx/examples/
Hyperliquid   | crates/adapters/hyperliquid/examples/
Kraken        | crates/adapters/kraken/examples/
OKX           | crates/adapters/okx/examples/
Polymarket    | crates/adapters/polymarket/examples/
Sandbox       | crates/adapters/sandbox/examples/
Tardis        | crates/adapters/tardis/examples/

IBKR is conspicuously absent from this list - meaning the Rust how-to is not the right reference for Cortana’s IBKR deployment. The IBKR adapter is Python (per nautilus-ib.md), which is precisely why the missing Python how-to matters.

Python lifecycle (translated; not verbatim from a how-to)

Compiled from nautilus-live.md + nautilus-architecture.md + nautilus-execution.md:

Phase     | Method                                                    | What happens
Construct | TradingNode(config=...)                                   | Wires config; no I/O
Wire      | node.trader.add_strategy(...), node.trader.add_actor(...) | Registers strategy + actor instances
Build     | node.build()                                              | Resolves adapters; registers clients via factories
Run       | node.run()                                                | Blocking; performs reconciliation; transitions STARTING → RUNNING; dispatches events; SIGINT/SIGTERM trigger the graceful path
Stop      | node.stop()                                               | STOPPING; tears down clients; persists state; flushes writers; reaches STOPPED
Dispose   | node.dispose()                                            | DISPOSING → DISPOSED; releases resources
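The table’s ordering can be checked mechanically. A small sketch, assuming Cortana’s own entry point records each phase as a plain string; follows_lifecycle and PHASE_ORDER are hypothetical names, and Nautilus itself does not expose such a checker. It allows repeated wire calls (one per strategy or actor) and skipped phases, because run() can end on a signal and go straight to dispose() as in the standard try/finally pattern.

```python
from typing import Dict, Iterable

# Rank of each phase in the lifecycle table; later phases have higher ranks.
PHASE_ORDER: Dict[str, int] = {
    "construct": 0,
    "wire": 1,
    "build": 2,
    "run": 3,
    "stop": 4,
    "dispose": 5,
}


def follows_lifecycle(calls: Iterable[str]) -> bool:
    """True if calls never move backwards through the table.

    Only "wire" may repeat (one call per strategy or actor); phases may be
    skipped, e.g. run() ending on a signal followed directly by dispose().
    Raises KeyError for an unknown phase name.
    """
    last = -1
    for call in calls:
        rank = PHASE_ORDER[call]
        if rank < last or (rank == last and call != "wire"):
            return False
        last = rank
    return True
```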

Async runtime (Python)

Unlike Rust’s #[tokio::main], Python’s TradingNode.run() manages its own asyncio loop internally. The standard pattern is synchronous-looking from __main__:

node = TradingNode(config=config)
node.trader.add_strategy(strategy)
node.build()
try:
    node.run()       # blocks; manages asyncio loop internally
finally:
    node.dispose()   # release resources even if run() raises

The framework installs SIGINT and SIGTERM handlers internally during run(). On signal, the handler triggers the documented graceful path: STOPPING → STOPPED. It does NOT, by default, flatten open positions or refuse to stop. That is the Cortana policy gate (next section).

Graceful shutdown - THE canonical sequence Cortana must implement

This is the most load-bearing section. Per nautilus-live.md: the framework’s default shutdown is NOT position-respecting. node.stop() tears down clients, persists state, flushes writers - that’s the entire shutdown contract. There is no built-in “refuse to stop while positions are open” gate.

The MK2 invariant (feedback_no_kill_with_open_positions.md) - never kill the engine while a position is open - must therefore be wired explicitly using framework primitives. The framework provides three building blocks:

  1. Strategy.market_exit() - graceful flatten: cancels open and in-flight orders, closes positions with reduce-only market orders, periodically re-checks (market_exit_interval_ms, market_exit_max_attempts), and calls post_market_exit once flat or after max attempts. Non-reduce-only orders are denied during exit (per nautilus-execution.md).
  2. HALTED TradingState - flips the RiskEngine to refuse all new submits/modifies; cancels still pass; reduce-only closes still pass.
  3. Cache predicates - cache.positions_open_count(), cache.orders_inflight_count(), cache.is_completely_flat() (per nautilus-cache.md / nautilus-execution.md).
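The HALTED rule set in item 2 is small enough to state as a truth table. The sketch below models it in plain Python so the intended behavior is testable; Command and halted_allows are hypothetical names, and the real check lives inside Nautilus’s RiskEngine, not in user code.

```python
from enum import Enum


class Command(Enum):
    """Hypothetical command kinds; the real check lives in Nautilus's RiskEngine."""

    SUBMIT = "submit"
    MODIFY = "modify"
    CANCEL = "cancel"


def halted_allows(command: Command, reduce_only: bool = False) -> bool:
    """Model of the HALTED rules in item 2 above."""
    if command is Command.CANCEL:
        return True  # cancels always pass
    if command is Command.SUBMIT and reduce_only:
        return True  # reduce-only closes still pass
    return False  # new entries and all modifies are refused
```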

The canonical Cortana shutdown sequence (must implement)

1. SIGTERM received (from launchd / docker stop / operator).
2. Flip TradingState → HALTED         (refuses new entries structurally).
3. Call strategy.market_exit()        (initiates reduce-only flatten).
4. Wait for post_market_exit hook
   OR market_exit_max_attempts elapsed.
5. Verify cache.positions_open_count() == 0
        AND cache.orders_inflight_count() == 0.
6. If still non-flat:
      - Telegram operator with position detail.
      - DO NOT auto-stop the node.
      - Wait for explicit operator approval (the
        `feedback_no_kill_with_open_positions.md` ABORT branch).
7. If flat:
      - node.stop()
      - node.dispose()
      - process exits cleanly.
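Steps 4-7 reduce to a polling loop with a deadline. The sketch below is a testable version of that loop with the cache predicates and the clock injected, assuming counts() wraps cache.positions_open_count() and cache.orders_inflight_count(). wait_for_flat is a hypothetical helper; "ABORT" means leave the node running and page the operator, never stop it.

```python
from typing import Callable, Tuple


def wait_for_flat(
    counts: Callable[[], Tuple[int, int]],
    max_wait_secs: float,
    clock: Callable[[], float],
    sleep: Callable[[float], None],
    poll_secs: float = 1.0,
) -> str:
    """Poll (open_positions, inflight_orders) until both are zero or the
    deadline passes. Returns "STOP" when flat, "ABORT" otherwise."""
    deadline = clock() + max_wait_secs
    while clock() < deadline:
        open_positions, inflight = counts()
        if open_positions == 0 and inflight == 0:
            return "STOP"  # step 7: safe to call node.stop() / dispose()
        sleep(poll_secs)
    # Final re-check so a flatten that lands right at the deadline is not aborted.
    open_positions, inflight = counts()
    return "STOP" if open_positions == 0 and inflight == 0 else "ABORT"
```

Injecting the clock and sleep keeps the loop deterministic under test; in production they would be time.monotonic and time.sleep.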

This is policy code Cortana owns; the framework supplies the primitives but not the orchestration. The launchd preflight check that runs before SIGTERM is sent should also enforce the invariant - exactly as MK2’s preflight does today.

Cortana orchestration script - cortana_mk3/__main__.py

Complete __main__.py showing TradingNode build + start + signal handling for graceful, position-respecting shutdown. Python preferred per the prompt; Rust fallback follows below.

"""
cortana_mk3/__main__.py
 
Entry point for the Cortana MK3 paper deployment. Same shape for
live - flip Environment + IB Gateway port + account ID.
 
Run:
    python -m cortana_mk3
"""
import logging
import signal
import sys
import threading
import time
 
from nautilus_trader.adapters.interactive_brokers.config import (
    IBMarketDataTypeEnum,
    InteractiveBrokersDataClientConfig,
    InteractiveBrokersExecClientConfig,
)
from nautilus_trader.adapters.interactive_brokers.factories import (
    InteractiveBrokersLiveDataClientFactory,
    InteractiveBrokersLiveExecClientFactory,
)
from nautilus_trader.config import (
    CacheConfig,
    LiveDataEngineConfig,
    LiveExecEngineConfig,
    LoggingConfig,
    MessageBusConfig,
    RiskEngineConfig,
    TradingNodeConfig,
)
from nautilus_trader.live.node import TradingNode
from nautilus_trader.model.enums import TradingState
from nautilus_trader.model.identifiers import StrategyId, TraderId
 
from cortana_mk3.config import load_settings
from cortana_mk3.strategies.cortana import CortanaStrategy, CortanaConfig
from cortana_mk3.actors.audit_logger import AuditLoggerActor
from cortana_mk3.alerts.telegram import telegram_alert
 
 
log = logging.getLogger("cortana_mk3")
 
 
def build_node(settings) -> TradingNode:
    """Build a configured TradingNode for Cortana paper or live."""
 
    # ----- Cache: Redis-externalized so workspace archive can't kill state. -----
    cache = CacheConfig(
        database=settings.cache_database_config,   # Redis URL + auth
        encoding="msgpack",
        flush_on_start=False,                       # NEVER flush in prod
        use_instance_id=False,                      # keys omit instance_id, so state survives restarts
    )
 
    # ----- Message bus: external Redis Streams for audit + dashboard. -----
    msg_bus = MessageBusConfig(
        database=settings.bus_database_config,
        encoding="msgpack",
        timestamps_as_iso8601=True,
        use_instance_id=False,
        types_filter=None,
        autotrim_mins=60,
    )
 
    # ----- Live execution engine: reconciliation MUST stay on. -----
    exec_engine = LiveExecEngineConfig(
        reconciliation=True,                                  # mandatory
        reconciliation_lookback_mins=None,                    # max venue history
        generate_missing_orders=True,                         # auto-align positions
        inflight_check_interval_ms=2_000,
        inflight_check_threshold_ms=5_000,
        inflight_check_retries=5,
        open_check_interval_secs=5.0,
        open_check_threshold_ms=5_000,
        own_books_audit_interval_secs=10.0,
        reconciliation_startup_delay_secs=10,                 # WebSocket settle
        allow_overfills=False,
        # external_order_claims is a strategy-level setting: see CortanaConfig below.
    )
 
    risk_engine = RiskEngineConfig(
        bypass=False,
        max_order_submit_rate="100/00:00:01",
        max_order_modify_rate="100/00:00:01",
        # Keyed by instrument ID (not currency); value is the max notional per order.
        max_notional_per_order={str(settings.spy_0dte_instrument_id): 50_000},
    )
 
    # ----- IBKR data + exec clients (Dockerized Gateway). -----
    ib_data = InteractiveBrokersDataClientConfig(
        ibg_host=settings.ibg_host,                           # "ib-gateway" docker svc
        ibg_port=settings.ibg_port,                           # 4002 paper / 4001 live
        ibg_client_id=1,
        use_regular_trading_hours=False,
        market_data_type=(
            IBMarketDataTypeEnum.DELAYED_FROZEN if settings.is_paper
            else IBMarketDataTypeEnum.REALTIME
        ),
    )
    ib_exec = InteractiveBrokersExecClientConfig(
        ibg_host=settings.ibg_host,
        ibg_port=settings.ibg_port,
        ibg_client_id=1,
        account_id=settings.ibg_account_id,                   # DU* paper / U* live
    )
 
    # ----- TradingNodeConfig wraps it all. -----
    config = TradingNodeConfig(
        # environment is left at the TradingNodeConfig default (LIVE);
        # IB paper trading runs in the LIVE environment too.
        trader_id=TraderId(f"CORTANA-{'PAPER' if settings.is_paper else 'LIVE'}"),
        instance_id=None,
        cache=cache,
        message_bus=msg_bus,
        data_engine=LiveDataEngineConfig(),
        exec_engine=exec_engine,
        risk_engine=risk_engine,
        logging=LoggingConfig(
            log_level="INFO",
            log_level_file="DEBUG",
            log_directory=str(settings.log_dir),
            log_file_format="json",
            log_colors=True,
        ),
        data_clients={"IB": ib_data},
        exec_clients={"IB": ib_exec},
        timeout_connection=30.0,
        timeout_reconciliation=20.0,
        timeout_portfolio=20.0,
        timeout_disconnection=10.0,
        timeout_post_stop=5.0,
    )
 
    node = TradingNode(config=config)
    node.add_data_client_factory("IB", InteractiveBrokersLiveDataClientFactory)
    node.add_exec_client_factory("IB", InteractiveBrokersLiveExecClientFactory)
 
    # ----- Strategy + Audit Actor. -----
    strategy = CortanaStrategy(
        config=CortanaConfig(
            strategy_id=StrategyId("CORTANA-001"),
            instrument_id=settings.spy_0dte_instrument_id,
            # Assumes CortanaConfig subclasses StrategyConfig: adopt pre-existing
            # SPY 0DTE orders/positions for this strategy during reconciliation.
            external_order_claims=[settings.spy_0dte_instrument_id],
            tp_pct=0.10,
            sl_pct=0.20,
            market_exit_interval_ms=2_000,
            market_exit_max_attempts=10,
        ),
    )
    audit = AuditLoggerActor(
        config_path=settings.audit_config_path,
        telegram_token=settings.telegram_token,
        telegram_chat=settings.telegram_chat,
    )
 
    node.trader.add_strategy(strategy)
    node.trader.add_actor(audit)
 
    return node
 
 
# ---------------------------------------------------------------------------
# Graceful shutdown orchestration - the load-bearing piece.
# ---------------------------------------------------------------------------
 
class GracefulShutdown:
    """
    Implements `feedback_no_kill_with_open_positions.md` on top of
    Nautilus primitives. Framework provides market_exit + HALTED;
    we provide the orchestration + the ABORT branch.
    """
 
    def __init__(self, node: TradingNode, max_wait_secs: float = 60.0) -> None:
        self.node = node
        self.max_wait_secs = max_wait_secs
        self._shutting_down = threading.Event()
 
    def install(self) -> None:
        signal.signal(signal.SIGTERM, self._on_signal)
        signal.signal(signal.SIGINT, self._on_signal)
 
    def _on_signal(self, signum, frame) -> None:
        if self._shutting_down.is_set():
            log.warning("Second signal received - letting framework finish.")
            return
        self._shutting_down.set()
        log.info("Signal %s received - beginning Cortana graceful shutdown.", signum)
        threading.Thread(target=self._run, daemon=True).start()
 
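    def _run(self) -> None:
        # Runs on a worker thread while the main thread sits in node.run().
        # Nautilus components are built around a single-threaded event loop;
        # a hardened version should marshal the calls below onto the node's
        # loop (e.g. loop.call_soon_threadsafe). Direct calls keep this readable.
        cache = self.node.kernel.cache
        risk_engine = self.node.kernel.risk_engine
        trader = self.node.trader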
 
        # Step 1 - Flip TradingState to HALTED. Refuses new entries structurally.
        try:
            risk_engine.set_trading_state(TradingState.HALTED)
            log.info("TradingState → HALTED.")
            telegram_alert("Cortana: HALTED - flatten initiated.")
        except Exception as e:
            log.exception("Failed to set HALTED: %s", e)
 
        # Step 2 - Initiate market_exit on every strategy.
        for strategy in trader.strategies():
            try:
                strategy.market_exit()
                log.info("market_exit() invoked on %s.", strategy.id)
            except Exception as e:
                log.exception("market_exit failed on %s: %s", strategy.id, e)
 
        # Step 3 - Wait for flat OR timeout.
        deadline = time.monotonic() + self.max_wait_secs
        while time.monotonic() < deadline:
            open_positions = cache.positions_open_count()
            inflight = cache.orders_inflight_count()
            if open_positions == 0 and inflight == 0:
                log.info("Flat. Proceeding to node.stop().")
                break
            log.info(
                "Waiting for flat - open=%d inflight=%d",
                open_positions, inflight,
            )
            time.sleep(1.0)
        else:
            # Step 4 - ABORT branch. Surface to operator; do NOT auto-stop.
            open_positions = cache.positions_open_count()
            inflight = cache.orders_inflight_count()
            telegram_alert(
                f"Cortana: SHUTDOWN ABORTED - {open_positions} open, "
                f"{inflight} in-flight. Manual intervention required."
            )
            log.error(
                "ABORT - open=%d inflight=%d - refusing to call node.stop().",
                open_positions, inflight,
            )
            return  # Process stays up; operator decides.
 
        # Step 5 - Clean stop + dispose.
        try:
            self.node.stop()
            log.info("node.stop() complete.")
        except Exception as e:
            log.exception("node.stop() raised: %s", e)
        finally:
            try:
                self.node.dispose()
            except Exception as e:
                log.exception("node.dispose() raised: %s", e)
            telegram_alert("Cortana: stopped cleanly.")
 
 
def main() -> int:
    settings = load_settings()
    logging.basicConfig(level=logging.INFO)
 
    node = build_node(settings)
    shutdown = GracefulShutdown(node, max_wait_secs=settings.shutdown_max_wait_secs)
    shutdown.install()
 
    try:
        node.build()
        # Pre-flight: verify Redis reachable, IB Gateway healthy.
        # If either fails, abort BEFORE node.run() so we never reach RUNNING.
        node.run()                   # blocks; our handler and the framework's
                                     # own signal handler may both fire (see note below).
    except Exception as e:
        log.exception("Fatal error during run(): %s", e)
        telegram_alert(f"Cortana: FATAL - {e}")
        return 2
    return 0
 
 
if __name__ == "__main__":
    sys.exit(main())

Note on the signal-handler thread. The Cortana shutdown orchestration runs in a daemon thread because the main thread is inside node.run(), blocking on the asyncio loop. The framework’s own SIGINT/SIGTERM handler will also fire and begin its tear-down; our job is to ensure the position-respecting steps complete first. In practice we either (a) intercept SIGTERM before the framework sees it, run market_exit, then hand off to the framework’s handler; or (b) move the flatten into the strategy’s own stop path. The clearest pattern is to call market_exit() in the strategy’s on_stop() and rely on the framework’s documented contract that on_stop() runs before disconnection, while still wiring the HALTED flip and the position-count gate explicitly.

Rust fallback orchestration (since Python how-to is 404)

If/when Cortana migrates a hot path to Rust (per nautilus-rust.md the engine itself is already Rust under the hood), the equivalent shape is:

use tokio::signal;
 
#[tokio::main]
async fn main() -> anyhow::Result<()> {
    dotenvy::dotenv().ok();
 
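    // build_node() and strategy are placeholders for the builder and
    // strategy construction shown in the Rust how-to above.
    let mut node = build_node()?;        // builder per Rust how-to
    node.add_strategy(strategy)?;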
 
    // Rust's run().await blocks until SIGINT; for graceful policy use
    // tokio::select! with a manual signal future + framework run().
    tokio::select! {
        res = node.run() => {
            res?;
        }
        _ = signal::ctrl_c() => {
            // Implement graceful: HALTED + market_exit + wait + stop.
            // (The Rust API surface is parallel to Python; cross-ref
            //  `nautilus-rust.md` for crate-level method names once
            //  the Rust IBKR adapter ships.)
            graceful_shutdown(&node).await?;   // hypothetical Cortana helper, not a framework API
        }
    }
 
    Ok(())
}

The Rust how-to’s with_delay_post_stop_secs(5) is the Rust counterpart to Python’s timeout_post_stop=5.0 - a fixed-duration wait after stop() completes, before final teardown. It is not a position-respecting wait - it’s a time wait. The position check is still ours.

Healthchecks and watchdog patterns

Per nautilus-live.md: the framework ships no healthcheck endpoint and no watchdog daemon. The continuous reconciliation loop is itself the liveness mechanism - if reconciliation keeps converging, the engine is healthy. If it starts emitting persistent discrepancies, that’s the canary.

Cortana’s healthcheck strategy:

  1. Process supervisor (launchd / Docker / systemd) restarts the process on exit. This is the framework’s documented stance: “any panic results in a clean process termination that can be handled by process supervisors or orchestration systems” (per nautilus-architecture.md’s panic = abort posture).
  2. AuditLoggerActor subscribes to ComponentStateChanged events and emits Telegram alerts on every transition (STARTING, RUNNING, STOPPING, STOPPED, DEGRADED, FAULTED). This wires feedback_watchdog_to_telegram.md correctly: trading-event Telegrams flow from the Actor on the message bus, not from a separate watchdog process.
  3. External healthcheck endpoint (custom Actor): exposes a Redis key like cortana:health:last_heartbeat updated every N seconds while engine is RUNNING. The dashboard (or a simple cron) reads the key and alerts if stale.
  4. IB Gateway healthcheck integrates with the Docker Compose health status - see nautilus-ib.md for the Dockerized Gateway pattern. Cortana’s compose file should make the cortana service depend on the gateway container with condition: service_healthy.
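Item 3 can be sketched as two small functions: a writer the heartbeat Actor would call on a timer, and a staleness check for the dashboard or cron side. A plain dict stands in for Redis here (with a real client, swap in its set/get calls); write_heartbeat, is_stale, and the JSON payload shape are hypothetical, and only the key name comes from the text above.

```python
import json
import time
from typing import MutableMapping, Optional

HEARTBEAT_KEY = "cortana:health:last_heartbeat"


def write_heartbeat(
    store: MutableMapping[str, str],
    engine_state: str,
    now: Optional[float] = None,
) -> bool:
    """Write the heartbeat only while RUNNING; return True if written.

    Skipping writes in any other state lets the key go stale, which is
    exactly what the external checker alerts on.
    """
    if engine_state != "RUNNING":
        return False
    ts = time.time() if now is None else now
    store[HEARTBEAT_KEY] = json.dumps({"ts": ts, "state": engine_state})
    return True


def is_stale(
    store: MutableMapping[str, str],
    max_age_secs: float,
    now: Optional[float] = None,
) -> bool:
    """True if the heartbeat is missing or older than max_age_secs."""
    raw = store.get(HEARTBEAT_KEY)
    if raw is None:
        return True  # never written: treat as down
    current = time.time() if now is None else now
    return current - json.loads(raw)["ts"] > max_age_secs
```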

Recovery after crash

Per nautilus-architecture.md (crash-only design): startup and crash recovery share the same code path. The LiveNode / TradingNode always boots through reconciliation. There is no separate “recovery mode.”

Recovery sequence after an unclean exit:

  1. Process supervisor restarts the binary (launchd KeepAlive, docker restart=always, systemd Restart=on-failure).
  2. node.run() boots normally.
  3. Cache rehydrates from Redis (state survives the process exit because we externalized the Cache per nautilus-cache.md).
  4. LiveExecutionEngine reconciles against IBKR using the three adapter calls.
  5. Any orders or positions that opened during the gap are detected; missing fills synthesized; missing alignment orders generated as EXTERNAL + tag RECONCILIATION.
  6. Strategy’s external_order_claims adopt pre-existing positions for the strategy’s instrument set - Cortana’s SPY 0DTE positions are reattached to CortanaStrategy.
  7. Strategy.on_start() fires; strategy is back in business.

The four reconciliation invariants - position quantity, average entry price, PnL integrity, ID determinism - hold across this recovery (per nautilus-live.md). Replay-safe restart is structural, not lucky.
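Step 6 of the recovery sequence can be pictured as a lookup from instrument to claiming strategy. A simplified model, assuming claims are configured per strategy as with StrategyConfig.external_order_claims; the instrument strings are placeholders, build_claim_index and assign_owners are hypothetical, and rejecting a doubly claimed instrument is this sketch’s choice rather than documented framework behavior.

```python
from typing import Dict, Iterable, List

EXTERNAL = "EXTERNAL"


def build_claim_index(claims_by_strategy: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert {strategy_id: [instrument_id, ...]} into {instrument_id: strategy_id}."""
    index: Dict[str, str] = {}
    for strategy_id, instruments in claims_by_strategy.items():
        for instrument_id in instruments:
            if instrument_id in index:
                raise ValueError(
                    f"{instrument_id} claimed by both {index[instrument_id]} and {strategy_id}"
                )
            index[instrument_id] = strategy_id
    return index


def assign_owners(
    reported_instruments: Iterable[str],
    claims_by_strategy: Dict[str, List[str]],
) -> Dict[str, str]:
    """Attach each venue-reported instrument to its claiming strategy, else EXTERNAL."""
    index = build_claim_index(claims_by_strategy)
    return {i: index.get(i, EXTERNAL) for i in reported_instruments}
```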

Cortana MK3 implications

(a) Launchd plist replacement

MK2’s ~/Library/LaunchAgents/com.cortanaroi.engine.plist becomes either:

Option A - Native launchd, Python entry point. Simple, same shape as today:

<key>ProgramArguments</key>
<array>
    <string>/Users/codysmith/.venvs/cortana_mk3/bin/python</string>
    <string>-m</string>
    <string>cortana_mk3</string>
</array>
<key>RunAtLoad</key><true/>
<key>KeepAlive</key>
<dict>
    <key>SuccessfulExit</key><false/>
</dict>
<key>WorkingDirectory</key>
<string>/Users/codysmith/cortanaroi-data</string>
<key>EnvironmentVariables</key>
<dict>
    <key>CORTANAROI_ENV</key><string>paper</string>
</dict>

Pre-flight check (feedback_no_kill_with_open_positions.md) runs as a separate launchctl bootout gate - same script as MK2.
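A minimal sketch of that bootout gate, assuming something (for example the audit Actor) periodically writes a JSON snapshot with positions_open and orders_inflight taken from the cache predicates. The snapshot file, its keys, and both functions are hypothetical. The gate fails closed: a missing or malformed snapshot blocks bootout.

```python
import json
import sys
from pathlib import Path
from typing import Tuple


def preflight_ok(snapshot: object) -> Tuple[bool, str]:
    """Decide whether bootout is allowed from a {positions_open, orders_inflight} snapshot.

    Unknown state is treated as "possibly not flat".
    """
    try:
        open_positions = int(snapshot["positions_open"])  # type: ignore[index]
        inflight = int(snapshot["orders_inflight"])  # type: ignore[index]
    except (KeyError, TypeError, ValueError):
        return False, "snapshot unreadable - refusing bootout"
    if open_positions or inflight:
        return False, f"{open_positions} open, {inflight} in-flight - refusing bootout"
    return True, "flat - bootout allowed"


def main(snapshot_path: str) -> int:
    """Exit code for the gate: 0 allows bootout, 1 blocks it."""
    try:
        snapshot = json.loads(Path(snapshot_path).read_text())
    except (OSError, json.JSONDecodeError):
        print("snapshot missing or invalid - refusing bootout", file=sys.stderr)
        return 1
    ok, reason = preflight_ok(snapshot)
    print(reason, file=sys.stdout if ok else sys.stderr)
    return 0 if ok else 1
```

Wrapped in a one-line script that calls sys.exit(main(sys.argv[1])), it chains as `preflight.py snapshot.json && launchctl bootout gui/$(id -u)/com.cortanaroi.engine` (the label is assumed to match the plist filename).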

Option B - Dockerized Cortana + Dockerized IB Gateway. Per nautilus-ib.md (Dockerized Gateway, port 4002 paper / 4001 live). Compose:

services:
  ib-gateway:
    image: ghcr.io/gnzsnz/ib-gateway:latest
    environment:
      TRADING_MODE: paper
      TWS_USERID: ${TWS_USERID}
      TWS_PASSWORD: ${TWS_PASSWORD}
    ports: ["4002:4002"]
    healthcheck:
      test: ["CMD", "nc", "-z", "localhost", "4002"]
      interval: 30s
      timeout: 5s
      retries: 3
 
  redis:
    image: redis:7-alpine
    volumes: ["./redis-data:/data"]
    command: redis-server --appendonly yes
 
  cortana:
    build: .
    depends_on:
      ib-gateway:
        condition: service_healthy
      redis:
        condition: service_started
    environment:
      CORTANAROI_ENV: paper
      IBG_HOST: ib-gateway
      IBG_PORT: 4002
      REDIS_URL: redis://redis:6379/0
    restart: unless-stopped
    stop_grace_period: 90s     # > shutdown_max_wait_secs (60s) + buffer
    stop_signal: SIGTERM

stop_grace_period: 90s is the critical knob: it gives the GracefulShutdown orchestration time to flatten before Docker SIGKILLs. Set it longer than your market_exit_max_attempts × market_exit_interval_ms budget plus a safety margin.
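A rough budget check, assuming the shutdown phases run back to back: the flatten lasts at most the longer of the GracefulShutdown deadline and the market_exit retry budget, then node.stop() spends up to timeout_disconnection plus the fixed timeout_post_stop. required_grace_secs and the 10 s default margin are hypothetical; the inputs are the values used in the script above.

```python
def required_grace_secs(
    shutdown_max_wait_secs: float,
    market_exit_interval_ms: int,
    market_exit_max_attempts: int,
    timeout_post_stop: float,
    timeout_disconnection: float,
    margin_secs: float = 10.0,
) -> float:
    """Lower bound for Docker's stop_grace_period, in seconds."""
    # Framework retry budget for market_exit(), converted to seconds.
    exit_budget = market_exit_interval_ms * market_exit_max_attempts / 1000.0
    # The flatten phase ends at whichever limit is longer.
    flatten = max(shutdown_max_wait_secs, exit_budget)
    # After node.stop(): disconnection timeout, then the fixed post-stop wait.
    return flatten + timeout_disconnection + timeout_post_stop + margin_secs
```

With the script’s values (60 s wait, 2,000 ms × 10 attempts, 5 s post-stop, 10 s disconnection) this gives 85 s, which the 90 s grace period covers.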

(b) Telegram alert wiring on TradingState transitions

AuditLoggerActor subscribes to ComponentStateChanged events on the message bus and emits a Telegram on each transition. Wire:

Transition          | Telegram
STARTING → RUNNING  | “Cortana: up, reconciliation OK.”
RUNNING → DEGRADED  | “Cortana: DEGRADED - investigate.”
RUNNING → STOPPING  | “Cortana: STOPPING (graceful).”
STOPPING → STOPPED  | “Cortana: stopped cleanly.”
any → FAULTED       | “Cortana: FAULTED - process will exit.”
HALTED flip         | “Cortana: HALTED - no new entries.”

This honors feedback_watchdog_to_telegram.md (Telegram is for trading events, not AI-meta watchdog noise).
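The table translates directly into a lookup the AuditLoggerActor could consult. A sketch, assuming state names arrive as plain strings; the dictionary and helper names are hypothetical, and transitions not in the table map to None (no alert), which keeps the channel limited to the listed trading events.

```python
from typing import Dict, Optional, Tuple

# Component-state transitions from the table above -> Telegram text.
TELEGRAM_BY_TRANSITION: Dict[Tuple[str, str], str] = {
    ("STARTING", "RUNNING"): "Cortana: up, reconciliation OK.",
    ("RUNNING", "DEGRADED"): "Cortana: DEGRADED - investigate.",
    ("RUNNING", "STOPPING"): "Cortana: STOPPING (graceful).",
    ("STOPPING", "STOPPED"): "Cortana: stopped cleanly.",
}
FAULTED_MESSAGE = "Cortana: FAULTED - process will exit."
# HALTED is a TradingState flip, not a component state, so it gets its own hook.
HALTED_MESSAGE = "Cortana: HALTED - no new entries."


def telegram_for(previous: str, current: str) -> Optional[str]:
    """Text to send for a component transition, or None for "no alert"."""
    if current == "FAULTED":
        return FAULTED_MESSAGE  # "any -> FAULTED"
    return TELEGRAM_BY_TRANSITION.get((previous, current))


def telegram_for_trading_state(state: str) -> Optional[str]:
    """Text to send when the TradingState changes, or None."""
    return HALTED_MESSAGE if state == "HALTED" else None
```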

(c) Dockerized IB Gateway healthcheck integration

Cortana’s cortana container depends_on: ib-gateway: condition: service_healthy. If the gateway is unhealthy at boot, cortana doesn’t start at all - fail-fast at the orchestration layer, mirroring Nautilus’s fail-fast at the reconciliation layer. The combination: gateway healthy → TradingNode boots → reconciliation succeeds → strategies running. Any link in the chain breaks → process exits → docker restarts → chain re-runs from the top.
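For the native-launchd option, which has no depends_on, the same fail-fast gate can live in the entry point’s pre-flight. A minimal sketch mirroring the compose healthcheck (nc -z): it only proves a TCP connect succeeds, not that the Gateway is logged in. port_open and wait_for_gateway are hypothetical helpers; the attempt and interval defaults echo the compose healthcheck values.

```python
import socket
import time
from typing import Callable


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Same check as `nc -z host port`: can a TCP connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def wait_for_gateway(
    host: str,
    port: int,
    attempts: int = 3,
    interval_secs: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry port_open; False means abort before node.run() is ever called."""
    for attempt in range(attempts):
        if port_open(host, port):
            return True
        if attempt < attempts - 1:
            sleep(interval_secs)
    return False
```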

Caveats and gotchas

  • Python how-to does not exist (404). Cortana cannot point to a single canonical recipe. The orchestration shape is inferred from the concept guide + Rust how-to + source examples. Expect the Python how-to to land eventually; until then, this page is the closest equivalent.
  • with_reconciliation(false) in the Rust how-to is for pedagogy only. Do not copy that into Cortana production.
  • node.run() installs its own signal handlers. The Cortana GracefulShutdown thread must coordinate with the framework’s handler - install the shutdown thread before node.run() and rely on the strategy’s on_stop() plus our HALTED gate.
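  • stop_grace_period (Docker) and timeout_post_stop (TradingNode) are independent. timeout_post_stop is a fixed wait after stop() and does not cover the flatten; stop_grace_period must cover the whole sequence (worst-case market_exit budget, the GracefulShutdown wait, and post-stop teardown), or Docker SIGKILLs while the flatten is in progress.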
  • The Rust how-to’s adapter table omits IBKR. IBKR is Python-only as of 0.55; do not expect a drop-in Rust IBKR adapter for the spike.
  • Reconciliation runs every boot. Boot is at least reconciliation_startup_delay_secs slower than backtest (default 10s). Operator dashboards should not interpret a 10-15s “no events” gap as a fault.
  • market_exit() denies non-reduce-only orders. Strategy code that submits a fresh entry mid-flatten silently fails with OrderDenied. Test the path before relying on it.
  • Cache flush_on_start=False is mandatory in production. True would erase the rehydration source and force the clean-room reconciliation path on every boot - slower and noisier.
  • Audit Logger Actor must be registered BEFORE node.run() to receive the STARTING → RUNNING transition Telegram. Late registration silently misses the boot event.
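Two of these caveats are config-level and can be linted before node.build(). A duck-typed sketch that reads attributes with getattr, so it can be pointed at a TradingNodeConfig or at a SimpleNamespace in tests; lint_live_config is hypothetical, and it checks only exec_engine.reconciliation and cache.flush_on_start.

```python
from typing import Any, List


def lint_live_config(config: Any) -> List[str]:
    """Return production-rule violations; an empty list means pass."""
    problems: List[str] = []
    exec_engine = getattr(config, "exec_engine", None)
    if exec_engine is None or not getattr(exec_engine, "reconciliation", False):
        problems.append("exec_engine.reconciliation must be True")
    cache = getattr(config, "cache", None)
    # No cache config means no Redis rehydration source at all.
    if cache is None or getattr(cache, "flush_on_start", True):
        problems.append("cache must be configured with flush_on_start=False")
    return problems
```

A non-empty result should abort startup before node.run(), the same fail-fast posture as reconciliation itself.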

When this concept applies

  • Designing the MK3 launchd plist or docker-compose unit.
  • Writing the Cortana entry point (cortana_mk3/__main__.py).
  • Wiring the SIGTERM handler that respects open positions.
  • Designing the operator Telegram alert taxonomy.
  • Reasoning about “what does the engine being up actually mean?” (answer: LiveExecutionEngine reached RUNNING, which implies reconciliation succeeded).

When it does not apply

  • For configuring the TradingNode (knob taxonomy, adapter options, cache config) - see nautilus-howto-configure-live-trading (sibling, parallel filing) and nautilus-live.md.
  • For deep reconciliation semantics (the four invariants, partial-window adjustments, in-flight check loop) - see nautilus-live.md and nautilus-execution.md.
  • For Cache externalization to Redis specifically - see nautilus-cache.md.
  • For IBKR-specific port/account/credential setup - see nautilus-ib.md.

See Also

  • Nautilus Live Trading - concept guide, reconcile-on-startup, four invariants, in-flight check loop, shutdown semantics analysis.
  • nautilus-howto-configure-live-trading - parallel how-to filing on the configuration side (knob taxonomy).
  • Nautilus Rust - Rust crate layout, build features, project setup that the Rust how-to assumes.
  • Nautilus Execution - LiveExecEngineConfig, market_exit() mechanism, RiskEngine, HALTED state behavior.
  • Nautilus Cache - Redis externalization, rehydration, flush_on_start, use_instance_id.
  • Nautilus IB - Dockerized Gateway, port 4002 paper / 4001 live, account ID conventions.
  • Nautilus Positions - Position object, reduce-only semantics, OMS adjudication.
  • 2026-05-09 Nautilus Spike Plan: ~/conductor/workspaces/cortanaroi-mk2/belo-horizonte/plans/2026-05-09-nautilus-spike.md
  • feedback_no_kill_with_open_positions - invariant the GracefulShutdown class above implements.
  • feedback_watchdog_to_telegram - alert taxonomy honored by the AuditLoggerActor.
  • project_pm_ibkr_exit_invariant - broker-truth alignment Nautilus reconciliation enforces by construction.
  • project_data_loss_april22 - workspace-archive class that Cache externalization + reconciliation structurally addresses.

Timeline

2026-05-07 | Cody - Filed during pre-spike concept mastery sweep batch 6 (how-tos).