How-To - Run Live Trading
URL resolution (Spike Plan Step 0): The user’s prompt pasted write_rust_strategy/ as the Rust how-to URL - assumed to be a copy-paste typo. Tried the natural parallel slug https://nautilustrader.io/docs/latest/how_to/run_rust_live_trading/ → HTTP 200 (canonical). Tried https://nautilustrader.io/docs/latest/how_to/run_python_live_trading/ → HTTP 404. Fallback URLs live_trading/ and running_live/ also returned 404. Conclusion: as of 2026-05-07 there is no Python “Run Live Trading” how-to published in the Nautilus docs.

The Python lifecycle story is split across (a) concepts/live/ (the reconciliation-centric concept page, captured in nautilus-live.md) and (b) the how_to/configure_live_trading recipe (captured at index level in nautilus-how-to.md). This page therefore extracts the Rust how-to verbatim, then translates the lifecycle, signal-handling, and shutdown patterns to Python (Cortana’s actual deployment language) by cross-referencing nautilus-live.md, nautilus-execution.md, and nautilus-architecture.md - calling out every point where Python and Rust diverge or where the Python answer is inferred rather than verbatim from a how-to.

The Spike Plan Step 0 deliverable - “does a Python how-to exist?” - is therefore answered NO; the spike must rely on the concept guide + configure-node how-to + source-code examples to reconstruct the Python orchestration shape.
Core claim
The Rust how-to commits to a single shape: build a LiveNode
via builder, register data + exec client factories, register
strategies, call run().await, let tokio block until SIGINT or
programmatic shutdown. The Python equivalent (from
nautilus-live.md and the standard examples) is the same
shape: build a TradingNode via TradingNodeConfig, register
adapters via factories, call node.trader.add_strategy(...),
node.build(), node.run() - the call blocks until SIGINT or
programmatic stop. Reconciliation runs at startup; the engine
refuses to start if it fails. There is no documented healthcheck
endpoint, no documented watchdog daemon, and no
position-respecting shutdown gate - that is operator policy on
top of the framework primitives.
What “ready to trade” means (reconcile-on-startup signal)
Per nautilus-live.md: the framework does NOT emit a single named
event such as ReconciliationComplete. The canonical signal is
the engine transitioning to RUNNING state in the FSM
(nautilus-architecture.md). Operationally:
- node.run() (Python) / node.run().await (Rust) is called.
- LiveExecutionEngine runs the three reconciliation calls against each registered venue: generate_order_status_reports, generate_fill_reports, generate_position_status_reports.
- The engine walks the reports through duplicate-check → order reconciliation → position reconciliation.
- If reconciliation fails: “the system logs an error and does not start” (verbatim from concepts/live/). The engine never reaches RUNNING. Boot aborts.
- If reconciliation succeeds: the engine transitions to RUNNING, strategies receive their first events, and Strategy.on_start() fires. on_start() running ≡ reconciliation succeeded ≡ ready to trade.
A reconciliation_startup_delay_secs window (default 10s) buffers
WebSocket stabilization before the continuous reconciliation
loops (in-flight check, open-orders poll, own-books audit) begin.
Boot is therefore at least 10 seconds slower than backtest.
For Cortana: the operator dashboard / Telegram alert wiring should
gate “Cortana is up” on a ComponentStateChanged event for the
LiveExecutionEngine reaching RUNNING, OR equivalently on the
strategy’s on_start() executing. Anything earlier is premature.
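The gating rule above can be sketched as a small state tracker. This is a stdlib-only illustration, not Nautilus API: the `ReadinessGate` class, the component name string, and the state strings are assumptions modelled on the lifecycle described here, and the wiring to real ComponentStateChanged events is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ReadinessGate:
    """Tracks whether the node may be reported as 'up'.

    Hypothetical helper: component names and state strings are plain
    strings here, not Nautilus event objects or enums.
    """

    required_component: str = "LiveExecutionEngine"
    ready: bool = False
    history: list = field(default_factory=list)

    def on_state_change(self, component: str, state: str) -> bool:
        """Record a transition and return the current readiness."""
        self.history.append((component, state))
        if component == self.required_component:
            # Only RUNNING counts; leaving RUNNING revokes readiness.
            self.ready = state == "RUNNING"
        return self.ready


gate = ReadinessGate()
gate.on_state_change("LiveDataEngine", "RUNNING")        # not the gating component
gate.on_state_change("LiveExecutionEngine", "STARTING")  # reconciliation still in progress
early = gate.ready
gate.on_state_change("LiveExecutionEngine", "RUNNING")   # reconciliation succeeded
late = gate.ready
```

Keeping the gate as a separate object means the "Cortana is up" alert can be unit tested without a venue connection.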
Rust how-to - verbatim shape
The Rust how-to (the only published one) walks through an OKX example. Salient verbatim pieces:
Dependencies (Cargo.toml):
[dependencies]
nautilus-common = "0.55"
nautilus-live = "0.55"
nautilus-model = "0.55"
nautilus-okx = "0.55"
nautilus-trading = { version = "0.55", features = ["examples"] }
anyhow = "1"
dotenvy = "0.15"
log = "0.4"
tokio = { version = "1", features = ["full"] }

Build the node (builder pattern):
use log::LevelFilter;
use nautilus_common::{enums::Environment, logging::logger::LoggerConfig};
use nautilus_live::node::LiveNode;
use nautilus_model::identifiers::{AccountId, TraderId};
use nautilus_okx::{
common::enums::OKXInstrumentType,
config::{OKXDataClientConfig, OKXExecClientConfig},
factories::{OKXDataClientFactory, OKXExecutionClientFactory},
};
let trader_id = TraderId::from("TESTER-001");
let account_id = AccountId::from("OKX-001");
let data_config = OKXDataClientConfig {
    instrument_types: vec![OKXInstrumentType::Swap],
    ..Default::default()
};
let exec_config = OKXExecClientConfig {
    trader_id,
    account_id,
    instrument_types: vec![OKXInstrumentType::Swap],
    ..Default::default()
};
let log_config = LoggerConfig {
    stdout_level: LevelFilter::Info,
    ..Default::default()
};
let mut node = LiveNode::builder(trader_id, Environment::Live)?
    .with_name("MY-NODE-001".to_string())
    .with_logging(log_config)
    .add_data_client(
        None,
        Box::new(OKXDataClientFactory::new()),
        Box::new(data_config),
    )?
    .add_exec_client(
        None,
        Box::new(OKXExecutionClientFactory::new()),
        Box::new(exec_config),
    )?
    .with_reconciliation(false) // simplified; enable in production
    .with_delay_post_stop_secs(5)
    .build()?;

The doc explicitly warns: “This example disables reconciliation
for simplicity. In production, remove
.with_reconciliation(false) so the engine aligns cached state
with the venue on startup.” For Cortana, never set
with_reconciliation(false) - the entire reason to migrate is
the structural reconciliation guarantee.
Add strategies and run:
use nautilus_model::{identifiers::InstrumentId, types::Quantity};
use nautilus_trading::examples::strategies::{
GridMarketMaker, GridMarketMakerConfig,
};
let mut config = GridMarketMakerConfig::new(
    InstrumentId::from("ETH-USDT-SWAP.OKX"),
    Quantity::from("0.10"),
)
.with_num_levels(3)
.with_grid_step_bps(100)
.with_skew_factor(0.5)
.with_requote_threshold_bps(10)
.with_expire_time_secs(8)
.with_on_cancel_resubmit(true);
config.base.use_hyphens_in_client_order_ids = false; // OKX-specific
let strategy = GridMarketMaker::new(config);
node.add_strategy(strategy)?;
node.run().await?;

“The node runs until interrupted (Ctrl+C) or shut down programmatically.”
Async runtime requirement (Rust):
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    dotenvy::dotenv().ok();
    // ... node setup ...
    node.run().await?;
    Ok(())
}

Adapter examples directory (verbatim list - Cortana cares about the IBKR equivalent, but it’s not in this list because IBKR is Python-only as of 0.55):
| Adapter | Example directory |
|---|---|
| Architect AX | crates/adapters/architect_ax/examples/ |
| Betfair | crates/adapters/betfair/examples/ |
| Binance | crates/adapters/binance/examples/ |
| BitMEX | crates/adapters/bitmex/examples/ |
| Bybit | crates/adapters/bybit/examples/ |
| Databento | crates/adapters/databento/examples/ |
| Deribit | crates/adapters/deribit/examples/ |
| dYdX | crates/adapters/dydx/examples/ |
| Hyperliquid | crates/adapters/hyperliquid/examples/ |
| Kraken | crates/adapters/kraken/examples/ |
| OKX | crates/adapters/okx/examples/ |
| Polymarket | crates/adapters/polymarket/examples/ |
| Sandbox | crates/adapters/sandbox/examples/ |
| Tardis | crates/adapters/tardis/examples/ |
IBKR is conspicuously absent from this list - meaning the
Rust how-to is not the right reference for Cortana’s IBKR
deployment. The IBKR adapter is Python (per nautilus-ib.md),
which is precisely why the missing Python how-to matters.
Python lifecycle (translated; not verbatim from a how-to)
Compiled from nautilus-live.md + nautilus-architecture.md +
nautilus-execution.md:
| Phase | Method | What happens |
|---|---|---|
| Construct | TradingNode(config=...) | Wires config, no I/O |
| Wire | node.trader.add_strategy(...), node.trader.add_actor(...) | Register strategy + actor instances |
| Build | node.build() | Resolve adapters, register clients via factories |
| Run | node.run() | Blocking; performs reconciliation; transitions through STARTING → RUNNING; dispatches events; SIGINT/SIGTERM trigger graceful path |
| Stop | node.stop() | STOPPING; tears down clients; persists state; flushes writers; reaches STOPPED |
| Dispose | node.dispose() | DISPOSING → DISPOSED; releases resources |
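One way to use the table is as a sanity check on state sequences observed in logs. The sketch below is a minimal illustration under stated assumptions: it covers only the happy-path states named in the table (not DEGRADED or FAULTED), treats them as plain strings rather than Nautilus enums, and `first_out_of_order` is a hypothetical helper name.

```python
# States in the order the phase table lists them (plain strings, not Nautilus enums).
LIFECYCLE = ["STARTING", "RUNNING", "STOPPING", "STOPPED", "DISPOSING", "DISPOSED"]


def first_out_of_order(observed):
    """Return the index of the first state that does not directly follow
    its predecessor in LIFECYCLE, or None if the sequence is clean."""
    for i in range(1, len(observed)):
        prev, cur = observed[i - 1], observed[i]
        if prev not in LIFECYCLE or cur not in LIFECYCLE:
            return i  # unknown state: flag it rather than guess
        if LIFECYCLE.index(cur) != LIFECYCLE.index(prev) + 1:
            return i
    return None


clean = first_out_of_order(
    ["STARTING", "RUNNING", "STOPPING", "STOPPED", "DISPOSING", "DISPOSED"]
)
skipped = first_out_of_order(["STARTING", "RUNNING", "STOPPED"])  # STOPPING missing
```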
Async runtime (Python)
Unlike Rust’s #[tokio::main], Python’s TradingNode.run()
manages its own asyncio loop internally. The standard pattern is
synchronous-looking from __main__:
node = TradingNode(config=config)
node.trader.add_strategy(strategy)
node.build()
node.run()      # blocks; manages asyncio loop internally
node.dispose()

The framework installs SIGINT and SIGTERM handlers internally
during run(). On signal, the handler triggers the documented
graceful path: STOPPING → STOPPED. It does NOT, by default,
flatten open positions or refuse to stop. That is the Cortana
policy gate (next section).
Graceful shutdown - THE canonical sequence Cortana must implement
This is the most load-bearing section. Per nautilus-live.md:
the framework’s default shutdown is NOT position-respecting.
node.stop() tears down clients, persists state, flushes
writers - that’s the entire shutdown contract. There is no built-in
“refuse to stop while positions are open” gate.
The MK2 invariant (feedback_no_kill_with_open_positions.md) -
never kill the engine while a position is open - must therefore
be wired explicitly using framework primitives. The framework
provides three building blocks:
- Strategy.market_exit() - graceful flatten: cancels open and in-flight orders, closes positions with reduce-only markets, periodically re-checks (market_exit_interval_ms, market_exit_max_attempts), calls post_market_exit once flat or after max attempts. Non-reduce-only orders are denied during exit (per nautilus-execution.md).
- HALTED TradingState - flips the RiskEngine to refuse all new submits/modifies; cancels still pass; reduce-only closes still pass.
- Cache predicates - cache.positions_open_count(), cache.orders_inflight_count(), cache.is_completely_flat() (per nautilus-cache.md / nautilus-execution.md).
The canonical Cortana shutdown sequence (must implement)
1. SIGTERM received (from launchd / docker stop / operator).
2. Flip TradingState → HALTED (refuses new entries structurally).
3. Call strategy.market_exit() (initiates reduce-only flatten).
4. Wait for post_market_exit hook
   OR market_exit_max_attempts elapsed.
5. Verify cache.positions_open_count() == 0
   AND cache.orders_inflight_count() == 0.
6. If still non-flat:
   - Telegram operator with position detail.
   - DO NOT auto-stop the node.
   - Wait for explicit operator approval (the
     `feedback_no_kill_with_open_positions.md` ABORT branch).
7. If flat:
   - node.stop()
   - node.dispose()
   - process exits cleanly.

This is policy code Cortana owns; the framework supplies the primitives but not the orchestration. The launchd preflight check that runs before SIGTERM is sent should also enforce the invariant - exactly as MK2’s preflight does today.
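The flat-or-abort gate in steps 4 to 6 can be isolated as a pure function, which makes the policy testable without a live node. This is a minimal sketch, assuming the two counts are supplied as callables (in production they would wrap the cache predicates named above); `wait_until_flat` and its injectable `clock` and `sleep` parameters are hypothetical names, not framework API.

```python
import time
from typing import Callable


def wait_until_flat(
    open_positions: Callable[[], int],
    inflight_orders: Callable[[], int],
    timeout_secs: float,
    poll_secs: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once both counts are zero, False if the deadline passes.

    False corresponds to the ABORT branch: the caller must NOT stop the node.
    """
    deadline = clock() + timeout_secs
    while True:
        if open_positions() == 0 and inflight_orders() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_secs)


# Simulated cache: one position that closes on the third poll.
counts = iter([1, 1, 0])
flat = wait_until_flat(
    lambda: next(counts), lambda: 0, timeout_secs=5.0, sleep=lambda _: None
)

# Simulated stuck position: never flat; a fake clock advances 1s per reading.
ticks = iter(range(100))
stuck = wait_until_flat(
    lambda: 1,
    lambda: 0,
    timeout_secs=3.0,
    clock=lambda: float(next(ticks)),
    sleep=lambda _: None,
)
```

Injecting the clock and sleep keeps the test fast and deterministic; the defaults use the monotonic clock so wall-clock adjustments cannot shorten the wait.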
Cortana orchestration script - cortana_run.py
Complete __main__.py showing TradingNode build + start + signal
handling for graceful, position-respecting shutdown. Python
preferred per the prompt; Rust fallback follows below.
"""
cortana_mk3/__main__.py
Entry point for the Cortana MK3 paper deployment. Same shape for
live - flip Environment + IB Gateway port + account ID.
Run:
python -m cortana_mk3
"""
import logging
import signal
import sys
import threading
import time
from nautilus_trader.adapters.interactive_brokers.config import (
InteractiveBrokersDataClientConfig,
InteractiveBrokersExecClientConfig,
)
from nautilus_trader.adapters.interactive_brokers.factories import (
InteractiveBrokersLiveDataClientFactory,
InteractiveBrokersLiveExecClientFactory,
)
from nautilus_trader.config import (
CacheConfig,
LiveDataEngineConfig,
LiveExecEngineConfig,
LoggingConfig,
MessageBusConfig,
RiskEngineConfig,
TradingNodeConfig,
)
from nautilus_trader.live.node import TradingNode
from nautilus_trader.model.enums import TradingState
from nautilus_trader.model.identifiers import StrategyId, TraderId
from cortana_mk3.config import load_settings
from cortana_mk3.strategies.cortana import CortanaStrategy, CortanaConfig
from cortana_mk3.actors.audit_logger import AuditLoggerActor
from cortana_mk3.alerts.telegram import telegram_alert
log = logging.getLogger("cortana_mk3")
def build_node(settings) -> TradingNode:
    """Build a configured TradingNode for Cortana paper or live."""
    # ----- Cache: Redis-externalized so workspace archive can't kill state. -----
    cache = CacheConfig(
        database=settings.cache_database_config,  # Redis URL + auth
        encoding="msgpack",
        flush_on_start=False,                     # NEVER flush in prod
        use_instance_id=False,                    # stable instance ID across restarts
    )

    # ----- Message bus: external Redis Streams for audit + dashboard. -----
    msg_bus = MessageBusConfig(
        database=settings.bus_database_config,
        encoding="msgpack",
        timestamps_as_iso8601=True,
        use_instance_id=False,
        types_filter=None,
        autotrim_mins=60,
    )

    # ----- Live execution engine: reconciliation MUST stay on. -----
    exec_engine = LiveExecEngineConfig(
        reconciliation=True,                    # mandatory
        reconciliation_lookback_mins=None,      # max venue history
        generate_missing_orders=True,           # auto-align positions
        inflight_check_interval_ms=2_000,
        inflight_check_threshold_ms=5_000,
        inflight_check_retries=5,
        open_check_interval_secs=5.0,
        open_check_threshold_ms=5_000,
        own_books_audit_interval_secs=10.0,
        reconciliation_startup_delay_secs=10,   # WebSocket settle
        allow_overfills=False,
        external_order_claims=[
            (StrategyId("CORTANA-001"), settings.spy_0dte_instrument_id),
        ],
    )
    risk_engine = RiskEngineConfig(
        bypass=False,
        max_order_submit_rate="100/00:00:01",
        max_order_modify_rate="100/00:00:01",
        max_notional_per_order={"USD": 50_000.0},
    )

    # ----- IBKR data + exec clients (Dockerized Gateway). -----
    ib_data = InteractiveBrokersDataClientConfig(
        ibg_host=settings.ibg_host,   # "ib-gateway" docker svc
        ibg_port=settings.ibg_port,   # 4002 paper / 4001 live
        ibg_client_id=1,
        use_regular_trading_hours=False,
        market_data_type="DELAYED_FROZEN" if settings.is_paper else "REALTIME",
    )
    ib_exec = InteractiveBrokersExecClientConfig(
        ibg_host=settings.ibg_host,
        ibg_port=settings.ibg_port,
        ibg_client_id=1,
        account_id=settings.ibg_account_id,  # DU* paper / U* live
    )

    # ----- TradingNodeConfig wraps it all. -----
    config = TradingNodeConfig(
        environment="LIVE",  # paper IS live env: same value for both modes
        trader_id=TraderId(f"CORTANA-{'PAPER' if settings.is_paper else 'LIVE'}"),
        instance_id=None,
        cache=cache,
        message_bus=msg_bus,
        data_engine=LiveDataEngineConfig(),
        exec_engine=exec_engine,
        risk_engine=risk_engine,
        logging=LoggingConfig(
            log_level="INFO",
            log_level_file="DEBUG",
            log_directory=str(settings.log_dir),
            log_file_format="json",
            log_colors=True,
        ),
        data_clients={"IB": ib_data},
        exec_clients={"IB": ib_exec},
        timeout_connection=30.0,
        timeout_reconciliation=20.0,
        timeout_portfolio=20.0,
        timeout_disconnection=10.0,
        timeout_post_stop=5.0,
    )
    node = TradingNode(config=config)
    node.add_data_client_factory("IB", InteractiveBrokersLiveDataClientFactory)
    node.add_exec_client_factory("IB", InteractiveBrokersLiveExecClientFactory)

    # ----- Strategy + Audit Actor. -----
    strategy = CortanaStrategy(
        config=CortanaConfig(
            strategy_id=StrategyId("CORTANA-001"),
            instrument_id=settings.spy_0dte_instrument_id,
            tp_pct=0.10,
            sl_pct=0.20,
            market_exit_interval_ms=2_000,
            market_exit_max_attempts=10,
        ),
    )
    audit = AuditLoggerActor(
        config_path=settings.audit_config_path,
        telegram_token=settings.telegram_token,
        telegram_chat=settings.telegram_chat,
    )
    node.trader.add_strategy(strategy)
    node.trader.add_actor(audit)
    return node
# ---------------------------------------------------------------------------
# Graceful shutdown orchestration - the load-bearing piece.
# ---------------------------------------------------------------------------
class GracefulShutdown:
    """
    Implements `feedback_no_kill_with_open_positions.md` on top of
    Nautilus primitives. Framework provides market_exit + HALTED;
    we provide the orchestration + the ABORT branch.
    """

    def __init__(self, node: TradingNode, max_wait_secs: float = 60.0) -> None:
        self.node = node
        self.max_wait_secs = max_wait_secs
        self._shutting_down = threading.Event()

    def install(self) -> None:
        signal.signal(signal.SIGTERM, self._on_signal)
        signal.signal(signal.SIGINT, self._on_signal)

    def _on_signal(self, signum, frame) -> None:
        if self._shutting_down.is_set():
            log.warning("Second signal received - letting framework finish.")
            return
        self._shutting_down.set()
        log.info("Signal %s received - beginning Cortana graceful shutdown.", signum)
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        cache = self.node.kernel.cache
        risk_engine = self.node.kernel.risk_engine
        trader = self.node.trader

        # Step 1 - Flip TradingState to HALTED. Refuses new entries structurally.
        try:
            risk_engine.set_trading_state(TradingState.HALTED)
            log.info("TradingState → HALTED.")
            telegram_alert("Cortana: HALTED - flatten initiated.")
        except Exception as e:
            log.exception("Failed to set HALTED: %s", e)

        # Step 2 - Initiate market_exit on every strategy.
        for strategy in trader.strategies():
            try:
                strategy.market_exit()
                log.info("market_exit() invoked on %s.", strategy.id)
            except Exception as e:
                log.exception("market_exit failed on %s: %s", strategy.id, e)

        # Step 3 - Wait for flat OR timeout.
        deadline = time.monotonic() + self.max_wait_secs
        while time.monotonic() < deadline:
            open_positions = cache.positions_open_count()
            inflight = cache.orders_inflight_count()
            if open_positions == 0 and inflight == 0:
                log.info("Flat. Proceeding to node.stop().")
                break
            log.info(
                "Waiting for flat - open=%d inflight=%d",
                open_positions, inflight,
            )
            time.sleep(1.0)
        else:
            # Step 4 - ABORT branch. Surface to operator; do NOT auto-stop.
            open_positions = cache.positions_open_count()
            inflight = cache.orders_inflight_count()
            telegram_alert(
                f"Cortana: SHUTDOWN ABORTED - {open_positions} open, "
                f"{inflight} in-flight. Manual intervention required."
            )
            log.error(
                "ABORT - open=%d inflight=%d - refusing to call node.stop().",
                open_positions, inflight,
            )
            return  # Process stays up; operator decides.

        # Step 5 - Clean stop + dispose.
        try:
            self.node.stop()
            log.info("node.stop() complete.")
        except Exception as e:
            log.exception("node.stop() raised: %s", e)
        finally:
            try:
                self.node.dispose()
            except Exception as e:
                log.exception("node.dispose() raised: %s", e)
        telegram_alert("Cortana: stopped cleanly.")
def main() -> int:
    settings = load_settings()
    logging.basicConfig(level=logging.INFO)
    node = build_node(settings)
    shutdown = GracefulShutdown(node, max_wait_secs=settings.shutdown_max_wait_secs)
    shutdown.install()
    try:
        node.build()
        # Pre-flight: verify Redis reachable, IB Gateway healthy.
        # If either fails, abort BEFORE node.run() so we never reach RUNNING.
        node.run()  # blocks; framework SIGINT handler will
                    # delegate to our shutdown thread above.
    except Exception as e:
        log.exception("Fatal error during run(): %s", e)
        telegram_alert(f"Cortana: FATAL - {e}")
        return 2
    return 0


if __name__ == "__main__":
    sys.exit(main())

Note on the signal-handler thread. The Cortana shutdown
orchestration runs in a daemon thread because the main thread is
inside node.run() blocking on the asyncio loop. The framework’s
own SIGINT handler will also fire and begin its tear-down; our
job is to ensure the position-respecting steps complete first.
In practice we either (a) intercept SIGTERM before the framework
sees it, do market_exit, then re-raise to the framework’s
handler; or (b) use a pre_stop hook on the strategy. The
clearest pattern is to call market_exit() in the strategy’s
on_stop() and rely on the framework’s documented contract that
on_stop() runs before disconnection - but wire the HALTED flip
and the position-count gate explicitly anyway.
Rust fallback orchestration (since Python how-to is 404)
If/when Cortana migrates a hot path to Rust (per nautilus-rust.md
the engine itself is already Rust under the hood), the equivalent
shape is:
use tokio::signal;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    dotenvy::dotenv().ok();
    let mut node = build_node()?;          // builder per Rust how-to
    node.add_strategy(strategy)?;
    // Rust's run().await blocks until SIGINT; for graceful policy use
    // tokio::select! with a manual signal future + framework run().
    tokio::select! {
        res = node.run() => {
            res?;
        }
        _ = signal::ctrl_c() => {
            // Implement graceful: HALTED + market_exit + wait + stop.
            // (The Rust API surface is parallel to Python; cross-ref
            // `nautilus-rust.md` for crate-level method names once
            // the Rust IBKR adapter ships.)
            graceful_shutdown(&node).await?;
        }
    }
    Ok(())
}

The Rust how-to’s with_delay_post_stop_secs(5) is the Rust
counterpart to Python’s timeout_post_stop=5.0 - a fixed-duration
wait after stop() completes, before final teardown. It is not
a position-respecting wait - it’s a time wait. The position check
is still ours.
Healthchecks and watchdog patterns
Per nautilus-live.md: the framework ships no healthcheck
endpoint and no watchdog daemon. The continuous reconciliation
loop is itself the liveness mechanism - if reconciliation keeps
converging, the engine is healthy. If it starts emitting
persistent discrepancies, that’s the canary.
Cortana’s healthcheck strategy:
- Process supervisor (launchd / Docker / systemd) restarts the process on exit. This is the framework’s documented stance: “any panic results in a clean process termination that can be handled by process supervisors or orchestration systems” (per nautilus-architecture.md’s panic = abort posture).
- AuditLoggerActor subscribes to ComponentStateChanged events and emits Telegram alerts on every transition (STARTING, RUNNING, STOPPING, STOPPED, DEGRADED, FAULTED). This wires feedback_watchdog_to_telegram.md correctly: trading-event Telegrams flow from the Actor on the message bus, not from a separate watchdog process.
- External healthcheck endpoint (custom Actor): exposes a Redis key like cortana:health:last_heartbeat updated every N seconds while the engine is RUNNING. The dashboard (or a simple cron) reads the key and alerts if stale.
- IB Gateway healthcheck integrates with the Docker compose health status - see nautilus-ib.md for the Dockerized Gateway pattern. Cortana’s compose file should gate cortana_mk3 on the gateway container via depends_on with condition: service_healthy.
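The staleness check on the heartbeat key reduces to a single comparison. The sketch below is stdlib-only and deliberately omits the Redis read: `heartbeat_is_stale` is a hypothetical helper, timestamps are assumed to be epoch seconds, and the 30s threshold is an illustrative value rather than anything the framework defines.

```python
from typing import Optional


def heartbeat_is_stale(
    last_heartbeat_ts: Optional[float],
    now_ts: float,
    max_age_secs: float,
) -> bool:
    """True when the heartbeat is missing or older than max_age_secs."""
    if last_heartbeat_ts is None:
        return True  # key never written (or expired): treat as unhealthy
    return (now_ts - last_heartbeat_ts) > max_age_secs


# Timestamps are epoch seconds; values below are illustrative.
fresh = heartbeat_is_stale(1_000.0, 1_010.0, max_age_secs=30.0)    # 10s old
stale = heartbeat_is_stale(1_000.0, 1_045.0, max_age_secs=30.0)    # 45s old
missing = heartbeat_is_stale(None, 1_045.0, max_age_secs=30.0)     # no key at all
```

Treating a missing key as stale means a node that never reached RUNNING alerts the same way as one that stopped heartbeating.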
Recovery after crash
Per nautilus-architecture.md (crash-only design): startup and
crash recovery share the same code path. The LiveNode /
TradingNode always boots through reconciliation. There is no
separate “recovery mode.”
Recovery sequence after an unclean exit:
- Process supervisor restarts the binary (launchd KeepAlive, docker restart=always, systemd Restart=on-failure).
- node.run() boots normally.
- Cache rehydrates from Redis (state survives the process exit because we externalized the Cache per nautilus-cache.md).
- LiveExecutionEngine reconciles against IBKR using the three adapter calls.
- Any orders or positions that opened during the gap are detected; missing fills synthesized; missing alignment orders generated as EXTERNAL + tag RECONCILIATION.
- Strategy’s external_order_claims adopt pre-existing positions for the strategy’s instrument set - Cortana’s SPY 0DTE positions are reattached to CortanaStrategy.
- Strategy.on_start() fires; strategy is back in business.
The four reconciliation invariants - position quantity, average
entry price, PnL integrity, ID determinism - hold across this
recovery (per nautilus-live.md). Replay-safe restart is
structural, not lucky.
Cortana MK3 implications
(a) Launchd plist replacement
MK2’s ~/Library/LaunchAgents/com.cortanaroi.engine.plist becomes
either:
Option A - Native launchd, Python entry point. Simple, same shape as today:
<key>ProgramArguments</key>
<array>
<string>/Users/codysmith/.venvs/cortana_mk3/bin/python</string>
<string>-m</string>
<string>cortana_mk3</string>
</array>
<key>RunAtLoad</key><true/>
<key>KeepAlive</key>
<dict>
<key>SuccessfulExit</key><false/>
</dict>
<key>WorkingDirectory</key>
<string>/Users/codysmith/cortanaroi-data</string>
<key>EnvironmentVariables</key>
<dict>
<key>CORTANAROI_ENV</key><string>paper</string>
</dict>

Pre-flight check (feedback_no_kill_with_open_positions.md)
runs as a separate launchctl bootout gate - same script as MK2.
Option B - Dockerized Cortana + Dockerized IB Gateway. Per
nautilus-ib.md (Dockerized Gateway, port 4002 paper / 4001
live). Compose:
services:
  ib-gateway:
    image: ghcr.io/gnzsnz/ib-gateway:latest
    environment:
      TRADING_MODE: paper
      TWS_USERID: ${TWS_USERID}
      TWS_PASSWORD: ${TWS_PASSWORD}
    ports: ["4002:4002"]
    healthcheck:
      test: ["CMD", "nc", "-z", "localhost", "4002"]
      interval: 30s
      timeout: 5s
      retries: 3
  redis:
    image: redis:7-alpine
    volumes: ["./redis-data:/data"]
    command: redis-server --appendonly yes
  cortana:
    build: .
    depends_on:
      ib-gateway:
        condition: service_healthy
      redis:
        condition: service_started
    environment:
      CORTANAROI_ENV: paper
      IBG_HOST: ib-gateway
      IBG_PORT: 4002
      REDIS_URL: redis://redis:6379/0
    restart: unless-stopped
    stop_grace_period: 90s   # > shutdown_max_wait_secs (60s) + buffer
    stop_signal: SIGTERM

stop_grace_period: 90s is the critical knob: it gives the
GracefulShutdown orchestration time to flatten before Docker
SIGKILLs. **Set this longer than your `market_exit_max_attempts
× market_exit_interval_ms` budget plus a safety margin.**
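The budget arithmetic can be checked mechanically at deploy time. This is a minimal sketch using the values quoted on this page (10 attempts, 2,000 ms interval, 60s shutdown wait, 90s grace period); the helper names and the 10s default margin are assumptions for illustration.

```python
def market_exit_budget_secs(max_attempts: int, interval_ms: int) -> float:
    """Worst-case flatten time: attempts multiplied by the re-check interval."""
    return max_attempts * interval_ms / 1000.0


def grace_period_is_sufficient(
    stop_grace_secs: float,
    shutdown_max_wait_secs: float,
    exit_budget_secs: float,
    margin_secs: float = 10.0,  # assumed safety margin, not a framework value
) -> bool:
    """The grace period must cover the longer of the two waits plus a margin."""
    worst_case = max(shutdown_max_wait_secs, exit_budget_secs)
    return stop_grace_secs >= worst_case + margin_secs


budget = market_exit_budget_secs(max_attempts=10, interval_ms=2_000)
ok = grace_period_is_sufficient(90.0, 60.0, budget)
too_short = grace_period_is_sufficient(10.0, 60.0, budget)
```

With these numbers the flatten budget is 20s, so the 60s orchestration wait dominates and the 90s grace period leaves headroom; a 10s grace period would not.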
(b) Telegram alert wiring on TradingState transitions
AuditLoggerActor subscribes to ComponentStateChanged events
on the message bus and emits a Telegram on each transition. Wire:
| Transition | Telegram |
|---|---|
| STARTING → RUNNING | “Cortana: up, reconciliation OK.” |
| RUNNING → DEGRADED | “Cortana: DEGRADED - investigate.” |
| RUNNING → STOPPING | “Cortana: STOPPING (graceful).” |
| STOPPING → STOPPED | “Cortana: stopped cleanly.” |
| any → FAULTED | “Cortana: FAULTED - process will exit.” |
| HALTED flip | “Cortana: HALTED - no new entries.” |
This honors feedback_watchdog_to_telegram.md (Telegram is for
trading events, not AI-meta watchdog noise).
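The transition table maps naturally onto a lookup keyed by (previous, current) state. The sketch below is stdlib-only and hedged accordingly: states are plain strings, `alert_for` is a hypothetical helper, and the HALTED flip is left out because it is a TradingState change rather than a component state transition.

```python
from typing import Optional

# Message text copied from the transition table above.
ALERTS = {
    ("STARTING", "RUNNING"): "Cortana: up, reconciliation OK.",
    ("RUNNING", "DEGRADED"): "Cortana: DEGRADED - investigate.",
    ("RUNNING", "STOPPING"): "Cortana: STOPPING (graceful).",
    ("STOPPING", "STOPPED"): "Cortana: stopped cleanly.",
}
FAULTED_ALERT = "Cortana: FAULTED - process will exit."


def alert_for(previous: str, current: str) -> Optional[str]:
    """Return the Telegram text for a transition, or None if it is not alert-worthy."""
    if current == "FAULTED":
        return FAULTED_ALERT  # "any → FAULTED" row: previous state is irrelevant
    return ALERTS.get((previous, current))


boot = alert_for("STARTING", "RUNNING")
fault = alert_for("DEGRADED", "FAULTED")
quiet = alert_for("STOPPED", "DISPOSING")
```

Returning None for unlisted transitions keeps the channel limited to the events in the table.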
(c) Dockerized IB Gateway healthcheck integration
Cortana’s cortana container declares depends_on: ib-gateway with condition: service_healthy. If the gateway is unhealthy at
boot, cortana doesn’t start at all - fail-fast at the
orchestration layer, mirroring Nautilus’s fail-fast at the
reconciliation layer. The combination: gateway healthy →
TradingNode boots → reconciliation succeeds → strategies
running. Any link in the chain breaks → process exits → docker
restarts → chain re-runs from the top.
Caveats and gotchas
- Python how-to does not exist (404). Cortana cannot point to a single canonical recipe. The orchestration shape is inferred from the concept guide + Rust how-to + source examples. Expect the Python how-to to land eventually; until then, this page is the closest equivalent.
- with_reconciliation(false) in the Rust how-to is for pedagogy only. Do not copy that into Cortana production.
- node.run() installs its own signal handlers. The Cortana GracefulShutdown thread must coordinate with the framework’s handler - install the GracefulShutdown handlers before node.run() and rely on the strategy’s on_stop() plus our HALTED gate.
- stop_grace_period (Docker) and timeout_post_stop (TradingNode) are independent. timeout_post_stop is only a fixed wait after stop(); stop_grace_period must cover the worst-case market_exit budget plus timeout_post_stop, or Docker SIGKILLs while the flatten is in progress.
- The Rust how-to’s adapter table omits IBKR. IBKR is Python-only as of 0.55; do not expect a drop-in Rust IBKR adapter for the spike.
- Reconciliation runs every boot. Boot is at least reconciliation_startup_delay_secs slower than backtest (default 10s). Operator dashboards should not interpret a 10-15s “no events” gap as a fault.
- market_exit() denies non-reduce-only orders. Strategy code that submits a fresh entry mid-flatten silently fails with OrderDenied. Test the path before relying on it.
- Cache flush_on_start=False is mandatory in production. True would erase the rehydration source and force the clean-room reconciliation path on every boot - slower and noisier.
- Audit Logger Actor must be registered BEFORE node.run() to receive the STARTING → RUNNING transition Telegram. Late registration silently misses the boot event.
When this concept applies
- Designing the MK3 launchd plist or docker-compose unit.
- Writing the Cortana entry point (cortana_mk3/__main__.py).
- Wiring the SIGTERM handler that respects open positions.
- Designing the operator Telegram alert taxonomy.
- Reasoning about “what does the engine being up actually mean?” (answer: LiveExecutionEngine reached RUNNING, which implies reconciliation succeeded).
When it does not apply
- For configuring the TradingNode (knob taxonomy, adapter options, cache config) - see nautilus-howto-configure-live-trading (sibling, parallel filing) and nautilus-live.md.
- For deep reconciliation semantics (the four invariants, partial-window adjustments, in-flight check loop) - see nautilus-live.md and nautilus-execution.md.
- For Cache externalization to Redis specifically - see nautilus-cache.md.
- For IBKR-specific port/account/credential setup - see nautilus-ib.md.
See Also
- Nautilus Live Trading - concept guide, reconcile-on-startup, four invariants, in-flight check loop, shutdown semantics analysis.
- nautilus-howto-configure-live-trading - parallel how-to filing on the configuration side (knob taxonomy).
- Nautilus Rust - Rust crate layout, build features, project setup that the Rust how-to assumes.
- Nautilus Execution - LiveExecEngineConfig, market_exit() mechanism, RiskEngine, HALTED state behavior.
- Nautilus Cache - Redis externalization, rehydration, flush_on_start, use_instance_id.
- Nautilus IB - Dockerized Gateway, port 4002 paper / 4001 live, account ID conventions.
- Nautilus Positions - Position object, reduce-only semantics, OMS adjudication.
- 2026-05-09 Nautilus Spike Plan: ~/conductor/workspaces/cortanaroi-mk2/belo-horizonte/plans/2026-05-09-nautilus-spike.md
- feedback_no_kill_with_open_positions - invariant the GracefulShutdown class above implements.
- feedback_watchdog_to_telegram - alert taxonomy honored by the AuditLoggerActor.
- project_pm_ibkr_exit_invariant - broker-truth alignment Nautilus reconciliation enforces by construction.
- project_data_loss_april22 - workspace-archive class that Cache externalization + reconciliation structurally addresses.
Timeline
2026-05-07 | Cody - Filed during pre-spike concept mastery sweep batch 6 (how-tos).