Nautilus Reports
Nautilus’s ReportProvider produces structured pandas DataFrames from the orders, fills, positions, and account-state graph held in the Cache. Five first-class report types (orders, order_fills, fills, positions, account) share a uniform “static method on ReportProvider + Trader helper wrapper” shape; the same code runs in backtest, sandbox, and live. Performance analytics live next door on Portfolio.analyzer - three stat buckets (pnls, returns, general) plus pluggable PortfolioStatistic classes. Visualization plugs in via nautilus_trader.analysis.create_tearsheet (Plotly HTML - equity curve, drawdown, monthly heatmap, returns distribution, stats table) and the lower-level create_equity_curve. Reports are runtime-DataFrame-only - there is no native Brier score, AUC, or classification metric, so Cortana’s meta-model evaluation must be a custom PortfolioStatistic (or post-hoc Parquet pipeline) reading the same DataFrames. The cleanest persistence path for downstream ML training is report_df.to_parquet(...) after each backtest run, not the live ParquetDataCatalog (which is for market data, not analytics rows).
Why this page exists
Cortana MK3 milestone M2 requires “Nautilus tearsheet shows backtest of last 4 weeks with comparable Brier score, AUC, win rate to MK2” and “Daily MK2/MK3 decision diff < 5%.” Both deliverables sit on top of the report layer. This page is the API+config saturation reference for the report surface so M2 deliverables are unblocked the moment M1 finishes.
Core claim
“The ReportProvider class in NautilusTrader generates structured analytical reports from trading data, transforming raw orders, fills, positions, and account states into pandas DataFrames for analysis and visualization.”
One provider class, five static methods, two invocation paths (Trader
helper or direct), pandas DataFrames as the universal output shape.
Backtest and live use the same provider - same call from the same
data source (the Cache). This is one of the seven backtest-live parity
points (cf. nautilus-architecture.md).
Report taxonomy - the five built-ins
1. Orders report
Full view of every order regardless of status.
# Trader helper (recommended)
orders_report = trader.generate_orders_report()
# ReportProvider directly
from nautilus_trader.analysis import ReportProvider
orders = cache.orders()
orders_report = ReportProvider.generate_orders_report(orders)
Indexed by client_order_id. Key columns:
instrument_id, strategy_id, trader_id, account_id,
venue_order_id, side, type, status, quantity (str),
filled_qty (str), price, avg_px, time_in_force, ts_init
(Unix nanos), ts_last (Unix nanos). Type-conditional columns:
trigger_price for stops, expire_time for GTD. Source of truth for
column list: Order.to_dict().
Use case: “Did we attempt to enter? With what params? When?” - the audit row for every entry intent that became an order.
2. Order fills report
One row per order with at least one fill (filtered subset of the orders report).
order_fills_report = trader.generate_order_fills_report()
Same column shape as the orders report, but ts_init and ts_last
are converted to datetime objects for easier analysis. Filter
applied: filled_qty > 0.
Use case: “Of the orders we sent, which actually executed?” - first filter for win-rate calculations.
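The filled_qty > 0 filter can be reproduced on plain row dicts. The sketch below assumes rows shaped like the orders report with string-typed quantities; the order IDs and values are made up for illustration. It casts through Decimal before comparing, because comparing the raw strings would be lexicographic rather than numeric.

```python
from decimal import Decimal

# Hypothetical rows mimicking the orders report: quantities arrive as strings.
orders_rows = [
    {"client_order_id": "O-001", "status": "FILLED", "quantity": "10", "filled_qty": "10"},
    {"client_order_id": "O-002", "status": "CANCELED", "quantity": "5", "filled_qty": "0"},
    {"client_order_id": "O-003", "status": "PARTIALLY_FILLED", "quantity": "8", "filled_qty": "3"},
]


def with_fills(rows):
    """Keep only orders with at least one fill (filled_qty > 0)."""
    return [row for row in rows if Decimal(row["filled_qty"]) > 0]


filled = with_fills(orders_rows)
# Share of submitted orders that executed at least partially.
fill_ratio = len(filled) / len(orders_rows)
```

The partially filled order stays in the result, which matches the "at least one fill" definition above.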
3. Fills report
One row per fill event (an order can produce multiple rows if partially filled across executions).
fills_report = trader.generate_fills_report()
Indexed by client_order_id. Key columns: trade_id, venue_order_id,
instrument_id, strategy_id, account_id, position_id,
order_side, order_type, last_px (str), last_qty (str),
currency, liquidity_side (MAKER / TAKER), commission, ts_event
(datetime), ts_init (datetime). Source: OrderFilled.to_dict().
Use case: precise commission and slippage attribution - fills are the atomic unit, not orders.
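Because fills are the atomic unit, per-order execution quality is an aggregation over fill rows. The following is a minimal sketch, assuming row dicts with the string-typed last_px and last_qty columns listed above; the sample values are invented, and the commission is shown as a bare numeric string, which is an assumption made for brevity rather than the documented column format.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical fill rows; O-001 was filled in two executions.
fills_rows = [
    {"client_order_id": "O-001", "last_px": "4.10", "last_qty": "6", "commission": "0.65"},
    {"client_order_id": "O-001", "last_px": "4.20", "last_qty": "4", "commission": "0.65"},
    {"client_order_id": "O-003", "last_px": "2.50", "last_qty": "3", "commission": "0.65"},
]


def summarize_fills(rows):
    """Aggregate fills per order: total quantity, VWAP, total commission."""
    acc = defaultdict(
        lambda: {"qty": Decimal(0), "notional": Decimal(0), "commission": Decimal(0)}
    )
    for row in rows:
        qty = Decimal(row["last_qty"])
        entry = acc[row["client_order_id"]]
        entry["qty"] += qty
        entry["notional"] += qty * Decimal(row["last_px"])
        entry["commission"] += Decimal(row["commission"])
    return {
        order_id: {
            "qty": e["qty"],
            "vwap": e["notional"] / e["qty"],  # volume-weighted average fill price
            "commission": e["commission"],
        }
        for order_id, e in acc.items()
    }


summary = summarize_fills(fills_rows)
```

Comparing the resulting vwap against the order's intended price gives a per-order slippage figure.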
4. Positions report
Position analysis including historical snapshots (NETTING OMS only; HEDGING uses unique IDs and never reopens).
# Trader helper auto-includes snapshots for NETTING OMS
positions_report = trader.generate_positions_report()
# Direct path requires explicit snapshot pass-in
positions = cache.positions()
snapshots = cache.position_snapshots()
positions_report = ReportProvider.generate_positions_report(
    positions=positions,
    snapshots=snapshots,
)
Indexed by position_id. Key columns: instrument_id, strategy_id,
trader_id, account_id, opening_order_id, closing_order_id,
entry (BUY/SELL), side (LONG/SHORT/FLAT), quantity, peak_qty,
avg_px_open, avg_px_close, commissions (list), realized_pnl,
realized_return, ts_init, ts_opened, ts_last, ts_closed,
duration_ns, is_snapshot (bool).
Snapshot caveat (load-bearing for NETTING OMS): “Always include
snapshots in reports for accurate total PnL calculation. In HEDGING
OMS, snapshots are not used since each position has a unique ID and
is never reopened.” For Cortana (NETTING on IBKR), the Trader helper
path is the correct default - direct ReportProvider calls require
explicit snapshot pass-in or PnL is wrong.
Use case: win-rate, hold-time distribution, realized-PnL per trade - the canonical row class for “how did each trade perform.”
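As an illustration of the per-trade view, the sketch below derives win rate, profit factor, and mean hold time from position rows. It assumes realized_pnl has already been parsed to a float and uses invented rows. These are textbook definitions and may not match the exact formulas behind the built-in statistics.

```python
# Hypothetical closed-position rows; realized_pnl is assumed pre-parsed to float.
positions_rows = [
    {"position_id": "P-1", "realized_pnl": 120.0, "duration_ns": 1_800_000_000_000},
    {"position_id": "P-2", "realized_pnl": -45.0, "duration_ns": 600_000_000_000},
    {"position_id": "P-3", "realized_pnl": 30.0, "duration_ns": 2_400_000_000_000},
    {"position_id": "P-4", "realized_pnl": -15.0, "duration_ns": 1_200_000_000_000},
]

NANOS_PER_MINUTE = 60 * 1_000_000_000


def trade_summary(rows):
    """Win rate, profit factor, and mean hold time (minutes) over closed positions."""
    wins = [r["realized_pnl"] for r in rows if r["realized_pnl"] > 0]
    losses = [r["realized_pnl"] for r in rows if r["realized_pnl"] < 0]
    total_duration_ns = sum(r["duration_ns"] for r in rows)
    return {
        "win_rate": len(wins) / len(rows),
        # Gross profit over gross loss; infinite when there are no losers.
        "profit_factor": sum(wins) / abs(sum(losses)) if losses else float("inf"),
        "mean_hold_minutes": total_duration_ns / len(rows) / NANOS_PER_MINUTE,
    }


summary = trade_summary(positions_rows)
```

For NETTING OMS the input rows should include the snapshot rows (is_snapshot true), for the reason given in the snapshot caveat.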
5. Account report
Balance and margin time-series per venue.
from nautilus_trader.model.identifiers import Venue
venue = Venue("INTERACTIVE_BROKERS")
account_report = trader.generate_account_report(venue)
Indexed by ts_event (timestamp of state change). Key columns:
account_id, account_type (SPOT, MARGIN, …), base_currency,
total (str), free (str), locked (str), currency, reported
(bool - venue-reported vs computed), margins (list), info.
Multi-currency accounts produce multiple rows per state event (one per currency).
Use case: equity curve, margin utilization, currency exposure.
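A rough sketch of the equity-curve use case: filter the account rows to one currency (multi-currency accounts emit one row per currency per event), cast the string total, and track the largest peak-to-trough decline. The rows and integer timestamps are hypothetical, and this is not how create_tearsheet computes drawdown.

```python
# Hypothetical account rows: one row per (state event, currency); totals are strings.
account_rows = [
    {"ts_event": 1, "currency": "USD", "total": "10000.00"},
    {"ts_event": 2, "currency": "USD", "total": "10400.00"},
    {"ts_event": 2, "currency": "EUR", "total": "0.00"},
    {"ts_event": 3, "currency": "USD", "total": "9880.00"},
    {"ts_event": 4, "currency": "USD", "total": "10150.00"},
]


def equity_curve(rows, currency):
    """Time-ordered (ts_event, total) pairs for a single currency."""
    selected = [r for r in rows if r["currency"] == currency]
    selected.sort(key=lambda r: r["ts_event"])
    return [(r["ts_event"], float(r["total"])) for r in selected]


def max_drawdown(curve):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = float("-inf")
    worst = 0.0
    for _, equity in curve:
        peak = max(peak, equity)
        if peak > 0:
            worst = max(worst, (peak - equity) / peak)
    return worst


usd_curve = equity_curve(account_rows, "USD")
drawdown = max_drawdown(usd_curve)
```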
Invocation API - three contexts
Post-backtest
engine.run(start=start_time, end=end_time)
orders_report = engine.trader.generate_orders_report()
positions_report = engine.trader.generate_positions_report()
fills_report = engine.trader.generate_fills_report()
The BacktestEngine exposes its trader after run(). Reports are read
from the engine’s Cache, which holds the entire run’s state in memory.
Post-live snapshot
Identical API on the live TradingNode’s trader:
node.trader.generate_orders_report()
node.trader.generate_positions_report()
node.trader.generate_account_report(venue)
The Cache is the source. With Redis-backed cache, this works across
restarts (per nautilus-cache.md).
In-flight queries (live)
Reports can be generated periodically by an Actor:
import pandas as pd
from nautilus_trader.common.actor import Actor
class ReportingActor(Actor):
    def on_start(self):
        # Schedule report generation every 30 minutes
        self.clock.set_timer(
            name="generate_reports",
            interval=pd.Timedelta(minutes=30),
            callback=self.generate_reports,
        )
    def generate_reports(self, event):
        # Build the positions report and write it out, keyed by the timer event time
        positions_report = self.trader.generate_positions_report()
        positions_report.to_csv(f"positions_{event.ts_event}.csv")
Use cases: end-of-day rollup, periodic CSV/Parquet export, live dashboard refresh.
Output formats - what comes out
| Format | How | Notes |
|---|---|---|
| pd.DataFrame | Native return type of every report | The universal interchange shape |
| CSV | df.to_csv(path) | Lossy on Decimal types (becomes string) - fine if you keep quantity / price columns string-typed |
| Parquet | df.to_parquet(path) | Recommended for downstream ML. Preserves dtype, columnar, fast |
| JSON | df.to_json(path) | Round-trips cleanly with orient='records'; verbose |
| HTML tearsheet | create_tearsheet(engine, output_path=...) | Plotly interactive HTML, equity curve + drawdown + monthly heatmap + stats table + returns distribution |
| Individual Plotly figures | create_equity_curve(returns) | .show() (browser) or .write_image('png') (requires kaleido) |
The DataFrame is the seam. Every export format is downstream of df.
Visualization - tearsheets
from nautilus_trader.analysis import create_tearsheet
engine.run()
create_tearsheet(engine, output_path="tearsheet.html")
Contents (per docs verbatim):
- Equity curve
- Drawdown analysis
- Monthly returns heatmap
- Performance statistics table
- Returns distribution
Lower-level Plotly figures available individually:
from nautilus_trader.analysis import create_equity_curve
returns = engine.portfolio.analyzer.returns()
fig = create_equity_curve(returns, title="Cortana MK3 - May 2026")
fig.show()
fig.write_image("equity.png")
Install: uv pip install "nautilus_trader[visualization]".
For Cortana M2: create_tearsheet is the literal “tearsheet” referenced
in the M2 deliverable. Win-rate and PnL stats appear on the stats table;
Brier and AUC do NOT (see Limitations).
Portfolio statistics - the analytics layer next door
portfolio = engine.portfolio
stats_pnls = portfolio.analyzer.get_performance_stats_pnls()
stats_returns = portfolio.analyzer.get_performance_stats_returns()
stats_general = portfolio.analyzer.get_performance_stats_general()
Three buckets, each a dict-like:
- PnLs - PnL (total), per-currency PnL breakdown
- Returns - Sharpe Ratio (252 days), Sortino Ratio, Max Drawdown, return distribution moments
- General - Win Rate, Profit Factor, Avg Winner, Avg Loser, position counts
The doc’s example aggregation:
results = {
    "total_positions": len(positions_closed),
    "pnl_total": stats_pnls.get("PnL (total)"),
    "sharpe_ratio": stats_returns.get("Sharpe Ratio (252 days)"),
    "profit_factor": stats_general.get("Profit Factor"),
    "win_rate": stats_general.get("Win Rate"),
}
Win Rate is the literal field that satisfies the M2 “comparable win
rate” deliverable. Brier and AUC are not in any of the three buckets;
the parallel nautilus-portfolio.md page (when filed) covers
PortfolioStatistic registration as the customization seam.
Customization - PortfolioStatistic and beyond
Per the doc: “For detailed information about available statistics and creating custom metrics, see the Portfolio guide.” Registration pattern (forward-referenced, source canonical):
from nautilus_trader.analysis.statistic import PortfolioStatistic
class BrierScore(PortfolioStatistic):
    def name(self):
        return "Brier Score"
    def calculate_from_returns(self, returns):
        ...
    # Or calculate_from_orders / calculate_from_positions for non-return inputs
portfolio.analyzer.register_statistic(BrierScore())
stats = portfolio.analyzer.get_performance_stats_general()
# Now includes "Brier Score" if calculate_from_X matched the bucket
Three bucket-specific calculation methods (from the docs canon):
calculate_from_returns(...), calculate_from_orders(...),
calculate_from_positions(...). The bucket determines which input the
statistic receives. Brier and AUC need predicted probabilities paired
with realized outcomes - neither is exposed by the built-in inputs.
This is the open question for Cortana customization (see Limitations).
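Independent of where the predicted probabilities end up being captured, the metric itself is small. A minimal sketch of the Brier score as the mean squared error between predicted win probability and the 0/1 realized outcome; the function name and sample inputs are illustrative, not part of Nautilus.

```python
def brier_score(predicted_probs, outcomes):
    """Mean of (p - y)^2 over paired predictions p in [0, 1] and outcomes y in {0, 1}."""
    if len(predicted_probs) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not predicted_probs:
        raise ValueError("at least one prediction is required")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_probs, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical meta-model outputs paired with realized win (1) / loss (0).
probs = [0.9, 0.2, 0.7, 0.4]
wins = [1, 0, 1, 1]
score = brier_score(probs, wins)
```

Lower is better: a model that always predicts 0.5 scores 0.25, so that is the baseline to beat.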
PnL accounting - the load-bearing details
Three considerations the doc calls out:
Position-based PnL
- Realized PnL computed when positions are partially or fully closed.
- Unrealized PnL marked-to-market against current price.
- Commission impact included only when in settlement currency.
Multi-currency accounting
- Each position tracks PnL in its settlement currency.
- Portfolio aggregation requires user-provided exchange rates.
- Commission currencies may differ from settlement.
For Cortana (USD throughout - SPY options, IBKR USD account), this collapses to the single-currency case. No conversion math.
Snapshot considerations (NETTING OMS - Cortana’s case)
from nautilus_trader.model.objects import Money
pnl_by_currency = {}
# Realized PnL from the current positions
for position in cache.positions(instrument_id=instrument_id):
    if position.realized_pnl:
        currency = position.realized_pnl.currency
        pnl_by_currency.setdefault(currency, 0.0)
        pnl_by_currency[currency] += position.realized_pnl.as_double()
# Add realized PnL from historical snapshots (earlier cycles of reopened positions)
for snapshot in cache.position_snapshots(instrument_id=instrument_id):
    if snapshot.realized_pnl:
        currency = snapshot.realized_pnl.currency
        pnl_by_currency.setdefault(currency, 0.0)
        pnl_by_currency[currency] += snapshot.realized_pnl.as_double()
total_pnls = [Money(amount, currency)
              for currency, amount in pnl_by_currency.items()]
If you query positions and forget snapshots in NETTING OMS, you
undercount realized PnL for any instrument that has reopened. The
Trader helper does this for you; the ReportProvider direct path does
not.
Backtest vs live - uniformity
The docs claim uniformity verbatim: “Reports provide consistent analytics across both backtesting and live trading environments, enabling reliable performance evaluation and strategy comparison.”
| Aspect | Backtest | Live |
|---|---|---|
| Data source | engine.cache | node.cache (in-memory, optionally Redis-backed) |
| Snapshot inclusion | Trader helper auto-includes | Same - Trader helper auto-includes |
| Reconciliation impact | None | Reconciliation events flow through Cache; reports reflect them |
| Performance | Sub-second on backtest cache | O(orders + positions) on live cache; Redis-backed cache may add network latency |
For Cortana M2: the same code that produces the backtest tearsheet runs against the live MK3 paper session. Decision-diff harness (next section) exploits this.
Cortana MK3 implications
M2 deliverable: “Nautilus tearsheet shows backtest of last 4 weeks with comparable Brier score, AUC, win rate to MK2”
Three sub-deliverables, three answers:
- Win rate - first-class: stats_general["Win Rate"]. Done by running create_tearsheet(engine) and reading the stats table.
- Brier score - not built in. Custom PortfolioStatistic subclass implementing calculate_from_X, where X is the bucket carrying the meta-model’s predicted probabilities. Predicted probabilities are NOT in the built-in inputs, so this requires either (a) emitting a custom WinProbEstimate event from the Strategy at decision time and persisting predicted-prob alongside the realized outcome, then computing Brier in a post-hoc pandas pipeline reading the Parquet’d events; or (b) attaching predicted prob to the order’s tags field and reading it in a custom calculate_from_orders statistic. (a) is cleaner - the event stream is already the audit trail per nautilus-events.md.
- AUC - same shape as Brier. Custom statistic OR post-hoc pipeline joining WinProbEstimate events to position-close outcomes.
Recommendation: post-hoc pandas pipeline. Brier and AUC are
diagnostic, not real-time, so adding a PortfolioStatistic adds
runtime cost without runtime benefit. Build them as a separate
mk3_eval.py module that reads the Parquet event stream + positions
report.
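For the AUC half of such a post-hoc module, one dependency-free option is the pairwise (Mann-Whitney) formulation: the fraction of winner/loser pairs in which the winner received the higher predicted probability, with ties counted as half. This is a sketch under that definition; the function name and inputs are hypothetical, and the quadratic loop is only reasonable for small samples such as four weeks of trades.

```python
def roc_auc(predicted_probs, outcomes):
    """AUC as P(random winner is scored above random loser); ties count 0.5."""
    winners = [p for p, y in zip(predicted_probs, outcomes) if y == 1]
    losers = [p for p, y in zip(predicted_probs, outcomes) if y == 0]
    if not winners or not losers:
        raise ValueError("AUC needs at least one winner and one loser")
    concordant = 0.0
    for w in winners:
        for l in losers:
            if w > l:
                concordant += 1.0
            elif w == l:
                concordant += 0.5
    return concordant / (len(winners) * len(losers))


# Hypothetical predicted win probabilities and realized outcomes.
probs = [0.9, 0.2, 0.7, 0.4, 0.6]
wins = [1, 0, 1, 1, 0]
auc = roc_auc(probs, wins)
```

Running the same function over the MK2 and MK3 prediction sets gives directly comparable numbers, since neither depends on a decision threshold.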
M2 deliverable: “Daily MK2/MK3 decision diff < 5%”
The decision-diff harness pattern:
# Run identical inputs through both engines
mk2_decisions = mk2.run(date) # → list of (ts, action, contract)
mk3_decisions = mk3.run(date) # → list of (ts, action, contract)
# Generate MK3 reports
mk3_orders = mk3.trader.generate_orders_report()
mk3_positions = mk3.trader.generate_positions_report()
# Diff
diff_df = compare_decisions(mk2_decisions, mk3_orders)
diff_pct = diff_df["disagreement"].mean()
assert diff_pct < 0.05
The MK3 side comes free from generate_orders_report(). The MK2 side
needs a parallel adapter (the existing decisions.db queries already
produce something close - port the SELECT into a pandas DataFrame with
the same column shape as the Nautilus orders report). The harness
becomes a simple pandas inner-join on (ts_init, instrument_id) with
a side/status equality check.
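One possible shape for compare_decisions, written against plain row dicts so it runs without either engine. All names and rows here are hypothetical. Note one deliberate deviation from the inner join described above: the sketch uses an outer join, so a decision made by only one engine counts as a disagreement instead of silently dropping out of the denominator.

```python
def decision_key(row):
    """Join key shared by both sides: (ts_init, instrument_id)."""
    return (row["ts_init"], row["instrument_id"])


def compare_decisions(mk2_rows, mk3_rows):
    """Outer-join both decision sets and flag every key where the sides differ."""
    mk2 = {decision_key(r): r["side"] for r in mk2_rows}
    mk3 = {decision_key(r): r["side"] for r in mk3_rows}
    diff = []
    for key in sorted(set(mk2) | set(mk3)):
        mk2_side = mk2.get(key)  # None when MK2 made no decision at this key
        mk3_side = mk3.get(key)  # None when MK3 made no decision at this key
        diff.append({
            "key": key,
            "mk2_side": mk2_side,
            "mk3_side": mk3_side,
            "disagreement": mk2_side != mk3_side,
        })
    return diff


# Hypothetical decisions: one side mismatch and one decision missing from MK3.
mk2_rows = [
    {"ts_init": 1, "instrument_id": "SPY-A", "side": "BUY"},
    {"ts_init": 2, "instrument_id": "SPY-B", "side": "SELL"},
    {"ts_init": 3, "instrument_id": "SPY-C", "side": "BUY"},
    {"ts_init": 4, "instrument_id": "SPY-D", "side": "SELL"},
]
mk3_rows = [
    {"ts_init": 1, "instrument_id": "SPY-A", "side": "BUY"},
    {"ts_init": 2, "instrument_id": "SPY-B", "side": "SELL"},
    {"ts_init": 3, "instrument_id": "SPY-C", "side": "SELL"},
]

diff = compare_decisions(mk2_rows, mk3_rows)
diff_pct = sum(row["disagreement"] for row in diff) / len(diff)
```

Exact equality on ts_init assumes both engines stamp decisions identically; if they do not, the key would need to be bucketed (for example to the bar timestamp) before joining.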
Parquet export for downstream ML training
The cleanest path:
# Post-backtest
orders_report = engine.trader.generate_orders_report()
positions_report = engine.trader.generate_positions_report()
fills_report = engine.trader.generate_fills_report()
orders_report.to_parquet(f"runs/{run_id}/orders.parquet")
positions_report.to_parquet(f"runs/{run_id}/positions.parquet")
fills_report.to_parquet(f"runs/{run_id}/fills.parquet")
Plus the custom event stream (per nautilus-events.md):
# A logging Actor subscribes to all custom events and writes Parquet
class AuditLogger(Actor):
    def on_event(self, event):
        if isinstance(event, (ScoreUpdate, GateDecision, WinProbEstimate)):
            self._sink.append_parquet(event)
ML training joins these by (ts_event, position_id) to assemble a row
class of (features, predicted_prob, realized_outcome, realized_pnl).
That row class is the meta-model training set.
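The join can be sketched on plain dicts. This simplified version keys on position_id only and assumes one WinProbEstimate per position; the field names on the estimate rows (predicted_prob, features) are hypothetical, since the custom event schema is defined elsewhere.

```python
# Hypothetical audit-log rows, one WinProbEstimate per position.
estimates = [
    {"position_id": "P-1", "ts_event": 100, "predicted_prob": 0.72, "features": {"score": 0.8}},
    {"position_id": "P-2", "ts_event": 200, "predicted_prob": 0.41, "features": {"score": 0.3}},
    {"position_id": "P-9", "ts_event": 300, "predicted_prob": 0.55, "features": {"score": 0.5}},
]
# Hypothetical positions-report rows with realized_pnl pre-parsed to float.
positions = [
    {"position_id": "P-1", "realized_pnl": 120.0},
    {"position_id": "P-2", "realized_pnl": -45.0},
]


def build_training_rows(estimates, positions):
    """Pair each estimate with its position outcome; skip estimates with no closed position."""
    by_position = {p["position_id"]: p for p in positions}
    rows = []
    for est in estimates:
        pos = by_position.get(est["position_id"])
        if pos is None:
            continue  # estimate never became a closed position
        rows.append({
            "features": est["features"],
            "predicted_prob": est["predicted_prob"],
            "realized_outcome": 1 if pos["realized_pnl"] > 0 else 0,
            "realized_pnl": pos["realized_pnl"],
        })
    return rows


training_rows = build_training_rows(estimates, positions)
```

The predicted_prob and realized_outcome columns of the result are exactly the inputs the Brier and AUC calculations need.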
Do NOT use ParquetDataCatalog for this. That catalog is for
market data (quotes, trades, bars) - it has its own schema
conventions ({start_timestamp}_{end_timestamp}.parquet,
type-and-instrument partitioning). Analytics rows go to a separate
filesystem path (runs/{run_id}/...) so they don’t pollute the
catalog and so they can be regenerated without touching market data.
What’s NOT in MK3 reports (deferred)
- Customer-facing tearsheet UI - M5 deliverable; reuses the Plotly HTML output in an iframe.
- Per-tenant report partitioning - M4 deliverable; each TradingNode produces its own reports per the per-tenant cache scoping in nautilus-cache.md.
- Streaming reports to a database - out of scope; the doc explicitly says “Reports are generated from in-memory data structures. For large-scale analysis or long-running systems, consider persisting reports to a database for efficient querying.” Cortana persists to Parquet, not Postgres.
Limitations and gotchas
- No native Brier / AUC / log-loss / classification metrics. The built-in stats are PnL- and return-centric. Probability-vs-outcome metrics require custom statistics or a post-hoc pipeline. This is the single most important MK3 finding from this page.
- Snapshot omission silently undercounts realized PnL in NETTING OMS. Use the Trader helper unless you have a specific reason to call ReportProvider directly.
- quantity, filled_qty, price, last_px, last_qty, total, free, locked are strings, not floats. Preserves Decimal precision; requires explicit cast for math.
- ts_init and ts_last types differ across reports. Orders report keeps Unix nanoseconds. Order-fills report and fills report convert to datetime. Positions report mixes both (ts_init is nanos, ts_opened / ts_closed are datetimes). Reconcile types before cross-report joins.
- create_tearsheet requires the [visualization] extras (Plotly + scipy). PNG export additionally requires kaleido.
- Reports read from in-memory Cache. With Redis-backed live cache, reports reflect Redis state on demand - but if Redis is down or partially populated post-restart, reports are partial. Reconcile before reporting.
- HEDGING OMS does not use snapshots. Cortana on IBKR is NETTING so this doesn’t apply, but note for any future market that uses HEDGING (some FX prime brokers, some crypto venues).
- PositionStatusReport from adapters is NOT a report in this sense - that’s a reconciliation report variant for venue → engine state alignment (per nautilus-execution.md). Different concept, same word.
- Multi-currency aggregation requires user-provided exchange rates - Nautilus does not bundle a forex rate source. Cortana’s USD-throughout architecture sidesteps this.
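The string-column and mixed-timestamp gotchas above can both be handled by one normalization pass before any cross-report join. A minimal sketch on a single row dict, assuming integer Unix-nanosecond timestamps and string quantities; the helper names and the column subset are illustrative.

```python
from datetime import datetime, timezone
from decimal import Decimal

# Assumed subset of string-typed numeric columns, for the example only.
STRING_NUMERIC_COLUMNS = ("quantity", "filled_qty")
NANOS_PER_SECOND = 1_000_000_000


def ns_to_datetime(ts_ns):
    """Unix nanoseconds to an aware UTC datetime (sub-microsecond digits are dropped)."""
    seconds, nanos = divmod(int(ts_ns), NANOS_PER_SECOND)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=nanos // 1_000)


def normalize_row(row):
    """Cast string quantities to Decimal and integer nanosecond stamps to datetime."""
    out = dict(row)
    for col in STRING_NUMERIC_COLUMNS:
        if col in out:
            out[col] = Decimal(out[col])
    for col in ("ts_init", "ts_last"):
        if isinstance(out.get(col), int):
            out[col] = ns_to_datetime(out[col])
    return out


# Hypothetical orders-report row.
orders_row = {
    "quantity": "10",
    "filled_qty": "4",
    "ts_init": 1_700_000_000_123_456_789,
    "ts_last": 1_700_000_060_000_000_000,
}
normalized = normalize_row(orders_row)
remaining = normalized["quantity"] - normalized["filled_qty"]
```

Python datetimes carry microsecond precision only, so if nanosecond ordering matters for a join, keep the original integer column alongside the converted one.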
Open questions for the 2026-05-09 spike
- PortfolioStatistic calculation method signature. Source-read nautilus_trader/analysis/statistic.py to confirm whether calculate_from_X methods receive the raw input or a pre-aggregated shape, and whether they can opt into multiple buckets at once.
- Custom event → DataFrame conversion. Confirm whether engine.cache exposes custom events for report-style retrieval, or whether the audit-logger Actor + Parquet sink is the only path.
- In-flight tearsheet generation for live. Can create_tearsheet accept a live TradingNode’s trader, or is it backtest-engine-only? Affects whether the M3 dashboard can render a live tearsheet.
- Redis-backed cache report latency. With 1k-5k orders in cache and Redis on a separate host, what’s the round-trip time for generate_orders_report()? Affects whether to persist reports periodically vs generate-on-demand.
- Per-tenant report partitioning. Confirm reports are scoped to the local TradingNode’s cache automatically (they should be - one cache per node - but verify before M4).
See Also
- Nautilus Portfolio (parallel - analyzer, performance stat buckets, PortfolioStatistic registration, the customization seam for Brier/AUC)
- Nautilus Events - custom-event taxonomy that carries WinProbEstimate / ScoreUpdate / GateDecision for downstream-ML training rows
- Nautilus Execution - order/fill/position events that the reports flatten into DataFrames; reconciliation report variants (different “report” - same word)
- Nautilus Cache - the data source for every report; in-memory vs Redis-backed; per-tenant scoping
- Nautilus Tutorials - Quickstart and Backtest (High-Level) tutorials that exercise the report API
- 2026-05-09 Nautilus Spike Plan: ~/conductor/workspaces/cortanaroi-mk2/belo-horizonte/plans/2026-05-09-nautilus-spike.md
- MK3 Roadmap: ~/conductor/workspaces/cortanaroi-mk2/belo-horizonte/plans/2026-05-09-mk3-roadmap.md
- project_codex_review_p2s.md - meta-prob evaluation context (Brier/AUC is how we’ll compare MK3 meta-gate quality vs MK2)
Timeline
2026-05-07 | Cody - Filed during pre-spike concept mastery sweep batch 3.