Databento vs UW vs IBKR - Cortana data-feed layering

Decision: for Cortana MK3, augment with Databento, do not replace UW or IBKR. Each vendor occupies a distinct role: IBKR for execution + pricing-of-record, UW for derived options-flow signals, Databento for historical-OPRA replay (and optionally raw live OPRA later). The Nautilus stack ties all three through one DataEngine → Cache → MessageBus pipeline; only Databento has an OOB Nautilus adapter, UW remains a custom DataClient, IBKR is built-in.

Tier recommendation: start on $125 free credits + usage-based (pay-as-you-go). Upgrade to Standard ($199/mo) only after deciding you want a continuous historical dataset; Plus ($1,399/mo) only if/when going live with Databento for paper-or-real flow. Cortana’s near-term need is one focused historical pull for spike validation - credits cover it.

Role layering (the load-bearing decision)

Concern	Vendor	Nautilus expression
Execution + pricing-of-record	IBKR	Built-in `InteractiveBrokersExecClient` + `InteractiveBrokersDataClient`
Live underlying ticks (SPY)	IBKR	Built-in `QuoteTick` / `Bar` from IBKR DataClient
Live options flow signals (sweeps/blocks/GEX/charm/vanna/dealer pos)	UW	Custom `UWFlowAlert(Data)` via custom `UWDataClient` (sketched in `nautilus-integrations.md`)
Historical OPRA tape for backtest replay	Databento	OOB `DatabentoDataLoader` → `ParquetDataCatalog` (per `nautilus-options.md` § “docs-blessed vendor for replay”)
Full L2/L3 options depth for adversarial backtest	Databento (MBP-10 / MBO schemas)	Native Nautilus types via Databento adapter
UW alert cross-validation (catch missed/dropped flow)	Databento	Reconstruct raw tape, diff against UW alerts

The pricing-source-of-truth invariant (feedback_ibkr_pricing_source.md) is preserved: IBKR remains the only vendor whose price string lands on an order. Databento is replay + audit; UW is signals only.
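
The invariant can be made mechanical with a small role registry checked before any price reaches an order. The sketch below is illustrative only: `Role`, `VENDOR_ROLES`, and `assert_price_source` are hypothetical names, not part of Nautilus or any vendor SDK, and the role assignments simply mirror the table above.

```python
from enum import Enum


class Role(Enum):
    EXECUTION = "execution"
    PRICING_OF_RECORD = "pricing_of_record"
    LIVE_UNDERLYING = "live_underlying"
    FLOW_SIGNALS = "flow_signals"
    HISTORICAL_REPLAY = "historical_replay"
    AUDIT = "audit"


# Hypothetical registry mirroring the role-layering table.
VENDOR_ROLES = {
    "IBKR": {Role.EXECUTION, Role.PRICING_OF_RECORD, Role.LIVE_UNDERLYING},
    "UW": {Role.FLOW_SIGNALS},
    "DATABENTO": {Role.HISTORICAL_REPLAY, Role.AUDIT},
}


def assert_price_source(vendor: str) -> None:
    """Raise ValueError unless `vendor` holds the pricing-of-record role."""
    if Role.PRICING_OF_RECORD not in VENDOR_ROLES.get(vendor, set()):
        raise ValueError(f"{vendor} may not supply a price for an order")


# Only the pricing-of-record vendor passes; UW, Databento, or unknown vendors raise.
assert_price_source("IBKR")
```

Calling a guard like this at the single point where an order price is set keeps the rule in one place instead of relying on convention across strategies.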

Why “augment, not replace”

Replacing UW with raw OPRA = 3-6 months of derivatives-quant work. UW pre-derives flow_score, sweep/block classification, GEX/charm/vanna aggregates, dark pool prints, dealer positioning. Databento gives raw prints. Rebuilding UW’s signal layer is the project, not a side quest.
Replacing IBKR is impossible. Databento is data-only; orders go nowhere through it. (And IBKR is the OOB Nautilus execution adapter.)
Adding Databento has a clear marginal win. Backtest replay fidelity is the gating need for the 80% win-rate mandate - decisions.db rows can be replayed through ParquetDataCatalog (per nautilus-data.md § “Replaying today’s decisions.db rows as Nautilus events”) and adversarial scenarios (what UW missed, what dealer flow looked like at the climax) require the raw OPRA tape Databento ships.

Pricing tiers (verified 2026-05-07)

Tier	Price	What’s included	Cortana fit
Free trial	$125 credits, 6 mo	Pay-as-you-go burndown	Use this first - covers spike-step historical pull
Usage-based	$/GB (no monthly)	Pay only what you use; OPRA + CME + NYSE + NASDAQ same $/GB rate	Default ongoing tier post-credits - until consumption is known
Standard	$199/mo	12 mo L1 history, 1 mo L2/L3 then PAYG	Worth it once you commit to continuous historical backfills
Plus	$1,399/mo (annual)	Standard + entire L1 history + live data	Only if going live with Databento as live tick source
Unlimited	$3,500/mo (annual)	Entire history, all schemas	Overkill for solo MK3

Notes:

All exchanges (CME, OPRA, NYSE, NASDAQ) are priced at the same $/GB rate - no exchange-tier surcharge on usage-based.
Live data has separate license-fee pass-through from publishers AND per-message billing on some datasets. Usage-based live was deprecated 2025-03-31 - live data effectively requires Plus or higher.
One free-credit set per team.
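
The tier choice reduces to simple arithmetic once the $/GB rate is known. The sketch below is a rough aid, not a pricing model: `EXAMPLE_USD_PER_GB` is a placeholder and NOT a Databento quote, and the break-even ignores that Standard only bundles 12 mo of L1 and 1 mo of L2/L3 history before pay-as-you-go applies.

```python
def breakeven_gb(monthly_fee: float, usd_per_gb: float) -> float:
    """GB per month above which a flat monthly fee beats pure usage-based billing."""
    if usd_per_gb <= 0:
        raise ValueError("usd_per_gb must be positive")
    return monthly_fee / usd_per_gb


def credits_remaining(credits: float, gb_pulled: float, usd_per_gb: float) -> float:
    """Credits left after a pull, floored at zero."""
    return max(0.0, credits - gb_pulled * usd_per_gb)


# PLACEHOLDER rate for illustration only; substitute the rate shown on the
# Databento pricing page for the dataset and schema actually being pulled.
EXAMPLE_USD_PER_GB = 10.0

# Monthly volume at which Standard ($199/mo) would start to pay for itself.
print(breakeven_gb(199.0, EXAMPLE_USD_PER_GB))

# Credits left after the < 1 GB spike pull, at the placeholder rate.
print(credits_remaining(125.0, 1.0, EXAMPLE_USD_PER_GB))
```

Until real consumption is measured, the useful output is the break-even volume: if monthly pulls stay well below it, usage-based remains the cheaper default.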

Spike validation (one-day pull)

Per the nautilus-spike plan, the validation pattern is:

Sign up, claim $125 credits.
Pull one day of SPY OPRA in MBP-1 + trades schemas via Databento HTTP API or CLI (target: < 1 GB, well within credit budget).
Use Nautilus’s DatabentoDataLoader (per nautilus-tutorials.md § “Data Catalog with Databento”) to ingest into a local ParquetDataCatalog.
Run a BacktestRunConfig over the catalog with the Cortana stub strategy. Confirm QuoteTick/TradeTick materialize correctly and timestamps align with decisions.db rows for the same day.

If the materialization + replay works in <90 min, the Databento path is proven. Defer live subscription decision until backtest value is concrete.
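
The timestamp-alignment check in the last step can be sketched with the standard library alone. This assumes both sides are available as UNIX-epoch nanosecond integers (Nautilus events carry nanosecond timestamps; the decisions.db timestamp format is an assumption and may need converting first). `nearest_gap_ns` and `misaligned` are hypothetical helper names.

```python
from bisect import bisect_left


def nearest_gap_ns(tick_ts: list[int], decision_ts: int) -> int:
    """Absolute gap (ns) between a decision timestamp and the closest tick.

    `tick_ts` must be sorted ascending.
    """
    if not tick_ts:
        raise ValueError("no ticks to align against")
    i = bisect_left(tick_ts, decision_ts)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = tick_ts[max(i - 1, 0): i + 1]
    return min(abs(t - decision_ts) for t in candidates)


def misaligned(tick_ts: list[int], decision_ts_list: list[int],
               tolerance_ns: int = 1_000_000_000) -> list[int]:
    """Decision timestamps that have no tick within `tolerance_ns`."""
    return [d for d in decision_ts_list if nearest_gap_ns(tick_ts, d) > tolerance_ns]


ticks = [0, 1_000_000_000, 5_000_000_000]
decisions = [900_000_000, 3_000_000_000]
print(misaligned(ticks, decisions, tolerance_ns=500_000_000))  # [3000000000]
```

An empty result for the spike day is the pass condition; any survivors point at either a clock/timezone mismatch or a gap in the pulled tape.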

Caveats

Vendor count creep. Three feeds = three auth surfaces, three rate limits, three failure modes. Mitigate via Nautilus’s adapter pattern (one DataClient per vendor; MessageBus topics keep callsites clean).
OPRA license semantics. OPRA historical data is generally fine for research; live OPRA has redistribution rules. Solo paper-trading use is in-bounds; anything customer-facing later requires reading Databento’s terms.
Latency reality check. Databento live is ~5–20ms region-dependent vs IBKR ~50–200ms. Only matters if Cortana’s reaction-time budget is feed-bound, not signal-compute-bound. Today: signal compute dominates.

Open threads

Which Databento schema(s) does Cortana actually need for replay? Likely OPRA Trades (every option print) + MBP-1 (top-of-book on the chain) for starters. MBO (full L3) is overkill until adversarial backtest framework is built. Decide during spike Step 0.5.
How does UW’s flow_score correlate with raw-OPRA-derived sweep/block detection? Unknown until cross-validation runs.
Multi-tenant cost model: at 1000 customers, do we share one Databento historical dataset (single team account, internal redistribution) or require each customer to BYO Databento credentials? Affects MK3 SaaS unit economics.
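
One way to start on the cross-validation thread is a plain set diff between events detected on the raw tape and the alerts UW actually emitted. The sketch below assumes both have already been reduced to `(contract, timestamp_ns)` pairs; the contract strings, the one-second bucket, and the adjacent-bucket tolerance are all illustrative choices, not anything UW or Databento defines.

```python
def bucket(ts_ns: int, width_ns: int = 1_000_000_000) -> int:
    """Map a nanosecond timestamp to a fixed-width time bucket."""
    return ts_ns // width_ns


def missed_by_uw(raw_events, uw_alerts, width_ns: int = 1_000_000_000):
    """Raw-tape events with no UW alert on the same contract in the same or an adjacent bucket."""
    seen = {(sym, bucket(ts, width_ns)) for sym, ts in uw_alerts}
    missed = []
    for sym, ts in raw_events:
        b = bucket(ts, width_ns)
        # Check neighbouring buckets so events near a bucket edge still match.
        if not any((sym, b + d) in seen for d in (-1, 0, 1)):
            missed.append((sym, ts))
    return missed


# Hypothetical contract identifiers and timestamps.
raw_events = [("SPY-C-600", 1_200_000_000), ("SPY-P-580", 9_000_000_000)]
uw_alerts = [("SPY-C-600", 2_100_000_000)]
print(missed_by_uw(raw_events, uw_alerts))  # [('SPY-P-580', 9000000000)]
```

Swapping the arguments gives the reverse diff (UW alerts with no supporting raw-tape event), which is the other half of the audit.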

Timeline

2026-05-07 | Cody - Filed during pre-spike vendor-layering decision; Cody asked “should I sign up for Databento?” Decision: augment, not replace; start on $125 credits + usage-based.
2026-05-07 | Cody - Committed: signing up for Databento. Will use $125 free credits to fund Step 0.5 Databento validation Saturday 2026-05-09. Spike playbook at ~/brain/concepts/nautilus-databento.md. Two blocking unknowns to resolve first thing Saturday: (1) confirm OPRA dataset code (OPRA.PILLAR is the assumption - run databento datasets to verify), (2) check OPRA venue MIC tagging vs IBKR .ARCA listing - re-stamp during catalog ingest if mismatched.
2026-05-07 | Cody - Asked whether Databento’s options + order-book capabilities (0DTE examples, IV estimation, MBP-10/MBO reconstruction) could displace UW. Resolution: yes long-term, no for V1. Decision: stays “backtest only” through M2; Standard tier ($199/mo) when ready for ongoing historical research; Plus tier ($1,399/mo for live tape) deferred to M3-M4 transition when Cortana has authored own signal classification on raw OPRA. The dealer-flow/order-book angle is the load-bearing reason to ever subscribe to live Databento - replacing UW’s interpretation (signal classification, dealer aggregates, GEX/charm/vanna) with own-built classifiers is months of derivatives-quant work, not a side quest. Filed as future research direction; do not let it leak into V1 scope.

2026-05-07 (late evening) | Cody - Purchased UW data shop bundle for SPY: 10 datasets, $1,531 total, all parquet, all single-ticker SPY (Market Tide is market-wide). Corrects my earlier wrong claim that “UW historical = REST-only and shallow.” Actual UW historical product shape is one-time-priced parquet bundles in the data shop, mostly cheap, with depth varying by dataset.

Inventory (Order ID 8158d271-4582-4&a8-ae50-9bf6033316a):

Dataset	URL slug	Granularity	Window	Price	Schema highlights
Market Tide	`/data_shop/market_tide`	1-minute, market-wide	1y	$500	date / net_call_premium / net_put_premium / net_volume / timestamp
Big Option Trades	`/data_shop/big_option_trades`	per-trade (>$25k or >150 contracts)	1y	$180	executed_at / nbbo_bid / nbbo_ask / size / price / option_chain_id / expiry / option_type / open_interest / strike / premium / underlying_price / ewma_nbbo_bid / ewma_nbbo_ask / volume / iv / delta / theta / gamma / vega / rho / theo / market_center_locate / canceled / trade_id / exchange
Net Flow	(TBD)	TBD	1y	$500	schema unverified at purchase; inspect parquet
Option Chains	`/data_shop/option_chains`	per-strike chain snapshots	1y	$180	(assumed: strike, side, expiry, OI, volume, Greeks)
Net Flow Holdings	`/data_shop/net_flow_holdings`	1-minute SPY-specific	90d only	$135	date / net_call_premium / net_put_premium / net_volume / timestamp / underlying_price
Gamma Exposure	`/data_shop/greek_exposure` (named `gamma_exposure`)	daily SPY	1y	$10	date / put_gex / call_gex / net_gex / put_call_gex_ratio
Delta Exposure	(parallel `delta_exposure` SKU)	daily SPY	1y	$10	(assumed: date / put_dex / call_dex / net_dex)
IV Rank	(parallel `iv_rank` SKU)	daily SPY	1y	$10	(assumed: date / iv_rank / iv_percentile)
OHLC Daily	(parallel `ohlc_daily` SKU)	daily SPY	5y	$3	OHLCV
OHLC Intraday	(parallel `ohlc` SKU)	intraday SPY	1y	$3	OHLCV

Key findings:

Big Option Trades is the load-bearing dataset for backtest. Per-trade with NBBO + full Greeks + OI but NOT pre-classified - sweep / block / OTM / opening / aggressor_side / flow_score are derived at backtest time from the raw schema (~150 LOC of derivation logic, validated against MK2’s live flow-alerts labels at target ≥85% agreement). This is structurally the MK3→MK4 “displace UW interpretation” thread executed early at $180 instead of months of derivatives-quant work to rebuild from raw OPRA. Strategic upside: we own the heuristic and can tune it.
Pricing pattern: UW prices by (granularity × scope × window). Per-trade > per-minute > daily. Full-market > single-ticker. 1y > 90d > 30d. Daily-aggregated single-ticker products are essentially free ($3-10). Per-minute single-ticker is mid-priced ($135-500). Per-trade single-ticker is the most expensive but still bounded ($180).
Net Flow Holdings only 90d - minute-granularity SPY flow with underlying_price. Limits backtest window for features that depend on it. Acceptable; load-bearing flow signal for 1Y is Big Option Trades.
Daily GEX/DEX provide regime classification, NOT intra-day regime gating. MK2’s live spot-exposures endpoint is intra-day 1-min; data shop daily product gives only end-of-day snapshots. For V1 backtest, daily GEX is good enough for chop-vs-trend regime classification per day, which IS what the post-stabilization scoring engine review needs. Intra-day GEX SKU may exist; if found, future purchase.
The MK3→MK4 trajectory shifts. Earlier I framed Databento Plus tier ($1,399/mo for live raw OPRA) as the path to displace UW. With Big Option Trades’ raw schema available at $180 one-time, the displacement happens in the historical bundle: we develop the classification heuristics on this 1Y of UW-pre-filtered raw trades, then either (a) keep buying yearly UW updates ($180/yr trivial) or (b) renegotiate UW relationship by demonstrating own-classification capability + Databento Plus for live raw OPRA. Plan B’s $1,399/mo cost has a clearer path-to-payback now.
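
A first cut of the derivation logic described in the findings above might look like the following. It is a partial sketch, not the full ~150 LOC: field names come from the Big Option Trades schema row, but the `"call"`/`"put"` encoding of `option_type`, the 500-contract block threshold, and every function name are assumptions to be checked against the actual parquet. Sweep detection (which needs multi-exchange grouping) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    price: float
    size: int
    nbbo_bid: float
    nbbo_ask: float
    strike: float
    underlying_price: float
    option_type: str  # ASSUMPTION: encoded as "call" or "put"


def aggressor_side(t: Trade) -> str:
    """Quote rule: at/above the ask is buyer-initiated, at/below the bid is seller-initiated."""
    if t.price >= t.nbbo_ask:
        return "buy"
    if t.price <= t.nbbo_bid:
        return "sell"
    return "mid"


def is_otm(t: Trade) -> bool:
    """Out-of-the-money relative to the underlying price at execution."""
    if t.option_type == "call":
        return t.strike > t.underlying_price
    return t.strike < t.underlying_price


def is_block(t: Trade, min_size: int = 500) -> bool:
    """Size-only block test; `min_size` is a hypothetical threshold to tune."""
    return t.size >= min_size


def agreement(derived: list, reference: list) -> float:
    """Fraction of positions where derived labels match reference labels."""
    if not derived or len(derived) != len(reference):
        raise ValueError("label lists must be non-empty and equal length")
    return sum(d == r for d, r in zip(derived, reference)) / len(derived)
```

`agreement` is the number to hold against the ≥85% target when comparing derived labels to MK2’s live flow-alert labels; note that a locked market (bid equal to ask) is classified as "buy" by this rule and may deserve its own case.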
