Databento account setup - Cortana operator reference

One-page operator reference for the Databento account that funds Cortana MK3’s historical-OPRA replay (per databento-vs-uw-vs-ibkr-data-feeds.md). Captures sign-up shape, credential storage, schema names you’ll actually request, and the calc-before-pull discipline that keeps the $125 trial credits from disappearing on a fat-finger.

Account shape

Plan: free tier - $125 credits, 6-month expiration, one set per team.
Tier ladder (don’t auto-upgrade): Usage-based → Standard ($199/mo) → Plus ($1,399/mo annual) → Unlimited ($3,500/mo annual). See databento-vs-uw-vs-ibkr-data-feeds.md § “Pricing tiers” for the upgrade trigger map.
Exchange agreements: OPRA requires a click-through agreement on sign-up. Personal/research use is in-bounds; redistribution (anything customer-facing under an MK3 SaaS) needs a separate license.

Credential storage

API key lives in the account dashboard. Generate one named cortana-mk3 so the audit trail is obvious.
Storage path: ~/.config/cortana/databento.env - single line DATABENTO_API_KEY=.... Never commit, never put in repo .env.
Loader pattern: source ~/.config/cortana/databento.env from launchd preflight or shell rc, mirroring the IBKR / UW key pattern.
Rotation: treat like any vendor key - rotate if a workspace is archived to iCloud (per project_data_loss_april22.md) or shared with another machine.
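
For callers that cannot rely on a shell having sourced the file (a notebook kernel, for example), the same file can be read directly from Python. This is a minimal sketch, assuming the file holds plain KEY=VALUE lines as described above; parse_env_text and load_databento_key are hypothetical helper names, not part of the Databento SDK.

```python
import os
from pathlib import Path

# Storage path from this page; adjust if the convention changes.
ENV_PATH = Path.home() / ".config" / "cortana" / "databento.env"


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes, if any.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_databento_key(path: Path = ENV_PATH) -> str:
    """Prefer an already-exported variable; fall back to reading the file."""
    key = os.environ.get("DATABENTO_API_KEY")
    if key:
        return key
    return parse_env_text(path.read_text())["DATABENTO_API_KEY"]
```

The parser is kept separate from the file read so it can be checked without touching the real key file.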

Python SDK

# Direct SDK (for ad-hoc pulls / spike validation)
pip install databento
 
# Nautilus adapter (brings the SDK in transitively as a dep)
uv pip install "nautilus_trader[databento]"

The Nautilus adapter wraps the SDK; once nautilus_trader is installed with the databento extra, ad-hoc SDK calls and Nautilus DatabentoDataLoader ingest both work from the same env.

Schemas Cortana will actually use

Don’t request anything else during the spike - every additional schema burns credits.

Schema	What it is	Cortana use
`trades`	Every option print (executions)	UW alert cross-validation; raw flow reconstruction
`mbp-1`	Top-of-book quote on the chain (best bid/ask + size)	Adverse-selection check for replay; quote at decision time
`mbp-10`	10-deep book on each side	Adversarial backtest only - defer until v2
`mbo`	Full L3 market-by-order	Overkill - defer indefinitely unless we want HFT-grade replay
`ohlcv-1s` / `ohlcv-1m`	Aggregated bars	Skip - Nautilus aggregates from `trades` if needed

Default spike pull: SPY (or SPY chain) - trades + mbp-1, single trading day, OPRA dataset.

Calc-before-pull discipline (load-bearing)

Always estimate cost before kicking off a pull. The $125 trial budget disappears fast if you accidentally request a month of MBO across the whole chain.

# Cost preview pattern (Databento Historical client)
import os

from databento import Historical

client = Historical(key=os.environ["DATABENTO_API_KEY"])

cost = client.metadata.get_cost(
    dataset="OPRA.PILLAR",
    schema="trades",
    symbols=["SPY.OPT"],        # parent symbology expects ROOT.OPT for options
    stype_in="parent",          # match parent symbol → all option contracts
    start="2026-05-06T13:30:00Z",
    end="2026-05-06T20:00:00Z",
)
print(f"Estimated cost: ${cost:.2f}")
# ABORT if > $30 for the spike validation

Hard rules:

Always call get_cost() (or the equivalent CLI dry-run) first.
Never request more than 1 trading day for the spike.
Hard cap: abort any single pull >$30. Re-scope and try again.
Use stype_in="parent" (with a parent symbol such as SPY.OPT) to resolve the whole option chain at once instead of enumerating every strike.
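
The hard rules can be enforced in code rather than by memory. The sketch below is one way to do that, assuming the cost estimator is passed in as a callable (for example client.metadata.get_cost) and that start and end are ISO-8601 UTC strings; check_pull and PullRejected are hypothetical names introduced here for illustration.

```python
from datetime import datetime, timedelta

MAX_COST_USD = 30.0              # hard cap per pull, from the rules above
MAX_WINDOW = timedelta(days=1)   # at most one trading day for the spike


class PullRejected(Exception):
    """Raised when a request breaks one of the hard rules."""


def _parse_utc(ts: str) -> datetime:
    # Older Pythons' fromisoformat() does not accept a trailing "Z".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def check_pull(get_cost, **request) -> float:
    """Return the estimated cost in USD, or raise PullRejected."""
    window = _parse_utc(request["end"]) - _parse_utc(request["start"])
    if window > MAX_WINDOW:
        raise PullRejected(f"window {window} exceeds one day")
    cost = float(get_cost(**request))
    if cost > MAX_COST_USD:
        raise PullRejected(f"estimated ${cost:.2f} exceeds ${MAX_COST_USD:.2f} cap")
    return cost
```

Calling check_pull(client.metadata.get_cost, ...) with the same keyword arguments as the cost preview returns the estimate when the request is in bounds; only then should the real pull be issued with those arguments.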

Identifiers / publisher conventions

Dataset code: OPRA.PILLAR for US equity options (the consolidated options tape).
Symbology: Databento ships several stype_in modes - raw_symbol (the option symbol as published on OPRA), instrument_id (Databento numeric ID), parent (resolves a root such as SPY.OPT to the whole chain). The spike uses parent for SPY.
Nautilus instrument-id mapping: SPY.ARCA (per nautilus-data.md) on the Nautilus side. Databento’s adapter handles the translation; if things look misaligned in the catalog, this is where to look.
Timestamps: Databento ships nanosecond UNIX epoch - matches Nautilus ts_event natively. No * 1e6 adjustment needed (unlike UW’s millisecond timestamps).
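
To make the unit difference concrete, here is a small sketch of the two conversions involved. It assumes integer epoch inputs; ms_to_ns and ns_to_utc are hypothetical helper names for illustration, not Databento or Nautilus functions.

```python
from datetime import datetime, timezone

NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000


def ms_to_ns(ts_ms: int) -> int:
    """Scale a millisecond epoch (UW-style) to a nanosecond epoch."""
    return ts_ms * NS_PER_MS


def ns_to_utc(ts_ns: int) -> datetime:
    """Convert a nanosecond epoch to an aware UTC datetime.

    datetime only holds microseconds, so sub-microsecond digits are dropped.
    """
    seconds, remainder_ns = divmod(ts_ns, NS_PER_S)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=remainder_ns // 1_000)
```

Integer arithmetic is used throughout because a float cannot represent a present-day nanosecond epoch exactly.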

Live data caveat

Usage-based live data was slated for deprecation on 2025-03-31 for most datasets.
Live OPRA effectively requires Plus ($1,399/mo annual).
Do not turn on live during the spike. Historical-only is the validated path.

Quick smoke test (before any real pull)

source ~/.config/cortana/databento.env
python -c "
import os
from databento import Historical
c = Historical(key=os.environ['DATABENTO_API_KEY'])
print(c.metadata.list_datasets()[:5])
"
# If this prints a list of datasets, the key works.

Timeline

2026-05-07 | Cody - Filed alongside the layering decision; account sign-up imminent. Captures pre-spike operator setup so Saturday’s Step 0.5 can run without re-deriving these conventions.
