Databento account setup - Cortana operator reference

One-page operator reference for the Databento account that funds Cortana MK3’s historical-OPRA replay (per databento-vs-uw-vs-ibkr-data-feeds.md). Captures sign-up shape, credential storage, schema names you’ll actually request, and the calc-before-pull discipline that keeps the $125 trial credits from disappearing on a fat-finger.

Account shape

  • Plan: free tier - $125 credits, 6-month expiration, one set per team.
  • Tier ladder (don’t auto-upgrade): Usage-based → Standard ($1,399/mo annual) → Unlimited ($3,500/mo annual). See databento-vs-uw-vs-ibkr-data-feeds.md § “Pricing tiers” for the upgrade trigger map.
  • Exchange agreements: OPRA requires a click-through agreement on sign-up. Personal/research use is in-bounds; redistribution (anything customer-facing under an MK3 SaaS) needs a separate license.

Credential storage

  • API key lives in the account dashboard. Generate one named cortana-mk3 so the audit trail is obvious.
  • Storage path: ~/.config/cortana/databento.env - single line DATABENTO_API_KEY=.... Never commit, never put in repo .env.
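  • Loader pattern: source ~/.config/cortana/databento.env from launchd preflight or shell rc, mirroring the IBKR / UW key pattern. The file holds a bare assignment with no export keyword, so wrap the source in set -a / set +a when child processes (Python, Nautilus) need to see the key.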
  • Rotation: treat like any vendor key - rotate if a workspace is archived to iCloud (per project_data_loss_april22.md) or shared with another machine.
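
For Python entry points that cannot rely on the shell having sourced the file, the key can be read straight from the storage path. This is a minimal sketch, assuming the file keeps the single-line KEY=value shape described above; parse_env_text and load_databento_key are hypothetical helper names, not part of the Databento SDK.

```python
from pathlib import Path

# Storage path taken from this page; the helper names below are illustrative.
DEFAULT_ENV_PATH = Path.home() / ".config" / "cortana" / "databento.env"


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            # Tolerate an optional leading "export " keyword.
            line = line[len("export "):]
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_databento_key(path: Path = DEFAULT_ENV_PATH) -> str:
    """Return the API key from the env file, failing loudly if it is absent."""
    key = parse_env_text(path.read_text()).get("DATABENTO_API_KEY", "")
    if not key:
        raise RuntimeError(f"DATABENTO_API_KEY missing or empty in {path}")
    return key
```

Reading the file directly keeps the key out of the repo .env and avoids depending on whether the calling shell exported the variable.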

Python SDK

# Direct SDK (for ad-hoc pulls / spike validation)
pip install databento
 
# Nautilus adapter (brings the SDK in transitively as a dep)
uv pip install "nautilus_trader[databento]"

The Nautilus adapter wraps the SDK; once nautilus_trader is installed with the databento extra, ad-hoc SDK calls and Nautilus DatabentoDataLoader ingest both work from the same env.

Schemas Cortana will actually use

Don’t request anything else during the spike - every additional schema burns credits.

Schema | What it is | Cortana use
trades | Every option print (executions) | UW alert cross-validation; raw flow reconstruction
mbp-1 | Top-of-book quote on the chain (best bid/ask + size) | Adverse-selection check for replay; quote at decision time
mbp-10 | 10-deep book on each side | Adversarial backtest only - defer until v2
mbo | Full L3 market-by-order | Overkill - defer indefinitely unless we want HFT-grade replay
ohlcv-1s / ohlcv-1m | Aggregated bars | Skip - Nautilus aggregates from trades if needed

Default spike pull: SPY (or SPY chain) - trades + mbp-1, single trading day, OPRA dataset.
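
The schema decisions in the table above can be enforced in code before any request goes out. The sketch below is a hypothetical guard: SPIKE_SCHEMAS, DEFERRED_SCHEMAS, and require_spike_schema are illustrative names, not part of the Databento SDK, and the sets only restate the table.

```python
# Hypothetical allow-list guard; names are illustrative, not Databento API.
SPIKE_SCHEMAS = frozenset({"trades", "mbp-1"})

# Reasons copied from the schema table so the error message explains itself.
DEFERRED_SCHEMAS = {
    "mbp-10": "adversarial backtest only - defer until v2",
    "mbo": "overkill - defer indefinitely",
    "ohlcv-1s": "skip - Nautilus aggregates from trades",
    "ohlcv-1m": "skip - Nautilus aggregates from trades",
}


def require_spike_schema(schema: str) -> str:
    """Return the schema unchanged if it is allowed during the spike, else raise."""
    if schema in SPIKE_SCHEMAS:
        return schema
    reason = DEFERRED_SCHEMAS.get(schema, "not on the spike allow-list")
    raise ValueError(f"schema {schema!r} rejected: {reason}")
```

Calling require_spike_schema on the schema argument of every request keeps a typo or a copy-pasted "mbo" from ever reaching the billing path.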

Calc-before-pull discipline (load-bearing)

Always estimate cost before kicking off a pull. The $125 trial budget disappears fast if you accidentally request a month of MBO across the whole chain.

# Cost preview pattern (Databento HTTP client)
import os

from databento import Historical

client = Historical(key=os.environ["DATABENTO_API_KEY"])

cost = client.metadata.get_cost(
    dataset="OPRA.PILLAR",
    schema="trades",
    symbols=["SPY.OPT"],        # parent symbology takes ROOT.OPT, not the bare ticker
    stype_in="parent",          # match parent symbol → all option contracts
    start="2026-05-06T13:30:00Z",
    end="2026-05-06T20:00:00Z",
)
print(f"Estimated cost: ${cost:.2f}")
# ABORT if > $30 for the spike validation

Hard rules:

  • Always call get_cost() (or the equivalent CLI dry-run) first.
  • Never request more than 1 trading day for the spike.
  • Hard cap: abort any single pull >$30. Re-scope and try again.
  • Use stype_in="parent" (with the symbol written as SPY.OPT) to resolve the whole option chain at once instead of enumerating every strike.
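
The hard rules above can be sketched as a pre-flight check that runs between the cost estimate and the actual pull. This is a minimal sketch under stated assumptions: check_pull and its constants are hypothetical names, the estimated cost is whatever get_cost() returned, and "one trading day" is approximated as a window of at most 24 hours.

```python
from datetime import datetime, timedelta

# Limits taken from the hard rules; the names are illustrative.
MAX_PULL_USD = 30.0
MAX_WINDOW = timedelta(days=1)  # approximation of "one trading day"


def _parse_utc(stamp: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def check_pull(start: str, end: str, estimated_cost: float) -> None:
    """Raise ValueError if the request breaks the window or cost cap."""
    t0, t1 = _parse_utc(start), _parse_utc(end)
    if t1 <= t0:
        raise ValueError("end must be after start")
    if t1 - t0 > MAX_WINDOW:
        raise ValueError(f"window {t1 - t0} exceeds one day; re-scope the pull")
    if estimated_cost > MAX_PULL_USD:
        raise ValueError(
            f"estimated ${estimated_cost:.2f} exceeds the ${MAX_PULL_USD:.2f} cap"
        )
```

With a $30 cap, the $125 credit covers at most four cap-sized pulls (4 × $30 = $120), so a single rejected request is cheap compared with one that slips through.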

Identifiers / publisher conventions

  • Dataset code: OPRA.PILLAR for US equity options (the consolidated options tape).
  • Symbology: Databento ships several stype_in modes - raw_symbol (the per-contract option symbol as published on OPRA), instrument_id (Databento numeric ID), parent (resolves a root to its whole chain, written ROOT.OPT). The spike uses parent for SPY, i.e. SPY.OPT.
  • Nautilus instrument-id mapping: SPY.ARCA (per nautilus-data.md) on the Nautilus side. Databento’s adapter handles the translation; if things look misaligned in the catalog, this is where to look.
  • Timestamps: Databento ships nanosecond UNIX epoch - matches Nautilus ts_event natively. No * 1e6 adjustment needed (unlike UW’s millisecond timestamps).
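
The timestamp note above amounts to one multiplication on the UW side and none on the Databento side. The sketch below illustrates it, assuming UW timestamps arrive as integer milliseconds since the UNIX epoch; ms_to_ns and ns_to_utc are hypothetical helper names.

```python
from datetime import datetime, timezone

NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000


def ms_to_ns(ts_ms: int) -> int:
    """Convert a millisecond epoch timestamp (UW-style) to nanoseconds."""
    return ts_ms * NS_PER_MS


def ns_to_utc(ts_ns: int) -> datetime:
    """Render a nanosecond epoch timestamp as an aware UTC datetime.

    datetime only resolves microseconds, so the last three digits are dropped.
    """
    seconds, nanos = divmod(ts_ns, NS_PER_S)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=nanos // 1_000)
```

Keeping the conversion in integer arithmetic avoids the float rounding that a literal * 1e6 would introduce at nanosecond magnitudes.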

Live data caveat

  • Usage-based live data was scheduled for deprecation on 2025-03-31 for most datasets.
  • Live OPRA effectively requires Plus ($1,399/mo annual).
  • Do not turn on live during the spike. Historical-only is the validated path.

Quick smoke test (before any real pull)

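# set -a exports the bare KEY=value assignment so the python child process can read it
set -a; source ~/.config/cortana/databento.env; set +a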
python -c "
import os
from databento import Historical
c = Historical(key=os.environ['DATABENTO_API_KEY'])
print(c.metadata.list_datasets()[:5])
"
# If this prints a list of datasets, the key works.

See also

  • concepts/databento-vs-uw-vs-ibkr-data-feeds.md - the layering decision and tier upgrade triggers
  • concepts/nautilus-data.md - ParquetDataCatalog, ingest pipeline
  • concepts/nautilus-tutorials.md § “Data Catalog with Databento” - Nautilus-side ingest pattern
  • concepts/nautilus-options.md § “docs-blessed vendor for replay” - why OPRA via Databento for options replay
  • plans/2026-05-09-nautilus-spike.md § Step 0.5 - the validation pull this account funds
  • ~/.claude/projects/.../memory/feedback_ibkr_pricing_source.md - pricing-of-record invariant; Databento is replay/audit only

Timeline

  • 2026-05-07 | Cody - Filed alongside the layering decision; account sign-up imminent. Captures pre-spike operator setup so Saturday’s Step 0.5 can run without re-deriving these conventions.