Leading-signal method families for 0DTE SPY (the timing-method gap)

The core reframe (this is the whole point)

The MK2 failure (“late and wrong, crushed on puts”) is not primarily a missing-feed problem. Codebase inventory 2026-05-18 shows MK2 already ingests ~19 feeds and computes ~24 “leading” methods. The failures are:

  1. Open interest is a stock; aggressive flow is a flow. The engine collapses the two: it reads a static 1–11k OI level as though it were flow. Volume-vs-OI (volume > OI = new positioning; volume ≪ OI = closing churn on the old stack) is the direct antidote and is under-used.
  2. GEX is computed but used as score points, not as a regime GATE. GEX is never directional - it is a mean-reversion-vs-momentum regime classifier. “Crushed on puts” is textbook buying puts in a positive-gamma (dealer dip-buying) regime that mean-reverts the move. Using GEX for directional score points is the category error.
  3. Entry fires at climax because the exhaustion gate is partial and there is no start-of-move (CUSUM) trigger - so it confirms late instead of detecting the leg’s start.
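
The volume-vs-OI rule in failure 1 can be sketched as a per-strike classifier. This is a minimal illustration, not MK2 code: the `StrikeSnapshot` fields, the label strings, and the `CHURN_RATIO` cutoff for "volume ≪ OI" are all assumptions, since the note gives only the qualitative rule.

```python
from dataclasses import dataclass

# Assumed cutoff for "volume << OI"; the note does not give a number.
CHURN_RATIO = 0.25


@dataclass(frozen=True)
class StrikeSnapshot:
    """Hypothetical per-strike record; field names are illustrative."""
    strike: float
    day_volume: int     # contracts traded so far today (a flow)
    open_interest: int  # prior-day open interest (a stock)


def classify_positioning(snap: StrikeSnapshot,
                         churn_ratio: float = CHURN_RATIO) -> str:
    """Label a strike by comparing today's volume with standing OI."""
    if snap.open_interest <= 0:
        # No old stack to churn: any volume must be new positioning.
        return "new_positioning" if snap.day_volume > 0 else "inactive"
    ratio = snap.day_volume / snap.open_interest
    if ratio > 1.0:
        return "new_positioning"  # volume > OI
    if ratio < churn_ratio:
        return "closing_churn"    # volume << OI
    return "ambiguous"            # in between: treat as no signal


if __name__ == "__main__":
    print(classify_positioning(StrikeSnapshot(500.0, 12_000, 4_000)))
    print(classify_positioning(StrikeSnapshot(500.0, 500, 11_000)))
```

The point of the sketch is that OI only enters as a denominator; it is never read as a flow on its own.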

So: fix the use of existing data before buying new data. See 2026-05-18-put-side-structural-loss-and-bypass-impulse, 2026-05-15-mk3-setup-hunter-architecture, 2026-05-14-five-corpus-rules-for-mk3.

Method families: leads/lags, data cost, bars-vs-tick

| Family | Leads? | Data cost | Bars OK? | MK2 status |
|---|---|---|---|---|
| Intraday GEX / zero-gamma-flip regime gate | Regime (gates direction) | Low (have it) | Yes (min) | Computed, misused as score pts not a gate |
| Volume-climax + range-decel exhaustion VETO | Negative filter | Trivial (1-min) | Yes | Partial/weak (GH #84 pending) |
| CUSUM start-of-move trigger | Yes (leg start) | Trivial (1-min) | Yes | Absent (regime-change = discrete GEX flip only) |
| ES-leads-SPY (post-midday) / VIX1D / term | Yes (regime/size) | Low (have ES+VIX) | Yes | ES basis + VIX term present; VIX1D not isolated |
| NYSE TICK / $ADD breadth | Yes (breadth aggressor) | Low (standard retail) | 5–15s best | ABSENT - genuine cheap feed gap |
| Volume-vs-OI new-positioning filter | Yes (flow≠stock) | Low (have UW) | Yes | Under-used; the static-OI bug |
| Trade aggressor (Lee-Ready/tick rule) | Yes (~20-tick lead) | High (tick+quote) | No | UW options-side only; no equity-tape classify |
| BVC signed flow | Yes (probabilistic) | Low (1-min) | Yes (bar-native) | Absent (the no-tick-feed fallback) |
| OFI / Kyle’s λ | Yes (strong) | High (L2/MBO) | No | Absent → MK3/Databento |
| BOCPD on signed flow | Yes (regime transition) | High | No | Absent → MK3 |
| VPIN, naked Kyle’s λ | Hype-flag | High | - | Skip (VPIN ≈ volatility proxy, Andersen-Bondarenko 2014) |
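
The CUSUM start-of-move row needs nothing beyond 1-min closes. Below is a minimal sketch of a symmetric CUSUM on log returns with a volatility-scaled threshold. The function name and the parameters `vol_window`, `h_mult`, and `k_mult` are assumptions chosen for illustration, not tuned values, and resetting both sums after a trigger is one common convention rather than anything specified in this note.

```python
import math
from typing import List, Sequence, Tuple


def cusum_triggers(closes: Sequence[float],
                   vol_window: int = 30,
                   h_mult: float = 4.0,
                   k_mult: float = 0.5) -> List[Tuple[int, int]]:
    """Return (bar_index, direction) start-of-move events.

    direction is +1 for an up-leg and -1 for a down-leg. The threshold
    (h_mult * sigma) and the per-bar slack (k_mult * sigma) both scale
    with trailing realized volatility, so quiet and noisy sessions are
    treated comparably.
    """
    rets = [math.log(closes[i] / closes[i - 1]) for i in range(1, len(closes))]
    events: List[Tuple[int, int]] = []
    s_pos = 0.0  # cumulative upward drift
    s_neg = 0.0  # cumulative downward drift
    for i in range(vol_window, len(rets)):
        window = rets[i - vol_window:i]  # trailing only, excludes current bar
        mean = sum(window) / vol_window
        sigma = math.sqrt(sum((x - mean) ** 2 for x in window) / (vol_window - 1))
        if sigma == 0.0:
            continue  # no volatility estimate available for this bar
        s_pos = max(0.0, s_pos + rets[i] - k_mult * sigma)
        s_neg = min(0.0, s_neg + rets[i] + k_mult * sigma)
        if s_pos > h_mult * sigma:
            events.append((i + 1, +1))  # rets[i] ends at closes[i + 1]
            s_pos = s_neg = 0.0
        elif s_neg < -h_mult * sigma:
            events.append((i + 1, -1))
            s_pos = s_neg = 0.0
    return events


if __name__ == "__main__":
    # 41 bars of flat chop, then a steady 5 bp/bar decline.
    closes = [500.0 + (0.05 if i % 2 else 0.0) for i in range(41)]
    for _ in range(20):
        closes.append(closes[-1] * (1 - 0.0005))
    print(cusum_triggers(closes))
```

Because the volatility window trails the current bar, the trigger reacts to drift that is large relative to the recent chop, which is what lets it fire near the start of a leg instead of after confirmation.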

Ranked fixes by edge-per-cost (retail-accessible)

  1. GEX/zero-gamma-flip as a hard regime gate - no new data. Above flip = mean-revert regime → suppress momentum/put continuation; below = momentum regime → trend entries (incl. puts) allowed. Directly explains and fixes “crushed on puts.” Highest leverage, zero feed cost.
  2. Exhaustion VETO + CUSUM start-of-move - 1-min bars only. Veto = volume-climax + range-expansion-then-deceleration (kills the climax entry). CUSUM (vol-scaled threshold) = fire at drift start. Must be trend-day-gated (don’t veto one-way-day continuations).
  3. Add NYSE TICK / $ADD breadth feed - the one genuinely missing cheap feed. Breadth-wide aggressor pressure; leading; needs regime disambiguation (a TICK extreme = reversal on a normal day, continuation if it persists >10 min on a trend day).
  4. Volume-vs-OI new-positioning filter + sweep tags as weak prior - corrects the static-OI-as-flow category error directly.
  5. BVC signed flow - only bar-native aggressor proxy; use if no tick budget.
  6+. True OFI / Lee-Ready / BOCPD-on-flow → MK3 via Databento (paid microstructure; degrades in fast bursts anyway - not a quick win).
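
Fix 1 above can be sketched as a boolean gate that runs before any scoring. This is an illustration under stated assumptions: the setup names in `MOMENTUM_SETUPS` are hypothetical (MK2's real identifiers are not given here), spot exactly at the flip is treated as momentum, and setups other than momentum continuations pass through ungated because the note does not say how to treat them.

```python
from enum import Enum


class Regime(Enum):
    MEAN_REVERT = "mean_revert"  # spot above the zero-gamma flip
    MOMENTUM = "momentum"        # spot at or below the flip (assumed tie rule)


# Hypothetical setup names, for illustration only.
MOMENTUM_SETUPS = frozenset({"call_continuation", "put_continuation"})


def classify_regime(spot: float, zero_gamma_flip: float) -> Regime:
    """GEX classifies the regime only; it says nothing about direction."""
    return Regime.MEAN_REVERT if spot > zero_gamma_flip else Regime.MOMENTUM


def allow_entry(setup: str, spot: float, zero_gamma_flip: float) -> bool:
    """Hard gate: momentum continuations are allowed only below the flip."""
    regime = classify_regime(spot, zero_gamma_flip)
    if setup in MOMENTUM_SETUPS:
        return regime is Regime.MOMENTUM
    return True  # assumption: other setup kinds are not gated here


if __name__ == "__main__":
    # Put continuation above the flip: suppressed (the "crushed on puts" case).
    print(allow_entry("put_continuation", spot=512.0, zero_gamma_flip=510.0))
    # Same setup below the flip: allowed.
    print(allow_entry("put_continuation", spot=508.0, zero_gamma_flip=510.0))
```

The gate returns a yes/no and contributes no score points, so a strong score elsewhere cannot override the regime.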

Hype / category-error flags (do NOT chase)

  • VPIN is largely a volatility proxy (Andersen & Bondarenko 2014) - a risk/sizing conditioner at best, never an informed-flow trigger.
  • Options “sweep” tags = weak prior; many are routing artifacts, not conviction. MK2 may be over-trusting them.
  • GEX as a directional signal = the exact category error driving the put losses. GEX classifies regime, never direction.
  • “ES always leads SPY” = false; cash/ETF leads in the morning, ES takes over post-midday. Lead direction is time-of-day dependent.
  • Lee-Ready/tick-rule classification accuracy has degraded to ~77–79% in the HFT era and fails worst in fast bursts, i.e., exactly the climax moments. BVC (~90% at coarse buckets) is the bar-native alternative.
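
For reference, bulk volume classification splits each bar's volume into buy and sell fractions using the standardized price change: buy fraction = Φ(ΔP/σ), so signed volume = V·(2Φ(ΔP/σ) − 1). A minimal sketch follows, assuming a standard normal CDF and a full-sample σ; the function name is illustrative, and a live version would need a trailing σ to avoid look-ahead.

```python
from statistics import NormalDist, stdev
from typing import List, Sequence


def bvc_signed_volume(closes: Sequence[float],
                      volumes: Sequence[float]) -> List[float]:
    """Signed volume per bar via bulk volume classification.

    closes and volumes are aligned per bar. The result has one entry per
    bar from the second bar on: positive = net buying, negative = net
    selling, bounded in magnitude by that bar's volume.
    """
    if len(closes) != len(volumes) or len(closes) < 3:
        raise ValueError("need aligned series with at least 3 bars")
    changes = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    sigma = stdev(changes)  # full-sample sigma: fine offline, look-ahead live
    if sigma == 0.0:
        return [0.0] * len(changes)
    phi = NormalDist().cdf
    return [
        volumes[i + 1] * (2.0 * phi(changes[i] / sigma) - 1.0)
        for i in range(len(changes))
    ]


if __name__ == "__main__":
    closes = [500.0, 500.2, 500.2, 499.9]
    volumes = [0.0, 1000.0, 1000.0, 1000.0]
    print(bvc_signed_volume(closes, volumes))
```

The output is probabilistic by construction: an unchanged bar yields zero signed flow, and only large standardized moves push a bar toward fully buy or fully sell.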

The microstructure feed path = MK3 / Databento (Nautilus-native)

Databento OPRA (options) + equity tape decodes natively to Nautilus TradeTick with aggressor_side embedded; MBP-1/MBP-10/MBO give L1/L2/L3 for OFI/λ. Path: DBN → ParquetDataCatalog (convert once) → BacktestEngine streams it. This is also the answer to “do the 1-yr backtest the Nautilus way.” Spike (1 day) < ~$30 on free credits; full 1-year needs metadata.get_cost budget confirmation. See nautilus-databento, nautilus-data, nautilus-backtesting. Per 0dte-ml-best-in-class-comparison: path to 80% WR = clean labels + meta-veto + microstructure (not a better model); needs n≥500 + 6–12mo - the Databento microstructure stack is the enabling input.

Prerequisite flag (not a method)

The 1-year UW data-shop bundle purchased 2026-05-07 ($1,531, 10 datasets) appears not to have been downloaded on this machine (RED in 2026-05-15-mk3-data-foundation-constraint). It gates the lead-lag / regime-conditioned backtests, so it must be re-acquired from the UW account before those analyses can run; this is a download prerequisite, not a build task.

Sources

Lee-Ready/BVC era-degradation: Chakrabarty/Pascual/Shkilko (JFM 2015); Panayides/Shohfi/Smith. OFI/VPIN/λ: Easley/López de Prado/O’Hara; Andersen & Bondarenko 2014 (VPIN≈vol proxy). GEX/charm/vanna: SpotGamma, MenthorQ, FlashAlpha. Cross-asset: Hasbrouck 2003 (ES price discovery); Cboe VIX1D; NYSE TICK trend-day (tosindicators). Change-point: BOCPD order-flow (arXiv 2307.02375), BOCPD financial TS (ACM 2025). Feed vendors: Databento OPRA.PILLAR, Polygon, CBOE LiveVol. (Full URLs in the 2026-05-18 research agent transcript.)