2026-05-05 Overnight ML Pipeline Rebuild
12-hour push from 2026-05-04 evening to 2026-05-05 6:30 AM. Goal: stop the bleeding on the ML data layer (silent label rot since Apr 17), then pivot from “more XGBoost” to meta-labeling per the literature. Result: 18 commits, full test suite green for the first time in days, secondary classifier live in shadow mode with charm/vanna magnitude features empirically validated by mutual-information selection.
What broke (verified, not assumed)
Three bugs that had been silently corrupting the ML data loop, plus three more found along the way:
- `outcome_resolved` backfill rolled back every run since 2026-04-17. `decision_logger.py:798` ran DETACH DATABASE while a write transaction was open; SQLite under WAL throws “database pt is locked”; the exception handler rolled the whole batch back; the caller swallowed the error as a non-fatal warning. 9 nightly retrains on a frozen 36-row snapshot. Model AUC=0.50 every time because there was nothing new to fit.
  - Fix: explicit `conn.commit()` before DETACH (commit `00d0537`).
  - Result: 36 → 130 ENTERED resolved (3.6x lift).
- `outcome_hit_tp` substring check missed `TAKE_PROFIT`. The check `"TP" in str(exit_reason).upper()` doesn’t match “TAKE_PROFIT” (the characters T-A-K-E-_-P-R-O-F-I-T contain no “TP” pair). 22 winning trades were flagged as losses in the dashboard/calibration tables.
  - Fix: `er.startswith("TP") or "TAKE_PROFIT" in er` (commit `9ab9b00`).
  - Result: hit_tp=1 went from 5 → 27 (5.4x correction).
- `forward_returns` `--apply` path rolled back even when called. The script ran the UPDATE in both `--apply` and `--no-apply` paths, then incremented the counter unconditionally regardless of cursor.rowcount. Dry-run rolled the writes back but the “Updated 106 trades” log line still printed.
  - Fix: rowcount validation, `--apply` guard around the actual UPDATE (commit `538cb7a`).
- `triple_barrier_label` read `underlying_price` (NULL in production); should have read `option_mid`. The hand-rolled Codex impl used underlying SPY price, but production `trade_path_snapshots` only populates `option_mid` (option premium). Backfill produced 0 labels on first run.
  - Fix: `COALESCE(option_mid, underlying_price)` + treat early exits as scratch (commit `85e058e`).
  - Result: 0 → 84 production labels (55 SL-first, 29 TP-first).
- `ENTRY_PARTIAL_THEN_REJECTED` skipped `confirm_fill`. When an entry partially filled then got rejected, the engine adopted the partial as a smaller position via `pos.contracts_remaining = filled_qty` directly - skipping `total_contracts`, `entry_count`, `avg_cost`, `entry_order_filled_qty`, `sl_price`. P&L computation downstream was malformed; sl_price stayed at 0.
  - Fix: route through `confirm_fill` like `_cancel_entry` does (commit `895cec8`).
  - Test that had been quietly failing all session is now green.
- 14 historical CLOSED trades had no outcomes row - most from the Apr 7-10 era plus 4 persistent later-era cases. Most likely cause: force_close scripts and broker reconcile paths that flipped status='CLOSED' without going through `close_trade()`.
  - Heal script: `scripts/heal_missing_outcomes.py` reconstructs from partial_exits or last path snapshot; flags `exit_reason='HEALED_*'`.
  - Result: 5 healed (4 from partials, 1 from snapshot), 9 lost (early-era pre-path-tracker). 130 → 130 trainable (5 new resolved scoring_events from the heal).
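For reference, a minimal sketch of the commit-before-DETACH pattern behind the first fix. This is not the code in `decision_logger.py`: the schemas, the `trade_id` column, the `pt` alias setup and the helper names are assumptions made only so the example runs on its own.

```python
import os
import sqlite3
import tempfile


def make_paper_db(path: str, trade_ids: list[int]) -> None:
    """Create a stand-in paper-trades DB with a minimal outcomes table."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE outcomes (trade_id INTEGER PRIMARY KEY)")
    db.executemany("INSERT INTO outcomes VALUES (?)", [(t,) for t in trade_ids])
    db.commit()
    db.close()


def backfill_outcome_resolved(conn: sqlite3.Connection, paper_path: str) -> int:
    """Flag scoring_events rows whose trade has an outcome, then detach."""
    conn.execute("ATTACH DATABASE ? AS pt", (paper_path,))
    cur = conn.execute(
        "UPDATE scoring_events SET outcome_resolved = 1 "
        "WHERE trade_id IN (SELECT trade_id FROM pt.outcomes)"
    )
    updated = cur.rowcount
    # End the write transaction before DETACH. Skipping this commit is the
    # failure mode described above: DETACH raises while the transaction is open.
    conn.commit()
    conn.execute("DETACH DATABASE pt")
    return updated


def demo() -> int:
    """Build two throwaway databases and run the backfill once."""
    path = os.path.join(tempfile.mkdtemp(), "paper_trades.db")
    make_paper_db(path, [1, 2, 3])
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scoring_events "
        "(trade_id INTEGER, outcome_resolved INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO scoring_events (trade_id) VALUES (?)",
        [(t,) for t in range(1, 6)],
    )
    conn.commit()
    updated = backfill_outcome_resolved(conn, path)
    conn.close()
    return updated
```

Returning the rowcount gives the caller something to validate, which is the same check the `forward_returns` fix added.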
What we built
Triple-barrier vol-scaled label (Task #57)
- `src/cortana/engines/triple_barrier_label.py` - TP/SL/30-min timeout with vol-scaled barriers from 30-min realized vol of pre-entry underlying. mlfinpy parity test (3/3 green) confirms our impl matches López de Prado’s canonical algorithm.
- `outcomes.triple_barrier_label` column populated nightly.
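A rough sketch of the labeling idea, not the module's actual code. It assumes one-minute bars, square-root-of-time scaling of per-bar vol, barriers expressed as fractional moves from entry, and hypothetical `tp_mult` / `sl_mult` multipliers; none of those specifics are recorded in this note.

```python
import math
from typing import Sequence


def realized_vol(prices: Sequence[float]) -> float:
    """Per-bar realized volatility: sample std dev of log returns."""
    rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    if len(rets) < 2:
        raise ValueError("need at least 3 prices")
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
    return math.sqrt(var)


def triple_barrier_label(
    entry: float,
    path: Sequence[float],
    sigma: float,
    tp_mult: float = 2.0,   # hypothetical multiplier
    sl_mult: float = 1.0,   # hypothetical multiplier
    max_bars: int = 30,     # 30-min timeout on one-minute bars
) -> int:
    """Return +1 if the TP barrier is touched first, -1 if SL, 0 on timeout."""
    width = sigma * math.sqrt(max_bars)  # scale per-bar vol to the horizon
    upper = entry * (1 + tp_mult * width)
    lower = entry * (1 - sl_mult * width)
    for price in path[:max_bars]:
        if price >= upper:
            return 1
        if price <= lower:
            return -1
    return 0  # neither barrier touched inside the window
```

Usage would look like `triple_barrier_label(entry, post_entry_prices, realized_vol(pre_entry_prices))`, with the pre-entry window supplying the vol estimate.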
Meta-labeling secondary classifier (Task #56)
- `src/cortana/engines/secondary_model.py` - L1-penalized logistic regression with `MIN_SAMPLES_TO_TRAIN=50`, mutual-information feature selection down to 8 features.
- Wired in shadow mode: emits `meta_win_prob` to scoring_events for every decision. Does NOT gate live trades (Task #56 phase 2 after AUC validation).
- `load_meta_labeling_training_data()` JOINs decisions.db ↔ paper_trades.db with the COMMIT-before-DETACH pattern.
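To make the selection step concrete, here is a plain-Python sketch of mutual-information ranking over pre-binned features. It is a stand-in for illustration: the production model presumably uses a library estimator on continuous features, and the function names here are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs: list[int], ys: list[int]) -> float:
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed (x, y) pairs of p(x,y) * log(p(x,y) / (p(x) * p(y)))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_top_k(
    features: dict[str, list[int]], labels: list[int], k: int = 8
) -> list[str]:
    """Rank features by MI with the label and keep the k highest."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A feature that is independent of the label scores zero, so it falls out of the top-k before the L1 penalty ever sees it.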
Charm + vanna magnitudes (Task #58)
- Continuous magnitudes added to FEATURIZE_EVENT_COLUMNS:
  - `dp_charm_magnitude = total_charm * spy_spot * 100` (per-min hedge $)
  - `dp_vanna_magnitude = total_vanna * spy_spot * vix` (per-VIX-pt hedge $)
- Mutual-information selection picked both in the live model’s top-8 on 2026-05-05 06:30 retrain - empirical confirmation of the literature recommendation that magnitudes carry signal that 1-bit directions don’t.
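The two formulas written out as functions, next to the 1-bit direction they replace. The input values in the usage note are made up for illustration, and `direction_bit` is a hypothetical name for the older sign-only feature.

```python
def dp_charm_magnitude(total_charm: float, spy_spot: float) -> float:
    """Charm hedge flow in dollars per minute, per the formula above."""
    return total_charm * spy_spot * 100


def dp_vanna_magnitude(total_vanna: float, spy_spot: float, vix: float) -> float:
    """Vanna hedge flow in dollars per VIX point, per the formula above."""
    return total_vanna * spy_spot * vix


def direction_bit(value: float) -> int:
    """The 1-bit alternative: keeps only the sign, discards the size."""
    return 1 if value > 0 else 0
```

With illustrative inputs, `dp_charm_magnitude(-2.5, 500.0)` and `dp_charm_magnitude(-250.0, 500.0)` differ by a factor of 100, while `direction_bit` returns 0 for both. That lost scale is the information the magnitude features keep.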
Broker-direct reconciler (Task #42)
- `scripts/broker_truth_reconcile.py` - pulls IBKR live socket (reqExecutionsAsync 24h, ib.positions, ib.openOrders, ib.accountSummary) + Flex Query (all-time, gated on env vars). Compares against paper_trades.db. Emits structured findings; `--apply` only fixes per-trade qty/price/pnl drift (refuses to touch positions table).
- Scheduled daily at 15:30 CT via `com.cortanaroi.broker-reconcile.plist`.
Nightly ML pipeline orchestrator
- `scripts/nightly_ml_pipeline.py` - chains forward_returns + triple_barrier + outcomes_backfill + secondary retrain. Each step independent; orchestrator returns non-zero only if ALL fail.
- Replaces single-step `com.cortanaroi.ml-backfill.plist` invocation. Runs 16:00 CT weekdays.
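A minimal sketch of the exit-code rule described above: every step runs regardless of earlier failures, and the process only reports failure when all of them failed. The `run_pipeline` name and the step callables are hypothetical; the real script's structure is not recorded here.

```python
import sys
from typing import Callable


def run_pipeline(steps: list[tuple[str, Callable[[], None]]]) -> int:
    """Run each step independently; return 1 only if every step failed."""
    failures = 0
    for name, step in steps:
        try:
            step()
            print(f"[ok] {name}")
        except Exception as exc:  # one bad step must not stop the rest
            failures += 1
            print(f"[failed] {name}: {exc}", file=sys.stderr)
    return 1 if steps and failures == len(steps) else 0
```

One consequence of this rule: a partial failure exits 0, so launchd will not flag it. The per-step stderr lines are the only signal, which is worth keeping in mind given how the `outcome_resolved` bug hid behind a swallowed warning.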
Empirical findings worth remembering
- XGBoost on n=100 is anti-predictive. TabPFN spike (commit `98162f1`) measured XGBoost AUC = 0.430 on a 25-row time-series holdout. Brier barely above the 0.25 random floor. Conditional expectancy at p≥0.55 = 0.682, base rate = 0.680 (model adds nothing). The literature was right about small-n XGBoost.
- Meta-labeling beat XGBoost in the published literature - H&T’s verified examples show 17-57pp accuracy lifts (Bollinger MR 20→77% validation, SMA crossover 37→56%). At our n=100, this is the highest-EV move per the comparison.
- 80% win rate via ML alone is a 6-12 month horizon. Codex’s adversarial review computed: cutting losses 44 → 15 with zero win attrition is a 66% loss reduction - an extreme bar. Filter alone won’t get there. Plausible path: meta-labeling veto on top 20-30% danger setups + microstructure capture (#51, 3-5 weeks).
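The "model adds nothing" comparison can be sketched as below, reading conditional expectancy as the win rate among trades the model scores at or above a threshold. That reading, and the function names, are assumptions; the spike script may compute it differently.

```python
from typing import Optional, Sequence


def base_rate(wins: Sequence[int]) -> float:
    """Unconditional win rate across all trades (1 = win, 0 = loss)."""
    return sum(wins) / len(wins)


def conditional_expectancy(
    probs: Sequence[float], wins: Sequence[int], threshold: float = 0.55
) -> Optional[float]:
    """Win rate among trades with model probability >= threshold."""
    picked = [w for p, w in zip(probs, wins) if p >= threshold]
    if not picked:
        return None  # the model never cleared the threshold
    return sum(picked) / len(picked)
```

A model earns its place only when the conditional figure sits clearly above the base rate. At 0.682 versus 0.680 the filter selects trades no better than taking all of them.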
Open Threads
- Task #62: User must manually accept the TabPFN license at https://ux.priorlabs.ai to enable the spike’s TabPFN side. Once TABPFN_TOKEN is set, re-run scripts.tabpfn_spike for the A/B.
- Task #51: Microstructure feature tables - the only path to 80% per the literature. 5 new tables (entry_microstructure, signal_execution_context, cross_asset_state, flow_state, trader_state). 3-5 weeks scope.
- Task #46: Broker-truth-first writes - architectural shift so DB is a projection of broker_executions, not authored independently. Needs design doc.
- Task #36: Dashboard meta_win_prob cards - fired tonight as a Codex tab; status pending verification.
- Task #53: Promotion gate based on conditional expectancy, not AUC. Blocked on having more shadow-mode meta_win_prob data.
See Also
- 0DTE ML Best-in-Class Comparison
- Meta-Labeling Implementation Patterns
- ML Training Label Grounding
- 2026-05-04 Adversarial ML Data Review
- 2026-05-05 TabPFN Spike + XGBoost Baseline
Timeline
2026-05-05 06:30 CDT | observed - Pre-market boot verify run after the all-night work. Schema migration applied to production decisions.db (added dp_charm_magnitude + dp_vanna_magnitude). Secondary retrained on 101 rows. Mutual-info selected both new magnitude features in top-8 - first empirical evidence that the magnitude pivot is the right call. Engine loads cleanly.
2026-05-05 04:00-06:30 CDT | Push session 2. Codex #58 charm/vanna magnitudes shipped. position_manager bug diagnosed and fixed (was labeled “known pre-existing” but was a real production bug since de2898f). Heal script for missing outcomes ran (5 healed). Daily broker reconcile + nightly ML orchestrator wired into launchd.
2026-05-04 evening | Push session 1. outcome_resolved root cause found and fixed; 36 → 125 ENTERED resolved. Triple-barrier label shipped. TabPFN spike ran (XGBoost AUC=0.430 baseline). Meta-labeling secondary classifier landed in shadow mode.