2026-05-05 Overnight ML Pipeline Rebuild

12-hour push from 2026-05-04 evening to 2026-05-05 6:30 AM. Goal: stop the bleeding on the ML data layer (silent label rot since Apr 17), then pivot from “more XGBoost” to meta-labeling per the literature. Result: 18 commits, full test suite green for the first time in days, secondary classifier live in shadow mode with charm/vanna magnitude features empirically validated by mutual-information selection.

What broke (verified, not assumed)

Six issues that had been silently corrupting the ML data loop:

outcome_resolved backfill rolled back every run since 2026-04-17. decision_logger.py:798 ran DETACH DATABASE while a write transaction was open; SQLite under WAL throws “database pt is locked”; the exception handler rolled the whole batch back; the caller swallowed the error as a non-fatal warning. 9 nightly retrains on a frozen 36-row snapshot. Model AUC=0.50 every time because there was nothing new to fit.
- Fix: explicit conn.commit() before DETACH (commit 00d0537).
- Result: 36 → 130 ENTERED resolved (3.6x lift).
outcome_hit_tp substring check missed TAKE_PROFIT. The check "TP" in str(exit_reason).upper() doesn’t match “TAKE_PROFIT” (the characters T-A-K-E-_-P-R-O-F-I-T contain no “TP” pair). 22 winning trades were flagged as losses in the dashboard/calibration tables.
- Fix: er.startswith("TP") or "TAKE_PROFIT" in er (commit 9ab9b00).
- Result: hit_tp=1 went from 5 → 27 (5.4x correction).
forward_returns --apply path rolled back even when called. The script ran the UPDATE in both the --apply and --no-apply paths, then incremented the counter regardless of cursor.rowcount. The dry run rolled the writes back, but the “Updated 106 trades” log line still printed.
- Fix: rowcount validation, plus an --apply guard around the actual UPDATE (commit 538cb7a).
triple_barrier_label read underlying_price (NULL in production); should have read option_mid. The hand-rolled Codex impl used underlying SPY price, but production trade_path_snapshots only populates option_mid (option premium). Backfill produced 0 labels on first run.
- Fix: COALESCE(option_mid, underlying_price) + treat early exits as scratch (commit 85e058e).
- Result: 0 → 84 production labels (55 SL-first, 29 TP-first).
ENTRY_PARTIAL_THEN_REJECTED skipped confirm_fill. When an entry partially filled and then got rejected, the engine adopted the partial as a smaller position by setting pos.contracts_remaining = filled_qty directly. That skipped the updates to total_contracts, entry_count, avg_cost, entry_order_filled_qty, and sl_price, so downstream P&L computation was malformed and sl_price stayed at 0.
- Fix: route through confirm_fill like _cancel_entry does (commit 895cec8).
- Test that had been quietly failing all session is now green.
14 historical CLOSED trades had no outcomes row - most from the Apr 7-10 era plus 4 persistent later-era cases. Most likely cause: force_close scripts and broker reconcile paths that flipped status='CLOSED' without going through close_trade().
- Heal script: scripts/heal_missing_outcomes.py reconstructs from partial_exits or last path snapshot; flags exit_reason='HEALED_*'.
- Result: 5 healed (4 from partials, 1 from snapshot), 9 lost (early-era pre-path-tracker). 130 → 130 trainable (5 new resolved scoring_events from the heal).
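The hit_tp bug is easy to reintroduce, so here is a minimal sketch of the fixed check as a pure function next to the old one. It assumes er is str(exit_reason).upper() (the source shows only the fixed expression). The function names are hypothetical, and "TP1" in the demo is an illustrative value, not a confirmed production exit_reason.

```python
def legacy_is_take_profit(exit_reason) -> bool:
    # Pre-fix check: "TAKE_PROFIT" contains no adjacent "T","P" pair, so this misses it.
    return "TP" in str(exit_reason).upper()


def is_take_profit(exit_reason) -> bool:
    # Post-fix logic (same expression as commit 9ab9b00): prefix match for short
    # TP codes, explicit match for the spelled-out form. None maps to False.
    er = str(exit_reason or "").upper()
    return er.startswith("TP") or "TAKE_PROFIT" in er


# Hypothetical exit_reason values, for illustration only.
for reason in ["TP1", "TAKE_PROFIT", "STOP_LOSS", None]:
    print(f"{reason!s:12} legacy={legacy_is_take_profit(reason)} fixed={is_take_profit(reason)}")
```

A table-driven test over every distinct exit_reason in paper_trades.db would catch the next spelling variant before it reaches the calibration tables.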
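The forward_returns fix generalizes to any backfill script. Run the UPDATE only when --apply is set, and count rows from cursor.rowcount instead of incrementing once per attempted write. Below is a minimal sqlite3 sketch. The table trades, the column fwd_ret_30m, and the function name are hypothetical, not the script's real schema.

```python
import sqlite3


def backfill_forward_returns(
    conn: sqlite3.Connection, updates: dict[int, float], apply: bool
) -> int:
    """Write forward returns for the given trade ids; return rows actually changed.

    Hypothetical schema: trades(id INTEGER PRIMARY KEY, fwd_ret_30m REAL).
    `apply` would come from the script's --apply flag. Without it no SQL runs,
    so there is nothing to roll back and the log line cannot over-report.
    """
    if not apply:
        print(f"[dry-run] would update up to {len(updates)} trades")
        return 0
    updated = 0
    with conn:  # commits on success, rolls back if any UPDATE raises
        for trade_id, fwd_ret in updates.items():
            cur = conn.execute(
                "UPDATE trades SET fwd_ret_30m = ? WHERE id = ?", (fwd_ret, trade_id)
            )
            updated += cur.rowcount  # 0 when the id matches no row
    print(f"Updated {updated} trades")
    return updated
```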

What we built

Triple-barrier vol-scaled label (Task #57)

src/cortana/engines/triple_barrier_label.py - TP/SL barriers plus a 30-min timeout, with the barriers scaled by the 30-min realized vol of the underlying before entry. An mlfinpy parity test (3/3 green) confirms our impl matches López de Prado’s canonical algorithm.
outcomes.triple_barrier_label column populated nightly.
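For reference, a triple-barrier label walks the price path after entry and records which barrier is touched first: the TP barrier above, the SL barrier below, or the vertical timeout. The sketch below is a simplified illustration, not the contents of triple_barrier_label.py. It follows AFML's convention of comparing simple returns against a multiple of a volatility target. The parameters tp_mult and sl_mult, the sqrt-time vol scaling, and the function names are assumptions. The price path is assumed to be already coalesced (COALESCE(option_mid, underlying_price) in SQL).

```python
import math
import statistics
from typing import Sequence


def realized_vol(pre_entry_prices: Sequence[float], horizon_bars: int) -> float:
    """Stdev of per-bar simple returns before entry, scaled to the horizon by sqrt(time)."""
    rets = [b / a - 1 for a, b in zip(pre_entry_prices, pre_entry_prices[1:])]
    return statistics.stdev(rets) * math.sqrt(horizon_bars)


def triple_barrier_label(
    entry_price: float,
    path: Sequence[float],
    sigma: float,
    tp_mult: float = 1.0,  # hypothetical barrier multipliers
    sl_mult: float = 1.0,
    max_bars: int = 30,  # e.g. 30 one-minute bars = 30-min timeout
) -> int:
    """+1 if the upper (TP) barrier is touched first, -1 for the lower (SL) barrier,
    0 if neither is touched within max_bars (vertical barrier).

    A path that ends early without touching either barrier also returns 0,
    i.e. an early exit is treated as a scratch.
    """
    upper = entry_price * (1 + tp_mult * sigma)
    lower = entry_price * (1 - sl_mult * sigma)
    for price in path[:max_bars]:
        if price >= upper:
            return 1
        if price <= lower:
            return -1
    return 0
```

With one price per bar, a single observation cannot cross both barriers when sigma > 0, so first-touch ordering is unambiguous. Coarser snapshots would need a tie rule.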

Meta-labeling secondary classifier (Task #56)

src/cortana/engines/secondary_model.py - L1-penalized logistic regression with MIN_SAMPLES_TO_TRAIN=50, mutual-information feature selection down to 8 features.
Wired in shadow mode: emits meta_win_prob to scoring_events for every decision. Does NOT gate live trades; gating is Task #56 phase 2, after AUC validation.
load_meta_labeling_training_data() JOINs decisions.db ↔ paper_trades.db with the COMMIT-before-DETACH pattern.
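The feature-selection step ranks candidate features by mutual information with the win/loss meta-label and keeps the top 8, subject to the MIN_SAMPLES_TO_TRAIN=50 floor. The sketch below swaps in a simple plug-in MI estimator on quantile-binned features so it runs on the standard library alone. It is not necessarily the estimator secondary_model.py uses (scikit-learn's mutual_info_classif is a common kNN-based alternative), and the L1 logistic regression that consumes the selected features is omitted. Function names and the bin count are hypothetical.

```python
import math
from collections import Counter
from typing import Optional, Sequence

MIN_SAMPLES_TO_TRAIN = 50  # same floor the note gives for the secondary model


def quantile_bins(values: Sequence[float], n_bins: int = 4) -> list[int]:
    """Equal-count bins by rank (ties are split arbitrarily, acceptable for a sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in MI estimate in nats between two discrete sequences."""
    n = len(y)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


def select_features(
    features: dict[str, Sequence[float]], meta_labels: Sequence[int], k: int = 8
) -> Optional[list[str]]:
    """Top-k feature names by MI with the meta-label, or None if too few samples."""
    if len(meta_labels) < MIN_SAMPLES_TO_TRAIN:
        return None  # skip the retrain rather than fit on a tiny sample
    scores = {
        name: mutual_information(quantile_bins(col), meta_labels)
        for name, col in features.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Plug-in MI is biased upward at small n, so at n≈100 the ranking is more trustworthy than the absolute scores.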
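The COMMIT-before-DETACH pattern underlies both the cross-database join and the fix for the Apr 17 rollback bug. SQLite refuses to DETACH a database that still has an open transaction ("database ... is locked"), and it refuses to ATTACH inside one. The sketch below is a minimal sqlite3 example. It assumes a hypothetical schema: scoring_events(trade_id, outcome_resolved) in decisions.db and outcomes(trade_id) in the attached file. The alias pt matches the error quoted above; the function name is invented.

```python
import sqlite3


def backfill_outcome_resolved(conn: sqlite3.Connection, paper_trades_path: str) -> int:
    """Mark scoring_events resolved when paper_trades.db has an outcome for the trade.

    Hypothetical schema: main.scoring_events(trade_id, outcome_resolved) and
    pt.outcomes(trade_id). Call with no transaction open (ATTACH fails inside one).
    """
    conn.execute("ATTACH DATABASE ? AS pt", (paper_trades_path,))
    try:
        cur = conn.execute(
            """
            UPDATE scoring_events SET outcome_resolved = 1
            WHERE outcome_resolved = 0
              AND trade_id IN (SELECT trade_id FROM pt.outcomes)
            """
        )
        updated = cur.rowcount
        conn.commit()  # the fix: end the write transaction BEFORE detaching
    except Exception:
        conn.rollback()
        raise  # surface the failure; don't downgrade it to a warning
    finally:
        conn.execute("DETACH DATABASE pt")  # safe: no transaction is open here
    return updated
```

The re-raise matters as much as the commit. The original failure stayed invisible for 9 nights because the caller logged it as non-fatal.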

Charm + vanna magnitudes (Task #58)

Continuous magnitudes added to FEATURIZE_EVENT_COLUMNS:
- dp_charm_magnitude = total_charm * spy_spot * 100 (per-min hedge $)
- dp_vanna_magnitude = total_vanna * spy_spot * vix (per-VIX-pt hedge $)
Mutual-information selection picked both for the live model’s top 8 on the 2026-05-05 06:30 retrain. That is early empirical support for the literature’s recommendation that magnitudes carry signal that 1-bit directions don’t.
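Both magnitudes are plain products. The sketch below mainly pins down what each factor contributes and shows what a 1-bit direction throws away. Reading the 100 as the SPY option contract multiplier is an assumption; the formulas themselves are taken from the definitions above. direction_bit is a hypothetical stand-in for the older sign-only features.

```python
def dp_charm_magnitude(total_charm: float, spy_spot: float) -> float:
    # total_charm * spy_spot * 100 (the 100 is presumably the contract multiplier).
    # Per-minute hedge flow in dollars, per the definition above.
    return total_charm * spy_spot * 100


def dp_vanna_magnitude(total_vanna: float, spy_spot: float, vix: float) -> float:
    # total_vanna * spy_spot * vix, per the definition above (per-VIX-point hedge $).
    return total_vanna * spy_spot * vix


def direction_bit(x: float) -> int:
    # Hypothetical sign-only feature: keeps direction, discards size entirely.
    return 1 if x > 0 else 0
```

direction_bit(0.01) and direction_bit(5.0) both return 1, while the corresponding charm magnitudes differ by a factor of 500. That lost scale is the information the magnitude features give the model.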

Broker-direct reconciler (Task #42)

scripts/broker_truth_reconcile.py - pulls the IBKR live socket (reqExecutionsAsync 24h, ib.positions, ib.openOrders, ib.accountSummary) plus Flex Query (all-time, gated on env vars). Compares against paper_trades.db and emits structured findings; --apply only fixes per-trade qty/price/pnl drift (it refuses to touch the positions table).
Scheduled daily at 15:30 CT via com.cortanaroi.broker-reconcile.plist.
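The reconciler's --apply contract (fix per-trade qty/price/pnl drift, never create, delete, or touch positions) can be sketched as a pure diff over already-fetched records. This assumes the broker side has already been normalized into per-trade records from the socket and Flex Query pulls. No IBKR API calls are shown, and the dataclasses, field names, and tolerance are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

FIXABLE_FIELDS = ("qty", "avg_price", "pnl")  # the only fields --apply may change


@dataclass(frozen=True)
class TradeRecord:  # hypothetical normalized shape, same for DB and broker
    trade_id: str
    qty: int
    avg_price: float
    pnl: float


@dataclass(frozen=True)
class Finding:
    trade_id: str
    field: str
    db_value: Optional[float]
    broker_value: Optional[float]
    fixable: bool


def diff_trades(
    db: dict[str, TradeRecord], broker: dict[str, TradeRecord], tol: float = 0.01
) -> list[Finding]:
    """Emit one finding per drifted field; one-sided trades are report-only."""
    findings = []
    for trade_id in sorted(db.keys() | broker.keys()):
        d, b = db.get(trade_id), broker.get(trade_id)
        if d is None or b is None:
            # Missing on one side: creating/deleting trades is out of scope for --apply.
            findings.append(Finding(trade_id, "missing", None, None, fixable=False))
            continue
        for field in FIXABLE_FIELDS:
            dv, bv = getattr(d, field), getattr(b, field)
            if abs(dv - bv) > tol:
                findings.append(Finding(trade_id, field, dv, bv, fixable=True))
    return findings
```

Keeping the diff pure means --apply only ever acts on findings with fixable=True, and the dry-run output is exactly the list above.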

Nightly ML pipeline orchestrator

scripts/nightly_ml_pipeline.py - chains forward_returns + triple_barrier + outcomes_backfill + secondary retrain. Each step runs independently; the orchestrator exits non-zero only if ALL steps fail.
Replaces the single-step com.cortanaroi.ml-backfill.plist invocation. Runs at 16:00 CT on weekdays.
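The exit-code rule (non-zero only if every step fails) is simple but easy to get subtly wrong. Below is a minimal sketch. The step names would mirror the four stages above; the callables are hypothetical stand-ins for the real scripts.

```python
import logging
from typing import Callable

log = logging.getLogger("nightly_ml_pipeline")


def run_pipeline(steps: list[tuple[str, Callable[[], None]]]) -> int:
    """Run each step independently; return 1 only if ALL steps failed, else 0.

    Step callables are hypothetical stand-ins for the real pipeline stages.
    """
    failures = []
    for name, step in steps:
        try:
            step()
            log.info("step %s ok", name)
        except Exception:  # isolate steps: one failure must not block the others
            log.exception("step %s failed", name)  # full traceback, not a quiet warning
            failures.append(name)
    return 1 if steps and len(failures) == len(steps) else 0
```

The trade-off: a partial failure exits 0, so launchd won't flag it. The per-step ERROR logging is what keeps a partial failure from repeating the silently swallowed Apr 17 backfill error.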

Empirical findings worth remembering

XGBoost on n=100 is anti-predictive. The TabPFN spike (commit 98162f1) measured the XGBoost baseline at AUC = 0.430 on a 25-row time-series holdout. Its Brier score was barely above 0.25, the score of a constant 0.5 forecast. Conditional expectancy at p≥0.55 = 0.682 vs. base rate = 0.680 (the model adds nothing). The literature was right about small-n XGBoost.
Meta-labeling beat XGBoost in the published literature: Hudson & Thames’ (H&T) verified examples show 17-57pp accuracy lifts (Bollinger MR 20→77% validation, SMA crossover 37→56%). At our n=100, this is the highest-EV move per the comparison.
80% win rate via ML alone is a 6-12 month horizon. Codex’s adversarial review computed that getting there means cutting losses from 44 to 15 with zero win attrition: a 66% loss reduction, which is an extreme bar. A filter alone won’t get there. Plausible path: a meta-labeling veto on the top 20-30% danger setups plus microstructure capture (#51, 3-5 weeks).
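The "model adds nothing" check compares the outcome rate among trades the model would accept with the unconditional rate, alongside the Brier score. Below is a minimal sketch, assuming binary win labels and predicted probabilities as parallel lists. Here "conditional expectancy" is computed as the win rate among accepted trades, which is how the 0.682 vs. 0.680 figures read. That reading is an assumption; a P&L-weighted version would average returns instead.

```python
from typing import Optional, Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared error of probability forecasts; a constant 0.5 forecast scores 0.25."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def conditional_rate(
    probs: Sequence[float], labels: Sequence[int], threshold: float = 0.55
) -> tuple[Optional[float], int]:
    """Win rate among trades with p >= threshold (assumed meaning of 'expectancy'), plus count."""
    accepted = [y for p, y in zip(probs, labels) if p >= threshold]
    if not accepted:
        return None, 0
    return sum(accepted) / len(accepted), len(accepted)


def lift_over_base(
    probs: Sequence[float], labels: Sequence[int], threshold: float = 0.55
) -> tuple[Optional[float], float, int]:
    """(conditional rate minus base rate, base rate, accepted count)."""
    base = sum(labels) / len(labels)
    cond, n = conditional_rate(probs, labels, threshold)
    return (None if cond is None else cond - base), base, n
```

A lift near zero, as in the 0.682 vs. 0.680 result, means the threshold filter is effectively random, whatever the AUC says. This is why Task #53 proposes gating on this quantity.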
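The 66% figure is the relative drop in the loss count, with wins held fixed:

```latex
\frac{44 - 15}{44} = \frac{29}{44} \approx 0.659 \approx 66\%
```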

Open Threads

Task #62: User must manually accept the TabPFN license at https://ux.priorlabs.ai to enable the spike’s TabPFN side. Once TABPFN_TOKEN is set, re-run scripts.tabpfn_spike for the A/B.
Task #51: Microstructure feature tables - the only path to 80% per the literature. 5 new tables (entry_microstructure, signal_execution_context, cross_asset_state, flow_state, trader_state). 3-5 weeks scope.
Task #46: Broker-truth-first writes - architectural shift so DB is a projection of broker_executions, not authored independently. Needs design doc.
Task #36: Dashboard meta_win_prob cards - fired tonight as a Codex tab; status pending verification.
Task #53: Promotion gate based on conditional expectancy, not AUC. Blocked on having more shadow-mode meta_win_prob data.

Timeline

2026-05-05 06:30 CDT | observed - Pre-market boot verify run after the overnight work. Schema migration applied to production decisions.db (added dp_charm_magnitude + dp_vanna_magnitude). Secondary retrained on 101 rows. Mutual-info selection put both new magnitude features in the top 8, the first empirical evidence that the magnitude pivot is the right call. Engine loads cleanly.

2026-05-05 04:00-06:30 CDT | Push session 2. Codex #58 charm/vanna magnitudes shipped. position_manager bug diagnosed and fixed (was labeled “known pre-existing” but was a real production bug since de2898f). Heal script for missing outcomes ran (5 healed). Daily broker reconcile + nightly ML orchestrator wired into launchd.

2026-05-04 evening | Push session 1. outcome_resolved root cause found and fixed; 36 → 125 ENTERED resolved. Triple-barrier label shipped. TabPFN spike ran (XGBoost AUC=0.430 baseline). Meta-labeling secondary classifier landed in shadow mode.
