ML Training Label Grounding
The XGBoost model’s training target is outcome_pnl >= 0 - literal trade P&L in dollars. Not directional underlying movement, not TP-hit, not forward-return sign. P&L-grounded. The orphaned scoring_events.realized_label column (BULLISH/BEARISH/NEUTRAL from forward_15m_pct sign) is written nightly but read nowhere in production - it is a diagnostic artifact, not the model’s label.
Core claim
Cortana MK2’s win-probability model trains on actual trade outcomes, not
proxy signals. The classifier label is 1.0 if outcome_pnl >= 0 else 0.0
(xgboost_model.py:64-71); the regression label is log-transformed
outcome_pnl_pct clamped to [-1, 1]. The training query
(TRAINING_DATA_BASE_QUERY, decision_logger.py:830-869) joins
scoring_events ↔ outcomes on paper_trade_id and selects
outcome_pnl, outcome_pnl_pct - never realized_label.
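A minimal sketch of the two labels described above, for orientation only. The function names are illustrative stand-ins, not the code in xgboost_model.py. The classifier rule is taken directly from the text. The exact form of the regression transform is not spelled out here, so the clamp-then-signed-log1p version below is an assumption.

```python
import math


def win_label(outcome_pnl: float) -> float:
    """Classifier target: 1.0 for a non-negative dollar P&L, else 0.0."""
    return 1.0 if outcome_pnl >= 0 else 0.0


def return_label(outcome_pnl_pct: float) -> float:
    """Regression target, ASSUMED form: clamp to [-1, 1], then signed log1p.

    The note only says "log-transformed ... clamped to [-1, 1]"; the order of
    the two steps and the use of log1p are guesses made for illustration.
    """
    clamped = max(-1.0, min(1.0, outcome_pnl_pct))
    return math.copysign(math.log1p(abs(clamped)), clamped)


if __name__ == "__main__":
    # A breakeven trade (exactly 0 dollars) counts as a win under the >= rule.
    print(win_label(0.0), win_label(-0.01), win_label(12.5))
    print(return_label(0.25), return_label(-3.0))
```

Note the boundary: a trade with exactly zero P&L is labelled a win, because the comparison is >= rather than >.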
Evidence
- derived - xgboost_model.py:64-71 defines _win_label() returning 1.0 if outcome_pnl >= 0 else 0.0; _return_label() returns log of outcome_pnl_pct. 2026-05-04.
- derived - decision_logger.py:830-869 (TRAINING_DATA_BASE_QUERY) - no realized_label in projection.
- derived - realized_label is written by backfill_forward_returns.py:107-119 with directional thresholds (±0.0005 on forward_15m_pct), but grep of src/ shows zero readers.
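To make the difference between the two labels concrete, here is a small sketch of a directional label built from the ±0.0005 threshold next to the P&L label. It is not the code in backfill_forward_returns.py: the function names are hypothetical, and whether the real comparison is strict or inclusive at the threshold is an assumption.

```python
THRESHOLD = 0.0005  # from the note: ±0.0005 on forward_15m_pct


def direction_label(forward_15m_pct: float, threshold: float = THRESHOLD) -> str:
    """Directional label; strict inequalities at the threshold are ASSUMED."""
    if forward_15m_pct > threshold:
        return "BULLISH"
    if forward_15m_pct < -threshold:
        return "BEARISH"
    return "NEUTRAL"


def win_label(outcome_pnl: float) -> float:
    """P&L-grounded label, as described in this note."""
    return 1.0 if outcome_pnl >= 0 else 0.0


if __name__ == "__main__":
    # Made-up trade: the underlying moved up, yet the trade lost money.
    forward_15m_pct, outcome_pnl = 0.002, -35.0
    print(direction_label(forward_15m_pct), win_label(outcome_pnl))
```

The two labels answer different questions, so they can disagree on the same trade: the underlying can move in the "right" direction while the trade still closes at a loss.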
When it applies
Any conversation about “what does the ML model predict” or “what target should we re-label.” The answer is already P&L-grounded. Don’t propose a relabel of training data without verifying which column the live training query actually selects.
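One quick way to do that verification is to check which candidate label columns appear in the query text. The sketch below is a rough aid under stated assumptions: the helper name is hypothetical, and EXAMPLE_QUERY is an invented stand-in with a placeholder feature column, not the real TRAINING_DATA_BASE_QUERY.

```python
import re
from typing import Iterable, Set


def mentioned_columns(query: str, candidates: Iterable[str]) -> Set[str]:
    """Return the candidate column names that occur as whole words in the query."""
    return {
        name
        for name in candidates
        if re.search(rf"\b{re.escape(name)}\b", query)
    }


# Invented stand-in query; the real one lives in decision_logger.py.
EXAMPLE_QUERY = """
SELECT se.some_feature, o.outcome_pnl, o.outcome_pnl_pct
FROM scoring_events se
JOIN outcomes o ON o.paper_trade_id = se.paper_trade_id
"""

CANDIDATES = ("outcome_pnl", "outcome_pnl_pct", "realized_label", "outcome_hit_tp")

if __name__ == "__main__":
    print(sorted(mentioned_columns(EXAMPLE_QUERY, CANDIDATES)))
```

This matches on the whole query text, not only the SELECT list, so a column used only in a WHERE clause would also be reported. Treat a hit as a prompt to read the query, not as proof that the column is the label.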
When it breaks
- If a future engineer points the training Y vector at realized_label thinking it’s the canonical label, the model will start predicting underlying-direction (a different game from option-premium P&L).
- outcome_hit_tp is a separate column that’s always 0 despite 60 wins in outcomes - a write-side bug. Don’t grab outcome_hit_tp thinking it’s a usable label until that’s fixed (Task #45 / GH future).
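A cheap guard against both failure modes is to compare any candidate label column with the P&L-derived win count before training on it. The following is a sketch only: the helper name and the toy rows are made up for illustration, and rows are assumed to be plain dicts keyed by column name.

```python
from typing import Any, Dict, List


def label_sanity(
    rows: List[Dict[str, Any]],
    label_key: str,
    pnl_key: str = "outcome_pnl",
) -> Dict[str, Any]:
    """Summarise a candidate label column against P&L-derived wins."""
    n = len(rows)
    pnl_wins = sum(1 for r in rows if r[pnl_key] >= 0)
    label_positives = sum(1 for r in rows if r[label_key])
    return {
        "n": n,
        "pnl_wins": pnl_wins,
        "label_positives": label_positives,
        # A column that is all-0 or all-1 carries no signal as a target.
        "degenerate": n > 0 and label_positives in (0, n),
    }


if __name__ == "__main__":
    # Toy rows, not real data: two winning trades, hit_tp never set.
    toy = [
        {"outcome_pnl": 40.0, "outcome_hit_tp": 0},
        {"outcome_pnl": -15.0, "outcome_hit_tp": 0},
        {"outcome_pnl": 0.0, "outcome_hit_tp": 0},
    ]
    print(label_sanity(toy, "outcome_hit_tp"))
```

A column that reports zero positives while the P&L shows wins is the same pattern described above for outcome_hit_tp, and is a sign of a write-side problem, not of a model that never wins.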
See Also
Timeline
2026-05-04 | derived - Audit triggered by misread calibration table: joined model_win_prob × outcome_hit_tp and got 0% wins, which made it look like the model was predicting the wrong thing. Real bugs: (a) hit_tp write-side broken, (b) forward_X_pnl_pct UPDATE silently failing, (c) n_train=26 so AUC=0.5. Label semantics were never the problem. Filed this concept so the next “we should relabel” thread starts from correct ground truth.