ML Training Label Grounding

The XGBoost win-probability classifier’s training target is outcome_pnl >= 0: literal trade P&L in dollars. Not directional underlying movement, not TP-hit, not forward-return sign. P&L-grounded. The orphaned scoring_events.realized_label column (BULLISH/BEARISH/NEUTRAL, derived from forward_15m_pct against a ±0.0005 threshold) is written nightly but read nowhere in production - it is a diagnostic artifact, not the model’s label.

Core claim

Cortana MK2’s win-probability model trains on actual trade outcomes, not proxy signals. The classifier label is 1.0 if outcome_pnl >= 0 else 0.0 (xgboost_model.py:64-71); the regression label is the log-transformed outcome_pnl_pct, clamped to [-1, 1]. The training query (TRAINING_DATA_BASE_QUERY, decision_logger.py:830-869) joins scoring_events to outcomes on paper_trade_id and selects outcome_pnl and outcome_pnl_pct, never realized_label.
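A minimal Python sketch of the two targets as described above. win_label follows the stated rule exactly. return_label is an assumption: it treats outcome_pnl_pct as a fraction (0.25 for +25%) and applies log1p before clamping, which may not match what _return_label() actually does. Neither function is the code in xgboost_model.py.

```python
import math


def win_label(outcome_pnl: float) -> float:
    """Classifier target: 1.0 if the trade did not lose money, else 0.0."""
    return 1.0 if outcome_pnl >= 0 else 0.0


def return_label(outcome_pnl_pct: float) -> float:
    """Regression target sketch: log1p of the P&L fraction, clamped to [-1, 1].

    ASSUMPTION: outcome_pnl_pct is a fraction (0.25 means +25%) and the log
    transform is log1p; the real _return_label() may use a different form.
    """
    if outcome_pnl_pct <= -1.0:
        # A loss of 100% or more has no finite log1p; pin it to the lower bound.
        return -1.0
    return max(-1.0, min(1.0, math.log1p(outcome_pnl_pct)))


print(win_label(0.0), win_label(-12.5))  # 1.0 0.0
print(round(return_label(0.25), 4))      # 0.2231
print(return_label(5.0))                 # 1.0 (log1p(5.0) is about 1.79, clamped)
```

Under the >= 0 rule a breakeven trade (outcome_pnl == 0) counts as a win.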

Evidence

  • derived - xgboost_model.py:64-71 defines _win_label(), which returns 1.0 if outcome_pnl >= 0 else 0.0, and _return_label(), which returns a log transform of outcome_pnl_pct (2026-05-04).
  • derived - decision_logger.py:830-869 (TRAINING_DATA_BASE_QUERY) - no realized_label in projection.
  • derived - realized_label is written by backfill_forward_returns.py:107-119 using directional thresholds (±0.0005 on forward_15m_pct), but a grep of src/ finds no readers.
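To see why the two columns are not interchangeable, the sketch below computes a directional label next to the P&L label for a few made-up rows. realized_label_sketch is hypothetical: it assumes strict inequalities against the ±0.0005 threshold, and backfill_forward_returns.py may handle the boundary differently. The rows are illustrative, not real data.

```python
def realized_label_sketch(forward_15m_pct: float, threshold: float = 0.0005) -> str:
    """Hypothetical directional label from the forward 15-minute return.

    ASSUMPTION: strict inequalities against +/-threshold; the real
    backfill_forward_returns.py may treat the boundary differently.
    """
    if forward_15m_pct > threshold:
        return "BULLISH"
    if forward_15m_pct < -threshold:
        return "BEARISH"
    return "NEUTRAL"


def win_label(outcome_pnl: float) -> float:
    """P&L-grounded classifier target, as described in the core claim."""
    return 1.0 if outcome_pnl >= 0 else 0.0


# Made-up rows: the direction label and the P&L label need not agree.
rows = [
    {"forward_15m_pct": 0.0020, "outcome_pnl": -35.0},  # underlying up, trade lost
    {"forward_15m_pct": -0.0010, "outcome_pnl": 12.0},  # underlying down, trade won
    {"forward_15m_pct": 0.0001, "outcome_pnl": 0.0},    # inside the band, breakeven
]
pairs = [(realized_label_sketch(r["forward_15m_pct"]), win_label(r["outcome_pnl"])) for r in rows]
print(pairs)  # [('BULLISH', 0.0), ('BEARISH', 1.0), ('NEUTRAL', 1.0)]
```

The first row is the case that matters: the underlying moved up, yet the trade lost money (for example, a long option losing to premium decay), so a direction-derived label says nothing reliable about the P&L outcome the model is meant to predict.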

When it applies

Any conversation about “what does the ML model predict?” or “what target should we relabel to?” The answer is already P&L-grounded. Don’t propose relabeling the training data without first verifying which column the live training query actually selects.
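One way to do that verification is to ask the database for the query’s result columns instead of reading the SQL by eye. The sketch uses an in-memory sqlite3 database with a hypothetical two-table schema and a stand-in query; against production you would pass the real TRAINING_DATA_BASE_QUERY and connection instead. It relies on the DB-API cursor.description attribute, which sqlite3 populates even when a SELECT returns no rows.

```python
import sqlite3

# Hypothetical stand-in schema; the real tables have more columns.
SCHEMA = """
CREATE TABLE scoring_events (paper_trade_id INTEGER, realized_label TEXT);
CREATE TABLE outcomes (paper_trade_id INTEGER, outcome_pnl REAL, outcome_pnl_pct REAL);
"""

# Stand-in for TRAINING_DATA_BASE_QUERY; no trailing semicolon, because it gets wrapped below.
TRAINING_QUERY = """
SELECT s.paper_trade_id AS paper_trade_id,
       o.outcome_pnl AS outcome_pnl,
       o.outcome_pnl_pct AS outcome_pnl_pct
FROM scoring_events AS s
JOIN outcomes AS o ON o.paper_trade_id = s.paper_trade_id
"""


def projected_columns(conn, query: str) -> list[str]:
    """Return the query's result column names without fetching any rows."""
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM ({query}) AS q LIMIT 0")
    return [col[0] for col in cur.description]


def check_label_source(conn, query: str) -> None:
    """Fail if the query feeds the directional label or omits the P&L label."""
    cols = projected_columns(conn, query)
    if "realized_label" in cols:
        raise ValueError("training query projects realized_label, not trade P&L")
    if "outcome_pnl" not in cols:
        raise ValueError("training query does not project outcome_pnl")


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
check_label_source(conn, TRAINING_QUERY)  # passes: only P&L columns are projected
```

Wrapping the query as a LIMIT 0 subquery keeps the check cheap on a large table; the subquery alias is there because some databases (older PostgreSQL, for one) reject an unaliased subquery in FROM.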

When it breaks

  • If a future engineer points the training Y vector at realized_label, thinking it’s the canonical label, the model will start predicting underlying direction, which is a different game from option-premium P&L.
  • outcome_hit_tp is a separate column that’s always 0 despite 60 wins in outcomes - a write-side bug. Don’t grab outcome_hit_tp thinking it’s a usable label until that’s fixed (Task #45 / GH future).
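A small guard can make both failure modes loud. The allow-list and deny-list below are hypothetical (nothing like them is described in this note), and hit_tp_looks_broken is only a heuristic for the symptom above: wins recorded in outcome_pnl while outcome_hit_tp is never set. A winning trade need not have hit TP, so a True result means “investigate”, not “proven bug”.

```python
# Hypothetical guard; column names come from this note, the lists themselves are assumptions.
APPROVED_LABEL_COLUMNS = {"outcome_pnl", "outcome_pnl_pct"}
KNOWN_BAD_LABEL_COLUMNS = {
    "realized_label": "directional diagnostic, not trade P&L",
    "outcome_hit_tp": "write-side bug: always 0 (Task #45)",
}


def assert_label_column(name: str) -> None:
    """Raise if a column is known-bad or simply not on the approved list."""
    if name in KNOWN_BAD_LABEL_COLUMNS:
        raise ValueError(f"{name} is not a usable training label: {KNOWN_BAD_LABEL_COLUMNS[name]}")
    if name not in APPROVED_LABEL_COLUMNS:
        raise ValueError(f"{name} is not an approved label column")


def hit_tp_looks_broken(rows) -> bool:
    """Heuristic: winning trades exist, but no row has outcome_hit_tp set."""
    wins = sum(1 for r in rows if r["outcome_pnl"] >= 0)
    hits = sum(1 for r in rows if r["outcome_hit_tp"])
    return wins > 0 and hits == 0


trades = [{"outcome_pnl": 42.0, "outcome_hit_tp": 0}, {"outcome_pnl": -18.0, "outcome_hit_tp": 0}]
print(hit_tp_looks_broken(trades))  # True: a win exists but outcome_hit_tp is never set
assert_label_column("outcome_pnl")  # passes silently
```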

See Also


Timeline

2026-05-04 | derived - Audit triggered by a misread calibration table: joining model_win_prob × outcome_hit_tp showed 0% wins, which made it look like the model was predicting the wrong thing. The real bugs: (a) the hit_tp write side is broken, (b) the forward_X_pnl_pct UPDATE is silently failing, (c) n_train=26, so AUC=0.5. Label semantics were never the problem. Filed this concept so the next “we should relabel” thread starts from correct ground truth.
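A calibration check that avoids that misread compares model_win_prob against the same label the classifier trains on, outcome_pnl >= 0, rather than outcome_hit_tp (which, being always 0, yields 0% in every bucket). The sketch is plain Python with made-up inputs; equal-width buckets and n_bins=5 are arbitrary choices, not the project’s. With n_train=26, most buckets would hold only a handful of trades, so the observed rates would be noisy.

```python
def calibration_table(probs, pnls, n_bins=5):
    """Bucket predicted win probabilities and report the observed P&L win rate per bucket.

    A trade counts as a win when pnl >= 0, matching the classifier's training label.
    Returns (bucket_low, bucket_high, count, observed_win_rate) tuples; the rate is
    None for empty buckets.
    """
    counts = [0] * n_bins
    wins = [0] * n_bins
    for p, pnl in zip(probs, pnls):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the top bucket
        counts[i] += 1
        if pnl >= 0:
            wins[i] += 1
    return [
        (i / n_bins, (i + 1) / n_bins, counts[i], wins[i] / counts[i] if counts[i] else None)
        for i in range(n_bins)
    ]


# Made-up predictions (model_win_prob) and realized P&L in dollars.
probs = [0.10, 0.15, 0.90, 0.95, 1.00]
pnls = [-5.0, 3.0, 10.0, -2.0, 4.0]
for low, high, n, rate in calibration_table(probs, pnls):
    print(f"{low:.1f}-{high:.1f}  n={n}  win_rate={rate}")
```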