Meta-Labeling Implementation Patterns
Practical patterns for the meta-labeling secondary classifier shipped in Cortana MK2 (Task #56). Sources: López de Prado Advances in Financial Machine Learning Ch.3.6, Hudson & Thames published research (lifts of 17-57pp on validated S&P 500 E-mini examples), mlfinpy 0.1.2 reference implementation. Pattern: primary signal decides direction, secondary classifier decides take-or-skip on top.
Core architecture
Two-stage:
- Primary (existing scoring engine) → signal + direction.
- Secondary (L1-penalized logistic regression) → P(primary correct).
The secondary’s hypothesis space is much narrower: “given the primary already fired, will it succeed?” - not “predict direction from raw features.” This is why meta-labeling lifts performance at small n where direct training fails.
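As a minimal sketch of the two-stage flow above: the primary produces a direction, and the secondary only scores whether that call will succeed. The names PrimarySignal, Decision, two_stage_decide, and the 0.5 threshold are hypothetical and not taken from the Cortana MK2 codebase.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class PrimarySignal:
    direction: int  # +1 long, -1 short (hypothetical encoding)


@dataclass
class Decision:
    direction: int        # always comes from the primary
    meta_win_prob: float  # P(primary correct), from the secondary
    take: bool            # what a live gate would do with this score


def two_stage_decide(
    primary: Callable[[Mapping[str, float]], Optional[PrimarySignal]],
    secondary: Callable[[Mapping[str, float], PrimarySignal], float],
    features: Mapping[str, float],
    threshold: float = 0.5,  # assumed cut-off, not a tuned value
) -> Optional[Decision]:
    """Stage 1 picks the direction; stage 2 only answers take-or-skip."""
    signal = primary(features)
    if signal is None:
        # The secondary is never consulted unless the primary fired.
        return None
    p = float(secondary(features, signal))
    return Decision(direction=signal.direction, meta_win_prob=p, take=p >= threshold)
```

The secondary never changes the direction; it can only confirm or veto what the primary proposed.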
Implementation choices in our stack
Architecture: L1 logistic regression
Not XGBoost. Below n=200 with high-dimensional features, L1-penalized logistic regression is empirically more stable than tree ensembles (van Smeden 2019, Statistical Methods in Medical Research). The sparsity from L1 forces the model to commit to a small feature subset, matching the “shrink to ~12 features” recommendation in the adversarial review.
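A minimal sketch of an L1-penalized logistic regression in scikit-learn, to illustrate the sparsity effect. The function names, the StandardScaler step, and the default C are assumptions for illustration, not the production configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_l1_logistic(X, y, C=0.5):
    """Fit a scaled, L1-penalized logistic regression.

    Smaller C means a stronger penalty and more coefficients driven to exactly zero.
    Scaling first keeps the penalty comparable across features.
    """
    model = make_pipeline(
        StandardScaler(),
        # liblinear is one of the sklearn solvers that supports the L1 penalty.
        LogisticRegression(penalty="l1", solver="liblinear", C=C),
    )
    model.fit(X, y)
    return model


def surviving_features(model, feature_names):
    """Return the feature names whose coefficient was not zeroed out by L1."""
    coef = model.named_steps["logisticregression"].coef_.ravel()
    return [name for name, w in zip(feature_names, coef) if w != 0.0]
```

Calling surviving_features after a fit shows which inputs the penalty kept, which is a quick check on how small the committed subset really is.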
Feature selection: mutual information → top 8
Run sklearn mutual_info_classif on the binary triple-barrier target and
keep the top 8 features. At our n=100, anything above 8 features overfits.
Unlike Pearson correlation, the mutual-information score also captures
non-linear relationships between a feature and the target.
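The selection step can be sketched as follows. The helper name top_k_by_mutual_info and the fixed random_state are assumptions; mutual_info_classif is the real scikit-learn function.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def top_k_by_mutual_info(X, y, feature_names, k=8, random_state=0):
    """Rank features by estimated mutual information with the binary target.

    Returns (name, score) pairs for the k highest-scoring features, best first.
    The estimator is nearest-neighbor based, so random_state is fixed here
    to make the ranking reproducible.
    """
    scores = mutual_info_classif(X, y, random_state=random_state)
    order = np.argsort(scores)[::-1][:k]
    return [(feature_names[i], float(scores[i])) for i in order]
```

To avoid leaking holdout information into the ranking, the selection should be fitted on the training rows only and then applied unchanged to the holdout.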
Target: triple-barrier label remapped to binary
- triple_barrier_label = +1 (TP first) → meta target 1
- triple_barrier_label = -1 (SL first) → meta target 0
- triple_barrier_label = 0 (timeout/scratch) → meta target 0
Treating timeout as skip (not reward) is intentional: the meta classifier should learn “when does the primary hit a real win?” not “when does the primary not lose?” - the latter rewards sideways action.
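The remapping above is small enough to state as code. The function name to_meta_target is hypothetical.

```python
def to_meta_target(triple_barrier_label: int) -> int:
    """Map a triple-barrier label to the binary meta target.

    +1 (TP first)        -> 1
    -1 (SL first)        -> 0
     0 (timeout/scratch) -> 0, so sideways action is not rewarded
    """
    if triple_barrier_label not in (-1, 0, 1):
        raise ValueError(f"unexpected triple-barrier label: {triple_barrier_label!r}")
    return 1 if triple_barrier_label == 1 else 0
```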
MIN_SAMPLES_TO_TRAIN = 50
Below this, return PRIOR (uniform 0.5). Mirrors the xgboost backend pattern. Our current data has 100 trainable rows post-Task #49, so we train; the secondary would automatically fall back to PRIOR if the trainable row count dropped below the threshold.
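A minimal sketch of the fallback, assuming the trained model exposes a scikit-learn style predict_proba. The function name and signature are hypothetical; only MIN_SAMPLES_TO_TRAIN and the 0.5 prior come from the text above.

```python
MIN_SAMPLES_TO_TRAIN = 50
PRIOR = 0.5  # uniform prior returned when there is not enough data to train


def predict_meta_win_prob(model, n_train_rows, x_row):
    """Return P(primary correct), or PRIOR when no usable model exists."""
    if model is None or n_train_rows < MIN_SAMPLES_TO_TRAIN:
        return PRIOR
    # predict_proba returns one row per sample; column 1 is the positive class.
    return float(model.predict_proba([x_row])[0][1])
```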
Wiring: shadow mode first
Critical principle - never gate live trades on a meta-label that
hasn’t proven itself OOS. Our secondary emits meta_win_prob to
scoring_events.meta_win_prob for every decision but does NOT block
entry. Promotion to live gate (Task #56 phase 2) requires:
- AUC > 0.55 on a 30-day rolling holdout
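- Decile lift in the bottom decile: the lowest-scored trades lose more often than the base rate (model identifies losers)
- Calibration: the realized win rate in the high-probability buckets holds up to the predicted win rate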
- At least 50-100 chronologically-latest trades scored
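The first three criteria can be computed from a holdout of realized meta targets and logged meta_win_prob values. The sketch below is one possible shape for that check; holdout_report and its output keys are hypothetical, and building the 30-day rolling holdout itself is left to the caller.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def holdout_report(y_true, p_hat, min_auc=0.55):
    """Summarize AUC, decile win rates, and top-bucket calibration on a holdout.

    y_true: realized binary meta targets (1 = primary hit TP first).
    p_hat:  meta_win_prob values logged in shadow mode for the same trades.
    """
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_hat, dtype=float)
    order = np.argsort(p)              # ascending by predicted probability
    n_dec = max(1, len(p) // 10)       # size of one decile
    bottom, top = order[:n_dec], order[-n_dec:]
    auc = float(roc_auc_score(y, p))
    return {
        "auc": auc,
        "auc_pass": auc > min_auc,
        "base_win_rate": float(y.mean()),
        # Should sit below base_win_rate if the model identifies losers.
        "bottom_decile_win_rate": float(y[bottom].mean()),
        # Calibration check: these two should be close to each other.
        "top_decile_win_rate": float(y[top].mean()),
        "top_decile_mean_pred": float(p[top].mean()),
    }
```

With only 50-100 scored trades each decile holds 5-10 trades, so the decile win rates are noisy and are better read as a trend than as a hard pass/fail.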
What this is NOT
- Not a replacement for the primary. Both run side by side.
- Not a Bayesian online model. That’s a different architecture (next iteration if L1 logistic plateaus).
- Not a directional predictor. It only confirms or vetoes.
- Not trained on path data. mfe_pct/mae_pct features could theoretically improve it, but they’re survivorship-biased (only 85 of 126 trades have path snapshots) and would discard 30%+ of training data. Defer until path coverage is universal.
Reference implementations
- mlfinpy 0.1.2 (MIT, https://github.com/baobach/mlfinpy): open-source López de Prado port. Has triple_barriers, add_vertical_barrier, get_events, get_bins. We sanity-checked our compute_triple_barrier_label against it (tests/test_triple_barrier_vs_mlfinpy.py).
- Hudson & Thames meta-labeling repo (no license, https://github.com/hudson-and-thames/meta-labeling): 4 papers + Jupyter notebooks demonstrating architectures, calibration, ensemble approaches. Read-and-reimplement (no copy).
- AFML Ch.3.6 (López de Prado, Wiley 2018, ISBN 978-1119482086): canonical text on meta-labeling. The primary academic source.
What to build next (rough roadmap)
- Wire dashboard to display meta_win_prob alongside model_win_prob on signal cards (Task #36) - operator visibility.
- Calibration harness that runs nightly, computes AUC + decile lift on the 30-day rolling holdout, writes to retrain_history.
- Promotion gate based on conditional expectancy, not AUC (Task #53). Pre-requisite for live veto.
- Sample weights by recency or class balance (López de Prado Ch.4) once n > 200.
- Position sizing layer - meta_win_prob → contract count, not just take/skip (López de Prado Ch.10).
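For the position sizing item, one candidate is the probability-to-bet-size mapping from AFML Ch.10 for two outcomes: z = (p - 1/2) / sqrt(p(1 - p)), size = 2Φ(z) - 1, where Φ is the standard normal CDF. The sketch below assumes that mapping; the function names, the clipping epsilon, and the floor-to-contracts rule are hypothetical and not a decided design.

```python
import math


def bet_size_from_prob(p: float, eps: float = 1e-6) -> float:
    """Map a win probability to a bet size in (-1, 1) via 2 * Phi(z) - 1."""
    p = min(max(p, eps), 1.0 - eps)  # keep the denominator away from zero
    z = (p - 0.5) / math.sqrt(p * (1.0 - p))
    normal_cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 2.0 * normal_cdf - 1.0


def contracts_from_prob(p: float, max_contracts: int) -> int:
    """Scale the bet size to a whole contract count; never size below zero.

    Direction still comes from the primary, so a probability at or below 0.5
    maps to zero contracts (a skip) rather than a reversed position.
    """
    size = bet_size_from_prob(p)
    return max(0, math.floor(size * max_contracts))
```

For example, with a hypothetical cap of 10 contracts, a meta_win_prob of 0.5 maps to 0 contracts and 0.9 maps to 8.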
See Also
- 0DTE ML Best-in-Class Comparison
- ML Training Label Grounding
- 2026-05-04 Adversarial ML Data Review
- 2026-05-05 TabPFN Spike + XGBoost Baseline
Timeline
2026-05-05 | derived - Filed during Task #56 implementation as the methodology reference future engineers can read before changing the secondary classifier. Stefan Jansen ML4T was checked as a possible reference - none of its 24 chapters cover meta-labeling or triple-barrier directly. The canonical references are AFML Ch.3.6 + Hudson & Thames published research + mlfinpy reference impl.