Meta-Labeling Implementation Patterns

Practical patterns for the meta-labeling secondary classifier shipped in Cortana MK2 (Task #56). Sources: López de Prado Advances in Financial Machine Learning Ch.3.6, Hudson & Thames published research (lifts of 17-57pp on validated S&P 500 E-mini examples), mlfinpy 0.1.2 reference implementation. Pattern: primary signal decides direction, secondary classifier decides take-or-skip on top.

Core architecture

Two-stage:

Primary (existing scoring engine) → signal + direction.
Secondary (L1-penalized logistic regression) → P(primary correct).

The secondary’s hypothesis space is much narrower: it answers “given the primary already fired, will it succeed?” rather than “predict direction from raw features.” This is why meta-labeling can lift performance at small n where direct training fails.
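
The two-stage flow can be sketched as below. This is a minimal illustration, not the shipped code: PrimarySignal, Decision, decide, and the 0.5 threshold are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

Features = Mapping[str, float]


@dataclass(frozen=True)
class PrimarySignal:
    direction: int  # +1 long, -1 short (hypothetical encoding)


@dataclass(frozen=True)
class Decision:
    direction: int        # always taken from the primary
    meta_win_prob: float  # secondary's estimate of P(primary correct)
    take: bool            # take-or-skip recommendation


def decide(
    features: Features,
    primary: Callable[[Features], Optional[PrimarySignal]],
    secondary: Callable[[Features, PrimarySignal], float],
    threshold: float = 0.5,  # assumed cut-off, not a tuned value
) -> Optional[Decision]:
    signal = primary(features)
    if signal is None:
        # The primary did not fire, so the secondary is never consulted.
        return None
    p = secondary(features, signal)
    # The secondary never changes direction; it only scores the primary's call.
    return Decision(direction=signal.direction, meta_win_prob=p, take=p >= threshold)
```

The secondary only ever sees cases where the primary fired, which is what keeps its hypothesis space narrow.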

Implementation choices in our stack

Architecture: L1 logistic regression

Not XGBoost. Below n=200 with high-dimensional features, L1-penalized logistic regression is empirically more stable than tree ensembles (van Smeden 2019, Statistical Methods in Medical Research). The sparsity from L1 forces the model to commit to a small feature subset, matching the “shrink to ~12 features” recommendation in the adversarial review.
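
A minimal sketch of an L1-penalized logistic regression in scikit-learn, assuming standardized numeric features. The helper names (build_meta_classifier, fit_and_inspect) and the default C are illustrative assumptions, not the values used in our stack.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_meta_classifier(C: float = 0.5):
    """L1-penalized logistic regression on standardized features.

    C is the inverse regularization strength: smaller C gives a sparser model.
    The default is an illustrative assumption, not a tuned value.
    """
    return make_pipeline(
        StandardScaler(),  # L1 penalties are sensitive to feature scale
        LogisticRegression(penalty="l1", solver="liblinear", C=C),
    )


def fit_and_inspect(X, y, C: float = 0.5):
    """Fit the pipeline and return it with the indices of non-zero coefficients."""
    model = build_meta_classifier(C).fit(X, y)
    coefs = model.named_steps["logisticregression"].coef_.ravel()
    kept = np.flatnonzero(coefs)  # features the L1 penalty did not zero out
    return model, kept
```

Inspecting kept after each retrain is one way to see which features the model has committed to; model.predict_proba(X)[:, 1] gives the P(primary correct) estimate.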

Feature selection: mutual information → top 8

Run sklearn’s mutual_info_classif against the binary triple-barrier target and keep the top 8 features. At our n=100, anything above 8 features overfits. Mutual information also captures non-linear dependence that Pearson correlation misses.
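
The selection step might look like the sketch below. top_k_by_mutual_info is a hypothetical helper; mutual_info_classif is the real scikit-learn function, and a fixed random_state is assumed so the ranking is reproducible.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def top_k_by_mutual_info(X, y, feature_names, k: int = 8, random_state: int = 0):
    """Rank features by estimated mutual information with the binary target.

    Returns the names of the k highest-scoring features, best first.
    """
    scores = mutual_info_classif(X, y, random_state=random_state)
    order = np.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return [feature_names[i] for i in order]
```

If the classifier is evaluated with cross-validation or a holdout, the ranking should be computed on the training portion only; selecting features on the full dataset leaks target information into the evaluation.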

Target: triple-barrier label remapped to binary

triple_barrier_label = +1 (TP first) → meta target 1
triple_barrier_label = -1 (SL first) → meta target 0
triple_barrier_label = 0 (timeout/scratch) → meta target 0

Mapping timeouts to 0 (skip, not reward) is intentional: the meta classifier should learn “when does the primary hit a real win?”, not “when does the primary not lose?” The latter rewards sideways action.
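
The remapping above is small enough to state directly in code. meta_target is a hypothetical helper name used only for this sketch.

```python
def meta_target(triple_barrier_label: int) -> int:
    """Map a triple-barrier label to the binary meta-label target.

    +1 (TP first)        -> 1
    -1 (SL first)        -> 0
     0 (timeout/scratch) -> 0
    """
    if triple_barrier_label not in (-1, 0, 1):
        raise ValueError(f"unexpected triple-barrier label: {triple_barrier_label!r}")
    return 1 if triple_barrier_label == 1 else 0
```

Rejecting unexpected values rather than silently mapping them to 0 keeps a labeling bug upstream from quietly inflating the loss class.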

MIN_SAMPLES_TO_TRAIN = 50

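Below this threshold the secondary returns PRIOR (a flat 0.5) instead of training, mirroring the xgboost backend pattern. Our current data has 100 trainable rows post-Task #49, so we train; the classifier would automatically fall back to PRIOR if the trainable row count dropped below the threshold.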
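
A sketch of the fallback, assuming the trainer is passed in as a callable. train_or_prior and the fit argument are hypothetical; only MIN_SAMPLES_TO_TRAIN = 50 and the 0.5 prior come from this note.

```python
from typing import Callable, Sequence

MIN_SAMPLES_TO_TRAIN = 50
PRIOR = 0.5  # uninformative win probability returned when we cannot train

Row = Sequence[float]
Predictor = Callable[[Row], float]


def train_or_prior(
    rows: Sequence[Row],
    labels: Sequence[int],
    fit: Callable[[Sequence[Row], Sequence[int]], Predictor],
) -> Predictor:
    """Train only when enough rows exist; otherwise return a constant-prior predictor."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if len(rows) < MIN_SAMPLES_TO_TRAIN:
        return lambda row: PRIOR
    return fit(rows, labels)
```

Because the check runs on every retrain, a drop in trainable rows degrades to the prior without any code change.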

Wiring: shadow mode first

Critical principle: never gate live trades on a meta-label that hasn’t proven itself out of sample (OOS). Our secondary writes meta_win_prob to scoring_events.meta_win_prob for every decision but does NOT block entry. Promotion to live gate (Task #56 phase 2) requires:

AUC > 0.55 on a 30-day rolling holdout
Decile lift in the bottom decile (the model reliably identifies losers)
Calibration: trades in the high-probability buckets realize roughly the win rate predicted for them
At least 50-100 of the chronologically latest trades scored
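
The promotion checks above could be computed along these lines. This is a standard-library sketch under stated assumptions: only the 0.55 AUC floor comes from this note. The function names, the reading of "bottom-decile lift" as "bottom-decile win rate below the base rate", the 0.10 calibration tolerance, and the choice of 50 as the minimum trade count are all illustrative.

```python
from typing import Dict, Sequence

MIN_AUC = 0.55  # from the promotion criteria


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random winner outscores a random loser (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one winner and one loser")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shadow_report(y_true: Sequence[int], scores: Sequence[float]) -> Dict[str, float]:
    """Summarize shadow-mode predictions against realized outcomes."""
    n = len(y_true)
    ranked = sorted(zip(scores, y_true))  # ascending by predicted probability
    k = max(1, n // 10)                   # decile size
    bottom, top = ranked[:k], ranked[-k:]
    return {
        "n_scored": n,
        "auc": auc(y_true, scores),
        "base_win_rate": sum(y_true) / n,
        "bottom_decile_win_rate": sum(y for _, y in bottom) / k,
        "top_decile_win_rate": sum(y for _, y in top) / k,
        "top_decile_mean_pred": sum(s for s, _ in top) / k,
    }


def passes_gate(
    report: Dict[str, float],
    min_trades: int = 50,               # assumed lower bound
    max_calibration_gap: float = 0.10,  # assumed tolerance
) -> bool:
    """True only if every promotion check holds."""
    overconfidence = report["top_decile_mean_pred"] - report["top_decile_win_rate"]
    return (
        report["n_scored"] >= min_trades
        and report["auc"] > MIN_AUC
        and report["bottom_decile_win_rate"] < report["base_win_rate"]
        and overconfidence <= max_calibration_gap
    )
```

Keeping the report separate from the pass/fail decision lets the same numbers be logged nightly while the gate thresholds are still being argued over.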

What this is NOT

Not a replacement for the primary. Both run side by side.
Not a Bayesian online model. That’s a different architecture (next iteration if L1 logistic plateaus).
Not a directional predictor. It only confirms or vetoes.
Not trained on path data. mfe_pct/mae_pct features could theoretically improve it, but they’re survivorship-biased (only 85 of 126 trades have path snapshots), and restricting training to the covered trades would discard 30%+ of the training data. Defer until path coverage is universal.

Reference implementations

mlfinpy 0.1.2 (MIT, https://github.com/baobach/mlfinpy): open-source López de Prado port. Has triple_barriers, add_vertical_barrier, get_events, get_bins. We sanity-checked our compute_triple_barrier_label against it (tests/test_triple_barrier_vs_mlfinpy.py).
Hudson & Thames meta-labeling repo (no license, https://github.com/hudson-and-thames/meta-labeling): 4 papers + Jupyter notebooks demonstrating architectures, calibration, ensemble approaches. Because the repo carries no license, treat it as read-and-reimplement only; do not copy code.
AFML Ch.3.6 (López de Prado, Wiley 2018, ISBN 978-1119482086): canonical text on meta-labeling. The primary academic source.

What to build next (rough roadmap)

Wire the dashboard to display meta_win_prob alongside model_win_prob on signal cards (Task #36) for operator visibility.
Calibration harness that runs nightly, computes AUC + decile lift on the 30-day rolling holdout, writes to retrain_history.
Promotion gate based on conditional expectancy, not AUC (Task #53). Pre-requisite for live veto.
Sample weights by recency or class balance (López de Prado Ch.4) once n > 200.
Position sizing layer: map meta_win_prob to a contract count, not just take/skip (López de Prado Ch.10).
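
For the position sizing item, one possible starting point is the two-outcome bet-sizing transform from AFML Ch.10 as we understand it: z = (p - 0.5) / sqrt(p(1 - p)) and m = 2Φ(z) - 1, where Φ is the standard normal CDF. The sketch below is an assumption-laden illustration: bet_fraction, contracts, max_contracts, the clamping epsilon, and the floor-to-integer rule are hypothetical and not part of any current design.

```python
from statistics import NormalDist


def bet_fraction(p: float, eps: float = 1e-6) -> float:
    """Map a win probability to a signed bet size in [-1, 1].

    p is clamped away from 0 and 1 to avoid division by zero.
    """
    p = min(max(p, eps), 1.0 - eps)
    z = (p - 0.5) / (p * (1.0 - p)) ** 0.5
    return 2.0 * NormalDist().cdf(z) - 1.0


def contracts(meta_win_prob: float, max_contracts: int) -> int:
    """Scale the bet fraction to a whole number of contracts.

    Negative fractions are floored at zero because the secondary only
    confirms or vetoes the primary; it never reverses direction.
    """
    m = max(0.0, bet_fraction(meta_win_prob))
    return int(m * max_contracts)
```

With this mapping a meta_win_prob of 0.5 or below sizes to zero contracts, so take/skip becomes a special case of sizing.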

Timeline

2026-05-05 | derived - Filed during Task #56 implementation as the methodology reference future engineers can read before changing the secondary classifier. Stefan Jansen ML4T was checked as a possible reference - none of its 24 chapters cover meta-labeling or triple-barrier directly. The canonical references are AFML Ch.3.6 + Hudson & Thames published research + mlfinpy reference impl.
