Time Series Cross Validation

Purged K-Fold, Walk-Forward, & Combinatorial Purged CV

Created by Dr. Pedram Jahangiry | Enhanced with Claude

Purged K-Fold Cross Validation

Standard K-Fold CV randomly shuffles data into folds — but time series data is not i.i.d.! Nearby observations are correlated, and labels often span multiple time periods (e.g., a 20-day forward return). This creates data leakage: the training set contains information that "bleeds" into the test period.

Purged K-Fold CV (from de Prado, 2018) fixes this with two mechanisms:

  • Purging: Remove training samples whose labels overlap with the test period — eliminates direct lookahead bias
  • Embargo: Remove a buffer of training samples after each test block — accounts for feature leakage (e.g., rolling averages that include test data)
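The purging and embargo mechanics can be sketched in a few lines. This is a minimal illustration, not de Prado's exact implementation: the function name `purged_kfold_indices` is ours, labels are assumed to resolve over a fixed forward window of `label_span` bars (de Prado's version uses per-sample label start/end timestamps), and the embargo is a simple fraction of the sample count.

```python
import numpy as np

def purged_kfold_indices(n_samples, n_splits=5, label_span=20, embargo_pct=0.01):
    """Yield (train_idx, test_idx) pairs with purging and an embargo.

    Assumes the label at index i resolves over the window [i, i + label_span).
    """
    indices = np.arange(n_samples)
    embargo = int(n_samples * embargo_pct)
    for test_idx in np.array_split(indices, n_splits):
        test_start, test_end = test_idx[0], test_idx[-1]
        train_mask = np.ones(n_samples, dtype=bool)
        # Purge: drop samples whose label window [i, i + label_span)
        # overlaps the test period [test_start, test_end].
        overlap = (indices + label_span - 1 >= test_start) & (indices <= test_end)
        train_mask[overlap] = False
        # Embargo: drop a buffer of samples immediately after the test block,
        # since their features (e.g., rolling averages) may contain test data.
        train_mask[test_end + 1 : test_end + 1 + embargo] = False
        yield indices[train_mask], test_idx
```

Note that purging removes samples *before* the test block (whose labels reach into it), while the embargo removes samples *after* it.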

⚠ Why Not Standard K-Fold?

With time series, standard K-Fold allows future data to train the model. Even if folds are contiguous, labels that span multiple bars create overlap between train and test sets. Purging and embargoing are essential to get an honest out-of-sample estimate.

[Figure: Purged K-Fold timeline. Legend: Train, Test, Purged, Embargo, Unused]

Key Insight

Purging removes training samples whose label resolution period overlaps the test set, while embargo removes samples immediately after the test set whose features may contain test-period information. Together, they prevent both direct and indirect data leakage.

Walk-Forward Cross Validation

Walk-Forward CV respects the temporal ordering of time series by always training on past data and testing on future data. It comes in two variants:

  • Expanding Window: Training set grows over time — each split adds more historical data. Uses all available past data, but training cost increases with each split.
  • Rolling (Sliding) Window: Training set has a fixed size that slides forward — drops the oldest data as new data is added. Better for non-stationary data where old patterns may be irrelevant.
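Both variants can be generated by the same index logic, differing only in where the training window starts. A minimal sketch with integer indices (the function name and parameters are illustrative, not from a specific library):

```python
import numpy as np

def walk_forward_splits(n_samples, train_size, test_size, expanding=True):
    """Generate (train_idx, test_idx) pairs for walk-forward CV.

    expanding=True grows the training window over time;
    expanding=False slides a fixed-size window forward.
    """
    splits = []
    test_start = train_size
    while test_start + test_size <= n_samples:
        # Expanding: always train from the beginning.
        # Rolling: train on only the most recent train_size observations.
        train_start = 0 if expanding else test_start - train_size
        train_idx = np.arange(train_start, test_start)
        test_idx = np.arange(test_start, test_start + test_size)
        splits.append((train_idx, test_idx))
        test_start += test_size
    return splits
```

In every split the training indices end strictly before the test indices begin, which is what makes the scheme honest for ordered data.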
[Figure: Walk-Forward splits, expanding and rolling variants. Legend: Train, Test, Unused]

Key Insight

Expanding window uses all available history for training, making it data-efficient. However, it produces only a single backtest path, making it hard to assess overfitting. Old patterns may dominate if the data is non-stationary.

Combinatorial Purged Cross Validation (CPCV)

CPCV (de Prado, 2018) is the gold standard for financial time series model evaluation. It solves two critical problems that Walk-Forward and Purged K-Fold leave unaddressed:

  • Multiple backtest paths: Instead of one backtest path, CPCV generates φ(N,k) = C(N−1, k−1) unique paths that span the entire dataset
  • Overfitting detection: With multiple paths, you get a distribution of performance metrics (e.g., Sharpe ratios), enabling formal overfitting tests like the Probability of Backtest Overfitting (PBO)

How it works: Divide the data into N contiguous groups and choose k of them as the test set in each split, purging and embargoing the training groups around each test group. This yields C(N, k) splits, and each group serves as a test group in C(N−1, k−1) of them. The test-set predictions are then recombined into C(N−1, k−1) full backtest paths, each spanning the entire dataset.
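The combinatorics can be verified directly. A sketch at the group level only (the function name `cpcv_splits` is ours, and purging/embargo between adjacent groups is omitted):

```python
from itertools import combinations
from math import comb

def cpcv_splits(n_groups=6, k_test=2):
    """Enumerate CPCV splits: each split holds out k of N groups as test.

    Returns a list of (train_groups, test_groups) tuples.
    Number of splits is C(N, k); number of recombined backtest
    paths is phi(N, k) = C(N-1, k-1).
    """
    groups = range(n_groups)
    splits = []
    for test_groups in combinations(groups, k_test):
        train_groups = tuple(g for g in groups if g not in test_groups)
        splits.append((train_groups, test_groups))
    return splits

splits = cpcv_splits(6, 2)
print(len(splits))         # C(6, 2) = 15 splits
print(comb(6 - 1, 2 - 1))  # phi(6, 2) = 5 backtest paths
```

With N = 6 and k = 2 this gives 15 splits, each group is tested 5 times, and the predictions recombine into 5 full backtest paths.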

[Figure: CPCV, all C(N, k) splits and the recombined backtest paths. Legend: Train, Test, Purge/Embargo]

Key Insight

CPCV produces multiple backtest paths, enabling you to assess the distribution of out-of-sample performance — not just a single point estimate. This makes it possible to compute the Probability of Backtest Overfitting (PBO) and detect strategies that performed well by luck rather than genuine predictive power.

Comparison of Methods

Property                          | Standard K-Fold | Walk-Forward | Purged K-Fold | CPCV
Respects time order               | ✗               | ✓            | ✗             | ✗
Prevents label leakage            | ✗               | ✓            | ✓             | ✓
Handles feature leakage (embargo) | ✗               | ✗            | ✓             | ✓
Multiple backtest paths           | ✗               | ✗            | ✗             | ✓
Overfitting detection (PBO)       | ✗               | ✗            | ✗             | ✓
Data efficiency                   | High            | Low          | High          | High
Backtest paths                    | 1               | 1            | 1             | φ(N, k)