Time Series Cross Validation

Purged K-Fold, Walk-Forward, & Combinatorial Purged CV

Created by Dr. Pedram Jahangiry | Enhanced with Claude

Purged K-Fold Cross Validation

Standard K-Fold CV randomly shuffles data into folds — but time series data is not i.i.d.! Nearby observations are correlated, and labels often span multiple time periods (e.g., a 20-day forward return). This creates data leakage: the training set contains information that "bleeds" into the test period.

Purged K-Fold CV (from de Prado, 2018) fixes this with two mechanisms:

  • Purging: Remove training samples whose labels overlap with the test period — eliminates direct lookahead bias
  • Embargo: Remove a buffer of training samples after each test block — accounts for feature leakage (e.g., rolling averages that include test data)
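The purging and embargo mechanics can be sketched in a few lines. This is a minimal illustration, not de Prado's exact implementation: the function name `purged_kfold_indices` is ours, labels are assumed to resolve over a fixed forward window of `label_span` bars (de Prado's version uses per-sample label start/end timestamps), and the embargo is a simple fraction of the sample count.

```python
import numpy as np

def purged_kfold_indices(n_samples, n_splits=5, label_span=20, embargo_pct=0.01):
    """Yield (train_idx, test_idx) pairs with purging and an embargo.

    Assumes the label at index i resolves over the window [i, i + label_span).
    """
    indices = np.arange(n_samples)
    embargo = int(n_samples * embargo_pct)
    for test_idx in np.array_split(indices, n_splits):
        test_start, test_end = test_idx[0], test_idx[-1]
        train_mask = np.ones(n_samples, dtype=bool)
        # Purge: drop samples whose label window [i, i + label_span)
        # overlaps the test period [test_start, test_end].
        overlap = (indices + label_span - 1 >= test_start) & (indices <= test_end)
        train_mask[overlap] = False
        # Embargo: drop a buffer of samples immediately after the test block,
        # since their features (e.g., rolling averages) may contain test data.
        train_mask[test_end + 1 : test_end + 1 + embargo] = False
        yield indices[train_mask], test_idx
```

Note that purging removes samples *before* the test block (whose labels reach into it), while the embargo removes samples *after* it.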

⚠ Why Not Standard K-Fold?

With time series, standard K-Fold allows future data to train the model. Even if folds are contiguous, labels that span multiple bars create overlap between train and test sets. Purging and embargoing are essential to get an honest out-of-sample estimate.

[Figure: Purged K-Fold timeline. Legend: Train, Test, Purged, Embargo, Unused]

Key Insight

Purging removes training samples whose label resolution period overlaps the test set, while embargo removes samples immediately after the test set whose features may contain test-period information. Together, they prevent both direct and indirect data leakage.

Walk-Forward Cross Validation

Walk-Forward CV respects the temporal ordering of time series by always training on past data and testing on future data. It comes in two variants:

  • Expanding Window: Training set grows over time — each split adds more historical data. Uses all available past data, but training cost increases with each split.
  • Rolling (Sliding) Window: Training set has a fixed size that slides forward — drops the oldest data as new data is added. Better for non-stationary data where old patterns may be irrelevant.
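Both variants can be generated by the same index logic, differing only in where the training window starts. A minimal sketch with integer indices (the function name and parameters are illustrative, not from a specific library):

```python
import numpy as np

def walk_forward_splits(n_samples, train_size, test_size, expanding=True):
    """Generate (train_idx, test_idx) pairs for walk-forward CV.

    expanding=True grows the training window over time;
    expanding=False slides a fixed-size window forward.
    """
    splits = []
    test_start = train_size
    while test_start + test_size <= n_samples:
        # Expanding: always train from the beginning.
        # Rolling: train on only the most recent train_size observations.
        train_start = 0 if expanding else test_start - train_size
        train_idx = np.arange(train_start, test_start)
        test_idx = np.arange(test_start, test_start + test_size)
        splits.append((train_idx, test_idx))
        test_start += test_size
    return splits
```

In every split the training indices end strictly before the test indices begin, which is what makes the scheme honest for ordered data.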
[Figure: Walk-Forward splits, expanding and rolling variants. Legend: Train, Test, Unused]

Key Insight

Expanding window uses all available history for training, making it data-efficient. However, it produces only a single backtest path, making it hard to assess overfitting. Old patterns may dominate if the data is non-stationary.

Combinatorial Purged Cross Validation (CPCV)

CPCV (de Prado, 2018) is the gold standard for financial time series model evaluation. It solves two critical problems that Walk-Forward and Purged K-Fold leave unaddressed:

  • Multiple backtest paths: Instead of one backtest path, CPCV generates φ(N,k) = C(N−1, k−1) unique paths that span the entire dataset
  • Overfitting detection: With multiple paths, you get a distribution of performance metrics (e.g., Sharpe ratios), enabling formal overfitting tests like the Probability of Backtest Overfitting (PBO)

How it works: Divide the data into N contiguous groups and choose k of them as the test set in each split, purging and embargoing the training groups around each test group. This yields C(N, k) splits, and each group serves as a test group in C(N−1, k−1) of them. The test-set predictions are then recombined into C(N−1, k−1) full backtest paths, each spanning the entire dataset.
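The combinatorics can be verified directly. A sketch at the group level only (the function name `cpcv_splits` is ours, and purging/embargo between adjacent groups is omitted):

```python
from itertools import combinations
from math import comb

def cpcv_splits(n_groups=6, k_test=2):
    """Enumerate CPCV splits: each split holds out k of N groups as test.

    Returns a list of (train_groups, test_groups) tuples.
    Number of splits is C(N, k); number of recombined backtest
    paths is phi(N, k) = C(N-1, k-1).
    """
    groups = range(n_groups)
    splits = []
    for test_groups in combinations(groups, k_test):
        train_groups = tuple(g for g in groups if g not in test_groups)
        splits.append((train_groups, test_groups))
    return splits

splits = cpcv_splits(6, 2)
print(len(splits))         # C(6, 2) = 15 splits
print(comb(6 - 1, 2 - 1))  # phi(6, 2) = 5 backtest paths
```

With N = 6 and k = 2 this gives 15 splits, each group is tested 5 times, and the predictions recombine into 5 full backtest paths.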

[Figure: CPCV, all C(N, k) splits and the recombined backtest paths. Legend: Train, Test, Purge/Embargo]

Key Insight

CPCV produces multiple backtest paths, enabling you to assess the distribution of out-of-sample performance — not just a single point estimate. This makes it possible to compute the Probability of Backtest Overfitting (PBO) and detect strategies that performed well by luck rather than genuine predictive power.

Comparison of Methods

Property                          | Standard K-Fold | Walk-Forward | Purged K-Fold | CPCV
Respects time order               | ✗               | ✓            | ✗             | ✗
Prevents label leakage            | ✗               | ✓            | ✓             | ✓
Handles feature leakage (embargo) | ✗               | ✗            | ✓             | ✓
Multiple backtest paths           | ✗               | ✗            | ✗             | ✓
Overfitting detection (PBO)       | ✗               | ✗            | ✗             | ✓
Data efficiency                   | High            | Low          | High          | High
Backtest paths                    | 1               | 1            | 1             | φ(N, k)