Cross Validation

Module 5 — Machine Learning for Time Series Forecasting
Created by Dr. Pedram Jahangiry | Enhanced with Claude

K-Fold Cross Validation

When we don't have enough data for a separate validation set, K-Fold CV solves the problem by reusing data. The training data is split into K equal-sized folds. In each iteration, one fold is held out as the validation set and the remaining K−1 folds are used for training. The final performance is the average across all K iterations.

[Interactive demo: a 20-sample dataset split into K=5 folds; each fold is highlighted as the validation set in turn, and the average CV score is reported.]
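The splitting mechanics behind the demo can be sketched in a few lines of pure Python (the function name is illustrative; in practice you would use a library splitter such as scikit-learn's `KFold`):

```python
# Minimal sketch of 5-fold splitting: 20 samples, each fold held out
# exactly once as the validation set, the rest used for training.
def k_fold_splits(n_samples, k):
    """Yield (train_indices, val_indices) for each of the k folds."""
    fold_size = n_samples // k
    indices = list(range(n_samples))
    for fold in range(k):
        val_idx = indices[fold * fold_size:(fold + 1) * fold_size]
        train_idx = [i for i in indices if i not in val_idx]
        yield train_idx, val_idx

for train_idx, val_idx in k_fold_splits(20, 5):
    # In practice: fit the model on train_idx, score it on val_idx,
    # then average the 5 scores to get the final CV estimate.
    print(f"val fold {val_idx[0]}-{val_idx[-1]}, train size {len(train_idx)}")
```

Note that every sample appears in exactly one validation fold, so each observation is used for validation once and for training K−1 times.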

Stratified K-Fold Cross Validation

Standard K-Fold may create folds where class proportions are unbalanced — one fold might have mostly one class. Stratified K-Fold ensures each fold has approximately the same percentage of samples from each class as the complete dataset. This is especially important for imbalanced classification problems.

[Interactive demo: a 3-fold stratified split with Class A (30%) and Class B (70%); each fold preserves the class proportions.]
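One simple way to implement stratification, sketched in pure Python below, is to deal each class's indices round-robin across the folds (illustrative code; scikit-learn's `StratifiedKFold` is the standard tool):

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Return k validation-index lists, each keeping the overall class mix."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx_list in by_class.values():
        for j, i in enumerate(idx_list):
            folds[j % k].append(i)  # deal this class round-robin across folds
    return folds

# 30 samples: 30% class "A", 70% class "B", matching the demo above
labels = ["A"] * 9 + ["B"] * 21
for fold in stratified_folds(labels, 3):
    n_a = sum(labels[i] == "A" for i in fold)
    print(f"fold size {len(fold)}: {n_a} A, {len(fold) - n_a} B")  # 3 A, 7 B each
```

Each fold ends up with 3 A and 7 B samples, the same 30/70 mix as the full dataset.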

Leave-One-Out Cross Validation (LOOCV)

LOOCV is K-Fold taken to its extreme: K = N (the number of samples). Each iteration uses a single sample as the validation set and all remaining N−1 samples for training.

Why is it almost unbiased? Each training set contains N−1 out of N samples — nearly the entire dataset. So the model trained in each fold is almost identical to a model trained on all the data. The performance estimate therefore closely approximates the true generalization error. It's "almost" rather than exactly unbiased because each fold is still missing one sample, introducing a tiny amount of bias.

The downside: it is computationally expensive — the model must be trained N times, and the estimate can have high variance since each validation set is just a single observation.
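The full LOOCV loop fits in a few lines. Here is a hedged sketch for classification using a simple 1-nearest-neighbour rule on a made-up one-feature dataset (both the data and the model are purely illustrative):

```python
# LOOCV: hold out each sample once, train on the remaining N-1 samples.
X = [0.0, 0.5, 1.0, 5.0, 5.5, 6.0]  # one feature, illustrative values
y = [0,   0,   0,   1,   1,   1]

n = len(X)
correct = 0
for i in range(n):                       # N iterations, one per sample
    train = [(X[j], y[j]) for j in range(n) if j != i]
    # "Training" a 1-NN model is just memorising; predict the label
    # of the closest remaining point.
    pred = min(train, key=lambda t: abs(t[0] - X[i]))[1]
    correct += (pred == y[i])

loocv_accuracy = correct / n
print(loocv_accuracy)  # 1.0 — every held-out point is classified correctly
```

Because the splits are fully determined by N, rerunning this loop always yields the same score: there is no randomness to average over.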

Demo: Classification Example

The simulation below demonstrates LOOCV on a classification task. In each iteration, the single held-out sample is predicted as either correct (✓) or incorrect (✗). The final LOOCV accuracy is the fraction of samples correctly classified.

For regression tasks, the logic is the same — but instead of correct/incorrect, each iteration produces a prediction error (e.g., squared error), and the final LOOCV score is the average error across all N iterations.

[Interactive demo: click "Animate LOOCV" to step through all 12 iterations and watch the final LOOCV score accumulate.]
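The regression variant described above differs only in what each iteration records. A minimal sketch, using the mean of the remaining targets as a trivial stand-in for a fitted model (data and model are illustrative):

```python
# LOOCV for regression: each iteration yields a squared error,
# and the final score is the average over all N iterations.
y = [2.0, 4.0, 6.0, 8.0]  # illustrative targets

errors = []
for i in range(len(y)):
    train = [y[j] for j in range(len(y)) if j != i]
    pred = sum(train) / len(train)       # trivial "model": mean of training targets
    errors.append((y[i] - pred) ** 2)    # squared error on the held-out sample

loocv_mse = sum(errors) / len(errors)
print(round(loocv_mse, 4))  # 8.8889
```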

When to Use Which?

Each cross-validation strategy has trade-offs between computational cost, data usage, and reliability of the performance estimate. The right choice depends on your dataset size, class balance, and available compute.


K-Fold CV (K=5 or 10)

  • Computationally efficient
  • Most commonly used in practice
  • Random splits may not preserve class ratios

Stratified K-Fold

  • Preserves class distribution in every fold
  • Essential for imbalanced datasets
  • More reliable estimates for classification
  • Only applicable to classification tasks

LOOCV (K=N)

  • Uses maximum data for training each time
  • Deterministic — no randomness in splits
  • Very expensive: trains N separate models

Practical Recommendation

  • Use K=5 or K=10 as default
  • Use Stratified K-Fold for classification with imbalanced classes
  • Use LOOCV only when N is very small (< 50)
  • For time series, use specialized Time Series CV (next topic!)
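These recommendations can be encoded as a small decision helper using scikit-learn's splitters (the function name, thresholds, and defaults below are illustrative choices, not canonical rules):

```python
# Illustrative helper mapping the recommendations above to scikit-learn splitters.
from sklearn.model_selection import (KFold, StratifiedKFold,
                                     LeaveOneOut, TimeSeriesSplit)

def choose_cv(n_samples, task="regression", imbalanced=False, time_series=False):
    if time_series:
        return TimeSeriesSplit(n_splits=5)   # never shuffle temporal data
    if n_samples < 50:
        return LeaveOneOut()                 # small N: LOOCV is affordable
    if task == "classification" and imbalanced:
        return StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return KFold(n_splits=5, shuffle=True, random_state=0)  # sensible default

print(type(choose_cv(30)).__name__)                                       # LeaveOneOut
print(type(choose_cv(500, "classification", imbalanced=True)).__name__)   # StratifiedKFold
```

Any of these splitter objects can be passed directly to `cross_val_score(model, X, y, cv=...)`.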