Cross Validation

Module 5 — Machine Learning for Time Series Forecasting
Created by Dr. Pedram Jahangiry | Enhanced with Claude

K-Fold Cross Validation

When we don't have enough data for a separate validation set, K-Fold CV solves the problem by reusing data. The training data is split into K equal-sized folds. In each iteration, one fold is held out as the validation set and the remaining K−1 folds are used for training. The final performance is the average across all K iterations.

[Interactive demo: a 20-sample dataset split into K=5 folds; each fold is highlighted as the validation set in turn, and the average CV score is reported.]
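The splitting mechanics behind the demo can be sketched in a few lines of pure Python (the function name is illustrative; in practice you would use a library splitter such as scikit-learn's `KFold`):

```python
# Minimal sketch of 5-fold splitting: 20 samples, each fold held out
# exactly once as the validation set, the rest used for training.
def k_fold_splits(n_samples, k):
    """Yield (train_indices, val_indices) for each of the k folds."""
    fold_size = n_samples // k
    indices = list(range(n_samples))
    for fold in range(k):
        val_idx = indices[fold * fold_size:(fold + 1) * fold_size]
        train_idx = [i for i in indices if i not in val_idx]
        yield train_idx, val_idx

for train_idx, val_idx in k_fold_splits(20, 5):
    # In practice: fit the model on train_idx, score it on val_idx,
    # then average the 5 scores to get the final CV estimate.
    print(f"val fold {val_idx[0]}-{val_idx[-1]}, train size {len(train_idx)}")
```

Note that every sample appears in exactly one validation fold, so each observation is used for validation once and for training K−1 times.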

Stratified K-Fold Cross Validation

Standard K-Fold may create folds where class proportions are unbalanced — one fold might have mostly one class. Stratified K-Fold ensures each fold has approximately the same percentage of samples from each class as the complete dataset. This is especially important for imbalanced classification problems.

[Interactive demo: a 3-fold stratified split with Class A (30%) and Class B (70%); each fold preserves the class proportions.]
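One simple way to implement stratification, sketched in pure Python below, is to deal each class's indices round-robin across the folds (illustrative code; scikit-learn's `StratifiedKFold` is the standard tool):

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Return k validation-index lists, each keeping the overall class mix."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx_list in by_class.values():
        for j, i in enumerate(idx_list):
            folds[j % k].append(i)  # deal this class round-robin across folds
    return folds

# 30 samples: 30% class "A", 70% class "B", matching the demo above
labels = ["A"] * 9 + ["B"] * 21
for fold in stratified_folds(labels, 3):
    n_a = sum(labels[i] == "A" for i in fold)
    print(f"fold size {len(fold)}: {n_a} A, {len(fold) - n_a} B")  # 3 A, 7 B each
```

Each fold ends up with 3 A and 7 B samples, the same 30/70 mix as the full dataset.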

Leave-One-Out Cross Validation (LOOCV)

LOOCV is K-Fold taken to its extreme: K = N (the number of samples). Each iteration uses a single sample as the validation set and all remaining N−1 samples for training.

Why is it almost unbiased? Each training set contains N−1 out of N samples — nearly the entire dataset. So the model trained in each fold is almost identical to a model trained on all the data. The performance estimate therefore closely approximates the true generalization error. It's "almost" rather than exactly unbiased because each fold is still missing one sample, introducing a tiny amount of bias.

The downside: it is computationally expensive — the model must be trained N times, and the estimate can have high variance since each validation set is just a single observation.
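The full LOOCV loop fits in a few lines. Here is a hedged sketch for classification using a simple 1-nearest-neighbour rule on a made-up one-feature dataset (both the data and the model are purely illustrative):

```python
# LOOCV: hold out each sample once, train on the remaining N-1 samples.
X = [0.0, 0.5, 1.0, 5.0, 5.5, 6.0]  # one feature, illustrative values
y = [0,   0,   0,   1,   1,   1]

n = len(X)
correct = 0
for i in range(n):                       # N iterations, one per sample
    train = [(X[j], y[j]) for j in range(n) if j != i]
    # "Training" a 1-NN model is just memorising; predict the label
    # of the closest remaining point.
    pred = min(train, key=lambda t: abs(t[0] - X[i]))[1]
    correct += (pred == y[i])

loocv_accuracy = correct / n
print(loocv_accuracy)  # 1.0 — every held-out point is classified correctly
```

Because the splits are fully determined by N, rerunning this loop always yields the same score: there is no randomness to average over.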

Demo: Classification Example

The simulation below demonstrates LOOCV on a classification task. In each iteration, the single held-out sample is predicted as either correct (✓) or incorrect (✗). The final LOOCV accuracy is the fraction of samples correctly classified.

For regression tasks, the logic is the same — but instead of correct/incorrect, each iteration produces a prediction error (e.g., squared error), and the final LOOCV score is the average error across all N iterations.

[Interactive demo: click "Animate LOOCV" to step through all 12 iterations and watch the final LOOCV score accumulate.]
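The regression variant described above differs only in what each iteration records. A minimal sketch, using the mean of the remaining targets as a trivial stand-in for a fitted model (data and model are illustrative):

```python
# LOOCV for regression: each iteration yields a squared error,
# and the final score is the average over all N iterations.
y = [2.0, 4.0, 6.0, 8.0]  # illustrative targets

errors = []
for i in range(len(y)):
    train = [y[j] for j in range(len(y)) if j != i]
    pred = sum(train) / len(train)       # trivial "model": mean of training targets
    errors.append((y[i] - pred) ** 2)    # squared error on the held-out sample

loocv_mse = sum(errors) / len(errors)
print(round(loocv_mse, 4))  # 8.8889
```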

When to Use Which?

Each cross-validation strategy has trade-offs between computational cost, data usage, and reliability of the performance estimate. The right choice depends on your dataset size, class balance, and available compute.


K-Fold CV (K=5 or 10)

  • Computationally efficient
  • Most commonly used in practice
  • Random splits may not preserve class ratios

Stratified K-Fold

  • Preserves class distribution in every fold
  • Essential for imbalanced datasets
  • More reliable estimates for classification
  • Only applicable to classification tasks

LOOCV (K=N)

  • Uses maximum data for training each time
  • Deterministic — no randomness in splits
  • Very expensive: trains N separate models

Practical Recommendation

  • Use K=5 or K=10 as default
  • Use Stratified K-Fold for classification with imbalanced classes
  • Use LOOCV only when N is very small (< 50)
  • For time series, use specialized Time Series CV (next topic!)
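These recommendations can be encoded as a small decision helper using scikit-learn's splitters (the function name, thresholds, and defaults below are illustrative choices, not canonical rules):

```python
# Illustrative helper mapping the recommendations above to scikit-learn splitters.
from sklearn.model_selection import (KFold, StratifiedKFold,
                                     LeaveOneOut, TimeSeriesSplit)

def choose_cv(n_samples, task="regression", imbalanced=False, time_series=False):
    if time_series:
        return TimeSeriesSplit(n_splits=5)   # never shuffle temporal data
    if n_samples < 50:
        return LeaveOneOut()                 # small N: LOOCV is affordable
    if task == "classification" and imbalanced:
        return StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return KFold(n_splits=5, shuffle=True, random_state=0)  # sensible default

print(type(choose_cv(30)).__name__)                                       # LeaveOneOut
print(type(choose_cv(500, "classification", imbalanced=True)).__name__)   # StratifiedKFold
```

Any of these splitter objects can be passed directly to `cross_val_score(model, X, y, cv=...)`.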