The Learning Curve

Module 5 — Machine Learning for Time Series Forecasting
Created by Dr. Pedram Jahangiry | Enhanced with Claude

Do We Need to Collect More Data?

A learning curve plots model performance (error) against training set size. As we add more training data, the training error typically increases (harder to fit more points perfectly) while the validation error typically decreases (more data = better generalization).

The gap between the two curves tells you everything: both curves converging at high error with a small gap = high bias (underfitting); low training error with a large gap to validation error = high variance (overfitting); both curves converging at low error = just right.
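The idea above can be sketched in a few lines. This is an illustrative example (not from the slides), assuming a synthetic cubic series and a plain numpy polynomial fit; in practice scikit-learn's `sklearn.model_selection.learning_curve` computes the same quantities with cross-validation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a noisy cubic, y = x^3 - x + noise
x = rng.uniform(-2, 2, 200)
y = x**3 - x + rng.normal(0, 0.5, 200)

# Hold out the last 50 points for validation
x_tr_all, y_tr_all = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]

def mse(coefs, xs, ys):
    return np.mean((np.polyval(coefs, xs) - ys) ** 2)

# Refit the same (cubic) model on growing training subsets
sizes = [10, 30, 60, 100, 150]
train_err, val_err = [], []
for n in sizes:
    coefs = np.polyfit(x_tr_all[:n], y_tr_all[:n], deg=3)
    train_err.append(mse(coefs, x_tr_all[:n], y_tr_all[:n]))
    val_err.append(mse(coefs, x_val, y_val))

for n, tr, va in zip(sizes, train_err, val_err):
    print(f"n={n:3d}  train={tr:.3f}  val={va:.3f}")
```

Plotting `train_err` and `val_err` against `sizes` gives exactly the learning curve described above: the training error creeps up toward the noise floor while the validation error comes down to meet it.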

[Interactive figure: polynomial degree selector, Degree 0 (Constant) through Degree 5 (Quintic), with Degree 3 (Cubic) selected. Panels: "Learning Curve — Error vs Training Set Size" and "Current Model Fit (One CV Fold)".]


High Bias (Underfitting)

Both curves converge to a high error. The gap between them is small. Adding more data won't help — the model is too simple. Fix: increase model complexity, add features.
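A quick way to see the high-bias pattern numerically, as a sketch under the same hypothetical noisy-cubic setup as before: a degree-0 model (predict the training mean) has high error on both sets, and the gap barely moves as the training set grows.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, 200)
y = x**3 - x + rng.normal(0, 0.5, 200)  # hypothetical noisy cubic
x_tr, y_tr, x_va, y_va = x[:150], y[:150], x[150:], y[150:]

# Degree-0 "model": predict the training mean, at a small and a large training size
for n in (20, 150):
    pred = np.mean(y_tr[:n])
    tr = np.mean((y_tr[:n] - pred) ** 2)
    va = np.mean((y_va - pred) ** 2)
    print(f"n={n:3d}  train={tr:.2f}  val={va:.2f}  gap={va - tr:.2f}")
```

Both errors sit near the variance of y itself no matter how much data you add, which is the signature of underfitting.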

High Variance (Overfitting)

Training error is low but validation error is high. The gap between them is large. Adding more data can help — it gives the model less room to memorize. Fix: more data, reduce complexity, regularize.
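The high-variance pattern can be sketched with the same hypothetical noisy cubic: a degree-9 polynomial fit to only 20 points drives the training error far below the noise floor while the validation error stays well above it.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-2, 2, 40))
y = x**3 - x + rng.normal(0, 0.5, 40)  # hypothetical noisy cubic
x_tr, y_tr = x[::2], y[::2]    # 20 training points
x_va, y_va = x[1::2], y[1::2]  # 20 validation points

def errors(deg):
    coefs = np.polyfit(x_tr, y_tr, deg)
    tr = np.mean((np.polyval(coefs, x_tr) - y_tr) ** 2)
    va = np.mean((np.polyval(coefs, x_va) - y_va) ** 2)
    return tr, va

tr3, va3 = errors(3)  # cubic: about right for this data
tr9, va9 = errors(9)  # degree 9 on 20 points: memorizes the noise
print(f"cubic:  train={tr3:.3f}  val={va3:.3f}")
print(f"deg 9:  train={tr9:.3f}  val={va9:.3f}")
```

The degree-9 model's near-zero training error with a much larger validation error is the gap the slide describes; doubling the training data (or dropping back to degree 3, or regularizing) shrinks it.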