Bayesian learning centers on integrating prior beliefs with new evidence while explicitly quantifying uncertainty. While reading, structure your understanding around the logical flow: “Overfitting → Concrete examples of overfitting → Regularization → Theoretical foundations of regularization”, then verify each concept using the code snippets, case studies, or evaluation metrics presented in the main text.
After reading, reinforce your understanding with a small, realistic task:
- What is the input?
- Where does processing occur?
- Is the output verifiable and acceptable?
If the task fails, first check for overfitting by comparing training and validation error, then revisit the "Examples of Overfitting" section.
In the previous chapter, we explored the Bayes factor and model comparison, learning how to select among competing models. Next, we delve into two concepts intimately tied to model selection: overfitting and regularization—both essential for ensuring the generalization capability of our Bayesian learning models.
Overfitting
Overfitting occurs when a model performs exceptionally well on training data but exhibits a sharp decline in performance on new, unseen data. This typically arises when the model is excessively complex—i.e., it has too many parameters—and thus fits not only the underlying pattern but also the noise present in the training data.
To understand overfitting and regularization, first examine:
- Training error vs. validation error
- Parameter complexity
- Prior constraints
- Generalization performance
Examples of Overfitting
Consider linear regression: suppose we have a set of data points and fit them using a high-degree polynomial. On the training set, this polynomial may pass through every point almost perfectly—but on a validation set, its predictive accuracy deteriorates significantly. This degradation is a hallmark of overfitting.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
# Generate synthetic data
np.random.seed(0)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])
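# Fit polynomials of varying degrees and plot each fit against the training data
degrees = [1, 3, 5, 10]
plt.figure(figsize=(15, 10))
for i, degree in enumerate(degrees):
    # Pipeline: expand X into polynomial features, then fit ordinary least squares
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    y_pred = model.predict(X)
    # One subplot per degree in a 2 x 2 grid
    plt.subplot(2, 2, i + 1)
    plt.scatter(X, y, s=10, label='Data')
    plt.plot(X, y_pred, label='Prediction (degree={})'.format(degree), color='red')
    plt.title('Polynomial Degree: {}'.format(degree))
    plt.legend()
# Show the figure once, after all four subplots have been drawn
plt.show()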
In the figure above, the fit to the training data improves as the polynomial degree increases. The figure shows training data only, however. The hallmark of overfitting is that performance on unseen (test) data stops improving and then begins to degrade, which has to be checked on held-out data.
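To make the comparison between training and test error concrete, the following minimal sketch holds out part of the same synthetic data and reports the mean squared error (MSE) for each degree. The 70/30 split, the `random_state`, and the `errors` dictionary are assumptions introduced for illustration; they are not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic data as in the example above: a noisy sine curve on [0, 5].
np.random.seed(0)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])

# Assumption: a 70/30 random split; the original text does not specify one.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

errors = {}  # degree -> (training MSE, test MSE)
for degree in [1, 3, 5, 10]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    errors[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

A test MSE that rises while the training MSE keeps falling signals overfitting. With a smooth target and 80 points, the gap may stay small even at degree 10; a smaller training set or a higher degree tends to make it more visible.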
Regularization
To combat overfitting, we employ regularization: a technique that adds a penalty term to the loss function to constrain model complexity, thereby reducing the risk of overfitting. Common regularization methods include L1 regularization (Lasso) and L2 regularization (Ridge).
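As a rough illustration of how the two penalties behave differently, the sketch below fits scikit-learn's `Ridge` and `Lasso` to a separate toy dataset in which only three of ten features matter. The dataset, the names `X_demo`, `y_demo`, and `true_coef`, and the `alpha` values are assumptions chosen for the demonstration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Hypothetical toy data: 10 features, only the first 3 influence the target.
rng = np.random.RandomState(0)
n_samples, n_features = 200, 10
X_demo = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=n_samples)

# Ridge penalizes the squared L2 norm of the coefficients,
# Lasso penalizes their L1 norm. The alpha values are illustrative.
ridge = Ridge(alpha=1.0).fit(X_demo, y_demo)
lasso = Lasso(alpha=0.1).fit(X_demo, y_demo)

# Count coefficients that are exactly zero under each penalty.
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Exact zeros -> Ridge:", n_zero_ridge, " Lasso:", n_zero_lasso)
```

The L2 penalty shrinks all coefficients toward zero without eliminating any, whereas the L1 penalty can set some coefficients exactly to zero, which is why Lasso is often used for feature selection.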
Don't stop at "I understand" after reading this section. Go back, pick one step, implement it yourself, and note where you get stuck. Doing so will solidify what you have learned for later topics.
The Principle Behind Regularization
Within the Bayesian framework, regularization corresponds to placing a prior distribution over the model parameters and taking the maximum a posteriori (MAP) estimate. A zero-mean Gaussian prior yields L2 regularization, while a zero-mean Laplace prior yields L1 regularization.
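The correspondence between priors and penalties can be sketched for linear regression as follows. The symbols are assumptions introduced here: Gaussian noise with variance $\sigma^2$, a Gaussian prior with variance $\tau^2$, a Laplace prior with scale $b$, and a resulting penalty weight $\lambda$.

```latex
% Likelihood: y_i = w^\top x_i + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
\begin{align}
w_{\mathrm{MAP}}
  &= \arg\max_w \; \bigl[ \log p(\mathcal{D} \mid w) + \log p(w) \bigr] \\[4pt]
% Gaussian prior: w \sim \mathcal{N}(0, \tau^2 I)
-\log p(w \mid \mathcal{D})
  &= \frac{1}{2\sigma^2} \sum_{i=1}^{n} \bigl( y_i - w^\top x_i \bigr)^2
     + \frac{1}{2\tau^2} \lVert w \rVert_2^2 + \text{const} \\[4pt]
\Rightarrow \quad w_{\mathrm{MAP}}
  &= \arg\min_w \; \sum_{i=1}^{n} \bigl( y_i - w^\top x_i \bigr)^2
     + \lambda \lVert w \rVert_2^2,
  \qquad \lambda = \frac{\sigma^2}{\tau^2} \\[4pt]
% Laplace prior: p(w_j) = \frac{1}{2b} \exp(-|w_j| / b)
-\log p(w \mid \mathcal{D})
  &= \frac{1}{2\sigma^2} \sum_{i=1}^{n} \bigl( y_i - w^\top x_i \bigr)^2
     + \frac{1}{b} \lVert w \rVert_1 + \text{const} \\[4pt]
\Rightarrow \quad w_{\mathrm{MAP}}
  &= \arg\min_w \; \sum_{i=1}^{n} \bigl( y_i - w^\top x_i \bigr)^2
     + \lambda \lVert w \rVert_1,
  \qquad \lambda = \frac{2\sigma^2}{b}
\end{align}
```

In both cases a tighter prior (smaller $\tau^2$ or smaller $b$) gives a larger $\lambda$, that is, a stronger penalty.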
Example of Regularization
Continuing with the earlier example, we now apply Ridge regression (L2 regularization) to mitigate overfitting.
from sklearn.linear_model import Ridge
# Apply Ridge regression
plt.figure(figsize=(10, 5))
ridge_model = make_pipeline(PolynomialFeatures(10), Ridge(alpha=1.0))
ridge_model.fit(X, y)
y_ridge_pred = ridge_model.predict(X)
plt.scatter(X, y, s=10, label='Data')
plt.plot(X, y_ridge_pred, label='Ridge Prediction (degree=10)', color='green')
plt.title('Ridge Regression with Regularization')
plt.legend()
plt.show()
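In the plot above, Ridge regression trades a little training accuracy for lower model complexity. The curve no longer passes as closely through the training points, but the penalty shrinks the coefficients, which usually improves generalization to new data. Because the plot shows only the training fit, the improvement should be confirmed on held-out data.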
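One way to check the effect of the penalty numerically is to compare an unregularized degree-10 fit with a Ridge fit on a held-out split, as in the sketch below. The train/test split, the `StandardScaler` step, and the helper `fit_and_score` are assumptions added for this illustration; the original example applies Ridge directly to unscaled polynomial features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same synthetic data as in the earlier example.
np.random.seed(0)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])

# Assumption: a 70/30 random split, not specified in the original text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

def fit_and_score(regressor):
    """Hypothetical helper: fit a degree-10 pipeline, return errors and coefficient norm."""
    # Scaling puts all polynomial features on a comparable footing,
    # so the penalty acts on them evenly.
    model = make_pipeline(PolynomialFeatures(10), StandardScaler(), regressor)
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    coef_norm = float(np.linalg.norm(model.steps[-1][1].coef_))
    return train_mse, test_mse, coef_norm

ols_train, ols_test, ols_norm = fit_and_score(LinearRegression())
ridge_train, ridge_test, ridge_norm = fit_and_score(Ridge(alpha=1.0))

print(f"OLS   train={ols_train:.4f}  test={ols_test:.4f}  ||w||={ols_norm:.2f}")
print(f"Ridge train={ridge_train:.4f}  test={ridge_test:.4f}  ||w||={ridge_norm:.2f}")
```

The Ridge fit cannot beat ordinary least squares on training error, but its coefficient norm is smaller. Whether its test error is lower depends on the data and on `alpha`, so in practice `alpha` is usually tuned by cross-validation.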
At this point, condense this chapter into a short retrospective table: first state the core narrative, then validate it by walking through a small concrete example end to end and noting which steps you can now carry out independently.
Summary
In Bayesian learning, overfitting and regularization are two foundational concepts. Recognizing overfitting—and knowing how to counteract it via regularization—empowers us to make more robust model selections. In the next chapter, we will explore Bayesian regression, focusing specifically on practical implementation and applications of linear regression models.
Through this tutorial, we hope you will internalize the importance of balancing model complexity during model selection, and learn to use regularization to achieve a good fit while reducing the risk of overfitting.