The core of Bayesian learning lies in integrating prior beliefs with new evidence while explicitly quantifying uncertainty. While reading, structure your understanding as follows: “Concept of model complexity → Overfitting and underfitting → Bayesian model selection → Model complexity and the Bayes factor,” then verify each concept using the code snippets, case studies, or evaluation metrics presented in the main text.
After reading, conduct a quick review using a small, realistic task: identify what the inputs are, where the processing steps occur, and whether the outputs are verifiable and acceptable. If the task fails, first revisit “Concept of model complexity,” then proceed to “Overfitting and underfitting.”
In Bayesian learning and statistical inference, model complexity plays a critical role in determining both model performance and generalization capability. It not only influences parameter estimation but also directly affects the validity of model selection. This article discusses how to assess and select appropriate model complexity within the Bayesian framework—illustrated through a concrete case study to clarify these ideas.
Concept of Model Complexity
Model complexity refers to the intrinsic flexibility of a model—typically reflecting its capacity to capture underlying patterns in data. Broadly speaking, low-complexity models have fewer parameters and are suitable for describing simple data structures; high-complexity models can accommodate more variation but are prone to overfitting.
When selecting model complexity, consider data size, noise level, prior constraints, posterior uncertainty, and predictive performance.
Overfitting and Underfitting
- Overfitting: The model is excessively complex—fits training data well but performs poorly on new (unseen) data.
- Underfitting: The model is overly simplistic—fails to capture true underlying patterns, resulting in poor performance on both training and test data.
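In Bayesian statistics, flexible (complex) models are often acceptable because the prior and the averaging over parameter uncertainty act as built-in regularization, but complexity must still be controlled deliberately to avoid overfitting.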
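The contrast between underfitting and overfitting can be made concrete with a small experiment. The following is a minimal sketch, not taken from the original article: it assumes a hypothetical toy problem (noisy samples of a sine curve) and uses ordinary least-squares polynomial fits, with the polynomial degree standing in for model complexity. The function names `make_data` and `fit_and_score` are illustrative.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)


def make_data(n):
    """Hypothetical toy data: a sine curve plus Gaussian noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y


x_train, y_train = make_data(20)
x_test, y_test = make_data(200)


def fit_and_score(degree):
    """Least-squares polynomial fit; returns (train MSE, test MSE)."""
    coeffs = P.polyfit(x_train, y_train, degree)
    train_mse = np.mean((P.polyval(x_train, coeffs) - y_train) ** 2)
    test_mse = np.mean((P.polyval(x_test, coeffs) - y_test) ** 2)
    return train_mse, test_mse


# Degree 1 is too rigid for a sine curve, degree 3 is a reasonable match,
# and degree 12 has almost as many coefficients as there are training points.
results = {degree: fit_and_score(degree) for degree in (1, 3, 12)}
for degree, (train_mse, test_mse) in results.items():
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

Training error can only go down as the degree increases, because each lower-degree model is nested in the higher-degree one. Test error is the quantity to watch: it is typically high for the degree-1 fit (underfitting) and tends to rise again once the degree is large relative to the amount of data (overfitting).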
Bayesian Model Selection
In the previous section, we discussed parameter selection and evaluation. Here, we extend that discussion to model selection using Bayesian methods.
Before reading “Bayesian Learning and Statistical Inference: Model Complexity Selection,” use the accompanying diagram to confirm the central narrative. After reading, check which steps you can implement directly—and which require supplementary material.
Within the Bayesian framework, model selection proceeds by comparing the posterior probabilities of competing models. For example, given a dataset $D$ and candidate models $M_1, \dots, M_K$, the posterior probability of model $M_i$ is:
$$P(M_i \mid D) = \frac{P(D \mid M_i)\, P(M_i)}{\sum_{j=1}^{K} P(D \mid M_j)\, P(M_j)}$$
where:
- $P(D \mid M_i)$ is the likelihood, the degree to which model $M_i$ fits the observed data;
- $P(M_i)$ is the prior probability of model $M_i$, encoding our initial belief about its plausibility.
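In practice, posterior model probabilities are usually computed from log-likelihood values, since the raw likelihoods underflow easily. The sketch below assumes the per-model log-likelihoods are already available. The numbers in `log_evidence` are made up for illustration, and `posterior_model_probs` is a hypothetical helper name, not part of any library.

```python
import numpy as np
from scipy.special import logsumexp


def posterior_model_probs(log_evidence, prior=None):
    """Normalize log P(D | M_i) + log P(M_i) into posterior model probabilities."""
    log_evidence = np.asarray(log_evidence, dtype=float)
    if prior is None:
        # Assumption: with no prior supplied, all models are equally plausible.
        prior = np.full(log_evidence.shape, 1.0 / log_evidence.size)
    log_joint = log_evidence + np.log(prior)
    # Subtracting logsumexp performs the normalization stably in log space.
    return np.exp(log_joint - logsumexp(log_joint))


# Made-up log-likelihood values for three candidate models.
log_evidence = [-105.2, -101.7, -103.9]

uniform_probs = posterior_model_probs(log_evidence)
skeptical_probs = posterior_model_probs(log_evidence, prior=[0.6, 0.1, 0.3])

print("uniform prior:  ", np.round(uniform_probs, 3))
print("informative prior:", np.round(skeptical_probs, 3))
```

Comparing the two printed rows shows how the model prior shifts the posterior: the second model has the highest likelihood, and lowering its prior probability reduces its posterior probability accordingly.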
Model Complexity and the Bayes Factor
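The Bayes factor is a key tool for comparing two models $M_1$ and $M_2$ on observed data $D$, defined as:
$$BF_{12} = \frac{P(D \mid M_1)}{P(D \mid M_2)}$$
By computing the Bayes factor, we assess which model better explains the observed data: values above 1 favor $M_1$, and values below 1 favor $M_2$. Importantly, the Bayes factor is inherently sensitive to model complexity. Each $P(D \mid M)$ is a marginal likelihood that averages the fit over the model's prior on its parameters, so a model that spreads its prior over many parameter settings that fit the data poorly is penalized automatically.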
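To see this sensitivity numerically, the sketch below computes log Bayes factors for polynomial regression models of different degree. It is a minimal illustration under stated assumptions, not the article's own method: the noise standard deviation is treated as known, every coefficient gets an independent zero-mean Gaussian prior with standard deviation `prior_std`, and the data are synthetic with a quadratic ground truth. Under these assumptions the marginal likelihood has a closed form, since the observations are jointly Gaussian with covariance `noise_std**2 * I + prior_std**2 * Phi @ Phi.T`.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
noise_std = 0.3   # assumed known observation noise
prior_std = 10.0  # assumed prior scale for every polynomial coefficient

# Synthetic data whose true structure is quadratic.
x = rng.uniform(-1.0, 1.0, size=40)
y = 1.0 - 2.0 * x + 1.5 * x**2 + rng.normal(scale=noise_std, size=x.size)


def log_evidence(degree):
    """Log marginal likelihood of a polynomial model of the given degree."""
    # Columns are 1, x, x^2, ..., x^degree.
    Phi = np.vander(x, degree + 1, increasing=True)
    # Integrating out the Gaussian-prior coefficients leaves y ~ N(0, cov).
    cov = noise_std**2 * np.eye(x.size) + prior_std**2 * Phi @ Phi.T
    return multivariate_normal.logpdf(y, mean=np.zeros(x.size), cov=cov)


log_ev = {degree: log_evidence(degree) for degree in (1, 2, 8)}

# Log Bayes factors of the quadratic model against a simpler and a more complex rival.
log_bf_2_vs_1 = log_ev[2] - log_ev[1]
log_bf_2_vs_8 = log_ev[2] - log_ev[8]

print("log BF (degree 2 vs degree 1):", round(log_bf_2_vs_1, 2))
print("log BF (degree 2 vs degree 8):", round(log_bf_2_vs_8, 2))
```

A positive log Bayes factor favors the quadratic model. The degree-1 model loses because it cannot fit the curvature, while the degree-8 model loses even though it fits the training data at least as well, because its extra coefficients dilute the marginal likelihood. The size of that penalty depends on `prior_std`, which is one reason Bayes factors should be reported together with the priors used.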
Case Study: Comparing Model Complexity Using Ridge Regression and LASSO
Suppose we face a regression problem—predicting a company’s sales based on several explanatory variables. We may compare two distinct regression approaches: Ridge regression (L2 regularization) and LASSO (L1 regularization). Their complexities differ fundamentally:
- Ridge regression controls complexity by adding a penalty term proportional to the squared magnitude of coefficients.
- LASSO, by contrast, encourages sparsity—driving some coefficients exactly to zero—thus performing feature selection and reducing effective model complexity.
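For reference, the two penalties can be written side by side. This is the standard textbook form, with $w$ the coefficient vector and $\lambda \ge 0$ the regularization strength; the exact scaling of the squared-error term differs between libraries, so $\lambda$ here should not be assumed to equal any particular software parameter.

$$\hat{w}_{\text{ridge}} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2, \qquad \hat{w}_{\text{lasso}} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1$$

Both estimators also have a Bayesian reading: with Gaussian noise, the Ridge solution is the maximum a posteriori (MAP) estimate under a zero-mean Gaussian prior on $w$, and the LASSO solution is the MAP estimate under a zero-mean Laplace prior. In that sense $\lambda$ encodes how tightly the prior concentrates the coefficients around zero.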
Below is Python code implementing and evaluating both models:
```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Generate synthetic data (seeded so that the results are reproducible)
rng = np.random.default_rng(42)
X = rng.standard_normal((100, 10))
y = X @ rng.standard_normal(10) + rng.standard_normal(100) * 0.5

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Ridge regression model (L2 penalty)
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)
ridge_predictions = ridge_model.predict(X_test)
ridge_mse = mean_squared_error(y_test, ridge_predictions)

# LASSO model (L1 penalty)
lasso_model = Lasso(alpha=0.1)
lasso_model.fit(X_train, y_train)
lasso_predictions = lasso_model.predict(X_test)
lasso_mse = mean_squared_error(y_test, lasso_predictions)

print("Ridge MSE:", ridge_mse)
print("LASSO MSE:", lasso_mse)
```
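In this code, we generate synthetic data and fit both Ridge and LASSO regressions. By comparing their Mean Squared Error (MSE) on held-out test data, we gain insight into how their differing complexities affect out-of-sample predictive performance. Note that the two `alpha` values are fixed by hand and are not on a common scale in scikit-learn, and that a single train/test split is noisy, so this comparison is indicative rather than conclusive.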
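A natural follow-up is to let cross-validation choose each regularization strength and to inspect how many coefficients each model keeps. The sketch below is one possible way to do this with scikit-learn's `RidgeCV` and `LassoCV`. The sparse ground truth (`true_coef`, with only 3 of 10 features active) and the `alphas` grid are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical sparse ground truth: only the first 3 of 10 features matter.
true_coef = np.zeros(10)
true_coef[:3] = [2.0, -1.5, 1.0]
X = rng.normal(size=(200, 10))
y = X @ true_coef + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each estimator selects its own alpha by cross-validation on the training set.
models = {
    "Ridge": RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(X_train, y_train),
    "LASSO": LassoCV(cv=5, random_state=0).fit(X_train, y_train),
}

summary = {}
for name, model in models.items():
    mse = mean_squared_error(y_test, model.predict(X_test))
    n_nonzero = int(np.sum(model.coef_ != 0))  # crude measure of effective complexity
    summary[name] = {"alpha": model.alpha_, "mse": mse, "n_nonzero": n_nonzero}
    print(f"{name}: alpha = {model.alpha_:.4g}, test MSE = {mse:.3f}, "
          f"non-zero coefficients = {n_nonzero}")
```

Ridge shrinks coefficients but leaves all of them non-zero, whereas LASSO may set some of the irrelevant ones exactly to zero. The count of non-zero coefficients gives a rough view of effective complexity alongside the test MSE.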
To apply “Bayesian Learning and Statistical Inference: Model Complexity Selection” to your own task, start small: isolate and validate one critical decision point, and check that the inputs, processing steps, and outputs align coherently.
Conclusion
In this section, we examined the pivotal role of model complexity in Bayesian learning and introduced model selection via the Bayes factor. Since different levels of complexity yield markedly different predictive behaviors, model choice should balance complexity against data characteristics and out-of-sample performance. Subsequent sections will delve deeper into Bayes factors and formal model comparison—helping readers build a robust, principled framework for Bayesian model selection.