Bayesian Regression: Prediction and Uncertainty Quantification

Category: Bayesian Learning

Lesson #15

Structure Diagram: Bayesian Regression — Prediction and Uncertainty Quantification

Bayesian learning centers on integrating prior beliefs with new evidence—and explicitly representing uncertainty. While reading, structure your understanding around the sequence: “Prediction in Bayesian regression → Predictive distribution → Computing the predictive distribution → Result interpretation”. Then verify each step using the code, case studies, or evaluation metrics presented in the main text.

Verification Flowchart: Bayesian Regression — Prediction and Uncertainty Quantification

After reading, validate your understanding with a small real-world task: identify what the inputs are, where the processing steps occur, and whether outputs meet acceptance criteria. If something fails, first revisit “Prediction in Bayesian Regression”, then check “Predictive Distribution”.

In the previous chapter, we explored “Bayesian Regression — Prior Selection and Posterior Analysis”, focusing on how to choose appropriate prior distributions and derive posterior distributions from data. In this tutorial, we delve into how to use Bayesian regression models for prediction—and, crucially, how to quantify the uncertainty associated with those predictions.

1. Prediction in Bayesian Regression

In Bayesian regression, prediction goes beyond computing a single point estimate. More importantly, it enables us to rigorously quantify the uncertainty of that estimate. The Bayesian framework allows us to combine prior knowledge with observed data to model uncertainty over unknown parameters.

Prediction & Uncertainty Checklist Card

When performing Bayesian regression prediction, first examine:

  • the posterior distribution of parameters,
  • the predicted mean,
  • credible intervals,
  • observational noise, and
  • out-of-sample risk.

1.1 Predictive Distribution

Given a new input point $x^*$, we aim to predict its corresponding output $y^*$. In Bayesian regression, rather than merely evaluating the regression function at $x^*$, we compute the full conditional distribution of $y^*$, i.e., $p(y^* \mid x^*, D)$, where $D$ denotes our training dataset.

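Assuming that, given the parameters $\theta$, the new output is conditionally independent of the training data, we marginalize over $\theta$ (using the sum and product rules of probability) and obtain: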

$$p(y^* \mid x^*, D) = \int p(y^* \mid x^*, \theta)\, p(\theta \mid D)\, d\theta$$

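Here, $p(y^* \mid x^*, \theta)$ is the distribution of $y^*$ for a fixed parameter value $\theta$ (the likelihood of the new observation), and $p(\theta \mid D)$ is the posterior distribution over parameters. The integral averages the first term over every parameter value the posterior considers plausible.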
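For one special case the integral has a closed form, which makes the structure of the predictive distribution easier to see. The sketch below assumes a linear-Gaussian model with a known noise variance $\sigma^2$ and a Gaussian prior. The symbols $\phi$, $\Phi$, $m_0$, $S_0$, $m_N$, and $S_N$ are notation introduced here for illustration and do not appear elsewhere in this tutorial.

```latex
% Assumed model: y = \phi(x)^\top \theta + \varepsilon, with \varepsilon \sim \mathcal{N}(0, \sigma^2) and \sigma^2 known
% Assumed prior: \theta \sim \mathcal{N}(m_0, S_0)
% \Phi is the design matrix whose rows are \phi(x_i)^\top; y is the vector of training targets
\begin{aligned}
p(\theta \mid D) &= \mathcal{N}(\theta \mid m_N, S_N), \\
S_N^{-1} &= S_0^{-1} + \sigma^{-2}\, \Phi^\top \Phi, \\
m_N &= S_N \left( S_0^{-1} m_0 + \sigma^{-2}\, \Phi^\top y \right), \\
p(y^* \mid x^*, D) &= \mathcal{N}\!\left( y^* \,\middle|\, \phi(x^*)^\top m_N,\; \sigma^2 + \phi(x^*)^\top S_N\, \phi(x^*) \right).
\end{aligned}
```

The predictive variance has two parts: $\sigma^2$ is the observation noise, and $\phi(x^*)^\top S_N\, \phi(x^*)$ is the contribution of the remaining parameter uncertainty. The second term shrinks as more data arrive; the first does not.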

1.2 Computing the Predictive Distribution

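In practice, because $p(\theta \mid D)$ is often analytically intractable, we typically approximate the integral using Monte Carlo methods: draw samples $\theta^{(s)}$ from the posterior, then draw one $y^*$ from $p(y^* \mid x^*, \theta^{(s)})$ for each sample. Below is a simple example demonstrating Bayesian regression prediction in Python.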

Example: House Price Prediction

Suppose we have a dataset containing house sizes (in square feet) and corresponding sale prices. We apply Bayesian linear regression to predict prices—and quantify prediction uncertainty.

import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm

# Generate synthetic data
np.random.seed(42)
X = np.random.normal(1000, 200, 100)
true_slope = 200
true_intercept = 10000
y = true_intercept + true_slope * X + np.random.normal(0, 5000, 100)

# Visualize raw data
plt.scatter(X, y, c='black', label='Data')
plt.xlabel('Size (sqft)')
plt.ylabel('Price ($)')
plt.title('House Prices')
plt.legend()
plt.show()

# Define Bayesian regression model
with pm.Model() as model:
    # Wrap the predictor in a data container so it can be swapped for new inputs later
    X_data = pm.Data('X_data', X)

    # Priors
    alpha = pm.Normal('alpha', mu=0, sigma=10000)
    beta = pm.Normal('beta', mu=0, sigma=500)
    sigma = pm.HalfNormal('sigma', sigma=1000)

    # Linear regression mean
    mu = alpha + beta * X_data

    # Likelihood
    Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=y)

    # Sample from posterior
    trace = pm.sample(2000, tune=1000, return_inferencedata=False)

# Generate predictions for new inputs
# X_new has the same length as y (100), so the shape of Y_obs is unchanged
X_new = np.linspace(X.min(), X.max(), 100)
with model:
    # Replace the training inputs with the new grid, then draw y* for every posterior sample
    pm.set_data({'X_data': X_new})
    ppc = pm.sample_posterior_predictive(trace)

# Compute summary statistics across posterior predictive draws (axis 0)
y_pred = ppc['Y_obs'].mean(axis=0)
y_pred_std = ppc['Y_obs'].std(axis=0)

# Visualize predictions
plt.figure(figsize=(10, 5))
plt.scatter(X, y, c='black', label='Data')
plt.plot(X_new, y_pred, color='blue', label='Predicted Mean')
plt.fill_between(X_new, y_pred - 1.96 * y_pred_std, y_pred + 1.96 * y_pred_std, color='blue', alpha=0.3, label='95% Prediction Interval')
plt.xlabel('Size (sqft)')
plt.ylabel('Price ($)')
plt.title('Bayesian Linear Regression Prediction')
plt.legend()
plt.show()

1.3 Interpreting Results

In this example, we used Bayesian regression to predict house prices. In the resulting plot, the blue line is the posterior predictive mean, and the shaded region is an approximate 95% prediction interval, computed as the mean plus or minus 1.96 posterior predictive standard deviations. Because the band is built from draws of Y_obs, it reflects both parameter uncertainty and observation noise. The plot therefore delivers a point estimate together with a principled measure of its uncertainty.
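The difference between uncertainty about the regression line and uncertainty about a new observation can be made explicit by working with the posterior draws directly. The following is a minimal NumPy sketch, not part of the original example: the function name predictive_bands is hypothetical, and the draws are stand-in values generated for illustration (with the fitted model they would be trace['alpha'], trace['beta'], and trace['sigma']). It uses percentiles rather than a normal approximation for the 95% bands.

```python
import numpy as np

def predictive_bands(alpha_draws, beta_draws, sigma_draws, x_new, seed=0):
    """Return the predictive mean plus 95% bands for the regression mean and for new observations."""
    rng = np.random.default_rng(seed)
    # One row per posterior draw, one column per new input
    mu = alpha_draws[:, None] + beta_draws[:, None] * x_new[None, :]
    # Adding observation noise turns draws of the mean into draws of y*
    y_star = rng.normal(mu, sigma_draws[:, None])
    mean_band = np.percentile(mu, [2.5, 97.5], axis=0)      # credible band for the mean
    pred_band = np.percentile(y_star, [2.5, 97.5], axis=0)  # prediction band for y*
    return mu.mean(axis=0), mean_band, pred_band

# Stand-in posterior draws, for illustration only (not real sampler output)
rng = np.random.default_rng(1)
alpha_draws = rng.normal(10000, 2000, 4000)
beta_draws = rng.normal(200, 2, 4000)
sigma_draws = np.abs(rng.normal(5000, 300, 4000))
x_new = np.linspace(500, 1500, 50)

y_mean, mean_band, pred_band = predictive_bands(alpha_draws, beta_draws, sigma_draws, x_new)
mean_width = mean_band[1] - mean_band[0]
pred_width = pred_band[1] - pred_band[0]
print("average width of the band for the mean:", mean_width.mean())
print("average width of the prediction band  :", pred_width.mean())
```

The prediction band is always the wider of the two, because it adds observation noise on top of the parameter uncertainty that the band for the mean already contains.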

2. Approaches to Handling Uncertainty

A key advantage of Bayesian regression is that it handles uncertainty naturally. Parameter uncertainty is read directly from the posterior distribution, and prediction uncertainty from the predictive distribution. When the model and priors are reasonable, this supports more robust, better-calibrated decisions.
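Calibration can be checked empirically: on held-out data, roughly 95% of the targets should fall inside the 95% predictive interval. The sketch below is one way to run that check under simplifying assumptions that are not part of the original example: the data are synthetic, the noise standard deviation is treated as known, and the prior on the coefficients is a zero-mean Gaussian, so the posterior and the predictive distribution are available in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data-generating process with a known noise level (an assumption)
noise_sd = 1.0
true_theta = np.array([0.5, 2.0])  # intercept, slope

def design(x):
    """Design matrix with an intercept column."""
    return np.column_stack([np.ones_like(x), x])

x_train = rng.uniform(-3, 3, 30)
y_train = design(x_train) @ true_theta + rng.normal(0, noise_sd, 30)
x_test = rng.uniform(-3, 3, 2000)
y_test = design(x_test) @ true_theta + rng.normal(0, noise_sd, 2000)

# Conjugate posterior under a N(0, prior_sd^2 I) prior on the coefficients
prior_sd = 10.0
Phi = design(x_train)
S_N = np.linalg.inv(np.eye(2) / prior_sd**2 + Phi.T @ Phi / noise_sd**2)
m_N = S_N @ (Phi.T @ y_train) / noise_sd**2

# Predictive mean and standard deviation at the held-out inputs
Phi_test = design(x_test)
pred_mean = Phi_test @ m_N
pred_sd = np.sqrt(noise_sd**2 + np.einsum('ij,jk,ik->i', Phi_test, S_N, Phi_test))

# Fraction of held-out targets inside the central 95% predictive interval
inside = np.abs(y_test - pred_mean) <= 1.96 * pred_sd
coverage = inside.mean()
print(f"empirical coverage: {coverage:.3f}")
```

A coverage far below the nominal level suggests overconfident intervals (for example, a misspecified noise model), while a coverage far above it suggests intervals that are wider than necessary.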

Bayesian Learning Reading Map Card

Read “Bayesian Regression — Prediction and Uncertainty Quantification” through the lens of “Scenario → Concept → Action → Outcome.” First align these four dimensions; then revisit parameters, code, or workflow details in the main text.

2.1 Impact of Prior Choice on Uncertainty

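Different prior choices lead to different degrees of predictive uncertainty. For instance, a strongly informative prior tends to shrink prediction intervals, whereas a weakly informative (or diffuse) prior generally yields wider, more conservative uncertainty estimates. The effect is largest when data are scarce; as the training set grows, the likelihood dominates and the intervals from different priors converge.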
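The effect of the prior on interval width can be seen in a small closed-form experiment. This is a sketch under assumptions that differ from the house price example: the noise standard deviation is treated as known, the prior on the intercept and slope is N(0, prior_sd^2 I), and the function name predictive_sd is hypothetical. In this setting the predictive standard deviation depends only on the training inputs, the prior, and the noise level, not on the observed targets.

```python
import numpy as np

def predictive_sd(x_train, x_new, prior_sd, noise_sd):
    """Predictive std for y = a + b*x with known noise and a N(0, prior_sd^2 I) prior."""
    Phi = np.column_stack([np.ones_like(x_train), x_train])
    # Posterior covariance of (intercept, slope)
    S_N = np.linalg.inv(np.eye(2) / prior_sd**2 + Phi.T @ Phi / noise_sd**2)
    phi_new = np.column_stack([np.ones_like(x_new), x_new])
    # Variance contributed by parameter uncertainty at each new input
    param_var = np.einsum('ij,jk,ik->i', phi_new, S_N, phi_new)
    return np.sqrt(noise_sd**2 + param_var)

rng = np.random.default_rng(0)
noise_sd = 1.0
x_train = rng.uniform(-1, 1, 5)      # deliberately few training points
x_new = np.array([0.0, 2.0, 4.0])    # the last two lie outside the training range

sd_tight = predictive_sd(x_train, x_new, prior_sd=0.5, noise_sd=noise_sd)
sd_diffuse = predictive_sd(x_train, x_new, prior_sd=50.0, noise_sd=noise_sd)
print("tight prior  :", sd_tight.round(2))
print("diffuse prior:", sd_diffuse.round(2))
```

The tight prior gives narrower intervals at every input, but narrower is not the same as better: if the tight prior is centered on the wrong values, the intervals will be overconfident.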

2.2 Importance of Monte Carlo Methods

As noted earlier, analytical solutions for the posterior are rarely available outside conjugate models. Monte Carlo sampling, i.e., drawing samples from the posterior, is therefore a standard technique in Bayesian prediction. The samples approximate the predictive distribution directly, so means, standard deviations, and quantile-based intervals can all be computed from them.
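One way to build confidence in the Monte Carlo approach is to try it on a case where the exact answer is known. The sketch below assumes a conjugate setup (known noise standard deviation, zero-mean Gaussian prior, synthetic data), which is an illustrative choice rather than the model fitted earlier. It compares the closed-form predictive mean and standard deviation at a single input with the estimates obtained by sampling.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_sd, prior_sd = 1.0, 10.0       # assumed known noise level and prior scale

# Small synthetic training set (hypothetical values)
x_train = rng.uniform(-3, 3, 20)
y_train = 0.5 + 2.0 * x_train + rng.normal(0, noise_sd, 20)
Phi = np.column_stack([np.ones_like(x_train), x_train])

# Closed-form Gaussian posterior over (intercept, slope)
S_N = np.linalg.inv(np.eye(2) / prior_sd**2 + Phi.T @ Phi / noise_sd**2)
m_N = S_N @ (Phi.T @ y_train) / noise_sd**2

# Exact predictive mean and standard deviation at x* = 1.5
phi_star = np.array([1.0, 1.5])
exact_mean = phi_star @ m_N
exact_sd = np.sqrt(noise_sd**2 + phi_star @ S_N @ phi_star)

# Monte Carlo: draw theta from the posterior, then y* from the likelihood
n_draws = 50_000
theta = rng.multivariate_normal(m_N, S_N, size=n_draws)
y_star = rng.normal(theta @ phi_star, noise_sd)
mc_mean, mc_sd = y_star.mean(), y_star.std()

print(f"exact: mean={exact_mean:.3f}, sd={exact_sd:.3f}")
print(f"MC   : mean={mc_mean:.3f}, sd={mc_sd:.3f}")
```

The two sets of numbers should agree closely, and the gap narrows as n_draws grows. The same two-step sampling recipe applies unchanged when no closed form exists, which is why it is the default in probabilistic programming libraries.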

Application Retrospective Card: Bayesian Regression — Prediction and Uncertainty Quantification

If you haven’t fully internalized “Bayesian Regression — Prediction and Uncertainty Quantification,” walk through the four actions on this card again.

Application Verification Card: Bayesian Regression — Prediction and Uncertainty Quantification

When reviewing “Bayesian Regression — Prediction and Uncertainty Quantification,” avoid launching large-scale projects upfront. Instead, test comprehension using a single, minimal working example to confirm whether the core logic is clear.

3. Summary

In this chapter, we examined the distinctive features of prediction in Bayesian regression—and demonstrated how to quantify prediction uncertainty. Using a practical example with Python’s pymc3 library, we showed how to fit a Bayesian linear regression model, generate predictions, and visualize both point estimates and their associated uncertainty. This approach provides deeper insight than classical linear regression—especially when modeling complex, noisy, or sparse data.

In the next chapter, we will explore “Bayesian Classification — Foundational Theory,” continuing our journey through Bayesian learning.
