Guozhen AI: Global AI field notes and model intelligence

Bayesian Basics: Prior and Posterior Distributions

Category: Bayesian Learning

Lesson #5

Structure Diagram: Fundamentals of Bayes’ Theorem — Prior and Posterior Distributions

The core of Bayesian learning lies in coherently integrating existing beliefs with new evidence while explicitly quantifying uncertainty. While reading, structure your understanding as follows: “Prior Distribution → Types of Prior Distributions → Example: Selecting a Prior Distribution → Posterior Distribution”, then verify each concept using the code snippets, case studies, or metrics presented in the main text.

Verification Diagram: Fundamentals of Bayes’ Theorem — Prior and Posterior Distributions

After reading, reinforce your understanding with a small real-world task: identify what the inputs are, where the processing steps occur, and whether the outputs are verifiable and acceptable. If the task fails, first inspect your choice of prior distribution, then check whether the type of prior is appropriate.

In the previous article, we derived Bayes’ theorem and saw how to update beliefs in light of new evidence. This article looks more closely at prior and posterior distributions and why they matter. Through a concrete example, we show how to select an appropriate prior for a given problem and compute the corresponding posterior distribution.

Prior Distribution

A prior distribution is the probability distribution assigned to an unknown quantity before any data are observed. It may be chosen subjectively or by an objective rule, and it encodes our knowledge or beliefs about that quantity prior to collecting empirical evidence.

Prior–Posterior Distribution Decision Card

When learning about prior and posterior distributions, align three elements along a single conceptual line: your initial judgment, the observed data, and the updated result.

Types of Prior Distributions

  1. Non-informative (or Objective) Priors:
    • These priors express minimal assumptions by assigning equal weight across plausible values, and are suitable when little or no prior knowledge is available. A uniform distribution is a common example.
  2. Informative Priors:
    • These incorporate substantive prior knowledge, for instance results from past studies or domain expertise. Common choices include the normal distribution (e.g., for an unknown mean with known variance) or the gamma distribution (e.g., for an unknown rate or precision parameter).
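The contrast between the two types can be sketched numerically. The snippet below is a minimal illustration, not part of the original lesson: it assumes the Beta family as a convenient way to write both kinds of prior for a proportion, with Beta(1, 1) as the uniform (non-informative) case and Beta(2, 8) as an illustrative informative case. The helper name `beta_pdf` and the evaluation points are hypothetical choices.

```python
import math

def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, for 0 < theta < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * theta ** (a - 1) * (1 - theta) ** (b - 1)

# Illustrative evaluation points (an assumption for this sketch).
points = [0.05, 0.25, 0.50, 0.75]

# Beta(1, 1) is the uniform distribution on [0, 1]: every value gets density 1.
flat = [beta_pdf(t, 1, 1) for t in points]

# Beta(2, 8) is an illustrative informative prior that puts more weight on small values.
informative = [beta_pdf(t, 2, 8) for t in points]

for t, f, g in zip(points, flat, informative):
    print(f"theta={t:.2f}  uniform={f:.3f}  informative={g:.3f}")
```

The uniform prior treats all four points alike, while the informative prior assigns far more density to the small values than to the large ones.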

Example: Selecting a Prior Distribution

Suppose we wish to estimate the defect rate θ of a manufactured product. Historical production data suggest this rate typically falls between 1% and 5%. We may therefore choose a Beta distribution as our prior: it is supported on [0, 1], which makes it well suited to modeling proportions.

Let the defect rate be θ. We adopt the following Beta prior:

$$\text{Beta}(\alpha, \beta) \quad \text{with} \; \alpha = 2,\; \beta = 8$$

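This reflects our belief that the defect rate is likely low. Note that Beta(2, 8) has mean α/(α+β) = 0.2, so it is a fairly loose prior that leans toward small values rather than one concentrated on the 1% to 5% range; a prior concentrated on that range would need a much larger β relative to α.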
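Before using a prior, it can help to check what it actually implies. The following sketch summarizes Beta(2, 8) with the Python standard library only. The helper `beta_cdf_int` is a hypothetical name, and it relies on an identity that holds only for positive integer shape parameters: the Beta(a, b) CDF at x equals the probability that a Binomial(a + b − 1, x) variable is at least a.

```python
import math

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x, valid for positive INTEGER a and b only.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))

alpha, beta = 2, 8  # prior parameters from the example

prior_mean = alpha / (alpha + beta)             # mean of a Beta distribution
prior_mode = (alpha - 1) / (alpha + beta - 2)   # mode, defined when alpha, beta > 1

# Prior probability that the defect rate lies in the historical 1% to 5% range.
mass_1_to_5 = beta_cdf_int(0.05, alpha, beta) - beta_cdf_int(0.01, alpha, beta)

print(f"prior mean = {prior_mean:.3f}")
print(f"prior mode = {prior_mode:.3f}")
print(f"P(0.01 <= theta <= 0.05) = {mass_1_to_5:.3f}")
```

If the printed mass in the historical range looks too small for the problem at hand, that is a signal to revisit the choice of α and β before collecting data.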

Posterior Distribution

A posterior distribution is the updated probability distribution of a random variable after incorporating observed data. It represents a rational revision of the prior distribution in light of new evidence. By Bayes’ theorem, the posterior is proportional to the likelihood multiplied by the prior, with the marginal likelihood of the data acting as the normalizing constant.

Bayesian Learning Practice Retrospective Card

After finishing “Fundamentals of Bayes’ Theorem — Prior and Posterior Distributions”, reflect on three questions:

  • What problem does this framework solve?
  • At which step is error most likely to occur?
  • Can I implement and validate it on a small, self-contained example?

Written out for a parameter $\theta$ and data $D$, Bayes’ theorem is:

$$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}$$

  • $P(\theta \mid D)$ is the posterior distribution.
  • $P(D \mid \theta)$ is the likelihood: the probability of observing data $D$ given parameter $\theta$.
  • $P(\theta)$ is the prior distribution.
  • $P(D)$ is the marginal likelihood (also called the evidence), a normalizing constant representing the average likelihood over all possible values of $\theta$.
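The roles of the prior, likelihood, and evidence can be seen in a small discrete sketch. Everything specific here is an assumption made for illustration and not taken from the article: three candidate defect rates, a flat prior over them, and hypothetical data of 1 defect in 20 units.

```python
import math

# Hypothetical candidate values of theta with a flat prior P(theta) over them.
thetas = [0.01, 0.05, 0.20]
prior = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical data D: k defects among n inspected units.
n, k = 20, 1

def likelihood(theta):
    """P(D | theta) under a binomial model."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)

# Numerator of Bayes' theorem for each candidate: P(D | theta) * P(theta).
unnormalized = [likelihood(t) * p for t, p in zip(thetas, prior)]

# Evidence P(D): the prior-weighted sum of the likelihood over all candidates.
evidence = sum(unnormalized)

# Posterior P(theta | D): numerator divided by the evidence.
posterior = [u / evidence for u in unnormalized]

for t, p, q in zip(thetas, prior, posterior):
    print(f"theta={t:.2f}  prior={p:.3f}  posterior={q:.3f}")
```

Dividing by the evidence is what makes the posterior values sum to 1; it does not change which candidate is favored.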

Example: Computing the Posterior Distribution

Returning to our defect-rate estimation: suppose we inspect $n = 100$ units and observe $k = 3$ defects. We now compute the posterior using Bayes’ rule.

  1. Likelihood Function:
    Since each unit is independently defective with probability $\theta$, the number of defects follows a binomial distribution:

$$P(D \mid \theta) = \binom{n}{k}\, \theta^k (1 - \theta)^{n - k}$$

where $n$ is the total sample size and $k$ is the number of observed defects.

  2. Prior Distribution:
    As chosen earlier:

$$P(\theta) = \text{Beta}(2, 8)$$

  3. Computing the Posterior:
    Substituting into Bayes’ formula, the posterior is proportional to the product of the likelihood and the prior:

$$P(\theta \mid D) \propto P(D \mid \theta) \cdot P(\theta)$$

Because the Beta distribution is conjugate to the binomial likelihood, the posterior is again a Beta distribution, with updated parameters:

  • The new shape parameters become: $\alpha_{\text{post}} = \alpha + k = 2 + 3 = 5,\quad \beta_{\text{post}} = \beta + (n - k) = 8 + 97 = 105$
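The parameter update follows from multiplying the parts of the likelihood and prior that depend on $\theta$. As a short worked step, using the same symbols as above and dropping constants that do not involve $\theta$ (the binomial coefficient and the Beta normalizing constant):

```latex
\begin{aligned}
P(\theta \mid D)
  &\propto \theta^{k}(1-\theta)^{n-k} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} \\
  &= \theta^{(\alpha+k)-1}\,(1-\theta)^{(\beta+n-k)-1}
\end{aligned}
```

The last line has the form of a $\text{Beta}(\alpha + k,\; \beta + n - k)$ density up to a constant, which gives $\alpha_{\text{post}} = 2 + 3 = 5$ and $\beta_{\text{post}} = 8 + 97 = 105$ for this example.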

Thus, the posterior distribution is:

$$P(\theta \mid D) = \text{Beta}(5,\, 105)$$

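This updated distribution captures how our belief about the defect rate has shifted after seeing the data: the posterior mean is 5/110 ≈ 0.045, down from the prior mean of 0.2 and close to the observed rate of 3/100 = 0.03.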
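The conjugate result can be cross-checked numerically. The sketch below, using only the Python standard library, applies the update rule and then compares the resulting Beta density with a brute-force grid approximation of prior times likelihood. The helper name `beta_log_pdf` and the grid size `m` are assumptions made for this illustration.

```python
import math

def beta_log_pdf(theta, a, b):
    """Log-density of Beta(a, b) at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

alpha, beta = 2, 8   # prior parameters from the example
n, k = 100, 3        # inspected units and observed defects

# Conjugate update: add defects to alpha, non-defects to beta.
alpha_post = alpha + k
beta_post = beta + (n - k)
post_mean = alpha_post / (alpha_post + beta_post)

# Grid approximation: evaluate prior * likelihood at interval midpoints.
# The binomial coefficient is omitted because it cancels on normalization.
m = 10000  # number of grid cells (an assumption for this sketch)
grid = [(i + 0.5) / m for i in range(m)]
unnorm = [
    math.exp(beta_log_pdf(t, alpha, beta) + k * math.log(t) + (n - k) * math.log(1 - t))
    for t in grid
]
evidence = sum(unnorm) / m                      # midpoint-rule integral
grid_post = [u / evidence for u in unnorm]      # normalized posterior density

# Compare against the closed-form Beta(alpha_post, beta_post) density.
exact_post = [math.exp(beta_log_pdf(t, alpha_post, beta_post)) for t in grid]
max_abs_diff = max(abs(g - e) for g, e in zip(grid_post, exact_post))
grid_mean = sum(t * g for t, g in zip(grid, grid_post)) / m

print(f"posterior: Beta({alpha_post}, {beta_post})")
print(f"posterior mean (closed form) = {post_mean:.4f}")
print(f"posterior mean (grid)        = {grid_mean:.4f}")
print(f"max density difference       = {max_abs_diff:.2e}")
```

Agreement between the two means and a small density difference indicate that the closed-form update and the direct application of Bayes’ rule describe the same posterior. The grid method also works for priors that are not conjugate, at the cost of more computation.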

Application Retrospective Card: Fundamentals of Bayes’ Theorem — Prior and Posterior Distributions

When reviewing “Fundamentals of Bayes’ Theorem — Prior and Posterior Distributions”, place key concepts, procedural steps, and observable outcomes side-by-side on a single page for efficient recall.

Application Checklist Card: Fundamentals of Bayes’ Theorem — Prior and Posterior Distributions

When practicing “Fundamentals of Bayes’ Theorem — Prior and Posterior Distributions”, write down the input conditions, processing actions, and observable outcomes together—making future review and debugging straightforward.

Summary

In this tutorial, we explored the definitions and significance of prior and posterior distributions. By selecting an appropriate prior and combining it with observed data, we computed the posterior distribution—thereby formalizing how beliefs should rationally evolve in light of evidence.

In the next tutorial, we will examine Bayesian updating rules and walk through practical case studies—deepening your grasp of Bayesian learning and statistical inference. Stay tuned!
