Bayesian Basics: Prior and Posterior Distributions
The core of Bayesian learning lies in coherently integrating existing beliefs with new evidence while explicitly quantifying uncertainty. While reading, structure your understanding as follows: “Prior Distribution → Types of Prior Distributions → Example: Selecting a Prior Distribution → Posterior Distribution”, then verify each concept using the code snippets, case studies, or metrics presented in the main text.
After reading, reinforce your understanding with a small real-world task: identify what the inputs are, where the processing steps occur, and whether the outputs are verifiable and acceptable. If the task fails, first inspect your choice of prior distribution, then check whether the type of prior is appropriate.
In the previous article, we derived Bayes’ theorem and learned how to update our beliefs based on prior knowledge. In this article, we delve deeper into the concepts—and significance—of prior distributions and posterior distributions. Through concrete examples, we demonstrate how to select an appropriate prior for a given problem and compute the corresponding posterior distribution.
Prior Distribution
A prior distribution is the probability distribution assigned to an unknown quantity, such as a model parameter, before any data are observed. It may be subjective (based on personal or expert belief) or objective (chosen by a formal rule), and it encodes what we know or believe about that quantity before collecting empirical evidence.
When learning about prior and posterior distributions, align three elements along a single conceptual line: your initial judgment, the observed data, and the updated result.
Types of Prior Distributions
- Non-informative (or Objective) Priors: These priors express minimal assumptions, typically by assigning equal weight across plausible values, and are suitable when little or no prior knowledge is available. A uniform distribution is a common example.
- Informative Priors: These incorporate substantive prior knowledge, for instance results from past studies or domain expertise. Common choices include the normal distribution (e.g., for an unknown mean with known variance) or the gamma distribution (e.g., for an unknown rate or precision parameter).
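To make the contrast between the two kinds of prior concrete, here is a minimal sketch that evaluates a flat prior and a normal prior at a few points. The parameter range [0, 10] and the normal prior centred at 5 with standard deviation 1 are assumptions chosen only for illustration; they do not come from the text.

```python
import math

def uniform_pdf(x, low, high):
    """Density of a uniform prior on [low, high]: flat inside, zero outside."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def normal_pdf(x, mean, sd):
    """Density of a normal prior centred on `mean` with standard deviation `sd`."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical parameter range and prior settings, chosen only for illustration.
grid = [0.0, 2.5, 5.0, 7.5, 10.0]
flat = [uniform_pdf(x, 0.0, 10.0) for x in grid]
informed = [normal_pdf(x, 5.0, 1.0) for x in grid]

for x, f, g in zip(grid, flat, informed):
    print(f"x={x:4.1f}  uniform={f:.3f}  normal={g:.3f}")
```

The flat prior gives every value the same density, while the informative prior concentrates its density around the value that earlier knowledge favours.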
Example: Selecting a Prior Distribution
Suppose we wish to estimate the defect rate θ of a manufactured product. Historical production data suggests this rate typically falls between 1% and 5%. We may therefore choose a Beta distribution, supported on [0,1], as our prior—well-suited for modeling proportions.
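Let the defect rate be θ. We adopt a Beta prior with shape parameters α and β:
$$\theta \sim \mathrm{Beta}(\alpha, \beta), \qquad p(\theta) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}$$
Choosing α much smaller than β places most of the prior mass near zero, which reflects our belief that the defect rate is likely low.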
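One way to sanity-check a candidate prior against the historical 1% to 5% range is to compute its mean and the probability mass it places on that interval. The sketch below does this for a hypothetical Beta(3, 97) prior; these shape parameters are an assumption for illustration, not the values used in the text. The integral is approximated with a simple midpoint rule.

```python
import math

def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_norm)

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

# Hypothetical shape parameters, not taken from the text.
a, b = 3.0, 97.0

prior_mean = a / (a + b)
mass_in_range = integrate(lambda t: beta_pdf(t, a, b), 0.01, 0.05)

print(f"prior mean: {prior_mean:.3f}")
print(f"prior mass in [1%, 5%]: {mass_in_range:.3f}")
```

If the mass in the target range looks too small, the shape parameters can be adjusted and the check repeated.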
Posterior Distribution
A posterior distribution is the updated probability distribution of a random variable after incorporating observed data. It represents a rational revision of the prior distribution in light of new evidence. According to Bayes’ theorem, the posterior for a parameter θ given data D is computed via:
$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}$$
where:
- p(θ | D) is the posterior distribution.
- p(D | θ) is the likelihood: the probability of observing data D given parameter θ.
- p(θ) is the prior distribution.
- p(D) is the marginal likelihood (also called the evidence), a normalizing constant representing the average likelihood over all possible values of θ, weighted by the prior: p(D) = ∫ p(D | θ) p(θ) dθ.
After finishing this section, reflect on three questions:
- What problem does this framework solve?
- At which step is error most likely to occur?
- Can I implement and validate it on a small, self-contained example?
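Bayes' formula can be checked numerically on a small example by evaluating each term on a grid of candidate parameter values. The sketch below is a grid approximation, not an exact computation. It assumes a flat prior and hypothetical data (2 defects in 50 units), neither of which comes from the text.

```python
import math

# Hypothetical data, chosen only for illustration: 2 defects in 50 units.
n, k = 50, 2

# Candidate defect rates on a grid strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]

# Flat prior: every grid point gets the same probability mass.
prior = [1.0 / len(grid)] * len(grid)

# Binomial likelihood p(D | theta) at each grid point.
likelihood = [math.comb(n, k) * t**k * (1 - t) ** (n - k) for t in grid]

# Evidence p(D): prior-weighted sum of the likelihood over the grid.
evidence = sum(l * p for l, p in zip(likelihood, prior))

# Bayes' formula applied point by point.
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

mode = grid[posterior.index(max(posterior))]
print(f"posterior sums to {sum(posterior):.6f}, mode at theta = {mode:.3f}")
```

Dividing by the evidence is what makes the posterior sum to one. With a flat prior, the posterior mode coincides with the observed defect fraction k/n.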
Example: Computing the Posterior Distribution
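Returning to our defect-rate estimation: suppose we inspect n units and observe k defects. We now compute the posterior using Bayes’ rule.
- Likelihood Function:
Since each unit is independently defective with probability θ, the number of defects follows a binomial distribution:
$$p(k \mid \theta) = \binom{n}{k}\, \theta^{k} (1-\theta)^{n-k}$$
where n is the total sample size and k is the number of observed defects.
- Prior Distribution:
As chosen earlier, a Beta prior with shape parameters α and β, where B(α, β) is the Beta function:
$$p(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}$$
- Computing the Posterior:
Substituting into Bayes’ formula, leveraging conjugacy (Beta-Binomial), and dropping factors that do not depend on θ, the posterior is proportional to the product of likelihood and prior:
$$p(\theta \mid k) \propto \theta^{k}(1-\theta)^{n-k} \cdot \theta^{\alpha-1}(1-\theta)^{\beta-1} = \theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1}$$
Because the Beta distribution is conjugate to the Binomial likelihood, the posterior remains a Beta distribution with updated parameters:
- The new shape parameters become:
$$\alpha' = \alpha + k, \qquad \beta' = \beta + n - k$$
Thus, the posterior distribution is:
$$\theta \mid k \sim \mathrm{Beta}(\alpha + k,\ \beta + n - k)$$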
This updated distribution fully captures how our belief about the defect rate has shifted after seeing the data.
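The Beta-Binomial update reduces to simple arithmetic on the shape parameters: observed defects are added to the first parameter and non-defective units to the second. The sketch below illustrates this with a hypothetical Beta(3, 97) prior and hypothetical inspection results (4 defects in 200 units). These numbers and the helper names `update_beta` and `beta_mean` are assumptions for illustration, not values from the text.

```python
def update_beta(alpha, beta, n, k):
    """Beta-Binomial conjugate update: add defects to alpha, non-defects to beta."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return alpha + k, beta + (n - k)

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical prior and inspection results, not taken from the text.
prior = (3, 97)          # prior mean 0.03
n, k = 200, 4            # 4 defects in 200 units, sample rate 0.02

posterior = update_beta(*prior, n, k)

print(f"posterior: Beta{posterior}")
print(f"prior mean {beta_mean(*prior):.4f} -> posterior mean {beta_mean(*posterior):.4f}")
```

In this example the posterior mean falls between the prior mean and the observed sample rate, which is the usual behaviour of a conjugate update: the data pull the estimate away from the prior in proportion to the sample size.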
When reviewing or practicing this material, write down the input conditions, processing steps, and observable outcomes side by side on a single page; this makes future recall and debugging straightforward.
Summary
In this tutorial, we explored the definitions and significance of prior and posterior distributions. By selecting an appropriate prior and combining it with observed data, we computed the posterior distribution—thereby formalizing how beliefs should rationally evolve in light of evidence.
In the next tutorial, we will examine Bayesian updating rules and walk through practical case studies—deepening your grasp of Bayesian learning and statistical inference. Stay tuned!