Bayesian learning centers on integrating prior beliefs with new evidence while explicitly quantifying uncertainty. As you read, structure your understanding as follows:
Bayes’ Theorem and the Posterior Distribution → Definition of Maximum A Posteriori Estimation (MAP) → MAP Application Example: Coin Tossing → Choosing a Prior Distribution,
then return to the code, examples, or metrics in the main text for verification.
After reading, test your understanding using a small real-world task:
- What are the inputs?
- Where does processing occur?
- Is the output verifiable and acceptable?
If the task fails, first revisit “Bayes’ Theorem and the Posterior Distribution”, then check “Definition of Maximum A Posteriori Estimation (MAP)”.
In this tutorial, we delve into Maximum A Posteriori Estimation (MAP). In the previous part, we covered the fundamentals of Bayes’ theorem and its updating rule. Now, we apply Bayes’ theorem to parameter estimation—specifically, using MAP to infer unknown parameters.
Bayes’ Theorem and the Posterior Distribution
The core idea behind Bayes’ theorem is to update our belief about a parameter based on observed data. The posterior distribution represents our updated belief about the parameter after observing the data, expressed mathematically as:
$$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}$$
where:
- $P(\theta \mid D)$ is the posterior distribution of parameter $\theta$ given data $D$;
- $P(D \mid \theta)$ is the likelihood function, the probability of observing data $D$ under parameter $\theta$;
- $P(\theta)$ is the prior distribution, our belief about $\theta$ before seeing the data;
- $P(D)$ is the marginal likelihood (or evidence), which is constant with respect to $\theta$ and thus plays no role in optimization.
When learning MAP, first compare what information the likelihood and the prior each contribute. When the sample size is small or the noise is high, the prior strongly influences the final estimate.
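To make the roles of the four terms concrete, here is a minimal sketch that applies Bayes’ theorem to a discrete set of candidate parameter values. The three candidate values, the prior weights, and the small data set are illustrative assumptions, not part of the tutorial’s example.

```python
import numpy as np

# Hypothetical setup: three candidate values for theta (probability of heads).
thetas = np.array([0.25, 0.5, 0.75])
prior = np.array([0.25, 0.5, 0.25])        # P(theta): belief before seeing data

# Illustrative data D: 2 heads and 1 tail.
heads, tails = 2, 1
likelihood = thetas**heads * (1 - thetas)**tails   # P(D | theta)

# P(D): sum of likelihood * prior over all candidates (a single number).
evidence = np.sum(likelihood * prior)

# P(theta | D): Bayes' theorem applied to each candidate.
posterior = likelihood * prior / evidence

print("posterior:", posterior)
print("sums to:", posterior.sum())
```

Because the evidence is one number shared by every candidate, dividing by it rescales the posterior but does not change which candidate has the highest value.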
Definition of Maximum A Posteriori Estimation (MAP)
Maximum A Posteriori Estimation (MAP) estimates the parameter value by maximizing the posterior distribution. Specifically, we seek:
$$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta} P(\theta \mid D)$$
Using Bayes’ theorem, this is equivalent to:
$$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta} P(D \mid \theta)\, P(\theta)$$
since $P(D)$ is independent of $\theta$ and therefore omitted during maximization.
You don’t need to absorb all details of MAP at once. Start with a small, hands-on problem you can verify yourself, then use the main text to fill in conceptual gaps.
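In practice the product is often maximized in log form, because the logarithm is monotonic and does not move the maximizer. A short sketch of that step, writing $\theta$ for the parameter and $D$ for the data (the remark about a flat prior is a standard observation added here, not something the tutorial states):

```latex
\begin{aligned}
\hat{\theta}_{\text{MAP}}
  &= \arg\max_{\theta} \, P(D \mid \theta)\, P(\theta) \\
  &= \arg\max_{\theta} \, \bigl[ \log P(D \mid \theta) + \log P(\theta) \bigr]
\end{aligned}
```

If $P(\theta)$ is constant over the parameter range, the second term drops out and the estimate coincides with the maximum likelihood estimate.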
MAP Application Example: Coin Tossing
Suppose we have a biased coin and wish to estimate the probability $\theta$ that it lands heads up. We toss it 10 times and observe 7 heads (H) and 3 tails (T). We’ll use MAP to estimate $\theta$.
1. Choosing a Prior Distribution
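We adopt a Beta distribution as the prior:
$$P(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}$$
We select $\alpha = 2$ and $\beta = 2$, reflecting a prior belief that heads and tails are equally likely before any tosses.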
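A quick way to inspect such a prior is with `scipy.stats.beta`. The sketch below assumes $\alpha = 2$ and $\beta = 2$, the values consistent with the objective $\theta^8(1-\theta)^4$ used in the optimization code later in this example; the variable names are illustrative.

```python
from scipy.stats import beta

# Assumed prior hyperparameters (consistent with the theta^8 * (1 - theta)^4 objective).
alpha_prior, beta_prior = 2, 2
prior = beta(alpha_prior, beta_prior)

# The prior mean is 0.5: no preference for heads or tails.
print("prior mean:", prior.mean())

# The density is symmetric around 0.5 and peaks there.
print("pdf at 0.3 and 0.7:", prior.pdf(0.3), prior.pdf(0.7))
print("pdf at 0.5:", prior.pdf(0.5))
```

Larger equal values of the two hyperparameters would concentrate the density more tightly around 0.5, which expresses a stronger prior belief in a fair coin.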
2. Likelihood Function
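Given 7 heads and 3 tails in 10 tosses, the likelihood (up to a binomial coefficient that does not depend on $\theta$) is:
$$P(D \mid \theta) \propto \theta^{7}(1-\theta)^{3}$$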
3. Computing the Posterior Distribution
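Maximizing the posterior is equivalent to maximizing the unnormalized posterior:
$$P(\theta \mid D) \propto P(D \mid \theta)\, P(\theta) \propto \theta^{7}(1-\theta)^{3} \cdot \theta^{2-1}(1-\theta)^{2-1} = \theta^{8}(1-\theta)^{4}$$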
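The unnormalized posterior $\theta^8(1-\theta)^4$ has the shape of a Beta density, so the full posterior can be written down directly. The sketch below assumes a Beta(2, 2) prior and checks numerically that normalizing $\theta^8(1-\theta)^4$ gives a Beta(9, 5) density; the variable names are illustrative.

```python
import numpy as np
from scipy.special import beta as beta_fn
from scipy.stats import beta

heads, tails = 7, 3
alpha_prior, beta_prior = 2, 2      # assumed prior hyperparameters

# Conjugate update: add the observed counts to the prior hyperparameters.
alpha_post = alpha_prior + heads    # 9
beta_post = beta_prior + tails      # 5
posterior = beta(alpha_post, beta_post)

# Compare against the unnormalized posterior divided by B(9, 5).
theta = np.linspace(0.01, 0.99, 99)
unnormalized = theta**8 * (1 - theta)**4
normalized = unnormalized / beta_fn(alpha_post, beta_post)

matches = np.allclose(normalized, posterior.pdf(theta))
print("normalized posterior matches Beta(9, 5):", matches)
```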
4. Solving for the MAP Estimate
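To find the value of $\theta$ that maximizes $P(\theta \mid D)$, we differentiate and solve for critical points:
$$\frac{d}{d\theta}\left[\theta^{8}(1-\theta)^{4}\right] = 8\theta^{7}(1-\theta)^{4} - 4\theta^{8}(1-\theta)^{3} = 4\theta^{7}(1-\theta)^{3}(2 - 3\theta) = 0$$
The endpoints $\theta = 0$ and $\theta = 1$ give zero posterior, so the maximum lies at $\hat{\theta}_{\text{MAP}} = 2/3 \approx 0.667$.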
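For a Beta prior with coin-toss data, the critical-point calculation generalizes to a closed form: the posterior is Beta($\alpha$ + heads, $\beta$ + tails), and its mode is $(a - 1)/(a + b - 2)$ when both posterior parameters exceed 1. A minimal sketch, where the function name `beta_binomial_map` is a hypothetical helper introduced for illustration:

```python
def beta_binomial_map(heads, tails, alpha, beta):
    """MAP estimate of P(heads) under a Beta(alpha, beta) prior.

    Uses the mode of the Beta(alpha + heads, beta + tails) posterior,
    which is only an interior maximum when both parameters exceed 1.
    """
    a = alpha + heads
    b = beta + tails
    if a <= 1 or b <= 1:
        raise ValueError("posterior mode is not in the interior of (0, 1)")
    return (a - 1) / (a + b - 2)


# 7 heads, 3 tails, assumed Beta(2, 2) prior: (9 - 1) / (9 + 5 - 2) = 8 / 12
print("MAP estimate:", beta_binomial_map(7, 3, 2, 2))

# A flat Beta(1, 1) prior gives the raw frequency of heads, 7 / 10.
print("Flat-prior estimate:", beta_binomial_map(7, 3, 1, 1))
```

Comparing the two printed values shows how the Beta(2, 2) prior pulls the estimate from 0.7 toward 0.5.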
Alternatively, we can use numerical optimization:
```python
from scipy.optimize import minimize_scalar

# Negative unnormalized posterior: minimize_scalar minimizes, so we negate
# theta^8 * (1 - theta)^4 in order to find its maximum.
def objective(theta):
    return -(theta**8 * (1 - theta)**4)

# Search the interval (0, 1), the valid range for a probability.
result = minimize_scalar(objective, bounds=(0, 1), method='bounded')
theta_map = result.x
print("MAP estimate:", theta_map)
```
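This code prints the estimated probability of heads, approximately 0.667, which agrees with the analytic maximizer $\theta = 2/3$ of $\theta^{8}(1-\theta)^{4}$.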
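With many more tosses, a product such as $\theta^{800}(1-\theta)^{400}$ becomes too small to represent reliably in floating point, so a common variant minimizes the negative log posterior instead. The sketch below is one way to do that under the same assumptions (7 heads, 3 tails, Beta(2, 2) prior); the function name `neg_log_posterior` and the slightly narrowed bounds are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_posterior(theta, heads=7, tails=3, alpha=2, beta=2):
    """Negative log of the unnormalized Beta-Bernoulli posterior."""
    log_post = (heads + alpha - 1) * np.log(theta) \
             + (tails + beta - 1) * np.log1p(-theta)   # log(1 - theta)
    return -log_post

# Keep the bounds strictly inside (0, 1) so the logarithms stay finite.
eps = 1e-9
result = minimize_scalar(neg_log_posterior, bounds=(eps, 1 - eps), method="bounded")
theta_map_log = result.x
print("MAP estimate (log form):", theta_map_log)
```

Because the logarithm is monotonic, this finds the same maximizer as the original objective.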
When reviewing “Maximum A Posteriori Estimation (MAP)”, avoid jumping straight into large projects. Instead, first validate the core logic using a simple, concrete example.
Summary
This tutorial thoroughly introduced the concept and application of Maximum A Posteriori Estimation (MAP). Using the coin-tossing example, we demonstrated how to estimate parameters by maximizing the posterior distribution. In the next tutorial, we will compare Bayesian estimation with frequentist estimation to deepen our understanding of statistical inference.
Mastering MAP lays a solid foundation for further study in Bayesian learning. If you have questions or would like deeper discussion, feel free to ask!