The core of Bayesian learning lies in integrating prior judgments with new evidence while explicitly quantifying uncertainty. While reading, follow the article's structure: the core idea of the Bayesian classifier first, then the prior probability, likelihood function, and marginal probability in turn. Verify each concept against the code example in the main text.
After reading, review the material with a small real-world task: identify the inputs, the processing steps, and whether the outputs can be verified. If the task fails, revisit “Core Idea of the Bayesian Classifier” first; if that does not resolve it, move on to “Prior Probability, Likelihood Function, and Marginal Probability.”
In this article, we delve into the foundational theory of Bayesian classification—an essential topic within Bayesian learning and statistical inference. At its heart, Bayesian classification infers unknown class labels by combining prior knowledge with observed data. In contrast to Bayesian regression—which focuses on predicting continuous numerical values—Bayesian classification addresses the problem of assigning discrete class labels to samples based on their features.
Core Idea of the Bayesian Classifier
The Bayesian classifier is grounded in Bayes’ theorem, expressed as:
$$P(C_k \mid x) = \frac{P(x \mid C_k)\, P(C_k)}{P(x)}$$
In this formula:
- $P(C_k \mid x)$ is the posterior probability of class $C_k$ given feature vector $x$;
- $P(x \mid C_k)$ is the likelihood function: the probability of observing feature $x$ under class $C_k$;
- $P(C_k)$ is the prior probability of class $C_k$;
- $P(x)$ is the marginal probability of feature $x$, serving as a normalizing constant for the posterior.
When interpreting Bayesian classification, examine in order: class priors, feature conditional probabilities, evidence normalization, posterior comparison, and decision boundaries.
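For classification, we select the class with the highest posterior probability. Because the marginal probability $P(x)$ is the same for every class, the decision rule reduces to:
$$\hat{y} = \arg\max_{k} P(C_k \mid x) = \arg\max_{k} P(x \mid C_k)\, P(C_k)$$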
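The decision rule can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function name `classify` is hypothetical, the priors are the 90%/10% figures from the tumor example used later in this article, and the likelihood values are made up for a single observation. The scores are compared in log form, a common choice that avoids numerical underflow when many small probabilities are multiplied.

```python
import math


def classify(priors, likelihoods):
    """Return the class with the largest log P(x | C_k) + log P(C_k)."""
    # The evidence P(x) is omitted: it is identical for every class,
    # so it cannot change which score is largest.
    # Assumes every prior and likelihood is strictly positive.
    scores = {
        k: math.log(likelihoods[k]) + math.log(priors[k])
        for k in priors
    }
    return max(scores, key=scores.get), scores


# Priors from the article's tumor example; likelihoods are illustrative only.
priors = {"benign": 0.9, "malignant": 0.1}
likelihoods = {"benign": 0.05, "malignant": 0.60}

predicted, scores = classify(priors, likelihoods)
print(predicted, scores)
```

With these assumed numbers the strong likelihood for the malignant class outweighs its small prior, which shows how the two terms trade off in the rule.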
Prior Probability, Likelihood Function, and Marginal Probability
Prior Probability
The prior probability represents our belief about each class before observing any data. These probabilities can be set based on historical data or domain expertise.
For example, in a tumor classification task, suppose we know that malignant tumors (class $C_1$) occur at a rate of 10%, so $P(C_1) = 0.1$, while benign tumors (class $C_0$) occur at 90%, so $P(C_0) = 0.9$.
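When priors are set from historical data, a simple estimate is the relative frequency of each class among the labels. The sketch below assumes this approach; the helper name `estimate_priors` is hypothetical, and the labels are the ten toy labels from this article's code example (0: benign, 1: malignant).

```python
from collections import Counter


def estimate_priors(labels):
    """Estimate P(C_k) as the fraction of samples carrying each label."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


# Toy labels from the article's example: 0 = benign, 1 = malignant.
labels = [0, 0, 0, 1, 1, 1, 0, 1, 1, 0]
priors = estimate_priors(labels)
print(priors)
```

The toy labels are balanced, so the empirical priors come out equal and differ from the 10%/90% rates assumed in the text. When domain knowledge is considered more trustworthy than the sample, the priors can be supplied directly; scikit-learn's `GaussianNB` accepts a `priors` argument for this.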
Likelihood Function
The likelihood function $P(x \mid C_k)$ gives the probability of observing feature $x$ given that the true class is $C_k$. Modeling this requires assumptions about the distribution of features. A common choice is to assume the features are conditionally independent given the class and to model each feature with a Gaussian (normal), Bernoulli, or multinomial distribution.
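For instance, in our tumor classification example, features might include tumor size and shape. Suppose tumor size $x$ follows $\mathcal{N}(\mu_0, \sigma_0^2)$ for benign tumors (class $C_0$) and $\mathcal{N}(\mu_1, \sigma_1^2)$ for malignant tumors (class $C_1$). Then:
$$P(x \mid C_k) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\!\left(-\frac{(x - \mu_k)^2}{2\sigma_k^2}\right), \quad k \in \{0, 1\}$$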
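A Gaussian class-conditional likelihood can be evaluated as in the sketch below. The means and standard deviations are hypothetical placeholders chosen only for illustration; they are not values from the article, and the size units are arbitrary.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std**2) evaluated at x."""
    var = std ** 2
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical (mean, std) of tumor size per class; illustration only.
params = {"benign": (3.0, 1.0), "malignant": (6.0, 1.5)}

x = 5.0  # observed tumor size
likelihoods = {cls: gaussian_pdf(x, m, s) for cls, (m, s) in params.items()}
print(likelihoods)
```

Under these assumed parameters the observation lies closer to the malignant mean, so its malignant likelihood is the larger of the two. The values are densities rather than probabilities, which is sufficient for comparing classes.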
Marginal Probability
The marginal probability $P(x)$ is often not computed explicitly, since classification only requires comparing posterior probabilities across classes. It can be derived via the law of total probability:
$$P(x) = \sum_{k} P(x \mid C_k)\, P(C_k)$$
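The following sketch computes the marginal probability by summing over classes and then uses it to normalize the posterior. The priors are the 90%/10% figures from the tumor example; the likelihood values are invented for illustration.

```python
# Priors from the tumor example; likelihoods are illustrative assumptions.
priors = {"benign": 0.9, "malignant": 0.1}
likelihoods = {"benign": 0.02, "malignant": 0.30}

# Joint terms P(x | C_k) * P(C_k) for each class.
joint = {k: likelihoods[k] * priors[k] for k in priors}

# Law of total probability: P(x) is the sum of the joint terms.
evidence = sum(joint.values())

# Dividing by P(x) turns the joint terms into posteriors that sum to 1.
posteriors = {k: v / evidence for k, v in joint.items()}

best_unnormalized = max(joint, key=joint.get)
best_normalized = max(posteriors, key=posteriors.get)
print(evidence, posteriors, best_unnormalized == best_normalized)
```

Normalization rescales every class by the same constant, so the winning class is the same before and after. Computing $P(x)$ matters only when calibrated probabilities are needed, not for the classification decision itself.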
Implementing a Bayesian Classifier
Below is a Python example demonstrating how to build a simple Bayesian classifier using GaussianNB from scikit-learn.
```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Assume we have the following features and labels
X = np.array([[5], [6], [8], [9], [10], [3], [4], [7], [2], [1]])
y = np.array([0, 0, 0, 1, 1, 1, 0, 1, 1, 0])  # 0: benign, 1: malignant

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate and train the Bayesian classifier
model = GaussianNB()
model.fit(X_train, y_train)

# Make predictions and compute accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Model accuracy: {accuracy:.2f}')
```
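In this example, we construct a simple binary classification problem and use a Gaussian Naive Bayes classifier to learn from the data, then evaluate its accuracy on the held-out samples. With only ten samples, two of which are held out for testing, the reported accuracy is illustrative rather than a reliable estimate of performance.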
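To connect the library call back to the theory, the sketch below fits a one-feature Gaussian classifier by hand on the same ten toy samples, using only the standard library. The function names `fit_gaussian_nb` and `predict_proba` are hypothetical helpers, not scikit-learn's API. scikit-learn's `GaussianNB` additionally applies a small variance smoothing term (`var_smoothing`), so its numbers may differ slightly from this version.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate per-class prior, mean, and variance for a single feature."""
    groups = defaultdict(list)
    for value, label in zip(X, y):
        groups[label].append(value)
    model = {}
    for label, values in groups.items():
        mean = sum(values) / len(values)
        # Maximum-likelihood variance; assumes each class has at least
        # two distinct values so the variance is non-zero.
        var = sum((v - mean) ** 2 for v in values) / len(values)
        model[label] = {"prior": len(values) / len(X), "mean": mean, "var": var}
    return model


def predict_proba(model, x):
    """Return the posterior P(C_k | x) for each class via Bayes' theorem."""
    joint = {}
    for label, p in model.items():
        pdf = math.exp(-((x - p["mean"]) ** 2) / (2 * p["var"]))
        pdf /= math.sqrt(2 * math.pi * p["var"])
        joint[label] = p["prior"] * pdf
    evidence = sum(joint.values())  # law of total probability
    return {label: v / evidence for label, v in joint.items()}


# The article's toy data: tumor size and label (0: benign, 1: malignant).
X = [5, 6, 8, 9, 10, 3, 4, 7, 2, 1]
y = [0, 0, 0, 1, 1, 1, 0, 1, 1, 0]

model = fit_gaussian_nb(X, y)
proba = predict_proba(model, 9)
print(model)
print(proba, max(proba, key=proba.get))
```

Unlike the scikit-learn example, this version fits on all ten samples without a train/test split, so it illustrates the mechanics of the prior, likelihood, and normalization steps rather than an evaluation.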
To consolidate the material, pick a small working example and walk through the full pipeline end to end: state the priors, choose a likelihood model, compute the posteriors, and apply the decision rule. Then note which steps you can now carry out independently.
Summary
This article introduced the fundamental theory of Bayesian classification, including Bayes’ theorem and the conceptual roles of prior probability, likelihood function, and marginal probability. We also demonstrated—via a practical Python example—how to implement a basic Bayesian classifier. These foundations will support deeper exploration of more advanced Bayesian classification techniques.
Next, we will examine the Naive Bayes classifier in detail, exploring its practical applications and implementation nuances.