Guozhen AI | Global AI field notes and model intelligence

Basic Theory of Bayesian Classification

Category: Bayesian Learning

Lesson #16

Structure Diagram of Basic Bayesian Classification Theory

The core of Bayesian learning lies in integrating prior judgments with new evidence while explicitly quantifying uncertainty. While reading, structure your understanding as follows: “Core Idea of the Bayesian Classifier → Prior Probability, Likelihood Function, and Marginal Probability → Prior Probability → Likelihood Function,” then verify each concept using the code snippets, case studies, or evaluation metrics presented in the main text.

Verification Checklist for Basic Bayesian Classification Theory

After reading, conduct a quick review using a small real-world task: identify what the inputs are, where the processing steps occur, and whether the outputs are verifiable and acceptable. If the task fails, first revisit the “Core Idea of the Bayesian Classifier”; if unresolved, proceed to “Prior Probability, Likelihood Function, and Marginal Probability.”

In this article, we delve into the foundational theory of Bayesian classification—an essential topic within Bayesian learning and statistical inference. At its heart, Bayesian classification infers unknown class labels by combining prior knowledge with observed data. In contrast to Bayesian regression—which focuses on predicting continuous numerical values—Bayesian classification addresses the problem of assigning discrete class labels to samples based on their features.

Core Idea of the Bayesian Classifier

Bayesian Classification Theory Decision Card

When interpreting Bayesian classification, begin by examining: class priors, feature conditional probabilities, evidence normalization, posterior comparison, and decision boundaries.

The Bayesian classifier is grounded in Bayes’ theorem, expressed as:

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

In this formula:

  • $P(C \mid X)$ is the posterior probability of class $C$ given feature vector $X$;
  • $P(X \mid C)$ is the likelihood function: the probability of observing feature $X$ under class $C$;
  • $P(C)$ is the prior probability of class $C$;
  • $P(X)$ is the marginal probability of feature $X$, serving as a normalizing constant for the posterior.

For classification, we select the class with the highest posterior probability, typically applying the following decision rule:

$$\hat{C} = \arg\max_{C} P(C \mid X)$$
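The decision rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the class names `A` and `B` and all numbers are hypothetical, and `map_decision` is a helper name introduced here. It also shows that dividing by the evidence rescales the scores without changing which class wins.

```python
def map_decision(priors, likelihoods):
    """Return the class maximizing P(X|C) * P(C), plus all unnormalized scores."""
    scores = {c: likelihoods[c] * priors[c] for c in priors}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical values for one observed X (illustration only).
priors = {"A": 0.3, "B": 0.7}          # P(C)
likelihoods = {"A": 0.05, "B": 0.01}   # P(X | C)

best, scores = map_decision(priors, likelihoods)

# Normalizing by the evidence P(X) gives proper posteriors,
# but the argmax is the same as for the unnormalized scores.
evidence = sum(scores.values())
posteriors = {c: s / evidence for c, s in scores.items()}

print("predicted class:", best)
print("posteriors:", posteriors)
```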

Prior Probability, Likelihood Function, and Marginal Probability

Prior Probability

Bayesian Learning Reading Roadmap Card

By the end of “Basic Theory of Bayesian Classification,” treat the diagram above as a checklist: Is the problem clearly defined? Are operations concretely implemented? Can the evaluation criteria be reused?

The prior probability $P(C)$ represents our belief about each class before observing any data. These probabilities can be set based on historical data or domain expertise.

For example, in a tumor classification task, suppose we know that malignant tumors (class $C_1$) occur at a rate of 10%, so $P(C_1) = 0.1$, while benign tumors (class $C_2$) occur at 90%, so $P(C_2) = 0.9$.
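When priors come from historical data, one simple approach is to use the relative frequency of each class. The sketch below assumes a hypothetical record of 100 past cases constructed to match the 10% / 90% split; `estimate_priors` is an illustrative helper, not part of any library.

```python
from collections import Counter

def estimate_priors(labels):
    """Estimate P(C) as the relative frequency of each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}

# Hypothetical historical labels: 10 malignant and 90 benign cases.
history = ["malignant"] * 10 + ["benign"] * 90

priors = estimate_priors(history)
print(priors)
```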

Likelihood Function

The likelihood function $P(X \mid C)$ gives the probability of observing feature $X$ given that the true class is $C$. Modeling it requires assumptions about how the features are distributed. A common choice is to assume the features are conditionally independent given the class and to model each feature with a Gaussian (normal), Bernoulli, or multinomial distribution.
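Under the conditional-independence assumption, the likelihood factorizes over features, and the decision rule is usually evaluated in log space to avoid multiplying many small numbers. The notation $X = (x_1, \dots, x_d)$ with $d$ features is an assumption introduced here for illustration:

```latex
P(X \mid C) = \prod_{i=1}^{d} P(x_i \mid C)
\qquad\Longrightarrow\qquad
\hat{C} = \arg\max_{C} \left[ \log P(C) + \sum_{i=1}^{d} \log P(x_i \mid C) \right]
```

The evidence $P(X)$ is omitted because it does not depend on $C$, and taking the logarithm does not change the maximizing class.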

For instance, in our tumor classification example, features might include tumor size and shape. Suppose tumor size follows $\mathcal{N}(5, 2^2)$ for benign tumors and $\mathcal{N}(10, 3^2)$ for malignant tumors. Then:

$$P(X \mid C_1) = \frac{1}{\sqrt{2\pi}\cdot 3} \exp\left(-\frac{(X-10)^2}{2\cdot 3^2}\right)$$

$$P(X \mid C_2) = \frac{1}{\sqrt{2\pi}\cdot 2} \exp\left(-\frac{(X-5)^2}{2\cdot 2^2}\right)$$
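To make the two densities concrete, they can be evaluated at a specific tumor size. The sketch below uses the means and standard deviations stated above; the observed size of 8 is a hypothetical value chosen for illustration, and `gaussian_pdf` is a helper defined here.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * std)
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * std ** 2))

# Class-conditional parameters from the tumor example.
MALIGNANT = {"mean": 10.0, "std": 3.0}  # class C_1
BENIGN = {"mean": 5.0, "std": 2.0}      # class C_2

size = 8.0  # hypothetical observed tumor size

lik_malignant = gaussian_pdf(size, **MALIGNANT)
lik_benign = gaussian_pdf(size, **BENIGN)

print(f"P(X={size} | malignant) = {lik_malignant:.4f}")
print(f"P(X={size} | benign)    = {lik_benign:.4f}")
```

These are density values rather than probabilities, and on their own they ignore how common each class is; the prior enters only when the posterior is formed.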

Marginal Probability

The marginal probability $P(X)$ is often not computed explicitly: it is the same for every class, so classification only requires comparing the products $P(X \mid C)P(C)$ across classes. When it is needed, it can be derived via the law of total probability:

$$P(X) = \sum_{C} P(X \mid C)\,P(C)$$
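Putting the pieces together for the tumor example, the sketch below computes the marginal by the law of total probability and then the normalized posteriors. The priors and Gaussian parameters come from the text; the observed size of 8 and the helper names are assumptions made for illustration.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * std)
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * std ** 2))

priors = {"malignant": 0.1, "benign": 0.9}                  # P(C)
params = {"malignant": (10.0, 3.0), "benign": (5.0, 2.0)}   # (mean, std) per class

def posterior(x):
    """Return P(C | X=x) for every class using Bayes' theorem."""
    joint = {c: gaussian_pdf(x, *params[c]) * priors[c] for c in priors}
    evidence = sum(joint.values())  # P(X) via the law of total probability
    return {c: j / evidence for c, j in joint.items()}

post = posterior(8.0)  # hypothetical observed tumor size
predicted = max(post, key=post.get)

print({c: round(p, 3) for c, p in post.items()})
print("predicted class:", predicted)
```

At this size the malignant likelihood is the larger of the two, yet the posterior still favors the benign class (roughly 0.85 versus 0.15) because the benign prior is nine times larger.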

Implementing a Bayesian Classifier

Below is a Python example demonstrating how to build a simple Bayesian classifier using GaussianNB from scikit-learn.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Assume we have the following features and labels
X = np.array([[5], [6], [8], [9], [10], [3], [4], [7], [2], [1]])
y = np.array([0, 0, 0, 1, 1, 1, 0, 1, 1, 0])  # 0: benign, 1: malignant

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate and train the Bayesian classifier
model = GaussianNB()
model.fit(X_train, y_train)

# Make predictions and compute accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f'Model accuracy: {accuracy:.2f}')
```

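In this example, we construct a simple binary classification problem, train a Gaussian Naive Bayes classifier on it, and evaluate its accuracy. With only ten samples, the test split holds just two points, so the printed accuracy illustrates the workflow rather than the quality of the model.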
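To connect the library call back to the theory, the following sketch estimates the per-class prior, mean, and variance by hand and applies the decision rule in log space. It is a simplified illustration under stated assumptions, not scikit-learn's implementation: it fits on all ten samples rather than a training split, uses the divide-by-n variance, and omits the variance smoothing that `GaussianNB` applies. The helper names are introduced here.

```python
import math

# Toy data copied from the example above.
sizes = [5, 6, 8, 9, 10, 3, 4, 7, 2, 1]
labels = [0, 0, 0, 1, 1, 1, 0, 1, 1, 0]  # 0: benign, 1: malignant

def fit_gaussian_nb(xs, ys):
    """Estimate prior, mean, and variance of a single feature for each class."""
    params = {}
    for c in sorted(set(ys)):
        values = [x for x, y in zip(xs, ys) if y == c]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        params[c] = {"prior": len(values) / len(ys), "mean": mean, "var": var}
    return params

def predict(params, x):
    """Pick the class maximizing log P(C) + log P(x | C)."""
    def log_score(p):
        log_lik = (-0.5 * math.log(2.0 * math.pi * p["var"])
                   - (x - p["mean"]) ** 2 / (2.0 * p["var"]))
        return math.log(p["prior"]) + log_lik
    return max(params, key=lambda c: log_score(params[c]))

nb_params = fit_gaussian_nb(sizes, labels)
for c, p in nb_params.items():
    print(c, p)
print("prediction for size 9.0:", predict(nb_params, 9.0))
```

Because the toy labels are only loosely related to size, the two class means end up close together, which is worth keeping in mind when interpreting any accuracy figure on this data.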

Post-Mortem Summary Card for Basic Bayesian Classification Theory

By this point, you can distill “Basic Theory of Bayesian Classification” into a concise post-mortem summary: first articulate the central narrative, then validate it with a small, concrete task.

Application Verification Card for Basic Bayesian Classification Theory

After completing “Basic Theory of Bayesian Classification,” pick a small working example and walk through the full pipeline end-to-end—then assess which steps you can now execute independently.

Summary

This article introduced the fundamental theory of Bayesian classification, including Bayes’ theorem and the conceptual roles of prior probability, likelihood function, and marginal probability. We also demonstrated—via a practical Python example—how to implement a basic Bayesian classifier. These foundations will support deeper exploration of more advanced Bayesian classification techniques.

Next, we will examine the Naive Bayes classifier in detail, exploring its practical applications and implementation nuances.
