Guozhen AI: Global AI field notes and model intelligence

Category: Bayesian Learning

Lesson #17

Structure Diagram of Naive Bayes Classifier for Bayesian Classification

The core idea of Bayesian learning is to combine prior beliefs with new evidence while explicitly representing uncertainty. While reading, structure your understanding as follows: “Foundations of the Naive Bayes Classifier → Example: Text Classification → Implementing the Naive Bayes Classifier → Priors”, then return to the code, case studies, or evaluation metrics in the main text for verification.

Verification Flowchart for Naive Bayes Classifier in Bayesian Classification

After reading, validate your understanding using a small real-world task: identify what the inputs are, where the processing steps occur, and whether the output is verifiable and acceptable. If the classifier fails, first revisit “Foundations of the Naive Bayes Classifier”, then consult “Example: Text Classification”.

In the previous article, we explored the theoretical foundations of Bayesian classification—introducing Bayes’ theorem, prior probabilities, likelihood functions, and posterior probabilities—including their definitions and computation methods. Today, we delve deeper into a concrete classification model: the Naive Bayes Classifier. This classifier is an exceptionally simple yet powerful probabilistic graphical model, widely applied in tasks such as text classification and spam detection.

Foundations of the Naive Bayes Classifier

Application Checklist for Naive Bayes Classifier in Bayesian Classification

When practicing “Naive Bayes Classifier for Bayesian Classification”, write down the input conditions, processing actions, and observable outcomes together; this makes future review more efficient.

Post-Practice Reflection Card for Naive Bayes Classifier in Bayesian Classification

When reviewing “Naive Bayes Classifier for Bayesian Classification”, place key concepts, procedural steps, and observable outcomes on the same page for consolidated revision.

The Naive Bayes Classifier is grounded in Bayes’ theorem and assumes that the features are conditionally independent given the class. This “naive” assumption greatly simplifies computation, enabling efficient classification via straightforward probability calculations. Its fundamental formula is:

$$P(C \mid X_1, X_2, \ldots, X_n) = \frac{P(C) \cdot P(X_1, X_2, \ldots, X_n \mid C)}{P(X_1, X_2, \ldots, X_n)}$$

Under the naive independence assumption, the joint conditional probability decomposes as:

$$P(X_1, X_2, \ldots, X_n \mid C) = P(X_1 \mid C) \cdot P(X_2 \mid C) \cdots P(X_n \mid C)$$

Thus, the posterior probability becomes proportional to:

$$P(C \mid X_1, X_2, \ldots, X_n) \propto P(C) \cdot P(X_1 \mid C) \cdot P(X_2 \mid C) \cdots P(X_n \mid C)$$

Here, $P(C)$ is the prior probability, $P(X_i \mid C)$ is the likelihood, and $P(X_1, X_2, \ldots, X_n)$ is a normalizing constant (often omitted during classification since it is identical across all classes).
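The proportionality above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `nb_log_scores` and `normalize`, the class names "A" and "B", and all probability values are hypothetical and chosen only to show the mechanics. Scores are accumulated in log space, a common way to avoid numerical underflow when many small probabilities are multiplied.

```python
import math

def nb_log_scores(priors, likelihoods, features):
    """Unnormalized log-posterior per class: log P(C) + sum_i log P(x_i | C)."""
    scores = {}
    for cls, prior in priors.items():
        score = math.log(prior)
        for x in features:
            score += math.log(likelihoods[cls][x])
        scores[cls] = score
    return scores

def normalize(log_scores):
    """Divide by the normalizing constant so the posteriors sum to 1."""
    m = max(log_scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - m) for c, s in log_scores.items()}
    total = sum(exp.values())
    return {c: v / total for c, v in exp.items()}

# Hypothetical numbers, for illustration only
priors = {"A": 0.6, "B": 0.4}
likelihoods = {
    "A": {"x1": 0.5, "x2": 0.2},
    "B": {"x1": 0.1, "x2": 0.7},
}

log_scores = nb_log_scores(priors, likelihoods, ["x1", "x2"])
posterior = normalize(log_scores)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

Because the normalizing constant is the same for every class, picking the class with the largest unnormalized score gives the same answer as picking the largest posterior; normalization is only needed when the actual probabilities are of interest.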

Example: Text Classification

Consider a text classification task: classifying emails as either “spam” or “non-spam”. We can implement a Naive Bayes classifier using the following steps:

  1. Data Preprocessing: Tokenize emails into words and construct a vocabulary.
  2. Feature Extraction: Estimate the probability of each word appearing in “spam” vs. “non-spam” emails.
  3. Model Construction: Use the computed probabilities to perform classification.

Data Collection and Preprocessing

Suppose we have the following three emails:

  • Email 1: "free earn cash"
  • Email 2: "important meeting time"
  • Email 3: "earn free cash opportunity"

Labels:

  • Email 1: spam
  • Email 2: non-spam
  • Email 3: spam

Vocabulary: ["free", "earn", "cash", "important", "meeting", "time", "opportunity"]
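The preprocessing step can be sketched as follows. This is a minimal version that assumes whitespace tokenization and lowercasing are sufficient for this toy data; the helper name `tokenize` is hypothetical, and real email text would typically need punctuation handling and stop-word filtering as well.

```python
emails = [
    "free earn cash",
    "important meeting time",
    "earn free cash opportunity",
]
labels = ["spam", "non-spam", "spam"]

def tokenize(text):
    """Split an email into lowercase word tokens (whitespace-based)."""
    return text.lower().split()

tokenized = [tokenize(email) for email in emails]

# Build the vocabulary in order of first appearance
vocabulary = []
for tokens in tokenized:
    for token in tokens:
        if token not in vocabulary:
            vocabulary.append(token)

print(vocabulary)
```

Keeping first-appearance order reproduces the vocabulary list shown above; the order itself has no effect on the classifier.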

Probability Computation

Next, compute the probability of each word under both classes.

  • Prior Probabilities:

    • $P(\text{spam}) = \frac{2}{3}$
    • $P(\text{non-spam}) = \frac{1}{3}$
  • Likelihood Probabilities:
    Using Laplace smoothing (to avoid zero-probability issues), compute conditional word probabilities for each class.

Take the word "free" as an example:

  • Occurrences in spam emails: 2 (once in Email 1 and once in Email 3)
  • Occurrences in non-spam emails: 0
  • Vocabulary size: 7
  • Total word count in spam emails: 7 (3 words in Email 1 plus 4 words in Email 3)
  • Total word count in non-spam emails: 3 (Email 2)
  • Smoothing parameter: α = 1

With Laplace smoothing in the multinomial model, the estimate for a word $w$ in class $c$ is (occurrences of $w$ in $c$ + α) divided by (total word count in $c$ + α × vocabulary size). So for "free":

$$P(\text{free} \mid \text{spam}) = \frac{2 + 1}{7 + 7} = \frac{3}{14}, \qquad P(\text{free} \mid \text{non-spam}) = \frac{0 + 1}{3 + 7} = \frac{1}{10}$$

The probabilities of the other words follow in the same way.
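As a cross-check on the priors and smoothed estimates, the sketch below applies the standard multinomial formulation, (count + α) / (total word count in class + α × vocabulary size), to the three emails. It uses exact fractions so the results can be compared by hand. The helper name `likelihood` and the constant `ALPHA` are illustrative choices, not part of the original article.

```python
from collections import Counter
from fractions import Fraction

emails = [
    "free earn cash",
    "important meeting time",
    "earn free cash opportunity",
]
labels = ["spam", "non-spam", "spam"]
ALPHA = 1  # Laplace smoothing parameter

# Prior: fraction of training emails in each class
label_counts = Counter(labels)
priors = {c: Fraction(n, len(labels)) for c, n in label_counts.items()}

# Per-class word counts (every token occurrence is counted)
word_counts = {c: Counter() for c in label_counts}
for text, label in zip(emails, labels):
    word_counts[label].update(text.split())

vocabulary = sorted({w for text in emails for w in text.split()})

def likelihood(word, cls):
    """Laplace-smoothed estimate of P(word | cls)."""
    total = sum(word_counts[cls].values())
    return Fraction(word_counts[cls][word] + ALPHA,
                    total + ALPHA * len(vocabulary))

print(priors["spam"], priors["non-spam"])  # 2/3 1/3
print(likelihood("free", "spam"))          # 3/14
print(likelihood("free", "non-spam"))      # 1/10
```

Under this formulation the smoothed probabilities within each class sum to 1 over the vocabulary, which is a quick sanity check on the choice of denominator.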

Implementing the Naive Bayes Classifier

Decision Card for Naive Bayes Classifier

When learning the Naive Bayes classifier, first examine: class priors, word/feature likelihoods, smoothing method, posterior comparison logic, and limitations of the independence assumption.

We can implement a Naive Bayes classifier using Python’s scikit-learn. Below is an illustrative example:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Training data
emails = [
    "free earn cash",
    "important meeting time",
    "earn free cash opportunity"
]
labels = ["spam", "non-spam", "spam"]

# Build pipeline
model = make_pipeline(CountVectorizer(), MultinomialNB())

# Train the model
model.fit(emails, labels)

# Test email
test_email = ["important cash opportunity"]
print(model.predict(test_email))  # Predicted class
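To see where the prediction for the test email comes from, the posterior can also be worked out by hand without any library. The sketch below is a from-scratch calculation, assuming multinomial Naive Bayes with Laplace smoothing (α = 1), priors estimated from the label frequencies, and words outside the training vocabulary ignored. These assumptions are intended to mirror scikit-learn's default settings, but the function `posterior` is a hypothetical helper written for this illustration and is not how the library is implemented internally.

```python
from collections import Counter
from fractions import Fraction

emails = [
    "free earn cash",
    "important meeting time",
    "earn free cash opportunity",
]
labels = ["spam", "non-spam", "spam"]
ALPHA = 1  # Laplace smoothing parameter (assumed)

vocabulary = {w for text in emails for w in text.split()}

def posterior(text):
    """Exact posterior P(class | text) under multinomial NB with smoothing."""
    scores = {}
    for cls in sorted(set(labels)):
        # Word counts and total token count for this class
        counts = Counter(
            w for e, l in zip(emails, labels) if l == cls for w in e.split()
        )
        total = sum(counts.values())
        # Start from the prior, then multiply in one likelihood per word
        score = Fraction(labels.count(cls), len(labels))
        for w in text.split():
            if w in vocabulary:  # skip words never seen in training
                score *= Fraction(counts[w] + ALPHA,
                                  total + ALPHA * len(vocabulary))
        scores[cls] = score
    norm = sum(scores.values())  # the normalizing constant
    return {c: s / norm for c, s in scores.items()}

post = posterior("important cash opportunity")
prediction = max(post, key=post.get)
print(prediction, {c: float(p) for c, p in post.items()})
```

For this test email the unnormalized scores are 2/3 · 1/14 · 3/14 · 2/14 for spam and 1/3 · 2/10 · 1/10 · 1/10 for non-spam, so spam wins with a posterior of roughly 0.69, even though the email contains the non-spam word "important".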

Summary

In this tutorial, we introduced the core concepts and working principles of the Naive Bayes Classifier, and demonstrated its implementation through a concrete text classification example. In practice, the Naive Bayes classifier is widely adopted due to its simplicity, efficiency, and surprisingly strong performance, especially in high-dimensional sparse settings like text.

Next, we will explore how to evaluate and improve the trained model to ensure both accuracy and efficiency.

Bayesian Learning Reading Map Card

Before diving into the main text of “Naive Bayes Classifier for Bayesian Classification”, quickly scan the accompanying figures: What question does each figure pose? Which concepts must be clearly distinguished? Which step invites hands-on experimentation? And finally, by what criteria will success be judged?
