The core idea of Bayesian learning is to combine prior beliefs with new evidence while explicitly representing uncertainty. While reading, structure your understanding as follows: “Foundations of the Naive Bayes Classifier → Example: Text Classification → Implementing the Naive Bayes Classifier → Priors”, then return to the code, case studies, or evaluation metrics in the main text for verification.
After reading, validate your understanding using a small real-world task: identify what the inputs are, where the processing steps occur, and whether the output is verifiable and acceptable. If the classifier fails, first revisit “Foundations of the Naive Bayes Classifier”, then consult “Example: Text Classification”.
In the previous article, we explored the theoretical foundations of Bayesian classification—introducing Bayes’ theorem, prior probabilities, likelihood functions, and posterior probabilities—including their definitions and computation methods. Today, we delve deeper into a concrete classification model: the Naive Bayes Classifier. This classifier is an exceptionally simple yet powerful probabilistic graphical model, widely applied in tasks such as text classification and spam detection.
Foundations of the Naive Bayes Classifier
The Naive Bayes Classifier is grounded in Bayes’ theorem and assumes that features are conditionally independent given the class. This “naive” assumption greatly simplifies computation, enabling efficient classification via straightforward probability calculations. For a class $C$ and a feature vector $X = (x_1, \dots, x_n)$, its fundamental formula is:

$$P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}$$
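Under the naive independence assumption, the joint conditional probability decomposes as:

$$P(X \mid C) = \prod_{i=1}^{n} P(x_i \mid C)$$

Thus, the posterior probability becomes proportional to:

$$P(C \mid X) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)$$

Here, $P(C)$ is the prior probability of class $C$, $P(X \mid C)$ is the likelihood of the features $X = (x_1, \dots, x_n)$ given that class, and $P(X)$ is a normalizing constant (often omitted during classification since it’s identical across all classes).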
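The proportionality rule can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `predict_class`, the class labels, and the toy probabilities are all hypothetical. Log-probabilities are summed instead of multiplying raw probabilities, which is a common way to avoid numerical underflow when there are many features.

```python
import math

def predict_class(priors, likelihoods, features):
    """Pick the class with the highest unnormalized log-posterior.

    priors:      {class: P(class)}
    likelihoods: {class: {feature: P(feature | class)}}
    features:    the observed features x_1..x_n
    """
    log_scores = {}
    for cls, prior in priors.items():
        # log P(C) + sum_i log P(x_i | C); P(X) is dropped because it is
        # the same for every class and does not change the argmax.
        score = math.log(prior)
        for x in features:
            score += math.log(likelihoods[cls][x])
        log_scores[cls] = score
    best = max(log_scores, key=log_scores.get)
    return best, log_scores

# Hypothetical toy numbers, chosen only for illustration.
toy_priors = {"A": 0.6, "B": 0.4}
toy_likelihoods = {
    "A": {"x1": 0.2, "x2": 0.5},
    "B": {"x1": 0.7, "x2": 0.4},
}
best_class, scores = predict_class(toy_priors, toy_likelihoods, ["x1", "x2"])
print(best_class)  # B, since 0.4 * 0.7 * 0.4 = 0.112 > 0.6 * 0.2 * 0.5 = 0.06
```

Note that class "A" has the larger prior but still loses, because the likelihood terms outweigh it.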
Example: Text Classification
Consider a text classification task: classifying emails as either “spam” or “non-spam”. We can implement a Naive Bayes classifier using the following steps:
- Data Preprocessing: Tokenize emails into words and construct a vocabulary.
- Feature Extraction: Estimate the probability of each word appearing in “spam” vs. “non-spam” emails.
- Model Construction: Use the computed probabilities to perform classification.
Data Collection and Preprocessing
Suppose we have the following three emails:
- Email 1: "free earn cash"
- Email 2: "important meeting time"
- Email 3: "earn free cash opportunity"
Labels:
- Email 1: spam
- Email 2: non-spam
- Email 3: spam
Vocabulary: ["free", "earn", "cash", "important", "meeting", "time", "opportunity"]
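The preprocessing step can be sketched as follows. This is a minimal version that assumes whitespace tokenization is sufficient for these toy emails; real text would usually also need lowercasing, punctuation handling, and similar cleanup. The variable names are illustrative.

```python
from collections import Counter

emails = [
    "free earn cash",
    "important meeting time",
    "earn free cash opportunity",
]
labels = ["spam", "non-spam", "spam"]

# Tokenize each email on whitespace.
tokenized = [email.split() for email in emails]

# Build the vocabulary in order of first appearance.
vocabulary = []
for tokens in tokenized:
    for word in tokens:
        if word not in vocabulary:
            vocabulary.append(word)

# Count word occurrences separately for each class.
word_counts = {}
for tokens, label in zip(tokenized, labels):
    word_counts.setdefault(label, Counter()).update(tokens)

print(vocabulary)
print(word_counts["spam"]["free"])  # 2
```

The per-class counts in `word_counts` are the raw material for the probability estimates that follow.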
Probability Computation
Next, compute the probability of each word under both classes.
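- Prior Probabilities: two of the three emails are spam and one is non-spam, so

$$P(\text{spam}) = \frac{2}{3}, \qquad P(\text{non-spam}) = \frac{1}{3}$$

- Likelihood Probabilities:

Using Laplace smoothing (to avoid zero-probability issues), compute conditional word probabilities for each class. For a word $w$ and a class $c$:

$$P(w \mid c) = \frac{\text{count}(w, c) + \alpha}{N_c + \alpha\,|V|}$$

where $\text{count}(w, c)$ is the number of times $w$ occurs in emails of class $c$, $N_c$ is the total word count in class $c$, $|V|$ is the vocabulary size, and $\alpha$ is the smoothing parameter.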
Take the word "free" as an example:
- Occurs in spam emails: 2 times
- Occurs in non-spam emails: 0 times
- Vocabulary size: 7
- Total word count in spam emails: 7 (3 words from Email 1 plus 4 words from Email 3). The spam emails contain only 4 distinct words ("free", "earn", "cash", "opportunity"), but multinomial Naive Bayes counts every occurrence, so the denominator uses 7, not 4.
- Total word count in non-spam emails: 3 (from Email 2)
- Smoothing parameter: α = 1

So for "free":

$$P(\text{free} \mid \text{spam}) = \frac{2 + 1}{7 + 7 \times 1} = \frac{3}{14}$$

$$P(\text{free} \mid \text{non-spam}) = \frac{0 + 1}{3 + 7 \times 1} = \frac{1}{10}$$

Other word probabilities follow similarly.
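The calculation for "free" can be checked with a short script. This sketch assumes the standard multinomial form of Laplace smoothing, in which the denominator is the total number of word occurrences in the class plus α times the vocabulary size. The helper name `smoothed_likelihood` is illustrative, and `Fraction` is used only to keep the results exact.

```python
from collections import Counter
from fractions import Fraction

emails = [
    "free earn cash",
    "important meeting time",
    "earn free cash opportunity",
]
labels = ["spam", "non-spam", "spam"]
alpha = 1  # Laplace smoothing parameter

vocabulary = sorted({word for email in emails for word in email.split()})

# Class priors: fraction of training emails in each class.
class_counts = Counter(labels)
priors = {cls: Fraction(n, len(labels)) for cls, n in class_counts.items()}

# Word occurrence counts per class.
word_counts = {cls: Counter() for cls in class_counts}
for email, label in zip(emails, labels):
    word_counts[label].update(email.split())

def smoothed_likelihood(word, cls):
    """(count(word, cls) + alpha) / (total words in cls + alpha * |V|)."""
    total_words = sum(word_counts[cls].values())
    return Fraction(word_counts[cls][word] + alpha,
                    total_words + alpha * len(vocabulary))

print(priors["spam"], priors["non-spam"])       # 2/3 1/3
print(smoothed_likelihood("free", "spam"))      # 3/14
print(smoothed_likelihood("free", "non-spam"))  # 1/10
```

Because of smoothing, "free" still gets a nonzero probability under non-spam even though it never appears there.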
Implementing the Naive Bayes Classifier
When learning the Naive Bayes classifier, first examine: class priors, word/feature likelihoods, smoothing method, posterior comparison logic, and limitations of the independence assumption.
We can implement a Naive Bayes classifier using Python’s scikit-learn. Below is an illustrative example:
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Training data
emails = [
    "free earn cash",
    "important meeting time",
    "earn free cash opportunity",
]
labels = ["spam", "non-spam", "spam"]

# Build pipeline: word counts -> multinomial Naive Bayes (default alpha=1.0)
model = make_pipeline(CountVectorizer(), MultinomialNB())

# Train the model
model.fit(emails, labels)

# Test email
test_email = ["important cash opportunity"]
print(model.predict(test_email))  # Predicted class; should be ['spam'] for this data
```
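To see where the prediction for the test email comes from, the posterior can also be worked out by hand in plain Python. This sketch assumes Laplace smoothing with α = 1 and priors estimated from class frequencies. It should agree with `MultinomialNB` at its default settings, but that correspondence is stated here as an assumption rather than as verified library output. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

train = [
    ("free earn cash", "spam"),
    ("important meeting time", "non-spam"),
    ("earn free cash opportunity", "spam"),
]
test_words = "important cash opportunity".split()
alpha = 1  # assumed smoothing parameter

vocab = {word for text, _ in train for word in text.split()}
doc_counts = Counter(label for _, label in train)
word_counts = {label: Counter() for label in doc_counts}
for text, label in train:
    word_counts[label].update(text.split())

# Unnormalized posterior: P(C) * prod_i P(x_i | C), with smoothed likelihoods.
unnormalized = {}
for label in doc_counts:
    score = Fraction(doc_counts[label], len(train))
    total_words = sum(word_counts[label].values())
    for word in test_words:
        score *= Fraction(word_counts[label][word] + alpha,
                          total_words + alpha * len(vocab))
    unnormalized[label] = score

# Normalize so the posteriors sum to 1.
evidence = sum(unnormalized.values())
posterior = {label: s / evidence for label, s in unnormalized.items()}
prediction = max(posterior, key=posterior.get)

print(prediction)                  # spam
print(float(posterior["spam"]))    # about 0.686
```

Although "important" appears only in the non-spam email, the higher spam prior and the smoothed counts for "cash" and "opportunity" tip the decision toward spam.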
Summary
In this tutorial, we introduced the core concepts and working principles of the Naive Bayes Classifier, and demonstrated its implementation through a concrete text classification example. In practice, the Naive Bayes classifier is widely adopted due to its simplicity, efficiency, and surprisingly strong performance—especially in high-dimensional sparse settings like text.
Next, we will explore how to evaluate and improve the trained model to ensure both accuracy and efficiency.