Model evaluation is itself a probabilistic problem: prediction scores, decision thresholds, and misclassification costs jointly determine whether a model is fit for use.
Performance metrics should therefore be interpreted together with business costs; relying on accuracy alone risks overlooking class imbalance and critical errors.
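To make the role of thresholds and costs concrete, here is a minimal sketch with made-up scores and labels. It assumes, purely for illustration, that a missed positive (false negative) costs ten times as much as a false alarm (false positive). On this toy data the threshold with the best accuracy (0.5) is not the one with the lowest total cost (0.25).

```python
import numpy as np

# Made-up positive-class scores and true labels (illustrative only)
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.30, 0.55, 0.70, 0.90])

# Hypothetical business costs: a missed positive is 10x worse than a false alarm
COST_FP = 1.0
COST_FN = 10.0


def total_cost(threshold):
    """Total misclassification cost when predicting positive for score >= threshold."""
    y_pred = (scores >= threshold).astype(int)
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return COST_FP * fp + COST_FN * fn


def accuracy(threshold):
    """Fraction of correct predictions at the given threshold."""
    y_pred = (scores >= threshold).astype(int)
    return np.mean(y_pred == y_true)


thresholds = [0.25, 0.5, 0.75]
for t in thresholds:
    print(f"threshold={t:.2f}  accuracy={accuracy(t):.2f}  cost={total_cost(t):.1f}")
```

In practice the costs come from the business context, and the threshold is tuned on validation data rather than on the test set.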
In the previous article, “Applied Case Analysis: Practical Data Analysis Examples,” we explored how to extract valuable insights through data cleaning and analysis. This article focuses on model evaluation and selection—a pivotal stage in building AI models.
Why Model Evaluation Is Essential
After constructing a predictive model, evaluating its performance is crucial: it helps us determine whether the model is effective and likely to deliver strong results in real-world applications. Model evaluation typically relies on quantitative metrics that enable comparison across candidate models and support selection of the optimal one.
Common Model Evaluation Metrics
Evaluation metrics vary by task type—classification or regression:
1. Classification Metrics
- Accuracy: The proportion of correctly classified samples out of all samples.
- Precision: Among all instances predicted as positive, the proportion that are truly positive.
- Recall (Sensitivity): Among all actual positive instances, the proportion correctly identified as positive.
- F1 Score: The harmonic mean of precision and recall.
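The four definitions can be checked by hand on a tiny, made-up binary example: count true/false positives and negatives, apply the formulas, and compare with scikit-learn's functions (which treat label 1 as the positive class by default for binary labels).

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Made-up binary labels (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Confusion-matrix counts
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)                   # correct / all
precision = tp / (tp + fp)                           # correct among predicted positives
recall = tp / (tp + fn)                              # found among actual positives
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"manual:  acc={accuracy:.2f} prec={precision:.2f} rec={recall:.2f} f1={f1:.2f}")
print(f"sklearn: acc={accuracy_score(y_true, y_pred):.2f} "
      f"prec={precision_score(y_true, y_pred):.2f} "
      f"rec={recall_score(y_true, y_pred):.2f} f1={f1_score(y_true, y_pred):.2f}")
```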
2. Regression Metrics
- Mean Squared Error (MSE): The average of squared differences between predictions and true values.
- Root Mean Squared Error (RMSE): The square root of MSE. It is expressed in the same units as the target variable, so it can be read as the typical magnitude of a prediction error.
- Coefficient of Determination (R²): The proportion of variance in the target variable explained by the model; closer to 1 indicates a better fit, and on held-out data it can even be negative when the model does worse than always predicting the mean.
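A minimal NumPy sketch of the three regression metrics on a few made-up values, cross-checked against scikit-learn's `mean_squared_error` and `r2_score`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

mse = np.mean((y_pred - y_true) ** 2)             # average squared error
rmse = np.sqrt(mse)                               # back in the target's units
ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot                          # share of variance explained

print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  R2={r2:.4f}")
print(f"sklearn: MSE={mean_squared_error(y_true, y_pred):.4f}  R2={r2_score(y_true, y_pred):.4f}")
```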
Model Selection Strategies
When analyzing a model evaluation and selection case, first examine the data-splitting strategy, the combination of metrics used, the cost of misclassification, model stability, interpretability requirements, and the rationale behind the final choice. With these in view, we select the best-performing model based on the evaluation results and practical requirements. Common approaches include:
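- Cross-Validation: Rather than relying on a single train/test split, divide the data into k folds, train on k-1 folds and validate on the remaining one, and rotate until every fold has served as the validation set. Averaging the scores gives a more reliable estimate of generalization performance and helps detect overfitting.
- AIC/BIC Criteria: Compare models of differing complexity using information criteria, which reward goodness of fit while penalizing the number of parameters; select the model with the lowest AIC or BIC value.
- Learning Curves: Plot training and validation scores (or losses) against increasing training-set size to diagnose high bias (underfitting) or high variance (overfitting).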
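The three strategies can be sketched with scikit-learn and NumPy. This is a minimal illustration under our own assumptions: the two candidate models, 5-fold stratified splitting, the `f1_weighted` scoring choice, the training-set sizes, and the synthetic quadratic data used for AIC/BIC are all illustrative. For AIC/BIC we use the common least-squares form with Gaussian errors, dropping constants that do not change the ranking.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, learning_curve

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# 1) Cross-validation: compare candidate models on the same folds
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=42),
}
cv_results = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=cv, scoring="f1_weighted")
    cv_results[name] = scores
    print(f"{name}: mean F1={scores.mean():.3f} (std {scores.std():.3f})")

# 2) Learning curve: training vs. validation accuracy as the training set grows
train_sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(random_state=42), X, y, cv=cv,
    train_sizes=[30, 60, 90, 120], scoring="accuracy",
    shuffle=True, random_state=42,  # iris is sorted by class, so shuffle subsets
)
for n_train, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    # A large, persistent gap suggests high variance; low scores on both suggest high bias
    print(f"n={n_train}: train={tr:.3f}, validation={va:.3f}")

# 3) AIC/BIC for least-squares polynomial fits with Gaussian errors (constants dropped):
#    AIC = n*ln(RSS/n) + 2k,  BIC = n*ln(RSS/n) + k*ln(n)
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y_reg = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.1, size=x.size)  # synthetic data
n = x.size
aic, bic = {}, {}
for degree in range(1, 6):
    coefs = np.polyfit(x, y_reg, degree)
    rss = np.sum((np.polyval(coefs, x) - y_reg) ** 2)
    k = degree + 1  # number of fitted coefficients
    aic[degree] = n * np.log(rss / n) + 2 * k
    bic[degree] = n * np.log(rss / n) + k * np.log(n)
best_aic = min(aic, key=aic.get)
best_bic = min(bic, key=bic.get)
print(f"lowest AIC: degree {best_aic}; lowest BIC: degree {best_bic}")
```

Counting the noise variance as an extra parameter would add the same constant to every k, so it does not change which degree wins. BIC's ln(n) penalty is heavier than AIC's 2 once n exceeds about 7, so BIC tends to favor simpler models.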
Practical Example
We implement a simple classification model using scikit-learn and evaluate it:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
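# Split into training and test sets (stratify=y keeps class proportions the same in both)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
# Build Random Forest classifier (random_state fixed so results are reproducible)
model = RandomForestClassifier(random_state=42)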
model.fit(X_train, y_train)
# Generate predictions
y_pred = model.predict(X_test)
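# Evaluate model; Iris has three classes, so precision/recall/F1 are averaged across classes, weighted by class frequency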
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')
print(f'Accuracy: {accuracy:.2f}')
print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print(f'F1 Score: {f1:.2f}')
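In this code snippet, we train a RandomForestClassifier and assess its performance with accuracy_score, precision_score, recall_score, and f1_score. Because Iris is a three-class problem, the last three metrics are computed per class and combined with average='weighted', which weights each class by its number of test samples. The resulting metrics estimate how well the model generalizes to unseen data; with only 30 test samples, a single split gives a fairly noisy estimate.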
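Aggregate scores can hide which classes are confused with each other. A per-class view with scikit-learn's `confusion_matrix` and `classification_report` makes this visible. The sketch re-creates a comparable split and model so it runs on its own; the `stratify` and `random_state` choices are our own assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall, F1, and support, plus macro/weighted averages
report = classification_report(y_test, y_pred, target_names=list(iris.target_names))
print(report)
```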
Summary
By applying appropriate evaluation metrics and selection strategies, we can rigorously assess and choose the best model for our specific dataset and task. Solid model evaluation and selection form the foundation for robust downstream learning and application. In the next article, we’ll explore curated resources and advanced techniques to further strengthen your AI knowledge and capabilities.