Naive Bayes Example

This is a simple example of Naive Bayes using Python and scikit-learn.

Naive Bayes Overview

Naive Bayes is a probabilistic machine learning algorithm commonly used for classification tasks. It is based on Bayes' theorem, which describes the probability of an event based on prior knowledge of conditions that might be related to the event. The "naive" assumption in Naive Bayes is that features are conditionally independent given the class label, which simplifies the modeling process.
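
To make the formula concrete, here is a minimal sketch (using made-up prior and likelihood values, not data from the example below) of how Bayes' theorem combines with the independence assumption to score two candidate classes:

# Minimal sketch of the naive Bayes decision rule with made-up probabilities.
# Bayes' theorem: P(class | features) is proportional to
#   P(class) * P(feature_1 | class) * P(feature_2 | class) * ...
# and the product form is exactly the "naive" independence assumption.

priors = {'spam': 0.4, 'ham': 0.6}           # P(class), assumed values
likelihoods = {                              # P(feature | class), assumed values
    'spam': {'contains_offer': 0.7, 'has_link': 0.8},
    'ham':  {'contains_offer': 0.1, 'has_link': 0.3},
}

# Score each class for a message that contains both features
scores = {}
for label in priors:
    score = priors[label]
    for feature in ('contains_offer', 'has_link'):
        score *= likelihoods[label][feature]  # independence lets per-feature probabilities multiply
    scores[label] = score

# Normalize so the posteriors sum to 1 (dividing by the evidence term)
total = sum(scores.values())
posteriors = {label: s / total for label, s in scores.items()}
print(posteriors)  # the class with the highest posterior is the prediction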

Key concepts of Naive Bayes:

- Bayes' theorem: the posterior probability of a class is proportional to the class prior multiplied by the likelihood of the observed features.
- Conditional independence: features are assumed to be independent of one another given the class label, so the joint likelihood factorizes into a product of per-feature terms.
- Efficiency: training and prediction are computationally cheap, which makes Naive Bayes a strong baseline in practice, especially for text classification and spam filtering (see the sketch after this list).
- Limitations: the model is sensitive to the independence assumption and may not perform well when features are strongly correlated.
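
Because text classification is the canonical use case, the following is a minimal sketch of a toy spam filter. The four example messages are made up, and it uses scikit-learn's CountVectorizer with MultinomialNB rather than the GaussianNB model from the main example below:

# Minimal text-classification sketch (toy data, separate from the main example)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "win a free prize now",           # spam
    "limited offer click here",       # spam
    "meeting rescheduled to monday",  # ham
    "lunch with the team tomorrow",   # ham
]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()             # bag-of-words counts
X_counts = vectorizer.fit_transform(texts)

clf = MultinomialNB()                      # multinomial variant suits word-count features
clf.fit(X_counts, labels)

new_message = ["free prize meeting"]
print(clf.predict(vectorizer.transform(new_message)))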

Python Source Code:

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Generate synthetic data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = (4 + 3 * X + np.random.randn(100, 1)) > 6  # Threshold a noisy linear signal to create binary class labels

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Naive Bayes model
model = GaussianNB()
model.fit(X_train, y_train.ravel())

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Plot the results (for binary classification)
plt.scatter(X_test, y_test, color='black', label='Actual')
plt.scatter(X_test, y_pred, color='red', marker='x', label='Predicted')
plt.title('Naive Bayes Example')
plt.xlabel('X')
plt.ylabel('Class (0 or 1)')
plt.legend()
plt.show()
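
As an optional follow-up, the fitted model from the script above can also report class probabilities. The snippet below assumes that script has already been run (so `model` and `X_test` exist); note that the learned-variance attribute name differs across scikit-learn versions:

# Optional: inspect predicted probabilities and learned parameters
# (assumes the script above has run, so `model` and `X_test` are defined)
proba = model.predict_proba(X_test)   # P(class | x) for each test point
print(model.classes_)                 # class labels, giving the column order of proba
print(proba[:5])                      # posterior probabilities for the first five test points

print(model.theta_)                   # per-class feature means learned by GaussianNB
# Per-class variances: model.var_ in recent scikit-learn releases (model.sigma_ in older ones)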

Explanation: