Isolation Forest Example

This is a simple example of Isolation Forest using Python and scikit-learn.

Isolation Forest Overview

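Isolation Forest is an anomaly detection algorithm that isolates outliers directly instead of modeling what normal data looks like. It builds an ensemble of randomized trees (isolation trees), where each tree is grown by recursively partitioning the data on a randomly chosen feature and a randomly chosen split value. Anomalies are expected to be easier to isolate, so on average they require fewer splits to separate from the rest of the data. The algorithm converts this average path length into an anomaly score. In the original formulation, points with higher scores are more likely to be outliers; scikit-learn's decision_function, used in the code below, flips the sign convention, so negative values indicate outliers and positive values indicate inliers.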
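
To make the "fewer splits" idea concrete, here is a minimal sketch that counts how many random axis-aligned splits it takes to isolate a single point. The helper isolation_depth and the toy data are illustrative assumptions, not part of scikit-learn, and the sketch leaves out the subsampling and tree-height limit that a real Isolation Forest uses.

```python
import numpy as np

def isolation_depth(X, point, rng, max_depth=50):
    """Count random splits needed to isolate `point` (a row of X).

    Hypothetical helper for illustration only; not a scikit-learn API.
    """
    subset = X
    depth = 0
    while len(subset) > 1 and depth < max_depth:
        # Pick a random feature and a random split value within its range
        feature = rng.integers(X.shape[1])
        lo, hi = subset[:, feature].min(), subset[:, feature].max()
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that contains the point
        if point[feature] < split:
            subset = subset[subset[:, feature] < split]
        else:
            subset = subset[subset[:, feature] >= split]
        depth += 1
    return depth

rng = np.random.default_rng(0)
normal = rng.normal(size=(200, 2))       # one dense cluster around the origin
outlier = np.array([8.0, 8.0])           # a point far from the cluster
X_demo = np.vstack([normal, outlier])

# Use the cluster point closest to the origin as a typical inlier
inlier = normal[np.argmin(np.linalg.norm(normal, axis=1))]

n_trials = 200
mean_outlier_depth = np.mean([isolation_depth(X_demo, outlier, rng) for _ in range(n_trials)])
mean_inlier_depth = np.mean([isolation_depth(X_demo, inlier, rng) for _ in range(n_trials)])

print(f"mean splits to isolate the outlier: {mean_outlier_depth:.1f}")
print(f"mean splits to isolate the inlier:  {mean_inlier_depth:.1f}")
```

Averaged over many random trials, the distant point should be isolated in noticeably fewer splits than the point in the middle of the cluster, which is the signal the anomaly score is built on.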

Key concepts of Isolation Forest:

Isolation Forest is particularly useful for detecting anomalies in datasets where anomalies are rare and have different characteristics than normal instances.

Python Source Code:

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest

# Generate synthetic data with two clusters and outliers
np.random.seed(42)
X, _ = make_blobs(n_samples=300, centers=2, random_state=42)
outliers = np.array([[10, 10]])

# Add outliers to the data
X = np.concatenate([X, outliers])

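# Build an Isolation Forest model
# contamination=0.03 sets the expected fraction of outliers, so roughly 3% of
# the points will be flagged, not only the single point added above
iso_forest = IsolationForest(contamination=0.03, random_state=42)
iso_forest.fit(X)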

# Compute anomaly scores for the data
# (decision_function: negative values indicate outliers, positive values inliers)
anomaly_scores = iso_forest.decision_function(X)

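# Plot the results
# The added outliers are the last rows of X, so exclude them from the "normal" scatter
normal_points = X[:-len(outliers)]
plt.scatter(normal_points[:, 0], normal_points[:, 1], c='blue', marker='o', edgecolors='black', label='Normal Instances')
plt.scatter(outliers[:, 0], outliers[:, 1], c='red', marker='x', label='Outliers')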
plt.title('Isolation Forest Example')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()

# Plot decision boundary based on anomaly scores
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 100),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 100))
Z = iso_forest.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='black', linestyles='dashed')
plt.show()
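
The script above computes scores but never turns them into labels. As a small, self-contained sketch (it rebuilds the same data and model rather than reusing the variables above), predict returns +1 for inliers and -1 for outliers, and those labels correspond to the sign of decision_function. The variable names here are chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest

# Rebuild the same synthetic data: two clusters plus one hand-placed point
X, _ = make_blobs(n_samples=300, centers=2, random_state=42)
X = np.concatenate([X, np.array([[10.0, 10.0]])])

iso_forest = IsolationForest(contamination=0.03, random_state=42).fit(X)

labels = iso_forest.predict(X)             # +1 = inlier, -1 = outlier
scores = iso_forest.decision_function(X)   # negative = outlier

n_flagged = int((labels == -1).sum())
print(f"{n_flagged} of {len(X)} points flagged as outliers")

# Show the most anomalous points (lowest scores first)
most_anomalous = np.argsort(scores)[:5]
for idx in most_anomalous:
    print(idx, X[idx].round(2), round(float(scores[idx]), 3))
```

Because contamination fixes the fraction of points placed below the threshold, the number of flagged points depends on that setting as much as on the data, so it is worth inspecting the lowest-scoring points rather than relying on the count alone.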

Explanation: