Stacking Example

This is a simple example of stacking using Python and the scikit-learn library.

Stacking Overview

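Stacking is an ensemble learning technique that combines the predictions of multiple base models using a second-level model, called the meta-model (also referred to as a blender or meta-classifier). Several diverse base models are trained on the training data, and the meta-model is then trained to produce the final prediction from the base models' outputs. So that the meta-model does not learn from predictions the base models made on rows they were fitted on, those outputs are usually generated with cross-validation (out-of-fold predictions). Stacking can improve predictive performance when the base models make different kinds of errors, because the meta-model learns how much weight to give each one.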
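The following sketch illustrates why stacking implementations typically train the meta-model on out-of-fold predictions rather than on predictions for rows the base models were fitted on. It assumes the same synthetic data setup as the main example and compares a random forest's in-sample training accuracy with its 5-fold out-of-fold accuracy. Exact numbers will vary, but in-sample accuracy is typically much more optimistic, which would lead a meta-model to over-trust that base model.

```python
# Sketch: in-sample vs. out-of-fold predictions for one base model.
# Assumes the same synthetic data and split as the main example.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_clusters_per_class=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=50, random_state=42)

# In-sample: predict the same rows the forest was fitted on
rf.fit(X_train, y_train)
in_sample_acc = accuracy_score(y_train, rf.predict(X_train))

# Out-of-fold: each row is predicted by a clone that never saw it (5 folds)
oof_pred = cross_val_predict(rf, X_train, y_train, cv=5)
oof_acc = accuracy_score(y_train, oof_pred)

print(f'In-sample accuracy:   {in_sample_acc:.2f}')
print(f'Out-of-fold accuracy: {oof_acc:.2f}')
```

The out-of-fold figure is the more honest estimate of how the forest behaves on unseen rows, and that is the kind of signal the meta-model should learn from.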

Key concepts of stacking:

Base Models: Individual models trained on the training data.
Meta-Model: A higher-level model that combines the predictions of base models.
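Diversity: Choosing base models that learn in different ways (for example, bagged trees versus boosted trees) so that their errors are not strongly correlated.
Training and Prediction: Training the base models, collecting their out-of-fold predictions via cross-validation, training the meta-model on those predictions, and then, at prediction time, passing the base models' outputs on new data to the meta-model.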
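The training and prediction steps listed above can be sketched by hand with `cross_val_predict`. This is an illustrative reimplementation, not scikit-learn's internal code; it assumes the same data and base models as the main example and uses each base model's positive-class probability as its meta-feature, which should closely mirror what `StackingClassifier` does for binary problems. It also prints the correlation between the two base models' out-of-fold probabilities as a rough diversity check.

```python
# Hand-rolled stacking sketch (illustrative, not scikit-learn's internals).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_clusters_per_class=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = [
    RandomForestClassifier(n_estimators=50, random_state=42),
    GradientBoostingClassifier(n_estimators=50, random_state=42),
]

# Step 1: out-of-fold positive-class probabilities, one column per base model
meta_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method='predict_proba')[:, 1]
    for m in base_models
])

# Diversity check: highly correlated columns add little new information
corr = np.corrcoef(meta_train, rowvar=False)[0, 1]
print(f'Correlation of base-model probabilities: {corr:.2f}')

# Step 2: fit the meta-model on the out-of-fold predictions
meta_model = LogisticRegression().fit(meta_train, y_train)

# Step 3: refit each base model on the full training set
for m in base_models:
    m.fit(X_train, y_train)

# Step 4: on new data, base-model outputs become the meta-model's inputs
meta_test = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models])
y_pred = meta_model.predict(meta_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of hand-rolled stack: {accuracy:.2f}')
```

Only one probability column per model is kept because, for two classes, the two columns sum to 1 and carry the same information.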

Python Source Code:

# Import necessary libraries
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Generate synthetic data for classification
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_clusters_per_class=2, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base models
base_models = [
    ('rf', RandomForestClassifier(n_estimators=50, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=50, random_state=42))
]

# Define the meta-model
meta_model = LogisticRegression()

# Create the stacking classifier (by default, the meta-model is trained on
# 5-fold cross-validated predictions of the base models)
stacking_classifier = StackingClassifier(estimators=base_models, final_estimator=meta_model)

# Train the stacking classifier on the training data
stacking_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = stacking_classifier.predict(X_test)

# Evaluate the performance of the stacking classifier
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of Stacking Classifier: {accuracy:.2f}')
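A single accuracy number says little on its own. One way to judge whether stacking helps is to score each base model on the same test split; the sketch below assumes the same data, models, and split as the code above. With a single 80/20 split, small differences may be noise, so a cross-validated comparison (for example with `cross_val_score`) would be more reliable.

```python
# Compare each base model with the stacking classifier on the same split.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_clusters_per_class=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = [
    ('rf', RandomForestClassifier(n_estimators=50, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=50, random_state=42)),
]
# StackingClassifier clones its estimators, so fitting the originals below
# does not affect it
stacking_classifier = StackingClassifier(estimators=base_models,
                                         final_estimator=LogisticRegression())

scores = {}
for name, model in base_models + [('stack', stacking_classifier)]:
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f'{name:>5}: {scores[name]:.3f}')
```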

Explanation:

Import Libraries: Import necessary Python libraries, including scikit-learn for ensemble learning.
Generate Synthetic Data: Generate a synthetic binary classification dataset with 1,000 samples and 20 features, 15 of them informative.
Split Data: Split the data into training and testing sets, holding out 20% for testing.
Define Base Models: Define a list of base models (Random Forest and Gradient Boosting).
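Define Meta-Model: Define the meta-model (Logistic Regression) that learns how to combine the base models' predictions.
Create Stacking Classifier: Create a StackingClassifier from the base models and meta-model. By default, it trains the meta-model on 5-fold cross-validated predictions of the base models and refits the base models on the full training set.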
Train and Predict: Train the stacking classifier on the training data and make predictions on the test set.
Evaluate Performance: Measure the accuracy of the stacking classifier's predictions on the held-out test set.
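To see what the meta-model actually receives and how it weights each base model, a fitted `StackingClassifier` exposes `transform` (the stacked base-model outputs) and `final_estimator_` (the fitted meta-model). The sketch below assumes the same setup as the main example; it sets `cv` and `stack_method` explicitly only for visibility, and it also shows the `passthrough=True` option, which appends the original features to the meta-model's inputs.

```python
# Inspect the meta-features and the meta-model's learned weights.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_clusters_per_class=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = [
    ('rf', RandomForestClassifier(n_estimators=50, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=50, random_state=42)),
]
stacking_classifier = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=5,                          # 5-fold CV, same as the default behaviour
    stack_method='predict_proba',  # what 'auto' picks for these two models
)
stacking_classifier.fit(X_train, y_train)

# Binary task: one probability column per base model
meta_features = stacking_classifier.transform(X_test)
print(meta_features.shape)  # (200, 2)

# Meta-model coefficient on each base model's output
coefs = stacking_classifier.final_estimator_.coef_.ravel()
for (name, _), w in zip(base_models, coefs):
    print(f'{name}: {w:.2f}')

# passthrough=True: meta-model sees base-model outputs plus the 20 raw features
stack_pt = StackingClassifier(estimators=base_models,
                              final_estimator=LogisticRegression(max_iter=1000),
                              passthrough=True)
stack_pt.fit(X_train, y_train)
print(stack_pt.transform(X_test).shape)  # (200, 22)
```

The larger `max_iter` for the passthrough variant is a precaution, since the meta-model then has 22 inputs instead of 2.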