This is a simple example of stacking using Python and the scikit-learn library.
Stacking is an ensemble learning technique that trains several diverse base models on the training data and then trains a meta-model (often called a blender or meta-classifier) on the outputs of those base models. The meta-model learns how to weight and combine the base predictions, so the final prediction comes from the meta-model rather than from a simple vote or average. Stacking can improve predictive performance when the base models make different kinds of errors, because the meta-model can draw on the strengths of each.
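To make the mechanics concrete, here is a minimal sketch of stacking done by hand, assuming two tree-based base models and a logistic-regression meta-model on a small synthetic dataset. The names `meta_train`, `meta_test`, and `manual_pred`, as well as the dataset size, are illustrative assumptions rather than part of any library API. Out-of-fold predictions from `cross_val_predict` are used so that the meta-model is not trained on predictions a base model made for rows it had already seen.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Small synthetic binary problem (sizes chosen only for illustration)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

base_models = [
    RandomForestClassifier(n_estimators=20, random_state=0),
    GradientBoostingClassifier(n_estimators=20, random_state=0),
]

# Step 1: out-of-fold class-1 probabilities become the meta-model's training features
meta_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Step 2: refit each base model on the full training set, then build test features
for m in base_models:
    m.fit(X_train, y_train)
meta_test = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models])

# Step 3: the meta-model learns how to combine the base predictions
meta_model = LogisticRegression()
meta_model.fit(meta_train, y_train)
manual_pred = meta_model.predict(meta_test)
```

This is roughly the procedure that a ready-made stacking estimator automates; the hand-written version is only meant to show where the meta-model's inputs come from.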
Key concepts of stacking:
Python Source Code:
# Import necessary libraries
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
# Generate synthetic data for classification
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_clusters_per_class=2, random_state=42)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define base models
base_models = [
    ('rf', RandomForestClassifier(n_estimators=50, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=50, random_state=42))
]
# Define the meta-model
meta_model = LogisticRegression()
# Create the stacking classifier
stacking_classifier = StackingClassifier(estimators=base_models, final_estimator=meta_model)
# Train the stacking classifier on the training data
stacking_classifier.fit(X_train, y_train)
# Make predictions on the test set
y_pred = stacking_classifier.predict(X_test)
# Evaluate the performance of the stacking classifier
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of Stacking Classifier: {accuracy:.2f}')
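A single accuracy figure is easier to interpret next to the scores of the base models themselves. The sketch below is one possible way to make that comparison, assuming the same synthetic data and models as above. It rebuilds the setup so that it runs on its own, then reads the fitted base models from the classifier's `named_estimators_` attribute. The `scores` dictionary and the explicit `cv=5` setting are illustrative choices; whether the stack beats its base models depends on the data, so no expected output is shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same synthetic data and split as in the main example
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_clusters_per_class=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

stack = StackingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(n_estimators=50, random_state=42)),
        ('gb', GradientBoostingClassifier(n_estimators=50, random_state=42)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,  # folds used to build the meta-model's training features
)
stack.fit(X_train, y_train)

# After fitting, named_estimators_ holds the base models refit on all of X_train
scores = {name: est.score(X_test, y_test)
          for name, est in stack.named_estimators_.items()}
scores['stack'] = stack.score(X_test, y_test)

for name, acc in scores.items():
    print(f'{name}: {acc:.3f}')
```

If the stacked score is not clearly higher than the best base model, the base models may be too similar to each other for the meta-model to gain much from combining them.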
Explanation: