XGBoost is an open-source library that implements the gradient boosting algorithm, a powerful ensemble learning technique. It was developed to optimize performance and computational efficiency, making it one of the most popular choices for structured/tabular data problems. XGBoost can be used for both classification and regression tasks and has become a standard tool in many machine learning workflows.
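To make the idea concrete, here is a toy sketch of gradient boosting for squared error, not XGBoost's actual implementation (which adds regularization, second-order gradients, and many systems-level optimizations): starting from a constant prediction, each new tree is fit to the residuals of the current ensemble, and its output is added in scaled by a learning rate.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def toy_gradient_boost(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    # Start the ensemble from a constant prediction: the mean of the targets.
    base = y.mean()
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is simply the residual.
        residuals = y - prediction
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        # Shrink each tree's contribution by the learning rate.
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return base, trees

def toy_predict(base, trees, X, learning_rate=0.1):
    # The ensemble prediction is the base value plus the scaled sum of tree outputs.
    return base + learning_rate * sum(tree.predict(X) for tree in trees)

The example below applies XGBoost itself to a real regression task.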
import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Load the California Housing dataset
# (load_boston was removed from scikit-learn in version 1.2)
housing = fetch_california_housing()
X, y = housing.data, housing.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Convert the data to DMatrix format, a specialized data structure used by XGBoost
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
# Specify XGBoost parameters
params = {
    'objective': 'reg:squarederror',
    'max_depth': 3,
    'learning_rate': 0.1
}
# Train the XGBoost model for 100 boosting rounds
# (with the native API the round count is passed as num_boost_round;
# n_estimators belongs to the scikit-learn wrapper, not this params dict)
model = xgb.train(params, dtrain, num_boost_round=100)
# Make predictions on the test set
predictions = model.predict(dtest)
# Evaluate the model
mse = mean_squared_error(y_test, predictions)
print(f'Mean Squared Error on Test Set: {mse}')
This example demonstrates an end-to-end XGBoost regression workflow on the California Housing dataset: the data is loaded and split, converted to DMatrix format, used to train a boosted-tree model, and the model is evaluated with mean squared error on the held-out test set.
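If you prefer scikit-learn's estimator interface, XGBoost also ships a compatible wrapper where n_estimators (the number of boosting rounds) is a constructor argument rather than a training-call argument. Here is a minimal sketch of the same workflow, reusing the train/test split from the example above:

from xgboost import XGBRegressor

# Equivalent model via XGBoost's scikit-learn-compatible API.
sk_model = XGBRegressor(
    objective='reg:squarederror',
    max_depth=3,
    learning_rate=0.1,
    n_estimators=100  # number of boosting rounds, set here rather than at train time
)
sk_model.fit(X_train, y_train)
sk_predictions = sk_model.predict(X_test)
print(f'Mean Squared Error (sklearn API): {mean_squared_error(y_test, sk_predictions)}')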
Feel free to run this code in a Python environment with XGBoost and scikit-learn installed, and to explore XGBoost's other capabilities from there.
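One such capability is early stopping. The sketch below reuses params, dtrain, and dtest from the example above and stops training once the evaluation error has not improved for 10 consecutive rounds; the round counts are illustrative choices, and in a real project you would monitor a separate validation set rather than the test set.

# Minimal early-stopping sketch with the native API.
evals = [(dtrain, 'train'), (dtest, 'eval')]
es_model = xgb.train(
    params,
    dtrain,
    num_boost_round=500,        # upper bound on boosting rounds
    evals=evals,                # sets evaluated after each round
    early_stopping_rounds=10    # stop if the 'eval' error stalls for 10 rounds
)
print(f'Best iteration: {es_model.best_iteration}')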
To install XGBoost, you can use the following command:
pip install xgboost