You can install scikit-learn using pip:

```shell
pip install scikit-learn
```
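One quick way to confirm the installation worked is to import the package and print its version. This is a minimal check; the version string you see depends on what pip installed.

```python
# Confirm scikit-learn is importable. The import name is "sklearn",
# even though the pip package is called "scikit-learn".
import sklearn

print(sklearn.__version__)
```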
Import the relevant modules from `sklearn` (the package is installed as scikit-learn but imported as sklearn):

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```
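scikit-learn works with data represented as NumPy arrays or SciPy sparse matrices; other array-like inputs such as pandas DataFrames are also accepted. The features must form a two-dimensional array of shape (n_samples, n_features), with one row per sample and one column per feature. The target variable should be a one-dimensional array of shape (n_samples,), holding one label or value per sample.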
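The shapes described above can be illustrated with a small, made-up dataset. The values below are hypothetical toy data; the names `features` and `target` simply match the variables used in the next step. The last part shows the common fix when you have only one feature stored as a 1D array.

```python
import numpy as np

# Hypothetical toy data: 4 samples, each described by 2 features.
features = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
])
# One label per sample, so the target is 1D with length n_samples.
target = np.array([0, 0, 1, 1])

print(features.shape)  # (4, 2) -> (n_samples, n_features)
print(target.shape)    # (4,)   -> (n_samples,)

# A single feature stored as a 1D array must be reshaped into one column,
# because estimators expect 2D feature input.
single_feature = np.array([1.0, 2.0, 3.0, 4.0])
X_single = single_feature.reshape(-1, 1)
print(X_single.shape)  # (4, 1)
```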
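Split your data into training and testing sets. Here `features` and `target` stand for your own 2D feature array and 1D target array; `test_size=0.2` holds out 20% of the samples for testing, and `random_state=42` fixes the shuffle so the split is reproducible:

```python
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)
```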
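To see what `test_size=0.2` does in practice, here is a sketch on a synthetic array of 100 samples (the data is generated purely for illustration): 20 samples go to the test set and 80 to the training set, and repeating the call with the same `random_state` yields the identical split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples with 3 features each, plus a binary label.
features = np.arange(300).reshape(100, 3)
target = np.array([0, 1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
print(y_train.shape, y_test.shape)  # (80,) (20,)

# The same random_state produces the same split on every run.
_, X_test_again, _, _ = train_test_split(
    features, target, test_size=0.2, random_state=42
)
print(np.array_equal(X_test, X_test_again))  # True
```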
Choose a machine learning model and train it on the training data:
```python
model = LogisticRegression()
model.fit(X_train, y_train)
```
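After `fit`, the model stores what it learned in attributes whose names end with an underscore. The sketch below assumes synthetic binary data from `make_classification` in place of your own `features` and `target`, and inspects the learned class labels and weights of the logistic regression.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for your own data.
features, target = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)  # fit() also returns the fitted estimator

# Learned attributes end with "_" and exist only after fitting.
print(model.classes_)          # [0 1]
print(model.coef_.shape)       # (1, 4): one weight per feature (binary case)
print(model.intercept_.shape)  # (1,)
```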
Use the trained model to make predictions on data it did not see during training, here the held-out test set:

```python
predictions = model.predict(X_test)
```
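Besides hard labels from `predict`, classifiers such as `LogisticRegression` also offer `predict_proba`, which returns one probability per class for each sample. This sketch again assumes synthetic data from `make_classification`; with 200 samples and `test_size=0.2`, the test set has 40 rows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data and a fitted model, as in the training step.
features, target = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)
model = LogisticRegression().fit(X_train, y_train)

# One predicted label per test sample.
predictions = model.predict(X_test)
print(predictions.shape)  # (40,)

# One probability per class per sample; each row sums to 1.
probabilities = model.predict_proba(X_test)
print(probabilities.shape)  # (40, 2)
print(np.allclose(probabilities.sum(axis=1), 1.0))  # True
```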
Evaluate the model by comparing its predictions with the true test labels, for example using accuracy (the fraction of predictions that are correct):

```python
accuracy = accuracy_score(y_test, predictions)
```
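As a worked example of the arithmetic, take four hypothetical labels where three of the four predictions match: accuracy is 3 / 4 = 0.75, which is exactly the mean of the element-wise equality check.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels: predictions match the truth at positions 0, 1 and 3.
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])

print(accuracy_score(y_true, y_pred))  # 0.75
print(np.mean(y_true == y_pred))       # 0.75, the same computation by hand
```

For scikit-learn classifiers, `model.score(X_test, y_test)` returns this same mean accuracy in a single call.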
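Beyond classification, scikit-learn provides a wide range of machine learning algorithms (for regression, clustering, and dimensionality reduction, among others), together with tools for model selection, feature extraction, and preprocessing.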
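The preprocessing and model-selection tools are designed to plug into the same estimator interface. As one illustrative combination (the choice of `StandardScaler`, `LogisticRegression`, and 5 folds is an assumption, not the only option), a pipeline can chain scaling with a classifier and be evaluated with cross-validation on the iris data used in the example that follows.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Preprocessing step (standardize each feature) chained with a classifier.
# Inside cross-validation the scaler is fit on the training folds only.
pipeline = make_pipeline(StandardScaler(), LogisticRegression())

# Model selection: 5-fold cross-validation returns one accuracy per fold.
scores = cross_val_score(pipeline, X, y, cv=5)
print(scores.shape)  # (5,)
print(scores.mean())
```

Wrapping the scaler in the pipeline, rather than scaling the whole dataset up front, keeps test-fold statistics out of the preprocessing step.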
Refer to the scikit-learn documentation for detailed information and examples.
Here's a simple example demonstrating the use of scikit-learn for a basic classification task:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a k-nearest neighbors classifier
knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier
knn.fit(X_train, y_train)

# Make predictions on the test set
predictions = knn.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")
```
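Once trained, the same classifier can label a flower that is not in the dataset. In this sketch the measurements are hypothetical values typical of the setosa species; `iris.target_names` maps the predicted integer label back to a species name. Note that `predict` expects 2D input even for a single sample.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Rebuild the classifier from the example above so this block runs on its own.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Hypothetical new flower: sepal length, sepal width, petal length, petal width (cm).
new_flower = [[5.0, 3.4, 1.5, 0.2]]  # nested list = one row of a 2D input

label = knn.predict(new_flower)[0]
print(iris.target_names[label])  # setosa
```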