Principal Component Analysis (PCA) Example

This is a simple example of Principal Component Analysis (PCA) using Python and scikit-learn.

Principal Component Analysis Overview

Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional representation while retaining as much of the original variance as possible. It does this by projecting the data onto its principal components: orthogonal axes in the feature space that point along the directions of maximum variance.
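
To make the geometry concrete, here is a minimal NumPy sketch of the classical eigendecomposition view of PCA (illustrative only; scikit-learn's PCA uses an SVD internally, and the toy data here is made up):

import numpy as np

# Toy data: 100 samples with 3 made-up features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# 1. Center the data: PCA operates on mean-centered features
X_centered = X - X.mean(axis=0)

# 2. Eigendecompose the covariance matrix (symmetric, so use eigh)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# 3. Sort the axes by descending eigenvalue (variance along each axis)
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]

# 4. Project onto the top two principal components
X_reduced = X_centered @ components[:, :2]
print(X_reduced.shape)  # (100, 2)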

Key concepts of Principal Component Analysis:

- Variance maximization: the first principal component points in the direction along which the data varies most; each subsequent component captures the largest remaining variance.
- Orthogonality: the components are mutually perpendicular, so each one captures variance not already explained by the others.
- Eigendecomposition: the components are the eigenvectors of the data's covariance matrix (equivalently, they can be obtained from an SVD of the centered data).
- Explained variance ratio: the fraction of the total variance captured by each component, used to decide how many components to keep.

PCA is widely used for visualization, noise reduction, and speeding up machine learning algorithms by reducing the number of features while preserving the most important information in the data.
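
The explained variance ratio makes "preserving the most important information" measurable. A small sketch, assuming a hypothetical 10-feature dataset with low-rank structure (both the data and the 95% threshold are made up for illustration):

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 200 samples, 10 features, only ~3 underlying directions
rng = np.random.default_rng(42)
X_wide = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))
X_wide += 0.05 * rng.normal(size=(200, 10))  # small added noise

pca = PCA().fit(X_wide)  # keep all components to inspect the spectrum
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_keep = int(np.searchsorted(cumulative, 0.95)) + 1
print(f"{n_keep} components retain at least 95% of the variance")

scikit-learn can also do this in one step: PCA(n_components=0.95) keeps just enough components to reach that fraction of the variance.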

Python Source Code:

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Generate synthetic 3-D data (random_state makes the blobs reproducible)
X, _ = make_blobs(n_samples=100, n_features=3, centers=3, random_state=42)

# Apply PCA to reduce the data from three dimensions to two
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Plot the original and PCA-transformed data
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1], c='blue', marker='o', label='Original Data')
plt.title('Original Data (first two of three features)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()

plt.subplot(1, 2, 2)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c='red', marker='o', label='PCA-Transformed Data')
plt.title('PCA-Transformed Data')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()

plt.tight_layout()
plt.show()
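
Once fitted, the PCA object exposes what it learned. A short follow-up, reusing the pca object from the script above:

# Each row of components_ is one principal axis expressed in the
# original 3-D feature space; explained_variance_ratio_ reports the
# fraction of the total variance captured along each axis.
print(pca.components_)                # shape (2, 3)
print(pca.explained_variance_ratio_)  # two fractions summing to <= 1.0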

Explanation: