Intermediate to Advanced NumPy Questions for Data Pipelines

1. How do you create a NumPy array from a list and apply a mathematical operation to all elements?

Answer: Use np.array() to create an array from the list. Arithmetic operators such as + and * then apply element-wise to the whole array, with no explicit loop.

import numpy as np
data = [1, 2, 3, 4]
arr = np.array(data)
result = arr + 10
print(result)
# Output: [11 12 13 14]

2. How do you filter elements in a NumPy array using a condition?

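Answer: Use boolean indexing. A comparison such as arr > 5 produces a boolean mask, and indexing with that mask keeps only the elements where the mask is True. For example, to get elements greater than 5: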

arr = np.array([2, 4, 6, 8, 10])
filtered = arr[arr > 5]
print(filtered)
# Output: [ 6  8 10]

3. How can you create an identity matrix using NumPy?

Answer: Use np.eye() to create an identity matrix.

identity_matrix = np.eye(4)
print(identity_matrix)
# Output:
# [[1. 0. 0. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]]

4. How can you reshape a NumPy array?

Answer: Use reshape() to change the shape of an array. The new shape must hold the same total number of elements as the original.

arr = np.array([1, 2, 3, 4, 5, 6])
reshaped = arr.reshape((2, 3))
print(reshaped)
# Output:
# [[1 2 3]
#  [4 5 6]]
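When only one dimension is known in advance, reshape() accepts -1 for the other and infers it from the array size. The sketch below is a small illustration; the variable names are placeholders and are not part of the original answer.

```python
import numpy as np

arr = np.arange(1, 7)          # [1 2 3 4 5 6]

# -1 asks NumPy to infer that dimension: 6 elements / 3 rows = 2 columns.
inferred = arr.reshape(3, -1)
print(inferred.shape)          # (3, 2)

# A shape that does not match the element count raises ValueError.
try:
    arr.reshape(4, 2)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)         # True
```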

5. How do you stack two arrays vertically using NumPy?

Answer: Use np.vstack() to stack arrays vertically.

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
stacked = np.vstack((arr1, arr2))
print(stacked)
# Output:
# [[1 2 3]
#  [4 5 6]]

6. How do you find the mean across a specific axis in a 2D array?

Answer: Use np.mean() with the axis parameter. axis=0 averages down the rows and returns one mean per column; axis=1 averages across the columns and returns one mean per row.

arr = np.array([[1, 2, 3], [4, 5, 6]])
mean_col = np.mean(arr, axis=0)
mean_row = np.mean(arr, axis=1)
print(mean_col)  # Output: [2.5 3.5 4.5]
print(mean_row)  # Output: [2. 5.]

7. How do you generate random numbers from a normal distribution in NumPy?

Answer: Use np.random.normal() to generate random numbers from a normal (Gaussian) distribution.

random_numbers = np.random.normal(loc=0, scale=1, size=5)
print(random_numbers)
# Example output (values differ on every run): [-0.245 0.343 -1.098 1.567 -0.657]
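In a data pipeline it often helps to make random draws reproducible. One way to do that, sketched here, is the np.random.default_rng() Generator interface with a fixed seed. The seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

# Seed 42 is arbitrary; any fixed integer gives a repeatable stream.
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=5)

# A second generator built from the same seed reproduces the same draws.
rng_again = np.random.default_rng(seed=42)
samples_again = rng_again.normal(loc=0.0, scale=1.0, size=5)

print(np.array_equal(samples, samples_again))  # True
```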

8. How do you concatenate two arrays along an axis?

Answer: Use np.concatenate() to join arrays along a specified axis. The arrays must have the same shape in every dimension except the one being joined.

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])
concatenated = np.concatenate((arr1, arr2), axis=0)
print(concatenated)
# Output:
# [[1 2]
#  [3 4]
#  [5 6]]

9. How can you compute the dot product of two arrays?

Answer: Use np.dot() to compute the dot product of two arrays. For 1D arrays this is the inner product: the sum of the element-wise products.

arr1 = np.array([1, 2])
arr2 = np.array([3, 4])
dot_product = np.dot(arr1, arr2)
print(dot_product)
# Output: 11
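For 2D inputs, np.dot() performs matrix multiplication, and the @ operator gives the same result. The matrices below are made-up values chosen only to illustrate this.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

product_dot = np.dot(a, b)   # matrix product for 2D inputs
product_at = a @ b           # equivalent operator form

print(product_dot)
# [[19 22]
#  [43 50]]
print(np.array_equal(product_dot, product_at))  # True
```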

10. How do you flatten a multi-dimensional array in NumPy?

Answer: Use flatten() or ravel() to convert a multi-dimensional array into a 1D array. flatten() always returns a copy, while ravel() returns a view of the original data when possible.

arr = np.array([[1, 2, 3], [4, 5, 6]])
flattened = arr.flatten()
print(flattened)
# Output: [1 2 3 4 5 6]
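The practical difference between the two shows up when the flattened result is modified. The sketch below assumes a contiguous array, in which case ravel() returns a view; the specific values written are arbitrary.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = arr.flatten()  # always a new, independent copy
flat_view = arr.ravel()    # a view here, because arr is contiguous

flat_copy[0] = 100         # does not affect arr
flat_view[1] = 200         # writes through to arr

print(arr)
# [[  1 200   3]
#  [  4   5   6]]
print(np.shares_memory(arr, flat_copy))  # False
print(np.shares_memory(arr, flat_view))  # True
```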

11. How do you calculate the cumulative sum of a NumPy array?

Answer: Use np.cumsum() to compute the cumulative sum of elements.

arr = np.array([1, 2, 3, 4])
cumulative_sum = np.cumsum(arr)
print(cumulative_sum)
# Output: [ 1  3  6 10]

12. How can you find unique elements and their counts in a NumPy array?

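Answer: Use np.unique() with return_counts=True. It returns the sorted unique values together with a matching array of how many times each value occurs.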

arr = np.array([1, 2, 2, 3, 3, 3])
unique_elements, counts = np.unique(arr, return_counts=True)
print(unique_elements)  # Output: [1 2 3]
print(counts)  # Output: [1 2 3]

13. How do you replace elements in a NumPy array using a condition?

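Answer: Use np.where(condition, x, y) to conditionally replace elements. It returns a new array that takes x where the condition is True and y elsewhere; the original array is left unchanged.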

arr = np.array([1, 2, 3, 4, 5])
result = np.where(arr > 3, 0, arr)
print(result)
# Output: [1 2 3 0 0]
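If modifying the array in place is acceptable, assignment through a boolean mask is an alternative to np.where(). This is a minimal sketch that reuses the same example values.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Assign 0 to every position where the mask is True; arr itself changes.
arr[arr > 3] = 0

print(arr)  # [1 2 3 0 0]
```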

14. How do you generate a NumPy array of random integers?

Answer: Use np.random.randint() to generate random integers in the half-open range [low, high), so high itself is never returned.

random_integers = np.random.randint(low=0, high=10, size=5)
print(random_integers)
# Example output (values differ on every run): [3 7 0 1 9]

15. How can you calculate the standard deviation of a NumPy array?

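Answer: Use np.std() to compute the standard deviation. By default it computes the population standard deviation (ddof=0); pass ddof=1 for the sample standard deviation.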
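A short example in the style of the earlier answers follows. The input values are made up for illustration and chosen so that the population result is a round number.

```python
import numpy as np

arr = np.array([2, 4, 4, 4, 5, 5, 7, 9])

population_std = np.std(arr)       # default ddof=0: divide by N
sample_std = np.std(arr, ddof=1)   # ddof=1: divide by N - 1

print(population_std)              # 2.0
print(f"{sample_std:.3f}")         # 2.138
```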