Apache Kafka Overview

Key Features of Apache Kafka:

  1. Distributed and Scalable: Kafka runs as a cluster of brokers spread across multiple nodes. Adding brokers and partitions scales it out to handle larger data volumes and higher throughput.
  2. Fault Tolerance: Kafka replicates each partition across multiple brokers. If a broker fails, another replica takes over, so the cluster keeps operating and, given a sufficient replication factor and acknowledgement settings, acknowledged data is not lost.
  3. Durability: Kafka persists messages to disk, so data remains available within the configured retention even if a consumer cannot process it immediately.
  4. High Throughput: Kafka is built for high-volume data streams. Depending on hardware and configuration, a cluster can process from thousands up to hundreds of thousands of messages per second or more.
  5. Partitioning: Kafka topics are divided into partitions, allowing for parallel processing and scalability. Each partition is an ordered, immutable (append-only) sequence of messages; ordering is guaranteed within a partition, not across the whole topic.
  6. Retention: Kafka lets you configure how long messages are kept in a topic, by time or by size (for example, log.retention.hours and log.retention.bytes on the broker).
  7. Exactly-Once Semantics: Kafka supports exactly-once processing through idempotent producers and transactions, so messages are neither lost nor duplicated when applications use these features. The guarantee covers data read from and written to Kafka; external systems need their own handling.
  8. Connectivity: Kafka Connect provides source and sink connectors for integrating Kafka with external systems, making it versatile in connecting with other data sources and sinks.
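
The partitioning behaviour in feature 5 can be sketched in a few lines of Python. This is only an illustration: the function names choose_partition and route are hypothetical, and CRC32 is used as a stand-in hash (Kafka's default partitioner uses a different hash, so real partition numbers will differ). The point it shows is that messages with the same key land in the same partition and keep their relative order there.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # same value as num.partitions in the example configuration


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition index.

    Assumption: CRC32 stands in for the hash function. Kafka's default
    partitioner uses murmur2, so actual partition numbers will differ.
    """
    return zlib.crc32(key) % num_partitions


def route(messages):
    """Group (key, value) pairs by partition, preserving arrival order."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[choose_partition(key)].append((key, value))
    return dict(partitions)


events = [
    (b"user-1", "login"),
    (b"user-2", "login"),
    (b"user-1", "click"),
    (b"user-1", "logout"),
]
routed = route(events)

# Every event for a given key sits in one partition, in the order it was sent.
for partition, items in sorted(routed.items()):
    print(partition, items)
```

Because ordering holds only inside a partition, choosing a key that groups related messages (a user ID here) is what keeps those messages in sequence for a consumer.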

Example Kafka Configuration:

        
# server.properties

# Kafka Broker ID
broker.id=1

# Protocol, host, and port the broker listens on
listeners=PLAINTEXT://localhost:9092

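# Directory where Kafka stores message data (the commit log, not application logs)
# Note: /tmp is suitable for local testing only
log.dirs=/tmp/kafka-logs

# Default number of partitions for automatically created topics
num.partitions=3

# Default replication factor for automatically created topics
# (a value of 2 requires at least two brokers in the cluster)
default.replication.factor=2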

# ZooKeeper connection string
zookeeper.connect=localhost:2181
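
As a rough aid for reviewing a file like the one above, the following Python sketch parses key=value properties and flags two common pitfalls. The helper names parse_properties and check_broker_settings are hypothetical and not part of Kafka, and broker_count is a value you supply yourself; the sketch does not query a cluster.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values such as URLs stay intact.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_broker_settings(settings: dict, broker_count: int) -> list:
    """Return human-readable warnings (hypothetical checks, not exhaustive)."""
    warnings = []
    replication = int(settings.get("default.replication.factor", "1"))
    if replication > broker_count:
        warnings.append(
            f"default.replication.factor={replication} exceeds "
            f"the {broker_count} available broker(s)"
        )
    if settings.get("log.dirs", "").startswith("/tmp"):
        warnings.append("log.dirs is under /tmp, which may be cleared on reboot")
    return warnings


SAMPLE = """
# server.properties
broker.id=1
listeners=PLAINTEXT://localhost:9092
log.dirs=/tmp/kafka-logs
num.partitions=3
default.replication.factor=2
zookeeper.connect=localhost:2181
"""

settings = parse_properties(SAMPLE)
for warning in check_broker_settings(settings, broker_count=1):
    print("WARNING:", warning)
```

With a single local broker, both checks fire for the sample settings, which is a hint that the example values are meant for a multi-broker test setup rather than a one-node install.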
        
    

Basic Kafka Usage:

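1. Starting Kafka Server (because the configuration above uses zookeeper.connect, start ZooKeeper first with bin/zookeeper-server-start.sh config/zookeeper.properties):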

        
bin/kafka-server-start.sh config/server.properties
        
    

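2. Creating a Topic (a replication factor of 2 requires at least two running brokers; with a single broker, use --replication-factor 1):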

        
bin/kafka-topics.sh --create --topic myTopic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 2
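
The same topic can also be created from code. The sketch below assumes the third-party kafka-python package (pip install kafka-python) and a broker reachable at localhost:9092; topic_spec and create_topic are hypothetical helper names introduced for illustration.

```python
def topic_spec(name="myTopic", partitions=3, replication_factor=2):
    """Collect the same settings that are passed to kafka-topics.sh above."""
    if partitions < 1 or replication_factor < 1:
        raise ValueError("partitions and replication_factor must be >= 1")
    return {
        "name": name,
        "num_partitions": partitions,
        "replication_factor": replication_factor,
    }


def create_topic(spec, bootstrap_servers="localhost:9092"):
    """Create the topic; return False if it already exists.

    Assumption: kafka-python is installed and a broker is running.
    """
    # Imported here so the module loads even where kafka-python is missing.
    from kafka.admin import KafkaAdminClient, NewTopic
    from kafka.errors import TopicAlreadyExistsError

    admin = KafkaAdminClient(bootstrap_servers=bootstrap_servers)
    try:
        admin.create_topics([NewTopic(**spec)])
        return True
    except TopicAlreadyExistsError:
        return False
    finally:
        admin.close()
```

Calling create_topic(topic_spec()) mirrors the command above. As with the CLI, the request is expected to fail if the replication factor exceeds the number of available brokers.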
        
    

3. Producing Messages (each line typed into the console is sent as one message; press Ctrl+C to exit):

        
bin/kafka-console-producer.sh --topic myTopic --bootstrap-server localhost:9092
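
For comparison, here is a minimal Python producer that sends one message per input line, much like the console producer. It assumes the third-party kafka-python package and a broker at localhost:9092; encode_line and produce_lines are hypothetical names, and skipping empty lines is a choice made in this sketch, not a documented behaviour of the console tool.

```python
def encode_line(line: str) -> bytes:
    """Strip the trailing newline and encode the text as UTF-8."""
    return line.rstrip("\n").encode("utf-8")


def produce_lines(lines, topic="myTopic", bootstrap_servers="localhost:9092"):
    """Send each non-empty line as one message; return the number sent.

    Assumption: kafka-python is installed and a broker is running.
    """
    # Imported here so the module loads even where kafka-python is missing.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    sent = 0
    try:
        for line in lines:
            payload = encode_line(line)
            if payload:  # this sketch skips empty lines
                producer.send(topic, value=payload)
                sent += 1
        producer.flush()  # block until buffered messages have been sent
    finally:
        producer.close()
    return sent
```

A call such as produce_lines(["hello", "world"]) would send two messages to myTopic. Passing key=... to producer.send is how a message key is supplied when per-key ordering matters.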
        
    

4. Consuming Messages (--from-beginning reads from the earliest retained message instead of only new ones):

        
bin/kafka-console-consumer.sh --topic myTopic --bootstrap-server localhost:9092 --from-beginning
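
A rough Python equivalent of the console consumer is sketched below. It assumes the third-party kafka-python package and a broker at localhost:9092; decode_value and consume are hypothetical names, and the idle_ms timeout is an illustrative addition so the loop ends when no more messages arrive.

```python
def decode_value(raw) -> str:
    """Decode a message value as UTF-8; a missing value becomes ''."""
    if raw is None:
        return ""
    return raw.decode("utf-8", errors="replace")


def consume(topic="myTopic", bootstrap_servers="localhost:9092", idle_ms=5000):
    """Yield (partition, offset, text) for each message in the topic.

    Assumption: kafka-python is installed and a broker is running.
    """
    # Imported here so the module loads even where kafka-python is missing.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap_servers,
        # With no committed offset, start at the oldest retained message,
        # similar in effect to --from-beginning.
        auto_offset_reset="earliest",
        # Stop iterating after this many milliseconds without new messages.
        consumer_timeout_ms=idle_ms,
    )
    try:
        for record in consumer:
            yield record.partition, record.offset, decode_value(record.value)
    finally:
        consumer.close()
```

Iterating with for partition, offset, text in consume(): print(partition, offset, text) shows that offsets increase within each partition, while messages from different partitions may interleave.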