Kafka Data Retention Optimization

Optimizing data retention in Kafka involves configuring retention policies to balance data availability against storage usage. The key parameters and strategies are outlined below.

1. Retention Policies

Kafka allows you to configure retention policies based on time and size; when both limits are set, old log segments become eligible for deletion as soon as either limit is reached:

Broker Configuration

# Retention time for topic data
log.retention.hours=168 # Retain data for 7 days

# Maximum size of log segments before rolling over
log.segment.bytes=1073741824 # 1 GB per segment

# Maximum size of a partition's log before old segments are deleted
log.retention.bytes=53687091200 # 50 GB per partition (applies per partition, not per topic)
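
If log.retention.ms or log.retention.minutes is also set, it takes precedence over log.retention.hours (milliseconds beats minutes beats hours). Settings such as log.retention.ms are also dynamically updatable, so you can change them at runtime with kafka-configs.sh instead of editing server.properties and restarting. A minimal sketch, assuming a broker reachable at localhost:9092:

# Lower the cluster-wide default retention to 3 days without a broker restart
kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type brokers --entity-default --add-config log.retention.ms=259200000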

2. Topic-Level Configuration

You can override broker-level settings at the topic level for finer control. Modern Kafka manages topic configs through the brokers with kafka-configs.sh; the older kafka-topics.sh --alter --config route via ZooKeeper is deprecated and was removed in Kafka 3.0:

Topic Configuration

# Retention time for a specific topic: retain data for 1 day
kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name my-topic --add-config retention.ms=86400000

# Maximum size of log segments for a specific topic: 512 MB per segment
kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name my-topic --add-config segment.bytes=536870912
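
To verify which overrides are actually in effect on a topic, describe its dynamic configs (assuming the same broker address and topic name as above):

# Show the non-default configs set on the topic
kafka-configs.sh --bootstrap-server localhost:9092 --describe --entity-type topics --entity-name my-topic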

3. Log Compaction

Log compaction retains only the latest value for each key, which is useful for topics where the current state matters more than the full history:

Compaction Configuration

# Enable log compaction for a topic
kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name my-topic --add-config cleanup.policy=compact
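
Compaction behavior can be tuned further with topic-level settings; the values below are illustrative starting points, not recommendations:

# Compact sooner: clean when 20% of the log is uncompacted (default is 0.5),
# and keep delete markers (tombstones) for 24 hours so consumers can observe deletions
kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name my-topic --add-config min.cleanable.dirty.ratio=0.2,delete.retention.ms=86400000

# Combine compaction with time/size-based deletion (brackets keep the comma-separated value intact)
kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name my-topic --add-config cleanup.policy=[compact,delete]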

4. Monitoring and Maintenance

Regularly monitor disk usage and adjust retention settings as needed. Some useful commands for checking log size and topic configuration:

Monitoring Tools

# Check log directory size
du -sh /path/to/kafka/logs/

# List topics and their configuration
kafka-topics.sh --describe --bootstrap-server localhost:9092
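
For per-partition sizes as the brokers see them, kafka-log-dirs.sh (shipped with Kafka since 1.0) reports log-directory usage as JSON; a sketch assuming the same broker address as above:

# Report the size of every partition of my-topic across all brokers
kafka-log-dirs.sh --bootstrap-server localhost:9092 --describe --topic-list my-topic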