Apache Kafka provides built-in mechanisms for managing logs. Each topic is broken down into partitions, and each partition is a commit log. To avoid the logs growing indefinitely, Kafka offers configurable policies for retention and cleanup of log data.
Log retention policies determine how long, or how much, log data Kafka keeps before deleting it. There are two main types of retention policy:
log.retention.hours
This policy lets Kafka delete log segments whose data is older than a specified amount of time. The default retention period is 168 hours (7 days).
# Retain logs for 7 days (default)
log.retention.hours=168
If you want to retain logs for a longer or shorter period, modify this setting:
# Retain logs for 3 days
log.retention.hours=72
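Kafka also accepts finer-grained variants of this setting: log.retention.minutes and log.retention.ms. If more than one is set, the finer-grained one wins (log.retention.ms takes precedence over log.retention.minutes, which takes precedence over log.retention.hours). For example:
# Retain logs for 12 hours; overrides log.retention.hours if both are set
log.retention.ms=43200000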
log.retention.bytes
This policy lets Kafka delete logs based on their total size. Once a partition's log exceeds the configured threshold, the oldest segments are deleted.
# Retain logs up to 10 GB (10737418240 bytes) per partition
log.retention.bytes=10737418240
By configuring size-based retention, you can ensure that your disk usage is kept under control.
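Retention can also be overridden for a single topic via the topic-level retention.bytes config, which takes precedence over the broker-wide default. As a sketch, assuming a broker at localhost:9092 and a topic named events (both placeholders):
# Override size-based retention for one topic
kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name events \
  --add-config retention.bytes=10737418240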
Log cleanup policies determine what Kafka does with log segments once they become eligible for cleanup. There are two main cleanup policies:
delete Cleanup Policy
This is the default policy in Kafka: old log segments are simply deleted once the retention period or size threshold is exceeded.
# Use the delete cleanup policy (default)
log.cleanup.policy=delete
In this policy, data is removed when it surpasses the defined retention criteria (time or size).
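Note that deletion works at segment granularity: a segment is only removed once all of its records have passed the retention limit, and the active segment is never deleted. How often the broker checks for deletable segments is itself configurable:
# How frequently (in ms) the broker checks for segments eligible
# for deletion; the default is 300000 (5 minutes)
log.retention.check.interval.ms=300000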
compact Cleanup Policy
With the compact policy, Kafka removes old records that share a key with a newer record, retaining only the most recent value for each key. This is useful for topics where you want to keep the latest state of a key, such as in a database-like scenario.
# Enable log compaction
log.cleanup.policy=compact
Compaction ensures that Kafka retains the latest update for each key while discarding older duplicates.
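Compaction is performed by a background log cleaner, and a few broker settings control its behavior. A minimal sketch using the default values:
# The log cleaner must be running for compaction to happen (default: true)
log.cleaner.enable=true
# Minimum fraction of a log that must be "dirty" (not yet compacted)
# before it is eligible for compaction (default: 0.5)
log.cleaner.min.cleanable.ratio=0.5
# How long delete markers (tombstones) survive after compaction
# (default: 86400000 ms = 24 hours)
log.cleaner.delete.retention.ms=86400000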
It’s possible to combine time-based and size-based retention policies for logs, along with either the delete or compact cleanup policy:
# Retain logs for 3 days or until they exceed 5 GB per partition
log.retention.hours=72
log.retention.bytes=5368709120
# Use the delete cleanup policy
log.cleanup.policy=delete
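Kafka also supports applying both cleanup policies to the same log. With the policy set to compact,delete, logs are compacted per key and segments older than the retention limits are deleted as well:
# Compact per key, and also delete data past the retention limits
log.cleanup.policy=compact,delete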
Logs are stored as segments in Kafka. Each segment is a file on disk, and Kafka writes new records to the latest segment (the active segment). Kafka also lets you control how frequently new segments are rolled.
log.segment.bytes
This property controls the maximum size of a log segment file before Kafka creates a new segment.
# Each log segment is at most 1 GB (1073741824 bytes)
log.segment.bytes=1073741824
log.roll.ms
This setting defines the time interval after which a new log segment is rolled, regardless of its size (log.roll.hours is a coarser-grained alternative, and the equivalent topic-level setting is segment.ms).
# Roll a new log segment every 24 hours (86400000 ms)
log.roll.ms=86400000
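Like retention, segment rolling can be tuned per topic with the segment.bytes and segment.ms topic-level configs. As a sketch, again assuming a broker at localhost:9092 and a topic named events (both placeholders):
# Per-topic segment settings: roll at 512 MB or every hour
kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name events \
  --add-config segment.bytes=536870912,segment.ms=3600000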
Here is a sample configuration that combines retention limits, compaction, and segment rolling:
# Retain logs for 7 days or until they exceed 20 GB (21474836480 bytes) per partition
log.retention.hours=168
log.retention.bytes=21474836480
# Compact per key, and also delete data past the retention limits
# (retention only triggers deletion when the policy includes delete)
log.cleanup.policy=compact,delete
# Roll a new log segment every 2 GB (2147483648 bytes) or every 24 hours
log.segment.bytes=2147483648
log.roll.ms=86400000
It is important to monitor the log retention and cleanup processes to avoid running out of disk space. Kafka exposes JMX metrics for this (for example, per-partition log size under kafka.log:type=Log,name=Size), along with command-line tooling.
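For a quick look at how much disk each partition is using, the kafka-log-dirs.sh tool that ships with Kafka reports per-partition log sizes. A sketch, assuming a broker at localhost:9092 and a topic named events (both placeholders):
# Report the on-disk size of each partition of a topic
kafka-log-dirs.sh --bootstrap-server localhost:9092 \
  --describe --topic-list events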
Log retention and cleanup policies are critical for managing disk usage in a Kafka cluster. By configuring time-based and size-based retention and choosing between the delete and compact cleanup policies, you can keep disk usage predictable while still retaining the data your applications need.