Kafka Log Retention – Delete Policy

By design, Kafka does not delete messages as soon as they are consumed, unlike many other pub/sub messaging platforms. The delete retention policy is a log cleanup strategy in which older log segments are deleted once they exceed a threshold set either by time or by size. Time is the most common way to configure how long Kafka retains messages before deleting them.

Out of the box, however, Kafka only limits retention by time and sets no default limit on the size of topics. So, when sizing your clusters and setting up your clients, always configure a size limit as well, using the cluster-wide log.retention.bytes setting or the topic-level retention.bytes setting.

Retention by time has three cluster-wide configuration settings, listed here in order of precedence: log.retention.ms, log.retention.minutes and log.retention.hours. Where more than one is specified, the setting with the smallest unit takes precedence.
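The precedence rule can be sketched in a few lines of Python. This is only an illustrative model, not Kafka's own code: the function name effective_retention_ms and its parameters are hypothetical, and the fallback of 168 hours is the 7-day default.

```python
from typing import Optional


def effective_retention_ms(
    retention_ms: Optional[int] = None,
    retention_minutes: Optional[int] = None,
    retention_hours: int = 168,  # assumed default: 7 days
) -> int:
    """Resolve the time limit in milliseconds (hypothetical helper).

    The smallest unit that is set wins: ms, then minutes, then hours.
    A negative result is normalised to -1, meaning "no time limit".
    """
    if retention_ms is not None:
        value = retention_ms
    elif retention_minutes is not None:
        value = retention_minutes * 60 * 1000
    else:
        value = retention_hours * 60 * 60 * 1000
    return -1 if value < 0 else value


if __name__ == "__main__":
    # Only the hours default applies.
    print(effective_retention_ms())
    # Minutes override hours.
    print(effective_retention_ms(retention_minutes=30, retention_hours=1))
    # Milliseconds override both.
    print(effective_retention_ms(retention_ms=1000, retention_minutes=30))
```

In this sketch, setting log.retention.ms makes the minutes and hours settings irrelevant, which matches the precedence described above.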

It is important to mention that, for retention by size, the cluster-wide log.retention.bytes or topic-level retention.bytes setting is applied per partition. So, cumulatively, a topic with 10 partitions configured with a retention.bytes of 1G would retain up to about 10G worth of data. If it is replicated, as is often the case, with a replication factor of say 3, then the total estimated storage requirement for this topic would be 30G.
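The arithmetic above can be written as a small helper. This is a rough sizing sketch under the assumptions stated in the text (the limit applies per partition and every replica stores a full copy); the function name estimate_topic_storage_bytes is hypothetical, and real usage can sit somewhat above the estimate because deletion happens a whole segment at a time.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def estimate_topic_storage_bytes(
    retention_bytes: int,
    partitions: int,
    replication_factor: int = 1,
) -> int:
    """Estimate the retained size of a topic across all replicas.

    retention_bytes is the per-partition limit (retention.bytes).
    """
    if retention_bytes < 0:
        # -1 means "no size limit", so no upper bound can be estimated.
        raise ValueError("retention_bytes must be non-negative")
    return retention_bytes * partitions * replication_factor


if __name__ == "__main__":
    per_topic = estimate_topic_storage_bytes(1 * GIB, partitions=10)
    replicated = estimate_topic_storage_bytes(
        1 * GIB, partitions=10, replication_factor=3
    )
    print(per_topic // GIB, "GiB without replication")
    print(replicated // GIB, "GiB with replication factor 3")
```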

Note also that the cleanup policy applies to a whole log segment, not to the individual messages written to it. In other words, messages are not deleted individually but together with the other messages in their segment. To do this, the retention check tests whether the segment age, which is measured from the timestamp of the last (most recent) message written to the segment, has exceeded the retention time limit. So, although this guarantees that messages will live at least as long as the retention time limit, many messages in the segment file will remain longer than that limit!
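To make the segment-level behaviour concrete, here is a simplified model in Python. It is not Kafka's implementation: the Segment class and the segments_to_delete function are hypothetical, segments are assumed to be ordered oldest first, and the special handling of the active segment is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    base_offset: int
    largest_timestamp_ms: int  # timestamp of the newest message in the segment


def segments_to_delete(
    segments: List[Segment], now_ms: int, retention_ms: int
) -> List[Segment]:
    """Return the oldest segments whose age exceeds the time limit.

    Age is measured from the newest message in each segment, so older
    messages in the same segment outlive the limit until then.
    """
    if retention_ms < 0:  # -1 means no time limit
        return []
    deletable = []
    for segment in segments:  # assumed ordered oldest first
        if now_ms - segment.largest_timestamp_ms > retention_ms:
            deletable.append(segment)
        else:
            break  # stop at the first segment that is still within the limit
    return deletable


if __name__ == "__main__":
    log = [Segment(0, 1_000), Segment(100, 6_000), Segment(200, 9_500)]
    expired = segments_to_delete(log, now_ms=10_000, retention_ms=5_000)
    print([s.base_offset for s in expired])
```

In this example only the first segment is eligible, because its newest message is 9,000 ms old. The second segment stays in full, even if some of its earlier messages are already older than the 5,000 ms limit.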

| Broker/Cluster-wide | Topic Override | Description | Default value |
| --- | --- | --- | --- |
| log.retention.ms | retention.ms | Set this to -1 to apply NO time limit. | null |
| log.retention.minutes | n/a | | null |
| log.retention.hours | n/a | | 168 hours (7 days) |
| log.retention.bytes | retention.bytes | The maximum size a topic partition can reach before its oldest segments are deleted. By default, there is no limit on size. Since this limit is enforced at the partition level, multiply it by the number of partitions to compute the topic retention in bytes. | -1 |
| log.retention.check.interval.ms | n/a | The frequency in milliseconds at which the broker checks whether any log is eligible for deletion. | 300000 (5 minutes) |
| log.segment.bytes | segment.bytes | The maximum size of a single log file. | 1 gibibyte |
| log.roll.ms | segment.ms | Maximum time before a new segment is rolled out, even if the current one is not full, to ensure that log cleaning happens. | null (falls back to log.roll.hours) |
| log.roll.hours | n/a | Maximum time before a new segment is rolled out, even if the current one is not full, to ensure that log cleaning happens. | 168 hours (7 days) |

How Delete Retention Policy Works

Similarly, here is a flowchart illustrating how log retention works for the delete cleanup policy. Interestingly, there is an edge case in which an active segment may be deleted! Hope you find it useful.