
5 Things About Kafka Consumers That Beginners Should Know


Kafka is an event streaming platform used for asynchronous communication between microservices. It is known for handling a high throughput of events with ease. In the story of Kafka, there are two main actors - producers and consumers. Producers are simple: publish a message, then go to sleep. Consumers are like the books of Franz Kafka - simple on a first read, yet far more complicated on the inside. There are a few important consumer configurations that every developer should be aware of.

Max Poll Interval

Consumers are members of a consumer group, managed by a Group Coordinator (which lives on a Kafka broker). The coordinator has many consumers to look after, so it prioritizes work (event consumption) above everything else and loves micromanaging. No, I am not talking about your boss. It expects every consumer to keep checking in and asking for more work (polling) at regular intervals.

If a consumer is processing a heavy task and doesn't ask for more work within a specified time, the Coordinator assumes the consumer has died or is stuck in an infinite loop. This threshold is known as max.poll.interval.ms. By definition, it is the maximum duration between two subsequent poll calls before the consumer is kicked out of the group and the remaining group members rebalance.

Why wouldn’t a consumer poll for messages? Either the consumer is dead, or it has not finished processing the last batch of polled messages. In the latter case, those messages have the potential to be consumed twice, because the rebalance hands the partition to another consumer before the offsets are committed. Hence it is important to set max.poll.interval.ms large enough to give your consumer adequate time to process its messages successfully. An interval shorter than the average time needed to process a batch will result in continuous rebalancing of your consumers without any event ever being processed.
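As a minimal sketch of how this is wired up (the property keys are real Kafka consumer configs, but the broker address, group name, and values are illustrative, not recommendations):

```java
import java.util.Properties;

// Sketch: consumer properties with a generous max.poll.interval.ms so a
// heavy batch can finish before the coordinator evicts the consumer.
public class PollIntervalConfig {

    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", "demo-group");              // hypothetical group name
        // Allow up to 10 minutes between poll() calls before a rebalance is triggered.
        props.setProperty("max.poll.interval.ms", "600000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("max.poll.interval.ms"));
    }
}
```

These same properties would be passed to a KafkaConsumer constructor in a real application.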

Max Records

When consumers poll for messages, they can specify how many records they want to consume at a time. For example, say you have 20 incoming messages and 10 consumers. If max.poll.records is set to 2, each consumer will consume two messages before making its next poll call. A large max.poll.records means fewer poll calls, saving network round trips to fetch data. However, one needs to ensure max.poll.interval.ms is configured accordingly, keeping in mind the max.poll.records set for the consumer.
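The interplay between the two settings can be sanity-checked with back-of-the-envelope arithmetic (the numbers below are illustrative): max.poll.interval.ms must exceed max.poll.records multiplied by the worst-case per-record processing time, ideally with some headroom.

```java
// Rough budget check for max.poll.interval.ms given a batch size and
// per-record processing time. All inputs are illustrative assumptions.
public class PollBudget {

    static long minPollIntervalMs(int maxPollRecords, long perRecordMs, int safetyFactor) {
        return (long) maxPollRecords * perRecordMs * safetyFactor;
    }

    public static void main(String[] args) {
        // 500 records * 200 ms each * 2x headroom = 200,000 ms (~3.3 minutes)
        System.out.println(minPollIntervalMs(500, 200, 2));
    }
}
```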

Fetch Max Wait

While max.poll.records caps how many records a single poll call can return, it may happen that the data pending in the queue is less than the configured fetch.min.bytes. In this case two things can be done - start consuming what we already have in our queue, or wait for some time until fetch.min.bytes is satisfied and then start consuming. The maximum duration a consumer will wait before returning from a poll call and consuming the messages is known as fetch.max.wait.ms. A wait time (fetch.max.wait.ms) of zero means the broker will return immediately without waiting for more data to accumulate. The amount of data returned is whatever is available, and the number of records handed to the application in one poll is still capped by max.poll.records.

Depending upon your incoming traffic pattern, you can configure this parameter along with the fetch.min.bytes parameter to reduce frequent polling calls. The downside is the added latency due to the time spent waiting for new records.
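A sketch of that trade-off as consumer properties (values are illustrative): the broker holds the fetch until roughly 50 KB is available or 500 ms have passed, whichever comes first.

```java
import java.util.Properties;

// Latency vs. throughput knobs: wait for a minimum batch of bytes,
// but never wait longer than the configured ceiling.
public class FetchTuning {

    static Properties fetchProps() {
        Properties props = new Properties();
        props.setProperty("fetch.min.bytes", "51200"); // wait for ~50 KB of data...
        props.setProperty("fetch.max.wait.ms", "500"); // ...but at most 500 ms
        return props;
    }

    public static void main(String[] args) {
        System.out.println(fetchProps().getProperty("fetch.max.wait.ms"));
    }
}
```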

Heartbeat Interval

Just like humans, a consumer's liveness is captured by its heartbeat. These are small “I am alive” pings that consumers keep sending to the group coordinator (broker side) to let it know they still exist. This helps the broker keep track of which consumers are working fine. heartbeat.interval.ms specifies the time interval between any two consecutive heartbeat calls. Sending too many such calls in a short duration unnecessarily spams the broker.

(Meme: “Abhi hum zinda hai” - “We are still alive” - POV: consumers sending heartbeats at regular intervals)

Session Timeout

The Kafka broker is a chill guy. It trusts the consumers to keep sending “I am alive” signals at regular intervals. But once a consumer stops sending them for some time, the group coordinator assumes it is dead, removes it from the group, and triggers a rebalance. The time it waits before assuming a consumer is stuck or hung is defined by session.timeout.ms. So depending upon your use case, you can set session.timeout.ms and heartbeat.interval.ms in such a manner that the broker waits for at least 3 heartbeat intervals before marking the consumer as inactive. This ensures temporary network fluctuations (if any) are handled gracefully and do not trigger any unnecessary rebalancing.
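The “at least three heartbeats per session” rule of thumb can be expressed as a tiny helper (the 45-second session timeout below is illustrative):

```java
// Derive heartbeat.interval.ms from session.timeout.ms so that at least
// three heartbeats fit inside one session window.
public class LivenessConfig {

    static long heartbeatIntervalMs(long sessionTimeoutMs) {
        return sessionTimeoutMs / 3;
    }

    public static void main(String[] args) {
        long sessionTimeoutMs = 45000; // illustrative session.timeout.ms
        // One heartbeat every 15 s: three missed beats before eviction.
        System.out.println(heartbeatIntervalMs(sessionTimeoutMs));
    }
}
```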

Bonus Question:

Say your consumer is processing messages from 1 to 50. A new member joins your consumer group triggering a rebalance. The broker asks your consumer to stop processing and return the partition. What will happen to the messages you were consuming?

Ending

Just like in Franz Kafka’s The Metamorphosis, your Kafka consumer also has the potential to transform into a bug if you have not configured it properly. The only way to become better is to make mistakes and learn from them. I hope you enjoyed reading this blog. Namaste!