Apache Kafka Components
From brokers to topics, partitions, consumer groups, clusters, and replication.
Apache Kafka is an open-source distributed streaming platform that facilitates the construction of real-time streaming data pipelines and applications. In the context of Kafka, “distributed” means that it operates across multiple computers that form a cluster and work together towards a common goal.
Let’s explore some of the key Kafka components:
1. Broker
A Kafka broker plays the same role as the broker in any messaging system. Such a system has three main parts: producers, brokers, and consumers. Producers send data to brokers, brokers store the messages, and consumers read messages from the brokers.
2. Kafka Topic
A Kafka topic is a named category that lives on the broker and is used to classify messages by purpose. For example, there could be two topics, Topic A and Topic B, each serving as the destination for a specific type of message. A producer sends messages to Topic A, the broker stores them, and a consumer eventually reads them. Within Topic A, messages are kept in the order they arrive and are identified by sequential numbers such as 0, 1, 2, and so on (Kafka calls these numbers offsets). The log is append-only: existing messages are never modified, and they are removed only according to the retention policy.
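The append-only behavior described above can be sketched with a small in-memory model. This is only an illustration, not Kafka's storage format or client API: the `TopicLog` class, its method names, and the time-based `retention_seconds` setting are all hypothetical names chosen for the example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Toy append-only log (hypothetical; not Kafka's real storage format)."""
    retention_seconds: float
    _records: list = field(default_factory=list)  # (offset, timestamp, value)
    _next_offset: int = 0

    def append(self, value, now=None):
        """Append a message at the end and return the offset assigned to it."""
        now = time.time() if now is None else now
        offset = self._next_offset
        self._records.append((offset, now, value))
        self._next_offset += 1
        return offset

    def read(self, from_offset=0):
        """Return (offset, value) pairs at or after from_offset, in order."""
        return [(o, v) for o, _, v in self._records if o >= from_offset]

    def enforce_retention(self, now=None):
        """Drop messages older than the retention window.

        Offsets of removed messages are not reused.
        """
        now = time.time() if now is None else now
        self._records = [
            r for r in self._records if now - r[1] <= self.retention_seconds
        ]


log = TopicLog(retention_seconds=60)
log.append("a", now=0)    # offset 0
log.append("b", now=30)   # offset 1
log.append("c", now=90)   # offset 2
log.enforce_retention(now=100)  # "a" and "b" are older than 60 seconds
print(log.read())
```

Messages can only be added at the end, and the only way they disappear is through the retention step, which mirrors the rule stated above.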
3. Kafka Partition
Within a topic, messages are stored in partitions, and ordering is tracked separately for each partition. The number of partitions determines how many consumers in the same consumer group can process the topic in parallel (see section 4, Consumer Group). With 2 partitions, at most 2 consumers in a group can be active. With 3 partitions, up to 3 consumers can be active; 2 consumers also work, with one of them reading two partitions. Kafka does not impose a fixed limit on the number of partitions, whereas the number of active consumers in a group is capped by the partition count.
4. Consumer Group
Kafka supports both point-to-point and publish-subscribe messaging. Consumers in the same group split the work, with each partition read by only one consumer in the group, while separate groups each receive all the messages. It's important to balance the number of consumers and partitions to optimize data processing. Suppose there are 3 partitions and 2 consumers: one consumer might read only partition 1 while the other reads partitions 2 and 3. With 3 consumers, each one reads a single partition. But what happens if a fourth consumer joins? It gets no partition and stays idle. The value of a consumer group becomes evident when one consumer stops or fails: another consumer in the group takes over its partitions. There is no hard limit on the number of consumer groups, but the number of active consumers within each group is limited by the number of partitions.
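To make the partition-to-consumer mapping concrete, here is a minimal sketch that spreads partitions over the consumers of one group in round-robin fashion. The function name `assign_partitions` and the consumer names are made up for the example, and Kafka's real assignment strategies are more involved, so treat this as a simplified model rather than the actual algorithm.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin.

    Returns {consumer: [partitions]}. A consumer with an empty list is idle.
    Simplified model; not Kafka's actual assignor implementation.
    """
    assignment = {c: [] for c in consumers}
    if not consumers:
        return assignment
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = ["P1", "P2", "P3"]

# Two consumers: one of them has to read two partitions.
two = assign_partitions(partitions, ["c1", "c2"])

# Four consumers: only three can be active, the fourth gets nothing.
four = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])

# c1 fails: re-running the assignment hands its partitions to c2.
after_failure = assign_partitions(partitions, ["c2"])

print(two)
print(four)
print(after_failure)
```

Re-running the assignment whenever a consumer joins or leaves is a rough stand-in for what Kafka calls a rebalance.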
5. Kafka Cluster
Since Kafka is a distributed system, it can run multiple brokers that together form a cluster. This ensures that if one broker fails, another can take over its tasks. The cluster tracks which brokers are available and makes sure messages are stored on live brokers.
6. Replica
Replication prevents data loss when a broker goes down. With multiple brokers, data can be copied to as many of them as desired. Replicas exist at the partition level, not the topic level. For each partition, one replica is the leader and the others are followers.
- The leader serves the producers and consumers of that partition (by default, consumers read from the leader). If the broker hosting the leader fails, a follower on another broker automatically takes over the leadership role.
- The followers act as backups that replicate data from the leader.
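The leader and follower roles can be illustrated with a toy failover function for a single partition. This is a deliberately simplified assumption-laden sketch: in real Kafka, leader election is coordinated by the cluster controller and considers only in-sync replicas, while the `elect_leader` helper and the broker IDs below are hypothetical.

```python
def elect_leader(replicas, live_brokers, current_leader=None):
    """Pick the leader for one partition.

    replicas: ordered list of broker IDs that hold a copy of the partition.
    live_brokers: set of broker IDs that are currently up.
    Keeps the current leader while it is alive; otherwise picks the first
    live replica. Returns None when no replica is available.
    Toy model only; real Kafka elects from the in-sync replica set.
    """
    if current_leader in replicas and current_leader in live_brokers:
        return current_leader
    for broker in replicas:
        if broker in live_brokers:
            return broker
    return None


replicas = [1, 2, 3]  # this partition is copied to brokers 1, 2, and 3

leader = elect_leader(replicas, live_brokers={1, 2, 3})
followers = [b for b in replicas if b != leader]

# Broker 1 goes down: a follower on another broker becomes the leader.
new_leader = elect_leader(replicas, live_brokers={2, 3}, current_leader=leader)

print(leader, followers, new_leader)
```

As long as at least one replica sits on a live broker, the partition keeps a leader, which is why replication protects against a single broker failure.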
This comprehensive architecture, from brokers to topics, partitions, consumer groups, clusters, and replication, empowers Kafka as a robust and scalable platform for real-time data streaming and processing.