Why Did Kafka Abandon ZooKeeper?

2024.10.17

For a long time, ZooKeeper was a standard part of every Kafka deployment. Now the Kafka project is gradually removing ZooKeeper. Why did Kafka abandon ZooKeeper? Let's walk through the reasons in this article.

Relationship between Kafka and ZooKeeper

ZooKeeper is a distributed coordination service commonly used for configuration management, naming, and synchronization. For a long time, Kafka has used ZooKeeper to manage cluster metadata, controller elections, and (in older versions) consumer group coordination; this metadata includes topics, partition assignments, ACLs (access control lists), and more.
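
To make the dependency concrete, this metadata lives in ZooKeeper as znodes under well-known paths such as /brokers/ids, /brokers/topics, and /controller. The following sketch (not part of Kafka itself) simply lists those znodes; it assumes a ZooKeeper-based Kafka cluster whose ensemble is reachable at localhost:2181 and the standard Apache ZooKeeper Java client on the classpath.

    import java.util.List;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    // Minimal sketch: list the znodes Kafka registers in ZooKeeper.
    // Assumes a ZooKeeper ensemble at localhost:2181 backing a Kafka cluster.
    public class KafkaZnodeBrowser {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 10_000, event -> { });
            try {
                // Broker registrations: one ephemeral znode per live broker id.
                List<String> brokers = zk.getChildren("/brokers/ids", false);
                System.out.println("brokers: " + brokers);
                // Topic metadata: partition counts and replica assignments.
                System.out.println("topics: " + zk.getChildren("/brokers/topics", false));
                // The currently elected controller broker.
                System.out.println("controller: " + new String(zk.getData("/controller", false, null)));
            } catch (KeeperException e) {
                System.err.println("ZooKeeper error or missing znode: " + e.getMessage());
            } finally {
                zk.close();
            }
        }
    }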

ZooKeeper provides Kafka with core functions such as leader election and cluster membership management, giving Kafka a reliable distributed coordination service so that its nodes can communicate and be managed effectively. However, as Kafka has grown, this dependence on ZooKeeper has gradually revealed several problems, which are the reasons for removing ZooKeeper discussed below.

Reasons to abandon ZooKeeper

(1) Increased complexity

ZooKeeper is an external component, independent of Kafka, that must be deployed and maintained separately. Using ZooKeeper therefore adds significantly to the operational complexity of running Kafka: the operations team has to manage two distributed systems (Kafka and ZooKeeper) at the same time, which increases management costs and demands broader technical skills from operators.

(2) Performance bottleneck

As a coordination service, ZooKeeper was not designed for high-load metadata workloads, so as the cluster grows its performance in handling metadata becomes an increasingly visible problem. For example, as the number of partitions increases, ZooKeeper has to store and serve more metadata, which raises the latency of metadata reads and watch notifications and drags down Kafka's overall performance. Under high load, ZooKeeper can become a bottleneck in the system and limit Kafka's scalability.

(3) Consistency issues

The distributed consistency model used inside Kafka differs from ZooKeeper's. Because metadata is propagated between ZooKeeper and the Kafka controller through a relatively inefficient synchronization mechanism, the two can drift into inconsistent states, especially during cluster expansion or failure scenarios. Such inconsistency affects the reliability of message delivery and the stability of the system.

(4) Building its own ecosystem

Personally, I think the core reason Kafka abandoned ZooKeeper is that the Kafka ecosystem has grown strong and the project wants to stand on its own rather than keep a critical dependency controlled by someone else. There are plenty of vivid examples of this, both at home and abroad: projects rely on other people's products while they are weak, and once they have fully matured they choose to build and improve their own ecosystem.

Introducing KRaft

In order to strip out and remove ZooKeeper, Kafka introduced its own first-party replacement, KRaft (Kafka Raft metadata mode). KRaft is a new metadata management architecture: a built-in metadata management layer based on the Raft consensus algorithm, designed to replace ZooKeeper's metadata management role. Its advantages are listed below, followed by a short sketch of how the new metadata quorum can be inspected:

  • Completely built-in and self-contained: KRaft embeds all coordination services into Kafka itself and no longer relies on external systems. This greatly simplifies deployment and management because administrators only need to focus on the Kafka cluster.
  • Efficient consensus protocol: Raft is a concise and easy-to-understand consensus algorithm that is easy to implement and debug. KRaft uses the Raft protocol to achieve strongly consistent metadata management with an optimized replication mechanism.
  • Improved scalability of metadata operations: The new architecture allows more concurrent operations and reduces bottlenecks caused by scalability issues, especially in high-load scenarios.
  • Lower latency: After eliminating ZooKeeper as the middle layer, Kafka's latency performance is expected to improve, especially in scenarios involving leader election and metadata updates.
  • Complete autonomy: Because KRaft is an in-house product, the Kafka project has the final say on its architecture and code, and the future direction of the architecture is entirely in its own hands.
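
As a concrete illustration of the "completely built-in" point, the sketch below queries the KRaft metadata quorum through Kafka's own admin API instead of talking to ZooKeeper. It assumes a KRaft-mode cluster reachable at localhost:9092 and kafka-clients 3.3 or newer, where Admin#describeMetadataQuorum is available.

    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.QuorumInfo;

    // Minimal sketch: inspect the KRaft metadata quorum via the Kafka Admin API.
    // Assumes a KRaft-mode cluster at localhost:9092 and kafka-clients 3.3+.
    public class KraftQuorumInfo {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                QuorumInfo quorum = admin.describeMetadataQuorum().quorumInfo().get();
                System.out.println("quorum leader id: " + quorum.leaderId());
                quorum.voters().forEach(voter ->
                        System.out.println("voter " + voter.replicaId()
                                + " at log end offset " + voter.logEndOffset()));
            }
        }
    }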

KRaft design details

  • Decentralization of controller nodes: In KRaft mode, the controller role is taken on by a group of Kafka server processes rather than an independent ZooKeeper cluster. These nodes are jointly responsible for managing the cluster's metadata and reach agreement on it through Raft.

  • Log replication and recovery mechanism: Using Raft's log replication and state-machine application mechanism, KRaft provides strongly consistent metadata changes, which means all controller nodes converge on the same view of the cluster state (a simplified sketch of the majority-commit idea follows this list).
  • Dynamic cluster management: KRaft allows nodes to be added to or removed from the cluster dynamically, without manually updating configuration in ZooKeeper, which makes cluster management more convenient.
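
The following is a deliberately simplified, hypothetical model of that majority-commit idea, not Kafka's actual KRaft implementation: a metadata record counts as committed once the leader knows that a majority of the controller quorum has appended it.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical, simplified model of Raft-style majority commit for metadata
    // records. Illustration only, not Kafka's KRaft code. Requires Java 16+.
    public class MajorityCommitSketch {

        record MetadataRecord(long offset, String payload) { }

        static class ControllerQuorum {
            private final int voterCount;                     // size of the controller quorum
            private final List<MetadataRecord> leaderLog = new ArrayList<>();
            private long highWatermark = -1;                  // last committed offset

            ControllerQuorum(int voterCount) { this.voterCount = voterCount; }

            // The leader appends a record and "replicates" it; acks simulates how
            // many voters (leader included) have appended the record so far.
            void append(String payload, int acks) {
                long offset = leaderLog.size();
                leaderLog.add(new MetadataRecord(offset, payload));
                if (acks >= voterCount / 2 + 1) {             // a majority has the record
                    highWatermark = offset;                   // it is now committed
                }
            }

            long highWatermark() { return highWatermark; }
        }

        public static void main(String[] args) {
            ControllerQuorum quorum = new ControllerQuorum(3);
            quorum.append("create topic orders with 12 partitions", 3); // committed
            quorum.append("update ACL for user alice", 1);              // not committed yet
            System.out.println("high watermark: " + quorum.highWatermark()); // prints 0
        }
    }

In the real protocol the voters pull records from the leader and the high watermark advances as their fetch positions catch up; the toy model above keeps only the majority rule.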

Here is a brief comparison between ZooKeeper mode and KRaft mode:

  • Metadata storage: ZooKeeper mode keeps metadata in an external ensemble; KRaft keeps it in an internal metadata log managed by the Kafka controllers.
  • Deployment: ZooKeeper mode means operating two distributed systems; KRaft mode requires only Kafka itself.
  • Consistency: ZooKeeper mode synchronizes state between ZooKeeper and the controller; KRaft replicates metadata directly with Raft.
  • Scalability and latency: ZooKeeper can become a bottleneck as partition counts grow; KRaft is designed to scale metadata operations and to reduce leader election and metadata update latency.

Summary

In this article, we analyzed why Kafka is removing ZooKeeper. There are two main reasons: ZooKeeper can no longer keep up with Kafka's growth, and Kafka wants to build its own ecosystem. Faced with increasingly complex stream-processing requirements, KRaft mode gives Kafka a more efficient and more concise architecture. Whatever the outcome, Kafka and ZooKeeper once had a wonderful honeymoon period. I hope Kafka becomes even more powerful in KRaft mode and brings a better experience to its users.