Ingesting data into Apache Kafka topics with multiple partitions:

Can order be maintained between partitions?

Jeni Joe
2 min read · Feb 8, 2021

I recently started working on a client engagement using Apache Kafka. The client is very well established, and is now transitioning to Apache Kafka.

Kafk(huh)?
In short, Apache Kafka allows the decoupling of source systems from target systems. This is especially useful when there are multiple source and target systems that all need to exchange data with each other. With Kafka, the source systems write to Kafka, and the target systems consume from it. Kafka is distributed, and it provides horizontal scaling and fault tolerance.

Use cases:
User activity tracking and recommendation systems, operational metrics management, and notification systems (my client’s main concern), amongst many others.

Others who use Kafka:
Uber, Netflix, LinkedIn, Pinterest, Tinder,…

Messages, topics, offsets, partitions:
To start with, messages in Kafka are written to Kafka topics. Once written, a message is immutable, and it is deleted only after the retention period expires, which by default is one week.
Topic partitions are stored on a Kafka cluster. A cluster is made up of servers called brokers.
Kafka, being distributed, spreads partitions across brokers in a round-robin fashion, balancing the load.
Topics are identified by name. A topic can have multiple partitions, with the number of partitions specified at topic creation.
Having multiple partitions is useful because it enables multiple consumers to read from a topic in parallel.
Messages within a partition are ordered by offset (a unique, sequential ID).
Kafka guarantees ordering within a partition: consumers read messages from a partition in offset order.
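The mechanics above can be sketched in plain Python (a toy model, not the real client API): a topic is a set of per-partition logs, a message's offset is its index in its partition's log, and keyless messages are spread round-robin.

```python
from collections import defaultdict

# Toy model of a topic: partition number -> ordered log of messages.
# A message's offset is simply its index within its partition's log.
def produce_round_robin(messages, num_partitions):
    partitions = defaultdict(list)
    for i, msg in enumerate(messages):
        p = i % num_partitions      # round-robin, as when no key is set
        partitions[p].append(msg)   # order within each partition is preserved
    return dict(partitions)

topic = produce_round_robin(["m0", "m1", "m2", "m3", "m4"], num_partitions=3)
# topic[0] == ["m0", "m3"], topic[1] == ["m1", "m4"], topic[2] == ["m2"]:
# each partition keeps the order of the messages it received, but no order
# is defined across partitions.
```

Each partition log can be read in parallel by a different consumer, which is exactly where the parallelism benefit (and the cross-partition ordering question) comes from.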

And thus, we get to the issue at hand: What about the ordering of messages between partitions?

If it is imperative that all messages be totally ordered, the solution is to use a single partition. This does not come without a price, the most obvious being the loss of parallelism across consumers.
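To see why a single partition restores total order, here is a toy sketch (plain Python, not the Kafka client API): a reader draining several partition logs in turn can emit messages out of produce order, while a single partition always reproduces it.

```python
def interleave(partitions):
    """Read one message at a time from each partition log in turn."""
    out = []
    iters = [iter(p) for p in partitions]
    while iters:
        alive = []
        for it in iters:
            try:
                out.append(next(it))
                alive.append(it)    # keep only partitions with messages left
            except StopIteration:
                pass
        iters = alive
    return out

# Two partitions: the order seen across partitions differs from produce order.
multi = interleave([["m0", "m1"], ["m2", "m3"]])   # ["m0", "m2", "m1", "m3"]
# One partition: consumption order equals produce order.
single = interleave([["m0", "m1", "m2", "m3"]])    # ["m0", "m1", "m2", "m3"]
```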

I want order; give me order

Replication factor
A replication factor greater than 1 (usually 3) ensures availability: if a broker fails, another broker holds a copy of the partition.
Setting acks (acknowledgements) to all on the producer protects against data loss even in the case of broker failures: with acks=all, the leader does not acknowledge a write until all in-sync replicas have received it.
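As a sketch, these settings map to producer configuration like the following (librdkafka/confluent-kafka key names; the broker address and topic name are assumptions for illustration):

```python
# Durability-oriented producer settings (librdkafka/confluent-kafka key names;
# a sketch, not a drop-in config -- adjust for your client library).
durable_producer_config = {
    "bootstrap.servers": "localhost:9092",  # assumed broker address
    "acks": "all",  # leader waits for all in-sync replicas before acknowledging
}

# The replication factor is set per topic at creation time, e.g.:
#   kafka-topics.sh --create --topic events --partitions 3 --replication-factor 3
```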

Message keys
If total ordering is not imperative, setting a message key ensures that every message with a given key is sent to the same partition, so all messages sharing a key are ordered relative to each other.
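The key-to-partition mapping works because the partition is derived from a hash of the key. A toy partitioner shows the point (the real default partitioner uses murmur2, but any deterministic hash behaves the same way):

```python
import hashlib

# Toy partitioner: deterministic hash of the key, modulo the partition count.
# Same key in, same partition out -- so per-key ordering holds.
def partition_for(key: bytes, num_partitions: int) -> int:
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

p1 = partition_for(b"user-42", 6)
p2 = partition_for(b"user-42", 6)
# p1 == p2: every message keyed "user-42" lands in the same partition,
# so events for that user stay in order.
```

One caveat worth remembering: this only holds while the partition count stays fixed, since changing num_partitions changes where keys map.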

Using Kafka Streams and Exactly-once delivery
The Kafka Streams API offers exactly-once processing semantics, ensuring each message is processed, and its results are written, exactly once: no losses and no duplicates.
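Under the hood, exactly-once builds on idempotent, transactional producer writes. A sketch of the relevant settings, in librdkafka/confluent-kafka key names (the broker address and transactional id are assumptions; a Kafka Streams application would instead set processing.guarantee in its Streams config):

```python
# Settings that underpin exactly-once writes (librdkafka/confluent-kafka key
# names; a sketch, not a drop-in config).
eos_producer_config = {
    "bootstrap.servers": "localhost:9092",   # assumed broker address
    "enable.idempotence": True,   # broker de-duplicates retried batches
    "transactional.id": "orders-writer-1",   # hypothetical id; enables transactions
    "acks": "all",                # required when idempotence is enabled
}
```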

Retries and max.in.flight.requests.per.connection
Setting max.in.flight.requests.per.connection to 1 ensures that while the producer is retrying a batch of messages, no further batches are sent, so a retried batch cannot be overtaken by a later one. This maintains strict ordering of messages at the cost of throughput.
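In configuration terms, that combination looks like this (librdkafka/confluent-kafka key names; a sketch, with the retry count chosen for illustration):

```python
# Strict ordering under retries: allow only one batch in flight at a time
# (librdkafka/confluent-kafka key names; a sketch, not a drop-in config).
strict_order_config = {
    "retries": 5,  # retry transient send failures
    "max.in.flight.requests.per.connection": 1,  # one unacknowledged batch at a time
}
```

With one batch in flight, the producer waits for each batch to be acknowledged before sending the next, which is exactly where the throughput cost comes from.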

Reference: http://kafka.apache.org/documentation/#semantics
