Kafka partitioner results

If a producer sends a message to Kafka, then it needs to specify the partition for that message. The producer’s partitioner maps each message to a Kafka topic partition, and the producer sends a request to the Kafka brokers to write the message to that specific partition. This is especially relevant for public DSH streams, where the DSH acts as a producer for messages that come in from HTTP clients or MQTT clients, via the messaging API.

If you create your own Kafka producer, then it needs to use the same partitioning algorithm as the DSH, to make sure that messages with the same MQTT topic end up in the same partition, maintaining the order and history of messages. This page describes the 2 available types of partitioning logic, and provides expected results, so that you can test your implementation of the partitioning algorithm.

When you request a public DSH stream, you need to specify its partitioner. There are 2 options:

  1. Default Kafka partitioner: The partitioner uses the full MQTT topic as input to calculate the partition.
  2. Topic-level partitioner: The partitioner uses the first n levels of the MQTT topic as input to calculate the partition, where n is an integer that defaults to 1.

Default Kafka partitioner

The default Kafka partitioner uses the full MQTT topic to calculate the partition. It hashes the MQTT topic using murmur2, and then applies the modulo operation to it, with the number of partitions of the DSH stream as a divisor:

murmur2(mqtt_topic) % number_of_partitions

Some things to bear in mind:

  • The MQTT topic in question doesn’t have a leading forward slash (/).
  • You can find the number of partitions of a DSH stream in the environment variables of your service, see Retrieve the Kafka configuration for more information.
  • The DSH uses the Java implementation of the murmur2 function from the standard Kafka library:
    • Java has no unsigned integer types, so make sure that you account for this in your implementation by adding a bitmask. See the Python script of the Publish to DSH stream page for an example.
    • Java integer multiplication wraps around on overflow, so make sure that your code truncates the intermediate results to 32 bits in the same way.
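
The notes above can be sketched in Python. This is a minimal sketch, not the DSH's own code: it ports the murmur2 function from the Kafka Java client, and it assumes that the bitmask mentioned above corresponds to clearing the sign bit (`& 0x7fffffff`) before the modulo, as Kafka's Java client does. The names `MASK32`, `murmur2` and `default_partition` are illustrative. Check the output against the expected results on this page before you rely on it.

```python
MASK32 = 0xFFFFFFFF  # keeps intermediate values within 32 bits, like a Java int


def murmur2(data: bytes) -> int:
    """Port of murmur2 from the Kafka Java client; returns an unsigned 32-bit value."""
    length = len(data)
    m = 0x5BD1E995
    h = (0x9747B28C ^ length) & MASK32  # seed XOR input length

    # Process the input in 4-byte little-endian blocks.
    full_blocks = length // 4
    for i in range(full_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * m) & MASK32  # emulate Java's wrapping multiplication
        k ^= k >> 24          # on an unsigned value this equals Java's >>>
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k

    # Mix in the 1 to 3 bytes that remain after the last full block.
    tail = data[4 * full_blocks:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & MASK32

    # Final mixing steps.
    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def default_partition(mqtt_topic: str, number_of_partitions: int) -> int:
    """Hash the full MQTT topic (without leading slash) and map it to a partition."""
    hashed = murmur2(mqtt_topic.encode("utf-8"))
    # Assumption: clear the sign bit before the modulo, as Kafka's Java client does.
    return (hashed & 0x7FFFFFFF) % number_of_partitions


if __name__ == "__main__":
    # Print the partitions for the topics and partition counts used on this page.
    for topic in ("AAA", "AAA/BBB", "AAA/BBB/CCC"):
        print(topic, [default_partition(topic, n) for n in (1, 4, 12)])
```

Working on unsigned 32-bit values throughout means that Python's `>>` behaves like Java's unsigned shift, so the only places that need special care are the multiplications and the final sign-bit mask.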

The table below lists the expected results for the default partitioner, given a series of MQTT topics and number of partitions:

MQTT topic     Number of partitions   Result
AAA            1                      0
AAA            4                      2
AAA            12                     2
AAA/BBB        1                      0
AAA/BBB        4                      2
AAA/BBB        12                     2
AAA/BBB/CCC    1                      0
AAA/BBB/CCC    4                      3
AAA/BBB/CCC    12                     7

Topic-level partitioner

The topic-level partitioner only uses the first n levels of the MQTT topic to calculate the partition, where n is defined by the “topic level depth” setting of the DSH stream. Its default value is “1”. The partitioner reduces the MQTT topic to the first n levels, hashes the reduced MQTT topic using murmur2, and then applies the modulo operation to it, with the number of partitions of the DSH stream as a divisor:

murmur2(reduce_topic(mqtt_topic, topic_level_depth)) % number_of_partitions

Some things to bear in mind:

  • The MQTT topic in question doesn’t have a leading forward slash (/).
  • You can find the number of partitions and the topic level depth of a DSH stream in the environment variables of your service, see Retrieve the Kafka configuration for more information.
  • The DSH uses the Java implementation of the murmur2 function from the standard Kafka library:
    • Java has no unsigned integer types, so make sure that you account for this in your implementation by adding a bitmask. See the Python script of the Publish to DSH stream page for an example.
    • Java integer multiplication wraps around on overflow, so make sure that your code truncates the intermediate results to 32 bits in the same way.
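
The reduction step can be sketched as follows. This is a minimal sketch under assumptions: `reduce_topic` and `topic_level_partition` are illustrative names, the hash is passed in as a hypothetical `hash_fn` parameter so that you can plug in your own Kafka-compatible murmur2 port, and the sign-bit mask is assumed to match what Kafka's Java client applies before the modulo. Topics with fewer levels than the topic level depth are kept whole, which is what the expected results on this page imply.

```python
from typing import Callable


def reduce_topic(mqtt_topic: str, topic_level_depth: int) -> str:
    """Keep only the first topic_level_depth levels of an MQTT topic."""
    levels = mqtt_topic.split("/")
    # Slicing past the end is safe: shorter topics are returned unchanged.
    return "/".join(levels[:topic_level_depth])


def topic_level_partition(
    mqtt_topic: str,
    topic_level_depth: int,
    number_of_partitions: int,
    hash_fn: Callable[[bytes], int],
) -> int:
    """Reduce the topic, hash the reduced topic, and map it to a partition."""
    reduced = reduce_topic(mqtt_topic, topic_level_depth)
    hashed = hash_fn(reduced.encode("utf-8"))
    # Assumption: clear the sign bit before the modulo, as Kafka's Java client does.
    return (hashed & 0x7FFFFFFF) % number_of_partitions


if __name__ == "__main__":
    # Show which part of the topic is hashed for each topic level depth.
    for depth in (1, 2, 3):
        print(depth, reduce_topic("AAA/BBB/CCC", depth))
```

With a topic level depth of 1, "AAA", "AAA/BBB" and "AAA/BBB/CCC" all reduce to "AAA", so they land in the same partition.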

The table below lists the expected results for the topic-level partitioner, given a series of MQTT topics, number of partitions, and topic level depths:

MQTT topic     Number of partitions   Result (topic level depth 1)   Result (topic level depth 2)   Result (topic level depth 3)
AAA            1                      0                              0                              0
AAA            4                      2                              2                              2
AAA            12                     2                              2                              2
AAA/BBB        1                      0                              0                              0
AAA/BBB        4                      2                              2                              2
AAA/BBB        12                     2                              2                              2
AAA/BBB/CCC    1                      0                              0                              0
AAA/BBB/CCC    4                      2                              2                              3
AAA/BBB/CCC    12                     2                              2                              7