DSH stream

A DSH stream is a collection of Kafka topics that you can share with other tenants, and that you can expose to clients outside the DSH via MQTT or HTTP:

  • A Kafka topic is a mechanism to store a sequence of messages.
  • An internal stream is a collection of Kafka topics that you can share with other tenants on the DSH:
    • You can give other tenants access with write permissions, read permissions, or both.
    • Every tenant with write permissions receives its own dedicated Kafka topic in the DSH stream that it can produce messages to.
    • Every tenant with read permissions can consume messages from all Kafka topics in the stream, via a wildcard subscription.
  • A public stream is a collection of Kafka topics that you can share both with other tenants, and with clients outside the DSH via the Messaging API:
    • A public stream has all the features of an internal stream: you can give other tenants on the DSH permission to produce messages to their own Kafka topic in the DSH stream, to consume from all Kafka topics in the stream, or to do both.
    • Additionally, external clients with the “Publish” permission can produce messages via the Messaging API to a dedicated Kafka topic in the public stream.
    • External clients can consume messages from all Kafka topics in the stream, again via the Messaging API.
  • Using MQTT and HTTP on public streams works best when many sources each send small volumes of data at a time. The Kafka Proxy is a better fit when a few sources send large volumes of data at a time.

Note

  • The DSH streams aren’t related to the Kafka Streams API.
  • Every DSH platform contains 1 Kafka cluster. The DSH stores the Kafka topics of all the platform’s tenants on this Kafka cluster, and it uses Access Control Lists (ACLs) to manage the access to these Kafka topics.

Limits

The DSH imposes the following limits on the use of Kafka:

  • Consumer rate: The maximum rate at which a tenant can read data from Kafka, in mebibytes per second (MiB/s).
  • Producer rate: The maximum rate at which a tenant can write data to Kafka, in mebibytes per second (MiB/s).
  • Request rate: The maximum amount of time that a broker is allowed to spend on handling a tenant’s requests, across all of its I/O and network threads, expressed as a percentage of CPU time:
    • If the request rate is 1 %, then one broker can spend 10 ms of CPU time per second on those requests.
    • If the request rate is 10 %, then one broker can spend 100 ms of CPU time per second on those requests.
    • If a broker exceeds this time, then it adds a delay before responding, to bring the usage back within the configured rate.
    • See Quotas in the Kafka documentation for more information.
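
As a client-side illustration of the request-rate quota above: when a broker throttles a tenant, the client library reports the delay. The sketch below registers a throttle callback with the confluent-kafka Python client; the broker address and topic name are placeholder assumptions, not actual DSH endpoints, and the TLS/authentication settings that the DSH requires are omitted.

  # Minimal sketch: observing broker throttling from a producer with
  # confluent-kafka. Broker address and topic name are placeholders, and the
  # TLS/authentication settings that the DSH requires are omitted here.
  from confluent_kafka import Producer

  def on_throttle(event):
      # Called when a broker delays a response to enforce a quota.
      print(f"Throttled by broker {event.broker_name} for {event.throttle_time:.3f} s")

  producer = Producer({
      "bootstrap.servers": "broker.example.dsh:9091",  # placeholder
      "throttle_cb": on_throttle,
  })

  for i in range(1000):
      producer.produce("internal.temperature.my-tenant", value=b"payload")
      producer.poll(0)  # serve delivery and throttle callbacks
  producer.flush()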

Note

  • The rates above only apply to the read/write processes of tenants on the DSH, that is, of their applications and services.
  • The rates don’t apply to read/write processes that involve the Messaging API; the DSH itself manages those rates.

You can request new limits:

  1. Click “Resources” > “Overview” in the menu bar of the DSH Console.
  2. Under “Kafka”, click the “Request” button for the limit that you want to change.
  3. Enter the desired value and click “Request”. A platform administrator will process your request, which appears as a ticket in the Support Portal.

Concepts of DSH streams

To create and configure a DSH stream, you need to understand some basic concepts of MQTT and Kafka.

MQTT and Kafka

The following concepts are relevant to both MQTT and Kafka:

  • Message: The record of an event, such as an order, payment, activity, or measurement.
  • Producer: A process that writes messages to a stream.
  • Consumer: A process that reads messages from a stream.
  • DSH stream: A collection of Kafka topics on the DSH.
  • Internal DSH stream: A collection of Kafka topics that only DSH tenants can access. The DSH doesn’t expose an internal stream to the public internet via the Messaging API.
  • Public DSH stream: A collection of Kafka topics that DSH tenants can access, and that the DSH also exposes to the public internet. Clients outside the DSH can access a public stream via the Messaging API.

MQTT

The following concepts are specific to MQTT:

  • MQTT topic: A string that producers and consumers use to organize and filter messages in the MQTT protocol:
    • The MQTT topic is a path that represents a hierarchical structure. For example, ‘/house/kitchen/sensor’ and ‘/house/bathroom/sensor’ represent sensors in different rooms of the same house.
    • As a consequence, messages with the same MQTT topic pertain to the same device or point of information.
    • If a client writes a message to the DSH via the Messaging API, then the MQTT topic must include the cluster (“tt”) and the name of the DSH stream. For a DSH stream called ‘temperature’, this results in the MQTT topics ‘/tt/temperature/house/kitchen/sensor’ and ‘/tt/temperature/house/bathroom/sensor’.
  • Retained: In the MQTT protocol, a producer can mark a message as “retained” for a specific MQTT topic. If a client subscribes to that MQTT topic, then the broker immediately sends it the retained message.
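
The following sketch shows what producing a retained message via the Messaging API’s MQTT endpoint could look like. It uses the paho-mqtt library; the host name, port, and credentials are placeholder assumptions, not actual Messaging API endpoints.

  # Minimal sketch: publishing a retained message on a public stream via MQTT.
  # Uses the paho-mqtt library (1.x-style constructor); host, port, and
  # credentials are placeholders, not real Messaging API endpoints.
  import paho.mqtt.client as mqtt

  STREAM = "temperature"                        # name of the public stream
  TOPIC = f"/tt/{STREAM}/house/kitchen/sensor"  # cluster ('tt') + stream name + device path

  client = mqtt.Client()
  client.tls_set()                                 # assume MQTT over TLS
  client.username_pw_set("my-client", "my-token")  # placeholder credentials
  client.connect("mqtt.example.dsh", 8883)         # placeholder host and port
  client.loop_start()

  # retain=True marks this message as the retained message for the MQTT topic.
  info = client.publish(TOPIC, payload='{"celsius": 21.5}', qos=1, retain=True)
  info.wait_for_publish()  # block until the message has been sent

  client.loop_stop()
  client.disconnect()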

Kafka

The following concepts are specific to Kafka:

  • Kafka topic: A log of events, in the form of messages.
    • Every tenant that has “Write” permission on a stream receives its own Kafka topic to publish messages to.
    • Public streams also have a Kafka topic (with the suffix “.dsh”) that the DSH uses for messages that come in via Messaging API. Only the DSH can write to this topic. See Naming conventions for more information.
  • Kafka broker: Kafka stores Kafka topics on multiple servers. Such an individual server is called a Kafka broker, and brokers are organized into clusters. Every DSH platform has one Kafka cluster.
  • Partition: Kafka can divide a Kafka topic into multiple partitions, which makes it easy to scale and keeps data available:
    • Kafka can store partitions on different brokers within one cluster, which distributes the workload.
    • If you use partitions on multiple brokers, then multiple instances of your application or service can read from the Kafka topic in parallel.
    • Kafka replicates each partition from its original broker to other brokers in the same cluster. If the original broker fails, then the replicated partitions are still available on the other brokers.
  • Segment: Kafka doesn’t store partitions in one big file on disk, but it divides them into smaller files, called segments.
  • Partitioner: The strategy that a producer uses to determine the partition that it should publish a message to.
    • Internal stream: Every producer is responsible for its own partitioning strategy, and decides which partition each message goes to.
    • Public stream: The DSH hashes the MQTT topic of a message to calculate the partition of that message. This ensures that messages with an identical MQTT topic always end up in the same partition, maintaining the order of the messages.
  • Message key: An identifier that a producer attaches to a message, and that Kafka uses for log compaction. For public streams, the DSH uses the MQTT topic as the message key.
  • Log compaction: A cleanup mechanism where Kafka removes any old messages from a partition if a newer message with the same key is present in the partition. On the DSH, log compaction takes place when a segment has reached its maximum size.
  • Stream contract: Settings for a public stream that all producers should comply with. The stream contract specifies whether the DSH stream supports retained messages, and which partitioning scheme to follow to determine the partition.
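
To make the message key and partitioner concepts concrete, the sketch below produces keyed messages to a tenant’s own topic in an internal stream, so that all messages with the same key land in the same partition. It uses the confluent-kafka Python client; the broker address, topic name, choice of key, and partitioner setting are illustrative assumptions, and the security settings that the DSH requires are omitted.

  # Minimal sketch: producing keyed messages so that all messages with the
  # same key end up in the same partition (and can later be compacted per key).
  # Broker address and topic name are placeholders.
  from confluent_kafka import Producer

  producer = Producer({
      "bootstrap.servers": "broker.example.dsh:9091",  # placeholder
      # Assumption: use a murmur2-based partitioner so that keyed messages are
      # placed the same way as with the Java client's default partitioner.
      "partitioner": "murmur2_random",
  })

  readings = [
      ("house/kitchen/sensor", b'{"celsius": 21.5}'),
      ("house/bathroom/sensor", b'{"celsius": 23.0}'),
      ("house/kitchen/sensor", b'{"celsius": 21.7}'),
  ]

  for key, value in readings:
      # Messages that share a key always land in the same partition,
      # which preserves their relative order.
      producer.produce("internal.temperature.my-tenant", key=key, value=value)

  producer.flush()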

Naming conventions

The DSH uses the <stream-type>.<stream-name> scheme for streams. For example:

  • ‘internal.temperature’: an internal stream called ‘temperature’.
  • ‘stream.moisture’: a public stream called ‘moisture’.

Note

The technical name (the stream-type prefix) for an internal stream is “internal”, but for a public stream it is “stream”.

The DSH uses the <stream-type>.<stream-name>.<tenant-name> scheme for Kafka topics inside a stream. The tenant name “dsh” is reserved for topics that only the DSH can write to. For example:

  • ‘internal.temperature.tenant-a’: The Kafka topic ‘tenant-a’ inside the internal stream called ‘temperature’. Only tenant-a can publish to this topic.
  • ‘internal.temperature.tenant-b’: The Kafka topic ‘tenant-b’ inside the internal stream called ‘temperature’. Only tenant-b can publish to this topic.
  • ‘stream.moisture.tenant-c’: The Kafka topic ‘tenant-c’ inside the public stream called ‘moisture’. Only tenant-c can publish to this topic.
  • ‘stream.moisture.dsh’: The topic ‘dsh’ inside the public stream called ‘moisture’. Only the DSH can publish to this topic, and this topic contains the messages that clients produce to the DSH stream via Messaging API.
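
The naming scheme above is what makes the wildcard subscription for readers work: a consumer with read permission can cover every Kafka topic in a stream with a single regular expression. The sketch below illustrates this with the confluent-kafka Python client; the broker address and group id are placeholder assumptions, and the security settings that the DSH requires are omitted.

  # Minimal sketch: consuming from all Kafka topics in a stream via a regex
  # subscription. Broker address and group id are placeholders.
  from confluent_kafka import Consumer

  consumer = Consumer({
      "bootstrap.servers": "broker.example.dsh:9091",  # placeholder
      "group.id": "my-tenant.temperature-reader",      # placeholder
      "auto.offset.reset": "earliest",
  })

  # A leading '^' makes this a regex subscription: it matches
  # 'internal.temperature.tenant-a', 'internal.temperature.tenant-b', and so on.
  consumer.subscribe(["^internal\\.temperature\\..*"])

  try:
      while True:
          msg = consumer.poll(1.0)
          if msg is None or msg.error():
              continue
          print(msg.topic(), msg.key(), msg.value())
  finally:
      consumer.close()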

Managing DSH streams

You can manage your DSH streams in the DSH Console.

Adding a DSH stream

If you want to add a stream to your tenant, then you need to create a ticket in the DSH Support Portal:

  1. Click the “Support” button in the menu bar of the DSH Console. Log in to the DSH Support Portal if necessary.
  2. Click “New support ticket” next to the search bar.
  3. Fill out the following fields:
    • Tenant: Enter the name of your tenant.
    • Platform: In the dropdown menu, select the platform that you want to request a stream for.
    • Requester: Enter your email address, or use the prefilled address of your account for the DSH Support Portal.
    • Company: In the dropdown menu, select the company that you request a stream for.
    • Subject: Enter the stream type that you want to request:
      • ‘Request new internal stream’
      • ‘Request new public stream’
    • Contents: Provide the information that is listed in the table below.
    • Attach a file: Optionally, you can provide the information in a file and attach it to the ticket.
  4. Click “Submit” to send the request. A platform administrator will process it.

The platform administrator needs the information listed below to create a DSH stream. Provide the information in the “Contents” field of your ticket, or in an attached file.

Field Description
Name

Enter the name of the DSH stream. The name must be unique in the platform, and it must obey the following rules:

  • It starts and ends with a lowercase letter (a–z), or a number (0–9).
  • It can contain hyphens (-), lowercase letters (a–z), or numbers (0–9).
  • It has a length between 3 and 100 characters.

If you share the DSH stream with another tenant, then they will see the stream's name.

Owner

Enter the tenant who owns the DSH stream.

Type

Choose between "Internal stream" or "Public stream".

Purpose

Describe what the DSH stream will be used for. This provides context to the platform administrator and allows them to choose the correct settings.

Data profile

Describe the profile of the data that will be stored in the DSH stream. For example:

  • Average number of messages per second
  • Average amount of data per second, e.g. 1 MiB/s
  • The frequency, for example "Continuous", "Intermittent", or "Burst"

Maximum message size

Enter the maximum size of published messages, in kibibytes (KiB):

  • If you request a public stream, then the maximum size can't be greater than 128 KiB, because this is the maximum size of messages in the MQTT protocol.
  • If you request an internal stream, then the default value is 1 MiB (1024 KiB). You can request a new value if necessary.

Number of partitions

Enter the number of partitions that you need for each Kafka topic in the DSH stream. The default value is "6".

This number defines the stream parallelism: each tenant service reading from the stream can have up to this number of instances to process messages in parallel. See How to Choose the Number of Topics/Partitions in a Kafka Cluster? and Picking the number of partitions for a topic if you need some pointers.

Note that it's hard to change the number of partitions once the DSH stream exists.

Replication factor

This indicates how many copies of each partition Kafka stores across brokers. The default value is "3", which means that Kafka replicates a partition on a specific broker to 2 other brokers in the same cluster.

Access

List the tenants that require access to the DSH stream, and which permissions they need ("Read", "Write", or "Read/Write"). Every tenant with the "Write" permission receives its own Kafka topic in the DSH stream.

Kafka topic properties

Optionally, you can request to set the following properties for the Kafka topics in the DSH stream. Click the property to find out more about it in the Apache Kafka documentation:

  • max.message.bytes: The default value on the DSH is "1048588" (1 MiB), and the minimum value on the DSH is "1024" (1 KiB).
  • compression.type: The default value on the DSH is "producer".
  • message.timestamp.type: The default value on the DSH is "CreateTime".
  • cleanup.policy: The default value on the DSH is "delete".
  • The following properties are relevant if you choose the "delete" cleanup policy:
    • retention.bytes: The default value on the DSH is "-1".
    • retention.ms: The default value on the DSH is "604800000" (7 days). The minimum value on the DSH is "3600000" (1 hour), and the maximum value is "31557600000" (1 year).
  • The following property is relevant if you choose the "compact" cleanup policy:
    • segment.bytes: The default value on the DSH is "1073741824" (1 GiB). The minimum value on the DSH is "1024" (1 KiB), and the value should be greater than or equal to the value for max.message.bytes.
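
If you want to check which of these properties are in effect on an existing topic, you could query them with the Kafka admin API, as in the hedged sketch below; the broker address and topic name are placeholders, and it assumes that your credentials are allowed to describe topic configurations.

  # Minimal sketch: reading the effective topic configuration with the Kafka
  # admin API (confluent-kafka). Broker address and topic name are placeholders,
  # and it assumes that your credentials may describe topic configs.
  from confluent_kafka.admin import AdminClient, ConfigResource

  admin = AdminClient({"bootstrap.servers": "broker.example.dsh:9091"})  # placeholder

  resource = ConfigResource(ConfigResource.Type.TOPIC, "internal.temperature.my-tenant")
  futures = admin.describe_configs([resource])

  for res, future in futures.items():
      configs = future.result()  # dict of config name -> ConfigEntry
      for name in ("max.message.bytes", "compression.type", "message.timestamp.type",
                   "cleanup.policy", "retention.bytes", "retention.ms", "segment.bytes"):
          print(f"{name} = {configs[name].value}")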

For public streams, the platform engineer also creates a so-called stream contract. The contract specifies the partitioning strategy for Kafka, and whether the stream supports retained messages. For that reason, you should also provide the following information when you request a public stream:

Field Description
Retained messages

This indicates whether the DSH stores MQTT messages in the Latest Value Store if they are marked as "Retained". The default value is "Yes".

  • MQTT offers the feature of retained messages: if a message is the retained message for an MQTT topic, then a consumer receives it immediately after subscribing to that MQTT topic.
  • However, Kafka doesn't have this concept of retained messages, so the DSH uses the so-called Latest Value Store to save the retained messages.
  • If set to "Yes", then the DSH stores the latest retained message both in the Kafka topic and in the Latest Value Store. It sends the retained message from the Latest Value Store immediately if a client subscribes to an MQTT topic pattern that matches the MQTT topic of the retained message.

The setting for retained messages is part of the stream contract. This means that every producer must adhere to the setting, even if the producer is a service on the DSH that writes to the public stream directly without using MQTT or HTTP.

Topic level for partitioning

The DSH hashes the MQTT topic path to calculate the partition number in the Kafka topic, using Murmur2. In the default setting, the DSH hashes the entire MQTT topic path to determine the partition.

However, you can instead use only a part of the MQTT topic path for this process, and the topic level then determines which part. The default value for the topic level is "1". For example:

  • '/tt/temperature/house/kitchen/sensor/state' is the MQTT topic path of the message.
  • 'house' is the MQTT topic path that the DSH uses for partitioning if the topic level is "1": the DSH always strips the cluster ('tt') and the stream name ('temperature').
  • 'house/kitchen' is the MQTT topic path that the DSH uses for partitioning if the topic level is "2".
  • 'house/kitchen/sensor' is the MQTT topic path that the DSH uses for partitioning if the topic level is "3".

The topic level is part of the stream contract. This means that every producer must adhere to the setting, even if the producer is a service on the DSH that writes to the public stream directly without using the Messaging API. In other words: don't use your own partitioning scheme if you publish messages to a public stream. Instead, follow the partitioning scheme in the contract, which can either use the complete MQTT topic path (default setting), or a specific level of the MQTT topic path.
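
As a plain illustration of which part of the MQTT topic path the topic level selects (the Murmur2 hashing and the actual partition assignment happen inside the DSH), the sketch below strips the cluster and stream name and keeps the requested number of levels.

  # Minimal sketch: which part of an MQTT topic path the topic level selects.
  # The hashing (Murmur2) and the partition assignment are internal to the DSH;
  # this only illustrates the level-based truncation described above.
  def partition_input(mqtt_topic, topic_level=None):
      # '/tt/temperature/house/kitchen/sensor/state'
      #   -> ['house', 'kitchen', 'sensor', 'state'] after stripping 'tt' and the stream name
      levels = mqtt_topic.strip("/").split("/")[2:]
      if topic_level is None:
          return "/".join(levels)            # default: the complete remaining path
      return "/".join(levels[:topic_level])  # only the first <topic_level> levels

  topic = "/tt/temperature/house/kitchen/sensor/state"
  print(partition_input(topic, 1))  # house
  print(partition_input(topic, 2))  # house/kitchen
  print(partition_input(topic, 3))  # house/kitchen/sensor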

MQTT permissions

Enter the permissions that clients have if they connect via the Messaging API. These permissions are a combination of the following:

  1. An MQTT topic path, and you can use wildcards to define it. For example, the following are valid topic paths:
    • /tt/temperature/house/kitchen/sensor
    • /tt/temperature/#
    • /tt/temperature/house/+/sensor
  2. Permissions for that MQTT topic path: "Publish", "Subscribe" or "Publish/Subscribe".

These MQTT permissions pertain to the messages' MQTT topic path, irrespective of the Kafka topic that these messages are stored in:

  • If an external client produces messages to a public stream, then the DSH writes these messages to the Kafka topic with the ".dsh" suffix.
  • If an external client consumes messages from a public stream, then it consumes messages from all Kafka topics in the public stream.
  • The MQTT permissions only define which MQTT topic paths an external client can access, and these permissions don't restrict access to Kafka topics in the DSH stream.
  • See Messaging API for more information.
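
The sketch below shows what a client with the "Subscribe" permission on a wildcard topic path could look like, again using the paho-mqtt library; the host, port, and credentials are placeholder assumptions. If the stream supports retained messages, matching retained messages arrive right after the subscription is established.

  # Minimal sketch: subscribing with an MQTT wildcard topic path.
  # Uses the paho-mqtt library (1.x-style constructor); host, port, and
  # credentials are placeholders, not real Messaging API endpoints.
  import paho.mqtt.client as mqtt

  def on_message(client, userdata, msg):
      # msg.retain is True for retained messages delivered on subscription.
      print(f"{msg.topic} (retained={bool(msg.retain)}): {msg.payload!r}")

  client = mqtt.Client()
  client.on_message = on_message
  client.tls_set()                                 # assume MQTT over TLS
  client.username_pw_set("my-client", "my-token")  # placeholder credentials
  client.connect("mqtt.example.dsh", 8883)         # placeholder host and port

  # '+' matches exactly one topic level, so this covers every room's sensor.
  client.subscribe("/tt/temperature/house/+/sensor", qos=1)
  client.loop_forever()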

Inspecting a DSH stream

Take the following steps to inspect a DSH stream:

  1. Click “Resources” > “Streams” in the menu bar of the DSH Console.
  2. The overview page for DSH streams lists all streams that you have access to, along with the following information:
    • Type: The icon in front of the stream’s name indicates whether it’s an internal stream or a public stream.
    • Name: The name of the DSH stream in question.
    • Read/Write: Which permissions you have on the DSH stream: “R” (“Read”), “W” (“Write”), or “R/W” (“Read/Write”).
    • Derived: If the DSH created the stream as part of a service, then the service in question appears in the “Derived” column.
    • Data Catalog: Indicates whether the DSH stream is shared with the KPN Data Catalog. This feature is available on a limited number of DSH platforms only.
    • Metrics: The DSH can display the following metrics, depending on the status of your DSH stream:
      • Errors /s: The average number of errors per second, measured over a period of an hour.
      • Message (write) /s: The average number of messages that your tenant writes to the DSH stream per second, measured over a period of an hour.
      • Bytes (write) /s: The average number of bytes that your tenant writes to the DSH stream per second, measured over a period of an hour.
      • Messages (total) /s: The average number of messages that all publishers (tenants and MQTT) write to the DSH stream per second, measured over a period of an hour.
      • Bytes (total) /s: The average number of bytes that all publishers (tenants and MQTT) write to the DSH stream per second, measured over a period of an hour.
      • Messages retained /s: The average number of retained messages per second, measured over a period of an hour.
      • Messages in store: The total number of messages in the Latest Value Store.