# What is the Data Services Hub (DSH)?

The DSH is a streaming data platform.
## What is a “platform”?

- Something you can build (applications) on.
- It provides reusable infrastructure.
- It takes care of recurring and tedious tasks.
- It shouldn’t hamper creativity.
## What is “streaming data”?

> … data that is generated continuously by many independent data
> sources. Typically, this data is of small size (order of kilobytes).
Data is *streaming* if it is produced and transmitted *without delays or batching*.

The smallest unit of streamed data is called a *message* or *event*.

Streaming data is sorted, collected, or aggregated in a *stream* or a *topic*.
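The following minimal in-memory sketch makes these terms concrete. It models a topic as an ordered, append-only log of messages. The `Message` and `Topic` classes and the topic name are hypothetical illustrations, not the DSH or Kafka API. A real Kafka topic is also partitioned, replicated and persisted.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(frozen=True)
class Message:
    """One event: the smallest unit of streamed data (hypothetical shape)."""
    key: str    # e.g. the id of the source that produced it
    value: Any  # the payload, typically small (order of kilobytes)


@dataclass
class Topic:
    """Toy in-memory topic: an ordered, append-only log of messages."""
    name: str
    log: List[Message] = field(default_factory=list)

    def produce(self, message: Message) -> int:
        """Append one message right away (no batching) and return its offset."""
        self.log.append(message)
        return len(self.log) - 1

    def consume(self, offset: int) -> List[Message]:
        """Read every message from `offset` onwards without removing anything."""
        return self.log[offset:]


# Usage: each consumer tracks its own position (offset) in the log.
readings = Topic("sensor-readings")  # hypothetical topic name
readings.produce(Message("sensor-1", 21.5))
readings.produce(Message("sensor-2", 19.0))
consumer_offset = 1
for msg in readings.consume(consumer_offset):
    print(msg.key, msg.value)  # prints: sensor-2 19.0
```

Reading does not remove messages. This means several consumers can read the same topic independently, each at its own offset.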
## Types of streaming data

Not all data streams are created equal.
## What should a streaming data platform be able to do?

- Handle hundreds of thousands of sources.
- Send data to hundreds of thousands of sinks.
- Process data: clean, refine, aggregate, combine…
- Share data streams with other parties.
- Do all of this with high security standards.
## Key concepts

- Scalable
- Data as low-latency events (streams)
- Real-time processing
- Data sharing
- Secure
## Types of streaming data on the DSH

| Type of stream | Example | Handled via |
| --- | --- | --- |
| Many sources, low volume | Single sensor | MQTT |
| Few sources, high volume | Processing of streams | Kafka |
# How does the DSH work?

Let’s go through the DSH one step at a time.
## The DSH step by step

The Kafka cluster is the core of messaging in the DSH. The MQTT adapter allows easy input and output of data.
As ‘Tenant A’, you receive an isolated environment on the DSH.
The DSH manages your access to Kafka topics and DSH streams via an Access Control List (ACL).
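The idea behind an ACL can be sketched as a set of explicit grants with a default of "deny". This is a hypothetical model for illustration only. It is not the DSH's ACL format. Kafka's own ACLs also carry fields such as host and permission type (allow or deny), which this sketch omits. The tenant and stream names are placeholders.

```python
from typing import Iterable, NamedTuple, Set


class AclEntry(NamedTuple):
    """One grant: `principal` may perform `operation` on `resource`."""
    principal: str  # e.g. a tenant
    resource: str   # e.g. a Kafka topic or a DSH stream
    operation: str  # e.g. "read" or "write"


class Acl:
    """Toy access control list: anything not explicitly granted is denied."""

    def __init__(self, entries: Iterable[AclEntry] = ()) -> None:
        self._entries: Set[AclEntry] = set(entries)

    def grant(self, principal: str, resource: str, operation: str) -> None:
        self._entries.add(AclEntry(principal, resource, operation))

    def revoke(self, principal: str, resource: str, operation: str) -> None:
        self._entries.discard(AclEntry(principal, resource, operation))

    def is_allowed(self, principal: str, resource: str, operation: str) -> bool:
        return AclEntry(principal, resource, operation) in self._entries


acl = Acl()
acl.grant("tenant-a", "public-stream", "read")  # hypothetical names
print(acl.is_allowed("tenant-a", "public-stream", "read"))   # True
print(acl.is_allowed("tenant-a", "public-stream", "write"))  # False: never granted
```

The key design choice is the default: a request that matches no entry is refused. It is never silently allowed.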
As a tenant, you can deploy applications (containers) in your isolated environment.
You also have access to resources, each subject to specific limits.
The DSH offers services for monitoring, tracing, Public Key Infrastructure (PKI), and more.
The DSH also offers easy ways to interact with Kafka, such as the Kafka Proxy and adapter services.
# Data flow of the DSH
How does the data flow to, on and from the DSH?
Data sources and data sinks connect to the DSH via MQTT bridges.
The DSH stores messages from data sources on a so-called public stream. Data sinks consume from this public stream.
A stream is a collection of Kafka topics, and the DSH manages access to the stream. For example, the DSH gives Tenant A permission to consume from the public stream.
Tenant A processes the data and produces new data. However, Tenant A wants this new data to be accessible only on the DSH.
Tenant A publishes the new data on a so-called internal stream. Tenant A can give other tenants permission to consume from this internal stream.
For example, Tenant B has permission to consume from the internal stream. However, Tenant C can’t even see this stream.
Tenant A also produces a separate set of data, which only Tenant A should be able to access.
Tenant A writes the data to a private Kafka topic. Only Tenant A can access this topic, and the other tenants can’t even see it.
This is only one example of the many possible data flows in the DSH. Data isn’t shared by default, and even MQTT access is on an explicit opt-in basis.
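The data flow walked through above can be condensed into a small model with three scopes: public, internal and private. Visibility follows from explicit read grants. This is a hypothetical sketch, not how the DSH implements access control. The owner `"dsh"` for the public stream is a placeholder, and the private topic is modelled with the same class for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Scope(Enum):
    PUBLIC = "public"      # fed by data sources, read by data sinks (via MQTT bridges)
    INTERNAL = "internal"  # shared with chosen tenants, on the DSH only
    PRIVATE = "private"    # a private Kafka topic: owner only


@dataclass
class Stream:
    """Hypothetical model; a PRIVATE 'stream' stands in for a private Kafka topic."""
    name: str
    owner: str
    scope: Scope
    readers: Set[str] = field(default_factory=set)  # nothing is shared by default

    def grant_read(self, tenant: str) -> None:
        if self.scope is Scope.PRIVATE:
            raise PermissionError(f"{self.name} is private to {self.owner}")
        self.readers.add(tenant)

    def can_read(self, tenant: str) -> bool:
        return tenant == self.owner or tenant in self.readers


def visible_to(tenant: str, streams: List[Stream]) -> List[str]:
    """A tenant only sees what it may read; everything else stays hidden."""
    return [s.name for s in streams if s.can_read(tenant)]


# The example flow: public stream, Tenant A's internal stream, Tenant A's private topic.
public = Stream("public", owner="dsh", scope=Scope.PUBLIC)  # "dsh" is a placeholder
public.grant_read("tenant-a")
internal = Stream("a-internal", owner="tenant-a", scope=Scope.INTERNAL)
internal.grant_read("tenant-b")
private = Stream("a-private", owner="tenant-a", scope=Scope.PRIVATE)

streams = [public, internal, private]
for tenant in ("tenant-a", "tenant-b", "tenant-c"):
    print(tenant, visible_to(tenant, streams))
# tenant-a ['public', 'a-internal', 'a-private']
# tenant-b ['a-internal']
# tenant-c []
```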
## Stream processing

> … is the processing of data in motion, or in other words,
> computing on data directly as it is produced or received.

Source: https://docs.risingwave.com/reference/key-concepts#stream-processing
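The following framework-free Python generator illustrates "computing on data directly as it is produced". It updates a running mean for every event as it arrives, instead of waiting for a complete dataset. The function name is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def running_mean(events: Iterable[float]) -> Iterator[Tuple[float, float]]:
    """Emit (event, mean so far) for every event as soon as it arrives.

    Only a count and a sum are kept, so memory use stays constant
    no matter how long the stream runs.
    """
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield value, total / count


for value, mean in running_mean([2.0, 4.0, 6.0]):
    print(value, mean)
# 2.0 2.0
# 4.0 3.0
# 6.0 4.0
```

Because `running_mean` is a generator, it works equally well on an unbounded source: each result is available immediately after its event.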
## Where to process

The data source doesn’t have the resources to process the data, and the data sink can’t handle the huge number of data sources. The DSH can act as an intermediary: it receives the data from the sources, processes it in real time, and sends it on to the sink.
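One way an intermediary can spare the sink is to collapse many per-source events into a single summary per fixed time window (a tumbling window). The sketch below assumes integer timestamps in seconds and events that arrive in timestamp order. The function and event shape are hypothetical and are not part of any DSH API.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Iterator, Optional, Tuple

Event = Tuple[int, str, float]  # (timestamp in seconds, source id, value)


def tumbling_window_means(
    events: Iterable[Event], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Collapse many per-source events into one summary per fixed time window.

    Assumes events arrive in timestamp order. Yields (window_start,
    {source_id: mean value in that window}) each time a window closes.
    """
    current: Optional[int] = None
    sums: DefaultDict[str, float] = defaultdict(float)
    counts: DefaultDict[str, int] = defaultdict(int)

    def summary() -> Dict[str, float]:
        return {source: sums[source] / counts[source] for source in sums}

    for timestamp, source, value in events:
        window_start = timestamp - timestamp % window_seconds
        if current is not None and window_start != current:
            yield current, summary()  # window closed: one message for the sink
            sums.clear()
            counts.clear()
        current = window_start
        sums[source] += value
        counts[source] += 1

    if current is not None:
        yield current, summary()  # flush the last, still-open window


events = [(0, "sensor-1", 1.0), (1, "sensor-2", 3.0), (5, "sensor-1", 2.0),
          (10, "sensor-1", 4.0), (12, "sensor-1", 6.0)]
for window_start, means in tumbling_window_means(events, 10):
    print(window_start, means)
# 0 {'sensor-1': 1.5, 'sensor-2': 3.0}
# 10 {'sensor-1': 5.0}
```

Five incoming events become two outgoing summaries. On an unbounded stream, the final flush never happens, so a window closes only when a later event arrives. A production setup would typically rely on the windowing support of a stream processing framework instead.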
## Many ways to process the data

- There are many frameworks for stream processing.
- However, no framework fits all use cases.
- Luckily, the DSH doesn’t dictate a framework: you can bring your own.

No one framework to rule them all, but the DSH to bind them.
## Sharing securely

- Tenants are completely separated from each other, using Calico.
- The DSH enforces the correct use of Docker containers.
- Tenant containers use certificates to authenticate to Kafka.