Schema Store¶
By default, the DSH provides a schema registry to manage schemas for data in Kafka topics of type “scratch” and in DSH streams:
- Each DSH platform contains one Schema Store that is shared across tenants.
- Producers and consumers of Kafka topics can use schemas to ensure that data remains consistent and compatible across topics and applications.
- The Schema Store supports the Apache Avro, JSON Schema and Protobuf schema definition formats.
- The DSH offers an API for the Schema Store based on Apicurio, which is compatible with Confluent’s Schema Registry API.
- The Schema Store contributes to your data governance in the following ways:
- You ensure that you serialize and deserialize data correctly in Kafka.
- Your applications and services use the correct schema for the data they are working with, thus reducing errors.
- You improve the interoperability between different applications or services because they use the same schemas.
- The Schema Store allows you to manage schema changes and versioning easily.
Note
The Schema Store only stores the schemas. The DSH currently offers no mechanism out of the box to actually enforce these schemas in topics and services.
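For example, a Kafka producer can use the Schema Store through a Confluent-compatible client library. The following is a minimal sketch with the confluent-kafka Python package; the registry URL, topic name, and record schema are placeholders, and authentication (covered below) is omitted for brevity:

```python
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

# Avro schema for the message values (a placeholder example).
SCHEMA_STR = """
{
  "type": "record",
  "name": "Temperature",
  "fields": [{"name": "celsius", "type": "double"}]
}
"""

# Placeholder endpoint; see the Authentication section for how to obtain it.
client = SchemaRegistryClient({"url": "https://schema-store.example.com"})

# The serializer registers the schema (if not yet registered) and prepends
# the schema ID to every encoded message, so consumers can look it up again.
serializer = AvroSerializer(client, SCHEMA_STR)
payload = serializer(
    {"celsius": 21.5},
    SerializationContext("scratch.temperature.tenant-a", MessageField.VALUE),
)
```

The serializer derives the subject from the topic name via the TopicNameStrategy (here ‘scratch.temperature.tenant-a-value’), which matches the subject naming described below.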
Permissions¶
The DSH applies the following permissions to the schemas in the Schema Store:
- A tenant has “Read” and “Write” permissions on all the schemas of its private (“scratch”) Kafka topics.
- In DSH streams:
- A tenant has “Read” and “Write” permissions on all the schemas of Kafka topics that the tenant can write to.
- A tenant has “Read” permissions on all the schemas of Kafka topics that the tenant can read.
Schema Store subjects¶
In the Schema Store, a subject is the name under which a schema is registered:
- The subject is identified by a unique name.
- A subject contains one or more versions of a schema that define the structure of the data entity.
- Subjects provide a way to organize schemas and manage different versions of the same schema.
- The DSH uses the “TopicNameStrategy” for the subject name in the Schema Store.
Subject names¶
The DSH uses the “TopicNameStrategy” for the subject name in the Schema Store, see the Confluent documentation for more information. This means that the subject names are a concatenation of the following:
- The type of the topic or DSH stream:
  - scratch: a private Kafka topic
  - internal: an internal DSH stream
  - stream: a public DSH stream
- The name of the Kafka topic, or the name of the DSH stream
- The name of the tenant (private Kafka topics of type “scratch”), or the name of tenant’s topic in the DSH stream (internal or public streams)
- One of the following:
  - -key: used for schemas that refer to the key of the message
  - -value: used for schemas that refer to the value of the message
If the schemas are applied correctly, then all messages in a single Kafka topic use the same pair of subjects (one for the key and one for the value), or only one subject (for the value). This limits the number of data types inside a single topic, which makes the topic suitable for use with Apache Flink.
The following subject names for a private topic are examples of the pattern scratch.<topic-name>.<tenant-name>-[key|value]:
- ‘scratch.temperature.tenant-a-key’: A schema for the message keys in the private “scratch” Kafka topic ‘temperature’, owned by ‘tenant-a’
- ‘scratch.vehicles.tenant-b-value’: A schema for the message values in the private “scratch” Kafka topic ‘vehicles’, owned by ‘tenant-b’
The subject names for topics inside DSH streams look similar, but follow a different pattern because the topics have the name of a tenant: see Naming conventions for more information. The following subject names are examples of the pattern <stream-type>.<stream-name>.<topic-name>-[key|value]:
- ‘internal.moisture.tenant-c-value’: A schema for the message values in the topic ‘tenant-c’ of the internal DSH stream ‘moisture’
- ‘stream.traffic.tenant-d-key’: A schema for the message keys in the topic ‘tenant-d’ of the public DSH stream ‘traffic’
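As an illustration, here is a hypothetical helper that derives these subject names. The function and its layout are not part of the DSH; it merely encodes the two patterns above:

```python
def subject_name(prefix: str, name: str, tenant: str, part: str) -> str:
    """Build a subject name: prefix is 'scratch', 'internal' or 'stream';
    part is 'key' or 'value'. For DSH streams, 'tenant' is the name of the
    tenant's topic inside the stream."""
    if prefix not in ("scratch", "internal", "stream"):
        raise ValueError(f"unknown topic or stream type: {prefix}")
    if part not in ("key", "value"):
        raise ValueError(f"part must be 'key' or 'value', got: {part}")
    return f"{prefix}.{name}.{tenant}-{part}"

# These reproduce the documented examples above.
assert subject_name("scratch", "temperature", "tenant-a", "key") == "scratch.temperature.tenant-a-key"
assert subject_name("internal", "moisture", "tenant-c", "value") == "internal.moisture.tenant-c-value"
assert subject_name("stream", "traffic", "tenant-d", "key") == "stream.traffic.tenant-d-key"
```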
Rules¶
The DSH applies the following rules when a tenant requests a schema from the Schema Store:
- If the tenant requests a specific subject, then it needs to supply the full subject name, including the tenant name.
- If the tenant requests all available subjects, then it receives the subjects related to topics for which it has read and/or write access. This amounts to subjects for its own Kafka topics of type “scratch”, and for the Kafka topics in DSH streams that it has access to. Both request styles are sketched after this list.
- If a tenant has the “Write” permission for a Kafka topic, then it can register a schema for it. As a consequence, different schemas can be registered for the same topic if multiple tenants have the “Write” permission for it.
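A sketch of both request styles with plain HTTP; the base URL is a placeholder and authentication is omitted (see the Authentication section below):

```python
import requests

BASE_URL = "https://schema-store.example.com"  # placeholder

# All available subjects: only subjects for topics that this tenant
# can read and/or write are returned.
subjects = requests.get(f"{BASE_URL}/subjects").json()

# A specific subject: the full subject name is required, tenant name included.
versions = requests.get(
    f"{BASE_URL}/subjects/scratch.temperature.tenant-a-value/versions"
).json()
print(subjects, versions)
```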
DSH streams¶
It’s important to realize that the Schema Store is a platform service that allows you to store schemas for data types, and that it doesn’t automatically enforce these data types inside Kafka topics. This is especially relevant for DSH streams.
DSH streams consist of multiple Kafka topics. You can use the Schema Store to save the schemas for data in these separate topics, but that doesn’t mean that there is one consistent data type within the DSH stream. For example:
- 2 tenants share the internal DSH stream ‘temperature’: ‘tenant-a’ and ‘tenant-b’.
- As a consequence, this DSH stream contains 2 Kafka topics:
- ‘internal.temperature.tenant-a’: tenant-a has the “Read” and “Write” permission, and tenant-b only has the “Read” permission.
- ‘internal.temperature.tenant-b’: tenant-b has the “Read” and “Write” permission, and tenant-a only has the “Read” permission.
- As a result, tenant-a and tenant-b can each register their own separate subjects in the Schema Store for their own Kafka topics in the DSH stream. The data types inside each Kafka topic are consistent, but the DSH stream contains two different data types: one for every Kafka topic.
Furthermore, public DSH streams have an additional “dsh” Kafka topic, which is used to store messages that external clients publish via the Messaging API. As a tenant, you don’t always control the data types used by these external MQTT/HTTP clients.
In a nutshell, you can use the Schema Store to register schemas for your data, but it’s still your responsibility to create a mechanism that coordinates the use of these schemas across the different tenants, Kafka producers, Kafka consumers, or MQTT/HTTP clients.
Note
The Schema Store is only available to Kafka producers and consumers:
- On the DSH, they can connect to the Schema Store directly.
- Outside the DSH, they can connect to the Schema Store via the authentication mechanism of the Kafka Proxy.
MQTT/HTTP clients don’t have access to the Schema Store.
Schema Store API¶
The DSH offers a Schema Store API that is based on Apicurio and is compatible with the Confluent Schema Registry API.
Authentication¶
All services on the DSH can access the API of the Schema Store. You can retrieve its hostname, ports and certificate by executing the get_signed_certificate.sh shell script. See Retrieve the Kafka configuration for more information.
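A minimal sketch of configuring the confluent-kafka Python client with this certificate material; the URL and file paths are placeholders for the values that get_signed_certificate.sh returns:

```python
from confluent_kafka.schema_registry import SchemaRegistryClient

client = SchemaRegistryClient({
    "url": "https://schema-store.example.com",      # hostname and port from the script
    "ssl.ca.location": "/path/to/ca.crt",           # platform CA certificate
    "ssl.certificate.location": "/path/to/client.crt",
    "ssl.key.location": "/path/to/client.key",
})
print(client.get_subjects())
```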
Kafka clients outside the DSH can gain access to the Schema Store via the Kafka Proxy:
- Click “Services” > “Overview” in the menu bar of the DSH Console.
- Click the “+ Kafka Proxy” button at the top of the overview page.
- Fill out the necessary information for the Kafka Proxy. See Kafka Proxy for more information.
- Under “Schema Store”, select the checkbox “Include access to the Schema Store”.
- Select the correct number of CPU cores and memory for the Schema Store proxy.
- Click the “Deploy Kafka Proxy” button.
- Once the DSH has deployed the Kafka Proxy, you can access the Schema Store API:
  - The DSH provides a vhost in the format <proxy-name>-schema-store.kafka.<tenant-name>.<platform-name>.kpn-dsh.com. See Naming conventions for more information.
  - You can authenticate using the authentication mechanism of the Kafka Proxy.
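A sketch of calling the Schema Store API through the proxy vhost with Python’s requests package. The proxy, tenant, and platform names are placeholders, and the TLS client-certificate authentication shown here is an assumption: verify the actual mechanism in your Kafka Proxy configuration.

```python
import requests

# Placeholder vhost following the documented format.
VHOST = "https://my-proxy-schema-store.kafka.tenant-a.example-platform.kpn-dsh.com"

response = requests.get(
    f"{VHOST}/subjects",
    cert=("/path/to/client.crt", "/path/to/client.key"),  # assumed client credentials
    verify="/path/to/ca.crt",
)
print(response.json())
```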
Supported operations¶
The DSH supports the following operations for its Schema Store API.
Tip
The App Catalog contains the Schema Store UI app, with an interactive OpenAPI specification. Deploy this app to explore the Schema Store API. See App Catalog for more information.
Schemas¶
- GET /schemas/ids/{int: id}: Retrieve the schema for the given ID.
- GET /schemas/ids/{int: id}/schema: Retrieve the raw schema content for the given ID.
- GET /schemas/types/: Retrieve a list of all registered schema types.
- GET /schemas/ids/{int: id}/versions: Retrieve a list of all available versions for a given schema ID.
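A sketch of these lookups over plain HTTP; the base URL and schema ID are placeholders:

```python
import requests

BASE_URL = "https://schema-store.example.com"  # placeholder
schema_id = 1  # hypothetical ID returned at registration time

full = requests.get(f"{BASE_URL}/schemas/ids/{schema_id}").json()        # schema plus metadata
raw = requests.get(f"{BASE_URL}/schemas/ids/{schema_id}/schema").json()  # raw schema content only
types = requests.get(f"{BASE_URL}/schemas/types/").json()                # registered schema types
print(full, raw, types)
```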
Subjects¶
- GET /subjects: Retrieve a list of all registered subject names.
- POST /subjects/(string: subject): Register a new subject with the given name.
- GET /subjects/(string: subject)/versions: Retrieve a list of all available versions for the given subject.
- POST /subjects/(string: subject)/versions: Register a new schema version under the given subject.
- GET /subjects/(string: subject)/versions/(versionId: version): Retrieve the schema for the given subject and version.
- GET /subjects/(string: subject)/versions/(versionId: version)/schema: Retrieve the raw schema content for the given subject and version.
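For example, registering a new schema version follows the Confluent request format, with the schema passed as a JSON-escaped string. A sketch, with placeholder base URL and subject:

```python
import json
import requests

BASE_URL = "https://schema-store.example.com"  # placeholder
subject = "scratch.temperature.tenant-a-value"

avro_schema = {
    "type": "record",
    "name": "Temperature",
    "fields": [{"name": "celsius", "type": "double"}],
}

resp = requests.post(
    f"{BASE_URL}/subjects/{subject}/versions",
    json={"schema": json.dumps(avro_schema), "schemaType": "AVRO"},
)
print(resp.json())  # typically {"id": <schema-id>} on success
```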
Compatibility¶
POST /compatibility/subjects/(string: subject)/versions: Check the compatibility of a given schema against the latest version of the specified subject.
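A sketch of a compatibility check: the candidate schema adds an optional field, and the response follows the Confluent format with an is_compatible flag. Base URL and subject are placeholders:

```python
import json
import requests

BASE_URL = "https://schema-store.example.com"  # placeholder
subject = "scratch.temperature.tenant-a-value"

# Candidate schema: adds an optional field with a default value.
candidate = {
    "type": "record",
    "name": "Temperature",
    "fields": [
        {"name": "celsius", "type": "double"},
        {"name": "sensor", "type": ["null", "string"], "default": None},
    ],
}

resp = requests.post(
    f"{BASE_URL}/compatibility/subjects/{subject}/versions",
    json={"schema": json.dumps(candidate), "schemaType": "AVRO"},
)
print(resp.json().get("is_compatible"))
```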
Configuration¶
- GET /config/(string: subject): Retrieve the configuration for the given subject.
- PUT /config/(string: subject): Update the configuration for the given subject. See Schema evolution and compatibility for more information.
- Once the compatibility level is set, you can only change to a less restrictive level. The levels, ordered from most to least restrictive, are: FULL, FORWARD_TRANSITIVE, FORWARD, BACKWARD_TRANSITIVE, BACKWARD, NONE.
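A sketch of reading and updating a subject’s compatibility level; per the Confluent API, GET responses use the compatibilityLevel key while PUT requests use compatibility. Base URL and subject are placeholders:

```python
import requests

BASE_URL = "https://schema-store.example.com"  # placeholder
subject = "scratch.temperature.tenant-a-value"

current = requests.get(f"{BASE_URL}/config/{subject}").json()
print(current)  # e.g. {"compatibilityLevel": "FULL"}

# Moving to a less restrictive level is allowed; the reverse is not.
requests.put(f"{BASE_URL}/config/{subject}", json={"compatibility": "FORWARD"})
```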
Unsupported features¶
The DSH doesn’t support the following features of the Confluent Schema Registry API:
- Delete a schema or a schema version
- Schema referencing
- On-bus schema arbitration