Multi-Tenant Kafka Streams Application

In this guide, we'll explore how to implement a multi-tenant Kafka Streams application. Multi-tenancy in Kafka refers to the ability to serve multiple clients (tenants) from a single Kafka cluster or application, while keeping data and processing isolated per tenant.

Concept of Multi-Tenancy in Kafka

In a multi-tenant Kafka environment, each tenant can be given its own:

- Kafka topics (for input and output data)
- Consumer group
- Stream processing topology

Approaches to Multi-Tenancy in Kafka

1. Isolated Topics Per Tenant

Each tenant has its own input and output topics. This ensures data isolation and tenant-specific processing. Tenant-specific topics could follow a naming convention like:

input-topic-tenant1, input-topic-tenant2, output-topic-tenant1, output-topic-tenant2

2. Shared Topics with Tenant Identifiers

All tenants share a single set of topics, and each message includes a tenant identifier (e.g., in the message key or headers). Kafka Streams can route and process messages based on this tenant identifier.
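With shared topics, the routing decision reduces to extracting the tenant identifier from each record. A minimal sketch, assuming a key convention of "<tenantId>:<entityKey>" (an illustrative assumption, not something the examples in this guide define):

```java
import java.util.Optional;

// Sketch of tenant routing for the shared-topic approach. Assumes (as a
// convention, not a Kafka requirement) that message keys are formatted
// as "<tenantId>:<entityKey>", e.g. "tenant1:order-42".
public class TenantRouter {

    // Extract the tenant ID from a composite key; empty if the key
    // does not follow the convention.
    static Optional<String> tenantOf(String key) {
        if (key == null) {
            return Optional.empty();
        }
        int sep = key.indexOf(':');
        return sep > 0 ? Optional.of(key.substring(0, sep)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(tenantOf("tenant1:order-42").orElse("unknown")); // tenant1
        System.out.println(tenantOf("malformed-key").orElse("unknown"));    // unknown
    }
}
```

Inside a topology, a predicate built on tenantOf can drive KStream#filter (one filtered stream per tenant) or KStream#split to fan records out to tenant-specific branches; a tenant ID carried in record headers can be read the same way via the Processor API.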

3. Separate Consumer Groups

Each tenant has its own consumer group. In Kafka Streams the consumer group ID is derived from the application.id, so assigning each tenant a distinct application ID yields an independent consumer group per tenant, allowing independent consumption, offsets, and processing.
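If tenants are consumed with plain Kafka consumers rather than Streams, the per-tenant group ID can be set directly. A minimal configuration sketch; the group naming scheme is an assumption for illustration:

```java
import java.util.Properties;

// Minimal sketch of per-tenant consumer configuration using standard
// Kafka consumer property names. The "consumer-group-<tenant>" naming
// scheme is illustrative, not taken from the examples in this guide.
public class TenantConsumerConfig {

    static Properties consumerProps(String tenant) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // One consumer group per tenant gives each tenant independent offsets.
        props.put("group.id", "consumer-group-" + tenant);
        props.put("auto.offset.reset", "earliest");
        return props;
    }
}
```

With a Kafka Streams application this happens implicitly: the consumer group ID is derived from application.id, which the main example below already makes tenant-specific.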

Step-by-Step Example: Isolated Topics Per Tenant

We will now implement a multi-tenant Kafka Streams application where each tenant has its own Kafka topics for input and output.

Step 1: Kafka Streams Application Logic

The Kafka Streams application handles multiple tenants by building a separate Kafka Streams topology for each tenant, wired to that tenant's own input and output topics.

// MultiTenantKafkaStreamsApp.java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class MultiTenantKafkaStreamsApp {

    private static final Map<String, KafkaStreams> tenantStreams = new HashMap<>();

    public static void main(String[] args) {
        // List of tenants
        String[] tenants = {"tenant1", "tenant2"};

        for (String tenant : tenants) {
            KafkaStreams streams = createKafkaStreams(tenant);
            tenantStreams.put(tenant, streams);
            streams.start();
        }

        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            tenantStreams.values().forEach(KafkaStreams::close);
        }));
    }

    private static KafkaStreams createKafkaStreams(String tenant) {
        // Configure the properties for the Kafka Streams instance for each tenant
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kafka-streams-app-" + tenant);
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Create a StreamsBuilder instance
        StreamsBuilder builder = new StreamsBuilder();

        // Define tenant-specific input and output topics
        String inputTopic = "input-topic-" + tenant;
        String outputTopic = "output-topic-" + tenant;

        // Define the stream processing logic
        KStream<String, String> inputStream = builder.stream(inputTopic);
        KStream<String, String> outputStream = inputStream.mapValues(value -> "Processed: " + value);

        // Send processed data to tenant-specific output topic
        outputStream.to(outputTopic);

        // Build and return the KafkaStreams instance
        return new KafkaStreams(builder.build(), props);
    }
}

Explanation

This code defines a Kafka Streams application that creates a separate stream topology for each tenant. Each tenant has its own input and output topics, and the application processes data for each tenant independently.

Step 2: Kafka Topic Configuration for Tenants

To ensure each tenant has its own topics, create the input and output topics for every tenant. You can create them manually with the kafka-topics.sh tool or programmatically with the Kafka Admin API (AdminClient).

# Create topics for each tenant
kafka-topics.sh --create --topic input-topic-tenant1 --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
kafka-topics.sh --create --topic output-topic-tenant1 --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1

kafka-topics.sh --create --topic input-topic-tenant2 --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
kafka-topics.sh --create --topic output-topic-tenant2 --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
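The four commands above can equivalently be generated in a loop. A dry-run sketch that prints each command instead of executing it (drop the echo to run for real, assuming kafka-topics.sh is on the PATH and the broker is at localhost:9092):

```shell
# Print the topic-creation command for each tenant's input and output topic.
for tenant in tenant1 tenant2; do
  for side in input output; do
    echo kafka-topics.sh --create \
      --topic "${side}-topic-${tenant}" \
      --bootstrap-server localhost:9092 \
      --partitions 3 --replication-factor 1
  done
done
```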

Step 3: Deploy the Kafka Streams Application

We can now deploy the Kafka Streams application. Package it as a runnable JAR, or containerize it with Docker for deployment on Kubernetes.
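A minimal Dockerfile sketch for the containerized option; the base image and JAR path are assumptions about your build, not part of the example above:

```dockerfile
# Sketch only: adjust the base image and JAR name to your build output.
FROM eclipse-temurin:17-jre
COPY target/multi-tenant-kafka-streams.jar /app/app.jar
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
```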

Step 4: Produce and Consume Messages Per Tenant

To test the multi-tenant setup, we can produce and consume messages for each tenant using Kafka console producer and consumer commands.

# Produce messages for tenant1
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic input-topic-tenant1
> message1 for tenant1

# Consume messages for tenant1
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic output-topic-tenant1 --from-beginning

# Produce messages for tenant2
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic input-topic-tenant2
> message1 for tenant2

# Consume messages for tenant2
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic output-topic-tenant2 --from-beginning

Step 5: Scaling Kafka Streams for Multiple Tenants

To scale the application, you can add tenants dynamically by introducing new tenant IDs and creating the corresponding input/output topics. Kafka Streams also scales horizontally: deploying multiple instances with the same application ID (i.e., for the same tenant) splits that tenant's topic partitions across the instances, whether they run on Kubernetes or a cloud provider like AWS or GCP.
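Dynamic tenant onboarding can be sketched as a small registry. Here startPipeline stands in for createKafkaStreams(tenant) from the example above, made pluggable so the registry logic can be shown in isolation (the class and method names are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of dynamic tenant registration. `startPipeline` stands in for
// createKafkaStreams(tenant) + start() from the main example; it returns
// an AutoCloseable handle so the pipeline can be shut down later.
public class TenantRegistry {

    private final Map<String, AutoCloseable> running = new ConcurrentHashMap<>();
    private final Function<String, AutoCloseable> startPipeline;

    TenantRegistry(Function<String, AutoCloseable> startPipeline) {
        this.startPipeline = startPipeline;
    }

    // Idempotently start processing for a new tenant: a second call for
    // the same tenant ID does not start a second pipeline.
    void addTenant(String tenant) {
        running.computeIfAbsent(tenant, startPipeline);
    }

    boolean isRunning(String tenant) {
        return running.containsKey(tenant);
    }

    public static void main(String[] args) {
        TenantRegistry registry =
                new TenantRegistry(tenant -> () -> System.out.println("closing " + tenant));
        registry.addTenant("tenant3");
        System.out.println(registry.isRunning("tenant3")); // true
    }
}
```

Topic creation for the new tenant would still need to happen first, e.g. via the Admin API or the kafka-topics.sh commands shown earlier.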

Conclusion

In this guide, we explored how to build a multi-tenant Kafka Streams application by isolating tenants with separate topics and consumer groups. We demonstrated how to deploy a multi-tenant Kafka application, create tenant-specific topics, and handle tenant-based data processing independently. This setup is flexible and can scale as the number of tenants increases, making Kafka a powerful choice for multi-tenant stream processing architectures.