Why should you put a proxy in front of Kafka?

Apache Kafka is arguably the default choice for companies that want and need streaming in their information systems. It comes with many advantages, such as high throughput and resilience. However, it can also introduce significant complexity and encourage bad practices, making it something like the Black Pearl: a legendary ship that only a handful of people can pilot.

Having a cursed ship that plunders resources is exactly what a company doesn't need. It needs a solution that contributes to profitability.

Abstract governance

When I discuss Kafka and its problems, the solution most often suggested to me is to apply governance rules. This can be done by:

  • Setting up RBAC on Kafka: This works well with a prefix-based strategy, provides fine-grained control, and is a core Kafka feature. However, the list of rules can grow quickly, becoming difficult to maintain while remaining largely static.
  • GitOps rules: Defining rules with the help of CI/CD pre-validation gives control over Kafka resource management (Terraform, Strimzi k8s operator). However, we must ensure that only our CI/CD pipeline is able to manage Kafka resources. Moreover, the time between a resource being declared and it being applied on Kafka can increase rapidly, and we don't like latency in the Kafka world.
  • Self-controller: Creating an application (which can be an API with an optional frontend) to host your conventions and ensure they are applied. This gives you real-time control over resource management. However, you have created a specific contract (often HTTP or gRPC) that you must maintain and keep in sync with your consumers. This leads to dedicated libraries and their complexity, or worse, a frontend to manage Kafka resources. A GitOps strategy is a must-have for (at least) production environments, and using a frontend negates its benefits.

All of these solutions make it possible to apply governance, but if you want to do it at scale with minimal effort, they are likely insufficient. The Strimzi Operator, for example, is arguably the best way to manage Kafka resources, but it is not enough on its own to enforce our governance. It must be combined with something else.


“You certainly need synchronous in this asynchronous world”

A Kafka Proxy is a component that sits between clients (consumers, producers or admins) and the cluster. The Proxy is transparent, meaning it speaks the Kafka protocol directly. Clients connect to the Proxy, and the Proxy connects to the cluster.

Using the Kafka protocol directly makes migration effortless. Any Kafka client can point its connection at the proxy without changing its library or anything else. As the Proxy's owner, you can add logic in real time to each request going to Kafka, to each response coming from Kafka, or to both. You act as a man-in-the-middle for Kafka (for good purposes, of course).

Each request going to Kafka can be validated or transformed. Applying governance rules is now easy, regardless of the tool used in front. You just need to ensure that all communication with the Kafka cluster passes through the Kafka Proxy.
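
To make "validated or transformed" concrete, here is a minimal sketch in Python. It is not the code of any real proxy: the Request class, the interceptor functions and the "x-via-proxy" header are all hypothetical names chosen for illustration, and a real proxy would work on decoded Kafka protocol messages rather than on this simplified object.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Request:
    """Simplified, hypothetical stand-in for a decoded Kafka request."""
    api: str                      # e.g. "Produce", "CreateTopics"
    client_id: str
    topic: str
    headers: dict = field(default_factory=dict)


# An interceptor returns the request to forward (possibly modified),
# or None to reject it.
Interceptor = Callable[[Request], Optional[Request]]


def require_client_id(req: Request) -> Optional[Request]:
    # Validation: refuse clients that do not identify themselves.
    return req if req.client_id else None


def stamp_proxy_header(req: Request) -> Optional[Request]:
    # Transformation: add a header before the request reaches the cluster.
    return replace(req, headers={**req.headers, "x-via-proxy": "true"})


def run_chain(req: Request, chain: List[Interceptor]) -> Optional[Request]:
    """Apply each interceptor in order; stop at the first rejection."""
    for interceptor in chain:
        result = interceptor(req)
        if result is None:
            return None
        req = result
    return req


chain = [require_client_id, stamp_proxy_header]
forwarded = run_chain(Request("Produce", "billing-app", "billing.invoices"), chain)
rejected = run_chain(Request("Produce", "", "billing.invoices"), chain)
```

Here forwarded carries the extra header and would be sent on to the cluster, while rejected is None because the request never passed validation.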


Figure: Create topic success

Going back to our governance use case, we can set up our rules at the proxy level:

  1. Client sends a Create Topic Request
  2. Proxy validates and forwards the request to the Kafka cluster
  3. Kafka cluster creates the topic and sends a success response to Proxy
  4. Proxy forwards the response to the client

That's it. We never transform the request protocol to or from HTTP or anything else. Just Kafka, and only Kafka.


Figure: Create topic violation

Now, imagine another Create Topic Request that doesn't follow your governance rules. The Proxy will simply not forward it to the Kafka cluster, and will respond directly to the client with a rule violation:

  1. Client sends a Create Topic Request
  2. Proxy rejects it and responds to the client
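
The two flows above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the naming convention, the partition limit, the request and response classes, and the fake_cluster function standing in for the real cluster. The Kafka protocol does define a POLICY_VIOLATION error code (44), and reusing it for the proxy's answer is one plausible choice, not necessarily what a given proxy does.

```python
import re
from dataclasses import dataclass

# Kafka protocol error codes. Using POLICY_VIOLATION for proxy-side
# rejections is an assumption of this sketch.
NONE = 0
POLICY_VIOLATION = 44

# Hypothetical convention: <domain>.<entity>.v<version>, lowercase.
TOPIC_PATTERN = re.compile(r"^[a-z]+\.[a-z-]+\.v\d+$")
MAX_PARTITIONS = 12  # hypothetical limit


@dataclass
class CreateTopicRequest:
    name: str
    partitions: int


@dataclass
class CreateTopicResponse:
    name: str
    error_code: int
    error_message: str = ""


def violations(req):
    """Return the list of governance rules the request breaks."""
    problems = []
    if not TOPIC_PATTERN.match(req.name):
        problems.append(f"name '{req.name}' does not match <domain>.<entity>.v<N>")
    if req.partitions > MAX_PARTITIONS:
        problems.append(f"{req.partitions} partitions exceeds the limit of {MAX_PARTITIONS}")
    return problems


def handle_create_topic(req, forward):
    problems = violations(req)
    if problems:
        # Violation flow: answer the client directly, never contact the cluster.
        return CreateTopicResponse(req.name, POLICY_VIOLATION, "; ".join(problems))
    # Success flow: forward the request unchanged and relay the response.
    return forward(req)


cluster_calls = []


def fake_cluster(req):
    # Stand-in for the real Kafka cluster; records what it receives.
    cluster_calls.append(req.name)
    return CreateTopicResponse(req.name, NONE)


ok = handle_create_topic(CreateTopicRequest("billing.invoices.v1", 6), fake_cluster)
ko = handle_create_topic(CreateTopicRequest("MyTopic", 50), fake_cluster)
```

After running this, cluster_calls only contains "billing.invoices.v1": the non-compliant request was answered by the proxy and never reached the cluster.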


Summary

We have explored a basic use case for the Kafka proxy. Keep in mind: because we can control any request and/or response, we can extend the Kafka protocol for many purposes:

  • Ensure conventions
  • Limit pressure
  • Data masking
  • Encryption at rest
  • Monitoring

And we can go further with features like:

  • Multi-tenancy
  • Topic mapping
  • Virtualization
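
As a taste of what topic mapping and multi-tenancy could look like, here is a small Python sketch. It assumes a hypothetical scheme where each tenant's topics are stored on the cluster under a "<tenant>." prefix; the function names are illustrative and not taken from any existing proxy. The proxy rewrites topic names on the way in and filters and strips them on the way out, so each tenant only sees its own logical topics.

```python
from typing import List, Optional


def to_physical(tenant: str, logical_topic: str) -> str:
    """Request side: map the name the client uses to the name on the cluster."""
    return f"{tenant}.{logical_topic}"


def to_logical(tenant: str, physical_topic: str) -> Optional[str]:
    """Response side: strip the tenant prefix, or return None for foreign topics."""
    prefix = f"{tenant}."
    if physical_topic.startswith(prefix):
        return physical_topic[len(prefix):]
    return None


def filter_metadata(tenant: str, cluster_topics: List[str]) -> List[str]:
    """Keep only the tenant's topics in a metadata response, under logical names."""
    visible = []
    for physical in cluster_topics:
        logical = to_logical(tenant, physical)
        if logical is not None:
            visible.append(logical)
    return visible


cluster_topics = ["team-a.orders", "team-a.payments", "team-b.orders"]
team_a_view = filter_metadata("team-a", cluster_topics)
team_b_target = to_physical("team-b", "orders")
```

Both teams can use a topic called "orders" without colliding, because the proxy maps each one to a different physical topic.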

If I've piqued your interest, in a future article we'll explore how to improve business via a Kafka Proxy.

Author: Anthony Callaert - Staff engineer
