Apache Kafka Connect

What is Kafka Connect?

Apache Kafka Connect is a framework for streaming data between Apache Kafka and other systems, scalably and reliably. It makes it simple to quickly define connectors that move large collections of data into and out of Kafka.

Kafka Connect can collect metrics from application servers or ingest entire databases into Kafka topics, making the data available for stream processing with low latency.

Kafka Connect Features

Kafka Connect offers the following features:

a. A common framework for Kafka connectors

It standardizes the integration of other data systems with Kafka, and it simplifies connector development, deployment, and management.

b. Distributed and standalone modes

Scale up to a large, centrally managed service supporting an entire organization or scale down to development, testing, and small production deployments.

c. REST interface

Through an easy-to-use REST API, we can submit connectors to our Kafka Connect cluster and manage them.
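
For example, we can submit a connector by POSTing its JSON configuration to a worker's REST endpoint (port 8083 by default). Below is a minimal Java sketch, assuming a local worker and the FileStreamSource connector that ships with Kafka; the connector name, file, and topic are illustrative placeholders.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class SubmitConnector {
        public static void main(String[] args) throws Exception {
            // Connector name, file, and topic are illustrative placeholders.
            String config = "{"
                    + "\"name\": \"my-file-source\","
                    + "\"config\": {"
                    + "\"connector.class\": \"org.apache.kafka.connect.file.FileStreamSourceConnector\","
                    + "\"file\": \"/tmp/input.txt\","
                    + "\"topic\": \"file-topic\""
                    + "}}";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8083/connectors")) // default Connect REST port
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(config))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }

The same API lets us list connectors (GET /connectors) or remove one (DELETE /connectors/{name}).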

d. Automatic offset management

Kafka Connect can manage the offset commit process automatically, needing only a little information from connectors. Hence, connector developers do not have to worry about this error-prone part of connector development.
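
To illustrate, a source task only has to attach a "source partition" and a "source offset" to each record; Connect commits the offsets on its behalf. Here is a minimal sketch of a hypothetical task that tails a file; the class name, file path, topic, and readNextLine helper are all illustrative.

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;

    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.source.SourceRecord;
    import org.apache.kafka.connect.source.SourceTask;

    public class FileTailTask extends SourceTask {
        private long position = 0L; // how far into the file we have read

        @Override
        public List<SourceRecord> poll() {
            String line = readNextLine();
            if (line == null) return null; // nothing new yet
            position += line.length() + 1;

            Map<String, String> sourcePartition = Collections.singletonMap("file", "/tmp/input.txt");
            Map<String, Long> sourceOffset = Collections.singletonMap("position", position);

            // Connect commits sourceOffset for us; the task never writes offsets itself.
            return Collections.singletonList(new SourceRecord(
                    sourcePartition, sourceOffset, "file-topic",
                    Schema.STRING_SCHEMA, line));
        }

        private String readNextLine() {
            return null; // real file-reading logic omitted from this sketch
        }

        @Override public void start(Map<String, String> props) { }
        @Override public void stop() { }
        @Override public String version() { return "1.0"; }
    }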

e. Distributed and scalable by default

Kafka Connect builds on Kafka's existing group management protocol, so we can scale up a Kafka Connect cluster simply by adding more workers.

f. Streaming/batch integration

Kafka Connect is an ideal solution for bridging streaming and batch data systems.

Why Kafka Connect?

There are many tools, such as Flume, that can write to Kafka, read from Kafka, or import and export data. So why do we need Kafka Connect? Here are its primary advantages:

a. Auto-recovery after failure

A “source” connector can attach arbitrary “source location” information to each record it passes to Kafka Connect. After a failure, Kafka Connect automatically provides this information back to the connector, so it can resume where it left off. Auto-recovery for “sink” connectors is even easier.
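
Continuing the hypothetical FileTailTask sketch from earlier, a fuller start() method shows the recovery path: on (re)start, the task asks Connect's offset storage for the last committed offset of its source partition and resumes from there. The partition key and "position" field are the same ones the task attached to its records.

    // Inside the hypothetical FileTailTask shown earlier.
    @Override
    public void start(Map<String, String> props) {
        Map<String, String> sourcePartition = Collections.singletonMap("file", "/tmp/input.txt");
        Map<String, Object> offset = context.offsetStorageReader().offset(sourcePartition);
        if (offset != null) {
            position = (Long) offset.get("position"); // resume where the failed task left off
        } else {
            position = 0L; // first run: start from the beginning
        }
    }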

b. Autofailover

Auto-failover is possible because the Kafka Connect nodes form a cluster: if one node fails, the work it was doing is redistributed to the other nodes.

c. Simple Parallelism

A connector can split its data import or export job into multiple tasks, which execute in parallel.
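
As a sketch of what this looks like in code, consider a hypothetical connector that copies a comma-separated list of tables: taskConfigs(maxTasks) returns one configuration map per task, and Kafka Connect runs each configuration as its own parallel task. All class names and the "tables" config key are illustrative.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.connect.connector.Task;
    import org.apache.kafka.connect.source.SourceConnector;
    import org.apache.kafka.connect.source.SourceRecord;
    import org.apache.kafka.connect.source.SourceTask;

    public class TableCopyConnector extends SourceConnector {
        private List<String> tables;

        @Override
        public void start(Map<String, String> props) {
            // "tables" is a hypothetical config key, e.g. "orders,users,payments".
            tables = List.of(props.getOrDefault("tables", "").split(","));
        }

        @Override
        public List<Map<String, String>> taskConfigs(int maxTasks) {
            // One config map per task; Connect runs each config as a parallel task.
            int groups = Math.min(maxTasks, tables.size());
            List<Map<String, String>> configs = new ArrayList<>();
            for (int i = 0; i < groups; i++) configs.add(new HashMap<>());
            for (int i = 0; i < tables.size(); i++) {
                // Round-robin the tables across the task configs.
                configs.get(i % groups).merge("tables", tables.get(i), (a, b) -> a + "," + b);
            }
            return configs;
        }

        @Override public Class<? extends Task> taskClass() { return TableCopyTask.class; }
        @Override public void stop() { }
        @Override public ConfigDef config() { return new ConfigDef(); }
        @Override public String version() { return "1.0"; }

        // Minimal stub for the per-table task; real copying logic omitted.
        public static class TableCopyTask extends SourceTask {
            @Override public void start(Map<String, String> props) { }
            @Override public List<SourceRecord> poll() { return null; }
            @Override public void stop() { }
            @Override public String version() { return "1.0"; }
        }
    }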

Kafka Connect Concepts

  • A Kafka Connect worker is an operating-system process (Java-based) that executes connectors and their associated tasks in child threads.
  • A connector is an object that defines parameters for one or more tasks, which do the actual work of importing or exporting data.
  • A source connector generates tasks that read from some arbitrary input and write to Kafka.
  • A sink connector generates tasks that read from Kafka and write to some arbitrary output, as in the sketch below.
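
As an illustration of the sink side, here is a minimal, hypothetical sink task that simply prints each record it receives; a real sink task would write to an external system instead.

    import java.util.Collection;
    import java.util.Map;

    import org.apache.kafka.connect.sink.SinkRecord;
    import org.apache.kafka.connect.sink.SinkTask;

    public class StdoutSinkTask extends SinkTask {
        @Override
        public void put(Collection<SinkRecord> records) {
            // Connect reads from Kafka and hands us batches of records.
            for (SinkRecord record : records) {
                System.out.printf("%s-%d@%d: %s%n",
                        record.topic(), record.kafkaPartition(), record.kafkaOffset(), record.value());
            }
        }

        @Override public void start(Map<String, String> props) { }
        @Override public void stop() { }
        @Override public String version() { return "1.0"; }
    }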
