Getting to Know Kafka: An Introduction to Data Streaming Framework

What is Kafka? 🤔

  • An open-source, distributed streaming platform used for building real-time, event-driven applications.
  • It lets developers build applications that continuously produce and consume streams of data records.
  • It partitions and replicates the records that are produced so that many consumers can read them simultaneously without any perceptible lag.
  • It is fast, reliable, and fault-tolerant (thanks to replication), and it preserves the order of records within each partition.

Use cases 🚀

Some of the common use cases of Kafka are listed below. Each can range from simple to complex in practice.

  • Decoupling Applications
  • Metrics
  • Messaging
  • Location Tracking
  • Data Gathering
  • Log Aggregation

How does Kafka work? 🛠️

Kafka works by allowing producers to send data to a topic, which is then partitioned and replicated across a cluster of servers (brokers). Consumers can then subscribe to one or more topics and process the data in real time. A topic's partition and replication counts are chosen when it is created, as sketched below.
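
As a concrete illustration, here is a minimal sketch of creating such a topic with Kafka's Java AdminClient. The topic name, partition count, and replication factor are hypothetical, and a cluster reachable at localhost:9092 with at least two brokers is assumed.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // "orders" is a hypothetical topic: 3 partitions, each replicated to 2 brokers
            // (a replication factor of 2 requires at least two brokers in the cluster).
            NewTopic topic = new NewTopic("orders", 3, (short) 2);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```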

Kafka is built on four core APIs:

1. Producer API

  • It creates records and publishes them to topics (a minimal producer sketch follows this list).
  • A topic is an ordered list of events.
  • Topics can be persisted to disk for as long as the retention requirements and available storage allow.
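
For instance, a minimal producer sketch might look like the following. The topic name, key, and value are hypothetical, and a broker is assumed at localhost:9092.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records sharing a key land in the same partition, so their order is preserved.
            producer.send(new ProducerRecord<>("orders", "user-42", "order-created"));
            producer.flush(); // block until buffered records have actually been sent
        }
    }
}
```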

2. Consumer API

  • It subscribes to one or more topics, listens for their data, and ingests it (a minimal consumer sketch follows this list).
  • It can consume records in real time as they are produced.
  • It can also read back older records that were persisted to disk.
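
A matching consumer sketch is shown below, under the same assumptions (hypothetical topic and group names, broker at localhost:9092). Setting auto.offset.reset to "earliest" makes a new consumer group start from the oldest persisted records rather than only new arrivals.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "order-processors");        // consumers in a group share partitions
        props.put("auto.offset.reset", "earliest");       // read persisted records if no offset exists yet
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                // Poll for new records, blocking up to 500 ms per iteration.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```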

3. Streams API

  • It is used to transform the data.
  • It leverages both the Producer and Consumer APIs.
  • It consumes from one or more topics, analyzes, aggregates, or otherwise transforms the data in real time, and then produces the resulting streams to the same or new topics (a small sketch follows this list).
  • This is what covers complex use cases like location tracking and data gathering.
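
As a sketch, the Streams topology below consumes a hypothetical "orders" topic, applies a trivial per-record transformation, and produces the result to a new "orders-normalized" topic; the application id and topic names are made up for illustration.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class OrderNormalizer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-normalizer"); // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Consume from one topic, transform each record, and produce to a new topic.
        KStream<String, String> orders = builder.stream("orders");
        orders.mapValues(value -> value.toUpperCase())
              .to("orders-normalized");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```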

4. Connector API

  • It enables developers to write connectors.
  • Connectors are reusable, richer producers and consumers.
  • It makes a developer's life easier by reducing the code needed to get a job done, such as creating a producer or consumer.
  • Developers can use connectors to connect directly to almost any data source; they only need to configure the connector to get that data source's records into their cluster (a sample configuration follows this list).
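
For example, here is a hypothetical standalone-mode configuration for the FileStreamSource example connector that ships with Kafka; the file path and topic name are made up for illustration.

```properties
# file-source.properties — hypothetical config for Kafka Connect standalone mode
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/tmp/orders.txt
topic=orders
```

In standalone mode this could be launched with Kafka's bundled script, e.g. bin/connect-standalone.sh config/connect-standalone.properties file-source.properties, and each new line appended to the file would be published as a record to the topic.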

The diagram below illustrates the basic workflow of Kafka.

[Diagram: Basic workflow of Kafka]

Conclusion ⚡️

Kafka is a powerful, open-source, distributed streaming platform used for building real-time data pipelines and streaming applications. It is designed to handle high-volume, high-throughput, low-latency data streams, and it can be used to process and analyze real-time data across a variety of use cases.
