Exploring Apache Kafka: Key Concepts and Practical Examples
Apache Kafka has become a core tool for managing real-time data pipelines and streaming applications. Below is an overview of its key components and some example commands to help you get started with Kafka.
Kafka Concepts Overview
Before diving into commands, a few core terms are worth defining:
- Topic: a named stream of records that producers write to and consumers read from.
- Partition: an ordered, append-only log within a topic; partitions are Kafka's unit of parallelism.
- Offset: the sequential position of a record within a partition; consumers track their own offsets.
- Producer / Consumer: client applications that write records to and read records from topics.
- Broker: a Kafka server; production clusters run several brokers for fault tolerance.
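To build intuition for these abstractions, here is a toy, in-memory model of a topic with keyed partitioning. This is only a sketch for illustration (real Kafka hashes key bytes with murmur2 and persists logs on disk; Python's built-in hash is a simplified stand-in):

```python
# Toy in-memory model of a Kafka topic: a set of partitions, each an
# append-only log where every record gets a sequential offset.
class Topic:
    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Records with the same key always land in the same partition,
        # which is what gives Kafka its per-key ordering guarantee.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def consume(self, partition, offset):
        # Consumers read a partition sequentially from a given offset.
        return self.partitions[partition][offset:]

topic = Topic("test-topic", num_partitions=3)
p, off = topic.produce("user-42", "clicked")
topic.produce("user-42", "purchased")
print(topic.consume(p, off))  # ['clicked', 'purchased'] -- same key, in order
```

Note that ordering is only guaranteed within a partition, not across the topic as a whole.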
Setting up Kafka with Docker Compose
To run Kafka locally, here is a sample docker-compose.yml. It also defines a Spark cluster, a Jupyter server, and a Kafka UI, which later articles will build on:
version: "3.8"
services:
  master:
    image: custom-spark-3.5:latest # docker.io/bitnami/spark:3.5
    container_name: spark-master
    hostname: master
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
      - SPARK_LOG_CONF=/opt/bitnami/spark/conf/log4j2.properties
      - PYTHONPATH=/opt/bitnami/spark/jobs/app:/opt/bitnami/spark/jobs/app/
    ports:
      - "8080:8080"
      - "7077:7077"
    volumes:
      - ./scripts:/opt/spark/scripts
      - ./data:/opt/spark/data
  worker1:
    image: custom-spark-3.5:latest # docker.io/bitnami/spark:3.5
    container_name: spark-worker-1
    hostname: worker1
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=2
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
      - SPARK_LOG_CONF=/opt/bitnami/spark/conf/log4j2.properties
    volumes:
      - ./scripts:/opt/spark/scripts
      - ./data:/opt/spark/data
    depends_on:
      - master
  worker2:
    image: custom-spark-3.5:latest # docker.io/bitnami/spark:3.5
    container_name: spark-worker-2
    hostname: worker2
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=1G
      - SPARK_WORKER_CORES=2
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
      - SPARK_USER=spark
      - SPARK_LOG_CONF=/opt/bitnami/spark/conf/log4j2.properties
    volumes:
      - ./scripts:/opt/spark/scripts
      - ./data:/opt/spark/data
    depends_on:
      - master
  zookeeper:
    image: bitnami/zookeeper:latest
    ports:
      - 2181:2181
    volumes:
      - zookeeper-volume:/bitnami/zookeeper
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
  kafka:
    image: bitnami/kafka:latest
    container_name: kafka
    restart: on-failure
    ports:
      - 9092:9092
    environment:
      - KAFKA_CFG_BROKER_ID=1
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_CFG_NUM_PARTITIONS=3
      - ALLOW_PLAINTEXT_LISTENER=yes
    volumes:
      - kafka-volume:/var/lib/kafka/data:z # persistent volume for Kafka data
    depends_on:
      - zookeeper
  kafka-ui:
    image: provectuslabs/kafka-ui
    container_name: kafka-ui
    depends_on:
      - kafka
      - zookeeper
    ports:
      - "8889:8080"
    restart: always
    environment:
      - KAFKA_CLUSTERS_0_NAME=kafka-ui
      - KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=kafka:9092
      - KAFKA_CLUSTERS_0_ZOOKEEPER=zookeeper:2181
  jupyter:
    image: jupyter/datascience-notebook:python-3.8.4
    container_name: jupyter
    ports:
      - "8888:8888"
    volumes:
      - ./notebooks:/home/jovyan/work/notebooks:rw
      - ./data/:/home/jovyan/work/data:rw
      - ./py/:/home/jovyan/work/py:rw
    environment:
      JUPYTER_ENABLE_LAB: "yes"
    command: "start-notebook.sh --NotebookApp.token='' --NotebookApp.password=''"
volumes:
  kafka-volume:
    external: false
  zookeeper-volume:
    external: false
To bring up the Kafka environment:
docker-compose up -d
Basic Kafka Commands
Here are a few useful commands to help you get hands-on with Kafka.
For the experiments below, after starting the environment, open another Linux shell and access the Kafka container in interactive mode with the following command:
docker exec -it kafka bash
1. Creating a Topic
From inside the container, create a topic named test-topic with three partitions:
/opt/bitnami/kafka/bin/kafka-topics.sh --bootstrap-server kafka:9092 \
  --create \
  --topic test-topic \
  --replication-factor 1 \
  --partitions 3
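Why three partitions? Partitions let several consumers in the same group read a topic in parallel, with Kafka dividing the partitions among them. The sketch below illustrates the idea with a plain round-robin hand-out (a simplification: real Kafka supports range, round-robin, and cooperative-sticky assignment strategies):

```python
# Simplified sketch of how a consumer group divides a topic's partitions
# among its members. Each partition is owned by exactly one member at a time.
def assign_partitions(num_partitions, members):
    assignment = {m: [] for m in members}
    for p in range(num_partitions):
        # Hand out partitions one by one, cycling through the members.
        assignment[members[p % len(members)]].append(p)
    return assignment

# Two consumers sharing the three partitions of test-topic:
print(assign_partitions(3, ["consumer-a", "consumer-b"]))
# {'consumer-a': [0, 2], 'consumer-b': [1]}
```

A corollary worth remembering: with three partitions, only up to three consumers in one group can do useful work; a fourth would sit idle.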
2. Producing a Message to a Topic
Send a message to test-topic:
/opt/bitnami/kafka/bin/kafka-console-producer.sh --bootstrap-server kafka:9092 \
  --topic test-topic
3. Consuming Messages from the Start
To read messages from the beginning of the topic:
/opt/bitnami/kafka/bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 \
  --topic test-topic \
  --from-beginning
4. Consuming Messages from the End
To read only new messages added to the topic:
/opt/bitnami/kafka/bin/kafka-console-consumer.sh --bootstrap-server kafka:9092 \
  --topic test-topic
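The difference between the two consumer invocations comes down to the starting offset. A minimal sketch of the semantics (a simplification of what the console consumer actually does):

```python
# --from-beginning starts at offset 0; the default starts at the current
# end of the log, so the consumer only sees records appended afterwards.
def starting_offset(log, from_beginning):
    return 0 if from_beginning else len(log)

log = ["m1", "m2"]                               # records already in the partition
start = starting_offset(log, from_beginning=True)
print(log[start:])                               # ['m1', 'm2']

start = starting_offset(log, from_beginning=False)
log.append("m3")                                 # produced after the consumer joined
print(log[start:])                               # ['m3']
```

In real deployments, consumers in a group also commit their offsets back to Kafka, so a restarted consumer resumes where it left off rather than at either extreme.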
Kafka Use Cases
Kafka is highly versatile and used in scenarios such as:
- Real-time messaging and event streaming between microservices
- Log and metrics aggregation from many producers into central pipelines
- Stream processing with tools like Kafka Streams, Flink, or Spark Structured Streaming
- Event sourcing and change data capture (CDC) from databases
Kafka on the Cloud: AWS, Azure, and GCP
Each major cloud provider offers managed Kafka (or Kafka-compatible) services:
- AWS: Amazon Managed Streaming for Apache Kafka (MSK)
- Azure: Azure Event Hubs (Kafka-compatible endpoint) and Apache Kafka on HDInsight
- GCP: Google Cloud Managed Service for Apache Kafka
In my next articles, I will integrate this environment with PySpark and run several tests. Stay tuned!