Install Apache Kafka on Ubuntu 22.04 with Full Steps
In this comprehensive guide, brought to you by Orcacore, we’ll walk you through how to Install Apache Kafka on Ubuntu 22.04. Apache Kafka is a powerful open-source distributed streaming platform designed for building real-time data pipelines and streaming applications. Initially developed at LinkedIn in 2011 to manage high-volume data streams, Kafka has evolved into a robust event streaming platform capable of handling over a million messages per second and processing trillions of messages daily. Understanding how to Install Apache Kafka on Ubuntu 22.04 is essential for modern data engineering.
Steps To Install and Configure Apache Kafka on Ubuntu 22.04
Before you begin this guide, ensure you are logged into your Ubuntu 22.04 server as a non-root user with sudo privileges and have a basic firewall configured. You can refer to our guide on Initial Server Setup with Ubuntu 22.04 for assistance. Once you’ve completed the initial server setup, proceed with the following steps to Install Apache Kafka on Ubuntu 22.04.
1. Install Required Packages For Kafka Ubuntu 22.04
To begin, prepare your Ubuntu 22.04 server by updating the package index and upgrading existing packages. Use the following command:
sudo apt update && sudo apt upgrade
Next, install the necessary packages, including the Java Runtime Environment (JRE), the Java Development Kit (JDK), wget, git, and unzip. These packages are crucial for running Kafka and related tools.
sudo apt install default-jre wget git unzip default-jdk -y
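After the installation completes, you can confirm that a JDK is available. On Ubuntu 22.04 the default-jdk package currently provides OpenJDK 11, although the exact version string reported may differ on your system:
java -version
javac -version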
2. Install Apache Kafka on Ubuntu 22.04
Now, you’re ready to download and install the latest release of Apache Kafka.
Download Kafka Ubuntu 22.04
Visit the Apache Kafka downloads page. Locate the latest release and obtain the binary download link. Use the wget command to download the Kafka package to your server.
sudo wget https://downloads.apache.org/kafka/3.3.2/kafka_2.13-3.3.2.tgz
After downloading, create a directory for Kafka under /usr/local and navigate to it. Note that cd is a shell builtin, so it is run without sudo:
sudo mkdir /usr/local/kafka-server && cd /usr/local/kafka-server
Extract the downloaded Kafka archive into the newly created directory. The --strip 1 option removes the top-level directory from the archive during extraction.
sudo tar -xvzf ~/kafka_2.13-3.3.2.tgz --strip 1
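If the extraction succeeded, the Kafka files now sit directly under the target directory; listing it should show directories such as bin, config, and libs:
ls /usr/local/kafka-server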
Create Zookeeper Systemd Unit File
Zookeeper is a critical component for managing Kafka. It acts as a centralized service for maintaining configuration information, naming, and providing synchronization within distributed systems. Zookeeper tracks the status of Kafka cluster nodes, topics, and partitions.
Create a systemd unit file for Zookeeper to facilitate service management. Use your preferred text editor (e.g., vi) to create the file:
sudo vi /etc/systemd/system/zookeeper.service
Add the following content to the file:
[Unit]
Description=Apache Zookeeper Server
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
ExecStart=/usr/local/kafka-server/bin/zookeeper-server-start.sh /usr/local/kafka-server/config/zookeeper.properties
ExecStop=/usr/local/kafka-server/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Save and close the file.
Create Systemd Unit File for Kafka
Next, create a systemd unit file for Kafka itself. This file defines how Kafka is managed as a service. Use vi or another text editor:
sudo vi /etc/systemd/system/kafka.service
Add the following content to the file. Important: Ensure the JAVA_HOME configuration is correct, or Kafka will fail to start.
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64"
ExecStart=/usr/local/kafka-server/bin/kafka-server-start.sh /usr/local/kafka-server/config/server.properties
ExecStop=/usr/local/kafka-server/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Save and close the file.
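If you are unsure which path to use for JAVA_HOME, you can derive it from the active java binary; on a stock Ubuntu 22.04 system with default-jdk installed, this typically resolves to /usr/lib/jvm/java-11-openjdk-amd64:
readlink -f /usr/bin/java | sed 's|/bin/java||'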
Reload the systemd daemon to apply the changes, then enable and start the Zookeeper and Kafka services.
sudo systemctl daemon-reload
sudo systemctl enable --now zookeeper
sudo systemctl enable --now kafka
Verify that the Kafka and Zookeeper services are active and running:
sudo systemctl status kafka
**Output**
● kafka.service - Apache Kafka Server
Loaded: loaded (/etc/systemd/system/kafka.service; enabled; vendor preset: enabled)
Active: **active** (**running**) since Wed 2023-01-25 06:59:47 UTC; 4s ago
Docs: http://kafka.apache.org/documentation.html
Main PID: 5605 (java)
Tasks: 36 (limit: 4575)
Memory: 282.7M
CPU: 6.198s
CGroup: /system.slice/kafka.service
└─5605 /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xmx1G -Xms1G -
...
sudo systemctl status zookeeper
**Output**
● zookeeper.service - Apache Zookeeper Server
Loaded: loaded (/etc/systemd/system/zookeeper.service; enabled; vendor preset: enabled)
Active: **active** (**running**) since Wed 2023-01-25 06:59:40 UTC; 2min 29s ago
Main PID: 5144 (java)
Tasks: 32 (limit: 4575)
Memory: 67.4M
CPU: 3.660s
CGroup: /system.slice/zookeeper.service
└─5144 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseM>
...
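With both services running, you can optionally perform an end-to-end check using the command-line tools bundled with Kafka. The commands below are a quick sanity test; the topic name test-topic is just an example, and the final command should print the test message before exiting:
/usr/local/kafka-server/bin/kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
/usr/local/kafka-server/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
echo "hello kafka" | /usr/local/kafka-server/bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092
/usr/local/kafka-server/bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092 --max-messages 1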
3. Install Cluster Manager for Apache Kafka Ubuntu 22.04
CMAK (formerly known as Kafka Manager) is a web-based tool developed by Yahoo for managing Apache Kafka clusters. To install CMAK, clone the repository from GitHub:
cd ~
sudo git clone https://github.com/yahoo/CMAK.git
**Output**
Cloning into 'CMAK'...
remote: Enumerating objects: 6542, done.
remote: Counting objects: 100% (266/266), done.
remote: Compressing objects: 100% (142/142), done.
remote: Total 6542 (delta 150), reused 187 (delta 112), pack-reused 6276
Receiving objects: 100% (6542/6542), 3.97 MiB | 12.02 MiB/s, done.
Resolving deltas: 100% (4211/4211), done.
Configure Cluster Manager for Apache Kafka
Modify the CMAK configuration file to specify the Zookeeper hosts. Open the file with vi:
sudo vi ~/CMAK/conf/application.conf
Change the cmak.zkhosts value to your Zookeeper host(s). You can specify multiple hosts by comma-delimiting them.
cmak.zkhosts="localhost:2181"
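If you run a multi-node Zookeeper ensemble, the same setting accepts a comma-delimited list; the hostnames below are placeholders for illustration:
cmak.zkhosts="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"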
Save and close the file.
Create a zip file for deploying the application:
cd ~/CMAK/
./sbt clean dist
This process will download and compile files, which may take some time.
**Output**
[info] Your package is ready in /root/CMAK/target/universal/cmak-3.0.0.7.zip
Navigate to the directory containing the zip file, unzip it, and enter the extracted directory:
cd /root/CMAK/target/universal
unzip cmak-3.0.0.7.zip
cd cmak-3.0.0.7
Access CMAK Service
Start the Cluster Manager for Apache Kafka:
bin/cmak
By default, CMAK runs on port 9000. Open your web browser and navigate to http://ip-or-domain-name-of-server:9000. If your firewall is enabled, allow access to port 9000:
sudo ufw allow 9000
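If you need CMAK to listen on a different port or use a specific configuration file, you can pass the corresponding system properties when starting it, as described in the CMAK README; the port below is only an example:
bin/cmak -Dconfig.file=conf/application.conf -Dhttp.port=9001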
You should see the CMAK interface.

Add Cluster From the Kafka Manager
Click on "cluster" and then "add cluster" to add your Kafka cluster to CMAK.
Fill in the required details, such as the Cluster Name and Zookeeper Hosts.
Create a Topic in the CMAK interface
From the cluster view, click on "Topic" and then "create". Enter the details for the new topic, such as the Replication Factor and Partitions.
Click on "Cluster view" to see your topics.
You can now manage your Kafka topics through the CMAK interface.
Conclusion
You have successfully learned how to Install Apache Kafka on Ubuntu 22.04, access Kafka Manager, add clusters, and create topics. This is a fundamental step toward leveraging Kafka for real-time data streaming and processing.
Alternative Solutions for Managing Apache Kafka
While the guide above outlines a traditional method for installing and managing Kafka, two alternative approaches offer increased flexibility and scalability: using Docker and Kubernetes, and leveraging a managed Kafka service like Confluent Cloud or Amazon MSK.
1. Docker and Kubernetes for Apache Kafka
Explanation: Docker containers provide a consistent and isolated environment for running Kafka and its dependencies. Kubernetes orchestrates these containers, enabling automated deployment, scaling, and management. This approach simplifies the deployment process and ensures consistent performance across different environments. Managing Kafka on Kubernetes offers a highly available and scalable solution.
Steps:
- Create Docker Images: Build Docker images for Zookeeper and Kafka using Dockerfiles. These files define the environment and configurations for each service.
- Define Kubernetes Deployments: Create Kubernetes deployment and service definitions for Zookeeper and Kafka. These definitions specify the number of replicas, resource requirements, and networking configurations.
- Deploy to Kubernetes: Use kubectl to deploy the deployments and services to your Kubernetes cluster.
- Manage with Kubernetes: Utilize Kubernetes features such as rolling updates, auto-scaling, and health checks to manage your Kafka cluster.
Code Example (simplified Kubernetes deployment for Kafka):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: wurstmeister/kafka:2.13-3.3.2
          ports:
            - containerPort: 9092
          env:
            - name: KAFKA_ADVERTISED_HOST_NAME
              value: kafka-service # Use the service name
            - name: KAFKA_ZOOKEEPER_CONNECT
              value: zookeeper-service:2181 # Zookeeper Service
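The deployment above assumes that Services named kafka-service and zookeeper-service already exist so that pods can resolve those names. A minimal sketch of the Kafka Service, whose name and port are assumptions that must match the environment variables used above:
apiVersion: v1
kind: Service
metadata:
  name: kafka-service
spec:
  selector:
    app: kafka          # matches the pod labels in the Deployment
  ports:
    - port: 9092        # port clients use to reach the brokers
      targetPort: 9092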
Benefits:
- Simplified Deployment: Docker containers encapsulate dependencies, eliminating compatibility issues.
- Scalability: Kubernetes allows easy scaling of Kafka brokers based on demand.
- High Availability: Kubernetes automatically restarts failed containers, ensuring high availability.
- Resource Management: Kubernetes efficiently allocates resources to Kafka brokers.
2. Managed Kafka Services (Confluent Cloud, Amazon MSK)
Explanation: Managed Kafka services abstract away the complexities of Kafka deployment and management. Providers like Confluent Cloud and Amazon MSK handle tasks such as provisioning, scaling, patching, and monitoring, allowing users to focus on building applications.
Steps:
- Create an Account: Sign up for an account with a managed Kafka service provider.
- Create a Kafka Cluster: Use the provider’s console or API to create a Kafka cluster, specifying the desired configuration (e.g., number of brokers, storage capacity).
- Configure Networking: Configure networking settings to allow your applications to connect to the Kafka cluster.
- Develop Applications: Develop your Kafka producer and consumer applications using the provider’s client libraries.
- Deploy and Monitor: Deploy your applications and monitor the performance of your Kafka cluster using the provider’s monitoring tools.
Code Example (using Confluent Cloud Kafka client in Python):
from confluent_kafka import Producer

conf = {
    'bootstrap.servers': 'YOUR_BOOTSTRAP_SERVERS',
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'PLAIN',
    'sasl.username': 'YOUR_API_KEY',
    'sasl.password': 'YOUR_API_SECRET'
}

producer = Producer(conf)

def delivery_report(err, msg):
    if err is not None:
        print(f'Message delivery failed: {err}')
    else:
        print(f'Message delivered to {msg.topic()} [{msg.partition()}]')

topic = 'my-topic'
value = 'Hello, Kafka!'

producer.produce(topic, value.encode('utf-8'), callback=delivery_report)
producer.flush()
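A matching consumer can be written with the same confluent_kafka package. This is a minimal sketch; the group id, topic name, and credentials are placeholders:
from confluent_kafka import Consumer

conf = {
    'bootstrap.servers': 'YOUR_BOOTSTRAP_SERVERS',
    'security.protocol': 'SASL_SSL',
    'sasl.mechanisms': 'PLAIN',
    'sasl.username': 'YOUR_API_KEY',
    'sasl.password': 'YOUR_API_SECRET',
    'group.id': 'my-consumer-group',   # placeholder consumer group
    'auto.offset.reset': 'earliest'    # start from the beginning if no committed offset
}

consumer = Consumer(conf)
consumer.subscribe(['my-topic'])

try:
    while True:
        msg = consumer.poll(1.0)       # wait up to 1 second for a message
        if msg is None:
            continue
        if msg.error():
            print(f'Consumer error: {msg.error()}')
            continue
        print(f'Received: {msg.value().decode("utf-8")}')
finally:
    consumer.close()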
Benefits:
- Reduced Operational Overhead: The provider handles infrastructure management, reducing operational costs.
- Scalability: Managed services automatically scale Kafka clusters based on demand.
- High Availability: Providers offer built-in redundancy and fault tolerance.
- Security: Managed services provide security features such as encryption and access control.
- Focus on Application Development: Developers can concentrate on building applications rather than managing infrastructure.
By exploring these alternative solutions, you can choose the deployment and management approach that best suits your needs and technical expertise. Whether you opt for the flexibility of Docker and Kubernetes or the simplicity of a managed service, there’s a Kafka solution that’s right for you. Learning to Install Apache Kafka on Ubuntu 22.04 is just the first step; mastering its deployment is equally important.