Install Apache Kafka on Ubuntu 20.04: A Step-by-Step Guide
In today’s data-driven world, the ability to process and analyze real-time data streams is crucial for businesses of all sizes. Apache Kafka, a distributed streaming platform, has emerged as the industry standard for handling data in motion. Capable of processing trillions of events per day, Kafka is more than just a messaging queue; it’s a robust and scalable solution for building real-time data pipelines and streaming applications. This guide provides a step-by-step walkthrough on how to install Apache Kafka on Ubuntu 20.04.
This guide, inspired by the expertise at Orcacore, will walk you through the process of setting up Apache Kafka on your Ubuntu 20.04 server. By following these instructions, you’ll be well on your way to leveraging the power of Kafka for your data streaming needs.
Before you begin installing Apache Kafka on Ubuntu 20.04, ensure you have a non-root user with sudo privileges on your Ubuntu 20.04 server. A basic firewall should also be configured. If you haven’t already done so, refer to a guide on Initial Server Setup with Ubuntu 20.04 for assistance.
1. Install Required Packages For Kafka on Ubuntu 20.04
The first step in setting up Kafka is to ensure your server has the necessary dependencies. Start by updating your local package index and upgrading existing packages:
# sudo apt update
# sudo apt upgrade
Next, install the required packages, including Java Runtime Environment (JRE), Java Development Kit (JDK), wget, git, and unzip:
sudo apt install default-jre wget git unzip default-jdk -y
2. Set up Apache Kafka on Ubuntu 20.04
With the dependencies in place, you can now proceed with downloading and installing Kafka.
Apache Kafka Download
Visit the Apache Kafka downloads page to find the latest release. Under "Binary downloads," select a recommended version and use the wget command to download the archive:
sudo wget https://downloads.apache.org/kafka/3.3.2/kafka_2.13-3.3.2.tgz
Create a directory for Kafka under /usr/local and navigate to it:
# sudo mkdir /usr/local/kafka-server
# cd /usr/local/kafka-server
Extract the downloaded Kafka archive into this directory:
sudo tar -xvzf ~/kafka_2.13-3.3.2.tgz --strip-components=1
Create Zookeeper Systemd Unit File
Zookeeper is a critical component of Kafka, acting as a centralized service for managing configuration data, naming, and synchronization within the distributed system. It tracks the status of Kafka cluster nodes, topics, and partitions.
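The Kafka archive ships with a minimal Zookeeper configuration at config/zookeeper.properties. The defaults below are shown as they appear in recent Kafka releases; they are fine for a single-node setup, though you may want to move dataDir off /tmp so snapshots survive a reboot:

```properties
# /usr/local/kafka-server/config/zookeeper.properties (shipped defaults)
# Snapshot directory; relocate off /tmp for anything beyond testing
dataDir=/tmp/zookeeper
# Port Kafka brokers (and CMAK) connect to
clientPort=2181
# 0 disables the per-host client connection limit
maxClientCnxns=0
# Keep the ZooKeeper AdminServer HTTP endpoint off
admin.enableServer=false
```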
Create a systemd unit file for Zookeeper to simplify service management:
sudo vi /etc/systemd/system/zookeeper.service
Add the following content to the file:
[Unit]
Description=Apache Zookeeper Server
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
ExecStart=/usr/local/kafka-server/bin/zookeeper-server-start.sh /usr/local/kafka-server/config/zookeeper.properties
ExecStop=/usr/local/kafka-server/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Save and close the file.
Create Systemd Unit File for Kafka
Now, create a systemd unit file for Kafka itself:
sudo vi /etc/systemd/system/kafka.service
Add the following content, ensuring your JAVA_HOME configuration is correct:
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64"
ExecStart=/usr/local/kafka-server/bin/kafka-server-start.sh /usr/local/kafka-server/config/server.properties
ExecStop=/usr/local/kafka-server/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Save and close the file.
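If you are unsure which path to use for JAVA_HOME, you can resolve it from the installed java binary; with default-jdk on Ubuntu 20.04 this typically prints the java-11-openjdk path used in the unit file above:

```shell
# Resolve the real JDK location behind the `java` symlink chain
java_bin=$(readlink -f "$(command -v java)")
# Strip the trailing /bin/java to get a JAVA_HOME candidate
echo "${java_bin%/bin/java}"
```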
Manage Kafka and Zookeeper Services
Reload the systemd daemon to apply the changes and then start and enable the Zookeeper and Kafka services:
# sudo systemctl daemon-reload
# sudo systemctl enable --now zookeeper
# sudo systemctl enable --now kafka
Verify that both Kafka and Zookeeper services are active and running:
# sudo systemctl status kafka

# sudo systemctl status zookeeper

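Beyond systemctl, a quick way to confirm the broker actually accepts connections is to exercise the bundled CLI tools. This assumes the default listener on localhost:9092; the topic name here is arbitrary:

```shell
BIN=/usr/local/kafka-server/bin

# Create a throwaway topic and list topics to prove the broker responds
$BIN/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic smoke-test --partitions 1 --replication-factor 1
$BIN/kafka-topics.sh --bootstrap-server localhost:9092 --list

# Publish one message, then read it back
echo "hello kafka" | $BIN/kafka-console-producer.sh \
  --bootstrap-server localhost:9092 --topic smoke-test
$BIN/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic smoke-test --from-beginning --max-messages 1
```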
3. Install CMAK on Ubuntu 20.04
CMAK (formerly Kafka Manager) provides a user-friendly interface for managing your Kafka clusters. Clone the CMAK repository from GitHub:
# cd ~
# sudo git clone https://github.com/yahoo/CMAK.git

Configure CMAK on Ubuntu 20.04
Modify the CMAK configuration file to specify the Zookeeper host:
sudo vi ~/CMAK/conf/application.conf
Update the cmak.zkhosts value to point to your Zookeeper instance. You can specify multiple Zookeeper hosts by comma delimiting them.
cmak.zkhosts="localhost:2181"
Save and close the file.
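If you prefer scripting the change over editing by hand, a sed one-liner does the same job. The demo below runs against a scratch copy so nothing breaks if paths differ on your system; for real use, point it at ~/CMAK/conf/application.conf (the shipped default value shown is illustrative):

```shell
# Work on a scratch copy; the real file is ~/CMAK/conf/application.conf
conf=$(mktemp)
echo 'cmak.zkhosts="kafka-manager-zookeeper:2181"' > "$conf"

# Rewrite the cmak.zkhosts line to point at the local ZooKeeper
sed -i 's|^cmak.zkhosts=.*|cmak.zkhosts="localhost:2181"|' "$conf"
grep '^cmak.zkhosts' "$conf"   # -> cmak.zkhosts="localhost:2181"
```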
Create a zip file for deploying the application. This process may take some time as files are downloaded and compiled.
# cd ~/CMAK/
# ./sbt clean dist
Upon completion, you should see output similar to the following (the exact path reflects the home directory of the user running the build):
Output
[info] Your package is ready in /root/CMAK/target/universal/cmak-3.0.0.7.zip
Navigate to the directory containing the zip file and extract it:
# cd ~/CMAK/target/universal
# unzip cmak-3.0.0.7.zip
# cd cmak-3.0.0.7
Access CMAK Service
Run the CMAK service:
bin/cmak
By default, CMAK runs on port 9000. Open your web browser and navigate to http://ip-or-domain-name-of-server:9000. If your firewall is enabled, allow access to port 9000:
sudo ufw allow 9000
You should see the CMAK login interface:

Add a Cluster from CMAK
To add your Kafka cluster, click "Cluster" and then "Add Cluster."

Fill in the form with the required details, such as Cluster Name and Zookeeper Hosts. If you have multiple Zookeeper hosts, separate them with commas. Configure other details as needed.

Create a Topic in the CMAK interface
From your newly added cluster, click "Topic" and then "Create." Enter the required details for the new topic, such as Replication Factor and Partitions, then click "Create."

Click on "Cluster view" to see your topics.

From here, you can add, delete, configure, and manage your topics. The process of installing Apache Kafka on Ubuntu 20.04 is now complete.
Conclusion
You have successfully installed Apache Kafka on Ubuntu 20.04, accessed the CMAK interface, and created clusters. Now you can begin to explore the vast capabilities of Apache Kafka and build powerful data streaming applications.
Alternative Solutions for Setting Up Apache Kafka on Ubuntu 20.04
While the manual installation method outlined above provides a solid understanding of the underlying components, alternative approaches exist that can simplify and accelerate the deployment process. Here are two different ways to solve the problem of setting up Kafka:
1. Using Docker Compose:
Docker Compose simplifies the deployment of multi-container Docker applications. You can define a docker-compose.yml file that specifies the Kafka and Zookeeper containers, their dependencies, and configurations. This approach offers several advantages, including portability, reproducibility, and simplified management.
Here’s an example docker-compose.yml file:
version: '3.7'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - "2181:2181"
  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,BROKER://localhost:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,BROKER:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: BROKER
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
To deploy Kafka using Docker Compose, save the above content to a file named docker-compose.yml and run the following command in the same directory:
docker-compose up -d
This command will download the necessary images, create the containers, and start Kafka and Zookeeper in detached mode.
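Once the stack is up, you can check it from the same directory. The service names below match the compose file above, and the in-container port follows its listener layout; the topic name is arbitrary:

```shell
# List the containers and their state
docker-compose ps

# Tail the broker log to confirm startup completed
docker-compose logs kafka | tail -n 20

# Create a topic from inside the broker container as a smoke test
docker-compose exec kafka kafka-topics --bootstrap-server localhost:29092 \
  --create --topic compose-test --partitions 1 --replication-factor 1
```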
2. Using Ansible:
Ansible is an automation tool that allows you to define infrastructure as code. You can create an Ansible playbook to automate the installation and configuration of Kafka on your Ubuntu 20.04 server. This approach is particularly useful for deploying Kafka across multiple servers or for maintaining a consistent configuration across your environment.
Here’s a simplified example of an Ansible playbook for installing Kafka:
---
- hosts: kafka_servers
  become: true
  tasks:
    - name: Install required packages
      apt:
        name: "{{ packages }}"
        state: present
      vars:
        packages:
          - default-jre
          - wget
          - git
          - unzip
          - default-jdk
    - name: Download Kafka
      get_url:
        url: "https://downloads.apache.org/kafka/3.3.2/kafka_2.13-3.3.2.tgz"
        dest: /tmp/kafka_2.13-3.3.2.tgz
    - name: Create Kafka directory
      file:
        path: /usr/local/kafka-server
        state: directory
        owner: root
        group: root
    - name: Extract Kafka
      unarchive:
        src: /tmp/kafka_2.13-3.3.2.tgz
        dest: /usr/local/kafka-server
        remote_src: yes
        creates: /usr/local/kafka-server/bin/kafka-server-start.sh
        extra_opts: [--strip-components=1]
    # Add more tasks to configure Zookeeper and Kafka services
This playbook demonstrates the basic steps of installing required packages, downloading and extracting the Kafka archive, and creating the Kafka directory. You would need to add additional tasks to configure Zookeeper, create systemd unit files, and manage the Kafka service. Automating those steps is what makes this approach pay off when deploying to multiple servers.
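The commented placeholder at the end of the playbook could be filled in along these lines. This is a sketch, assuming you keep copies of the unit files from section 2 next to the playbook as zookeeper.service and kafka.service:

```yaml
# Hypothetical continuation: install the unit files and bring the services up
- name: Install systemd unit files
  copy:
    src: "{{ item }}"
    dest: "/etc/systemd/system/{{ item }}"
    mode: "0644"
  loop:
    - zookeeper.service
    - kafka.service

- name: Enable and start Zookeeper and Kafka
  systemd:
    name: "{{ item }}"
    state: started
    enabled: true
    daemon_reload: true
  loop:
    - zookeeper
    - kafka
```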
By leveraging Docker Compose or Ansible, you can significantly streamline the process of setting up Kafka on Ubuntu 20.04, saving time and effort while ensuring a consistent and reliable deployment.