Install Apache Kafka on Ubuntu 20.04: Best Data Handle

In today’s data-driven world, the ability to process and analyze real-time data streams is crucial for businesses of all sizes. Apache Kafka, a distributed streaming platform, has emerged as the industry standard for handling data in motion. Capable of processing trillions of events per day, Kafka is more than just a messaging queue; it’s a robust and scalable solution for building real-time data pipelines and streaming applications. This guide provides a step-by-step walkthrough on how to install Apache Kafka on Ubuntu 20.04.

This guide, inspired by the expertise at Orcacore, will walk you through the process of setting up Apache Kafka on your Ubuntu 20.04 server. By following these instructions, you’ll be well on your way to leveraging the power of Kafka for your data streaming needs.

Before you begin to install Apache Kafka on Ubuntu 20.04, ensure you have a non-root user with sudo privileges on your Ubuntu 20.04 server and a basic firewall configured. If you haven’t already done so, refer to a guide on Initial Server Setup with Ubuntu 20.04 for assistance.

1. Install Required Packages For Kafka on Ubuntu 20.04

The first step in setting up Kafka is to ensure your server has the necessary dependencies. Start by updating your local package index and upgrading existing packages:

# sudo apt update
# sudo apt upgrade

Next, install the required packages, including Java Runtime Environment (JRE), Java Development Kit (JDK), wget, git, and unzip:

sudo apt install default-jre wget git unzip default-jdk -y
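The Kafka unit file created later hardcodes a JAVA_HOME path, so it helps to confirm where your JDK actually landed. A minimal sketch, assuming a stock Ubuntu OpenJDK package; the derived path is only a guess to verify by hand:

```shell
# Confirm Java is present and derive a likely JAVA_HOME.
# The printed path is a best guess -- verify it before reusing it.
if command -v java >/dev/null 2>&1; then
  java -version
  # Resolve symlinks on the java binary, then strip the trailing /bin/java:
  JAVA_BIN=$(readlink -f "$(command -v java)")
  JAVA_HOME_GUESS=$(dirname "$(dirname "$JAVA_BIN")")
  echo "Suggested JAVA_HOME: $JAVA_HOME_GUESS"
else
  echo "java not found -- install default-jdk first"
fi
```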

2. Set up Apache Kafka on Ubuntu 20.04

With the dependencies in place, you can now proceed with downloading and installing Kafka.

Apache Kafka Download

Visit the Apache Kafka downloads page to find the latest release. Under "Binary downloads," select a recommended version and use the wget command to download the archive:

sudo wget https://downloads.apache.org/kafka/3.3.2/kafka_2.13-3.3.2.tgz
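Apache publishes a .sha512 file alongside each release archive, and comparing digests before extracting guards against a corrupted download. A hedged sketch; it assumes the archive sits in the current directory, and since the published checksum file's format may not parse with sha512sum -c, the two digests are printed for comparison by eye:

```shell
# Verify the download against the published SHA-512 digest.
# Skips quietly if the archive is not in the current directory.
if [ -f kafka_2.13-3.3.2.tgz ]; then
  wget -q https://downloads.apache.org/kafka/3.3.2/kafka_2.13-3.3.2.tgz.sha512
  sha512sum kafka_2.13-3.3.2.tgz   # local digest
  cat kafka_2.13-3.3.2.tgz.sha512  # published digest -- compare the two
fi
```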

Create a directory for Kafka under /usr/local and navigate to it:

# sudo mkdir /usr/local/kafka-server
# cd /usr/local/kafka-server

Extract the downloaded Kafka archive into this directory:

sudo tar -xvzf ~/kafka_2.13-3.3.2.tgz --strip 1
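The --strip 1 flag removes the archive's top-level kafka_2.13-3.3.2/ directory so that bin/, config/, and libs/ land directly under /usr/local/kafka-server. If the flag is unfamiliar, this throwaway demonstration in a temporary directory shows the effect:

```shell
# Demonstrate tar's --strip-components behaviour using a scratch archive.
demo=$(mktemp -d)
mkdir -p "$demo/src/kafka_2.13-3.3.2/bin"
touch "$demo/src/kafka_2.13-3.3.2/bin/kafka-server-start.sh"
tar -czf "$demo/kafka.tgz" -C "$demo/src" kafka_2.13-3.3.2
mkdir "$demo/out"
tar -xzf "$demo/kafka.tgz" -C "$demo/out" --strip-components=1
# bin/ now sits directly under out/, with no version-named wrapper dir:
ls "$demo/out/bin"
rm -rf "$demo"
```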

Create Zookeeper Systemd Unit File

Zookeeper is a critical component of Kafka, acting as a centralized service for managing configuration data, naming, and synchronization within the distributed system. It tracks the status of Kafka cluster nodes, topics, and partitions.

Create a systemd unit file for Zookeeper to simplify service management:

sudo vi /etc/systemd/system/zookeeper.service

Add the following content to the file:

[Unit]
Description=Apache Zookeeper Server
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
ExecStart=/usr/local/kafka-server/bin/zookeeper-server-start.sh /usr/local/kafka-server/config/zookeeper.properties
ExecStop=/usr/local/kafka-server/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save and close the file.

Create Systemd Unit File for Kafka

Now, create a systemd unit file for Kafka itself:

sudo vi /etc/systemd/system/kafka.service

Add the following content, ensuring your JAVA_HOME configuration is correct:

[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64"
ExecStart=/usr/local/kafka-server/bin/kafka-server-start.sh /usr/local/kafka-server/config/server.properties
ExecStop=/usr/local/kafka-server/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save and close the file.

Manage Kafka and Zookeeper Services

Reload the systemd daemon to apply the changes and then start and enable the Zookeeper and Kafka services:

# sudo systemctl daemon-reload
# sudo systemctl enable --now zookeeper
# sudo systemctl enable --now kafka

Verify that both Kafka and Zookeeper services are active and running:

# sudo systemctl status kafka
# sudo systemctl status zookeeper

3. Install CMAK on Ubuntu 20.04

CMAK (formerly Kafka Manager) provides a user-friendly interface for managing your Kafka clusters. Clone the CMAK repository from GitHub:

# cd ~
# sudo git clone https://github.com/yahoo/CMAK.git

Configure CMAK on Ubuntu 20.04

Modify the CMAK configuration file to specify the Zookeeper host:

sudo vi ~/CMAK/conf/application.conf

Update the cmak.zkhosts value to point to your Zookeeper instance. You can specify multiple Zookeeper hosts by comma delimiting them.

cmak.zkhosts="localhost:2181"
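For a multi-node Zookeeper ensemble, the same setting takes a comma-delimited list; the hostnames below are placeholders:

```
cmak.zkhosts="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
```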

Save and close the file.

Create a zip file for deploying the application. This process may take some time as files are downloaded and compiled.

# cd ~/CMAK/
# ./sbt clean dist

Upon completion, you should see the following output:

Output:

[info] Your package is ready in /root/CMAK/target/universal/cmak-3.0.0.7.zip

Navigate to the directory containing the zip file and extract it:

# cd /root/CMAK/target/universal
# unzip cmak-3.0.0.7.zip
# cd cmak-3.0.0.7

Access CMAK Service

Run the CMAK service:

bin/cmak

By default, CMAK runs on port 9000. Open your web browser and navigate to http://ip-or-domain-name-of-server:9000. If your firewall is enabled, allow access to port 9000:

sudo ufw allow 9000

You should see the CMAK login interface.

Add a Cluster From the CMAK Interface

To add your Kafka cluster, click "cluster" and then "add cluster."


Fill in the form with the required details, such as Cluster Name and Zookeeper Hosts. If you have multiple Zookeeper hosts, separate them with commas. Configure other details as needed.


Create a Topic in the CMAK interface

From your newly added cluster, click on "Topic" and then "create." Enter the required details for the new topic, such as Replication Factor and Partitions. Click "Create" when finished.


Click on "Cluster view" to see your topics.


From here, you can add, delete, configure, and manage your topics. The installation of Apache Kafka on Ubuntu 20.04 is now complete.
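CMAK is convenient, but the same topic operations are available from Kafka's own CLI, which is handy on headless servers. A sketch assuming the /usr/local/kafka-server layout from step 2 and an example topic name (demo-events):

```shell
# Create and list a topic with Kafka's bundled tooling.
# Paths and the topic name are examples -- adjust to your setup.
KAFKA_BIN=/usr/local/kafka-server/bin
if [ -x "$KAFKA_BIN/kafka-topics.sh" ]; then
  "$KAFKA_BIN/kafka-topics.sh" --create \
    --bootstrap-server localhost:9092 \
    --topic demo-events --partitions 3 --replication-factor 1
  "$KAFKA_BIN/kafka-topics.sh" --list --bootstrap-server localhost:9092
else
  echo "kafka-topics.sh not found under $KAFKA_BIN"
fi
```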

Conclusion

You have successfully installed Apache Kafka on Ubuntu 20.04, accessed the CMAK interface, and created a cluster and a topic. Now you can begin to explore the vast capabilities of Apache Kafka and build powerful data streaming applications.

Alternative Solutions for Setting Up Apache Kafka on Ubuntu 20.04

While the manual installation method outlined above provides a solid understanding of the underlying components, alternative approaches exist that can simplify and accelerate the deployment process. Here are two different ways to solve the problem of setting up Kafka:

1. Using Docker Compose:

Docker Compose simplifies the deployment of multi-container Docker applications. You can define a docker-compose.yml file that specifies the Kafka and Zookeeper containers, their dependencies, and configurations. This approach offers several advantages, including portability, reproducibility, and simplified management.

Here’s an example docker-compose.yml file:

version: '3.7'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - "2181:2181"

  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092,BROKER://kafka:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,BROKER:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: BROKER
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

To deploy Kafka using Docker Compose, save the above content to a file named docker-compose.yml and run the following command in the same directory:

docker-compose up -d

This command will download the necessary images, create the containers, and start Kafka and Zookeeper in detached mode.

2. Using Ansible:

Ansible is an automation tool that allows you to define infrastructure as code. You can create an Ansible playbook to automate the installation and configuration of Kafka on your Ubuntu 20.04 server. This approach is particularly useful for deploying Kafka across multiple servers or for maintaining a consistent configuration across your environment.

Here’s a simplified example of an Ansible playbook for installing Kafka:

---
- hosts: kafka_servers
  become: true
  tasks:
    - name: Install required packages
      apt:
        name: "{{ packages }}"
        state: present
      vars:
        packages:
          - default-jre
          - wget
          - git
          - unzip
          - default-jdk

    - name: Download Kafka
      get_url:
        url: "https://downloads.apache.org/kafka/3.3.2/kafka_2.13-3.3.2.tgz"
        dest: /tmp/kafka_2.13-3.3.2.tgz

    - name: Create Kafka directory
      file:
        path: /usr/local/kafka-server
        state: directory
        owner: root
        group: root

    - name: Extract Kafka
      unarchive:
        src: /tmp/kafka_2.13-3.3.2.tgz
        dest: /usr/local/kafka-server
        remote_src: yes
        creates: /usr/local/kafka-server/bin/kafka-server-start.sh
        extra_opts: [--strip-components=1]

    # Add more tasks to configure Zookeeper and Kafka services

This playbook demonstrates the basic steps of installing required packages, downloading and extracting the Kafka archive, and creating the Kafka directory. You would need to add additional tasks to configure Zookeeper, create systemd unit files, and manage the Kafka service. This approach to installing Apache Kafka on Ubuntu 20.04 is especially useful when you need to deploy it across multiple servers.
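The placeholder comment in the playbook can be filled in with tasks along these lines. The unit-file contents would mirror the ones written by hand earlier; the files/*.service source paths are an assumption about your playbook layout:

```yaml
    - name: Install Zookeeper and Kafka systemd unit files
      copy:
        src: "files/{{ item }}.service"
        dest: "/etc/systemd/system/{{ item }}.service"
        mode: "0644"
      loop:
        - zookeeper
        - kafka

    - name: Start and enable both services
      systemd:
        name: "{{ item }}"
        state: started
        enabled: true
        daemon_reload: true
      loop:
        - zookeeper
        - kafka
```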

By leveraging Docker Compose or Ansible, you can significantly streamline the process of setting up Kafka on Ubuntu 20.04, saving time and effort while ensuring a consistent and reliable deployment.
