Best Practices To Install Apache Kafka on Debian 11

In this guide, we aim to show you how to install Apache Kafka on Debian 11. Apache Kafka is an open-source, distributed publish-subscribe messaging platform designed to handle real-time streaming data, enabling fast, scalable pipelines for distributing, processing, and replaying data feeds. It’s a powerful tool for building real-time data pipelines and streaming applications.

Kafka operates as a broker-based solution, managing streams of data as records within a cluster of servers. These Kafka servers can be geographically distributed across multiple data centers, and data persistence is ensured by storing streams of records (messages) across multiple server instances within topics. A topic organizes records or messages as an ordered series of immutable tuples, each consisting of a key, a value, and a timestamp.

Let’s delve into the steps required to set up Apache Kafka on Debian 11.

Steps To Install and Configure Apache Kafka on Debian 11

Before proceeding, ensure you are logged in to your Debian 11 server as a non-root user with sudo privileges and have a basic firewall configured. Refer to our guide on Initial Server Setup with Debian 11 for detailed instructions.

1. Install Required Packages For Kafka

First, prepare your server for the installation. Update and upgrade your local package index using the following command:

sudo apt update && sudo apt upgrade

Next, install the necessary packages, including JRE and JDK, on your Debian 11 system:

sudo apt install default-jre wget git unzip default-jdk -y
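
Kafka needs a working Java runtime to start. Before moving on, it is worth confirming the installation; on Debian 11, default-jdk currently pulls in OpenJDK 11, so the reported version should be 11 or later:

java -version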

2. Install Apache Kafka on Debian 11

Now, download the latest release of Kafka.

Download Kafka Debian

Visit the Apache Kafka downloads page and locate the latest release. Under Binary downloads, copy the link to the archive and fetch it with the wget command:

sudo wget https://downloads.apache.org/kafka/3.3.2/kafka_2.13-3.3.2.tgz

Create a directory for Kafka under the /usr/local directory and navigate to it:

sudo mkdir /usr/local/kafka-server && cd /usr/local/kafka-server

Extract the downloaded file into this directory:

sudo tar -xvzf ~/kafka_2.13-3.3.2.tgz --strip-components=1
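
To confirm the archive extracted correctly, list the directory; you should see Kafka's standard layout, including the bin, config, and libs subdirectories:

ls /usr/local/kafka-server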

Create Zookeeper Systemd Unit File

Create a Zookeeper systemd unit file to streamline common service actions such as starting, stopping, and restarting Zookeeper.

Zookeeper is a top-level Apache project that provides a centralized service for managing naming and configuration data, along with flexible and robust synchronization within distributed systems. Zookeeper monitors the status of Kafka cluster nodes and keeps track of Kafka topics, partitions, and more.

Use your preferred text editor (e.g., vi editor) to create the zookeeper systemd unit file:

sudo vi /etc/systemd/system/zookeeper.service

Add the following content to the file:

[Unit]
Description=Apache Zookeeper Server
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
ExecStart=/usr/local/kafka-server/bin/zookeeper-server-start.sh /usr/local/kafka-server/config/zookeeper.properties
ExecStop=/usr/local/kafka-server/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save and close the file.

Create Systemd Unit File for Kafka

Create a systemd unit file for Apache Kafka on Debian 11. Again, use your preferred text editor:

sudo vi /etc/systemd/system/kafka.service

Add the following content, ensuring that your JAVA_HOME configuration is correct:

[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64"
ExecStart=/usr/local/kafka-server/bin/kafka-server-start.sh /usr/local/kafka-server/config/server.properties
ExecStop=/usr/local/kafka-server/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save and close the file.

Reload the systemd daemon to apply changes and start the services:

sudo systemctl daemon-reload
sudo systemctl enable --now zookeeper
sudo systemctl enable --now kafka

Verify that your Kafka and Zookeeper services are active and running:

sudo systemctl status kafka
Output
● kafka.service - Apache Kafka Server
     Loaded: loaded (/etc/systemd/system/kafka.service; enabled; vendor preset:>
     Active: active (running) since Mon 2023-01-30 04:30:51 EST; 12s ago
       Docs: http://kafka.apache.org/documentation.html
   Main PID: 5850 (java)
      Tasks: 69 (limit: 4679)
     Memory: 328.2M
        CPU: 7.525s
     CGroup: /system.slice/kafka.service
             └─5850 /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xmx1G -Xms1G -
...
sudo systemctl status zookeeper
Output
● zookeeper.service - Apache Zookeeper Server
     Loaded: loaded (/etc/systemd/system/zookeeper.service; enabled; vendor pre>
     Active: active (running) since Mon 2023-01-30 04:30:45 EST; 1min 12s ago
   Main PID: 5473 (java)
      Tasks: 32 (limit: 4679)
     Memory: 72.6M
        CPU: 2.811s
     CGroup: /system.slice/zookeeper.service
             └─5473 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseM>
...
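
With both services running, you can perform a quick smoke test using the command-line tools bundled with Kafka. The commands below assume the broker is listening on the default localhost:9092:

/usr/local/kafka-server/bin/kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
/usr/local/kafka-server/bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092
/usr/local/kafka-server/bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092

The first command creates a single-partition topic, the second lets you type messages to publish (press Ctrl+C to exit), and the third reads them back from the beginning of the topic.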

3. Install CMAK on Debian 11 (Kafka Manager)

CMAK (formerly Kafka Manager) is an open-source tool developed by Yahoo for managing Apache Kafka clusters. Clone CMAK from GitHub:

cd ~
sudo git clone https://github.com/yahoo/CMAK.git
Output
Cloning into 'CMAK'...
remote: Enumerating objects: 6542, done.
remote: Counting objects: 100% (266/266), done.
remote: Compressing objects: 100% (142/142), done.
remote: Total 6542 (delta 150), reused 187 (delta 112), pack-reused 6276
Receiving objects: 100% (6542/6542), 3.97 MiB | 7.55 MiB/s, done.
Resolving deltas: 100% (4211/4211), done.

4. Configure Cluster Manager for Apache Kafka

Make configuration changes in the CMAK config file:

sudo vi ~/CMAK/conf/application.conf

Change cmak.zkhosts="my.zookeeper.host.com:2181" to reflect your Zookeeper host(s). You can specify multiple Zookeeper hosts separated by commas, like this: cmak.zkhosts="my.zookeeper.host.com:2181,other.zookeeper.host.com:2181". The hostnames can be IP addresses. For this example, we’ll set it to:

cmak.zkhosts="localhost:2181"

Save and close the file.

Create a zip file for deploying the application:

cd ~/CMAK/
./sbt clean dist

This process downloads and compiles files, so it may take a while to complete.

When finished, you’ll see output similar to:

Output
[info] Your package is ready in /root/CMAK/target/universal/cmak-3.0.0.7.zip

Navigate to the directory containing the zip file and extract it:

cd /root/CMAK/target/universal
unzip cmak-3.0.0.7.zip
cd cmak-3.0.0.7

5. Access CMAK Service (Cluster Manager for Apache Kafka)

Run the Cluster Manager for Apache Kafka service:

bin/cmak

By default, it uses port 9000. Open your browser and go to http://ip-or-domain-name-of-server:9000. If your firewall is enabled, allow external access to the port:

sudo ufw allow 9000
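
CMAK is built on the Play framework, so you can override its settings on the command line. For example, if port 9000 is already in use, you can point CMAK at its config file explicitly and move it to another port (8080 here is just an arbitrary free port):

bin/cmak -Dconfig.file=conf/application.conf -Dhttp.port=8080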

You should see the CMAK interface.

Cluster Manager for Apache Kafka

Add Cluster From the CMAK

Add a cluster by clicking "Cluster" and then "Add Cluster".

Add Cluster From Kafka Manager

Fill in the required details (Cluster Name, Zookeeper Hosts, etc.). If you have multiple Zookeeper hosts, separate them with commas.

Cluster Info

Create a Topic in the CMAK interface

From your newly added cluster, click on "Topic" and then "create". Input the necessary details for the new topic (Replication Factor, Partitions, etc.) and click "Create".

Create Topic

Click on Cluster View to see your topics.

Topic Summary

You can now add, delete, and configure topics as needed.
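
The same operations are available from Kafka's own CLI, which is useful for scripting. For example, assuming the broker from this guide on localhost:9092 and the test-topic created earlier:

/usr/local/kafka-server/bin/kafka-topics.sh --describe --topic test-topic --bootstrap-server localhost:9092
/usr/local/kafka-server/bin/kafka-topics.sh --alter --topic test-topic --partitions 6 --bootstrap-server localhost:9092
/usr/local/kafka-server/bin/kafka-topics.sh --delete --topic test-topic --bootstrap-server localhost:9092

Note that the partition count of a topic can only ever be increased, never decreased.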

Conclusion

You have successfully learned how to install Apache Kafka on Debian 11. Kafka finds extensive application in event-driven architectures, log aggregation, data analytics, and stream processing.

Alternative Solutions for Installing Apache Kafka on Debian 11

While the manual installation method described above is a solid approach, alternative solutions offer varying degrees of automation and ease of use. Here are two alternative ways to install Apache Kafka on Debian 11.

1. Using Docker Compose

Docker Compose simplifies the deployment of multi-container Docker applications. This is an excellent option for setting up Kafka along with its dependencies (like Zookeeper) in a self-contained and reproducible environment.

Explanation:

Docker Compose uses a YAML file to define the services, networks, and volumes required for the application. This approach ensures consistency across different environments (development, testing, production). It also reduces the risk of dependency conflicts.

Code Example:

Create a docker-compose.yml file:

version: '3.7'

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - "2181:2181"

  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Two advertised listeners: one for clients inside the Docker network,
      # one for clients on the host machine
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  control-center:
    image: confluentinc/cp-control-center:latest
    depends_on:
      - kafka
    ports:
      - "9021:9021"
    environment:
      CONTROL_CENTER_BOOTSTRAP_SERVERS: 'kafka:29092'
      CONTROL_CENTER_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      CONTROL_CENTER_REPLICATION_FACTOR: 1
      CONTROL_CENTER_INTERNAL_TOPICS_PARTITIONS: 1
      CONTROL_CENTER_MONITORING_INTERCEPTOR_TOPIC_PARTITIONS: 1
    restart: always

Steps:

  1. Install Docker and Docker Compose: If not already installed, follow the official Docker documentation to install Docker and Docker Compose on your Debian 11 system.
  2. Save the docker-compose.yml file: Save the above YAML file to a directory on your server.
  3. Run Docker Compose: Navigate to the directory containing the docker-compose.yml file and run the following command:

    docker-compose up -d

    This command will download the necessary images, create the containers, and start Kafka and Zookeeper in detached mode.

  4. Access Kafka: Kafka will be accessible on localhost:9092. Confluent Control Center can be accessed via localhost:9021, which provides a web UI for managing and monitoring Kafka.
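
To verify the containerized deployment, check the container status and list topics from inside the broker container. The Confluent images ship the Kafka CLI tools on the PATH (without the .sh suffix used by the upstream distribution):

docker-compose ps
docker-compose exec kafka kafka-topics --bootstrap-server localhost:9092 --list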

2. Using Ansible Automation

Ansible is a powerful automation tool that can automate the entire Kafka installation process on Debian 11. This is useful for managing multiple Kafka instances or deploying Kafka across a cluster of servers.

Explanation:

Ansible uses playbooks written in YAML to define the tasks to be executed on remote hosts. This allows you to automate the entire Kafka installation and configuration process, including installing dependencies, downloading Kafka, configuring Zookeeper, and starting the Kafka service.

Code Example:

Create an Ansible playbook (e.g., kafka_install.yml):

---
- hosts: kafka_servers
  become: true
  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes

    - name: Install required packages
      apt:
        name:
          - default-jre
          - wget
          - git
          - unzip
          - default-jdk
        state: present

    - name: Create Kafka directory
      file:
        path: /usr/local/kafka-server
        state: directory
        owner: root
        group: root
        mode: '0755'

    - name: Download Kafka
      get_url:
        url: https://downloads.apache.org/kafka/3.3.2/kafka_2.13-3.3.2.tgz
        dest: /tmp/kafka_2.13-3.3.2.tgz

    - name: Extract Kafka
      unarchive:
        src: /tmp/kafka_2.13-3.3.2.tgz
        dest: /usr/local/kafka-server
        remote_src: yes
        extra_opts: [--strip-components=1]

    - name: Create Zookeeper systemd unit file
      template:
        src: templates/zookeeper.service.j2
        dest: /etc/systemd/system/zookeeper.service

    - name: Create Kafka systemd unit file
      template:
        src: templates/kafka.service.j2
        dest: /etc/systemd/system/kafka.service

    - name: Reload systemd daemon
      systemd:
        daemon_reload: yes

    - name: Enable and start Zookeeper
      systemd:
        name: zookeeper
        enabled: yes
        state: started

    - name: Enable and start Kafka
      systemd:
        name: kafka
        enabled: yes
        state: started

Create the template files (templates/zookeeper.service.j2 and templates/kafka.service.j2) with content similar to the systemd unit files shown earlier in this guide; see the sketch below.
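
As a minimal sketch, templates/kafka.service.j2 could reuse the unit file from earlier with the Java path exposed as a variable. Here, java_home is a hypothetical playbook variable; the default shown matches the path used earlier in this guide:

[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
Environment="JAVA_HOME={{ java_home | default('/usr/lib/jvm/java-11-openjdk-amd64') }}"
ExecStart=/usr/local/kafka-server/bin/kafka-server-start.sh /usr/local/kafka-server/config/server.properties
ExecStop=/usr/local/kafka-server/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target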

Steps:

  1. Install Ansible: Install Ansible on your control machine (the machine from which you will run the playbook).

    sudo apt update
    sudo apt install ansible
  2. Configure Ansible Inventory: Create an Ansible inventory file (e.g., hosts) that lists the Kafka servers:

    [kafka_servers]
    kafka1 ansible_host=your_kafka_server_ip ansible_user=your_user ansible_password=your_password

    Replace your_kafka_server_ip, your_user, and your_password with the appropriate values for your Kafka server. You can also use SSH keys for authentication instead of passwords.

  3. Run the Playbook: Execute the Ansible playbook:

    ansible-playbook -i hosts kafka_install.yml

    This command will connect to the Kafka server(s) and execute the tasks defined in the playbook.
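
Before applying the playbook to real servers, you can validate it with Ansible's built-in flags: --syntax-check parses the playbook without running it, and --check performs a dry run (some modules, such as unarchive, can only approximate their result in check mode):

ansible-playbook -i hosts kafka_install.yml --syntax-check
ansible-playbook -i hosts kafka_install.yml --check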

These alternative solutions offer different advantages. Docker Compose provides a simple and self-contained deployment, while Ansible offers powerful automation capabilities for managing multiple Kafka instances. The best choice depends on your specific requirements and infrastructure.

By understanding these different approaches, you can choose the method that best suits your needs for installing Apache Kafka on Debian 11.