Apache Kafka is a distributed, high-throughput streaming platform that provides low latency for data-processing pipelines. This tutorial will teach you how to install and use Apache Kafka through hands-on examples.
Do you need a streaming platform that can manage a lot of data? If you work on Linux, you’ve almost certainly heard of Apache Kafka. Apache Kafka is ideal for real-time data processing and is growing in popularity. Installing Apache Kafka on Linux can be tricky, but don’t worry; this article will walk you through it.
You’ll learn how to install and set up Apache Kafka in this article so you can start processing data like a pro, increasing the efficiency and productivity of your organization.
Continue reading to learn how to use Apache Kafka to stream data.
This will be a hands-on demonstration. Make sure you have the following items if you want to follow along.
- A Linux computer — Debian 10 is used in this demonstration, but any Linux distribution will work.
- A dedicated non-root user with sudo rights to run Kafka — this tutorial uses a sudo user named kafka.
- Java — Java is a requirement for installing Apache Kafka.
- Git — Git is used in this tutorial to obtain the Apache Kafka unit files.
Apache Kafka Installation
You must first install Apache Kafka on your workstation before you can stream data. Since you have a dedicated account for Kafka, you can install it without fear of harming your system.
1. Create the /home/kafka/Downloads directory using the mkdir command. You can name the directory anything you like, but for this demonstration it is named Downloads. This directory will hold the Kafka binaries and guarantees that the kafka user has access to all of your Kafka files.
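This step boils down to a single mkdir call. A minimal sketch, using $HOME as a stand-in for /home/kafka so it also works for other accounts:

```shell
# Create a Downloads directory to hold the Kafka binaries.
# $HOME stands in for /home/kafka here; -p avoids an error
# if the directory already exists.
mkdir -p "$HOME/Downloads"

# Confirm the directory was created
ls -ld "$HOME/Downloads"
```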
2. Next, run the command below to update your system’s package index:
sudo apt update
When asked, provide the password for your kafka user.
Updating the apt package index
3. Use the curl command to download the Kafka binaries from the Apache Foundation website and save them to your ~/Downloads directory as a binary file (kafka.tgz). You will use this file to install Kafka.
Make sure to substitute the newest version of the Kafka binaries for kafka/3.1.0/kafka_2.13-3.1.0.tgz if one is available. The current Kafka version is 3.1.0 as of this writing.
curl "https://dlcdn.apache.org/kafka/3.1.0/kafka_2.13-3.1.0.tgz" -o ~/Downloads/kafka.tgz
Kafka binaries download
4. Now use the tar command to extract (-x) the Kafka binaries (~/Downloads/kafka.tgz) into the automatically created kafka directory. The tar command’s options accomplish the following:
- -v — This option instructs the tar program to display a list of all files as they are extracted.
- -z — This option instructs the tar program to run the archive through gzip as it is uncompressed. Modern versions of GNU tar detect gzip compression automatically when extracting, so the option is not strictly necessary here, but it makes the intent explicit.
- -f — Instructs the tar command which archive file to extract.
- --strip-components 1 — Tells tar to remove the first level of directories from the extracted file names. As a consequence, the extracted contents of ~/Downloads/kafka.tgz land directly in the kafka directory rather than in a nested subfolder.
tar -xvzf ~/Downloads/kafka.tgz --strip-components 1
kafka.tgz Binary File Extraction
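To see concretely what --strip-components does, here is a self-contained sketch using a throwaway archive (demo.tgz and its contents are made up for illustration; they are not part of the Kafka download):

```shell
# Build a small gzip-compressed archive whose contents sit under a
# single top-level folder, mimicking the Kafka tarball's layout.
mkdir -p demo-src/kafka_2.13-3.1.0
echo "sample" > demo-src/kafka_2.13-3.1.0/NOTICE
tar -czf demo.tgz -C demo-src kafka_2.13-3.1.0

# Extract with --strip-components 1: the leading kafka_2.13-3.1.0/
# path component is removed, so NOTICE lands directly in demo-out/.
mkdir -p demo-out
tar -xvzf demo.tgz -C demo-out --strip-components 1
ls demo-out
```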
Apache Kafka Server Configuration
You’ve downloaded and extracted the Kafka binaries at this point. You can’t fully use the Kafka server just yet, though, because by default you can’t remove or alter topics — the category Kafka uses to organize log messages.
You must change the Kafka configuration file (config/server.properties inside your Kafka directory) to set up your Kafka server.
1. Open the Kafka configuration file (config/server.properties) in your favorite text editor.
2. Next, add the delete.topic.enable = true line near the bottom of the server.properties file, save the changes, and close the editor.
This configuration setting lets you remove or edit topics, so make sure you understand what you’re doing before you delete one. When you remove a topic, you also erase its partitions. Once the partitions are gone, any data stored on them is no longer accessible.
Make sure the line you add has no leading spaces; otherwise the property may not be recognized, and your Kafka server will not function correctly.
Setting up your Kafka Server
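If you’d rather not open an editor, the same change can be scripted. A minimal sketch — the ./server.properties path below is a placeholder for illustration; point it at your actual Kafka configuration file:

```shell
# Path to the Kafka configuration file; ./server.properties is a
# stand-in used for illustration -- adjust it for your install.
CONFIG=./server.properties

# Append the setting with no leading whitespace, since a leading
# space could keep the property from being recognized.
printf 'delete.topic.enable = true\n' >> "$CONFIG"

# Confirm the line is present and starts at column one
grep -n '^delete.topic.enable' "$CONFIG"
```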
3. Use git to clone the apache-kafka project to your workstation so you can adapt its files as unit files for your Kafka service:
sudo git clone https://github.com/Adam-the-Automator/apache-kafka.git
Cloning the apache-kafka project
4. Run the commands below to move into the apache-kafka directory and list the files inside it.
cd apache-kafka && ls
Now that you’ve navigated to the apache-kafka directory, you’ll see two files: kafka.service and zookeeper.service, as shown below.
Looking through the apache-kafka directory
5. Open the zookeeper.service file in your chosen text editor. You will use this file as a reference for the kafka.service file.
Customize each part of the zookeeper.service file as required. This example, however, uses the file exactly as it is.
- The [Unit] section configures this unit’s startup behavior. It instructs systemd on how and when to start the zookeeper service.
- The [Service] section specifies how, when, and where the service should be started. The service’s name, description, and command line (what comes after ExecStart=) are all defined in this section.
- The [Install] section specifies the runlevel (target) at which to start the service when entering multi-user mode.
Viewing the zookeeper.service file
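For orientation, a zookeeper unit file typically looks something like the sketch below. This is a representative example only — the paths, user name, and script locations are assumptions, not the exact contents of the cloned file:

```ini
[Unit]
# Start only after the network and remote filesystems are available
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
# Paths are placeholders; point them at your Kafka install
ExecStart=/usr/local/kafka-server/bin/zookeeper-server-start.sh /usr/local/kafka-server/config/zookeeper.properties
ExecStop=/usr/local/kafka-server/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
# Start the service when the system enters multi-user mode
WantedBy=multi-user.target
```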
6. Open the kafka.service file in your favorite text editor and customize how your Kafka server appears as a systemd service.
This example uses the default settings in the kafka.service file, but you can change them as required. Note that this file refers to the zookeeper.service file, which you may want to change at some point.
Viewing the kafka.service file
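Likewise, a kafka.service unit usually depends on the zookeeper service and launches the broker via kafka-server-start.sh. Again, this is a hedged sketch with assumed paths rather than the file’s exact contents:

```ini
[Unit]
# Kafka needs ZooKeeper up first, so order this unit after it
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
# Placeholder paths -- adjust them to your Kafka directory
ExecStart=/bin/sh -c '/usr/local/kafka-server/bin/kafka-server-start.sh /usr/local/kafka-server/config/server.properties > /usr/local/kafka-server/kafka.log 2>&1'
ExecStop=/usr/local/kafka-server/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
```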
7. To start the Kafka service, use the following command.
sudo systemctl start kafka
Keep in mind that your Kafka server should always be stopped and started as a service. If you don’t, the process will stay in memory, and the only way to terminate it is to kill it. If topics are being written or changed as the process shuts down, you may lose data.
Now that you’ve created the kafka.service and zookeeper.service files, you can also use the commands below to stop or restart your systemd-based Kafka server.
sudo systemctl stop kafka
sudo systemctl restart kafka
Controlling systemd services with systemctl
8. Verify that the service has started up correctly using the journalctl command below, which displays all of the kafka service’s logs.
sudo journalctl --unit kafka
If everything is set up successfully, you should get a message saying “Started kafka.service,” as seen below. Congratulations! You now have a fully working Kafka server with systemd services.
Validating that the Kafka service launched successfully
Kafka User Restrictions
The Kafka service now runs as the kafka user. Because kafka is a system-level account, users connecting to Kafka should not be able to act as it.
Otherwise, any client connecting to Kafka via this broker could effectively gain root-level access to the broker machine, which is not desirable. To reduce the risk, remove the kafka user from the sudoers file and lock the kafka user’s password.
1. To return to your usual user account, use the exit command shown below.
Returning to your Regular User Account
2. Run the command below and press Enter to remove the kafka user from the sudo group.
sudo deluser kafka sudo
Removing the kafka user from the sudo group
3. To lock the password for the kafka user, run the command below. This increases the security of your Kafka installation even further.
sudo passwd -l kafka
Locking the kafka user’s password
4. Rerun the command from step two if you need to confirm that the kafka user has been removed from the sudoers list.
The kafka user has been removed from the sudoers list.
5. Because the kafka user’s password is locked, only permitted users such as root can now run commands as kafka, using the su command.
sudo su - kafka
Using su as root to run commands as the kafka user
6. To ensure that your Kafka server is up and functioning, execute the following command to create a new Kafka topic called ATA.
Kafka topics are feeds of messages to and from the server; they help reduce the problems that come with unstructured and disorganized data on Kafka servers.
cd /usr/local/kafka-server && bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic ATA
Making a Kafka Topic (ATA)
7. Run the following command, which uses the kafka-console-producer.sh script, to create a Kafka producer. Kafka producers write data to topics.
echo "Hello World, this sample provided by ATA" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic ATA > /dev/null
Making a Producer for Kafka
8. Finally, execute the following command, which uses the kafka-console-consumer.sh script, to create a Kafka consumer. This command consumes all messages in the Kafka topic (--topic ATA) from the beginning and displays each message’s value.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic ATA --from-beginning
You’ll see your message in the output below because the Kafka console consumer displays messages from the ATA Kafka topic. At this point, the consumer script is still running, waiting for further messages.
You can post new messages to your topic from another terminal, then press Ctrl+C to terminate the consumer script when you’re done.
Testing your Kafka server
In this guide, you’ve learned how to install and set up Apache Kafka on your PC. You’ve also produced messages to a Kafka topic with the Kafka producer and consumed them back, a workflow that leads to better event log management.
Why not use your gained knowledge to better distribute and manage your messages by deploying Kafka with Flume? You can also use Kafka’s Streams API to create applications that read data from and publish data to Kafka, transforming the data as required before writing it to another system such as HDFS, HBase, or Elasticsearch.