How to Speed Up Bash Scripts with Multithreading and GNU Parallel


Many Linux distributions come with the Bash shell script interpreter, but running long tasks one after another in a Bash script can be slow and inefficient. In this tutorial, you will use GNU Parallel to speed up tasks executed in your Bash scripts.

Running Bash scripts in parallel speeds up their execution by letting several jobs run at the same time instead of one after another. This is often described as multithreading, although GNU Parallel actually starts separate processes.


This guide is for you if your Bash scripts are taking an eternity to execute. Often, you can run many Bash scripts simultaneously, greatly speeding up the process. How? By using the GNU Parallel tool, usually just called Parallel. This guide walks through several useful GNU Parallel examples.

Parallel applies the idea of multithreading to Bash: instead of running one task at a time, it runs several at once, by default roughly one job per CPU core, which reduces the total time a script takes.

This article will teach you how to write multithreaded Bash scripts using a variety of GNU Parallel examples.

This tutorial includes plenty of hands-on demos. If you want to follow along, you'll need the following:

  • A computer running Linux. Any distribution will do. This tutorial uses Ubuntu 20.04 on Windows Subsystem for Linux (WSL).
  • A user account with sudo access.


GNU Parallel Installation

To use multithreading to speed up Bash scripts, you must first install Parallel. So let’s get this party started by downloading and installing it.

1. Begin by opening a Bash terminal.

2. Download the Parallel package using wget. The following command downloads the most recent release (parallel-latest) into the current working directory.

wget https://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2

If you’d like to use an earlier version of GNU Parallel, all packages may be found on the official download site.

3. Now extract the archive you just downloaded with the tar command.

The command below uses the x flag to extract the archive, the j flag to indicate that it is a bzip2-compressed (.bz2) archive, and the f flag to pass the archive file to tar.

sudo tar -xjf parallel-latest.tar.bz2

You should now have a directory named parallel-YYYYMMDD, where YYYYMMDD is the year, month, and day of the most recent release.

4. Use cd to move into the extracted directory. In this tutorial, the directory is named parallel-20210422, as shown below.

cd parallel-20210422

5. Finally, run the following commands to build and install the GNU Parallel binary:

./configure && make && sudo make install

Check that Parallel was installed successfully by displaying its version number.

parallel --version

The first time you run Parallel, you may see a few alarming lines that start with something like perl: warning:. These warnings mean that Perl, the language Parallel is written in, cannot find your locale and language settings. Don't worry about them for now; you'll learn how to fix them later.
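If you would rather run the installation steps in one go, they can be collected into a single Bash function. The sketch below assumes wget, make, and sudo are available; install_gnu_parallel is a hypothetical name, and the function is only defined here, not called.

```shell
#!/bin/bash
# Hypothetical helper: bundles the installation steps into one function.
# The body is wrapped in ( ... ) so it runs in a subshell: "set -e" and
# "cd" do not leak into the shell that calls it.
install_gnu_parallel() (
  set -euo pipefail

  # Work in a fresh temporary directory so old downloads do not interfere.
  workdir=$(mktemp -d)
  cd "$workdir"

  # Download the latest release tarball.
  wget https://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2

  # Extract it: x = extract, j = bzip2, f = read from the named file.
  tar -xjf parallel-latest.tar.bz2

  # Enter the extracted parallel-YYYYMMDD directory (the only one here).
  cd parallel-*/

  # Configure, build, and install; only the install step needs sudo.
  ./configure
  make
  sudo make install

  # Confirm the installed binary runs.
  parallel --version
)
```

Call install_gnu_parallel from your terminal. Because the body runs in a subshell with set -e, a failed step stops the function without closing your terminal.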

GNU Parallel Configuration

Now that Parallel is installed, you can use it right away! However, before you begin, set up a few basic options first.

While still in your Bash terminal, run Parallel with the --citation option and type will cite when prompted. This acknowledges GNU Parallel's academic citation notice and tells Parallel that you will cite it in any academic publication that uses it.

If you do not wish to support GNU or its maintainers this way, agreeing to cite is not required to use GNU Parallel.

parallel --citation

Next, set the locale and language environment variables by running the commands below. Setting them is not strictly required, but GNU Parallel checks for them every time it runs.

If these environment variables are missing, Parallel will keep complaining about them, as you saw in the previous section.

This tutorial uses English (US) settings; substitute your own language if you prefer.

export LC_ALL=C
export LANGUAGE=en_US
export LANG=en_US.UTF-8
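The export commands only affect the current shell session. One common way to make them permanent is to append them to ~/.bashrc. The sketch below does that without adding duplicates; persist_locale is a hypothetical helper name, and it assumes you want the values LC_ALL=C, LANGUAGE=en_US, and LANG=en_US.UTF-8.

```shell
#!/bin/bash
# Hypothetical helper: append the locale exports to a startup file such as
# ~/.bashrc, skipping any line that is already present.
persist_locale() {
  local rc_file="$1"
  local line
  # Make sure the file exists so grep does not fail on a missing file.
  touch "$rc_file"
  for line in 'export LC_ALL=C' 'export LANGUAGE=en_US' 'export LANG=en_US.UTF-8'; do
    # -q quiet, -x whole-line match, -F fixed string (no regex).
    grep -qxF "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
  done
}

# Example (uncomment to modify your own ~/.bashrc):
# persist_locale "$HOME/.bashrc"
```

Running the helper twice leaves the file unchanged the second time, so it is safe to include in a setup script.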


Executing Ad-Hoc Shell Commands

Let's get GNU Parallel up and running! First, you'll learn the basic syntax. Once you're comfortable with it, you'll move on to some useful GNU Parallel examples.

Start with a super-simple example: echoing the numbers 1 through 5.

1. Run the following commands in your Bash terminal. Isn't it thrilling? Bash uses the echo command to send the numbers 1-5 to the terminal. If you placed these commands in a script, Bash would run them in order, waiting for each one to finish before starting the next.

echo 1
echo 2
echo 3
echo 4
echo 5

Here, the five commands finish almost instantly. But what if they were genuinely useful Bash scripts that took an eternity to run?

2. Now use Parallel to run all of those commands at the same time, as shown below. Parallel runs the echo command and, as indicated by :::, passes the arguments 1, 2, 3, 4, and 5 to it. The three colons tell Parallel that you're supplying input on the command line rather than through the pipeline (more on that later).

In the example below, you gave Parallel a single command, echo, with no arguments of its own; the arguments come after :::. As in all Parallel examples, Parallel started a separate process for each argument, spreading those processes across the available CPU cores.

# input from the command line
parallel echo ::: 1 2 3 4 5

All Parallel commands follow the syntax parallel [Options] <Command to multi-thread>.
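Two details of the ::: syntax are worth seeing side by side. Parallel prints each job's output as that job finishes, so lines can appear in a different order from the input; the -k (--keep-order) option restores input order. Giving more than one ::: source makes Parallel run every combination, with {1} and {2} referring to the first and second source. A minimal sketch:

```shell
#!/bin/bash
# Without -k, output lines appear in the order the jobs finish.
parallel echo ::: 1 2 3 4 5

# With -k (--keep-order), output follows the input order.
parallel -k echo ::: 1 2 3 4 5

# Two ::: sources: Parallel runs every combination of the inputs.
# {1} is the value from the first source, {2} the value from the second.
parallel -k echo {1}-{2} ::: a b ::: 1 2
```

With -k, the last command prints a-1, a-2, b-1, and b-2, one per line.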

3. To simulate Parallel receiving input from the Bash pipeline, create a file named count_file.txt containing the numbers 1 through 5, one per line. Each number represents an argument you'll pass to the echo command.

4. Now use the cat command to read that file and pipe the result to Parallel, as shown below. In this example, {} stands for each argument (1-5) that Parallel receives.

# input from the pipeline
cat count_file.txt | parallel echo {}
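Instead of typing the numbers by hand, you can create count_file.txt from the shell. The sketch below also shows three equivalent ways to feed the file's lines to Parallel: a cat pipeline, a plain input redirect, and the :::: operator, which reads arguments from a file. The -k option keeps the output in input order so the three runs are easy to compare.

```shell
#!/bin/bash
# Create count_file.txt with the numbers 1-5, one per line.
printf '%s\n' 1 2 3 4 5 > count_file.txt

# 1. Pipe the file into Parallel (as in the step above).
cat count_file.txt | parallel -k echo {}

# 2. Redirect the file to Parallel's standard input.
parallel -k echo {} < count_file.txt

# 3. Use :::: to read the arguments directly from the file.
parallel -k echo {} :::: count_file.txt
```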


Comparing Bash and GNU Parallel

At this point, Parallel may seem like a cumbersome way to run Bash commands. However, its real value is the time it saves. A Bash script runs its commands one after another, so it keeps only one CPU core busy at a time, whereas GNU Parallel can run jobs on several CPU cores at once.

1. To see the difference between sequential and parallel Bash commands, create a Bash script named test.sh with the following code, in the same directory as count_file.txt.

The script below reads count_file.txt and, for each number in it, sleeps for that many seconds (1, 2, 3, 4, and 5) and then echoes the sleep duration to the terminal.

#!/bin/bash
# Read the numbers from count_file.txt
nums=$(cat count_file.txt)
# Loop over each number in the file
for num in $nums
do
    # Sleep for the given number of seconds
    sleep "$num"
    # Print the sleep duration
    echo "$num"
done


2. Run the script with the time command to see how long it takes to finish. Because each sleep waits for the previous one to complete, it takes about 15 seconds (1 + 2 + 3 + 4 + 5).

time bash test.sh

3. Now time the same work again, but this time using Parallel.

The command below does the same thing, but instead of waiting for each iteration to finish before starting the next, it starts as many jobs at once as it can, one per CPU core.

time cat count_file.txt | parallel "sleep {}; echo {}"

Because the sleeps now overlap, the total time should drop to roughly the length of the longest sleep (about 5 seconds), provided your machine has at least five CPU cores.
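To see how long each individual job took, GNU Parallel's --joblog option can write a per-job log. The sketch below reruns the same sleep workload and prints each job's runtime. sleep_jobs.log is an arbitrary file name chosen for illustration, and the column positions assume the standard tab-separated joblog layout (column 4 is JobRuntime, column 9 is Command).

```shell
#!/bin/bash
# Recreate count_file.txt with the numbers 1-5, one per line.
printf '%s\n' 1 2 3 4 5 > count_file.txt

# nproc reports the available processing units; by default Parallel
# starts roughly one job per CPU.
echo "CPUs: $(nproc)"

# --joblog records one tab-separated line per job (after a header row).
# :::: reads the arguments from count_file.txt.
parallel --joblog sleep_jobs.log sleep {} :::: count_file.txt

# Skip the header and show each job's runtime (column 4) and command (column 9).
tail -n +2 sleep_jobs.log | cut -f 4,9
```

When the jobs overlap, their individual runtimes add up to more than the wall-clock time you measured with time.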

Learn about the Dry Run!

It's now time to look at some more real-world GNU Parallel examples. But first, get familiar with the --dryrun flag, which comes in handy when you want to see what Parallel would do without it actually doing anything.

Use --dryrun as a final sanity check before running a command that could cause damage if it doesn't behave as you expect. If you submit a command that harms your system, GNU Parallel will simply help you harm it faster!

parallel --dryrun "rm -rf {}" ::: 1 2 3
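To convince yourself that --dryrun really runs nothing, you can try it on throwaway files. In this sketch, dryrun_demo_1 and dryrun_demo_2 are hypothetical file names created just for the demonstration.

```shell
#!/bin/bash
# Create two throwaway files to act as targets.
touch dryrun_demo_1 dryrun_demo_2

# --dryrun prints each command Parallel would run, but runs none of them.
parallel --dryrun rm {} ::: dryrun_demo_1 dryrun_demo_2

# Both files are still there afterwards.
ls dryrun_demo_1 dryrun_demo_2
```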

Downloading Files from the Web (GNU Parallel Example #1)

In this example, you will download a list of files from different URLs on the internet. These URLs could point to web pages, photos, or even a list of files on an FTP server.

Here, you'll use the GNU FTP server to download a list of GNU Parallel archive packages (together with their SIG files).

1. Create a file named download_items.txt and copy several download URLs from the official download site into it, one URL per line.

https://ftp.gnu.org/gnu/parallel/parallel-20120122.tar.bz2
https://ftp.gnu.org/gnu/parallel/parallel-20120122.tar.bz2.sig
https://ftp.gnu.org/gnu/parallel/parallel-20120222.tar.bz2
https://ftp.gnu.org/gnu/parallel/parallel-20120222.tar.bz2.sig

You may save time by scraping all of the links from the download page using Python’s Beautiful Soup package.

2. Read all of the URLs from download_items.txt and send them to Parallel, which uses wget to download each one.

cat download_items.txt | parallel wget {}

Remember that {} is the placeholder for the input string in a Parallel command.

3. You may want to limit how many jobs GNU Parallel runs at any one time. If so, add the --jobs (or -j) argument to the command. --jobs limits the number of concurrent jobs to the number you provide.

For example, to restrict Parallel to downloading five URLs at a time, the command would be:

cat download_items.txt | parallel --jobs 5 wget {}

You can change the --jobs value to run more or fewer downloads at once. Because downloads spend most of their time waiting on the network rather than the CPU, values higher than your CPU core count often work well.

4. To see how --jobs affects run time, change the job count and use the time command to measure each run.

time cat download_items.txt | parallel --jobs 5 wget {}
time cat download_items.txt | parallel --jobs 10 wget {}
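Before starting real downloads, you can combine --dryrun with --jobs to preview the wget commands Parallel would run. The sketch below writes the four example URLs from step 1 to download_items.txt and prints the commands without downloading anything.

```shell
#!/bin/bash
# Write the example URL list, one URL per line.
cat > download_items.txt <<'EOF'
https://ftp.gnu.org/gnu/parallel/parallel-20120122.tar.bz2
https://ftp.gnu.org/gnu/parallel/parallel-20120122.tar.bz2.sig
https://ftp.gnu.org/gnu/parallel/parallel-20120222.tar.bz2
https://ftp.gnu.org/gnu/parallel/parallel-20120222.tar.bz2.sig
EOF

# Preview the wget commands that up to five concurrent jobs would run.
# :::: reads one argument per line from download_items.txt.
parallel --dryrun --jobs 5 wget {} :::: download_items.txt
```

Once the preview looks right, drop --dryrun to perform the downloads.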

Unzipping Archive Packages (GNU Parallel Example #2)

Now that you’ve downloaded all of the archive files from the previous example, you’ll need to unarchive them.

Run the Parallel command below from the directory that contains the archive packages. Note the wildcard (*): because this directory includes both archive packages and SIG files, you must tell Parallel to process only the .tar.bz2 files.

sudo parallel tar -xjf ::: *.tar.bz2

If you're using GNU Parallel interactively (rather than in a script), you can add the --bar switch to have Parallel display a progress bar while the jobs run.
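The glob in the command above has a pitfall: if the directory contains no .tar.bz2 files, Bash passes the literal string *.tar.bz2 to Parallel and tar fails. The sketch below guards against that with Bash's nullglob option. extract_all is a hypothetical helper name; it extracts each archive next to itself using Parallel's {//} replacement string (the directory part of the input path) and assumes you have write access to that directory, so it does not use sudo.

```shell
#!/bin/bash
# Hypothetical helper: extract every .tar.bz2 archive in a directory in parallel.
extract_all() {
  local dir="${1:-.}"
  local restore_nullglob
  local -a archives

  # Remember the current nullglob setting, then enable it so an unmatched
  # glob expands to nothing instead of the literal pattern.
  restore_nullglob=$(shopt -p nullglob) || true
  shopt -s nullglob
  archives=( "$dir"/*.tar.bz2 )
  eval "$restore_nullglob"

  if (( ${#archives[@]} == 0 )); then
    echo "No .tar.bz2 archives found in $dir" >&2
    return 1
  fi

  # {} is the archive path; {//} is its directory, so each archive is
  # extracted next to itself. Add --bar when running interactively.
  parallel tar -xjf {} -C {//} ::: "${archives[@]}"
}

# Example: extract the archives in the current directory.
# extract_all .
```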


Removing Files (GNU Parallel Example #3)

If you followed examples one and two, your working directory should now contain a lot of folders using up space. So, let’s get rid of all of those files at the same time!

List all of the folders with ls -d and pipe each folder path to Parallel, which runs rm -rf on each one, as shown below.

Keep the --dryrun flag in mind!

ls -d parallel-*/ | parallel "rm -rf {}"
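Parsing ls output breaks on folder names that contain spaces or newlines. A more robust variant, sketched below, uses find with -print0 and Parallel's -0 (--null) option so each path is passed intact, plus -r (--no-run-if-empty) so nothing runs when no folder matches. remove_release_dirs is a hypothetical helper, and it defaults to a dry run; pass no as the second argument to actually delete.

```shell
#!/bin/bash
# Hypothetical helper: delete the parallel-* folders directly inside a directory.
# Second argument: "yes" (default) previews with --dryrun; anything else deletes.
remove_release_dirs() {
  local dir="${1:-.}"
  local dry_run="${2:-yes}"
  # -0: input items are NUL-separated; -r: run nothing if there is no input.
  local -a opts=(-0 -r)
  if [[ "$dry_run" == yes ]]; then
    opts+=(--dryrun)
  fi
  # -mindepth/-maxdepth 1 restrict the search to the immediate subfolders.
  find "$dir" -mindepth 1 -maxdepth 1 -type d -name 'parallel-*' -print0 |
    parallel "${opts[@]}" rm -rf {}
}

# Preview first, then delete for real:
# remove_release_dirs .
# remove_release_dirs . no
```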

Conclusion

With Bash and GNU Parallel, you can now automate processes and save a lot of time. What you do with that time is entirely up to you, whether that means leaving work a bit earlier or reading another ATA blog article.

Consider the number of long-running scripts in your environment. Which ones can Parallel help you speed up?

