How to Use Tar in Linux (Ubuntu, CentOS, Red Hat)

Tar is a fundamental utility in Linux and other UNIX-like operating systems for archiving files. It consolidates multiple files and directories into a single .tar archive file while preserving permissions and directory structure, and it can pass the archive through a compression program such as gzip to make it smaller.

Tar is commonly used for data backups, transferring groups of files between systems, and packaging source code for distribution. This guide covers the essentials of using Tar, from basic operations to more advanced features, with practical examples.

An Overview of Tar

Tar stands for Tape ARchive. Its origins lie in archiving data onto tape drives for long-term storage. While writing archives to tape drives is still possible, Tar is now more commonly used with regular files and pipes.

Key characteristics of Tar:

  • It’s a command-line utility, meaning it’s operated through text-based commands.
  • It bundles multiple files and directories into a single archive.
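  • It records file permissions, ownership, and timestamps, and can restore them on extraction.
  • It can compress archives through external programs (gzip, bzip2, xz, etc.) or archive without compression.
  • It’s a standard UNIX utility, available on virtually all Linux distributions and on macOS, which ships the BSD implementation (bsdtar), so a few GNU-specific options differ there.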

Let’s define some common Tar terms:

  • Archive: A single file containing multiple files and directories.
  • Compression: Reducing the size of the archive to save storage space.
  • Extraction: Recovering the original files and directories from an archive.
  • Verbose: Displaying detailed information about the archiving or extraction process.

With an understanding of Tar terminology and basics, let’s examine usage examples.

Creating Archives

To create a new tar archive, use tar with the -cvf options, which break down as:

  • tar: The command itself.
  • -c: Create an archive.
  • -v: Verbose output (optional, shows files being added).
  • -f: Specify the archive filename. Because -f takes the next word as the file name, keep f last when bundling options.

For example:

$ tar -cvf archive.tar /path/to/folder

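This archives the given folder recursively into archive.tar. GNU tar strips the leading / from absolute paths (and prints a notice saying so), so members are stored as path/to/folder/… . You can list several files or folders to add them all: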

$ tar -cvf archive.tar /path/one /path/two /path/three

To compress the archive using gzip, use -czvf instead of just -cvf:

$ tar -czvf archive.tar.gz /path/to/folder

For bzip2 compression, use -cjvf:

$ tar -cjvf archive.tar.bz2 /path/to/folder

The -v flag is optional; omit it to suppress the file listing:

$ tar -cf archive.tar /path/to/folder
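To try these commands without touching real data, the following sketch builds a small hypothetical project directory in a scratch location, archives it with and without gzip, and lists the result. It assumes GNU tar and gzip, as shipped with Ubuntu, CentOS, and Red Hat; the names project, README.txt, and src/main.c are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched
cd "$(mktemp -d)"

# Hypothetical sample tree
mkdir -p project/src
echo "hello" > project/README.txt
echo "int main(void) { return 0; }" > project/src/main.c

# -c creates, -f names the archive file (f comes last in the bundle)
tar -cf project.tar project

# Same content, compressed with gzip (-z)
tar -czf project.tar.gz project

# List the members that were stored
tar -tf project.tar
```

Archiving a relative path such as project keeps member names short; with an absolute path, GNU tar drops the leading / and prints a notice about it.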

Viewing Archive Contents

To view files contained within a tar archive without extracting it, use:

$ tar -tf archive.tar

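The -t option lists the contents.

For compressed archives, you can add the matching compression flag. Recent versions of GNU tar also detect gzip, bzip2, and xz compression automatically when reading from a file, so plain -tf usually works too: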

$ tar -tzf archive.tar.gz
$ tar -tjf archive.tar.bz2
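The sketch below shows the listing options side by side. It assumes a recent GNU tar; docs and notes.txt are hypothetical names. Adding -v to -t gives an ls -l style listing, and naming a member after the archive limits the listing to that member.

```shell
cd "$(mktemp -d)"
mkdir -p docs
echo "draft" > docs/notes.txt
tar -czf docs.tar.gz docs

# -t with -v prints mode, owner, size, and date for each member
tar -tvf docs.tar.gz

# No -z needed: recent GNU tar detects gzip when reading a file
tar -tf docs.tar.gz

# Naming a member after the archive lists only that member
tar -tf docs.tar.gz docs/notes.txt
```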

Extracting Archives

To extract an archive, use -xf:

$ tar -xf archive.tar

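This extracts the contents of archive.tar into the current directory. Permissions are restored subject to your umask unless you add -p (the default when running as root), and file ownership is restored only when extracting as root.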

For compressed archives:

$ tar -xzf archive.tar.gz
$ tar -xjf archive.tar.bz2

You can extract to a specific directory using -C:

$ tar -xf archive.tar -C /tmp/extract-here

This extracts the archive into /tmp/extract-here. The directory must already exist; tar does not create it.
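The following sketch, assuming GNU tar and hypothetical file names, extracts a whole archive into a directory created beforehand with mkdir, then pulls out a single member by naming it after the archive file.

```shell
cd "$(mktemp -d)"
mkdir -p site/css
echo "<h1>hi</h1>" > site/index.html
echo "body {}" > site/css/style.css
tar -czf site.tar.gz site

# -C does not create the target directory, so make it first
mkdir -p restore
tar -xzf site.tar.gz -C restore

# Extract one member only by naming it after the archive
mkdir -p single
tar -xzf site.tar.gz -C single site/index.html
```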

Appending to Archives

You can append files/directories to an existing tar archive using -rvf instead of -cvf:

$ tar -rvf archive.tar /new/folder

This adds /new/folder recursively to archive.tar without affecting existing contents. Appending works only on uncompressed archives; tar cannot append to a .tar.gz or .tar.bz2 file.
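Here is a minimal sketch of appending, assuming GNU tar; a.txt and b.txt are placeholder files.

```shell
cd "$(mktemp -d)"
echo "first" > a.txt
echo "second" > b.txt
tar -cf bundle.tar a.txt

# -r adds b.txt after the existing members (uncompressed archives only)
tar -rf bundle.tar b.txt

tar -tf bundle.tar   # prints a.txt, then b.txt
```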

Updating Archives

To update existing files in an archive or add new files, use -uvf:

$ tar -uvf archive.tar /path/to/update

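This appends files under /path/to/update that are new or newer than their copies in the archive. The older copies are not removed, so the archive grows; when you extract it, the copy stored last wins. Like -r, -u works only on uncompressed archives.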
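The sketch below, assuming GNU tar and POSIX touch, demonstrates that -u appends a second copy rather than replacing the first. The file is backdated with touch -t so the later edit is unambiguously newer; notes.txt is a hypothetical name.

```shell
cd "$(mktemp -d)"
echo "v1" > notes.txt
# Backdate the first version (POSIX format CCYYMMDDhhmm)
touch -t 200001010000 notes.txt
tar -cf notes.tar notes.txt

# Edit the file; its modification time is now newer than the archived copy
echo "v2" > notes.txt
tar -uf notes.tar notes.txt

# notes.txt now appears twice in the listing
tar -tf notes.tar

# Extraction processes members in order, so the later copy overwrites the earlier one
mkdir -p out
tar -xf notes.tar -C out
cat out/notes.txt   # prints v2
```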

Deleting from Archives

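GNU tar can remove members from an uncompressed archive in place with --delete, for example tar --delete -f archive.tar path/in/archive. For compressed archives, or tar implementations without --delete, create a new archive without those files.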

First, extract the archive contents to a temporary location:

$ mkdir /tmp/archive-temp
$ tar -xf archive.tar -C /tmp/archive-temp

Then delete the file you want removed from the temporary folder.

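Finally, create a new archive from the contents of the temporary folder. The -C option makes tar change into it first, so the member names do not gain a tmp/archive-temp/ prefix:

$ tar -cf new-archive.tar -C /tmp/archive-temp .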

new-archive.tar will now contain the archive contents minus the deleted file.
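Both deletion approaches are sketched below with hypothetical file names. The --delete part assumes GNU tar; the extract, prune, and repack part also works with compressed archives.

```shell
cd "$(mktemp -d)"
mkdir -p data
echo "keep" > data/keep.txt
echo "drop" > data/drop.txt
tar -cf data.tar data
tar -czf data.tar.gz data

# GNU tar: delete a member in place (uncompressed archives only)
tar --delete -f data.tar data/drop.txt
tar -tf data.tar

# Any archive, including compressed ones: extract, prune, repack
mkdir -p work
tar -xzf data.tar.gz -C work
rm work/data/drop.txt
# -C work . stores paths relative to work, without a work/ prefix
tar -czf data-new.tar.gz -C work .
tar -tzf data-new.tar.gz
```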

Excluding Files/Paths

To exclude certain files/paths when creating an archive, use --exclude:

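$ tar -cvf archive.tar --exclude=/path/to/exclude /path

This prevents /path/to/exclude from being added to archive.tar. Place --exclude before the paths it should apply to; recent GNU tar treats it as a positional option and warns when it follows them.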

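You can repeat --exclude as often as needed, and patterns may contain wildcards. For example, to exclude all .log files (quote the pattern so the shell passes it to tar unexpanded):

$ tar -cvf archive.tar --exclude='*.log' /path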
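This sketch, assuming GNU tar and a hypothetical app directory, combines a wildcard exclude with a directory exclude. Both options come before the path they apply to.

```shell
cd "$(mktemp -d)"
mkdir -p app/logs app/cache
echo "code" > app/main.py
echo "noise" > app/logs/run.log
echo "tmp" > app/cache/blob.bin

# Excludes come before the path; quotes keep the shell from expanding *.log
tar -cf app.tar --exclude='*.log' --exclude='app/cache' app

# Lists app/, app/main.py and the now-empty app/logs/ directory
tar -tf app.tar
```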

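Including Only Listed Paths

Instead of excluding certain paths, you can give tar an explicit list of paths to include with -T (--files-from):

$ tar -cvf archive.tar -T include-list.txt

Where include-list.txt contains one file or directory name per line. The names are taken literally rather than as wildcard patterns, so to select files such as *.py by extension, generate the list with a tool like find.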
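Since -T takes literal names, one way to include only Python files is to build the list with find first. This is a sketch assuming GNU tar and find; the repo layout is hypothetical.

```shell
cd "$(mktemp -d)"
mkdir -p repo/pkg
echo "print(1)" > repo/app.py
echo "x = 1" > repo/pkg/util.py
echo "notes" > repo/README.md

# find does the pattern matching; tar just reads the resulting names
find repo -type f -name '*.py' > include-list.txt
tar -cf python-only.tar -T include-list.txt

# Only the two .py files are stored
tar -tf python-only.tar
```

If file names may contain spaces or newlines, GNU tar’s --null option paired with find -print0 is a safer way to pass the list.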

Compression Options

Tar does not compress archives by default; you choose the compression program with one of these flags:

  • -z: gzip compression (.tar.gz or .tgz)
  • -j: bzip2 compression (.tar.bz2 or .tbz2)
  • -J: xz compression (.tar.xz)
  • --lzma: lzma compression (.tar.lzma)

For example:

$ tar -cjf archive.tar.bz2 /path  # bzip2 compression

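You can also set the compression level, which usually ranges from 1 to 9 (higher = better compression but slower). Tar has no portable option for the level, so pipe tar’s output through the compressor yourself:

$ tar -cf - /path | gzip -9 > archive.tar.gz  # gzip level 9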

Archive Verification

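Tar can verify an archive immediately after writing it when you add -W (--verify) to a create command:

$ tar -cWvf archive.tar /path/to/folder

This rereads the archive and compares it with the files on disk. To check an existing archive against the file system later, use -d (--diff), as in tar -df archive.tar.

-W cannot be used with compressed archives. For those, test the compressed stream instead; both commands below exit with an error if the archive is truncated or corrupt:

$ gzip -t archive.tar.gz
$ tar -tzf archive.tar.gz > /dev/null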
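This sketch, assuming GNU tar and gzip with hypothetical file names, runs the checks described above: -W while creating, gzip -t on a compressed copy, and -d against the file system, including what happens after a file changes.

```shell
cd "$(mktemp -d)"
mkdir -p photos
echo "jpeg-bytes" > photos/a.jpg

# -W rereads the archive right after writing it
tar -cWf photos.tar photos

# Compressed copy: check the gzip stream and make sure tar can read every header
tar -czf photos.tar.gz photos
gzip -t photos.tar.gz && tar -tzf photos.tar.gz > /dev/null && echo "compressed archive OK"

# -d compares the archive with the files on disk; exit status 0 means they match
tar -df photos.tar && echo "archive matches disk"

# After a change, -d reports the difference and exits with status 1
echo "edited" > photos/a.jpg
tar -df photos.tar || echo "archive no longer matches disk"
```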

Tar Over Pipes and Remote Access

Tar can read/write archives locally or remotely via stdin/stdout pipes.

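For example, to copy a remote folder to the local machine:

$ ssh user@host 'tar -cf - /path/to/archive' | tar -xvf -

The remote tar writes the archive to standard output (-f -), SSH carries the stream back, and the local tar extracts it from standard input.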

You can also create an archive locally and pipe it over SSH for extraction on the remote host:

$ tar -cf - /path/to/archive | ssh user@host 'tar -xvf -'

Compressing the stream can significantly speed up transfers over slow links, although on a fast network the compression step itself may become the bottleneck:

$ tar czf - /path/to/archive | ssh user@host 'tar xvzf -'

These are just some examples – tar gives you a lot of flexibility with pipes.
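To see the mechanism without a network in the way, the sketch below copies a directory tree by connecting two local tar processes with a pipe; running either side through ssh, as in the examples above, gives the remote variants. It assumes GNU tar, and src and dest are placeholder directories.

```shell
cd "$(mktemp -d)"
mkdir -p src/sub dest
echo "one" > src/one.txt
echo "two" > src/sub/two.txt

# Left tar writes the archive to stdout (-f -); right tar reads it from stdin
# and unpacks into dest. -C makes each side work relative to its directory.
tar -cf - -C src . | tar -xf - -C dest

ls -R dest
```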

Splitting/Spanning Archives

If your archive does not fit onto a single volume like a tape drive or disk, you can split tar archives into multiple chunks.

To split by size:

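$ tar -cf - /path | split -b 1G - archive.tar.

This splits the archive stream into 1 GB chunks named archive.tar.aa, archive.tar.ab, and so on.

For numeric suffixes, add -d; -a sets the suffix length:

$ tar -cf - /path | split -b 100m -d -a 5 - archive.tar.

This produces 100 MB chunks named archive.tar.00000, archive.tar.00001, and so on. To split into a fixed number of pieces instead, create the archive first and use GNU split’s -n option, for example split -n 5 -d archive.tar archive.tar., which writes archive.tar.00 through archive.tar.04.

To reconstruct the archive, concatenate the chunks with cat. The shell expands the glob in sorted order, so the pieces are joined in the right sequence:

$ cat archive.tar.* > archive.tar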

Then extract as usual with tar -xf archive.tar.
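The round trip below is a small-scale sketch, assuming GNU tar and coreutils split: it streams about 300 KB of random data into 100 KB chunks, joins them with cat, and confirms with cmp that the extracted file is identical. The directory and file names are hypothetical.

```shell
cd "$(mktemp -d)"
mkdir -p big
# About 300 KB of random (incompressible) sample data
head -c 300000 /dev/urandom > big/blob.bin

# Stream the archive straight into split: 100 KB pieces named archive.tar.aa, .ab, ...
tar -cf - big | split -b 100k - archive.tar.
ls archive.tar.*

# The glob expands in sorted order, so cat rejoins the pieces correctly
cat archive.tar.* > archive.tar
mkdir -p restored
tar -xf archive.tar -C restored

# cmp prints nothing and succeeds when the files are identical
cmp big/blob.bin restored/big/blob.bin && echo "round trip OK"
```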

Archiving Special Files

Tar can handle special files like:

  • Symbolic Links: By default, tar archives the links themselves, not the files they point to. Use the -h option to archive the linked files instead.
  • Device Files: Tar can archive device files, but restoring them requires root privileges and careful consideration.
  • Named Pipes (FIFOs): Tar can archive named pipes.

Refer to the tar documentation for details on these and other specialty archiving options.
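To see the difference -h makes, the sketch below archives a directory containing a symbolic link twice, once with default settings and once with -h, and extracts both. It assumes GNU tar; conf, real.cfg, and current.cfg are hypothetical names.

```shell
cd "$(mktemp -d)"
mkdir -p conf
echo "port=80" > conf/real.cfg
ln -s real.cfg conf/current.cfg

# Default: the link itself is stored (tar -tvf shows conf/current.cfg -> real.cfg)
tar -cf links.tar conf
# -h (--dereference): the file the link points to is stored instead
tar -chf deref.tar conf

mkdir -p a b
tar -xf links.tar -C a
tar -xf deref.tar -C b
ls -l a/conf b/conf
```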

Useful Tar Flags/Examples

Here is a quick reference of some useful tar flags and operations:

# Create archive
$ tar -cf archive.tar /path/to/files
# Compressed archive
$ tar -czf archive.tar.gz /path/to/files
# View archive contents
$ tar -tf archive.tar
# Extract archive
$ tar -xf archive.tar
# Extract to specific folder
$ tar -xf archive.tar -C /tmp
# Append files to archive
$ tar -rvf archive.tar file1 file2
# Update files in archive
$ tar -uvf archive.tar file1
# Delete file from archive (GNU tar, uncompressed archives only)
$ tar --delete -f archive.tar file_to_delete
# Copy a remote folder over SSH
$ ssh user@host 'tar -cf - /path/to/archive' | tar -xvf -
# Verify archive right after creating it
$ tar -cWf archive.tar /path/to/files
# Compression levels 1-9
$ tar -cf - /path | gzip -9 > archive.tar.gz
# Split archive into chunks
$ tar -cf - /path | split -b 100m -d -a 5 - archive.tar.

This covers a wide range of Tar usage examples. Be sure to refer to the man pages for your specific tar implementation for more details and supported flags.

Alternative Solutions for Archiving and Compression

While tar is a powerful and versatile tool, other solutions exist for archiving and compression in Linux. Here are two alternatives:

1. Using zip and unzip

The zip and unzip utilities offer a more straightforward approach to archiving and compression, particularly for users familiar with the ZIP format commonly used in other operating systems.

Explanation:

zip creates archives in the ZIP format, which includes compression. It’s generally easier to use for simple archiving tasks compared to tar, especially when compatibility with Windows systems is a concern. unzip extracts files from a ZIP archive.

Code Example:

To create a ZIP archive:

zip -r archive.zip /path/to/folder

The -r option makes the command recursive, including all files and subdirectories within /path/to/folder.

To extract a ZIP archive:

unzip archive.zip -d /tmp/extract-here

The -d option specifies the destination directory for the extracted files.

Benefits of using zip:

  • Widely supported and understood format.
  • Simpler syntax for basic archiving and compression.
  • Good compatibility with other operating systems, especially Windows.

Limitations:

  • May not preserve all file permissions and ownership information as accurately as tar.
  • Less flexible for advanced archiving scenarios compared to tar.

2. Using p7zip (7-Zip for Linux)

p7zip is the command-line version of the popular 7-Zip archiver, known for its high compression ratios and support for various archive formats.

Explanation:

p7zip offers significantly better compression than gzip or bzip2 in many cases, making it ideal for creating smaller archives when storage space is a primary concern. It also supports a wide range of archive formats, including 7z, ZIP, GZIP, BZIP2, and TAR.

Code Example:

To create a 7z archive:

7z a -t7z archive.7z /path/to/folder

The a option adds files to the archive. The -t7z option specifies the 7z format.

To extract a 7z archive:

7z x archive.7z -o/tmp/extract-here

The x option extracts files with full paths. The -o option specifies the output directory; note that there is no space between -o and the path.

Benefits of using p7zip:

  • Excellent compression ratios, resulting in smaller archive sizes.
  • Support for a wide variety of archive formats.
  • Password protection for archives.

Limitations:

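  • Compression is typically slower than gzip or bzip2, especially at higher compression levels.
  • The 7z format might not be as universally supported as ZIP or TAR.
  • The 7z format does not store Unix owner and group information, so tar remains the better choice for system backups.
  • May require installing the p7zip package (p7zip-full on Debian and Ubuntu), as it’s not always included by default in Linux distributions.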

Conclusion

Tar is an indispensable tool for managing groups of files on Linux/UNIX systems. It lets you bundle any number of files, directories, and special files into a single portable archive that preserves permissions and attributes, and it has many everyday uses in file backups, transfers, Docker and CI/CD workflows, and software distribution. As a standard UNIX utility, it comes preinstalled on virtually every Linux distribution and on macOS.

This guide has covered the essentials of Tar and its effective use for archive management on Linux systems such as Debian, Ubuntu 18.04 / 20.04 / 22.04, CentOS 7 / 8, and Red Hat Enterprise Linux 7.

If you have any questions or require further clarification, please post them in the comments.