How to install and use Duplicity to Automate Backups


Duplicity is a powerful open-source backup tool that enables you to perform encrypted and incremental backups. It supports various backends for storing backup data, including local or remote file systems, FTP, SSH, WebDAV, and cloud storage services. Duplicity uses GnuPG to encrypt and sign backup archives, ensuring data security and integrity. This article will guide you through installing and using Duplicity for automated backups on Linux.

Installing Duplicity

Duplicity is readily available in the default repositories of most Linux distributions, simplifying the installation process.

On Debian/Ubuntu

$ sudo apt update
$ sudo apt install duplicity

On CentOS/RHEL

$ sudo yum install epel-release
$ sudo yum update
$ sudo yum install duplicity

On Arch Linux

$ sudo pacman -S duplicity

On Fedora

$ sudo dnf install duplicity

For other Linux distributions, consult your specific package manager.

Once installed, verify Duplicity’s availability by checking its version:

$ duplicity --version

Generating GPG Keys

Duplicity relies on GnuPG keys to encrypt and/or sign backup archives, enhancing security. Therefore, generating a keypair is essential.

Install the GnuPG package if it is not already installed (shown here for Debian/Ubuntu; use your distribution's package manager otherwise):

$ sudo apt install gnupg

Generate a new keypair. On modern GnuPG (2.1 and later), use --full-generate-key so you are prompted for the key type and size; on older versions, gpg --gen-key offers the same prompts:

$ gpg --full-generate-key

Choose the key type as "RSA and RSA" with a size of 4096 bits. Set an expiry period if desired.

Provide the user ID details, such as your name and email, and set a secure passphrase for the key. This generates a public/private keypair.

List the keys to find the ID:

$ gpg --list-keys

Export the public key for backup. Substitute the ID appropriately:

$ gpg -a --export 1234ABCD > public.gpg

Duplicity encrypts the archives locally before uploading, so nothing needs to be installed on the remote backend. The private key and its passphrase are required to decrypt the archives during a restore, so keep them safe; losing them means losing access to your backups.
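
To be able to restore on a different machine, you can also export the private key and store that copy somewhere safe and offline. Substitute your own key ID; the exported file must be protected as carefully as the data it unlocks:

$ gpg -a --export-secret-keys 1234ABCD > private.gpg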

Configuring Duplicity

Duplicity supports a variety of storage backends, such as local, SSH, FTP, WebDAV, and more. Let’s look at how to configure some common ones:

Local Filesystem

To backup to a local directory, set the backup destination URL like:

file:///home/user/backups

SSH

To backup to a remote system over SSH:

ssh://user@host//path/to/backup

This assumes you have SSH access set up between the systems.
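
If key-based SSH authentication is not yet configured, a minimal setup looks like this (user@host is a placeholder):

$ ssh-keygen -t ed25519
$ ssh-copy-id user@host
$ ssh user@host echo ok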

Amazon S3

To backup to an Amazon S3 bucket:

s3://s3-bucket-name[/prefix]

The S3 credentials can be supplied via a ~/.boto config file or environment variables.
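
For example, the standard AWS environment variables can be exported before running Duplicity (values shown are placeholders):

export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key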

See the Duplicity S3 Backend documentation for details.

Google Cloud Storage

For backing up to Google Cloud Storage:

gs://cloud-storage-bucket[/prefix]

Authentication can be done by various means, including service account files, ADC JSON files, or environment variables.
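
One common approach, depending on your Duplicity version, is to point the standard Google credential variable at a service account key file (the path is a placeholder):

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json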

Refer to the GCS Backend section in the docs for more details.

Swift

To backup to an OpenStack Swift container:

swift://container_name[/prefix]

Authentication is through environment variables. See the Swift Backend documentation.
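
The variable names typically look like the following; check man duplicity for the exact set your version expects (values are placeholders):

export SWIFT_USERNAME=your_user
export SWIFT_PASSWORD=your_password
export SWIFT_AUTHURL=https://auth.example.com/v3
export SWIFT_AUTHVERSION=3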

WebDAV

To use a WebDAV server as backend:

webdav[s]://hostname[:port]/path

Duplicity will prompt for username/password as required.

See WebDAV Backend for details.

In this manner, you can configure backups to a variety of storage endpoints. Now we are ready to create our first backup.

Creating Backups

With the backend URL configured, we can now make a full backup.

Set the following environment variable to avoid interactive prompts:

export PASSPHRASE=your_passphrase

Then run a full backup:

$ duplicity /path/to/source file:///path/to/destination

This will recursively back up the /path/to/source directory to the local destination /path/to/destination.
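
By default, Duplicity encrypts symmetrically using the PASSPHRASE set above. To encrypt against the GPG key generated earlier instead, pass its ID with --encrypt-key (substitute your own key ID):

$ duplicity --encrypt-key 1234ABCD /path/to/source file:///path/to/destination

With key-based encryption, the passphrase is only needed when decrypting during a restore or verify, not when creating backups.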

To backup to a remote location over SSH:

$ duplicity /path/to/source ssh://user@host//backup/path

For backing up to cloud storage:

$ duplicity /local/source s3://s3-bucket[/prefix]

Duplicity will prompt for any required credentials like SSH password or AWS secret keys. The backup will be encrypted and stored in the destination.

To back up only certain folders within the source, specify them with --include options and exclude everything else. For example:

$ duplicity --include /path/to/source/folder1 --include /path/to/source/folder2 --exclude '**' /path/to/source file:///path/to/destination

This will back up only the specified folders from the source.

After the initial full backup, consecutive backups will be incremental. This saves time and storage space. Duplicity uses librsync to efficiently determine changed content.

To force a full backup instead of incremental, use the --full-if-older-than option:

$ duplicity --full-if-older-than 60D /path/to/source ssh://user@host//path/to/backup

This will do a full backup if the last full backup is older than 60 days.
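
You can also request a particular backup type explicitly by prefixing the command with the full or incremental action (incremental fails if no full backup exists at the target yet):

$ duplicity full /path/to/source ssh://user@host//path/to/backup
$ duplicity incremental /path/to/source ssh://user@host//path/to/backup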

Backup Scheduling with Cron

We can automate the Duplicity backups using Cron jobs.

Open the crontab for editing:

$ crontab -e

Add a cron schedule like:

0 1 * * * /usr/bin/duplicity /path/to/source ssh://user@host//backup/path

This will run a backup job every day at 1 AM.

For weekly backups:

0 1 * * 0 /usr/bin/duplicity /path/to/source ssh://user@host//backup/path

This will trigger the backup every Sunday at 1 AM.

Similarly, you can schedule monthly, yearly backups, etc.

For more granular control, you can combine periodic full backups with frequent incrementals against the same target:

0 1 * * * /usr/bin/duplicity --full-if-older-than 30D /path/to/source ssh://user@host//backup/path
0 */4 * * * /usr/bin/duplicity /path/to/source ssh://user@host//backup/path

The first job runs daily at 1 AM and performs a full backup whenever the last one is more than 30 days old; the second takes an incremental backup every 4 hours.

If the jobs are verbose, redirect their output to a log file to avoid cron email spam.
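
A minimal crontab sketch that supplies the passphrase and logs output might look like this (the passphrase, paths, and log location are placeholders, and the crontab should only be readable by its owner):

PASSPHRASE=your_passphrase
0 1 * * * /usr/bin/duplicity /path/to/source ssh://user@host//backup/path >> /var/log/duplicity.log 2>&1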

Restoring Backups

To restore the latest backup version:

$ duplicity restore ssh://user@host//backup/path /local/restore/path

This will restore the backup available in the remote location to the specified local path.

To restore an earlier version from a specific date:

$ duplicity restore --time 2020-01-01T12:30:00 ssh://user@host//backup/path /local/restore/path
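
To restore a single file or subdirectory rather than the whole backup, use --file-to-restore with a path relative to the backup root (the path shown is a placeholder; newer releases also accept --path-to-restore):

$ duplicity restore --file-to-restore docs/report.txt ssh://user@host//backup/path /local/restore/report.txt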

List all stored backup versions:

$ duplicity collection-status ssh://user@host//backup/path

Delete old backups:

$ duplicity remove-older-than 6M --force ssh://user@host//backup/path

This will delete all backup versions older than 6 months.
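
Another common retention policy is to keep only the last few full backup chains, for example the three most recent:

$ duplicity remove-all-but-n-full 3 --force ssh://user@host//backup/path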

In this manner, you can manage your backed-up archives, restoring selected versions whenever required.

Duplicity on Mac with Homebrew

On macOS, Duplicity can be installed via Homebrew:

$ brew install duplicity

Usage remains the same as on Linux:

$ duplicity /path/to/source file:///path/to/destination

Schedule cron backup similarly using the native crontab:

$ crontab -e

Duplicity on Windows

Duplicity can be installed on Windows using the Cygwin Linux environment.

First install Cygwin with the rsync and python packages.

Then, from the Cygwin terminal, install Duplicity via pip:

$ pip install duplicity

Now you can use Duplicity to back up files locally or to remote Windows shares, referring to drives through Cygwin's /cygdrive paths:

$ duplicity /cygdrive/c/Users/user/Documents file:///cygdrive/e/Backups

Automate the scheduled backups using the Task Scheduler.
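
For example, a daily task could be registered with schtasks, invoking Duplicity through the Cygwin bash shell (the bash.exe path and the backup command are placeholders for your own setup):

C:> schtasks /Create /TN "DuplicityBackup" /SC DAILY /ST 01:00 /TR "C:\cygwin64\bin\bash.exe -l -c 'duplicity /cygdrive/c/Users/user/Documents file:///cygdrive/e/Backups'"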

A Windows native port of Duplicity called cwDup is also available, though with reduced functionality.

Duplicity Best Practices

Here are some best practices to follow when using Duplicity:

  • Regularly test your backups: Perform test restores to ensure that your backups are working correctly and that you can recover your data when needed (see the verify example after this list).
  • Store your backups off-site: This will protect your backups from physical damage or theft. Cloud storage is a good option for off-site backups.
  • Encrypt your backups: This will protect your data from unauthorized access.
  • Use strong passwords: Use strong passwords for your GPG keys and your backup storage accounts.
  • Monitor your backups: Regularly check your backup logs to ensure that your backups are running correctly.
  • Automate the backup process: Use a tool such as cron or systemd timers to schedule backups automatically.
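
For the first point, Duplicity's verify action compares the data in a backup against the current contents of the source directory and reports any differences:

$ duplicity verify ssh://user@host//backup/path /path/to/source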

Conclusion

Duplicity is a robust open-source solution for encrypted incremental backups. It offers great flexibility in backend storage options, and its use of GPG encryption keeps backup archives secure both in transit and at rest.

With this guide, you should now be able to set up automated Duplicity backup jobs to a local or remote location. Storing backups off-site or in cloud storage protects against local disasters and provides redundancy.

Regular testing and validation of backups ensures your data is protected when needed for restores. Following best practices around security, validation, and monitoring helps maintain robust backups. We have shown how to install and use Duplicity to automate backups; now let’s look at some alternatives.

Alternative Solutions to Automated Backups

While Duplicity is a great tool, here are two alternative approaches to achieving automated backups:

1. Using rsync with Encryption and Compression

rsync is a versatile command-line tool that excels at synchronizing files and directories. While it doesn’t inherently provide encryption like Duplicity, you can combine it with tools like gpg or openssl for encryption and tar or gzip for compression, creating a secure and efficient backup solution.

Explanation:

  • rsync: Handles the efficient transfer of files, only copying changes since the last backup.
  • gpg or openssl: Encrypts the backup archive, protecting sensitive data.
  • tar or gzip: Creates a compressed archive of the files, reducing storage space.

Code Example:

#!/bin/bash

# Configuration
SOURCE="/path/to/source/directory"
DESTINATION="/path/to/backup/directory"
DATE=$(date +%Y%m%d)
ARCHIVE="$DESTINATION/backup_$DATE.tar.gz.gpg"
PASSPHRASE="your_secret_passphrase"

# Create a compressed tar archive and encrypt it with GPG in one pipeline
# (--batch and --pinentry-mode loopback let gpg accept the passphrase non-interactively)
tar -czf - "$SOURCE" |
  gpg --batch --pinentry-mode loopback --symmetric --cipher-algo AES256 \
      --passphrase "$PASSPHRASE" --output "$ARCHIVE"

# Optionally push the encrypted archive off-site with rsync
# rsync -av "$ARCHIVE" user@backup-host:/remote/backup/dir/

echo "Backup created: $ARCHIVE"
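
To restore from an archive produced by this script, decrypt it and unpack the result (gpg prompts for the same passphrase; the file name shown is a placeholder for an actual archive):

$ gpg --decrypt backup_20240101.tar.gz.gpg | tar -xzf -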

Advantages:

  • Simplicity: rsync is relatively easy to understand and use.
  • Efficiency: rsync's incremental transfer significantly reduces backup time and bandwidth.
  • Flexibility: You have more control over the encryption and compression methods.

Disadvantages:

  • More complex scripting: Requires writing a script to handle encryption and archiving.
  • Manual key management: You need to manage the encryption keys yourself.
  • No built-in versioning: Versioning needs to be implemented manually.

2. Leveraging Cloud-Based Backup Services

Several cloud-based backup services offer automated backups with encryption, versioning, and other advanced features. Examples include Backblaze, Carbonite, and cloud provider specific solutions like AWS Backup or Azure Backup.

Explanation:

These services provide a client application that automatically backs up your data to their cloud storage. They handle encryption, versioning, and data redundancy, simplifying the backup process.

Code Example (Conceptual – varies by service):

While there’s no code to directly manage backups in the same way as duplicity or rsync, most cloud backup services provide a command-line interface (CLI) or API for managing settings and monitoring backup status. For example, using the AWS CLI:

# Example - check the status of a backup job (conceptual)
aws backup list-backup-jobs --by-resource-arn arn:aws:ec2:us-west-2:123456789012:instance/i-0abcdef1234567890

Advantages:

  • Ease of use: Cloud backup services are generally very easy to set up and use.
  • Automatic backups: Backups are performed automatically in the background.
  • Off-site storage: Your backups are stored off-site in the cloud, protecting them from local disasters.
  • Versioning: Most services offer versioning, allowing you to restore previous versions of your files.
  • Encryption: Data is typically encrypted both in transit and at rest.

Disadvantages:

  • Cost: Cloud backup services can be expensive, especially for large amounts of data.
  • Reliance on internet connection: You need a reliable internet connection to back up and restore your data.
  • Privacy concerns: You are trusting a third-party with your data.
  • Vendor lock-in: Switching providers can be challenging.

These alternatives offer different trade-offs between simplicity, control, and cost. The best solution for you will depend on your specific needs and requirements. Consider the level of control you need, your technical expertise, and your budget when choosing a backup solution. Regardless of the method you choose, regularly testing your backups is crucial to ensure data recoverability.
