Mclone

Mclone is a utility for offline file synchronization which uses Rclone as a backend for the actual file transfer.

Purpose

Suppose you have a (large) amount of data which needs to be either distributed across many storages or simply backed up. For example, consider a terabyte-sized private media archive one cannot afford to lose.

As the data gets periodically updated, there is a need for regular synchronization. When the use of online cloud storage is not an option due to storage space or security reasons, the good ol' offline backing up comes back into play.

A sane backup strategy requires the data copies to be physically separated, be it a computer in the next room (building, city or planet) or just an external drive. Even better is a setup with three storages: a primary storage, where all activity takes place; a mirror storage on another computer, which holds the backup; and a portable storage (USB flash drive, external HDD or SSD, whatever), which serves as both an intermediate storage and a means of propagating the changes between the primary and the mirror.

In a more complex scenario there may be multiple one-way or two-way point-to-point data transfer routes between the storages, employing portable storage as a "shuttle" or a "ferry".

All in all, the synchronization task boils down to copying or synchronizing the contents of two local directories. However, since portable storage is involved, the actual file paths may change between synchronizations, as a storage can be mounted under different mount points on *NIX systems or be assigned a different disk drive on Windows.

While Rclone itself is a great tool for local file synchronization, typing in the command line to be executed becomes tedious and error-prone in this case, and the possible cost of an error is a corrupted backup due to wrong paths or misspelled flags.

This is where Mclone comes in. It is designed to automate the Rclone synchronization process by memorizing command line options and detecting the proper source and destination locations wherever they are.

Installation

Mclone is written in Ruby and is distributed as a Ruby gem.

Once the Ruby runtime is properly set up, Mclone itself is installed with

$ gem install mclone

Obviously, an Rclone installation is also required. Mclone uses the contents of the RCLONE environment variable if it is set; otherwise it looks through the PATH environment variable to locate the rclone executable.
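The lookup order described above can be sketched roughly as follows. This is only an illustration in Python, not Mclone's actual (Ruby) code; the locate_rclone function is a hypothetical name, and whether Mclone validates the RCLONE value is not covered here.

```python
import os
import shutil


def locate_rclone(environ=None):
    """Return the path of the rclone executable, or None if it cannot be found."""
    env = os.environ if environ is None else environ
    # An explicit RCLONE setting takes precedence over the PATH search.
    explicit = env.get("RCLONE")
    if explicit:
        return explicit
    # Otherwise look through the directories listed in PATH.
    return shutil.which("rclone", path=env.get("PATH", ""))
```

Calling locate_rclone() with no arguments consults the real process environment; passing a dictionary makes the behaviour easy to try out in isolation.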

Once properly installed, the Mclone provides the mclone command line utility.

$ mclone -h

Basic use case

Let's start with the simplest case.

Suppose you have a data directory /data and you'd want to set up the backup of the /data/files subdirectory into a backup directory /mnt/backup. The latter may be an ordinary directory or a mounted portable storage, or whatever.

1. Create volumes

Mclone has a notion of a volume - a file system directory containing the .mclone file, which is used as a root directory for all Mclone operations.

By default, in order to detect the currently available volumes, Mclone scans all mount points on *NIX systems and all available disk drives on Windows. Additionally, a static list of volume directories to consider can be specified in the MCLONE_PATH environment variable, which is a PATH-like list of directories separated by the colon : on *NIX systems or the semicolon ; on Windows.

If the /data is a regular directory, it won't be picked up by the Mclone automatically, so it needs to be put into the environment for later reuse

export MCLONE_PATH=/data

On the other hand, if the /mnt/backup is a mount point for a portable storage, it will be autodetected, therefore there is no need to put it there.
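To make the MCLONE_PATH convention concrete, here is a small Python sketch of how such a PATH-like list could be split and checked for volumes. It is not taken from Mclone; candidate_dirs and find_volumes are hypothetical helper names, and the scan of mount points or disk drives is left out.

```python
import os


def candidate_dirs(mclone_path):
    """Split a PATH-like string into directories, skipping empty entries."""
    # os.pathsep is ':' on *NIX systems and ';' on Windows.
    return [d for d in mclone_path.split(os.pathsep) if d]


def find_volumes(mclone_path):
    """Return the candidate directories that contain a .mclone file."""
    return [d for d in candidate_dirs(mclone_path)
            if os.path.isfile(os.path.join(d, ".mclone"))]
```

For instance, find_volumes(os.environ.get("MCLONE_PATH", "")) would list only those configured directories that have already been turned into volumes.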

Both source and destination endpoints have to be "formatted" in order to be recognized as Mclone volumes

$ mclone volume create /data
$ mclone volume create /mnt/backup

After that, mclone info can be used to review the recognized volumes

$ mclone info

# Mclone version 0.1.0

## Volumes

* [6bfa4a2d] :: (/data)
* [7443e311] :: (/mnt/backup)

Each volume is identified by the randomly generated tag shown within the square brackets [...]. Obviously, the tags will be different in your case.

2. Create a task

A Mclone task corresponds to a single Rclone command. It contains the source and destination volume identifiers, the source and destination subdirectories relative to the respective volumes, as well as additional Rclone command line arguments to be used.

There can be multiple tasks linking different source and destination volumes as well as their respective subdirectories.

A task with all defaults is created with

$ mclone task create /data/files /mnt/backup/files

Note that at this point there is no need to use the above volume tags as they will be auto-determined during task creation.

Again, use mclone info to review the changes

# Mclone version 0.1.0

## Volumes

* [6bfa4a2d] :: (/data)
* [7443e311] :: (/mnt/backup)

## Intact tasks

* [cef63f5e] :: update [6bfa4a2d](files) -> [7443e311](files) :: include **

The output literally means: the update task cef63f5e is ready to process (intact); it goes from the files source subdirectory of the 6bfa4a2d volume to the files destination subdirectory of the 7443e311 volume and includes all files and subdirectories (**).

Again, the task's tag is randomly generated and will be different in your case.

There are two kinds of tasks to encounter - intact and stale.

An intact task is a task which is fully ready for processing with Rclone. As with the volumes, its tag is shown in square brackets [...].

Conversely, a stale task is not ready for processing because its source or destination volume is currently missing. A stale task's tag is shown in angle brackets <...>. The tag of the missing volume is shown in angle brackets as well.
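The intact/stale distinction can be illustrated with a short Python sketch. It assumes a task is intact exactly when both of its volume tags are among the currently detected ones; describe_task is a hypothetical name and the rendered line is a simplified version of the mclone info output, not its exact format.

```python
def describe_task(task_tag, source_tag, destination_tag, available_tags):
    """Return (state, text) for a task given the tags of the detected volumes."""
    def show(tag):
        # Detected volumes use [...], missing ones use <...>.
        return f"[{tag}]" if tag in available_tags else f"<{tag}>"

    intact = source_tag in available_tags and destination_tag in available_tags
    state = "intact" if intact else "stale"
    task = f"[{task_tag}]" if intact else f"<{task_tag}>"
    return state, f"{task} :: {show(source_tag)} -> {show(destination_tag)}"
```

With the tags from the example above, unplugging the backup storage would turn the task from intact into stale without changing anything in the task itself.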

Thanks to the indirection in the source and destination directories, this task will be handled properly regardless of the directory the portable storage is mounted in next time, provided that it is detectable by Mclone.

The same applies to Windows, where the portable storage can appear as different disk drives and still be detected by Mclone.

3. Modify the task

Once a task is created, its source and destination volumes and directories are fixed and cannot be changed. Therefore the only way to modify them is to delete the task with the mclone task delete command and create it again from scratch.

A task's optional parameters however can be modified afterwards with the mclone task modify command.

Suppose you'd want to change the operation mode from default updating to synchronization and exclude .bak files.

$ mclone task modify -m sync -x '*.bak' cef

This time the task is identified by its tag instead of a directory.

Note the abbreviations of the mode and of the task's tag: synchronize is reduced to sync (it can be cut down further to sy) and the tag is reduced from the full cef63f5e to cef for convenience and to save typing. Any part of the full word can be used as an abbreviation provided it is unique among all other full words of the same kind; otherwise Mclone will bail out with an error.

The abbreviations are supported for operation mode, volume and task tags.
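A possible way to resolve such abbreviations is sketched below in Python. It reads "any part of the full word" as a substring match, which is an assumption; the resolve function and the rule that an exact match always wins are illustrative choices, not Mclone's documented behaviour.

```python
def resolve(abbreviation, candidates):
    """Return the single candidate matched by the abbreviation or raise ValueError."""
    # An exact match always wins, even if it is also part of a longer word.
    if abbreviation in candidates:
        return abbreviation
    matches = [word for word in candidates if abbreviation in word]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"nothing matches {abbreviation!r}")
    raise ValueError(f"{abbreviation!r} is ambiguous: {', '.join(sorted(matches))}")
```

Under this reading, sy picks synchronize among the four operation modes, while a single c is rejected because it occurs in both synchronize and copy.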

Behold the changes

$ mclone info

# Mclone version 0.1.0

## Volumes

* [6bfa4a2d] :: (/data)
* [7443e311] :: (/mnt/backup)

## Intact tasks

* [cef63f5e] :: synchronize [6bfa4a2d](files) -> [7443e311](files) :: include ** :: exclude *.bak

4. Process the tasks

Once created, all intact tasks can be processed (sequentially) with the mclone task process command.

$ mclone task process

If specific tasks need to be processed, their (possibly abbreviated) tags are specified as command line arguments

$ mclone task process cef

Technically, for a task to be processed the Mclone renders the full source and destination path names from the respective volume locations and relative paths and passes them along with other options to the Rclone to do the actual processing.
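The path rendering step can be pictured with the following Python sketch. The task layout (a dictionary with source_tag, source_dir and so on), the render_command name and the /media/usb0 mount point are all made up for illustration; the real argument list Mclone passes to Rclone may differ.

```python
import os


def render_command(rclone, volumes, task):
    """Build an argument list for one task; volumes maps a tag to its current root."""
    source = os.path.join(volumes[task["source_tag"]], task["source_dir"])
    destination = os.path.join(volumes[task["destination_tag"]], task["destination_dir"])
    return [rclone, *task["rclone_args"], source, destination]


volumes = {"6bfa4a2d": "/data", "7443e311": "/mnt/backup"}
task = {
    "source_tag": "6bfa4a2d", "source_dir": "files",
    "destination_tag": "7443e311", "destination_dir": "files",
    "rclone_args": ["sync", "--exclude", "*.bak"],
}
command = render_command("rclone", volumes, task)

# The same task after the portable storage shows up under another mount point.
volumes["7443e311"] = "/media/usb0"
remounted = render_command("rclone", volumes, task)
```

Only the volume-to-directory mapping changes between the two calls; the task stays the same, which is the whole point of referring to volumes by tag.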

That's it. No more need to determine (and type in) the current location of the backup directory and retype all those Rclone arguments on every occasion.

Advanced use case

Now back to the triple storage scenario outlined above.

Let S be a source storage from where the data needs to be backed up, D be a destination storage where the data is to be mirrored and P be a portable storage which serves as both an intermediate storage and a means of the S->D data propagation.

In this case the full data propagation graph is S->P->D.

1. Set up the S->P route

1.1. Plug the portable storage P into S's computer and mount it.

1.2. As shown in the basic use case, create S's and P's volumes, then create an S->P task.

1.3. Unplug P.

At this point S and P are separated and each carries its own copy of the S->P task.

2. Set up the P->D route

2.1. Plug the portable storage P into D's computer and mount it.

Note that at this point S->P is a stale task, as D's computer knows nothing about the S storage.

2.2. Create D's volume, then create a P->D task. Note that P at this point already contains a volume and therefore must not be formatted again.

2.3. Unplug P.

Now S and D are formatted and carry the respective tasks. P contains its own copies of both S->P and P->D tasks.

3. Process the S->P->D route

3.1. Plug P into S's computer and mount it.

3.2. Process the intact tasks. In this case it is the S->P task (P->D is stale at this point).

3.3. Unplug P.

P now carries its own copy of the S's data.

3.4. Plug P into D's computer and mount it.

3.5. Process the intact tasks. In this case it is the P->D task (S->P is stale at this point).

3.6. Unplug P.

Voilà! Both P and D now carry a copy of the S's data.
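The S->P->D round trip can be played through with a toy Python model. Here each storage is just a dictionary of file names, a task is a (source, destination) pair, and a plain dictionary update stands in for the Rclone run; none of this mirrors Mclone's internals.

```python
def process_intact(tasks, mounted):
    """Run every task whose source and destination are both mounted; return those run."""
    done = []
    for source, destination in tasks:
        if source in mounted and destination in mounted:
            # Stand-in for the Rclone run: copy the source's files over.
            mounted[destination].update(mounted[source])
            done.append((source, destination))
    return done


# Each storage is modelled as a dict of file name -> contents.
S = {"a.txt": "1", "b.txt": "2"}
P = {}
D = {}
tasks = [("S", "P"), ("P", "D")]

at_s = process_intact(tasks, {"S": S, "P": P})  # P plugged into S's computer
at_d = process_intact(tasks, {"P": P, "D": D})  # P plugged into D's computer
```

At each stop only the task whose both ends are present gets processed, and after the second stop D holds the same files as S.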

There may be more complex data propagation scenarios with multiple source and destination storages utilizing the portable storage in the above way.

Consider a two-way synchronization between two storages with a portable ferry which carries and propagates data in both directions.

Encryption

Encryption is an essential part of Mclone, as it is all about handling portable storage which may be compromised while holding confidential data. Mclone fully relies on the encryption capabilities of Rclone; that is, an encrypted directory structure can be further treated with Rclone itself.

Encryption in Mclone is activated at task creation time with the -e or -d command line flag, for encryption or decryption respectively. If neither flag is specified, encryption is turned off.

When in encryption mode, Mclone recursively takes plain files and directories under the source root and creates encrypted files and directories under the destination root. Conversely, when in decryption mode, Mclone takes the encrypted source root and decrypts it into the destination root. Mclone is set up to encrypt not only the files' contents but also the file and directory names themselves. The file sizes, modification times and some other metadata are not encrypted, though, as they are required for proper operation of the file synchronization mechanism. Note that the encrypted root is a regular directory hierarchy (just with fancy file names) and thus can be treated as such.

Be wary that file name encryption has a serious implication for the file name length. The Rclone crypt documentation states that an individual file or directory name cannot exceed ~143 characters (although bytes would be more correct here). As Rclone accepts UTF-8 encoded names, this estimate generally holds true for the Latin charset only, where a character is encoded with a single byte. For non-Latin characters, which are encoded with two or more bytes, the maximum allowed name length can be much lower. When Rclone encounters a file name too long to handle, it will refuse to process it.
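The difference between characters and bytes is easy to check up front. The Python sketch below measures the UTF-8 length of a name against the approximate figure quoted above; NAME_LIMIT, name_bytes and fits are illustrative names, and the exact limit in a given Rclone setup may differ from 143.

```python
NAME_LIMIT = 143  # approximate per-name limit quoted above, taken as bytes


def name_bytes(name):
    """Length of a file or directory name once encoded as UTF-8."""
    return len(name.encode("utf-8"))


def fits(name, limit=NAME_LIMIT):
    """Tell whether the UTF-8 encoded name stays within the limit."""
    return name_bytes(name) <= limit
```

A name made of 100 Cyrillic letters, for example, has only 100 characters but takes 200 bytes and so would not fit, while 100 Latin letters would.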

Rclone employs symmetric cryptography to do its job, which requires some form of password to be supplied upon task creation. This is done with the -p command line flag, which specifies a plain text password used to derive the real encryption key. Another password-related command line flag, -t, can be used to specify an Rclone-obscured token directly. Once created, a task memorizes the encryption key on the unencrypted end of the source/destination volume pair, so there is no need to pass it during task processing.

What's next

On-screen help

Every mclone (sub)command has its own help page which can be shown with the --help option

$ mclone task create --help

Usage:
    mclone task new [OPTIONS] SOURCE DESTINATION

Parameters:
    SOURCE                     Source path
    DESTINATION                Destination path

Options:
    -m, --mode MODE            Operation mode (update | synchronize | copy | move) (default: "update")
    -i, --include PATTERN      Include paths pattern
    -x, --exclude PATTERN      Exclude paths pattern
    -d, --decrypt              Decrypt source
    -e, --encrypt              Encrypt destination
    -p, --password PASSWORD    Plain text password
    -t, --token TOKEN          Rclone crypt token (obscured password)
    -f, --force                Insist on potentially dangerous actions (default: false)
    -n, --dry-run              Simulation mode with no on-disk modifications (default: false)
    -v, --verbose              Verbose operation (default: false)
    -V, --version              Show version
    -h, --help                 print help

File filtering

The Mclone passes its include and exclude options to the Rclone. The pattern format is an extended glob (*.dat) format described in detail in the corresponding Rclone documentation section.

Dry run

Mclone respects Rclone's dry run mode, activated with the --dry-run command line option. In this mode no volume (.mclone) files are ever touched (created or overwritten) during any operation. Rclone is still run during task processing, but it is in turn supplied with this option.

Force mode

Mclone will refuse to automatically perform certain actions which are considered dangerous, such as deleting a volume or overwriting an existing task. In this case the --force command line option should be used to proceed.

Task operation modes

Update

  • Copy source files which are newer than the destination's or have different size or checksum.

  • Do not delete destination files which are nonexistent in the source.

  • Do not copy source files which are older than the destination's.

This is the default refreshing mode, which is considered the least harmful with respect to unintentional data overwriting.

Rclone command: copy --update.

Synchronize

  • Copy source files which are newer than the destination's or have different size or checksum.

  • Delete destination files which are nonexistent in the source.

  • Copy source files which are older than the destination's.

This is the mirroring mode which makes destination completely identical to the source.

Rclone command: sync.

Copy

  • Copy source files which are newer than the destination's or have different size or checksum.

  • Do not delete destination files which are nonexistent in the source.

  • Copy source files which are older than the destination's.

This mode is much like synchronize, with the only difference that it does not delete destination files.

Rclone command: copy.

Move

  • Copy source files which are newer than the destination's or have different size or checksum.

  • Do not delete destination files which are nonexistent in the source.

  • Do not copy source files which are older than the destination's.

  • Delete source files after successful copy to the destination.

Rclone command: move.
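For quick reference, the four modes and the Rclone commands listed for them above can be collected into a small Python table. RCLONE_ARGS and the two helper functions are illustrative names only, and the helpers restate nothing beyond the deletion behaviour described in the lists.

```python
# Operation mode -> Rclone command and flags, as listed for each mode above.
RCLONE_ARGS = {
    "update": ["copy", "--update"],
    "synchronize": ["sync"],
    "copy": ["copy"],
    "move": ["move"],
}


def deletes_destination_extras(mode):
    """Only synchronize removes destination files missing from the source."""
    return mode == "synchronize"


def deletes_source(mode):
    """Only move removes source files after a successful copy."""
    return mode == "move"
```

Such a table makes it plain that update is the only mode which adds a flag to the underlying command, and that synchronize and move are the two modes which delete anything.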

The end

Cheers,

Oleg A. Khlybov