Developing Asuran (Week of 2020-03-01)

I am going to try to start doing these every week on Sunday, looking back on the past week's work, and looking forward to what should be done in the next week. This is mostly to organize my thoughts, but it also serves as a look into what work has been done.

This edition covers work completed between 2020-02-24 and 2020-03-01.

Features Landed

Cargo features

The asuran_core library now has cargo features set up for the dependencies of the various compression, encryption, and hashing algorithms. A core feature has been defined supporting ZStd, AES-CTR, and Blake3. Features have also been set up for each encryption family, as well as features that enable all the algorithms for each operation (e.g. all-compression).

For compatibility purposes, the enums in the core library still have variants for the algorithms whose support has not been compiled in. However, attempting to perform operations on these variants will result in a runtime panic.

asuran_core will also fail at compile time if no HMAC algorithm features are selected, as the repository format fundamentally does not make sense without an HMAC.

The default feature for asuran_core includes all the features for all the supported algorithms.
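As a rough illustration of the pattern described above, here is a minimal sketch of an enum whose variants always exist while the work behind them is feature-gated. The `zstd` feature name, the `zstd_compress` helper, and the `is_supported` method are assumptions made for the example, not the actual asuran_core code.

```rust
/// Compression algorithms known to the repository format.
/// Every variant always exists, so stored metadata stays readable
/// whether or not support for it was compiled in.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Compression {
    NoCompression,
    ZStd { level: i32 },
}

// Built only when the (assumed) `zstd` feature is enabled; uses the zstd crate.
#[cfg(feature = "zstd")]
fn zstd_compress(data: &[u8], level: i32) -> Vec<u8> {
    zstd::encode_all(data, level).expect("in-memory zstd compression failed")
}

// Fallback used when the feature is off: the variant exists, but using it panics.
#[cfg(not(feature = "zstd"))]
fn zstd_compress(_data: &[u8], _level: i32) -> Vec<u8> {
    panic!("ZStd support was not compiled in; enable the `zstd` feature")
}

// The HMAC requirement could be enforced in a similar spirit, for example with
// (feature names assumed):
//   #[cfg(not(any(feature = "blake3", feature = "sha2")))]
//   compile_error!("at least one HMAC feature must be enabled");

impl Compression {
    /// Reports whether this build can actually perform the operation.
    pub fn is_supported(self) -> bool {
        match self {
            Compression::NoCompression => true,
            Compression::ZStd { .. } => cfg!(feature = "zstd"),
        }
    }

    /// Compresses `data`, panicking if support for the variant is missing.
    pub fn compress(self, data: &[u8]) -> Vec<u8> {
        match self {
            Compression::NoCompression => data.to_vec(),
            Compression::ZStd { level } => zstd_compress(data, level),
        }
    }
}
```

A caller that wants to avoid the panic could check `is_supported()` first and report a friendlier error.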

Underpinning of the Universal Listing API

The core data structure of the universal object listing API has been written. I chose to implement this as a “flat” tree stored in a HashMap. Right now the Node entries are keyed by String; however, it would probably be wise to change these to some sort of byte string, to support keying Nodes by arbitrary data. The current model, at the very least, conflicts with the definition of *nix paths as “any sequence of non-NUL bytes”, since a String must be valid UTF-8.
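To make the "flat" tree idea concrete, here is a small sketch that already uses byte-string keys. The `Listing`, `Node`, and `NodeKind` names and their fields are hypothetical stand-ins, not the types in the asuran source.

```rust
use std::collections::HashMap;

/// Hypothetical node kinds; the real listing may track more metadata.
#[derive(Debug, Clone, PartialEq)]
pub enum NodeKind {
    /// A directory records the keys of its children rather than owning them.
    Directory { children: Vec<Vec<u8>> },
    File,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub kind: NodeKind,
}

/// A "flat" tree: every node lives in one map, and parent/child links are keys.
#[derive(Debug)]
pub struct Listing {
    nodes: HashMap<Vec<u8>, Node>,
}

impl Listing {
    /// Creates a listing containing only an empty root directory keyed by `root`.
    pub fn new(root: &[u8]) -> Self {
        let mut nodes = HashMap::new();
        let kind = NodeKind::Directory { children: Vec::new() };
        nodes.insert(root.to_vec(), Node { kind });
        Listing { nodes }
    }

    /// Inserts a node under `key` as a child of `parent`.
    /// Returns false, changing nothing, if `key` already exists or if
    /// `parent` is missing or is not a directory.
    pub fn insert(&mut self, parent: &[u8], key: &[u8], kind: NodeKind) -> bool {
        if self.nodes.contains_key(key) {
            return false;
        }
        match self.nodes.get_mut(parent) {
            Some(Node { kind: NodeKind::Directory { children } }) => {
                children.push(key.to_vec())
            }
            _ => return false,
        }
        self.nodes.insert(key.to_vec(), Node { kind });
        true
    }

    /// Returns the child keys of a directory, or None for files and unknown keys.
    pub fn children(&self, key: &[u8]) -> Option<&[Vec<u8>]> {
        match self.nodes.get(key) {
            Some(Node { kind: NodeKind::Directory { children } }) => Some(children.as_slice()),
            _ => None,
        }
    }
}
```

Because the keys are `Vec<u8>`, a path such as `b"/etc/\xff\xfe"` can be stored even though it is not valid UTF-8, which a String key could not represent.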

I additionally added a field to the Archive definition for the listing associated with that archive.

I am still in the process of porting the Target interface to use the new listing API instead of the old strategy of using Vec<String>.

Asuran CLI overhaul

The asuran-cli binary crate was, more or less, completely rewritten from scratch, with much less jank. I used structopt to simplify argument and subcommand handling, and created a module for each command, which has generally simplified things. I still need to figure out how to get structopt to output the help for global options when you pass --help to a subcommand.

The rewrite highlighted a pain point in the asuran API, namely, using tasks to insert objects into an archive involves lots of unnecessary cloning. Currently, it looks like this:

let mut repo = repo.clone();
let archive = archive.clone();
let backup_target = backup_target.clone();

task_queue.push(task::spawn(async move {
    (
        path.clone(),
        backup_target
            .store_object(&mut repo, chunker.clone(), &archive, path.clone())
            .await,
    )
}));

In addition to the general code cleanliness improvement, the CLI has also been updated to support the FlatFile backend, as well as the MultiFile backend.

For the cloning problem, I believe the route forward is to require BackupTarget implementations to store a reference to the repository and archive, and to add a method to the BackupTarget trait for spawning a task directly.
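One possible shape for a BackupTarget that carries its own repository and archive handles is sketched below. Everything here is an assumption made for illustration: the `Repository` and `Archive` stand-ins, the `spawn_store` method name, and the use of `std::thread` with `Arc<Mutex<..>>` in place of the async `task::spawn` from the snippet above, which keeps the example free of external crates.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Stand-ins for the real repository and archive types.
#[derive(Default)]
pub struct Repository {
    pub chunks: Vec<Vec<u8>>,
}

#[derive(Default)]
pub struct Archive {
    pub objects: Vec<String>,
}

pub trait BackupTarget {
    /// Hypothetical method: store the object at `path` on a background task
    /// and return a handle that yields the path once the store completes.
    fn spawn_store(&self, path: String) -> JoinHandle<String>;
}

/// A target that owns shared handles, so callers never clone them by hand.
pub struct InMemoryTarget {
    repo: Arc<Mutex<Repository>>,
    archive: Arc<Mutex<Archive>>,
}

impl InMemoryTarget {
    pub fn new(repo: Arc<Mutex<Repository>>, archive: Arc<Mutex<Archive>>) -> Self {
        InMemoryTarget { repo, archive }
    }
}

impl BackupTarget for InMemoryTarget {
    fn spawn_store(&self, path: String) -> JoinHandle<String> {
        // The handle clones happen once, inside the target, not at every call site.
        let repo = Arc::clone(&self.repo);
        let archive = Arc::clone(&self.archive);
        thread::spawn(move || {
            // A real target would read and chunk the file; here the path's
            // bytes stand in for the object's data.
            repo.lock().unwrap().chunks.push(path.as_bytes().to_vec());
            archive.lock().unwrap().objects.push(path.clone());
            path
        })
    }
}
```

With something along these lines, the call site shrinks to roughly `task_queue.push(target.spawn_store(path))`, with the clones hidden behind the trait method.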

Bug Fixes

Fix key length for Encryption::NoEncryption

The NoEncryption variant of the Encryption enum was updated to have a non-zero key length. Some other places in the asuran library assume that the encryption key has a non-zero length, such as the use of argon2 in the key encryption algorithm.

Even fully NoEncryption repositories still require some key material for HMAC generation and other tasks, so the best fix for this seemed to be pretending NoEncryption behaves like other encryption modes and giving it a non-zero key length.
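The idea can be sketched as follows. The `Aes256Ctr` variant name, the `key_length` method, and the 16-byte value chosen for NoEncryption are assumptions made for the example. Only the 32-byte AES-256 key size is a fixed fact.

```rust
/// Simplified stand-in for the Encryption enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Encryption {
    NoEncryption,
    Aes256Ctr,
}

impl Encryption {
    /// Key length in bytes. Never zero, so key derivation and HMAC setup
    /// can treat every mode the same way.
    pub fn key_length(self) -> usize {
        match self {
            // Assumed value: any non-zero length would satisfy callers that
            // need key material even when nothing is encrypted.
            Encryption::NoEncryption => 16,
            // AES-256 uses a 32-byte key.
            Encryption::Aes256Ctr => 32,
        }
    }
}
```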

Removed key from FlatFile constructor

This was mistakenly added as a reflection of the MultiFile interface. Unlike MultiFile, which requires the key to verify the manifest Merkle tree on open, FlatFile performs no such operation and thus does not require the key.

Looking Forward

FlatFile Produces Absurdly Large Archives

At the moment, FlatFile is producing archives that are way too big. With the 10GB test data set I have been using, and the same general settings, MultiFile produces a 3.6GB archive, whereas FlatFile produces a 6.8GB archive. I need to do more debugging, but I believe this is due to a bug in how I determine the location at which to start the next segment. I believe the route forward is to have FlatFile write the segments into an in-memory buffer and use the size of that buffer to extract length information, rather than the current abuse of read.seek(SeekFrom::Current(0)).
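A minimal sketch of the buffer-first approach is shown below, under the assumption of a simple layout where each segment is an 8-byte little-endian length followed by the payload. That layout and the `write_segment` and `read_segment` helpers are illustrative and are not the actual FlatFile format.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Appends `payload` as one segment and returns the offset where it starts.
pub fn write_segment<W: Write + Seek>(out: &mut W, payload: &[u8]) -> io::Result<u64> {
    // Build the whole segment body in memory first, so its length is known
    // exactly without asking the underlying file for its position.
    let mut buffer: Vec<u8> = Vec::with_capacity(payload.len());
    buffer.extend_from_slice(payload);

    let start = out.seek(SeekFrom::End(0))?;
    out.write_all(&(buffer.len() as u64).to_le_bytes())?;
    out.write_all(&buffer)?;
    Ok(start)
}

/// Reads the segment at `start`, returning its payload and the offset of the
/// segment that follows it.
pub fn read_segment<R: Read + Seek>(input: &mut R, start: u64) -> io::Result<(Vec<u8>, u64)> {
    input.seek(SeekFrom::Start(start))?;
    let mut len_bytes = [0u8; 8];
    input.read_exact(&mut len_bytes)?;
    let len = u64::from_le_bytes(len_bytes);

    let mut payload = vec![0u8; len as usize];
    input.read_exact(&mut payload)?;
    // The next segment begins right after the length prefix and the payload.
    Ok((payload, start + 8 + len))
}
```

Since the next segment's location is derived from the stored length, it no longer depends on where the reader's cursor happens to be after decoding.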