Developing Asuran (Week of 2020-03-08)

This week was a bit slow on development of asuran proper; most of the time went to documentation changes and to work on an external library for cross-platform sparse file handling.

Features Landed

New listing API

The universal listing API, introduced in last week's post, is now used by the Target API. This increases the flexibility of Archives and also comes with a mild performance boost! This was by far the hardest of the actual features this week. A lot of work had to be done to undo the bad assumptions I had made when the listings were just a Vec<String>.
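The post doesn't show the new API, so the following is purely a hypothetical illustration of the kind of change described: moving from a flat Vec<String> of paths to a listing type that carries structure, which can still be flattened back down when a plain list of paths is wanted. The Listing type and flatten function here are invented names, not asuran's.

```rust
// Hypothetical sketch only: asuran's real listing API is not shown in the
// post. This illustrates a structured listing versus a flat Vec<String>.
#[derive(Clone, Debug, PartialEq)]
enum Listing {
    File(String),
    Dir(String, Vec<Listing>),
}

// Flatten a structured listing back into the old-style Vec<String> of
// slash-joined paths.
fn flatten(listing: &Listing, out: &mut Vec<String>) {
    match listing {
        Listing::File(name) => out.push(name.clone()),
        Listing::Dir(name, children) => {
            for child in children {
                let mut nested = Vec::new();
                flatten(child, &mut nested);
                for path in nested {
                    // Prefix each child path with the directory name.
                    out.push(format!("{}/{}", name, path));
                }
            }
        }
    }
}
```

The structured form lets callers reason about directories directly instead of re-deriving the hierarchy by string-splitting paths.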

Backend API now speaks in Chunks

The Backend trait family has been updated to talk about Chunk rather than Vec<u8>. This greatly reduces cognitive complexity by allowing the backend API to take care of all the serialization tasks, which should have been its responsibility all along.
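As a rough sketch of the shape of this change (the trait and method names here are illustrative, not the real asuran API): callers now hand the backend a Chunk, and the backend owns serialization internally rather than expecting pre-serialized bytes.

```rust
// Hypothetical sketch; names are illustrative, not asuran's actual API.
#[derive(Clone, Debug, PartialEq)]
struct Chunk {
    data: Vec<u8>,
}

// The backend accepts and returns Chunk, keeping serialization concerns
// behind the trait boundary instead of in every caller.
trait Backend {
    fn write_chunk(&mut self, chunk: Chunk) -> usize;
    fn read_chunk(&self, index: usize) -> Option<Chunk>;
}

// Toy in-memory backend: "serialization" here is just storage, but a real
// backend would encode the Chunk into its on-disk segment format here.
struct MemoryBackend {
    chunks: Vec<Chunk>,
}

impl Backend for MemoryBackend {
    fn write_chunk(&mut self, chunk: Chunk) -> usize {
        self.chunks.push(chunk);
        self.chunks.len() - 1
    }
    fn read_chunk(&self, index: usize) -> Option<Chunk> {
        self.chunks.get(index).cloned()
    }
}
```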

TaskedSegment struct removal

The TaskedSegment struct in the backend API was no longer used after the introduction of the SyncBackend trait and wrapper, and has been removed.

Segment API brought up to date

The Segment struct was originally written under the assumption that the backend API would need a priori knowledge of a chunk's size to be able to read it. This turned out not to be the case, and the API needed to be updated to reflect this change.
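One common way to avoid needing a priori size knowledge, shown here as a hypothetical illustration rather than asuran's actual segment format, is to length-prefix each record so the reader can recover the size from the stream itself:

```rust
use std::io::{Read, Write};

// Hypothetical illustration, not asuran's real segment format: each record
// is prefixed with its length, so the reader needs no external size info.
fn write_record<W: Write>(writer: &mut W, data: &[u8]) -> std::io::Result<()> {
    // Write an 8-byte little-endian length, then the payload.
    writer.write_all(&(data.len() as u64).to_le_bytes())?;
    writer.write_all(data)
}

fn read_record<R: Read>(reader: &mut R) -> std::io::Result<Vec<u8>> {
    // Read the length prefix first, then exactly that many payload bytes.
    let mut len_buf = [0u8; 8];
    reader.read_exact(&mut len_buf)?;
    let len = u64::from_le_bytes(len_buf) as usize;
    let mut data = vec![0u8; len];
    reader.read_exact(&mut data)?;
    Ok(data)
}
```

With framing like this, a reader can walk a segment record by record without being told any chunk sizes up front.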

Bug Fixes

Replaced buggy roll-my-own semver parser

The asuran crate contained a less-than-ideal semver parser that was used to parse the version string provided by cargo, for use in generating headers for segment files and the like. The previous parser was fundamentally too simple to work, and broke down when given the “x.y.z-dev” style semvers I am now using for “in between” versions.

I have resolved this issue by pulling in the semver crate and using it to parse the version string provided by cargo. I will also use this crate in the future for doing comparisons against segment headers to detect version incompatibilities.
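The actual fix was to use the semver crate, but the failure mode is easy to demonstrate with a std-only sketch: a naive split-on-dots parser chokes on the pre-release suffix, while splitting off an optional "-pre" part first handles it. Both functions below are illustrative, not asuran's code.

```rust
// Illustrative std-only sketch of the bug class; asuran's real fix was
// to use the semver crate rather than code like this.
fn naive_parse(version: &str) -> Option<(u64, u64, u64)> {
    let mut parts = version.split('.');
    Some((
        parts.next()?.parse().ok()?,
        parts.next()?.parse().ok()?,
        // On "0.1.5-dev" this tries to parse "5-dev" as a number and fails.
        parts.next()?.parse().ok()?,
    ))
}

fn parse_with_pre(version: &str) -> Option<(u64, u64, u64, Option<String>)> {
    // Split off an optional "-pre" suffix before parsing the numeric core.
    let (core, pre) = match version.split_once('-') {
        Some((core, pre)) => (core, Some(pre.to_string())),
        None => (version, None),
    };
    let (major, minor, patch) = naive_parse(core)?;
    Some((major, minor, patch, pre))
}
```

Even this is only a fraction of what real semver allows (build metadata, multi-part pre-release identifiers, ordering rules), which is exactly why reaching for the semver crate is the right call.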

Documentation Update

This was, by far, the most time consuming task I undertook this week. I audited all the rustdoc and comments, in all the modules of all the crates, and made sure they were accurate and only contained up to date information. Asuran isn’t as well documented as I would like, but still has a lot of comments and external documentation.

As I am writing this (after the completion of the doc audit), the asuran repository has roughly one line of comments for every four lines of code, as this tokei report shows:

-------------------------------------------------------------------------------
 Language            Files        Lines         Code     Comments       Blanks
-------------------------------------------------------------------------------
 Markdown                5          493          493            0            0
 Rust                   51         8909         6502         1656          751
 TOML                   11          195          173            4           18
 YAML                    1           12           12            0            0
-------------------------------------------------------------------------------
 Total                  68         9609         7180         1660          769
-------------------------------------------------------------------------------

In addition to this, the Internals document on the Asuran website had drifted out of sync due to changes I have made to the repository format as I worked, and required substantial updating.

There is still a substantial amount of documentation to be written, but at least what does exist is up to date and correct now.

Sparse Files and Hole-Punch

Sparse Files are really difficult to handle in a cross platform way.

Sparse files are, fundamentally, a beautiful optimizing hack that makes everyone’s lives easier, but as hacks in general go, no one has ever really done it the same way twice, so each platform and filesystem handles sparse files in a slightly different way.

To combat this, I have developed and released the hole-punch crate, which provides a nice, simple, cross platform way of finding which sections of a sparse file are holes and which are data.

Support is currently only provided for Unix-like platforms that support the SEEK_HOLE and SEEK_DATA arguments to lseek. I need to get my hands on a Mac (in general, but also for testing and future development of hole-punch), and Windows support is currently in the works.
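For a sense of the primitive involved, here is a Linux-only sketch of SEEK_HOLE/SEEK_DATA via a raw FFI call to lseek. This is not hole-punch's API, just the underlying mechanism it wraps on these platforms; the constant values are Linux-specific, and hole-punch's own interface is the portable way to do this.

```rust
// Linux-only sketch of the lseek SEEK_HOLE/SEEK_DATA primitive.
// This is NOT the hole-punch crate's API, just the raw mechanism.
use std::fs::File;
use std::io::Write;
use std::os::raw::{c_int, c_long};
use std::os::unix::io::AsRawFd;

const SEEK_DATA: c_int = 3; // Linux-specific value
const SEEK_HOLE: c_int = 4; // Linux-specific value

extern "C" {
    // libc's lseek; off_t is a long on 64-bit Linux.
    fn lseek(fd: c_int, offset: c_long, whence: c_int) -> c_long;
}

// Offset of the next data region at or after `from` (-1 on error/ENXIO).
fn next_data(file: &File, from: i64) -> i64 {
    unsafe { lseek(file.as_raw_fd(), from, SEEK_DATA) }
}

// Offset of the next hole at or after `from`; every file has an implicit
// hole at end-of-file, so this returns the file size for a fully-dense file.
fn next_hole(file: &File, from: i64) -> i64 {
    unsafe { lseek(file.as_raw_fd(), from, SEEK_HOLE) }
}
```

Walking a file alternating between next_data and next_hole yields exactly the data/hole map a backup tool needs to avoid reading (and storing) gigabytes of zeros.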