This might get me some looks, but I have pretty solidly decided to go in on using XChaCha20+Blake3 as the AE of choice
for my future open source work. There are numerous reasons for this decision, but it mainly comes down to the desire for
defense in depth, and a deep dislike of fundamental properties of polynomial MACs.
The Why Nots
Why not just use poly1305?
The burning question, I can hear it now:
Why not just use ChaCha20-poly1305? Its standard, and its good enough for <insert application here>? Why roll your
own crypto?
The reason for this gets at a fundamental property of polynomial MACs that opens them up to all kinds of fun
attacks, like the partitioning oracle, you didn’t even know you needed to worry about. Polynomial MACs violate
intuitive expected properties of cryptosystems, and are so finicky to work with that even the specification for AES-GCM
contained invalid proofs of its security.
There is a not-infrequently important property of cryptosystems, referred to as “commiting to their keys”, or simply
“being commiting”. Simply put, a system has this property when it is impractical to produce a message that will
successfully decrypt or verify under multiple chosen keys. As a fundamental result of their construction, polynomial
MACs lack this property, and its even worse than simply being able to construct a message that will decrypt under two
chosen keys. Even for the more well behaved polynomial MACs, such as Poly1305, it is not only practical to construct
garden variety collisions, it is practical to construct multi-collisions that will decrypt under a large number of
chosen keys .
This lack of even a facade of collision resistance can actually turn into quite a big deal, just as Mega famously
experienced a loss of security due to their inappropriate use of CBC-MAC (another type of MAC that also lacks collision
resistance) polynomial MACs, in practice, worm their way into all kinds of places they shouldn’t be, places where a
collision resistant MAC would have been the only appropriate choice. Even in the context of more seemingly sane systems,
this lack of collision resistance can cause no end of problems, such as the partitioning oracle attack , which was
practically exploitable for over the network password recovery in Shadowsocks, and the always interesting Invisible
Salamander.
These issues alone are, personally, enough to turn me off from using poly1305 or any other polynomial MACs in any of my
future projects, given that it’s 2021 and hashes are fast now, but, it is my opinion that polynomial MACs being
non-collision resistant represents a huge foot gun, and as I am not a fan of handing foot guns to the consumers of my
code, I will refrain from offering this one, and provide a committing, collision resistant scheme instead.
Why not use AES
This one should be a little bit easier to answer, its not exactly a hot take to say that AES is a lot closer to the
bottom of the acceptable ciphers barrel than ChaCha20. Even discounting the standard complaints, like how AES’s
construction is somewhat-intrinsically vulnerable to cache based timing attacks when implemented in software (though
bit-slicing and other tricks mitigate this, they are surprisingly rare to see in the wild, even in software running on
devices that lack hardware accelerated AES), and how the block size is too small, AES still has other concerning factors
in its corner, like how the PRP-PRF distinguishing attack reduces the effective security of AES in counter mode.
AES just, all around, isn’t ideal. Sure, ChaCha is naturally vulnerable to electromagnetic side channel attacks,
which have been an item of, ehm, increasing concern for me, AES is just as vulnerable in that regard, with even
side-channel resistant hardware implementations being a rare thing, and for most practical purposes, only existing in
the literature. ChaCha20’s much larger block size (512-bit) and more timing side channel resistant construction
just lends itself to fewer foot guns, and while hardware assisted AES is a bit faster, I don’t think there is enough of
a speed differential to warrant the loss of misuse resistance in most applications.
Why not Panic
This is not intended to be an indictment of any particular protocol, while the issues I’ve brought up are real, and can
result in vulnerabilities in the real world, none of them are automatically going to lead to an exploitable
vulnerability. The people making the systems you rely on to keep your private data safe by and large understand the
state of the research on the matter, and are careful to design systems to avoid the foot guns. Your AES-GCM or
ChaCha20-Poly1305 TLS stream is not in immediate peril.
Cryptographic primitives don’t exist in a vacuum, any analysis of vulnerabilities must be done on complete systems, as
decisions on any level above the choice of fundamental primitives may sink or save the ship.
My interest in writing this post is in informing the design of new systems, where I much prefer foot guns be avoided
categorically instead of case-by-case, when practical, and I believe the time has come where the use of cryptosystems
that categorically avoid these foot guns is practical in all but niche edge case scenarios.
The Whys
Why use XChaCha20
ChaCha20 has a number of benefits over AES, and I don’t really feel the need to go too deep into them, but lets give
the overview. ChaCha20 is a stream cipher, but it acts like a block cipher being used in CTR mode. In this respect, it
has two major advantages over AES-CTR:
- The block size is 512 bits, compared to AES’s 128 bits, making a wide variety of attacks, such as birthday bound or
PRP-PRF distinguishing attacks many orders of magnitude less practical to pull of against ChaCha
- The nonce is provided to the underlying cipher independently from the stream position. This has the nice practical
effect of making accidental nonce-reuse that much harder, since while a given AES-CTR stream consumes a range of
nonces, a given ChaCha stream consumes only one
ChaCha20 also has other advantages, like being more energy efficient when dedicated hardware isn’t available, and being
naturally resistant to timing side channel attacks. The only real disadvantage is that it is slower than commonly
available hardware accelerated AES, but not by an amount that I think makes a critical difference for all but the
pickiest of applications.
The choice of XChaCha20 over that of plain ChaCha20 is also easy to explain. ChaCha20 only gives you a 64 bit nonce,
which, while enough to encrypt the world with a given key, means that choosing a nonce at random can be incredibly
dangerous, even AES-GCM’s 96-bit nonce is much too small, leading to concrete failures in production, and even the
128-bit nonce of straight AES-CTR leaves me feeling uneasy, the birthday bound there is a lot lower than it feels like
it is. XChaCha20’s choice of a 192 bit nonce, however, leaves plenty of headroom to feel safe, even when randomly
selecting nonces, taking another foot gun out of the equation.
Why use Blake3
This is another easy one to answer. The biggest reason, is, well Blake3 is fast, several times faster than ChaCha20 on
my machine, and having an HMAC that can out run your cipher several times over is always handy when building HMAC based
cryptosystems. In my opinion having a hash, like Blake3, that’s truly fast changes the trade off analysis, making it
much less worth the trade off of using a polynomial MAC.
Blake3 also has some other nice properties, its a modern, secure, length-extension resistant hash based on a merkle tree
construction, which provides lots of other side benefits, such as the ability to use the hash for verified streaming. It
additionally has the nice property of not really having variants, with the HMAC, KDF, hash, and XOF modes all
operating in a very similar and consistent manner, easing the cognitive load on the programmer.
Being effectively an HMAC, Blake3 also provides a scheme that commits to its keys, almost for
free as a side effect of collision resistance, directly sidestepping my concerns about polynomial
MACs.
Conclusion
I believe that using XChaCha20-Blake3 in the encrypt-then-hmac construction provides a more-than-fast-enough base to
build more than useable cryptosystems on top of, and acts as a primitive for doing so that presents substantially fewer
foot guns than other competing options, while still remaining more than useably fast, even in the absence of dedicated
hardware acceleration. Humans are such fallible creatures, and accidental de-footings are much less likely to happen
when we are provided fewer chances to do so.