Why I have settled on XChaCha20+Blake3 as the AE suite of choice for my projects

2021-11-29

This might get me some looks, but I have pretty solidly decided to go in on using XChaCha20+Blake3 as the AE of choice for my future open source work. There are numerous reasons for this decision, but it mainly comes down to the desire for defense in depth, and a deep dislike of fundamental properties of polynomial MACs.

The Why Nots

Why not just use poly1305?

The burning question, I can hear it now:

Why not just use ChaCha20-poly1305? Its standard, and its good enough for <insert application here>? Why roll your own crypto?

The reason for this gets at a fundamental property of polynomial MACs¹ that opens them up to all kinds of fun attacks, like the partitioning oracle², you didn’t even know you needed to worry about. Polynomial MACs violate intuitive expected properties of cryptosystems, and are so finicky to work with that even the specification for AES-GCM contained invalid proofs of its security³.

There is a not-infrequently important property of cryptosystems, referred to as “commiting to their keys”, or simply “being commiting”⁴. Simply put, a system has this property when it is impractical to produce a message that will successfully decrypt or verify under multiple chosen keys. As a fundamental result of their construction, polynomial MACs lack this property, and its even worse than simply being able to construct a message that will decrypt under two chosen keys. Even for the more well behaved polynomial MACs, such as Poly1305, it is not only practical to construct garden variety collisions, it is practical to construct multi-collisions that will decrypt under a large number of chosen keys⁵ .

This lack of even a facade of collision resistance can actually turn into quite a big deal, just as Mega⁶ famously experienced a loss of security due to their inappropriate use of CBC-MAC (another type of MAC that also lacks collision resistance) polynomial MACs, in practice, worm their way into all kinds of places they shouldn’t be, places where a collision resistant MAC would have been the only appropriate choice. Even in the context of more seemingly sane systems, this lack of collision resistance can cause no end of problems, such as the partitioning oracle attack ¹, which was practically exploitable for over the network password recovery in Shadowsocks, and the always interesting Invisible Salamander⁷.

These issues alone are, personally, enough to turn me off from using poly1305 or any other polynomial MACs in any of my future projects, given that it’s 2021 and hashes are fast now, but, it is my opinion that polynomial MACs being non-collision resistant represents a huge foot gun, and as I am not a fan of handing foot guns to the consumers of my code, I will refrain from offering this one, and provide a committing, collision resistant scheme instead.

Why not use AES

This one should be a little bit easier to answer, its not exactly a hot take to say that AES is a lot closer to the bottom of the acceptable ciphers barrel than ChaCha20⁸. Even discounting the standard complaints, like how AES’s construction is somewhat-intrinsically vulnerable to cache based timing attacks when implemented in software⁹ (though bit-slicing and other tricks mitigate this, they are surprisingly rare to see in the wild, even in software running on devices that lack hardware accelerated AES), and how the block size is too small¹⁰, AES still has other concerning factors in its corner, like how the PRP-PRF distinguishing attack reduces the effective security of AES in counter mode.

AES just, all around, isn’t ideal. Sure, ChaCha is naturally vulnerable to electromagnetic side channel attacks¹¹, which have been an item of, ehm, increasing concern for me¹², AES is just as vulnerable in that regard, with even side-channel resistant hardware implementations being a rare thing, and for most practical purposes, only existing in the literature¹³. ChaCha20’s much larger block size (512-bit) and more timing side channel resistant construction just lends itself to fewer foot guns, and while hardware assisted AES is a bit faster, I don’t think there is enough of a speed differential to warrant the loss of misuse resistance in most applications.

Why not Panic

This is not intended to be an indictment of any particular protocol, while the issues I’ve brought up are real, and can result in vulnerabilities in the real world, none of them are automatically going to lead to an exploitable vulnerability. The people making the systems you rely on to keep your private data safe by and large understand the state of the research on the matter, and are careful to design systems to avoid the foot guns. Your AES-GCM or ChaCha20-Poly1305 TLS stream is not in immediate peril.

Cryptographic primitives don’t exist in a vacuum, any analysis of vulnerabilities must be done on complete systems, as decisions on any level above the choice of fundamental primitives may sink or save the ship¹⁴.

My interest in writing this post is in informing the design of new systems, where I much prefer foot guns be avoided categorically instead of case-by-case, when practical, and I believe the time has come where the use of cryptosystems that categorically avoid these foot guns is practical in all but niche edge case scenarios.

The Whys

Why use XChaCha20

ChaCha20 has a number of benefits over AES, and I don’t really feel the need to go too deep into them, but lets give the overview. ChaCha20 is a stream cipher, but it acts like a block cipher being used in CTR mode. In this respect, it has two major advantages over AES-CTR:

The block size is 512 bits, compared to AES’s 128 bits, making a wide variety of attacks, such as birthday bound or PRP-PRF distinguishing attacks many orders of magnitude less practical to pull of against ChaCha
The nonce is provided to the underlying cipher independently from the stream position. This has the nice practical effect of making accidental nonce-reuse that much harder, since while a given AES-CTR stream consumes a range of nonces, a given ChaCha stream consumes only one

ChaCha20 also has other advantages, like being more energy efficient when dedicated hardware isn’t available, and being naturally resistant to timing side channel attacks. The only real disadvantage is that it is slower than commonly available hardware accelerated AES, but not by an amount that I think makes a critical difference for all but the pickiest of applications.

The choice of XChaCha20 over that of plain ChaCha20 is also easy to explain. ChaCha20 only gives you a 64 bit nonce, which, while enough to encrypt the world with a given key, means that choosing a nonce at random can be incredibly dangerous, even AES-GCM’s 96-bit nonce is much too small, leading to concrete failures in production¹⁵, and even the 128-bit nonce of straight AES-CTR leaves me feeling uneasy, the birthday bound there is a lot lower than it feels like it is. XChaCha20’s choice of a 192 bit nonce, however, leaves plenty of headroom to feel safe, even when randomly selecting nonces, taking another foot gun out of the equation.

Why use Blake3

This is another easy one to answer. The biggest reason, is, well Blake3 is fast, several times faster than ChaCha20 on my machine, and having an HMAC that can out run your cipher several times over is always handy when building HMAC based cryptosystems. In my opinion having a hash, like Blake3, that’s truly fast changes the trade off analysis, making it much less worth the trade off of using a polynomial MAC.

Blake3 also has some other nice properties, its a modern, secure, length-extension resistant hash based on a merkle tree construction, which provides lots of other side benefits, such as the ability to use the hash for verified streaming. It additionally has the nice property of not really having variants, with the HMAC, KDF, hash, and XOF modes all operating in a very similar and consistent manner, easing the cognitive load on the programmer.

Being effectively an HMAC, Blake3 also provides a scheme that commits to its keys, almost for free as a side effect of collision resistance, directly sidestepping my concerns about polynomial MACs.

Conclusion

I believe that using XChaCha20-Blake3 in the encrypt-then-hmac construction provides a more-than-fast-enough base to build more than useable cryptosystems on top of, and acts as a primitive for doing so that presents substantially fewer foot guns than other competing options, while still remaining more than useably fast, even in the absence of dedicated hardware acceleration. Humans are such fallible creatures, and accidental de-footings are much less likely to happen when we are provided fewer chances to do so.

A family of very closely related Message Authentication Codes, including AES-GCM’s GHASH, AES-GCM-SIV’s Polyval, and Poly1305 ↩ ↩²
Julia Len, Paul Grubbs and Thomas Ristenpart: Partitioning Oracle Attacks (USENIX ‘21) ↩
Tetsu Iwata, Keisuke Ohashi, Kazuhiko Minematsu: Breaking and Repairing GCM Security Proofs (CRYPTO 2012) ↩
OPAQUE protocol specification section discussing this ↩
Kryptos Logic: Faster Poly1305 key multicollisons ↩
Megafail ↩
Yevgeni Dodis, Paul Grubbs, Thomas Ristenpart, Joanne Woodage: Fast Message Franking: From Invisible Salamanders to Encryptment (CRYPTO 2018) ↩
Source: My Ass ↩
Daniel J. Bernstein: Cache-timing attacks on AES ↩
Even AES256 has a block size of 128 bits, which means you can start expecting collisions after 2^64 encryptions. This sounds like a lot, and the exact details of how this can cause a mode of operation to break down depend on the particular mode, but that’s a not an unachievable amount of data, the internet does that much every few months (assuming 1 block per encryption), and the security can start to break down well before the 2^64 mark. https://www.youtube.com/watch?v=v0IsYNDMV7A ↩
Alexandre Adomnicai, Jacques J.A. Fournier, Laurent Masson: Bricklayer Attack: A Side-Channel Analysis on the ChaCha Quarter Round ↩
Giovanni Camurati, Sebastian Poeplau, Marius Muench, Tom Hayes, Aurélien Francillon: Screaming Channels: When Electromagnetic Side Channels Meet Radio Transceivers ↩
Raghavan Kumar, Vikram Suresh, Monodeep Kar, et al: A 4900-μm2 839-Mb/s Side-Channel Attack-Resistant AES-128 in 14-nm CMOS With Heterogeneous Sboxes, Linear Masked MixColumns, and Dual-Rail Key Addition ↩
One good example of it saving the ship is HMAC-MD5. While the MD5 hash has been blown pretty widely open, the common HMAC construction gets many of its security guarantees from the first-preimage resistance of the underlying hash, and as there are no known practical first preimage attacks against MD5, HMAC-MD5 remains largely secure (please do not use it for new systems though) ↩
Hanno Böck, Aaron Zauner, Sean Devlin, Juraj Somorovsky, Phillip Jovanovic: Nonce-Disrespecting Adversaries: Practical ↩