Sorry, title is too long. Again. Nevertheless, ignore this and continue on…
I run my own mailserver. I love doing this, don’t ask me why. Anyway, I was recently migrating a few password hashes to Argon2. I confirmed manually that everything was working: I checked that Dovecot was able to generate and verify Argon2 hashes, that my backend storage was handling everything correctly, and so on.
Then I changed a bunch of passwords to migrate them over to Argon2 (I was previously using bcrypt, but I’ve started to like Argon2 more because of its resistance to many optimization attacks). Just after those new Argon2 hashes landed in the database, I could no longer log in with these users. I think it worked like once and then never again.
Well, damn. I spent hours researching what might be wrong. Dovecot was simply spitting out its usual “dovecot: auth-worker(pid): Password mismatch” message. Nothing I could get any information from. To summarize what I found on the ‘net: nothing of use.
So, why am I writing this post then? Well, because I finally figured out what was wrong. The Dovecot documentation states this:
ARGON2 can require quite a hefty amount of virtual memory, so we recommend that you set service auth { vsz_limit = 2G } at least, or more.
Well, I obviously already did that – I do read the documentation from time to time, at least when I’m trying to solve a critical problem. But you know what’s strange? The docs also state that the actual login isn’t handled by the auth service; it’s usually done by a service called auth-worker (at least if you’re using a database backend like I do). That’s also the service producing the “Password mismatch” log messages.
To make a long story short, what happened was that Dovecot killed the auth-worker process while it was trying to hash the Argon2 password, because the worker ran over its memory limit. That simply produced the generic “Password mismatch” message instead of something useful like “out of memory”, so yeah…
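For context on why the memory limit matters at all: Argon2 is deliberately memory-hard, and the memory cost is a tunable parameter that gets encoded into every hash, so verification has to allocate the same amount again. Here’s a minimal sketch using the Python argon2-cffi library (which is not what Dovecot uses internally; the parameters are illustrative, not Dovecot’s defaults):

from argon2 import PasswordHasher

# memory_cost is in KiB; 65536 KiB = 64 MiB per hash/verify operation.
# These parameters are illustrative, not Dovecot's defaults.
ph = PasswordHasher(time_cost=3, memory_cost=65536, parallelism=4)

encoded = ph.hash("correct horse battery staple")
print(encoded)
# e.g. $argon2id$v=19$m=65536,t=3,p=4$...  (m= is the memory cost in KiB)

# Verifying allocates the same amount of memory again. If the verifying
# process gets killed at this point, all the caller sees is a failed login.
ph.verify(encoded, "correct horse battery staple")

A process that is capped below that allocation never finishes the verification, no matter how correct the password is.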
Lesson Learned: If you’re using Argon2, increase the memory limit of auth-worker, not the auth service like the docs tell you to.
This was the solution. I simply added a few config lines to Dovecot:
service auth-worker {
# Needed for argon2. THIS IS THE IMPORTANT SETTING!
vsz_limit = 0 # Means unlimited; other values like 2G or more are also valid
# Other custom settings for your auth-workers here...
}
And login was working again. I never found anyone mentioning that you need to set this value for auth-worker instead of auth, which is why I wrote this little post.
Please do not try to reach a 100% score on SSLabs.
I’ve seen this numerous times on the Let’s Encrypt forums and so I felt the need to scream out: Please don’t do this.
A small introduction to SSLabs and its ratings
The website SSLabs provides a service that checks SSL/TLS hosts for their security level, scans for typical TLS issues and displays a final score once the scan is done.
Users not familiar with the inner workings of TLS probably don’t understand much of the displayed data. It is therefore natural for these users to interpret the data they understand most – the score. The score is displayed at the top of the results page and may look like this:
As you can see here, there’s an “overall rating”. The highest possible value is A+, the lowest rating is F (or various connection error states). Besides the rating, there are also percentage scores (0 – 100%) in four categories:
Certificate – The score of the site’s certificate
Protocol support – A score, based on which protocol versions are supported by the server
Key Exchange – Score based on the strength of all available key exchanges
Cipher strength – Score based on the strength of all available cipher suites
If you look at my screenshot, you should notice that I’ve got 100% only in two categories – Certificate and Protocol Support. The other two categories are only at 90%. Why is that, and why shouldn’t I fix that?
Before we dive into the details, the reader should be aware that the rating system of SSLabs constantly changes. Future versions may work differently and this information may not be up to date.
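That said, here is roughly how the numbers combine into the overall score, based on my reading of the SSLabs rating guide at the time of writing. The weights are my assumption from that guide and may change; the certificate acts more like a gate that can zero the whole score than like a weighted category, and the letter grade is then derived from the combined number.

def overall_score(protocol: int, key_exchange: int, cipher: int) -> float:
    """Rough sketch of the SSLabs score combination (assumed weights:
    protocol support 30%, key exchange 30%, cipher strength 40%)."""
    return 0.3 * protocol + 0.3 * key_exchange + 0.4 * cipher

# My result from the screenshot above: 100 / 90 / 90
print(overall_score(100, 90, 90))  # 93.0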
Category 1 – the certificate
Getting 100% is pretty simple here, and most users don’t need to do anything to achieve it. According to the SSLabs rating guide, the certificate is checked for typical misconfigurations or weaknesses, and points are deducted if anything is out of the ordinary. Most CAs (like Let’s Encrypt) will generally not issue “bad” certificates, so there’s not much room for issues here. There are still some things you can misconfigure though, e.g. a bad or incomplete certificate chain – SSLabs will show a warning in the results if something like that was found.
Category 2 – protocol support
We could talk for ages about which SSL/TLS versions exist and which you should or shouldn’t use, but I’m going to make it short: Trust me, or trust Mozilla’s Guidelines, or trust one of the million sites that have great scores on SSLabs:
Use TLSv1.2 and TLSv1.3 only. Do not enable anything lower (anything newer doesn’t exist at the time of writing). Doing this will also get you the 100% score on SSLabs in terms of protocol support. TLSv1.2 is twelve years old, so compatibility isn’t a factor unless you need to support extremely old things (in that case you should consider isolating the legacy systems).
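How you express “TLSv1.2 and TLSv1.3 only” depends on your server software. As a language-neutral illustration, here is a minimal sketch using Python’s ssl module (the certificate paths are placeholders); your web server will have equivalent directives:

import ssl

# Server-side TLS context that only offers TLSv1.2 and TLSv1.3.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuses TLSv1.1, TLSv1.0 and SSLv3
ctx.maximum_version = ssl.TLSVersion.TLSv1_3  # the default anyway, shown for clarity

# ctx.load_cert_chain("fullchain.pem", "privkey.pem")  # placeholder paths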
Category 3 – key exchange
Now it gets interesting. This is one of two categories where I don’t have 100%. And you shouldn’t either. In order to get 100% on key exchange, you would need to make sure that all algorithms used in the key exchange have a theoretical bit security strength greater than some arbitrary value set by SSLabs. That currently means at least 4096-bit Diffie-Hellman, or an EC curve of 384 bits or more. You’re not allowed to offer anything smaller; that would decrease your score (usually to 90%, which is what you see on my site).
So, why are larger keys bad?
They’re not bad, but their increase in security is mostly theoretical. We cannot break a 256-bit EC key, and we cannot break a 384-bit key either. We may be able to do so in the future, which is why some folks tend to prefer larger keys. However, some researchers say that it is much more likely that a curve gets broken by a technical flaw than by true brute-force power. If such a flaw is found, it will most likely affect both the smaller and the larger curves. The practical value of the bigger curve is small.
The downside of a larger key, however, is that not all clients support it (and it costs performance). Google Chrome, for example, has dropped support for one of the largest NIST curves, secp521r1. You cannot use this curve with Chromium-based browsers, which make up the majority of the web. Android 7.0 has a bug where the only supported elliptic curve is prime256v1. Prime256v1 is one of the smaller curves (256 bits), but statistically the most used ECDSA curve on the public internet.
So, using larger curves will not really increase your strength against an attacker, but it will lock out clients without reason. That is why I’m reiterating: a 2048-bit DHE key exchange is still okay today, and you should also offer the smaller curves. You can obviously offer larger values – just do not drop support for the smaller ones yet. Except for things like 1024-bit RSA/DHE – that’s dangerously low. But 2048-bit to 4096-bit RSA/DHE is a reasonable range to offer.
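To put rough numbers on the “theoretical bit security” argument, here are the commonly cited symmetric-equivalent strengths (ballpark figures in the spirit of NIST SP 800-57; treat them as approximations, not exact values):

# Approximate symmetric-equivalent security levels (rounded ballpark figures).
SECURITY_BITS = {
    "RSA/DHE 1024":            80,   # dangerously low, don't offer
    "RSA/DHE 2048":           112,   # still okay today
    "RSA/DHE 3072":           128,
    "EC P-256 (prime256v1)":  128,
    "EC P-384 (secp384r1)":   192,
    "EC P-521 (secp521r1)":   256,   # the curve Chrome dropped
}

for algo, bits in sorted(SECURITY_BITS.items(), key=lambda kv: kv[1]):
    print(f"{algo:<24} ~{bits} bit security")

Everything from roughly 112 bits upwards is far beyond brute force with current technology, which is exactly the point: P-256 is not the weak link in your setup.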
Category 4 – cipher strength
The second category where I only have 90%. The reasoning is very similar to category 3 – a higher score kills off client support without a strong increase in security.
Getting 100% here requires all offered block ciphers to have a key length of 256 bits or greater. Anything lower must not be offered. The most famous (and most used) block cipher is probably AES. The largest key size available for AES is 256 bits, the smallest 128 bits.
Similar to key exchanges, more bits are not necessarily more secure. A larger key means more brute-force work and generally increases the effort for many attacks. But there are also attacks, like cache-timing attacks, that can work even against the large AES-256 keys. So yes, a larger key is a bit better, but it is also a lot slower and security doesn’t increase dramatically.
The major issue here, besides performance, is again that not all clients talk AES256 (in all cipher combinations). For example, according to SSLabs, Firefox 47 on Windows 7 only speaks AES128-GCM, not AES256-GCM. 256-bit AES-CBC is supported, but we don’t want to talk CBC. So requiring AES256 will again lock out lots of clients.
It is therefore far better to offer both – AES128-GCM and AES256-GCM. If the client supports both, you can still select AES256 by turning on server-side cipher preference (see my older posts on TLS). But do not force it, unless you like killing off clients.
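“Offer both and prefer the bigger one” translated into a concrete configuration looks roughly like this. Again a sketch with Python’s ssl module and OpenSSL cipher names; in nginx or Apache the equivalent is the cipher string plus the “prefer server ciphers” switch:

import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# Offer both AES-256-GCM and AES-128-GCM (TLS 1.2 suites; the TLS 1.3 suites
# are configured separately by OpenSSL and already include both key sizes).
ctx.set_ciphers(
    "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:"
    "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256"
)

# Let the server's order win, so AES-256-GCM gets picked whenever the client
# supports it, and AES-128-GCM remains as the fallback.
ctx.options |= ssl.OP_CIPHER_SERVER_PREFERENCE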
I hope this post has cleared up some misconceptions about the scores on SSLabs, and why a full 100% score sacrifices client support without better overall security. I also want to remind server operators that a secure TLS connection is only part of the deal – a secure site also requires a well-engineered and maintained application behind it. I plan to make some more web-related posts about typical attacks, security headers and similar things in the future.
If you have any questions, feel free to ask below or contact me directly – my email is here somewhere.
This is a question I asked myself about two months ago. The question was whether I should follow Mozilla’s Guidelines for Server Side TLS and throw old, legacy ciphers (like the CBC-based ones) overboard. This depends on whether they’re still used in practice. There will certainly always be clients that only support legacy stuff. I have no interest in maintaining absurd backwards compatibility by sacrificing security. But I do want some reasonable amount of backwards compatibility – I certainly don’t want to lock out legitimate users running only slightly outdated software.
I’ve been requiring HTTPS on my sites since 2016. In 2018 I dropped support for TLS 1.1 and below – TLS 1.2 was published in 2008, so if you’re TWELVE years behind: Sorry, but that isn’t going to end well.
In the past I’ve also killed off more legacy ciphers, such as those using Cipher Block Chaining mode (CBC). I’ve now reenabled these for testing purposes. The CBC mode has some serious issues: for one, there are (possible) padding oracle attacks against CBC. This isn’t just a theoretical thing; actual attacks have been performed in the past. While we’re at it: don’t forget the POODLE variants… There are even more dragons lurking around, such as Lucky Thirteen (a timing side-channel attack on CBC). But it doesn’t stop there: CBC (in combination with TLS) has also suffered from indirect attacks like OpenSSL’s 0-Length Bug. There’s also a whole bunch of issues with ciphers that separate encryption and authentication. In conclusion, ciphersuites that use CBC aren’t really something you want in a modern, secure encryption scheme.
TLS 1.2 introduced AEAD (authenticated encryption) ciphers. Those do both things (authentication & encryption) together, eliminating many oracle issues. They don’t solve every problem, but as of now (2020) they’re the best we have. Because TLS 1.2 is already 12 years old, these secure AEAD ciphersuites have seen great adoption in practice.
Now, Mozilla recommends turning off CBC entirely and only using those AEAD ciphersuites. I would love to do that, but before I turn my back on CBC I want to be sure that all common clients actually support the newer AEAD suites. Qualys SSL Labs seems to suggest that many outdated macOS computers do not support any AEAD cipher (note that we do not care about clients that don’t even speak TLS 1.2 – those are already off the table).
Because I couldn’t find any usage study on cipher suites used in the wild, I decided to do my own, non-scientific study. For the past two months I logged the TLS ciphersuite used by every client that performed at least one complete HTTPS request to my server. This means that most TLS scans, which do not make any HTTP(S) requests, were excluded from the logging. This way most of the gathered data comes from actually meaningful clients (crawlers & humans, plus a few additional bots).
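If you want to run a similar experiment: the analysis itself boils down to counting one log field. A minimal sketch, assuming your web server writes the negotiated cipher into the access log (for example via nginx’s $ssl_cipher variable as the last field of each line; the file name and field position here are assumptions, not my actual setup):

from collections import Counter

counts = Counter()
with open("access.log") as log:                    # placeholder path
    for line in log:
        cipher = line.rsplit(" ", 1)[-1].strip()   # assumes the cipher is the last field
        if cipher and cipher != "-":               # "-" means no TLS / nothing logged
            counts[cipher] += 1

for cipher, n in counts.most_common():
    print(f"{n:6d}  {cipher}")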
I’m actively monitoring whether I see clients that really do not speak anything better than CBC. For the past two months, I haven’t had a single one (except for two clients which were definitely TLS scanners). However, at this point I’m still not done with the survey. I will continue to monitor for a while longer, and once I have gathered sufficient data, I will decide whether to ditch CBC or not.
Note that this is not a scientific study of any kind, and as such I will not compile any beautiful data sets. I may post some stats here in the future if I feel like it. I’m not getting paid for this, so the effort involved is low.
If you want to see what my current TLS setup looks like, you can check for yourself on Qualys. I’m also planning on writing a more detailed blog post about how the setup looks and the reasoning behind it. It’s all on my TODO list, I promise.
Update (1 month later)
I have continued to monitor ciphersuites, and I did have some clients sending HTTP requests using CBC ciphers. All of these clients were some type of TLS scanner or vulnerability scanner. I couldn’t identify a single useful bot or human user on CBC. As a result I have turned off CBC completely. That also means that server cipher preference is now off and the client can choose its preferred cipher (as Mozilla recommends).
Another Update (many months later)
While setting up some new services, I did encounter some very old clients that only support CBC (but do speak TLS 1.2). Those were IoT devices (sigh) and other embedded legacy stuff like web browsers from smart TVs and similar. One of these didn’t even know what elliptic curves are (no ECDHE and no ECDSA). I’ve reenabled a single CBC cipher suite for those clients.
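“A single CBC cipher suite” here means appending exactly one legacy suite to the otherwise AEAD-only list, on the affected services only. A hedged sketch, again with Python’s ssl module standing in for the real server config; the suite shown (plain RSA key exchange, since that one device knows no ECDHE/ECDSA) is only an example, and the one you actually need depends on what the legacy device offers:

import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# AEAD suites for everyone else, plus exactly one CBC fallback at the very end.
# "AES128-SHA" (TLS_RSA_WITH_AES_128_CBC_SHA) is only an example suite.
ctx.set_ciphers(
    "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:"
    "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:"
    "AES128-SHA"
)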
Another major problem with these clients is the upcoming switch of Let’s Encrypt’s root certificate to ISRG Root X1. Those clients are so old that they don’t have an up-to-date trust store and thus cannot validate certificates from the new root. I initially underestimated how many legacy clients are out there; even my own household has some devices affected by this. Sadly, there’s no good strategy here, but that’s a topic for another post…