Categories
Server Administration Transport Layer Security (TLS)

Deploy CAA in exchange for internet cookies

Want do to something useful? Something that (slightly) improves your security in the internet? Want some internet cookies? Then please deploy CAA.

What is CAA?

CAA means Certificate Authority Authorization. In its essence, it’s just a DNS record. You know, just like those A records that resolve names into IP addresses, or MX records that list your mailserver(s). This DNS record tells other people which Certificate Authorities (CA) are allowed to issue certificates for your domain – nothing more, nothing less.

Why is that important? Who cares about this?

I do 🙂 . No seriously, it’s a useful thing, even though it may not seem like this on first glance. A CAA record on your domain improves the security of your domain. That is because by default every CA in the entire world can issue a certificate for your domain. That isn’t ideal, because there are more than 100 companies operating one or more root certificate authorities and much more subordinate Certificate Authorities, probably well over a thousand. Every one of these can in theory go rogue and forge a certificate for your domain(s). Are all of these thousand CA’s equally thrustworthy and secure? Well, from a browser security standpoint all of these are considered “trusted”, so everything signed by them will get a green lock in your browser (or any other application using TLS). Imagine just one of them getting hacked: Even if you are not a customer of them, they can still attack your website. You can detect such attacks using The Power Of Certificate Transparency, but preventing is better than detecting, right?1

With CAA, you can decide which CA you trust. If you only use a single CA, you can tell the entire world “I only trust them, no one else may issue a certificate for me”. That’s what CAA does. The DNS record prevents anyone else from issuing a certificate for your domain. Deploying CAA can be an important difference when a CA gets hacked or goes rogue for some reason.

How does this work on a technical side? How does a DNS record prevent someone from issuing? Can’t a rogue CA just ignore the DNS record?

Yes, they can in theory. However that would be really, really bad for them. CAA is mandatory for every single publicly trusted CA since 2017. No CA is allowed to simply “ignore” a CAA record. It must respect the record – failing to do so will cause the CA to loose it’s publicy trusted status. Minor accidents may be forgiveable, but major CAA accidents will most likely cause a CA to loose it’s trust status worldwide (if not our perspective on what is thrustworthy is really flawed). CAA is like a law: Failing to follow the law will likely lead to punishment. That is why it’s important to deploy CAA: If no one deploys it, you can’t punish the CA’s for ignoring it. It’s also possible to automatically monitor whether a certificate issuance was legal according to CAA (by using the power of Certificate Transparency and checking against the CAA record) and immediatly notify authorities if a mis-issuance happens.

Okay nice, now how do I go about deploying it?

First of all, we should talk about how CAA checking works from a CA point of view. The CA climbs the DNS domain names up to the TLD and checks each name for a valid CAA record. Let me explain it by example:

If we wanted to issue a certificate for blog.germancoding.com, the CA would first check for a CAA record at blog.germancoding.com. If no CAA record is found, the CA queries germancoding.com for a CAA record. Finally, if that was unsuccesfull too, the CA queries the TLD .com (yes, that’s technically also a domain2) for a record. If there isn’t anything either, the CA can consider that there isn’t any CAA record, so issuance is allowed by everyone (allow-by-default in this case). There are edge cases like when there’s a CName record involved, but this upwards climbing is the basic principle. CAA checking is stopped upon the first match, so a CAA record on blog.germancoding.com overrides one on germancoding.com.

What I’m trying to say here is that for most people a single CAA record on the main domain is enough – subdomains are covered automatically. Records on subdomains are only needed if you want to override something, for example:
“I only want Let’s Encrypt to issue for me, but on a single subdomain only I will additionally allow Sectigo” – this is easily doable due to the climbing-up principle explained above.

The actual record may look like this:

germancoding.com. IN CAA 0 issue "letsencrypt.org"

I think this format is called the Bind Zone Format, due to Bind’s popularity it’s seen everywhere (and due to the fact that it’s a standard). This means that there is a DNS record on germancoding.com which is of type CAA – CAA Records have their own type, which is kinda special these days – most protocols scramble data into DNS using a TXT record. The actual “data” of the Record is

0 issue “letsencrypt.org”

The first 0 is a flag. This flag is currently “reserved for future use” and should simply be set to zero (the actual definition of the flag is a bit more complicated, there’s already the “critical bit”, but let’s not blow up this post any more than neccessary).

The second part, “issue” describes what action we want to do (the definition calls this a “tag”): Currently defined tags are issue, issuewild, iodef, contactemail, contactphone. Most of these (iodef, contact*) are used for (real-time) reporting of CAA violations or issues. That isn’t really neccessary to implement, the issue/issuewild tags are the important ones.

The difference between issue and issuewild is that issue controls general issuance of certificates and issuewild controls the issuance of wildcard certificates. If no issuewild is present, issue takes over and also controls the issue of wildcards, so you do not need issuewild, unless you want special treatment for wildcard certificates.

The last part is the name of the certificate authority. The format here isn’t set in stone, each CA can name themselves, so you must check your CA documentation which name they use for their CAA checking. Most CA’s I’ve seen use their primary domain name, which is what is shown in my example (my example would be valid for Let’s Encrypt).

Once you have deployed this, you can use a TLS checker (like the one from Qualys) to see how it looks in the real world (you can also do a “dig CAA example.net” from a suitable box that has the dnsutils command). If you want to double check that everything works as intended, issue a certificate from your designated CA’s and see if you’re getting errors – if so, your CAA is too strict or incorrectly set and doesn’t allow issuance.

Excursion for specialists: The RFC 8657 CAA extension

The DNS CAA standard was recently (November 2019) expanded. The definition of what is new is documented in RFC 8657.

This extension is especially interesting for CA’s that use the ACME protocol (like Let’s Encrypt). With this extension, you have more fine-grained access control how certificates can be issued for your domain. For example if you’re familiar with the ACME protocol, you know that there are multiple challenges, e.g the DNS-01 and the HTTP-01 challenge (+ the forgotten ALPN challenge). With the extension from the RFC, you can define which challenges are allowed on your domains. You could, for example, only allow the DNS challenge. This would prevent someone who has the ability to re-reoute traffic to get certificates from Let’s Encrypt using the HTTP-01 challenge even when there’s a CAA record only allowing Let’s Encrypt. This obviously means that you must use the challenges allowed by your DNS CAA record yourself. An example of the extension in action can be seen on my main domain, germancoding.com:

# dig CAA germancoding.com

;; ANSWER SECTION:
germancoding.com.  3600  IN  CAA  0 issue "letsencrypt.org; validationmethods=dns-01"

As you can see, we can place a semicolon after the name of your allowed CA. After the semicolon we can set parameters, like the allowed challenges. The CA will disallow any challenge not listed in the CAA record, if the parameter is present.

There’s more than just the validationmethods. You can also bind to a specific account URI (e.g the account number or name you have at your CA), so that only accounts under your control can issue certificates – no one can create a new account at the same CA and use that for issuance, if you set that parameter.

Please note that this standard is too new to be supported everywhere. First of all, most CA’s do not use the ACME protocol and therefore things like validationmethods usually have no meaning to them. Secondly, even our ACME exemplary CA Let’s Encrypt doesn’t support the new extension yet in production. Nevertheless, you can still deploy it today on your domains, as support will be there in the future, I’m pretty sure – Let’s Encrypt already has it in staging and it’s only a matter of time until it’s deployed in production too.

Categories
Linux Server Administration

Dovecot and Argon2 doesn’t work? This may be why.

Sorry, title is too long. Again. Nevertheless, ignore this and continue on…

I run my own mailserver. I love doing this, don’t ask me why. Anyway, I was recently migrating a few password hashes to Argon2. I confirmed manually that everything was working, I checked that Dovecot was able to generate and verify Argon2 hashes, that my backend storage was doing everything correctly and so on.

Then I changed a bunch of passwords to migrate them over to Argon2 (I was previously using bcrypt, but I started to like Argon2 more because of it’s resistance to many optimization attacks). Just after I had those new Argon2 hashes in the database, I could no longer login using these users. I think it worked like once and then never again.

Well, damn. I spend hours researching what may be wrong. Dovecot was simply spitting out it’s usual “dovecot: auth-worker(pid): Password mismatch” message. Nothing I could get any information from. To summarize what I found in the ‘net: Nothing of use.

So well, why am I writing this post then? Well because I finally figured out what’s wrong. The Dovecot documentation states this:

ARGON2 can require quite a hefty amount of virtual memory, so we recommend that you set service auth { vsz_limit = 2G } at least, or more.

https://doc.dovecot.org/configuration_manual/authentication/password_schemes/

Well, I obviously already did that – I do read the documentation from time to time, at least when I’m trying to solve a critical problem. But you know what’s strange? The docs also state that the actual login isn’t handled by the auth service, instead it’s usually done by a service called auth-worker 1[at least if you’re using a database like I do] (that’s also the thing that’s producing the “Password mismatch” log messages).

To make a long story short, what happened was that Dovecot stopped the auth-worker process as it was trying to hash the Argon2 password. This simply triggered a generic “Password mismatch” message instead of something useful like “out of memory”, so yeah…

Lesson Learned: If you’re using Argon2, increase the memory limit of auth-worker, not the auth service like the docs tell you to.

This was the solution. I simply added a few config lines to Dovecot:

service auth-worker {
   # Needed for argon2. THIS IS THE IMPORTANT SETTING!
   vsz_limit = 0 # Means unlimited, other values like 2G or more also also valid
   # Other custom settings for your auth-workers here...
}

And login was working again. I never found anyone mentioning that you need to set this value for auth-worker instead of auth, which is why I wrote this little post.

Categories
Transport Layer Security (TLS)

Why 100% isn’t always the best score

Please do not try to reach a 100% score on SSLabs.

I’ve seen this numerous times on the Let’s Encrypt forums and so I felt the need to scream out: Please don’t do this.

A small introduction to SSLabs and it’s ratings

The website SSLabs provides a service that checks SSL/TLS hosts for their security level, scans for typical TLS issues and displays a final score upon finishing scanning.

Users not familiar with the inner workings of TLS probably don’t understand much of the displayed data. It is therefore natural for these users to interpret the data they understand most – the score. The score is displayed at the top of the results page and may look like this:

Screenshot taken from my own blog, on 18.09.2020 @ dev.ssllabs.com
Screenshot taken from my own blog, on 18.09.2020 @ dev.ssllabs.com

As you can see here, there’s an “overall rating”. The highest possible value is A+, the lowest rating is F (or various connection error states). Besides the rating, there’s also a percentage score (0 – 100%), divided into four categories:

  • Certificate – The score of the site’s certificate
  • Protocol support – A score, based on which protocol versions are supported by the server
  • Key Exchange – Score based on the strength of all available key exchanges
  • Cipher strength – Score based on the strength of all available cipher suites

If you look at my screenshot, you should notice that I’ve got 100% only in two categories – Certificate and Protocol Support. The other two categories are only at 90%. Why is that, and why shouldn’t I fix that?

Before we dive into the details, the reader should be aware that the rating system of SSLabs constantly changes. Future versions may work differently and this information may not be up to date.

Category 1 – the certificate

Getting 100% is pretty simple here, and most users don’t need to do anything to achieve this. According the the SSLabs rating guide, the certificate is evaluated by typical misconfigurations or weaknesses, and points are deducted if anything is out of the usual. Most CA’s (like Let’s Encrypt) will generally not issue “bad” certificates, so there’s not much room for issues here. There are some things to misconfigure though, e.g a bad or incomplete cert chain – SSLabs will show a warning in the results if something was found.

Category 2 – protocol support

We could talk ages about what SSL/TLS version exist, and which you should use or shouldn’t use, but I’m going to make it short: Trust me, or trust Mozilla’s Guidelines, or trust one of the million sites that have great scores on SSLabs:

Use TLSv1.2 and TLSv1.3 only. Do not enable anything lower (anything newer doesn’t exist at the time of writing). Doing this will also get you the 100% score on SSLabs in terms of protocol support. TLSv1.2 is twelve years old, so compatibility isn’t a factor unless you need to support extremly old things (in that case you should consider isolating the legacy systems).

Category 3 – key exchange

Now it gets interesting. This is one of two categories where I don’t have 100%. And you shouldn’t either. In order to get 100% on key exchange, you would need to make sure that all algorithms used in the key exchange have a theoretical bit security strength that is greater than some arbitrary value set by SSLabs. That is currently at least 4096 Diffie-Hellman, or a 384-bit EC curve or higher. You’re not allowed to offer anything with less size, that would decrease your score (usually to 90%, which is what you see on my site).

So, why are larger keys bad?

They’re not bad, but their increase in security is mostly theoretical. We cannot break a 256 bit EC key, and we also cannot break a 384 bit key. We may be able to do this in the future though, which is why some folks tend to prefer larger keys. However, some researchers say that it is much more likely that a curve is broken by a technical flaw, rather than by true brute-force power. If such a flaw is found, it will most likely affect both the smaller and the larger curves. The value in practice is small.

The downside of a larger key is however, that not all clients support it (also performance). Google Chrome, for example, has dropped support for one the largest NIST curves, secp521r1. You cannot use this curve together with Chromium-based browser, which is the majority of the web. Android 7.0 has a bug, where the only supported elliptic curve is prime256v1. Prime256v1 is one of the smaller curves (256 bits), but statistically the most used ECDSA curve in the public internet.

So, using larger curves will not really increase your strength against an attacker, but will lock out clients without reason. That is why I’m reiterating: A DHE key exchange of 2048-bit is still okay today, and you should also offer smaller curves. You obviously can offer larger values – but do not drop support for the smaller ones yet. Except for things like 1024-bit RSA/DHE1 – that’s dangerously low. But 2048-bit to 4096-bit RSA/DHE is a reasonable range to offer.

Category 4 – cipher strength

The second category where I only have 90%. The reasoning is very similar to category 3 – a higher score kills of clients support, without a strong increase in security level.

Getting 100% here requires all offered block ciphers to have a key length of 256 bits or greater. Anything lower must not be offered. The most famous (and used) block cipher is probably AES. The largest key size available for AES is 256, the lowest 128 bits.

Similar to key exchanges, more bits are not neccessarily more secure. A larger key means more work brute-force wise and generally increases computation work for many attacks. But, there are also attacks, like Cache-timing attacks, that can work even on the large AES-256 keys. So yes, a larger key is a bit better, but it is also a lot slower and security doesn’t increase dramatically.

The major issue again here is, besides performance, that not all clients talk AES256 (in all cipher combinations). For example, according to SSLLabs, Firefox 47 on Windows 7 only speaks AES128-GCM, not AES256-GCM. 256-bit AES-CBC is supported, but we don’t want to talk CBC. So requiring the use of AES256 will again lock out lots of clients.

It is therefore far better to offer both – AES128-GCM and AES256-GCM. If the client supports both, you can still select AES256 by turning on server-side cipher preference (see my older posts on TLS). But do not force it, unless you like killing of clients.

I hope this post has cleared up some misconceptions about the scores on SSLabs, and why a full 100%-score is sacrificing support without better overall security. I also want to remind server operators, that a secure TLS connection is only a part of the deal – a secure site also requires a well-engineered and maintained application behind it. I plan to make some more web-related posts about typical attacks, security headers and similar things in the future.

If you have any questions, feel free to ask below or contact me directly – my email is here somewhere.