It has become a common scenario: A reporter gets a newsworthy email forwarded out of the blue. But is the email legit? It turns out there are a few technical tools you can use to check on an email, in tandem with the traditional ones like calling for confirmation. I used some of these techniques last week to help authenticate some emails forwarded to my colleague Justin Elliott. Those emails were sent by Marc Kasowitz, one of President Trump’s personal attorneys.
This post is a very brief introduction to the tools I used and that you can use when you need to authenticate an email message.
There’s a cryptographic technique that can tell us if an email message that you or your source has received matches what was sent. It comes in two similar flavors. One’s called “DomainKeys Identified Mail,” or DKIM, and the other is “Authenticated Received Chain,” or ARC. You can use them to authenticate emails that come in over the transom. It takes a tiny bit of command-line work and maybe a little coaxing of your source, but it can offer you a mathematical guarantee that the email you have on your screen is identical to the one that the source received, with no possibility of intermediate tampering.
To understand it, we need to do a little bit of e-spelunking into how email and cryptography work.
An email message has two parts: the body, which is the text of the message, and the headers, which are kind of like the outside of an envelope on a piece of snail mail. The headers include stuff you are familiar with, like To, From and the Subject lines. But they also include a lot of other, more obscure fields that aren’t shown in Gmail or Outlook. For instance, one of the fields contains what is essentially a tracking log for the email, recording the path it took from the sender’s email service to the service hosting your own email.
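To make the envelope analogy concrete, here is a minimal sketch that uses Python's built-in email module to split a message into headers and body. The message itself is invented for illustration; the addresses, server names and dates are not from any real email.

```python
from email import message_from_string

# A made-up message: every name, address and date here is hypothetical.
RAW = """\
Received: by mx.example.net with SMTP id abc123; Mon, 10 Jul 2017 10:00:02 -0400
Received: from mail.example.com by mx.example.net; Mon, 10 Jul 2017 10:00:01 -0400
From: Alice <alice@example.com>
To: Bob <bob@example.net>
Subject: Lunch?

Are you free at noon?
"""

msg = message_from_string(RAW)

# The familiar headers that Gmail or Outlook would show you.
visible = {name: msg[name] for name in ("From", "To", "Subject")}

# The "tracking log": one Received header per server that handled the message.
hops = msg.get_all("Received", [])

# Everything after the first blank line is the body.
body = msg.get_payload()

print(visible)
print(len(hops), "hops")
print(body)
```

Each server that handles a message adds its own Received line at the top, so the list usually reads from the most recent hop to the earliest.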
The obscure header we’re interested in is called the DKIM Signature. It’s kind of like the shipper’s packing list. The DKIM Signature field contains two things: First, a set of instructions for making a summary of the email, mushing up some of the headers and the message itself, and, second, a version of that summary — technically, a “hash” — that’s cryptographically signed by the sending server.
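If you want to see those two things for yourself, the sketch below splits a DKIM Signature field into its labeled parts. The sample value is made up (the domain, selector and both base64 strings are placeholders), and parse_dkim_tags is a hypothetical helper name, not part of any library. Roughly speaking, the a, c and h tags are the instructions, bh is the hash of the body, and b is the signed summary.

```python
def parse_dkim_tags(header_value):
    """Split a DKIM-Signature value into a dict of tag -> value."""
    tags = {}
    for part in header_value.split(";"):
        if "=" not in part:
            continue
        # Split on the first "=" only, since base64 values may end in "=".
        name, _, value = part.partition("=")
        # Long values are wrapped across lines; drop that whitespace.
        tags[name.strip()] = "".join(value.split())
    return tags


# Hypothetical header value; none of these values come from a real email.
SAMPLE = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=selector1;\r\n"
    " h=from:to:subject:date;\r\n"
    " bh=aGFzaCBvZiBib2R5=;\r\n"
    " b=c2lnbmVkIHN1bW1hcnk="
)

tags = parse_dkim_tags(SAMPLE)
print("signing domain:", tags["d"])
print("headers covered:", tags["h"].split(":"))
```

The d and s tags matter later: together they tell a verifier where in DNS to look for the sender's public key.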
It’s meant to give the receiving server the ability to see if the contents of the email changed in transit, the digital equivalent of detecting whether the mailman steamed open the envelope and modified the contents of a letter. We can put it to good use as journalists by creating our own version of the hash and then decrypting the one made by the sending server. If the hash we create from those instructions matches the decrypted one from the message exactly, we have mathematical proof that our email is the same as the one that was sent/received.
The inverse isn’t true. That is, if the hash we create isn’t the same as the hash in the DKIM Signature field, it doesn’t necessarily mean the messages are completely different or that the message was tampered with. Some email servers are a little wonky and make little changes to an email, such as adding or removing spaces at the end. Even a tiny change will totally throw off the cryptographic comparison, and such changes aren’t at all uncommon. So if the hashes don’t match, it’s possible the email was tampered with, but you can’t draw that conclusion from a DKIM hash mismatch alone.
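Here is a small illustration of why a tiny change matters so much, using Python's built-in hashlib. It is a simplification: real DKIM tidies up the message before hashing it, and depending on the sender's settings that step forgives some whitespace changes. The two message bodies are invented.

```python
import hashlib


def digest(text):
    """Return the SHA-256 hash of a piece of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


sent = "Meet me at noon.\r\n"
received = "Meet me at noon. \r\n"  # one extra space before the line ending

# Hashing the same text twice always gives the same result...
same = digest(sent) == digest(sent)
# ...but a single added space produces a different hash.
changed = digest(sent) != digest(received)

print(digest(sent))
print(digest(received))
print("identical text matches:", same)
print("one extra space changes the hash:", changed)
```

The hashes tell you that something differs, not what or why, which is the reason a mismatch on its own proves nothing about tampering.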
There are other reasons why verification might fail on a genuine email. Older emails, in particular, are more likely to not validate because the public key used to decrypt the summary of the email might have been changed. (Remember — DKIM is meant to be used when the email was received right away, not months or years later.)
So that’s DKIM. Now for ARC. ARC is similar to DKIM, but instead of being used by the sending server, it’s used by intermediaries in the email process, like listservs or servers that receive email. Many emails that arrive in Gmail are signed by Google, but this is a new development: the ARC protocol isn’t even formally approved yet.
Some emails will have both DKIM signatures and ARC signatures. Some will have only one. For instance, the email our source received only had an ARC signature, put there by Google when it arrived in Gmail. It didn’t have a DKIM signature, because the email server used by the sender’s law firm doesn’t include them. And some have neither; both of these systems are slathered on top of the original email system like sunscreen — and, also like sunscreen, some people don’t use them.
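A quick way to see which of these signatures a message carries is to count the relevant headers. The sketch below does that with Python's built-in email module; signature_headers is a hypothetical helper name, and the sample message is invented to mirror the ARC-only case described above.

```python
from email import message_from_bytes

# Header names that mark each kind of signature.
SIGNATURE_HEADERS = ("DKIM-Signature", "ARC-Seal", "ARC-Message-Signature")


def signature_headers(raw_message):
    """Count the DKIM and ARC signature headers in a raw message (bytes)."""
    msg = message_from_bytes(raw_message)
    return {name: len(msg.get_all(name, [])) for name in SIGNATURE_HEADERS}


# A made-up message with ARC headers but no DKIM signature.
SAMPLE = (
    b"ARC-Seal: i=1; a=rsa-sha256; cv=none; d=example.net; s=arc1; b=placeholder\r\n"
    b"ARC-Message-Signature: i=1; a=rsa-sha256; d=example.net; s=arc1;"
    b" h=from:to:subject; bh=placeholder; b=placeholder\r\n"
    b"From: Alice <alice@example.com>\r\n"
    b"To: Bob <bob@example.net>\r\n"
    b"Subject: Lunch?\r\n"
    b"\r\n"
    b"Are you free at noon?\r\n"
)

found = signature_headers(SAMPLE)
print(found)
```

Finding a header only tells you a signature is present. Whether it validates is a separate question.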
What DKIM and ARC Prove (and What They Can’t Prove)
While a validated DKIM signature guarantees that you have the same email that was sent, a validated ARC signature can guarantee that you have the same email that was received by the receiving server. In practice, this was perfect for us, because we needed to know for sure that the email that was forwarded to us was exactly the one our source originally received.
Just because a message you’ve been forwarded matches what was sent or received, that doesn’t mean it’s completely authentic. DKIM and ARC can’t tell you whether the sender’s server was hacked or misconfigured.
And neither technique guarantees that the sender is who they say they are. It’s theoretically possible for me to create my own email server that pretends to be hillaryclinton.com. There’s a system called Sender Policy Framework (SPF) that validates whether a sending server is really allowed to send email on behalf of a given domain. Read up on that if you think that scenario is a possibility.
DKIM and ARC also can’t confirm that the person who typed the email was the person whose name is on the account, instead of somebody else with access to it. (An email that the sender “signed” with a different kind of encryption tool like PGP or S/MIME would have cryptographic proof that it came from a computer belonging to the sender, but that’s beyond the scope of this post. The emails we were analyzing didn’t have a PGP signature, and most people don’t use these tools.)
How to Check DKIM and ARC
You’ll need a little bit of command-line knowledge and to have Python installed on your computer. I’ll assume you have both, and I’ll also assume that you’re dealing with an email forwarded by a source you are still in contact with.
You’re also going to need an original copy of the email. A forwarded version won’t work at all — the headers we care about are stripped out. That means that the emails that Donald Trump Jr. tweeted can’t be verified using these techniques (but then, I suppose, he authenticated them for us).
You’ll need to find out the service your source uses to receive email — Gmail, Outlook, Yahoo, etc. — and then find their instructions on how to forward a message as an attachment. I’m including the Gmail ones here.
Here’s how you or your source can get the original message in Gmail:
- Open the message.
- Find the little down arrow next to the reply button and click it.
- Click the Show Original button.
- In the new tab that opens, you’ll see the source of the message, including all the headers that Gmail hides from you by default. Your source should click the Download Original link, save the file and email it to you as an attachment.
Now that you have the email you want to authenticate as an attachment, you’ll need two Python libraries.
- dnspython. We’ll use this to fetch the decryption key that’s used to guarantee that the two summaries match. This library grabs the key from the DNS system (it’s not included in the message, which makes it harder to spoof). You should be able to install this using pip or easy_install.
- dkimpy. This is a Python library for authenticating DKIM and ARC signatures. You can grab it at https://launchpad.net/dkimpy.
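The verification scripts do the key lookup for you, but if you are curious where the key lives, here is a rough sketch using dnspython. The key sits in a DNS text record whose name combines the selector (the s tag in the signature) and the signing domain (the d tag). The function names are hypothetical, and selector1 and example.com are placeholders, so the lookup will only return something if you substitute values from a real signature.

```python
def dkim_key_name(selector, domain):
    """Build the DNS name where a DKIM public key is published."""
    return "{}._domainkey.{}".format(selector, domain)


def fetch_dkim_key_record(selector, domain):
    """Look up the TXT record holding the key. Needs dnspython and a network."""
    import dns.resolver  # from dnspython, imported here so the helper above runs without it

    # dnspython 2.x calls this resolve(); older releases call it query().
    answers = dns.resolver.resolve(dkim_key_name(selector, domain), "TXT")
    # A long record arrives as several chunks of bytes; join them back up.
    return [b"".join(rdata.strings).decode("ascii") for rdata in answers]


# Placeholder values; take the real ones from the s= and d= tags of a signature.
name = dkim_key_name("selector1", "example.com")
print(name)
```

Because the key comes from the domain's own DNS records rather than from the message, someone who alters the email can't simply swap in a key of their own.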
Once these are installed:
- Using the command line, go to the directory where you unzipped dkimpy. On Mac and Linux, that’s probably something like cd ~/Downloads/dkimpy-0.6.2.
- Make sure you know where the email message (sent as attachment) is. It might also be in Downloads, so let’s assume the path is ~/Downloads/original_msg.txt. The file extension might be something like .eml or .msg. That’s fine, too.
- Execute the signature validation tool, feeding it the original message as input. That’s going to look something like one of these two commands:
    python dkimverify.py < ~/Downloads/original_msg.txt
    python arcverify.py < ~/Downloads/original_msg.txt
- Interpret the results. If the command comes back saying arc verification: cv=b'pass' success (for ARC) or signature ok (for DKIM), then we know the message is the same as sent or received, as the case may be. If the response is “signature verification failed” or “Message is not ARC signed,” we don’t know if the email’s been tampered with or not. (Seriously: you can’t conclude that it has been tampered with. You just don’t know.)
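If you would rather run both checks from your own Python script, the sketch below calls dkimpy directly instead of going through the two command-line tools. It assumes dkimpy's verify and arc_verify functions behave as they do in the bundled scripts; check_message and interpret are hypothetical names of my own. The interpret step encodes the rule above: a pass is proof, and anything else is simply unknown.

```python
def interpret(dkim_ok, arc_ok):
    """Turn the two results into a cautious, plain-language conclusion."""
    verified = []
    if dkim_ok:
        verified.append("matches what was sent (DKIM)")
    if arc_ok:
        verified.append("matches what was received (ARC)")
    if verified:
        return "verified: " + "; ".join(verified)
    # A failed or missing signature is not evidence of tampering.
    return "unknown: no signature validated, which does not prove tampering"


def check_message(path):
    """Run DKIM and ARC verification on a saved original message file."""
    import dkim  # provided by dkimpy, imported here so interpret() works without it

    with open(path, "rb") as handle:
        message = handle.read()

    dkim_ok = dkim.verify(message)  # True if the DKIM signature validates
    cv, results, comment = dkim.arc_verify(message)  # cv is b'pass' on success
    return interpret(dkim_ok, cv == b"pass")


print(interpret(False, True))
print(interpret(False, False))
```

To use it, call check_message with the path to the file your source sent, for example the original_msg.txt file in your Downloads folder.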
I hope you find these tools useful. Seeing as how the phrase “email scandal” could refer to any number of different political brouhahas over the past two years, it’s clear that email verification is a process that we’re all going to have to get more familiar with.