Wed 11 Jan 2006
Technology Corner: The SMTP Server Always Rings Twice (How to Read Email Headers)
Posted by Administrator under Privacy, Technology
The most ubiquitous end-user Internet application remains today what it was 20 years ago — email. Billions of email messages traverse the globe every day, and each one includes with it full details on where it’s been and where it’s going. This article will tell you how to read the hidden traces on any email message and in the process learn more about how email works under the hood.
The SMTP Protocol
All email transactions contain two parts — the envelope and the data. Much like a letter, the envelope contains all the routing information and the postmarks, showing where the email came from and how it was processed. This information may well be repeated in the data sections, just as addresses might be printed on the letter inside as well, but it’s the envelope information that counts. Let’s look at what a simple mail transaction entails on the network (lines starting with > were sent by me; everything else is returned by the server):
220 zealous.concentric.com [ConcentricHost ESMTP MX 2/2.54] ready
> EHLO nemesis.concentric.com
250-zealous.concentric.com
250-PIPELINING
250-SIZE 10485760
250 8BITMIME
> MAIL FROM: <testme@fluffysheep.com>
250 Ok
> RCPT TO: <technotes@concentric.com>
250 Ok
> DATA
354 End data with <CR><LF>.<CR><LF>
> From: <testme@fluffysheep.com>
> To: <technotes@concentric.com>
> Subject: This is an email test !
>
> Blah blah blah blah blah.
> .
250 Ok: queued as 7D4DFCFC16
While the above is an oversimplification, this is the basis for the SMTP protocol through which all Internet email is delivered. If we look closely, we can see several important details:
o What you receive in your email box is what comes across in the DATA section of the protocol.
o The envelope (everything before DATA) has the sender and recipient, as do the headers inside the data section.
o The sender tells the receiver the name of the sender’s computer — we’ll see this in more detail below.
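The split between envelope and data can be sketched with Python's standard smtplib and email modules. This is only an illustration of the transaction above, not how the test message was actually sent (that was typed by hand); send_test_message is a hypothetical helper, and the host names are simply the ones from the transcript.

```python
from email.message import EmailMessage
import smtplib


def build_test_message():
    """Build the DATA portion (headers plus body) of the test message."""
    msg = EmailMessage()
    msg["From"] = "testme@fluffysheep.com"
    msg["To"] = "technotes@concentric.com"
    msg["Subject"] = "This is an email test !"
    msg.set_content("Blah blah blah blah blah.")
    return msg


def send_test_message(host="zealous.concentric.com"):
    """Hypothetical helper: open an SMTP session and hand over the message.

    The envelope sender and recipient are passed separately from the
    message; they become the MAIL FROM and RCPT TO commands.
    """
    msg = build_test_message()
    # local_hostname is the name announced in the EHLO command.
    with smtplib.SMTP(host, 25, local_hostname="nemesis.concentric.com") as smtp:
        smtp.send_message(
            msg,
            from_addr="testme@fluffysheep.com",     # envelope: MAIL FROM
            to_addrs=["technotes@concentric.com"],  # envelope: RCPT TO
        )


if __name__ == "__main__":
    # Print only what would travel in the DATA section; no network needed.
    print(build_test_message().as_string())
```

Note that the library adds a few MIME headers of its own to the message, and that nothing forces the envelope addresses to match the From and To headers inside it.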
Headers
Now, if I look at this message in my mail client (in my case, Apple’s ‘Mail’ program), the message looks like this:
Subject: This is an email test !
Date: January 10, 2006 8:00:46 AM PST
From: testme@fluffysheep.com
To: technotes@concentric.com
Blah blah blah blah blah.
We see that there are four headers to the email — but in fact, these headers have been selected and cleaned up by the mail client. Every mail client has a way to show ‘all headers’ or ‘full headers’, and that will let us see what really happened with the email. This option is usually in the pulldown menu options or under info/properties for the message you’re reading — for instructions on how to find the full headers option on most mail clients, you can visit our documentation in the Email FAQ https://register.cnchost.com/home/utility/parseSupportXML.cgi?manageUsers/email_faq#ehead.
The full headers for this message look like this:
Subject: This is an email test !
Date: January 10, 2006 8:00:46 AM PST
From: testme@fluffysheep.com
To: technotes@concentric.com
Return-Path: <testme@fluffysheep.com>
Received: from nemesis.concentric.com (nemesis.concentric.com [207.155.252.176]) by zealous.concentric.com (ConcentricHost(2.54) MX) with ESMTP id 7D4DFCFC16 for <technotes@concentric.com>; Tue, 10 Jan 2006 11:00:46 -0500 (EST)
Message-Id: <m1007/13/04M46.7D4DFCFC16@zealous.concentric.com>
X-Junkmail: NotJunk
X-Trid: 43c3da686Lpx88b4
X-Mfdata: [34.816010 v2.3:3 n182 s2832 g48979 b53636 p0.045383 sN3 t26,279456]
X-Uidl: 311188
You won’t see a list of headers much shorter than this, as this email was sent manually — email from mail clients will include a lot more data. Let’s walk through each of these headers in turn.
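If you save a message in its raw form, you don't need a mail client to list its headers. The sketch below uses Python's standard email package on a subset of the headers shown above (the Message-Id, X-Trid and X-Mfdata lines are left out for brevity). The function names are illustrative, and the long Received line is folded onto indented continuation lines, which is how long headers normally travel.

```python
from email import message_from_string

# A subset of the full headers from the article, then a blank line and the body.
RAW_MESSAGE = """\
Subject: This is an email test !
Date: January 10, 2006 8:00:46 AM PST
From: testme@fluffysheep.com
To: technotes@concentric.com
Return-Path: <testme@fluffysheep.com>
Received: from nemesis.concentric.com (nemesis.concentric.com [207.155.252.176])
 by zealous.concentric.com (ConcentricHost(2.54) MX) with ESMTP id 7D4DFCFC16
 for <technotes@concentric.com>; Tue, 10 Jan 2006 11:00:46 -0500 (EST)
X-Junkmail: NotJunk
X-Uidl: 311188

Blah blah blah blah blah.
"""


def list_headers(raw):
    """Return (name, value) pairs in the order they appear in the message."""
    msg = message_from_string(raw)
    # Collapse folded (multi-line) values back onto a single line.
    return [(name, " ".join(str(value).split())) for name, value in msg.items()]


def extension_headers(raw):
    """Return the names of the X- headers only."""
    return [name for name, _ in list_headers(raw) if name.lower().startswith("x-")]


if __name__ == "__main__":
    for name, value in list_headers(RAW_MESSAGE):
        print(f"{name}: {value}")
```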
‘Subject’ — pretty obvious. The Subject is sent by the sender and largely unmodified by the server (although some spam filtering options may change it). It’s perfectly legitimate to have no subject header, but your mail client may add a fake one if it’s not there.
‘Date’ is also seemingly obvious, although more complex. Normally the Date will be sent by the sender and will reflect the sender’s clock and timezone. If the sender does NOT send a Date: header (as was the case in our test message here), then the server will set the Date to its local time. But some mail clients, like this one, will go ahead and change the date to reflect the receiver’s timezone: the ACTUAL header on this message would have reflected a date in EST, but Apple Mail has changed it to PST for us. (In my opinion, this is bad behavior; when looking at raw headers, they should truly be raw.) There are some important ramifications from this. Because the Date reflects the time the sender’s computer thinks it is, and not the time received, if your mail client sorts mail by Date header and not by the time it received it, a message with a bad date (for example, from a sender with a slow clock) will sort way out of order and you might never see it. Likewise, a message written many hours before it’s sent (for example, written on an airplane and sent later) may, depending on the sender’s client software, be dated much earlier than the time it was actually sent.
‘From’ — this is the From: line in the DATA section of our email. As such, it’s what the sender wants it to be, not necessarily reality.
‘To’ — likewise, what the sender wants it to be. It doesn’t reflect how the message was actually routed.
‘Return-Path’ — added by the mail server and will reflect the from address provided in the envelope. When we said ‘MAIL FROM: <testme@fluffysheep.com>’, that address was stored here in Return-Path.
‘Received’ — the postmark, showing where the message has been. For forensic analysis, this is the most important line in the message. We’ll discuss it in much more detail below.
‘Message-ID’ — a unique ID describing the message, used to determine if messages are duplicates. Like ‘Date’, it will be assigned to the message if not already provided in the headers sent by the sender.
The remaining headers start with X-, which means that they aren’t strictly defined by the SMTP standard but are used for other things, and may have meaning only to the system that added them. There are thousands of these that you might see. The four here are added by our platform:
‘X-Junkmail’ — ‘Junk’ or ‘NotJunk’, depending on the recipient’s mail filtering settings and the details of the message. If you leave junkmail in your inbox, this header can be used in a custom rule on your mail client to filter them locally, which is what I do.
‘X-Trid’ — an internal ID from our mail filtering system that we use for debugging purposes, if we need to trace back why a specific message was filtered the way it was.
‘X-Mfdata’ — more internal details from our mail filtering system
‘X-Uidl’ — the unique ID that POP3 uses to identify messages.
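The timezone shift described under ‘Date’ is easy to reproduce. As a minimal sketch, the snippet below takes the timestamp the server wrote on the test message (in EST) and expresses the same instant in PST, which is what the mail client displayed. The fixed UTC-8 zone and the function name are assumptions made for the illustration; a real client would use the reader's configured time zone.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

# Timestamp written by the server on the test message.
SERVER_STAMP = "Tue, 10 Jan 2006 11:00:46 -0500 (EST)"

# Fixed-offset zone standing in for the reader's clock (PST is UTC-8).
PST = timezone(timedelta(hours=-8), "PST")


def to_reader_time(stamp, reader_zone=PST):
    """Parse a mail-style date and express the same instant in another zone."""
    sent = parsedate_to_datetime(stamp)  # timezone-aware, offset -05:00
    return sent.astimezone(reader_zone)


if __name__ == "__main__":
    local = to_reader_time(SERVER_STAMP)
    print(local.strftime("%B %d, %Y %I:%M:%S %p %Z"))
```

The instant is the same in both forms; only the presentation changes, which is why the numeric offset matters more than the zone name.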
From And To Vs From and To
As you’ve seen above, there are two sets of addresses provided in the email. The first set, commonly referred to as ‘Envelope-From’ and ‘Envelope-To’, are the addresses provided in the MAIL FROM and RCPT TO commands of the protocol and are the actual addresses used for routing mail. The second, Header-From and Header-To, are provided in the headers sent with the message and are cosmetic rather than functional. On your average message from one individual to another, the From’s and To’s will match. But there are many legitimate cases where they don’t:
o Mail to multiple people may have To: and CC: in the header, but there’s no ‘CC’ concept in SMTP — every recipient ends up as a RCPT TO: line in the envelope.
o As a corollary of this, there’s no such thing as a BCC header — recipients that are ‘blind’ copied are simply not mentioned at all in the headers of the message.
o Mailing lists and other large distributions don’t list all the recipients in the headers, for privacy, security, and scalability reasons.
o Mailing lists often resend mail using a unique from address so that bounces can be tracked; this address will show up in the Return-Path, but the header ‘From’ will still show the user who wrote the message.
o Email addresses that forward on to other addresses will rewrite the envelope (so that the next mail server will know where to deliver the message) but NOT the headers, so that you’ll see the original recipient in the email.
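To make the To/CC/BCC points concrete, here is a rough sketch of what a sending program does before it opens the SMTP connection: gather every address into the envelope recipient list, then drop the Bcc header from the message itself. The function name and the example.com addresses are made up for the illustration. Python's own smtplib send_message does much the same thing when no explicit recipient list is given.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def split_envelope(msg):
    """Return (envelope recipients, message with any Bcc header removed).

    Note: this modifies the message that is passed in.
    """
    fields = []
    for name in ("To", "Cc", "Bcc"):
        fields.extend(str(value) for value in msg.get_all(name, []))
    # getaddresses() yields (display name, address) pairs.
    recipients = [addr for _, addr in getaddresses(fields) if addr]
    # Blind-copied recipients must never appear in the headers.
    del msg["Bcc"]
    return recipients, msg


if __name__ == "__main__":
    msg = EmailMessage()
    msg["From"] = "testme@fluffysheep.com"
    msg["To"] = "technotes@concentric.com"
    msg["Cc"] = "alice@example.com"   # hypothetical address
    msg["Bcc"] = "bob@example.com"    # hypothetical address
    msg.set_content("Blah blah blah blah blah.")

    recipients, cleaned = split_envelope(msg)
    print("RCPT TO:", recipients)
    print(cleaned.as_string())
```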
The Received Line
The Received header serves as a sort of postmark on the email. Every mail server it passes through will add a Received: header to trace its path; this means that most emails will have multiple Received: lines, and this header is the only common header that will occur multiple times in most emails. Each server will add its Received: header above the others in the message, and then pass all of them on to the next hop of the message, so reading the Received: lines is like digging into an archaeological site — each line down is an earlier event in the history of the email.
The format of the Received: header is defined by the SMTP standard, although it is frequently implemented inaccurately. Still, a proper Received line should include:
o Where the message came from, in up to three parts: the address or name the sender SAID she was sending from, the name that the receiving server determined the sender was coming from, and the IP address the connection came from.
o The name of the server that received it.
o The date (in local server time) the server received the message. This date should be accurate, as network mail servers should all be synced to an official clock, but that is not always the case. Just as important, while the time stamp is in the server’s local time zone, it must include the offset from Universal Time (formerly referred to as Greenwich Mean Time, or GMT); in our example above, that is ‘-0500 (EST)’. The name of the time zone is there for human benefit; the real magic is the number, which gives the offset in hours and minutes from UT. In this case, EST is 5 hours behind UT.
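The parts listed above can be pulled out of a well-formed Received line with a few regular expressions. This is a minimal sketch that assumes the layout used by the test message earlier in the article; as noted, real-world Received lines vary a great deal, so a field that does not match simply comes back as None, and a line without a parseable date will raise an error. The function and key names are illustrative.

```python
import re
from email.utils import parsedate_to_datetime

# The Received header from the test message earlier in the article.
RECEIVED = (
    "from nemesis.concentric.com (nemesis.concentric.com [207.155.252.176]) "
    "by zealous.concentric.com (ConcentricHost(2.54) MX) with ESMTP "
    "id 7D4DFCFC16 for <technotes@concentric.com>; "
    "Tue, 10 Jan 2006 11:00:46 -0500 (EST)"
)


def _find(pattern, text):
    """Return the first capture group of pattern in text, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def parse_received(value):
    """Split a Received value into its main parts."""
    # Everything after the last semicolon is the timestamp.
    trace, _, stamp = value.rpartition(";")
    return {
        "claimed_name": _find(r"^\s*from\s+(\S+)", trace),            # what the sender said
        "resolved_name": _find(r"\((\S+)\s+\[[0-9.]+\]\)", trace),    # what the server looked up
        "ip": _find(r"\[([0-9.]+)\]", trace),
        "by": _find(r"\bby\s+(\S+)", trace),
        "for": _find(r"\bfor\s+<?([^>\s]+)>?", trace),
        "when": parsedate_to_datetime(stamp.strip()),
    }


if __name__ == "__main__":
    for key, value in parse_received(RECEIVED).items():
        print(f"{key}: {value}")
```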
An Example
With that information behind us, let’s look at a more complex real-world example of an email message and where it went. I’ve modified only personally-identifiable info in this header:
Subject: The times they are a changing
Date: January 10, 2006 11:00:57 AM PST
From: originaladdress@aol.com
These headers were provided by the original sender.
To: banjo-l@list.mail.virginia.edu
So was this one, but we know that the Envelope-To was different, because it was delivered to my mailbox.
Reply-To: banjo-l@list.mail.virginia.edu
The mailing list added this to tell clients to reply to the whole list and not just to the original sender. Reply-To is a header that mail clients respect when a user replies, making this rather than the From address the default recipient of the reply.
Return-Path: <banjo-l-bounces@list.mail.virginia.edu>
This was the original Envelope-From when the message was delivered to our server — and where errors would go. This is common and correct behavior for large public mailing lists — any list with thousands of users is likely to have at least a dozen that bounce at any given time as people’s addresses change, etc, and you wouldn’t want those bounces to all go to the original sender.
Received: from list.mail.Virginia.EDU (list.mail.Virginia.EDU [128.143.2.235]) by brilliant.cnchost.com (ConcentricHost(2.54) MX) with ESMTP id 0494D13A947 for <banjo@fluffysheep.com>; Tue, 10 Jan 2006 14:01:19 -0500 (EST)
The trace of where the message has been starts here. Remember that we read these going backwards in time. This line shows us that the message came into our MX (receiving) mail server at 2:01:19 PM ET. It was received from the mailing list server at virginia.edu, and the name that server gave for itself is in fact the right name. We also see the mailbox it was delivered to, which shows up in the ‘for’ section of the Received line. I use ‘deliver all unknown mail to master’ on this domain, with a unique email address for each mailing list and for each time I give out an address to a third party, so I can filter mailing lists easily on my client and tell when my email address is sold or leaked to spammers.
Received: from list.mail.Virginia.EDU (localhost [127.0.0.1]) by list.mail.Virginia.EDU (Postfix) with ESMTP id 5333A2D3714; Tue, 10 Jan 2006 14:01:13 -0500 (EST)
This is initially confusing: it looks like list.mail.virginia.edu sent the message to itself. In fact, that’s more or less what happened. The message came into the list server and was sent through the mailing list software, and this hop reflects that. ‘127.0.0.1’ is a special IP address (the loopback address) that always refers to the machine itself, and exists on every machine on the net. Note that this was 6 seconds before the message was received on our servers; the mailing list software has to send out the message to hundreds of people, so this delay is not a surprise. Many mailing lists take a lot longer.
Received: from imo-d20.mx.aol.com (imo-d20.mx.aol.com [205.188.139.136]) by list.mail.Virginia.EDU (Postfix) with ESMTP id 850532D3495 for <banjo-l@list.mail.virginia.edu>; Tue, 10 Jan 2006 14:01:02 -0500 (EST)
Here the message came into the list server from one of aol’s servers. This was 11 seconds before it was processed through the mailing list server.
Received: from originaladdress@aol.com by imo-d20.mx.aol.com (mail_out_v38_r6.3.) id d.2d1.16dfb9a (16112) for <banjo-l@list.mail.virginia.edu>; Tue, 10 Jan 2006 14:00:57 -0500 (EST)
This is an internal hop within AOL. It’s not an SMTP hop but something internal to their systems. How do we know? Because the from here is an email address, rather than a server. We can assume that this reflects the passing of the message from AOL’s web mail application into their SMTP fabric.
Received: from FWM-M04 (fwm-m04.webmail.aol.com [64.12.168.68]) by air-id12.mx.aol.com (v108_r1_b1.2) with ESMTP id MAILINID124-3ef043c4046930d; Tue, 10 Jan 2006 14:00:57 -0500
Another hop inside AOL, but this one appears to be using SMTP. Note that the internal name ‘FWM-M04’ is not fully qualified; the servers inside AOL aren’t bothering to put aol.com on them when they identify themselves.
Received: from 1.2.3.4 by FWM-M04.sysops.aol.com (64.12.168.68) with HTTP (WebMailUI); Tue, 10 Jan 2006 14:00:57 -0500
Here’s the first hop, which also isn’t an SMTP hop but the posting of the message through AOL’s webmail interface. The IP there we can assume is the IP of the original writer’s computer. I’ve changed it for privacy reasons. Note that this has a third name for the same internal AOL server, this time the name the mail server thinks it is.
X-Original-To: banjo-l@list.mail.virginia.edu
Delivered-To: banjo-l@list.mail.virginia.edu
Some additional headers added by the mailing list software.
Message-Id: <8C7E42F52CC60B6-EDC-1155@FWM-M04.sysops.aol.com>
The unique ID of the message, added by AOL.
References: <mailman.337.1136912630.25372.banjo-l@list.mail.virginia.edu>
A header added by the original poster’s client when an earlier message was replied to — this is used by some clients that trace message threads to record which messages go together.
X-Mb-Message-Source: WebUI
X-Mb-Message-Type: User
In-Reply-To: <mailman.337.1136912630.25372.banjo-l@list.mail.virginia.edu>
Another header added by the poster’s client, much like References.
X-Mailer: AOL WebMail 15106
Mime-Version: 1.0
Two headers provided by the client, which in this case is the web server providing the AOL webmail function. X-Mailer is a commonly used header to indicate the software used to send the message — with this you can usually guess what software your colleagues are using and laugh at them if you disapprove. Mime-Version describes the mechanism used to encode the message, if relevant, and is used by some clients to determine how to display it.
X-Aol-Ip: 64.12.168.68
A custom header added by AOL which represents the IP of their web server that handled the mail. It duplicates what’s in the earliest Received: line.
X-Spam-Flag: NO
X-Content-Filtered-By: Mailman/MimeDel 2.1.6
X-Beenthere: banjo-l@list.mail.virginia.edu
X-Mailman-Version: 2.1.6
These are all added by mailman, the open-source mailing list software this message went through. They have no special meaning outside the list — we can guess that ‘X-Beenthere’ is designed to prevent infinite loops on mailing lists. The ‘X-Spam-Flag’ is probably the list’s filtering, although as we’ll discuss below, this is largely meaningless to recipients.
Precedence: list
The mailing list added this, as all mailing lists should, to indicate that the message is low priority. Among other things, this prevents well-behaved autoresponders from sending ‘I’m on vacation for 2 weeks’ messages to the original sender. As you might have seen if you participate on mailing lists, many autoresponders are not well-behaved.
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
More headers describing the format of the message. If there were attachments, or the message were in an alternate character set such as Chinese, it would be reflected here and the recipient’s mail client would display it as appropriate.
Sender: banjo-l-bounces@list.mail.virginia.edu
Errors-To: banjo-l-bounces@list.mail.virginia.edu
Two more headers added by the mailing list, to further redirect problems back to the list rather than the original sender. There seem to be redundant headers to do this; since SMTP has existed for so long, and client software dating back decades is still in use, it’s necessary to use multiple techniques to achieve the desired goal. In this case, different clients and servers check different headers to bounce mail, and so the mailing list tries them all.
X-Junkmail: NotJunk
X-Mfdata: [51.801600 v2.3:3 n279 s4384 g87935 b93649 p0.031468 sN3 t11,761774]
X-Uidl: 107002
The filtering added by our server, as described above.
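The delays noted in the walkthrough (11 seconds inside the list server, 6 seconds to reach our server) can be recomputed from the Received timestamps. The sketch below copies the receiving server and timestamp from each of the six Received lines in this example, newest first as they appear in the message, and reports how long each hop took. The function name is illustrative.

```python
from email.utils import parsedate_to_datetime

# (receiving server, timestamp) from each Received line, newest first.
HOPS = [
    ("brilliant.cnchost.com", "Tue, 10 Jan 2006 14:01:19 -0500 (EST)"),
    ("list.mail.Virginia.EDU", "Tue, 10 Jan 2006 14:01:13 -0500 (EST)"),
    ("list.mail.Virginia.EDU", "Tue, 10 Jan 2006 14:01:02 -0500 (EST)"),
    ("imo-d20.mx.aol.com", "Tue, 10 Jan 2006 14:00:57 -0500 (EST)"),
    ("air-id12.mx.aol.com", "Tue, 10 Jan 2006 14:00:57 -0500"),
    ("FWM-M04.sysops.aol.com", "Tue, 10 Jan 2006 14:00:57 -0500"),
]


def hop_delays(hops):
    """Return (server, seconds since the previous hop), oldest hop first."""
    if not hops:
        return []
    # Reverse so we walk forward in time, the way the message travelled.
    ordered = [(name, parsedate_to_datetime(stamp)) for name, stamp in reversed(hops)]
    delays = []
    previous = ordered[0][1]
    for name, when in ordered:
        delays.append((name, int((when - previous).total_seconds())))
        previous = when
    return delays


if __name__ == "__main__":
    for name, seconds in hop_delays(HOPS):
        print(f"{seconds:3d}s  {name}")
```

Because the subtraction works on timezone-aware values, the same approach holds when hops report their times in different zones, as long as each server's clock is accurate.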
Spoofing
The examples above have all assumed that all the data is trustworthy and everything in the headers is, in fact, true. In the real world, trust doesn’t come so easily. Note that I’ve been making clear distinctions on which server in the series of email transactions adds each header. This is critical because you have to trace backwards in time to determine what you can trust. If you pull your mail off your ISP’s server, as in the case above, that last hop is truly all you can start out by trusting. Everything before that might be spoofed, and spammers, phishers, and the like routinely fake headers to try to make the mail look legitimate, to make it look like it came from you, or just to confuse someone who is trying to do forensic analysis on the email.
This is why headers like ‘X-Spam-Flag’, unless added by your local mail server’s filtering product, are largely useless. It’s very common for spammers to put fake spam filtering headers in their messages, and most viruses now seem to include a header promising that the message has been virus scanned and is safe. If they’re not your headers, don’t trust them unless you routinely rely on the fox’s status report on henhouse security.
Likewise, Received headers showing the trace of the message are only valid as far back as you can prove a trust relationship. Spam will often have junk Received lines at the bottom to try to shift the blame elsewhere, which is why it’s critical to read them from top to bottom, newest to oldest. Let’s now look at an example of spam headers. Again, this is a real-world example out of my spam folder.
Received: from 3EB1EE28 (201008037147.user.veloxzone.com.br [201.8.37.147])
by brilliant.cnchost.com (ConcentricHost(2.54) MX) with SMTP id D17BC2E6901
for <junk@fluffysheep.com>; Mon, 9 Jan 2006 11:05:52 -0500 (EST)
Received: from 10.0.5.13 (fuse2.mailanyone.net) by fuse0.mailanyone.net with esmtp
(MailAnyone incSMTP) id 2EudEX-0000Rd-Pl for junk@fluffysheep.com; Mon,
09 Jan 2006 08:05:51 -0800
Received: from mailanyone.net by fuse0.mailanyone.net with esmtp (MailAnyone extSMTP)
id 1EudEX-0007RO-1j for junk@fluffysheep.com; Mon, 09 Jan 2006 08:05:51
-0800
Received: from root by fuse4.fusemail.net with local (Exim 4.20) id 1EudEe-0008X8-MH
for junk@fluffysheep.com; Mon, 09 Jan 2006 08:05:51 -0800
Superficially this appears to be a four-hop message originating from fusemail.net, and that would be the place you’d go to complain. But look more closely, and you’ll see that there’s a disconnect between the first two Received headers. The first line, the one we can trust the most because our own server wrote it, shows that the message came into our server from a machine in Brazil. But the next header shows it arriving at a server at mailanyone.net, and the headers beyond it show mail moving within mailanyone and fusemail. These headers, if they were standing alone, might pass muster, but it’s hard to explain that hop from mailanyone to veloxzone.com.br. We can do some checking (for example, is mailanyone.net a Brazilian company?), but this is 99% certain to have forged headers. The fact that the actual IP it came from appears, from its name, to be an end-user dial-up or DSL address suggests further that an end user’s PC was used to inject the spam. The last three Received lines were supplied by the sender as if they showed the previous trace. Remember, trust goes back step by step: our server has to accept the headers it is handed as if they were legitimate, so it falls to us to inspect them by hand.
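The manual check described above, where each hop is compared with the one before it, can be roughed out in code. The sketch below takes the sending host (the name the receiving server looked up, where one is given) and the receiving host from each Received line, newest first, and reports the first line whose receiver does not match the domain of the next sender. Comparing only the last two labels of each name is a deliberately naive assumption: it treats every .com.br name as the same domain, for instance, and it knows nothing about companies that legitimately use several domains. Treat a result as a prompt to look more closely, not as proof of forgery.

```python
def registered_domain(host):
    """Naive guess at a host's domain: its last two labels, lower-cased."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def first_untrusted_hop(hops):
    """hops is a list of (from_host, by_host) pairs, newest first.

    Return the index of the first Received line that cannot be linked to
    the hop above it, or None if the whole chain is consistent.
    """
    for index in range(len(hops) - 1):
        newer_from = hops[index][0]
        older_by = hops[index + 1][1]
        if registered_domain(newer_from) != registered_domain(older_by):
            return index + 1
    return None


# (from, by) pairs taken from the four Received lines of the spam example.
SPAM_HOPS = [
    ("201008037147.user.veloxzone.com.br", "brilliant.cnchost.com"),
    ("fuse2.mailanyone.net", "fuse0.mailanyone.net"),
    ("mailanyone.net", "fuse0.mailanyone.net"),
    ("root", "fuse4.fusemail.net"),
]

if __name__ == "__main__":
    index = first_untrusted_hop(SPAM_HOPS)
    if index is None:
        print("Chain looks consistent")
    else:
        print(f"Trust ends above Received line {index + 1}")
```

Only the first break matters: everything below it was supplied by whoever connected to the last trustworthy server, so any further mismatches lower down tell you nothing reliable.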
Conclusion
There’s a wealth of information in mail headers which you’d never know was there unless you find the (usually fairly-well-hidden) option in your mail client. With header knowledge, you can trace the source of messages, see where emails are delayed, determine what software your friends use, and trace back to the actual source of spam. When you understand how SMTP works you’ll also realize the fundamental weaknesses in the core email protocols: trust is assumed from the systems that connect to your mail server, and no consistent system exists to evaluate the trustworthiness of data provided. This is why spam filtering is so important, and why new trust systems, such as DomainKeys and SPF, are starting to be built. Because of the vast legacy email infrastructure in place, though, it will be some time before the true trust relationships in email change, and in the meantime it is important for power users to understand how the systems work.
Have a technology question for a future newsletter? Send it to technotes@concentric.com.
–David Schairer
VP/Engineering & Operations