SMTP session phases

In an SMTP transaction, the sending host opens a TCP connection on port 25 and sends, in essence, the following commands:

HELO myname.helodomain.com
MAIL FROM: <user@senddomain.com>
RCPT TO: <ouruser@ourdomain.com>
DATA
<Headers>

<Body>

(with EHLO possibly in place of HELO).

Three different contexts can therefore be identified within each transaction:

Envelope data: all the actionable information that can be collected even before the message starts being transmitted, like
- Sender
- Source IP
- Recipient
- HELO string of the sending system
Header data: all the information included in the header and as such already formatted -to a certain degree- to be properly parsable. This includes (but is not limited to)
- From header (or: the “sender” the recipient will actually see)
- Subject of the message
- Received headers, enumerating each SMTP server the message passed through
- Rcpt-To header, if present
Body of the message: the actual content of the message, usually composed of a selection of plain text, HTML, attachments represented as encoded binaries, etc.

Different contexts allow for different filtering strategies, and in each of them Spamhaus data can be queried to assess the nature of the message.

Envelope Checks

Before the DATA stage even starts -and therefore before the message is actually transmitted- the SMTP protocol gives the following four parameters that can be used to check the sender’s reputation:

connecting IP address (IP)
reverse DNS (PTR) of the connecting IP address, if present (Domain)
domain used in HELO/EHLO like helodomain.com in the example above (Domain)
domain used in MAIL FROM (envelope from) like senddomain.com in the example above (Domain)
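As a minimal sketch of how these four parameters could be turned into DNS queries, the snippet below builds query names following the usual DNSBL convention (reversed IP labels, or the domain itself, prepended to the zone). The function names are hypothetical, and the zone names `zen.spamhaus.org` and `dbl.spamhaus.org` are used only as illustrative arguments; the zone to query depends on how the data is accessed.

```python
import ipaddress


def ip_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name for an IP address (hypothetical helper)."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        # IPv4: reverse the four octets.
        labels = reversed(str(addr).split("."))
    else:
        # IPv6: reverse the 32 hex nibbles of the fully expanded address.
        labels = reversed(addr.exploded.replace(":", ""))
    return ".".join(labels) + "." + zone


def domain_query_name(domain: str, zone: str) -> str:
    """Build the query name for a domain (PTR, HELO or MAIL FROM domain)."""
    return domain.rstrip(".").lower() + "." + zone


# Illustrative zone names passed as plain arguments (assumption).
print(ip_query_name("192.0.2.1", "zen.spamhaus.org"))
print(domain_query_name("senddomain.com", "dbl.spamhaus.org"))
```

Keeping the zone as a parameter lets the same helpers serve every IP-based or domain-based list mentioned in this section.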

Spamhaus recommends the following actions based on these parameters before getting to the DATA stage:

reject the transaction if the connecting IP is listed by the SBL zone components, by XBL or by PBL (in other words, any hit against the zen zone, as long as the return code is contained in 127.0.0.0/8)
reject the transaction if the reverse DNS of the connecting IP (when defined) is listed by DBL with a return code lower than 127.0.1.100
reject the transaction if the domain used in HELO/EHLO is listed by DBL with a return code lower than 127.0.1.100
reject the transaction if the domain used in MAIL FROM is listed by DBL with a return code lower than 127.0.1.100
reject the transaction if the reverse DNS of the connecting IP (when defined) is listed by ZRD with a return code lower than or equal to 127.0.2.24
reject the transaction if the domain used in HELO/EHLO is listed by ZRD with a return code lower than or equal to 127.0.2.24
trigger greylisting if the domain used in MAIL FROM is listed by ZRD with a return code lower than or equal to 127.0.2.24
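The rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: the return codes are assumed to have been obtained already (as dotted strings, or `None` when there is no listing), all function and parameter names are hypothetical, and the restriction of DBL codes to 127.0.1.0/24 and ZRD codes to 127.0.2.0/24 is an assumption of this sketch, inferred from the thresholds quoted in the rules.

```python
import ipaddress

LOOPBACK = ipaddress.ip_network("127.0.0.0/8")
DBL_NET = ipaddress.ip_network("127.0.1.0/24")   # assumed DBL code range
ZRD_NET = ipaddress.ip_network("127.0.2.0/24")   # assumed ZRD code range


def zen_hit(code):
    """Any return code inside 127.0.0.0/8 counts as a hit."""
    return code is not None and ipaddress.ip_address(code) in LOOPBACK


def dbl_hit(code):
    """DBL hit with a return code lower than 127.0.1.100."""
    if code is None:
        return False
    addr = ipaddress.ip_address(code)
    return addr in DBL_NET and addr < ipaddress.ip_address("127.0.1.100")


def zrd_hit(code, max_hours=24):
    """ZRD hit with a return code lower than or equal to 127.0.2.<max_hours>."""
    if code is None:
        return False
    addr = ipaddress.ip_address(code)
    return addr in ZRD_NET and addr <= ipaddress.ip_address(f"127.0.2.{max_hours}")


def envelope_action(zen_ip=None, dbl_ptr=None, dbl_helo=None, dbl_from=None,
                    zrd_ptr=None, zrd_helo=None, zrd_from=None, max_hours=24):
    """Return 'reject', 'greylist' or 'accept' for the pre-DATA stage."""
    if zen_hit(zen_ip):
        return "reject"
    if any(dbl_hit(c) for c in (dbl_ptr, dbl_helo, dbl_from)):
        return "reject"
    if zrd_hit(zrd_ptr, max_hours) or zrd_hit(zrd_helo, max_hours):
        return "reject"
    if zrd_hit(zrd_from, max_hours):
        return "greylist"
    return "accept"
```

Lowering `max_hours` narrows the ZRD window, so fewer recently observed domains trigger a rejection or greylisting.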

Numeric HELO

Despite being an error code, 127.0.1.255 can still be useful when the resource checked against the DBL is a HELO string, as no valid HELO string can be in the form of an IP address.
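A numeric HELO can also be recognised locally, before any query is sent. The following is a minimal sketch using Python's standard `ipaddress` module; the function name is hypothetical, and only bare IP addresses are treated as numeric here (a bracketed address literal such as `[192.0.2.1]` is deliberately left alone, which is an assumption of this sketch).

```python
import ipaddress


def helo_is_bare_ip(helo: str) -> bool:
    """Return True when the HELO/EHLO argument is a bare IP address."""
    try:
        ipaddress.ip_address(helo.strip())
    except ValueError:
        return False
    return True


print(helo_is_bare_ip("192.0.2.1"))              # True
print(helo_is_bare_ip("myname.helodomain.com"))  # False
```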

Note that DBL return codes larger than 127.0.1.100 refer to abused legitimate domains; to prevent false positives, they should be used only in the content analysis of message bodies. Also note that the “24” in the ZRD rules is the maximum number of hours elapsed since the domain was first observed, and it can be decreased for less aggressive behaviour toward new domains appearing on the Internet. SMTP transactions not rejected by the criteria above should be accepted and subjected to the content analysis described below.

Both the message headers and the message body are transmitted within the SMTP DATA command.

For platforms and traffic volumes that allow this, the content analysis should preferably be done while the original SMTP connection is still open, at the end of the DATA stage but before acknowledging the transmission to the sender.

This makes it possible to reject the message immediately, based on its content, while still talking to the sending server, rather than accepting it and bouncing it later as a non-delivery notification to the envelope sender. The envelope sender is often forged in spam, and such non-delivery notifications would turn the receiving server into a backscatter spam source.

“Spam folders” are commonly used to avoid this problem, but not notifying the sender in any way could also be a problem in case a legitimate message is flagged as spam. Immediate rejections during the SMTP DATA stage do not cause backscatter.
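To illustrate the idea of deciding at the end of the DATA stage, the sketch below maps a content score to the reply sent on the still-open connection. The function name, the score scale and the threshold value are all hypothetical; only the general use of a 2xx reply for acceptance and a 5xx reply for permanent rejection is standard SMTP behaviour.

```python
def data_stage_reply(spam_score: float, reject_threshold: float = 10.0) -> str:
    """Choose the SMTP reply sent after the end of DATA (illustrative values)."""
    if spam_score >= reject_threshold:
        # Permanent failure: the sending server is told immediately,
        # so no later bounce to a possibly forged envelope sender is needed.
        return "550 5.7.1 Message rejected due to content policy"
    return "250 2.0.0 OK"


print(data_stage_reply(12.5))
print(data_stage_reply(1.0))
```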

Headers checks

We suggest the following checks:

score the message negatively if an IP address appearing in the second Received: line, or deeper ones when present, is listed in SBL, CSS or XBL
flag the mail as spam if the domain appearing in the From: user@fromdomain.com line (if present) is listed by DBL with a return code lower than 127.0.1.100
flag the mail as spam if the domain appearing in the Reply-To: user@replytodomain.com line (if present) is listed by DBL with a return code lower than 127.0.1.100
flag the mail as spam if the domain appearing in the Message-ID: <string@msgiddomain.com> line is listed by DBL with a return code lower than 127.0.1.100
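The header fields named above can be extracted with Python's standard `email` package. This is a rough sketch, not a complete parser: the function names and the sample message are made up for illustration, only IPv4 addresses are picked out of the Received: lines, and the actual list lookups are left out.

```python
import re
from email import message_from_string
from email.utils import parseaddr

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def header_domains(raw: str) -> dict:
    """Collect the domains of From:, Reply-To: and Message-ID:, when present."""
    msg = message_from_string(raw)
    found = {}
    for name in ("From", "Reply-To"):
        value = msg.get(name)
        if value:
            address = parseaddr(value)[1]
            if "@" in address:
                found[name] = address.rsplit("@", 1)[1].lower()
    msgid = msg.get("Message-ID", "")
    if "@" in msgid:
        found["Message-ID"] = msgid.strip().strip("<>").rsplit("@", 1)[1].lower()
    return found


def deeper_received_ips(raw: str) -> list:
    """Return IPv4 addresses found in the second Received: line and deeper."""
    msg = message_from_string(raw)
    received = msg.get_all("Received") or []
    ips = []
    for line in received[1:]:  # skip the topmost (most recent) Received: line
        ips.extend(IPV4_RE.findall(line))
    return ips


SAMPLE = (
    "Received: from mx.ourdomain.com ([192.0.2.10]) by inbox.ourdomain.com\n"
    "Received: from relay.example.net ([198.51.100.7]) by mx.ourdomain.com\n"
    "From: Some User <user@fromdomain.com>\n"
    "Reply-To: user@replytodomain.com\n"
    "Message-ID: <abc123@msgiddomain.com>\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
print(header_domains(SAMPLE))
print(deeper_received_ips(SAMPLE))
```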

Again, DBL return codes larger than 127.0.1.100 refer to abused legitimate domains, and to prevent false positives we recommend using them only to score URLs in message bodies. Also note that PBL listings should never be used as a spam criterion for originating IPs appearing in Received: header lines. Legitimate messages are normally originated by IP addresses listed in PBL, and they must not be penalized in any way for this reason.

Body checks

After properly decoding the message (which may use particular character sets, be encoded in Base64, etc.), we recommend identifying all the URLs, including email addresses, that appear in the message body, and then extracting the IP addresses and domains from these URLs. The following checks can then be performed:

score the message negatively if any URL contains an IP address listed by the SBL or XBL zones components (any return code)
score the message negatively if any URL contains a domain/hostname listed by DBL (any return code, including the abused-legit ones, although different scores should be applied to the two groups) or by ZRD (any return code, or return codes limited to a maximum of N hours from the appearance of the domain)
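The extraction step that feeds these checks might look like the sketch below. It assumes the body has already been decoded to text, uses deliberately simple regular expressions that will not catch every URL form, and the function name is hypothetical.

```python
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def extract_hosts(decoded_body: str) -> set:
    """Return the hosts (domains or raw IPs) found in URLs and email addresses."""
    hosts = set()
    for url in URL_RE.findall(decoded_body):
        try:
            # Drop punctuation that commonly trails a URL in running text.
            host = urlsplit(url.rstrip(".,;:!?)")).hostname
        except ValueError:
            continue  # malformed URL, skipped in this sketch
        if host:
            hosts.add(host.rstrip("."))
    for domain in EMAIL_RE.findall(decoded_body):
        hosts.add(domain.lower())
    return hosts


body = ("Visit https://Shop.Example.com/path?x=1 or http://192.0.2.55/login now. "
        "Contact sales@Mailer.example.net.")
print(sorted(extract_hosts(body)))
```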

Optionally one can also:

score the message negatively if any URL contains a domain/hostname authoritatively served by a nameserver whose IP address is listed by SBL
score the message negatively if any URL contains a domain/hostname authoritatively served by a nameserver whose domain is listed by DBL with a return code lower than 127.0.1.100

Numeric URLs

Remember that DBL is not expected to receive any IP-based query. If the URL contained in the message body points to a raw IP address, that address should be checked against IP-based databases only.
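One way to enforce this separation is to classify each extracted host before choosing which databases to query. The sketch below is illustrative only: the function names are hypothetical, and hosts that are not valid IP addresses are simply treated as domains.

```python
import ipaddress


def lookup_kind(host: str) -> str:
    """Return 'ip' for a raw IP address, 'domain' for anything else."""
    try:
        ipaddress.ip_address(host.strip("[]"))
    except ValueError:
        return "domain"
    return "ip"


def split_hosts(hosts):
    """Split hosts into (ips, domains) so each goes to the right kind of list."""
    ips = sorted(h for h in hosts if lookup_kind(h) == "ip")
    domains = sorted(h for h in hosts if lookup_kind(h) == "domain")
    return ips, domains


print(split_hosts({"192.0.2.55", "shop.example.com"}))
```

Only the first list would then be sent to IP-based zones, and only the second to DBL or ZRD.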