Actual Contacts for Outlook: Email verification technologies
The process of message delivery by a mail server consists of two phases. First, the server determines the address of the mail server that receives messages for the recipient (the receiving server, RS). Then it connects to that server using the SMTP protocol and transmits the message to it.
The name of a mail domain (mail.com for the address alex@mail.com, where "alex" is a mailbox in the mail.com domain) is normally different from the name of the mail server that receives messages for that address. At the time of writing (2005), messages for alex@mail.com are received by the servers mail-com.mr.outblaze.com and mail-com-bk.mr.outblaze.com, while the computers with the addresses mail.com and www.mail.com receive no messages for any address. So a mail domain cannot be equated with the mail server address: quite often, messages are received by a computer with a completely different name.
To find out the RS address, a query is sent to the DNS service, which stores (besides other things) information about mail servers that receive messages for each domain.
DNS is a distributed database. For example, DNS server ns1.outblaze.com stores
all the information about mail.com, but it knows nothing about other
domains, e.g. about hotmail.com. The server ns1.hotmail.com stores
information about domain hotmail.com, but it knows nothing about other
domains. There is also a server responsible for all .com domains; it keeps
information about the servers that store domain information in the .com zone.
Assume your ISP's DNS server has no information about mail.com or hotmail.com.
When it receives a query about the name mail.com, it asks the server
responsible for the .com zone for the address of the server that holds the
domain information for mail.com (it is ns1.outblaze.com), then queries that
server and returns the answer to you. This way of executing a query is
referred to as recursive.
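The delegation chain just described can be sketched as a toy model. This is only an illustration with no real network traffic: the server names come from the text, while the dictionary layout, the function name and the placeholder answer for hotmail.com are assumptions.

```python
# Toy model of the delegation chain; nothing here talks to a real DNS server.

# The server for the .com zone only knows which name server holds each domain.
COM_ZONE_SERVER = {
    "mail.com": "ns1.outblaze.com",
    "hotmail.com": "ns1.hotmail.com",
}

# Each of those name servers knows only its own domain.
DOMAIN_SERVERS = {
    "ns1.outblaze.com": {"mail.com": "mail-com.mr.outblaze.com"},
    "ns1.hotmail.com": {"hotmail.com": "(placeholder answer for hotmail.com)"},
}


def resolve_recursively(domain, trace):
    """Answer a query the way the ISP's DNS server does in the text."""
    trace.append("ask the .com zone server about " + domain)
    name_server = COM_ZONE_SERVER.get(domain)
    if name_server is None:
        return None  # the domain is unknown in the .com zone
    trace.append("ask " + name_server + " about " + domain)
    return DOMAIN_SERVERS[name_server].get(domain)


steps = []
answer = resolve_recursively("mail.com", steps)
print(answer)
for step in steps:
    print("  " + step)
```

The caller sees only the final answer; the two intermediate queries recorded in `steps` are made on its behalf, which is what makes the query recursive from the caller's point of view.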
We are not going to elaborate on DNS technology here (it is well described in numerous public sources). The important fact for us to note is that a query to the DNS service might pass through several DNS servers scattered all over the globe before you get your answer. And, in the end, it is the domain owner who is responsible for storing the information about the domain.
Caching DNS query results is common practice. Normally, a DNS server remembers the results of recursive queries for a couple of days to reduce its load and speed up query execution (each answer states the maximum time for which it may be cached). This means that when a DNS record suddenly changes, it might take several days before the caches of other DNS servers on the Internet are updated and their users get the latest information.
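The caching behaviour can be illustrated with a small sketch. It is not how any particular DNS server is implemented; the class name, the method names and the two-day lifetime are assumptions chosen to match the description above.

```python
import time


class DnsCache:
    """Remember answers until the lifetime supplied with each answer expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock    # injectable so the example can fake the time
        self._entries = {}     # name -> (answer, expiry time)

    def put(self, name, answer, lifetime_seconds):
        self._entries[name] = (answer, self._clock() + lifetime_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        answer, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale: the next query must go upstream
            return None
        return answer


DAY = 24 * 3600
now = [0.0]                                  # fake clock, in seconds
cache = DnsCache(clock=lambda: now[0])
cache.put("mail.com", "mail-com.mr.outblaze.com", lifetime_seconds=2 * DAY)

now[0] = 1 * DAY
after_one_day = cache.get("mail.com")        # still served from the cache
now[0] = 3 * DAY
after_three_days = cache.get("mail.com")     # expired, must be queried again
print(after_one_day, after_three_days)
```

Until the entry expires, the cache keeps returning the old answer even if the record has changed at its source, which is the delay the paragraph above warns about.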
To check whether an e-mail address exists or not, it is necessary to perform the same two phases as a mail server does to deliver a message to a recipient. First, we need to find out the address of the server that receives messages for the recipient. Then, we have to connect to the mail server and ask it if it can receive a message for the user with that particular address.
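The two phases can be sketched as a small skeleton. The two callables passed in are hypothetical stand-ins for the DNS lookup and the SMTP dialogue; they are not part of any real library, and the stub values at the bottom exist only to make the example runnable.

```python
def check_address(address, find_mail_server, ask_mail_server):
    """Run the two phases in order.

    find_mail_server(domain) -> server name or None    (phase 1, DNS)
    ask_mail_server(server, address) -> True or False  (phase 2, SMTP)
    Both callables are assumptions supplied by the caller.
    """
    mailbox, separator, domain = address.rpartition("@")
    if not separator or not mailbox or not domain:
        return "invalid: malformed address"
    server = find_mail_server(domain)
    if server is None:
        return "invalid: no mail server for domain"
    if not ask_mail_server(server, address):
        return "invalid: rejected by mail server"
    return "accepted by mail server"


# Stubs standing in for the real network operations.
known_servers = {"mail.com": "mail-com.mr.outblaze.com"}
accept_everything = lambda server, address: True
reject_everything = lambda server, address: False

print(check_address("alex@mail.com", known_servers.get, accept_everything))
print(check_address("alex@mail.com", known_servers.get, reject_everything))
print(check_address("alex@no-such-domain.example", known_servers.get, accept_everything))
print(check_address("not-an-address", known_servers.get, accept_everything))
```

Keeping the network operations behind two parameters means the cheap first phase always runs before the expensive second one, and the control flow can be tried without touching a real server.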
Unfortunately, this method detects no more than about two thirds of invalid addresses. The problem is that some mail servers accept all messages for their mail domains and, if a mailbox doesn't exist, only afterwards notify the sender by e-mail that the message is undeliverable.
Current statistics show that about 30% of active and 66.7% of dead addresses can be detected in the first phase, and 70% can be detected in the second phase. On average, the second phase takes 10 times longer and involves 5 times as much network traffic as the first. In fact, the two-phase check requires as much time and traffic as sending a small message to each address being checked.
Let us consider both phases in more detail. In the first phase, the checking software analyses the e-mail address syntax, identifies the mail domain and asks the DNS server for the mail server address for that domain. The UDP protocol is used to talk to the DNS server; it is faster than TCP because it does not establish a connection between the servers. Normally, a DNS query takes no more than 1-2 seconds. During that time, one packet with the query is sent (about 60 bytes including the packet header) and one packet with the answer is received (its size doesn't exceed 512 bytes; normally it's no more than 200-300 bytes). Obviously, all addresses with the wrong syntax and all addresses with non-existent domains are screened out in this phase.
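To make the size of the query concrete, here is a minimal sketch that builds the payload of a DNS query for a domain's mail servers by hand. The function name and the query ID are arbitrary choices for illustration; the sketch does not validate label lengths and does not send anything over the network.

```python
import struct

MX_RECORD_TYPE = 15   # DNS record type for mail exchangers
INTERNET_CLASS = 1


def build_mx_query(domain, query_id=0x1234):
    """Return the payload of a DNS query asking for the MX records of domain."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending with a zero byte.
    labels = domain.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    question = qname + b"\x00" + struct.pack(">HH", MX_RECORD_TYPE, INTERNET_CLASS)
    return header + question


payload = build_mx_query("mail.com")
UDP_AND_IPV4_HEADERS = 8 + 20   # bytes added in front of the DNS payload
print(len(payload), len(payload) + UDP_AND_IPV4_HEADERS)   # 26 54
```

For mail.com the payload is 26 bytes, or 54 bytes with the UDP and IPv4 headers, which is in line with the figure of about 60 bytes quoted above. To actually send it, the payload would be passed to `sendto` on a UDP socket addressed to port 53 of a DNS server.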
In the second phase, a connection is established with the mail server using the SMTP protocol (based on TCP). TCP is connection-oriented, so the servers involved first exchange service packets to establish a connection. Once the connection is established, the servers exchange greetings (see the first three messages in the log below); then the sender's address is submitted, and the receiving server confirms its readiness to receive a message from that address; after that, the recipient's address is submitted:
< 220-ns.watson.ibm.com ESMTP Sendmail AIX4.3/8.9.3/8.9.0
< 220 Thu, 22 Aug 2002 20:44:07 +0500
> HELO cisco.my.net
< 250-ns.watson.ibm.com Hello cisco.my.net [12.44.72.94],
< 250 pleased to meet you
> MAIL FROM:
< 250 ... Sender is valid.
> RCPT TO:
< 550 ... User unknown
> RSET
< 250 Resetting the state.
> QUIT
In this instance, the receiving server answered that the user with the submitted address was unknown to it and refused to receive the message. After that, the servers exchanged commands to terminate the connection.
While checking the address, the servers sent each other 10 messages with a total size of about 500 bytes; but to deliver those messages they had to exchange over 20 packets, so the total traffic was about 2 KB. Note that most of the time was spent waiting for replies from the other server.
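The dialogue from the log can be replayed with Python's standard `smtplib` module. The following is a sketch rather than the product's actual implementation: `probe_mailbox`, `describe_reply` and `ReplayedSession` are hypothetical names, the addresses are placeholders, and the stand-in session simply returns the reply codes shown in the log so that the example runs without a network connection.

```python
import smtplib


def probe_mailbox(smtp, helo_name, sender, recipient):
    """Send HELO, MAIL FROM, RCPT TO, RSET and QUIT on a connected session.

    smtp must behave like smtplib.SMTP. Returns the reply code to RCPT TO,
    or the code of the earlier step that was refused.
    """
    try:
        code, _ = smtp.helo(helo_name)
        if code == 250:
            code, _ = smtp.mail(sender)
        if code == 250:
            code, _ = smtp.rcpt(recipient)
            smtp.rset()   # no message follows; cancel the transaction
        return code
    finally:
        try:
            smtp.quit()
        except smtplib.SMTPException:
            pass          # the server may already have dropped the connection


def describe_reply(code):
    """Interpret an SMTP reply code by its class."""
    if code in (250, 251):
        return "accepted"
    if 400 <= code <= 499:
        return "temporary failure"
    if 500 <= code <= 599:
        return "rejected"
    return "unexpected reply"


class ReplayedSession:
    """Stand-in that answers with the reply codes seen in the log."""

    def helo(self, name):
        return 250, b"pleased to meet you"

    def mail(self, sender):
        return 250, b"Sender is valid."

    def rcpt(self, recipient):
        return 550, b"User unknown"

    def rset(self):
        return 250, b"Resetting the state."

    def quit(self):
        return 221, b"closing"   # the log does not show this reply; 221 is usual


# Placeholder addresses; the log does not show the ones actually used.
code = probe_mailbox(ReplayedSession(), "cisco.my.net",
                     "sender@example.org", "nobody@example.com")
print(code, describe_reply(code))   # 550 rejected
```

Against a real server, the session would be created with `smtplib.SMTP(host, 25, timeout=...)`, which opens the TCP connection and reads the 220 greeting before the function is called. As noted earlier, a server that accepts mail for every mailbox answers 250 here even when the mailbox does not exist.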