Spam - solutions anyone?
By Lasa Information Systems Team
Spam (unsolicited bulk email) is now a real problem for many organisations. The knowledgebase article "dealing with spam" explains what spam is and how and why it arrives in our inboxes, and offers some suggestions for managing it. This article looks in more depth at how the various solutions on offer work and provides some guidance on issues to consider when choosing a solution for your organisation.
What's the problem?
Spam has grown to epic proportions since the first email was sent in 1971. Back then only a handful of geeks were sending email, so the concept of spam didn't exist. In 1989, when the first commercial Internet service providers emerged, the opportunity for unsolicited commercial email opened up. According to estimates from the spam-filtering company Brightmail in May 2004, as much as 64 per cent of all email is now unwanted bulk mail. This alarming figure makes it very important that organisations and individuals have a method for dealing with unwanted email: it is getting harder and harder simply to ignore it, and nobody can afford to spend the first hour of every morning sifting through their inbox. Add the number of new viruses created every day, and the average PC user now has to develop a good knowledge of IT just to surf the net or send email safely.
What solutions are out there?
The only way to ensure that your email is 100% spam-free is to check manually every email that comes into your inbox or your organisation. Unfortunately this is almost impossible, so some degree of compromise is required. Different kinds of solution suit different needs; a few of the options are described below.
Software solutions are installed either on a server or on the client machines (individual workstations). They are then managed by adding rules or by linking to Internet black lists. Examples include:
- McAfee SpamKiller
- GFI MailEssentials
- Spam Inspector
- SpamBayes (free and open source)
Hardware solutions require the installation of a box or router, and usually also provide a firewall or antivirus capabilities. Examples include:
Externally managed solutions
There are also solutions managed by an external organisation. Most work by re-routing all your email through the provider's servers, which check it for spam and viruses using various rules and software before forwarding it to you. Providers differ in how they deal with identified spam: some tag it as spam, while others hold it on their servers for you to check and delete. Examples include:
From a network manager's point of view, the ideal solution would let you delete definite spam automatically, store likely spam in a folder for checking, and let users build up their own white lists and banned lists.
How do anti-spam solutions work?
All anti-spam solutions use a mixture of rules that together give each email a rating of how likely it is to be spam. The user then sets the level at which an email is tagged or classified as spam, depending on how aggressively they want to deal with spam. Emails that reach that level can then be deleted automatically or stored in a spam folder for further checking.
White lists typically accept every email from anyone in a user's existing address book; these are treated as trusted addresses that the user has approved. Hotmail accounts use this method. It requires users to add all their contacts to their address books. All other mail goes into a junk email folder by default, and users then decide whether to keep it or not.
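To make the rating-and-threshold idea concrete, here is a minimal Python sketch. The rules, their weights, the threshold values and the folder names are all hypothetical examples chosen for illustration; real products use far more rules and their own scoring schemes.

```python
from typing import Callable

# Each hypothetical rule pairs a test on the message text with the points it adds.
Rule = tuple[Callable[[str], bool], float]

RULES: list[Rule] = [
    (lambda text: "free" in text.lower(), 2.0),
    (lambda text: "click here" in text.lower(), 1.5),
    (lambda text: text.isupper(), 1.0),  # message written entirely in capitals
]

def spam_score(text: str, rules: list[Rule] = RULES) -> float:
    """Add up the points of every rule the message triggers."""
    return sum(points for test, points in rules if test(text))

def classify(text: str, threshold: float) -> str:
    """Tag the message as spam if its score reaches the chosen threshold.

    A lower threshold deals with spam more aggressively, at the cost of
    tagging more legitimate mail.
    """
    return "spam" if spam_score(text) >= threshold else "inbox"

def route(text: str, threshold: float, delete_spam: bool = False) -> str:
    """Decide where a message ends up: inbox, spam folder, or deleted."""
    if classify(text, threshold) == "inbox":
        return "inbox"
    return "deleted" if delete_spam else "spam folder"
```

For example, "Click here for a FREE gift" scores 3.5, so it lands in the spam folder with a threshold of 3.0 but stays in the inbox with a threshold of 4.0. Keeping the threshold separate from the rules is what lets each user choose how aggressive to be.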
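A white list boils down to a lookup against the address book, with everything else going to junk by default. The Python sketch below assumes the address book is a simple set of addresses; the folder names and sample addresses are hypothetical.

```python
from email.utils import parseaddr

def normalise(address: str) -> str:
    """Extract the bare address from a From: header value and lower-case it."""
    return parseaddr(address)[1].strip().lower()

def whitelist_folder(sender: str, address_book: set[str]) -> str:
    """Trusted senders go to the inbox; everyone else goes to junk by default."""
    trusted = {normalise(entry) for entry in address_book}
    return "inbox" if normalise(sender) in trusted else "junk"
```

Normalising both sides means "Jo Bloggs <Jo@Example.org>" matches an address book entry of "jo@example.org". The sketch also shows the drawback mentioned above: mail from any new contact who has not yet been added lands in junk.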
Black lists are lists of known email addresses or domain names that have been used by people or companies sending spam (a domain name identifies an organisation or entity on the Internet, e.g. yourorganisation.com). Several Internet organisations, such as SpamCop and MAPS, also maintain lists of IP addresses known to be sources of spam, and there are various blacklists of open relays.
An IP address is a unique identifier for every machine that connects to the Internet, or that lets other machines connect to the Internet through it. IP addresses are generally shown as four numbers from 0 to 255 separated by dots, e.g. 18.104.22.168.
Some anti-spam software lets you build up banned lists of email addresses and domains that you do not want to receive mail from; any email from these addresses is automatically deleted or put in a spam or junk mail folder.
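IP-based black lists such as SpamCop's are commonly published over DNS (often called DNSBLs): the mail server reverses the four numbers of the sender's IP address, appends the list's zone name and looks the result up. An answer means the address is listed; no answer means it is not. A minimal Python sketch follows. The zone name is a parameter you supply, so check the list operator's documentation and usage terms before relying on one; the test addresses come from the 192.0.2.0/24 range reserved for documentation.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: the four numbers reversed, then the zone.

    For example, 192.0.2.1 checked against bl.example.org becomes
    1.2.0.192.bl.example.org.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip: str, zone: str) -> bool:
    """Return True if the list has an entry for this IP address.

    A listed address resolves (usually to something in 127.0.0.x); an
    unlisted one fails to resolve. This needs network access.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

Because the lookup is an ordinary DNS query, a server can consult several lists cheaply for every incoming connection, which is one reason this approach became widespread.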
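A banned list can match either a complete address or a whole domain. In the Python sketch below, the banned entries and the "delete"/"junk" actions are hypothetical. Domains are matched exactly, so a subdomain would need its own entry.

```python
from email.utils import parseaddr
from typing import Optional

# Hypothetical banned entries for illustration.
BANNED_ADDRESSES = {"winner@prizes.example"}
BANNED_DOMAINS = {"bulkmail.example"}

def banned_action(sender: str, delete: bool = False) -> Optional[str]:
    """Return what to do with a banned sender's mail, or None if not banned."""
    address = parseaddr(sender)[1].lower()
    domain = address.rpartition("@")[2]  # the part after the last "@"
    if address in BANNED_ADDRESSES or domain in BANNED_DOMAINS:
        return "delete" if delete else "junk"
    return None
```

Sending banned mail to a junk folder rather than deleting it is the safer default while the list is new, because a mistaken entry can then still be spotted and corrected.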
Heuristic filters run sets of tests on all the components of a message: not just the message header, which contains information about the email such as the addressee and other recipients, but also the message body, the main content of the email. They look for words, phrases, links or other characteristics that are common in spam. These tests are simple, efficient rules of thumb that the program uses to decide whether an email should be classified as spam. For example, a filter may check how often certain words occur in an email and use this to rate the likelihood that it is spam. This is not as useful as it once was, because spammers have found many ways to get around such filters, e.g. inserting spaces or dashes between letters, as in f-r-e-e and S-E-X.
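The following Python sketch shows a heuristic filter applying simple rules of thumb to both a header field (the subject) and the body. The phrases and weights are hypothetical. The sketch also shows the weakness described above: the same filter misses "f-r-e-e".

```python
import re

# Hypothetical phrases and the points each occurrence adds.
PHRASE_WEIGHTS = {"free": 1.0, "winner": 1.5, "act now": 2.0}

def heuristic_score(subject: str, body: str) -> float:
    """Score a message by counting suspicious phrases in the subject and body."""
    text = f"{subject}\n{body}".lower()
    score = 0.0
    for phrase, weight in PHRASE_WEIGHTS.items():
        # \b word boundaries so that "free" does not match inside "freedom"
        hits = len(re.findall(rf"\b{re.escape(phrase)}\b", text))
        score += weight * hits
    # Another common rule of thumb: a subject line written entirely in capitals.
    if subject.isupper():
        score += 1.0
    return score
```

With these rules, the subject "FREE OFFER" and the body "Act now, you are a winner" score 5.5. A body of "Get your f-r-e-e gift" scores 0.0, because the dashes stop the pattern from matching.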
The Bayesian method was developed to combat spammers who had learnt how to circumvent the probability method (see heuristics above). It is based on a theorem by Thomas Bayes, whose essence is a mathematical rule for how you should change your existing beliefs in the light of new evidence. Bayesian filters operate by "learning" the trends in emails that you have identified to them as spam. They then use what they have learnt to spot evidence in future emails that suggests those emails too might be spam.
Bayes' rule has proved very effective at weighing the evidence for and against a particular email being unsolicited mail that you don't want to read. Unlike heuristic filters, Bayesian filters constantly evolve: every new email that is classified as spam is used to further educate the filter. In addition, they are tailored to the particular type of email that you want to filter out.
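The Python sketch below is a simplified naive Bayes filter that learns from messages the user has marked as spam or as legitimate. It treats words as independent, starts from equal prior odds, uses add-one smoothing and looks only at words present in the message. These are simplifying assumptions for illustration. Products such as SpamBayes use more refined versions of the idea.

```python
import math
import re
from collections import Counter

def tokens(text: str) -> set[str]:
    """Split a message into a set of lower-case words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

class BayesFilter:
    def __init__(self) -> None:
        self.spam_counts = Counter()  # messages marked spam containing each word
        self.ham_counts = Counter()   # legitimate messages containing each word
        self.spam_msgs = 0
        self.ham_msgs = 0

    def train(self, text: str, is_spam: bool) -> None:
        """Update the word counts each time the user classifies a message."""
        if is_spam:
            self.spam_counts.update(tokens(text))
            self.spam_msgs += 1
        else:
            self.ham_counts.update(tokens(text))
            self.ham_msgs += 1

    def spam_probability(self, text: str) -> float:
        """Combine per-word evidence with Bayes' rule, starting from even odds."""
        log_odds = 0.0  # log space avoids underflow with long messages
        for word in tokens(text):
            # Add-one smoothing so unseen words never give a zero probability.
            p_word_spam = (self.spam_counts[word] + 1) / (self.spam_msgs + 2)
            p_word_ham = (self.ham_counts[word] + 1) / (self.ham_msgs + 2)
            log_odds += math.log(p_word_spam / p_word_ham)
        # Convert log odds back to a probability without overflowing.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        odds = math.exp(log_odds)
        return odds / (1 + odds)
```

An untrained filter returns 0.5 for everything. Each call to `train` (the equivalent of clicking "this is spam" or "not spam") shifts later probabilities, which is how the filter keeps evolving and adapts to one organisation's own mail.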
A URL (Uniform Resource Locator) is a unique address for a file that is accessible on the Internet (e.g. www.yourorganisation.org.uk/index.html). URL filters compare the IP addresses and URLs contained in or referenced by a message against a database of URLs or IPs known to be used by spammers; if a match is found, the message is automatically designated as spam. An extension of this matches specific email addresses to a specific IP address or range (e.g. 62.49.0.0/24, equivalent to the netmask 255.255.255.0, covers all IP addresses beginning 62.49.0.x). If the IP address the message came from doesn't also match, the message is rejected.
Content filters look at the contents of an email and compare them against certain rules before deciding whether the email is spam. The rules can cover the headers, size, subject, body of the email, attachments and so on.
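Both checks described above can be sketched in Python. The blocked host, the sender address and its expected network are hypothetical placeholders; in practice the data would come from a maintained list. The link pattern only picks up URLs that start with http:// or https://, so bare "www." addresses would need an extra rule.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Hypothetical data for illustration.
BLOCKED_HOSTS = {"cheap-pills.example"}
EXPECTED_NETWORKS = {"news@example.org": ipaddress.ip_network("62.49.0.0/24")}

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def links_blocked(body: str) -> bool:
    """True if any link in the message points at a blocked host."""
    for url in URL_PATTERN.findall(body):
        host = (urlparse(url).hostname or "").lower()
        if host in BLOCKED_HOSTS:
            return True
    return False

def sender_ip_matches(sender_address: str, connecting_ip: str) -> bool:
    """Check that mail from a known sender comes from its expected IP range."""
    network = EXPECTED_NETWORKS.get(sender_address.lower())
    if network is None:
        return True  # no rule for this sender, so nothing to reject
    return ipaddress.ip_address(connecting_ip) in network
```

Under these assumptions, mail claiming to be from news@example.org is accepted from 62.49.0.17 but rejected from 203.0.113.5. Senders without an entry are not checked.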
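A content filter can be sketched with Python's standard `email` package, applying one rule to each part of the message mentioned above. Every rule, the size limit and the list of file extensions are hypothetical examples. The function returns the reasons a message looks suspicious so that they can feed into an overall rating.

```python
from email.message import EmailMessage

# Hypothetical limits and rules for illustration.
MAX_SIZE_BYTES = 5_000_000
RISKY_EXTENSIONS = (".exe", ".scr", ".pif")

def content_flags(msg: EmailMessage) -> list[str]:
    """Apply one example rule each to headers, size, subject, body and attachments."""
    flags = []
    if "Date" not in msg:  # header check
        flags.append("missing Date header")
    if len(msg.as_bytes()) > MAX_SIZE_BYTES:  # size check
        flags.append("oversized")
    subject = msg["Subject"] or ""
    if subject.isupper():  # subject check
        flags.append("subject in capitals")
    body = msg.get_body(preferencelist=("plain",))
    if body is not None and "unsubscribe" in body.get_content().lower():
        flags.append("bulk mail wording")  # body check
    for part in msg.iter_attachments():  # attachment check
        name = (part.get_filename() or "").lower()
        if name.endswith(RISKY_EXTENSIONS):
            flags.append(f"risky attachment: {name}")
    return flags
```

A message with the subject "YOU HAVE WON", no Date header, "unsubscribe" in the body and an attached prize.exe triggers four flags. An ordinary dated message with a plain agenda triggers none.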
Things to consider before choosing a spam solution:
How much care will the product require?
Most spam solutions need some human intervention to teach them what you consider to be spam. The amount of work varies between products: some are very intuitive and give the user buttons such as "this is spam" or "this is a spammer", while others require you to go into a folder and deselect the emails that are not spam.
How accurate is the product at identifying spam?
Most spam software products claim to capture over 90% of spam. This is normally after you have educated the software, and some work will be required to teach it what your organisation considers to be spam.
Realistically, success rates out of the box are between 20 and 50%. There is also the issue of false positives: what one organisation or group regards as spam will be legitimate email to another. For example, breast cancer organisations, finance houses and manure processing companies all use terms that most anti-spam solutions would identify as likely spam. Any product should have a way of keeping false positives to a minimum.
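One practical way to check a product's claims is to run it over a sample of your own mail that you have already sorted by hand, then work out two rates: the share of spam caught and the share of legitimate mail wrongly flagged. The Python sketch below does the arithmetic. The counts in the example are hypothetical, not figures from any product.

```python
def filter_rates(caught_spam: int, missed_spam: int,
                 flagged_legit: int, passed_legit: int) -> tuple[float, float]:
    """Return (detection rate, false-positive rate) as percentages."""
    total_spam = caught_spam + missed_spam
    total_legit = flagged_legit + passed_legit
    detection = 100 * caught_spam / total_spam if total_spam else 0.0
    false_positive = 100 * flagged_legit / total_legit if total_legit else 0.0
    return detection, false_positive
```

For example, catching 450 of 500 spam messages while flagging 5 of 500 legitimate ones gives (90.0, 1.0): a 90% detection rate, but also one legitimate email in every hundred sent to quarantine. For an organisation whose everyday vocabulary looks like spam, the second figure may matter more than the first.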
How much does it cost?
Prices start from as little as £0.60 per user and go up to thousands of pounds. The real cost per user includes all the time and effort needed to educate the software and make sure it is not quarantining or deleting legitimate email. With any system you should expect to invest a lot of time up front, but this should quickly fall to an acceptable minimum over a period of weeks. You should also consider the costs of not having an effective solution, such as the time spent perusing and deleting emails, the extra storage required, the extra bandwidth used up (which reduces the amount of data that can be transferred across a network in a given time), and the frustration of having your inbox bombarded with junk email.
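The cost comparison above can be roughed out with a little arithmetic. The Python sketch below covers only staff time and licence fees, leaving out storage, bandwidth and frustration. Every input is a hypothetical figure to replace with your own, including the default of 220 working days and the assumption that the licence price is charged per user per year.

```python
def annual_cost_of_spam(staff: int, minutes_per_day: float,
                        hourly_cost: float, working_days: int = 220) -> float:
    """Staff time spent sorting spam by hand over a year, in pounds."""
    hours = staff * minutes_per_day / 60 * working_days
    return hours * hourly_cost

def first_year_cost_of_solution(staff: int, price_per_user: float,
                                setup_hours: float, hourly_cost: float) -> float:
    """Licence cost plus the up-front time spent training the software, in pounds."""
    return staff * price_per_user + setup_hours * hourly_cost
```

With 20 staff each losing 15 minutes a day at £15 an hour, manual sorting costs about £16,500 a year. A hypothetical £5-per-user licence plus 40 hours of set-up and training at the same rate comes to £700 in the first year.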
What other features am I paying for?
Finally, you need to understand what you are paying for. Quite a few solutions offer complementary services such as virus checking and removal, monitoring reports, network traffic reports and automatic updates. Make sure you are not paying for the same service twice (e.g. if you already have an effective antivirus solution).
In conclusion, the most effective anti-spam solution is a mixture of automated tools and some human involvement. If you can get the balance right, it could save you a great deal of time and stress.
- Dealing with spam (or spam, spam, lovely spam)
- From Spam to Ham
- From Spam to Ham - The Story Continues
Published: 7th July 2004 Reviewed: 10th April 2006
Copyright © 2004 Lasa Information Systems Team