Spam Filters Explained
What do they do? How do they work? Which one is right for me?
By Alan Hearnshaw

Spam is a very real problem that many people have to deal with on a daily basis. For those who have decided to do something about it and are starting to investigate the options available in spam filtering, this article provides a brief introduction to the types of spam filters available.

Despite the bewildering array of spam filters on the market today, each claiming to be the best of its kind, there are really just five filtering methodologies in general use. Every supplier's product relies on one of these, or on a combination of them:

Content-Based Filters
“In the beginning, there were content-based filters.”

These filters scan the contents of the e-mail and look for tell-tale signs that the message is spam. In the early days of spamming it was quite simple to look out for “kill words” such as “Lose Weight” and mark a message as spam if one was found.

Very soon, though, spammers got wise to this and started resorting to all kinds of tricks to get their messages past the filters.
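A kill-word filter of the kind just described can be sketched in a few lines of Python. This is a minimal illustration, not any product's implementation. Apart from “Lose Weight”, the entries in the kill-word list are hypothetical examples, and matching is assumed to be a simple case-insensitive substring test. The second call shows how easily such a check can be sidestepped.

```python
# Hypothetical kill-word list: only "lose weight" comes from the article.
KILL_WORDS = ["lose weight", "free money", "act now"]


def is_spam_by_kill_words(message: str) -> bool:
    """Return True if any kill word appears in the message (case-insensitive)."""
    text = message.lower()
    return any(word in text for word in KILL_WORDS)


print(is_spam_by_kill_words("Lose Weight fast with this one trick!"))  # True
# A zero for "o" and an "l" for "i" is enough to slip past the check.
print(is_spam_by_kill_words("L0se Welght fast with this one trick!"))  # False
```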

The days of “obfuscation” had begun. We started getting messages containing the phrase “L0se Welght” (notice the zero for “o” and the “l” for “i”), along with even more bizarre, and sometimes quite ingenious, variations.

This rendered basic content-based filters somewhat ineffective. However, there are one or two on the market now that are clever enough to “see through” these attempts and still provide good results.

Bayesian-Based Filters
“The Reverend Bayes comes to the rescue”

Thomas Bayes was born in London in about 1702, the son of a minister. He developed a formula, now known as Bayes’ theorem, which allows the probability of an event to be calculated from the probabilities of the evidence observed.

Bayesian filters “learn” from studying known good and bad messages.
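For a single word, the formula can be written as:

P(spam | word) = P(word | spam) × P(spam) / [P(word | spam) × P(spam) + P(word | good) × P(good)]

Here each probability is estimated from counts of known good and bad messages. The Python sketch below illustrates this under those assumptions. The function name, and the neutral value of 0.5 returned for a word never seen in training, are choices made for this example only.

```python
def p_spam_given_word(spam_with_word: int, spam_total: int,
                      good_with_word: int, good_total: int) -> float:
    """Bayes' theorem for one word, using counts from known good and bad messages.

    Assumes spam_total and good_total are both greater than zero.
    """
    p_word_given_spam = spam_with_word / spam_total
    p_word_given_good = good_with_word / good_total
    # Prior probability that any message is spam, estimated from the training set.
    p_spam = spam_total / (spam_total + good_total)
    p_good = 1.0 - p_spam
    numerator = p_word_given_spam * p_spam
    evidence = numerator + p_word_given_good * p_good
    # Assumption: a word never seen in training is treated as neutral (0.5).
    return numerator / evidence if evidence else 0.5


# A word found in 40 of 100 spam messages but only 2 of 100 good ones:
print(round(p_spam_given_word(40, 100, 2, 100), 3))  # 0.952
```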

Each message is split into single “word bytes”, or tokens. These tokens are placed into a database, along with a record of how often each one is found in each kind of message.

When a new message arrives to be tested by the filter, it is also split into tokens, and each token is looked up in the database. The filter then extrapolates from the database using a form of the good reverend’s formula known as a “naive Bayesian” formula, which treats each token as an independent piece of evidence. This gives the message a “spamicity” rating, and the message can be dealt with accordingly.

Bayesian filters are typically capable of very good accuracy rates (above 97% is not uncommon) and require very little ongoing maintenance.

Whitelist/Blacklist Filters
“Who goes there, friend or foe?”

This very basic form of filtering is seldom used on its own nowadays, but it can be useful as part of a larger filtering strategy.

A “whitelist” is nothing more than a list of e-mail addresses from which you wish to accept communications.
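The token database, the naive Bayesian scoring, and the use of a whitelist as one part of a larger strategy can be sketched together as follows. This is a simplified illustration, not the method of any particular product. The following are all assumptions made for the example:

* the tokenizer
* the add-one smoothing
* scoring only the tokens present in a message
* the class and method names
* the 0.9 spam threshold

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Split a message into a set of lower-case word tokens (deliberately simplistic)."""
    return set(re.findall(r"[a-z0-9']+", message.lower()))


class NaiveBayesFilter:
    """Hypothetical filter: token database + naive Bayesian score + whitelist."""

    def __init__(self, whitelist=()):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.good_counts = Counter()  # token -> number of good messages containing it
        self.spam_total = 0
        self.good_total = 0
        self.whitelist = {address.lower() for address in whitelist}

    def train(self, message, is_spam):
        """Record the tokens of a known good or bad message in the database."""
        tokens = tokenize(message)
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_total += 1
        else:
            self.good_counts.update(tokens)
            self.good_total += 1

    def spamicity(self, message):
        """Treat each token present in the message as independent evidence.

        Add-one smoothing keeps unseen tokens from zeroing the product; summing
        logs avoids underflow when many small probabilities are multiplied.
        """
        total = self.spam_total + self.good_total
        log_spam = math.log((self.spam_total + 1) / (total + 2))
        log_good = math.log((self.good_total + 1) / (total + 2))
        for token in tokenize(message):
            log_spam += math.log((self.spam_counts[token] + 1) / (self.spam_total + 2))
            log_good += math.log((self.good_counts[token] + 1) / (self.good_total + 2))
        diff = log_good - log_spam
        if diff > 700:  # math.exp would overflow; the message is overwhelmingly good
            return 0.0
        # P(spam | tokens) = 1 / (1 + P(good, tokens) / P(spam, tokens))
        return 1.0 / (1.0 + math.exp(diff))

    def classify(self, sender, message, threshold=0.9):
        """Accept whitelisted senders outright; score everyone else."""
        if sender.lower() in self.whitelist:
            return "accept"
        return "spam" if self.spamicity(message) >= threshold else "accept"


# Usage: train on a few known messages, then test new ones.
spam_filter = NaiveBayesFilter(whitelist={"friend@example.com"})
spam_filter.train("lose weight fast cheap pills", is_spam=True)
spam_filter.train("cheap pills buy now", is_spam=True)
spam_filter.train("meeting agenda for tuesday", is_spam=False)
spam_filter.train("lunch on tuesday with the team", is_spam=False)

print(spam_filter.classify("stranger@example.net", "cheap pills lose weight"))  # spam
print(spam_filter.classify("colleague@example.org", "agenda for tuesday meeting"))  # accept
print(spam_filter.classify("Friend@Example.com", "cheap pills lose weight"))  # accept
```

Checking the whitelist first means mail from trusted senders is never scored at all. Everything else falls through to the Bayesian rating, which is the sense in which a whitelist works best as one layer of a larger strategy.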
