Email is a fast and convenient way for businesses to share information. However, that simplicity and convenience have also made it a hub for scams. Most inboxes receive a steady stream of unwanted emails. The menace of spam is growing by leaps and bounds and has emerged as a serious threat. Spam filtering is one of the best tools available today against spam mail.
There are several spam detection algorithms in use nowadays. They can be divided into two main categories:
1. Content-based filtering
2. Rule-based filtering
An email can be divided into three sections:
1. Header section containing the sender’s and recipient’s email addresses
2. Subject of the mail
3. The content of the mail that includes text, images, and other multimedia data in the main body of the mail
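To make the three sections concrete, here is a minimal sketch that splits a raw message into header addresses, subject, and body using Python's standard `email` module. The sample message, the addresses, and the helper name `split_email` are assumptions made up for illustration.

```python
from email import message_from_string

# A made-up sample message; the addresses are placeholders.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly report

Hi Bob, the report is attached.
"""

def split_email(raw: str) -> dict:
    """Split a raw email into the sections described above (hypothetical helper)."""
    msg = message_from_string(raw)
    if msg.is_multipart():
        # Keep only the plain-text parts; images and other media are skipped here.
        texts = [part.get_payload() for part in msg.walk()
                 if part.get_content_type() == "text/plain"]
        body = "\n".join(texts)
    else:
        body = msg.get_payload()
    return {
        "from": msg.get("From", ""),
        "to": msg.get("To", ""),
        "subject": msg.get("Subject", ""),
        "body": body.strip(),
    }

sections = split_email(RAW)
print(sections["subject"])
print(sections["body"])
```

A content-based filter would mostly look at `sections["body"]`, while a rule-based filter would look at the other three fields.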
Content-based filtering techniques focus on the content of the mail and ignore the header section and the subject. Most content-based filtering techniques use a bag of words to identify spam mail. The approach is similar to text classification and has lower rates of false positives.
There are several content-based spam filtering ( Securence/Spam-Filtering ) techniques, including the Gary Robinson technique, Bayesian filtering, the KNN classifier, and the AdaBoost classifier. These spam filters look for words commonly used in spam and marketing emails. For example, spam mail often contains words like Viagra or Sales that do not generally appear in legitimate mail.
Spam filtering techniques can be tailored to the individual needs of the user by building a custom bag of words. The performance of content-based filtering techniques can be refined by identifying and correcting the false judgments of the software.
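As a rough illustration of the bag-of-words idea, the sketch below implements a small naive Bayes classifier, one simple form of Bayesian filtering. It is not the method of any particular product. The class name `NaiveBayesSpamFilter`, the tokenizer, the add-one smoothing, and the tiny training set are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its words (the 'bag of words')."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesSpamFilter:
    """Toy Bayesian filter; both classes need at least one training mail."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total = sum(self.word_counts[label].values())
        # Log prior: share of training mails carrying this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens:
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((self.word_counts[label][tok] + 1) / (total + len(vocab)))
        return score

    def is_spam(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, "spam") > self._log_score(tokens, "ham")

spam_filter = NaiveBayesSpamFilter()
spam_filter.train("Buy cheap viagra now", "spam")
spam_filter.train("Huge sales, free offer, buy now", "spam")
spam_filter.train("Meeting agenda for Monday", "ham")
spam_filter.train("Please review the attached report", "ham")

print(spam_filter.is_spam("free viagra sales"))
print(spam_filter.is_spam("Please review the meeting agenda"))
```

Refining the filter after a false judgment amounts to calling `train` again with the misclassified mail and its correct label, which shifts the word counts for that user.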
Rule-Based Filtering
This spam filtering technique is based on rules that are applied to the header section of the email without considering its content. The mail is classified by applying these rules to fields such as “To:”, “From:”, and “Subject:”.
The filtering rules may vary. For example, a rule might search the subject line for words like free, sale, and other words used in marketing. The rule-based filtering technique also checks whether the mail has arrived from a sender present in the user’s address book.
Rule-based filtering techniques use a whitelist and a blacklist of email addresses. If the sender’s or receiver’s address matches an address in the whitelist or blacklist, the corresponding action is taken. If the mail is multicast and is addressed to a mailing list with more than 1000 recipients, it is immediately marked as spam mail.
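The rules described in this section can be sketched as a short chain of header checks. The order of the checks, the keyword set, and the names `Headers` and `classify` are assumptions for illustration. Only the 1000-recipient threshold and the example keywords come from the text.

```python
import re
from dataclasses import dataclass
from typing import List, Set

# Illustrative values; a real filter would use a longer, tuned keyword list.
SUBJECT_KEYWORDS = {"free", "sale"}
MAX_RECIPIENTS = 1000  # threshold mentioned in the text

@dataclass
class Headers:
    sender: str
    recipients: List[str]
    subject: str

def classify(headers: Headers, whitelist: Set[str], blacklist: Set[str]) -> str:
    """Return 'spam' or 'ham' using header rules only (hypothetical helper)."""
    sender = headers.sender.lower()
    # Assumed order: known-good senders first, then known-bad senders.
    if sender in whitelist:
        return "ham"
    if sender in blacklist:
        return "spam"
    # Mail sent to a very large recipient list is marked as spam.
    if len(headers.recipients) > MAX_RECIPIENTS:
        return "spam"
    # Marketing words in the subject line.
    subject_words = set(re.findall(r"[a-z]+", headers.subject.lower()))
    if subject_words & SUBJECT_KEYWORDS:
        return "spam"
    return "ham"

whitelist = {"boss@example.com"}      # e.g. the user's address book
blacklist = {"promo@spam.example"}

mail = Headers("stranger@example.org", ["me@example.com"], "Big sale today")
print(classify(mail, whitelist, blacklist))
```

Because every check is a set lookup or a length comparison on header fields, this kind of filter never has to read the message body.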
Comparing Spam Filtering Techniques
A study was conducted by IJAIEM to compare content-based and rule-based filtering techniques. The results showed that content-based filtering techniques were more accurate at detecting undesirable mail, whereas rule-based filtering worked much faster.
Employing spam filtering has become a necessity for businesses. By employing spam protection ( Securence.com/solution ) services, businesses can ensure that employees are able to access email quickly and do not have to worry about threats hiding in the form of illegitimate emails.