Hercule Filter

Hercule Filter is a detective for spam-typical HTML and forged mail headers.
You may find the latest version at www.hinzen.de/Spamihilator and contact the author at edy@hinzen.de.
The current version number of this plugin is shown in the upper right corner of the options dialog.


Scan sequence

Please note that 'Hercule' first checks the simple, fast-to-scan options and then those that may take more time.
The mail headers are checked first, since Spamihilator supplies them first.
If the mail is marked as spam in one of the (internally defined) test sections, 'Hercule' stops processing.
This means that 'Hercule' reports only the first sign(s) found, although the mail may contain further indications of spam.
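
The stop-at-first-hit behaviour described above can be sketched as follows. This is only an illustration in Python, not the plugin's actual code: the section names, the two check functions and the scan() helper are hypothetical, and the order of the sections is an assumption.

```python
from typing import Callable, List, Optional, Tuple

# A check takes the raw mail text and returns a reason string or None.
Check = Callable[[str], Optional[str]]


def empty_subject(mail: str) -> Optional[str]:
    # Hypothetical cheap header check: only the header block is inspected.
    headers = mail.split("\n\n", 1)[0]
    for line in headers.splitlines():
        if line.lower().startswith("subject:"):
            return None if line[len("subject:"):].strip() else "Empty subject"
    return "Empty subject"


def external_image(mail: str) -> Optional[str]:
    # Hypothetical, more expensive body check (very naive on purpose).
    count = mail.lower().count('<img src="http')
    return f"External image ({count})" if count else None


# Sections are ordered from cheap to expensive (assumed order).
SECTIONS: List[Tuple[str, List[Check]]] = [
    ("header", [empty_subject]),
    ("html", [external_image]),
]


def scan(mail: str) -> List[str]:
    """Return the reasons found in the first section that reports spam."""
    for _name, checks in SECTIONS:
        reasons = [r for r in (check(mail) for check in checks) if r]
        if reasons:
            return reasons  # stop here: later sections are never run
    return []
```

A mail with an empty subject and external images would therefore be reported with the header reason only, because the HTML section is never reached.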


Messages

If you select an entry in Spamihilator's recycle bin, the filter reason is shown next to the label "Spam Words" as usual.
For some options, the number of occurrences found is shown, too.
External image (2) means, for example, that two external images have been found.

If you need detailed information about the rejection reason, please use the logging options as described below.


Option panels

The options can be set on several panels in the options dialog.
You may reset the options to their defaults by using the [Reset] button.

This help file explains the various option panels in separate sections. The explanations are structured as follows:

Title of option panel
Subtitle of option panel
Option, as you read it in the dialog Description
Message as shown in the recycle bin of Spamihilator.
Examples



Header
Mark as SPAM, if mail header ...
contains forged date Detects wrong dates (e.g. violating RFC 2822).
Forged date
contains date elder than one year Detects date strings whose values are too old.
Elder than one year
has bad charset Detects invalid charsets.
Bad charset
has bad subject Detects subjects containing white space or fillers like ".....".
Bad subject
has empty subject Detects mails with empty subjects.
Empty subject
contains authentication warning Detects mails with warnings added by the program "sendmail".
Authentication warning found
contains invalid IP addresses Detects mails with IP addresses that violate internet standards.
Invalid IP address
reveals mail address Detects mails revealing your mail address in header fields where it is not necessary.
Header reveals your mail address
has more than one BCC field In general, the BCC (Blind Carbon Copy) field should not be sent. Some programs send it anyway. If more than one BCC header is found, the mail is likely SPAM.
Multiple BCC fields found
forged mail header This option scans for several violations of the standards for mail headers.
Forged mail header
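
A few of the header checks listed above can be approximated with Python's standard email package. The following is a minimal sketch under stated assumptions, not Hercule's implementation: the function name check_headers and the exact rules (for example, treating any date that cannot be parsed as forged) are hypothetical. Only the message texts are taken from the list above.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime
from typing import List, Optional


def check_headers(raw: str, now: Optional[datetime] = None) -> List[str]:
    """Return the messages of all header checks that fire (hypothetical rules)."""
    now = now or datetime.now(timezone.utc)
    msg = message_from_string(raw)
    reasons = []

    date_value = msg.get("Date")
    if date_value is not None:
        try:
            sent = parsedate_to_datetime(str(date_value))
        except (TypeError, ValueError):
            # Assumption: an unparsable date counts as forged.
            reasons.append("Forged date")
        else:
            if sent.tzinfo is None:
                # A "-0000" zone yields a naive datetime; treat it as UTC.
                sent = sent.replace(tzinfo=timezone.utc)
            if now - sent > timedelta(days=365):
                reasons.append("Elder than one year")

    if not str(msg.get("Subject") or "").strip():
        reasons.append("Empty subject")

    if len(msg.get_all("Bcc") or []) > 1:
        reasons.append("Multiple BCC fields found")

    return reasons
```

Unlike the plugin, this sketch collects every reason instead of stopping at the first one, which makes the individual rules easier to try out.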



HTML (1)
Mark as SPAM, if detected ...
HTML hides e-mail address Detects tricks used to find out your address, e.g. if you send the SPAM to abuse newsgroups.
HTML hides your mail address
link containing e-mail address Detects external links containing your mail address (in clear text or encrypted).
URL reveals your mail address
link perhaps revealing your identity Detects external links with parameters (like "aff_id=0815_4177") that probably reveal your address.
URL reveals identity
external images Detects the use of external images.
External image
zero sized images Detects the use of zero-sized (and thereby invisible) images.
Zero sized image
image link contains e-mail address Detects the use of image links that reveal that you have opened the mail.
Image reveals your mail address
external frames Detects the use of external frames. These could be used to forge mails that make you believe that a trusted company (e.g. your bank) has written to you.
External frame
invisible frames Detects the use of invisible frames. These can be used to reveal your address or to download intrusion programs.
Invisible frame
empty mail Detects empty mails. Some spam programs seem to crash occasionally and send mails without any content.
Empty mail
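
To illustrate how checks such as "external images", "zero sized images" and "link containing e-mail address" could work, here is a small sketch based on Python's html.parser module. The class ImageAndLinkScanner, the function scan_html and the concrete rules are assumptions made for this example. They only cover addresses in clear text and do not reflect how Hercule actually scans.

```python
from html.parser import HTMLParser
from typing import Dict


class ImageAndLinkScanner(HTMLParser):
    """Counts a few suspicious image and link patterns (hypothetical rules)."""

    def __init__(self, my_address: str) -> None:
        super().__init__()
        self.my_address = my_address.lower()
        self.counts: Dict[str, int] = {}

    def _hit(self, reason: str) -> None:
        self.counts[reason] = self.counts.get(reason, 0) + 1

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "img":
            src = attr.get("src", "").lower()
            if src.startswith(("http://", "https://")):
                self._hit("External image")
            if attr.get("width") == "0" or attr.get("height") == "0":
                self._hit("Zero sized image")
            if self.my_address in src:
                self._hit("Image reveals your mail address")
        elif tag == "a":
            href = attr.get("href", "").lower()
            if href.startswith(("http://", "https://")) and self.my_address in href:
                self._hit("URL reveals your mail address")


def scan_html(html: str, my_address: str) -> Dict[str, int]:
    """Return a mapping of message text to number of occurrences."""
    scanner = ImageAndLinkScanner(my_address)
    scanner.feed(html)
    scanner.close()
    return scanner.counts
```

A count of 2 for "External image" corresponds to the recycle bin message External image (2).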



HTML (2)
Mark as SPAM, if detected ...
more than ... invalid HTML tags If checked, the number of misspelled or invalid HTML tags may not exceed this value. Please don't set it too low, because a person may have sent you a mail with manually written HTML tags that contain some typos.
Note: The list of valid HTML tags is stored in the file "HerculeFilter.ini" but cannot be edited using the options dialog.
Invalid HTML tags
more than ... too long tags If checked, defines the allowed number of HTML tags that exceed the currently longest possible tag length.
The tag <blockquote> is currently the longest valid HTML tag, with ten characters (twelve including the angle brackets). A tag longer than 12 characters will be recognized as too long.
Using this option is currently not recommended, because strings like <www.hinzen.de/Spamihilator> could be misinterpreted as a tag instead of a term in angle brackets.
Too long HTML tag
more than ... bad HTML tags Detects bad tags typically used by spammers. Example: <S§R>
Bad HTML tags
bad URLs Detects bad URLs, e.g. those using redirect or hiding tricks.
Bad URLs
URLs containing ... If checked, detects the use of URLs containing one of the entered substrings. Checks hyperlinks, image, frame and stylesheet links, and some more.
Black listed URL
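
The counting options of this panel can be pictured with the following sketch. It is an assumption-laden illustration, not the plugin's code: VALID_TAGS is a tiny stand-in for the list kept in "HerculeFilter.ini", the regular expressions are simplistic, and the name scan_tags is hypothetical. The limit of 12 characters is taken from the description above, and the sketch assumes that it includes the angle brackets.

```python
import re
from typing import Dict, Iterable

# Tiny stand-in for the list kept in "HerculeFilter.ini" (contents assumed).
VALID_TAGS = {"html", "head", "body", "p", "a", "img", "b", "i", "br", "blockquote"}
MAX_TAG_LENGTH = 12  # "<blockquote>" including the angle brackets (assumed)

# Opening tags only: "<", a name, optional attributes, ">".
TAG_RE = re.compile(r"<([^\s<>/]+)[^<>]*>")
URL_ATTR_RE = re.compile(r"""(?:href|src)\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def scan_tags(html: str, blacklist: Iterable[str]) -> Dict[str, int]:
    """Count invalid tags, too long tags and black listed URLs."""
    invalid = too_long = 0
    for match in TAG_RE.finditer(html):
        name = match.group(1).lower()
        if name.startswith("!"):  # comments and doctype are skipped
            continue
        if name not in VALID_TAGS:
            invalid += 1
            if len(name) + 2 > MAX_TAG_LENGTH:
                too_long += 1
    needles = [entry.lower() for entry in blacklist]
    listed = sum(
        1
        for url in URL_ATTR_RE.findall(html)
        if any(needle in url.lower() for needle in needles)
    )
    return {"invalid": invalid, "too_long": too_long, "blacklisted": listed}
```

Feeding it a text that contains <www.hinzen.de/Spamihilator> shows the false positive mentioned above: the term in brackets is counted both as an invalid and as a too long tag.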



Tricks
Mark as SPAM, if detected ...
more than ... random words Detects the use of random words, typically intended to fool spam filters. Counts, e.g., the number of words at the end of the mail without any punctuation marks.
Random words
more than ... META tags Detects the use of masses of META tags intended to fool spam filters.
Too much META tags
SPAM-typical HTML Detects the use of HTML tags and structures typically used by spammers.
Spam-typical HTML
Intrusion-typical HTML Detects the use of HTML intended to infect your system with viruses and trojans. (It does not detect the viruses themselves; it can only check for the HTML techniques typically used.)
Intrusion-typical HTML (viruses, trojans)
forgotten placeholders Detects whether a spammer used a random-word program and probably set wrong keywords.
Scans, e.g., for strings like %RANDOM_TEXT.
Place holders in body
URL spoofing Detects whether a spammer tries to show you a URL other than the one actually used.
URL spoofing
scripting Detects whether scripting is used for spamming tricks.
Contains script
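
Two of the tricks above lend themselves to a short illustration: forgotten placeholders such as %RANDOM_TEXT, and links whose visible text shows a different host than the one actually linked. The sketch below is hypothetical. The placeholder pattern, the class SpoofScanner and the host comparison are assumptions chosen for the example and are not taken from Hercule.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urlparse

# Assumed pattern: "%" followed by an upper-case keyword such as RANDOM_TEXT.
PLACEHOLDER_RE = re.compile(r"%[A-Z][A-Z0-9_]{2,}")


def find_placeholders(body: str) -> List[str]:
    """Return placeholders a spam generator forgot to fill in."""
    return PLACEHOLDER_RE.findall(body)


class SpoofScanner(HTMLParser):
    """Collects links whose visible URL points to another host than href."""

    def __init__(self) -> None:
        super().__init__()
        self._href = None
        self._text: List[str] = []
        self.spoofed: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            shown = "".join(self._text).strip()
            if shown.lower().startswith(("http://", "https://")):
                if urlparse(shown).hostname != urlparse(self._href).hostname:
                    self.spoofed.append((shown, self._href))
            self._href = None


def find_spoofed_links(html: str) -> List[Tuple[str, str]]:
    scanner = SpoofScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.spoofed
```

Only link texts that themselves look like a URL are compared, so an ordinary link such as "click here" is never reported by this sketch.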



Style
Mark as SPAM, if detected ...
tiny letters Detects the use of tiny letters (not read by humans, but confusing to spam filters).
Tiny letters
hidden letters Detects the use of hidden letters (not read by humans, but confusing to spam filters).
Invisible letters
white letters Detects the use of white letters (not read by humans, but confusing to spam filters).
White letters
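
As a rough illustration of the three style checks, the following sketch looks for suspicious CSS and font attributes with regular expressions. All patterns, the threshold tiny_limit and the name scan_style are assumptions. In particular, the sketch simply treats white text as suspicious and does not compare it with the background colour.

```python
import re
from typing import List

FONT_SIZE_RE = re.compile(r"font-size\s*:\s*(\d+(?:\.\d+)?)\s*(px|pt)", re.IGNORECASE)
HIDDEN_RE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE)
# "color" as a CSS property or a font attribute, but not "background-color" or "bgcolor".
WHITE_RE = re.compile(
    r"(?<![-\w])color\s*(?::|=)\s*[\"']?\s*(?:#fff(?:fff)?\b|white\b)",
    re.IGNORECASE,
)


def scan_style(html: str, tiny_limit: float = 2.0) -> List[str]:
    """Return the messages of the style checks that fire (hypothetical rules)."""
    reasons = []
    if any(float(size) <= tiny_limit for size, _unit in FONT_SIZE_RE.findall(html)):
        reasons.append("Tiny letters")
    if HIDDEN_RE.search(html):
        reasons.append("Invisible letters")
    if WHITE_RE.search(html):
        reasons.append("White letters")
    return reasons
```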



Logging
Mode
Never No logging takes place.
Standard Standard logging. In general, only errors are reported.
Verbose For every rejected mail, the sender, the subject and the rejection reason are logged.
Extended In addition to the above, the corresponding dialog option that caused the rejection will be listed. Durations will be shown.
Debug-Mode Only for debugging. Start and end of subroutines and the contents of the mails are logged, too.
Remove previous log With this option you define how long previous log entries are kept.
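
The difference between the logging modes can be summarized in a short sketch. The enum LogMode, the function rejection_line and the layout of the log line are invented for this example. The real log format of the plugin is not described in this help file.

```python
from enum import IntEnum
from typing import Optional


class LogMode(IntEnum):
    NEVER = 0
    STANDARD = 1   # errors only
    VERBOSE = 2    # sender, subject and rejection reason
    EXTENDED = 3   # additionally the dialog option and the duration
    DEBUG = 4


def rejection_line(mode: LogMode, sender: str, subject: str, reason: str,
                   option: str, seconds: float) -> Optional[str]:
    """Build the log line for a rejected mail, or None if nothing is logged."""
    if mode < LogMode.VERBOSE:
        return None
    line = f"rejected: from={sender} subject={subject!r} reason={reason}"
    if mode >= LogMode.EXTENDED:
        line += f" option={option!r} duration={seconds:.3f}s"
    return line
```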



Version history

Version Remarks
1.2.0.0 Added "Gray list".
1.1.0.3 Recognizes more scripts hidden in CSS.
  Accepts "?xml"-notation of xmlns.
1.0.9.7 Corrected a bug in the previous version that deactivated most settings, regardless of user settings.
1.0.9.6 Improved performance for mails with big attachments.
  Logging-mode "extended" now shows durations.
1.0.9.5 Fixed a serious bug that marked all dates as forged if a date separator other than "." was defined in the local settings.
  Improved recognition of external files (e.g. images).
  Improved logging details.
1.0.9.4 Fixed small bug concerning message-id headers including comments.
1.0.9.3 Fixed bug showing no filter-reason when logging was deactivated.
1.0.9.1 Improved scan for external images / frames.
  Improved performance of HTML-Scan.
  Now accepts XML name spaces (xmlns, as used e.g. by Office programs) in HTML-Scan.
  Fixed a bug that marked time zones given without a plus or minus sign ("+" or "-") as a forged date (e.g. "0100").