Hercule Filter

Hercule Filter is a detective for spam-typical HTML and forged mail headers.
You may find the latest version at www.hinzen.de/Spamihilator and contact the author at edy@hinzen.de.
The current version number of this plugin will be shown in the upper right corner of the options dialog.

Scan sequence

Please note that 'Hercule' first checks the simple, fast-to-scan options and only then those that may take more time.
First of all, the mail headers are checked, since Spamihilator provides them first.
If the mail is marked as spam in one of the (internally defined) test sections, 'Hercule' stops processing.
This means that 'Hercule' reports only the first sign(s) found, although the mail may contain more indications of spam.
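
The scan sequence described above can be pictured with a small Python sketch. It is only an illustration and not the plugin's actual code: the function names, the grouping into sections and the two sample checks are assumptions made for this example.

```python
from typing import Callable, List, Optional

# Hypothetical check type: returns a reason string on a hit, otherwise None.
Check = Callable[[str], Optional[str]]

def scan(mail: str, sections: List[List[Check]]) -> List[str]:
    """Run the sections in order (cheap ones first) and stop after the
    first section that reports at least one sign of spam."""
    for section in sections:
        reasons = []
        for check in section:
            reason = check(mail)
            if reason is not None:
                reasons.append(reason)
        if reasons:
            return reasons  # later (slower) sections are not run at all
    return []

# Two illustrative checks (assumed, heavily simplified).
def has_empty_subject(mail: str) -> Optional[str]:
    return "Empty subject" if "\nSubject:\n" in "\n" + mail else None

def has_script(mail: str) -> Optional[str]:
    return "Contains script" if "<script" in mail.lower() else None

mail = "Subject:\n\n<script>x</script>"
print(scan(mail, [[has_empty_subject], [has_script]]))
```

In this sketch the mail also contains a script, but only the reason from the first section is reported, because scanning stops there.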

Messages

If you select an entry in Spamihilator's recycle bin, the filter reason is shown in the "Spam Words" label as usual.
For some options, the number of occurrences found is shown as well.
For example, External image (2) means that two external images have been found.

If you need detailed information about the rejection reason, please use the logging options as described below.

Option panels

The options can be set on several panels in the options dialog.
You may reset the options to their defaults by using the [Reset] button.

This help file explains the various option panels in separate sections. The explanations are structured as follows:

Title of option panel
Subtitle of option panel
Option, as you read it in the dialog	Description and examples, followed by the message as shown in the recycle bin of Spamihilator.

Header
Mark as SPAM, if mail header ...
contains forged date	Detects wrong dates (e.g. violating RFC 2822). Forged date
contains date elder than one year	Detects dates that are more than one year old. Elder than one year
has bad charset	Detects invalid charsets. Bad charset
has bad subject	Detects subjects containing white space or fillers like ".....". Bad subject
has empty subject	Detects mails with empty subjects. Empty subject
contains authentication warning	Detects mails with warnings added by the program "sendmail". Authentication warning found
contains invalid IP addresses	Detects mails with IP addresses that violate internet standards. Invalid IP address
reveals mail address	Detects mails revealing your mail address in header fields where this is not necessary. Header reveals your mail address
has more than one BCC field	In general, the BCC (Blind Carbon Copy) field should not be sent. Some programs send it anyway. If more than one BCC header is found, the mail is likely spam. Multiple BCC fields found
forged mail header	This option checks for several violations of the standards for mail headers. Forged mail header
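
As an illustration of the two date options, here is a minimal Python sketch. It is not how the plugin works internally: the function name check_date is hypothetical, the standard library parser used here is lenient and does not enforce RFC 2822 strictly, and "one year" is assumed to mean 365 days.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def check_date(date_header: str, now: Optional[datetime] = None) -> Optional[str]:
    """Return a message like the ones in the table above, or None if the date looks fine."""
    now = now or datetime.now(timezone.utc)
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # The date string could not be parsed at all.
        return "Forged date"
    if sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC (assumption).
        sent = sent.replace(tzinfo=timezone.utc)
    if now - sent > timedelta(days=365):
        return "Elder than one year"
    return None

reference = datetime(2005, 1, 1, tzinfo=timezone.utc)
print(check_date("Mon, 20 Dec 2004 10:00:00 +0100", reference))
print(check_date("Tue, 01 Jan 2002 10:00:00 +0100", reference))
print(check_date("not a date", reference))
```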

HTML (1)
Mark as SPAM, if detected ...
HTML hides e-mail address	Detects tricks used to find out your address, e.g. when you forward the spam to abuse newsgroups. HTML hides your mail address
link containing e-mail address	Detects external links containing your mail address (in clear text or encrypted). URL reveals your mail address
link perhaps revealing your identity	Detects external links with parameters (like "aff_id=0815_4177") that probably reveal your address. URL reveals identity
external images	Detects use of external images. External image
zero sized images	Detects use of zero sized (and thereby invisible) images. Zero sized image
image link contains e-mail address	Detects use of image links revealing that you have opened the mail. Image reveals your mail address
external frames	Detects use of external frames. Those could be used to forge mails that make you believe that a trusted company (e.g. your bank) has written to you. External frame
invisible frames	Detects use of invisible frames. Those can be used to reveal your address or to download intrusion programs. Invisible frame
empty mail	Detects empty mails. Some spam programs apparently crash occasionally and send mails without any content. Empty mail
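
To give an idea of what the image options look for, the following Python sketch counts external and zero sized images in an HTML body. It is a simplified illustration under stated assumptions, not the plugin's implementation: the class name ImageScanner is hypothetical, only http/https sources count as external, and only width="0" or height="0" attributes count as zero sized.

```python
from html.parser import HTMLParser

class ImageScanner(HTMLParser):
    """Counts <img> tags that load from a remote server or have zero size."""

    def __init__(self):
        super().__init__()
        self.external = 0
        self.zero_sized = 0

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = (attributes.get("src") or "").lower()
        if src.startswith(("http://", "https://")):
            self.external += 1
        if attributes.get("width") == "0" or attributes.get("height") == "0":
            self.zero_sized += 1

scanner = ImageScanner()
scanner.feed('<img src="http://example.com/a.gif?id=1" width="0" height="0">'
             '<img src="cid:logo">')
print(scanner.external, scanner.zero_sized)
```

The second image in the example is embedded in the mail itself (cid:), so it is counted neither as external nor as zero sized.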

HTML (2)
Mark as SPAM, if detected ...
more than ... invalid HTML tags	If checked, the number of misspelled or invalid HTML tags may not exceed this value. Please don't set it too low, because humans may have sent you a mail with manually written HTML tags and probably some typos in them. Note: The list of valid HTML tags is kept in the file "HerculeFilter.ini" but cannot be edited using the options dialog. Invalid HTML tags
more than ... too long tags	If checked, defines how many HTML tags may exceed the currently longest possible tag length. The tag <blockquote> is currently the longest valid HTML tag with ten characters. A tag longer than 12 characters will be recognized as too long. Using this option is currently not recommended, because strings like <www.hinzen.de/Spamihilator> could be misinterpreted as a tag instead of a term in brackets. Too long HTML tag
more than ... bad HTML tags	Detects bad tags typically used by spammers. Sample: <S§R> Bad HTML tags
bad URLs	Detects bad URLs, e.g. with redirect or hiding tricks. Bad URLs
URLs containing ...	If checked, detects the use of URLs containing one of the entered substrings. Checks hyperlinks, image, frame and stylesheet links, and some more. Black listed URL
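
The following Python sketch illustrates the idea behind the "too long tags" and "URLs containing ..." options, including the false positive mentioned above. It is not the plugin's code: the function names are hypothetical, the limit of 12 characters is assumed to apply to the tag name, and only href and src attributes are inspected.

```python
import re

MAX_TAG_LENGTH = 12  # limit from the text; how it is measured is an assumption

TAG_NAME = re.compile(r"</?([^\s<>/]+)")
LINK = re.compile(r"(?:href|src)\s*=\s*[\"']([^\"']+)", re.IGNORECASE)

def too_long_tags(html):
    """Return all tag names longer than MAX_TAG_LENGTH characters."""
    return [name for name in TAG_NAME.findall(html) if len(name) > MAX_TAG_LENGTH]

def blacklisted_urls(html, substrings):
    """Return all linked URLs containing one of the given substrings."""
    return [url for url in LINK.findall(html)
            if any(s.lower() in url.lower() for s in substrings)]

text = "<blockquote>hi</blockquote> see <www.hinzen.de/Spamihilator>"
print(too_long_tags(text))

html = ('<a href="http://spam.example/buy">x</a>'
        '<img src="http://ok.example/a.gif">')
print(blacklisted_urls(html, ["spam.example"]))
```

The first call reports the term in brackets as a too long tag, which is exactly the kind of misinterpretation that makes this option risky.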

Tricks
Mark as SPAM, if detected ...
more than ... random words	Detects use of random words, which are typically inserted to fool spam filters. Counts e.g. the number of words at the end of the mail without any punctuation marks. Random words
more than ... META tags	Detects use of masses of META tags intended to fool spam filters. Too much META tags
SPAM-typical HTML	Detects use of HTML tags and structures typically used by spammers. Spam-typical HTML
Intrusion-typical HTML	Detects use of HTML intended to infect your system with viruses and trojans. (Doesn't detect the viruses themselves; it can only check for the HTML techniques typically used.) Intrusion-typical HTML (viruses, trojans)
forgotten placeholders	Detects if a spammer used a random-word program and probably set wrong keywords. Scans e.g. for strings like %RANDOM_TEXT. Place holders in body
URL spoofing	Detects if a spammer tries to show you a URL other than the one actually used. URL spoofing
scripting	Detects if scripting is used for some spamming tricks. Contains script
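
Two of the tricks above, forgotten placeholders and URL spoofing, can be sketched in Python as follows. This is an illustration only and does not reflect the plugin's real detection logic: the function names and both regular expressions are assumptions, and the spoofing check only compares the host in the visible link text with the host in the href attribute.

```python
import re
from urllib.parse import urlparse

# Assumed placeholder shape: "%" followed by an upper-case keyword.
PLACEHOLDER = re.compile(r"%[A-Z][A-Z0-9_]{2,}")
ANCHOR = re.compile(
    r"<a\s[^>]*href\s*=\s*[\"']([^\"']+)[\"'][^>]*>(.*?)</a>",
    re.IGNORECASE | re.DOTALL,
)

def find_placeholders(body):
    """Return strings like %RANDOM_TEXT left over from a spam program."""
    return PLACEHOLDER.findall(body)

def spoofed_links(html):
    """Return (shown, real) pairs where the visible URL names another host."""
    hits = []
    for href, text in ANCHOR.findall(html):
        text = text.strip()
        if not text.lower().startswith(("http://", "https://")):
            continue  # link text is not a URL, nothing to compare
        if urlparse(text).hostname != urlparse(href).hostname:
            hits.append((text, href))
    return hits

print(find_placeholders("Dear %RANDOM_TEXT, 50% off"))
print(spoofed_links('<a href="http://evil.example/login">'
                    'http://www.mybank.example/</a>'))
```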

Style
Mark as SPAM, if detected ...
tiny letters	Detects use of tiny letters that humans cannot read but that confuse spam filters. Tiny letters
hidden letters	Detects use of hidden letters that humans cannot read but that confuse spam filters. Invisible letters
white letters	Detects use of white letters that humans cannot read but that confuse spam filters. White letters
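
The style options can be illustrated with a short Python sketch that inspects inline style attributes. It is a rough approximation, not the plugin's method: the function name style_tricks, the size limit of 2 (px or pt) and the list of white color values are assumptions, and the background color is not taken into account.

```python
import re

STYLE_ATTR = re.compile(r'style\s*=\s*"([^"]*)"', re.IGNORECASE)
TINY_LIMIT = 2  # assumed threshold in px or pt

def style_tricks(html):
    """Return the sorted messages for style tricks found in inline styles."""
    reasons = set()
    for style in STYLE_ATTR.findall(html):
        # Split "name: value; name: value" into a dictionary.
        props = {}
        for declaration in style.split(";"):
            if ":" in declaration:
                name, value = declaration.split(":", 1)
                props[name.strip().lower()] = value.strip().lower()
        size = re.fullmatch(r"(\d+(?:\.\d+)?)(px|pt)", props.get("font-size", ""))
        if size and float(size.group(1)) <= TINY_LIMIT:
            reasons.add("Tiny letters")
        if props.get("display") == "none" or props.get("visibility") == "hidden":
            reasons.add("Invisible letters")
        if props.get("color") in ("white", "#fff", "#ffffff"):
            reasons.add("White letters")
    return sorted(reasons)

print(style_tricks('<span style="font-size:1px; color:#FFFFFF">x</span>'
                   '<div style="display:none">y</div>'))
```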

Logging
Mode
Never	No logging takes place.
Standard	Standard logging. In general, only errors are reported.
Verbose	For every rejected mail the sender, the subject and the rejection-reason are logged.
Extended	In addition to the above, the corresponding dialog option that caused the rejection is listed. Durations are shown.
Debug-Mode	Only for debugging. Start and end of subroutines and the contents of the mails are logged, too.

Remove previous log	With this option you define how long previous log entries are kept.

Version history

Version	Remarks
1.2.0.0	Added "Gray list".
1.1.0.3	Recognizes more scripts hidden in CSS.
	Accepts "?xml"-notation of xmlns.
1.0.9.7	Corrected bug in previous version that caused most settings to be deactivated, regardless of user settings.
1.0.9.6	Improved performance for mails with big attachments.
	Logging-mode "extended" now shows durations.
1.0.9.5	Fixed serious bug that marked all dates as forged if a date separator other than "." was defined in the local settings.
	Improved recognition of external files (e.g. images).
	Improved logging details.
1.0.9.4	Fixed small bug concerning message-id headers including comments.
1.0.9.3	Fixed bug showing no filter-reason when logging was deactivated.
1.0.9.1	Improved scan for external images / frames.
	Improved performance of HTML-Scan.
	Now accepts XML name spaces (xmlns, as used e.g. by Office programs) in HTML-Scan.
	Fixed bug that marked time zones given without a plus or minus sign ("+" or "-"), e.g. "0100", as forged dates.