Spam and Ham Processors (Gnus Manual)


10.18.4 Spam and Ham Processors

Spam and ham processors specify special actions to take when you exit a group buffer. Spam processors act on spam messages, and ham processors on ham messages. At present, the main role of these processors is to update the dictionaries of dictionary-based spam back ends such as Bogofilter (see Bogofilter) and the Spam Statistics package (see Spam Statistics Filtering).

The spam and ham processors that apply to each group are determined by the group’s spam-process group parameter. If this group parameter is not defined, they are determined by the variable gnus-spam-process-newsgroups.
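As a rough sketch of the two mechanisms just described, the fragment below sets the spam-process parameter for one group through gnus-parameters and provides a fallback through gnus-spam-process-newsgroups. The group names are hypothetical, and the Bogofilter processors are only one possible choice. The list nesting follows the customize types as best understood here, so it is prudent to confirm it with M-x customize-variable.

```elisp
(require 'gnus)

;; Hypothetical group name; adjust the regexp to your own setup.
;; Group parameter: train Bogofilter on both spam and ham at group exit.
(add-to-list 'gnus-parameters
             '("^nnml:mail\\.misc$"
               (spam-process ((spam spam-use-bogofilter)
                              (ham spam-use-bogofilter)))))

;; Fallback for groups that have no spam-process parameter of their own.
;; Each entry pairs a group-name regexp with a list of processors.
(setq gnus-spam-process-newsgroups
      '(("^nnml:lists\\."
         ((spam spam-use-bogofilter)))))
```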

Gnus learns from the spam you get. You have to collect your spam in one or more spam groups, and set or customize the variable spam-junk-mailgroups as appropriate. You can also declare groups to contain spam by setting their group parameter spam-contents to gnus-group-spam-classification-spam, or by customizing the corresponding variable gnus-spam-newsgroup-contents. The spam-contents group parameter and the gnus-spam-newsgroup-contents variable can also be used to declare groups as ham groups if you set their classification to gnus-group-spam-classification-ham. If groups are not classified by means of spam-junk-mailgroups, spam-contents, or gnus-spam-newsgroup-contents, they are considered unclassified. All groups are unclassified by default.
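The three ways of classifying groups might look roughly like this in your Gnus init file. All group names are hypothetical, and whether spam-junk-mailgroups expects names with or without a back end prefix is an assumption you should check against your own group buffer.

```elisp
(require 'gnus)

;; Treat these (hypothetical) mail groups as spam groups.
(setq spam-junk-mailgroups '("spam" "mail.junk"))

;; Classify groups by regexp: one spam group and one ham group.
(setq gnus-spam-newsgroup-contents
      '(("^nnml:junk$" gnus-group-spam-classification-spam)
        ("^nnml:training\\.ham$" gnus-group-spam-classification-ham)))

;; Or set the spam-contents group parameter for a single group.
(add-to-list 'gnus-parameters
             '("^nnml:quarantine$"
               (spam-contents gnus-group-spam-classification-spam)))
```

Any group matched by none of these stays unclassified.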

In spam groups, all messages are considered to be spam by default: they get the ‘$’ mark (gnus-spam-mark) when you enter the group. If you have seen a message, had it marked as spam, then unmarked it, it won’t be marked as spam when you enter the group thereafter. You can disable that behavior, so all unread messages will get the ‘$’ mark, if you set the spam-mark-only-unseen-as-spam parameter to nil. You should remove the ‘$’ mark when you are in the group summary buffer for every message that is not spam after all. To remove the ‘$’ mark, you can use M-u to “unread” the article, or d for declaring it read the non-spam way. When you leave a group, all spam-marked (‘$’) articles are sent to a spam processor which will study them as spam samples.

Messages may also be deleted in various other ways. Unless the ham-marks group parameter is overridden (see below), the marks ‘R’ and ‘r’ for default read or explicit delete, the marks ‘X’ and ‘K’ for automatic or explicit kills, and the mark ‘Y’ for low scores are all considered to be associated with articles that are not spam. This assumption might be false, in particular if you use kill files or score files as a means of detecting genuine spam; in that case, you should adjust the ham-marks group parameter.

Variable: ham-marks: You can customize this group or topic parameter to be the list of marks you want to consider ham. By default, the list contains the deleted, read, killed, kill-filed, and low-score marks (the idea is that these articles have been read, but are not spam). It can be useful to also include the tick mark in the ham marks. It is not recommended to make the unread mark a ham mark, because it normally indicates a lack of classification. But you can do it, and we’ll be happy for you.

Variable: spam-marks: You can customize this group or topic parameter to be the list of marks you want to consider spam. By default, the list contains only the spam mark. It is not recommended to change that, but you can if you really want to.

When you leave any group, regardless of its spam-contents classification, all spam-marked articles are sent to a spam processor, which will study these as spam samples. If you explicitly kill a lot, you might sometimes end up with articles marked ‘K’ which you never saw, and which might accidentally contain spam. It is best to make sure that real spam is marked with ‘$’, and nothing else.

When you leave a spam group, all spam-marked articles are marked as expired after processing with the spam processor. This is not done for unclassified or ham groups. Also, any ham articles in a spam group will be moved to a location determined by either the ham-process-destination group parameter or a match in the gnus-ham-process-destinations variable, which is a list of regular expressions matched with group names (it’s easiest to customize this variable with M-x customize-variable RET gnus-ham-process-destinations). Each group name list is a standard Lisp list, if you prefer to customize the variable manually. If the ham-process-destination parameter is not set, ham articles are left in place. If the spam-mark-ham-unread-before-move-from-spam-group parameter is set, the ham articles are marked as unread before being moved.

If ham cannot be moved (because of a read-only back end such as NNTP, for example), it will be copied.

Note that you can use multiple destinations per group or regular expression! This enables you to send your ham to a regular mail group and to a ham training group.
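A minimal sketch of ham destinations, assuming hypothetical nnml group names: the group parameter form lists the destinations directly, while the variable form pairs a group-name regexp with them. The nested list in the variable form reflects the customize type as understood here and is worth verifying with M-x customize-variable RET gnus-ham-process-destinations.

```elisp
(require 'gnus)

;; Group parameter on a (hypothetical) spam group: ham found there is sent
;; to a regular mail group and also to a ham training group.
(add-to-list 'gnus-parameters
             '("^nnml:spam$"
               (ham-process-destination "nnml:mail.misc"
                                        "nnml:training.ham")))

;; Variable form: a group-name regexp followed by the destinations.
(setq gnus-ham-process-destinations
      '(("^nnml:junk$" ("nnml:mail.misc" "nnml:training.ham"))))
```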

When you leave a ham group, all ham-marked articles are sent to a ham processor, which will study these as non-spam samples.

By default the variable spam-process-ham-in-spam-groups is nil. Set it to t if you want ham found in spam groups to be processed. Normally this is not done; you are expected instead to send your ham to a ham group and process it there.

By default the variable spam-process-ham-in-nonham-groups is nil. Set it to t if you want ham found in non-ham (spam or unclassified) groups to be processed. Normally this is not done; you are expected instead to send your ham to a ham group and process it there.

When you leave a ham or unclassified group, all spam articles are moved to a location determined by either the spam-process-destination group parameter or a match in the gnus-spam-process-destinations variable, which is a list of regular expressions matched with group names (it’s easiest to customize this variable with M-x customize-variable RET gnus-spam-process-destinations). Each group name list is a standard Lisp list, if you prefer to customize the variable manually. If the spam-process-destination parameter is not set, the spam articles are only expired. The group name is fully qualified, meaning that if you see ‘nntp:servername’ before the group name in the group buffer then you need it here as well.

If spam cannot be moved (because of a read-only back end such as NNTP, for example), it will be copied.

Note that you can use multiple destinations per group or regular expression! This enables you to send your spam to multiple spam training groups.
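Spam destinations can be sketched the same way. The server and group names below are hypothetical; note the fully qualified names, including the back end and server prefix where the group buffer shows one. As with the ham variable, the nesting of the destination list is an assumption based on the customize type.

```elisp
(require 'gnus)

;; Group parameter: spam found in this (hypothetical) mail group is moved
;; to a single spam group.
(add-to-list 'gnus-parameters
             '("^nnml:mail\\.misc$"
               (spam-process-destination . "nnml:spam")))

;; Variable form: spam found in any group on a hypothetical NNTP server
;; goes to two spam training groups. NNTP is read-only, so the articles
;; would be copied rather than moved.
(setq gnus-spam-process-destinations
      '(("^nntp\\+news\\.example\\.org:"
         ("nnml:spam" "nnml:training.spam"))))
```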

The problem with processing ham and spam is that Gnus doesn’t track this processing by default. Enable the spam-log-to-registry variable so spam.el will use gnus-registry.el to track what articles have been processed, and avoid processing articles multiple times. Keep in mind that if you limit the number of registry entries, this won’t work as well as it does without a limit.
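Enabling this tracking might look like the sketch below. It assumes the registry is set up with gnus-registry-initialize, as the Gnus registry documentation describes, and that gnus-registry-max-entries is the limit the paragraph above refers to. Setting spam-log-to-registry before spam.el is initialized is probably the safest order, though that ordering is an assumption.

```elisp
(require 'gnus-registry)

;; Optional cap on the registry size. A limit makes the tracking of
;; processed articles less reliable, so it is left commented out here.
;; (setq gnus-registry-max-entries 2500)

;; Start the registry, then let spam.el record processed articles in it.
(gnus-registry-initialize)
(setq spam-log-to-registry t)
```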

Variable: spam-mark-only-unseen-as-spam: Set this variable if you want only unseen articles in spam groups to be marked as spam. By default, it is set. If you set it to nil, unread articles will also be marked as spam.

Variable: spam-mark-ham-unread-before-move-from-spam-group: Set this variable if you want ham to be unmarked before it is moved out of the spam group. This is very useful when you use something like the tick mark ‘!’ to mark ham: the article will be placed in your ham-process-destination, unmarked as if it came fresh from the mail server.

Variable: spam-autodetect-recheck-messages: When autodetecting spam, this variable tells spam.el whether only unseen articles or all unread articles should be checked for spam. It is recommended that you leave it off.
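Taken together, the three variables described last could be set as in the following sketch. The values are illustrative choices rather than recommendations. The name of the autodetection variable, spam-autodetect-recheck-messages, is taken from spam.el as best understood here and should be confirmed with C-h v.

```elisp
;; Mark only unseen articles in spam groups as spam (described above as
;; the default); set to nil to mark all unread articles instead.
(setq spam-mark-only-unseen-as-spam t)

;; Strip marks such as the tick mark from ham before it is moved out of a
;; spam group, so it arrives looking like fresh mail.
(setq spam-mark-ham-unread-before-move-from-spam-group t)

;; Leave rechecking off, as recommended: autodetection then looks only at
;; unseen articles rather than at all unread ones.
(setq spam-autodetect-recheck-messages nil)
```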