8.6 Adaptive Scoring

If all this scoring is getting you down, Gnus has a way of making it all happen automatically—as if by magic. Or rather, as if by artificial stupidity, to be precise.

When you read an article, or mark an article as read, or kill an article, you leave marks behind. On exit from the group, Gnus can sniff these marks and add score elements depending on what marks it finds. You turn on this ability by setting gnus-use-adaptive-scoring to t or (line). If you want to score adaptively on separate words appearing in the subjects, you should set this variable to (word). If you want to use both adaptive methods, set this variable to (word line).
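As a minimal sketch, the setting that enables both methods could look like this. Which startup file it goes in (for instance ~/.gnus.el) is an assumption about your setup.

```elisp
;; Adapt on whole headers (line) and on individual subject words (word).
;; Use '(line) or t for header-based adaptation only, '(word) for words only.
(setq gnus-use-adaptive-scoring '(word line))
```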

To give you complete control over the scoring process, you can customize the gnus-default-adaptive-score-alist variable. For instance, it might look something like this:

(setq gnus-default-adaptive-score-alist
  '((gnus-unread-mark)
    (gnus-ticked-mark (from 4))
    (gnus-dormant-mark (from 5))
    (gnus-del-mark (from -4) (subject -1))
    (gnus-read-mark (from 4) (subject 2))
    (gnus-expirable-mark (from -1) (subject -1))
    (gnus-killed-mark (from -1) (subject -3))
    (gnus-kill-file-mark)
    (gnus-ancient-mark)
    (gnus-low-score-mark)
    (gnus-catchup-mark (from -1) (subject -1))))

As you see, each element in this alist has a mark as a key (either a variable name or a “real” mark, that is, a character). Following this key is an arbitrary number of header/score pairs. If there are no header/score pairs following the key, no adaptive scoring will be done on articles that have that key as the article mark. For instance, articles with gnus-unread-mark in the example above will not get adaptive score entries.

Each article can have only one mark, so only one of these rules will be applied to each article.

Take gnus-del-mark as an example: this alist says that every article that has that mark (i.e., is marked with ‘r’) will get score entries added that adjust the score by −4 based on its From header and by −1 based on its Subject header. Change this to fit your prejudices.

If you have marked 10 articles with the same subject with gnus-del-mark, the rule for that mark will be applied ten times. That means that that subject will get a score of ten times −1, which should be, unless I’m much mistaken, −10.
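The arithmetic above can be sketched in a few lines of Emacs Lisp. The function my-adaptive-total is hypothetical and is not part of Gnus; it only mimics how one rule is applied once per marked article, and it uses plain symbols such as del as stand-ins for the real mark variables.

```elisp
;; Hypothetical illustration only; `my-adaptive-total' is not a Gnus function.
(defun my-adaptive-total (rules marks header)
  "Sum the HEADER adjustments that RULES assign to each mark in MARKS.
RULES has the shape of `gnus-default-adaptive-score-alist', but its keys
are plain symbols looked up with `assq'."
  (let ((total 0))
    (dolist (mark marks total)
      ;; PAIRS is the list of (HEADER SCORE) pairs for this mark, or nil.
      (let ((pairs (cdr (assq mark rules))))
        (setq total (+ total (or (cadr (assq header pairs)) 0)))))))

;; Ten articles with the same subject, all carrying the delete mark.
(my-adaptive-total '((del (from -4) (subject -1)))
                   (make-list 10 'del)
                   'subject)
;; evaluates to -10
```

A mark with no header/score pairs, or a mark missing from the rules, contributes nothing in this sketch, which mirrors the behavior described for gnus-unread-mark.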

If you have auto-expirable (mail) groups (see Expiring Mail), all the read articles will be marked with the ‘E’ mark. This’ll probably make adaptive scoring slightly impossible, so auto-expiring and adaptive scoring don’t really mix very well.

The headers you can score on are from, subject, message-id, references, xref, lines, chars and date. In addition, you can score on followup, which will create an adaptive score entry that matches on the References header using the Message-ID of the current article, thereby matching the following thread.

If you use this scheme, you should set the score file atom mark to something small—like −300, perhaps, to avoid having small random changes result in articles getting marked as read.
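In a score file, that suggestion might be written as the entry below. This is a fragment of a score file rather than of your init file, and the value is simply the figure mentioned above.

```elisp
;; Score file fragment: articles scoring below -300 get marked as read.
((mark -300))
```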

After using adaptive scoring for a week or so, Gnus should start to become properly trained and enhance the authors you like best, and kill the authors you like least, without you having to say so explicitly.

You can control what groups the adaptive scoring is to be performed on by using the score files (see Score File Format). This will also let you use different rules in different groups.
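As a hedged sketch, a per-group score file can use the adapt atom described in Score File Format. The two fragments below are alternatives that belong in different score files, and which groups they apply to depends on the names of the files you put them in.

```elisp
;; Score file for a group where no adaptive scoring should be done.
((adapt ignore))

;; Score file for a group that should use the default adaptive rules.
((adapt t))
```

The adapt atom can also carry a list of rules of its own, which is how different groups end up with different rules; see Score File Format for the exact form.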

The adaptive score entries will be put into a file where the name is the group name with gnus-adaptive-file-suffix appended. The default is ADAPT.

Adaptive score files can get huge and are not meant to be edited by human hands. If gnus-adaptive-pretty-print is nil (the default), those files will not be written in a human-readable way.
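If you do want to inspect what Gnus has learned, a sketch like the following might help. The suffix shown is just the default mentioned above, restated for clarity.

```elisp
;; Write adaptive score files in a human-readable layout.
(setq gnus-adaptive-pretty-print t)

;; Suffix appended to the group name to form the adaptive score file name.
(setq gnus-adaptive-file-suffix "ADAPT")
```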

When doing adaptive scoring, substring or fuzzy matching would probably give you the best results in most cases. However, if the header one matches is short, the possibility for false positives is great, so if the length of the match is less than gnus-score-exact-adapt-limit, exact matching will be used. If this variable is nil, exact matching will always be used to avoid this problem.
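For example, the limit could be set as follows. The number 10 is only an illustrative choice here, not a recommendation.

```elisp
;; Headers shorter than 10 characters are matched exactly;
;; longer ones may use substring or fuzzy matching.
(setq gnus-score-exact-adapt-limit 10)

;; Alternatively, always use exact matching:
;; (setq gnus-score-exact-adapt-limit nil)
```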

As mentioned above, you can adapt either on individual words or entire headers. If you adapt on words, the gnus-default-adaptive-word-score-alist variable says what score each instance of a word should add given a mark.

(setq gnus-default-adaptive-word-score-alist
      `((,gnus-read-mark . 30)
        (,gnus-catchup-mark . -10)
        (,gnus-killed-mark . -20)
        (,gnus-del-mark . -15)))

This is the default value. If you have adaptation on words enabled, every word that appears in subjects of articles marked with gnus-read-mark will result in a score rule that increases the score by 30 points.

Words that appear in the gnus-default-ignored-adaptive-words list will be ignored. If you wish to add more words to be ignored, use the gnus-ignored-adaptive-words list instead.

Some may feel that short words shouldn’t count when doing adaptive scoring. If so, you may set gnus-adaptive-word-length-limit to an integer. Words shorter than this number will be ignored. This variable defaults to nil.

When the scoring is done, gnus-adaptive-word-syntax-table is the syntax table in effect. It is similar to the standard syntax table, but it considers numbers to be non-word-constituent characters.

If gnus-adaptive-word-minimum is set to a number, the adaptive word scoring process will never bring down the score of an article to below this number. The default is nil.

If gnus-adaptive-word-no-group-words is set to t, Gnus won’t adaptively word score any of the words in the group name. This is useful for groups like ‘comp.editors.emacs’, where most of the subject lines contain the word ‘emacs’.
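The word-scoring knobs described above might be combined like this. Every value below, including the ignored words, is an arbitrary example rather than a suggested setting.

```elisp
(setq gnus-ignored-adaptive-words '("fwd" "was" "question") ; extra words to skip
      gnus-adaptive-word-length-limit 4     ; ignore words shorter than 4 characters
      gnus-adaptive-word-minimum -100       ; word scoring never pushes a score below -100
      gnus-adaptive-word-no-group-words t)  ; skip words taken from the group name
```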

After using this scheme for a while, it might be nice to write a gnus-psychoanalyze-user command to go through the rules and see what words you like and what words you don’t like. Or perhaps not.

Note that the adaptive word scoring thing is highly experimental and is likely to change in the future. Initial impressions seem to indicate that it’s totally useless as it stands. Some more work (involving more rigorous statistical methods) will have to be done to make this useful.