#2384 - Anti-spam heuristics

Identifier #2384
Issue type Feature request or suggestion
Title Anti-spam heuristics
Status Completed
Tags

Type: Spam (custom)

Handling member Chris Graham
Addon core
Description There are a number of factors we can use to detect increased likelihood of spam:
1) Posting speed (by looking at when the CSRF token was generated compared to when the form was posted)
2) Closeness to having joined (spammers may join just to bypass CAPTCHA or to get extra features)
3) Posting links
4) Posting frequency
5) Posting repeated content
6) Using particular keywords ("cialis", ...)
7) Using particular coding ("Times New Roman" [implies a paste], "<font face=" [implies a paste])
8) Use of invalid coding from other software ("[link", ...)
9) Use of paste as opposed to typing
10) Presence of JavaScript (particular calculations could be done and submitted with the form, to know that a real working JavaScript engine was there; perhaps something computationally costly like factorisation; also detection of use of mouse and/or keyboard as a human would)
11) Triggering of the spam blackhole in a form
12) Particular user-agent substrings ("bot", "perl", ...)
13) Missing HTTP headers a real browser will always send: Accept, User-Agent, Cookie, Accept-Language, Accept-Encoding
14) Hits from particular countries (fully configurable)
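
As a rough illustration of how two of the request-level factors above (12 and 13) might be detected, here is a minimal sketch. The function name detect_request_factors, the factor labels it returns, and the default substring list are all assumptions made for this example, not existing API.

```php
<?php
// Hypothetical helper (not existing API): returns labels for the request-level
// factors 12 and 13 from the list above that a set of HTTP headers triggers.
function detect_request_factors(array $headers, array $bad_agent_substrings = ['bot', 'perl']): array
{
    // Header names are case-insensitive, so normalise the keys before comparing
    $headers = array_change_key_case($headers, CASE_LOWER);
    $triggered = [];

    // Factor 13: headers a real browser is expected to send
    $expected = ['accept', 'user-agent', 'cookie', 'accept-language', 'accept-encoding'];
    foreach ($expected as $name) {
        if (!isset($headers[$name]) || $headers[$name] === '') {
            $triggered[] = 'missing_header:' . $name;
        }
    }

    // Factor 12: suspicious user-agent substrings (case-insensitive match)
    $agent = $headers['user-agent'] ?? '';
    foreach ($bad_agent_substrings as $needle) {
        if (stripos($agent, $needle) !== false) {
            $triggered[] = 'user_agent:' . $needle;
        }
    }

    return $triggered;
}
```

Returning labels rather than a score keeps detection separate from weighting, so each label can be given its own configurable increment.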

We can detect these factors and make each one configurable, so that it bumps up the spam certainty rating for a request. The scoring would be cumulative: the increments for all detected factors would be added together to give an overall spam rating. That overall rating would then be subject to the approve/block/ban thresholds that already exist.
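
The cumulative scoring could look something like the sketch below. The function names, factor labels, the 0-100 scale, and the threshold array layout are all assumptions for illustration; the real thresholds would come from the existing configuration options.

```php
<?php
// Hypothetical sketch: add up the configured increment for every triggered factor.
// Unknown factors contribute nothing; the total is capped at 100 (assumed scale).
function cumulative_spam_rating(array $triggered, array $increments): int
{
    $rating = 0;
    foreach ($triggered as $factor) {
        $rating += $increments[$factor] ?? 0;
    }
    return min(100, $rating);
}

// Hypothetical sketch: map an overall rating onto the approve/block/ban thresholds.
function spam_action(int $rating, array $thresholds): string
{
    // Check the most severe threshold first
    foreach (['ban', 'block', 'approve'] as $action) {
        if ($rating >= $thresholds[$action]) {
            return $action;
        }
    }
    return 'allow';
}

// Example values only; these numbers are made up for the sketch
$increments = ['links' => 30, 'new_member' => 20, 'keyword' => 40];
$thresholds = ['approve' => 40, 'block' => 60, 'ban' => 90];
$rating = cumulative_spam_rating(['links', 'new_member'], $increments);
echo spam_action($rating, $thresholds), "\n";
```

With these example numbers, a new member posting a link scores 50, which crosses only the approval threshold.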

Our LAME_SPAM_HACK hack-attack signal can be removed, and the code for that integrated into this new system.

It would all be configurable: all the time factors and all the different spam certainty increments (including a separate increment for each detected spammy keyword).
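
A per-keyword configuration might be applied roughly as follows. The function name keyword_increment and the keyword-to-increment array format are assumptions for this sketch; how the mapping would actually be stored as configuration is not decided here.

```php
<?php
// Hypothetical sketch: each configured keyword carries its own certainty increment.
// Every keyword found in the post (case-insensitively) adds its increment once.
function keyword_increment(string $post, array $keyword_increments): int
{
    $total = 0;
    foreach ($keyword_increments as $keyword => $increment) {
        // Cast because PHP turns numeric-string array keys into integers
        if (stripos($post, (string) $keyword) !== false) {
            $total += $increment;
        }
    }
    return $total;
}
```

The result would then simply be one more contribution to the overall cumulative rating.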
Steps to reproduce

Additional information Here is some simple temporary code that we use unofficially on our own sites. It covers only a small subset of what the final system would do:

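require_code('antispam');
$hours_like_guest = 2; // Members who joined within this many hours are treated like guests
$post = post_param('post', '');
// Guests and very recent joiners are held to a stricter standard
$is_new = is_guest() || $GLOBALS['FORUM_DRIVER']->get_member_join_timestamp(get_member()) > time() - 60 * 60 * $hours_like_guest;
// Look for HTML links or [url tags in the post
$has_link = (strpos($post, '<a ') !== false) || (strpos($post, '[url') !== false);
if ($is_new && $has_link) {
    // Report a confidence equal to the configured approval threshold
    handle_perceived_spammer_by_confidence(get_ip_address(), floatval(get_option('spam_approval_threshold')) / 100.0, 'internal checks', false);
}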
Related to

#2057 - Delete member content on punishment form

Funded? No
