#290 - Spammer database – Composr CMS: Your Data, Your Privacy, Your Control

Identifier	#290
Issue type	Feature request or suggestion
Title	Spammer database
Status	Completed
Handling member	Chris Graham
Addon	core
Description	Look up IPs/browsers in a spammer database and block from posting accordingly.
Steps to reproduce
Funded?	No

By Guest posted 9th Oct 2011, 9:55 AM

I think it would be better to look up IP/username/email and flag, with an option to block the user (or quarantine, with an option to allow the user).

By Guest posted 10th Oct 2011, 7:40 PM

This should check both at member registration and for each post.

By Guest posted 10th Oct 2011, 8:55 PM

@Bobs: Most spammer databases contain only IP numbers, Email addresses, and user names. If you are checking for spammers at registration time, most likely none of them will get the chance to make posts.

A better spam filter would be needed for post submission checking. Perhaps something that would allow admins to enter the new word violation directly from the post.

By Guest posted 10th Oct 2011, 9:12 PM

I'm thinking of the situation where the above information is used to register and then the registration lies idle for a period of time. Checking again at time of post would catch those situations, although I guess you could also employ member ranks with appropriate permissions to catch most of this stuff.

By Guest posted 10th Oct 2011, 9:54 PM

I understand what you are thinking about. The extra queries at post submission would put a lot of added traffic on the database website, especially if you have a busy forum. Most spammers make it into the database only after they have done a lot of spamming, so flagging and quarantining them at registration would keep them from making any posts.

The mod I used on the old SMF allowed you to check any/all members against the database at any time, as well as at registration time.

By Guest posted 30th Dec 2011, 5:06 PM

What I can see here is that we could add a block and module for MaxMind, or another service, that will automatically look up the ISP/IP and whether they are spammers or not. This module would also have a block for sign-up with a CAPTCHA that automatically matches against their real IP and not a proxy IP. This can help to save you time. It will also stop them from posting on your site and forums, and in their signatures. Please note that MaxMind is not free, but they do have a free sign-up to get you started. Any questions?

By Guest posted 20th Mar 2012, 11:46 PM

I have been using the Cloudflare service, which has built-in support for anti-spam and other threats. They use the Project Honeypot service to issue a challenge-response. All unsuccessful challenges are left in quarantine for permanent disposition, or they can be left there indefinitely. I really like this system, which has caught a lot of threats I would otherwise have missed by just checking a spammer database.

I think it would be worthwhile seeing if you could build this service based on Project Honeypot with the possible inclusion of a honeypot to catch and report spammers back to the project. They have fairly demanding requirements to work with them (in order to protect their assets) but it might be worth looking into.

By Guest posted 6th Apr 2012, 1:07 AM

API info for Project Honey Pot's HTTP:BL

https://www.projecthoneypot.org/httpbl_api.php

By Guest posted 16th Apr 2012, 6:34 PM

"Most spammer databases contain only IP numbers, Email addresses, and user names"

As far as I can tell, it is only IP numbers. Do you know of one that does email addresses and user names? I suspect that'd be a privacy issue actually.

By Guest posted 16th Apr 2012, 6:37 PM

I am seeing some discussion here about checking at registration, or at posting. From my perspective I think it should be both. Both these things are relatively rare compared to page views, and checks are not awfully complex things.

I note a popular HTTP:BL implementation checks on every page view and apparently still has great performance (I think it probably relies on the server's DNS caching for that, as the checks happen via DNS).
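To illustrate the "checks happen via DNS" point, here is a minimal Python sketch of how an HTTP:BL lookup name is built and how the answer is decoded, as I understand the published API: the access key and the reversed IPv4 octets are prepended to `dnsbl.httpbl.org`, and a listed IP comes back as `127.<days>.<threat>.<type>`. The function and class names are made up for illustration, and the key shown in the usage note is a placeholder.

```python
import ipaddress
from typing import NamedTuple, Optional


class HttpBlResult(NamedTuple):
    days_since_last_activity: int
    threat_score: int
    visitor_type: int  # bitmask: 1 suspicious, 2 harvester, 4 comment spammer


def build_httpbl_query(access_key: str, ip: str) -> str:
    """Return the DNS name to resolve: key, reversed IPv4 octets, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # raises on invalid input
    return ".".join([access_key, *reversed(octets), "dnsbl.httpbl.org"])


def parse_httpbl_answer(answer: Optional[str]) -> Optional[HttpBlResult]:
    """Decode a 127.<days>.<threat>.<type> answer; None means 'not listed'."""
    if answer is None:
        return None
    first, days, threat, visitor_type = (int(part) for part in answer.split("."))
    if first != 127:
        return None  # not a valid HTTP:BL answer
    return HttpBlResult(days, threat, visitor_type)
```

In real use the name from `build_httpbl_query("yourkeyhere", visitor_ip)` would be resolved with something like `socket.gethostbyname`, treating a failed resolution as "not listed". Because it is an ordinary DNS query, repeat lookups can be answered from the resolver's cache, which is presumably why per-page-view checking stays cheap.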

By Guest posted 16th Apr 2012, 7:11 PM

NB: Also see discussion topic http://compo.sr/forum/topicview/misc/deploying/sponsorship-for-feature_2.htm

By Guest posted 16th Apr 2012, 7:51 PM

"Most spammer databases contain only IP numbers, Email addresses, and user names"

"As far as I can tell, it is only IP numbers. Do you know of one that does email addresses and user names? I suspect that'd be a privacy issue actually."

This is the mod I was using for my SMF based forum: http://custom.simplemachines.org/mods/index.php?mod=1547
It did an excellent job of weeding out the spammers. That mod pulls from the StopForumSpam database.

Here is a short list:
http://www.stopforumspam.com/
http://www.fspamlist.com/
http://www.spambusted.com/

Other resources:
http://akismet.com/development/api/
http://www.anatoa.com/
http://www.block-disposable-email.com/
http://blogspam.net/api/
http://www.easyantispam.com/wiki/api:home
http://spamid.servebeer.com:8081/spamid/spamid/apis.jsp
http://blocklistpro.com/
http://dnsbl.tornevall.org/

By Guest posted 16th Apr 2012, 8:13 PM

"I am seeing some discussion here about checking at registration, or at posting. From my perspective I think it should be both. Both these things are relatively rare compared to page views, and checks are not awfully complex things."

You need to watch out for a maximum amount of free queries from these databases. Some of these databases have a limit on the number of queries per day. Checking at post time may not seem like much, but when you have 100's or 1000's of forums querying a database at each post it could create an extra load on the database provider that the provider may not be happy about. Most of these databases are provided free of charge as a service to forum administrators.
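One way to stay under such limits is to cache verdicts locally for a while, so the same IP is not sent to the provider on every post. The following is only a rough Python sketch: `CachedLookup`, its `lookup` callback, and the `remote_calls` counter are hypothetical names, and the real remote query is left to the caller.

```python
import time
from typing import Callable, Dict, Tuple


class CachedLookup:
    """Wrap a remote spammer lookup and remember each verdict for ttl_seconds."""

    def __init__(self, lookup: Callable[[str], bool], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup          # hypothetical function that queries the provider
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so the expiry can be tested
        self._cache: Dict[str, Tuple[float, bool]] = {}
        self.remote_calls = 0          # how many queries actually left the server

    def is_spammer(self, ip: str) -> bool:
        now = self._clock()
        hit = self._cache.get(ip)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]              # still fresh: no remote query needed
        self.remote_calls += 1
        verdict = self._lookup(ip)
        self._cache[ip] = (now, verdict)
        return verdict
```

With a cache time of an hour or more, a busy forum would query the provider once per visitor per period rather than once per post.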

If you are stopping them at registration time, they will not get the chance to create a spam post. If by chance you do get a spam post, more than likely that poster is using a new IP/email/username that was never on a database to begin with, and you will most likely miss catching them at post time. A spammer can make many posts on different forums before they are caught and reported, provided they are even reported.

A busy forum should have active moderators catching the very few spammers that will slip through past detection.

By Guest posted 16th Apr 2012, 9:00 PM

"You need to watch out for a maximum amount of free queries from these databases. Some of these databases have a limit on the number of queries per day. Checking at post time may not seem like much, but when you have 100's or 1000's of forums querying a database at each post it could create an extra load on the database provider that the provider may not be happy about. Most of these databases are provided free of charge as a service to forum administrators."

This is an advantage for Project Honey Pot/HTTP:BL which, to my knowledge, does not impose a quota although they do encourage high-traffic sites to download the BL to their DNS servers and keep them synced.

I agree that whatever automated solutions are developed, manual moderation is still a requirement but, hopefully, with a much smaller number of issues.

By Guest posted 18th Apr 2012, 3:08 PM

Ok so this is currently being implemented, 60% done so far.

However, before I forget, I want to mention that this is going to do a DB version jump of one of the core modules, so when the patch is released it'll be necessary to run a database upgrade in the upgrader.

This code is not going to be in v8. A patch will be released for v8, for people wanting to try it ahead of whenever it is released (v9?). This is as per new policy: feature sponsorship results in supported patches for the latest version at the patch construction time, and the code is added to Composr's unstable branch, but release roadmaps are unaffected.

By Guest posted 18th Apr 2012, 3:09 PM

"This is an advantage for Project Honey Pot/HTTP:BL which, to my knowledge, does not impose a quota although they do encourage high-traffic sites to download the BL to their DNS servers and keep them synced."

There'll be an option controlling when it happens. stopwebspam will never run for all page views, but RBL checks will be able to.
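As a rough Python sketch of how such an option could gate the two kinds of check: the level names follow the "Spammer checking level" values listed in the test set later in this thread, but the ordering of the levels, the event names, and the assumption that RBL checks also run wherever the spammer-database check runs are all guesses for illustration, not the actual implementation.

```python
from enum import IntEnum
from typing import Set


class CheckLevel(IntEnum):
    # Assumed ordering, from least to most checking
    NEVER = 0
    GUEST_ACTIONS = 1
    ACTIONS = 2
    EVERY_PAGE_VIEW = 3


def checks_to_run(level: CheckLevel, event: str, is_guest: bool) -> Set[str]:
    """Return which checks apply; event is "page_view" or "action" (join/post)."""
    if event not in ("page_view", "action"):
        raise ValueError(f"unknown event: {event}")
    if level == CheckLevel.NEVER:
        return set()
    if event == "page_view":
        # The spammer database is never queried on plain page views; RBLs can be.
        return {"rbl"} if level == CheckLevel.EVERY_PAGE_VIEW else set()
    # Joining or posting: guests are checked from GUEST_ACTIONS up, members from ACTIONS up.
    if level >= CheckLevel.ACTIONS or is_guest:
        return {"rbl", "spammer_db"}
    return set()
```

For example, `checks_to_run(CheckLevel.EVERY_PAGE_VIEW, "page_view", False)` yields only the RBL check, which matches the point that the heavier database query stays off the page-view path.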

By Guest posted 18th Apr 2012, 3:10 PM

Btw, thanks sholzy, as you may have noticed your notes were incorporated into the plans.

By Guest posted 19th Apr 2012, 5:38 AM

Code written. Code Quality Checker passed. Test set written (but not run through yet)...

"Spammer checking level": "Every page view", really does run RBL checks on each page view
"Spammer checking level": "Every page view", does not run Stop Web Spam checks on each page view
"Spammer checking level": "Every page view", does run Stop Web Spam checks on posting
"Spammer checking level": "Actions", really does run Stop Web Spam checks on posting as a member
"Spammer checking level": "Guest Actions", really does run Stop Web Spam checks on posting as a Guest
"Spammer checking level": "Guest Actions", really does run Stop Web Spam checks on joining
"Spammer checking level": "Never", really does not run RBL checks on joining
"Spammer checking level": "Never", really does not run Stop Web Spam checks on joining
RBL check works for tornevall, with a confidence equal to the "Implied spammer confidence" option
RBL check works for HTTP:BL with a correct confidence level (HTTP:BL needs setting up in config first, with a key)
If an invalid RBL is configured, it does not kill Composr, but it does send an error notification
If an RBL check bans a spammer, it is only for as long as the configured "Block list cache time"
If an IP is in "Spammer checking exclusions", it is not checked against RBLs
If an IP is in "Spammer checking exclusions", it is not checked against Stop Web Spam
If an IP is in "Spammer checking exclusions", any existing IP bans for it will be ignored
HTTP:BL bans over the "Spammer ban threshold" result in bans
Bans result in appropriate ban notifications indicating the reason and IP
HTTP:BL bans over the "Spammer block threshold" but less than "Spammer ban threshold" result in blocks
Blocks result in appropriate block notifications indicating the reason and IP
HTTP:BL bans over the "Spammer approval threshold" but less than "Spammer block threshold" result in content requiring approval, even for an admin
Approval-requires result in appropriate approval notifications indicating the reason and IP
Stop Web Spam results older than "Spammer staleness threshold" but above an action threshold do not result in any action
Stop Web Spam results newer than "Spammer staleness threshold" but above an action threshold do result in an action
If "Honeypot URL" is configured, honeypots are correctly advertised
If "Honeypot URL" is configured, honeypot URL injection methods are different on different pages, but are constant on each particular page
If the "Check usernames against known spammers" option is enabled then known Stop Web Spam usernames will be blocked on joining
If the "Check usernames against known spammers" option is disabled then known Stop Web Spam usernames will not be blocked on joining
Stop Web Spam email addresses will be blocked on joining
If a service request to Stop Web Spam fails, an error notification is sent
If "Blackhole detection" is enabled, fiddling with browser developer tools to fill up the blackhole will result in a hack-attack alert
If "Blackhole detection" is enabled, NOT fiddling with browser developer tools to fill up the blackhole will NOT result in a hack-attack alert
The Blackhole is marked up so as not to be visible
The Blackhole is marked up so that someone with a screen reader would not accidentally fill it in
Ban syndication not available from the action log if no key provided
Ban syndication works from the action log
Ban syndication not available from investigate user if no key provided
Ban syndication works from investigate user
Ban syndication not available from punish member if no key provided
Ban syndication works from punish member
IP ban management correctly shows the temporary bans, with all the details required (including IP, expiry time, and block reason) but they are uneditable directly
When saving IP ban management, temporary bans are not wiped
Marking a trackback as spam results in ban syndication
"Scattergun link injection" spam (detected spam that creates a hack-attack in Composr) results in ban syndication
The privacy policy mentions spam checks, if not set to 'Never'
The privacy policy does not mention spam checks, if set to 'Never'
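The threshold and staleness items in the list above boil down to a small decision function. This is a minimal Python sketch of that logic only; the `Thresholds` class, the action names, and the numbers in the usage note are hypothetical, and treating a score exactly equal to a threshold as falling into the lower tier is an assumption (the list only says "over" and "less than").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    approval: float       # "Spammer approval threshold"
    block: float          # "Spammer block threshold"
    ban: float            # "Spammer ban threshold"
    staleness_days: int   # "Spammer staleness threshold"


def decide_action(confidence: float, days_since_last_seen: int, t: Thresholds) -> str:
    """Map a spammer confidence score to "ban", "block", "require_approval" or "none"."""
    if days_since_last_seen > t.staleness_days:
        return "none"  # stale results never trigger an action
    if confidence > t.ban:
        return "ban"
    if confidence > t.block:
        return "block"
    if confidence > t.approval:
        return "require_approval"
    return "none"
```

With made-up settings such as `Thresholds(approval=20, block=50, ban=80, staleness_days=30)`, a fresh score of 60 would give a block, while the same score last seen 31 days ago would give no action.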

By Guest posted 19th Apr 2012, 6:13 AM

Uploaded an image of the config options. As you can see, we now have some serious power, and a new USP for Composr :).

By Guest posted 19th Apr 2012, 6:25 AM

"Btw, thanks sholzy, as you may have noticed your notes were incorporated into the plans."

Welcome. :-)
I might not have been able to help sponsor this monetarily, but I was able to help sponsor in another way -- sharing my knowledge from research.

By Guest posted 19th Apr 2012, 6:41 AM

I may need to unblock the Punjab region spammers (182.178.*.*, 110.36.*.*, 182.177.*.*) to try out this mod. They were the only real problem I had after moving the old SMF forum to Composr.

By Guest posted 19th Apr 2012, 9:31 AM

In the image and the description, there is reference to Stop Web Spam. Is this supposed to be Stop Forum Spam or is this something different?

By Guest posted 19th Apr 2012, 9:33 AM

Good catch :).

By Guest posted 19th Apr 2012, 10:16 AM

I saw that and was thinking Chris added a new feature - a spam filter for the whole internet. ;-)

By Guest posted 19th Apr 2012, 11:04 AM

Somewhat irritatingly, Stop Forum Spam API submission requires all of IP address, username, and email address. It is therefore not suitable for automated submission of detected Guest spammers.

http://www.stopforumspam.com/forum/viewtopic.php?id=2256

So Tornevall API support will also be there. We already support the Tornevall RBL, so that means we can both feed off it and feed into it.

HOWEVER! It requires PHP to have the SoapClient installed, and I'm also not sure how readily they give out API keys. You have to email to ask for access and I'm still awaiting mine.

By Guest posted 19th Apr 2012, 11:32 AM

This is one of the reasons that I like HTTP:BL (and probably others), as they seem anxious to identify potential threats and start reporting on them. As much as I like all the new features, I will continue using Cloudflare to catch the most egregious scoundrels, plus it provides for efficient blocking by country. This will produce a nice "funnel" where IPs let through are challenged using a separate system (Stop Forum Spam), which parallels what I currently do manually.

I can sort of see where StopForumSpam is coming from: they want to be a spammer database only. I just question whether that is a good long-term strategy.

By Guest posted 19th Apr 2012, 12:15 PM

Done! Well, my time estimate was off by a factor of 4, but I did a good job ;).

By Guest posted 19th Apr 2012, 12:21 PM

Before I forget to mention, the patch I post will be compatible with the next v8 RC. I found a few small v8 bugs whilst doing this, and those will be fixed in there (if I put them into the patch, it'll create conflicts later).

By Guest posted 19th Apr 2012, 12:26 PM

Actually, ignore that. I'm going to post the patch now, and the above statement is true. However I will roll in the handful of v8 RC5 fixes to the zip I attach later on today.

In theory the attached zip will be fine with v8-final as well as v8 RC6, as we are so close to release now.
