#290 - Spammer database

This is a spacer post for a website comment topic. The content this topic relates to: #290 - Spammer database
I think it would be better to look up the IP/username/email and flag any match, with an option to block the user (or quarantine them, with an option to allow the user).
This should check both at member registration and for each post.
@Bobs: Most spammer databases contain only IP numbers, Email addresses, and user names. If you are checking for spammers at registration time, most likely none of them will get the chance to make posts.

A better spam filter would be needed for post submission checking. Perhaps something that would allow admins to enter the new word violation directly from the post.
I'm thinking of the situation where the above information is used to register and then the registration lies idle for a period of time. Checking again at time of post would catch those situations, although I guess you could also employ member ranks with appropriate permissions to catch most of this stuff.
I understand what you are thinking about. The extra queries at post submission would put a lot of added traffic on the database website, especially if you have a busy forum. Most spammers make it into the database after they have done a lot of spamming, and flagging and quarantining them at registration would keep them from making any posts.

The mod I used on the old SMF allowed you to check any/all members against the database at any time, as well as at registration time.

What I can see here is that we could add a block and a module for MaxMind, or another service, that will automatically look up the ISP/IP and whether the visitor is a spammer or not. The module would also have a sign-up block with a CAPTCHA that matches against their real IP and not a proxy IP. This can help to save you time. It will also stop them from posting on your site and forums, and in their signatures. Please note that MaxMind is not free, but they do have a free sign-up to get you started. Any questions?
I have been using the Cloudflare service, which has built-in support for anti-spam and other threats. They use the Project Honey Pot service to issue a challenge-response. All unsuccessful challenges are left in quarantine for permanent disposition, or they can be left there indefinitely. I really like this system, which has caught a lot of threats I would otherwise have missed by just checking a spammer database.

I think it would be worthwhile seeing if you could build this service based on Project Honey Pot, with the possible inclusion of a honeypot to catch and report spammers back to the project. They have fairly demanding requirements to work with them (in order to protect their assets) but it might be worth looking into.
API info for Project Honey Pot's HTTP:BL

https://www.projecthoneypot.org/httpbl_api.php
"Most spammer databases contain only IP numbers, Email addresses, and user names"

As far as I can tell, it is only IP numbers. Do you know of one that does email addresses and user names? I suspect that'd be a privacy issue actually.
I am seeing some discussion here about checking at registration, or at posting. From my perspective I think it should be both. Both these things are relatively rare compared to page views, and checks are not awfully complex things.

I note a popular HTTP:BL implementation checks on every page view and apparently still has great performance (I think it probably relies on the server's DNS caching for that, as the checks happen via DNS).
NB: Also see discussion topic http://compo.sr/forum/topicview/misc/deploying/sponsorship-for-feature_2.htm
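To illustrate the point about checks happening via DNS, here is a minimal Python sketch of an HTTP:BL lookup as I understand the public API linked above: the query name is the access key, then the reversed IPv4 octets, then the `dnsbl.httpbl.org` zone, and a listed IP answers with `127.<days>.<threat>.<type>`. The function names and the access key used in the usage note are placeholders for illustration, not anything from Composr.

```python
import ipaddress
import socket

HTTPBL_ZONE = "dnsbl.httpbl.org"


def build_httpbl_query(access_key: str, ip: str) -> str:
    """Return the DNS name to resolve for an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join([access_key, *reversed(octets), HTTPBL_ZONE])


def parse_httpbl_answer(answer: str) -> dict:
    """Decode an answer of the form 127.<days>.<threat>.<type>."""
    first, days, threat, visitor_type = (int(p) for p in answer.split("."))
    if first != 127:
        raise ValueError("not a valid HTTP:BL answer: " + answer)
    return {
        "days_since_last_activity": days,
        # For search engines (type 0) this octet is an engine identifier
        # rather than a threat score.
        "threat_score": threat,
        "is_search_engine": visitor_type == 0,
        "is_suspicious": bool(visitor_type & 1),
        "is_harvester": bool(visitor_type & 2),
        "is_comment_spammer": bool(visitor_type & 4),
    }


def lookup(access_key: str, ip: str, resolve=socket.gethostbyname):
    """Return the decoded answer, or None when the IP is not listed."""
    try:
        answer = resolve(build_httpbl_query(access_key, ip))
    except socket.gaierror:
        return None  # no record: the IP is not in the list
    return parse_httpbl_answer(answer)
```

Calling `lookup("yourkey", "203.0.113.7")` would go through the server's normal resolver, so repeated checks for the same IP can be answered from the DNS cache rather than hitting the service each time.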
"Most spammer databases contain only IP numbers, Email addresses, and user names"

"As far as I can tell, it is only IP numbers. Do you know of one that does email addresses and user names? I suspect that'd be a privacy issue actually."

This is the mod I was using for my SMF based forum: http://custom.simplemachines.org/mods/index.php?mod=1547
It did an excellent job of weeding out the spammers. That mod pulls from the StopForumSpam database.

Here is a short list:
http://www.stopforumspam.com/
http://www.fspamlist.com/
http://www.spambusted.com/

Other resources:
http://akismet.com/development/api/
http://www.anatoa.com/
http://www.block-disposable-email.com/
http://blogspam.net/api/
http://www.easyantispam.com/wiki/api:home
http://spamid.servebeer.com:8081/spamid/spamid/apis.jsp
http://blocklistpro.com/
http://dnsbl.tornevall.org/
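For anyone wondering what a lookup against one of these looks like, here is a rough Python sketch for the StopForumSpam query API. The endpoint and the response fields (`success`, `appears`, `confidence`, `lastseen`) are written from my understanding of their public API and should be checked against their current documentation; the function names are made up for illustration.

```python
import json
from urllib.parse import urlencode

SFS_ENDPOINT = "https://api.stopforumspam.org/api"


def build_sfs_url(ip=None, email=None, username=None) -> str:
    """Build a query URL for whichever identifiers are supplied."""
    params = {}
    if ip:
        params["ip"] = ip
    if email:
        params["email"] = email
    if username:
        params["username"] = username
    if not params:
        raise ValueError("need at least one of ip, email, username")
    return SFS_ENDPOINT + "?" + urlencode(params) + "&json"


def parse_sfs_response(body: str) -> dict:
    """Reduce a JSON response to listed/confidence/last_seen per field."""
    data = json.loads(body)
    if not data.get("success"):
        raise RuntimeError("StopForumSpam lookup failed")
    result = {}
    for field in ("ip", "email", "username"):
        entry = data.get(field)
        if entry is not None:
            result[field] = {
                "listed": bool(entry.get("appears")),
                "confidence": float(entry.get("confidence", 0.0)),
                "last_seen": entry.get("lastseen"),
            }
    return result
```

The URL can be fetched with any HTTP client; the sketch keeps the network call out so the parsing can be tried against a saved response.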
"I am seeing some discussion here about checking at registration, or at posting. From my perspective I think it should be both. Both these things are relatively rare compared to page views, and checks are not awfully complex things."

You need to watch out for a maximum amount of free queries from these databases. Some of these databases have a limit on the number of queries per day. Checking at post time may not seem like much, but when you have 100's or 1000's of forums querying a database at each post it could create an extra load on the database provider that the provider may not be happy about. Most of these databases are provided free of charge as a service to forum administrators.

If you are stopping them at registration time, they will not get the chance to create a spam post. If by chance you do get a spam post, more than likely that poster is using a new IP/email/username that was never on a database to begin with, and you will most likely miss catching them at post time. A spammer can make many posts on different forums before they are caught and reported, providing they are even reported.

A busy forum should have active moderators catching the very few spammers that will slip through past detection.
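One way to keep the number of remote queries down, if checks do run at post time, is to cache each result for a while. This is only a sketch of the idea in Python, not how any particular forum software does it; `CachedSpamLookup`, its parameters, and the `remote_queries` counter are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class CachedSpamLookup:
    """Wrap a remote lookup so repeat checks within the TTL cost no query."""

    def __init__(self, lookup: Callable[[str], bool], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, bool]] = {}
        self.remote_queries = 0  # how many times the provider was asked

    def check(self, key: str) -> bool:
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no remote query
        result = self._lookup(key)
        self.remote_queries += 1
        self._cache[key] = (now, result)
        return result
```

With a cache like this, a member who posts ten times in an hour costs the database provider one query rather than ten, at the price of a result that can be up to `ttl_seconds` old.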
"You need to watch out for a maximum amount of free queries from these databases. Some of these databases have a limit on the number of queries per day. Checking at post time may not seem like much, but when you have 100's or 1000's of forums querying a database at each post it could create an extra load on the database provider that the provider may not be happy about. Most of these databases are provided free of charge as a service to forum administrators."

This is an advantage for Project Honey Pot/HTTP:BL which, to my knowledge, does not impose a quota although they do encourage high-traffic sites to download the BL to their DNS servers and keep them synced.

I agree that whatever automated solutions are developed, manual moderation is still a requirement but, hopefully, with a much smaller number of issues.
Ok so this is currently being implemented, 60% done so far.

However before I forget I want to mention that this is going to do a DB version jump of one of the core modules, so when the patch is released it'll be necessary to run a database upgrade in the upgrader.

This code is not going to be in v8. A patch will be released for v8, for people wanting to try it ahead of whenever it is released (v9?). This is as per new policy: feature sponsorship results in supported patches for the latest version at the patch construction time, and the code is added to Composr's unstable branch, but release roadmaps are unaffected.
"This is an advantage for Project Honey Pot/HTTP:BL which, to my knowledge, does not impose a quota although they do encourage high-traffic sites to download the BL to their DNS servers and keep them synced."

There'll be an option controlling when it happens. stopwebspam will never run for all page views, but RBL checks will be able to.
Btw, thanks sholzy, as you may have noticed your notes were incorporated into the plans.
Code written. Code Quality Checker passed. Test set written (but not run through yet)...


"Spammer checking level": "Every page view", really does run RBL checks on each page view
"Spammer checking level": "Every page view", does not run Stop Web Spam checks on each page view
"Spammer checking level": "Every page view", does run Stop Web Spam checks on posting
"Spammer checking level": "Actions", really does run Stop Web Spam checks on posting as a member
"Spammer checking level": "Guest Actions", really does run Stop Web Spam checks on posting as a Guest
"Spammer checking level": "Guest Actions", really does run Stop Web Spam checks on joining
"Spammer checking level": "Never", really does not run RBL checks on joining
"Spammer checking level": "Never", really does not run Stop Web Spam checks on joining
RBL check works for tornevall, with a confidence equal to the "Implied spammer confidence" option
RBL check works for HTTP:BL with a correct confidence level (HTTP:BL needs setting up in config first, with a key)
If an invalid RBL is configured, it does not kill Composr, but it does send an error notification
If an RBL check bans a spammer, it is only for as long as the configured "Block list cache time"
If an IP is in "Spammer checking exclusions", it is not checked against RBLs
If an IP is in "Spammer checking exclusions", it is not checked against Stop Web Spam
If an IP is in "Spammer checking exclusions", any existing IP bans for it will be ignored
HTTP:BL confidence levels over the "Spammer ban threshold" result in bans
Bans result in appropriate ban notifications indicating the reason and IP
HTTP:BL confidence levels over the "Spammer block threshold" but less than "Spammer ban threshold" result in blocks
Blocks result in appropriate block notifications indicating the reason and IP
HTTP:BL confidence levels over the "Spammer approval threshold" but less than "Spammer block threshold" result in content requiring approval, even for an admin
Approval-requires result in appropriate approval notifications indicating the reason and IP
Stop Web Spam results older than "Spammer staleness threshold" but above an action threshold do not result in any action
Stop Web Spam results newer than "Spammer staleness threshold" and above an action threshold do result in action
If "Honeypot URL" is configured, honeypots are correctly advertised
If "Honeypot URL" is configured, honeypot URL injection methods are different on different pages, but are constant on each particular page
If the "Check usernames against known spammers" option is enabled then known Stop Web Spam usernames will be blocked on joining
If the "Check usernames against known spammers" option is disabled then known Stop Web Spam usernames will not be blocked on joining
Stop Web Spam email addresses will be blocked on joining
If a service request to Stop Web Spam fails, an error notification is sent
If "Blackhole detection" is enabled, fiddling with browser developer tools to fill up the blackhole will result in a hack-attack alert
If "Blackhole detection" is enabled, NOT fiddling with browser developer tools to fill up the blackhole will NOT result in a hack-attack alert
The Blackhole is marked up so as not to be visible
The Blackhole is marked up so that someone with a screen reader would not accidentally fill it in
Ban syndication not available from the action log if no key provided
Ban syndication works from the action log
Ban syndication not available from investigate user if no key provided
Ban syndication works from investigate user
Ban syndication not available from punish member if no key provided
Ban syndication works from punish member
IP ban management correctly shows the temporary bans, with all the details required (including IP, expiry time, and block reason) but they are uneditable directly
When saving IP ban management, temporary bans are not wiped
Marking a trackback as spam results in ban syndication
"Scattergun link injection" spam (detected spam that creates a hack-attack in Composr) results in ban syndication
The privacy policy mentions spam checks, if not set to 'Never'
The privacy policy does not mention spam checks, if set to 'Never'
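The threshold and staleness items in the test list above can be read as a small decision function. The Python below is only my sketch of that logic as the test list describes it, not the actual Composr code; the default numbers, the action names, and the reading of "over" as strictly greater than are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class SpamPolicy:
    # Hypothetical defaults; the real option values are site configuration.
    approval_threshold: float = 40.0
    block_threshold: float = 60.0
    ban_threshold: float = 80.0
    staleness_days: int = 31


def decide_action(confidence: float, policy: SpamPolicy,
                  last_seen: Optional[datetime] = None,
                  now: Optional[datetime] = None) -> str:
    """Map a spammer confidence level to ban/block/require_approval/allow."""
    if last_seen is not None:
        now = now or datetime.now()
        if now - last_seen > timedelta(days=policy.staleness_days):
            return "allow"  # stale result: no action, whatever the confidence
    if confidence > policy.ban_threshold:
        return "ban"
    if confidence > policy.block_threshold:
        return "block"
    if confidence > policy.approval_threshold:
        return "require_approval"
    return "allow"
```

For example, with the defaults above a confidence of 70 gives "block", while the same confidence on a result last seen a year ago gives "allow".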

Uploaded an image of the config options. As you can see, we now have some serious power, and a new USP for Composr :).
"Btw, thanks sholzy, as you may have noticed your notes were incorporated into the plans."

Welcome. :-)
I might not have been able to help sponsor this monetarily, but I was able to help sponsor in another way -- sharing my knowledge from research.
I may need to unblock the Punjab region spammers (182.178.*.*, 110.36.*.*, 182.177.*.*) to try out this mod. They were the only real problem I had after moving the old SMF forum to Composr.
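For reference, wildcard ranges written like the ones above are easy to test an address against. This is a small Python sketch that assumes the `a.b.*.*` pattern style used in this post; whether Composr's ban or exclusion options accept exactly this syntax is not stated in this thread, and `ip_matches` is a made-up helper name.

```python
import ipaddress


def ip_matches(ip: str, pattern: str) -> bool:
    """Return True if an IPv4 address matches a pattern like 182.178.*.*"""
    ip_parts = str(ipaddress.IPv4Address(ip)).split(".")
    pattern_parts = pattern.split(".")
    if len(pattern_parts) != 4:
        raise ValueError("pattern must have four dot-separated parts")
    # Each pattern part is either a wildcard or must equal the octet exactly.
    return all(p == "*" or p == octet
               for p, octet in zip(pattern_parts, ip_parts))


def ip_in_any(ip: str, patterns) -> bool:
    """Return True if the address matches at least one pattern."""
    return any(ip_matches(ip, pattern) for pattern in patterns)
```

Comparing octet by octet, rather than using a general glob match, avoids a `*` accidentally spanning more than one octet.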
In the image and the description, there is a reference to Stop Web Spam. Is this supposed to be Stop Forum Spam, or is this something different?
Good catch :).
I saw that and was thinking Chris added a new feature - a spam filter for the whole internet. ;-)
Somewhat irritatingly, Stop Forum Spam API submission requires all of IP address, username, and email address. It is therefore not suitable for automated submission of detected Guest spammers.

http://www.stopforumspam.com/forum/viewtopic.php?id=2256

So Tornevall API support will also be there. We support the Tornevall RBL already, so that means we can both feed off and into this.

HOWEVER! It requires PHP to have the SoapClient installed, and I'm also not sure how readily they give out API keys. You have to email to ask for access and I'm still awaiting mine.
This is one of the reasons that I like HTTP:BL (and probably others), as they seem anxious to identify potential threats and start reporting on them. As much as I like all the new features, I will continue using Cloudflare to catch the most egregious scoundrels, plus it provides for efficient blocking by country. This will produce a nice "funnel" where IPs let through are challenged using a separate system (stopforumspam), which parallels what I currently do manually.

I can sort of see where StopForumSpam is coming from: they want to be a spammer database only. I just question whether that is a good long-term strategy.
Done! Well, my time estimate was off by a factor of 4, but I did a good job ;).
Before I forget to mention, the patch I post will be compatible with the next v8 RC. I found a few small v8 bugs whilst doing this, and those will be fixed in there (if I put them into the patch, it'll create conflicts later).
