#415 - robots.txt editor (formerly: "Allow exclusion of contents from sitemap for all content types")
| Identifier | #415 |
|---|---|
| Issue type | Feature request or suggestion |
| Title | robots.txt editor (formerly: "Allow exclusion of contents from sitemap for all content types") |
| Status | Closed (rejected) |
| Tags | Type: SEO (custom) |
| Handling member | Chris Graham |
| Addon | core |
| Description | Currently, the sitemap generated by Composr includes all content with Guest view access. There are situations where you want content to be available to guests on the site but do not want it indexed by the search engines. The only current solution is to place a restriction in the robots.txt file, which works but also causes an inconsistency: Google warns that items in the sitemap are excluded by entries in robots.txt. For Bing, I suspect this would result in "errors" which, if they exceed a certain percentage of your sitemap entries, lead Bing to reject your sitemap.
I think the better approach to addressing this is to allow the admin to mark content as included/excluded when it is created, and to have the sitemap generation script respect that exclusion in addition to any current logic (see the sketches after this table). This would provide the means to have a "custom" sitemap automatically generated by Composr. It should work for all content types and support both category and individual entry exclusions. |
| Steps to reproduce | |
| Funded? | No |
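The inconsistency described above can be reproduced with a robots.txt like the following. This is a hypothetical example: the `Disallow` path and sitemap URL are made up, and the exact Search Console wording may differ.

```
# Guests can still open pages under /members-lounge/ in a browser, but
# crawlers are told not to fetch them.
User-agent: *
Disallow: /members-lounge/

# If the same URLs also appear in the XML Sitemap referenced here, Google
# reports them as sitemap entries blocked by robots.txt.
Sitemap: https://example.com/sitemap.xml
```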
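As for the proposed exclusion flag, a minimal sketch of how sitemap generation could honour it is shown below. This is not Composr's actual code: the `exclude_from_sitemap` field, the row structure, and the function name are all assumptions for illustration, written in Python rather than the PHP Composr is built in.

```python
# Sketch only: a hypothetical per-content exclusion flag, checked during
# sitemap generation alongside the existing guest-visibility rule.
from xml.etree import ElementTree as ET

def build_sitemap(rows):
    """Emit a sitemap <urlset>, skipping rows flagged for exclusion."""
    urlset = ET.Element('urlset',
                        xmlns='http://www.sitemaps.org/schemas/sitemap/0.9')
    for row in rows:
        # Existing logic: only guest-viewable content is eligible at all.
        if not row['guest_viewable']:
            continue
        # Proposed logic: honour the hypothetical per-content flag.
        if row.get('exclude_from_sitemap'):
            continue
        url = ET.SubElement(urlset, 'url')
        ET.SubElement(url, 'loc').text = row['url']
    return ET.tostring(urlset, encoding='unicode')

rows = [
    {'url': 'https://example.com/page-a', 'guest_viewable': True},
    # Viewable by guests, yet kept out of the sitemap:
    {'url': 'https://example.com/page-b', 'guest_viewable': True,
     'exclude_from_sitemap': True},
]
print(build_sitemap(rows))  # only page-a appears in the output
```

With a flag like this, content can stay guest-viewable on the site while never entering the sitemap, so no robots.txt entry (and no resulting Google/Bing warning) is needed for it.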
Comments
Having an editor for robots.txt inside Composr would not hurt.
A simple robots.txt editor is now implemented for v11, which will make editing robots.txt a little easier.
I thought about what was written here about keeping robots.txt and the XML Sitemap in sync, and about having per-content options that exclude content from both robots.txt and the XML Sitemap. The problem is that it assumes a binary: the content is either crawled by everything or by nothing. robots.txt allows specifying which crawlers have access to which content, and the XML Sitemap is not specifically for crawlers (it could be used by an HTML validation tool, for example). So a single flag doesn't line up well with the necessary flexibility of these formats.
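As a hypothetical illustration of that flexibility (the crawler name and path are made up), robots.txt can grant or deny access per crawler, which a single include/exclude flag on content cannot express:

```
# Block one specific crawler from a section...
User-agent: Googlebot
Disallow: /docs/internal/

# ...while leaving every other crawler unrestricted
# (an empty Disallow means "allow everything").
User-agent: *
Disallow:
```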
Because that implementation would be problematic, I think just putting robots.txt in the hands of the user, while helping them to edit it, is the correct approach. It is also easier for us to do.