#415 - robots.txt editor (formerly: "Allow exclusion of contents from sitemap for all content types")
| Identifier | #415 |
|---|---|
| Issue type | Feature request or suggestion |
| Title | robots.txt editor (formerly: "Allow exclusion of contents from sitemap for all content types") |
| Status | Closed (rejected) |
| Tags | Type: SEO (custom) |
| Handling member | Chris Graham |
| Addon | core |
| Description | Currently, the sitemap generated by Composr includes all content with Guest view access. There are situations where you want content to be available to guests on the site but do not want it indexed by the search engines. The only current solution is to place a restriction in the robots.txt file, which works but also causes an inconsistency: Google warns that items in the sitemap are excluded by entries in robots.txt. For Bing, I suspect this would result in "errors" which, if they exceed a certain percentage of your sitemap entries, lead Bing to reject your sitemap.
I think the better approach to addressing this is to allow the admin to mark content as included/excluded when it is created, and to have the sitemap generation script respect that exclusion in addition to any current logic (see the sketches after this table). This would provide the means to have a "custom" sitemap automatically generated by Composr. It should work for all content types and support both category and individual entry exclusions. |
| Steps to reproduce | |
| Funded? | No |
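The inconsistency described above can be reproduced with a robots.txt like the following. This is a hypothetical example: the `Disallow` path and sitemap URL are made up, and the exact Search Console wording may differ.

```
# Guests can still open pages under /members-lounge/ in a browser, but
# crawlers are told not to fetch them.
User-agent: *
Disallow: /members-lounge/

# If the same URLs also appear in the XML Sitemap referenced here, Google
# reports them as sitemap entries blocked by robots.txt.
Sitemap: https://example.com/sitemap.xml
```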
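As for the proposed exclusion flag, a minimal sketch of how sitemap generation could honour it is shown below. This is not Composr's actual code: the `exclude_from_sitemap` field, the row structure, and the function name are all assumptions for illustration, written in Python rather than the PHP Composr is built in.

```python
# Sketch only: a hypothetical per-content exclusion flag, checked during
# sitemap generation alongside the existing guest-visibility rule.
from xml.etree import ElementTree as ET

def build_sitemap(rows):
    """Emit a sitemap <urlset>, skipping rows flagged for exclusion."""
    urlset = ET.Element('urlset',
                        xmlns='http://www.sitemaps.org/schemas/sitemap/0.9')
    for row in rows:
        # Existing logic: only guest-viewable content is eligible at all.
        if not row['guest_viewable']:
            continue
        # Proposed logic: honour the hypothetical per-content flag.
        if row.get('exclude_from_sitemap'):
            continue
        url = ET.SubElement(urlset, 'url')
        ET.SubElement(url, 'loc').text = row['url']
    return ET.tostring(urlset, encoding='unicode')

rows = [
    {'url': 'https://example.com/page-a', 'guest_viewable': True},
    # Viewable by guests, yet kept out of the sitemap:
    {'url': 'https://example.com/page-b', 'guest_viewable': True,
     'exclude_from_sitemap': True},
]
print(build_sitemap(rows))  # only page-a appears in the output
```

With a flag like this, content can stay guest-viewable on the site while never entering the sitemap, so no robots.txt entry (and no resulting Google/Bing warning) is needed for it.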
Comments
Having an editor for robots.txt inside Composr would not hurt.
A simple robots.txt editor is now implemented for v11, which will make editing robots.txt a little easier.
I thought about what was written here about keeping robots.txt and the XML Sitemap in sync, and about having per-content options that exclude content from both robots.txt and the XML Sitemap. The problem is that it assumes a binary: the content is either crawled by everything or by nothing. robots.txt allows specifying which crawlers have access to which content, and the XML Sitemap is not specifically for crawlers (it could be used by an HTML validation tool, for example). So a single flag doesn't line up well with the necessary flexibility of these formats.
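As a hypothetical illustration of that flexibility (the crawler name and path are made up), robots.txt can grant or deny access per crawler, which a single include/exclude flag on content cannot express:

```
# Block one specific crawler from a section...
User-agent: Googlebot
Disallow: /docs/internal/

# ...while leaving every other crawler unrestricted
# (an empty Disallow means "allow everything").
User-agent: *
Disallow:
```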
Because that implementation would be problematic, I think just putting robots.txt in the hands of the user, while helping them to edit it, is the correct approach. It is also easier for us to do.