View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
415 | Composr | core | public | 2012-04-02 05:19 | 2019-06-25 16:16 |
Reporter | Guest | Assigned To | Chris Graham | ||
Priority | normal | Severity | feature | ||
Status | closed | Resolution | won't fix | ||
Summary | 415: robots.txt editor (formerly: "Allow exclusion of contents from sitemap for all content types") | ||||
Description | Currently, the sitemap generated by Composr includes all content with Guest view access. There are situations where you want content to be available to guests on the site, but do not want that content indexed by the search engines. The only current solution is to place a restriction in the robots.txt file, which works but also causes an inconsistency: Google warns that items in the sitemap are excluded by entries in robots.txt. For Bing, I suspect this would result in "errors" which, if over a certain percentage of your sitemap entries, cause it to reject your sitemap. I think the better approach is to allow the admin to mark content as included/excluded when it is created, and to have the sitemap generation script respect that exclusion in addition to any current logic. This would provide the means to have a "custom" sitemap automatically generated by Composr. It should work for all content types and support both category and individual entry exclusions. | ||||
Tags | Type: SEO | ||||
Attach Tags | |||||
Time estimation (hours) | 7 | ||||
Sponsorship open | |||||
related to | 3315 | Resolved | Chris Graham | Composr | Bundle default robots.txt |
related to | 3569 | Not Assigned | Guest | Composr documentation | Blocking nefarious crawlers via .htaccess changes |
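The sitemap/robots.txt inconsistency described above can be detected mechanically. A minimal sketch (not Composr code; the robots.txt rules and sitemap URLs below are made up for illustration) using Python's standard `urllib.robotparser` to flag sitemap entries that robots.txt blocks for a given crawler:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content and sitemap URL list, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""
SITEMAP_URLS = [
    "https://example.com/index.htm",
    "https://example.com/private/report.htm",
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Any sitemap entry the crawler may not fetch is the kind of conflict
# Google warns about ("URL blocked by robots.txt").
blocked = [url for url in SITEMAP_URLS
           if not parser.can_fetch("Googlebot", url)]
print(blocked)  # the /private/ entry is flagged
```

A check like this could run after sitemap generation to surface conflicts before a search engine does.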

An alternate solution proposed was to parse robots.txt. Having an editor for robots.txt inside Composr would not hurt.

A 7h time estimate was added, for making a robots.txt editor and for moving some hard-coded rules in v8 into a default robots.txt. This should be an addon, and the installer would need to be very careful not to overwrite an existing robots.txt (so the default rules should probably be stored in PHP code and written out via a function call run during installation).
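The install-time caution above can be sketched as follows (illustrative Python, not Composr's actual PHP; the `/adminzone/` rule and function name are assumptions): the default rules live in code, and are written out only when no robots.txt already exists.

```python
import os

# Default rules kept in code, mirroring the idea of storing them in PHP
# rather than shipping a robots.txt file that could clobber the site's own.
DEFAULT_ROBOTS_RULES = """\
User-agent: *
Disallow: /adminzone/
"""

def install_default_robots(base_path):
    """Write the default robots.txt during installation,
    but never overwrite a file the site owner already has."""
    path = os.path.join(base_path, "robots.txt")
    if os.path.exists(path):
        return False  # existing file left untouched
    with open(path, "w") as f:
        f.write(DEFAULT_ROBOTS_RULES)
    return True
```

Returning a flag lets the installer report whether it wrote the file or deferred to an existing one.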

I'm closing this. A simple robots.txt editor is now implemented for v11, which will make editing robots.txt a little easier. I thought about what was written here about keeping robots.txt and the XML Sitemap in sync, or about adding per-content options to exclude content via robots.txt and the XML Sitemap. The problem is that this assumes a binary: the content is either crawlable by everything, or by nothing. robots.txt allows specifying which crawlers have access to content, and the XML Sitemap is not specifically for crawlers (it could be used by an HTML validation tool, for example). So a single flag does not line up well with the necessary flexibility of these formats. Because that implementation is problematic, I think putting robots.txt in the hands of the user, while helping them edit it, is the correct approach. It is also easier for us to do.
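For example, robots.txt can grant one crawler access while denying all others, a distinction a single include/exclude flag on content cannot express (the paths below are illustrative):

```
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /drafts/
```

Here Googlebot may crawl everything while other crawlers are kept out of `/drafts/`; no per-content boolean could encode that.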
Date Modified | Username | Field | Change |
---|---|---|---|
2017-01-11 12:29 | Chris Graham | Tag Attached: Type: SEO | |
2018-03-28 00:16 | Chris Graham | Relationship added | related to 3315 |
2018-03-28 00:16 | Chris Graham | Relationship added | related to 3569 |
2019-06-25 16:16 | Chris Graham | Assigned To | => Chris Graham |
2019-06-25 16:16 | Chris Graham | Status | Not Assigned => Closed |
2019-06-25 16:16 | Chris Graham | Resolution | open => won't fix |
2019-06-25 16:16 | Chris Graham | Note Added: 0005983 |