View Issue Details

ID: 415
Project: Composr
Category: core
View Status: public
Last Update: 2019-06-25 16:16
Reporter: Guest
Assigned To: Chris Graham
Priority: normal
Severity: feature
Status: closed
Resolution: won't fix
Summary: 415: robots.txt editor (formerly: "Allow exclusion of contents from sitemap for all content types")
Description: Currently, the sitemap generated by Composr includes all content with Guest view access. There are situations where you want the content available to guests on the site but do not want that content indexed by the search engines. The only solution currently is to place a restriction in the robots.txt file, which works but also causes an inconsistency: Google warns that items in the sitemap are excluded by entries in robots.txt. For Bing, I suspect this would result in "errors" which, if over a certain percentage of your sitemap entries, would cause it to reject your sitemap.

I think a better approach is to allow the admin to mark content as included/excluded when it is created, and to have the sitemap generation script respect that exclusion in addition to any current logic. This would provide the means to have a "custom" sitemap automatically generated by Composr. It should work for all content types and support both category and individual entry exclusions.
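As an illustration of the current workaround (the path and domain here are hypothetical), a robots.txt exclusion like the following hides a section from compliant crawlers, but if the same URLs still appear in the submitted XML Sitemap, Google warns about the conflict as described above:

```
# robots.txt (hypothetical example)
User-agent: *
Disallow: /members-lounge/

Sitemap: https://example.com/sitemap.xml
```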
Tags: Type: SEO
Time estimation (hours): 7
Sponsorship: open


Relationships

related to 3315 (Resolved, Chris Graham) Composr: Bundle default robots.txt
related to 3569 (Not Assigned, Guest) Composr documentation: Blocking nefarious crawlers via .htaccess changes

Activities

Chris Graham

2012-04-10 11:28

administrator   ~379

Alternate solution proposed was to parse robots.txt.

Having an editor for robots.txt inside Composr would not hurt.

Chris Graham

2012-05-14 15:39

administrator   ~494

7h time estimate added, for making a robots.txt editor and moving some hard-coded rules in v8 into a default robots.txt. It should be an addon, and the installer would need to be very careful not to overwrite an existing robots.txt (so the default rules should probably be stored in PHP code and written out via a function call run during installation).
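The installer guard described above can be sketched as follows. Composr itself is PHP, so this Python sketch only illustrates the logic, not the actual implementation; all names and the example rule are hypothetical:

```python
import os

# Hypothetical default rules kept in code rather than shipped as a file,
# so installation cannot blindly clobber a hand-edited robots.txt.
DEFAULT_ROBOTS_RULES = "\n".join([
    "User-agent: *",
    "Disallow: /adminzone/",  # example of a rule that might move out of hard-coded logic
    "",
])

def install_default_robots_txt(site_root):
    """Write the default rules only if no robots.txt exists yet.

    Returns True if a file was written, False if one was already present.
    """
    path = os.path.join(site_root, "robots.txt")
    if os.path.exists(path):
        return False  # never overwrite an existing, possibly customised, file
    with open(path, "w") as f:
        f.write(DEFAULT_ROBOTS_RULES)
    return True
```

The key design point is that the check-then-write happens at install time via a function call, so the addon package itself never contains a robots.txt file that a package extractor could overwrite.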

Chris Graham

2019-06-25 16:16

administrator   ~5983

I'm closing this.

A simple robots.txt editor is now implemented for v11, which will make editing robots.txt a little easier.

I thought about what was written here about keeping robots.txt and the XML Sitemap in sync, or about having per-content options to exclude content via robots.txt and the XML Sitemap. The problem is that it assumes a binary choice: the content is either not to be crawled by anything, or it is. robots.txt allows specifying which crawlers have access to content, and the XML Sitemap is not specifically for crawlers (it could be used by an HTML validation tool, for example). So a single flag doesn't line up well with the necessary flexibility of these formats.
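To illustrate the flexibility point (the blocked path is hypothetical): robots.txt rules are grouped per crawler, which a single include/exclude flag on content could not express. For example, one crawler can be blocked from a section while all others remain allowed:

```
# Hypothetical: exclude only Googlebot from one section
User-agent: Googlebot
Disallow: /downloads/

# All other crawlers: no restrictions
User-agent: *
Disallow:
```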

Because the implementation is problematic, I think just putting robots.txt in the hands of the user, but helping them to edit it, is the correct approach. Easier for us to do too.

Issue History

Date Modified Username Field Change
2017-01-11 12:29 Chris Graham Tag Attached: Type: SEO
2018-03-28 00:16 Chris Graham Relationship added related to 3315
2018-03-28 00:16 Chris Graham Relationship added related to 3569
2019-06-25 16:16 Chris Graham Assigned To => Chris Graham
2019-06-25 16:16 Chris Graham Status Not Assigned => Closed
2019-06-25 16:16 Chris Graham Resolution open => won't fix
2019-06-25 16:16 Chris Graham Note Added: 0005983