Darrin Lythgoe, posted December 9, 2005

Hi everyone,

Per my recent posting on the mailing list, I have created a new file called tngrobots.php that tells TNG which robots meta restrictions to include on which pages. I have attempted to categorize each script, with content-rich pages getting full indexing, link-rich pages getting no indexing but yes to "following", and everything else getting "no index, no follow".

There are definitely some gray areas there, however, and I would appreciate any feedback you might have about what belongs where and why (don't forget the "why" if you want to convince me).

The beauty of this system will be that you can tweak this to your heart's content all from one page, but I'd like to optimize it before I release it to the public.

Anyway, here's the code:

```php
<?php
if( !$cms['support'] )
    $tngscript = basename( $SCRIPT_NAME, ".php" );
else
    $tngscript = $file;

//No index only
$NOI = "<meta name=\"robots\" content=\"noindex\">\n";
//No follow only
$NOF = "<meta name=\"robots\" content=\"nofollow\">\n";
//No index AND no follow
$NOINOF = "<meta name=\"robots\" content=\"noindex,nofollow\">\n";

//each "case" is the name of the script file without the ".php" at the end
switch( $tngscript ) {
    //allow full indexing
    case "cemeteries":
    case "getperson":
    case "familygroup":
    case "headstones":
    case "showheadstone":
    case "showmap":
    case "showphoto":
    case "showrepo":
    case "showsource":
    case "showtree":
    case "surnames":
    case "surnames-all":
    case "surnames-oneletter":
        $flags['norobots'] = "";
        break;

    //no indexing, but allow link following
    case "browsedocs":
    case "browseheadstones":
    case "browsenotes":
    case "browsephotos":
    case "browserepos":
    case "browsesources":
    case "browsetrees-old":
    case "descend":
    case "extrastree":
    case "register":
    case "reports":
    case "search":
    case "showreport":
    case "ahnentafel":
    case "pedigree":
    case "pedigreetext":
    case "surnames100":
    case "ultraped":
        $flags['norobots'] = $NOI;
        break;

    //no index, no follow
    case "addnewacct":
    case "anniversaries":
    case "browsetrees":
    case "changelanguage":
    case "desctracker":
    case "gedform":
    case "login":
    case "newacctform":
    case "places-all":
    case "places-oneletter":
    case "places":
    case "placesearch":
    case "places100":
    case "relateform":
    case "relationship":
    case "searchform":
    case "sendlogin":
    case "showlog":
    case "suggest":
    case "timeline2":
    case "whatsnew":
    default:
        $flags['norobots'] = $NOINOF;
        break;
}
?>
```

Thanks!
Darrin
theKiwi, posted December 9, 2005

I'm wondering about the thinking behind having some pages that are "no index but allow following".

Wasn't the whole origin of the complaints that people are affected by the bandwidth the robots use? Allowing them to follow the links will still use up bandwidth, even if the robot doesn't index the page it has just followed them from.

Roger
waterhead, posted December 9, 2005

The robots don't seem to respect the meta links - I have tried the noindex, nofollow stuff without success in the case of Google, MSN and Inktomi Slurp.

Chris
theKiwi, posted December 9, 2005

> waterhead wrote: The robots don't seem to respect the meta links - I have tried the noindex, nofollow stuff without success in the case of Google, MSN and Inktomi Slurp.

Does your robots.txt file pass a validation test, for example the one at http://www.searchengineworld.com/cgi-bin/r.../robotcheck.cgi

I used this to discover that my very simple file, which I'd copied directly from a site about robots.txt files, didn't validate because I'd saved it on a Macintosh with the default Macintosh line endings. Once I changed to Unix line endings, it validated.

Roger
waterhead, posted December 12, 2005

Roger,

Thanks for that link. I have checked my robots files and they do check out correctly.

Chris
bmohr, posted December 16, 2005

> theKiwi wrote: I used this to discover that my very simple file that I'd copied directly from a site about robots.txt files didn't validate because I'd saved it on a Macintosh, and used the default of Macintosh line endings. Once I changed to Unix line endings it now validates.

Your original file was fine. It's the validator that's broken. The robots.txt standard explicitly allows any line endings:

> The format and semantics of the "/robots.txt" file are as follows:
>
> The file consists of one or more records separated by one or more blank lines (terminated by CR, CR/NL, or NL). Each record contains lines of the form "<field>:<optionalspace><value><optionalspace>". The field name is case insensitive.

Of course, some robots are probably similarly broken. I know some are incorrectly case-sensitive with regard to the field names.

Brad
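To illustrate the record format quoted above, a minimal robots.txt record might look like the following. This is only a sketch: the `/cgi-bin/` path is a placeholder and is not taken from anyone's site in this thread.

```text
User-agent: *
Disallow: /cgi-bin/
```

Going by the quoted standard, each of these lines may end in CR, CR/NL, or NL, and `user-agent:` in lower case should be read the same way as `User-agent:`. As Brad notes, not every robot (or validator) actually honors that.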
CCarter, posted February 1, 2008

Does this go in the admin or root directory?

What code do you add, and where do you put it, in order to point to your tngrobots.php file?

Thanks,
-charles
palmspringsbum, posted December 29, 2010

I just upgraded to 8.1.

msnbot-157-55-116-14.search.msn.com is still crawling my descend files. Nothing else appears to be crawling them now.

I think that is also the bot that nearly doubled my bandwidth over the past couple of weeks, adding 10 GB and putting me 6 GB over my limit.

That bot has got to go. It appears it was guzzling the bandwidth on descendant trees and on relationship trees.

Help me muzzle that bot.
palmspringsbum, posted March 6, 2011

> waterhead wrote: The robots don't seem to respect the meta links - I have tried the noindex, nofollow stuff without success in the case of Google, MSN and Inktomi Slurp.
>
> Chris

I'm having the same problem with the same bots. They are indexing and following the "descend" files, and it appears they are going through every single permutation, using twice as much of my bandwidth as "viewed files".

Last month the bots gobbled over 15 GB (that's right, GIGS) of my bandwidth, over 90% of it in the last few days of the month. "Viewed traffic" is about 7.5 GB.

Could the problem be that the "descend" files need to be named explicitly?

Here are my top ten URLs by KBytes:

 #    Hits   Hits %    KBytes   KBytes %   URL
 1   92121    9.04%   2153029     8.95%    /genealogy/getperson.php
 2     103    0.01%   2015860     8.38%    /bin/brb060223.mp3
 3   31865    3.13%   1782551     7.41%    /genealogy/desctracker.php
 4   36302    3.56%   1433942     5.96%    /genealogy/pedigree.php
 5   48502    4.76%   1065713     4.43%    /genealogy/descendtext.php
 6   15485    1.52%    837550     3.48%    /genealogy/descend.php
 7    1408    0.14%    749237     3.11%    /blog/about/
 8    1425    0.14%    657778     2.73%    /blog/store/
 9   20556    2.02%    580061     2.41%    /bbs/viewtopic.php
10   23127    2.27%    540937     2.25%    /genealogy/familygroup.php

These four should not be indexed or followed:

- desctracker.php
- descendtext.php
- descend.php
- pedigree.php

That's roughly 5.1 GB of my bandwidth right there.
palmspringsbum, posted April 17, 2011

No replies?

Just as well. I just checked and my viewed-to-bots bandwidth ratio is currently about 2:1, about 8M to 4M.

That is a complete reversal.

After trying everything else, I spent some time on the robots.txt file, telling the bots to ignore just about everything but "getperson.php".

I had been going at this with the assumption that if bots were ignoring the "nofollow,noindex" they certainly wouldn't pay any attention to a robots.txt file. It seems I was wrong.

Oh, yeah, I put robots.txt files in the subdirectories/subdomains as well.

I also recall that in the process I discovered all my subdomains had somehow gotten screwed up and were pointing at the wrong place, if they were pointing anywhere at all, and so I fixed the subdomain redirects.
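For readers wanting a starting point, here is a minimal sketch of the kind of robots.txt described above. It is not palmspringsbum's actual file, which was not posted. It assumes TNG is installed under /genealogy/, as in the log excerpt earlier in the thread, and it blocks only the four scripts singled out there; example.com is a placeholder host name.

```text
# Assumed location: the root of the host, e.g. http://example.com/robots.txt
User-agent: *
Disallow: /genealogy/desctracker.php
Disallow: /genealogy/descendtext.php
Disallow: /genealogy/descend.php
Disallow: /genealogy/pedigree.php
```

Disallow values are matched as path prefixes, so each line also covers the same script with any query string appended. Crawlers request the file only from the root of each host, so a subdomain needs its own copy at its own root.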
Jay Wilpolt, posted April 20, 2011

> palmspringsbum wrote: No replies?
>
> Just as well. I just checked and my viewed-to-bots bandwidth ratio is currently about 2:1, about 8M to 4M.
>
> That is a complete reversal.
>
> After trying everything else, I spent some time on the robots.txt file, telling the bots to ignore just about everything but "getperson.php".
>
> I had been going at this with the assumption that if bots were ignoring the "nofollow,noindex" they certainly wouldn't pay any attention to a robots.txt file. It seems I was wrong.
>
> Oh, yeah, I put robots.txt files in the subdirectories/subdomains as well.
>
> I also recall that in the process I discovered all my subdomains had somehow gotten screwed up and were pointing at the wrong place, if they were pointing anywhere at all, and so I fixed the subdomain redirects.

Here is my robots.txt file. It's quite restrictive, so you may want to remove some entries. You need to change the paths to match your path from your hosting ROOT folder. Hope this helps.

Jay

Attached file: robots.txt
Larry Harrell, posted April 20, 2011

> [Quotes Darrin Lythgoe's opening post of December 9, 2005 in full, including the tngrobots.php listing.]

Darrin,

Does this code go in index.php, or do we need tngrobots.php? If we do, can you send tngrobots.php with instructions on how to connect it to the main TNG index.php page?

Larry
Henrik Poulsen, posted July 11, 2011

Has anyone found out how tngrobots.php works?

Is it right to assume that no robots.txt is needed, and that edits are done in tngrobots.php?
rocksea, posted August 23, 2011

I am a bit confused about the use of tngrobots.php, as there is no info or wiki about it. How does the robots.txt link to the tngrobots.php?