I'm working on a test site at the moment which previously got indexed by accident, due to some fool removing the robots.txt.
I replaced it, successfully got the site deindexed, and, to prevent anyone accessing it, restricted access in the .htaccess to my IP only.
I noticed yesterday that Google Webmaster Tools is now showing a string of 403 errors for the site, as it's also getting a 403 for the robots.txt.
I've just changed the .htaccess to use simple authentication instead of IP blocking to restrict the site, and GWT is now giving me a 401 for the robots.txt (and presumably will for the rest of the site once it gets around to crawling it again).
While the job is done in that no one can access the site, I'm sure a history of crawl errors won't help it in the long run once it goes live if I leave it like this.
I'm sure there's a simple way of adding an exception in the .htaccess to allow access to robots.txt while keeping the authentication for everything else, but, being half asleep on a Monday morning, I can't find it. Any suggestions?
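For what it's worth, this is roughly what I've been fumbling towards (untested, and the .htpasswd path is just a placeholder), using a `<Files>` block to exempt robots.txt from the auth requirement:

```apache
AuthType Basic
AuthName "Staging site"
# Placeholder path to the password file
AuthUserFile /path/to/.htpasswd
Require valid-user

# Exempt robots.txt from authentication (Apache 2.4 syntax)
<Files "robots.txt">
    Require all granted
</Files>

# On Apache 2.2, the equivalent inside the <Files> block would be:
#     Satisfy Any
#     Allow from all
```

No idea if I've got the 2.2 vs 2.4 syntax right for this server, which may be part of my problem.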