.htaccess authentication blocking robots.txt

MOH · Apr 26, 2010

Working on a test site at the moment which previously accidentally got indexed, due to some fool removing the robots.txt

I replaced it, successfully got the site deindexed, and to prevent anyone accessing it restricted access to it in the .htaccess to my ip only.
Noticed yesterday that google webmaster tools is now showing a string of 403 errors for the site, as it's also getting a 403 for the robots.txt
I've just changed the .htaccess to use simple authentication instead of ip blocking to restrict the site, and GWT is now giving me a 401 for the robots.txt (and presumably for the rest of the site once it gets around to crawling it again).

While job is done in that no one can access the site, I'm sure a history of crawl errors won't help it in the long run after it goes live if I leave it like this.

I'm sure there's a simple way of adding an exception in the .htaccess to allow access to robots.txt while keeping the authentication for everything else, but being half asleep on Monday morning I can't find it. Any suggestions?

MOH · Apr 26, 2010

As usual, got it after posting. (Keep an eye out for my forthcoming site MOH.com, which will consist entirely of me asking silly questions and answering them shortly afterwards. Riveting stuff.)
For future reference, took a few wrong turns, but this seems to do the trick. Within the .htaccess:

Code:

AuthName "Demo site - internal testing only"
AuthUserFile <password file location here>
Require valid-user

[B]<Files ".\robots.txt">
AuthType None
</Files>[/B]

The odd thing about this is, it shouldn't work as far as I can see.

"None" is listed as a valid option for AuthType for Apache 2.3, but not for 2.2.
I'm on an Ubuntu server with Apache 2.2.11 - maybe I've got an updated version of mod_authn_file that includes it.

Seems to work for me anyway.

MOH · Apr 26, 2010

The above didn't actually work. It only seemed to because I had previously authenticated myself on the site, so when I clicked through from GWT I was able to see the robots.txt but Google failed again with a 401 a few minutes ago.

For Apache2.2 or earlier, this should (hopefully) do it (the Satisfy any means the Allow from all will suffice, so there's no need for authentication):

Code:

<Files "robots.txt">
Allow from all
Satisfy any
</Files>

.htaccess authentication blocking robots.txt

MOH

New Member

MOH

New Member

MOH

New Member