03-23-2011, 10:10 AM

Very annoyed today :x . I discovered a few days ago that Google was crawling the https version of my site. Hands up that I should have applied the SSL to the required folder only but what is done is done.

Anyhoo, I found a .htaccess rewrite that served a different robots.txt when the https version was used. Then went through and requested removal of all https results. This has worked well and now I no longer show duplicate pages as http/https.

Having done all that I find that Google is now intent on crawling my site by IP! I mean WTF is Google doing crawling and serving IP results anyway?? Absolutely bizarre that I block https and then a matter of days later this crops up. If I were more paranoid I'd say something sinister was happening.

Anyhoo, any ideas on blocking access to IP URLs? Should I use the same method as I did with the https? Any help to preserve my sanity would be awesome.

03-23-2011, 12:09 PM
Have settled on:

#Serves alternative robots file when requested through IP
RewriteCond %{HTTP_HOST} ^xxx\.xxx\.xxx\.xx
RewriteCond %{REQUEST_URI} ^/robots\.txt$
RewriteRule ^(.*)$ /robots_alternative.txt [L]

to serve alternative robots.txt to try and prevent Google from crawling. Will see how it goes unless anyone has any better ideas?
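
For anyone following along, here is a rough way to sanity-check the alternative file before waiting on Google. It is only a sketch: the contents of robots_alternative.txt are not posted above, so the blanket disallow below is an assumption, and the IP address is a documentation placeholder rather than a real one. It uses Python's standard urllib.robotparser to ask whether a crawler would be allowed to fetch a URL under those rules.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of robots_alternative.txt (not shown in the thread):
# a blanket disallow for every crawler.
ROBOTS_ALTERNATIVE = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_text: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if robots_text forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    # 203.0.113.10 is a placeholder IP, standing in for the xxx.xxx.xxx.xx above.
    print(is_blocked(ROBOTS_ALTERNATIVE, "http://203.0.113.10/some-page"))
```

Bear in mind this only checks the rules themselves, not that the rewrite is actually serving the alternative file on the IP host. Requesting /robots.txt by IP in a browser should confirm that part.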

03-23-2011, 12:18 PM
You can also use the canonical link tag (rel="canonical") in the head of your pages: http://www.google.com/support/webmasters/bin/answer.py?answer=139394

03-23-2011, 12:34 PM
Have thought about that Jim but our site is dynamically generated. I guess I could do something clever with the page id.
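
Something along these lines might do for a dynamic site, building the tag from the page id. This is a sketch only: the hostname, the /page.php path and the id query parameter are all made-up placeholders, since none of the real URL scheme is posted here, and it is written in Python purely for illustration. The idea is that every page, however it is reached (https, IP or hostname), emits the same http hostname URL as its canonical.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical values: the thread does not give the real hostname or URL scheme.
CANONICAL_HOST = "www.example.com"
PAGE_PATH = "/page.php"


def canonical_link(page_id: int) -> str:
    """Build a rel="canonical" link element for the http hostname version of a page."""
    query = urlencode({"id": page_id})
    url = f"http://{CANONICAL_HOST}{PAGE_PATH}?{query}"
    # Escape the URL so characters such as & are safe inside the href attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'


if __name__ == "__main__":
    print(canonical_link(42))
```

The returned string would go inside the head section of the generated page.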

03-23-2011, 03:02 PM
Well, I have learnt more about .htaccess than I ever cared to today. Still, hopefully fixed :)

04-03-2011, 06:20 AM
We are also getting the same issue.