Wednesday, December 2, 2009

Keep a watch on Google's crawler

News Corp recently threatened to pull all of its news content out of Google, and that has led Google to revise its crawling policies two days in a row. In case you didn't know, you could read any WSJ article on Google News for free because of Google's anti-cloaking requirements for the content it crawls. Anti-cloaking means that a search user must be shown the same content that Google's spider saw when it crawled the page. Since WSJ let the Google crawler index its articles, it effectively had to open them to any visitor coming from Google News. With the advertising business model no longer working for the news industry, I guess this move was coming.

So what's changed now?

1) Google appears to have relaxed its anti-cloaking policy a bit. A news site can now show a "subscribe now" page to a visitor after one click or five visits.

2) Google has given the News crawler, which may visit several times a day, its own identification so that sites can distinguish it from the generic search crawler. That lets news sites be more selective. I'm not sure how this will play out or what the implications will be, but I am certainly thinking about putting all of this to use soon.
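One way a site might act on a separate News crawler identity is through robots.txt. The sketch below assumes the news crawler answers to the user-agent token `Googlebot-News` (and the general search crawler to `Googlebot`), and uses Python's standard `urllib.robotparser` to check a hypothetical robots.txt that keeps the news crawler out of a made-up `/premium/` section while leaving web search alone. The host name and paths are placeholders, not anything from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site: the Google News crawler
# (assumed token "Googlebot-News") is kept out of /premium/, while
# every other crawler, including plain Googlebot, may fetch anything.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /premium/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://news.example.com/premium/story.html"
    for agent in ("Googlebot-News", "Googlebot"):
        # prints "Googlebot-News False" then "Googlebot True"
        print(agent, rp.can_fetch(agent, url))
```

Note that `urllib.robotparser` matches user-agent tokens by substring and uses the first group that matches, so a group for plain `Googlebot` placed above the `Googlebot-News` group would also capture the news crawler in this parser. Listing the more specific group first avoids that.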

We have Microsoft to thank for this. I don't think News Corp intends to pull its content out of Google, but the threat is very real and Google will have to respond to it well.
