This is an old hack that many are aware of, but I thought I'd post it anyway for those who aren't, and because I recently found it useful. Here's the situation: I was looking for an example of how to properly format a snippet of code that I haven't quite mastered. Every time I ran a query at Google, the first site that showed up appeared to have the answer, but required a subscription to view it. Frustrated, I had to continue searching until I found a non-subscription site that gave me what I needed. After a couple of searches like this, I decided to just cut to the chase. Obviously, if a page is at the top of Google's search results, Google is seeing the full answer. I want to see what Google sees.
Google sends out robots to grab the content off web pages and store it so it can be retrieved through Google's search engine. Google's robot, or Googlebot as it is called, acts like a browser similar to Internet Explorer or Firefox. Browsers are user-agents in webspeak, which is just fancy terminology for a program that retrieves (and usually renders) HTML content. Every user-agent is identified by name, and web designers can capture the user-agent name and present special content based on that name if they desire. A practical use of this technique would be to send users browsing the site on a mobile phone off to a less graphically intensive version of the page. There are a number of solid uses for displaying content based on which user-agent is accessing it. Many subscription sites, for example, may want to limit access to content for regular users, while at the same time giving Google the full version for indexing. That's what we're exploiting.
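To make the idea concrete, here is a minimal Python sketch of how a site might pick which version of a page to send based on the user-agent name. Everything in it is an assumption for illustration: the function name choose_variant, the MOBILE_HINTS list, and the variant labels are hypothetical and not taken from any real site.

```python
# Hypothetical substrings that suggest a phone browser (illustrative only).
MOBILE_HINTS = ("mobile", "symbian", "windows ce")


def choose_variant(user_agent, is_subscriber=False):
    """Return which version of a page to serve for a given user-agent string."""
    ua = user_agent.lower()
    # A site that wants to be indexed hands the crawler the full article.
    if "googlebot" in ua:
        return "full"
    # Phones get the less graphically intensive version.
    if any(hint in ua for hint in MOBILE_HINTS):
        return "mobile"
    # Everyone else sees the full page only if they have a subscription.
    return "full" if is_subscriber else "preview"


print(choose_variant("Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))
print(choose_variant("Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12"))
```

Notice that the only thing the check looks at is the name the visitor reports, which is why the trick described in this post works on sites built this way.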
Here's how you make websites think you are Google
All web browsers have a user-agent identifier. The identifier tells a server what browser you are using, and often the operating system you are using.
For example, my version of Firefox is:
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
And my version of Internet Explorer is:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)
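If you're curious how a server might pull the browser and operating system out of strings like these, here is a rough Python sketch. The function name split_user_agent is my own invention, and the parsing is deliberately simplistic; real user-agent strings are messier than this handles.

```python
import re


def split_user_agent(ua):
    """Split a user-agent string into (name, version) pairs and comment fields."""
    # The text inside the first pair of parentheses carries platform details.
    match = re.search(r"\(([^)]*)\)", ua)
    comment = [part.strip() for part in match.group(1).split(";")] if match else []
    # Drop every parenthesised comment, then read the name/version pairs.
    without_comments = re.sub(r"\([^)]*\)", " ", ua)
    products = re.findall(r"([^\s/]+)/(\S+)", without_comments)
    return products, comment


firefox = "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12"
products, comment = split_user_agent(firefox)
print(products)  # name/version pairs such as ('Firefox', '2.0.0.12')
print(comment)   # platform fields such as 'Windows NT 6.0'
```

The Internet Explorer string works the same way, except that the browser name ("MSIE 7.0") lives inside the parentheses rather than after them.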
You can check your own at http://www.useragent.org/
Googlebot's user-agent is:
Googlebot/2.1 (+http://www.googlebot.com/bot.html)
A user-agent identifier can be anything really, and doesn't even have to refer to a "real" browser. For example, when checking my website stats one day I saw a "Commodore 64" user-agent. Now I know that's bullshit. Can you imagine how cool it would be if someone did convert the old Commodore 64 into a web browser? I've also seen "Nintendo 64" and a bunch of other interesting mods. What happened is that some dude hacked his normal browser to pretend it was a Commodore 64 or a Nintendo 64, when it was probably just Firefox. That's what we're going to do here, but instead we're going to pretend we're the Googlebot.
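The reason any name works is that the user-agent is just a text header the client sends along with each request. As a small illustration outside the browser, here is a Python sketch using the standard library's urllib; the helper name build_request and the example.com address are placeholders I made up, and nothing is actually fetched.

```python
import urllib.request

GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build (but do not send) a request that reports the given user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# The header holds whatever string we hand it, real browser or not.
req = build_request("http://www.example.com/")
print(req.get_header("User-agent"))

fake = build_request("http://www.example.com/", "Commodore 64")
print(fake.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen would send it with that header. Note that urllib stores the header name as "User-agent" internally, which is why the lookup is spelled that way.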
You can hack any of the popular browsers to have a different user-agent identifier, but I'm going to recommend using Firefox because it's the easiest to modify and then reset, and it's hard to break. Firefox makes it easy because some bad web designers optimize their content just for IE and then ignore all the other browsers. Some of them even use the user-agent catch to prevent other browsers from accessing the site. Firefox anticipated this and made it easy for users to pretend they're using IE instead.
To change the user-agent identifier in Firefox, just enter about:config in the address bar, the place where you normally enter a URL. Next, right-click anywhere in the list of preferences to get the context menu and select "String" from the "New" menu entry. Enter the preference name "general.useragent.override", without the quotes. Next, enter the new user-agent value you want Firefox to use (which can be anything). Here we'll add Googlebot's user-agent identifier: "Googlebot/2.1 (+http://www.googlebot.com/bot.html)", without the quotes.
Restart the browser and you're all set. You are now "Google". I won't post the site I used this hack on out of respect for them and their subscription-based model, but I'm sure you can find some practical uses. One other thing I like to use this for is marketing. When bored, I add "JeremyBot http://www.jeremyparnell.com" as the user-agent. Typically, the browser a person surfs with is recorded in the website's statistics, so it's fun to give the administrator something interesting to find when checking the stats. To my knowledge, none of my clients have come from this technique, but you never know. It's free and fun.
To reset Firefox to its normal user-agent setting, again enter "about:config", locate "general.useragent.override" in the list, right-click it and choose "Reset". Now you're back to normal.