Chasing the Long Tail, Part II
In my last post, I introduced the term "Long Tail" and hinted that a large catalog of web pages could generate substantial traffic, regardless of the quality of the content. I also mentioned an experiment I'm running to test this theory. I'm not going to give the web address of the experiment here, because I don't want a flood of traffic from this blog skewing my statistics. I want those stats to reflect traffic resulting from the test as much as possible. I should also point out that I'm no scientist and there are no real controls here. This is just an informal test.
For my experiment I need pages, lots of pages. I also need to make sure that those pages wouldn't generate lots of traffic of their own under normal conditions (read: boring, unpopular content) and that the web site I'm using for the experiment doesn't already have traffic. Here's what I did to satisfy those three requirements: I picked one of my registered domain names that has no established traffic and wrote a script. The script fetches Wikipedia articles and stores them locally (totally legal through the GFDL). It also rewrites all the links in each article so that they point internally. In other words, when a search engine indexes the site, a link to Mona Lisa points to http://(www.mysite.com)/Mona_Lisa rather than Wikipedia's version. When the search engine follows that link, the script fetches that article and stores it locally too. The general idea is to have the search engine spider all 2,332,000+ articles at Wikipedia, but have them all be at my site instead. That gives me the ungodly amount of content I need for my experiment, and because my site is not popular, that content will sit near the bottom of the search results.
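To make the link rewriting step concrete, here's a minimal sketch of how it could be done. This is not the actual script from the experiment. The base address `LOCAL_BASE` is a placeholder, the function name `rewrite_links` is made up for illustration, and I'm assuming article links show up either as `/wiki/Title` or as full `en.wikipedia.org` URLs.

```python
import re

# Placeholder address for the test site (an assumption; the real address
# is deliberately not given in this post).
LOCAL_BASE = "http://www.mysite.example"

# Matches href values that point at a Wikipedia article, either as a
# site-relative path (/wiki/Title) or as an absolute en.wikipedia.org URL.
# Group 1 is the article title, group 2 is any trailing #fragment or ?query.
WIKI_HREF = re.compile(
    r'href="(?:https?://en\.wikipedia\.org)?/wiki/([^"#?]+)([^"]*)"'
)


def rewrite_links(html, local_base=LOCAL_BASE):
    """Return html with every Wikipedia article link pointed at local_base."""
    def repl(match):
        title, rest = match.group(1), match.group(2)
        return 'href="{}/{}{}"'.format(local_base, title, rest)

    return WIKI_HREF.sub(repl, html)


if __name__ == "__main__":
    sample = '<a href="/wiki/Mona_Lisa">Mona Lisa</a>'
    print(rewrite_links(sample))
```

Links to anything other than a Wikipedia article are left alone, so external references in an article still point where they did originally. A regular expression is good enough for a sketch like this; a real script would probably want a proper HTML parser.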
Being buried in search results is no problem. That's what I want. I want to test the idea (the Long Tail) that given a sufficiently large content catalog, you'll end up with traffic anyway, even if you're not popular. Where this traffic comes from, who knows, but I sometimes click through to page 10 or beyond when I search just for fun, sort of pulling from the bottom of the deck in a search result. Plus, search engine results are based on a combination of keywords rather than a single word, and it's possible one of my pages might have just the right combination the searcher is looking for, a combination that wouldn't be in the original content, which drives that page to the top of the results. Who knows, who cares. The experiment is just to see if a lot of traffic comes naturally from having a large catalog of content, regardless of the quality of the content. We're testing quantity over quality.
Here are the results thus far:
I started my experiment in mid-March, with a small handful of links to articles at the test site posted in a blog. The first step was to see if the search engines would even follow the links and index the subsequent pages. They did. They followed my handful of links, which led to more links, and more, and they indexed those too. It's been slow going. To date, from a "seed" of fewer than 10 links, they have indexed approximately 1,200 pages. The growth appears to be exponential as well. In the first few days, it was one or two pages indexed at a time. More recently, it's been hundreds of articles at a time. In the last week or two, the number of indexed pages doubled. Cool. 'Cause there's plenty more left to go. Ultimately I'd like to see all two million plus articles indexed, because then I'll have some serious data to examine.
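For a rough sense of how far there is to go, here's a small sketch that counts how many more doublings it would take to get from the pages indexed so far to the full article count. The function name `doublings_needed` is made up for illustration, and the big assumption is that the doubling I've seen recently keeps going, which is far from guaranteed.

```python
def doublings_needed(indexed_now, target):
    """Count how many times indexed_now must double to reach target."""
    count = 0
    pages = indexed_now
    while pages < target:
        pages *= 2
        count += 1
    return count


if __name__ == "__main__":
    # Figures from this post: about 1,200 pages indexed, 2,332,000 articles total.
    print(doublings_needed(1200, 2332000))
```

With the figures from this post, the answer is 11 doublings. If each doubling really took a week or two, that would be somewhere around 11 to 22 weeks, but that's a guess built on a trend that could flatten out at any time.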
The traffic stats would seem less than noteworthy to someone who doesn't see the significance of the increase. With roughly 1,000 articles indexed, the number of people who visit the site has grown by 5 per day. That's nothing to write home about, but the point is there was an increase. It's early evidence that the idea is sound, and that things are working as they should.
The bigger thing to get giddy about is this: if we assume a traffic increase of 5 visitors per day for every 1,000 pages indexed, and if all 2 million pages get indexed, that works out to 10,000 visitors per day. That is definitely something to write home about. One could earn some serious extra income off a site that generates approximately 300,000 visits per month. Assuming an average of four page views per visit, we're looking at 1,200,000 page views per month, and that's only using simple, uninteresting content borrowed from some other site. The content's not augmented with other services that might cause a visitor to return, or to see the page as a useful resource.
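Here's that back-of-the-envelope math as a small sketch, so the assumptions are easy to change. The function name `project_traffic` and its parameter names are made up for illustration, I'm using a 30-day month, and the whole thing assumes traffic scales in a straight line with the number of indexed pages, which the experiment hasn't shown yet.

```python
def project_traffic(visitors_per_day_per_1000_pages, total_pages,
                    days_per_month=30, page_views_per_visit=4):
    """Linear projection of traffic from a per-1,000-pages visitor rate.

    Returns (daily_visitors, monthly_visits, monthly_page_views).
    """
    daily_visitors = visitors_per_day_per_1000_pages * total_pages / 1000
    monthly_visits = daily_visitors * days_per_month
    monthly_page_views = monthly_visits * page_views_per_visit
    return daily_visitors, monthly_visits, monthly_page_views


if __name__ == "__main__":
    # Figures from this post: 5 visitors per day per 1,000 pages, 2 million pages.
    daily, visits, views = project_traffic(5, 2000000)
    print(daily, visits, views)
```

Plugging in 5 visitors per 1,000 pages and 2 million pages gives the same 10,000 per day, 300,000 visits per month, and 1,200,000 page views per month quoted above. Change the rate or the page views per visit and the totals move with them.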
If augmented with other services, who knows? Guess we'll have to find out as the experiment continues.
Labels: experiments, projects, The Long Tail










