Lies, Damned Lies and Web Statistics
Posted by Joe
For most professionals involved in any facet of online marketing, website tracking and statistics are the bedrock of data from which to work. However, even for those who don’t live and breathe things such as clickthrough percentages, page views per visitor or referrer sources, an understanding of even the most basic website statistics is imperative for anyone with an interest in their online business. The most basic metric commonly measured is how many visitors you have. If you don’t know how many people are coming to your website, it will be impossible to gauge how it is performing and growing and, more importantly, how to maximise growth through it. Even answering the basic question of how many visitors a site receives, however, is not quite as simple as it first appears.
How Website Statistics are Calculated
Website statistics are calculated using a variety of methods, but at their simplest they are calculated either from the server logs or through a tracking code embedded on the page. Pictured below is how server-side web statistics are calculated. All requests that the webserver receives for any pages, images or other files are logged into a central logfile. These logfiles are then parsed, and the data contained within them is collated and aggregated to produce the daily, weekly or monthly statistics and charts.
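To make the server-side approach concrete, here is a minimal Python sketch of the parse-and-aggregate step. It assumes the logfile uses the common/combined log format written by Apache and similar webservers; the function names and the sample log lines are made up for illustration and are not taken from any particular statistics package.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: common/combined log format (IP, identity, user, time, request, status, size).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

def requests_per_day(lines):
    """Aggregate the number of logged requests for each calendar day."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[entry["time"].date().isoformat()] += 1
    return counts

# Invented sample entries, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [14/Mar/2007:15:12:01 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '203.0.113.5 - - [14/Mar/2007:15:12:02 +0000] "GET /logo.png HTTP/1.1" 200 2048',
    '198.51.100.7 - - [15/Mar/2007:09:30:44 +0000] "GET /about.html HTTP/1.1" 200 3072',
]

daily = requests_per_day(SAMPLE_LOG)
```

Note that this counts every request, including the image, so one human looking at one page already shows up as two log entries. Turning raw requests into visitors takes further rules, which is where the packages start to disagree.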

The second mechanism used for tracking website traffic is through the use of a small embedded piece of code on all relevant pages. This embedded tracking code then notifies a third party system that it has been called. The third-party tracking system then logs all of these calls into its database and generates the statistics and reports from this information. See the picture below for a high-level overview.
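The receiving end of such a tracking system might look roughly like the Python sketch below. It is only an illustration of the idea: the `TrackingCollector` class, its method names and the cookie values are hypothetical, and a real service stores far more detail per call.

```python
from collections import defaultdict

class TrackingCollector:
    """Hypothetical stand-in for a third-party tracking service's database."""

    def __init__(self):
        self.hits = []  # one record per call from the embedded tracking code

    def record_hit(self, site_id, visitor_cookie, page):
        """Called each time the embedded code on a page notifies the tracker."""
        self.hits.append({"site": site_id, "visitor": visitor_cookie, "page": page})

    def report(self, site_id):
        """Generate page view and visitor totals from the logged calls."""
        visitors = set()
        page_views = defaultdict(int)
        for hit in self.hits:
            if hit["site"] == site_id:
                visitors.add(hit["visitor"])
                page_views[hit["page"]] += 1
        return {"visitors": len(visitors), "page_views": dict(page_views)}

# Invented example calls: two pages viewed by one visitor, one page by another.
collector = TrackingCollector()
collector.record_hit("example-site", "cookie-a", "/index.html")
collector.record_hit("example-site", "cookie-a", "/about.html")
collector.record_hit("example-site", "cookie-b", "/index.html")
summary = collector.report("example-site")
```

The key point is that the collector only ever sees pages where the embedded code actually ran. Requests for images, or pages fetched by software that ignores the code, never reach it.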

Which One is Better?
Dodging the question of which one is better for the moment, the important point here is that they are different. As you can see above, they use two different methods to log and track information about a website. This difference in implementation can also lead to a (sizeable) difference in how they calculate figures such as unique visitors. Take the example below, which shows the total unique visitors to a particular website as calculated by three different systems for the same period.

That’s a pretty big difference. Almost twice as many visitors were recorded by Webalizer as by AWStats, and almost fifteen times as many visits were recorded by Webalizer as by Google Analytics. The reason for this is what each of these systems considers to be a visit and how it is calculated.
Let’s look in detail at two of these systems: the server-side AWStats package and the tagging-based Google Analytics. Both of these systems are free to users. AWStats is often already configured with your hosting package and may require little more than turning it on. You can sign up for Google Analytics here; however, it also requires that you update your webpages with the tracking code.
One of the reasons that the number of visits recorded by Webalizer is so high is that it doesn’t discount the traffic from robots to your site. Robots are sent by the search engines, for example, to find out what your site is about, but they aren’t really visitors (in the human sense!). Both AWStats and Google Analytics provide a more useful figure, which is the number of Unique Visitors to your site. Below are the Unique Visitors recorded by AWStats and Google Analytics for the same period.
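As a rough illustration of what discounting robot traffic involves, the Python sketch below separates requests by their user agent string. The marker list is a small assumption made for this example; real packages rely on much longer, maintained lists of known robots, and this is not how AWStats or Google Analytics actually implement it.

```python
# Assumed substrings that commonly appear in the user agent of well-behaved robots.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")

def is_robot(user_agent):
    """Guess whether a request came from a robot, based only on its user agent."""
    agent = user_agent.lower()
    return any(marker in agent for marker in ROBOT_MARKERS)

def split_traffic(user_agents):
    """Separate a list of user agent strings into (humans, robots)."""
    humans = [ua for ua in user_agents if not is_robot(ua)]
    robots = [ua for ua in user_agents if is_robot(ua)]
    return humans, robots

# Invented sample of user agents, for illustration only.
SAMPLE_AGENTS = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
]

humans, robots = split_traffic(SAMPLE_AGENTS)
```

A filter like this only catches robots that identify themselves. A scraper that sends a browser-like user agent passes straight through as a human visitor.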

Although the difference in reported figures has narrowed, it is still considerable at almost a factor of three. The reason is again how they are calculated. Here are just some of the differences:
- AWStats calculates a visitor using a combination of the IP address logged and the time (usually one hour). Google Analytics uses tracking cookies and may recognise the same visitor for a longer period, thus reducing the overall visitor count.
- Although both AWStats and Google Analytics exclude known traffic from recognised robots, traffic from spam robots and automated scrapers is still logged in AWStats as a visit. Most of these spam robots and scrapers do not execute the Google Analytics tracking code.
- Particularly pertinent for blogs is the recording of pages viewed through the administration area. The most common plugin used in WordPress for Google Analytics discounts traffic from the administration area; AWStats does not.
- Google Analytics cannot track a visitor correctly if they have cookies turned off and/or there is other JavaScript on the page which interferes with the tracking code.
- Google Analytics will not count visitors who do not allow the page to load fully. The tracking code is normally placed at the very bottom of the page, and the call to the tracking server is usually one of the last items performed before a page is fully loaded. Visitors who quickly visit and then navigate away from a page before it fully loads may therefore not be properly counted.
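The first two differences in the list can be shown with a small Python sketch that counts the same handful of hits in both ways. This is a deliberately simplified reading of the points above, not the actual algorithm of either package; the function names, the one hour window and the sample hits are all assumptions for illustration.

```python
from datetime import datetime, timedelta

def count_by_ip_window(hits, window=timedelta(hours=1)):
    """Count a new visit whenever an IP returns after more than `window` of inactivity."""
    last_seen = {}
    visits = 0
    for when, ip, _cookie in sorted(hits, key=lambda hit: hit[0]):
        previous = last_seen.get(ip)
        if previous is None or when - previous > window:
            visits += 1
        last_seen[ip] = when
    return visits

def count_by_cookie(hits):
    """Count distinct tracking cookies; hits that never set a cookie are not counted."""
    return len({cookie for _when, _ip, cookie in hits if cookie is not None})

# Invented hits as (time, IP address, tracking cookie or None).
SAMPLE_HITS = [
    (datetime(2007, 3, 14, 9, 0), "203.0.113.5", "cookie-a"),
    (datetime(2007, 3, 14, 9, 20), "203.0.113.5", "cookie-a"),   # same browsing session
    (datetime(2007, 3, 14, 14, 0), "203.0.113.5", "cookie-a"),   # same person, back after lunch
    (datetime(2007, 3, 14, 10, 0), "198.51.100.7", None),        # scraper that ignores the tracking code
]

log_based = count_by_ip_window(SAMPLE_HITS)
cookie_based = count_by_cookie(SAMPLE_HITS)
```

With these sample hits the log-based count comes to three and the cookie-based count to one, even though exactly one human visited. The same gap, repeated across thousands of visitors, is the kind of discrepancy described in the list.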
As you can see above, there are a number of reasons why AWStats may be “over counting” visitor numbers and equally a number of reasons why Google Analytics may be “under counting” them. All in all, however, we feel the most realistic picture is given by the Google Analytics numbers, with the caveat that there is still a margin of error: even for something as simple as counting the number of unique visitors to a site, there is no guaranteed reliable figure which can be calculated.
This has highlighted some of the very basic issues with web statistics, looking particularly at counting visitors. Although no absolute figure can be ascertained, there is still a benefit to running both systems for the additional information each can provide. For example, Google Analytics can give a much better picture of user navigation through the site and more fine-grained marketing information, whereas AWStats can highlight errors and robot traffic more thoroughly. Finally, a word of warning. The visitor numbers to your site are just that: visitors. It is much more important to figure out how many of those visitors are potential customers. A huge volume of visitors to your website is rarely a disadvantage; however, a huge volume of potential customers is going to have a much more beneficial effect on your online business.







March 14th, 2007 at 3:12 pm
Hi Joe,
I definitely admire when you point out this one “A huge volume of visitors to your website is rarely a disadvantage however a huge volume potential customers is going to have a much more beneficial effect to your online business”. It is important that each and everyone must know about this.
Thank you and keep it up..
October 12th, 2007 at 2:11 pm
Good article…
I am not sure if there are any problems associated with moving the analytics code to the top of the page… But I will give it a try and see how it goes…
October 18th, 2007 at 5:31 pm
Jordan,
There’s no issue in moving the Google Analytics code to the top of the page; in fact, where you make explicit calls to the Urchin Tracker, you must first have the relevant JavaScript files loaded. However, if you don’t need this functionality, having it at the end of the page is preferable with regard to page loading, as the main page content will load first.
Joe
April 1st, 2008 at 7:32 am
Hey
Thank You a lot for your information. I was having the same problem with over counting and under counting from the Web Statistics results.
Thank You,
abeen
http://outsourcing.javra.com