Your Google Analytics Data Will Never Be Perfect (and That’s Okay)

Tom Lundin

November 25th, 2019

People spend a lot of time, money, and energy getting their analytics implementations up and running. So feeling discouraged when you see problems with the data or discrepancies between data sources is understandable. But such discrepancies don’t always indicate a problem. 


Let’s talk about data discrepancies

Google Analytics and similar products are client-side technologies. They rely on the user’s browser (the client) to run some JavaScript and send the appropriate data to the analytics servers. But this dependence means that the client can interrupt, modify, or prevent the transmission of that data. Certain ad blockers and other browser extensions can also stop Google Analytics from functioning correctly. Even something as mundane as a user deleting the cookies on their machine can cause Google Analytics to see that user as two different people.
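To see why clearing cookies can split one person into two, here is a toy simulation in Python. It is a minimal sketch, not how Google Analytics is actually implemented. It assumes only that a visitor is identified by a random client ID kept in a first-party cookie, which is how Google Analytics works with its `_ga` cookie. The `Browser` class and the `new_client_id` helper are hypothetical.

```python
import uuid


def new_client_id() -> str:
    # Hypothetical stand-in for the random ID a client-side tracker stores in a cookie
    return str(uuid.uuid4())


class Browser:
    """Toy browser that holds a single tracking cookie."""

    def __init__(self) -> None:
        self.cookie = None

    def visit(self, hits: list) -> None:
        # On the first visit, or after cookies are cleared, a fresh ID is generated
        if self.cookie is None:
            self.cookie = new_client_id()
        # Every hit sent to the analytics server carries the current client ID
        hits.append(self.cookie)

    def clear_cookies(self) -> None:
        self.cookie = None


hits = []
browser = Browser()       # one real person...
browser.visit(hits)
browser.visit(hits)
browser.clear_cookies()   # ...deletes their cookies...
browser.visit(hits)

# ...and the analytics side, counting distinct client IDs, sees two users
print(len(set(hits)))  # 2
```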

In addition, the user can interrupt or prevent the loading of the Google Analytics script. Maybe they start loading a page but close the browser window before the page finishes loading and the script has time to execute. Browsers that load scripts asynchronously, along with Google’s recommendation to place the tracking script higher in the page source than websites used to, have reduced this problem. But the issue still occurs.

One alternative is to look at server-side data sources. Server-side tracking relies on your web server to count how many times a particular page or file is requested. With this approach, you get much less data about the client’s configuration: typically just an IP address, or an IP address and a user agent.
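As a rough illustration of what server-side counting involves, the Python sketch below tallies successful requests per path from access-log lines. It assumes your server writes the common Apache/Nginx "combined" log format; the sample lines and the `count_page_hits` name are made up for illustration. The only client details the log provides are the IP address and the user-agent string.

```python
import re
from collections import Counter

# "Combined" log format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_page_hits(lines):
    """Count successful (HTTP 200) requests per path, with no bot filtering at all."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("status") == "200":
            counts[match.group("path")] += 1
    return counts


sample = [
    '203.0.113.5 - - [25/Nov/2019:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '198.51.100.7 - - [25/Nov/2019:10:00:03 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [25/Nov/2019:10:00:09 +0000] "GET /missing HTTP/1.1" 404 512 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
print(count_page_hits(sample))  # Counter({'/pricing': 2})
```

The crawler’s request to `/pricing` is counted exactly like the human visitor’s.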

Time and again, I hear people say that server-side tracking and log files are much more accurate than client-side tracking. After all, server-side tracking gives you exact counts, and your server-side counts should always be higher. So that’s obviously better, right?

Not so fast.

Data discrepancies between server- and client-side tracking

Does higher really mean more accurate? Your server-side counts probably include spiders and bots that crawl your site to index or scrape it. Even if you exclude the well-known crawlers, you are probably not filtering out bot networks that try to pass as human traffic. Bot traffic is surprisingly difficult to measure precisely, but bots are commonly estimated to make up around 40% of all internet traffic.
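A naive first pass at cleaning server-side counts is to drop requests whose user-agent string identifies a crawler. The Python sketch below does that with a short, hypothetical list of markers, and it is deliberately crude. It catches self-identifying crawlers such as Googlebot and Bingbot, but a bot network that sends an ordinary browser user agent passes straight through, which is why raw server counts stay inflated.

```python
# Hypothetical, far-from-complete list of substrings that self-identifying crawlers use
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def looks_like_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def split_hits(user_agents):
    """Return (human_hits, bot_hits) based only on the user-agent string."""
    bots = sum(1 for ua in user_agents if looks_like_bot(ua))
    return len(user_agents) - bots, bots


agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0 Safari/537.36",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0 Safari/605.1.15",
]
humans, bots = split_hits(agents)
print(humans, bots)  # 2 2
```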

Google Analytics attempts to filter out known bot traffic by detecting network and user browsing patterns. The filtering process is by no means perfect — but it’s almost certainly better than your server’s raw data log.

Suddenly those server-side numbers don’t seem so infallible, do they?

Note: There is one area where server-side data does excel and should be relied on: mission-critical data that depends on an authenticated user interaction. A frequent example is ecommerce data. Because completing a purchase requires an active user whose payment has been verified and cleared, bots account for very little of that traffic.
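One practical use for that reliable server-side record is to reconcile it against what Google Analytics captured. The sketch below assumes you can export order IDs from your backend and the transaction IDs that Google Analytics recorded for the same period. The IDs and the `reconcile` helper are hypothetical.

```python
def reconcile(server_order_ids, analytics_transaction_ids):
    """Compare backend orders with the transactions analytics recorded."""
    server = set(server_order_ids)
    tracked = set(analytics_transaction_ids)
    missing = sorted(server - tracked)      # real orders analytics never saw
    unexpected = sorted(tracked - server)   # tracked transactions with no matching order
    coverage = len(server & tracked) / len(server) if server else 1.0
    return missing, unexpected, coverage


server_orders = ["A1001", "A1002", "A1003", "A1004", "A1005"]
analytics_transactions = ["A1001", "A1002", "A1004", "A1005"]

missing, unexpected, coverage = reconcile(server_orders, analytics_transactions)
print(missing, unexpected, f"{coverage:.0%}")  # ['A1003'] [] 80%
```

A small, stable coverage gap is typical client-side loss; a sudden drop is worth investigating.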

Data discrepancies between ad servers and your website

Another common source of frustration when analyzing website data is the frequent disparity between the number of users one system (say, Facebook’s Ads Manager) says it is sending to your site and the number that Google Analytics says actually arrived. These differences can be stark: It isn’t unusual to see an ad server report sending twice as many users as Google Analytics records.

This difference can have several possible causes, many of which I’ve already touched on:

  • Users blocking analytics in their browser
  • Users clicking an ad (often accidentally) and then closing the browser or hitting the back button before the destination page fully loads
  • Google Analytics filtering out bot traffic before it reaches your reports
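To put a number on the gap, it helps to express it as the share of ad-reported clicks that never show up as sessions. The Python sketch below uses hypothetical figures in line with the two-to-one example above; in practice you would pull both numbers for the same campaign and date range. The `tracking_gap` helper is hypothetical, and it is a sanity-check metric, not a diagnosis of which cause is responsible.

```python
def tracking_gap(ad_clicks: int, analytics_sessions: int) -> float:
    """Share of clicks reported by the ad platform that analytics never recorded."""
    if ad_clicks <= 0:
        raise ValueError("ad_clicks must be positive")
    # Clamp at zero: analytics can occasionally report more sessions than clicks
    return max(ad_clicks - analytics_sessions, 0) / ad_clicks


# Hypothetical weekly figures: the ad platform reports roughly twice what analytics sees
weekly = [(2000, 1000), (1800, 930), (2200, 1080)]
for clicks, sessions in weekly:
    print(f"{tracking_gap(clicks, sessions):.0%}")
```

A gap that stays roughly constant from week to week is consistent with the causes listed above; a sudden change is the signal to dig in.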

Finally, if you see a massive gap and almost no campaign data in Google Analytics, your campaign tracking parameters might not be configured correctly. Or there might be a problem in the chain of ad-server redirects that the traffic passes through before it reaches your site.
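Google Analytics attributes campaign traffic through the `utm_source`, `utm_medium`, and `utm_campaign` query parameters on the landing URL. The Python sketch below uses only the standard library’s `urllib.parse` to build a tagged URL and to check a landing URL for missing parameters. The helper names and example values are hypothetical. Running the check on the URL you actually land on after clicking through the ad is one way to see whether the parameters survive the redirects.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

REQUIRED = ("utm_source", "utm_medium", "utm_campaign")


def tag_url(base_url, source, medium, campaign):
    """Add campaign parameters to a URL, keeping any existing query string."""
    parts = urlparse(base_url)
    query = parse_qs(parts.query)
    query.update({"utm_source": [source], "utm_medium": [medium], "utm_campaign": [campaign]})
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def missing_utm_params(landing_url):
    """List required campaign parameters that are absent or empty on a landing URL."""
    query = parse_qs(urlparse(landing_url).query)
    return [name for name in REQUIRED if not query.get(name)]


url = tag_url("https://www.example.com/sale", "facebook", "paid_social", "black_friday_2019")
print(url)
# https://www.example.com/sale?utm_source=facebook&utm_medium=paid_social&utm_campaign=black_friday_2019
print(missing_utm_params(url))  # []
print(missing_utm_params("https://www.example.com/sale"))  # ['utm_source', 'utm_medium', 'utm_campaign']
```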

An important point: Companies that sell ad space are usually trying to justify their existence and so will show the rosiest numbers possible. Most don’t make much effort to weed out bots; the ones that do (like Google Ads) offer very little transparency or specifics about which traffic was fraudulent.

Can you learn to love your imperfect data?

Google Analytics data isn’t perfect. But I hope I’ve shown you why no analytics implementation is. They all have shortcomings, but for most purposes, a client-side tracker like Google Analytics offers numerous advantages.

If you take nothing else away from this post, remember this: Even if the data isn’t perfect, that’s okay. It’s actually more than okay. The most valuable insights in analytics don’t come from exact numbers. 

You can lose yourself in a forest of data if you don’t keep an eye on the most valuable insights: trends, comparisons, and relative changes. 

Analytics data that’s 10% to 20% off might seem like a big deal. But generally, this level of discrepancy has very little effect on the day-to-day reporting of your site, especially if the numbers are consistently off by about the same percentage. It might feel as though knowing whether a certain page truly had 22,000 or 24,000 pageviews last month is vital, but trust me … it rarely is.
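A quick way to convince yourself: if analytics misses the same share of traffic every month, the relative change it reports is unaffected, because the undercount cancels out of the ratio. The Python sketch below assumes a hypothetical, constant 85% capture rate and made-up monthly pageview totals.

```python
def relative_change(previous: float, current: float) -> float:
    return (current - previous) / previous


true_pageviews = {"october": 20_000, "november": 24_000}
capture_rate = 0.85  # hypothetical: analytics consistently misses 15% of pageviews

measured = {month: count * capture_rate for month, count in true_pageviews.items()}

# (0.85*b - 0.85*a) / (0.85*a) == (b - a) / a, so the trend survives the undercount
true_trend = relative_change(true_pageviews["october"], true_pageviews["november"])
measured_trend = relative_change(measured["october"], measured["november"])
print(f"{true_trend:.1%} vs {measured_trend:.1%}")  # 20.0% vs 20.0%
```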

What is a big deal? A 20% month-over-month decline in your bounce rate. A 10% improvement in your mobile conversion rate. Organic search traffic that has doubled over the past year.

Focus on the big picture and what’s really important; spend less time worrying about the details. Now that’s a win-win.