I’ve built a simple client-side website analytics tool for this site; you can see it at /analytics. It shows the following metrics:
- Page views per day
- Unique IP addresses per day
- Views per page per day.
This article eventually made it to the front page of Hacker News, which resulted in a lot of extra traffic and an opportunity to see how the tool performed under a much heavier load. I wrote about the effects of this and subsequent design changes here.
I compare the different results from CloudFlare Analytics, CloudFlare Web Analytics and my own tool in this follow-up article.
Google Analytics felt like overkill. It has so many data-points that the useful metrics are obscured. I also like this site to load quickly and GA makes it slower.
I’ve also tried CloudFlare Analytics. It’s a lot simpler than GA and better suits my use case, but I don’t think it’s accurate.
The analytics should be easy to access and easy to understand¹.
I know from my work visualizing data and building dashboards that the metrics presented will alter the user’s perception of the underlying reality.
The way that someone thinks about their impact on a business, the value they’ve produced, or the dynamics of the underlying system (a product’s quality, site performance, growth, etc) is influenced by the design decisions I make, such as which metrics are available, how easy they are to access, or which metrics are above the fold.
If I present a particular metric as if it’s important, it will be difficult for someone who uses the dashboard to resist this implied message. They’ll eventually consider the metric as a Key Indicator of some kind.
For these reasons I wanted to see only the most important metrics about my website, and I wanted to see them in a simple way without distraction.
The only metrics I’m interested in are:
- How many people are reading my site
- What are they reading
- How much are they reading.
I’d like to be able to infer whether I have a few people who read a lot, or a lot of people who read a little. (Or, as is the case, a few people who read a little.)
I’m assuming that the number of unique IP addresses is a good enough proxy for unique readers, even though I’m not considering crawlers, bots, or RSS subscribers².
The analytics “engine” works by consuming a request that is sent by the client each time a page is loaded. The request is parsed by a Cloud Function on GCP which extracts the page URL and the IP address. This is then recorded in a DataStore database along with the current date and time.
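As a rough illustration of the parsing step, here is a minimal sketch in plain Python. It is not the code running on this site: the `PageView` record, the `client_ip` and `record_page_view` helpers, the `url` payload field, the use of the `X-Forwarded-For` header, and the in-memory list standing in for the database are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import urlparse


@dataclass
class PageView:
    """One recorded page load (hypothetical record shape)."""
    page: str
    ip: str
    timestamp: datetime


def client_ip(headers, remote_addr):
    """Pick the client IP, preferring the first X-Forwarded-For entry.

    Assumption: the function sits behind a proxy that sets this header.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    if forwarded:
        return forwarded.split(",")[0].strip()
    return remote_addr


def record_page_view(payload, headers, remote_addr, store, now=None):
    """Parse one tracking request and append a PageView to `store`.

    `store` is a plain list standing in for the database.
    """
    url = payload.get("url", "")
    # Keep only the path so query strings don't split one page into many.
    page = urlparse(url).path or "/"
    view = PageView(
        page=page,
        ip=client_ip(headers, remote_addr),
        timestamp=now or datetime.now(timezone.utc),
    )
    store.append(view)
    return view
```

In a real Cloud Function the payload, headers and remote address would come from the incoming request object, and the append would be replaced by a database write.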
Viewing the analytics is as simple (and as complicated) as making a request to the database, parsing the data and visualizing it conveniently. For example, group the data by days and count the distinct IP Addresses to figure out how many people are visiting each day. This is achieved by making a request to another Cloud Function that returns a response with a JSON payload.
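The grouping described above can be sketched in a few lines of Python. This is an illustration rather than the site's actual code: the `summarise` function and the `(timestamp, ip, page)` row shape are assumptions, and the real aggregation may well happen elsewhere (for example in the browser).

```python
from collections import Counter, defaultdict
from datetime import datetime


def summarise(rows):
    """Compute the three metrics from (timestamp, ip, page) tuples."""
    views_per_day = Counter()
    ips_per_day = defaultdict(set)
    views_per_page_per_day = defaultdict(Counter)

    for timestamp, ip, page in rows:
        day = timestamp.date().isoformat()  # e.g. "2020-01-01"
        views_per_day[day] += 1
        ips_per_day[day].add(ip)  # a set keeps each IP once per day
        views_per_page_per_day[day][page] += 1

    # Plain dicts and ints only, so the result can be passed to json.dumps.
    return {
        "views_per_day": dict(views_per_day),
        "unique_ips_per_day": {d: len(ips) for d, ips in ips_per_day.items()},
        "views_per_page_per_day": {
            d: dict(counts) for d, counts in views_per_page_per_day.items()
        },
    }
```

Comparing `views_per_day` with `unique_ips_per_day` for the same day gives a rough sense of whether a few readers read a lot or many readers read a little.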
It’s not a perfect solution and there are edge cases I’m not considering. I expect it to be mostly right and good enough for my purposes. It didn’t take much effort and it was a fun mini project. The hardest part was figuring out chart.js; the slowest part was iterating on the Cloud Functions.
Mocking Cloud Functions
I haven’t figured out how to easily test cloud functions locally - it would require setting up a NoSQL database and mocking Flask requests and responses. Instead of doing that, I watched Peaky Blinders for a couple of minutes whilst each new version of the Cloud Function was deploying.
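For what it’s worth, here is a minimal sketch of how the mocking might look using Python’s built-in `unittest.mock`, with no database and no Flask installed. The `track` handler and the `save` callable are hypothetical stand-ins, not the functions deployed for this site; the only assumption about the request is that it exposes `get_json`, `headers` and `remote_addr` the way a Flask request does.

```python
from unittest.mock import Mock


def track(request, save):
    """Hypothetical handler: read the page URL and IP, then persist them.

    `request` is expected to look like a Flask request (get_json, headers,
    remote_addr); `save` is any callable that stores one record.
    """
    body = request.get_json(silent=True) or {}
    ip = request.headers.get("X-Forwarded-For", request.remote_addr)
    save({"page": body.get("url", ""), "ip": ip})
    return ("", 204)  # empty body, "No Content" status


def test_track_saves_page_and_ip():
    # Fake request: only the attributes the handler touches are set up.
    request = Mock()
    request.get_json.return_value = {"url": "/analytics"}
    request.headers = {"X-Forwarded-For": "203.0.113.7"}
    request.remote_addr = "10.0.0.1"
    save = Mock()  # stands in for the database write

    response = track(request, save)

    save.assert_called_once_with({"page": "/analytics", "ip": "203.0.113.7"})
    assert response == ("", 204)


test_track_saves_page_and_ip()
```

Passing the storage step in as a callable is a design choice for the sketch: it lets the test swap in a `Mock` without patching any database client.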
Unless someone decides to spam the site, I expect the costs to be less than €1/month. This site is hosted using CloudFlare, so I suppose I could set up some page rules to prevent malicious traffic³.
Tasks for later
- Make /analytics.html load faster - latency is caused by the Cloud Function initialising. Short of paying actual money for always-on resources I can’t see a way to reduce this. However, it’s only an issue if you are the first person to view the page in the last ~10 minutes - this blog post explains why.
- Add loading spinners - I used the same snippets as in my Machine Vision demo.
- Group data by weeks or months as well as days.
- Aggregate the data (once per day) in a Cloud Function instead of repeatedly in the browser.
- Understand why the DataStore API is called multiple times for a single fetch.
- I’d be interested to know if there is a way to track RSS subscribers. I know that the usual method is to inspect server logs, but this site is hosted on GitHub Pages so I don’t think this is possible.
I’ve used the chart.js library because it’s reasonably fast and lightweight. My preferred library would be Plotly if it could be responsive and fast even if there are >10 charts to render. Has plotly.js improved recently to the point where it wouldn’t cause a browser to lag if multiple plots are being rendered?
Finally, it occurs to me that I could make an analytics widget for my desktop using Übersicht. It could show page views for the current day perhaps. I’ve made a couple of widgets before [1, 2] which were written in CoffeeScript, but the newer widgets are written in React, so I guess this is an opportunity to learn⁴.
- In Google Analytics it can be fun clicking around on all the things and seeing lots of options, but it’s not really useful once the novelty has worn off. ↩
- I think this might be quite wrong, but I don’t know why. ↩
- The page is now rate limited to 5 requests per minute per IP address. ↩
- Done! My desktop now looks like this: ↩
- I failed the exam because I’d been working on Ry’s Git Tutorial instead. ↩