Own Your Own Analytics Stack

(Author’s Note, please read the following in a Werner Herzog voice) “Since the dawn of the first web page counters and ‘Under Construction’ gifs prominently displayed on Geocities websites, mankind has always strived to know more about how people interact with their digital properties.”

The tools needed to properly account for user actions have long been out of reach: the patterns were proprietary and the scaling was expensive. But in today’s technological landscape, things are shifting heavily toward giving organizations better control over their analytics stack. More options are available today that offer greater flexibility, reliability, and privacy than traditional analytics SaaS apps. As the world is on the precipice of enforcing better safeguards for consumer privacy, it will soon become a requirement to know how the data you collect is processed and to ensure that it isn’t used against your customers or your organization.

What You Lose by not Owning

Let’s discuss the benefits you may not be aware of and end with some options you can consider. And Werner, if you’re reading this, let’s make it into a documentary 😁.

A More Complete Picture

When your view is abstracted away from the raw data, you rely on the reporting mechanisms of 3rd Party Vendors to know what to focus on. Worse yet, you are limited in how you can incorporate future automation that separates signal from noise. Proactive alerts that tell you when something goes wrong and your most important metrics are in dire straits require capabilities that your Vendor may not be able to support.
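To make the alerting point concrete, here is a minimal sketch of the kind of check that becomes possible once you hold the raw numbers yourself. It assumes you can pull a short history of a daily metric out of your own store; the function name, the sample figures, and the three-standard-deviation threshold are all hypothetical choices for illustration, not a recommendation.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` when it sits more than `threshold` standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical daily checkout counts pulled from your own warehouse.
daily_checkouts = [100, 102, 98, 101, 99]
if is_anomalous(daily_checkouts, latest=40):
    print("ALERT: checkouts are far outside their normal range")
```

A real setup would likely account for seasonality and route the alert to a pager or chat channel, but even a check this small depends on having direct access to the underlying data.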

Since your insight into the data is transactional, relying solely on 3rd Parties to provide insights locks your organization into “coin-operated binoculars”: you pay constantly to maintain access to your data while your field of view stays limited. There is no way out of owning your data; outsourcing it simply delays the inevitable. The sooner you prioritize being the keeper of the data, the less pain your organization will feel when a future migration to centralize the data is forced upon it.

Consumer Privacy Controls

Online data privacy is a growing focus for legislative bodies all across the globe. With news of the largest players in the space incurring fines that run into the billions, more scrutiny will be placed on giving consumers control over how their data is used.

You will also have to ensure that data stays in, or is only processed within, the region where it was created. This is difficult for many legacy SaaS solutions that made architecture decisions to centralize data processing for the sake of cost efficiency.

There are other questions you should be able to answer about your users’ data: Do you know everything that the tools on your site are tracking? Will you be liable for enforcing do-not-track requests if your Vendors fail to respect them? Will your data be purged when you’re done with the vendor, or does it live on? Not having answers to these questions will only go so far; organizations will soon be expected to be knowledgeable about all aspects of this sensitive data.

Sovereign Priorities

The typical “web tag” architecture for collecting analytics invites a fox into the hen house. Not only are these Vendors collecting more information than you think, but your data can also end up being leveraged by your competition. Advertisers in particular prioritize selling ad space, which can lead to the consumer you’ve been trying to convert being shown an alternative offer from someone else.

This doesn’t mean you cannot use any Vendors to advertise. Instead, by controlling an upstream solution that sends data to these Vendors, there are techniques you can leverage to provide insights without letting them re-market to your customers without you. One of these techniques is hashing your User Identifiers so they are unique per downstream platform while allowing you (and only you) to unify activity across all Vendors.
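One way the per-vendor hashing idea could look in practice is sketched below. It is only an illustration under stated assumptions: the secret, the vendor names, and the helper functions are hypothetical, and keyed HMAC-SHA256 is my choice of hash here, not something a particular product prescribes.

```python
import hashlib
import hmac

# Hypothetical secret. In practice, load it from a secret manager and
# never commit it to source control.
MASTER_SECRET = b"replace-with-a-long-random-secret"


def vendor_user_id(user_id, vendor, secret=MASTER_SECRET):
    """Return a pseudonymous ID that is stable for a (user, vendor) pair
    but different for every vendor."""
    # Derive a vendor-specific key so two vendors cannot join their IDs.
    vendor_key = hmac.new(secret, vendor.encode("utf-8"), hashlib.sha256).digest()
    return hmac.new(vendor_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def build_lookup(user_ids, vendors, secret=MASTER_SECRET):
    """Map (vendor, pseudonymous ID) back to your internal user ID.
    This table stays on your side and is never shared."""
    return {
        (vendor, vendor_user_id(user_id, vendor, secret)): user_id
        for user_id in user_ids
        for vendor in vendors
    }


ads_id = vendor_user_id("user-123", "ads_vendor")
email_id = vendor_user_id("user-123", "email_vendor")
lookup = build_lookup(["user-123"], ["ads_vendor", "email_vendor"])
print(ads_id != email_id)                # the two vendors see unrelated IDs
print(lookup[("ads_vendor", ads_id)])    # only you can map back to the user
```

Because the hash is keyed with a secret only you hold, a Vendor cannot recompute another Vendor’s identifier from its own, while you can still unify activity through the lookup table.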

It’s Easier Than You Think

Great, so by now, I should have painted a scary, post-apocalyptic picture of the future where you need to own the mechanisms that power the insights your organization relies on and spends a lot of cash on.

But you don’t need to go down this pathway alone; there are a lot of great solutions, wholly or partially managed, that can give your team the firepower they need to gain control over your analytics.

Consumer Data Tools

Worried that your team cannot support their own stack long-term? You can use one of these managed systems to better wrangle who has access to your data and where that data goes without needing to cobble a solution from scratch.

  • MetaRouter - MetaRouter is a Consumer Data Infrastructure (CDI) platform that gives you a private environment to process data upstream of your vendors. They also have a proprietary Identity Graph that helps sync consumer identifiers in a performant and privacy-focused way. (I’m also one of the founding members, so maybe a bit biased 😅)
  • Countly - Countly is an all-in-one, installable analytics tool where you can send metrics and see insights all within an environment you fully control.

Build Your Own

Plenty of new, managed tools from Cloud providers can be combined to create your own Analytics Platform. Here are the main tentpoles you want to incorporate into your architecture:

  • Client-Side Connections: Ensure that web events generated by public traffic can safely and securely make it into your Solution. Managed, serverless products like Google Cloud’s API Gateway can connect to Cloud Functions and route data further into your Cloud environment. Focusing on a solution that can ingest HTTPS or gRPC requests directly will allow all of your digital channels to feed insights.
  • Streaming Data Platform: You will want to have a solution to ensure that the streaming data coming into your API is durable and can be syndicated to multiple destinations (if desired) with minimum latency. Google Cloud offers Pub/Sub, Dataflow, and Datastream, which are excellent products for duplicating and manipulating data payloads in real-time. Cloud-agnostic solutions, such as Kafka or Redpanda, are also powerful tools that can ensure that data does not drop if there are issues with downstream processors.
  • Data Warehouse / Data Lake: Data Warehouses will be the solution that powers your insights, allowing for a wide range of activity to be reported upon to clearly see consumer signals. Google Cloud’s BigQuery is the best managed tool I’ve used for this, as it’s scalable without much work, forgiving with migrations and backups, and has great controls to purge data and ensure you’re not holding onto protected Consumer data for too long.
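To show how the three tentpoles above fit together, here is a rough, in-memory sketch of the data flow. Everything in it is a stand-in: `parse_event` plays the role of the HTTPS ingestion endpoint, `Topic` the streaming platform, and `Warehouse` the data warehouse. None of these classes are real cloud APIs, and the event schema is a hypothetical one chosen for the example.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

REQUIRED_FIELDS = {"event", "user_id", "timestamp"}  # hypothetical schema


def parse_event(raw_body):
    """Ingestion step: validate a JSON payload received from a client."""
    event = json.loads(raw_body)
    if not isinstance(event, dict):
        raise ValueError("event must be a JSON object")
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return event


@dataclass
class Topic:
    """Streaming step: fan each event out to every subscriber."""
    subscribers: list = field(default_factory=list)

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        for handler in self.subscribers:
            handler(event)


@dataclass
class Warehouse:
    """Storage step: append-only rows plus a retention purge."""
    rows: list = field(default_factory=list)

    def insert(self, event):
        self.rows.append(event)

    def purge_older_than(self, cutoff):
        """Drop rows older than `cutoff` and return how many were removed."""
        before = len(self.rows)
        self.rows = [
            row for row in self.rows
            if datetime.fromisoformat(row["timestamp"]) >= cutoff
        ]
        return before - len(self.rows)


def retention_cutoff(days, now=None):
    """Return the oldest timestamp that a `days`-long retention window keeps."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=days)


topic = Topic()
warehouse = Warehouse()
forwarded_to_vendor = []                      # stand-in for a downstream vendor
topic.subscribe(warehouse.insert)
topic.subscribe(forwarded_to_vendor.append)

body = '{"event": "page_view", "user_id": "u1", "timestamp": "2024-01-01T00:00:00+00:00"}'
topic.publish(parse_event(body))
print(len(warehouse.rows), len(forwarded_to_vendor))
```

In a real build, each stand-in would be swapped for a managed service, and the purge would typically be handled by the warehouse’s own expiration controls rather than application code. The point of the sketch is only the shape: validate at the edge, fan out through a durable stream, and keep retention under your own control.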

You Can Do It!

Owning your own analytics infrastructure will serve you well long-term: you’ll have a more complete picture of activity that isn’t pay-gated, you’ll better respect your consumers’ privacy, and you won’t be beholden to the competing focus of another organization when they engage your customers. And it is easier than you think to implement, thanks to the availability of new or managed solutions that organizations of any size can adopt.