Analytics needs data lakes – but they will also disappoint us

Data lakes promise to unlock new ways of analyzing data, says David Creelman. But he also says they are not the silver bullet we all think they are.

Sep 13, 2024

People are usually attracted to analytics because they love using powerful tools to find patterns in data.

Sadly, when they get on the job, they do not spend much time doing anything like data science.

Instead, they spend the bulk of their time organizing and cleaning data to get it in a form where they can do at least a little analysis.

This is often referred to as “data wrangling.”

Data lakes

One way to avoid spending so much time on data wrangling is to routinely collect any relevant data in a form that is easier to analyze.

Historically, the tool to do this was called a data warehouse – which aimed to get data from various systems cleanly organized in one place.

A newer approach, however, is to build what is called a “data lake.”

A data lake is a central location that holds a large amount of data in its native, raw format. It differs from a data warehouse, which stores data that has already been cleaned and organized into a predefined structure.

Compared to a data warehouse, a data lake approach is typically regarded as a much more flexible and scalable solution because it is capable of handling both structured and unstructured data.
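
To make the distinction concrete, here is a minimal Python sketch. It assumes a hypothetical set of HR records; the field names (`employee_id`, `rating`, `exit_interview_notes`, `survey_comment`) and the helper functions are invented for illustration and do not come from any particular product. The warehouse-style loader enforces a fixed structure when data is written and rejects anything that does not fit, while the lake-style store keeps every record as raw text and applies structure only when someone reads it.

```python
import json

# Hypothetical raw records from different HR systems (illustrative only).
RAW_RECORDS = [
    '{"employee_id": 1, "rating": 4}',
    '{"employee_id": 2, "rating": 3, "exit_interview_notes": "Left for a promotion elsewhere."}',
    '{"employee_id": 3, "survey_comment": "My manager rarely gives feedback."}',
]

WAREHOUSE_COLUMNS = ("employee_id", "rating")  # assumed fixed schema


def load_into_warehouse(raw_records):
    """Schema-on-write: keep only rows that fit the fixed columns exactly."""
    table, rejected = [], []
    for raw in raw_records:
        record = json.loads(raw)
        if set(record) == set(WAREHOUSE_COLUMNS):
            table.append(tuple(record[c] for c in WAREHOUSE_COLUMNS))
        else:
            rejected.append(raw)
    return table, rejected


def load_into_lake(raw_records):
    """Store everything as-is; no structure is imposed at write time."""
    return list(raw_records)


def read_from_lake(lake, field):
    """Schema-on-read: parse at query time and pull out one field if present."""
    values = []
    for raw in lake:
        record = json.loads(raw)
        if field in record:
            values.append(record[field])
    return values


table, rejected = load_into_warehouse(RAW_RECORDS)
lake = load_into_lake(RAW_RECORDS)
print(len(table), "rows in warehouse;", len(rejected), "rejected")
print(len(lake), "records in lake")
print(read_from_lake(lake, "survey_comment"))
```

In this toy setup the lake keeps the free-text records that the warehouse loader turns away, which is the flexibility described above. Note, though, that asking the lake for a field no system ever recorded, such as `read_from_lake(lake, "reason_for_success")`, simply returns an empty list.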

So far so good.

There is no point in having a lot of data if you cannot analyze it, and data lakes are a big part of the solution.

As such, many people analytics departments are devoting a lot of effort to creating a data management infrastructure that will eventually make it much easier to do their analytics work.

Why data lakes will not solve all of our analytics problems

But while data lakes may sound like a good idea for your organization, we shouldn’t let our hopes get too high.

One reason is that creating and maintaining a data lake is a lot of work.

However, there is a deeper – much more fundamental – issue that gets closer to an underlying limitation of people analytics: Often the information you need to answer a business question just does not exist anywhere in the organization’s datasets.

It is not that the data is hard to access, or not integrated, or not clean, or not analyzed with sufficiently sophisticated tools. It simply does not exist.

Data that doesn’t exist

For example, we might be interested in what leads some HR business partners to be successful whereas others struggle.

We can look at their education, career histories, and performance reviews, but the underlying causes may not be found in any of those.

No matter what kind of sophisticated analysis of our data we do, we will not find the causal mechanisms that lead to HR business partner success.

Similarly, we might be desperate to know how to avoid bad managerial hires but simply not have enough data to draw any conclusions.

In both of these cases we will need to search for other types of evidence, maybe even just hints, about what is causing success or failure and act based on that.

What analytics gets wrong

People analytics is often seen as an application of data science where the essence is grappling with quantitative data.

Yet insights involving nuanced issues around people are not always found in data.

To understand what makes for a successful HR business partner, for example, you might be better off doing a series of in-depth interviews, the way an anthropologist would, to understand the dynamics that lead to success.

The bottom line is that people analytics departments should not be obsessed with analytics; they should be obsessed with providing insights into business issues.

Sometimes those insights will come from quantitative data and other times they will come from softer methods, like interviewing stakeholders.

The best analytics departments are comfortable drawing on both hard and soft insights to paint a picture of what is happening and what the organization should do.

It is smart to invest in getting your data in good shape; doing so will help quantitative analytics play a role in providing insight into issues.

But we should be aware that the data will often not have the answers we need, and we should be capable of applying different kinds of methods to guide our decision-making.