Every company wants to deliver high-value data insights, but not every company is ready or able. Too often, they believe the marketing hype around point-and-click, no-code data connectors. Just set it and forget it, and all the hard work gets done, right? The hard truth is that all data must pass through several critical steps before teams can get what they actually want: high-value insights.
In many respects, centralizing the data is the easy part. There are more options than ever to quickly and easily build pipelines that ingest data from hundreds of sources and load it into a data warehouse or a data lake. However, today's tooling marketing overlooks a major topic: enabling the transition from data centralization to high-value insight delivery. This is the hard work of data science. High-value insights aren't free; companies can only get them by taking data through its paces.
The process of moving from “we have data” to “we have value” is complex and can be grouped into four key steps:
- Extract and Load: get data from sources
- Transform: get high-level data insights
- Learn: use ML/AI models
- Serve: get insights into applications
These steps are a progression of data-specific work, not an assembly of processes or tools. Let’s talk about how to move from “Extract and Load” to “Serve” and what it means to “take data through its paces.”
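To make the progression concrete, the four steps can be sketched end to end in a few lines of Python. Everything here is illustrative: the function and field names are invented, and in a real stack each stage would be handled by dedicated tooling rather than hand-rolled functions.

```python
# Illustrative sketch of the four steps as plain Python functions.
# All names are hypothetical; real stacks use connectors, a warehouse,
# and an ML framework for these stages.

def extract_and_load(sources):
    """Extract and Load: pull raw records from each source into one central store."""
    warehouse = []
    for source in sources:
        warehouse.extend(source)
    return warehouse

def transform(warehouse):
    """Transform: clean and aggregate raw records into metrics."""
    events = [r for r in warehouse if r.get("event")]  # drop malformed rows
    return {"total_events": len(events),
            "active_users": len({r["user"] for r in events})}

def learn(metrics, history):
    """Learn: a trivial 'model' that predicts next period from the average."""
    past = history + [metrics["total_events"]]
    return sum(past) / len(past)

def serve(prediction):
    """Serve: expose the insight to an application, e.g. as an API payload."""
    return {"predicted_events": round(prediction, 1)}

sources = [
    [{"user": "a", "event": "login"}, {"user": "b", "event": "signup"}],
    [{"user": "a", "event": "click"}, {"user": "c", "event": None}],
]
warehouse = extract_and_load(sources)
metrics = transform(warehouse)
payload = serve(learn(metrics, history=[2, 4]))
```

Even in this toy form, the dependency ordering is visible: "Serve" only produces something trustworthy because the layers below it cleaned and summarized the data first.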
Taking Data Through Its Paces
Data teams are tasked with building a data stack to support business needs in what we call the “Hierarchy of Needs for Data-Driven Companies”: the four layers introduced above, from “Extract and Load” at the base up to “Serve” at the top.
Business-critical data is sitting in dozens, if not hundreds, of different sources. In some cases service providers like Datacoral and Fivetran offer connectors to pipe that data into centralized warehouses and lakes. And in others, engineering teams build these pipelines with custom code. But even when the data is flowing, it takes expertise to figure out what that data actually means. Companies can’t just jump into the data and start serving insights that predict and improve business outcomes. There’s foundational work to be done first.
The foundational work of data science includes: experimentation, hypothesis testing, normalization, cleansing, transformation, visualization, analysis, and more. Insights can’t be served without a deep understanding of the data. Let’s take a look at how this typically plays out.
Moving From “Extract and Load” to “Transform”
Let’s imagine a small startup that has just signed up their first few customers. They have product usage data being saved in their PostgreSQL database and are beginning to think about analytics and deriving value from this data. At the moment, they are at the very bottom of the data hierarchy. The first step for them to ascend the data hierarchy is to extract data from their database, or other data sources, and load it into a data warehouse, such as Snowflake.
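As a rough sketch of that extract-and-load step, the snippet below copies rows from a source database into a warehouse table. SQLite stands in for both PostgreSQL and Snowflake purely to keep the example self-contained, and the table and column names are invented for illustration.

```python
import sqlite3

# Hypothetical source: product usage data in an application database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE product_events (user_id TEXT, action TEXT)")
source.executemany("INSERT INTO product_events VALUES (?, ?)",
                   [("u1", "login"), ("u2", "signup"), ("u1", "upgrade")])

# Hypothetical warehouse: a raw landing table for the same events.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE raw_product_events (user_id TEXT, action TEXT)")

# Extract from the source, load into the warehouse.
rows = source.execute("SELECT user_id, action FROM product_events").fetchall()
warehouse.executemany("INSERT INTO raw_product_events VALUES (?, ?)", rows)
warehouse.commit()

loaded = warehouse.execute(
    "SELECT COUNT(*) FROM raw_product_events").fetchone()[0]
```

A managed connector does essentially this loop, plus the unglamorous parts (incremental syncs, schema changes, retries) that make it worth paying for.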
Once the data is available in their warehouse, the team can start to explore and clean the data with simple transformations. Since their team is small, this work is likely done as a side project by one of their software engineers (or a very technical product manager), and the outcome is a small dashboard that shows important metrics about how users are using the startup’s product. At this point, the startup has been able to not just extract and load data from their sources, but transform it to produce KPIs and metrics that are relevant to their product. This is already creating significant value. When the team grows in size and sophistication, they can begin developing machine learning models and serving data into apps.
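A "Transform" step like the one described might look like the following sketch: a SQL aggregation over raw events that produces one KPI (daily active users). Again, SQLite stands in for the warehouse, and the table, column, and metric names are illustrative, not from any particular product.

```python
import sqlite3

# Hypothetical warehouse table of raw product events.
wh = sqlite3.connect(":memory:")
wh.execute(
    "CREATE TABLE raw_product_events (user_id TEXT, action TEXT, day TEXT)")
wh.executemany("INSERT INTO raw_product_events VALUES (?, ?, ?)", [
    ("u1", "login",   "2021-01-01"),
    ("u2", "login",   "2021-01-01"),
    ("u1", "upgrade", "2021-01-02"),
])

# One simple KPI for the dashboard: daily active users.
dau = dict(wh.execute(
    "SELECT day, COUNT(DISTINCT user_id) "
    "FROM raw_product_events GROUP BY day"
).fetchall())
```

Queries of this shape, scheduled and materialized into reporting tables, are typically what powers that first metrics dashboard.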
Moving From “Learn” to “Serve”
Now, let’s imagine a larger, more mature company. This company already has connectors for extracting data from different sources. Data is already consolidated in a data warehouse. The team has spent months exploring and cleansing their data and they have simple dashboards that display key metrics. The team trusts the data quality enough to rely on these metrics for internal reporting. They now identify use cases for predictive analytics, and decide to make their first Data Science hire. The new Data Scientist is able to rely on clean and well-understood data in the warehouse and helps the team climb the data hierarchy by training the first ML model.
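That first "Learn" step can be as modest as fitting a one-variable model. The sketch below fits a least-squares line in pure Python to predict customer lifetime value from an engagement metric; the feature, the data, and the helper names are all invented, and a real team would likely reach for a library such as scikit-learn on warehouse data instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical training data: engagement (monthly active days)
# vs. observed customer lifetime value ($).
engagement = [2, 4, 6, 8]
ltv = [20, 40, 60, 80]
a, b = fit_line(engagement, ltv)

def predict_ltv(x):
    """Predicted LTV for a customer with engagement score x."""
    return a * x + b
```

The point is less the model than the dependency: a prediction like `predict_ltv` is only as trustworthy as the cleaned, well-understood data the earlier layers produced.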
With an ML model in production, the team is able to serve internal and external use cases: customer lifetime value (LTV) estimations for their Sales and Marketing teams, or time-saving features for their customers. At this point, the team is successfully navigating the data hierarchy; they are going all the way from raw data locked in source systems to serving insights in their applications. While there is always more work to be done to scale to more data sources, new use cases, and larger data volumes, this is a successful data journey that very few companies are able to complete. For an example, see how Greenhouse improved their users' experience of their product as they ascended the data hierarchy.
Data Hierarchy as a Framework
The idea of building a data stack can be intimidating. Employing the data hierarchy as an overall design framework can put your mind at ease. It’s tool-agnostic, non-prescriptive, and focuses on the high-level design and execution processes.
The first layer of the framework (“Extract and Load”) involves centralizing data into a warehouse or lake. The second layer (“Transform”) involves aggregating data from multiple sources and discovering meaningful insights. The third layer (“Learn”) builds on those insights with machine learning models and artificial intelligence. The fourth layer (“Serve”) allows data from AI/ML work to add value to internal and external services, customers, dashboards, and more.
By identifying which parts of a data stack fit into each layer of the framework, a data team can take a conceptual approach to delivering value. It's not a one-to-one relationship, either; some data tooling can serve multiple layers of the framework. For example, Fivetran can do work in the “Extract and Load” and “Transform” layers. Datacoral can do work in all four layers. Custom code might be used to serve the “Extract and Load” and “Learn” layers. The key takeaway: use the framework to guide the design, rather than working backward from a collection of tools to figure out where the value is.
Moving from “Extract and Load” up to “Serve” is what we call “taking the data through its paces.”
We often see confusion when teams adopt a tool-first mindset instead of a data hierarchy mindset. The number of different components and interconnections can seem astronomical. (See our data quality post to learn more.) Finding the right combination of tools to meet specific needs is overwhelming for small data teams just getting started, and there's a lot at stake. This web of tools needs to survive and enable the company to grow for the next two or three years, and the capabilities of the tools need to scale with that growth.
If you’re looking for examples of how to take data through its paces, we have two stories to share. Enterprise talent acquisition platform Greenhouse used ML models to improve their interview scheduling experience and solve their “pointillistic spaghetti data problem.” Gig-economy startup Jyve used their data intelligence to inform “Jyvers” on their platform and expand into new markets. Small data insights can go a long way!
Delivering Valuable Insights with Data
Companies looking to deliver high-value insights need to put data through its paces. The Data Hierarchy of Needs provides “the what.” As we have explored in prior posts about simplifying the modern data stack and using metadata's operational capabilities to future-proof that stack, the metadata-first, three-layer framework provides “the how.”
The framework’s data flow layer represents the transition from “data” to “value.” But it is just a start. These ideas collectively offer a way of thinking in an objective and abstract way about your data.
The hierarchy makes it clear that there is a sequence of dependencies in transitioning from “having data” to “getting high-value insights from data.” This is how data-driven companies operate, and it is no accident: they take data through its paces, which requires expertise and experimentation.
Looking at data through the lens of the Data Hierarchy of Needs allows any data team to develop a meaningful understanding of where they are in the hierarchy, assess the strengths and weaknesses of their current position, and plan how to move up to the next layer.
To learn more about how Datacoral puts data through its paces and uses metadata to improve teams' dataflows, visit www.datacoral.com or reach out to our team at hello at datacoral dot co.