“The temptation to form premature theories upon insufficient data is the bane of our profession” — Sherlock Holmes
When Arthur Conan Doyle wrote these words in 1915, he may as well have been talking about every large business in the very early 21st century. Over the last decade, these businesses have woken up to a very new world.
With the advent of the smartphone in 2005 and the more recent proliferation of smart devices, including home technology (security, irrigation, control systems), agricultural technology (energy and water systems, controlled environment agriculture, autonomous vehicles and tools), manufacturing technology (manufacturing robotics, supply chain speech and vision systems), and transportation and delivery technology (UAVs, robots), we have over 27 billion smart connected devices worldwide today. That number is likely to be 80 billion by 2025: ten devices for every human on this planet. Imagine what these devices could do for businesses catering to their customers, be it growing better quality food, preventing fires or intervening in critical health situations; the list could go on. They render critical information that could solve so many hard problems! Businesses could be safer, more responsive and able to offer a whole new level of convenience to their customers.
The proliferation of these devices has not been lost on these businesses, which are now collecting data at an exponential rate. In fact, every day we produce 2.5 quintillion bytes of data, and we’ve collected more digital data this past year than in the previous five thousand years of human history.
In a bid to make sense of all this data, businesses have begun to employ and train data scientists and machine learning engineers. These job categories are so hot that a report by LinkedIn found that 6.8X more people list their jobs as Data Scientist and 9.8X more as Machine Learning Engineer today than did five years ago. These were barely recognized as serious jobs a decade ago!
Given the above, it would seem that Sherlock Holmes’s statement is finally irrelevant in the world of business today. Unfortunately, that is anything but true.
The Plumbing Matters
It seems that all along, while making new devices, we lost sight of how painful collecting data (and making sense of it) is. We learned how difficult it really is to move data, store data and make data available accurately and reliably, a problem compounded by the sheer number of connected devices in the world.
- The problem isn’t the data science; it is the quality of the data. Keeping data synchronized, clean and reliably piped is much harder than even the most astute engineers could imagine. It has been a big bottleneck to the insights we need.
- When dealing with distributed data at this scale, the security and privacy of data become very hard to control.
- All data is unique, so a single generalized platform can’t solve the problem.
The Birth of a Framework
This is where Raghu Murthy comes in. In January 2016 he was the first entrepreneur to sign up for our Discover program, founded by a small group of technologists and entrepreneurs on a mission to solve 40 of the world’s hardest problems by 2045. Raghu was instrumental in growing Facebook’s infrastructure to exabytes of data as the company grew from fifty million to one billion users. Having seen, lived and experienced these problems himself, he decided to build a framework called “Datacoral” that would solve the problem for data scientists and data engineers:
- Break the problem into little blocks: each of these byte-sized operations is called a slice, and does the job of collecting, organizing and enabling access to data. Because these slices are functional tools, they can be deployed in the cloud, on devices and on any machine.
- Make sure the developer retains control: each slice runs within the developer’s AWS account and on their servers.
- Make sure privacy and security are inherently a part of this framework: this is a core principle of its design.
- Automate data flow and fix broken plumbing: the flow of data is continuously monitored and corrected when it breaks.
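To make the idea of a slice a little more concrete, here is a minimal Python sketch of one small unit that collects, organizes and serves data. It is purely illustrative and is not Datacoral's actual API: the `Slice` class, its `collect`, `organize` and `query` members, and the sample sensor data are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

Record = dict  # one row of data


@dataclass
class Slice:
    """Hypothetical 'slice': a small unit that collects, organizes and serves data."""
    name: str
    collect: Callable[[], Iterable[Record]]   # pulls raw records from a source
    organize: Callable[[Record], Record]      # cleans and normalizes one record
    store: list = field(default_factory=list)

    def run(self) -> int:
        """Collect raw records, keep the ones that normalize cleanly, return the count."""
        accepted = 0
        for raw in self.collect():
            try:
                self.store.append(self.organize(raw))
                accepted += 1
            except (KeyError, ValueError):
                # A malformed record is skipped instead of breaking the whole flow.
                continue
        return accepted

    def query(self, predicate: Callable[[Record], bool]) -> list:
        """Enable access to the organized data."""
        return [r for r in self.store if predicate(r)]


def sensor_readings() -> list:
    # Made-up device data, including two bad records.
    return [
        {"device": "a1", "temp_c": "21.5"},
        {"device": "b2", "temp_c": "not-a-number"},
        {"device": "c3"},
        {"device": "d4", "temp_c": "30.0"},
    ]


def normalize(raw: Record) -> Record:
    return {"device": raw["device"], "temp_c": float(raw["temp_c"])}


readings = Slice("sensor_readings", sensor_readings, normalize)
accepted = readings.run()
hot = readings.query(lambda r: r["temp_c"] > 25)
print(accepted, hot)
```

The point of the sketch is only the shape: each block is small, does one job end to end, and quietly handles dirty input so the rest of the pipeline keeps flowing.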
After two hard years of building out this framework and working with their early customers, including Greenhouse, Front, Fin, Ezetap, Swing Education and mPharma, Datacoral is finally ready to officially open their doors to make their “data infrastructure as a service” platform available to more companies. Along with Raghu and the Datacoral team, I’m thrilled to welcome Sudip and the Madrona Venture Group as partners on this journey. Head over to www.datacoral.com if you’re interested in making your data hum for you.
PS: Thanks to Phil Deutch and Milind Gadekar for their counsel and advice.