September was a big month for marketing data infrastructure at Datacoral, especially in the context of our partnership with Amazon Web Services. At the beginning of the month we conducted and published our first webinar recording:
This event covered how Datacoral goes beyond popular cloud-based ETL and ELT products to support a cost effective, scalable and compelling data infrastructure platform within AWS using native AWS services.
Beyond supporting AWS best-practice ELT centered around S3, Redshift, and Athena, the five additional requirements are:
Datacoral’s founder, Raghu Murthy, was featured on Dan Woods’ podcast at EarlyAdopter.com, talking about the origin of Datacoral and the advantages of deploying serverless data integration technology.
We are very pleased to see that Amazon Web Services has published our blog about how SQL is the data programming language used to build data pipelines in Datacoral.
This article builds upon our earlier data programming series, and does a nice job of illustrating how we use SQL as the programming interface, combining it with header comments: key/value settings that tell Datacoral how to process the query. When we deploy it in our system, we call it a Data Programming Language (.dpl) file, but it’s just your query with Datacoral instructions in the headers.
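To make that concrete, here is a rough sketch of what such a file might look like. The header keys, schedule value, and table names below are invented for illustration; Datacoral’s actual header syntax may differ, and the AWS blog post linked above shows the real format.

```sql
-- datacoral: schedule = hourly          (hypothetical header key)
-- datacoral: output = warehouse.daily_orders
-- The headers above are key/value settings telling the pipeline how to
-- materialize the query; the body below is just ordinary SQL.
SELECT order_date,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_revenue
FROM   raw.orders
GROUP  BY order_date;
```

The point is that analysts only write SQL; the orchestration details ride along in comments rather than in a separate pipeline definition.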
We also just ran a webinar that profiled five of our customers: Greenhouse, Front, Jyve, Swing Education, and Cheetah. Characterized as Data Innovators, these fast-growing organizations are inventing transformative business models for the gig economy, logistics, mobile, collaboration, human capital management, and artificial intelligence. A recording of this event is available here, and it serves as the building block for what we will introduce next week.
On October 10th, we will unveil our next initiative, the Data Infrastructure for Startups Program, designed to help early-stage technologists overcome the inevitable issues in tapping their data resources. The common, cost-conscious mindset is to use a combination of open source and ingenuity to build out an initial data infrastructure. While it may be okay to trade an engineer’s time for money saved, that trade fairly quickly becomes a maintenance burden that further bogs down the engineer’s productivity.
This is a very common problem for data engineers who are building and healing data pipelines, and this program will help resolve that, just as we did for the startups we featured earlier this week.
As we roll this out, we will also feature data solutions that we have implemented including:
I’ve been at Datacoral for two months. In that time, I’ve met or corresponded with most of our customers. What impresses me is how they describe the value that our data-infrastructure-as-a-service brings them. More than one says that we have saved them from needing to hire a team of engineers to build out and manage their data infrastructure. Okay, that sounds like pretty big value, but it’s abstract value, because I can’t immediately assess what ‘team’ means in terms of membership size, responsibility, roles, and skillset of each team member. So, I’ve been thinking about how to turn the value of this mysterious engineering team into something that I can explain and everyone else understands.
In discussing this with colleagues and friends, we concluded the following:
What’s disappointing about the 5-person team I just laid out is that only one of them is focused on actually deriving value from the data by building transformations within the pipeline, while the others, like the curling team above, all work outside and around the data. So a five-person team’s time invested in data (%TID) is only 20%, which seems really low when you consider there are five of them.
I’m sure that I can keep making the same kinds of analogies with a six-member hockey team, a seven-member water polo team, and so on, where the puck or ball represents the data, and the activities and positioning of players receiving passes are the infrastructure. It all still works up through the 11-player-per-side sports like soccer and American football: lots of coordination to prevent the other team from disrupting the flow of the ball (representing data).
Searching online turned up some great organizational development and scrum articles like Mark Ridley’s “What’s the perfect team size?” (I confess, I’m biased towards this one because he uses Neo4j, the last product I marketed, to express the total number of relationships among teams of any size.) He ultimately concludes that the ideal team size is between 4 and 9 people.
Now that I have my five team members, I’d like to figure out how much I might save if I didn’t need them. Or what I might accomplish if I reallocated them and their focus time to specific tasks to which they are suited. Answering the first question, “how much money would I save if I didn’t need to hire a team?”, is pretty straightforward.
I can figure out how much a data architect or data engineer can expect to make from Salary.com. (Salary.com does not yet track salaries for Data Engineers.) From the table below, we see that the average salary is $142.6k, and I’ll estimate that their fully loaded cost with benefits and bonuses adds another 25%, making the average cost per team member just over $178k. So if I save three team-member hires, that’s over $530k, and a five-member team is almost $900k per year. Wow!
| San Francisco | Median Salary |
| --- | --- |
| Data Architect I | $98,000 |
| Data Architect II | $129,000 |
| Data Architect III | $144,000 |
| Data Architect IV | $163,000 |
| Data Architect V | $179,000 |
| Avg Team Members’ Cost | $178,250 |
| 3-Member Team Savings | $534,750 |
| 5-Member Team Savings | $891,250 |
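For those who like to check the arithmetic, the savings figures fall out of a few lines. All salary figures come from the table above; the 25% loading factor is the estimate stated earlier.

```python
# Median San Francisco salaries for Data Architect levels I-V (table above)
salaries = [98_000, 129_000, 144_000, 163_000, 179_000]

avg_salary = sum(salaries) / len(salaries)   # $142,600 average base salary
loaded_cost = avg_salary * 1.25              # +25% benefits/bonuses = $178,250

savings_3 = 3 * loaded_cost                  # $534,750 for three avoided hires
savings_5 = 5 * loaded_cost                  # $891,250 for five

print(f"Per member: ${loaded_cost:,.0f}; "
      f"3-member team: ${savings_3:,.0f}; 5-member team: ${savings_5:,.0f}")
```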
Unlike Uber, I’m not going to lay off my engineers; they are still too valuable. What I’d like to do is increase their time invested in data (%TID), capitalizing on their SQL skills, because the infrastructure doesn’t need much attention. That table looks like this:
| Team Member % Time Invested in Data | Existing %TID | New %TID | Data Investment |
| --- | --- | --- | --- |
| 1 Team Member | 20% | 70% | +50% |
| Value per member | $35,650 | $124,775 | $89,125 |
| 5 Team Members | $178,250 | $623,875 | $445,625 |
So, if they go from 20% of their time invested in data to 70%, that’s a 50-point bump in their data investment, which works out to over $89k per employee, or more than $445k for the five-person team.
We can now look at these benefits in a few ways:
Those are pretty big returns, and I suspect the benefits of outsourcing my data infrastructure (assuming I make no trade-offs in security and manageability) are even bigger, like:
I’m hosting a webinar tomorrow morning on the Top 5 requirements for AWS-native Data Pipelines. In it, I’ll explain the latter two benefits more deeply as we discuss why a serverless microservices architecture, which is how we are built, is the optimal deployment model for AWS customers. Then we’ll talk about how orchestration, change awareness, and data publishing are the remaining key requirements, beyond the AWS best-practice ELT that most other vendors tout, for enjoying the customer benefits I’ve described in this post.
Datacoral is headed to New York City on Thursday, July 11 as a Silver sponsor of AWS Summit at the Jacob Javits Center. The event is free and runs from 7 AM to 6:30 PM.
We will be showing off our AWS-based Data Infrastructure as a Service (DIaaS) for data engineers, data scientists, Redshift administrators and BI analysts. Datacoral is a complete, end-to-end data pipeline service that runs securely in your VPC, connects to your cloud data, organizes and orchestrates it in Redshift, and allows users, applications and original sources to harness the results. Data is delivered as materialized views to whatever target you want.
We help customers address the most critical problems in data self-service: building and maintaining their data pipelines. Our customers tell us that we help save them over half a million dollars in resources per year, while giving their data engineers time to actually work with the data, not around it, which results in happy data scientists and consumers.
If you have data pipeline troubles, or are just moving into AWS altogether, then come see us at Booth #149, in the far left corner as you enter the exhibit hall, next to the Dev Lounge. We will have plenty of space and great giveaways, including t-shirts, wireless phone chargers, pens, stickers, and more. Plus you can meet Datacoral’s founder Raghu Murthy, who cut his teeth building infrastructures at Facebook and Yahoo!.
See you in New York!