And what is the ROI if I don’t have to hire so many team members?
I’ve been at Datacoral for two months. In that time, I’ve met or corresponded with most of our customers, and what impresses me is how they describe the value our data-infrastructure-as-a-service brings them. More than one has said that we saved them from needing to hire a team of engineers to build out and manage their data infrastructure. That sounds like pretty big value, but it’s abstract, because I can’t immediately assess what ‘team’ means in terms of size, responsibilities, roles, and the skillset of each member. So I’ve been thinking about how to turn the value of this mysterious engineering team into something I can explain and everyone else can understand.
In discussing this with colleagues and friends, we concluded the following:
- One person isn’t a team, and that individual will be overwhelmed with data access and availability requests within the first three hours on the job. By the end of the week, they’ll be asking for more resources to share the workload.
- With two people we get doubles tennis or beach volleyball: cooperators, certainly, but they are a pair, not quite up to team standards for availability, skill coverage, and redundancy.
- My original minimum team size estimate is three people, like half-court basketball, because they can potentially provide seven-day-per-week on-call support coverage without overworking any one individual. And data pipelines break all the time, because data is unpredictable, so you really do need everyday coverage.
- Four-person teams are found in curling, where positions and roles begin to take shape. The Lead takes the first two turns throwing the rock, then sweeps for the next six throws; the Second and Third take two throws each, while the Skip, the team’s captain and strategist, takes the last two. The throws themselves are important, since a throw puts the rock in motion and lets the sweepers work around it. In the data pipeline world, the rock represents the data itself: out of all four team members, only one per turn touches the data, while the other three work the infrastructure around it. So only 25% of the team touches the rock (data) during any given turn.
- Full-court basketball, with five people per side, seems to me to be the first all-purpose team size. Here we begin to have defined roles, performance expectations, and a high degree of flexibility. I can see one team member working on future infrastructure additions like mapping data models for new sources, two managing the daily chores of orchestration and data quality checks, and one improving the value inside the pipeline by building and testing new SQL transformations. Finally, the point guard directs the action, defining how the overall architecture grows to support future needs and making sure management is happy. I like five.
What’s disappointing about that 5-person team I just laid out is that only one of them is focused on actually deriving value inside the data pipeline by building transformations, while, like the curling team above, all the others work outside and around the data. So a five-person team’s time invested in data (%TID) is only 20%, which seems really low when you consider there are five of them.
I’m sure I could keep making the same kinds of analogies with a six-member hockey team, a seven-member water polo team, and so on, where the puck or ball represents the data, and the activities and positioning of players receiving passes are infrastructure. We can get all the way up to the 11-player-per-side sports like soccer and American football and it all still works: lots of coordination to prevent the other team from disrupting the flow of the ball (representing data).
Enough Sports, What about Work Teams?
Searching online turned up some great organizational development and scrum articles like Mark Ridley’s “What’s the perfect team size?” (I confess I’m biased towards this one because he uses Neo4j, the last product I marketed, to express the total number of relationships among teams of any size.) He ultimately concludes that the ideal team size is between 4 and 9 people.
Building the ROI Model
Now that I have my five team members, I’d like to figure out how much I might save if I didn’t need them, or what I might accomplish if I reallocated them and their focus time to tasks they are best suited for. Answering the first question, “How much money would I save if I didn’t need to hire a team?”, is pretty straightforward.
I can figure out how much a data architect can expect to make from Salary.com. (Salary.com does not yet track salaries for Data Engineers.) From the table below, we see that the average salary is $142.6k, and I’ll estimate that fully loaded costs with benefits and bonuses add another 25%, making the average cost per team member just over $178k. So, if I save three team-member hires, that’s over $530k per year, and a five-member team is almost $900k. Wow!
| Line Item | Annual Cost |
| --- | --- |
| Data Architect I–V, average salary (Salary.com) | $142.6k |
| Avg Team Members’ Cost (fully loaded, +25%) | $178.25k |
| 3-Member Team Savings | $534.75k |
| 5-Member Team Savings | $891.25k |
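The savings arithmetic above is easy to sanity-check with a back-of-the-envelope calculation. The $142.6k average salary and the 25% load factor come straight from the discussion; the function name is just illustrative:

```python
# Rough hiring-savings sketch using the figures quoted above.
AVG_SALARY = 142_600   # average Data Architect salary (Salary.com)
LOAD_FACTOR = 1.25     # +25% for benefits and bonuses

cost_per_member = AVG_SALARY * LOAD_FACTOR  # fully loaded annual cost

def team_savings(members: int) -> float:
    """Annual cost avoided by not hiring `members` engineers."""
    return members * cost_per_member

print(f"Per member:    ${cost_per_member:,.0f}")   # $178,250
print(f"3-member team: ${team_savings(3):,.0f}")   # $534,750
print(f"5-member team: ${team_savings(5):,.0f}")   # $891,250
```

That reproduces the “just over $178k,” “over $530k,” and “almost $900k” figures in the text.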
Increasing Team Members’ Percent Time Invested in Data (%TID)
Unlike Uber, I’m not going to lay off my engineers; they are still too valuable. What I’d like to do is increase their time invested in data (%TID), capitalizing on their SQL skills, because the infrastructure no longer needs much attention. That table looks like this:
| Team Member % Time Invested in Data | Value per Member | 5 Team Members’ Investment in Data |
| --- | --- | --- |
| 20% | $35,650 | $178,250 |
| 70% | $124,775 | $623,875 |
So, if they go from 20% of their time invested in data to 70%, that’s a 50-percentage-point bump in their data investment, which works out to over $89k per employee, or nearly $450k for a five-person team.
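The %TID bump follows the same arithmetic. Here is a minimal sketch using the ~$178k fully loaded cost derived earlier; the helper name is mine:

```python
# Value of shifting team time from infrastructure work to data work (%TID).
COST_PER_MEMBER = 142_600 * 1.25  # $178,250 fully loaded annual cost

def tid_value(pct_tid: float, members: int = 1) -> float:
    """Dollar value of the time a team invests directly in data."""
    return pct_tid * COST_PER_MEMBER * members

# Bump from 20% to 70% time invested in data:
bump_per_member = tid_value(0.70) - tid_value(0.20)      # ≈ $89,125
bump_per_team = tid_value(0.70, 5) - tid_value(0.20, 5)  # ≈ $445,625
print(f"Per member: ${bump_per_member:,.0f}, per team: ${bump_per_team:,.0f}")
```

The 70% row for five members, `tid_value(0.70, 5)`, is where the “over $620k” figure in the summary below comes from.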
We can now look at these benefits in a few ways:
- Almost $900k in savings if I don’t have to build the team, or
- Over $620k in the team’s total % Time Invested in Data (%TID), and
- Acceleration of our decision and data refresh cycles. (One of our customers went from data refreshes every two days to every two hours, a 24× faster cycle.)
Those are pretty big returns, and I suspect the benefits of outsourcing my data infrastructure (assuming I make no trade-offs in security and manageability) are even bigger, like:
- Reduction in the cost and complexity of licensing multiple tools (our customers would need to know over 100 different technologies, APIs, and AWS services to build a comparable data infrastructure stack), and
- Full control over my AWS utilization rates, because the environment is both serverless and lets me throttle my data flow rates myself.
I’m hosting a webinar tomorrow morning on the Top 5 requirements for AWS-native Data Pipelines. In it, I’ll explain these last two benefits more deeply as we discuss why a serverless microservices architecture, which is how we are built, is the optimal deployment model for AWS customers. Then we’ll talk about how orchestration, change awareness, and data publishing are the remaining key requirements, beyond the AWS best-practice ELT that most other vendors tout, for enjoying the customer benefits I’ve described in this post.