Using Datacoral’s PostgreSQL Change Data Capture (CDC) connectors and more than 30 additional API connectors, Greenhouse seamlessly integrates data from a wide range of sources into their data warehouse to generate features and orchestrate machine learning pipelines.
Greenhouse is an industry-leading hiring platform designed to help companies of all sizes hire for what’s next. To help recruiters and hiring managers stay on top of their current candidate pipeline, Greenhouse uses machine learning technology, enabled by data integrated using the Datacoral platform, to deliver predictive interviewer recommendations and save customers time that was previously spent on repetitive and time-consuming tasks.
Using Datacoral’s PostgreSQL Change Data Capture (CDC) connectors and more than 30 additional API connectors, including Salesforce, Zendesk, S3, Google AdWords, and Google Analytics, Greenhouse seamlessly integrates data from a wide range of sources into their data warehouse. Greenhouse also uses Datacoral’s events connector to collect behavioral data within the Greenhouse product suite. That data is piped into their Amazon Redshift data warehouse, where the team works with several additional Amazon tools, including SageMaker. This data stack enables Greenhouse to look for user activity trends that aid recruiters in candidate searches, interview scheduling, and placement services.
In addition, Datacoral powers all of Greenhouse’s data engineering requirements, such as its data transformation layer and its publishing pipeline, which distributes data and insights back into Greenhouse’s products and out to its service vendors.
Greenhouse offers an industry-leading hiring platform to help companies hire for what’s next. The Greenhouse applicant tracking system (ATS) and suite of recruiting software help global teams manage hundreds of thousands of candidate interviews every month. At this scale, reducing workflow completion time by a few seconds can save customers hundreds of hours of productivity every month. By developing structured workflows with embedded predictive analytics, Greenhouse can offer its customers valuable performance insights into their hiring process.
In 2019, Greenhouse recognized an opportunity to optimize their popular scheduling feature by automating the ways customers coordinate, sync, and manage interviewing schedules across multiple calendars. Following a successful partnership building an AWS-native data infrastructure, Greenhouse turned to Datacoral to find new ways machine learning could improve a manual and time-consuming process.
Greenhouse conducted extensive research that revealed recruiting teams spend multiple hours a day coordinating interviews. Scheduling an interview is time-intensive and requires a deep understanding of the role being filled, the hiring department’s expectations, and how to identify the best person for each interview in the hiring sequence. The Greenhouse scheduling system tracks where the applicant is in the hiring process and how many interviews have taken place, but the responsibility for choosing the person most likely to conduct the next interview fell to the recruiter or hiring manager.
When considering the design of the interviewer feature, Stef Orzech, Senior Product Manager at Greenhouse, had two primary considerations: How accurate is the model, and how is the recommendation presented in the feature?
Greenhouse’s primary concerns before introducing the interviewer recommendation model were ensuring that qualified candidates remain in the pipeline and reducing bias throughout the hiring process. By using Datacoral’s CDC connector, Greenhouse was able to build a model using real production data. Datacoral’s orchestrated transformation capabilities implement a simple machine learning pipeline that suggests a person to facilitate a given interview. Since interviews can be facilitated by one or multiple individuals, Greenhouse built a rules-based recommendation algorithm that returns the individual a recruiter would most likely pick. The recommended interviewer appears preselected at the top of a dropdown menu, reducing friction and distractions in the interview scheduling process.
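The case study does not describe the rules Greenhouse actually uses, so the following is only a minimal sketch of one plausible rules-based approach: recommend the interviewer most often chosen in the past for the same job and interview stage, then move that name to the top of the dropdown. The record shape and the function names (`build_rules`, `recommend`, `order_dropdown`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical history record: (job_id, interview_stage, interviewer_name).
def build_rules(history):
    """Count how often each interviewer was picked per (job, stage)."""
    counts = defaultdict(Counter)
    for job_id, stage, interviewer in history:
        counts[(job_id, stage)][interviewer] += 1
    return counts

def recommend(counts, job_id, stage) -> Optional[str]:
    """Return the most frequently picked interviewer, or None if no history."""
    picks = counts.get((job_id, stage))
    if not picks:
        return None
    return picks.most_common(1)[0][0]

def order_dropdown(candidates, recommended):
    """Place the recommended interviewer first; keep the rest in order."""
    if recommended is None or recommended not in candidates:
        return list(candidates)
    return [recommended] + [c for c in candidates if c != recommended]

history = [
    ("eng-1", "onsite", "Ana"),
    ("eng-1", "onsite", "Ben"),
    ("eng-1", "onsite", "Ana"),
]
rules = build_rules(history)
top_pick = recommend(rules, "eng-1", "onsite")
menu = order_dropdown(["Ben", "Ana", "Cy"], top_pick)
```

Falling back to the unmodified list when there is no history keeps the dropdown usable for new jobs, where no recommendation can be made.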
Datacoral’s CDC pipeline allows for reliable and fast consolidation of data across multiple shards into a single table, making it easy for an analyst or application to query a single table in the warehouse. This was a key performance and accuracy advantage for Greenhouse’s interview scheduling feature. Greenhouse employs a sharded database that would have required custom code and complex orchestration were it not for Datacoral’s end-to-end pipeline capabilities.
Greenhouse shards its production database to scale its application to thousands of customers. Sharding makes analytics more complex because data must be consolidated across all the shards. Datacoral’s CDC solution orchestrates the consolidation of the data to provide an easy-to-consume schema in the data warehouse. Aaron Gibralter, Director of Engineering at Greenhouse, notes, “We’ve grown up with Datacoral’s CDC and high-fidelity data. It’s hard to imagine an alternative integration with the ability to offer our customers the same benefits.”
Before they had CDC capabilities, Greenhouse employed an incremental SQL-based pipeline, which couldn’t handle hard deletes and sometimes captured inconsistent timestamps. By introducing Datacoral’s CDC solution, Greenhouse now has access to high-fidelity data, removing the need for time-consuming processes such as developing custom code or orchestration to manage their pipelines. This enables Greenhouse to merge sharded data and keep data fresh and accurate.
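To illustrate why change data capture handles hard deletes where an incremental query cannot, here is a small sketch that replays change events from several shards into one consolidated table. It is not Datacoral’s implementation; the `ChangeEvent` shape and the `apply_changes` function are assumptions made for illustration. Rows are keyed by shard and primary key so that identical keys on different shards do not collide.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeEvent:
    shard_id: str               # which shard emitted the change
    op: str                     # "insert", "update", or "delete"
    pk: int                     # primary key within that shard
    row: Optional[dict] = None  # new row values; None for deletes

def apply_changes(table: dict, events: list) -> dict:
    """Replay events in order into a table keyed by (shard_id, pk)."""
    for ev in events:
        key = (ev.shard_id, ev.pk)
        if ev.op == "delete":
            # A hard delete arrives as an explicit event, so the row is removed.
            table.pop(key, None)
        elif ev.op in ("insert", "update"):
            table[key] = {**ev.row, "shard_id": ev.shard_id}
        else:
            raise ValueError(f"unknown operation: {ev.op}")
    return table

events = [
    ChangeEvent("shard_a", "insert", 1, {"candidate": "Kim"}),
    ChangeEvent("shard_b", "insert", 1, {"candidate": "Lee"}),
    ChangeEvent("shard_a", "update", 1, {"candidate": "Kim P."}),
    ChangeEvent("shard_b", "delete", 1),
]
consolidated = apply_changes({}, events)
```

An incremental query that selects rows by an updated-at timestamp never sees the deleted row again, so the warehouse copy would silently keep it. Replaying the delete event avoids that drift.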
Prior to having orchestrated transformations, Greenhouse’s approach to analytics was ad hoc and lacked architecture and efficiency. Gibralter notes, “Our pointillistic spaghetti data approach to analytics fit our needs during the first few years of our journey, but we quickly needed a solution that could scale with our business. Datacoral’s CDC plus transformation platform is the basis for defining new schemas and plays an important role helping power our analysis.” Datacoral’s orchestrated transformation feature lets users specify a SQL query and have its results written into a table. Because the results are precomputed, queries against that table return much more quickly than queries against a traditional database view, which re-runs its SQL on every read.
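The idea of writing a query’s results into a table can be sketched with Python’s built-in `sqlite3` module. SQLite stands in for the Redshift warehouse here, and the `materialize` helper, table names, and columns are hypothetical; this is not Datacoral’s API.

```python
import sqlite3

def materialize(conn, table_name, select_sql):
    """Rebuild table_name from the results of select_sql.

    table_name is interpolated directly, so pass only trusted identifiers.
    """
    conn.execute(f"DROP TABLE IF EXISTS {table_name}")
    conn.execute(f"CREATE TABLE {table_name} AS {select_sql}")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interviews (id INTEGER, interviewer TEXT, stage TEXT)")
conn.executemany(
    "INSERT INTO interviews VALUES (?, ?, ?)",
    [(1, "Ana", "onsite"), (2, "Ana", "onsite"), (3, "Ben", "phone")],
)

# The transformation is just a SQL query; its output becomes a real table.
materialize(
    conn,
    "interviewer_counts",
    "SELECT interviewer, COUNT(*) AS n FROM interviews GROUP BY interviewer",
)
rows = conn.execute(
    "SELECT interviewer, n FROM interviewer_counts ORDER BY interviewer"
).fetchall()
```

The trade-off is freshness: the table reflects the source data as of the last rebuild, which is why such transformations need to be re-run on a schedule or when upstream data changes.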
Datacoral’s transformation pipeline capabilities generate features for training machine learning models. Khalil notes that “Datacoral’s orchestrated transformations are central to our team’s ability to develop new features and remains critical for our machine learning pipeline, allowing our model to run, train, and deploy faster.”
Datacoral simplifies the authoring and debugging of pipelines while automating pipeline operations. Prior to working with Datacoral, Greenhouse’s data scientists spent their time building data pipelines and handling their day-to-day upkeep. Now, they can spend their time on higher-order analytics activities like building better reports for the Customer Success team and working closely with product engineering teams. Datacoral allows Greenhouse’s data teams to amplify their impact on the product and all business units.
Datacoral’s CDC pipelines, orchestrated transformations, 24/7 fully-managed services, and data expertise gave Greenhouse the foundation to deliver an innovative and time-saving improvement for one of their most-used platform features. Partnering with Datacoral is a key piece of Greenhouse’s data strategy, ensuring that pipeline capacity keeps pace with the high volume of data they produce.
Since deploying the interviewer recommendation feature, Greenhouse users have seen a significant decrease in the time spent manually assigning interviewers. With 85% of recommendations being accepted, recruiting teams are able to assign an interviewer with a few clicks. This seamless process enables lean recruiting teams to quickly and easily click through workflows and schedule interviews, sometimes even a round or two in advance. Removing this previously time-consuming task allows recruiting teams to turn their attention to more impactful short-term and long-term activities such as building relationships with candidates and helping the business achieve its hiring goals.
Interview scheduling is a monotonous task. By using data to develop this software feature, Greenhouse and Datacoral were able to abstract away a repetitive, time-consuming task for recruiting coordinators and enable them to do more with their scarcest resource: time.
“The Datacoral infrastructure manages our entire backend data flow from collection from many sources, to analysis, to publishing back to the organization, allowing our data science team to deliver value to internal and external customers without worrying about tedious data plumbing.”
Director of Engineering, Greenhouse
Greenhouse Software is the fastest-growing provider of enterprise talent acquisition software. Thousands of the smartest and most successful companies like Cisco Meraki, Time Inc., and Airbnb use Greenhouse’s intelligent guidance to design and automate all aspects of hiring throughout their organizations, helping them compete for and win top talent. Greenhouse has won numerous awards including #1 Best Place to Work by Glassdoor, Forbes Cloud 100, and Talent Acquisition FrontRunner leader by Software Advice.