Greenhouse Applicant tracking system & recruiting software

Datacoral’s pipelines translate Greenhouse data into an easily analyzable structure, solving the customer’s “pointillistic spaghetti data” problem.

Using Datacoral’s PostgreSQL Change Data Capture (CDC) connectors and more than 30 additional API connectors, Greenhouse seamlessly integrates data from a wide range of sources into their data warehouse to generate features and orchestrate machine learning pipelines.

Greenhouse is an industry-leading hiring platform designed to help companies of all sizes hire for what’s next. To help recruiters and hiring managers stay on top of their current candidate pipeline, Greenhouse uses machine learning technology, enabled by data integrated using the Datacoral platform, to deliver predictive interviewer recommendations and save customers time that was previously spent on repetitive and time-consuming tasks.

Using Datacoral’s PostgreSQL Change Data Capture (CDC) connectors and more than 30 additional API connectors, including Salesforce, Zendesk, S3, Google AdWords, and Google Analytics, Greenhouse seamlessly integrates data from a wide range of sources into their data warehouse. Greenhouse also uses Datacoral’s events connector to collect behavioral data within the Greenhouse product suite, which is then piped into their Amazon Redshift data warehouse, where the team leverages several additional Amazon tools, including SageMaker. This data stack enables Greenhouse to look for user activity trends that aid recruiters in candidate searches, interview scheduling, and placement services.

In addition, Datacoral powers all of Greenhouse’s data engineering requirements, such as its data transformation layer and its publishing pipeline, which distributes data and insights back into Greenhouse’s products and service vendors.

Key Takeaways

  • Greenhouse built an interviewer recommendation feature to reduce the amount of time recruiting teams spend on repetitive form-filling tasks while scheduling interviews. This feature is powered by Datacoral’s Change Data Capture (CDC) and other integrations, in addition to data transformation pipelines, all of which are built declaratively using SQL. Using Datacoral’s instrumentation library, Greenhouse is able to monitor user behavior and evaluate the performance of the recommendation feature.
  • Datacoral’s CDC connectors provide Greenhouse with a robust, low-footprint data integration that replicates sharded databases into a convenient, consolidated schema in the data warehouse for simple retrieval.
  • Datacoral’s pipelines translate Greenhouse data into an easily analyzable structure, solving the customer’s “pointillistic spaghetti data” problem. Datacoral provides a single pane of glass for data analysts to get full visibility into the data ingestion and transformation pipelines.
  • Datacoral’s declarative transformation pipelines allow Greenhouse’s team of data scientists to use SQL to help generate features and orchestrate machine learning pipelines using Amazon SageMaker.
  • With Datacoral completely managing Greenhouse’s data pipelines, the product and data teams can focus on the design of their machine learning-powered interviewer recommendation feature.
  • In total, 16% fewer scheduling sessions end outside of the Greenhouse platform when an interviewer recommendation is accepted.

Predictive machine learning pipelines in Greenhouse required a reliable CDC pipeline to replicate their production databases to their Amazon Redshift data warehouse.

Greenhouse offers an industry-leading hiring platform to help companies hire for what’s next. The Greenhouse applicant tracking system (ATS) and suite of recruiting software help global teams manage hundreds of thousands of candidate interviews every month. At this scale, reducing workflow completion time by a few seconds can save customers hundreds of hours of productivity every month. For instance, saving three seconds on each of 300,000 interviews adds up to 250 hours. By developing structured workflows with embedded predictive analytics, Greenhouse can offer its customers valuable performance insights into their hiring process.

In 2019, Greenhouse recognized an opportunity to optimize their popular scheduling feature by automating the ways customers coordinate, sync, and manage interviewing schedules across multiple calendars. Following a successful partnership building an AWS-native data infrastructure, Greenhouse turned to Datacoral to find new ways machine learning could improve a manual and time-consuming process.

Greenhouse conducted extensive research that revealed recruiting teams spend multiple hours a day coordinating interviews. Scheduling an interview is time-intensive and requires a deep understanding of the role being filled, the hiring department’s expectations, and how to identify the best person for each interview in the hiring sequence. The Greenhouse scheduling system tracks where the applicant is in the hiring process and how many interviews have taken place, but the responsibility for recommending the person most likely to conduct the next interview fell to the recruiter or hiring manager.

Feature development questions Greenhouse considered:

  • Can we accurately predict interviewer options and availability?
  • Can we proactively reserve time on calendars in advance to ensure the interview will be complete within the required amount of time?
  • Can we limit the number of tabs and platforms that recruiters need to toggle between in order to schedule the interview?
  • Can we easily scale this predictive recommendations feature to thousands of customers with a target range of 80-90% accuracy?

When designing the interviewer recommendation feature, Stef Orzech, Senior Product Manager at Greenhouse, had two primary questions: how accurate is the model, and how is the recommendation presented in the feature?

One of Greenhouse’s primary concerns before introducing the interviewer recommendation model was to ensure that qualified candidates remain in the pipeline and to reduce bias throughout the hiring process. By using Datacoral’s CDC connector, Greenhouse was able to build a model using real production data. Datacoral’s orchestrated transformation capabilities implement a simple machine learning pipeline that suggests a person to facilitate a given interview. Since interviews can be facilitated by one or multiple individuals, Greenhouse built a rules-based recommendation algorithm that returns the individual a recruiter would most likely pick. The recommended interviewer is preselected at the top of a dropdown menu, reducing friction and distractions in the interview scheduling process.
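As a rough illustration of what a rules-based recommendation of this kind could look like, the sketch below picks the interviewer who most often conducted a given interview stage for a given job in the past. The record layout, the sample data, and the function name `recommend_interviewer` are hypothetical; the case study does not describe the actual rules Greenhouse uses.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical history rows: (job_id, interview_stage, interviewer).
HISTORY = [
    ("eng-42", "phone_screen", "alice"),
    ("eng-42", "phone_screen", "alice"),
    ("eng-42", "phone_screen", "bob"),
    ("eng-42", "onsite", "carol"),
]


def recommend_interviewer(
    history: Iterable[Tuple[str, str, str]], job_id: str, stage: str
) -> Optional[str]:
    """Return the interviewer picked most often for this job and stage, or None."""
    counts = Counter(
        interviewer
        for h_job, h_stage, interviewer in history
        if h_job == job_id and h_stage == stage
    )
    if not counts:
        # No matching history: leave the dropdown without a preselection.
        return None
    return counts.most_common(1)[0][0]
```

Calling `recommend_interviewer(HISTORY, "eng-42", "phone_screen")` would preselect the most frequent past choice for that stage, and a stage with no history yields no recommendation rather than a guess.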

“The interviewer recommendation model leverages CDC, orchestrated transformations for data modeling and feature generation, and SageMaker to train and serve the algorithm. That’s how it was built and how future iterations of the feature will be built.”

– Mona Khalil, Senior Data Scientist at Greenhouse

Datacoral’s CDC pipeline allows for reliable and fast consolidation of data across multiple shards into a single table, making it easy for an analyst or application to query a single table in the warehouse. This was a key performance and accuracy advantage for Greenhouse’s interview scheduling feature. Greenhouse employs a sharded database that would have required custom code and complex orchestration were it not for Datacoral’s end-to-end pipeline capabilities.

With all technical concerns alleviated, Greenhouse Interviewer Recommendations was born.

Datacoral's CDC solution provides high-fidelity consolidated data across all the shards of the production database.

Greenhouse shards its production database to scale its application to thousands of customers. Sharded databases increase the complexity for the analytics use case since data needs to be consolidated across all the shards. Datacoral’s CDC solution orchestrates the consolidation of the data to provide an easy-to-consume schema in the data warehouse. Aaron Gibralter, Director of Engineering at Greenhouse, notes “We’ve grown up with Datacoral’s CDC and high-fidelity data. It’s hard to imagine an alternative integration with the ability to offer our customers the same benefits.”
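To make the consolidation idea concrete, here is a minimal sketch that loads rows from several shards into one warehouse table, using an in-memory SQLite database as a stand-in for the warehouse. The table name `interviews`, the `shard_id` column, and the sample rows are assumptions for illustration only; Datacoral's actual consolidated schema is not described in this case study.

```python
import sqlite3
from typing import Dict, List, Tuple


def consolidate(shards: Dict[str, List[Tuple[int, str]]]) -> sqlite3.Connection:
    """Load rows from every shard into one table keyed by (shard_id, id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interviews ("
        " shard_id TEXT, id INTEGER, status TEXT,"
        " PRIMARY KEY (shard_id, id))"
    )
    for shard_id, rows in shards.items():
        # Tag each row with its shard so ids that collide across shards stay distinct.
        conn.executemany(
            "INSERT INTO interviews (shard_id, id, status) VALUES (?, ?, ?)",
            [(shard_id, row_id, status) for row_id, status in rows],
        )
    return conn


# Two hypothetical shards whose local ids overlap.
SHARDS = {
    "shard_a": [(1, "scheduled"), (2, "completed")],
    "shard_b": [(1, "scheduled")],
}
warehouse = consolidate(SHARDS)
total = warehouse.execute("SELECT COUNT(*) FROM interviews").fetchone()[0]
```

Once the rows live in one table, an analyst can answer a cross-shard question with a single query instead of querying each shard and merging the results by hand.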

Before they had CDC capabilities, Greenhouse employed an incremental SQL-based pipeline, which couldn’t handle hard deletes and sometimes captured inconsistent timestamps. By introducing Datacoral’s CDC solution, Greenhouse now has access to high-fidelity data, removing the need for time-consuming processes such as developing custom code or orchestration to manage their pipelines. This enables Greenhouse to merge sharded data and keep data fresh and accurate.
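The difference around hard deletes can be sketched in a few lines. The example below contrasts a timestamp-based incremental pull with a replay of a change log, which is the general idea behind log-based CDC. The function names, the row layout, and the change-event format are hypothetical and are not Datacoral's implementation.

```python
def incremental_sync(source, replica, last_sync):
    """Timestamp-based pull: copy rows whose updated_at is newer than last_sync."""
    for row_id, row in source.items():
        if row["updated_at"] > last_sync:
            replica[row_id] = dict(row)
    # Rows hard-deleted from the source never appear in this scan,
    # so they linger in the replica.


def apply_change_log(replica, changes):
    """CDC-style replay: apply every logged insert, update, and delete in order."""
    for change in changes:
        if change["op"] == "delete":
            replica.pop(change["id"], None)
        else:  # "insert" or "update"
            replica[change["id"]] = dict(change["row"])


# State at the last sync, then the source after row 1 changed and row 2 was deleted.
initial = {
    1: {"status": "scheduled", "updated_at": 1},
    2: {"status": "scheduled", "updated_at": 1},
}
source = {1: {"status": "completed", "updated_at": 5}}

stale = {k: dict(v) for k, v in initial.items()}
incremental_sync(source, stale, last_sync=1)

fresh = {k: dict(v) for k, v in initial.items()}
apply_change_log(fresh, [
    {"op": "update", "id": 1, "row": {"status": "completed", "updated_at": 5}},
    {"op": "delete", "id": 2},
])
```

After both runs, `stale` still contains the deleted row 2, while `fresh` matches the source exactly, because the delete was recorded as an explicit event.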

Orchestrated transformations via Datacoral provide a strategic foundation for data pipelines.

Prior to having orchestrated transformations, Greenhouse’s approach to analytics was ad hoc and lacked architecture and efficiency. Gibralter notes, “Our pointillistic spaghetti data approach to analytics fit our needs during the first few years of our journey, but we quickly needed a solution that could scale with our business. Datacoral’s CDC plus transformation platform is the basis for defining new schemas and plays an important role helping power our analysis.” Datacoral’s orchestrated transformation feature allows users to specify a SQL query and write that query’s results into a table, making the data available much more quickly than a traditional database view, which re-runs its query on every read.
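The idea of writing a query's results into a table can be sketched as follows, again using in-memory SQLite as a stand-in for the warehouse. The helper `materialize`, the table names, and the sample data are hypothetical; this shows the general materialization pattern, not Datacoral's transformation API.

```python
import sqlite3


def materialize(conn: sqlite3.Connection, table: str, query: str) -> None:
    """Rebuild `table` from `query` so readers scan stored rows, not the query."""
    # Identifiers are interpolated directly, so `table` and `query` must come
    # from trusted pipeline definitions, never from user input.
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} AS {query}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interviews (interviewer TEXT, stage TEXT)")
conn.executemany(
    "INSERT INTO interviews VALUES (?, ?)",
    [("alice", "phone_screen"), ("alice", "onsite"), ("bob", "onsite")],
)

# The transformation is declared purely as SQL; the helper handles the write.
materialize(
    conn,
    "interviews_per_interviewer",
    "SELECT interviewer, COUNT(*) AS n FROM interviews GROUP BY interviewer",
)
rows = conn.execute(
    "SELECT interviewer, n FROM interviews_per_interviewer ORDER BY interviewer"
).fetchall()
```

The trade-off is freshness: a materialized table is only as current as its last rebuild, which is why such transformations are typically run on a schedule or after upstream data lands.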

Datacoral’s orchestrated transformations facilitate machine learning.

Datacoral’s transformation pipeline capabilities generate features for training machine learning models. Khalil notes that “Datacoral’s orchestrated transformations are central to our team’s ability to develop new features and remain critical for our machine learning pipeline, allowing our model to run, train, and deploy faster.”

Datacoral's events connector provides an easy way to integrate product usage into machine learning pipelines.

A feedback loop is essential to improving a machine learning model. Datacoral’s events connector and JavaScript instrumentation libraries help Greenhouse understand how recruiting coordinators use the interviewer recommendation feature. Data generated from the Greenhouse platform helps Datacoral build a picture of common user interactions. Receiving this user interaction data allows Greenhouse to build the feedback loop it needs to tune its machine learning model and improve efficiency.
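One simple signal such a feedback loop could compute is the share of shown recommendations that users accept. The sketch below assumes hypothetical event types `recommendation_shown` and `recommendation_accepted`; the actual event names and schema collected by the events connector are not given in this case study.

```python
from typing import Dict, Iterable


def acceptance_rate(events: Iterable[Dict[str, str]]) -> float:
    """Return the share of shown recommendations that the user accepted."""
    shown = 0
    accepted = 0
    for event in events:
        if event["type"] == "recommendation_shown":
            shown += 1
        elif event["type"] == "recommendation_accepted":
            accepted += 1
    # Avoid dividing by zero when no recommendation was shown.
    return accepted / shown if shown else 0.0


# Hypothetical product events: four recommendations shown, three accepted.
EVENTS = (
    [{"type": "recommendation_shown"}] * 4
    + [{"type": "recommendation_accepted"}] * 3
    + [{"type": "page_view"}]
)
rate = acceptance_rate(EVENTS)
```

Tracking this rate over time, or per customer, is one way to tell whether a change to the model helped or hurt.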

Managed services allow the data science team to stay lean.

Datacoral simplifies the authoring and debugging of pipelines while automating pipeline operations. Prior to working with Datacoral, Greenhouse’s data scientists spent their time building data pipelines and handling their day-to-day upkeep. Now, they can spend their time on higher-order analytics activities like building better reports for the Customer Success team and working closely with product engineering teams. Datacoral allows Greenhouse’s data teams to amplify their impact on the product and all business units.

Saving clicks, saving money, and saving the day for recruiters.

The results: Greenhouse leverages Datacoral to build an interviewer recommendation feature that is 85% accurate, increases productivity, and improves NPS.
“Datacoral has been key in building the data foundations that allowed us to develop and deploy this model,” said Gibralter.

Datacoral’s CDC pipelines, orchestrated transformations, 24/7 fully managed services, and data expertise gave Greenhouse the foundation to deliver an innovative and time-saving improvement for one of their most-used platform features. Partnering with Datacoral is a key piece of Greenhouse’s data strategy to ensure that its pipeline capacity keeps pace with the high volume of data it produces.

Since deploying the interviewer recommendation feature, Greenhouse users have seen a significant decrease in the time spent manually assigning interviewers. With 85% of recommendations being accepted, recruiting teams are able to assign an interviewer with a few clicks. This seamless process enables lean recruiting teams to quickly and easily click through workflows and schedule interviews, sometimes even a round or two in advance. Removing this previously time-consuming task allows recruiting teams to turn their attention to more impactful short-term and long-term activities such as building relationships with candidates and helping the business achieve its hiring goals.

Interview scheduling is a monotonous task. By using data to develop this software feature, Greenhouse and Datacoral were able to abstract away a repetitive, time-consuming task for recruiting coordinators and enable them to do more with their scarcest resource: time.

Aaron Gibralter

Director of Engineering, Greenhouse

“The Datacoral infrastructure manages our entire backend data flow from collection from many sources, to analysis, to publishing back to the organization, allowing our data science team to deliver value to internal and external customers without worrying about tedious data plumbing.”


About Greenhouse

Greenhouse Software is the fastest-growing provider of enterprise talent acquisition software. Thousands of the smartest and most successful companies like Cisco Meraki, Time Inc., and Airbnb use Greenhouse’s intelligent guidance to design and automate all aspects of hiring throughout their organizations, helping them compete and win for top talent. Greenhouse has won numerous awards including #1 Best Place to Work by Glassdoor, Forbes Cloud 100, and Talent Acquisition FrontRunner leader by Software Advice.
