This post was originally an email which our CEO, Raghu Murthy, shared with Datacoral customers on Tuesday, January 19th, 2021. It is shared with light edits.
This update started as a recap of 2020, where we shared our comments on what a strange year it had been. But that seems like old news now, compared to the eventful start to 2021. We hope with everything going on, you, your loved ones, and your teams are healthy and well.
We are heartened and excited about everything the Datacoral data community achieved in 2020 and have some product updates to share with you. Additionally, we want to provide you with a round-up of what’s informed our latest thinking at the frontiers of data infrastructure, and what’s on the horizon for Datacoral in Q1. Let’s get started.
- Change data capture – lessons learnt – Several of our customers have faced some pretty major challenges with implementing change data capture (CDC) solutions. This post by Daimler-TSS does an excellent job of explaining those challenges.
- Emerging architectures for modern data infrastructure – A16Z’s take on the latest set of data infrastructure tools and technologies was a great illustration of the complexity that companies must still deal with in choosing tools that fit their needs now and in the future. We have talked to hundreds of companies that have told us about the overwhelming set of tool choices in building a data stack, and we have learned a lot about how to reduce this complexity while supporting our customers. We will share our frameworks and learnings in several forthcoming posts. Stay tuned!
- Lambda – A Serverless Musical – We started Datacoral because of AWS Lambda, which is now over 5 years old. We think this video will convince you to use AWS Lambda as well! 🙂
What’s on the Horizon for Q1:
You may have noticed CDC has come up with increased frequency in our communications lately. While we’ve discussed the concept before on our product page, and on our blog (here and here), we’ve seen a steady increase in customers looking for this capability.
Change Data Capture is not a new concept. And while several legacy solutions exist for the enterprise, fully-featured and fully managed solutions are still hard to come by for startups and scale-up companies looking to take full advantage of data that lives within their production databases like PostgreSQL and MySQL.
Although we first introduced CDC functionality a while back, we have continued to make improvements and have leveraged our data engineering platform to build an end-to-end pipeline for CDC that includes:
- Low overhead handling of historical syncs by connecting to replicas
- Reliable propagation of schema changes
- Robustness towards replication lag by leveraging concepts of close of books
- Guarantees of high fidelity replication via automated data quality checks
- Full observability into the data flow across all aspects of the pipeline, from historical syncs to actual change data capture, schema change propagation, and data quality checks
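To make the "close of books" idea a little more concrete, here is a minimal sketch of how a data quality check can tolerate replication lag: a time window is only compared once replication has caught up past the end of that window. This is an illustration only, not Datacoral's implementation. The names `Window`, `books_closed`, and `check_window`, and the use of simple row counts as the quality check, are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Window:
    """A time window of source changes to verify (hypothetical structure)."""
    start: datetime
    end: datetime


def books_closed(window: Window, replicated_through: datetime) -> bool:
    """The books for a window are closed once every change up to the
    window's end has been replicated to the warehouse."""
    return replicated_through >= window.end


def check_window(window: Window, replicated_through: datetime,
                 source_count: int, warehouse_count: int) -> str:
    """Compare row counts only after the window's books are closed.

    Returns "pending" while replication is still behind, so lag is not
    misreported as a data quality failure.
    """
    if not books_closed(window, replicated_through):
        return "pending"
    return "match" if source_count == warehouse_count else "mismatch"
```

In this sketch, a count difference seen while replication is still behind the window is reported as "pending" rather than as a failure. Only a difference that persists after the books are closed counts as a real mismatch.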
Database connectors, and CDC connectors specifically, are among the integrations our customers use most heavily, and the most valuable and voluminous data for analysis at most companies lives in their production databases. A process that reliably integrates that data without overloading the production database is therefore critical. Because our customers rely so much on our CDC process, and because these connectors solve a complicated challenge that many data teams face in bringing their production database into their warehouse, we have begun to call our CDC connectors our ‘superpower’.
With that in mind, Q1 2021 will be focused on making several key improvements to our CDC connectors and platform, especially around SQL and Python transformations and orchestration. While we will continue to support and grow our catalog of 80+ pre-built connectors, you can expect to see more from us in the coming days as we continue to show the data community that an easy-to-use, reliable, and secure CDC solution does exist, and that it can be found at Datacoral!
Datacoral Product Milestones and Updates:
- Our data pipelines ingest over 500B rows and support tens of thousands of transformations each month, and those numbers continue to grow.
- We took part in testing and were featured as part of the launch of the new ML features announced by Amazon Redshift.
- We achieved our SOC 2 Type 1 certification this past quarter. This certification takes many companies months, but our secure, cloud on-prem architecture enabled us to achieve it in a matter of weeks!
- Our product and engineering team has pushed out over 800 improvements across 6 releases and released several new connectors based on customer requests.
- These connectors include:
- Key improvements to our PostgreSQL CDC and MySQL CDC connectors include:
- Support for performing historical syncs for new columns that might show up in the source database tables.
- Full transparency into historical syncs so that our customers can keep track of the progress of the initial sync of tables with billions of rows.
- Automated generation of the data quality check pipeline, including performing close of books to handle replication lag.