Changes in your production database? We've got them covered with our Change Data Capture (CDC) connectors.
Your production database will always change. Keeping your analytics warehouse in sync with your production database will be an ongoing challenge.
Our Postgres, MySQL, and MongoDB CDC connectors have processed over 100B changes (5B rows), ensuring that the analytics warehouse captures 100% of the changes, whether they are inserts, updates, or deletes.
Production databases (e.g., MySQL, PostgreSQL) scale really well for transactional queries (inserts, updates, and deletes), but cloud data warehouses such as Snowflake and AWS Redshift scale much better for analytical queries. To run such analytics, data-driven companies need to replicate data from their production database and from other data sources into a data warehouse. This lets them run analytics and gain insights into their product usage.
There are two common methods to sync data from production databases to an analytics data warehouse:
Every database worth its salt writes a log of all the changes to every single row/object. In most modern databases, this log is a Write Ahead Log (WAL). The WAL contains all changes to all tables in the database in a single log.
Using Change Data Capture to integrate production databases with data warehouses involves more than just reading the WAL, even though that is a critical part of the integration. Setting up a CDC connector involves several steps that must be performed in exactly the right order to avoid losing data and to provide data quality guarantees. All of these steps are automated by Datacoral for a completely hands-off approach for data teams.
1. Start reading the WAL from the database and persist the changes in a staging area
2. Perform a historical sync for all of the tables of interest
3. Start applying the changes captured in the WAL to the tables whose historical syncs have completed
4. Repeat the above steps when new tables or columns show up in the source database
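The reason the order matters can be shown with a minimal sketch. It is not Datacoral's implementation; the `Change` record, the integer `position` field, and the `sync_table` helper are hypothetical names chosen for illustration. The assumption is that every change carries a monotonically increasing log position and that the historical sync records the position at which its snapshot was taken. Because log reading starts first, no change made during the snapshot is lost, and changes the snapshot already reflects can be skipped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One staged change record (hypothetical shape, for illustration only)."""
    position: int               # assumed monotonically increasing log position
    table: str
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row values; None for deletes


def apply_change(table_rows: dict, change: Change) -> None:
    """Apply a single change to an in-memory copy of a warehouse table."""
    if change.op == "delete":
        table_rows.pop(change.key, None)
    else:  # inserts and updates are both treated as upserts
        table_rows[change.key] = change.row


def sync_table(snapshot_rows: dict, snapshot_position: int, staged: list) -> dict:
    """Start from the historical snapshot, then replay newer staged changes."""
    warehouse = dict(snapshot_rows)
    for change in sorted(staged, key=lambda c: c.position):
        if change.position <= snapshot_position:
            continue  # already reflected in the snapshot
        apply_change(warehouse, change)
    return warehouse


# Step 1: changes are staged from the log before the snapshot starts.
staged = [
    Change(1, "users", "insert", 1, {"name": "Ada"}),
    Change(2, "users", "update", 1, {"name": "Ada L."}),
    Change(3, "users", "insert", 2, {"name": "Bob"}),
    Change(4, "users", "delete", 1),
]
# Step 2: the historical sync captured the table as of log position 2.
snapshot = {1: {"name": "Ada L."}}
# Step 3: replay only the changes that came after the snapshot.
result = sync_table(snapshot, snapshot_position=2, staged=staged)
print(result)
```

If the snapshot were taken before log reading began, any change committed in between would appear in neither the snapshot nor the staged changes, which is the data loss the ordering is meant to prevent.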
There are plenty of CDC connectors on the market, but Datacoral's end-to-end serverless data pipelines provide the ultimate flexibility, scalability, compliance, and data quality guarantees that data analysts need.
Our CDC Connectors provide:
Get full auditability: the entire WAL is available for you to analyze. Pick parts of your WAL to also be loaded to the warehouse for analysis.
Decide between hard deletes and soft deletes.
Automatically handle schema changes at the source. Specify sophisticated rules on how to propagate schema-level changes at the source to the destination.
Get out-of-the-box data quality checks including close of books to automatically handle replica lag. Your analytics will always be performed with the complete data for a given time period.
Decide how frequently you want to update the warehouse tables. Tune the pipeline easily to optimize load on your data warehouse.
Specify exactly which tables and columns you need. Datacoral will not persist *any* table or column level data that you do not need.
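Two of the options above, the choice between hard and soft deletes and the selection of specific columns, can be illustrated with a short sketch. This is an assumption-laden example, not Datacoral's configuration or API: the `apply_delete` and `project_columns` helpers and the `is_deleted` flag column are hypothetical names.

```python
def apply_delete(table_rows: dict, key, mode: str = "hard") -> None:
    """Apply a source-side delete to an in-memory copy of a warehouse table.

    mode="hard" removes the row; mode="soft" keeps the row and marks it
    with a hypothetical `is_deleted` flag column.
    """
    if mode == "hard":
        table_rows.pop(key, None)
    elif mode == "soft":
        if key in table_rows:
            table_rows[key] = {**table_rows[key], "is_deleted": True}
    else:
        raise ValueError(f"unknown delete mode: {mode!r}")


def project_columns(row: dict, allowed: set) -> dict:
    """Keep only the columns that were explicitly requested."""
    return {name: value for name, value in row.items() if name in allowed}


hard = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
soft = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
apply_delete(hard, 1, mode="hard")  # row 1 disappears
apply_delete(soft, 1, mode="soft")  # row 1 stays, flagged as deleted

source_row = {"id": 7, "email": "a@example.com", "plan": "pro"}
kept = project_columns(source_row, allowed={"id", "plan"})  # email is dropped
```

Soft deletes preserve history for audits and point-in-time analysis at the cost of having to filter flagged rows in queries, while hard deletes keep the warehouse table a direct mirror of the source.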