Capture Every Data Change!

Changes in your production database? Keep your analytics database in lock-step using our Change Data Capture (CDC) connector.

Your production database will always change. Keeping your analytics warehouse in sync with your production database will be an ongoing challenge.

Our PostgresSQL and MySQL CDC connectors have processed over 100B changes (5B rows) to ensure that the analytics warehouse captures 100% of the changes whether it be additions, deletions, or changes.

Our CDC Connectors

PostgreSQL
  • Logical replication – we manage the replication slot
  • Historical syncs from replicas
  • Support PostgreSQL on AWS Aurora, AWS RDS, and elsewhere
MySQL
  • Read binlogs from either the primary or the replica database
  • Historical syncs from replicas
  • MariaDB and MySQL

What is Change Data Capture (CDC)?

Production databases (e.g., MySQL, PostgreSQL, etc.) scale really well for transactional queries (insert, updates, and deletes), but cloud data warehouses such as Snowflake and AWS Redshift scale much better for analytical queries.

For such analytics, data-driven companies need to replicate data from their production database and from different data sources into a data warehouse. This helps them run analytics and gain insights on their product usage at a granular level without risking interference with the production database upon which their product is built.

There are 2 common methods to sync data from production
databases to an analytics data warehouse:

Running SQL Queries

In this method, the connector syncs data from the production database by running SQL queries to fetch data directly from the tables. This method doesn’t necessarily log updates to records, leading to the inability to perform in-depth analyses as well as a loss of data. 

Reading from Database Write Ahead Logs

In this method, the connector reads the database write ahead log (WAL) — for example, binary logs in MySQL, logical replication in PostgreSQL, and log shipping in SQL server. The connector then translates those write ahead logs to updates in tables in the data warehouse. This method is called change data capture (CDC). 

How Does CDC Work?

Every database worth its salt writes a log of all the changes to every single row/object. In most modern databases, this log is a Write Ahead Log (WAL). The WAL contains all changes to all tables in the database in a single log.

Change Data Capture as a way to integrate production databases to data warehouses actually involves more than just reading the WAL logs even though that is a critical part of the integration.

Setting up a CDC connector involves several steps that need to be performed in the exact right order in order to not lose data and provide data quality guarantees. All of these steps are automated by Datacoral for a completely hands-off approach for data teams.

Start reading the WAL logs from the database and persist in the staging area

Start applying the changes captured in the WAL log to the tables whose historical syncs have completed

Perform a historical sync for all of the tables of interest

Perform above steps when new tables/columns show up in the source database

Why use Datacoral's CDC Connectors?

There are plenty of CDC connectors available in the market. But Datacoral’s end-to-end serverless data pipelines provide the ultimate flexibility, scalability, compliance, and data quality guarantees that data analysts, data scientists, and data engineers need.

Our CDC Connectors Provide

Auditability

Get full auditability - the entire WAL log is available for you to analyze. Pick parts of your WAL log to also be loaded to the warehouse for analysis.

Delete Handling

Two options are better than one. Now you can decide between hard deletes and soft deletes.

Schema Changes

Automatically handle schema changes at the source. Specify sophisticated rules on how to propagate schema level changes at the source to the destination.

Quality Checks

Out-of-the-box Data Quality Checks to guarantee the right data, whenever you need it.

Resource Optimization

Decide how frequently you want to update the warehouse tables. Tune the pipeline easily to optimize load on your data warehouse.

Customization

Specify exactly which tables and columns you need. Datacoral will not persist *any* table or column level data that you do not need.

Change Data Capture Blogs

Change Data Capture with Google Analytics
Data Infrastructure
Rishabh Bhargava

Change Data Capture: 101

Introduction Let’s pretend you work at a small identity protection startup and you’re getting acquired by a Fortune 100 corporation. You and 120 of your

Read More »
Datacoral Data Capture Flowchart
Data Integration
Raghu Murthy

Change Data Capture: Overview

Change Data Capture (CDC) is a common pattern for replicating data from databases to data warehouses by syncing the database change logs rather than fetching

Read More »

The Only Cost-Effective, Secure Option

We use cookies on our website. If you continue to use our website, you are agreeing to our use of cookies in accordance with our Cookie Statement. For information about how to change your cookie settings, please see our Cookie Statement.