Capture Every Data Change!

Changes in your production database? Keep your analytics database in lock-step using our Change Data Capture (CDC) connector.

Your production database will always change. Keeping your analytics warehouse in sync with your production database will be an ongoing challenge.

Our PostgreSQL and MySQL CDC connectors have processed over 100B changes (5B rows), ensuring that the analytics warehouse captures 100% of the changes, whether they are additions, deletions, or updates.

Our CDC Connectors

PostgreSQL
  • Logical replication – we manage the replication slot
  • Historical syncs from replicas
  • Supports PostgreSQL on AWS Aurora, AWS RDS, and elsewhere
MySQL
  • Read binlogs from either the primary or the replica database
  • Historical syncs from replicas
  • MariaDB and MySQL

What is Change Data Capture (CDC)?

Production databases (e.g., MySQL and PostgreSQL) scale really well for transactional queries (inserts, updates, and deletes), but cloud data warehouses such as Snowflake and AWS Redshift scale much better for analytical queries.

For such analytics, data-driven companies need to replicate data from their production database and from different data sources into a data warehouse. This helps them run analytics and gain insights on their product usage at a granular level without risking interference with the production database upon which their product is built.

There are two common methods to sync data from production databases to an analytics data warehouse:

Running SQL Queries

In this method, the connector syncs data from the production database by running SQL queries to fetch data directly from the tables. This method doesn’t fetch the most granular changes or deleted records, but is easier to set up.
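To make the trade-off concrete, here is a minimal sketch of a query-based sync. It uses Python's built-in sqlite3 module as a stand-in for a production database, and the `users` table, its `updated_at` column, and the `incremental_pull` helper are all hypothetical names chosen for illustration, not part of any Datacoral connector.

```python
import sqlite3

def incremental_pull(conn, last_synced_at):
    """Fetch rows modified after the previous sync's high-water mark."""
    return conn.execute(
        "SELECT id, email, updated_at FROM users "
        "WHERE updated_at > ? ORDER BY id",
        (last_synced_at,),
    ).fetchall()

# In-memory stand-in for the production database (assumption for the demo).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, updated_at INTEGER)"
)
source.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "a@example.com", 10), (2, "b@example.com", 10)],
)

first_pull = incremental_pull(source, 0)  # picks up both rows

# Between syncs: row 1 is updated twice and row 2 is deleted.
source.execute("UPDATE users SET email = 'a1@example.com', updated_at = 20 WHERE id = 1")
source.execute("UPDATE users SET email = 'a2@example.com', updated_at = 30 WHERE id = 1")
source.execute("DELETE FROM users WHERE id = 2")

# The next pull sees only the final state of row 1. The intermediate
# update and the delete of row 2 are invisible to the query.
second_pull = incremental_pull(source, 10)
```

In this sketch the second pull returns a single row, so the warehouse would never learn that row 2 was deleted or that row 1 passed through an intermediate value.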

Reading from Database Write Ahead Logs

In this method, the connector reads the database write ahead log (WAL) or its equivalent: for example, binary logs in MySQL, logical replication in PostgreSQL, and log shipping in SQL Server. The connector then translates those log entries into updates to tables in the data warehouse. This method is called change data capture (CDC).
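As a rough illustration of translating log entries into table updates, the sketch below replays a list of change events against an in-memory table. The event shape (`op`, `key`, `row`) and the `apply_change` function are assumptions made for this example; real binlog or logical replication output has its own format.

```python
def apply_change(table, event):
    """Apply one change event to a table keyed by primary key."""
    op = event["op"]
    if op in ("insert", "update"):
        table[event["key"]] = event["row"]
    elif op == "delete":
        table.pop(event["key"], None)
    else:
        raise ValueError(f"unknown operation: {op!r}")

# Hypothetical decoded log entries, in commit order.
change_log = [
    {"op": "insert", "key": 1, "row": {"email": "a@example.com"}},
    {"op": "insert", "key": 2, "row": {"email": "b@example.com"}},
    {"op": "update", "key": 1, "row": {"email": "a2@example.com"}},
    {"op": "delete", "key": 2, "row": None},
]

warehouse_users = {}
for event in change_log:
    apply_change(warehouse_users, event)
```

Unlike a query-based pull, every insert, update, and delete appears in the log, so the replayed table ends up matching the source, including the deletion of row 2.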

How Does CDC Work?

Every database worth its salt writes a log of all the changes to every single row/object. In most modern databases, this log is a Write Ahead Log (WAL). The WAL contains all changes to all tables in the database in a single log.

Using Change Data Capture to integrate production databases with data warehouses involves more than just reading the WAL, even though that is a critical part of the integration.

Setting up a CDC connector involves several steps that must be performed in exactly the right order to avoid losing data and to provide data quality guarantees. Datacoral automates all of these steps, giving data teams a completely hands-off approach.

Start reading the WAL from the database and persist it in the staging area

Perform a historical sync for all of the tables of interest

Start applying the changes captured in the WAL to the tables whose historical syncs have completed

Repeat the above steps when new tables or columns show up in the source database
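Why the order matters can be shown with a toy model. The sketch below assumes the sequence "capture the log first, then take the historical snapshot, then apply captured changes newer than the snapshot". The `Source` class, its integer log positions, and the `staged_changes` helper are hypothetical stand-ins, not Datacoral's implementation.

```python
class Source:
    """Toy source database: one table plus an append-only change log."""

    def __init__(self):
        self.rows = {}
        self.log = []  # entries are (position, op, key, value)

    def write(self, op, key, value=None):
        if op == "delete":
            self.rows.pop(key, None)
        else:
            self.rows[key] = value
        self.log.append((len(self.log) + 1, op, key, value))

    def snapshot(self):
        """Return a copy of the table and the log position it reflects."""
        return dict(self.rows), len(self.log)


source = Source()
source.write("insert", 1, "alice")   # existed before the connector was set up

# Step 1: start capturing changes into a staging area.
capture_start = len(source.log)

def staged_changes():
    return [entry for entry in source.log if entry[0] > capture_start]

source.write("insert", 2, "bob")     # arrives before the historical sync runs

# Step 2: historical sync of the table, remembering its log position.
warehouse, snapshot_pos = source.snapshot()

source.write("update", 1, "alice2")  # arrives after the snapshot
source.write("delete", 2)

# Step 3: apply staged changes that the snapshot does not already include.
for pos, op, key, value in staged_changes():
    if pos <= snapshot_pos:
        continue  # already reflected in the historical sync
    if op == "delete":
        warehouse.pop(key, None)
    else:
        warehouse[key] = value
```

Because capture starts before the snapshot, no change can fall into a gap between the two. If capture started only after the snapshot, any write in between would be lost.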

Why use Datacoral's CDC Connectors?

Plenty of CDC connectors are available on the market, but Datacoral’s end-to-end serverless data pipelines provide the flexibility, scalability, compliance, and data quality guarantees that data analysts, data scientists, and data engineers need.

Our CDC Connectors Provide

Auditability

Get full auditability: the entire WAL is available for you to analyze. Pick parts of the WAL to load into the warehouse for analysis as well.

Delete Handling

Decide between hard deletes and soft deletes.
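The difference between the two options can be sketched as follows. A hard delete removes the row from the warehouse table, while a soft delete keeps the row and marks it as deleted. The `apply_delete` function and the `_deleted_at` marker column are hypothetical names used only for this example.

```python
def apply_delete(table, key, mode, deleted_at):
    """Apply a source-side delete to a warehouse table keyed by primary key."""
    if mode == "hard":
        # Remove the row entirely.
        table.pop(key, None)
    elif mode == "soft":
        # Keep the row, but record when it was deleted at the source.
        if key in table:
            table[key] = {**table[key], "_deleted_at": deleted_at}
    else:
        raise ValueError(f"unknown delete mode: {mode!r}")

hard_table = {1: {"email": "a@example.com"}, 2: {"email": "b@example.com"}}
soft_table = {1: {"email": "a@example.com"}, 2: {"email": "b@example.com"}}

apply_delete(hard_table, 2, "hard", deleted_at="2024-01-01T00:00:00Z")
apply_delete(soft_table, 2, "soft", deleted_at="2024-01-01T00:00:00Z")
```

Soft deletes preserve history for analysis at the cost of requiring queries to filter out marked rows.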

Schema Changes

Automatically handle schema changes at the source. Specify sophisticated rules on how to propagate schema-level changes at the source to the destination.

Quality Checks

Resource Optimization

Decide how frequently you want to update the warehouse tables. Tune the pipeline easily to optimize load on your data warehouse.

Customization

Specify exactly which tables and columns you need. Datacoral will not persist *any* table or column level data that you do not need.
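One way to picture table and column selection is as a filter applied to each change event before anything is persisted. The configuration shape and the `project` function below are assumptions for illustration and do not reflect Datacoral's actual configuration format.

```python
# Hypothetical selection config: table name -> columns to keep.
SELECTED = {"users": ["id", "plan"]}

def project(event, selected=SELECTED):
    """Trim an event to the selected columns, or return None to drop it."""
    columns = selected.get(event["table"])
    if columns is None:
        return None  # table was not selected, so nothing is persisted
    row = {c: event["row"][c] for c in columns if c in event["row"]}
    return {"table": event["table"], "op": event["op"], "row": row}

events = [
    {"table": "users", "op": "insert",
     "row": {"id": 1, "plan": "pro", "ssn": "000-00-0000"}},
    {"table": "audit_tmp", "op": "insert", "row": {"id": 9}},
]

kept = [e for e in (project(ev) for ev in events) if e is not None]
```

Here the unselected `ssn` column and the whole `audit_tmp` table are dropped before any data reaches storage.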

