
Data Connectors 101
Introduction Modern analytics teams are hungry for data. They are generating incredible insights that make their organizations smarter and are emphasizing the need for data-driven
Changes in your production database? Keep your analytics database in lock-step using our Change Data Capture (CDC) connector.
Your production database will always change. Keeping your analytics warehouse in sync with your production database will be an ongoing challenge.
Our PostgresSQL and MySQL CDC connectors have processed over 100B changes (5B rows) to ensure that the analytics warehouse captures 100% of the changes whether it be additions, deletions, or updates.
Production databases (e.g., MySQL, PostgreSQL, etc.) scale really well for transactional queries (insert, updates, and deletes), but cloud data warehouses such as Snowflake and AWS Redshift scale much better for analytical queries.
For such analytics, data-driven companies need to replicate data from their production database and from different data sources into a data warehouse. This helps them run analytics and gain insights on their product usage at a granular level without risking interference with the production database upon which their product is built.
In this method, the connector syncs data from the production database by running SQL queries to fetch data directly from the tables. This method doesn’t fetch the most granular changes or deleted records, but is easier to set up.
In this method, the connector reads the database write ahead log (WAL) — for example, binary logs in MySQL, logical replication in PostgreSQL, and log shipping in SQL server. The connector then translates those write ahead logs to updates in tables in the data warehouse. This method is called change data capture (CDC).
Every database worth its salt writes a log of all the changes to every single row/object. In most modern databases, this log is a Write Ahead Log (WAL). The WAL contains all changes to all tables in the database in a single log.
Change Data Capture as a way to integrate production databases to data warehouses actually involves more than just reading the WAL logs — even though that is a critical part of the integration.
Setting up a CDC connector involves several steps that need to be performed in the exact right order in order to not lose data and provide data quality guarantees. All of these steps are automated by Datacoral for a completely hands-off approach for data teams.
Start reading the WAL logs from the database and persist in the staging area
Start applying the changes captured in the WAL log to the tables whose historical syncs have completed
Perform a historical sync for all of the tables of interest
Perform above steps when new tables/columns show up in the source database
There are plenty of CDC connectors available in the market. But Datacoral’s end-to-end serverless data pipelines provide the ultimate flexibility, scalability, compliance, and data quality guarantees that data analysts, data scientists, and data engineers need.
Get full auditability — the entire WAL log is available for you to analyze. Pick parts of your WAL log to also be loaded to the warehouse for analysis.
Decide between hard deletes and soft deletes.
Get full auditability - the entire WAL log is available for you to analyze. Pick parts of your WAL log to also be loaded to the warehouse for analysis.
Automatically handle schema changes at the source. Specify sophisticated rules on how to propagate schema level changes at the source to the destination.
Decide how frequently you want to update the warehouse tables. Tune the pipeline easily to optimize load on your data warehouse.
Specify exactly which tables and columns you need. Datacoral will not persist *any* table or column level data that you do not need.
Introduction Modern analytics teams are hungry for data. They are generating incredible insights that make their organizations smarter and are emphasizing the need for data-driven
Introduction How do you think about your data stack? If your job requires analyzing data, or supporting teams who analyze your data, you will likely
We use cookies on our website. If you continue to use our website, you are agreeing to our use of cookies in accordance with our Cookie Statement. For information about how to change your cookie settings, please see our Cookie Statement.