1. Glossary/

CDC

Change Data Capture

CDC (Change Data Capture) is a technique for intercepting data changes (INSERT, UPDATE, DELETE) as they occur and propagating them to other systems in real time or near-real time. Unlike traditional batch approaches (periodic ETL), CDC captures changes continuously and incrementally.

How it works #

The most common approach is log-based CDC: an external component reads the database’s transaction logs (binary log in MySQL, WAL in PostgreSQL, redo log in Oracle) and converts events into a data stream consumable by other systems. Tools like Debezium, Maxwell and Canal implement this approach for MySQL by reading binary logs directly.

What it’s for #

CDC is used for:

  • Synchronising data between different databases in real time
  • Feeding data warehouses and data lakes with incremental updates
  • Populating caches and search indexes (Elasticsearch, Redis)
  • Implementing event-driven architectures and microservices

When to use it #

CDC requires binary logging to be active and in ROW format (which records row-level changes). Disabling binary logs or using STATEMENT format eliminates the ability to use CDC tools, making real-time integration with external systems impossible.