Surrogate key
Chiave surrogata
A surrogate key is a sequential numeric identifier generated internally by the data warehouse, with no business meaning. It is distinct from the natural key — the one coming from the source system (e.g. customer code, employee number).
Why it matters #
In SCD Type 2, the same customer can have multiple rows in the dimension table — one for each historical version. The natural key (customer_id) is no longer unique, so you need an identifier that distinguishes each individual version: the surrogate key (customer_key).
How it works #
It’s typically generated by a sequence (Oracle) or a SERIAL/IDENTITY column (PostgreSQL, MySQL). It’s never exposed to end users and has no meaning outside the data warehouse.
The fact table uses the surrogate key as a foreign key, pointing to the specific dimension version that was valid at the time of the fact. This ensures every transaction is associated with the correct dimensional context for that point in time.
Advantages #
- Enables dimension versioning (SCD Type 2)
- Joins between fact and dimension are on integers, so they’re fast
- Insulates the DWH from changes in source system keys
- Supports loading from multiple sources with potentially duplicate natural keys