DWH Partitioning: When 3 Years of Data Weigh Too Much
Range partitioning on an 800-million-row fact table: from 12-minute quarterly queries to 40 seconds. Monthly implementation, exchange and local indexes.

Data Warehouse articles: Kimball dimensional modeling, SCD Type 2, fact table grain, bus matrix, range partitioning, ragged hierarchy and practical ETL.
I have seen data warehouses built on daily granularity because “the business is fine with that” — and become useless the very next day, when marketing asked for hourly conversion analysis. I have seen customer dimensions without historisation, overwriting the postal code every time someone moved — and last year’s reports that no longer matched. I have seen ETL jobs reloading 200 million rows in full every night because nobody ever had the courage to redesign the delta.
And I have seen the exact opposite: small, well-modelled data marts, with the bus matrix properly drawn — answering questions nobody had thought to ask yet, without touching a single line of code.
The difference has never been the technology. It has always been the model.
A data warehouse is not a database with bigger tables. It is a different way of thinking about data — oriented towards analysis, history, decisions.
In transactional databases the present moment is what counts: the order you are inserting, the current balance, the row you are updating. In a data warehouse what counts is the journey: what that customer looked like six months ago, how the product has changed over time, which version of the master data was valid when that contract was signed.
Almost always, a DWH that does not hold up can be spotted by these things:
These are problems you cannot see in development. They blow up six months later, when the business asks for reports the model cannot support.
Before even drawing a fact table, there are five questions I put to the business. They are not optional — they are the difference between a data warehouse that lasts ten years and one that needs rewriting after two.
| Question | What I am trying to understand | Why it is critical |
|---|---|---|
| What grain do you need the data at? | Daily, hourly, single transaction | Always pick the finest useful grain — aggregating later is possible, disaggregating is not |
| How far back in time? | History required, analytical depth | Drives volumes, storage, partitioning and archiving strategies |
| What happens when a master record changes? | A customer moves, a product changes category | Determines the SCD type (1, 2, 3, 6) for each dimension |
| Which hierarchies must it support? | Drill-down, roll-up, alternative paths | Prevents ragged dimensions, unjustified snowflakes, slow joins on aggregates |
| What latency is acceptable? | Nightly batch, intraday, near real-time | It changes everything: ETL, model, infrastructure, cost |
Five questions. Twenty minutes of meeting. Weeks of rewrites avoided.
Real stories of data warehouse design and restructuring in production. Dimensional modelling (Kimball read properly, not as slogans), slowly changing dimensions, bus matrix, hierarchies, incremental loading strategies and analytical performance.
No textbook recipes. Just solutions applied to real systems — insurance, finance, public administration, telco, postal — that serve real business decisions.
A data warehouse is not built to contain data.
It is built to answer questions — and those questions, inevitably, change.

Kimball bus matrix to align isolated data marts: conformed dimensions, business processes and comparable sales. Real insurance group case.

Range partitioning on an 800-million-row fact table: from 12-minute quarterly queries to 40 seconds. Monthly implementation, exchange and local indexes.

Ragged hierarchy in the data warehouse: balancing unbalanced trees with the self-parenting technique. Correct drill-down on customers and groups.

SCD Type 2 in the data warehouse: historicising dimensions with surrogate keys and validity dates. Real case: a customer dimension that evolves over time.

Data warehouse: fact table grain determines which questions you can answer. Common mistakes in granularity and their impact on the dimensional model.
Start typing to search…
Select a result to see the preview