ANALYZE

ANALYZE is the PostgreSQL command that collects statistics about data distribution in tables and stores them in the pg_statistic catalog (readable through the pg_stats view). The optimizer uses these statistics to estimate cardinality — how many rows each operation will return — and choose the most efficient execution plan.

What it collects #

The statistics collected by ANALYZE include:

Most common values: the most frequent values for each column and their percentage
Distribution histograms: how the remaining values are distributed
Number of distinct values: how many unique values each column has
NULL percentage: how many rows have NULL for each column

The quality of these statistics depends on the number of samples collected, controlled by the default_statistics_target parameter.

Why it matters #

Without up-to-date statistics, the optimizer is forced to guess. Wrong estimates lead to disastrous execution plans — such as choosing a nested loop on millions of rows thinking there are only hundreds, or ignoring a perfectly suitable index.

When to run it #

PostgreSQL runs ANALYZE automatically through autovacuum, but the default threshold (50 rows + 10% of live rows) can be too high for rapidly growing tables. Situations where a manual ANALYZE is needed:

After bulk imports or bulk loads
After significant changes in data distribution
When EXPLAIN ANALYZE shows cardinality estimates far from actual rows
After modifying a column’s default_statistics_target

What it collects #

Why it matters #

When to run it #

Related articles