PostgreSQL LIKE with %value%: GIN index and pg_trgm

A few weeks ago, a client contacted me with a very common issue:

“Search in the admin console is slow. Sometimes it takes several seconds. We’ve already reduced the JOINs, but the problem hasn’t disappeared.”

Environment: PostgreSQL on managed cloud.
Main table: payment_report (~6 million rows, 3 GB).
Searched column: reference_code.

Problematic query:

SELECT *
FROM reporting.payment_report r
JOIN reporting.payment_cart c ON c.id = r.cart_id
WHERE c.service_id = 1001
  AND r.reference_code LIKE '%ABC123%'
ORDER BY c.created_at DESC
LIMIT 100;

🧠 First observation: the JOINs were not the problem #

I compared:

AS-IS version (3 JOINs on the same table)
TO-BE version (only 1 JOIN)

The result?

The execution plan showed in both cases:

Parallel Seq Scan on payment_report
Rows Removed by Filter: ~2,000,000
Buffers: shared read = hundreds of thousands
Execution Time: 14–18 seconds

Reducing the JOINs had only a marginal impact.

The real problem was something else.

📌 The culprit: `LIKE '%value%'` without a proper index #

A search with a leading wildcard (%value%) makes a normal B-Tree index unusable.

PostgreSQL is forced to perform a sequential scan of the entire table.

In this specific case:

~3 GB of data
hundreds of thousands of 8KB pages read
I/O bound workload
seconds of latency

This is not a matter of “bad SQL”. It is an access path problem.

🔬 Before creating an index: risk analysis #

The client rightly asked:

“If we create a trigram (GIN) index, do we risk slowing down payment transactions?”

This is where a frequently ignored concept comes into play: **churn** .

What is churn? #

It represents how much a table changes after rows are inserted.

High frequency of: - UPDATE - DELETE

→ high churn
→ higher index maintenance cost
→ possible write degradation

In our case:

Table payment_report: - ~12k inserts/day - 0 updates - 0 deletes - 0 dead tuples

Profile: append-only

This is the best possible scenario to introduce a GIN index.

📊 Critical check: synchronous or batch? #

The table did not contain an insertion timestamp.

Solution: indirect analysis.

I correlated rows in payment_report with the cart timestamp (payment_cart.created_at) and analyzed hourly distribution.

Result:

continuous 24/7 pattern
daytime peaks
nighttime drop
perfect correlation with cart traffic

Conclusion: near real-time population, not nightly batch.

🛠️ The solution #

CREATE EXTENSION IF NOT EXISTS pg_trgm
;

CREATE INDEX CONCURRENTLY idx_payment_report_reference_trgm
ON reporting.payment_report
USING gin
 (reference_code gin_trgm_ops);

The pg_trgm extension enables trigram-based search [1], used together with a GIN index [2]; the CONCURRENTLY mode lets you add the index without blocking INSERT/UPDATE/DELETE on the table [3].

Precautions:

Create during an off-peak window
Use CONCURRENTLY mode
Monitor I/O during index build

⚠️ Note on migration tools: CREATE INDEX CONCURRENTLY cannot be executed inside an explicit transaction block (BEGIN; ... COMMIT;). PostgreSQL returns ERROR: CREATE INDEX CONCURRENTLY cannot run inside a transaction block. With interactive psql it’s not an issue (no implicit transaction), but with Flyway, Liquibase, alembic and similar — which wrap every migration in a transaction — it has to be handled explicitly: Flyway with -- no transaction as the first line of the script, Liquibase with runInTransaction="false", alembic with op.execute(..., autocommit=True). If this check is missing, the migration fails at execution time, not in dry-run.

📈 Result: the execution plan before and after #

Here is the full execution plan for the query — before and after creating the trigram index.

Before (without trigram index):

Nested Loop Inner Join
  → Nested Loop Inner Join
    → Nested Loop Inner Join
      → Seq Scan on payment_report as r
          Filter: ((reference_code)::text ~~ '%ABC123%'::text)
      → Index Scan using payment_cart_pkey on payment_cart as c
          Filter: (service_id = 1001)
          Index Cond: (id = r.cart_id)
    → Index Only Scan using payment_cart_pkey on payment_cart as c2
        Index Cond: (id = c.id)
  → Index Only Scan using payment_cart_pkey on payment_cart as c3
      Index Cond: (id = c.id)

After (with trigram index):

Nested Loop Inner Join
  → Nested Loop Inner Join
    → Nested Loop Inner Join
      → Bitmap Heap Scan on payment_report as r
          Recheck Cond: ((reference_code)::text ~~ '%ABC123%'::text)
        → Bitmap Index Scan using idx_payment_report_reference_trgm
            Index Cond: ((reference_code)::text ~~ '%ABC123%'::text)
      → Index Scan using payment_cart_pkey on payment_cart as c
          Filter: (service_id = 1001)
          Index Cond: (id = r.cart_id)
    → Index Only Scan using payment_cart_pkey on payment_cart as c2
        Index Cond: (id = c.id)
  → Index Only Scan using payment_cart_pkey on payment_cart as c3
      Index Cond: (id = c.id)

The key change is at steps 4–5: the Seq Scan — which read the entire table row by row — has been replaced by a Bitmap Heap Scan driven by the trigram index idx_payment_report_reference_trgm. PostgreSQL now filters directly through the index and only rechecks the candidate rows.

Same query, same data, but a completely different access path. From seconds to milliseconds.

🎯 Key lesson #

When a query is slow:

Don’t stop at the number of JOINs.
Look at the execution plan.
Identify whether the bottleneck is CPU or I/O.
Evaluate churn before introducing a GIN index.
Always measure before deciding.

Often the problem is not “optimizing the query”.
It is giving the planner the right index.

Because this is an extremely common scenario:

Large tables
“Contains” search patterns
Fear of introducing GIN indexes
Concern about write performance degradation

With data in hand, the decision becomes technical, not emotional.

Optimization is not magic.
It is measurement, plan analysis, and understanding real system behavior.

Official Sources #

PostgreSQL Documentation — pg_trgm extension
PostgreSQL Documentation — GIN Indexes
PostgreSQL Documentation — CREATE INDEX (incl. CONCURRENTLY)

Glossary #

GIN Index — Generalized Inverted Index: PostgreSQL index type that creates an inverted mapping from each element to the records containing it. Ideal for “contains” searches on text with pg_trgm.

B-Tree — Balanced tree data structure, the default index in relational databases. Efficient for equality and range searches, but unusable for LIKE '%value%'.

pg_trgm — PostgreSQL extension that decomposes text into trigrams (3-character sequences), enabling GIN indexes to accelerate wildcard searches.

Churn — Measure of how much a table changes after insertion. Low churn (append-only) is the best scenario for introducing a GIN index without degrading writes.

Execution Plan — Sequence of operations chosen by the database to resolve a query. Reading the plan is the first step to identify whether the problem is CPU, I/O or a wrong access path.