Fixing the Postgres Query Planner with enable_seqscan = off

If you’re querying a table with a small number of rows in Postgres, your query will most likely use a “sequential scan”, which scans the entirety of the table to find your result. This is true even when tables have an index for the field or fields you’re querying.

Take for instance a small table with datetime and float columns, such as:

CREATE TABLE "items" (
    "timestamp" TIMESTAMP WITH TIME ZONE,
    "value" FLOAT
);
CREATE INDEX ON "items" ("timestamp");

You can see the query plan chosen by Postgres with EXPLAIN ANALYZE. For instance, with the query:

EXPLAIN ANALYZE SELECT * FROM "items" WHERE "timestamp" = '2023-02-24 12:44:27';

When not using the index:

                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on items  (cost=0.00..18.50 rows=1 width=16) (actual time=0.277..0.279 rows=0 loops=1)
   Filter: ("timestamp" = '2023-02-24 12:44:27-07'::timestamp with time zone)
   Rows Removed by Filter: 1000
 Planning Time: 0.839 ms
 Execution Time: 0.325 ms

With the index:

                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using items_timestamp_idx on items  (cost=0.28..8.29 rows=1 width=16) (actual time=0.010..0.011 rows=0 loops=1)
   Index Cond: ("timestamp" = '2023-02-24 12:44:27-07'::timestamp with time zone)
 Planning Time: 0.277 ms
 Execution Time: 0.036 ms

On my current system, the threshold for using the index is around 800 rows. Afterwards, the Postgres query planner will choose the index rather than performing a sequential scan of the table. Even with only a few thousand rows, the performance gain from the index is significant.

A common mistake when using indexes is to write a query that applies a function to a column, when only the column itself is indexed. The above examples are querying a full timestamp, but what if we wanted all results for the day (UTC of course)? Naively, a programmer might write the following query:

SELECT * FROM "items" WHERE DATE("timestamp" AT TIME ZONE 'utc') = '2023-02-24';

But even with the index and 10,000 rows in the table, the query planner still performs a sequential scan:

                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on items  (cost=0.00..235.00 rows=50 width=16) (actual time=0.328..4.883 rows=1609 loops=1)
   Filter: (date(timezone('utc'::text, "timestamp")) = '2023-02-24'::date)
   Rows Removed by Filter: 8391
 Planning Time: 0.377 ms
 Execution Time: 5.273 ms

There are several ways to fix the above problem, including querying with a range of timestamps that will use the existing index, but you can also create a new index for the function’s result:

CREATE INDEX ON "items" (DATE("timestamp" AT TIME ZONE 'utc'));

Which will be used by our previous query (and significantly improve performance):

                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on items  (cost=4.67..65.89 rows=50 width=16) (actual time=0.111..0.541 rows=1609 loops=1)
   Recheck Cond: (date(timezone('utc'::text, "timestamp")) = '2023-02-24'::date)
   Heap Blocks: exact=55
   ->  Bitmap Index Scan on items_date_idx  (cost=0.00..4.66 rows=50 width=0) (actual time=0.082..0.083 rows=1609 loops=1)
         Index Cond: (date(timezone('utc'::text, "timestamp")) = '2023-02-24'::date)
 Planning Time: 0.120 ms
 Execution Time: 0.722 ms

Recently, I came across a query that had made the above mistake. The query was taking over 52 seconds to scan over all 3.2 million rows in the table. However, after adding the correct index, the query was still performing a sequential scan.

It’s in these moments that a programmer begins to doubt their sanity, and question everything they’ve ever learned. But in this case, I hadn’t made a mistake, the Postgres query planner was wrong. The un-indexed query had been used for so long that the internal statistics indicated that a sequential scan should be used even with the new index present.

Thankfully, Postgres provides ways to manually alter the query planner. The documentation is clear that these should be temporary measures. In this case, since I knew the index should be more performant than a sequential scan, I ran SET enable_seqscan = off; for my current session and re-ran the query, which immediately made use of the new index. The new execution time? 2.1 milliseconds.

The Postgres documentation actively discourages use of these parameters, calling them “crude”. The documentation suggests running ANALYZE manually with an increased default_statistics_target value or by increasing the amount of collected statistics.

But at least this time, I was right.