Keyset pagination in PostgreSQL like a pro
With infinite scrolling tables on websites, keyset pagination is a technique to provide approximate constant time access to subsequent pages as a user scrolls. This approach can be implemented with data stored in a relational database like PostgreSQL. It is more complex to implement than a simple approach like LIMIT + OFFSET pagination but minimizes slower query times as you scroll many pages into a result set. The database doesn’t have to load the entire result set, sort it, and then return the specified limit from the given offset. Keyset pagination achieves this by leveraging an index to speed up access to data stored on disk and avoids this overhead.
Demonstration of this technique using PostgreSQL
Consider a table of employees with 100,000 rows where you want to display the employees sorted by salary
. Because employees could have the same salary
, we also need to leverage a unique key constraint to help with tie-breakers, which is id
in this case.
postgres=# CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
age INTEGER NOT NULL,
salary NUMERIC
);
postgres=# WITH generated_series AS (
SELECT generate_series(1, 100000, 1) AS num
)
INSERT INTO employees (id, name, age, salary)
SELECT
num AS id,
'employee' || num AS name,
floor((random() * 65) + 18)::INTEGER AS age,
floor((random() * 200000) + 100000)::NUMERIC AS salary
FROM generated_series;
postgres=# CREATE INDEX employees_idx1 ON employees (salary, id);
If we want to sort by salary ASC, id ASC
and paginate through this table with page size 5, a LIMIT + OFFSET
approach is as follows for the 2nd to last page of results:
postgres=# EXPLAIN ANALYZE
SELECT *
FROM employees
ORDER BY salary ASC, id ASC
OFFSET 99990
LIMIT 5;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2736.54..2736.68 rows=5 width=27) (actual time=66.071..66.076 rows=5 loops=1)
-> Index Scan using employees_idx1 on employees (cost=0.42..2736.82 rows=100000 width=27) (actual time=0.043..61.158 rows=99995 loops=1)
Planning Time: 0.195 ms
Execution Time: 66.120 ms
(4 rows)
With keyset pagination, the query would look like so. The key filter predicate is the row tuple comparison on the key (salary, id) > (299974, 3476)
, where we look for all rows that have a salary
AND id
greater than the supplied values. NOTE: This may not match your data if you use the above queries as row values are randomly chosen.
postgres=# EXPLAIN ANALYZE
SELECT *
FROM employees
WHERE (salary, id) > (299974, 3476)
ORDER BY salary ASC, id ASC
LIMIT 5;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..6.40 rows=5 width=27) (actual time=0.012..0.025 rows=5 loops=1)
-> Index Scan using employees_idx1 on employees (cost=0.42..17.16 rows=14 width=27) (actual time=0.010..0.023 rows=5 loops=1)
Index Cond: (ROW(salary, id) > ROW('299974'::numeric, 3476))
Planning Time: 0.250 ms
Execution Time: 0.049 ms
(5 rows)
The actual total cost is 2736.68 and execution time is 66.120 ms for LIMIT + OFFSET pagination. Keyset pagination drops those numbers to 6.40 and 0.049 ms, respectively. We observe that keyset pagination is best used for serving data sequentially from columns indexed for comparisons.
4 query patterns to implement keyset pagination like a pro
With the case for keyset pagination made, I want to quickly highlight some interesting patterns that you could use if you need to implement this for your use cases.
To get the first or last page of results, you can leverage arbitrarily large or small numbers or strings, which can be constants in your application or database functions.
For instance, to get the first page in our example (salary ASC, id ASC
):
SELECT *
FROM employees
WHERE (salary, id) > (-999999999, -999999999)
ORDER BY salary ASC, id ASC
LIMIT 5;
To get the last page in our example (salary ASC, id ASC
):
WITH intermediate AS (
SELECT *
FROM employees
WHERE (salary, id) < (999999999, 999999999)
ORDER BY salary DESC, id DESC
LIMIT 5
)
SELECT *
FROM intermediate
ORDER BY salary ASC, id ASC;
If the sort order for salary
is changed to DESC
, the above 2 queries become like as follows. First, for the first page in our example (salary DESC, id DESC
):
SELECT *
FROM employees
WHERE (salary, id) < (999999999, 999999999)
ORDER BY salary DESC, id DESC
LIMIT 5;
To get the last page in our example (salary DESC, id DESC
):
WITH intermediate AS (
SELECT *
FROM employees
WHERE (salary, id) > (-999999999, -999999999)
ORDER BY salary ASC, id ASC
LIMIT 5
)
SELECT *
FROM intermediate
ORDER BY salary DESC, id DESC;
Conclusion
Using the above 4 patterns; leveraging arbitrarily large/small numbers/strings to get the first/last page; and, creating indexes on columns results are sorted by are all you need to implement keyset pagination like a pro.