90 lines
3.5 KiB
Markdown
90 lines
3.5 KiB
Markdown
# GraphQL pagination
|
|
|
|
## Types of pagination
|
|
|
|
GitLab uses two primary types of pagination: **offset** and **keyset**
|
|
(sometimes called cursor-based) pagination.
|
|
The GraphQL API mainly uses keyset pagination, falling back to offset pagination when needed.
|
|
|
|
### Offset pagination
|
|
|
|
This is the traditional, page-by-page pagination, that is most common,
|
|
and used across much of GitLab. You can recognize it by
|
|
a list of page numbers near the bottom of a page, which, when clicked,
|
|
take you to that page of results.
|
|
|
|
For example, when you click **Page 100**, we send `100` to the
|
|
backend. For example, if each page has say 20 items, the
|
|
backend calculates `20 * 100 = 2000`,
|
|
and it queries the database by offsetting (skipping) the first 2000
|
|
records and pulls the next 20.
|
|
|
|
```plaintext
|
|
page number * page size = where to find my records
|
|
```
|
|
|
|
There are a couple of problems with this:
|
|
|
|
- Performance. When we query for page 100 (which gives an offset of
|
|
2000), then the database has to scan through the table to that
|
|
specific offset, and then pick up the next 20 records. As the offset
|
|
increases, the performance degrades quickly.
|
|
Read more in
|
|
[The SQL I Love <3. Efficient pagination of a table with 100M records](http://allyouneedisbackend.com/blog/2017/09/24/the-sql-i-love-part-1-scanning-large-table/).
|
|
|
|
- Data stability. When you get the 20 items for page 100 (at
|
|
offset 2000), GitLab shows those 20 items. If someone then
|
|
deletes or adds records in page 99 or before, the items at
|
|
offset 2000 become a different set of items. You can even get into a
|
|
situation where, when paginating, you could skip over items,
|
|
because the list keeps changing.
|
|
Read more in
|
|
[Pagination: You're (Probably) Doing It Wrong](https://coderwall.com/p/lkcaag/pagination-you-re-probably-doing-it-wrong).
|
|
|
|
### Keyset pagination
|
|
|
|
Given any specific record, if you know how to calculate what comes
|
|
after it, you can query the database for those specific records.
|
|
|
|
For example, suppose you have a list of issues sorted by creation date.
|
|
If you know the first item on a page has a specific date (say Jan 1), you can ask
|
|
for all records that were created after that date and take the first 20.
|
|
It no longer matters if many are deleted or added, as you always ask for
|
|
the ones after that date, and so get the correct items.
|
|
|
|
Unfortunately, there is no easy way to know if the issue created
|
|
on Jan 1 is on page 20 or page 100.
|
|
|
|
Some of the benefits and tradeoffs of keyset pagination are
|
|
|
|
- Performance is much better.
|
|
|
|
- Data stability is greater since you're not going to miss records due to
|
|
deletions or insertions.
|
|
|
|
- It's the best way to do infinite scrolling.
|
|
|
|
- It's more difficult to program and maintain. Easy for `updated_at` and
|
|
`sort_order`, complicated (or impossible) for complex sorting scenarios.
|
|
|
|
## Implementation
|
|
|
|
When pagination is supported for a query, GitLab defaults to using
|
|
keyset pagination. You can see where this is configured in
|
|
[`pagination/connections.rb`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/graphql/pagination/connections.rb).
|
|
If a query returns `ActiveRecord::Relation`, keyset pagination is automatically used.
|
|
|
|
This was a conscious decision to support performance and data stability.
|
|
|
|
However, there are some cases where we have to use the offset
|
|
pagination connection, `OffsetActiveRecordRelationConnection`, such as when
|
|
sorting by label priority in issues, due to the complexity of the sort.
|
|
|
|
<!-- ### Keyset pagination -->
|
|
|
|
<!-- ### Offset pagination -->
|
|
|
|
<!-- ### External pagination -->
|
|
|
|
<!-- ### Pagination testing -->
|