Pagination as an ingredient for API scalability

Scaling your API might require fundamental changes to your application architecture and code. First, you need to determine what your scaling bottlenecks are; otherwise, you’re just making guesses.

One of the best ways to gain insights into bottlenecks is through instrumentation. By collecting data on usage and monitoring for capacity bottlenecks, you can take advantage of data-driven insights into optimizations that will help you scale.

APIs often serve large datasets, and returning all of that data in a single response can overwhelm both the application's backend and the client application. If you love analogies, think of it this way: Amazon sells 12+ million products. If a client sends a GET request to the product API, loading all of that data into the server's memory and then transmitting it to the client is a waste of resources.

For these reasons, it's important to split the response data into chunks and transmit each chunk only when it is needed. This reduces response time and makes responses easier to handle. This is where pagination, one of the best API scaling techniques, comes in.

Pagination Techniques

We are going to explore some of the techniques that you can use to paginate an API.

1. Offset-Based Pagination

Limiting your response dataset by use of limits and offsets is generally the easiest and the most widely used pagination technique. To paginate this way, clients provide a page size that defines the maximum number of items to return and a page number that indicates the starting position in the list of items.

Based on these values, servers storing data in a SQL database can easily construct a query to fetch results. For instance, to fetch the fifth page of items with each page’s size being 10, we should load 10 items, starting after 40 items (skipping the first 4 pages of size 10). The corresponding SQL query would look like the following:

SELECT * FROM `items`
ORDER BY `id` ASC
LIMIT 10 OFFSET 40;

Some APIs, such as GitHub's, support this kind of pagination. Clients simply specify the page and per_page parameters in the request URL.
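The mapping from page and per_page to LIMIT and OFFSET can be sketched as follows. This is a minimal illustration rather than a definitive implementation: it assumes an in-memory SQLite database, and the `items` table layout, the `name` column, and the helper names `fetch_page` and `make_demo_db` are hypothetical.

```python
import sqlite3


def fetch_page(conn, page, per_page=10):
    """Return one page of rows using LIMIT/OFFSET (page numbers start at 1)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    # Page 5 with 10 items per page skips the first 40 rows.
    offset = (page - 1) * per_page
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id ASC LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()


def make_demo_db(n=100):
    """Build an in-memory table holding n hypothetical items (ids 1..n)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, n + 1)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    # Fifth page of size 10: the rows with ids 41 through 50.
    print([row[0] for row in fetch_page(conn, page=5, per_page=10)])
```

Passing the values as bound parameters, instead of formatting them into the SQL string, keeps client-supplied page numbers from being interpreted as SQL.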

Advantages and disadvantages

Offset-based pagination is extremely simple to implement, both for clients and the server. It also has a user experience advantage: users can jump straight to any page instead of being forced to scroll through the entire content.

However, this technique has a few disadvantages:

i) It’s inefficient for large datasets. SQL queries with large offsets are pretty expensive. The database has to count and skip rows up to the offset value before it gets to returning the desired set of items.

ii) It can be unreliable when the list of items changes frequently. If an item is added while a client is paginating through results, every later item shifts by one position, so the client can display the same item twice. Similarly, if an item is removed, the client might skip over another item at the page boundary.

iii) Offset-based pagination can be tricky in a distributed system. For large offsets, you might need to scan several shards before you get to the desired set of items.

That said, offset pagination can be great when pagination depth is limited and clients can tolerate duplicates or missed items.
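The second drawback is easy to reproduce. The sketch below is only an illustration: it uses a plain Python list as a stand-in for a result set sorted newest first, and the helper names `offset_page`, `demo_duplicate`, and `demo_skip` are hypothetical.

```python
def offset_page(items, page, per_page):
    """Mimic LIMIT/OFFSET over an already sorted list (pages start at 1)."""
    start = (page - 1) * per_page
    return items[start:start + per_page]


def demo_duplicate():
    """A new item arrives between two page requests."""
    feed = ["e", "d", "c", "b", "a"]  # newest first
    first = offset_page(feed, 1, 2)
    feed.insert(0, "f")               # new item is added at the top
    second = offset_page(feed, 2, 2)  # "d" shifts onto page 2 and repeats
    return first, second


def demo_skip():
    """An already displayed item is removed between two page requests."""
    feed = ["e", "d", "c", "b", "a"]
    first = offset_page(feed, 1, 2)
    feed.remove("e")                  # item from page 1 is deleted
    second = offset_page(feed, 2, 2)  # "c" shifts onto page 1 and is missed
    return first, second


if __name__ == "__main__":
    print(demo_duplicate())
    print(demo_skip())
```

In both cases the offset keeps pointing at a position in the list, while the items the client has already seen have moved to a different position.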

2. Cursor-Based Pagination

To address the problems of offset-based pagination, various APIs have adopted a technique called cursor-based pagination. To use this technique, clients first send a request while passing only the desired number of items. The server then responds with the requested number of items (or the maximum number of items supported and available), in addition to a text cursor.

In the subsequent request, along with the number of items, clients pass this cursor indicating the starting position for the next set of items. Implementing cursor-based pagination is not very different from offset-based pagination. However, it’s much more efficient. Systems that store data in a SQL database can create queries based on the cursor values and retrieve results.

Suppose that a server returns a Unix timestamp of the last record as the cursor. To fetch a page of results that are older than that given cursor, the server can construct a SQL query like the following:

SELECT * FROM items
WHERE created_at < 1507876861
ORDER BY created_at DESC
LIMIT 10;

Having an index on the column created_at in the preceding example makes the query fast. Several modern APIs, including those of Slack, Stripe, Twitter, and Facebook, offer cursor-based pagination.
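The request and response cycle described above can be sketched end to end. This is a minimal illustration under stated assumptions: an in-memory SQLite database, timestamps that are unique per row, and hypothetical helper names (`fetch_older_than`, `walk_all`, `make_demo_db`) and index name. Results are ordered newest first so that each page continues exactly where the previous one ended.

```python
import sqlite3


def fetch_older_than(conn, limit=10, cursor=None):
    """Return (rows, next_cursor) for items older than `cursor`, newest first."""
    if cursor is None:
        # First request: the client passes only the desired number of items.
        rows = conn.execute(
            "SELECT id, created_at FROM items "
            "ORDER BY created_at DESC LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, created_at FROM items WHERE created_at < ? "
            "ORDER BY created_at DESC LIMIT ?",
            (cursor, limit),
        ).fetchall()
    # The timestamp of the last row becomes the cursor for the next request.
    # A short page means there is nothing left to fetch.
    next_cursor = rows[-1][1] if len(rows) == limit else None
    return rows, next_cursor


def walk_all(conn, limit=10):
    """Client-side loop: keep requesting pages until no cursor is returned."""
    seen, cursor = [], None
    while True:
        rows, cursor = fetch_older_than(conn, limit, cursor)
        seen.extend(row[0] for row in rows)
        if cursor is None:
            return seen


def make_demo_db(n=25):
    """Hypothetical data: n items with unique, increasing timestamps."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, created_at INTEGER)"
    )
    conn.execute("CREATE INDEX idx_items_created_at ON items (created_at)")
    conn.executemany(
        "INSERT INTO items (id, created_at) VALUES (?, ?)",
        [(i, 1507876000 + i) for i in range(1, n + 1)],
    )
    return conn


if __name__ == "__main__":
    print(walk_all(make_demo_db(), limit=10))
```

Because the WHERE clause filters on the indexed column, the database can seek directly to the cursor position instead of counting and skipping rows.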

Advantages and disadvantages

Cursor-based pagination addresses the main issues seen with offset-based pagination:

i) One of the key benefits of cursor-based pagination is performance. With an index on the column used in the cursor for pagination, even queries requiring scanning large tables are fast.

ii) The addition or removal of items does not affect the result set of a page. While paginating across results, the server returns every item exactly once. Cursor-based pagination is great for large and dynamic datasets.

However, it has a few drawbacks:

i) Clients cannot jump to a given page. They need to traverse through the entire result set page by page.

ii) The results must be sorted on a unique, sequential database column, which is used as the cursor value. It should not be possible to add records at a random position in the list.

iii) Implementing cursor-based pagination is a bit more complex than offset-based pagination, particularly for clients. Clients often need to store the cursor value to use it in subsequent requests.
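One common way to soften the second and third drawbacks is sketched below, as an assumption rather than something any particular API is known to do: when timestamps are not unique, the row id is added as a tie-breaker, and both values are packed into a single opaque text cursor that the client stores and sends back unchanged. All names here (`encode_cursor`, `decode_cursor`, `fetch_page_after`, `make_demo_db`) and the table layout are hypothetical.

```python
import base64
import json
import sqlite3


def encode_cursor(created_at, item_id):
    """Pack the sort key of the last row into an opaque, URL-safe string."""
    raw = json.dumps([created_at, item_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(token):
    """Recover (created_at, id) from a cursor made by encode_cursor."""
    created_at, item_id = json.loads(
        base64.urlsafe_b64decode(token.encode("ascii"))
    )
    return created_at, item_id


def fetch_page_after(conn, limit, token=None):
    """Return (rows, next_token), ordered newest first with id as tie-breaker."""
    sql = "SELECT id, created_at FROM items"
    params = []
    if token is not None:
        created_at, item_id = decode_cursor(token)
        # Strictly "after" the last row seen, in (created_at, id) order.
        sql += " WHERE created_at < ? OR (created_at = ? AND id < ?)"
        params += [created_at, created_at, item_id]
    sql += " ORDER BY created_at DESC, id DESC LIMIT ?"
    params.append(limit)
    rows = conn.execute(sql, params).fetchall()
    last = rows[-1] if len(rows) == limit else None
    next_token = encode_cursor(last[1], last[0]) if last else None
    return rows, next_token


def make_demo_db():
    """Hypothetical data in which several rows share the same timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, created_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO items (id, created_at) VALUES (?, ?)",
        [(1, 100), (2, 100), (3, 100), (4, 200), (5, 200), (6, 300)],
    )
    return conn


if __name__ == "__main__":
    conn, token, ids = make_demo_db(), None, []
    while True:
        rows, token = fetch_page_after(conn, 2, token)
        ids.extend(row[0] for row in rows)
        if token is None:
            break
    print(ids)
```

The client never needs to interpret the cursor, which leaves the server free to change what it encodes later without breaking existing clients.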

The End!

Thanks for reading this article to the end. I hope it has been helpful and informative.
