Pattern Sections
Pattern: Pagination
also known as: Query with Partial Result Sets, Response Sequence
Context
Clients often query APIs, fetching data item collections to be displayed to the user or to be processed in other applications. In many such queries, the API provider responds by sending a large number of items. The size of this response may be larger than what the client needs or is ready to consume.
The data set may consist of identically structured data elements (e.g., rows fetched from a relational database or line items in a batch job executed by an enterprise information system in the backend) or of heterogeneous data not adhering to a common schema (e.g., parts of a document from a document-oriented NoSQL database such as MongoDB).
The data set may be read only or change while being retrieved.
Problem
How can an API provider deliver large sequences of structured data without overwhelming clients?
Forces
Pagination balances the following forces:
- Performance, scalability, and resource use
- Information needs of individual clients
- Loose coupling and interoperability
- Developer convenience and experience
- Security and data privacy
- Test and maintenance effort
- Session awareness and isolation
- Data set size and data access profile
Pattern forces are explained in depth in the book.
Solution
Divide large response data sets into manageable and easy-to-transmit chunks (also known as pages). Send one chunk of partial results per response message, and inform the client about the total and/or remaining number of chunks. Provide optional filtering capabilities to allow clients to request a particular selection of results. For extra convenience, include a reference to the next chunk/page from the current one.
Sketch
A solution sketch for this pattern from pre-book times is:
Variants
The pattern comes in several variants: Page-Based Pagination (a somewhat tautological name), Offset-Based Pagination, Cursor-Based Pagination (also known as Token-Based Pagination) and Time-Based Pagination.
Page-Based Pagination (a somewhat tautological name) and Offset-Based Pagination refer to the elements of the data set differently. The page-based variant divides the data set into same-sized pages; the client or the provider can specify the page size. Clients then request pages by their index (like page numbers in a book). With Offset-Based Pagination, a client selects an offset into the whole data set (i.e., how many single elements to skip) and the number of elements to return in the next chunk (often referred to as “limit”). Both approaches may be used interchangeably (the offset can be calculated by multiplying the page size with the page number); they address the problem and resolve the forces in similar ways. Page-Based Pagination and Offset-Based Pagination do not differ much. Whether entries are requested with an offset and limit or all entries are divided into pages of a particular size and then requested by an index is a minor difference. Either case requires two integer parameters.
These variants are not well suited for data that changes in between requests and therefore invalidates the index or offset calculations. For example, given a data set ordered by creation time from most recent to oldest, let us assume that a client has retrieved the first page and now requests the second one. In between these requests, the element at the front of the data set is then removed, causing an element to move from the second to the first page without the client ever seeing it.
The Cursor-Based Pagination variant solves this problem: it does not rely on the absolute position of an element in the data set. Instead, clients use the Id Element of a specific element along with the number of elements to retrieve. The resulting chunk does not change even if new elements have been added since the last request. The remaining fourth variant, Time-Based Pagination, is similar to Cursor-Based Pagination, but uses timestamps instead of element IDs. It is used in practice less frequently but could be applied to scroll through a time plot by gradually requesting older or newer data points.
Example
The Lakeside Mutual customer care backend API illustrates the
Offset-Based Pagination pattern in its customer
endpoint:
curl http://localhost:8080/customers?limit=2&offset=0
This call will return the first chunk of two entities and several
control
Metadata
Elements. Besides the two HATEOAS-style link relations (Allamaraju (2010)) that
link to the current chunk and the next chunk, the response also contains
the corresponding offset
, limit
, and total
size
values. Note that size
is not required to
implement Pagination on the provider
side, but allows API clients to show end users or other consumers how
many more data elements (or pages) may be requested subsequently.
{
"offset" : 0,
"limit" : 2,
"size" : 50,
"customers" : [
...
,
...
],
"_links" : {
"next" : {
"href" : "http://localhost:8080/customers?limit=2&offset=2"
}
}
}
The example shown above can easily be mapped to the corresponding SQL
query LIMIT 2 OFFSET 0
. Instead of talking about offsets
and limits, the API could also use the page metaphor in its message
vocabulary, as shown here:
{
"page" : 0,
"pageSize" : 2,
"totalPages" : 25,
"customers" : [
...
}
Using Cursor-Based Pagination, the client first requests the initial page of a desired size:
curl http://localhost:8080/customers?page-size=2
{
"pageSize" : 2,
"customers" : [
...
,
...
],
"_links" : {
"next" : {
"href" : "http://localhost:8080/customers?page-size=2&cursor=mfn834fj"
}
}
}
The response contains a link to next chunk of data, represented by
the cursor value mfn834fj
. The cursor could be as simple as
the primary key of the database or contain more information, such as a
query filter.
Are you missing implementation hints? Our papers publications provide them (for selected patterns).
Consequences
The resolution of pattern forces and other consequences are discussed in our book.
Known Uses
The JSON API specification illustrates Pagination; while it does not mention Time-Based Pagination, it differentiates between the offset and page variants. Hence, JSON API-based implementations provide further examples for studying the pattern realization.
The roots of the pattern and its name go back to plain Web page design, e.g., when displaying search or other query results on a series of linked Web pages. An early SOA and Web services production reference that uses Pagination is Brandner et al. (2004). While not being message-based, remote JDBC applies sophisticated Pagination concepts via its Result Set abstraction.
Many public Web APIs use
Pagination; typically both
Page-Based and Offset-Based and Cursor-Based
Pagination are supported while the Time-Based Pagination
variant is less common. For example, Google’s search results are
paginated as well as GitHub’s Query API and the Slack APIs. Atlassian
also features Pagination explicitly
and prominently in its JIRA Cloud REST
APIs. Regarding correlation, the Twitter REST
API is an interesting example because the timeline often changes,
and simple Page-based and Offset-Based Pagination
therefore does not work that well. Instead, a
since_id=12345
Atomic
Parameter can be used to only retrieve tweets that are more recent
than the specified id
Id
Element.
A Swiss software vendor specializing on the insurance industry describes page- and offset-based Pagination in its internal REST API Design Guidelines. Sorting and filtering of collection records is supported via operators that travel as HTTP parameters containing control metadata.
An online API Stylebook website lists a number of Web APIs, API design guideline books, and websites that discuss Pagination.
The Pattern Adoption Story: Dutch Government & Energy Sector contributed by a reader has more known uses and further discussion of this pattern.
More Information
Related Patterns
Pagination can be seen as the opposite of Request Bundle: While Pagination is concerned with reducing the individual message size by splitting one large message into many smaller pages, Request Bundle combines several messages into a single large one.
A paginated query typically defines an Atomic Parameter List for its input parameters containing the query parameters and a Parameter Tree for its output parameters (i.e., the pages). Pagination is an alternative to sending one big Parameter Tree if a large number of repetitive data records has to be transmitted and the client only required a small fraction of the data records to proceed.
A request-response(s) correlation scheme might be required so that the client can distinguish the partial results of multiple queries in arriving response messages; the pattern Correlation Identifier (Hohpe and Woolf (2003)) might be eligible in such cases.
A Message Sequence from Hohpe and Woolf (2003) also can be used when a single large data element has to be split up.
Other Sources
Chapter 10 of Sturgeon (2016) covers Pagination types, discusses implementation approaches, and presents examples in PHP; Chapter 8 in the RESTful Web Services Cookbook by Allamaraju (2010) deals with queries in an RESTful HTTP context.
In a broader context, the User Interface (UI) and Web design community has captured Pagination patterns in different contexts (i.e., not API design and management, but interaction design and information visualization). See for example coverage of the topic at the Interaction Design Foundation and a UI Patterns website.
“Web API Design: The Missing Link”, an eBook by apigee, covers Pagination under “More on Representation Design”.
Chapter 8 of Vernon (2013) features stepwise retrieval of a notification log/archive, which can be seen as Offset-Based Pagination. RFC 5005 covers “Feed Paging and Archiving” for Atom (Nottingham 2007).