---
stage: Enablement
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# Efficient `IN` operator queries

This document describes a technique for building efficient ordered database queries with the `IN`
SQL operator and the usage of a GitLab utility module to help apply the technique.

NOTE:
The described technique makes heavy use of
[keyset pagination](pagination_guidelines.md#keyset-pagination).
It's advised to get familiar with the topic first.

## Motivation

In GitLab, many domain objects like `Issue` live under nested hierarchies of projects and groups.
To fetch nested database records for domain objects at the group level,
we often perform queries with the `IN` SQL operator.
We are usually interested in ordering the records by some attributes
and limiting the number of records using `ORDER BY` and `LIMIT` clauses for performance.
Pagination may be used to fetch subsequent records.

Example tasks requiring querying nested domain objects from the group level:

- Show the first 20 issues by creation date or due date from the group `gitlab-org`.
- Show the first 20 merge requests by merge date from the group `gitlab-com`.

Unfortunately, ordered group-level queries typically perform badly
because their execution requires heavy disk I/O, memory, and CPU resources.
Let's do an in-depth examination of executing one such query.

### Performance problems with `IN` queries

Consider the task of fetching the twenty oldest created issues
from the group `gitlab-org` with the following query:

```sql
SELECT "issues".*
FROM "issues"
WHERE "issues"."project_id" IN
    (SELECT "projects"."id"
     FROM "projects"
     WHERE "projects"."namespace_id" IN
         (SELECT traversal_ids[array_length(traversal_ids, 1)] AS id
          FROM "namespaces"
          WHERE (traversal_ids @> ('{9970}'))))
ORDER BY "issues"."created_at" ASC,
         "issues"."id" ASC
LIMIT 20
```

NOTE:
For pagination, ordering by the `created_at` column is not enough;
we must add the `id` column as a
[tie-breaker](pagination_performance_guidelines.md#tie-breaker-column).
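
With the tie-breaker in place, a keyset cursor built from the two columns produces a condition
similar to this minimal sketch (the cursor values are illustrative; the keyset library generates
an equivalent expanded `OR` form, shown later in this document):

```sql
-- fetch the next page after a row whose cursor values were
-- (created_at, id) = ('2020-01-05', 3)
SELECT "issues".*
FROM "issues"
WHERE ("issues"."created_at", "issues"."id") > ('2020-01-05', 3)
ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
LIMIT 20
```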

The execution of the query can be largely broken down into three steps:

1. The database accesses both `namespaces` and `projects` tables
   to find all projects from all groups in the group hierarchy.
1. The database retrieves `issues` records for each project, causing heavy disk I/O.
   Ideally, an appropriate index configuration should optimize this process.
1. The database sorts the `issues` rows in memory by `created_at` and returns `LIMIT 20` rows to
   the end-user. For large groups, this final step requires both large memory and CPU resources.

<details>
<summary>Expand this sentence to see the execution plan for this DB query.</summary>
<pre><code>
 Limit  (cost=90170.07..90170.12 rows=20 width=1329) (actual time=967.597..967.607 rows=20 loops=1)
   Buffers: shared hit=239127 read=3060
   I/O Timings: read=336.879
   ->  Sort  (cost=90170.07..90224.02 rows=21578 width=1329) (actual time=967.596..967.603 rows=20 loops=1)
         Sort Key: issues.created_at, issues.id
         Sort Method: top-N heapsort  Memory: 74kB
         Buffers: shared hit=239127 read=3060
         I/O Timings: read=336.879
         ->  Nested Loop  (cost=1305.66..89595.89 rows=21578 width=1329) (actual time=4.709..797.659 rows=241534 loops=1)
               Buffers: shared hit=239121 read=3060
               I/O Timings: read=336.879
               ->  HashAggregate  (cost=1305.10..1360.22 rows=5512 width=4) (actual time=4.657..5.370 rows=1528 loops=1)
                     Group Key: projects.id
                     Buffers: shared hit=2597
                     ->  Nested Loop  (cost=576.76..1291.32 rows=5512 width=4) (actual time=2.427..4.244 rows=1528 loops=1)
                           Buffers: shared hit=2597
                           ->  HashAggregate  (cost=576.32..579.06 rows=274 width=25) (actual time=2.406..2.447 rows=265 loops=1)
                                 Group Key: namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)]
                                 Buffers: shared hit=334
                                 ->  Bitmap Heap Scan on namespaces  (cost=141.62..575.63 rows=274 width=25) (actual time=1.933..2.330 rows=265 loops=1)
                                       Recheck Cond: (traversal_ids @> '{9970}'::integer[])
                                       Heap Blocks: exact=243
                                       Buffers: shared hit=334
                                       ->  Bitmap Index Scan on index_namespaces_on_traversal_ids  (cost=0.00..141.55 rows=274 width=0) (actual time=1.897..1.898 rows=265 loops=1)
                                             Index Cond: (traversal_ids @> '{9970}'::integer[])
                                             Buffers: shared hit=91
                           ->  Index Only Scan using index_projects_on_namespace_id_and_id on projects  (cost=0.44..2.40 rows=20 width=8) (actual time=0.004..0.006 rows=6 loops=265)
                                 Index Cond: (namespace_id = (namespaces.traversal_ids)[array_length(namespaces.traversal_ids, 1)])
                                 Heap Fetches: 51
                                 Buffers: shared hit=2263
               ->  Index Scan using index_issues_on_project_id_and_iid on issues  (cost=0.57..10.57 rows=544 width=1329) (actual time=0.114..0.484 rows=158 loops=1528)
                     Index Cond: (project_id = projects.id)
                     Buffers: shared hit=236524 read=3060
                     I/O Timings: read=336.879
 Planning Time: 7.750 ms
 Execution Time: 967.973 ms
(36 rows)
</code></pre>
</details>

The performance of the query depends on the number of rows in the database.
On average, we can say the following:

- Number of groups in the group-hierarchy: less than 1 000
- Number of projects: less than 5 000
- Number of issues: less than 100 000

From the list, it's apparent that the number of `issues` records has
the largest impact on the performance.
Under typical usage, the number of issue records grows
at a faster rate than the `namespaces` and the `projects` records.

This problem affects most of our group-level features where records are listed
in a specific order, such as the group-level issues and merge requests pages, and APIs.
For very large groups the database queries can easily time out, causing HTTP 500 errors.

## Optimizing ordered `IN` queries

In the talk
["How to teach an elephant to dance rock'n'roll"](https://www.youtube.com/watch?v=Ha38lcjVyhQ),
Maxim Boguk demonstrated a technique to optimize a special class of ordered `IN` queries,
such as our ordered group-level queries.

A typical ordered `IN` query may look like this:

```sql
SELECT t.* FROM t
WHERE t.fkey IN (value_set)
ORDER BY t.pkey
LIMIT N;
```

Here's the key insight used in the technique: we need at most `|value_set| + N` record lookups,
rather than retrieving all records satisfying the condition `t.fkey IN value_set` (`|value_set|`
is the number of values in `value_set`). For example, with 500 values in `value_set` and
`LIMIT 20`, we look up at most 520 records instead of reading every record matching the `IN`
condition.

We adopted and generalized the technique for use in GitLab by implementing utilities in the
`Gitlab::Pagination::Keyset::InOperatorOptimization` class to facilitate building efficient `IN`
queries.

### Requirements

The technique is not a drop-in replacement for the existing group-level queries using the `IN`
operator. It can only optimize `IN` queries that satisfy the following requirements:

- `LIMIT` is present, which usually means that the query is paginated
  (offset or keyset pagination).
- The column used with the `IN` query and the columns in the `ORDER BY`
  clause are covered with a database index. The columns in the index must be
  in the following order: `column_for_the_in_query`, `order by column 1`, and
  `order by column 2`. See the index sketch after this list.
- The columns in the `ORDER BY` clause are distinct
  (the combination of the columns uniquely identifies one particular row in the table).
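
For the issues query above, a covering index satisfying these requirements could be defined like
this hypothetical one (the index name is illustrative):

```sql
-- covers: WHERE project_id IN (...) ORDER BY created_at, id
CREATE INDEX index_issues_on_project_id_created_at_id
    ON issues (project_id, created_at, id);
```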

WARNING:
This technique does not improve the performance of `COUNT(*)` queries.

## The `InOperatorOptimization` module

> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/67352) in GitLab 14.3.

The `Gitlab::Pagination::Keyset::InOperatorOptimization` module implements utilities for applying a generalized version of
the efficient `IN` query technique described in the previous section.

To build optimized, ordered `IN` queries that meet [the requirements](#requirements),
use the utility class `QueryBuilder` from the module.

NOTE:
The generic keyset pagination module introduced in the merge request
[51481](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/51481)
plays a fundamental role in the generalized implementation of the technique
in `Gitlab::Pagination::Keyset::InOperatorOptimization`.

### Basic usage of `QueryBuilder`

To illustrate basic usage, we will build a query that
fetches the 20 issues with the oldest `created_at` values from the group `gitlab-org`.

The following ActiveRecord query would produce a query similar to
[the unoptimized query](#performance-problems-with-in-queries) that we examined earlier:

```ruby
scope = Issue
  .where(project_id: Group.find(9970).all_projects.select(:id)) # `gitlab-org` group and its subgroups
  .order(:created_at, :id)
  .limit(20)
```

Instead, use the query builder `InOperatorOptimization::QueryBuilder` to produce an optimized
version:

```ruby
scope = Issue.order(:created_at, :id)
array_scope = Group.find(9970).all_projects.select(:id)
array_mapping_scope = -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }

Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
  scope: scope,
  array_scope: array_scope,
  array_mapping_scope: array_mapping_scope,
  finder_query: finder_query
).execute.limit(20)
```

- `scope` represents the original `ActiveRecord::Relation` object without the `IN` query. The
  relation should define an order which must be supported by the
  [keyset pagination library](keyset_pagination.md#usage).
- `array_scope` contains the `ActiveRecord::Relation` object, which represents the original
  `IN (subquery)`. The select values must contain the columns by which the subquery is "connected"
  to the main query: the `id` of the project record.
- `array_mapping_scope` defines a lambda returning an `ActiveRecord::Relation` object. The lambda
  matches (`=`) single select values from the `array_scope`. The lambda yields as many
  arguments as the select values defined in the `array_scope`. The arguments are Arel SQL expressions.
- `finder_query` loads the actual record row from the database. It must also be a lambda, where
  the `ORDER BY` column expressions are available for locating the record. In this example, the
  yielded values are `created_at` and `id` SQL expressions. Finding a record is very fast via the
  primary key, so we don't use the `created_at` value. The sketch after this list shows roughly
  which SQL fragments the two lambdas contribute.
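
For orientation, here is a simplified sketch of what the two lambdas contribute to the generated
SQL (both fragments are trimmed from the full query shown later in this section):

```sql
-- array_mapping_scope: joined laterally against each array_scope row
SELECT "issues"."created_at", "issues"."id"
FROM "issues"
WHERE "issues"."project_id" = "array_cte"."id"

-- finder_query: executed once per located cursor value
SELECT "issues".*
FROM "issues"
WHERE "issues"."id" = recursive_keyset_cte.issues_id_array[position]
LIMIT 1
```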

The following database index on the `issues` table must be present
to make the query execute efficiently:

```sql
"idx_issues_on_project_id_and_created_at_and_id" btree (project_id, created_at, id)
```

<details>
<summary>Expand this sentence to see the SQL query.</summary>
<pre><code>
SELECT "issues".*
FROM
  (WITH RECURSIVE "array_cte" AS MATERIALIZED
    (SELECT "projects"."id"
     FROM "projects"
     WHERE "projects"."namespace_id" IN
       (SELECT traversal_ids[array_length(traversal_ids, 1)] AS id
        FROM "namespaces"
        WHERE (traversal_ids @> ('{9970}')))),
  "recursive_keyset_cte" AS ( -- initializer row start
    (SELECT NULL::issues AS records,
            array_cte_id_array,
            issues_created_at_array,
            issues_id_array,
            0::bigint AS COUNT
     FROM
       (SELECT ARRAY_AGG("array_cte"."id") AS array_cte_id_array,
               ARRAY_AGG("issues"."created_at") AS issues_created_at_array,
               ARRAY_AGG("issues"."id") AS issues_id_array
        FROM
          (SELECT "array_cte"."id"
           FROM array_cte) array_cte
        LEFT JOIN LATERAL
          (SELECT "issues"."created_at",
                  "issues"."id"
           FROM "issues"
           WHERE "issues"."project_id" = "array_cte"."id"
           ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
           LIMIT 1) issues ON TRUE
        WHERE "issues"."created_at" IS NOT NULL
          AND "issues"."id" IS NOT NULL) array_scope_lateral_query
     LIMIT 1)
    -- initializer row finished
    UNION ALL
    (SELECT
       -- result row start
       (SELECT issues -- record finder query as the first column
        FROM "issues"
        WHERE "issues"."id" = recursive_keyset_cte.issues_id_array[position]
        LIMIT 1),
       array_cte_id_array,
       recursive_keyset_cte.issues_created_at_array[:position_query.position-1]||next_cursor_values.created_at||recursive_keyset_cte.issues_created_at_array[position_query.position+1:],
       recursive_keyset_cte.issues_id_array[:position_query.position-1]||next_cursor_values.id||recursive_keyset_cte.issues_id_array[position_query.position+1:],
       recursive_keyset_cte.count + 1
       -- result row finished
     FROM recursive_keyset_cte,
     LATERAL
       -- finding the cursor values of the next record start
       (SELECT created_at,
               id,
               position
        FROM UNNEST(issues_created_at_array, issues_id_array) WITH
        ORDINALITY AS u(created_at, id, position)
        WHERE created_at IS NOT NULL
          AND id IS NOT NULL
        ORDER BY "created_at" ASC, "id" ASC
        LIMIT 1) AS position_query,
       -- finding the cursor values of the next record end
       -- finding the next cursor values (next_cursor_values_query) start
     LATERAL
       (SELECT "record"."created_at",
               "record"."id"
        FROM (
          VALUES (NULL,
                  NULL)) AS nulls
        LEFT JOIN
          (SELECT "issues"."created_at",
                  "issues"."id"
           FROM (
             (SELECT "issues"."created_at",
                     "issues"."id"
              FROM "issues"
              WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
                AND recursive_keyset_cte.issues_created_at_array[position] IS NULL
                AND "issues"."created_at" IS NULL
                AND "issues"."id" > recursive_keyset_cte.issues_id_array[position]
              ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
             UNION ALL
             (SELECT "issues"."created_at",
                     "issues"."id"
              FROM "issues"
              WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
                AND recursive_keyset_cte.issues_created_at_array[position] IS NOT NULL
                AND "issues"."created_at" IS NULL
              ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
             UNION ALL
             (SELECT "issues"."created_at",
                     "issues"."id"
              FROM "issues"
              WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
                AND recursive_keyset_cte.issues_created_at_array[position] IS NOT NULL
                AND "issues"."created_at" > recursive_keyset_cte.issues_created_at_array[position]
              ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
             UNION ALL
             (SELECT "issues"."created_at",
                     "issues"."id"
              FROM "issues"
              WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[position]
                AND recursive_keyset_cte.issues_created_at_array[position] IS NOT NULL
                AND "issues"."created_at" = recursive_keyset_cte.issues_created_at_array[position]
                AND "issues"."id" > recursive_keyset_cte.issues_id_array[position]
              ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)) issues
           ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
           LIMIT 1) record ON TRUE
        LIMIT 1) AS next_cursor_values))
  -- finding the next cursor values (next_cursor_values_query) END
  SELECT (records).*
  FROM "recursive_keyset_cte" AS "issues"
  WHERE (COUNT <> 0)) issues -- filtering out the initializer row
LIMIT 20
</code></pre>
</details>

### Using the `IN` query optimization

#### Adding more filters

In this example, let's add an extra filter by `milestone_id`.

Be careful when adding extra filters to the query. If the column is not covered by the same index,
then the query might perform worse than the non-optimized query. The `milestone_id` column in the
`issues` table is currently covered by a different index:

```sql
"index_issues_on_milestone_id" btree (milestone_id)
```

Adding the `milestone_id = X` filter to the `scope` argument or to the optimized scope causes bad performance.

Example (bad):

```ruby
Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
  scope: scope,
  array_scope: array_scope,
  array_mapping_scope: array_mapping_scope,
  finder_query: finder_query
).execute
  .where(milestone_id: 5)
  .limit(20)
```

To address this concern, we could define another index:

```sql
"idx_issues_on_project_id_and_milestone_id_and_created_at_and_id" btree (project_id, milestone_id, created_at, id)
```

Adding more indexes to the `issues` table could significantly affect the performance of
`UPDATE` queries. In this case, it's better to rely on the original query. This means that if we
want to use the optimization for the unfiltered page, we need to add extra logic in the
application code:

```ruby
# Optimization is possible when there are no extra params, or the params are
# covered with the same index as the ORDER BY clause.
if optimization_possible?
  run_optimized_query
else
  run_normal_in_query
end
```

#### Multiple `IN` queries

Let's assume that we want to extend the group-level queries to include only incident and test case
issue types.

The original ActiveRecord query would look like this:

```ruby
scope = Issue
  .where(project_id: Group.find(9970).all_projects.select(:id)) # `gitlab-org` group and its subgroups
  .where(issue_type: [:incident, :test_case]) # 1, 2
  .order(:created_at, :id)
  .limit(20)
```

To construct the array scope, we'll need to take the Cartesian product of the `project_id IN` and
the `issue_type IN` queries. `issue_type` is an ActiveRecord enum, so we need to
construct the following table:

| `project_id` | `issue_type_value` |
| ------------ | ------------------ |
| 2            | 1                  |
| 2            | 2                  |
| 5            | 1                  |
| 5            | 2                  |
| 10           | 1                  |
| 10           | 2                  |
| 9            | 1                  |
| 9            | 2                  |

For the `issue_types` query we can construct a value list without querying a table:

```ruby
value_list = Arel::Nodes::ValuesList.new([[Issue.issue_types[:incident]], [Issue.issue_types[:test_case]]])
issue_type_values = Arel::Nodes::Grouping.new(value_list).as('issue_type_values (value)').to_sql

array_scope = Group
  .find(9970)
  .all_projects
  .from("#{Project.table_name}, #{issue_type_values}")
  .select(:id, :value)
```
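
The `from` override above is a bit cryptic; conceptually it produces a cross join between the
projects and the value list, as in this trimmed sketch of the generated subquery (taken from the
full SQL below):

```sql
SELECT "projects"."id", "value"
FROM projects, (VALUES (1), (2)) AS issue_type_values (value)
WHERE "projects"."namespace_id" IN (...)
```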

Building the `array_mapping_scope` query requires two arguments: `id` and `issue_type_value`:

```ruby
array_mapping_scope = -> (id_expression, issue_type_value_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)).where(Issue.arel_table[:issue_type].eq(issue_type_value_expression)) }
```

The `scope` and the `finder` queries don't change:

```ruby
scope = Issue.order(:created_at, :id)
finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }

Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder.new(
  scope: scope,
  array_scope: array_scope,
  array_mapping_scope: array_mapping_scope,
  finder_query: finder_query
).execute.limit(20)
```

<details>
<summary>Expand this sentence to see the SQL query.</summary>
<pre><code>
SELECT "issues".*
FROM
  (WITH RECURSIVE "array_cte" AS MATERIALIZED
    (SELECT "projects"."id", "value"
     FROM projects, (
       VALUES (1), (2)) AS issue_type_values (value)
     WHERE "projects"."namespace_id" IN
       (WITH RECURSIVE "base_and_descendants" AS (
         (SELECT "namespaces".*
          FROM "namespaces"
          WHERE "namespaces"."type" = 'Group'
            AND "namespaces"."id" = 9970)
         UNION
         (SELECT "namespaces".*
          FROM "namespaces", "base_and_descendants"
          WHERE "namespaces"."type" = 'Group'
            AND "namespaces"."parent_id" = "base_and_descendants"."id")) SELECT "id"
        FROM "base_and_descendants" AS "namespaces")),
  "recursive_keyset_cte" AS (
    (SELECT NULL::issues AS records,
            array_cte_id_array,
            array_cte_value_array,
            issues_created_at_array,
            issues_id_array,
            0::bigint AS COUNT
     FROM
       (SELECT ARRAY_AGG("array_cte"."id") AS array_cte_id_array,
               ARRAY_AGG("array_cte"."value") AS array_cte_value_array,
               ARRAY_AGG("issues"."created_at") AS issues_created_at_array,
               ARRAY_AGG("issues"."id") AS issues_id_array
        FROM
          (SELECT "array_cte"."id",
                  "array_cte"."value"
           FROM array_cte) array_cte
        LEFT JOIN LATERAL
          (SELECT "issues"."created_at",
                  "issues"."id"
           FROM "issues"
           WHERE "issues"."project_id" = "array_cte"."id"
             AND "issues"."issue_type" = "array_cte"."value"
           ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
           LIMIT 1) issues ON TRUE
        WHERE "issues"."created_at" IS NOT NULL
          AND "issues"."id" IS NOT NULL) array_scope_lateral_query
     LIMIT 1)
    UNION ALL
    (SELECT
       (SELECT issues
        FROM "issues"
        WHERE "issues"."id" = recursive_keyset_cte.issues_id_array[POSITION]
        LIMIT 1),
       array_cte_id_array,
       array_cte_value_array,
       recursive_keyset_cte.issues_created_at_array[:position_query.position-1]||next_cursor_values.created_at||recursive_keyset_cte.issues_created_at_array[position_query.position+1:],
       recursive_keyset_cte.issues_id_array[:position_query.position-1]||next_cursor_values.id||recursive_keyset_cte.issues_id_array[position_query.position+1:],
       recursive_keyset_cte.count + 1
     FROM recursive_keyset_cte,
     LATERAL
       (SELECT created_at,
               id,
               POSITION
        FROM UNNEST(issues_created_at_array, issues_id_array) WITH
        ORDINALITY AS u(created_at, id, POSITION)
        WHERE created_at IS NOT NULL
          AND id IS NOT NULL
        ORDER BY "created_at" ASC, "id" ASC
        LIMIT 1) AS position_query,
     LATERAL
       (SELECT "record"."created_at",
               "record"."id"
        FROM (
          VALUES (NULL,
                  NULL)) AS nulls
        LEFT JOIN
          (SELECT "issues"."created_at",
                  "issues"."id"
           FROM (
             (SELECT "issues"."created_at",
                     "issues"."id"
              FROM "issues"
              WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
                AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
                AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NULL
                AND "issues"."created_at" IS NULL
                AND "issues"."id" > recursive_keyset_cte.issues_id_array[POSITION]
              ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
             UNION ALL
             (SELECT "issues"."created_at",
                     "issues"."id"
              FROM "issues"
              WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
                AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
                AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NOT NULL
                AND "issues"."created_at" IS NULL
              ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
             UNION ALL
             (SELECT "issues"."created_at",
                     "issues"."id"
              FROM "issues"
              WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
                AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
                AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NOT NULL
                AND "issues"."created_at" > recursive_keyset_cte.issues_created_at_array[POSITION]
              ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)
             UNION ALL
             (SELECT "issues"."created_at",
                     "issues"."id"
              FROM "issues"
              WHERE "issues"."project_id" = recursive_keyset_cte.array_cte_id_array[POSITION]
                AND "issues"."issue_type" = recursive_keyset_cte.array_cte_value_array[POSITION]
                AND recursive_keyset_cte.issues_created_at_array[POSITION] IS NOT NULL
                AND "issues"."created_at" = recursive_keyset_cte.issues_created_at_array[POSITION]
                AND "issues"."id" > recursive_keyset_cte.issues_id_array[POSITION]
              ORDER BY "issues"."created_at" ASC, "issues"."id" ASC)) issues
           ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
           LIMIT 1) record ON TRUE
        LIMIT 1) AS next_cursor_values)) SELECT (records).*
  FROM "recursive_keyset_cte" AS "issues"
  WHERE (COUNT <> 0)) issues
LIMIT 20
</code></pre>
</details>

NOTE:
To make the query efficient, the following columns need to be covered with an index: `project_id`, `issue_type`, `created_at`, and `id`.
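
For example, such an index could be defined like this hypothetical one (name illustrative):

```sql
CREATE INDEX idx_issues_on_project_id_issue_type_created_at_id
    ON issues (project_id, issue_type, created_at, id);
```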

#### Batch iteration

Batch iteration over the records is possible via the keyset `Iterator` class.

```ruby
scope = Issue.order(:created_at, :id)
array_scope = Group.find(9970).all_projects.select(:id)
array_mapping_scope = -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }

opts = {
  in_operator_optimization_options: {
    array_scope: array_scope,
    array_mapping_scope: array_mapping_scope,
    finder_query: finder_query
  }
}

Gitlab::Pagination::Keyset::Iterator.new(scope: scope, **opts).each_batch(of: 100) do |records|
  puts records.select(:id).map { |r| [r.id] }
end
```

#### Keyset pagination

The optimization works out of the box with GraphQL and the `keyset_paginate` helper method.
Read more about [keyset pagination](keyset_pagination.md).

```ruby
array_scope = Group.find(9970).all_projects.select(:id)
array_mapping_scope = -> (id_expression) { Issue.where(Issue.arel_table[:project_id].eq(id_expression)) }
finder_query = -> (created_at_expression, id_expression) { Issue.where(Issue.arel_table[:id].eq(id_expression)) }

opts = {
  in_operator_optimization_options: {
    array_scope: array_scope,
    array_mapping_scope: array_mapping_scope,
    finder_query: finder_query
  }
}

issues = Issue
  .order(:created_at, :id)
  .keyset_paginate(per_page: 20, keyset_order_options: opts)
  .records
```

#### Offset pagination with Kaminari

The `ActiveRecord` scope produced by the `InOperatorOptimization` class can be used in
[offset-paginated](pagination_guidelines.md#offset-pagination)
queries.

```ruby
Gitlab::Pagination::Keyset::InOperatorOptimization::QueryBuilder
  .new(...)
  .execute
  .page(1)
  .per(20)
  .without_count
```

## Generalized `IN` optimization technique

Let's dive into how `QueryBuilder` builds the optimized query
to fetch the twenty oldest created issues from the group `gitlab-org`
using the generalized `IN` optimization technique.

### Array CTE

As the first step, we use a common table expression (CTE) for collecting the `projects.id` values.
This is done by wrapping the incoming `array_scope` ActiveRecord relation parameter with a CTE.

```sql
WITH array_cte AS MATERIALIZED (
  SELECT "projects"."id"
  FROM "projects"
  WHERE "projects"."namespace_id" IN
    (SELECT traversal_ids[array_length(traversal_ids, 1)] AS id
     FROM "namespaces"
     WHERE (traversal_ids @> ('{9970}')))
)
```

This query produces the following result set with only one column (`projects.id`):

| ID  |
| --- |
| 9   |
| 2   |
| 5   |
| 10  |

### Array mapping

For each project (that is, each record storing a project ID in `array_cte`),
we will fetch the cursor value identifying the first issue respecting the `ORDER BY` clause.

As an example, let's pick the first record `ID=9` from `array_cte`.
The following query should fetch the cursor value `(created_at, id)` identifying
the first issue record respecting the `ORDER BY` clause for the project with `ID=9`:

```sql
SELECT "issues"."created_at", "issues"."id"
FROM "issues"
WHERE "issues"."project_id" = 9
ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
LIMIT 1;
```

We will use `LATERAL JOIN` to loop over the records in the `array_cte` and find the
cursor value for each project. The query would be built using the `array_mapping_scope` lambda
function.

```sql
SELECT ARRAY_AGG("array_cte"."id") AS array_cte_id_array,
       ARRAY_AGG("issues"."created_at") AS issues_created_at_array,
       ARRAY_AGG("issues"."id") AS issues_id_array
FROM (
  SELECT "array_cte"."id" FROM array_cte
) array_cte
LEFT JOIN LATERAL
(
  SELECT "issues"."created_at", "issues"."id"
  FROM "issues"
  WHERE "issues"."project_id" = "array_cte"."id"
  ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
  LIMIT 1
) issues ON TRUE
```

Since we have an index on `project_id`, `created_at`, and `id`,
index-only scans should quickly locate all the cursor values.

This is how the query could be translated to Ruby:

```ruby
created_at_values = []
id_values = []
project_ids.each do |project_id|
  # One query per project (N+1), but each is a fast index-only lookup
  created_at, id = Issue.where(project_id: project_id).order(:created_at, :id).pick(:created_at, :id)
  created_at_values << created_at
  id_values << id
end
```

This is what the result set would look like:

| `project_ids` | `created_at_values` | `id_values` |
| ------------- | ------------------- | ----------- |
| 2             | 2020-01-10          | 5           |
| 5             | 2020-01-05          | 4           |
| 10            | 2020-01-15          | 7           |
| 9             | 2020-01-05          | 3           |

The table shows the cursor values (`created_at, id`) of the first record for each project
respecting the `ORDER BY` clause.

At this point, we have the initial data. To start collecting the actual records from the database,
we'll use a recursive CTE query where each recursion locates one row until
the `LIMIT` is reached or no more data can be found.
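
If you're unfamiliar with recursive CTEs, the overall shape of the query is sketched below; the
two elided parts are filled in over the next sections:

```sql
WITH RECURSIVE recursive_keyset_cte AS (
  -- initializer query: exactly one row holding the cursor arrays
  (SELECT ...)
  UNION ALL
  -- recursive step: consumes the "lowest" cursor, loads one record,
  -- and replaces the consumed cursor with the next one
  (SELECT ... FROM recursive_keyset_cte ...)
)
SELECT * FROM recursive_keyset_cte
```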

Here's an outline of the steps we will take in the recursive CTE query
(expressing the steps in SQL is non-trivial but will be explained next):

1. Sort the initial resultset according to the `ORDER BY` clause.
1. Pick the top cursor to fetch the record; this is our first record. In the example,
   this cursor would be (`2020-01-05`, `3`) for `project_id=9`.
1. We can use (`2020-01-05`, `3`) to fetch the next issue respecting the `ORDER BY` clause
   and the `project_id=9` filter. This produces an updated resultset.

   | `project_ids` | `created_at_values` | `id_values` |
   | ------------- | ------------------- | ----------- |
   | 2             | 2020-01-10          | 5           |
   | 5             | 2020-01-05          | 4           |
   | 10            | 2020-01-15          | 7           |
   | **9**         | **2020-01-06**      | **6**       |

1. Repeat steps 1 to 3 with the updated resultset until we have fetched `N=20` records.

### Initializing the recursive CTE query

For the initial recursive query, we'll need to produce exactly one row; we call this the
initializer query (`initializer_query`).

Use the `ARRAY_AGG` function to compact the initial result set into a single row
and use the row as the initial value for the recursive CTE query.

Example initializer row:

| `records`      | `project_ids`   | `created_at_values` | `id_values` | `Count` | `Position` |
| -------------- | --------------- | ------------------- | ----------- | ------- | ---------- |
| `NULL::issues` | `[9, 2, 5, 10]` | `[...]`             | `[...]`     | `0`     | `NULL`     |

- The `records` column contains our sorted database records, and the initializer query sets the
  first value to `NULL`, which is filtered out later.
- The `count` column tracks the number of records found. We use this column to filter out the
  initializer row from the result set. A trimmed sketch of the initializer query follows this list.
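
Here is a trimmed-down version of the initializer query, adapted from the full generated SQL shown
earlier in this document:

```sql
SELECT NULL::issues AS records,  -- no record loaded yet
       array_cte_id_array,       -- project_ids
       issues_created_at_array,  -- created_at_values
       issues_id_array,          -- id_values
       0::bigint AS count        -- used to filter out this row later
FROM (
  -- the ARRAY_AGG query from the "Array mapping" section
  ...
) array_scope_lateral_query
LIMIT 1
```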

### Recursive portion of the CTE query

The result row is produced with the following steps:

1. [Order the keyset arrays.](#order-the-keyset-arrays)
1. [Find the next cursor.](#find-the-next-cursor)
1. [Produce a new row.](#produce-a-new-row)

#### Order the keyset arrays

Order the keyset arrays according to the original `ORDER BY` clause with `LIMIT 1` using the
`UNNEST [] WITH ORDINALITY` table function. The function locates the "lowest" keyset cursor
values and gives us the array position. These cursor values are used to locate the record.

NOTE:
At this point, we haven't read anything from the database tables, because we relied on
fast index-only scans.

| `project_ids` | `created_at_values` | `id_values` |
| ------------- | ------------------- | ----------- |
| 2             | 2020-01-10          | 5           |
| 5             | 2020-01-05          | 4           |
| 10            | 2020-01-15          | 7           |
| 9             | 2020-01-05          | 3           |

The first row is the 4th one (`position = 4`), because it has the lowest `created_at` and
`id` values. The `UNNEST` function also exposes the position using an extra column (note
that PostgreSQL arrays are 1-indexed).

Demonstration of the `UNNEST [] WITH ORDINALITY` table function:

```sql
SELECT position FROM unnest('{2020-01-10, 2020-01-05, 2020-01-15, 2020-01-05}'::timestamp[], '{5, 4, 7, 3}'::int[])
WITH ORDINALITY AS t(created_at, id, position) ORDER BY created_at ASC, id ASC LIMIT 1;
```

Result:

```sql
 position
----------
        4
(1 row)
```

#### Find the next cursor

Now, let's find the next cursor values (`next_cursor_values_query`) for the project with `id = 9`.
To do that, we build a keyset pagination SQL query. Find the next row after
`created_at = 2020-01-05` and `id = 3`. Because we order by two database columns, there can be two
cases:

- There are rows with `created_at = 2020-01-05` and `id > 3`.
- There are rows with `created_at > 2020-01-05`.
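
Combined, the generated query is conceptually equivalent to this sketch (using the example's
cursor values):

```sql
SELECT "issues"."created_at", "issues"."id"
FROM "issues"
WHERE "issues"."project_id" = 9
  AND (
    ("issues"."created_at" = '2020-01-05' AND "issues"."id" > 3)
    OR ("issues"."created_at" > '2020-01-05')
  )
ORDER BY "issues"."created_at" ASC, "issues"."id" ASC
LIMIT 1;
```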

Generating this query is done by the generic keyset pagination library. After the query runs,
we have a temporary table with the next cursor values:

| `created_at` | ID  |
| ------------ | --- |
| 2020-01-06   | 6   |

#### Produce a new row

As the final step, we need to produce a new row by manipulating the initializer row
(`data_collector_query` method). Two things happen here:

- Read the full row from the DB and return it in the `records` column
  (`result_collector_columns` method).
- Replace the cursor values at the current position with the results from the keyset query.

Reading the full row from the database is only one index scan by the primary key. We use the
ActiveRecord query passed as the `finder_query`:

```sql
(SELECT "issues".* FROM issues WHERE id = id_values[position] LIMIT 1)
```

By adding parentheses, the result row can be put into the `records` column.

Replacing the cursor values at `position` can be done via standard PostgreSQL array operators:

```sql
-- created_at_values column value
created_at_values[:position-1]||next_cursor_values.created_at||created_at_values[position+1:]

-- id_values column value
id_values[:position-1]||next_cursor_values.id||id_values[position+1:]
```

The Ruby equivalent would be the following (using a 0-based `position`, unlike the 1-based
PostgreSQL arrays):

```ruby
id_values[0..(position - 1)] + [next_cursor_values.id] + id_values[(position + 1)..-1]
```

After this, the recursion starts again by finding the next lowest cursor value.

### Finalizing the query

For producing the final `issues` rows, we're going to wrap the query with another `SELECT` statement:

```sql
SELECT "issues".*
FROM (
  SELECT (records).* -- similar to the Ruby splat operator
  FROM recursive_keyset_cte
  WHERE recursive_keyset_cte.count <> 0 -- filter out the initializer row
) AS issues
```

### Performance comparison

Assuming that we have the correct database index in place, we can compare the query performance by
looking at the number of database rows accessed by the query.

- Number of groups: 100
- Number of projects: 500
- Number of issues (in the group hierarchy): 50 000

Standard `IN` query:

| Query                    | Entries read from index | Rows read from the table | Rows sorted in memory |
| ------------------------ | ----------------------- | ------------------------ | --------------------- |
| group hierarchy subquery | 100                     | 0                        | 0                     |
| project lookup query     | 500                     | 0                        | 0                     |
| issue lookup query       | 50 000                  | 20                       | 50 000                |

Optimized `IN` query:

| Query                    | Entries read from index | Rows read from the table | Rows sorted in memory |
| ------------------------ | ----------------------- | ------------------------ | --------------------- |
| group hierarchy subquery | 100                     | 0                        | 0                     |
| project lookup query     | 500                     | 0                        | 0                     |
| issue lookup query       | 519                     | 20                       | 10 000                |

The group and project queries do not use sorting; the necessary columns are read from database
indexes. These values are accessed frequently, so it's very likely that most of the data is
in PostgreSQL's buffer cache.

The optimized `IN` query reads a maximum of 519 entries (cursor values) from the index:

- 500 index-only scans for populating the arrays, one for each project. The cursor values of the
  first record are included here.
- A maximum of 19 additional index-only scans for the consecutive records.

The optimized `IN` query sorts the array (cursor values per project) 20 times, which
means we sort 20 x 500 rows in total. However, this might be a less memory-intensive task than
sorting 10 000 rows at once.

Performance comparison for the `gitlab-org` group:

| Query                | Number of 8K buffers involved | Uncached execution time | Cached execution time |
| -------------------- | ----------------------------- | ----------------------- | --------------------- |
| `IN` query           | 240833                        | 1.2s                    | 660ms                 |
| Optimized `IN` query | 9783                          | 450ms                   | 22ms                  |

NOTE:
Before taking measurements, the group lookup query was executed separately to make
the group data available in the buffer cache. Since it's a frequently called query, it
hits many shared buffers during query execution in the production environment.