info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
# Adding Database Indexes
Indexes can be used to speed up database queries, but when should you add a new
index? Traditionally the answer to this question has been to add an index for
every column used for filtering or joining data. For example, consider the
following query:
```sql
SELECT *
FROM projects
WHERE user_id = 2;
```
Here we are filtering by the `user_id` column and as such a developer may decide
to index this column.
While in certain cases indexing columns using the above approach may make sense,
it can actually have a negative impact. Whenever you write data to a table, any
existing indexes must also be updated. The more indexes there are, the slower this
can potentially become. Indexes can also take up significant disk space, depending
on the amount of data indexed and the index type. For example, PostgreSQL offers
`GIN` indexes which can be used to index certain data types that cannot be
indexed by regular B-tree indexes. These indexes, however, generally take up more
data and are slower to update compared to B-tree indexes.
Because of all this, it's important make the following considerations
when adding a new index:
1. Do the new queries re-use as many existing indexes as possible?
1. Is there enough data that using an index is faster than iterating over
rows in the table?
1. Is the overhead of maintaining the index worth the reduction in query
timings?
## Re-using Queries
The first step is to make sure your query re-uses as many existing indexes as
possible. For example, consider the following query:
```sql
SELECT *
FROM todos
WHERE user_id = 123
AND state = 'open';
```
Now imagine we already have an index on the `user_id` column but not on the
`state` column. One may think this query performs badly due to `state` being
unindexed. In reality the query may perform just fine given the index on
`user_id` can filter out enough rows.
The best way to determine if indexes are re-used is to run your query using
`EXPLAIN ANALYZE`. Depending on the joined tables and the columns being used for filtering,
you may find an extra index doesn't make much, if any, difference.
In short:
1. Try to write your query in such a way that it re-uses as many existing
indexes as possible.
1. Run the query using `EXPLAIN ANALYZE` and study the output to find the most
ideal query.
## Data Size
A database may not use an index even when a regular sequence scan
(iterating over all rows) is faster, especially for small tables.
Consider adding an index if a table is expected to grow, and your query has to filter a lot of rows.
You may _not_ want to add an index if the table size is small (<`1,000` records),
or if existing indexes already filter out enough rows.
## Maintenance Overhead
Indexes have to be updated on every table write. In the case of PostgreSQL, _all_
existing indexes are updated whenever data is written to a table. As a
result, having many indexes on the same table slows down writes. It's therefore important
to balance query performance with the overhead of maintaining an extra index.
Let's say that adding an index reduces SELECT timings by 5 milliseconds but increases
INSERT/UPDATE/DELETE timings by 10 milliseconds. In this case, the new index may not be worth
it. A new index is more valuable when SELECT timings are reduced and INSERT/UPDATE/DELETE
timings are unaffected.
## Finding Unused Indexes
To see which indexes are unused you can run the following query:
```sql
SELECT relname as table_name, indexrelname as index_name, idx_scan, idx_tup_read, idx_tup_fetch, pg_size_pretty(pg_relation_size(indexrelname::regclass))
FROM pg_stat_all_indexes
WHERE schemaname = 'public'
AND idx_scan = 0
AND idx_tup_read = 0
AND idx_tup_fetch = 0
ORDER BY pg_relation_size(indexrelname::regclass) desc;
```
This query outputs a list containing all indexes that are never used and sorts
them by indexes sizes in descending order. This query helps in
determining whether existing indexes are still required. More information on
the meaning of the various columns can be found at
- Use a meta-command in #database-lab, such as: `\d <index_name>`.
- Ensure that the index is not [`invalid`](https://www.postgresql.org/docs/12/sql-createindex.html#:~:text=The%20psql%20%5Cd%20command%20will%20report%20such%20an%20index%20as%20INVALID).
- Ask someone in #database to check if the index exists.
- With proper access, you can also verify directly on production or in a
production clone.
### Add a migration to create the index synchronously
After the index is verified to exist on the production database, create a second
merge request that adds the index synchronously. The schema changes must be
updated and committed to `structure.sql` in this second merge request.
The synchronous migration results in a no-op on GitLab.com, but you should still add the
migration as expected for other installations. The below block
demonstrates how to create the second migration for the previous
asynchronous example.
**WARNING:**
Verify that the index exists in production before merging a second migration with `add_concurrent_index`.
If the second migration is deployed before the index has been created,
the index is created synchronously when the second migration executes.
You must test the database index changes locally before creating a merge request.
### Verify indexes created asynchronously
Use the asynchronous index helpers on your local environment to test changes for creating an index:
1. Enable the feature flags by running `Feature.enable(:database_async_index_creation)` and `Feature.enable(:database_reindexing)` in the Rails console.
1. Run `bundle exec rails db:migrate` so that it creates an entry in the `postgres_async_indexes` table.
1. Run `bundle exec rails gitlab:db:reindex` so that the index is created asynchronously.
1. To verify the index, open the PostgreSQL console using the [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/postgresql.md) command `gdk psql` and run the command `\d <index_name>` to check that your newly created index exists.
## Drop indexes asynchronously
For very large tables, index destruction can be a challenge to manage.
While `remove_concurrent_index` removes indexes in a way that does not block
normal traffic, it can still be problematic if index destruction runs for
during autovacuum. Necessary database operations like `autovacuum` cannot run, and
the deployment process on GitLab.com is blocked while waiting for index
destruction to finish.
To limit the impact on GitLab.com, use the following process to remove indexes
asynchronously during weekend hours. Due to generally lower traffic and fewer deployments,
index destruction can proceed at a lower level of risk.
1. [Schedule the index to be removed](#schedule-the-index-to-be-removed).
1. Run `bundle exec rails db:migrate` which should create an entry in the `postgres_async_indexes` table.
1. Run `bundle exec rails gitlab:db:reindex` destroy the index asynchronously.
1. To verify the index, open the PostgreSQL console by using the [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/postgresql.md)
command `gdk psql` and run `\d <index_name>` to check that the destroyed index no longer exists.