info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
# Avoiding downtime in migrations
When working with a database certain operations may require downtime. Since we
cannot have downtime in migrations we need to use a set of steps to get the
same end result without downtime. This guide describes various operations that
may appear to need downtime, their impact, and how to perform them without
requiring downtime.
## Dropping Columns
Removing columns is tricky because running GitLab processes may still be using
the columns. To work around this safely, you will need three steps in three releases:
1. Ignoring the column (release M)
1. Dropping the column (release M+1)
1. Removing the ignore rule (release M+2)
The reason we spread this out across three releases is that dropping a column is
a destructive operation that can't be rolled back easily.
Following this procedure helps us to make sure there are no deployments to GitLab.com
and upgrade processes for self-managed installations that lump together any of these steps.
### Step 1: Ignoring the column (release M)
The first step is to ignore the column in the application code. This is
necessary because Rails caches the columns and re-uses this cache in various
places. This can be done by defining the columns to ignore. For example, to ignore
`updated_at` in the User model you'd use the following:
If you're renaming a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3), please carefully consider the state when the first migration has run but the second cleanup migration hasn't been run yet.
With [Canary](https://gitlab.com/gitlab-com/gl-infra/readiness/-/tree/master/library/canary/) it is possible that the system runs in this state for a significant amount of time.
## Changing Column Constraints
Adding or removing a `NOT NULL` clause (or another constraint) can typically be
done without requiring downtime. However, this does require that any application
changes are deployed _first_. Thus, changing the constraints of a column should
happen in a post-deployment migration.
Avoid using `change_column` as it produces an inefficient query because it re-defines
the whole column type.
You can check the following guides for each specific use case:
To [prevent the overflow risk](https://gitlab.com/groups/gitlab-org/-/epics/4785) for some tables
with `integer` primary key (PK), we have to migrate their PK to `bigint`. The process to do this
without downtime and causing too much load on the database is described below.
### Initialize the conversion and start migrating existing data (release N)
To start the process, add a regular migration to create the new `bigint` columns. Use the provided
`initialize_conversion_of_integer_to_bigint` helper. The helper also creates a database trigger
to keep in sync both columns for any new records ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/migrate/20210608072312_initialize_conversion_of_ci_stages_to_bigint.rb)):
```ruby
class InitializeConversionOfCiStagesToBigint <ActiveRecord::Migration[6.1]
To monitor the health of the database, use these additional metrics:
- [PostgreSQL Tuple Statistics](https://dashboards.gitlab.net/d/000000167/postgresql-tuple-statistics?orgId=1&refresh=1m): if you see high rate of updates for the tables being actively converted, or increasing percentage of dead tuples for this table, it might mean that autovacuum cannot keep up.
- [PostgreSQL Overview](https://dashboards.gitlab.net/d/000000144/postgresql-overview?orgId=1): if you see high system usage or transactions per second (TPS) on the primary database server, it might mean that the migration is causing problems.
### Prometheus metrics
Number of [metrics](https://gitlab.com/gitlab-org/gitlab/-/blob/294a92484ce4611f660439aa48eee4dfec2230b5/lib/gitlab/database/background_migration/batched_migration_wrapper.rb#L90-128)
for each batched background migration are published to Prometheus. These metrics can be searched for and
visualized in Thanos ([see an example](https://thanos-query.ops.gitlab.net/graph?g0.expr=sum%20(rate(batched_migration_job_updated_tuples_total%7Benv%3D%22gprd%22%7D%5B5m%5D))%20by%20(migration_id)%20&g0.tab=0&g0.stacked=0&g0.range_input=3d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D&g0.end_input=2021-06-13%2012%3A18%3A24&g0.moment_input=2021-06-13%2012%3A18%3A24)).
### Swap the columns (release N + 1)
After the background is completed and the new `bigint` columns are populated for all records, we can
swap the columns. Swapping is done with post-deployment migration. The exact process depends on the
table being converted, but in general it's done in the following steps:
1. Using the provided `ensure_batched_background_migration_is_finished` helper, make sure the batched
migration has finished ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L13-18)).
If the migration has not completed, the subsequent steps fail anyway. By checking in advance we
aim to have more helpful error message.
1. Create indexes using the `bigint` columns that match the existing indexes using the `integer`
column ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L28-34)).
1. Create foreign keys (FK) using the `bigint` columns that match the existing FKs using the
`integer` column. Do this both for FK referencing other tables, and FKs that reference the table
that is being migrated ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L36-43)).
1. Inside a transaction, swap the columns:
1. Lock the tables involved. To reduce the chance of hitting a deadlock, we recommended to do this in parent to child order ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L47)).
1. Rename the columns to swap names ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L49-54))
1. Reset the trigger function ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L56-57)).
1. Swap the defaults ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L59-62)).
1. Swap the PK constraint (if any) ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L64-68)).
1. Remove old indexes and rename new ones ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L70-72)).
1. Remove old FKs (if still present) and rename new ones ([see an example](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb#L74)).
See example [merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/66088), and [migration](https://gitlab.com/gitlab-org/gitlab/-/blob/41fbe34a4725a4e357a83fda66afb382828767b2/db/post_migrate/20210707210916_finalize_ci_stages_bigint_conversion.rb).
### Remove the trigger and old `integer` columns (release N + 2)
Using post-deployment migration and the provided `cleanup_conversion_of_integer_to_bigint` helper,
drop the database trigger and the old `integer` columns ([see an example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/69714)).
### Remove ignore rules (release N + 3)
In the next release after the columns were dropped, remove the ignore rules as we do not need them
anymore ([see an example](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/71161)).