Please check that documentation and determine if that framework suits your needs and fall back
to these only if required.
Background migrations should be used to perform data migrations whenever a
migration exceeds [the time limits in our guidelines](../migration_style_guide.md#how-long-a-migration-should-take). For example, you can use background
migrations to migrate data that's stored in a single JSON column
to a separate table instead.
If the database cluster is considered to be in an unhealthy state, background
migrations automatically reschedule themselves for a later point in time.
## When to use background migrations
You should use a background migration when you migrate _data_ in tables that have
so many rows that the process would exceed [the time limits in our guidelines](../migration_style_guide.md#how-long-a-migration-should-take) if performed using a regular Rails migration.
- Background migrations should be used when migrating data in [high-traffic tables](../migration_style_guide.md#high-traffic-tables).
- Background migrations may also be used when executing numerous single-row queries
for every item in a large dataset. For single-record patterns, runtime depends
largely on the size of the dataset, so split the dataset accordingly and process it
in background migrations.
- Background migrations should not be used to perform schema migrations.
Some examples where background migrations can be useful:
- Migrating events from one table to multiple separate tables.
- Populating one column based on JSON stored in another column.
- Migrating data that depends on the output of external services (for example, an API).
NOTE:
If the background migration is part of an important upgrade, make sure it's announced
in the release post. Discuss with your Project Manager if you're not sure the migration falls
into this category.
## Isolation
Background migrations must be isolated and cannot use application code (for example,
See [Accessing data for multiple databases of Batched Background Migrations](batched_background_migrations.md#accessing-data-for-multiple-databases) for more details.
Keep in mind that `before` and `after` RSpec hooks
migrate your database down and up, which can result in other background
migrations being called. For this reason, use `spy` test doubles with
`have_received` instead of regular test doubles, because
the expectations defined in an `it` block can conflict with what is
called in RSpec hooks. See [issue #35351](https://gitlab.com/gitlab-org/gitlab/-/issues/18839)
for more details.
## Best practices
1. Make sure to know how much data you're dealing with.
1. Make sure that background migration jobs are idempotent.
1. Make sure that tests you write are not false positives.
1. Make sure that if the data being migrated is critical and cannot be lost, the
clean-up migration also checks the final state of the data before completing.
1. When migrating many columns, make sure it won't generate too many
dead tuples in the process (you may need to directly query the number of dead tuples
and adjust the scheduling according to this piece of data).
1. Make sure to discuss the numbers with a database specialist. The migration may add
more pressure on the database than you expect. Measure on staging,
or ask someone to measure on production.
1. Make sure to know how much time it'll take to run all scheduled migrations.
1. Provide an estimation section in the description, estimating both the total migration
run time and the query times for each background migration job. Explain plans for each query
should also be provided.
For example, assuming a migration that deletes data, include information similar to
the following section:
```plaintext
Background Migration Details:
47600 items to delete
batch size = 1000
47600 / 1000 = 48 batches
Estimated times per batch:
- 820ms for select statement with 1000 items (see linked explain plan)
- 900ms for delete statement with 1000 items (see linked explain plan)
Total: ~2 sec per batch
2 mins delay per batch (safe for the given total time per batch)
48 batches * 2 min per batch = 96 mins to run all the scheduled jobs
```
The execution time per batch (2 sec in this example) is not included in the calculation
for total migration time. The jobs are scheduled 2 minutes apart without knowledge of
the execution time.
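
The arithmetic in the estimate above can be reproduced with a few lines of Ruby. This is a minimal sketch only: `estimate_migration` is a hypothetical helper written for this illustration, not part of GitLab. It assumes jobs are queued a fixed delay apart and ignores per-batch execution time, as described above.

```ruby
# Hypothetical helper, not part of GitLab: estimates how long it takes to
# run all scheduled jobs when they are queued a fixed delay apart.
def estimate_migration(total_rows:, batch_size:, delay_minutes:)
  # Round up so that a partial final batch still counts as a job.
  batches = (total_rows.to_f / batch_size).ceil

  { batches: batches, total_minutes: batches * delay_minutes }
end

estimate = estimate_migration(total_rows: 47_600, batch_size: 1_000, delay_minutes: 2)
puts "#{estimate[:batches]} batches, #{estimate[:total_minutes]} minutes"
# prints "48 batches, 96 minutes"
```

Rounding up matters here: 47600 / 1000 is 47.6, and the final partial batch still needs its own scheduled job.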
## Additional tips and strategies
### Nested batching
A strategy to make the migration run faster is to schedule larger batches, and then use `EachBatch`
within the background migration to perform multiple statements.
The background migration helpers that queue multiple jobs such as
`queue_background_migration_jobs_by_range_at_intervals` use [`EachBatch`](../iterating_tables_in_batches.md).
The example above has batches of 1000, where each queued job takes two seconds. If the query has been optimized
to make the time for the delete statement within the [query performance guidelines](../query_performance.md),
1000 may be the largest number of records that can be deleted in a reasonable amount of time.
The minimum and most common interval for delaying jobs is two minutes. This results in two seconds
of work for each two-minute job. Nothing prevents you from executing multiple delete
statements in each background migration job.
Looking at the example above, you could alternatively do:
```plaintext
Background Migration Details:
47600 items to delete
batch size = 10_000
47600 / 10_000 = 5 batches
Estimated times per batch:
- Records are deleted in sub-batches of 1000 => 10_000 / 1000 = 10 total deletes
- 820ms for select statement with 1000 items (see linked explain plan)
- 900ms for delete statement with 1000 items (see linked explain plan)
Sub-batch total: ~2 sec per sub-batch,
Total batch time: 2 * 10 = 20 sec per batch
2 mins delay per batch
5 batches * 2 min per batch = 10 mins to run all the scheduled jobs
```
The batch time of 20 seconds still fits comfortably within the two-minute delay, yet the total run
time is cut to roughly a tenth, from around 96 minutes to 10 minutes. When dealing with large background
migrations, this can cut the total migration time by days.
When batching in this way, it is important to look at query times on the higher end
of the table or relation being updated. `EachBatch` may generate some queries that become much
slower when dealing with higher ID ranges.
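
The shape of a nested-batching job can be sketched in plain Ruby. This is an illustration under stated assumptions, not GitLab code: `DeleteStaleRows` is a hypothetical class, an in-memory array stands in for the table so the sketch runs without Rails, and the ID range is sliced arithmetically with `each_slice` instead of using `EachBatch`.

```ruby
# Hypothetical stand-in for a background migration class. The in-memory
# `rows` array replaces a real table so the sketch runs without Rails.
class DeleteStaleRows
  SUB_BATCH_SIZE = 1_000

  attr_reader :statements

  def initialize(rows)
    @rows = rows      # IDs currently in the "table"
    @statements = 0   # number of delete statements issued
  end

  # Each queued job receives a large ID range (for example, 10_000 IDs)
  # and works through it in smaller sub-batches.
  def perform(start_id, end_id)
    (start_id..end_id).each_slice(SUB_BATCH_SIZE) do |ids|
      delete_sub_batch(ids.first, ids.last)
    end
  end

  private

  # Stands in for one `DELETE ... WHERE id BETWEEN ? AND ?` statement.
  def delete_sub_batch(from, to)
    @rows.reject! { |id| id.between?(from, to) }
    @statements += 1
  end
end

rows = (1..47_600).to_a
job = DeleteStaleRows.new(rows)
job.perform(1, 10_000)
puts "#{job.statements} statements, #{rows.size} rows left"
# prints "10 statements, 37600 rows left"
```

Each individual statement still touches at most 1000 rows, so the per-statement query time stays the same while the number of scheduled jobs drops.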
### Delay time
When looking at the batch execution time versus the delay time, the execution time
should fit comfortably within the delay time for a few reasons:
- To allow for a variance in query times.
- To allow autovacuum to catch up after periods of high churn.
Never try to optimize by fully filling the delay window even if you are confident
the queries themselves have no timing variance.
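
One way to make this rule concrete is a quick check of the estimated batch time against the delay. The sketch below is hypothetical: `fits_delay_window?` is not a GitLab helper, and the 50% ceiling is an illustrative assumption rather than a documented threshold.

```ruby
# Hypothetical check, not a GitLab helper. The 50% ceiling is an
# illustrative assumption, not a documented threshold.
MAX_WINDOW_UTILIZATION = 0.5

def fits_delay_window?(execution_seconds:, delay_seconds:)
  execution_seconds.to_f / delay_seconds <= MAX_WINDOW_UTILIZATION
end

# 20 seconds of work in a two-minute window leaves plenty of headroom.
puts fits_delay_window?(execution_seconds: 20, delay_seconds: 120)
# 115 seconds of work almost fills the window and leaves no room for variance.
puts fits_delay_window?(execution_seconds: 115, delay_seconds: 120)
```

The first call prints `true` and the second prints `false`.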
### Background jobs tracking
NOTE:
Background migrations with job tracking enabled must call `mark_all_as_succeeded` for their batch, even if there is no work to do.
`queue_background_migration_jobs_by_range_at_intervals` can create records for each job that is scheduled to run.
You can enable this behavior by passing `track_jobs: true`. Each record starts with a `pending` status. Make sure that your worker updates the job status to `succeeded` by calling `Gitlab::Database::BackgroundMigrationJob.mark_all_as_succeeded` in the `perform` method of your background migration.
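```ruby
# Background migration code

def perform(start_id, end_id)
  # do work here

  mark_job_as_succeeded(start_id, end_id)
end

private

# Make sure that the arguments passed here match those passed to the background
# migration
def mark_job_as_succeeded(*arguments)
  Gitlab::Database::BackgroundMigrationJob.mark_all_as_succeeded(
    self.class.name.demodulize,
    arguments
  )
end
```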
See [`lib/gitlab/background_migration/drop_invalid_vulnerabilities.rb`](https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/background_migration/drop_invalid_vulnerabilities.rb) for a full example.
#### Rescheduling pending jobs
You can reschedule pending migrations from the `background_migration_jobs` table by creating a post-deployment migration and calling `requeue_background_migration_jobs_by_range_at_intervals` with the migration name and delay interval.
See [`db/post_migrate/20210604070207_retry_backfill_traversal_ids.rb`](https://gitlab.com/gitlab-org/gitlab/blob/master/db/post_migrate/20210604070207_retry_backfill_traversal_ids.rb) for a full example.
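
To illustrate which jobs remain eligible for rescheduling, here is a toy in-memory model of job tracking. Every name in it (`TrackedJob`, `JobTracker`, `BackfillExample`) is hypothetical, and it does not use the real `background_migration_jobs` table or GitLab helpers. It only shows the idea that a record stays `pending` until a job reports success with matching arguments.

```ruby
# Toy in-memory model of tracked jobs. All names here are hypothetical.
TrackedJob = Struct.new(:class_name, :arguments, :status)

class JobTracker
  def initialize
    @jobs = []
  end

  # Conceptually mirrors scheduling with job tracking enabled:
  # one `pending` record per queued range.
  def track(class_name, arguments)
    @jobs << TrackedJob.new(class_name, arguments, :pending)
  end

  # Marks every record with a matching class name and arguments as succeeded.
  def mark_all_as_succeeded(class_name, arguments)
    @jobs.each do |job|
      next unless job.class_name == class_name && job.arguments == arguments

      job.status = :succeeded
    end
  end

  def pending(class_name)
    @jobs.select { |job| job.class_name == class_name && job.status == :pending }
  end
end

tracker = JobTracker.new
tracker.track('BackfillExample', [1, 1_000])
tracker.track('BackfillExample', [1_001, 2_000])

# The first job finishes and reports success; the second never does.
tracker.mark_all_as_succeeded('BackfillExample', [1, 1_000])

to_requeue = tracker.pending('BackfillExample').map(&:arguments)
p to_requeue
# prints [[1001, 2000]]
```

Because records are matched on arguments, a job that reports success with different arguments than it was scheduled with leaves its record `pending`, and that record would be picked up again when rescheduling.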
### Viewing failure error logs
After running a background migration, if any jobs have failed, you can view the logs in [Kibana](https://log.gprd.gitlab.net/goto/5f06a57f768c6025e1c65aefb4075694).
Looking at the `json.error_class`, `json.error_message` and `json.error_backtrace` values may be helpful in understanding why the jobs failed.
Depending on when and how the failure occurred, you may find other helpful information by filtering with `json.class: <MyBackgroundMigrationClassName>`.