391 lines
15 KiB
Markdown
391 lines
15 KiB
Markdown
|
---
|
||
|
stage: Enablement
|
||
|
group: Database
|
||
|
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
|
||
|
---
|
||
|
|
||
|
# Migrations for Multiple databases
|
||
|
|
||
|
> Support for describing migration purposes was [introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/73756) in GitLab 14.8.
|
||
|
|
||
|
This document describes how to properly write database migrations
|
||
|
for [the decomposed GitLab application using multiple databases](https://gitlab.com/groups/gitlab-org/-/epics/6168).
|
||
|
|
||
|
Learn more about general multiple databases support in a [separate document](multiple_databases.md).
|
||
|
|
||
|
WARNING:
|
||
|
If you experience any issues using `Gitlab::Database::Migration[2.0]`,
|
||
|
you can temporarily revert back to the previous behavior by changing the version to `Gitlab::Database::Migration[1.0]`.
|
||
|
Please report any issues with `Gitlab::Database::Migration[2.0]` in [this issue](https://gitlab.com/gitlab-org/gitlab/-/issues/358430).
|
||
|
|
||
|
The design for multiple databases (except for the Geo database) assumes
|
||
|
that all decomposed databases have **the same structure** (for example, schema), but **the data is different** in each database. This means that some tables do not contain data on each database.
|
||
|
|
||
|
## Operations
|
||
|
|
||
|
Depending on the used constructs, we can classify migrations to be either:
|
||
|
|
||
|
1. Modifying structure ([DDL - Data Definition Language](https://www.postgresql.org/docs/current/ddl.html)) (for example, `ALTER TABLE`).
|
||
|
1. Modifying data ([DML - Data Manipulation Language](https://www.postgresql.org/docs/current/dml.html)) (for example, `UPDATE`).
|
||
|
1. Performing [other queries](https://www.postgresql.org/docs/current/queries.html) (for example, `SELECT`) that are treated as **DML** for the purposes of our migrations.
|
||
|
|
||
|
**The usage of `Gitlab::Database::Migration[2.0]` requires migrations to always be of a single purpose**.
|
||
|
Migrations cannot mix **DDL** and **DML** changes as the application requires the structure
|
||
|
(as described by `db/structure.sql`) to be exactly the same across all decomposed databases.
|
||
|
|
||
|
### Data Definition Language (DDL)
|
||
|
|
||
|
The DDL migrations are all migrations that:
|
||
|
|
||
|
1. Create or drop a table (for example, `create_table`).
|
||
|
1. Add or remove an index (for example, `add_index`, `add_index_concurrently`).
|
||
|
1. Add or remove a foreign key (for example `add_foreign_key`, `add_foreign_key_concurrently`).
|
||
|
1. Add or remove a column with or without a default value (for example, `add_column`).
|
||
|
1. Create or drop trigger functions (for example, `create_trigger_function`).
|
||
|
1. Attach or detach triggers from tables (for example, `track_record_deletions`, `untrack_record_deletions`).
|
||
|
1. Prepare or not async indexes (for example, `prepare_async_index`, `unprepare_async_index_by_name`).
|
||
|
|
||
|
As such DDL migrations **CANNOT**:
|
||
|
|
||
|
1. Read or modify data in any form, via SQL statements or ActiveRecord models.
|
||
|
1. Update column values (for example, `update_column_in_batches`).
|
||
|
1. Schedule background migrations (for example, `queue_background_migration_jobs_by_range_at_intervals`).
|
||
|
1. Read the state of feature flags since they are stored in `main:` (a `features` and `feature_gates`).
|
||
|
1. Read application settings (as settings are stored in `main:`).
|
||
|
|
||
|
As the majority of migrations in the GitLab codebase are of the DDL-type,
|
||
|
this is also the default mode of operation and requires no further changes
|
||
|
to the migrations files.
|
||
|
|
||
|
#### Example: perform DDL on all databases
|
||
|
|
||
|
Example migration adding a concurrent index that is treated as change of the structure (DDL)
|
||
|
that is executed on all configured databases.
|
||
|
|
||
|
```ruby
|
||
|
class AddUserIdAndStateIndexToMergeRequestReviewers < Gitlab::Database::Migration[2.0]
|
||
|
disable_ddl_transaction!
|
||
|
|
||
|
INDEX_NAME = 'index_on_merge_request_reviewers_user_id_and_state'
|
||
|
|
||
|
def up
|
||
|
add_concurrent_index :merge_request_reviewers, [:user_id, :state], where: 'state = 2', name: INDEX_NAME
|
||
|
end
|
||
|
|
||
|
def down
|
||
|
remove_concurrent_index_by_name :merge_request_reviewers, INDEX_NAME
|
||
|
end
|
||
|
end
|
||
|
```
|
||
|
|
||
|
### Data Manipulation Language (DML)
|
||
|
|
||
|
The DML migrations are all migrations that:
|
||
|
|
||
|
1. Read data via SQL statements (for example, `SELECT * FROM projects WHERE id=1`).
|
||
|
1. Read data via ActiveRecord models (for example, `User < MigrationRecord`).
|
||
|
1. Create, update or delete data via ActiveRecord models (for example, `User.create!(...)`).
|
||
|
1. Create, update or delete data via SQL statements (for example, `DELETE FROM projects WHERE id=1`).
|
||
|
1. Update columns in batches (for example, `update_column_in_batches(:projects, :archived, true)`).
|
||
|
1. Schedule background migrations (for example, `queue_background_migration_jobs_by_range_at_intervals`).
|
||
|
1. Access application settings (for example, `ApplicationSetting.last` if run for `main:` database).
|
||
|
1. Read and modify feature flags if run for the `main:` database.
|
||
|
|
||
|
The DML migrations **CANNOT**:
|
||
|
|
||
|
1. Make any changes to DDL since this breaks the rule of keeping `structure.sql` coherent across
|
||
|
all decomposed databases.
|
||
|
1. **Read data from another database**.
|
||
|
|
||
|
To indicate the `DML` migration type, a migration must use the `restrict_gitlab_migration gitlab_schema:`
|
||
|
syntax in a migration class. This marks the given migration as DML and restricts access to it.
|
||
|
|
||
|
#### Example: perform DML only in context of the database containing the given `gitlab_schema`
|
||
|
|
||
|
Example migration updating `archived` column of `projects` that is executed
|
||
|
only for the database containing `gitlab_main` schema.
|
||
|
|
||
|
```ruby
|
||
|
class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
|
||
|
disable_ddl_transaction!
|
||
|
|
||
|
restrict_gitlab_migration gitlab_schema: :gitlab_main
|
||
|
|
||
|
def up
|
||
|
update_column_in_batches(:projects, :archived, true) do |table, query|
|
||
|
query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
|
||
|
end
|
||
|
end
|
||
|
|
||
|
def down
|
||
|
# no-op
|
||
|
end
|
||
|
end
|
||
|
```
|
||
|
|
||
|
#### Example: usage of `ActiveRecord` classes
|
||
|
|
||
|
A migration using `ActiveRecord` class to perform data manipulation
|
||
|
must use the `MigrationRecord` class. This class is guaranteed to provide
|
||
|
a correct connection in a context of a given migration.
|
||
|
|
||
|
Underneath the `MigrationRecord == ActiveRecord::Base`, as once the `db:migrate`
|
||
|
runs, it switches the active connection of `ActiveRecord::Base.establish_connection :ci`.
|
||
|
To avoid confusion to using the `ActiveRecord::Base`, `MigrationRecord` is required.
|
||
|
|
||
|
This implies that DML migrations are forbidden to read data from other
|
||
|
databases. For example, running migration in context of `ci:` and reading feature flags
|
||
|
from `main:`, as no established connection to another database is present.
|
||
|
|
||
|
```ruby
|
||
|
class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
|
||
|
disable_ddl_transaction!
|
||
|
|
||
|
restrict_gitlab_migration gitlab_schema: :gitlab_main
|
||
|
|
||
|
class Project < MigrationRecord
|
||
|
end
|
||
|
|
||
|
def up
|
||
|
Project.where(archived: false).each_batch of |batch|
|
||
|
batch.update_all(archived: true)
|
||
|
end
|
||
|
end
|
||
|
|
||
|
def down
|
||
|
end
|
||
|
end
|
||
|
```
|
||
|
|
||
|
### The special purpose of `gitlab_shared`
|
||
|
|
||
|
As described in [gitlab_schema](multiple_databases.md#the-special-purpose-of-gitlab_shared),
|
||
|
the `gitlab_shared` tables are allowed to contain data across all databases. This implies
|
||
|
that such migrations should run across all databases to modify structure (DDL) or modify data (DML).
|
||
|
|
||
|
As such migrations accessing `gitlab_shared` do not need to use `restrict_gitlab_migration gitlab_schema:`,
|
||
|
migrations without restriction run across all databases and are allowed to modify data on each of them.
|
||
|
If the `restrict_gitlab_migration gitlab_schema:` is specified, the `DML` migration
|
||
|
runs only in a context of a database containing the given `gitlab_schema`.
|
||
|
|
||
|
#### Example: run DML `gitlab_shared` migration on all databases
|
||
|
|
||
|
Example migration updating `loose_foreign_keys_deleted_records` table
|
||
|
that is marked in `lib/gitlab/database/gitlab_schemas.yml` as `gitlab_shared`.
|
||
|
|
||
|
This migration is executed across all configured databases.
|
||
|
|
||
|
```ruby
|
||
|
class DeleteAllLooseForeignKeyRecords < Gitlab::Database::Migration[2.0]
|
||
|
disable_ddl_transaction!
|
||
|
|
||
|
def up
|
||
|
execute("DELETE FROM loose_foreign_keys_deleted_records")
|
||
|
end
|
||
|
|
||
|
def down
|
||
|
# no-op
|
||
|
end
|
||
|
end
|
||
|
```
|
||
|
|
||
|
#### Example: run DML `gitlab_shared` only on the database containing the given `gitlab_schema`
|
||
|
|
||
|
Example migration updating `loose_foreign_keys_deleted_records` table
|
||
|
that is marked in `lib/gitlab/database/gitlab_schemas.yml` as `gitlab_shared`.
|
||
|
|
||
|
This migration since it configures restriction on `gitlab_ci` is executed only
|
||
|
in context of database containing `gitlab_ci` schema.
|
||
|
|
||
|
```ruby
|
||
|
class DeleteCiBuildsLooseForeignKeyRecords < Gitlab::Database::Migration[2.0]
|
||
|
disable_ddl_transaction!
|
||
|
|
||
|
restrict_gitlab_migration gitlab_schema: :gitlab_ci
|
||
|
|
||
|
def up
|
||
|
execute("DELETE FROM loose_foreign_keys_deleted_records WHERE fully_qualified_table_name='ci_builds'")
|
||
|
end
|
||
|
|
||
|
def down
|
||
|
# no-op
|
||
|
end
|
||
|
end
|
||
|
```
|
||
|
|
||
|
### The behavior of skipping migrations
|
||
|
|
||
|
The only migrations that are skipped are the ones performing **DML** changes.
|
||
|
The **DDL** migrations are **always and unconditionally** executed.
|
||
|
|
||
|
The implemented [solution](https://gitlab.com/gitlab-org/gitlab/-/issues/355014#solution-2-use-database_tasks)
|
||
|
uses the `database_tasks:` as a way to indicate which additional database configurations
|
||
|
(in `config/database.yml`) share the same primary database. The database configurations
|
||
|
marked with `database_tasks: false` are exempt from executing `db:migrate` for those
|
||
|
database configurations.
|
||
|
|
||
|
If database configurations do not share databases (all do have `database_tasks: true`),
|
||
|
each migration runs for every database configuration:
|
||
|
|
||
|
1. The DDL migration applies all structure changes on all databases.
|
||
|
1. The DML migration runs only in the context of a database containing the given `gitlab_schema:`.
|
||
|
1. If the DML migration is not eligible to run, it is skipped. It's still
|
||
|
marked as executed in `schema_migrations`. While running `db:migrate`, the skipped
|
||
|
migration outputs `Current migration is skipped since it modifies 'gitlab_ci' which is outside of 'gitlab_main, gitlab_shared`.
|
||
|
|
||
|
To prevent loss of migrations if the `database_tasks: false` is configured, a dedicated
|
||
|
Rake task is used [`gitlab:db:validate_config`](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/83118).
|
||
|
The `gitlab:db:validate_config` validates the correctness of `database_tasks:` by checking database identifiers
|
||
|
of each underlying database configuration. The ones that share the database are required to have
|
||
|
the `database_tasks: false` set. `gitlab:db:validate_config` always runs before `db:migrate`.
|
||
|
|
||
|
## Validation
|
||
|
|
||
|
Validation in a nutshell uses [pg_query](https://github.com/pganalyze/pg_query) to analyze
|
||
|
each query and classify tables with information from [`gitlab_schema.yml`](multiple_databases.md#gitlab-schema).
|
||
|
The migration is skipped if the specified `gitlab_schema` is outside of a list of schemas
|
||
|
managed by a given database connection (`Gitlab::Database::gitlab_schemas_for_connection`).
|
||
|
|
||
|
The `Gitlab::Database::Migration[2.0]` includes `Gitlab::Database::MigrationHelpers::RestrictGitlabSchema`
|
||
|
which extends the `#migrate` method. For the duration of a migration a dedicated query analyzer
|
||
|
is installed `Gitlab::Database::QueryAnalyzers::RestrictAllowedSchemas` that accepts
|
||
|
a list of allowed schemas as defined by `restrict_gitlab_migration:`. If the executed query
|
||
|
is outside of allowed schemas, it raises an exception.
|
||
|
|
||
|
## Exceptions
|
||
|
|
||
|
Depending on misuse or lack of `restrict_gitlab_migration` various exceptions can be raised
|
||
|
as part of the migration run and prevent the migration from being completed.
|
||
|
|
||
|
### Exception 1: migration running in DDL mode does DML select
|
||
|
|
||
|
```ruby
|
||
|
class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
|
||
|
disable_ddl_transaction!
|
||
|
|
||
|
# Missing:
|
||
|
# restrict_gitlab_migration gitlab_schema: :gitlab_main
|
||
|
|
||
|
def up
|
||
|
update_column_in_batches(:projects, :archived, true) do |table, query|
|
||
|
query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
|
||
|
end
|
||
|
end
|
||
|
|
||
|
def down
|
||
|
# no-op
|
||
|
end
|
||
|
end
|
||
|
```
|
||
|
|
||
|
```plaintext
|
||
|
Select/DML queries (SELECT/UPDATE/DELETE) are disallowed in the DDL (structure) mode
|
||
|
Modifying of 'projects' (gitlab_main) with 'SELECT * FROM projects...
|
||
|
```
|
||
|
|
||
|
The current migration do not use `restrict_gitlab_migration`. The lack indicates a migration
|
||
|
running in **DDL** mode, but the executed payload appears to be reading data from `projects`.
|
||
|
|
||
|
**The solution** is to add `restrict_gitlab_migration gitlab_schema: :gitlab_main`.
|
||
|
|
||
|
### Exception 2: migration running in DML mode changes the structure
|
||
|
|
||
|
```ruby
|
||
|
class AddUserIdAndStateIndexToMergeRequestReviewers < Gitlab::Database::Migration[2.0]
|
||
|
disable_ddl_transaction!
|
||
|
|
||
|
# restrict_gitlab_migration if defined indicates DML, it should be removed
|
||
|
restrict_gitlab_migration gitlab_schema: :gitlab_main
|
||
|
|
||
|
INDEX_NAME = 'index_on_merge_request_reviewers_user_id_and_state'
|
||
|
|
||
|
def up
|
||
|
add_concurrent_index :merge_request_reviewers, [:user_id, :state], where: 'state = 2', name: INDEX_NAME
|
||
|
end
|
||
|
|
||
|
def down
|
||
|
remove_concurrent_index_by_name :merge_request_reviewers, INDEX_NAME
|
||
|
end
|
||
|
end
|
||
|
```
|
||
|
|
||
|
```plaintext
|
||
|
DDL queries (structure) are disallowed in the Select/DML (SELECT/UPDATE/DELETE) mode.
|
||
|
Modifying of 'merge_request_reviewers' with 'CREATE INDEX...
|
||
|
```
|
||
|
|
||
|
The current migration do use `restrict_gitlab_migration`. The presence indicates **DML** mode,
|
||
|
but the executed payload appears to be doing structure changes (DDL).
|
||
|
|
||
|
**The solution** is to remove `restrict_gitlab_migration gitlab_schema: :gitlab_main`.
|
||
|
|
||
|
### Exception 3: migration running in DML mode accesses data from a table in another schema
|
||
|
|
||
|
```ruby
|
||
|
class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
|
||
|
disable_ddl_transaction!
|
||
|
|
||
|
# Since it modifies `projects` it should use `gitlab_main`
|
||
|
restrict_gitlab_migration gitlab_schema: :gitlab_ci
|
||
|
|
||
|
def up
|
||
|
update_column_in_batches(:projects, :archived, true) do |table, query|
|
||
|
query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
|
||
|
end
|
||
|
end
|
||
|
|
||
|
def down
|
||
|
# no-op
|
||
|
end
|
||
|
end
|
||
|
```
|
||
|
|
||
|
```plaintext
|
||
|
Select/DML queries (SELECT/UPDATE/DELETE) do access 'projects' (gitlab_main) " \
|
||
|
which is outside of list of allowed schemas: 'gitlab_ci'
|
||
|
```
|
||
|
|
||
|
The current migration do restrict the migration to `gitlab_ci`, but appears to modify
|
||
|
data in `gitlab_main`.
|
||
|
|
||
|
**The solution** is to change `restrict_gitlab_migration gitlab_schema: :gitlab_ci`.
|
||
|
|
||
|
### Exception 4: mixing DDL and DML mode
|
||
|
|
||
|
```ruby
|
||
|
class UpdateProjectsArchivedState < Gitlab::Database::Migration[2.0]
|
||
|
disable_ddl_transaction!
|
||
|
|
||
|
# This migration is invalid regardless of specification
|
||
|
# as it cannot modify structure and data at the same time
|
||
|
restrict_gitlab_migration gitlab_schema: :gitlab_ci
|
||
|
|
||
|
def up
|
||
|
add_concurrent_index :merge_request_reviewers, [:user_id, :state], where: 'state = 2', name: 'index_on_merge_request_reviewers'
|
||
|
update_column_in_batches(:projects, :archived, true) do |table, query|
|
||
|
query.where(table[:archived].eq(false)) # rubocop:disable CodeReuse/ActiveRecord
|
||
|
end
|
||
|
end
|
||
|
|
||
|
def down
|
||
|
# no-op
|
||
|
end
|
||
|
end
|
||
|
```
|
||
|
|
||
|
The migrations mixing **DDL** and **DML** depending on ordering of operations raises
|
||
|
one of the prior exceptions.
|
||
|
|
||
|
## Upcoming changes on multiple database migrations
|
||
|
|
||
|
The `restrict_gitlab_migration` using `gitlab_schema:` is considered as a first iteration
|
||
|
of this feature for running migrations selectively depending on a context. It is possible
|
||
|
to add additional restrictions to DML-only migrations (as the structure coherency is likely
|
||
|
to stay as-is until further notice) to restrict when they run.
|
||
|
|
||
|
A Potential extension is to limit running DML migration only to specific environments:
|
||
|
|
||
|
```ruby
|
||
|
restrict_gitlab_migration gitlab_schema: :gitlab_main, gitlab_env: :gitlab_com
|
||
|
```
|