---
|
|
stage: none
|
|
group: unassigned
|
|
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
|
|
---
|
|
|
|
# Migration Style Guide
|
|
|
|
When writing migrations for GitLab, you have to take into account that
|
|
these are run by hundreds of thousands of organizations of all sizes, some with
|
|
many years of data in their database.
|
|
|
|
In addition, having to take a server offline for an upgrade, small or big, is a
|
|
big burden for most organizations. For this reason, it is important that your
|
|
migrations are written carefully, can be applied online, and adhere to the style
|
|
guide below.
|
|
|
|
Migrations are **not** allowed to require GitLab installations to be taken
|
|
offline ever. Migrations always must be written in such a way to avoid
|
|
downtime. In the past we had a process for defining migrations that allowed for
|
|
downtime by setting a `DOWNTIME` constant. You may see this when looking at
|
|
older migrations. This process was in place for 4 years without ever being
|
|
used and as such we've learnt we can always figure out how to write a migration
|
|
differently to avoid downtime.
|
|
|
|
When writing your migrations, also consider that databases might have stale data
|
|
or inconsistencies and guard for that. Try to make as few assumptions as
|
|
possible about the state of the database.
|
|
|
|
Please don't depend on GitLab-specific code since it can change in future
|
|
versions. If needed, copy-paste GitLab code into the migration to make it forward
|
|
compatible.
|
|
|
|
For GitLab.com, please take into consideration that regular migrations (under `db/migrate`)
|
|
are run before [Canary is deployed](https://gitlab.com/gitlab-com/gl-infra/readiness/-/tree/master/library/canary/#configuration-and-deployment),
|
|
and [post-deployment migrations](post_deployment_migrations.md) (`db/post_migrate`) are run after the deployment to production has finished.
|
|
|
|
## Schema Changes
|
|
|
|
Changes to the schema should be committed to `db/structure.sql`. This
|
|
file is automatically generated by Rails, so you normally should not
|
|
edit this file by hand. If your migration is adding a column to a
|
|
table, that column is added at the bottom. Please do not reorder
|
|
columns manually for existing tables as this causes confusion to
|
|
other people using `db/structure.sql` generated by Rails.
|
|
|
|
When your local database in your GDK is diverging from the schema from
|
|
`main` it might be hard to cleanly commit the schema changes to
|
|
Git. In that case you can use the `scripts/regenerate-schema` script to
|
|
regenerate a clean `db/structure.sql` for the migrations you're
|
|
adding. This script applies all migrations found in `db/migrate`
|
|
or `db/post_migrate`, so if there are any migrations you don't want to
|
|
commit to the schema, rename or remove them. If your branch is not
|
|
targeting `main` you can set the `TARGET` environment variable.
|
|
|
|
```shell
|
|
# Regenerate schema against `main`
|
|
scripts/regenerate-schema
|
|
|
|
# Regenerate schema against `12-9-stable-ee`
|
|
TARGET=12-9-stable-ee scripts/regenerate-schema
|
|
```
|
|
|
|
## Avoiding downtime
|
|
|
|
The document ["Avoiding downtime in migrations"](avoiding_downtime_in_migrations.md) specifies
|
|
various database operations, such as:
|
|
|
|
- [dropping and renaming columns](avoiding_downtime_in_migrations.md#dropping-columns)
|
|
- [changing column constraints and types](avoiding_downtime_in_migrations.md#changing-column-constraints)
|
|
- [adding and dropping indexes, tables, and foreign keys](avoiding_downtime_in_migrations.md#adding-indexes)
|
|
|
|
and explains how to perform them without requiring downtime.
|
|
|
|
## Reversibility
|
|
|
|
Your migration **must be** reversible. This is very important, as it should
|
|
be possible to downgrade in case of a vulnerability or bugs.
|
|
|
|
In your migration, add a comment describing how the reversibility of the
|
|
migration was tested.
|
|
|
|
Some migrations cannot be reversed. For example, some data migrations can't be
|
|
reversed because we lose information about the state of the database before the migration.
|
|
You should still create a `down` method with a comment, explaining why
|
|
the changes performed by the `up` method can't be reversed, so that the
|
|
migration itself can be reversed, even if the changes performed during the migration
|
|
can't be reversed:
|
|
|
|
```ruby
|
|
def down
|
|
# no-op
|
|
|
|
# comment explaining why changes performed by `up` cannot be reversed.
|
|
end
|
|
```
|
|
|
|
## Atomicity
|
|
|
|
By default, migrations run in a single transaction. That is, a transaction is opened
|
|
at the beginning of the migration, and committed after all steps are processed.
|
|
|
|
Running migrations in a single transaction makes sure that if one of the steps fails,
|
|
none of the steps take effect, leaving the database in a valid state.
|
|
Therefore, either:
|
|
|
|
- Put all steps in one single-transaction migration.
|
|
- If necessary, put most actions in one migration and create a separate migration
|
|
for the steps that cannot be done in a single transaction.
|
|
|
|
For example, if you create an empty table and need to build an index for it,
|
|
it is recommended to use a regular single-transaction migration and the default
|
|
Rails schema statement: [`add_index`](https://api.rubyonrails.org/v5.2/classes/ActiveRecord/ConnectionAdapters/SchemaStatements.html#method-i-add_index).
|
|
This is a blocking operation, but it doesn't cause problems because the table is not yet used,
|
|
and therefore it does not have any records yet.
|
|
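As a rough illustration of the all-or-nothing behavior described above, the following plain-Ruby toy model (no database involved) applies steps to a copy of the state and keeps the result only if every step succeeds. `run_atomically` and the state hash are hypothetical names invented for this sketch; a real migration relies on the database transaction instead.

```ruby
# Toy model only: the hash stands in for the database schema.
# Steps run against a deep copy, so a failure leaves the original untouched.
def run_atomically(state)
  working_copy = Marshal.load(Marshal.dump(state)) # deep copy
  yield(working_copy)
  working_copy # "commit": every step succeeded
rescue StandardError
  state # "rollback": return the state as it was before the steps ran
end
```

If any step inside the block raises, the caller gets back the original state, which mirrors how a failed single-transaction migration leaves the schema unchanged.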
|
|
## Naming conventions
|
|
|
|
We keep column names consistent with [ActiveRecord's schema conventions](https://guides.rubyonrails.org/active_record_basics.html#schema-conventions).
|
|
|
|
Custom index names should follow the pattern `index_#{table_name}_on_#{column_1}_and_#{column_2}_#{condition}`.
|
|
|
|
Examples:
|
|
|
|
- `index_services_on_type_and_id_and_template_when_active`
|
|
- `index_projects_on_id_service_desk_enabled`
|
|
- `index_clusters_on_enabled_cluster_type_id_and_created_at`
|
|
|
|
### Truncate long index names
|
|
|
|
PostgreSQL [limits the length of identifiers](https://www.postgresql.org/docs/current/limits.html),
|
|
like column or index names. Column names are not usually a problem, but index names tend
|
|
to be longer. Some methods for shortening a name that's too long:
|
|
|
|
- Prefix it with `i_` instead of `index_`.
|
|
- Skip redundant prefixes. For example,
|
|
`index_vulnerability_findings_remediations_on_vulnerability_remediation_id` becomes
|
|
`index_vulnerability_findings_remediations_on_remediation_id`.
|
|
- Instead of columns, specify the purpose of the index, such as `index_users_for_unconfirmation_notification`.
|
|
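The naming pattern and the length check can be sketched in plain Ruby. This is a minimal sketch, assuming the default PostgreSQL limit of 63 bytes per identifier; `index_name_for` and `fits_identifier_limit?` are hypothetical helpers written for this example and are not part of the GitLab codebase.

```ruby
# PostgreSQL truncates identifiers longer than 63 bytes in a default build.
MAX_IDENTIFIER_BYTES = 63

# Hypothetical helper: builds a name following the
# index_<table>_on_<column_1>_and_<column_2>_<condition> pattern.
def index_name_for(table, columns, condition = nil)
  name = "index_#{table}_on_#{columns.join('_and_')}"
  condition ? "#{name}_#{condition}" : name
end

# Hypothetical helper: true when the name fits without truncation.
def fits_identifier_limit?(name)
  name.bytesize <= MAX_IDENTIFIER_BYTES
end
```

With these helpers, `index_vulnerability_findings_remediations_on_vulnerability_remediation_id` (73 bytes) fails the check, while the shortened `index_vulnerability_findings_remediations_on_remediation_id` (59 bytes) passes.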
|
|
## Heavy operations in a single transaction
|
|
|
|
When using a single-transaction migration, a transaction holds a database connection
|
|
for the duration of the migration, so you must make sure the actions in the migration
|
|
do not take too much time: GitLab.com's production database has a `15s` timeout, so
|
|
in general, the cumulative execution time in a migration should aim to fit comfortably
|
|
in that limit. Singular query timings should fit within the [standard limit](query_performance.md#timing-guidelines-for-queries).
|
|
|
|
In case you need to insert, update, or delete a significant amount of data, you:
|
|
|
|
- Must disable the single transaction with `disable_ddl_transaction!`.
|
|
- Should consider doing it in a [Background Migration](background_migrations.md).
|
|
|
|
## Retry mechanism when acquiring database locks
|
|
|
|
When changing the database schema, we use helper methods to invoke DDL (Data Definition
|
|
Language) statements. In some cases, these DDL statements require a specific database lock.
|
|
|
|
Example:
|
|
|
|
```ruby
|
|
def change
|
|
remove_column :users, :full_name, :string
|
|
end
|
|
```
|
|
|
|
Executing this migration requires an exclusive lock on the `users` table. When the table
|
|
is concurrently accessed and modified by other processes, acquiring the lock may take
|
|
a while. The lock request is waiting in a queue and it may also block other queries
|
|
on the `users` table once it has been enqueued.
|
|
|
|
More information about PostgreSQL locks: [Explicit Locking](https://www.postgresql.org/docs/current/explicit-locking.html).
|
|
|
|
For stability reasons, GitLab.com has a specific [`statement_timeout`](../user/gitlab_com/index.md#postgresql)
|
|
set. When the migration is invoked, any database query has
|
|
a fixed time to execute. In a worst-case scenario, the request sits in the
|
|
lock queue, blocking other queries for the duration of the configured statement timeout,
|
|
then failing with a `canceling statement due to statement timeout` error.
|
|
|
|
This problem could cause failed application upgrade processes and even application
|
|
stability issues, since the table may be inaccessible for a short period of time.
|
|
|
|
To increase the reliability and stability of database migrations, the GitLab codebase
|
|
offers a helper method to retry the operations with different `lock_timeout` settings
|
|
and wait time between the attempts. Multiple smaller attempts to acquire the necessary
|
|
lock allow the database to process other statements.
|
|
|
|
### Examples
|
|
|
|
**Removing a column:**
|
|
|
|
```ruby
|
|
include Gitlab::Database::MigrationHelpers
|
|
|
|
def up
|
|
with_lock_retries do
|
|
remove_column :users, :full_name
|
|
end
|
|
end
|
|
|
|
def down
|
|
with_lock_retries do
|
|
add_column :users, :full_name, :string
|
|
end
|
|
end
|
|
```
|
|
|
|
**Removing a foreign key:**
|
|
|
|
```ruby
|
|
include Gitlab::Database::MigrationHelpers
|
|
|
|
def up
|
|
with_lock_retries do
|
|
remove_foreign_key :issues, :projects
|
|
end
|
|
end
|
|
|
|
def down
|
|
with_lock_retries do
|
|
add_foreign_key :issues, :projects
|
|
end
|
|
end
|
|
```
|
|
|
|
**Changing default value for a column:**
|
|
|
|
```ruby
|
|
include Gitlab::Database::MigrationHelpers
|
|
|
|
def up
|
|
with_lock_retries do
|
|
change_column_default :merge_requests, :lock_version, from: nil, to: 0
|
|
end
|
|
end
|
|
|
|
def down
|
|
with_lock_retries do
|
|
change_column_default :merge_requests, :lock_version, from: 0, to: nil
|
|
end
|
|
end
|
|
```
|
|
|
|
**Creating a new table with a foreign key:**
|
|
|
|
We can wrap the `create_table` method with `with_lock_retries`:
|
|
|
|
```ruby
|
|
def up
|
|
with_lock_retries do
|
|
create_table :issues do |t|
|
|
t.references :project, index: true, null: false, foreign_key: { on_delete: :cascade }
|
|
t.string :title, limit: 255
|
|
end
|
|
end
|
|
end
|
|
|
|
def down
|
|
with_lock_retries do
|
|
drop_table :issues
|
|
end
|
|
end
|
|
```
|
|
|
|
**Creating a new table when we have two foreign keys:**
|
|
|
|
For this, we need three migrations:
|
|
|
|
1. Creating the table without foreign keys (with the indices).
|
|
1. Add foreign key to the first table.
|
|
1. Add foreign key to the second table.
|
|
|
|
Creating the table:
|
|
|
|
```ruby
|
|
def up
|
|
create_table :imports do |t|
|
|
t.bigint :project_id, null: false
|
|
t.bigint :user_id, null: false
|
|
t.string :jid, limit: 255
|
|
|
|
t.index :project_id
|
|
t.index :user_id
|
|
end
|
|
end
|
|
|
|
def down
|
|
drop_table :imports
|
|
end
|
|
```
|
|
|
|
Adding foreign key to `projects`:
|
|
|
|
We can use the `add_concurrent_foreign_key` method in this case, as this helper method
|
|
has the lock retries built into it.
|
|
|
|
```ruby
|
|
include Gitlab::Database::MigrationHelpers
|
|
|
|
disable_ddl_transaction!
|
|
|
|
def up
|
|
add_concurrent_foreign_key :imports, :projects, column: :project_id, on_delete: :cascade
|
|
end
|
|
|
|
def down
|
|
with_lock_retries do
|
|
remove_foreign_key :imports, column: :project_id
|
|
end
|
|
end
|
|
```
|
|
|
|
Adding foreign key to `users`:
|
|
|
|
```ruby
|
|
include Gitlab::Database::MigrationHelpers
|
|
|
|
disable_ddl_transaction!
|
|
|
|
def up
|
|
add_concurrent_foreign_key :imports, :users, column: :user_id, on_delete: :cascade
|
|
end
|
|
|
|
def down
|
|
with_lock_retries do
|
|
remove_foreign_key :imports, column: :user_id
|
|
end
|
|
end
|
|
```
|
|
|
|
**Usage with `disable_ddl_transaction!`**
|
|
|
|
Generally the `with_lock_retries` helper should work with `disable_ddl_transaction!`. A custom RuboCop rule ensures that only allowed methods can be placed within the lock retries block.
|
|
|
|
```ruby
|
|
disable_ddl_transaction!
|
|
|
|
def up
|
|
with_lock_retries do
|
|
add_column :users, :name, :text
|
|
end
|
|
|
|
add_text_limit :users, :name, 255 # Includes constraint validation (full table scan)
|
|
end
|
|
```
|
|
|
|
The RuboCop rule generally allows standard Rails migration methods, listed below. This example causes a RuboCop offense:
|
|
|
|
```ruby
|
|
disable_ddl_transaction!
|
|
|
|
def up
|
|
with_lock_retries do
|
|
add_concurrent_index :users, :name
|
|
end
|
|
end
|
|
```
|
|
|
|
### When to use the helper method
|
|
|
|
The `with_lock_retries` helper method can be used when you normally use
|
|
standard Rails migration helper methods. Calling more than one migration
|
|
helper is not a problem if they're executed on the same table.
|
|
|
|
Using the `with_lock_retries` helper method is advised when a database
|
|
migration involves one of the [high-traffic tables](#high-traffic-tables).
|
|
|
|
Example changes:
|
|
|
|
- `add_foreign_key` / `remove_foreign_key`
|
|
- `add_column` / `remove_column`
|
|
- `change_column_default`
|
|
- `create_table` / `drop_table`
|
|
|
|
The `with_lock_retries` method **cannot** be used within the `change` method. You must manually define the `up` and `down` methods to make the migration reversible.
|
|
|
|
### How the helper method works
|
|
|
|
1. Iterate 50 times.
|
|
1. For each iteration, set a pre-configured `lock_timeout`.
|
|
1. Try to execute the given block (for example, `remove_column`).
|
|
1. If a `LockWaitTimeout` error is raised, sleep for the pre-configured `sleep_time`
|
|
and retry the block.
|
|
1. If no error is raised, the current iteration has successfully executed the block.
|
|
|
|
For more information check the [`Gitlab::Database::WithLockRetries`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/with_lock_retries.rb) class. The `with_lock_retries` helper method is implemented in the [`Gitlab::Database::MigrationHelpers`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/migration_helpers.rb) module.
|
|
|
|
In a worst-case scenario, the method:
|
|
|
|
- Executes the block for a maximum of 50 times over 40 minutes.
|
|
- Most of the time is spent in a pre-configured sleep period after each iteration.
|
|
- After the 50th retry, the block is executed without `lock_timeout`, just
|
|
like a standard migration invocation.
|
|
- If a lock cannot be acquired, the migration fails with a `statement timeout` error.
|
|
|
|
The migration might fail if there is a very long running transaction (40+ minutes)
|
|
accessing the `users` table.
|
|
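The retry loop described above can be sketched in plain Ruby. This is a simplified illustration, not the `Gitlab::Database::WithLockRetries` implementation: the `LockWaitTimeout` class defined here, the `with_lock_retries_sketch` method, and the `timings` and `sleeper` parameters are all assumptions made for the example, and no real `lock_timeout` is set on a database connection.

```ruby
# Stand-in for the error raised when a lock is not acquired in time (hypothetical).
class LockWaitTimeout < StandardError; end

# timings: list of [lock_timeout, sleep_time] pairs, one pair per attempt.
# sleeper: injectable so the sketch can be exercised without real waiting.
def with_lock_retries_sketch(timings:, sleeper: ->(seconds) { sleep(seconds) })
  timings.each do |lock_timeout, sleep_time|
    begin
      # The block receives the lock_timeout it would run under.
      return yield(lock_timeout)
    rescue LockWaitTimeout
      # Back off so other statements can proceed, then try the next pair.
      sleeper.call(sleep_time)
    end
  end

  # All attempts failed: run once more without a lock timeout,
  # like a standard migration invocation.
  yield(nil)
end
```

Keeping each attempt short and sleeping between attempts is what lets queued queries on the table make progress while the migration waits for its lock.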
|
|
## Multi-Threading
|
|
|
|
Sometimes a migration might need to use multiple Ruby threads to speed up a
|
|
migration. For this to work your migration needs to include the module
|
|
`Gitlab::Database::MultiThreadedMigration`:
|
|
|
|
```ruby
|
|
class MyMigration < ActiveRecord::Migration[6.0]
|
|
include Gitlab::Database::MigrationHelpers
|
|
include Gitlab::Database::MultiThreadedMigration
|
|
end
|
|
```
|
|
|
|
You can then use the method `with_multiple_threads` to perform work in separate
|
|
threads. For example:
|
|
|
|
```ruby
|
|
class MyMigration < ActiveRecord::Migration[6.0]
|
|
include Gitlab::Database::MigrationHelpers
|
|
include Gitlab::Database::MultiThreadedMigration
|
|
|
|
def up
|
|
with_multiple_threads(4) do
|
|
disable_statement_timeout
|
|
|
|
# ...
|
|
end
|
|
end
|
|
end
|
|
```
|
|
|
|
Here the call to `disable_statement_timeout` uses the connection local to
|
|
the `with_multiple_threads` block, instead of re-using the global connection
|
|
pool. This ensures each thread has its own connection object, and doesn't time
|
|
out when trying to obtain one.
|
|
|
|
PostgreSQL has a maximum number of connections that it allows. This
|
|
limit can vary from installation to installation. As a result, it's recommended
|
|
you do not use more than 32 threads in a single migration. Usually, 4-8 threads
|
|
should be more than enough.
|
|
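To illustrate why each thread needs its own connection and why the thread count is capped, here is a plain-Ruby sketch. It does not show how `with_multiple_threads` is implemented: `run_in_threads` and its `connect` parameter are hypothetical, and `connect` stands in for checking out a dedicated database connection.

```ruby
MAX_THREADS = 32 # upper bound recommended in this guide

# Hypothetical sketch: runs the block once per thread, giving every
# thread a connection object of its own.
def run_in_threads(count, connect:)
  raise ArgumentError, "use at most #{MAX_THREADS} threads" if count > MAX_THREADS

  threads = Array.new(count) do |index|
    Thread.new do
      connection = connect.call # one connection per thread, never shared
      yield(connection, index)
    end
  end

  threads.map(&:value) # waits for every thread and collects the block results
end
```

Because every thread holds one connection for its whole lifetime, the thread count translates directly into connections consumed on the PostgreSQL server.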
|
|
## Removing indexes
|
|
|
|
If the table is not empty when removing an index, make sure to use the method
|
|
`remove_concurrent_index` instead of the regular `remove_index` method.
|
|
The `remove_concurrent_index` method drops indexes concurrently, so no locking is required,
|
|
and there is no need for downtime. To use this method, you must disable single-transaction mode
|
|
by calling the method `disable_ddl_transaction!` in the body of your migration
|
|
class like so:
|
|
|
|
```ruby
|
|
class MyMigration < ActiveRecord::Migration[6.0]
|
|
include Gitlab::Database::MigrationHelpers
|
|
disable_ddl_transaction!
|
|
|
|
INDEX_NAME = 'index_name'
|
|
def up
|
|
remove_concurrent_index :table_name, :column_name, name: INDEX_NAME
|
|
end
|
|
end
|
|
```
|
|
|
|
Note that it is not necessary to check if the index exists prior to
|
|
removing it. However, it is required to specify the name of the
|
|
index that is being removed. This can be done either by passing the name
|
|
as an option to the appropriate form of `remove_index` or `remove_concurrent_index`,
|
|
or more simply by using the `remove_concurrent_index_by_name` method. Explicitly
|
|
specifying the name is important to ensure the correct index is removed.
|
|
|
|
For a small table (such as an empty one or one with less than `1,000` records),
|
|
it is recommended to use `remove_index` in a single-transaction migration,
|
|
combining it with other operations that don't require `disable_ddl_transaction!`.
|
|
|
|
### Disabling an index
|
|
|
|
There are certain situations in which you might want to disable an index before removing it.
|
|
See the [maintenance operations guide](database/maintenance_operations.md#disabling-an-index)
|
|
for more details.
|
|
|
|
## Adding indexes
|
|
|
|
Before adding an index, consider if this one is necessary. There are situations in which an index
|
|
might not be required, like:
|
|
|
|
- The table is small (less than `1,000` records) and it's not expected to exponentially grow in size.
|
|
- Any existing indexes filter out enough rows.
|
|
- The reduction in query timings after the index is added is not significant.
|
|
|
|
Additionally, wide indexes are not required to match all filter criteria of queries. We just need
|
|
to cover enough columns so that the index lookup has a small enough selectivity. Please review our
|
|
[Adding Database indexes](adding_database_indexes.md) guide for more details.
|
|
|
|
When adding an index to a non-empty table make sure to use the method
|
|
`add_concurrent_index` instead of the regular `add_index` method.
|
|
The `add_concurrent_index` method automatically creates concurrent indexes
|
|
when using PostgreSQL, removing the need for downtime.
|
|
|
|
To use this method, you must disable single-transaction mode
|
|
by calling the method `disable_ddl_transaction!` in the body of your migration
|
|
class like so:
|
|
|
|
```ruby
|
|
class MyMigration < ActiveRecord::Migration[6.0]
|
|
include Gitlab::Database::MigrationHelpers
|
|
|
|
disable_ddl_transaction!
|
|
|
|
INDEX_NAME = 'index_name'
|
|
|
|
def up
|
|
add_concurrent_index :table, :column, name: INDEX_NAME
|
|
end
|
|
|
|
def down
|
|
remove_concurrent_index :table, :column, name: INDEX_NAME
|
|
end
|
|
end
|
|
```
|
|
|
|
You must explicitly name indexes that are created with more complex
|
|
definitions beyond table name, column name(s) and uniqueness constraint.
|
|
Consult the [Adding Database Indexes](adding_database_indexes.md#requirements-for-naming-indexes)
|
|
guide for more details.
|
|
|
|
If you need to add a unique index, please keep in mind there is the possibility
|
|
of existing duplicates being present in the database. This means that you should
|
|
always _first_ add a migration that removes any duplicates, before adding the
|
|
unique index.
|
|
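The deduplication step can be pictured with a small plain-Ruby sketch that decides which rows to delete before a unique index is added. It is only an illustration of the selection logic, assuming the row with the lowest `id` is the one to keep; `duplicate_ids` is a hypothetical helper, and a real migration would do this work in SQL, in batches.

```ruby
# Returns the ids of rows to delete so that each value of `key`
# remains exactly once (the row with the lowest id is kept).
def duplicate_ids(rows, key)
  rows.group_by { |row| row[key] }
      .flat_map { |_, group| group.map { |row| row[:id] }.sort.drop(1) }
end
```

Which duplicate to keep is a product decision; keeping the lowest `id` is just one common choice.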
|
|
For a small table (such as an empty one or one with less than `1,000` records),
|
|
it is recommended to use `add_index` in a single-transaction migration, combining it with other
|
|
operations that don't require `disable_ddl_transaction!`.
|
|
|
|
## Testing for existence of indexes
|
|
|
|
If a migration requires conditional logic based on the absence or
|
|
presence of an index, you must test for existence of that index using
|
|
its name. This helps avoid problems with how Rails compares index definitions,
|
|
which can lead to unexpected results. For more details, review the
|
|
[Adding Database Indexes](adding_database_indexes.md#why-explicit-names-are-required)
|
|
guide.
|
|
|
|
The easiest way to test for existence of an index by name is to use the
|
|
`index_name_exists?` method, but the `index_exists?` method can also
|
|
be used with a name option. For example:
|
|
|
|
```ruby
|
|
class MyMigration < ActiveRecord::Migration[6.0]
|
|
include Gitlab::Database::MigrationHelpers
|
|
|
|
INDEX_NAME = 'index_name'
|
|
|
|
def up
|
|
# an index must be conditionally created due to schema inconsistency
|
|
unless index_exists?(:table_name, :column_name, name: INDEX_NAME)
|
|
add_index :table_name, :column_name, name: INDEX_NAME
|
|
end
|
|
end
|
|
|
|
def down
|
|
# no op
|
|
end
|
|
end
|
|
```
|
|
|
|
Keep in mind that concurrent index helpers like `add_concurrent_index`,
|
|
`remove_concurrent_index`, and `remove_concurrent_index_by_name` already
|
|
perform existence checks internally.
|
|
|
|
## Adding foreign-key constraints
|
|
|
|
When adding a foreign-key constraint to either an existing or a new column also
|
|
remember to add an index on the column.
|
|
|
|
This is **required** for all foreign-keys, e.g., to support efficient cascading
|
|
deleting: when a lot of rows in a table get deleted, the referenced records need
|
|
to be deleted too. The database has to look for corresponding records in the
|
|
referenced table. Without an index, this results in a sequential scan on the
|
|
table, which can take a long time.
|
|
|
|
Here's an example where we add a new column with a foreign key
|
|
constraint. Note it includes `index: true` to create an index for it.
|
|
|
|
```ruby
|
|
class Migration < ActiveRecord::Migration[6.0]
|
|
|
|
def change
|
|
add_reference :model, :other_model, index: true, foreign_key: { on_delete: :cascade }
|
|
end
|
|
end
|
|
```
|
|
|
|
When adding a foreign-key constraint to an existing column in a non-empty table,
|
|
we have to employ `add_concurrent_foreign_key` and `add_concurrent_index`
|
|
instead of `add_reference`.
|
|
|
|
If you have a new or empty table that doesn't reference a
|
|
[high-traffic table](#high-traffic-tables),
|
|
we recommend that you use `add_reference` in a single-transaction migration. You can
|
|
combine it with other operations that don't require `disable_ddl_transaction!`.
|
|
|
|
You can read more about adding [foreign key constraints to an existing column](database/add_foreign_key_to_existing_column.md).
|
|
|
|
## `NOT NULL` constraints
|
|
|
|
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/38358) in GitLab 13.0.
|
|
|
|
See the style guide on [`NOT NULL` constraints](database/not_null_constraints.md) for more information.
|
|
|
|
## Adding Columns With Default Values
|
|
|
|
With PostgreSQL 11 being the minimum version in GitLab 13.0 and later, adding columns with default values has become much easier and
|
|
the standard `add_column` helper should be used in all cases.
|
|
|
|
Before PostgreSQL 11, adding a column with a default was problematic as it would
|
|
have caused a full table rewrite. The corresponding helper `add_column_with_default`
|
|
has been deprecated and is scheduled to be removed in a later release.
|
|
|
|
If a backport adding a column with a default value is needed for %12.9 or earlier versions,
|
|
it should use `add_column_with_default` helper. If a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3)
|
|
is involved, backporting to %12.9 is contraindicated.
|
|
|
|
## Changing the column default
|
|
|
|
One might think that changing a column default with `change_column_default` is an
|
|
expensive and disruptive operation for larger tables, but in reality it's not.
|
|
|
|
Take the following migration as an example:
|
|
|
|
```ruby
|
|
class DefaultRequestAccessGroups < ActiveRecord::Migration[5.2]
|
|
def change
|
|
change_column_default(:namespaces, :request_access_enabled, from: false, to: true)
|
|
end
|
|
end
|
|
```
|
|
|
|
The migration above changes the default column value of one of our largest
|
|
tables: `namespaces`. This can be translated to:
|
|
|
|
```sql
|
|
ALTER TABLE namespaces
|
|
ALTER COLUMN request_access_enabled
|
|
SET DEFAULT true
|
|
```
|
|
|
|
In this particular case, the default value exists and we're just changing the metadata for
|
|
`request_access_enabled` column, which does not imply a rewrite of all the existing records
|
|
in the `namespaces` table. Only when creating a new column with a default are all the records going to be rewritten.
|
|
|
|
NOTE:
|
|
A faster [ALTER TABLE ADD COLUMN with a non-null default](https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/)
|
|
was introduced in PostgreSQL 11.0, removing the need to rewrite the table when a new column with a default value is added.
|
|
|
|
For the reasons mentioned above, it's safe to use `change_column_default` in a single-transaction migration
|
|
without requiring `disable_ddl_transaction!`.
|
|
|
|
## Updating an existing column
|
|
|
|
To update an existing column to a particular value, you can use
|
|
`update_column_in_batches`. This splits the updates into batches, so we
|
|
don't update too many rows in a single statement.
|
|
|
|
This updates the column `foo` in the `projects` table to 10, where `some_column`
|
|
is `'hello'`:
|
|
|
|
```ruby
|
|
update_column_in_batches(:projects, :foo, 10) do |table, query|
|
|
query.where(table[:some_column].eq('hello'))
|
|
end
|
|
```
|
|
|
|
If a computed update is needed, the value can be wrapped in `Arel.sql`, so Arel
|
|
treats it as an SQL literal. Wrapping raw SQL in `Arel.sql` is also required because of a deprecation in [Rails 6](https://gitlab.com/gitlab-org/gitlab/-/issues/28497).
|
|
|
|
The below example is the same as the one above, but
|
|
the value is set to the product of the `bar` and `baz` columns:
|
|
|
|
```ruby
|
|
update_value = Arel.sql('bar * baz')
|
|
|
|
update_column_in_batches(:projects, :foo, update_value) do |table, query|
|
|
query.where(table[:some_column].eq('hello'))
|
|
end
|
|
```
|
|
|
|
Like `add_column_with_default`, there is a RuboCop cop to detect usage of this
|
|
on large tables. In the case of `update_column_in_batches`, it may be acceptable
|
|
to run on a large table, as long as it is only updating a small subset of the
|
|
rows in the table, but do not ignore that without validating on the GitLab.com
|
|
staging environment - or asking someone else to do so for you - beforehand.
|
|
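The idea of splitting an update into batches can be sketched as computing consecutive `id` ranges, each of which would be updated by its own statement. This is a hedged illustration, not how `update_column_in_batches` is implemented; `batch_ranges` and its parameters are hypothetical.

```ruby
# Hypothetical sketch: splits the id range min_id..max_id into
# consecutive ranges of at most batch_size ids each.
def batch_ranges(min_id, max_id, batch_size)
  return [] if min_id > max_id

  (min_id..max_id).step(batch_size).map do |start_id|
    # The last batch is clamped so it never runs past max_id.
    start_id..[start_id + batch_size - 1, max_id].min
  end
end
```

Each range would back one `UPDATE ... WHERE id BETWEEN start AND stop` statement, so no single statement touches more than `batch_size` candidate rows.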
|
|
## Dropping a database table
|
|
|
|
Dropping a database table is uncommon, and the `drop_table` method
|
|
provided by Rails is generally considered safe. Before dropping the table,
|
|
please consider the following:
|
|
|
|
If your table has foreign keys on a [high-traffic table](#high-traffic-tables) (like `projects`), then
|
|
the `DROP TABLE` statement is likely to stall concurrent traffic until it fails with a **statement timeout** error.
|
|
|
|
Table **has no records** (feature was never in use) and **no foreign
|
|
keys**:
|
|
|
|
- Use the `drop_table` method in your migration.
|
|
|
|
```ruby
|
|
def change
|
|
drop_table :my_table
|
|
end
|
|
```
|
|
|
|
Table **has records** but **no foreign keys**:
|
|
|
|
- First release: Remove the application code related to the table, such as models,
|
|
controllers and services.
|
|
- Second release: Use the `drop_table` method in your migration.
|
|
|
|
```ruby
|
|
def up
|
|
drop_table :my_table
|
|
end
|
|
|
|
def down
|
|
# create_table ...
|
|
end
|
|
```
|
|
|
|
Table **has foreign keys**:
|
|
|
|
- First release: Remove the application code related to the table, such as models,
|
|
controllers, and services.
|
|
- Second release: Remove the foreign keys using the `with_lock_retries`
|
|
helper method. Use `drop_table` in another migration file.
|
|
|
|
**Migrations for the second release:**
|
|
|
|
Removing the foreign key on the `projects` table:
|
|
|
|
```ruby
|
|
# first migration file
|
|
|
|
def up
|
|
with_lock_retries do
|
|
remove_foreign_key :my_table, :projects
|
|
end
|
|
end
|
|
|
|
def down
|
|
with_lock_retries do
|
|
add_foreign_key :my_table, :projects
|
|
end
|
|
end
|
|
```
|
|
|
|
Dropping the table:
|
|
|
|
```ruby
|
|
# second migration file
|
|
|
|
def up
|
|
drop_table :my_table
|
|
end
|
|
|
|
def down
|
|
# create_table ...
|
|
end
|
|
```
|
|
|
|
## Integer column type
|
|
|
|
By default, an integer column can hold up to a 4-byte (32-bit) number. That is
|
|
a max value of 2,147,483,647. Be aware of this when creating a column that
|
|
holds file sizes in byte units. If you are tracking file size in bytes, this
|
|
restricts the maximum file size to just over 2GB.
|
|
|
|
To allow an integer column to hold up to an 8-byte (64-bit) number, explicitly
|
|
set the limit to 8 bytes. This allows the column to hold a value up to
|
|
`9,223,372,036,854,775,807`.
|
|
|
|
Rails migration example:
|
|
|
|
```ruby
|
|
add_column(:projects, :foo, :integer, default: 10, limit: 8)
|
|
```
|
|
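The limits quoted above follow from the width of a signed integer, which the following plain-Ruby arithmetic reproduces. `fits_in_integer_column?` is a hypothetical helper written for this example; it only checks the numeric range and does not touch a database.

```ruby
INT4_MAX = 2**31 - 1 # largest value of a 4-byte signed integer
INT8_MAX = 2**63 - 1 # largest value of an 8-byte signed integer

# Hypothetical helper: true when value fits in a signed integer
# column of the given width in bytes.
def fits_in_integer_column?(value, bytes: 4)
  limit = 2**(bytes * 8 - 1)
  value >= -limit && value < limit
end
```

For instance, a 3 GiB file size (`3 * 1024**3` bytes) does not fit the default 4-byte column but does fit once the limit is 8 bytes.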
|
|
## Strings and the Text data type
|
|
|
|
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/30453) in GitLab 13.0.
|
|
|
|
See the [text data type](database/strings_and_the_text_data_type.md) style guide for more information.
|
|
|
|
## Timestamp column type
|
|
|
|
By default, Rails uses the `timestamp` data type that stores timestamp data
|
|
without timezone information. The `timestamp` data type is used by calling
|
|
either the `add_timestamps` or the `timestamps` method.
|
|
|
|
Also, Rails converts the `:datetime` data type to the `timestamp` one.
|
|
|
|
Example:
|
|
|
|
```ruby
|
|
# timestamps
|
|
create_table :users do |t|
|
|
t.timestamps
|
|
end
|
|
|
|
# add_timestamps
|
|
def up
|
|
add_timestamps :users
|
|
end
|
|
|
|
# :datetime
|
|
def up
|
|
add_column :users, :last_sign_in, :datetime
|
|
end
|
|
```
|
|
|
|
Instead of using these methods, one should use the following methods to store
|
|
timestamps with timezones:
|
|
|
|
- `add_timestamps_with_timezone`
|
|
- `timestamps_with_timezone`
|
|
- `datetime_with_timezone`
|
|
|
|
This ensures all timestamps have a time zone specified. This, in turn, means
|
|
existing timestamps don't suddenly use a different timezone when the system's
|
|
timezone changes. It also makes it very clear which timezone was used in the
|
|
first place.
|
|
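The ambiguity of a timestamp stored without a time zone can be shown with plain Ruby: the same wall-clock reading maps to different instants depending on the offset assumed when it is read back. The `instant` helper and the chosen date are hypothetical and serve only as an illustration.

```ruby
# The same wall-clock reading (2021-03-01 12:00:00) interpreted
# under a given UTC offset.
def instant(offset)
  Time.new(2021, 3, 1, 12, 0, 0, offset)
end

# Seconds between the two interpretations of the identical reading.
GAP_SECONDS = instant('+00:00') - instant('+01:00')
```

The two interpretations are an hour apart, which is the kind of silent shift that storing the time zone with the timestamp prevents.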
|
|
## Storing JSON in database
|
|
|
|
Rails 5 natively supports the `JSONB` (binary JSON) column type.
|
|
Example migration adding this column:
|
|
|
|
```ruby
|
|
class AddOptionsToBuildMetadata < ActiveRecord::Migration[5.0]
|
|
def change
|
|
add_column :ci_builds_metadata, :config_options, :jsonb
|
|
end
|
|
end
|
|
```
|
|
|
|
You have to use a serializer to provide a translation layer:
|
|
|
|
```ruby
class BuildMetadata
  serialize :config_options, Serializers::Json # rubocop:disable Cop/ActiveRecordSerialize
end
```
|
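The translation layer can be pictured as an object with `load` and `dump` methods, which is the interface Rails expects from a coder passed to `serialize`. The sketch below is a hypothetical stand-in written with the standard `json` library; `JsonCoderSketch` is not GitLab's `Serializers::Json`, whose actual behavior may differ.

```ruby
require 'json'

# Hypothetical coder for illustration only, not GitLab's Serializers::Json.
module JsonCoderSketch
  # Called before writing: turn the Ruby value into a JSON string.
  def self.dump(value)
    value.nil? ? nil : JSON.generate(value)
  end

  # Called after reading: turn the stored JSON string back into Ruby data.
  def self.load(raw)
    raw.nil? ? nil : JSON.parse(raw)
  end
end

# Made-up option values, used only to show the round trip.
stored = JsonCoderSketch.dump({ 'image' => 'ruby:2.7', 'retry' => 2 })
restored = JsonCoderSketch.load(stored)
```

After the round trip, `restored` is a plain Ruby hash with string keys again.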
|
|
|
When using a `JSONB` column, use the [JsonSchemaValidator](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/validators/json_schema_validator.rb) to keep control of the data being inserted over time.
|
|
|
|
```ruby
class BuildMetadata
  validates :config_options, json_schema: { filename: 'build_metadata_config_option' }
end
```
|
|
|
|
## Encrypted attributes
|
|
|
|
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/227779) in GitLab 14.0.
|
|
|
|
Do not store `attr_encrypted` attributes as `:text` in the database; use
|
|
`:binary` instead. This uses the `bytea` type in PostgreSQL and makes storage more
|
|
efficient:
|
|
|
|
```ruby
class AddSecretToSomething < ActiveRecord::Migration[5.0]
  def change
    add_column :something, :encrypted_secret, :binary
    add_column :something, :encrypted_secret_iv, :binary
  end
end
```
|
|
|
|
When storing encrypted attributes in a binary column, we need to provide the
|
|
`encode: false` and `encode_iv: false` options to `attr_encrypted`:
|
|
|
|
```ruby
class Something < ApplicationRecord
  attr_encrypted :secret,
    mode: :per_attribute_iv,
    key: Settings.attr_encrypted_db_key_base_32,
    algorithm: 'aes-256-gcm',
    encode: false,
    encode_iv: false
end
```
|
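The storage saving comes from skipping the text encoding of the ciphertext. The sketch below assumes the encoded form is Base64, which is an assumption about the default behavior of `attr_encrypted` rather than something stated in this guide, and it uses random bytes as a stand-in for real ciphertext.

```ruby
require 'securerandom'

# Stand-in for ciphertext: 32 random bytes (illustrative, not real encrypted data).
raw_bytes = SecureRandom.random_bytes(32)

# Base64-encode without line breaks, as storing in a text column would require.
encoded = [raw_bytes].pack('m0')

# Decoding returns the original bytes, so only the stored size differs.
decoded = encoded.unpack1('m0')

puts "raw: #{raw_bytes.bytesize} bytes, encoded: #{encoded.bytesize} bytes"
```

Base64 represents every 3 bytes as 4 characters, so the encoded form is roughly a third larger than the raw bytes a `:binary` column can hold directly.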
|
|
|
## Testing
|
|
|
|
See the [Testing Rails migrations](testing_guide/testing_migrations_guide.md) style guide.
|
|
|
|
## Data migration
|
|
|
|
Please prefer Arel and plain SQL over the usual ActiveRecord syntax. When
using plain SQL, you need to quote all input manually with the `quote_string` helper.
|
|
|
|
Example with Arel:
|
|
|
|
```ruby
users = Arel::Table.new(:users)
users.group(users[:user_id]).having(users[:id].count.gt(5))

# update other tables with these results
```
|
|
|
|
Example with plain SQL and `quote_string` helper:
|
|
|
|
```ruby
select_all("SELECT name, COUNT(id) as cnt FROM tags GROUP BY name HAVING COUNT(id) > 1").each do |tag|
  tag_name = quote_string(tag["name"])
  duplicate_ids = select_all("SELECT id FROM tags WHERE name = '#{tag_name}'").map { |row| row["id"] }

  # Keep the first tag as the original and treat the rest as duplicates
  origin_tag_id = duplicate_ids.first
  duplicate_ids.delete origin_tag_id

  execute("UPDATE taggings SET tag_id = #{origin_tag_id} WHERE tag_id IN(#{duplicate_ids.join(",")})")
  execute("DELETE FROM tags WHERE id IN(#{duplicate_ids.join(",")})")
end
```
|
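To illustrate what quoting protects against, the sketch below uses a hypothetical `quote_string_sketch` that doubles single quotes, which is the standard SQL escape inside string literals. It is a simplified stand-in, not the real `quote_string` helper, which may handle additional cases.

```ruby
# Hypothetical stand-in for illustration; use the real `quote_string` in migrations.
def quote_string_sketch(value)
  value.gsub("'", "''")
end

# Made-up tag name containing single quotes.
tag_name = "rock'n'roll"

# Interpolating the raw value ends the string literal early and breaks the query.
unsafe_sql = "SELECT id FROM tags WHERE name = '#{tag_name}'"

# Quoting first keeps the whole value inside one string literal.
safe_sql = "SELECT id FROM tags WHERE name = '#{quote_string_sketch(tag_name)}'"

puts unsafe_sql
puts safe_sql
```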
|
|
|
If you need more complex logic, you can define and use models local to a
|
|
migration. For example:
|
|
|
|
```ruby
class MyMigration < ActiveRecord::Migration[6.0]
  class Project < ActiveRecord::Base
    self.table_name = 'projects'
  end

  def up
    # Reset the column information of all the models that update the database
    # to ensure the Active Record's knowledge of the table structure is current
    Project.reset_column_information

    # ... ...
  end
end
```
|
|
|
|
When doing so, be sure to explicitly set the model's table name, so it's not
derived from the class name or namespace.
|
|
|
|
Be aware of the limitations [when using models in migrations](#using-models-in-migrations-discouraged).
|
|
|
|
### Renaming reserved paths
|
|
|
|
When a new route for projects is introduced, it could conflict with any
|
|
existing records. The path for these records should be renamed, and the
|
|
related data should be moved on disk.
|
|
|
|
Because this has been necessary several times already, helpers are available
for it.

To use them, include `Gitlab::Database::RenameReservedPathsMigration::V1`
in your migration. This provides three methods, each of which accepts one or
more paths that need to be rejected.
|
|
|
|
- **`rename_root_paths`**: Renames the path of all _namespaces_ with the
  given name that don't have a `parent_id`.
- **`rename_child_paths`**: Renames the path of all _namespaces_ with the
  given name that have a `parent_id`.
- **`rename_wildcard_paths`**: Renames the path of all _projects_, and all
  _namespaces_ that have a `project_id`.
|
|
|
|
The `path` column for these rows is renamed to its previous value followed
by an integer. For example, `users` would turn into `users0`.
|
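The renaming rule can be sketched in plain Ruby. The hypothetical `renamed_path` below appends the lowest integer suffix that is not already taken; checking for collisions is an assumption of this sketch, and the real helpers may choose the suffix differently.

```ruby
# Hypothetical helper for illustration; not part of RenameReservedPathsMigration.
# Returns `path` followed by the first integer that yields an unused path.
def renamed_path(path, existing_paths)
  counter = 0
  counter += 1 while existing_paths.include?("#{path}#{counter}")
  "#{path}#{counter}"
end

renamed_path('users', %w[users groups])        # => "users0"
renamed_path('users', %w[users users0 users1]) # => "users2"
```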
|
|
|
## Using models in migrations (discouraged)
|
|
|
|
The use of models in migrations is generally discouraged. As such models are
|
|
[contraindicated for background migrations](background_migrations.md#isolation),
|
|
the model needs to be declared in the migration.
|
|
|
|
If using a model in the migrations, you should first
[clear the column cache](https://api.rubyonrails.org/classes/ActiveRecord/ModelSchema/ClassMethods.html#method-i-reset_column_information)
using `reset_column_information`. This avoids problems where a column that you
are using was altered and cached in a previous migration.

If using a model that leverages single table inheritance (STI), there are [special
considerations](single_table_inheritance.md#in-migrations).
|
|
|
|
### Example: Add a column `my_column` to the users table
|
|
|
|
Do not leave out the `User.reset_column_information` call. It ensures that the old schema is dropped from the cache and that ActiveRecord loads the updated schema information.
|
|
|
|
```ruby
class AddAndSeedMyColumn < ActiveRecord::Migration[6.0]
  class User < ActiveRecord::Base
    self.table_name = 'users'
  end

  def up
    User.count # Any ActiveRecord calls on the model that caches the column information.

    add_column :users, :my_column, :integer, default: 1

    User.reset_column_information # The old schema is dropped from the cache.
    User.find_each do |user|
      user.my_column = 42 if some_condition # ActiveRecord sees the correct schema here.
      user.save!
    end
  end
end
```
|
|
|
|
The underlying table is modified and then accessed via ActiveRecord.
|
|
|
|
`reset_column_information` is also needed when the table was modified in a previous, different migration
and both migrations run in the same `db:migrate` process.
|
|
|
|
This results in the following. Note the inclusion of `my_column`:
|
|
|
|
```shell
== 20200705232821 AddAndSeedMyColumn: migrating ==============================
D, [2020-07-06T00:37:12.483876 #130101] DEBUG -- :    (0.2ms)  BEGIN
D, [2020-07-06T00:37:12.521660 #130101] DEBUG -- :    (0.4ms)  SELECT COUNT(*) FROM "users"
-- add_column(:users, :my_column, :integer, {:default=>1})
D, [2020-07-06T00:37:12.523309 #130101] DEBUG -- :    (0.8ms)  ALTER TABLE "users" ADD "my_column" integer DEFAULT 1
   -> 0.0016s
D, [2020-07-06T00:37:12.650641 #130101] DEBUG -- :   AddAndSeedMyColumn::User Load (0.7ms)  SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1  [["LIMIT", 1000]]
D, [2020-07-18T00:41:26.851769 #459802] DEBUG -- :   AddAndSeedMyColumn::User Update (1.1ms)  UPDATE "users" SET "my_column" = $1, "updated_at" = $2 WHERE "users"."id" = $3  [["my_column", 42], ["updated_at", "2020-07-17 23:41:26.849044"], ["id", 1]]
D, [2020-07-06T00:37:12.653648 #130101] DEBUG -- :   ↳ config/initializers/config_initializers_active_record_locking.rb:13:in `_update_row'
== 20200705232821 AddAndSeedMyColumn: migrated (0.1706s) =====================
```
|
|
|
|
If you skip clearing the schema cache (`User.reset_column_information`), the column is not
|
|
used by ActiveRecord and the intended changes are not made, leading to the result below,
|
|
where `my_column` is missing from the query.
|
|
|
|
```shell
== 20200705232821 AddAndSeedMyColumn: migrating ==============================
D, [2020-07-06T00:37:12.483876 #130101] DEBUG -- :    (0.2ms)  BEGIN
D, [2020-07-06T00:37:12.521660 #130101] DEBUG -- :    (0.4ms)  SELECT COUNT(*) FROM "users"
-- add_column(:users, :my_column, :integer, {:default=>1})
D, [2020-07-06T00:37:12.523309 #130101] DEBUG -- :    (0.8ms)  ALTER TABLE "users" ADD "my_column" integer DEFAULT 1
   -> 0.0016s
D, [2020-07-06T00:37:12.650641 #130101] DEBUG -- :   AddAndSeedMyColumn::User Load (0.7ms)  SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1  [["LIMIT", 1000]]
D, [2020-07-06T00:37:12.653459 #130101] DEBUG -- :   AddAndSeedMyColumn::User Update (0.5ms)  UPDATE "users" SET "updated_at" = $1 WHERE "users"."id" = $2  [["updated_at", "2020-07-05 23:37:12.652297"], ["id", 1]]
D, [2020-07-06T00:37:12.653648 #130101] DEBUG -- :   ↳ config/initializers/config_initializers_active_record_locking.rb:13:in `_update_row'
== 20200705232821 AddAndSeedMyColumn: migrated (0.1706s) =====================
```
|
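The effect can be reproduced with a toy model in plain Ruby. `ToyModel` and `SCHEMA` below are hypothetical and only mimic the caching behavior described above; they are not how ActiveRecord is implemented.

```ruby
# Toy illustration only: mimics a model that caches its column list.
SCHEMA = { 'users' => %w[id updated_at] } # stands in for the database schema

class ToyModel
  # Load the column list once and cache it, like a model's column information.
  def self.columns
    @columns ||= SCHEMA['users'].dup
  end

  # Drop the cached list so the next call reads the schema again.
  def self.reset_column_information
    @columns = nil
  end

  # Keep only the attributes that match a known column.
  def self.writable(attributes)
    attributes.select { |name, _value| columns.include?(name) }
  end
end

ToyModel.columns               # caches the two original columns
SCHEMA['users'] << 'my_column' # the "migration" adds a column

# With the stale cache, `my_column` is silently left out of the write.
stale = ToyModel.writable({ 'my_column' => 42, 'updated_at' => 'now' })

# After resetting, the new column is picked up.
ToyModel.reset_column_information
fresh = ToyModel.writable({ 'my_column' => 42, 'updated_at' => 'now' })
```

This mirrors the two logs above: with the stale cache only `updated_at` is written, and after the reset `my_column` is included.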
|
|
|
## High traffic tables
|
|
|
|
Here's a list of current [high-traffic tables](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml).
|
|
|
|
Determining which tables are high-traffic can be difficult. Self-managed instances might use
different features of GitLab with different usage patterns, so assumptions based
on GitLab.com alone are not sufficient.
|
|
|
|
To identify a high-traffic table for GitLab.com, the following measures are considered.
The metrics linked here are GitLab-internal only:
|
|
|
|
- [Read operations](https://thanos.gitlab.net/graph?g0.range_input=2h&g0.max_source_resolution=0s&g0.expr=topk(500%2C%20sum%20by%20(relname)%20(rate(pg_stat_user_tables_seq_tup_read%7Benvironment%3D%22gprd%22%7D%5B12h%5D)%20%2B%20rate(pg_stat_user_tables_idx_scan%7Benvironment%3D%22gprd%22%7D%5B12h%5D)%20%2B%20rate(pg_stat_user_tables_idx_tup_fetch%7Benvironment%3D%22gprd%22%7D%5B12h%5D)))&g0.tab=1)
|
|
- [Number of records](https://thanos.gitlab.net/graph?g0.range_input=2h&g0.max_source_resolution=0s&g0.expr=topk(500%2C%20max%20by%20(relname)%20(pg_stat_user_tables_n_live_tup%7Benvironment%3D%22gprd%22%7D))&g0.tab=1)
|
|
- [Size](https://thanos.gitlab.net/graph?g0.range_input=2h&g0.max_source_resolution=0s&g0.expr=topk(500%2C%20max%20by%20(relname)%20(pg_total_relation_size_bytes%7Benvironment%3D%22gprd%22%7D))&g0.tab=1) is greater than 10 GB
|
|
|
|
Any table with a high number of read operations compared to the current [high-traffic tables](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L4) might be a good candidate.
|