978 lines
33 KiB
Markdown
978 lines
33 KiB
Markdown
# Migration Style Guide
|
|
|
|
When writing migrations for GitLab, you have to take into account that
|
|
these will be run by hundreds of thousands of organizations of all sizes, some with
|
|
many years of data in their database.
|
|
|
|
In addition, having to take a server offline for an upgrade small or big is a
|
|
big burden for most organizations. For this reason, it is important that your
|
|
migrations are written carefully, can be applied online, and adhere to the style
|
|
guide below.
|
|
|
|
Migrations are **not** allowed to require GitLab installations to be taken
|
|
offline unless _absolutely necessary_.
|
|
|
|
When downtime is necessary the migration has to be approved by:
|
|
|
|
1. The VP of Engineering
|
|
1. A Backend Maintainer
|
|
1. A Database Maintainer
|
|
|
|
An up-to-date list of people holding these titles can be found at
|
|
<https://about.gitlab.com/company/team/>.
|
|
|
|
When writing your migrations, also consider that databases might have stale data
|
|
or inconsistencies and guard for that. Try to make as few assumptions as
|
|
possible about the state of the database.
|
|
|
|
Please don't depend on GitLab-specific code since it can change in future
|
|
versions. If needed copy-paste GitLab code into the migration to make it forward
|
|
compatible.
|
|
|
|
For GitLab.com, please take into consideration that regular migrations (under `db/migrate`)
|
|
are run before [Canary is deployed](https://gitlab.com/gitlab-com/gl-infra/readiness/-/tree/master/library/canary/#configuration-and-deployment),
|
|
and post-deployment migrations (`db/post_migrate`) are run after the deployment to production has finished.
|
|
|
|
## Schema Changes
|
|
|
|
Changes to the schema should be committed to `db/structure.sql`. This
|
|
file is automatically generated by Rails, so you normally should not
|
|
edit this file by hand. If your migration is adding a column to a
|
|
table, that column will be added at the bottom. Please do not reorder
|
|
columns manually for existing tables as this will cause confusing to
|
|
other people using `db/structure.sql` generated by Rails.
|
|
|
|
When your local database in your GDK is diverging from the schema from
|
|
`master` it might be hard to cleanly commit the schema changes to
|
|
Git. In that case you can use the `scripts/regenerate-schema` script to
|
|
regenerate a clean `db/structure.sql` for the migrations you're
|
|
adding. This script will apply all migrations found in `db/migrate`
|
|
or `db/post_migrate`, so if there are any migrations you don't want to
|
|
commit to the schema, rename or remove them. If your branch is not
|
|
targeting `master` you can set the `TARGET` environment variable.
|
|
|
|
```shell
|
|
# Regenerate schema against `master`
|
|
scripts/regenerate-schema
|
|
|
|
# Regenerate schema against `12-9-stable-ee`
|
|
TARGET=12-9-stable-ee scripts/regenerate-schema
|
|
```
|
|
|
|
## What Requires Downtime?
|
|
|
|
The document ["What Requires Downtime?"](what_requires_downtime.md) specifies
|
|
various database operations, such as
|
|
|
|
- [dropping and renaming columns](what_requires_downtime.md#dropping-columns)
|
|
- [changing column constraints and types](what_requires_downtime.md#changing-column-constraints)
|
|
- [adding and dropping indexes, tables, and foreign keys](what_requires_downtime.md#adding-indexes)
|
|
|
|
and whether they require downtime and how to work around that whenever possible.
|
|
|
|
## Downtime Tagging
|
|
|
|
Every migration must specify if it requires downtime or not, and if it should
|
|
require downtime it must also specify a reason for this. This is required even
|
|
if 99% of the migrations won't require downtime as this makes it easier to find
|
|
the migrations that _do_ require downtime.
|
|
|
|
To tag a migration, add the following two constants to the migration class'
|
|
body:
|
|
|
|
- `DOWNTIME`: a boolean that when set to `true` indicates the migration requires
|
|
downtime.
|
|
- `DOWNTIME_REASON`: a String containing the reason for the migration requiring
|
|
downtime. This constant **must** be set when `DOWNTIME` is set to `true`.
|
|
|
|
For example:
|
|
|
|
```ruby
|
|
class MyMigration < ActiveRecord::Migration[6.0]
|
|
DOWNTIME = true
|
|
DOWNTIME_REASON = 'This migration requires downtime because ...'
|
|
|
|
def change
|
|
...
|
|
end
|
|
end
|
|
```
|
|
|
|
It is an error (that is, CI will fail) if the `DOWNTIME` constant is missing
|
|
from a migration class.
|
|
|
|
## Reversibility
|
|
|
|
Your migration **must be** reversible. This is very important, as it should
|
|
be possible to downgrade in case of a vulnerability or bugs.
|
|
|
|
In your migration, add a comment describing how the reversibility of the
|
|
migration was tested.
|
|
|
|
Some migrations cannot be reversed. For example, some data migrations can't be
|
|
reversed because we lose information about the state of the database before the migration.
|
|
You should still create a `down` method with a comment, explaining why
|
|
the changes performed by the `up` method can't be reversed, so that the
|
|
migration itself can be reversed, even if the changes performed during the migration
|
|
can't be reversed:
|
|
|
|
```ruby
|
|
def down
|
|
# no-op
|
|
|
|
# comment explaining why changes performed by `up` cannot be reversed.
|
|
end
|
|
```
|
|
|
|
## Atomicity
|
|
|
|
By default, migrations are single transaction. That is, a transaction is opened
|
|
at the beginning of the migration, and committed after all steps are processed.
|
|
|
|
Running migrations in a single transaction makes sure that if one of the steps fails,
|
|
none of the steps will be executed, leaving the database in valid state.
|
|
Therefore, either:
|
|
|
|
- Put all migrations in one single-transaction migration.
|
|
- If necessary, put most actions in one migration and create a separate migration
|
|
for the steps that cannot be done in a single transaction.
|
|
|
|
For example, if you create an empty table and need to build an index for it,
|
|
it is recommended to use a regular single-transaction migration and the default
|
|
rails schema statement: [`add_index`](https://api.rubyonrails.org/v5.2/classes/ActiveRecord/ConnectionAdapters/SchemaStatements.html#method-i-add_index).
|
|
This is a blocking operation, but it won't cause problems because the table is not yet used,
|
|
and therefore it does not have any records yet.
|
|
|
|
## Heavy operations in a single transaction
|
|
|
|
When using a single-transaction migration, a transaction will hold on a database connection
|
|
for the duration of the migration, so you must make sure the actions in the migration
|
|
do not take too much time: In general, queries executed in a migration need to fit comfortably
|
|
within `15s` on GitLab.com.
|
|
|
|
In case you need to insert, update, or delete a significant amount of data, you:
|
|
|
|
- Must disable the single transaction with `disable_ddl_transaction!`.
|
|
- Should consider doing it in a [Background Migration](background_migrations.md).
|
|
|
|
## Retry mechanism when acquiring database locks
|
|
|
|
When changing the database schema, we use helper methods to invoke DDL (Data Definition
|
|
Language) statements. In some cases, these DDL statements require a specific database lock.
|
|
|
|
Example:
|
|
|
|
```ruby
|
|
def change
|
|
remove_column :users, :full_name, :string
|
|
end
|
|
```
|
|
|
|
Executing this migration requires an exclusive lock on the `users` table. When the table
|
|
is concurrently accessed and modified by other processes, acquiring the lock may take
|
|
a while. The lock request is waiting in a queue and it may also block other queries
|
|
on the `users` table once it has been enqueued.
|
|
|
|
More information about PostgresSQL locks: [Explicit Locking](https://www.postgresql.org/docs/current/explicit-locking.html)
|
|
|
|
For stability reasons, GitLab.com has a specific [`statement_timeout`](../user/gitlab_com/index.md#postgresql)
|
|
set. When the migration is invoked, any database query will have
|
|
a fixed time to execute. In a worst-case scenario, the request will sit in the
|
|
lock queue, blocking other queries for the duration of the configured statement timeout,
|
|
then failing with `canceling statement due to statement timeout` error.
|
|
|
|
This problem could cause failed application upgrade processes and even application
|
|
stability issues, since the table may be inaccessible for a short period of time.
|
|
|
|
To increase the reliability and stability of database migrations, the GitLab codebase
|
|
offers a helper method to retry the operations with different `lock_timeout` settings
|
|
and wait time between the attempts. Multiple smaller attempts to acquire the necessary
|
|
lock allow the database to process other statements.
|
|
|
|
### Examples
|
|
|
|
**Removing a column:**
|
|
|
|
```ruby
|
|
include Gitlab::Database::MigrationHelpers
|
|
|
|
def up
|
|
with_lock_retries do
|
|
remove_column :users, :full_name
|
|
end
|
|
end
|
|
|
|
def down
|
|
with_lock_retries do
|
|
add_column :users, :full_name, :string
|
|
end
|
|
end
|
|
```
|
|
|
|
**Removing a foreign key:**
|
|
|
|
```ruby
|
|
include Gitlab::Database::MigrationHelpers
|
|
|
|
def up
|
|
with_lock_retries do
|
|
remove_foreign_key :issues, :projects
|
|
end
|
|
end
|
|
|
|
def down
|
|
with_lock_retries do
|
|
add_foreign_key :issues, :projects
|
|
end
|
|
end
|
|
```
|
|
|
|
**Changing default value for a column:**
|
|
|
|
```ruby
|
|
include Gitlab::Database::MigrationHelpers
|
|
|
|
def up
|
|
with_lock_retries do
|
|
change_column_default :merge_requests, :lock_version, from: nil, to: 0
|
|
end
|
|
end
|
|
|
|
def down
|
|
with_lock_retries do
|
|
change_column_default :merge_requests, :lock_version, from: 0, to: nil
|
|
end
|
|
end
|
|
```
|
|
|
|
**Creating a new table with a foreign key:**
|
|
|
|
We can simply wrap the `create_table` method with `with_lock_retries`:
|
|
|
|
```ruby
|
|
def up
|
|
with_lock_retries do
|
|
create_table :issues do |t|
|
|
t.references :project, index: true, null: false, foreign_key: { on_delete: :cascade }
|
|
t.string :title, limit: 255
|
|
end
|
|
end
|
|
end
|
|
|
|
def down
|
|
drop_table :issues
|
|
end
|
|
```
|
|
|
|
**Creating a new table when we have two foreign keys:**
|
|
|
|
For this, we'll need three migrations:
|
|
|
|
1. Creating the table without foreign keys (with the indices).
|
|
1. Add foreign key to the first table.
|
|
1. Add foreign key to the second table.
|
|
|
|
Creating the table:
|
|
|
|
```ruby
|
|
def up
|
|
create_table :imports do |t|
|
|
t.bigint :project_id, null: false
|
|
t.bigint :user_id, null: false
|
|
t.string :jid, limit: 255
|
|
end
|
|
|
|
add_index :imports, :project_id
|
|
add_index :imports, :user_id
|
|
end
|
|
|
|
def down
|
|
drop_table :imports
|
|
end
|
|
```
|
|
|
|
Adding foreign key to `projects`:
|
|
|
|
```ruby
|
|
include Gitlab::Database::MigrationHelpers
|
|
|
|
def up
|
|
with_lock_retries do
|
|
add_foreign_key :imports, :projects, column: :project_id, on_delete: :cascade
|
|
end
|
|
end
|
|
|
|
def down
|
|
with_lock_retries do
|
|
remove_foreign_key :imports, column: :project_id
|
|
end
|
|
end
|
|
```
|
|
|
|
Adding foreign key to `users`:
|
|
|
|
```ruby
|
|
include Gitlab::Database::MigrationHelpers
|
|
|
|
def up
|
|
with_lock_retries do
|
|
add_foreign_key :imports, :users, column: :user_id, on_delete: :cascade
|
|
end
|
|
end
|
|
|
|
def down
|
|
with_lock_retries do
|
|
remove_foreign_key :imports, column: :user_id
|
|
end
|
|
end
|
|
```
|
|
|
|
**Usage with `disable_ddl_transaction!`**
|
|
|
|
Generally the `with_lock_retries` helper should work with `disabled_ddl_transaction!`. A custom RuboCop rule ensures that only allowed methods can be placed within the lock retries block.
|
|
|
|
```ruby
|
|
disable_ddl_transaction!
|
|
|
|
def up
|
|
with_lock_retries do
|
|
add_column :users, :name, :text
|
|
end
|
|
|
|
add_text_limit :users, :name, 255 # Includes constraint validation (full table scan)
|
|
end
|
|
```
|
|
|
|
The RuboCop rule generally allows standard Rails migration methods, listed below. This example will cause a Rubocop offense:
|
|
|
|
```ruby
|
|
disabled_ddl_transaction!
|
|
|
|
def up
|
|
with_lock_retries do
|
|
add_concurrent_index :users, :name
|
|
end
|
|
end
|
|
```
|
|
|
|
### When to use the helper method
|
|
|
|
The `with_lock_retries` helper method can be used when you normally use
|
|
standard Rails migration helper methods. Calling more than one migration
|
|
helper is not a problem if they're executed on the same table.
|
|
|
|
Using the `with_lock_retries` helper method is advised when a database
|
|
migration involves one of the high-traffic tables:
|
|
|
|
- `users`
|
|
- `projects`
|
|
- `namespaces`
|
|
- `issues`
|
|
- `merge_requests`
|
|
- `ci_pipelines`
|
|
- `ci_builds`
|
|
- `notes`
|
|
|
|
Example changes:
|
|
|
|
- `add_foreign_key` / `remove_foreign_key`
|
|
- `add_column` / `remove_column`
|
|
- `change_column_default`
|
|
- `create_table` / `drop_table`
|
|
|
|
NOTE: **Note:**
|
|
`with_lock_retries` method **cannot** be used within the `change` method, you must manually define the `up` and `down` methods to make the migration reversible.
|
|
|
|
### How the helper method works
|
|
|
|
1. Iterate 50 times.
|
|
1. For each iteration, set a pre-configured `lock_timeout`.
|
|
1. Try to execute the given block. (`remove_column`).
|
|
1. If `LockWaitTimeout` error is raised, sleep for the pre-configured `sleep_time`
|
|
and retry the block.
|
|
1. If no error is raised, the current iteration has successfully executed the block.
|
|
|
|
For more information check the [`Gitlab::Database::WithLockRetries`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/with_lock_retries.rb) class. The `with_lock_retries` helper method is implemented in the [`Gitlab::Database::MigrationHelpers`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/database/migration_helpers.rb) module.
|
|
|
|
In a worst-case scenario, the method:
|
|
|
|
- Executes the block for a maximum of 50 times over 40 minutes.
|
|
- Most of the time is spent in a pre-configured sleep period after each iteration.
|
|
- After the 50th retry, the block will be executed without `lock_timeout`, just
|
|
like a standard migration invocation.
|
|
- If a lock cannot be acquired, the migration will fail with `statement timeout` error.
|
|
|
|
The migration might fail if there is a very long running transaction (40+ minutes)
|
|
accessing the `users` table.
|
|
|
|
## Multi-Threading
|
|
|
|
Sometimes a migration might need to use multiple Ruby threads to speed up a
|
|
migration. For this to work your migration needs to include the module
|
|
`Gitlab::Database::MultiThreadedMigration`:
|
|
|
|
```ruby
|
|
class MyMigration < ActiveRecord::Migration[6.0]
|
|
include Gitlab::Database::MigrationHelpers
|
|
include Gitlab::Database::MultiThreadedMigration
|
|
end
|
|
```
|
|
|
|
You can then use the method `with_multiple_threads` to perform work in separate
|
|
threads. For example:
|
|
|
|
```ruby
|
|
class MyMigration < ActiveRecord::Migration[6.0]
|
|
include Gitlab::Database::MigrationHelpers
|
|
include Gitlab::Database::MultiThreadedMigration
|
|
|
|
def up
|
|
with_multiple_threads(4) do
|
|
disable_statement_timeout
|
|
|
|
# ...
|
|
end
|
|
end
|
|
end
|
|
```
|
|
|
|
Here the call to `disable_statement_timeout` will use the connection local to
|
|
the `with_multiple_threads` block, instead of re-using the global connection
|
|
pool. This ensures each thread has its own connection object, and won't time
|
|
out when trying to obtain one.
|
|
|
|
NOTE: **Note:**
|
|
PostgreSQL has a maximum amount of connections that it allows. This
|
|
limit can vary from installation to installation. As a result, it's recommended
|
|
you do not use more than 32 threads in a single migration. Usually, 4-8 threads
|
|
should be more than enough.
|
|
|
|
## Removing indexes
|
|
|
|
If the table is not empty when removing an index, make sure to use the method
|
|
`remove_concurrent_index` instead of the regular `remove_index` method.
|
|
The `remove_concurrent_index` method drops indexes concurrently, so no locking is required,
|
|
and there is no need for downtime. To use this method, you must disable single-transaction mode
|
|
by calling the method `disable_ddl_transaction!` in the body of your migration
|
|
class like so:
|
|
|
|
```ruby
|
|
class MyMigration < ActiveRecord::Migration[6.0]
|
|
include Gitlab::Database::MigrationHelpers
|
|
disable_ddl_transaction!
|
|
|
|
def up
|
|
remove_concurrent_index :table_name, :column_name
|
|
end
|
|
end
|
|
```
|
|
|
|
Note that it is not necessary to check if the index exists prior to
|
|
removing it.
|
|
|
|
For a small table (such as an empty one or one with less than `1,000` records),
|
|
it is recommended to use `remove_index` in a single-transaction migration,
|
|
combining it with other operations that don't require `disable_ddl_transaction!`.
|
|
|
|
## Adding indexes
|
|
|
|
Before adding an index, consider if this one is necessary. There are situations in which an index
|
|
might not be required, like:
|
|
|
|
- The table is small (less than `1,000` records) and it's not expected to exponentially grow in size.
|
|
- Any existing indexes filter out enough rows.
|
|
- The reduction in query timings after the index is added is not significant.
|
|
|
|
Additionally, wide indexes are not required to match all filter criteria of queries, we just need
|
|
to cover enough columns so that the index lookup has a small enough selectivity. Please review our
|
|
[Adding Database indexes](adding_database_indexes.md) guide for more details.
|
|
|
|
When adding an index to a non-empty table make sure to use the method
|
|
`add_concurrent_index` instead of the regular `add_index` method.
|
|
The `add_concurrent_index` method automatically creates concurrent indexes
|
|
when using PostgreSQL, removing the need for downtime.
|
|
|
|
To use this method, you must disable single-transactions mode
|
|
by calling the method `disable_ddl_transaction!` in the body of your migration
|
|
class like so:
|
|
|
|
```ruby
|
|
class MyMigration < ActiveRecord::Migration[6.0]
|
|
include Gitlab::Database::MigrationHelpers
|
|
|
|
DOWNTIME = false
|
|
|
|
disable_ddl_transaction!
|
|
|
|
def up
|
|
add_concurrent_index :table, :column
|
|
end
|
|
|
|
def down
|
|
remove_concurrent_index :table, :column
|
|
end
|
|
end
|
|
```
|
|
|
|
If you need to add a unique index, please keep in mind there is the possibility
|
|
of existing duplicates being present in the database. This means that should
|
|
always _first_ add a migration that removes any duplicates, before adding the
|
|
unique index.
|
|
|
|
For a small table (such as an empty one or one with less than `1,000` records),
|
|
it is recommended to use `add_index` in a single-transaction migration, combining it with other
|
|
operations that don't require `disable_ddl_transaction!`.
|
|
|
|
## Adding foreign-key constraints
|
|
|
|
When adding a foreign-key constraint to either an existing or a new column also
|
|
remember to add an index on the column.
|
|
|
|
This is **required** for all foreign-keys, e.g., to support efficient cascading
|
|
deleting: when a lot of rows in a table get deleted, the referenced records need
|
|
to be deleted too. The database has to look for corresponding records in the
|
|
referenced table. Without an index, this will result in a sequential scan on the
|
|
table, which can take a long time.
|
|
|
|
Here's an example where we add a new column with a foreign key
|
|
constraint. Note it includes `index: true` to create an index for it.
|
|
|
|
```ruby
|
|
class Migration < ActiveRecord::Migration[6.0]
|
|
|
|
def change
|
|
add_reference :model, :other_model, index: true, foreign_key: { on_delete: :cascade }
|
|
end
|
|
end
|
|
```
|
|
|
|
When adding a foreign-key constraint to an existing column in a non-empty table,
|
|
we have to employ `add_concurrent_foreign_key` and `add_concurrent_index`
|
|
instead of `add_reference`.
|
|
|
|
For an empty table (such as a fresh one), it is recommended to use
|
|
`add_reference` in a single-transaction migration, combining it with other
|
|
operations that don't require `disable_ddl_transaction!`.
|
|
|
|
You can read more about adding [foreign key constraints to an existing column](database/add_foreign_key_to_existing_column.md).
|
|
|
|
## `NOT NULL` constraints
|
|
|
|
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/38358) in GitLab 13.0.
|
|
|
|
See the style guide on [`NOT NULL` constraints](database/not_null_constraints.md) for more information.
|
|
|
|
## Adding Columns With Default Values
|
|
|
|
With PostgreSQL 11 being the minimum version since GitLab 13.0, adding columns with default values has become much easier and
|
|
the standard `add_column` helper should be used in all cases.
|
|
|
|
Before PostgreSQL 11, adding a column with a default was problematic as it would
|
|
have caused a full table rewrite. The corresponding helper `add_column_with_default`
|
|
has been deprecated and will be removed in a later release.
|
|
|
|
NOTE: **Note:**
|
|
If a backport adding a column with a default value is needed for %12.9 or earlier versions,
|
|
it should use `add_column_with_default` helper. If a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3)
|
|
is involved, backporting to %12.9 is contraindicated.
|
|
|
|
## Changing the column default
|
|
|
|
One might think that changing a default column with `change_column_default` is an
|
|
expensive and disruptive operation for larger tables, but in reality it's not.
|
|
|
|
Take the following migration as an example:
|
|
|
|
```ruby
|
|
class DefaultRequestAccessGroups < ActiveRecord::Migration[5.2]
|
|
DOWNTIME = false
|
|
|
|
def change
|
|
change_column_default(:namespaces, :request_access_enabled, from: false, to: true)
|
|
end
|
|
end
|
|
```
|
|
|
|
Migration above changes the default column value of one of our largest
|
|
tables: `namespaces`. This can be translated to:
|
|
|
|
```sql
|
|
ALTER TABLE namespaces
|
|
ALTER COLUMN request_access_enabled
|
|
DEFAULT false
|
|
```
|
|
|
|
In this particular case, the default value exists and we're just changing the metadata for
|
|
`request_access_enabled` column, which does not imply a rewrite of all the existing records
|
|
in the `namespaces` table. Only when creating a new column with a default, all the records are going be rewritten.
|
|
|
|
NOTE: **Note:**
|
|
A faster [ALTER TABLE ADD COLUMN with a non-null default](https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-fast-alter-table-add-column-with-a-non-null-default/)
|
|
was introduced on PostgresSQL 11.0, removing the need of rewriting the table when a new column with a default value is added.
|
|
|
|
For the reasons mentioned above, it's safe to use `change_column_default` in a single-transaction migration
|
|
without requiring `disable_ddl_transaction!`.
|
|
|
|
## Updating an existing column
|
|
|
|
To update an existing column to a particular value, you can use
|
|
`update_column_in_batches`. This will split the updates into batches, so we
|
|
don't update too many rows at in a single statement.
|
|
|
|
This updates the column `foo` in the `projects` table to 10, where `some_column`
|
|
is `'hello'`:
|
|
|
|
```ruby
|
|
update_column_in_batches(:projects, :foo, 10) do |table, query|
|
|
query.where(table[:some_column].eq('hello'))
|
|
end
|
|
```
|
|
|
|
If a computed update is needed, the value can be wrapped in `Arel.sql`, so Arel
|
|
treats it as an SQL literal. It's also a required deprecation for [Rails 6](https://gitlab.com/gitlab-org/gitlab/-/issues/28497).
|
|
|
|
The below example is the same as the one above, but
|
|
the value is set to the product of the `bar` and `baz` columns:
|
|
|
|
```ruby
|
|
update_value = Arel.sql('bar * baz')
|
|
|
|
update_column_in_batches(:projects, :foo, update_value) do |table, query|
|
|
query.where(table[:some_column].eq('hello'))
|
|
end
|
|
```
|
|
|
|
Like `add_column_with_default`, there is a RuboCop cop to detect usage of this
|
|
on large tables. In the case of `update_column_in_batches`, it may be acceptable
|
|
to run on a large table, as long as it is only updating a small subset of the
|
|
rows in the table, but do not ignore that without validating on the GitLab.com
|
|
staging environment - or asking someone else to do so for you - beforehand.
|
|
|
|
## Dropping a database table
|
|
|
|
Dropping a database table is uncommon, and the `drop_table` method
|
|
provided by Rails is generally considered safe. Before dropping the table,
|
|
please consider the following:
|
|
|
|
If your table has foreign keys on a high-traffic table (like `projects`), then
|
|
the `DROP TABLE` statement might fail with **statement timeout** error. Determining
|
|
what tables are high traffic can be difficult. Self-managed instances might
|
|
use different features of GitLab with different usage patterns, thus making
|
|
assumptions based on GitLab.com is not enough.
|
|
|
|
Table **has no records** (feature was never in use) and **no foreign
|
|
keys**:
|
|
|
|
- Simply use the `drop_table` method in your migration.
|
|
|
|
```ruby
|
|
def change
|
|
drop_table :my_table
|
|
end
|
|
```
|
|
|
|
Table **has records** but **no foreign keys**:
|
|
|
|
- First release: Remove the application code related to the table, such as models,
|
|
controllers and services.
|
|
- Second release: Use the `drop_table` method in your migration.
|
|
|
|
```ruby
|
|
def up
|
|
drop_table :my_table
|
|
end
|
|
|
|
def down
|
|
# create_table ...
|
|
end
|
|
```
|
|
|
|
Table **has foreign keys**:
|
|
|
|
- First release: Remove the application code related to the table, such as models,
|
|
controllers, and services.
|
|
- Second release: Remove the foreign keys using the `with_lock_retries`
|
|
helper method. Use `drop_table` in another migration file.
|
|
|
|
**Migrations for the second release:**
|
|
|
|
Removing the foreign key on the `projects` table:
|
|
|
|
```ruby
|
|
# first migration file
|
|
|
|
def up
|
|
with_lock_retries do
|
|
remove_foreign_key :my_table, :projects
|
|
end
|
|
end
|
|
|
|
def down
|
|
with_lock_retries do
|
|
add_foreign_key :my_table, :projects
|
|
end
|
|
end
|
|
```
|
|
|
|
Dropping the table:
|
|
|
|
```ruby
|
|
# second migration file
|
|
|
|
def up
|
|
drop_table :my_table
|
|
end
|
|
|
|
def down
|
|
# create_table ...
|
|
end
|
|
```
|
|
|
|
## Integer column type
|
|
|
|
By default, an integer column can hold up to a 4-byte (32-bit) number. That is
|
|
a max value of 2,147,483,647. Be aware of this when creating a column that will
|
|
hold file sizes in byte units. If you are tracking file size in bytes, this
|
|
restricts the maximum file size to just over 2GB.
|
|
|
|
To allow an integer column to hold up to an 8-byte (64-bit) number, explicitly
|
|
set the limit to 8-bytes. This will allow the column to hold a value up to
|
|
`9,223,372,036,854,775,807`.
|
|
|
|
Rails migration example:
|
|
|
|
```ruby
|
|
add_column(:projects, :foo, :integer, default: 10, limit: 8)
|
|
```
|
|
|
|
## Strings and the Text data type
|
|
|
|
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/30453) in GitLab 13.0.
|
|
|
|
See the [text data type](database/strings_and_the_text_data_type.md) style guide for more information.
|
|
|
|
## Timestamp column type
|
|
|
|
By default, Rails uses the `timestamp` data type that stores timestamp data
|
|
without timezone information. The `timestamp` data type is used by calling
|
|
either the `add_timestamps` or the `timestamps` method.
|
|
|
|
Also, Rails converts the `:datetime` data type to the `timestamp` one.
|
|
|
|
Example:
|
|
|
|
```ruby
|
|
# timestamps
|
|
create_table :users do |t|
|
|
t.timestamps
|
|
end
|
|
|
|
# add_timestamps
|
|
def up
|
|
add_timestamps :users
|
|
end
|
|
|
|
# :datetime
|
|
def up
|
|
add_column :users, :last_sign_in, :datetime
|
|
end
|
|
```
|
|
|
|
Instead of using these methods, one should use the following methods to store
|
|
timestamps with timezones:
|
|
|
|
- `add_timestamps_with_timezone`
|
|
- `timestamps_with_timezone`
|
|
- `datetime_with_timezone`
|
|
|
|
This ensures all timestamps have a time zone specified. This, in turn, means
|
|
existing timestamps won't suddenly use a different timezone when the system's
|
|
timezone changes. It also makes it very clear which timezone was used in the
|
|
first place.
|
|
|
|
## Storing JSON in database
|
|
|
|
The Rails 5 natively supports `JSONB` (binary JSON) column type.
|
|
Example migration adding this column:
|
|
|
|
```ruby
|
|
class AddOptionsToBuildMetadata < ActiveRecord::Migration[5.0]
|
|
DOWNTIME = false
|
|
|
|
def change
|
|
add_column :ci_builds_metadata, :config_options, :jsonb
|
|
end
|
|
end
|
|
```
|
|
|
|
You have to use a serializer to provide a translation layer:
|
|
|
|
```ruby
|
|
class BuildMetadata
|
|
serialize :config_options, Serializers::JSON # rubocop:disable Cop/ActiveRecordSerialize
|
|
end
|
|
```
|
|
|
|
When using a `JSONB` column, use the [JsonSchemaValidator](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/validators/json_schema_validator.rb) to keep control of the data being inserted over time.
|
|
|
|
```ruby
|
|
class BuildMetadata
|
|
validates :config_options, json_schema: { filename: 'build_metadata_config_option' }
|
|
end
|
|
```
|
|
|
|
## Testing
|
|
|
|
See the [Testing Rails migrations](testing_guide/testing_migrations_guide.md) style guide.
|
|
|
|
## Data migration
|
|
|
|
Please prefer Arel and plain SQL over usual ActiveRecord syntax. In case of
|
|
using plain SQL, you need to quote all input manually with `quote_string` helper.
|
|
|
|
Example with Arel:
|
|
|
|
```ruby
|
|
users = Arel::Table.new(:users)
|
|
users.group(users[:user_id]).having(users[:id].count.gt(5))
|
|
|
|
#update other tables with these results
|
|
```
|
|
|
|
Example with plain SQL and `quote_string` helper:
|
|
|
|
```ruby
|
|
select_all("SELECT name, COUNT(id) as cnt FROM tags GROUP BY name HAVING COUNT(id) > 1").each do |tag|
|
|
tag_name = quote_string(tag["name"])
|
|
duplicate_ids = select_all("SELECT id FROM tags WHERE name = '#{tag_name}'").map{|tag| tag["id"]}
|
|
origin_tag_id = duplicate_ids.first
|
|
duplicate_ids.delete origin_tag_id
|
|
|
|
execute("UPDATE taggings SET tag_id = #{origin_tag_id} WHERE tag_id IN(#{duplicate_ids.join(",")})")
|
|
execute("DELETE FROM tags WHERE id IN(#{duplicate_ids.join(",")})")
|
|
end
|
|
```
|
|
|
|
If you need more complex logic, you can define and use models local to a
|
|
migration. For example:
|
|
|
|
```ruby
|
|
class MyMigration < ActiveRecord::Migration[6.0]
|
|
class Project < ActiveRecord::Base
|
|
self.table_name = 'projects'
|
|
end
|
|
|
|
def up
|
|
# Reset the column information of all the models that update the database
|
|
# to ensure the Active Record's knowledge of the table structure is current
|
|
Project.reset_column_information
|
|
|
|
# ... ...
|
|
end
|
|
end
|
|
```
|
|
|
|
When doing so be sure to explicitly set the model's table name, so it's not
|
|
derived from the class name or namespace.
|
|
|
|
Be aware of the limitations [when using models in migrations](#using-models-in-migrations-discouraged).
|
|
|
|
### Renaming reserved paths
|
|
|
|
When a new route for projects is introduced, it could conflict with any
|
|
existing records. The path for these records should be renamed, and the
|
|
related data should be moved on disk.
|
|
|
|
Since we had to do this a few times already, there are now some helpers to help
|
|
with this.
|
|
|
|
To use this you can include `Gitlab::Database::RenameReservedPathsMigration::V1`
|
|
in your migration. This will provide 3 methods which you can pass one or more
|
|
paths that need to be rejected.
|
|
|
|
**`rename_root_paths`**: This will rename the path of all _namespaces_ with the
|
|
given name that don't have a `parent_id`.
|
|
|
|
**`rename_child_paths`**: This will rename the path of all _namespaces_ with the
|
|
given name that have a `parent_id`.
|
|
|
|
**`rename_wildcard_paths`**: This will rename the path of all _projects_, and all
|
|
_namespaces_ that have a `project_id`.
|
|
|
|
The `path` column for these rows will be renamed to their previous value followed
|
|
by an integer. For example: `users` would turn into `users0`
|
|
|
|
## Using models in migrations (discouraged)
|
|
|
|
The use of models in migrations is generally discouraged. As such models are
|
|
[contraindicated for background migrations](background_migrations.md#isolation),
|
|
the model needs to be declared in the migration.
|
|
|
|
If using a model in the migrations, you should first
|
|
[clear the column cache](https://api.rubyonrails.org/classes/ActiveRecord/ModelSchema/ClassMethods.html#method-i-reset_column_information)
|
|
using `reset_column_information`.
|
|
|
|
This avoids problems where a column that you are using was altered and cached
|
|
in a previous migration.
|
|
|
|
### Example: Add a column `my_column` to the users table
|
|
|
|
NOTE: **Note:**
|
|
It is important not to leave out the `User.reset_column_information` command, in order to ensure that the old schema is dropped from the cache and ActiveRecord loads the updated schema information.
|
|
|
|
```ruby
|
|
class AddAndSeedMyColumn < ActiveRecord::Migration[6.0]
|
|
class User < ActiveRecord::Base
|
|
self.table_name = 'users'
|
|
end
|
|
|
|
def up
|
|
User.count # Any ActiveRecord calls on the model that caches the column information.
|
|
|
|
add_column :users, :my_column, :integer, default: 1
|
|
|
|
User.reset_column_information # The old schema is dropped from the cache.
|
|
User.find_each do |user|
|
|
user.my_column = 42 if some_condition # ActiveRecord sees the correct schema here.
|
|
user.save!
|
|
end
|
|
end
|
|
end
|
|
```
|
|
|
|
The underlying table is modified and then accessed via ActiveRecord.
|
|
|
|
Note that this also needs to be used if the table is modified in a previous, different migration,
|
|
if both migrations are run in the same `db:migrate` process.
|
|
|
|
This results in the following. Note the inclusion of `my_column`:
|
|
|
|
```shell
|
|
== 20200705232821 AddAndSeedMyColumn: migrating ==============================
|
|
D, [2020-07-06T00:37:12.483876 #130101] DEBUG -- : (0.2ms) BEGIN
|
|
D, [2020-07-06T00:37:12.521660 #130101] DEBUG -- : (0.4ms) SELECT COUNT(*) FROM "user"
|
|
-- add_column(:users, :my_column, :integer, {:default=>1})
|
|
D, [2020-07-06T00:37:12.523309 #130101] DEBUG -- : (0.8ms) ALTER TABLE "users" ADD "my_column" integer DEFAULT 1
|
|
-> 0.0016s
|
|
D, [2020-07-06T00:37:12.650641 #130101] DEBUG -- : AddAndSeedMyColumn::User Load (0.7ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1 [["LIMIT", 1000]]
|
|
D, [2020-07-18T00:41:26.851769 #459802] DEBUG -- : AddAndSeedMyColumn::User Update (1.1ms) UPDATE "users" SET "my_column" = $1, "updated_at" = $2 WHERE "users"."id" = $3 [["my_column", 42], ["updated_at", "2020-07-17 23:41:26.849044"], ["id", 1]]
|
|
D, [2020-07-06T00:37:12.653648 #130101] DEBUG -- : ↳ config/initializers/config_initializers_active_record_locking.rb:13:in `_update_row'
|
|
== 20200705232821 AddAndSeedMyColumn: migrated (0.1706s) =====================
|
|
```
|
|
|
|
If you skip clearing the schema cache (`User.reset_column_information`), the column is not
|
|
used by ActiveRecord and the intended changes are not made, leading to the result below,
|
|
where `my_column` is missing from the query.
|
|
|
|
```shell
|
|
== 20200705232821 AddAndSeedMyColumn: migrating ==============================
|
|
D, [2020-07-06T00:37:12.483876 #130101] DEBUG -- : (0.2ms) BEGIN
|
|
D, [2020-07-06T00:37:12.521660 #130101] DEBUG -- : (0.4ms) SELECT COUNT(*) FROM "user"
|
|
-- add_column(:users, :my_column, :integer, {:default=>1})
|
|
D, [2020-07-06T00:37:12.523309 #130101] DEBUG -- : (0.8ms) ALTER TABLE "users" ADD "my_column" integer DEFAULT 1
|
|
-> 0.0016s
|
|
D, [2020-07-06T00:37:12.650641 #130101] DEBUG -- : AddAndSeedMyColumn::User Load (0.7ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT $1 [["LIMIT", 1000]]
|
|
D, [2020-07-06T00:37:12.653459 #130101] DEBUG -- : AddAndSeedMyColumn::User Update (0.5ms) UPDATE "users" SET "updated_at" = $1 WHERE "users"."id" = $2 [["updated_at", "2020-07-05 23:37:12.652297"], ["id", 1]]
|
|
D, [2020-07-06T00:37:12.653648 #130101] DEBUG -- : ↳ config/initializers/config_initializers_active_record_locking.rb:13:in `_update_row'
|
|
== 20200705232821 AddAndSeedMyColumn: migrated (0.1706s) =====================
|
|
```
|