---
stage: Data Stores
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# `NOT NULL` constraints

> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/38358) in GitLab 13.0.

All attributes that should not have `NULL` as a value should be defined as `NOT NULL`
columns in the database.

Depending on the application logic, `NOT NULL` columns should either have a `presence: true`
validation defined in their model or have a default value as part of their database definition.
As an example, the latter can be true for boolean attributes that should always have a non-`NULL`
value, but have a well-defined default value that the application does not need to enforce each
time (for example, `active=true`).

## Create a new table with `NOT NULL` columns

When adding a new table, all `NOT NULL` columns should be defined as such directly inside `create_table`.

For example, consider a migration that creates a table with two `NOT NULL` columns,
`db/migrate/20200401000001_create_db_guides.rb`:

```ruby
class CreateDbGuides < Gitlab::Database::Migration[1.0]
  def change
    create_table :db_guides do |t|
      t.bigint :stars, default: 0, null: false
      t.bigint :guide, null: false
    end
  end
end
```

## Add a `NOT NULL` column to an existing table

With PostgreSQL 11 being the minimum version in GitLab 13.0 and later, adding columns with `NULL` and/or
default values has become much easier, and the standard `add_column` helper should be used in all cases.

For example, consider a migration that adds a new `NOT NULL` column `active` to table `db_guides`,
`db/migrate/20200501000001_add_active_to_db_guides.rb`:

```ruby
class AddActiveToDbGuides < Gitlab::Database::Migration[1.0]
  def change
    add_column :db_guides, :active, :boolean, default: true, null: false
  end
end
```

## Add a `NOT NULL` constraint to an existing column

Adding `NOT NULL` to existing database columns requires multiple steps split into at least two
different releases:

1. Release `N.M` (current release)

   - Ensure the constraint is enforced at the application level (that is, add a model validation).
   - Add a post-deployment migration to add the `NOT NULL` constraint with `validate: false`.
   - Add a post-deployment migration to fix the existing records.

     NOTE:
     Depending on the size of the table, a background migration for cleanup could be required in the next release.
     See the [`NOT NULL` constraints on large tables](not_null_constraints.md#not-null-constraints-on-large-tables) section for more information.

   - Create an issue for the next milestone to validate the `NOT NULL` constraint.

1. Release `N.M+1` (next release)

   - Validate the `NOT NULL` constraint using a post-deployment migration.

### Example

Considering a given release milestone, such as 13.0, a model validation has been added into `epic.rb`
to require a description:

```ruby
class Epic < ApplicationRecord
  validates :description, presence: true
end
```

The same constraint should be added at the database level for consistency purposes.
We only want to enforce the `NOT NULL` constraint without setting a default, as we have decided
that all epics should have a user-generated description.

After checking our production database, we know that there are `epics` with `NULL` descriptions,
so we cannot add and validate the constraint in one step.

NOTE:
Even if we did not have any epic with a `NULL` description, another instance of GitLab could have
such records, so we would follow the same process either way.

#### Prevent new invalid records (current release)

We first add the `NOT NULL` constraint with a `NOT VALID` parameter, which enforces consistency
when new records are inserted or current records are updated.

In the example above, the existing epics with a `NULL` description are not affected and you are
still able to update records in the `epics` table. However, when you try to update or insert
an epic without providing a description, the constraint causes a database error.
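
The behavior described above can be pictured with a small in-memory model. This is only a sketch: `ToyTable` and `ConstraintViolation` are hypothetical names invented for illustration, and the class imitates the semantics of an unvalidated constraint in plain Ruby rather than calling PostgreSQL or any GitLab helper.

```ruby
# Toy model of a constraint added as NOT VALID.
# Hypothetical class for illustration only; not GitLab or PostgreSQL code.
class ToyTable
  class ConstraintViolation < StandardError; end

  attr_reader :rows

  def initialize(rows)
    @rows = rows        # rows that existed before the constraint was added
    @validated = false  # the constraint starts out unvalidated
  end

  def validated?
    @validated
  end

  # Every new row is checked, even though old rows were never scanned.
  def insert(row)
    check!(row)
    @rows << row
  end

  # The row that results from an update is checked as a whole.
  def update(index, changes)
    new_row = @rows.fetch(index).merge(changes)
    check!(new_row)
    @rows[index] = new_row
  end

  # Validation scans all rows and succeeds only if every one passes.
  def validate!
    @rows.each { |row| check!(row) }
    @validated = true
  end

  private

  def check!(row)
    raise ConstraintViolation, 'description must not be NULL' if row[:description].nil?
  end
end

table = ToyTable.new([{ title: 'Old epic', description: nil }])

begin
  # Fails while the pre-existing row still has no description.
  table.validate!
rescue ToyTable::ConstraintViolation => e
  puts "validation failed: #{e.message}"
end

# Once the old row is fixed, validation can succeed.
table.update(0, { description: 'No description' })
table.validate!
puts table.validated?
```

In this model, the pre-existing row stays in place untouched, new or updated rows without a description are rejected, and validation only passes after the old data has been cleaned up, which is why the fix and the validation are separate steps.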

Adding or removing a `NOT NULL` clause requires that any application changes are deployed _first_.
Thus, adding a `NOT NULL` constraint to an existing column should happen in a post-deployment migration.

Still in our example, for the 13.0 milestone (current), we add the `NOT NULL` constraint
with `validate: false` in a post-deployment migration,
`db/post_migrate/20200501000001_add_not_null_constraint_to_epics_description.rb`:

```ruby
class AddNotNullConstraintToEpicsDescription < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  def up
    # This will add the `NOT NULL` constraint WITHOUT validating it
    add_not_null_constraint :epics, :description, validate: false
  end

  def down
    # Down is required as `add_not_null_constraint` is not reversible
    remove_not_null_constraint :epics, :description
  end
end
```

#### Data migration to fix existing records (current release)

The approach here depends on the data volume and the cleanup strategy. The number of records that
must be fixed on GitLab.com is a good indicator that helps us decide whether to use a
post-deployment migration or a background data migration:

- If the data volume is less than `1000` records, then the data migration can be executed within the post-deployment migration.
- If the data volume is higher than `1000` records, it's advised to create a background migration.

When unsure about which option to use, contact the Database team for advice.
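
The rule of thumb above can be written down as a small decision helper. This is a sketch only: `cleanup_strategy` and `POST_DEPLOYMENT_LIMIT` are hypothetical names, not GitLab helpers, and the count of records to fix is assumed to have been looked up separately (for example, by counting rows where the column is `NULL`). The guideline does not say what to do at exactly `1000` records, so the sketch assumes the more cautious option in that case.

```ruby
# Hypothetical helper for illustration; not part of GitLab.
# Threshold taken from the guideline above.
POST_DEPLOYMENT_LIMIT = 1000

# Returns the suggested cleanup strategy for a given number of records to fix.
def cleanup_strategy(records_to_fix)
  raise ArgumentError, 'count must be non-negative' if records_to_fix.negative?

  # Assumption: exactly 1000 records is treated like the larger case.
  if records_to_fix < POST_DEPLOYMENT_LIMIT
    :post_deployment_migration
  else
    :background_migration
  end
end

puts cleanup_strategy(250)
puts cleanup_strategy(29_500)
```

The helper only captures the numeric threshold. Table size and access patterns still matter, so the advice to contact the Database team when unsure applies regardless of what the count suggests.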

Back to our example, the epics table is neither considerably large nor frequently accessed,
so we add a post-deployment migration for the 13.0 milestone (current),
`db/post_migrate/20200501000002_cleanup_epics_with_null_description.rb`:

```ruby
class CleanupEpicsWithNullDescription < Gitlab::Database::Migration[1.0]
  # With BATCH_SIZE=1000 and epics.count=29500 on GitLab.com
  # - 30 iterations will be run
  # - each requires on average ~150ms
  # Expected total run time: ~5 seconds
  BATCH_SIZE = 1000

  disable_ddl_transaction!

  class Epic < ActiveRecord::Base
    include EachBatch

    self.table_name = 'epics'
  end

  def up
    Epic.each_batch(of: BATCH_SIZE) do |relation|
      relation.
        where('description IS NULL').
        update_all(description: 'No description')
    end
  end

  def down
    # no-op : can't go back to `NULL` without first dropping the `NOT NULL` constraint
  end
end
```
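
The run-time estimate in the migration comment can be reproduced with a few lines of arithmetic. The sketch below uses a hypothetical helper, `estimate_batched_run`, and takes its inputs (29,500 rows, batches of 1000, roughly 150 ms per batch) from that comment. It assumes every batch takes about the same time, which is a simplification.

```ruby
# Hypothetical helper for illustration; not part of GitLab.
# Estimates how many batches a batched cleanup needs and how long it runs.
def estimate_batched_run(total_rows:, batch_size:, avg_batch_ms:)
  # A partially filled last batch still costs one iteration, so round up.
  iterations = (total_rows.to_f / batch_size).ceil

  { iterations: iterations, total_seconds: iterations * avg_batch_ms / 1000.0 }
end

# Figures from the migration comment: epics.count=29500, BATCH_SIZE=1000, ~150ms per batch.
estimate = estimate_batched_run(total_rows: 29_500, batch_size: 1000, avg_batch_ms: 150)
puts "#{estimate[:iterations]} iterations, about #{estimate[:total_seconds]} seconds"
```

With these inputs, 29,500 rows round up to 30 batches, and 30 batches at 150 ms each come to 4.5 seconds, which the comment rounds to about 5 seconds. Recording this kind of estimate in the migration makes it easier for reviewers to judge whether a post-deployment migration is still appropriate.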

#### Validate the `NOT NULL` constraint (next release)

Validating the `NOT NULL` constraint scans the whole table and makes sure that each record is correct.

Still in our example, for the 13.1 milestone (next), we run the `validate_not_null_constraint`
migration helper in a final post-deployment migration,
`db/post_migrate/20200601000001_validate_not_null_constraint_on_epics_description.rb`:

```ruby
class ValidateNotNullConstraintOnEpicsDescription < Gitlab::Database::Migration[1.0]
  disable_ddl_transaction!

  def up
    validate_not_null_constraint :epics, :description
  end

  def down
    # no-op
  end
end
```

## `NOT NULL` constraints on large tables

If you have to clean up a nullable column for a [high-traffic table](../migration_style_guide.md#high-traffic-tables)
(for example, the `artifacts` in `ci_builds`), your background migration goes on for a while and
it needs an additional [background migration cleaning up](background_migrations.md#cleaning-up)
in the release after adding the data migration.

In that rare case, you need 3 releases end-to-end:

1. Release `N.M` - Add the `NOT NULL` constraint and the background migration to fix the existing records.
1. Release `N.M+1` - Clean up the background migration.
1. Release `N.M+2` - Validate the `NOT NULL` constraint.

For these cases, consult the database team early in the update cycle. The `NOT NULL`
constraint may not be required or other options could exist that do not affect really large
or frequently accessed tables.