---
stage: Create
group: Editor
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
# Merge request diffs storage **(FREE SELF)**

Merge request diffs are size-limited copies of diffs associated with merge
requests. When viewing a merge request, diffs are sourced from these copies
wherever possible as a performance optimization.

By default, merge request diffs are stored in the database, in a table named
`merge_request_diff_files`. Larger installations may find this table grows too
large, in which case, switching to external storage is recommended.

Merge request diffs can be stored [on disk](#using-external-storage), or in
[object storage](#using-object-storage). In general, it
is better to store the diffs in the database than on disk. A compromise is available
that only [stores outdated diffs](#alternative-in-database-storage) outside of the database.
## Using external storage

**In Omnibus installations:**

1. Edit `/etc/gitlab/gitlab.rb` and add the following line:

   ```ruby
   gitlab_rails['external_diffs_enabled'] = true
   ```

1. The external diffs are stored in
   `/var/opt/gitlab/gitlab-rails/shared/external-diffs`. To change the path,
   for example, to `/mnt/storage/external-diffs`, edit `/etc/gitlab/gitlab.rb`
   and add the following line:

   ```ruby
   gitlab_rails['external_diffs_storage_path'] = "/mnt/storage/external-diffs"
   ```

1. Save the file and [reconfigure GitLab](restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
   GitLab then migrates your existing merge request diffs to external storage.
**In installations from source:**

1. Edit `/home/git/gitlab/config/gitlab.yml` and add or amend the following
   lines:

   ```yaml
   external_diffs:
     enabled: true
   ```

1. The external diffs are stored in
   `/home/git/gitlab/shared/external-diffs`. To change the path, for example,
   to `/mnt/storage/external-diffs`, edit `/home/git/gitlab/config/gitlab.yml`
   and add or amend the following lines:

   ```yaml
   external_diffs:
     enabled: true
     storage_path: /mnt/storage/external-diffs
   ```

1. Save the file and [restart GitLab](restart_gitlab.md#installations-from-source) for the changes to take effect.
   GitLab then migrates your existing merge request diffs to external storage.
## Using object storage

WARNING:
Migrating to object storage is not reversible.

Instead of storing the external diffs on disk, we recommend using an object
store like AWS S3. This configuration relies on valid AWS credentials
already being configured.
**In Omnibus installations:**

1. Edit `/etc/gitlab/gitlab.rb` and add the following line:

   ```ruby
   gitlab_rails['external_diffs_enabled'] = true
   ```

1. Set [object storage settings](#object-storage-settings).
1. Save the file and [reconfigure GitLab](restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
   GitLab then migrates your existing merge request diffs to external storage.

**In installations from source:**

1. Edit `/home/git/gitlab/config/gitlab.yml` and add or amend the following
   lines:

   ```yaml
   external_diffs:
     enabled: true
   ```

1. Set [object storage settings](#object-storage-settings).
1. Save the file and [restart GitLab](restart_gitlab.md#installations-from-source) for the changes to take effect.
   GitLab then migrates your existing merge request diffs to external storage.

[Read more about using object storage with GitLab](object_storage.md).
### Object storage settings

NOTE:
In GitLab 13.2 and later, we recommend using the
[consolidated object storage settings](object_storage.md#consolidated-object-storage-configuration).
This section describes the earlier configuration format.

For source installations, these settings are nested under `external_diffs:` and
then `object_store:`. On Omnibus installations, they are prefixed by
`external_diffs_object_store_`.

| Setting | Description | Default |
|---------|-------------|---------|
| `enabled` | Enable or disable object storage. | `false` |
| `remote_directory` | The bucket name where external diffs are stored. | |
| `direct_upload` | Set to `true` to enable direct upload of external diffs without the need for local shared storage. This option may be removed once we decide to support only a single storage for all files. | `false` |
| `background_upload` | Set to `false` to disable automatic upload. This option may be removed once upload is direct to S3. | `true` |
| `proxy_download` | Set to `true` to proxy all files served through GitLab. When `false`, clients download directly from remote storage instead of having all data proxied, which reduces egress traffic. | `false` |
| `connection` | Various connection options described below. | |
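
The naming rule above can be sketched in a few lines of Ruby. This is only an illustration: `omnibus_key` and `source_path` are hypothetical helper names invented for this example and are not part of GitLab. The sketch assumes nothing beyond the prefix and nesting described in this section.

```ruby
# Prefix documented above for Omnibus installations.
OMNIBUS_PREFIX = 'external_diffs_object_store_'

# Hypothetical helper: build the gitlab.rb key for a setting from the table.
def omnibus_key(setting)
  "gitlab_rails['#{OMNIBUS_PREFIX}#{setting}']"
end

# Hypothetical helper: the nesting used in gitlab.yml for source installations.
def source_path(setting)
  ['external_diffs', 'object_store', setting]
end

puts omnibus_key('remote_directory')
# prints: gitlab_rails['external_diffs_object_store_remote_directory']
puts source_path('remote_directory').join(' > ')
# prints: external_diffs > object_store > remote_directory
```

The same setting name therefore appears once as a flat, prefixed key in `/etc/gitlab/gitlab.rb` and once as a nested key in `gitlab.yml`.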
#### S3 compatible connection settings

See [the available connection settings for different providers](object_storage.md#connection-settings).

**In Omnibus installations:**

1. Edit `/etc/gitlab/gitlab.rb` and add the following lines, replacing the
   example values with your own:

   ```ruby
   gitlab_rails['external_diffs_enabled'] = true
   gitlab_rails['external_diffs_object_store_enabled'] = true
   gitlab_rails['external_diffs_object_store_remote_directory'] = "external-diffs"
   gitlab_rails['external_diffs_object_store_connection'] = {
     'provider' => 'AWS',
     'region' => 'eu-central-1',
     'aws_access_key_id' => 'AWS_ACCESS_KEY_ID',
     'aws_secret_access_key' => 'AWS_SECRET_ACCESS_KEY'
   }
   ```

   If you are using AWS IAM profiles, omit the
   AWS access key and secret access key/value pairs. For example:

   ```ruby
   gitlab_rails['external_diffs_object_store_connection'] = {
     'provider' => 'AWS',
     'region' => 'eu-central-1',
     'use_iam_profile' => true
   }
   ```

1. Save the file and [reconfigure GitLab](restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
**In installations from source:**

1. Edit `/home/git/gitlab/config/gitlab.yml` and add or amend the following
   lines:

   ```yaml
   external_diffs:
     enabled: true
     object_store:
       enabled: true
       remote_directory: "external-diffs" # The bucket name
       connection:
         provider: AWS # Only AWS supported at the moment
         aws_access_key_id: AWS_ACCESS_KEY_ID
         aws_secret_access_key: AWS_SECRET_ACCESS_KEY
         region: eu-central-1
   ```

1. Save the file and [restart GitLab](restart_gitlab.md#installations-from-source) for the changes to take effect.
## Alternative in-database storage

Enabling external diffs may reduce the performance of merge requests, as they
must be retrieved in a separate operation from other data. A compromise may be
reached by only storing outdated diffs externally, while keeping current diffs
in the database.

To enable this feature, perform the following steps:

**In Omnibus installations:**

1. Edit `/etc/gitlab/gitlab.rb` and add the following line:

   ```ruby
   gitlab_rails['external_diffs_when'] = 'outdated'
   ```

1. Save the file and [reconfigure GitLab](restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.

**In installations from source:**

1. Edit `/home/git/gitlab/config/gitlab.yml` and add or amend the following
   lines:

   ```yaml
   external_diffs:
     enabled: true
     when: outdated
   ```

1. Save the file and [restart GitLab](restart_gitlab.md#installations-from-source) for the changes to take effect.

With this feature enabled, diffs are initially stored in the database, rather
than externally. They are moved to external storage after any of these
conditions become true:

- A newer version of the merge request diff exists
- The merge request was merged more than seven days ago
- The merge request was closed more than seven days ago

These rules strike a balance between space and performance by only storing
frequently-accessed diffs in the database. Diffs that are less likely to be
accessed are moved to external storage instead.
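
The three conditions above can be expressed as a single predicate. The following Ruby sketch is only a reading of those rules, not GitLab's implementation: `move_to_external_storage?` and its keyword arguments are hypothetical names chosen for this example.

```ruby
SEVEN_DAYS = 7 * 24 * 60 * 60 # seven days, in seconds

# Hypothetical predicate mirroring the documented rules: a diff becomes
# eligible for external storage when ANY of the three conditions holds.
def move_to_external_storage?(newer_version_exists:, merged_at: nil, closed_at: nil, now: Time.now)
  cutoff = now - SEVEN_DAYS
  return true if newer_version_exists            # a newer diff version exists
  return true if merged_at && merged_at < cutoff # merged more than seven days ago
  return true if closed_at && closed_at < cutoff # closed more than seven days ago

  false
end

now = Time.utc(2022, 3, 10)
puts move_to_external_storage?(newer_version_exists: false, merged_at: Time.utc(2022, 3, 1), now: now)
# prints: true (merged nine days before `now`)
puts move_to_external_storage?(newer_version_exists: false, merged_at: Time.utc(2022, 3, 8), now: now)
# prints: false (merged two days before `now`)
```

An open merge request with no newer diff version stays in the database under this reading, because none of the conditions applies to it.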

## Correcting incorrectly-migrated diffs

Versions of GitLab earlier than `v13.0.0` would incorrectly record the location
of some merge request diffs when [external diffs in object storage](#object-storage-settings)
were enabled. This mainly affected imported merge requests, and was resolved
with [this merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/31005).

You are affected by this issue if you are using object storage, or have never
used on-disk storage for external diffs, and the **Changes** tab for some merge
requests fails to load with a 500 error whose exception is of this form:

```plain
Errno::ENOENT (No such file or directory @ rb_sysopen - /var/opt/gitlab/gitlab-rails/shared/external-diffs/merge_request_diffs/mr-6167082/diff-8199789)
```

Because it's not possible to safely determine all these conditions
automatically, we've provided a Rake task in GitLab v13.2.0
that you can run manually to correct the data:

**In Omnibus installations:**

```shell
sudo gitlab-rake gitlab:external_diffs:force_object_storage
```

**In installations from source:**

```shell
sudo -u git -H bundle exec rake gitlab:external_diffs:force_object_storage RAILS_ENV=production
```

Environment variables can be provided to modify the behavior of the task. The
available variables are:

| Name | Default value | Purpose |
| ---- | ------------- | ------- |
| `ANSI` | `true` | Use ANSI escape codes to make output more understandable |
| `BATCH_SIZE` | `1000` | Iterate through the table in batches of this size |
| `START_ID` | `nil` | If set, begin scanning at this ID |
| `END_ID` | `nil` | If set, stop scanning at this ID |
| `UPDATE_DELAY` | `1` | Number of seconds to sleep between updates |

The `START_ID` and `END_ID` variables may be used to run the update in parallel,
by assigning different processes to different parts of the table. The `BATCH_SIZE`
and `UPDATE_DELAY` parameters allow the speed of the migration to be traded off
against concurrent access to the table. The `ANSI` parameter should be set to
`false` if your terminal does not support ANSI escape codes.
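
As a rough sketch of the parallel approach, the Ruby snippet below splits an ID range into contiguous slices and prints one command per slice. `id_slices` is a hypothetical helper written for this example, the range `1..1_000_000` is a placeholder, and the sketch assumes that `START_ID` and `END_ID` are both inclusive bounds.

```ruby
# Hypothetical helper: split the inclusive range first_id..last_id into at most
# `workers` contiguous, non-overlapping [start_id, end_id] pairs.
def id_slices(first_id, last_id, workers)
  total = last_id - first_id + 1
  size = (total.to_f / workers).ceil # IDs per slice, rounded up
  (first_id..last_id).step(size).map do |start_id|
    [start_id, [start_id + size - 1, last_id].min]
  end
end

# Placeholder range; substitute the lowest and highest IDs you need to scan.
id_slices(1, 1_000_000, 4).each do |start_id, end_id|
  puts "sudo gitlab-rake gitlab:external_diffs:force_object_storage " \
       "START_ID=#{start_id} END_ID=#{end_id}"
end
```

Each printed command can then be run in its own terminal or job. Because the slices do not overlap, no two processes scan the same IDs.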

By default, `sudo` does not preserve existing environment variables. Append the
variables to the command, rather than prefixing them:

```shell
sudo gitlab-rake gitlab:external_diffs:force_object_storage START_ID=59946109 END_ID=59946109 UPDATE_DELAY=5
```