---
stage: Systems
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments
---

# Automatic background verification **(PREMIUM SELF)**
NOTE:
Automatic background verification of repositories and wikis was added in
GitLab EE 10.6 but is enabled by default only in GitLab EE 11.1 and later. You can
disable or enable this feature manually by following
[these instructions](#disabling-or-enabling-the-automatic-background-verification).

Automatic background verification ensures that the transferred data matches a
calculated checksum. If the checksum of the data on the **primary** site matches the checksum of the
data on the **secondary** site, the data was transferred successfully. Following a planned failover,
any corrupted data may be **lost**, depending on the extent of the corruption.

If verification fails on the **primary** site, this indicates Geo is replicating a corrupted object.
You can restore it from backup or remove it from the **primary** site to resolve the issue.

If verification succeeds on the **primary** site but fails on the **secondary** site,
this indicates that the object was corrupted during the replication process.
Geo actively tries to correct verification failures by marking the repository to
be resynced with a back-off period. If you want to reset the verification for
these failures, follow [these instructions](background_verification.md#reset-verification-for-projects-where-verification-has-failed).

If verification is lagging significantly behind replication, consider giving
the site more time before scheduling a planned failover.
## Disabling or enabling the automatic background verification

Run the following commands in a [Rails console](../../operations/rails_console.md) on a **Rails node on the primary** site.

To check if automatic background verification is enabled:

```ruby
Gitlab::Geo.repository_verification_enabled?
```

To disable automatic background verification:

```ruby
Feature.disable('geo_repository_verification')
```

To enable automatic background verification:

```ruby
Feature.enable('geo_repository_verification')
```
## Repository verification

On the **primary** site:

1. On the top bar, select **Main menu > Admin**.
1. On the left sidebar, select **Geo > Sites**.
1. Expand the **Verification information** tab for that site to view automatic checksumming
   status for repositories and wikis. Successes are shown in green, pending work
   in gray, and failures in red.

   ![Verification status](img/verification_status_primary_v14_0.png)

On the **secondary** site:

1. On the top bar, select **Main menu > Admin**.
1. On the left sidebar, select **Geo > Sites**.
1. Expand the **Verification information** tab for that site to view automatic checksumming
   status for repositories and wikis. Successes are shown in green, pending work
   in gray, and failures in red.

   ![Verification status](img/verification_status_secondary_v14_0.png)
## Using checksums to compare Geo sites

To check the health of Geo **secondary** sites, we use a checksum over the list of
Git references and their values. The checksum includes `HEAD`, `heads`, `tags`,
`notes`, and GitLab-specific references to ensure true consistency. If two sites
have the same checksum, then they definitely hold the same references. We compute
the checksum for every site after every update to make sure that they are all
in sync.
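
To make the idea concrete, the following is a minimal sketch of a checksum over a repository's list of references and their values. It is an illustration only and is not the algorithm Geo uses internally. The `refs_checksum` helper, the throwaway repositories, and the use of `sha256sum` (assumed to be available, as in GNU coreutils) are all assumptions made for this example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of GitLab): hash the sorted list of
# "<object id> <ref name>" pairs, including HEAD, for the repository at $1.
refs_checksum() {
  git -C "$1" show-ref --head | sort | sha256sum | cut -d' ' -f1
}

workdir="$(mktemp -d)"

# Build a throwaway repository that stands in for the primary site.
git init --quiet "$workdir/primary"
git -C "$workdir/primary" \
  -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
  commit --quiet --allow-empty -m "initial commit"
git -C "$workdir/primary" tag v1.0

# A mirror clone copies every ref, standing in for a fully synced secondary.
git clone --quiet --mirror "$workdir/primary" "$workdir/secondary"

primary_sum="$(refs_checksum "$workdir/primary")"
secondary_sum="$(refs_checksum "$workdir/secondary")"

if [ "$primary_sum" = "$secondary_sum" ]; then
  echo "checksums match: both repositories hold the same references"
fi

# Add a tag on the primary only: the reference lists now differ,
# so the checksums differ as well.
git -C "$workdir/primary" tag v1.1
primary_sum_after="$(refs_checksum "$workdir/primary")"

if [ "$primary_sum_after" != "$secondary_sum" ]; then
  echo "checksums differ: the secondary is missing at least one reference"
fi

rm -rf "$workdir"
```

A single checksum only says whether the reference lists differ. To find out which references differ, compare the lists themselves, as described in [Reconcile differences with checksum mismatches](#reconcile-differences-with-checksum-mismatches).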
## Repository re-verification

Due to bugs or transient infrastructure failures, it is possible for Git
repositories to change unexpectedly without being marked for verification.
Geo constantly reverifies the repositories to ensure the integrity of the
data. The default and recommended re-verification interval is 7 days, though
an interval as short as 1 day can be set. Shorter intervals reduce risk but
increase load, and longer intervals do the opposite.

On the **primary** site:

1. On the top bar, select **Main menu > Admin**.
1. On the left sidebar, select **Geo > Sites**.
1. Select **Edit** for the **primary** site to customize the minimum
   re-verification interval:

   ![Re-verification interval](img/reverification-interval.png)

Automatic background re-verification is enabled by default, but you can
disable it if needed. Run the following commands in a [Rails console](../../operations/rails_console.md) on a **Rails node on the primary** site.

To disable automatic background re-verification:

```ruby
Feature.disable('geo_repository_reverification')
```

To enable automatic background re-verification:

```ruby
Feature.enable('geo_repository_reverification')
```
## Reset verification for projects where verification has failed

Geo actively tries to correct verification failures by marking the repository to
be resynced with a back-off period. If you want to reset them manually, the
following Rake tasks mark projects where verification has failed, or where the
checksums do not match, to be resynced without the back-off period.

Run the appropriate commands on a **Rails node on the secondary** site.

For repositories:

```shell
sudo gitlab-rake geo:verification:repository:reset
```

For wikis:

```shell
sudo gitlab-rake geo:verification:wiki:reset
```
## Reconcile differences with checksum mismatches

If the **primary** and **secondary** sites have a checksum verification mismatch, the cause may not be apparent. To find the cause of a checksum mismatch:

1. On the **primary** site:
   1. On the top bar, select **Main menu > Admin**.
   1. On the left sidebar, select **Overview > Projects**.
   1. Find the project whose checksum differences you want to check and
      select its name.
   1. On the project administration page, get the **Gitaly storage name**
      and **Gitaly relative path**.

      ![Project administration page](img/checksum-differences-admin-project-page.png)

1. On a **Gitaly node on the primary** site and a **Gitaly node on the secondary** site, go to the project's repository directory. If using Gitaly Cluster, [check that it is in a healthy state](../../gitaly/troubleshooting.md#check-cluster-health) prior to running these commands.
   The default path is `/var/opt/gitlab/git-data/repositories`. If `git_data_dirs`
   is customized, check the directory layout on your server to be sure:

   ```shell
   cd /var/opt/gitlab/git-data/repositories
   ```

1. Run the following command on the **primary** site, redirecting the output to a file:

   ```shell
   git show-ref --head | grep -E "HEAD|(refs/(heads|tags|keep-around|merge-requests|environments|notes)/)" > primary-site-refs
   ```

1. Run the following command on the **secondary** site, redirecting the output to a file:

   ```shell
   git show-ref --head | grep -E "HEAD|(refs/(heads|tags|keep-around|merge-requests|environments|notes)/)" > secondary-site-refs
   ```

1. Copy the files from the previous steps to the same system, and do a diff between the contents:

   ```shell
   diff primary-site-refs secondary-site-refs
   ```
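
When the ref lists are long, raw `diff` output can be hard to read. The following is a minimal sketch of one way to summarize the two files per reference. It assumes both files use the `<object id> <ref name>` format that `git show-ref` prints. The object IDs and file contents below are made up for illustration, and the `awk` script is a hypothetical helper, not a GitLab tool.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"

# Sample data with made-up object IDs, standing in for the files
# produced in the previous steps.
cat > "$workdir/primary-site-refs" <<'EOF'
1111111111111111111111111111111111111111 HEAD
1111111111111111111111111111111111111111 refs/heads/main
2222222222222222222222222222222222222222 refs/tags/v1.0
EOF

cat > "$workdir/secondary-site-refs" <<'EOF'
3333333333333333333333333333333333333333 HEAD
3333333333333333333333333333333333333333 refs/heads/main
EOF

# Classify each reference: present on one side only, or present on both
# sides but pointing at different objects.
report="$(awk '
  NR == FNR { primary[$2] = $1; next }   # first file: remember the value of each ref
  {
    seen[$2] = 1
    if (!($2 in primary))       print "only on secondary: " $2
    else if (primary[$2] != $1) print "differs: " $2
  }
  END {
    for (ref in primary) if (!(ref in seen)) print "only on primary: " ref
  }
' "$workdir/primary-site-refs" "$workdir/secondary-site-refs" | sort)"

echo "$report"

rm -rf "$workdir"
```

With this sample data, `HEAD` and `refs/heads/main` are reported as differing, and `refs/tags/v1.0` is reported as present only on the primary.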
## Current limitations

Automatic background verification doesn't cover attachments, LFS objects,
job artifacts, or user uploads in file storage. You can keep track of the
progress to include them in [Geo: Verify all replicated data](https://gitlab.com/groups/gitlab-org/-/epics/1430).

For now, you can verify their integrity
manually by following [these instructions](../../raketasks/check.md) on both
sites, and comparing the output between them.

In GitLab EE 12.1, Geo calculates checksums for attachments, LFS objects, and
archived traces on secondary sites after the transfer, compares them with the
stored checksums, and rejects transfers if they do not match. Geo
currently does not support an automatic way to verify this data if it was
synced before GitLab EE 12.1.

Data in object storage is **not verified**, as the object store is responsible
for ensuring the integrity of the data.