In the following cases, consider using file system data transfer or snapshots as part of your backup strategy:
- Your GitLab instance contains a lot of Git repository data and the GitLab backup script is too slow.
- Your GitLab instance has a lot of forked projects and the regular backup task duplicates the Git data for all of them.
- Your GitLab instance has a problem and using the regular backup and import Rake tasks isn't possible.
When considering using file system data transfer or snapshots:
- Don't use these methods to migrate from one operating system to another. The operating systems of the source and destination should be as similar as possible. For example,
don't use these methods to migrate from Ubuntu to Fedora.
- Data consistency is very important. We recommend stopping GitLab with `sudo gitlab-ctl stop` before doing a file system transfer (with `rsync`, for example) or taking a snapshot.
For manually backing up the Git repository data on disk, there are multiple possible strategies:
- Use snapshots, such as the previous examples of Amazon EBS drive snapshots, or LVM snapshots plus `rsync` (see the sketch after this list).
- Use [GitLab Geo](../administration/geo/index.md) and rely on the repository data on a Geo secondary site.
- [Prevent writes and copy the Git repository data](#prevent-writes-and-copy-the-git-repository-data).
- [Create an online backup by marking repositories as read-only (experimental)](#online-backup-through-marking-repositories-as-read-only-experimental).
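As an illustration, here is a minimal sketch of the LVM snapshot plus `rsync` strategy. It assumes the GitLab data lives on a dedicated logical volume; the volume group (`vg0`), volume names, mount point, and backup host are all hypothetical, so adjust them to your environment:

```shell
# Stop GitLab briefly so the data on disk is consistent, snapshot, restart.
sudo gitlab-ctl stop
sudo lvcreate --snapshot --size 10G --name gitlab-backup /dev/vg0/gitlab-data
sudo gitlab-ctl start

# Mount the snapshot read-only and copy it off the host.
sudo mkdir -p /mnt/gitlab-snapshot
sudo mount -o ro /dev/vg0/gitlab-backup /mnt/gitlab-snapshot
sudo rsync -a /mnt/gitlab-snapshot/ backup-host:/backups/gitlab/

# Remove the snapshot once the copy completes.
sudo umount /mnt/gitlab-snapshot
sudo lvremove -f /dev/vg0/gitlab-backup
```

With this approach, downtime is limited to the few seconds needed to create the snapshot rather than the full duration of the copy.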
#### Prevent writes and copy the Git repository data
Git repositories must be copied in a consistent way. They should not be copied during concurrent write
operations, as this can lead to inconsistencies or corruption issues. To prevent writes during the copy, you can:
- Create explicit downtime by stopping all Gitaly services before backing up the repositories:
```shell
sudo gitlab-ctl stop gitaly
# execute git data copy step
sudo gitlab-ctl start gitaly
```
You can copy Git repository data using any method, as long as writes are prevented on the data being copied
(to prevent inconsistencies and corruption issues). In order of preference and safety, the recommended methods are:
1. Use `rsync` with archive-mode, delete, and checksum options, for example:
```shell
rsync -aR --delete --checksum source destination # check the argument order carefully: with --delete, swapping source and destination destroys existing data
```
1. Use a [`tar` pipe to copy the entire repository's directory to another server or location](../administration/operations/moving_repositories.md#tar-pipe-to-another-server) (see the sketch after this list).
1. Use `sftp`, `scp`, `cp`, or any other copying method.
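For example, a minimal sketch of the `tar` pipe approach (the destination host and paths are hypothetical, writes must already be prevented as described above, and the remote user needs write access to the destination directory):

```shell
# Stream the repository directory to another server without writing an
# intermediate archive file. The source path assumes an Omnibus installation.
sudo tar -C /var/opt/gitlab/git-data/repositories -cf - -- . \
  | ssh git@destination.example.com 'tar -C /mnt/gitlab/repositories -xf -'
```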
#### Online backup through marking repositories as read-only (experimental)
One way of backing up repositories without requiring instance-wide downtime
is to programmatically mark projects as read-only while copying the underlying data;
a rough sketch of this approach follows the list below.
There are a few possible downsides to this:
- Repositories are read-only for a period of time that scales with the size of the repository.
- Backups take longer to complete because each project must be marked read-only, which can also introduce inconsistencies. For example,
the last data available for the first project backed up may be older than
the last data available for the final project backed up.
- Fork networks should be entirely read-only while the projects inside get backed up to prevent potential changes to the pool repository.
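As a rough sketch of what marking a project read-only can look like (this approach is experimental as noted above; `group/project` is a placeholder path, and `repository_read_only` is the project flag GitLab uses to reject pushes; verify the behavior on a non-production instance first):

```shell
# Mark the project read-only before copying its repository data.
sudo gitlab-rails runner "Project.find_by_full_path('group/project').update!(repository_read_only: true)"

# ... copy the project's repository data here ...

# Make the project writable again once the copy completes.
sudo gitlab-rails runner "Project.find_by_full_path('group/project').update!(repository_read_only: false)"
```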
### Back up and restore for installations using PgBouncer

Do not back up or restore GitLab through a PgBouncer connection: doing so causes errors. The backup task uses `pg_dump`, which sets a null search path and explicitly includes the schema in every SQL query,
to address [CVE-2018-1058](https://www.postgresql.org/about/news/postgresql-103-968-9512-9417-and-9322-released-1834/).
Because connections are reused with PgBouncer in transaction pooling mode,
PostgreSQL fails to search the default `public` schema. As a result, the
cleared search path causes tables and columns to appear missing.
### Bypassing PgBouncer
There are two ways to fix this:
1. [Use environment variables to override the database settings](#environment-variable-overrides) for the backup task.
1. Reconfigure a node to [connect directly to the PostgreSQL primary database node](../administration/postgresql/pgbouncer.md#procedure-for-bypassing-pgbouncer).
#### Environment variable overrides
By default, GitLab uses the database configuration stored in a
configuration file (`database.yml`). However, you can override the database settings
for the backup and restore task by setting environment
variables that are prefixed with `GITLAB_BACKUP_`:
- `GITLAB_BACKUP_PGHOST`
- `GITLAB_BACKUP_PGUSER`
- `GITLAB_BACKUP_PGPORT`
- `GITLAB_BACKUP_PGPASSWORD`
- `GITLAB_BACKUP_PGSSLMODE`
- `GITLAB_BACKUP_PGSSLKEY`
- `GITLAB_BACKUP_PGSSLCERT`
- `GITLAB_BACKUP_PGSSLROOTCERT`
- `GITLAB_BACKUP_PGSSLCRL`
- `GITLAB_BACKUP_PGSSLCOMPRESSION`
For example, to override the database host and port used by the backup task, prefix the command with the variables. In the following sketch, `192.168.1.10` and `5432` are illustrative values, and the command path assumes an Omnibus installation:
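```shell
sudo GITLAB_BACKUP_PGHOST=192.168.1.10 GITLAB_BACKUP_PGPORT=5432 /opt/gitlab/bin/gitlab-backup create
```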
## Migrate to a new server

<!-- some details borrowed from GitLab.com move from Azure to GCP detailed at https://gitlab.com/gitlab-com/migration/-/blob/master/.gitlab/issue_templates/failover.md -->

You can use GitLab backup and restore to migrate your instance to a new server. This section outlines a typical procedure for a GitLab deployment running on a single server.
If you're running GitLab Geo, an alternative option is [Geo disaster recovery for planned failover](../administration/geo/disaster_recovery/planned_failover.md).
WARNING:
Avoid uncoordinated data processing by both the new and old servers, where multiple
servers could connect concurrently and process the same data. For example, when using
[incoming email](../administration/incoming_email.md), if both GitLab instances are
processing email at the same time, both instances can miss some data.
See [Manually replicate the primary site's SSH host keys](../administration/geo/replication/configuration.md#step-2-manually-replicate-the-primary-sites-ssh-host-keys) for example steps.
1. While still on the Sidekiq dashboard, select **Cron** and then **Enable All**
to re-enable periodic background jobs.
1. Test that read-only operations on the GitLab instance work as expected. For example, browse through project repository files, merge requests, and issues.
1. If you know the specific group or project from which you wish to delete variables, you can include a `WHERE` statement to specify that in your `DELETE`:
```sql
DELETE FROM ci_group_variables WHERE group_id = <GROUPID>;
DELETE FROM ci_variables WHERE project_id = <PROJECTID>;
```
If you use the Container Registry, pushes to the registry may fail after restoring a backup because of incorrect permissions set on the registry filesystem during
the restore process ([issue #62759](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/62759 "Incorrect permissions on registry filesystem after restore")).
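On Omnibus GitLab installations, the registry data is expected to be owned by the `registry` user, so restoring ownership along the following lines typically resolves the push failures (the path assumes Omnibus defaults; verify yours before running):

```shell
sudo chown -R registry:registry /var/opt/gitlab/gitlab-rails/shared/registry/docker
```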
This problem stops the backup script from completing. To fix it, you must truncate the filenames causing the problem: a maximum of 246 characters, including the file extension, is permitted.
A [known issue](https://gitlab.com/gitlab-org/gitlab-foss/issues/45425) caused object store uploads to remain after a parent resource was deleted. This issue was [resolved](https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/18698).
#### Truncate the filenames referenced by the database
You must truncate the files referenced by the database that are causing the problem. The filenames referenced by the database are stored:
- In the `uploads` table.
- In any references found in other database tables and columns.
- On the filesystem.
Truncate the filenames in the `uploads` table:
1. Enter the database console:
For Omnibus GitLab 14.2 and later:
```shell
sudo gitlab-rails dbconsole --database main
```
For Omnibus GitLab 14.1 and earlier:
```shell
sudo gitlab-rails dbconsole
```
For installations from source, GitLab 14.2 and later:
```shell
sudo -u git -H bundle exec rails dbconsole -e production --database main
```
For installations from source, GitLab 14.1 and earlier:
```shell
sudo -u git -H bundle exec rails dbconsole -e production
```
1. Search the `uploads` table for filenames longer than 246 characters:
The following query selects the `uploads` records with filenames longer than 246 characters, in batches of 10,000 rows (`row_id` 0 to 10000). Batching improves performance on large GitLab instances whose tables hold thousands of records.
```sql
CREATE TEMP TABLE uploads_with_long_filenames AS
SELECT ROW_NUMBER() OVER(ORDER BY id) row_id, id, path
FROM uploads AS u
WHERE LENGTH((regexp_match(u.path, '[^\\/:*?"<>|\r\n]+$'))[1]) > 246;
CREATE INDEX ON uploads_with_long_filenames(row_id);
SELECT
  u.id,
  u.path,
  -- Current filename
  (regexp_match(u.path, '[^\\/:*?"<>|\r\n]+$'))[1] AS current_filename,
  -- New filename, truncated so it fits in 246 characters with the extension kept
  CONCAT(
    LEFT(SPLIT_PART((regexp_match(u.path, '[^\\/:*?"<>|\r\n]+$'))[1], '.', 1), 242),
    COALESCE(SUBSTRING((regexp_match(u.path, '[^\\/:*?"<>|\r\n]+$'))[1] FROM '\.(?:.(?!\.))+$'))
  ) AS new_filename,
  -- New path using the truncated filename
  CONCAT(
    COALESCE((regexp_match(u.path, '(.*\/).*'))[1], ''),
    CONCAT(
      LEFT(SPLIT_PART((regexp_match(u.path, '[^\\/:*?"<>|\r\n]+$'))[1], '.', 1), 242),
      COALESCE(SUBSTRING((regexp_match(u.path, '[^\\/:*?"<>|\r\n]+$'))[1] FROM '\.(?:.(?!\.))+$'))
    )
  ) AS new_path
FROM uploads_with_long_filenames AS u
WHERE u.row_id > 0 AND u.row_id <= 10000;
```
Output example:
```postgresql
-[ RECORD 1 ]----+--------------------------------------------------------------
id               | ...
path             | ...
current_filename | ...
new_filename     | ...
new_path         | ...
```
Where:

- `current_filename`: a filename that is currently more than 246 characters long.
- `new_filename`: the filename truncated to a maximum of 246 characters.
- `new_path`: the new path that uses the truncated `new_filename`.
Once you validate the batch results, advance the batch window (`row_id`) to the next range of 10,000 rows (for example, 10000 to 20000). Repeat this process until you reach the last record in the `uploads` table.
1. Rename the files found in the `uploads` table from long filenames to new truncated filenames. The following query rolls back the update so you can check the results safely within a transaction wrapper:
```sql
CREATE TEMP TABLE uploads_with_long_filenames AS
SELECT ROW_NUMBER() OVER(ORDER BY id) row_id, path, id
FROM uploads AS u
WHERE LENGTH((regexp_match(u.path, '[^\\/:*?"<>|\r\n]+$'))[1]) > 246;
CREATE INDEX ON uploads_with_long_filenames(row_id);
BEGIN;
WITH updated_uploads AS (
  UPDATE uploads
  SET path =
    CONCAT(
      COALESCE((regexp_match(updatable_uploads.path, '(.*\/).*'))[1], ''),
      CONCAT(
        LEFT(SPLIT_PART((regexp_match(updatable_uploads.path, '[^\\/:*?"<>|\r\n]+$'))[1], '.', 1), 242),
        COALESCE(SUBSTRING((regexp_match(updatable_uploads.path, '[^\\/:*?"<>|\r\n]+$'))[1] FROM '\.(?:.(?!\.))+$'))
      )
    )
  FROM uploads_with_long_filenames AS updatable_uploads
  WHERE uploads.id = updatable_uploads.id
    AND updatable_uploads.row_id > 0 AND updatable_uploads.row_id <= 10000
  RETURNING uploads.*
)
SELECT id, path FROM updated_uploads;
ROLLBACK;
```
Once you validate the batch update results, advance the batch window (`row_id`) to the next range of 10,000 rows (for example, 10000 to 20000). Repeat this process until you reach the last record in the `uploads` table.
1. Validate that the new filenames from the previous query are the expected ones. If you are sure you want to truncate the records found in the previous step to 246 characters, run the following:
WARNING:
The following action is **irreversible**.
```sql
CREATE TEMP TABLE uploads_with_long_filenames AS
SELECT ROW_NUMBER() OVER(ORDER BY id) row_id, path, id
FROM uploads AS u
WHERE LENGTH((regexp_match(u.path, '[^\\/:*?"<>|\r\n]+$'))[1]) > 246;
CREATE INDEX ON uploads_with_long_filenames(row_id);
UPDATE uploads
SET path =
  CONCAT(
    COALESCE((regexp_match(updatable_uploads.path, '(.*\/).*'))[1], ''),
    CONCAT(
      LEFT(SPLIT_PART((regexp_match(updatable_uploads.path, '[^\\/:*?"<>|\r\n]+$'))[1], '.', 1), 242),
      COALESCE(SUBSTRING((regexp_match(updatable_uploads.path, '[^\\/:*?"<>|\r\n]+$'))[1] FROM '\.(?:.(?!\.))+$'))
    )
  )
FROM uploads_with_long_filenames AS updatable_uploads
WHERE uploads.id = updatable_uploads.id
  AND updatable_uploads.row_id > 0 AND updatable_uploads.row_id <= 10000;
```
Once you finish the batch update, advance the batch window (`updatable_uploads.row_id`) to the next range of 10,000 rows (for example, 10000 to 20000). Repeat this process until you reach the last record in the `uploads` table.
Truncate the filenames in the references found in other database tables and columns:
1. Check if those records are referenced somewhere. One way to do this is to dump the database and search for the parent directory name and filename:
1. To dump your database, you can use the following command as an example:
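One possibility, assuming an Omnibus installation with the default socket directory and database name (adjust both for your setup):

```shell
pg_dump -h /var/opt/gitlab/postgresql -d gitlabhq_production > gitlab-dump.tmp
```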
1. Then you can search for the references using the `grep` command. Combining the parent directory with the filename helps avoid false matches. For example:
```shell
grep public/alongfilenamehere.txt gitlab-dump.tmp
```
1. Replace those long filenames using the new filenames obtained from querying the `uploads` table.
Truncate the filenames on the filesystem: you must manually rename the files in your filesystem to the new filenames obtained from querying the `uploads` table.
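For example, a hypothetical rename, where both paths are placeholders and the new name comes from the `new_filename` computed by the earlier `uploads` query:

```shell
# Rename one affected file on disk so it matches the truncated name
# now stored in the database.
sudo mv '/var/opt/gitlab/gitlab-rails/uploads/group/project/averylongfilename.txt' \
        '/var/opt/gitlab/gitlab-rails/uploads/group/project/truncated.txt'
```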
#### Re-run the backup task
After following all the previous steps, re-run the backup task.