---
stage: Create
group: Gitaly
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
type: reference
---

# Moving repositories managed by GitLab **(CORE ONLY)**

Sometimes you need to move all repositories managed by GitLab to
another file system or another server.

## Moving data within a GitLab instance

The GitLab API is the recommended way to move Git repositories:

- Between servers.
- Between different storage.
- From single-node Gitaly to Gitaly Cluster.

For more information, see:

- [Configuring additional storage for Gitaly](../gitaly/index.md#network-architecture). Within this
  example, additional storage called `storage1` and `storage2` is configured.
- [The API documentation](../../api/project_repository_storage_moves.md) details the endpoints for
  querying and scheduling repository moves.
- [Migrate existing repositories to Gitaly Cluster](../gitaly/praefect.md#migrate-existing-repositories-to-gitaly-cluster).

### Limitations

Read more in the [API documentation](../../api/project_repository_storage_moves.md#limitations).

## Migrating to another GitLab instance

[Using the API](#moving-data-within-a-gitlab-instance) isn't an option if you are migrating to a new
GitLab environment, for example:

- From a single-node GitLab to a scaled-out architecture.
- From a GitLab instance in your private datacenter to a cloud provider.

The rest of this document looks
at some of the ways you can copy all your repositories from
`/var/opt/gitlab/git-data/repositories` to `/mnt/gitlab/repositories`.

We look at three scenarios:

- The target directory is empty.
- The target directory contains an outdated copy of the repositories.
- How to deal with thousands of repositories.

DANGER: **Warning:**
Each of the approaches we list can or does overwrite data in the target directory
`/mnt/gitlab/repositories`. Do not mix up the source and the target.

### Recommended approach in all cases

GitLab's [backup and restore capability](../../raketasks/backup_restore.md) should be used. Git
repositories are accessed, managed, and stored on GitLab servers by Gitaly as a database. Data loss
can result from directly accessing and copying Gitaly's files using tools like `rsync`.

- From GitLab 13.3, backup performance can be improved by
  [processing multiple repositories concurrently](../../raketasks/backup_restore.md#back-up-git-repositories-concurrently).
- Backups can be created of just the repositories using the
  [skip feature](../../raketasks/backup_restore.md#excluding-specific-directories-from-the-backup).

### Target directory is empty: use a `tar` pipe

If the target directory `/mnt/gitlab/repositories` is empty, the
simplest thing to do is to use a `tar` pipe. This method has low
overhead and `tar` is almost always already installed on your system.
However, it is not possible to resume an interrupted `tar` pipe: if
that happens, all data must be copied again.

```shell
sudo -u git sh -c 'tar -C /var/opt/gitlab/git-data/repositories -cf - -- . |\
  tar -C /mnt/gitlab/repositories -xf -'
```

If you want to see progress, replace `-xf` with `-xvf`.

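If you want to try the `tar` pipe before touching real data, the following is a minimal sketch that runs it on throwaway directories and compares the result. The temporary directories and the `group/project.git` path are made up for illustration and are not part of any GitLab installation.

```shell
# Throwaway directories stand in for the real source and target (assumption).
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/group/project.git"
echo 'ref: refs/heads/master' > "$src/group/project.git/HEAD"

# Same pipe as above, minus sudo: pack the source tree and unpack it in the target.
tar -C "$src" -cf - -- . | tar -C "$dst" -xf -

# diff -r exits 0 and prints nothing when both trees are identical.
if diff -r "$src" "$dst"; then
  copy_status=identical
else
  copy_status=different
fi
echo "$copy_status"

# Clean up the throwaway directories.
rm -rf "$src" "$dst"
```

A recursive `diff` like this is only practical on small trees. On a real installation it would read every file in both locations.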
#### `tar` pipe to another server

You can also use a `tar` pipe to copy data to another server. If your
`git` user has SSH access to the new server as `git@newserver`, you
can pipe the data through SSH.

```shell
sudo -u git sh -c 'tar -C /var/opt/gitlab/git-data/repositories -cf - -- . |\
  ssh git@newserver tar -C /mnt/gitlab/repositories -xf -'
```

If you want to compress the data before it goes over the network
(which costs you CPU cycles) you can replace `ssh` with `ssh -C`.

### The target directory contains an outdated copy of the repositories: use `rsync`

DANGER: **Warning:**
Using `rsync` to migrate Git data can cause data loss and repository corruption.
[These instructions are being reviewed](https://gitlab.com/gitlab-org/gitlab/-/issues/270422).

If the target directory already contains a partial or outdated copy
of the repositories, it may be wasteful to copy all the data again
with `tar`. In this scenario it is better to use `rsync`. This utility
is either already installed on your system or easily installable
via `apt`, `yum`, and so on.

```shell
sudo -u git sh -c 'rsync -a --delete /var/opt/gitlab/git-data/repositories/. \
  /mnt/gitlab/repositories'
```

The `/.` in the command above is very important: without it, you can
easily get the wrong directory structure in the target directory.
If you want to see progress, replace `-a` with `-av`.

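To see what the trailing `/.` changes, here is a small sketch that runs both forms of the command against throwaway directories. The directory names under the temporary work area are hypothetical, and the sketch skips the copy if `rsync` is not installed.

```shell
# Throwaway work area (assumption): one source tree and two empty targets.
work=$(mktemp -d)
mkdir -p "$work/src/repositories/group/project.git" "$work/with-dot" "$work/without-dot"
touch "$work/src/repositories/group/project.git/HEAD"

if command -v rsync >/dev/null 2>&1; then
  # With "/.", the contents of repositories/ land directly in the target.
  rsync -a --delete "$work/src/repositories/." "$work/with-dot"

  # Without it, rsync creates an extra repositories/ level inside the target.
  rsync -a --delete "$work/src/repositories" "$work/without-dot"

  ls "$work/with-dot"      # group
  ls "$work/without-dot"   # repositories
fi
```

The second layout is the "wrong directory structure" case: the repositories end up one level deeper than GitLab expects.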
#### Single `rsync` to another server

DANGER: **Warning:**
Using `rsync` to migrate Git data can cause data loss and repository corruption.
[These instructions are being reviewed](https://gitlab.com/gitlab-org/gitlab/-/issues/270422).

If the `git` user on your source system has SSH access to the target
server you can send the repositories over the network with `rsync`.

```shell
sudo -u git sh -c 'rsync -a --delete /var/opt/gitlab/git-data/repositories/. \
  git@newserver:/mnt/gitlab/repositories'
```

### Thousands of Git repositories: use one `rsync` per repository

DANGER: **Warning:**
Using `rsync` to migrate Git data can cause data loss and repository corruption.
[These instructions are being reviewed](https://gitlab.com/gitlab-org/gitlab/-/issues/270422).

Every time you start an `rsync` job it has to inspect all files in
the source directory, all files in the target directory, and then
decide which files to copy. If the source or target directory
contains many files, this startup phase of `rsync` can become a burden
for your GitLab server. In cases like this you can make `rsync`'s
life easier by dividing its work into smaller pieces and syncing one
repository at a time.

In addition to `rsync` we use [GNU Parallel](http://www.gnu.org/software/parallel/).
This utility is not included in GitLab so you need to install it yourself with `apt`
or `yum`. Also note that the GitLab scripts we use below were added in GitLab 8.1.

**This process does not clean up repositories at the target location that no
longer exist at the source.**

#### Parallel `rsync` for all repositories known to GitLab

DANGER: **Warning:**
Using `rsync` to migrate Git data can cause data loss and repository corruption.
[These instructions are being reviewed](https://gitlab.com/gitlab-org/gitlab/-/issues/270422).

This syncs repositories with 10 `rsync` processes at a time. We keep
track of progress so that the transfer can be restarted if necessary.

First we create a new directory, owned by `git`, to hold transfer
logs. We assume the directory is empty before we start the transfer
procedure, and that we are the only ones writing files in it.

```shell
# Omnibus
sudo mkdir /var/opt/gitlab/transfer-logs
sudo chown git:git /var/opt/gitlab/transfer-logs

# Source
sudo -u git -H mkdir /home/git/transfer-logs
```

We seed the process with a list of the directories we want to copy.

```shell
# Omnibus
sudo -u git sh -c 'gitlab-rake gitlab:list_repos > /var/opt/gitlab/transfer-logs/all-repos-$(date +%s).txt'

# Source
cd /home/git/gitlab
sudo -u git -H sh -c 'bundle exec rake gitlab:list_repos > /home/git/transfer-logs/all-repos-$(date +%s).txt'
```

Now we can start the transfer. The command below is idempotent, and
the number of jobs done by GNU Parallel should converge to zero. If it
does not, some repositories listed in `all-repos-1234.txt` may have been
deleted or renamed before they could be copied.

```shell
# Omnibus
sudo -u git sh -c '
cat /var/opt/gitlab/transfer-logs/* | sort | uniq -u |\
  /usr/bin/env JOBS=10 \
  /opt/gitlab/embedded/service/gitlab-rails/bin/parallel-rsync-repos \
    /var/opt/gitlab/transfer-logs/success-$(date +%s).log \
    /var/opt/gitlab/git-data/repositories \
    /mnt/gitlab/repositories
'

# Source
cd /home/git/gitlab
sudo -u git -H sh -c '
cat /home/git/transfer-logs/* | sort | uniq -u |\
  /usr/bin/env JOBS=10 \
  bin/parallel-rsync-repos \
    /home/git/transfer-logs/success-$(date +%s).log \
    /home/git/repositories \
    /mnt/gitlab/repositories
'
```

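The `sort | uniq -u` step is what makes the transfer restartable. The sketch below shows the idea on made-up data. It assumes the success log records one repository path per line, in the same form as the seed list. The file names and repository paths are hypothetical.

```shell
# Throwaway directory standing in for the transfer-logs directory (assumption).
logs=$(mktemp -d)

# Seed list: every repository that should be copied (made-up names).
printf '%s\n' group/a.git group/b.git group/c.git > "$logs/all-repos-1.txt"

# Success log from an earlier, interrupted run: a.git and c.git were copied.
printf '%s\n' group/a.git group/c.git > "$logs/success-1.log"

# A path listed in both files appears twice after sorting, so `uniq -u` drops it.
# Only paths that were never logged as successful remain.
remaining=$(cat "$logs"/* | sort | uniq -u)
echo "$remaining"   # prints: group/b.git

rm -rf "$logs"
```

This is also why the directory must hold nothing but the seed list and success logs: any stray file would add lines to the input and distort the result.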
#### Parallel `rsync` only for repositories with recent activity

DANGER: **Warning:**
Using `rsync` to migrate Git data can cause data loss and repository corruption.
[These instructions are being reviewed](https://gitlab.com/gitlab-org/gitlab/-/issues/270422).

Suppose you have already done one sync that started after 2015-10-1 12:00 UTC.
Then you might only want to sync repositories that were changed via GitLab
_after_ that time. You can use the `SINCE` variable to tell
`rake gitlab:list_repos` to only print repositories with recent activity.

```shell
# Omnibus
sudo gitlab-rake gitlab:list_repos SINCE='2015-10-1 12:00 UTC' |\
  sudo -u git \
  /usr/bin/env JOBS=10 \
  /opt/gitlab/embedded/service/gitlab-rails/bin/parallel-rsync-repos \
    success-$(date +%s).log \
    /var/opt/gitlab/git-data/repositories \
    /mnt/gitlab/repositories

# Source
cd /home/git/gitlab
sudo -u git -H bundle exec rake gitlab:list_repos SINCE='2015-10-1 12:00 UTC' |\
  sudo -u git -H \
  /usr/bin/env JOBS=10 \
  bin/parallel-rsync-repos \
    success-$(date +%s).log \
    /home/git/repositories \
    /mnt/gitlab/repositories
```
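
One way to get a usable `SINCE` value is to record the start time before each sync begins. This is a minimal sketch, assuming the zero-padded form of the timestamp shown above is accepted. The variable name `sync_started` is made up for illustration.

```shell
# Record the start time in UTC before launching a sync.
sync_started=$(date -u '+%Y-%m-%d %H:%M UTC')
echo "$sync_started"

# Later, pass it to the list task. Shown as a comment because it needs a GitLab install:
#   sudo gitlab-rake gitlab:list_repos SINCE="$sync_started"
```

Taking the timestamp before the sync starts, rather than after it finishes, errs on the side of listing a repository twice instead of missing a change made during the transfer.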