---
stage: Create
group: Source Code
info: "To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments"
type: reference, howto
disqus_identifier: 'https://docs.gitlab.com/ee/workflow/git_annex.html'
---
# Git annex
WARNING:
[Git Annex support was removed](https://gitlab.com/gitlab-org/gitlab/-/issues/1648)
in GitLab 9.0. Read through the [migration guide from git-annex to Git LFS](../topics/git/lfs/migrate_from_git_annex_to_git_lfs.md).

The biggest limitation of Git, compared to some older centralized version
control systems, has been the maximum size of the repositories.

The general recommendation is to not have Git repositories larger than 1 GB to
preserve performance. Although GitLab has no limit (some repositories in GitLab
are over 50 GB!), we subscribe to the advice to keep repositories as small as
you can.

Not being able to version control large binaries is a big problem for many
larger organizations.
Videos, photos, audio, compiled binaries, and many other types of files are
often too large to check in to Git. As a workaround, people keep
artwork-in-progress in a Dropbox folder and only check in the final result.
This results in people using outdated files, lacking a complete history, and
running a higher risk of losing work.

This problem is solved in GitLab Enterprise Edition by integrating the
[git-annex](https://git-annex.branchable.com/) application.

`git-annex` allows managing large binaries with Git without checking the
contents into Git.
You check in only a symlink that contains a checksum of the large binary (a
`SHA256E` key in the example on this page). If you
need the large binary, you can sync it from the GitLab server over `rsync`, a
very fast file copying tool.
## GitLab git-annex Configuration
`git-annex` is disabled by default in GitLab. The configuration options below
are required to enable it.
### Requirements
`git-annex` must be installed on both the server and the client.

For Debian-like systems (for example, Debian and Ubuntu), run:

```shell
sudo apt-get update && sudo apt-get install git-annex
```

For RedHat-like systems (for example, CentOS and RHEL), run:

```shell
sudo yum install epel-release && sudo yum install git-annex
```
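After installation, it can help to confirm that the binary is on the `PATH` of both machines. The following is a minimal sketch, not part of the original instructions; the `check_git_annex` helper name is hypothetical, and it relies only on the `git-annex version` command.

```shell
# Hypothetical helper: report whether git-annex is on the PATH and, if so,
# print the first line of its version output.
check_git_annex() {
  if command -v git-annex >/dev/null 2>&1; then
    git-annex version | head -n 1
  else
    echo "git-annex is not installed" >&2
    return 1
  fi
}

check_git_annex || echo "install git-annex before continuing"
```

Running the same check on the server and on each client makes version mismatches easier to spot early.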
### Configuration for Omnibus packages
For Omnibus GitLab packages, only one configuration setting is needed.
The Omnibus package internally sets the correct options in all locations.

1. In `/etc/gitlab/gitlab.rb`, add the following line:

   ```ruby
   gitlab_shell['git_annex_enabled'] = true
   ```

1. Save the file and [reconfigure GitLab](restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
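
The steps above can be double-checked from a shell before reconfiguring. This is a sketch under the assumption that the setting is written on a single uncommented line; the `annex_enabled_in_rb` helper name is hypothetical.

```shell
# Hypothetical helper: succeed if the given gitlab.rb contains an uncommented
# line that sets gitlab_shell['git_annex_enabled'] to true.
annex_enabled_in_rb() {
  grep -Eq "^[[:space:]]*gitlab_shell\['git_annex_enabled'\][[:space:]]*=[[:space:]]*true" "$1"
}

config=/etc/gitlab/gitlab.rb
if [ -r "$config" ] && annex_enabled_in_rb "$config"; then
  echo "git-annex is enabled in $config; now run: sudo gitlab-ctl reconfigure"
else
  echo "git-annex is not enabled in $config (or the file is not readable)"
fi
```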
### Configuration for installations from source
There are two settings to enable git-annex on your GitLab server.
One is located in `config/gitlab.yml` of the GitLab repository and the other
is located in `config.yml` of GitLab Shell.

1. In `config/gitlab.yml`, add or edit the following lines:

   ```yaml
   gitlab_shell:
     git_annex_enabled: true
   ```

1. In `config.yml` of GitLab Shell, add or edit the following lines:

   ```yaml
   git_annex_enabled: true
   ```

1. Save the files and [restart GitLab](restart_gitlab.md#installations-from-source) for the changes to take effect.
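
Because the setting has to be present in two files, a quick check of both can catch a missed edit. The sketch below assumes the common source-installation paths under `/home/git`, which may differ on your server; the `annex_enabled_in_yaml` helper name is hypothetical.

```shell
# Hypothetical helper: succeed if the YAML file has an uncommented
# "git_annex_enabled: true" line at any indentation level.
annex_enabled_in_yaml() {
  grep -Eq '^[[:space:]]*git_annex_enabled:[[:space:]]*true' "$1"
}

# Assumed paths for an installation from source; adjust them to your layout.
for f in /home/git/gitlab/config/gitlab.yml /home/git/gitlab-shell/config.yml; do
  if [ -r "$f" ] && annex_enabled_in_yaml "$f"; then
    echo "enabled: $f"
  else
    echo "not enabled or not readable: $f"
  fi
done
```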
## Using GitLab git-annex
NOTE:
Your Git remotes must use the SSH protocol, not HTTP(S).

Here is an example workflow for uploading a very large file and then checking it
into your Git repository:
```shell
git clone git@example.com:group/project.git
cd project                        # enter the cloned repository
git annex init 'My Laptop'        # initialize the annex project and give an optional description
cp ~/tmp/debian.iso ./            # copy a large file into the current directory
git annex add debian.iso          # add the large file to git annex
git commit -am "Add Debian iso"   # commit the file metadata
git annex sync --content          # sync the Git repo and large file to the GitLab server
```
The output should look like this:
```plaintext
commit
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
ok
pull origin
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 5 (delta 2), reused 0 (delta 0)
Unpacking objects: 100% (5/5), done.
From example.com:group/project
497842b..5162f80 git-annex -> origin/git-annex
ok
(merging origin/git-annex into git-annex...)
(recording state in git...)
copy debian.iso (checking origin...) (to origin...)
SHA256E-s26214400--8092b3d482fb1b7a5cf28c43bc1425c8f2d380e86869c0686c49aa7b0f086ab2.iso
26,214,400 100% 638.88kB/s 0:00:40 (xfr#1, to-chk=0/1)
ok
pull origin
ok
(recording state in git...)
push origin
Counting objects: 15, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (13/13), done.
Writing objects: 100% (15/15), 1.64 KiB | 0 bytes/s, done.
Total 15 (delta 1), reused 0 (delta 0)
To example.com:group/project.git
* [new branch] git-annex -> synced/git-annex
* [new branch] master -> synced/master
ok
```
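Because this workflow only works over SSH remotes, it can be worth checking the remote URL before syncing. The following is a rough sketch; the `remote_protocol` helper is hypothetical and classifies URLs by their shape only.

```shell
# Hypothetical helper: classify a Git remote URL by its shape.
remote_protocol() {
  case "$1" in
    http://*|https://*) echo "http" ;;
    ssh://*|*@*:*)      echo "ssh" ;;
    *)                  echo "other" ;;
  esac
}

# Inspect the "origin" remote of the current repository, if there is one.
url="$(git remote get-url origin 2>/dev/null || true)"
echo "origin uses: $(remote_protocol "$url")"
```

If the result is `http`, switch the remote to its SSH URL (for example with `git remote set-url origin git@example.com:group/project.git`) before running `git annex sync`.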
Your files can be found in the `master` branch, but the `git annex sync`
command creates more branches (such as `synced/master` and `synced/git-annex`
in the output above).

Git Annex creates a new directory at `.git/annex/` and records its
settings in the `.git/config` file. The files you assign to be tracked
with `git-annex` don't affect the existing `.git/config` records. The files
are turned into symbolic links that point to data in `.git/annex/objects/`.

The `debian.iso` file in the example contains the symbolic link:
```plaintext
.git/annex/objects/ZW/1k/SHA256E-s82701--6384039733b5035b559efd5a2e25a493ab6e09aabfd5162cc03f6f0ec238429d.png/SHA256E-s82701--6384039733b5035b559efd5a2e25a493ab6e09aabfd5162cc03f6f0ec238429d.iso
```
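To see where an annexed file points on your own machine, you can read the link target directly. This is a small sketch that assumes the file is stored as a symbolic link, as described above; the `show_annex_target` helper name is hypothetical.

```shell
# Hypothetical helper: print the target of an annexed file's symlink,
# or report that the path is not a symbolic link.
show_annex_target() {
  if [ -L "$1" ]; then
    readlink "$1"
  else
    echo "$1 is not a symbolic link" >&2
    return 1
  fi
}

show_annex_target debian.iso || true
```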
Use `git annex info` to retrieve the information about the local copy of your
repository.

---

You can download a single large file with these commands:
```shell
git clone git@gitlab.example.com:group/project.git
cd project                 # enter the cloned repository
git annex sync             # sync Git branches but not the large file
git annex get debian.iso   # download the large file
```
To download all files:
```shell
git clone git@gitlab.example.com:group/project.git
cd project                 # enter the cloned repository
git annex sync --content   # sync Git branches and download all the large files
```
When you use `git-annex` without GitLab, anyone who can access the server can also
access the files of all projects. GitLab Annex ensures that you can access only
the files of projects you have access to (developer, maintainer, or owner role).
## How it works
Internally, GitLab uses [GitLab Shell](https://gitlab.com/gitlab-org/gitlab-shell) to handle SSH access, which made it a good
integration point for `git-annex`.
A setting in GitLab Shell lets you disable GitLab Annex support
if you want to.
## Troubleshooting tips
Differences in the version of `git-annex` on the GitLab server and on local machines
can cause `git-annex` to raise unexpected warnings and errors.

Consult the [Annex upgrade page](https://git-annex.branchable.com/upgrades/) for more information about
the differences between versions. You can find out which version is packaged
for your server's distribution by navigating to `https://pkgs.org/download/git-annex` and
searching for your distribution.

Although there is no general guide for `git-annex` errors, there are a few tips
on how to work around the warnings.
### `git-annex-shell: Not a git-annex or gcrypt repository`
This warning can appear on the initial `git annex sync --content` and is caused
by differences in `git-annex-shell`. You can read more about it
[in this git-annex forum thread](https://git-annex.branchable.com/forum/Error_from_git-annex-shell_on_creation_of_gcrypt_special_remote/).

Note that despite the warning, the `sync` succeeds
and the files are pushed to the GitLab repository.

If you encounter this warning, run the following command inside the repository
in which the warning was raised:
```shell
git config remote.origin.annex-ignore false
```
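To confirm what the command above changed, you can read the setting back with `git config --get`. This is a minimal sketch; the `annex_ignore_status` helper name is hypothetical, and it prints `unset` when the key is absent.

```shell
# Hypothetical helper: print the annex-ignore value for a remote
# (defaults to "origin"), or "unset" if the key is not configured.
annex_ignore_status() {
  git config --get "remote.${1:-origin}.annex-ignore" || echo "unset"
}

annex_ignore_status origin
```

After the fix, the value for `origin` is expected to read `false`.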
Subsequent runs of `git annex sync --content` **should not** produce this
warning, and the output should look like this:
```plaintext
commit ok
pull origin
ok
pull origin
ok
push origin
```