---
stage: Create
group: Gitaly
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
type: howto
---

# Reduce repository size

Git repositories become larger over time. When large files are added to a Git repository:

- Fetching the repository becomes slower because everyone must download the files.
- They take up a large amount of storage space on the server.
- Git repository storage limits [can be reached](#storage-limits).

Rewriting a repository can remove unwanted history to make the repository smaller.
[`git filter-repo`](https://github.com/newren/git-filter-repo) is a tool for quickly rewriting Git
repository history, and is recommended over both:

- [`git filter-branch`](https://git-scm.com/docs/git-filter-branch).
- [BFG](https://rtyley.github.io/bfg-repo-cleaner/).

DANGER: **Danger:**
Rewriting repository history is a destructive operation. Make sure to back up your repository before
you begin. The best way to back up a repository is to
[export the project](../settings/import_export.md#exporting-a-project-and-its-data).

## Purge files from repository history

To make cloning your project faster, rewrite branches and tags to remove unwanted files.

1. [Install `git filter-repo`](https://github.com/newren/git-filter-repo/blob/master/INSTALL.md)
   using a supported package manager or from source.

1. Clone a fresh copy of the repository using `--bare`:

   ```shell
   git clone --bare https://example.gitlab.com/my/project.git
   ```

1. Using `git filter-repo`, purge any files from the history of your repository.

   To purge all large files, the `--strip-blobs-bigger-than` option can be used:

   ```shell
   git filter-repo --strip-blobs-bigger-than 10M
   ```

   To purge specific large files by path, the `--path` and `--invert-paths` options can be combined:

   ```shell
   git filter-repo --path path/to/big/file.m4v --invert-paths
   ```

   See the
   [`git filter-repo` documentation](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
   for more examples and the complete documentation.

1. Running `git filter-repo` removes all remotes. To restore the remote for your project, run:

   ```shell
   git remote add origin https://example.gitlab.com/<namespace>/<project_name>.git
   ```

1. Force push your changes to overwrite all branches on GitLab:

   ```shell
   git push origin --force --all
   ```

   [Protected branches](../protected_branches.md) will cause this to fail. To proceed, you must
   remove branch protection, push, and then re-enable protected branches.

1. To remove large files from tagged releases, force push your changes to all tags on GitLab:

   ```shell
   git push origin --force --tags
   ```

   [Protected tags](../protected_tags.md) will cause this to fail. To proceed, you must remove tag
   protection, push, and then re-enable protected tags.
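
Before choosing a size threshold or a path for the purge step above, it can help to see which
blobs in the history are the largest. The following is a minimal sketch that uses only plain Git
plumbing commands, not `git filter-repo` itself. The function name `list_largest_blobs` and its
arguments are hypothetical and chosen for illustration.

```shell
# List the N largest blobs reachable from any ref, largest first.
# Each output line is: <size in bytes> <path>
# Arguments (hypothetical): $1 = repository directory, $2 = number of lines (default 10).
list_largest_blobs() {
  repo_dir="$1"
  count="${2:-10}"
  # rev-list prints "<sha> <path>" for every object reachable from any ref.
  git -C "$repo_dir" rev-list --objects --all |
    # cat-file reports type and uncompressed size; %(rest) carries the path through.
    git -C "$repo_dir" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    # Keep only blobs and drop the type column.
    sed -n 's/^blob //p' |
    sort -k1,1nr |
    head -n "$count"
}

# Example call, assuming the bare clone from the steps above is in ./project.git:
#   list_largest_blobs project.git 20
```

The sizes reported are uncompressed blob sizes, so they may differ from the space the objects
occupy inside packfiles.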

## Purge files from GitLab storage

To reduce the size of your repository in GitLab, you must remove GitLab internal references to
commits that contain large files. Before completing these steps,
[purge files from your repository history](#purge-files-from-repository-history).

As well as [branches](branches/index.md) and tags, which are types of Git ref, GitLab automatically
creates other refs. These refs prevent dead links to commits, or missing diffs when viewing merge
requests. [Repository cleanup](#repository-cleanup) can be used to remove these from GitLab.

The following internal refs are not advertised:

- `refs/merge-requests/*` for merge requests.
- `refs/pipelines/*` for
  [pipelines](../../../ci/pipelines/index.md#troubleshooting-fatal-reference-is-not-a-tree).
- `refs/environments/*` for environments.

This means they are not usually included when fetching, which makes fetching faster. In addition,
`refs/keep-around/*` are hidden refs that prevent commits with discussion from being deleted, and
they cannot be fetched at all.

However, these refs can be accessed from the Git bundle inside a project export.
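
To see which of these internal refs a bundle actually contains, the refs can be listed with
`git bundle list-heads`. This is a minimal sketch, assuming the bundle comes from a project export
as described in this section. The function name `list_internal_refs` is hypothetical.

```shell
# Print the GitLab internal refs contained in a bundle file, one per line.
# Argument (hypothetical): $1 = path to the bundle, for example project.bundle.
list_internal_refs() {
  bundle_file="$1"
  # list-heads prints "<sha> <refname>" for every ref stored in the bundle.
  git bundle list-heads "$bundle_file" |
    awk '{ print $2 }' |
    # Keep only the ref namespaces named in this section.
    grep -E '^refs/(merge-requests|pipelines|environments|keep-around)/'
}

# Example call:
#   list_internal_refs /path/to/project.bundle
```

The function returns a non-zero status when the bundle contains none of these refs.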

1. [Install `git filter-repo`](https://github.com/newren/git-filter-repo/blob/master/INSTALL.md)
   using a supported package manager or from source.
1. Generate a fresh [export from the
   project](../settings/import_export.md#exporting-a-project-and-its-data) and download it.
1. Decompress the backup using `tar`:

   ```shell
   tar xzf project-backup.tar.gz
   ```

   The extracted archive contains a `project.bundle` file, which was created by
   [`git bundle`](https://git-scm.com/docs/git-bundle).
1. Clone a fresh copy of the repository from the bundle:

   ```shell
   git clone --bare --mirror /path/to/project.bundle
   ```

1. Using `git filter-repo`, purge any files from the history of your repository. Because we are
   trying to remove internal refs, we will rely on the `commit-map` produced by each run to tell us
   which internal refs to remove.

   NOTE: **Note:**
   `git filter-repo` creates a new `commit-map` file every run, and overwrites the `commit-map` from
   the previous run. You will need this file from **every** run. Do the next step every time you run
   `git filter-repo`.

   To purge all large files, the `--strip-blobs-bigger-than` option can be used:

   ```shell
   git filter-repo --strip-blobs-bigger-than 10M
   ```

   To purge specific large files by path, the `--path` and `--invert-paths` options can be combined:

   ```shell
   git filter-repo --path path/to/big/file.m4v --invert-paths
   ```

   See the
   [`git filter-repo` documentation](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
   for more examples and the complete documentation.

1. Run a [repository cleanup](#repository-cleanup).
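
Because each run overwrites the previous `commit-map`, one way to keep the file from every run is
to copy it aside immediately after each `git filter-repo` invocation. The following is a minimal
sketch. It assumes the file is written to `filter-repo/commit-map` inside the Git directory, and the
function name `save_commit_map` and the numbered file names are hypothetical.

```shell
# Copy the commit-map from the most recent `git filter-repo` run into a
# separate directory under a numbered name, so a later run cannot overwrite it.
# Arguments (hypothetical): $1 = repository directory, $2 = destination directory.
# Assumption: the file lives at <git-dir>/filter-repo/commit-map.
save_commit_map() {
  repo_dir="$1"
  dest_dir="$2"
  git_dir=$(git -C "$repo_dir" rev-parse --absolute-git-dir) || return 1
  mkdir -p "$dest_dir"
  # Number the copy after the copies that already exist.
  run_number=$(( $(find "$dest_dir" -name 'commit-map-*' | wc -l) + 1 ))
  cp "$git_dir/filter-repo/commit-map" "$dest_dir/commit-map-$run_number"
}

# Example call, run after every `git filter-repo` invocation:
#   save_commit_map project.git ./commit-maps
```

Each saved copy can then be used with repository cleanup.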

## Repository cleanup

> [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/19376) in GitLab 11.6.

Repository cleanup allows you to upload a text file of objects and GitLab will remove internal Git
references to these objects. You can use
[`git filter-repo`](https://github.com/newren/git-filter-repo) to produce a list of objects (in a
`commit-map` file) that can be used with repository cleanup.

To clean up a repository:

1. Go to the project for the repository.
1. Navigate to **{settings}** **Settings > Repository**.
1. Upload a list of objects. For example, a `commit-map` file.
1. Click **Start cleanup**.

This will:

- Remove any internal Git references to old commits.
- Run `git gc` against the repository.

You will receive an email once it has completed.

When using repository cleanup, note:

- Housekeeping prunes loose objects older than 2 weeks. This means objects added in the last 2 weeks
  will not be removed immediately. If you have access to the
  [Gitaly](../../../administration/gitaly/index.md) server, you may run `git gc --prune=now` to
  prune all loose objects immediately.
- This process will remove some copies of the rewritten commits from GitLab's cache and database,
  but there are still numerous gaps in coverage and some of the copies may persist indefinitely.
  [Clearing the instance cache](../../../administration/raketasks/maintenance.md#clear-redis-cache)
  may help to remove some of them, but it should not be depended on for security purposes!
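
To check whether `git gc --prune=now` actually freed space, the object storage size can be compared
before and after the command. The following is a minimal sketch based on `git count-objects -v`.
The function name `repo_object_size_kib` is hypothetical, and the figure covers only loose objects
and packfiles, not everything GitLab counts toward repository size.

```shell
# Print the on-disk size of a repository's objects in KiB
# (loose objects plus packfiles), as reported by `git count-objects -v`.
# Argument (hypothetical): $1 = repository directory.
repo_object_size_kib() {
  git -C "$1" count-objects -v |
    # Sum the "size:" (loose) and "size-pack:" (packed) lines only.
    awk '/^size(-pack)?:/ { total += $2 } END { print total + 0 }'
}

# Example comparison around a prune:
#   before=$(repo_object_size_kib /path/to/repo.git)
#   git -C /path/to/repo.git gc --prune=now
#   after=$(repo_object_size_kib /path/to/repo.git)
#   echo "Freed $((before - after)) KiB"
```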

## Storage limits

Repository size limits:

- Can [be set by an administrator](../../admin_area/settings/account_and_limit_settings.md#repository-size-limit-starter-only)
  on self-managed instances. **(STARTER ONLY)**
- Are [set for GitLab.com](../../gitlab_com/index.md#repository-size-limit).

When a project has reached its size limit, you cannot:

- Push to the project.
- Create a new merge request.
- Merge existing merge requests.
- Upload LFS objects.

You can still:

- Create new issues.
- Clone the project.

If you exceed the repository size limit, you might try to:

1. Remove some data.
1. Make a new commit.
1. Push back to the repository.

Perhaps you might also:

- Move some blobs to LFS.
- Remove some old dependency updates from history.

Unfortunately, this workflow won't work. Deleting files in a commit doesn't actually reduce the size
of the repository because the earlier commits and blobs still exist.
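
The point about earlier commits can be checked locally: after a file is deleted in a new commit,
its blob is still reachable from history. The following is a minimal sketch using plain Git. The
function name `path_in_history` is hypothetical.

```shell
# Succeed if an object with the given path is still reachable from any ref,
# even when the file no longer exists in the latest commit.
# Arguments (hypothetical): $1 = repository directory, $2 = path inside the repository.
path_in_history() {
  repo_dir="$1"
  file_path="$2"
  # rev-list prints "<sha> <path>"; keep the path column and look for an exact match.
  git -C "$repo_dir" rev-list --objects --all |
    cut -d' ' -f2- |
    grep -Fx -- "$file_path" >/dev/null
}

# Example: returns success after `git rm big.bin && git commit`,
# because the commit that added big.bin is still part of history.
#   path_in_history /path/to/repo big.bin && echo "still stored in history"
```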

What you need to do is rewrite history. We recommend the open-source community-maintained tool
[`git filter-repo`](https://github.com/newren/git-filter-repo).

NOTE: **Note:**
Until `git gc` runs on the GitLab side, the "removed" commits and blobs will still exist. You also
must be able to push the rewritten history to GitLab, which may be impossible if you've already
exceeded the maximum size limit.

To lift these restrictions, the administrator of the self-managed GitLab instance must
increase the limit on the particular project that exceeded it. Therefore, it's always better to
proactively stay underneath the limit. If you hit the limit, and can't have it temporarily
increased, your only option is to:

1. Prune all unneeded data locally.
1. Create a new project on GitLab and start using that instead.
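
The second step above can be sketched as follows, assuming the new project is empty and the local
repository has already been pruned. The function name `push_to_new_project` and the remote name
`new-origin` are hypothetical.

```shell
# Push every branch and tag of a locally pruned repository to a new, empty project.
# Arguments (hypothetical): $1 = local repository directory, $2 = URL of the new project.
push_to_new_project() {
  repo_dir="$1"
  new_url="$2"
  # "new-origin" is an arbitrary remote name chosen for this sketch.
  git -C "$repo_dir" remote add new-origin "$new_url" &&
    git -C "$repo_dir" push --quiet new-origin --all &&
    git -C "$repo_dir" push --quiet new-origin --tags
}

# Example call, with placeholder values:
#   push_to_new_project ./project.git https://example.gitlab.com/<namespace>/<new_project_name>.git
```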

CAUTION: **Caution:**
This process is not suitable for removing sensitive data like passwords or keys from your repository.
Information about commits, including file content, is cached in the database, and will remain
visible even after the commits have been removed from the repository.

<!-- ## Troubleshooting
Include any troubleshooting steps that you can foresee. If you know beforehand what issues
one might have when setting this up, or when something is changed, or on upgrading, it's
important to describe those, too. Think of things that may go wrong and include them here.
This is important to minimize requests for support, and to avoid doc comments with
questions that you know someone might ask.
Each scenario can be a third-level heading, e.g. `### Getting error message X`.
If you have none to add when creating a doc, leave this section in place
but commented out to help encourage others to add to it in the future. -->