stage | group | info |
---|---|---|
Systems | Distribution | To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments |
Object storage (FREE SELF)
GitLab supports using an object storage service to hold many types of data. Object storage is recommended over NFS, particularly in larger setups, because it is typically much more performant, reliable, and scalable.
Options
GitLab is tightly integrated with Fog, so you can refer to its documentation to check which storage services can be integrated with GitLab.
Specifically, GitLab has been tested by vendors and customers on a number of object storage providers:
- Amazon S3
- Google Cloud Storage
- DigitalOcean Spaces
- Oracle Cloud Infrastructure
- OpenStack Swift (S3 compatible mode)
- Azure Blob storage
- On-premises hardware and appliances from various storage vendors, whose list is not officially established.
- MinIO. We have a guide to deploying this within our Helm Chart documentation.
Known compatibility issues
- Dell EMC ECS: Prior to GitLab 13.3, there is a known bug in GitLab Workhorse that prevents HTTP Range Requests from working with CI job artifacts. Be sure to upgrade to GitLab 13.3.0 or above if you use S3 storage with this hardware.
- Ceph S3 prior to Kraken 11.0.2 does not support the Upload Part Copy API. You may need to disable multi-threaded copying.
- Amazon S3 Object Lock is not supported. Follow issue #335775 for progress on enabling this option.
Configuration guides
There are two ways of specifying object storage configuration in GitLab:
- Consolidated form: A single credential is shared by all supported object types.
- Storage-specific form: Every object defines its own object storage connection and configuration.
For more information on the differences and to transition from one form to another, see Transition to consolidated form.
If you are currently storing data locally, see Migrate to object storage for migration details.
Consolidated object storage configuration
Introduced in GitLab 13.2.
Using the consolidated object storage configuration has a number of advantages:
- It can simplify your GitLab configuration since the connection details are shared across object types.
- It enables the use of encrypted S3 buckets.
- It uploads files to S3 with proper Content-MD5 headers.
Because direct upload mode must be enabled, only the following providers can be used:
- Amazon S3-compatible providers
- Google Cloud Storage
- Azure Blob storage
When consolidated object storage is used, direct upload is enabled automatically. For storage-specific configuration, direct upload may become the default because it does not require a shared folder.
Consolidated object storage configuration can't be used for backups or Mattermost. See the full table for a complete list. However, backups can be configured with server side encryption separately.
Enabling consolidated object storage enables object storage for all object types. If not all buckets are specified, sudo gitlab-ctl reconfigure may fail with an error like:
Object storage for <object type> must have a bucket specified
If you want to use local storage for specific object types, you can selectively disable object storages.
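The bucket requirement can be checked before running reconfigure. The following Ruby sketch is only an illustration, not GitLab's own validation code. The helper name `missing_buckets` and the sample hash are assumptions; the sketch only mirrors the rule described above, that every object type that is not disabled needs a bucket.

```ruby
# Hypothetical pre-flight check; not part of GitLab or Omnibus.
# Returns the object types that are enabled (assumed to be the default
# in consolidated form) but have no bucket configured.
def missing_buckets(objects)
  objects.select do |_type, settings|
    enabled = settings.fetch('enabled', true)
    bucket = settings['bucket'].to_s.strip
    enabled && bucket.empty?
  end.keys
end

# Sample data only: it imitates the per-type settings under 'objects'.
sample_objects = {
  'artifacts' => { 'bucket' => 'gitlab-artifacts' },
  'lfs' => {},                          # enabled, but no bucket
  'uploads' => { 'enabled' => false }   # disabled, so no bucket is needed
}

missing_buckets(sample_objects).each do |type|
  puts "Object storage for #{type} must have a bucket specified"
end
```

With the sample data, only lfs is reported, because uploads is explicitly disabled.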
Most types of objects, such as CI artifacts, LFS files, and upload attachments can be saved in object storage by specifying a single credential for object storage with multiple buckets.
When the consolidated form is:
- Used with an S3-compatible object storage, Workhorse uses its internal S3 client to upload files.
- Not used with an S3-compatible object storage, Workhorse falls back to using pre-signed URLs.
See the section on ETag mismatch errors for more details.
Use AWS S3
The following example uses AWS S3 to enable object storage for all supported services:
::Tabs
:::TabTitle Linux package (Omnibus)
- Edit /etc/gitlab/gitlab.rb and add the following lines, substituting the values you want:

  # Consolidated object storage configuration
  gitlab_rails['object_store']['enabled'] = true
  gitlab_rails['object_store']['proxy_download'] = true
  gitlab_rails['object_store']['connection'] = {
    'provider' => 'AWS',
    'region' => 'eu-central-1',
    'aws_access_key_id' => '<AWS_ACCESS_KEY_ID>',
    'aws_secret_access_key' => '<AWS_SECRET_ACCESS_KEY>'
  }
  # OPTIONAL: The following lines are only needed if server side encryption is required
  gitlab_rails['object_store']['storage_options'] = {
    'server_side_encryption' => '<AES256 or aws:kms>',
    'server_side_encryption_kms_key_id' => '<arn:aws:kms:xxx>'
  }
  gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'gitlab-artifacts'
  gitlab_rails['object_store']['objects']['external_diffs']['bucket'] = 'gitlab-mr-diffs'
  gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'gitlab-lfs'
  gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'gitlab-uploads'
  gitlab_rails['object_store']['objects']['packages']['bucket'] = 'gitlab-packages'
  gitlab_rails['object_store']['objects']['dependency_proxy']['bucket'] = 'gitlab-dependency-proxy'
  gitlab_rails['object_store']['objects']['terraform_state']['bucket'] = 'gitlab-terraform-state'
  gitlab_rails['object_store']['objects']['ci_secure_files']['bucket'] = 'gitlab-ci-secure-files'
  gitlab_rails['object_store']['objects']['pages']['bucket'] = 'gitlab-pages'

  If you're using AWS IAM profiles, omit the AWS access key and secret access key/value pairs. For example:

  gitlab_rails['object_store']['connection'] = {
    'provider' => 'AWS',
    'region' => 'eu-central-1',
    'use_iam_profile' => true
  }

- Save the file and reconfigure GitLab:

  sudo gitlab-ctl reconfigure
:::TabTitle Helm chart (Kubernetes)
- Put the following content in a file named object_storage.yaml to be used as a Kubernetes Secret:

  provider: AWS
  region: us-east-1
  aws_access_key_id: <AWS_ACCESS_KEY_ID>
  aws_secret_access_key: <AWS_SECRET_ACCESS_KEY>

  If you're using AWS IAM profiles, omit the AWS access key and secret access key/value pairs. For example:

  provider: AWS
  region: us-east-1
  use_iam_profile: true

- Create the Kubernetes Secret:

  kubectl create secret generic -n <namespace> gitlab-object-storage --from-file=connection=object_storage.yaml

- Export the Helm values:

  helm get values gitlab > gitlab_values.yaml

- Edit gitlab_values.yaml:

  global:
    appConfig:
      object_store:
        enabled: false
        proxy_download: true
        storage_options: {}
          # server_side_encryption:
          # server_side_encryption_kms_key_id
        connection:
          secret: gitlab-object-storage
      lfs:
        enabled: true
        proxy_download: true
        bucket: gitlab-lfs
        connection: {}
          # secret:
          # key:
      artifacts:
        enabled: true
        proxy_download: true
        bucket: gitlab-artifacts
        connection: {}
          # secret:
          # key:
      uploads:
        enabled: true
        proxy_download: true
        bucket: gitlab-uploads
        connection: {}
          # secret:
          # key:
      packages:
        enabled: true
        proxy_download: true
        bucket: gitlab-packages
        connection: {}
      externalDiffs:
        enabled: true
        when:
        proxy_download: true
        bucket: gitlab-mr-diffs
        connection: {}
      terraformState:
        enabled: true
        bucket: gitlab-terraform-state
        connection: {}
      ciSecureFiles:
        enabled: true
        bucket: gitlab-ci-secure-files
        connection: {}
      dependencyProxy:
        enabled: true
        proxy_download: true
        bucket: gitlab-dependency-proxy
        connection: {}

- Save the file and apply the new values:

  helm upgrade -f gitlab_values.yaml gitlab gitlab/gitlab
:::TabTitle Docker
- Edit docker-compose.yml:

  version: "3.6"
  services:
    gitlab:
      environment:
        GITLAB_OMNIBUS_CONFIG: |
          # Consolidated object storage configuration
          gitlab_rails['object_store']['enabled'] = true
          gitlab_rails['object_store']['proxy_download'] = true
          gitlab_rails['object_store']['connection'] = {
            'provider' => 'AWS',
            'region' => 'eu-central-1',
            'aws_access_key_id' => '<AWS_ACCESS_KEY_ID>',
            'aws_secret_access_key' => '<AWS_SECRET_ACCESS_KEY>'
          }
          # OPTIONAL: The following lines are only needed if server side encryption is required
          gitlab_rails['object_store']['storage_options'] = {
            'server_side_encryption' => '<AES256 or aws:kms>',
            'server_side_encryption_kms_key_id' => '<arn:aws:kms:xxx>'
          }
          gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'gitlab-artifacts'
          gitlab_rails['object_store']['objects']['external_diffs']['bucket'] = 'gitlab-mr-diffs'
          gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'gitlab-lfs'
          gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'gitlab-uploads'
          gitlab_rails['object_store']['objects']['packages']['bucket'] = 'gitlab-packages'
          gitlab_rails['object_store']['objects']['dependency_proxy']['bucket'] = 'gitlab-dependency-proxy'
          gitlab_rails['object_store']['objects']['terraform_state']['bucket'] = 'gitlab-terraform-state'
          gitlab_rails['object_store']['objects']['ci_secure_files']['bucket'] = 'gitlab-ci-secure-files'
          gitlab_rails['object_store']['objects']['pages']['bucket'] = 'gitlab-pages'

  If you're using AWS IAM profiles, omit the AWS access key and secret access key/value pairs. For example:

  gitlab_rails['object_store']['connection'] = {
    'provider' => 'AWS',
    'region' => 'eu-central-1',
    'use_iam_profile' => true
  }

- Save the file and restart GitLab:

  docker compose up -d
:::TabTitle Self-compiled (source)
- Edit /home/git/gitlab/config/gitlab.yml and add or amend the following lines:

  production: &base
    object_store:
      enabled: true
      proxy_download: true
      connection:
        provider: AWS
        aws_access_key_id: <AWS_ACCESS_KEY_ID>
        aws_secret_access_key: <AWS_SECRET_ACCESS_KEY>
        region: eu-central-1
      storage_options:
        server_side_encryption: <AES256 or aws:kms>
        server_side_encryption_kms_key_id: <arn:aws:kms:xxx>
      objects:
        artifacts:
          bucket: gitlab-artifacts
        external_diffs:
          bucket: gitlab-mr-diffs
        lfs:
          bucket: gitlab-lfs
        uploads:
          bucket: gitlab-uploads
        packages:
          bucket: gitlab-packages
        dependency_proxy:
          bucket: gitlab-dependency-proxy
        terraform_state:
          bucket: gitlab-terraform-state
        ci_secure_files:
          bucket: gitlab-ci-secure-files
        pages:
          bucket: gitlab-pages

  If you're using AWS IAM profiles, omit the AWS access key and secret access key/value pairs. For example:

  connection:
    provider: AWS
    region: eu-central-1
    use_iam_profile: true

- Edit /home/git/gitlab-workhorse/config.toml and add or amend the following lines:

  [object_storage]
    provider = "AWS"

  [object_storage.s3]
    aws_access_key_id = "<AWS_ACCESS_KEY_ID>"
    aws_secret_access_key = "<AWS_SECRET_ACCESS_KEY>"

  If you're using AWS IAM profiles, omit the AWS access key and secret access key/value pairs. For example:

  [object_storage.s3]
    use_iam_profile = true

- Save the file and restart GitLab:

  # For systems running systemd
  sudo systemctl restart gitlab.target

  # For systems running SysV init
  sudo service gitlab restart
::EndTabs
Common parameters
In the consolidated configuration, the object_store section defines a common set of parameters. Here we use the YAML from the source installation because it's easier to see the inheritance:

object_store:
  enabled: true
  proxy_download: true
  connection:
    provider: AWS
    aws_access_key_id: <AWS_ACCESS_KEY_ID>
    aws_secret_access_key: <AWS_SECRET_ACCESS_KEY>
  objects:
    ...
The Omnibus configuration maps directly to this:
gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['proxy_download'] = true
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',
  'aws_access_key_id' => '<AWS_ACCESS_KEY_ID>',
  'aws_secret_access_key' => '<AWS_SECRET_ACCESS_KEY>'
}
Setting | Description |
---|---|
enabled | Enable or disable object storage. |
proxy_download | Set to true to enable proxying all files served. Set to false to reduce egress traffic, because clients then download directly from remote storage instead of GitLab proxying all data. |
connection | Various connection options described below. |
storage_options | Options to use when saving new objects, such as server side encryption. Introduced in GitLab 13.3. |
objects | Object-specific configuration. |
Connection settings
Both the consolidated configuration form and the storage-specific configuration form must configure a connection. The following sections describe parameters that can be used in the connection setting.
S3-compatible connection settings
The connection settings match those provided by fog-aws:
Setting | Description | Default |
---|---|---|
provider | Always AWS for compatible hosts. | AWS |
aws_access_key_id | AWS credentials, or compatible. | |
aws_secret_access_key | AWS credentials, or compatible. | |
aws_signature_version | AWS signature version to use. 2 or 4 are valid options. DigitalOcean Spaces and other providers may need 2. | 4 |
enable_signature_v4_streaming | Set to true to enable HTTP chunked transfers with AWS v4 signatures. Oracle Cloud S3 needs this to be false. | true |
region | AWS region. | |
host | DEPRECATED: Use endpoint instead. S3 compatible host for when not using AWS. For example, localhost or storage.example.com. HTTPS and port 443 are assumed. | s3.amazonaws.com |
endpoint | Can be used when configuring an S3 compatible service such as MinIO, by entering a URL such as http://127.0.0.1:9000. This takes precedence over host. Always use endpoint for consolidated form. | (optional) |
path_style | Set to true to use host/bucket_name/object style paths instead of bucket_name.host/object. Set to true for using MinIO. Leave as false for AWS S3. | false |
use_iam_profile | Set to true to use IAM profile instead of access keys. | false |
aws_credentials_refresh_threshold_seconds | Sets the automatic refresh threshold when using temporary credentials in IAM. | 15 |
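As an illustration of the endpoint and path_style settings, here is a minimal Ruby sketch that builds a connection hash for an S3-compatible service such as MinIO. The helper name `s3_compatible_connection` is hypothetical, the endpoint is the example URL from the table, and the region default is a placeholder assumption. Adjust all of them to your provider.

```ruby
# Hypothetical helper; it only assembles the settings described in the table.
def s3_compatible_connection(endpoint:, access_key:, secret_key:, region: 'us-east-1')
  {
    'provider' => 'AWS',               # always AWS for S3-compatible hosts
    'endpoint' => endpoint,            # takes precedence over the deprecated host
    'path_style' => true,              # host/bucket_name/object style, as used with MinIO
    'region' => region,                # placeholder assumption
    'aws_access_key_id' => access_key,
    'aws_secret_access_key' => secret_key
  }
end

connection = s3_compatible_connection(
  endpoint: 'http://127.0.0.1:9000',
  access_key: '<AWS_ACCESS_KEY_ID>',
  secret_key: '<AWS_SECRET_ACCESS_KEY>'
)
puts connection.inspect
```

In /etc/gitlab/gitlab.rb, a hash of this shape would be assigned to gitlab_rails['object_store']['connection'].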
Oracle Cloud S3 connection settings
For Oracle Cloud S3, be sure to use the following settings:

Setting | Value |
---|---|
enable_signature_v4_streaming | false |
path_style | true |

If enable_signature_v4_streaming is set to true, you may see the following error in production.log:

STREAMING-AWS4-HMAC-SHA256-PAYLOAD is not supported
Google Cloud Storage (GCS)
Here are the valid connection parameters for GCS:
Setting | Description | Example |
---|---|---|
provider | Provider name. | Google |
google_project | GCP project name. | gcp-project-12345 |
google_json_key_location | JSON key path. | /path/to/gcp-project-12345-abcde.json |
google_json_key_string | JSON key string. | { "type": "service_account", "project_id": "example-project-382839", ... } |
google_application_default | Set to true to use Google Cloud Application Default Credentials to locate service account credentials. | |

GitLab reads the value of google_json_key_location, then google_json_key_string, and finally, google_application_default. It uses the first of these settings that has a value.
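The lookup order can be pictured with a short Ruby sketch. This is not GitLab's implementation: the helper name `gcs_credential_source` is hypothetical, and treating false or an empty string as "no value" is an assumption made for the illustration.

```ruby
# Settings in the order in which they are read, as described above.
GCS_CREDENTIAL_KEYS = %w[
  google_json_key_location
  google_json_key_string
  google_application_default
].freeze

# Hypothetical helper: returns the name of the first credential setting
# that has a value, or nil if none is set.
def gcs_credential_source(connection)
  GCS_CREDENTIAL_KEYS.find do |key|
    value = connection[key]
    !(value.nil? || value == false || value.to_s.empty?)
  end
end

connection = {
  'provider' => 'Google',
  'google_project' => '<GOOGLE PROJECT>',
  'google_json_key_string' => '{ "type": "service_account" }',
  'google_application_default' => true
}
puts gcs_credential_source(connection)
```

In this sample the JSON key string is used, because no key location is set and the key string comes before Application Default Credentials in the order.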
The service account must have permission to access the bucket. For more information, see the Cloud Storage authentication documentation.
NOTE: Bucket encryption with the Cloud Key Management Service (KMS) is not supported and results in ETag mismatch errors.
Google example (consolidated form)
For Omnibus installations, this is an example of the connection setting:

gitlab_rails['object_store']['connection'] = {
  'provider' => 'Google',
  'google_project' => '<GOOGLE PROJECT>',
  'google_json_key_location' => '<FILENAME>'
}
Google example with ADC (consolidated form)
Introduced in GitLab 13.6.
Google Cloud Application Default Credentials (ADC) are typically used with GitLab to use the default service account. This eliminates the need to supply credentials for the instance. For example:
gitlab_rails['object_store']['connection'] = {
  'provider' => 'Google',
  'google_project' => '<GOOGLE PROJECT>',
  'google_application_default' => true
}
If you use ADC, be sure that:
- The service account that you use has the iam.serviceAccounts.signBlob permission. Typically this is done by granting the Service Account Token Creator role to the service account.
- Your virtual machines have the correct access scopes to access Google Cloud APIs. If the machines do not have the right scope, the error logs may show:

  Google::Apis::ClientError (insufficientPermissions: Request had insufficient authentication scopes.)
Azure Blob storage
Introduced in GitLab 13.4.
Although Azure uses the word container to denote a collection of blobs, GitLab standardizes on the term bucket. Be sure to configure Azure container names in the bucket settings.
Azure Blob storage can only be used with the consolidated form because a single set of credentials is used to access multiple containers. The storage-specific form is not supported. For more details, see how to transition to consolidated form.
The following are the valid connection parameters for Azure. For more information, see the Azure Blob Storage documentation.
Setting | Description | Example |
---|---|---|
provider | Provider name. | AzureRM |
azure_storage_account_name | Name of the Azure Blob Storage account used to access the storage. | azuretest |
azure_storage_access_key | Storage account access key used to access the container. This is typically a secret, 512-bit encryption key encoded in base64. | czV2OHkvQj9FKEgrTWJRZVRoV21ZcTN0Nnc5eiRDJkYpSkBOY1JmVWpYbjJy\nNHU3eCFBJUQqRy1LYVBkU2dWaw==\n |
azure_storage_domain | Domain name used to contact the Azure Blob Storage API (optional). Defaults to blob.core.windows.net. Set this if you are using Azure China, Azure Germany, Azure US Government, or some other custom Azure domain. | blob.core.windows.net |
Azure example (consolidated form)
For Omnibus installations, this is an example of the connection setting:

gitlab_rails['object_store']['connection'] = {
  'provider' => 'AzureRM',
  'azure_storage_account_name' => '<AZURE STORAGE ACCOUNT NAME>',
  'azure_storage_access_key' => '<AZURE STORAGE ACCESS KEY>',
  'azure_storage_domain' => '<AZURE STORAGE DOMAIN>'
}
Azure Workhorse settings (source installs only)
For source installations, Workhorse also needs to be configured with Azure credentials. This isn't needed in Omnibus installs, because the Workhorse settings are populated from the previous settings.
- Edit /home/git/gitlab-workhorse/config.toml and add or amend the following lines:

  [object_storage]
    provider = "AzureRM"

  [object_storage.azurerm]
    azure_storage_account_name = "<AZURE STORAGE ACCOUNT NAME>"
    azure_storage_access_key = "<AZURE STORAGE ACCESS KEY>"

If you are using a custom Azure storage domain, azure_storage_domain does not have to be set in the Workhorse configuration. This information is exchanged in an API call between GitLab Rails and Workhorse.
Storj Gateway Configuration (SJ)
NOTE: The Storj Gateway does not support multi-threaded copying (see UploadPartCopy in the table). While an implementation is planned, you must disable multi-threaded copying until completion.
The Storj Network provides an S3-compatible API gateway. Use the following configuration example:
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',
  'endpoint' => 'https://gateway.storjshare.io',
  'path_style' => true,
  'region' => 'eu1',
  'aws_access_key_id' => 'ACCESS_KEY',
  'aws_secret_access_key' => 'SECRET_KEY',
  'aws_signature_version' => 2,
  'enable_signature_v4_streaming' => false
}

The signature version must be 2. Using v4 results in an HTTP 411 Length Required error.
For more information, see issue #4419.
Object-specific configuration
The following YAML shows how the object_store section defines object-specific configuration blocks and how the enabled and proxy_download flags can be overridden. The bucket is the only required parameter within each type:

object_store:
  connection:
    ...
  objects:
    artifacts:
      bucket: artifacts
      proxy_download: false
    external_diffs:
      bucket: external-diffs
    lfs:
      bucket: lfs-objects
    uploads:
      bucket: uploads
    packages:
      bucket: packages
    dependency_proxy:
      enabled: false
      bucket: dependency-proxy
    terraform_state:
      bucket: terraform-state
    pages:
      bucket: pages
This maps to this Omnibus GitLab configuration:
gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'artifacts'
gitlab_rails['object_store']['objects']['artifacts']['proxy_download'] = false
gitlab_rails['object_store']['objects']['external_diffs']['bucket'] = 'external-diffs'
gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'lfs-objects'
gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'uploads'
gitlab_rails['object_store']['objects']['packages']['bucket'] = 'packages'
gitlab_rails['object_store']['objects']['dependency_proxy']['enabled'] = false
gitlab_rails['object_store']['objects']['dependency_proxy']['bucket'] = 'dependency-proxy'
gitlab_rails['object_store']['objects']['terraform_state']['bucket'] = 'terraform-state'
gitlab_rails['object_store']['objects']['pages']['bucket'] = 'pages'
This is the list of valid objects that can be used:

Type | Description |
---|---|
artifacts | CI artifacts |
external_diffs | Merge request diffs |
uploads | User uploads |
lfs | Git Large File Storage objects |
packages | Project packages (for example, PyPI, Maven, or NuGet) |
dependency_proxy | Dependency Proxy |
terraform_state | Terraform state files |
pages | Pages |
Within each object type, three parameters can be defined:
Setting | Required? | Description |
---|---|---|
bucket | {check-circle} Yes | Bucket name for the object storage. |
enabled | {dotted-circle} No | Overrides the common parameter. |
proxy_download | {dotted-circle} No | Overrides the common parameter. |
Selectively disabling object storage
As seen above, object storage can be disabled for specific types by setting the enabled flag to false. For example, to disable object storage for CI artifacts:
gitlab_rails['object_store']['objects']['artifacts']['enabled'] = false
A bucket is not needed if the feature is disabled entirely. For example, no bucket is needed if CI artifacts are disabled with this setting:
gitlab_rails['artifacts_enabled'] = false
Migrate to object storage
To migrate existing local data to object storage, see the following guides:
- Job artifacts including archived job logs
- LFS objects
- Uploads
- Merge request diffs
- Packages (optional feature)
- Dependency Proxy
- Terraform state files
- Pages content
Transition to consolidated form
Prior to GitLab 13.2:
- Object storage configuration for all types of objects such as CI/CD artifacts, LFS files, and upload attachments had to be configured independently.
- Object store connection parameters such as passwords and endpoint URLs had to be duplicated for each type.
For example, an Omnibus GitLab install might have the following configuration:
# Original object storage configuration
gitlab_rails['artifacts_object_store_enabled'] = true
gitlab_rails['artifacts_object_store_direct_upload'] = true
gitlab_rails['artifacts_object_store_proxy_download'] = true
gitlab_rails['artifacts_object_store_remote_directory'] = 'artifacts'
gitlab_rails['artifacts_object_store_connection'] = { 'provider' => 'AWS', 'aws_access_key_id' => 'access_key', 'aws_secret_access_key' => 'secret' }
gitlab_rails['uploads_object_store_enabled'] = true
gitlab_rails['uploads_object_store_direct_upload'] = true
gitlab_rails['uploads_object_store_proxy_download'] = true
gitlab_rails['uploads_object_store_remote_directory'] = 'uploads'
gitlab_rails['uploads_object_store_connection'] = { 'provider' => 'AWS', 'aws_access_key_id' => 'access_key', 'aws_secret_access_key' => 'secret' }
Although this provides flexibility in that it makes it possible for GitLab to store objects across different cloud providers, it also creates additional complexity and unnecessary redundancy. Since both GitLab Rails and Workhorse components need access to object storage, the consolidated form avoids excessive duplication of credentials.
The consolidated object storage configuration is used only if all lines from the original form are omitted. To move to the consolidated form, remove the original configuration (for example, artifacts_object_store_enabled, or uploads_object_store_connection).
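Because a single leftover line from the original form keeps the consolidated form from being used, it can help to scan the configuration for such lines. The following Ruby sketch is one possible check, not a GitLab tool. The helper name `legacy_object_store_settings` and the regular expression are assumptions based on the setting names shown above.

```ruby
# Assumed pattern: storage-specific settings look like
# gitlab_rails['<type>_object_store_<option>'] at the start of a line.
LEGACY_SETTING = /\A\s*gitlab_rails\['(\w+_object_store_\w+)'\]/

# Hypothetical helper: returns the storage-specific setting names that are
# still active in the given gitlab.rb content. Commented-out lines are ignored.
def legacy_object_store_settings(config_text)
  config_text.each_line.map { |line| line[LEGACY_SETTING, 1] }.compact.uniq
end

sample = <<~CONFIG
  gitlab_rails['object_store']['enabled'] = true
  # gitlab_rails['uploads_object_store_enabled'] = true
  gitlab_rails['artifacts_object_store_enabled'] = true
CONFIG

puts legacy_object_store_settings(sample)
```

On a real Omnibus installation, the same helper could be called with File.read('/etc/gitlab/gitlab.rb'). An empty result suggests that no storage-specific lines remain.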
Storage-specific configuration
For configuring object storage in GitLab 13.1 and earlier, or for storage types not supported by consolidated configuration form, refer to the following guides:
Object storage type | Supported by consolidated configuration? |
---|---|
Backups | {dotted-circle} No |
Job artifacts including archived job logs | {check-circle} Yes |
LFS objects | {check-circle} Yes |
Uploads | {check-circle} Yes |
Container Registry (optional feature) | {dotted-circle} No |
Merge request diffs | {check-circle} Yes |
Mattermost | {dotted-circle} No |
Packages (optional feature) | {check-circle} Yes |
Dependency Proxy (optional feature) | {check-circle} Yes |
Autoscale runner caching (optional for improved performance) | {dotted-circle} No |
Terraform state files | {check-circle} Yes |
Pages content | {check-circle} Yes |
WARNING: The use of encrypted S3 buckets with non-consolidated configuration is not supported. You may start getting ETag mismatch errors if you use it.
Other alternatives to file system storage
If you're working to scale out your GitLab implementation, or add fault tolerance and redundancy, you may be looking at removing dependencies on block or network file systems. See the following additional guides:
- Make sure the git user home directory is on local disk.
- Configure database lookup of SSH keys to eliminate the need for a shared authorized_keys file.
- Prevent local disk usage for job logs.
- Disable Pages local storage.
Warnings, limitations, and known issues
Objects are not included in GitLab backups
As noted in our backup documentation, objects are not included in GitLab backups. You can enable backups with your object storage provider instead.
Use separate buckets
Using separate buckets for each data type is the recommended approach for GitLab. This ensures there are no collisions across the various types of data GitLab stores. There are plans to enable the use of a single bucket in the future.
With Omnibus and source installations it is possible to split a single real bucket into multiple virtual buckets. If your object storage bucket is called my-gitlab-objects you can configure uploads to go into my-gitlab-objects/uploads, artifacts into my-gitlab-objects/artifacts, etc. The application acts as if these are separate buckets. Use of bucket prefixes may not work correctly with Helm backups.
Helm-based installs require separate buckets to handle backup restorations.
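To make the virtual bucket naming concrete, the following Ruby sketch derives the prefixed names for a list of object types. The helper name `prefixed_buckets` is hypothetical, and the sketch only shows the naming scheme described above. It does not change any GitLab configuration.

```ruby
# Hypothetical helper: maps each object type to a prefix inside one real bucket.
def prefixed_buckets(real_bucket, object_types)
  object_types.map { |type| [type, "#{real_bucket}/#{type}"] }.to_h
end

prefixed_buckets('my-gitlab-objects', %w[uploads artifacts]).each do |type, bucket|
  puts "#{type} -> #{bucket}"
end
```

Each resulting value would be used as the bucket name for the corresponding object type.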
S3 API compatibility issues
Not all S3 providers are fully compatible with the Fog library that GitLab uses. Symptoms include an error in production.log:

411 Length Required
Proxy Download
Clients can download files in object storage by receiving a pre-signed, time-limited URL, or by GitLab proxying the data from object storage to the client. Downloading files from object storage directly helps reduce the amount of egress traffic GitLab needs to process.
When the files are stored on local block storage or NFS, GitLab has to act as a proxy. This is not the default behavior with object storage.
The proxy_download setting controls this behavior: the default is generally false. Verify this in the documentation for each use case. Set it to true if you want GitLab to proxy the files.
When not proxying files, GitLab returns an HTTP 302 redirect with a pre-signed, time-limited object storage URL. This can result in some of the following problems:
- If GitLab is using non-secure HTTP to access the object storage, clients may generate https->http downgrade errors and refuse to process the redirect. The solution to this is for GitLab to use HTTPS. LFS, for example, generates this error:

  LFS: lfsapi/client: refusing insecure redirect, https->http

- Clients need to trust the certificate authority that issued the object storage certificate, or may return common TLS errors such as:

  x509: certificate signed by unknown authority

- Clients need network access to the object storage. Network firewalls could block access. Errors that might result if this access is not in place include:

  Received status code 403 from server: Forbidden

- Object storage buckets need to allow Cross-Origin Resource Sharing (CORS) access from the URL of the GitLab instance. Attempting to load a PDF in the repository page may show the following error:

  An error occurred while loading the file. Please try again later.

  See the LFS documentation for more details.

Additionally, for a short time period, users could share pre-signed, time-limited object storage URLs with others without authentication. Also, bandwidth charges may be incurred between the object storage provider and the client.
ETag mismatch
Using the default GitLab settings, some object storage back-ends such as MinIO and Alibaba might generate ETag mismatch errors.
If you are seeing this ETag mismatch error with Amazon Web Services S3, it's likely this is due to encryption settings on your bucket. To fix this issue, you have two options:
- Use the consolidated object configuration.
- Use Amazon instance profiles.
The first option is recommended for MinIO. Otherwise, the workaround for MinIO is to use the --compat parameter on the server.
Without consolidated object store configuration or instance profiles enabled, GitLab Workhorse uploads files to S3 using pre-signed URLs that do not have a Content-MD5 HTTP header computed for them. To ensure data is not corrupted, Workhorse checks that the MD5 hash of the data sent equals the ETag header returned from the S3 server. When encryption is enabled, this is not the case, which causes Workhorse to report an ETag mismatch error during an upload.
With the consolidated object configuration and instance profile, Workhorse has S3 credentials so that it can compute the Content-MD5 header. This eliminates the need to compare ETag headers returned from the S3 server.
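The two checks described above can be sketched in a few lines of Ruby. This is an illustration only and not Workhorse's code (Workhorse is written in Go). The helper names `etag_matches?` and `content_md5` are hypothetical, and the second ETag is a made-up value standing in for what an encrypted bucket might return.

```ruby
require 'digest/md5'

# Compares the MD5 hash of the data sent with the ETag returned by the
# server. S3 wraps the ETag in double quotes, so they are stripped first.
def etag_matches?(data, etag_header)
  Digest::MD5.hexdigest(data) == etag_header.delete('"')
end

# Value for a Content-MD5 request header: the base64-encoded MD5 digest.
def content_md5(data)
  Digest::MD5.base64digest(data)
end

data = 'example artifact contents'
plain_etag = %("#{Digest::MD5.hexdigest(data)}")    # ETag equal to the MD5 hash
other_etag = '"0123456789abcdef0123456789abcdef"'   # made-up ETag that is not the MD5 hash

puts etag_matches?(data, plain_etag)  # true
puts etag_matches?(data, other_etag)  # false, reported as an ETag mismatch
puts content_md5(data)
```

When the client can send Content-MD5 itself, the server verifies the upload, so the ETag comparison is no longer needed.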
Encrypting buckets with the GCS Cloud Key Management Service (KMS) is not supported and results in ETag mismatch errors.
Using Amazon instance profiles
Instead of supplying AWS access and secret keys in object storage configuration, GitLab can be configured to use IAM roles to set up an Amazon instance profile. When this is used, GitLab fetches temporary credentials each time an S3 bucket is accessed, so no hard-coded values are needed in the configuration.
To use an Amazon instance profile, GitLab must be able to connect to the instance metadata endpoint. If GitLab is configured to use an Internet proxy, the endpoint IP address must be added to the no_proxy list.
Encrypted S3 buckets
- Introduced in GitLab 13.1 for instance profiles only and S3 default encryption.
- Introduced in GitLab 13.2 for static credentials when consolidated object storage configuration and S3 default encryption are used.
When configured either with an instance profile or with the consolidated object configuration, GitLab Workhorse properly uploads files to S3 buckets that have SSE-S3 or SSE-KMS encryption enabled by default. Customer master keys (CMKs) and SSE-C encryption are not supported since this requires sending the encryption keys in every request.
Server-side encryption headers
Introduced in GitLab 13.3.
Setting a default encryption on an S3 bucket is the easiest way to enable encryption, but you may want to set a bucket policy to ensure only encrypted objects are uploaded. To do this, you must configure GitLab to send the proper encryption headers in the storage_options configuration section:

Setting | Description |
---|---|
server_side_encryption | Encryption mode (AES256 or aws:kms). |
server_side_encryption_kms_key_id | Amazon Resource Name. Only needed when aws:kms is used in server_side_encryption. See the Amazon documentation on using KMS encryption. |
As with the case for default encryption, these options only work when the Workhorse S3 client is enabled. One of the following two conditions must be fulfilled:
- use_iam_profile is true in the connection settings.
- Consolidated object storage settings are in use.
ETag mismatch errors occur if server side encryption headers are used without enabling the Workhorse S3 client.
IAM Permissions
To set up an instance profile:
- Create an AWS Identity and Access Management (IAM) role with the necessary permissions. The following example is a role for an S3 bucket named test-bucket:

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "VisualEditor0",
        "Effect": "Allow",
        "Action": [
          "s3:PutObject",
          "s3:GetObject",
          "s3:DeleteObject"
        ],
        "Resource": "arn:aws:s3:::test-bucket/*"
      }
    ]
  }

- Attach this role to the EC2 instance hosting your GitLab instance.
- Configure GitLab to use it via the use_iam_profile configuration option.
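Because separate buckets are recommended for each data type, the policy usually has to cover more than one bucket. The following Ruby sketch generates a policy with the same actions as the example above for a list of buckets. The helper name `s3_policy` is hypothetical, and the bucket names are the sample names used earlier on this page.

```ruby
require 'json'

# Hypothetical helper: builds a policy document that allows the three
# object actions from the example on every bucket passed in.
def s3_policy(buckets)
  {
    'Version' => '2012-10-17',
    'Statement' => [
      {
        'Effect' => 'Allow',
        'Action' => %w[s3:PutObject s3:GetObject s3:DeleteObject],
        'Resource' => buckets.map { |bucket| "arn:aws:s3:::#{bucket}/*" }
      }
    ]
  }
end

puts JSON.pretty_generate(s3_policy(%w[gitlab-artifacts gitlab-lfs gitlab-uploads]))
```

The printed JSON can be reviewed and adapted before it is attached to the role.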
Multi-threaded copying
GitLab uses the S3 Upload Part Copy API to accelerate the copying of files within a bucket. Ceph S3 prior to Kraken 11.0.2 does not support this and returns a 404 error when files are copied during the upload process.
The feature can be disabled using the :s3_multithreaded_uploads feature flag. To disable the feature, ask a GitLab administrator with Rails console access to run the following command:
Feature.disable(:s3_multithreaded_uploads)
Migrate objects to a different object storage provider
You may need to migrate GitLab data in object storage to a different object storage provider. The following steps show you how to do this using Rclone.
The steps assume you are moving the uploads bucket, but the same process works for other buckets.
Prerequisites:
- Choose the computer to run Rclone on. Depending on how much data you are migrating, Rclone may have to run for a long time so you should avoid using a laptop or desktop computer that can go into power saving. You can use your GitLab server to run Rclone.

- Install Rclone.
- Configure Rclone by running the following:

  rclone config

  The configuration process is interactive. Add at least two "remotes": one for the object storage provider your data is currently on (old), and one for the provider you are moving to (new).
- Verify that you can read the old data. The following example refers to the uploads bucket, but your bucket may have a different name:

  rclone ls old:uploads | head

  This should print a partial list of the objects currently stored in your uploads bucket. If you get an error, or if the list is empty, go back and update your Rclone configuration using rclone config.
- Perform an initial copy. You do not need to take your GitLab server offline for this step.

  rclone sync -P old:uploads new:uploads

- After the first sync completes, use the web UI or command-line interface of your new object storage provider to verify that there are objects in the new bucket. If there are none, or if you encounter an error while running rclone sync, check your Rclone configuration and try again.
After you have done at least one successful Rclone copy from the old location to the new location, schedule maintenance and take your GitLab server offline. During your maintenance window you must do two things:
- Perform a final rclone sync run, knowing that your users cannot add new objects so you do not leave any behind in the old bucket.
- Update the object storage configuration of your GitLab server to use the new provider for uploads.