debian-mirror-gitlab/doc/administration/object_storage.md
2022-03-02 08:16:31 +05:30


Object storage (FREE SELF)

GitLab supports using an object storage service for holding numerous types of data. It's recommended over NFS, and in general it's better in larger setups because object storage is typically much more performant, reliable, and scalable.

Options

GitLab has been tested by vendors and customers on a number of object storage providers:

Known compatibility issues

Configuration guides

There are two ways of specifying object storage configuration in GitLab:

For more information on the differences and to transition from one form to another, see Transition to consolidated form.

If you are currently storing data locally, see Migrate to object storage for migration details.

Consolidated object storage configuration

Introduced in GitLab 13.2.

Using the consolidated object storage configuration has a number of advantages:

Because direct upload mode must be enabled, only the following providers can be used:

When consolidated object storage is used, direct upload is enabled automatically. Background upload is not supported. For storage-specific configuration, direct upload may become the default because it does not require a shared folder.

Consolidated object storage configuration can't be used for backups or Mattermost. See the full table for a complete list. However, backups can be configured with server side encryption separately.

Enabling consolidated object storage enables object storage for all object types. If not all buckets are specified, sudo gitlab-ctl reconfigure may fail with an error like:

Object storage for <object type> must have a bucket specified

If you want to use local storage for specific object types, you can selectively disable object storage.
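The bucket requirement above can be checked before running reconfigure. The following is a minimal sketch, not GitLab code: `missing_buckets` and `OBJECT_TYPES` are hypothetical names, and the sketch assumes that an object type with `enabled` set to `false` does not need a bucket.

```ruby
# Hypothetical pre-flight check, not part of GitLab.
# Object types covered by the consolidated form, as listed in this document.
OBJECT_TYPES = %w[artifacts external_diffs lfs uploads packages
                  dependency_proxy terraform_state pages].freeze

# Returns the object types that are enabled but have no bucket configured.
# Assumption: a type with 'enabled' => false is stored locally and needs no bucket.
def missing_buckets(objects)
  OBJECT_TYPES.select do |type|
    settings = objects.fetch(type, {})
    enabled = settings.fetch('enabled', true)
    enabled && settings['bucket'].to_s.strip.empty?
  end
end

objects = {
  'artifacts' => { 'bucket' => 'artifacts' },
  'lfs' => { 'bucket' => 'lfs-objects' },
  'uploads' => { 'enabled' => false } # kept on local storage
}

missing_buckets(objects).each do |type|
  puts "Object storage for #{type} must have a bucket specified"
end
```

Running a check like this against the `objects` hash you intend to configure shows which types still need a bucket or an explicit `enabled` override.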

Most types of objects, such as CI artifacts, LFS files, upload attachments, and so on can be saved in object storage by specifying a single credential for object storage with multiple buckets.

When the consolidated form is:

  • Used with an S3-compatible object storage, Workhorse uses its internal S3 client to upload files.
  • Not used with an S3-compatible object storage, Workhorse falls back to using pre-signed URLs.

See the section on ETag mismatch errors for more details.

In Omnibus installations:

  1. Edit /etc/gitlab/gitlab.rb and add the following lines, substituting the values you want:

    # Consolidated object storage configuration
    gitlab_rails['object_store']['enabled'] = true
    gitlab_rails['object_store']['proxy_download'] = true
    gitlab_rails['object_store']['connection'] = {
      'provider' => 'AWS',
      'region' => '<eu-central-1>',
      'aws_access_key_id' => '<AWS_ACCESS_KEY_ID>',
      'aws_secret_access_key' => '<AWS_SECRET_ACCESS_KEY>'
    }
    # OPTIONAL: The following lines are only needed if server side encryption is required
    gitlab_rails['object_store']['storage_options'] = {
      'server_side_encryption' => '<AES256 or aws:kms>',
      'server_side_encryption_kms_key_id' => '<arn:aws:kms:xxx>'
    }
    gitlab_rails['object_store']['objects']['artifacts']['bucket'] = '<artifacts>'
    gitlab_rails['object_store']['objects']['external_diffs']['bucket'] = '<external-diffs>'
    gitlab_rails['object_store']['objects']['lfs']['bucket'] = '<lfs-objects>'
    gitlab_rails['object_store']['objects']['uploads']['bucket'] = '<uploads>'
    gitlab_rails['object_store']['objects']['packages']['bucket'] = '<packages>'
    gitlab_rails['object_store']['objects']['dependency_proxy']['bucket'] = '<dependency-proxy>'
    gitlab_rails['object_store']['objects']['terraform_state']['bucket'] = '<terraform-state>'
    gitlab_rails['object_store']['objects']['pages']['bucket'] = '<pages>'
    

    If you're using AWS IAM profiles, omit the AWS access key and secret access key/value pairs. For example:

    gitlab_rails['object_store']['connection'] = {
      'provider' => 'AWS',
      'region' => '<eu-central-1>',
      'use_iam_profile' => true
    }
    
  2. Save the file and reconfigure GitLab for the changes to take effect.

In installations from source:

  1. Edit /home/git/gitlab/config/gitlab.yml and add or amend the following lines:

    object_store:
      enabled: true
      proxy_download: true
      connection:
        provider: AWS
        aws_access_key_id: <AWS_ACCESS_KEY_ID>
        aws_secret_access_key: <AWS_SECRET_ACCESS_KEY>
        region: <eu-central-1>
      storage_options:
        server_side_encryption: <AES256 or aws:kms>
        server_side_encryption_kms_key_id: <arn:aws:kms:xxx>
      objects:
        artifacts:
          bucket: <artifacts>
        external_diffs:
          bucket: <external-diffs>
        lfs:
          bucket: <lfs-objects>
        uploads:
          bucket: <uploads>
        packages:
          bucket: <packages>
        dependency_proxy:
          bucket: <dependency-proxy>
        terraform_state:
          bucket: <terraform-state>
        pages:
          bucket: <pages>
    
  2. Edit /home/git/gitlab-workhorse/config.toml and add or amend the following lines:

    [object_storage]
      provider = "AWS"
    
    [object_storage.s3]
      aws_access_key_id = "<AWS_ACCESS_KEY_ID>"
      aws_secret_access_key = "<AWS_SECRET_ACCESS_KEY>"
    
  3. Save the file and restart GitLab for the changes to take effect.

Common parameters

In the consolidated configuration, the object_store section defines a common set of parameters. Here we use the YAML from the source installation because it's easier to see the inheritance:

    object_store:
      enabled: true
      proxy_download: true
      connection:
        provider: AWS
        aws_access_key_id: <AWS_ACCESS_KEY_ID>
        aws_secret_access_key: <AWS_SECRET_ACCESS_KEY>
      objects:
        ...

The Omnibus configuration maps directly to this:

gitlab_rails['object_store']['enabled'] = true
gitlab_rails['object_store']['proxy_download'] = true
gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',
  'aws_access_key_id' => '<AWS_ACCESS_KEY_ID>',
  'aws_secret_access_key' => '<AWS_SECRET_ACCESS_KEY>'
}

| Setting | Description |
|---------|-------------|
| enabled | Enable or disable object storage. |
| proxy_download | Set to true to proxy all files served through GitLab. Set to false to reduce egress traffic, because clients then download directly from remote storage instead of GitLab proxying all data. |
| connection | Various connection options described below. |
| storage_options | Options to use when saving new objects, such as server side encryption. Introduced in GitLab 13.3. |
| objects | Object-specific configuration. |

Connection settings

Both the consolidated configuration form and the storage-specific configuration form must configure a connection. The following sections describe parameters that can be used in the connection setting.

S3-compatible connection settings

The connection settings match those provided by fog-aws:

| Setting | Description | Default |
|---------|-------------|---------|
| provider | Always AWS for compatible hosts. | AWS |
| aws_access_key_id | AWS credentials, or compatible. | |
| aws_secret_access_key | AWS credentials, or compatible. | |
| aws_signature_version | AWS signature version to use. 2 or 4 are valid options. DigitalOcean Spaces and other providers may need 2. | 4 |
| enable_signature_v4_streaming | Set to true to enable HTTP chunked transfers with AWS v4 signatures. Oracle Cloud S3 needs this to be false. | true |
| region | AWS region. | |
| host | S3 compatible host for when not using AWS. For example, localhost or storage.example.com. HTTPS and port 443 are assumed. | s3.amazonaws.com |
| endpoint | Can be used when configuring an S3 compatible service such as MinIO, by entering a URL such as http://127.0.0.1:9000. This takes precedence over host. | (optional) |
| path_style | Set to true to use host/bucket_name/object style paths instead of bucket_name.host/object. Leave as false for AWS S3. | false |
| use_iam_profile | Set to true to use IAM profile instead of access keys. | false |

Oracle Cloud S3 connection settings

When using Oracle Cloud S3, be sure to use the following settings:

| Setting | Value |
|---------|-------|
| enable_signature_v4_streaming | false |
| path_style | true |

If enable_signature_v4_streaming is set to true, you may see the following error in production.log:

STREAMING-AWS4-HMAC-SHA256-PAYLOAD is not supported
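As an illustration, the two Oracle Cloud settings could be combined with the S3-compatible connection settings roughly as follows. This is a sketch only: every value in angle brackets is a placeholder, the use of `endpoint` for the Oracle URL is an assumption, and the first line is a stand-in for the `gitlab_rails` hash that Omnibus provides in `/etc/gitlab/gitlab.rb`.

```ruby
# Stand-in so this snippet runs on its own; in /etc/gitlab/gitlab.rb
# the gitlab_rails hash already exists and this line must be omitted.
gitlab_rails = Hash.new { |hash, key| hash[key] = {} }

gitlab_rails['object_store']['connection'] = {
  'provider' => 'AWS',                                 # always AWS for S3-compatible hosts
  'region' => '<REGION>',                              # placeholder
  'aws_access_key_id' => '<ACCESS_KEY_ID>',            # placeholder
  'aws_secret_access_key' => '<SECRET_ACCESS_KEY>',    # placeholder
  'endpoint' => '<ORACLE S3-COMPATIBLE ENDPOINT URL>', # placeholder, assumed setting
  'enable_signature_v4_streaming' => false,            # required for Oracle Cloud S3
  'path_style' => true                                 # required for Oracle Cloud S3
}
```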

Google Cloud Storage (GCS)

Here are the valid connection parameters for GCS:

| Setting | Description | Example |
|---------|-------------|---------|
| provider | Provider name. | Google |
| google_project | GCP project name. | gcp-project-12345 |
| google_client_email | Email address of the service account. | foo@gcp-project-12345.iam.gserviceaccount.com |
| google_json_key_location | JSON key path. | /path/to/gcp-project-12345-abcde.json |
| google_application_default | Set to true to use Google Cloud Application Default Credentials to locate service account credentials. | |

The service account must have permission to access the bucket. Learn more in Google's Cloud Storage authentication documentation.

NOTE: Bucket encryption with the Cloud Key Management Service (KMS) is not supported and will result in ETag mismatch errors.

Google example (consolidated form)

For Omnibus installations, this is an example of the connection setting:

gitlab_rails['object_store']['connection'] = {
  'provider' => 'Google',
  'google_project' => '<GOOGLE PROJECT>',
  'google_client_email' => '<GOOGLE CLIENT EMAIL>',
  'google_json_key_location' => '<FILENAME>'
}

Google example with ADC (consolidated form)

Introduced in GitLab 13.6.

Google Cloud Application Default Credentials (ADC) are typically used with GitLab to use the default service account. This eliminates the need to supply credentials for the instance. For example:

gitlab_rails['object_store']['connection'] = {
  'provider' => 'Google',
  'google_project' => '<GOOGLE PROJECT>',
  'google_application_default' => true
}

If you use ADC, be sure that:

Azure Blob storage

Introduced in GitLab 13.4.

Although Azure uses the word container to denote a collection of blobs, GitLab standardizes on the term bucket. Be sure to configure Azure container names in the bucket settings.

Azure Blob storage can only be used with the consolidated form because a single set of credentials is used to access multiple containers. The storage-specific form is not supported. For more details, see how to transition to consolidated form.

The following are the valid connection parameters for Azure. Read the Azure Blob storage documentation to learn more.

| Setting | Description | Example |
|---------|-------------|---------|
| provider | Provider name. | AzureRM |
| azure_storage_account_name | Name of the Azure Blob Storage account used to access the storage. | azuretest |
| azure_storage_access_key | Storage account access key used to access the container. This is typically a secret, 512-bit encryption key encoded in base64. | czV2OHkvQj9FKEgrTWJRZVRoV21ZcTN0Nnc5eiRDJkYpSkBOY1JmVWpYbjJy\nNHU3eCFBJUQqRy1LYVBkU2dWaw==\n |
| azure_storage_domain | Domain name used to contact the Azure Blob Storage API (optional). Defaults to blob.core.windows.net. Set this if you are using Azure China, Azure Germany, Azure US Government, or some other custom Azure domain. | blob.core.windows.net |

Azure example (consolidated form)

For Omnibus installations, this is an example of the connection setting:

gitlab_rails['object_store']['connection'] = {
  'provider' => 'AzureRM',
  'azure_storage_account_name' => '<AZURE STORAGE ACCOUNT NAME>',
  'azure_storage_access_key' => '<AZURE STORAGE ACCESS KEY>',
  'azure_storage_domain' => '<AZURE STORAGE DOMAIN>'
}

Azure Workhorse settings (source installs only)

For source installations, Workhorse also needs to be configured with Azure credentials. This isn't needed in Omnibus installs, because the Workhorse settings are populated from the previous settings.

  1. Edit /home/git/gitlab-workhorse/config.toml and add or amend the following lines:

    [object_storage]
      provider = "AzureRM"
    
    [object_storage.azurerm]
      azure_storage_account_name = "<AZURE STORAGE ACCOUNT NAME>"
      azure_storage_access_key = "<AZURE STORAGE ACCESS KEY>"
    

If you are using a custom Azure storage domain, azure_storage_domain does not have to be set in the Workhorse configuration. This information is exchanged in an API call between GitLab Rails and Workhorse.

OpenStack-compatible connection settings

Although OpenStack Swift provides S3 compatibility, some users may want to use the Swift API.

This isn't compatible with the consolidated object storage form. OpenStack Swift is supported only with the storage-specific form. If you want to use the consolidated form, see the S3 settings.

Here are the valid connection settings for the Swift API, provided by fog-openstack:

| Setting | Description | Default |
|---------|-------------|---------|
| provider | Always OpenStack for compatible hosts. | OpenStack |
| openstack_username | OpenStack username. | |
| openstack_api_key | OpenStack API key. | |
| openstack_temp_url_key | OpenStack key for generating temporary URLs. | |
| openstack_auth_url | OpenStack authentication endpoint. | |
| openstack_region | OpenStack region. | |
| openstack_tenant | OpenStack tenant ID. | |
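Because the Swift API works only with the storage-specific form, the connection hash goes into a per-type setting. The sketch below uses the artifacts settings shown elsewhere on this page; all values in angle brackets are placeholders, and the first line is a stand-in for the `gitlab_rails` hash that Omnibus provides.

```ruby
# Stand-in so this snippet runs on its own; omit this line in /etc/gitlab/gitlab.rb.
gitlab_rails = {}

# Storage-specific form for CI artifacts, using the Swift API.
gitlab_rails['artifacts_object_store_enabled'] = true
gitlab_rails['artifacts_object_store_remote_directory'] = 'artifacts'
gitlab_rails['artifacts_object_store_connection'] = {
  'provider' => 'OpenStack',
  'openstack_username' => '<OPENSTACK USERNAME>',
  'openstack_api_key' => '<OPENSTACK API KEY>',
  'openstack_temp_url_key' => '<OPENSTACK TEMP URL KEY>',
  'openstack_auth_url' => '<OPENSTACK AUTH URL>',
  'openstack_region' => '<OPENSTACK REGION>',
  'openstack_tenant' => '<OPENSTACK TENANT ID>'
}
```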

Rackspace Cloud Files

The following table describes the valid connection parameters for Rackspace Cloud, provided by fog-rackspace.

This isn't compatible with the consolidated object storage form. Rackspace Cloud is supported only with the storage-specific form.

| Setting | Description | Example |
|---------|-------------|---------|
| provider | Provider name. | Rackspace |
| rackspace_username | Username of the Rackspace account with access to the container. | joe.smith |
| rackspace_api_key | API key of the Rackspace account with access to the container. | ABC123DEF456ABC123DEF456ABC123DE |
| rackspace_region | Rackspace storage region to use, a three letter code from the list of service access endpoints. | iad |
| rackspace_temp_url_key | Private key you set in the Rackspace API for temporary URLs. | ABC123DEF456ABC123DEF456ABC123DE |

Regardless of whether the container has public access enabled or disabled, Fog uses the TempURL method to grant access to LFS objects. If you see error messages in logs that refer to instantiating storage with a temp-url-key, be sure you have set the key properly both in the Rackspace API and in gitlab.rb. You can verify the value of the key Rackspace has set by sending a GET request with a token header to the service access endpoint URL and comparing the output of the returned headers.
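A missing temp URL key on the GitLab side can be caught with a simple check of the connection hash. This is a hypothetical helper, not part of GitLab or Fog, and it assumes that all five settings from the table above must be present.

```ruby
# Hypothetical helper, not part of GitLab or Fog.
# Assumption: every setting from the Rackspace table is required.
RACKSPACE_KEYS = %w[provider rackspace_username rackspace_api_key
                    rackspace_region rackspace_temp_url_key].freeze

# Returns the names of settings that are absent or blank.
def missing_rackspace_settings(connection)
  RACKSPACE_KEYS.select { |key| connection[key].to_s.strip.empty? }
end

connection = {
  'provider' => 'Rackspace',
  'rackspace_username' => 'joe.smith',
  'rackspace_api_key' => 'ABC123DEF456ABC123DEF456ABC123DE',
  'rackspace_region' => 'iad'
  # rackspace_temp_url_key intentionally left out
}

puts missing_rackspace_settings(connection).inspect
```

This only confirms that a key is set in the GitLab configuration. Whether it matches the key stored in the Rackspace API still has to be verified with the GET request described above.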

Object-specific configuration

The following YAML shows how the object_store section defines object-specific configuration blocks and how the enabled and proxy_download flags can be overridden. The bucket is the only required parameter within each type:

    object_store:
      connection:
        ...
      objects:
        artifacts:
          bucket: artifacts
          proxy_download: false
        external_diffs:
          bucket: external-diffs
        lfs:
          bucket: lfs-objects
        uploads:
          bucket: uploads
        packages:
          bucket: packages
        dependency_proxy:
          enabled: false
          bucket: dependency-proxy
        terraform_state:
          bucket: terraform-state
        pages:
          bucket: pages

This maps to this Omnibus GitLab configuration:

gitlab_rails['object_store']['objects']['artifacts']['bucket'] = 'artifacts'
gitlab_rails['object_store']['objects']['artifacts']['proxy_download'] = false
gitlab_rails['object_store']['objects']['external_diffs']['bucket'] = 'external-diffs'
gitlab_rails['object_store']['objects']['lfs']['bucket'] = 'lfs-objects'
gitlab_rails['object_store']['objects']['uploads']['bucket'] = 'uploads'
gitlab_rails['object_store']['objects']['packages']['bucket'] = 'packages'
gitlab_rails['object_store']['objects']['dependency_proxy']['enabled'] = false
gitlab_rails['object_store']['objects']['dependency_proxy']['bucket'] = 'dependency-proxy'
gitlab_rails['object_store']['objects']['terraform_state']['bucket'] = 'terraform-state'
gitlab_rails['object_store']['objects']['pages']['bucket'] = 'pages'

This is the list of valid objects that can be used:

| Type | Description |
|------|-------------|
| artifacts | CI artifacts |
| external_diffs | Merge request diffs |
| uploads | User uploads |
| lfs | Git Large File Storage objects |
| packages | Project packages (for example, PyPI, Maven, or NuGet) |
| dependency_proxy | Dependency Proxy |
| terraform_state | Terraform state files |
| pages | Pages |

Within each object type, three parameters can be defined:

| Setting | Required? | Description |
|---------|-----------|-------------|
| bucket | Yes | Bucket name for the object storage. |
| enabled | No | Overrides the common parameter. |
| proxy_download | No | Overrides the common parameter. |
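The override behavior can be pictured as a merge of the common parameters with each object type's own block. The following is a simplified sketch of that idea, not GitLab's implementation; `effective_settings` and `OVERRIDABLE` are hypothetical names.

```ruby
# Simplified model of how per-type settings override the common parameters.
# Not GitLab's implementation; names here are hypothetical.
OVERRIDABLE = %w[enabled proxy_download].freeze

# Returns a hash of object type => effective settings.
def effective_settings(object_store)
  common = object_store.slice(*OVERRIDABLE)
  object_store.fetch('objects', {}).transform_values do |overrides|
    # Per-type values win over the common ones; bucket has no common default.
    common.merge(overrides.slice('bucket', *OVERRIDABLE))
  end
end

object_store = {
  'enabled' => true,
  'proxy_download' => true,
  'objects' => {
    'artifacts' => { 'bucket' => 'artifacts', 'proxy_download' => false },
    'lfs' => { 'bucket' => 'lfs-objects' },
    'dependency_proxy' => { 'bucket' => 'dependency-proxy', 'enabled' => false }
  }
}

effective_settings(object_store).each do |type, settings|
  puts "#{type}: #{settings}"
end
```

In this example, artifacts keep object storage enabled but stop proxying downloads, LFS inherits both common values, and the Dependency Proxy falls back to local storage.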

Selectively disabling object storage

As seen above, object storage can be disabled for specific types by setting the enabled flag to false. For example, to disable object storage for CI artifacts:

gitlab_rails['object_store']['objects']['artifacts']['enabled'] = false

A bucket is not needed if the feature is disabled entirely. For example, no bucket is needed if CI artifacts are disabled with this setting:

gitlab_rails['artifacts_enabled'] = false

Migrate to object storage

To migrate existing local data to object storage, see the following guides:

Transition to consolidated form

Prior to GitLab 13.2:

  • Object storage configuration for all types of objects such as CI/CD artifacts, LFS files, upload attachments, and so on had to be configured independently.
  • Object store connection parameters such as passwords and endpoint URLs had to be duplicated for each type.

For example, an Omnibus GitLab install might have the following configuration:

# Original object storage configuration
gitlab_rails['artifacts_object_store_enabled'] = true
gitlab_rails['artifacts_object_store_direct_upload'] = true
gitlab_rails['artifacts_object_store_proxy_download'] = true
gitlab_rails['artifacts_object_store_remote_directory'] = 'artifacts'
gitlab_rails['artifacts_object_store_connection'] = { 'provider' => 'AWS', 'aws_access_key_id' => 'access_key', 'aws_secret_access_key' => 'secret' }
gitlab_rails['uploads_object_store_enabled'] = true
gitlab_rails['uploads_object_store_direct_upload'] = true
gitlab_rails['uploads_object_store_proxy_download'] = true
gitlab_rails['uploads_object_store_remote_directory'] = 'uploads'
gitlab_rails['uploads_object_store_connection'] = { 'provider' => 'AWS', 'aws_access_key_id' => 'access_key', 'aws_secret_access_key' => 'secret' }

Although this provides flexibility in that it makes it possible for GitLab to store objects across different cloud providers, it also creates additional complexity and unnecessary redundancy. Since both GitLab Rails and Workhorse components need access to object storage, the consolidated form avoids excessive duplication of credentials.

The consolidated object storage configuration is used only if all lines from the original form are omitted. To move to the consolidated form, remove the original configuration (for example, artifacts_object_store_enabled, or uploads_object_store_connection).
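Leftover lines from the original form are easy to miss in a long gitlab.rb. As a rough aid, a helper like the following could list them. It is a hypothetical sketch, not a GitLab tool, and it assumes that every original-form setting follows the `<type>_object_store_<option>` naming seen in the example above.

```ruby
# Hypothetical helper, not a GitLab tool.
# Assumption: original-form settings are named <type>_object_store_<option>.
LEGACY_KEY = /\A[a-z_]+_object_store_[a-z_]+\z/.freeze

# Returns the setting names that belong to the original (storage-specific) form.
def legacy_object_store_keys(settings)
  settings.keys.grep(LEGACY_KEY)
end

settings = {
  'object_store' => { 'enabled' => true },
  'artifacts_object_store_enabled' => true,
  'uploads_object_store_connection' => { 'provider' => 'AWS' }
}

legacy_object_store_keys(settings).each do |key|
  puts "Remove before using the consolidated form: #{key}"
end
```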

Storage-specific configuration

For configuring object storage in GitLab 13.1 and earlier, or for storage types not supported by consolidated configuration form, refer to the following guides:

| Object storage type | Supported by consolidated configuration? |
|---------------------|------------------------------------------|
| Backups | No |
| Job artifacts including archived job logs | Yes |
| LFS objects | Yes |
| Uploads | Yes |
| Container Registry (optional feature) | No |
| Merge request diffs | Yes |
| Mattermost | No |
| Packages (optional feature) | Yes |
| Dependency Proxy (optional feature) | Yes |
| Pseudonymizer (optional feature) | No |
| Autoscale runner caching (optional for improved performance) | No |
| Terraform state files | Yes |
| Pages content | Yes |

Other alternatives to file system storage

If you're working to scale out your GitLab implementation, or add fault tolerance and redundancy, you may be looking at removing dependencies on block or network file systems. See the following additional guides:

  1. Make sure the git user home directory is on local disk.
  2. Configure database lookup of SSH keys to eliminate the need for a shared authorized_keys file.
  3. Prevent local disk usage for job logs.

Warnings, limitations, and known issues

Objects are not included in GitLab backups

As noted in our backup documentation, objects are not included in GitLab backups. You can enable backups with your object storage provider instead.

Use separate buckets

Using separate buckets for each data type is the recommended approach for GitLab. This ensures there are no collisions across the various types of data GitLab stores. There are plans to enable the use of a single bucket in the future.

Helm-based installs require separate buckets to handle backup restorations.
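To confirm that no two object types point at the same bucket, the object-specific configuration can be grouped by bucket name. This is a minimal sketch with a hypothetical helper name, `shared_buckets`, and is not part of GitLab.

```ruby
# Hypothetical helper, not part of GitLab.
# Returns a hash of bucket name => object types, for buckets used more than once.
def shared_buckets(objects)
  objects
    .group_by { |_type, settings| settings['bucket'] }
    .reject { |bucket, entries| bucket.nil? || entries.size < 2 }
    .transform_values { |entries| entries.map(&:first) }
end

objects = {
  'artifacts' => { 'bucket' => 'gitlab-data' },
  'lfs' => { 'bucket' => 'gitlab-data' },
  'uploads' => { 'bucket' => 'uploads' }
}

shared_buckets(objects).each do |bucket, types|
  puts "Bucket #{bucket} is shared by: #{types.join(', ')}"
end
```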

S3 API compatibility issues

Not all S3 providers are fully compatible with the Fog library that GitLab uses. Symptoms include an error in production.log:

411 Length Required

Proxy Download

Clients can download files in object storage by receiving a pre-signed, time-limited URL, or by GitLab proxying the data from object storage to the client. Downloading files from object storage directly helps reduce the amount of egress traffic GitLab needs to process.

When the files are stored on local block storage or NFS, GitLab has to act as a proxy. This is not the default behavior with object storage.

The proxy_download setting controls this behavior: the default is generally false. Verify this in the documentation for each use case. Set it to true if you want GitLab to proxy the files.

When not proxying files, GitLab returns an HTTP 302 redirect with a pre-signed, time-limited object storage URL. This can result in some of the following problems:

  • If GitLab is using non-secure HTTP to access the object storage, clients may generate https->http downgrade errors and refuse to process the redirect. The solution to this is for GitLab to use HTTPS. LFS, for example, generates this error:

    LFS: lfsapi/client: refusing insecure redirect, https->http
    
  • Clients need to trust the certificate authority that issued the object storage certificate, or may return common TLS errors such as:

    x509: certificate signed by unknown authority
    
  • Clients need network access to the object storage. Network firewalls could block access. Errors that might result if this access is not in place include:

    Received status code 403 from server: Forbidden
    

Getting a 403 Forbidden response is specifically called out in the package repository documentation as a side effect of how some build tools work.

Additionally, for a short time period, users could share pre-signed, time-limited object storage URLs with others without authentication. Bandwidth charges may also be incurred between the object storage provider and the client.
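The two download modes described in this section can be modeled roughly as follows. This is an illustrative sketch only, not GitLab's code: `Response`, `download_response`, and the example URL are all hypothetical.

```ruby
# Illustrative model of the two download modes; not GitLab's code.
Response = Struct.new(:status, :location, :body)

# proxy_download: value of the proxy_download setting
# presigned_url:  pre-signed, time-limited object storage URL (hypothetical value)
# fetch_object:   callable that reads the object from object storage
def download_response(proxy_download:, presigned_url:, fetch_object:)
  if proxy_download
    # GitLab reads the object and streams it to the client itself.
    Response.new(200, nil, fetch_object.call)
  else
    # The client is redirected and must reach object storage on its own.
    Response.new(302, presigned_url, nil)
  end
end

url = 'https://storage.example.com/lfs-objects/abc?signature=placeholder'
reader = -> { 'file contents' }

puts download_response(proxy_download: false, presigned_url: url, fetch_object: reader).status
puts download_response(proxy_download: true, presigned_url: url, fetch_object: reader).status
```

The redirect branch is where the HTTPS downgrade, certificate trust, and firewall problems listed above can appear, because the client contacts object storage directly.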

ETag mismatch

Using the default GitLab settings, some object storage back-ends such as MinIO and Alibaba might generate ETag mismatch errors.

If you are seeing this ETag mismatch error with Amazon Web Services S3, it's likely this is due to encryption settings on your bucket. To fix this issue, you have two options:

The first option is recommended for MinIO. Otherwise, the workaround for MinIO is to use the --compat parameter on the server.

Without consolidated object store configuration or instance profiles enabled, GitLab Workhorse uploads files to S3 using pre-signed URLs that do not have a Content-MD5 HTTP header computed for them. To ensure data is not corrupted, Workhorse checks that the MD5 hash of the data sent equals the ETag header returned from the S3 server. When encryption is enabled, this is not the case, which causes Workhorse to report an ETag mismatch error during an upload.

With the consolidated object configuration and instance profile, Workhorse has S3 credentials so that it can compute the Content-MD5 header. This eliminates the need to compare ETag headers returned from the S3 server.
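The MD5 comparison that Workhorse performs for pre-signed uploads can be illustrated with a few lines of Ruby. This is a sketch of the idea, not Workhorse's implementation (Workhorse is written in Go), and `etag_matches?` is a hypothetical name.

```ruby
require 'digest'

# Sketch of the integrity check: the MD5 of the uploaded data is compared
# with the ETag header returned by the S3 server. Not Workhorse's actual code.
def etag_matches?(data, etag_header)
  # ETag values are usually wrapped in double quotes.
  Digest::MD5.hexdigest(data) == etag_header.delete('"').downcase
end

data = 'example upload'
plain_etag = "\"#{Digest::MD5.hexdigest(data)}\""

puts etag_matches?(data, plain_etag)          # unencrypted object: ETag is the MD5
puts etag_matches?(data, '"not-an-md5-hash"') # ETag that is not the MD5 of the data
```

When bucket encryption changes the ETag so that it is no longer the MD5 of the data, a comparison like this fails, which is what Workhorse reports as an ETag mismatch.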

Encrypting buckets with GCS' Cloud Key Management Service (KMS) is not supported and will result in ETag mismatch errors.

Using Amazon instance profiles

Instead of supplying AWS access and secret keys in object storage configuration, GitLab can be configured to use IAM roles to set up an Amazon instance profile. When this is used, GitLab fetches temporary credentials each time an S3 bucket is accessed, so no hard-coded values are needed in the configuration.

Encrypted S3 buckets

When configured either with an instance profile or with the consolidated object configuration, GitLab Workhorse properly uploads files to S3 buckets that have SSE-S3 or SSE-KMS encryption enabled by default. Customer master keys (CMKs) and SSE-C encryption are not supported since this requires sending the encryption keys in every request.

Server-side encryption headers

Introduced in GitLab 13.3.

Setting a default encryption on an S3 bucket is the easiest way to enable encryption, but you may want to set a bucket policy to ensure only encrypted objects are uploaded. To do this, you must configure GitLab to send the proper encryption headers in the storage_options configuration section:

| Setting | Description |
|---------|-------------|
| server_side_encryption | Encryption mode (AES256 or aws:kms). |
| server_side_encryption_kms_key_id | Amazon Resource Name. Only needed when aws:kms is used in server_side_encryption. See the Amazon documentation on using KMS encryption. |

As with the case for default encryption, these options only work when the Workhorse S3 client is enabled. One of the following two conditions must be fulfilled:

  • use_iam_profile is true in the connection settings.
  • Consolidated object storage settings are in use.

ETag mismatch errors occur if server side encryption headers are used without enabling the Workhorse S3 client.
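The two conditions above can be expressed as a small check to run against a planned configuration. This is a hypothetical sketch, not GitLab code; the helper names are invented for illustration, and it assumes an S3 (AWS provider) connection.

```ruby
# Hypothetical check, not GitLab code. Assumes an S3 (AWS provider) connection.
# The Workhorse S3 client is in use when the consolidated form is used
# or when use_iam_profile is true in the connection settings.
def workhorse_s3_client_enabled?(consolidated:, connection:)
  consolidated || connection['use_iam_profile'] == true
end

# Encryption headers are only safe to configure when the S3 client is enabled.
def encryption_headers_safe?(consolidated:, connection:, storage_options:)
  return true if storage_options['server_side_encryption'].to_s.empty?

  workhorse_s3_client_enabled?(consolidated: consolidated, connection: connection)
end

connection = { 'provider' => 'AWS', 'use_iam_profile' => false }
storage_options = { 'server_side_encryption' => 'AES256' }

puts encryption_headers_safe?(consolidated: false, connection: connection,
                              storage_options: storage_options)
puts encryption_headers_safe?(consolidated: true, connection: connection,
                              storage_options: storage_options)
```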

IAM Permissions

To set up an instance profile:

  1. Create an AWS Identity and Access Management (IAM) role with the necessary permissions. The following example is a role for an S3 bucket named test-bucket:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject"
                ],
                "Resource": "arn:aws:s3:::test-bucket/*"
            }
        ]
    }
    
  2. Attach this role to the EC2 instance hosting your GitLab instance.

  3. Configure GitLab to use it via the use_iam_profile configuration option.
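Because GitLab recommends a separate bucket for each object type, the policy usually has to cover several buckets. The following sketch generates a policy document with the same three actions for a list of bucket names. The helper name `s3_policy` and the bucket names are assumptions for illustration.

```ruby
require 'json'

# Builds an IAM policy document granting the three object-level actions
# from the example above on every listed bucket. Hypothetical helper.
def s3_policy(buckets)
  {
    'Version' => '2012-10-17',
    'Statement' => [
      {
        'Effect' => 'Allow',
        'Action' => %w[s3:PutObject s3:GetObject s3:DeleteObject],
        'Resource' => buckets.map { |bucket| "arn:aws:s3:::#{bucket}/*" }
      }
    ]
  }
end

# Example bucket names; substitute the buckets you configured.
puts JSON.pretty_generate(s3_policy(%w[artifacts lfs-objects uploads]))
```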

Multi-threaded copying

GitLab uses the S3 Upload Part Copy API to accelerate the copying of files within a bucket. Ceph S3 prior to Kraken 11.0.2 does not support this and returns a 404 error when files are copied during the upload process.

The feature can be disabled using the :s3_multithreaded_uploads feature flag. To disable the feature, ask a GitLab administrator with Rails console access to run the following command:

Feature.disable(:s3_multithreaded_uploads)