---
stage: Enablement
group: Distribution
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# Pseudonymizer **(ULTIMATE)**

Your GitLab database contains sensitive information. To protect this information
when you run analytics on your database, you can use the Pseudonymizer service, which:

1. Uses `HMAC(SHA256)` to mutate fields containing sensitive information.
1. Preserves references (referential integrity) between fields.
1. Exports your GitLab data, scrubbed of sensitive material.

WARNING:
If the source data is available, users can compare and correlate the scrubbed data
with the original.

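
The following is a minimal sketch of how keyed hashing can scrub values while preserving
references. It is not the Pseudonymizer's actual implementation: the `pseudonymize` helper,
the secret key, and the sample rows are hypothetical and only illustrate the `HMAC(SHA256)`
idea described above.

```ruby
require 'openssl'

# Hypothetical secret key, for illustration only.
SECRET_KEY = 'example-secret-key'

# Returns the hex-encoded HMAC-SHA256 digest of a field value.
def pseudonymize(value, key: SECRET_KEY)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, value.to_s)
end

# Two hypothetical tables that reference the same user ID.
users  = [{ id: 42, email: 'alice@example.com' }]
issues = [{ id: 7, author_id: 42 }]

scrubbed_users = users.map do |row|
  { id: pseudonymize(row[:id]), email: pseudonymize(row[:email]) }
end

scrubbed_issues = issues.map do |row|
  { id: pseudonymize(row[:id]), author_id: pseudonymize(row[:author_id]) }
end

# The same input and key always produce the same digest,
# so the scrubbed foreign key still matches the scrubbed primary key.
puts scrubbed_issues.first[:author_id] == scrubbed_users.first[:id] # prints true
```

Because the mapping is deterministic, rows can still be joined after scrubbing. The same
property is one way scrubbed records can be lined up against the original data when that
data is available.
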
To generate a pseudonymized data set:

1. [Configure Pseudonymizer](#configure-pseudonymizer) fields and output location.
1. [Enable Pseudonymizer data collection](#enable-pseudonymizer-data-collection).
1. Optional. [Generate a data set manually](#generate-data-set-manually).

## Configure Pseudonymizer

To use the Pseudonymizer, configure both the fields you want to pseudonymize and the
location to store the scrubbed data:

1. **Create a manifest file**: This file describes the fields to include or pseudonymize.
   - **Default manifest** - GitLab provides a default manifest in your GitLab installation
     ([example `manifest.yml` file](https://gitlab.com/gitlab-org/gitlab/-/blob/master/config/pseudonymizer.yml)).
     To use the default manifest file, use the `config/pseudonymizer.yml` relative path
     when you configure the connection parameters.
   - **Custom manifest** - To use a custom manifest file, use the absolute path to
     the file when you configure the connection parameters.
1. **Configure connection parameters**: In the configuration method appropriate for
   your GitLab installation, specify the [object storage](object_storage.md)
   connection parameters (`pseudonymizer.upload.connection`).

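
To illustrate what a manifest that describes "fields to include or pseudonymize" might
drive, here is a hedged sketch. The manifest layout (`tables`, `whitelist`, `pseudo`), the
`scrub_row` helper, the key, and the sample row are all assumptions made for this example.
Check the default manifest in your installation for the real structure. The sketch assumes
Ruby 2.6 or later.

```ruby
require 'openssl'
require 'yaml'

# Hypothetical manifest fragment. The key names are assumptions, not a
# verified copy of config/pseudonymizer.yml.
MANIFEST = YAML.safe_load(<<~YAML)
  tables:
    users:
      whitelist:
        - id
        - email
        - created_at
      pseudo:
        - id
        - email
YAML

# Hypothetical secret key, for illustration only.
KEY = 'example-secret-key'

# Keeps only the columns listed under `whitelist` and replaces the ones
# also listed under `pseudo` with their HMAC-SHA256 hex digest.
def scrub_row(table, row, manifest: MANIFEST, key: KEY)
  rules = manifest.fetch('tables').fetch(table)
  pseudo = rules.fetch('pseudo', [])

  rules.fetch('whitelist').to_h do |column|
    value = row[column]
    if pseudo.include?(column)
      value = OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, value.to_s)
    end
    [column, value]
  end
end

row = {
  'id' => 42,
  'email' => 'alice@example.com',
  'created_at' => '2021-01-01',
  'encrypted_password' => 'not-exported'
}

scrubbed = scrub_row('users', row)
puts scrubbed.keys.inspect
```

In this sketch, a column that is not listed at all (`encrypted_password`) is dropped, a
column listed only for inclusion (`created_at`) passes through unchanged, and a column
marked for pseudonymization (`id`, `email`) is exported as a digest.
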
**For Omnibus installations:**

1. Edit `/etc/gitlab/gitlab.rb` and add the following lines, replacing the example
   values with your own:

   ```ruby
   gitlab_rails['pseudonymizer_manifest'] = 'config/pseudonymizer.yml'
   gitlab_rails['pseudonymizer_upload_remote_directory'] = 'gitlab-elt' # bucket name
   gitlab_rails['pseudonymizer_upload_connection'] = {
     'provider' => 'AWS',
     'region' => 'eu-central-1',
     'aws_access_key_id' => 'AWS_ACCESS_KEY_ID',
     'aws_secret_access_key' => 'AWS_SECRET_ACCESS_KEY'
   }
   ```

   If you are using AWS IAM profiles, omit the AWS access key ID and secret access key,
   and set `use_iam_profile` to `true`:

   ```ruby
   gitlab_rails['pseudonymizer_upload_connection'] = {
     'provider' => 'AWS',
     'region' => 'eu-central-1',
     'use_iam_profile' => true
   }
   ```

1. Save the file and [reconfigure GitLab](restart_gitlab.md#omnibus-gitlab-reconfigure)
   for the changes to take effect.

---

**For installations from source:**

1. Edit `/home/git/gitlab/config/gitlab.yml` and add or amend the following
   lines:

   ```yaml
   pseudonymizer:
     manifest: config/pseudonymizer.yml
     upload:
       remote_directory: 'gitlab-elt' # bucket name
       connection:
         provider: AWS
         aws_access_key_id: AWS_ACCESS_KEY_ID
         aws_secret_access_key: AWS_SECRET_ACCESS_KEY
         region: eu-central-1
   ```

1. Save the file and [restart GitLab](restart_gitlab.md#installations-from-source)
   for the changes to take effect.

## Enable Pseudonymizer data collection

To enable data collection:

1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Settings > Metrics and Profiling**, then expand
   **Pseudonymizer data collection**.
1. Select **Enable Pseudonymizer data collection**.
1. Select **Save changes**.

## Generate data set manually

You can also run the Pseudonymizer manually:

1. Set these environment variables:
   - `PSEUDONYMIZER_OUTPUT_DIR` - Where to store the output CSV files. Defaults to `/tmp`.
     The output CSV files can be quite large. Make sure the directory
     can store a file at least 10% of the size of your database.
   - `PSEUDONYMIZER_BATCH` - The batch size when querying the database. Defaults to `100000`.
1. Run the command appropriate for your installation:
   - **Omnibus GitLab**:
     `sudo gitlab-rake gitlab:db:pseudonymizer`
   - **Installations from source**:
     `sudo -u git -H bundle exec rake gitlab:db:pseudonymizer RAILS_ENV=production`

After you run the command, upload the output CSV files to your configured object
storage. After the upload completes, delete the output files from the local disk.

## Related topics

- [Using object storage with GitLab](object_storage.md).