508 lines
19 KiB
Markdown
508 lines
19 KiB
Markdown
# Geo database replication **(PREMIUM ONLY)**
|
|
|
|
NOTE: **Note:**
|
|
The following steps are for Omnibus installs only. Using Geo with source-based installs was **deprecated** in GitLab 11.5.
|
|
|
|
NOTE: **Note:**
|
|
If your GitLab installation uses external (not managed by Omnibus) PostgreSQL
|
|
instances, the Omnibus roles will not be able to perform all necessary
|
|
configuration steps. In this case,
|
|
[follow the Geo with external PostgreSQL instances document instead](external_database.md).
|
|
|
|
NOTE: **Note:**
|
|
The stages of the setup process must be completed in the documented order.
|
|
Before attempting the steps in this stage, [complete all prior stages][toc].
|
|
|
|
This document describes the minimal steps you have to take in order to
|
|
replicate your **primary** GitLab database to a **secondary** node's database. You may
|
|
have to change some values according to your database setup, how big it is, etc.
|
|
|
|
You are encouraged to first read through all the steps before executing them
|
|
in your testing/production environment.
|
|
|
|
## PostgreSQL replication
|
|
|
|
The GitLab **primary** node where the write operations happen will connect to
|
|
the **primary** database server, and **secondary** nodes will
|
|
connect to their own database servers (which are also read-only).
|
|
|
|
NOTE: **Note:**
|
|
In database documentation, you may see "**primary**" being referenced as "master"
|
|
and "**secondary**" as either "slave" or "standby" server (read-only).
|
|
|
|
We recommend using [PostgreSQL replication slots][replication-slots-article]
|
|
to ensure that the **primary** node retains all the data necessary for the **secondary** nodes to
|
|
recover. See below for more details.
|
|
|
|
The following guide assumes that:
|
|
|
|
- You are using Omnibus and therefore you are using PostgreSQL 9.6 or later
|
|
which includes the [`pg_basebackup` tool][pgback] and improved
|
|
[Foreign Data Wrapper][FDW] support.
|
|
- You have a **primary** node already set up (the GitLab server you are
|
|
replicating from), running Omnibus' PostgreSQL (or equivalent version), and
|
|
you have a new **secondary** server set up with the same versions of the OS,
|
|
PostgreSQL, and GitLab on all nodes.
|
|
|
|
CAUTION: **Warning:**
|
|
Geo works with streaming replication. Logical replication is not supported at this time.
|
|
There is an [issue where support is being discussed](https://gitlab.com/gitlab-org/gitlab-ee/issues/7420).
|
|
|
|
### Step 1. Configure the **primary** server
|
|
|
|
1. SSH into your GitLab **primary** server and login as root:
|
|
|
|
```sh
|
|
sudo -i
|
|
```
|
|
|
|
1. Execute the command below to define the node as **primary** node:
|
|
|
|
```sh
|
|
gitlab-ctl set-geo-primary-node
|
|
```
|
|
|
|
This command will use your defined `external_url` in `/etc/gitlab/gitlab.rb`.
|
|
|
|
1. GitLab 10.4 and up only: Do the following to make sure the `gitlab` database user has a password defined:
|
|
|
|
Generate a MD5 hash of the desired password:
|
|
|
|
```sh
|
|
gitlab-ctl pg-password-md5 gitlab
|
|
# Enter password: <your_password_here>
|
|
# Confirm password: <your_password_here>
|
|
# fca0b89a972d69f00eb3ec98a5838484
|
|
```
|
|
|
|
Edit `/etc/gitlab/gitlab.rb`:
|
|
|
|
```ruby
|
|
# Fill with the hash generated by `gitlab-ctl pg-password-md5 gitlab`
|
|
postgresql['sql_user_password'] = '<md5_hash_of_your_password>'
|
|
|
|
# Every node that runs Unicorn or Sidekiq needs to have the database
|
|
# password specified as below. If you have a high-availability setup, this
|
|
# must be present in all application nodes.
|
|
gitlab_rails['db_password'] = '<your_password_here>'
|
|
```
|
|
|
|
1. Omnibus GitLab already has a [replication user]
|
|
called `gitlab_replicator`. You must set the password for this user manually.
|
|
You will be prompted to enter a password:
|
|
|
|
```sh
|
|
gitlab-ctl set-replication-password
|
|
```
|
|
|
|
This command will also read the `postgresql['sql_replication_user']` Omnibus
|
|
setting in case you have changed `gitlab_replicator` username to something
|
|
else.
|
|
|
|
If you are using an external database not managed by Omnibus GitLab, you need
|
|
to create the replicator user and define a password to it manually:
|
|
|
|
```sql
|
|
--- Create a new user 'replicator'
|
|
CREATE USER gitlab_replicator;
|
|
|
|
--- Set/change a password and grants replication privilege
|
|
ALTER USER gitlab_replicator WITH REPLICATION ENCRYPTED PASSWORD '<replication_password>';
|
|
```
|
|
|
|
1. Configure PostgreSQL to listen on network interfaces:
|
|
|
|
For security reasons, PostgreSQL does not listen on any network interfaces
|
|
by default. However, Geo requires the **secondary** node to be able to
|
|
connect to the **primary** node's database. For this reason, we need the address of
|
|
each node.
|
|
|
|
NOTE: **Note:** For external PostgreSQL instances, see [additional instructions](external_database.md).
|
|
|
|
If you are using a cloud provider, you can lookup the addresses for each
|
|
Geo node through your cloud provider's management console.
|
|
|
|
To lookup the address of a Geo node, SSH in to the Geo node and execute:
|
|
|
|
```sh
|
|
##
|
|
## Private address
|
|
##
|
|
ip route get 255.255.255.255 | awk '{print "Private address:", $NF; exit}'
|
|
|
|
##
|
|
## Public address
|
|
##
|
|
echo "External address: $(curl --silent ipinfo.io/ip)"
|
|
```
|
|
|
|
In most cases, the following addresses will be used to configure GitLab
|
|
Geo:
|
|
|
|
| Configuration | Address |
|
|
|:----------------------------------------|:------------------------------------------------------|
|
|
| `postgresql['listen_address']` | **Primary** node's public or VPC private address. |
|
|
| `postgresql['md5_auth_cidr_addresses']` | **Secondary** node's public or VPC private addresses. |
|
|
|
|
If you are using Google Cloud Platform, SoftLayer, or any other vendor that
|
|
provides a virtual private cloud (VPC) you can use the **secondary** node's private
|
|
address (corresponds to "internal address" for Google Cloud Platform) for
|
|
`postgresql['md5_auth_cidr_addresses']` and `postgresql['listen_address']`.
|
|
|
|
The `listen_address` option opens PostgreSQL up to network connections
|
|
with the interface corresponding to the given address. See [the PostgreSQL
|
|
documentation][pg-docs-runtime-conn] for more details.
|
|
|
|
Depending on your network configuration, the suggested addresses may not
|
|
be correct. If your **primary** node and **secondary** nodes connect over a local
|
|
area network, or a virtual network connecting availability zones like
|
|
[Amazon's VPC](https://aws.amazon.com/vpc/) or [Google's VPC](https://cloud.google.com/vpc/)
|
|
you should use the **secondary** node's private address for `postgresql['md5_auth_cidr_addresses']`.
|
|
|
|
Edit `/etc/gitlab/gitlab.rb` and add the following, replacing the IP
|
|
addresses with addresses appropriate to your network configuration:
|
|
|
|
```ruby
|
|
##
|
|
## Geo Primary role
|
|
## - configure dependent flags automatically to enable Geo
|
|
##
|
|
roles ['geo_primary_role']
|
|
|
|
##
|
|
## Primary address
|
|
## - replace '<primary_node_ip>' with the public or VPC address of your Geo primary node
|
|
##
|
|
postgresql['listen_address'] = '<primary_node_ip>'
|
|
|
|
##
|
|
# Allow PostgreSQL client authentication from the primary and secondary IPs. These IPs may be
|
|
# public or VPC addresses in CIDR format, for example ['198.51.100.1/32', '198.51.100.2/32']
|
|
##
|
|
postgresql['md5_auth_cidr_addresses'] = ['<primary_node_ip>/32', '<secondary_node_ip>/32']
|
|
|
|
##
|
|
## Replication settings
|
|
## - set this to be the number of Geo secondary nodes you have
|
|
##
|
|
postgresql['max_replication_slots'] = 1
|
|
# postgresql['max_wal_senders'] = 10
|
|
# postgresql['wal_keep_segments'] = 10
|
|
|
|
##
|
|
## Disable automatic database migrations temporarily
|
|
## (until PostgreSQL is restarted and listening on the private address).
|
|
##
|
|
gitlab_rails['auto_migrate'] = false
|
|
```
|
|
|
|
1. Optional: If you want to add another **secondary** node, the relevant setting would look like:
|
|
|
|
```ruby
|
|
postgresql['md5_auth_cidr_addresses'] = ['<primary_node_ip>/32', '<secondary_node_ip>/32', '<another_secondary_node_ip>/32']
|
|
```
|
|
|
|
You may also want to edit the `wal_keep_segments` and `max_wal_senders` to
|
|
match your database replication requirements. Consult the [PostgreSQL -
|
|
Replication documentation][pg-docs-runtime-replication]
|
|
for more information.
|
|
|
|
1. Save the file and reconfigure GitLab for the database listen changes and
|
|
the replication slot changes to be applied:
|
|
|
|
```sh
|
|
gitlab-ctl reconfigure
|
|
```
|
|
|
|
Restart PostgreSQL for its changes to take effect:
|
|
|
|
```sh
|
|
gitlab-ctl restart postgresql
|
|
```
|
|
|
|
1. Re-enable migrations now that PostgreSQL is restarted and listening on the
|
|
private address.
|
|
|
|
Edit `/etc/gitlab/gitlab.rb` and **change** the configuration to `true`:
|
|
|
|
```ruby
|
|
gitlab_rails['auto_migrate'] = true
|
|
```
|
|
|
|
Save the file and reconfigure GitLab:
|
|
|
|
```sh
|
|
gitlab-ctl reconfigure
|
|
```
|
|
|
|
1. Now that the PostgreSQL server is set up to accept remote connections, run
|
|
`netstat -plnt | grep 5432` to make sure that PostgreSQL is listening on port
|
|
`5432` to the **primary** server's private address.
|
|
|
|
1. A certificate was automatically generated when GitLab was reconfigured. This
|
|
will be used automatically to protect your PostgreSQL traffic from
|
|
eavesdroppers, but to protect against active ("man-in-the-middle") attackers,
|
|
the **secondary** node needs a copy of the certificate. Make a copy of the PostgreSQL
|
|
`server.crt` file on the **primary** node by running this command:
|
|
|
|
```sh
|
|
cat ~gitlab-psql/data/server.crt
|
|
```
|
|
|
|
Copy the output into a clipboard or into a local file. You
|
|
will need it when setting up the **secondary** node! The certificate is not sensitive
|
|
data.
|
|
|
|
### Step 2. Configure the **secondary** server
|
|
|
|
1. SSH into your GitLab **secondary** server and login as root:
|
|
|
|
```
|
|
sudo -i
|
|
```
|
|
|
|
1. Stop application server and Sidekiq
|
|
|
|
```
|
|
gitlab-ctl stop unicorn
|
|
gitlab-ctl stop sidekiq
|
|
```
|
|
|
|
NOTE: **Note:**
|
|
This step is important so we don't try to execute anything before the node is fully configured.
|
|
|
|
1. [Check TCP connectivity][rake-maintenance] to the **primary** node's PostgreSQL server:
|
|
|
|
```sh
|
|
gitlab-rake gitlab:tcp_check[<primary_node_ip>,5432]
|
|
```
|
|
|
|
NOTE: **Note:**
|
|
If this step fails, you may be using the wrong IP address, or a firewall may
|
|
be preventing access to the server. Check the IP address, paying close
|
|
attention to the difference between public and private addresses and ensure
|
|
that, if a firewall is present, the **secondary** node is permitted to connect to the
|
|
**primary** node on port 5432.
|
|
|
|
1. Create a file `server.crt` in the **secondary** server, with the content you got on the last step of the **primary** node's setup:
|
|
|
|
```
|
|
editor server.crt
|
|
```
|
|
|
|
1. Set up PostgreSQL TLS verification on the **secondary** node:
|
|
|
|
Install the `server.crt` file:
|
|
|
|
```sh
|
|
install \
|
|
-D \
|
|
-o gitlab-psql \
|
|
-g gitlab-psql \
|
|
-m 0400 \
|
|
-T server.crt ~gitlab-psql/.postgresql/root.crt
|
|
```
|
|
|
|
PostgreSQL will now only recognize that exact certificate when verifying TLS
|
|
connections. The certificate can only be replicated by someone with access
|
|
to the private key, which is **only** present on the **primary** node.
|
|
|
|
1. Test that the `gitlab-psql` user can connect to the **primary** node's database
|
|
(the default Omnibus database name is gitlabhq_production):
|
|
|
|
```sh
|
|
sudo \
|
|
-u gitlab-psql /opt/gitlab/embedded/bin/psql \
|
|
--list \
|
|
-U gitlab_replicator \
|
|
-d "dbname=gitlabhq_production sslmode=verify-ca" \
|
|
-W \
|
|
-h <primary_node_ip>
|
|
```
|
|
|
|
When prompted enter the password you set in the first step for the
|
|
`gitlab_replicator` user. If all worked correctly, you should see
|
|
the list of **primary** node's databases.
|
|
|
|
A failure to connect here indicates that the TLS configuration is incorrect.
|
|
Ensure that the contents of `~gitlab-psql/data/server.crt` on the **primary** node
|
|
match the contents of `~gitlab-psql/.postgresql/root.crt` on the **secondary** node.
|
|
|
|
1. Configure PostgreSQL to enable FDW support:
|
|
|
|
This step is similar to how we configured the **primary** instance.
|
|
We need to enable this, to enable FDW support, even if using a single node.
|
|
|
|
Edit `/etc/gitlab/gitlab.rb` and add the following, replacing the IP
|
|
addresses with addresses appropriate to your network configuration:
|
|
|
|
```ruby
|
|
##
|
|
## Geo Secondary role
|
|
## - configure dependent flags automatically to enable Geo
|
|
##
|
|
roles ['geo_secondary_role']
|
|
|
|
##
|
|
## Secondary address
|
|
## - replace '<secondary_node_ip>' with the public or VPC address of your Geo secondary node
|
|
##
|
|
postgresql['listen_address'] = '<secondary_node_ip>'
|
|
postgresql['md5_auth_cidr_addresses'] = ['<secondary_node_ip>/32']
|
|
|
|
##
|
|
## Database credentials password (defined previously in primary node)
|
|
## - replicate same values here as defined in primary node
|
|
##
|
|
postgresql['sql_user_password'] = '<md5_hash_of_your_password>'
|
|
gitlab_rails['db_password'] = '<your_password_here>'
|
|
|
|
##
|
|
## Enable FDW support for the Geo Tracking Database (improves performance)
|
|
##
|
|
geo_secondary['db_fdw'] = true
|
|
```
|
|
|
|
For external PostgreSQL instances, see [additional instructions](external_database.md).
|
|
If you bring a former **primary** node back online to serve as a **secondary** node, then you also need to remove `roles ['geo_primary_role']` or `geo_primary_role['enable'] = true`.
|
|
|
|
1. Reconfigure GitLab for the changes to take effect:
|
|
|
|
```sh
|
|
gitlab-ctl reconfigure
|
|
```
|
|
|
|
1. Restart PostgreSQL for the IP change to take effect and reconfigure again:
|
|
|
|
```sh
|
|
gitlab-ctl restart postgresql
|
|
gitlab-ctl reconfigure
|
|
```
|
|
|
|
This last reconfigure will provision the FDW configuration and enable it.
|
|
|
|
### Step 3. Initiate the replication process
|
|
|
|
Below we provide a script that connects the database on the **secondary** node to
|
|
the database on the **primary** node, replicates the database, and creates the
|
|
needed files for streaming replication.
|
|
|
|
The directories used are the defaults that are set up in Omnibus. If you have
|
|
changed any defaults, configure it as you see fit replacing the directories and paths.
|
|
|
|
CAUTION: **Warning:**
|
|
Make sure to run this on the **secondary** server as it removes all PostgreSQL's
|
|
data before running `pg_basebackup`.
|
|
|
|
1. SSH into your GitLab **secondary** server and login as root:
|
|
|
|
```sh
|
|
sudo -i
|
|
```
|
|
|
|
1. Choose a database-friendly name to use for your **secondary** node to
|
|
use as the replication slot name. For example, if your domain is
|
|
`secondary.geo.example.com`, you may use `secondary_example` as the slot
|
|
name as shown in the commands below.
|
|
|
|
1. Execute the command below to start a backup/restore and begin the replication
|
|
|
|
CAUTION: **Warning:** Each Geo **secondary** node must have its own unique replication slot name.
|
|
Using the same slot name between two secondaries will break PostgreSQL replication.
|
|
|
|
```sh
|
|
gitlab-ctl replicate-geo-database \
|
|
--slot-name=<secondary_node_name> \
|
|
--host=<primary_node_ip>
|
|
```
|
|
|
|
When prompted, enter the _plaintext_ password you set up for the `gitlab_replicator`
|
|
user in the first step.
|
|
|
|
This command also takes a number of additional options. You can use `--help`
|
|
to list them all, but here are a couple of tips:
|
|
|
|
- If PostgreSQL is listening on a non-standard port, add `--port=` as well.
|
|
- If your database is too large to be transferred in 30 minutes, you will need
|
|
to increase the timeout, e.g., `--backup-timeout=3600` if you expect the
|
|
initial replication to take under an hour.
|
|
- Pass `--sslmode=disable` to skip PostgreSQL TLS authentication altogether
|
|
(e.g., you know the network path is secure, or you are using a site-to-site
|
|
VPN). This is **not** safe over the public Internet!
|
|
- You can read more details about each `sslmode` in the
|
|
[PostgreSQL documentation][pg-docs-ssl];
|
|
the instructions above are carefully written to ensure protection against
|
|
both passive eavesdroppers and active "man-in-the-middle" attackers.
|
|
- Change the `--slot-name` to the name of the replication slot
|
|
to be used on the **primary** database. The script will attempt to create the
|
|
replication slot automatically if it does not exist.
|
|
- If you're repurposing an old server into a Geo **secondary** node, you'll need to
|
|
add `--force` to the command line.
|
|
- When not in a production machine you can disable backup step if you
|
|
really sure this is what you want by adding `--skip-backup`
|
|
|
|
The replication process is now complete.
|
|
|
|
## PGBouncer support (optional)
|
|
|
|
[PGBouncer](http://pgbouncer.github.io/) may be used with GitLab Geo to pool
|
|
PostgreSQL connections. We recommend using PGBouncer if you use GitLab in a
|
|
high-availability configuration with a cluster of nodes supporting a Geo
|
|
**primary** node and another cluster of nodes supporting a Geo **secondary** node. For more
|
|
information, see [High Availability with GitLab Omnibus](../../high_availability/database.md#high-availability-with-gitlab-omnibus-premium-only).
|
|
|
|
For a Geo **secondary** node to work properly with PGBouncer in front of the database,
|
|
it will need a separate read-only user to make [PostgreSQL FDW queries][FDW]
|
|
work:
|
|
|
|
1. On the **primary** Geo database, enter the PostgreSQL on the console as an
|
|
admin user. If you are using an Omnibus-managed database, log onto the **primary**
|
|
node that is running the PostgreSQL database (the default Omnibus database name is gitlabhq_production):
|
|
|
|
```sh
|
|
sudo \
|
|
-u gitlab-psql /opt/gitlab/embedded/bin/psql \
|
|
-h /var/opt/gitlab/postgresql gitlabhq_production
|
|
```
|
|
|
|
1. Then create the read-only user:
|
|
|
|
```sql
|
|
-- NOTE: Use the password defined earlier
|
|
CREATE USER gitlab_geo_fdw WITH password 'mypassword';
|
|
GRANT CONNECT ON DATABASE gitlabhq_production to gitlab_geo_fdw;
|
|
GRANT USAGE ON SCHEMA public TO gitlab_geo_fdw;
|
|
GRANT SELECT ON ALL TABLES IN SCHEMA public TO gitlab_geo_fdw;
|
|
GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO gitlab_geo_fdw;
|
|
|
|
-- Tables created by "gitlab" should be made read-only for "gitlab_geo_fdw"
|
|
-- automatically.
|
|
ALTER DEFAULT PRIVILEGES FOR USER gitlab IN SCHEMA public GRANT SELECT ON TABLES TO gitlab_geo_fdw;
|
|
ALTER DEFAULT PRIVILEGES FOR USER gitlab IN SCHEMA public GRANT SELECT ON SEQUENCES TO gitlab_geo_fdw;
|
|
```
|
|
|
|
1. On the **secondary** nodes, change `/etc/gitlab/gitlab.rb`:
|
|
|
|
```
|
|
geo_postgresql['fdw_external_user'] = 'gitlab_geo_fdw'
|
|
```
|
|
|
|
1. Save the file and reconfigure GitLab for the changes to be applied:
|
|
|
|
```sh
|
|
gitlab-ctl reconfigure
|
|
```
|
|
|
|
## Troubleshooting
|
|
|
|
Read the [troubleshooting document](troubleshooting.md).
|
|
|
|
[replication-slots-article]: https://medium.com/@tk512/replication-slots-in-postgresql-b4b03d277c75
|
|
[pgback]: http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html
|
|
[replication user]:https://wiki.postgresql.org/wiki/Streaming_Replication
|
|
[FDW]: https://www.postgresql.org/docs/9.6/static/postgres-fdw.html
|
|
[toc]: index.md#using-omnibus-gitlab
|
|
[rake-maintenance]: ../../raketasks/maintenance.md
|
|
[pg-docs-ssl]: https://www.postgresql.org/docs/9.6/static/libpq-ssl.html#LIBPQ-SSL-PROTECTION
|
|
[pg-docs-runtime-conn]: https://www.postgresql.org/docs/9.6/static/runtime-config-connection.html
|
|
[pg-docs-runtime-replication]: https://www.postgresql.org/docs/9.6/static/runtime-config-replication.html
|