440 lines
16 KiB
Markdown
440 lines
16 KiB
Markdown
|
# Geo database replication (source) **[PREMIUM ONLY]**
|
||
|
|
||
|
NOTE: **Note:**
|
||
|
This documentation applies to GitLab source installations. In GitLab 11.5, this documentation was deprecated and will be removed in a future release.
|
||
|
Please consider [migrating to GitLab Omnibus install](https://docs.gitlab.com/omnibus/update/convert_to_omnibus.html). For installations
|
||
|
using the Omnibus GitLab packages, follow the
|
||
|
[**database replication for Omnibus GitLab**][database] guide.
|
||
|
|
||
|
NOTE: **Note:**
|
||
|
The stages of the setup process must be completed in the documented order.
|
||
|
Before attempting the steps in this stage, [complete all prior stages](index.md#using-gitlab-installed-from-source-deprecated).
|
||
|
|
||
|
This document describes the minimal steps you have to take in order to
|
||
|
replicate your **primary** GitLab database to a **secondary** node's database. You may
|
||
|
have to change some values according to your database setup, how big it is, etc.
|
||
|
|
||
|
You are encouraged to first read through all the steps before executing them
|
||
|
in your testing/production environment.
|
||
|
|
||
|
## PostgreSQL replication
|
||
|
|
||
|
The GitLab **primary** node where the write operations happen will connect to
|
||
|
**primary** database server, and the **secondary** ones which are read-only will
|
||
|
connect to **secondary** database servers (which are read-only too).
|
||
|
|
||
|
NOTE: **Note:**
|
||
|
In many databases' documentation, you will see "**primary**" being referenced as "master"
|
||
|
and "**secondary**" as either "slave" or "standby" server (read-only).
|
||
|
|
||
|
We recommend using [PostgreSQL replication slots][replication-slots-article]
|
||
|
to ensure the **primary** node retains all the data necessary for the secondaries to
|
||
|
recover. See below for more details.
|
||
|
|
||
|
The following guide assumes that:
|
||
|
|
||
|
- You are using PostgreSQL 9.6 or later which includes the
|
||
|
[`pg_basebackup` tool][pgback] and improved [Foreign Data Wrapper][FDW] support.
|
||
|
- You have a **primary** node already set up (the GitLab server you are
|
||
|
replicating from), running PostgreSQL 9.6 or later, and
|
||
|
you have a new **secondary** server set up with the same versions of the OS,
|
||
|
PostgreSQL, and GitLab on all nodes.
|
||
|
- The IP of the **primary** server for our examples is `198.51.100.1`, whereas the
|
||
|
**secondary** node's IP is `198.51.100.2`. Note that the **primary** and **secondary** servers
|
||
|
**must** be able to communicate over these addresses. These IP addresses can either
|
||
|
be public or private.
|
||
|
|
||
|
CAUTION: **Warning:**
|
||
|
Geo works with streaming replication. Logical replication is not supported at this time.
|
||
|
There is an [issue where support is being discussed](https://gitlab.com/gitlab-org/gitlab-ee/issues/7420).
|
||
|
|
||
|
### Step 1. Configure the **primary** server
|
||
|
|
||
|
1. SSH into your GitLab **primary** server and login as root:
|
||
|
|
||
|
```sh
|
||
|
sudo -i
|
||
|
```
|
||
|
|
||
|
1. Add this node as the Geo **primary** by running:
|
||
|
|
||
|
```sh
|
||
|
bundle exec rake geo:set_primary_node
|
||
|
```
|
||
|
|
||
|
1. Create a [replication user] named `gitlab_replicator`:
|
||
|
|
||
|
```sql
|
||
|
--- Create a new user 'replicator'
|
||
|
CREATE USER gitlab_replicator;
|
||
|
|
||
|
--- Set/change a password and grants replication privilege
|
||
|
ALTER USER gitlab_replicator WITH REPLICATION ENCRYPTED PASSWORD '<replication_password>';
|
||
|
```
|
||
|
|
||
|
1. Make sure your the `gitlab` database user has a password defined:
|
||
|
|
||
|
```sh
|
||
|
sudo \
|
||
|
-u postgres psql \
|
||
|
-d template1 \
|
||
|
-c "ALTER USER gitlab WITH ENCRYPTED PASSWORD '<database_password>';"
|
||
|
```
|
||
|
|
||
|
1. Edit the content of `database.yml` in `production:` and add the password like the example below:
|
||
|
|
||
|
```yaml
|
||
|
#
|
||
|
# PRODUCTION
|
||
|
#
|
||
|
production:
|
||
|
adapter: postgresql
|
||
|
encoding: unicode
|
||
|
database: gitlabhq_production
|
||
|
pool: 10
|
||
|
username: gitlab
|
||
|
password: <database_password>
|
||
|
host: /var/opt/gitlab/geo-postgresql
|
||
|
```
|
||
|
|
||
|
1. Set up TLS support for the PostgreSQL **primary** server:
|
||
|
|
||
|
CAUTION: **Warning**:
|
||
|
Only skip this step if you **know** that PostgreSQL traffic
|
||
|
between the **primary** and **secondary** nodes will be secured through some other
|
||
|
means, e.g., a known-safe physical network path or a site-to-site VPN that
|
||
|
you have configured.
|
||
|
|
||
|
If you are replicating your database across the open Internet, it is
|
||
|
**essential** that the connection is TLS-secured. Correctly configured, this
|
||
|
provides protection against both passive eavesdroppers and active
|
||
|
"man-in-the-middle" attackers.
|
||
|
|
||
|
To generate a self-signed certificate and key, run this command:
|
||
|
|
||
|
```sh
|
||
|
openssl req \
|
||
|
-nodes \
|
||
|
-batch \
|
||
|
-x509 \
|
||
|
-newkey rsa:4096 \
|
||
|
-keyout server.key \
|
||
|
-out server.crt \
|
||
|
-days 3650
|
||
|
```
|
||
|
|
||
|
This will create two files - `server.key` and `server.crt` - that you can
|
||
|
use for authentication.
|
||
|
|
||
|
Copy them to the correct location for your PostgreSQL installation:
|
||
|
|
||
|
```sh
|
||
|
# Copying a self-signed certificate and key
|
||
|
install -o postgres -g postgres -m 0400 -T server.crt ~postgres/9.x/main/data/server.crt
|
||
|
install -o postgres -g postgres -m 0400 -T server.key ~postgres/9.x/main/data/server.key
|
||
|
```
|
||
|
|
||
|
Add this configuration to `postgresql.conf`, removing any existing
|
||
|
configuration for `ssl_cert_file` or `ssl_key_file`:
|
||
|
|
||
|
```
|
||
|
ssl = on
|
||
|
ssl_cert_file='server.crt'
|
||
|
ssl_key_file='server.key'
|
||
|
```
|
||
|
|
||
|
1. Edit `postgresql.conf` to configure the **primary** server for streaming replication
|
||
|
(for Debian/Ubuntu that would be `/etc/postgresql/9.x/main/postgresql.conf`):
|
||
|
|
||
|
```
|
||
|
listen_address = '<primary_node_ip>'
|
||
|
wal_level = hot_standby
|
||
|
max_wal_senders = 5
|
||
|
min_wal_size = 80MB
|
||
|
max_wal_size = 1GB
|
||
|
max_replicaton_slots = 1 # Number of Geo secondary nodes
|
||
|
wal_keep_segments = 10
|
||
|
hot_standby = on
|
||
|
```
|
||
|
|
||
|
NOTE: **Note**:
|
||
|
Be sure to set `max_replication_slots` to the number of Geo **secondary**
|
||
|
nodes that you may potentially have (at least 1).
|
||
|
|
||
|
For security reasons, PostgreSQL by default only listens on the local
|
||
|
interface (e.g. 127.0.0.1). However, Geo needs to communicate
|
||
|
between the **primary** and **secondary** nodes over a common network, such as a
|
||
|
corporate LAN or the public Internet. For this reason, we need to
|
||
|
configure PostgreSQL to listen on more interfaces.
|
||
|
|
||
|
The `listen_address` option opens PostgreSQL up to external connections
|
||
|
with the interface corresponding to the given IP. See [the PostgreSQL
|
||
|
documentation][pg-docs-runtime-conn] for more details.
|
||
|
|
||
|
You may also want to edit the `wal_keep_segments` and `max_wal_senders` to
|
||
|
match your database replication requirements. Consult the
|
||
|
[PostgreSQL - Replication documentation][pg-docs-runtime-replication] for more information.
|
||
|
|
||
|
1. Set the access control on the **primary** node to allow TCP connections using the
|
||
|
server's public IP and set the connection from the **secondary** node to require a
|
||
|
password. Edit `pg_hba.conf` (for Debian/Ubuntu that would be
|
||
|
`/etc/postgresql/9.x/main/pg_hba.conf`):
|
||
|
|
||
|
```sh
|
||
|
host all all <primary_node_ip>/32 md5
|
||
|
host replication gitlab_replicator <secondary_node_ip>/32 md5
|
||
|
```
|
||
|
|
||
|
If you want to add another secondary, add one more row like the replication
|
||
|
one and change the IP address:
|
||
|
|
||
|
```sh
|
||
|
host all all <primary_node_ip>/32 md5
|
||
|
host replication gitlab_replicator <secondary_node_ip>/32 md5
|
||
|
host replication gitlab_replicator <another_secondary_node_ip>/32 md5
|
||
|
```
|
||
|
|
||
|
1. Restart PostgreSQL for the changes to take effect.
|
||
|
|
||
|
1. Choose a database-friendly name to use for your secondary to use as the
|
||
|
replication slot name. For example, if your domain is
|
||
|
`secondary.geo.example.com`, you may use `secondary_example` as the slot
|
||
|
name.
|
||
|
|
||
|
1. Create the replication slot on the **primary** node:
|
||
|
|
||
|
```sh
|
||
|
$ sudo -u postgres psql -c "SELECT * FROM pg_create_physical_replication_slot('secondary_example');"
|
||
|
slot_name | xlog_position
|
||
|
------------------+---------------
|
||
|
secondary_example |
|
||
|
(1 row)
|
||
|
```
|
||
|
|
||
|
1. Now that the PostgreSQL server is set up to accept remote connections, run
|
||
|
`netstat -plnt` to make sure that PostgreSQL is listening to the server's
|
||
|
public IP.
|
||
|
|
||
|
### Step 2. Configure the secondary server
|
||
|
|
||
|
Follow the first steps in ["configure the secondary server"][database-replication] and note that since you are installing from source, the username and
|
||
|
group listed as `gitlab-psql` in those steps should be replaced by `postgres`
|
||
|
instead. After completing the "Test that the `gitlab-psql` user can connect to
|
||
|
the **primary** node's database" step, continue here:
|
||
|
|
||
|
1. Edit `postgresql.conf` to configure the secondary for streaming replication
|
||
|
(for Debian/Ubuntu that would be `/etc/postgresql/9.*/main/postgresql.conf`):
|
||
|
|
||
|
```sh
|
||
|
wal_level = hot_standby
|
||
|
max_wal_senders = 5
|
||
|
checkpoint_segments = 10
|
||
|
wal_keep_segments = 10
|
||
|
hot_standby = on
|
||
|
```
|
||
|
|
||
|
1. Restart PostgreSQL for the changes to take effect.
|
||
|
|
||
|
#### Enable tracking database on the secondary server
|
||
|
|
||
|
Geo secondary nodes use a tracking database to keep track of replication status
|
||
|
and recover automatically from some replication issues. Follow the steps below to create
|
||
|
the tracking database.
|
||
|
|
||
|
1. On the secondary node, run the following command to create `database_geo.yml` with the
|
||
|
information of your secondary PostgreSQL instance:
|
||
|
|
||
|
```sh
|
||
|
sudo cp /home/git/gitlab/config/database_geo.yml.postgresql /home/git/gitlab/config/database_geo.yml
|
||
|
```
|
||
|
|
||
|
1. Edit the content of `database_geo.yml` in `production:` as in the example below:
|
||
|
|
||
|
```yaml
|
||
|
#
|
||
|
# PRODUCTION
|
||
|
#
|
||
|
production:
|
||
|
adapter: postgresql
|
||
|
encoding: unicode
|
||
|
database: gitlabhq_geo_production
|
||
|
pool: 10
|
||
|
username: gitlab_geo
|
||
|
# password:
|
||
|
host: /var/opt/gitlab/geo-postgresql
|
||
|
```
|
||
|
|
||
|
1. Create the database `gitlabhq_geo_production` on the PostgreSQL instance of the **secondary** node.
|
||
|
|
||
|
1. Set up the Geo tracking database:
|
||
|
|
||
|
```sh
|
||
|
bundle exec rake geo:db:migrate
|
||
|
```
|
||
|
|
||
|
1. Configure the [PostgreSQL FDW][FDW] connection and credentials:
|
||
|
|
||
|
Save the script below in a file, ex. `/tmp/geo_fdw.sh` and modify the connection
|
||
|
params to match your environment. Execute it to set up the FDW connection.
|
||
|
|
||
|
```sh
|
||
|
#!/bin/bash
|
||
|
|
||
|
# Secondary Database connection params:
|
||
|
DB_HOST="/var/opt/gitlab/postgresql" # change to the public IP or VPC private IP if its an external server
|
||
|
DB_NAME="gitlabhq_production"
|
||
|
DB_USER="gitlab"
|
||
|
DB_PORT="5432"
|
||
|
|
||
|
# Tracking Database connection params:
|
||
|
GEO_DB_HOST="/var/opt/gitlab/geo-postgresql" # change to the public IP or VPC private IP if its an external server
|
||
|
GEO_DB_NAME="gitlabhq_geo_production"
|
||
|
GEO_DB_USER="gitlab_geo"
|
||
|
GEO_DB_PORT="5432"
|
||
|
|
||
|
query_exec () {
|
||
|
gitlab-psql -h $GEO_DB_HOST -d $GEO_DB_NAME -p $GEO_DB_PORT -c "${1}"
|
||
|
}
|
||
|
|
||
|
query_exec "CREATE EXTENSION postgres_fdw;"
|
||
|
query_exec "CREATE SERVER gitlab_secondary FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '${DB_HOST}', dbname '${DB_NAME}', port '${DB_PORT}');"
|
||
|
query_exec "CREATE USER MAPPING FOR ${GEO_DB_USER} SERVER gitlab_secondary OPTIONS (user '${DB_USER}');"
|
||
|
query_exec "CREATE SCHEMA gitlab_secondary;"
|
||
|
query_exec "GRANT USAGE ON FOREIGN SERVER gitlab_secondary TO ${GEO_DB_USER};"
|
||
|
```
|
||
|
|
||
|
And edit the content of `database_geo.yml` and to add `fdw: true` to
|
||
|
the `production:` block.
|
||
|
|
||
|
### Step 3. Initiate the replication process
|
||
|
|
||
|
Below we provide a script that connects the database on the **secondary** node to
|
||
|
the database on the **primary** node, replicates the database, and creates the
|
||
|
needed files for streaming replication.
|
||
|
|
||
|
The directories used are the defaults for Debian/Ubuntu. If you have changed
|
||
|
any defaults, configure it as you see fit replacing the directories and paths.
|
||
|
|
||
|
CAUTION: **Warning:**
|
||
|
Make sure to run this on the **secondary** server as it removes all PostgreSQL's
|
||
|
data before running `pg_basebackup`.
|
||
|
|
||
|
1. SSH into your GitLab **secondary** server and login as root:
|
||
|
|
||
|
```sh
|
||
|
sudo -i
|
||
|
```
|
||
|
|
||
|
1. Save the snippet below in a file, let's say `/tmp/replica.sh`. Modify the
|
||
|
embedded paths if necessary:
|
||
|
|
||
|
```
|
||
|
#!/bin/bash
|
||
|
|
||
|
PORT="5432"
|
||
|
USER="gitlab_replicator"
|
||
|
echo ---------------------------------------------------------------
|
||
|
echo WARNING: Make sure this script is run from the secondary server
|
||
|
echo ---------------------------------------------------------------
|
||
|
echo
|
||
|
echo Enter the IP or FQDN of the primary PostgreSQL server
|
||
|
read HOST
|
||
|
echo Enter the password for $USER@$HOST
|
||
|
read -s PASSWORD
|
||
|
echo Enter the required sslmode
|
||
|
read SSLMODE
|
||
|
|
||
|
echo Stopping PostgreSQL and all GitLab services
|
||
|
sudo service gitlab stop
|
||
|
sudo service postgresql stop
|
||
|
|
||
|
echo Backing up postgresql.conf
|
||
|
sudo -u postgres mv /var/opt/gitlab/postgresql/data/postgresql.conf /var/opt/gitlab/postgresql/
|
||
|
|
||
|
echo Cleaning up old cluster directory
|
||
|
sudo -u postgres rm -rf /var/opt/gitlab/postgresql/data
|
||
|
|
||
|
echo Starting base backup as the replicator user
|
||
|
echo Enter the password for $USER@$HOST
|
||
|
sudo -u postgres /opt/gitlab/embedded/bin/pg_basebackup -h $HOST -D /var/opt/gitlab/postgresql/data -U gitlab_replicator -v -x -P
|
||
|
|
||
|
echo Writing recovery.conf file
|
||
|
sudo -u postgres bash -c "cat > /var/opt/gitlab/postgresql/data/recovery.conf <<- _EOF1_
|
||
|
standby_mode = 'on'
|
||
|
primary_conninfo = 'host=$HOST port=$PORT user=$USER password=$PASSWORD sslmode=$SSLMODE'
|
||
|
_EOF1_
|
||
|
"
|
||
|
|
||
|
echo Restoring postgresql.conf
|
||
|
sudo -u postgres mv /var/opt/gitlab/postgresql/postgresql.conf /var/opt/gitlab/postgresql/data/
|
||
|
|
||
|
echo Starting PostgreSQL
|
||
|
sudo service postgresql start
|
||
|
```
|
||
|
|
||
|
1. Run it with:
|
||
|
|
||
|
```sh
|
||
|
bash /tmp/replica.sh
|
||
|
```
|
||
|
|
||
|
When prompted, enter the IP/FQDN of the **primary** node, and the password you set up
|
||
|
for the `gitlab_replicator` user in the first step.
|
||
|
|
||
|
You should use `verify-ca` for the `sslmode`. You can use `disable` if you
|
||
|
are happy to skip PostgreSQL TLS authentication altogether (e.g., you know
|
||
|
the network path is secure, or you are using a site-to-site VPN). This is
|
||
|
**not** safe over the public Internet!
|
||
|
|
||
|
You can read more details about each `sslmode` in the
|
||
|
[PostgreSQL documentation][pg-docs-ssl];
|
||
|
the instructions above are carefully written to ensure protection against
|
||
|
both passive eavesdroppers and active "man-in-the-middle" attackers.
|
||
|
|
||
|
The replication process is now over.
|
||
|
|
||
|
## PGBouncer support (optional)
|
||
|
|
||
|
1. First, enter the PostgreSQL console as an admin user.
|
||
|
|
||
|
1. Then create the read-only user:
|
||
|
|
||
|
```sql
|
||
|
-- NOTE: Use the password defined earlier
|
||
|
CREATE USER gitlab_geo_fdw WITH password '<your_password_here>';
|
||
|
GRANT CONNECT ON DATABASE gitlabhq_production to gitlab_geo_fdw;
|
||
|
GRANT USAGE ON SCHEMA public TO gitlab_geo_fdw;
|
||
|
GRANT SELECT ON ALL TABLES IN SCHEMA public TO gitlab_geo_fdw;
|
||
|
GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO gitlab_geo_fdw;
|
||
|
|
||
|
-- Tables created by "gitlab" should be made read-only for "gitlab_geo_fdw"
|
||
|
-- automatically.
|
||
|
ALTER DEFAULT PRIVILEGES FOR USER gitlab IN SCHEMA public GRANT SELECT ON TABLES TO gitlab_geo_fdw;
|
||
|
ALTER DEFAULT PRIVILEGES FOR USER gitlab IN SCHEMA public GRANT SELECT ON SEQUENCES TO gitlab_geo_fdw;
|
||
|
```
|
||
|
|
||
|
1. Enter the PostgreSQL console on the **secondary** tracking database and change the user mapping to this new user:
|
||
|
|
||
|
```
|
||
|
ALTER USER MAPPING FOR gitlab_geo SERVER gitlab_secondary OPTIONS (SET user 'gitlab_geo_fdw')
|
||
|
```
|
||
|
|
||
|
## MySQL replication
|
||
|
|
||
|
MySQL replication is not supported for Geo.
|
||
|
|
||
|
## Troubleshooting
|
||
|
|
||
|
Read the [troubleshooting document](troubleshooting.md).
|
||
|
|
||
|
[replication-slots-article]: https://medium.com/@tk512/replication-slots-in-postgresql-b4b03d277c75
|
||
|
[pgback]: http://www.postgresql.org/docs/9.6/static/app-pgbasebackup.html
|
||
|
[replication user]:https://wiki.postgresql.org/wiki/Streaming_Replication
|
||
|
[FDW]: https://www.postgresql.org/docs/9.6/static/postgres-fdw.html
|
||
|
[database]: database.md
|
||
|
[add-geo-node]: configuration.md#step-3-add-the-secondary-gitlab-node
|
||
|
[database-replication]: database.md#step-2-configure-the-secondary-server
|
||
|
[pg-docs-ssl]: https://www.postgresql.org/docs/9.6/static/libpq-ssl.html#LIBPQ-SSL-PROTECTION
|
||
|
[pg-docs-runtime-conn]: https://www.postgresql.org/docs/9.6/static/runtime-config-connection.html
|
||
|
[pg-docs-runtime-replication]: https://www.postgresql.org/docs/9.6/static/runtime-config-replication.html
|