463 lines
16 KiB
Markdown
463 lines
16 KiB
Markdown
# Geo High Availability **(PREMIUM ONLY)**
|
|
|
|
This document describes a minimal reference architecture for running Geo
|
|
in a high availability configuration. If your HA setup differs from the one
|
|
described, it is possible to adapt these instructions to your needs.
|
|
|
|
## Architecture overview
|
|
|
|
![Geo HA Diagram](../../high_availability/img/geo-ha-diagram.png)
|
|
|
|
_[diagram source - GitLab employees only][diagram-source]_
|
|
|
|
The topology above assumes that the **primary** and **secondary** Geo clusters
|
|
are located in two separate locations, on their own virtual network
|
|
with private IP addresses. The network is configured such that all machines within
|
|
one geographic location can communicate with each other using their private IP addresses.
|
|
The IP addresses given are examples and may be different depending on the
|
|
network topology of your deployment.
|
|
|
|
The only external way to access the two Geo deployments is by HTTPS at
|
|
`gitlab.us.example.com` and `gitlab.eu.example.com` in the example above.
|
|
|
|
NOTE: **Note:**
|
|
The **primary** and **secondary** Geo deployments must be able to communicate to each other over HTTPS.
|
|
|
|
## Redis and PostgreSQL High Availability
|
|
|
|
Geo supports:
|
|
|
|
- Redis and PostgreSQL on the **primary** node configured for high availability
|
|
- Redis on **secondary** nodes configured for high availability.
|
|
|
|
NOTE: **Note:**
|
|
Support for PostgreSQL on **secondary** nodes in high availability configuration
|
|
[is planned](https://gitlab.com/groups/gitlab-org/-/epics/2536).
|
|
|
|
Because of the additional complexity involved in setting up this configuration
|
|
for PostgreSQL and Redis, it is not covered by this Geo HA documentation.
|
|
|
|
For more information about setting up a highly available PostgreSQL cluster and Redis cluster using the omnibus package see the high availability documentation for
|
|
[PostgreSQL](../../high_availability/database.md) and
|
|
[Redis](../../high_availability/redis.md), respectively.
|
|
|
|
NOTE: **Note:**
|
|
It is possible to use cloud hosted services for PostgreSQL and Redis, but this is beyond the scope of this document.
|
|
|
|
## Prerequisites: Two working GitLab HA clusters
|
|
|
|
One cluster will serve as the **primary** node. Use the
|
|
[GitLab HA documentation](../../high_availability/README.md) to set this up. If
|
|
you already have a working GitLab instance that is in-use, it can be used as a
|
|
**primary**.
|
|
|
|
The second cluster will serve as the **secondary** node. Again, use the
|
|
[GitLab HA documentation](../../high_availability/README.md) to set this up.
|
|
It's a good idea to log in and test it, however, note that its data will be
|
|
wiped out as part of the process of replicating from the **primary**.
|
|
|
|
## Configure the GitLab cluster to be the **primary** node
|
|
|
|
The following steps enable a GitLab cluster to serve as the **primary** node.
|
|
|
|
### Step 1: Configure the **primary** frontend servers
|
|
|
|
1. Edit `/etc/gitlab/gitlab.rb` and add the following:
|
|
|
|
```ruby
|
|
##
|
|
## Enable the Geo primary role
|
|
##
|
|
roles ['geo_primary_role']
|
|
|
|
##
|
|
## The unique identifier for the Geo node.
|
|
##
|
|
gitlab_rails['geo_node_name'] = '<node_name_here>'
|
|
|
|
##
|
|
## Disable automatic migrations
|
|
##
|
|
gitlab_rails['auto_migrate'] = false
|
|
```
|
|
|
|
After making these changes, [reconfigure GitLab][gitlab-reconfigure] so the changes take effect.
|
|
|
|
NOTE: **Note:** PostgreSQL and Redis should have already been disabled on the
|
|
application servers, and connections from the application servers to those
|
|
services on the backend servers configured, during normal GitLab HA set up. See
|
|
high availability configuration documentation for
|
|
[PostgreSQL](../../high_availability/database.md#configuring-the-application-nodes)
|
|
and [Redis](../../high_availability/redis.md#example-configuration-for-the-gitlab-application).
|
|
|
|
### Step 2: Configure the **primary** database
|
|
|
|
1. Edit `/etc/gitlab/gitlab.rb` and add the following:
|
|
|
|
```ruby
|
|
##
|
|
## Configure the Geo primary role and the PostgreSQL role
|
|
##
|
|
roles ['geo_primary_role', 'postgres_role']
|
|
```
|
|
|
|
## Configure a **secondary** node
|
|
|
|
A **secondary** cluster is similar to any other GitLab HA cluster, with two
|
|
major differences:
|
|
|
|
- The main PostgreSQL database is a read-only replica of the **primary** node's
|
|
PostgreSQL database.
|
|
- There is also a single PostgreSQL database for the **secondary** cluster,
|
|
called the "tracking database", which tracks the synchronization state of
|
|
various resources.
|
|
|
|
Therefore, we will set up the HA components one-by-one, and include deviations
|
|
from the normal HA setup. However, we highly recommend first configuring a
|
|
brand-new cluster as if it were not part of a Geo setup so that it can be
|
|
tested and verified as a working cluster. And only then should it be modified
|
|
for use as a Geo **secondary**. This helps to separate problems that are related
|
|
and are not related to Geo setup.
|
|
|
|
### Step 1: Configure the Redis and Gitaly services on the **secondary** node
|
|
|
|
Configure the following services, again using the non-Geo high availability
|
|
documentation:
|
|
|
|
- [Configuring Redis for GitLab HA](../../high_availability/redis.md) for high
|
|
availability.
|
|
- [Gitaly](../../high_availability/gitaly.md), which will store data that is
|
|
synchronized from the **primary** node.
|
|
|
|
NOTE: **Note:**
|
|
[NFS](../../high_availability/nfs.md) can be used in place of Gitaly but is not
|
|
recommended.
|
|
|
|
### Step 2: Configure the main read-only replica PostgreSQL database on the **secondary** node
|
|
|
|
NOTE: **Note:** The following documentation assumes the database will be run on
|
|
a single node only. PostgreSQL HA on **secondary** nodes is
|
|
[not currently supported](https://gitlab.com/groups/gitlab-org/-/epics/2536).
|
|
|
|
Configure the [**secondary** database](database.md) as a read-only replica of
|
|
the **primary** database. Use the following as a guide.
|
|
|
|
1. Generate an MD5 hash of the desired password for the database user that the
|
|
GitLab application will use to access the read-replica database:
|
|
|
|
Note that the username (`gitlab` by default) is incorporated into the hash.
|
|
|
|
```shell
|
|
gitlab-ctl pg-password-md5 gitlab
|
|
# Enter password: <your_password_here>
|
|
# Confirm password: <your_password_here>
|
|
# fca0b89a972d69f00eb3ec98a5838484
|
|
```
|
|
|
|
Use this hash to fill in `<md5_hash_of_your_password>` in the next step.
|
|
|
|
1. Edit `/etc/gitlab/gitlab.rb` in the replica database machine, and add the
|
|
following:
|
|
|
|
```ruby
|
|
##
|
|
## Configure the Geo secondary role and the PostgreSQL role
|
|
##
|
|
roles ['geo_secondary_role', 'postgres_role']
|
|
|
|
##
|
|
## Secondary address
|
|
## - replace '<secondary_node_ip>' with the public or VPC address of your Geo secondary node
|
|
## - replace '<tracking_database_ip>' with the public or VPC address of your Geo tracking database node
|
|
##
|
|
postgresql['listen_address'] = '<secondary_node_ip>'
|
|
postgresql['md5_auth_cidr_addresses'] = ['<secondary_node_ip>/32', '<tracking_database_ip>/32']
|
|
|
|
##
|
|
## Database credentials password (defined previously in primary node)
|
|
## - replicate same values here as defined in primary node
|
|
##
|
|
postgresql['sql_user_password'] = '<md5_hash_of_your_password>'
|
|
gitlab_rails['db_password'] = '<your_password_here>'
|
|
|
|
##
|
|
## When running the Geo tracking database on a separate machine, disable it
|
|
## here and allow connections from the tracking database host. And ensure
|
|
## the tracking database IP is in postgresql['md5_auth_cidr_addresses'] above.
|
|
##
|
|
geo_postgresql['enable'] = false
|
|
|
|
##
|
|
## Disable `geo_logcursor` service so Rails doesn't get configured here
|
|
##
|
|
geo_logcursor['enable'] = false
|
|
```
|
|
|
|
After making these changes, [reconfigure GitLab][gitlab-reconfigure] so the changes take effect.
|
|
|
|
If using an external PostgreSQL instance, refer also to
|
|
[Geo with external PostgreSQL instances](external_database.md).
|
|
|
|
### Step 3: Configure the tracking database on the **secondary** node
|
|
|
|
NOTE: **Note:** This documentation assumes the tracking database will be run on
|
|
only a single machine, rather than as a PostgreSQL cluster.
|
|
|
|
Configure the tracking database.
|
|
|
|
1. Generate an MD5 hash of the desired password for the database user that the
|
|
GitLab application will use to access the tracking database:
|
|
|
|
Note that the username (`gitlab_geo` by default) is incorporated into the
|
|
hash.
|
|
|
|
```shell
|
|
gitlab-ctl pg-password-md5 gitlab_geo
|
|
# Enter password: <your_password_here>
|
|
# Confirm password: <your_password_here>
|
|
# fca0b89a972d69f00eb3ec98a5838484
|
|
```
|
|
|
|
Use this hash to fill in `<tracking_database_password_md5_hash>` in the next
|
|
step.
|
|
|
|
1. Edit `/etc/gitlab/gitlab.rb` in the tracking database machine, and add the
|
|
following:
|
|
|
|
```ruby
|
|
##
|
|
## Enable the Geo secondary tracking database
|
|
##
|
|
geo_postgresql['enable'] = true
|
|
geo_postgresql['listen_address'] = '<ip_address_of_this_host>'
|
|
geo_postgresql['sql_user_password'] = '<tracking_database_password_md5_hash>'
|
|
|
|
##
|
|
## Configure FDW connection to the replica database
|
|
##
|
|
geo_secondary['db_fdw'] = true
|
|
geo_postgresql['fdw_external_password'] = '<replica_database_password_plaintext>'
|
|
geo_postgresql['md5_auth_cidr_addresses'] = ['<replica_database_ip>/32']
|
|
gitlab_rails['db_host'] = '<replica_database_ip>'
|
|
|
|
# Prevent reconfigure from attempting to run migrations on the replica DB
|
|
gitlab_rails['auto_migrate'] = false
|
|
|
|
##
|
|
## Disable all other services that aren't needed, since we don't have a role
|
|
## that does this.
|
|
##
|
|
alertmanager['enable'] = false
|
|
consul['enable'] = false
|
|
gitaly['enable'] = false
|
|
gitlab_exporter['enable'] = false
|
|
gitlab_workhorse['enable'] = false
|
|
nginx['enable'] = false
|
|
node_exporter['enable'] = false
|
|
pgbouncer_exporter['enable'] = false
|
|
postgresql['enable'] = false
|
|
prometheus['enable'] = false
|
|
redis['enable'] = false
|
|
redis_exporter['enable'] = false
|
|
repmgr['enable'] = false
|
|
sidekiq['enable'] = false
|
|
unicorn['enable'] = false
|
|
```
|
|
|
|
After making these changes, [reconfigure GitLab][gitlab-reconfigure] so the changes take effect.
|
|
|
|
If using an external PostgreSQL instance, refer also to
|
|
[Geo with external PostgreSQL instances](external_database.md).
|
|
|
|
### Step 4: Configure the frontend application servers on the **secondary** node
|
|
|
|
In the architecture overview, there are two machines running the GitLab
|
|
application services. These services are enabled selectively in the
|
|
configuration.
|
|
|
|
Configure the application servers following
|
|
[Configuring GitLab for HA](../../high_availability/gitlab.md), then make the
|
|
following modifications:
|
|
|
|
1. Edit `/etc/gitlab/gitlab.rb` on each application server in the **secondary**
|
|
cluster, and add the following:
|
|
|
|
```ruby
|
|
##
|
|
## Enable the Geo secondary role
|
|
##
|
|
roles ['geo_secondary_role', 'application_role']
|
|
|
|
##
|
|
## The unique identifier for the Geo node.
|
|
##
|
|
gitlab_rails['geo_node_name'] = '<node_name_here>'
|
|
|
|
##
|
|
## Disable automatic migrations
|
|
##
|
|
gitlab_rails['auto_migrate'] = false
|
|
|
|
##
|
|
## Configure the connection to the tracking DB. And disable application
|
|
## servers from running tracking databases.
|
|
##
|
|
geo_secondary['db_host'] = '<geo_tracking_db_host>'
|
|
geo_secondary['db_password'] = '<geo_tracking_db_password>'
|
|
geo_postgresql['enable'] = false
|
|
|
|
##
|
|
## Configure connection to the streaming replica database, if you haven't
|
|
## already
|
|
##
|
|
gitlab_rails['db_host'] = '<replica_database_host>'
|
|
gitlab_rails['db_password'] = '<replica_database_password>'
|
|
|
|
##
|
|
## Configure connection to Redis, if you haven't already
|
|
##
|
|
gitlab_rails['redis_host'] = '<redis_host>'
|
|
gitlab_rails['redis_password'] = '<redis_password>'
|
|
|
|
##
|
|
## If you are using custom users not managed by Omnibus, you need to specify
|
|
## UIDs and GIDs like below, and ensure they match between servers in a
|
|
## cluster to avoid permissions issues
|
|
##
|
|
user['uid'] = 9000
|
|
user['gid'] = 9000
|
|
web_server['uid'] = 9001
|
|
web_server['gid'] = 9001
|
|
registry['uid'] = 9002
|
|
registry['gid'] = 9002
|
|
```
|
|
|
|
NOTE: **Note:**
|
|
If you had set up PostgreSQL cluster using the omnibus package and you had set
|
|
up `postgresql['sql_user_password'] = 'md5 digest of secret'` setting, keep in
|
|
mind that `gitlab_rails['db_password']` and `geo_secondary['db_password']`
|
|
mentioned above contains the plaintext passwords. This is used to let the Rails
|
|
servers connect to the databases.
|
|
|
|
NOTE: **Note:**
|
|
Make sure that current node IP is listed in `postgresql['md5_auth_cidr_addresses']` setting of your remote database.
|
|
|
|
After making these changes [Reconfigure GitLab][gitlab-reconfigure] so the changes take effect.
|
|
|
|
On the secondary the following GitLab frontend services will be enabled:
|
|
|
|
- `geo-logcursor`
|
|
- `gitlab-pages`
|
|
- `gitlab-workhorse`
|
|
- `logrotate`
|
|
- `nginx`
|
|
- `registry`
|
|
- `remote-syslog`
|
|
- `sidekiq`
|
|
- `unicorn`
|
|
|
|
Verify these services by running `sudo gitlab-ctl status` on the frontend
|
|
application servers.
|
|
|
|
### Step 5: Set up the LoadBalancer for the **secondary** node
|
|
|
|
In this topology, a load balancer is required at each geographic location to
|
|
route traffic to the application servers.
|
|
|
|
See [Load Balancer for GitLab HA](../../high_availability/load_balancer.md) for
|
|
more information.
|
|
|
|
### Step 6: Configure the backend application servers on the **secondary** node
|
|
|
|
The minimal reference architecture diagram above shows all application services
|
|
running together on the same machines. However, for high availability we
|
|
[strongly recommend running all services separately](../../high_availability/README.md).
|
|
|
|
For example, a Sidekiq server could be configured similarly to the frontend
|
|
application servers above, with some changes to run only the `sidekiq` service:
|
|
|
|
1. Edit `/etc/gitlab/gitlab.rb` on each Sidekiq server in the **secondary**
|
|
cluster, and add the following:
|
|
|
|
```ruby
|
|
##
|
|
## Enable the Geo secondary role
|
|
##
|
|
roles ['geo_secondary_role']
|
|
|
|
##
|
|
## Enable the Sidekiq service
|
|
##
|
|
sidekiq['enable'] = true
|
|
|
|
##
|
|
## Ensure unnecessary services are disabled
|
|
##
|
|
alertmanager['enable'] = false
|
|
consul['enable'] = false
|
|
geo_logcursor['enable'] = false
|
|
gitaly['enable'] = false
|
|
gitlab_exporter['enable'] = false
|
|
gitlab_workhorse['enable'] = false
|
|
nginx['enable'] = false
|
|
node_exporter['enable'] = false
|
|
pgbouncer_exporter['enable'] = false
|
|
postgresql['enable'] = false
|
|
prometheus['enable'] = false
|
|
redis['enable'] = false
|
|
redis_exporter['enable'] = false
|
|
repmgr['enable'] = false
|
|
unicorn['enable'] = false
|
|
|
|
##
|
|
## The unique identifier for the Geo node.
|
|
##
|
|
gitlab_rails['geo_node_name'] = '<node_name_here>'
|
|
|
|
##
|
|
## Disable automatic migrations
|
|
##
|
|
gitlab_rails['auto_migrate'] = false
|
|
|
|
##
|
|
## Configure the connection to the tracking DB. And disable application
|
|
## servers from running tracking databases.
|
|
##
|
|
geo_secondary['db_host'] = '<geo_tracking_db_host>'
|
|
geo_secondary['db_password'] = '<geo_tracking_db_password>'
|
|
geo_postgresql['enable'] = false
|
|
|
|
##
|
|
## Configure connection to the streaming replica database, if you haven't
|
|
## already
|
|
##
|
|
gitlab_rails['db_host'] = '<replica_database_host>'
|
|
gitlab_rails['db_password'] = '<replica_database_password>'
|
|
|
|
##
|
|
## Configure connection to Redis, if you haven't already
|
|
##
|
|
gitlab_rails['redis_host'] = '<redis_host>'
|
|
gitlab_rails['redis_password'] = '<redis_password>'
|
|
|
|
##
|
|
## If you are using custom users not managed by Omnibus, you need to specify
|
|
## UIDs and GIDs like below, and ensure they match between servers in a
|
|
## cluster to avoid permissions issues
|
|
##
|
|
user['uid'] = 9000
|
|
user['gid'] = 9000
|
|
web_server['uid'] = 9001
|
|
web_server['gid'] = 9001
|
|
registry['uid'] = 9002
|
|
registry['gid'] = 9002
|
|
```
|
|
|
|
You can similarly configure a server to run only the `geo-logcursor` service
|
|
with `geo_logcursor['enable'] = true` and disabling Sidekiq with
|
|
`sidekiq['enable'] = false`.
|
|
|
|
These servers do not need to be attached to the load balancer.
|
|
|
|
[diagram-source]: https://docs.google.com/drawings/d/1z0VlizKiLNXVVVaERFwgsIOuEgjcUqDTWPdQYsE7Z4c/edit
|
|
[gitlab-reconfigure]: ../../restart_gitlab.md#omnibus-gitlab-reconfigure
|