---
stage: Enablement
group: Database
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
# Database Load Balancing **(PREMIUM SELF)**
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/1283) in [GitLab Premium](https://about.gitlab.com/pricing/) 9.0.
Distribute read-only queries among multiple database servers.
## Overview
Database load balancing improves the distribution of database workloads across
multiple computing resources. Load balancing aims to optimize resource use,
maximize throughput, minimize response time, and avoid overload of any single
resource. Using multiple components with load balancing instead of a single
component may increase reliability and availability through redundancy.
[_Wikipedia article_](https://en.wikipedia.org/wiki/Load_balancing_(computing))
When database load balancing is enabled in GitLab, the load is balanced using
a simple round-robin algorithm, without any external dependencies such as Redis.
Load balancing is not enabled for Sidekiq as this would lead to consistency
problems, and Sidekiq mostly performs writes anyway.
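As a rough illustration of round-robin selection, the following sketch cycles through a fixed list of secondaries. The class name `RoundRobinHosts` and its `next_host` method are hypothetical and chosen for this example only; they are not GitLab's implementation.

```ruby
# Hypothetical round-robin picker (illustrative only, not GitLab's code).
class RoundRobinHosts
  def initialize(hosts)
    raise ArgumentError, 'at least one host is required' if hosts.empty?

    @hosts = hosts.dup
    @index = 0
  end

  # Returns the current host and advances to the next one,
  # wrapping around at the end of the list.
  def next_host
    host = @hosts[@index]
    @index = (@index + 1) % @hosts.length
    host
  end
end
```

Calling `next_host` four times on a picker built from `db4`, `db5`, and `db6` yields `db4`, `db5`, `db6`, and then `db4` again.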
In the following image, you can see the load is balanced rather evenly among
all the secondaries (`db4`, `db5`, `db6`). Because `SELECT` queries are not
sent to the primary (unless necessary), the primary (`db3`) hardly has any load.
![DB load balancing graph](img/db_load_balancing_postgres_stats.png)
## Requirements
For load balancing to work, you need PostgreSQL 11 or newer.
[**MySQL is not supported**](../install/requirements.md#database). You also need
at least one secondary in [hot standby](https://www.postgresql.org/docs/11/hot-standby.html) mode.
Load balancing also requires that the configured hosts **always** point to the
primary, even after a database failover. Furthermore, the additional hosts to
balance load among must **always** point to secondary databases. This means that
you should put a load balancer in front of every database, and have GitLab connect
to those load balancers.
For example, say you have a primary (`db1.gitlab.com`) and two secondaries,
`db2.gitlab.com` and `db3.gitlab.com`. For this setup you need three
load balancers, one for every host:
- `primary.gitlab.com` forwards to `db1.gitlab.com`
- `secondary1.gitlab.com` forwards to `db2.gitlab.com`
- `secondary2.gitlab.com` forwards to `db3.gitlab.com`
Now suppose that a failover happens and `db2.gitlab.com` becomes the new primary.
Forwarding should then happen as follows:
- `primary.gitlab.com` forwards to `db2.gitlab.com`
- `secondary1.gitlab.com` forwards to `db1.gitlab.com`
- `secondary2.gitlab.com` forwards to `db3.gitlab.com`
GitLab does not take care of this for you, so you will need to do so yourself.
Finally, load balancing requires that GitLab can connect to all hosts using the
same credentials and port as configured in the
[Enabling load balancing](#enabling-load-balancing) section. Using
different ports or credentials for different hosts is not supported.
## Use cases
- For GitLab instances with thousands of users and high traffic, you can use
database load balancing to reduce the load on the primary database and
increase responsiveness, thus resulting in faster page load inside GitLab.
## Enabling load balancing
For the environment in which you want to use load balancing, you'll need to add
the following. This will balance the load between `host1.example.com` and
`host2.example.com`.
**In Omnibus installations:**
1. Edit `/etc/gitlab/gitlab.rb` and add the following line:
   ```ruby
   gitlab_rails['db_load_balancing'] = { 'hosts' => ['host1.example.com', 'host2.example.com'] }
   ```
1. Save the file and [reconfigure GitLab](restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
---
**In installations from source:**
1. Edit `/home/git/gitlab/config/database.yml` and add or amend the following lines:
   ```yaml
   production:
     username: gitlab
     database: gitlab
     encoding: unicode
     load_balancing:
       hosts:
         - host1.example.com
         - host2.example.com
   ```
1. Save the file and [restart GitLab](restart_gitlab.md#installations-from-source) for the changes to take effect.
## Service Discovery
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/5883) in [GitLab Premium](https://about.gitlab.com/pricing/) 11.0.
Service discovery allows GitLab to automatically retrieve a list of secondary
databases to use, instead of having to manually specify these in the
`database.yml` configuration file. Service discovery works by periodically
checking a DNS A record, using the IPs returned by this record as the addresses
for the secondaries. For service discovery to work, all you need is a DNS server
and an A record containing the IP addresses of your secondaries.
To use service discovery you need to change your `database.yml` configuration
file so it looks like the following:
```yaml
production:
  username: gitlab
  database: gitlab
  encoding: unicode
  load_balancing:
    discover:
      nameserver: localhost
      record: secondary.postgresql.service.consul
      record_type: A
      port: 8600
      interval: 60
      disconnect_timeout: 120
```
Here, the `discover:` section specifies the configuration details to use for
service discovery.
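To make the lookup step concrete, here is a minimal sketch of resolving an A record into a list of IP addresses with Ruby's standard `resolv` library. The helper name `discover_secondaries` is hypothetical, and the sketch is not how GitLab implements service discovery internally.

```ruby
require 'resolv'

# Hypothetical helper: returns the IP addresses behind a DNS A record.
# `resolver` only needs to respond to `getresources`, as Resolv::DNS does.
def discover_secondaries(resolver, record)
  resolver
    .getresources(record, Resolv::DNS::Resource::IN::A)
    .map { |resource| resource.address.to_s }
    .sort
end

# Possible usage against a nameserver on port 8600. The address 127.0.0.1
# is an assumption standing in for the `localhost` nameserver above:
#   resolver = Resolv::DNS.new(nameserver_port: [['127.0.0.1', 8600]])
#   discover_secondaries(resolver, 'secondary.postgresql.service.consul')
```

Passing the resolver in as an argument keeps the helper easy to exercise without a live DNS server.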
### Configuration
The following options can be set:
| Option | Description | Default |
|----------------------|---------------------------------------------------------------------------------------------------|-----------|
| `nameserver` | The nameserver to use for looking up the DNS record. | localhost |
| `record` | The record to look up. This option is required for service discovery to work. | |
| `record_type` | Optional record type to look up. This can be either A or SRV (GitLab 12.3 and later). | A |
| `port` | The port of the nameserver. | 8600 |
| `interval` | The minimum time in seconds between checking the DNS record. | 60 |
| `disconnect_timeout` | The time in seconds after which an old connection is closed, after the list of hosts was updated. | 120 |
| `use_tcp` | Look up DNS resources using TCP instead of UDP. | false |
If `record_type` is set to `SRV`, GitLab continues to use a round-robin algorithm
and ignores the `weight` and `priority` in the record. Because SRV records usually
return hostnames instead of IPs, GitLab looks for the IPs of the returned hostnames
in the additional section of the SRV response. If no IP is found for a hostname, GitLab
queries the configured `nameserver` for `ANY` records for that hostname, looking for A or AAAA
records, and drops the hostname from rotation if it can't resolve its IP.
The `interval` value specifies the _minimum_ time between checks. If the A
record has a TTL greater than this value, then service discovery will honor said
TTL. For example, if the TTL of the A record is 90 seconds, then service
discovery waits at least 90 seconds before checking the A record again.
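The relationship between `interval` and the record's TTL can be sketched as taking the larger of the two values. The helper name `next_check_delay` is hypothetical and used for illustration only.

```ruby
# Hypothetical helper: seconds to wait before checking the DNS record again.
# The configured interval is a minimum; a longer TTL takes precedence.
def next_check_delay(interval:, ttl:)
  [interval, ttl].max
end

next_check_delay(interval: 60, ttl: 90) # => 90
next_check_delay(interval: 60, ttl: 30) # => 60
```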
When the list of hosts is updated, it might take a while for the old connections
to be terminated. The `disconnect_timeout` setting can be used to enforce an
upper limit on the time it takes to terminate all old database connections.
Some nameservers (like [Consul](https://www.consul.io/docs/discovery/dns#udp-based-dns-queries)) can return a truncated list of hosts when
queried over UDP. To overcome this issue, you can use TCP for querying by setting
`use_tcp` to `true`.
### Forking
NOTE:
Starting with GitLab 13.0, Puma is the default web server used in GitLab
all-in-one package based installations as well as GitLab Helm chart deployments.
If you use an application server that forks, such as Unicorn, you _have to_
update your Unicorn configuration to start service discovery _after_ a fork.
Failure to do so leads to service discovery only running in the parent
process. If you are using Unicorn, then you can add the following to your
Unicorn configuration file:
```ruby
after_fork do |server, worker|
  defined?(Gitlab::Database::LoadBalancing) &&
    Gitlab::Database::LoadBalancing.start_service_discovery
end
```
This ensures that service discovery is started in both the parent and all
child processes.
## Balancing queries
Read-only `SELECT` queries balance among all the secondary hosts.
Everything else (including transactions) executes on the primary.
Queries such as `SELECT ... FOR UPDATE` are also executed on the primary.
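The routing rules above can be approximated with a small decision function. This is a deliberately simplified sketch based on string matching; the method name `target_for` and the matching approach are assumptions for illustration and do not reflect how GitLab decides internally.

```ruby
# Hypothetical sketch: decide where a SQL statement should run.
# Simplified string matching, for illustration only.
def target_for(sql, in_transaction: false)
  # Anything inside a transaction runs on the primary.
  return :primary if in_transaction
  # Only SELECT statements are candidates for a secondary.
  return :primary unless sql.match?(/\A\s*SELECT\b/i)
  # Locking reads such as SELECT ... FOR UPDATE run on the primary.
  return :primary if sql.match?(/\bFOR\s+UPDATE\b/i)

  :secondary
end
```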
## Prepared statements
Prepared statements don't work well with load balancing and are disabled
automatically when load balancing is enabled. This should have no impact on
response timings.
## Primary sticking
After a write has been performed, GitLab sticks to using the primary for a
certain period of time, scoped to the user that performed the write. GitLab
reverts to using secondaries either when they have caught up or after 30
seconds.
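The sticking behavior can be sketched as a per-user timestamp that is cleared once the secondaries have caught up or 30 seconds have passed. The class `PrimarySticking`, its method names, and the injectable clock are hypothetical and exist only to illustrate the idea.

```ruby
# Hypothetical sketch of per-user primary sticking (not GitLab's code).
class PrimarySticking
  MAX_STICK_SECONDS = 30

  # `clock` is injectable so the behavior can be exercised without waiting.
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @stuck_since = {}
  end

  # Call after a user performs a write.
  def stick(user_id)
    @stuck_since[user_id] = @clock.call
  end

  # Returns true while reads for this user should go to the primary.
  def use_primary?(user_id, secondaries_caught_up:)
    started = @stuck_since[user_id]
    return false unless started

    if secondaries_caught_up || @clock.call - started >= MAX_STICK_SECONDS
      @stuck_since.delete(user_id)
      return false
    end

    true
  end
end
```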
## Failover handling
In the event of a failover or an unresponsive database, the load balancer
tries to use the next available host. If no secondaries are available, the
operation is performed on the primary instead.
If a connection error occurs while writing data, the
operation is retried up to 3 times using an exponential back-off.
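A retry with exponential back-off can be sketched as follows. The `ConnectionError` class, the `with_retries` method, the base delay of 0.1 seconds, and the injectable `sleeper` are all assumptions made for this example; the document does not specify the actual delays.

```ruby
# Hypothetical error class standing in for a database connection failure.
class ConnectionError < StandardError; end

# Hypothetical sketch: run the block, retrying up to `max_retries` times on
# connection errors and doubling the delay before each retry.
def with_retries(max_retries: 3, base_delay: 0.1, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield
  rescue ConnectionError
    raise if attempt >= max_retries

    sleeper.call(base_delay * (2**attempt)) # 0.1s, 0.2s, 0.4s with the defaults
    attempt += 1
    retry
  end
end
```

With the defaults, the block runs at most four times: the initial attempt plus three retries. If the last retry also fails, the error is re-raised to the caller.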
When using load balancing, you should be able to safely restart a database server
without it immediately leading to errors being presented to the users.
## Logging
The load balancer logs various events in
[`database_load_balancing.log`](logs.md#database_load_balancinglog), such as:
- When a host is marked as offline
- When a host comes back online
- When all secondaries are offline
- When a read is retried on a different host due to a query conflict
The log is structured. Each entry is a JSON object containing at least:
- An `event` field useful for filtering.
- A human-readable `message` field.
- Some event-specific metadata. For example, `db_host`.
- Contextual information that is always logged. For example, `severity` and `time`.
For example:
```json
{"severity":"INFO","time":"2019-09-02T12:12:01.728Z","correlation_id":"abcdefg","event":"host_online","message":"Host came back online","db_host":"111.222.333.444","db_port":null,"tag":"rails.database_load_balancing","environment":"production","hostname":"web-example-1","fqdn":"gitlab.example.com","path":null,"params":null}
```
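Because each entry is a JSON object, the log can be filtered on the `event` field with a few lines of Ruby. The helper name `entries_for_event` and the `log_path` variable in the usage comment are hypothetical.

```ruby
require 'json'

# Hypothetical helper: returns the parsed entries whose `event` field
# matches the given value, skipping lines that are not valid JSON.
def entries_for_event(lines, event)
  lines.each_with_object([]) do |line, matches|
    begin
      entry = JSON.parse(line)
      matches << entry if entry['event'] == event
    rescue JSON::ParserError
      next
    end
  end
end

# Possible usage, where `log_path` points at database_load_balancing.log:
#   entries_for_event(File.foreach(log_path), 'host_online')
#     .each { |entry| puts "#{entry['time']} #{entry['db_host']}" }
```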
## Handling Stale Reads
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/3526) in [GitLab Premium](https://about.gitlab.com/pricing/) 10.3.
To prevent reading from an outdated secondary, the load balancer checks if it
is in sync with the primary. If the data is determined to be recent enough, the
secondary is used; otherwise it is ignored. To reduce the overhead of
these checks, they are performed only at certain intervals.
There are three configuration options that influence this behavior:
| Option | Description | Default |
|------------------------------|----------------------------------------------------------------------------------------------------------------|------------|
| `max_replication_difference` | The amount of data (in bytes) a secondary is allowed to lag behind when it hasn't replicated data for a while. | 8 MB |
| `max_replication_lag_time` | The maximum number of seconds a secondary is allowed to lag behind before we stop using it. | 60 seconds |
| `replica_check_interval` | The minimum number of seconds we have to wait before checking the status of a secondary. | 60 seconds |
The defaults should be sufficient for most users. Should you want to change them
you can specify them in `config/database.yml` like so:
```yaml
production:
  username: gitlab
  database: gitlab
  encoding: unicode
  load_balancing:
    hosts:
      - host1.example.com
      - host2.example.com
    max_replication_difference: 16777216 # 16 MB
    max_replication_lag_time: 30
    replica_check_interval: 30
```
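One way to read the two replication thresholds together is sketched below: a secondary is accepted if its lag in seconds is within `max_replication_lag_time`, and otherwise only if it is no more than `max_replication_difference` bytes behind. The method name `secondary_usable?` and the way the two checks are combined are assumptions for illustration; the keyword defaults mirror the defaults in the table above.

```ruby
# Hypothetical sketch of the freshness decision (not GitLab's code).
def secondary_usable?(lag_seconds:, lag_bytes:,
                      max_replication_lag_time: 60,
                      max_replication_difference: 8 * 1024 * 1024)
  # Accept the secondary if it is within the time limit.
  return true if lag_seconds <= max_replication_lag_time

  # Otherwise the secondary may simply not have replicated data for a while,
  # so compare how many bytes it is behind the primary.
  lag_bytes <= max_replication_difference
end

secondary_usable?(lag_seconds: 10, lag_bytes: 0)                   # => true
secondary_usable?(lag_seconds: 120, lag_bytes: 1024)               # => true
secondary_usable?(lag_seconds: 120, lag_bytes: 32 * 1024 * 1024)   # => false
```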