info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
---
# Redis replication and failover with Omnibus GitLab **(PREMIUM ONLY)**
NOTE: **Note:**
This is the documentation for the Omnibus GitLab packages. For using your own
non-bundled Redis, follow the [relevant documentation](replication_and_failover_external.md).
NOTE: **Note:**
In Redis lingo, primary is called master. In this document, primary is used
instead of master, except the settings where `master` is required.
Using [Redis](https://redis.io/) in scalable environment is possible using a **Primary** x **Replica**
topology with a [Redis Sentinel](https://redis.io/topics/sentinel) service to watch and automatically
start the failover procedure.
Redis requires authentication if used with Sentinel. See
[Redis Security](https://redis.io/topics/security) documentation for more
information. We recommend using a combination of a Redis password and tight
firewall rules to secure your Redis service.
You are highly encouraged to read the [Redis Sentinel](https://redis.io/topics/sentinel) documentation
before configuring Redis with GitLab to fully understand the topology and
architecture.
Before diving into the details of setting up Redis and Redis Sentinel for a
replicated topology, make sure you read this document once as a whole to better
understand how the components are tied together.
You need at least `3` independent machines: physical, or VMs running into
distinct physical machines. It is essential that all primary and replica Redis
instances run in different machines. If you fail to provision the machines in
that specific way, any issue with the shared environment can bring your entire
setup down.
It is OK to run a Sentinel alongside of a primary or replica Redis instance.
There should be no more than one Sentinel on the same machine though.
You also need to take into consideration the underlying network topology,
making sure you have redundant connectivity between Redis / Sentinel and
GitLab instances, otherwise the networks will become a single point of
failure.
Running Redis in a scaled environment requires a few things:
- Multiple Redis instances
- Run Redis in a **Primary** x **Replica** topology
- Multiple Sentinel instances
- Application support and visibility to all Sentinel and Redis instances
Redis Sentinel can handle the most important tasks in an HA environment and that's
to help keep servers online with minimal to no downtime. Redis Sentinel:
- Monitors **Primary** and **Replicas** instances to see if they are available
- Promotes a **Replica** to **Primary** when the **Primary** fails
- Demotes a **Primary** to **Replica** when the failed **Primary** comes back online
(to prevent data-partitioning)
- Can be queried by the application to always connect to the current **Primary**
server
When a **Primary** fails to respond, it's the application's responsibility
(in our case GitLab) to handle timeout and reconnect (querying a **Sentinel**
for a new **Primary**).
To get a better understanding on how to correctly set up Sentinel, please read
the [Redis Sentinel documentation](https://redis.io/topics/sentinel) first, as
failing to configure it correctly can lead to data loss or can bring your
whole cluster down, invalidating the failover effort.
## Recommended setup
For a minimal setup, you will install the Omnibus GitLab package in `3`
**independent** machines, both with **Redis** and **Sentinel**:
- Redis Primary + Sentinel
- Redis Replica + Sentinel
- Redis Replica + Sentinel
If you are not sure or don't understand why and where the amount of nodes come
from, read [Redis setup overview](#redis-setup-overview) and
For a recommended setup that can resist more failures, you will install
the Omnibus GitLab package in `5`**independent** machines, both with
**Redis** and **Sentinel**:
- Redis Primary + Sentinel
- Redis Replica + Sentinel
- Redis Replica + Sentinel
- Redis Replica + Sentinel
- Redis Replica + Sentinel
### Redis setup overview
You must have at least `3` Redis servers: `1` primary, `2` Replicas, and they
need to each be on independent machines (see explanation above).
You can have additional Redis nodes, that will help survive a situation
where more nodes goes down. Whenever there is only `2` nodes online, a failover
will not be initiated.
As an example, if you have `6` Redis nodes, a maximum of `3` can be
simultaneously down.
Please note that there are different requirements for Sentinel nodes.
If you host them in the same Redis machines, you may need to take
that restrictions into consideration when calculating the amount of
nodes to be provisioned. See [Sentinel setup overview](#sentinel-setup-overview)
documentation for more information.
All Redis nodes should be configured the same way and with similar server specs, as
in a failover situation, any **Replica** can be promoted as the new **Primary** by
the Sentinel servers.
The replication requires authentication, so you need to define a password to
protect all Redis nodes and the Sentinels. They will all share the same
password, and all instances must be able to talk to
each other over the network.
### Sentinel setup overview
Sentinels watch both other Sentinels and Redis nodes. Whenever a Sentinel
detects that a Redis node is not responding, it will announce that to the
other Sentinels. They have to reach the **quorum**, that is the minimum amount
of Sentinels that agrees a node is down, in order to be able to start a failover.
Whenever the **quorum** is met, the **majority** of all known Sentinel nodes
need to be available and reachable, so that they can elect the Sentinel **leader**
who will take all the decisions to restore the service availability by:
- Promoting a new **Primary**
- Reconfiguring the other **Replicas** and make them point to the new **Primary**
- Announce the new **Primary** to every other Sentinel peer
- Reconfigure the old **Primary** and demote to **Replica** when it comes back online
You must have at least `3` Redis Sentinel servers, and they need to
be each in an independent machine (that are believed to fail independently),
ideally in different geographical areas.
You can configure them in the same machines where you've configured the other
Redis servers, but understand that if a whole node goes down, you loose both
a Sentinel and a Redis instance.
The number of sentinels should ideally always be an **odd** number, for the
consensus algorithm to be effective in the case of a failure.
In a `3` nodes topology, you can only afford `1` Sentinel node going down.
Whenever the **majority** of the Sentinels goes down, the network partition
protection prevents destructive actions and a failover **will not be started**.
Here are some examples:
- With `5` or `6` sentinels, a maximum of `2` can go down for a failover begin.
- With `7` sentinels, a maximum of `3` nodes can go down.
The **Leader** election can sometimes fail the voting round when **consensus**
is not achieved (see the odd number of nodes requirement above). In that case,
a new attempt will be made after the amount of time defined in
`sentinel['failover_timeout']` (in milliseconds).
NOTE: **Note:**
We will see where `sentinel['failover_timeout']` is defined later.
The `failover_timeout` variable has a lot of different use cases. According to
the official documentation:
- The time needed to re-start a failover after a previous failover was
already tried against the same primary by a given Sentinel, is two
times the failover timeout.
- The time needed for a replica replicating to a wrong primary according
to a Sentinel current configuration, to be forced to replicate
with the right primary, is exactly the failover timeout (counting since
the moment a Sentinel detected the misconfiguration).
- The time needed to cancel a failover that is already in progress but
did not produced any configuration change (REPLICAOF NO ONE yet not
acknowledged by the promoted replica).
- The maximum time a failover in progress waits for all the replicas to be
reconfigured as replicas of the new primary. However even after this time
the replicas will be reconfigured by the Sentinels anyway, but not with
the exact parallel-syncs progression as specified.
## Configuring Redis
This is the section where we install and set up the new Redis instances.
It is assumed that you have installed GitLab and all its components from scratch.
If you already have Redis installed and running, read how to
[switch from a single-machine installation](#switching-from-an-existing-single-machine-installation).
NOTE: **Note:**
Redis nodes (both primary and replica) will need the same password defined in
`redis['password']`. At any time during a failover the Sentinels can
reconfigure a node and change its status from primary to replica and vice versa.
### Requirements
The requirements for a Redis setup are the following:
1. Provision the minimum required number of instances as specified in the
[recommended setup](#recommended-setup) section.
1. We **Do not** recommend installing Redis or Redis Sentinel in the same machines your
GitLab application is running on as this weakens your HA configuration. You can however opt in to install Redis
and Sentinel in the same machine.
1. All Redis nodes must be able to talk to each other and accept incoming
connections over Redis (`6379`) and Sentinel (`26379`) ports (unless you
change the default ones).
1. The server that hosts the GitLab application must be able to access the
Redis nodes.
1. Protect the nodes from access from external networks ([Internet](https://gitlab.com/gitlab-org/gitlab-foss/uploads/c4cc8cd353604bd80315f9384035ff9e/The_Internet_IT_Crowd.png)), using
firewall.
### Switching from an existing single-machine installation
If you already have a single-machine GitLab install running, you will need to
replicate from this machine first, before de-activating the Redis instance
inside it.
Your single-machine install will be the initial **Primary**, and the `3` others
should be configured as **Replica** pointing to this machine.
After replication catches up, you will need to stop services in the
single-machine install, to rotate the **Primary** to one of the new nodes.
Make the required changes in configuration and restart the new nodes again.
To disable Redis in the single install, edit `/etc/gitlab/gitlab.rb`:
```ruby
redis['enable'] = false
```
If you fail to replicate first, you may loose data (unprocessed background jobs).
### Step 1. Configuring the primary Redis instance
1. SSH into the **Primary** Redis server.
1. [Download/install](https://about.gitlab.com/install/) the Omnibus GitLab
package you want using **steps 1 and 2** from the GitLab downloads page.
- Make sure you select the correct Omnibus package, with the same version
and type (Community, Enterprise editions) of your current install.
- Do not complete any other steps on the download page.
1. Edit `/etc/gitlab/gitlab.rb` and add the contents:
```ruby
# Specify server role as 'redis_master_role'
roles ['redis_master_role']
# IP address pointing to a local IP that the other machines can reach to.
# You can also set bind to '0.0.0.0' which listen in all interfaces.
# If you really need to bind to an external accessible IP, make
# sure you add extra firewall rules to prevent unauthorized access.
redis['bind'] = '10.0.0.1'
# Define a port so Redis can listen for TCP requests which will allow other
# machines to connect to it.
redis['port'] = 6379
# Set up password authentication for Redis (use the same password in all nodes).
redis['password'] = 'redis-password-goes-here'
```
1. Only the primary GitLab application server should handle migrations. To
prevent database migrations from running on upgrade, add the following
configuration to your `/etc/gitlab/gitlab.rb` file:
```ruby
gitlab_rails['auto_migrate'] = false
```
1. [Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
NOTE: **Note:**
You can specify multiple roles like sentinel and Redis as:
1. [Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
### Step 5. Enable Monitoring
> [Introduced](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/3786) in GitLab 12.0.
If you enable Monitoring, it must be enabled on **all** Redis servers.
1. Make sure to collect [`CONSUL_SERVER_NODES`](../postgresql/replication_and_failover.md#consul-information), which are the IP addresses or DNS records of the Consul server nodes, for the next step. Note they are presented as `Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z`
1. Create/edit `/etc/gitlab/gitlab.rb` and add the following configuration:
In the previous example, we've used `redis_sentinel_role` and
`redis_master_role` which simplifies the amount of configuration changes.
If you want more control, here is what each one sets for you automatically
when enabled:
```ruby
## Redis Sentinel Role
redis_sentinel_role['enable'] = true
# When Sentinel Role is enabled, the following services are also enabled
sentinel['enable'] = true
# The following services are disabled
redis['enable'] = false
bootstrap['enable'] = false
nginx['enable'] = false
postgresql['enable'] = false
gitlab_rails['enable'] = false
mailroom['enable'] = false
-------
## Redis primary/replica Role
redis_master_role['enable'] = true # enable only one of them
redis_replica_role['enable'] = true # enable only one of them
# When Redis primary or Replica role are enabled, the following services are
# enabled/disabled. Note that if Redis and Sentinel roles are combined, both
# services will be enabled.
# The following services are disabled
sentinel['enable'] = false
bootstrap['enable'] = false
nginx['enable'] = false
postgresql['enable'] = false
gitlab_rails['enable'] = false
mailroom['enable'] = false
# For Redis Replica role, also change this setting from default 'true' to 'false':
redis['master'] = false
```
You can find the relevant attributes defined in [`gitlab_rails.rb`](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/files/gitlab-cookbooks/gitlab/libraries/gitlab_rails.rb).
## Troubleshooting
See the [Redis troubleshooting guide](troubleshooting.md).