Improved documentation on HA sentinel part and Redis replication troubleshooting.

2016-10-15 05:40:15 +02:00 · 2016-10-15 05:40:15 +02:00 · 95f6cf339a
commit 95f6cf339a
parent e26d8e0272
1 changed files with 260 additions and 64 deletions
--- a/doc/administration/high_availability/redis.md
+++ b/doc/administration/high_availability/redis.md
@@ -8,6 +8,27 @@ that comes bundled with GitLab Omnibus packages.
  information. We recommend using a combination of a Redis password and tight
  firewall rules to secure your Redis service.

+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**
+
+- [Configure your own Redis server](#configure-your-own-redis-server)
+- [Configure Redis using Omnibus](#configure-redis-using-omnibus)
+- [Experimental Redis Sentinel support](#experimental-redis-sentinel-support)
+  - [Redis setup](#redis-setup)
+    - [Source install](#source-install)
+    - [Omnibus Install](#omnibus-install)
+    - [Troubleshooting Replication](#troubleshooting-replication)
+  - [Sentinel](#sentinel)
+    - [Sentinel setup (Community Edition)](#sentinel-setup-community-edition)
+    - [Sentinel setup (EE Only)](#sentinel-setup-ee-only)
+  - [GitLab setup](#gitlab-setup)
+  - [Sentinel troubleshooting](#sentinel-troubleshooting)
+    - [Omnibus install](#omnibus-install-1)
+    - [Source install](#source-install-1)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
 ## Configure your own Redis server

 If you're hosting GitLab on a cloud provider, you can optionally use a
@@ -37,6 +58,7 @@ Redis.
    unicorn['enable'] = false
    sidekiq['enable'] = false
    postgresql['enable'] = false
+    gitlab_rails['enable'] = false
    gitlab_workhorse['enable'] = false
    mailroom['enable'] = false

@@ -59,120 +81,294 @@ Redis.

 ## Experimental Redis Sentinel support

-> [Introduced][ce-1877] in GitLab 8.11.
+> [Introduced][ce-1877] in GitLab 8.11, improved in 8.13.

 Since GitLab 8.11, you can configure a list of Redis Sentinel servers that
 will monitor a group of Redis servers to provide you with a standard failover
 support.

-There is currently one exception to the Sentinel support: `mail_room`, the
-component that processes incoming emails. It doesn't support Sentinel yet, but
-we hope to integrate a future release that does support it.
-
 To get a better understanding on how to correctly setup Sentinel, please read
 the [Redis Sentinel documentation](http://redis.io/topics/sentinel) first, as
 failing to configure it correctly can lead to data loss.

+Redis Sentinel can handle the most important tasks in an HA environment to help
+keep servers online with minimal to no downtime:
+
+- Monitor master and slave instances to see if they are available.
+- Promote a slave to master when the master fails.
+- Demote a master to slave when the failed master comes back online (to prevent
+  data-partitioning).
+- Answer client queries so that clients always connect to the correct master server.
+
+There is currently one exception to the Sentinel support: `mail_room`, the
+component that processes incoming emails. It doesn't support Sentinel yet, but
+we hope to integrate a future release that does support it soon.
+
 The configuration consists of three parts:

- Redis setup
- Sentinel setup
- GitLab setup
+- Set up Redis Master and Slave nodes
+- Set up Sentinel nodes
+- Set up GitLab
+
+> **IMPORTANT**: You need at least 3 independent machines: physical, or VMs
+running on distinct physical machines. If you fail to provision the
+machines in that specific way, any issue with the shared environment can
+bring your entire setup down.

 Read carefully how to configure those components below.

 ### Redis setup

-You must have at least 2 Redis servers: 1 Master, 1 or more Slaves.
+You must have at least `3` Redis servers: `1` Master and `2` Slaves, and each
+needs to be on an independent machine (see explanation above).
+
 They should be configured the same way and with similar server specs, as
-in a failover situation, any Slave can be elected as the new Master by
+in a failover situation, any `Slave` can be elected as the new `Master` by
 the Sentinel servers.

-In a minimal setup, the only required change for the slaves in `redis.conf`
-is the addition of a `slaveof` line pointing to the initial master.
-You can increase the security by defining a `requirepass` configuration in
-the master, and `masterauth` in slaves.
+With Sentinel, you must define a password to protect access, as both
+Sentinel instances and other Redis instances must be able to talk to
+each other over the network.

---
+You'll need to define both `requirepass` and `masterauth` in all
+nodes because they can be re-configured at any time by the Sentinels
+during a failover, which changes their status as `Master` or `Slave`.

-**Configuring your own Redis server**
+Initial `Slave` nodes will have in `redis.conf` an additional `slaveof` line
+pointing to the initial `Master`.

-1. Add to the slaves' `redis.conf`:
+#### Source install
+
+**Master Redis instance**
+
+You need to make the following changes in `redis.conf`:
+
+1. Define a `bind` address pointing to a local IP that your other machines
+   can reach. If you really need to bind to an externally accessible IP, make
+   sure you add extra firewall rules to prevent unauthorized access:
+
+   ```conf
+   # By default, if no "bind" configuration directive is specified, Redis listens
+   # for connections from all the network interfaces available on the server.
+   # It is possible to listen to just one or multiple selected interfaces using
+   # the "bind" configuration directive, followed by one or more IP addresses.
+   #
+   # Examples:
+   #
+   # bind 192.168.1.100 10.0.0.1
+   # bind 127.0.0.1 ::1
+   # This will bind to all interfaces
+   bind 0.0.0.0
+   ```
+
+1. Define a `port` to force Redis to listen on TCP so other machines can
+   connect to it:
+
+   ```conf
+   # Accept connections on the specified port, default is 6379 (IANA #815344).
+   # If port 0 is specified Redis will not listen on a TCP socket.
+   port 6379
+   ```
+
+1. Set up password authentication (use the same password in all nodes):

    ```conf
-    # IP and port of the master Redis server
-    slaveof 10.10.10.10 6379
-    ```
-
-1. Optionally, set up password authentication for increased security.
-   Add the following to master's `redis.conf`:
-
-    ```conf
-    # Optional password authentication for increased security
-    requirepass "<password>"
-    ```
-
-1. Then add this line to all the slave servers' `redis.conf`:
-
-    ```conf
-    masterauth "<password>"
+    requirepass "redis-password-goes-here"
+    masterauth "redis-password-goes-here"
    ```

 1. Restart the Redis services for the changes to take effect.

---
+**Slave Redis instance**

-**Using Redis via Omnibus**
+1. Follow the same instructions as for the master, with this extra change in `redis.conf`:

-1. Edit `/etc/gitlab/gitlab.rb` of a master Redis machine (usualy a single machine):
+   ```conf
+   # IP and port of the master Redis server
+   slaveof 10.10.10.10 6379
+   ```

-    ```ruby
-    ## Redis TCP support (will disable UNIX socket transport)
-    redis['bind'] = '0.0.0.0' # or specify an IP to bind to a single one
-    redis['port'] = 6379
+1. Restart the Redis services for the changes to take effect.
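
The directives named above can be checked before restarting. What follows is a minimal sketch, assuming the contents of `redis.conf` have already been read into a string; `missing_directives` and `REQUIRED_ON_ALL_NODES` are hypothetical names used only for illustration and are not part of Redis or GitLab.

```ruby
# Hypothetical helper (not part of Redis or GitLab): list the directives
# named in this section that are missing from the text of a redis.conf file.
REQUIRED_ON_ALL_NODES = %w[bind port requirepass masterauth].freeze

def missing_directives(conf_text, slave: false)
  required = REQUIRED_ON_ALL_NODES.dup
  required << 'slaveof' if slave

  # Collect the first word of every line that is neither blank nor a comment.
  present = conf_text.each_line
                     .map(&:strip)
                     .reject { |line| line.empty? || line.start_with?('#') }
                     .map { |line| line.split.first }

  required - present
end

# Illustrative configuration text reusing the placeholder values from above.
master_conf = <<~CONF
  bind 0.0.0.0
  port 6379
  requirepass "redis-password-goes-here"
  masterauth "redis-password-goes-here"
CONF

puts missing_directives(master_conf).inspect
puts missing_directives(master_conf, slave: true).inspect
```

The second call reports `slaveof` as missing, because the sample text is a master configuration checked against the slave requirements.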

-    ## Master redis instance
-    redis['password'] = 'redis-password-goes-here'
-    ```
+#### Omnibus Install

-1. Edit `/etc/gitlab/gitlab.rb` of a slave Redis machine (should be one or more machines):
+You need to install the Omnibus package on 3 different and independent machines.
+We will elect one as the initial `Master` and the other 2 as `Slaves`.

-    ```ruby
-    ## Redis TCP support (will disable UNIX socket transport)
-    redis['bind'] = '0.0.0.0' # or specify an IP to bind to a single one
-    redis['port'] = 6379
+If you are migrating from a single machine install, you may want to set up the
+machines as Slaves, pointing to the original machine as `Master`, to migrate
+the data first, and then switch to this setup.

-    ## Slave redis instance
-    redis['master'] = false
-    redis['master_ip'] = '10.10.10.10' # IP of master Redis server
-    redis['master_port'] = 6379 # Port of master Redis server
-    redis['master_password'] = "redis-password-goes-here"
-    ```
+To disable Redis in the single machine install, edit `/etc/gitlab/gitlab.rb`:

-1. Reconfigure the GitLab for the changes to take effect: `sudo gitlab-ctl reconfigure`
+```ruby
+redis['enable'] = false
+```
+
+**Master Redis instances**
+
+You need to make the following changes in `/etc/gitlab/gitlab.rb`:
+
+1. Define a `redis['bind']` address pointing to a local IP that your other machines
+   can reach. If you really need to bind to an externally accessible IP, make
+   sure you add extra firewall rules to prevent unauthorized access.
+1. Define a `redis['port']` to force Redis to listen on TCP so other machines can
+   connect to it.
+1. Set up password authentication with `redis['master_password']` (use the same
+   password in all nodes).
+
+```ruby
+## Redis TCP support (will disable UNIX socket transport)
+redis['bind'] = '0.0.0.0' # or specify an IP to bind to a single one
+redis['port'] = 6379
+redis['requirepass'] = 'redis-password-goes-here'
+redis['master_password'] = 'redis-password-goes-here'
+```
+
+Reconfigure GitLab Omnibus for the changes to take effect: `sudo gitlab-ctl reconfigure`
+
+**Slave Redis instances**
+
+You need to make the same changes listed for the `Master` instance,
+with an additional `Slave` section as in the example below:
+
+```ruby
+## Redis TCP support (will disable UNIX socket transport)
+redis['bind'] = '0.0.0.0' # or specify an IP to bind to a single one
+redis['port'] = 6379
+redis['requirepass'] = 'redis-password-goes-here'
+redis['master_password'] = 'redis-password-goes-here'
+
+## Slave redis instance
+redis['master'] = false
+redis['master_ip'] = '10.10.10.10' # IP of master Redis server
+redis['master_port'] = 6379 # Port of master Redis server
+```
+
+Reconfigure GitLab Omnibus for the changes to take effect: `sudo gitlab-ctl reconfigure`
+
+#### Troubleshooting Replication
+
+You can check if everything is correct by connecting to each server using
+the `redis-cli` application and sending the `INFO` command.
+
+If authentication was correctly defined, the command should fail with a
+`NOAUTH Authentication required` error. Authenticate with the
+previously defined password using `AUTH redis-password-goes-here` and
+try the `INFO` command again.
+
+Look for the `# Replication` section where you should see some important
+information like the `role` of the server.
+
+When connected to a `master` Redis, you will see the number of connected
+`slaves`, and a list of each with connection details.
+
+When it's a `slave`, you will see details of the master connection and whether
+it is `up` or `down`.
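
To read the `# Replication` section programmatically, the `key:value` lines can be split into a Hash. This is a minimal sketch, assuming the `INFO` output has already been captured as text; `parse_replication` is a hypothetical helper, and the sample text is illustrative rather than captured from a real server (real output contains more fields).

```ruby
# Hypothetical helper: turn the "# Replication" section of INFO output
# into a Hash of field name => value.
def parse_replication(info_text)
  info_text.each_line.with_object({}) do |line, fields|
    line = line.strip
    # Skip blank lines and the "# Replication" section header.
    next if line.empty? || line.start_with?('#')

    key, value = line.split(':', 2)
    fields[key] = value
  end
end

# Illustrative sample of what a slave might report.
slave_info = <<~INFO
  # Replication
  role:slave
  master_host:10.10.10.10
  master_port:6379
  master_link_status:up
INFO

replication = parse_replication(slave_info)
puts replication['role']
puts replication['master_link_status']
```

A `role` of `slave` with a `master_link_status` of `down` indicates that replication is configured but the connection to the master is not working.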

 ---

 Now that the Redis servers are all set up, let's configure the Sentinel
 servers.

-### Sentinel setup
+If you are not sure whether your Redis servers are working and replicating
+correctly, please read the [Troubleshooting Replication](#troubleshooting-replication)
+section and fix any issues before proceeding with the Sentinel setup.

-We provide an automated way to setup and run the Sentinel daemon
-with GitLab EE.
+### Sentinel

-See the instructions below how to setup it by yourself.
+You must have at least `3` Redis Sentinel servers, and each needs to
+be on an independent machine. You can install them on the same
+machines where you installed the other `3` Redis servers.
+
+This number is required for the consensus algorithm to be effective
+in the case of a failure. You should always have an odd number
+of Sentinel nodes provisioned.
+
+Here is a simple explanation of how Sentinel handles a failover:
+
+When a number of Sentinels (the `quorum` value) agree that the `master` is
+not reachable, the **majority** of the Sentinels must elect a temporary
+Sentinel `leader`, which is responsible for starting the failover proceedings.
+
+As an example, for a cluster of `3` Sentinels, at least `2` must agree on a
+`leader`. If you have a total of `5`, at least `3` must agree on the leader.
+
+The `quorum` is only used to detect failure, not to elect the `leader`.
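
The difference between the `quorum` and the majority can be sketched as simple arithmetic. The helper names below (`leader_majority`, `failover_possible?`) are hypothetical and only illustrate the rule described above; they are not part of Sentinel or GitLab.

```ruby
# Votes needed to elect the Sentinel leader: a strict majority of all
# provisioned Sentinels, whether or not they are currently reachable.
def leader_majority(sentinel_count)
  sentinel_count / 2 + 1
end

# Hypothetical check: with the Sentinels that are still reachable, can the
# failure be detected (quorum) and a leader be elected (majority)?
def failover_possible?(total:, reachable:, quorum:)
  reachable >= quorum && reachable >= leader_majority(total)
end

puts leader_majority(3)
puts leader_majority(5)
puts failover_possible?(total: 3, reachable: 2, quorum: 2)
puts failover_possible?(total: 3, reachable: 1, quorum: 1)
```

The last call returns `false`: a quorum of `1` lets a single Sentinel detect the failure, but it still cannot gather the `2` votes needed to elect a leader.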
+
+The official [Sentinel documentation](http://redis.io/topics/sentinel#example-sentinel-deployments)
+also lists different network topologies and warns against situations like
+network partitions and how they can affect the state of the HA solution. Make
+sure you read it carefully and understand the implications for your current
+setup.
+
+To make Sentinel setup easier, we provide an [automated way to set up and run](#sentinel-setup-ee-only)
+the Sentinel daemon with GitLab EE.
+
+#### Sentinel setup (Community Edition)
+
+For GitLab CE, you need to install, configure, execute and monitor Sentinel
+by yourself.
+
+Here is an example configuration file (`sentinel.conf`) for a minimal Sentinel
+node:

 ```conf
-port 26379
-sentinel monitor gitlab-redis 10.0.0.1 6379 1
+# bind to all interfaces or change to a specific IP
+bind 0.0.0.0
+# default sentinel port
+port 26379
+sentinel auth-pass gitlab-redis redis-password-goes-here
+sentinel monitor gitlab-redis 10.0.0.1 6379 2
 sentinel down-after-milliseconds gitlab-redis 10000
 sentinel config-epoch gitlab-redis 0
 sentinel leader-epoch gitlab-redis 0
 ```

+#### Sentinel setup (EE Only)
+
+To set up Sentinel, you must edit the `/etc/gitlab/gitlab.rb` file.
+This is the minimal configuration required to run the daemon:
+
+```ruby
+redis['master_name'] = 'gitlab-redis' # must be the same in every sentinel node
+redis['master_ip'] = '10.0.0.1' # ip of the initial master redis instance
+redis['master_port'] = 6379 # port of the initial master redis instance
+redis['master_password'] = 'your-secure-password-here' # the same value defined in redis['password'] in the master instance
+
+sentinel['enable'] = true
+# sentinel['port'] = 26379
+
+## Quorum must reflect the number of voting Sentinels it takes to start a failover.
+sentinel['quorum'] = 2
+
+## Consider unresponsive server down after x amount of ms.
+# sentinel['down_after_milliseconds'] = 10000
+
+# sentinel['failover_timeout'] = 60000
+```
+
+When you install Sentinel on a separate machine, you need to control which
+other services will be running on it. Take a look at the following variables
+and enable or disable them as fits your strategy:
+
+```ruby
+# Enabled Redis and Sentinel services
+redis['enable'] = true
+sentinel['enable'] = true
+
+# Disabled all other services
+bootstrap['enable'] = false
+nginx['enable'] = false
+unicorn['enable'] = false
+sidekiq['enable'] = false
+postgresql['enable'] = false
+gitlab_workhorse['enable'] = false
+gitlab_rails['enable'] = false
+mailroom['enable'] = false
+```
+
+Remember that enabling a new service may also require additional configuration
+params (like `redis` for example).
+
 ---

 The final part is to inform the main GitLab application server of the Redis
@@ -243,7 +439,7 @@ or `gitlab-rails['redis_*']` in Omnibus):

 ```conf
 # sentinel.conf:
-sentinel monitor gitlab-redis 10.10.10.10 6379 1
+sentinel monitor gitlab-redis 10.10.10.10 6379 2
 sentinel down-after-milliseconds gitlab-redis 10000
 sentinel config-epoch gitlab-redis 0
 sentinel leader-epoch gitlab-redis 0
@@ -276,7 +472,7 @@ To make sure your configuration is correct:
    sudo gitlab-rails console

    # For source installations
-    sudo -u git rails console RAILS_ENV=production
+    sudo -u git rails console production
    ```

 1. Run in the console: