Improved documentation on HA sentinel part and Redis replication troubleshooting.

2016-10-15 05:40:15 +02:00 · 2016-10-15 05:40:15 +02:00 · c4d3c0de1f
commit c4d3c0de1f
parent f54d60b41d
1 changed files with 260 additions and 64 deletions
--- a/doc/administration/high_availability/redis.md
+++ b/doc/administration/high_availability/redis.md
@@ -8,6 +8,27 @@ that comes bundled with GitLab Omnibus packages.
  information. We recommend using a combination of a Redis password and tight
  firewall rules to secure your Redis service.
 <!-- START doctoc generated TOC please keep comment here to allow auto update -->
 <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
 **Table of Contents**
 - [Configure your own Redis server](#configure-your-own-redis-server)
 - [Configure Redis using Omnibus](#configure-redis-using-omnibus)
 - [Experimental Redis Sentinel support](#experimental-redis-sentinel-support)
  - [Redis setup](#redis-setup)
    - [Source install](#source-install)
    - [Omnibus Install](#omnibus-install)
    - [Troubleshooting Replication](#troubleshooting-replication)
  - [Sentinel](#sentinel)
    - [Sentinel setup (Community Edition)](#sentinel-setup-community-edition)
    - [Sentinel setup (EE Only)](#sentinel-setup-ee-only)
  - [GitLab setup](#gitlab-setup)
  - [Sentinel troubleshooting](#sentinel-troubleshooting)
    - [Omnibus install](#omnibus-install)
    - [Source install](#source-install-1)
 <!-- END doctoc generated TOC please keep comment here to allow auto update -->
 ## Configure your own Redis server
 If you're hosting GitLab on a cloud provider, you can optionally use a
@@ -37,6 +58,7 @@ Redis.
    unicorn['enable'] = false
    sidekiq['enable'] = false
    postgresql['enable'] = false
    gitlab_rails['enable'] = false
    gitlab_workhorse['enable'] = false
    mailroom['enable'] = false
@@ -59,120 +81,294 @@ Redis.
 ## Experimental Redis Sentinel support
-> [Introduced][ce-1877] in GitLab 8.11.
+> [Introduced][ce-1877] in GitLab 8.11, improved in 8.13.
 Since GitLab 8.11, you can configure a list of Redis Sentinel servers that
 will monitor a group of Redis servers to provide you with standard failover
 support.
 There is currently one exception to the Sentinel support: `mail_room`, the
 component that processes incoming emails. It doesn't support Sentinel yet, but
 we hope to integrate a future release that does support it.
 To get a better understanding of how to correctly set up Sentinel, please read
 the [Redis Sentinel documentation](http://redis.io/topics/sentinel) first, as
 failing to configure it correctly can lead to data loss.
 Redis Sentinel can handle the most important tasks in a HA environment to help
 keep servers online with minimal to no downtime:
 - Monitors master and slave instances to see if they are available.
 - Promotes a slave to master when the master fails.
 - Demotes a master to slave when the failed master comes back online (to prevent
  data-partitioning).
 - Can be queried by clients to always connect to the correct master server.
 There is currently one exception to the Sentinel support: `mail_room`, the
 component that processes incoming emails. It doesn't support Sentinel yet, but
 we hope to integrate a future release that does support it soon.
 The configuration consists of three parts:
- Redis setup
+- Set up Redis Master and Slave nodes
- Sentinel setup
+- Set up Sentinel nodes
- GitLab setup
+- Set up GitLab
 > **IMPORTANT**: You need at least 3 independent machines: physical, or VMs
 running on distinct physical machines. If you fail to provision the
 machines in that specific way, any issue with the shared environment can
 bring your entire setup down.
 Read carefully how to configure those components below.
 ### Redis setup
-You must have at least 2 Redis servers: 1 Master, 1 or more Slaves.
+You must have at least `3` Redis servers: `1` Master and `2` Slaves, and each needs to
 be on an independent machine (see explanation above).
 They should be configured the same way and with similar server specs, as
-in a failover situation, any Slave can be elected as the new Master by
+in a failover situation, any `Slave` can be elected as the new `Master` by
 the Sentinel servers.
-In a minimal setup, the only required change for the slaves in `redis.conf`
+With Sentinel, you must define a password to protect the access as both
-is the addition of a `slaveof` line pointing to the initial master.
+Sentinel instances and other Redis instances should be able to talk to
-You can increase the security by defining a `requirepass` configuration in
+each other over the network.
 the master, and `masterauth` in slaves.
---
+You'll need to define both `requirepass` and `masterauth` in all
 nodes because they can be re-configured at any time by the Sentinels
 during a failover, changing their status to `Master` or `Slave`.
-**Configuring your own Redis server**
+Initial `Slave` nodes will have in `redis.conf` an additional `slaveof` line
 pointing to the initial `Master`.
-1. Add to the slaves' `redis.conf`:
+#### Source install
 **Master Redis instance**
 You need to make the following changes in `redis.conf`:
 1. Define a `bind` address pointing to a local IP that your other machines
   can reach. If you really need to bind to an externally accessible IP, make
   sure you add extra firewall rules to prevent unauthorized access:
   ```conf
   # By default, if no "bind" configuration directive is specified, Redis listens
   # for connections from all the network interfaces available on the server.
   # It is possible to listen to just one or multiple selected interfaces using
   # the "bind" configuration directive, followed by one or more IP addresses.
   #
   # Examples:
   #
   # bind 192.168.1.100 10.0.0.1
   # bind 127.0.0.1 ::1
   bind 0.0.0.0 # This will bind to all interfaces
   ```
 1. Define a `port` to force Redis to listen on TCP so other machines can
   connect to it:
   ```conf
   # Accept connections on the specified port, default is 6379 (IANA #815344).
   # If port 0 is specified Redis will not listen on a TCP socket.
   port 6379
   ```
 1. Set up password authentication (use the same password in all nodes):
    ```conf
-    # IP and port of the master Redis server
+    requirepass "redis-password-goes-here"
-    slaveof 10.10.10.10 6379
+    masterauth "redis-password-goes-here"
    ```
 1. Optionally, set up password authentication for increased security.
   Add the following to master's `redis.conf`:
    ```conf
    # Optional password authentication for increased security
    requirepass "<password>"
    ```
 1. Then add this line to all the slave servers' `redis.conf`:
    ```conf
    masterauth "<password>"
    ```
 1. Restart the Redis services for the changes to take effect.
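
The password requirement above (both `requirepass` and `masterauth` defined on every node, with the same value) can be checked before restarting. The following is a minimal sketch, not part of GitLab or Redis: `parse_redis_conf` and `password_problems` are hypothetical helper names, and the sample configuration text is illustrative only.

```ruby
# Minimal sketch: confirm a redis.conf defines matching passwords.
# parse_redis_conf and password_problems are hypothetical helper names.

# Parse "directive value" lines, skipping blank lines and comment lines.
def parse_redis_conf(text)
  text.each_line.with_object({}) do |line, settings|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    directive, value = line.split(/\s+/, 2)
    # Drop optional surrounding double quotes from the value.
    settings[directive.downcase] = value.to_s.strip.gsub(/\A"|"\z/, '')
  end
end

# Return a list of human-readable problems (empty when the config looks fine).
def password_problems(settings)
  problems = []
  %w[requirepass masterauth].each do |key|
    problems << "#{key} is not set" if settings[key].to_s.empty?
  end
  if problems.empty? && settings['requirepass'] != settings['masterauth']
    problems << 'requirepass and masterauth differ'
  end
  problems
end

# Illustrative configuration text, not read from a real server.
sample_conf = <<~CONF
  port 6379
  requirepass "redis-password-goes-here"
  masterauth "redis-password-goes-here"
CONF

puts password_problems(parse_redis_conf(sample_conf)).inspect
```

Running the same check against the `redis.conf` of every node (for example with `File.read`) should return an empty list on each of them.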
---
+**Slave Redis instance**
-**Using Redis via Omnibus**
+1. Follow the same instructions as for the master, with this extra change in `redis.conf`:
-1. Edit `/etc/gitlab/gitlab.rb` of a master Redis machine (usualy a single machine):
+   ```conf
   # IP and port of the master Redis server
   slaveof 10.10.10.10 6379
   ```
-    ```ruby
+1. Restart the Redis services for the changes to take effect.
    ## Redis TCP support (will disable UNIX socket transport)
    redis['bind'] = '0.0.0.0' # or specify an IP to bind to a single one
    redis['port'] = 6379
-    ## Master redis instance
+#### Omnibus Install
    redis['password'] = 'redis-password-goes-here'
    ```
-1. Edit `/etc/gitlab/gitlab.rb` of a slave Redis machine (should be one or more machines):
+You need to install the Omnibus package on 3 different and independent machines.
 We will elect one as the initial `Master` and the other 2 as `Slaves`.
-    ```ruby
+If you are migrating from a single machine install, you may want to set up the
-    ## Redis TCP support (will disable UNIX socket transport)
+machines as Slaves, pointing to the original machine as `Master`, to migrate
-    redis['bind'] = '0.0.0.0' # or specify an IP to bind to a single one
+the data first, and then switch to this setup.
    redis['port'] = 6379
-    ## Slave redis instance
+To disable Redis in the single install, edit `/etc/gitlab/gitlab.rb`:
    redis['master'] = false
    redis['master_ip'] = '10.10.10.10' # IP of master Redis server
    redis['master_port'] = 6379 # Port of master Redis server
    redis['master_password'] = "redis-password-goes-here"
    ```
-1. Reconfigure the GitLab for the changes to take effect: `sudo gitlab-ctl reconfigure`
+```ruby
 redis['enable'] = false
 ```
 **Master Redis instances**
 You need to make the following changes in `/etc/gitlab/gitlab.rb`:
 1. Define a `redis['bind']` address pointing to a local IP that your other machines
   can reach. If you really need to bind to an externally accessible IP, make
   sure you add extra firewall rules to prevent unauthorized access.
 1. Define a `redis['port']` to force Redis to listen on TCP so other machines can
   connect to it.
 1. Set up password authentication with `redis['master_password']` (use the same
   password in all nodes).
 ```ruby
 ## Redis TCP support (will disable UNIX socket transport)
 redis['bind'] = '0.0.0.0' # or specify an IP to bind to a single one
 redis['port'] = 6379
 redis['requirepass'] = 'redis-password-goes-here'
 redis['master_password'] = 'redis-password-goes-here'
 ```
 Reconfigure GitLab Omnibus for the changes to take effect: `sudo gitlab-ctl reconfigure`
 **Slave Redis instances**
 You need to make the same changes listed for the `Master` instance,
 with an additional `Slave` section as in the example below:
 ```ruby
 ## Redis TCP support (will disable UNIX socket transport)
 redis['bind'] = '0.0.0.0' # or specify an IP to bind to a single one
 redis['port'] = 6379
 redis['requirepass'] = 'redis-password-goes-here'
 redis['master_password'] = 'redis-password-goes-here'
 ## Slave redis instance
 redis['master'] = false
 redis['master_ip'] = '10.10.10.10' # IP of master Redis server
 redis['master_port'] = 6379 # Port of master Redis server
 ```
 Reconfigure GitLab Omnibus for the changes to take effect: `sudo gitlab-ctl reconfigure`
 #### Troubleshooting Replication
 You can check if everything is correct by connecting to each server using
 the `redis-cli` application, and sending the `INFO` command.
 If authentication was correctly defined, it should fail with a
 `NOAUTH Authentication required` error. Authenticate with the
 previously defined password using `AUTH redis-password-goes-here` and
 try the `INFO` command again.
 Look for the `# Replication` section where you should see some important
 information like the `role` of the server.
 When connected to a `master` Redis, you will see the number of connected
 `slaves`, and a list of each with connection details.
 When it's a `slave`, you will see details of the master connection and whether
 it is `up` or `down`.
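
When checking several servers, it can help to pull just the `# Replication` fields out of the `INFO` output. The sketch below works on text you have already captured from `redis-cli`; `replication_info` is a hypothetical helper name, and the sample output is illustrative, with made-up IPs and offsets.

```ruby
# Minimal sketch: extract the "# Replication" section from INFO output.
# replication_info is a hypothetical helper; the sample text is illustrative.
def replication_info(info_text)
  in_section = false
  info_text.each_line.with_object({}) do |line, fields|
    line = line.strip
    if line.start_with?('#')
      # Section headers look like "# Server", "# Replication", and so on.
      in_section = (line == '# Replication')
      next
    end
    next unless in_section && line.include?(':')

    key, value = line.split(':', 2)
    fields[key] = value
  end
end

master_sample = <<~INFO
  # Server
  redis_version:3.2.0

  # Replication
  role:master
  connected_slaves:2
  slave0:ip=10.10.10.11,port=6379,state=online,offset=1000,lag=0
  slave1:ip=10.10.10.12,port=6379,state=online,offset=1000,lag=1

  # CPU
  used_cpu_sys:1.00
INFO

info = replication_info(master_sample)
puts "role: #{info['role']}, connected slaves: #{info['connected_slaves']}"
```

On a slave, the same hash would carry fields such as `master_host` and `master_link_status` instead of the `slaveN` entries.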
 ---
 Now that the Redis servers are all set up, let's configure the Sentinel
 servers.
-### Sentinel setup
+If you are not sure if your Redis servers are working and replicating
 correctly, please read the [Troubleshooting Replication](#troubleshooting-replication)
 section and fix any problems before proceeding with Sentinel setup.
-We provide an automated way to setup and run the Sentinel daemon
+### Sentinel
 with GitLab EE.
-See the instructions below how to setup it by yourself.
+You must have at least `3` Redis Sentinel servers, and each needs to
 be on an independent machine. You can install them on the same
 machines where you installed the `3` Redis servers.
-Here is an example configuration file (`sentinel.conf`) for a Sentinel node:
+This number is required for the consensus algorithm to be effective
 in the case of a failure. You should always have an odd number
 of Sentinel nodes provisioned.
 Here is a simple explanation on how Sentinel handles a failover:
 When a number of Sentinels (the `quorum` value) agree that the `master` is
 not reachable, the **majority** of the Sentinels must elect a temporary
 Sentinel `leader`, which is responsible for starting the failover.
 As an example, for a cluster of `3` Sentinels, at least `2` must agree on a
 `leader`. If you have a total of `5`, at least `3` must agree on the `leader`.
 The `quorum` is only used to detect failure, not to elect the `leader`.
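
The two thresholds can be written out as simple arithmetic. This is only an illustration of the rule described above, not Sentinel's actual implementation; the method and parameter names are hypothetical.

```ruby
# Minimal sketch of the two thresholds described above.
# Method and parameter names are hypothetical; this is not Sentinel's code.

# Smallest number of Sentinels that forms a majority of the cluster.
def majority(total_sentinels)
  total_sentinels / 2 + 1
end

# A failover can proceed only when enough Sentinels report the master as
# unreachable (quorum) AND enough Sentinels are reachable to elect a
# leader (majority of all Sentinels).
def failover_possible?(total:, quorum:, reporting_down:, reachable:)
  reporting_down >= quorum && reachable >= majority(total)
end

puts majority(3) # 2 of 3 must agree on a leader
puts majority(5) # 3 of 5 must agree on a leader

# Quorum is met (1 report is enough here), but a single reachable
# Sentinel cannot form a majority of 3, so no failover starts.
puts failover_possible?(total: 3, quorum: 1, reporting_down: 1, reachable: 1)
```

The same arithmetic shows why an odd number is preferred: `majority(4)` is `3`, so a fourth Sentinel tolerates no more failures than three do.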
 The official [Sentinel documentation](http://redis.io/topics/sentinel#example-sentinel-deployments)
 also lists different network topologies and warns against situations like
 network partition and how it can affect the state of the HA solution. Make
 sure you read it carefully and understand the implications in your current
 setup.
 To make Sentinel setup easier, we provide an [automated way to set up and run](#sentinel-setup-ee-only)
 the Sentinel daemon with GitLab EE.
 #### Sentinel setup (Community Edition)
 For GitLab CE, you need to install, configure, execute and monitor Sentinel
 by yourself.
 Here is an example configuration file (`sentinel.conf`) for a minimal Sentinel
 node:
 ```conf
-port 26379
+bind 0.0.0.0 # bind to all interfaces or change to a specific IP
-sentinel monitor gitlab-redis 10.0.0.1 6379 1
+port 26379 # default sentinel port
 sentinel auth-pass gitlab-redis redis-password-goes-here
 sentinel monitor gitlab-redis 10.0.0.1 6379 2
 sentinel down-after-milliseconds gitlab-redis 10000
 sentinel config-epoch gitlab-redis 0
 sentinel leader-epoch gitlab-redis 0
 ```
 #### Sentinel setup (EE Only)
 To set up Sentinel, you must edit the `/etc/gitlab/gitlab.rb` file.
 This is a minimal configuration required to run the daemon:
 ```ruby
 redis['master_name'] = 'gitlab-redis' # must be the same in every sentinel node
 redis['master_ip'] = '10.0.0.1' # ip of the initial master redis instance
 redis['master_port'] = 6379 # port of the initial master redis instance
 redis['master_password'] = 'your-secure-password-here' # the same value defined in redis['password'] in the master instance
 sentinel['enable'] = true
 # sentinel['port'] = 26379
 ## Quorum must reflect the number of voting Sentinels it takes to start a failover.
 sentinel['quorum'] = 2
 ## Consider unresponsive server down after x amount of ms.
 # sentinel['down_after_milliseconds'] = 10000
 # sentinel['failover_timeout'] = 60000
 ```
 When you install Sentinel on a separate machine, you need to control which
 other services will be running on it. Take a look at the following variables
 and enable or disable them as fits your strategy:
 ```ruby
 # Enabled Redis and Sentinel services
 redis['enable'] = true
 sentinel['enable'] = true
 # Disabled all other services
 bootstrap['enable'] = false
 nginx['enable'] = false
 unicorn['enable'] = false
 sidekiq['enable'] = false
 postgresql['enable'] = false
 gitlab_workhorse['enable'] = false
 gitlab_rails['enable'] = false
 mailroom['enable'] = false
 ```
 Remember that enabling a new service may also require additional configuration
 parameters (as with `redis`, for example).
 ---
 The final part is to inform the main GitLab application server of the Redis
@@ -243,7 +439,7 @@ or `gitlab-rails['redis_*']` in Omnibus):
 ```conf
 # sentinel.conf:
-sentinel monitor gitlab-redis 10.10.10.10 6379 1
+sentinel monitor gitlab-redis 10.10.10.10 6379 2
 sentinel down-after-milliseconds gitlab-redis 10000
 sentinel config-epoch gitlab-redis 0
 sentinel leader-epoch gitlab-redis 0
@@ -276,7 +472,7 @@ To make sure your configuration is correct:
    sudo gitlab-rails console
    # For source installations
-    sudo -u git rails console RAILS_ENV=production
+    sudo -u git rails console production
    ```
 1. Run in the console: