Scaling and High Availability

GitLab supports a number of scaling options to ensure that your self-managed instance is able to scale out to meet your organization's needs when scaling up is no longer practical or feasible.

GitLab also offers high availability options for organizations that require the fault tolerance and redundancy necessary to maintain high-uptime operations.

Scaling and high availability can be tackled separately as GitLab comprises modular components which can be individually scaled or made highly available depending on your organization's needs and resources.

On this page, we present examples of self-managed instances which demonstrate how GitLab can be scaled out and made highly available. These examples progress from simple to complex as scaling or highly-available components are added.

For larger setups serving 2,000 or more users, we provide reference architectures based on GitLab's experience with GitLab.com and internal scale testing that aim to achieve the right balance of scalability and availability.

For detailed insight into how GitLab scales and configures GitLab.com, you can watch this one-hour Q&A with John Northrup, which includes live questions from some of our customers.

Scaling examples

Single-node Omnibus installation

This solution is appropriate for many teams that have a single server at their disposal. With automatic backup of the GitLab repositories, configuration, and the database, this can be an optimal solution if you don't have strict availability requirements.

You can also optionally configure GitLab to use an external PostgreSQL service or an external object storage service for added performance and reliability at a relatively low complexity cost.


Omnibus installation with multiple application servers

This solution is appropriate for teams that are starting to scale out when scaling up is no longer meeting their needs. In this configuration, additional application nodes will handle frontend traffic, with a load balancer in front to distribute traffic across those nodes. Meanwhile, each application node connects to a shared file server and PostgreSQL and Redis services on the back end.

The additional application servers add limited fault tolerance to your GitLab instance. As long as one application node is online and capable of handling the instance's usage load, your team's productivity will not be interrupted. Having multiple application nodes also enables zero-downtime updates.


High-availability examples

Omnibus installation with automatic database failover

By adding automatic failover for database systems, we can enable higher uptime with an additional layer of complexity.

For PostgreSQL, we provide repmgr for server cluster management and failover and a combination of PgBouncer and Consul for database client cutover.
For Redis, we use Redis Sentinel for server failover and client cutover.

You can also optionally run additional Sidekiq processes on dedicated hardware and configure individual Sidekiq processes to process specific background job queues if you need to scale out background job processing.

GitLab Geo

GitLab Geo allows you to replicate your GitLab instance to other geographical locations as a read-only, fully operational instance that can also be promoted in case of disaster.

This configuration is supported in GitLab Premium and Ultimate.


GitLab components and configuration instructions

The GitLab application depends on the following components. It can also depend on several third-party services, depending on your environment setup. Here we'll detail both, in the order in which you would typically configure them, along with our recommendations for their use and configuration.

Third-party services

Here are some details of several third-party services that a typical environment will depend on. These services can be provided by numerous applications or providers, and further advice can be given on how best to select them. Configure them first, before the GitLab components.

Component	Description	Configuration instructions
Load Balancer(s)¹	Handles load balancing for the GitLab nodes where required	Load balancer HA configuration
Cloud Object Storage service²	Recommended store for shared data objects	Cloud Object Storage configuration
NFS³ ⁴	Shared disk storage service. Can be used as an alternative for Gitaly or Object Storage. Required for GitLab Pages	NFS configuration

GitLab components

Next are all of the components provided directly by GitLab. As mentioned earlier, they are presented in the typical order you would configure them.

Component	Description	Configuration instructions
Consul⁵	Service discovery and health checks/failover	Consul HA configuration (PREMIUM ONLY)
PostgreSQL	Database	Database HA configuration
PgBouncer	Database Pool Manager	PgBouncer HA configuration (PREMIUM ONLY)
Redis⁵ with Redis Sentinel	Key/Value store for shared data with HA watcher service	Redis HA configuration
Gitaly⁶ ³ ⁴	Recommended high-level storage for Git repository data	Gitaly HA configuration
Sidekiq	Asynchronous/Background jobs	Sidekiq configuration
GitLab application nodes⁷	(Unicorn / Puma, Workhorse) - Web-requests (UI, API, Git over HTTP)	GitLab app HA/scaling configuration
Prometheus and Grafana	GitLab environment monitoring	Monitoring node for scaling/HA

In some cases, components can be combined on the same nodes to reduce complexity as well.

Recommended setups based on number of users

1 - 1000 Users: A single-node Omnibus setup with frequent backups. Refer to the requirements page for further details of the specs you will require.
1000 - 10000 Users: A scaled environment based on one of our Reference Architectures, without the HA components applied. This can be a reasonable step towards a fully HA environment.
2000 - 50000+ Users: A scaled HA environment based on one of our Reference Architectures below.

Reference architectures

In this section we'll detail the Reference Architectures that can support large numbers of users. These were built, tested and verified by our Quality and Support teams.

Testing was done with our GitLab Performance Tool at specific coded workloads, and the throughputs used for testing were calculated based on sample customer data. We test each endpoint type with the following number of requests per second (RPS) per 1000 users:

API: 20 RPS
Web: 2 RPS
Git: 2 RPS
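
As a quick illustration of the arithmetic, the sketch below scales the per-1,000-user rates listed above to a given user count. It assumes the rates scale linearly with the number of users, which matches the test RPS rates quoted for each reference architecture on this page. The function and constant names are hypothetical and are not part of the GitLab Performance Tool.

```python
from __future__ import annotations

# Requests per second (RPS) per 1,000 users, as listed above.
RPS_PER_1000_USERS = {"API": 20, "Web": 2, "Git": 2}


def target_rps(users: int) -> dict[str, float]:
    """Scale the per-1,000-user rates linearly to `users` (assumed linear)."""
    if users < 0:
        raise ValueError("users must be non-negative")
    return {
        endpoint: rate * users / 1000
        for endpoint, rate in RPS_PER_1000_USERS.items()
    }


if __name__ == "__main__":
    # Print the targets for the user counts covered by the reference architectures.
    for users in (2_000, 5_000, 10_000, 25_000, 50_000):
        print(users, target_rps(users))
```

For example, `target_rps(2_000)` gives 40 RPS for API and 4 RPS each for Web and Git, the same rates listed for the 2,000 user configuration.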

Note: Depending on your workflow, the recommended reference architectures below may need to be adapted. Your workload is influenced by factors such as, but not limited to, how active your users are, how much automation you use, mirroring, and repo/change size. Additionally, the memory values shown are taken directly from the GCP machine types. On other cloud vendors, a best-effort like-for-like machine type can be used.

2,000 user configuration

Supported users (approximate): 2,000
Test RPS rates: API: 40 RPS, Web: 4 RPS, Git: 4 RPS
Known issues: List of known performance issues

Service	Nodes	Configuration⁸	GCP type
GitLab Rails⁷	3	8 vCPU, 7.2GB Memory	n1-highcpu-8
PostgreSQL	3	2 vCPU, 7.5GB Memory	n1-standard-2
PgBouncer	3	2 vCPU, 1.8GB Memory	n1-highcpu-2
Gitaly⁶ ³ ⁴	X	4 vCPU, 15GB Memory	n1-standard-4
Redis⁵	3	2 vCPU, 7.5GB Memory	n1-standard-2
Consul + Sentinel⁵	3	2 vCPU, 1.8GB Memory	n1-highcpu-2
Sidekiq	4	2 vCPU, 7.5GB Memory	n1-standard-2
Cloud Object Storage²	-	-	-
NFS Server³ ⁴	1	4 vCPU, 3.6GB Memory	n1-highcpu-4
Monitoring node	1	2 vCPU, 1.8GB Memory	n1-highcpu-2
External load balancing node¹	1	2 vCPU, 1.8GB Memory	n1-highcpu-2
Internal load balancing node¹	1	2 vCPU, 1.8GB Memory	n1-highcpu-2

5,000 user configuration

Supported users (approximate): 5,000
Test RPS rates: API: 100 RPS, Web: 10 RPS, Git: 10 RPS
Known issues: List of known performance issues

Service	Nodes	Configuration⁸	GCP type
GitLab Rails⁷	3	16 vCPU, 14.4GB Memory	n1-highcpu-16
PostgreSQL	3	2 vCPU, 7.5GB Memory	n1-standard-2
PgBouncer	3	2 vCPU, 1.8GB Memory	n1-highcpu-2
Gitaly⁶ ³ ⁴	X	8 vCPU, 30GB Memory	n1-standard-8
Redis⁵	3	2 vCPU, 7.5GB Memory	n1-standard-2
Consul + Sentinel⁵	3	2 vCPU, 1.8GB Memory	n1-highcpu-2
Sidekiq	4	2 vCPU, 7.5GB Memory	n1-standard-2
Cloud Object Storage²	-	-	-
NFS Server³ ⁴	1	4 vCPU, 3.6GB Memory	n1-highcpu-4
Monitoring node	1	2 vCPU, 1.8GB Memory	n1-highcpu-2
External load balancing node¹	1	2 vCPU, 1.8GB Memory	n1-highcpu-2
Internal load balancing node¹	1	2 vCPU, 1.8GB Memory	n1-highcpu-2

10,000 user configuration

Supported users (approximate): 10,000
Test RPS rates: API: 200 RPS, Web: 20 RPS, Git: 20 RPS
Known issues: List of known performance issues

Service	Nodes	Configuration⁸	GCP type
GitLab Rails⁷	3	32 vCPU, 28.8GB Memory	n1-highcpu-32
PostgreSQL	3	4 vCPU, 15GB Memory	n1-standard-4
PgBouncer	3	2 vCPU, 1.8GB Memory	n1-highcpu-2
Gitaly⁶ ³ ⁴	X	16 vCPU, 60GB Memory	n1-standard-16
Redis⁵ - Cache	3	4 vCPU, 15GB Memory	n1-standard-4
Redis⁵ - Queues / Shared State	3	4 vCPU, 15GB Memory	n1-standard-4
Redis Sentinel⁵ - Cache	3	1 vCPU, 1.7GB Memory	g1-small
Redis Sentinel⁵ - Queues / Shared State	3	1 vCPU, 1.7GB Memory	g1-small
Consul	3	2 vCPU, 1.8GB Memory	n1-highcpu-2
Sidekiq	4	4 vCPU, 15GB Memory	n1-standard-4
Cloud Object Storage²	-	-	-
NFS Server³ ⁴	1	4 vCPU, 3.6GB Memory	n1-highcpu-4
Monitoring node	1	4 vCPU, 3.6GB Memory	n1-highcpu-4
External load balancing node¹	1	2 vCPU, 1.8GB Memory	n1-highcpu-2
Internal load balancing node¹	1	2 vCPU, 1.8GB Memory	n1-highcpu-2

25,000 user configuration

Supported users (approximate): 25,000
Test RPS rates: API: 500 RPS, Web: 50 RPS, Git: 50 RPS
Known issues: List of known performance issues

Service	Nodes	Configuration⁸	GCP type
GitLab Rails⁷	5	32 vCPU, 28.8GB Memory	n1-highcpu-32
PostgreSQL	3	8 vCPU, 30GB Memory	n1-standard-8
PgBouncer	3	2 vCPU, 1.8GB Memory	n1-highcpu-2
Gitaly⁶ ³ ⁴	X	32 vCPU, 120GB Memory	n1-standard-32
Redis⁵ - Cache	3	4 vCPU, 15GB Memory	n1-standard-4
Redis⁵ - Queues / Shared State	3	4 vCPU, 15GB Memory	n1-standard-4
Redis Sentinel⁵ - Cache	3	1 vCPU, 1.7GB Memory	g1-small
Redis Sentinel⁵ - Queues / Shared State	3	1 vCPU, 1.7GB Memory	g1-small
Consul	3	2 vCPU, 1.8GB Memory	n1-highcpu-2
Sidekiq	4	4 vCPU, 15GB Memory	n1-standard-4
Cloud Object Storage²	-	-	-
NFS Server³ ⁴	1	4 vCPU, 3.6GB Memory	n1-highcpu-4
Monitoring node	1	4 vCPU, 3.6GB Memory	n1-highcpu-4
External load balancing node¹	1	2 vCPU, 1.8GB Memory	n1-highcpu-2
Internal load balancing node¹	1	4 vCPU, 3.6GB Memory	n1-highcpu-4

50,000 user configuration

Supported users (approximate): 50,000
Test RPS rates: API: 1000 RPS, Web: 100 RPS, Git: 100 RPS
Known issues: List of known performance issues

Service	Nodes	Configuration⁸	GCP type
GitLab Rails⁷	12	32 vCPU, 28.8GB Memory	n1-highcpu-32
PostgreSQL	3	16 vCPU, 60GB Memory	n1-standard-16
PgBouncer	3	2 vCPU, 1.8GB Memory	n1-highcpu-2
Gitaly⁶ ³ ⁴	X	64 vCPU, 240GB Memory	n1-standard-64
Redis⁵ - Cache	3	4 vCPU, 15GB Memory	n1-standard-4
Redis⁵ - Queues / Shared State	3	4 vCPU, 15GB Memory	n1-standard-4
Redis Sentinel⁵ - Cache	3	1 vCPU, 1.7GB Memory	g1-small
Redis Sentinel⁵ - Queues / Shared State	3	1 vCPU, 1.7GB Memory	g1-small
Consul	3	2 vCPU, 1.8GB Memory	n1-highcpu-2
Sidekiq	4	4 vCPU, 15GB Memory	n1-standard-4
Cloud Object Storage²	-	-	-
NFS Server³ ⁴	1	4 vCPU, 3.6GB Memory	n1-highcpu-4
Monitoring node	1	4 vCPU, 3.6GB Memory	n1-highcpu-4
External load balancing node¹	1	2 vCPU, 1.8GB Memory	n1-highcpu-2
Internal load balancing node¹	1	8 vCPU, 7.2GB Memory	n1-highcpu-8
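
To show how the tiers above relate to one another, here is a minimal sketch that picks the smallest tested reference architecture covering a given user count and reports the total vCPUs across its GitLab Rails nodes. The Rails node counts and vCPU figures are copied from the tables above. The helper names are hypothetical, and choosing "the next tier up" for in-between user counts is an assumption, not a documented rule. For small instances, the single-node recommendation earlier on this page still applies.

```python
from __future__ import annotations

from bisect import bisect_left

# (supported users, GitLab Rails nodes, vCPUs per Rails node), from the tables above.
REFERENCE_ARCHITECTURES = [
    (2_000, 3, 8),
    (5_000, 3, 16),
    (10_000, 3, 32),
    (25_000, 5, 32),
    (50_000, 12, 32),
]


def smallest_architecture(users: int) -> tuple[int, int, int] | None:
    """Return the smallest tested tier that covers `users`.

    Returns None above 50,000 users, where this page lists no tested tier.
    """
    sizes = [tier[0] for tier in REFERENCE_ARCHITECTURES]
    index = bisect_left(sizes, users)  # first tier with size >= users
    if index == len(sizes):
        return None
    return REFERENCE_ARCHITECTURES[index]


def rails_vcpus(tier: tuple[int, int, int]) -> int:
    """Total vCPUs across all GitLab Rails nodes in a tier."""
    _, nodes, vcpus_per_node = tier
    return nodes * vcpus_per_node


if __name__ == "__main__":
    for users in (2_000, 7_500, 50_000, 60_000):
        tier = smallest_architecture(users)
        if tier is None:
            print(users, "-> no tested tier")
        else:
            print(users, "->", tier[0], "user tier,", rails_vcpus(tier), "Rails vCPUs")
```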

1. Our architectures have been tested and validated with HAProxy as the load balancer. Other reputable load balancers with similar feature sets should also work, but be aware that they have not been validated.
2. For data objects such as LFS, Uploads, Artifacts, etc., we recommend a Cloud Object Storage service over NFS where possible, due to better performance and availability.
3. NFS can be used as an alternative for both repository data (replacing Gitaly) and object storage, but this isn't typically recommended for performance reasons. Note, however, that it is required for GitLab Pages.
4. We strongly recommend that any Gitaly and/or NFS nodes are set up with SSD disks rather than HDD, with a throughput of at least 8,000 IOPS for read operations and 2,000 IOPS for write operations, as these components have heavy I/O. These IOPS values are recommended only as a starting point; over time they may be adjusted higher or lower depending on the scale of your environment's workload. If you're running the environment on a cloud provider, you may need to refer to its documentation on how to configure IOPS correctly.
5. The recommended Redis setup differs depending on the size of the architecture. For smaller architectures (up to 5,000 users) we suggest one Redis cluster for all classes, with Redis Sentinel hosted alongside Consul. For larger architectures (10,000 users or more) we suggest running one Redis cluster for the Cache class and another for the Queues and Shared State classes. We also recommend running a separate Redis Sentinel cluster for each Redis cluster.
6. Gitaly node requirements depend on customer data, specifically the number of projects and their sizes. We recommend 2 nodes as an absolute minimum for HA environments, and at least 4 nodes when supporting 50,000 or more users. We also recommend that each Gitaly node store no more than 5TB of data and have the number of gitaly-ruby workers set to 20% of available CPUs. Additional nodes should be considered in conjunction with a review of expected data size and spread, based on the recommendations above.
7. In our architectures we run each GitLab Rails node using the Puma webserver, with its number of workers set to 90% of available CPUs along with 4 threads.
8. The architectures were built and tested with the Intel Xeon E5 v3 (Haswell) CPU platform on GCP. On different hardware you may find that adjustments, either lower or higher, are required for your CPU or node counts. For more information, a Sysbench benchmark of the CPU can be found here.
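
The sizing rules in the Gitaly and GitLab Rails notes above can be turned into a short calculation. The following is a minimal sketch, not an official sizing tool: the function names are hypothetical, and rounding percentages down (with a floor of one worker) is an assumption, since the notes do not say how fractional values should be rounded.

```python
import math

MAX_TB_PER_GITALY_NODE = 5   # each Gitaly node should store no more than 5TB
MIN_GITALY_NODES_HA = 2      # absolute minimum for HA environments
MIN_GITALY_NODES_50K = 4     # minimum when supporting 50,000 or more users


def gitaly_node_count(repo_data_tb: float, users: int) -> int:
    """Gitaly nodes needed for the storage limit and the minimum node counts."""
    by_storage = math.ceil(repo_data_tb / MAX_TB_PER_GITALY_NODE)
    minimum = MIN_GITALY_NODES_50K if users >= 50_000 else MIN_GITALY_NODES_HA
    return max(by_storage, minimum)


def gitaly_ruby_workers(cpus: int) -> int:
    """20% of available CPUs (rounding down, minimum 1, is an assumption)."""
    return max(1, cpus * 20 // 100)


def puma_settings(cpus: int) -> tuple:
    """(workers, threads): workers at 90% of CPUs, rounded down, with 4 threads."""
    return max(1, cpus * 90 // 100), 4


if __name__ == "__main__":
    # Hypothetical example: 12TB of repository data for 10,000 users.
    print("Gitaly nodes:", gitaly_node_count(12, 10_000))
    print("gitaly-ruby workers on 16 CPUs:", gitaly_ruby_workers(16))
    print("Puma (workers, threads) on 32 CPUs:", puma_settings(32))
```

Integer arithmetic is used for the percentages to avoid floating-point rounding surprises. Treat the results as a starting point to review against your own data size and spread.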