gitlab-org--gitlab-foss/doc/development/real_time.md

97 lines
4.2 KiB
Markdown

---
stage: Plan
group: Project Management
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---
# Real-Time Features
This guide contains instructions on how to safely roll out new real-time
features.
Real-time features are implemented using GraphQL Subscriptions.
[Developer documentation](api_graphql_styleguide.md#subscriptions) is available.
WebSockets are a relatively new technology at GitLab, and supporting them at
scale introduces some challenges. For that reason, new features should be rolled
out using the instructions below.
## Reuse an existing WebSocket connection
Features reusing an existing connection incur minimal risk. Feature flag rollout
is recommended in order to give more control to self-hosting customers. However,
it is not necessary to roll out in percentages, or to estimate new connections for
GitLab.com.
## Introduce a new WebSocket connection
Any change that introduces a WebSocket connection to part of the GitLab application
incurs some scalability risk, both to nodes responsible for maintaining open
connections and on downstream services; such as Redis and the primary database.
### Estimate peak connections
The first real-time feature to be fully enabled on GitLab.com was
[real-time assignees](https://gitlab.com/gitlab-org/gitlab/-/issues/17589). By comparing
peak throughput to the issue page against peak simultaneous WebSocket connections it is
possible to crudely estimate that each 1 request per second adds
approximately 4200 WebSocket connections.
To understand the impact a new feature might have, sum the peak throughput (RPS)
to the pages it originates from (`n`) and apply the formula:
```ruby
(n * 4200) / peak_active_connections
```
Current active connections are visible on
[this Grafana chart](https://dashboards.gitlab.net/d/websockets-main/websockets-overview?viewPanel=1357460996&orgId=1).
This calculation is crude, and should be revised as new features are
deployed. It yields a rough estimate of the capacity that must be
supported, as a proportion of existing capacity.
### Graduated roll-out
New capacity may need to be provisioned to support your changes, depending on
current saturation and the proportion of new connections required. While
Kubernetes makes this relatively easy in most cases, there remains a risk to
downstream services.
To mitigate this, ensure that the code establishing the new WebSocket connection
is feature flagged and defaulted to `off`. A careful, percentage-based roll-out
of the feature flag ensures that effects can be observed on the [WebSocket
dashboard](https://dashboards.gitlab.net/d/websockets-main/websockets-overview?orgId=1)
1. Create a
[feature flag roll-out](https://gitlab.com/gitlab-org/gitlab/-/blob/master/.gitlab/issue_templates/Feature%20Flag%20Roll%20Out.md)
issue.
1. Add the estimated new connections required under the **What are we expecting to happen** section.
1. Copy in a member of the Plan and Scalability teams to estimate a percentage-based
roll-out plan.
## Backward compatibility
For the duration of the feature flag roll-out and indefinitely thereafter,
real-time features must be backward-compatible, or at least degrade
gracefully. Not all customers have Action Cable enabled, and further work
needs to be done before Action Cable can be enabled by default.
Making real-time a requirement represents a breaking change, so the next
opportunity to do this is version 15.0.
## Enable Real-Time by default
Mounting the Action Cable library adds minimal memory footprint. However,
serving WebSocket requests introduces additional memory requirements. For this
reason, enabling Action Cable by default requires additional work; perhaps
to reduce overall memory usage, including a known issue with Workhorse, but at
least to revise Reference Architectures.
## Real-time infrastructure on GitLab.com
On GitLab.com, WebSocket connections are served from dedicated infrastructure,
entirely separate from the regular Web fleet and deployed with Kubernetes. This
limits risk to nodes handling requests but not to shared services. For more
information on the WebSockets Kubernetes deployment see
[this epic](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/355).