---
stage: Growth
group: Product Intelligence
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
---

# Develop and test Service Ping

To add a new metric and test Service Ping:

1. [Name and place the metric](#name-and-place-the-metric)
1. [Test counters manually using your Rails console](#test-counters-manually-using-your-rails-console)
1. [Generate the SQL query](#generate-the-sql-query)
1. [Optimize queries with `#database-lab`](#optimize-queries-with-database-lab)
1. [Add the metric definition](#add-the-metric-definition)
1. [Add the metric to the Versions Application](#add-the-metric-to-the-versions-application)
1. [Create a merge request](#create-a-merge-request)
1. [Verify your metric](#verify-your-metric)
1. [Set up and test Service Ping locally](#set-up-and-test-service-ping-locally)

## Name and place the metric

Add the metric in one of the top-level keys:

- `settings`: for settings-related metrics.
- `counts_weekly`: for counters that have data for the most recent 7 days.
- `counts_monthly`: for counters that have data for the most recent 28 days.
- `counts`: for counters that have data for all time.

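For illustration, metrics under these keys are instrumented in `lib/gitlab/usage_data.rb`. A minimal sketch of entries under the `counts` key might look like the following (the metric names and relations are hypothetical examples, not the file's actual contents):

```ruby
# Illustrative sketch of entries in lib/gitlab/usage_data.rb -- the metric
# names and relations here are hypothetical.
def system_usage_data
  {
    counts: {
      # All-time counter of issues.
      issues: count(Issue),
      # All-time distinct count of users who authored at least one note.
      note_authors: distinct_count(::Note, :author_id)
    }
  }
end
```
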
### How to get a metric name suggestion

The metric YAML generator can suggest a metric name for you.
To generate a metric name suggestion, first instrument the metric at the provided `key_path`.
Then, generate the metric's YAML definition and return to the instrumentation and update it.

1. Add the metric instrumentation to `lib/gitlab/usage_data.rb` inside one
   of the [top-level keys](#name-and-place-the-metric), using any name you choose.
1. Run the [metrics YAML generator](metrics_dictionary.md#metrics-definition-and-validation).
1. Use the metric name suggestion to select a suitable metric name.
1. Update the instrumentation you created in the first step and change the metric name to the suggested name.
1. Update the metric's YAML definition with the correct `key_path`.

## Test counters manually using your Rails console

```ruby
# count
Gitlab::UsageData.count(User.active)
Gitlab::UsageData.count(::Clusters::Cluster.aws_installed.enabled, :cluster_id)

# count distinct
Gitlab::UsageData.distinct_count(::Project, :creator_id)
Gitlab::UsageData.distinct_count(::Note.with_suggestions.where(time_period), :author_id, start: ::User.minimum(:id), finish: ::User.maximum(:id))
```

## Generate the SQL query

Your Rails console returns the generated SQL queries. For example:

```ruby
pry(main)> Gitlab::UsageData.count(User.active)
   (2.6ms)  SELECT "features"."key" FROM "features"
   (15.3ms)  SELECT MIN("users"."id") FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (6, 4))
   (2.4ms)  SELECT MAX("users"."id") FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (6, 4))
   (1.9ms)  SELECT COUNT("users"."id") FROM "users" WHERE ("users"."state" IN ('active')) AND ("users"."user_type" IS NULL OR "users"."user_type" IN (6, 4)) AND "users"."id" BETWEEN 1 AND 100000
```

## Optimize queries with `#database-lab`

`#database-lab` is a Slack channel that uses a production-sized environment to test your queries.
Paste the SQL query into `#database-lab` to see how the query performs at scale.

- GitLab.com's production database has a 15 second timeout.
- Any single query must stay below the [1 second execution time](../query_performance.md#timing-guidelines-for-queries) with cold caches.
- Add a specialized index on columns involved to reduce the execution time.

To understand the query's execution, we add the following information
to a merge request description:

- For counters that have a `time_period` test, we add information for both cases (see the console sketch after this list):
  - `time_period = {}` for all time periods.
  - `time_period = { created_at: 28.days.ago..Time.current }` for the last 28 days.
- Execution plan and query time before and after optimization.
- Query generated for the index and time.
- Migration output for up and down execution.

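For example, to capture both variants you can run the counter twice in your Rails console, once per `time_period` value, and paste each generated query into `#database-lab`. This sketch reuses the `Note` counter shown earlier on this page:

```ruby
# Run the same counter with each time_period to collect both generated queries
# and timings for the merge request description.

# All time:
time_period = {}
Gitlab::UsageData.distinct_count(::Note.with_suggestions.where(time_period), :author_id)

# Last 28 days:
time_period = { created_at: 28.days.ago..Time.current }
Gitlab::UsageData.distinct_count(::Note.with_suggestions.where(time_period), :author_id)
```
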
We also use `#database-lab` and [explain.depesz.com](https://explain.depesz.com/). For more details, see the [database review guide](../database_review.md#preparation-when-adding-or-modifying-queries).

### Optimization recommendations and examples

- Use specialized indexes. For examples, see these merge requests:
  - [Example 1](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/26871)
  - [Example 2](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/26445)
- Use defined `start` and `finish`, and simple queries.
  These values can be memoized and reused, as in this [example merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/37155) and in the sketch after this list.
- Avoid joins and write the queries as simply as possible,
  as in this [example merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/36316).
- Set a custom `batch_size` for `distinct_count`, as in this [example merge request](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/38000).

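A minimal sketch combining the `start`/`finish` and `batch_size` recommendations is shown below. The `Deployment` counter and the helper names are hypothetical and are not taken from the merge requests above; `strong_memoize` assumes `Gitlab::Utils::StrongMemoize` is available, as it is in `usage_data.rb`:

```ruby
# Memoize the batch boundaries once so several counters can reuse them instead
# of issuing MIN/MAX queries repeatedly (hypothetical helper names).
def user_minimum_id
  strong_memoize(:user_minimum_id) { ::User.minimum(:id) }
end

def user_maximum_id
  strong_memoize(:user_maximum_id) { ::User.maximum(:id) }
end

# Hypothetical counter that reuses the memoized boundaries and a smaller batch size.
time_period = { created_at: 28.days.ago..Time.current }
Gitlab::UsageData.distinct_count(
  ::Deployment.where(time_period),
  :user_id,
  start: user_minimum_id,
  finish: user_maximum_id,
  batch_size: 10_000
)
```

Reusing memoized boundaries avoids repeating the expensive `MIN`/`MAX` queries for every counter, and a smaller `batch_size` helps keep each batched query within the timing guidelines.
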
## Add the metric definition

See the [Metrics Dictionary guide](metrics_dictionary.md) for more information.

## Add the metric to the Versions Application

Check if the new metric must be added to the Versions Application. See the `usage_data` [schema](https://gitlab.com/gitlab-services/version-gitlab-com/-/blob/master/db/schema.rb#L147) and Service Data [parameters accepted](https://gitlab.com/gitlab-services/version-gitlab-com/-/blob/master/app/services/usage_ping.rb). Any metrics added under the `counts` key are saved in the `stats` column.

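For a quick check from the Versions Application side, you can inspect the most recent record's `stats` column in the `versions` Rails console. This is a sketch and assumes `stats` is exposed as a regular attribute on the `UsageData` model:

```ruby
# In the `versions` Rails console (sketch; assumes `stats` is a regular
# ActiveRecord attribute on the UsageData model).
UsageData.last&.stats   # contents of the `stats` column for the latest ping received
```
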
## Create a merge request

Create a merge request for the new Service Ping metric, and do the following:

- Add the `feature` label to the merge request. A metric is a user-facing change and is part of expanding the Service Ping feature.
- Add a changelog entry that complies with the [changelog entries guide](../changelog.md).
- Ask for a Product Intelligence review.
  On GitLab.com, we have DangerBot set up to monitor Product Intelligence related files and recommend a [Product Intelligence review](review_guidelines.md).

## Verify your metric

On GitLab.com, the Product Intelligence team regularly [monitors Service Ping](https://gitlab.com/groups/gitlab-org/-/epics/6000).
They may alert you that your metrics need further optimization to run quicker and with greater success.

The Service Ping JSON payload for GitLab.com is shared in the
[#g_product_intelligence](https://gitlab.slack.com/archives/CL3A7GFPF) Slack channel every week.

You may also use the [Service Ping QA dashboard](https://app.periscopedata.com/app/gitlab/632033/Usage-Ping-QA) to check how well your metric performs.
The dashboard allows filtering by GitLab version, by "Self-managed" and "SaaS", and shows you how many failures have occurred for each metric. Whenever you notice a high failure rate, you can re-optimize your metric.

## Set up and test Service Ping locally

To set up Service Ping locally, you must:

1. [Set up local repositories](#set-up-local-repositories).
1. [Test local setup](#test-local-setup).
1. (Optional) [Test Prometheus-based Service Ping](#test-prometheus-based-service-ping).

### Set up local repositories

1. Clone and start [GitLab](https://gitlab.com/gitlab-org/gitlab-development-kit).
1. Clone and start the [Versions Application](https://gitlab.com/gitlab-services/version-gitlab-com).
   Make sure you run `docker-compose up` to start a PostgreSQL and Redis instance.
1. Point GitLab to the Versions Application endpoint instead of the default endpoint:
   1. Open [service_ping/submit_service.rb](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/services/service_ping/submit_service.rb#L5) locally and modify `PRODUCTION_URL`.
   1. Set it to the local Versions Application URL: `http://localhost:3000/usage_data`, as in the sketch after this list.

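As a sketch, the modified constant might look like the following. The surrounding class is abbreviated here, and the exact constant name can differ between GitLab versions, so check the file you opened in the previous step:

```ruby
# app/services/service_ping/submit_service.rb (sketch -- only the constant changes)
module ServicePing
  class SubmitService
    # Point Service Ping at the locally running Versions Application instead of
    # the production endpoint.
    PRODUCTION_URL = 'http://localhost:3000/usage_data'

    # ... the rest of the service stays unchanged ...
  end
end
```
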
### Test local setup

1. Using the `gitlab` Rails console, manually trigger Service Ping:

   ```ruby
   ServicePing::SubmitService.new.execute
   ```

1. Use the `versions` Rails console to check the Service Ping was successfully received,
   parsed, and stored in the Versions database:

   ```ruby
   UsageData.last
   ```

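If you want to inspect the payload before submitting it, you can build it in the `gitlab` Rails console first. This is a sketch; the entry point for generating the payload can vary between GitLab versions:

```ruby
# Build the Service Ping payload without submitting it (sketch; the generator
# entry point may differ between GitLab versions).
payload = Gitlab::UsageData.data
payload[:counts]&.keys   # spot-check that your new metric appears under `counts`
```
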
### Test Prometheus-based Service Ping

If the data submitted includes metrics [queried from Prometheus](index.md#prometheus-queries)
that you want to inspect and verify, you must:

- Ensure that a Prometheus server is running locally.
- Ensure the respective GitLab components are exporting metrics to the Prometheus server.

If you do not need to test data coming from Prometheus, no further action
is necessary. Service Ping should degrade gracefully in the absence of a running Prometheus server.

Three kinds of components may export data to Prometheus, and are included in Service Ping:

- [`node_exporter`](https://github.com/prometheus/node_exporter): Exports node metrics
  from the host machine.
- [`gitlab-exporter`](https://gitlab.com/gitlab-org/gitlab-exporter): Exports process metrics
  from various GitLab components.
- Other GitLab services, such as Sidekiq and the Rails server, which export their own metrics.

#### Test with an Omnibus container

This is the recommended approach to test Prometheus-based Service Ping.

To verify your change, build a new Omnibus image from your code branch using CI/CD, download the image,
and run a local container instance:

1. From your merge request, select the `qa` stage, then trigger the `package-and-qa` job. This job triggers an Omnibus
   build in a [downstream pipeline of the `omnibus-gitlab-mirror` project](https://gitlab.com/gitlab-org/build/omnibus-gitlab-mirror/-/pipelines).
1. In the downstream pipeline, wait for the `gitlab-docker` job to finish.
1. Open the job logs and locate the full container name including the version. It takes the following form: `registry.gitlab.com/gitlab-org/build/omnibus-gitlab-mirror/gitlab-ee:<VERSION>`.
1. On your local machine, make sure you are signed in to the GitLab Docker registry. You can find the instructions for this in
   [Authenticate to the GitLab Container Registry](../../user/packages/container_registry/index.md#authenticate-with-the-container-registry).
1. Once signed in, download the new image by using `docker pull registry.gitlab.com/gitlab-org/build/omnibus-gitlab-mirror/gitlab-ee:<VERSION>`.
1. For more information about working with and running Omnibus GitLab containers in Docker, refer to [GitLab Docker images](https://docs.gitlab.com/omnibus/docker/README.html) in the Omnibus documentation.

#### Test with GitLab development toolkits

This is the less recommended approach, because it comes with a number of difficulties when emulating a real GitLab deployment.

The [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit) is not set up to run a Prometheus server or `node_exporter` alongside other GitLab components. If you would
like to do so, [Monitoring the GDK with Prometheus](https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/prometheus/index.md#monitoring-the-gdk-with-prometheus) is a good start.

The [GCK](https://gitlab.com/gitlab-org/gitlab-compose-kit) has limited support for testing Prometheus-based Service Ping.
By default, it comes with a fully configured Prometheus service that is set up to scrape a number of components.
However, it has the following limitations:

- It does not run a `gitlab-exporter` instance, so several `process_*` metrics from services such as Gitaly may be missing.
- While it runs a `node_exporter`, `docker-compose` services emulate hosts, meaning that it normally reports itself as not associated
  with any of the other running services. That is not how node metrics are reported in a production setup, where `node_exporter`
  always runs as a process alongside other GitLab components on any given node. For Service Ping, none of the node data would therefore
  appear to be associated with any of the services running, because they all appear to be running on different hosts. To alleviate this problem, the `node_exporter` in GCK was arbitrarily "assigned" to the `web` service, meaning `node_*` metrics appear in Service Ping for this service only.