From b96fdf3f57f08737ec1d6d55027d6b6f27c4588a Mon Sep 17 00:00:00 2001 From: Jason Colyer Date: Tue, 6 Aug 2019 06:54:30 +0000 Subject: [PATCH] Add ElasticSearch troubleshooting * Added elasticsearch.md * Added reference to guide in Troubleshooting section --- doc/administration/index.md | 1 + .../troubleshooting/elasticsearch.md | 345 ++++++++++++++++++ 2 files changed, 346 insertions(+) create mode 100644 doc/administration/troubleshooting/elasticsearch.md diff --git a/doc/administration/index.md b/doc/administration/index.md index febee3e5af8..f7f9d753e58 100644 --- a/doc/administration/index.md +++ b/doc/administration/index.md @@ -187,3 +187,4 @@ Learn how to install, configure, update, and maintain your GitLab instance. - [Sidekiq Troubleshooting](troubleshooting/sidekiq.md): Debug when Sidekiq appears hung and is not processing jobs. - Useful [diagnostics tools](troubleshooting/diagnostics_tools.md) that are sometimes used by the GitLab Support team. +- [Troubleshooting ElasticSearch](troubleshooting/elasticsearch.md): Tips to troubleshoot ElasticSearch. diff --git a/doc/administration/troubleshooting/elasticsearch.md b/doc/administration/troubleshooting/elasticsearch.md new file mode 100644 index 00000000000..d54dd758d29 --- /dev/null +++ b/doc/administration/troubleshooting/elasticsearch.md @@ -0,0 +1,345 @@ +# Troubleshooting ElasticSearch + +Troubleshooting ElasticSearch requires: + +- Knowledge of common terms. +- Establishing within which category the problem fits. + +## Common Terminology + +- **Lucene**: A full-text search library written in Java. +- **Near Realtime (NRT)**: Refers to the slight latency from the time to index a + document to the time when it becomes searchable. +- **Cluster**: A collection of one or more nodes that work together to hold all + the data, providing indexing and search capabilities. +- **Node**: A single server that works as part of a cluster. +- **Index**: A collection of documents that have somewhat similar characteristics. +- **Document**: A basic unit of information that can be indexed. +- **Shards**: Fully-functional and independent subdivisions of indices. Each shard is actually + a Lucene index. +- **Replicas**: Failover mechanisms that duplicate indices. + +## Troubleshooting workflows + +The type of problem will determine what steps to take. The possible troubleshooting workflows are for: + +- Search results. +- Indexing. +- Integration. +- Performance. + +### Search Results workflow + +The following workflow is for ElasticSearch search results issues: + +```mermaid +graph TD; + B --> |No| B1 + B --> |Yes| B4 + B1 --> B2 + B2 --> B3 + B4 --> B5 + B5 --> |Yes| B6 + B5 --> |No| B7 + B7 --> B8 + B{Is GitLab using
ElasticSearch for
searching?} + B1[Check Admin Area > Integrations
to ensure the settings are correct] + B2[Perform a search via
the rails console] + B3[If all settings are correct
and it still doesn't show ElasticSearch
doing the searches, escalate
to GitLab support.] + B4[Perform
the same search via the
ElasticSearch API] + B5{Are the results
the same?} + B6[This means it is working as intended.
Speak with GitLab support
to confirm if the issue lies with
the filters.] + B7[Check the index status of the project
containing the missing search
results.] + B8(Indexing Troubleshooting) +``` + +### Indexing workflow + +The following workflow is for ElasticSearch indexing issues: + +```mermaid +graph TD; + C --> |Yes| C1 + C1 --> |Yes| C2 + C1 --> |No| C3 + C3 --> |Yes| C4 + C3 --> |No| C5 + C --> |No| C6 + C6 --> |No| C10 + C7 --> |GitLab| C8 + C7 --> |ElasticSearch| C9 + C6 --> |Yes| C7 + C10 --> |No| C12 + C10 --> |Yes| C11 + C12 --> |Yes| C13 + C12 --> |No| C14 + C14 --> |Yes| C15 + C14 --> |No| C16 + C{Is the problem with
creating an empty
index?} + C1{Does the gitlab-production
index exist on the
ElasticSearch instance?} + C2(Try to manually
delete the index on the
ElasticSearch instance and
retry creating an empty index.) + C3{Can indices be made
manually on the ElasticSearch
instance?} + C4(Retry the creation of an empty index) + C5(It is best to speak with an
ElasticSearch admin concerning the
instance's inability to create indices.) + C6{Is the indexer presenting
errors during indexing?} + C7{Is the error a GitLab
error or an ElasticSearch
error?} + C8[Escalate to
GitLab support] + C9[You will want
to speak with an
ElasticSearch admin.] + C10{Does the index status
show 100%?} + C11[Escalate to
GitLab support] + C12{Does re-indexing the project
present any GitLab errors?} + C13[Rectify the GitLab errors and
restart troubleshooting, or
escalate to GitLab support.] + C14{Does re-indexing the project
present errors on the
ElasticSearch instance?} + C15[It would be best
to speak with an
ElasticSearch admin.] + C16[This is likely a bug/issue
in GitLab and will require
deeper investigation. Escalate
to GitLab support.] +``` + +### Integration workflow + +The following workflow is for ElasticSearch integration issues: + +```mermaid +graph TD; + D --> |No| D1 + D --> |Yes| D2 + D2 --> |No| D3 + D2 --> |Yes| D4 + D4 --> |No| D5 + D4 --> |Yes| D6 + D{Is the error concerning
the beta indexer?} + D1[It would be best
to speak with an
ElasticSearch admin.] + D2{Is the ICU development
package installed?} + D3>This package is required.
Install the package
and retry.] + D4{Is the error stemming
from the indexer?} + D5[This would indicate an OS level
issue. It would be best to
contact your sysadmin.] + D6[This is likely a bug/issue
in GitLab and will require
deeper investigation. Escalate
to GitLab support.] +``` + +### Performance workflow + +The following workflow is for ElasticSearch performance issues: + +```mermaid +graph TD; + F --> |Yes| F1 + F --> |No| F2 + F2 --> |No| F3 + F2 --> |Yes| F4 + F4 --> F5 + F5 --> |No| F6 + F5 --> |Yes| F7 + F{Is the ElasticSearch instance
running on the same server
as the GitLab instance?} + F1(This is not advised and will cause issues.
We recommend moving the ElasticSearch
instance to a different server.) + F2{Does the ElasticSearch
server have at least 8
GB of RAM and 2 CPU
cores?} + F3(According to ElasticSearch, a non-prod
server needs these as a base requirement.
Production often requires more. We recommend
you increase the server specifications.) + F4(Obtain the
cluster health information) + F5(Does it show the
status as green?) + F6(We recommend you speak with
an ElasticSearch admin
about implementing sharding.) + F7(Escalate to
GitLab support.) +``` + +## Troubleshooting Walkthrough + +Most ElasticSearch troubleshooting can be broken down into 4 categories: + +- [Troubleshooting search results](#troubleshooting-search-results) +- [Troubleshooting indexing](#troubleshooting-indexing) +- [Troubleshooting integration](#troubleshooting-integration) +- [Troubleshooting performance](#troubleshooting-performance) + +Generally speaking, if it does not fall into those four categories, it is either: + +- Something GitLab support needs to look into. +- Not a true ElasticSearch issue. + +Exercise caution. Issues that appear to be ElasticSearch problems can be OS-level issues. + +### Troubleshooting search results + +Troubleshooting search result issues is rather straight forward on ElasticSearch. + +The first step is to confirm GitLab is using ElasticSearch for the search function. +To do this: + +1. Confirm the integration is enabled in **Admin Area > Settings > Integrations**. +1. Confirm searches utilize ElasticSearch by accessing the rails console + (`sudo gitlab-rails console`) and running the following commands: + + ```rails + u = User.find_by_email('email_of_user_doing_search') + s = SearchService.new(u, {:search => 'search_term'}) + pp s.search_objects.class.name + ``` + +The ouput from the last command is the key here. If it shows: + +- `ActiveRecord::Relation`, **it is not** using ElasticSearch. +- `Kaminari::PaginatableArray`, **it is** using ElasticSearch. + +| Not using ElasticSearch | Using ElasticSearch | +|--------------------------|------------------------------| +| `ActiveRecord::Relation` | `Kaminari::PaginatableArray` | + +If all the settings look correct and it is still not using ElasticSearch for the search function, it is best to escalate to GitLab support. This could be a bug/issue. + +Moving past that, it is best to attempt the same search using the [ElasticSearch Search API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html) and compare the results from what you see in GitLab. + +If the results: + +- Sync up, then there is not a technical "issue" per se. Instead, it might be a problem + with the ElasticSearch filters we are using. This can be complicated, so it is best to + escalate to GitLab support to check these and guide you on the potential on whether or + not a feature request is needed. +- Do not match up, this indicates a problem with the documents generated from the + project. It is best to re-index that project and proceed with + [Troubleshooting indexing](#troubleshooting-indexing). + +### Troubleshooting indexing + +Troubleshooting indexing issues can be tricky. It can pretty quickly go to either GitLab +support or your ElasticSearch admin. + +The best place to start is to determine if the issue is with creating an empty index. +If it is, check on the ElasticSearch side to determine if the `gitlab-production` (the +name for the GitLab index) exists. If it exists, manually delete it on the ElasticSearch +side and attempt to recreate it from the +[`create_empty_index`](../../integration/elasticsearch.md#gitlab-elasticsearch-rake-tasks) +rake task. + +If you still encounter issues, try creating an index manually on the ElasticSearch +instance. The details of the index aren't important here, as we want to test if indices +can be made. If the indices: + +- Cannot be made, speak with your ElasticSearch admin. +- Can be made, Escalate this to GitLab support. + +If the issue is not with creating an empty index, the next step is to check for errors +during the indexing of projects. If errors do occur, they will either stem from the indexing: + +- On the GitLab side. You need to rectify those. If they are not + something you are familiar with, contact GitLab support for guidance. +- Within the ElasticSearch instance itself. See if the error is [documented and has a fix](../../integration/elasticsearch.md#troubleshooting). If not, speak with your ElasticSearch admin. + +If the indexing process does not present errors, you will want to check the status of the indexed projects. You can do this via the following rake tasks: + +- [`sudo gitlab-rake gitlab:elastic:index_projects_status`](../../integration/elasticsearch.md#gitlab-elasticsearch-rake-tasks) (shows the overall status) +- [`sudo gitlab-rake gitlab:elastic:projects_not_indexed`](../../integration/elasticsearch.md#gitlab-elasticsearch-rake-tasks) (shows specific projects that are not indexed) + +If: + +- Everything is showing at 100%, escalate to GitLab support. This could be a potential + bug/issue. +- You do see something not at 100%, attempt to reindex that project. To do this, + run `sudo gitlab-rake gitlab:elastic:index_projects ID_FROM= ID_TO=`. + +If reindexing the project shows: + +- Errors on the GitLab side, escalate those to GitLab support. +- ElasticSearch errors or doesn't present any errors at all, reach out to your + ElasticSearch admin to check the instance. + +### Troubleshooting integration + +Troubleshooting integration tends to be pretty straight forward, as there really isn't +much to "integrate" here. + +If the issue is: + +- Not concerning the beta indexer, it is almost always an + ElasticSearch-side issue. This means you should reach out to your ElasticSearch admin + regarding the error(s) you are seeing. If you are unsure here, it never hurts to reach + out to GitLab support. +- With the beta indexer, check if the ICU development package is installed. + This is a required package so make sure you install it. + +Beyond that, you will want to review the error. If it is: + +- Specifically from the indexer, this could be a bug/issue and should be escalated to + GitLab support. +- An OS issue, you will want to reach out to your systems administrator. + +### Troubleshooting performance + +Troubleshooting performance can be difficult on ElasticSearch. There is a ton of tuning +that *can* be done, but the majority of this falls on shoulders of a skilled +ElasticSearch administrator. + +Generally speaking, ensure: + +* The ElasticSearch server **is not** running on the same node as GitLab. +* The ElasticSearch server have enough RAM and CPU cores. +* That sharding **is** being used. + +Going into some more detail here, if ElasticSearch is running on the same server as GitLab, resource contention is **very** likely to occur. Ideally, ElasticSearch, which requires ample resources, should be running on its own server (maybe coupled with logstash and kibana). + +When it comes to ElasticSearch, RAM is the key resource. ElasticSearch themselves recommend: + +- **At least** 8 GB of RAM for a non-production instance. +- **At least** 16 GB of RAM for a production instance. +- Ideally, 64 GB of RAM. + +For CPU, ElasticSearch recommends at least 2 CPU cores, but ElasticSearch states common +setups use up to 8 cores. For more details on server specs, check out +[ElasticSearch's hardware guide](https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html). + +Beyond the obvious, sharding comes into play. Sharding is a core part of ElasticSearch. +It allows for horizontal scaling of indices, which is helpful when you are dealing with +a large amount of data. + +With the way GitLab does indexing, there is a **huge** amount of documents being +indexed. By utilizing sharding, you can speed up ElasticSearch's ability to locate +data, since each shard is a Lucene index. + +If you are not using sharding, you are likely to hit issues when you start using +ElasticSearch in a production environment. + +Keep in mind that an index with only one shard has **no scale factor** and will +likely encounter issues when called upon with some frequency. + +If you need to know how many shards, read +[ElasticSearch's documentation on capacity planning](https://www.elastic.co/guide/en/elasticsearch/guide/2.x/capacity-planning.html), +as the answer is not straight forward. + +The easiest way to determine if sharding is in use is to check the output of the +[ElasticSearch Health API](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html): + +- Red means the cluster is down. +- Yellow means it is up with no sharding/replication. +- Green means it is healthy (up, sharding, replicating). + +For production use, it should always be green. + +Beyond these steps, you get into some of the more complicated things to check, +such as merges and caching. These can get complicated and it takes some time to +learn them, so it is best to escalate/pair with an ElasticSearch expert if you need to +dig further into these. + +Feel free to reach out to GitLab support, but this is likely to be something a skilled +ElasticSearch admin has more experience with. + +## Common Issues + +All common issues [should be documented](../../integration/elasticsearch.md#troubleshooting). If not, +feel free to update that page with issues you encounter and solutions. + +## Replication + +Setting up ElasticSearch isn't too bad, but it can be a bit finnicky and time consuming. + +The eastiest method is to spin up a docker container with the required version and +bind ports 9200/9300 so it can be used. + +The following is an example of running a docker container of ElasticSearch v7.2.0: + +```bash +docker pull docker.elastic.co/elasticsearch/elasticsearch:7.2.0 +docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.2.0 +``` + +From here, you can: + +- Grab the IP of the docker container (use `docker inspect `) +- Use `` to communicate with it. + +This is a quick method to test out ElasticSearch, but by no means is this a +production solution.