Commit Graph

18 Commits

Author SHA1 Message Date
Yorick Peterse c5cb68fd03
Instrument Gitlab::GitAccess/GitAccessWiki 2016-04-21 19:15:32 +02:00
Rémy Coutable 0b8852d3c4 Merge branch 'instrument-service-classes' into 'master'
Instrument all service classes

This will help us see where (mostly) Sidekiq code is spending time.

See merge request !3675
2016-04-12 14:39:54 +00:00
Yorick Peterse b3d8b995c3
Un-instrument Banzai::ReferenceExtractor
Instrumenting this class together with Gitlab::ReferenceExtractor causes
a StackError for some reason. Since Gitlab::ReferenceExtractor has most
of the interesting code we'll only instrument that class.
2016-04-12 14:10:35 +02:00
Yorick Peterse 8299129382
Instrument all service classes
Fixes gitlab-org/gitlab-ce#15162
2016-04-12 12:24:16 +02:00
Yorick Peterse 935f913165
Instrument Banzai code 2016-04-11 17:43:12 +02:00
Yorick Peterse c56f702ec3
Instrument Rails cache code
This allows us to track how much of a transaction's time is spent
dealing with cached data.
2016-04-08 17:54:52 +02:00
Yorick Peterse 33fb55a572 Instrument various Rugged constants 2016-02-01 15:03:58 +01:00
Yorick Peterse 1256dabb44 Instrument all Gitlab::Git instance methods 2016-02-01 13:54:39 +01:00
Yorick Peterse 7d1b51fac1 Instrument Gitlab::Git::Repository
This adds instrumentation for the instance methods of
Gitlab::Git::Repository.
2016-01-21 11:27:31 +01:00
Yorick Peterse 8c4210e676 Added metrics instrumentation for all finders 2016-01-18 11:54:22 +01:00
Yorick Peterse 66a997a914 Track total query/view timings in transactions 2016-01-04 12:14:36 +01:00
Yorick Peterse cafc784ee1 Removed tracking of hostnames for metrics
This isn't hugely useful and mostly wastes InfluxDB space. We can re-add
this whenever needed (but only once we really need it).
2015-12-31 17:55:10 +01:00
Yorick Peterse a6c60127e3 Removed tracking of raw SQL queries
This particular setup had 3 problems:

1. Storing SQL queries as tags is very inefficient as InfluxDB ends up
   indexing every query (and they can get pretty large). Storing these
   as values instead means we can't always display the SQL as easily.
2. We already instrument ActiveRecord query methods, thus we already
   have timing information about database queries.
3. SQL obfuscation is difficult to get right and I'd rather not expose
   sensitive data by accident.
2015-12-31 17:14:02 +01:00
Yorick Peterse 4d925f2147 Move InfluxDB settings to ApplicationSetting 2015-12-28 18:00:32 +01:00
Yorick Peterse bcee44ad33 Instrument all ActiveRecord model methods
This works by searching the raw source code for any references to
commonly used ActiveRecord methods. While not bulletproof it saves us
from having to list hundreds of methods by hand. It also ensures that
(most) newly added methods are instrumented automatically.

This _only_ instruments models defined in app/models. Should a model
reside somewhere else (e.g. somewhere in lib/) it _won't_ be
instrumented.
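
A minimal sketch of the source-scanning idea described above, not the
code from this commit. The method list, the constant names and the
helper `references_active_record?` are all hypothetical; the real list
of methods and the way method sources are obtained may differ.

```ruby
# Hypothetical list of "commonly used ActiveRecord methods"; the real
# list is not given in the commit message.
AR_METHODS = %w[where find_by joins order pluck].freeze

# Matches any of the names above as a whole word.
AR_PATTERN = /\b(?:#{Regexp.union(AR_METHODS).source})\b/

# Returns true when the raw source text of a method mentions one of the
# listed ActiveRecord methods. This is a plain text search, so a comment
# or string containing "where" also matches (hence "not bulletproof").
def references_active_record?(method_source)
  AR_PATTERN.match?(method_source)
end
```

A method whose source contains `where(active: true)` would be picked up
for instrumentation, while one that only concatenates two attributes
would be skipped.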
2015-12-17 17:25:48 +01:00
Yorick Peterse 6dc25ad58c Instrument Gitlab::Shell and Gitlab::Git 2015-12-17 17:25:48 +01:00
Yorick Peterse 1b077d2d81 Use custom code for instrumenting method calls
The use of ActiveSupport would slow down instrumented method calls by
about 180x due to:

1. ActiveSupport itself not being the fastest thing on the planet
2. caller_locations() having quite some overhead

The use of caller_locations() has been removed because it's not _that_
useful since we already know the full namespace of receivers and the
names of the called methods.

The use of ActiveSupport has been replaced with some custom code that's
generated using eval() (which can be quite a bit faster than using
define_method).

This new setup results in instrumented methods only being about 35-40x
slower (compared to non-instrumented methods).
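
A rough sketch of what generating a wrapper from a string could look
like, as opposed to wrapping with define_method. This is not the code
from this commit: the module `SketchInstrumentation`, the `TIMINGS`
array and the `_uninstrumented_` prefix are made up for illustration,
and the sketch only forwards positional arguments and blocks.

```ruby
# Hypothetical sketch; not the actual Gitlab::Metrics::Instrumentation.
module SketchInstrumentation
  # Collected timings as [label, seconds] pairs (assumed storage).
  TIMINGS = []

  def self.instrument_instance_method(mod, name)
    # Only plain method names are supported here (no ?, !, = or operators).
    unless name.to_s =~ /\A[a-z_]\w*\z/i
      raise ArgumentError, "unsupported method name: #{name}"
    end

    original = "_uninstrumented_#{name}"
    label = "#{mod.name}##{name}"

    # The wrapper is generated as source text and evaluated once, so each
    # call runs an ordinary method body instead of a block.
    mod.class_eval <<-RUBY, __FILE__, __LINE__ + 1
      alias_method :#{original}, :#{name}

      def #{name}(*args, &block)
        start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        #{original}(*args, &block)
      ensure
        elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
        SketchInstrumentation::TIMINGS << [#{label.inspect}, elapsed]
      end
    RUBY
  end
end
```

The receiver's namespace and the method name are baked into the
generated code as a string literal, which is why nothing like
caller_locations() is needed at call time.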
2015-12-17 17:25:48 +01:00
Yorick Peterse 141e946c3d Storing of application metrics in InfluxDB
This adds the ability to write application metrics (e.g. SQL timings) to
InfluxDB. These metrics can in turn be visualized using Grafana, or
really anything else that can read from InfluxDB. These metrics can be
used to track application performance over time, between different Ruby
versions, different GitLab versions, etc.

== Transaction Metrics

Currently the following is tracked on a per transaction basis (a
transaction is a Rails request or a single Sidekiq job):

* Timings per query along with the raw (obfuscated) SQL and information
  about what file the query originated from.
* Timings per view along with the path of the view and information about
  what file triggered the rendering process.
* The duration of a request itself along with the controller/worker
  class and method name.
* The duration of any instrumented method calls (more below).

== Sampled Metrics

Certain metrics can't be directly associated with a transaction. For
example, a process' total memory usage is unrelated to any running
transactions. While a transaction can result in the memory usage going
up, there's no accurate way to determine what transaction is to blame.
This becomes especially problematic in multi-threaded environments.

To solve this problem there's a separate thread that takes samples at a
fixed interval. This thread (using the class Gitlab::Metrics::Sampler)
currently tracks the following:

* The process' total memory usage.
* The number of file descriptors opened by the process.
* The number of Ruby objects (using ObjectSpace.count_objects).
* GC statistics such as timings, heap slots, etc.

The default/current interval is 15 seconds; any smaller interval might
put too much pressure on InfluxDB (especially when running dozens of
processes).
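
A minimal sketch of such a sampling thread, assuming a hypothetical
class `SketchSampler` rather than the real Gitlab::Metrics::Sampler.
Memory usage and file descriptor counts are left out because reading
them is platform specific; only the Ruby object count and two GC
counters are sampled here.

```ruby
require 'thread'

# Hypothetical sketch; not the actual Gitlab::Metrics::Sampler.
class SketchSampler
  attr_reader :samples

  def initialize(interval = 15)
    @interval = interval
    @samples = Queue.new # thread-safe buffer for collected samples
  end

  # Takes a single sample of process-wide statistics.
  def sample
    gc = GC.stat

    {
      time: Time.now.utc,
      objects: ObjectSpace.count_objects[:TOTAL],
      gc_count: gc[:count],
      gc_allocated: gc[:total_allocated_objects]
    }
  end

  # Starts a background thread that samples at a fixed interval.
  def start
    @thread = Thread.new do
      loop do
        @samples << sample
        sleep(@interval)
      end
    end
  end

  def stop
    @thread.kill if @thread
  end
end
```

Because the thread runs independently of any request or job, the
samples describe the process as a whole rather than a transaction.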

== Method Instrumentation

While not yet used anywhere, methods can be instrumented to track how
long they take to run. Unlike the likes of New Relic this doesn't
require modifying the source code (e.g. including modules); it all
happens from the outside. For example, to track `User.by_login` we'd add
the following code somewhere in an initializer:

    Gitlab::Metrics::Instrumentation.
      instrument_method(User, :by_login)

To instrument an instance method instead:

    Gitlab::Metrics::Instrumentation.
      instrument_instance_method(User, :save)

Instrumentation for either all public model methods or a few crucial
ones will be added in the near future; I simply haven't gotten to doing
so just yet.

== Configuration

By default metrics are disabled. This means users don't have to bother
setting anything up if they don't want to. Metrics can be enabled by
editing one's gitlab.yml configuration file (see
config/gitlab.yml.example for example settings).

== Writing Data To InfluxDB

Because InfluxDB is still a fairly young product I expect the worst:
data loss, unexpected reboots, the database not responding, you name it.
Because of this, data is _not_ written to InfluxDB directly; instead it's
queued and processed by Sidekiq. This ensures that users won't notice
anything when InfluxDB is giving trouble.

The metrics worker can be started in a standalone manner as follows:

    bundle exec sidekiq -q metrics

The corresponding class is called MetricsWorker.
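
To illustrate the queueing described above, here is a small sketch in
plain Ruby. It is not the MetricsWorker code: an in-process Queue stands
in for the Sidekiq queue, and the class `SketchMetricsBuffer` and its
writer block are hypothetical. The point is only that the request path
enqueues and returns, while write failures stay inside the worker.

```ruby
require 'thread'

# Hypothetical sketch; a plain Queue stands in for the Sidekiq queue.
class SketchMetricsBuffer
  attr_reader :failures

  # The block is the (assumed) code that writes one metric to the database.
  def initialize(&writer)
    @queue = Queue.new
    @writer = writer
    @failures = 0
  end

  # Called from a request or job: only enqueues, never talks to the database.
  def push(metric)
    @queue << metric
  end

  # Called by the background worker: writes everything queued so far.
  # A failing write is counted and skipped instead of being raised.
  def drain
    until @queue.empty?
      metric = @queue.pop

      begin
        @writer.call(metric)
      rescue StandardError
        @failures += 1
      end
    end
  end
end
```

With this split, a database outage shows up as failed writes in the
worker rather than as errors or delays in the code that produced the
metric.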
2015-12-17 17:25:48 +01:00