Commit graph

6 commits

Author SHA1 Message Date
Yorick Peterse
0df65909ef Added benchmark for User.all
This benchmark exists to test if ordering has any noticeable impact in
the test environment.
2015-11-03 11:47:23 +01:00
Yorick Peterse
6d3068bec3 Adjusted ips/sec for find_by_any_email benchmarks
While these benchmarks run at roughly 1500 iterations per second,
setting the threshold to 1000 leaves some room for deviations (e.g. due
to different database setups).
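
To put a number on that room, here is the arithmetic with the two figures
quoted above. The variable names are only illustrative.

```ruby
# Figures quoted in the commit message above.
measured_ips  = 1500 # typical rate of the find_by_any_email benchmarks
threshold_ips = 1000 # minimum rate the benchmark specs now require

# Fraction by which a run may fall below the typical rate and still pass.
headroom = (measured_ips - threshold_ips).fdiv(measured_ips)
(headroom * 100).round(1) # => 33.3
```

In other words, a slower database setup could run about a third below
the usual rate before the benchmark starts failing.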
2015-10-30 12:00:58 +01:00
Yorick Peterse
49c081b9f3 Improve performance of User.find_by_any_email
This query used to rely on a JOIN, effectively producing the following
SQL:

    SELECT users.*
    FROM users
    LEFT OUTER JOIN emails ON emails.user_id = users.id
    WHERE (users.email = X OR emails.email = X)
    LIMIT 1;

The use of a JOIN means having to scan over all emails and users, join
them together and then filter out the rows that don't match the criteria
(although the database may already apply this filter while joining).

In the new setup this query instead uses a sub-query, producing the
following SQL:

    SELECT *
    FROM users
    WHERE id IN (SELECT user_id FROM emails WHERE email = X)
    OR email = X
    LIMIT 1;

This query has the benefit that it:

1. Doesn't have to JOIN any rows
2. Only has to operate on a relatively small set of rows from the
   "emails" table.

Since most users only have a handful of emails associated with them
(certainly not hundreds or even thousands), the set returned by the
sub-query is small enough that it should not become a problem.
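
Both queries should return the same user for any given address. Below is
a plain-Ruby model over arrays that checks this equivalence. It uses no
database and no ActiveRecord, and `UserRow`, `EmailRow` and both finder
methods are hypothetical. It also shows where the work differs. The JOIN
version builds a row for every user/email pair before filtering. The
sub-query version first narrows "emails" down to the few matching rows.
It says nothing about how a real database plans either query.

```ruby
# In-memory stand-ins for the "users" and "emails" tables (made-up data).
UserRow  = Struct.new(:id, :email)
EmailRow = Struct.new(:user_id, :email)

USERS = [
  UserRow.new(1, 'alice@example.com'),
  UserRow.new(2, 'bob@example.com')
].freeze

EMAILS = [
  EmailRow.new(2, 'bob@work.example'),
  EmailRow.new(2, 'bob@home.example')
].freeze

# Old shape: LEFT OUTER JOIN every user with its emails, then filter rows.
def find_by_any_email_join(address)
  joined = USERS.flat_map do |user|
    emails = EMAILS.select { |e| e.user_id == user.id }
    # Users without emails still produce one row (the "outer" part).
    emails.empty? ? [[user, nil]] : emails.map { |e| [user, e] }
  end
  row = joined.find { |user, email| user.email == address || email&.email == address }
  row&.first
end

# New shape: filter "emails" first, then match users by ID or own email.
def find_by_any_email_subquery(address)
  user_ids = EMAILS.select { |e| e.email == address }.map(&:user_id)
  USERS.find { |user| user_ids.include?(user.id) || user.email == address }
end

find_by_any_email_subquery('bob@home.example').id # => 2
```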

Performance of the old versus new version can be measured using the
following benchmark:

    # Save this in ./bench.rb
    require 'benchmark/ips'

    email = 'yorick@gitlab.com'

    def User.find_by_any_email_old(email)
      user_table = arel_table
      email_table = Email.arel_table

      query = user_table.
        project(user_table[Arel.star]).
        join(email_table, Arel::Nodes::OuterJoin).
        on(user_table[:id].eq(email_table[:user_id])).
        where(user_table[:email].eq(email).or(email_table[:email].eq(email)))

      find_by_sql(query.to_sql).first
    end

    Benchmark.ips do |bench|
      bench.report 'original' do
        User.find_by_any_email_old(email)
      end

      bench.report 'optimized' do
        User.find_by_any_email(email)
      end

      bench.compare!
    end

Running this locally using "bundle exec rails r bench.rb" produces the
following output:

    Calculating -------------------------------------
                original     1.000  i/100ms
               optimized    93.000  i/100ms
    -------------------------------------------------
                original     11.103  (± 0.0%) i/s -     56.000
               optimized    948.713  (± 5.3%) i/s -      4.743k

    Comparison:
               optimized:      948.7 i/s
                original:       11.1 i/s - 85.45x slower

In other words, the new setup is roughly 85 times faster than the old
setup, at least when running this benchmark locally.
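
The 85.45x figure in the comparison is the ratio of the two
iterations-per-second values reported above. It can be reproduced
directly:

```ruby
# Iterations per second reported by benchmark-ips in the output above.
original_ips  = 11.103
optimized_ips = 948.713

# How many times more iterations the optimized version completes per second.
speedup = optimized_ips / original_ips
speedup.round(2) # => 85.45
```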

For GitLab.com, these improvements result in User.find_by_any_email
taking only ~170 ms to run, instead of around 800 ms. While this is
"only" an improvement of about 4.5 times (instead of 85x), it's still
significantly better than before.

Fixes #3242
2015-10-30 12:00:58 +01:00
Yorick Peterse
72f428c7d2 Improve performance of User.by_login
Performance is improved in two steps:

1. On PostgreSQL, an expression index is used for checking lower(email)
   and lower(username).
2. The check to determine whether we're searching for a username or an
   email address is moved to Ruby. Thanks to @haynes for suggesting and
   writing the initial implementation of this.

Moving the check to Ruby makes this method an additional 1.5 times
faster compared to doing the check in the SQL query.
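
Here is a sketch of what "moving the check to Ruby" means. The method
decides up front which column to search, so each query compares a single
lowercased column. On PostgreSQL, a condition of the form
`lower(email) = ...` can be served by an expression index on
`lower(email)`. This is a hypothetical in-memory version: `UserRecord`
and `LOGIN_USERS` are made up. It also assumes, for illustration, that
any login containing "@" is treated as an email address.

```ruby
# Hypothetical in-memory rows standing in for the "users" table.
UserRecord = Struct.new(:username, :email)

LOGIN_USERS = [
  UserRecord.new('Maru', 'maru@example.com'),
  UserRecord.new('hana', 'Hana@Example.com')
].freeze

# Pick the column in Ruby, then do one case-insensitive comparison,
# mirroring either WHERE lower(email) = ... or WHERE lower(username) = ...
def by_login(login)
  return nil if login.nil?

  needle = login.downcase
  if login.include?('@')
    LOGIN_USERS.find { |user| user.email.downcase == needle }
  else
    LOGIN_USERS.find { |user| user.username.downcase == needle }
  end
end

by_login('MARU')&.username             # => "Maru"
by_login('hana@example.com')&.username # => "hana"
```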

With performance improved, I've also tweaked the number of iterations
required by the User.by_login benchmark. This method now runs between
900 and 1000 iterations per second.
2015-10-15 11:58:25 +02:00
Yorick Peterse
22506ddc50 Added benchmark_subject method for benchmarks
This class method can be used in "describe" blocks to specify the
subject of a benchmark. This lets you write:

    benchmark_subject { Foo }

instead of:

    subject { -> { Foo } }
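
Below is a minimal sketch of how such a helper could wrap the block in a
lambda so that the subject is always callable. To keep it runnable on its
own, it uses a hypothetical `ExampleGroup` class instead of RSpec. The
suite's actual helper presumably delegates to RSpec's `subject`, which is
an assumption rather than its confirmed code.

```ruby
# Hypothetical stand-in for an RSpec example group; it mimics only how a
# subject is defined, not any other RSpec behavior.
class ExampleGroup
  def self.benchmark_subject(&block)
    # Wrap the block in a lambda so #subject returns something callable,
    # evaluating the block in the context of the example instance.
    define_method(:subject) { -> { instance_exec(&block) } }
  end
end

class FooBenchmark < ExampleGroup
  benchmark_subject { Array.new(3) { |i| i * 2 } }
end

bench_subject = FooBenchmark.new.subject
bench_subject.class # => Proc
bench_subject.call  # => [0, 2, 4]
```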
2015-10-05 10:51:24 +02:00
Yorick Peterse
19893a1c10 Basic setup for an RSpec based benchmark suite
This benchmark suite uses benchmark-ips
(https://github.com/evanphx/benchmark-ips) behind the scenes. Specs can
be turned into benchmark specs by setting "benchmark" to "true" in the
top-level describe block like so:

    describe SomeClass, benchmark: true do

    end

Writing benchmarks can be done using custom RSpec matchers, for example:

    describe MaruTheCat, benchmark: true do
      describe '#jump_in_box' do
        it 'should run 1000 iterations per second' do
          maru = described_class.new

          expect { maru.jump_in_box }.to iterate_per_second(1000)
        end
      end
    end
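
The core idea behind the matcher is counting how many times a block runs
per second, and it can be sketched with the standard library alone. This
is a hypothetical stand-in: `measure_ips` and `iterates_per_second?` are
made-up names. The suite itself delegates to benchmark-ips, which also
handles warm-up and repeated samples.

```ruby
# Hypothetical sketch, not the suite's implementation: run a block
# repeatedly for a fixed time window and report iterations per second.
def measure_ips(duration: 0.2)
  clock = Process::CLOCK_MONOTONIC
  start = Process.clock_gettime(clock)
  iterations = 0
  elapsed = 0.0

  # Keep calling the block until the time window has passed.
  while elapsed < duration
    yield
    iterations += 1
    elapsed = Process.clock_gettime(clock) - start
  end

  iterations / elapsed
end

# Hypothetical equivalent of `iterate_per_second(min_ips)`: true when the
# measured rate reaches the required minimum.
def iterates_per_second?(min_ips, duration: 0.2, &block)
  measure_ips(duration: duration, &block) >= min_ips
end

iterates_per_second?(1000) { 'maru'.upcase } # likely true on most machines
iterates_per_second?(1000) { sleep(0.01) }   # false: at most ~100 i/s
```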

By default the "iterate_per_second" expectation requires a standard
deviation under 30% (this is just an arbitrary default for now). You can
change this by chaining "with_maximum_stddev" on the expectation:

    expect { maru.jump_in_box }.to iterate_per_second(1000)
      .with_maximum_stddev(10)

This will change the expectation to require a maximum deviation of 10%.
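
The deviation check can be illustrated on a handful of per-sample rates.
This sketch expresses the standard deviation as a percentage of the mean,
which matches the "± x%" shown in benchmark-ips output. It assumes the
population formula, and benchmark-ips may compute it differently. The
helper names are hypothetical, and 30 is the default mentioned above.

```ruby
# Hypothetical helper: standard deviation of per-sample iterations/second,
# expressed as a percentage of the mean (population formula assumed).
def stddev_percentage(samples)
  mean = samples.sum.to_f / samples.size
  variance = samples.sum { |s| (s - mean)**2 } / samples.size
  Math.sqrt(variance) / mean * 100
end

# Passes only if the mean rate is high enough AND the samples are stable.
def meets_expectation?(samples, min_ips, max_stddev: 30)
  mean = samples.sum.to_f / samples.size
  mean >= min_ips && stddev_percentage(samples) <= max_stddev
end

samples = [1000, 1100, 900]
stddev_percentage(samples).round(1)              # => 8.2
meets_expectation?(samples, 1000)                # => true
meets_expectation?(samples, 1000, max_stddev: 5) # => false
```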

Alternatively, you can use the one-line "it { ... }" style to write specs:

    describe MaruTheCat, benchmark: true do
      describe '#jump_in_box' do
        subject { -> { described_class.new } }

        it { is_expected.to iterate_per_second(1000) }
      end
    end

Because "iterate_per_second" operates on a block, as opposed to a static
value, the "subject" method must return a Proc. This looks a bit goofy,
but I have been unable to find a nice way around this.
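
The Proc requirement follows from the matcher needing to re-run the work
many times. A plain value has already been computed once, before the
matcher ever sees it. Below is a small illustration with a hypothetical
`run_repeatedly` helper standing in for the matcher.

```ruby
# Hypothetical matcher stand-in: it has to re-run the subject many times,
# so it only accepts objects that respond to #call.
def run_repeatedly(subject, times)
  unless subject.respond_to?(:call)
    raise ArgumentError, 'subject must be a Proc (or respond to #call)'
  end

  times.times { subject.call }
end

constructed = 0

# A Proc re-runs the work on every call...
make_maru = -> { constructed += 1; Object.new }
run_repeatedly(make_maru, 5)
constructed # => 5

# ...whereas a plain value was computed once, before the matcher saw it.
begin
  run_repeatedly(Object.new, 5)
rescue ArgumentError => e
  puts e.message
end
```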
2015-10-02 17:00:23 +02:00