41bfe82b7a
This optimises searching for users when using queries consisting of one or two characters, such as "ab". We optimise such cases by comparing `LOWER(name)` and `LOWER(username)` for equality instead of using `ILIKE`. Using `LOWER` produces a _much_ better performing query.

For example, when searching for all users matching the term "a" we'd produce the following plan:

    Limit  (cost=637.69..637.74 rows=20 width=805) (actual time=41.983..41.995 rows=20 loops=1)
      Buffers: shared hit=8330
      ->  Sort  (cost=637.69..638.61 rows=368 width=805) (actual time=41.982..41.990 rows=20 loops=1)
            Sort Key: (CASE WHEN ((name)::text = 'a'::text) THEN 0 WHEN ((username)::text = 'a'::text) THEN 1 WHEN ((email)::text = 'a'::text) THEN 2 ELSE 3 END), name
            Sort Method: top-N heapsort  Memory: 35kB
            Buffers: shared hit=8330
            ->  Bitmap Heap Scan on users  (cost=75.47..627.89 rows=368 width=805) (actual time=9.452..41.305 rows=277 loops=1)
                  Recheck Cond: (((name)::text ~~* 'a'::text) OR ((username)::text ~~* 'a'::text) OR ((email)::text = 'a'::text))
                  Rows Removed by Index Recheck: 7601
                  Heap Blocks: exact=7636
                  Buffers: shared hit=8327
                  ->  BitmapOr  (cost=75.47..75.47 rows=368 width=0) (actual time=8.290..8.290 rows=0 loops=1)
                        Buffers: shared hit=691
                        ->  Bitmap Index Scan on index_users_on_name_trigram  (cost=0.00..38.85 rows=180 width=0) (actual time=4.369..4.369 rows=4071 loops=1)
                              Index Cond: ((name)::text ~~* 'a'::text)
                              Buffers: shared hit=360
                        ->  Bitmap Index Scan on index_users_on_username_trigram  (cost=0.00..34.41 rows=188 width=0) (actual time=3.896..3.896 rows=4140 loops=1)
                              Index Cond: ((username)::text ~~* 'a'::text)
                              Buffers: shared hit=328
                        ->  Bitmap Index Scan on users_email_key  (cost=0.00..1.94 rows=1 width=0) (actual time=0.022..0.022 rows=0 loops=1)
                              Index Cond: ((email)::text = 'a'::text)
                              Buffers: shared hit=3
    Planning time: 3.912 ms
    Execution time: 42.171 ms

With the changes in this commit we now produce the following plan instead:

    Limit  (cost=13257.48..13257.53 rows=20 width=805) (actual time=1.567..1.579 rows=20 loops=1)
      Buffers: shared hit=287
      ->  Sort  (cost=13257.48..13280.93 rows=9379 width=805) (actual time=1.567..1.572 rows=20 loops=1)
            Sort Key: (CASE WHEN ((name)::text = 'a'::text) THEN 0 WHEN ((username)::text = 'a'::text) THEN 1 WHEN ((email)::text = 'a'::text) THEN 2 ELSE 3 END), name
            Sort Method: top-N heapsort  Memory: 35kB
            Buffers: shared hit=287
            ->  Bitmap Heap Scan on users  (cost=135.66..13007.91 rows=9379 width=805) (actual time=0.194..1.107 rows=277 loops=1)
                  Recheck Cond: ((lower((name)::text) = 'a'::text) OR (lower((username)::text) = 'a'::text) OR ((email)::text = 'a'::text))
                  Heap Blocks: exact=277
                  Buffers: shared hit=287
                  ->  BitmapOr  (cost=135.66..135.66 rows=9379 width=0) (actual time=0.152..0.152 rows=0 loops=1)
                        Buffers: shared hit=10
                        ->  Bitmap Index Scan on yorick_test_users  (cost=0.00..124.75 rows=9377 width=0) (actual time=0.101..0.101 rows=277 loops=1)
                              Index Cond: (lower((name)::text) = 'a'::text)
                              Buffers: shared hit=4
                        ->  Bitmap Index Scan on index_on_users_lower_username  (cost=0.00..1.94 rows=1 width=0) (actual time=0.035..0.035 rows=1 loops=1)
                              Index Cond: (lower((username)::text) = 'a'::text)
                              Buffers: shared hit=3
                        ->  Bitmap Index Scan on users_email_key  (cost=0.00..1.94 rows=1 width=0) (actual time=0.014..0.014 rows=0 loops=1)
                              Index Cond: ((email)::text = 'a'::text)
                              Buffers: shared hit=3
    Planning time: 0.303 ms
    Execution time: 1.687 ms

Here we can see the new query is roughly 25 times faster than the old query (1.687 ms versus 42.171 ms execution time).
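The choice between the two conditions described in this commit message can be sketched in plain Ruby. This is only an illustration: `condition_for` is a hypothetical helper that does not exist in GitLab, it builds SQL text by interpolation purely to show the shape of each condition, and lowercasing the query is an assumption of the sketch (the module itself leaves that to the caller).

```ruby
# Illustrative sketch only: `condition_for` is a hypothetical helper.
# It interpolates values into SQL text to show the shape of each condition;
# real code should go through Arel or bind parameters instead.
MIN_CHARS_FOR_PARTIAL_MATCHING = 3

def condition_for(column, query)
  if query.length >= MIN_CHARS_FOR_PARTIAL_MATCHING
    # Long enough for partial matching: substring search via ILIKE.
    "#{column} ILIKE '%#{query}%'"
  else
    # Too short: exact, case-insensitive comparison.
    # Assumption: lowercase the query so it can equal LOWER(column).
    "LOWER(#{column}) = '#{query.downcase}'"
  end
end

puts condition_for('name', 'a')     # LOWER(name) = 'a'
puts condition_for('name', 'alice') # name ILIKE '%alice%'
```

The second form can only use an index if an expression index on `LOWER(column)` exists, which is what the second plan's `Index Cond: (lower((name)::text) = 'a'::text)` lines reflect.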
71 lines
2.1 KiB
Ruby
module Gitlab
  module SQL
    module Pattern
      extend ActiveSupport::Concern

      MIN_CHARS_FOR_PARTIAL_MATCHING = 3
      REGEX_QUOTED_WORD = /(?<=\A| )"[^"]+"(?= |\z)/

      class_methods do
        def fuzzy_search(query, columns)
          matches = columns.map { |col| fuzzy_arel_match(col, query) }.compact.reduce(:or)

          where(matches)
        end

        def to_pattern(query)
          if partial_matching?(query)
            "%#{sanitize_sql_like(query)}%"
          else
            sanitize_sql_like(query)
          end
        end

        def partial_matching?(query)
          query.length >= MIN_CHARS_FOR_PARTIAL_MATCHING
        end

        # column - The column name to search in.
        # query - The text to search for.
        # lower_exact_match - When set to `true` we'll fall back to using
        #                     `LOWER(column) = query` instead of using `ILIKE`.
        def fuzzy_arel_match(column, query, lower_exact_match: false)
          query = query.squish
          return nil unless query.present?

          words = select_fuzzy_words(query)

          if words.any?
            words.map { |word| arel_table[column].matches(to_pattern(word)) }.reduce(:and)
          else
            sanitized_query = sanitize_sql_like(query)

            # No words of at least 3 chars, but we can search for an exact
            # case insensitive match with the query as a whole
            if lower_exact_match
              Arel::Nodes::NamedFunction
                .new('LOWER', [arel_table[column]])
                .eq(sanitized_query)
            else
              arel_table[column].matches(sanitized_query)
            end
          end
        end

        def select_fuzzy_words(query)
          quoted_words = query.scan(REGEX_QUOTED_WORD)

          query = quoted_words.reduce(query) { |q, quoted_word| q.sub(quoted_word, '') }

          words = query.split

          quoted_words.map! { |quoted_word| quoted_word[1..-2] }

          words.concat(quoted_words)

          words.select { |word| partial_matching?(word) }
        end
      end
    end
  end
end
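To see which words `select_fuzzy_words` keeps and what `to_pattern` produces for them, the logic can be exercised outside Rails. The sketch below copies the two constants and the word-selection code from the module into top-level methods; `escape_like` is a hypothetical stand-in for Rails' `sanitize_sql_like`, assumed here to escape `\`, `%` and `_` with a backslash.

```ruby
# Standalone sketch: mirrors the module's word selection without Rails.
MIN_CHARS_FOR_PARTIAL_MATCHING = 3
REGEX_QUOTED_WORD = /(?<=\A| )"[^"]+"(?= |\z)/

# Hypothetical stand-in for sanitize_sql_like: escapes \, % and _.
def escape_like(string)
  string.gsub(/[\\%_]/) { |char| "\\#{char}" }
end

def partial_matching?(query)
  query.length >= MIN_CHARS_FOR_PARTIAL_MATCHING
end

def select_fuzzy_words(query)
  # Pull out "quoted phrases" first so they survive as single words.
  quoted_words = query.scan(REGEX_QUOTED_WORD)
  query = quoted_words.reduce(query) { |q, quoted_word| q.sub(quoted_word, '') }
  words = query.split
  # Strip the surrounding double quotes from each phrase.
  quoted_words.map! { |quoted_word| quoted_word[1..-2] }
  words.concat(quoted_words)
  # Drop anything too short for partial matching.
  words.select { |word| partial_matching?(word) }
end

def to_pattern(query)
  partial_matching?(query) ? "%#{escape_like(query)}%" : escape_like(query)
end

words = select_fuzzy_words('foo "bar baz" ab')
p words                           # ["foo", "bar baz"]
p words.map { |w| to_pattern(w) } # ["%foo%", "%bar baz%"]
p select_fuzzy_words('ab')        # []
```

The last call returns an empty list, which is the case where `fuzzy_arel_match` falls through to matching the query as a whole, either with `LOWER(column) = query` or with `ILIKE` and no wildcards.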