Add Gitlab::Database::Subquery.self_join to delete_all with limit

`delete_all` doesn't support limit, so you'd need to subquery
that. And instead of subquerying with `where(id: query)`, it's better
to use an `INNER JOIN`. This method also works with MySQL, while
subquerying doesn't (without another layer of subquerying)

Reference:
https://stackoverflow.com/questions/17892762/mysql-this-version-of-mysql-doesnt-yet-support-limit-in-all-any-some-subqu/17892886#17892886
This commit is contained in:
Toon Claes 2018-09-18 17:32:21 +02:00
parent 8c2192943a
commit ce830d3c60
4 changed files with 97 additions and 0 deletions

View file

@ -94,6 +94,7 @@ description: 'Learn how to contribute to GitLab.'
- [Verifying database capabilities](verifying_database_capabilities.md)
- [Database Debugging and Troubleshooting](database_debugging.md)
- [Query Count Limits](query_count_limits.md)
- [Database helper modules](database_helpers.md)
## Testing guides

View file

@ -0,0 +1,63 @@
# Database helpers
There are a number of useful helper modules defined in `/lib/gitlab/database/`.
## Subquery
In some cases it is not possible to perform an operation on a query.
For example:
```ruby
Geo::EventLog.where('id < 100').limit(10).delete_all
```
Will give this error:
> ActiveRecord::ActiveRecordError: delete_all doesn't support limit
One solution would be to wrap it in another `where`:
```ruby
Geo::EventLog.where(id: Geo::EventLog.where('id < 100').limit(10)).delete_all
```
This works with PostgreSQL, but with MySQL it gives this error:
> ActiveRecord::StatementInvalid: Mysql2::Error: This version of MySQL
> doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
Also, that query doesn't have very good performance. Using a
`INNER JOIN` with itself is better.
So instead of this query:
```sql
SELECT geo_event_log.*
FROM geo_event_log
WHERE geo_event_log.id IN
(SELECT geo_event_log.id
FROM geo_event_log
WHERE (id < 100)
LIMIT 10)
```
It's better to write:
```sql
SELECT geo_event_log.*
FROM geo_event_log
INNER JOIN
(SELECT geo_event_log.*
FROM geo_event_log
WHERE (id < 100)
LIMIT 10) t2 ON geo_event_log.id = t2.id
```
And this is where `Gitlab::Database::Subquery.self_join` can help
you. So you can rewrite the above statement as:
```ruby
Gitlab::Database::Subquery.self_join(Geo::EventLog.where('id < 100').limit(10)).delete_all
```
And this also works with MySQL, so you don't need to worry about that.

View file

@ -0,0 +1,16 @@
# frozen_string_literal: true
module Gitlab
module Database
module Subquery
class << self
def self_join(relation)
t = relation.arel_table
t2 = relation.arel.as('t2')
relation.unscoped.joins(t.join(t2).on(t[:id].eq(t2[:id])).join_sources.first)
end
end
end
end
end

View file

@ -0,0 +1,17 @@
# frozen_string_literal: true
require 'spec_helper'
RSpec.describe Gitlab::Database::Subquery do
describe '.self_join' do
set(:project) { create(:project) }
it 'allows you to delete_all rows with WHERE and LIMIT' do
events = create_list(:event, 8, project: project)
expect do
described_class.self_join(Event.where('id < ?', events[5]).recent.limit(2)).delete_all
end.to change { Event.count }.by(-2)
end
end
end