`HomogeneousIn` has changed merging behavior for NOT IN clause from
before. This changes `equality?` to return true only if `type == :in` to
restore the original behavior.
- Calling `Blog.where(title: ['foo', 'bar']).where_values_hash` now
returns an empty hash.
This is a regression since 72fd0bae59 .
`Arel::Node::HomogeousIn` isn't a `EqualityNode`, the `WhereClause`
didn't had a case for this.
I decide to not make `HomogeousIn` inherit from `EqualityNode`,
because there is a comment questioning it for `In` 57d926a78a/activerecord/lib/arel/nodes.rb (L31)
Intead I just modified the `WhereClause` case and implemented
`right` on the node which is needed by `where_value_hash` 57d926a78a/activerecord/lib/active_record/relation/where_clause.rb (L59)
Since we're checking `serializable?` in the new `HomogeneousIn`
`serialize` will no longer raise an exception. We implemented
`unchecked_serialize` to avoid raising in these cases, but with some of
our refactoring we no longer need it.
I discovered this while trying to fix a query in our application that
was not properly serializing binary columns. I discovered that in at
least 2 of our active model types we were not calling the correct
serialization. Since `serialize` wasn't aliased to `unchecked_serialize`
in `ActiveModel::Type::Binary` and `ActiveModel::Type::Boolean` (I
didn't check others but pretty sure all the AM Types are broken) the SQL
was being treated as a `String` and not the correct type.
This caused Rails to incorrectly query by string values. This is
problematic for columns storing binary data like our emoji columns at
GitHub. The test added here is an example of how the Binary type was
broken previously. The SQL should be using the hex values, not the
string value of "🥦" or other emoji.
We still have the problem `unchecked_serialize` was supposed to fix -
that `serialize` shouldn't validate data, just convert it. We'll be
fixing that in a followup PR so for now we should use `serialize` so we
know all the values are going through the right serialization for their
SQL.
A coworker at GitHub found a few months back that if we used
`santitize_sql` over `where` when we knew the values going into `where`
it was a lot faster than `where`.
This PR adds a new Arel node type called `HomogenousIn` that will be
used when Rails knows the values are all homogenous and can therefore
pick a faster codepath. This new codepath skips some of the required
processing by `where` to make `wheres` with homogenous arrays faster
without requiring the application author to know when to use which query
type.
Using our benchmark code:
```ruby
ids = (1..1000).each.map do |n|
Post.create!.id
end
Benchmark.ips do |x|
x.report("where with ids") do
Post.where(id: ids).to_a
end
x.report("where with sanitize") do
Post.where(ActiveRecord::Base.sanitize_sql(["id IN (?)", ids])).to_a
end
x.compare!
end
```
Before this PR comparing where with a list of IDs to santitize sql:
```
Warming up --------------------------------------
where with ids 11.000 i/100ms
where with sanitize 17.000 i/100ms
Calculating -------------------------------------
where with ids 115.733 (± 4.3%) i/s - 583.000 in 5.045828s
where with sanitize 174.231 (± 4.0%) i/s - 884.000 in 5.081495s
Comparison:
where with sanitize: 174.2 i/s
where with ids: 115.7 i/s - 1.51x slower
```
After this PR comparing where with a list of IDs to santitize sql:
```
Warming up --------------------------------------
where with ids 16.000 i/100ms
where with sanitize 19.000 i/100ms
Calculating -------------------------------------
where with ids 158.293 (± 6.3%) i/s - 800.000 in 5.072208s
where with sanitize 169.141 (± 3.5%) i/s - 855.000 in 5.060878s
Comparison:
where with sanitize: 169.1 i/s
where with ids: 158.3 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <aaron.patterson@gmail.com>