1
0
Fork 0
mirror of https://github.com/rails/rails.git synced 2022-11-09 12:12:34 -05:00

Stop find_in_batches using the records after yielding.

Currently if the code that calls .find_in_batches modifies the yielded array in place then .find_in_batches can enter an infinite loop searching with ruby object ids in the database instead of the primary key of records in the database. This happens because it naively assumes the yielded array hasn't been modified before calling #id on the last object in the array. And ruby (1.8 at least) alias' #id to #object_id so an integer is still returned no matter what the last object is.

By moving finding the #id of the last object before yielding the array it means the calling code can do whatever it wants to the array in terms of modifying it in place, and .find_in_batches doesn't care.
This commit is contained in:
Caius Durling 2011-05-31 22:48:40 +01:00
parent 62570e8626
commit 96be08de25
2 changed files with 21 additions and 2 deletions

View file

@ -68,11 +68,14 @@ module ActiveRecord
records = relation.where(table[primary_key].gteq(start)).all
while records.any?
records_size = records.size
primary_key_offset = records.last.id
yield records
break if records.size < batch_size
break if records_size < batch_size
if primary_key_offset = records.last.id
if primary_key_offset
records = relation.where(table[primary_key].gt(primary_key_offset)).to_a
else
raise "Primary key not included in the custom select clause"

View file

@ -93,4 +93,20 @@ class EachTest < ActiveRecord::TestCase
end
end
end
def test_find_in_batches_should_not_use_records_after_yielding_them_in_case_original_array_is_modified
not_a_post = "not a post"
not_a_post.stubs(:id).raises(StandardError, "not_a_post had #id called on it")
assert_nothing_raised do
Post.find_in_batches(:batch_size => 1) do |batch|
assert_kind_of Array, batch
assert_kind_of Post, batch.first
batch.map! { not_a_post }
end
end
end
end