Gotcha with find_each and find_in_batches

Rails 2.3 added a couple of nice new methods – find_each and find_in_batches. Both methods accomplish the same thing in a slightly different way. Unlike a normal finder these methods grab objects in batches instead of all at once. For instance, if you have 500,000 users, you don’t want to do the following:

User.find(:all).each { |user| user.some_method }

The reason is that you just loaded all 500,000 records into memory, and your server is not happy. Instead, you could do:

User.find_each { |user| user.some_method }

By default, the above will only load 1,000 User objects into memory at a time, and your server will thank you. If 1,000 is too big or too small for you, use the :batch_size option to change it. The find_in_batches method is similar, except that it passes each batch to the block as an array instead of yielding one object at a time. For example:

User.find_in_batches do |users|
  users.each { |user| user.some_method }
end
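
To make the batching idea concrete, here is a rough plain-Ruby sketch. It is not the Rails source. ROWS is a hypothetical in-memory stand-in for the users table, and fetch_batch, find_in_batches_sketch and find_each_sketch are made-up names. The sketch assumes each batch is fetched in primary key order, starting after the last id seen.

```ruby
# Hypothetical stand-in for the users table: 2,500 rows with ascending ids.
ROWS = (1..2500).map { |id| { :id => id, :name => "user#{id}" } }

# Stand-in for one query of the form
# "WHERE id > last_id ORDER BY id LIMIT batch_size".
def fetch_batch(last_id, batch_size)
  ROWS.select { |row| row[:id] > last_id }
      .sort_by { |row| row[:id] }
      .first(batch_size)
end

# Yields one array per batch, so only batch_size rows are held at a time.
def find_in_batches_sketch(options = {})
  batch_size = options[:batch_size] || 1000
  last_id = 0
  loop do
    batch = fetch_batch(last_id, batch_size)
    break if batch.empty?
    yield batch
    # A short batch means we just read the final rows.
    break if batch.size < batch_size
    last_id = batch.last[:id]
  end
end

# Yields one row at a time, built on top of the batch version.
def find_each_sketch(options = {})
  find_in_batches_sketch(options) do |batch|
    batch.each { |row| yield row }
  end
end

batch_sizes = []
find_in_batches_sketch(:batch_size => 1000) { |batch| batch_sizes << batch.size }

seen = 0
find_each_sketch(:batch_size => 400) { |_row| seen += 1 }
```

In Rails itself, the matching call would be something like User.find_each(:batch_size => 400) { |user| user.some_method }.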

If you ever used the wonderful will_paginate gem, you are probably familiar with the concept from the paginated_each method that will_paginate provided.

So, what is the problem? The problem is that you have to be aware that, unlike paginated_each, find_each and find_in_batches work by setting up a with_scope block. Therefore, if you do any other finds on that same model inside the block, the scope will apply to them too. Usually this only affects relationships, but it is easy to forget. Here is an example:

# This is a purely made up example
class User < ActiveRecord::Base
  # There is a last_login_at attribute
  named_scope :recent_login, lambda { |*args|
    { :conditions => ["users.last_login_at >= ?", (args.first || 1.week.ago)] }
  }
  belongs_to :parent, :class_name => "User", :foreign_key => "parent_id"
end

User.recent_login.find_each do |user|
  parent = user.parent # This will include the recent_login scope.
end

It’s no different than other with_scope issues, but it isn’t as obvious. You can get around it by doing:

User.recent_login.find_each do |user|
  # We have to use send because with_exclusive_scope is protected.
  User.send(:with_exclusive_scope) do
    parent = user.parent # This will not include the recent_login scope.
  end
end
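
If the mechanics are hard to picture, the following plain-Ruby sketch may help. It only loosely imitates the behaviour described above and is not how ActiveRecord is implemented. TinyModel, its ROWS data and find_by_id are hypothetical names made up for illustration. It assumes a class-level stack of scopes that every find consults.

```ruby
# Hypothetical miniature model: a class-level stack of scopes that every
# find consults, loosely imitating how with_scope affects later finds.
class TinyModel
  ROWS = [
    { :id => 1, :parent_id => nil, :recent => false },
    { :id => 2, :parent_id => 1,   :recent => true  }
  ]

  @scopes = []

  class << self
    # Applies a filter to every find made inside the block.
    def with_scope(filter)
      @scopes.push(filter)
      yield
    ensure
      @scopes.pop
    end

    # Temporarily clears all active scopes for the duration of the block.
    def with_exclusive_scope
      saved = @scopes
      @scopes = []
      yield
    ensure
      @scopes = saved
    end

    # Looks up a row by id, but only among rows passing every active scope.
    def find_by_id(id)
      ROWS.find { |row| row[:id] == id && @scopes.all? { |s| s.call(row) } }
    end
  end
end

recent = lambda { |row| row[:recent] }

leaked = nil
fixed = nil
TinyModel.with_scope(recent) do
  child = TinyModel.find_by_id(2)
  # The parent is not "recent", so the active scope hides it.
  leaked = TinyModel.find_by_id(child[:parent_id])
  # Clearing the scope makes the parent visible again.
  fixed = TinyModel.with_exclusive_scope { TinyModel.find_by_id(child[:parent_id]) }
end
```

Here leaked comes back empty because the outer scope filtered the parent out, while fixed holds the parent row. This is the same shape as the user.parent lookup inside find_each.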

Now, go and be kind to your server with find_each and find_in_batches – just remember the scope.


Comments

3 responses to “Gotcha with find_each and find_in_batches”

  1. Gabe da Silveira

    Great public service announcement. This is something I hadn’t thought of that is probably causing me small bugs in some attachment_fu s3 thumbnail migration. Namely some thumbnails are getting dropped at the end of every 1000.
