Ruby Background Tasks with Starling – Part 2

EDIT: The latest workling on github has all of the changes described here. There is no need to download my patch anymore. See github.
In my previous post, I briefly described how simple it is to add background processing to your RoR app with Starling and the Workling plugin. Now, let’s discuss the changes I made to the Workling plugin, and then get everything deployed and monitored. As good as the Workling plugin is, it had one limitation that hurt me. At Inquisix, I have 5 workers to handle importing contacts, stats/logging, emails, searching, etc. Most methods are pretty quick, but a large import process could take 5-10 minutes to complete. With the original Workling, that meant all my other background tasks would wait, and I didn’t want that. The easiest solution for me was to modify Workling so each worker polled its queues in a thread. That way, if the contact worker was busy importing 1000 contacts, my emails still got delivered in a reasonable amount of time.
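The thread-per-worker idea can be sketched roughly as follows. This is a minimal illustration, not the code from the Workling patch: in-memory `Queue` objects stand in for Starling, and the worker names, job labels, and durations are hypothetical.

```ruby
require "thread"

# Hypothetical stand-ins: in-memory queues replace Starling, and the worker
# names and job durations are invented for illustration.
QUEUES = {
  "contacts" => Queue.new,
  "emails"   => Queue.new
}

# Starts one polling thread per worker queue and returns the threads.
def run_pollers(queues, results)
  queues.map do |name, queue|
    Thread.new do
      # Each thread drains only its own queue, so a slow job in one
      # worker cannot delay jobs that belong to another worker.
      while (job = queue.pop) != :stop
        sleep(job[:seconds])
        results << "#{name}:#{job[:label]}"
      end
    end
  end
end

results = Queue.new
QUEUES["contacts"] << { label: "import",  seconds: 0.3 } << :stop
QUEUES["emails"]   << { label: "welcome", seconds: 0.0 } << :stop

run_pollers(QUEUES, results).each(&:join)

# Collect the completion order: the quick email job finishes while the
# slow import is still running.
order = Array.new(results.size) { results.pop }
```

With a single polling loop, the email job would have waited behind the import. Here each queue makes progress independently.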

Most of the changes to Workling are limited to the lib/workling/starling/poller.rb file. Here is a summary of the changes I made:

  • Added threading
  • Updated the way ClassAndMethodRouting builds the routing hash. I needed a routing hash per worker. Note: whatever you do, don’t ever call routing.build from inside another thread. Really strange things happen, and they were very difficult to track down. Ruby threads are nice, but I’m used to real threads.
  • Added handling for MemCache exceptions. During development, I went to dinner and left the listener running without Starling. When I got back, my Mac was complaining the disk was full because my log had grown past 40 GB. Now, I explicitly catch MemCache exceptions and wait 30 seconds before trying again. The log will still grow, but at least not as fast. Besides, I will show you how to make sure Starling stays running.
  • Keep your database connection alive. I found this out after leaving for the night. I thought I had a working system when I left, but nothing worked in the morning. Basically, ActiveRecord will drop your connection if you don’t do anything for a while. Normally, this is not a problem in web apps because the connection is updated every time you access a page, but it doesn’t work that way here. You have to do it yourself. Add this in your loop to be sure:
    
    unless ActiveRecord::Base.connection.active?
      unless ActiveRecord::Base.connection.reconnect!
        RAILS_DEFAULT_LOGGER.fatal("FAILED - Database not available")
        break
      end
    end
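The "catch the exception, wait, try again" handling from the MemCache bullet above might look something like this. It is a sketch under stated assumptions: `QueueUnavailable` is a hypothetical stand-in for the memcache client's error class, and the `max_attempts` limit and injectable `sleeper` are additions made only so the example terminates and can run without real waiting.

```ruby
# Hypothetical error class standing in for the memcache client's exception.
class QueueUnavailable < StandardError; end

# Runs the block. On a connection failure it waits `wait` seconds and tries
# again, re-raising once `max_attempts` is reached. The attempt limit is an
# assumption of this sketch, not something the post describes.
def poll_with_retry(wait: 30, max_attempts: 3, sleeper: Kernel.method(:sleep))
  attempts = 0
  begin
    attempts += 1
    yield
  rescue QueueUnavailable
    raise if attempts >= max_attempts
    sleeper.call(wait) # pausing keeps a dead queue from flooding the log
    retry
  end
end

# Usage: the queue is "down" for the first two polls, then recovers.
calls = 0
waits = []
result = poll_with_retry(wait: 30, sleeper: ->(seconds) { waits << seconds }) do
  calls += 1
  raise QueueUnavailable, "starling is not running" if calls < 3
  :message
end
```

The pause between attempts is what slows log growth while the queue server is away. A real listener would likely keep retrying rather than give up after a fixed count.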
    
    

That’s it for now. Next time, I will walk through the deployment and monitoring process as I have it.

EDIT: The latest workling on github has all of my changes in it. There is no need to download the patch.


Comments

23 responses to “Ruby Background Tasks with Starling – Part 2”

  1. Dave

    One other change I forgot to mention. In ClassAndMethodRouting, I changed the class/method split character from ‘:’ to ‘__’. The reason is that the MemCacheClient#stats method uses ‘:’ when it builds the stats. If ‘:’ is in the queue name, it gets confused and you get no information about your queues.

  2. rany

    hey dave, thanks for the great writeups! i’m glad to see workling is doing it for you :). i’ll look at getting the stuff you did into workling asap (as well as moving it to github). i’ll probaby have time for this on the 08.04 – we’re working on the launch of boomloop.com so i’m flat out atm!

  3. Dave

    I made a couple of minor changes to my Workling patch. First, if there is an exception, I reset the memcache connection rather than creating a new Workling::Starling::Client. This fits better with how memcache clients should work. Second, I added a call to Thread.pass in Poller#dispatch! after each message is processed. This ensures that another thread gets a chance to run after every message. Probably not necessary, but it made me feel better.
    On another note, my system has processed several million messages so far using this scheme without any issues. Unfortunately, I can’t give you an exact number because I rebooted my server for other reasons (I usually reboot after system updates – old Windows habit).

  4. Dave

    Updated the patch again to be a ZIP instead of an svn patch. Some people were having problems with that.

  5. Ram

    Any idea why my worker functions would be called in a loop?
    I have exactly one call:

    BgWorker.asynch_set_enable(:ref => 1)

    in my bg_worker.rb file I have

    class BgWorker <1, :uid=>”bg_workers:set_enable:61bb317b03aaf804bdbc21ce635a779e”})

    called in a loop. I don’t understand why. This worked before, and now runs in a loop.

    Thanks

  6. Ram

    aaaahhhh… Starling was running on the wrong port !!!

  7. Dave

    I’m curious how the wrong port caused a loop.

  8. Dave

    As an FYI, the original Workling author has integrated my changes as well as several other improvements. I will be switching back to his version very soon. Because everything is on git now, it’s easier to suggest new additions so everyone can get the benefit.
    The author’s site: http://playtype.net/past/2008/2/6/starling_and_asynchrous_tasks_in_ruby_on_rails/

    The github for the sources: git://github.com/purzelrakete/workling.git

  9. Andy Watts

    Workling seems to poll memcache for starling messages every two seconds. This means a delay before a worker starts on a job.
    Is this sleep between polls necessary?
    Is there anything else I can do to minimize the delay in jobs starting?

    Thanks Andy

  10. Andy Watts

    default poll is 1 second. Pretty fast.
    Adding sleep_time to config/starling.yml can make it faster.
    Making it faster drives the cpu load up.

    development:
      listens_on: localhost:22122
      sleep_time: 0.001

  11. Dave

    Balancing sleep_time with CPU is pretty much your only option. Since Starling is a really simple queuing system, it only supports polling. We could probably come up with a signaling scheme to prevent the polling and make jobs start immediately, but that is not the case that starling/workling were meant to solve.
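To make the sleep_time trade-off concrete, here is a rough sketch of a polling loop. It is not Workling's actual poller: the `fetch` lambda, the `limit` argument, and the injectable `sleeper` are hypothetical, added so the example stops on its own and runs without real delays.

```ruby
# Polls `fetch` until `limit` messages have been handled. The loop sleeps
# only when the queue is empty, so a shorter sleep_time means more empty
# polls per second (more CPU) but less delay before a new job starts.
def poll(fetch, sleep_time:, limit:, sleeper: Kernel.method(:sleep))
  handled = []
  idle_polls = 0
  while handled.size < limit
    message = fetch.call
    if message.nil?
      idle_polls += 1
      sleeper.call(sleep_time)
    else
      handled << message
    end
  end
  [handled, idle_polls]
end

# Usage: nil stands for a poll that found the queue empty.
pending = [nil, nil, "job-1", nil, "job-2"]
handled, idle_polls = poll(-> { pending.shift },
                           sleep_time: 1, limit: 2, sleeper: ->(_seconds) {})
```

In the worst case a job arrives just after an empty poll, so it waits roughly one full sleep_time before being picked up.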

  12. graham

    I’m wondering if threading is the best direction to go for this. I’ve been looking around, and it seems like rails/activerecord threading generally meant a world of hurt (is this view outdated?).

  13. Dave

    I’ve had it running for months without a single thread-related issue. The trick is to make sure you have:
    ActiveRecord::Base.allow_concurrency = true

    when you start up your thread. To be complete, you should also have:

    ActiveRecord::Base.verify_active_connections!

    at the end of your thread.

    Don’t ever try to have threads in your web application, but I haven’t seen problems with daemons.

  14. Tom

    I can’t find any documentation on the starling.yml file? I only have a workling.yml. What options are available? How do I start starling and reference the .yml?
    Thanks!

  15. Dave

    I always use the command line rather than a YML file because I didn’t see any doc either. However, I looked through the code and it looks like the format would be:

    starling:
      host:
      port:
      queue_path:
      log_level:
      daemonize:
      timeout:
      pid_file:
      log_file:
      user:
      group:
      syslog_channel:
    

    Then, run with starling --config starling.yml

    I haven’t tried this, so YMMV. Check out load_config_file and parse_options in starling-0.9.8/lib/starling/server_runner.rb. It looks like load_config_file simply maps yml names to symbols, but it maps queue_path -> path and log_file -> logger.

    Let me know if this works.
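The key mapping described in the comment above could be sketched like this. It is an illustration of the idea only, not Starling's source: `KEY_ALIASES`, `load_options`, and the default values are hypothetical names and numbers chosen for the example.

```ruby
require "yaml"

# Assumed alias table, based on the description in the comment above.
KEY_ALIASES = { "queue_path" => "path", "log_file" => "logger" }.freeze

# Reads the "starling" section of the YAML text and copies each recognised
# setting onto a copy of the defaults, renaming aliased keys on the way.
def load_options(yaml_text, defaults)
  settings = YAML.safe_load(yaml_text).fetch("starling")
  settings.each_with_object(defaults.dup) do |(key, value), options|
    name = KEY_ALIASES.fetch(key, key).to_sym
    options[name] = value if options.key?(name) # unknown keys are ignored
  end
end

# Hypothetical defaults, for illustration only.
defaults = { host: "127.0.0.1", port: 22122, path: "/tmp/starling", logger: nil }

config_text = <<~CONFIG
  starling:
    port: 22133
    queue_path: /var/spool/starling
    log_file: /var/log/starling.log
    unknown_key: ignored
CONFIG

options = load_options(config_text, defaults)
```

Only keys that already exist in the defaults are overridden, which is why a misspelled setting in such a scheme would be dropped silently rather than reported.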

  16. Nanda

    Dave, I am running into a really weird situation. The Starling process is not running at all, but the workling task seems to be running in an endless loop. Any thoughts? In workling.yml, we specified the starling server as localhost.
    Thanks!

  17. Dave

    Very interesting. I would have expected that without Starling running, workling throws an error and stops. Was Starling running when you started Workling, or did Starling stop while Workling was running? In the latter case, Workling will keep trying to talk to Starling, but it will pause for a while between tries.

  18. Nanda

    Ya, we never had starling running at any point. The thing is, we don’t have much control over the server and aren’t sure how it was set up. But I don’t think starling was ever started there: there is no folder where the task files are stored, like /var/spool/starling. Anyway, Ram mentioned in the comments above that he had similar issues, but he had starling running on the wrong port; in our case we don’t have it running at all. I tried to reproduce it locally and, appropriately, workling complains it can’t find starling on the specified port. Anyway, we will try to debug more, thanks.

  19. Dave

    I know there is code in there to quit on startup if Starling isn’t running. I just tried it locally, and Workling quit immediately when Starling was not running. Are you running the latest version from Github?

  20. Nanda

    We just had the latest github version about 2 months ago, since then haven’t updated. To me it seems like, there was no starling, no spawn, bjrunner running, so looking at the plugin code, it invoked notremoterunner section of the code .. still doesn’t explain why it went in a loop. But now we started the starling and it’s working fine.

  21. Dave

    Very interesting. The behavior I see in that case is for workling to write something in the log and exit immediately.

  22. Taurus

    @Dave:
    If I’ve understood your posts correctly, you indicated the latest version of workling now includes a facility to create concurrent threads in the case of long running jobs. Is there something special I need to do to get that working? My workling runs fine but still only serially processes jobs from the same worker. For example, I have a report_queue_worker which routinely spins off long-running jobs. All other jobs seem to still wait until the last one finishes.

  23. Dave

    Workers are still single-threaded. The concurrency level is only at the worker level. Essentially, each worker class gets its own thread.
