Ruby Background Tasks with Starling
At Inquisix, we help sales professionals exchange trusted referrals. Doing that requires several background tasks, some of which can take 10-15 minutes to process. Obviously, I can’t make a client wait for that, so I needed a system that could handle background tasks. At first, I started with backgroundrb, and it worked just fine. Backgroundrb was in production for two months while Inquisix grew. However, there were a few things about backgroundrb that bothered me:
- It uses a lot of memory. Every worker creates at least one process. Plus, there is a master process to watch everything and deal with communication. It doesn’t take much before you end up with 5-6 processes. I had to upgrade my test server just to deal with the extra memory requirements.
- It’s not easy to build a queue with control over threads without creating a ton of processes.
- Too many times, I wanted to do something pretty straightforward, but I had to dig through the backgroundrb code to figure out the backgroundrb way. For example, don’t ever call sleep in a backgroundrb thread pool. You need to call next_turn instead.
After a while, I decided to look for a simpler way that would scale better without using so much memory. I decided a more traditional queue system would work better for me. At a former company, I built up an enterprise system based on queues that processes millions of transactions across dozens of servers. Something based on queues would work for me, but I did not want to take on the complexity of JMS, ActiveMQ (or some other queue), ActiveMessaging, etc. As usual with Ruby projects, I looked around on the web. Within a few minutes, I came across Starling and Sparrow. Both are Ruby queue systems using the memcached interface. That means I can use the memcache-client gem that I already use. Starling was developed at Twitter for background processing, so I figure it’s got some testing behind it. Sparrow is newer, but basically the same. However, there isn’t much experience with Sparrow, so I settled on Starling as my queue server.
To install Starling:
sudo gem install starling
sudo gem install memcache-client
Now, I needed a way to use my new queue server for background tasks. Again, a few minutes of looking, and I found Workling. It didn’t have everything I wanted, but it was nice and simple, and it had almost everything I wanted. I use Piston for all my plugins, so here is how to install with that:
piston import http://svn.playtype.net/plugins/workling/ vendor/plugins/workling
svn commit -m "added workling"
Make sure you commit now because we will be making some changes to Workling later. Piston will get confused and toss your changes if you don’t commit first.
Client Code
First, create a worker in app/workers/my_worker.rb
class MyWorker < Workling::Base
  def do_something_big(options = {})
    SomeModel.do_something_big(options[:some_arg])
  end
end
Anything in app/workers that inherits from Workling::Base will get picked up automatically as a worker. Workers are basically listeners on a Starling queue. By default, Workling defines queues based on class and method. There will be a queue for every method in every class that inherits from Workling::Base.
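To make the class-and-method naming concrete, here is a minimal sketch of how such a queue name could be derived. The queue_name_for helper is hypothetical and is not Workling's actual routing code; it only imitates the "uploadedsong_workers:my_queue" style of names that appear in Workling's log output.

```ruby
# Hypothetical helper, not part of Workling: derives a queue name from a
# worker class name and a method name, in the class-and-method style.
def queue_name_for(class_name, method_name)
  underscored = class_name
                .gsub(/([a-z\d])([A-Z])/, '\1_\2') # MyWorker -> My_Worker
                .downcase
  "#{underscored}s:#{method_name}"
end

puts queue_name_for("MyWorker", :do_something_big)   # my_workers:do_something_big
puts queue_name_for("UploadedsongWorker", :my_queue) # uploadedsong_workers:my_queue
```

Under this scheme every public worker method gets its own queue, which is why a worker class with many methods results in many queues.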
Now, you can call your worker asynchronously anywhere like so:
MyWorker.asynch_do_something_big(:some_arg => 5)
Starling Runner
To use Workling’s Starling runner, you need to set up your environment like so:
Workling::Remote.dispatcher = Workling::Remote::Runners::StarlingRunner.new
I add this line to all my environment files (development.rb, etc.). Workling is nice in that if you comment out the above line, all the MyWorker.asynch_* calls will become synchronous calls — nice for debugging!
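The switch between synchronous and asynchronous calls can be pictured with a small toy model. Everything here (ToyRemote, ToyBase, ToyWorker) is made up for illustration and is not Workling's real implementation; it only shows the idea that asynch_* calls run inline unless a dispatcher has been configured.

```ruby
# Toy stand-in for the place where the dispatcher is configured.
module ToyRemote
  class << self
    attr_accessor :dispatcher # nil means "run inline"
  end
end

# Toy base class: turns Worker.asynch_foo(opts) into an enqueue or a direct call.
class ToyBase
  class << self
    def method_missing(name, *args)
      return super unless name.to_s.start_with?("asynch_")

      method = name.to_s.sub("asynch_", "").to_sym
      options = args.first || {}
      if ToyRemote.dispatcher
        ToyRemote.dispatcher.call(self, method, options) # hand off to a queue
      else
        new.public_send(method, options)                 # synchronous fallback
      end
    end

    def respond_to_missing?(name, include_private = false)
      name.to_s.start_with?("asynch_") || super
    end
  end
end

class ToyWorker < ToyBase
  def do_something_big(options = {})
    "processed #{options[:some_arg]}"
  end
end

# No dispatcher configured: the call runs inline and returns the result.
puts ToyWorker.asynch_do_something_big(:some_arg => 5) # processed 5

# Dispatcher configured: the call is only recorded for later processing.
queue = []
ToyRemote.dispatcher = lambda do |klass, method, options|
  queue << [klass.name, method, options]
  :queued
end
puts ToyWorker.asynch_do_something_big(:some_arg => 5).inspect # :queued
```

Because the calling code is identical in both modes, the worker logic can be debugged inline and then moved to the background by changing configuration only.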
The Starling runner takes care of several things:
- Mapping of queue names to worker code. This is done with Workling::ClassAndMethodRouting, but you can change the queue routing pretty easily.
- There’s a client daemon that waits for messages and dispatches these to the responsible workers. If you intend to run this on a remote machine, then just check out your rails project there and start up the Starling client.
Now, fire up Starling, your app, and the workling runner, and you are processing background tasks. Don’t forget to edit config/starling.yml first to tell Workling where Starling is running.
sudo starling -d
script/server
script/workling_starling_client start
What I ended up with was much better for what I was doing. This combination processed my background tasks faster and more reliably. It is much easier to add new workers and call them. Finally, it uses a whole lot less memory, so my end user application performs better. Basically, it wins on all fronts for me.
Next time, I will share the changes I made to Workling to support threads and provide the necessary configuration to ensure that everything stays running in production.
March 25th, 2008 at 10:37 pm
A quick update on memory. With backgroundrb, I could barely run 4 mongrel instances for my app on my 1 gig production server. Now, with Starling and Workling I can pretty easily run 7 with room to spare. It makes sense given that my backgroundrb installation had 4-5 rails processes, and now I have one rails (Workling) and one small ruby process (Starling).
April 18th, 2008 at 8:13 am
An awesome plugin to use with Starling. But when I tried to use it asynchronously, I ran into a problem (the synchronous call works fine).
FAILED to process queue uploadedsong_workers:my_queue. # could not handle invocation of my_queue with {:uid=>”uploadedsong_workers:my_queue:dd5ed62c727f319f4ca89cdce2edd291″}: wrong number of arguments (1 for 0).
I have a workling UploadedsongWorker with a method my_queue. I am calling this worker in the following way:
UploadedsongWorker.asynch_my_queue()
I could not find where the problem is. Can somebody help me?
April 18th, 2008 at 8:28 am
Make sure your worker methods always have an options parameter. For example:
def process_my_task(options = {})
end
Workling is always going to try and pass at least one key to the options hash.
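A small sketch of why the options parameter matters. The two worker classes and the invoke helper are stand-ins written for this example; they only imitate a runner that always calls the worker method with a hash containing at least a :uid key.

```ruby
# Stand-in for a worker whose method takes no arguments.
class BrokenWorker
  def my_queue
    "done"
  end
end

# Stand-in for a worker that follows the convention.
class FixedWorker
  def my_queue(options = {})
    "done with uid #{options[:uid]}"
  end
end

# Imitates a runner: always passes an options hash with a :uid key.
def invoke(worker, method)
  worker.public_send(method, { :uid => "example-uid" })
rescue ArgumentError => e
  "FAILED: #{e.message}"
end

puts invoke(BrokenWorker.new, :my_queue) # FAILED: wrong number of arguments ...
puts invoke(FixedWorker.new, :my_queue)  # done with uid example-uid
```

The exact wording of the ArgumentError message depends on the Ruby version, but the cause is the same as in the "wrong number of arguments (1 for 0)" error reported above.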
April 19th, 2008 at 10:42 am
Thanks for the timely help, Dave.
But now I am stuck with a different problem. I am getting a NoMethodError in the ClassAndMethodRouting class:
WORKLING: runner could not invoke UploadedsongWorker:my_queue with {:uid=>”uploadedsong_workers:my_queue:f90e2826d76cc5202fc11eb05fe16412″}. error was: #
I have created a worker with the class name UploadedsongWorker (apps/workers/uploadedsong_worker.rb).
I think I have missed some naming convention. Can you also help me with this?
Thanks in advance.
Raj
April 19th, 2008 at 1:27 pm
Sorry Dave, I am facing the same problem even after adding a parameter to the methods in my workers:
def my_queue( options = {})
end
But the same code works fine for a synchronous call. What may be the problem in the asynchronous call?
I am getting the following exception:
FAILED to process queue uploadedsong_workers:my_queue. # could not handle invocation of my_queue with {:uid=>”uploadedsong_workers:my_queue:e85d05a67bc961265b8c8713d4c5ef03″}: wrong number of arguments (1 for 0).
April 19th, 2008 at 1:29 pm
Sorry Dave, I am facing the same problem even after adding a parameter to the methods in my workers:
def my_queue( options = {})
end
How should I proceed? Is there anything I am missing?
April 19th, 2008 at 2:34 pm
Please post your class definition and where it is located. Remove anything you don’t want to share.
The rules for workers are only:
1. Worker classes must be in the app/workers directory.
2. Worker classes need to be named “MyClassWorker”. Replace “MyClass” with your class name. Don’t use underscores in the class name.
3. They must inherit from Workling::Base.
The error you’re getting (uploadedsong_workers:my_queue could not handle…) makes me think you don’t have your class using camel case. Something like class UploadedSong should have mapped to uploaded_song for a queue name.
April 19th, 2008 at 3:33 pm
Dave, once again thanks for helping me. It would be great if this issue could be solved. I will give more details.
The following model code is in app/models/uploadedsong_observer.rb
class UploadedsongObserver
April 19th, 2008 at 3:36 pm
Dave this is /apps/workers/uploadedsong_worker.rb
class UploadedsongWorker
April 19th, 2008 at 6:06 pm
OK, now I see the problem. Your class needs to be defined like this:
class UploadedsongWorker < Workling::Base
# methods
end
Without the Workling::Base, the starling remote Discover class will not be able to find your class. Calling it directly (not remote) will still work because /app/workers is in the path.
April 20th, 2008 at 8:32 am
Sorry Dave, this is my worker class. I already had it extending Workling::Base. I still face the problem.
class UploadedWorker
April 20th, 2008 at 8:36 am
class UploadedsongWorker
April 20th, 2008 at 8:37 am
Sorry Dave, if I try to give you the code, it gets filtered when I submit my comment.
April 20th, 2008 at 9:04 am
Can you email it to me? My email is dave @ [this web site's domain].
April 20th, 2008 at 4:44 pm
I’m also monkeying around with Workling and Starling (and Sparrow, which seems even better at first glance). Very slick. Thanks for calling it out.
However, when I try to pass a model object in the options hash, the poller process tells me that it “could not handle invocation of method_name with nil: No connection to server”. If I pass in an identifier and call find to look up the AR object, then it works fine.
April 20th, 2008 at 6:10 pm
Sparrow looks great as well, but I feel better with starling because it’s been around longer. You should also check out the very different ways they perform the same task. Starling makes use of threads while Sparrow does network events (like backgroundrb).
Regarding your model object, it’s not a good idea to pass model objects across processes (don’t forget your worker is in a separate process). That’s the no connection to server problem. To pass models around, you should always pass the ID, and then do a find in your worker to bring it back. It’s fine to pass large hashes and regular classes around, but models should be with an ID only.
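The pass-the-ID pattern can be sketched as follows. The Song class is a stand-in for an ActiveRecord model with an in-memory hash playing the role of the database, and SongWorker is a plain class rather than a real Workling worker; both are assumptions made so the example runs on its own.

```ruby
# Stand-in for an ActiveRecord model; a hash plays the role of the database.
class Song
  attr_reader :id, :title

  @records = {}

  class << self
    def create(id, title)
      @records[id] = new(id, title)
    end

    def find(id)
      @records.fetch(id)
    end
  end

  def initialize(id, title)
    @id = id
    @title = title
  end
end

class SongWorker
  # Receives only the ID and looks the record up again on the worker side.
  def encode(options = {})
    song = Song.find(options[:song_id])
    "encoded #{song.title}"
  end
end

song = Song.create(42, "Example Song")

# Enqueue side: only plain data goes into the options hash.
payload = Marshal.dump({ :song_id => song.id })

# Worker side: unmarshal the options and find the record again.
puts SongWorker.new.encode(Marshal.load(payload)) # encoded Example Song
```

The message that crosses the process boundary stays small, and the worker always operates on a freshly loaded record with its own database connection.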
April 20th, 2008 at 7:08 pm
well, marshaling of the object was the expected behavior. But point taken.
April 20th, 2008 at 7:17 pm
Yes, the object is marshalled, but the connection does not marshal well. The attributes marshal fine, but the model’s database connection and state do not make it across to the queue.
April 29th, 2008 at 5:59 pm
Is there any way to use Starling / Workling for a task that we want our app to do forever? like for example, parsing feeds?
If so, how to do it?
My idea was initially to create a worker that will pull a task (parse_feed) and a feed(my_feed) and that parses the feed and after puts back this task and this feed in the queue? What do you think?
April 29th, 2008 at 9:21 pm
I suppose you could, but if all your application is going to do is parse feeds without any interaction with the web application, then you might be better off using the daemons gem to build a background process. The background process would do nothing but sit in a loop parsing feeds and putting them someplace. See http://daemons.rubyforge.org/ for daemons info.
Now, if you want feed parsing to be kicked off by some user interaction, then workling and starling could be put to very good use.
I would need to understand more of what you need before making any recommendation.
April 30th, 2008 at 4:57 pm
Thanks Dave for the reply…
Well my application is pretty simple : it receives feeds (urls) from another application through an API. And then, it parses the feed “continuously” -or at least very often, max every 30 minutes- to check if there are new items. When new items are detected, it “posts” these new items to a third Rails application… And that’s pretty much it.
The 2 big constraints that I have: 1) many, many feeds (up to 100k), 2) speed: a feed should be parsed at least every 30 minutes!
Thanks for your help
Julien (julien DOT genestoux AT gmail DOT com)
May 1st, 2008 at 7:24 am
This doesn’t sound like something workling would be good for. You could probably fake it by posting messages back to yourself, but it wouldn’t be the most efficient.
When you say “often”, is there a minimum as well? Say you had something really efficient. Would you really want to parse continuously? I’m guessing you would want some minimum time (every 15 minutes, max every 30 minutes). Am I right?
I could see something like:
- Your first app’s API dumps the feeds to parse into a database
- Build a daemon that polls the db for feeds that haven’t been parsed in 15 minutes, sorted by when last parsed (oldest first). Each feed to parse is dumped on a queue for processing. You could use starling for this.
- Build another daemon that does nothing but listen on the queue, parse feeds, and hand them off to your third application.
I split out the daemons because you can add more of the second daemon to make sure you keep up with all the parsing.
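One pass of the polling daemon could look roughly like the sketch below. The Feed struct, the PARSE_INTERVAL constant, and the feeds_due helper are hypothetical names, and a plain array stands in for both the database and the Starling queue.

```ruby
# Hypothetical feed record: a URL plus the time it was last parsed.
Feed = Struct.new(:url, :last_parsed_at)

PARSE_INTERVAL = 15 * 60 # seconds

# Returns the feeds not parsed within the interval, oldest first,
# so the most overdue feeds are queued before the rest.
def feeds_due(feeds, now)
  feeds
    .select { |feed| now - feed.last_parsed_at >= PARSE_INTERVAL }
    .sort_by(&:last_parsed_at)
end

now = Time.now
feeds = [
  Feed.new("http://example.com/a.rss", now - 5 * 60),
  Feed.new("http://example.com/b.rss", now - 40 * 60),
  Feed.new("http://example.com/c.rss", now - 20 * 60)
]

# One polling pass: an array stands in for the Starling queue.
queue = []
feeds_due(feeds, now).each { |feed| queue << feed.url }
puts queue.inspect # ["http://example.com/b.rss", "http://example.com/c.rss"]
```

The second daemon would then pull URLs off the queue one at a time, so adding more copies of it increases parsing throughput without touching the poller.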
Feel free to email me for more. My email is dave AT [this domain]
May 2nd, 2008 at 6:33 pm
Thanks Dave!
Here is what I actually did, since I need to parse many feeds at the same time.
First I took your patched version of Workling
I created 10 ParserWorkers (workling), and 1 dispatch worker (daemon),
First, each feed has a “frequency” and I record the time of the last parse.
My DispatchWorker runs continuously and fetches all the feeds that need to be parsed (based on the frequency and the last parse). Then it sends each one to one of the workers at random.
The ‘ultimate’ thing would be to be able to know the size of the queue for each worker and launch dynamically new workers if the sizes of the queues are too big… but I really don’t think it’s possible to launch dynamically new workers…
Anyway, thanks again!
May 2nd, 2008 at 7:21 pm
No problem. Glad I could help.
Regarding dynamically adjusting workers, you could do that with a daemon pretty easily, but I’m not sure right off how to do that with workling. Note, you can call memcache stats on starling, and you will get a bunch of information regarding all the queues (including size). My only concern might be that it may not be as simple as firing up more workers. Eventually, you may get CPU starved.
May 5th, 2008 at 1:33 pm
Hi,
This addresses Julien’s post.
1) This might be very obvious, but does the starling/workling pair work across multiple web servers? That is, starling is a ruby process that runs in memory and maintains the queue, and workling is another rails process that listens to entries in the queue and executes them. Will this seamlessly work as I add more webservers?
2) I don’t believe daemons work across web servers and so you will have to remember to start up a daemon when you add a new server. Doesn’t seem like a big deal but if you think about all the other configuration tasks that are required to add another webserver to scale your rails app, remembering to spawn a daemon process is one more task to execute to have your app working properly.
Thanks,
Prathap
May 5th, 2008 at 2:05 pm
1) Absolutely, you can run starling and workling on multiple servers, and they do not need to be on the web servers. You can run the starling/workling processes anywhere. If you require feedback (progress) from the workling servers on your web site, then you will have to come up with a way to get that across. The existing way will work, but not so great with multiple starling instances. I would probably switch over to memcache to provide feedback to the site.
2) I’m not sure what you mean by daemons working across web servers, but I could see specifying capistrano roles for workling/starling servers. Perhaps, they always run on the app servers, but they could be another type of app server. Of course, wherever workling and starling run will need to have the appropriate config tasks setup to ensure starling and workling execute, but that should be a simple matter of splitting up the existing god (or monit) config file I provided — different config files based on the server’s role.
May 5th, 2008 at 2:54 pm
Thanks Dave,
Sorry I might not have been very clear with questions, or maybe you answered it and I didn’t quite grasp it. Just to confirm:
1) Take memcached for instance. I have it running on machine A which also has a webserver and mysql running on it. If I decide to add another machine, say machine B with only a webserver, all I really need is to have memcache-client running on machine B, and all requests going to B will know to query the memcached server running on machine A. Similarly, if machine A had starling running on it, when I add machine B, all I would need is to have the starling client running on B (that is, run: script/workling_starling_client start) so that requests can enqueue jobs on the starling server running on machine A, correct?
2) Sorry my question about daemons was utterly stupid. What we are trying to do is have a daily job that scans through our db tables and runs some analytics/calculations to be displayed on the website on a daily basis. I guess the ideal solution would be to pull out mysql into another machine C that sits in front of machine A and machine B (the webservers), and have a daemon process running on machine C that scans through the db and does the calculations nightly. For this kind of a job, using starling/workling wouldn’t be very efficient (or would it?) because these aren’t jobs triggered by HTTP requests. This is just a nightly analytics job. However, I am considering starling/workling for this job because I know for a fact that we will need to employ starling/workling for tasks they are perfect for. So why not just implement it for this nightly job and just keep one solution instead of having a daemon for the nightly job and then implementing starling/workling for stuff like video uploading?
Thanks again,
Prathap
May 5th, 2008 at 4:00 pm
1) Yes on memcache, but no on the starling. If machine A has memcache server, a webserver, and mysql running on it, and machine B is only a webserver, you would use the same memcache client on machine B to talk to memcache on machine A and starling (wherever that is installed). Usually, I see mysql getting pulled off on its own once multiple web/app servers are added, but here is a possible scenario:
Machine A: webserver, memcache server, memcache client, mysql, starling, and workling.
Machine B: webserver, memcache client
Machine B would call memcache server on machine A and starling on machine A (via memcache client).
I’m assuming here that your web server also has your ruby application running on it.
You could also have starling and workling running on each web server. In this case, your app would be configured to call the local starling, and your workling_starling_client would pull from the local starling.
One thing to be careful of is that workling_starling_client is kind of misnamed. It isn’t actually the CLIENT for starling. It is more the LISTENER for starling in traditional queue-based processing. Your application uses memcache-client to PUT messages onto the starling queue. workling_starling_client uses memcache-client to PULL messages from the starling queue for processing. With the proper configuration, there are no limits to where or how many starlings and worklings are running.
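The PUT and PULL roles can be illustrated with an in-memory stand-in. FakeStarling is a made-up class so that the sketch runs without a server; it mimics the queue behavior of set (append a message) and get (remove and return the oldest message). With a real Starling you would go through a MemCache client from memcache-client instead.

```ruby
# In-memory stand-in for a Starling server, so the sketch runs without one.
class FakeStarling
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  # On a queue server, set appends a message instead of overwriting a key.
  def set(queue_name, message)
    @queues[queue_name].push(message)
  end

  # get removes and returns the oldest message, or nil if the queue is empty.
  def get(queue_name)
    @queues[queue_name].shift
  end
end

starling = FakeStarling.new

# Application side: PUT a message on the queue and move on.
starling.set("my_workers:do_something_big", { :some_arg => 5 })

# Listener side: PULL messages until the queue is drained.
while (message = starling.get("my_workers:do_something_big"))
  puts "processing #{message.inspect}"
end
```

The application and the listener never talk to each other directly; they only share the queue name, which is why they can run on different machines.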
2) I have something similar. What I do is have a rake task that calls MyWorker.asynch_my_method. Then I have cron kick off the rake task. This way, your app can kick off the task via an HTTP request as well.
Even without workling/starling, I use rake tasks for all my background tasks. Once you do that, it’s trivial to kick off scheduled jobs with cron. For example:
Using rake tasks also makes it easy to kick off tasks from capistrano.
May 5th, 2008 at 4:25 pm
Thanks a lot for the clarification and suggestions!
May 16th, 2008 at 6:32 am
Hey Dave,
Great writeup, and even better follow up on these comments. After spending 12+ hours fighting BackgroundRb over the last couple days (and much more than that cumulatively), I am very ready to believe that Workling will be the solution to our asynchronous processing. After getting it preliminarily working within my first hour of reading about it (compared to about a day to get BackgroundRb initially set up), I’m very optimistic.
The main objective I have that I’m a little bit unclear about whether I can achieve: can Workling run on multiple boxes currently? I presume it is possible to use Starling with multiple boxes by instantiating multiple Starling objects like so:
— starling.rb (in initializer directory) —
require 'memcache'
STARLING_SERVER_1 = MemCache.new('127.0.0.1:22122')
STARLING_SERVER_2 = MemCache.new('127.0.0.2:22122')
…
STARLING_SERVER_1.set('my_queue', { :my_data => true })
STARLING_SERVER_2.set('my_queue', { :my_data => false })
— end —
Now, I haven’t run that code, but it seems like it would probably work. However, I’m not sure how that example would extrapolate to Workling, since the Starling server that Workling uses is specified statically in the yml file.
If Workling can handle this, it would have a distinct advantage over BackgroundRb, which for all intents and purposes, from my experience, can connect only to one server.
One other question you may or may not know the answer to: is the data transmitted between a remote Starling and a Rails server just sent as raw data (i.e., packet sniffable, i.e., not good to send sensitive data over the connection)?
Thanks for any help! Once I get this working, I’m hoping to accumulate as much Workling setup notes on my blog, since there’s something of a scarcity of it right now, except from you and the creator.
May 16th, 2008 at 6:39 am
P.S. If sensitive data shouldn’t be sent between Starling and Rails servers (as is my guess), I would probably want to set up my Worklings so that my sensitive Workling processes run locally (on the web server), and the non-sensitive Worklings run on a separate server (which saves CPU/memory on the web servers). Maybe this is possible by calling the local Worklings directly by their class name, e.g., MyWorker.asynch_method_call(options), and the remote Workling through the ol’ Workling::Remote.run(:my_worker, :my_action, options)…?
May 16th, 2008 at 9:33 am
There is no need to write any additional code to have multiple starlings or worklings. To have multiple starlings (on a single machine or multiple), simply execute starling wherever you want (even multiple on the same machine). Then, update your config for multiple starlings:
config/starling.yml:
listens_on: [192.168.1.10:22122, 192.168.1.11:22122]
Now wherever your app or workling runs with this config, puts and gets to the queue will randomly access two starling queue servers.
Running workling on multiple machines requires no changes as long as config/starling.yml points to the appropriate starling(s). If you want to run multiple worklings on a single server, then you need to change script/workling_starling_client. Set :multiple => true, and you should be good to go.
Essentially, there is no limit to how many starlings and worklings you have running. Also, with the right config, they can run anywhere. If you have a lot of traffic and long running tasks, it’s a good idea to run starling/workling on an app server rather than a web server. I find it easier to pair up starling/workling on app servers rather than separating them as well. How many you need will obviously depend on your app.
Regarding sensitive data, you are correct that data between your app, starling, and workling is sent in the clear. It’s just like memcached or a database call. Objects are marshalled and sent over a port. As it is now, there is no way to specify servers to run workers on. I’d say your better bet would be to make sure your network is secure to limit concerns about anyone sniffing packets between your servers. If you have multiple servers, then I assume your database is on a separate server. Do you encrypt that traffic?
May 16th, 2008 at 2:38 pm
Dave — thanks for the quick and detailed response, that’s great. Way cool that setting up multiple Starlings/Worklings is as easy as adding an entry to starling.yml. I like the sound of this!
Right now, all of our servers are hosted, so I can’t just throw up a firewall in front of them and go about my business. Because I haven’t had the time yet to research how to go about encrypting the traffic between our servers (sounds hard), I’m just running the DB and app on a big server, then running other not-sensitive things (memcached, and soon, starling) on other servers that are setup to restrict traffic with iptables. So my fundamental problem remains, which is that I need to either 1) figure out how to encrypt remote calls to the DB (could then have the remote Starling query the DB to get the sensitive data it needs to execute its task) or 2) figure out how to make sure that certain Workling calls run on a local Starling.
If you have any experience as to what the relative difficulty of those two tasks might be, I’m all ears. Otherwise, a-Googling I go.
May 16th, 2008 at 8:00 pm
I haven’t had the experience of requiring encryption of network traffic. I’ve always solved that problem by securing the network, but I did have the luxury of my own servers. Then again, I was dealing with medical data so hosting was out of the question.
In your case, you could certainly setup iptables to restrict access, but like you said that doesn’t secure the traffic. What is your hosting service? Most have pretty good policies about not watching traffic, but it all comes down to how much you trust them. Unfortunately, my guess is that it will be very difficult to encrypt all your traffic.
May 18th, 2008 at 1:41 pm
Hey Dave,
Thanks for the ideas above. I’m now trying to figure out how you go about debugging workling when things go wrong? I’ve got my local server so that it should be connected to a remote starling. My remote starling is running on a machine that has my Rails app on it. I started stuff on the remote machine roughly as you prescribe:
sudo starling -d -h my_host_ip
script/server -e production -d
script/workling_starling_client start
I’m getting no errors in the production log on my local machine, but the async call basically just seems to disappear. I tried invoking my event both as you describe and as the RailsLodge page describes:
ScraperWorker.asynch_do_scrape(my_options) AND
Workling::Remote.run(:scraper_worker, :do_scrape, my_options)
I’ve double-checked that the starling.yml on my local machine points to the right starling IP and that the RemoteRunner stuff is in my environment.rb. I note that on the remote server, my scraper calls are not showing up in my Starling spool directory, so maybe the async call is not connecting to Starling for some reason?
Since the extent of Workling’s log info seems to be a line here and there in the production.log (and my production.log gives me no guff about this call), and Starling gives me even less logging info, I am not really sure what to do to narrow down problems when mysterious things go wrong. Any advice?
May 18th, 2008 at 7:21 pm
My guess is that something is stopping the traffic from getting to starling. You said that you had iptables on your machines. Do you have port 22122 open on the remote machine? Also, make sure that you are not using 127.0.0.1 for an IP.
Regarding starling’s log, there is a bug in starling that prevents you from using -v or -l.
My advice is get things going to starling’s queues. You can see that by watching the file sizes on the queue files grow. After that, add lots of logging to your workers.
May 24th, 2008 at 8:41 pm
I had a similar problem like Bill, but now I can see my queues are getting bigger in the /var/spool/starling directory. But when I run script/workling_starling_client run , I get no output. How do I know if the client is listening? How can I check what the client is seeing?
BTW, I had this working couple of days ago.
Thanks
May 26th, 2008 at 10:22 pm
There isn’t any output when run as a daemon. There are two ways to see what may be up:
1. Run “script/workling_starling_client start -t”. This will execute the server on top so you can see any puts. There aren’t a lot, but you can add a bunch in poller.rb. clazz_listen and dispatch is where most of the fun happens.
2. Workling will write something to the RAILS_DEFAULT_LOGGER, but most is in the debug log so make sure your log level is set to debug.
One thing to be aware of is that queues will grow on puts AND gets. When something is put on the queue, it grows by the size of all the args (plus a little). Gets are 1 byte at a time. So if you see your queues growing one byte at a time every second or so, the workling server is doing its thing. If the queues are growing by larger chunks at a time, then your app is successfully adding items to the queue.
June 18th, 2008 at 8:25 pm
Hey there,
Here is a completely different approach to background processing, that can be used with any of the above mentioned background processing or messaging frameworks, like ActiveMQ:
http://devblog.imedo.de/2008/6/18/running-ruby-blocks-in-the-background
Instead of implementing a complete background task communication protocol, this solution builds on top of any communication protocol to execute code in a background process. Plus, it is the only solution that I know of that has DRY error recovery support: if for some reason the communication to the background task fails, it is possible to run the task in-process or write it to disk for later replay.
June 18th, 2008 at 9:34 pm
Very interesting. Although I disagree that most background solutions are not DRY. In fact, the workling solution allows you to absolutely be DRY by allowing you to use all of your application’s code. In the scheme presented, it is unclear how I use code outside of what is serialized in the background block.
June 19th, 2008 at 6:57 pm
Is there any way to make sure that the workers get reloaded on every server request? Currently, I have to restart the server after a change to the worker. I tried replacing require with require_dependency but that doesn’t seem to help.
Thanks
June 19th, 2008 at 8:07 pm
As far as I know, daemons will not work that way. I make sure the worker is always restarted by adding a workling restart to my capistrano tasks.
Note, if you are debugging in development, you can turn off asynchronous processing by commenting out:
#Workling::Remote.dispatcher = Workling::Remote::Runners::StarlingRunner.new
in your config/environments/development.rb file. That will cause all your asynch calls to be synchronous and reload works. Your workling process does not need to be running (it isn’t being used anyway).
June 19th, 2008 at 8:51 pm
I have the line commented out in development.rb, and I am running in the synchronous mode. Thanks for the tip btw. But, I need to manually restart server each time I make a change to my background workers. Are you able to make changes to your background workers, and have the changes show up without restarting (in the synchronous mode)?
Thanks.
BTW: Your blog is worth its weight in gold.
June 19th, 2008 at 8:52 pm
…. the weight includes the weight of hardware of course
June 20th, 2008 at 7:03 am
Yes, when running in synchronous mode, the workling process is not used. Everything runs in the context of your main application. If you have to restart your workling process to see the changes, then you are not running in synchronous mode.
Thanks!
June 20th, 2008 at 8:21 am
I guess I wasn’t quite clear. In the synchronous mode, I have to restart my mongrels each time I make a change, for the change to take effect (even in development mode).
I don’t have the starling / worklings running at all. Anyway, I quick-fixed it by creating a class Background with an appropriate method_missing and my workers are subclasses of this class. Seems to have the behavior I want for now - though I am sure I am missing something.
Thanks.
June 20th, 2008 at 9:43 am
Interesting. Do you have:
config.cache_classes = false
in your config/environments/development.rb file?
When running in synchronous mode, your worker classes should be exactly like any other classes and should be reloaded automatically. I haven’t had this issue, but I tend to do the majority of my development and test in the console anyway.
Wish I could be more help, but if you have cache_classes = false your worker classes should reload automatically. Do other classes reload for you, or is it only the worker classes?
June 20th, 2008 at 5:09 pm
config.cache_classes is false and rest of my models, controllers don’t require reloads. I even tried putting require_dependency in development.rb for my worker classes. Anyway, my current solution works - I basically have
AsyncWorker < MyCustomBG < Workling::Base
MyCustomBG has method_missing defined inside a class_eval based on the definition of Workling::Remote.dispatcher
It’s a hack - but works for now
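For anyone curious, here is a rough sketch of what such a hack might look like. It is a guess at the idea, not the commenter's actual code: Background, BackgroundConfig and the async_ prefix handling are hypothetical stand-ins, and nothing here touches Workling itself. When no dispatcher is configured, the call simply runs inline in the calling process.

```ruby
# Hypothetical stand-in for a dispatcher setting; this is NOT Workling's API.
module BackgroundConfig
  class << self
    attr_accessor :dispatcher
  end
end

# Hypothetical base class. Subclasses define ordinary instance methods,
# and callers invoke them as Klass.async_<method>(args).
class Background
  def self.method_missing(name, *args)
    target = name.to_s[/\Aasync_(.+)\z/, 1]
    return super unless target && method_defined?(target)

    if BackgroundConfig.dispatcher
      # A dispatcher is configured: hand the call over to it.
      BackgroundConfig.dispatcher.call(self, target, *args)
    else
      # No dispatcher: run the work synchronously, right here.
      new.public_send(target, *args)
    end
  end

  def self.respond_to_missing?(name, include_private = false)
    target = name.to_s[/\Aasync_(.+)\z/, 1]
    (target && method_defined?(target)) || super
  end
end
```

Because the synchronous path is just a normal method call on a normal class, it does not depend on a separate worker process being restarted.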
Thanks a lot!
July 8th, 2008 at 11:03 am
Are there any comprehensive resources on getting starling/workling up and running in multi-stage environments?
I’m still really confused as to what exactly is necessary for starling/workling. Do I need to set up a memcached server or is ‘$sudo gem install starling’ enough?
What capistrano tasks should I have? Should starling be restarted ever? When should workling be restarted? If I ‘$cap deploy’ what happens to the current workling processes?
Thanks so much in advance!
July 8th, 2008 at 11:56 am
I haven’t seen too many (outside of this series).
Regarding your other questions, let me have a go.
The only external gems necessary for workling/starling are starling itself and the memcache-client. The memcached server is not required unless you use it for other things. One of the nice things about starling is that it is built on the memcached interface, and that means it is available from many languages and platforms. Plus, since so many apps use memcached for caching, it’s nice not to have yet another protocol to deal with.
Other than the occasional reboot for other reasons, I have never restarted starling or had it go into any form of bad state. It’s been a rock. Even though I do have cap tasks for start/stop/restart of starling, those tasks are not part of any recipes.
I restart workling whenever I restart my application. See part 3 (http://davedupre.com/2008/04/01/ruby-background-tasks-with-starling-part-3/) for a description of my cap tasks.
Now, there is one minor issue with this scheme that I haven’t got a good fix for yet. If you do stop the workling process, it may kill a worker in the middle of a task. This is due to how daemons is implemented (workling uses daemons). I’ve been in touch with the author to figure out a safe way to kill the process, but we haven’t made much progress. What I want is a way to signal the pollers to exit their loops, but daemons makes that difficult. I suppose you could stop the application, wait for workling to get through everything in the queues, restart workling, then start your app. I haven’t got to the point where this is a real problem yet, but I could see it coming.
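To make the idea concrete, here is a minimal sketch of the kind of poller loop I have in mind. It is not workling's actual poller and it does not solve the daemons problem; GracefulPoller is a hypothetical class, and a plain in-process Queue stands in for a starling queue. The point is only that a stop request flips a flag, and the flag is checked between messages rather than in the middle of one.

```ruby
require "thread"

# Hypothetical poller; a stand-in for illustration, not workling's code.
class GracefulPoller
  def initialize(queue, &handler)
    @queue = queue
    @handler = handler
    @running = true
  end

  # Only flips a flag, so it is cheap enough to call from a signal handler.
  def stop
    @running = false
  end

  def run
    while @running
      message = next_message
      if message.nil?
        sleep 0.05 # queue is empty, wait a little before polling again
        next
      end
      # The handler always runs to completion; the flag is only
      # re-checked once the current message has been processed.
      @handler.call(message)
    end
  end

  private

  # Non-blocking pop: returns nil when the queue is empty.
  def next_message
    @queue.pop(true)
  rescue ThreadError
    nil
  end
end

# Possible wiring (assumes the process actually receives TERM on stop):
#   poller = GracefulPoller.new(queue) { |msg| process(msg) }
#   Signal.trap("TERM") { poller.stop }
#   poller.run
```

Messages that are still queued when the loop exits stay in the queue for the next process to pick up.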
July 13th, 2008 at 1:56 pm
Hi Dave,
Thank you for the response! Sorry I was a bit trigger happy with that post — I should have read the rest of your blog first :).
Anyway, as for the last point there (restarting workling), might it be possible to have each started instance of workling use a unique starling queue (perhaps prefixed by the timestamp of when workling starts up)? That way, you can “signal” the workling process to shut down (set some sort of class variable?), and it will once it has exhausted its own starling queue. Plus, this would mean that for a new deploy, you could start a new workling right away to work with the new code. Just a few thoughts I guess… perhaps you’ve already considered them.
July 14th, 2008 at 11:11 am
I hadn’t thought of using a special queue because I was concerned that if there are many methods (queues) in a worker, it might take a while to get to the stop message. I was working through some form of signaling mechanism so the process isn’t just killed when you stop it. I have some of the code in there already to stop things, but daemon coding is not my area of expertise so I wasn’t able to spend the time necessary to get the signal to work correctly.
If you are up for suggesting a working scheme, please feel free to do so.
July 15th, 2008 at 9:34 pm
Wouldn’t you want a worker to finish off all of its queues before dying for consistency? For instance, if you deploy a new version of a site which changes the interaction between the app and the workers, a new worker picking up on an old queue might break things. Of course this can lead to memory strain when the queues are long and you’re deploying a new app (new workers + old workers finishing off queues), but it seems like the only safe option besides stopping the queues early.
Perhaps I’m missing something fundamental about workling/starling?
July 16th, 2008 at 8:53 am
No, you’re not missing anything. Certainly, if I was changing the workers then I would want them to empty out their queues before updating. In that case, I would do:
1. Stop my application to prevent new messages
2. Run stats on starling to see how many items are left to process
3. When all queues are empty, restart workling
4. Start my application
I suppose to be safest you would want this automated in the workling shutdown process. However, if your queues tend to be long, and you haven’t changed a worker, then all you really need to do is stop all the workling threads in between messages. The ultimate would be to have both options available.
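If you wanted to script the "are the queues empty yet?" check from step 2, something like the sketch below might be a starting point. It works on a plain hash of stats, so where that hash comes from is up to you. The key layout is an assumption on my part: I am assuming per-queue depth shows up under keys like queue_NAME_items, with queue_NAME_total_items and queue_NAME_expired_items as running counters, so check the real stats output before relying on it. The method names are made up for this example.

```ruby
# Returns { queue_name => pending_count } from a flat stats hash.
# ASSUMPTION: depth is reported as "queue_<name>_items", while
# "queue_<name>_total_items" and "queue_<name>_expired_items" are counters.
def pending_by_queue(stats)
  pending = {}
  stats.each do |key, value|
    key = key.to_s
    next if key.end_with?("_total_items", "_expired_items")
    name = key[/\Aqueue_(.+)_items\z/, 1]
    pending[name] = value.to_i if name
  end
  pending
end

# True when no queue has anything left to process.
def drained?(stats)
  pending_by_queue(stats).values.all?(&:zero?)
end
```

A deploy script could poll until drained? returns true and only then restart workling. Note that a queue whose own name ends in _total or _expired would be skipped by this sketch.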
July 22nd, 2008 at 12:09 am
Hi Dave,
We are facing an issue while updating a model from workling. I have a counter column in a table that I increment from the controller and decrement from workling. This counter is used to control the number of worklings to be triggered.
The table in the DB shows the decremented value, but the controller still does not see the change made by the workling. Is this the expected behaviour of workling?
Is there any way for the controller to access the changes made to the model (DB table)?
Thanks in advance Kevin.
July 22nd, 2008 at 7:32 am
This is a pretty common Rails issue. When you have multiple instances of a model (one in the app and one in the workling in your case), and one instance saves, the other will not see the change. You have a couple of options.
1. Make sure to reload the model in your controller. This may work, but timing will be a problem. What if the workling is delayed?
2. Use a transaction and lock the model. Just be careful with locking multiple models, you could end up with a deadlock. If you do modify multiple models, make sure you always access them in the same order.