Measuring Performance and Cost Value Index

May 9th, 2008

I often ask the question “How can we properly work out pay for performance?” For some teams it sounds pretty simple. Sales, for instance, has some relatively simple metrics. How many encounters? How much did you sell? It’s a little more difficult for Engineering. Here’s an example that I often see:

A great product team gets together and builds an amazing product or service. Product Marketing provides valuable customer feedback. Product Design uses that feedback to design a great product that removes customer pain. Development builds it on time and under budget, and QA ensures a high-quality product. All of these teams work together to build an awesome product, often with long hours and a few miracles along the way. As a result of these efforts, sales go through the roof. Unfortunately, all that means is Sales gets the most benefit (monetarily anyway) from the other teams’ efforts because their commissions go up.

The question is how does this work its way back to the other teams? Should it? Unfortunately, there is no easy answer because there isn’t any way to figure out how much an individual contributed to a product. At first, I thought we could come up with a scheme to reward a team for their contributions, but even that doesn’t always work. In the world of Agile and Scrum, there is a concept of BVI (Business Value Index). I often use BVI to help prioritize stories for completion. In a nutshell, BVI is the percentage contribution a story has to the overall value of the release. So, if a release is projected to bring in $1M in revenue, and a story has a BVI of 0.25, then that story’s Business Value (BV) is $250K. In theory, you could then use the combined BV of a scrum team’s story output to come up with that team’s contribution to the final value. Unfortunately, the reality is not that simple. Often, stories have a BVI of 0 even though other stories can’t happen without them. A story to refactor a class to make it more maintainable has no impact on the customer and, therefore, no BVI. It does impact cost, so maybe we could add a CVI (Cost Value Index) to those stories without BVI.
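
To make the BV arithmetic concrete, here is a minimal Ruby sketch. Only the $1M projection and the 0.25 BVI come from the example above; the story names and the other BVI values are made up for illustration.

```ruby
projected_revenue = 1_000_000 # dollars, from the example above

# Hypothetical stories for one scrum team. Only the 0.25 BVI is from the text.
team_stories = {
  'one-click checkout' => 0.25,
  'saved searches'     => 0.125,
  'refactor importer'  => 0.0 # needed by other stories, but no direct customer value
}

# BV = BVI * projected revenue of the release
business_values = team_stories.transform_values { |bvi| bvi * projected_revenue }

# The team's contribution is the combined BV of its stories
team_contribution = business_values.values.sum

business_values.each { |story, bv| puts format('%-20s $%d', story, bv) }
puts format('%-20s $%d', 'team total', team_contribution)
```

Notice that the refactoring story contributes nothing to the total, which is exactly the gap a cost-based index would have to fill.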

To some extent, CVI can have even more impact long term. A story with a BV of $250K will always add far less than $250K to the company’s bottom line, because revenue is not profit. However, a story with a Cost Value (CV) of $250K goes straight to the company bottom line. Plus, developers spend less time on maintenance and more time developing, and that makes everyone happy.

Cost Value sounds like a great idea, but it is extremely difficult to calculate. In fact, very few organizations pay much attention to it. “We can’t spend 80 hours automating that process. There isn’t time.” The part that people forget is that not having the process automated is costing 10 hours per week. Let’s say one developer takes two weeks (80 hours) to automate it. Then your total cost is 100 hours (you still spend the 10 hours/week on the manual process during those two weeks). For your 100 hours of cost, you saved 480 hours over a year (48 weeks * 10 hours/week). That’s 480 more hours to spend on actual product development.

CV = Net savings per year
CVI = Percentage contribution to the whole
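
Here is the same arithmetic as a small Ruby sketch, measuring CV in hours. The automation figures come from the example above; the second story and its savings are hypothetical, added only so there is a "whole" to take a percentage of.

```ruby
# Figures from the automation example above
hours_to_automate     = 80 # one developer, two weeks
manual_hours_per_week = 10
weeks_to_build        = 2
weeks_saved_per_year  = 48

cost_hours  = hours_to_automate + manual_hours_per_week * weeks_to_build # 80 + 20
saved_hours = manual_hours_per_week * weeks_saved_per_year               # 10 * 48
net_savings = saved_hours - cost_hours                                   # CV for this story

# CVI = this story's share of the total CV across all cost-saving stories.
# The second story is hypothetical.
cost_values = {
  'automate manual process' => net_savings,
  'speed up test suite'     => 120
}
total_cv = cost_values.values.sum
cvi = cost_values.transform_values { |cv| cv.to_f / total_cv }

puts "CV:  #{net_savings} hours/year"
puts "CVI: #{cvi['automate manual process']}"
```

Multiply the hours by a loaded hourly rate if you want CV in dollars so it can sit next to BV.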

I need to think about this some more, but I’m curious what people think. The biggest problem with this is that it breaks one fundamental rule in Scrum. To properly calculate CV, you have to track how much time people are spending not doing their tasks, and we know we should only be tracking how much time is left.

What do you think? I don’t like the idea of tracking time, but I do like the idea of being able to prioritize a story that reduces costs along with a story that generates revenue.

I will leave you with a great quote. I don’t know who said it, but it is so very true.

“Organizations don’t get what they want. They get what they measure.”

Patent Decisions Since 2000 Invalid?

May 6th, 2008

Anyone see the NY Times article In One Flaw, Questions on Validity of 46 Judges? Basically, a law professor discovered a constitutional flaw in the appointment process for judges who decide patent appeals and disputes.  This goes back to 2000.  That means thousands of patent cases and billions of dollars in licenses.  The really interesting part is that no one, including the patent office, is saying he is wrong.

Imagine all those startups thinking they are secure with a patent to cover their IP. This puts one crazy cloud over that thought. Granted, a bunch of recent patents are completely bogus, but now the patent trolls could have a field day. Definitely need to keep an eye on this one.

I wonder if that means the patent I just received last month is not valid anymore…

Kelly Johnson and Agile Development

April 30th, 2008

In a previous post about the F-117, I described some of the rules Kelly Johnson had for his projects. As promised, here is how I would map Kelly’s rules to agile development.

Kelly: 1. The Skunk Works manager must be delegated practically complete control of his program in all aspects. He should report to a division president or higher.

Agile: 1. The Scrum Master must have the power and authority to remove any blocking issues.

Kelly: 2. Strong but small project offices must be provided both by the military and industry.

Agile: 2. Sprint teams must be in the same offices. Virtual teams can work, but it’s best to be in the same area.

Kelly: 3. The number of people having any connection with the project must be restricted in an almost vicious manner. Use a small number of good people (10% to 25% compared to the so-called normal systems).

Agile: 3. Sprint teams must be small (< 7). Although, I like Kelly’s phrasing better.

Kelly: 4. A very simple drawing and drawing release system with great flexibility for making changes must be provided.

Agile: 4. Use Stories instead of detailed design and functional specifications. Document when necessary, but keep it simple.

Kelly: 5. There must be a minimum number of reports required, but important work must be recorded thoroughly.

Agile: 5. Leave the status reports to the waterfall people. We hold daily Scrum Meetings to discuss status. If you’re interested, come to the meeting, but don’t expect a report.

Kelly: 6. There must be a monthly cost review covering not only what has been spent and committed but also projected costs to the conclusion of the program. Don’t have the books ninety days late and don’t surprise the customer with sudden overruns.

Agile: 6. Hold daily Scrum meetings to determine the status of the project. Identify and remove blocking issues before they become a problem.

Kelly: 7. The contractor must be delegated and must assume more than normal responsibility to get good vendor bids for subcontract on the project. Commercial bid procedures are very often better than military ones.

Agile: 7. The Sprint Team must be fully empowered to reach the Sprint Goal.

Kelly: 8. The inspection system as currently used by the Skunk Works, which has been approved by both the Air Force and Navy, meets the intent of existing military requirements and should be used on new projects. Push more basic inspection responsibility back to subcontractors and vendors. Don’t duplicate so much inspection.

Agile: 8. The Sprint Team shall complete all assigned stories, including full testing. Don’t push testing off for a future team.

Kelly: 9. The contractor must be delegated the authority to test his final product in flight. He can and must test it in the initial stages. If he doesn’t, he rapidly loses his competency to design other vehicles.

Agile: 9. The Sprint Team must demonstrate success of the Sprint by showing actual, working product that completes the assigned stories. Testing must be done continuously throughout the Sprint.

Kelly: 10. The specifications applying to the hardware must be agreed to well in advance of contracting. The Skunk Works practice of having a specification section stating clearly which important military specification items will not knowingly be complied with and reasons therefore is highly recommended.

Agile: 10. Stories must be committed to in advance of the Sprint.

Kelly: 11. Funding a program must be timely so that the contractor doesn’t have to keep running to the bank to support government projects.

Agile: 11. Keep the Sprint Team well fed, well caffeinated, and unblocked.

Kelly: 12. There must be mutual trust between the military project organization and the contractor with very close cooperation and liaison on a day-to-day basis. This cuts down misunderstanding and correspondence to an absolute minimum.

Agile: 12. There must be mutual trust between upper management and the Sprint Team. Scrum Meetings are held daily to facilitate communication.

Kelly: 13. Access by outsiders to the project and its personnel must be strictly controlled by appropriate security measures.

Agile: 13. Access by outsiders to the Sprint team and its personnel must be strictly controlled by the Scrum Master. That means no going to a Sprint Team member to add more tasks.

Kelly: 14. Because only a few people will be used in engineering and most other areas, ways must be provided to reward good performance by pay not based on the number of personnel supervised.

Agile: 14. Pay should be based on the success of the team and the individual, not on how many people are supervised.

There you have it. Kelly Johnson and the Skunk Works were one of the first Agile Engineering shops. Very few can claim to be as successful or innovative.

Follow up to high stakes salvage

April 29th, 2008

A Crushing Issue: How to Destroy Brand-New Cars

Last month, I pointed out an article about salvaging a ship in trouble. The salvage crew successfully saved the ship and its cargo of 4,703 Mazda vehicles (losing one life in the process), but now what do you do with the cars? Turns out, it’s not easy to destroy 4,703 cars and not get in trouble, damage your brand, or get sued in the process. It took Mazda a year to plan and about another year to actually do it. Who knew?

POPSignal - Boston, May 15th

April 24th, 2008

POPSignal

Those of you in Boston may remember Tech Cocktail last fall. If you were one of the lucky 300+ people in attendance, you know how successful it was. I know I enjoyed it, and I met some great new contacts there. Well, Brian Balfour and Jay Meattle are at it again, but now it is POPSignal.

POPSignal parties are aimed at bringing together the local tech community in a fun and informal environment. There is no format, presentations, or speeches. However, there is always a free open bar, free food, music, fun activities from sponsors, and great conversation.

The date is May 15th, 6:30-9:30pm, at Tequila Rain near Fenway Park. To RSVP and get more details check out http://popsignal.eventbrite.com/.

Come on by and say hello.

Continuing the Lockheed Skunk Works Theme

April 22nd, 2008

The History Behind the F-35B

Continuing the theme of the Lockheed Skunk Works, this is an interesting take on the development of the shaft-driven fan on the F-35B. The inventor discusses how he came up with the idea and how the system works. Like any good research project, there was a risk of failure, but in the 9th month of a 9 month study Dr. Paul Bevilaqua had an idea that eventually turned into the F-35’s engine. I hope he got a bonus considering the F-35 contract is worth $200 Billion (that’s a ‘B’).

One comment from Dr. Bevilaqua worth mentioning is how it is more important to do engineering than to do math. In other words, think! As Dr. Bevilaqua says, school may have taught him how to move the pieces, but his mentor (Dr. Hans von Ohain, co-inventor of the jet engine) taught him how to play chess. In my travels, I see far too many “engineers” that don’t think. If someone hasn’t done it before, they are stuck.

We all should take a lesson from this. Good software engineering is more than re-hashing what someone else did. Sometimes it’s the wacky ideas that end up being the best. Don’t be afraid to reach sometimes and see what might work. Don’t re-invent the wheel, but if there is no wheel then figure out a way to make one.

Ruby Background Tasks with Starling - Part 3

April 1st, 2008

In my previous post, I went over the changes I made to Workling to add threading and allow it to run for long periods. Now, we need to deploy and monitor everything. In the Linux world, there are several options, and monit seems to be the most popular. However, I wanted to give god.rb a shot. God.rb is basically a clone of monit written in Ruby. What it’s written in doesn’t really matter to me, but setting up my config files in Ruby was interesting. That’s one more set of script commands I don’t have to know.

Installing god.rb is as simple as:

sudo gem install god

The god.rb web site has some decent documentation to help you understand how to build a config file. I build as many of my config files as I can from erb templates rather than manually updating them for each environment. I build my apache config file this way as well.

1. Download my god config to lib/templates/god.conf.erb in your Rails project (the deploy task in the next step reads the template from there). This is an example of an actual god.conf file. You will want to update its paths and set up the mongrels and the listeners so they do not run as root, which is not a good idea. My apache and mysql scripts already run as a different user.

2. Now, add the following to your config/deploy.rb file:


require 'erb'

namespace :god do

  desc <<-DESC
  Generate the GOD configuration. We will create the appropriate GOD
  configuration stanza for the application and copy it to:

  /shared/path/god.conf

  Once it's there, somebody with the required permission to manage the
  GOD configuration should somehow incorporate that file and restart
  the GOD server. Each time the runtime configuration (ie number of
  mongrels running) changes, the configuration will have to be manually
  updated to match.
  DESC
  task :config, :roles => :app do
    rails_env = fetch(:rails_env, 'production')
    dispatcher_starting_port = fetch(:dispatcher_starting_port, 8000)
    dispatcher_instances = fetch(:dispatcher_instances, 3)
    dispatcher_ending_port = dispatcher_starting_port + dispatcher_instances - 1
    dispatcher_ports = dispatcher_starting_port..dispatcher_ending_port
    god_conf_path = "#{shared_path}/god.conf"
    god_conf_template = File.join(File.dirname(__FILE__), '..', 'lib', 'templates', 'god.conf.erb')
    begin
      god_conf = ERB.new(File.read(god_conf_template)).result(binding)
      put god_conf, god_conf_path, :mode => 0644
    rescue Exception => e
      abort "An error occurred in the GOD config generation: #{e}"
    end
  end
end

3. Execute cap god:config with the desired environment, and a new god.conf file will be copied to the shared directory on your server.  I’m assuming you have capistrano 2.x here.

4. On your server, execute: sudo god -c /shared/path/god.conf
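
If the template step in the config task is unclear, here is a minimal, self-contained sketch of what ERB.new(...).result(binding) does with the port range. The inline template is purely illustrative and is not my actual god.conf.erb; the watch names, pid path, and mongrel_rails command line are assumptions you would replace with your own. A real config also needs restart conditions, so check the god.rb docs for those.

```ruby
require 'erb'

# Stand-ins for the values the Capistrano task fetches
rails_env                = 'production'
dispatcher_starting_port = 8000
dispatcher_instances     = 3
dispatcher_ending_port   = dispatcher_starting_port + dispatcher_instances - 1
dispatcher_ports         = dispatcher_starting_port..dispatcher_ending_port

# Illustrative template only: one watch per mongrel port, all in one group
template = <<-'TEMPLATE'
<% dispatcher_ports.each do |port| %>
God.watch do |w|
  w.name     = "mongrel-<%= port %>"
  w.group    = "mongrels"
  w.pid_file = "/shared/path/pids/mongrel.<%= port %>.pid"
  w.start    = "mongrel_rails start -e <%= rails_env %> -p <%= port %> -P /shared/path/pids/mongrel.<%= port %>.pid -d"
end
<% end %>
TEMPLATE

# binding exposes the local variables above to the template
god_conf = ERB.new(template).result(binding)
puts god_conf
```

Because the template sees local variables through binding, changing dispatcher_instances is all it takes to add or remove watches the next time you generate the config.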

That’s it! Now, all your processes will be monitored by god.rb and restarted whenever there is a problem. Hold on though, we’re not completely done. You want god.rb to start up when your server boots, right? We also have to update our deploy process to use god.rb to start/stop our services now. Try to use your existing spin or spawn tasks, and god.rb will fight with you.

First, let’s get god.rb setup as a service. I found this script in the god.rb sources, and I tweaked it a bit for my system.


#!/bin/bash
# god       startup script for god (like monit, only Ruby)
# Author: Dave Dupre

# Comments to support chkconfig on CentOS
# chkconfig: - 85 15
# description: god - monitor all my processes

CONF_DIR=/shared/path
LOG_DIR=/var/log
PID_DIR=/var/run/god
BIN_DIR=/usr/local/bin

RETVAL=0

# Go no further if config directory is missing.
[ -d "$CONF_DIR" ] || exit 0

case "$1" in
  start)
    $BIN_DIR/god -P $PID_DIR/god.pid -l $LOG_DIR/god.log -c $CONF_DIR/god.conf
    RETVAL=$?
  ;;
  stop)
    $BIN_DIR/god terminate
    RETVAL=$?
  ;;
  restart)
    $BIN_DIR/god terminate
    $BIN_DIR/god -P $PID_DIR/god.pid -l $LOG_DIR/god.log -c $CONF_DIR/god.conf
    RETVAL=$?
  ;;
  status)
    $BIN_DIR/god status
    RETVAL=$?
  ;;
  *)
  echo "Usage: god {start|stop|restart|status}"
  exit 1
  ;;
esac

exit $RETVAL

Put the above contents into /etc/init.d/god and make it executable (sudo chmod +x /etc/init.d/god). Lastly, tell your system there is a new service in town. I use CentOS 5, so I run:

sudo chkconfig --add god
sudo chkconfig god on

Now, god.rb will start up whenever your server boots, and you can start/stop/restart it using standard service calls.

The last thing we need to do is update our deployment process to use god.rb to stop/start our processes. Add this to your deploy.rb file:


namespace :deploy do
  [ :stop, :start, :restart ].each do |t|
    desc "#{t.to_s.capitalize} mongrels using god"
    task t, :roles => :app do
      sudo "god #{t.to_s} listeners"
      sudo "god #{t.to_s} mongrels"
    end
  end
end

namespace :starling do
  [ :stop, :start, :restart ].each do |t|
    desc "#{t.to_s.capitalize} starling using god"
    task t, :roles => :app do
      sudo "god #{t.to_s} starlings"
    end
  end
end

namespace :workling do
  [ :stop, :start, :restart ].each do |t|
    desc "#{t.to_s.capitalize} workling using god"
    task t, :roles => :app do
      sudo "god #{t.to_s} listeners"
    end
  end
end

OK! Now, we’re done. You deploy as you normally would, and you have full control over Starling and the Workling listener. Notice that there is no mongrel cluster either. God.rb started up all the instances of mongrel I needed, and it will monitor everything so there is no need for mongrel cluster anymore.

That’s it for now. You have a system that will process all your background tasks and stay running. The only thing I didn’t set up here is notifications from god.rb when there is a problem. The god.rb config settings have lots of schemes for email notifications. Take a peek at the docs and make sure god can talk to you.

Ruby Background Tasks with Starling - Part 2

March 29th, 2008

In my previous post, I briefly described how simple it is to add background processing to your RoR app with Starling and the Workling plugin. Now, let’s discuss the changes I made to the Workling plugin, and then get everything deployed and monitored. As good as the Workling plugin is, it had one limitation that hurt me. At Inquisix, I have 5 workers to handle importing contacts, stats/logging, emails, searching, etc. Most methods are pretty quick, but a large import process could take 5-10 minutes to complete. With the original Workling, that meant all my other background tasks would wait, and I didn’t want that. The easiest solution for me was to modify Workling so each worker polled its queues in a thread. That way, if the contact worker was busy importing 1000 contacts, my emails still got delivered in a reasonable amount of time.

Most of the changes to Workling are limited to the lib/workling/starling/poller.rb file. Here is a summary of the changes I made:

  • Added threading
  • Updated the way ClassAndMethodRouting builds the routing hash. I needed a routing hash per worker. Note, whatever you do don’t ever try to call routing.build from inside another thread. Really strange things happen that were very difficult to track down. Ruby threads are nice, but I’m used to real threads.
  • Added handling for MemCache exceptions. During development, I went to dinner and left the listener running without Starling. When I got back, my Mac was complaining the disk was full because my log was > 40Gig. Now, I explicitly catch MemCache exceptions and wait 30 seconds before trying again. The log will still grow, but at least not as fast. Besides, I will show you how to make sure Starling stays running.
  • Keep your database connection alive. I found this out after leaving for the night. I thought I had a working system when I left, but nothing worked in the morning. Basically, ActiveRecord will drop your connection if you don’t do anything for a while. Normally, this is not a problem in web apps because the connection is updated every time you access a page, but it doesn’t work that way here. You have to do it yourself. Add this in your loop to be sure:
    
    
    unless ActiveRecord::Base.connection.active?
      unless ActiveRecord::Base.connection.reconnect!
        RAILS_DEFAULT_LOGGER.fatal("FAILED - Database not available")
        break
      end
    end
    
    

Here is my workling patch. Now, I need to go back and update the tests, but the threads complicate that. It’s a learning experience…
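
If you just want the shape of the threading and retry changes from the list above, here is a rough, self-contained sketch. It is not the patch itself: FakeQueueClient is a hypothetical in-memory stand-in for the Starling connection, IOError stands in for the MemCache exception, and the retry delay is shortened from the 30 seconds I actually use so the demo finishes quickly.

```ruby
require 'thread'

# Hypothetical stand-in for a Starling queue client. It can be told to fail
# a number of times first, to simulate Starling not running.
class FakeQueueClient
  def initialize(messages, failures = 0)
    @messages = messages.dup
    @failures = failures
    @lock     = Mutex.new
  end

  # Returns the next message, or nil when the queue is empty.
  def get(_queue_name)
    @lock.synchronize do
      if @failures > 0
        @failures -= 1
        raise IOError, 'queue server not reachable'
      end
      @messages.shift
    end
  end
end

# One polling thread per worker, so a slow job in one worker
# does not hold up the queues of the others.
def start_poller(name, client, retry_delay, results)
  Thread.new do
    loop do
      begin
        message = client.get(name)
        break if message.nil? # demo only: a real poller would sleep and keep polling
        results << [name, message]
      rescue IOError => e
        # Wait before retrying instead of spinning and flooding the log
        warn "#{name}: #{e.message}, retrying in #{retry_delay}s"
        sleep retry_delay
      end
    end
  end
end

results = Queue.new
clients = {
  'contacts' => FakeQueueClient.new([:import_1, :import_2], 1),
  'emails'   => FakeQueueClient.new([:welcome])
}
threads = clients.map { |name, client| start_poller(name, client, 0.05, results) }
threads.each(&:join)
puts "processed #{results.size} messages"
```

The contacts poller hits one simulated outage and recovers, while the emails poller keeps going on its own thread the whole time.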

That’s it for now. Next time, I will walk through the deployment and monitoring process as I have it.

Ruby Background Tasks with Starling

March 25th, 2008

At Inquisix, we help sales professionals exchange trusted referrals. To do that requires several background tasks, some that could take 10-15 minutes to process. Obviously, I can’t make a client wait for that, so I needed a system that could handle background tasks. At first, I started with backgroundrb, and it worked just fine. Backgroundrb was in production for two months while Inquisix grew. However, there were a few things about backgroundrb that bothered me:

  • It uses a lot of memory. Every worker creates at least one process. Plus, there is a master process to watch everything and deal with communication. It doesn’t take much before you end up with 5-6 processes. I had to upgrade my test server just to deal with the extra memory requirements.
  • It’s not easy to build a queue with control over threads without creating a ton of processes.
  • Too many times, I wanted to do something pretty straightforward, but I had to dig through the backgroundrb code to figure out the backgroundrb way. For example, don’t ever call sleep in a backgroundrb thread pool. You need to call next_turn instead.

After a while, I decided to look for a simpler way that would scale better without using so much memory. I decided a more traditional queue system would work better for me. At a former company, I built up an enterprise system based on queues that processes millions of transactions across dozens of servers. Something based on queues would work for me, but I did not want to take on the complexity of JMS, ActiveMQ (or some other queue), ActiveMessaging, etc. As usual with Ruby projects, I looked around on the web. Within a few minutes, I came across Starling and Sparrow. Both are Ruby queue systems that use the memcached interface. That means I can use the memcache-client gem that I already use. Starling was developed at Twitter for background processing, so I figure it’s got some testing behind it. Sparrow is newer, but basically the same. However, there isn’t much experience with Sparrow, so I settled on Starling as my queue server.

To install Starling:


sudo gem install starling
sudo gem install memcache-client

Now, I needed a way to use my new queue server for background tasks. Again, a few minutes of looking, and I found Workling. It didn’t have everything I wanted, but it was nice and simple and covered almost all of it. I use Piston for all my plugins, so here is how to install with that:


piston import http://svn.playtype.net/plugins/workling/ vendor/plugins/workling
svn commit -m "added workling"

Make sure you commit now because we will be making some changes to Workling later. Piston will get confused and toss your changes if you don’t commit first.

Client Code

First, create a worker in app/workers/my_worker.rb


class MyWorker < Workling::Base
  def do_something_big(options = {})
    SomeModel.do_something_big(options[:some_arg])
  end
end

Anything in app/workers that inherits from Workling::Base will get picked up automatically as a worker. Workers are basically listeners on a Starling queue. By default, Workling defines queues based on class and method. There will be a queue for every method in every class that inherits from Workling::Base.

Now, you can call your worker asynchronously anywhere like so:

MyWorker.asynch_do_something_big(:some_arg => 5)

Starling Runner

To use the Workling’s Starling runner, you need to setup your environment like so:

Workling::Remote.dispatcher = Workling::Remote::Runners::StarlingRunner.new

I add this line to all my environment files (development.rb, etc.). Workling is nice in that if you comment out the above line, all the MyWorker.asynch_* calls will become synchronous calls — nice for debugging!
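
To show the idea behind the asynch_* calls and the synchronous fallback, here is a toy sketch. It is not Workling’s code: TinyWorker, its queue accessor, and the queue naming are all hypothetical, and a plain array stands in for Starling.

```ruby
# Toy base class: asynch_<method> either enqueues a message or, when no
# queue is configured, runs the method inline.
class TinyWorker
  class << self
    attr_accessor :queue # nil means "no dispatcher configured"

    def method_missing(method_name, *args)
      target = method_name.to_s[/\Aasynch_(.+)\z/, 1]
      return super unless target && method_defined?(target)

      options = args.first || {}
      if queue
        # One queue name per class and method
        queue << ["#{name}:#{target}", options]
      else
        new.send(target, options)
      end
    end

    def respond_to_missing?(method_name, include_private = false)
      method_name.to_s.start_with?('asynch_') || super
    end
  end
end

class MyWorker < TinyWorker
  def do_something_big(options = {})
    options[:some_arg] * 2
  end
end

# No queue configured: the call runs synchronously and returns the result
inline_result = MyWorker.asynch_do_something_big(:some_arg => 5)
puts inline_result

# Queue configured: the call only records a message for a poller to pick up
MyWorker.queue = []
MyWorker.asynch_do_something_big(:some_arg => 5)
puts MyWorker.queue.length
```

The calling code is identical in both modes, which is what makes the synchronous fallback so handy when debugging.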

The Starling runner takes care of several things:

  1. Mapping of queue names to worker code. This is done with Workling::ClassAndMethodRouting, but you can change the queue routing pretty easily.
  2. There’s a client daemon that waits for messages and dispatches these to the responsible workers. If you intend to run this on a remote machine, then just check out your rails project there and start up the Starling client.

Now, fire up Starling, your app, and the workling runner, and you are processing background tasks. Don’t forget to edit config/starling.yml first to tell Workling where Starling is running.


sudo starling -d
script/server
script/workling_starling_client start

What I ended up with was much better for what I was doing. This combination processed my background tasks faster and more reliably. It is much easier to add new workers and call them. Finally, it uses a whole lot less memory, so my end user application performs better. Basically, it wins on all fronts for me.

Next time, I will share the changes I made to Workling to support threads and provide the necessary configuration to ensure that everything stays running in production.

F-117 stealth fighters to make final flight and Agile Development

March 12th, 2008

F-117 stealth fighters to make final flight no one will know about - Engadget

It’s the final flight of the F-117. It’s not exactly the prettiest aircraft in the world, but it sure broke new ground. At first, I was surprised, but I didn’t realize there were so few of them left. Then it makes sense. Although, being replaced by the F-22 is an expensive proposition.

If you are interested in how engineering works when you have very smart people and not much oversight, check out Skunk Works. It still boggles my mind that they were able to build the SR-71 from scratch in two years, especially when you think about the fact that almost nothing on that aircraft had ever been done before. They had to invent new tires, new fuel, new oil, engines that could cruise on afterburner, titanium fabrication techniques, and the list goes on. My favorite story involves paying vendors with suitcases full of cash so no one could follow the money (these were very secret projects). Crazy stuff…

I also look back and see the parallels with Agile Software development. The principles are the same. You take a small team of smart people, empower them to solve a very difficult problem, and you can do a lot in a very short period of time. Kelly Johnson and his team had the luxury of super-secret projects that afforded them the freedom to do whatever is necessary to build their product. The same can be said for Agile Development teams. Give the team a goal and let them figure out the best way to reach that goal. With strong leadership, I’ll bet that small team will destroy a large team’s productivity any day of the week and twice on Sunday. Kelly Johnson said it best:

“Be Quick, Be Quiet, And Be On Time”

I might just have to add that to my own “Go big, or stay home.”

Kelly’s rules of management aren’t too bad either:

  1. The Skunk Works manager must be delegated practically complete control of his program in all aspects. He should report to a division president or higher.
  2. Strong but small project offices must be provided both by the military and industry.
  3. The number of people having any connection with the project must be restricted in an almost vicious manner. Use a small number of good people (10% to 25% compared to the so-called normal systems).
  4. A very simple drawing and drawing release system with great flexibility for making changes must be provided.
  5. There must be a minimum number of reports required, but important work must be recorded thoroughly.
  6. There must be a monthly cost review covering not only what has been spent and committed but also projected costs to the conclusion of the program. Don’t have the books ninety days late and don’t surprise the customer with sudden overruns.
  7. The contractor must be delegated and must assume more than normal responsibility to get good vendor bids for subcontract on the project. Commercial bid procedures are very often better than military ones.
  8. The inspection system as currently used by the Skunk Works, which has been approved by both the Air Force and Navy, meets the intent of existing military requirements and should be used on new projects. Push more basic inspection responsibility back to subcontractors and vendors. Don’t duplicate so much inspection.
  9. The contractor must be delegated the authority to test his final product in flight. He can and must test it in the initial stages. If he doesn’t, he rapidly loses his competency to design other vehicles.
  10. The specifications applying to the hardware must be agreed to well in advance of contracting. The Skunk Works practice of having a specification section stating clearly which important military specification items will not knowingly be complied with and reasons therefore is highly recommended.
  11. Funding a program must be timely so that the contractor doesn’t have to keep running to the bank to support government projects.
  12. There must be mutual trust between the military project organization and the contractor with very close cooperation and liaison on a day-to-day basis. This cuts down misunderstanding and correspondence to an absolute minimum.
  13. Access by outsiders to the project and its personnel must be strictly controlled by appropriate security measures.
  14. Because only a few people will be used in engineering and most other areas, ways must be provided to reward good performance by pay not based on the number of personnel supervised.

In a follow-up post, I will map these rules to an Agile Software project.