Anti-regression Training: git bisect (Part 1)

Part 1 of 2

Git is a crazy-powerful and flexible tool for managing your source code. I’m gonna talk about git bisect, one of git’s most powerful and useful tools, and one that I’ve always felt doesn’t get enough attention. It’s a tool for fighting everybody’s [least?] favorite gremlin: software regressions.

Software regressions can cause major headaches and waste a lot of resources. According to the National Institute of Standards & Technology, software regressions can cost 80% of developers’ time. Personally, I feel like the 80% figure is pretty high, but I think everyone can agree that regressions are annoying and take up a lot more time than we as developers would like to spend on them.

Let’s take a look at a scenario I’ve outlined in this picture. I’m sure you can relate to it as a developer, and you’ve probably even had something like this happen to you within the last week!

This scenario revolves around a kernel development team, with a Hard Drive driver developer, a Sr Power Management developer, a Junior Power Management developer, and a subtle bug. (Say some power management code prematurely powers down something the hard drive needs kept on.) Here’s the directed graph of the code development in git:

I’ll narrate the graph a bit. Hard Drive dev branches, does some work, and merges back into mainline. At the same time, PM dev branches, does some work, and asks Jr for assistance with some small subroutine. Junior makes a minor mistake, but PM dev doesn’t notice, as it’s a very subtle bug. He merges Jr’s bad commit into his branch, and then merges his branch into mainline.

A few weeks later, the testing teams come back and say to the hard drive dev, “Hard Drive dev, your hard drive isn’t working!” Hard Drive dev, confident in his code (or perhaps he was on vacation for a few weeks), is baffled. As the point of contact for the Hard Drive team, he is stuck in the limelight, even though he has made no mistakes in his work!

Hard Drive dev could scratch his head, compile some code, and eventually figure out what part of the code isn’t working right any more. This might be time-consuming though, involving lots of compiling, checking, and testing before it’s determined that something outside of the hard drive driver is malfunctioning. Even when it’s determined that the power management is wrong, the power management teams may not really have a good feel anymore for what went wrong; the patch that caused the error has probably already faded from memory.

By using git bisect, Hard Drive Dev can get out of the limelight quickly and give the other teams a starting point for correcting the issue. By doing a quick (and oftentimes automated) git bisect, he can pinpoint in short order that the merge from the Power Team caused the issue. The limelight leaves Hard Drive Dev (phew) and is directed towards the Power Team, chiefly Sr PM Dev. Sr PM Dev is a bit peeved that his merge caused problems in the hard drive, but he uses git bisect in turn to pinpoint that the merge from Jr caused the problem. The hot potato passes to Junior. Since the change was minor (let’s say 20 lines of code), Junior can likely correct the mistake quickly.

So, as a developer, being good at finding regressions saves you in a few ways:

  1. Rapidly narrows the potential field of code that is problematic for the fixer. Provided you keep commits small, you have fewer than 100 suspicious lines of code to look at, instead of thousands.
  2. Takes issues that aren’t your direct area off your plate quickly so you can work on whatever is the most important thing to work on.
  3. Quickly focuses responsibility to the person that can fix the issue the fastest.
  4. Ensures that every bug that you are working to fix is actually a bug that was caused by your mistakes.
  5. Allows you, as a developer, to maintain mental context on your current problem, and not jump to a new bug all the time.
  6. Allows people without any understanding of the code, or even automated systems, to track down regressions.
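
To give a feel for why the narrowing in point 1 happens so fast, here’s a minimal Python sketch of the binary search idea behind bisecting. It is not how git implements it (real git bisect works on the commit graph, merges and all); it assumes a simple straight-line history, and the names find_first_bad, is_bad, and the 1000-commit history are made up for illustration.

```python
def find_first_bad(commits, is_bad):
    """Return (first bad commit, number of commits tested).

    Assumptions for this sketch: commits are ordered oldest to newest,
    the first one is known good, the last one is known bad, and once
    the bug appears it stays in every later commit.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid  # the bug is at mid or earlier
        else:
            lo = mid  # the bug came in after mid
    return commits[hi], tests


# Hypothetical history: 1000 commits, with the bug introduced in commit 613.
history = list(range(1000))
culprit, tests = find_first_bad(history, lambda commit: commit >= 613)
print(f"first bad commit: {culprit}, found after testing {tests} commits")
```

Each test throws away half of the remaining suspects, so even a thousand commits only need around ten build-and-test rounds. The is_bad check is just a yes/no answer, which is why point 6 holds: anyone (or any script) that can tell "works" from "broken" can drive the search.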

Hopefully you’ve been convinced that being able to track down regressions saves you, as a developer, time and unnecessary blame. From the macro perspective (what managers are concerned with), finding regressions quickly can cut down on what is probably the most time-consuming activity developers do.

Explanation on how to use git bisect coming tomorrow!
