
Fully automated bisecting with "git bisect run"


February 3, 2009

This article was contributed by Christian Couder

It's a common developer practice to track down a bug by looking for the change that introduced it. This is most efficiently done by performing a binary search, in the commit history, between the last known working commit and a known broken commit. git bisect is a feature of the Git version control system that helps developers do just that.

git bisect may also be familiar to LWN readers from heated discussions on the Linux kernel mailing list about "asking" (or "forcing", depending on the point of view) users to find the bad commit when they report a regression. But a little-known addition, git bisect run, allows a developer to completely automate the process. This can be very useful and may enable interesting new debugging workflows.

At each step of the binary search, git bisect checks out the source code at the commit chosen by the search. The user then tests whether the software works. If it does, the user runs git bisect good; otherwise they run git bisect bad, and the search proceeds accordingly. git bisect run changes this: instead of relying on the user, it uses a script or a shell command to decide whether the source code that git bisect automatically checked out is "good" or "bad".
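
A manual session therefore looks something like the following sketch (the revision names are placeholders):

    git bisect start
    git bisect bad                 # the currently checked-out version is broken
    git bisect good v1.0           # v1.0 (a placeholder) is known to work
    # build and test the commit that git checked out, then tell git the result:
    git bisect good                # or "git bisect bad"
    # ...repeat until git names the first bad commit, then clean up:
    git bisect reset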

The idea for git bisect run was suggested by Bill Lear in March 2007, and I implemented it shortly thereafter. It was then released in Git 1.5.1.

Technically, the script or command passed to git bisect run is run at each step of the bisection process, and its exit code is interpreted as "good" if it is 0 and as "bad" otherwise (with the exception of 125 and values greater than 127; see the git bisect documentation for more information).
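
In other words, any command whose exit status reflects the test result can be passed directly; here run_tests.sh is a hypothetical test runner that exits non-zero on failure:

    # exit 0 -> "good"; 1-127 (except 125) -> "bad";
    # exit 125 -> "skip"; above 127 -> abort the bisection
    git bisect run ./run_tests.sh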

One simple yet useful way to take advantage of that is to use git bisect run to find which commit broke the build. Some kernel developers like this very much. Ingo Molnar wrote:

for example git-bisect was godsent. I remember that years ago bisection of a bug was a very [laborious] task so that it was only used as a final, last-ditch approach for really nasty bugs. Today we can [autonomously] bisect build bugs via a simple shell command around "git-bisect run", without any human interaction!

For example, with a not too old Git (version 1.5.2 or greater), bisecting a build bug in the Linux kernel may be just a matter of launching:

    git bisect start linux-next/master v2.6.26-rc8
    git bisect run make kernel/fork.o

because the git bisect start command, when passed two (or more) revisions (here "linux-next/master" and "v2.6.26-rc8"), interprets the first one as "bad" and the others as "good".

This works as follows: git bisect checks out the source code of a commit to be tested, then runs make kernel/fork.o. make exits with code 0 if the build succeeds, or with something else (usually 2) otherwise. That result gets recorded as "good" or "bad" for the checked-out commit, the binary search picks another commit to check out, make runs again, and so on, until the first "bad" commit in the history is found.

Bisecting regressions that manifest themselves in the running code, as opposed to build problems, is usually more complicated: you will probably have to write a test script to pass to git bisect run.

For example, a test script for an application built with make and printing on its standard output might look like this:

    #!/bin/sh

    make || exit 125   # an exit code of 125 asks "git bisect"
                       # to "skip" the current commit

    # run the application and check that it produces good output
    ./my_app arg1 arg2 | grep 'my good output'
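
Assuming such a script is saved as, say, ~/test.sh (a placeholder path, kept outside the work tree so that checkouts do not disturb it), it could be used like this:

    git bisect start HEAD v1.0     # HEAD is broken, v1.0 (a placeholder) is known good
    git bisect run sh ~/test.sh
    git bisect reset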

See this message from Junio Hamano, the Git maintainer, for explanations and a real world example of git bisect run used to find a regression in Git. The git bisect documentation has some short examples too.

It's even trickier for kernel hackers, because the machine has to be rebooted each time a new kernel is tested, but some of them suggest using git bisect run anyway if the problem is "reproducible, scriptable, and you have a second box". Ingo Molnar describes his bisection environment this way:

i have a fully automated bootup-hang bisection script. It is based on "git-bisect run". I run the script, it builds and boots kernels fully automatically, and when the bootup fails (the script notices that via the serial log, which it continuously watches - or via a timeout, if the system does not come up within 10 minutes it's a "bad" kernel), the script raises my attention via a beep and i power cycle the test box. (yeah, i should make use of a managed power outlet to 100% automate it)
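
The exact setup is obviously site-specific, but a stripped-down sketch of such a per-step script might look like the following; boot_test_box.sh, the serial-log path, and the 'login:' marker are all assumptions standing in for whatever the local lab provides:

    #!/bin/sh
    # Hypothetical "git bisect run" script: build the kernel, boot it on a
    # test box, and watch the serial log to decide "good", "bad", or "skip".
    make -j8 || exit 125              # unbuildable commit: ask git to skip it

    : > /tmp/serial-testbox.log       # start from an empty log for this boot
    ./boot_test_box.sh &              # hypothetical helper: install the kernel, reboot
                                      # the box, copy its serial console to the log file

    # Give the box up to 10 minutes to come up.
    for i in $(seq 1 60); do
        if grep -q 'login:' /tmp/serial-testbox.log; then
            exit 0                    # it booted: "good"
        fi
        sleep 10
    done

    printf '\a'                       # beep: the human needs to power-cycle the box
    exit 1                            # timed out: "bad"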

So it's possible to use git bisect run in a wide array of situations. For example, your nightly builds could automatically find the commit that broke the build or the test suite, and then use information from that commit to send a warning (or flame) email to the developer responsible.
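
The last step could be as simple as the following sketch; refs/bisect/bad is the ref where git bisect records the bad commit, while the mail command, the subject line, and the idea of mailing the author directly are assumptions about the local setup:

    # hypothetical tail end of a nightly-build script, run after
    # "git bisect run ..." has finished and identified the culprit
    bad=$(git rev-parse refs/bisect/bad)
    author=$(git log -1 --pretty=format:'%ae' "$bad")
    git show --stat "$bad" | mail -s "Nightly build bisected a regression to your commit" "$author"
    git bisect reset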

But what may be more interesting is that fully automated bisection may enable new workflows. On the git mailing list, Andreas Ericsson, a Git developer, reported:

To me, I'd happily use any scm in the world, so long as it has git-bisect. Otoh, I'm a lazy bastard and love bisect so much that all our automated tests are focused around "git bisect run". This means bugs in software released to customers are few and far apart. When we get one reported, we just create a new test that exposes it, fire up git-bisect and then go to lunch. Quality costs, however. We pay that bill by using a workflow that's perhaps more convoluted than necessary.

So their workflow requires a little more work up front to make sure that every commit is small and easily bisectable. Then, to debug regressions, they follow these steps (sketched in commands after the list):

  • write, in the test suite, a test script that exposes the regression
  • use git bisect run to find the commit that introduced it
  • fix the bug that is often made obvious by the previous step
  • commit both the fix and the test script (and if needed more tests)
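
In command form, that might look roughly like the following sketch; the test script, tag, and file names are placeholders:

    $EDITOR t/regression-foo.sh                # 1. write a test that exposes the bug
    git bisect start HEAD v1.0                 # 2. HEAD is bad, the last release was good
    git bisect run sh t/regression-foo.sh      #    let git find the guilty commit
    git bisect reset
    $EDITOR src/foo.c                          # 3. fix the (now obvious) bug
    git add src/foo.c t/regression-foo.sh      # 4. commit the fix together with the test
    git commit -m "Fix foo regression; add regression test"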

This may seem more complicated than a traditional workflow. But when asked about it, Andreas says:

I guess the real benefit is that "git bisect" makes the tests so immensely valuable, and so easy to write, that we do it gladly and quickly. The value comes *now* from almost all test-cases instead of in some far-distant and obscure future.

So this kind of workflow gets immediate value out of the test cases you write. But what about overall productivity? Four months after saying that he uses git bisect run, Andreas Ericsson wrote that git bisect "is well-nigh single-handedly responsible for reducing our average bugreport-to-fix time from 4 days to 6 hours".

Now, after more than one year of using it, he gives the following details:

To give some hard figures, we used to have an average report-to-fix cycle of 142.6 hours (according to our somewhat weird bug-tracker which just measures wall-clock time). Since we moved to git, we've lowered that to 16.2 hours. Primarily because we can stay on top of the bugfixing now, and because everyone's jockeying to get to fix bugs (we're quite proud of how lazy we are to let git find the bugs for us). Each new release results in ~40% fewer bugs (almost certainly due to how we now feel about writing tests). That's a huge boost in code quality and productivity, and it earned me and my co-workers a rather nice bonus last year :)

So quality costs, but, when using the right tools and workflows, it can bring in a rather nice return on investment!




Fully automated bisecting with "git bisect run"

Posted Feb 5, 2009 2:13 UTC (Thu) by ncm (guest, #165) [Link]

Perhaps I am showing my ignorance here, but doesn't this require that buggy intermediate versions of the code be scrubbed out of the repository, so that the bisection may choose any random intermediate version? Or must known-toxic versions be labeled to deflect bisect from choosing them?

Fully automated bisecting with "git bisect run"

Posted Feb 5, 2009 3:48 UTC (Thu) by dtlin (subscriber, #36537) [Link]

The article did mention that exit code 125 signaled "skip this commit" -- if you have your git bisect run script return that for known-toxic versions, that would do the trick. Or so it seems, from reading the article; I've never tried it myself.

Fully automated bisecting with "git bisect run"

Posted Feb 5, 2009 5:58 UTC (Thu) by christian_couder (subscriber, #56350) [Link]

Yes, this is the idea.

Exit code 125 tells "git bisect" to use "git bisect skip". "git bisect skip" marks the current commit as untestable and checks out another one nearby to be tested.

See the "git bisect" documentation for more information:

http://www.kernel.org/pub/software/scm/git/docs/git-bisec...

Fully automated bisecting with "git bisect run"

Posted Feb 5, 2009 7:44 UTC (Thu) by dlang (guest, #313) [Link]

note that to cause problems the buggy intermediate versions need to be buggy in a way that your test script can't distinguish from the failure you are looking for.

how big a problem this is depends on what your failure condition is.

if you know that when it fails it generates message X then you just look for message X and mark everything else as 'good' (it may actually crash and not do anything useful, but it's not the bug you are looking for)

if you are looking for a hang (or failure to boot like Ingo did in one example) then it's harder; you may end up going down the wrong path because some other bug is causing the problem (failing to boot in this example)

Bisect on patchset boundaries ?

Posted Feb 6, 2009 10:33 UTC (Fri) by lbt (subscriber, #29672) [Link]

The "buggy intermediate versions" problem is one that I would have expected more attention to be drawn to. I've done a few bisects and had my filesystem fail to mount on several cuts during one run.

A bisect that throws you into the middle of a patch set that messes with your filesystem is a dangerous place to be; especially when you ask 'normal' users to run a bisect on their everyday machines without warning them that they are potentially about to expose their data to random collections of code.

The problem is that all patches are supposed to be non-toxic - and git is deliberately not good at revisionist history ;)

That means that marking 'safe' bisect points is hard - but maybe a step in the right direction would be to cut on patchset boundaries?

Or maybe an external (well, inside .git/) list of 'good' commits, e.g. rc releases?

Obviously this would make the bisect slightly less efficient but it may reduce the risk.

Other similar tools

Posted Feb 5, 2009 10:30 UTC (Thu) by epa (subscriber, #39769) [Link]

'git bisect' is a kind of delta debugging <http://www.st.cs.uni-saarland.de/dd/>: you have a broken version of the code and a known working version, and you find the change between them that introduces the bug. It's constrained to the version history in the repository, so it will find the guilty commit.

An alternative is to randomly generate in-between files to find what difference causes the change. I believe DD.py <http://www.st.cs.uni-saarland.de/dd/ddusage.php3> is a tool for doing this.

Finally, a plug for delta <http://delta.tigris.org/> which isn't quite the same thing, but will automatically generate a minimal test case given a larger one.

Fully automated bisecting with "git bisect run"

Posted Feb 7, 2009 18:19 UTC (Sat) by oak (guest, #2786) [Link]

In general, how many of the kernel bugs need to be tested on real HW and how many could be found in a virtual machine?

Fully automated bisecting with "git bisect run"

Posted Feb 9, 2009 18:19 UTC (Mon) by droundy (subscriber, #4559) [Link]

I'd imagine that very few of the "hard" bugs could be found on a virtual machine, almost by definition. I'd think that a "hard" bug would be one that can't easily be reproduced by a kernel developer.

Fully automated bisecting with "git bisect run"

Posted Oct 3, 2018 1:37 UTC (Wed) by samiam95124 (guest, #120873) [Link]

From my experience, the answer is "none of them and all of them". Let me explain.

What VMs (and simulations) allow is the finding of "some bugs" related to hardware. The reason is simple. Hardware generates timing bugs. VMs and simulations do find some of those issues, but they also tend to model hardware in an overly simplified way that will not trip many of the most serious timing bugs. What they bring to the table, however, is important in another way, which is they are generally repeatable (this is not always true by the way). Just like a pseudo-random sequence, they can deliver the same bug over and over again. I think a lot of us have enough experience chasing hardware timing bugs to realize how valuable that is.

Thus VM/Sim has a place, and that is usually first, before you enter the "wild and woolly world" of hardware. The nice part is that this typically matches the hardware design flow, because sim is typically done before the hardware is available, especially in the silicon/IC design world.

Fully automated bisecting with "git bisect run"

Posted Feb 8, 2009 16:06 UTC (Sun) by wfranzini (subscriber, #6946) [Link]

I only noticed from reading the article that 'git bisect' was unable to run the command being investigated.

People interested in a workflow that includes tests can look at Aegis (http://aegis.sf.net/). It also has an aebisect(1) command that runs the command being investigated.

Fully automated bisecting with "git bisect run"

Posted Feb 10, 2009 4:41 UTC (Tue) by christian_couder (subscriber, #56350) [Link]

The long usage message is:

    $ git bisect help
    Usage: git bisect [help|start|bad|good|skip|next|reset|visualize|replay|log|run]

    git bisect help
            print this long help message.
    git bisect start [<bad> [<good>...]] [--] [<pathspec>...]
            reset bisect state and start bisection.
    git bisect bad [<rev>]
            mark <rev> a known-bad revision.
    git bisect good [<rev>...]
            mark <rev>... known-good revisions.
    git bisect skip [(<rev>|<range>)...]
            mark <rev>... untestable revisions.
    git bisect next
            find next bisection to test and check it out.
    git bisect reset [<branch>]
            finish bisection search and go back to branch.
    git bisect visualize
            show bisect status in gitk.
    git bisect replay <logfile>
            replay bisection log.
    git bisect log
            show bisect log.
    git bisect run <cmd>...
            use <cmd>... to automatically bisect.

    Please use "git help bisect" to get the full man page.

So you have to use "git bisect run <cmd>..." to automatically run the command you are investigating. And if you don't want to run the command automatically, you can test by yourself at each step of the binary search and then use "git bisect good", "git bisect bad", or "git bisect skip" depending on the result of your tests.

Fully automated bisecting with "git bisect run"

Posted Oct 3, 2018 1:28 UTC (Wed) by samiam95124 (guest, #120873) [Link]

My 2 cents:

First, love the fact that this has been automated. I have been doing this procedure automatically for years (decades).

However, this is a bit like closing the barn door after the horse has left. I was a bit shocked to find there is a procedure to find make errors *after they have been committed*??? Why would the kernel group allow code to be committed without running a clean build?

Second, Linux is a pretty mature product these days. Your "clients", the hardware manufacturers, all run (or should all run) extensive regression tests. In fact, I don't think it is a tall order to submit regression tests along with drivers, which is not a requirement at the moment. Driver groups develop extensive regression tests, which Linux never sees, since they aren't upstreamed. So the drivers are open source, but their tests are secret?

Finally, for the extensive kernel code running outside of driver land, this also needs regression testing.

The point I was getting to here is that with build tests and regression tests, there would be little or no need for bisection, automated or manual. The reason build servers are popular now is to prevent make errors from reaching repositories. And it should be no different for functional errors, as well.

Fully automated bisecting with "git bisect run"

Posted Oct 3, 2018 7:15 UTC (Wed) by neilbrown (subscriber, #359) [Link]

> Why would the kernel group allow code to be committed without running a clean build?

Maybe someone started using a new compiler.
Or maybe I'm doing a git-bisect on my own private tree prior to submitting it. I just noticed a bug and now I want to find out where I introduced it, so I can fix and rebase.

> The point I was getting to here is that with build tests and regression tests, there would be little or no need for bisection, automated or manual.

"Testing shows the presence, not the absence, of bugs". Still true 50 years after Dijksrta wrote it.

Fully automated bisecting with "git bisect run"

Posted Oct 3, 2018 14:27 UTC (Wed) by bfields (subscriber, #19510) [Link]

Seems to me that a lot of the dumb build bugs I've introduced over the years have required a particular combination of config options.

This is a case where I should do more testing--a lot of those bugs would probably be caught by just a few extra builds that exercise some of the most commonly problematic combinations.

But I'll never catch all of those just by doing enough builds. I mean, the kernel has thousands of build options. Combinatorics is against us.

Fully automated bisecting with "git bisect run"

Posted Oct 3, 2018 14:08 UTC (Wed) by bfields (subscriber, #19510) [Link]

Completely agreed on the value of regression testing. We could always use more. But that will never catch every bug!

Take any random bug that's been discussed on lwn, say, I don't know, this mutex_unlock/free race or the famous e1000e bug. Without knowing about the bug ahead of time, what kind of testing would be guaranteed to catch one of those in a reasonable amount of time?

It's not impossible, but it's hard, and something will always slip through, absent some change more fundamental than just adding more regression tests.


Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds