Friday, October 30, 2009

4 Dan debugging

So, a 4 Dan programmer must have some pretty awesome debugging skills, right? I've done most of my programming for the last 8 years or so on Linux using GCC and GDB, so I should have intimate knowledge of the full power of GDB, right?

Wrong. In fact, my knowledge of GDB is probably just a small subset of the things you might find in a GDB cheat sheet. Hmm, maybe I should check that assumption. I'm feeling lucky: what is Google's first link for "gdb cheat sheet"? Ahh, it is a tutorial page. Ok, I just skimmed that page, and here is the list of commands I commonly use:
  1. gdb -e name-of-executable -c name-of-core-file
  2. bt
  3. up number
  4. down number
  5. list
  6. p variable-name
  7. quit
There are two more gdb commands that I use regularly that are not on the cheat sheet:
  1. thread apply all bt
  2. thread number
That's pretty much it. Yes, I vaguely remember a few other commands and can find them using the help command, but I hardly ever use those commands. Note that I nearly always invoke gdb on a core file, and since I almost never debug a running program, I don't use breakpoints, watchpoints, single stepping, etc.

Let me state up front that I sometimes have real feelings of inferiority about my knowledge of debugging tools. I've known other programmers who were strong designers/programmers but who seemed godlike (to me) in their mastery of debugging tools, and I have to admit that I have been envious. Yet I've been on teams with programmers like that, and my productivity was comparable or better.

So, do I just write perfect code that never needs to be debugged? Absolutely not. I typically invoke gdb several times a day, sometimes dozens of times. Are you ready for me to reveal my secrets?

Ok, here they are: 
  1. assert() is your friend.
  2. Make your assumptions explicit in your code (see #1)
  3. Don't fall into the trap of believing something to be true without evidence.
There, that's basically it. So, how does it manifest in the way I work? Well, as I said, I invoke gdb several times a day or more. Most of the time that I invoke gdb, I do so because a program has just terminated due to an assertion that I put in the code. I run gdb with the core dump, look at the stack trace (or possibly several if the code is multithreaded), and probably examine some variables. I do this to figure out where my previous assumptions were incorrect. This generally leads directly to the fix and probably prompts me to add some more assertions.
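Here is a minimal sketch in C of what "making assumptions explicit" can look like. The function names (find_sorted, is_sorted) are hypothetical, made up for this illustration: a binary search only works on sorted input, so that assumption is written down as an assertion instead of living only in the programmer's head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if a[0..n) is in non-decreasing order. */
static int is_sorted(const int *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (a[i - 1] > a[i])
            return 0;
    return 1;
}

/* Hypothetical example: returns the index of key in a[0..n), or -1 if absent.
   The assumption that the array is sorted is stated as an assertion. */
static long find_sorted(const int *a, size_t n, int key)
{
    assert(a != NULL || n == 0);
    assert(is_sorted(a, n));    /* O(n) check; removed when built with -DNDEBUG */

    size_t lo = 0, hi = n;      /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else if (a[mid] > key)
            hi = mid;
        else
            return (long)mid;
    }
    return -1;
}
```

When an assertion like this fails, assert() prints the failed expression with its file and line and then calls abort(). If core dumps are enabled (for example with ulimit -c unlimited in bash), the process may leave a core file that can be loaded into gdb and examined with bt and p. Building with -DNDEBUG removes every assert(), so the O(n) sortedness check costs nothing in such builds.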

Sometimes my program terminates for some reason other than an assertion. Then my job is to look at the stack trace and figure out what assertions I should add that will cause the program to terminate with an assertion instead. Most of the time this is relatively easy to do.
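As a hypothetical illustration of that step, suppose a stack trace shows a segfault while reading an element of some table. The struct and function names below are invented for the example. Rather than only patching the crash site, one can assert the assumptions that the crash shows must have been violated, so that the next failure names the broken assumption directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container, used only for illustration. */
struct table {
    int   *slots;
    size_t count;
};

/* Without the assertions, a NULL table or an out-of-range index might
   crash here with SIGSEGV, or silently read garbage and fail much later.
   With them, the program stops at the first violated assumption and the
   assertion message says which one it was. */
static int table_get(const struct table *t, size_t i)
{
    assert(t != NULL);
    assert(t->slots != NULL);
    assert(i < t->count);
    return t->slots[i];
}
```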

As a quick aside, the various Agile programming methodologies put a big emphasis on unit tests. Note that assertions are a kind of unit test.

There is more to assertions than just assert(). It's very useful to understand the concepts of precondition, postcondition, invariant, etc. Understanding these concepts will help you not only in choosing good assertions to use in your code, but also in structuring your code better. However, I think I've said enough about assertions for this post.
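To make those three terms concrete, here is a small sketch using a hypothetical fixed-capacity stack (the type and function names are illustrative only). A precondition is what the caller must guarantee on entry, a postcondition is what the function guarantees on exit, and an invariant is a property of the data structure that must hold between operations.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAP 16

/* Hypothetical fixed-capacity stack, used only to illustrate the terms. */
struct stack {
    int    items[STACK_CAP];
    size_t size;
};

/* Invariant: must hold before and after every operation. */
static int stack_ok(const struct stack *s)
{
    return s != NULL && s->size <= STACK_CAP;
}

static void stack_push(struct stack *s, int value)
{
    assert(stack_ok(s));            /* invariant on entry */
    assert(s->size < STACK_CAP);    /* precondition: not full */
    size_t old_size = s->size;
    (void)old_size;                 /* avoid unused-variable warnings under NDEBUG */

    s->items[s->size++] = value;

    assert(s->size == old_size + 1);          /* postcondition */
    assert(s->items[s->size - 1] == value);   /* postcondition */
    assert(stack_ok(s));                      /* invariant on exit */
}

static int stack_pop(struct stack *s)
{
    assert(stack_ok(s));            /* invariant on entry */
    assert(s->size > 0);            /* precondition: not empty */

    int value = s->items[--s->size];

    assert(stack_ok(s));            /* invariant on exit */
    return value;
}
```

Keeping the invariant in one function like stack_ok() means every operation checks the same definition of "valid", which also nudges the code toward a small set of operations that each preserve it.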

Returning to the three points above, let's look at points 2 and 3. Point 2 is almost redundant; it connects points 1 and 3. Point 3 says to make sure you have evidence to support your beliefs about the functioning of your code. Point 1 gives you the mechanism for obtaining that evidence from your running code. Point 2 simply says: do it.

So the real key concept that must be mastered is the final one. A master programmer must be a kind of zen skeptic. You must never let your beliefs about reality outweigh the evidence that reality gives you, and you must be actively asking reality to give you evidence. You must always seek evidence with an open mind.

A few days ago I read a delightful post by Paul Buchheit titled "Applied Philosophy, a.k.a. 'Hacking'". Paul opens by saying:
Every system has two sets of rules: The rules as they are intended or commonly perceived, and the actual rules ("reality"). In most complex systems, the gap between these two sets of rules is huge.
Paul goes on to discuss how hackers of various kinds exploit that gap in many different systems, and he gives several great examples from different domains. But let's see how we can apply his statement to the problem of debugging.

Computer programs have a great deal of complexity. There is complexity that the programmer puts into the program through her use of programming language constructs. There is complexity that the compiler puts into the program by turning it into machine code. There is additional complexity that arises from the data the program consumes and from the program's internal state as it evolves. And finally, if the program is multithreaded, there is additional complexity that arises from the timing relationships between the threads.

The human mind is amazing in its ability to understand complex systems, but we have to recognize that our understanding is always an imprecise model of reality. We create beliefs about how complex systems work, and these beliefs are necessarily dramatic simplifications. So there is a gap between reality and our beliefs about reality. To debug complex computer programs, we must be aware of that gap and apply methods to narrow it, or at least to minimize its negative consequences. We can do that using concrete tools such as assertions, but the biggest gain comes from having a good understanding of the limitations of our own minds.

We can (and all of us do) have bugs in our thought processes that can hold us back in our ability to design, implement and debug. We begin the path to mastery when we begin to debug our own thought processes.
