CITS3007 lab 2 (week 3) – Debugging

For this lab, from within your VM, download the source code for the lab from the lab-02-code.zip file. (You can do this by running, for instance, wget https://cits3007.github.io/labs/lab-02-code.zip from within the VM.) You can then unzip the file using the unzip command, and view individual files using less or vim.

This lab shows how you can use GDB (the GNU Debugger) to inspect a running program. This is important for later labs, and for the unit project. The best way of fixing bugs in your project code will be to use GDB to step through your code and pinpoint the source of those bugs. Often, you will also be able to access a debugger through your IDE or graphical editor.[1] However, it’s worth learning how to use GDB directly, as in practice, you won’t always have access to an IDE or graphical editor (for instance, when debugging programs running on cloud-based virtual machines).

1. GDB basics

GDB, the GNU Debugger, lets us step through compiled C (or C++) programs and examine the values of variables in the running program.

When compiling programs we wish to debug, we need to pass the flag -g to gcc, which tells it to add debugging information. It can also be helpful to pass the -O0 option to gcc, which tells the compiler not to optimize the compiled code.[2] If we try to debug a binary for which gcc has heavily optimized the emitted machine-code instructions, then the CPU instructions being executed may not correspond very closely to the source code we provided, making the behaviour of GDB unexpected.[3]
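To see why optimization can make debugging confusing, consider the small function below. It is a hypothetical illustration (scaled_sum is not part of the lab code). With -O0, each statement generally maps to its own instructions and each local variable can be inspected. With a flag like -O2, gcc is free to merge the two statements, and GDB may then report a local variable as <optimized out>.

```c
/* Hypothetical example; not part of the lab code. */
int scaled_sum(int a, int b)
{
    int sum = a + b;       /* with -O0, "print sum" in GDB shows this value */
    int scaled = sum * 2;  /* with -O2, gcc may fold both lines into one expression */
    return scaled;
}
```

The function returns the same result at every optimization level; only how closely the running instructions follow the source lines changes.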

The Makefile for this lab already includes these two flags, so running make factorial in your VM is all you need to do to compile the code. (All commands from this point on in the lab are intended to be run from the command-line in your VM, in the directory containing the unzipped lab code, unless otherwise specified.)

1.1. Factorial results

Read the API comments for the factorial function in factorial.c, and build the factorial program with the command make factorial.

Try executing the factorial program with various arguments from 0 to 20 (the valid range) and outside it. Does the program print the correct result? (If you’re not sure what the factorial of some number is, then Googling “factorial 10”, for example, should give you an answer.)

See if you can spot the cause of the error in factorial.c. If you can, don’t fix it yet – we’re going to use the program to experiment with debugging using GDB.

1.2. Running GDB

Launch the debugger by running

$ gdb ./factorial

You should see some welcome messages from GDB, then it will display the debugger prompt (gdb). As the welcome messages say, you can type help at this prompt to get help, but the online help is unfortunately not especially useful unless you already have some familiarity with GDB. (If you do know the first letter of a command you’re interested in, then GDB has an “autocomplete” feature – type l and then the tab key a couple of times, to see commands beginning with l.)

Some of the commands you can run from the GDB prompt include:

Try both of these commands. When you run the program, you should see it print the error message

Error: expected 1 command-line argument (an INT), but got 0

since by default, GDB runs the program with no command-line arguments. (GDB should also print a message saying that our program exited with code 01. By convention, programs on Unix-like platforms exit with a non-zero code to indicate an error.)

Set the program’s arguments by running the following command (don’t type the (gdb) prompt):

(gdb) set args 6

and then running the program again.

Now, exit the debugger by typing quit or ctrl-d, and start it again. This time, we’ll use GDB’s TUI (text-based user interface).

Type ctrl-x and then the a key immediately afterward. A “window” should open in your terminal; run the list command, and you should see something like this:

The arrow keys and the pageup and pagedown keys on your keyboard should now move you around in the source listing window, and ctrl-l will refresh the display if at any point it seems to get out of sync with what you’re doing. (The ctrl-x a sequence toggles between GDB’s normal mode and TUI mode; hitting it repeatedly will take you back and forth between them.)

The break LINENUM command (b for short) will set a breakpoint in the code (and the source listing will indicate this with a “b+” in the code margin).

Run the command b 26 to set a breakpoint at line 26 (containing the statement argc--), and r to run the program.

GDB will highlight the line about to be executed. Some other useful commands:

For some additional commands and advanced features, see the Hitchikers Guide To The GDB and the GDB tutorial series here and here from RedHat. GDB “cheat sheets” are available here (PDF) and here.

Dynamic printf

A common method of debugging C programs is to add printf() invocations at various points in the program to show what values program variables take on at different times. A disadvantage of this approach is that it requires you to re-compile your program, and you must remember to remove the calls to printf() from your final code.

However, GDB will let you add printf() invocations without recompiling the program using the dprintf (dynamic printf) command. Issuing the dprintf LINENUM, FORMAT-STRING, EXPRESSION command has the effect of adding a breakpoint at LINENUM, as well as inserting a call to printf which prints the specified expression using a specified printf-style format string.

So, for example, the command dprintf myprogram.c:8, "Num elements: %d\n", n would allow you to insert printf calls that nicely display the value of the variable n at line 8 of a program.
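As a sketch of where dprintf can help, suppose myprogram.c contained a function like the one below. The function sum_elements and its variables are hypothetical, and the line number you pass to dprintf depends on where the statement of interest sits in your own file.

```c
/* Hypothetical contents of myprogram.c, for experimenting with dprintf. */

/* Returns the sum of the first n elements of values. */
long sum_elements(const int *values, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += values[i];   /* a dprintf set on this line can report i and total */
    }
    return total;
}
```

With the program compiled with -g, a command along the lines of dprintf myprogram.c:LINENUM, "i=%d total=%ld\n", i, total (with LINENUM replaced by the line number of the total += statement) should print one line per loop iteration, showing the values just before that statement executes, without any change to the source file.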

If you’re interested in using the dprintf command, you can find a tutorial on how to use it here.

1.3. argc and argv

The first thing the factorial program does in main is execute the following statements:

argc--;
argv++;

If you’re running an instance of the factorial program, kill it with k, use set args 6 to set the command-line arguments of the program, and run it with r. (Your breakpoint at line 26 should still be showing; execute the command b 26 to set it again if you’ve accidentally exited GDB and come back in.)

Step through the program, examining the values of argc, argv, and elements of argv (like argv[0] and argv[1]) at various points in the program.

[s]tep vs [n]ext

In general, when you’re stepping through code, the command you want to use is “n” (“next”), which steps over function calls when it encounters them.

When you encounter a call to a function you’ve defined elsewhere in the program, and want to step “into” that function, then “s” (“step”) is the command to use.

If you try and invoke “s” on a function that is part of the C runtime, however, like strtol, then GDB will print an error something like this:

  (gdb) s
  __strtol (nptr=0x7fffffffe730 "6", endptr=0x7fffffffe3a0, base=10) at ../stdlib/strtol.c:105
  ../stdlib/strtol.c: No such file or directory.

Here, GDB is telling you that it can’t “step into” the code for strtol, because it can’t find the original source code for that function, nor can it find any “debugging symbols” for it. (To save disk space, C runtime libraries are normally shipped without either of those, though it is possible to install them if you wish.) A quick fix is to type finish (fin for short; note that f on its own is short for the frame command), which will finish running the current function, and so should get you back to the C code the function was called from.

You’ll get a similar error if you try to “step into” the errno variable. errno isn’t a library function, but is a global symbol defined in the C runtime, and thus causes the same sort of error messages if you try to step “into” it.

The takeaway here is: usually, you can only “step into” functions that you’ve defined, and only when you compiled your code using the “-g” option which causes gcc to include debugging symbols.
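The following sketch shows where each command applies. The functions square and parse_and_square are hypothetical and are not part of the lab code; the comments describe what one would expect from GDB when the file is compiled with -g and the C library has no debugging symbols installed.

```c
#include <stdlib.h>

/* A function we defined ourselves: "s" can step into it. */
static int square(int x)
{
    return x * x;
}

/* Converts text to a number and returns its square. */
int parse_and_square(const char *text)
{
    long value = strtol(text, NULL, 10);  /* library function: use "n" to step over it */
    return square((int) value);           /* our own function: "s" steps into square() */
}
```

If you do step into strtol by accident while trying something like this, finish should return you to parse_and_square.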

What is the effect of the two statements we listed above? Why would we use them?

1.4. strtol

In the file factorial.c, we use the function strtol to convert the program’s first command-line argument into a long, despite the fact that the factorial function only takes an int, and we then cast the long into an int.

However, the C standard library also has a function atoi, which converts strings to ints, so it seems we could have used that. Read the documentation for strtol and atoi, and summarize what the differences are. Why might we prefer strtol over atoi?
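When reading the strtol documentation, it may help to see the calling pattern it supports. The sketch below is one possible way of using it and is not the code in factorial.c; the helper name parse_int and its return convention are assumptions made for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parses text as a base-10 int.
 * Returns true and stores the value in *out on success; false otherwise. */
bool parse_int(const char *text, int *out)
{
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);

    if (end == text || *end != '\0') {
        return false;   /* no digits at all, or trailing non-numeric characters */
    }
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        return false;   /* value does not fit in a long, or does not fit in an int */
    }
    *out = (int) value;
    return true;
}
```

As you read the atoi documentation, consider which of these checks could be written using atoi alone.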

1.5. Diagnosing and fixing the factorial bug

Kill the factorial program, set a breakpoint somewhere in the factorial function (e.g. line 18), and use the run and/or continue commands to get to your breakpoint.

Step through execution of the factorial function, and examine the values of the local variables (using either print or info locals). What is the bug in factorial? Fix it.

Recommendation – keep lab notes

It’s recommended you keep online notes of useful commands you come across in the unit and/or useful links, as a reminder to yourself of what we’ve covered. You could keep a Word or text document, if you like, using Google Docs, but another option is to store your notes in a “Gist” – a single text file versioned by GitHub.

If you have a GitHub account and are logged in, then click on the “+” symbol in the top right of any GitHub page, and select “New gist”. Give your gist a description (e.g. “My CITS3007 notes”) and a filename (e.g. “notes.md”). Then click “Create secret gist” (or “public”, if you wish to make it public).

Gists support formatting your file using Markdown – for instance, use asterisks (“*”) to surround words intended to be italic, and start paragraphs which should be part of a list with a hyphen and space (“- ”). Clicking the “Preview” tab will show you what your notes look like converted to HTML.

2. Segmentation faults

Compile the segfault program by running make segfault and then run it with ./segfault. The intended behaviour is that it should accept a line of input from the user, and echo this back.

However, when it is run and some text is entered, it produces a segmentation fault. A segmentation fault occurs when the CPU detects that a program has attempted to access memory which it is not permitted to access.

Try running the program using GDB. (Hint: you can get GDB to start in TUI mode by running gdb -tui ./segfault.) Start GDB and run the program with the run command, and enter some text. Once the segfault occurs, run the backtrace command to see the current stack trace.

You should see something like

#1  0x00007ffff7e2a96c in __GI__IO_getline (fp=fp@entry=0x7ffff7f93980 <_IO_2_1_stdin_>, buf=buf@entry=0x0,
    n=n@entry=1023, delim=delim@entry=10, extract_delim=extract_delim@entry=1) at iogetline.c:34
#2  0x00007ffff7e296ca in _IO_fgets (buf=0x0, n=1024, fp=0x7ffff7f93980 <_IO_2_1_stdin_>) at iofgets.c:53
#3  0x0000555555555209 in main () at segfault.c:9

Each stack frame shows the values of the arguments to the function called for that frame. Do any of them look suspicious?

Try printing the value of buf (with the command print buf) before and after it has been allocated, and see what result you get.

Try using the print command to see how the “bitwise left shift” (“<<”) operator works.

Try p 1 << 2, p 1 << 10, and a few other values, then try p 1 << 31. What result do you get? Why might this occur? (Hint: read the cppreference.com page on arithmetic operators, in particular the section on “overflow”: https://en.cppreference.com/w/cpp/language/operator_arithmetic. Also try the ptype command for the various values you typed above, to see what their type is.) How can the program be fixed?

3. C refresher no. 2

On Moodle, you will find an unassessed quiz entitled “C refresher no. 2”. It’s recommended you complete this (either now, or in your own time) to check your knowledge of C control flow structures and data types.

 

  1. For instance, Eclipse and VS Code will provide a graphical interface to GDB.

  2. Passing flags like -O1, -O2 and -O3 to gcc tells it to spend longer compiling the code, in order to apply increasingly advanced optimizations; see the documentation for gcc’s optimization options for more details.

  3. On the other hand, sometimes the behaviour we’re trying to debug might only appear when optimizations are enabled. In such a case, we will likely have to debug our optimized binary, and simply accept that sometimes, the code being executed differs from what we see in the source file.