CITS3007 lab 2 (week 3) – Debugging – solutions

For this lab, from within your VM, download the source code for the lab from the lab-02-code.zip file. (You can do this by running, for instance, wget https://cits3007.github.io/labs/lab-02-code.zip from within the VM.) You can then unzip the file using the unzip command, and view individual files using less or vim.

This lab shows how you can use GDB (the GNU Debugger) to inspect a running program. This is important for later labs, and for the unit project. The best way of fixing bugs in your project code will be to use GDB to step through your code and pinpoint the source of those bugs. Often, you will also be able to access a debugger through your IDE or graphical editor.[1] However, it’s worth learning how to use GDB directly, as in practice, you won’t always have access to an IDE or graphical editor (for instance, when debugging programs running on cloud-based virtual machines).

1. GDB basics

GDB, the GNU Debugger, lets us step through compiled C (or C++) programs and examine the values of variables in the running program.

When compiling programs we wish to debug, we need to pass the flag -g to gcc, which tells it to add debugging information. It can also be helpful to pass the -O0 option to gcc, which tells the compiler not to optimize the compiled code.[2] If we try to debug a binary for which gcc has heavily optimized the emitted machine-code instructions, then the CPU instructions being executed may not correspond very closely to the source code we provided, making the behaviour of GDB unexpected.[3]

The Makefile for this lab already includes these two flags, so running make factorial in your VM is all you need to do to compile the code. (All commands from this point on in the lab are intended to be run from the command-line in your VM, in the cloned lab02 directory, unless otherwise specified.)

1.1. Factorial results

Read the API comments for the factorial function in factorial.c, and build the factorial program with the command make factorial.

Try executing the factorial program with various arguments from 0 to 20 (the valid range) and outside it. Does the program print the correct result? (If you’re not sure what the factorial of some number is, then Googling “factorial 10”, for example, should give you an answer.)

See if you can spot the cause of the error in factorial.c. If you can, don’t fix it yet – we’re going to use the program to experiment with debugging using GDB.

1.2. Running GDB

Launch the debugger by running

$ gdb ./factorial

You should see some welcome messages from GDB, then it will display the debugger prompt (gdb). As the welcome messages say, you can type help at this prompt to get help, but the online help is unfortunately not especially useful unless you already have some familiarity with GDB. (If you do know the first letter of a command you’re interested in, then GDB has an “autocomplete” feature – type l and then the tab key a couple of times, to see commands beginning with l.)

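Some of the commands you can run from the GDB prompt include:

list (l for short), which lists the source code of the program being debugged
run (r for short), which runs the program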

Try both of these commands. When you run the program, you should see it print the error message

Error: expected 1 command-line argument (an INT), but got 0

since by default, GDB runs the program with no command-line arguments. (GDB should also print a message saying that our program exited with code 01. By convention, programs on Unix-like platforms exit with a non-zero code to indicate an error.)

Set the program’s arguments by running the following command (don’t type the (gdb) prompt):

(gdb) set args 6

and then running the program again.

Now, exit the debugger by typing quit or ctrl-d, and start it again. This time, we’ll use GDB’s TUI (text-based user interface).

Type ctrl-x and then the a key immediately afterward. A “window” should open in your terminal; run the list command, and you should see something like this:

The arrow keys and the pageup and pagedown keys on your keyboard should now move you around in the source listing window, and ctrl-l will refresh the display if at any point it seems to get out of sync with what you’re doing. (The ctrl-x a sequence toggles between GDB’s normal mode and TUI mode; hitting it repeatedly will take you back and forth between them.)

The break LINENUM command (b for short) will set a breakpoint in the code (and the source listing will indicate this with a “b+” in the code margin).

Run the command b 26 to set a breakpoint at line 26 (containing the statement argc--), and r to run the program.

GDB will highlight the line about to be executed. Some other useful commands:

For some additional commands and advanced features, see the Hitchikers Guide To The GDB and the GDB tutorial series here and here from RedHat. GDB “cheat sheets” are available here (PDF) and here.

Dynamic printf

A common method of debugging C programs is to add printf() invocations at various points in the program to show what values program variables take on at different times. A disadvantage of this approach is that it requires you to re-compile your program, and you must remember to remove the calls to printf() from your final code.

However, GDB will let you add printf() invocations without recompiling the program using the dprintf (dynamic printf) command. Issuing the dprintf LINENUM, FORMAT-STRING, EXPRESSION command has the effect of adding a breakpoint at LINENUM, as well as inserting a call to printf which prints the specified expression using a specified printf-style format string.

So, for example, the command dprintf myprogram.c:8, "Num elements: %d\n", n would allow you to insert printf calls that nicely display the value of the variable n at line 8 of a program.

If you’re interested in using the dprintf command, you can find a tutorial on how to use it here.

1.3. argc and argv

The first thing the factorial program does in main is execute the following statements:

argc--;
argv++;

If you’re running an instance of the factorial program, kill it with k, use set args 6 to set the command-line arguments of the program, and run it with r. (Your breakpoint at line 26 should still be showing; execute the command b 26 to set it again if you’ve accidentally exited GDB and come back in.)

Step through the program, examining the values of argc, argv, and elements of argv (like argv[0] and argv[1]) at various points in the program.

[s]tep vs [n]ext

In general, when you’re stepping through code, the command you want to use is “n” (“next”), which steps over function calls when it encounters them.

When you encounter a call to a function you’ve defined elsewhere in the program, and want to step “into” that function, then “s” (“step”) is the command to use.

If you try to invoke “s” on a function that is part of the C runtime, however, like strtol, then GDB will print an error message something like this:

  (gdb) s
  __strtol (nptr=0x7fffffffe730 "6", endptr=0x7fffffffe3a0, base=10) at ../stdlib/strtol.c:105
  ../stdlib/strtol.c: No such file or directory.

Here, GDB is telling you that it can’t “step into” the code for strtol, because it can’t find the original source code for that function, nor can it find any “debugging symbols” for it. (To save disk space, C runtime libraries are normally shipped without either of those – though it is possible to install them if you wish.) A quick fix is to type finish (or fin for short), which will finish running the current function, and so should get you back to the C code the function was called from. (Note that f on its own is short for frame, not finish.)

You’ll get a similar error if you try to “step into” the errno variable. errno isn’t a library function, but is a global symbol defined in the C runtime, and thus causes the same sort of error messages if you try to step “into” it.

The takeaway here is: usually, you can only “step into” functions that you’ve defined, and only when you compiled your code using the “-g” option which causes gcc to include debugging symbols.

What is the effect of the two statements we listed above? Why would we use them?

Sample solutions

The statements are used to “ignore” the value of argv[0], which contains the filename of the program being executed. argc is decremented by one (so the number of arguments is reduced by one), and argv is incremented by one – the effect is that argv[0] after the statements are executed points to what used to be argv[1].

This is commonly done in programs where we are interested in the command-line arguments of the program, but have no especial interest in the name of the executable.
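As a rough illustration of the effect described above, here is a minimal sketch. The helper name first_real_argument is hypothetical (it does not appear in factorial.c); it applies the same two statements to its own copies of argc and argv and then reads the first "real" argument.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: returns the first
 * command-line argument after the program name, or NULL if there is none. */
const char *first_real_argument(int argc, char **argv) {
    /* Skip over argv[0], which holds the name of the executable. */
    argc--;
    argv++;

    /* argc now counts only the "real" arguments, and argv[0] now
     * refers to what used to be argv[1]. */
    if (argc < 1) {
        return NULL;
    }
    return argv[0];
}
```

Called with the argument vector {"./factorial", "6"}, this sketch returns the string "6"; called with only the program name, it returns NULL.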

1.4. strtol

In the file factorial.c, we use the function strtol to convert the program’s first command-line argument into a long, despite the fact that the factorial function only takes an int, and we then cast the long into an int.

However, C11 has a function atoi, which converts strings to ints, so it seems we could have used that. Read the documentation for atoi and strtol, and summarize what the differences are. Why might we prefer strtol over atoi?

Sample solutions

strtol provides more information and allows for more precise error-checking than atoi. When atoi can’t convert the string it sees, it returns the value 0: this is unhelpful, since it means we have no way of knowing whether it encountered a conversion error, or was actually supplied with the string "0".

atoi also has unhelpful behaviour when the number encountered would fall outside the bounds of an int (i.e. is too large in magnitude, whether positive or negative): its behaviour is simply undefined. strtol in contrast is more helpful: it sets the global value errno (which you can read about here) to the value ERANGE to indicate an “out of range” error, and still returns some potentially useful value (either the highest or lowest value a long can hold).

Because of this, many C style guides recommend that strtol be used instead of atoi.
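The kind of error-checking that strtol makes possible can be sketched as follows. This is not the code from factorial.c: the helper name parse_int and its return convention (0 for success, -1 for failure) are assumptions made for the example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only: converts text to an int.
 * Returns 0 and stores the result in *out on success, or -1 on failure. */
int parse_int(const char *text, int *out) {
    char *end = NULL;

    errno = 0; /* clear errno so a later ERANGE must come from strtol */
    long value = strtol(text, &end, 10);

    if (end == text) {
        return -1; /* no digits were consumed at all */
    }
    if (*end != '\0') {
        return -1; /* trailing characters after the number */
    }
    if (errno == ERANGE) {
        return -1; /* value does not fit in a long */
    }
    if (value < INT_MIN || value > INT_MAX) {
        return -1; /* fits in a long, but not in an int */
    }

    *out = (int) value;
    return 0;
}
```

With this sketch, the inputs "0" and "abc" can be told apart: the first succeeds and stores 0, while the second fails. atoi would return 0 for both.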

1.5. Diagnosing and fixing the factorial bug

Kill the factorial program, set a breakpoint somewhere in the factorial function (e.g. line 18), and use the run and/or continue commands to get to your breakpoint.

Step through execution of the factorial function, and examine the values of the local variables (using either print or info locals). What is the bug in factorial? Fix it.

Sample solutions

The fix is that the line

for (int i = n; i >= 0; i--) {

should be changed to

for (int i = n; i > 0; i--) {

In other words, this is an “off-by-one” error.
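To see why the loop condition matters, consider the sketch below. It assumes the loop body multiplies a running result by i, and it uses a long long return type; both are assumptions, and the real function in factorial.c may differ (for instance, it may also validate its argument, which this sketch omits).

```c
/* Illustrative only: with i >= 0, the final iteration runs with i == 0,
 * so the running product is multiplied by zero. */
long long factorial_buggy(int n) {
    long long result = 1;
    for (int i = n; i >= 0; i--) {
        result *= i;
    }
    return result;
}

/* With i > 0, the loop stops after multiplying by 1. */
long long factorial_fixed(int n) {
    long long result = 1;
    for (int i = n; i > 0; i--) {
        result *= i;
    }
    return result;
}
```

Under these assumptions, the buggy version returns 0 for every input from 0 to 20, while the fixed version returns 1 for 0 and 720 for 6.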

Recommendation – keep lab notes

It’s recommended you keep online notes of useful commands you come across in the unit and/or useful links, as a reminder to yourself of what we’ve covered. You could keep a Word or text document, if you like, using Google Docs, but another option is to store your notes in a “Gist” – a single text file versioned by GitHub.

If you have a GitHub account and are logged in, then click on the “+” symbol in the top right of any GitHub page, and select “New gist”. Give your gist a description (e.g. “My CITS3007 notes”) and a filename (e.g. “notes.md”). Then click “Create secret gist” (or “public”, if you wish to make it public).

Gists support formatting your file using Markdown – for instance, use asterisks (“*”) to surround words intended to be italic, and start paragraphs which should be part of a list with a hyphen and space (“- ”). Clicking the “Preview” tab will show you what your notes look like converted to HTML.

2. Segmentation faults

Compile the segfault program by running make segfault and then run it with ./segfault. The intended behaviour is that it should accept a line of input from the user, and echo this back.

However, when it is run and some text entered, it produces a segmentation fault. A segmentation fault is caused when the CPU detects that a program has attempted to access memory which it is not permitted to access.

Try running the program using GDB. (Hint: you can get GDB to start in TUI mode by running gdb -tui ./segfault.) Start GDB and run the program with the run command, and enter some text. Once the segfault occurs, run the backtrace command to see the current stack trace.

You should see something like

#1  0x00007ffff7e2a96c in __GI__IO_getline (fp=fp@entry=0x7ffff7f93980 <_IO_2_1_stdin_>, buf=buf@entry=0x0,
    n=n@entry=1023, delim=delim@entry=10, extract_delim=extract_delim@entry=1) at iogetline.c:34
#2  0x00007ffff7e296ca in _IO_fgets (buf=0x0, n=1024, fp=0x7ffff7f93980 <_IO_2_1_stdin_>) at iofgets.c:53
#3  0x0000555555555209 in main () at segfault.c:9

Each stack frame shows the values of the arguments to the function called for that frame. Do any of them look suspicious?

Sample solutions

The argument buf (in stack frames #1 and #2) is equal to 0 – that is, the NULL pointer. Trying to dereference the NULL pointer in C – on platforms that have hardware memory protection – will generally result in a segmentation fault.

However, not all platforms do implement memory protection, so this is not guaranteed.

Try printing the value of buf (with the command print buf) before and after it has been allocated, and see what result you get.

Sample solutions

After the allocation, buf will be set to 0 (the NULL pointer), which is not what we want – we expect it should point to newly-allocated memory.
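A defensive habit that makes this kind of failure visible earlier is to check the result of an allocation before using it. The sketch below assumes the buffer is allocated with malloc; the helper name alloc_buffer is hypothetical and is not taken from segfault.c.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only: allocates size bytes and
 * reports a failed allocation on stderr. The caller must still check
 * the returned pointer for NULL before using it. */
char *alloc_buffer(size_t size) {
    char *buf = malloc(size);
    if (buf == NULL) {
        fprintf(stderr, "Error: could not allocate %zu bytes\n", size);
    }
    return buf;
}
```

A caller would test the returned pointer and exit with an error if it is NULL, rather than passing it on to a function like fgets.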

Try using the print command to see how the “bitwise left shift” (“<<”) operator works.

Try p 1 << 2, p 1 << 10, and a few other values, then try p 1 << 31. What result do you get? Why might this occur? (Hint: read the cppreference.com page on arithmetic operators, in particular the section on “overflow”: https://en.cppreference.com/w/cpp/language/operator_arithmetic. Also try the ptype command for the various values you typed above, to see what their type is.) How can the program be fixed?

Sample solutions

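The behaviour occurs because the type of 1 << 31 is int, which is a signed integer type, and only has 4 bytes (32 bits) on the platform we’re using. The left shift operator << has the effect of multiplying a number by two n times (where n is the right-hand operand). The largest value a 32-bit int can hold is one less than 2 multiplied by itself 31 times, so multiplying 1 by two 31 times leads to overflow, and the int “wraps around” to a negative number. (Strictly speaking, the C standard leaves the result of a signed shift that overflows undefined; wrapping around is simply what we observe here.)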

The immediate problem with the program can be fixed by:

3. C refresher no. 2

On Moodle, you will find an unassessed quiz entitled “C refresher no. 2”. It’s recommended you complete this (either now, or in your own time) to check your knowledge of C control flow structures and data types.

 

  [1] For instance, Eclipse and VS Code will provide a graphical interface to GDB.

  [2] Passing flags like -O1, -O2 and -O3 to gcc tells it to spend longer compiling the code, in order to apply increasingly advanced optimizations; see the documentation for gcc’s optimization options for more details.

  [3] On the other hand, sometimes the behaviour we’re trying to debug might only appear when optimizations are enabled. In such a case, we will likely have to debug our optimized binary, and simply accept that sometimes, the code being executed differs from what we see in the source file.