
S-GDB: A Usability Enhancing Extension for GDB

Raquel Alvarez

May 1st 2019

1 Problem Definition
The analysis of binaries has proven to be essential in certain applications where source code is not
available [9]. There are many tools available to perform binary analysis, such as IDA Pro and
GDB. However, my experience as a TA has shown me that these tools are often not user-friendly
for beginners. Beginners may therefore face a steep learning curve that discourages them
from continuing to explore binary analysis. To help solve this issue, I designed an interface that
exposes beginners, in an intuitive way, to the powerful features of GDB.

2 Methodology
GDB offers a wide array of commands that expose useful details about a program to the user.
To understand which GDB commands are useful for a beginner, I used one of the
assignments given to CMPSC 311 students. The Bomb Lab assignment was designed by Randal
E. Bryant and David R. O’Hallaron of Carnegie Mellon University, for the systems programming
textbook Computer Systems: A Programmer’s Perspective. This assignment is intended to teach
students how to interpret x86 assembly instructions while using GDB. My experience as a TA
showed me that students mostly struggled with the following aspects of the assignment: (1) getting
started with loading a program into GDB, setting breakpoints and running, (2) knowing what each
instruction means and how it affects the program, (3) following the program flow when the list
of instructions was lengthy (20+ lines), and (4) using the examine command to display contents
in memory. Based on these observations, the S-GDB extension was designed with the following
features:

1. Interactive Tutorial (inspired by Vim). Students learn how to load a program into
GDB, set breakpoints, run it, and analyze its x86 instructions. In addition, students are
shown how to use the examine command effectively.

2. x86 Instruction Dictionary. Users can look up instructions by name, which shows them
how instructions are used and how they work.

3. Loops and Recursion Highlighting. Students can choose to display loops and recursive
calls for any given function.

4. Examining Memory. This interface teaches the user how to select the different arguments
used for the examine command.

The goal of this extension is to be user-friendly and easy to implement and adopt.

Figure 1: The code in this figure shows how to use a Python class to create the example command.

3 Implementation
GDB runs an instance of a Python interpreter in the background and provides an API to create
new commands (see Figure 1). S-GDB is implemented by leveraging this feature. To
create an interface for each of the features described in the Methodology section, I created a set
of commands. Each interface and its functionality are described below.
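
As a rough illustration of this mechanism, the sketch below registers a custom command through GDB's Python API by subclassing gdb.Command. The command name hello and the helper greeting are hypothetical and are not taken from sgdb.py. The gdb module only exists inside GDB's embedded interpreter, so the sketch guards the import and keeps the text-building logic in a plain function.

```python
try:
    import gdb  # only importable inside GDB's embedded Python interpreter
except ImportError:
    gdb = None


def greeting(arg):
    """Build the text printed by the hypothetical `hello` command."""
    name = arg.strip() or "world"
    return f"Hello, {name}!"


if gdb is not None:

    class Hello(gdb.Command):
        """Greet the user. Usage: hello [NAME]"""

        def __init__(self):
            # Register the command under the name "hello" as a user-defined command.
            super().__init__("hello", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            # GDB passes everything typed after the command name as one string.
            print(greeting(arg))

    Hello()  # instantiating the class registers the command with GDB
```

Assuming the file is saved as hello.py, it could be loaded inside GDB with source hello.py, after which typing hello Raquel would print the greeting.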

3.1 Interactive Tutorial


The interactive tutorial was inspired by the Vim tutorial [1]. Vim offers users a tutorial to learn
how to use Vim’s commands effectively. S-GDB mimics this functionality by providing a tutorial
that walks users through a set of typical commands needed to analyze a program. The commands
featured in this tutorial are:

1. file. This command teaches the user how to load executables into GDB to be analyzed.

2. info functions. This is used to show the user how to find names of functions found in the
program.

3. disas. The user can see the actual instructions to be executed by the program.

4. breakpoint. We show the user how to set up a breakpoint to stop execution at a particular
point of interest.

5. run. After setting breakpoints, we show the user how to start running the program.

6. stepi. We show the user how to step through each instruction (or a group of instructions)
to analyze the state of the program after a certain instruction was executed.

7. finish. The user is taken to a state in which the instruction pointer is at a library function.
We show them how to use finish to continue executing until the next instruction is from
the caller function.

8. Memory Examination and Instruction Lookup. We also show the user how to use other
custom commands, which are described in detail in the following sections. In particular, the
tutorial places a special emphasis on reading strings in memory. This is introduced to the user
in a way that teaches them when it is appropriate to read strings rather than hex numbers.

The user can access this tutorial by running the command tutorial. The command is designed
to mimic the GDB shell. The user is asked to input the specified GDB command described at each
step. There are a total of 16 steps. Figure 2 shows the introductory message shown to the user
when the tutorial command is executed.

Figure 2: The tutorial displays an introductory message that describes the exercise and the
motivation behind it. It also reminds the user to not use binary analysis for unethical purposes.
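
The step-by-step prompting described above can be sketched as a small loop. This is only an illustration under assumptions: the function names, the step texts, and the retry behavior are hypothetical, and the sketch checks only the typed text instead of running anything in GDB.

```python
def matches(expected, typed):
    """Compare two commands while ignoring differences in whitespace."""
    return expected.split() == typed.split()


def run_tutorial(steps, read=input, write=print):
    """Walk through (instruction, expected_command) pairs.

    `read` and `write` are injectable so the loop can be exercised
    without a terminal. Returns the number of wrong attempts.
    """
    retries = 0
    for number, (instruction, expected) in enumerate(steps, start=1):
        write(f"Step {number}/{len(steps)}: {instruction}")
        # Keep prompting with a GDB-like prompt until the command matches.
        while not matches(expected, read("(gdb) ")):
            retries += 1
            write(f"Not quite. Try typing: {expected}")
    write("Tutorial complete.")
    return retries


# Hypothetical first steps, loosely following the command list above.
STEPS = [
    ("Load the executable to analyze.", "file notes"),
    ("List the functions in the program.", "info functions"),
    ("Disassemble main.", "disas main"),
]
```

Calling run_tutorial(STEPS) would prompt interactively. Passing custom read and write callables makes the same loop easy to test.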

To make this tutorial interactive, I created a program, notes, that reads the contents
of a file into a buffer and then asks the user for a password before displaying them (this was inspired
by reading some of the contest entries in The Underhanded C Contest [10]). The tutorial teaches
the user how to find the secret contents of the file without knowing the password. The setup for
this tutorial is envisioned as follows: (1) the program’s source code is not available to the user, and
(2) the user is also not allowed to open the secret file. The idea behind this exercise is to show the
user what binary analysis could be useful for, and how important it is to write secure programs.
Once the tutorial has been completed, the user is encouraged to find the correct password that the
program expects.

More details about the tutorial are available in the script sgdb.py (see footnote 1).

3.2 x86 Instructions Dictionary


One of the most common mistakes made by students during the Bomb Lab was guessing what each
instruction does. Typically, students would find that a simple Google search might not be as quick
or descriptive as necessary to identify the behavior of an instruction. In addition, looking up each
instruction in the book was very time consuming, so students resorted to making educated guesses
on what each instruction did at a given point. This issue inspired the x86 Instructions Dictionary
feature in S-GDB. This dictionary is available to students through the command instruction
<name>, where <name> is the name of the instruction. The dictionary has a total of 90 instructions.
These instructions were chosen based on the content featured in [3].

Figure 3: Output of the instruction sub command.

Footnote 1: github.com/rva5120/sgdb

The command instruction <name> displays the instruction’s name, the way it may be found
in assembly code, and what the instruction does. When appropriate, the entry may also include
an explanation about how it can be interpreted system-wise (for example, push and sub). Figure 3
shows an example of the sub instruction. The explanation for each instruction was compiled from
both [3] and [2].

For a full list of instructions, please refer to sgdb.py (see footnote 2). Note that support for
additional instructions can be added at any time.
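
A dictionary of this kind could be represented as a simple mapping from mnemonic to description. The sketch below is a minimal illustration, not the structure used in sgdb.py: the field names, the wording of the two entries, and the describe function are assumptions. The operand order follows the AT&T syntax that GDB prints by default.

```python
# Hypothetical entries; the real dictionary covers 90 instructions.
INSTRUCTIONS = {
    "sub": {
        "syntax": "sub S, D",
        "effect": "D = D - S",
        "note": "Subtracting from the stack pointer reserves space on the stack.",
    },
    "push": {
        "syntax": "push S",
        "effect": "Decrement the stack pointer, then store S at the new top of the stack.",
        "note": "Often used to save a register before it is overwritten.",
    },
}


def describe(name):
    """Return a printable description of an instruction, or a fallback message."""
    entry = INSTRUCTIONS.get(name.strip().lower())
    if entry is None:
        return f"No entry for '{name}'."
    return "\n".join(
        [
            f"Instruction: {name.strip().lower()}",
            f"Syntax:      {entry['syntax']}",
            f"Effect:      {entry['effect']}",
            f"System view: {entry['note']}",
        ]
    )
```

Adding support for another instruction then amounts to adding one more key to the mapping.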

3.3 Loops and Recursion Highlighting


Students tend to get overwhelmed by large amounts of unknown instructions. In addition to the
dictionary described in the previous section, I defined two custom commands that analyze the
instructions that make up a function, detect loops and recursive calls, and change the coloring of the
output to provide visual cues for following the control flow of the program. Both commands use the
output of the disassemble command. The detection works as follows:

• Loops. Loops occur when the control flow of the program goes from one instruction to a
previously executed instruction in the same function. To detect this, I scan each
instruction in the given function and extract: (1) the address of the instruction, (2) the instruction
code, and (3) the arguments. If the instruction is a jump and its target address is lower than
the current address, a loop is detected. I then add ANSI escape codes [11, 12] around the
lines from the target address to the current address. When printed, as seen in Figure 4, the output
highlights the loop with a different color. This coloring also handles nested loops, because
the innermost ANSI escape code is the one that determines the coloring (see Figure 5). A
user can access this functionality through the show loops <function name> command.

• Recursion. Similarly, a recursive call occurs when the call instruction is used with
an address that matches the first instruction of the current function. Therefore, if a call
instruction is detected and its target matches the function address, the line is wrapped with
ANSI escape codes to color it. Figure 6 shows an example of the show recursion <function
name> command.
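
The two checks above can be sketched on the text of a disassemble listing. This is an illustration under assumptions, not the code in sgdb.py: the regular expression, the function names, and the sample listing are hypothetical. Instead of wrapping a whole range in one escape sequence, this sketch colors each line according to how many loops contain it, which is a simpler variant of the nesting behavior described.

```python
import re

# Matches lines such as "   0x000000000040112c <+6>:\tadd    $0x1,%eax".
LINE_RE = re.compile(r"^\s*(?:=>)?\s*0x([0-9a-f]+)\s+<\+\d+>:\s+(\S+)\s*(.*)$")
RESET = "\x1b[0m"
COLORS = ["\x1b[33m", "\x1b[36m", "\x1b[35m"]  # yellow, cyan, magenta


def parse(listing):
    """Return (address, mnemonic, operands, raw_line) per instruction line."""
    rows = []
    for raw in listing.splitlines():
        m = LINE_RE.match(raw)
        if m:  # header and footer lines of the dump do not match and are skipped
            rows.append((int(m.group(1), 16), m.group(2), m.group(3).strip(), raw))
    return rows


def direct_target(operands):
    """Address of a direct jump or call, or None for indirect operands."""
    m = re.match(r"0x([0-9a-f]+)", operands)
    return int(m.group(1), 16) if m else None


def find_loops(rows):
    """Return (target, jump_address) for every backward jump inside the function."""
    entry = rows[0][0] if rows else 0
    loops = []
    for addr, mnemonic, operands, _ in rows:
        target = direct_target(operands)
        if mnemonic.startswith("j") and target is not None and entry <= target < addr:
            loops.append((target, addr))
    return loops


def find_recursive_calls(rows):
    """Return addresses of call instructions that target the function's first instruction."""
    if not rows:
        return []
    entry = rows[0][0]
    return [a for a, mn, ops, _ in rows if mn.startswith("call") and direct_target(ops) == entry]


def highlight_loops(listing):
    """Color each instruction line by the number of loops that contain it."""
    rows = parse(listing)
    loops = find_loops(rows)
    out = []
    for addr, _, _, raw in rows:
        depth = sum(1 for start, end in loops if start <= addr <= end)
        out.append(f"{COLORS[(depth - 1) % len(COLORS)]}{raw}{RESET}" if depth else raw)
    return "\n".join(out)


# Hypothetical listing in the style of GDB's disassemble output.
SAMPLE = "\n".join([
    "Dump of assembler code for function count:",
    "   0x0000000000401126 <+0>:\tpush   %rbp",
    "   0x0000000000401127 <+1>:\tmov    $0x0,%eax",
    "   0x000000000040112c <+6>:\tadd    $0x1,%eax",
    "   0x000000000040112f <+9>:\tcmp    $0x9,%eax",
    "   0x0000000000401132 <+12>:\tjle    0x40112c <count+6>",
    "   0x0000000000401134 <+14>:\tcall   0x401126 <count>",
    "   0x0000000000401139 <+19>:\tret",
    "End of assembler dump.",
])
```

Printing highlight_loops(SAMPLE) in a terminal that understands ANSI escape codes would show the three loop lines in a different color. Indirect jumps and calls (for example through a register) are ignored here because their targets are not visible in the listing.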

Footnote 2: github.com/rva5120/sgdb

Figure 4: Command show loops main.

Figure 5: Output with nested loops.

Figure 6: Command show recursion fib.

3.4 Examining Memory


To help students better understand the somewhat cryptic examine command, I designed a command
that asks the user a series of questions to determine what type of output they are looking for. The goal of
this command is to give the user some intuition about what each argument in the examine command
represents. To do this, I crafted a question for each of the relevant arguments as follows:

1. What is the starting address? The user can enter the address from where they want to
see memory contents.

2. What format do you want? This question helps the user choose the format in which the
data stored in memory should be displayed, for example binary, hexadecimal, or string.

3. How many bytes do you want to display? The user can now select the number of bytes,
starting at the specified address, that GDB should display.

4. What grouping do you want? Finally, the user can select how they want bytes to be
grouped.

Figure 7: Command memory with strings.

Figure 8: Command memory with hex.

Overall, students had the most trouble remembering what each flag represented for questions 2
and 4, which made the examine command difficult for them to use. I therefore designed this new
interface to demystify the examine command (see Figures 7 and 8). The memory command shows the
user the equivalent examine command for the answers given, in the hope that the user will eventually
be able to craft their own examine commands.
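
The translation from answers to an examine command can be sketched as follows. GDB's examine command has the form x/NFU ADDR, where N is a repeat count, F a format letter, and U a unit size letter. The function name, the answer vocabulary, and the handling of strings below are assumptions made for illustration and may differ from what the memory command in sgdb.py does.

```python
# Format letters and unit-size letters accepted by GDB's x command.
FORMATS = {"hex": "x", "decimal": "d", "binary": "t", "string": "s"}
UNITS = {1: "b", 2: "h", 4: "w", 8: "g"}  # byte, halfword, word, giant word


def build_examine(address, fmt, n_bytes, group_size=1):
    """Build the x command that shows n_bytes at address in the given format."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    if fmt == "string":
        # Assumption: for strings, show one null-terminated string and
        # ignore the byte count and grouping answers.
        return f"x/s {address}"
    if group_size not in UNITS:
        raise ValueError(f"unsupported grouping: {group_size}")
    # Round up so that at least n_bytes are displayed.
    count = -(-n_bytes // group_size)
    return f"x/{count}{FORMATS[fmt]}{UNITS[group_size]} {address}"
```

For example, build_examine("0xbeef", "hex", 20, 1) yields x/20xb 0xbeef, and grouping the same 20 bytes into 4-byte words yields x/5xw 0xbeef.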

4 Evaluation
In order to evaluate the effectiveness of S-GDB in helping students learn more about GDB and
binary analysis, I crafted a user study with the following format:

1. Knowledge Questions. Answer the questions below.

(a) What GDB command can you use to read 20 bytes of memory at address 0xbeef?
(Restriction: Your answer should be in hex format.)
(b) What GDB command can you use to load an executable into GDB?

2. Tutorial. Please complete the tutorial. Enter the command tutorial.

3. Knowledge Questions Revisited. Answer the questions below.

(a) What GDB command can you use to read 20 bytes of memory at address 0xbeef?
(Restriction: Your answer should be in hex format.)
(b) What GDB command can you use to load an executable into GDB?

To test S-GDB, I asked one participant to take the study. The participant was a computer
science student with knowledge of C and assembly, but no knowledge of GDB. The student was
not able to answer the questions at the beginning, which was expected since they had no prior
knowledge of GDB, but answered them correctly at the end. Overall, the student found the
instruction dictionary to be the most useful feature. In the future, I would like to perform a user study
with a larger number of participants to gain broader insights.

References
[1] Vim. https://www.vim.org/

[2] Aldeid. https://www.aldeid.com/wiki/X86-assembly/Instructions/div

[3] R. E. Bryant and D. R. O’Hallaron. Computer Systems: A Programmer’s Perspective.

[4] J. Erickson. Hacking: The Art of Exploitation (No Starch Press)

[5] G. Weidman. Penetration Testing: A Hands-On Introduction to Hacking (No Starch Press)

[6] E. Eilam. Reversing: Secrets of Reverse Engineering (Wiley)

[7] GDB. https://sourceware.org/gdb/onlinedocs/gdb/Python-API.html#Python-API

[8] T. Tromey. http://tromey.com/blog/?p=501

[9] Lin et al. Automatic Reverse Engineering of Data Structures from Binary Execution

[10] S. Craver. Underhanded C Contest. http://underhanded-c.org/

[11] Jafrog. http://jafrog.com/2013/11/23/colors-in-terminal.html

[12] Wikipedia. https://en.wikipedia.org/wiki/ANSI_escape_code
