
Linux x86 Program Start Up

or - How the heck do we get to main()?


by Patrick Horgan
Who's this for?
This is for people who want to understand how programs get loaded under Linux. In particular, it talks about dynamically linked x86 ELF files. The information you learn will let you understand how to debug problems that occur in your program before main starts up. Everything I tell you is true, but some things will be glossed over since they don't take us toward our goal. Further, if you link statically, some of the details will be different. I won't cover that at all. By the time you're done with this, though, you'll know enough to figure that out for yourself if you need to.
This is what we'll cover (pretty picture brought to you by dot - filter for drawing directed graphs)
When we're done, you'll understand this.
How did we get to main?
We're going to build the simplest C program possible, an empty main, and then we're going to look at the disassembly of it to see how we get to main. We'll see that the first thing that's run is a function linked to every program named _start, which eventually leads to your program's main being run.
Save a copy of this as prog1.c if you want, and follow along. The first thing I'll do is to build it like this.
Page 1 of 12 Linux x86 Program Start Up
11/16/2013 http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html
Before we try to debug a later version of this (prog2) in gdb, we're going to look at the disassembly of it and learn a few things about how our program starts up. I'm going to show the output of , but not in the order it would be dumped by objdump; rather, in the order it would be executed. You're perfectly welcome to dump it yourself, though. Something like will save a copy for you, and then you can use your favorite editor to look at it. (But RMUVI - Real Men Use VI!)
But first, how do we get to _start?
When you run a program, the shell or gui calls , which executes the Linux system call . If you want more information about then you can simply type from your shell. It will come from section 2 of man, where all the system calls are. To summarize, it will set up a stack for you, and push onto it , , and . The file descriptors 0, 1, and 2 (stdin, stdout, stderr) are left to whatever the shell set them to. The loader does much work for you setting up your relocations and, as we'll see much later, calling your preinitializers. When everything is ready, control is handed to your program by calling . Here from is the section with _start.
_start is, oddly enough, where we start
The xor of anything with itself sets it to zero, so the sets to zero. This is suggested by the ABI (Application Binary Interface specification) to mark the outermost frame. Next we pop off the top of the stack. On entry we have , and on the stack, so the pop makes go into . We're just going to save it and push it back on the stack in a minute. Since we popped off , is now pointing at . The puts into without moving the stack pointer. Then we the stack pointer with a mask that clears off the bottom four bits. Depending on where the stack pointer was, it will move it lower by 0 to 15 bytes. In any case it will make it aligned on an even multiple of 16 bytes. This alignment is done so that all of the stack variables are likely to be nicely aligned for memory and cache efficiency. In particular, this is required for SSE (Streaming SIMD Extensions), instructions that can work on vectors of single precision floating point values simultaneously. In a particular run, the was on entry to . After we popped off the stack, was . It moved up to a higher address (putting things on the stack moves down in memory, taking things off moves up in memory). After the the stack pointer is back at .
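The effect of that mask is easy to check with ordinary integer arithmetic. In this sketch `align_down_16` and `bytes_dropped` are made-up helper names and a `uintptr_t` stands in for the stack pointer; it assumes nothing beyond clearing the bottom four bits, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Clear the bottom four bits, the same thing the mask does to the
   stack pointer in _start. The result is always a multiple of 16. */
static uintptr_t align_down_16(uintptr_t sp)
{
    return sp & ~(uintptr_t)0xF;
}

/* How far down the mask moved the pointer: always between 0 and 15. */
static unsigned bytes_dropped(uintptr_t sp)
{
    return (unsigned)(sp - align_down_16(sp));
}
```

For example, a hypothetical stack pointer of 0xbffff3cf becomes 0xbffff3c0, 15 bytes lower, while one that is already a multiple of 16 doesn't move at all.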
Now set up for calling __libc_start_main
So now we start pushing arguments for onto the stack. The first one is garbage pushed onto the stack just because 7 things are going to be pushed on the stack and they needed an 8th one to keep the 16-byte alignment. It's never used for anything. is linked in from glibc. In the source tree for glibc, it lies in csu/libc-start.c. is specified like
So we expect _start to push those arguments on the stack in reverse order before the call to __libc_start_main.
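The padding word is just arithmetic: on 32-bit x86 each push stores one 4-byte word, and _start has already rounded the stack down to a multiple of 16. A small hypothetical helper makes the count explicit. Seven words (28 bytes) would leave the stack misaligned, while eight (32 bytes) keep it aligned.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    WORD_SIZE   = 4,   /* bytes per push on 32-bit x86 */
    STACK_ALIGN = 16   /* alignment _start established with its mask */
};

/* True if pushing 'nwords' words onto a 16-byte-aligned stack leaves
   the stack pointer 16-byte aligned again. */
static bool pushes_keep_alignment(int nwords)
{
    return (nwords * WORD_SIZE) % STACK_ALIGN == 0;
}
```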
Stack contents just before call of __libc_start_main
is linked into our code from glibc, and lies in the source tree in csu/elf-init.c. It's our program's C level destructor, and I'll look at it later in the white paper.
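The entry-time layout the loader leaves at the stack pointer can be modeled with an ordinary array. On Linux it is argc, then the argv pointers followed by a NULL, then the environment pointers followed by another NULL (the ELF auxiliary vector comes after that). In this sketch `walk_initial_stack` and `vector_length` are hypothetical helpers, and the array is a made-up stand-in for the real stack, with argc stored in the first slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What startup code can recover from the entry-time block. */
struct startup_view {
    long    argc;
    char  **argv;
    char  **envp;
};

/* 'sp' plays the stack pointer at _start. Assumed layout:
   argc, argv[0..argc-1], NULL, envp[0..], NULL. */
static struct startup_view walk_initial_stack(char **sp)
{
    struct startup_view v;
    assert(sp != NULL);
    v.argc = (long)(uintptr_t)sp[0];  /* argc occupies the first word */
    v.argv = sp + 1;                  /* argv starts in the next word */
    v.envp = v.argv + v.argc + 1;     /* envp starts just past argv's NULL */
    return v;
}

/* Counts the entries in a NULL-terminated vector such as argv or envp. */
static size_t vector_length(char **vec)
{
    size_t n = 0;
    while (vec[n] != NULL)
        n++;
    return n;
}
```

With a fake block of { 2, "prog1", "-v", NULL, "HOME=/root", "TERM=xterm", NULL }, envp lands at index 4, right after argv's NULL.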
Hey! Where are the environment variables?
Did you notice that we didn't get envp, the pointer to our environment variables, off the stack? It's not one of the arguments to , either. But we know that is called so what's up?
Well, calls , who immediately uses secret inside information to find the environment variables just after the terminating null of the argument vector, and then sets a global variable which uses thereafter whenever it needs it, including when it calls . After the is established, __libc_start_main uses the same trick and, surprise! Just past the terminating null at the end of the envp array, there's another vector: the ELF auxiliary vector the loader uses to pass some information to the process. An easy way to see what's in there is to set the environment variable before running the program. Here's the result for our prog1.
Isn't that interesting? All sorts of information. The AT_ENTRY is the address of _start, and there's our userid, our effective userid, and our groupid. We know we're a 686, and times() counts at a frequency of 100 clock ticks per second (that's AT_CLKTCK). The AT_PHDR is the location of the ELF program headers, which have information about the location of all the segments of the program in memory, about relocation entries, and anything else a loader needs to know. AT_PHENT is just the number of bytes in one program header entry. We won't chase down this path just now, since we don't need that much information about the loading of a file to be an effective program debugger.
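The same entries can be read from inside a running program. glibc 2.16 and later provide `getauxval()` in `<sys/auxv.h>`, which looks up one auxiliary-vector entry by its AT_ type. The sketch below collects the entries discussed above; the struct and the helpers `read_auxv_sample` and `agrees_with_sysconf` are hypothetical names, and it assumes Linux with glibc.

```c
#include <assert.h>
#include <sys/auxv.h>   /* getauxval() and the AT_* constants */
#include <unistd.h>     /* sysconf() */

struct auxv_sample {
    unsigned long entry;   /* AT_ENTRY: address of _start */
    unsigned long phdr;    /* AT_PHDR: program headers in memory */
    unsigned long phent;   /* AT_PHENT: bytes per program header entry */
    unsigned long uid;     /* AT_UID */
    unsigned long euid;    /* AT_EUID */
    unsigned long gid;     /* AT_GID */
    unsigned long clktck;  /* AT_CLKTCK: ticks per second counted by times() */
    unsigned long pagesz;  /* AT_PAGESZ: system page size */
};

static struct auxv_sample read_auxv_sample(void)
{
    struct auxv_sample s;
    s.entry  = getauxval(AT_ENTRY);
    s.phdr   = getauxval(AT_PHDR);
    s.phent  = getauxval(AT_PHENT);
    s.uid    = getauxval(AT_UID);
    s.euid   = getauxval(AT_EUID);
    s.gid    = getauxval(AT_GID);
    s.clktck = getauxval(AT_CLKTCK);
    s.pagesz = getauxval(AT_PAGESZ);
    return s;
}

/* The C library takes its idea of clock ticks and page size from these
   same entries, so the values should agree with sysconf(). */
static int agrees_with_sysconf(const struct auxv_sample *s)
{
    return s->clktck == (unsigned long)sysconf(_SC_CLK_TCK)
        && s->pagesz == (unsigned long)sysconf(_SC_PAGESIZE);
}
```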
__libc_start_main in general
That's about as much as I'm going to get into the nitty-gritty details of how , but in general, it
• Takes care of some security problems with setuid/setgid programs
• Starts up threading
• Registers the (our program) and (run-time loader) arguments to get run by to run the program's and the loader's cleanup routines
• Calls the argument
• Calls the with the and arguments passed to it and with the global __environ argument as detailed above.
• Calls with the return value of main
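To make the order of those steps concrete, here is a toy model. It is not glibc's code: `toy_start_main`, the event log and the sample functions are hypothetical, the security and threading steps are left out, and it returns main's status instead of calling exit() so the result can be inspected.

```c
#include <assert.h>
#include <string.h>

typedef int  (*main_fn)(int, char **, char **);
typedef void (*hook_fn)(void);

/* Hypothetical event log so the order of steps is visible. */
static char event_log[128];

static void log_event(const char *what)
{
    strncat(event_log, what, sizeof event_log - strlen(event_log) - 1);
}

/* Toy model of the steps listed above. The real routine registers fini
   for exit-time cleanup and ends by calling exit(status); this model
   only logs those two steps and returns the status. */
static int toy_start_main(main_fn main_ptr, int argc, char **argv,
                          char **envp, hook_fn init, hook_fn fini)
{
    assert(main_ptr != NULL && init != NULL);
    (void)fini;                   /* real code: registered for exit time */
    log_event("register-fini ");
    init();                       /* the init argument: C level initializers */
    int status = main_ptr(argc, argv, envp);
    log_event("exit ");           /* real code: exit(status) */
    return status;
}

/* Sample stand-ins for the program's pieces. */
static void sample_init(void) { log_event("init "); }
static void sample_fini(void) { log_event("fini "); }

static int sample_main(int argc, char **argv, char **envp)
{
    (void)argv;
    (void)envp;
    log_event("main ");
    return argc;
}
```

Calling `toy_start_main(sample_main, 1, args, env, sample_init, sample_fini)` leaves "register-fini init main exit " in the log and returns 1.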
Calling the argument
The argument, to , is set to which is also linked into our code. It's compiled from a C program which lies in the glibc source tree in csu/elf-init.c and linked into our program. The C code is similar to (but with a lot more #ifdefs):
This is our program's constructor
It's pretty important to our program because it's our executable's constructor. "Wait!", you say, "This isn't C++!" Yes, that's true, but the concept of constructors and destructors doesn't belong to C++, and it preceded C++! Our executable, and every other executable, gets a C level constructor and a C level destructor, . Inside the constructor, as you'll see, the executable will look for global C level constructors and call any that it finds. It's possible for a C program to also have these, and I'll demonstrate it before this paper is through. If it makes you more comfortable, though, you can call them initializers and finalizers. Here's the assembler generated for .
Not much to talk about here, but I thought you'd want to see it.
What the heck is a thunk?
The get_pc_thunk thing is a little interesting. It's used for position independent code (PIC). For PIC to work, the base register (%ebx on 32-bit x86) needs to hold the address of the _GLOBAL_OFFSET_TABLE_. The code had something like:
So, look closely at what happens. The call to , like all other calls, pushes onto the stack the address of the next instruction, so that when we return, the execution continues at the next consecutive instruction. In this case, what we really want is that address. So in , we copy the return address from the stack into . When we return, the next instruction adds _GLOBAL_OFFSET_TABLE_ to it, which resolves to the difference between the current address and the global offset table used by position independent code. That table keeps a set of pointers to data that we want to access, and we just have to know offsets into the table. The loader fixes up the addresses in the table for us. There is a similar table, the procedure linkage table (PLT), for calling procedures. It could be really tedious to program this way in assembler, but you can just write C or C++ and pass the -fpic (or -fPIC) flag to the compiler and it will do it automagically. Seeing this code in the assembler tells you that the source code was compiled with -fpic or -fPIC.
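The table-of-pointers idea can be modeled in plain C. This is a toy, not how the toolchain actually lays out the global offset table: `toy_got`, the slot numbers and `toy_loader_fixup` are all made up. It shows the point made above: the code only knows a slot number (an offset into the table), and something else, here a stand-in for the loader, fills in the real addresses at run time.

```c
#include <assert.h>

/* Data the code wants to reach. */
static int  shared_counter = 41;
static char greeting[] = "hello";

/* Toy global offset table: the code knows only these slot numbers. */
enum { SLOT_COUNTER, SLOT_GREETING, TOY_GOT_SIZE };
static void *toy_got[TOY_GOT_SIZE];

/* Stand-in for the loader: store the real addresses in the table. */
static void toy_loader_fixup(void)
{
    toy_got[SLOT_COUNTER]  = &shared_counter;
    toy_got[SLOT_GREETING] = greeting;
}

/* Code that reaches its data only through the table. */
static int bump_counter(void)
{
    int *counter = toy_got[SLOT_COUNTER];
    return ++*counter;
}

static const char *get_greeting(void)
{
    return toy_got[SLOT_GREETING];
}
```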
But what is that loop?
The loop from will be discussed in a minute, after we discuss the init() call that really calls . For now, just remember that it calls any C level initializers for our program.
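In outline, the routine being described calls `_init()` first and then walks an array of function pointers, passing each one argc, argv and envp. A toy model of that shape follows. `toy_csu_init` and the sample functions are hypothetical, and glibc's real code has more cases (for example, static executables also handle a pre-init array).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*init_array_fn)(int, char **, char **);

/* Hypothetical log of the order things ran in. */
static char order_log[64];
static int  last_argc = -1;

/* Toy model: run _init(), then every entry of the array, in order. */
static void toy_csu_init(int argc, char **argv, char **envp,
                         void (*init)(void),
                         init_array_fn *array_start, init_array_fn *array_end)
{
    assert(array_start <= array_end);
    init();                                   /* where the constructors run */
    size_t size = (size_t)(array_end - array_start);
    for (size_t i = 0; i < size; i++)
        array_start[i](argc, argv, envp);     /* same arguments as main */
}

static void fake_init(void) { strcat(order_log, "_init "); }

static void first_entry(int argc, char **argv, char **envp)
{
    (void)argv;
    (void)envp;
    last_argc = argc;
    strcat(order_log, "first ");
}

static void second_entry(int argc, char **argv, char **envp)
{
    (void)argc;
    (void)argv;
    (void)envp;
    strcat(order_log, "second ");
}
```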
_init gets the call
Ok, the loader handed control to , who called , who called , who now calls .
It starts with the regular C calling convention
If you want to know more about the C calling convention, just look at Basic Assembler Debugging with GDB. The short story is that we save our caller's base pointer on the stack, point our base pointer at the top of the stack, and then save space for a 4 byte local of some sort. An interesting thing is the first call. Its purpose is quite similar to that call to get_pc_thunk that we saw earlier. If you look closely, the call is to the next sequential address! That gets you to the next address as if you'd just continued, but with the side effect that the address is now on the stack. It gets popped into %ebx and then used to set up for access to the global offset table.
Show me your best profile
Then we grab the address of . If it's zero, then we don't call it; instead we jump past it. Otherwise, we call it to set up profiling. It runs a routine to start profiling, and calls atexit to schedule another routine to run later to write gmon.out at the end of execution.
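That "is the address zero?" test is the usual weak-symbol idiom: a reference declared weak that nothing defines resolves to address 0 at link time instead of being an error, so the code can check before calling. A minimal sketch, assuming GCC or Clang's `weak` attribute; `optional_profiling_hook` is a hypothetical name that this file deliberately never defines.

```c
#include <assert.h>
#include <stddef.h>

/* Declared weak and never defined: the linker resolves its address to 0. */
extern void optional_profiling_hook(void) __attribute__((weak));

/* Call the hook only if some object file actually provided it.
   Returns 1 if it was called, 0 if it was skipped. */
static int maybe_start_profiling(void)
{
    if (optional_profiling_hook == NULL)
        return 0;
    optional_profiling_hook();
    return 1;
}
```

Linking in another object file that defines `optional_profiling_hook` would make the same code call it, with no change to this file.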
This guy's no dummy! He's been framed!
In either case, next we call frame_dummy. The intention is to call __register_frame_info, but frame_dummy is called to set up the arguments to it. The purpose of this is to set up for unwinding stack frames for exception handling. It's interesting, but not a part of this discussion, so I'll leave it for another tutorial perhaps. (Don't be too disappointed; in our case, it doesn't get run anyway.)
Finally we're getting constructive!
Finally we call __do_global_ctors_aux. If you have a problem with your program that occurs before main starts, this is probably where you'll need to look. Of course, constructors for global C++ objects are put in here, but it's possible for other things to be in here as well.
Let's set up an example
Let's modify our prog1 and make a prog2. The exciting part is the that tells gcc that the linker should stick a pointer to this in the table used by . As you can see, our fake constructor gets run. (__FUNCTION__ is filled in by the compiler with the name of the function. It's gcc magic.)
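A minimal variation you can build yourself, assuming GCC or Clang: instead of printing, this hypothetical `a_constructor` records that it ran and what its name is, so code in main can check that it already happened.

```c
#include <assert.h>
#include <string.h>

static int         ctor_ran  = 0;
static const char *ctor_name = "";

/* The constructor attribute asks the compiler to run this before main.
   Older toolchains used the .ctors table walked through in this paper;
   newer ones typically use .init_array instead. */
__attribute__((constructor))
static void a_constructor(void)
{
    ctor_ran  = 1;
    ctor_name = __func__;   /* standard C99 name; gcc also accepts __FUNCTION__ */
}
```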
prog2's _init, much the same as prog1
In a minute we'll drop into gdb and see it happen. We'll be going into prog2's _init.
As you can see, the addresses are slightly different than in prog1. The extra bit of data seems to have shifted things 28 bytes. So, there are the names of the two functions, "a_constructor" (14 bytes with null terminator) and "main" (5 bytes with null terminator), and the two format strings, "%s\n" (4 bytes each, counting the newline as 1 character plus the null terminator), so 14 + 5 + 4 + 4 = 27? Hmmm, off by one somewhere. It's just a guess anyway; I didn't go and look. Anyway, we're going to break on the call to __do_global_ctors_aux, and then single step and watch what happens.
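The byte count in that guess can be checked with `sizeof` on the string literals, which includes the terminating NUL. The helper name is hypothetical, and it assumes, as the guess does, that the two "%s\n" strings are stored separately (a compiler that merges identical literals would store only one).

```c
#include <assert.h>
#include <stddef.h>

/* sizeof on a string literal counts its characters plus the NUL. */
static size_t prog2_string_bytes(void)
{
    size_t ctor_name = sizeof "a_constructor";  /* 13 chars + NUL = 14 */
    size_t main_name = sizeof "main";           /*  4 chars + NUL =  5 */
    size_t format    = sizeof "%s\n";           /* '%', 's', '\n' + NUL = 4 */
    return ctor_name + main_name + 2 * format;  /* two copies of the format */
}
```

It returns 14 + 5 + 4 + 4 = 27, matching the hand count.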
And here's the code that will get called
Just to help, here's the C source code for out of the gcc source code, where it lives in a file .
As you can see, it initializes from a global variable and subtracts 1 from it. Remember this is pointer arithmetic, though, and the pointer points at a function pointer, so in this case that -1 backs it up one function pointer, or 4 bytes. We'll see that in the assembler as well. While the function pointer it points at isn't -1 (cast to a pointer), we'll call the function we're pointing at, and then back the pointer up again. Obviously, the beginning of this table starts with -1, and then has some number (perhaps 0) of function pointers.
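The loop's shape can be reproduced with a hand-made table. This is a hypothetical stand-in: `toy_ctor_table`, `toy_do_global_ctors` and the two sample functions are made up, and the end slot that the loop starts just before is an assumption of this sketch. The -1 marker at the start matches the description above. Walking backwards means the entry listed last runs first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef void (*func_ptr)(void);

/* Hypothetical record of which constructors ran, in order. */
static char ran[8];
static void ctor_a(void) { strcat(ran, "a"); }
static void ctor_b(void) { strcat(ran, "b"); }

/* Stand-in table: a -1 marker at the start, the constructors, and an
   end slot that the loop starts one pointer before. */
static func_ptr toy_ctor_table[] = {
    (func_ptr)(intptr_t)-1,
    ctor_a,
    ctor_b,
    (func_ptr)0,
};

/* Start one function pointer before 'end' and walk backwards,
   calling each entry, until the -1 marker is reached. */
static void toy_do_global_ctors(func_ptr *end)
{
    assert(end != NULL);
    for (func_ptr *p = end - 1; *p != (func_ptr)(intptr_t)-1; p--)
        (*p)();
}
```

Passing `&toy_ctor_table[3]` runs ctor_b and then ctor_a, leaving "ba" in the record.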
Here's the same in assembler
Here's the assembler that corresponds to it from objdump -d. We'll go over it carefully so you understand it completely before we trace through it in the debugger.
First the preamble
There's the normal preamble with the addition of saving as well, because we're going to use it in the function, and we also save room for the pointer . You'll notice that even though we save room on the stack for it, we never store it there. will instead live in , and will live in .
Now set up before the loop
It looks like an optimization has occurred: instead of loading and then subtracting 1 from it and dereferencing it, we go ahead and load , which is the immediate value . We load the value in it (remember would mean put that value; without the $, it just means the contents of that address) into %eax. Immediately, we compare this first value with -1, and if it's equal, we're done and jump to address , where we clean up our stack, pop off the things we've saved on there and return.
Assuming that there's at least one thing in the function table, though, we also move the immediate value into , which is our function pointer, and then do the . What the heck is that?
Well, grasshopper, that is what they use for a nop (No Operation) in 16 or 32 bit x86. It does nothing but take a cycle and some space. In this case, it's used to make the loop (the top of the loop is the subtract on the next line) start on instead of . The advantage of that is that it aligns the start of the loop on a 4 byte boundary and gives a better chance that the whole loop will fit in a cache line instead of being broken across two. It speeds things up.
And now we hit the top of the loop
Next we subtract 4 from to be ready for the next time through the loop, call the function we've got the address of in , move the next function pointer into , and compare it to -1. If it's not -1, we jump back up to the subtract and loop again.
And finally the epilogue
Otherwise we fall through into our function epilogue and return to , which immediately falls through into its epilogue and returns to . Bet you forgot all about him. There's still a loop to deal with there, but first...
I promised you we'd go into the debugger with prog2!
So here we go! Remember that gdb always shows you the line or instruction that you are about to execute.
We ran it in the debugger, turned on, so that it will always show us the disassembly for the line of code that is about to be executed, and set a breakpoint at the line in where we're about to call .
I typed r to run the program and hit the breakpoint. My next command to gdb was , step instruction, to tell gdb to single step one instruction. We've now entered . As we go along you'll see times when it seems that I entered no command to gdb. That's because if you simply press return, gdb will repeat the last command. So if I press enter now, I'll do another si.
Ok, now we've finished the preamble, and the real code is about to start.
I was curious after loading the pointer, so I told gdb , which means print as hexadecimal the contents of the register . It's not -1, so we can assume that we'll continue through the loop. Now, since my last command was the print, I can't hit enter to get an si; I'll have to type it the next time.
Now this is very interesting. We've single stepped into the call. Now we're in our function, . Since gdb has the source code for it, it shows us the C source for the next line. Since I turned on , it will also give us the assembler that corresponds to that line. In this case, it's the preamble for the function that corresponds to the declaration of the function, so we get all three lines of the preamble. Isn't that interesting? Now I'm going to switch over to the command n (for next) because our printf is coming up. The first n will skip the preamble, the second the printf, and the third the epilogue. If you've ever wondered why you have to do an extra step at the beginning and end of a function when single stepping with gdb, now you know the answer.
We moved the address of the string "a_constructor" onto the stack as an argument for , but it calls since the compiler was smart enough to see that was all we needed.
Since we're tracing the program, it is, of course, running, so we see print out above. The closing brace (}) corresponds to the epilogue, so that prints out now. Just a note: if you don't know about the instruction, it does exactly the same as
One more step and we exit the function and return. I'll have to switch back to si.
Got curious and checked again. This time, our function pointer is -1, so we'll exit the loop.
Notice we're back in now.
Notice we jumped back up into , and that's when I typed q to quit the debugger. That's all the debugging I promised you. Now that we're back in __libc_csu_init, there's another loop to deal with, and I'm not going to step through it, but I am about to talk about it.
Back up to
Since we've spent a long, tedious time dealing with a loop in assembler, and the assembler for this one is even more tedious, I'll leave it to you to figure it out if you want. Just to remind you, here it is in C.
Here's another function call loop
What is this init_array? I thought you'd never ask. You can have code run at this stage as well. Since this is just after returning from running , which ran our constructors, that means anything in this array will run after constructors are done. You can tell the compiler you want a function to run at this phase. The function will receive the same arguments as main.
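With GCC on Linux, one way to get a function into that array directly is to place a pointer to it in the `.init_array` section. A minimal sketch under that assumption; `record_startup_args` and its entry variable are hypothetical names. glibc calls each entry with argc, argv and envp, which is what lets the function see the same arguments as main.

```c
#include <assert.h>
#include <stddef.h>

static int    seen_argc = -1;
static char **seen_argv = NULL;

/* Called at startup with the same arguments main receives. */
static void record_startup_args(int argc, char **argv, char **envp)
{
    (void)envp;
    seen_argc = argc;
    seen_argv = argv;
}

/* Place a pointer to the function in .init_array; 'used' stops the
   compiler from discarding this otherwise unreferenced variable. */
__attribute__((section(".init_array"), used))
static void (*record_startup_args_entry)(int, char **, char **) =
    record_startup_args;
```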
We won't do it yet, because there are more things like that. Let's just return from . Do you remember where that will take us?
We'll be all the way back in
He calls our main now, and then passes the result to exit().
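Since exit() is where atexit() handlers run, one detail is worth checking for yourself: the C standard specifies that functions registered with atexit() run in the reverse order of their registration, last registered first. Because those functions only run as the process exits, this sketch forks a child (so it assumes a POSIX system such as Linux) that registers two handlers and calls exit(), and the handlers report back through a pipe. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;            /* pipe write end, set in the child */

static void report(const char *tag)
{
    ssize_t ignored = write(report_fd, tag, strlen(tag));
    (void)ignored;
}

static void registered_first(void)  { report("first "); }
static void registered_second(void) { report("second "); }

/* Fork a child that registers two atexit() handlers and calls exit(),
   then collect what the handlers wrote, in the order they ran.
   Returns 0 on success, -1 on error. */
static int atexit_run_order(char *out, size_t outsz)
{
    int fds[2];
    assert(out != NULL && outsz > 0);
    if (pipe(fds) != 0)
        return -1;
    fflush(NULL);                     /* so the child can't re-flush our buffers */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                   /* child */
        close(fds[0]);
        report_fd = fds[1];
        atexit(registered_first);
        atexit(registered_second);
        exit(0);                      /* the handlers run here */
    }
    close(fds[1]);                    /* parent: read until the child is gone */
    size_t n = 0;
    ssize_t r;
    while (n < outsz - 1 && (r = read(fds[0], out + n, outsz - 1 - n)) > 0)
        n += (size_t)r;
    out[n] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return 0;
}
```

It reports "second first ": the handler registered last runs first.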
exit() runs some more loops of functions
exit() runs the functions registered with atexit, in the reverse of the order they were added. Then he runs another loop of functions, this time functions in the fini array. After that he runs another loop of functions, this time destructors. (In reality, he's in a nested loop dealing with an array of lists of functions, but trust me, this is the order they come out in.) Here, I'll show you.
This program, hooks.c, ties it all together
If you build and run this, (I call it hooks.c), the output is
The End
I'll give you a last look at how far we've come. This time it should all be familiar territory to you.