M. D. Jones, Ph.D.
Center for Computational Research, University at Buffalo, State University of New York
Introduction
The most common method for debugging (by far) is the instrumentation method: one instruments the code with print statements to check values and follow the execution of the program. Not exactly sophisticated - one can certainly debug code this way, but wise use of software debugging tools can be much more effective.
Debugging Tools
Debugging tools are abundant, but we will focus on some of the most common capabilities, to give you a bag of tricks that can be used when dealing with common problems.
Basic Capabilities
Common attributes:
Divided into command-line and graphical user interfaces
Usually you have to recompile your code to utilize most debugger features (-g is almost a standard option to enable debugging)
Invocation is by name of debugger and executable (e.g., gdb ./a.out [core])
Running Within
Inside a debugger (be it a command-line interface (CLI) or a graphical front-end), you have some very handy abilities:
Look at a source code listing (very handy when isolating an IEEE exception)
Line-by-line execution
Insert stops, or breakpoints, at certain functional points (i.e., when critical values change)
Monitor variable values
Look at the stack trace (or backtrace) when the code crashes
/* Print the elements of array1 */
printf("array1 = ");
printArray(nelem, array1);

/* Copy array1 to array2 */
array2 = array1;

/* Pass array2 to the function squareArray() */
squareArray(nelem, array2);

/* Compute difference between elements of array2 and array1 */
for (indx = 0; indx < nelem; indx++) {
    del[indx] = array2[indx] - array1[indx];
}

/* Print the computed differences */
printf("The difference in the elements of array2 and array1 are: ");
printArray(nelem, del);

free(array1);
free(array2);
free(del);
return 0;
}
void initArray(const int nelem_in_array, int *array)
{
    int indx;
    for (indx = 0; indx < nelem_in_array; indx++) {
        array[indx] = indx + 1;
    }
}

int *squareArray(const int nelem_in_array, int *array)
{
    int indx;
    for (indx = 0; indx < nelem_in_array; indx++) {
        array[indx] = array[indx] * array[indx];
    }
    return array;
}

void printArray(const int nelem_in_array, int *array)
{
    int indx;
    printf("\n( ");
    for (indx = 0; indx < nelem_in_array; indx++) {
        printf("%d ", array[indx]);
    }
    printf(")\n");
}
Not exactly what we expect, is it? array2 should contain the squares of the values in array1, and therefore the difference should be i^2 - i for i = 2, ..., 11.
Now let us run the code from within gdb. Our goal is to set a breakpoint where the squared array elements are computed, then step through the code:
[bono:~/d_debug]$ gdb arrayex
(gdb) l 34
31
32          /* Copy array1 to array2 */
33          array2 = array1;
34
35          /* Pass array2 to the function squareArray() */
36          squareArray(nelem, array2);
37
38          /* Compute difference between elements of array2 and array1 */
39          for (indx = 0; indx < nelem; indx++) {
40              del[indx] = array2[indx] - array1[indx];
(gdb) b 34
Breakpoint 1 at 0x400604: file ex1.c, line 34.
(gdb) run
Starting program: /san/user/jonesm/u2/d_debug/arrayex
array1 = ( 2 3 4 5 6 7 8 9 10 11 )
Breakpoint 1, main () at arrayex.c:34
34          squareArray(nelem, array2);
(gdb) s
squareArray (nelem_in_array=10, array=0x501010) at arrayex.c:59
59          for (indx = 0; indx < nelem_in_array; indx++) {
(gdb) s
60              array[indx] = array[indx] * array[indx];
(gdb) s
59          for (indx = 0; indx < nelem_in_array; indx++) {
(gdb) p indx
$1 = 0
(gdb) p array[indx]
$2 = 4
(gdb) display indx
1: indx = 0
(gdb) display array[indx]
2: array[indx] = 4
(gdb) s
60              array[indx] = array[indx] * array[indx];
2: array[indx] = 3
1: indx = 1
So, what have we learned so far about the command-line debugger:
(l) list the source code - useful for peeking inside the source (very handy when isolating a problem)
(break) set breakpoints
(s) step through execution line by line
(p) print values at selected points (can also use handy printf syntax as in C)
(display) display values for monitoring while stepping through code
(bt) backtrace, or stack trace - we haven't used this yet, but certainly will
");
Breakpoint 1, main () at arrayex.c:37
37          for (indx = 0; indx < nelem; indx++) {
(gdb) disp indx
1: indx = 10
(gdb) disp array1[indx]
2: array1[indx] = 49
(gdb) disp array2[indx]
3: array2[indx] = 49
(gdb) s
38              del[indx] = array2[indx] - array1[indx];
3: array2[indx] = 4
2: array1[indx] = 4
1: indx = 0
(gdb) s
37          for (indx = 0; indx < nelem; indx++) {
3: array2[indx] = 4
2: array1[indx] = 4
1: indx = 0
(gdb) s
38              del[indx] = array2[indx] - array1[indx];
3: array2[indx] = 9
2: array1[indx] = 9
1: indx = 1
Now that isn't right - array1 was not supposed to change. Let us go back and look more closely at the call to squareArray ...
(gdb) l
32
33          /* Pass array2 to the function squareArray() */
34          squareArray(nelem, array2);
35
36          /* Compute difference between elements of array2 and array1 */
37          for (indx = 0; indx < nelem; indx++) {
38              del[indx] = array2[indx] - array1[indx];
39          }
40
41          /* Print the computed differences */
(gdb) b 34
Breakpoint 2 at 0x400605: file arrayex.c, line 34.
(gdb) run
The program being debugged has been started already. Start it from the beginning? (y or n) y
Starting program: /san/user/jonesm/u2/d_debug/arrayex
array1 = ( 2 3 4 5 6 7 8 9 10 11 )

Breakpoint 2, main () at arrayex.c:34
34          squareArray(nelem, array2);
3: array2[indx] = 49
2: array1[indx] = 49
1: indx = 10
(gdb) disp array2
4: array2 = (int *) 0x501010
(gdb) disp array1
5: array1 = (int *) 0x501010
Yikes, array1 and array2 point to the same memory location! See, pointer errors like this don't happen too often in Fortran ... Now, of course, the bug is obvious - but aren't they all obvious after you find them?
The Fix Is In
Just as an afterthought, what we ought to have done in the first place was copy array1 into array2 element by element:
/* Copy array1 to array2 */
/* array2 = array1; */
for (indx = 0; indx < nelem; indx++) {
    array2[indx] = array1[indx];
}
Array indexing errors are among the most common errors in both sequential and parallel codes - and that is not entirely surprising:
Different languages have different indexing defaults
Multi-dimensional arrays are pretty easy to reference out-of-bounds
Fortran in particular lets you use very complex indexing schemes (essentially arbitrary!)
Ok, that hardly seems reasonable (does it?). Now, let's run this example from within gdb and set a breakpoint to examine the accumulation of values into even_sum.
(gdb) l 16
11          arr[i] = (i*i)%5;
12        }
13      }
14      odd_sum = 0;
15      even_sum = 0;
16      for (i = 0; i < (N-1); ++i) {
17        if (i%2 == 0) {
18          even_sum += arr[i];
19        } else {
20          odd_sum += arr[i];
(gdb) b 16
Breakpoint 1 at 0x40051e: file ex2.c, line 16.
(gdb) run
Starting program: /san/user/jonesm/u2/d_debug/ex2

Breakpoint 1, main (argc=Variable "argc" is not available.
) at ex2.c:16
16      for (i = 0; i < (N-1); ++i) {
(gdb) p arr
$1 = {671173696, 1, 1, 0, 1, 0, 1, 4, 4, 0}
So we see that our original example code missed initializing the first element of the array, and the results were rather erratic (in fact they will likely be compiler- and flag-dependent). Initialization is just one aspect of things going wrong with array indexing - let us examine another common problem ...
This example I borrowed from Norman Matloff (UC Davis), who has a nice article (well worth the time to read), Guide to Faster, Less Frustrating Debugging, which you can find easily enough on the web: http://heather.cs.ucdavis.edu/~matloff/unix.html
Function CheckPrime:
void CheckPrime(int K)
{
    int J;

    /* the plan: see if J divides K, for all values J which are
       (a) themselves prime (no need to try J if it is nonprime), and
       (b) less than or equal to sqrt(K) (if K has a divisor larger
           than this square root, it must also have a smaller one,
           so no need to check for larger ones) */

    J = 2;
    while (1) {
        if (Prime[J] == 1)
            if (K % J == 0) {
                Prime[K] = 0;
                return;
            }
        J++;
    }

    /* if we get here, then there were no divisors of K,
       so it is prime */
    Prime[K] = 1;
}
Now, the scanf library function is probably pretty safe from internal bugs, so the error is likely coming from our usage:
(gdb) l 16
16          scanf("%d", UpperBound);
17
18          Prime[2] = 1;
19
20          for (N = 3; N <= UpperBound; N += 2)
21              CheckPrime(N);
22          if (Prime[N]) printf("%d is a prime\n", N);
23      }
24
25      void CheckPrime(int K)
Adding the missing & (scanf("%d", &UpperBound)) takes care of the first bug ... but let's keep running from within gdb.
[bono:~/d_debug]$ gcc -g -o findprimes findprimes.c
[bono:~/d_debug]$ gdb findprimes
(gdb) run
Starting program: /san/user/jonesm/u2/d_debug/findprimes
enter upper bound
20

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400586 in CheckPrime (K=3) at findprimes.c:37
37          if (Prime[J] == 1)
(gdb) bt
#0  0x0000000000400586 in CheckPrime (K=3) at findprimes.c:37
#1  0x0000000000400547 in main () at findprimes.c:21
(gdb) l 37
32             than this square root, it must also have a smaller one,
33             so no need to check for larger ones) */
34
35          J = 2;
36          while (1) {
37              if (Prime[J] == 1)
38                  if (K % J == 0) {
39                      Prime[K] = 0;
40                      return;
41                  }
(gdb)
Very often we get segmentation faults when trying to reference an array out of bounds, so have a look at the value of J:
(gdb) l 37
32             than this square root, it must also have a smaller one,
33             so no need to check for larger ones) */
34
35          J = 2;
36          while (1) {
37              if (Prime[J] == 1)
38                  if (K % J == 0) {
39                      Prime[K] = 0;
40                      return;
41                  }
(gdb) p J
$1 = 376
Oops! That is just a tad outside the array bounds (50). We kind of forgot to put a cap on the value of J ...
Ok, so now we will set a couple of breakpoints - one at the call to CheckPrime, and a second where a successful prime is to be printed:
(gdb) l 16
16          scanf("%d", &UpperBound);
17
18          Prime[2] = 1;
19
20          for (N = 3; N <= UpperBound; N += 2)
21              CheckPrime(N);
22          if (Prime[N]) printf("%d is a prime\n", N);
23      }
24
25      void CheckPrime(int K)
(gdb) b 20
Breakpoint 1 at 0x40052d: file findprimes.c, line 20.
(gdb) b 22
Breakpoint 2 at 0x400550: file findprimes.c, line 22.
(gdb) run
Starting program: /san/user/jonesm/u2/d_debug/findprimes
enter upper bound
20

Breakpoint 1, main () at findprimes.c:20
20          for (N = 3; N <= UpperBound; N += 2)
(gdb) c
Continuing.

Breakpoint 2, main () at findprimes.c:22
22          if (Prime[N]) printf("%d is a prime\n", N);
(gdb) p N
$1 = 21
(gdb)
Ah, the sweet taste of success ... (even better, give the program a return code!)
M. D. Jones, Ph.D. (CCR/UB) Debugging in Serial & Parallel HPC-I Fall 2012 37 / 89
Game of Life
Well, ok, not exactly debugging life itself; rather the game of life. Mathematician John Horton Conway's game of life¹, to be exact. This example will be much like the prior ones, but now we will work in Fortran and debug some integer arithmetic errors. And the context will be slightly more interesting.
¹ see, for example, Martin Gardner's article in Scientific American, 223, pp. 120-123 (1970).
The Game of Life is one of the better known examples of cellular automata (CA), namely discrete models with a finite number of states, often used in theoretical biology, game theory, etc. The rules are actually pretty simple, and can lead to some rather surprising self-organizing behavior. The universe in the game of life:
The universe is an infinite 2D grid of cells, each of which is alive or dead
Cells interact only with their nearest neighbors (including on the diagonals, which makes for eight neighbors)
Rules of Life
The rules in the game of life:
Any live cell with fewer than two neighbours dies, as if by loneliness
Any live cell with more than three neighbours dies, as if by overcrowding
Any live cell with two or three neighbours lives, unchanged, to the next generation
Any dead cell with exactly three neighbours comes to life
An initial pattern evolves by applying the above rules to the entire grid simultaneously, at each tick of the clock.
program life
!
! Conway game of life (debugging example)
!
implicit none
integer, parameter :: ni=1000, nj=1000, nsteps=100
integer :: i, j, n, im, ip, jm, jp, nsum, isum
integer, dimension(0:ni,0:nj) :: old, new
real :: arand, nim2, njm2
!
! initialize elements of "old" to 0 or 1
!
do j = 1, nj
   do i = 1, ni
      CALL random_number(arand)
      old(i,j) = NINT(arand)
   enddo
enddo
nim2 = ni - 2
njm2 = nj - 2
!
! time iteration
!
time_iteration: do n = 1, nsteps
   do j = 1, nj
      do i = 1, ni
         !
         ! periodic boundaries
         !
         im = 1 + (i+nim2) - ((i+nim2)/ni)*ni   ! if i=1,  im=ni
         ip = 1 + i - (i/ni)*ni                 ! if i=ni, ip=1
         jm = 1 + (j+njm2) - ((j+njm2)/nj)*nj   ! if j=1,  jm=nj
         jp = 1 + j - (j/nj)*nj                 ! if j=nj, jp=1
         !
         ! for each point, add surrounding values
         !
         nsum = old(im,jp) + old(i,jp) + old(ip,jp) &
              + old(im,j )             + old(ip,j ) &
              + old(im,jm) + old(i,jm) + old(ip,jm)
         !
         ! set new value based on number of "live" neighbors
         !
         select case (nsum)
         case (3)
            new(i,j) = 1
         case (2)
            new(i,j) = old(i,j)
         case default
            new(i,j) = 0
         end select
      enddo
   enddo
   !
   ! copy new state into old state
   !
   old = new
   print *, 'Tick ', n, ' number of living: ', sum(new)
enddo time_iteration
!
! write number of live points
!
print *, 'number of live points = ', sum(new)
end program life
[bono:~/d_debug]$ ifort -g -o life life.f90
[bono:~/d_debug]$ ./life
Tick 1 number of living: 0
Tick 2 number of living: 0
Tick 3 number of living: 0
Tick 4 number of living: 0
Tick 5 number of living: 0
Tick 6 number of living: 0
:
Tick 99 number of living: 0
Tick 100 number of living: 0
number of live points = 0
Hmm, everybody dies! What kind of life is that? ... well, not a correct one, in this context at least. Undoubtedly the problem lies within the neighbor calculation, so let us take a closer look at the execution ...
(gdb) l 30
25          do j = 1, nj
26             do i = 1, ni
27      !
28      ! periodic boundaries
29      !
30                im = 1 + (i+nim2) - ((i+nim2)/ni)*ni   ! if i=1,  im=ni
31                ip = 1 + i - (i/ni)*ni                 ! if i=ni, ip=1
32                jm = 1 + (j+njm2) - ((j+njm2)/nj)*nj   ! if j=1,  jm=nj
33                jp = 1 + j - (j/nj)*nj                 ! if j=nj, jp=1
(gdb) b 25
Breakpoint 1 at 0x402e23: file life.f90, line 25.
(gdb) run
Starting program: /san/user/jonesm/u2/d_debug/life

Breakpoint 1, life () at life.f90:25
25          do j = 1, nj
Current language: auto; currently fortran
(gdb) s
26             do i = 1, ni
(gdb) s
30                im = 1 + (i+nim2) - ((i+nim2)/ni)*ni   ! if i=1, im=ni
(gdb) s
31                ip = 1 + i - (i/ni)*ni                 ! if i=ni, ip=1
(gdb) print im
$1 = 1
(gdb) print (i+nim2)/1000
$2 = 0.999
Ok, so therein lay the problem - nim2 and njm2 should be integers, not real values ... fix that:
program life
!
! Conway game of life (debugging example)
!
implicit none
integer, parameter :: ni=1000, nj=1000, nsteps=100
integer :: i, j, n, im, ip, jm, jp, nsum, isum, nim2, njm2
integer, dimension(0:ni,0:nj) :: old, new
real :: arand
With integer nim2 and njm2, life now persists:

Tick  99 number of living: 95073
Tick 100 number of living: 94664
number of live points = 94664
Interesting repositories of Conway's life and cellular automata references:
http://www.radicaleye.com/lifepage
http://en.wikipedia.org/wiki/Conways_Game_of_Life
Core Files
Core files can also be used to analyze, after the fact, problems that caused a code failure severe enough to dump a core file. Often the computer system has been set up in such a way that the default is not to output core files, however:
[bono:~/d_debug]$ ulimit -c
0
for bash syntax. In tcsh you would use the limit built-in command to set the coredumpsize value:
[bono:~/d_debug]$ tcsh
[jonesm@bono ~/d_debug]$ limit coredumpsize
coredumpsize    unlimited
Systems administrators set the core file size limit to zero by default for a good reason - these files generally contain the entire memory image of an application process when it dies, and that can be very large. End-users are also notoriously bad about leaving these files lying around ... Having said that, we can raise the limit and produce a core file that can later be used for analysis.
Ok, so now we can use one of our previous examples and generate a core file:
[bono:~/d_debug]$ gcc -g -o findprimes_orig findprimes_orig.c
[bono:~/d_debug]$ ./findprimes_orig
enter upper bound
20
Segmentation fault (core dumped)
[bono:~/d_debug]$ ls -l core.7428
-rw------- 1 jonesm ccrstaff 65536 Sep 27 12:15 core.7428
This particular core file is not at all large (it is a very simple code, though, with very little stored data - generally the core file size will reflect the memory footprint of the application when it crashed). Analyzing it is pretty much like running the example live in gdb:
[bono:~/d_debug]$ gdb findprimes_orig core.7428
GNU gdb Red Hat Linux (6.3.0.0-1.143.el4rh)
...
Core was generated by `./findprimes_orig'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
#0  0x000000392815062f in _IO_vfscanf_internal () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x000000392815062f in _IO_vfscanf_internal () from /lib64/tls/libc.so.6
#1  0x000000392815866a in scanf () from /lib64/tls/libc.so.6
#2  0x0000000000400524 in main () at findprimes_orig.c:16
(gdb) l 16
11      int main()
12      {
13          int N;
14
15          printf("enter upper bound\n");
16          scanf("%d", UpperBound);
So why would you want to use a core file rather than debug interactively?
Your bug may take quite a while to manifest itself
You may have to debug inside a batch queuing system, where interactive use is difficult or curtailed
You want to capture a picture of the code's state when it crashes
We focused on gdb, but there are command-line debuggers that accompany just about every available compiler product:
pgdbg - part of the PGI compiler suite; defaults to a GUI, but can be run as a command-line interface (CLI) using the -text option
idb - part of the Intel compiler suite; defaults to a CLI (has a special option, -gdb, for using gdb command syntax)
Most compilers support run-time checks that can quickly catch common bugs. Here is a handy short-list (contributions welcome!):
For Intel Fortran, -check bounds -traceback -g will automate bounds checking and enable extensive traceback analysis in case of a crash (leave out the bounds option to get a crash report on any IEEE exception, format mismatch, etc.)
For PGI compilers, -Mbounds -g will do bounds checking
For GNU compilers, -fbounds-check -g should also do bounds checking, but it is currently supported only for the Fortran and Java front-ends
WARNING: it should be noted that run-time error checking can very much slow down a code's execution, so it is not something that you will want to use all of the time.
There are, of course, a matching set of GUIs for the various debuggers. A short list:
ddd - a graphical front-end for the venerable gdb
pgdbg - GUI for the PGI debugger
idb -gui - GUI for the Intel compiler suite debugger
It is very much a matter of preference whether or not to use a GUI. I find the GUI to be constraining, but it does make navigation easier.
DDD Example
Running one of our previous examples using ddd ...
More information on the tools that we have used/mentioned (man pages are also a good place to start): gdb User Manual:
http://sources.redhat.com/gdb/current/onlinedocs/gdb_toc.html
idb Manual:
http://www.intel.com/software/products/compilers/docs/linux/idb_manual_l.html
Now, in a completely different vein, there are tools designed to help identify errors pre-compilation, namely by statically analyzing the source code itself:
splint is a tool for statically checking C programs: http://www.splint.org
ftnchek is a tool that checks only (alas) FORTRAN 77 codes: http://www.dsm.fordham.edu/~ftnchek/
I can't say that I have found these to be particularly helpful, though.
Memory allocation problems are very common - there are some tools designed to help you catch such errors at run-time:
efence, or Electric Fence, tries to trap any out-of-bounds references (see man efence)
valgrind is a suite of tools for analyzing and profiling binaries (see man valgrind) - there is a user manual available at:
file:///usr/share/doc/valgrind-3.6.0/html/manual.html
I have seen valgrind used with good success, though not particularly in the HPC arena.
Strace
strace is a powerful tool that will allow you to trace all system calls and signals made by a particular binary, whether or not you have the source code. It can be attached to already running processes. A powerful low-level tool: you can learn a lot from it, but it is often a tool of last resort for user applications in HPC due to the copious quantity of extraneous information it outputs.
Strace Example
As an example of using strace, let's peek in on a running MPI process (part of a 32-task job on U2):
[c06n15:~]$ ps -u jonesm -L -f
UID        PID  PPID   LWP  C NLWP STIME TTY   TIME     CMD
jonesm   23964 16284 23964 92    2 14:34 ?     00:04:11 /util/nwchem/nwchem-5.0/bin/...
jonesm   23964 16284 23965 99    2 14:34 ?     00:04:30 /util/nwchem/nwchem-5.0/bin/...
jonesm   23987 23986 23987  0    1 14:37 pts/0 00:00:00 bash
jonesm   24128 23987 24128  0    1 14:39 pts/0 00:00:00 ps -u jonesm -L -f
[c06n15:~]$ strace -p 23965
Process 23965 attached - interrupt to quit
lseek(45, 691535872, SEEK_SET)          = 691535872
read(45, "\0\0\0\0\0\0\0\0\2\273\f[\250\207V\276\376K&]\331\230d"..., 524288) = 524288
gettimeofday({1161107631, 126604}, {240, 1161107631}) = 0
gettimeofday({1161107631, 128553}, {240, 1161107631}) = 0
:
select(47, [3 4 6 7 8 9 42 43 44 46], [4], NULL, NULL) = 2 (in [4], out [4])
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2932) = 2932
writev(4, [{"\0\0\0\0\0\0\0\17\0\0\0\37\0\0\0\0\0\0\0,\0\0\0\0\0\0\0"..., 32}, {"\1\0\0\0\0\0\0\0\37\0\0\0\17\0\0\0\37\0\0\0,\0\1\0"..., 44}], 2) = 76
Using a GUI-based debugger gets considerably more difficult when dealing with an MPI-based parallel code (not so much on the OpenMP side), due to the fact that you are now dealing with multiple processes scattered across different machines. The TotalView debugger is the premier product in this arena (it has both CLI and GUI support), but it is very expensive and not present in all environments. We will start out using our same toolbox as before, and see that we can accomplish much without spending a fortune. The methodologies will be equally applicable to the fancy commercial products.
Process Checking
First on the agenda: parallel processing involves multiple processes, threads, or both, and the first rule is to make sure that they are ending up where you think they should be (needless to say, all too often they do not).
Use MPI_Get_processor_name to report back on where processes are running
Use ps to monitor processes as they run (useful flags: ps u -L), even on remote nodes (rsh/ssh into them)
[bono:~]$ qstat -n 239365
bono.ccr.buffalo.edu:
                                                Req'd  Req'd  Elap
Job ID             Username Queue Jobname SessID NDS TSK Memory Time  S Time
------------------ -------- ----- ------- ------ --- --- ------ ----- - -----
239365.bono.ccr.buff jonesm ccr   QAtest   27130   4  --    --  24:00 R 00:50
   c23n31/1+c23n31/0+c23n30/1+c23n30/0+c23n29/1+c23n29/0+c23n28/1+c23n28/0
[bono:~]$ rsh c23n31 ps -u jonesm -L
  PID   LWP TTY      TIME     CMD
27130 27130 ?        00:00:00 bash
27201 27201 ?        00:00:00 pbs_demux
27235 27235 ?        00:00:00 bash
27244 27244 ?        00:00:00 doqmtests.mpi
30970 30970 ?        00:00:00 runtests.mpi.un
30982 30982 ?        00:00:00 mpiexec
30984 30984 ?        00:00:00 mpiexec
30985 30985 ?        00:27:40 nwchem-intel91-i...
30985 30987 ?        00:02:37 nwchem-intel91-i...
30986 30986 ?        00:27:32 nwchem-intel91-i...
 1616  1616 ?        00:00:00 ps
# get node listing & trim out excess verbiage
nlines=`$QST -an $jobid | wc -l`
nlines=`echo "$nlines - 6" | bc`
nodelist=`$QST -n $jobid | tail -$nlines | sed "s/\/[01]+/,/g" \
  | sed "s/\/[01]//" | sed "s/+//g" | sed "s/,/ /g" \
  | awk '{for (i=1; i<=NF; i++) printf("%s\n", $i)}' | uniq \
  | awk '{printf "%s ", $1}' \
  | awk '{for (i=1; i<=NF-1; i++) printf("%s ", $i); printf("%s", $NF)}'`
# define ps command
#MYPS="ps -aeLf | awk '{if (\$5 > 10) print \$1,\$2,\$3,\$4,\$5,\$9,\$10}'"
MYPS="ps -u jonesm -L -o pid,pcpu,time,comm"
echo "MYPS = $MYPS"
for node in $nodelist; do
  echo "NODE = $node, my CPU/thread Usage:"
  rsh $node $MYPS
done
[bono:~]$ job_ps 239365
MYPS = ps -u jonesm -L -o pid,pcpu,time,comm
NODE = c23n31, my CPU/thread Usage:
  PID %CPU     TIME COMMAND
27130  0.0 00:00:00 bash
27201  0.0 00:00:00 pbs_demux
27235  0.0 00:00:00 bash
27244  0.0 00:00:00 doqmtests.mpi
 1652  0.0 00:00:00 runtests.mpi.un
 1664  0.0 00:00:00 mpiexec
 1666  0.0 00:00:00 mpiexec
 1667 94.5 00:17:18 nwchem-intel91-i...
 1667 10.0 00:01:50 nwchem-intel91-i...
 1668 94.2 00:17:15 nwchem-intel91-i...
 3813  0.0 00:00:00 ps
NODE = c23n30, my CPU/thread Usage:
  PID %CPU     TIME COMMAND
 1975 96.2 00:17:36 nwchem-intel91-i...
 1975  6.6 00:01:13 nwchem-intel91-i...
 1976 96.0 00:17:34 nwchem-intel91-i...
 4033  0.0 00:00:00 ps
NODE = c23n29, my CPU/thread Usage:
  PID %CPU     TIME COMMAND
 2673 95.2 00:17:26 nwchem-intel91-i...
 2673  8.9 00:01:38 nwchem-intel91-i...
 2674 94.5 00:17:19 nwchem-intel91-i...
 4728  1.0 00:00:00 ps
NODE = c23n28, my CPU/thread Usage:
  PID %CPU     TIME COMMAND
19284 88.2 00:16:09 nwchem-intel91-i...
19284 14.8 00:02:43 nwchem-intel91-i...
19285 88.2 00:16:09 nwchem-intel91-i...
21374  0.0 00:00:00 ps
GDB in Parallel
Yes, you can certainly run debuggers designed for use in sequential codes in parallel. They are even quite effective. You may just have to jump through a few extra hoops to do so ...
Attaching GDB
Of course, unless you put an explicit waiting point inside your code, the processes are probably happily running along when you attach to them, and you will likely want to exert some control over that.
First, using our above example, I was running one MPI task on f10n32 and one on f10n24. After attaching gdb to each process, they paused:
[f10n32:~/d_hw/d_hw2/d_pp]$ gdb pp.gdb 923
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
...
(gdb) where
#0  0x00000031ee0dc0e8 in poll () from /lib64/libc.so.6
#1  0x00002b4015ef9a1b in MPID_nem_tcp_connpoll (in_blocking_poll=1) at ../../socksm.c:2526
#2  0x00002b4015ef8e82 in MPID_nem_tcp_poll (in_blocking_poll=1) at ../../socksm.c:2324
#3  0x00002b4015e51a56 in MPID_nem_network_poll (in_blocking_progress=1) at ...
#4  0x00002b4015cff4ce in MPIDI_CH3I_Progress (progress_state=0x7fff3afe2b90, ...
#5  0x00002b4015eab822 in PMPI_Recv (buf=0x601e60, count=1, datatype=1275070495, ...
    at ../../recv.c:156
#6  0x00002b40161ab000 in pmpi_recv__ () from /util/intel/2011.0/impi/4.0.1.007/...
#7  0x0000000000401288 in pp () at pp.f90:80
#8  0x000000000040176a in main ()
#9  0x00000031ee01ecdd in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000400c49 in _start ()
(gdb) c
Continuing.
and on f10n24:
[u2:~]$ ssh f10n24
[f10n24:~]$ ps u jonesm
  PID  TTY     TIME      CMD
22673  ?       00:00:00  python
22684  ?       00:00:00  python
22686  ?       00:02:54  pp.gdb
22693  ?       00:00:00  sshd
22694  pts/0   00:00:00  bash
22733  pts/0   00:00:00  ps
[f10n24:~]$ gdb pp.gdb 22686
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
...
(gdb) where
#0  0x000000309b2dc0e8 in poll () from /lib64/libc.so.6
#1  0x00002b333112fa1b in MPID_nem_tcp_connpoll (in_blocking_poll=1) at ../../socksm.c:2526
#2  0x00002b333112ee82 in MPID_nem_tcp_poll (in_blocking_poll=1) at ../../socksm.c:2324
#3  0x00002b3331087a56 in MPID_nem_network_poll (in_blocking_progress=1) at ../../...
#4  0x00002b3330f354ce in MPIDI_CH3I_Progress (progress_state=0x7fffd9dff6b0, ...
#5  0x00002b33310e1822 in PMPI_Recv (buf=0x601e60, count=4, datatype=1275070495, ... at ../../recv.c:156
#6  0x00002b33313e1000 in pmpi_recv__ () from /util/intel/2011.0/impi/4.0.1.007/...
#7  0x00000000004013be in pp () at pp.f90:91
#8  0x000000000040176a in main ()
#9  0x000000309b21ecdd in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000400c49 in _start ()
(gdb) c
Continuing.
and we used the continue (c) command to let execution pick up again where we (temporarily) interrupted it.
M. D. Jones, Ph.D. (CCR/UB) Debugging in Serial & Parallel HPC-I Fall 2012 79 / 89
You can insert a waiting point into your code to ensure that execution waits until you get a chance to attach a debugger:
integer :: gdbWait=0
...
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, Nprocs, ierr)
! dummy pause point for gdb insertion
do while ( gdbWait /= 1 )
end do
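The same waiting-point idiom in C might look like the following sketch (the `gdbWait` flag and `wait_for_debugger` function are illustrative, not part of the course code):

```c
#include <unistd.h>

/* Debugger wait point: each process spins here until someone attaches
   gdb and releases it with:
       (gdb) set var gdbWait = 1
       (gdb) continue
   "volatile" keeps the compiler from hoisting the flag out of the loop. */
static volatile int gdbWait = 0;

void wait_for_debugger(void)
{
    while (gdbWait != 1)
        sleep(1);   /* sleep so the wait loop does not burn a full core */
}
```

Calling wait_for_debugger() right after MPI initialization guarantees every rank is still at a known point when you attach.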
and then you will find the code waiting at that point when you attach gdb, and you can release it at your leisure:
[f10n32:~/d_hw/d_hw2/d_pp]$ gdb pp.gdb 1003
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
...
0x0000000000400dd2 in pp () at pp.f90:42
42        do while ( gdbWait /= 1 )
(gdb) set gdbWait = 1
44        if ( Nprocs /= 2 ) then
(gdb) c
Continuing.
[f10n24:~/d_hw/d_hw2/d_pp]$ gdb pp.gdb 22777
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
0x0000000000400dd2 in pp () at pp.f90:42
42        do while ( gdbWait /= 1 )
(gdb) set gdbWait = 1
44        if ( Nprocs /= 2 ) then
(gdb) c
Continuing.
So you can certainly use serial debuggers in parallel - in fact it is a pretty handy thing to do. Just keep in mind:
- Don't forget to compile with debugging turned on
- You can always attach to a running code (and you can instrument the code with that purpose in mind)
- Beware that not all task launchers are equally friendly towards built-in support for serial debuggers
TotalView
The premier parallel debugger, TotalView:
- Sophisticated commercial product (think many $$ ...)
- Designed especially for HPC, multi-process, multi-thread
- Has both GUI and CLI
- Supports C/C++, Fortran 77/90/95, and mixtures thereof
- The official debugger of DOE's Advanced Simulation and Computing (ASC) program
Make sure that your X DISPLAY environment is working if you are going to use the GUI. The current CCR license supports 2 concurrent users and up to 8 processors (which precludes usage on nodes with more than 8 cores until/unless this license is upgraded).
DDT
Allinea's commercial parallel debugger, DDT:
- Sophisticated commercial product (think many $$ ...)
- Designed especially for HPC, multi-process, multi-thread
- Has both GUI and CLI
- Supports C/C++, Fortran 77/90/95, and mixtures thereof
- CCR has a 32-token license for DDT (including CUDA support)
- To find the latest installed version: module avail ddt
Eclipse PTP
Current Recommendations
CCR has licenses for Allinea's DDT and TotalView (although the current TotalView license is very small and outdated, and will either be upgraded or dropped in favor of DDT). Both are quite expensive, but stay tuned for further developments. Note that the open-source Eclipse project also has a Parallel Tools Platform that can be used in combination with C/C++ and Fortran: http://www.eclipse.org/ptp