M. D. Jones, Ph.D.
Center for Computational Research, University at Buffalo, State University of New York
Introduction
The most common method for debugging (by far) is the instrumentation method: one instruments the code with print statements to check values and follow the execution of the program. Not exactly sophisticated - one can certainly debug code this way, but wise use of software debugging tools can be much more effective.
Debugging Tools
Debugging tools are abundant, but we will focus on some of the most common capabilities, to give you a bag of tricks that can be used when dealing with common problems.
Basic Capabilities
Common attributes:
Divided into command-line and graphical user interfaces
Usually you have to recompile your code to utilize most debugger features (-g is almost a standard option to enable debugging)
Invocation is by name of debugger and executable (e.g., gdb ./a.out [core])
Running Within
Inside a debugger (be it a command-line interface (CLI) or a graphical front-end), you have some very handy abilities:
Look at a source code listing (very handy when isolating an IEEE exception)
Line-by-line execution
Insert stops, or breakpoints, at certain functional points (i.e., when critical values change)
Monitor variable values
Look at the stack trace (or backtrace) when the code crashes
/* Print the elements of array1 */
printf("array1 = ");
printArray(nelem, array1);

/* Copy array1 to array2 */
array2 = array1;

/* Pass array2 to the function squareArray() */
squareArray(nelem, array2);

/* Compute difference between elements of array2 and array1 */
for (indx = 0; indx < nelem; indx++) {
    del[indx] = array2[indx] - array1[indx];
}

/* Print the computed differences */
printf("The difference in the elements of array2 and array1 are: ");
printArray(nelem, del);

free(array1);
free(array2);
free(del);
return 0;
}
void initArray(const int nelem_in_array, int *array)
{
    int indx;
    for (indx = 0; indx < nelem_in_array; indx++) {
        array[indx] = indx + 1;
    }
}

int *squareArray(const int nelem_in_array, int *array)
{
    int indx;
    for (indx = 0; indx < nelem_in_array; indx++) {
        array[indx] = array[indx] * array[indx];
    }
    return array;
}

void printArray(const int nelem_in_array, int *array)
{
    int indx;
    printf("\n( ");
    for (indx = 0; indx < nelem_in_array; indx++) {
        printf("%d ", array[indx]);
    }
    printf(")\n");
}
Not exactly what we expect, is it? array2 should contain the squares of the values in array1, and therefore the difference should be i^2 - i for i = 2, ..., 11.
Now let us run the code from within gdb. Our goal is to set a breakpoint where the squared array elements are computed, then step through the code:
[bono:~/d_debug]$ gdb arrayex
(gdb) l 34
31
32          /* Copy array1 to array2 */
33          array2 = array1;
34
35          /* Pass array2 to the function squareArray() */
36          squareArray(nelem, array2);
37
38          /* Compute difference between elements of array2 and array1 */
39          for (indx = 0; indx < nelem; indx++) {
40              del[indx] = array2[indx] - array1[indx];
(gdb) b 34
Breakpoint 1 at 0x400604: file ex1.c, line 34.
(gdb) run
Starting program: /san/user/jonesm/u2/d_debug/arrayex
array1 = ( 2 3 4 5 6 7 8 9 10 11 )
Breakpoint 1, main () at arrayex.c:34
34          squareArray(nelem, array2);
(gdb) s
squareArray (nelem_in_array=10, array=0x501010) at arrayex.c:59
59          for (indx = 0; indx < nelem_in_array; indx++) {
(gdb) s
60              array[indx] = array[indx] * array[indx];
(gdb) s
59          for (indx = 0; indx < nelem_in_array; indx++) {
(gdb) p indx
$1 = 0
(gdb) p array[indx]
$2 = 4
(gdb) display indx
1: indx = 0
(gdb) display array[indx]
2: array[indx] = 4
(gdb) s
60              array[indx] = array[indx] * array[indx];
2: array[indx] = 3
1: indx = 1
So, what have we learned so far about the command-line debugger:
(l) list the source code - useful for peeking inside the source (very handy when isolating a problem)
(break) set breakpoints
(s) step through execution line by line
(p) print values at selected points (can also use handy printf syntax as in C)
(display) display values for monitoring while stepping through code
(bt) backtrace, or stack trace - we haven't used this yet, but certainly will
");
Breakpoint 1, main () at arrayex.c:37
37          for (indx = 0; indx < nelem; indx++) {
(gdb) disp indx
1: indx = 10
(gdb) disp array1[indx]
2: array1[indx] = 49
(gdb) disp array2[indx]
3: array2[indx] = 49
(gdb) s
38              del[indx] = array2[indx] - array1[indx];
3: array2[indx] = 4
2: array1[indx] = 4
1: indx = 0
(gdb) s
37          for (indx = 0; indx < nelem; indx++) {
3: array2[indx] = 4
2: array1[indx] = 4
1: indx = 0
(gdb) s
38              del[indx] = array2[indx] - array1[indx];
3: array2[indx] = 9
2: array1[indx] = 9
1: indx = 1
Now that isn't right - array1 was not supposed to change. Let us go back and look more closely at the call to squareArray ...
(gdb) l
32
33          /* Pass array2 to the function squareArray() */
34          squareArray(nelem, array2);
35
36          /* Compute difference between elements of array2 and array1 */
37          for (indx = 0; indx < nelem; indx++) {
38              del[indx] = array2[indx] - array1[indx];
39          }
40
41          /* Print the computed differences */
(gdb) b 34
Breakpoint 2 at 0x400605: file arrayex.c, line 34.
(gdb) run
The program being debugged has been started already. Start it from the beginning? (y or n) y
Starting program: /san/user/jonesm/u2/d_debug/arrayex
array1 = ( 2 3 4 5 6 7 8 9 10 11 )

Breakpoint 2, main () at arrayex.c:34
34          squareArray(nelem, array2);
3: array2[indx] = 49
2: array1[indx] = 49
1: indx = 10
(gdb) disp array2
4: array2 = (int *) 0x501010
(gdb) disp array1
5: array1 = (int *) 0x501010
Yikes, array1 and array2 point to the same memory location! See, pointer errors like this don't happen too often in Fortran ... Now, of course, the bug is obvious - but aren't they all obvious after you find them?
The Fix Is In
Just as an afterthought, what we ought to have done in the first place was copy array1 into array2 element by element:
/* Copy array1 to array2 */
/* array2 = array1; */
for (indx = 0; indx < nelem; indx++) {
    array2[indx] = array1[indx];
}
Array indexing errors are among the most common errors in both sequential and parallel codes - and that is not entirely surprising:
Different languages have different indexing defaults
Multi-dimensional arrays are pretty easy to reference out-of-bounds
Fortran in particular lets you use very complex indexing schemes (essentially arbitrary!)
Ok, that hardly seems reasonable (does it?). Now, let's run this example from within gdb and set a breakpoint to examine the accumulation of values into even_sum.
(gdb) l 16
11          arr[i] = (i*i)%5;
12        }
13      }
14      odd_sum = 0;
15      even_sum = 0;
16      for (i = 0; i < (N-1); ++i) {
17        if (i%2 == 0) {
18          even_sum += arr[i];
19        } else {
20          odd_sum += arr[i];
(gdb) b 16
Breakpoint 1 at 0x40051e: file ex2.c, line 16.
(gdb) run
Starting program: /san/user/jonesm/u2/d_debug/ex2

Breakpoint 1, main (argc=Variable "argc" is not available.
) at ex2.c:16
16      for (i = 0; i < (N-1); ++i) {
(gdb) p arr
$1 = {671173696, 1, 1, 0, 1, 0, 1, 4, 4, 0}
So we see that our original example code missed initializing the first element of the array, and the results were rather erratic (in fact they will likely be compiler- and flag-dependent). Initialization is just one aspect of things going wrong with array indexing - let us examine another common problem ...
This example I borrowed from Norman Matloff (UC Davis), who has a nice article (well worth the time to read), Guide to Faster, Less Frustrating Debugging, which you can find easily enough on the web: http://heather.cs.ucdavis.edu/~matloff/unix.html
Function CheckPrime:
void CheckPrime(int K)
{
    int J;

    /* the plan: see if J divides K, for all values J which are
       (a) themselves prime (no need to try J if it is nonprime), and
       (b) less than or equal to sqrt(K) (if K has a divisor larger
           than this square root, it must also have a smaller one,
           so no need to check for larger ones) */

    J = 2;
    while (1) {
        if (Prime[J] == 1)
            if (K % J == 0) {
                Prime[K] = 0;
                return;
            }
        J++;
    }

    /* if we get here, then there were no divisors of K,
       so it is prime */
    Prime[K] = 1;
}
Now, the scanf library function is probably pretty safe from internal bugs, so the error is likely coming from our usage:
(gdb) l 16
16          scanf("%d", UpperBound);
17
18          Prime[2] = 1;
19
20          for (N = 3; N <= UpperBound; N += 2)
21              CheckPrime(N);
22          if (Prime[N]) printf("%d is a prime\n", N);
23      }
24
25      void CheckPrime(int K)
Adding the missing & (scanf("%d", &UpperBound)) takes care of the first bug ... but let's keep running from within gdb.
[bono:~/d_debug]$ gcc -g -o findprimes findprimes.c
[bono:~/d_debug]$ gdb findprimes
(gdb) run
Starting program: /san/user/jonesm/u2/d_debug/findprimes
enter upper bound
20

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400586 in CheckPrime (K=3) at findprimes.c:37
37          if (Prime[J] == 1)
(gdb) bt
#0  0x0000000000400586 in CheckPrime (K=3) at findprimes.c:37
#1  0x0000000000400547 in main () at findprimes.c:21
(gdb) l 37
32             than this square root, it must also have a smaller one,
33             so no need to check for larger ones) */
34
35          J = 2;
36          while (1) {
37              if (Prime[J] == 1)
38                  if (K % J == 0) {
39                      Prime[K] = 0;
40                      return;
41                  }
(gdb)
Very often we get segmentation faults when trying to reference an array out of bounds, so have a look at the value of J:
(gdb) l 37
32             than this square root, it must also have a smaller one,
33             so no need to check for larger ones) */
34
35          J = 2;
36          while (1) {
37              if (Prime[J] == 1)
38                  if (K % J == 0) {
39                      Prime[K] = 0;
40                      return;
41                  }
(gdb) p J
$1 = 376
Oops! That is just a tad outside the array bounds (50). We kind of forgot to put a cap on the value of J ...
Ok, so now we will set a couple of breakpoints - one at the call to CheckPrime, and a second where a successful prime is to be printed:
(gdb) l 16
16          scanf("%d", &UpperBound);
17
18          Prime[2] = 1;
19
20          for (N = 3; N <= UpperBound; N += 2)
21              CheckPrime(N);
22          if (Prime[N]) printf("%d is a prime\n", N);
23      }
24
25      void CheckPrime(int K)
(gdb) b 20
Breakpoint 1 at 0x40052d: file findprimes.c, line 20.
(gdb) b 22
Breakpoint 2 at 0x400550: file findprimes.c, line 22.
(gdb) run
Starting program: /san/user/jonesm/u2/d_debug/findprimes
enter upper bound
20

Breakpoint 1, main () at findprimes.c:20
20          for (N = 3; N <= UpperBound; N += 2)
(gdb) c
Continuing.

Breakpoint 2, main () at findprimes.c:22
22          if (Prime[N]) printf("%d is a prime\n", N);
(gdb) p N
$1 = 21
(gdb)
Ah, the sweet taste of success ... (even better, give the program a return code!)
M. D. Jones, Ph.D. (CCR/UB) Debugging in Serial & Parallel HPC-I Fall 2012 37 / 89
Game of Life
Well, ok, not exactly debugging life itself; rather the game of life. Mathematician John Horton Conway's game of life¹, to be exact. This example will be much like the prior ones, but now we will work in Fortran and debug some integer arithmetic errors. And the context will be slightly more interesting.
¹ see, for example, Martin Gardner's article in Scientific American, 223, pp. 120-123 (1970).
The Game of Life is one of the better known examples of cellular automata (CA), namely discrete models with a finite number of states, often used in theoretical biology, game theory, etc. The rules are actually pretty simple, and can lead to some rather surprising self-organizing behavior. The universe in the game of life:
The universe is an infinite 2D grid of cells, each of which is alive or dead
Cells interact only with their nearest neighbors (including on the diagonals, which makes for eight neighbors)
Rules of Life
The rules in the game of life:
Any live cell with fewer than two neighbours dies, as if by loneliness
Any live cell with more than three neighbours dies, as if by overcrowding
Any live cell with two or three neighbours lives, unchanged, to the next generation
Any dead cell with exactly three neighbours comes to life
An initial pattern evolves by applying the above rules to the entire grid simultaneously, at each tick of the clock.
program life
!
! Conway game of life (debugging example)
!
implicit none
integer, parameter :: ni=1000, nj=1000, nsteps=100
integer :: i, j, n, im, ip, jm, jp, nsum, isum
integer, dimension(0:ni,0:nj) :: old, new
real :: arand, nim2, njm2
!
! initialize elements of "old" to 0 or 1
!
do j = 1, nj
   do i = 1, ni
      CALL random_number(arand)
      old(i,j) = NINT(arand)
   enddo
enddo
nim2 = ni - 2
njm2 = nj - 2
!
! time iteration
!
time_iteration: do n = 1, nsteps
   do j = 1, nj
      do i = 1, ni
         !
         ! periodic boundaries
         !
         im = 1 + (i+nim2) - ((i+nim2)/ni)*ni   ! if i=1,  im=ni
         ip = 1 + i - (i/ni)*ni                 ! if i=ni, ip=1
         jm = 1 + (j+njm2) - ((j+njm2)/nj)*nj   ! if j=1,  jm=nj
         jp = 1 + j - (j/nj)*nj                 ! if j=nj, jp=1
         !
         ! for each point, add surrounding values
         !
         nsum = old(im,jp) + old(i,jp) + old(ip,jp) &
              + old(im,j )             + old(ip,j ) &
              + old(im,jm) + old(i,jm) + old(ip,jm)
         !
         ! set new value based on number of "live" neighbors
         !
         select case (nsum)
         case (3)
            new(i,j) = 1
         case (2)
            new(i,j) = old(i,j)
         case default
            new(i,j) = 0
         end select
      enddo
   enddo
   !
   ! copy new state into old state
   !
   old = new
   print *, 'Tick ', n, ' number of living: ', sum(new)
enddo time_iteration
!
! write number of live points
!
print *, 'number of live points = ', sum(new)
end program life
[bono:~/d_debug]$ ifort -g -o life life.f90
[bono:~/d_debug]$ ./life
Tick 1 number of living: 0
Tick 2 number of living: 0
Tick 3 number of living: 0
Tick 4 number of living: 0
Tick 5 number of living: 0
Tick 6 number of living: 0
:
Tick 99 number of living: 0
Tick 100 number of living: 0
number of live points = 0
Hmm, everybody dies! What kind of life is that? ... well, not a correct one, in this context at least. Undoubtedly the problem lies within the neighbor calculation, so let us take a closer look at the execution ...
(gdb) l 30
25          do j = 1, nj
26             do i = 1, ni
27      !
28      ! periodic boundaries
29      !
30                im = 1 + (i+nim2) - ((i+nim2)/ni)*ni   ! if i=1,  im=ni
31                ip = 1 + i - (i/ni)*ni                 ! if i=ni, ip=1
32                jm = 1 + (j+njm2) - ((j+njm2)/nj)*nj   ! if j=1,  jm=nj
33                jp = 1 + j - (j/nj)*nj                 ! if j=nj, jp=1
(gdb) b 25
Breakpoint 1 at 0x402e23: file life.f90, line 25.
(gdb) run
Starting program: /san/user/jonesm/u2/d_debug/life

Breakpoint 1, life () at life.f90:25
25          do j = 1, nj
Current language: auto; currently fortran
(gdb) s
26             do i = 1, ni
(gdb) s
30                im = 1 + (i+nim2) - ((i+nim2)/ni)*ni   ! if i=1, im=ni
(gdb) s
31                ip = 1 + i - (i/ni)*ni                 ! if i=ni, ip=1
(gdb) print im
$1 = 1
(gdb) print (i+nim2)/1000
$2 = 0.999
Ok, so therein lay the problem - nim2 and njm2 should be integers, not real values ... fix that:
program life
!
! Conway game of life (debugging example)
!
implicit none
integer, parameter :: ni=1000, nj=1000, nsteps=100
integer :: i, j, n, im, ip, jm, jp, nsum, isum, nim2, njm2
integer, dimension(0:ni,0:nj) :: old, new
real :: arand
With integer nim2 and njm2, life now persists:

Tick  99 number of living: 95073
Tick 100 number of living: 94664
number of live points = 94664
Interesting repositories of Conway's life and cellular automata references:
http://www.radicaleye.com/lifepage
http://en.wikipedia.org/wiki/Conways_Game_of_Life
Core Files
Core files can also be used to analyze, after the fact, problems that caused a code failure severe enough to dump a core file. Often the computer system has been set up in such a way that the default is not to output core files, however:
[bono:~/d_debug]$ ulimit -c
0
for bash syntax. In tcsh you would use the limit built-in command to set the coredumpsize value:
[bono:~/d_debug]$ tcsh
[jonesm@bono ~/d_debug]$ limit coredumpsize
coredumpsize    unlimited
Systems administrators set the core file size limit to zero by default for a good reason - these files generally contain the entire memory image of an application process when it dies, and that can be very large. End-users are also notoriously bad about leaving these files lying around ... Having said that, we can raise the limit and produce a core file that can later be used for analysis.
Ok, so now we can use one of our previous examples and generate a core file:
[bono:~/d_debug]$ gcc -g -o findprimes_orig findprimes_orig.c
[bono:~/d_debug]$ ./findprimes_orig
enter upper bound
20
Segmentation fault (core dumped)
[bono:~/d_debug]$ ls -l core.7428
-rw------- 1 jonesm ccrstaff 65536 Sep 27 12:15 core.7428
This particular core file is not at all large (it is a very simple code, though, with very little stored data - generally the core file size will reflect the memory footprint of the application when it crashed). Analyzing it is pretty much like running the example live in gdb:
[bono:~/d_debug]$ gdb findprimes_orig core.7428
GNU gdb Red Hat Linux (6.3.0.0-1.143.el4rh)
...
Core was generated by `./findprimes_orig'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib64/tls/libc.so.6...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
#0  0x000000392815062f in _IO_vfscanf_internal () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x000000392815062f in _IO_vfscanf_internal () from /lib64/tls/libc.so.6
#1  0x000000392815866a in scanf () from /lib64/tls/libc.so.6
#2  0x0000000000400524 in main () at findprimes_orig.c:16
(gdb) l 16
11      int main()
12      {
13          int N;
14
15          printf("enter upper bound\n");
16          scanf("%d", UpperBound);
So why would you want to use a core file rather than debug interactively?
Your bug may take quite a while to manifest itself
You may have to debug inside a batch queuing system, where interactive use is difficult or curtailed
You want to capture a picture of the code's state when it crashes
We focused on gdb, but there are command-line debuggers that accompany just about every available compiler product:
pgdbg - part of the PGI compiler suite; defaults to a GUI, but can be run as a command-line interface (CLI) using the -text option
idb - part of the Intel compiler suite; defaults to a CLI (has a special option, -gdb, for using gdb command syntax)
Most compilers support run-time checks that can quickly catch common bugs. Here is a handy short-list (contributions welcome!):
For Intel Fortran, -check bounds -traceback -g will automate bounds checking and enable extensive traceback analysis in case of a crash (leave out the bounds option to get a crash report on any IEEE exception, format mismatch, etc.)
For PGI compilers, -Mbounds -g will do bounds checking
For GNU compilers, -fbounds-check -g should also do bounds checking, but it is currently supported only for the Fortran and Java front-ends
WARNING: it should be noted that run-time error checking can very much slow down a code's execution, so it is not something that you will want to use all of the time.
There are, of course, a matching set of GUIs for the various debuggers. A short list:
ddd - a graphical front-end for the venerable gdb
pgdbg - GUI for the PGI debugger
idb -gui - GUI for the Intel compiler suite debugger
It is very much a matter of preference whether or not to use a GUI. I find the GUI to be constraining, but it does make navigation easier.
DDD Example
Running one of our previous examples using ddd ...
More information on the tools that we have used/mentioned (man pages are also a good place to start): gdb User Manual:
http://sources.redhat.com/gdb/current/onlinedocs/gdb_toc.html
idb Manual:
http://www.intel.com/software/products/compilers/docs/linux/idb_manual_l.html
Now, in a completely different vein, there are tools designed to help identify errors pre-compilation, namely by statically analyzing the source code itself:
splint is a tool for statically checking C programs: http://www.splint.org
ftnchek is a tool that checks only (alas) FORTRAN 77 codes: http://www.dsm.fordham.edu/~ftnchek/
I can't say that I have found these to be particularly helpful, though.
Memory allocation problems are very common - there are some tools designed to help you catch such errors at run-time:
efence, or Electric Fence, tries to trap any out-of-bounds references (see man efence)
valgrind is a suite of tools for analyzing and profiling binaries (see man valgrind) - there is a user manual available at:
file:///usr/share/doc/valgrind-3.6.0/html/manual.html
I have seen valgrind used with good success, though not particularly in the HPC arena.
Strace
strace is a powerful tool that will allow you to trace all system calls and signals made by a particular binary, whether or not you have the source code. It can be attached to already running processes. A powerful low-level tool: you can learn a lot from it, but it is often a tool of last resort for user applications in HPC due to the copious quantity of extraneous information it outputs.
Strace Example
As an example of using strace, let's peek in on a running MPI process (part of a 32-task job on U2):
[c06n15:~]$ ps -u jonesm -L -f
UID        PID  PPID   LWP  C NLWP STIME TTY   TIME     CMD
jonesm   23964 16284 23964 92    2 14:34 ?     00:04:11 /util/nwchem/nwchem-5.0/bin/...
jonesm   23964 16284 23965 99    2 14:34 ?     00:04:30 /util/nwchem/nwchem-5.0/bin/...
jonesm   23987 23986 23987  0    1 14:37 pts/0 00:00:00 bash
jonesm   24128 23987 24128  0    1 14:39 pts/0 00:00:00 ps -u jonesm -L -f
[c06n15:~]$ strace -p 23965
Process 23965 attached - interrupt to quit
lseek(45, 691535872, SEEK_SET)          = 691535872
read(45, "\0\0\0\0\0\0\0\0\2\273\f[\250\207V\276\376K&]\331\230d"..., 524288) = 524288
gettimeofday({1161107631, 126604}, {240, 1161107631}) = 0
gettimeofday({1161107631, 128553}, {240, 1161107631}) = 0
:
select(47, [3 4 6 7 8 9 42 43 44 46], [4], NULL, NULL) = 2 (in [4], out [4])
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2932) = 2932
writev(4, [{"\0\0\0\0\0\0\0\17\0\0\0\37\0\0\0\0\0\0\0,\0\0\0\0\0\0\0"..., 32}, {"\1\0\0\0\0\0\0\0\37\0\0\0\17\0\0\0\37\0\0\0,\0\1\0"..., 44}], 2) = 76
Using a GUI-based debugger gets considerably more difficult when dealing with an MPI-based parallel code (not so much on the OpenMP side), due to the fact that you are now dealing with multiple processes scattered across different machines. The TotalView debugger is the premier product in this arena (it has both CLI and GUI support), but it is very expensive and not present in all environments. We will start out using our same toolbox as before, and see that we can accomplish much without spending a fortune. The methodologies will be equally applicable to the fancy commercial products.
Process Checking
First on the agenda: parallel processing involves multiple processes, threads, or both, and the first rule is to make sure that they are ending up where you think they should be (needless to say, all too often they do not).
Use MPI_Get_processor_name to report back on where processes are running
Use ps to monitor processes as they run (useful flags: ps u -L), even on remote nodes (rsh/ssh into them)
[bono:~]$ qstat -n 239365
bono.ccr.buffalo.edu:
                                                Req'd  Req'd  Elap
Job ID             Username Queue Jobname SessID NDS TSK Memory Time  S Time
------------------ -------- ----- ------- ------ --- --- ------ ----- - -----
239365.bono.ccr.buff jonesm ccr   QAtest   27130   4  --    --  24:00 R 00:50
   c23n31/1+c23n31/0+c23n30/1+c23n30/0+c23n29/1+c23n29/0+c23n28/1+c23n28/0
[bono:~]$ rsh c23n31 ps -u jonesm -L
  PID   LWP TTY      TIME     CMD
27130 27130 ?        00:00:00 bash
27201 27201 ?        00:00:00 pbs_demux
27235 27235 ?        00:00:00 bash
27244 27244 ?        00:00:00 doqmtests.mpi
30970 30970 ?        00:00:00 runtests.mpi.un
30982 30982 ?        00:00:00 mpiexec
30984 30984 ?        00:00:00 mpiexec
30985 30985 ?        00:27:40 nwchem-intel91-i...
30985 30987 ?        00:02:37 nwchem-intel91-i...
30986 30986 ?        00:27:32 nwchem-intel91-i...
 1616  1616 ?        00:00:00 ps
# get node listing & trim out excess verbiage
nlines=`$QST -an $jobid | wc -l`
nlines=`echo "$nlines - 6" | bc`
nodelist=`$QST -n $jobid | tail -$nlines | sed "s/\/[01]+/,/g" \
  | sed "s/\/[01]//" | sed "s/+//g" | sed "s/,/ /g" \
  | awk '{for (i=1; i<=NF; i++) printf("%s\n", $i)}' | uniq \
  | awk '{printf "%s ", $1}' \
  | awk '{for (i=1; i<=NF-1; i++) printf("%s ", $i); printf("%s", $NF)}'`
# define ps command
#MYPS="ps -aeLf | awk '{if (\$5 > 10) print \$1,\$2,\$3,\$4,\$5,\$9,\$10}'"
MYPS="ps -u jonesm -L -o pid,pcpu,time,comm"
echo "MYPS = $MYPS"
for node in $nodelist; do
  echo "NODE = $node, my CPU/thread Usage:"
  rsh $node $MYPS
done
[bono:~]$ job_ps 239365
MYPS = ps -u jonesm -L -o pid,pcpu,time,comm
NODE = c23n31, my CPU/thread Usage:
  PID %CPU     TIME COMMAND
27130  0.0 00:00:00 bash
27201  0.0 00:00:00 pbs_demux
27235  0.0 00:00:00 bash
27244  0.0 00:00:00 doqmtests.mpi
 1652  0.0 00:00:00 runtests.mpi.un
 1664  0.0 00:00:00 mpiexec
 1666  0.0 00:00:00 mpiexec
 1667 94.5 00:17:18 nwchem-intel91-i...
 1667 10.0 00:01:50 nwchem-intel91-i...
 1668 94.2 00:17:15 nwchem-intel91-i...
 3813  0.0 00:00:00 ps
NODE = c23n30, my CPU/thread Usage:
  PID %CPU     TIME COMMAND
 1975 96.2 00:17:36 nwchem-intel91-i...
 1975  6.6 00:01:13 nwchem-intel91-i...
 1976 96.0 00:17:34 nwchem-intel91-i...
 4033  0.0 00:00:00 ps
NODE = c23n29, my CPU/thread Usage:
  PID %CPU     TIME COMMAND
 2673 95.2 00:17:26 nwchem-intel91-i...
 2673  8.9 00:01:38 nwchem-intel91-i...
 2674 94.5 00:17:19 nwchem-intel91-i...
 4728  1.0 00:00:00 ps
NODE = c23n28, my CPU/thread Usage:
  PID %CPU     TIME COMMAND
19284 88.2 00:16:09 nwchem-intel91-i...
19284 14.8 00:02:43 nwchem-intel91-i...
19285 88.2 00:16:09 nwchem-intel91-i...
21374  0.0 00:00:00 ps
GDB in Parallel
Yes, you can certainly run debuggers designed for use in sequential codes in parallel. They are even quite effective. You may just have to jump through a few extra hoops to do so ...
Attaching GDB
Of course, unless you put an explicit waiting point inside your code, the processes are probably happily running along when you attach to them, and you will likely want to exert some control over that.
First, using our above example, I was running one MPI task on f10n32 and one on f10n24. After attaching gdb to each process, they paused:
[f10n32:~/d_hw/d_hw2/d_pp]$ gdb pp.gdb 923
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
...
(gdb) where
#0  0x00000031ee0dc0e8 in poll () from /lib64/libc.so.6
#1  0x00002b4015ef9a1b in MPID_nem_tcp_connpoll (in_blocking_poll=1) at ../../socksm.c:2526
#2  0x00002b4015ef8e82 in MPID_nem_tcp_poll (in_blocking_poll=1) at ../../socksm.c:2324
#3  0x00002b4015e51a56 in MPID_nem_network_poll (in_blocking_progress=1) at ...
#4  0x00002b4015cff4ce in MPIDI_CH3I_Progress (progress_state=0x7fff3afe2b90, ...
#5  0x00002b4015eab822 in PMPI_Recv (buf=0x601e60, count=1, datatype=1275070495, ...
    at ../../recv.c:156
#6  0x00002b40161ab000 in pmpi_recv__ () from /util/intel/2011.0/impi/4.0.1.007/...
#7  0x0000000000401288 in pp () at pp.f90:80
#8  0x000000000040176a in main ()
#9  0x00000031ee01ecdd in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000400c49 in _start ()
(gdb) c
Continuing.
and on f10n24:
[u2:~]$ ssh f10n24
[f10n24:~]$ ps u jonesm
  PID  TTY     TIME      CMD
22673  ?       00:00:00  python
22684  ?       00:00:00  python
22686  ?       00:02:54  pp.gdb
22693  ?       00:00:00  sshd
22694  pts/0   00:00:00  bash
22733  pts/0   00:00:00  ps
[f10n24:~]$ gdb pp.gdb 22686
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
...
(gdb) where
#0  0x000000309b2dc0e8 in poll () from /lib64/libc.so.6
#1  0x00002b333112fa1b in MPID_nem_tcp_connpoll (in_blocking_poll=1) at ../../socksm.c:2526
#2  0x00002b333112ee82 in MPID_nem_tcp_poll (in_blocking_poll=1) at ../../socksm.c:2324
#3  0x00002b3331087a56 in MPID_nem_network_poll (in_blocking_progress=1) at ../../...
#4  0x00002b3330f354ce in MPIDI_CH3I_Progress (progress_state=0x7fffd9dff6b0, ...
#5  0x00002b33310e1822 in PMPI_Recv (buf=0x601e60, count=4, datatype=1275070495, ... at ../../recv.c:156
#6  0x00002b33313e1000 in pmpi_recv__ () from /util/intel/2011.0/impi/4.0.1.007/...
#7  0x00000000004013be in pp () at pp.f90:91
#8  0x000000000040176a in main ()
#9  0x000000309b21ecdd in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000400c49 in _start ()
(gdb) c
Continuing.
and we used the continue (c) command to let execution pick up again where we (temporarily) interrupted it.
M. D. Jones, Ph.D. (CCR/UB) Debugging in Serial & Parallel HPC-I Fall 2012 79 / 89
You can insert a waiting point into your code to ensure that execution waits until you get a chance to attach a debugger:
integer :: gdbWait=0
...
CALL MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, Nprocs, ierr)
! dummy pause point for gdb insertion
do while ( gdbWait /= 1 )
end do
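The same waiting-point idiom in C might look like the following sketch (the `gdbWait` flag and `wait_for_debugger` function are illustrative, not part of the course code):

```c
#include <unistd.h>

/* Debugger wait point: each process spins here until someone attaches
   gdb and releases it with:
       (gdb) set var gdbWait = 1
       (gdb) continue
   "volatile" keeps the compiler from hoisting the flag out of the loop. */
static volatile int gdbWait = 0;

void wait_for_debugger(void)
{
    while (gdbWait != 1)
        sleep(1);   /* sleep so the wait loop does not burn a full core */
}
```

Calling wait_for_debugger() right after MPI initialization guarantees every rank is still at a known point when you attach.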
and then you will find the code waiting at that point when you attach gdb, and you can release it at your leisure:
[f10n32:~/d_hw/d_hw2/d_pp]$ gdb pp.gdb 1003
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
...
0x0000000000400dd2 in pp () at pp.f90:42
42        do while ( gdbWait /= 1 )
(gdb) set gdbWait = 1
44        if ( Nprocs /= 2 ) then
(gdb) c
Continuing.
[f10n24:~/d_hw/d_hw2/d_pp]$ gdb pp.gdb 22777
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-48.el6)
0x0000000000400dd2 in pp () at pp.f90:42
42        do while ( gdbWait /= 1 )
(gdb) set gdbWait = 1
44        if ( Nprocs /= 2 ) then
(gdb) c
Continuing.
So you can certainly use serial debuggers in parallel - in fact it is a pretty handy thing to do. Just keep in mind:
- Don't forget to compile with debugging turned on
- You can always attach to a running code (and you can instrument the code with that purpose in mind)
- Beware that not all task launchers are equally friendly towards built-in support for serial debuggers
TotalView
The premier parallel debugger, TotalView:
- Sophisticated commercial product (think many $$ ...)
- Designed especially for HPC, multi-process, multi-thread
- Has both GUI and CLI
- Supports C/C++, Fortran 77/90/95, and mixtures thereof
- The official debugger of DOE's Advanced Simulation and Computing (ASC) program
Make sure that your X DISPLAY environment is working if you are going to use the GUI. The current CCR license supports 2 concurrent users and up to 8 processors (which precludes usage on nodes with more than 8 cores until/unless this license is upgraded).
DDT
Allinea's commercial parallel debugger, DDT:
- Sophisticated commercial product (think many $$ ...)
- Designed especially for HPC, multi-process, multi-thread
- Has both GUI and CLI
- Supports C/C++, Fortran 77/90/95, and mixtures thereof
- CCR has a 32-token license for DDT (including CUDA support)
- To find the latest installed version: module avail ddt
Eclipse PTP
Current Recommendations
CCR has licenses for Allinea's DDT and TotalView (although the current TotalView license is very small and outdated, and will either be upgraded or dropped in favor of DDT). Both are quite expensive, but stay tuned for further developments. Note that the open-source Eclipse project also has a Parallel Tools Platform that can be used in combination with C/C++ and Fortran: http://www.eclipse.org/ptp