
Data Mining: Concepts and Techniques

Chapter 11 Additional Theme: Software Bug Mining

Jiawei Han and Micheline Kamber
Department of Computer Science, University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
© 2006 Jiawei Han and Micheline Kamber. All rights reserved.
Acknowledgement: Chao Liu

Outline

- Motivation
- Related Work
- Classification of Program Executions
- Extract Backtrace from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions

Motivation

- Software is full of bugs
  - Windows 2000: 35 million lines of code, with 63,000 known bugs at the time of release, about 2 per 1,000 lines (courtesy of CNN.com)
- Software failures are costly
  - The Ariane 5 explosion was caused by errors in the software of the inertial reference system (Ariane 5 Flight 501 inquiry board report: http://ravel.esrin.esa.it/docs/esa-x-1819eng.pdf)
  - A study by the National Institute of Standards and Technology found that software errors cost the U.S. economy about $59.5 billion annually (http://www.nist.gov/director/prog-ofc/report02-3.pdf)
- Testing and debugging are laborious and expensive
  - "50% of my company employees are testers, and the rest spend 50% of their time testing!" (Bill Gates, 1995)

A Glimpse of Software Bugs

- Crashing bugs
  - Symptoms: segmentation faults
  - Reasons: memory access violations
  - Tools: Valgrind, CCured
- Noncrashing bugs
  - Symptoms: unexpected outputs
  - Reasons: logic or semantic errors, e.g.,
    - if ((m >= 0)) vs. if ((m >= 0) && (m != lastm))
    - < vs. <=, > vs. >=, etc.
    - j = i vs. j = i + 1
  - Tools: no sound tools exist

Example of Noncrashing Bugs

    void subline(char *lin, char *pat, char *sub)
    {
        int i, lastm, m;
        lastm = -1;
        i = 0;
        while ((lin[i] != ENDSTR)) {
            m = amatch(lin, i, pat, 0);
            if (m >= 0) {    /* buggy: correct code tests ((m >= 0) && (lastm != m)) */
                putsub(lin, i, m, sub);
                lastm = m;
            }
            if ((m == -1) || (m == i)) {
                fputc(lin[i], stdout);
                i = i + 1;
            } else
                i = m;
        }
    }

Debugging Crashes

(Figure: crashing bugs)

Bug Localization via Backtrace

- Can we circle out the backtrace for noncrashing bugs?
- Major challenge: we do not know where the abnormality happens
- Observations
  - Classification depends on discriminative features, which can be regarded as a kind of abnormality
  - Can we extract a backtrace from classification results?

Outline

- Motivation
- Related Work
- Classification of Program Executions
- Extract Backtrace from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions

Related Work

- Crashing bugs
  - Memory access monitoring
    - Purify [HJ92], Valgrind [SN00]
- Noncrashing bugs
  - Static program analysis
  - Traditional model checking
  - Model checking source code

Static Program Analysis

- Methodology
  - Examine the source code directly
  - Enumerate all possible execution paths without running the program
  - Check user-specified properties, e.g.,
    - no (*p) after free(p)
    - every lock(res) is paired with unlock(res)
    - receive_ack() before send_data()
- Strengths
  - Checks all possible execution paths
- Problems
  - Shallow semantics only: properties must map directly onto source code structure
- Tools
  - ESC [DRL+98], LCLint [EGH+94], ESP [DLS02], MC Checker [ECC00]

Traditional Model Checking

- Methodology
  - Formally model the system under check in a particular description language
  - Exhaustively explore the reachable states while checking desired or undesired properties
- Strengths
  - Models deep semantics
  - Naturally fits event-driven systems, like protocols
- Problems
  - Significant manual effort in modeling
  - State space explosion
- Tools
  - SMV [M93], SPIN [H97], Murphi [DDH+92]

Model Checking Source Code

- Methodology
  - Run the real program in a sandbox
  - Manipulate event happenings, e.g.,
    - message arrivals
    - the outcomes of memory allocation
- Strengths
  - Requires less manual specification
- Problems
  - Application restrictions, e.g.,
    - (still) event-driven programs
    - needs a clear mapping between source code and logical events
- Tools
  - CMC [MPC+02], VeriSoft [G97], Java PathFinder [BHP+00]

Summary of Related Work

- In common, semantic inputs are necessary
  - A program model
  - Properties to check
- Restricted application scenarios
  - Shallow semantics
  - Event-driven systems

Outline

- Motivation
- Related Work
- Classification of Program Executions
- Extract Backtrace from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions

Example Revisited

(subline code as above: the buggy version tests if (m >= 0) where the correct code tests if ((m >= 0) && (lastm != m)))

- No memory violations
- Not an event-driven program
- No explicit error properties

Identification of Incorrect Executions

- A two-class classification problem
  - How to abstract program executions?
    - Program behavior graphs
  - Feature selection
    - Edges + closed frequent subgraphs
- Program behavior graphs
  - Function-level abstraction of program behaviors

    int main(){ ... A(); ... B(); }
    int A(){ ... }
    int B(){ ... C(); ... }
    int C(){ ... }

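To make the function-level abstraction concrete, here is a minimal sketch, assuming hand-inserted instrumentation: it records the caller-to-callee edges of one run in a small adjacency matrix and prints them. The record_call helper and the function table are illustrative, not the authors' instrumentation.

    /* Sketch: collect the caller->callee edges of one execution in an
     * adjacency matrix, a simple stand-in for a behavior graph. */
    #include <stdio.h>

    #define MAX_FUNCS 4
    static const char *names[MAX_FUNCS] = {"main", "A", "B", "C"};
    static int edge[MAX_FUNCS][MAX_FUNCS];   /* edge[i][j] = 1 if i called j */

    static void record_call(int caller, int callee) { edge[caller][callee] = 1; }

    int main(void) {
        /* Hypothetical trace of the slide's example: main->A, main->B, B->C */
        record_call(0, 1);
        record_call(0, 2);
        record_call(2, 3);
        for (int i = 0; i < MAX_FUNCS; i++)
            for (int j = 0; j < MAX_FUNCS; j++)
                if (edge[i][j]) printf("%s -> %s\n", names[i], names[j]);
        return 0;
    }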

Values of Classification

- A graph classification problem
  - Every execution gives one behavior graph
  - Two sets of instances: correct and incorrect
- Values of classification
  - Classification itself does not readily work for bug localization
    - The classifier only labels each run as correct or incorrect as a whole
    - It does not tell when the abnormality happens
  - Successful classification relies on discriminative features
    - Can discriminative features be treated as a kind of abnormality?
  - When does the abnormality happen?
    - Incremental classification?

Outline

- Motivation
- Related Work
- Classification of Program Executions
- Extract Backtrace from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions

Incremental Classification

- Classification works only when instances of the two classes differ, so classification accuracy can be used as a measure of difference
- Relate classification dynamics to bug-relevant functions

Illustration: Precision Boost

(Figure: behavior graphs of one correct execution and one incorrect execution, both rooted at main)

Bug Relevance

- Precision boost: for each function F,
  - precision boost = exit precision - entrance precision
- Intuition
  - Differences take place within the execution of F
  - Abnormalities happen while F is on the stack
  - The larger the precision boost, the more likely F is bug-relevant, i.e., part of the backtrace
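As a toy illustration of the ranking rule above, the sketch below computes the precision boost of a few functions and would rank subline highest; all precision values are made up for illustration, not measured.

    /* Sketch: precision boost = exit precision - entrance precision. */
    #include <stdio.h>

    struct func { const char *name; double entrance, exit; };

    int main(void) {
        struct func fs[] = {
            {"main",    0.50, 0.95},
            {"subline", 0.55, 0.90},  /* large boost -> likely in the backtrace */
            {"amatch",  0.80, 0.82},  /* small boost -> likely innocent */
        };
        for (unsigned i = 0; i < sizeof fs / sizeof fs[0]; i++)
            printf("%-8s boost = %.2f\n", fs[i].name, fs[i].exit - fs[i].entrance);
        return 0;
    }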

Outline

- Related Work
- Classification of Program Executions
- Extract Backtrace from Classification Dynamics
- Case Study
- Conclusions

Case Study

(subline code as shown earlier, with the buggy test if (m >= 0))

- Subject program: replace
  - Performs regular expression matching and substitution
  - 563 lines of C code; 17 functions involved
- Execution behaviors
  - 130 out of 5,542 test cases fail to give correct outputs
  - No incorrect execution incurs a segmentation fault
- A logic bug: can we circle out the backtrace for this bug?

Precision Pairs

(Figure: entrance and exit precision pairs for each function)

Precision Boost Analysis

- Gives an objective judgment of bug-relevant functions
- The main function is always bug-relevant
- Stepwise precision boost
- Line-up property

Backtrace for Noncrashing Bugs

(Figure: the backtrace extracted for the replace bug)

Method Summary

- Incorrect executions can be identified from program runtime behaviors
- Classification dynamics can give away the backtrace of a noncrashing bug without any semantic input
- Data mining can contribute to software engineering and systems research in general

Outline

- Motivation
- Related Work
- Classification of Program Executions
- Extract Backtrace from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions

An Example

    void dodash(char delim, char *src, int *i, char *dest, int *j, int maxset)
    {
        while (...) {
            ...
            if (isalnum(src[*i+1]) && src[*i-1] <= src[*i+1]) {
                for (k = src[*i-1]+1; k <= src[*i+1]; k++)
                    junk = addst(k, dest, j, maxset);
                *i = *i + 1;
            }
            *i = *i + 1;
        }
    }

- Replace program: 563 lines of C code, 20 functions
- Symptom: 30 out of 5,542 test cases fail to give correct outputs, and no crashes
- Goal: localize the bug, and prioritize manual examination

Difficulty & Expectation

- Difficulty
  - Statically, even small programs are complex due to dependencies
  - Dynamically, execution paths can vary significantly across all possible inputs
  - Logic errors have no apparent symptoms
- Expectations
  - Unrealistic to fully unload developers
  - Localize the buggy region
  - Prioritize manual examination

Execution Profiling

- Full execution trace
  - Control flow + value tags
  - Too expensive to record at runtime and unwieldy to process
- Summarized control flow for conditionals (if, while, for)
  - Branch evaluation counts
  - Lightweight to collect at runtime
  - Easy to process, and effective
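A minimal sketch of how such branch evaluation counts could be collected, assuming source-level instrumentation; the PROBE macro and counter arrays are illustrative, not the profiling tool used in the study.

    /* Sketch: tally true/false evaluations of instrumented conditionals. */
    #include <stdio.h>

    #define NUM_BRANCHES 1
    static long n_true[NUM_BRANCHES], n_false[NUM_BRANCHES];

    /* Wrap a conditional so every evaluation is counted before use. */
    #define PROBE(id, cond) ((cond) ? (n_true[id]++, 1) : (n_false[id]++, 0))

    int main(void) {
        for (int x = -3; x <= 3; x++) {
            if (PROBE(0, x >= 0)) {
                /* branch body unchanged */
            }
        }
        printf("branch 0: true=%ld false=%ld\n", n_true[0], n_false[0]);
        return 0;
    }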

Analysis of the Example

    if (isalnum(src[*i+1]) && src[*i-1] <= src[*i+1]) {
        for (k = src[*i-1]+1; k <= src[*i+1]; k++)
            junk = addst(k, dest, j, maxset);
        *i = *i + 1;
    }

    A = isalnum(src[*i+1])
    B = src[*i-1] <= src[*i+1]

- An execution is logically correct until (A ^ B) is evaluated as true when the evaluation reaches this condition
- If we monitor program conditionals like A, their evaluations shed light on the hidden error and can be exploited for error isolation

Analysis of Branching Actions

- Compare the branch evaluation counts (n_AB, n_A¬B, n_¬AB, n_¬A¬B) between correct and incorrect runs of program P
- Across the 5,542 test cases, the true evaluation probability of (A ^ B) is 0.727 in a correct execution and 0.896 in an incorrect execution, on average
- The error location does exhibit detectable abnormal behavior in incorrect executions

Conditional Tests Work for Nonbranching Errors

    int makepat(char *arg, int start, char delim, char *pat)
    {
        ...
        if (!junk)
            result = 0;
        else
            result = i + 1;   /* off-by-one error; should be: result = i */
        return result;
    }

- The off-by-one error can still be detected through the conditional tests

Ranking Based on Boolean Bias

- Let input d_i have a desired output o_i; we execute P, and P passes the test iff the actual output o_i' = P(d_i) is identical to o_i
  - T_p = {t_i | P(d_i) matches o_i}: the passing runs
  - T_f = {t_i | P(d_i) does not match o_i}: the failing runs
- Boolean bias
  - n_t: the number of times a boolean feature B evaluates true; n_f: likewise for false
  - π(B) = (n_t - n_f) / (n_t + n_f)
  - It encodes the distribution of B's value: 1 if B always evaluates true, -1 if always false, and in between for all other mixtures
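The formula translates directly into code; a minimal sketch follows, where the zero-total fallback for a branch never reached in a run is an assumption of this sketch, not something the slides specify.

    /* Sketch: boolean bias pi(B) = (n_t - n_f) / (n_t + n_f). */
    #include <stdio.h>

    double boolean_bias(long n_true, long n_false) {
        long total = n_true + n_false;
        if (total == 0) return 0.0;   /* assumption: unreached branch -> 0 */
        return (double)(n_true - n_false) / (double)total;
    }

    int main(void) {
        printf("%.3f\n", boolean_bias(9, 1));  /* mostly true  -> close to  1 */
        printf("%.3f\n", boolean_bias(1, 9));  /* mostly false -> close to -1 */
        return 0;
    }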

Evaluation Abnormality

- The boolean bias of a branch P reflects the probability of its being evaluated true within one execution
- Suppose we have n correct and m incorrect executions; for any predicate P, we end up with
  - an observation sequence for correct runs: S_p = (X'_1, X'_2, ..., X'_n)
  - an observation sequence for incorrect runs: S_f = (X_1, X_2, ..., X_m)
- Can we infer whether P is suspicious based on S_p and S_f?

Underlying Populations

- Imagine that the underlying distributions of the boolean bias for correct and incorrect executions are f(X|θ_p) and f(X|θ_f)
- S_p and S_f can be viewed as random samples from these underlying populations
- Major heuristic: the larger the divergence between f(X|θ_p) and f(X|θ_f), the more relevant the branch P is to the bug

(Figure: the two probability densities over evaluation bias on [0, 1])

Major Challenges

(Figure: the same two densities, whose forms are unknown in practice)

- No knowledge of the closed form of either distribution
- Usually, we do not have sufficiently many incorrect executions to estimate f(X|θ_f) reliably

Our Approach: Hypothesis Testing

(Figure: hypothesis testing on S_p and S_f; details are in the SOBER paper [FSE 2005])
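The slides leave the test itself unspecified (the details are in the SOBER paper cited in the references). As a stand-in that captures the divergence heuristic, the sketch below scores a predicate with a simple two-sample z-like statistic over the boolean bias samples S_p and S_f; this is explicitly not the paper's exact test, and it assumes at least two runs per class.

    /* Sketch: score a predicate by how far the mean boolean bias of
     * incorrect runs sits from that of correct runs. Larger = more
     * suspicious. Not the exact SOBER statistic. */
    #include <math.h>
    #include <stdio.h>

    static double mean(const double *x, int n) {
        double s = 0;
        for (int i = 0; i < n; i++) s += x[i];
        return s / n;
    }

    static double var(const double *x, int n, double m) {   /* needs n >= 2 */
        double s = 0;
        for (int i = 0; i < n; i++) s += (x[i] - m) * (x[i] - m);
        return s / (n - 1);
    }

    double abnormality(const double *sp, int n, const double *sf, int m) {
        double mp = mean(sp, n), mf = mean(sf, m);
        double se = sqrt(var(sp, n, mp) / n + var(sf, m, mf) / m);
        return fabs(mf - mp) / (se > 0 ? se : 1e-9);
    }

    int main(void) {
        double sp[] = {0.70, 0.75, 0.71, 0.74};  /* biases in correct runs */
        double sf[] = {0.90, 0.88, 0.91};        /* biases in incorrect runs */
        printf("score = %.2f\n", abnormality(sp, 4, sf, 3));
        return 0;
    }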

Faulty Functions

- Motivation
  - Bugs are not necessarily on branches
  - Function rankings carry higher confidence than branch rankings
- Abnormality score for functions
  - Calculate the abnormality score of each branch within each function
  - Aggregate them

Two Evaluation Measures

- CombineRank
  - Combine the branch scores by summation
  - Intuition: when a function contains many abnormal branches, it is likely bug-relevant
- UpperRank
  - Choose the largest branch score as the representative
  - Intuition: when a function has one extremely abnormal branch, it is likely bug-relevant
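A minimal sketch of the two aggregation rules; the per-branch scores are illustrative values, not results from the study.

    /* Sketch: aggregate per-branch abnormality scores into a function
     * score by summation (CombineRank) or by maximum (UpperRank). */
    #include <stdio.h>

    double combine_rank(const double *scores, int n) {
        double s = 0;
        for (int i = 0; i < n; i++) s += scores[i];
        return s;
    }

    double upper_rank(const double *scores, int n) {
        double best = scores[0];
        for (int i = 1; i < n; i++)
            if (scores[i] > best) best = scores[i];
        return best;
    }

    int main(void) {
        double branch_scores[] = {1.2, 0.4, 3.1};
        printf("CombineRank = %.1f, UpperRank = %.1f\n",
               combine_rank(branch_scores, 3), upper_rank(branch_scores, 3));
        return 0;
    }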

Dodash vs. Omatch: Which Function Is Likely Buggy, and Which Measure Is More Effective?

(Figure: score comparison for dodash and omatch)

Bug Benchmark

- The Siemens program suite
  - 89 faulty variants of 6 subject programs, each 200-600 LOC
  - 89 known bugs in total
  - Mainly logic (or semantic) bugs
  - Widely used in software engineering research

Results on the Program replace

(Figure: localization results on replace)

Comparison between CombineRank and UpperRank

(Figure: how often the buggy function is ranked within the top k by each measure)

Results on Other Programs

(Figure: results on the remaining Siemens programs)

More Questions to Be Answered

- How should multiple errors in one program be handled?
- How can bugs be detected when only very few failing test cases are available?
- Is it really more effective to have more execution traces?
- How can program semantic analysis be integrated with this statistics-based testing algorithm?

Outline

- Motivation
- Related Work
- Classification of Program Executions
- Extract Backtrace from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions

Mining Copy-Paste Bugs

- Copy-pasting is common
  - 12% of the Linux file system [Kasper2003]
  - 19% of the X Window system [Baker1995]
- Copy-pasted code is error prone
  - Among 35 errors in Linux drivers/i2o, 34 are caused by copy-paste [Chou2001]

Example (simplified from linux-2.6.6/arch/sparc/prom/memory.c):

    void __init prom_meminit(void)
    {
        ...
        for (i = 0; i < n; i++) {
            total[i].adr   = list[i].addr;
            total[i].bytes = list[i].size;
            total[i].more  = &total[i+1];
        }
        ...
        for (i = 0; i < n; i++) {
            taken[i].adr   = list[i].addr;
            taken[i].bytes = list[i].size;
            taken[i].more  = &total[i+1];   /* forgot to change: should be &taken[i+1] */
        }
        ...
    }

An Overview of Copy-Paste Bug Detection

- Parse the source code and build a sequence database
- Mine for basic copy-pasted segments
- Compose larger copy-pasted segments
- Prune false positives

Parsing Source Code

- Purpose: build a sequence database
- Idea: map each statement to a number
  - Tokenize each component
    - Different operators, constants, and keywords map to different tokens
  - Handle identifier renaming: identifiers of the same type map to the same token

Example: old = 3; and new = 3; both tokenize to (5, 61, 20) and hash to 16
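A minimal sketch of the tokenize-and-hash step, assuming made-up token codes and a toy hash; CP-Miner's actual token assignment and hash function are not specified on the slide.

    /* Sketch: identifiers of the same kind share one token code, so a
     * renamed statement tokenizes, and therefore hashes, identically. */
    #include <stdio.h>

    enum { TOK_ID = 5, TOK_ASSIGN = 61, TOK_CONST = 20 };  /* illustrative */

    static unsigned hash_tokens(const int *toks, int n) {
        unsigned h = 0;
        for (int i = 0; i < n; i++) h = h * 31 + (unsigned)toks[i];
        return h % 97;  /* toy hash, small range for the example */
    }

    int main(void) {
        int old_stmt[] = {TOK_ID, TOK_ASSIGN, TOK_CONST};  /* old = 3; */
        int new_stmt[] = {TOK_ID, TOK_ASSIGN, TOK_CONST};  /* new = 3; */
        printf("%u %u\n", hash_tokens(old_stmt, 3), hash_tokens(new_stmt, 3));
        return 0;
    }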

Building the Sequence Database

- The program becomes one long sequence of statement hash values, but frequent sequence mining needs a sequence database, so the long sequence must be cut
  - Naive method: fixed-length windows
  - Our method: cut at basic blocks

Example (statement hash values):

    for (i=0; i<n; i++) {              65
        total[i].adr = list[i].addr;   16
        total[i].bytes = list[i].size; 16
        total[i].more = &total[i+1];   71
    }
    for (i=0; i<n; i++) {              65
        taken[i].adr = list[i].addr;   16
        taken[i].bytes = list[i].size; 16
        taken[i].more = &total[i+1];   71
    }

Final sequence DB: (65) (16, 16, 71) (65) (16, 16, 71)

Mining for Basic Copy-Pasted Segments

- Apply a frequent sequence mining algorithm to the sequence database
  - The frequent subsequence (16, 16, 71) identifies the two copy-pasted loop bodies
  - An inserted statement is tolerated: (16, 16, 10, 71) still matches with gap = 1
- Modification to standard mining: constrain the maximum gap
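To illustrate the gap constraint, the sketch below checks whether a pattern occurs in a statement-hash sequence with at most max_gap inserted statements between consecutive matched elements; it is a containment check only, not the frequent-sequence miner itself.

    /* Sketch: subsequence containment under a maximum-gap constraint. */
    #include <stdio.h>

    int matches_with_gap(const int *seq, int n, const int *pat, int m,
                         int max_gap) {
        int i = 0;
        for (int j = 0; j < m; j++) {
            int gap = 0;
            while (i < n && seq[i] != pat[j]) {
                i++;
                if (j > 0 && ++gap > max_gap) return 0;  /* too many insertions */
            }
            if (i == n) return 0;  /* pattern element never found */
            i++;
        }
        return 1;
    }

    int main(void) {
        int seq[] = {16, 16, 10, 71};  /* one statement inserted */
        int pat[] = {16, 16, 71};
        printf("%d\n", matches_with_gap(seq, 4, pat, 3, 1));  /* 1: allowed */
        printf("%d\n", matches_with_gap(seq, 4, pat, 3, 0));  /* 0: gap too big */
        return 0;
    }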

Composing Larger Copy-Pasted Segments

- Combine the neighboring copy-pasted segments repeatedly
  - e.g., the copy-pasted loop headers (65) combine with the copy-pasted loop bodies (16, 16, 71), so each whole for loop becomes one larger copy-pasted segment

Pruning False Positives

- Unmappable segments: identifier names cannot be mapped to corresponding ones
  - e.g., f(a1); f(a2); f(a3); vs. f1(b1); f1(b2); f2(b3); where f would have to map to both f1 and f2
- Tiny segments
- For more detail, see: Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-Paste and Related Bugs in Operating System Code. In Proc. 6th Symp. on Operating Systems Design and Implementation, 2004
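A minimal sketch of the unmappable-segment test: build the identifier mapping between a segment and its copy and flag a conflict when one original identifier would have to map to two different names. The fixed-size tables are an illustrative simplification, not CP-Miner's data structures.

    /* Sketch: is the identifier mapping between two copies consistent? */
    #include <stdio.h>
    #include <string.h>

    #define MAX_IDS 16

    int consistent_mapping(const char *orig[], const char *copy[], int n) {
        const char *from[MAX_IDS], *to[MAX_IDS];
        int k = 0;
        for (int i = 0; i < n; i++) {
            int j;
            for (j = 0; j < k; j++)
                if (strcmp(from[j], orig[i]) == 0) {
                    if (strcmp(to[j], copy[i]) != 0) return 0;  /* conflict */
                    break;
                }
            if (j == k) { from[k] = orig[i]; to[k] = copy[i]; k++; }
        }
        return 1;
    }

    int main(void) {
        const char *orig[] = {"f", "f", "f"};
        const char *copy[] = {"f1", "f1", "f2"};  /* f -> f1 and f -> f2 */
        printf("%s\n", consistent_mapping(orig, copy, 3) ? "ok" : "unmappable");
        return 0;
    }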

Some Test Results of C-P Bug Detection

Software     LOC     Verified Bugs   Potential Bugs (careless programming)   Time      Space (MB)
Linux        4.4 M   28              21                                      20 mins   527
FreeBSD      3.3 M   23              8                                       20 mins   459
Apache       224 K   5               0                                       15 secs   30
PostgreSQL   458 K   2               0                                       38 secs   57

Outline

- Motivation
- Related Work
- Classification of Program Executions
- Extract Backtrace from Classification Dynamics
- Mining Control Flow Abnormality for Logic Error Isolation
- CP-Miner: Mining Copy-Paste Bugs
- Conclusions

Conclusions

- Data mining reaches into software and computer systems research
- Incorrect executions can be identified from program runtime behaviors
- Classification dynamics can give away the backtrace for noncrashing bugs without any semantic input
- A hypothesis-testing-like approach localizes logic errors in software; no prior knowledge of program semantics is assumed
- Lots of other software bug mining methods remain to be explored

References

- [DRL+98] David L. Detlefs, K. Rustan M. Leino, Greg Nelson, and James B. Saxe. Extended Static Checking. 1998.
- [EGH+94] David Evans, John Guttag, James Horning, and Yang Meng Tan. LCLint: A Tool for Using Specifications to Check Code. In Proc. ACM SIGSOFT '94 Symp. on the Foundations of Software Engineering, pages 87-96, 1994.
- [DLS02] Manuvir Das, Sorin Lerner, and Mark Seigle. ESP: Path-Sensitive Program Verification in Polynomial Time. In Conf. on Programming Language Design and Implementation, 2002.
- [ECC00] D. R. Engler, B. Chelf, A. Chou, and S. Hallem. Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions. In Proc. 4th Symp. on Operating Systems Design and Implementation, October 2000.
- [M93] Ken McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.
- [H97] Gerard J. Holzmann. The Model Checker SPIN. IEEE Transactions on Software Engineering, 23(5):279-295, 1997.
- [DDH+92] David L. Dill, Andreas J. Drexler, Alan J. Hu, and C. Han Yang. Protocol Verification as a Hardware Design Aid. In IEEE Int. Conf. Computer Design: VLSI in Computers and Processors, pages 522-525, 1992.
- [MPC+02] M. Musuvathi, D. Y. W. Park, A. Chou, D. R. Engler, and D. L. Dill. CMC: A Pragmatic Approach to Model Checking Real Code. In Proc. 5th Symp. on Operating Systems Design and Implementation, 2002.

References (cont'd)

- [G97] P. Godefroid. Model Checking for Programming Languages Using VeriSoft. In Proc. 24th ACM Symp. on Principles of Programming Languages, 1997.
- [BHP+00] G. Brat, K. Havelund, S. Park, and W. Visser. Model Checking Programs. In IEEE Int. Conf. on Automated Software Engineering (ASE), 2000.
- [HJ92] R. Hastings and B. Joyce. Purify: Fast Detection of Memory Leaks and Access Errors. In Proc. Winter 1992 USENIX Conference, pages 125-138, San Francisco, California.
- Chao Liu, Xifeng Yan, and Jiawei Han. Mining Control Flow Abnormality for Logic Error Isolation. In Proc. 2006 SIAM Int. Conf. on Data Mining (SDM'06), Bethesda, MD, April 2006.
- C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff. SOBER: Statistical Model-Based Bug Localization. In Proc. 2005 ACM SIGSOFT Symp. on Foundations of Software Engineering (FSE 2005), Lisbon, Portugal, September 2005.
- C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu. Mining Behavior Graphs for Backtrace of Noncrashing Bugs. In Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport Beach, CA, April 2005.
- [SN00] Julian Seward and Nick Nethercote. Valgrind, an Open-Source Memory Debugger for x86 GNU/Linux. http://valgrind.org/
- [LLM+04] Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-Paste and Related Bugs in Operating System Code. In Proc. 6th Symp. on Operating Systems Design and Implementation, 2004.
- [LCS+04] Zhenmin Li, Zhifeng Chen, Sudarshan M. Srinivasan, and Yuanyuan Zhou. C-Miner: Mining Block Correlations in Storage Systems. In Proc. 3rd USENIX Conf. on File and Storage Technologies, 2004.
