
Development of a combined white and black box security testing tool for detecting injection flaws in web applications

MSE Project Thesis 2

Advisor: Prof. Dr. Marc Rennhard

Kevin Denver

Winterthur, 2nd August 2010


Abstract

This project thesis builds upon the findings of the seminar thesis "Preliminary study of white box
security testing for the integration into the ASTF framework" which was written at the Institut
für angewandte Informationstechnologie (InIT) by Kevin Denver (Feb. 2010).

During this project thesis, a web application security testing framework (WASTF) has been developed
in the Java programming language. WASTF is a standalone application which uses several
open source libraries such as HtmlUnit. HtmlUnit provides JavaScript support, which is essential for
thoroughly crawling and analysing today’s web applications.

WASTF includes a plugin which combines black and white box testing techniques to improve the
detection of input validation vulnerabilities compared with a pure black box approach. The plugin uses
database query log files to detect web application input validation vulnerabilities in an automated
manner.

Tests have shown that combining black and white box testing methodology detects SQL input
validation vulnerabilities more accurately and faster than a black box testing approach alone.
In one test, WASTF was 14 times faster than w3af in detecting SQL input validation vulnerabilities.

Abstract

This project thesis builds upon the results of the seminar thesis entitled "Preliminary study of
white box security testing for the integration into the ASTF framework". That thesis was written at the
Institut für angewandte Informationstechnologie (InIT) by Kevin Denver (Feb. 2010).

During this project thesis, a framework was written which automatically tests web applications
for their security. The Web Application Security Testing Framework (WASTF) is a standalone,
console-based application written in Java. WASTF uses several open source libraries such as
HtmlUnit. HtmlUnit is a library which can successfully emulate web browsers with JavaScript
support. JavaScript support is essential if today's web applications are to be analysed thoroughly.

WASTF includes a plugin which combines black and white box testing methods. The combination is
intended to improve, or rather make more precise, the detection of input validation vulnerabilities
compared with black box only testing methods. The plugin uses database query log files to detect
SQL input validation vulnerabilities in an automated manner.

Subsequent tests have shown that combining black and white box testing methods can increase
the accuracy and the speed of vulnerability detection. One test showed that WASTF is 14 times
faster than a comparable black box only web application scanner (w3af).

Contact

Zürcher Hochschule für Angewandte Wissenschaften (ZHAW)
c/o Institut für angewandte Informationstechnologie InIT
Steinberggasse 13
Postfach 805
CH-8401 Winterthur

Tel.: +41 58 934 75 87
Fax: +41 58 935 75 87

E-Mail: info.init@zhaw.ch
http://www.zhaw.ch
http://www.init.zhaw.ch

Name            Staff-Code   Function                    E-Mail
Denver Kevin    denk         Student                     kevin.denver@zhaw.ch
Rennhard Marc   rema         Project Leader & Advisor    marc.rennhard@zhaw.ch

Contents

1 Introduction
1.1 Motivation
1.2 Limitations
1.3 Related Work

2 Combined White and Black Box Security Testing
2.1 Information Sources
2.1.1 Source Code Repository
2.1.2 Source Code
2.1.3 Runtime Analysis & Profiling
2.1.4 Configuration files
2.1.5 Log/Trace files
2.1.6 Resource Monitor
2.1.7 Application Frameworks
2.1.8 Data Storage
2.1.9 Recording of Network Traffic
2.2 How to combine White and Black Box Security Testing
2.2.1 Web Application Profiling Example
2.2.2 Detection of Injection Flaws by using the Database query.log File

3 Application Requirements
3.1 Overall Goal & Project Idea
3.2 Requirements
3.2.1 Functional Requirements
3.2.2 Usability Requirements
3.2.3 Reliability Requirements
3.2.4 Performance Requirements
3.2.5 Supportability Requirements
3.3 User Stories

4 The Database Query Log & How to Detect Input Validation Vulnerabilities
4.1 Configuration of the Database Query Log
4.1.1 MySQL
4.1.2 PostgreSQL
4.1.3 Microsoft SQL Server
4.1.4 Oracle Database Standard Edition
4.2 How to Detect Input Validation Vulnerabilities
4.2.1 Online Approach
4.2.2 Offline Approach
4.2.3 Special Character Probing & Performance

5 Security Testing Framework - w3af
5.1 w3af Plugins
5.2 w3af WebSpider Plugin
5.2.1 Settings
5.2.2 Accessing Secured Content
5.2.3 Shortcomings of the w3af WebSpider
5.2.4 Wivet Framework & Performance of the w3af WebSpider plugin
5.3 XSS Injection Plugin
5.3.1 Settings
5.4 SQL Injection Plugin
5.5 Conclusion

6 Security Testing Framework - WASTF
6.1 Overview & Interactive Command Line Interface
6.2 Program Flow
6.2.1 Interactive Mode
6.2.2 Automated Mode
6.3 Package Overview
6.4 Data Storage
6.4.1 Data Access Object
6.4.2 Database Schema

7 HtmlUnit based Web Spider
7.1 HtmlUnit Overview
7.2 Submitting a form with HtmlUnit
7.3 HtmlUnit JavaScript Performance
7.4 Design of the Web Spider plugin
7.4.1 Configuration Settings
7.4.2 Program Flow
7.5 Smartly filling out HTML forms
7.5.1 Non Invasive
7.5.2 Invasive
7.5.3 Limitations

8 Detecting Injection Flaws
8.1 SQL Code Injection Mutator
8.1.1 Validating SQL Statements
8.1.2 Mutation Routine
8.1.3 Limitations of SQL Validation Libraries

9 Test Series
9.1 Test Environment
9.1.1 Base System
9.1.2 Network
9.1.3 Usernames and Passwords
9.2 Web Applications
9.2.1 Index & Overview
9.2.2 phpMyAdmin Version 3.3.2deb1
9.2.3 WIVET Version 3
9.2.4 Damn Vulnerable Web Application (DVWA) Version 1.0.6
9.2.5 Wordpress Version 3.0
9.2.6 Magento Version 1.4.1.0
9.2.7 Flowershop
9.3 Test Series
9.3.1 Test Set-Up
9.3.2 Test Case: WIVET [Web Spider only]
9.3.3 Test Case: Magento [Web Spider only]
9.3.4 Test Case: Wordpress [Web Spider only]
9.3.5 Test Case: Flowershop [Web Spider only]
9.3.6 Test Case: DVWA [Web Form Login only]
9.3.7 Test Case: Secure Messaging (Privasphere) [Web Form Login only]
9.3.8 Test Case: DVWA [Database Query Log - Online Mode]
9.3.9 Test Case: DVWA [Database Query Log - Offline Mode]
9.3.10 Test Case: Wordpress [Web Spider & Database Query Log - Online Mode]
9.4 Summary

10 User Manual & Development Guide
10.1 User Manual
10.1.1 Hard- & Software Requirements
10.1.2 Installation Instructions
10.1.3 Running WASTF
10.1.4 Video Tutorials
10.1.5 Use WASTF Results for Automated Processing
10.1.6 http-settings Configuration
10.1.7 webFormLogin Plugin Configuration
10.1.8 webSpider Plugin Configuration
10.1.9 misc-settings Configuration
10.1.10 databaseQueryLog Plugin Configuration
10.1.11 Using the Offline Mode
10.2 Development Guide
10.2.1 Prerequisites
10.2.2 Installing the Development Environment
10.2.3 Building WASTF with Maven
10.2.4 Free/Third-Party Libraries Used
10.2.5 Use logback to debug WASTF
10.2.6 WASTF Test Environment
10.3 Sonar
10.3.1 Analysing WASTF with Sonar
10.3.2 Conclusion about the usage of Sonar in Software Projects

11 Summary
11.1 Conclusion
11.2 Further Steps
11.2.1 Application Architecture
11.2.2 Features

Appendix
List of figures
List of tables
List of listings
Bibliography

A HtmlUnit JavaScript Performance (Runtime Numbers)

B HtmlUnit JavaScript Performance (Code)

C Web Spider UML Class Diagram

D Web Spider UML Sequence Diagram

E Original Assignment

• Chapter 1: Gives a brief overview of the main aspects of this project thesis. This includes the
motivation behind the project and how combining white and black box testing techniques improves
the detection of injection flaws in web applications. Additionally, this chapter lists the limitations
of the developed application, i.e. the features which have been left aside.

• Chapter 2: Summarises the findings of [4] which serve as a basis for the development of a combined
white and black box web application security testing application throughout this project thesis.

• Chapter 3: Documents the first set of requirements for a combined white and black box security testing
application used for detecting injection flaws in web applications. The chapter contains a list of user
stories and application requirements the security testing application needs to address.

• Chapter 4: Covers configuring and parsing database query logs from various database
products like MySQL, PostgreSQL, Oracle and Microsoft SQL Server. Most if not all recent database
solutions provide some kind of logging facility to monitor processed SQL queries coming from clients.
However, the metadata accompanying the logged SQL queries differs greatly in its level of detail between
the various database products.

• Chapter 5: Evaluates w3af as a basis for a combined white and black box security testing application.
w3af is an actively developed open source project with a lot of existing plugins including a web crawler
component and several code injection modules. It might be possible to enhance w3af with the missing
white box functionality.

• Chapter 6: Outlines the design decisions and overall package structure of the Web Application Security
Testing Framework (WASTF) which has been developed during this project thesis. The framework forms
the foundation for the combined white and black box web application security testing plugin. One
possibility would have been to extend the w3af framework, but this has been ruled out in chapter 5 because
of w3af’s shortcomings in certain areas such as the web spider plugin.

• Chapter 7: Describes the open source HtmlUnit project and the benefits it provides for testing web
applications automatically. Additionally, the design and performance of the web spider plugin, which
has been developed as part of this project thesis, are being discussed and visualised.

• Chapter 8: Outlines the final implementation details of the previously elaborated combined white and
black box plugin, which is based on parsing database query log files. In particular, it describes how the
plugin uses SQL syntax validation libraries to find actually working exploits for discovered SQL injection
vulnerabilities.

• Chapter 9: Is concerned with testing the WASTF application thoroughly, especially in comparison with
standard black box web application security testing tools such as w3af. The goal is to provide a detailed
comparison between WASTF and w3af with regard to sent HTTP(S) requests, number of detected
vulnerabilities, web pages discovered etc.

• Chapter 10: Contains the user manual and the development guide for WASTF. The user
manual leads new users through the installation process of WASTF and helps them to get familiar with its
commands. The development guide is for advanced users who want to extend the functionality of WASTF
and build it from source.

• Chapter 11: Recapitulates the findings and limitations of this project thesis and proposes a list of further
steps for the project. These steps include some architectural refactoring and adding new features for
extending the possible applications of WASTF.

1 Introduction

This first chapter gives a brief overview of the main aspects of this project thesis. This includes the
motivation behind this project and how combining white and black box testing techniques improves
the detection of injection flaws in web applications.

1.1 Motivation

The Automated Security-Testing Framework (ASTF), which has been developed at the Institut für
angewandte Informationstechnologie (InIT) as part of a KTI project, combines different security and
non-security testing tools. The main focus of ASTF lies in doing continuous and reproducible
software tests during the lifetime of an application. ASTF relies on third party tools which perform the
actual tests. Tools like nmap¹ or Nessus² are used to scan for open ports or known security holes.
The detection of typical web application vulnerabilities like Cross Site Scripting and SQL Injection
is covered by w3af³. Yet all of these tools take an external perspective of the test object to derive test
cases. This method of testing is also called black box testing. While this method can uncover flaws in
the application, one cannot be sure that all existing paths are tested. White box testing, on the other
hand, takes an internal perspective on the test object and uses as much additional information
as possible to derive test cases. Such additional information can consist of the actual source code
of the application or metadata about the internal structure of the application.
One of the future goals of ASTF is to provide a mechanism to conduct white box security tests in an
automated way on different kinds of web applications, whether they are written in PHP, Ruby, Java
or any other server side language. The objective of this project thesis is to build a
working web application security testing tool by combining white and black box testing techniques
as described in [4].

¹ http://nmap.org [15.02.2010]
² http://www.nessus.org/nessus/ [15.02.2010]
³ http://w3af.sourceforge.net/ [15.02.2010]

1.2 Limitations

A combined white and black box security testing application for detecting input validation
vulnerabilities in web applications has been written during this project thesis with the following limitations:

• The database query log module only supports the MySQL database so far.

• The vulnerability detection routine does not consider different encodings of input characters.
If a web application consistently filters input characters such as < but not the URL encoded
equivalent %3C, the application is still vulnerable but the routine will not detect the vulnerability
yet.

These limitations are merely missing features and can be easily added to the existing application in
another development cycle.
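
The encoding limitation can be illustrated with a small sketch of such an incomplete filter. The class NaiveFilter and its methods are hypothetical and are not part of WASTF or of any tested application. They only model a web application which checks the raw input for the character < before a later processing stage URL-decodes it. A probing routine which only sends the literal character would see it filtered and would report no vulnerability, although the encoded form still gets through.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: models an application-side filter, not WASTF code.
class NaiveFilter {

    // Rejects input that contains the literal character '<'.
    static boolean isRejected(String rawInput) {
        return rawInput.contains("<");
    }

    // Decodes the value the way a later processing stage might do it.
    static String decode(String rawInput) {
        return URLDecoder.decode(rawInput, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String plain = "<script>";
        String encoded = "%3Cscript%3E";

        System.out.println(isRejected(plain));    // true: the literal '<' is caught
        System.out.println(isRejected(encoded));  // false: the encoded form passes the filter
        System.out.println(decode(encoded));      // <script>
    }
}
```

A detection routine which also sends the encoded variants of each special character would cover this case, at the price of additional requests per input field.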

1.3 Related Work

A fairly large number of white papers exist on the topic of white box security testing of web
applications. Yet all of these papers focus on one of the following:

• parsing the source code of the application and deriving test cases out of it (see [16],[6])

• describing an application layout for writing a test management system (see [8, p.5])

• retrieving an object model of a web application containing all static and dynamic pages as well
as the allowed navigation links between the pages (see [15])

However, none of the white papers listed in the reference section of this study tries to combine a classical
black box security testing approach with additional information sources coming from an internal
perspective of the test object. The next chapter summarises the identified information sources as
illustrated in [4] and how these sources are being used in this project thesis to enhance the detection
of injection flaws in web applications.

2 Combined White and Black Box Security Testing

The following chapter summarises the findings of [4] which serve as a basis for the development of a
combined white and black box web application security testing application throughout this project
thesis. It briefly describes the information sources available to a white box security testing
agent, given that the agent has access to the machine on which the test object is deployed. Based on this
list of information sources, one of many useful test cases is described in which the additional
information helps to improve the results of a black box only test. By
using the additional information it should be possible to detect bugs and flaws in the tested application
which would not have surfaced with a pure black box approach to testing. Generally speaking,
the results should be more detailed and more extensive.

2.1 Information Sources

Information sources become available to a white box security testing agent by introducing an internal
perspective on the test object. These information sources should help to improve the overall test
results obtained from a black box test. This section lists and describes some of the possible
information sources available to a test agent, given that the agent has access to the machine on which the test
object is deployed.

The following list is certainly not complete but gives a good overview of some possible sources
which could be accessed during a combined black and white box test. Depending on the
programming language and web application framework used to develop the application, even more
information sources might become available, such as configuration files or framework specific code
annotations. There exists an almost uncountable number of web programming languages and
frameworks, like Spring¹, Struts² and Wicket³ for Java and CakePHP⁴ for the PHP programming
language. Identified information sources can be used before the actual test run begins, during the test
run as real-time feedback from the test object, or after the test run has finished. Information
sources which are accessed before the test begins might be used to derive tailored test cases for the
test object. The information sources can generally be divided into two categories: static sources
like configuration files and dynamic sources like log files.

The following sub chapters are an excerpt from [4].


¹ http://www.springsource.org/ [16.02.09]
² http://struts.apache.org/ [16.02.09]
³ http://wicket.apache.org/ [16.02.09]
⁴ http://cakephp.org/ [16.02.09]



2.1.1 Source Code Repository

An obvious first source is the source code repository. A test agent might have access to the source code
management server which contains the latest source code of the test object. With access to the source code
repository, the test agent can query the following metadata:

• Name of the developer who last edited a specific source file

• Date of the last change of a specific source file

• Differences between two versions of a specific source file

• Checksum of all files in the repository

• Comments for all the changes made on a specific source file

2.1.2 Source Code

By looking closely at the source code, the test agent might find even more metadata inside the code.
In particular, classes and methods written in the Java programming language can contain Java
annotations. Java annotations were introduced in version 1.5 and come in very handy for analysing source
code. Annotations do not directly affect program semantics, but they do affect the way programs
are treated by tools and libraries, which can in turn affect the semantics of the running program.
Annotations can be read from source files, class files, or reflectively at run time. Listing 2.1 shows a
simple EJB session bean with the security annotations @RolesAllowed on line 2 and @PermitAll on line
9. They instruct the EJB container to allow only users with the bankemployee
role to call the methods of the bean, except for the method findCustomer, which can be called by any
user regardless of role. This concept of adding metadata to the source code is certainly not new,
and other programming languages might have similar features available. Any kind of source code
metadata is useful when trying to derive test cases tailored to the test object.

 1  @Stateless
 2  @RolesAllowed("bankemployee")
 3  public class BankServiceBean implements BankService {
 4
 5      @PersistenceContext(unitName = "BankService")
 6      private EntityManager em;
 7      private Customer cust;
 8
 9      @PermitAll
10      public Customer findCustomer(int custId) {
11          return ((Customer) em.find(Customer.class, custId));
12      }
13      public void addCustomer(int custId, String firstName, String lastName) {
14          cust = new Customer();
15          cust.setId(custId);
16          cust.setFirstName(firstName);
17          cust.setLastName(lastName);
18          em.persist(cust);
19      }
20
21      public void updateCustomer(Customer cust) {
22          Customer mergedCust = em.merge(cust);
23      }
24  }

Listing 2.1: EJB3 Security Annotations Example
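
How a test agent could read such metadata reflectively at run time can be sketched as follows. This is a minimal sketch under stated assumptions, not code from WASTF: the annotations AllowedRoles and OpenToAll are hypothetical stand-ins for @RolesAllowed and @PermitAll, declared locally so that the example compiles without a Java EE library, and DemoBankService is a reduced, hypothetical version of the bean from Listing 2.1 without persistence.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: derives an access rule per method from annotations.
class AccessRuleReader {

    // Maps each declared method name to the roles allowed to call it ("*" = any user).
    static Map<String, String> accessRules(Class<?> type) {
        AllowedRoles classRoles = type.getAnnotation(AllowedRoles.class);
        Map<String, String> rules = new TreeMap<>();
        for (Method method : type.getDeclaredMethods()) {
            AllowedRoles methodRoles = method.getAnnotation(AllowedRoles.class);
            String roles;
            if (method.isAnnotationPresent(OpenToAll.class)) {
                roles = "*";                                     // explicitly open to everyone
            } else if (methodRoles != null) {
                roles = String.join(",", methodRoles.value());   // method-level restriction
            } else if (classRoles != null) {
                roles = String.join(",", classRoles.value());    // inherited from the class
            } else {
                roles = "unspecified";
            }
            rules.put(method.getName(), roles);
        }
        return rules;
    }

    public static void main(String[] args) {
        accessRules(DemoBankService.class)
                .forEach((name, roles) -> System.out.println(name + " -> " + roles));
    }
}

// Stand-in for @RolesAllowed (assumption, see lead-in).
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface AllowedRoles {
    String[] value();
}

// Stand-in for @PermitAll (assumption, see lead-in).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface OpenToAll {
}

// Reduced, hypothetical version of the bean from Listing 2.1.
@AllowedRoles("bankemployee")
class DemoBankService {

    @OpenToAll
    public String findCustomer(int custId) {
        return "customer-" + custId;
    }

    public void addCustomer(int custId, String firstName, String lastName) {
        // persistence omitted in this sketch
    }
}
```

From the resulting rules a tailored test case can be derived, for example a request to addCustomer by a user without the bankemployee role, which the application is expected to reject.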

2.1.3 Runtime Analysis & Profiling

Another approach, which derives dynamically created and tailored test cases by analysing the test
object at runtime, is presented in the white paper by Patrice Godefroid et al. (see [6]). Taken from
the abstract:

Fuzz testing is an effective technique for finding security vulnerabilities in software.
Traditionally, fuzz testing tools apply random mutations to well-formed inputs of a program
and test the resulting values. We present an alternative white box fuzz testing approach
inspired by recent advances in symbolic execution and dynamic test generation. Our
approach records an actual run of the program under test on a well-formed input,
symbolically evaluates the recorded trace, and gathers constraints on inputs capturing how the
program uses these. The collected constraints are then negated one by one and solved
with a constraint solver, producing new inputs that exercise different control paths in the
program. This process is repeated with the help of a code-coverage maximizing heuristic
designed to find defects as fast as possible.

The main limitation of this approach is that the authors focused on applications which are compiled
and available to the test agent as executable machine code. For web applications a similar approach
might work in two ways. If the application is written in PHP, profiling modules could be added to the
Zend engine (see footnote 5), which compiles and executes PHP scripts and is included in the mod_php
module of the Apache webserver. If the application is written in Java, existing profiling applications
can be used. For a list of available open source Java profilers visit:
http://java-source.net/open-source/profilers. For example, λProbe (see footnote 6) is a
self-sufficient web application which helps to visualise various parameters of an Apache Tomcat
instance in real time. λProbe is designed to work specifically with Tomcat (JBoss compatibility has
been added recently), so it is able to access far more information than is normally available to JMX
agents. The JMX technology provides the tools for building distributed, web based, modular and
dynamic solutions for managing and monitoring devices, applications, and service-driven networks.
The following list is a subset of the features available through λProbe:

5 http://www.zend.com/en/ [21.02.10]
6 http://www.lambdaprobe.org [21.02.10]

• Comprehensive JVM memory usage monitor

• Display of deployed applications, their status, session count, session object count, context ob-
ject count, datasource usage etc.

• Ability to view deployed JSP files

• Ability to compile all or selected JSP files at any time

• Ability to pre-compile JSP files on application deployment

• Ability to view auto-generated JSP servlets

• Display of list of sessions for a particular application

• Display of session attributes and their values for a particular application. Ability to remove
session attributes

• Ability to view application context attributes and their values

• Graphical display of datasource details including maximum number of connections, number
of busy connections and configuration details

• Ability to group datasource properties by URL to help visualizing impact on the databases

• Display of system information including System.properties, memory usage bar and OS details

• Real time OS memory usage, swap usage and CPU utilisation monitoring

• Ability to show information about log files and download selected files

2.1.4 Configuration files

Configuration files of daemons or any other kind of application contain valuable information as well.
Depending on the application such configuration files might contain a list of users which are allowed
to access the service or a list of modules which are enabled etc. Of course the kind of information
stored in configuration files differs greatly between applications and the possibility is high that they
won’t be useful at all.

2.1.5 Log/Trace files

Trace files are text files which are automatically created by an application to record error messages
that are useful when troubleshooting a problem. Whenever an error occurs in the application, it
creates a trace file and writes the error message to it. Depending on the type of the error message,
the application creates one of two types of trace files:

1. Background trace files: Background trace files are created to record exceptions and errors
generated during operations. Whenever background processes are unable to function normally
for any reason, they create background trace files.

2. User trace files: User trace files are created to record errors generated by user sessions.
Whenever a user session encounters an unusual situation or any internal error, the server processes
generate a user trace file.

If the web application under test runs on the Apache webserver, detailed logging of the requests
received by the daemon can be enabled. The Apache access.log records all requests
processed by the server. The format of the access log is highly configurable. It is specified
using a format string that looks much like a C-style printf(1) format string. See the Apache
webserver documentation for more details on how to configure the format of the output string:
http://httpd.apache.org/docs/2.0/logs.html

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] GET /apache_pb.gif HTTP/1.0 200 2326

Listing 2.2: Sample Apache access.log output
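
A test agent that wants to use such entries has to split them into fields first. The following Java sketch parses a line in the layout of listing 2.2 with a regular expression. It assumes the default field order shown above; the quotes which Apache normally places around the request line are treated as optional because listing 2.2 omits them. The class name AccessLogParser and the map keys are illustrative choices.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccessLogParser {
    // host ident user [timestamp] method path protocol status bytes
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"?(\\S+) (\\S+) (\\S+?)\"? (\\d{3}) (\\d+|-)$");

    // Returns the fields of one access log line, or an empty map if the line
    // does not follow the assumed format.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) return fields;
        fields.put("host", m.group(1));
        fields.put("user", m.group(3));
        fields.put("timestamp", m.group(4));
        fields.put("method", m.group(5));
        fields.put("path", m.group(6));
        fields.put("protocol", m.group(7));
        fields.put("status", m.group(8));
        fields.put("bytes", m.group(9));
        return fields;
    }
}
```

A customised LogFormat directive changes the field order, so the pattern would have to be adapted to the format string configured on the test machine.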

The error.log is the place where the Apache webserver will send diagnostic information and
record any errors that it encounters in processing requests. It is the first place to look when a problem
occurs with starting the server or with the operation of the server, since it will often contain details of
what went wrong and how to fix it. The format of the error log is relatively free-form and descriptive.
But there is certain information that is contained in most error log entries. A very wide variety of
different messages can appear in the error log. The error log will also contain debugging output from
CGI scripts. Any information written to stderr by a CGI script will be copied directly to the error log.
It is not possible to customise the error log by adding or removing information. However, error log
entries dealing with particular requests have corresponding entries in the access log. For example,
the example entry in listing 2.3 corresponds to an access log entry with status code 403. Since it is
possible to customise the access log, you can obtain more information about error conditions using
that log file. During testing, it is often useful to continuously monitor the error log for any problems.

[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server
    configuration: /export/home/live/ap/htdocs/test

Listing 2.3: Sample Apache error.log output

The optional mod_log_forensic module is an often forgotten yet very handy tool for debugging the
Apache webserver. It gives each request a unique id which can then be used to track the request
through the log file. It first writes the request prefixed with the unique id, then it writes the same id
again once the request is completed. This is very useful for spotting scripts which never finish, be it
due to client or server issues. Listing 2.4 shows an entire request including browser information,
cookies etc. The only downside of this module is that the output format is fixed and cannot be changed.

+sS6NLH8AAAEAAHoSUdUAAAAB|GET /favicon.ico HTTP/1.1|User-Agent:Opera/9.26
    (Windows NT 5.1; U; en)|Host:example.com|Accept:text/html,
    application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg,
    image/gif, image/x-xbitmap, */*;q=0.1|Accept-Language:en-GB,en;q=0.9|
    Accept-Charset:iso-8859-1, utf-8, utf-16, *;q=0.1|Accept-Encoding:deflate,
    gzip, x-gzip, identity, *;q=0|Referer:http%3a//example.com/test.html|
    Connection:Keep-Alive, TE|TE:deflate, gzip, chunked, identity, trailers

Listing 2.4: Sample Apache mod_log_forensic module output
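
Spotting requests which never finish can be automated by pairing the two kinds of entries. The sketch below assumes that every started request is logged as a line beginning with "+" followed by the id and "|"-separated fields, and that completion is logged as a line consisting of "-" followed by the same id. The class name ForensicLogScanner is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ForensicLogScanner {
    // Returns forensic id -> request line for every request that was started
    // but for which no completion entry was found.
    static Map<String, String> unfinishedRequests(List<String> logLines) {
        Map<String, String> open = new LinkedHashMap<>();
        for (String line : logLines) {
            if (line.startsWith("+")) {
                // "+<id>|<request line>|<header>|..."
                String[] fields = line.substring(1).split("\\|", 3);
                open.put(fields[0], fields.length > 1 ? fields[1] : "");
            } else if (line.startsWith("-")) {
                // "-<id>" marks the request as completed
                open.remove(line.substring(1).trim());
            }
        }
        return open;
    }
}
```

Every id that remains in the returned map belongs to a request for which the server never logged completion, which makes the corresponding script a candidate for closer inspection.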

2.1.6 Resource Monitor

Resource monitors provide an ongoing look at processor activity in real time. They list various
system information such as the most CPU-intensive tasks, memory consumption, network activity etc.
The best-known tool under Linux is the top command. See listing 2.5 for a sample output.

Tasks: 173 total,   2 running, 171 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.5%us,  1.5%sy,  0.0%ni, 97.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2030288k total,  1553696k used,   476592k free,    53592k buffers
Swap:  5947384k total,   127464k used,  5819920k free,  1161812k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1376 root      20   0 85000  25m 8440 S    3  1.3  29:23.22 Xorg
11522 kevin     20   0  2472 1208  896 R    2  0.1   0:00.06 top
   22 root      15  -5     0    0    0 S    1  0.0   0:10.91 ata/0
10656 kevin     20   0  246m 113m  26m S    1  5.7   1:47.67 firefox-bin
    1 root      20   0  2608 1056  752 S    0  0.1   0:01.04 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0

Listing 2.5: Sample top output

2.1.7 Application Frameworks

Depending on the framework the developer used to implement the test object, there might exist
additional configuration files needed specifically by the framework. For example, if the web
application uses the Spring framework prior to version 2.5, the developer had to configure URL
mappings inside an XML configuration file. If a user accesses a specific resource by entering a URL,
the Java Bean container loads the Java object associated with the given URL. Since version 2.5 these
mappings can also be configured with Java annotations directly inside the source code. See listing 2.6
for an example of such a mapping XML file.

<bean id="urlMappings"
      class="org.springframework.web.servlet.handler.SimpleUrlHandlerMapping">
  <property name="mappings">
    <props>
      <prop key="/logon.action">authController</prop>
      <prop key="/help.action">coreController</prop>
      <prop key="/contact.action">coreController</prop>
      <prop key="/authenticated/core/select_lounge.action">selectLoungeForm</prop>
      <prop key="/auth/core/home.action">homeController</prop>
      <prop key="/auth/core/edit_welcomepage.action">editWelcomePageForm</prop>
      <prop key="/auth/core/edit_rulepage.action">editRulePageForm</prop>
    </props>
  </property>
</bean>

Listing 2.6: Simple Spring URL mapping example prior to version 2.5

Looking at yet another web application framework for the Java programming language, Struts,
one finds a file called struts.xml. The Struts 2 framework uses the configuration file struts.xml to
initialise its own resources. These resources include:

• Interceptors that can preprocess and postprocess a request

• Action classes that can call business logic and data access code

• Results that can prepare views using JavaServer pages, Velocity and FreeMarker templates.

At runtime, there is a single configuration for an application. Prior to runtime, the configuration
is defined through one or more XML documents, including the default struts.xml document.
Listing 2.7 on page 9 shows an example struts.xml file. The <action> tag is used by the Struts
controller to determine which view to return to the client based on the given path or URL.
The information stored in the struts.xml file is very interesting for web crawlers, which can use it to
find additional paths which might not have been found by the usual black box crawling.

...
<package name="Customization" namespace="/customization" extends="NG">

  <global-results>
    <result name="common-error">/WEB-INF/web/jsp/common/commonError.jsp</result>
  </global-results>

  <action name="Welcome" class="mailreader2.Welcome">
    <result>/WEB-INF/web/jsp/welcome.jsp</result>
  </action>

  <action name="MainMenu" class="mailreader2.MainMenu">
    <result>/WEB-INF/web/jsp/main.jsp</result>
  </action>

  <action name="login" class="customizationAction" method="login">
    <result>/WEB-INF/web/jsp/common/Login.jsp</result>
  </action>

  <action name="login" class="mailreader2.Login">
    <result name="input">/WEB-INF/web/jsp/login.jsp</result>
    <result name="cancel" type="redirectAction">Welcome</result>
    <result type="redirectAction">MainMenu</result>
    <result name="expired" type="chain">ChangePassword</result>
  </action>

</package>
...

Listing 2.7: Simple Struts URL mapping example for version 2
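
How a crawler could derive additional seed URLs from such a file is sketched below in Java. The sketch assumes that the file is wrapped in the usual <struts> root element and that actions are reachable under the path namespace + "/" + action name + ".action", which is the default Struts 2 mapping but can be changed by configuration. The class name StrutsSeedExtractor is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class StrutsSeedExtractor {
    // Returns one candidate path per distinct action, in document order.
    static List<String> actionPaths(String strutsXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(strutsXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed struts.xml", e);
        }
        List<String> paths = new ArrayList<>();
        NodeList packages = doc.getElementsByTagName("package");
        for (int i = 0; i < packages.getLength(); i++) {
            Element pkg = (Element) packages.item(i);
            String namespace = pkg.getAttribute("namespace"); // "" if absent
            if (namespace.equals("/")) namespace = "";
            NodeList actions = pkg.getElementsByTagName("action");
            for (int j = 0; j < actions.getLength(); j++) {
                String name = ((Element) actions.item(j)).getAttribute("name");
                String path = namespace + "/" + name + ".action";
                if (!paths.contains(path)) paths.add(path); // skip duplicate action names
            }
        }
        return paths;
    }
}
```

The resulting paths can be appended to the base URL of the web application and added to the crawl frontier. A real struts.xml usually carries a DOCTYPE declaration, so the parser may try to load the referenced DTD unless it is configured otherwise.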

The Struts framework has even more interesting configuration files. Struts uses another XML
file called validation.xml for validating form fields. Listing 2.8 on page 10 shows an example
validation.xml file. The validation functionality can be used to validate the data in the user's
browser as well as on the server side. For client-side validation the Struts framework emits
JavaScript which validates the form data in the client browser. The Validator
framework uses two XML configuration files, validator-rules.xml and validation.xml. The
validator-rules.xml file defines the standard validation routines. These are reusable and are
referenced in validation.xml, which defines the form-specific validations applied to a form bean.
The information provided by the validation.xml file could be used before an actual test run begins
in order to derive test cases tailored to a specific form. By knowing what kind of validation the
application or form uses, it might be possible to create a malicious input which slips through the
validation framework.

<form name="logonForm">
  <field property="username" depends="required">
    <arg key="logonForm.username" />
  </field>
  <field property="password" depends="required,mask">
    <arg key="logonForm.password" />
    <var>
      <var-name>mask</var-name>
      <var-value>^[0-9a-zA-Z]*$</var-value>
    </var>
  </field>
</form>

Listing 2.8: Simple Struts validator example for version 2
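
Once the mask of a field is known, candidate attack strings can be filtered before they are sent. The following Java sketch keeps only those payloads which the mask from listing 2.8 would accept. It assumes that java.util.regex interprets the mask in the same way as the validation framework does, which may not hold for every expression; the class name MaskFilter is illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class MaskFilter {
    // Keeps only the payloads that match the validation mask completely.
    static List<String> survivingPayloads(String mask, List<String> payloads) {
        Pattern pattern = Pattern.compile(mask);
        List<String> accepted = new ArrayList<>();
        for (String payload : payloads) {
            if (pattern.matcher(payload).matches()) {
                accepted.add(payload);
            }
        }
        return accepted;
    }
}
```

For the mask ^[0-9a-zA-Z]*$ no payload containing a quote or a space survives, so the password field of this form is a poor target for SQLi strings. The username field, which only depends on required, is not restricted in this way.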

2.1.8 Data Storage

If the test agent has access to the data storage solution, he is able to browse through all the data
previously collected by the test object. Depending on the test object this can consist of user
credentials, purchases made, stock levels etc. If the storage solution is a relational database like
MySQL (see footnote 7) or PostgreSQL (see footnote 8), detailed logging of the queries
made to the database can be enabled. See listing 2.9 for a MySQL query log example. Taken from the
MySQL reference manual (see [12]):
7 http://www.mysql.com/ [17.02.09]
8 http://www.postgresql.org/ [17.02.09]

The general query log is a general record of what mysqld is doing. The server writes
information to this log when clients connect or disconnect, and it logs each SQL statement
received from clients. The general query log can be very useful when you suspect an error
in a client and want to know exactly what the client sent to mysqld.

080228 15:27:50 1170 Connect user@host on database_name
                1170 Query   SELECT something FROM sometable WHERE some=thing
                1170 Quit

Listing 2.9: MySQL query log example
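
A parser for this layout can be sketched in a few lines of Java. The sketch assumes the line format of listing 2.9, in which the timestamp is only printed when it changes and every entry fits on a single line; statements which span several lines are not handled. The names QueryLogParser and Entry are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueryLogParser {
    static final class Entry {
        final String timestamp; // carried over from the last line that had one
        final int threadId;
        final String command;   // e.g. Connect, Query, Quit
        final String argument;  // e.g. the SQL text; empty if there is none

        Entry(String timestamp, int threadId, String command, String argument) {
            this.timestamp = timestamp;
            this.threadId = threadId;
            this.command = command;
            this.argument = argument;
        }
    }

    // [yymmdd hh:mm:ss] threadId command [argument]
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(?:(\\d{6}\\s+\\d{1,2}:\\d{2}:\\d{2})\\s+)?(\\d+)\\s+(\\w+)(?:\\s+(.*))?$");

    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        String lastTimestamp = "";
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) continue; // lines in another layout are skipped
            if (m.group(1) != null) lastTimestamp = m.group(1);
            String argument = m.group(4) == null ? "" : m.group(4).trim();
            entries.add(new Entry(lastTimestamp, Integer.parseInt(m.group(2)),
                                  m.group(3), argument));
        }
        return entries;
    }
}
```

Filtering the returned entries for the command Query yields the SQL statements the database received, grouped by the connection (thread id) that sent them.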

Additionally, the test agent is able to gather information about the database tables he has access
to. Most common relational databases implement the describe command. The describe SQL
command is used to list all of the fields in a table and the data format of each field. See listing 2.10 for
an example.

mysql> show tables;
+---------------------------+
| Tables_in_WebApp          |
+---------------------------+
| Category                  |
| Priority                  |
| Type                      |
| Users                     |
+---------------------------+
4 rows in set (0.00 sec)

mysql> describe Category;
+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| id            | int(11)      | NO   | PRI | NULL    | auto_increment |
| category_name | varchar(255) | YES  | UNI | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)

Listing 2.10: MySQL table information example

2.1.9 Recording of Network Traffic

If the test agent has access to the test or production machine he is able to use network recording
tools like Tcpdump (see footnote 9) or Wireshark (see footnote 10) to record all the network traffic
coming in and out of the machine. Tcpdump prints out a description of the contents of packets on a
network interface that match a user specified boolean expression. It can also save the packet data to
a file for later analysis. The exact contents of the packets sent and received over the specified
network interface are available if the packets are saved into a PCAP (Packet Capture) file. See the
Tcpdump manual on how to start the recording of network traffic:
http://www.tcpdump.org/tcpdump_man.html.

9 http://www.tcpdump.org/ [22.02.09]
10 http://www.wireshark.org/ [22.02.09]

2.2 How to combine White and Black Box Security Testing

The focus of this project thesis lies on the detection of injection flaws, such as SQL Injection
(SQLi) and stored Cross Site Scripting (XSS) flaws, by parsing the query log file generated by a
database (see section 2.1.8). This section illustrates a possible way to build a combined white and
black box security testing tool. The necessary steps and technologies needed to implement such a
combined testing tool are the same as in the following example of a web application profiling
component.

2.2.1 Web Application Profiling Example

Web application profiling is usually done by using a web crawler in order to discover and step through
every reachable page and form of the web application. A web crawler is one type of bot, or
software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits
these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called
the crawl frontier. With a pure black box approach the test agent is only able to map a discovered page
to a discovered or manually entered URL. By using the information sources identified above in
a mixed black and white box approach, a much more elaborate web application profiling becomes
possible.
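
The seed and frontier mechanism can be sketched as a breadth-first traversal. In the following Java sketch the map linkGraph is a hypothetical stand-in for fetching a page and extracting its hyperlinks, so that the example runs without network access; the class name CrawlFrontier is illustrative as well.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CrawlFrontier {
    // Visits every URL reachable from the seeds exactly once and returns the
    // URLs in the order in which they were visited.
    static List<String> crawl(List<String> seeds, Map<String, List<String>> linkGraph) {
        Deque<String> frontier = new ArrayDeque<>(seeds);
        Set<String> visited = new LinkedHashSet<>();
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            if (!visited.add(url)) continue; // already visited
            // Stand-in for "download the page and extract its hyperlinks".
            List<String> links = linkGraph.getOrDefault(url, Collections.<String>emptyList());
            for (String link : links) {
                if (!visited.contains(link)) frontier.add(link);
            }
        }
        return new ArrayList<>(visited);
    }
}
```

In a combined approach, paths taken from white box sources such as framework configuration files could simply be passed in as additional seeds.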

Given the web application uses some kind of database which allows an administrator to log all the
SQL queries the database receives into a file (see section 2.1.8 on page 10) and the test agent is able to
read and parse the generated log file, then the test agent is not only able to map HTTP(S) POST and
GET requests to the response received from the webserver but also potential SQL statements sent
from the webserver to the database triggered by the previously sent HTTP(S) request. Figure 2.1 on
page 13 illustrates the described approach.

1. The test agent sends an HTTP(S) GET or POST request to the web application on the test or
production machine. Depending on the HTTP(S) request the web application needs to query
the database for additional data. Let's say the web application is some kind of e-commerce
site and the HTTP(S) request triggered the application to list all the recorded customers whose
last names start with a "K". The web application builds a SQL statement like SELECT * FROM
`customers` cust WHERE `lastname` LIKE 'K%' and sends it to the database. The
database receives the query and automatically writes it into the query log file (see section 2.1.8).

2. The test agent accesses the test or production machine either between HTTP(S) requests or
after the web crawler component has finished spidering the web application. It is not necessary
to run an SSH server on the machine; the test agent only needs access to the log files.

3. The test agent reads and parses the database query log file and applies a matching algorithm in
order to associate each sent HTTP(S) request with the SQL statements executed by the database.
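
The matching algorithm in step 3 is not specified further at this point. One possible realisation, shown here purely as an assumption, is to place a unique marker value in a form field of every request and to attribute each logged query which contains that marker to the request. The names RequestQueryMatcher and requestMarkers are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RequestQueryMatcher {
    // requestMarkers maps a request id to the unique marker value that was
    // sent with this request. The result maps each request id to the logged
    // SQL statements in which the marker shows up.
    static Map<String, List<String>> match(Map<String, String> requestMarkers,
                                           List<String> loggedQueries) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> request : requestMarkers.entrySet()) {
            List<String> hits = new ArrayList<>();
            for (String query : loggedQueries) {
                if (query.contains(request.getValue())) hits.add(query);
            }
            result.put(request.getKey(), hits);
        }
        return result;
    }
}
```

Requests whose input never reaches a SQL statement end up with an empty list. Matching by marker does not depend on synchronised clocks, whereas a timestamp-based matching would need them.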

Figure 2.1: White box web application profiling

If the test agent is able to access the machine on which the webserver is hosted and the
webserver is configured to log all the requests it receives and the responses it sends into a file (see
section 2.1.5 on page 6), then the test agent is able to make precise statements about the time it took
the webserver to process the HTTP(S) request without the usual fluctuations due to network latency
and load (also used for "Timing Attacks", see [14, p.124]). Timing attacks might be the last resort for an
attacker who wants to enumerate usernames through error messages, registration, or password
changes if everything else fails. The attacker measures the time it takes for an error message to come
up for a bad password versus a bad username. Depending on how the credential matching is
implemented and which technologies are used, there might be a significant difference in time
between the two responses. This technique, however, carries a high risk of false positives.
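
Such a comparison can be sketched as follows, assuming that the processing times have already been extracted from the server log, for example from the Apache %D field, which records the time taken to serve a request in microseconds. The median is used instead of the mean in order to dampen outliers. The class name TimingComparison and the threshold parameter are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TimingComparison {
    // Median of the measured processing times.
    static double median(List<Long> values) {
        if (values.isEmpty()) throw new IllegalArgumentException("no samples");
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        if (n % 2 == 1) return sorted.get(n / 2);
        return (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    // True if the two groups of responses differ by more than the threshold.
    static boolean distinguishable(List<Long> badUsername, List<Long> badPassword,
                                   double thresholdMicros) {
        return Math.abs(median(badUsername) - median(badPassword)) > thresholdMicros;
    }
}
```

A suitable threshold depends on the server and has to be determined empirically. A threshold that is too small produces the false positives mentioned above.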

This approach is not only useful for profiling a web application but it can be used to enhance the
detection of common injection flaws.

2.2.2 Detection of Injection Flaws by using the Database query.log File

SQL Injection:

The detection of possible SQL Injection (SQLi) attack vectors can be improved by parsing the database
query log file. SQLi is a code injection technique that exploits a security vulnerability occurring in the
database layer of an application. The vulnerability is present when user input is either incorrectly
filtered for string literal escape characters embedded in SQL statements or user input is not strongly
typed and thereby unexpectedly executed. The procedure is basically the same as in the web
application profiling scenario described in section 2.2.1. Let's use the e-commerce example again. The
test agent found the HTML form mentioned in section 2.2.1, which triggers the following SQL statement:
SELECT * FROM `customers` cust WHERE `lastname` LIKE '<HTML_FORM_FIELD>%'. This
differs slightly from the earlier statement because a text field from that HTML form is now
being used by the web application to create a dynamic SQL statement. The test agent is now free to
send various SQLi attack strings and is instantly able to tell whether these inputs are being filtered by
the web application or not. Once the SQL statements end up in the database query log file, they have left
the web application and passed all the validation routines implemented by the developers.
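
How the test agent might read the verdict from a logged statement is sketched below. The sketch assumes a probe consisting of a unique alphanumeric marker, a single quote and a short fixed suffix, and it only recognises the two most common escaping styles (backslash and doubled quote). The class InjectionProbe, the suffix zz and the verdict names are hypothetical.

```java
class InjectionProbe {
    enum Verdict { NOT_SEEN, UNFILTERED, ESCAPED, ALTERED }

    // Fixed alphanumeric suffix that follows the injected single quote.
    private static final String SUFFIX = "zz";

    // The string the test agent places in the form field.
    static String payload(String marker) {
        return marker + "'" + SUFFIX;
    }

    // Classifies what happened to the probe on its way into a logged query.
    static Verdict classify(String marker, String loggedQuery) {
        int at = loggedQuery.indexOf(marker);
        if (at < 0) return Verdict.NOT_SEEN;           // input never reached this query
        String tail = loggedQuery.substring(at + marker.length());
        if (tail.startsWith("'" + SUFFIX)) return Verdict.UNFILTERED; // quote passed as sent
        if (tail.startsWith("\\'" + SUFFIX) || tail.startsWith("''" + SUFFIX)) {
            return Verdict.ESCAPED;                    // quote was neutralised
        }
        return Verdict.ALTERED;                        // quote was stripped or encoded
    }
}
```

The verdict UNFILTERED indicates that the single quote arrived at the database exactly as sent, which makes the form field a strong candidate for a SQLi vulnerability. The suffix removes the ambiguity between a doubled quote and an unescaped quote that is directly followed by the closing quote of the string literal.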

Cross Site Scripting (XSS):

If we want to use the same procedure to improve the detection of Cross Site Scripting (XSS)
flaws, this only works for the detection of persistent XSS attacks. XSS is a type of computer
security vulnerability typically found in web applications which enables malicious attackers to inject
client-side script into web pages viewed by other users. The non-persistent (or reflected) XSS
vulnerability is by far the most common type (see [7]). These holes show up when data provided by a
web client, most commonly in HTTP(S) query parameters or in HTML form submissions, is used
immediately by server-side scripts to generate a page of results for that user, without properly
sanitising the response. If the web application is vulnerable to a reflected XSS attack, the white box
approach offers no additional information sources beyond those available to a black box test. The
persistent (or stored) XSS vulnerability is a more devastating variant of a cross-site scripting flaw: it
occurs when the data provided by the attacker is saved by the web application and then permanently
displayed on "normal" pages returned to other users in the course of regular browsing, without proper
HTML escaping. This resembles the SQLi approach from above. The malicious input (now JavaScript
code instead of malicious SQL code) would show up in the database query log and hence facilitate
the detection. There is one drawback to this conclusion. Malicious XSS attack code showing
up unfiltered in the database query log is not yet conclusive proof that the web application is
vulnerable to such an attack. The web application might save everything unfiltered in the database but
apply filtering and validation before displaying the data to the user. This means that the HTML
output has to be considered as well in order to make a meaningful statement without generating false
positives.
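
The two-step check can be sketched as follows. The sketch assumes a payload without quotes, so that SQL escaping does not change its appearance in the query log, and it treats a literal occurrence of the payload in a later HTML response as a sign of missing output escaping. The class name StoredXssCheck and the verdict names are hypothetical.

```java
import java.util.List;

class StoredXssCheck {
    enum Verdict { NOT_STORED, STORED_BUT_ESCAPED_ON_OUTPUT, LIKELY_VULNERABLE }

    // loggedQueries: SQL statements taken from the database query log.
    // renderedHtml: a page requested after the payload had been submitted.
    static Verdict classify(String payload, List<String> loggedQueries, String renderedHtml) {
        boolean stored = false;
        for (String query : loggedQueries) {
            if (query.contains(payload)) {
                stored = true;
                break;
            }
        }
        if (!stored) return Verdict.NOT_STORED;
        return renderedHtml.contains(payload)
            ? Verdict.LIKELY_VULNERABLE
            : Verdict.STORED_BUT_ESCAPED_ON_OUTPUT;
    }
}
```

Only the combination of an unfiltered entry in the query log and an unescaped occurrence in the HTML output is reported as a likely vulnerability, which addresses the false positives described above.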

3 Application Requirements

This chapter documents the first set of requirements for a combined white and black box
security testing application used for detecting injection flaws in web applications. It
contains a list of use cases and application requirements the security testing tool needs to address.
The use cases are written with a security testing agent in mind who uses the testing application that
is to be developed.

3.1 Overall Goal & Project Idea

The goal of this project thesis is to provide a working white and black box security testing
application for detecting injection flaws in web applications by additionally parsing a database
query log file as described in 2.2.2 on page 13. Whether an existing framework like w3af should be
extended by adding the missing features (mostly the parsing of database query log files) as plugins
or whether a completely new framework should be written will be considered in chapter 5 on
page 40. At the end of this project thesis the tool should be in a state where it can be used
productively against real life web applications of various sizes.

Even though there are many good commercial as well as open source web application security testing
tools available, none of these use a combined white and black box security testing approach besides
scanning the source code of the application. None of the common web application security tools like
Acunetix 6, Rational AppScan, Google Skipfish, w3af, etc. mention such a feature on their feature list
or product homepage. Automated source code scanning is error prone for several reasons:

• There are way too many programming languages to support them all in one tool. Web applica-
tions can be written in: PHP, Java, Microsoft .NET, Python, Perl, CGI Scripts, Ruby, JavaScript,
Lisp etc. For a more comprehensive list visit: http://www.objs.com/survey/lang.htm.

• There is an even bigger number of web application frameworks which support a developer in
writing web applications. Every one of these frameworks uses different schemes and libraries
for providing database access, site navigation, session handling etc. A simple static source
code analyser without a data-flow and control-flow analysis module will in most cases not
succeed in detecting vulnerabilities, especially if complex libraries are being used to construct the
web application. Security analysers use data-flow analysis primarily to reduce false positives
and false negatives. As a simple (but common) example, many buffer overflows in real code are
unexploitable because the attacker cannot control the data that overflows the buffer. Data-flow
analysis, in this example, can be helpful in distinguishing exploitable from unexploitable buffer
overflows. The data-flow analysis that seems to be used most often in security-related
applications is taint analysis. It defines an abstract property of variables called taint, which behaves
very much like a data type. The most obvious use of taint is to say that a variable is tainted if
its value can be influenced by a potential attacker. If a tainted variable is used to compute the
value of a second variable, then the second variable also becomes tainted, and so on (taken
from [11]).
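
The propagation rule for taint can be illustrated with a small Java sketch. It works on a hypothetical, flattened representation of a program in which every assignment is an array holding the target variable followed by the variables it is computed from, and it ignores statement order and control flow, which real analysers take into account. The class name TaintPropagation is illustrative.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TaintPropagation {
    // assignments: each array is {target, source1, source2, ...}.
    // Returns every variable whose value can be influenced by the attacker.
    static Set<String> taintedVariables(Set<String> attackerControlled,
                                        List<String[]> assignments) {
        Set<String> tainted = new TreeSet<>(attackerControlled);
        boolean changed = true;
        while (changed) { // repeat until no further variable becomes tainted
            changed = false;
            for (String[] assignment : assignments) {
                if (tainted.contains(assignment[0])) continue;
                for (int i = 1; i < assignment.length; i++) {
                    if (tainted.contains(assignment[i])) {
                        tainted.add(assignment[0]);
                        changed = true;
                        break;
                    }
                }
            }
        }
        return tainted;
    }
}
```

If a form field is attacker controlled and is used to build a search pattern which in turn is concatenated into a SQL string, both the pattern and the SQL string end up in the returned set.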

The approach described in 2.2.2 on page 13 tries to reduce the dependencies imposed by the
programming language and framework used to develop the web application. Parsing the database
query log file requires far fewer dependencies and is independent of the programming
language and web application framework used. Nevertheless, parsing a query log file requires
knowledge about the database product used, provided that the web application uses a data storage
solution with a SQL language interface at all. All common database solutions like Oracle, MySQL and
PostgreSQL provide one or more of the following output destinations for logging all received SQL
queries: simple text file, comma separated value list and/or database table. Implementing a parser for
collecting the SQL queries processed by the database is rather easy compared to the task of writing
multiple source code analysers. Parsing the database query log file is especially useful for detecting
any kind of stored injection flaws, provided the test agent has access to the query log during or at the
end of the security test.

Normal black box web application security testing tools struggle with the detection of SQL Injections
when the web application does not return enough information about application errors which
occurred during the execution of the provided input. These are so called Blind SQL Injection
vulnerabilities. Blind SQL Injection is identical to normal SQL Injection except that an attacker who
attempts to exploit the application gets a generic page specified by the developer instead of a useful
error message. This makes exploiting a potential SQL Injection vulnerability more difficult but
not impossible. An attacker can still steal data by asking a series of true and false questions through
SQL statements. By analysing the database query log file there is no need for this kind of detection
scheme. The security testing application can reveal precisely which SQL statements have
been triggered by sending a malicious HTTP request to the web application. How different database
products like MySQL and PostgreSQL are configured and what the database query log looks like
can be seen in chapter 4 on page 23.

3.2 Requirements

The following subsections describe the application requirements briefly by providing a statement of
one or two sentences for each requirement. The requirements are numbered consecutively for
later reference. The requirements are divided into four categories: Functional, Usability, Reliability
and Performance requirements according to the FURPS+ model (see footnote 1 and [9, p. 42]).
1 The FURPS+ System for Classifying Requirements: One such classification system was devised by Robert Grady at
Hewlett-Packard. It goes by the acronym FURPS+ which represents: Functionality, Usability, Reliability,
Performance and Supportability. The "+" in FURPS+ also helps us to remember concerns such as: Design
requirements, Implementation requirements, Interface requirements and Physical requirements. It is helpful to use
FURPS+ categories (or some categorisation scheme) as a checklist for requirements coverage, to reduce the risk of not
considering some important facet of the system.

3.2.1 Functional Requirements

This set of requirements describes the overall features, capabilities and security requirements of the
application.

3.2.1.1 Automated Security Testing


Testing for injection flaws in a deployed and running web application has to be done in an
automated fashion which does not require any user input during the scan. The user only
has to provide a set of previously configured settings which are tailored for the targeted web
application.

3.2.1.2 ASTF Compatibility


The tool has to meet the following three requirements in order to provide ASTF compatibility
support:

a) Starting and conducting a web application security test without user interaction (see
requirement 3.2.1.1)

b) Generating a machine-parseable output of the results (e.g. an XML-structured file)

c) The web application security tool has to support the Linux operating system platform
because most of the tools used by the ASTF security testing framework depend on it

3.2.1.3 Web Spidering Component


The tool needs to have a web spidering component which allows the testing agent to crawl
through a running and deployed web application regardless of the web (application) server
and the programming language used to develop the web application. The web crawler
component extracts as many linked web pages as possible by parsing the HTML responses
received from the targeted web application. Evaluation of the spidering component should
follow the web application security scanner evaluation criteria (WASSEC)².

3.2.1.4 Accessing Secured Areas


The security testing application needs to provide means for the user to configure user
credentials for accessing secured areas of a web application. Many web applications have some
kind of secured area to which only users with the right credentials and user roles have
access. This could be some kind of administration interface or the like. The necessary user roles
should be acquired by the security testing tool prior to spidering. Web application
authentication schemes include the following: Basic, Digest, HTTP Negotiate (NTLM and Kerberos),
HTML form-based, Single Sign-On (e.g. CAS, OpenID, AWS, AuthSub) and client SSL
certificates. Which of these schemes have to be supported will be decided at a later date.

3.2.1.5 Detection of “Injection Attack Vectors”


The security testing application needs the ability to identify possible attack vectors for
injecting malicious code from received HTML pages. Attack vectors include URL query
parameters, HTML form input fields etc. Other attack vectors include the following: Ajax calls, XML
input parameters, RSS feeds etc. The detection of attack vectors based on URL query
parameters and HTML form input fields is mandatory for the first version of the security testing
application.

2 The Web Application Security Scanner Evaluation Criteria (WASSEC) is a set of guidelines to evaluate web
application scanners on their ability to effectively test web applications and identify vulnerabilities. It covers
areas such as crawling, parsing, session handling, testing, and reporting.
http://projects.webappsec.org/Web-Application-Security-Scanner-Evaluation-Criteria [06.04.10]

3.2.1.6 Detection of Injection Flaws


Previously identified attack vectors need to be exploited by the security testing application in
order to detect possible injection flaws. Detection is done by injecting malicious JavaScript
code (in case of XSS flaws) or SQL commands (in case of SQLi flaws). If the parsing of a
database query log file as described in section 2.2.2 and requirement 3.2.1.7 is not enabled,
the security testing tool will behave like any other black box web application security scanner.

3.2.1.7 Parsing of the Database Query Log


In order to provide the functionality described in 3.1 the security testing tool needs to
understand a database query log file generated by one of the major database products used in web
application development. These include MySQL, PostgreSQL and Oracle Standard/Enterprise
Database. By parsing the database query log file the security testing application should
be able to detect possible injection vulnerabilities which would not have been detected by
black-box-only tests.

3.2.2 Usability Requirements

Usability requirements describe the requirements in regard to human factors, help and documenta-
tion.

3.2.2.1 Self Explanatory


The application should be as self-explanatory as possible. Using and configuring the
application should be feasible after reading a short tutorial.

3.2.2.2 Help Menu


The tool should provide a way to display explanatory messages to the user about required and
optional settings which can be configured prior to a test run.

3.2.2.3 Documentation
The code should be documented with the language specific convention, e.g. JavaDoc for Java
applications.

3.2.3 Reliability Requirements

The following requirements describe the frequency of failure, recoverability and
predictability.

3.2.3.1 Reproducibility
The results generated by the tool have to be reproducible. This is essential for making state-
ments about the general security status of a web application over time. Two scans conducted
with the same configuration should produce the exact same results if the targeted web appli-
cation was not altered between runs.

3.2.4 Performance Requirements

Performance requirements tackle response times, throughput, accuracy, availability and resource us-
age issues.

3.2.4.1 Runtime
The security testing tool should be able to produce some meaningful results within a reason-
able amount of time. This depends greatly on the size of the tested web application.

3.2.5 Supportability Requirements

Supportability requirements describe adaptability, maintainability, internationalization and
configurability requirements.

3.2.5.1 Extensibility
The tool should be written with extensibility in mind. Adding new features
and plugins has to be feasible without major code rewriting and refactoring.

3.3 User Stories

The following user stories are short success stories describing specific sequences of actions and in-
teractions between actors and the system under discussion (see [9, p. 48]).
These stories are far from being a complete list of all the features which should be implemented in
the finished product. These features are merely a starting point and new stories will be added on the
go during the project thesis.

3.3.0.1 As a user, I can enable and disable plugins.

3.3.0.2 As a user, I can configure plugin settings.

3.3.0.3 As a user, I want to access a help menu for the plugin I am currently configuring.

3.3.0.4 As a user, I want to configure an HTTP proxy (with username and password if necessary).

3.3.0.5 As a user, I can log all relevant information to a text file.

3.3.0.6 As a user, I can stop and later resume a previously stopped run.

3.3.0.7 As a user, I want to see some statistics on how many HTTP requests have been sent.

3.3.0.8 As a user, I want a report of the identified vulnerabilities in an XML format for further auto-
mated processing.

3.3.0.9 As a user, I can provide a file with commands and start the tool in an automated mode.

3.3.0.10 As a user, I want the spidering process to stop after a configured amount of time (in minutes).

3.3.0.11 As a user, I want the spidering process to ignore or only follow some specific URLs.

3.3.0.12 As a user, I want to see a detailed list of possible attack vectors if a SQL Injection vulnerability
has been found.

3.3.0.13 As a developer, I want to add new features easily.

Additionally, figure 3.1 shows the external actors which interact with the white box security testing
application. External actors are: the Automated Security-Testing Framework (ASTF), the designated
security testing agent, the targeted web application and the corresponding database storage host.
Figure 3.2 visualises the application flow a user experiences while interacting with the application.
It shows a simplified UML 2 activity diagram of how a future user interacts with the white box
security testing application by starting it in either automated or interactive mode.

Figure 3.1: Use Case Context

Figure 3.2: Simplified UML 2 activity diagram of the application usage flow

4 The Database Query Log & How to Detect Input
Validation Vulnerabilities

This chapter covers configuring and parsing the database query logs of various database products
such as MySQL and PostgreSQL. This is especially important for the development of the combined
white and black box security testing tool as described in sections 2.2.2 and 3.1. Most if not all recent
database solutions provide some kind of logging facility to monitor processed SQL queries coming
from clients. However, the metadata accompanying the logged SQL queries differs greatly in its level
of detail between database products.

4.1 Configuration of the Database Query Log

This section explains how the SQL query log is configured and activated for the MySQL and
PostgreSQL databases. MySQL is a relational database management system (RDBMS) that runs as a
server providing multi-user access to a number of databases. The MySQL development project has
made its source code available under the terms of the GNU General Public License, as well as under
a variety of proprietary agreements. Members of the MySQL community have created several forks
such as Drizzle and MariaDB. Free software projects that require a full-featured database
management system often use MySQL. Such projects include, for example, WordPress, phpBB, Drupal and
other software built on the LAMP¹ software stack. MySQL is also used in many high-profile,
large-scale World Wide Web products including Wikipedia, Google and Facebook². PostgreSQL, often
simply Postgres, is an object-relational database management system (ORDBMS). It is released under
an MIT-style license and is thus free and open source software. As with many other open source
programs, PostgreSQL is not controlled by any single company, but is developed by a global community
of developers and companies. PostgreSQL evolved from the Ingres project at the University of
California, Berkeley.

4.1.1 MySQL

The general query log is a general record of what the MySQL daemon (mysqld) is doing. The server
writes information to this log when clients connect or disconnect, and it logs each SQL statement
received from clients. The general query log can be very useful when you suspect an error in a client
and want to know exactly what the client sent to the database daemon. The MySQL daemon writes
statements to the query log in the order that it receives them, which might differ from the order in
which they are executed. This logging order contrasts with the binary log, for which statements are
written after they are executed but before any locks are released. Also, the query log contains all
statements, whereas the binary log does not contain statements that only select data (taken from
[12]).

1 LAMP is an acronym for a solution stack of free, open source software, originally coined from the first letters of Linux
(operating system), Apache HTTP Server, MySQL (database software), and PHP, Python or Perl (scripting language),
principal components to build a viable general purpose web server
2 http://www.mysql.com/why-mysql/case-studies/ [16.04.10]

Chapter 4. The Database Query Log & How to Detect Input Validation Vulnerabilities | 23

The possible settings for managing the query log might differ depending on the version of the MySQL
database server used. The following settings are meant to be used with the latest MySQL Community
Server, which is version 5.1.45 as of April 2010. The general query log can either
be enabled through the normal MySQL configuration file or dynamically at runtime through specific
SQL commands. The following subsections are taken from the MySQL manual (see [12]).

Enabling the Query Log through the MySQL Configuration File & Command Line Arguments

In a default Ubuntu Linux installation (version 9.10 as of April 2010) the MySQL configuration file
is located at /etc/mysql/my.cnf. For Windows users, the configuration file can be found under
C:\Program Files\MySQL\my.cnf. The following settings need to be set in
order to start logging the SQL queries:

• Before 5.1.6, the general query log destination is always a file. To enable the log, start mysqld
with the --log[=file_name] or -l [file_name] option.

• As of MySQL 5.1.6, the destination can be a file or a table, or both. Start mysqld with the
--log[=file_name] or -l [file_name] option to enable the general query log, and optionally
use --log-output to specify the log destination.

• As of MySQL 5.1.12, as an alternative to --log or -l, use --general_log[=0|1] to specify the
initial general query log state. In this case, the default general query log file name is used. With
no argument or an argument of 1, --general_log enables the log. With an argument of 0, this
option disables the log.

• As of MySQL 5.1.29, use --general_log[=0|1] to enable or disable the general query log, and
optionally --general_log_file=file_name to specify a log file name. The --log and -l
options are deprecated.

Listing 4.1 shows an excerpt of the mysqld configuration file. These settings cause the
MySQL daemon to log all received SQL queries into /var/log/mysql/query.log and into the
mysql.general_log table.

    [mysqld]
    log_output = FILE,TABLE
    general_log = 1
    general_log_file = /var/log/mysql/query.log

Listing 4.1: Enabling of the MySQL query log through the configuration file

Enabling the Query Log Dynamically At Runtime

For runtime control of the general query log, use the global general_log and general_log_file
system variables. Set general_log to 0 (or OFF) to disable the log or to 1 (or ON) to enable
it. Set general_log_file to specify the name of the log file. If a log file already is open, it is
closed and the new file is opened. When the general query log is enabled, output is written to any
destinations specified by the --log-output option or log_output system variable. If you enable
the log, the server opens the log file and writes startup messages to it. However, further logging
of queries to the file does not occur unless the FILE log destination is selected. If the destination
is NONE, no queries are written even if the general log is enabled. Setting the log file name has no
effect on logging if the log destination value does not contain FILE. Server restarts and log
flushing do not cause a new general query log file to be generated (although flushing closes and
reopens it).

As of MySQL 5.1.12, you can disable the general query log at runtime: SET GLOBAL general_log
= 'OFF'; With the log disabled, rename the log file externally; for example, from the command
line. Then enable the log again: SET GLOBAL general_log = 'ON'; This method works on any
platform and does not require a server restart. The session sql_log_off variable can be set to ON
or OFF to disable or enable general query logging for the current connection. The general query log
should be protected because logged statements might contain passwords.

Listing 4.2 shows the sequence of SQL commands needed to enable the general query log at runtime,
overriding any existing settings.

    # mysql -u root -p
    mysql> FLUSH LOGS;
    mysql> SET GLOBAL general_log = 'OFF';
    mysql> SET GLOBAL log_output = 'FILE,TABLE';
    mysql> SET GLOBAL general_log_file = '/var/log/mysql/query.log';
    mysql> SET GLOBAL general_log = 'ON';

Listing 4.2: Enabling of the MySQL query log at runtime
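
The same sequence could also be issued by a testing tool itself instead of from the mysql console. The
following Java sketch is not taken from WASTF; the class GeneralLogSwitch and its methods are
hypothetical names chosen for illustration. It assumes an already opened JDBC connection whose user is
allowed to run SET GLOBAL, and a trusted, Unix-style log file path supplied by the operator.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

/** Sketch (hypothetical helper): replays the statements of listing 4.2 over JDBC. */
final class GeneralLogSwitch {

    /** Builds the statement sequence of listing 4.2 for the given log file path. */
    static List<String> enableStatements(String logFile) {
        // Assumption: logFile is a trusted Unix-style path. Single quotes are
        // doubled so that the path stays inside the SQL string literal.
        String quoted = "'" + logFile.replace("'", "''") + "'";
        return List.of(
            "FLUSH LOGS",
            "SET GLOBAL general_log = 'OFF'",
            "SET GLOBAL log_output = 'FILE,TABLE'",
            "SET GLOBAL general_log_file = " + quoted,
            "SET GLOBAL general_log = 'ON'");
    }

    /** Executes the sequence in order on an open connection. */
    static void enable(Connection connection, String logFile) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            for (String sql : enableStatements(logFile)) {
                statement.execute(sql);
            }
        }
    }

    public static void main(String[] args) {
        // Prints the statements only; no database is contacted here.
        enableStatements("/var/log/mysql/query.log").forEach(System.out::println);
    }
}
```

Keeping the statement list separate from its execution makes the sequence easy to inspect or log before
it is sent to the server.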

Contents Of The MySQL Query Log

As seen in listing 4.1, MySQL allows the user to change the general log output from a
file to a database table. By default MySQL uses the mysql.general_log table for this. Listing 4.3 shows
the overall structure of this table and what kind of information is logged by the MySQL daemon.

    mysql> SHOW CREATE TABLE mysql.general_log;
    CREATE TABLE `general_log` (
      `event_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
      `user_host` MEDIUMTEXT NOT NULL,
      `thread_id` INT(11) NOT NULL,
      `server_id` INT(10) UNSIGNED NOT NULL,
      `command_type` VARCHAR(64) NOT NULL,
      `argument` MEDIUMTEXT NOT NULL
    ) ENGINE=CSV DEFAULT CHARSET=utf8 COMMENT='General log'

Listing 4.3: Definition of the MySQL query log table

Storing the database query log in a database table is very convenient because accessing and searching
for specific entries in the stored data becomes trivial by sending appropriate SQL commands as seen
in listing 4.4.

    mysql> SELECT argument FROM mysql.general_log WHERE command_type = 'Query';
    +-------------------------------------+
    | argument                            |
    +-------------------------------------+
    | root@localhost on                   |
    | select @@version_comment limit 1    |
    | SHOW CREATE TABLE mysql.general_log |
    | DESCRIBE mysql.general_log          |
    +-------------------------------------+
    4 rows in set (0.00 sec)

Listing 4.4: Content of the MySQL query log table
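
A security testing tool could send the same kind of query programmatically and restrict it to logged
statements that contain a given value. The following Java sketch illustrates this under stated
assumptions: GeneralLogSearch is a hypothetical class, the JDBC connection is assumed to be open with
SELECT rights on mysql.general_log, and log_output is assumed to include TABLE.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Sketch (hypothetical helper): searches the general log table for a value. */
final class GeneralLogSearch {

    static final String SQL =
        "SELECT argument FROM mysql.general_log "
        + "WHERE command_type = 'Query' AND argument LIKE ?";

    /** Wraps the value in % wildcards and escapes LIKE metacharacters with a backslash. */
    static String likePattern(String value) {
        String escaped = value
            .replace("\\", "\\\\")
            .replace("%", "\\%")
            .replace("_", "\\_");
        return "%" + escaped + "%";
    }

    /** Returns all logged statements whose text contains the given value. */
    static List<String> findQueries(Connection connection, String value) throws SQLException {
        List<String> hits = new ArrayList<>();
        try (PreparedStatement statement = connection.prepareStatement(SQL)) {
            statement.setString(1, likePattern(value));
            try (ResultSet rows = statement.executeQuery()) {
                while (rows.next()) {
                    hits.add(rows.getString("argument"));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Prints the LIKE pattern only; no database is contacted here.
        System.out.println(likePattern("OPL89FGHC"));
    }
}
```

Note that the search statement itself may end up in the general log, so a real implementation would
presumably have to exclude the tool's own connection, for instance by filtering on the user_host column.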

4.1.2 PostgreSQL

Unlike MySQL, PostgreSQL does not support enabling the query log dynamically at runtime as seen in
4.1.1. The currently available version as of April 2010 is 8.4. The query log can only be
enabled through the PostgreSQL configuration file, which entails a server restart after modifying the
file.

Enabling the Query Log through the PostgreSQL Configuration File

In a default Ubuntu Linux installation (version 9.10 as of April 2010) the PostgreSQL
configuration file is located at /etc/postgresql/8.x/main/postgresql.conf. Windows users
find the configuration file at C:\Program Files\PostgreSQL\8.x\main\postgresql.conf.
Listing 4.5 shows an excerpt of the PostgreSQL configuration file and the settings needed to activate
the query log. PostgreSQL supports several methods for logging server messages, including stderr,
csvlog and syslog. On Windows, eventlog is also supported. The method is selected with the
log_destination parameter; the default is to log to stderr only. This parameter can only be
set in the postgresql.conf file or on the server command line. If csvlog is included in
log_destination, log entries are output in “comma separated value” format, which is convenient for
loading them into programs. See paragraph Using The CSV-Format Log Output in section 4.1.2
for details. logging_collector must be enabled to generate CSV-format log output. When
logging_collector is enabled, the log_directory parameter determines the directory in which log
files will be created. It can be specified as an absolute path, or relative to the cluster data
directory. When logging_collector is enabled, the log_filename parameter sets the file names of the
created log files. The value is treated as a strftime pattern, so %-escapes can be used to specify
time-varying file names. If CSV-format output is enabled in log_destination, .csv will be appended to
the timestamped log file name to create the file name for CSV-format output (taken from [13]).

    log_destination = 'csvlog'
    logging_collector = on
    log_directory = 'pg_log'

Listing 4.5: Enabling the query log through the PostgreSQL configuration file

Using The CSV-Format Log Output

Including csvlog in the log_destination list provides a convenient way to import log files into a
database table. This option emits log lines in comma-separated-value format, with these columns:
timestamp with milliseconds, user name, database name, process ID, host:port number, session ID,
per-session or -process line number, command tag, session start time, virtual transaction ID, regular
transaction ID, error severity, SQL state code, error message, error message detail, hint, internal
query that led to the error (if any), character count of the error position thereof, error context,
user query that led to the error (if any and enabled by log_min_error_statement), character count of
the error position thereof, location of the error in the PostgreSQL source code (if
log_error_verbosity is set to verbose). Listing 4.6 shows a single log entry produced by the
PostgreSQL daemon when csvlog is enabled.

    2010-04-13 16:16:30.410 CEST,"postgres","postgres",3705,"[local]",4bc47caf.e79,2,"SHOW",\
    2010-04-13 16:16:15 CEST,1/9,0,ERROR,42704,"unrecognized configuration parameter \
    ""database""",,,,,,"show database;",,

Listing 4.6: Content of the PostgreSQL query log file (CSV-format output)
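
Before such an entry can be searched, it has to be split into its columns. The following Java sketch
shows one way to do this; CsvLogRecord is a hypothetical class and not part of WASTF. It assumes that a
complete record is passed in as one string (the line continuations of listing 4.6 joined), that fields
are separated by commas, and that a doubled quote inside a quoted field stands for a literal quote.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch (hypothetical helper): splits one csvlog record into its fields. */
final class CsvLogRecord {

    /** The entry of listing 4.6, joined into a single line. */
    static final String SAMPLE =
        "2010-04-13 16:16:30.410 CEST,\"postgres\",\"postgres\",3705,\"[local]\","
        + "4bc47caf.e79,2,\"SHOW\",2010-04-13 16:16:15 CEST,1/9,0,ERROR,42704,"
        + "\"unrecognized configuration parameter \"\"database\"\"\",,,,,,"
        + "\"show database;\",,";

    static List<String> parse(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (quoted) {
                if (c == '"' && i + 1 < record.length() && record.charAt(i + 1) == '"') {
                    current.append('"');   // doubled quote: literal quote character
                    i++;
                } else if (c == '"') {
                    quoted = false;        // closing quote of the field
                } else {
                    current.append(c);     // commas and newlines are data while quoted
                }
            } else if (c == '"') {
                quoted = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());    // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = parse(SAMPLE);
        // Index 13 is the error message, index 19 the user query (zero-based).
        System.out.println(fields.size() + " fields, query: " + fields.get(19));
    }
}
```

The field positions follow the column order given above, so the user query that a testing tool would
inspect is the twentieth column.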

Listing 4.7 shows the table definition which can be used to store the CSV-formatted query log (taken
from [13]).

    CREATE TABLE postgres_log (
      log_time TIMESTAMP(3) WITH TIME ZONE, user_name TEXT, database_name TEXT,
      process_id INTEGER, connection_from TEXT, session_id TEXT,
      session_line_num BIGINT, command_tag TEXT,
      session_start_time TIMESTAMP WITH TIME ZONE, virtual_transaction_id TEXT,
      transaction_id BIGINT, error_severity TEXT, sql_state_code TEXT,
      message TEXT, detail TEXT, hint TEXT, internal_query TEXT,
      internal_query_pos INTEGER, context TEXT, query TEXT,
      query_pos INTEGER, location TEXT,
      PRIMARY KEY (session_id, session_line_num));

Listing 4.7: Definition of the PostgreSQL query log table

4.1.3 Microsoft SQL Server

The Microsoft SQL Server also implements a method to log and analyse processed SQL queries as
of Microsoft SQL Server 2008. The so called “SQL Server Profiler” shows how SQL Server resolves
queries internally. This allows administrators to see exactly what Transact-SQL statements or
Multi-Dimensional Expressions are submitted to the server and how the server accesses the database
or cube to return result sets. Using SQL Server Profiler, administrators can do the following: a) create
a trace that is based on a reusable template, b) watch the trace results as the trace runs, c) store the
trace results in a table, d) start, stop, pause, and modify the trace results as necessary, e) replay the
trace results³.

For the most part, Profiler is an administrative tool that requires a bit of experience to master.
Fortunately, Profiler provides a graphical interface, which makes both learning and monitoring much
simpler. However, Profiler requires 10 MB of free space; if free space falls below 10 MB, Profiler stops.
To access Profiler, you must be the administrator or have permission to connect to a specific instance
of SQL Server and have granted permissions to execute Profiler stored procedures.

From the Start menu, locate Microsoft SQL Server among your available programs and then click
Profiler from the SQL Server group. In Enterprise Manager, choose SQL Profiler from the Tools menu.
From the File menu, choose New, select Trace from the submenu, identify the appropriate SQL Server
instance, and click OK. Use the resulting Trace Properties dialog box and its four tabs to initiate the
process. Set the following options on the General tab⁴:

• Name the trace and identify the server on which you will run the trace. The Trace SQL Server
property defaults to the instance identified earlier.

• Use the Template Name control’s drop-down list to choose one of the available templates. If
you create a template, be sure to specify the path to that file (tdf extension). You can add a
default template via the Options menu off the Tools menu.

• Save the trace to a file, reducing overhead on the server. Selecting this option enables the two
check boxes immediately below: The Enable File Rollover option permits you to open a new file
for the trace once the original file is full, and Server Processes SQL Server Traces Data indicates
whether the server or the client application should perform the trace. Performing lengthy,
complicated event tracing on the server can reduce performance. Saving the trace to a table is an
alternative to saving the trace to a file. This again can have performance implications on a busy
server. Specifically, a table trace requires more overhead.

• The final option, Enable Trace Stop Time, allows you to determine when the trace ends.

For more information about using the SQL Server Profiler and the layout of the generated text file go
to http://msdn.microsoft.com/en-us/library/ms187929.aspx.
3 Taken from http://msdn.microsoft.com/en-us/library/ms187929.aspx [26.04.10]
4 Taken from http://articles.techrepublic.com.com/5100-10878_11-5054787.html [26.04.10]

4.1.4 Oracle Database Standard Edition

The database query log is called “Audit Trail” in Oracle jargon. In Oracle9i Database and below,
auditing captures only the “who” part of the activity, not the “what”. With the arrival of 10g, these
limitations are gone thanks to significant changes to the auditing facility. Two types
of audits are involved: the standard audit (available in all versions) and the fine-grained audit
(available in Oracle9i and up)⁵.

Auditing is disabled by default, but can be enabled by setting the AUDIT_TRAIL static parameter,
which has the following allowed values.

AUDIT_TRAIL = { none | os | db | db,extended | xml | xml,extended }

The following list provides a description of each setting:

• none or false - Auditing is disabled.

• db or true - Auditing is enabled, with all audit records stored in the database audit trail
(SYS.AUD$).

• db,extended - As db, but the SQL_BIND and SQL_TEXT columns are also populated.

• xml - Auditing is enabled, with all audit records stored as XML format OS files.

• xml,extended - As xml, but the SQL_BIND and SQL_TEXT columns are also populated.

• os - Auditing is enabled, with all audit records directed to the operating system’s audit trail.

The static AUDIT_FILE_DEST parameter specifies the OS directory used for the audit trail when
the os, xml and xml,extended options are used. It is also the location for all mandatory auditing
specified by the AUDIT_SYS_OPERATIONS parameter.

Standard auditing, implemented by the SQL command AUDIT, can be used to quickly and easily set
up tracking for a specific object. For instance, if you wanted to track all the updates to the table EMP
owned by Scott, you would issue: audit UPDATE on SCOTT.EMP by access;. This command
will record all updates on the table SCOTT.EMP by any user each time they occur, in the audit trail table
AUD$, visible through the view DBA_AUDIT_TRAIL⁶.

For more information about the Oracle Audit Trail visit: http://www.oracle.com/technology/
documentation/index.html.

5 Taken from http://www.oracle-base.com/articles/10g/Auditing_10gR2.php [26.04.10]
6 Taken from http://www.oracle.com/technology/pub/articles/10gdba/week10_10gdba.html [26.04.10]

4.2 How to Detect Input Validation Vulnerabilities

The next big task is to decide when to parse the database query log in order to
improve the detection of injection flaws in web applications. The query log can either be parsed
repeatedly, after each sent HTTP request containing malicious code, or once
as part of a post-processing module when all the tests have finished. Both
methods have their advantages and disadvantages. The following sections describe both methods in
detail and outline their respective benefits. The goal of both methods is to find web application input
parameters which can be manipulated by a user, whether malicious or trustworthy, and which are used
to construct SQL queries that are then executed by the database tier. Depending on the input validation
scheme the application uses, these identified parameters might be used to inject malicious code into
the targeted web application.

4.2.1 Online Approach

The pseudo code in listing 4.8 is a design idea for a combined white and black box security testing
module which uses the database SQL query log file at runtime to make more thorough statements
about input validation vulnerabilities of the targeted web application. The basic idea is to access
and parse the database query log repeatedly, after each sent HTTP request containing
malicious code, in order to detect input validation vulnerabilities.

The algorithm in listing 4.8 starts with a list of previously discovered web pages on line 1. This list
can either come from a web crawler component or from a user-imported list of accessible and
valid URLs of the targeted web application. This list is then used to find possible attack vectors
consisting of any kind of input parameter which can be manipulated by a user, whether malicious or
trustworthy. Typically these are common URL query strings or HTML forms embedded in the HTML body of
a received web application response (see listing 4.8 line 3). Of course there are many more possible
attack vectors such as cross-site scripting in AJAX requests, malicious XML injections etc. A first
version of the security testing application certainly has to cover URL query strings and HTML forms.

An identified attack vector is then populated with a unique random value which is later used together
with the database query log to verify whether any of the injected values are reused by the web
application to construct one or more SQL queries. A random value is used to make the detection
of correlating SQL queries easier. Let's say a web crawler component detected the following URL:
http://www.example.com/search.php?query=Cat+Food. The security testing application
correctly identifies the URL query string as an attack vector, replaces the value of the query
parameter with a unique and random alphanumeric value such as OPL89FGHC and sends the following
HTTP request to the targeted web application: http://www.example.com/search.php?query=OPL89FGHC (see
listing 4.8 lines 4-5).
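
The substitution step described above can be sketched in Java as follows. MarkerInjector and its
methods are hypothetical names for illustration, not WASTF code. The sketch assumes a URL without a
fragment part and a purely alphanumeric random value, so that no URL encoding is needed.

```java
import java.security.SecureRandom;

/** Sketch (hypothetical helper): replaces a query parameter value with a random marker. */
final class MarkerInjector {

    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Creates a random alphanumeric value of the given length, e.g. OPL89FGHC. */
    static String newMarker(int length) {
        StringBuilder marker = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            marker.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return marker.toString();
    }

    /** Replaces the value of the named query parameter; other parameters stay untouched. */
    static String replaceParameter(String url, String name, String value) {
        int queryStart = url.indexOf('?');
        if (queryStart < 0) {
            return url; // no query string, nothing to replace
        }
        String[] pairs = url.substring(queryStart + 1).split("&");
        StringBuilder result = new StringBuilder(url.substring(0, queryStart + 1));
        for (int i = 0; i < pairs.length; i++) {
            int equals = pairs[i].indexOf('=');
            String key = equals < 0 ? pairs[i] : pairs[i].substring(0, equals);
            if (i > 0) {
                result.append('&');
            }
            result.append(key.equals(name) ? key + "=" + value : pairs[i]);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/search.php?query=Cat+Food";
        System.out.println(replaceParameter(url, "query", newMarker(9)));
    }
}
```

A fresh value per request and per parameter keeps log entries attributable to exactly one attack
vector, which is what makes the later correlation with the query log straightforward.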

After the HTTP request has been sent, the security testing application parses the database query
log. The query log has to be accessible in some way. As seen in section 4.1 this can be one of the
following: a simple text file, a comma separated values (CSV) file or a database table. Depending on
the database solution used, the security testing tool downloads or queries the database query log
accordingly and searches for queries containing the unique random value OPL89FGHC or at least
parts of it. A possible SQL query which was constructed by the web application and executed by
the database (and afterwards stored in the database query log) could look similar to this: SELECT
* FROM `products` WHERE `description` LIKE '%OPL89FGHC%';. If such a SQL query
exists in the database query log, it indicates that the targeted web application uses the data supplied
in the query parameter to construct SQL queries which are processed by the database. The
fact that the web application uses the supplied data in a SQL statement might lead to a
vulnerability which can be used to exploit the targeted web application (see listing 4.8 lines 6-7).
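
The search for the random value, or parts of it, could look like the following Java sketch.
QueryLogMatcher is a hypothetical class; it assumes that the log entries have already been read into
memory as strings, regardless of whether they came from a text file, a CSV file or a database table.
The minimum part length is an assumed tuning parameter that is not specified in the text above.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch (hypothetical helper): finds log entries that reuse an injected value. */
final class QueryLogMatcher {

    /**
     * Returns true if the entry contains the marker or any contiguous part of it
     * that is at least minPartLength characters long.
     */
    static boolean mentions(String logEntry, String marker, int minPartLength) {
        if (marker.isEmpty()) {
            return false;
        }
        int part = Math.max(1, Math.min(minPartLength, marker.length()));
        // Any longer part contains a part of exactly this length, so checking
        // all windows of that length is sufficient.
        for (int start = 0; start + part <= marker.length(); start++) {
            if (logEntry.contains(marker.substring(start, start + part))) {
                return true;
            }
        }
        return false;
    }

    static List<String> matchingEntries(List<String> logEntries, String marker, int minPartLength) {
        List<String> hits = new ArrayList<>();
        for (String entry : logEntries) {
            if (mentions(entry, marker, minPartLength)) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "SELECT * FROM `products` WHERE `description` LIKE '%OPL89FGHC%'",
            "SELECT 1");
        matchingEntries(log, "OPL89FGHC", 9).forEach(System.out::println);
    }
}
```

A small minimum part length tolerates truncated or partially filtered values but raises the chance of
accidental matches, so the threshold is a trade-off rather than a fixed constant.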

The next step is to probe the validation scheme used by the web application. The probing helps to
reduce the number of previously identified attack vectors and therefore increases the performance of
the vulnerability detection routine. Depending on the input validation library used by the targeted
web application, an identified attack vector might not be exploitable after all (see section 4.2.3):
if the library filters malicious characters out of the user provided input, a possible attack string
is rendered inexecutable and no exploitable vulnerability exists. The probing is done by adding
special characters used in common attack code to the random value which has been proven to
be used in SQL statements created by the targeted web application. An attacker who wants to insert
malicious SQL code into the targeted web application would append something like this to
the query parameter: “' OR 1=1--” or “' AND 1=0 UNION ALL SELECT...”7. Most good validation
schemes escape characters like “'” or “<” or convert them to their respective HTML encoding, namely
“&apos;” and “&lt;”. This mitigates the possibility of an attack because the conversion renders
otherwise executable code unusable. The trick to determine whether specific characters are being
filtered is to add these special characters to the random value and check whether they show up
altered in the database query log. This method can also be used without a database query log when
the HTTP(S) parameter is reflected in the HTML body of the web application response. A minimal
set of special characters needed to cover most payloads is around 11 characters (see section
4.2.3). Let's say the security testing application wants to test whether the “'” character is being
filtered by the input validation routine of the web application: it takes the random value
from before, “OPL89FGHC”, and prepends the character “'”, resulting in the following value:
'OPL89FGHC. This value is again sent to the targeted web application with the following HTTP request:
http://www.example.com/search.php?query='OPL89FGHC. By parsing the database query log
after the request has been sent, one might find one of these SQL queries (see listing 4.8 lines 8-15):

• SELECT * FROM `products` WHERE `description` LIKE '%OPL89FGHC%';
This indicates that the web application uses an input validation routine before SQL queries
are being sent to the database. In this case the web application completely removed the “'”
character.

7 See http://www.owasp.org/index.php/SQL_Injection for more information about SQLi vulnerabilities [19.04.10]

• SELECT * FROM `products` WHERE `description` LIKE '%&apos;OPL89FGHC%';
This indicates that the web application uses an input validation routine before SQL queries are
being sent to the database. In this case the web application changes special characters to their
respective HTML entities (any other kind of encoding is conceivable as well).

• SELECT * FROM `products` WHERE `description` LIKE '%'OPL89FGHC%';
This could basically mean two things: either the web application uses no input validation
before SQL queries are being sent to the database, or filtering is only done before data is
displayed to the user.

• No Query found
The targeted web application blocked the execution of the SQL query before it was sent to the
database. This indicates that the targeted web application uses some input validation to filter
user supplied input.
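
The four outcomes listed above can be told apart mechanically. The following Python sketch is an illustration under assumptions, not the thesis implementation: classify_probe is a hypothetical helper, the log is assumed to be a list of SQL strings, and baseline_query is the query that was logged for the plain random value before any special character was added.

```python
def classify_probe(baseline_query, log_lines, token, special):
    """Classify what happened to `special` when `special + token` was sent.

    Returns one of "blocked", "removed", "unfiltered" or "encoded".
    """
    hits = [line for line in log_lines if token in line]
    if not hits:
        # No query reached the database at all.
        return "blocked"
    if any(line == baseline_query for line in hits):
        # The query is identical to the one without the special character.
        return "removed"
    if any(special + token in line for line in hits):
        # The character arrived in front of the marker without modification.
        return "unfiltered"
    # The marker arrived, but the character in front of it was altered.
    return "encoded"


if __name__ == "__main__":
    base = "SELECT * FROM `products` WHERE `description` LIKE '%OPL89FGHC%';"
    seen = ["SELECT * FROM `products` WHERE `description` LIKE '%&apos;OPL89FGHC%';"]
    print(classify_probe(base, seen, "OPL89FGHC", "'"))
```

Only the "unfiltered" result would lead to saveAllowedCharacter in listing 4.8 line 13. Comparing against the baseline first avoids misreading a quote that belongs to the SQL statement itself as an injected one.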

After this process has been repeated with various special characters, the security testing
application should be able to tell which characters are not being filtered by the input validation
routine (if there is any) and can therefore be used to generate attack payloads. Security testing
applications commonly use a static list of attack payloads in order to probe for vulnerabilities.
Possible XSS and SQLi payloads look similar to this: jav&#x0A;ascript:alert('OPL89FGHC');8 or
“+ (SELECT TOP 1 password FROM users ) + ”9. Given the list of allowed and unfiltered characters,
the static list of attack payloads can be reduced by removing payloads which contain characters that
are being filtered by the input validation routine of the web application. This reduces the number of
HTTP(S) requests and increases the efficiency of the vulnerability detection routine (see listing 4.8
line 16).
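
The payload reduction step can be illustrated as follows. This is a sketch only: usable_payloads is a hypothetical name standing in for the buildPayloads call of listing 4.8, and the probed and allowed character sets are assumed to be the result of the special character probing described above.

```python
def usable_payloads(payloads, probed, allowed):
    """Drop every payload that needs a character the filter did not let through.

    probed  -- all special characters that were tested
    allowed -- the subset that reached the query log unfiltered
    """
    blocked = set(probed) - set(allowed)
    return [payload for payload in payloads
            if not any(char in payload for char in blocked)]


if __name__ == "__main__":
    payloads = [
        "' OR 'OPL89FGHC' = 'OPL89FGHC'--",
        "<script>alert('OPL89FGHC')</script>",
        "x' AND 1=(SELECT COUNT(*) FROM OPL89FGHC);--",
    ]
    # Assume the probing showed that <, > and ; are filtered.
    for payload in usable_payloads(payloads, probed="'<>()=;", allowed="'()="):
        print(payload)
```

Characters that were never probed are not treated as blocked here, which keeps the reduction conservative: a payload is only removed when the probing has positively shown that it cannot pass.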

The second to last step is to modify the HTTP query parameter from the previous example to include
the payloads which have a high chance of getting through the validation routine of the web application
(see listing 4.8 lines 17-19). The final step is to check whether an input validation vulnerability
actually exists by parsing the query log again for the inserted payload. If the payload shows up in
the query log unchanged, then there is a very good chance that a vulnerability exists (see listing 4.8
lines 20-22).

This approach might seem a bit complicated and excessive, but in theory it should help to
increase performance because only attack payloads which have a legitimate chance of getting
through the input validation filter are tested, and at the same time false positives are reduced.
However, the prerequisite for this approach is the availability of the database query log at any time
during the security testing scan.

8 Taken from the XSS Cheat Sheet at http://ha.ckers.org/xss.html [20.04.10]
9 Taken from the SQL Injection Cheat Sheet at http://ferruh.mavituna.com/sql-injection-cheatsheet-oku/ [20.04.10]

 1 FOR each previously discovered web page
 2   IF discovered web page contains HTML forms or URL parameters THEN
 3     FOR each HTML input field or URL query string
 4       SET query parameter to a unique random value
 5       CALL sendHttpRequest with modified values
 6       CALL parseQueryLog
 7       IF query log contains previously sent random value THEN
 8         FOR each special character in list
 9           SET parameter to specialCharacter + previously sent random value
10           CALL sendHttpRequest with modified values
11           CALL parseQueryLog
12           IF query log contains previously sent character unfiltered THEN
13             CALL saveAllowedCharacter with specialCharacter
14           END IF
15         END FOR
16         CALL buildPayloads with allowedCharacters
17         FOR each payload
18           SET query parameter to payload
19           CALL sendHttpRequest with injected payload
20           CALL parseQueryLog
21           IF query log contains sent payload unchanged THEN
22             CALL vulnerabilityDetected with payload
23           END IF
24         END FOR
25       END IF
26     END FOR
27   END IF
28 END FOR

Listing 4.8: Pseudo code for detecting input validation vulnerabilities by parsing the database query
log at runtime

4.2.2 Offline Approach

This approach is a slightly modified version of the online approach described in section 4.2.1.
It takes into account that the database query log might not be available during the scan but only
at the end, after all the security checks have been finished, so that the log has to be evaluated
as a post scanning process.

This approach is less performant than the online approach described in section 4.2.1,
because the online approach first checks whether an attack vector is being “echoed” into the database
query log. The online approach only tests an identified attack vector extensively if the
vector is used by the targeted web application as part of one or more SQL queries sent to
the database. The offline approach is not able to determine whether the identified vector
is being “echoed”, at least not at the beginning of the security tests. It therefore has
to find a trade-off between performance and accuracy of the reported vulnerabilities.

The pseudo code in listing 4.9 describes such a trade-off. The algorithm starts the same
way as the online approach described in section 4.2.1, with the difference that it cannot be
checked at this point whether the first random value sent is being “echoed” (see listing
4.9 lines 1-5). Instead, the security testing application keeps track of each random value sent for
later comparison with the database query log (see listing 4.9 line 6). The missing information on
whether the identified attack vector is actually used in one or more SQL queries is the cause
of the performance issues. Every additional HTTP(S) request targeting the current attack vector might
be in vain because the parameter might not be used by the web application for querying the database.

At this point several strategies can be used to finish the implementation of the offline approach with
different outcomes regarding the accuracy of identified input validation vulnerabilities:

• Do Nothing: If we finish the test at this point we have optimal performance, but we are only able
to tell whether the identified attack vector is being “echoed”, and only once the database query
log becomes available to the security testing application. For each identified attack vector one
HTTP(S) request is sent to the targeted web application. Unfortunately, the information
that a parameter is used by the web application to construct a SQL query is not
a sound enough indication of whether a vulnerability exists.

• Probe Input Validation Scheme: In addition to the “echoed” test, the security testing
application sends HTTP requests for probing the input validation scheme used by the web
application (see section 4.2.1 for a detailed description of the feature). When the database query log
becomes available to the security testing application, it will be possible to tell a) whether the
identified attack vector is being “echoed” and b) which special characters are being filtered by
the input validation routine of the targeted web application. For each identified attack vector
one HTTP request is sent for detecting whether it is being “echoed” and about 20 HTTP requests
for probing the input validation scheme in use (if a minimal character set is used). This
variation is shown in the pseudo code in listing 4.9 lines 18-24.

• Send all Payloads: In addition to the “echoed” test, the security testing application sends all the
available attack payloads for XSS and SQLi vulnerabilities without filtering them based on the
special characters allowed by the input validation routine of the targeted web application (as
seen in the online approach described in section 4.2.1). Depending on how many attack
payloads are implemented, this could easily be around 30 or more payloads per vulnerability
type. As soon as the database query log becomes available to the security testing application,
it is able to tell a) whether the identified attack vector is being “echoed” and b) whether a sent
payload has been used by the web application inside one or more SQL queries without being
modified by any input validation routine. For each identified attack vector one HTTP request
is sent for detecting whether it is being “echoed” and about 60 HTTP requests for sending attack
payloads to the targeted web application (if a full payload set is used). It might be possible
to use a reduced set of payloads, resulting in fewer HTTP requests, but with the risk of leaving
out exactly the one payload which would pass the input validation scheme unfiltered and thereby
missing a potential vulnerability.

• Probe Input Validation Scheme & Send all Payloads: This is a combination of the above
mentioned strategies Send all Payloads and Probe Input Validation Scheme. This strategy is in the
end as thorough as the online approach described in section 4.2.1 but needs a lot more HTTP
requests per identified attack vector, resulting in weak performance. As soon as the database
query log becomes available to the security testing application, it is able to tell a) whether the
identified attack vector is being “echoed”, b) which special characters are being filtered by
the input validation routine of the targeted web application and c) whether a sent payload has been
used by the web application inside one or more SQL queries without being modified by any
input validation routine. For each identified attack vector one HTTP request is sent for
detecting whether it is being “echoed”, about 11 HTTP requests for probing the input validation
scheme in use (if a minimal character set is used, see section 4.2.3) and around 60 HTTP
requests for sending attack payloads to the targeted web application (if a full payload set is
used).

 1 FOR each previously discovered web page
 2   IF discovered web page contains HTML forms or URL parameters THEN
 3     FOR each HTML input field or URL parameter
 4       SET parameter to a random value
 5       CALL sendHttpRequest with modified values
 6       CALL saveSentHttpRequest with modified values
 7       FOR each special character in list
 8         SET parameter to specialCharacter + previously sent random value
 9         CALL sendHttpRequest with modified values
10         CALL saveSentHttpRequest with modified values
11       END FOR
12     END FOR
13   END IF
14 END FOR
15 // Given the database query log is now available...
16 FOR each sent HTTP request
17   FOR each entry in the query log
18     IF entry contains previously sent random value THEN
19       IF entry contains special character THEN
20         CALL saveAllowedCharacter with specialCharacter
21       END IF
22       CALL possibleVulnerabilityDetected with HTTP request, allowed characters
23     END IF
24   END FOR
25 END FOR

Listing 4.9: Pseudo code for detecting input validation vulnerabilities by parsing the database query
log as a post scanning process
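
The post scanning part of listing 4.9 (lines 16-25) might be sketched in Python as shown below. The names SentProbe and correlate are hypothetical, the query log is assumed to be a list of SQL strings, and the matching is a naive substring comparison that a real implementation would probably have to refine.

```python
from collections import namedtuple

# One saved HTTP request: where the value went and what was sent.
# `special` is None for the plain random value.
SentProbe = namedtuple("SentProbe", "url parameter token special")


def correlate(sent, log_lines):
    """Match the saved probes against the query log once it is available.

    Returns a dict mapping (url, parameter) to the set of special characters
    that reached the database unfiltered. A key with an empty set means that
    the parameter is echoed but no probed character came through.
    """
    findings = {}
    for probe in sent:
        needle = (probe.special or "") + probe.token
        if any(needle in line for line in log_lines):
            allowed = findings.setdefault((probe.url, probe.parameter), set())
            if probe.special:
                allowed.add(probe.special)
    return findings


if __name__ == "__main__":
    url = "http://www.example.com/search.php"
    sent = [SentProbe(url, "query", "OPL89FGHC", None),
            SentProbe(url, "query", "OPL89FGHC", "'")]
    log = ["SELECT * FROM `products` WHERE `description` LIKE '%OPL89FGHC%';",
           "SELECT * FROM `products` WHERE `description` LIKE '%'OPL89FGHC%';"]
    print(correlate(sent, log))
```

Attack vectors that never appear in the result were not echoed into any SQL query, so all probing requests sent for them were in vain, which is exactly the cost of the offline approach discussed above.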

4.2.3 Special Character Probing & Performance

The special character probing mentioned in section 4.2.1 is a very convenient way to increase the
performance of the vulnerability detection routine: it discards previously discovered attack vectors
which are protected by an input filtering library implemented by the targeted web application.
If special characters such as <, >, ., ; (which are typically needed to successfully exploit an XSS or
SQLi vulnerability) are being filtered, the chance is high that the discovered attack vector does not
pose any security threat to the application and can therefore be safely ignored. This results in a
performance gain because HTTP(S) requests which would otherwise have been needed to detect input
validation vulnerabilities are saved.

The largest performance gain can be achieved with the “Online” approach described in section 4.2.1
because the question whether a special character is being filtered can be answered immediately by
parsing the database query log. Table 4.4 illustrates the situation by comparing the HTTP(S)
requests needed by the “Online” and the “Offline” approach to detect input validation vulnerabilities.
As table 4.4 shows, the “Offline” approach is far less performant than the “Online”
approach.

One way to slightly improve the performance of both approaches is to check a reduced set of
special characters. How many special characters are checked is a question of weighing
performance against precision. The more special characters are checked, the more reliable the
statements the security testing application can make about the input filtering implementation used
by the targeted web application. On the other hand, every character which is checked results in:
a) sending an HTTP(S) request and b) parsing the database query log for the inserted special character,
at different times depending on the approach used.

Table 4.2 lists some typical exploit code used for detecting XSS or SQLi vulnerabilities in a targeted
web application. Based on this list a minimal set of special characters can be selected which should
guarantee code execution if none of the characters is being filtered by the targeted web application.

XSS Minimal Character Set

A minimal set for XSS injection code would consist of the following special characters: ' ; . ! - " < >
= & { ( ) } (14 characters). A more advanced set would additionally check all the hex encoded values
of the minimal set.

SQLi Minimal Character Set

A minimal set for SQL injection code would consist of the following special characters and keywords:
' = ( ) OR AND UNION ALL SELECT * , ; (12 characters/keywords). A more advanced set would
additionally check more SQL keywords such as DROP, UPDATE etc.

#   Type  Exploit Code
1   XSS   <script>{document.write(String.fromCharCode(79,80,76,56,57,70,71,72,67))}</script>
2   XSS   %3C%73%63%72%69%70%74%3E%7Bdocument.write%28String.fromCharCode%2879,80,76,56,57,70,71,72,67%29%29%7D%3C%73%63%72%69%70%74%3E
3   XSS   <script>document.write(String.fromCharCode(79,80,76,56,57,70,71,72,67));</script>
4   XSS   <script>document.write(String.fromCharCode(79,80,76,56,57,70,71,72,67))</script>
5   XSS   javascript:document.write(String.fromCharCode(79,80,76,56,57,70,71,72,67))
6   XSS   %6A%61%76%61%73%63%72%69%70%74:document.write(String.fromCharCode(79,80,76,56,57,70,71,72,67))
7   SQLi  ' OR 'OPL89FGHC' = 'OPL89FGHC'--
8   SQLi  ' OR 'OPL89FGHC' = 'OPL89FGHC'#
9   SQLi  ' OR 'OPL89FGHC' = 'OPL89FGHC'/*
10  SQLi  ') OR 'OPL89FGHC' = 'OPL89FGHC--
11  SQLi  ') OR ('OPL89FGHC' = 'OPL89FGHC--
12  SQLi  x' AND 1=(SELECT COUNT(*) FROM OPL89FGHC);--
13  SQLi  ' UNION ALL SELECT name, pass FROM members

Table 4.2: Typical exploit code used by security testing applications to detect XSS and SQLi injection vulnerabilities

                                                “Online” Approach                   “Offline” Approach
                                                Discovered        # HTTP            Discovered        # HTTP
                                                Attack Vectors    Requests          Attack Vectors    Requests
After Spidering targeted Application            50                1                 50                1
After Phase #1 - “Echoed” Checks                10                21                50                21
After Phase #2 - “Special Character” Probing    3                 60                50                60
TOTAL # of HTTP Requests needed                                   440                                 4'100
to discover Vulnerabilities

Table 4.4: Simplified performance comparison of the “Online” and “Offline” strategies for detecting input validation vulnerabilities by parsing the
database query log
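
The totals in table 4.4 can be reproduced with a short calculation. The sketch below assumes that the “# HTTP Requests” column gives the number of requests sent per remaining attack vector in the step that follows; this reading is an interpretation, but it is the one under which both totals of the table add up.

```python
# (attack vectors still under test, HTTP requests sent per vector), one
# tuple per row of table 4.4.
ONLINE = [(50, 1), (10, 21), (3, 60)]
OFFLINE = [(50, 1), (50, 21), (50, 60)]


def total_requests(phases):
    """Sum vectors * requests over all phases."""
    return sum(vectors * requests for vectors, requests in phases)


if __name__ == "__main__":
    print(total_requests(ONLINE))   # 50*1 + 10*21 + 3*60  = 440
    print(total_requests(OFFLINE))  # 50*1 + 50*21 + 50*60 = 4100
```

The difference comes entirely from the number of attack vectors: the online approach can discard vectors after each phase, whereas the offline approach has to carry all 50 vectors through every phase.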

5 Security Testing Framework - w3af

This chapter evaluates w3af as a basis for a combined white and black box security testing applica-
tion. w3af is an actively developed open source project with a lot of existing plugins including a web
crawler component and several code injection modules. It might be possible to enhance w3af with
the missing white box functionality.

w3af is a web application attack and audit framework completely written in Python. The project’s
goal is to create a framework to find and exploit web application vulnerabilities that is easy to use and
extend. w3af is released under the GNU General Public License Version 2.

5.1 w3af Plugins

w3af is a plugin based framework and possesses over 100 plugins. These plugins do the actual work of
identifying web application vulnerabilities. Basically, w3af has three types of plugins: discovery, audit
and attack plugins. Discovery plugins have only one task: finding new “injection points”, to
use the w3af term. These consist of HTML pages, URLs with query parameters, HTML forms etc. A
classic example of a discovery plugin is a web spider. This plugin takes a URL as input and returns
one or more injection points. When a user enables more than one plugin of this type, they work in
a loop: if plugin A finds a new URL in the first run, the w3af core will send that URL to plugin B. If
plugin B then finds a new URL, it will be sent to plugin A. This process goes on until all plugins have
been run and no more knowledge about the application can be found using the enabled discovery plugins.
Audit plugins take the injection points found by discovery plugins and send specially crafted data to
all of them in order to find vulnerabilities. A classic example of an audit plugin is one that searches
for SQL injection vulnerabilities. The objective of attack plugins is to exploit vulnerabilities found by
audit plugins. They usually return a shell on the remote server, or a dump of remote tables in the case
of SQL injection exploits. A complete list of all available w3af plugins can be found on the w3af project
homepage: http://w3af.sourceforge.net.
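
The discovery loop described above is essentially a fixed point iteration. The following Python sketch illustrates the idea only; it is not taken from the w3af source code, and the function discover as well as the modelling of plugins as plain callables are assumptions made for this example.

```python
def discover(seed_url, plugins):
    """Feed every newly found URL to all plugins until nothing new turns up.

    Each plugin is modelled as a callable that takes a URL and returns an
    iterable of URLs it found there (a hypothetical interface).
    """
    known = {seed_url}
    queue = [seed_url]
    while queue:
        url = queue.pop()
        for plugin in plugins:
            for found in plugin(url):
                if found not in known:
                    known.add(found)
                    queue.append(found)
    return known


if __name__ == "__main__":
    links = {"/": ["/a"], "/a": ["/b"]}
    spider = lambda url: links.get(url, [])
    robots = lambda url: ["/admin"] if url == "/" else []
    print(sorted(discover("/", [spider, robots])))
```

The loop terminates because every URL enters the queue at most once. A limit such as the maxDiscoveryLoops setting listed in table 5.1 would additionally bound the number of rounds.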

5.2 w3af WebSpider Plugin

The w3af web spider plugin is one of the first plugins one needs to configure in order to start
automated web application vulnerability scans. The more resources of a targeted web application
the web spider is able to detect during a scan, the more thorough the results will be, depending on
the quality of the vulnerability detection routine. The overall goal of any web spider is to detect as
many resources as possible so that other components can work with the detected HTML pages. Let's take
a closer look at the web spider plugin provided by w3af.

5.2.1 Settings

The w3af web spider plugin has the following options (see table 5.1) which can be configured by
the user. Other options regarding the spidering of a web application can also be found in the
http-settings and misc-settings menu.

Menu                 Option             Description
plugins/discovery/   followRegex        When spidering, only follow links that match this regular
webSpider                               expression (ignoreRegex has precedence over followRegex)
                     ignoreRegex        When spidering, DO NOT follow links that match this regular
                                        expression (has precedence over followRegex)
                     onlyForward        When spidering, only search directories inside the one that
                                        was given as target
/http-settings       userAgent          User Agent header
                     maxRetrys          Maximum number of retries
                     never404           A comma separated list that determines what URLs will NEVER
                                        be detected as 404 pages
                     always404          A comma separated list that determines what URLs will ALWAYS
                                        be detected as 404 pages
                     404string          If this string is found in an HTTP response, then it will be
                                        tagged as a 404
                     proxyPort          Proxy TCP port
                     proxyAddress       Proxy IP address
/misc-settings       maxThreads         Maximum number of threads that the w3af process will spawn.
                                        Zero means no threads (recommended)
                     maxDiscoveryLoops  Maximum number of times the discovery function is called
                     maxDepth           Maximum depth of the discovery phase

Table 5.1: w3af’s WebSpider plugin settings
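
The precedence rule between followRegex and ignoreRegex stated in table 5.1 can be illustrated with a small Python function. This is a sketch of the documented behaviour and not the w3af implementation; the name should_follow and the use of re.match (matching from the beginning of the URL) are assumptions.

```python
import re


def should_follow(url, follow_regex=".*", ignore_regex=None):
    """Decide whether a spider should follow a link.

    ignore_regex is checked first and therefore has precedence over
    follow_regex.
    """
    if ignore_regex and re.match(ignore_regex, url):
        return False
    return re.match(follow_regex, url) is not None


if __name__ == "__main__":
    # Typical use: keep the spider away from the logout link so that the
    # session stays valid during the scan.
    print(should_follow("http://example.com/logout.php",
                        ignore_regex=".*logout.*"))
    print(should_follow("http://example.com/index.php",
                        ignore_regex=".*logout.*"))
```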

5.2.2 Accessing Secured Content

Many web applications have some kind of secured area to which only users with the right credentials
and user roles have access. This could be some kind of administration interface or the like. w3af
provides three means to configure the user credentials needed for accessing these secured areas of a
web application: basic HTTP authentication, setting HTTP headers manually and providing a
cookiejar file. How to use and configure these options is explained in the following paragraphs.

Basic HTTP Authentication

HTTP/1.0 includes the specification for a “Basic Access Authentication scheme”. The basic
authentication scheme is based on the model that the client must authenticate itself with a user id
and a password for each realm. The realm value should be considered an opaque string which can
only be compared for equality with other realms on that server. The server will service the request
only if it can validate the user id and password for the protection space of the Request-URI. There
are no optional authentication parameters [5, p.5]. A user can configure these settings through the
http-settings menu (see listing 5.1).

# ./w3af_console
w3af>>> http-settings
w3af/config:http-settings>>> set basicAuthUser myuser
w3af/config:http-settings>>> set basicAuthPass mypass
w3af/config:http-settings>>> set basicAuthDomain localhost

Listing 5.1: Configuring basic HTTP authentication in w3af

Every time a w3af plugin encounters a “401 Authorization Required” response from the target web
server, w3af will send a request containing the Authorization header field with the previously
configured settings (see line 13 in listing 5.2). Listing 5.2 shows how a typical basic HTTP
authentication works between a web server and a web client (typically a browser). The arrows used in
listing 5.2 indicate whether a message was sent (>) or received (<) from the client's point of view.

 1 > GET /private/index.html HTTP/1.0
 2 > Host: localhost
 3
 4 < HTTP/1.0 401 Authorization Required
 5 < Server: HTTPd/1.0
 6 < Date: Sat, 27 Nov 2004 10:18:15 GMT
 7 < WWW-Authenticate: Basic realm="Secure Area"
 8 < Content-Type: text/html
 9 < Content-Length: 311
10
11 > GET /private/index.html HTTP/1.0
12 > Host: localhost
13 > Authorization: Basic bXl1c2VyOm15cGFzcw==

Listing 5.2: Basic HTTP authentication headers
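
The value of the Authorization header in line 13 of listing 5.2 is simply the Base64 encoding of the user id and the password joined by a colon. The following Python sketch shows the computation for the credentials configured in listing 5.1; the function name basic_auth_header is chosen for this example only.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of the Authorization header for basic authentication."""
    credentials = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    print(basic_auth_header("myuser", "mypass"))  # Basic bXl1c2VyOm15cGFzcw==
```

Because Base64 is an encoding and not an encryption, the credentials can be recovered by anyone who is able to read the request.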

Setting HTTP Headers Manually

w3af allows the user to manually configure HTTP headers which will then be included in every HTTP
request the framework sends. This can be used to set an HTTP cookie header manually. It requires
that the user received a valid cookie from the web application prior to starting the w3af framework.
Listing 5.3 shows how a user can configure an HTTP headers file and listing 5.4 shows the actual
content of such an HTTP headers file, containing a valid PHP session id as an example.

# ./w3af_console
w3af>>> http-settings
w3af/config:http-settings>>> set headersFile headers.txt

Listing 5.3: Setting a HTTP headers file in w3af

Cookie: PHPSESSID=9cac0b235c5be6a9f3c70e281d2c3bff; path=/

Listing 5.4: Contents of a possible HTTP headers file in w3af

Providing an Existing cookiejar File

This method is somewhat outdated because the feature relies on the assumptions that the cookiejar
file is in the Mozilla cookie format, which is basically the Netscape HTTP cookie format specified in
RFC 2965, and that a user has easy access to the cookiejar file. Versions of Mozilla Firefox prior to
2.x and up to 2.0.0.20 stored the cookies in a plaintext, whitespace-delimited file which could be
easily copied and used in conjunction with w3af. Nowadays Firefox stores everything inside a SQLite
database which is easily accessible as well but lacks an export functionality to convert stored
cookies to the required cookiejar format. Listing 5.5 shows how a user would need to set the
cookiejar file.

# ./w3af_console
w3af>>> http-settings
w3af/config:http-settings>>> set cookieJarFile /home/user/Desktop/cookiejar

Listing 5.5: Setting a Cookie JAR file in w3af
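
If a session cookie is already known, a cookiejar file in the plaintext Mozilla format can also be written by hand. The following Python sketch is an illustration under assumptions: the helper names are hypothetical, the seven tab separated fields follow the cookies.txt layout as it is read by Python's http.cookiejar.MozillaCookieJar, and whether a given w3af revision accepts such a file has not been verified here.

```python
def netscape_cookie_line(domain, name, value, path="/", secure=False,
                         expires=2147483647):
    """Format one cookie as a line of a Mozilla/Netscape cookies.txt file."""
    # The second field states whether the cookie is valid for subdomains;
    # by convention it is TRUE exactly when the domain starts with a dot.
    subdomains = "TRUE" if domain.startswith(".") else "FALSE"
    fields = [domain, subdomains, path, "TRUE" if secure else "FALSE",
              str(expires), name, value]
    return "\t".join(fields)


def write_cookiejar(filename, cookies):
    """Write (domain, name, value) tuples to a cookies.txt style file."""
    with open(filename, "w", encoding="ascii") as handle:
        # Parsers of this format expect this header as the first line.
        handle.write("# Netscape HTTP Cookie File\n")
        for domain, name, value in cookies:
            handle.write(netscape_cookie_line(domain, name, value) + "\n")


if __name__ == "__main__":
    write_cookiejar("cookiejar", [
        ("localhost", "PHPSESSID", "9cac0b235c5be6a9f3c70e281d2c3bff"),
    ])
```

The expiry date defaults to a value far in the future so that the cookie is not discarded as expired when the file is loaded.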

5.2.3 Shortcomings of the w3af WebSpider

The w3af web spider plugin has several shortcomings. The two most prominent are:

• Missing JavaScript Support: Most web applications written today use some kind of
JavaScript functionality, either to visualise content or to provide application features
which have to run on the user's computer. Web spiders without JavaScript support won't be
able to extract all embedded links in modern web applications and therefore miss possible
attack vectors which could pose a security threat to the targeted web application. A web
application might, for example, use JavaScript code to display more HTML links after an element
of an HTML drop down menu has been selected. Web spiders without JavaScript support won't be
able to see these additional, dynamically created HTML links. Sophisticated web spiders try to
work around the missing JavaScript support by parsing embedded and externally loaded
JavaScript source files for complete URLs. Of course this only works if no JavaScript code is
used to concatenate several strings into a valid URL. Section 5.2.4 introduces a benchmarking
framework which is used to measure the ability of web spiders to extract links on HTML
pages with embedded JavaScript code.

• Missing Automated HTML form Authentication: As seen in section 5.2.2, w3af does not provide
a plugin for automated HTML form authentication. Most modern web applications use some
kind of HTML form for their user authentication. A user has to enter credentials such as
username and password into HTML form fields and send them to the web application. w3af is
missing a plugin which would automate this task. Other tools have to be used prior
to starting the w3af framework to collect cookies or session ids after a successful login. See
[2, p. 35] for a Python script called SessionGrabber which can be used prior to scanning a web
application with w3af to collect session cookies of a targeted web application in an automated
manner if the username and password of a legitimate user are known to the testing agent.

Nevertheless, web spiders without JavaScript support are much faster than their counterparts. Web
spiders with JavaScript support have to parse and compile the embedded JavaScript code first before a
requested HTML page can be further processed.

5.2.4 Wivet Framework & Performance of the w3af WebSpider plugin

WIVET1 is a benchmarking framework that aims to statistically analyse web link extractors. In general,
web application vulnerability scanners fall into this category. These vulnerability scanners, given a
URL, try to extract as many input vectors as they possibly can to increase the coverage of the attack
surface. WIVET provides a good sum of input vectors to any extractor and presents the results. In
order to test an input extractor meaningfully, the extractor has to provide some kind of session
handling, which nearly all of the decent crawlers do. The WIVET project is released under the GNU
General Public License Version 2.

Crawler High Scores

This section compares the high scores achieved by commercial and free web spiders, regardless of
whether they have JavaScript support. The high scores have been taken from the WIVET homepage (see
http://code.google.com/p/wivet/wiki/CurrentResults). Only the w3af score has been verified, using
SVN revision 3438. The other scores could not be verified because the products mentioned in the high
score list lack any versioning information.

• w3af - 50%
Open Source web application attack and audit framework.
Version: 1.1 (from SVN server) Revision: 3438. See http://w3af.sourceforge.net/

• Acunetix - 94%
Commercially available web application vulnerability scanner. Acunetix has pioneered the web
application security scanning technology: Its engineers have focused on web security as early as

1 http://code.google.com/p/wivet/ [29.03.10]

Chapter 5. Security Testing Framework - w3af | 43


1997 and developed an engineering lead in web site analysis and vulnerability detection. See
http://www.acunetix.com

• Cenzic Hailstorm - 88%


Commercially available web vulnerability scanner. Protect your Web applications by using Cen-
zic’s desktop, black box scanning solution. See http://www.cenzic.com/

• HP Webinspect - 94%
Commercially available automated web application security testing and assess-
ment tool. Get innovative assessment technology for web services and web appli-
cation security and automate web application security testing and assessment. See
https://h10078.www1.hp.com/cda/hpms/display/main/hpms_content.jsp?zn=
bto&cp=1-11-201-200%5E9570_4000_100__

• IBM AppScan - 83%
Commercially available desktop solution to automate web application security testing.
Rational AppScan Standard Edition significantly reduces costs associated with
manual vulnerability testing and helps to protect against the threat of cyber-attack by
automating security analysis to detect exploitable vulnerabilities. See
http://www-01.ibm.com/software/awdtools/appscan/.

• MavitunaSecurity Netsparker - 92%
Commercially available web application vulnerability
scanner. Netsparker can crawl, attack and identify vulnerabilities in all custom web applications
regardless of the platform and the technology they are built on, just like an actual attacker. It can
identify web application vulnerabilities like SQL Injection, Cross-site Scripting (XSS), Remote
Code Execution and many more. See http://www.mavitunasecurity.com/netsparker/.

As one can easily see, w3af performed the worst of the tested web spiders although one has to note
that w3af is the only free software project amongst the tested products. For a list of other commer-
cially or freely available web application security tools see [3].

5.3 XSS Injection Plugin

The w3af XSS plugin tries to find reflected JavaScript code in previously discovered HTML pages
of a targeted web application by injecting JavaScript code into the pages. For performance reasons,
only a few special characters are sent at first, either through available GET parameters or through
input fields provided by HTML forms. The special characters tested are characters
such as <, >, ", ’, ( and ). Afterwards, more complicated XSS payloads are tested (see
figure 5.1 for a complete list of XSS payloads sent by w3af).

<\0SCrIPT>alert("RANDOMIZE")</SCrIPT>
<SCrIPT>alert("RANDOMIZE")</SCrIPT>
<SCR\0IPt>alert("RANDOMIZE")</Sc\0RIPt>
<ScRIPT>a=/RANDOMIZE/\nalert(a.source)</SCRiPT>
<IFRAME SRC=\"javascript:alert(’RANDOMIZE’);\"></IFRAME>
<ScRIpT>alert(String.fromCharCode(RANDOMIZE</SCriPT>
</A/style="xss:exp/**/ression(alert(\’XSS\’">
’’;!--\"<RANDOMIZE>=&{()}
jAvasCript:alert("RANDOMIZE");
<ScRIPt SrC=http://RANDOMIZE/x.js></ScRIPt>
javas\tcript:alert("RANDOMIZE");
<ScRIPt/XSS SrC=http://RANDOMIZE/x.js></ScRIPt>
javas&#x09;cript:alert("RANDOMIZE");
<ScRIPt/SrC=http://RANDOMIZE/x.js></ScRIPt>
javas\0cript:alert("RANDOMIZE");

Figure 5.1: w3af XSS Plugin JavaScript Payloads
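
The payloads in figure 5.1 all carry a RANDOMIZE placeholder. The following Java fragment is a minimal sketch of how such a reflection check could work, assuming that the placeholder is replaced by a random token for every request and that a payload counts as reflected when it appears unencoded in the response. It is not w3af's code (w3af is written in Python); the class and method names are hypothetical and the two responses are simulated.

```java
import java.security.SecureRandom;

// Hypothetical helper, not part of w3af or WASTF: illustrates a
// reflected-XSS check that uses a randomized marker.
final class ReflectedPayloadCheck {

    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    // Builds a random alphanumeric token that is unlikely to occur in a page by chance.
    static String randomToken(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Replaces every RANDOMIZE placeholder of a payload template with the token.
    static String instantiate(String template, String token) {
        return template.replace("RANDOMIZE", token);
    }

    // A payload counts as reflected if it appears verbatim (unencoded) in the response body.
    static boolean isReflected(String responseBody, String payload) {
        return responseBody.contains(payload);
    }

    public static void main(String[] args) {
        String token = randomToken(8);
        String payload = instantiate("<SCrIPT>alert(\"RANDOMIZE\")</SCrIPT>", token);

        // Simulated responses (assumption): one echoes the input as is, one HTML-encodes it.
        String echoing = "<html><body>Hello " + payload + "</body></html>";
        String encoding = "<html><body>Hello "
                + payload.replace("<", "&lt;").replace(">", "&gt;") + "</body></html>";

        System.out.println("echoing page reflects payload:  " + isReflected(echoing, payload));
        System.out.println("encoding page reflects payload: " + isReflected(encoding, payload));
    }
}
```

The random token makes it unlikely that text which was already part of the page is mistaken for the injected payload.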

5.3.1 Settings

Table 5.2 shows the available options which allow the user to tweak the XSS plugin to his needs.

Menu                 Option           Description

/plugins/audit/xss   numberOfChecks   Set the amount of checks to perform for each fuzzable
                                      parameter. Valid numbers: 1 to 15
                     checkStored      Identify stored cross site scripting vulnerabilities

Table 5.2: w3af’s XSS plugin settings

5.4 SQL Injection Plugin

The sqli plugin of w3af has no settings which can be configured by the user except the settings found
in the misc-settings and the http-settings menu.

5.5 Conclusion

Although w3af is one of the best freely available web application security testing frameworks, it
falls short at spidering modern web applications which rely on JavaScript support in the user’s
browser. Furthermore, the extraction of web application cookies or session ids after valid user
credentials have been provided to the targeted web application is not optimally implemented in
w3af. It is quite difficult to automate the authentication process with the features provided by the
w3af framework. Third party tools have to be used for authentication and extraction of application
cookies or session ids prior to launching the w3af framework.

A properly working web spider is essential for the detection of possible security threats in a targeted
web application. This project tries to enhance the detection of possible input validation vulnerabilities
through the parsing of database query log files, but if no attack vectors are detected on the
targeted web application, the white box component cannot detect any vulnerabilities. A possible solution
to this problem would have been to add JavaScript support to the w3af web spider in order
to enhance the detection of HTML links and forms. Unfortunately, there are currently no Python libraries
(w3af is completely written in Python) available which are able to modify an HTML document
tree based on common HTML JavaScript events such as onmouseover, onselect, onload etc. The
situation is different for Java libraries. HtmlUnit is a “GUI-Less browser for Java programs” and is
released under the Apache License Version 2. It models HTML documents and provides an API that
allows you to invoke pages, fill out forms, click links, etc... just like you do in your “normal” browser.
Because this looked very promising, enhancing w3af has been ruled out. Chapter ?? and 7 describe how
the HtmlUnit library is being used to develop a combined black and white box security testing application.

6 Security Testing Framework - WASTF

The following chapter outlines the design decisions and overall package structure of the Web Ap-
plication Security Testing Framework (WASTF) which has been developed during this project thesis.
The framework builds the foundation for the combined white and black box web application security
testing plugin. One possibility would have been to extend the w3af framework but this has been ruled
out in chapter 5 because of w3af’s shortcomings in certain areas such as the web spider plugin. The
WASTF framework has been designed to be very similar to w3af to make it easier to use for people
who are already familiar with w3af.

6.1 Overview & Interactive Command Line Interface

The WASTF framework is a command line tool with an interactive command line interface (CLI)
similar to that of w3af. WASTF uses the open source JLine1 library to provide the CLI. JLine is a Java
library for handling console input. People familiar with the readline/editline capabilities for modern
shells (such as bash and tcsh) will find most of the command editing features of JLine to be familiar2 .
JLine is distributed under the BSD license.

The framework consists of menus [M] and plugins [P] which are arranged in a hierarchical tree-like
structure. The user traverses this structure through the CLI by entering the name of the
menu or plugin he wants to enter. By entering ’back’ the user leaves the current menu or plugin and
jumps to the parent of the current element. Inside a plugin or menu element the user can configure
settings provided by the element through the ’set <parameter> <value>’ command. To display a
list of available commands and parameters to change, the user simply has to type ’help’ followed by
ENTER no matter where he currently resides in the hierarchical menu structure. Figure 6.1 shows the
hierarchical menu structure of the framework.
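
The navigation described above can be sketched as a small tree of nodes. The following Java class is only an illustration under stated assumptions: the class name, the method names and the setting name maxDepth are hypothetical and not taken from the WASTF source code; only the commands ’back’ and ’set <parameter> <value>’ and the element names come from the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a hierarchical menu; names are illustrative only.
final class MenuNode {
    private final String name;
    private final MenuNode parent;                        // null for the root "/"
    private final Map<String, MenuNode> children = new LinkedHashMap<>();
    private final Map<String, String> settings = new LinkedHashMap<>();

    MenuNode(String name, MenuNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Adds a sub menu or plugin below this element and returns it.
    MenuNode addChild(String childName) {
        MenuNode child = new MenuNode(childName, this);
        children.put(childName, child);
        return child;
    }

    // Executes one CLI command and returns the element the user is in afterwards.
    MenuNode execute(String line) {
        String[] parts = line.trim().split("\\s+", 3);
        if (parts[0].equals("back")) {
            return parent != null ? parent : this;        // the root has no parent
        }
        if (parts[0].equals("set") && parts.length == 3) {
            settings.put(parts[1], parts[2]);
            return this;
        }
        MenuNode child = children.get(parts[0]);
        return child != null ? child : this;              // unknown input: stay in place
    }

    String getSetting(String key) {
        return settings.get(key);
    }

    // Path of this element, e.g. "/plugins/discovery".
    String path() {
        if (parent == null) {
            return "/";
        }
        String prefix = parent.path();
        return prefix.endsWith("/") ? prefix + name : prefix + "/" + name;
    }

    public static void main(String[] args) {
        MenuNode root = new MenuNode("", null);
        root.addChild("plugins").addChild("discovery").addChild("webSpider");

        MenuNode current = root.execute("plugins").execute("discovery").execute("webSpider");
        current.execute("set maxDepth 5");                 // maxDepth is a made-up setting
        System.out.println(current.path() + " maxDepth=" + current.getSetting("maxDepth"));
        System.out.println(current.execute("back").path());
    }
}
```

Keeping a reference to the parent in every node makes ’back’ a constant time operation and lets the prompt path be derived from the tree itself.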

/
    http-settings [M]
    misc-settings [M]
    plugins [M]
        audit [M]
            databaseQueryLog [P]
        discovery [M]
            webSpider [P]
        output [M]
            xmlFile [P]
        webLogin [M]
            webFormLogin [P]
    target [M]

Figure 6.1: Menu and plugin hierarchical tree structure of the WASTF framework

1 http://jline.sourceforge.net/ [25.06.10]
2 The most generic sense of the term shell means any program that users employ to type commands. In the Unix operating
system users may select which shell to use for interactive sessions. When the user logs in to the system the shell program
is automatically executed. Many types of shells have been developed for this purpose. The program is called a “shell”
because it hides the details of the underlying operating system behind the shell’s interface. The shell manages the
technical details of the operating system kernel interface, which is the lowest-level, or ’inner-most’ component of an
operating system.

The plugins which contain the actual business logic are classified according to one of the following
categories (similar to w3af):

• WebLogin: These types of plugins are being used to get the necessary user credentials from the
web application before any other plugins are being started. This is especially useful if the targeted
web application is being secured with a login form where a user has to enter his username
and password before he even gets access to the content of the web application.

• Discovery: Discovery plugins are used to collect as much information about the targeted web
application as possible (Information Gathering). This includes web server data such as brand
and version (not yet implemented) and spidering the web application.

• Audit: Audit plugins search for vulnerabilities in the previously discovered web application re-
sources.

• Output: Output plugins have the simple task to write the findings of all the executed plugins
into a nicely formatted report such as a XML, HTML or PDF file.
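
The category descriptions above imply an order in which enabled plugins run: login first, then discovery, then audit, then output. The following Java fragment sketches one way to derive that order. It is an assumption made for illustration; the enum, the class names and the sorting approach are hypothetical and do not describe how WASTF schedules its plugins internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: the declaration order of the constants encodes the
// phase order implied by the category descriptions.
enum PluginCategory { WEB_LOGIN, DISCOVERY, AUDIT, OUTPUT }

// A plugin the user has enabled, reduced to its name and category.
final class EnabledPlugin {
    final String name;
    final PluginCategory category;

    EnabledPlugin(String name, PluginCategory category) {
        this.name = name;
        this.category = category;
    }
}

final class PluginScheduler {
    // Returns the plugin names sorted by category phase. The sort is stable,
    // so plugins of the same category keep the order in which they were enabled.
    static List<String> executionOrder(List<EnabledPlugin> enabled) {
        List<EnabledPlugin> sorted = new ArrayList<>(enabled);
        sorted.sort(Comparator.comparing((EnabledPlugin p) -> p.category));
        List<String> names = new ArrayList<>();
        for (EnabledPlugin plugin : sorted) {
            names.add(plugin.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<EnabledPlugin> enabled = new ArrayList<>();
        enabled.add(new EnabledPlugin("xmlFile", PluginCategory.OUTPUT));
        enabled.add(new EnabledPlugin("databaseQueryLog", PluginCategory.AUDIT));
        enabled.add(new EnabledPlugin("webSpider", PluginCategory.DISCOVERY));
        enabled.add(new EnabledPlugin("webFormLogin", PluginCategory.WEB_LOGIN));
        System.out.println(executionOrder(enabled));
    }
}
```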

Depending on the category, a plugin which has been enabled by the user will be started at a different
time in the program flow (see section 6.2). As described in 3.3 the framework can be started in two
different modes: interactive and automated. The automated mode can be used to schedule automated
runs through a cron job in UNIX-like operating systems or through the Task Scheduler in Microsoft operating
systems. The user simply has to provide a plain text file with the commands he wants to be executed
during a scheduled run of the framework. The commands have to be separated by newlines.

6.2 Program Flow

The next sub sections describe the two modes in which the WASTF framework can be launched and
the differences between them.

6.2.1 Interactive Mode

The interactive mode is the default mode which is used if the user does not specify the -s
<script file> parameter. The application is launched in this mode by simply running
java -jar WASTF.jar in a command line shell provided by the operating system. WASTF then
provides its own interactive command line shell to the
user. Listing 6.1 shows how to start WASTF in the interactive mode.

6.2.2 Automated Mode

The automated mode is especially important for testing the security of web applications in an
automated way. This mode allows the testing agent to schedule recurring scans of a targeted web
application, for example every morning at three o’clock. Listing 6.2 shows how to start WASTF in the
automated mode. Listing 6.3 shows the content of the file used to automate the WASTF framework. The
wastfScript.txt file contains the commands which are executed from top to bottom by the
WASTF framework.

6.3 Package Overview

Figure 6.2 shows a simplified UML 2 package/class diagram with a bird’s-eye view of the package
structure of the WASTF framework. The actual source code can be found on the enclosed CD-
ROM. The framework consists of 9 packages with various sub packages. The general purpose of each
package shall be explained:

• com.wastf.cli The cli package contains the classes responsible for the interactive Command
Line Interface (CLI). The CLI uses the free and open source library JLine3 . JLine is a Java library
for handling console input. It is similar in functionality to BSD editline and GNU readline. Peo-
ple familiar with the readline/editline capabilities for modern shells (such as bash and tcsh)
will find most of the command editing features of JLine to be familiar. WASTF provides a hier-
archical menu structure on top of JLine which can be traversed by the user (see section 6.1).

• com.wastf.configuration The configuration package keeps track of settings the user made in-
side menus as well as in plugins. Additionally, these settings are being encoded and saved into
the underlying data storage system. Another noteworthy task of this package is to provide an in
memory service for inter plugin communication during an active scan of a targeted web appli-
cation.

• com.wastf.database The database package contains the necessary classes for accessing and
setting up a connection to the underlying Database Management System. Section 6.4 describes
how other classes can make use of the provided data storage system.

• com.wastf.internalPlugins The internalPlugins package contains the classes which actually
implement the various plugins in WASTF. This includes the web spider, form login, XML output
and the database query log plugin.

• com.wastf.log The log package contains classes for logging what the user sees and enters into
the CLI into a common text file for later analysis and reproducibility.
3 http://jline.sourceforge.net/ [13.07.10]

Figure 6.2: Simplified UML 2 Package Overview of the WASTF Application

• com.wastf.menu The menu package contains the various menus which can be accessed by the
user such as the target menu for setting the web application to scan or the discovery menu
which displays all the plugins which belong to the discovery category.

• com.wastf.plugin The plugin package holds the various plugin interfaces for the different plu-
gin categories such as discovery, output and audit plugins.

• com.wastf.util The util package contains simple static classes with helpful utility methods
which can be commonly used. The package contains for example a random string generator,
Base64 en-/decoder etc.

• com.wastf.web The web package contains wrapper and helper classes for using the HtmlUnit
library for sending HTTP(S) POST and GET requests as well as parsing retrieved HTML content
(with JavaScript support). See chapter 7 for more information about the HtmlUnit library and
its features.

# java -jar WASTF.jar
[ASCII art banner: WASTF]

Web Application Security Testing Framework (WASTF)
Copyright (C) 2010 Kevin Denver
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions.

wastf (/) >

Listing 6.1: Starting the WASTF framework in the interactive mode

# java -jar WASTF.jar -s /home/<myuser>/Desktop/wastfScript.txt
[ASCII art banner: WASTF]

Web Application Security Testing Framework (WASTF)
Copyright (C) 2010 Kevin Denver
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions.

wastf (/) > version
[*] WASTF Version 1.0 (Revision: 333)
wastf (/) > exit

Listing 6.2: Starting the WASTF framework in the automated mode

## This is a comment
version
exit

Listing 6.3: Content of the wastfScript.txt WASTF script file
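
How a script file like the one in listing 6.3 could be turned into a list of commands is sketched below. This is a minimal sketch under assumptions: the class and method names are hypothetical, and the rule that every line starting with ’#’ is a comment is inferred from the single comment line in listing 6.3, not from the WASTF source code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a WASTF style script: one command per line.
final class ScriptParser {

    // Splits the script text into commands. Blank lines and lines starting
    // with '#' are skipped (assumption, see the comment line in listing 6.3).
    static List<String> parseCommands(String scriptText) {
        List<String> commands = new ArrayList<>();
        for (String line : scriptText.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            commands.add(trimmed);
        }
        return commands;
    }

    public static void main(String[] args) {
        String script = "## This is a comment\nversion\nexit\n";
        System.out.println(parseCommands(script)); // prints [version, exit]
    }
}
```

The resulting commands could then be fed to the same command handler that serves the interactive shell, so that both modes share one code path.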


6.4 Data Storage

For saving relevant information during and between scans of targeted web applications, WASTF uses
a relational database management system. The decision to use a database management system
was a necessity for developing a working web spider. The alternative would have been to keep all
the discovered web pages in memory or download the discovered pages to the machine’s hard disk.
Both alternatives have some significant drawbacks. Storing everything in the machine’s memory is
obviously not a good idea for large web applications such as e-business applications. The memory
would soon fill up, which in turn would inevitably cause the WASTF application to crash.
Downloading web pages onto the machine’s hard disk would probably work but comes
with great speed losses: each plugin activated after the web spider would have to load and parse the
stored web site anew from the hard disk.

WASTF uses the H2 database management system. H2 is a relational database management system
written in Java4 . It can be embedded in Java applications or run in the client-server mode. The disk
footprint (size of the jar file) is about 1 MB5 . The software is available as open source software under
modified versions of the Mozilla Public License or the original Eclipse Public License.

6.4.1 Data Access Object

WASTF uses the Data Access Object (DAO) pattern6 for accessing the database. The component that
relies on the DAO uses the simpler interface exposed by the DAO for its clients. The DAO completely
hides the data source implementation details from its clients. Because the interface exposed by the
DAO to clients does not change when the underlying data source implementation changes, this pat-
tern allows the DAO to adapt to different storage schemes without affecting its clients or components.
Essentially, the DAO acts as an adapter between the component and the data source.
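
The pattern can be sketched in a few lines of Java. The interface, the in-memory implementation and the client below are hypothetical names chosen for illustration; WASTF's real DAO classes and its H2-backed implementation may look different. The in-memory class only stands in for a database-backed one to keep the sketch runnable without a database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The interface is all a client ever sees of the data source.
interface DiscoveredItemDao {
    void save(long runId, String url);
    List<String> findUrlsByRun(long runId);
}

// Stand-in implementation; a JDBC based class could implement the same
// interface without any change to its clients.
final class InMemoryDiscoveredItemDao implements DiscoveredItemDao {
    private final Map<Long, List<String>> urlsByRun = new HashMap<>();

    @Override
    public void save(long runId, String url) {
        urlsByRun.computeIfAbsent(runId, id -> new ArrayList<>()).add(url);
    }

    @Override
    public List<String> findUrlsByRun(long runId) {
        // Return a copy so that callers cannot modify the stored data.
        return new ArrayList<>(urlsByRun.getOrDefault(runId, new ArrayList<>()));
    }
}

// A client such as a web spider depends on the interface only.
final class SpiderClient {
    private final DiscoveredItemDao dao;

    SpiderClient(DiscoveredItemDao dao) {
        this.dao = dao;
    }

    void pageDiscovered(long runId, String url) {
        dao.save(runId, url);
    }
}
```

Because SpiderClient receives the DAO through its constructor, the storage scheme can be exchanged (for example in unit tests) without touching the client code.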

6.4.2 Database Schema

Figure 6.3 shows an entity relationship model (ERM) of the currently implemented database schema.
What follows is a short description of each table shown in figure 6.3:

• Run: This table stores the URL of a web application the user wants to scan. Additionally, a time
stamp records when the scan started. All the other tables reference the primary key of
this table.

• Configuration: The configuration table keeps track of the settings a user made before starting
a scan. This includes the settings of menus and plugins alike.

• DiscoveredItem: This table is used by the web spider plugin to store web pages discovered
during the crawling process.

4 http://www.h2database.com/html/main.html [12.07.10]
5 http://www.h2database.com/html/main.html [12.07.10]
6 http://java.sun.com/blueprints/corej2eepatterns/Patterns/DataAccessObject.html [12.07.10]


Figure 6.3: Entity Relationship Model (ERM) of the currently implemented WASTF database schema

• HttpLog & HttpLogParameter: The HttpLog table stores every HTTP or HTTPS request sent by
the HtmlUnit library.

• Report: This table is used by plugins to store pieces of information or identified vulnerabilities in
the targeted web application, which are later written into the report.


7 HtmlUnit based Web Spider

This chapter describes the open source HtmlUnit project and the benefits it provides for testing web
applications automatically. Additionally, the design and performance of the web spider plugin, which
has been developed as part of this project thesis, are being discussed and visualised.

7.1 HtmlUnit Overview

HtmlUnit is a “GUI-Less browser for Java programs” and is released under the Apache License
Version 2. It models HTML documents and provides an API that allows developers to invoke pages,
fill out forms, click links, etc... just like one does in a “normal” browser. The JavaScript support
(which is constantly being improved) is fairly good and is able to work even with quite complex AJAX
libraries, simulating either the Mozilla Firefox or Microsoft Internet Explorer. HtmlUnit is typically
used for testing purposes although HtmlUnit is not a generic unit testing framework such as JUnit
for Java. It is specifically a way to simulate a browser for testing purposes and is intended to be used
within another testing framework such as JUnit or TestNG1 .

HtmlUnit uses the Mozilla Rhino engine2 to provide JavaScript support. Rhino is an open source
implementation of JavaScript written entirely in Java. Rhino is an implementation of the core
language only and does not contain objects or methods for manipulating HTML documents. These
methods are instead provided by HtmlUnit. Rhino offers (as
of version 1.6) all the features of JavaScript 1.7, direct scripting of Java, a JavaScript shell
for executing JavaScript scripts, a JavaScript compiler to transform JavaScript source files into Java
class files and a JavaScript debugger for scripts executed with Rhino. The JavaScript language itself is
standardised by Standard ECMA-262 ECMAScript3 : A general purpose, cross-platform programming
language. Rhino 1.3 and greater conform to Edition 3 of the Standard.

JavaScript support is essential for web spider components which want to crawl today’s web applica-
tions thoroughly. There are no absolute trends about the use of JavaScript. Some users have scripting
turned off, some browsers do not support scripting. However W3C’s browser statistics show that 95%
of all browsers on the Internet have JavaScript support enabled (as of January 2008)4 . It is safe to say
that there are more web applications being built using some kind of JavaScript functionality than
there are applications completely avoiding the use of JavaScript.

1 Taken from http://htmlunit.sourceforge.net/ [05.04.10]
2 http://www.mozilla.org/rhino/ [05.04.10]
3 http://www.ecma-international.org/publications/standards/Ecma-262.htm [05.04.10]
4 http://www.w3schools.com/browsers/browsers_stats.asp [11.06.10]

The goal of web spider components, especially those embedded in automated web application security
testing tools, is to find and unveil as many HTML pages containing attack vectors as possible.
Web spiders without JavaScript support, such as the webSpider plugin of w3af, would find
no input fields when stumbling upon the HTML page shown in listing 7.2, because the form is only
populated once the embedded script has been executed. This results in a
reduced accuracy because pages of the targeted web application which rely on extended JavaScript
functionality are never checked for security related issues in an automated scan.

By using the HtmlUnit API and its JavaScript support it is relatively easy to write a multi threaded
web spider module which is able to cope with modern web applications that make extensive use of
JavaScript. The next section shows how HtmlUnit is used in a
simple scenario: retrieving an HTML page with embedded JavaScript, filling out some HTML text input
fields and submitting the form by using the HtmlUnit API to click on the submit button.

7.2 Submitting a form with HtmlUnit

The API of HtmlUnit makes it easy to fill out and submit HTML forms. Listing 7.1
shows a sample Java method which retrieves an HTML page (the source code of the retrieved HTML
page is shown in listing 7.2) through a proxy and populates the different HTML text input fields. The
received HTML page contains a login form whose input elements are created dynamically by a JavaScript
routine (see listing 7.2, lines 5 - 12).

HtmlUnit (as of version 2.7) is able to simulate the following browsers: Microsoft Internet Explorer
versions 6, 7 and 8, Netscape and Mozilla Firefox versions 2 and 3. Selecting a specific browser
version changes the HTTP user agent header which is being sent by the HtmlUnit API. Additionally
some specific JavaScript instructions are being interpreted differently based on the selected browser
version5 . The specific browser version is being set upon creation of the HtmlUnit WebClient object
(see listing 7.1, lines 3-4).

Retrieving an HTML page is done by calling the webClient.getPage("http://www....");
method as seen in listing 7.1 on line 13. The HtmlUnit library retrieves the HTML page located
at the given location and silently loads any additional resources such as Cascading Style Sheets
(CSS) and JavaScript source files embedded with the <script type="text/javascript"
src="remote.js"></script> directive from the remote host.

HtmlUnit now uses the Mozilla Rhino engine to interpret all the downloaded JavaScript instructions
and changes the HTML document structure accordingly (if necessary). The mentioned getPage()
method returns a Java HTML page object which contains the fully rendered HTML page
and can be used for further processing.

5 http://htmlunit.sourceforge.net/apidocs/com/gargoylesoftware/htmlunit/BrowserVersion.html
[11.06.10]

The HTML page object can now be used to manipulate the contents of the retrieved HTML page. The
complete API documentation is available on HtmlUnit’s project homepage. To
give an impression of the functionality HtmlUnit offers through its API, the following list contains
some of the more interesting methods offered by the HTML page object (as of
HtmlUnit version 2.7):

• List<HtmlAnchor> getAnchors()
Returns a list of all <a href="">...</a> anchors contained in a received HTML page.

• <E extends HtmlElement> E getElementByName(String name)
Returns the HTML element with the specified name.

• HtmlElement getFocusedElement()
Returns the element with the focus or null if no element has the focus.

• List<HtmlForm> getForms()
Returns a list of all the forms in a received HTML page.

• List<FrameWindow> getFrames()
Returns a list containing all the frames (from frame and iframe tags) in a received HTML page.

Some of these methods return an HtmlElement object. HtmlElement is an abstract class provided by
HtmlUnit which is used by other HtmlUnit classes to model HTML elements such as
text or password input fields, radio buttons, checkboxes etc. Again a short list of interesting
methods which all HtmlElement objects have in common:

• <P extends Page> P click()
Simulates clicking on this element, returning the page in the window that has the focus after
the element has been clicked.

• <P extends Page> P dblClick()
Simulates double-clicking on this element, returning the page in the window that has the focus
after the element has been clicked.

• void focus()
Sets the focus on this element.

• Page mouseOver()
Simulates moving the mouse over this element, returning the page that this element’s window
contains after the mouse move.

• void setAttribute(String attributeName, String attributeValue)
Sets the value of the attribute specified by name.


The example method shown in listing 7.1 manipulates a HTML text- and a HTML pass-
word input element on lines 20-25 by calling the setText() method (which is equivalent to
setAttribute("value", "myString");) and finally submits the filled out form by “clicking” on
the submit button on line 28 by calling the click() method.

The example in listing 7.1 uses the names of the HTML elements in order to retrieve them from
the previously retrieved HTML page. This is just one of several ways to retrieve and modify HTML
elements embedded in the HTML page object. Other methods include: retrieving HTML elements
by their “id” attribute, iterating through a list of all available HTML elements in the HTML page or
using the XML Path Language (XPath). XPath is a query language for selecting nodes from an XML
document. The XPath language is based on a tree representation of an XML document, and provides
the ability to navigate around the tree, selecting nodes by a variety of criteria. In popular use (though
not in the official specification), an XPath expression is often referred to simply as an XPath.

HtmlUnit’s HtmlPage object allows the developer to retrieve HTML elements embedded in the HTML
page by issuing XPath queries such as the following code snippet which retrieves all HTML text- and
password input elements embedded in a HTML page object.

String xPathQuery = "//input[@type='text'] | //input[@type='password']";
// getByXPath() returns an untyped list, so the result has to be cast
List<HtmlElement> nodeList = (List<HtmlElement>) htmlPage.getByXPath(xPathQuery);

These XPath queries are extensively used in the multi threaded web spider plugin written for this
project thesis (see section 7.4). The XPath queries are being used to extract new URLs pointing to
pages on the targeted web application which have not been visited in an ongoing web spider run.
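
The idea of extracting link targets with an XPath query can be tried out without HtmlUnit. The following sketch uses the XPath implementation shipped with the JDK on a small, well-formed XHTML string. Both the query and the class name are assumptions made for illustration; they are not the queries used by the WASTF web spider, which works on HtmlUnit page objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects link and form targets from well-formed XHTML.
final class LinkExtractor {

    // Selects the href attribute of every anchor and the action attribute of every form.
    static final String TARGET_QUERY = "//a/@href | //form/@action";

    static List<String> extractTargets(String xhtml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(TARGET_QUERY, document, XPathConstants.NODESET);
            List<String> targets = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                targets.add(nodes.item(i).getNodeValue());
            }
            return targets;
        } catch (Exception e) {
            // Real HTML is often not well-formed XML; this sketch simply gives up then.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<a href=\"index.php\">Home</a>"
                + "<form action=\"checkLogin.php\"><input type=\"text\" name=\"username\"/></form>"
                + "<a href=\"about.php\">About</a>"
                + "</body></html>";
        System.out.println(extractTargets(page));
    }
}
```

A spider would additionally resolve the extracted values against the URL of the current page and skip targets it has already visited.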


 1 public void submittingForm() throws Exception {
 2     // Creates a new browser object using a proxy server
 3     // and simulating Mozilla Firefox version 3
 4     final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3,
 5             "http://myproxyserver", 8080);
 6
 7     // Set proxy username and password
 8     final DefaultCredentialsProvider credentialsProvider = (DefaultCredentialsProvider)
 9             webClient.getCredentialsProvider();
10     credentialsProvider.addProxyCredentials("proxyUsername", "myProxyPassword123");
11
12     // Get the first page
13     final HtmlPage page1 = webClient.getPage("http://www.example.com/login.php");
14
15     // Get the form that we are dealing with and within that form,
16     // find the submit button and the field that we want to change.
17     final HtmlForm form = page1.getFormByName("loginForm");
18
19     final HtmlSubmitInput button = form.getInputByName("loginButton");
20     final HtmlTextInput textFieldUsername = form.getInputByName("username");
21     final HtmlPasswordInput textFieldPassword = form.getInputByName("password");
22
23     // Change the value of the text fields
24     textFieldUsername.setText("john");
25     textFieldPassword.setText("gaephah6MueD");
26
27     // Now submit the form by clicking the button and get back the second page
28     final HtmlPage page2 = button.click();
29 }

Listing 7.1: Short HtmlUnit example for submitting a login form through a proxy server

 1 <html>
 2 <head><title>Login Form</title></head>
 3 <body>
 4 <form action="checkLogin.php" name="loginForm">
 5 <script type="text/javascript">
 6 function writeInputElement(inputType, inputName) {
 7     document.write('<input type="' + inputType + '" name="' + inputName + '" />');
 8 }
 9 writeInputElement("text", "username");
10 writeInputElement("password", "password");
11 writeInputElement("submit", "loginButton");
12 </script>
13 </form>
14 </body>
15 </html>

Listing 7.2: Dynamically created HTML login form with embedded JavaScript code


7.3 HtmlUnit JavaScript Performance

The only drawback of HtmlUnit is its JavaScript execution speed. With JavaScript support enabled,
HtmlUnit is noticeably slower in processing downloaded HTML documents than state of
the art browsers like Mozilla Firefox and its competitors. In order to measure the performance of the
HtmlUnit library a test case from the Mozilla Dromaeo JavaScript Performance Test Suite has been
used6 . The test case has been slightly modified for testing the execution speed of the HtmlUnit API.

The selected test case involves modifying the HTML Document Object Model (DOM) by creating and
appending several hundred new HTML elements to a retrieved page. The following functions are
being used by the test case to measure the execution speed of HTML DOM modifications:

• createElement(tagName)
This method returns an Element object. The tagName parameter is of type String. This method
can raise a DOMException object.

• createTextNode(data)
This method returns a Text object. The data parameter is of type String.

• cloneNode(deep)
This method returns a Node object. The deep parameter is of type Boolean.

• document.body.appendChild(newChild)
This method returns a Node object. The newChild parameter is a Node object. This method
can raise a DOMException object.

• document.body.insertBefore(newChild, refChild)
This method returns a Node object. The newChild parameter is a Node object. The refChild
parameter is a Node object. This method can raise a DOMException object.

• document.body.innerHTML
This property can be used to modify the rendered content of an HTML document after it has been
fully loaded inside a web browser.

Figure 7.1 compares the runtimes in milliseconds of Mozilla Firefox 3.6.3, Chromium 6.0.431.0 and
HtmlUnit 2.7. HtmlUnit is almost 4x slower than Mozilla Firefox 3.6.3 and even 19x slower than
Chromium 6.0.431.0 in doing HTML DOM modifications. The creation of new HTML elements is the
most time consuming task for HtmlUnit. The test case has been run five times for every browser and
the runtimes in figure 7.1 are the arithmetic mean of these runs. See Appendix A for a detailed
breakdown of the measured runtimes and Appendix B for the code used to measure the JavaScript
performance.

6 https://wiki.mozilla.org/Dromaeo [11.06.10]



Figure 7.1: JavaScript execution time in various browsers for HTML DOM modifications
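The averaging behind figure 7.1 (several runs per browser, arithmetic mean, slowdown relative to another browser) can be illustrated with a small Java helper. This is only a sketch: the class RuntimeStats and its methods are hypothetical and are not the measurement code from Appendix B.

```java
/** Hypothetical helper, not part of WASTF: averages repeated runtime measurements. */
final class RuntimeStats {

    /** Arithmetic mean of the measured runtimes in milliseconds. */
    static double mean(long[] runtimesMs) {
        if (runtimesMs.length == 0) {
            throw new IllegalArgumentException("at least one measurement is required");
        }
        long sum = 0;
        for (long t : runtimesMs) {
            sum += t;
        }
        return (double) sum / runtimesMs.length;
    }

    /** Runs the task the given number of times and returns each wall-clock runtime in ms. */
    static long[] measure(Runnable task, int runs) {
        long[] result = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            result[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return result;
    }

    /** Slowdown factor of a candidate relative to a baseline; 4.0 means "4x slower". */
    static double slowdown(double candidateMeanMs, double baselineMeanMs) {
        return candidateMeanMs / baselineMeanMs;
    }
}
```

A caller would pass the DOM modification test case as the Runnable, call measure(task, 5) and feed the result into mean(...).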

7.4 Design of the Web Spider plugin

The following sections describe the overall design of the web spider plugin which has been devel-
oped as part of this project thesis. The web spider uses the HtmlUnit library described in sections
7.1, 7.2 and 7.3 to use its JavaScript interpreter and the ability to simulate state of the art web browsers.

The main goals throughout the design and implementation of the web spider were to make it:

• Scalable: As seen in section 7.3 the HtmlUnit library is not the fastest when it comes to pro-
cessing JavaScript instructions especially with HTML DOM modifications. In order to increase
the crawling speed without disabling JavaScript support the web spider has to be developed in
a multi threaded fashion.

• Easily extensible: The web spider needs to be easily extensible if more functionality should be
added to the code base without any major hassle.

• JavaScript aware: The main advantage over other web spiders is the JavaScript support of the
HtmlUnit library. The full potential of the HtmlUnit API regarding JavaScript support should be
used.

• Comprehensive: The web spider should be crawling a web application as thoroughly as pos-
sible. Other components like the detection of SQL injection vulnerabilities build up on the
discovered resources by the web spider.

• Configurable: The user should be able to alter the behaviour of the web spider through several
parameters which can be set prior to starting the web spider. The configurable parameters are
described in section 7.4.1.

Appendices C and D contain simplified UML 2 class and sequence diagrams of the developed web
spider component.
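The scalability goal implies that several crawler threads share one queue of discovered pages and must never process the same page twice. The following Java sketch shows one possible way to do this with the standard concurrency classes. The class CrawlFrontier is a hypothetical illustration and not the data structure actually used by WASTF (see Appendix C for the real design).

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch of a thread-safe URL queue; not the WASTF implementation. */
final class CrawlFrontier {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    /** Queues the URL only if no thread has offered it before; returns true if queued. */
    boolean offer(String url) {
        if (seen.add(url)) {   // add() is atomic, so only one thread wins per URL
            queue.add(url);
            return true;
        }
        return false;
    }

    /** Returns the next URL to crawl, or null if the queue is currently empty. */
    String poll() {
        return queue.poll();
    }

    /** Number of distinct URLs discovered so far. */
    int discovered() {
        return seen.size();
    }
}
```

Each worker thread would repeatedly call poll(), retrieve the page and offer() every extracted link, so duplicates are rejected without additional locking.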



7.4.1 Configuration Settings

In order to alter the behaviour of the web spider plugin the user can set various settings through the
configuration menu of the web spider. These settings mostly influence how long the web spider
plugin will be crawling through a web application before terminating the crawling process and passing
the results to other modules like the SQL injection detection routine. Additionally, a user is able to
tell the web spider which resources it should either follow or ignore.

• Setting a maximum crawling level: This setting allows the user to tell the web spider plugin
how “deep” into the web application it should crawl. The first page the web spider retrieves
from the web application is on level 0. Any links embedded in that first page lead to other pages
which are located on level 1 and so on. So by setting a maximum crawling level n, the web
spider will stop crawling any further if it reached level n.

• Setting a follow regular expression: If this setting is being used then the web spider plugin
will only follow links matching the given regular expression and will ignore all other links not
matching the provided expression.

• Setting an ignore regular expression: If this setting is being used then the web spider will crawl
every link it finds but will ignore those that match the provided expression.

• Setting a time limit: The user is able to set a maximum time limit for the spidering process. The
web spider will stop spidering when the time limit has been exceeded. Of course the spidering
process will be terminated before the time limit has been exceeded if either the web spider is not
able to find any more resources on the targeted web application or the disk space is becoming
low.

• Staying on the same domain: Usually the web spider follows any found link embedded in a
received HTML page even if the link leads to a completely different domain than the initial
address of the targeted web application. For example a web application reachable under the
following address https://web1.example.com has an embedded link to a completely differ-
ent domain such as http://www.example2.org. This setting tells the web spider if domains
differing from the initial domain of the web application should be ignored or not.
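How the level, regular expression and domain settings could interact when a single link is evaluated is sketched below in Java. The class SpiderSettings and the method shouldFollow are hypothetical names. The sketch assumes that "same domain" means an identical host name and that links beyond level n are dropped. The time limit is left out because it applies to the whole crawl rather than to a single link.

```java
import java.net.URI;
import java.util.regex.Pattern;

/** Hypothetical sketch of the link filter implied by the settings above. */
final class SpiderSettings {
    private final int maxLevel;            // maximum crawling level n
    private final Pattern follow;          // null = follow everything
    private final Pattern ignore;          // null = ignore nothing
    private final boolean stayOnSameDomain;
    private final String startHost;

    SpiderSettings(String startUrl, int maxLevel, String followRegex,
                   String ignoreRegex, boolean stayOnSameDomain) {
        this.startHost = URI.create(startUrl).getHost();
        this.maxLevel = maxLevel;
        this.follow = followRegex == null ? null : Pattern.compile(followRegex);
        this.ignore = ignoreRegex == null ? null : Pattern.compile(ignoreRegex);
        this.stayOnSameDomain = stayOnSameDomain;
    }

    /** Decides whether a link located on the given level should be queued. */
    boolean shouldFollow(String url, int level) {
        if (level > maxLevel) {
            return false;
        }
        if (follow != null && !follow.matcher(url).find()) {
            return false;
        }
        if (ignore != null && ignore.matcher(url).find()) {
            return false;
        }
        if (stayOnSameDomain) {
            // Assumption: the domain check compares host names only.
            String host = URI.create(url).getHost();
            return host != null && host.equalsIgnoreCase(startHost);
        }
        return true;
    }
}
```

URI.create throws an IllegalArgumentException for malformed links, so a real spider would have to catch that case before calling shouldFollow.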

7.4.2 Program Flow

Figure 7.2 shows a UML 2 activity diagram with a bird's-eye view of the implemented program logic
of the multi threaded, JavaScript aware web spider. The steps following the Start Spidering node are
executed in multiple threads, thus speeding up the whole spidering process. A received HTML
page is processed by executing different link extraction routines. These extraction routines
scan the retrieved HTML code for links pointing to previously undiscovered HTML pages of the web
application. The currently implemented extraction routines include the following:

• A HREF link extraction: This routine simply extracts the href attribute from common HTML
<a href="...">...</a> links.



• Frame extraction: This routine scans the retrieved HTML code for HTML <frame
src="..."/> and <iframe src="..."/> tags and extracts the src attribute.

• Mouse event extraction: Scans the retrieved HTML code for onmouseover and onmouseout
attributes and invokes the event with HtmlUnit’s JavaScript support. If the event triggers a page
redirect to a previously undiscovered HTML page, then the page will be added to the web spider
queue for further processing.

• OnClick extraction: This routine scans the retrieved HTML code for onclick attributes and invokes
the event with HtmlUnit’s JavaScript support. If the event triggers a page redirect to a previously
undiscovered HTML page, then the page will be added to the web spider queue for further
processing.

• OnChange extraction: This routine scans the retrieved HTML code for onchange attributes and
invokes the event with HtmlUnit’s JavaScript support. If the event triggers a page redirect to a
previously undiscovered HTML page, then the page will be added to the web spider queue for
further processing.

• HTML comment extraction: Scans the retrieved HTML code for URLs inside HTML comments
with a sophisticated regular expression. The following example shows a common HTML
comment and an embedded URL which will be detected by this routine. The identified URLs
inside HTML comments are added to the web spider queue for further processing.
<!--
This is a common HTML comment...
and this is an embedded URL: https://web1.example.com/index.php
-->

• Submitting HTML forms: If the retrieved HTML page contains one or multiple HTML forms,
this routine tries to fill out the existing input fields, drop down boxes, radio buttons etc. to
successfully submit the form. If the submitted form triggers a page redirect to a previously
undiscovered HTML page, then the page will be added to the web spider queue for further
processing. For more details on how forms are being submitted and how input fields are being
populated see section 7.5.
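The HTML comment extraction routine from the list above can be sketched in Java with two regular expressions. The thesis does not reproduce the expression actually used by WASTF, so the patterns below and the class name CommentUrlExtractor are assumptions chosen for illustration and are deliberately simple.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: finds URLs that appear inside HTML comments. */
final class CommentUrlExtractor {
    // Matches one HTML comment, including comments spanning several lines.
    private static final Pattern COMMENT = Pattern.compile("<!--(.*?)-->", Pattern.DOTALL);
    // Simplified URL pattern (assumption): http(s) scheme up to whitespace, quote or bracket.
    private static final Pattern URL = Pattern.compile("https?://[^\\s\"'<>]+");

    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher comment = COMMENT.matcher(html);
        while (comment.find()) {
            // Only the comment body is searched, so regular links are not reported here.
            Matcher url = URL.matcher(comment.group(1));
            while (url.find()) {
                urls.add(url.group());
            }
        }
        return urls;
    }
}
```

Applied to the comment shown in the list above, extract would report https://web1.example.com/index.php, which the spider could then add to its queue.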

7.5 Smartly filling out HTML forms

Common HTML forms are widely used in today’s web applications for passing information from
the user to the web application. A form can contain input elements like text fields, checkboxes,
radio-buttons, submit buttons and more. A form can also contain select lists, textarea, fieldset,
legend, and label elements. Of course there are other technologies such as Adobe Flash, Java Applets,
JavaFX, Microsoft Silverlight etc. that can be used to create expressive, feature-rich web application
user interfaces. These technologies are not discussed further here. The web spider plugin
developed for this project thesis is able to detect HTML forms embedded in retrieved HTML pages.



Figure 7.2: Simplified UML 2 activity diagram showing the main flow of the web spider

The discovered forms are being analysed and the web spider tries to populate the form elements such
as text fields, radio-buttons, checkboxes etc. with reasonable data to circumvent any restrictions in a
non invasive way.

The main difficulty in filling out HTML forms in an automated manner is to circumvent any
restrictions the business logic of the web application might impose on a particular HTML form. Let's
say a user has to create a new user profile in an E-Business application. The user has to enter his
name, address and phone number in order to use the E-Business application. Most probably the
user will be shown an HTML form with the necessary input fields, such as text fields for entering his
name, address, phone number and e-mail address and drop down lists for entering his birthday
and country. The business logic of the web application probably checks the input fields and helps
the user in filling out the necessary data. For example, the business logic makes sure that the text
field used for entering the user's phone number only contains the digits 0 to 9 before accepting the
data. The main goal is to guess such restrictions based on various indicators like the name of the
HTML element and others.

The routine for smartly filling out HTML forms is implemented as a helper library and can
be used in a non invasive and in an invasive mode. How business logic restrictions are guessed and
what the differences between these two modes are is explained in the following sub
sections 7.5.1 and 7.5.2.



7.5.1 Non Invasive

The non invasive form of this routine is being used by the web spider plugin. Non invasive means
that no hidden HTML elements are being changed and no existing values except for text fields are
being overwritten. This reflects the behaviour of a normal and friendly user using the web applica-
tion. The following indicators are being used to guess the imposed business logic on HTML elements
embedded in a retrieved HTML form:

• Buzzwords: If the name or the id of a HTML element matches a predefined buzzword
such as [number, phone, telephone, mobile, zip, postal, day, month, year,
hour, minute, second] then the input field will be populated with a unique random
numeric value. Other buzzwords such as [mail, email] are used to identify e-mail
address fields. Unique and random e-mail addresses will be created in the following format
[a-zA-Z0-9]@[a-zA-Z0-9].com if the name or id of the HTML element matches the e-mail
buzzword.

• Existing value detection: If the name or the id of a HTML element does not match any prede-
fined buzzword the value of the element is being analysed. It might be possible that the input
field has been pre filled out with an example by the web application developer. Let’s say the
user has to enter his e-mail address and the according HTML element is filled out with an ex-
ample such as email@example.com. The routine detects the @ by using regular expressions
and generates a random unique e-mail address. If the value attribute matches the following
[0-9]* regular expression a unique random numeric value is being generated. If the value
attribute matches the following [a-zA-Z0-9]* regular expression a unique random alphanu-
meric value is being generated. If none of the predefined regular expressions match, a unique
random alphanumeric value is being generated per default.

• Max length detection: After a unique random value has been created by either of the above
methods, the HTML element is checked for length restrictions. HTML input fields can be limited
to a specific size with the maxlength attribute. If such an attribute exists and the generated value
is longer than the maximum allowed size, the random value is shortened so that it does not violate
the maximum length restriction.

Figure 7.3 shows a UML 2 activity diagram of the main program flow of the non invasive mode.
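The three indicators of the non invasive mode (buzzwords, existing value, maxlength) can be combined as in the following Java sketch. It is not the WASTF helper library. The class SmartFill, the substring based buzzword matching and the use of [0-9]+ instead of [0-9]* (so that an empty value is not treated as numeric) are assumptions made for this illustration. The sketch also does not keep track of already generated values, so uniqueness is only probabilistic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Random;

/** Hypothetical sketch of the value guessing used when filling out form fields. */
final class SmartFill {
    enum Kind { NUMERIC, EMAIL, ALPHANUMERIC }

    private static final List<String> NUMERIC_BUZZWORDS = Arrays.asList(
            "number", "phone", "telephone", "mobile", "zip", "postal",
            "day", "month", "year", "hour", "minute", "second");
    private static final List<String> EMAIL_BUZZWORDS = Arrays.asList("mail", "email");
    private static final String DIGITS = "0123456789";
    private static final String ALNUM =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" + DIGITS;

    /** Guesses the kind of value from the name or id first, then from the existing value. */
    static Kind guessKind(String nameOrId, String existingValue) {
        String key = nameOrId == null ? "" : nameOrId.toLowerCase(Locale.ROOT);
        for (String buzzword : EMAIL_BUZZWORDS) {
            if (key.contains(buzzword)) {
                return Kind.EMAIL;
            }
        }
        for (String buzzword : NUMERIC_BUZZWORDS) {
            if (key.contains(buzzword)) {
                return Kind.NUMERIC;
            }
        }
        String value = existingValue == null ? "" : existingValue;
        if (value.contains("@")) {
            return Kind.EMAIL;
        }
        if (value.matches("[0-9]+")) {
            return Kind.NUMERIC;
        }
        return Kind.ALPHANUMERIC;   // default when nothing else matches
    }

    /** Generates a random value; maxLength <= 0 means "no maxlength attribute present". */
    static String generate(Kind kind, int length, int maxLength, Random rnd) {
        String value;
        switch (kind) {
            case NUMERIC:
                value = random(DIGITS, length, rnd);
                break;
            case EMAIL:
                value = random(ALNUM, length, rnd) + "@" + random(ALNUM, length, rnd) + ".com";
                break;
            default:
                value = random(ALNUM, length, rnd);
        }
        // Max length detection: shorten the value if it exceeds the restriction.
        return (maxLength > 0 && value.length() > maxLength)
                ? value.substring(0, maxLength) : value;
    }

    private static String random(String alphabet, int length, Random rnd) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(alphabet.charAt(rnd.nextInt(alphabet.length())));
        }
        return sb.toString();
    }
}
```

Note that simply cutting an e-mail address at maxlength can remove the domain part. A more careful implementation would shorten the local part and the domain separately.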

7.5.2 Invasive

The invasive mode behaves exactly the same as the non invasive form with only a few exceptions. The
main goal of the invasive mode is to modify as many HTML element value attributes as possible in an
automated manner. This mode is mainly used by plugins trying to find input validation vulnerabilities
in a targeted web application. HTML elements such as hidden input fields and values of drop down
lists, checkboxes and radio buttons are being changed.

• Changing the value of hidden HTML elements: Hidden HTML fields are not rendered by the
browser and thus not visible to the user without looking at the code of a retrieved HTML page.



Figure 7.3: Simplified UML 2 activity diagram for the non invasive smart fill HTML form module

Hidden HTML elements are used by web applications to pass along information which should
not be visible and editable to the user. The following snippet shows a hidden HTML element
passing along a username.
<input type="hidden" name="username" value="dave">
The invasive routine detects such hidden HTML elements and analyses the element like in the
non invasive method in section 7.5.1 (buzzwords, value matching and maximum length restric-
tions). The generated unique and random value is either numeric, alphanumeric or a random
e-mail address. In the above case the unique random value would look something like this:
<input type="hidden" name="username" value="ear5ue5xoi5JeecohT1o">

• Changing the value of select options: The <select> tag is used to create a select list (drop-
down list) for the user and the <option> tags inside the select element define the available
options in the list.
<select name="top5" size="3">
<option value="Stones">The Rolling Stones</option>
<option value="Waits">Tom Waits</option>
<option value="Beatles">The Beatles</option>
<option value="Presley">Elvis Presley</option>
<option value="King">B.B. King</option>
</select>
The invasive routine detects such option HTML elements and analyses the element like in the
non invasive method in section 7.5.1 (buzzwords, value matching and maximum length restric-
tions). The generated unique and random value is either numeric, alphanumeric or a random
e-mail address. In the above case the unique random value would look something like this:
<select name="top5" size="3">
<option value="feebe8Oovahbo0hei6iP">The Rolling Stones</option>
<option value="xoh2unae9eeg0EeGooTa">Tom Waits</option>
<option value="rahfedaimaeg4Aegea0r">The Beatles</option>
<option value="eit6eeSheFai4thohKae">Elvis Presley</option>
<option value="iv2bo8jeeteeyohwahPi">B.B. King</option>
</select>

• Changing the value of radio options: Radio buttons are used when the user has to select an
option from a set of alternatives but is only allowed to select one.
<input type="radio" name="Payment" value="Master"> Mastercard
<input type="radio" name="Payment" value="Visa"> Visa
<input type="radio" name="Payment" value="AmExpress"> American Express
The invasive routine detects such radio buttons and analyses the element like in the non in-
vasive method in section 7.5.1 (buzzwords, value matching and maximum length restrictions).
The generated unique and random value is either numeric, alphanumeric or a random e-mail
address. In the above case the unique random value would look something like this:
<input type="radio" name="Payment" value="OoFohQu9luu2shiwahh7"> Mastercard
<input type="radio" name="Payment" value="thoobaeR2Daeph6oovie"> Visa
<input type="radio" name="Payment" value="seef5ShoomeoyohPhai0"> American Express

• Changing the value of checkbox options: Checkbox options are often used in groups to indi-
cate a series of choices any one of which can be on or off.
<input type="checkbox" name="Ingredient" value="salami"> Salami
<input type="checkbox" name="Ingredient" value="mushrooms"> Mushrooms
<input type="checkbox" name="Ingredient" value="anchovies"> Anchovies
The invasive routine detects such checkboxes and analyses the element like in the non
invasive method in section 7.5.1 (buzzwords, value matching and maximum length restrictions).
The generated unique and random value is either numeric, alphanumeric or a random e-mail
address. In the above case the unique random value would look something like this:
<input type="checkbox" name="Ingredient" value="failie9eeJaexaeJ3cho"> Salami
<input type="checkbox" name="Ingredient" value="geitohsh7oophai6Aek4"> Mushrooms
<input type="checkbox" name="Ingredient" value="IeteeRoh1Eishieh6oow"> Anchovies

7.5.3 Limitations

Using buzzwords and regular expressions to determine the kind of random value which might be
accepted by the business logic of a web application is very primitive. The routine assumes that
the developer of the targeted web application named his HTML elements according to their
purpose. If the names or ids of HTML elements are arbitrary, such as <input type="text"
name="input15" value="">, then the routine will generate unique random alphanumeric
values for all elements, even if one of the fields only accepts numeric values. The consequence is
that some forms cannot be populated with unique and random data which satisfies the imposed
business logic and thus the form will not be accepted by the web application. This in turn decreases
the accuracy of the web spider plugin because additional HTML pages that would have been shown to
a user only after correctly submitting the form will be left undiscovered. This also applies to plugins
trying to discover input validation vulnerabilities.



8 Detecting Injection Flaws

The following chapter outlines the final implementation details of the previously elaborated
combined white and black box plugin based on parsing database query log files. In particular, it
describes how the plugin uses SQL syntax validation libraries to find actually working exploits for
discovered SQL injection vulnerabilities.

8.1 SQL Code Injection Mutator

After a SQL statement containing a previously sent unique and random value has been identified in
the database query log, the statement is processed further. See section 4.2 on
how these SQL statements are detected and collected. Based on the original statement, multiple
mutants are generated which might lead to a working SQL Code Injection exploit for that
particular vulnerability. The following sections describe the procedure developed in the plugin which
tries to find working SQL Code Injection exploits for a given SQL statement in an automated manner.
This is a very important step because finding a SQL statement containing a randomly generated value
does not by itself pose a security threat to the targeted web application. The routine described below
tries to modify the randomly generated value in such a way as to prove the existence of a SQL Code
Injection vulnerability.

8.1.1 Validating SQL Statements

The first step is to validate the captured original SQL statement containing the previously sent
unique and random value. This is an example of such a captured SQL statement containing a sent
random value: SELECT * FROM products WHERE description LIKE ’%OPL89FGHC%’;. The
statement is validated before tampering to make sure it is a valid statement in the first place. This
helps in reducing false positives and increases the execution speed of the overall routine by skipping
invalid statements. Finding working SQL syntax validation libraries for the Java programming language
sounds easier than it actually is. Nevertheless, there exist two promising projects whose
goal is to provide a stand alone SQL syntax validation library. Sadly, neither library is
able to validate every given SQL statement correctly; both report false negatives, that is, SQL
statements which are syntactically correct are reported as being invalid. Because of this, WASTF
uses two SQL validation libraries and flags a SQL statement as invalid only if both libraries report
it as invalid. The libraries used are JsqlParser and the validation routine from the Apache
Derby Database Management System.
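The decision rule described above (a statement is flagged as invalid only if both libraries reject it) can be written down in a few lines of Java. In this sketch the two validators are modelled as generic predicates, so no assumption about the JsqlParser or Apache Derby APIs is made. The class name CombinedSqlValidator is hypothetical.

```java
import java.util.function.Predicate;

/** Hypothetical sketch: combines two imperfect SQL syntax validators. */
final class CombinedSqlValidator {
    private final Predicate<String> first;
    private final Predicate<String> second;

    /** Each predicate returns true if its library accepts the statement as valid. */
    CombinedSqlValidator(Predicate<String> first, Predicate<String> second) {
        this.first = first;
        this.second = second;
    }

    /** Valid as soon as one validator accepts it; invalid only if both reject it. */
    boolean isValid(String sql) {
        return first.test(sql) || second.test(sql);
    }
}
```

Because both libraries are known to produce false negatives, combining them with a logical OR reduces the chance that a syntactically correct statement is discarded.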



JsqlParser

JSqlParser (see footnote 1) parses an SQL statement and translates it into a hierarchy of Java classes.
The generated hierarchy can be navigated using the Visitor Pattern. Listing 8.1 shows how a SQL
statement can be validated using JsqlParser.

Apache Derby

Apache Derby (see footnote 2), an Apache DB sub project, is an open source relational database
implemented entirely in Java and available under the Apache License, Version 2.0. Listing 8.2 shows
how a SQL statement can be validated using the Apache Derby database management system.
Unfortunately the standard Apache Derby distribution does not provide this functionality. A debug
version of Apache Derby and some additional Java classes from Derby’s Issue Tracker have to be
added. See the following item in the Apache Derby Issue Management System:
https://issues.apache.org/jira/browse/DERBY-3946.

8.1.2 Mutation Routine

The following sections explain how WASTF mutates collected SQL statements to find working SQL
Injection exploits step by step. Figure 8.5 tries to visualise the outlined steps in the following sections.
The routine basically builds the Cartesian product from several sets such as PAYLOADS, VALUE and
COMMENTS.

Step 1 - Validate original SQL query: As mentioned in section 8.1.1 the first step is to validate the
captured SQL statement with both SQL validation libraries. Additionally the collected SQL statement
is checked again to make sure it contains the previously sent unique and random value from the
HTTP(S) request. If the value is missing from the collected SQL statement, then there is no way
WASTF would be able to tamper with the SQL statement. The sample SQL statement used throughout
this section is the following: SELECT * FROM products WHERE description LIKE ’%OPL89FGHC%’;.

Step 2 - Remove fragments which contain non allowed characters: As seen in section 4.2.1 or
section 4.2.2 WASTF first probes the web application and tries to determine which special characters
such as >, < and others are filtered by the web application under test. This list of non allowed
characters is now used to reduce the size of the so called Fragment List. Fragments containing
characters which are filtered by the targeted web application cannot be used to successfully
inject working SQL code and can be safely removed. Figure 8.1 shows the contents of the Fragment
List, where PAYLOAD, VALUE and COMMENT are tokens which are later replaced with
meaningful values.

1 http://jsqlparser.sourceforge.net/ [12.07.10]
2 http://db.apache.org/derby/ [12.07.10]



PAYLOAD                     VALUE)) PAYLOAD
PAYLOAD COMMENT             VALUE) PAYLOAD COMMENT
VALUE PAYLOAD COMMENT       VALUE)) PAYLOAD COMMENT
VALUE PAYLOAD               ’; PAYLOAD
’ PAYLOAD                   ); PAYLOAD
) PAYLOAD                   ’); PAYLOAD
’) PAYLOAD                  ’)); PAYLOAD
’)) PAYLOAD                 ’; PAYLOAD COMMENT
’ PAYLOAD COMMENT           ); PAYLOAD COMMENT
) PAYLOAD COMMENT           ’); PAYLOAD COMMENT
’) PAYLOAD COMMENT          ’)); PAYLOAD COMMENT
’)) PAYLOAD COMMENT         VALUE; PAYLOAD COMMENT
VALUE) PAYLOAD              ’’’; PAYLOAD COMMENT

Figure 8.1: Fragment List used for mutating SQL Statements
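The reduction of the Fragment List in step 2 amounts to dropping every entry that contains at least one filtered character. A minimal Java sketch follows. The class FragmentFilter is a hypothetical name, and the set of blocked characters is assumed to come from the probing described in section 4.2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of step 2: drop list entries containing filtered characters. */
final class FragmentFilter {

    /** Returns only those entries that contain none of the blocked characters. */
    static List<String> removeFiltered(List<String> entries, Set<Character> blocked) {
        List<String> usable = new ArrayList<>();
        for (String entry : entries) {
            boolean allowed = true;
            for (char c : entry.toCharArray()) {
                if (blocked.contains(c)) {
                    allowed = false;
                    break;
                }
            }
            if (allowed) {
                usable.add(entry);
            }
        }
        return usable;
    }
}
```

The same method can be applied to the Payload List in step 4, since that list is reduced by the same criterion.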

Step 3 - Replace random token with fragments: The random value embedded in the collected SQL
statement (OPL89FGHC) marks the spot where code can be injected into the SQL query by sending
specially crafted HTTP(S) requests. The third step is to replace the random value with each of the
remaining entries of the Fragment List. See figure 8.2 for some examples of what the current working
set of mutated SQL statements looks like.

SELECT * FROM products WHERE description LIKE ’%PAYLOAD %’;
SELECT * FROM products WHERE description LIKE ’%PAYLOAD COMMENT %’;
SELECT * FROM products WHERE description LIKE ’%VALUE PAYLOAD COMMENT %’;
SELECT * FROM products WHERE description LIKE ’%VALUE PAYLOAD %’;
SELECT * FROM products WHERE description LIKE ’%’ PAYLOAD %’;
SELECT * FROM products WHERE description LIKE ’%) PAYLOAD %’;
SELECT * FROM products WHERE description LIKE ’%’)) PAYLOAD %’;
SELECT * FROM products WHERE description LIKE ’%’ PAYLOAD COMMENT %’;
...

Figure 8.2: Random value in the original SQL statement replaced with the Fragment List

Step 4 - Replace PAYLOAD token: The Payload List is similar to the previously mentioned Fragment
List and contains SQL payloads such as OR 1=1, ’ AND 1=1 etc. Again, like before, the Payload List
is being reduced by removing payloads which contain a non allowed character. After the list has been
reduced, the Cartesian product is being built by combining the SQL statements shown in figure 8.2
with the rest of the available payloads in the Payload List. The result of this operation is shown in
figure 8.3.

SELECT * FROM products WHERE description LIKE ’%OR 1=1 %’;
SELECT * FROM products WHERE description LIKE ’%’ AND 1=1 %’;
SELECT * FROM products WHERE description LIKE ’%OR 1=1 COMMENT %’;
SELECT * FROM products WHERE description LIKE ’%’ AND 1=1 COMMENT %’;
SELECT * FROM products WHERE description LIKE ’%VALUE OR 1=1 COMMENT %’;
SELECT * FROM products WHERE description LIKE ’%VALUE ’ AND 1=1 COMMENT %’;
...

Figure 8.3: PAYLOAD token replaced with values from the Payload List

Step 5 - Replace VALUE token: This step is basically no different from step 4. The main difference is
that the VALUE token is replaced with entries from the Value List. The previously mutated SQL
statements as seen in figure 8.3 are used and the Cartesian product is built with the entries
of the Value List. The current implementation of the Value List contains the following entries: 1,
x, <empty string>.

Step 6 - Replace COMMENT token: Again the mutated SQL statements which resulted from the last
replacement in step 5 are used for further processing. This time the COMMENT token is
replaced with values from the Comment List. The Cartesian product is built with the mutated
SQL statements and the following entry of the Comment List: --. This is a comment indicator
for the MySQL Database Management System.

Step 7 - Remove enclosed payloads: Step 7 reduces the number of finished mutated SQL statements
by removing those whose payloads are enclosed in brackets.
Payloads which are enclosed in brackets might only be useful for stored XSS attacks but do not
allow WASTF to send arbitrary SQL commands to the database of the targeted web application.
The following mutated SQL statement would be removed by this step: SELECT * FROM products
WHERE description LIKE ’%OR 1=1%’;.

Step 8 - Validate mutated SQL statements: The final step is to validate all the mutated SQL
statements. The mutated SQL statements have to conform to the SQL syntax. If they do not, the
Database Management System of the targeted web application throws an error while trying to process
these statements. The validation is again done by using the two SQL syntax validation libraries
mentioned in section 8.1.1.
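The token replacement in steps 3 to 6 is a repeated Cartesian product, which the following Java sketch illustrates. It is not the WASTF code: the class SqlMutator is hypothetical, and for simplicity the tokens are expanded inside the fragments first and the result is inserted into the statement last, so that the original SQL text is never scanned for token names. The sketch assumes that payloads, values and comments do not themselves contain the strings PAYLOAD, VALUE or COMMENT. Steps 2, 7 and 8 (filtering and validation) are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the token replacement in steps 3 to 6. */
final class SqlMutator {

    /** Replaces the token in every template with every replacement (Cartesian product). */
    static List<String> expand(List<String> templates, String token, List<String> replacements) {
        List<String> result = new ArrayList<>();
        for (String template : templates) {
            for (String replacement : replacements) {
                // Templates without the token are copied unchanged once per replacement,
                // which mirrors the plain multiplication used in the example counts.
                result.add(template.replace(token, replacement));
            }
        }
        return result;
    }

    static List<String> mutate(String original, String randomValue, List<String> fragments,
                               List<String> payloads, List<String> values, List<String> comments) {
        List<String> injections = expand(fragments, "PAYLOAD", payloads);   // step 4
        injections = expand(injections, "VALUE", values);                   // step 5
        injections = expand(injections, "COMMENT", comments);               // step 6
        // Step 3: put every finished injection string where the random value was found.
        return expand(Collections.singletonList(original), randomValue, injections);
    }
}
```

With f fragments, p payloads, v values and c comments the method returns f · p · v · c statements, which is the multiplication used in the example that follows the steps.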

Example: Let’s say we start with our original captured SQL statement SELECT * FROM products
WHERE description LIKE ’%OPL89FGHC%’; from before. All characters are allowed and none
are filtered by the targeted web application. We start mutating the SQL statement and after step 3
we have a total of 26 mutated statements (because the Fragment List consists of 26 values). Now the
PAYLOAD tokens are replaced in step 4. After this a total of 234 (26 · 9 = 234) SQL statements are
ready for further processing because the Payload List consists of 9 values. After replacing the VALUE
token a total of 702 (234 · 3 = 702) SQL statements are ready for further processing. After replacing the
COMMENT token we have 702 (702 · 1 = 702) finished mutated SQL statements. After removing all the
payloads which are enclosed in brackets there are a total of 354 mutated SQL statements available.
The next step is to check whether the statements conform to the SQL syntax. After validating the
statements there are a total of 15 mutated SQL statements left. These 15 statements are the proof
that a SQL Code Injection vulnerability exists and can be used to send arbitrary SQL statements to
the Database Management System of the targeted web application. Figure 8.4 shows a list
of the mutated SQL statements which seem to be working exploits for the previously identified SQL
Injection vulnerability in the targeted web application.



[SELECT * FROM products WHERE description LIKE ’%1’ AND 1=(SELECT COUNT(*) FROM tablename); --%’;]
[1’ AND 1=(SELECT COUNT(*) FROM tablename); --]

[SELECT * FROM products WHERE description LIKE ’%x’ UNION SELECT a FROM b --%’;]
[x’ UNION SELECT a FROM b --]

[SELECT * FROM products WHERE description LIKE ’% 1’ AND 1=(SELECT COUNT(*) FROM tablename); --%’;]
[ 1’ AND 1=(SELECT COUNT(*) FROM tablename); --]

[SELECT * FROM products WHERE description LIKE ’%1 1’ AND 1=(SELECT COUNT(*) FROM tablename); --%’;]
[1 1’ AND 1=(SELECT COUNT(*) FROM tablename); --]

[SELECT * FROM products WHERE description LIKE ’%x’ OR 1=1 --%’;]
[x’ OR 1=1 --]

[SELECT * FROM products WHERE description LIKE ’%’ OR 1=1 -- %’;]
[’ OR 1=1 -- ]

[SELECT * FROM products WHERE description LIKE ’%) 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- %’;]
[) 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]

[SELECT * FROM products WHERE description LIKE ’%’; SELECT a FROM b -- %’;]
[’; SELECT a FROM b -- ]

...

Figure 8.4: Working SQL Code Injection exploits after mutating a captured SQL statement

8.1.3 Limitations of SQL Validation Libraries

Even though there exist standards for the SQL query language such as SQL-92 (footnote 3), SQL:2003
(footnote 4) or SQL:2008 (footnote 5), every Database Management System has its own specialities
and dialects. This makes it especially hard for SQL syntax validation libraries because there is no way
they can validate all the special functions every database manufacturer adds to his product. This
circumstance leads to false negatives being reported by the validation libraries while validating
arbitrary SQL statements. One sophisticated solution to this problem would be to use the validation
routine of the particular database product for which the SQL statement was originally written. In
reality this is sadly not feasible for a number of reasons:

1. Installation: All these different database products would have to be physically installed on a
machine in order to access their SQL statement validation routine.

2. Cohesion: Most validation routines implemented in database products are highly coupled
with the underlying data structure. This means validating an arbitrary SQL statement such
as SELECT * FROM products WHERE id=12 only works if the products table exists and its
schema defines a numeric field with the name id. One would have to replicate database tables
based on the information given in collected SQL statements.

Writing a new SQL validation library which tries to recognise most of the special features added by
database manufacturers would be the best approach. Writing such a library would involve
parser generator tools such as ANTLR⁶ or JavaCC⁷. There is also an online SQL syntax validation
web service called Mimer⁸ which validates arbitrary SQL statements against the SQL standards.

3 http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt [13.07.10]
4 http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=34134 [13.07.10]
5 http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=38641 [13.07.10]
6 http://www.antlr.org/ [13.07.10]
7 https://javacc.dev.java.net/ [13.07.10]

The JsqlParser library tries to recognise some of the features added by database manufacturers
which are not described in the official SQL standards, such as the MD5('foobar') function in MySQL,
which returns the MD5 hash of the given string. Its only drawback is its performance: compared
to the validation routine provided by the Apache Derby database management system, validating a
statement with JsqlParser is 1.5 times slower⁹. Listing 8.1 shows the validation with JsqlParser,
listing 8.2 the validation with Apache Derby.

import java.io.StringReader;

import net.sf.jsqlparser.JSQLParserException;
import net.sf.jsqlparser.parser.CCJSqlParserManager;

public class SQLValidator {
    // Sample statement to validate (static so the static method can use it)
    private static final String sqlStatement = "SELECT * FROM products WHERE "
            + "description LIKE '%OPL89FGHC%';";

    // Returns true if JsqlParser is able to parse the statement
    public static boolean validateStatement() {
        CCJSqlParserManager parserManager = new CCJSqlParserManager();
        try {
            parserManager.parse(new StringReader(sqlStatement));
            return true;
        } catch (JSQLParserException e) {
            e.printStackTrace();
        }
        return false;
    }
}

Listing 8.1: Validating a SQL statement with JsqlParser

8 http://developer.mimer.se/validator/ [13.07.10]
9 Measured by using the Eclipse Test & Performance Tools Platform Project (TPTP) and the SqlValidatorTest.java
JUnit test case (see the enclosed CD-ROM for the source code).



import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

import org.apache.derby.iapi.services.context.ContextManager;
import org.apache.derby.iapi.sql.conn.LanguageConnectionContext;
import org.apache.derby.impl.jdbc.EmbedConnection;
import org.apache.derby.impl.sql.compile.QueryTreeNode;

public final class SQLValidator {
    private static Connection conn = null;
    private final String sqlStatement = "SELECT * FROM products WHERE "
            + "description LIKE '%OPL89FGHC%';";
    public final static String DRIVER_NAME = "org.apache.derby.jdbc.EmbeddedDriver";
    public final static String CONNECTION_URL = "jdbc:derby:memory:dummy;create=true";
    public final static String DERBY_DEBUG_SETTING = "derby.debug.true";
    public final static String STOP_AFTER_PARSING = "StopAfterParsing";
    public final static String SHUTDOWN_URL = "jdbc:derby:;shutdown=true";
    public final static String LANG_CONNECTION = "LanguageConnectionContext";
    public final static String STOPPED_AFTER_PARSING = "42Z55";
    public final static String DERBY_LOG = "derby.stream.error.method";

    // Loads the embedded Derby driver and opens an in-memory database connection
    private void initialiseConnection() {
        try {
            System.setProperty(DERBY_LOG, DERBY_LOG_CLASS);
            // Debug setting which makes Derby stop after the parsing phase
            System.setProperty(DERBY_DEBUG_SETTING, STOP_AFTER_PARSING);
            Class.forName(DRIVER_NAME).newInstance();
        } catch (ClassNotFoundException e) {
            LOGGER.error(e.getMessage(), e);
            return;
        } catch (InstantiationException e) {
            LOGGER.error(e.getMessage(), e);
            return;
        } catch (IllegalAccessException e) {
            LOGGER.error(e.getMessage(), e);
            return;
        }

        try {
            conn = DriverManager.getConnection(CONNECTION_URL);
        } catch (SQLException e) {
            LOGGER.error(e.getMessage(), e);
            return;
        }
    }

    public boolean validateStatement() {
        initialiseConnection();

        if (conn == null) {
            throw new IllegalStateException("Derby database connection not available!");
        }

        try {
            conn.prepareStatement(sqlStatement);
        } catch (SQLException e) {
            // SQL state 42Z55 only signals that Derby stopped after parsing;
            // every other SQL state means the statement could not be parsed
            if (!STOPPED_AFTER_PARSING.equals(e.getSQLState())) {
                return false;
            }
        }

        ContextManager contextManager = ((EmbedConnection) conn).getContextManager();
        LanguageConnectionContext lcc = (LanguageConnectionContext)
                contextManager.getContext(LANG_CONNECTION);
        QueryTreeNode queryTree = (QueryTreeNode) lcc.getLastQueryTree();

        // Return whether the SQL is a valid statement or not
        return queryTree != null;
    }

    // ... (remaining members are elided in this excerpt; LOGGER and
    // DERBY_LOG_CLASS are not shown here)
}

Listing 8.2: Validating a SQL statement with Apache Derby



Figure 8.5: Simplified UML 2 activity diagram showing the SQL mutation routine

9 Test Series

This chapter is concerned with testing the WASTF application thoroughly, especially in
comparison with standard black box web application security testing tools such as w3af. The goal
is to provide a detailed comparison between WASTF and w3af with regard to the number of HTTP(S)
requests sent, the number of vulnerabilities detected, the number of web pages discovered etc.

9.1 Test Environment

The next sections describe the basic system which has been set up to conduct the comparison
between WASTF and its black box security testing application equivalent w3af.

9.1.1 Base System

In the enclosed CD-ROM’s root directory there exists a folder called “Test-Environment”. This folder
contains a virtual machine with the test environment set up. The virtual machine has been created
using the free virtualisation project VirtualBox (version 3.2.6.r63112) and converted to the Open
Virtualisation Format (.ovf files).

The base system consists of an Ubuntu 10.04 LTS “Lucid Lynx” Server edition operating system with
the following services enabled and software installed:

• Apache Webserver: Server version: Apache/2.2.14 (Ubuntu) / Server built: Apr 13 2010 19:28:27

• Apache Tomcat: Server version: Apache Tomcat/6.0.24

• MySQL Server: mysqld Ver 5.1.41-3ubuntu12.3 for debian-linux-gnu on i486 ((Ubuntu))

• MySQL Client: mysql Ver 14.14 Distrib 5.1.41, for debian-linux-gnu (i486) using readline 6.1

• OpenSSH: OpenSSH_5.3p1 Debian-3ubuntu4, OpenSSL 0.9.8k 25 Mar 2009

9.1.2 Network

The test environment has been configured to use a “Host-only Adapter” for networking. The
VirtualBox help pages describe this mode as follows: host-only networking can be thought of as a
hybrid between the bridged and internal networking modes. As with bridged networking, the virtual
machines can talk to each other and to the host as if they were connected through a physical
Ethernet switch. As with internal networking, however, a physical networking interface need not
be present, and the virtual machines cannot talk to the world outside the host since they are not
connected to a physical networking interface. Instead, when host-only networking is used,
VirtualBox creates a new software interface on the host which then appears next to the existing
network interfaces. In other words, whereas with bridged networking an existing physical interface
is used to attach virtual machines to, with host-only networking a new “loopback” interface is
created on the host. And whereas with internal networking the traffic between the virtual machines
cannot be seen, the traffic on the “loopback” interface on the host can be intercepted.

The system’s network card has been configured in the following way:

iface eth0 inet static
    address 192.168.56.101
    netmask 255.255.255.0
    network 192.168.56.0
    broadcast 192.168.56.255
    gateway 192.168.56.1

9.1.3 Usernames and Passwords

The test environment can be accessed with the following users:

• Standard User: Username: wastf / Password: wastf

• MySQL Administrator: Username: root / Password: mysql

• DVWA Administrator: Username: admin / Password: password (See section 9.2.4)

• Wordpress Administrator: Username: admin / Password: password (See section 9.2.5)

• Magento Administrator: Username: admin / Password: password2010 (See section 9.2.6)

• Tomcat Administrator: Username: admin / Password: password

Note: The virtual machine is not intended for use in a production environment or for storing sensitive
data. The passwords used are very weak and were chosen for easy and quick access to the test environment.

9.2 Web Applications

The following sections describe the web applications installed on the virtual machine which are
used for testing and comparing WASTF with another web application security scanner. The URLs
given for accessing the web applications are meant to be opened after the test environment, i.e.
the virtual machine (see the contents of the enclosed CD-ROM), has been started.

9.2.1 Index & Overview

A static welcome page can be accessed through the following URL: http://192.168.56.101/
The welcome page contains a list of all the web applications installed on the test environment
server. Figure 9.1 shows the WASTF test environment welcome page and two pre-installed
web applications.



Figure 9.1: Various pre-installed applications in the WASTF test environment virtual machine:
(a) WASTF Test Env. Home Screen, (b) Damn Vulnerable Web Application, (c) Magento eCommerce Application

9.2.2 phpMyAdmin Version 3.3.2deb1

phpMyAdmin is a free software tool written in PHP intended to handle the administration of MySQL
over the World Wide Web. phpMyAdmin supports a wide range of operations with MySQL. The most
frequently used operations are supported by the user interface (managing databases, tables, fields,
relations, indexes, users, permissions, etc.), while the user still has the ability to directly
execute any SQL statement¹.

phpMyAdmin can be accessed through the following URL (using the MySQL Administrator
credentials): http://192.168.56.101/phpmyadmin/

Note: The MySQL server can also be accessed from a remote machine with the mysql-client package
installed:

# mysql -u root -pmysql -h 192.168.56.101

Note: The omitted space between the -p parameter and the provided password is intentional.

9.2.3 WIVET Version 3

WIVET² is a benchmarking framework that aims to statistically analyse web link extractors. In
general, web application vulnerability scanners fall into this category. These vulnerability scanners,
given a URL, try to extract as many input vectors as they possibly can to increase the coverage of the
attack surface. WIVET provides a good number of input vectors to any extractor and presents the results.
For an input extractor to run meaningfully, it has to provide some kind of session handling,
which nearly all decent crawlers do. The WIVET project is released under the GNU General
Public License Version 2. Also see section 5.2.4.

WIVET can be accessed through the following URL: http://192.168.56.101/wivet

1 http://www.phpmyadmin.net/home_page/index.php [11.07.10]
2 http://code.google.com/p/wivet/ [29.03.10]



9.2.4 Damn Vulnerable Web Application (DVWA) Version 1.0.6

Damn Vulnerable Web App (DVWA)³ is a PHP/MySQL web application that is damn vulnerable.
Its main goals are to be an aid for security professionals to test their skills and tools in a
legal environment, to help web developers better understand the processes of securing web
applications and to aid teachers/students to teach/learn web application security in a classroom
environment.

DVWA can be accessed through the following URL: http://192.168.56.101/dvwa

9.2.5 Wordpress Version 3.0

WordPress⁴ is an open source CMS, often used as a blog publishing application, powered by PHP and
MySQL. It has many features including a plugin architecture and a templating system. Used by over
300 of the 10,000 biggest websites, WordPress is the most popular blog software in use today⁵. It was
first released in May 2003 by Matt Mullenweg as a fork of b2/cafelog. As of September 2009, it was
being used by 202 million websites worldwide⁶.

Wordpress can be accessed through the following URL: http://192.168.56.101/wordpress

9.2.6 Magento Version 1.4.1.0

Magento⁷ is a feature-rich eCommerce platform built on open-source technology that provides
online merchants with unprecedented flexibility and control over the look, content and functionality
of their eCommerce store. Magento’s intuitive administration interface features powerful marketing,
search engine optimization and catalog-management tools to give merchants the power to create
sites that are tailored to their unique business needs. Designed to be completely scalable and backed
by Varien’s support network, Magento offers companies the ultimate eCommerce solution.

Magento can be accessed through the following URL: http://192.168.56.101/magento

9.2.7 Flowershop

Flowershop is a vulnerable web application mimicking an eCommerce site. Flowershop is included
on the CD-ROM accompanying the book How to Break Web Software: Functional and
Security Testing of Web Applications and Web Services (see [1]). The Flowershop application contains
vulnerabilities such as Command Injection, Cross Site Scripting, SQL Injection etc.

Flowershop can be accessed through the following URL: http://192.168.56.101/flowershop

3 http://www.dvwa.co.uk/ [11.07.10]
4 http://wordpress.org [16.07.10]
5 http://trends.builtwith.com/blog/WordPress [11.07.10]
6 http://andrewapeterson.com/2009/09/wordpress-usage-202-million-worldwide-62-8-million-us/ [11.07.10]
7 http://www.magentocommerce.com/ [16.07.10]



9.3 Test Series

The following sections describe the set-up used to compare WASTF and its black box security testing
application equivalent w3af. The specific test cases are explained and the results are
analysed and visually presented.

9.3.1 Test Set-Up

Figure 9.2 shows the set-up used to conduct the comparison between WASTF and its black box
security testing application equivalent w3af. The following software versions have been used:

• w3af 1.0-rc3: The latest release candidate for version 1 (see chapter 5).

• WASTF 1.0: The current development release.

• WebScarab 20100414-0036: A ZIP containing an up to date build of the master branch of the
WebScarab git tree. WebScarab is a framework for analysing applications that communicate
using the HTTP and HTTPS protocols. It is written in Java, and is thus portable to many
platforms. WebScarab has several modes of operation, implemented by a number of plugins. In
its most common usage, WebScarab operates as an intercepting proxy, allowing the operator
to review and modify requests created by the browser before they are sent to the server, and to
review and modify responses returned from the server before they are received by the browser.
WebScarab is able to intercept both HTTP and HTTPS communication. The operator can also
review the conversations (requests and responses) that have passed through WebScarab.

These specific versions of the tools can be found in the CD-ROM’s root directory under
Test-Environment > Tools. As shown in figure 9.2, all the HTTP requests made by the web application
security scanners are recorded by the WebScarab proxy for detailed analysis. This ensures that
not a single HTTP request is transmitted unnoticed and unrecorded. The files generated by WASTF
and w3af for every test case are stored on the enclosed CD-ROM under Test-Environment > Test
Series. The file structure of every test case can be seen in figure 9.3. The recordings of WebScarab can
be loaded by unzipping the webscarab.zip located in every test case folder into a new folder and
then clicking on File > Open inside WebScarab’s graphical user interface.

The conducted tests focus on the following attributes to make a sound comparison between WASTF
and w3af:

• Speed/Runtime: How much execution time the web application security scanner needs to
complete the test case. This metric strongly correlates with the accuracy metric: a web
application security scanner with a low accuracy is usually faster than a counterpart
with a higher accuracy.

• Accuracy: This metric depends on the conducted test case. The accuracy metric states
how many pages have been detected by the web spider plugin or how many vulnerabilities
have been found during the scan.

• Number of sent HTTP(S) requests: This metric compares the number of HTTP(S) requests
a web application security scanner sent to achieve its measured accuracy.

Figure 9.2: Test Set-Up

CD-ROM
    ...
    Test Environment
        <Test Case 1>
            w3af                      w3af test case results
                output.txt            w3af’s console output
                output-http.txt       w3af also records HTTP messages
                webscarab.zip         WebScarab’s recorded HTTP messages
                *.pw3af               w3af configuration file for replaying the test case
            WASTF                     WASTF test case results
                console_output.log    WASTF’s console output
                wastf.log             Debug log messages
                webscarab.zip         WebScarab’s recorded HTTP messages
                *.wastf               WASTF configuration file for replaying the test case
        <Test Case 2>
        ...

Figure 9.3: Test case file structure on the enclosed CD-ROM

9.3.2 Test Case: WIVET [Web Spider only]

The WIVET framework for benchmarking web link extractors has been introduced in section
5.2.4. The following test case compares the web spider components of WASTF and w3af with
regard to accuracy, speed and sent HTTP requests.

Table 9.1 shows a summary of the results achieved by w3af and WASTF. Because of the JavaScript
support provided by the web spider component, WASTF is able to collect considerably more pages
from the WIVET benchmark framework than w3af. With an accuracy of 81%, WASTF is nearly as good
as the web spider component included in IBM’s commercially available AppScan product (see section
5.2.4). This test case is shown in the tutorial video on the enclosed CD-ROM (see Videos >
WASTF-WebSpider.ogv). Note: The results of the commercially available products have not
been verified during this project thesis, and the web spider components of the products listed in
section 5.2.4 might have been enhanced to achieve a greater score.

                    WASTF                         w3af
Runtime [s]         52                            46
Accuracy            81%                           16%
# HTTP Requests     575 (POST: 6 / GET: 569)      231 (POST: 5 / GET: 226)

Table 9.1: Results of the WIVET [Web Spider] test case

9.3.3 Test Case: Magento [Web Spider only]

This test case features a “large” eCommerce web application (large in terms of the number of
available HTML pages and HTML links) based on PHP and MySQL. WASTF completed the task in 23 minutes
and found 824 different HTML pages. w3af, on the other hand, was not able to
collect any HTML page: it crashed after 233 GET requests without reporting a single HTML page.
Table 9.2 shows the information recorded by WebScarab. The error message received
can be seen in figure 9.4. The “bug” has been submitted to the w3af developer community and
its status can be tracked under the following URL:
http://sourceforge.net/tracker/?func=detail&aid=3030470&group_id=170274&atid=853652

                    WASTF                         w3af
Runtime [s]         1’380                         132
Accuracy            1421 pages                    0 pages
# HTTP Requests     2576 (POST: 6 / GET: 2570)    233 (POST: 0 / GET: 233)

Table 9.2: Results of the Magento [Web Spider] test case



Submit this bug here: http://sourceforge.net/tracker/?func=detail&aid=3030470&group_id=170274&atid=853652

Python version:
2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3]

GTK version:2.20.1
PyGTK version:2.17.0

w3af - Web Application Attack and Audit Framework


Version: 1.1 (from tgz)
Author: Andres Riancho and the w3af team.Traceback (most recent call last):
File "/home/kevin/Documents/Tools/w3af/core/ui/gtkUi/main.py", line 590, in startScanWrap
self.w3af.start()
File "/home/kevin/Documents/Tools/w3af/core/controllers/w3afCore.py", line 422, in start
self._end()
File "/home/kevin/Documents/Tools/w3af/core/controllers/w3afCore.py", line 676, in _end
tm.join( joinAll=True )
File "/home/kevin/Documents/Tools/w3af/core/controllers/threads/threadManager.py", line 119, in join
self._threadPool.wait( ownerObj, joinAll )
File "/home/kevin/Documents/Tools/w3af/core/controllers/threads/threadpool.py", line 260, in wait
self.poll(block=True, ownerObj=ownerObj, joinAll=joinAll)
File "/home/kevin/Documents/Tools/w3af/core/controllers/threads/threadpool.py", line 245, in poll
raise result
w3afMustStopException: The xUrllib found too much consecutive errors. The remote
webserver doesn’t seem to be reachable anymore; please verify manually.

Figure 9.4: w3af’s crash message while spidering the Magento eCommerce web application

9.3.4 Test Case: Wordpress [Web Spider only]

Table 9.3 shows the results achieved by WASTF and w3af after spidering the popular blog software
Wordpress. Interestingly, WASTF needed almost 3 times the HTTP(S) requests in order to detect only
20 HTML pages more than w3af. After analysing the recorded HTTP traffic of WASTF, the cause of
this odd behaviour could be determined: the spidering component gets trapped inside a loop
because of a specially crafted URL provided by Wordpress. The request causing the loop is
shown in figure 9.5.

The analysis of the recorded HTTP traffic leads to the following conclusion on the origin of
the loop:

1. The originating HTML page (wp-login.php) is part of a login screen with an embedded HTML
form with text fields for entering the username and password of a legitimate user. If a
wrong username and/or password is supplied, the HTTP request shown in figure 9.5 is
issued. In the end this request redirects the user’s browser to the originating HTML page
(wp-login.php). Normally this kind of “loop” is not an issue because the web spider routine
is able to distinguish whether a found HTML page has been visited before or not.

2. The web spider component of WASTF uses Java prepared statements⁸ to store collected URLs
in the WASTF database. The found HTML page

http://192.168.56.101:80/wordpress/wp-login.php?redirect_to=http%3A%2F%2F192.168.56.101%2Fwordpress%2Fwp-admin%2F&reauth=1

is stored in the database of WASTF. Upon revisiting the same page, the routine is not able
to tell whether the found HTML page has been visited before or not. Because of URL character
encoding issues it assumes that the page is a new one, thus causing an endless loop.

8 With most development platforms, parametrised statements can be used that work with parameters (sometimes called
placeholders or bind variables) instead of embedding user input in the statement. In many cases, the SQL statement is
fixed, and each parameter is a scalar, not a table. The user input is then assigned (bound) to a parameter.

In the end, the web spider routine is able to leave the endless loop because of the maxLevel setting.
Every time the URL with the redirect is called (see figure 9.5), the internal level variable is
increased by one. After reaching the maximum allowed level (set by the user, the default is 25) the
page is ignored and the endless loop is resolved. To fix the issue, the handling of URLs
containing URL encoded characters such as %3C has to be debugged. This is an issue which should
be easy to fix in a second development phase.
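
One possible direction for such a fix is to bring every URL into a canonical form before it is compared with the URLs already stored. The following Java sketch is not taken from WASTF; the class VisitedUrlSketch and its methods are hypothetical, an in-memory set stands in for the WASTF database, and it is assumed that the mismatch stems from the same URL being seen in differently encoded variants (default port present or absent, encoded or plain characters in the query string).

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only, not part of WASTF.
final class VisitedUrlSketch {
    private final Set<String> visited = new HashSet<String>();

    // Returns true if the URL has not been seen before in any encoding variant.
    boolean markVisited(String url) {
        return visited.add(canonicalise(url));
    }

    // Lower-cases scheme and host, drops default ports and re-encodes
    // every query component in one uniform way.
    static String canonicalise(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URL: " + url, e);
        }
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Absolute URL expected: " + url);
        }
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        int port = uri.getPort();
        boolean defaultPort = port == -1
                || ("http".equals(scheme) && port == 80)
                || ("https".equals(scheme) && port == 443);

        StringBuilder canonical = new StringBuilder(scheme).append("://")
                .append(uri.getHost().toLowerCase(Locale.ROOT));
        if (!defaultPort) {
            canonical.append(':').append(port);
        }
        String path = uri.getRawPath();
        canonical.append(path == null || path.isEmpty() ? "/" : path);

        String query = uri.getRawQuery();
        if (query != null) {
            char separator = '?';
            for (String pair : query.split("&")) {
                String[] keyValue = pair.split("=", 2);
                canonical.append(separator).append(recode(keyValue[0]));
                if (keyValue.length == 2) {
                    canonical.append('=').append(recode(keyValue[1]));
                }
                separator = '&';
            }
        }
        return canonical.toString();
    }

    // Decodes and re-encodes one query component so that "%3a", "%3A" and ":"
    // all end up in the same representation.
    private static String recode(String component) {
        try {
            return URLEncoder.encode(URLDecoder.decode(component, "UTF-8"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

With such a canonical form, the redirect target from figure 9.5 would be recognised as already visited regardless of how its redirect_to parameter is encoded, so the maxLevel setting would only be needed as a safety net.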

GET http://192.168.56.101:80/wordpress/wp-login.php?redirect_to=http%3A%2F%2F192.168.56.101%2Fwordpress%2Fwp-admin%2F&reauth=1 HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
Accept-Language: en-us
Accept: */*
Host: 192.168.56.101
Cookie: wordpress_test_cookie=WP+Cookie+check
Proxy-Connection: Keep-Alive

Figure 9.5: The HTTP request causing an endless loop in the web spider component

                    WASTF                         w3af
Runtime [s]         98                            113
Accuracy            93 pages                      77 pages
# HTTP Requests     319 (POST: 70 / GET: 249)     99 (POST: 0 / GET: 233)

Table 9.3: Results of the Wordpress [Web Spider] test case

9.3.5 Test Case: Flowershop [Web Spider only]

Table 9.4 shows the results achieved by WASTF and w3af after spidering the Flowershop eCommerce
web application. WASTF and w3af detect the same number of HTML pages. The only big difference
is the running time: w3af needed more than 5 minutes to finish the web spidering process. During
the run it stopped for about one or two minutes without sending any HTTP(S) requests. This might
be a bug in the web spider component of w3af. The cause of the stalling could not be determined
within a reasonable amount of time.

                    WASTF                         w3af
Runtime [s]         14                            303
Accuracy            14 pages                      14 pages
# HTTP Requests     168 (POST: 6 / GET: 162)      80 (POST: 6 / GET: 74)

Table 9.4: Results of the Flowershop [Web Spider] test case

9.3.6 Test Case: DVWA [Web Form Login only]

The following test case tests the ability of WASTF to access hidden HTML pages in web applications
which are protected by simple HTML login forms. The left side of figure 9.6 shows how to
configure the webFormLogin plugin and the right side shows the output after starting the plugin.
The login procedure was successful because the browser had been redirected to the DVWA welcome
page (see the line with Received Title Text:... in figure 9.6). The whole process needed 3 HTTP
requests (POST: 1 / GET: 2) and took 1.7 seconds to complete. This test case is shown in the
tutorial video on the enclosed CD-ROM (see Videos > WASTF-WebFormLogin.ogv).

Automated web form population is not available in w3af. See section 5.2.2 on how to collect session
cookies with w3af.

Left side (configuration):

plugins
webLogin
enable webFormLogin
webFormLogin
set urlWithLoginForm http://192.168.56.101/...
set formData {"username":"admin", "password":"password"}

Right side (output):

[+] Executing webFormLogin...
[+] Received Title Text: Damn Vulnerable Web App (DVWA) - Login
[+] Found login form
[+] Setting field (username) to 'admin'
[+] Setting field (password) to 'password'
[+] Found login button: <input type="submit" value="Login" name="Login"/>
[+] Clicking button...
[+] Received Title Text: Damn Vulnerable Web App (DVWA) v1.0.6 :: Welcome
[+] webFormLogin finished

Figure 9.6: WebFormLogin plugin configuration (left side) and output (right side) for DVWA

9.3.7 Test Case: Secure Messaging (Privasphere) [Web Form Login only]

This test case tests whether WASTF is able to access a user’s Secure Messaging mailbox in an automated
way by populating the provided login form. Privasphere provides innovative secure and authenticated
Internet messaging technologies and services to corporations and individual users. A total of 5
HTTP(S) requests are sent (POST: 1 / GET: 4) and the login procedure took 5 seconds. Figure
9.7 shows the configuration of WASTF on the left side and the generated output on the right side. The
login was successful because the browser had been redirected to the user’s Inbox (see the line with
Received Title Text:... in figure 9.7).

Left side (configuration):

plugins
webLogin
enable webFormLogin
webFormLogin
set formName loginForm
set urlWithLoginForm https://www.privasphere.com
set formData {"login":"kevin.denver@zhaw.ch", "password":"MYPASSWORD"}

Right side (output):

[+] Executing webFormLogin...
[+] Received Title Text: PrivaSphere AG - Secure Messaging Service
[+] Found login form: loginForm
[+] Setting field (login) to 'kevin.denver@zhaw.ch'
[+] Setting field (password) to 'MYPASSWORD'
[+] Found login button: <input type="image" name="submit" value="submit" src="/hp/imgs/bt-login.gif" height="17" width="57" style="width:57px" tabindex="3"/>
[+] Clicking button...
[+] Received Title Text: Inbox; Kevin’s PrivaSphere
[+] webFormLogin finished

Figure 9.7: WebFormLogin plugin configuration (left side) and output (right side) for logging into the
SecureMessaging web application

9.3.8 Test Case: DVWA [Database Query Log - Online Mode]

This test case uses the DVWA web application to test the ability of WASTF to detect SQL Injection
vulnerabilities. The configuration can be seen in figure 9.8. Before the database query log plugin
is executed, WASTF enters the necessary user credentials into the login form as seen in test
case 9.3.6. This is required in order to access the protected HTML page with the intentional SQL
Injection vulnerability.



plugins
webLogin
enable webFormLogin
webFormLogin
set urlWithLoginForm http://192.168.56.101/dvwa/login.php
set formData {"username":"admin","password":"password"}
back
back
audit
enable databaseQueryLog
databaseQueryLog
set online true
set queryLogLocation jdbc:mysql://192.168.56.101:3306/mysql?user=root&password=mysql
back
back
back
http-settings
set proxyHost 127.0.0.1
set proxyPort 8008
back
target
set target http://192.168.56.101/dvwa/vulnerabilities/sqli/
back
start
exit

Figure 9.8: The DVWA database query log plugin test case configuration (Online Mode)

WASTF needed 7 seconds to complete and sent 42 HTTP requests (POST: 1 / GET: 41). WASTF
successfully detected the SQL Injection vulnerability in the following SQL statement: SELECT
first_name, last_name FROM users WHERE user_id = '70032423380958851878'. It
proposed 15 working SQL payloads, which have been verified by sending the payloads to the web
application and afterwards looking for them in the SQL query log. The following input characters
are unfiltered by the input validation routine of the web application: <, >, (, ), , , „ ;, ",
’, ., !, -, =, &, *, OR, AND, UNION, ALL, SELECT. A list of working SQL payloads can
be seen in figure 9.9. This test case is shown in the tutorial video on the enclosed CD-ROM (see
Videos > WASTF-DatabaseQueryLog-Online.ogv).

[1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]


[x’ UNION SELECT a FROM b -- ]
[1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
[1 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
[x’ OR 1=1 -- ]
[’ UNION SELECT a FROM b -- ]
[’ OR 1=1 -- ]
[) 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
[) 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
[1) 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
[)) 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
[1)) 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
[’; SELECT a FROM b -- ]
[x’; SELECT a FROM b -- ]
[’’’; SELECT a FROM b --]

Figure 9.9: The DVWA database query log plugin test case working payloads (Online Mode)
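
The verification step of the online mode (sending a payload and afterwards looking for it in the SQL query log) can be illustrated with the following Java sketch. It is an assumption-laden simplification and not the plugin's code: the class QueryLogVerifierSketch and its method are hypothetical, the logged statements are passed in as plain strings instead of being read from the MySQL query log, and it is assumed that every payload is sent together with a unique marker so that escaped quotes (for example \') can be told apart from unfiltered ones.

```java
import java.util.List;

// Hypothetical helper for illustration only, not part of WASTF.
final class QueryLogVerifierSketch {

    // Returns true if the marker, immediately followed by the payload, appears
    // verbatim in at least one logged statement. If the web application escaped
    // or removed characters of the payload, the combined string is not found.
    static boolean arrivedUnfiltered(List<String> loggedStatements, String marker, String payload) {
        String needle = marker + payload;
        for (String statement : loggedStatements) {
            if (statement.contains(needle)) {
                return true;
            }
        }
        return false;
    }
}
```

A payload for which arrivedUnfiltered returns true has demonstrably reached the database unchanged, which is what distinguishes the verified payloads of the online mode from the unverified proposals of the offline mode.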



w3af has also been tested for its ability to detect the SQL Injection vulnerability. w3af
needed 45 seconds to finish the scan, sent 62 HTTP requests (POST: 0 / GET: 62) and was not able
to detect the SQL Injection vulnerability. Table 9.5 compares the results of WASTF and w3af. Because
of a bug in w3af (version 1.0, release candidate 3), the SVN version (revision 3506) has been used
for this test case.

                   WASTF                      w3af
Runtime [s]        7                          45
Accuracy           1 Vulnerability            0 Vulnerabilities
# HTTP Requests    42 (POST: 1 / GET: 41)     62 (POST: 0 / GET: 62)

Table 9.5: Results of the DVWA [Database Query Log - Online Mode] test case

9.3.9 Test Case: DVWA [Database Query Log - Offline Mode]

This test case tests the ability of WASTF to find SQL Injection vulnerabilities by using the offline mode
of the database query log plugin. The SQL payloads found in the offline scenario happen to be the same
as in the previous online scenario (see section 9.3.8). This does not mean that the results are
equally sound. The online mode verifies the SQL payloads it finds (see chapter 8 and
section 4.2.1), which the offline mode does not. The online mode is therefore more precise, because
the SQL payloads it reports have been tested to work.

9.3.10 Test Case: Wordpress [Web Spider & Database Query Log - Online Mode]

This test case consists of two parts. First the Wordpress application is spidered for at most
10 minutes; after that the SQL Injection detection routines are used to detect possible weaknesses
in the software. In the case of w3af both the “normal” and the Blind SQL Injection detection
plugins were enabled. This test case focuses on runtime and on the number of HTTP requests sent.
It has been recorded and can be found on the enclosed CD-ROM (see Videos >
WASTF-DatabaseQueryLog-Online_2.ogv and Videos > WASTF-DatabaseQueryLog-Offline_1.ogv).

                   WASTF                           w3af
Runtime [s]        279                             3’902
Accuracy           -                               2 False-Positives
# HTTP Requests    465 (POST: 120 / GET: 345)      4255 (POST: 2275 / GET: 1980)

Table 9.6: Results of the Wordpress [Database Query Log Online Mode & Web Spider] test case

                   WASTF                           w3af
Runtime [s]        272                             3’902
Accuracy           -                               2 False-Positives
# HTTP Requests    632 (POST: 140 / GET: 362)      4255 (POST: 2275 / GET: 1980)

Table 9.7: Results of the Wordpress [Database Query Log Offline Mode & Web Spider] test case

Table 9.6 shows that WASTF is far superior when it comes to execution time and the number of
HTTP(S) requests sent. This shows that it is possible to enhance the performance of web application
security scanners by combining black and white box security testing techniques. WASTF can easily
detect whether a modified HTML or GET parameter triggers a SQL statement that is worth examining
for input validation vulnerabilities, whereas w3af continues probing. w3af is roughly 14 times slower
than WASTF and needs 9 times the HTTP(S) requests (in WASTF’s online mode). WASTF’s offline mode
(see table 9.7) needs slightly more HTTP(S) requests than the online mode because every HTML and GET
input parameter is probed before the parameters which do not trigger a SQL statement are ruled
out. WASTF correctly found no SQL Injection vulnerabilities, whereas w3af reported a discovered
Blind SQL Injection vulnerability twice:

* Blind SQL injection was found at: "http://192.168.56.101/wordpress/wp-comments-post.php", using HTTP method POST.
The injectable parameter is: "comment". This vulnerability was found in the requests with ids 4136 and 4137.

* Blind SQL injection was found at: "http://192.168.56.101/wordpress/wp-comments-post.php", using HTTP method POST.
The injectable parameter is: "comment". This vulnerability was found in the requests with ids 4347 and 4348.

Looking closer at the log file generated by w3af, the reported vulnerability turned out to be a false
positive. The HTTP request and response which triggered the false positive can be seen in figure 9.10.
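
The factors quoted in the comparison above ("roughly 14 times slower", "9 times the HTTP(S) requests") follow directly from the values in table 9.6. The following short Java sketch merely reproduces this arithmetic; the class name ScanRatios and the method ratio are hypothetical and not part of WASTF.

```java
public class ScanRatios {

    /** Ratio of two measurements, e.g. the w3af runtime divided by the WASTF runtime. */
    public static double ratio(double w3afValue, double wastfValue) {
        return w3afValue / wastfValue;
    }

    public static void main(String[] args) {
        // Values taken from table 9.6 (WASTF online mode).
        double runtimeRatio = ratio(3902, 279);   // runtime in seconds
        double requestRatio = ratio(4255, 465);   // number of HTTP(S) requests
        System.out.println("runtime ratio: " + runtimeRatio);
        System.out.println("request ratio: " + requestRatio);
    }
}
```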

9.4 Summary

This chapter tested WASTF extensively with small and “large” web applications; WASTF’s
performance and accuracy were better than those of w3af. Apart from a small bug in spidering the
Wordpress web application (see section 9.3.4), WASTF has outperformed w3af’s web spider in either
runtime or accuracy. WASTF sends more HTTP(S) requests during the spidering phase, but this is due
to its JavaScript support: WASTF retrieves external JavaScript source files, whereas w3af ignores them.

The tests show that the detection of SQL input validation vulnerabilities is more accurate and more
performant when black and white box testing methodologies are combined than when only a black box
testing approach is used. WASTF is 14 times faster than w3af in detecting SQL input validation
vulnerabilities in web applications. WASTF is still in an early phase and might contain some bugs and
thus report false positives, but the false positive Blind SQL Injection vulnerability reported by w3af
(see section 9.3.10) was correctly identified as not exploitable by WASTF. The online mode (see
chapters 4 and 8) even generates SQL exploit payloads and verifies them by actually submitting them to
the targeted web application before reporting any vulnerabilities, which increases the accuracy of the
reported vulnerabilities even further. The offline mode is not as accurate as the online mode because
WASTF’s generated SQL exploit payloads cannot be tested until the SQL statement triggered by the
web application is known to WASTF. The alternative of sending SQL exploit payloads blindly
(as black box security testing applications do) would not increase the accuracy of the vulnerability
detection routine but only increase the HTTP(S) request footprint. Instead, WASTF tries to find working
SQL exploit payloads offline by mutating the SQL statements found in the database query log. Because
WASTF operates offline in this mode, the SQL payloads cannot be verified by sending them to the
targeted web application, and this might lead to false positives being reported by WASTF.

========================================Request 4136 - Thu 29 Jul 2010 06:52:49 PM CEST========================================


POST http://192.168.56.101/wordpress/wp-comments-post.php HTTP/1.1
Accept-encoding: identity
Accept: */*
User-agent: w3af.sourceforge.net
Host: 192.168.56.101
Referer: http://192.168.56.101/
Cookie: wordpress_test_cookie=WP+Cookie+check
Content-type: application/x-www-form-urlencoded

comment=21"+OR+"21"="21&author=FrAmE30.&url=5672&comment_parent=8&comment_post_ID=62&email=w3af%40email.com
========================================Response 4136 - Thu 29 Jul 2010 06:52:49 PM CEST=======================================
HTTP/1.1 500 Internal Server Error
content-length: 1224
x-powered-by: PHP/5.3.2-1ubuntu4.2
expires: Wed, 11 Jan 1984 05:00:00 GMT
vary: Accept-Encoding
server: Apache/2.2.14 (Ubuntu)
last-modified: Thu, 29 Jul 2010 16:52:49 GMT
connection: close
pragma: no-cache
cache-control: no-cache, must-revalidate, max-age=0
date: Thu, 29 Jul 2010 16:52:49 GMT
content-type: text/html; charset=utf-8

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">


<!-- Ticket #11289, IE bug fix: always pad the error page with enough characters such that it is greater than 512
bytes, even after gzip compression abcdefghijklmnopqrstuvwxyz1234567890aabbccddeeffgghhiijjkkllmmnnooppqqrrss
ttuuvvwwxxyyzz11223344556677889900abacbcbdcdcededfefegfgfhghgihihjijikjkjlklkmlmlnmnmononpopoqpqprqrqsrsrtsts
ubcbcdcdedefefgfabcadefbghicjkldmnoepqrfstugvwxhyz1i234j567k890laabmbccnddeoeffpgghqhiirjjksklltmmnunoovppqwqr
rxsstytuuzvvw0wxx1yyz2z113223434455666777889890091abc2def3ghi4jkl5mno6pqr7stu8vwx9yz11aab2bcc3dd4ee5ff6gg7hh8i
i9j0jk1kl2lmm3nnoo4p5pq6qrr7ss8tt9uuvv0wwx1x2yyzz13aba4cbcb5dcdc6dedfef8egf9gfh0ghg1ihi2hji3jik4jkj5lkl6kml7ml
n8mnm9ono -->
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>WordPress &rsaquo; Error</title>
<link rel="stylesheet" href="http://192.168.56.101/wordpress/wp-admin/css/install.css" type="text/css" />
</head>
<body id="error-page">
<p>Duplicate comment detected; it looks as though you&#8217;ve already said that!</p></body>
</html>

===============================================================================================================================

Figure 9.10: w3af HTTP log showing a request which triggered a false positive Blind SQL Injection
vulnerability

10 User Manual & Development Guide

This chapter contains the user manual and the development guide for WASTF. The user manual leads
new users through the installation process of WASTF and helps them to get familiar with its commands.
The development guide is for advanced users who want to extend the functionality of WASTF and build
it from source. Because WASTF is currently only supported on Linux, these two manuals describe the
installation and build process specifically for the Linux operating system.

10.1 User Manual

The user manual is targeted at users who want to install and run WASTF on their machines. The
manual outlines the software and hardware requirements, installation procedure and how to start
the WASTF application in either interactive or automated mode.

10.1.1 Hard- & Software Requirements

WASTF is implemented purely in Java, so the only true requirement for running it is
an installed Java Runtime Environment (JRE). WASTF makes use of Java 6.0 features, so your JRE must
be at least version 6.0 (1.6.0+); either 32 or 64 bit is fine. The distribution package includes all
of the free/open source third-party libraries necessary to run WASTF (see the dependencies section
of the Readme.txt at the root of the enclosed CD-ROM for a complete list of the libraries).
WASTF has been built and tested primarily on Linux (Ubuntu 10.04 LTS). It has
seen some informal use on Windows 7, but is not tested, packaged, or supported on platforms other
than Linux at this time.

• Java(TM) SE Runtime Environment >= 1.6.0 (32 or 64 bit)

• > 1 GB Random Access Memory (RAM)

• >= 500 MB free Hard Disk space

10.1.2 Installation Instructions

The following section outlines the necessary steps for installing and running WASTF.

Prerequisites: WASTF needs a working Sun JRE >= 1.6.0 in order to run correctly. Check the manual
of your Linux operating system on how to install the Sun JRE.

Chapter 10. User Manual & Development Guide | 91


(a) Welcome Screen (b) Readme Screen (c) License Screen

Figure 10.1: WASTF Installer

Installation: The installation is fairly simple. Insert the enclosed CD-ROM, double click the
install.sh executable shell script in the CD-ROM’s root directory and follow the on-screen
instructions. Alternatively, open a terminal and enter the following commands:

# cd /media/<CD-ROM DRIVE>
# java -jar Binaries/WASTF-installer.jar

If you have to install WASTF on a remote machine (e.g. accessed through an SSH client) where no
graphical user interface is available, you can install WASTF in a text-only mode by adding
the -console parameter to the installer as shown in the following snippet:

# java -jar WASTF-installer.jar -console

Figure 10.1 shows the first three steps of the installer.

10.1.3 Running WASTF

After WASTF has been installed successfully it can be started for the first time. Opening a terminal
and entering the following commands starts the WASTF binary in interactive mode:

# cd <WASTF Installation Directory>/
# java -Xms1024m -Xss8M -Xmx1024m -jar WASTF.jar

After WASTF has been initialised and has loaded all the necessary plugins, the user is presented with
an interactive command line interface (CLI) awaiting commands. WASTF is largely self-explanatory:
entering help into the CLI at any time displays the currently available commands
and a short description of every command.

Starting WASTF in the automated mode is just as simple. WASTF can be started in automated mode
by opening a terminal and entering the following commands:

# cd <WASTF Installation Directory>/
# java -Xms1024m -Xss8M -Xmx1024m -jar WASTF.jar -s <script_file>

The <script_file> has to be provided by the user. The script file is a simple
text file containing the valid WASTF commands which the user would otherwise have entered manually
into the CLI in the interactive mode. The commands are processed sequentially from top to
bottom and no interaction by the user is needed after starting WASTF. See section 6.2 for more details
about the two modes.

10.1.4 Video Tutorials

The enclosed CD-ROM contains various tutorial videos in the OGG video format. These videos
show how to configure various plugins and how to start them. The videos are located in the
root directory of the CD-ROM in a folder labelled Videos. The videos have been recorded using
gtk-recordmydesktop version 0.3.8.1 on an Ubuntu Linux operating system.

10.1.5 Use WASTF Results for Automated Processing

WASTF has the ability to produce a machine readable report of its findings after a successful scan of
a targeted web application. This report can be used by third party tools to process the found
information or vulnerabilities and act upon them. The report uses an XML format and can be enabled
after WASTF has been started as follows:

wastf(/) > plugins
wastf(plugins) > output
wastf(output) > enable xmlFile
wastf(output) > xmlFile
wastf(xmlFile) > set fileName report.xml

Figure 10.2 contains a sample WASTF XML report. The XML report structure generated by WASTF
is very similar to that of w3af. This was done deliberately for easy integration in applications which
already use w3af XML reports. The XML report contains valuable information such as the URL
of the location where a SQL Injection vulnerability has been found, the HTTP GET or POST
parameter which is vulnerable to SQL code injection and a list of working payloads.

<?xml version="1.0" encoding="UTF-8"?>
<wastf start="1280501587664" startstr="Fri Jul 30 16:53:07 CEST 2010" xmloutputversion="1.00">
<scaninfo target="http://192.168.56.101/dvwa/vulnerabilities/sqli/"/>
<information name="SQL Query Detected" url="http://192.168.56.101/dvwa/vulnerabilities/sqli/" var="id">
The following list of SQL queries have been found containing the sent random value: [78377841730792481813]:
* SELECT first_name, last_name FROM users WHERE user_id = ’78377841730792481813’]
</information>
<vulnerability name="SQL Injection Vulnerability" url="http://192.168.56.101/dvwa/vulnerabilities/sqli/" var="id">
A SQL Injection vulnerability has been detected in: [http://192.168.56.101/dvwa/vulnerabilities/sqli/] using the
follwing parameter: [<input name="id" value="" type="text"/>]
The following special characters are allowed: [<, >, (, ), {, }, ,, ;, ", ’, ., !, -, =, &, *, \, OR, AND, UNION, ALL, SELECT]
The following special characters are being filtered: []
The following list of SQL queries have been found containing the sent random value: [78377841730792481813]:
* SELECT first_name, last_name FROM users WHERE user_id = ’78377841730792481813’]
The following list contains a set of working SQL payloads:
* [1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
* [x’ UNION SELECT a FROM b -- ]
* [ 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
* [1 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
* [x’ OR 1=1 -- ]
* [’ UNION SELECT a FROM b -- ]
* [’ OR 1=1 -- ]
* [) 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
* [) 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
* [1) 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
* [)) 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
* [1)) 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
* [’; SELECT a FROM b -- ]
* [x’; SELECT a FROM b -- ]
* [’’’; SELECT a FROM b -- ]

</vulnerability>
</wastf>

Figure 10.2: WASTF Report XML File
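
A third party tool could read such a report with the XML parser that ships with the JDK. The following Java sketch is only an illustration under stated assumptions: the class WastfReportReader and its method are hypothetical, and only the element and attribute names visible in figure 10.2 (vulnerability, name, url, var) are relied upon. The inline report in main is an abbreviated stand-in for a real report file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class WastfReportReader {

    /** Returns one "name | url | var" line for every vulnerability element in the report. */
    public static List<String> listVulnerabilities(String reportXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(reportXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("vulnerability");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element v = (Element) nodes.item(i);
                result.add(v.getAttribute("name") + " | " + v.getAttribute("url")
                        + " | " + v.getAttribute("var"));
            }
            return result;
        } catch (Exception e) {
            // Parser configuration errors, I/O errors and malformed XML are reported alike here.
            throw new IllegalArgumentException("Could not parse report", e);
        }
    }

    public static void main(String[] args) {
        // Abbreviated report using the structure shown in figure 10.2.
        String report = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<wastf xmloutputversion=\"1.00\">"
                + "<vulnerability name=\"SQL Injection Vulnerability\""
                + " url=\"http://192.168.56.101/dvwa/vulnerabilities/sqli/\" var=\"id\">"
                + "details omitted</vulnerability></wastf>";
        for (String line : listVulnerabilities(report)) {
            System.out.println(line);
        }
    }
}
```

To process a report written to disk, the string argument could be replaced by a call to DocumentBuilder.parse(File) with the file name configured via set fileName.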

10.1.6 http-settings Configuration


Table 10.1 shows the options available to the user to tweak the HTTP and HTML parsing libraries used
by WASTF to retrieve and process HTML pages.

Option                 Description
userAgent              This option changes the UserAgent sent by WASTF. Changing this value also
                       alters the behaviour when parsing and executing JavaScript code encountered
                       in retrieved HTML pages. Possible values are: FIREFOX_3,
                       INTERNET_EXPLORER_6, INTERNET_EXPLORER_7, INTERNET_EXPLORER_8
cssEnabled             Turns CSS parsing on or off. Expect WASTF to be slower when this option is
                       enabled.
javaScriptEnabled      Turns JavaScript parsing on or off.
checkSSLCertificates   When this option is enabled and a connection over HTTPS is made to a server,
                       the server certificate will be validated before successfully returning a
                       connection to WASTF.
proxyHost              Host of the HTTP proxy
proxyPort              The port the HTTP proxy is listening on
proxyUsername          If necessary configures a username for accessing the HTTP proxy
proxyPassword          If necessary configures a password for accessing the HTTP proxy

Table 10.1: WASTF http-settings menu

10.1.7 webFormLogin Plugin Configuration
Table 10.2 shows the options available to the user to configure the webFormLogin plugin which au-
tomates the login process for web applications with simple HTML form based authentication.

Option             Description
formName           The name or id of the HTML form containing the login fields. If the form has
                   neither an id nor a name attribute this can be left blank and WASTF will use the
                   first HTML form it finds on the HTML page.
urlWithLoginForm   URL pointing to the HTML page containing the login form.
formData           JSON encoded string containing the data which will be inserted into the
                   previously specified login form. Example:
                   {"username_field_name_or_id":"myusername",
                   "password_field_name_or_id":"mypassword"}. In order to find the necessary names
                   of the input elements, the user has to look at the HTML source of the page
                   containing the HTML form.

Table 10.2: WASTF webFormLogin plugin settings
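
The formData value is a flat JSON object that maps input field names to the values to be submitted. As an illustration, the following Java sketch builds such a string for arbitrary field names. The class FormDataBuilder and the field names used in main are hypothetical; the escaping only covers backslashes and double quotes, which is an assumption that suffices for simple credentials but not for control characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FormDataBuilder {

    /** Escapes backslashes and double quotes; control characters are not handled. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Serialises field name/value pairs into a flat JSON object string. */
    public static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // Hypothetical field names; the real ones come from the HTML source of the login page.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "myusername");
        fields.put("password", "mypassword");
        System.out.println(toJson(fields));
    }
}
```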

10.1.8 webSpider Plugin Configuration


Table 10.3 contains the options available to the user to tweak the web spider component of WASTF.

Option             Description
maxThreads         Maximum number of concurrent web spidering threads. The default value is 15.
                   Setting this value too high actually results in decreased performance (RAM
                   restrictions).
maxLevel           This option is a termination criterion. Once the configured number of levels has
                   been reached the web spider will automatically stop spidering any further.
onlyForward        Enables or disables whether the web spider should only search HTML pages inside
                   the configured target URL. If this option is disabled and a retrieved HTML page
                   contains a link to e.g. Google then this page will also be spidered. In most
                   cases this option should be enabled.
stopAfterMinutes   This option is a termination criterion. After spidering for x minutes the web
                   spider will automatically stop spidering any further.
ignoreRegex        By specifying a Java regular expression, URLs matching this expression will be
                   ignored by the web spider.
followRegex        By specifying a Java regular expression, only URLs matching this expression will
                   be processed by the web spider (ignoring the ignoreRegex option).

Table 10.3: WASTF webSpider plugin settings
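
The interplay of ignoreRegex and followRegex described in table 10.3 can be sketched in Java as follows. This is an illustration and not WASTF's code: the class SpiderUrlFilter is hypothetical, and it is assumed that a regular expression has to match the whole URL (Matcher.matches()) and that an unset option is represented by null.

```java
import java.util.regex.Pattern;

public class SpiderUrlFilter {

    private final Pattern ignore; // null if ignoreRegex is not set
    private final Pattern follow; // null if followRegex is not set

    public SpiderUrlFilter(String ignoreRegex, String followRegex) {
        this.ignore = ignoreRegex == null ? null : Pattern.compile(ignoreRegex);
        this.follow = followRegex == null ? null : Pattern.compile(followRegex);
    }

    /** Decides whether the web spider should process the given URL. */
    public boolean shouldVisit(String url) {
        if (follow != null) {
            // followRegex takes precedence: ignoreRegex is not consulted at all.
            return follow.matcher(url).matches();
        }
        return ignore == null || !ignore.matcher(url).matches();
    }

    public static void main(String[] args) {
        // Hypothetical pattern: skip every URL containing "logout".
        SpiderUrlFilter filter = new SpiderUrlFilter(".*logout.*", null);
        System.out.println(filter.shouldVisit("http://192.168.56.101/dvwa/logout.php"));
        System.out.println(filter.shouldVisit("http://192.168.56.101/dvwa/index.php"));
    }
}
```

A typical use of ignoreRegex is to keep the spider away from logout links so that an authenticated session is not terminated during the scan.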

10.1.9 misc-settings Configuration
Table 10.4 contains the settings provided by the misc-settings menu.

Option       Description
randomSeed   Contains the seed to initialise the random number/string generator used by the
             databaseQueryLog plugin. This value is randomly generated at each start of WASTF.

Table 10.4: WASTF misc-settings settings

10.1.10 databaseQueryLog Plugin Configuration


Table 10.5 shows the settings provided by the databaseQueryLog plugin.

Option             Description
online             Whether the online or offline mode should be used by the plugin.
queryLogLocation   The location of the database query log file.
ignoreRegex        By specifying a Java regular expression, HTTP GET or POST parameters matching
                   this expression will be ignored by the plugin.

Table 10.5: WASTF databaseQueryLog plugin settings

Currently two queryLogLocation schemes are supported: the user can configure either a direct
connection to a MySQL server or the location of a MySQL query log file. To configure a
direct connection to a MySQL server, the following scheme has to be used for the queryLogLocation
parameter:

set queryLogLocation jdbc:mysql://<hostname_or_ip>:<port>/mysql?user=root&password=<password>

If the offline mode is being used and a MySQL query log file should be analysed by WASTF, the follow-
ing scheme has to be used for the queryLogLocation parameter:

set queryLogLocation file:mysql://<path_to_query_log_file>
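
The two schemes can be told apart by their prefix. The following Java sketch shows one way a tool could classify a queryLogLocation value and extract the file path; it is an assumption about how such a value may be interpreted, not WASTF's actual parsing code. The class QueryLogLocation, the port 3306 and the password in main are hypothetical.

```java
public class QueryLogLocation {

    public enum Kind { JDBC, FILE }

    private static final String JDBC_PREFIX = "jdbc:mysql://";
    private static final String FILE_PREFIX = "file:mysql://";

    /** Classifies a queryLogLocation value by its scheme prefix. */
    public static Kind kindOf(String location) {
        if (location.startsWith(JDBC_PREFIX)) {
            return Kind.JDBC;
        }
        if (location.startsWith(FILE_PREFIX)) {
            return Kind.FILE;
        }
        throw new IllegalArgumentException("Unsupported queryLogLocation: " + location);
    }

    /** Returns the path part of a file:mysql:// location. */
    public static String filePath(String location) {
        if (kindOf(location) != Kind.FILE) {
            throw new IllegalArgumentException("Not a query log file location: " + location);
        }
        return location.substring(FILE_PREFIX.length());
    }

    public static void main(String[] args) {
        // Hypothetical port and password, for illustration only.
        System.out.println(kindOf("jdbc:mysql://192.168.56.101:3306/mysql?user=root&password=secret"));
        System.out.println(filePath("file:mysql:///home/kevin/Desktop/query.log"));
    }
}
```

Note that an absolute path results in three consecutive slashes after file:mysql: because the path itself starts with a slash.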

10.1.11 Using the Offline Mode

The offline mode is enabled by setting the online parameter to false. The plugin then sends
all the requests necessary for identifying SQL Injection vulnerabilities, but expects the database
query log to be provided by the user later, whenever it becomes available. WASTF
stores the necessary information persistently and thus does not have to keep running until the
database query log becomes available. Once it is available, the following commands start the
detection routine:

# java -jar WASTF.jar
wastf(/) > history
+----+-------------------------------+--------------------------------------------------+
| Id | Date | Target |
+----+-------------------------------+--------------------------------------------------+
| 1 | Wed Jul 28 15:43:51 CEST 2010 | http://192.168.56.101/dvwa/vulnerabilities/sqli/ |
+----+-------------------------------+--------------------------------------------------+
wastf(/) > loadConf 1
[+] Previous settings restored
wastf(/) > plugins
wastf(plugins) > audit
wastf(audit) > databaseQueryLog
wastf(databaseQueryLog) > set queryLogLocation file:mysql:///home/kevin/Desktop/query.log
wastf(databaseQueryLog) > back
wastf(audit) > back
wastf(plugins) > output
wastf(output) > enable xmlFile
[+] Plugin enabled
wastf(output) > back
wastf(plugins) > back
wastf(/) > parseQueryLog

10.2 Development Guide

The development guide is targeted at users who want to extend the functionality of WASTF and build
it from source on their own machines. The following sections contain detailed instructions on how
to build WASTF and what kind of tools and libraries have been used during the initial development
phase of the WASTF application.

10.2.1 Prerequisites

The following section outlines the prerequisites needed for building WASTF from source.

Sun Java 1.6 JRE / JDK: WASTF has been developed and tested under the Ubuntu 10.04 LTS “Lucid
Lynx” operating system with the Java(TM) SE Runtime Environment (build 1.6.0_20-b02) 32 and
64bit and Java(TM) SE Development Kit (1.6.0_20) 32 and 64 bit. Check the manual of your Linux
operating system on how to install the Sun JRE and JDK.

Maven 2: Apache Maven is a software project management and comprehension tool. Based on the
concept of a project object model (POM), Maven can manage a project’s build, reporting and doc-
umentation from a central piece of information. If you are using the Ubuntu operating system you
can install Maven by simply typing: sudo apt-get install maven into a terminal. See the Maven
documentation on how to install Maven for your specific operating system
(http://maven.apache.org/download.html#Installation).

10.2.2 Installing the Development Environment

Insert the enclosed CD-ROM, double click the install.sh executable shell script in the
CD-ROM’s root directory and follow the on-screen instructions of the installer1. Alternatively, open a
terminal and enter the following commands:

# cd /media/<CD-ROM DRIVE>
# java -jar Binaries/WASTF-installer.jar

1 The installer has been created using the free IzPack project. IzPack is a one-stop solution for packaging, distributing and
deploying applications. It is fully cross-platform and generates a single installer. As such, it is an alternative to native
solutions such as platform-specific installers and package managers. IzPack-generated installers only require a Java
virtual machine to run. http://izpack.org/ [10.07.10]

Make sure that you select the Sources option in the package selection panel in order to install the
WASTF source files needed for extending and building WASTF from source. If you have to install the
WASTF sources on a remote machine (e.g. accessed through an SSH client) where no graphical user
interface is available, you can install WASTF in a text-only mode by adding the
-console parameter to the installer as shown in the following snippet:

# java -jar WASTF-installer.jar -console

After successfully installing the WASTF source files, several Java libraries needed for building WASTF
have to be added to the local Maven repository on the development machine. This is necessary
because these specific libraries are not part of an official, publicly available Maven repository.
Open a terminal and enter the following commands in order to add the necessary libraries to the local
Maven repository:

# cd <WASTF Installation Directory>/src/
# ./install-missing-maven-deps.sh

The install-missing-maven-deps.sh script has to be executed only once after installing the
sources. See section 10.2.4 for a complete list of the free/third-party libraries used
by WASTF.

10.2.3 Building WASTF with Maven

Building WASTF from source is fairly easy and is taken care of by Maven. Enter the following com-
mands into a terminal for building WASTF from the previously installed source files:

# cd <WASTF Installation Directory>/src/
# mvn package

Alternatively, if you want to skip the JUnit tests which Maven executes automatically
during the build process, you can provide the -Dmaven.test.skip parameter to the mvn command.
For more information about specific Maven commands visit http://maven.apache.org/.

10.2.4 Free/Third-Party Libraries Used

WASTF relies on several free and third party libraries to provide the needed functionality. Tables 10.7
and 10.9 list all the libraries used and their specific licensing schemes. The following three libraries
are not part of an official, publicly available Maven repository and have to be installed manually (see
section 10.2.2).
virtual machine to run. http://izpack.org/ [10.07.10]

• Ganymed SSH2 Build250: Ganymed SSH-2 for Java is a library which implements the SSH-2
protocol in pure Java. It allows one to connect to SSH servers from within Java programs. It
supports SSH sessions (remote command execution and shell access), local and remote port
forwarding, local stream forwarding, X11 forwarding, SCP and SFTP.

• JsqlParser Version 0.6.2.a: JSqlParser parses a SQL statement and translates it into a hierarchy
of Java classes. The generated hierarchy can be navigated using the Visitor Pattern.

• Apache Derby Debug Library: Apache Derby, an Apache DB sub project, is an open source
relational database implemented entirely in Java.

10.2.5 Use logback to debug WASTF

WASTF uses the free SLF4J project to provide extensive logging capabilities to the users of WASTF. The
Simple Logging Facade for Java (SLF4J) serves as a simple facade or abstraction for various logging
frameworks, e.g. java.util.logging, log4j and logback, allowing the end user to plug in the desired
logging framework at deployment time. By modifying the logback.xml file inside the
packaged WASTF jar file, the logging level can be increased in order to create more verbose output
for debugging and error identification purposes.

10.2.6 WASTF Test Environment

The enclosed CD-ROM contains a virtual machine with various pre-installed and configured web
applications for testing WASTF. The virtual machine is located in the root directory of the CD-ROM in
a folder labelled Test Environment. See chapter 9 for more details about the test environment.

10.3 Sonar

Sonar2 is an open platform to manage code quality and has been used extensively during the
development of the WASTF application. Sonar covers the 7 axes of code quality (see table 10.10).
Sonar is a web-based application. Rules, alerts, thresholds, exclusions and settings can be configured
online. By leveraging its database, Sonar not only allows metrics to be combined but also to be mixed
with historical measures. New languages, additional rules engines and advanced
metrics can be added through a powerful extension mechanism. More than 30 plugins are already
available.

Sonar makes heavy use of already existing source quality tools such as PMD, Checkstyle, Findbugs
and others for providing meaningful statistics about the quality of analysed code. Additional features
include (taken from the Sonar project web site):

• Drill down to source code: Want to know why a project has for instance so many coding rules
violations? Drill down to modules, then to packages and finally to source code

2 http://www.sonarsource.org/ [09.07.10]

Maven Artifact ID     Version   URL                                       License             Description
jcl-core              2.1       http://jcloader.sourceforge.net/          LGPL                JCL is a configurable, dynamic and extensible custom
                                                                                              classloader that loads java classes directly from Jar files
                                                                                              and other sources.
junit                 4.8.1     http://www.junit.org/                     CPLv1               JUnit testing framework
commons-logging       1.1.1     http://commons.apache.org/logging/        Apache License, v2  Logging Framework
slf4j-api             1.6.0     http://www.slf4j.org/                     MIT                 Logging Framework
logback-core          0.9.21    http://logback.qos.ch/                    LGPL v2.1           Logging Framework
logback-classic       0.9.21    http://logback.qos.ch/                    LGPL v2.1           Logging Framework
args4j                2.0.12    https://args4j.dev.java.net/              MIT                 args4j is a small Java class library that makes it easy to
                                                                                              parse command line options/arguments in your CUI
                                                                                              application.
commons-net           2.0       http://commons.apache.org/                Apache License, v2  Collection of network utilities and protocol
                                                                                              implementations.
commons-validator     1.3.1     http://commons.apache.org/                Apache License, v2  Framework to define validators and validation rules in an
                                                                                              xml file.
ganymed-ssh2          build250  http://www.cleondris.ch/opensource/ssh2/  BSD style           Ganymed SSH-2 for Java is a library which implements the
                                                                                              SSH-2 protocol in pure Java.
h2                    1.2.127   http://www.h2database.com/                EPL 1.0             Java SQL database
htmlunit              2.7       http://htmlunit.sourceforge.net/          Apache License, v2  HtmlUnit is a GUI-Less browser for Java programs.
jline                 0.9.94    http://jline.sourceforge.net/             BSD style           JLine is a Java library for handling console input. It is
                                                                                              similar in functionality to BSD editline and GNU readline.

Table 10.7: Free/Third Party libraries used by WASTF (1 of 2)

Maven Artifact ID     Version   URL                                       License             Description
json-simple           1.1       http://code.google.com/p/json-simple/     Apache License, v2  JSON.simple is a simple Java toolkit for JSON. You can use
                                                                                              JSON.simple to encode or decode JSON text.
utils                 1.07.00   http://ostermiller.org                    GPL                 Libraries for common programming tasks such as CSV files
                                                                                              and Base64 encoding.
oro                   2.0.8     http://jakarta.apache.org/oro/            Apache License, v2  The Jakarta-ORO Java classes are a set of text-processing
                                                                                              Java classes that provide Perl5 compatible regular
                                                                                              expressions.
mysql-connector-java  5.1.12    http://www.mysql.com/products/connector/  GPL                 MySQL offers standard database driver connectivity for
                                                                                              using MySQL with applications and tools that are compatible
                                                                                              with industry standards ODBC and JDBC.
jsqlparser            0.6.2.a   http://jsqlparser.sourceforge.net/        LGPL v2.1           JSqlParser parses a SQL statement and translates it into a
                                                                                              hierarchy of Java classes. The generated hierarchy can be
                                                                                              navigated using the Visitor Pattern.
derby-debug-lib       10.6.1.0  http://db.apache.org/derby/               Apache License, v2  Apache Derby, an Apache DB sub project, is an open source
                                                                                              relational database implemented entirely in Java.

Table 10.9: Free/Third Party libraries used by WASTF (2 of 2)

• Comments
• Coding Rules
• Potential Bugs
• Complexity
• Unit Tests Coverage
• Duplications
• Architecture & Design

Table 10.10: Sonar’s 7 axes of code quality

• Unit Tests: Nowadays, what does code quality mean without considering unit tests and associated
metrics like code coverage?

• Time Machine: Sonar helps you replay the past and shows you how quality metrics evolve over
time.

• Coding Rules: More than 600 coding rules are provided off the shelf, from simple naming
conventions to complex anti-pattern detection.

• Leverage existing components: You probably already use best-of-breed tools like Checkstyle,
PMD, Findbugs, Clover, Cobertura. Sonar can transparently orchestrate all those components
for you.

10.3.1 Analysing WASTF with Sonar

The first step in analysing WASTF with Sonar is to install Sonar by downloading the binary package
from the Sonar project web site. The installation of Sonar is finished by extracting the files contained
in the previously downloaded archive into an arbitrary folder. In order to analyse any kind of Java
project with Sonar the following commands have to be executed every time:

• Starting Sonar: If Sonar is not already running, it has to be started manually by opening a
terminal and entering the following commands:

# cd <Sonar Installation Directory>/bin/linux-x86-32/
# ./sonar.sh console

Give Sonar a few seconds to boot up before moving to the next step.

• Analyse WASTF: In order to analyse WASTF with Sonar one simply has to type the following
commands into a terminal:

# cd <WASTF Installation Directory>/src/
# mvn sonar:sonar

Maven automatically builds the project and runs the tools needed to generate statistics about
it, which Sonar then interprets and visualises. The results and statistics can be accessed
through the Sonar Management Console, which is available at http://localhost:9000 in a web
browser.
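
The "give Sonar a few seconds to boot up" step can also be automated. The following is a minimal sketch, not part of WASTF: it assumes Sonar listens on localhost:9000 as stated above, and the class name PortWaiter and the timeout values are hypothetical. An accepted TCP connection only shows that the port is open, so it is a rough indication rather than proof that the Management Console is fully initialised.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of WASTF): polls a TCP port until it accepts connections.
class PortWaiter {

    /** Returns true as soon as host:port accepts a TCP connection, false after the timeout. */
    static boolean waitForPort(String host, int port, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(host, port), 500);
                return true;
            } catch (IOException e) {
                // Port not reachable yet, try again after a short pause.
            } finally {
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // Nothing useful can be done if closing fails.
                }
            }
            try {
                Thread.sleep(250);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumption: Sonar's default port 9000 on the local machine.
        boolean up = waitForPort("localhost", 9000, 60000);
        System.out.println(up
                ? "Port 9000 is open, mvn sonar:sonar can be started"
                : "Sonar did not open port 9000 in time");
    }
}
```

Such a check would typically run between the two command blocks above, for example from a build script.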



10.3.2 Conclusion about the usage of Sonar in Software Projects

After using Sonar during the development of WASTF, which is a complex and relatively large
application (about 12’000 lines of code), the following conclusions can be drawn:

• If a developer already uses source code analysis tools such as PMD, Checkstyle and Findbugs,
then Sonar does not provide more information than the developer already has, except for a view
on Java package dependency cycles³. These could easily be obtained with another stand-alone
tool such as JDepend⁴.

• It is easy to become a slave of the metrics provided by Sonar. Endless refactoring of source code
only to satisfy generated metrics might lead to more complex and unreadable code.

• Sonar can easily be integrated into a continuous integration tool such as Hudson⁵. The
integration into such tools reduces the need to start Sonar manually. For a guide on how to
integrate Sonar into Hudson, consult the following quick guide:
http://meera-subbarao.blogspot.com/2009/11/hudson-sonar-perfect-match.html.

If a developer is not familiar with code analysis tools, then Sonar is an easy way to monitor software
quality throughout the development of a software project. Sonar ships with a good set of default
settings for its source code analysis tools, which reduces the configuration effort to almost zero.
One of the main advantages of Sonar over using source code analysis tools on their own is the
consolidated Management Console, which makes it possible to assess the current code quality at a
glance.

3 Cycles exist across a variety of modules; notably class, package and .jar. Class cycles exist when two classes, such as
Customer and Bill, each reference the other (assume Customer has a list of Bill instances, and Bill references the Customer
to calculate a discount amount). This is also known as a bi-directional association. It is a maintenance and testing issue,
since a developer cannot do anything to either class without possibly affecting the other. Class cycles can be broken a
few different ways, one of which is to introduce an abstraction that breaks the cycle.
4 http://www.clarkware.com/software/JDepend.html [09.07.10]
5 https://hudson.dev.java.net/ [11.07.10]
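
The class cycle described in footnote 3 can be illustrated with a short sketch. The interface DiscountSource and all method names are hypothetical and only serve to show one way of breaking the cycle: Bill depends on a small abstraction instead of the concrete Customer class, so Customer still references Bill, but Bill no longer references Customer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction: Bill only needs to know a discount rate, not the whole Customer.
interface DiscountSource {
    double discountRate();
}

class Bill {
    private final double amount;
    private final DiscountSource discountSource;

    Bill(double amount, DiscountSource discountSource) {
        this.amount = amount;
        this.discountSource = discountSource;
    }

    /** Amount due after applying the discount rate of the associated source. */
    double total() {
        return amount * (1.0 - discountSource.discountRate());
    }
}

class Customer implements DiscountSource {
    private final List<Bill> bills = new ArrayList<Bill>();
    private final double discountRate;

    Customer(double discountRate) {
        this.discountRate = discountRate;
    }

    public double discountRate() {
        return discountRate;
    }

    /** Creates a bill for this customer; the bill only sees the DiscountSource interface. */
    Bill addBill(double amount) {
        Bill bill = new Bill(amount, this);
        bills.add(bill);
        return bill;
    }

    double totalDue() {
        double sum = 0.0;
        for (Bill bill : bills) {
            sum += bill.total();
        }
        return sum;
    }
}
```

With this structure, Bill can be tested with any stand-in DiscountSource, without creating a Customer at all.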



11 Summary

This chapter recapitulates the findings and limitations of this project thesis and proposes a list of
further steps for the project.

11.1 Conclusion

This project thesis has shown that it is possible to enhance the detection of security flaws in web
applications by combining black and white box testing techniques.

The detection of SQL and persistent XSS injection flaws is based on common black box tests,
complemented by parsing the database query log files, which contain every SQL statement sent from
the web application to the database. The test agent is free to send various SQLi or XSS attack strings
and can immediately tell whether these inputs are filtered by the web application. Once the
triggered SQL statements appear in the database query log file, they have left the web application
and passed all the validation routines implemented by the developers. The test agent can therefore
tell whether its malicious attack strings have reached the database unfiltered.
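
The core check described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not WASTF's actual implementation: the class QueryLogCheck, the Verdict values and the idea of prefixing each attack string with a unique marker are hypothetical, and the logged statements are assumed to be available as plain strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: decides whether an attack string reached the database unfiltered.
class QueryLogCheck {

    enum Verdict { NOT_LOGGED, FILTERED, UNFILTERED }

    /**
     * Assumes the attack string starts with a unique marker. If a logged statement contains
     * the complete attack string, the input passed all validation routines unchanged. If only
     * the marker shows up, the input reached the database but was altered on the way
     * (for example escaped).
     */
    static Verdict classify(List<String> loggedStatements, String marker, String attackString) {
        boolean markerSeen = false;
        for (String statement : loggedStatements) {
            if (statement.contains(attackString)) {
                return Verdict.UNFILTERED;
            }
            if (statement.contains(marker)) {
                markerSeen = true;
            }
        }
        return markerSeen ? Verdict.FILTERED : Verdict.NOT_LOGGED;
    }

    public static void main(String[] args) {
        String marker = "wastf4711";                 // hypothetical unique marker
        String attack = marker + "' OR '1'='1";      // hypothetical SQLi attack string
        List<String> log = Arrays.asList(
                "SELECT * FROM news WHERE id = 3",
                "SELECT * FROM users WHERE name = 'wastf4711' OR '1'='1'");
        System.out.println(classify(log, marker, attack));
    }
}
```

The distinction between FILTERED and NOT_LOGGED is a design choice of this sketch: it separates inputs that were sanitised from inputs that never triggered a database query.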

A web application security testing framework (WASTF) has been developed during this project thesis
with four finished plugins so far:

• Web Form Login: A plugin for logging in to web applications automatically. It is configured
with the HTML form elements that have to be populated with the user credentials.

• Web Spider: A web spider based on the open source HtmlUnit library with JavaScript support,
scoring 81% in the Wivet benchmark (w3af has 50%).

• Database Query Log: A combined white and black box plugin which detects input validation
flaws in web applications by parsing the database query log in either an online or an offline mode.

• XML Report: A plugin for writing an XML report containing the findings of all the plugins which
have been enabled during a run.

Because of its machine readable XML report file, WASTF can easily be integrated into the Automated
Security Testing Framework (ASTF) which has been developed at the Institut für angewandte
Informationstechnologie (InIT). WASTF has been developed with extensibility in mind and in
compliance with Findbugs, Checkstyle and Sonar.



The tests in chapter 9 have shown that WASTF outperforms w3af in nearly all categories. Apart from
a small bug in spidering the Wordpress web application (see section 9.3.4), WASTF has outperformed
w3af’s web spider in either runtime or accuracy. WASTF sends more HTTP(S) requests during the
spidering phase but this is due to the JavaScript support whereas w3af ignores external JavaScript
source files.

The tests have shown that combining black and white box testing methodology detects SQL input
validation vulnerabilities more accurately and faster than a black box testing approach alone. WASTF
is 14 times faster than w3af in detecting SQL input validation vulnerabilities in web applications.
WASTF is still in an early phase and might contain bugs and therefore report false positives.
Nevertheless, WASTF correctly identified the Blind SQL Injection vulnerability reported by w3af (see
section 9.3.10) as a false positive that is not exploitable.

The online mode (see chapters 4 and 8) generates SQL exploit payloads and verifies them by actually
submitting them to the targeted web application before reporting a vulnerability, which increases
the accuracy of the reported vulnerabilities even further.

The offline mode is not as accurate as the online mode because WASTF's generated SQL exploit
payloads cannot be tested until the SQL statement triggered by the web application is known to
WASTF. An alternative would be to send SQL exploit payloads blindly, as black box security testing
applications do. This would not increase the accuracy of the vulnerability detection routine but only
the HTTP(S) request footprint. Instead, WASTF tries to find working SQL exploit payloads offline by
mutating the SQL statements found in the database query log. Because the payloads cannot be verified
against the targeted web application in the offline mode, WASTF might report false positives.
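
The offline mutation step can be pictured with the following sketch. It follows the idea named in the captions of figures 8.1 to 8.3 (a random value in the captured statement is replaced with entries of a fragment list, and a PAYLOAD token is then replaced with entries of a payload list), but the class SqlMutator and the concrete fragments and payloads are hypothetical. The thesis additionally validates candidates with JSqlParser and Apache Derby (listings 8.1 and 8.2); that step is left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the mutation idea; not WASTF's actual routine.
class SqlMutator {

    static final String PAYLOAD_TOKEN = "PAYLOAD";

    /**
     * Replaces the random value the scanner submitted with every fragment, then fills the
     * PAYLOAD token of each fragment with every payload. Returns all candidate statements.
     * Assumes the captured statement does not itself contain the text "PAYLOAD".
     */
    static List<String> mutate(String capturedStatement, String randomValue,
                               List<String> fragments, List<String> payloads) {
        List<String> candidates = new ArrayList<String>();
        if (!capturedStatement.contains(randomValue)) {
            return candidates; // the submitted value never reached this statement
        }
        for (String fragment : fragments) {
            String template = capturedStatement.replace(randomValue, fragment);
            for (String payload : payloads) {
                candidates.add(template.replace(PAYLOAD_TOKEN, payload));
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // All values below are made up for illustration.
        List<String> fragments = Arrays.asList("' OR PAYLOAD -- ", "') OR PAYLOAD -- ");
        List<String> payloads = Arrays.asList("1=1", "'a'='a'");
        for (String candidate : mutate("SELECT * FROM users WHERE id = '8421'",
                "8421", fragments, payloads)) {
            System.out.println(candidate);
        }
    }
}
```

In the offline mode such candidates can only be checked for syntactic validity, which is why unverified candidates may turn out to be false positives.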

11.2 Further Steps

This section outlines further steps which would make WASTF more mature and useful for its users.

11.2.1 Application Architecture

The application architecture should be revised and refactored where necessary. Adding more JUnit test
cases is difficult with the current application architecture due to external and internal package
dependencies. The refactoring process should follow these four steps:

1. Identify classes: Identify the classes of the system which should be tested with an appropriate
JUnit test case.

2. Find test points: Identify the methods and classes which are needed to validate the correctness
of the test.

3. Break dependencies: Change the code so that the identified class can be tested in isolation.

4. Write tests: Write the actual JUnit test case.
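
The four steps above can be sketched on a small, made-up example. The names PageFetcher and LoginChecker are hypothetical and do not refer to existing WASTF classes; the point is that the class under test receives its HTTP dependency through an interface, so a test can substitute a fake. To keep the sketch self-contained it uses a plain main method instead of JUnit.

```java
// Step 3 (break dependencies): the class under test talks to this interface
// instead of a concrete HTTP client.
interface PageFetcher {
    String fetch(String url);
}

// Step 1 (identify classes): the class that should be covered by a test.
class LoginChecker {
    private final PageFetcher fetcher;

    LoginChecker(PageFetcher fetcher) {
        this.fetcher = fetcher;
    }

    /** Step 2 (find test points): treats a page containing a logout link as "logged in". */
    boolean isLoggedIn(String url) {
        String html = fetcher.fetch(url);
        return html != null && html.contains("logout");
    }
}

// Step 4 (write tests): a fake fetcher replaces the network, so the test runs in isolation.
class LoginCheckerTest {
    public static void main(String[] args) {
        PageFetcher fake = new PageFetcher() {
            public String fetch(String url) {
                return "<a href=\"/logout\">Log out</a>";
            }
        };
        if (!new LoginChecker(fake).isLoggedIn("http://example.test/")) {
            throw new AssertionError("expected the fake page to count as logged in");
        }
        System.out.println("test passed");
    }
}
```

In a real test suite the body of main would become a JUnit test method.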



Package structures cannot be designed from the top down, which means that the package structure is
not one of the first things about the system to be designed. It evolves as the system grows, and
refactoring the package structure is therefore a natural process in agile software development
(see [10, p. 260]).

11.2.2 Features

• The database query log module only supports the MySQL database at the moment. Adding
support for other database solutions is trivial and should be done to widen the range of web
applications that can be tested.

• More white and/or black box plugins should be added to extend the feature list of WASTF.

• A graphical user interface (GUI) could eventually be added to make WASTF easier to use.

• The SQL payload generation routine should be enhanced, and more SQL validation libraries
should be added for other SQL dialects.
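
One possible way to keep the database query log module open for other databases is to hide the log format behind an interface. The following sketch is an assumption, not the existing WASTF design: QueryLogParser and MySqlGeneralLogParser are hypothetical names, and the regular expression assumes the MySQL 5.1 general query log layout (optional timestamp, connection id, command, argument). Statements spanning several log lines are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extension point: one implementation per supported database.
interface QueryLogParser {
    List<String> extractStatements(List<String> logLines);
}

// Sketch for the MySQL general query log; the line format is an assumption.
class MySqlGeneralLogParser implements QueryLogParser {

    // Optional "yymmdd hh:mm:ss" timestamp, connection id, the command "Query", then the SQL text.
    private static final Pattern QUERY_LINE = Pattern.compile(
            "^(?:\\d{6}\\s+\\d{1,2}:\\d{2}:\\d{2})?\\s*\\d+\\s+Query\\s+(.*)$");

    public List<String> extractStatements(List<String> logLines) {
        List<String> statements = new ArrayList<String>();
        for (String line : logLines) {
            Matcher matcher = QUERY_LINE.matcher(line);
            if (matcher.matches()) {
                statements.add(matcher.group(1));
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "100802 10:15:01\t   42 Connect\troot@localhost on shop",
                "\t\t   42 Query\tSELECT * FROM products WHERE id = 7",
                "100802 10:15:02\t   42 Query\tSELECT name FROM users");
        System.out.println(new MySqlGeneralLogParser().extractStatements(lines));
    }
}
```

A PostgreSQL implementation would then only have to provide its own extractStatements method for the CSV-format log.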



List of Figures

2.1 White box web application profiling

3.1 Use Case Context
3.2 Simplified UML 2 activity diagram of the application usage flow

5.1 w3af XSS Plugin JavaScript Payloads

6.1 Menu and plugin hierarchical tree structure of the WASTF framework
6.2 Simplified UML 2 Package Overview of the WASTF Application
6.3 Entity Relationship Model (ERM) of the currently implemented WASTF database schema

7.1 JavaScript execution time in various browsers for HTML DOM modifications
7.2 Simplified UML 2 activity diagram showing the main flow of the web spider
7.3 Simplified UML 2 activity diagram for the non invasive smart fill HTML form module

8.1 Fragment List used for mutating SQL Statements
8.2 Random value in the original SQL statement replaced with the Fragment List
8.3 PAYLOAD token replaced with values from the Payload List
8.4 Working SQL Code Injection exploits after mutating a captured SQL statement
8.5 Simplified UML 2 activity diagram showing the SQL mutation routine

9.1 Various pre-installed applications in the WASTF test environment virtual machine
9.2 Test Set-Up
9.3 Test case file structure on the enclosed CD-ROM
9.4 w3af’s crash message while spidering the Magento eCommerce web application
9.5 The HTTP request causing an endless loop in the web spider component
9.6 WebFormLogin plugin configuration (left side) and output (right side) for DVWA
9.7 WebFormLogin plugin configuration (left side) and output (right side) for logging into the SecureMessaging web application
9.8 The DVWA database query log plugin test case configuration (Online Mode)
9.9 The DVWA database query log plugin test case working payloads (Online Mode)
9.10 w3af HTTP log showing a request which triggered a false positive Blind SQL Injection vulnerability

10.1 WASTF Installer
10.2 WASTF Report XML File



List of Tables

4.2 Typical exploit code used by security testing applications to detect XSS and SQLi injection vulnerabilities
4.4 Simplified performance comparison of the “Online” and “Offline” strategies for detecting input validation vulnerabilities by parsing the database query log

5.1 w3af’s WebSpider plugin settings
5.2 w3af’s XSS plugin settings

9.1 Results of the WIVET [Web Spider] test case
9.2 Results of the Magento [Web Spider] test case
9.3 Results of the Wordpress [Web Spider] test case
9.4 Results of the Flowershop [Web Spider] test case
9.5 Results of the DVWA [Database Query Log - Online Mode] test case
9.6 Results of the Wordpress [Database Query Log Online Mode & Web Spider] test case
9.7 Results of the Wordpress [Database Query Log Offline Mode & Web Spider] test case

10.1 WASTF http-settings menu
10.2 WASTF webFormLogin plugin settings
10.3 WASTF webSpider plugin settings
10.4 WASTF misc-settings settings
10.5 WASTF databaseQueryLog plugin settings
10.7 Free/Third Party libraries used by WASTF (1 of 2)
10.9 Free/Third Party libraries used by WASTF (2 of 2)
10.10 Sonar’s 7 axes of code quality



Listings

2.1 EJB3 Security Annotations Example
2.2 Sample Apache access.log output
2.3 Sample Apache error.log output
2.4 Sample Apache mod_log_forensic module output
2.5 Sample top output
2.6 Simple Spring URL mapping example prior to version 2.5
2.7 Simple Struts URL mapping example for version 2
2.8 Simple Struts validator example for version 2
2.9 MySQL query log example
2.10 MySQL table information example

4.1 Enabling of the MySQL query log through the configuration file
4.2 Enabling of the MySQL query log at runtime
4.3 Definition of the MySQL query log table
4.4 Content of the MySQL query log table
4.5 Enabling the query log through the PostgreSQL configuration file
4.6 Content of the PostgreSQL query log file (CSV-format output)
4.7 Definition of the PostgreSQL query log table
4.8 Pseudo code for detecting input validation vulnerabilities by parsing the database query log at runtime
4.9 Pseudo code for detecting input validation vulnerabilities by parsing the database query log as a post scanning process

5.1 Configuring basic HTTP authentication in w3af
5.2 Basic HTTP authentication headers
5.3 Setting a HTTP headers file in w3af
5.4 Contents of a possible HTTP headers file in w3af
5.5 Setting a Cookie JAR file in w3af

6.1 Starting the WASTF framework in the interactive mode
6.2 Starting the WASTF framework in the automated mode
6.3 Content of the wastfScript.txt WASTF script file

7.1 Short HtmlUnit example for submitting a login form through a proxy server
7.2 Dynamically created HTML login form with embedded JavaScript code

8.1 Validating a SQL statement with JsqlParser
8.2 Validating a SQL statement with Apache Derby

Bibliography

[1] Mike Andrews and James A. Whittaker. How to Break Web Software: Functional and Security
Testing of Web Applications and Web Services. Addison-Wesley Professional, 2006.

[2] Kevin Denver. Development and Integration of ASTF-Plugins in the field of Web Application
Vulnerabilities. MSE Project Thesis 1, February 2009.

[3] Kevin Denver. Evaluation geeigneter Plugins im Bereich Web Application Vulnerabilities für die
Integration ins ASTF. MSE Seminary Thesis 1, November 2009.

[4] Kevin Denver. Preliminary Study of white box security testing for the integration into the ASTF
framework. MSE Seminary Thesis 2, February 2010.

[5] J. Franks, P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen, and L. Stewart. HTTP
Authentication: Basic and Digest Access Authentication. RFC 2617 (Draft Standard), June 1999.

[6] Patrice Godefroid, Michael Y. Levin, and David A Molnar. Automated whitebox fuzz testing. In
Network Distributed Security Symposium (NDSS). Internet Society, 2008.

[7] Paco Hope and Ben Walther. Web Security Testing Cookbook: Systematic Techniques to Find
Problems Fast. O’Reilly Media, Inc., 2008.

[8] David Chenho Kung, Chien-Hung Liu, and Pei Hsia. An object-oriented web test model for
testing web applications. In COMPSAC ’00: 24th International Computer Software and Applications
Conference, pages 537–542, Washington, DC, USA, 2000. IEEE Computer Society.

[9] Craig Larman. Applying UML and Patterns: An Introduction to Object-Oriented Analysis and
Design and Iterative Development (3rd Edition). Prentice Hall PTR, Upper Saddle River, NJ, USA,
2004.

[10] Robert Cecil Martin. Agile Software Development: Principles, Patterns, and Practices. Prentice
Hall PTR, Upper Saddle River, NJ, USA, 2003.

[11] Michael C., Lavenhar Steven R. Building security in: Source code analysis tools - overview.
https://buildsecurityin.us-cert.gov/bsi/articles/tools/code/263-BSI.html, 2009.

[12] MySQL 5.1 Reference Manual. The General Query Log.
http://dev.mysql.com/doc/refman/5.1/en/query-log.html, 2010.

[13] PostgreSQL 8.3.10 Documentation.
http://www.postgresql.org/docs/8.3/static/runtime-config-logging.html, 2010.

[14] Joel Scambray, Mike Shema, and Caleb Sima. Hacking Exposed Web Applications, Second Edition.
McGraw-Hill, Inc., New York, NY, USA, 2006.

[15] Paolo Tonella and Filippo Ricca. Dynamic model extraction and statistical analysis of web
applications. In WSE ’02: Proceedings of the Fourth International Workshop on Web Site Evolution
(WSE’02), page 43, Washington, DC, USA, 2002. IEEE Computer Society.

[16] Paolo Tonella and Filippo Ricca. A 2-layer model for the white-box testing of web applications. In
WSE ’04: Proceedings of the Web Site Evolution, Sixth IEEE International Workshop, pages 11–19,
Washington, DC, USA, 2004. IEEE Computer Society.

JavaScript DOM Modifications Performance Test (all values in ms)

Web Browser         Run   createElement  createTextNode  cloneNode  appendChild  insertBefore  innerHTML   Total
HtmlUnit 2.7        #1              269              92        584           65            66        300    1376
                    #2              216              97        537           60            62        292    1264
                    #3              242             177        607           62            68        314    1470
                    #4              202              74        612           30            52        258    1228
                    #5              244             106        564           60            55        301    1330
                    Mean          234.6           109.2      580.8         55.4          60.6        293  1333.6

Firefox 3.6.3       #1               26               8         32           80            95         96     337
                    #2               25              11         29           83           151         85     384
                    #3               26              10         32           85           141         99     393
                    #4               25              11         30           84           105         87     342
                    #5               25               9         30           81            94         91     330
                    Mean           25.4             9.8       30.6         82.6         117.2       91.6   357.2   Δ 4

Chromium 6.0.432.0  #1                3               4         14           11            11         30      73
                    #2                3               4          8           10            10         33      68
                    #3                2               4         10           12            13         31      72
                    #4                2               3         10           12            10         28      65
                    #5                2               3         12           10            10         32      69
                    Mean            2.4             3.6       10.8           11          10.8       30.8    69.4   Δ 19
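
The unlabelled summary rows and the Δ values can be reproduced from the individual runs. The sketch below assumes that each summary row is the arithmetic mean of the five runs and that Δ is the ratio of HtmlUnit's mean total to the respective browser's mean total, rounded to the nearest integer; the table itself does not state this, but the numbers are consistent with that reading. The class name BenchmarkSummary is hypothetical.

```java
// Recomputes the mean totals and the slowdown factors from the table above.
class BenchmarkSummary {

    /** Arithmetic mean of the measured values. */
    static double mean(int[] values) {
        double sum = 0.0;
        for (int value : values) {
            sum += value;
        }
        return sum / values.length;
    }

    /** Ratio of the slower to the faster mean, rounded to the nearest integer. */
    static long slowdownFactor(double slowerMean, double fasterMean) {
        return Math.round(slowerMean / fasterMean);
    }

    public static void main(String[] args) {
        // Total column of the five runs per browser, copied from the table.
        double htmlUnit = mean(new int[] {1376, 1264, 1470, 1228, 1330});
        double firefox = mean(new int[] {337, 384, 393, 342, 330});
        double chromium = mean(new int[] {73, 68, 72, 65, 69});

        System.out.println("Mean totals [ms]: " + htmlUnit + ", " + firefox + ", " + chromium);
        System.out.println("HtmlUnit vs Firefox:  factor " + slowdownFactor(htmlUnit, firefox));
        System.out.println("HtmlUnit vs Chromium: factor " + slowdownFactor(htmlUnit, chromium));
    }
}
```

Under this reading, HtmlUnit 2.7 needs roughly 4 times as long as Firefox 3.6.3 and roughly 19 times as long as Chromium 6.0.432.0 for the same DOM modifications.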

Kevin Denver 15.06.2010 Page 1


Kevin Denver HtmlUnit JavaScript Performance Benchmark MSE Project Thesis 2

HtmlUnit Java Benchmark


import java.io.IOException;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.List;

import com.gargoylesoftware.htmlunit.CollectingAlertHandler;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitBenchmark {
    public static void main(final String[] args)
            throws FailingHttpStatusCodeException, MalformedURLException,
                   IOException, InterruptedException {
        final WebClient webClient = new WebClient();
        // The benchmark page reports every timing through alert(); collect the messages here.
        final List<String> collectedAlerts = new ArrayList<String>();
        webClient.setCssEnabled(false);
        webClient.waitForBackgroundJavaScriptStartingBefore(9000);
        webClient.setThrowExceptionOnScriptError(false);
        webClient.setAlertHandler(new CollectingAlertHandler(collectedAlerts));
        final HtmlPage page =
                webClient.getPage("file:///home/kevin/Desktop/dom-modify.html");

        // The page sets its title to "Finished" once all tests have run.
        while (!page.getTitleText().equalsIgnoreCase("Finished")) {
            synchronized (page) {
                page.wait(1);
            }
        }
        System.out.println(collectedAlerts);
    }
}

HTML Page with embedded JavaScript


...
<script>
// Small stopwatch: setStartTime() stores the current time, getDiff() returns the elapsed ms.
var timeDiff = {
    start: 0,

    setStartTime: function () {
        this.start = new Date().getTime();
    },

    getDiff: function () {
        return new Date().getTime() - this.start;
    }
};

// Try to force real results
var ret, tmp, str = "";

var elems = [];
var htmlstr = document.body.innerHTML;
var div = document.createElement("div");
var num = 400;

for (var run = 0; run < 1; run++) {
    str += String.fromCharCode( (25 * Math.random()) + 97 );

    // createElement
    timeDiff.setStartTime();
    for ( var i = 0; i < num; i++ ) {
        ret = document.createElement("div");
        ret = document.createElement("span");
        ret = document.createElement("table");
        ret = document.createElement("tr");
        ret = document.createElement("select");
    }
    alert("createElement(): " + timeDiff.getDiff());

    // createTextNode
    timeDiff.setStartTime();
    for ( var i = 0; i < num; i++ ) {
        ret = document.createTextNode(str);
        ret = document.createTextNode(str + "2");
        ret = document.createTextNode(str + "3");
        ret = document.createTextNode(str + "4");
        ret = document.createTextNode(str + "5");
    }
    alert("createTextNode(): " + timeDiff.getDiff());

    // innerHTML
    timeDiff.setStartTime();
    document.body.innerHTML = htmlstr;
    alert("document.body.innerHTML: " + timeDiff.getDiff());

    // Take a static copy of the child nodes for the remaining tests
    elems = [];
    var telems = document.body.childNodes;
    for ( var i = 0; i < telems.length; i++ ) {
        elems.push( telems[i] );
    }

    // cloneNode
    timeDiff.setStartTime();
    for ( var i = 0; i < elems.length; i++ ) {
        ret = elems[i].cloneNode(false);
        ret = elems[i].cloneNode(true);
        ret = elems[i].cloneNode(true);
    }
    alert("cloneNode(): " + timeDiff.getDiff());

    // appendChild
    timeDiff.setStartTime();
    for ( var i = 0; i < elems.length; i++ ) {
        document.body.appendChild( elems[i] );
    }
    alert("appendChild(): " + timeDiff.getDiff());

    // insertBefore
    timeDiff.setStartTime();
    for ( var i = 0; i < elems.length; i++ ) {
        document.body.insertBefore( elems[i], document.body.firstChild );
    }
    alert("insertBefore(): " + timeDiff.getDiff());

    // Signals the Java benchmark that all tests have run
    document.title = "Finished";
}

alert("Finished");
</script>

18.06.10
InIT Institut für angewandte Informationstechnologie

Project Thesis

Student: Kevin Denver
Supervisor: Prof. Dr. Marc Rennhard
Industry partner: PrivaSphere AG
Advisor: Prof. Dr. Marc Rennhard
Issued: 4 March 2010
Scope: 14 ECTS (420 h)
Submission: 2 August 2010
Presentation: TBA
Signature:

Development of a combined black and white box security testing tool for detecting injection
flaws in web applications

Introduction and Motivation

The Automated Security Testing Framework (ASTF) was developed jointly by InIT and PrivaSphere AG
as part of a KTI project. ASTF combines various security testing tools in order to test online
applications for security in a consistent and reproducible way. To do so, ASTF uses tools from the
areas of black box testing (e.g. vulnerability scans from the outside) and white box testing
(e.g. static source code analysis).

A preceding seminar thesis [1] examined which further white box security testing approaches are
possible and would therefore be suitable for integration into ASTF. The focus was on security tests
of web applications, e.g. uncovering vulnerabilities in the areas of injection flaws or access
control. The seminar thesis also made a concrete proposal as to which of the suggested tests should
be implemented in this project thesis, and the present assignment follows that proposal.

The approach proposed in the seminar thesis is briefly presented below; for the exact details,
consult the seminar thesis report [1]. The basic goal is to detect injection flaws in web
applications. Today this is usually done by running a pure black box scan (e.g. with w3af), after
which the testing tool tries to recognise from the responses of the web application whether a
vulnerability is present. The problem with this approach is that it is often not very precise and
produces many false positives and false negatives. The seminar thesis proposes to combine this
approach with an analysis of the server-side database log files, in which the database queries are
usually logged. The view of the server-internal database log file is thus the white box component
that improves the actual black box test from the outside. This makes it easy to recognise, for
example, whether a string sent by the scanner finds its way to the database unfiltered (in which
case it will show up in the log file), which enables the detection of SQL injection and persistent
(or stored) XSS attacks. A corresponding testing tool would work as follows:
• A crawler generates a list of the URLs of the application under test that is as complete as
possible.
• Based on these URLs, targeted injection attacks are carried out, i.e. parts of SQL statements
or JavaScript code are sent to the web application via GET or POST parameters.
• An analysis of the database log file and its correlation with the submitted requests allows a
precise statement about existing injection vulnerabilities. To enable access to the database
log files, a corresponding software component is required on the system under test.

03.03.2010, Marc Rennhard (rema)

Schedule

The time frame of this project thesis has been chosen so that Mr Denver can complete the 420 hours
alongside his workload from the MSE studies (3 modules of 3 ECTS each in the spring semester 2010)
and his work as an assistant at InIT.

Assignment

Your task is to develop a testing tool which, as described above, enables the detection of injection
flaws in web applications by combining black and white box testing methods. The focus should be on
the approach proposed in the seminar thesis, although this approach may well be adapted in the
course of the project thesis as new insights emerge.
The requirements for the testing tool to be developed are as follows:
• The web crawler used should be able to determine, as far as possible, all URLs of a given web
application. Ideally, the crawler can also handle JavaScript. The crawler should cope with as
many web applications as possible (e.g. Secure Messaging by PrivaSphere, Apache Wicket etc.)
and should also be able to log in to the application to be crawled on its own.
• Your testing tool should in principle be able to handle a variety of database log files. As a
first step, and within the scope of the project thesis, at least MySQL should be supported.
• The software component on the server system should be as small and minimally invasive as
possible. Within the scope of the project thesis, this component should run at least on a
Linux system.
• At present, the proposed approach makes sense for the detection of SQL injection and persistent
XSS attacks. At the beginning of the project thesis, reconsider carefully whether other attacks
could also be detected with little additional effort (e.g. by analysing further server-side
log files).
• The testing tool should be easy to integrate into ASTF. It must therefore provide a command-line
interface and generate structured output, ideally XML-based.



• The whole test should run as efficiently as possible. This means that large web applications
can also be tested in reasonable time (e.g. a few hours at most).
• To what extent you base your testing tool on an existing tool (w3af) is up to you.
• As an optional possibility, the seminar thesis stated that the crawler's results can be further
improved by analysing the configuration files of the web application framework in use. If this
is used in the project thesis, the focus should be on the Struts framework used by PrivaSphere.
A further option would be to use the validation.xml file of the Apache Commons Validator
framework to obtain information about the validation of input data.
• Write a comprehensible technical report for project-internal use which covers both the
procedure and the results. This report should be written in English.
• Prepare a 15-minute presentation to be given within the institute.

Assessment

The assessment of the project thesis is composed of the following items:
• Written report
• Oral presentation

References

[1] Kevin Denver. Preliminary Study of white box security testing for the integration into
the ASTF framework. MSE Seminary Thesis, February 2010.

