Kevin Denver
This project thesis builds upon the findings of the seminar thesis "Preliminary study of white box
security testing for the integration into the ASTF framework" which has been written at the Institut
für angewandte Informationstechnologie (InIT) by Kevin Denver (Feb. 2010).
During this project thesis, a web application security testing framework (WASTF) has been developed
in the Java programming language. WASTF is a stand-alone application which uses several
open source libraries such as HtmlUnit. HtmlUnit provides JavaScript support, which is essential for
thoroughly crawling and analysing today's web applications.
WASTF includes a plugin which combines black and white box testing techniques for improving the
detection of input validation vulnerabilities as opposed to a sole black box approach. The plugin uses
database query log files to detect web application input validation vulnerabilities in an automated
manner.
Tests have shown that the detection of SQL input validation vulnerabilities is more accurate and
faster when black and white box testing methodologies are combined than with a black box
testing approach alone. WASTF is 14 times faster than w3af in detecting SQL input validation
vulnerabilities.
Abstract
This project thesis builds on the results of the seminar thesis entitled "Preliminary study of
white box security testing for the integration into the ASTF framework". That thesis was written at the
Institut für angewandte Informationstechnologie (InIT) by Kevin Denver (Feb. 2010).
During this project thesis, a framework was written which automatically tests web applications
for security flaws. The Web Application Security Testing Framework (WASTF) is a stand-alone,
console-based application written in Java. WASTF uses several open source libraries such as
HtmlUnit, a library which can successfully emulate web browsers with JavaScript support. JavaScript
support is essential for thoroughly analysing today's web applications.
WASTF includes a plugin which combines black and white box testing methods. This combination
is intended to improve the detection of input validation vulnerabilities compared to black box
only testing methods. The plugin uses database query log files to detect SQL input validation
vulnerabilities automatically.
Subsequent tests have shown that combining black and white box testing methods can increase
both the accuracy and the speed of vulnerability detection. One of the conducted tests showed
that WASTF is 14 times faster than a comparable black box only web application scanner (w3af).
Contact
E-Mail: info.init@zhaw.ch
http://www.zhaw.ch
http://www.init.zhaw.ch
Contents
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
3 Application Requirements 15
3.1 Overall Goal & Project Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2.1 Functional Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2.2 Usability Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2.3 Reliability Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2.4 Performance Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2.5 Supportability Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 User Stories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4 The Database Query Log & How to Detect Input Validation Vulnerabilities 23
4.1 Configuration of the Database Query Log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.1.1 MySQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.1.2 PostgreSQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1.3 Microsoft SQL Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
9 Test Series 78
9.1 Test Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
9.1.1 Base System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
9.1.2 Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
9.1.3 Usernames and Passwords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
9.2 Web Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
9.2.1 Index & Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
9.2.2 phpMyAdmin Version 3.3.2deb1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
9.2.3 WIVET Version 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
9.2.4 Damn Vulnerable Web Application (DVWA) Version 1.0.6 . . . . . . . . . . . . . . . . . . 81
9.2.5 Wordpress Version 3.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
9.2.6 Magento Version 1.4.1.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
9.2.7 Flowershop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
9.3 Test Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
9.3.1 Test Set-Up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
9.3.2 Test Case: WIVET [Web Spider only] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
9.3.3 Test Case: Magento [Web Spider only] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
9.3.4 Test Case: Wordpress [Web Spider only] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
9.3.5 Test Case: Flowershop [Web Spider only] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
9.3.6 Test Case: DVWA [Web Form Login only] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
9.3.7 Test Case: Secure Messaging (Privasphere) [Web Form Login only] . . . . . . . . . . . 87
9.3.8 Test Case: DVWA [Database Query Log - Online Mode] . . . . . . . . . . . . . . . . . . . 87
9.3.9 Test Case: DVWA [Database Query Log - Offline Mode] . . . . . . . . . . . . . . . . . . . 89
9.3.10 Test Case: Wordpress [Web Spider & Database Query Log - Online Mode] . . . . . . 89
9.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
11 Summary 105
11.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
11.2 Further Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
11.2.1 Application Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
11.2.2 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Appendix 108
List of figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
List of tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
List of listings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
• Chapter 1: The first chapter gives a brief overview of the main aspects of this project thesis. This
includes the motivation behind the project and how combining white and black box testing techniques
improves the detection of injection flaws in web applications. Additionally, this chapter lists the
limitations that were left out of scope.
• Chapter 2: Summarises the findings of [4] which serve as a basis for the development of a combined
white and black box web application security testing application throughout this project thesis.
• Chapter 3: Documents the first set of requirements for a combined white and black box security testing
application used for detecting injection flaws in web applications. The chapter contains a list of user
stories and application requirements the security testing application needs to address.
• Chapter 4: This chapter covers configuring and parsing database query logs of various database
products such as MySQL, PostgreSQL, Oracle and Microsoft SQL Server. Most if not all recent database
solutions provide some kind of logging facility to monitor the SQL queries received from clients.
However, the metadata accompanying the logged SQL queries differs greatly in its level of detail
between database products.
• Chapter 5: Evaluates w3af as a basis for a combined white and black box security testing application.
w3af is an actively developed open source project with a lot of existing plugins including a web crawler
component and several code injection modules. It might be possible to enhance w3af with the missing
white box functionality.
• Chapter 6: Outlines the design decisions and overall package structure of the Web Application Security
Testing Framework (WASTF) which has been developed during this project thesis. The framework builds
the foundation for the combined white and black box web application security testing plugin. One pos-
sibility would have been to extend the w3af framework but this has been ruled out in chapter 5 because
of w3af’s shortcomings in certain areas such as the web spider plugin.
• Chapter 7: Describes the open source HtmlUnit project and the benefits it provides for testing web
applications automatically. Additionally, the design and performance of the web spider plugin, which
has been developed as part of this project thesis, are being discussed and visualised.
• Chapter 8: Outlines the final implementation details of the previously elaborated combined white
and black box plugin based on parsing database query log files, in particular how the plugin uses
SQL syntax validation libraries to find working exploits for the SQL injection vulnerabilities it
discovers.
• Chapter 9: Is concerned with testing the WASTF application thoroughly, especially in comparison with
standard black box web application security testing tools such as w3af. The goal is to provide a detailed
comparison between WASTF and w3af in terms of sent HTTP(S) requests, number of detected vulnera-
bilities, web pages discovered etc.
• Chapter 10: This chapter contains the user manual and the development guide for WASTF. The user
manual leads new users through the installation process of WASTF and helps them get familiar with its
commands. The development guide is for advanced users who want to extend the functionality of WASTF
and build it from source.
• Chapter 11: Recapitulates the findings and limitations of this project thesis and proposes a list of further
steps for the project. These steps include some architectural refactoring and adding new features for
extending the possible applications of WASTF.
1 Introduction
This first chapter gives a brief overview of the main aspects of this project thesis. This includes the
motivation behind this project and how combining white and black box testing techniques improves
the detection of injection flaws in web applications.
1.1 Motivation
The Automated Security-Testing Framework (ASTF) which has been developed at the Institut für
angewandte Informationstechnologie (InIT) as part of a KTI-Project, combines different security and
non-security testing tools. The main focus of ASTF lies in doing continuous and reproducible soft-
ware tests during the lifetime of an application. ASTF relies on third party tools which perform the
actual tests. Tools like nmap1 or Nessus2 are used to scan for open ports or known security holes.
The detection of typical web application vulnerabilities like Cross Site Scripting and SQL Injection
is covered by w3af3 . Yet all of these tools take an external perspective of the test object to derive test
cases. This method of testing is also called black box testing. While this method can uncover flaws in
the application, one cannot be sure that all existent paths are tested. White box testing on the other
hand takes more of an internal perspective on the test object and uses as much additional informa-
tion as possible to derive test cases. Such additional information can consist of the actual source code
of the application or metadata about the internal structure of the application.
One of the future goals of ASTF is to provide a mechanism to conduct white box security tests in an
automated way on different kinds of web applications whether they are written in PHP, Ruby, Java
or any other kind of server side scripting language. The objective of this project thesis is to build a
working web application security testing tool by combining white and black box testing techniques
as described in [4].
1.2 Limitations
A combined white and black box security testing application for detecting input validation vulnera-
bilities in web applications has been written during this project thesis with the following limitations:
• The database query log module only supports the MySQL database so far.
• The vulnerability detection routine does not consider different encodings of input characters.
If a web application consistently filters input characters such as < but not the URL-encoded
1 http://nmap.org [15.02.2010]
2 http://www.nessus.org/nessus/ [15.02.2010]
3 http://w3af.sourceforge.net/ [15.02.2010]
Chapter 1. Introduction | 1
equivalent %3C, the application is still vulnerable, but the routine will not yet detect the
vulnerability.
These limitations are merely missing features and can be easily added to the existing application in
another development cycle.
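The encoding gap described above can be sketched in a few lines of Java. The filter, payloads and class name below are invented for illustration and are not taken from WASTF:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the encoding limitation: a naive filter
// strips the literal character '<' but does not normalise URL-encoded
// input first, so the encoded form %3C slips through.
public class EncodingBypass {

    // Naive filter: removes raw '<' characters only.
    static String naiveFilter(String input) {
        return input.replace("<", "");
    }

    public static void main(String[] args) {
        // The raw payload is neutralised ...
        System.out.println(naiveFilter("<script>"));   // script>
        // ... but the encoded payload passes the filter untouched and
        // decodes back to the dangerous form afterwards.
        String filtered = naiveFilter("%3Cscript%3E");
        System.out.println(URLDecoder.decode(filtered, StandardCharsets.UTF_8)); // <script>
    }
}
```

A detection routine that canonicalises input encodings before comparing payload and logged query would close exactly this gap.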
1.3 Related Work
There exists a fairly large number of white papers on the topic of white box security testing of web
applications. Yet all of these papers focus either on:
• parsing the source code of the application and deriving test cases out of it (see [16],[6])
• describing an application layout for writing a test management system (see [8, p.5])
• retrieving an object model of a web application containing all static and dynamic pages as well
as the allowed navigation links between the pages (see [15])
None of the white papers listed in the reference section of this study tries to combine a classical
black box security testing approach with additional information sources coming from an internal
perspective of the test object. The next chapter summarises the identified information sources as
illustrated in [4] and how these sources are used in this project thesis to enhance the detection
of injection flaws in web applications.
2 Combined White and Black Box Security Testing
The following chapter summarises the findings of [4] which serve as a basis for the development of a
combined white and black box web application security testing application throughout this project
thesis. This chapter briefly describes the information sources available to a white box security testing
agent, given that the agent has access to the machine on which the test object is deployed. Based on the
list of available information sources, one of many useful test cases will be described in which the
additional information comes in handy for improving the test results of a black box only test. By
using the additional information, it should be possible to detect bugs and flaws in the tested applica-
tion which would not have surfaced with a sole black box approach to testing. Generally speaking,
the results should be more detailed and more extensive.
Information sources are available to a white box security testing agent by introducing an internal
perspective on the test object. These information sources should help to improve the overall test
results obtained from a black box test. This chapter tries to list and describe some of the possible
information sources available to a test agent given he has access to the machine on which the test
object is deployed.
The following list is certainly not complete but gives a good overview over some possible sources
which could be accessed during a combined black and white box test. Depending on the pro-
gramming language and web application framework used to develop the application, even more
information sources might become available such as configuration files or framework specific code
annotations. There exists an almost uncountable number of web programming languages and
frameworks, such as Spring1 , Struts2 and Wicket3 for Java and CakePHP4 for the PHP programming
language. Identified information sources can be used before the actual test run begins, during the test
run as a real-time feedback from the test object or after the test run has been finished. Information
sources which are accessed before the test begins might be used to derive tailored test cases for the
test object. The information sources can generally be divided into two categories: static information
like configuration files and dynamic sources like log files.
Let’s start with something obvious. A test agent might have access to the source code management
server which contains the latest source code of the test object. With access to the source code
repository, the test agent automatically gains the possibility to query the following metadata:
By looking closely at the source code the test agent might even find more metadata inside the code.
Especially classes and methods written in the Java programming language could contain Java anno-
tations. Java annotations were introduced in version 1.5 and come in very handy for analysing source
code. Annotations do not directly affect program semantics, but they do affect the way programs
are treated by tools and libraries, which can in turn affect the semantics of the running program.
Annotations can be read from source files, class files, or reflectively at run time. Listing 2.1 shows a
simple Java EJB Bean with security annotations @RolesAllowed on line 2 and @PermitAll on line
9. These are used to inform the Java Bean container only to allow users with the bankemployee
role to access these functions; except for the method findCustomer which can be accessed by any
user regardless of his role. This concept of adding metadata to the source code is certainly not new
and other programming languages might have similar features available. Any kind of source code
metadata is useful in trying to derive test cases tailored to the test object.
1  @Stateless
2  @RolesAllowed("bankemployee")
3  public class BankServiceBean implements BankService {
4
9      @PermitAll
10     public Customer findCustomer(int custId) {
11         return ((Customer) em.find(Customer.class, custId));
12     }
13     public void addCustomer(int custId, String firstName, String lastName) {
14         cust = new Customer();
15         cust.setId(custId);
16         cust.setFirstName(firstName);
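As a sketch of how a test agent could read such security metadata reflectively, the following self-contained example re-declares simplified RolesAllowed and PermitAll annotations (stand-ins for the real javax.annotation.security types, so the snippet needs no external dependency) and derives the role required to call each method:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Sketch: deriving access-control metadata from annotations at run time.
// The annotation types below are simplified stand-ins declared locally
// so the example is self-contained.
public class AnnotationScanner {

    @Retention(RetentionPolicy.RUNTIME)
    @interface RolesAllowed { String value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @interface PermitAll { }

    @RolesAllowed("bankemployee")
    static class BankServiceBean {
        @PermitAll
        public Object findCustomer(int custId) { return null; }
        public void addCustomer(int custId, String first, String last) { }
    }

    // Returns the role required to call the given method, or null if the
    // method is open to every caller (class-level role, unless @PermitAll
    // overrides it on the method).
    static String requiredRole(Method m) {
        if (m.isAnnotationPresent(PermitAll.class)) return null;
        RolesAllowed classLevel = m.getDeclaringClass().getAnnotation(RolesAllowed.class);
        return classLevel != null ? classLevel.value() : null;
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method find = BankServiceBean.class.getMethod("findCustomer", int.class);
        Method add = BankServiceBean.class.getMethod("addCustomer", int.class, String.class, String.class);
        System.out.println(requiredRole(find)); // null
        System.out.println(requiredRole(add));  // bankemployee
    }
}
```

A test agent with this information could, for example, flag methods reachable without any role as priority targets for injection tests.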
Another approach for the derivation of dynamically created and tailored test cases is presented in the
white paper of Patrice Godefroid et al. (see [6]) by analysing the test object at runtime. Taken from
the abstract:
Fuzz testing is an effective technique for finding security vulnerabilities in software. Tra-
ditionally, fuzz testing tools apply random mutations to well-formed inputs of a program
and test the resulting values. We present an alternative white box fuzz testing approach
inspired by recent advances in symbolic execution and dynamic test generation. Our ap-
proach records an actual run of the program under test on a well-formed input, symbol-
ically evaluates the recorded trace, and gathers constraints on inputs capturing how the
program uses these. The collected constraints are then negated one by one and solved
with a constraint solver, producing new inputs that exercise different control paths in the
program. This process is repeated with the help of a code-coverage maximizing heuristic
designed to find defects as fast as possible.
The only disadvantage of this approach is that the authors focused on applications which are
compiled and available to the test agent as executable machine code. In the case of web
applications a similar approach might work by adding profiling modules to the Zend engine5
if the application is written in PHP (the Zend engine is the compiler and runtime for the PHP
programming language and is included in the mod_php module of the Apache webserver), or by
using profiling applications for the Java programming language. For a list of available open source
Java profilers visit: http://java-source.net/open-source/profilers. For example, λProbe6 is a self-sufficient
web application which helps to visualise various parameters of an Apache Tomcat instance in real time.
λProbe is designed to work specifically with Tomcat (JBoss compatibility has been added recently), so
it is able to access far more information than is normally available to JMX agents. The JMX technology
provides the tools for building distributed, web based, modular and dynamic solutions for managing
and monitoring devices, applications, and service-driven networks. The following list is a subset of
the features available through λProbe:
5 http://www.zend.com/en/ [21.02.10]
6 http://www.lambdaprobe.org [21.02.10]
• Display of deployed applications, their status, session count, session object count, context ob-
ject count, datasource usage etc.
• Display of session attributes and their values for a particular application. Ability to remove
session attributes
• Ability to group datasource properties by URL to help visualizing impact on the databases
• Display of system information including System.properties, memory usage bar and OS details
• Real time OS memory usage, swap usage and CPU utilisation monitoring
• Ability to show information about log files and download selected files
Configuration files of daemons or any other kind of application contain valuable information as well.
Depending on the application such configuration files might contain a list of users which are allowed
to access the service or a list of modules which are enabled etc. Of course the kind of information
stored in configuration files differs greatly between applications and the possibility is high that they
won’t be useful at all.
Trace files are text files, which are automatically created by an application to record error messages
that are useful when troubleshooting a problem. Whenever an error occurs in the application, it
creates a trace file to write the error message. Depending on the type of the error message, the appli-
cation might create two types of trace files:
1. Background trace files: Background trace files are created to record exceptions and errors gen-
erated during operations. Whenever background processes are unable to function normally
due to any reason, they create background trace files.
If the test application uses the Apache webserver then there exists the possibility to enable detailed
logging of the requests received by the daemon. The Apache access.log records all requests
processed by the server. The format of the access log is highly configurable. The format is specified
using a format string that looks much like a C-style printf(1) format string. See the Apache
webserver documentation for more details on how to configure the format of the output string:
http://httpd.apache.org/docs/2.0/logs.html
The error.log is the place where the Apache webserver will send diagnostic information and
record any errors that it encounters in processing requests. It is the first place to look when a problem
occurs with starting the server or with the operation of the server, since it will often contain details of
what went wrong and how to fix it. The format of the error log is relatively free-form and descriptive.
But there is certain information that is contained in most error log entries. A very wide variety of
different messages can appear in the error log. The error log will also contain debugging output from
CGI scripts. Any information written to stderr by a CGI script will be copied directly to the error log.
It is not possible to customise the error log by adding or removing information. However, error log
entries dealing with particular requests have corresponding entries in the access log. For example,
the example entry in listing 2.3 corresponds to an access log entry with status code 403. Since it is
possible to customise the access log, you can obtain more information about error conditions using
that log file. During testing, it is often useful to continuously monitor the error log for any problems.
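As an illustration of how a testing tool might consume such access log entries, the following sketch parses one line in the Common Log Format. The sample entry and class name are invented, and a real log line may differ depending on the configured format string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: extracting the HTTP status code from one Apache
// access.log entry in Common Log Format. The sample line is invented.
public class AccessLogParser {

    // host ident user [date] "request" status bytes
    static final Pattern CLF = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)");

    static int statusOf(String line) {
        Matcher m = CLF.matcher(line);
        if (!m.find()) throw new IllegalArgumentException("not CLF: " + line);
        return Integer.parseInt(m.group(6));
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [10/Oct/2010:13:55:36 +0200] "
                    + "\"GET /index.php?id=1 HTTP/1.1\" 403 287";
        // A 403 here should have a corresponding, more detailed entry in
        // the error log, as described above.
        System.out.println(statusOf(line)); // 403
    }
}
```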
The optional mod_log_forensic module is an often forgotten yet very handy tool in debugging the
Apache webserver. It gives each request a unique id which can then be used to track through the log
file. It first writes the request prefixed with the unique id, then it writes the same id once the request
is completed. Very useful to spot scripts which never finish, be it due to client or server issues. Listing
2.4 shows an entire request including browser information, cookies etc. The only downside to this
module is that the output format is fixed and can not be changed.
Resource monitors provide an ongoing look at processor activity in real time, listing various system
information such as the most CPU-intensive tasks, memory consumption, network activity etc. The most
famous tool under Linux is the top command. See listing 2.5 for a sample output.
Depending on the framework the developer used to implement the test object, there might exist ad-
ditional configuration files especially needed for the framework. For example: if the web application
uses the Spring framework prior to version 2.5, the developer had to configure URL mappings inside
an XML configuration file. If a user wants to access a specific resource by entering a URL, the Java
Bean container loads the Java object associated with the given URL. Nowadays this is also configured
by using Java annotations directly inside the source code. See listing 2.6 for an example of such a
mapping XML file.
Listing 2.6: Simple Spring URL mapping example prior to version 2.5
Looking at yet another web application framework, Struts for the Java programming language,
one finds a file called struts.xml. The Struts 2 framework uses the configuration file struts.xml to
initialize its own resources. These resources include:
• Action classes that can call business logic and data access code
• Results that can prepare views using JavaServer pages, Velocity and FreeMarker templates.
At runtime, there is a single configuration for an application. Prior to runtime, the configuration
is defined through one or more XML documents, including the default struts.xml document.
Listing 2.7 on page 9 shows an example struts.xml file. The <action> tag is used by the Struts
controller to determine which view to return to the client based on the given path or URL.
The information stored in the struts.xml file is very interesting for web crawlers, which can find
additional paths that might not have been found by the usual black box crawling.
1  ...
2  <package name="Customization" namespace="/customization" extends="NG">
3
4      <global-results>
5          <result name="common-error">/WEB-INF/web/jsp/common/commonError.jsp</result>
6      </global-results>
7
27 </package>
28 ...
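A crawler could harvest such additional paths with a few lines of XML parsing. The following sketch is an assumption about how this might look; the XML fragment and the namespace/name.action URL convention are simplified for illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: harvesting candidate URLs from a struts.xml-style file so a
// crawler can seed its frontier with paths black box crawling may miss.
public class StrutsPathExtractor {

    static List<String> actionUrls(String strutsXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(strutsXml.getBytes(StandardCharsets.UTF_8)));
        List<String> urls = new ArrayList<>();
        NodeList packages = doc.getElementsByTagName("package");
        for (int i = 0; i < packages.getLength(); i++) {
            Element pkg = (Element) packages.item(i);
            String ns = pkg.getAttribute("namespace");
            NodeList actions = pkg.getElementsByTagName("action");
            for (int j = 0; j < actions.getLength(); j++) {
                Element action = (Element) actions.item(j);
                // Combine the package namespace with the action name.
                urls.add(ns + "/" + action.getAttribute("name") + ".action");
            }
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<struts><package name=\"Customization\" namespace=\"/customization\">"
                   + "<action name=\"editProfile\" class=\"example.EditProfileAction\"/>"
                   + "</package></struts>";
        System.out.println(actionUrls(xml)); // [/customization/editProfile.action]
    }
}
```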
The Struts framework has even more interesting configuration files. Struts uses another XML
file called validation.xml for validating form fields. Listing 2.8 on page 10 shows an example
validation.xml file. The validation functionality can be used to validate form data in the user's
browser as well as on the server side: the Struts framework emits the necessary JavaScript to
validate the form data in the client browser. The Validator framework uses two XML configuration
files, validator-rules.xml and validation.xml. The validator-rules.xml file defines the
standard validation routines; these are reusable and referenced in validation.xml to define the
form specific validations applied to a form bean. The information provided by the validation.xml
file could be used before an actual test run begins in order to derive test cases especially tailored
to a specific form. By knowing what kind of validation the application, respectively the form, uses,
it might be possible to craft malicious input which slips through the validation framework.
If the test agent has access to the data storage solution, he is able to browse through all the previously
collected data of the test object. Depending on the test object, this can consist of user credentials, pur-
chases made, stock levels etc. Given that the storage solution is some kind of relational database like
MySQL7 or PostgreSQL8 , there exists the possibility to enable detailed logging of the queries sent
to the database. See listing 2.9 for a MySQL query log example, taken from the MySQL ref-
erence manual (see [12]):
7 http://www.mysql.com/ [17.02.09]
8 http://www.postgresql.org/ [17.02.09]
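The core idea of using the query log can be sketched as follows: the tester submits input containing a unique marker string and then scans the logged queries for that marker. The heuristic below (an unescaped quote directly before the marker means the probe broke out of a string literal) is a deliberate simplification invented for illustration; the marker value and log lines are made up, real MySQL general log entries carry additional metadata, and chapter 8 describes the SQL syntax validation libraries the plugin actually uses:

```java
import java.util.List;

// Simplified sketch of white box SQL injection detection via the query
// log: look for a probe marker preceded by an unescaped single quote,
// which indicates the input was embedded into the query unsanitised.
public class QueryLogScanner {

    static boolean markerEscapedStringLiteral(List<String> logLines, String marker) {
        for (String line : logLines) {
            int idx = line.indexOf("'" + marker);
            // Ignore occurrences where the quote is backslash-escaped.
            if (idx >= 0 && (idx == 0 || line.charAt(idx - 1) != '\\')) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String marker = "WASTF9f3a";   // unique per-probe token (invented)
        List<String> log = List.of(
            "SELECT * FROM Users WHERE name = 'alice'",
            // The probe input '<marker> terminated the string literal:
            "SELECT * FROM Users WHERE name = ''WASTF9f3a'");
        System.out.println(markerEscapedStringLiteral(log, marker)); // true
    }
}
```

The appeal of this approach over pure black box probing is that the evidence comes from the query the database actually executed, not from guessing based on HTTP responses.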
Additionally, the test agent is able to gather information about the database tables he has access
to. Most common relational databases implement the describe command. The describe SQL
command is used to list all of the fields in a table and the data format of each field. See listing 2.10 for
an example.
mysql> show tables;
+---------------------------+
| Tables_in_WebApp          |
+---------------------------+
| Category                  |
| Priority                  |
| Type                      |
| Users                     |
+---------------------------+
4 rows in set (0.00 sec)

mysql> describe Category;
+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| id            | int(11)      | NO   | PRI | NULL    | auto_increment |
| category_name | varchar(255) | YES  | UNI | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
If the test agent has access to the test or production machine he is able to use network recording
tools like Tcpdump9 or Wireshark10 to record all the network traffic coming in and out of the test
or production machine. Tcpdump prints out a description of the contents of packets on a network
interface that match a user-specified boolean expression. It can also save the packet data to a file for
later analysis, making the exact contents of the packets sent and received over the specified network
interface available for inspection.
9 http://www.tcpdump.org/ [22.02.09]
10 http://www.wireshark.org/ [22.02.09]
The focus of this project thesis lies on the detection of injection flaws, such as SQL (SQLi) and stored
Cross Site Scripting (XSS) injection flaws, by parsing the query log file generated by a database (see
section 2.1.8). This section illustrates a possible way to build a combined white and black box security
testing tool. The necessary steps and technologies needed to implement such a combined testing
tool are the same as in the following example of a web application profiling component.
Web application profiling is usually done by using a web crawler in order to discover and step through
every page and form of the web application. A web crawler is one type of bot, or
software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits
these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called
the crawl frontier. If the test agent uses a black box approach he is only able to map a discovered site
to a discovered or manually entered URL. By using the previously identified information sources in
a mixed black and white box approach, a much more elaborate web application profiling becomes
possible.
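The seed/frontier loop described above can be sketched in a few lines of Java. Here the "web" is replaced by an in-memory map from URL to extracted links; a real crawler (such as an HtmlUnit-based one) would fetch and parse each page instead. Class and method names are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal sketch of the seed/frontier crawling loop. The links map
// stands in for fetching a page and extracting its hyperlinks.
public class CrawlerSketch {
    public static Set<String> crawl(Map<String, List<String>> links, String seed) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();  // the crawl frontier
        frontier.add(seed);                           // the seed URL
        while (!frontier.isEmpty()) {
            String url = frontier.poll();
            if (!visited.add(url)) continue;          // already crawled
            for (String next : links.getOrDefault(url, List.of()))
                if (!visited.contains(next)) frontier.add(next);  // extend the frontier
        }
        return visited;
    }
}
```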
If the web application uses a database which allows an administrator to log all the SQL queries the
database receives into a file (see section 2.1.8 on page 10), and the test agent is able to read and parse
the generated log file, then the test agent can map HTTP(S) POST and GET requests not only to the
responses received from the webserver but also to the SQL statements sent from the webserver to the
database, triggered by the previously sent HTTP(S) request. Figure 2.1 on page 13 illustrates the
described approach.
1. The test agent sends a HTTP(S) GET or POST request to the web application on the test or
production machine. Depending on the HTTP(S) request the web application needs to query
the database for additional data. Let’s say the web application is some kind of e-commerce
site and the HTTP(S) request triggered the application to list all recorded customers whose
last names start with a “K”. The web application builds a SQL statement like SELECT * FROM
‘customers‘ cust WHERE ‘lastname‘ LIKE ’K%’ and sends it to the database. The
database receives the query and automatically writes it into the query log file (see section 2.1.8).
2. The test agent accesses the test or production machine either between HTTP(S) requests or
after the web crawler component has finished spidering the web application. It is not necessary
to run an SSH server on the machine; the test agent solely needs read access to the log files.
3. The test agent reads and parses the database query log file and applies a matching algorithm in
order to associate each sent HTTP(S) request with the SQL statements executed by the database.
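One conceivable matching algorithm for step 3 (an illustrative assumption, not a finished design) attributes every SQL statement logged between the send times of two consecutive HTTP(S) requests to the earlier request. Timestamps are epoch milliseconds, and the clocks of the test agent and the database host are assumed to be synchronised.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a timestamp-window matching heuristic: SQL statements logged
// after request i was sent and before request i+1 was sent are attributed
// to request i. Arrays logTimes and logSql are parallel.
public class RequestQueryMatcher {
    public static List<List<String>> match(long[] requestTimes, long[] logTimes, String[] logSql) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < requestTimes.length; i++) {
            long from = requestTimes[i];
            long to = (i + 1 < requestTimes.length) ? requestTimes[i + 1] : Long.MAX_VALUE;
            List<String> matched = new ArrayList<>();
            for (int j = 0; j < logTimes.length; j++)
                if (logTimes[j] >= from && logTimes[j] < to)
                    matched.add(logSql[j]);   // falls inside request i's window
            result.add(matched);
        }
        return result;
    }
}
```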
Given the test agent is able to access the machine on which the webserver is hosted and the web-
server is also configured to log all the requests it receives and sends into a file (see section 2.1.5 on
page 6), then the test agent is able to make precise statements about the time it took the webserver
to process the HTTP(S) request without the usual fluctuations due to network latency and load (also
used for “Timing Attacks”, see [14, p.124]). Timing Attacks might be the last resort for an attacker
to enumerate usernames from error messages, registration, or password changes if everything else
fails. An attacker measures the time it takes for an error message to appear for a bad password
versus a bad username. Depending on how the matching algorithm is implemented and the types of
technologies used, there might be a significant difference between the two response times, although
this technique carries a high risk of false positives.
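The comparison behind such a timing check can be sketched as follows. The mean-based comparison and the threshold ratio are arbitrary assumptions; the server-side times are taken from the webserver log, so they are free of the usual network jitter.

```java
// Illustrative sketch: compare mean server-side processing times for two
// classes of login failures (bad username vs. bad password). The ratio
// threshold is an arbitrary assumption, not an established cut-off.
public class TimingCheck {
    public static double mean(double[] samples) {
        double sum = 0;
        for (double s : samples) sum += s;
        return sum / samples.length;
    }

    // Flags a possible username-enumeration side channel when the mean
    // times differ by more than `ratio` of the smaller mean.
    public static boolean suspicious(double[] badUser, double[] badPass, double ratio) {
        double a = mean(badUser), b = mean(badPass);
        return Math.abs(a - b) > ratio * Math.min(a, b);
    }
}
```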
This approach is not only useful for profiling a web application; it can also be used to enhance the
detection of common injection flaws.
SQL Injection:
The detection of possible SQL Injection (SQLi) attack vectors can be improved by parsing the database
query log file. SQLi is a code injection technique that exploits a security vulnerability occurring in the
database layer of an application. The vulnerability is present when user input is either incorrectly
filtered for string literal escape characters embedded in SQL statements or user input is not strongly
typed and thereby unexpectedly executed. The procedure is basically the same as in the web applica-
tion profiling scenario described in section 2.2.1. Let’s use the e-commerce example again. The test
agent found the HTML form mentioned in section 2.2.1 which triggers the following SQL statement:
SELECT * FROM ‘customers‘ cust WHERE ‘lastname‘ LIKE ’<HTML_FORM_FIELD>%’. As
you can see, this is slightly different than before because a text field from the HTML form is now
embedded directly in the SQL statement.
If we want to use the same procedure in order to improve the detection of Cross Site Scripting (XSS)
flaws, it only works for the detection of persistent XSS attacks. XSS is a type of computer
security vulnerability typically found in web applications which enables malicious attackers to inject
client-side script into web pages viewed by other users. The non-persistent (or reflected) XSS vulner-
ability is by far the most common type (see [7]). These holes show up when data provided by a web
client, most commonly in HTTP(S) query parameters or in HTML form submissions, is used immedi-
ately by server-side scripts to generate a page of results for that user, without properly sanitising the
response. If the web application is vulnerable to a reflected XSS attack, then there are no additional
information sources available which would not be available to a black box test as well. The persis-
tent (or stored) XSS vulnerability is a more devastating variant of a cross-site scripting flaw: it occurs
when the data provided by the attacker is saved by the web application, and then permanently dis-
played on “normal” pages returned to other users in the course of regular browsing, without proper
HTML escaping. This resembles the SQLi approach from above. The malicious input (now JavaScript
code instead of malicious SQL code) would show up in the database query log and hence ease
the detection. There is one little drawback to this conclusion: malicious XSS attack code showing
up unfiltered in the database query log is not yet definitive proof that the web application is
vulnerable to such an attack. The web application might save everything unfiltered in the database but
might apply filtering and validation before displaying the data to the user. This means that the HTML
output would have to be considered as well to make a meaningful statement without generating false
positives.
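The two-step check described above can be sketched as follows. The class, the marker format, and the substring matching are illustrative assumptions; a stored XSS finding is only reported when the injected marker shows up unescaped both in the query log and in a later HTML response.

```java
// Sketch of the combined check: query log evidence alone is not enough,
// the marker must also be reflected unescaped in the HTML output.
public class StoredXssCheck {
    public static final String MARKER = "<script>alert(4242)</script>";

    public static boolean unescapedIn(String text) {
        // HTML-escaped copies ("&lt;script&gt;...") do not match
        return text.contains(MARKER);
    }

    public static boolean vulnerable(String queryLogLine, String laterHtmlResponse) {
        return unescapedIn(queryLogLine) && unescapedIn(laterHtmlResponse);
    }
}
```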
This chapter documents the first set of requirements for a combined white and black box
security testing application used for detecting injection flaws in web applications. The chapter con-
tains a list of use cases and application requirements the security testing tool needs to address. The
use cases are written with a prospective security testing agent in mind who will use the testing
application to be developed.
The goal at the end of this project thesis is to provide a working white and black box security testing
application for detecting injection flaws of web applications by additionally parsing a database
query log file as described in 2.2.2 on page 13. Whether an existing framework like w3af is extended
by adding the missing features (mostly the parsing of database query log files) as plugins to the
framework or if a completely new framework should be written will be considered in chapter 5 on
page 40. At the end of this project thesis the tool should be in a state where it can be used productively
against real-life web applications of various sizes.
Even though there are many good commercial as well as open source web application security testing
tools available, none of them uses a combined white and black box security testing approach besides
scanning the source code of the application. None of the common web application security tools like
Acunetix 6, Rational AppScan, Google Skipfish, w3af, etc. mention such a feature on their feature list
or product homepage. Automated source code scanning is error prone for several reasons:
• There are way too many programming languages to support them all in one tool. Web applica-
tions can be written in: PHP, Java, Microsoft .NET, Python, Perl, CGI Scripts, Ruby, JavaScript,
Lisp etc. For a more comprehensive list visit: http://www.objs.com/survey/lang.htm.
• There is an even bigger number of web application frameworks which support a developer in
writing web applications. Every one of these frameworks uses different schemes and libraries
for providing database access, site navigation, session handling etc. A simple static source
code analyser without a data-flow and control-flow analysis module will in most cases not suc-
ceed in detecting vulnerabilities especially if complex libraries are being used to construct the
web application. Security analysers use data-flow analysis primarily to reduce false positives
and false negatives. As a simple (but common) example, many buffer overflows in real code are
unexploitable because the attacker cannot control the data that overflows the buffer. Data-flow
analysis, in this example, can be helpful in distinguishing exploitable from unexploitable buffer
overflows.
The approach described in 2.2.2 on page 13 tries to mitigate the dependencies imposed by the
programming language and framework used to develop the web application. Parsing the database
query log file introduces far fewer dependencies and is independent of the programming language
and web application framework used. Nevertheless, parsing a query log file requires knowledge
about the database product in use, provided the web application uses a data storage solution
with a SQL language interface at all. All common database solutions like Oracle, MySQL and PostgreSQL
provide one or more of the following output destinations for logging all received SQL queries: sim-
ple text file, comma separated value list and/or database table. Implementing a parser that collects
the SQL queries processed by the database is rather easy compared to the task of writing multiple source
code analysers. Parsing the database query log file is especially useful for detecting any kind of stored
injection flaw, given the test agent has access to the query log during or at the end of the security test.
Normal black box web application security testing tools struggle with the detection of SQL Injections
when the web application does not return enough information about application errors which oc-
curred during the execution of the provided input. These are so called Blind SQL Injection vulnerabil-
ities. Blind SQL Injection is identical to normal SQL Injection except that when an attacker attempts
to exploit an application, rather than getting a useful error message, they get a generic page specified
by the developer instead. This makes exploiting a potential SQL Injection attack more difficult but
not impossible. An attacker can still steal data by asking a series of True and False questions through
SQL statements. By analysing the database query log file there is no need for this kind of detection
scheme. The security testing application can very precisely reveal what kind of SQL statements have
been triggered by sending a malicious HTTP request to the web application. How different database
products like MySQL and PostgreSQL are configured and what the database query log looks like
is shown in chapter 4 on page 23.
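As an illustration of this log-based detection (the class name and payload are assumptions, not the tool's final design): if a unique, quote-breaking marker sent as form input reappears verbatim, with its single quote unescaped, inside a logged SQL statement, the input demonstrably reached the database unsanitised, and no error message from the application is needed.

```java
// Sketch: a quote-breaking marker that shows up unescaped in the query
// log proves the input reached the database without sanitisation.
public class BlindSqliCheck {
    public static final String PAYLOAD = "K' OR '1'='1";

    public static boolean reachedDatabaseUnescaped(String loggedSql) {
        // An escaped copy would contain K\' or K'' instead of the raw quote.
        return loggedSql.contains(PAYLOAD);
    }
}
```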
3.2 Requirements
The following sub sections briefly describe the application requirements, providing a one or two
sentence statement for each. The requirements are continuously numbered for
later reference. The requirements are divided into four categories: Functional, Usability, Reliability
and Performance requirements according to the FURPS+1 model (see [9, p. 42]).
1 The FURPS+ System for Classifying Requirements: One such classification system was devised by Robert Grady at
Hewlett-Packard. It goes by the acronym FURPS+ which represents: Functionality, Usability, Reliability, Perfor-
mance and Supportability. The "+" in FURPS+ also helps us to remember concerns such as: Design requirements,
Implementation requirements, Interface requirements and Physical requirements. It is helpful to use FURPS+
categories (or some categorisation scheme) as a checklist for requirements coverage, to reduce the risk of not consider-
ing some important facet of the system.
This set of requirements describes the overall features, capabilities and security requirements of the
application.
a) Starting and conducting a web application security test without user interaction (see
requirement 3.2.1.1)
b) Generating machine parseable output of the results (e.g. an XML structured file)
c) The web application security tool has to support the Linux operating system platform
because most of the tools used by the ASTF security testing framework depend on it
Usability requirements describe the requirements in regard to human factors, help and documenta-
tion.
3.2.2.3 Documentation
The code should be documented with the language specific convention, e.g. JavaDoc for Java
applications.
The following use cases describe the requirements regarding frequency of failure, recoverability and
predictability.
Performance requirements tackle response times, throughput, accuracy, availability and resource us-
age issues.
3.2.4.1 Runtime
The security testing tool should be able to produce some meaningful results within a reason-
able amount of time. This depends greatly on the size of the tested web application.
3.2.5.1 Extensibility
The tool should have or should be written with extensibility in mind. Adding new features
and plugins has to be feasible without major code rewriting and refactoring.
The following user stories are short success stories describing specific sequences of actions and in-
teractions between actors and the system under discussion (see [9, p. 48]).
These stories are far from being a complete list of all the features which should be implemented in
the finished product. They are merely a starting point, and new stories will be added as the project
thesis progresses.
3.3.0.3 As a user, I want to access a help menu for the plugin I am currently configuring.
3.3.0.4 As a user, I want to configure an HTTP proxy (with username and password if necessary).
3.3.0.6 As a user, I can stop and later resume a previously stopped run.
3.3.0.7 As a user, I want to see some statistics on how many HTTP requests have been sent.
3.3.0.9 As a user, I can provide a file with commands and start the tool in an automated mode.
3.3.0.10 As a user, I want the spidering process to stop after a configured amount of time (in minutes).
3.3.0.11 As a user, I want the spidering process to ignore or only follow some specific URLs.
3.3.0.12 As a user, I want to see a detailed list of possible attack vectors if a SQL Injection vulnerability
has been found.
Additionally, figure 3.1 shows the external actors which interact with the white box security testing
application. External actors are: the Automated Security-Testing Framework (ASTF), the designated
security testing agent, the targeted web application and the corresponding database storage host. To
visualise the application flow a user experiences while interacting with the application, figure 3.2 has
been added. Figure 3.2 shows a simplified UML 2 activity diagram of how a future user interacts
with the white box security testing application by either starting the application in automated or
interactive mode.
This chapter is all about configuring and parsing database query logs from various database prod-
ucts like MySQL and PostgreSQL. This is especially important for the development of the combined
white and black box security testing tool as described in sections 2.2.2 and 3.1. Most if not all recent
database solutions provide some kind of logging facility to monitor processed SQL queries coming
from clients. However, the metadata accompanying the logged SQL queries differs greatly in its level
of detail between database products.
This section explains how the SQL query log is configured and activated for the MySQL and
PostgreSQL databases. MySQL is a relational database management system (RDBMS) that runs as a
server providing multi-user access to a number of databases. The MySQL development project has
made its source code available under the terms of the GNU General Public License, as well as under
a variety of proprietary agreements. Members of the MySQL community have created several forks
such as Drizzle and MariaDB. Free software projects that require a full featured database manage-
ment system often use MySQL. Such projects include (for example) WordPress, phpBB, Drupal and
other software built on the LAMP1 software stack. MySQL is also used in many high profile, large
scale World Wide Web products including Wikipedia, Google and Facebook2 . PostgreSQL, often sim-
ply Postgres, is an object-relational database management system (ORDBMS). It is released under
an MIT-style license and is thus free and open source software. As with many other open source pro-
grams, PostgreSQL is not controlled by any single company, but has a global community of developers
and companies to develop it. PostgreSQL evolved from the Ingres project at University of California,
Berkeley.
4.1.1 MySQL
The general query log is a general record of what the MySQL daemon (mysqld) is doing. The server
writes information to this log when clients connect or disconnect, and it logs each SQL statement
received from clients. The general query log can be very useful when you suspect an error in a client
1 LAMP is an acronym for a solution stack of free, open source software, originally coined from the first letters of Linux
(operating system), Apache HTTP Server, MySQL (database software), and PHP, Python or Perl (scripting language),
principal components to build a viable general purpose web server
2 http://www.mysql.com/why-mysql/case-studies/ [16.04.10]
Chapter 4. The Database Query Log & How to Detect Input Validation Vulnerabilities | 23
and want to know exactly what the client sent to the database daemon. The MySQL daemon writes
statements to the query log in the order that it receives them, which might differ from the order in
which they are executed. This logging order contrasts to the binary log, for which statements are
written after they are executed but before any locks are released. Also, the query log contains all
statements, whereas the binary log does not contain statements that only select data (taken from
[12]).
The possible settings for managing the query log might differ based on the used version of the MySQL
database server. The following settings are meant to be used with the latest MySQL Community
Server. The currently available version as of April 2010 is 5.1.45. The general query log can either
be enabled through the normal MySQL configuration file or dynamically at runtime through specific
SQL commands. The following sub sections are taken from the MySQL manual (see [12]).
Enabling the Query Log through the MySQL Configuration File & Command Line Arguments
In a default Ubuntu Linux installation (version 9.10 as of April 2010) the MySQL configuration file
is located in: /etc/mysql/my.cnf. The configuration file can be found under the following path
C:\Program Files\MySQL\my.cnf for Windows users. The following settings need to be set in
order to start logging the SQL queries:
• Before 5.1.6, the general query log destination is always a file. To enable the log, start mysqld
with the --log[=file_name] or -l [file_name] option.
• As of MySQL 5.1.6, the destination can be a file or a table, or both. Start mysqld with the
--log[=file_name] or -l [file_name] option to enable the general query log, and option-
ally use --log-output to specify the log destination.
• As of MySQL 5.1.29, use --general_log[=0|1] to enable or disable the general query log, and
optionally --general_log_file=file_name to specify a log file name. The --log and -l op-
tions are deprecated.
Listing 4.1 shows an excerpt of the mysqld configuration file. These settings cause the
MySQL daemon to log all received SQL queries into /var/log/mysql/query.log and into the
mysql.general_log table.
[mysqld]
log_output = FILE,TABLE
general_log = 1
general_log_file = /var/log/mysql/query.log
Listing 4.1: Enabling the MySQL query log through the configuration file
Enabling the Query Log Dynamically At Runtime
For runtime control of the general query log, use the global general_log and general_log_-
file system variables. Set general_log to 0 (or OFF) to disable the log or to 1 (or ON) to enable
it. Set general_log_file to specify the name of the log file. If a log file already is open, it is
closed and the new file is opened. When the general query log is enabled, output is written to any
destinations specified by the --log-output option or log_output system variable. If you enable
the log, the server opens the log file and writes startup messages to it. However, further logging
of queries to the file does not occur unless the FILE log destination is selected. If the destination
is NONE, no queries are written even if the general log is enabled. Setting the log file name has no
effect on logging if the log destination value does not contain FILE. Server restarts and log flush-
ing do not cause a new general query log file to be generated (although flushing closes and reopens it).
As of MySQL 5.1.12, you can disable the general query log at runtime: SET GLOBAL general_log
= ’OFF’; With the log disabled, rename the log file externally; for example, from the command
line. Then enable the log again: SET GLOBAL general_log = ’ON’; This method works on any
platform and does not require a server restart. The session sql_log_off variable can be set to ON
or OFF to disable or enable general query logging for the current connection. The general query log
should be protected because logged statements might contain passwords.
Listing 4.2 shows the sequence of SQL commands needed to enable the general query log at runtime,
overriding any existing settings.
# mysql -u root -p
mysql> FLUSH LOGS;
mysql> SET GLOBAL general_log = 'OFF';
mysql> SET GLOBAL log_output = 'FILE,TABLE';
mysql> SET GLOBAL general_log_file = '/var/log/mysql/query.log';
mysql> SET GLOBAL general_log = 'ON';
Listing 4.2: Enabling the MySQL query log at runtime
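Assuming the test agent may open a privileged JDBC connection to the MySQL server, the runtime commands above could also be issued programmatically. This is a sketch under that assumption, not part of the procedure described in the MySQL manual.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: issue the runtime log-enabling commands over JDBC. Assumes a
// connection opened with sufficient privileges (e.g. as root).
public class MySqlLogEnabler {
    public static final String[] STATEMENTS = {
        "FLUSH LOGS",
        "SET GLOBAL general_log = 'OFF'",
        "SET GLOBAL log_output = 'FILE,TABLE'",
        "SET GLOBAL general_log_file = '/var/log/mysql/query.log'",
        "SET GLOBAL general_log = 'ON'"
    };

    public static void enable(Connection connection) throws SQLException {
        try (Statement stmt = connection.createStatement()) {
            for (String sql : STATEMENTS) stmt.execute(sql);
        }
    }
}
```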
As seen in listing 4.1, MySQL allows the user to change the default general log output from a
file to a database table. By default MySQL uses the mysql.general_log table. Listing 4.3 shows
the overall structure of this table and what kind of information is logged by the MySQL daemon.
  `command_type` VARCHAR(64) NOT NULL,
  `argument` MEDIUMTEXT NOT NULL
) ENGINE=CSV DEFAULT CHARSET=utf8 COMMENT='General log'
Storing the database query log in a database table is very convenient because accessing and searching
for specific entries in the stored data becomes trivial by sending appropriate SQL commands as seen
in listing 4.4.
mysql> SELECT argument FROM mysql.general_log WHERE command_type = 'Query';
+----------------------------------------+
| argument                               |
+----------------------------------------+
| root@localhost on                      |
| select @@version_comment limit 1       |
| SHOW CREATE TABLE mysql.general_log    |
| DESCRIBE mysql.general_log             |
+----------------------------------------+
4 rows in set (0.00 sec)
4.1.2 PostgreSQL
In contrast to MySQL, enabling the query log dynamically at runtime as seen in section 4.1.1 is not
supported by PostgreSQL. The currently available version as of April 2010 is 8.4. The query log can
only be enabled through the PostgreSQL configuration file, which entails a server restart after
modifying the file.
In a default Ubuntu Linux installation (version 9.10 as of April 2010) the PostgreSQL con-
figuration file is located in /etc/postgresql/8.x/main/postgresql.conf. Windows
users find the configuration file under the following path: C:\Program
Files\PostgreSQL\8.x\main\postgresql.conf. Listing 4.5 shows an excerpt of the Post-
greSQL configuration file and the settings needed to activate the query log. PostgreSQL supports
several methods for logging server messages, including stderr, csvlog and syslog. On Windows,
eventlog is also supported. The default is to log to stderr only. This parameter can only be
set in the postgresql.conf file or on the server command line. If csvlog is included in log_-
destination, log entries are output in “comma separated value” format, which is convenient for
loading them into programs. See paragraph Using The CSV-Format Log Output in section 4.1.2
for details. logging_collector must be enabled to generate CSV-format log output. When
logging_collector is enabled, the log_directory parameter determines the directory in which
log files will be created. It can be specified as an absolute path, or relative to the cluster data directory.
When logging_collector is enabled, the log_filename parameter sets the file names of the created
log files. The value is treated as a strftime pattern, so %-escapes can be used to specify time-varying
file names. If CSV-format output is enabled in log_destination, .csv will be appended to the
timestamped log file name to create the file name for CSV-format output (taken from [13]).
log_destination = 'csvlog'
logging_collector = on
log_directory = 'pg_log'
Listing 4.5: Enabling the query log through the PostgreSQL configuration file
Including csvlog in the log_destination list provides a convenient way to import log files into a
database table. This option emits log lines in comma-separated-value format, with these columns:
timestamp with milliseconds, user name, database name, process ID, host:port number, session ID,
per-session or -process line number, command tag, session start time, virtual transaction ID, regular
transaction id, error severity, SQL state code, error message detail, hint, internal query that led to
the error (if any), character count of the error position thereof, error context, user query that led to
the error (if any and enabled by log_min_error_statement), character count of the error position
thereof, location of the error in the PostgreSQL source code (if log_error_verbosity is set to
verbose). Listing 4.6 shows a single log entry produced by the PostgreSQL daemon given csvlog is
enabled.
2010-04-13 16:16:30.410 CEST,"postgres","postgres",3705,"[local]",4bc47caf.e79,2,"SHOW", \
2010-04-13 16:16:15 CEST,1/9,0,ERROR,42704,"unrecognized configuration parameter \
""database""",,,,,,"show database;",,
Listing 4.6: Content of the PostgreSQL query log file (CSV-format output)
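Reading such a line back requires a splitter that honours quoted fields and the doubled-quote escapes visible above. A minimal sketch could look like this; the class name is an assumption, and the documented csvlog column order determines which index holds which value.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: split one PostgreSQL csvlog line into its fields,
// handling quoted fields and "" escapes inside them.
public class CsvLogLine {
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quoted) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"');   // doubled quote -> literal quote
                        i++;
                    } else {
                        quoted = false;    // closing quote
                    }
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                quoted = true;             // opening quote
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }
}
```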
Listing 4.7 shows the table definition which can be used to store the CSV-formatted query log (taken
from [13]).
CREATE TABLE postgres_log (
  log_time TIMESTAMP(3) WITH TIME ZONE, user_name TEXT, database_name TEXT,
  process_id INTEGER, connection_from TEXT, session_id TEXT,
  session_line_num BIGINT, command_tag TEXT,
  session_start_time TIMESTAMP WITH TIME ZONE, virtual_transaction_id TEXT,
  transaction_id BIGINT, error_severity TEXT, sql_state_code TEXT,
  message TEXT, detail TEXT, hint TEXT, internal_query TEXT,
  internal_query_pos INTEGER, context TEXT, query TEXT,
  query_pos INTEGER, location TEXT,
  PRIMARY KEY (session_id, session_line_num));
4.1.3 Microsoft SQL Server
The Microsoft SQL Server also implements a method to log and analyse processed SQL queries as
of Microsoft SQL Server 2008. The so called “SQL Server Profiler” shows how SQL Server resolves
queries internally. This allows administrators to see exactly what Transact-SQL statements or
Multi-Dimensional Expressions are submitted to the server and how the server accesses the database
or cube to return result sets. Using SQL Server Profiler, administrators can do the following: a) Create
a trace that is based on a reusable template, b) watch the trace results as the trace runs, c) store the
trace results in a table, d) start, stop, pause, and modify the trace results as necessary, e) replay the
trace results3 .
For the most part, Profiler is an administrative tool that requires a bit of experience to master.
Fortunately, Profiler provides a graphical interface, which makes both learning and monitoring much
simpler. However, Profiler requires 10 MB of free space; if free space falls below 10 MB, Profiler stops.
To access Profiler, you must be the administrator or have permission to connect to a specific instance
of SQL Server and have been granted permission to execute Profiler stored procedures.
From the Start menu, locate Microsoft SQL Server among your available programs and then click
Profiler from the SQL Server group. In Enterprise Manager, choose SQL Profiler from the Tools menu.
From the File menu, choose New, select Trace from the submenu, identify the appropriate SQL Server
instance, and click OK. Use the resulting Trace Properties dialog box and its four tabs to initiate the
process. Set the following options on the General tab4 :
• Name the trace and identify the server on which you will run the trace. The Trace SQL Server
property defaults to the instance identified earlier.
• Use the Template Name control’s drop-down list to choose one of the available templates. If
you create a template, be sure to specify the path to that file (tdf extension). You can add a
default template via the Options menu off the Tools menu.
• Save the trace to a file, reducing overhead on the server. Selecting this option enables the two
check boxes immediately below: The Enable File Roller option permits you to open a new file
for the trace once the original file is full, and Server Processes SQL Server Traces Data indicates
whether the server or the client application should perform the trace. Performing lengthy com-
plicated event tracing on the server can reduce performance. Saving the trace to a table is an
alternative to saving the trace to a file. This again can have performance implications on a busy
server; specifically, a table trace requires more overhead.
• The final option, Enable Trace Stop Time, allows you to determine when the trace ends.
For more information about using the SQL Server Profiler and the layout of the generated text file go
to http://msdn.microsoft.com/en-us/library/ms187929.aspx.
3 Taken from http://msdn.microsoft.com/en-us/library/ms187929.aspx [26.04.10]
4 Taken from http://articles.techrepublic.com.com/5100-10878_11-5054787.html [26.04.10]
Chapter 4. The Database Query Log & How to Detect Input Validation Vulnerabilities | 28
4.1.4 Oracle Database Standard Edition
The database query log is called the “Audit Trail” in Oracle jargon. In Oracle9i Database and below,
auditing captures only the “who” part of an activity, not the “what”. With the arrival of 10g, these
limitations are gone, thanks to significant changes to the auditing facility. Two types of audits
are involved: the standard audit (available in all versions) and the fine-grained audit
(available in Oracle9i and up)5.
Auditing is disabled by default, but can be enabled by setting the AUDIT_TRAIL static parameter,
which has the following allowed values.
• db or true - Auditing is enabled, with all audit records stored in the database audit trail
(SYS.AUD$).
• db,extended - As db, but the SQL_BIND and SQL_TEXT columns are also populated.
• xml - Auditing is enabled, with all audit records stored as XML format OS files.
• xml,extended - As xml, but the SQL_BIND and SQL_TEXT columns are also populated.
• os - Auditing is enabled, with all audit records directed to the operating system’s audit trail.
The static AUDIT_FILE_DEST parameter specifies the OS directory used for the audit trail when
the os, xml or xml,extended options are used. It is also the location for all mandatory auditing
specified by the AUDIT_SYS_OPERATIONS parameter.
Standard auditing, implemented by the SQL command AUDIT, can be used to quickly and easily set
up tracking for a specific object. For instance, to track all updates to the table EMP owned by
SCOTT, you would issue: audit UPDATE on SCOTT.EMP by access;. This command records every
update of the table SCOTT.EMP by any user in the audit trail table AUD$, visible through the view
DBA_AUDIT_TRAIL6.
For more information about the Oracle Audit Trail visit: http://www.oracle.com/technology/
documentation/index.html.
4.2 How to Detect Input Validation Vulnerabilities
The next major design decision is when to parse the database query log in order to improve the
detection of injection flaws in web applications. The query log can either be parsed repeatedly,
after each sent HTTP request containing malicious code, or as part of a post-processing module
once all tests have finished. Both methods have their advantages and disadvantages. The following
sections describe both methods in detail and outline their respective benefits. The goal of both
methods is to find web application input parameters which can be manipulated by any user, malicious
or not, and which are used to construct SQL queries that are then executed by the database tier.
Depending on the input validation scheme the application uses, these identified parameters might
allow malicious code to be injected into the targeted web application.
4.2.1 The Online Approach
The pseudo code in listing 4.8 is a design idea for a combined white and black box security testing
module which uses the database SQL query log file at runtime to make more thorough statements
about input validation vulnerabilities of the targeted web application. The basic idea is to access
and parse the database query log repeatedly, after each sent HTTP request containing malicious
code.
The algorithm in listing 4.8 starts with a list of previously discovered web pages on line 1. This list
can come either from a web crawler component or from a user-imported list of accessible and
valid URLs of the targeted web application. The list is then used to find possible attack vectors,
consisting of any kind of input parameter which can be manipulated by a user. Typically these are
common URL query strings or HTML forms embedded in the HTML body of a received web application
response (see listing 4.8 line 3). Of course there are many more possible attack vectors, such as
cross site scripting in AJAX requests, malicious XML injections etc. A first version of the security
testing application certainly has to cover URL query strings and HTML forms.
An identified attack vector is then populated with a unique random value which is later used,
together with the database query log, to verify whether any of the injected values are reused by
the web application to construct one or multiple SQL queries. A random value is used to make the
detection of correlating SQL queries easier. Let's say a web crawler component detected the
following URL: http://www.example.com/search.php?query=Cat+Food. The security testing application
correctly identifies the URL query string as an attack vector, replaces the query parameter with
a unique random alphanumeric value like OPL89FGHC and sends the following HTTP request to
the targeted web application: http://www.example.com/search.php?query=OPL89FGHC (see
listing 4.8 lines 4-5).
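This marker-injection step could be sketched as follows (the class name, method names and marker length are illustrative, not WASTF internals):

```java
import java.security.SecureRandom;

// Sketch of the marker-injection step: replace a query parameter's value
// with a unique random alphanumeric marker.
public class MarkerInjector {
    private static final String ALPHANUM = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    // Generate a random marker such as "OPL89FGHC".
    public static String newMarker(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHANUM.charAt(RANDOM.nextInt(ALPHANUM.length())));
        }
        return sb.toString();
    }

    // Rebuild the URL with the parameter set to the marker.
    public static String injectMarker(String baseUrl, String param, String marker) {
        return baseUrl + "?" + param + "=" + marker;
    }

    public static void main(String[] args) {
        String marker = newMarker(9);
        // e.g. http://www.example.com/search.php?query=OPL89FGHC
        System.out.println(injectMarker("http://www.example.com/search.php", "query", marker));
    }
}
```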
After the HTTP request has been sent, the security testing application parses the database query
log. The query log has to be accessible in some way. As seen in section 4.1 this can be one of the
following: a simple text file, a comma separated values (CSV) file or a database table. Depending on
the database solution in use, the security testing tool downloads or queries the database query log
accordingly and searches for queries containing the unique random value OPL89FGHC, or at least
parts of it. A possible SQL query constructed by the web application and executed by
the database (and afterwards stored in the database query log) could look similar to this: SELECT
* FROM ‘products‘ WHERE ‘description‘ LIKE ’%OPL89FGHC%’;. If such a SQL query
exists in the database query log, it indicates that the targeted web application uses the supplied data
from the query parameter to construct SQL queries which are processed by the database. The
fact that the web application uses the supplied data in a SQL statement might lead to a
vulnerability which can be used to exploit the targeted web application (see listing 4.8 lines 6-7).
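A minimal sketch of this log-matching step, assuming the log has already been loaded into a list of statement strings (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the log-matching step: find all logged SQL statements that
// contain the previously injected marker. The log lines would come from
// whichever query log source is in use (text file, CSV file or audit table).
public class QueryLogMatcher {
    public static List<String> queriesContaining(List<String> logLines, String marker) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            if (line.contains(marker)) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```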
The next step is to probe the validation scheme used by the web application. The probing helps to
reduce the number of previously identified attack vectors and therefore increases the performance of
the vulnerability detection routine. Depending on the input validation library used by the targeted
web application, an identified attack vector might not be exploitable after all (see section 4.2.3),
because the library filters malicious characters out of the user-provided input, which renders a
possible attack string inexecutable; in that case no exploitable vulnerability exists. The probing is
done by combining special characters used in common attack code with the random value which has
been proven to be used in SQL statements created by the targeted web application. An attacker
wanting to insert malicious SQL code into the targeted web application would append something like
this to the query parameter: “’ OR 1=1--” or “’ AND 1=0 UNION ALL SELECT...”7. Most good
validation schemes escape or convert characters like “’” or “<” to their respective HTML encoding,
namely “&#039;” and “&lt;”. This mitigates the possibility of an attack because the conversion renders
otherwise executable code unusable. The trick to determine whether specific characters are being
filtered is to combine these special characters with the random value and check whether they show up
altered in the database query log. This method can also be used without a database query log when
the HTTP(S) parameter is reflected in the HTML body of the web application response. A minimal
set of special characters needed to cover most payloads consists of around 11 characters (see section
4.2.3). Let's say the security testing application wants to test whether the “’” character is fil-
tered by the input validation routine of the web application: one would take the random value
from before, “OPL89FGHC”, and prepend the specific character “’”, resulting in the following value:
’OPL89FGHC. This is again sent to the targeted web application with the following HTTP request:
http://www.example.com/search.php?query=’OPL89FGHC. By parsing the database query log
after the request has been sent, one might find one of these results (see listing 4.8 lines 8-15):
• SELECT * FROM ‘products‘ WHERE ‘description‘ LIKE ’%&#039;OPL89FGHC%’;
This indicates that the web application runs an input validation routine before SQL queries are
sent to the database. In this case the web application converts special characters to their
respective HTML entities (any other kind of encoding is also conceivable).
• No Query found
The targeted web application blocked the execution of the SQL query before it was sent to the
database. This indicates that the targeted web application uses some input validation to filter
user supplied input.
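The classification of a single probe result, again assuming the relevant log lines are available as strings, could be sketched like this (the enum and method names are illustrative):

```java
import java.util.List;

// Sketch of classifying a special-character probe by inspecting the query
// log after sending e.g. "'OPL89FGHC" as a parameter value.
public class ProbeClassifier {
    public enum Result { UNFILTERED, ENCODED_OR_FILTERED, BLOCKED }

    public static Result classify(List<String> logLines, String specialChar, String marker) {
        boolean markerSeen = false;
        for (String line : logLines) {
            if (line.contains(specialChar + marker)) {
                return Result.UNFILTERED;  // character reached the database intact
            }
            if (line.contains(marker)) {
                markerSeen = true;         // marker arrived, but the character was altered
            }
        }
        // No query at all: the request was blocked before reaching the database.
        return markerSeen ? Result.ENCODED_OR_FILTERED : Result.BLOCKED;
    }
}
```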
After this process has been repeated with various special characters, the security testing
application should be able to tell which characters can be used to generate attack payloads that
are not filtered by the input validation routine (if there is any). Security testing applications
commonly use a static list of attack payloads in order to probe for vulnerabilities. Possible XSS and
SQLi payloads look similar to this: javascript:alert(’OPL89FGHC’);8 or ”+ (SELECT
TOP 1 password FROM users) + ”9. Given the list of allowed and unfiltered characters, the
static list of attack payloads can be reduced by removing payloads which contain characters that are
filtered by the input validation routine of the web application. This reduces the number of
HTTP(S) requests without reducing the effectiveness of the vulnerability detection routine (see listing 4.8 line 16).
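This payload reduction could be sketched as follows (a simplification: it drops every payload containing a special character outside the allowed set; names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of reducing a static payload list: drop every payload that uses a
// special character known to be filtered by the application's input validation.
public class PayloadReducer {
    public static List<String> reduce(List<String> payloads, Set<Character> allowed) {
        List<String> result = new ArrayList<>();
        for (String payload : payloads) {
            boolean usable = true;
            for (char c : payload.toCharArray()) {
                // Only special (non-alphanumeric, non-space) characters matter here.
                if (!Character.isLetterOrDigit(c) && c != ' ' && !allowed.contains(c)) {
                    usable = false;
                    break;
                }
            }
            if (usable) {
                result.add(payload);
            }
        }
        return result;
    }
}
```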
The second-to-last step is to modify the HTTP query parameter from the previous example to include
the payloads which have a high chance of getting through the validation routine of the web application
(see listing 4.8 lines 17-19). The final step is to check whether an input validation vulnerability
actually exists by parsing the query log again for the inserted payload. If the payload shows up in
the query log unchanged, then there is a very good chance that a vulnerability exists (see listing 4.8
lines 20-22).
This approach might seem complicated and excessive, but in theory it should increase performance,
because only attack payloads are tested which have a legitimate chance of getting through the input
validation filter, and at the same time false positives are reduced. However, the prerequisite for
this approach is the availability of the database query log at any time during the security testing
scan.
1  FOR each previously discovered web page
2    IF discovered web page contains HTML forms or URL parameters THEN
3      FOR each HTML input field or URL query string
4        SET query parameter to a unique random value
5        CALL sendHttpRequest with modified values
6        CALL parseQueryLog
7        IF query log contains previously sent random value THEN
8          FOR each special character in list
9            SET parameter to specialCharacter + previously sent random value
10           CALL sendHttpRequest with modified values
11           CALL parseQueryLog
12           IF query log contains previously sent character unfiltered THEN
13             CALL saveAllowedCharacter with specialCharacter
14           END IF
15         END FOR
16         CALL buildPayloads with allowedCharacters
17         FOR each payload
18           SET query parameter to payload
19           CALL sendHttpRequest with injected payload
20           CALL parseQueryLog
21           IF query log contains sent payload unchanged THEN
22             CALL vulnerabilityDetected with payload
23           END IF
24         END FOR
25       END IF
26     END FOR
27   END IF
28 END FOR
Listing 4.8: Pseudo code for detecting input validation vulnerabilities by parsing the database query
log at runtime
4.2.2 The Offline Approach
This approach is a slightly modified version of the online approach described in section 4.2.1. It
considers the fact that the database query log might not be available during the scan, but only at
the end, after all security checks have finished, as part of a post-scanning process.
This approach is less performant than the online approach described in section 4.2.1,
because the online approach first checks whether an attack vector is “echoed” into the database
query log. Extensive testing of an identified attack vector is only done by the online approach if the
vector is used by the targeted web application as part of one or multiple SQL queries sent to
the database. The offline approach is not able to determine whether the identified vector
is “echoed” or not, at least not at the beginning of the security tests. It therefore has
to find a trade-off between performance and the accuracy of the reported vulnerabilities.
The pseudo code in listing 4.9 describes such a trade-off. The algorithm in listing 4.9 starts the same
way as the online approach described in section 4.2.1, with the difference that the first sent random
value cannot be checked for being “echoed” at this particular moment (see listing
4.9 lines 1-5). Instead, the security testing application keeps track of each sent random value for later
comparison with the database query log (see listing 4.9 line 6). This missing piece of information,
whether the identified attack vector is actually used in one or multiple SQL queries, is the cause
of the performance issues. Every additional HTTP(S) request targeting the current attack vector might
be in vain, because the parameter might not be used by the web application for querying the database.
At this point several strategies can be used to finish the implementation of the offline approach, with
different outcomes regarding the accuracy of the identified input validation vulnerabilities:
• Do Nothing: If we finish the test at this point we have optimal performance, but we are only able to
tell whether the identified attack vector is “echoed” once the database query
log becomes available to the security testing application. For each identified attack vector one
HTTP(S) request is sent to the targeted web application. Unfortunately, the information
whether a parameter is used by the web application to construct a SQL query does not by itself
qualify as a sound enough indication of whether a vulnerability exists.
• Probe Input Validation Scheme: In addition to the “echoed” test, the security testing appli-
cation sends HTTP requests for probing the input validation scheme used by the web applica-
tion (see section 4.2.1 for a detailed description of this feature). When the database query log
becomes available to the security testing application, it is possible to tell a) whether the iden-
tified attack vector is “echoed” and b) which special characters are filtered by
the input validation routine of the targeted web application. For each identified attack vector
one HTTP request is sent for detecting whether it is “echoed” and about 20 HTTP requests
for probing the input validation scheme (if a minimal character set is used). This
variation is shown in the pseudo code in listing 4.9 lines 18-24.
• Send all Payloads: In addition to the “echoed” test, the security testing application sends all the
available attack payloads for XSS and SQLi vulnerabilities, without filtering them based on the
special characters allowed by the input validation routine of the targeted web application (as
is done in the online approach described in section 4.2.1). Depending on how many attack pay-
loads are implemented, this could easily be around 30 or more payloads per vulnerability
type. As soon as the database query log becomes available to the security testing application,
it is able to tell a) whether the identified attack vector is “echoed” and b) whether a sent pay-
load has been used by the web application inside one or multiple SQL queries without being
modified by any input validation routine. For each identified attack vector one HTTP request
is sent for detecting whether it is “echoed” and about 60 HTTP requests for sending attack
payloads to the targeted web application (if a full payload set is used). It might be possi-
ble to use a reduced set of payloads, resulting in fewer HTTP requests, but with the risk of
missing exactly the one payload which would pass the input validation scheme unfiltered, and
thereby missing a potential vulnerability.
• Probe Input Validation Scheme & Send all Payloads: This is a combination of the above-men-
tioned strategies Send all Payloads and Probe Input Validation Scheme. This strategy is in the
end as thorough as the online approach described in section 4.2.1, but with far more HTTP
requests per identified attack vector, resulting in weak performance. As soon as the database
query log becomes available to the security testing application, it is able to tell a) whether the
identified attack vector is “echoed”, b) which special characters are filtered by
the input validation routine of the targeted web application and c) whether a sent payload has
been used by the web application inside one or multiple SQL queries without being modified by
any input validation routine. For each identified attack vector one HTTP request is sent for
detecting whether it is “echoed”, about 11 HTTP requests for probing the input validation
scheme (if a minimal character set is used; see section 4.2.3) and around 60 HTTP re-
quests for sending attack payloads to the targeted web application (if a full payload set is
used).
Listing 4.9: Pseudo code for detecting input validation vulnerabilities by parsing the database query
log as a post scanning process
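The bookkeeping required by the offline approach could be sketched as follows (illustrative names; the log is assumed to be available as a list of statement strings during post-processing):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the offline approach's bookkeeping: every injected marker is
// recorded per attack vector, and once the query log becomes available all
// markers are matched against it in a single post-processing pass.
public class OfflineTracker {
    private final Map<String, String> markerByVector = new HashMap<>();

    public void record(String attackVector, String marker) {
        markerByVector.put(attackVector, marker);
    }

    // Returns the attack vectors whose markers were "echoed" into the log.
    public List<String> echoedVectors(List<String> logLines) {
        List<String> echoed = new ArrayList<>();
        for (Map.Entry<String, String> e : markerByVector.entrySet()) {
            for (String line : logLines) {
                if (line.contains(e.getValue())) {
                    echoed.add(e.getKey());
                    break;
                }
            }
        }
        return echoed;
    }
}
```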
4.2.3 Minimal Set of Special Characters
The special character probing mentioned in section 4.2.1 is a very convenient way to increase the
performance of the vulnerability detection routine by eliminating previously discovered attack vectors
which are protected by an input filtering library implemented by the targeted web application.
If special characters such as <, >, ., ; (which are typically needed to successfully exploit a XSS or
SQLi vulnerability) are filtered, the chance is high that the discovered attack vector does not
pose any security threat to the application and can therefore safely be ignored. This results in a
performance gain by reducing the number of HTTP(S) requests which would otherwise have been needed
to detect input validation vulnerabilities.
The greatest performance gain can be achieved with the “Online” approach described in section 4.2.1,
because whether a special character is filtered can be resolved immediately by
parsing the database query log. Table 4.4 illustrates the situation by comparing the HTTP(S)
requests needed by both the “Online” and the “Offline” approach to detect input validation
vulnerabilities. As one can easily see in table 4.4, the “Offline” approach is far less performant
than the “Online” approach.
One way to slightly improve the performance of both approaches is to check a reduced set of
special characters. How many special characters are checked is a trade-off between performance
and precision. The security testing application can make very sound statements about the input
filtering implementation used by the targeted web application if more special characters are
checked. On the other hand, every character that is checked results in: a) sending an HTTP(S)
request and b) parsing the database query log for the inserted special character, at different times
depending on the approach used.
Table 4.2 lists some typical exploit code used for detecting XSS or SQLi vulnerabilities in a targeted
web application. Based on this list, a minimal set of special characters can be selected which should
guarantee code execution if none of the characters is filtered by the targeted web application.
A minimal set for XSS injection code would consist of the following special characters: ' ; . ! - " < >
= & { ( ) } (14 characters). A more advanced set would additionally check all the hex-encoded values
of the minimal set.
A minimal set for SQLi injection code would consist of the following special characters: ' = ( ) OR AND
UNION ALL SELECT * , ; (12 characters/keywords). A more advanced set would additionally check
further SQL keywords such as DROP, UPDATE etc.
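These two minimal sets could be captured as constants, for example (the contents mirror the lists above; the class name is illustrative):

```java
import java.util.List;

// The minimal probe sets from the text as constants: 14 XSS characters and
// 12 SQLi characters/keywords.
public class MinimalProbeSets {
    public static final List<String> XSS = List.of(
        "'", ";", ".", "!", "-", "\"", "<", ">", "=", "&", "{", "(", ")", "}");

    public static final List<String> SQLI = List.of(
        "'", "=", "(", ")", "OR", "AND", "UNION", "ALL", "SELECT", "*", ",", ";");
}
```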
# Type Exploit Code
1 XSS <script>{document.write(String.fromCharCode(79,80,76,56,57,70,71,72,67))}</script>
2 XSS %3C%73%63%72%69%70%74%3E%7Bdocument.write%28String.fromCharCode%2879,80,76,56,57,70,71,72,67%29%29%7D%3C%73%63%72%69%70%74%3E
3 XSS <script>document.write(String.fromCharCode(79,80,76,56,57,70,71,72,67));</script>
4 XSS <script>document.write(String.fromCharCode(79,80,76,56,57,70,71,72,67))</script>
5 XSS javascript:document.write(String.fromCharCode(79,80,76,56,57,70,71,72,67))
6 XSS %6A%61%76%61%73%63%72%69%70%74:document.write(String.fromCharCode(79,80,76,56,57,70,71,72,67))
7 SQLi ' OR 'OPL89FGHC' = 'OPL89FGHC'--
8 SQLi ' OR 'OPL89FGHC' = 'OPL89FGHC'#
9 SQLi ' OR 'OPL89FGHC' = 'OPL89FGHC'/*
10 SQLi ') OR 'OPL89FGHC' = 'OPL89FGHC--
11 SQLi ') OR ('OPL89FGHC' = 'OPL89FGHC--
12 SQLi x' AND 1=(SELECT COUNT(*) FROM OPL89FGHC);--
13 SQLi ' UNION ALL SELECT name, pass FROM members
Table 4.2: Typical exploit code used by security testing applications to detect XSS and SQLi injection vulnerabilities
                                              “Online” Approach       “Offline” Approach
                                             Vectors  # Requests     Vectors  # Requests
After spidering the targeted application       50         1            50         1
After phase #1 - “echoed” checks               10        21            50        21
After phase #2 - special character probing      3        60            50        60
TOTAL # of HTTP requests needed
to discover vulnerabilities                          440                    4’100
Table 4.4: Simplified performance comparison of the “Online” and “Offline” strategies for detecting input validation vulnerabilities by parsing the
database query log
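One consistent reading of table 4.4 is that each phase's per-vector request count applies only to the attack vectors still under test in that phase; the following sketch reproduces the table's totals under that reading:

```java
// Reproduce the totals of table 4.4: per-phase request counts multiplied by
// the number of attack vectors still being tested in that phase.
public class RequestBudget {
    public static int total(int[] vectors, int[] requestsPerVector) {
        int sum = 0;
        for (int i = 0; i < vectors.length; i++) {
            sum += vectors[i] * requestsPerVector[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Online: 50 vectors get 1 "echo" request each, 10 survive and get 21
        // probe requests each, 3 survive and get 60 payload requests each.
        int online = total(new int[]{50, 10, 3}, new int[]{1, 21, 60});
        // Offline: all 50 vectors go through every phase.
        int offline = total(new int[]{50, 50, 50}, new int[]{1, 21, 60});
        System.out.println(online + " vs " + offline); // prints "440 vs 4100"
    }
}
```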
5 Security Testing Framework - w3af
This chapter evaluates w3af as a basis for a combined white and black box security testing applica-
tion. w3af is an actively developed open source project with a lot of existing plugins including a web
crawler component and several code injection modules. It might be possible to enhance w3af with
the missing white box functionality.
w3af is a web application attack and audit framework completely written in Python. The project’s
goal is to create a framework to find and exploit web application vulnerabilities that is easy to use and
extend. w3af is released under the GNU General Public License Version 2.
w3af is a plugin-based framework and possesses over 100 plugins. These plugins do the actual work of
identifying web application vulnerabilities. Basically, w3af has three types of plugins: discovery, audit
and attack plugins. Discovery plugins have only one task: finding new so-called “injection points”, to
use the w3af term. These consist of HTML pages, URLs with query parameters, HTML forms etc. A
classic example of a discovery plugin is a web spider. This plugin takes a URL as input and returns
one or more injection points. When a user enables more than one plugin of this type, they work in
a loop: if plugin A finds a new URL in the first run, the w3af core will send that URL to plugin B. If
plugin B then finds a new URL, it will be sent to plugin A. This process goes on until all plugins have
run and no more knowledge about the application can be found using the enabled discovery plugins.
Audit plugins take the injection points found by discovery plugins and send specially crafted data to
all of them in order to find vulnerabilities. A classic example of an audit plugin is one that searches
for SQL injection vulnerabilities. The attack plugins' objective is to exploit vulnerabilities found by
audit plugins. They usually return a shell on the remote server or, in the case of SQL injection
exploits, a dump of remote tables. A complete list of all available w3af plugins can be found on the
w3af project homepage: http://w3af.sourceforge.net.
5.2 Web Spider
The w3af web spider plugin is one of the first plugins one needs to configure in order to start au-
tomated web application vulnerability scans. The more resources of a targeted web application the
web spider is able to detect during a scan, the more thorough the results will be, depending on the
quality of the vulnerability detection routine. The overall goal of any web spider is to detect as many
resources of the targeted web application as possible.
5.2.1 Settings
The w3af web spider plugin has the following options (see table 5.1) which can be configured by
the user. Other options regarding the spidering of a web application can be found in the
http-settings and misc-settings menus.
5.2.2 Authentication
Many web applications have some kind of secured area to which only users with the right credentials
and user roles have access. This could be some kind of administration interface or the like. w3af
provides three means of configuring the user credentials needed for accessing these secured areas of a
web application: Basic HTTP Authentication, setting HTTP headers manually and providing a
cookiejar file. How to use and configure these options is explained in the following paragraphs.
The “HTTP/1.0” specification includes a “Basic Access Authentication” scheme. The basic
authentication scheme is based on the model that the client must authenticate itself with a user id
and a password for each realm. The realm value should be considered an opaque string which can
only be compared for equality with other realms on that server. The server will service the request
only if it can validate the user id and password for the protection space of the Request-URI. There
are no optional authentication parameters [5, p.5]. A user can configure these settings through the
http-settings menu (see listing 5.1).
1 # ./w3af_console
2 w3af>>> http-settings
3 w3af/config:http-settings>>> set basicAuthUser myuser
4 w3af/config:http-settings>>> set basicAuthPass mypass
5 w3af/config:http-settings>>> set basicAuthDomain localhost
Every time a w3af plugin encounters a “401 Authorization Required” response from the target web
server, w3af resends the request with an Authorization header field containing the previously
configured settings (see line 13 in listing 5.2). Listing 5.2 shows how a typical Basic HTTP Authenti-
cation exchange works between a web server and a web client (typically a browser). The arrows used
in listing 5.2 indicate whether a message was sent (>) or received (<) from the client's point of view.
w3af allows the user to manually configure HTTP headers which will then be included in every HTTP
request the framework sends. This can be used to set an HTTP Cookie header manually, which requires
that the user obtained a valid cookie from the web application prior to starting the w3af framework.
Listing 5.3 shows how a user can configure an HTTP header file and listing 5.4 shows the actual content
of such a file.
1 # ./w3af_console
2 w3af>>> http-settings
3 w3af/config:http-settings>>> set headersFile headers.txt
This method is somewhat outdated, because the feature relies on the assumptions that the cookiejar
file is in the Mozilla cookie format (basically the Netscape HTTP cookie format specified in
RFC 2965) and that the user has easy access to the cookiejar file. Versions of Mozilla Firefox
up to 2.0.0.20 stored the cookies in a plaintext, whitespace-delimited file which could be
easily copied and used in conjunction with w3af. Nowadays Firefox stores everything inside a SQLite
database, which is easily accessible as well but lacks an export function to convert stored
cookies to the required cookiejar format. Listing 5.5 shows how a user would set the
cookiejar file.
1 # ./w3af_console
2 w3af>>> http-settings
3 w3af/config:http-settings>>> set cookieJarFile /home/user/Desktop/cookiejar
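A cookiejar file in the expected Netscape format might look like the following sketch (the fields are tab-separated: domain, subdomain flag, path, secure flag, expiry timestamp, cookie name and value; the session id shown is a made-up example):

```
# Netscape HTTP Cookie File
localhost	FALSE	/	FALSE	1735689600	PHPSESSID	a1b2c3d4e5f6
```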
5.2.3 Shortcomings
The w3af web spider plugin has several shortcomings. The two most prominent are:
• Missing JavaScript Support: Most web applications written today use some kind of
JavaScript functionality, either for the visualisation of content or to provide application features
which have to run on the user's computer. Web spiders without JavaScript support won't be
able to fully extract all embedded links in modern web applications and therefore miss possible
attack vectors which could pose a security threat to the targeted web application. A web
application might, for example, use JavaScript code to display more HTML links after an element of
an HTML drop-down menu has been selected; web spiders without JavaScript support won't be
able to see these additional, dynamically created HTML links. Sophisticated web spiders try to
compensate for the missing JavaScript support by parsing embedded and externally loaded
JavaScript source files for complete URLs. Of course this only works if no JavaScript code is
used to concatenate several strings into a valid URL. Section 5.2.4 introduces a benchmarking
framework which measures the link extraction capabilities of web spiders.
• Missing Automated HTML Form Authentication: As seen in section 5.2.2, w3af does not provide
a plugin for automated HTML form authentication. Most modern web applications use some
kind of HTML form for their user authentication: a user has to enter credentials such as
username and password into an HTML form and send it to the web application. w3af is
missing a plugin which would automate this task. Other tools have to be used instead, prior
to starting the w3af framework, to collect cookies or session ids after a successful login. See
[2, p. 35] for a Python script called SessionGrabber which can be used prior to scanning a web
application with w3af to collect session cookies of a targeted web application in an automated
manner, provided the username and password of a legitimate user are known to the testing agent.
Nevertheless, web spiders without JavaScript support are much faster than their counterparts: web
spiders with JavaScript support have to parse and compile the embedded JavaScript code before a
requested HTML page can be processed further.
5.2.4 WIVET Benchmark
WIVET1 is a benchmarking framework that aims to statistically analyse web link extractors. In general,
web application vulnerability scanners fall into this category: given a URL, these scanners try to
extract as many input vectors as they possibly can in order to increase the coverage of the attack
surface. WIVET presents a good number of input vectors to any extractor and reports the results. In
order to evaluate an input extractor meaningfully, the extractor has to provide some kind of session
handling, which nearly all decent crawlers do. The WIVET project is released under the GNU General
Public License Version 2.
This section compares the high scores achieved by commercial and free web spiders, with and without
JavaScript support. The high scores have been taken from the WIVET homepage (see
http://code.google.com/p/wivet/wiki/CurrentResults). Only the w3af score has been verified, with
SVN revision 3438; the other scores could not be verified because the products mentioned in the high
score lack any versioning information.
• w3af - 50%
Open Source web application attack and audit framework.
Version: 1.1 (from SVN server) Revision: 3438. See http://w3af.sourceforge.net/
• Acunetix - 94%
Commercially available web application vulnerability scanner. Acunetix has pioneered web
application security scanning technology.
1 http://code.google.com/p/wivet/ [29.03.10]
• HP WebInspect - 94%
Commercially available automated web application security testing and assessment
tool that provides assessment technology for web services and web application
security and automates web application security testing and assessment. See
https://h10078.www1.hp.com/cda/hpms/display/main/hpms_content.jsp?zn=
bto&cp=1-11-201-200%5E9570_4000_100__
• IBM AppScan - 83%
Commercially available desktop solution to automate web application security testing. Rational
AppScan Standard Edition significantly reduces costs associated with manual vulnerability
testing and helps to protect against the threat of cyber-attack by automating security analysis
to detect exploitable vulnerabilities. See http://www-01.ibm.com/software/awdtools/appscan/.
As one can easily see, w3af performed the worst of the tested web spiders, although one has to note
that w3af is the only free software project among the tested products. For a list of other
commercially or freely available web application security tools see [3].
The w3af XSS plugin tries to find reflected JavaScript code in previously discovered HTML pages
of a targeted web application by injecting JavaScript code into those pages. At first, several
special characters, such as <, >, ", ’, ( and ), are sent through writing them into available GET
parameters or into input fields provided by HTML forms (for performance reasons). Afterwards,
more complex XSS payloads are tested (see figure 5.1 for a complete list of the XSS payloads
sent by w3af).
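The core of this detection step can be sketched in a few lines of Java: a probe containing a random prefix plus one special character is injected into a parameter, and the response body is checked for an unencoded echo of that probe. The class and method names below are illustrative assumptions, not w3af's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class ReflectionCheck {
    // Special characters probed for, as listed in the text: < > " ' ( )
    private static final String[] SPECIALS = { "<", ">", "\"", "'", "(", ")" };

    /**
     * Returns the special characters from an injected probe that the web
     * application reflected back unencoded in its HTML response body.
     * A random prefix (e.g. "zx7q") avoids false positives from characters
     * that legitimately occur in the page.
     */
    public static List<String> reflectedSpecials(String probePrefix, String responseBody) {
        List<String> reflected = new ArrayList<>();
        for (String s : SPECIALS) {
            if (responseBody.contains(probePrefix + s)) {
                reflected.add(s);
            }
        }
        return reflected;
    }

    public static void main(String[] args) {
        // Hypothetical response echoing a GET parameter without HTML-encoding
        String body = "<html><body>Search results for zx7q<zx7q>: none</body></html>";
        System.out.println(reflectedSpecials("zx7q", body)); // prints [<, >]
    }
}
```

If a special character comes back HTML-encoded (e.g. as `&lt;`), it is not counted as reflected, which is exactly the property the more complex XSS payloads then build on.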
5.3.1 Settings
Table 5.2 shows the available options which allow the user to tweak the XSS plugin to his needs.
The sqli plugin of w3af has no settings that can be configured by the user, except for those found
in the misc-settings and http-settings menus.
5.5 Conclusion
Although w3af is one of the best freely available web application security testing frameworks, it
falls short when spidering modern web applications that rely on JavaScript support in the user's
browser. Furthermore, the extraction of web application cookies or session IDs after valid user
credentials have been provided to the targeted web application is not optimally implemented in
w3af: it is quite difficult to automate the authentication process with the features provided by
the framework. Third-party tools have to be used for authentication and for the extraction of
application cookies or session IDs prior to launching the w3af framework.
A properly working web spider is essential for the detection of possible security threats in a
targeted web application. This project tries to enhance the detection of possible input validation
vulnerabilities through the parsing of database query log files, but if no attack vectors are
detected on the targeted web application, the white box component cannot detect any vulnerabilities
either. A possible solution to this problem would have been to add JavaScript support to the w3af
web spider in order to enhance the detection of HTML links and forms. Unfortunately, there are
currently no Python libraries available (w3af is completely written in Python) which are able to
modify an HTML document tree based on common HTML JavaScript events such as onmouseover, onselect,
onload etc. This is different if one looks for Java libraries: HtmlUnit is a “GUI-Less browser for
Java programs” and is released under the Apache License Version 2 (see chapter 7).
The following chapter outlines the design decisions and the overall package structure of the Web
Application Security Testing Framework (WASTF) which has been developed during this project thesis.
The framework builds the foundation for the combined white and black box web application security
testing plugin. One option would have been to extend the w3af framework, but this has been ruled
out in chapter 5 because of w3af's shortcomings in certain areas such as the web spider plugin.
WASTF has been designed to be very similar to w3af to make it easier to use for people who are
already familiar with w3af.
The WASTF framework is a command line tool with an interactive command line interface (CLI)
similar to that of w3af. WASTF uses the open source JLine1 library to provide the CLI. JLine is a
Java library for handling console input; people familiar with the readline/editline capabilities
of modern shells (such as bash and tcsh) will find most of the command editing features of JLine
familiar2 .
JLine is distributed under the BSD license.
The framework consists of menus [M] and plugins [P], which are arranged in a hierarchical,
tree-like structure and which can be traversed by the user through the CLI by entering the name of
the menu or plugin he wants to enter. By entering ’back’ the user leaves the current menu or plugin
and jumps to the parent of the current element. Inside a plugin or menu element the user can
configure the settings provided by the element through the ’set <parameter> <value>’ command. To
display a list of available commands and parameters to change, the user simply has to type ’help’
followed by ENTER, no matter where he currently resides in the hierarchical menu structure.
Figure 6.1 shows the hierarchical menu structure of the framework.
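The menu traversal described above can be sketched as a simple tree of named nodes. This is a minimal illustration of the mechanism, not WASTF's actual com.wastf.menu classes; all names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class MenuNode {
    private final String name;
    private final MenuNode parent;
    private final Map<String, MenuNode> children = new HashMap<>();
    private final Map<String, String> settings = new HashMap<>();

    public MenuNode(String name, MenuNode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) parent.children.put(name, this);
    }

    /** Handles one CLI command and returns the node the user now resides in. */
    public MenuNode handle(String line) {
        String[] parts = line.trim().split("\\s+");
        switch (parts[0]) {
            case "back":
                return parent != null ? parent : this; // 'back' at the root stays at the root
            case "set":
                settings.put(parts[1], parts[2]);      // 'set <parameter> <value>'
                return this;
            default:
                MenuNode child = children.get(parts[0]);
                return child != null ? child : this;   // descend into a sub menu or plugin
        }
    }

    public String path() {
        return parent == null ? "/" + name : parent.path() + "/" + name;
    }

    public String setting(String key) { return settings.get(key); }

    public static void main(String[] args) {
        MenuNode root = new MenuNode("wastf", null);
        MenuNode discovery = new MenuNode("discovery", root);
        new MenuNode("webSpider", discovery);

        MenuNode cur = root.handle("discovery").handle("webSpider");
        cur = cur.handle("set maxDepth 3");
        System.out.println(cur.path());                // prints /wastf/discovery/webSpider
        System.out.println(cur.setting("maxDepth"));   // prints 3
        System.out.println(cur.handle("back").path()); // prints /wastf/discovery
    }
}
```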
The plugins which contain the actual business logic are classified according to one of the following
categories (similar to w3af):
• WebLogin: These types of plugins are being used to get the necessary user credentials from the
1 http://jline.sourceforge.net/ [25.06.10]
2 The most generic sense of the term shell means any program that users employ to type commands. In the Unix operating
system users may select which shell to use for interactive sessions; when the user logs in to the system, the shell program
is automatically executed. Many types of shells have been developed for this purpose. The program is called a “shell”
because it hides the details of the underlying operating system behind the shell’s interface: the shell manages the
technical details of the operating system kernel interface, which is the lowest-level, or ’innermost’, component of an
operating system.
web application before any other plugins are started. This is especially useful if the targeted
web application is secured with a login form where a user has to enter his username and password
before he even gets access to the content of the web application.
• Discovery: Discovery plugins are used to collect as much information about the targeted web
application as possible (Information Gathering). This includes web server data such as brand
and version (not yet implemented) and spidering the web application.
• Audit: Audit plugins search for vulnerabilities in the previously discovered web application re-
sources.
• Output: Output plugins have the simple task of writing the findings of all the executed plugins
into a nicely formatted report such as an XML, HTML or PDF file.
Depending on its category, a plugin which has been enabled by the user is started at a different
point in the program flow (see section 6.2). As described in section 3.3, the framework can be
started in two different modes: interactive and automated. The automated mode can be used to
schedule automated runs through a cron job on UNIX-like operating systems or through the Task
Scheduler on Microsoft operating systems. The user simply has to provide a plain text file with
the commands he wants to be executed during a scheduled run of the framework; the commands have
to be separated by newlines.
The next subsections describe the two modes in which the WASTF framework can be launched and the
differences between them.
The interactive mode is the default mode, which is used if the user does not specify the -s
<script file> parameter; the application is then launched in interactive mode by simply running
it without any additional arguments.
The automated mode is especially important for testing the security of web applications in an
automated way. This mode allows the testing agent to schedule continuous scans of a targeted web
application, let's say every morning at three o'clock. Listing 6.2 shows how to start WASTF in
the automated mode and listing 6.3 shows the content of the file used to automate the WASTF
framework. The wastfScript.txt file contains the commands which are executed from top to bottom
by the WASTF framework.
Figure 6.2 shows a simplified UML 2 package/class diagram with a bird's-eye overview of the
package structure of the WASTF framework. The actual source code can be found on the enclosed
CD-ROM. The framework consists of 9 packages with various sub-packages. The general purpose of
each package is explained below:
• com.wastf.cli The cli package contains the classes responsible for the interactive Command
Line Interface (CLI). The CLI uses the free and open source library JLine3, a Java library for
handling console input similar in functionality to BSD editline and GNU readline. WASTF provides
a hierarchical menu structure on top of JLine which can be traversed by the user (see section 6.1).
• com.wastf.configuration The configuration package keeps track of the settings the user made
inside menus as well as in plugins. Additionally, these settings are encoded and saved into the
underlying data storage system. Another noteworthy task of this package is to provide an
in-memory service for inter-plugin communication during an active scan of a targeted web
application.
• com.wastf.database The database package contains the necessary classes for accessing and
setting up a connection to the underlying Database Management System. Section 6.4 describes
how other classes can make use of the provided data storage system.
• com.wastf.log The log package contains classes for logging everything the user sees and enters
in the CLI to a plain text file for later analysis and reproducibility.
3 http://jline.sourceforge.net/ [13.07.10]
• com.wastf.menu The menu package contains the various menus which can be accessed by the
user such as the target menu for setting the web application to scan or the discovery menu
which displays all the plugins which belong to the discovery category.
• com.wastf.plugin The plugin package holds the various plugin interfaces for the different plu-
gin categories such as discovery, output and audit plugins.
• com.wastf.util The util package contains simple static classes with commonly used utility
methods, for example a random string generator and a Base64 encoder/decoder.
• com.wastf.web The web package contains wrapper and helper classes for using the HtmlUnit
library for sending HTTP(S) POST and GET requests as well as parsing retrieved HTML content
(with JavaScript support). See chapter 7 for more information about the HtmlUnit library and
its features.
16 wastf ( / ) >

1 # java -jar WASTF.jar -s /home/<myuser>/Desktop/wastfScript.txt
Listing 6.2: Starting WASTF in automated mode (the ASCII-art banner WASTF prints on startup is omitted here)

1 ## This is a comment
2 version
3 exit
Listing 6.3: Content of the wastfScript.txt file
For saving relevant information during and between scans of targeted web applications, WASTF uses
a relational database management system. Using a database management system was a necessity for
developing a working web spider: the alternatives would have been to keep all the discovered web
pages in memory or to download them to the machine's hard disk. Both alternatives have significant
drawbacks. Storing everything in the machine's memory is obviously not a good idea for large web
applications such as e-business applications; the memory would soon fill up, which in turn would
inevitably cause the WASTF application to crash. Downloading web pages onto the machine's hard
disk would probably work but comes with a great loss of speed, as each plugin activated after the
web spider would have to load and parse the stored web pages anew from the hard disk.
WASTF uses the H2 database management system. H2 is a relational database management system
written in Java4 . It can be embedded in Java applications or run in client-server mode. The disk
footprint (size of the jar file) is about 1 MB5 . The software is available as open source under
modified versions of the Mozilla Public License or the original Eclipse Public License.
WASTF uses the Data Access Object (DAO) pattern6 for accessing the database. A component that
relies on a DAO uses the simpler interface the DAO exposes to its clients; the DAO completely
hides the data source implementation details from them. Because the interface exposed by the DAO
to clients does not change when the underlying data source implementation changes, this pattern
allows the DAO to adapt to different storage schemes without affecting its clients or components.
Essentially, the DAO acts as an adapter between the component and the data source.
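As a sketch of the pattern (interface and class names are illustrative assumptions, not WASTF's actual com.wastf.database classes), a DAO for discovered pages could look as follows. Clients only see the interface, so an H2/JDBC-backed implementation could replace the in-memory one without any changes to them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Interface exposed to clients; hides whether data lives in memory or in H2. */
interface DiscoveredItemDao {
    void save(long runId, String url);
    List<String> findByRun(long runId);
}

/** In-memory implementation; an H2-backed one would implement the same interface. */
class InMemoryDiscoveredItemDao implements DiscoveredItemDao {
    private final Map<Long, List<String>> store = new LinkedHashMap<>();

    @Override
    public void save(long runId, String url) {
        store.computeIfAbsent(runId, k -> new ArrayList<>()).add(url);
    }

    @Override
    public List<String> findByRun(long runId) {
        return store.getOrDefault(runId, new ArrayList<>());
    }
}

public class DaoDemo {
    public static void main(String[] args) {
        DiscoveredItemDao dao = new InMemoryDiscoveredItemDao();
        dao.save(1, "https://web1.example.com/index.php");
        dao.save(1, "https://web1.example.com/login.php");
        System.out.println(dao.findByRun(1).size()); // prints 2
    }
}
```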
Figure 6.3 shows an entity relationship model (ERM) of the current implemented database schema.
What follows is a short description of each table shown in figure 6.3:
• Run: This table stores the URL of a web application the user wants to scan. Additionally, a
time stamp of when the scan started is kept. All the other tables reference the primary key
embedded in this table.
• Configuration: The configuration table keeps track of the settings a user made before starting
a scan; this includes menus and plugins alike.
• DiscoveredItem: This table is used by the web spider plugin to store the web pages discovered
during the crawling process.
4 http://www.h2database.com/html/main.html [12.07.10]
5 http://www.h2database.com/html/main.html [12.07.10]
6 http://java.sun.com/blueprints/corej2eepatterns/Patterns/DataAccessObject.html [12.07.10]
• HttpLog & HttpLogParameter: The HttpLog table stores every HTTP or HTTPS request sent by
the HtmlUnit library.
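Based on the table descriptions above, the schema could be created in H2 roughly as follows. This is a sketch only: the column names beyond those mentioned in the text (URL, time stamp, foreign keys to Run) are assumptions, and the actual schema in figure 6.3 may differ.

```sql
CREATE TABLE Run (
    id         IDENTITY PRIMARY KEY,
    url        VARCHAR(2048) NOT NULL,  -- target web application
    started_at TIMESTAMP NOT NULL       -- when the scan started
);

CREATE TABLE DiscoveredItem (
    id     IDENTITY PRIMARY KEY,
    run_id BIGINT NOT NULL REFERENCES Run(id),
    url    VARCHAR(2048) NOT NULL       -- page found by the web spider
);

CREATE TABLE HttpLog (
    id     IDENTITY PRIMARY KEY,
    run_id BIGINT NOT NULL REFERENCES Run(id),
    method VARCHAR(8) NOT NULL,         -- GET or POST
    url    VARCHAR(2048) NOT NULL       -- request sent by the HtmlUnit library
);
```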
This chapter describes the open source HtmlUnit project and the benefits it provides for testing web
applications automatically. Additionally, the design and performance of the web spider plugin, which
has been developed as part of this project thesis, are being discussed and visualised.
HtmlUnit is a “GUI-Less browser for Java programs” and is released under the Apache License
Version 2. It models HTML documents and provides an API that allows developers to invoke pages,
fill out forms, click links etc., just like one does in a “normal” browser. The JavaScript support
(which is constantly being improved) is fairly good and is able to work even with quite complex
AJAX libraries, simulating either Mozilla Firefox or Microsoft Internet Explorer. HtmlUnit is
typically used for testing purposes, although HtmlUnit is not a generic unit testing framework
such as JUnit for Java; it is specifically a way to simulate a browser for testing purposes and is
intended to be used within another testing framework such as JUnit or TestNG1 .
HtmlUnit uses the Mozilla Rhino engine2 to provide JavaScript support. Rhino is an open source
implementation of JavaScript written entirely in Java. Rhino implements the core language only
and does not contain objects or methods for manipulating HTML documents; those are instead
provided by HtmlUnit. Rhino contains (as of version 1.6) all the features of JavaScript 1.7,
allows direct scripting of Java, and provides a JavaScript shell for executing JavaScript scripts,
a JavaScript compiler to transform JavaScript source files into Java class files and a JavaScript
debugger for scripts executed with Rhino. The JavaScript language itself is standardised by
Standard ECMA-262, ECMAScript3 : a general purpose, cross-platform programming language. Rhino 1.3
and greater conform to Edition 3 of the standard.
JavaScript support is essential for web spider components that want to crawl today's web
applications thoroughly. There are no absolute figures about the use of JavaScript: some users
have scripting turned off, and some browsers do not support scripting at all. However, W3C's
browser statistics show that 95% of all browsers on the Internet have JavaScript support enabled
(as of January 2008)4 . It is safe to say that more web applications are being built using some
kind of JavaScript functionality than ever before.
1 Taken from http://htmlunit.sourceforge.net/ [05.04.10]
2 http://www.mozilla.org/rhino/ [05.04.10]
3 http://www.ecma-international.org/publications/standards/Ecma-262.htm [05.04.10]
4 http://www.w3schools.com/browsers/browsers_stats.asp [11.06.10]
The goal of web spider components, especially those embedded in automated web application
security applications, is to find and unveil as many HTML pages containing attack vectors as
possible. Web spiders without JavaScript support, such as the webSpider plugin of w3af, would
return an empty HTML document when stumbling upon the HTML page shown in listing 7.2. This
results in reduced accuracy, because pages of the targeted web application which rely on extended
JavaScript functionality are never checked for security related issues in an automated scan.
By using the HtmlUnit API and its JavaScript support it is relatively easy to write a
multi-threaded web spider module which is able to cope with modern web applications that use
extended JavaScript functionality. The next section shows how HtmlUnit is used in a simple
scenario: to retrieve an HTML page with embedded JavaScript, fill out some HTML text input fields
and actually submit the form by using the HtmlUnit API to click the submit button.
The API of HtmlUnit is very accommodating for filling out and submitting HTML forms. Listing 7.1
shows a sample Java method which retrieves an HTML page (the source code of the retrieved HTML
page is shown in listing 7.2) through a proxy and populates the different HTML text input fields.
The received HTML page contains a login form which has been dynamically created by a JavaScript
routine (see listing 7.2, lines 5 - 12).
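A method of this kind can be sketched with the HtmlUnit 2.x API as follows. This is a minimal illustration, not the verbatim listing 7.1: the proxy host and port, the URL and the credential values are placeholder assumptions, and the code requires the HtmlUnit jar on the classpath.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;

public class LoginExample {
    public static HtmlPage login() throws Exception {
        // Browser version and proxy are set upon creation of the WebClient
        WebClient webClient =
                new WebClient(BrowserVersion.FIREFOX_3, "proxy.example.com", 8080);

        // getPage() downloads the HTML and lets Rhino execute the embedded JavaScript
        HtmlPage page = webClient.getPage("https://web1.example.com/login.html");

        // The form inputs exist only after the JavaScript has run (cf. listing 7.2)
        HtmlForm form = page.getFormByName("loginForm");
        form.getInputByName("username").setValueAttribute("alice");
        form.getInputByName("password").setValueAttribute("secret");
        HtmlSubmitInput button = form.getInputByName("loginButton");

        // Now submit the form by clicking the button and get back the second page
        return button.click();
    }
}
```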
HtmlUnit (as of version 2.7) is able to simulate the following browsers: Microsoft Internet
Explorer versions 6, 7 and 8, Netscape, and Mozilla Firefox versions 2 and 3. Selecting a specific
browser version changes the HTTP User-Agent header sent by the HtmlUnit API; additionally, some
specific JavaScript instructions are interpreted differently based on the selected browser
version5 . The specific browser version is set upon creation of the HtmlUnit WebClient object
(see listing 7.1, lines 3-4).
HtmlUnit then uses the Mozilla Rhino engine to interpret all the downloaded JavaScript
instructions and changes the HTML document structure accordingly (if necessary). The
aforementioned getPage() method returns a Java HTML page object which contains the fully rendered
HTML page.
5 http://htmlunit.sourceforge.net/apidocs/com/gargoylesoftware/htmlunit/BrowserVersion.html
[11.06.10]
The HTML page object can now be used to manipulate the contents of the retrieved HTML page. To
give an impression of the functionality HtmlUnit offers through its API, the following list
contains some of the more interesting methods offered by the HTML page object (as of HtmlUnit
version 2.7). For a complete documentation of the API visit HtmlUnit's project homepage:
• List<HtmlAnchor> getAnchors()
Returns a list of all <a href="">...</a> anchors contained in a received HTML page.
• HtmlElement getFocusedElement()
Returns the element with the focus or null if no element has the focus.
• List<HtmlForm> getForms()
Returns a list of all the forms in a received HTML page.
• List<FrameWindow> getFrames()
Returns a list containing all the frames (from frame and iframe tags) in a received HTML page.
Some of these methods return an HtmlElement object, an abstract class provided by HtmlUnit. This
abstract class is used by other HtmlUnit classes to model HTML elements such as text or password
input fields, radio buttons, checkboxes etc. Again, a short list of interesting methods which all
HtmlElement objects have in common:
• void focus()
Sets the focus on this element.
• Page mouseOver()
Simulates moving the mouse over this element, returning the page that this element’s window
contains after the mouse move.
The example in listing 7.1 uses the names of the HTML elements in order to retrieve them from
the previously retrieved HTML page. This is just one of several ways to retrieve and modify HTML
elements embedded in the HTML page object. Other methods include: retrieving HTML elements by
their “id” attribute, iterating through a list of all available HTML elements in the HTML page,
or using the XML Path Language (XPath). XPath is a query language for selecting nodes from an XML
document. The XPath language is based on a tree representation of an XML document and provides
the ability to navigate around the tree, selecting nodes by a variety of criteria. In popular use
(though not in the official specification), an XPath expression is often referred to simply as an
XPath.
HtmlUnit’s HtmlPage object allows the developer to retrieve HTML elements embedded in the HTML
page by issuing XPath queries, for example to retrieve all HTML text and password input elements
embedded in an HTML page object.
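With HtmlUnit such a query would typically be issued as `page.getByXPath("//input[@type='text' or @type='password']")`. Since HtmlUnit accepts standard XPath, the expression itself can be exercised in a self-contained way with the JDK's built-in XPath engine; the well-formed HTML fragment and the class name below are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathInputDemo {
    // XPath expression selecting all text and password input elements
    static final String QUERY = "//input[@type='text' or @type='password']";

    public static int countInputs(String html) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(html.getBytes("UTF-8")));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(QUERY, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // A static stand-in for the rendered login form of listing 7.2
        String html = "<html><body><form>"
                + "<input type='text' name='username'/>"
                + "<input type='password' name='password'/>"
                + "<input type='submit' name='loginButton'/>"
                + "</form></body></html>";
        System.out.println(countInputs(html)); // prints 2 (submit does not match)
    }
}
```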
These XPath queries are used extensively in the multi-threaded web spider plugin written for this
project thesis (see section 7.4). The XPath queries are used to extract new URLs pointing to
pages on the targeted web application which have not yet been visited in an ongoing web spider run.
27 // Now submit the form by clicking the button and get back the second page
28 final HtmlPage page2 = button.click();
29 }
Listing 7.1: Short HtmlUnit example for submitting a login form through a proxy server
1 <html>
2 <head><title>Login Form</title></head>
3 <body>
4 <form action="checkLogin.php" name="loginForm">
5 <script type="text/javascript">
6 function writeInputElement(inputType, inputName) {
7     document.write('<input type="' + inputType + '" name="' + inputName + '" />');
8 }
9 writeInputElement("text", "username");
10 writeInputElement("password", "password");
11 writeInputElement("submit", "loginButton");
12 </script>
13 </form>
14 </body>
15 </html>
Listing 7.2: Dynamically created HTML login form with embedded JavaScript code
The main drawback of HtmlUnit is its JavaScript execution speed: with JavaScript support enabled,
HtmlUnit is noticeably slower in processing downloaded HTML documents than state-of-the-art
browsers such as Mozilla Firefox and its competitors. In order to measure the performance of the
HtmlUnit library, a test case from the Mozilla Dromaeo JavaScript Performance Test Suite has been
used6 . The test case has been slightly modified for testing the execution speed of the HtmlUnit
API. The selected test case involves modifying the HTML Document Object Model (DOM) by creating
and appending several hundred new HTML elements to a retrieved page. The following functions are
used by the test case to measure the execution speed of HTML DOM modifications:
• createElement(tagName)
This method returns an Element object. The tagName parameter is of type String. This method
can raise a DOMException object.
• createTextNode(data)
This method returns a Text object. The data parameter is of type String.
• cloneNode(deep)
This method returns a Node object. The deep parameter is of type Boolean.
• document.body.appendChild(newChild)
This method returns a Node object. The newChild parameter is a Node object. This method
can raise a DOMException object.
• document.body.insertBefore(newChild, refChild)
This method returns a Node object. The newChild parameter is a Node object. The refChild
parameter is a Node object. This method can raise a DOMException object.
• document.body.innerHTML
This variable can be used to modify the rendered content of a HTML document after it has been
fully loaded inside a web browser.
Figure 7.1 shows the relative runtimes in milliseconds of Mozilla Firefox 3.6.3, Chromium
6.0.431.0 and HtmlUnit 2.7. In this comparison HtmlUnit is almost 4x slower than Mozilla Firefox
3.6.3 and even 19x slower than Chromium 6.0.431.0 at HTML DOM modifications, with the creation of
new HTML elements being the most time consuming task for HtmlUnit. The test case has been run
five times for every browser and the runtimes in figure 7.1 are the arithmetic mean of these
runs. See Appendix A for a detailed breakdown of the measured runtimes and Appendix B for the
code used to measure the JavaScript performance.
6 https://wiki.mozilla.org/Dromaeo [11.06.10]
The following sections describe the overall design of the web spider plugin which has been
developed as part of this project thesis. The web spider uses the HtmlUnit library described in
sections 7.1, 7.2 and 7.3 for its JavaScript interpreter and its ability to simulate
state-of-the-art web browsers.
The main goal throughout the design and implementation of the web spider was to build it:
• Scalable: As seen in section 7.3, the HtmlUnit library is not the fastest when it comes to
processing JavaScript instructions, especially HTML DOM modifications. In order to increase the
crawling speed without disabling JavaScript support, the web spider has to be developed in a
multi-threaded fashion.
• Easily extensible: The web spider needs to be easily extensible if more functionality should be
added to the code base without any major hassle.
• JavaScript aware: The main advantage over other web spiders is the JavaScript support of the
HtmlUnit library. The full potential of the HtmlUnit API regarding JavaScript support should be
used.
• Comprehensive: The web spider should crawl a web application as thoroughly as possible; other
components, like the detection of SQL injection vulnerabilities, build upon the resources
discovered by the web spider.
• Configurable: The user should be able to alter the behaviour of the web spider through several
parameters which he can set prior to starting the web spider. The configurable parameters are
described in section 7.4.1.
Appendix C and D contain a simplified UML 2 class and sequence diagram of the developed web
spider component.
In order to alter the behaviour of the web spider plugin, the user can change various settings
through the configuration menu of the web spider. These settings mostly influence how long the
web spider plugin will crawl through a web application before terminating the crawling process
and passing the results to other modules such as the SQL injection detection routine.
Additionally, a user is able to tell the web spider which resources it should follow or ignore.
• Setting a maximum crawling level: This setting allows the user to tell the web spider plugin
how “deep” into the web application it should crawl. The first page the web spider retrieves
from the web application is on level 0; any links embedded in that first page lead to pages on
level 1, and so on. By setting a maximum crawling level n, the web spider will stop crawling any
further once it has reached level n.
• Setting a follow regular expression: If this setting is used, the web spider plugin will only
follow links matching the given regular expression and will ignore all links that do not match
it.
• Setting an ignore regular expression: If this setting is used, the web spider will crawl every
link it finds except those matching the provided expression.
• Setting a time limit: The user is able to set a maximum time limit for the spidering process;
the web spider will stop spidering once the time limit has been exceeded. Of course, the
spidering process is terminated before the time limit is reached if either the web spider cannot
find any more resources on the targeted web application or disk space is running low.
• Staying on the same domain: Usually the web spider follows any found link embedded in a
received HTML page, even if the link leads to a completely different domain than the initial
address of the targeted web application. For example, a web application reachable under
https://web1.example.com may contain an embedded link to a completely different domain such as
http://www.example2.org. This setting tells the web spider whether domains differing from the
initial domain of the web application should be ignored.
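The three filtering settings above can be sketched as one decision function applied to every discovered link. The class name and constructor shape are illustrative assumptions, not WASTF's actual code.

```java
import java.net.URI;
import java.util.regex.Pattern;

public class UrlFilter {
    private final Pattern follow;   // only URLs matching this are crawled (null = follow all)
    private final Pattern ignore;   // URLs matching this are skipped (null = ignore none)
    private final String baseHost;  // non-null when "stay on the same domain" is enabled

    public UrlFilter(String followRegex, String ignoreRegex, String baseHost) {
        this.follow = followRegex == null ? null : Pattern.compile(followRegex);
        this.ignore = ignoreRegex == null ? null : Pattern.compile(ignoreRegex);
        this.baseHost = baseHost;
    }

    /** Decides whether a discovered link should be queued for crawling. */
    public boolean shouldCrawl(String url) {
        if (follow != null && !follow.matcher(url).find()) return false;
        if (ignore != null && ignore.matcher(url).find()) return false;
        if (baseHost != null) {
            return baseHost.equals(URI.create(url).getHost());
        }
        return true;
    }

    public static void main(String[] args) {
        UrlFilter f = new UrlFilter(null, "\\.pdf$", "web1.example.com");
        System.out.println(f.shouldCrawl("https://web1.example.com/index.php")); // prints true
        System.out.println(f.shouldCrawl("http://www.example2.org/index.php"));  // prints false
        System.out.println(f.shouldCrawl("https://web1.example.com/doc.pdf"));   // prints false
    }
}
```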
Figure 7.2 shows a UML 2 activity diagram with a bird's-eye view of the implemented program logic
of the multi-threaded, JavaScript-aware web spider. The steps following the Start Spidering node
are executed in multiple threads, thus speeding up the whole spidering process. A received HTML
page is processed by executing different link extraction routines; these routines scan the
retrieved HTML code for links pointing to previously undiscovered HTML pages of the web
application. The currently implemented extraction routines are the following:
• A HREF link extraction: This routine simply extracts the href attribute from common HTML
<a href="...">...</a> links.
• Mouse event extraction: Scans the retrieved HTML code for onmouseover and onmouseout
attributes and invokes the corresponding events with HtmlUnit's JavaScript support. If an event
triggers a page redirect to a previously undiscovered HTML page, that page is added to the web
spider queue for further processing.
• OnClick extraction: This routine scans the retrieved HTML code for onclick attributes and
invokes the events with HtmlUnit's JavaScript support. If an event triggers a page redirect to a
previously undiscovered HTML page, that page is added to the web spider queue for further
processing.
• OnChange extraction: This routine scans the retrieved HTML code for onchange attributes and
invokes the events with HtmlUnit's JavaScript support. If an event triggers a page redirect to a
previously undiscovered HTML page, that page is added to the web spider queue for further
processing.
• HTML comment extraction: Scans the retrieved HTML code for URLs inside HTML comments with a
sophisticated regular expression. The following example shows a common HTML comment with an
embedded URL which will be detected by this routine. The identified URLs inside HTML comments
are added to the web spider queue for further processing.
<!--
This is a common HTML comment...
and this is an embedded URL: https://web1.example.com/index.php
-->
• Submitting HTML forms: If the retrieved HTML page contains one or multiple HTML forms,
this routine tries to fill out the existing input fields, drop down boxes, radio buttons etc. to
successfully submit the form. If the submitted form triggers a page redirect to a previously
undiscovered HTML page, then the page will be added to the web spider queue for further
processing. For more details on how forms are being submitted and how input fields are being
populated see section 7.5.
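As a minimal illustration of such an extraction routine, the A HREF case can be sketched with a regular expression. The real web spider walks HtmlUnit's DOM instead of scanning raw markup; the class and method names below are purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified stand-in for the A HREF extraction routine. The actual web
// spider uses HtmlUnit's DOM; this sketch scans raw HTML with a regular
// expression purely to illustrate what the routine extracts.
public class HrefExtractor {

    private static final Pattern HREF = Pattern.compile(
            "<a\\s+[^>]*href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Returns every href target found in the given HTML snippet.
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }
}
```

In the framework itself, each extracted target would then be checked against the set of already discovered pages before being added to the web spider queue.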
Common HTML forms are widely used in today’s web applications for passing information from
the user to the web application. A form can contain input elements like text fields, checkboxes,
radio-buttons, submit buttons and more. A form can also contain select lists, textarea, fieldset,
legend, and label elements. Of course there are other technologies such as Adobe Flash, Java Applets,
JavaFX, Microsoft Silverlight etc. that can be used to create expressive, feature-rich web application
user interfaces. These technologies are not discussed further here. The web spider plugin
developed for this project thesis is able to detect HTML forms embedded in retrieved HTML pages.
The discovered forms are being analysed and the web spider tries to populate the form elements such
as text fields, radio-buttons, checkboxes etc. with reasonable data to circumvent any restrictions in a
non invasive way.
The main difficulty in filling out HTML forms in an automated manner is to circumvent any restrictions
the business logic of the web application might impose on a particular HTML form. Let's
say a user has to create a new user profile in an E-Business application. The user has to enter his
name, address and phone number in order to use the E-Business application. Most probably the
user will be shown a HTML form with the necessary input fields such as a text field for entering his
name, address, phone number and e-mail address and a drop down list for entering his birthday
and country. The business logic of the web application probably checks the input fields and helps
the user in filling out the necessary data. The business logic makes sure that the text field used for
entering the user's phone number only contains digits ranging from 0 to 9 before accepting
the data. The main goal is to guess such restrictions based on various indicators
like the name of the HTML element and others.
The routine for smartly filling out HTML forms is implemented as a helper library and can
be used in a non invasive and in an invasive mode. How business logic restrictions are guessed and
what the differences between those two modes are is explained in the following subsections
7.5.1 and 7.5.2.
7.5.1 Non invasive
The non invasive form of this routine is used by the web spider plugin. Non invasive means
that no hidden HTML elements are being changed and no existing values except for text fields are
being overwritten. This reflects the behaviour of a normal and friendly user using the web applica-
tion. The following indicators are being used to guess the imposed business logic on HTML elements
embedded in a retrieved HTML form:
• Existing value detection: If the name or the id of an HTML element does not match any prede-
fined buzzword, the value of the element is analysed. It is possible that the input
field has been pre-filled with an example by the web application developer. Let's say the
user has to enter his e-mail address and the corresponding HTML element is filled out with an ex-
ample such as email@example.com. The routine detects the @ by using regular expressions
and generates a random unique e-mail address. If the value attribute matches the regular
expression [0-9]*, a unique random numeric value is generated. If the value attribute
matches the regular expression [a-zA-Z0-9]*, a unique random alphanumeric value is
generated. If none of the predefined regular expressions match, a unique random
alphanumeric value is generated by default.
• Max length detection: Once a unique random value has been generated, the HTML
element is checked for length restrictions. HTML input fields can be limited to a specific
size with the maxlength attribute. If such an attribute exists and the generated value is longer
than the maximum allowed size, the random value is truncated so as not to violate the maximum
length restriction.
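The two detection rules above can be sketched as follows. The helper names guessValue and applyMaxLength are hypothetical (not WASTF's actual API), and the preceding buzzword matching step is omitted for brevity.

```java
import java.util.UUID;

// Sketch of the value-guessing and max-length rules described above.
// guessValue and applyMaxLength are hypothetical helper names.
public class ValueGuesser {

    // Derives a unique replacement value from an existing (pre-filled) value.
    public static String guessValue(String existing) {
        if (existing.contains("@")) {
            // Looks like an example e-mail address: build a unique one.
            return randomToken(10) + "@example.com";
        }
        if (existing.matches("[0-9]*")) {
            // Purely numeric (or empty): generate a unique numeric value.
            return String.valueOf(UUID.randomUUID().getLeastSignificantBits() & Long.MAX_VALUE);
        }
        // Default case (also covers [a-zA-Z0-9]*): unique alphanumeric value.
        return randomToken(20);
    }

    // Truncates a generated value so it honours a maxlength attribute.
    public static String applyMaxLength(String value, int maxLength) {
        return value.length() > maxLength ? value.substring(0, maxLength) : value;
    }

    private static String randomToken(int length) {
        StringBuilder sb = new StringBuilder();
        while (sb.length() < length) {
            sb.append(UUID.randomUUID().toString().replace("-", ""));
        }
        return sb.substring(0, length);
    }
}
```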
Figure 7.3 shows an UML 2 activity diagram of the main program flow of the non invasive mode.
7.5.2 Invasive
The invasive mode behaves exactly the same as the non invasive form with only a few exceptions. The
main goal of the invasive mode is to modify as many HTML element value attributes as possible in an
automated manner. This mode is mainly used by plugins trying to find input validation vulnerabilities
in a targeted web application. HTML elements such as hidden input fields and values of drop down
lists, checkboxes and radio buttons are being changed.
• Changing the value of hidden HTML elements: Hidden HTML fields are not rendered by the
browser and thus not visible to the user without looking at the code of a retrieved HTML page.
Hidden HTML elements are used by web applications to pass along information which should
be neither visible nor editable by the user. The following snippet shows a hidden HTML element
passing along a username.
<input type="hidden" name="username" value="dave">
The invasive routine detects such hidden HTML elements and analyses the element like in the
non invasive method in section 7.5.1 (buzzwords, value matching and maximum length restric-
tions). The generated unique and random value is either numeric, alphanumeric or a random
e-mail address. In the above case the unique random value would look something like this:
<input type="hidden" name="username" value="ear5ue5xoi5JeecohT1o">
• Changing the value of select options: The <select> tag is used to create a select list (drop-
down list) for the user and the <option> tags inside the select element define the available
options in the list.
<select name="top5" size="3">
<option value="Stones">The Rolling Stones</option>
<option value="Waits">Tom Waits</option>
<option value="Beatles">The Beatles</option>
<option value="Presley">Elvis Presley</option>
<option value="King">B.B. King</option>
</select>
The invasive routine detects such option HTML elements and analyses the element like in the
non invasive method in section 7.5.1 (buzzwords, value matching and maximum length restric-
tions). The generated unique and random value is either numeric, alphanumeric or a random
e-mail address. In the above case the unique random value would look something like this:
<select name="top5" size="3">
<option value="feebe8Oovahbo0hei6iP">The Rolling Stones</option>
• Changing the value of radio options: Radio buttons are used when the user has to select an
option from a set of alternatives but is only allowed to select one.
<input type="radio" name="Payment" value="Master"> Mastercard
<input type="radio" name="Payment" value="Visa"> Visa
<input type="radio" name="Payment" value="AmExpress"> American Express
The invasive routine detects such radio buttons and analyses the element like in the non in-
vasive method in section 7.5.1 (buzzwords, value matching and maximum length restrictions).
The generated unique and random value is either numeric, alphanumeric or a random e-mail
address. In the above case the unique random value would look something like this:
<input type="radio" name="Payment" value="OoFohQu9luu2shiwahh7"> Mastercard
<input type="radio" name="Payment" value="thoobaeR2Daeph6oovie"> Visa
<input type="radio" name="Payment" value="seef5ShoomeoyohPhai0"> American Express
• Changing the value of checkbox options: Checkbox options are often used in groups to indi-
cate a series of choices any one of which can be on or off.
<input type="checkbox" name="Ingredient" value="salami"> Salami
<input type="checkbox" name="Ingredient" value="mushrooms"> Mushrooms
<input type="checkbox" name="Ingredient" value="anchovies"> Anchovies
The invasive routine detects such checkboxes and analyses the element like in the non invasive
method in section 7.5.1 (buzzwords, value matching and maximum length restrictions).
The generated unique and random value is either numeric, alphanumeric or a random e-mail
address. In the above case the unique random value would look something like this:
<input type="checkbox" name="Ingredient" value="failie9eeJaexaeJ3cho"> Salami
<input type="checkbox" name="Ingredient" value="geitohsh7oophai6Aek4"> Mushrooms
<input type="checkbox" name="Ingredient" value="IeteeRoh1Eishieh6oow"> Anchovies
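All four mutations above boil down to overwriting value attributes with fresh 20-character alphanumeric tokens. The following standalone sketch illustrates this with a regex-based rewrite; WASTF itself mutates HtmlUnit DOM elements rather than raw markup, so the class and method names here are hypothetical.

```java
import java.security.SecureRandom;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the invasive mutation: every value="..." attribute in a
// snippet is overwritten with a fresh 20-character random token, mimicking
// the examples above. Illustrative only; the real routine works on the DOM.
public class InvasiveMutator {

    private static final String ALPHANUM =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final Pattern VALUE_ATTR = Pattern.compile("value=\"[^\"]*\"");

    // Generates a random alphanumeric token of the given length.
    public static String randomToken(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHANUM.charAt(RANDOM.nextInt(ALPHANUM.length())));
        }
        return sb.toString();
    }

    // Replaces every value="..." attribute with a unique random token.
    public static String mutateValues(String html) {
        Matcher m = VALUE_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, "value=\"" + randomToken(20) + "\"");
        }
        m.appendTail(out);
        return out.toString();
    }
}
```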
7.5.3 Limitations
Using buzzwords and regular expression to determine the kind of random value which might be
accepted by the business logic of a web application is very primitive. The routine assumes that
the developer of the targeted web application named his HTML elements according to their pur-
pose. If the names or ids of HTML elements are randomly named such as <input type="text"
name="input15" value=""> then the routine will be generating unique random alphanumeric
values for all elements. Even if one of the fields only accepts numeric values. The consequence is
that some forms can not be populated with unique and random data which satisfies the imposed
business logic and thus the form will not be accepted by the web application. This in turn decreases
the accuracy of the web spider plugin because additional HTML pages that would have been shown to
The following chapter outlines the final implementation details of the previously elaborated com-
bined white and black box plugin based on parsing database query log files, especially how the plugin
uses SQL syntax validation libraries to find actually working exploits for the discovered SQL injection
vulnerabilities.
After a SQL statement which actually contains a previously sent unique and random value has been
identified in the database query log, the statement is processed further. See section 4.2 on
how these SQL statements are detected and collected. Based on the original statement, multi-
ple mutants are generated which might lead to a working SQL Code Injection exploit for that
particular vulnerability. The following sections describe the procedure developed in the plugin which
tries to find working SQL Code Injection exploits for a given SQL statement in an automated manner.
This is a very important step because merely finding SQL statements containing a randomly generated
value does not pose a security threat to the targeted web application. The routine described below
tries to modify the randomly generated value in such a way as to prove the existence of a SQL Code
Injection vulnerability.
The first step is to validate the captured original SQL statement containing the previously sent
unique and random value. This is an example of such a captured SQL statement containing a sent
random value: SELECT * FROM products WHERE description LIKE ’%OPL89FGHC%’;. The
statement is being validated before tampering to make sure it is even a valid statement. This helps
in reducing false positives and increases the execution speed of the overall routine by skipping in-
valid statements. Finding working SQL syntax validation libraries for the Java programming language
sounds easier than it actually is. Nevertheless, there exist two promising projects whose
goal is to provide a stand-alone SQL syntax validation library. Unfortunately, neither library is
able to validate every given SQL statement correctly; both report false negatives, i.e. SQL
statements which are syntactically correct are reported as being invalid. Because of this, WASTF
uses two SQL validation libraries and flags a SQL statement as invalid only if both libraries come to
the same conclusion. The libraries used are JsqlParser and the validation routine from the Apache
Derby Database Management System.
JSqlParser
JSqlParser1 parses an SQL statement and translates it into a hierarchy of Java classes. The generated
hierarchy can be navigated using the Visitor Pattern. Listing 8.1 shows how a SQL statement can be
validated using JSqlParser.
Apache Derby
Apache Derby2 , an Apache DB sub project, is an open source relational database implemented en-
tirely in Java and available under the Apache License, Version 2.0. Listing 8.2 shows how a SQL state-
ment can be validated using the Apache Derby database management system. Unfortunately the
standard Apache Derby does not provide this functionality. A debug version of Apache Derby and
some additional Java classes from Derby’s Issue Tracker have to be added. See the following item
in the Apache Derby Issue Management System: https://issues.apache.org/jira/browse/
DERBY-3946.
The following sections explain how WASTF mutates collected SQL statements to find working SQL
Injection exploits step by step. Figure 8.5 tries to visualise the outlined steps in the following sections.
The routine basically builds the Cartesian product from several sets such as PAYLOADS, VALUE and
COMMENTS.
Step 1 - Validate original SQL query: As mentioned in section 8.1.1, the first step is to validate the
captured SQL statement with both SQL validation libraries. Additionally, the collected SQL state-
ment is checked again to ensure it contains the previously sent unique and random value from a
HTTP(S) request. If the value is missing from the collected SQL statement, there is no way WASTF
would be able to tamper with the SQL statement. The sample SQL statement used throughout this
section is the following: SELECT * FROM products WHERE description LIKE ’%OPL89FGHC%’;.
Step 2 - Remove fragments which contain non allowed characters: As seen in section 4.2.1 or
section 4.2.2 WASTF first probes the web application and tries to determine which special characters
such as >, < and others are being filtered by the web application under test. This list of non allowed
characters is now used to reduce the size of the so-called Fragment List. Fragments containing
characters which are filtered by the targeted web application cannot be used to successfully
inject working SQL code and can safely be removed. Figure 8.1 shows the contents of the Fragment
List, in which PAYLOAD, VALUE and COMMENT represent tokens which are later replaced with
meaningful values.
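The filtering in step 2 can be sketched as a simple pass over the Fragment List; the fragment strings and the forbidden-character set below are illustrative samples, not WASTF's actual lists.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of step 2: fragments that contain a character filtered by the
// target application are dropped from the Fragment List before any
// mutants are generated. Sample data only.
public class FragmentFilter {

    // Keeps only fragments that contain none of the forbidden characters.
    public static List<String> removeForbidden(List<String> fragments, String forbiddenChars) {
        List<String> kept = new ArrayList<>();
        for (String fragment : fragments) {
            boolean allowed = true;
            for (char c : forbiddenChars.toCharArray()) {
                if (fragment.indexOf(c) >= 0) {
                    allowed = false;
                    break;
                }
            }
            if (allowed) {
                kept.add(fragment);
            }
        }
        return kept;
    }
}
```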
1 http://jsqlparser.sourceforge.net/ [12.07.10]
2 http://db.apache.org/derby/ [12.07.10]
Step 3 - Replace random token with fragments: The random value embedded in the collected SQL
statement (OPL89FGHC) marks the spot where code can be injected into the SQL query by sending
specially crafted HTTP(S) requests. The third step is to replace the random value with the entries
of the Fragment List. See figure 8.2 for examples of what the current working set of mutated SQL
statements looks like.
Figure 8.2: Random value in the original SQL statement replaced with the Fragment List
Step 4 - Replace PAYLOAD token: The Payload List is similar to the previously mentioned Fragment
List and contains SQL payloads such as OR 1=1, ’ AND 1=1 etc. Again, like before, the Payload List
is being reduced by removing payloads which contain a non allowed character. After the list has been
reduced, the Cartesian product is being built by combining the SQL statements shown in figure 8.2
with the rest of the available payloads in the Payload List. The result of this operation is shown in
figure 8.3.
Figure 8.3: PAYLOAD token replaced with values from the Payload List
Step 5 - Replace VALUE token: This step is basically no different from step 4. The main alteration is
that the VALUE token is replaced with entries from the Value List. The previously mutated SQL
statements as seen in figure 8.3 are used and the Cartesian product is built with entries
from the Value List. The current implementation of the Value List contains the following entries: 1,
Step 6 - Replace COMMENT token: Again the mutated SQL statements which resulted from the last
replacement in step 5 are used for further processing. This time the COMMENT token is
replaced with values from the Comment List. The Cartesian product is built with the mutated
SQL statements and the following entry from the Comment List: --. This is a comment indicator
for the MySQL Database Management System.
Step 7 - Remove enclosed payloads: Step 7 is used to reduce the size of the now finished set of
mutated SQL statements by removing mutated SQL statements whose payloads are enclosed in brackets.
Payloads which are enclosed in brackets might only be useful for stored XSS attacks but do not
allow WASTF to send arbitrary SQL commands to the database of the targeted web application.
The following mutated SQL statement would be removed by this step: SELECT * FROM products
WHERE description LIKE ’%OR 1=1%’;.
Step 8 - Validate mutated SQL statements: The final step is to validate all the mutated SQL state-
ments. The mutated SQL statements have to conform to the SQL syntax. If they do not, the database
Management System of the targeted web application throws an error while trying to process these
statements. This is again done by using the two SQL syntax validation libraries mentioned in section
8.1.1.
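Steps 3 to 6 above amount to nested token replacements over the Cartesian product of the four lists. The following sketch illustrates the idea with toy lists; the method name and the sample entries are illustrative and much smaller than WASTF's actual Fragment, Payload, Value and Comment lists.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of steps 3-6: the random marker in the captured statement is
// replaced by each fragment, then the PAYLOAD, VALUE and COMMENT tokens
// are expanded via Cartesian products over the remaining lists.
public class SqlMutator {

    public static List<String> mutate(String original, String randomValue,
                                      List<String> fragments, List<String> payloads,
                                      List<String> values, List<String> comments) {
        List<String> mutants = new ArrayList<>();
        for (String fragment : fragments) {
            String step3 = original.replace(randomValue, fragment);   // step 3
            for (String payload : payloads) {
                String step4 = step3.replace("PAYLOAD", payload);     // step 4
                for (String value : values) {
                    String step5 = step4.replace("VALUE", value);     // step 5
                    for (String comment : comments) {
                        mutants.add(step5.replace("COMMENT", comment)); // step 6
                    }
                }
            }
        }
        return mutants;
    }
}
```

With 26 fragments, 9 payloads, 3 values and 1 comment this yields exactly the 702 intermediate mutants counted in the example below, before steps 7 and 8 prune the set.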
Example: Let’s say we start with our original captured SQL statement SELECT * FROM products
WHERE description LIKE ’%OPL89FGHC%’; from before. All characters are allowed and none
are filtered by the targeted web application. We start mutating the SQL statement and after step 3
we have a total of 26 mutated statements (because the Fragment List consists of 26 values). Now the
PAYLOAD tokens are being replaced in step 4. After this a total of 234 (26 · 9 = 234) SQL statements are
ready for further processing because the Payload List consists of 9 values. After replacing the VALUE
token a total of 702 (234 · 3 = 702) SQL statements are ready for further processing. After replacing the
COMMENT token we have 702 (702 · 1 = 702) finished mutated SQL statements. After removing all the
payloads which are enclosed in brackets there are a total of 354 mutated SQL statements available.
The next step is to check whether the statements conform to the SQL syntax. After validating the
statements, a total of 15 mutated SQL statements remain. These 15 statements are the proof
that a SQL Code Injection vulnerability exists and can be used to send arbitrary SQL statements to
the Database Management System of the targeted web application. Figure 8.4 shows a complete list
of the mutated SQL statements which seem to be working exploits for the previously identified SQL
Injection vulnerability in the targeted web application.
[SELECT * FROM products WHERE description LIKE ’%x’ UNION SELECT a FROM b --%’;]
[x’ UNION SELECT a FROM b --]
[SELECT * FROM products WHERE description LIKE ’% 1’ AND 1=(SELECT COUNT(*) FROM tablename); --%’;]
[ 1’ AND 1=(SELECT COUNT(*) FROM tablename); --]
[SELECT * FROM products WHERE description LIKE ’%1 1’ AND 1=(SELECT COUNT(*) FROM tablename); --%’;]
[1 1’ AND 1=(SELECT COUNT(*) FROM tablename); --]
[SELECT * FROM products WHERE description LIKE ’%) 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- %’;]
[) 1’ AND 1=(SELECT COUNT(*) FROM tablename); -- ]
[SELECT * FROM products WHERE description LIKE ’%’; SELECT a FROM b -- %’;]
[’; SELECT a FROM b -- ]
...
Figure 8.4: Working SQL Code Injection exploits after mutating a captured SQL statement
Even though there exist standards for the SQL query language such as SQL-923, SQL:20034 or
SQL:20085, every Database Management System has its own little specialities and dialects. This
makes it especially hard for SQL syntax validation libraries because there is no way they can validate
all the special functions every database manufacturer adds to its product. This circumstance leads
to false negatives being reported by the validation libraries while validating arbitrary SQL statements.
One sophisticated solution to this problem would be to use the validation routine of the particular
database product for which the SQL statement was originally written. In reality this is sadly not
feasible for a number of reasons:
1. Installation: All these different database products would have to be physically installed on a
machine in order to access their SQL statement validation routine.
2. Coupling: Most validation routines implemented in database products are tightly coupled
with the underlying data structure. This means validating an arbitrary SQL statement such
as SELECT * FROM products WHERE id=12 only works if the products table exists and its
schema defines a numeric field with the name id. One would have to replicate database tables
based on the information given in collected SQL statements.
Writing a new SQL validation library which tries to recognise most of the special features added by
database manufacturers would be the best approach. Writing such a library would involve using
parser generator tools such as ANTLR6 or JavaCC7 . There also exists an online SQL syntax validation
service8 .
3 http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt [13.07.10]
4 http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=34134
[13.07.10]
5 http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=38641
[13.07.10]
6 http://www.antlr.org/ [13.07.10]
7 https://javacc.dev.java.net/ [13.07.10]
The JSqlParser library tries to recognise some of the features added by database manufacturers
which are not described in the official SQL standards, such as the MD5(’foobar’) function in MySQL
which returns the MD5 hash of the given string. The only drawback is its performance: compared
to the validation routine provided by the Apache Derby database management system, validating a
statement with JSqlParser is 1.5 times slower9 .
import java.io.StringReader;

import net.sf.jsqlparser.JSQLParserException;
import net.sf.jsqlparser.parser.CCJSqlParserManager;

public class SQLValidator {
    private static final String sqlStatement = "SELECT * FROM products WHERE "
            + "description LIKE '%OPL89FGHC%';";

    public static boolean validateStatement() {
        CCJSqlParserManager parserManager = new CCJSqlParserManager();
        try {
            parserManager.parse(new StringReader(sqlStatement));
            return true;
        } catch (JSQLParserException e) {
            e.printStackTrace();
        }
        return false;
    }
}
8 http://developer.mimer.se/validator/ [13.07.10]
9 Measured by using the Eclipse Test & Performance Tools Platform Project (TPTP) and the SqlValidatorTest.java
JUnit test case (see the enclosed CD-ROM for the source code).
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public final class SQLValidator {
    private static Connection conn = null;
    private final String sqlStatement = "SELECT * FROM products WHERE "
            + "description LIKE '%OPL89FGHC%';";
    public static final String DRIVER_NAME = "org.apache.derby.jdbc.EmbeddedDriver";
    public static final String CONNECTION_URL = "jdbc:derby:memory:dummy;create=true";
    public static final String DERBY_DEBUG_SETTING = "derby.debug.true";
    public static final String STOP_AFTER_PARSING = "StopAfterParsing";
    public static final String SHUTDOWN_URL = "jdbc:derby:;shutdown=true";
    public static final String LANG_CONNECTION = "LanguageConnectionContext";
    public static final String STOPPED_AFTER_PARSING = "42Z55";
    public static final String DERBY_LOG = "derby.stream.error.method";

    private void initialiseConnection() {
        try {
            System.setProperty(DERBY_LOG, DERBY_LOG_CLASS);
            System.setProperty(DERBY_DEBUG_SETTING, STOP_AFTER_PARSING);
            Class.forName(DRIVER_NAME).newInstance();
        } catch (ClassNotFoundException e) {
            LOGGER.error(e.getMessage(), e);
            return;
        } catch (InstantiationException e) {
            LOGGER.error(e.getMessage(), e);
            return;
        } catch (IllegalAccessException e) {
            LOGGER.error(e.getMessage(), e);
            return;
        }

        try {
            conn = DriverManager.getConnection(CONNECTION_URL);
        } catch (SQLException e) {
            LOGGER.error(e.getMessage(), e);
            return;
        }
    }

    public boolean validateStatement() {
        initialiseConnection();

        if (conn == null) {
            throw new IllegalStateException("Derby database connection not available!");
        }

        // Retrieval of the parse tree (QueryTreeNode) via the
        // LanguageConnectionContext is omitted here for brevity.
        boolean exception = false;
        try {
            conn.prepareStatement(sqlStatement);
        } catch (SQLException e) {
            String sqlState = e.getSQLState();
            if (!STOPPED_AFTER_PARSING.equals(sqlState)) {
                exception = true;
            }
        }

        if (exception) {
            return false;
        }

        return true;
    }
}
9 Test Series
The Test Series chapter is concerned with testing the WASTF application thoroughly, especially in
comparison with standard black box web application security testing tools such as w3af. The goal
is to provide a detailed comparison between WASTF and w3af with regard to sent HTTP(S) requests,
number of detected vulnerabilities, web pages discovered etc.
The next sections describe the basic system which has been set up to conduct the comparison be-
tween WASTF and its black box security testing application equivalent w3af.
In the enclosed CD-ROM’s root directory there exists a folder called “Test-Environment”. This folder
contains a virtual machine with the test environment set up. The virtual machine has been created
using the free virtualisation project VirtualBox (version 3.2.6.r63112) and converted to the Open
Virtualisation Format (.ovf files).
The base system consists of a Ubuntu 10.04 LTS “Lucid Lynx” Server edition operating system with
the following services enabled and software installed:
• Apache Webserver: Server version: Apache/2.2.14 (Ubuntu) / Server built: Apr 13 2010 19:28:27
• MySQL Client: mysql Ver 14.14 Distrib 5.1.41, for debian-linux-gnu (i486) using readline 6.1
9.1.2 Network
The test environment has been configured to use a “Host-only Adapter” for networking. Host-only
networking can be thought of as a hybrid between the bridged and internal networking modes: as
with bridged networking, the virtual machines can talk to each other and to the host as if they were con-
nected through a physical Ethernet switch. As with internal networking, however, a physical
networking interface need not be present, and the virtual machines cannot talk to the world out-
side the host since they are not connected to a physical networking interface. Instead, when host-only
networking is used, a virtual network interface on the host provides the connection between the host
and the virtual machines.
Note: The virtual machine is not intended for use in a productive environment or for storing sensitive
data. The used passwords are very weak and chosen for easy and quick access to the test environment.
The following sections describe the installed web applications on the virtual machine which are be-
ing used for testing and comparing WASTF with another web application security scanner. The URLs
given for accessing the web applications are meant to be opened after the test environment respec-
tively the virtual machine (see contents of the enclosed CD-ROM) has been started.
There exists a static welcome page which can be accessed through the following URL http://192.
168.56.101/. The welcome page contains a list of all the installed web applications on the test en-
vironment server. Figure 9.1 shows the WASTF test environment welcome page and two pre-installed
web applications.
Figure 9.1: Various pre-installed applications in the WASTF test environment virtual machine
phpMyAdmin is a free software tool written in PHP intended to handle the administration of MySQL
over the World Wide Web. phpMyAdmin supports a wide range of operations with MySQL. The most
frequently used operations are supported by the user interface (managing databases, tables, fields,
relations, indexes, users, permissions, etc), while you still have the ability to directly execute any SQL
statement1 .
phpMyAdmin can be accessed through the following URL (using the MySQL Administrator creden-
tials): http://192.168.56.101/phpmyadmin/
Note: The MySQL server can also be accessed from a remote machine with the mysql-client package
installed:
Note: The omitted space between the -p parameter and the provided password is intentional.
WIVET2 is a benchmarking framework that aims to statistically analyse web link extractors. In
general, web application vulnerability scanners fall into this category. These vulnerability scanners,
given a URL, try to extract as many input vectors as they possibly can to increase the coverage of the
attack surface. WIVET provides a good sum of input vectors to any extractor and presents the results.
In order for an input extractor to run meaningfully, it has to provide some kind of session handling,
which nearly all of the decent crawlers do. The WIVET project is released under the GNU General
Public License Version 2. Also see section 5.2.4.
1 http://www.phpmyadmin.net/home_page/index.php [11.07.10]
2 http://code.google.com/p/wivet/ [29.03.10]
Damn Vulnerable Web App (DVWA)3 is a PHP/MySQL web application that is damn vulnera-
ble. Its main goals are to be an aid for security professionals to test their skills and tools in a
legal environment, help web developers better understand the processes of securing web applica-
tions and aid teachers/students to teach/learn web application security in a class room environment.
WordPress4 is an open source CMS, often used as a blog publishing application powered by PHP and
MySQL. It has many features including a plugin architecture and a templating system. Used by over
300 of the 10,000 biggest websites, WordPress is the most popular blog software in use today5 . It was
first released in May 2003 by Matt Mullenweg as a fork of b2/cafelog. As of September 2009, it was
being used by 202 million websites worldwide6 .
9.2.7 Flowershop
3 http://www.dvwa.co.uk/ [11.07.10]
4 http://wordpress.org [16.07.10]
5 http://trends.builtwith.com/blog/WordPress [11.07.10]
6 http://andrewapeterson.com/2009/09/wordpress-usage-202-million-worldwide-62-8-million-us/
[11.07.10]
7 http://www.magentocommerce.com/ [16.07.10]
The following sections describe the set-up used to compare WASTF and its black box security testing
application equivalent w3af. The specific test cases are being explained and the results are being
analysed and visually presented.
Figure 9.2 shows the set-up used to conduct the comparison between WASTF and its black box secu-
rity testing application equivalent w3af. The following software versions have been used:
• w3af 1.0-rc3: The latest release candidate for version 1 (see chapter 5).
• WebScarab 20100414-0036: A ZIP file containing an up-to-date build of the master branch of the
WebScarab git tree. WebScarab is a framework for analysing applications that communicate
using the HTTP and HTTPS protocols. It is written in Java, and is thus portable to many plat-
forms. WebScarab has several modes of operation, implemented by a number of plugins. In
its most common usage, WebScarab operates as an intercepting proxy, allowing the operator
to review and modify requests created by the browser before they are sent to the server, and to
review and modify responses returned from the server before they are received by the browser.
WebScarab is able to intercept both HTTP and HTTPS communication. The operator can also
review the conversations (requests and responses) that have passed through WebScarab.
These specific versions of the tools can be found in the CD-ROM’s root directory under Test-
Environment > Tools. As shown in figure 9.2, all HTTP requests made by the web application
security scanners are recorded by the WebScarab proxy for detailed analysis. This ensures that
not a single HTTP request is transmitted unnoticed and unrecorded. The files generated by WASTF
and w3af for every test case are stored on the enclosed CD-ROM under Test-Environment > Test
Series. The file structure of every test case is shown in figure 9.3. The recordings of WebScarab can
be loaded by unzipping the webscarab.zip located in every test case folder into a new folder and
then clicking File > Open inside WebScarab’s graphical user interface.
The conducted tests focus on the following attributes to make a sound comparison between WASTF
and w3af:
• Speed/Runtime: How much execution time the web application security scanner needs to
complete the test case. This metric strongly correlates with the accuracy metric: a web
application security scanner with low accuracy is usually faster than a counterpart with
higher accuracy.
• Accuracy: This metric depends on the test case being conducted. It measures how many pages
were detected by the web spider plugin, or how many vulnerabilities were found during the
scan.
CD-ROM
  ...
  Test Environment
    <Test Case 1>
      w3af .................................. w3af test case results
        output.txt .......................... w3af’s console output
        output-http.txt ..................... w3af also records HTTP messages
        webscarab.zip ....................... WebScarab’s recorded HTTP messages
        *.pw3af ............................. w3af configuration file for replaying the test case
      WASTF ................................. WASTF test case results
        console_output.log .................. WASTF’s console output
        wastf.log ........................... Debug log messages
        webscarab.zip ....................... WebScarab’s recorded HTTP messages
        *.wastf ............................. WASTF configuration file for replaying the test case
    <Test Case 2>
      ...
Figure 9.3: Test case file structure on the enclosed CD-ROM
• Number of sent HTTP(S) requests: This metric compares the number of HTTP(S) requests
a web application security scanner sent in order to achieve its measured accuracy.
The WIVET framework for benchmarking web link extractors has been mentioned before in section
5.2.4. The following test case compares the web spider components of WASTF and w3af with
regard to accuracy, speed and the number of sent HTTP requests.
Table 9.1 shows a summary of the results achieved by w3af and WASTF. Because of the JavaScript
support provided by its web spider component, WASTF is able to collect considerably more pages
from the WIVET benchmark framework than w3af. With its achieved 81%, WASTF is nearly as good
as IBM’s web spider component included in the commercially available AppScan product (see section
                   WASTF                        w3af
Runtime [s]        52                           46
Accuracy           81%                          16%
# HTTP Requests    575 (POST: 6 / GET: 569)     231 (POST: 5 / GET: 226)
This test case features a “large” (in terms of many available HTML pages and many HTML links)
eCommerce web application based on PHP and MySQL. WASTF completed the task in 23 minutes
and found 824 different HTML pages. w3af, on the other hand, was not able to successfully collect
any HTML page. Table 9.2 shows the information recorded by WebScarab. Unfortunately, w3af
crashed after 233 GET requests without reporting any found HTML page. The error message received
can be seen in figure 9.4. The bug has been submitted to the developing community of w3af and
its status can be tracked under the following URL: http://sourceforge.net/tracker/?func=
detail&aid=3030470&group_id=170274&atid=853652.
                   WASTF                         w3af
Runtime [s]        1’380                         132
Accuracy           1421 pages                    0 pages
# HTTP Requests    2576 (POST: 6 / GET: 2570)    233 (POST: 0 / GET: 233)
Python version:
2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3]
GTK version:2.20.1
PyGTK version:2.17.0
Figure 9.4: w3af’s crash message while spidering the Magento eCommerce web application
Table 9.3 shows the results achieved by WASTF and w3af after spidering the popular blog software
Wordpress. Interestingly, WASTF needed almost three times as many HTTP(S) requests to detect
only 20 more HTML pages than w3af. After analysing the recorded HTTP traffic of WASTF, the cause
of this odd behaviour could be determined: the spidering component is trapped in a loop by a
specially crafted URL provided by Wordpress. The request causing the loop is shown in figure 9.5.
After analysing the recorded HTTP traffic, the following conclusions have to be drawn about the
origin of the loop:
1. The originating HTML page (wp-login.php) is part of a login screen with an embedded HTML
form with text fields for entering the username and password of a legitimate user. If a
wrong username and/or password is supplied, the HTTP request shown in figure 9.5 is
issued. In the end, this request redirects the user’s browser back to the originating HTML page
(wp-login.php). Normally this kind of “loop” is not an issue, because the web spider routine
is able to distinguish whether a found HTML page has been visited before or not.
2. The web spider component of WASTF uses Java Prepared Statements8 to store collected URLs
in the WASTF database. The found HTML site
http://192.168.56.101:80/wordpress/wp-login.php?redirect_to=
http%3A%2F%2F192.168.56.101%2Fwordpress%2Fwp-admin%2F&reauth=1 HTTP/1.1
8 With most development platforms, parametrised statements can be used that work with parameters (sometimes called
placeholders or bind variables) instead of embedding user input in the statement. In many cases, the SQL statement is
fixed, and each parameter is a scalar, not a table. The user input is then assigned (bound) to a parameter.
In the end, the web spider routine is able to leave the endless loop because of the maxLevel setting.
Every time the URL with the redirect is called (see figure 9.5), the internal level variable is
increased by one. After the maximum allowed level is reached (set by the user; the default is 25), the
page is ignored and the endless loop is resolved. To fix the issue properly, some debugging is
required for URLs containing URL-encoded characters such as %3C. This is an issue which should
be easy to fix in a second development phase.
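The direction of the fix hinted at above (treating the URL-encoded and the decoded form of the same URL as one page) could be sketched as follows; the class and method names are hypothetical and not taken from WASTF’s source:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

public class VisitedUrls {
    private final Set<String> seen = new HashSet<>();

    // Canonicalise a URL by decoding percent-escapes, so that the encoded
    // redirect URL and its decoded twin map onto the same key.
    static String canonicalise(String url) {
        return URLDecoder.decode(url, StandardCharsets.UTF_8);
    }

    // Returns true only the first time a (canonicalised) URL is seen.
    boolean markVisited(String url) {
        return seen.add(canonicalise(url));
    }
}
```

With such a check in place, the wp-login.php redirect from figure 9.5 would be recognised as already visited on its second appearance, regardless of the encoding.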
Figure 9.5: The HTTP request causing an endless loop in the web spider component
                   WASTF                        w3af
Runtime [s]        98                           113
Accuracy           93 pages                     77 pages
# HTTP Requests    319 (POST: 70 / GET: 249)    99 (POST: 0 / GET: 233)
Table 9.4 shows the results achieved by WASTF and w3af after spidering the Flowershop eCommerce
web application. WASTF and w3af detect the same number of HTML pages. The only big difference
is the running time: w3af needed more than 5 minutes to finish the web spidering process. During
the run it stalled for one to two minutes without sending any HTTP(S) requests. This might be a
bug in the web spider component of w3af; the cause of the stalling could not be determined within
an appropriate time limit.
                   WASTF                        w3af
Runtime [s]        14                           303
Accuracy           14 pages                     14 pages
# HTTP Requests    168 (POST: 6 / GET: 162)     80 (POST: 6 / GET: 74)
The following test case tests the ability of WASTF to access hidden HTML pages in web applications
which are protected by simple HTML login forms. The left side of figure 9.6 shows how to
Automated web form population is not available in w3af. See section 5.2.2 on how to collect session
cookies with w3af.
Figure 9.6: WebFormLogin plugin configuration (left side) and output (right side) for DVWA
9.3.7 Test Case: Secure Messaging (Privasphere) [Web Form Login only]
This test case tests whether WASTF is able to access a user’s Secure Messaging mailbox in an automated
way by populating the provided login form. Privasphere provides secure and authenticated
Internet messaging technologies and services to corporations and individual users. A total of 5
HTTP(S) requests were sent (POST: 1 / GET: 4) and the login procedure took 5 seconds. Figure
9.7 shows the configuration of WASTF on the left side and the generated output on the right side. The
login was successful because the browser had been redirected to the user’s Inbox (see the line with
Received Title Text:... in figure 9.7).
Configuration:
    plugins
    webLogin
    enable webFormLogin
    webFormLogin
    set formName loginForm
    set urlWithLoginForm https://www.privasphere.com
    set formData {"login":"kevin.denver@zhaw.ch", "password":"MYPASSWORD"}

Output:
    [+] Executing webFormLogin...
    [+] Received Title Text: PrivaSphere AG - Secure Messaging Service
    [+] Found login form: loginForm
    [+] Setting field (login) to ’kevin.denver@zhaw.ch’
    [+] Setting field (password) to ’MYPASSWORD’
    [+] Found login button: <input type="image" name="submit" value="submit"
        src="/hp/imgs/bt-login.gif" height="17" width="57"
        style="width:57px" tabindex="3"/>
    [+] Clicking button...
    [+] Received Title Text: Inbox; Kevin’s PrivaSphere
    [+] webFormLogin finished
Figure 9.7: WebFormLogin plugin configuration (left side) and output (right side) for logging into the
SecureMessaging web application
This test case uses the DVWA web application to test the ability of WASTF to detect SQL Injection
vulnerabilities. The configuration can be seen in figure 9.8. Before the database query log plugin
is executed, WASTF enters the necessary user credentials into the login form as seen in test
case 9.3.6. This is necessary in order to access the protected HTML page containing the intentional
SQL Injection vulnerability.
Figure 9.8: The DVWA database query log plugin test case configuration (Online Mode)
WASTF needed 7 seconds to complete and sent 42 HTTP requests (POST: 1 / GET: 41). The SQL
Injection vulnerability has been successfully detected in the following SQL statement: SELECT
first_name, last_name FROM users WHERE user_id = ’70032423380958851878’. WASTF
proposed 15 working SQL payloads, which have been verified by sending them to the web
application and afterwards looking for them in the SQL query log. The following input characters
are left unfiltered by the input validation routine of the web application: <, >, (, ), ,, ;, ",
’, ., !, -, =, &, *, OR, AND, UNION, ALL, SELECT. A list of working SQL payloads can
be seen in figure 9.9. This test case is shown in the tutorial video on the enclosed CD-ROM (see
Videos > WASTF-DatabaseQueryLog-Online.ogv).
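The verification step described above (sending a payload built around a random marker value and then looking for it verbatim in the SQL query log) can be sketched like this; the class and method names are hypothetical, not WASTF’s actual API:

```java
public class PayloadVerifier {

    // Build a candidate payload around a marker value, in the style of
    // the detected statement above: a long random digit string that is
    // very unlikely to occur naturally in the query log.
    static String payloadFor(String marker) {
        return marker + "' OR '1'='1";
    }

    // A payload counts as working only if it shows up verbatim in the
    // database query log, i.e. it passed the input validation routine
    // unfiltered and reached the SQL layer.
    static boolean verified(String queryLog, String payload) {
        return queryLog.contains(payload);
    }
}
```

This is what makes the white box side of the approach cheap: a single log lookup replaces the response-diffing heuristics a pure black box scanner has to rely on.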
Figure 9.9: The DVWA database query log plugin test case working payloads (Online Mode)
                   WASTF                        w3af
Runtime [s]        7                            45
Accuracy           1 Vulnerability              0 Vulnerabilities
# HTTP Requests    42 (POST: 1 / GET: 41)       62 (POST: 0 / GET: 62)
Table 9.5: Results of the DVWA [Database Query Log - Online Mode] test case
This test case tests the ability of WASTF to find SQL Injection vulnerabilities using the offline mode
of the database query log plugin. Coincidentally, the found SQL payloads are the same in the offline
scenario as in the previous online scenario (see section 9.3.8). This does not mean that
the results are equally sound: the online mode verifies the SQL payloads it finds (see chapter 8 and
section 4.2.1), which the offline mode does not. Thus the online mode is more precise, because the
SQL payloads it reports have been tested to work.
9.3.10 Test Case: Wordpress [Web Spider & Database Query Log - Online Mode]
This test case consists of two parts. First the Wordpress application is spidered for a maximum of 10
minutes, and after that the SQL Injection vulnerability routines are used to detect possible weak-
nesses in the software. In the case of w3af, both the “normal” and the Blind SQL Injection detection
plugins have been enabled. This test case is all about speed/performance and the number of sent
HTTP requests. It has been recorded and can be found on the enclosed CD-ROM (see Videos >
WASTF-DatabaseQueryLog-Online_2.ogv and Videos > WASTF-DatabaseQueryLog-Offline_1.ogv).
                   WASTF                         w3af
Runtime [s]        279                           3’902
Accuracy           -                             2 False-Positives
# HTTP Requests    465 (POST: 120 / GET: 345)    4255 (POST: 2275 / GET: 1980)
Table 9.6: Results of the Wordpress [Database Query Log Online Mode & Web Spider] test case
                   WASTF                         w3af
Runtime [s]        272                           3’902
Accuracy           -                             2 False-Positives
# HTTP Requests    632 (POST: 140 / GET: 362)    4255 (POST: 2275 / GET: 1980)
Table 9.7: Results of the Wordpress [Database Query Log Offline Mode & Web Spider] test case
* Blind SQL injection was found at: "http://192.168.56.101/wordpress/wp-comments-post.php", using HTTP method POST.
The injectable parameter is: "comment". This vulnerability was found in the requests with ids 4136 and 4137.
* Blind SQL injection was found at: "http://192.168.56.101/wordpress/wp-comments-post.php", using HTTP method POST.
The injectable parameter is: "comment". This vulnerability was found in the requests with ids 4347 and 4348.
Looking closer at the log file generated by w3af, the reported vulnerability turned out to be a false
positive. The HTTP request and response which triggered the false positive can be seen in figure 9.10.
9.4 Summary
This chapter tested WASTF extensively with small and “large” web applications, and WASTF’s per-
formance and accuracy have been better than w3af’s results. Apart from a small bug in spidering the
Wordpress web application (see section 9.3.4), WASTF has outperformed w3af’s web spider in either
runtime or accuracy. WASTF sends more HTTP(S) requests during the spidering phase, but this is
due to its JavaScript support, whereas w3af ignores external JavaScript source files.
It can be shown that the detection of SQL input validation vulnerabilities is more accurate and more
performant when combining black and white box testing methodology than when only using a black
box testing approach. WASTF is 14 times faster than w3af in detecting SQL input validation vulnera-
bilities in web applications. WASTF is still in an early phase and might contain some bugs and thus
report false positives, but the false positive Blind SQL Injection vulnerability reported by w3af (see
section 9.3.10) has been successfully detected as not exploitable by WASTF. The online mode (see
chapters 4 and 8) even generates SQL exploit payloads and verifies them by actually submitting them
to the targeted web application before reporting any vulnerabilities, thus increasing the accuracy of
the reported vulnerabilities even further. The offline mode is not as accurate as the online mode,
because WASTF’s generated SQL exploit payloads cannot be tested until the SQL statement triggered
by the web application is known to WASTF. The alternative of sending SQL exploit payloads blindly
(as black box security testing applications do) would not increase the accuracy of the vulnerability
detection routine but only increase the HTTP(S) request footprint. Instead, WASTF tries to find
working SQL exploit payloads offline by mutating the SQL statements found in the database query
log. Because WASTF operates in the offline mode, the SQL payloads cannot be verified by sending
them to the targeted web application, and this circumstance might lead to false positives being
reported by WASTF.
comment=21"+OR+"21"="21&author=FrAmE30.&url=5672&comment_parent=8&comment_post_ID=62&email=w3af%40email.com
========================================Response 4136 - Thu 29 Jul 2010 06:52:49 PM CEST=======================================
HTTP/1.1 500 Internal Server Error
content-length: 1224
x-powered-by: PHP/5.3.2-1ubuntu4.2
expires: Wed, 11 Jan 1984 05:00:00 GMT
vary: Accept-Encoding
server: Apache/2.2.14 (Ubuntu)
last-modified: Thu, 29 Jul 2010 16:52:49 GMT
connection: close
pragma: no-cache
cache-control: no-cache, must-revalidate, max-age=0
date: Thu, 29 Jul 2010 16:52:49 GMT
content-type: text/html; charset=utf-8
===============================================================================================================================
Figure 9.10: w3af HTTP log showing a request which triggered a false positive Blind SQL Injection
vulnerability
This chapter contains the user manual and the development guide for WASTF. The user manual leads
new users through the installation process of WASTF and helps them get familiar with its commands.
The development guide is for advanced users who want to extend the functionality of WASTF and
build it from source. Because WASTF is currently only supported on Linux, these two manuals
describe the installation and build process specifically for the Linux operating system.
The user manual is targeted at users who want to install and run WASTF on their machines. The
manual outlines the software and hardware requirements, installation procedure and how to start
the WASTF application in either interactive or automated mode.
WASTF is implemented purely in Java. This means that the only true requirement for running it is
an installed Java Runtime Environment (JRE). WASTF makes use of Java 6.0 features, so your JRE
must be at least of a 6.0 (1.6.0+) pedigree (either 32 or 64 bit is fine). The WASTF distribution
package currently includes all of the free/open source third-party libraries necessary to run WASTF.
See the dependencies section in the raw Readme.txt at the root of the enclosed CD-ROM for a
complete list of all the libraries. WASTF has been built and tested primarily on Linux (Ubuntu
10.04 LTS). It has seen some informal use on Windows 7, but is not tested, packaged, nor supported
on platforms other than Linux at this time.
The following section outlines the necessary steps for installing and running WASTF.
Prerequisites: WASTF needs a working Sun JRE >= 1.6.0 in order to run correctly. Check the manual
of your Linux operating system on how to install the Sun JRE.
Installation: The installation is fairly simple. Insert the enclosed CD-ROM and double click on the
install.sh executable shell script in the CD-ROM’s root directory and follow the on screen instruc-
tions. Alternatively open a terminal and enter the following commands:
# cd /media/<CD-ROM DRIVE>
# java -jar Binaries/WASTF-installer.jar
If you have to install WASTF on a remote machine (e.g. accessed through an SSH client) where no
graphical user interface is available to you, you can alternatively install WASTF in text-only mode by
adding the -console parameter to the installer, as shown in the following snippet:
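A sketch of that snippet, combining the installer invocation shown earlier with the -console parameter:

```shell
# cd /media/<CD-ROM DRIVE>
# java -jar Binaries/WASTF-installer.jar -console
```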
After WASTF has been installed successfully, it can be started for the first time. Opening a terminal
and entering the following commands starts the WASTF binary in interactive mode:
After WASTF has been initialised and loaded all the necessary plugins, the user is prompted with
an interactive command line interface (CLI) awaiting commands. WASTF is largely self-explanatory:
entering help into the CLI at any time displays the currently available commands and a short
description of each.
Starting WASTF in the automated mode is just as simple. WASTF can be started in automated mode
by opening a terminal and entering the following commands:
The enclosed CD-ROM contains various tutorial videos in the OGG video format. These videos
show how to configure various plugins and how to start them. The videos are located in the
root directory of the CD-ROM in a folder labelled Videos. The videos have been recorded using
gtk-recordmydesktop version v0.3.8.1 on a Ubuntu Linux operating system.
WASTF has the ability to produce a machine readable report of its findings after a successful scan of
a targeted web application. This report can be used by third party tools to process the found
information or vulnerabilities and act upon them. The report uses an XML format and can be enabled
after WASTF has been started as follows:
Figure 10.2 contains a sample WASTF XML report. The XML report structure generated by WASTF
is very similar to that of w3af. This was done deliberately for easy integration into applications
which already use w3af XML reports. The XML report contains valuable information such as the
URL pointing to the location where a SQL Injection vulnerability has been found, the HTML GET or
POST parameter which is vulnerable to SQL code injection, and a list of working payloads.
</vulnerability>
</wastf>
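A minimal sketch of what such a report might look like; all element and attribute names here are assumptions rather than WASTF’s actual schema, apart from the closing vulnerability and wastf elements visible above:

```xml
<wastf>
  <vulnerability plugin="databaseQueryLog" name="SQL Injection">
    <url>http://example.com/page.php</url>
    <parameter method="GET">id</parameter>
    <payloads>
      <payload>' OR '1'='1</payload>
    </payloads>
  </vulnerability>
</wastf>
```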
Option Description
userAgent This option changes the UserAgent sent by WASTF. Chang-
ing this value also alters the behaviour when parsing and exe-
cuting JavaScript code encountered in retrieved HTML pages.
Possible values are: FIREFOX_3, INTERNET_EXPLORER_6,
INTERNET_EXPLORER_7, INTERNET_EXPLORER_8
cssEnabled Turns CSS parsing on or off. Expect WASTF to be slower when
this option is enabled.
javaScriptEnabled Turns JavaScript parsing on or off.
checkSSLCertificates When this option is enabled and a connection over HTTPS is
made to a server, the server certificate will be validated before
successfully returning a connection to WASTF.
proxyHost Host of the HTTP proxy
proxyPort The port the HTTP proxy is listening on
proxyUsername If necessary configures a username for accessing the HTTP
proxy
proxyPassword If necessary configures a password for accessing the HTTP
proxy
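Using the set syntax that appears elsewhere in this document, configuring these proxy options might look like the following; host, port and credentials are placeholder values:

```
set proxyHost 127.0.0.1
set proxyPort 8008
set proxyUsername myuser
set proxyPassword mypassword
```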
Option Description
formName The name or id of the form containing the login form. If the
form does not have an id or a name tag this can be left blank
and WASTF will use the first HTML form it finds on the HTML
page.
urlWithLoginForm URL pointing to the HTML page containing the login form.
formData JSON encoded string containing the data which will be in-
serted into the previously specified login form. Example:
{"username_field_name_or_id":"myusername",
"password_field_name_or_id":"mypassword"}. In order
to find the necessary names of the input elements, the user has
to look at the HTML source of the page containing the HTML
form.
Option Description
maxThreads Maximal number of concurrent web spidering threads. The de-
fault value is 15. Setting this value too high actually results in
decreased performance (RAM restrictions).
maxLevel This option is a termination criteria. If the configured number
of levels have been reached the web spider will automatically
stop spidering any further.
onlyForward Enables or disables whether the web spider should only search
HTML pages inside the configured target URL. If this option is
disabled and a retrieved HTML page contains a link to e.g.
Google, then this page will also be spidered. In most cases this
option should be enabled.
stopAfterMinutes This option is a termination criteria. After spidering x minutes
the web spider will automatically stop spidering any further.
ignoreRegex By specifying a Java regular expression, URLs matching this ex-
pression will be ignored by the web spider.
followRegex By specifying a Java regular expression, only URLs matching
this expression will be processed by the web spider (ignoring
the ignoreRegex option).
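As an illustration of how such a Java regular expression behaves, the pattern and class below are hypothetical examples rather than WASTF code; here an ignoreRegex skips logout links so the spider does not terminate its own session:

```java
import java.util.regex.Pattern;

public class SpiderFilter {
    // Hypothetical ignoreRegex value: skip anything that looks like
    // a logout link.
    static final Pattern IGNORE = Pattern.compile(".*logout.*");

    // A URL is only visited if it does not match the ignore pattern.
    static boolean shouldVisit(String url) {
        return !IGNORE.matcher(url).matches();
    }
}
```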
Option Description
randomSeed Contains the seed to initialise the random number/string gen-
erator used by the databaseQueryLog plugin. This value is ran-
domly generated at each start of WASTF.
Option Description
online Whether the online or offline mode should be used by the plu-
gin.
queryLogLocation The location of the database query log file.
ignoreRegex By specifying a Java regular expression, HTTP GET or POST pa-
rameters matching this expression will be ignored by the plu-
gin.
Currently, two queryLogLocation schemes are supported: either a direct connection to a
MySQL server or the location of a MySQL query log file. To configure a direct connection to a
MySQL server, the following scheme has to be used for the queryLogLocation
parameter:
set queryLogLocation
jdbc:mysql://<hostname_or_ip>:<port>/mysql?user=root&password=<password>
If the offline mode is being used and a MySQL query log file should be analysed by WASTF, the follow-
ing scheme has to be used for the queryLogLocation parameter:
set queryLogLocation
file:mysql://<path_to_query_log_file>
The offline mode is enabled by setting the online parameter to false. This tells the plugin to
send all the necessary requests for identifying SQL Injection vulnerabilities, but to expect the
database query log to be provided by the user once the necessary information is available. WASTF
keeps the necessary information persistent and thus does not have to keep running until the database
query log becomes available. When it does, the following commands start the detection routine:
The development guide is targeted at users who want to extend the functionality of WASTF and build
it from source on their own machines. The following sections contain detailed instructions on how
to build WASTF and what kind of tools and libraries have been used during the initial development
phase of the WASTF application.
10.2.1 Prerequisites
The following section outlines the prerequisites needed for building WASTF from source.
Sun Java 1.6 JRE / JDK: WASTF has been developed and tested under the Ubuntu 10.04 LTS “Lucid
Lynx” operating system with the Java(TM) SE Runtime Environment (build 1.6.0_20-b02) 32 and
64bit and Java(TM) SE Development Kit (1.6.0_20) 32 and 64 bit. Check the manual of your Linux
operating system on how to install the Sun JRE and JDK.
Maven 2: Apache Maven is a software project management and comprehension tool. Based on the
concept of a project object model (POM), Maven can manage a project’s build, reporting and doc-
umentation from a central piece of information. If you are using the Ubuntu operating system you
can install Maven by simply typing: sudo apt-get install maven into a terminal. See the Maven
documentation on how to install Maven for your specific operating system
(http://maven.apache.org/download.html#Installation).
Insert the enclosed CD-ROM and double click on the install.sh executable shell script in the CD-
ROM’s root directory and follow the on screen instructions of the installer1 . Alternatively open a
1 The installer has been created using the free IzPack project. IzPack is a one-stop solution for packaging, distributing and
deploying applications. It is fully cross-platform and generates a single installer. As such, it is an alternative to native
solutions such as platform-specific installers and package managers. IzPack-generated installers only require a Java
# cd /media/<CD-ROM DRIVE>
# java -jar Binaries/WASTF-installer.jar
Make sure that you select the Sources option in the package selection panel in order to install the
WASTF source files needed for extending and building WASTF from source. If you have to install the
WASTF sources on a remote machine (e.g. accessed through an SSH client) where no graphical user
interface is available to you, you can alternatively install WASTF in text-only mode by adding the
-console parameter to the installer, as shown in the following snippet:
After successfully installing the WASTF source files, several Java libraries needed for building WASTF
have to be added to the local Maven repository on the development machine. This is necessary
because these specific libraries are not part of an official, online available Maven repository. Open a
terminal and enter the following commands in order to add the necessary libraries to the local Maven
repository:
The install-missing-maven-deps.sh script has to be executed only once after installing the
sources. See section 10.2.4 for a complete list of the free/third party libraries used by WASTF.
Building WASTF from source is fairly easy and is taken care of by Maven. Enter the following com-
mands into a terminal to build WASTF from the previously installed source files:
Alternatively, if you want to skip the JUnit tests which Maven executes automatically as part of the
build process, you can provide the -Dmaven.test.skip parameter to the mvn command.
For more information about specific Maven commands visit http://maven.apache.org/.
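The build commands themselves are not reproduced above; assuming a standard Maven project layout, the build would be invoked roughly as follows (the exact goals and installation path are assumptions):

```shell
# cd <WASTF installation directory>
# mvn clean package
# mvn clean package -Dmaven.test.skip=true
```

The last variant skips the JUnit tests via the -Dmaven.test.skip parameter mentioned above.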
WASTF relies on several free and third party libraries to provide the needed functionality. Table 10.9
provides a list of all the libraries used and their specific licensing scheme. The following three libraries
are not part of an official and online available Maven repository and have to be installed manually (see
section 10.2.2).
virtual machine to run. http://izpack.org/ [10.07.10]
• JsqlParser Version 0.6.2.a: JSqlParser parses a SQL statement and translates it into a hierarchy
of Java classes. The generated hierarchy can be navigated using the Visitor Pattern.
• Apache Derby Debug Library: Apache Derby, an Apache DB sub project, is an open source
relational database implemented entirely in Java.
WASTF uses the free SLF4J project to provide extensive logging capabilities to the users of WASTF.
The Simple Logging Facade for Java (SLF4J) serves as a simple facade or abstraction for various
logging frameworks, e.g. java.util.logging, log4j and logback, allowing the end user to plug in the
desired logging framework at deployment time. By modifying the logback.xml file inside the
packaged WASTF jar file, the logging detail can be increased in order to create more verbose output
for debugging and program error identification purposes.
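As an illustration of such a modification (this is generic logback syntax, not the actual contents of WASTF’s logback.xml), raising the root log level to DEBUG looks like:

```xml
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger - %msg%n</pattern>
    </encoder>
  </appender>
  <!-- Change DEBUG to TRACE for even more verbose output -->
  <root level="DEBUG">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```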
The enclosed CD-ROM contains a virtual machine with various pre-installed and configured web
applications for testing WASTF. The virtual machine is located in the root directory of the CD-ROM in
a folder labelled Test Environment. See chapter 9 for more details about the test environment.
10.3 Sonar
Sonar2 is an open platform to manage code quality and has been used extensively during the devel-
opment of the WASTF application. Sonar covers the 7 axes of code quality (see table 10.10).
Sonar is a web-based application. Rules, alerts, thresholds, exclusions, settings etc. can be configured
online. By leveraging its database, Sonar not only allows metrics to be combined but also mixed
with historical measures. Covering new languages, adding rules engines and computing advanced
metrics can be done through a powerful extension mechanism. More than 30 plugins are already
available.
Sonar makes heavy use of already existing source quality tools such as PMD, Checkstyle, Findbugs
and others for providing meaningful statistics about the quality of analysed code. Additional features
include (taken from the Sonar project web site):
• Drill down to source code: Want to know why a project has for instance so many coding rules
violations? Drill down to modules, then to packages and finally to source code
2 http://www.sonarsource.org/ [09.07.10]
• Unit Tests: Nowadays, what does code quality mean without considering unit tests and associ-
ated metrics like code coverage?
• Time Machine: Sonar helps you replay the past and show you how quality metrics evolve in
time.
• Coding Rules: More than 600 coding rules are provided off the shelf, from simple naming con-
ventions to complex anti-pattern detection.
• Leverage existing components: You probably already use best of breed tools like Checkstyle,
PMD, Findbugs, Clover, Cobertura. Sonar can transparently orchestrate all those components
for you.
The first step in analysing WASTF with Sonar is to install Sonar by downloading the binary package
from the Sonar project web site. The installation of Sonar is finished by extracting the files contained
in the previously downloaded archive into an arbitrary folder. In order to analyse any kind of Java
project with Sonar the following commands have to be executed every time:
• Starting Sonar: If Sonar is not already running, it has to be started manually by opening a
terminal and entering the following commands:
Give Sonar a few seconds to boot up before moving to the next step.
• Analyse WASTF: In order to analyse WASTF with Sonar one simply has to type the following
commands into a terminal:
Maven automatically builds and runs the necessary tools for generating meaningful statistics
about the project, which are interpreted and visually presented by Sonar. The results and
statistics can be accessed through the Sonar Management Console, which can be opened at
http://localhost:9000 in a web browser.
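For reference, with Sonar and the Sonar Maven plugin of that era, the two steps above typically amounted to the following commands; the Sonar installation path is an assumption:

```shell
# Start the Sonar server from its installation directory
bin/linux-x86-32/sonar.sh start
# From the WASTF project root, analyse the project
mvn sonar:sonar
```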
After using Sonar during the development of WASTF which is a complex and relatively large (about
12’000 lines of code) application, the following conclusions have to be drawn:
• If a developer already uses source code analysing tools such as PMD, Checkstyle and Findbugs,
then Sonar does not provide more information than the developer already has, except for a view
on Java package dependency cycles3 . These could easily be obtained by another stand-alone
tool such as JDepend4 .
• It is easy to become a slave to the metrics provided by Sonar. Endless refactoring of source code merely to satisfy generated metrics can lead to more complex and less readable code.
• Sonar can be easily integrated into a continuous integration tool such as Hudson5 . The inte-
gration in such tools reduces the need to manually start Sonar. For a guide on how to integrate
Sonar into Hudson consult the following quick guide: http://meera-subbarao.blogspot.
com/2009/11/hudson-sonar-perfect-match.html.
If a developer is not yet familiar with code analysis tools, Sonar is an easy way to monitor software quality throughout the development of a software project. Sonar ships with a good set of default settings for its source code analysis tools, which reduces the required configuration essentially to zero. One of the main advantages of Sonar over using the source code analysis tools on their own is the consolidated Management Console, which allows the current code quality to be assessed at a glance.
3 Cycles exist across a variety of modules, notably class, package and .jar. Class cycles exist when two classes, such as Customer and Bill, each reference the other (assume Customer has a list of Bill instances, and Bill references the Customer to calculate a discount amount). This is also known as a bi-directional association. It is a maintenance and testing issue, since a developer cannot change either class without possibly affecting the other. Class cycles can be broken in a few different ways, one of which is to introduce an abstraction that breaks the cycle.
4 http://www.clarkware.com/software/JDepend.html [09.07.10]
5 https://hudson.dev.java.net/ [11.07.10]
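The abstraction-based fix mentioned in footnote 3 can be sketched in a few lines of Java, using the Customer/Bill example from that footnote. The discount rule and all method names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Bill depends only on this small abstraction instead of on Customer
// directly, which removes the Customer <-> Bill reference cycle.
interface DiscountSource {
    double discountRate();
}

class Customer implements DiscountSource {
    final List<Bill> bills = new ArrayList<>();

    public double discountRate() {
        // Invented rule: customers with two or more bills get 10% off.
        return bills.size() >= 2 ? 0.10 : 0.0;
    }
}

class Bill {
    private final DiscountSource discount; // no reference back to Customer
    private final double amount;

    Bill(DiscountSource discount, double amount) {
        this.discount = discount;
        this.amount = amount;
    }

    double total() {
        return amount * (1.0 - discount.discountRate());
    }
}

public class CycleDemo {
    public static void main(String[] args) {
        Customer c = new Customer();
        Bill b1 = new Bill(c, 100.0);
        c.bills.add(b1);
        c.bills.add(new Bill(c, 100.0));
        System.out.println(b1.total()); // 90.0 once the discount applies
    }
}
```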
This chapter recapitulates the findings and limitations of this project thesis and proposes a list of
further steps for the project.
11.1 Conclusion
This project thesis has shown that it is possible to enhance the detection of security flaws in web
applications by combining black and white box testing techniques.
The detection of SQL injection and persistent XSS flaws is based on performing common black box tests while additionally parsing database query log files, which contain every SQL statement sent from the web application to the database. The test agent is thus free to send various SQLi or XSS attack strings and can instantly tell whether these inputs are filtered by the web application. Once the triggered SQL statements end up in the database query log file, they have left the web application and passed all the validation routines implemented by the developers. The test agent can therefore tell whether its malicious attack strings have reached the database unfiltered.
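The core of this check can be sketched in a few lines of Java. The class and method names below are illustrative only and do not correspond to WASTF's actual implementation; the sketch assumes that each attack string carries a unique marker token, so that logged statements can be matched back to the submitted input:

```java
import java.util.ArrayList;
import java.util.List;

public class QueryLogScan {

    // Returns every logged SQL statement that still contains the marker,
    // i.e. every statement that the attack string reached unfiltered.
    static List<String> unfilteredStatements(List<String> logLines, String marker) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            if (line.contains(marker)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "SELECT * FROM users WHERE name = 'alice'",
            "SELECT * FROM users WHERE name = '' OR 1=1 -- WASTF_a1b2");
        // The second statement contains the marker: that input was not filtered.
        System.out.println(unfilteredStatements(log, "WASTF_a1b2").size());
    }
}
```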
A web application security testing framework (WASTF) has been developed during this project thesis
with four finished plugins so far:
• Web Form Login: A plugin that performs automated logins to web applications; the HTML form elements to be populated with the necessary user credentials are configurable.
• Web Spider: A web spider based on the open source HtmlUnit library with JavaScript support, scoring 81% in the Wivet benchmark (w3af scores 50%).
• Database Query Log: A combined white and black box plugin which detects input validation flaws in web applications by parsing the database query log, in either an online or an offline mode.
• XML Report: A plugin that writes an XML report containing the findings of all plugins that were enabled during a run.
Thanks to its machine-readable XML report file, WASTF can be easily integrated into the Automated Security Testing Framework (ASTF), which has been developed at the Institut für angewandte Informationstechnologie (InIT). WASTF has been developed with extensibility in mind and in compliance with Findbugs, Checkstyle and Sonar.
Tests have shown that the detection of SQL input validation vulnerabilities is both more accurate and faster when black and white box testing methodologies are combined than when a pure black box approach is used. WASTF is 14 times faster than w3af in detecting SQL input validation vulnerabilities in web applications. WASTF is still in an early phase and might contain bugs, and could thus report false positives; however, the false positive Blind SQL Injection vulnerability reported by w3af (see section 9.3.10) was correctly identified as not exploitable by WASTF. The online mode (see chapters 4 and 8) even generates SQL exploit payloads and verifies them by actually submitting them to the targeted web application before reporting any vulnerability, which increases the accuracy of the reported vulnerabilities even further. The offline mode is not as accurate as the online mode, because WASTF's generated SQL exploit payloads cannot be tested until the SQL statement triggered by the web application is known to WASTF. An alternative would be to send SQL exploit payloads blindly (as black box security testing applications do). This would not increase the accuracy of the vulnerability detection routine but would only increase the HTTP(S) request footprint. Instead, WASTF tries to find working SQL exploit payloads offline by mutating the SQL statements found in the database query log. Because WASTF operates offline in this mode, the SQL payloads cannot be verified by sending them to the targeted web application, a circumstance that might lead to false positives being reported by WASTF.
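The offline mutation idea can be illustrated with a small Java sketch. The class name, the payload list and the plain string-replacement strategy are simplifications invented for this example; WASTF's actual mutation engine is more elaborate:

```java
import java.util.ArrayList;
import java.util.List;

public class OfflineMutator {

    // A few classic injection fragments; example values only.
    static final List<String> FRAGMENTS =
        List.of("' OR '1'='1", "' --", "' UNION SELECT NULL --");

    // Given a SQL statement found in the query log and the benign input
    // it contains, derive candidate exploit statements by appending
    // injection fragments to that input.
    static List<String> mutate(String loggedSql, String benignInput) {
        List<String> candidates = new ArrayList<>();
        for (String fragment : FRAGMENTS) {
            candidates.add(loggedSql.replace(benignInput, benignInput + fragment));
        }
        return candidates;
    }

    public static void main(String[] args) {
        String logged = "SELECT * FROM users WHERE name = 'alice'";
        for (String candidate : mutate(logged, "alice")) {
            System.out.println(candidate);
        }
    }
}
```

Each candidate could then be checked for syntactic validity (e.g. with a SQL parser, as described in chapter 8) before being reported.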
11.2 Further Steps
This section outlines further steps which would make WASTF more mature and more useful for its users.
11.2.1 Architecture
The application architecture should be revised and refactored where necessary. Adding more JUnit test cases is difficult with the current application architecture due to external and internal package dependencies. The refactoring process should follow these steps:
1. Identify classes: Identify the classes of the system which should be tested with an appropriate
JUnit test case.
2. Find test points: Identify the methods and classes which are needed to validate the correctness
of the test.
3. Break dependencies: Make changes so that the identified class can be tested in isolation.
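Step 3 can be illustrated with a minimal Java sketch; all names are hypothetical and not taken from WASTF. A class that previously read the query log directly is given a small interface, so that a JUnit test can substitute a fake implementation:

```java
// A narrow abstraction over the real, database-backed log access.
interface QueryLogSource {
    java.util.List<String> fetchStatements();
}

public class LogAnalyser {
    private final QueryLogSource source;

    // The dependency is injected, so tests control what the class sees.
    LogAnalyser(QueryLogSource source) {
        this.source = source;
    }

    int countStatements() {
        return source.fetchStatements().size();
    }

    public static void main(String[] args) {
        // In a test, a fake source replaces the real database-backed one.
        QueryLogSource fake = () -> java.util.List.of("SELECT 1", "SELECT 2");
        System.out.println(new LogAnalyser(fake).countStatements());
    }
}
```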
11.2.2 Features
• The database query log module currently supports only the MySQL database. Adding support for other database systems is trivial and should be done to broaden the range of possible applications.
• Add more white and/or black box plugins to extend WASTF's feature set.
• Possibly add a graphical user interface (GUI) to make WASTF easier to use.
• Enhance the SQL payload generation routine and add more SQL validation libraries for other SQL dialects.
6.1 Menu and plugin hierarchical tree structure of the WASTF framework . . . . . . . . . . . . . . 49
6.2 Simplified UML 2 Package Overview of the WASTF Application . . . . . . . . . . . . . . . . . . . 51
6.3 Entity Relationship Model (ERM) of the current implemented WASTF database schema 54
7.1 JavaScript execution time in various browsers for HTML DOM modifications . . . . . . . . 61
7.2 Simplified UML 2 activity diagram showing the main flow of the web spider . . . . . . . . . 64
7.3 Simplified UML 2 activity diagram for the non invasive smart fill HTML form module . . 66
9.1 Various pre-installed applications in the WASTF test environment virtual machine . . . . 80
9.2 Test Set-Up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
9.3 Test case file structure on the enclosed CD-ROM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
9.4 w3af’s crash message while spidering the Magento eCommerce web application . . . . . . 85
9.5 The HTTP request causing an endless loop in the web spider component . . . . . . . . . . . 86
9.6 WebFormLogin plugin configuration (left side) and output (right side) for DVWA . . . . . . 87
9.7 WebFormLogin plugin configuration (left side) and output (right side) for logging into
the SecureMessaging web application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
9.8 The DVWA database query log plugin test case configuration (Online Mode) . . . . . . . . . 88
9.9 The DVWA database query log plugin test case working payloads (Online Mode) . . . . . . 88
9.10 w3af HTTP log showing a request which triggered a false positive Blind SQL Injection
vulnerability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.2 Typical exploit code used by security testing applications to detect XSS and SQL injection vulnerabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4 Simplified performance comparison of the “Online” and “Offline” strategies for detect-
ing input validation vulnerabilities by parsing the database query log . . . . . . . . . . . . . . 39
4.1 Enabling of the MySQL query log through the configuration file . . . . . . . . . . . . . . . . . . 24
4.2 Enabling of the MySQL query log at runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3 Definition of the MySQL query log table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.4 Content of the MySQL query log table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.5 Enabling the query log through the PostgreSQL configuration file . . . . . . . . . . . . . . . . . 27
4.6 Content of the PostgreSQL query log file (CSV-format output) . . . . . . . . . . . . . . . . . . . . 28
4.7 Definition of the PostgreSQL query log table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.8 Pseudo code for detecting input validation vulnerabilities by parsing the database
query log at runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.9 Pseudo code for detecting input validation vulnerabilities by parsing the database
query log as a post scanning process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
7.1 Short HtmlUnit example for submitting a login form through a proxy server . . . . . . . . . 59
7.2 Dynamically created HTML login form with embedded JavaScript code . . . . . . . . . . . . 59
Listings | 109
8.1 Validating a SQL statement with JsqlParser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8.2 Validating a SQL statement with Apache Derby . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Bibliography
[1] Mike Andrews and James A. Whittaker. How to Break Web Software: Functional and Security
Testing of Web Applications and Web Services. Addison-Wesley Professional, 2006.
[2] Kevin Denver. Development and Integration of ASTF-Plugins in the field of Web Application
Vulnerabilities. MSE Project Thesis 1, February 2009.
[3] Kevin Denver. Evaluation geeigneter Plugins im Bereich Web Application Vulnerabilities für die
Integration ins ASTF. MSE Seminary Thesis 1, November 2009.
[4] Kevin Denver. Preliminary Study of white box security testing for the integration into the ASTF
framework. MSE Seminary Thesis 2, February 2010.
[5] J. Franks, P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen, and L. Stewart. HTTP
Authentication: Basic and Digest Access Authentication. RFC 2617 (Draft Standard), June 1999.
[6] Patrice Godefroid, Michael Y. Levin, and David A. Molnar. Automated whitebox fuzz testing. In Network Distributed Security Symposium (NDSS). Internet Society, 2008.
[7] Paco Hope and Ben Walther. Web Security Testing Cookbook: Systematic Techniques to Find Prob-
lems Fast. O’Reilly Media, Inc., 2008.
[8] David Chenho Kung, Chien-Hung Liu, and Pei Hsia. An object-oriented web test model for test-
ing web applications. In COMPSAC ’00: 24th International Computer Software and Applications
Conference, pages 537–542, Washington, DC, USA, 2000. IEEE Computer Society.
[9] Craig Larman. Applying UML and Patterns: An Introduction to Object-Oriented Analysis and
Design and Iterative Development (3rd Edition). Prentice Hall PTR, Upper Saddle River, NJ, USA,
2004.
[10] Robert Cecil Martin. Agile Software Development: Principles, Patterns, and Practices. Prentice
Hall PTR, Upper Saddle River, NJ, USA, 2003.
[11] C. Michael and Steven R. Lavenhar. Building security in: Source code analysis tools - overview. https://buildsecurityin.us-cert.gov/bsi/articles/tools/code/263-BSI.html, 2009.
[12] MySQL 5.1 Reference Manual. The General Query Log. http://dev.mysql.com/doc/refman/5.1/en/query-log.html, 2010.
[14] Joel Scambray, Mike Shema, and Caleb Sima. Hacking Exposed Web Applications, Second Edition.
McGraw-Hill, Inc., New York, NY, USA, 2006.
[15] Paolo Tonella and Filippo Ricca. Dynamic model extraction and statistical analysis of web ap-
plications. In WSE ’02: Proceedings of the Fourth International Workshop on Web Site Evolution
(WSE’02), page 43, Washington, DC, USA, 2002. IEEE Computer Society.
[16] Paolo Tonella and Filippo Ricca. A 2-layer model for the white-box testing of web applications. In
WSE ’04: Proceedings of the Web Site Evolution, Sixth IEEE International Workshop, pages 11–19,
Washington, DC, USA, 2004. IEEE Computer Society.
JavaScript DOM Modifications Performance Test

Web Browser          Run   createElement  createTextNode  cloneNode  appendChild  insertBefore  innerHTML  Total
                           [ms]           [ms]            [ms]       [ms]         [ms]          [ms]       [ms]
Chromium 6.0.432.0   #1    3              4               14         11           11            30         73
                     #2    3              4               8          10           10            33         68
                     #3    2              4               10         12           13            31         72
                     #4    2              3               10         12           10            28         65
                     #5    2              3               12         10           10            32         69
Average                    2.4            3.6             10.8       11           10.8          30.8       69.4   Δ 19
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.List;

import com.gargoylesoftware.htmlunit.CollectingAlertHandler;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitBenchmark {

    public static void main(final String[] args)
            throws FailingHttpStatusCodeException,
            MalformedURLException, IOException, InterruptedException {
        final WebClient webClient = new WebClient();
        final List<String> collectedAlerts = new ArrayList<String>();
        webClient.setCssEnabled(false);
        webClient.waitForBackgroundJavaScriptStartingBefore(9000);
        webClient.setThrowExceptionOnScriptError(false);
        webClient.setAlertHandler(new CollectingAlertHandler(collectedAlerts));
        final HtmlPage page =
            webClient.getPage("file:///home/kevin/Desktop/dommodify.html");
        // Poll the page title until the benchmark script signals completion.
        while (true) {
            if (page.getTitleText().equalsIgnoreCase("Finished")) {
                break;
            }
            synchronized (page) {
                page.wait(1);
            }
        }
        System.out.println(collectedAlerts);
    }
}
getDiff: function () {
    var d = new Date();
    return (d.getTime() - time);
}
}
// Try to force real results
var ret, tmp, str = ""; // initialise str; otherwise += would start from "undefined"
var elems = [];
var htmlstr = document.body.innerHTML;
var div = document.createElement("div");
var num = 400;
18.06.10
Kevin Denver HtmlUnit JavaScript Performance Benchmark MSE Project Thesis 2
for (var i = 0; i < 1; i++) {
str += String.fromCharCode( (25 * Math.random()) + 97 );
timeDiff.setStartTime();
for ( var i = 0; i < num; i++ ) {
ret = document.createElement("div");
ret = document.createElement("span");
ret = document.createElement("table");
ret = document.createElement("tr");
ret = document.createElement("select");
}
alert("createElement(): " + timeDiff.getDiff());
timeDiff.setStartTime();
for ( var i = 0; i < num; i++ ) {
ret = document.createTextNode(str);
ret = document.createTextNode(str + "2");
ret = document.createTextNode(str + "3");
ret = document.createTextNode(str + "4");
ret = document.createTextNode(str + "5");
}
alert("createTextNode(): " + timeDiff.getDiff());
timeDiff.setStartTime();
document.body.innerHTML = htmlstr;
alert("document.body.innerHTML: " + timeDiff.getDiff());
elems = [];
var telems = document.body.childNodes;
for ( var i = 0; i < telems.length; i++ ) {
elems.push( telems[i] );
}
timeDiff.setStartTime();
for ( var i = 0; i < elems.length; i++ ) {
ret = elems[i].cloneNode(false);
ret = elems[i].cloneNode(true);
ret = elems[i].cloneNode(true);
}
alert("cloneNode(): " + timeDiff.getDiff());
timeDiff.setStartTime();
for ( var i = 0; i < elems.length; i++ ) {
document.body.appendChild( elems[i] );
}
alert("appendChild(): " + timeDiff.getDiff());
timeDiff.setStartTime();
for ( var i = 0; i < elems.length; i++ ) {
document.body.insertBefore( elems[i], document.body.firstChild );
}
alert("insertBefore(): " + timeDiff.getDiff());
document.title = "Finished";
}
alert("Finished");
</script>
InIT Institut für angewandte
Informationstechnologie
Project Thesis
The Automated Security Testing Framework (ASTF) was developed jointly by InIT and the company PrivaSphere AG within the scope of a KTI project. ASTF combines various security testing tools in order to test online applications consistently and reproducibly with respect to security. To this end, ASTF uses tools from the areas of black box testing (e.g. external vulnerability scans) and white box testing (e.g. static source code analysis).
A preceding seminar thesis [1] investigated which further white box security testing approaches would be possible and therefore suitable for integration into ASTF. The focus was on security tests of web applications, e.g. on uncovering weaknesses in the areas of injection flaws or access control. The seminar thesis also made a concrete proposal as to which of the suggested tests should be implemented within this project thesis, and the present task description follows that proposal.
The approach proposed in the seminar thesis is briefly presented below; for the exact details, the seminar thesis report [1] should be consulted. The basic goal is to detect injection flaws in web applications. Today this is usually done by performing a pure black box scan (e.g. with w3af), and the testing tool then tries to determine from the responses of the web application whether a vulnerability is present. The problem with this approach is that it is often not very precise and produces many false positives and false negatives. The seminar thesis therefore proposes to combine this approach with an analysis of the server-side database log files, in which the database queries are usually recorded. The view of the server-internal database log file thus constitutes the white box component that complements the black box scan.
Schedule
The time frame of this project thesis has been chosen such that Mr. Denver can complete the workload of 420 hours alongside his MSE studies (3 modules of 3 ECTS each in the 2010 spring semester) and his work as an assistant at InIT.
Task Description
Your task is to develop a testing tool that, in line with the description above, enables the detection of injection flaws in web applications by combining black and white box testing methods. The focus should lie on the approach proposed in the seminar thesis, although it is of course possible that this approach will be adapted in the course of the project thesis as new insights emerge.
The requirements for the testing tool to be developed are as follows:
• The web crawler used should be able to discover as many URLs of a given web application as possible. Ideally, the crawler can also handle JavaScript. The crawler should cope with as broad a range of web applications as possible (e.g. Secure Messaging by PrivaSphere, Apache Wicket, etc.) and should also be able to perform the login to the application being crawled on its own.
• The testing tool should in principle be able to handle a variety of database log files. As a first step, and within the scope of this project thesis, at least MySQL must be supported.
• The software component on the server system should be as small and minimally invasive as possible. Within the scope of the project thesis, this component must run at least on a Linux system.
• At present, the proposed approach is suitable for the detection of SQL injection and persistent XSS attacks. At the beginning of the project thesis, consider carefully once more whether other attacks could also be detected with little additional effort (e.g. by analysing further server-side log files).
• The testing tool should be easy to integrate into ASTF. It must therefore provide a command line interface and generate structured, ideally XML-based, output.
The grading of the project thesis is based on the following items:
• Written report
• Oral presentation
References
[1] Kevin Denver. Preliminary Study of white box security testing for the integration into the ASTF framework. MSE Seminary Thesis, February 2010.