
METHODOLOGY

Today's data mining tools have typically evolved out of pattern recognition and artificial intelligence research. These tools have a heavy algorithmic component and are often rather bare with respect to user friendliness and generality. They mostly work on flat files, which imposes a significant constraint on their deployment in a corporate environment. With modern corporate databases, copying huge data sets from databases into flat files is not tolerable. The ability to directly access different database types therefore becomes a critical requirement for modern database mining systems. Furthermore, a sophisticated yet straightforward graphical user interface (GUI) is a must. EHMM meets all of these requirements.
The efficient hidden Markov model (EHMM) system consists of five main processes.
Global Constants Process (GCP):
The purpose of this process is to bundle all the global variables declared in the system (except those in the external dynamic link library (DLL) part). For example, the record sets used to train and test the detection techniques are declared globally in the GCP and accessed by the other processes.
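A minimal sketch of such a shared-constants module in Python (all names here are hypothetical, chosen only to mirror the description above):

```python
# global_constants.py -- hypothetical sketch of the GCP: one module that
# bundles the global variables shared by the other processes.

# Record sets used to train and test the detection techniques;
# filled in by the database interface, read by the learning library.
train_records: list[dict] = []
test_records: list[dict] = []

# Name and type of the currently opened database.
db_name: str = ""
db_type: str = ""
```

Keeping these in one place means the other processes agree on a single source of shared state instead of passing large record sets between them.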
Core/Graphical User Interface Process (GUIP):
This process not only allows the user to comfortably control the entire system, but also serves as the glue for all other processes. It acts as a container for all GUI-related routines, including callback code and auxiliary functions for widget control. Moreover, this process handles the creation of the Data Retrieval description files, which the other processes subsequently access. The GUIP communicates with the GCP.
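How the GUIP might persist the user's choices as a description file can be sketched as follows (the file name, format, and field names are assumptions for illustration; the source does not specify the file format):

```python
# Hypothetical sketch: the GUIP writes the user's GUI selections to a
# Data Retrieval description file that other processes later read.
import json

def write_description_file(path, layers, activations):
    """Save the chosen network topology to a description file."""
    cfg = {"layers": layers, "activations": activations}
    with open(path, "w") as f:
        json.dump(cfg, f)
    return cfg

cfg = write_description_file("network.json", [12, 8, 1],
                             ["sigmoid", "sigmoid", "linear"])
```

Passing a file name rather than the data itself keeps the interface between the GUI and the learning library narrow.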
Database Interface Process (DBIP):
This process handles the communication between the database and the remaining processes. It contains the code for operations such as initialization, opening, and modification of databases, assignment of database fields to GUI data control widgets, querying of the databases via SQL, and assignment of selected record sets to the global variables. Currently, Microsoft Access, dBase, FoxPro, Paradox, and ODBC-compatible database systems are supported; the test version for the credit card application uses MS Access. The DBIP cooperates with the GUIP and the GCP.
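The query-and-assign step can be sketched as below. EHMM targets MS Access and ODBC-compatible systems; `sqlite3` is used here only as a self-contained stand-in for the SQL layer, and the table and column names are invented for illustration:

```python
# Hypothetical DBIP sketch: open a database, query it via SQL, and
# return the selected record set (which the GCP globals would hold).
import sqlite3

def open_database(path):
    return sqlite3.connect(path)

def query_records(conn, customer_id):
    """Select one customer's transactions via a parameterized SQL query."""
    cur = conn.execute(
        "SELECT amount, merchant FROM transactions WHERE customer = ?",
        (customer_id,))
    return cur.fetchall()

conn = open_database(":memory:")
conn.execute("CREATE TABLE transactions (customer, amount, merchant)")
conn.execute("INSERT INTO transactions VALUES (1, 42.5, 'shop')")
records = query_records(conn, 1)
```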
Learning Techniques Library (LTL):
It provides the Data Retrieval learning techniques. The current version is limited to a few Data Retrieval architectures with three different learning rules, but it is easily extensible to other adaptive techniques for detecting anomalies in a customer's credit card usage dynamics. The module has its own database access facilities, making it autonomous, i.e., independent of the core part of the system while retrieving transaction data or marking fraudulent records. This keeps the interfaces to the core highly efficient: no transaction data interchange takes place at the boundary between the LTIP and the LTL. The only information required by the LTL is the name of the corresponding database, its type, the name of the network description file, the name of the network optimization parameters description file, and a few arguments of simple data types. A number of routines are made available to the core part of EHMM, including DLL initialization functions, database initialization functions, learning and detection functions, and a routine for saving network parameters. The LTL communicates with the LTIP and the database.
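The narrow boundary described above can be sketched as follows (function and parameter names are hypothetical; the point is that only names and simple scalar arguments cross the interface, never transaction data):

```python
# Hypothetical sketch of the LTL entry points: the library is handed
# only names and simple arguments and fetches transaction data itself.

def ltl_init(db_name: str, db_type: str,
             net_desc_file: str, opt_desc_file: str) -> dict:
    """Initialise the library with everything it needs to work alone."""
    return {"db": (db_name, db_type),
            "net": net_desc_file, "opt": opt_desc_file}

def ltl_train(handle: dict, epochs: int) -> None: ...
def ltl_detect(handle: dict, record_id: int) -> bool: ...

h = ltl_init("creditcards.mdb", "Access", "net.json", "opt.json")
```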
Learning Techniques Interface Process (LTIP):
It provides an interface between the core and the Data Retrieval library. This is a rather small module containing a simple interface to the external LTL functions. There are only two functions inside, train and test, with method-dependent calls to the LTL test/train functions. The method dependency is intended to provide an option for future enhancement of EHMM with further adaptive methods.
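The method-dependent dispatch might look like this (the method names and the stubbed LTL calls are assumptions; in EHMM the targets would be the external LTL functions):

```python
# Hypothetical LTIP sketch: a thin dispatcher whose method-dependent
# branches leave room for future adaptive techniques.

def train(method: str, **kwargs):
    if method == "backprop_momentum":
        return _ltl_train_backprop(**kwargs)    # stand-in for an LTL call
    if method == "conjugate_gradient":
        return _ltl_train_conjgrad(**kwargs)    # stand-in for an LTL call
    raise ValueError(f"unknown method: {method}")

def _ltl_train_backprop(**kwargs):
    return "backprop"

def _ltl_train_conjgrad(**kwargs):
    return "conjgrad"
```

Adding a new learning rule then only requires a new branch and a new LTL routine, with no change to the core.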
At present, only feed-forward network architectures are implemented in EHMM. Although it is currently not possible to change the number of layers and the type of connections, the user has the freedom to manipulate many other parameters through the GUI. The Data Retrieval parameter definition process is divided into two steps: first, the user selects the topology of the network, including the number of nodes in each layer and the activation function of every layer. Then, the optimization parameters are determined. In the current version, the learning rule of a network may be selected among conjugate gradient, backpropagation with momentum, and batch backpropagation with momentum. The backpropagation learning rule is a standard learning technique: it performs a gradient descent in the error/weights space. To improve efficiency, a momentum term is introduced, which moves the correction of the weights in a direction compliant with the last weight correction.
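The momentum update described above can be written as a short sketch: each weight correction is the plain gradient-descent step plus a fraction of the previous correction (the learning rate and momentum values are illustrative, not EHMM's actual settings):

```python
# Gradient descent with momentum: delta(t) = -lr * grad + momentum * delta(t-1)

def momentum_step(w, grad, prev_delta, lr=0.1, momentum=0.9):
    """One weight update; returns new weights and the correction used."""
    delta = [-lr * g + momentum * d for g, d in zip(grad, prev_delta)]
    w = [wi + di for wi, di in zip(w, delta)]
    return w, delta

w = [1.0, -0.5]
delta = [0.0, 0.0]                     # no previous correction yet
w, delta = momentum_step(w, [0.2, -0.4], delta)
```

On the first step the momentum term is zero, so the correction reduces to plain gradient descent; on later steps the previous correction keeps the weights moving in a consistent direction across the error/weights space.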
There are several issues for future research, such as improving the fraud detection criteria in EHMM, introducing more sophisticated Data Retrieval architectures and types, extending the pool of available detection techniques, accelerating database access, adapting the system to parallel databases, and improving the GUI to make control of the system even more intuitive. Another important task is to find more efficient ways to represent the knowledge about the customers, i.e., the current limitation of one network per customer should be replaced by a more compact solution. Furthermore, by making the database access more general and independent of the credit card database, EHMM could be extended to a general-purpose anomaly detection system.
