Abstract:
Introduction:
The following pages describe the system design in terms of packages, classes,
relationships, and behavior. Several attached worksheets address specific
aspects of the overall system design, such as user interface and database
design.
This design is intended to help create a rich interface that lets web
administrators analyze web log data and find anomalies in websites.
Feasibility: As per the Gantt chart, the amount of time spent on design is
accurate, and the design is feasible.
Modularity: No existing software is used for parsing the web log data; the
parser is unique to this system. The design keeps all modules distinctly
separated.
Testability: The system is easy to test with testing tools. Manual testing is
also done for verification and validation, on each module individually and on
the system as a whole.
Architecture Overview
Software architecture style used:
Single web service: app-server, database.
What are the ranked goals of this architecture?
1. Ease of integration
2. Extensibility
3. Capacity matching
Components
The components of this system are listed below by type:
* Presentation/UI Components
o C-00: WebLogUI
* Application Logic Components
o C-10: WebLogLogic
* Data Storage Components
o C-20: WebLogStorage
Deployment
The components are deployed as follows:
* All-in-one server
o WebLog front end
+ C-00: WebLogUI
+ C-10: WebLogLogic
o Database process
+ C-20: WebLogStorage
Integration
Architectural Scenarios
The following sequence diagrams give step-by-step descriptions of how
components communicate during some important usage scenarios:
* System startup
* System shutdown
* ParsingLog
* ExportingLog
Architecture Checklist
Ease of integration: Mechanisms have been provided for all needed types of
integration, and all of the new components are designed to work together.
The reused components are integrated via fairly simple interfaces.
Overview
It roughly follows the standard proposed in the Visual Studio .NET
documentation.
Build Targets
Target   Description
compile  Compiles the VB.NET source code and creates an executable file.
Load     Loads the intended log file into the application.
Parse    This is the main target of the application: the log file is parsed and stored in a temporary space.
Export   Exports the parsed data to the database and removes the temporary space used during parsing.
Analyze  Analyzes the exported data from the database.
User Interface
Overview
The ranked goals for the user interface of this system:
Task Models
Only web administrators will use this software, to find shortcomings in a website.
Persistence
Central Database
The following database access controls will be used:
A database user account has been created that has access to the needed
application database tables. The username and password for this account
are stored in a configuration file read by the application server.
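As a rough illustration of the configuration file mentioned above, the sketch below reads a database account from INI-style text with Python's standard `configparser`. The file format, the `[database]` section, and every key and value shown are assumptions made for the example; the source does not specify them.

```python
import configparser

# Hypothetical configuration text. The section name, keys, and values are
# assumptions for illustration; the design does not define the file format.
SAMPLE_CONFIG = """
[database]
host = localhost
name = weblog
user = weblog_app
password = change-me
"""


def load_db_settings(config_text):
    """Return the database account settings found in INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["database"]
    return {
        "host": section["host"],
        "name": section["name"],
        "user": section["user"],
        "password": section["password"],
    }


settings = load_db_settings(SAMPLE_CONFIG)
print(settings["user"])
```

Keeping the credentials in one file read only by the application server matches the access rule above: other applications have no reason to know the account.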
Is this application's central database accessible to other applications?
No. This database should always be accessed through this application. All
relevant pieces of information are available through the application
interfaces. The database itself does not protect against data corruption
that could be caused by other applications.
File Storage
The server stores its data in the database rather than in files. User
documents are stored as files on the user's own hard disk.
Table 1: Main_Parsed
Field Name    Data Type   Length  Description
Unique_ID     AutoNumber  50      Unique number to identify the records.
Client_IP     VARCHAR2    50      This is the address of the computer making the HTTP request. The server records the IP
RFC_Name      VARCHAR2    20      The field is designed to identify the requestor. If this information is not recorded, a hyphen (-) holds the column in the log.
LogName       VARCHAR2    20      If using local authentication and registration, the user's log name will appear; likewise, if no value is present, a "-" is substituted.
Log_Date      TIMESTAMP           The format is DD/Mon/YYYY:HH:MM:SS +GMT
Req_method    VARCHAR2    20      Request method is GET, PUT, POST, or HEAD.
Req_Path      VARCHAR2    256     Path is the path and file retrieved.
Req_Protocol  VARCHAR2    20      It defines the protocol used by the client.
Stat_Code     VARCHAR2    3       HTTP completion code. 200: OK; 3xx: some sort of redirection; 4xx: some sort of client error; 5xx: some sort of server error.
Req_Bytes     VARCHAR2    10      For GET HTTP transactions, this field is the number of bytes transferred. For other commands this field will be a hyphen (-) or a zero (0).
Referrer      VARCHAR2    50      The referrer URL indicates the page where the visitor was located when making the next request.
User_agent    VARCHAR2    200     The user agent is information about the browser, version, and operating system of the reader. The general format is:
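Two of the fields above need conversion before they are useful for analysis: Log_Date arrives as text in the DD/Mon/YYYY:HH:MM:SS +GMT form, and Req_Bytes may hold a hyphen instead of a number. The following is a minimal sketch of both conversions, assuming English month abbreviations; the function names are hypothetical and not part of the design.

```python
from datetime import datetime, timezone


def parse_log_date(raw):
    """Parse a log timestamp such as '[10/Nov/1999:10:16:39 -0600]'.

    Assumes English month abbreviations (the default C locale).
    """
    return datetime.strptime(raw.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")


def parse_req_bytes(raw):
    """Convert the bytes field to an integer, treating '-' as zero."""
    return 0 if raw == "-" else int(raw)


stamp = parse_log_date("[10/Nov/1999:10:16:39 -0600]")
utc_stamp = stamp.astimezone(timezone.utc)
print(stamp.isoformat())
print(utc_stamp.isoformat())
print(parse_req_bytes("-"), parse_req_bytes("2326"))
```

Normalizing to UTC is one way to make entries from servers in different zones comparable; whether the system stores local or UTC time is not stated in the source.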
Table 2: GMT
Field Name  Data Type  Length  Description
GMT         SMALLINT   5       Greenwich Mean Time in number format.
Zone        VARCHAR2   2       Zone of the GMT.
Table 3: IP2Country
Field Name    Data Type  Length  Description
IP_From       NUMBER     12      Starting IP address (numerical representation of the IP address).
IP_To         NUMBER     12      Ending IP address (numerical representation of the IP address).
Registry      VARCHAR2   10      Registry holding the reserved address numbers. It contains one of "apnic, arin, lacnic, ripencc, afrinic".
Country_Code  VARCHAR2   3       Code of the country.
Country       VARCHAR2   20      Full description of the country.
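The IP_From and IP_To columns store each address as a single number so that a country lookup becomes a range search. The sketch below shows one way this could work, using Python's standard `ipaddress` and `bisect` modules. The sample rows are made up for illustration: they use address blocks reserved for documentation and placeholder registry and country values, not real IP2Country data.

```python
import bisect
import ipaddress


def ip_to_number(ip_text):
    """Return the numerical representation of a dotted IPv4 address."""
    return int(ipaddress.IPv4Address(ip_text))


# Made-up rows (IP_From, IP_To, Registry, Country_Code, Country), sorted by
# IP_From. The ranges are documentation-only blocks; the registry and
# country values are placeholders, not real data.
ROWS = [
    (ip_to_number("192.0.2.0"), ip_to_number("192.0.2.255"), "example", "XX", "Example Land"),
    (ip_to_number("198.51.100.0"), ip_to_number("198.51.100.255"), "example", "YY", "Sample Land"),
]
STARTS = [row[0] for row in ROWS]


def lookup_country(ip_text):
    """Return the row whose range contains the address, or None."""
    number = ip_to_number(ip_text)
    index = bisect.bisect_right(STARTS, number) - 1
    if index >= 0 and ROWS[index][1] >= number:
        return ROWS[index]
    return None


print(ip_to_number("192.0.2.1"))
print(lookup_country("198.51.100.7"))
print(lookup_country("10.0.0.1"))
```

In the database itself the same lookup would presumably be a query of the form "IP_From <= n AND IP_To >= n"; the binary search here only mirrors that idea in memory.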
Table 4: User_agent
Field Name      Data Type  Length  Description
U_Agent_String  VARCHAR2   100     User agent string with all information about the client system.
U_Agent_Type    VARCHAR2   2       S: Spiders, R: Robots, C: Crawler, B: Browser.
Browser         VARCHAR2   10      Browser version.
Platform        VARCHAR2   10      Platform of the user.
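The U_Agent_Type column needs some rule that maps a raw user agent string to one of the four codes. The source does not say how this is decided, so the sketch below uses simple substring matching purely as an assumption; the rule list and the non-browser agent names in the usage lines are hypothetical.

```python
# Hypothetical substring rules, checked in order. The design lists only the
# four type codes (S, R, C, B), not how an agent is assigned to one.
TYPE_RULES = [
    ("spider", "S"),
    ("crawl", "C"),
    ("bot", "R"),
]


def classify_user_agent(agent):
    """Return S, C, or R when a rule matches, otherwise B for browser."""
    lowered = agent.lower()
    for needle, code in TYPE_RULES:
        if needle in lowered:
            return code
    return "B"


print(classify_user_agent("Mozilla/4.61 [en] (WinNT; I)"))  # B
print(classify_user_agent("ExampleBot/1.0"))  # R
```

A production classifier would more likely match against a maintained list of known agents stored in the User_agent table.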
Table 5: Req_Resourse
Field Name  Data Type  Length  Description
Req_URL     VARCHAR2   100     Requested URL path.
Req_File    VARCHAR2   50      Requested file.
Req_Bytes   NUMBER     10      Requested file size in bytes.
Table 6: Status_Code
Field Name   Data Type  Length  Description
Stat_Code    NUMBER     3       HTTP completion code.
Stat_C_Desc  VARCHAR2   25      200: OK; 3xx: some sort of redirection; 4xx: some sort of client error; 5xx: some sort of server error.
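The Stat_C_Desc values follow directly from the leading digit of the status code. A minimal sketch of that mapping is shown below; the function name and the "Other" fallback for codes the table does not mention are assumptions.

```python
def describe_status(code):
    """Map an HTTP completion code to the description used in Status_Code."""
    code = int(code)
    if code == 200:
        return "OK"
    families = {3: "Redirection", 4: "Client Error", 5: "Server Error"}
    # Codes the table does not mention fall back to a generic label.
    return families.get(code // 100, "Other")


for sample in (200, 302, 404, 500):
    print(sample, describe_status(sample))
```

Every description returned here fits within the 25-character length of Stat_C_Desc.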
Table 7: Host_Summary
Field Name  Data Type  Length  Description
Client_IP   VARCHAR2   50      This is the address of the computer making the HTTP request. The server records the IP
Table 8: Referrar_Code
Field Name     Data Type  Length  Description
Ref_URL        VARCHAR2   100     Referral URL.
Ref_Site       VARCHAR2   100     Referring website.
Key_Word1      VARCHAR2   20      Keyword used to search for content in the website.
Key_Word2      VARCHAR2   20      Keyword used to search for content in the website.
Key_Word3      VARCHAR2   20      Keyword used to search for content in the website.
Key_Word4      VARCHAR2   20      Keyword used to search for content in the website.
Key_Word5      VARCHAR2   20      Keyword used to search for content in the website.
Search_Engine  VARCHAR2   20      Name of the search engine.
Dom_Name       VARCHAR2   5       Name of the domain.
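The referrer columns above can be derived from a single referrer URL. The sketch below shows one possible derivation with Python's standard `urllib.parse`. Several points are assumptions rather than facts from the source: that search terms travel in a `q` query parameter, that the search engine is recognized by a label in the host name, and that Dom_Name means the top-level domain. The `SEARCH_ENGINES` mapping is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping from a host-name label to a search engine name.
SEARCH_ENGINES = {"google": "Google"}


def parse_referrer(ref_url, max_keywords=5):
    """Split a referrer URL into the fields of the referrer table."""
    if ref_url in ("", "-"):
        return None  # the request carried no referrer
    parts = urlsplit(ref_url)
    host = parts.hostname or ""
    engine = ""
    for label, name in SEARCH_ENGINES.items():
        if label in host.split("."):
            engine = name
    # Assumption: search terms are carried in the "q" query parameter.
    terms = parse_qs(parts.query).get("q", [""])[0].split() if engine else []
    keywords = (terms + [""] * max_keywords)[:max_keywords]
    return {
        "Ref_URL": ref_url,
        "Ref_Site": f"{parts.scheme}://{parts.netloc}/",
        "Key_Words": keywords,
        "Search_Engine": engine,
        # Assumption: Dom_Name holds the top-level domain.
        "Dom_Name": host.rsplit(".", 1)[-1],
    }


result = parse_referrer("http://www.google.co.in/search?q=anna+university")
print(result["Search_Engine"], result["Key_Words"], result["Dom_Name"])
```

The five-element keyword list corresponds to Key_Word1 through Key_Word5; unused slots are left as empty strings.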
[Figure: Outer View of Project. Diagram labels: Parse Log Data, Update in database, WebAdmin, Access_Stats, Host_Stats, Referrer_Stats, User_Agent_Stats]
User_Agent
Attributes
Private U_Agent_URL As Character
Private Type As Character
Private Browser As Character
Private Platform As Character
Operations
Public Function Class_Initialize()
Public Function getU_Agent_URL() As Character
Public Sub setU_Agent_URL( val As Character )
Public Function getType() As Character
Public Sub setType( val As Character )
Public Function getBrowser() As Character
Public Sub setBrowser( val As Character )
Public Function getPlatform() As Character
Public Sub setPlatform( val As Character )
U_A_OS, U_A_Browser (both classes have the same members)
Attributes
Private NoOfHits As Integer
Private Bandwidth As Integer
Private NoOfPages As Integer
Operations
Public Function Class_Initialize()
Public Function getNoOfHits() As Integer
Public Sub setNoOfHits( val As Integer )
Public Function getBandwidth() As Integer
Public Sub setBandwidth( val As Integer )
Public Function getNoOfPages() As Integer
Public Sub setNoOfPages( val As Integer )
ClientRequests
Attributes
Private RequestedFile As Character
Private ReqestedURL As Character
Private RequestedBytes As Character
Private ClientIP As Character
Operations
Public Function getRequestedFile() As Character
Public Sub setRequestedFile( val As Character )
Public Function getReqestedURL() As Character
Public Sub setReqestedURL( val As Character )
Public Function getRequestedBytes() As Character
Public Sub setRequestedBytes( val As Character )
Public Function getClientIP() As Character
Public Sub setClientIP( val As Character )
Public Function Class_Initialize()
By_Pages, By_ResponseCode, By_Files, By_Paths (all four classes have the same members)
{ From Access_Stats }
Attributes
Private NoOfVisitors As Integer
Private Bandwidth As Integer
Private NoOfHits As Integer
Operations
Public Function Class_Initialize()
Public Function getNoOfVisitors() As Integer
Public Sub setNoOfVisitors( val As Integer )
Public Function getBandwidth() As Integer
Public Sub setBandwidth( val As Integer )
Public Function getNoOfHits() As Integer
Public Sub setNoOfHits( val As Integer )
ReferrerStats
Attributes
Private ReferrerURL As Character
Private RefSite As Character
Private Keyword1 As Character
Private Keyword2 As Character
Private Search_Engine As Character
Private Dom_Name As Character
Operations
Public Function Class_Initialize()
Public Function getReferrerURL() As Character
Public Sub setReferrerURL( val As Character )
Public Function getRefSite() As Character
Public Sub setRefSite( val As Character )
Public Function getKeyword1() As Character
Public Sub setKeyword1( val As Character )
Public Function getKeyword2() As Character
Public Sub setKeyword2( val As Character )
Public Function getSearch_Engine() As Character
Public Sub setSearch_Engine( val As Character )
Public Function getDom_Name() As Character
Public Sub setDom_Name( val As Character )
ByRef_Site, By_Keyword, By_SearchEngine (all three classes have the same members)
Attributes
Private NoOfHits As Integer
Private Bandwidth As Integer
Private NoOfPages As Integer
Operations
Public Function Class_Initialize()
Public Function getNoOfHits() As Integer
Public Sub setNoOfHits( val As Integer )
Public Function getBandwidth() As Integer
Public Sub setBandwidth( val As Integer )
Public Function getNoOfPages() As Integer
Public Sub setNoOfPages( val As Integer )
<ip_addr><base_url>-<date><method><file><protocol><code><bytes><referrer><user_agent>
Fields:
Client IP: 128.101.228.20
Authenticated User ID: - -
Time/Date: [10/Nov/1999:10:16:39 -0600]
Request: "GET / HTTP/1.0" (Other common methods are POST and HEAD)
Status: 200 (200: OK; 3xx: some sort of redirection; 4xx: some sort of client error; 5xx: some sort of server error)
Bytes: -
Referrer: "-"
Agent: "Mozilla/4.61 [en] (WinNT; I)"
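To make the field breakdown above concrete, the sketch below splits one log line into named fields with a regular expression. The sample line is assembled from the field values listed above, assuming they appear space-separated in that order; the pattern and the group names are illustrative and are not the project's actual parser.

```python
import re

# One log entry per line. The referrer and user agent are optional so that
# lines without them still match.
LOG_PATTERN = re.compile(
    r'(?P<client_ip>\S+) (?P<rfc_name>\S+) (?P<log_name>\S+) '
    r'\[(?P<log_date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it is malformed."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Sample line assembled from the field values listed above.
SAMPLE = ('128.101.228.20 - - [10/Nov/1999:10:16:39 -0600] '
          '"GET / HTTP/1.0" 200 - "-" "Mozilla/4.61 [en] (WinNT; I)"')

entry = parse_line(SAMPLE)
print(entry["client_ip"], entry["method"], entry["path"], entry["status"])
```

Returning None for a malformed line lets the caller count and skip bad entries instead of aborting the whole parse.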
Sample Reports
Access Statistics
Pages
#  Page                       Hits  Hits %  Visitors  Visitors %  Bandwidth  Bandwidth %
1  /                          166   27.48   144       26.33       2.30 MB    30.75
2  /coe/schedule.htm          61    10.10   53        9.69        615.87 KB  8.04
3  /result/results_revs.html  32    5.30    31        5.67        117.11 KB  1.53
4  /academic/                 25    4.14    23        4.20        137.72 KB  1.80
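A report like the Pages table can be produced by grouping parsed entries by requested path. The sketch below is one possible aggregation, under two assumptions that the source does not confirm: a visitor is a distinct client IP, and each percentage is the row's share of its column total. The input entries are made up for illustration and use documentation-only addresses.

```python
from collections import defaultdict


def share(part, total):
    """Return part as a percentage of total, rounded to two decimals."""
    return round(100.0 * part / total, 2) if total else 0.0


def page_stats(entries):
    """Aggregate (client_ip, path, size_in_bytes) tuples into report rows."""
    hits = defaultdict(int)
    visitors = defaultdict(set)
    bandwidth = defaultdict(int)
    for client_ip, path, size in entries:
        hits[path] += 1
        visitors[path].add(client_ip)  # assumption: visitor = distinct IP
        bandwidth[path] += size
    total_hits = sum(hits.values())
    total_visitors = sum(len(ips) for ips in visitors.values())
    total_bandwidth = sum(bandwidth.values())
    rows = []
    for path in sorted(hits, key=hits.get, reverse=True):
        rows.append({
            "page": path,
            "hits": hits[path],
            "hits_pct": share(hits[path], total_hits),
            "visitors": len(visitors[path]),
            "visitors_pct": share(len(visitors[path]), total_visitors),
            "bandwidth": bandwidth[path],
            "bandwidth_pct": share(bandwidth[path], total_bandwidth),
        })
    return rows


# Made-up entries for illustration only.
ENTRIES = [
    ("192.0.2.1", "/", 1000),
    ("192.0.2.2", "/", 1000),
    ("192.0.2.1", "/", 500),
    ("192.0.2.1", "/academic/", 500),
]

rows = page_stats(ENTRIES)
for row in rows:
    print(row)
```

In the deployed system this grouping would more naturally be a GROUP BY query over Main_Parsed; the in-memory version only shows the arithmetic.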
Entry Points
#  Entry Point            Hits  Hits %  Visitors  Visitors %  Bandwidth  Bandwidth %
1  /                      135   57.45   135       57.45       2.22 MB    86.84
2  /academic/             15    6.38    15        6.38        54.74 KB   2.09
3  /academic              9     3.83    9         3.83        2.85 KB    0.11
4  /academic/lakescr.txt  8     3.40    8         3.40        8          0.00
Paths
#  Path                                           Visitors  Visitors %  Bandwidth  Bandwidth %
1  No Referrer -> /                               53        22.55       721.30 KB  9.42
2  No Referrer -> / -> /coe/schedule.htm          16        6.81        549.09 KB  7.17
3  No Referrer -> / -> /result/results_revs.html  11        4.68        249.79 KB  3.26
4  No Referrer -> /academic/                      10        4.26        2.88 KB    0.04
Visitor Statistics
Hosts
#  Visitors         Country  Hits  Hits %  Pages  Pages %  Bandwidth  Bandwidth %
1  122.164.245.135  India    128   3.20    54     1.83     499.78 KB  0.88
2  121.246.25.137   India    121   3.02    86     2.92     352.62 KB  0.62
4  122.164.169.105  India    113   2.82    67     2.27     1.02 MB    1.84
Referrers Statistics
#  Referrer                                   Hits  Hits %  Visitors  Visitors %  Pages  Pages %  Bandwidth  Bandwidth %
1  http://www.annauniv.edu/                   1134  28.31   143       15.29       25     2.55     5.95 MB    10.73
2  No Referrer                                553   13.80   249       26.63       104    10.61    17.25 MB   31.10
3  http://www.annauniv.edu/coe/schedule.htm   457   11.41   59        6.31        19     1.94     2.25 MB    4.05
4  http://www.annauniv.edu/coe/circular.html  197   4.92    19        2.03        18     1.84     1.11 MB    2.00
Referring Sites
#  Referring Site                      Hits  Hits %  Visitors  Visitors %  Pages  Pages %  Bandwidth  Bandwidth %
1  http://www.annauniv.edu/            3311  82.65   195       38.09       548    76.97    33.38 MB   60.19
2  No Referrer                         553   13.80   249       48.63       104    14.61    17.25 MB   31.10
3  http://collinfo.annauniv.edu:6060/  68    1.70    19        3.71        8      1.12     196.97 KB  0.35
4  http://www.google.co.in/            25    0.62    21        4.10        17     2.39     1.98 MB    3.57
5  http://www.google.com/              13    0.32    8         1.56        6      0.84     663.29 KB  1.17
Keywords
#  Keyword          SE Page  Hits  Hits %  Visitors  Visitors %  Pages  Pages %  Bandwidth  Bandwidth %
1  anna university  1        11    28.95   7         22.58       3      13.04    153.23 KB  34.60
2  annauniversity   1        5     13.16   3         9.68        1      4.35     42.43 KB   9.58
Error Stats
Errors
#  Error                                     Referrer                                        Hits  Hits %
1  /coe/TITLEflowers.gif                     http://www.annauniv.edu/coe/schedule.htm        97    22.77
2  /favicon.ico                              No Referrer                                     87    20.42
3  /coe/fd_1.jpg                             http://www.annauniv.edu/coe/top.htm             35    8.22
4  /campustour/images/leftboxcorner_top.gif  http://www.annauniv.edu/campustour/index.htm    27    6.34
5  /academic/                                No Referrer                                     15    3.52
Sample Code
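#include <assert.h>
#include <glob.h>    /* glob(), globfree(), glob_t */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>  /* getopt(), optarg */

/* In non-debug builds PRIVATE makes a function file-local (static). */
#ifndef _DEBUG
#define PRIVATE static
#else
#define PRIVATE
#endif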

PRIVATE
void
filter_file(FILE* log_file)
{
    struct log_file_entry* entry;
    char* line = NULL;
    size_t length = INITIAL_BUFFER_LEN;
    /* ... */
}

PRIVATE
void
free_file_specs(void)
{
    int counter = 0;
    /* ... */
}

/* Expand every file spec with glob() and filter the matching files. */
PRIVATE
void
filter_file_specs(void)
{
    int counter = 0;
    int flags = 0;
    int status;
    glob_t glob_buf;

    assert(file_specs[0] != NULL);

    /* The loop header, the glob() call, and the switch header are missing
       from the listing; the versions below are an assumption inferred from
       the surrounding code. */
    while (file_specs[counter] != NULL)
    {
        status = glob(file_specs[counter], flags, NULL, &glob_buf);
        switch (status)
        {
        case GLOB_NOSPACE:
            // Out of memory error
            exit_with_diagnostic("Ran out of memory whilst globbing...\n");
            break;
        case GLOB_NOMATCH:
            // The pattern didn't match any files
            exit_with_diagnostic("No files match file spec\n");
            break;
        default:
            // Everything went ok, just carry on...
            break;
        }
        /* Append later matches to the results already in glob_buf. */
        flags |= GLOB_APPEND;
        counter++;
    }

    assert(glob_buf.gl_pathc > 0);
    filter_files(&glob_buf);
    globfree(&glob_buf);
}

PRIVATE
void
filter_files(glob_t* glob)
{
    int i;
    FILE* log_file;
    /* ... */
}

PRIVATE
void
usage(void)
{
    exit_with_diagnostic(
        "usage: " PACKAGE_NAME " [-hiTv] [-b browser] [-c client] [-f filter(s)]\n"
        " [-I identity] [-m method] [-p protocol] [-r referer] [-s status]\n"
        " [-u uri] [-U user] [-z size] logfile [logfile...]\n"
        "\n"
        " -b browser filter for user agent (browser) string\n"
        " -c client filter for client address\n"
        /* ... */
    );
}

PRIVATE
void
parse_command_line(int argc, char** argv)
{
    int choice;

    if (argc <= 1)
    {
        usage();
    }

    memset(file_specs, 0, MAX_FILE_SPECS * sizeof(char*));

    while (((choice = getopt(argc, argv, "b:c:hiTI:m:p:r:s:tu:U:vz:")) != -1))
    {
        switch (choice)
        {
        case 'b':
            save_ua_filter(&log_filter, optarg);
            break;
        case 'c':
            save_client_filter(&log_filter, optarg);
            break;
        case 'h':
            usage();
            break;
        case 'i':
            // Perform case insensitive matches
            case_sensitive = 0;
            break;
        case 'I':
            save_identity_filter(&log_filter, optarg);
            break;
        case 'm':
            save_method_filter(&log_filter, optarg);
            break;
        case 'p':
            save_protocol_filter(&log_filter, optarg);
            break;
        case 'r':
            save_referer_filter(&log_filter, optarg);
            break;
        case 's':
            save_status_filter(&log_filter, optarg);
            break;
        case 'T':
            execute_all_tests();
            break;
        case 'u':
            save_uri_filter(&log_filter, optarg);
            break;
        case 'U':
            save_user_id_filter(&log_filter, optarg);
            break;
        case 'v':
            print_version();
            break;
        case 'z':
            save_size_filter(&log_filter, optarg);
            break;
        default:
            usage();
            exit_with_diagnostic("\nUnknown command line option");
            break;
        }
    }

    // The remaining arguments are the log file specs
    read_file_specs_from_cl(argc, argv);
}

PRIVATE
void
read_file_specs_from_cl(int argc, char* argv[])
{
    int cl_counter;
    int file_spec_counter = 0;
    char* file_spec;

    assert(file_specs[0] == NULL);
    /* ... */
}

PRIVATE
void
print_version(void)
{
    printf("%s version %s\n", PACKAGE_NAME, VERSION);
    exit(EXIT_SUCCESS);
}

/* Run every test suite; returns 0 on success or a failure message. */
PRIVATE
char*
all_tests(void)
{
    mu_run_test(entry_all_tests);
    mu_run_test(filter_all_tests);
    return 0;
}

PRIVATE
void
execute_all_tests(void)
{
    int exit_code = EXIT_SUCCESS;
    char *result;

    result = all_tests();
    if (result != 0)
    {
        printf("%s\n", result);
        exit_code = EXIT_FAILURE;
    }
    else
    {
        printf("ALL TESTS PASSED\n");
    }
    printf("Tests run: %d\n", tests_run);
    exit(exit_code);
}