
INDEX

1. Problem Definition
   1.1 Project Overview
   1.2 Project Deliverables
2. System Architecture
   2.1 Page Rank Algorithm
   2.2 Simplified Algorithm
   2.3 How Page Rank Works
   2.4 How Page Rank Is Calculated
   2.5 Different Criteria Used in Page Ranking
   2.6 Keyword Relevance
   2.7 Database Connector
3. Project Organization
   3.1 Software Process Model
   3.2 Roles and Responsibilities
   3.3 Tools and Techniques
   3.4 Brief Description of Components Used
4. Project Management Plan
   4.1 Tasks
   4.2 Project Plan
   4.3 Timeline Chart
5. Software Requirements Specification (SRS)
   5.1 Hardware Requirements
   5.2 Software Requirements
   5.3 User Documentation
   5.4 System Features
   5.5 User Interfaces
   5.6 Hardware Interfaces
   5.7 Software Interfaces
   5.8 Software System Attributes
   5.9 Communication Protocols
   5.10 Software Product Features
        5.10.1 Reliability
        5.10.2 Availability
        5.10.3 Security
        5.10.4 Maintainability
        5.10.5 Portability
        5.10.6 Database Requirements
6. Software Design Description
   6.1 System Architectural Design
   6.2 System Interface Description
   6.3 Detailed Description of Components
   6.4 UML Diagrams
       6.4.1 Use Case Diagram
       6.4.2 Class Diagram
       6.4.3 Activity Diagram
       6.4.4 Sequence Diagram
       6.4.5 Communication Diagram
       6.4.6 Component Diagram
       6.4.7 Deployment Diagram
   6.5 Implementation Details
       6.5.1 Page Rank Algorithm
       6.5.2 Keyword Relevance Algorithm
       6.5.3 Database Connector
7. System Test Cases and Test Results
   7.1 Introduction
   7.2 Test Cases & Results
8. Further Work
9. Screenshots
10. References

Abstract

Problem statement: Develop a framework (Rules Engine) for popularity based ranking
algorithms.
Platform: Visual Studio 2003, Microsoft .NET Framework
Detailed information:
What is a page rank?
Page Rank is a numeric value that represents how important a page is on the web. When one
page links to another page, it is effectively casting a vote for the other page. The more votes that
are cast for a page, the more important the page must be. Also, the importance of the page that is
casting the vote determines how important the vote itself is.
How is the page rank calculated?
To calculate the Page Rank for a page, all of its inbound links are taken into account. These are
links from within the site and links from outside the site.
PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))
How to make use of the page ranking algorithm:
Every search engine has its own algorithm for ranking pages, used to retrieve the pages
returned to the user in response to the query given at search time. Using a good searching
strategy makes the search much faster and more efficient. This ranking algorithm will provide
the engine with much more refined searching strategies.
Application of project: To develop a Rules engine that will accept the user input and the search
criteria as specified by the user and give proper results. This Rules Engine will accept different
criteria along with the algorithm and will generate the most popular result on the basis of criteria
defined. This engine will even perform processing of the rules. Processing will include indexing,
stemming and stop word removal depending on the parameter passed by the user. The criteria
specified will be used by the Rules Engine and accordingly the most popular results will be given
to the user.

PROBLEM STATEMENT

To implement a Rules Engine for Popularity-Based Ranking

CHAPTER 1
PROJECT OVERVIEW
Functional description: The Rules Engine is a user application developed in order to
rank any kind of data the user wants. It is a framework with two basic elements: the first is
a connector, the second is the rules. The model can accommodate any type of connector; the
basic aim of a connector is to fetch the kind of data for which it is specialized. The user can
then plug in any type of connector and use any ranking algorithm to rank the data. For example,
a user can use a web crawler as a connector and rank the web pages either using a page ranking
algorithm or using a keyword relevance algorithm.
For example, if we want to find one of the best communities on Orkut, that can be done
by generating a connector which finds various communities. We can then use certain
criteria to decide how to judge the best community; one such criterion is that the community
with the maximum number of members is ranked higher. Along the same lines, we develop the
ranking algorithm using these criteria, and finally we get the best community.

The major area of work is in the fields of information retrieval, text processing and ranking of data.

PROJECT DELIVERABLES

Sr no   Date          Deliverable
1       27 Aug 2008   Page Ranking
2       29 Sep 2008   Web crawler
3       05 Oct 2008   Keyword Relevance
4       01 Jan 2009   Merging of PR, KR, Crawler
5       15 Jan 2009   COM DLL
6       20 Jan 2009   Testing the system
7       21 Feb 2009   Database connector
8       25 Feb 2009   Testing the database connector
9       05 Mar 2009   Delivering entire system

CHAPTER 2

SYSTEM ARCHITECTURE
The Page Ranking Algorithm [23]:
This algorithm is used by all the search engines. It is a method to rank web pages by giving
each a numeric value that represents its importance. Based on the link structure of the
web, a page X has a high rank if:
- it has many in-links, or few but highly ranked ones;
- it has few out-links.
Basic idea: a page's rank is determined by the number of links to the page (also known as
citations). If the citing page is more important (has a high page rank / is an authority page),
then the pages it cites are more important. If the citing page has many links, then each cited
page is less important (normalize for the number of links on the citing page). If PR(P) is the
page rank of page P; T1, ..., Tn are the pages that cite P; C(P) is the number of links from
page P; and d is a decay factor, e.g., 0.85, then:
PR(P) = (1-d) + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Page Rank is a probability distribution used to represent the likelihood that a person
randomly clicking on links will arrive at any particular page. Page Rank can be calculated
for any-size collection of documents. It is assumed in several research papers that the
distribution is evenly divided between all documents in the collection at the beginning of
the computational process. The Page Rank computations require several passes, called
"iterations", through the collection to adjust approximate Page Rank values to more
closely reflect the theoretical true value.
A probability is expressed as a numeric value between 0 and 1. A 0.5 probability is
commonly expressed as a "50% chance" of something happening. Hence, a Page Rank of
0.5 means there is a 50% chance that a person clicking on a random link will be directed
to the document with the 0.5 Page Rank.
Simplified algorithm:


How Page Rank Works:


Assume a small universe of four web pages: A, B, C and D. The initial approximation of
Page Rank would be evenly divided between these four documents. Hence, each
document would begin with an estimated Page Rank of 0.25.
In the original form of Page Rank initial values were simply 1. This meant that the sum of
all pages was the total number of pages on the web. Later versions of Page Rank would
assume a probability distribution between 0 and 1. Here we're going to simply use a
probability distribution hence the initial value of 0.25.
If pages B, C, and D each link only to A, they would each confer 0.25 Page Rank to A.
All Page Rank, i.e. PR( ), in this simplistic system would thus gather to A, because all
links would be pointing to A. That is:
PR(A) = PR(B) + PR(C) + PR(D) = 0.25 + 0.25 + 0.25 = 0.75
Again, suppose page B also has a link to page C, and page D has links to all three pages.
The value of the link-votes is divided among all the outbound links on a page. Thus, page
B gives a vote worth 0.125 to page A and a vote worth 0.125 to page C. Only one third of
D's Page Rank is counted for A's Page Rank (approximately 0.083). That is:
PR(A) = PR(B)/2 + PR(C)/1 + PR(D)/3

In other words, the Page Rank conferred by an outbound link is equal to the
document's own Page Rank score divided by the normalized number L( ) of outbound links (it
is assumed that links to specific URLs only count once per document).

In the general case, the Page Rank value for any page u can be expressed as:
PR(u) = Σ (v ∈ Bu) PR(v) / L(v)
i.e. the Page Rank value for a page u depends on the Page Rank value of each page
v in the set Bu (this set contains all pages linking to page u), divided by the number
L(v) of links from page v.
How is Page Rank calculated?
To calculate the Page Rank for a page, all of its inbound links are taken into account.
These are links from within the site and links from outside the site.
PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))
That is the equation that calculates a page's Page Rank.
In the equation, t1 ... tn are the pages linking to page A, C is the number of outbound links
that a page has, and d is a damping factor, usually set to 0.85; (1-d) is called the
normalization factor.
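As a quick worked example (figures chosen for illustration only, not data from this project):
suppose page A has two inbound links, one from t1 with Page Rank 1 and 2 outbound links, and
one from t2 with Page Rank 0.5 and 1 outbound link. Then, with d = 0.85:
PR(A) = (1 - 0.85) + 0.85 × (1/2 + 0.5/1) = 0.15 + 0.85 = 1.0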
Different criteria used in page ranking algorithm
Inbound links:
Inbound links (links into the site from the outside) are one way to increase a site's total
Page Rank. The other is to add more pages. The linking page's Page Rank is important,
but so is the number of links going from that page. Once the Page Rank is injected into
your site, the calculations are done again and each page's Page Rank is changed.
Depending on the internal link structure, some pages' Page Rank is increased, some are
unchanged but no pages lose any Page Rank.
It is beneficial to have the inbound links coming to the pages to which you are channeling
your Page Rank. A Page Rank injection to any other page will be spread around the site
through the internal links. The important pages will receive an increase, but not as much
of an increase as when they are linked to directly. The page that receives the inbound link
makes the biggest gain.


It is easy to think of our site as being a small, self-contained network of pages. When we
do the Page Rank calculations we are dealing with our small network. If we make a link
to another site, we lose some of our network's Page Rank, and if we receive a link, our
network's Page Rank is added to. But it isn't like that. For the Page Rank calculations,
there is only one network - every page that Google has in its index. Each iteration of the
calculation is done on the entire network and not on individual websites.
Outbound links:
Outbound links are a drain on a site's total Page Rank. They leak Page Rank. To counter
the drain, try to ensure that the links are reciprocated. Because of the Page Rank of the
pages at each end of an external link, and the number of links out from those pages,
reciprocal links can gain or lose Page Rank. We need to take care when choosing where
to exchange links.
When Page Rank leaks from a site via a link to another site, all the pages in the internal
link structure are affected. The page that you link out from makes a difference to which
pages suffer the most loss. Without a program to perform the calculations on specific link
structures, it is difficult to decide on the right page to link out from, but the generalization
is to link from the one with the lowest Page Rank.
Many websites need to contain some outbound links that have nothing to do with Page
Rank. Unfortunately, all 'normal' outbound links leak Page Rank. But there are 'abnormal'
ways of linking to other sites that don't result in leaks. Page Rank is leaked when Google
recognizes a link to another site. The answer is to use links that Google doesn't recognize
or count. These include form actions and links contained in JavaScript code.

Damping factor:
The Page Rank theory holds that even an imaginary surfer who is randomly clicking on
links will eventually stop clicking. The probability, at any step, that the person will
continue is a damping factor d. Various studies have tested different damping factors, but
it is generally assumed that the damping factor will be set around 0.85.
The damping factor is subtracted from 1 (and in some variations of the algorithm, the
result is divided by the number of documents in the collection) and this term is then
added to the product of the damping factor and the sum of the incoming Page Rank
scores.
That is,
PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))
or, with the result divided by the number of documents N in the collection:
PR(A) = (1-d)/N + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))


So any page's Page Rank is derived in large part from the Page Ranks of other pages. The
damping factor adjusts the derived value downward. Google recalculates Page Rank
scores each time it crawls the Web and rebuilds its index. As Google increases the
number of documents in its collection, the initial approximation of Page Rank decreases
for all documents.
The formula uses a model of a random surfer who gets bored after several clicks and
switches to a random page. The Page Rank value of a page reflects the chance that the
random surfer will land on that page by clicking on a link. If a page has no links to other
pages, it becomes a sink and therefore terminates the random surfing process. However,
the solution is quite simple. If the random surfer arrives at a sink page, it picks another
URL at random and continues surfing again.
When calculating Page Rank, pages with no outbound links are assumed to link out to all
other pages in the collection. Their Page Rank scores are therefore divided evenly among
all other pages. In other words, to be fair with pages that are not sinks, these random
transitions are added to all nodes in the Web, with a residual probability of usually d =
0.85, estimated from the frequency that an average surfer uses his or her browser's
bookmark feature.
So, the equation is as follows:
PR(pi) = (1-d)/N + d Σ (pj ∈ M(pi)) PR(pj)/L(pj)
where p1, p2, ..., pN are the pages under consideration, M(pi) is the set of pages that link to
pi, L(pj) is the number of outbound links on page pj, and N is the total number of pages.
Keyword relevance algorithm:
In the keyword relevance algorithm, the page with the maximum count of the keyword is
ranked highest.
Two terms are used in the keyword relevance algorithm: total count and total keyword
occurrence. Total count stands for the total number of keywords that occur in a web page;
for example, if a page has the keywords sun, moon, earth, moon, then the total count
is 4. If a page has 5 keywords, say sun, moon, earth, sun, sun, then the total keyword
occurrence is 3, since sun has occurred thrice, while sun itself increments the keyword
count only once.
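A minimal sketch of these two counters follows (one possible reading of the definitions
above; the containers and variable names are ours, not taken from the project's code):

#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;

int main()
{
    // Keywords found on one page (the second example from the text above)
    vector<string> found;
    found.push_back("sun"); found.push_back("moon"); found.push_back("earth");
    found.push_back("sun"); found.push_back("sun");

    map<string, int> perKeyword;          // occurrences of each distinct keyword
    for (size_t i = 0; i < found.size(); i++)
        perKeyword[found[i]]++;

    int totalCount = (int)found.size();   // every occurrence counted: 5 here
    int keywordOccurrence = 0;            // highest count of a single keyword
    for (map<string, int>::iterator it = perKeyword.begin();
         it != perKeyword.end(); ++it)
        if (it->second > keywordOccurrence)
            keywordOccurrence = it->second;   // "sun" occurs thrice -> 3

    cout << "Total count: " << totalCount
         << ", total keyword occurrence: " << keywordOccurrence << endl;
    return 0;
}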
Database connector:
The database connector used in the project is used for populating a set of records. The
connector is used to insert specific records entered by the user. Once the user has entered all
the records, he has to recommend some things, like places, foods or restaurants. The
recommendations of all the users will be saved, which in turn will be used for the
endorsement of a specific thing. This application can be used to create brand awareness
on a social networking site. This application can even be further integrated with any of the
ranking algorithms, and the ranking algorithm could be used to rank the data.

CHAPTER 3
PROJECT ORGANISATION


Software Process Model [22]:


Software process models often represent a networked sequence of activities, objects,
transformations, and events that embody strategies for accomplishing software evolution. Such
models can be used to develop more precise and formalized descriptions of software life cycle
activities. Their power emerges from their utilization of a sufficiently rich notation, syntax, or
semantics, often suitable for computational processing. Software process networks can be
viewed as representing multiple interconnected task chains.
Task chains represent a non-linear sequence of actions that structure and transform available
computational objects (resources) into intermediate or finished products. Non-linearity implies
that the sequence of actions may be non-deterministic, iterative, accommodate multiple/parallel
alternatives, as well as partially ordered to account for incremental progress. Task actions in turn
can be viewed as a non-linear sequence of primitive actions which denote atomic units of
computing work, such as a user's selection of a command or menu entry using a mouse or
keyboard.
Task chains can be employed to characterize either prescriptive or descriptive action sequences.
Prescriptive task chains are idealized plans of what actions should be accomplished, and in what
order. For example, a task chain for the activity of object-oriented software design might include
the following task actions:
_ Develop an informal narrative specification of the system.
_ Identify the objects and their attributes.
_ Identify the operations on the objects.
_ Identify the interfaces between objects, attributes, or operations.
_ Implement the operations.
There was a compelling need to provide a limited set of software functionality to users and
then refine and expand on that functionality in later software releases. In such a case, a process
model that is designed to produce software in increments is chosen, i.e. the incremental model is
chosen.
The Incremental Model combines elements of the waterfall model applied in an
iterative fashion. The incremental model applies linear sequences in a staggered fashion as
calendar time progresses. Each linear sequence produces deliverable increments of the
software.


When an incremental model is used, the first increment is often a core product. That is,
basic requirements are addressed, but many supplementary features remain undelivered. The
core product is used by the customer. As a result of use and/or evaluation, a plan is developed
for the next increment. The plan addresses the modification of the core product to better meet
the needs of the customer and the delivery of additional features and functionality. This
process is repeated following the delivery of each increment, until the complete product is
produced.
The incremental model is iterative in nature. It focuses on the delivery of an operational
product with each increment. Early increments are stripped down versions of the final
product, but they do provide capability that serves the user and also provides a platform for
evaluation by the user. In addition, increments can be planned to manage technical risks.


In our software the incremental model was implemented as:

Increment #1: Implementation of page ranking algorithm
Increment #2: Implementation of page ranking algorithm with different damping factors
Increment #3: Implementation of crawler; implementation of HTML parser
Increment #4: Implementation of hash table; implementation of stop word removal code
Increment #5: Implementation of indexing technique
Increment #6: Implementation of keyword relevance algorithm
Increment #7: Implementation of user interface; implementation of COM DLL
Increment #8: Implementation of database connector
Increment #9: Merging of all the modules

Roles and Responsibilities:

1. Roles:
- Understanding the requirements, purpose, goals and the scale of the project
- Finalizing the project problem definition
- Market study of various available open source search engines
- Studying various ranking algorithms
- Implementing different ranking algorithms
- Studying and understanding the various criteria used for ranking
- Understanding the web crawler
- Implementation of our own crawler
- Finalizing the indexing technique
- Implementing the database connector
- Designing the user interface part of the project
- Integrating and testing all modules
- Demonstrating and making suggested modifications
- Preparing a final report and a presentation

2. Project Team Members & their Assignments:

a. Dnyaneshwari Chandarana
- Studying different ranking algorithms
- Studying the web crawler
- Understanding the COM DLL
- Understanding the indexing technique
- Preparing the detailed synopsis and presentation
- Implementing the following features: page ranking algorithms, keyword relevance
  algorithm, COM DLL
- Integrating all the modules together
- Performing authentication & registration module testing

b. Nitu Singh
- Finalizing the ranking algorithms to be implemented
- Studying the various criteria used in present ranking algorithms
- Understanding the need for the database connector
- Preparing the detailed synopsis and presentation
- Developing the user interface for the software
- Implementing the following features: database connector, user interface
- Generating the test plan and identifying test cases
- Generating the test summary
- Tracking & monitoring test efforts
- Managing required software

Tools and Techniques:

The following tools were used at the successive stages of the project:

1. At Stage 1 we gathered requirements from the client and formulated the requirement
   analysis in Microsoft Word 2003.
2. At Stage 2 we prepared a detailed project plan. Tool used: Microsoft Project Planner.
3. At Stage 3 the entire designing of the project was done.
4. Database design. Tool used: MS Access.
5. Programming languages: C# and C++. Development methodology: a name table, likes table,
   place table and recommendation table were created using MS Access.
6. And finally, at Stage 5, the development of the code was done.

Brief description of various tools used:

1. Visual Studio 2003 .NET [21]: Microsoft Visual Studio is an integrated development
environment (IDE) from Microsoft. It can be used to develop console and graphical user
interface applications along with Windows Forms applications, web sites, web
applications, and web services, in both native code and managed code, for all platforms
supported by Microsoft Windows, Windows Mobile, Windows CE, the .NET Framework,
the .NET Compact Framework and Microsoft Silverlight.
Visual Studio includes a code editor supporting IntelliSense as well as code refactoring.
The integrated debugger works both as a source-level debugger and a machine-level
debugger. Other built-in tools include a forms designer for building GUI applications,
a web designer, a class designer, and a database schema designer. It allows plug-ins to be
added that enhance the functionality at almost every level - including adding support
for source control systems (like Subversion and Visual SourceSafe) and adding new
toolsets like editors and visual designers for domain-specific languages, or toolsets for
other aspects of the development cycle. Visual Studio supports languages by means of
language services, which allow any programming language to be supported (to varying
degrees) by the code editor and debugger, provided a language-specific service has been
authored. Built-in languages include C/C++ (via Visual C++), VB.NET (via Visual Basic
.NET), and C# (via Visual C#). Support for other languages such as Chrome, F#, Python,
and Ruby among others has been made available via language services which are to be
installed separately. It also supports XML/XSLT, HTML/XHTML, JavaScript and CSS.

2. Microsoft .NET Framework [20]: The Microsoft .NET Framework is a software
framework available with several Microsoft Windows operating systems. It includes a
large library of coded solutions to common programming problems and a virtual
machine that manages the execution of programs written specifically for the framework.
The .NET Framework is a key Microsoft offering and is intended to be used by most new
applications created for the Windows platform.
The framework's Base Class Library provides a large range of features including user
interface, data and data access, database connectivity, cryptography, application
development, numeric algorithms, and network communications. The class library is used
by programmers, who combine it with their own code to produce applications.
Programs written for the .NET Framework execute in a software environment that
manages the program's runtime requirements. Also part of the .NET Framework, this
runtime environment is known as the Common Language Runtime (CLR). The CLR
provides the appearance of an application virtual machine so that programmers need not
consider the capabilities of the specific CPU that will execute the program. The CLR also
provides other important services such as security, memory management, and exception
handling. The class library and the CLR together compose the .NET Framework.
Version 3.0 of the .NET Framework is included with Windows Server 2008 and Windows
Vista. The current version of the framework can also be installed on Windows XP and
the Windows Server 2003 family of operating systems.
3. Rational Rose Software: An object-oriented analysis and design tool that runs on
Windows and UNIX platforms from IBM. It supports the Unified Modeling Language
(UML) as well as the earlier Booch and OMT notations.
4. WinRunner [19]: It is an automated functional GUI testing tool that allows a user to
record and play back UI interactions as test scripts.
As a functional test suite, it works together with HP QuickTest Professional and
supports enterprise quality assurance. HP WinRunner's intuitive recording process helps
you produce robust functional tests. To create a test, HP WinRunner simply records a
typical business process by emulating user actions, such as ordering an item or opening a
vendor account. During recording, you can directly edit generated scripts to meet the
most complex test requirements. Next, testers can add checkpoints, which compare
expected and actual outcomes from the test run. HP WinRunner offers a variety of
checkpoints, including text, GUI, bitmap and web links. HP WinRunner can also verify
database values to determine transaction accuracy and database integrity, highlighting
records that have been updated, modified, deleted and inserted. With a few mouse clicks,
the Data Driver Wizard feature lets you convert a recorded business process into a
data-driven test that reflects the real-life actions of multiple users. For further test
enhancement, the Function Generator feature presents a quick and reliable way to
program tests, while the Virtual Object Wizard feature lets you teach HP WinRunner to
recognize, record and replay any unknown or custom object. As HP WinRunner executes
tests, it operates the application automatically, as though a real user were performing each
step in the business process. If test execution occurs after hours or in the absence of a
quality assurance (QA) engineer, the Recovery Manager and Exception Handling
mechanisms automatically troubleshoot unexpected events, errors and application crashes
so that tests can complete smoothly. Once tests are run, HP WinRunner's interactive
reporting tools help your team interpret results by providing detailed, easy-to-read reports
that list errors and their origins. HP WinRunner lets your organization build
reusable tests to repeat throughout an application's lifecycle. Thus, if developers modify
an application over time, testers do not need to modify multiple tests. Instead, they can
apply changes to the Graphical User Interface (GUI) Map, a central repository of
test-related information, and HP WinRunner automatically propagates the changes to all
relevant scripts.

CHAPTER 4
PROJECT MANAGEMENT PLAN


Task:
A task set is a collection of software engineering work tasks, deliverables and milestones,
resources, dependencies, constraints, risks and contingencies that must be accomplished to
complete a particular project. Our project can be carried out with a structured degree of rigor.
Our project has the following main tasks to be carried out.

Task Name: Page ranking algorithm
Description: This algorithm should rank the pages according to inbound and outbound links.
Resources needed: Microsoft Visual Studio 2003, internet connectivity, crawler
Dependencies & Constraints: LAN/internet connectivity
Risks & Contingencies: failure of server; failure of LAN/internet connectivity

Task Name: Keyword relevance algorithm
Resources needed: Microsoft Visual Studio 2003, internet connectivity, crawler
Dependencies & Constraints: LAN/internet connectivity
Risks & Contingencies: failure of LAN/internet connectivity

Project plan:

1) To implement ranking algorithms.
2) To implement connectors, e.g. web crawler, database connector.
3) To encompass the ranking algorithms in a COM DLL.
4) To create a user interface for plugging in the type of connector and the ranking algorithm
   to be used for ranking the data.

Timeline chart:

No  Task Name                                   Duration  Start         Finish
1   Sponsorship search                          8 days    Mon 02/07/08  Mon 09/07/08
2   Formalities at Ubiqtas                      7 days    Tue 10/07/08  Mon 16/07/08
3   Confirmation letter                         1 day     Tue 17/07/08  Tue 17/07/08
4   Finalizing problem statement                9 days    Wed 18/07/08  Thu 26/07/08
5   Discussion with external guide              8 days    Fri 27/07/08  Fri 03/08/08
6   Making synopsis                             10 days   Sat 04/08/08  Mon 13/08/08
7   Submission of final synopsis                1 day     Tue 14/08/08  Tue 14/08/08
8   Confirmation of problem statement           1 day     Thu 16/08/08  Thu 16/08/08
    from college
9   Information gathering                       15 days   Fri 17/08/08  Fri 31/08/08
10  Discussion with internal guide              9 days    Sat 01/09/08  Sun 09/09/08
11  Preparation of presentation                 8 days    Mon 10/09/08  Mon 17/09/08
12  Delivery of seminar                         1 day     Tue 18/09/08  Tue 18/09/08
13  Literature survey                           20 days   Wed 19/09/08  Mon 18/10/08
14  Requirement specification                   26 days   Tue 09/10/08  Sat 03/11/08
15  Initial design                              25 days   Sun 04/11/08  Wed 28/11/08
16  Verify design                               9 days    Tue 01/01/09  Wed 09/01/09
17  Implementation and coding                   60 days   Thu 10/01/09  Mon 09/03/09
18  GUI                                         15 days   Mon 10/03/09  Mon 24/03/09
19  Testing and debugging                       28 days   Tue 25/03/09  Tue 29/04/09
20  Documentation and preparation of report     10 days   Wed 30/04/09  10/05/09

CHAPTER 5
SOFTWARE REQUIREMENTS SPECIFICATION

Hardware Requirements:

- 766 MHz Pentium or compatible processor (1.5 GHz Pentium recommended)
- 512 MB RAM (more recommended)
- Video monitor (800 x 600 or higher resolution) with at least 256 colors
  (1024 x 768 High Color 16-bit recommended)
Software Requirements:

- Front end: VC++ (2003), C#.NET
- Operating system: Windows XP
- Database: MS Access

User Documentation: The user guide or manual should be small and contain all the
information in a format the user can understand. The user manual should also provide
pictures or diagrams to properly guide the user.
System features:
1. Helps in ranking web pages according to their popularity.
2. Helps in ranking web pages according to their relevancy.
3. Provides an interface/tool to create awareness on social networking sites.

User Interfaces:
We designed a simple user interface using the Microsoft Visual Studio 2003
development tool and C# as the programming language. Our user interface is similar to
most of the standard search engines, and contains buttons for performing the basic
functions as specified in the user requirements.
Most of the error messages will pop up in a dialog box.

Hardware Interfaces:
A computer with a minimum of 512 MB of RAM and internet connectivity is required.


Software Interfaces:
The Rules Engine will run only if the server (in our case the Authenticator) is running on
the server machine. The server includes the MS Access database.
The COM DLL is needed to load the project at run time.

Software System Attributes:

- The page ranking algorithm computes the rank of pages from a specified set of
  pages and displays the most highly ranked pages accordingly.
- The keyword relevance algorithm gives results according to the maximum
  frequency count of words on a particular page. The page with the highest
  frequency count will be ranked at the top.
- The database connector is used to insert particular information from the user, such
  as his/her name and likes, as well as the recommendations made.

Communication Protocols:
The communication protocol used in our system is FTP.

File Transfer Protocol (FTP) is a network protocol used to exchange and manipulate files over
a TCP computer network, such as the internet. An FTP client may connect to an FTP server to
manipulate files on that server.
FTP runs over TCP.[1] It defaults to listening on port 21 for incoming connections from FTP
clients. A connection to this port from the FTP client forms the control stream, on which
commands are passed from the FTP client to the FTP server and on occasion from the FTP
server to the FTP client. FTP uses out-of-band control, which means it uses a separate
connection for control and data. Thus, for the actual file transfer to take place, a different
connection is required, which is called the data stream. Depending on the transfer mode, the
process of setting up the data stream differs. Port 21 is used for control and port 20 for data.
In active mode, the FTP client opens a dynamic port, sends the FTP server the dynamic port
number on which it is listening over the control stream and waits for a connection from the FTP
server. When the FTP server initiates the data connection to the FTP client it binds the source
port to port 20 on the FTP server.
The objectives of FTP are:
1. To promote sharing of files (computer programs and/or data).
2. To encourage indirect or implicit use of remote computers.
3. To shield a user from variations in file storage systems among different hosts.
4. To transfer data reliably, and efficiently.
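As an illustration (an invented but representative exchange, not a capture from this system),
a typical control-stream dialog for an active-mode download looks like:

USER anonymous
331 User name okay, need password.
PASS guest
230 User logged in.
PORT 192,168,0,5,19,137
200 PORT command successful.
RETR results.txt
150 Opening data connection.
226 Transfer complete.

The client's PORT command tells the server which address and port to connect back to for
the data stream; the file bytes themselves never travel over the control connection.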

Reliability:
The reliability of the overall program depends on the reliability of its separate
components.

Availability: The system can be made available if you have the specific kind of connector
for the specific kind of data one wants to search. Internet availability is a must.

Security:


Passwords will be saved in the database in order to ensure the user's privacy.

Maintainability:
The maintainability of the project has been addressed by assigning appropriate variable
names, following appropriate naming convention for functions and appropriate coding
standards. The segregation of code makes it easy to understand, maintain and modify.

Portability:
The application is Windows XP-based and should be compatible with other systems. The
end-user part is fully portable, and any system with any operating system should be able
to use the features of the application.

Database Requirements:
A database is maintained in MS Access to keep a list of all users, their likes and the
recommendations made by them. The following table is maintained in the database:

FIELD NAME        DATATYPE   VALIDATION
Name              TEXT       NOT NULL, PRIMARY KEY
Likes             TEXT       NOT NULL
Recommendation    TEXT       NOT NULL

CHAPTER 6
SOFTWARE DESIGN DESCRIPTION


The Rules Engine will perform processing of the rules. Processing includes different functions
like indexing, stemming and stop word removal, depending on the parameters passed by the user.
Algorithm: We will develop an algorithm which takes these parameters as input and
generates the most popular result on the basis of the criteria defined.
This algorithm will have many criteria defined, which will allow the user to search for
specific information according to his own chosen criteria.
Whenever we search something on a search engine, the results are displayed according to
the popularity of pages, meaning those pages which have a high rank are displayed first.
Instead of this, we can let the user decide the criteria of searching and return the results
according to their own chosen criteria.


System interface design:

Detailed description of components:

1. Dynamic-link library [18]:
A DLL is Microsoft's implementation of the shared library concept in the Microsoft
Windows and OS/2 operating systems. These libraries usually have the file
extension DLL, OCX (for libraries containing ActiveX controls), or DRV. The file formats
for DLLs are the same as for Windows EXE files - that is, Portable Executable (PE) for
32-bit and 64-bit Windows, and New Executable (NE) for 16-bit Windows. As with EXEs,
DLLs can contain code, data and resources, in any combination.
Facilities provided by DLL:


a. Memory management:
In Win32, the DLL files are organized into sections. Each section has its own set of
attributes, such as being writable or read-only, executable (for code) or non-executable
(for data), and so on.
The code in a DLL is usually shared among all the processes that use the DLL; that is,
they occupy a single place in physical memory, and do not take up space in the page file.
If the physical memory occupied by a code section is to be reclaimed, its contents are
discarded, and later reloaded directly from the DLL file as necessary.
In contrast to code sections, the data sections of a DLL are usually private; that is, each
process using the DLL has its own copy of all the DLL's data. Optionally, data sections
can be made shared, allowing inter-process communication via this shared memory area.
However, because user restrictions do not apply to the use of shared DLL memory, this
creates a security hole; namely, one process can corrupt the shared data, which will likely
cause all other sharing processes to behave undesirably. For example, a process running
under a guest account can in this way corrupt another process running under a privileged
account. This is an important reason to avoid the use of shared sections in DLLs.
If a DLL is compressed by certain executable packers (e.g. UPX), all of its code sections
are marked as read-and-write, and will be unshared. Read-and-write code sections, much
like private data sections, are private to each process. Thus DLLs with shared data
sections should not be compressed if they are intended to be used simultaneously by
multiple programs, since each program instance would have to carry its own copy of the
DLL, resulting in increased memory consumption.

b. Import libraries
Linking to dynamic libraries is usually handled by linking to an import library when
building or linking to create an executable file. The created executable then contains an
import address table (IAT) by which all DLL function calls are referenced (each
referenced DLL function contains its own entry in the IAT). At run-time, the IAT is filled
with appropriate addresses that point directly to a function in the separately-loaded DLL.
Like static libraries, import libraries for DLLs are noted by the .lib file extension. For
example, kernel32.dll, the primary dynamic library for Windows' base functions such as
file creation and memory management, is linked via kernel32.lib.


c. Explicit run-time linking


DLL files may be explicitly loaded at run-time, a process referred to simply as run-time
dynamic linking by Microsoft, by using the LoadLibrary or LoadLibraryEx API
function. The GetProcAddress API function is used to lookup exported symbols by
name, and FreeLibrary to unload the DLL. These functions are analogous
to dlopen, dlsym, and dlclose in the POSIX standard API.
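A minimal sketch of explicit run-time linking in C++ (the DLL name RulesEngine.dll and
the export ComputeRank are hypothetical placeholders, not actual artifacts of this project):

#include <windows.h>
#include <cstdio>

// Assumed signature of the exported function we want to call
typedef double (*RankFunc)(int pageId);

int main()
{
    HMODULE lib = LoadLibrary(TEXT("RulesEngine.dll"));  // load at run time
    if (lib == NULL)
    {
        printf("could not load the DLL\n");
        return 1;
    }

    // Look up the exported symbol by name
    RankFunc computeRank = (RankFunc)GetProcAddress(lib, "ComputeRank");
    if (computeRank != NULL)
        printf("rank = %f\n", computeRank(1));

    FreeLibrary(lib);  // unload when done
    return 0;
}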

2. Component Object Model:

The Component Object Model (COM) extends the DLL concept to object-oriented
programming. Objects can be called from another process or hosted on another machine.
COM objects have unique GUIDs and can be used to implement powerful back-ends for
simple GUI front-ends such as Visual Basic and ASP. They can also be programmed from
scripting languages. COM objects are more complex to create and use than DLLs.

3. Library file libcurl-7.19.3-win32-ssl-msvc, HTMLReader_src [17]:


cURL is a command line tool for transferring files with URL syntax. The original author
of cURL is Daniel Stenberg, who started the project in 1997, as a way to transfer files
more programmatically via protocols such as http, ftp, gopher, sftp, ftps, scp, tftp, and
many more (13 total), via a command line interface. The strong point of cURL is the
number of data transfer protocols it supports (listed further down). Distributed under the
MIT License, cURL is free software.

Example
Basic use of curl involves simply typing curl at the command line, followed by the URL
of the output you want to retrieve.
To retrieve the Wikipedia homepage, type:
curl www.wikipedia.org
Curl defaults to displaying the output it retrieves to the standard output specified on the
system, which is usually the terminal window. So running the command above would, on
most systems, display the www.wikipedia.org source code in the terminal window.
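Used as a library rather than a command-line tool, libcurl exposes the same transfer engine
through C calls. A minimal sketch of fetching a page with the easy interface (standard
libcurl API; the URL is just an example):

#include <curl/curl.h>
#include <cstdio>

// libcurl hands the downloaded bytes to this callback in chunks
static size_t onData(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    fwrite(ptr, size, nmemb, (FILE *)userdata);  // write the chunk to stdout
    return size * nmemb;                         // tell libcurl we took it all
}

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (curl)
    {
        curl_easy_setopt(curl, CURLOPT_URL, "http://www.wikipedia.org");
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, onData);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, stdout);
        CURLcode res = curl_easy_perform(curl);  // blocking transfer
        if (res != CURLE_OK)
            fprintf(stderr, "transfer failed: %s\n", curl_easy_strerror(res));
        curl_easy_cleanup(curl);
    }
    curl_global_cleanup();
    return 0;
}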


4. Library file HTMLReader_src is an html parser used to parse HTML web pages [16].
An events-based parser uses the callback mechanism to report parsing events. These
callbacks turn out to be protected virtual member functions that you will override.
Events, such as the detection of an opening tag or the closing tag of an element, will
trigger a call to the corresponding member function of your class. The application
implements and registers an event handler with the reader. It is up to the application to
put some code in the event handlers designed to achieve the objective of the application.
Events-based parsers provide a simple, fast, and a lower-level access to the document
being parsed.
Events-based parsers do not create an in-memory representation of the source document.
They simply parse the document and notify client applications about various elements
they find along the way. What happens next is the responsibility of the client application.
Events-based parsers don't cache information and have an enviably small memory
footprint.
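A minimal sketch of the events-based pattern described above (the class and callback names
here are illustrative only; HTMLReader_src defines its own API):

#include <iostream>
#include <string>
using namespace std;

// The parser walks the input once and fires virtual callbacks as it goes;
// an application subclasses it and overrides the events it cares about.
class TagParser
{
public:
    virtual ~TagParser() {}
    void Parse(const string &html)
    {
        string::size_type pos = 0;
        while ((pos = html.find('<', pos)) != string::npos)
        {
            string::size_type end = html.find('>', pos);
            if (end == string::npos)
                break;
            string tag = html.substr(pos + 1, end - pos - 1);
            if (!tag.empty() && tag[0] == '/')
                OnEndTag(tag.substr(1));  // closing tag, e.g. </a>
            else
                OnStartTag(tag);          // opening tag, e.g. <a href=...>
            pos = end + 1;
        }
    }
protected:
    virtual void OnStartTag(const string &tag) {}
    virtual void OnEndTag(const string &tag) {}
};

// Example handler: reacts only to anchor tags, e.g. to feed URLs to the crawler
class LinkCollector : public TagParser
{
protected:
    virtual void OnStartTag(const string &tag)
    {
        if (!tag.empty() && tag[0] == 'a' && (tag.size() == 1 || tag[1] == ' '))
            cout << "anchor found: <" << tag << ">" << endl;
    }
};

int main()
{
    LinkCollector handler;
    handler.Parse("<html><a href=\"x.html\">link</a></html>");
    return 0;
}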
The page ranking algorithm, the keyword relevance algorithm and the web crawler are
integrated to form a web connector, from which we create a COM DLL; we then import this
DLL into our Windows application. A database connector, which uses MS Access at the back
end, is created for the same Windows application.

5. Web crawler [15]:

A web crawler creates a copy of all the visited pages for later processing by a search engine
that will index the downloaded pages to provide fast searches. Besides checking links or
validating HTML code, it can be used to gather specific types of information from web pages,
such as harvesting e-mail addresses (spam). Web crawling is modeled as a multiple-queue,
single-server polling system, in which the web crawler is the server and the web sites are the
queues. The objective of the crawler is to keep the average freshness of pages in its collection
as high as possible, or to keep the average age of pages as low as possible. To improve
freshness, we should penalize the elements that change too often. A web crawler (also
known as a web spider, web robot, or - especially in the FOAF community - web scutter)
is a program or automated script which browses the World Wide Web in a methodical,
automated manner. Other less frequently used names for web crawlers are ants, automatic
indexers, bots, and worms. This process is called web crawling. Many sites, in particular
search engines, use crawling as a means of providing up-to-date data. Web crawlers are
mainly used to create a copy of all the visited pages for later processing by a search
engine that will index the downloaded pages to provide fast searches. Crawlers can also
be used for automating maintenance tasks on a website, such as checking links or
validating HTML code. Also, crawlers can be used to gather specific types of information
from web pages, such as harvesting e-mail addresses (usually for spam). A web crawler is
one type of bot, or software agent. In general, it starts with a list of URLs to visit, called
the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and
adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier
are recursively visited according to a set of policies.
Algorithm for the web crawler (a runnable sketch follows the list):

1. Get the URL.
2. Dump the URL into a data structure called the queue.
3. Go to that URL, scan the entire page and find out if any links are present; if any
   URLs are present, dump them into a linked list.
4. All the URLs present in the linked list are called child URLs, and the one
   present in the queue is called the parent.
5. Now pick the first child URL from the linked list and dump it into the queue; this
   URL then becomes the parent. Repeat from step 3.
6. Repeat the process for each and every child URL present in the linked list.
7. Keep doing so till the depth mentioned at the start of the code is reached.
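A compact sketch of this breadth-first scheme (extractLinks is stubbed out; in the real
crawler it would download the page, e.g. with libcurl, and pull the hrefs out with the
HTML parser):

#include <iostream>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// Placeholder for "scan the page and collect its links"
vector<string> extractLinks(const string &url)
{
    return vector<string>();
}

void crawl(const string &seed, int maxDepth)
{
    queue<pair<string, int> > frontier;  // parent URLs with their depth
    set<string> visited;                 // never queue the same URL twice

    frontier.push(make_pair(seed, 0));
    visited.insert(seed);

    while (!frontier.empty())
    {
        string url = frontier.front().first;
        int depth  = frontier.front().second;
        frontier.pop();

        cout << "visiting " << url << " (depth " << depth << ")" << endl;
        if (depth == maxDepth)
            continue;                    // stop at the depth given at the start

        // child URLs go behind their parent, so the crawl is breadth-first
        vector<string> children = extractLinks(url);
        for (size_t i = 0; i < children.size(); i++)
            if (visited.insert(children[i]).second)
                frontier.push(make_pair(children[i], depth + 1));
    }
}

int main()
{
    crawl("http://www.example.com", 2);
    return 0;
}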

6. Database Connector:

- To implement this connector, the database used is MS Access.
- This connector is used for populating a set of records. The connector is used to
  insert specific records entered by the user.
- Once the user has entered all the records, he has to recommend some things,
  like places, foods or restaurants.
- The recommendations of all the users will be saved, which in turn will be used for
  the endorsement of a specific thing.
- This application can be used to create brand awareness on a social networking site.
- This application can even be further integrated with any of the ranking
  algorithms, and the ranking algorithm could be used to rank the data.
- When a user recommends a thing or a place, he is casting his vote for it.
- A place/thing with the maximum number of votes will be the most popular
  among all the data.
- When a user fires a query to see endorsements of a specific thing, the one with
  the highest number of votes will be at the top of the list.
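Given the table1 layout used by the connector code later in this report (ID, Person,
Category, Object), this vote counting boils down to a single aggregate query, along the
lines of: SELECT Object, COUNT(*) AS Votes FROM table1 GROUP BY Object ORDER BY
COUNT(*) DESC. The first row returned is the most-recommended place or thing. This query
is a sketch of the idea, not lifted from the project's code.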

UML diagrams:
1. Use case diagram:
2. Class Diagram:
3. Activity Diagram:
4. Sequence Diagram:
5. Communication Diagram:
6. Component Diagram:
7. Deployment Diagram:

Implementation Details:
1. Page Rank Algorithm:
#include <conio.h>
#include <cstdio>
#include <iostream>
using namespace std;

int main()
{
    int i, j, k, m;
    const double d = 0.85;   // damping factor
    const double n = 1 - d;  // normalization factor (1-d)

    // linkMap[j][k] holds the number of links from page j to page k
    float linkMap[10][10] = {
        {1,2,3,0,0,0,1,2,3,1},
        {4,5,6,1,0,0,1,2,3,1},
        {0,2,3,1,1,1,1,2,3,0},
        {1,2,3,0,0,0,1,0,0,0},
        {1,3,3,0,0,0,1,2,1,1},
        {1,1,1,1,1,0,0,0,0,0},
        {0,0,0,0,0,0,1,1,1,1},
        {1,1,1,0,0,0,2,2,2,2},
        {1,1,1,0,0,0,1,1,2,1},
        {1,2,2,0,3,2,1,2,3,1}};

    float pageValue[10] = {0};     // current Page Rank estimate of each page
    int outboundLinks[10] = {0};   // C(P): outbound link count of each page

    cout << "Enter the number of iterations ::";
    cin >> i;

    // Count the outbound links C(P) of every page (self-links ignored)
    for (j = 0; j < 10; j++)
        for (k = 0; k < 10; k++)
            if (j != k)
                outboundLinks[j] += (int)linkMap[j][k];

    for (m = 0; m < 10; m++)
        printf("outboundlink of page[%d]::%d \n", m, outboundLinks[m]);

    // Iteratively apply PR(P) = (1-d) + d * (sum of PR(T)/C(T) over all
    // pages T that link to P), for the requested number of iterations
    while (i)
    {
        for (j = 0; j < 10; j++)
        {
            float sum = 0;
            for (k = 0; k < 10; k++)
            {
                // page k links to page j, so it contributes PR(k)/C(k)
                if (linkMap[k][j] != 0 && j != k)
                    sum += pageValue[k] / outboundLinks[k];
            }
            pageValue[j] = (float)(n + d * sum);
            printf("PageValue[%d] = %f\n", j + 1, pageValue[j]);
        }
        printf("-------------------------------\n");
        i--;
    }
    getch();
    return 0;
}
2. Keyword Relevance Algorithm

// Sort the result list in descending order of relevance: the primary key is
// keywordOccurance, the secondary key is TotalCount. The payload of the two
// nodes is swapped in place (a simple exchange sort over the linked list).
for (temp1 = Head[0]; temp1 != NULL; temp1 = temp1->next)
{
    for (temp2 = temp1->next; temp2 != NULL; temp2 = temp2->next)
    {
        if ((temp1->keywordOccurance < temp2->keywordOccurance) ||
            ((temp1->keywordOccurance == temp2->keywordOccurance) &&
             (temp1->TotalCount < temp2->TotalCount)))
        {
            // swap the page pointers
            ptr = temp1->nodePtr;
            temp1->nodePtr = temp2->nodePtr;
            temp2->nodePtr = ptr;
            // swap the keyword occurrence counts
            i = temp1->keywordOccurance;
            temp1->keywordOccurance = temp2->keywordOccurance;
            temp2->keywordOccurance = i;
            // swap the total counts
            i = temp1->TotalCount;
            temp1->TotalCount = temp2->TotalCount;
            temp2->TotalCount = i;
        }
    }
}
3. Database connector

private void ExecuteInsertQuery()
{
    int rows = 0;

    // Check whether the entry is already present
    OpenConnection();
    crawlerAdapter.SelectCommand.Connection = crawlerConnection;
    crawlerAdapter.SelectCommand.CommandText =
        "Select * From table1 WHERE Person = '" + textBox4.Text +
        "' AND Category = '" + textBox6.Text +
        "' AND Object = '" + textBox5.Text + "'";
    crawlerAdapter.SelectCommand.ExecuteNonQuery();
    CloseConnection();
    DataSet ds = new DataSet();
    crawlerAdapter.Fill(ds);
    if (ds.Tables[0].Rows.Count > 0)
    {
        MessageBox.Show("Entry is already present");
        return;
    }

    // Compute the next ID: 1 for an empty table, MAX(ID)+1 otherwise
    int ID = 0;
    OpenConnection();
    crawlerAdapter.SelectCommand.Connection = crawlerConnection;
    crawlerAdapter.SelectCommand.CommandText = "Select * from table1";
    crawlerAdapter.SelectCommand.ExecuteNonQuery();
    CloseConnection();
    ds = new DataSet();
    crawlerAdapter.Fill(ds);
    if (((int)ds.Tables[0].Rows.Count) > 0)
    {
        OpenConnection();
        crawlerAdapter.SelectCommand.Connection = crawlerConnection;
        crawlerAdapter.SelectCommand.CommandText = "Select MAX(ID) from table1";
        crawlerAdapter.SelectCommand.ExecuteNonQuery();
        CloseConnection();
        ds = new DataSet();
        crawlerAdapter.Fill(ds);
        ID = (int)ds.Tables[0].Rows[0][0];
        ID++;
    }
    else
    {
        ID = 1;
    }

    // Insert the new record
    OpenConnection();
    crawlerAdapter.InsertCommand.Connection = crawlerConnection;
    crawlerAdapter.InsertCommand.CommandText =
        "INSERT INTO table1 VALUES (" + ID.ToString() + ", '" + textBox4.Text +
        "', '" + textBox6.Text + "', '" + textBox5.Text + "')";
    MessageBox.Show(crawlerAdapter.InsertCommand.CommandText);
    rows = crawlerAdapter.InsertCommand.ExecuteNonQuery();
    CloseConnection();
    PopulateComboBox();
    MessageBox.Show("Insert Successful");
}

CHAPTER 7


SYSTEM TEST CASES AND TEST RESULTS


Introduction:

- The aim of the testing process is to identify all defects in a software product.
- Testing is any activity aimed at evaluating the software for the quality of the results it
  produces and the quality of the results it can handle. Testing is an operation to detect the
  differences between the expected (required) result and the actual result.
- Testing a program consists of subjecting the program to test inputs or test cases and
  observing whether the program behaves as expected. If the program fails to behave as
  expected, the conditions under which failure occurs are noted for later debugging and
  correction.
- Our goal is to design a series of test cases that have a high likelihood of finding errors.
  Software testing techniques provide systematic guidance for designing tests that exercise
  the internal logic of software components and exercise the input and output domains of the
  program to uncover errors in program function, behavior and performance.
- Software is tested from two different perspectives:
  (1) Internal program logic is exercised using white-box test case design techniques.
  (2) Software requirements are exercised using black-box test case design techniques.
  In both cases, the intent is to find the maximum number of errors with minimum effort
  and time.

System test objective and scope

The main aim of testing this system is to ensure that it:

- permits only secure and authenticated access, and thus requires the user to enter the URL
  in the correct format;
- does all validation from time to time as per the need;
- takes a single input, the user id, for the detection of anomalies, which is used to generate
  the recommendations;
- does all the ranking calculations internally;
- generates appropriate alerts as per the condition, for user convenience;
- updates the database from time to time as the user transaction process proceeds.

A full system test will be conducted, including the following types of tests:

Functional testing:
To be truly robust, an application requires more than simple functional testing before
release into production. Functional testing verifies that the system:

- permits only secure and authenticated access, and thus requires the user to be registered
  with the system before use;
- does all validation from time to time as per the need;
- takes a single input, the user id, for the detection of anomalies, which is used to generate
  the recommendations;
- does all the conversion of data internally when required;
- generates appropriate alerts as per the condition, for user convenience;
- updates the database from time to time as the user transaction process proceeds.

At least one, and preferably all, of the following types of testing should be performed
before releasing the application to customers:

- Performance testing
- Load testing
- Stress testing

Performance Testing
Performance testing is designed to test the run-time performance of the application within
the context of an integrated system. Proper response time for user actions is critical to
maintaining and enhancing the user base.

Load Testing
Load testing demonstrates how the application performs under concurrent user
sessions for typical user scenarios. Setting up common scenarios that execute for a short
period of time allows seeing how the application operates under a multiple-user load.

Stress Testing
A stress test allows examining how the application behaves under the maximum user
load. To stress test the application, remove the think time from the load scripts and execute
the scripts against the server to overload the application. If there are unhandled
exceptions in a stress test, the application may not be robust enough to handle a sudden
unexpected increase in user activity. Stress tests generally execute for a longer period of
time, and can be used to catch difficult-to-diagnose problems like subtle memory leaks in
the application.

Items to be tested:
The following items are the ones that constitute the proposed system. Here we ensure
that all the modules, classes and libraries are included and integrated properly.

No   Name                           Identifier   Version no
1    The page rank algorithm        C1           1
2    Keyword relevance algorithm    C2           1
3    Web crawler                    C3           1
4    Database connector             C4           1

Features to be tested:
Here we test all the features provided by the proposed system, to ensure that all the
features that distinguish the system are implemented properly. The following features will
be tested:

a) Registration for users:
Here we check whether the process works as expected and whether, when incorrect
input is given, the system responds with a proper error message.
b) Maintaining a log for each user session:
Here we check that every user's log is maintained properly and all tables in the database
are updated properly, with the user's name, id, and the recommendations by the user about
places and things.
c) Blocking of unauthorized access:
Here, with the help of database tables, records of the users are maintained so that
unauthorized users are unable to access the account; in order to do this, proper
updates must be done.
d) Proper user and system interaction:
Here we check whether the entire process works properly.

Approach:

56

Testing accomplishes a variety of things, but most importantly it measures the quality of the software that is developed. This view presupposes that there are defects in the software waiting to be discovered.
We are testing manually; no tools are used for this testing. Testing will be conducted until we are sure that all the features the system is supposed to provide work well and that good security is provided.
Testing is a never-ending activity, but there must be a limit to every activity. We build a module and test it to ensure that everything is in order. When the entire system is ready, all kinds of testing are performed: use case testing, unit testing, and GUI testing.
Pass/Fail Criteria:
Pass/fail criteria are defined for modules 1 and 2. The proposed system passes if the failure rate observed in the module at the time of user registration is low. If the user enters wrong information, the system fails and the associated defect must be recorded.

Suspension & resumption criteria:

Suspension of testing will be done at the end of the day and testing will be resumed the next morning. Testing will be suspended if:
1. A hardware failure occurs.
2. The system is unable to accept a valid input.
3. The server is not responding.
4. The software crashes or is damaged due to some internal problem.

When a defect causing software failure is repaired, the new version of the software undergoes a regression test. If the new version passes the regression test, then normal testing can resume.

Test Deliverables:
The following are the deliverables of testing:
1. Test plan
2. Test cases
3. Test procedure sections
4. Test summary reports
5. Test logs
Test Tasks:


The following is the list of testing tasks:

1. Preparing the test plan and attachments.
2. Preparing the test design specifications.
3. Preparing the test case specifications.
4. Transmitting test-related data to the configuration management group.
5. Supervising the testing staff and organizing the test-related measurements.

Test Environment:
Software requirements:

Category (Software Tools)    Software Name
Operating System             Microsoft Windows XP
Front-End Development Tool   .NET 2003
Front End                    VC++, C#.NET
Back End                     MS Access, files

Hardware requirements:

Hardware                      Minimum Requirement
Microprocessor                Intel 3.1 GHz processor
Random Access Memory (RAM)    512 MB RAM
Hard Disk Drive               20 GB (min. free usable space)

Responsibilities:

Sr. No  Name          Designation    Task
1       Dnyaneshwari  Test Manager   Generate test plan, identify test cases, generate test summary
2       Nitu          Test Manager   Perform login module testing, perform preprocessing module testing, execute tests
3       Dnyaneshwari  Test Engineer  Perform feature extraction module testing, generate test harnesses, check results
4       Nitu          Test Engineer  Perform validation testing, manage equipment such as laptops, record executed test results

Staffing & Training Needs:

The tester must be an engineer or from a technical background, and must know how to handle the system.

Risks:
1. Power failure.
2. Hardware failure.
3. Server crashes.
4. Unable to handle the site.

Test case and Test Case Summary

MODULE INFORMATION: PAGE RANK ALGORITHM

1. GENERAL INFORMATION:
PRODUCT NAME: RULES ENGINE
PRODUCT VERSION: 1.1
2. EXECUTION INFORMATION:
DURATION: 7 DAYS
TESTED BY: DNYANESHWARI CHANDARANA
REVIEWED BY: SINGH NITU
3. TEST SUITE INFORMATION:
COMPONENT NAME: PAGE RANK ALGORITHM
TEST TYPE: USE CASE TESTING (MANUAL)
4. TEST METHODOLOGY:
STEP 1: ENTER URL
STEP 2: CHECK IF THE ENTERED URL IS IN THE CORRECT FORMAT
STEP 3: MAKE CONNECTIONS OVER THE INTERNET
STEP 4: COMPUTE PAGE RANK
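
For reference, the following is a minimal sketch of the computation behind Step 4, assuming the crawler has already produced the link graph. It applies the standard PageRank formula PR(p) = (1-d)/N + d * sum(PR(q)/L(q)) over the pages q linking to p, with damping factor d = 0.85. The names and the tiny three-page graph are illustrative only, not the actual Rules Engine code.

// Sketch of the iterative PageRank computation on a small link graph.
#include <iostream>
#include <vector>

int main() {
    // Adjacency list: links[i] holds the pages that page i links to.
    // Every page here has an outlink, so no dangling-node handling is needed.
    std::vector<std::vector<int> > links = { {1, 2}, {2}, {0} };
    const int n = (int)links.size();
    const double d = 0.85;  // damping factor

    std::vector<double> pr(n, 1.0 / n), next(n);
    for (int iter = 0; iter < 50; ++iter) {  // fixed iteration count
        for (int i = 0; i < n; ++i)
            next[i] = (1.0 - d) / n;         // base rank for every page
        for (int i = 0; i < n; ++i)
            for (size_t k = 0; k < links[i].size(); ++k)
                // Page i distributes its rank evenly over its outlinks.
                next[links[i][k]] += d * pr[i] / links[i].size();
        pr = next;
    }
    for (int i = 0; i < n; ++i)
        std::cout << "page " << i << ": " << pr[i] << '\n';
    return 0;
}

The test cases below check exactly this pipeline: URL entry, format validation, connection, and rank computation.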


Test ID | Item to be tested | Steps | Input | Actual output | Expected output | Pass/Fail
1 | User enters the URL | System assigns the URL and connects to the web | URL address | Display success | Display message successful | Pass
2 | System checks the address entered by the user | System compares the data entered by the user with the data present in the database | If the address is valid | Make connection | Make connection | Pass
  |   |   | If the address is invalid | Report improper address | Report error | Pass
3 | System computes page rank | System downloads relevant pages from the web | - | - | - | -
4 | User enters a URL to compute rank by the keyword relevance algorithm | System checks if the entered URL is in the correct format | URL address | Display message successful | Display message successful | Pass

TEST CASE SUMMARY:

Sr. No  Test case description
1       To check whether the user has entered a proper URL.
2       To check whether the system makes proper connections with the internet.
3       To check whether the system computes page rank properly.
4       To check whether high-ranked pages are displayed first.
5       To check whether the user has selected the correct application assigned to him/her by the admin.

MODULE NAME: DATABASE CONNECTOR

1. GENERAL INFORMATION:
PRODUCT NAME: RULES ENGINE
PRODUCT VERSION: 1.2
2. EXECUTION INFORMATION:
DURATION: 7 DAYS
TESTED BY: SINGH NITU
REVIEWED BY: DNYANESHWARI CHANDARANA
3. TEST SUITE INFORMATION:
COMPONENT NAME: DATABASE CONNECTOR
TEST TYPE: USE CASE TESTING (MANUAL)
4. TEST METHODOLOGY:
1. User fills in his/her information in the space provided.
2. User recommends places/things.
3. Recommendations made by one user are shown to all users.
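
To make the methodology concrete, the following is a hedged sketch of the two checks the test cases below exercise: rejecting a duplicate user name and recording a recommendation vote. The Database structure here is a hypothetical stand-in for the real connector (the project's back end is MS Access); only the logic under test is shown.

// Sketch of the duplicate-name check and vote update behind the
// database connector test cases.
#include <iostream>
#include <map>
#include <string>

struct Database {
    std::map<std::string, bool> users;  // registered user names
    std::map<std::string, int> votes;   // place/thing -> vote count

    bool registerUser(const std::string& name) {
        if (users.count(name))
            return false;               // duplicate record: report an error
        users[name] = true;
        return true;
    }
    void recommend(const std::string& placeOrThing) {
        ++votes[placeOrThing];          // assign the user's vote
    }
};

int main() {
    Database db;
    std::cout << db.registerUser("nitu") << ' '    // 1: accepted
              << db.registerUser("nitu") << '\n';  // 0: duplicate rejected
    db.recommend("city museum");
    db.recommend("city museum");
    std::cout << "city museum votes: " << db.votes["city museum"] << '\n';
    return 0;
}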
DETAILED TEST CASES:


TEST ID | ITEM TO BE TESTED | STEPS | INPUT | ACTUAL OUTPUT | EXPECTED OUTPUT | PASS/FAIL
1 | User fills in his/her name and likes about places and things | User selects an application from the application list | Name of the user | User fills in his/her name | - | -
2 | User fills in a name that is not already present in the database | System checks if duplicate records are present in the database | User name | Display error if present | Display error if present | Pass
3 | - | - | - | - | - | -
4 | User fills in a recommendation | System updates the information and assigns the user's vote to that particular place/thing | User recommendation | - | - | -
5 | User searches to see the popularity of a particular place/thing | System checks the database to show the popularity | Place/thing | The most popular record | The most popular record | Pass

TEST CASE SUMMARY:

Sr. No  Test action
1       To check whether the personal information filled in by the user is in the proper format.
2       To check whether the user can give a recommendation to a particular place/thing.
3       To check whether the recommendation of one user is made available to all users.
4       To check whether the user's search is carried out properly.

CHAPTER 8
FURTHER WORK
Future work:

We can develop different ranking algorithms.

We can develop different connectors.

Users can use different connectors and ranking algorithms to rank different types of data.

To do this, a user just has to add an extra tab and insert his or her code inside the framework, as sketched below.
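
As an illustration, the following is a minimal sketch of what such an extension point might look like, assuming the framework exposes abstract base classes that a new tab's code implements. All class and method names here are illustrative, not the framework's actual API.

// Sketch of pluggable connector and ranking-algorithm interfaces.
#include <iostream>
#include <map>
#include <string>
#include <vector>

class Connector {                       // fetches the raw items to be ranked
public:
    virtual ~Connector() {}
    virtual std::vector<std::string> fetch(const std::string& source) = 0;
};

class RankingAlgorithm {                // scores the fetched items
public:
    virtual ~RankingAlgorithm() {}
    virtual std::map<std::string, double>
        rank(const std::vector<std::string>& items) = 0;
};

// A user's new tab would register implementations like these:
class FileConnector : public Connector {
public:
    std::vector<std::string> fetch(const std::string& source) {
        return std::vector<std::string>(1, source);  // stub: read files here
    }
};

class FrequencyRanker : public RankingAlgorithm {
public:
    std::map<std::string, double>
        rank(const std::vector<std::string>& items) {
        std::map<std::string, double> scores;
        for (size_t i = 0; i < items.size(); ++i)
            scores[items[i]] += 1.0;                 // stub: real scoring here
        return scores;
    }
};

int main() {
    FileConnector conn;
    FrequencyRanker ranker;
    std::map<std::string, double> scores = ranker.rank(conn.fetch("sample.txt"));
    std::cout << "ranked items: " << scores.size() << '\n';
    return 0;
}

Because the framework only depends on the abstract interfaces, new data types can be ranked without modifying the existing connectors or algorithms.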


CHAPTER 9
SCREENSHOTS


CHAPTER 10
REFERENCES

BOOKS:
1. Roger S. Pressman, Software Engineering: A Practitioner's Approach.
2. Information Retrieval.
WEBSITES:
1) http://blog.taragana.com/index.php/archive/clean-room-implementation-of-google-page-rankalgorithm/
2) http://www.stanford.edu/group/reputation/ClickThroughAlg_Tutorial.pdf
3) http://kojotovski.diinoweb.com/files/The_mathematical_model_of_Google.pdf
4) http://citeseer.ist.psu.edu/cache/papers/cs/7144/http:zSzzSzwwwdb.stanford.eduzSz~backrubzSzpageranksub.pdf/page98pagerank.pdf
5) http://www.suchmaschinen-doktor.de/index.html
6) http://wwwhome.math.utwente.nl/~litvakn/IntMath07.pdf
7) http://www2006.org/programme/files/xhtml/3101/p3101-Richardson.html
8) http://www.texaswebdevelopers.com/docs/pagerank.pdf
9) http://pr.efactory.de/e-pagerank-implementation.shtml
10) http://www.rankforsales.com/n-aa/095-seo-may-31-03.html
11) http://www.pwqsoft.com/search-engine-ranking.htm#case2
12) http://www.webworkshop.net/pagerank.html
13) http://www.ianrogers.net/google-page-rank/
14) http://www.webworkshop.net/pagerank_calculator.html
15) http://www.linkingmatters.com/WhyLinkingIsImportant.html
16) http://www.example-code.com/vcpp/spider_simplecrawler.asp
17) http://en.wikipedia.org/wiki/Web_crawler
18) http://www.codeproject.com/KB/library/GomzyHTMLReader.aspx
19) http://en.wikipedia.org/wiki/CURL
20) http://en.wikipedia.org/wiki/Dynamic-link_library
21) http://en.wikipedia.org/wiki/WinRunne
22) http://www.nokiasoftware.net/general-discussions/19871-net-framework.html
23) http://cache.phazeddl.com/1412686/Microsoft%20Visual%20Studio%206.0
24) www.rocw.raifoundation.org/management/mba/.../lecture-10.pdf
25) http://en.wikipedia.org/wiki/PageRank#Algorithm
