Project Report

INDEX
1. Problem Definition---------------------------------------------------------------------- 4
1.1 Project Overview---------------------------------------------------------------- 5
1.2 Project Deliverable-------------------------------------------------------------- 6
2. System architecture-----------------------------------------------------------------------7-13
2.1Page rank algorithm-----------------------------------------------------------------7
2.2Simplified algorithm----------------------------------------------------------------8
2.3How page rank works---------------------------------------------------------------9
2.4How is page rank calculated------------------------------------------------------10
2.5Different criterias used in page rank-------------------------------------------10-12
2.6Keyword relevance----------------------------------------------------------------12
2.7Database connector----------------------------------------------------------------13
3.Project Organization---------------------------------------------------------------------- 14-21
3.1 Software Process Model--------------------------------------------------------- 14
3.2 Roles and Responsibilities---------------------------------------------------------17
3.3 Tools and Techniques ------------------------------------------------------------ 19
3.4Brief description of components used---------------------------------------------19
4.Project Management Plan------------------------------------------------------------------ 22-24

4.1 Tasks-------------------------------------------------------------------------------. 22
4.2 Project Plan--------------------------------------------------------------------------23
4.3 Timeline Chart----------------------------------------------------------------------23
5.Software Requirements Specifications (SRS)----------------------------------------- 25-29

5.1Hardware requirement----------------------------------------------------------25
5.2Software requirement-----------------------------------------------------------25
5.3 User Documentation------------------------------------------------------------25
5.4 System features------------------------------------------------------------------25
5.5. User Interfaces -----------------------------------------------------------------26
5.6. Hardware Interface-------------------------------------------------------------26
5.7 Software Interfaces--------------------------------------------------------------26
5.8Software system attributes-------------------------------------------------------27
5.9 Communication Protocols-------------------------------------------------------27
5.10. Software Product Features ----------------------------------------------------28
5.10.1 Reliability ....28
5.10.2 Availability ..28
5.10.3 Security 28
5.10.4 Maintainability ....28
5.10.5 Portability 28
5.10.6Database Requirements 28
6.Software Design Description-------------------------------------------------------------------- 30-50

6.1 System Architectural Design----------------------------------------------------- 30
6.2 System Interface Description --------------------------------------------------- 31
6.3 Detailed Description of Components------------------------------------------- 32
6.4UML diagrams------------------------------------------------------------------------37
6.4.1Use case diagram-------------------------------------------------------------------37
6.4.2Class diagram-----------------------------------------------------------------------38
6.4.3Activity Diagram--------------------------------------------------------------------39
6.4.4 Sequence Diagram-----------------------------------------------------------------40
6.4.5 Communication Diagram----------------------------------------------------------41
6.4.6Component Diagram----------------------------------------------------------------42
6.4.7Deployment Diagram---------------------------------------------------------------43
6.5Implementation details----------------------------------------------------------------45
6.5.1Page rank algorithm-------------------------------------------------------------------45
6.5.2Keyword relevance algorithm-------------------------------------------------------48
6.5.3Database connector-------------------------------------------------------------------49
7.System Test cases and test results --------------------------------------------------------------.51-64
7.1 Introduction 51
7.2 Test Cases & Results 58
8. Further Works -------------------------------------------------------------------------------------- 65

9.Screenshots--------------------------------------------------------------------------------------------66
10.References-------------------------------------------------------------------------------------------71
Abstract
Problem statement: Develop a framework (Rules Engine) for popularity based ranking
algorithms.
Platform: Visual studio 2003,Microsoft .net framework
Detail information:
What is a page rank?
Page Rank is a numeric value that represents how important a page is on the web. When one
page links to another page, it is effectively casting a vote for the other page. The more votes that
are cast for a page, the more important the page must be. Also, the importance of the page that is
casting the vote determines how important the vote itself is.
How the page rank is calculated?
To calculate the Page Rank for a page, all of its inbound links are taken into account. These are
links from within the site and links from outside the site.
PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))
How to make use of page ranking algorithm:
Every search engine has their own algorithm for ranking the pages in order to retrieve the pages
to the user in return of their query used at the time of searching. Using a good searching strategy
will help to make the search much faster and efficient. This ranking algorithm will provide the
engine with much more refined searching strategies.
Application of project: To develop a Rules engine that will accept the user input and the search
criteria as specified by the user and give proper results. This Rules Engine will accept different
criteria along with the algorithm and will generate the most popular result on the basis of criteria
defined. This engine will even perform processing of the rules. Processing will include indexing,
stemming and stop word removal depending on the parameter passed by the user. The criteria
specified will be used by the Rules Engine and accordingly the most popular results will be given
to the user.
PROBLEM STATEMENT
To implement Rules Engine For Popularity Based Ranking
CHAPTER 1
PROJECT OVERVIEW
Functional description: Rules engine is a user application which has been developed in order to
rank any kind of data the user wants. It is a framework that has two basic elements the first one is
a connector second one is the rules .The model can accommodate any type of a connector the
basic aim of a connector is to fetch a certain kind of data for which it is specialized, the user can
then click in any type of connector and use any ranking algorithm to rank the data. For example a
user can use a WebCrawler as a connector and rank the web pages either using a page ranking
algorithm or by using a keyword relevance algorithm.
For example if we want to find out one of the best communities on Orkut then that can be done
by generating a connector which can find out various communities, we can then use certain
criteria to decide how to judge a best community the criteria can be like maximum number of
people who have joined the community can be ranked higher on the same lines we have to
develop the raking algorithm using this criteria. Finally we get the best community.
The major Area of work is in fields of Information retrieval, text processing and ranking of data.
PROJECT DELIVERABLES
Sr no
1
2
3
4
5
6
7
8
9
Date
27 Aug 2008
29 Sep 2008
05 Oct 2008
01 Jan 2009
15 Jan 2009
20 Jan 2009
21 Feb 2009
25 Feb 2009
05 Mar 2009
Deliverables
Page Ranking
Web crawler
Keyword Relevance
Merging of PR,KR,Crawler
Com DLL
Testing the system
Database connector
Testing the Database connector
Delivering entire system
CHAPTER 2
SYSTEM ARCHITECTURE
The Page ranking Algorithm [23]:
This algorithm is used by all the search engines. It is a method to rank web pages giving
to it a numeric value that represents their importance. Based on the link structure of the
web a page X has a high rank if:
-
It has many In-links or few but highly ranked
Has few Out-links
Basic idea: Pages rank determined by the number of links to the page (also known as
citations). If citing page is more important (has a high page rank/authority page) then the
pages it cites are more important. If citing page has many links, then cited page is less
important (normalize for number of links on citing page). PR(P) is page rank of page P,
T1, , TN are pages that cite P,C(P) is the number of links from Page P, D is a decay
factor, e.g., 0.85 then:
PR (P) = (1 d) + d (PR (T1)/C (T1) + + PR (Tn)/C (Tn))
Page Rank is a probability distribution used to represent the likelihood that a person
randomly clicking on links will arrive at any particular page. Page Rank can be calculated
for any-size collection of documents. It is assumed in several research papers that the
distribution is evenly divided between all documents in the collection at the beginning of
the computational process. The Page Rank computations require several passes, called
"iterations", through the collection to adjust approximate Page Rank values to more
closely reflect the theoretical true value.
A probability is expressed as a numeric value between 0 and 1. A 0.5 probability is
commonly expressed as a "50% chance" of something happening. Hence, a Page Rank of
0.5 means there is a 50% chance that a person clicking on a random link will be directed
to the document with the 0.5 Page Rank.
Simplified algorithm:
10
How Page Rank Works:

Assume a small universe of four web pages: A, B, C and D. The initial approximation of
Page Rank would be evenly divided between these four documents. Hence, each
document would begin with an estimated Page Rank of 0.25.
In the original form of Page Rank initial values were simply 1. This meant that the sum of
all pages was the total number of pages on the web. Later versions of Page Rank would
assume a probability distribution between 0 and 1. Here we're going to simply use a
probability distribution hence the initial value of 0.25.
If pages B, C, and D each only link to A, they would each confer 0.25 Page Rank to A.
All Page Rank i.e. PR ( ) in this simplistic system would thus gather to A because all
links would be pointing to A.
This is 0.75.
Again, suppose page B also has a link to page C, and page D has links to all three pages.
The value of the link-votes is divided among all the outbound links on a page. Thus, page
B gives a vote worth 0.125 to page A and a vote worth 0.125 to page C. Only one third of
D's Page Rank is counted for A's Page Rank (approximately 0.083).
11
In other words, the Page Rank conferred by an outbound link L( ) is equal to the
document's own Page Rank score divided by the normalized number of outbound links (it
is assumed that links to specific URLs only count once per document).
In the general case, the Page Rank value for any page u can be expressed as:
,
i.e. the Page Rank value for a page u is dependent on the Page Rank values for each page
v out of the set Bu (this set contains all pages linking to page u), divided by the number
L(v) of links from page v.
How is Page Rank calculated?
To calculate the Page Rank for a page, all of its inbound links are taken into account.
These are links from within the site and links from outside the site.
PR (A) = (1-d) + d (PR (t1) / C (t1) + ... + PR (tn) / C (tn))
That's the equation that calculates a page's Page Rank.
In the equation 't1 - tn' are pages linking to page A, 'C' is the number of outbound links
that a page has and 'd' is a damping factor, usually set to 0.85 and(1-d) is called as
normalization factor .
Different criteria used in page ranking algorithm
Inbound links:
Inbound links (links into the site from the outside) are one way to increase a site's total
Page Rank. The other is to add more pages. The linking page's Page Rank is important,
but so is the number of links going from that page. Once the Page Rank is injected into
your site, the calculations are done again and each page's Page Rank is changed.
Depending on the internal link structure, some pages' Page Rank is increased, some are
unchanged but no pages lose any Page Rank.
It is beneficial to have the inbound links coming to the pages to which you are channeling
your Page Rank. A Page Rank injection to any other page will be spread around the site
through the internal links. The important pages will receive an increase, but not as much
of an increase as when they are linked to directly. The page that receives the inbound link
makes the biggest gain.
12
It is easy to think of our site as being a small, self-contained network of pages. When we
do the Page Rank calculations we are dealing with our small network. If we make a link
to another site, we lose some of our network's Page Rank, and if we receive a link, our
network's Page Rank is added to. But it isn't like that. For the Page Rank calculations,
there is only one network - every page that Google has in its index. Each iteration of the
calculation is done on the entire network and not on individual websites.
Outbound links:
Outbound links are a drain on a site's total Page Rank. They leak Page Rank. To counter
the drain, try to ensure that the links are reciprocated. Because of the Page Rank of the
pages at each end of an external link, and the number of links out from those pages,
reciprocal links can gain or lose Page Rank. We need to take care when choosing where
to exchange links.
When Page Rank leaks from a site via a link to another site, all the pages in the internal
link structure are affected. The page that you link out from makes a difference to which
pages suffer the most loss. Without a program to perform the calculations on specific link
structures, it is difficult to decide on the right page to link out from, but the generalization
is to link from the one with the lowest Page Rank.
Many websites need to contain some outbound links that are nothing to do with Page
Rank. Unfortunately, all 'normal' outbound links leak Page Rank. But there are 'abnormal'
ways of linking to other sites that don't result in leaks. Page Rank is leaked when Google
recognizes a link to another site. The answer is to use links that Google doesn't recognize
or count. These include form actions and links contained in JavaScript code.
Damping factor:
The Page Rank theory holds that even an imaginary surfer who is randomly clicking on
links will eventually stop clicking. The probability, at any step, that the person will
continue is a damping factor d. Various studies have tested different damping factors, but
it is generally assumed that the damping factor will be set around 0.85.
The damping factor is subtracted from 1 (and in some variations of the algorithm, the
result is divided by the number of documents in the collection) and this term is then
added to the product of the damping factor and the sum of the incoming Page Rank
scores.
That is,
Or (N = the number of documents in collection)
13
So any page's Page Rank is derived in large part from the Page Ranks of other pages. The
damping factor adjusts the derived value downward. Google recalculates Page Rank
scores each time it crawls the Web and rebuilds its index. As Google increases the
number of documents in its collection, the initial approximation of Page Rank decreases
for all documents.
The formula uses a model of a random surfer who gets bored after several clicks and
switches to a random page. The Page Rank value of a page reflects the chance that the
random surfer will land on that page by clicking on a link. If a page has no links to other
pages, it becomes a sink and therefore terminates the random surfing process. However,
the solution is quite simple. If the random surfer arrives at a sink page, it picks another
URL at random and continues surfing again.
When calculating Page Rank, pages with no outbound links are assumed to link out to all
other pages in the collection. Their Page Rank scores are therefore divided evenly among
all other pages. In other words, to be fair with pages that are not sinks, these random
transitions are added to all nodes in the Web, with a residual probability of usually d =
0.85, estimated from the frequency that an average surfer uses his or her browser's
bookmark feature.
So, the equation is as follows:
where p1,p2,...,pN are the pages under consideration, M(pi) is the set of pages that link to
pi, L(pj) is the number of outbound links on page pj, and N is the total number of page.
Keyword relevance algorithm:
In keyword relevance algorithm the page which has maximum count of the keyword is
ranked high.
Two terms used in keyword relevance algorithm are Total count and total keyword
occurrence. Total count actually stands for total number of keywords that occur in a
webpage for example if a page has keywords sun, moon, earth, moon then total count
is 4. If a page has 5 keywords say sun, moon, earth, sun, sun then total keyword
occurrence is 3 since sun has occurred thrice but meanwhile the total keyword count
is incremented only once.
Database connector:
The database connector used in project is used for populating a set of records. The
connector is used to insert specific records by the user. Once the user has entered all the
records he has to recommend some of things like places, foods or
14
restaurants.Recommendation of all the users will be saved which in turn will be used for
endorsement of a specific thing. This application can be used to create brand awareness
on social networking site. This application can even be further integrated with any of the
ranking algorithm and the ranking algorithm could be used to rank the data.
CHAPTER 3
PROJECT ORGANISATION
15
Software Process Model [22]:

Software process models often represent a networked sequence of activities, objects,
transformations, and events that embody strategies for accomplishing software evolution. Such
models can be used to develop more precise and formalized descriptions of software life cycle
activities. Their power emerges from their utilization of a sufficiently rich notation, syntax, or
semantics, often suitable for computational processing. Software process networks can be
viewed as representing multiple interconnected task chains
.
Task chains represent a non-linear sequence of actions that structure and transform available
computational objects (resources) into intermediate or finished products. Non-linearity implies
that the sequence of actions may be non-deterministic, iterative, accommodate multiple/parallel
alternatives, as well as partially ordered to account for incremental progress. Task actions in turn
can be viewed a non-linear sequence of primitive actions which denote atomic units of
computing work, such as a user's selection of a command or menu entry using a mouse or
keyboard.
Task chains can be employed to characterize either prescriptive or descriptive action sequences.
Prescriptive task chains are idealized plans of what actions should be accomplished, and in what
order. For example, a task chain for the activity of object-oriented software design might include
the following task actions:
_ Develop an informal narrative specification of the system.
_ Identify the objects and their attributes.
_ Identify the operations on the objects.
_ Identify the interfaces between objects, attributes, or operations.
_ Implement the operations.
There was a compelling need to provide a limited set of software functionality to users and
then refine and expand on that functionality in later software releases. In such a case, a process
model that is designed to produce software in increments is chosen, i.e. the incremental model is
chosen.
The Incremental Model combines elements of the waterfall model applied in an
iterative fashion. The incremental model applies linear sequences in a staggered fashion as
calendar time progresses. Each linear sequence produces deliverable increments of the
software.
16
When an incremental model is used, the first increment is often a core product. That is
basic requirements are addressed, but many supplementary features remain undelivered. The
core product is used by the customer. As a result of use and/or evaluation, a plan is developed
for the next increment. The plan addresses the modification of the core product to better meet
the needs of the customer and the delivery of additional features and functionality. This
process is repeated following the delivery of each increment, until the complete product is
produced.
The incremental model is iterative in nature. It focuses on the delivery of an operational
product with each increment. Early increments are stripped down versions of the final
product, but they do provide capability that serves the user and also provides a platform for
evaluation by the user. In addition, increments can be planned to manage technical risks.
17
In our software the incremental model was implemented as:

Increment #1
Implementation of page ranking algorithm
Increment #2
Implementation of page ranking algorithm with different damping factor
Increment #3
Implementation of Crawler
Implementation of HTML parser
Increment #4
Implementation of hash table

Implementation of stop word removal code
Increment #5
Implementation of indexing technique
Increment #6
Implementation of keyword relevance algorithm
Increment #7
Implementation of user interface

Implementation of COM DLL
Implementation of database connector
Merging of all the modules
Increment #8
Increment #9
Roles and Responsibility:

1. Roles:
18
Understanding the requirements, purpose, goals and the scale of the project
Finalizing the project problem definition

Market study of various available open source search engines
To study various ranking algorithm
To implement different ranking algorithms
Studying & understanding of various criteria used in ranking purpose
Understanding the web crawler
Implementation of our own crawler
Finalizing the indexing technique
To implement the database connector
Designing the User Interface part of the project
Integrate and test all modules
Demonstrate and do suggested modifications
Prepare a final report and a presentation
2. Project Team Members & their Assignments:
a. Dnyaneshwari Chandarana
Studying different ranking algorithms

Studying the Web crawler
Understanding the COM DLL
Understanding indexing technique
Preparing the detailed synopsis and presentation
Implementing the following features:
Page ranking algorithms

Keyword relevance algorithm
COM DLL
Integrate all the modules together
Perform Authentication & Registration Module Testing
b. Nitu Singh
Finalizing the ranking algorithms to be implemented

Studying the various criteria used in present ranking algorithms
19
Understanding the need for database connector

Preparing the detailed synopsis and presentation
Developing the User Interface for the software
Implementing the following features:
Database connector
User interface
Generate Test Plan Identify Test Cases

Generate Test Summary
Tracking & Monitoring Test Efforts
Manage required software.
Tools and Techniques:

The following tools have been used in the project:
At the initial stage of our project i.e.
20
1. At Stage 1 we gathered requirements from the client and formulated the requirement
Analysis in Microsoft Word 2003.
2. At Stage 2 of our project we prepared a detailed project plan.

Tool used Microsoft Project Planner
3. At Stage 3 the entire designing of the project was done.

4. Database Design
Tool used MS-Access
5. Programming Language C# and C++
Development methodology A name table, likes table, place table, recommendation table
was created using MS-Access.
6. And finally at Stage 5 the Development of code was done.
Brief description of various tools used:

1. Visual Studio 2003.net [21]: Microsoft Visual Studio is an integrated development
environment (IDE) from Microsoft. It can be used to develop console and graphical user
interface applications along with Windows Forms applications, web sites, web
applications, and web services in both code together with managed code for all platforms
supported
by Microsoft
Windows, Windows
Mobile, Windows
CE, .NET
Framework, .NET Compact Framework and Microsoft Silver light.
Visual Studio includes a code editor supporting IntelliSense as well as code refactoring.
The integrated debugger works both as a source-level debugger and a machine-level
debugger. Other built-in tools include a forms designer for building GUI applications,
web designer, class designer, and database schema designer. It allows plug-ins to be
added that enhance the functionality at almost every level - including adding support
for source control systems (like Subversion and Visual SourceSafe) to adding new
toolsets like editors and visual designers for domain or toolsets for other aspects of
21
the cycle. Visual Studio supports languages by means of language services, which allow
any programming language to be supported (to varying degrees) by the code editor and
debugger, provided a language-specific service has been authored. Built-in languages
include C/C++ (via Visual), VB.NET (via Visual Basic .NET), and C# (via Visual C#).
Support for other languages such as Chrome, F#, Python, and Ruby among others has
been made available via language services which are to be installed separately. It also
supports XML/XSLT, HTML/XHTML, JavaScript and CSS
2. Microsoft .net framework [20]: The Microsoft .NET Framework is a software

framework available with several Microsoft Windows operating systems. It includes a
large library of coded solutions to prevent common programming problems and a virtual
machine that manages the execution of programs written specifically for the framework.
The .NET Framework is a key Microsoft offering and is intended to be used by most new
applications created for the Windows platform.
The framework's Base Class Library provides a large range of features including user
interface, data and data access, database connectivity, cryptography, application
development, numeric algorithms, and network communications. The class library is used
by programmers, who combine it with their own code to produce applications.
Programs written for the .NET Framework execute in a software environment that
manages the program's runtime requirements. Also part of the .NET Framework, this
runtime environment is known as the Common Language Runtime (CLR). The CLR
provides the appearance of an application virtual machine so that programmers need not
consider the capabilities of the specific CPU that will execute the program. The CLR also
provides other important services such as security, memory management, and exception
handling. The class library and the CLR together compose the .NET Framework.
Version 3.0 of the .NET Framework is included with Windows Server 2008 and Windows
Vista. The current version of the framework can also be installed on Windows XP and
the Windows Server 2003 family of operating systems.
3. Rational Rose Software: An object-oriented analysis and design tool that runs on
Windows and UNIX platforms from IBM. It supports the Unified Modeling Language
(UML) as well as the earlier Booch and OMT notations.
4. Win Runner [19]: It is an automated functional GUI testing tool that allows a user to
record and play back UI interactions as test scripts.
22
As a Functional test suite, it works together with HP Quick Test Professional and
supports enterprise quality assurance. HP Win Runners intuitive recording process helps
you produce robust functional tests. To create a test, HP Win Runner simply records a
typical business process by emulating user actions, such as ordering an item or opening a
vendor account. During recording, you can directly edit generated scripts to meet the
most complex test requirements. Next, testers can add checkpoints, which compare
expected and actual outcomes from the test run. HP Win Runner offers a variety of
checkpoints, including test, GUI, bitmap and web links. HP Win Runner can also verify
database values to determine transaction accuracy and database integrity, highlighting
records that have been updated, modified, deleted and inserted. With a few mouse clicks,
the Data Driver Wizard feature lets you convert a recorded business process into a data
driven test that reflects the real-life actions of multiple users. For further test
enhancement, the Function Generator feature presents a quick and reliable way to
program tests, while the Virtual Object Wizard feature lets you teach HP Win Runner to
recognize, record and replay any unknown or custom object. As HP Win Runner executes
tests, it operates the application automatically, as though a real user were performing each
step in the business process. If test execution occurs after hours or in the absence of a
quality assurance (QA) engineer, the Recovery Manager and Exception Handling
mechanisms automatically troubleshoot unexpected events, errors and application crashes
so that tests can complete smoothly. Once tests are run, HP Win Runners interactive
reporting tools help your team interpret results by providing detailed, easy-to-read reports
that list errors and their originations. HP Win Runner lets your organization build
reusable tests to repeat throughout an applications lifecycle. Thus, if developers modify
an application over time, testers do not need to modify multiple tests. Instead, they can
apply changes to the Graphical User Interface (GUI) Map, a central repository of testrelated information, and HP Win Runner automatically propagates changes to all relevant
script
CHAPTER 4
PROJECT MANAGEMENT PLAN
23
Task:
A task set is a collection of software engineering work tasks, deliverables and milestones,
resources, dependencies, constraints, risks and contingencies that must be accomplished to
complete a particular project. Our project can be carried out with a structured degree of rigor.
Our project has the following main tasks to be carried out.
Task Name
: Page ranking algorithm
Description : This algorithm should rank the pages according to inbound and
outbound links
Resources needed:
Dependencies & Constraints
Risk & Contingencies
Microsoft Visual studio 2003

Internet connectivity
Crawler
LAN/Internet Connectivity
Failure of Server
Failure of LAN/Internet Connectivity
Task name: Keyword relevance algorithm

Resources needed:
Microsoft Visual studio 2003

Internet connectivity
Crawler
Dependencies & Constraints

LAN/Internet Connectivity
Risk & Contingencies
Failure of LAN/Internet Connectivity
Project plan:
24
1) To implement ranking algorithms.

2) To implement connectors e.g. WebCrawler, database connector.
3) To encompass ranking algorithms in com DLL.
4) Create a user interface for clicking in the type of connector and ranking algorithm to be used
for ranking the data.
Timeline chart:
Task Name
Duration
Start
Finish
Sponsorship Search
8 days
Mon 02/07/08
Mon 09/07/08
Formalities at Ubiqtas
7 days
Tues 10/07/08
Mon 16/07/08
Confirmation letter
1 day
Tues 17/07/08
Tues 17/07/08
Finalizing problem statement
9 days
Wed 18/07/08
Thu 26/07/08
Discussion with external guide
8 days
Fri 27/07/08
Fri 03/08/08
Making synopsis
10 days
Sat 04/08/08
Mon 13/08/07
Submission of final synopsis
1 day
Tues 14/08/08
Tues 14/08/08
Confirmation of problem
statement from college
1 day
Thu 16/08/08
Thu 16/08/08
Information gathering
15 days
Fri 17/08/08
Fri 31/08/08
10
Discussion with internal guide
9 days
Sat 01/09/08
Sun 09/09/08
11
Preparation of presentation
8 days
Mon 10/09/08
Mon17/09/08
12
Delivery of seminar
1 day
Tues 18/09/08
Tues18/09/08
13
Literature survey
20 days
Wed 19/009/08
Mon 18/10/08
14
Requirement specification
26 days
Tues 09/10/08
Sat 03/11/08
15
Initial design
25 days
Sun 04/11/08
Wed 28/11/08
16
Verify design
9 days
Tues 01/01/09
Wed 09/01/09
17
Implementation and coding
60 days
Thu 10/01/08
09/03/09
25
18
GUI
15 days
Mon 10/03/08
Mon 24/03/08
19
Testing and debugging
28 days
Tues 25/03/09
Tues 29/04/09
20
Documentation and preparation

of report
10 days
Wed 30/04/09
10/04/09
CHAPTER 5
SOFTWARE REQUIRMENT SPECIFICATION
Hardware Requirement:
26
766 MHz Pentium or compatible processor (1.5 GHz Pentium recommended)
512 MB RAM more recommended).
Video Monitor (800 600 or higher resolution) with at least 256 colors (1024 768 High
color 16-bit recommended).
Software Requirement:
Front End : VC++(2003),C#.net
Operating System : Windows XP
Database: MS access
User Documentation: User guide or manual should be small and contain all the
information in user understandable format.
In user manual also provide the picture or diagram for proper way to guide user.
System features:
1. Helps in ranking the web pages according to the popularity of web pages.
2. Helps in ranking the web pages according to the relevancy of web pages.
3. Provide an interface/tool to create awareness on social networking sites.
User Interfaces:
We designed a simple user interface using the Microsoft Visual studio 2003
development tool and C# as the programming language. Our user interface is similar to
most of the standard search engines, and contains buttons for performing the basic
functions as specified in the user requirements.
Most of the error messages will be pop up in a dialog box.
Hardware Interfaces:
A computer with minimum 512 MB of RAM with internet connectivity is required.
27
Software Interfaces:
The Rules Engine will run only if the server (in our case Authenticator) is running on
the server machine. The server includes the MS Access database.
The COM DLL is needed to load the project at run time.
Software System Attributes:
The page ranking algorithm computes the rank of pages from a specified set of
pages and displays the most highly ranked pages accordingly.
The Keyword relevance algorithm gives result according to the maximum
frequency count of words on a particular page. The page with the highest
frequency count will be on the top rank.
The database connector is used to insert particular information from the user such
as his/her name, likes as well as the recommendation made.
Communication Protocols:
The communication protocol used in our system is FTP
28
File Transfer Protocol (FTP) is a network protocol used to exchange and manipulate files over
a TCP computer network, such as the internet. An FTP client may connect to an FTP server to
manipulate files on that server.
FTP runs over TCP.[1] It defaults to listen on port 21 for incoming connections from FTP clients.
A connection to this port from the FTP Client forms the control stream on which commands are
passed from the FTP client to the FTP server and on occasion from the FTP server to the FTP
client. FTP uses out-of-band control, which means it uses a separate connection for control and
data. Thus, for the actual file transfer to take place, a different connection is required which is
called the data stream. Depending on the transfer mode, the process of setting up the data stream
is different. Port 21 for control (or program), port 20 for data.
In active mode, the FTP client opens a dynamic port, sends the FTP server the dynamic port
number on which it is listening over the control stream and waits for a connection from the FTP
server. When the FTP server initiates the data connection to the FTP client it binds the source
port to port 20 on the FTP server.
The objectives of FTP are:
1. To promote sharing of files (computer programs and/or data).
2. To encourage indirect or implicit use of remote computers.
3. To shield a user from variations in file storage systems among different hosts.
4. To transfer data reliably, and efficiently.
Reliability:
The reliability of the overall program depends on the reliability of the separate
components.
Availability: The system can be made available if u have the specific kind of connector
for specific kind of data one wants to search .Internet availability is a must.
Security:
29
Passwords will be saved in the database in order to ensure the user's privacy.
Maintainability:
The maintainability of the project has been addressed by assigning appropriate variable
names, following appropriate naming convention for functions and appropriate coding
standards. The segregation of code makes it easy to understand, maintain and modify.
Portability:
The application is Windows-Xp based and should be compatible with other systems. The
end-user part is fully portable and any system having any operating system should be able
to use the features of the application.
Database Requirements:
A database is maintained in MS access to keep a list of all users, their likes and
recommendation made by them. Following are the tables maintained in the database:
FIELD NAME
DATATYPE
VALIDATION
Name
TEXT
NOT NULL,PRIMARY KEY
Likes
TEXT
NOT NULL
Recommendation
TEXT
NOT NULL
30
CHAPTER 6
SOFTWARE DESIGN DESCRIPTION
31
Rules engine will perform processing of the rules. Processing includes different functions like
indexing, stemming and stop word removal depending on the parameters passed by the user.
Algorithm: We will develop an algorithm which takes these parameters as input and
generates the most popular result on the basis of the criteria defined.
This algorithm will have many criteria defined which will allow the user to search
specific information according to his own chosen criteria.
Whenever we search something on search engines the results are displayed according to
popularity of pages meaning those pages which have high rank will be displayed first.
Instead of this we can let user decide the criteria of searching and have the results
according to their own chosen criteria.
32
System interface design:
Detail description of components

1. Dynamic link library [18]:
DLL, is Microsofts implementation of the shared library concept in the Microsoft

Windows and OS/2 operating systems. These libraries usually have the file
extension DLL, OCX (for libraries containing ActiveX controls), or DRV. The formats for
DLLs are the same as for Windows EXE files that is, Portable (PE) for 32-bit and 64bit Windows, and New Executable (NE) for 16-bit Windows. As with EXEs, DLLs can
contain code, data and resources in any combination.
Facilities provided by DLL:
33
a. Memory management:
In Win32, the DLL files are organized into sections. Each section has its own set of
attributes, such as being writable or read-only, executable (for code) or non-executable
(for data), and so on.
The code in a DLL is usually shared among all the processes that use the DLL; that is,
they occupy a single place in physical memory, and do not take up space in the page file.
If the physical memory occupied by a code section is to be reclaimed, its contents are
discarded, and later reloaded directly from the DLL file as necessary.
In contrast to code sections, the data sections of a DLL are usually private; that is, each
process using the DLL has its own copy of all the DLL's data. Optionally, data sections
can be made shared, allowing inter-process communication via this shared memory area.
However, because user restrictions do not apply to the use of shared DLL memory, this
creates a security hole; namely, one process can corrupt the shared data, which will likely
cause all other sharing processes to behave undesirably. For example, a process running
under a guest account can in this way corrupt another process running under a privileged
account. This is an important reason to avoid the use of shared sections in DLLs.
If a DLL is compressed by certain executable packers (e.g. UPX), all of its code sections
are marked as read-and-write, and will be unshared. Read-and-write code sections, much
like private data sections, are private to each process. Thus DLLs with shared data
sections should not be compressed if they are intended to be used simultaneously by
multiple programs, since each program instance would have to carry its own copy of the
DLL, resulting in increased memory consumption.
b. Import libraries
Linking to dynamic libraries is usually handled by linking to an import library when
building or linking to create an executable file. The created executable then contains an
import address table (IAT) by which all DLL function calls are referenced (each
referenced DLL function contains its own entry in the IAT). At run-time, the IAT is filled
with appropriate addresses that point directly to a function in the separately-loaded DLL.
Like static libraries, import libraries for DLLs are noted by the .lib file extension. For
example, kernel32.dll, the primary dynamic library for Windows' base functions such as
file creation and memory management, is linked via kernel32.lib.
34
c. Explicit run-time linking

DLL files may be explicitly loaded at run-time, a process referred to simply as run-time
dynamic linking by Microsoft, by using the LoadLibrary or LoadLibraryEx API
function. The GetProcAddress API function is used to lookup exported symbols by
name, and FreeLibrary to unload the DLL. These functions are analogous
to dlopen, dlsym, and dlclose in the POSIX standard API.
2. Component Object modeling:

The Component Object Model (COM) extends the DLL concept to object-oriented
programming. Objects can be called from another process or hosted on another machine.
COM objects have unique GUIDs and can be used to implement powerful back-ends to
simple GUI front ends such as Visual Basic and ASP. They can also be programmed from
scripting languages. COM objects are more complex to create and use than DLLs.
3. Library file libcurl-7.19.3-win32-ssl-msvc, HTMLReader_src [17]:

cURL is a command line tool for transferring files with URL syntax. The original author
of cURL is Daniel Stenberg, who started the project in 1997, as a way to transfer files
more programmatically via protocols such as http, ftp, gopher, sftp, ftps, scp, tftp, and
many more (13 total), via a command line interface. The strong point of cURL is the
number of data transfer protocols it supports (listed further down). Distributed under the
MIT License, cURL is free software.
Example
Basic use of curl involves simply typing curl at the command line, followed by the URL
of the output you want to retrieve.
To retrieve the Wikipedia homepage, type:
curl www.wikipedia.org
Curl defaults to displaying the output it retrieves to the standard output specified on the
system, which is usually the terminal window. So running the command above would, on
most systems, display the www.wikipedia.org source code in the terminal window.
35
4. Library file HTMLReader_src is an html parser used to parse HTML web pages [16].
An events-based parser uses the callback mechanism to report parsing events. These
callbacks turn out to be protected virtual member functions that you will override.
Events, such as the detection of an opening tag or the closing tag of an element, will
trigger a call to the corresponding member function of your class. The application
implements and registers an event handler with the reader. It is up to the application to
put some code in the event handlers designed to achieve the objective of the application.
Events-based parsers provide a simple, fast, and a lower-level access to the document
being parsed.
Events-based parsers do not create an in-memory representation of the source document.
They simply parse the document and notify client applications about various elements
they find along the way. What happens next is the responsibility of the client application.
Events-based parsers don't cache information and have an enviably small memory
footprint.
The page ranking algorithm, the keyword relevance algorithm and the web crawler are
integrated to form a web connector we create a com DLL further we import this DLL into
our windows application .A database connector which uses ms-access at backend is
created for the same windows application.
5. Web crawler [15]:
It creates a copy of all the visited pages for later processing by a search engine that will
index the downloaded page to provide fast searches. Checking links or validating HTML
code can be used to gather specific type of information from web pages such as
harvesting e-mail address (spam). Web crawling is modeled as a multiple queue, singleserver polling system on which the web crawler is the server and the web sites are
queues. The objective of crawler is to keep the average freshness of pages in its collection
as high as possible or to keep the average age of pages as low as possible. To improve the
freshness we should penalize the elements that change too often. A web crawler (also
known as a web spider, web robot, orespecially in the FOAF communityweb scatter)
is a program or automated script which browses the World Wide Web in a methodical,
automated manner. Other less frequently used names for web crawlers are ants, automatic
36
indexers, bots, and worms. This process is called web crawling. Many sites, in particular
search engines, use crawling as a means of providing up-to-date data. Web crawlers are
mainly used to create a copy of all the visited pages for later processing by a search
engine that will index the downloaded pages to provide fast searches. Crawlers can also
be used for automating maintenance tasks on a website, such as checking links or
validating HTML code. Also, crawlers can be used to gather specific types of information
from Web pages, such as harvesting e-mail addresses (usually for spam).A web crawler is
one type of BOT, or software agent. In general, it starts with a list of URLs to visit, called
the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and
adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier
are recursively visited according to a set of policies.
Algorithm for Web Crawler:
Get the URL
Dump the URL into a data structure called queue
Go to that URL scan the entire page find out if any links are present, if any
URLs are present dump them into that linked list.
All the URLs present in the linked list are called as child URLs and the one
present in queue are called as parent.
Now pick first child URL from linked list dump it into the queue this URL
then becomes the parent repeat step 1.
Repeat the process for each and every child URL present in the linked list.
Keep on doing so till the depth mentioned at the start of the code is reached
6. Database Connector:
To implement this connector the database used is MS-access.
This connector is used for populating a set of records. The connector is used to
insert specific records by the user.
Once the user has entered all the records he has to recommend some of things
like places, foods or restaurants.
Recommendation of all the users will be saved which in turn will be used for
endorsement of a specific thing.
This application can be used to create brand awareness on social networking

site.
37
This application can even be further integrated with any of the ranking
algorithm and the ranking algorithm could be used to rank the data.
When a user recommends a thing or a place he is casting his vote to it.
A place/thing with the maximum number of votes will be the most popular
among all the data.
When a user fires a query to see endorsements of a specific thing the one with
the highest number of votes would be on the top of the list.
UML diagrams:
1. Use case diagram:
38
2. Class Diagram:
39
3. Activity Diagram:
40
4. Sequence Diagram:
41
42
5. Communication Diagram:
43
44
6. Component Diagram:
45
7. Deployment Diagram:
46
Implementation Details:
1. Page Rank Algorithm:
#include<conio.h>
#include<iostream>
using namespace std;
int main()
{
int m=0, i, j, k;
const double d=0.85;
const double n=1-d;
float linkMap[10][10]= {
{1,2,3,0,0,0,1,2,3,1},
{4,5,6,1,0,0,1,2,3,1},
{0,2,3,1,1,1,1,2,3,0},
{1,2,3,0,0,0,1,0,0,0},
{1,3,3,0,0,0,1,2,1,1},
{1,1,1,1,1,0,0,0,0,0},
{0,0,0,0,0,0,1,1,1,1},
{1,1,1,0,0,0,2,2,2,2},
{1,1,1,0,0,0,1,1,2,1},
{1,2,2,0,3,2,1,2,3,1}};
47
float pageValue[10]
= {0};
int outboundLinks[10] = {0};
cout << "Enter the number of iterations ::";

cin >> i;
for(j=0;j<10;j++)
{
for(k=0;k<10;k++)
{
if( j != k)
{
outboundLinks[j] += linkMap[j][k];
}
}
for(m=0;m<10;m++)
{
48
printf("outboundlink of page[%d]::%d \n",m,outboundLinks[m]);

}
while (i)
{
for(j=0;j<10;j++)
{
for(k=0;k<10;k++)
{
if(linkMap[j][k]!=0 && (j!= k))

{
pageValue[j] += pageValue[k]/outboundLinks[j];
}
}
pageValue[j] = n + d* (pageValue[j]);
printf("PageValue[%d] = %f\n", j+1, pageValue[j]);
}
printf("-------------------------------\n");
49
i--;
}
getch();
return 0;
}
2. Keyword Relevance Algorithm
for(temp1 = Head[0]; temp1 != NULL; temp1 = temp1->next)
{
for(temp2 = temp1->next; temp2 != NULL; temp2 = temp2->next)
{
if ((temp1->keywordOccurance < temp2->keywordOccurance) ||
( (temp1->keywordOccurance == temp2->keywordOccurance) && (temp1->TotalCount
< temp2->TotalCount)))
{
ptr = temp1->nodePtr;
temp1->nodePtr = temp2->nodePtr;
temp2->nodePtr = ptr;
i = temp1->keywordOccurance;
temp1->keywordOccurance = temp2->keywordOccurance;
temp2->keywordOccurance = i;
i = temp1->TotalCount;
50
temp1->TotalCount = temp2->TotalCount;
temp2->TotalCount = i;
}
}
3. Database connector
private void ExecuteInsertQuery()
{
int rows = 0;
OpenConnection();
crawlerAdapter.SelectCommand.Connection = crawlerConnection;
crawlerAdapter.SelectCommand.CommandText = "Select * From
table1 WHERE Person = '" + textBox4.Text + "' AND Category = '" + textBox6.Text
+ "' AND Object = '" + textBox5.Text + "'";
crawlerAdapter.SelectCommand.ExecuteNonQuery();
CloseConnection();
DataSet ds = new DataSet();
crawlerAdapter.Fill(ds);
if (ds.Tables[0].Rows.Count > 0)
{
MessageBox.Show("Entry is already present");
return;
}
int ID = 0;
OpenConnection();
crawlerAdapter.SelectCommand.Connection = crawlerConnection;
crawlerAdapter.SelectCommand.CommandText = "Select * from
table1";
CloseConnection();
ds = new DataSet();
if (((int)ds.Tables[0].Rows.Count) > 0)
{
OpenConnection();
crawlerAdapter.SelectCommand.Connection =
crawlerConnection;
crawlerAdapter.SelectCommand.CommandText = "Select
MAX(ID) from table1";
CloseConnection();
ds=new DataSet();
51
ID =(int)ds.Tables[0].Rows[0][0];
ID++;
}
else
{
ID = 1;
}
OpenConnection();
crawlerAdapter.InsertCommand.Connection=crawlerConnection;
crawlerAdapter.InsertCommand.CommandText = "INSERT INTO table1
VALUES (" + ID.ToString() + ", '" + textBox4.Text + "', '" + textBox6.Text + "', '" +
textBox5.Text + "')";
MessageBox.Show(crawlerAdapter.InsertCommand.CommandText);
crawlerAdapter.InsertCommand.Connection = crawlerConnection;
rows = crawlerAdapter.InsertCommand.ExecuteNonQuery();
CloseConnection();
PopulateComboBox();
MessageBox.Show("Insert Successful");
}
CHAPTER 7
52
SYSTEM TEST CASES AND TEST RESULTS

Introduction:
The aim of testing process is to identify all defects in a software product.

Testing is any activity aimed at evaluating the software for quality results it
produces and the quality of results it can handle. Testing is an operation to
detect the differences between the expected (required) result and the actual
result.
Testing a program consists of subjecting the program to a test inputs or test

cases and observing if the program behaves as expected. If the program fails
to behave as expected, then the condition under which failures occurs are
noted for later debugging and correction.
Our goal is to design a series of test cases that would have a high likelihood of
finding errors. The software testing techniques provide a systematic guidance
for designing tests that exercise the internal logic of software components and
exercise the input & output domains of the program to uncover errors in
program function, behavior and performance.
Software is tested from two different perspectives:

(1) Internal program logic is exercised using White Box test case design
techniques.
(2) Software requirements are exercised using Black Box test case design
techniques. In both cases, the intent is to find maximum number of errors with
minimum effort and time.
System test objective and scope
53
The main aim to test this is to insure that:
The Proposed system permits only secure and authenticate access.
Thus requires the user to enter the URL in correct format.
Does all validation time to time as per the need.
Takes a single input as user id for the detection of anomalies that is used to generate the
recommendations.
Does all the ranking calculations internally.
Appropriates alerts are generated as per the condition for user convenience.
Database is updated time to time as the user transaction process proceeds.
A full system test will be conducted including following type of tests:

Functional testing:
To be truly robust, application require more than simple functional testing before
release into production.
Permits only secure and authenticate access.
Thus requires the user to be registered with the system before use.
Does all validation time to time as per the need.
Takes a single input as user id for the detection of anomalies that is used to generate the
recommendations.
Does all the conversion of data internally while requires.
Appropriates alerts are generated as per the condition for user convenience.
Database is updated time to time as the user transaction process proceeds.
At least one and preferably all of the following types of testing before releasing
application to customers should be performed.
54
Performance testing
Load testing
Stress testing
Performance Testing
Performance testing is designed to test run-time performance of application within
the context of an integrated system . Proper response time for user actions is critical to
maintaining and enhancing user base.
Load Testing
Load testing demonstrates how the application performs under concurrent user
sessions for typical user scenarios. Setting up common scenarios that execute for a short
period of time allows seeing how the application operates under a multiple-user load.
Stress Testing
Stress test allows examining how the application behaves under a maximum user
load. To stress test application, remove the think time for load scripts and execute the
scripts against the server to overload use of the application. If there are unhandled
exceptions in a stress test, the application may not be robust enough to handle a sudden
unexpected increase in user activity. Stress tests generally execute for a longer period of
time, and can be used to catch difficult-to-diagnose problems like subtle memory leaks in
the application
Items to be tested:
The following items are the ones that constitute the proposed system.
55
Hear we ensure that all the modules, classes and libraries are included when
integrated properly.
No
1.
2.
3.
4.
Name
The page rank algorithm
Keyword relevance algorithm
Web crawler
Database connector
Identifier
C1
C2
C3
C4
Version no
1
1
1
1
Feature to be tested:
Hear we will be testing all features provided by the proposed system to ensure all the
features that distinguish the system are implemented properly. The following are the features that
will be testing here.
a) Registration for users:

Here we check whether the process is working as per Expectations & when incorrect
input is given the system responds with proper error message.
b) Maintain log for each user session:
Here we check that all users log is maintained properly and all tables in the database
are updated properly. With user name, id, recommendations by the user about places
and things.
c) Blocking of unauthorized user to access:
Here with the help of databases table record of the user are maintained So that the
unauthorized user are unable to access the account & in order to do this proper
update must be done.
d) User & system interaction be proper:
Here we check whether the entire process is working properly.
Approach:
56
Testing accomplishes a variety of things, but most importantly it measures the

quality of the software that is developed. This view presupposes that there are defects in
the software waiting to be discovered.
We are doing testing manually. No tools are used for this testing. The testing would be
conducted until we ensure that all the features are supposed to be provided by the system are
working well and good security is provided.
Testing is never ending activity but there must be Limitation to every activity. We build a
module & testing would be done on it ensuring everything is ok. When entire thing is ready all
kinds of testing like use case testing, unit testing, GUI testing.
Pass/Fail Criteria:
Here pass fail criteria are defined in the 1 & 2 modules. The proposed system is pass
if failures observed in the module rate is low at the time of user registration. If the user are writes
wrong information then the system fails & associated defect must be recorded.
Suspension & resumption criteria:

Suspension of testing will be done at the end of the day & resumed on next morning.
Testing will suspended if
1.
2.
3.
4.
Hardware failure occurs.

The system is unable to accept a valid input.
If server is not responding.
If software crash or damage due to some internal problem.
When a defect causing software failure is repaired, the new version of the software
undergoes a regression test. If the new version passes the regression test, then normal
testing can resume.
Test Deliverables:
The Following are the resultants of testing:
1. Test plan
2. Test cases
3. Test procedure sections
4. Test summary reports
5. Test logs
Test Tasks:
57
The following is the list of testing task:

1.
2.
3.
4.
5.
Preparing the test plan & attachments.

Preparing the test design specifications.
Preparing test cases specification
Transmitting test related data to configuration management group.
Supervising testing staff & organizing the test related measurements.
Test Environment:
Software requirements:
Category (Software tools)
Software Name
Operating System
Microsoft Windows XP
Front End Development Tool
.net 2003
Front End
VC++,C#.net
Back End
MS access, Files
Hardware requirements:
Hardware
Minimum Requirement
Microprocessor
Intel 3.1GHz processor.
Random Access Memory (RAM)
512 MB RAM.
Hard Disk Drive
20GB
(min.
free
useable
space).Network
Responsibilities:
Sr.no
Name
Designation
Task
disk
58
Dnyaneshwari
Test Manager
Nitu
Test Manager
3.
Dnyaneshwari
Test Engineer
4.
Nitu
Test Engineer
Generate test plan

Identify test cases
Generate test summary
Perform login module
testing, perform
preprocessing module
testing, execute testes
Perform feature extraction
module testing, generate test
harnesses. Check result
Perform Validation testing,
manage equipment like
laptop.
Record executed test results.
Staffing & Training Needs:

Tester must be an engineer or from technical background. He must know how to handle
system.
Risks:
1. Power failure.
2. Hardware failure.
3. Server crashes.
4. Unable to handle site.
Test case and Test Case Summary
MODULE INFORMATION: PAGE RANK ALGORTIHM
GENERAL INFORMATION:
PRODUCT NAME: RULES ENGINE
2. EXECUTION INFORMATION:
59
PRODUCT VERSION: 1.1

DURATION: 7 DAYS
TESTED BY: DNYANESHWARI CHANDARANA
REVIEWED BY: SINGH NITU
3. TEST SUIT INFORMATION:

COMPONENT NAME: PAGE RANK ALGORITHM
TEST TYPE: USE CASE TESTING (MANUAL)
4. TEST METHODOLOGY:
STEP1: ENTER URL
STEP2: CHECK IF THE ENTERED URL IS IN CORRECT FORMAT
STEP3: MAKE CONNECTIONS OVER THE INTERNET
STEP4: COMPUTE PAGE RANK
60
Test Items to be
id
tested
Steps
Input
Actual output
Expected output
Pass/fail
1.
User enters the

URL
URL address
Display
success
Display message
successful
Pass
URL
System assigns Make

system connects connections
to web
61
2.
System check
for proper
address entered
by the user
System
compares the
data entered by
the user and the
data present in
the database.
If address is
valid
Make
connection
Make connection
Pass
If address is
invalid
Report
improper
address
Report error
Pass
3.
System
computes page
rank
System
downloads
relevant pages
from the web.
4.
User enters
URL to
compute rank
by keyword
relevance
algorithm
System checks
if the URL
entered is in
correct format.
URL address
Display
message
successful
Display message
successful
Pass
TEST CASE SUMMARY:

Sr. no
Test cases
Descriptions
To check whether the user has entered proper URL.
To check whether the system makes proper connections with internet.
62
To check whether the system computes page rank properly.
To check whether high ranked pages are displayed first.
To check whether user has selected the correct application which is set by
admin to him/her
MODULE NAME: DATABASE CONNECTOR

GENERAL INFORMATION:
PRODUCT NAME: RULES ENGINE
EXECUTION INFORMATION:
PRODUCT VERSION: 1.2
DURATION: 7 DAYS
TESTED BY: SINGH NITU
REVIEWED BY: DNYANESHWARI CHANDARANA
TEST SUIT INFORMATION:
COMPONENT NAME: DATABASE CONNECTOR
TEST TYPE: USE CASE TESTING (MANUAL)
TEST METHODOLOGY:
1. User fills his/her information in the space provided.
2. User recommends about places/things.
3. Recommendation made by other user is shown to all.
DETAIL TEST CASES:
63
TE
ST
ID
ITEM TO BE
TESTED
STEPS
INPUT
ACTUAL
OUTPUT
EXPECTED
OUTPUT
PASS
OR
FAIL
1.
User fill his

name ,likes
about place and
thing
User selects
application from the
application list
2.
Names of the
user
User fills his/her name
user fill that

name that is not
already present
in database
User name
username
pass
System checks if
duplicated records are
present in the database
Display error if
present
Display error
if present
3.
4.
User fills
System updates
recommendation information and
assigns user vote to
that particular
place/thing
User
recommendatio
n
5.
User searches to
see the
popularity of
particular
place/thing
Place/thing
The most popular

record
The most
pass
popular record
System checks in the

database to show the
popularity
TEST CASE SUMMARY:

SR.
NO
TEST ID
TEST ACTIONS
To check whether personal information filled by user is in proper format
64
To check whether user gives recommendation to a particular place/thing
To check whether the user recommendation of one user is made available to all
To check whether the search result of user is carried properly.
CHAPTER 8
FURTHER WORKS
Future work:
We can develop different ranking algorithm
We can develop different connectors
User can use different connectors and ranking algorithms to rank different types of data
To do this user just has to add an extra tab and insert his code inside the framework
65
CHAPTER 9
SCREENSHOTS
66
67
68
69
CHAPTER 10
70
REFERENCES
BOOKS:
1. Roger pressman Software Engineering
2. Information Retrieval
WEBSITES:
1)http://blog.taragana.com/index.php/archive/clean-room-implementation-of-google-page-rankalgorithm/
2)http://www.stanford.edu/group/reputation/ClickThroughAlg_Tutorial.pdf
3)http://kojotovski.diinoweb.com/files/The_mathematical_model_of_Google.pdf
4)http://citeseer.ist.psu.edu/cache/papers/cs/7144/http:zSzzSzwwwdb.stanford.eduzSz~backrubzSzpageranksub.pdf/page98pagerank.pdf
5)http://www.suchmaschinen-doktor.de/index.html
6)http://wwwhome.math.utwente.nl/~litvakn/IntMath07.pdf
7)http://www2006.org/programme/files/xhtml/3101/p3101-Richardson.html
8)http://www.texaswebdevelopers.com/docs/pagerank.pdf
8)http://pr.efactory.de/e-pagerank-implementation.shtml
8)http://www.rankforsales.com/n-aa/095-seo-may-31-03.html
9)http://www.pwqsoft.com/search-engine-ranking.htm#case2
10)http://www.webworkshop.net/pagerank.html
11)http://www.ianrogers.net/google-page-rank/
71
12)http://www.webworkshop.net/pagerank_calculator.html
13)http://www.linkingmatters.com/WhyLinkingIsImportant.html
14)http://www.example-code.com/vcpp/spider_simplecrawler.asp
15)http://en.wikipedia.org/wiki/Web_crawler
16)http://www.codeproject.com/KB/library/GomzyHTMLReader.aspx
17)http://en.wikipedia.org/wiki/CURL
18)http://en.wikipedia.org/wiki/Dynamic-link_library
19)http://en.wikipedia.org/wiki/WinRunne
20)http://www.nokiasoftware.net/general-discussions/19871-net-framework.html
21)http://cache.phazeddl.com/1412686/Microsoft%20Visual%20Studio%206.0
22)www.rocw.raifoundation.org/management/mba/.../lecture-10.pdf
23)http://en.wikipedia.org/wiki/PageRank#Algorithm

Project Report

Hochgeladen von

Dokumentinformationen

Originaltitel

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Project Report

Hochgeladen von

Copyright:

Verfügbare Formate

INDEX

4.Project Management Plan------------------------------------------------------------------ 22-24

5.Software Requirements Specifications (SRS)----------------------------------------- 25-29

6.Software Design Description-------------------------------------------------------------------- 30-50

8. Further Works -------------------------------------------------------------------------------------- 65

To implement Rules Engine For Popularity Based Ranking

It has many In-links or few but highly ranked

Has few Out-links

How Page Rank Works:

Or (N = the number of documents in collection)

Software Process Model [22]:

In our software the incremental model was implemented as:

Implementation of page ranking algorithm

Implementation of page ranking algorithm with different damping factor

Implementation of hash table

Implementation of indexing technique

Implementation of keyword relevance algorithm

Implementation of user interface

Implementation of database connector

Merging of all the modules

Roles and Responsibility:

Finalizing the project problem definition

2. Project Team Members & their Assignments:

Studying different ranking algorithms

Page ranking algorithms

Finalizing the ranking algorithms to be implemented

Understanding the need for database connector

Generate Test Plan Identify Test Cases

Tools and Techniques:

2. At Stage 2 of our project we prepared a detailed project plan.

3. At Stage 3 the entire designing of the project was done.

6. And finally at Stage 5 the Development of code was done.

Brief description of various tools used:

2. Microsoft .net framework [20]: The Microsoft .NET Framework is a software

: Page ranking algorithm

Dependencies & Constraints

Risk & Contingencies

Microsoft Visual studio 2003

Task name: Keyword relevance algorithm

Microsoft Visual studio 2003

Dependencies & Constraints

1) To implement ranking algorithms.

Finalizing problem statement

Discussion with external guide

Submission of final synopsis

Discussion with internal guide

Implementation and coding

Testing and debugging

Documentation and preparation

766 MHz Pentium or compatible processor (1.5 GHz Pentium recommended)

512 MB RAM more recommended).

Front End : VC++(2003),C#.net

Operating System : Windows XP

Software System Attributes:

NOT NULL,PRIMARY KEY

System interface design:

Detail description of components

DLL, is Microsofts implementation of the shared library concept in the Microsoft

c. Explicit run-time linking

2. Component Object modeling:

3. Library file libcurl-7.19.3-win32-ssl-msvc, HTMLReader_src [17]:

5. Web crawler [15]:

Get the URL