
Synopsis

1. Topic of the Project: The WWW is one of the most important sources of information, but there is no guarantee that the information is correct: search engines retrieve a great deal of conflicting material, and the quality of the information provided varies from very low to very high. We provide enhanced trustworthiness for both specific (entity) and broad (content) queries in web search. The trustworthiness filtering is based on five factors: Provenance, Authority, Age, Popularity, and Related Links.

Contextual computing refers to the enhancement of a user's interactions by understanding the user, the context, and the applications and information being used, typically across a wide set of user goals. Contextual computing is not just about modeling user preferences and behavior or embedding computation everywhere; it is about actively adapting the computational environment, for each and every user, at each point of computation. With respect to personalized search, the contextual computing approach focuses on understanding the information consumption patterns of each user, the various information foraging strategies and applications they employ, and the nature of the information itself. Focusing on the user enables a shift from what we call consensus relevancy, where the relevancy computed for the entire population is presumed relevant for each user, toward personal relevancy, where relevancy is computed for each individual within the context of their interactions. The benefits of personalized search can be significant, appreciably decreasing the time it takes people, novices and experts alike, to find information.
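To make the five-factor filtering concrete, here is a minimal sketch of how the factors could be combined into a single trustworthiness score. It assumes each factor has already been normalised to a value between 0 and 1; the weights are hypothetical tuning parameters chosen for illustration, not values prescribed by the referenced work.

# Minimal sketch: combining the five trustworthiness factors into one score.
# The factor names follow the synopsis; the weights and the [0, 1]
# normalisation are assumptions made for illustration only.

FACTOR_WEIGHTS = {
    "provenance": 0.25,     # origin of the information provider
    "authority": 0.25,      # domain-specific authority
    "age": 0.15,            # lifespan of time-dependent information
    "popularity": 0.20,     # how widely the site is visited
    "related_links": 0.15,  # links from trusted websites
}

def trust_score(factors):
    """Weighted combination of normalised factor scores (each in [0, 1])."""
    return sum(FACTOR_WEIGHTS[name] * factors.get(name, 0.0)
               for name in FACTOR_WEIGHTS)

# Example: a well-sourced but relatively new page.
print(trust_score({"provenance": 0.9, "authority": 0.8, "age": 0.3,
                   "popularity": 0.7, "related_links": 0.6}))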

2. Objective/Aim/Vision: The objective of this project is to build a web-based search engine which overcomes the limitations of traditional search engines by incorporating the following features: (1) double filtering of search results using link rank as well as the domain information of websites, (2) filtering of query results based on the user's interests, (3) offline working, and (4) personalisation of the search engine.

3. Area of Project: Data Mining, Search Engine Optimization.

4. Brief Description: The Personalised Enhanced Information Retrieval System is a search engine which accepts a query and searches valid websites for results. The Internet contains websites which may be trustworthy or untrustworthy; our aim is to separate the trustworthy websites from the untrustworthy ones. The system works on two sides, the client side and the server side. The server side requires a high-speed Internet connection and high processing ability, whereas the client side needs only a web browser that can display HTML pages.

The operations performed at the server side are crawling, ranking, searching, and the processing of user queries and sending of responses. Crawling is the process which tries to visit every website present on the web and collect information about it; it is an unending process which keeps looking for new websites. The crawler deposits some data about each URL it visits, such as the HTTP header, the meta tags, and part of the content, which is later used by the searcher. Ranking takes into consideration parameters such as inlinks, outlinks, content length, age, and authority, and assigns a score to each URL; it is important because it eventually decides the order of URLs in the final search results. We propose a two-level ranking system, one level at index time and the other at query time: at index time we use a Link Rank algorithm similar to Google's PageRank algorithm, and at query time we estimate the authority of websites and rank the results accordingly (a sketch of this combination is given below). Searching takes the user's query and tries to find relevant data, with minimum delay, from the vast set of information collected by the crawler.

At the client side, a web page opened in the browser accepts the query from the user and sends it to the server; the server creates a response which is shown in the web browser, and the user can page through the results. We also propose a mechanism which helps the user to get search results that are very close to their interests. The engine works according to the search history of the logged-in user: the user has to log in first, a record of every search is kept in a database, and these records are used for subsequent searches. If the Internet is not available, searches are performed against this database stored on disk (a sketch of the history-based personalisation and offline fallback follows the ranking sketch below). The innovativeness of this project is that we combine all three of these things in one engine.
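The two-level ranking could look roughly like the following sketch. It assumes that a PageRank-like link_rank score is stored with each document at index time and that a per-domain authority score is available at query time; the function names, the 0.6/0.4 weights, and the authority lookup are illustrative assumptions, not the exact algorithms of the proposed system.

# Sketch of the proposed two-level ranking: an index-time Link Rank score
# combined with a query-time authority score. Names and weights here are
# illustrative assumptions only.

def query_time_score(doc, domain_authority):
    # doc["link_rank"] is assumed to have been computed at index time
    # (PageRank-like); authority is looked up per domain at query time.
    authority = domain_authority.get(doc["domain"], 0.0)
    return 0.6 * doc["link_rank"] + 0.4 * authority

def rank_results(candidates, domain_authority):
    # Re-rank the candidate documents returned by the searcher.
    return sorted(candidates,
                  key=lambda d: query_time_score(d, domain_authority),
                  reverse=True)

# Example usage with two hypothetical candidate documents.
docs = [
    {"url": "http://example.org/a", "domain": "example.org", "link_rank": 0.42},
    {"url": "http://example.com/b", "domain": "example.com", "link_rank": 0.35},
]
print(rank_results(docs, {"example.com": 0.9, "example.org": 0.2}))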

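The history-based personalisation and offline fallback can be pictured with the sketch below. It assumes a simple SQLite table of past queries per logged-in user; the schema, the term-overlap boosting heuristic, and the field names are assumptions made for illustration.

# Sketch of history-based personalisation backed by an on-disk database,
# which can also be queried when no Internet connection is available.
# The schema and the boosting heuristic are illustrative assumptions.
import sqlite3

conn = sqlite3.connect("search_history.db")
conn.execute("CREATE TABLE IF NOT EXISTS history (user TEXT, query TEXT, clicked_url TEXT)")

def record_search(user, query, clicked_url):
    # Keep a record of every search made by a logged-in user.
    conn.execute("INSERT INTO history VALUES (?, ?, ?)", (user, query, clicked_url))
    conn.commit()

def interest_terms(user):
    # Terms from the user's past queries, used to boost matching results.
    rows = conn.execute("SELECT query FROM history WHERE user = ?", (user,))
    terms = set()
    for (q,) in rows:
        terms.update(q.lower().split())
    return terms

def personalise(user, results):
    # Boost results whose titles overlap with the user's past interests.
    terms = interest_terms(user)
    def boosted(r):
        overlap = len(terms & set(r["title"].lower().split()))
        return r["score"] + 0.1 * overlap
    return sorted(results, key=boosted, reverse=True)

record_search("alice", "trustworthy search engines", "http://example.org/paper")
print(personalise("alice", [{"title": "Trustworthy web search", "score": 0.5},
                            {"title": "Cooking recipes", "score": 0.6}]))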
5. Market Potential & Competitive Advantage/Scope: PageRank: PageRank results from a mathematical algorithm based on the webgraph, which has all World Wide Web pages as nodes and hyperlinks as edges, and which takes into consideration authority hubs such as cnn.com or usa.gov. The rank value indicates the importance of a particular page, and a hyperlink to a page counts as a vote of support. The PageRank of a page is defined recursively and depends on the number and PageRank of all the pages that link to it (its "incoming links"): a page that is linked to by many pages with high PageRank receives a high rank itself, while a page with no incoming links receives no such support.
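A minimal power-iteration sketch of the PageRank computation described above is given below. The damping factor of 0.85 is the conventional choice, and the three-page link graph is a toy example.

# Minimal PageRank sketch using power iteration over a small toy webgraph.
# links[u] is the list of pages that u links to.

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = links[p]
            if not out:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
            else:
                for q in out:
                    new_rank[q] += damping * rank[p] / len(out)
        rank = new_rank
    return rank

# Example: C is linked to by both A and B, so it ends up with the highest rank.
print(pagerank({"A": ["C"], "B": ["C"], "C": ["A"]}))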

Authority-hub Analysis: The authority of a website is a measure of how popular the site is. Normally, search engines index the pages which are most popular, that is, pages which are seen by many users and whose content is good; websites with high authority are indexed so that they can be found faster. But does authority lead to accuracy of information? Unfortunately, no. Top-ranked websites are usually the most popular ones, but popularity does not always mean accuracy; popularity is often correlated with trust, but not necessarily.
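The authority-hub idea comes from Kleinberg's HITS algorithm (reference [2]). Below is a minimal sketch of the mutually reinforcing authority and hub scores, computed on an assumed toy link graph.

# Minimal sketch of Kleinberg's HITS authority-hub computation [2].
# links[u] lists the pages that u links to; the toy graph is illustrative.

def hits(links, iterations=50):
    pages = list(links)
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
        # A page's hub score is the sum of the authority scores of pages it links to.
        hub = {p: sum(auth[q] for q in links[p]) for p in pages}
        # Normalise so that the scores stay bounded.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return auth, hub

# Example: C is pointed to by both A and B, so C gets the highest authority.
authority, hubs = hits({"A": ["C"], "B": ["C"], "C": []})
print(authority, hubs)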

The WIQA Information Quality Assessment Framework is a set of software components for filtering information using different quality-based information filtering policies. It can be employed by applications which process information of uncertain quality and want to enable users to filter that information using different policies. The framework has been designed to fulfil two requirements. Flexible representation of quality-related meta-information: information quality assessment relies on a wide range of quality indicators, and which indicators are relevant depends on the application domain and on the quality dimensions to be assessed. Important quality indicators in the context of web-based information systems are provenance information, ratings, and background information about information providers; the WIQA framework therefore uses Named Graphs as a flexible data model for representing information together with quality-related meta-information. Support for information filtering policies: the relevance of the different quality dimensions, and the metrics used to assess them, depend on the application domain, the quality indicators available, the task at hand, and the subjective preferences of the information consumer, so consumers use a wide range of policies for deciding whether to accept or reject information. The WIQA framework allows such policies to be employed for filtering; policies are expressed in a declarative policy language and can combine context-, content-, and rating-based quality assessment metrics.
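The policy-based filtering can be pictured with the following generic sketch. This is not the WIQA policy language or its Named Graphs data model; it is only an assumed Python stand-in showing how a combined provenance- and rating-based acceptance policy might be applied to statements that carry quality metadata.

# Generic illustration (not the actual WIQA policy language): a filtering
# policy is a predicate over statements with quality metadata, and filtering
# keeps only the statements the policy accepts. All names are assumptions.

TRUSTED_PROVIDERS = {"gov.example", "university.example"}  # assumed trust list

def provenance_and_rating_policy(statement):
    # Accept a statement only if its provider is trusted and its average
    # rating is at least 3 out of 5 (thresholds chosen for illustration).
    meta = statement["meta"]
    return meta["provider"] in TRUSTED_PROVIDERS and meta["rating"] >= 3.0

def apply_policy(statements, policy):
    return [s for s in statements if policy(s)]

statements = [
    {"content": "Fact A", "meta": {"provider": "gov.example", "rating": 4.5}},
    {"content": "Fact B", "meta": {"provider": "blog.example", "rating": 4.8}},
]
print(apply_policy(statements, provenance_and_rating_policy))  # keeps Fact A only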

6. Innovativeness & Usefulness: Most of today's search engines work on the concept of popularity, but popularity alone is not sufficient to obtain the most relevant search results. We propose a new ranking concept which depends on five factors: Provenance, the origin of the information provider; Authority, which is domain specific; Age, the lifespan of time-dependent information; Popularity, how widely a website is visited; and Related Links, links from trusted websites. In addition, as described in the Brief Description, search results are personalised using the logged-in user's search history, and the on-disk record of past searches allows queries to be answered even when the Internet is not available. The innovativeness of this project is that it combines these three ideas, multi-factor trustworthiness ranking, personalisation, and offline search, in a single engine.

7. Process Description/Methodology Adopted: The development process of the entire proposed software system has to be stated in brief. Summarize the methodology to be adopted for the project development. This may be supported by DFDs / flowcharts.
8. Resources: Hardware resources: Client side: a computer with basic processing ability, a web browser, and an Internet connection. Server side: a system with high computational power, a high-speed Internet connection, and a large amount of data storage.

9. Limitations: More stages of filtering require more computational power and increase the time needed to answer a query.

10. Duration: This project may take 25 to 30 weeks for completion.

11. References:
[1] Sumalatha Ramachandran, Sujaya Paulraj, Sharon Joseph and Vetriselvi Ramaraj, "Enhanced Trustworthy and High-Quality Information Retrieval System for Web Search Engines".
[2] Jon M. Kleinberg, "Authoritative Sources in a Hyperlinked Environment".
Internet: http://www.seomoz.org/learn-seo/domain-authority
