ResearchEngine - SRS

Software Requirements Specification Research Engine (Codename: iBlue)
TEAM NAME: Tech Buddy

REiBlue_1.0.3.2841 (Matured Alpha)
TEAM MENTOR: Mrs. Hema Malini, Asst. Prof. in SASTRA University
TEAM MEMBERS: Ashwanth Kumar, Kirubaharan A., Salaikumar @ Saravanan M., Swetha S.
Table of Contents
Unit I: Introduction
  Modes of Research Engine
  Purpose of Research Engine
  Technologies to be used
  Hardware Requirements
  Software Requirements
  iBlue Class hierarchy
Unit II: Web Search Component
  Use Case Diagram
  Activity Diagram
  Sequence Diagram
  Tools/Libraries used in implementation
Unit III: Knowledge Engine Component
  Use Case Diagram
  Activity Diagram
  Sequence Diagram
  Tools/Libraries used in implementation
Unit IV: Code Search Component
  Use Case Diagram
  Sequence Diagram
  Activity Diagram
  Tools/Libraries used in implementation
Software Requirements Specification
REVISIONS
1. Research Engine Alpha (REiBlue_1.0.1.1086)
   Initial design of the Research Engine (Semantic Search Engine), using NLP for query processing and refinement.
   Authors: Salaikumar @ Saravanan and A. Kirubaharan
2. Research Engine Matured Alpha (REiBlue_1.0.3.2841)
   Improved design using world-class standard components, and implementing the knowledge engine, code search, and a public SPARQL endpoint for Linked Data.
   Authors: Salaikumar @ Saravanan, Kirubaharan A., Ashwanth Kumar, and Swetha S.
Document base URL: http://re-iblue.co.cc/docs/
Project Scenario: Research Engine Team Name: Tech Buddy
Date: 8th December, 2010. Software Requirement Specification
UNIT I
Introduction
Research Engine (code name: iBlue) is a semantic search engine for the people of the 21st century. In brief, it aims to combine the elegance of Google with the power of the Wolfram Alpha knowledge engine. It is the outcome of the efforts of four people who were not happy with the way information is available on the internet, or with how difficult it is to get what you want, especially when you are not sure what you want. If we want the website that contains the best cookery information, or the best-known site for movie ratings, we use Google. When we want to know how a scientific expression is derived, how exactly it is put to use, or other related science concepts, we are forced to use Wolfram Alpha (though in its alpha stage, it has a really large entity index of scientific and technological information). We need a system that combines the power of both of these technologies to provide users with what we call "Instant Answers to all your Questions!"
Welcome to Research Engine (code name: iBlue). We hope you like using it as much as we do.
Modes of Operation
It operates in three modes:
1. Web Search - A Google-like search interface that uses a clustering algorithm to categorize your results, so that you can easily identify what you want.
2. Knowledge Engine - Works much like Wolfram Alpha. All the data in the knowledge base is obtained from Wikipedia (updated till 4th April, 2010).
3. Code Search - Helps you search open source code available on SF.net, GitHub, Google Code, and other public SVN repositories.
Tech Buddy / SASTRA University / Tamil Nadu
1 | iBlue Research Engine SRS v1.0.3.2841
Purpose of Research Engine

Research Engine (from now on referred to as iBlue) was built from the ground up to be a Semantic Knowledge Engine. Though the initial idea was to build a search engine, as we worked on it we understood the need for structured data online. Information repositories like Wikipedia contain structured, human-edited information on various subjects, but their contents are not machine-ready. iBlue is thus a semantic search engine: it provides search to its users based on their interests (syndicated from social networking sites, Facebook in our case), and the semantic results are quantized on the basis of their preferences before being presented. Code Search is a labs feature which should allow any user (especially students) to browse through the large canopy of free and open source code available online. Knowledge Engine is the implementation of an inference engine which stands on top of semantic data (information from Wikipedia), available in the Web Ontology Language (OWL) format.
Technologies and Tools used in the implementation:

Rational Software Architect (RSA):
IBM Rational Software Architect, (RSA) made by IBM's Rational Software division, is a comprehensive modeling and development environment that uses the Unified Modeling Language (UML) for designing architecture for C++ and Java 2 Enterprise Edition (J2EE) applications and web services. Rational Software Architect is built on the Eclipse open-source software framework and includes capabilities focused on architectural code analysis, C++, and model-driven development (MDD) with the UML for creating resilient applications and web services.
J2EE:
Java Platform, Enterprise Edition or Java EE is a widely used platform for server programming in the Java programming language. The Java platform (Enterprise Edition) differs from the Java Standard Edition Platform (Java SE) in that it adds libraries which provide functionality to deploy fault-tolerant, distributed, multi-tier Java software, based largely on modular components running on an application server.
Apache Hadoop:
Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
IBM WASCE:
IBM WebSphere Application Server Community Edition (WASCE) is a free, certified Java EE 5 application server for building and managing Java applications. It is IBM's supported distribution of Apache Geronimo, and uses Tomcat as its servlet container and Axis 2 for web services.
IBM DB2 Express C:

IBM DB2 Express-C is a free to download, use and redistribute edition of the IBM DB2 data server, which has both XML database and relational database management system features.
Semantic Web (Web 3.0):

Semantic Web is a group of methods and technologies to allow machines to understand the meaning - or "semantics" - of information on the World Wide Web.
Jena API:
Jena is an open source Semantic Web framework for Java. It provides an API to extract data from and write to RDF graphs. The graphs are represented as an abstract "model". A model can be sourced with data from files, databases, URLs or a combination of these. A Model can also be queried through SPARQL and updated through SPARUL.
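The idea of a model as a graph of triples that can be matched against a pattern can be illustrated with a few lines of code. The following is only a sketch in plain Python and is not the Jena API: the `TripleGraph` class, its `match` method, and the sample `ex:` data are all hypothetical and exist purely for illustration.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleGraph:
    """Toy in-memory graph of (subject, predicate, object) triples.

    Hypothetical illustration only; this is not the Jena Model API.
    """

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples that fit the pattern, in sorted order.

        None acts as a wildcard, much like a variable in a SPARQL
        triple pattern.
        """
        for s, p, o in sorted(self._triples):
            if subject is not None and s != subject:
                continue
            if predicate is not None and p != predicate:
                continue
            if obj is not None and o != obj:
                continue
            yield (s, p, o)


# Sample data (made up for the example).
graph = TripleGraph()
graph.add("ex:Chennai", "ex:locatedIn", "ex:TamilNadu")
graph.add("ex:Thanjavur", "ex:locatedIn", "ex:TamilNadu")
graph.add("ex:Chennai", "rdf:type", "ex:City")

# "Which subjects are located in ex:TamilNadu?"
located = [s for s, _, _ in graph.match(predicate="ex:locatedIn", obj="ex:TamilNadu")]
```

A real deployment would load the triples from files or a database and query them through a SPARQL engine; the wildcard matching above only shows the shape of the data and of a single triple pattern.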
Hardware Requirements
Minimum Configuration
1 node running - Pentium 4 processor, 1 GB RAM, 80 GB HDD, DVD optical drive, and broadband internet connection.
Recommended Configuration
2 to 100 nodes running - Intel Pentium processor, 512 MB RAM, 40 GB HDD, high speed network connectivity (optic fiber recommended), and uninterrupted power supply.
Software Requirements
1. Ubuntu 10.04 or any Linux-based operating system.
2. Java 1.6, preferably from Oracle; the JAVA_HOME variable must be set to the JVM home directory.
3. The SSH package must be installed (used by Hadoop to contact other nodes on the network).
4. WebSphere Application Server Community Edition, WebSphere Application Server, or any other equivalent application server must be installed.
5. IBM DB2 Express edition or DB2 Enterprise edition.
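Two of the requirements above (JAVA_HOME and SSH) can be checked automatically before a node joins the cluster. The following is a minimal sketch, not part of iBlue: the function name `check_prerequisites` and its messages are hypothetical, and it only verifies that JAVA_HOME points to a directory and that an `ssh` client is on the PATH. It does not check the Java version, the SSH server, the application server, or DB2.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def check_prerequisites(
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed.

    `env` is the environment to inspect (normally os.environ) and `which`
    locates an executable on the PATH; both are parameters so that the
    function can be exercised without touching the real machine.
    """
    problems: List[str] = []

    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append("JAVA_HOME does not point to a directory: " + java_home)

    if which("ssh") is None:
        problems.append("ssh client not found on PATH")

    return problems
```

On a node, calling `check_prerequisites(os.environ)` and printing each returned problem gives a quick readiness report.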
iBlue Class Hierarchy
The list of all classes used and implemented in iBlue can be visualized as above. Each class, with its methods and required fields, is mapped under its respective package. There are some more dependencies and open source tools used in the project which do not come into this class hierarchy. You can find the complete list of components used in each module at the end of the design of each module.
UNIT II
Web Search Component
The web search module enables users of the site to perform text-based searching on the entire web.
Use Case Diagram for Web search component

The user enters a query, which is then improved using the ontology of OpenCyc concepts. The improved query is then searched against the index. The index gives the weighted list of results, based on the query and semantics. This result set is then clustered by a cluster engine using the Lingo algorithm.
When a user is logged in, the search results from the indexer are further improved using the user's connections and their ontology. The rest of the process, namely the clustering of results and the output formats, is the same.
The web component can also be used via AJAX. The WebSearch servlet also supports a REST-based search API.
Activity diagram for Web search component
The user activity with the web search module is explained above. The same sequence is also represented in the following sequence diagram. The activities are:
UserQuery - The user gives a query in the form of text or keywords to search the web.
isUserAuthenticated - Returns true if the user is logged in, else false.
SearchIndex - Searches the index for patterns matching the UserQuery.
Quantization - Filters the fetched URLs based on the user's likes and dislikes, interests, activities, etc. (available only to logged-in users).
AddToWebHistory - Adds the user's search query to the WebHistory table in the system database, for later retrieval and filtering upon subsequent relevant queries.
Clustering - Groups the results of the UserQuery based on the semantics of the results. It uses the Lingo algorithm to sort the search results into different categories.
Result - Contains the quantized, filtered, and then clustered results for the user. It may be in any of the following forms: text/xml, text/json, text/html.
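The activities listed above can be strung together as a small pipeline. The following is only a sketch under several assumptions and is not iBlue's implementation: the index, the category labels, and all function names are hypothetical; Quantization is reduced to moving results from the user's interest categories to the front; Clustering is a naive grouping by a precomputed label, not the Lingo algorithm; and the sketch assumes that only logged-in users' queries are added to the web history. Python is used just to keep the example short.

```python
import json
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Hypothetical toy index: term -> URLs. The real index is built by the crawler.
INDEX: Dict[str, Set[str]] = {
    "python": {"http://example.org/python-lang", "http://example.org/python-snake"},
    "snake": {"http://example.org/python-snake"},
    "tutorial": {"http://example.org/python-lang"},
}
# Hypothetical category label per URL, standing in for the cluster engine.
CATEGORY: Dict[str, str] = {
    "http://example.org/python-lang": "Programming",
    "http://example.org/python-snake": "Animals",
}
WEB_HISTORY: List[str] = []  # stands in for the WebHistory table


def search_index(query: str) -> List[str]:
    """SearchIndex: rank URLs by the number of query terms they match."""
    scores: Dict[str, int] = defaultdict(int)
    for term in query.lower().split():
        for url in INDEX.get(term, set()):
            scores[url] += 1
    return sorted(scores, key=lambda url: (-scores[url], url))


def quantize(urls: List[str], interests: Set[str]) -> List[str]:
    """Quantization (simplified): results in the user's interest categories first."""
    return sorted(urls, key=lambda url: CATEGORY.get(url) not in interests)


def cluster(urls: List[str]) -> Dict[str, List[str]]:
    """Clustering (stand-in for Lingo): group URLs by their category label."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[CATEGORY.get(url, "Other")].append(url)
    return dict(groups)


def web_search(query: str, interests: Optional[Set[str]] = None) -> str:
    """Run the steps in order; `interests` is None for a guest user."""
    urls = search_index(query)
    if interests is not None:        # isUserAuthenticated
        urls = quantize(urls, interests)
        WEB_HISTORY.append(query)    # AddToWebHistory (assumed: logged-in users only)
    return json.dumps(cluster(urls), sort_keys=True)  # Result as text/json
```

For example, `web_search("python snake", {"Animals"})` returns a JSON object whose keys are the category names and whose values are the ranked URLs in each category.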
Tools / Libraries used in Implementation:

1. Apache Nutch (http://nutch.apache.org) - Nutch is open source web-search software. It builds on Lucene and Solr, adding web-specifics such as a crawler, a link-graph database, and parsers for HTML and other document formats. Nutch can run on a single machine, but gains a lot of its strength from running in a Hadoop cluster.
2. Cloudera's Distribution for Hadoop (http://www.cloudera.com/hadoop/) - Cloudera's Distribution for Hadoop (CDH) is a packaged Hadoop-based data management platform, intended to accelerate the deployment of Apache Hadoop in an organization.
3. Facebook Connect API (http://developers.facebook.com/) - Facebook's APIs enable us to create social experiences to drive growth and engagement on our web site. User context is derived from the Open Graph protocol of Facebook.
4. Apache Lucene (http://lucene.apache.org/) - Apache Lucene is a free/open source information retrieval software library. It is supported by the Apache Software Foundation and is released under the Apache Software License.
Web search module sequence diagram

UNIT III
Knowledge Base Engine (KBEngine)
The Knowledge Base Engine does all the semantic processing of data from the WWW. Currently the dataset is limited to 3 million articles from Wikipedia (as of July, 2010). The data is published in Linked Data format to be compatible with Open Calais, DBpedia, OpenCyc, etc. It can also be used as an analytical engine, computational engine, inference engine, or anything you can think of. It is a very basic knowledge engine, and it can be extended for use under any type of application and requirements.
Use case for Knowledge Engine Component
Users of the component include any user (logged in or guest). No restrictions are applied, since the information has no context of the user related to it. The iBlue Knowledge Engine can be used to query information about any entity on the web.
It can also perform analysis using machine learning algorithms (at the backend); this is implemented separately and is not part of the KBEngine. Administrators can block a particular entity or an entity type (example scenarios include parental control or unmatured information).
KBEngine activity in iBlue
The above activity depicts the usage of KBEngine with iBlue. The general operational activities are as follows:
GetUserQuery - Gets the user query through a REST-based or form-based medium.
TokenizeQuery - Generates the valid tokens of the query.
ASTForm - An intermediate representation of the query in memory, ready for computation or for querying the Knowledge Store.
GetAction - Once the query is tokenized, the corresponding action can be identified from the AST form of the query.
PerformAction - Once the action is identified, execute it.
GenerateOutput - The computed value or information is then syndicated in the form requested by the user (JSON/XML).
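The chain from TokenizeQuery to GenerateOutput can be illustrated with a deliberately tiny example. This is a sketch under stated assumptions, not the KBEngine itself: the knowledge store is a hard-coded dictionary with sample facts, the only query shape recognised is "<property> of <entity>", the "AST" is a flat dictionary, and every function name is hypothetical.

```python
import json
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical knowledge store: (entity, property) -> value. Sample data only.
KNOWLEDGE_STORE: Dict[Tuple[str, str], str] = {
    ("india", "capital"): "New Delhi",
    ("france", "capital"): "Paris",
}


def tokenize_query(query: str) -> List[str]:
    """TokenizeQuery: lower-case alphanumeric tokens, punctuation dropped."""
    return re.findall(r"[a-z0-9]+", query.lower())


def to_ast(tokens: List[str]) -> Dict[str, str]:
    """ASTForm: recognise the single pattern '<property> of <entity>'."""
    if "of" in tokens:
        split = tokens.index("of")
        prop, entity = tokens[:split], tokens[split + 1:]
        if prop and entity:
            return {"action": "lookup", "property": prop[-1], "entity": " ".join(entity)}
    return {"action": "unknown"}


def lookup(ast: Dict[str, str]) -> Dict[str, str]:
    """Action: fetch one fact from the knowledge store."""
    value = KNOWLEDGE_STORE.get((ast["entity"], ast["property"]))
    if value is None:
        return {"error": "no matching fact"}
    return {"entity": ast["entity"], "property": ast["property"], "value": value}


def unknown(ast: Dict[str, str]) -> Dict[str, str]:
    """Action: fallback when the query shape is not recognised."""
    return {"error": "query not understood"}


ACTIONS: Dict[str, Callable[[Dict[str, str]], Dict[str, str]]] = {
    "lookup": lookup,
    "unknown": unknown,
}


def answer(query: str) -> str:
    ast = to_ast(tokenize_query(query))        # TokenizeQuery, ASTForm
    action = ACTIONS[ast["action"]]            # GetAction
    result = action(ast)                       # PerformAction
    return json.dumps(result, sort_keys=True)  # GenerateOutput (JSON)
```

Keeping the actions in a dispatch table means a new kind of query only needs a new AST pattern and a new entry in `ACTIONS`; the `answer` function itself does not change.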
KBEngine Sequence for iBlue
Tools / Libraries used in Implementation

1. OpenCyc (http://sw.opencyc.org/) - OpenCyc is the open source version of the Cyc technology, the world's largest and most complete general knowledge base and commonsense reasoning engine.
2. OpenCalais (http://www.opencalais.com/) - Open Calais is a web service that helps to annotate unstructured text using the OpenCalais ontology.
3. WikiTractor (http://code.google.com/p/wikixtractor/) - Tech Buddy's project for Wikipedia content extraction, which was used to generate ~16 GB of structured data from Wikipedia, using various parsers, as N-Triples RDF for processing.
4. DBpedia public endpoint - All Wikipedia entities were validated against the DBpedia public endpoint.
5. Jena (http://openjena.org/) - Jena is a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL, and SPARQL, and includes a rule-based inference engine. Jena is open source and grew out of work with the HP Labs Semantic Web Programme. The Jena framework includes: an RDF API; reading and writing RDF in RDF/XML, N3, and N-Triples; an OWL API; in-memory and persistent storage; and a SPARQL query engine.
6. SNORQL (http://dbpedia.org/snorql/) - SPARQL explorer for http://dbpedia.org/sparql
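To give an idea of what a line of N-Triples output looks like, the following sketch serializes one statement. It is not taken from WikiTractor: the function names and the `example.org` IRIs are hypothetical, only plain string literals and IRI objects are handled, and only the backslash, double quote, newline, and carriage return characters are escaped.

```python
def escape_literal(text: str) -> str:
    """Escape the characters that cannot appear raw inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriple(subject_iri: str, predicate_iri: str, value: str,
               is_literal: bool = True) -> str:
    """Render one statement as a single N-Triples line.

    When `is_literal` is False, `value` is treated as the IRI of another
    resource instead of a plain string.
    """
    if is_literal:
        obj = '"' + escape_literal(value) + '"'
    else:
        obj = "<" + value + ">"
    return "<" + subject_iri + "> <" + predicate_iri + "> " + obj + " ."


line = to_ntriple("http://example.org/India", "http://example.org/capital", "New Delhi")
```

Because every statement occupies exactly one line, output in this format can be split and processed in parallel, which suits line-oriented batch processing of a large extract.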
UNIT IV
Code Search
Code search is an add-on search feature concentrated purely on indexing and searching open source code online from Apache, Google Code, and SF.net. The programs are classified based on the programming language used, package, license under which they are available, class type, method name, file name pattern, and also custom keywords given by the user.
Code Search Use case diagram
The code search module is very similar to the Google Code Search feature. As said, it is an add-on implementation for the iBlue spider (the web crawler used by all the modules). It mainly concentrates on crawling and indexing open source projects under SVN repositories. Currently it crawls SF.net, the Apache top-level projects, and Google Code Eclipse Labs projects.
Users can search by programming language, product license, file name pattern, custom user query, class, and methods. Users can also contribute an SVN URL for indexing. Administrators, at the backend, can delete any SVN repository that is already indexed by the crawler. The Code Index is available as a service, so that users writing any IDE can utilize the service for providing real-time code completion suggestions.
Code Search sequence for iBlue
The above diagram represents the sequence of actions that takes place in the system with respect to the module. The user makes a query to the Code Search server, which accepts the query and searches the index. The weighted list of results is then returned to the user. Meanwhile, the user's client has the PrettyPrintCode JS framework (similar to Bespin).
Code Search Activity Diagram

The Code Search module's activity diagram follows the same flow as the sequence. It starts with the user query to identify the project, and proceeds up to displaying the project details.
The above activity depicts the typical usage of the code search module. Each activity method is described as follows:
CodeQuery - Query from the user for the code, containing various parameters like language, package, type, class, method, file (regular expression), and license.
CodeIndexer - Analyses the CodeQuery against the code index to determine any valid search patterns.
GetRequiredIndexParameter - Returns the required properties of the index object respective to the user's code query.
CodeSearcher - Searches the indices for valid pattern matches for the CodeQuery.
PrettyPrintOutput - Outputs the result in the preferred format (text/json, text/xml, or text/html) after formatting the code.
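How a CodeQuery with optional parameters can be matched against index entries is sketched below. This is an illustration only, not iBlue's code index: the `CodeEntry` and `CodeQuery` shapes, the function names, and the sample entries are hypothetical, only a subset of the parameters is covered, and a linear scan over a list stands in for a real index lookup.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CodeEntry:
    """One indexed source file (hypothetical shape of an index record)."""
    path: str
    language: str
    license: str
    class_name: str
    methods: Tuple[str, ...]


@dataclass(frozen=True)
class CodeQuery:
    """CodeQuery: every field is optional; None means 'do not filter on this'."""
    language: Optional[str] = None
    license: Optional[str] = None
    class_name: Optional[str] = None
    method: Optional[str] = None
    file_pattern: Optional[str] = None  # regular expression matched against the path


def matches(entry: CodeEntry, query: CodeQuery) -> bool:
    """Return True when the entry satisfies every parameter set in the query."""
    if query.language is not None and entry.language.lower() != query.language.lower():
        return False
    if query.license is not None and entry.license.lower() != query.license.lower():
        return False
    if query.class_name is not None and entry.class_name != query.class_name:
        return False
    if query.method is not None and query.method not in entry.methods:
        return False
    if query.file_pattern is not None and re.search(query.file_pattern, entry.path) is None:
        return False
    return True


def code_search(index: List[CodeEntry], query: CodeQuery) -> List[str]:
    """CodeSearcher: return the paths of all entries that satisfy the query."""
    return [entry.path for entry in index if matches(entry, query)]


# Sample entries, made up for the example.
SAMPLE_INDEX = [
    CodeEntry("src/org/example/Crawler.java", "Java", "Apache-2.0", "Crawler", ("fetch", "parse")),
    CodeEntry("src/org/example/Indexer.java", "Java", "GPL-2.0", "Indexer", ("add", "commit")),
    CodeEntry("tools/crawl.py", "Python", "Apache-2.0", "", ("fetch",)),
]
```

For instance, `code_search(SAMPLE_INDEX, CodeQuery(language="java", method="fetch"))` keeps only the Java entry that declares a `fetch` method.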
Tools / Libraries used in Implementation

1. SVNKit (http://svnkit.com/) - SVNKit is a pure Java toolkit. It implements all Subversion features and provides APIs to work with Subversion working copies and to access and manipulate Subversion repositories, all from within a Java application.
2. Google project hosting (http://code.google.com/projecthosting/) - Repository of open source software hosted on Google infrastructure.
3. SourceForge (http://sourceforge.net/) - Repository of open source software.
4. Apache Software Foundation (http://projects.apache.org/) - Open source projects developed over a course of time by a vibrant developer community.
5. Apache Lucene (http://lucene.apache.org/) - Apache Lucene is a free/open source information retrieval software library. It is supported by the Apache Software Foundation and is released under the Apache Software License.
