1. To avoid linearly scanning the texts for each query, the documents should be _______ in advance.
A. Preprocessed B. Designed
C. Formatted D. Indexed
2. Querying of unstructured textual data is referred to as:
A. Information access B. Information updation
C. Information manipulation D. Information retrieval
3. The ________ refers to being able to pose any query as an expression of terms combined with the operators AND, OR, and NOT.
A. Index B. Incidence matrix
C. Binary retrieval model D. Boolean retrieval model
4. A better way to represent a term-document matrix is the ______, in which we record only the things that do occur and their links.
A. Incidence matrix B. Adjacency matrix
C. Index D. Inverted index
5. A dictionary of terms is sometimes also referred to as a:
A. Lexicon B. Collection
C. Corpus D. None of the above
6. _________ retrieval requires term frequency information for the documents in each postings list.
A. Boolean retrieval B. Frequent retrieval
C. Current retrieval D. Ranked retrieval
7. Edit distance (Levenshtein distance) is a way of:
A. Context-sensitive spelling correction B. Document correction
C. Isolated word correction D. Phonetic correction
8. The Boolean retrieval model makes no provision for:
A. Ranked search B. Proximity search
C. Phrase search D. Both proximity and ranked search.
9. Permuterm indices are used for handling:
A. None B. Boolean queries
C. Phrase queries D. Wildcard queries
10. A large repository of documents in IR is called a:
A. Corpus B. Database
C. Dictionary D. Collection
11. A benefit of using a hash table is:
A. There is no need to rehash everything periodically if the vocabulary keeps growing.
B. Lookup in a hash table is faster than lookup in a tree.
C. All of the above
D. No prefix search is required
12. A benefit of using B-trees is:
A. Re-balancing is cheap B. Balanced trees allow efficient retrieval
C. Faster O(log M) D. Solves the prefix problem.
13. A postings list should be sorted by:
A. Document Frequency B. DocID
C. TermID D. Term frequency
14. The goal of IR is to:
A. find documents relevant to an information need
B. find documents relevant to an information need from a given document set
C. find documents relevant to an information need from a large document set
D. find documents relevant to an information need from a small document set
15. A term-document incidence matrix is:
A. Sparse B. Depends upon the data
C. Dense D. Cannot predict
16. A ______________ is a list of the observed categories and a count of the number of observations in each.
A. Matrix B. Frame
C. Frequency distribution D. None
17. The document frequency of a term is the:
A. Number of documents that contain the term.
B. None of the above.
C. Number of times the term appears in the document
D. Number of times the term appears in the collection.
18. Boolean queries often result in:
A. Too many or too few results B. None of the above.
C. Too few results
D. Too many results.
19. The more frequent the query term is in the document:
A. The lower the score of the document.
B. It has no effect on the score.
C. The higher the score of the document.
D. None of the above.
20. The Jaccard coefficient is:
A. |X ∪ Y| / |X ∩ Y|
B. |X ∩ Y| / |X ∩ Y|
C. |X ∩ Y| / |X ∪ Y|
D. |X ∩ Y|
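Note: as a rough illustration of the set-overlap measure asked about above, the following Python sketch computes the Jaccard coefficient of two term sets as the size of their intersection divided by the size of their union. The function name, the sample sets, and the convention for two empty sets are assumptions made for this example only.

```python
def jaccard(x: set, y: set) -> float:
    """Jaccard coefficient: size of the intersection over size of the union."""
    union = x | y
    if not union:
        # Assumption for this sketch: two empty sets count as identical.
        return 1.0
    return len(x & y) / len(union)


# Hypothetical query and document, each reduced to a set of terms.
query = {"ides", "of", "march"}
doc = {"caesar", "died", "in", "march"}

# One shared term ("march") out of six distinct terms overall.
print(jaccard(query, doc))
```

The measure ignores how often a term occurs, which is one reason ranked retrieval usually relies on term-frequency-based weights instead.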
21. Wildcard queries can be handled using:
A. Inverted index
B. Permuterm index
C. Binary Tree
D. None
22. Soundex is a class of heuristics to expand a query into its:
A. synonyms
B. phonetic equivalents
C. similar words
D. None
23. _________ is a representation of the term-document matrix in which we record only the things that do occur and their links.
A. Incidence matrix
B. Adjacency matrix
C. Index
D. Inverted index
24. Edit distance (Levenshtein distance) is a technique which can be used in:
A. Context-sensitive spelling correction
B. Document correction
C. Isolated word correction
D. Phonetic correction
25. The Boolean retrieval model cannot be used for:
A. Ranked search
B. Proximity search
C. Phrase search
D. Both proximity and ranked search.
26. Which of the following statements is true of B-trees?
A. Re-balancing is cheap
B. Balanced trees allow efficient retrieval
C. Faster O(log M)
D. Solves the prefix problem.

27. Extremely common words that appear to be of little value for information retrieval and are excluded from the index vocabulary are called:

A. Stop words B. Tokens C. Lemmatized words D. Stemmed terms

28. The process of chopping off the ends of a word to reduce it to its root form, thereby reducing the size of the vocabulary, is called:
A. Lemmatization B. Case folding C. Truecasing D. Stemming

29. Which of the following is a technique for context-sensitive spelling correction?

A. The Jaccard coefficient B. Soundex algorithms C. k-gram indexes D. Levenshtein distance

30. Given two strings s1 and s2, the edit distance between them is sometimes known as the:
A. Levenshtein distance B. Isolated-term distance C. k-gram overlap D. Jaccard coefficient
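
Note: a minimal sketch of how the edit distance between two strings s1 and s2 can be computed with dynamic programming, assuming unit cost for each insertion, deletion, and substitution. The function name and the sample strings are chosen for illustration and are not part of the question set.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn s1 into s2 (unit costs assumed)."""
    # prev[j] holds the distance between the first i-1 characters of s1
    # and the first j characters of s2.
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # turning s1[:i] into the empty string takes i deletions
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(
                prev[j] + 1,         # delete c1
                curr[j - 1] + 1,     # insert c2
                prev[j - 1] + cost,  # substitute c1 by c2, or keep a match
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Keeping only two rows of the table gives O(len(s2)) memory while the running time stays O(len(s1) * len(s2)).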

31. A measure of similarity between two vectors which is determined by measuring the angle
between them is called:

A. Cosine similarity B. Sine similarity C. Vector similarity D. Vector scoring
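
Note: the angle-based measure described in the question above can be sketched as follows, assuming documents and queries are already represented as equal-length numeric vectors. The function name and the zero-vector convention are assumptions of this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v:
    dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


# Vectors pointing in the same direction score close to 1 regardless of length.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
# Orthogonal vectors score 0.
print(cosine_similarity([1, 0], [0, 1]))
```

Dividing by the vector lengths is what makes the score insensitive to document length.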

32. _______________ are powerful sources of authenticity and authority.

A. Web pages B. Hyperlinks C. Images D. None

33. Which statement is false about link analysis?
A. Link analysis is used for scoring and ranking of pages.
B. Link analysis is used for link-based clustering.
C. Link analysis can be used to provide features for classification.
D. None

34. What is precision?
A. Fraction of retrieved docs that are relevant to the user’s information need
B. Fraction of relevant docs in collection that are retrieved
C. Fraction of retrieved docs from the collection.
D. None
35. What is recall?
A. Fraction of retrieved docs that are relevant to the user’s information need
B. Fraction of relevant docs in collection that are retrieved
C. Fraction of retrieved docs from the collection.
D. None
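Note: a small sketch of how the two evaluation measures from questions 34 and 35 are commonly computed from a set of retrieved document IDs and a set of relevant document IDs. The function name, the sample IDs, and the handling of empty sets are assumptions of this example.

```python
def precision_recall(retrieved: set, relevant: set):
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    hits = len(retrieved & relevant)
    # Assumption for this sketch: an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result: 4 documents retrieved, 5 relevant in the collection,
# 2 documents in common.
p, r = precision_recall({1, 2, 3, 4}, {2, 4, 6, 8, 10})
print(p, r)
```

Retrieving every document in the collection drives recall to 1 but usually lowers precision, which is why the two are reported together.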
36. Which statement is true of an inverted index?
A. For each term t, we must store a list of all documents that contain t.
B. It is similar to a sparse matrix.
C. It is not good for large collections.
D. None
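Note: a minimal sketch of an inverted index, assuming whitespace tokenization with lowercasing and integer docIDs. Each term maps to a postings list sorted by docID, which allows an AND query to be answered with a linear merge. The function names and sample documents are illustrative assumptions.

```python
from collections import defaultdict


def build_inverted_index(docs: dict) -> dict:
    """Map each term to the sorted list of docIDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            postings[token].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(p1: list, p2: list) -> list:
    """Merge two docID-sorted postings lists (Boolean AND)."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical four-document collection.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}
index = build_inverted_index(docs)
print(index["july"])                            # [2, 3, 4]
print(intersect(index["new"], index["july"]))   # [4]
```

The merge only works because both postings lists are sorted by the same key, here the docID.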
37. Which is not a part of text preprocessing?
A. Tokenizing
B. Stemming
C. Stop word removal
D. Compressing
38. Which statement is false about the k-gram index?
A. In a k-gram index, the dictionary contains all k-grams that occur in any term in the vocabulary.
B. It is used for processing wildcard queries.
C. It is fast and space efficient (compared to permuterm).
D. None

39. Web crawling steps are:
i. Visit all links
ii. Crawler
iii. Build list
iv. Indexing
v. Store in database
a. ii, iii, i, iv, v
b. ii, i, iii, iv, v
c. i, ii, iii, iv, v
d. i, iv, v, iii, ii
40. XQuery is a functional query language used to retrieve information stored in _______ format.
a. HTML
b. XML
c. UML
d. JScript
41. The XPath specification has _________ types of nodes.
a) Four
b) Five
c) Six
d) Seven

42. Search engines use a __________ to automatically index sites.
A. crawler
B. query
C. enterprise
D. sitebuilder
43. A search value can be an exact value or it can be
A. Logical operator
B. Relationship
C. Wild card character
D. Comparison operation
44. Ranked retrieval models take as input
A. None of the above
B. Boolean queries
C. Logical queries
D. Free text queries
45. The basic formula for paid placement is _______
a. Pay-per-click ($) = Advertising cost ($) ÷ Ads clicked (#)
b. Pay-per-click ($) = Advertising cost ($) * Ads clicked (#)
c. Pay-per-click ($) = Advertising cost ($) * Ads clicked (#)
d. Both a and b
46. The structure of the Web has the following entities:
i. Web graph
ii. Static and dynamic pages
iii. Hidden web pages
iv. Size of web page
a) i) & ii)
b) i) & ii)
c) iii) & iv)
d) i), ii), iii) & iv)
47. Collaborative filtering has the following problems:
a) Cold start
b) Scalability
c) Sparsity
d) All of the above
48. __________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.
A. MapReduce
B. Mahout
C. Oozie
D. HTML
49. ______ is a page that contains actual information on a topic.
A. Authority
B. Hub
C. Hyperlinks
D. Image
50. _______ is a way of measuring the importance of website pages.
A. Querying
B. PageRank
C. Link analysis
D. HITS
51. ___________ filtering recommends products which are similar to the ones that a user has liked in the past.
A. Collaborative-based
B. Context-based
C. Collection-based
D. Content-based
52. SEO stands for _________________
A. Search engine order
B. Search engine organizer
C. Search engine option
D. Search engine optimization
