Received: 12/01/2016
Abstract Research in any field depends heavily on prior work, so easier and
more systematic ways to access relevant publications are a great boon to
researchers. Apart from the search engines that individual publishing houses
provide for their own content, researchers use several other academic search
engines to find articles, journals and conference publications. Two such free
academic search engines are Google Scholar (GS) and Semantic Scholar (S2).
This work compares aspects such as user friendliness and the quality and
quantity of search results of the two search engines for finding published
works in the domain of Computer Science, and draws conclusions from the
results obtained. The S2 search engine demonstrates significant improvement
in search results, along with some typical bottlenecks faced by an early-stage
product.
Keywords Search Engines · Google Scholar · Semantic Scholar
Michael J Mathew
The School of Computer Science, University of Birmingham,
Edgbaston, B15 2TT, UK
Tel.: +44 (0)121 414 43710
E-mail: mjm522@cs.bham.ac.uk
1 Introduction
The advent of the internet, along with newer and more efficient algorithms,
helped in the creation of search engines, which increased the capability to
access large databases of research articles publicly available throughout the
world. Knowledge of existing work not only helps in avoiding repetition but
also speeds up research to an extent. The scientific world should thank the
legendary scientist Eugene Garfield for his ideas of citation-based searching,
resource discovery and quantitative evaluation of publications, which serve as
the basis for many of the powerful and innovative search engines these days [9].
Prior to this type of indexing, publications were stored and sorted either
alphabetically or numerically, which made finding articles rather cumbersome
[9]. Though most publishing houses provided search services for their content,
most of the good ones (e.g. Scopus) were paid services, and the free ones were
insufficient. This created a large economic divide, particularly for students
from the developing world.
A free service that indexes research articles from various domains under a
single umbrella would increase the efficiency of researchers substantially.
Besides GS [7], other academic indexing services include Microsoft Academic
Research, PubMed [5] and JSTOR, with S2 [6] being the most recent. The high
computational resources of Google, coupled with its other services such as
Google Books, have made GS the leading academic search engine [12]. The
emergence of machine learning and other statistical techniques has enabled
search engines to produce results tailored to the researcher. Although several
options exist for searching, the variance in search results among search
engines is indeed a problem. The quantity of search results depends on the
size of the indexed database, and their quality is determined by the ranking
algorithms used to sort them. Out of the several free academic search engines
available, one of the oldest (GS) and the latest (S2) are compared here on
some measures of the search output they produce.
The article is organized as follows. Sections 2 and 3 give an overview of the
search engines under consideration. Section 4 gives the methodology and the
topics on which the comparison was based. Section 5 gives the results of the
comparison, and the concluding remarks are presented in Section 6.
them access to index articles available in their paid services. Though Google
does not bypass the paywalls created by the publishing firms, knowing that a
work exists and being able to read its abstract can itself help a researcher
who was entirely unaware of it [13].
4 Comparison studies
Since S2 concentrates only on Computer Science publications, it would be
erroneous to compare keyword searches from other domains. Our experiments are
therefore restricted to keywords and author names from the Computer Science
domain. The comparison between S2 and GS is done using four experiments. The
four planned experiments and their corresponding strategies are described in
the subsections that follow.
Accepting that S2 is a recent search engine and that its number of indexed
papers is smaller than that of GS, all of the above-mentioned tests are
conducted on research papers published between 2001 and 2016.
[Figure: (a) S2 results, (b) GS results]
Curiously, searching for the phrase "target tracking" produced an almost equal
number of irrelevant results on both search engines. This is due to the
presence of the same phrase in multiple research domains, such as the missile
control, mobile robotics and manipulator robotics literature. This is
supported by the results for phrase 3, where all of the first 30 results were
relevant: "dexterous manipulation" is a phrase that belongs mainly to the
manipulator robotics literature.
[Figure: (a) S2 results, (b) GS results]
[Figure: (a) S2 results, (b) GS results]
4.3.1 Methodology
The specific keyword is taken from the missile control literature, since the
phrase "target tracking" produced the largest number of irrelevant results.
The specific keyword searched is "target tracking in ballistic missiles". The
first 30 results for the search term are compared to find the number of
relevant and irrelevant results.
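The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `count_relevant` and `precision_at_k` are hypothetical, the relevance labels are assumed to come from manual judgement of each result, and the label lists below are placeholders, not data from this study.

```python
from typing import Sequence, Tuple


def count_relevant(labels: Sequence[bool], k: int = 30) -> Tuple[int, int]:
    """Return (relevant, irrelevant) counts among the first k results."""
    top = list(labels[:k])
    hits = sum(top)  # True counts as 1, False as 0
    return hits, len(top) - hits


def precision_at_k(labels: Sequence[bool], k: int = 30) -> float:
    """Fraction of the first k results that were judged relevant."""
    top = list(labels[:k])
    if not top:
        raise ValueError("no results to evaluate")
    return sum(top) / len(top)


# Placeholder judgements for illustration only (NOT results from the paper):
# True means the result was judged relevant to the query.
s2_labels = [True] * 24 + [False] * 6
gs_labels = [True] * 21 + [False] * 9

for name, labels in (("S2", s2_labels), ("GS", gs_labels)):
    hits, misses = count_relevant(labels)
    print(f"{name}: {hits} relevant, {misses} irrelevant, "
          f"precision@30 = {precision_at_k(labels):.2f}")
```

Reporting the fraction alongside the raw counts makes the two engines comparable even if one of them returns fewer than 30 results for a narrow query.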
[Figure: (a) S2 results, (b) GS results]
[Figure: (a) S2 results, (b) GS results]
[Figure: (a) S2 results, (b) GS results]
Fig. 7: Resolving ability test results (pp: path planning, mp: motion planning)
The resolving ability test produced similar results for both search engines.
This is consistent with the null hypothesis. Even if Google does not employ
semantic methods, it appears to use some alternative method to resolve various
keyphrases.
Search Engine Name: Semantic Scholar
Pros: lower recall and higher precision; easily filter survey papers; sorting based on venue; links to GS results
Cons: lacks advanced search; no links to Find@BHAM; no coverage of patents and case law

Search Engine Name: Google Scholar
Pros: higher recall and lower precision; links to papers in Find@BHAM (internally hosted page); covers patents and case law; presence of advanced search option
Cons: redundant search results; no way to know the relevance of a citation; no option to sort according to venue; no option to search among relevant sub-keywords
Table 1 presents a few of the features one notices while searching similar
keywords on both search engines. Similarly, Table 2 shows the variation in
design and usability of the two search engines.
Table 3: Design, Usability: S2: Pros and Cons [14]
Search Engine Name: Semantic Scholar
Pros: streamlined with essential features; intuitive search by finding relevant sub-keywords; easy breakdown of citing articles; features like access to figures and citation velocity
Cons: fewer external links; lacks internally hosted pages; no email alerts can be set

Search Engine Name: Google Scholar
Pros: layout is similar to Google search results; alerts can be scheduled to email users with query; can track and save user citations; features like metrics and language
Cons: clutter produced by search results; simple design with lack of some essential features; no intention to understand searches
6 Conclusion
From the four experiments conducted on both search engines, we can conclude
the following. Though the results of GS and S2 show some visible differences
in terms of coverage (with GS leading), it would be highly inaccurate to
conclude that the search results of S2 are worse than those of GS. Features
offered by S2, such as easy filtering of survey papers, are highly useful at
the start of any research. At the same time, the lack of links to a
university's internally hosted pages puts an additional burden on users, who
must look up the article again in their own university's resources. Overall,
however, the search features offered by S2 seem to be more relevant to a
researcher at the early stage of a work. Some of the shortcomings mentioned
above could be the result of the comparatively smaller database of S2, which
could improve in time. Even though most of the tests conducted came out in
favor of GS, one can observe a strong positive correlation between the
results. This suggests that S2 is on the right path. Apart from this, the
semantic support provided by S2, which GS does not use, leads to more relevant
results in some cases.
One must not forget that GS, since its inception in 2004, has built a
well-indexed database (more than 10 years of crawling) with remarkable aid
from other Google products such as Web search and the Google Books repository.
S2, in contrast, is still at its beta stage, is specific to Computer Science
publications, and has not yet fully achieved its goals (see Sect. ??), which
differ in many respects from those of GS. It might therefore be premature and
improper to compare their strengths in their current states. With the more
open-source intention of S2, given time and steady progress at the current
pace, there is no doubt that S2 will become a more research-friendly search
engine than GS.
References
1. Dr. Ales Leonardis's home page. https://www.cs.bham.ac.uk/~leonarda/Publications.html. [Accessed Aug 2016].
2. Dr. Jeremy L Wyatt's home page. http://www.cs.bham.ac.uk/~jlw/publications.php. [Accessed Aug 2016].
3. Dr. Peter Tino's home page. https://www.cs.bham.ac.uk/~pxt/my.publ.html. [Accessed Aug 2016].
4. Dr. Xin Yao's home page. http://www.cs.bham.ac.uk/~xin/publications.html. [Accessed Aug 2016].
5. Pubmed. http://www.ncbi.nlm.nih.gov/pubmed. [Accessed Aug 2016].
6. Allen Institute for Artificial Intelligence. Semantic Scholar. https://semanticscholar.org. [Accessed Aug 2016].
7. Google. Google Scholar. https://scholar.google.co.uk. [Accessed Aug 2016].
8. Tracey Hughes. An interview with Anurag Acharya, Google Scholar lead engineer. http://www.indolink.com/SciTech/fr010305-075445.php, December 2006. [Accessed Aug 2016].
9. Péter Jacsó. As we may search: comparison of major features of the Web of Science,
Scopus, and Google Scholar citation-based and citation-enhanced databases. Current
Science (Bangalore), 89(9):1537, 2005.
10. Péter Jacsó. Google Scholar: the pros and the cons. Online Information Review,
29(2):208-214, 2005.
11. Nicola Jones. Artificial-intelligence institute launches free science search engine.
http://www.nature.com/news/artificial-intelligence-institute-launches-free-science-search-engine-1.18703, November 2015. [Accessed Aug 2016].
12. Madian Khabsa and C Lee Giles. The number of scholarly documents on the public
web. 2014.
13. Steven Levy. Making the world's problem solvers 10% more efficient. https://medium.com/backchannel/the-gentleman-who-made-scholar-d71289d9a82d#.16jjodjhu, October 2014. [Accessed Aug 2016].
14. Free Infographic Maker. Academic search engines: Semantic Scholar vs Google Scholar.
https://infograph.venngage.com/p/66001/semantic-scholar-vs-google-scholar.
[Accessed Aug 2016].
15. Enrique Orduña-Malea, Juan Manuel Ayllón, Alberto Martín-Martín, and Emilio Delgado
López-Cózar. About the size of Google Scholar: playing the numbers. arXiv
preprint arXiv:1407.6239, 2014.
16. Alex Verstak, Anurag Acharya, Helder Suzuki, Sean Henderson, Mikhail Lakhiaev, Cliff
Chiung Yu Lin, and Namit Shetty. On the shoulders of giants: The growing impact of
older articles. arXiv preprint arXiv:1411.0275, 2014.
17. S2 Team Member Vu Ha. What is Semantic Scholar and how will it work? [Accessed
Aug 2016].