WebCorp: Answer the following questions and do the tasks. Look for help in the User Guide!

1. What is WebCorp?
2. Who can use WebCorp?
3. Using a wildcard (*), find any 3-word phrases beginning with
'the' and ending with 'exploded'. Copy and paste the phrases
you found.
4. Repeat the same query using different search engines
[Search API]. Which of them do not yield any results and
why?
5. Using the square brackets ([]) and the vertical bar/the pipe
(|) find examples of the phrases wild cat and wild dog at the
same time (in a single query).
6. Using the square brackets ([]) and the vertical bar/the pipe
(|) find examples of wild cat runs and wild cat running and
wild cat ran at the same time (in a single query).
7. What is the maximum number of web pages you can reach
through Google in WebCorp? What about other Search APIs?
8. What is the purpose of the 'One concordance line per web page'
function? Try repeating one of the above searches with this
function checked.
9. How can you restrict your search to all of the pages on an
individual web site or all sites with a given domain?
10. Search web sites in Croatia (.hr) and in the Croatian
language for the word telka. What kind of web pages do you
mostly get results from?
11. Repeat the same query without specifying the language
and the domain. What kind of results do you get?
12. Are there any examples of the word telka on Serbian
web sites? (You will first need to know or find out the top-
level domain code for Serbia.)
13. Search for pages in Croatian and on the domain .hr
containing the word mobilni but not the word telefon. How would
you search for other forms of the word, e.g. mobilna, mobilno,
mobilnih etc.?
14. Using Post Search Options, find collocates in the first
position on the right (R1) of the search word mobilni and list
them here. Try to sort the results alphabetically according to
the first word to the right. What is the alphabetically first/last
word on the right of the search word mobilni?
15. What is the function of Exclude Stopwords option?
16. How do you access a plain text version of the web page
from which the concordances have been extracted? How do
you access the web site?
17. How do you get a list of all of the words (types) from a
given web page or specified text in frequency order? How
can you benefit from this function? What are n-grams?

1. WebCorp is a suite of tools which allows access to the World Wide Web as a corpus: a large collection of texts from which facts about the language can be extracted.

2. WebCorp can be used by anyone who has an interest in language and how particular words and phrases are used, especially words and phrases which are too new or too rare to appear in any dictionary or standard corpus. Since its launch, WebCorp has been used by corpus linguists, lexicographers, language teachers and learners, publishers, journalists, advertisers, and researchers in a variety of fields. Although WebCorp is designed for linguistic data search, many users have found its results format (with relevant sections of text from multiple web pages collated on one page) useful for information retrieval of the type for which standard search engines are usually used.

3. The term exploded, the six exploded, the lander exploded, the phone exploded, the moon exploded, the device exploded
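A minimal Python sketch of what the wildcard query in question 3 matches, assuming (as the task implies) that each * in the query the * exploded stands for exactly one word. It uses a regular expression for illustration; it is not WebCorp's own matching code, and wildcard_to_regex is a hypothetical name.

```python
import re

def wildcard_to_regex(query: str) -> re.Pattern:
    """Turn a query like 'the * exploded' into a regex where * is one word."""
    # Assumption: each * stands for exactly one word; other words are literal.
    words = [r"\w+" if w == "*" else re.escape(w) for w in query.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)

pattern = wildcard_to_regex("the * exploded")
text = "Reports say the phone exploded, but the old device never exploded."
print(pattern.findall(text))   # ['the phone exploded']
```

The second candidate, the old device never exploded, is rejected because the single-word wildcard cannot stretch over three words.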
4. All of them yield results.

5. no. 312765 Highgate School 2013 Website: Wild Dog Design
For The Day Lemur Experience Carnivore Combo Wild Cat Experience Binturong Experience Meerkat
Experience Meerkat, Lemur and Serval Experience Wild Cat Experience EDUCATION Planning your visit School

6. No results.
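To show what a single bracket-and-pipe query covers in questions 5 and 6, here is a minimal Python sketch that expands each [a|b|...] group into the literal phrases it stands for. It assumes the queries are written as wild [cat|dog] and wild cat [runs|running|ran]; the function name expand is hypothetical and this is not WebCorp's own query parser.

```python
import re
from itertools import product

def expand(query: str) -> list[str]:
    """Expand each [a|b|...] group in a query into the literal phrases it covers."""
    # The capturing group keeps the bracketed parts in the split result.
    parts = re.split(r"(\[[^\]]*\])", query)
    options = []
    for part in parts:
        if part.startswith("[") and part.endswith("]"):
            options.append(part[1:-1].split("|"))  # alternatives inside [...]
        else:
            options.append([part])                 # literal text: a single option
    # Every combination of one choice per group is one phrase.
    return ["".join(combo) for combo in product(*options)]

print(expand("wild [cat|dog]"))
# ['wild cat', 'wild dog']
print(expand("wild cat [runs|running|ran]"))
# ['wild cat runs', 'wild cat running', 'wild cat ran']
```

Several groups in one query multiply out, so [a|b] [c|d] stands for four phrases.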

7. The maximum is 100 pages in FAROO, and 50 in FarooNews, Bing and GuardianOpenPlatform.

8. This will retrieve only one match from each page searched. This can be useful to stop a single web page from dominating the results.

9. The Site option allows you to restrict your search to all of the pages on an individual web site or all sites with a given domain. Note that this only works with the Bing Search API. To search all of the pages on an individual site, enter the URL without the 'http://' part. For example, enter www.bbc.co.uk to search all pages on the BBC web site.

To restrict the search to sites within a given domain, enter part of a URL. For example, entering .ac.uk will restrict the search to UK academic institutions, while entering .fr will restrict the search to web sites in France.

To specify more than one domain, separate them with spaces or new lines, e.g. 1) 'www.cnn.com www.abc.com' 2) '.net .org'.

You can also specify domains that should not be included in the search results by prefacing them with a minus (-) sign.

Below the Site option there is now a list of frequently used domains. Select one from the list to insert it into the domain box. This list was compiled by inspecting WebCorp search logs.
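The include/exclude rules of the Site option can be modelled as a URL filter. The Python sketch below is a hypothetical local model, not how WebCorp or the Bing Search API actually apply the restriction. It assumes that entries starting with a dot (e.g. .hr) match any host ending with them, that other entries (e.g. www.bbc.co.uk) must match the host exactly, and that entries prefixed with - exclude matching hosts. The names parse_sites, host_matches and allowed are illustrative.

```python
from urllib.parse import urlparse

def parse_sites(spec: str) -> tuple[list[str], list[str]]:
    """Split a Site entry into included and excluded domains."""
    include: list[str] = []
    exclude: list[str] = []
    for token in spec.split():          # spaces or new lines separate entries
        if token.startswith("-"):
            exclude.append(token[1:])    # a leading minus sign excludes
        else:
            include.append(token)
    return include, exclude

def host_matches(host: str, entry: str) -> bool:
    """Assumed rule: '.hr' matches any host ending in .hr; a full host must match exactly."""
    if entry.startswith("."):
        return host.endswith(entry)
    return host == entry

def allowed(url: str, spec: str) -> bool:
    """Return True if the page's host passes the include/exclude rules."""
    host = (urlparse(url).hostname or "").lower()
    include, exclude = parse_sites(spec)
    if any(host_matches(host, e) for e in exclude):
        return False
    # With no include entries, every non-excluded host is allowed.
    return not include or any(host_matches(host, e) for e in include)

print(allowed("https://www.example.hr/page", ".hr -telka.hr"))  # True
print(allowed("https://telka.hr/", ".hr -telka.hr"))            # False
```

Excluded entries are checked first, so '.hr -telka.hr' keeps other Croatian sites but drops telka.hr.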

10. telka.hr, indeks.hr, zargonaut.com

11. telka.hr, telka.uk, youtube.com


12. Tekst oglasa junica simentalka cisto krvna prvo telka steona 6meseci teska 650kila Cena: 1.350,00 EUR

poput Mikloša Ligetija, Geza Marotija, Edea Telka i Mike Rota. Godine 1998.ova prelepa zgrada je

Rosario Central, Aukas, Sol de Amerika, Santa Telka i Bangu. Najduže se zadržao u La Korunji - šest

su male grudi veoma seksi. Izvor: vecernji.hr / telka.hr Pošalji prijatelju To: Bcc: Vaša e-mail

5: +1 Putanja ka ovom komentaru: Idite na komentar ~telka Anonimni korisnik ~telka : ~telka Anonimni

6: Idite na komentar ~telka Anonimni korisnik ~telka : ~telka Anonimni korisnik SNS bahatost na delu.

7: na komentar ~telka Anonimni korisnik ~telka : ~telka Anonimni korisnik SNS bahatost na delu. Brukaju

16.05.2016. u 12:55 Autor: vecernji.hr / telka.hr Foto: Thinkstock Žene su vrlo često opsednute

imuniteta. Jasvljaju se antitela-odbrambena telka protiv sopstvenih tkiva, kao da su se odbrambene

13. For the first part of the task, I typed the word mobilni in the search bar, selected Croatian as the language, selected Bing as the search engine, typed .hr in the Site bar and telefon in the word filter bar. There were 36 results in total (too many to copy and paste all of them), and each site had 3 or more lines with examples.

For the second part, I think one way of doing it is to enter mobilni|mobilnih|mobilna|mobilnog etc., separating the various forms with the | sign. Doing it this way gave me 27 results in total (again, too many to copy and paste).

14. The first entry in the alphabetically sorted results is mobilni internet, although the alphabetically first one should really be mobilni aparati. The last entry is mobilni šalterski (ured), which really is the last one alphabetically.

15. Choosing to exclude stopwords will filter out high-frequency words (e.g. the, of, etc.) from the results.
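The effect of excluding stopwords on an R1 collocate list (as in question 14) can be sketched in Python. The stopword set below is a small made-up sample, not WebCorp's actual list, r1_collocates is a hypothetical helper, and the tokens are assumed to be already lower-cased words.

```python
from collections import Counter

# A small illustrative stopword set (assumption: WebCorp's real list is longer).
STOPWORDS = {"the", "of", "a", "an", "and", "to", "in", "is"}

def r1_collocates(tokens: list[str], node: str, exclude_stopwords: bool = True) -> Counter:
    """Count the words in position R1, i.e. immediately to the right of `node`."""
    counts: Counter = Counter()
    for word, right in zip(tokens, tokens[1:]):
        if word != node:
            continue
        # Skip high-frequency function words when the option is switched on.
        if exclude_stopwords and right in STOPWORDS:
            continue
        counts[right] += 1
    return counts

tokens = "the phone exploded and the device exploded loudly in the lab".split()
print(r1_collocates(tokens, "exploded"))                           # Counter({'loudly': 1})
print(r1_collocates(tokens, "exploded", exclude_stopwords=False))  # Counter({'and': 1, 'loudly': 1})
```

With the option on, the function word and disappears and only the content word loudly remains as an R1 collocate.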

16. By clicking on the matched text in a concordance or by clicking on the 'Text' link under the URL for each result, you can access a plain text version of the web page from which the concordances have been extracted. Matches to the search term will be highlighted in the text. The web site itself is accessed by clicking the link provided above a concordance example.

17. Word List viewer

This tool lists all of the words (types) from a given web page or specified text in frequency order. It can be accessed in the same way as the Plain Text viewer (see above) or as a stand-alone tool. Instead of a list of single words, it can also generate lists of combinations of immediately neighbouring words, up to 5 words long; these are called n-grams. This can help us analyze a specific web page or text linguistically.
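The Word List viewer's two outputs, a frequency-ordered list of types and a list of n-grams up to 5 words long, can be approximated in a few lines of Python. This is a sketch under simple assumptions (a token is a run of word characters, case is ignored), not WebCorp's tokenizer; tokenize, word_list and ngrams are hypothetical names.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Assumption: a token is a run of word characters; case is ignored.
    return re.findall(r"\w+", text.lower())

def word_list(text: str) -> list[tuple[str, int]]:
    """List word types with their frequencies, most frequent first."""
    return Counter(tokenize(text)).most_common()

def ngrams(text: str, n: int) -> list[tuple[str, int]]:
    """List n-grams (n adjacent words), most frequent first; n is 1 to 5."""
    if not 1 <= n <= 5:
        raise ValueError("n must be between 1 and 5")
    tokens = tokenize(text)
    # Zipping n shifted copies of the token list yields each window of n words.
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(w) for w in windows).most_common()

text = "The wild cat ran. The wild cat runs."
print(word_list(text)[:3])   # [('the', 2), ('wild', 2), ('cat', 2)]
print(ngrams(text, 2)[0])    # ('the wild', 2)
```

Because the text is treated as one stream of tokens, some n-grams (e.g. ran the) span a sentence boundary; a more careful tool might split sentences first.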
