1. What is WebCorp?
2. Who can use WebCorp?
3. Using a wildcard (*) find any 3-word phrases beginning with
the and ending with exploded. Copy and paste the phrases
you found.
4. Repeat the same query using different search engines
[Search API]. Which of them do not yield any results and
why?
5. Using the square brackets ([]) and the vertical bar/the pipe
(|) find examples of the phrases wild cat and wild dog at the
same time (in a single query).
6. Using the square brackets ([]) and the vertical bar/the pipe
(|) find examples of wild cat runs and wild cat running and
wild cat ran at the same time (in a single query).
7. What is the maximum number of web pages you can reach
through Google in WebCorp? What about other Search APIs?
8. What is the purpose of the One concordance line per web page
function? Try repeating one of the above searches with this
function checked.
9. How can you restrict your search to all of the pages on an
individual web site or all sites with a given domain?
10. Search web sites in Croatia (.hr) and in the Croatian
language for the word telka. What kind of web pages do you
mostly get results from?
11. Repeat the same query without specifying the language
and the domain. What kind of results do you get?
12. Are there any examples of the word telka on Serbian
web sites? (You will first need to know or find out the top-
level domain code for Serbia.)
13. Search for pages in Croatian and on the domain .hr
containing the word mobilni but not telefon. How would you
search for other forms of the word, e.g. mobilna, mobilno,
mobilnih etc.?
14. Using Post Search Options, find collocates in the first
position on the right (R1) of the search word mobilni and list
them here. Try to sort the results alphabetically according to
the first word to the right. What is the alphabetically first/last
word on the right of the search word mobilni?
15. What is the function of the Exclude Stopwords option?
16. How do you access a plain text version of the web page
from which the concordances have been extracted? How do
you access the web site?
17. How do you get a list of all of the words (types) from a
given web page or specified text in frequency order? How
can you benefit from this function? What are n-grams?
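Questions 14 and 15 involve R1 collocates and the Exclude Stopwords option. As a rough, non-authoritative sketch of what these compute, the Python below counts the words in position R1 after a node word, optionally skips stopwords, and sorts the result alphabetically. The stopword list and the sample sentence are hypothetical, and this is not WebCorp's own implementation.

```python
import re
from collections import Counter

# Hypothetical mini stopword list (Croatian function words) for illustration only;
# WebCorp's real stopword list is not reproduced here.
STOPWORDS = {"i", "a", "u", "je", "se", "na", "za"}


def r1_collocates(text, node, exclude_stopwords=False):
    """Count the words found immediately to the right (R1) of each occurrence of `node`."""
    # \w is Unicode-aware in Python 3, so letters such as č, ć, đ, š, ž stay inside tokens
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i in range(len(tokens) - 1):
        if tokens[i] == node:
            right = tokens[i + 1]
            if exclude_stopwords and right in STOPWORDS:
                continue  # drop function words, as a stopword filter would
            counts[right] += 1
    return counts


# Hypothetical sample text, not real WebCorp output.
sample = "Mobilni telefon i mobilni uređaj. Novi mobilni internet, mobilni i fiksni."

print(sorted(r1_collocates(sample, "mobilni")))
# ['i', 'internet', 'telefon', 'uređaj']
print(sorted(r1_collocates(sample, "mobilni", exclude_stopwords=True)))
# ['internet', 'telefon', 'uređaj']
```

Python's sorted() orders strings by Unicode code point, so words beginning with č, ć, đ, š or ž would land after z rather than in Croatian alphabetical order; a locale-aware collation would be needed for that.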
6. No results.
8. This will retrieve only one match from each page searched. This
can be useful to stop a single web page from dominating the results.
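The effect of this option can be pictured as keeping only the first match from each URL. The sketch below assumes each concordance line carries the URL of the page it came from; the `ConcordanceLine` class, the function name and the example URLs are hypothetical illustrations, not part of WebCorp.

```python
from dataclasses import dataclass


@dataclass
class ConcordanceLine:
    url: str    # page the match was found on
    left: str   # context to the left of the search term
    node: str   # the matched search term
    right: str  # context to the right of the search term


def one_line_per_page(lines):
    """Keep only the first concordance line from each URL, preserving the original order."""
    seen = set()
    result = []
    for line in lines:
        if line.url not in seen:
            seen.add(line.url)
            result.append(line)
    return result


hits = [
    ConcordanceLine("http://example.com/a", "the engine", "exploded", "at noon"),
    ConcordanceLine("http://example.com/a", "the boiler", "exploded", "twice"),
    ConcordanceLine("http://example.org/b", "the tyre", "exploded", "on the road"),
]
print([h.url for h in one_line_per_page(hits)])
# ['http://example.com/a', 'http://example.org/b']
```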
9. The Site option allows you to restrict your search to all of the
pages on an individual web site or all sites with a given domain.
Note that this only works with the Bing Search API. To search all of
the pages on an individual site enter the URL without the 'http://'
part. For example, enter www.bbc.co.uk to search all pages on the
BBC web site.
You can also specify domains that should not be included in the
search results by prefacing them with a minus (-) sign.
Below the Site option there is now a list of frequently used
domains. Select one from the list to insert it into the domain box.
This list was compiled by inspecting WebCorp search logs.
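Since the Site option depends on the search API, WebCorp's own handling is not shown here; the Python sketch below only illustrates the matching rule on a list of URLs. It assumes that an entry such as www.bbc.co.uk matches that exact host, that a domain such as .hr matches every host ending in it, and that entries prefixed with a minus sign exclude matching hosts. Treating several include entries as alternatives is also an assumption, and the function names and example URLs are hypothetical.

```python
from urllib.parse import urlparse


def matches_domain(host, domain):
    """True if `host` equals `domain` or is a subdomain of it (e.g. www.example.hr under .hr)."""
    domain = domain.lower().lstrip(".")
    host = host.lower()
    return host == domain or host.endswith("." + domain)


def filter_by_site(urls, site_entries):
    """Apply site entries such as ['.hr', '-forum.example.hr'] to a list of URLs.

    Entries without a leading '-' are treated (by assumption) as alternatives to include;
    entries with a leading '-' are excluded.
    """
    include = [e for e in site_entries if not e.startswith("-")]
    exclude = [e[1:] for e in site_entries if e.startswith("-")]
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if include and not any(matches_domain(host, d) for d in include):
            continue
        if any(matches_domain(host, d) for d in exclude):
            continue
        kept.append(url)
    return kept


urls = [
    "http://www.bbc.co.uk/news",
    "http://news.bbc.co.uk/sport",
    "http://www.example.hr/page",
    "http://forum.example.hr/thread/1",
]
print(filter_by_site(urls, ["www.bbc.co.uk"]))            # ['http://www.bbc.co.uk/news']
print(filter_by_site(urls, [".hr", "-forum.example.hr"]))  # ['http://www.example.hr/page']
```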
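12. Example concordance line: poput Mikloa Ligetija, Geza Marotija, Edea
Telka i Mike Rota. Godine 1998. ova prelepa zgrada je ... ("such as
Mikloa Ligetija, Geza Marotija, Edea Telka and Mike Rota. In 1998 this
beautiful building ..."). Here Telka is part of a personal name.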
13. For the first part of the task, I typed the word mobilni in the search
bar, selected Croatian as the language, selected Bing as the search
engine, typed .hr in the Site bar and telefon in the word filter bar.
There were 36 results in total (too many to copy and paste all of them)
and each site had 3 or more example lines.
For the second part, I think one way of doing it is to enter
mobilni|mobilnih|mobilna|mobilnog etc., separating the various forms
with the | sign. I did it this way and it gave me 27 results in total
(again, too many to copy and paste).
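Questions 5 and 6 use square brackets and the pipe to list alternatives in a single query. The sketch below expands such a query into the separate phrases it is assumed to stand for, treating each bracketed group as one slot with alternative fillers. `expand_alternation` is a hypothetical helper for illustration, not WebCorp's query parser.

```python
import itertools
import re


def expand_alternation(query):
    """Expand every [a|b|c] group in a query into all the literal phrases it stands for."""
    # The capturing group keeps the bracketed parts in the split result,
    # e.g. "wild [cat|dog]" -> ["wild ", "[cat|dog]", ""]
    parts = re.split(r"(\[[^\]]*\])", query)
    choices = []
    for part in parts:
        if part.startswith("[") and part.endswith("]"):
            # A bracketed group: one slot, several alternative fillers
            choices.append([alt.strip() for alt in part[1:-1].split("|")])
        else:
            # Literal text: exactly one option
            choices.append([part])
    # Every combination of one option per slot is one phrase
    return ["".join(combo) for combo in itertools.product(*choices)]


print(expand_alternation("wild [cat|dog]"))
# ['wild cat', 'wild dog']
print(expand_alternation("wild cat [runs|running|ran]"))
# ['wild cat runs', 'wild cat running', 'wild cat ran']
print(expand_alternation("[mobilni|mobilna|mobilno|mobilnih]"))
# ['mobilni', 'mobilna', 'mobilno', 'mobilnih']
```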
17. This tool lists all of the words (types) from a given web page or
specified text in frequency order. It can be accessed in the same way
as the Plain Text viewer (see above) or as a stand-alone tool. Instead
of a list of single words, it can also generate lists of combinations
of up to 5 adjacent words (immediate neighbours); these are called
n-grams. This might help us to analyze a specific web page or a
text linguistically.
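A word frequency list and an n-gram list can both be produced by counting single tokens or runs of adjacent tokens. The Python sketch below is a simplified stand-in for the tool described above, not its actual code: the tokenisation (lower-cased runs of Unicode word characters) is an assumption, and the 5-word cap only mirrors the limit mentioned in the text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens (runs of Unicode word characters)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return every sequence of n adjacent tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_list(text, n=1):
    """Return (item, count) pairs in descending frequency; n=1 gives a plain word list."""
    if not 1 <= n <= 5:
        raise ValueError("n must be between 1 and 5")
    tokens = tokenize(text)
    items = tokens if n == 1 else ngrams(tokens, n)
    # most_common() sorts by count; ties keep the order of first appearance
    return Counter(items).most_common()


text = "The cat sat on the mat and the cat slept."
print(frequency_list(text)[:2])    # [('the', 3), ('cat', 2)]
print(frequency_list(text, 2)[0])  # (('the', 'cat'), 2)
```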