Saturday, October 12
Concordance programs (or concordancers) help you investigate patterns of language use across a
large number of texts. AntConc, like other concordancers, allows you to search for all instances of a
particular item (e.g., the word this or the phrase in order to). You can also use AntConc to find:
collocations (words that frequently co-occur such as crystal and clear); word frequencies (a list of the
most frequently occurring words in the corpus); and frequently appearing word clusters (e.g., three-,
four-, or five-word clusters such as the fact that, on the other hand, or due to the fact that).
AntConc was developed by the corpus linguist Laurence Anthony and can be downloaded for free
from his website: http://www.antlab.sci.waseda.ac.jp/software.html. There are also helpful tutorial
videos available halfway down this page: http://www.antlab.sci.waseda.ac.jp/antconc_index.html.
WIDE-EMU 2013 Justine Neiderhiser (janeider@umich.edu)
Note: Running vertically at the left of the interface are all the files in your corpus. If you
uploaded ten papers, for example, you will see ten files. The number 10 will also appear in
the Total No. box at the bottom left. Running horizontally at the top of the page are seven
tabs: Concordance, Concordance Plot, File View, Clusters, Collocates, Word List, and
Keyword List. These are your different search tools. Below, the tools that you will use most
frequently are reviewed.
2. The Concordance Plot tool will show you (a) the particular file or files where your search
item is found and (b) the exact location of the item in each file. Notice that each little vertical
line represents one hit. (If you click on that hit, you will be taken to its location in the
document.) This tool is useful for linking particular language items with their typical
locations within a piece of writing. For example, you may discover that sentence-opening
However occurs more frequently at the beginning of research articles than in the middle or
end. Or you may discover that self mentions (e.g., I or my argument) occur commonly
toward the end of an introductory section.
3. You have several options for using the Clusters tool. As one option, you can research how
your particular item in question (for example, however) appears in two-, three-, or four-word
clusters (e.g., However, I think or In my view, however). You can adjust the cluster size toward
the bottom right: set the minimum size to 3 if you're only interested in clusters of 3 or
more words; set the maximum size to 5 if you want no more than 5-word clusters. As
another option, you can research the most frequently appearing clusters of words in your
corpus, without selecting a particular item to research. To do this, you need to tick the
N-Grams box below the window. Then adjust the N-Gram size (or number of words in each
cluster) to your preference. A good place to begin is with a minimum of three words and
maximum of 5 words. (6-word clusters are quite rare.) Finally, you should also adjust the
Minimum N-Gram frequency. Since your corpus is small, you can set this to 5 or 6. This
means a cluster will be retrieved only if it appears at least 5 or 6 times in your corpus.
4. The Collocates tool performs an operation similar to that of the Clusters tool. The difference is that,
while the Clusters tool retrieves strings of words that occur together in a series, the
Collocates tool retrieves words that are most frequently associated with your search item. For
example, the phrases I believe that and the fact that may be retrieved as frequently appearing
3-word clusters, but we would not say that the words the and fact are collocates. One example
of a collocate pair is high and probability. These two words frequently co-occur (e.g., There is a
high probability that it will remain sunny tomorrow), but high and chance are not collocates.
We would not say there is a high chance that something will happen; instead we might say
there is a good chance that something will happen. Thus good and chance are collocates.
5. The Word List tool creates a list of the most frequently occurring words in your corpus.
(Most likely, the definite article the is at the top of the list in your corpus.) This tool can be
very useful for many research purposes. When analyzing your own writing, for example, it
can help you to identify words that you may be overusing. If, for example, the intensifiers
really and very are high on your list, you may be overusing these words in your academic
writing.
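If you are curious about what a word list involves behind the scenes, the following is a minimal sketch in Python, not AntConc's own code. It assumes a simple definition of a word (runs of letters, with an optional internal apostrophe) and uses two made-up sample texts; AntConc's own token definition may differ.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed token definition)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def word_list(texts):
    """Return (word, count) pairs sorted from most to least frequent."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common()

# Hypothetical two-file corpus, used only for illustration.
sample_corpus = [
    "The results are very clear. The method is really very simple.",
    "The data are clear, and the argument is simple.",
]

for word, count in word_list(sample_corpus)[:5]:
    print(word, count)
```

The total of all counts is the number of word tokens in the corpus, which is the figure the exercises below ask you to note.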
Now is a good time to start experimenting with the interface. Start by asking yourself what aspects of
your corpus, e.g., what grammatical patterns, you are interested in learning more about. Then try to
use all five of the tools just reviewed to conduct some mini-investigations.
As you are conducting searches, it is natural for your observations to lead you to new questions and
thus to new observations. Follow this course of exploration for a while. Frequently, our most
interesting observations, questions, and insights about language come about while we are
investigating some other area of language! The exercises on the following pages will help you work
through searches that might be relevant to your own research questions.
Citing/Referencing AntConc
Use the following method to cite/reference AntConc according to the APA style guide:
Anthony, L. (YEAR OF RELEASE). AntConc (Version VERSION NUMBER) [Computer Software]. Tokyo, Japan:
Waseda University. Available from http://www.antlab.sci.waseda.ac.jp/
Example: Anthony, L. (2011). AntConc (Version 3.2.2) [Computer Software]. Tokyo, Japan: Waseda
University. Available from http://www.antlab.sci.waseda.ac.jp/
The following steps demonstrate the capabilities of AntConc, which can be used with any corpus of
text files. You might have an existing corpus of text files which you'd like to use here, or you may
need to develop your own. These steps will show you how to compare two separate corpora, so you
will need to have at least two separate folders containing one or more text files. For this exercise, you
might create a folder with one or more student papers, saved as .txt files, and one or more of your
own papers, also saved as .txt files. Any files you are interested in searching will work for the
purpose of this exercise.
Exploring AntConc
AntConc has a very intuitive interface. It's best simply to explore it. It can be helpful to work through
these steps with someone as you learn about the capabilities of this software.
STEPS
Questions to consider:
How many words are in the corpus (number of word tokens)?
What words are most frequent? What might the top 20 words tell you about this corpus?
Clone the results (save window) and keep the window on the screen to the side.
START
Questions to consider:
How many words (number of word tokens) are in the corpus?
What words are most frequent? What might the top 20 words tell you about this corpus?
Clone the results (save window) and keep the window on the screen to the side.
6. From comparing the two frequency lists, make some observations of the differences between
your two corpora. These observations will likely connect to differences between the corpora you
selected. If you compared two different genres, for instance, you might make observations about
generic differences between your texts. If you chose texts from writers with different levels of
experience, you might make observations about differences between markers of experienced and
novice writers.
Pick a word that you'd like to explore. Scroll up and down. How often does that word occur?
What other variants of the word do you see in the corpus? Try to select a word that has
multiple forms, for instance, something like cause, caused, causing.
Click on the word you have selected. The Concordance window will open and show you a
KWIC (Key Word in Context) view of all occurrences.
Sort these occurrences by level: one word to the right, two words to the right, three words to
the right.
KWIC SORT 1R 2R 3R SORT
Go back to Word List. Try repeating this search with other variants you notice.
Questions to consider:
What are you noticing about the use of this word in your corpus?
By considering the contexts within which this word appears, you are actually examining
semantic prosody.
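The KWIC view and its right-hand sort can be approximated in a few lines of Python. This is a rough sketch under simplifying assumptions (a basic tokenizer, a fixed context of three words, and context that runs across sentence boundaries); the sample sentences are invented, and the code is not how AntConc itself is implemented.

```python
import re

def kwic(text, node, width=3):
    """Return (left, keyword, right) token lists for every hit of `node`."""
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, tok, right))
    return hits

def sort_right(hits):
    """Sort hits by the first, then second, then third word to the right (1R, 2R, 3R)."""
    return sorted(hits, key=lambda h: [w.lower() for w in h[2]])

# Invented sample text, for illustration only.
sample = ("Smoking can cause serious illness. Stress may cause problems at work. "
          "Delays cause a loss of trust.")

for left, kw, right in sort_right(kwic(sample, "cause")):
    print(f"{' '.join(left):>25}  {kw}  {' '.join(right)}")
```

Sorting on the words to the right groups similar continuations together, which makes recurring patterns after the search word easier to spot.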
In the following searches, you will be considering both of your corpora together. To begin,
generate a word list of the most frequently occurring words in your corpora.
WORD LIST
SORT BY FREQ
START
In the Concordance window, search for all forms of your word by using a wildcard (*). For
instance, if you were searching for cause, caused, and causing, you could find all words that begin with caus.
SEARCH TERM (Consider wildcards: caus*)
START
SORT
Consider further the semantic prosody of the word you are searching by comparing its context
with the context around other words that might convey the same (or similar) meaning. For
example, if you were considering cause, you might compare its use to the use of bring about,
grow, lead to, produce, create, etc. in your corpus.
Questions to consider:
What seems unique about the way your word is being used in this corpus?
What meanings does it seem to be used to communicate?
How does this compare to other similar words in the corpus?
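A wildcard search of this kind can be imitated with a regular expression. The sketch below assumes that * stands for zero or more word characters at that position in a single word; the helper names and sample text are invented for illustration.

```python
import re
from collections import Counter

def wildcard_to_regex(pattern):
    """Turn a search term such as 'caus*' into a whole-word regular expression."""
    body = re.escape(pattern).replace(r"\*", r"\w*")
    return re.compile(rf"\b{body}\b", re.IGNORECASE)

def count_forms(text, pattern):
    """Count each distinct word form matched by the wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return Counter(match.group(0).lower() for match in regex.finditer(text))

# Invented sample text, for illustration only.
sample = ("Stress causes illness. What caused it? The cause is unclear, "
          "because causing harm was not intended.")

print(count_forms(sample, "caus*"))
```

Because the pattern is anchored at a word boundary, because is not counted as a form of cause even though it contains the same letters.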
9. Examining collocations can help you summarize the context around your search term statistically.
COLLOCATES
SEARCH TERM
EXAMPLE: CAUS*
FROM 0 TO 4R
MIN COLLOCATE FREQ 4
SORT BY STAT
START
SORT BY FREQ(R)
SORT
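The idea of a collocate window (here, from the search word itself to a set number of words on its right) can be sketched as follows. This simplified Python example only counts raw frequencies; it does not reproduce the statistic AntConc reports when you sort by stat, and it lets the window run across sentence boundaries. The function name and sample text are assumptions made for illustration.

```python
import re
from collections import Counter

def right_collocates(text, node_prefix, span=4, min_freq=1):
    """Count words within `span` tokens to the right of any word starting with `node_prefix`."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok.startswith(node_prefix):
            counts.update(tokens[i + 1:i + 1 + span])
    # Keep only collocates that reach the minimum frequency.
    return [(w, c) for w, c in counts.most_common() if c >= min_freq]

# Invented sample text, for illustration only.
sample = ("Stress causes serious problems. Noise caused serious delays. "
          "Poor planning causes problems and delays.")

print(right_collocates(sample, "caus", span=2, min_freq=2))
```

Raising the minimum frequency, as in the settings above, filters out words that appear near your search term only once or twice by chance.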
10. To see the text you are analyzing in context, use File View.
FILE > CLOSE ALL FILES
FILE > OPEN DIR (CORPUS 1)
WORD LIST
FILE VIEW
Click on any of the files in the left window. You will see the original text and will be able to
scroll around it to capture as much context as you like.
SEARCH TERM
This feature allows you to highlight all the instances of the word in the full context. Type in any
word you'd like to explore in this box, and you will see how it is distributed across the text.
11. The Concordance Plot allows you to see the distribution of particular terms across multiple texts.
CONCORDANCE PLOT
SEARCH TERM
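To make the idea of a concordance plot concrete, here is a small Python sketch that records where each hit falls in a text (as a fraction of its length) and draws a one-line text version of the plot. The file names and texts are hypothetical, and the drawing is only an approximation of what the tool displays.

```python
import re

def hit_positions(text, term):
    """Return each hit's position as a fraction (0.0 = start) of the text's tokens."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return []
    return [i / len(tokens) for i, tok in enumerate(tokens) if tok == term.lower()]

def plot_line(positions, width=40):
    """Draw a one-line text plot with a | at each hit."""
    cells = ["-"] * width
    for p in positions:
        cells[min(int(p * width), width - 1)] = "|"
    return "".join(cells)

# Hypothetical files, for illustration only.
files = {
    "paper1.txt": "However the results differ. We discuss them later and then we conclude briefly.",
    "paper2.txt": "We present results first. The method however is new. However it has limits.",
}

for name, text in files.items():
    print(f"{name:12} {plot_line(hit_positions(text, 'however'))}")
```

Expressing positions as fractions lets you compare texts of different lengths on the same scale.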
12. The Clusters/N-Grams tool will allow you to explore patterns in the phraseology of your corpus. These
patterns can be genre-specific, so comparing them across corpora of different genres can be a
generative space for research.
FILE > CLOSE ALL FILES
Repeat the steps above, except increase the size of your phrase to four tokens.
N-GRAM SIZE MIN 4 MAX 4
Repeat the steps above, except increase the size of your phrase to five tokens.
N-GRAM SIZE MIN 5 MAX 5
Questions to consider:
What do you observe about the phraseology of the language in your corpus?
What might this tell you about genre?
Now do the 3 steps under 12 above, but for your second corpus.
Questions to consider:
What do you observe about the phraseology of the language in this corpus?
What do you observe about differences between the phraseology of the language in both of
your corpora?
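The N-gram counts in step 12 can be sketched in Python as follows. This is an illustrative approximation rather than AntConc's method: it assumes a simple tokenizer, lets N-grams run across sentence boundaries, and uses an invented sample text.

```python
import re
from collections import Counter

def ngrams(text, n_min=3, n_max=5, min_freq=1):
    """Count every word sequence of length n_min..n_max and keep the frequent ones."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return [(gram, c) for gram, c in counts.most_common() if c >= min_freq]

# Invented sample text, for illustration only.
sample = ("On the other hand, the data are limited. "
          "On the other hand, the method is new.")

print(ngrams(sample, n_min=4, n_max=4, min_freq=2))
```

Setting the minimum and maximum to the same value, as in the steps above, restricts the list to phrases of exactly that length.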
13. The Keyword List allows you to identify which words are characteristic of a particular corpus, that is,
unusually frequent in it compared with another corpus. This is best done by comparing your corpus to a much larger and more diverse corpus of language. For
now, you can practice working with these tools by simply comparing your corpora to one
another.
WORD LIST
SORT BY FREQ
START
KEYWORD LIST
START
Questions to consider:
What is the nature of the vocabulary in your corpus? What words show a higher frequency in
this corpus?
SORT BY FREQ
SORT
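Keyword lists rest on a comparison of relative frequencies between a target corpus and a reference corpus. The sketch below uses log-likelihood, a statistic commonly used for this purpose in corpus linguistics; this handout does not say which statistic AntConc applies, so treat the choice as an assumption. The sample texts are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed token definition)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def log_likelihood(a, b, c, d):
    """Score a word seen a times in a target of c tokens and b times in a reference of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the target corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_text, reference_text):
    """Rank words that are relatively more frequent in the target than in the reference."""
    target = Counter(tokenize(target_text))
    reference = Counter(tokenize(reference_text))
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference[word]
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Invented sample texts, for illustration only.
print(keywords("must must must should the", "the the the a a"))
```

A word that is equally frequent in both corpora scores zero, so only words with a noticeably different relative frequency rise to the top of the list.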
14. You can also use AntConc to work with sets of words, for example semantic sets. You
might identify a set of words that is particularly interesting through qualitative analysis, then
supplement that analysis with quantitative data. To do this, start by creating a word list that
focuses on relevant words. For instance, if you wanted to search for prescriptive language, you
might create a wordlist of the following verbs: need, require, requires, required, must, demand,
ought, should, obliged, etc.
WORD LIST
Advanced
Use search terms from list below
Load File (FILE NAME)
Apply
SORT BY FREQ
START
CONCORDANCE
Advanced
Use search terms from list below
Load File (FILE NAME)
Apply
START
Kwic sort
Level 1 0
Level 2 1R
SORT
COLLOCATES
Advanced
Use search terms from list below
Load File (FILE NAME)
Apply
START
CLUSTERS/N-GRAMS
Advanced
Use search terms from list below
Load File (FILE NAME)
Apply
N-GRAM SIZE MIN 3 MAX 5
MIN FREQ 1
SORT BY FREQ
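Searching with a list of terms amounts to counting only the words that belong to your set. The following Python sketch assumes the list is stored with one term per line (the exact file format AntConc expects is not described here) and uses the prescriptive verbs from step 14 with an invented sample text.

```python
import re
from collections import Counter

def load_terms(list_text):
    """Parse a search-term list with one term per line, ignoring blank lines."""
    return {line.strip().lower() for line in list_text.splitlines() if line.strip()}

def count_terms(text, terms):
    """Count how often each listed term occurs in the text."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tok for tok in tokens if tok in terms)

# Contents of a hypothetical term-list file, one term per line.
term_list = "need\nrequire\nrequires\nrequired\nmust\ndemand\nought\nshould\nobliged\n"

# Invented sample text, for illustration only.
sample = ("Writers must cite sources. Each claim requires evidence, "
          "and drafts should be revised. You must proofread.")

print(count_terms(sample, load_terms(term_list)).most_common())
```

Note that each inflected form must be listed separately (require, requires, required) unless you use a wildcard instead.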
15. AntConc will also allow you to search for word sequences, such as phrases within phrases.
For example, if you wanted to explore how the point of is used in your corpus, as in from the
point of view, you can do this using the N-grams tool.
CONCORDANCE
START
Kwic sort
Level 1 0
Level 2 1R
Level 3 1L
SORT
This will allow you to see the words collocated with the phrase you have searched.
In addition, you can search for slot and frame patterns to see what words appear within
particular structures you specify.
CONCORDANCE
START
Kwic sort
Level 1 0
Level 2 1R
Level 3 1L
SORT
This will allow you to see the words that appear in the wildcard (*) slots of the structures you
specify.
16. AntConc can also be used for searches dealing with multiple frames. For instance, if you
are interested in the use of the passive in academic texts, you could search some of the frames
which use passive voice, like: was|are|is verb by, was|are|is verb in, was|are|is verb on.
CONCORDANCE
Advanced
This search will generate instances of phrases framed by the terms you specify around the
wildcard (*).
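A frame search of this kind can be approximated with a regular expression in which the wildcard slot matches any single word. The sketch below is a simplified illustration with invented sample sentences; it finds candidate passives but cannot tell whether the word in the slot is really a past participle.

```python
import re

def frame_hits(text, first_words, last_word):
    """Find three-word frames: one of `first_words`, any single word, then `last_word`."""
    first = "|".join(re.escape(w) for w in first_words)
    regex = re.compile(
        rf"\b({first})\s+(\w+)\s+{re.escape(last_word)}\b", re.IGNORECASE
    )
    # Return the full phrase together with the word that fills the slot.
    return [(m.group(0), m.group(2)) for m in regex.finditer(text)]

# Invented sample text, for illustration only.
sample = ("The survey was conducted by two researchers. Results are reported in Table 2. "
          "The claim is supported by the data.")

for phrase, slot in frame_hits(sample, ["was", "are", "is"], "by"):
    print(phrase, "->", slot)
```

Running the same function with in or on as the final word covers the other frames mentioned in step 16.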
17. AntConc allows you to save your outputs in order to enter them into statistics programs
for calculation of contingency, collostructional analysis, etc. Most of these outputs will be
generated as .txt files.
COLLOCATE
Window span 0 to 1R
Sort by stat
These tab-separated files can be imported into Excel for sorting, merging, statistical analysis,
etc. They can then serve as input to programs like Gries collocate