WIDE-EMU 2013 Justine Neiderhiser (janeider@umich.edu)

Saturday, October 12

Exploring Freedom in First-Year Writing: A Corpus Approach for Comparing Instructor and Student-Generated Feedback
Justine Neiderhiser, University of Michigan

Getting Started with AntConc¹

Concordance programs (or concordancers) help you investigate patterns of language use across a
large number of texts. AntConc, like other concordancers, allows you to search for all instances of a
particular item (e.g., the word this or the phrase in order to). You can also use AntConc to find
collocations (words that frequently co-occur, such as crystal and clear), word frequencies (a list of the
most frequently occurring words in the corpus), and frequently appearing word clusters (e.g., three-,
four-, or five-word clusters such as the fact that, on the other hand, or due to the fact that).

AntConc was developed by the corpus linguist Laurence Anthony and can be downloaded for free
from his website: http://www.antlab.sci.waseda.ac.jp/software.html. There are also helpful tutorial
videos available halfway down this page: http://www.antlab.sci.waseda.ac.jp/antconc_index.html.

Stage 1: Installing AntConc and uploading your corpus


1. To download AntConc, go to: http://www.antlab.sci.waseda.ac.jp/software.html
2. Download the version appropriate to your platform. If you're using Windows, click on
AntConc 3.2.4w. In the pop-up box, click on the Save option, then choose to save it to your
desktop. If you're using a Mac, scroll down and click on AntConc 3.2.4m.
3. Once you've downloaded the program, open it by clicking on the icon.
(Note: You may be prompted to run the program; if so, click Run.)
4. You are now ready to upload your corpus into the program.
Note: AntConc will not allow you to upload Word or PDF files, so before uploading
you first need to convert all your documents to TXT (or plain text) files and give them
consistent file titles. To convert MS Word documents to TXT files, open the doc (or
docx) file and click Save As, then Other Formats. Then choose Plain Text under
Save as type. For larger projects, you may want to download a program for about
$20 that will allow you to convert all of your files at one time. To get the program,
search for MS Word Export To Multiple Text Files Software 7.0, download the free
trial version, then purchase the full version if you want to be able to actually use the
TXT files it generates.
5. To upload your corpus, click on File, which is located at the top left of the screen. Then click
Open Dir... In the window that appears, find the folder that contains your corpus, highlight it,
and click OK. Your corpus should now be uploaded into AntConc.
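If you also want to work with the same folder outside AntConc, the following Python sketch mimics what Open Dir does conceptually: it reads every .txt file in one folder into memory. The function name load_corpus is hypothetical, and reading the files as UTF-8 is an assumption about how your TXT files were saved.

```python
from pathlib import Path


def load_corpus(folder):
    """Read every .txt file directly inside `folder` into a {file name: text} dict."""
    corpus = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # Assumption: files are UTF-8; undecodable bytes are replaced instead of raising
        corpus[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return corpus
```

For example, load_corpus("corpus1") (a hypothetical folder name) returns one entry per text file, and its length corresponds to the number AntConc shows in the Total No. box. Files that are still in Word or PDF format are simply skipped, which is another reason to convert everything to TXT first.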

¹ Materials developed by Zak Lancaster, Assistant Professor of English at Wake Forest University.


Congratulations! You are ready to begin concordancing.

Note: Running vertically down the left side of the interface is a list of all the files in your corpus.
If you uploaded ten papers, for example, you will see ten files, and the number 10 will also appear in
the Total No. box at the bottom left. Running horizontally across the top of the window are seven
tabs: Concordance, Concordance Plot, File View, Clusters, Collocates, Word List, and
Keyword List. These are your different search tools. The tools that you will use most
frequently are reviewed below.

Stage 2: Conducting searches


1. You can use the Concordance tool to search for a specific language item. For example, you
may be curious about how the word however is used in your corpus. To find out, you could
type however in the Search Term box toward the bottom of the window, click Start, and note how
many concordance hits were retrieved.
a. Now you can begin to look for patterns. You might want to consider whether the
word appears at the beginning, the middle, or the end of the sentence. For each hit,
you will be presented with minimal context: a short stretch of text on either side of the item. If
you click directly on the highlighted item, you will be taken to the whole document.
Often, you will need to consider this larger context carefully when analyzing the
function of a particular language feature.
b. To speed up your analyses, you can use the Kwic Sort options below the Search Term box.
This tool (a) sorts the hits alphabetically by the words to the right or the left of your search item
and (b) highlights those words in color. To use it, first adjust the levels that are located at the very
bottom of the screen. If you're interested in words that appear to the left of however,
set the levels to 1L, 2L, and 3L. Then click Sort. This will highlight (in
different colors) the first, second, and third words to the left of however.
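To make the Concordance search and the Kwic Sort concrete, here is a minimal Python sketch of a key-word-in-context (KWIC) search followed by a 1L/2L/3L sort. It assumes a simple tokenizer that treats any run of letters as a word and ignores case; that is an assumption, not AntConc's exact token definition, and the function names are hypothetical.

```python
import re


def tokenize(text):
    # Assumption: a token is a run of letters, compared in lowercase
    return re.findall(r"[a-z]+", text.lower())


def kwic(tokens, node, width=4):
    """Return (left context, node, right context) for every hit of `node`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == node:
            hits.append((tokens[max(0, i - width):i], tok, tokens[i + 1:i + 1 + width]))
    return hits


def sort_left(hits, levels=3):
    """Sort hits by 1L, then 2L, then 3L (the word nearest the node first)."""
    def key(hit):
        left = hit[0]
        return [left[-k] if k <= len(left) else "" for k in range(1, levels + 1)]
    return sorted(hits, key=key)


text = "It rained. However, we left. The plan, however, failed. And however hard we tried."
for left, node, right in sort_left(kwic(tokenize(text), "however")):
    print(" ".join(left).rjust(30), node.upper(), " ".join(right))
```

Sorting on the nearest left word first corresponds to setting Level 1 to 1L; the later levels only break ties.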

2. The Concordance Plot tool will show you (a) the particular file or files where your search
item is found and (b) the exact location of the item in each file. Notice that each little vertical
line represents one hit. (If you click on that hit, you will be taken to its location in the
document.) This tool is useful for linking particular language items with their typical
locations within a piece of writing. For example, you may discover that sentence-opening
However occurs more frequently at the beginning of research articles than in the middle or
end. Or you may discover that self-mentions (e.g., I or my argument) occur commonly
toward the end of an introductory section.
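The idea behind the Concordance Plot can be sketched in a few lines of Python: record where each hit falls as a fraction of the file's length and mark it on a bar. This sketch measures position in letters-only tokens, which is an assumption; AntConc's own plot may scale positions differently, and the function names are hypothetical.

```python
import re


def hit_positions(text, node):
    """Relative position (0.0 = start of file) of each occurrence of `node`."""
    # Assumption: position is counted in letters-only, lowercased tokens
    tokens = re.findall(r"[a-z]+", text.lower())
    return [i / len(tokens) for i, tok in enumerate(tokens) if tok == node]


def plot_bar(positions, width=50):
    """Draw one bar for a file; each '|' marks a hit (nearby hits can share a mark)."""
    bar = ["-"] * width
    for p in positions:
        bar[min(int(p * width), width - 1)] = "|"
    return "".join(bar)


essay = "However, the data are limited. We argue that the results hold. However, more work is needed."
print(plot_bar(hit_positions(essay, "however"), width=40))
```

Drawing one such bar per file is what lets you see at a glance whether an item clusters at the beginning, middle, or end of a text.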

3. You have several options for using the Clusters tool. As one option, you can research how
your particular item in question (for example, however) appears in two-, three-, or four-word
clusters (e.g., However, I think or In my view, however). You can adjust the cluster size toward
the bottom right: set the minimum size to 3 if you're only interested in clusters of three or
more words, and set the maximum size to 5 if you want no more than five-word clusters. As
another option, you can research the most frequently appearing clusters of words in your
corpus without selecting a particular item to research. To do this, tick the N-Grams
box below the window. Then adjust the N-Gram size (the number of words in each
cluster) to your preference. A good place to begin is with a minimum of three words and a
maximum of five words. (Six-word clusters are quite rare.) Finally, you should also adjust the
Minimum N-Gram frequency. If your corpus is small, you can set this to 5 or 6. This
means a cluster will be retrieved only if it appears at least 5 or 6 times in your corpus.
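The N-Grams option, with its minimum and maximum sizes and minimum frequency, can be approximated as follows. This is a minimal Python sketch, not AntConc's implementation; it assumes the text has already been split into lowercase tokens, and the function name ngrams is hypothetical.

```python
from collections import Counter


def ngrams(tokens, min_size=3, max_size=5, min_freq=5):
    """Count every run of min_size..max_size consecutive tokens; keep frequent ones."""
    # Assumption: `tokens` is an already tokenized, lowercased list of words
    counts = Counter()
    for n in range(min_size, max_size + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    # Mirrors a Minimum N-Gram frequency threshold; most frequent clusters first
    return [(gram, c) for gram, c in counts.most_common() if c >= min_freq]


tokens = "on the other hand it is on the other hand".split()
print(ngrams(tokens, min_size=3, max_size=4, min_freq=2))
```

Note that on the other and the other hand are counted again inside every on the other hand, so overlapping clusters of different sizes inflate each other's counts.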

4. The Collocates tool performs an operation similar to that of the Clusters tool. The difference is that,
while the Clusters tool retrieves strings of words that occur together in a series, the
Collocates tool retrieves words that occur near your search item (within a few words to the left or
right) more often than we would expect from their overall frequencies. For example, the phrases
I believe that and the fact that may be retrieved as frequently appearing 3-word clusters, but we
would not say that the words the and fact are collocates. High and probability, by contrast, are
collocates. These two words frequently co-occur (e.g., There is a high probability that it will remain
sunny tomorrow), whereas high and chance are not typical collocates: rather than saying there is a
high chance that something will happen, we would more often say there is a good chance that
something will happen. Thus good and chance are collocates.

5. The Word List tool creates a list of the most frequently occurring words in your corpus.
(Most likely, the definite article the is at the top of the list in your corpus.) This tool can be
very useful for many research purposes. When analyzing your own writing, for example, it
can help you to identify words that you may be overusing. If, for example, the intensifiers
really and very are high on your list, you may be overusing these words in your academic
writing.
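A word frequency list, together with the token and type totals, can be sketched in Python as below. The letters-only, lowercased tokenizer is an assumption, and word_list is a hypothetical name.

```python
import re
from collections import Counter


def word_list(text):
    """Return (number of tokens, number of types, [(word, freq), ...] by frequency)."""
    # Assumption: a token is a run of letters, compared in lowercase
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return len(tokens), len(counts), counts.most_common()


n_tokens, n_types, ranked = word_list("The cat and the dog and the bird.")
print(n_tokens, n_types, ranked[:3])
```

Tokens are running words and types are distinct words, so in this example the appears three times but counts as a single type.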

Now is a good time to start experimenting with the interface. Start by asking yourself what aspects of
your corpus, e.g., what grammatical patterns, you are interested in learning more about. Then try to
use all five of the tools just reviewed to conduct some mini-investigations.

As you are conducting searches, it is natural for your observations to lead you to new questions and
thus to new observations. Follow this course of exploration for a while. Frequently, our most
interesting observations, questions, and insights about language come about while we are
investigating some other area of language! The exercises on the following pages will help you work
through searches that might be relevant to your own research questions.

Citing/Referencing AntConc

Use the following method to cite/reference AntConc according to the APA style guide:

Anthony, L. (YEAR OF RELEASE). AntConc (Version VERSION NUMBER) [Computer Software]. Tokyo, Japan:
Waseda University. Available from http://www.antlab.sci.waseda.ac.jp/

Example: Anthony, L. (2011). AntConc (Version 3.2.2) [Computer Software]. Tokyo, Japan: Waseda
University. Available from http://www.antlab.sci.waseda.ac.jp/


Practicing Corpus Analysis with AntConc²

The following steps demonstrate the capabilities of AntConc, which can be used with any corpus of
text files. You might have an existing corpus of text files that you'd like to use here, or you may
need to develop your own. These steps will show you how to compare two separate corpora, so you
will need at least two separate folders, each containing one or more text files. For this exercise, you
might create one folder with one or more student papers, saved as .txt files, and a second folder
with one or more of your own papers, also saved as .txt files. Any files you are interested in
searching will work for the purpose of this exercise.

Exploring AntConc

AntConc has a very intuitive interface, so it's best simply to explore it. It can be helpful to work
through these steps with someone else as you learn about the capabilities of this software.

STEPS

1. Load your first corpus of text files into AntConc.


FILE > OPEN DIR

2. Make a frequency list of word types in this corpus.


WORD LIST
SORT BY FREQ
START

Questions to consider:
How many words are in the corpus (number of word tokens)?
What words are most frequent? What might the top 20 words tell you about this corpus?

Clone the results (save window) and keep the window on the screen to the side.

3. Empty the working set.


FILE > CLOSE ALL FILES

4. Load your second corpus.


FILE > OPEN DIR

5. Make a frequency list of word types.


WORD LIST
SORT BY FREQ
START

² Materials developed by Nick Ellis, Professor of Psychology and Linguistics at the
University of Michigan, and adapted by Justine Neiderhiser (janeider@umich.edu) for
CCCC 2013.

Questions to consider:
How many words (number of word tokens) are in the corpus?
What words are most frequent? What might the top 20 words tell you about this corpus?

Clone the results (save window) and keep the window on the screen to the side.

6. By comparing the two frequency lists, make some observations about the differences between
your two corpora. These observations will likely connect to differences between the corpora you
selected. If you compared two different genres, for instance, you might make observations about
generic differences between your texts. If you chose texts from writers with different levels of
experience, you might make observations about differences between markers of experienced and
novice writers.
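Because two corpora rarely contain the same number of words, raw counts from the two frequency lists are easier to compare once they are normalized, for example per 10,000 words. The Python sketch below does this; the per-10,000 base and the function names are illustrative choices, not AntConc features.

```python
from collections import Counter


def per_10k(tokens):
    """Frequency of each word per 10,000 tokens of its corpus."""
    counts = Counter(tokens)
    return {w: c * 10_000 / len(tokens) for w, c in counts.items()}


def biggest_differences(tokens_a, tokens_b, top=10):
    """Words whose normalized frequency differs most between corpus A and corpus B."""
    a, b = per_10k(tokens_a), per_10k(tokens_b)
    rows = [(w, a.get(w, 0.0), b.get(w, 0.0)) for w in set(a) | set(b)]
    rows.sort(key=lambda r: abs(r[1] - r[2]), reverse=True)
    return rows[:top]


student = "i think i feel that the essay is good".split()
instructor = "the claim needs evidence and the evidence needs analysis".split()
for word, fa, fb in biggest_differences(student, instructor, top=5):
    print(f"{word:10} {fa:8.1f} {fb:8.1f}")
```

The example word lists are hypothetical; with real corpora you would pass in the full token lists from each folder.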

7. Sort your second corpus alphabetically.


SORT BY WORD
SORT

Pick a word that you'd like to explore. Scroll up and down. How often does that word occur?
What other variants of the word do you see in the corpus? Try to select a word that has
multiple forms, for instance, something like cause, caused, causing.

Click on the word you have selected. The Concordance window will open and show you a
KWIC (Key Word in Context) view of all occurrences.

Sort these occurrences by level: one word to the right, two words to the right, three words to
the right.
KWIC SORT 1R 2R 3R SORT

Go back to Word List. Click on a variant of that word, if applicable.

The Concordance window will open and show you a KWIC (Key Word in Context) view of
all occurrences.

Sort these occurrences by level: one word to the right, two words to the right, three words to
the right.
KWIC SORT 1R 2R 3R SORT

Go back to Word List. Try repeating this search with other variants you notice.

Questions to consider:
What are you noticing about the use of this word in your corpus?

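By considering the contexts within which this word appears, you are actually examining its
semantic prosody: the positive or negative associations a word acquires from the words it habitually occurs with.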


8. Searching the full corpus


FILE > CLOSE ALL FILES
FILE > OPEN DIR > CORPUS 1
FILE > OPEN DIR > CORPUS 2

In the following searches, you will be considering both of your corpora together. To begin,
generate a word list of the most frequently occurring words in your corpora.
WORD LIST
SORT BY FREQ
START

In the Concordance window, search for all forms of your word by using a wildcard (*). For
instance, if you were searching for cause, caused, and causing, you could find all words that begin with caus:
SEARCH TERM (Consider wildcards: caus*)
START
SORT

Consider further the semantic prosody of the word you are searching by comparing its context
with the context around other words that might convey the same (or similar) meaning. For
example, if you were considering cause, you might compare its use to the use of bring about,
grow, lead to, produce, create, etc. in your corpus.

Questions to consider:
What seems unique about the way your word is being used in this corpus?
What meanings does it seem to be used to communicate?
How does this compare to other similar words in the corpus?
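A wildcard search such as caus* can be imitated with a regular expression. This Python sketch assumes that * stands for zero or more letters within a single word, as in the caus* example; AntConc's own wildcard settings may offer more than this, and the function names are hypothetical.

```python
import re
from collections import Counter


def wildcard_to_regex(pattern):
    # Assumption: '*' means zero or more letters; every other character is literal
    return re.compile("^" + re.escape(pattern.lower()).replace(r"\*", "[a-z]*") + "$")


def wildcard_forms(tokens, pattern):
    """Count each distinct word form that matches the wildcard pattern."""
    rx = wildcard_to_regex(pattern)
    return Counter(tok for tok in tokens if rx.match(tok))


tokens = "cause caused causing because causal cat caused".split()
print(wildcard_forms(tokens, "caus*"))
```

The match on causal shows why wildcard results should be checked by hand: a shared prefix can pull in words from a related but different family, while because is excluded since it does not begin with caus.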

9. Examining collocations can help you summarize the context around your search term statistically.
COLLOCATES
SEARCH TERM
EXAMPLE: CAUS*
FROM 0 TO 4R
MIN COLLOCATE FREQ 4
SORT BY STAT
START

So what are the significant collocates of your term?

SORT BY FREQ(R)
SORT
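The idea behind SORT BY STAT can be illustrated with Mutual Information (MI), one common collocation measure. The Python sketch below counts collocates in a window (left=0, right=4 mirrors FROM 0 TO 4R) and scores them with one standard formulation of MI, log2(observed / expected), where expected = f(node) * f(collocate) * span / N. AntConc offers more than one statistic and its exact formula may differ; the function name collocates is hypothetical.

```python
import math
from collections import Counter


def collocates(tokens, node, left=0, right=4, min_freq=1):
    """Return (collocate, observed count, MI) tuples, highest MI first."""
    freq = Counter(tokens)
    n = len(tokens)
    span = left + right
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Words inside the window around this hit (the node itself is excluded)
            observed.update(tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right])
    results = []
    for word, o in observed.items():
        if o >= min_freq:
            # One common MI formulation; not necessarily AntConc's exact formula
            expected = freq[node] * freq[word] * span / n
            results.append((word, o, math.log2(o / expected)))
    return sorted(results, key=lambda r: r[2], reverse=True)


tokens = "a high probability a high probability the cat the dog the end".split()
print(collocates(tokens, "high", left=0, right=1))
```

MI tends to reward rare pairings, so a word that occurs only once or twice near the node can score very high; a minimum collocate frequency (MIN COLLOCATE FREQ 4 above) guards against that.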


10. To see the text you are analyzing in context, use File View.
FILE > CLOSE ALL FILES
FILE > OPEN DIR > CORPUS 1
WORD LIST
FILE VIEW

Click on any of the files in the left window. You will see the original text and will be able to
scroll around it to capture as much context as you like.

SEARCH TERM

This feature allows you to highlight all the instances of the word in the full context. Type in any
word you'd like to explore in this box, and you will see how it is distributed across the text.

11. The Concordance Plot allows you to see the distribution of particular terms across multiple texts.
CONCORDANCE PLOT
SEARCH TERM

12. The Clusters/N-Grams tool will allow you to identify patterns in the phraseology of your corpus. These
patterns can be genre-specific, so comparing them across corpora of different genres can be a
generative space for research.
FILE > CLOSE ALL FILES

Now identify the 3-, 4-, and 5-word formulas of language in your corpus.


FILE > OPEN DIR > CORPUS 1
CLUSTERS/N-GRAMS
SEARCH TERM CHECK N-GRAMS
N-GRAM SIZE MIN 3 MAX 3
MIN FREQ 10
SORT BY FREQ
Clone Results (save window) and put on the side.

Repeat the steps above, except increase the size of your phrase to four tokens.
N-GRAM SIZE MIN 4 MAX 4

Clone Results (save window) and put on the side.

Repeat the steps above, except increase the size of your phrase to five tokens.
N-GRAM SIZE MIN 5 MAX 5

Clone Results (save window) and put on the side.

Questions to consider:
What do you observe about the phraseology of the language in your corpus?
What might this tell you about genre?


Now repeat the three searches in step 12 above, but for your second corpus.

FILE > CLOSE ALL FILES

FILE > OPEN DIR > CORPUS 2

Questions to consider:
What do you observe about the phraseology of the language in this corpus?
What do you observe about differences between the phraseology of the language in both of
your corpora?

13. The Keyword List allows you to identify words that are unusually frequent in a particular corpus
compared with a reference corpus. This is best done by comparing your corpus to a much larger and
more diverse corpus of language. For now, you can practice working with this tool by simply
comparing your two corpora to one another.

Set up the target corpus.

FILE > OPEN DIR > CORPUS 1

WORD LIST
SORT BY FREQ
START

Set up the comparison (or background) corpus.

a. Settings > Tool Preferences > Keyword List > ADD DIRECTORY > CORPUS 2


b. You can repeat this process for any additional corpora you would like to use as a
background for comparison.
c. LOAD
d. APPLY

KEYWORD LIST
START

These are sorted by Keyness (log-likelihood).

Questions to consider:
What is the nature of the vocabulary in your corpus? What words show a higher frequency in
this corpus?

SORT BY FREQ
SORT

What is Keyness telling you? What is Frequency telling you?
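Keyness by log-likelihood compares how often a word occurs in the target corpus with how often it occurs in the reference corpus, relative to their sizes. The Python sketch below uses the widely cited two-corpus log-likelihood formula: expected counts E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), and LL = 2[a ln(a/E1) + b ln(b/E2)], where a and b are the word's frequencies and c and d the corpus sizes. It is an illustration, not a copy of AntConc's code, and the function names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """a, b: word frequency in target/reference corpus; c, d: corpus sizes in tokens."""
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:  # a zero count contributes nothing (0 * ln 0 is taken as 0)
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_tokens, reference_tokens):
    """Words relatively more frequent in the target corpus, highest keyness first."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = [
        (word, a, log_likelihood(a, r[word], c, d))
        for word, a in t.items()
        if a / c > r[word] / d  # keep positive keywords only
    ]
    return sorted(scored, key=lambda s: s[2], reverse=True)
```

This is one way to see the difference between the two sortings above: a very frequent word such as the usually has low keyness, because it is frequent in both corpora.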


14. You can also use AntConc to work with sets of words, such as semantic sets. For example, you
might develop a list of particularly interesting words through qualitative analysis, then
supplement that analysis with quantitative data. Start by choosing words that are relevant to
your question. For instance, if you wanted to search for prescriptive language, you
might create a word list of the following verbs: need, require, requires, required, must, demand,
ought, should, obliged, etc.

Save these words in a plain text (.txt) file.

To count instances of these words in a set of files:

First, set up the target genre.

FILE > OPEN DIR > CORPUS 1

WORD LIST
Advanced
Use search terms from list below
Load File (FILE NAME)
Apply
SORT BY FREQ
START

Then, to see the uses of these words:

CONCORDANCE
Advanced
Use search terms from list below
Load File (FILE NAME)
Apply
START
Kwic sort
Level 1 0
Level 2 1R
SORT

COLLOCATES
Advanced
Use search terms from list below
Load File (FILE NAME)
Apply
START

CLUSTERS/N-GRAMS
Advanced
Use search terms from list below
Load File (FILE NAME)
Apply
N-GRAM SIZE MIN 3 MAX 5
MIN FREQ 1
SORT BY FREQ
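Counting a semantic set amounts to checking each token against the words in your list. This Python sketch assumes the list file contains one word per line (an assumption about the file you create), and the file name prescriptive.txt and the function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path


def parse_word_list(text):
    """Assumption: one search term per line; blank lines are ignored."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def load_word_list(path):
    # Hypothetical file, e.g. "prescriptive.txt" saved in the previous step
    return parse_word_list(Path(path).read_text(encoding="utf-8"))


def count_set(corpus_text, terms):
    """Frequency of each listed word in the corpus, plus the total for the whole set."""
    tokens = re.findall(r"[a-z]+", corpus_text.lower())
    wanted = set(terms)
    counts = Counter(tok for tok in tokens if tok in wanted)
    return counts, sum(counts.values())


terms = parse_word_list("need\nrequire\nmust\nshould\nought\n")
counts, total = count_set("You must revise. You should cite. You must proofread.", terms)
print(counts.most_common(), total)
```

The total for the whole set is the quantitative figure you can set beside your qualitative reading of the concordance lines.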

15. AntConc will also allow you to search for word sequences, such as phrases within phrases.
For example, if you wanted to explore how the point of is used in your corpus, as in from the
point of view, you can search for the whole sequence with the Concordance tool.

CONCORDANCE

SEARCH TERM (Example: the point of)

START
Kwic sort
Level 1 0
Level 2 1R
Level 3 1L
SORT

This will allow you to see the words collocated with the phrase you have searched.

In addition, you can search for slot and frame patterns to see what words appear within
particular structures you specify.

CONCORDANCE

SEARCH TERM (Examples: is *ed, the * man, etc.)

START
Kwic sort
Level 1 0
Level 2 1R
Level 3 1L
SORT

This will allow you to see the words that appear in the wildcard (*) slots of the structures you
specify.

16. AntConc can also be used for searches involving multiple frames. For instance, if you
are interested in the use of the passive in academic texts, you could search for some of the frames
in which the passive voice appears, such as was|are|is [verb] by, was|are|is [verb] in, and was|are|is [verb] on.

CONCORDANCE

Advanced

Use search terms from list below

is * by
was * by
are * by
Apply
START
Kwic sort
Level 1 0
Level 2 1R
SORT

This search will generate instances of phrases framed by the terms you specify around the
wildcard (*).
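The slot-and-frame searches in steps 15 and 16 can be imitated by matching each element of a frame against consecutive tokens. In this Python sketch, each space-separated element must match one word and * stands for zero or more letters within that word; these matching rules are assumptions rather than a description of AntConc's search syntax, and the function names are hypothetical.

```python
import re
from collections import Counter


def compile_frame(frame):
    # One regex per element; '*' = zero or more letters, everything else literal
    return [re.compile("^" + re.escape(p).replace(r"\*", "[a-z]*") + "$")
            for p in frame.lower().split()]


def frame_hits(tokens, frames):
    """Count every token sequence that fits any of the frames, e.g. 'is * by'."""
    hits = Counter()
    for frame in frames:
        pats = compile_frame(frame)
        for i in range(len(tokens) - len(pats) + 1):
            if all(p.match(tokens[i + k]) for k, p in enumerate(pats)):
                hits[" ".join(tokens[i:i + len(pats)])] += 1
    return hits


tokens = "the essay was revised by the student and is graded by the instructor".split()
print(frame_hits(tokens, ["is * by", "was * by", "are * by"]))
```

Frame matches still need checking by hand: was * by also catches active clauses such as was standing by.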

17. AntConc allows you to save your outputs in order to enter them into statistics programs
for calculation of contingency, collostructional analysis, etc. Most of these outputs will be
generated as .txt files.

COLLOCATES

Search term (Example: give)

Window span 0 to 1R

Sort by stat

File>Save output to text file

These tab-separated files can be imported into Excel for sorting, merging, statistical analysis,
etc. They can then serve as input to programs like Gries' collocate
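If you prefer a script to Excel for merging saved outputs, the Python sketch below reads two tab-separated outputs and merges them into one table with a frequency column for each corpus. The column layout of AntConc's saved files is not described here, so the caller states which columns hold the word and the frequency; rows whose frequency column is not a whole number (such as a header line) are skipped. The function names and example rows are hypothetical.

```python
import csv
import io


def read_tsv_counts(text, word_col, freq_col):
    """Parse tab-separated text into {word: frequency}, skipping non-numeric rows."""
    counts = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) <= max(word_col, freq_col):
            continue  # blank or short line
        try:
            counts[row[word_col]] = int(row[freq_col])
        except ValueError:
            continue  # header or summary line
    return counts


def merge_counts(a, b):
    """One row per word: (word, frequency in A, frequency in B), 0 where absent."""
    return sorted((w, a.get(w, 0), b.get(w, 0)) for w in set(a) | set(b))


# Hypothetical rows laid out as rank, frequency, word
corpus_a = read_tsv_counts("Rank\tFreq\tWord\n1\t5\tthe\n2\t3\tessay\n", word_col=2, freq_col=1)
corpus_b = read_tsv_counts("1\t4\tthe\n2\t2\tdata\n", word_col=2, freq_col=1)
print(merge_counts(corpus_a, corpus_b))
```

To use a saved file instead of the inline example, pass in its contents, for instance Path("output.txt").read_text(encoding="utf-8") with a hypothetical file name. The merged rows supply the per-corpus frequencies that contingency-table statistics start from.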
