Pergamon Journals Ltd.
MICRO-CONCORD: A LANGUAGE LEARNER'S RESEARCH TOOL
TIM JOHNS
The machine code for the program was written in the Spring term of 1985 to illustrate text-handling techniques in Z80 assembler for a group of Algerian students following an English for Science in Engineering course at Birmingham University in preparation for postgraduate courses and research at universities in the UK. Instead of employing the traditional strategy in ESP of teaching the English of technology, I adopted the alternative approach of teaching a specific area of technology (in this case computing) through the medium of English, using a handout-based workshop approach similar to what the students could expect in their eventual departments. For most of them, computational methods using a high-level language such as FORTRAN or PASCAL would form an important research tool, and a course in FORTRAN was provided by the Department of Computing of the University in the Summer term. Rather than anticipate that work, I decided to concentrate on the architecture of a typical microprocessor, not only as that might be of some interest in itself (particularly for the students specializing in Electronics or Robotics), but also on the assumption that an appreciation of the workings of the Z80 and the structuring of programs in a low-level language would help them towards a better understanding of the potential and the limitations of high-level computer languages. The involvement of the students ranged from a number of decisions on the structure of the program (for example, the decision to use delimiters rather than loop counters as exit conditions), to the identification and typing-in of texts, and the preliminary experiments on the ways in which the program could be exploited in language-learning.
The overall structure of the machine-code routine is shown in the flow-diagram (Fig. 1). Points to note are:
1. On entry to the routine, the text-file will be resident at location 758AH: this gives room for files of over 34K, though in practice, owing to the limitations of the word-processor (Tasword Two) used to prepare text-files, most files will be 20K in length (i.e. a little more than three thousand words). The word to be searched for has been POKEd by the BASIC shell program into a buffer at 7530H. Both the text file and the buffer use the character with code 127 (the copyright symbol) as a delimiter: this was chosen as one that can be readily inserted at the end of the file using the word-processor, and that is unlikely to be needed within text.
2. In searching for the target word, the routine checks for the upper-case equivalent of a lower-case character in the buffer, but not vice versa: thus "computer" in the buffer identifies both "Computer" and "COMPUTER" in the text, but "Computer" does not identify "computer". This may occasionally be of use to the user for identifying instances of a word used in sentence-initial position. The routine is able to recognize the standard initial and final word delimiters in text, including hyphens. In addition to single words, the routine allows the user to search for phrases such as "in order to" or "as a result of".
Fig. 1. Micro-concord: Flow diagram of machine-code routine.
[...] extended context for an item, students have tended to prefer 74-column printout, which is more legible and easier to scan rapidly. The citations are numbered, using a subroutine in the Spectrum ROM provided for the line-numbering of BASIC programs. The routine provides two entry points. The first sets up the printer, prints out the heading, and resets the number counter to zero; the second is for use with the microdrive and disc versions of the program (see below), where multiple text files are to be searched for a target word and the printing-out of the citations is to continue from the point reached with the previous file. As is usual with KWIC concordances, citations are given as single lines of context with the target words printed centrally (i.e. with 30 characters of preceding context for 74-column printout, and 60 characters for 130-column printout): this format facilitates rapid scanning of a number of citations in order to examine the linguistic features that the contexts have in common, but has the disadvantage that the context is arbitrarily chopped off (often in mid-word) at either end of the citation.
5. The speed of the routine is best measured by the time taken for an unsuccessful search of a 20K text file. This is to some extent dependent on the number of near-misses that the routine examines: in a good case (attempting to match "xxxx") the routine takes 0.37 seconds, and in a bad case (attempting to match "thex") 0.57 seconds. These timings mean that the speed of the program will depend on the time taken to load text files and to print out citations: that is to say, it is a function of the hardware rather than of the software.
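The case-matching rule in point 2 and the KWIC layout described above can be sketched in Python. This is only an illustration, not the original Z80 routine: the set of word-boundary characters is an assumption (the article does not list them), the function names are hypothetical, and the layout assumes that the citation number occupies the first five columns of each printed line.

```python
DELIMITER = chr(127)  # end-of-text marker, as described in point 1

# Assumed word-boundary characters; the article only says that the
# standard delimiters, including hyphens, are recognized.
BOUNDARY = set(" \t\n.,;:!?()\"'-") | {DELIMITER}


def char_matches(target_ch, text_ch):
    """A lower-case target character also matches its upper-case form in
    the text; an upper-case target character matches only itself."""
    if target_ch == text_ch:
        return True
    return target_ch.islower() and target_ch.upper() == text_ch


def find_citations(text, target):
    """Return the start offsets of whole-word or whole-phrase matches."""
    if not target:
        return []
    end = text.find(DELIMITER)
    if end == -1:
        end = len(text)
    n = len(target)
    hits = []
    for i in range(end - n + 1):
        if i > 0 and text[i - 1] not in BOUNDARY:
            continue  # not at the start of a word
        if i + n < end and text[i + n] not in BOUNDARY:
            continue  # the match would end in mid-word
        if all(char_matches(t, c) for t, c in zip(target, text[i:i + n])):
            hits.append(i)
    return hits


def kwic_line(text, start, number, width=74, left=30):
    """Format one numbered citation with the keyword at a fixed column."""
    flat = text.split(DELIMITER)[0].replace("\n", " ")
    before = flat[max(0, start - left):start].rjust(left)
    after = flat[start:start + (width - left - 5)]
    return f"{number:>4} {before}{after}"
```

Calling `find_citations(text, "computer")` and passing each offset to `kwic_line` would give a numbered printout in which every keyword starts at the same column; as the article notes, the context on either side is simply cut off at the line width.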
HARDWARE CONFIGURATIONS
The basic hardware required to run the program comprises:
A 48K Spectrum.
A printer interface (the program was developed using a Kempston Centronics interface).
An 80-column printer (I use the Brother M1009, which is a versatile and relatively inexpensive dot-matrix printer: it has the advantage that it recognizes the standard Epson control codes for enlarged, condensed, and emphasized printing).
The potential power and flexibility of the program depend largely on the capacity, speed, and convenience of the medium used for storing text-files. Three configurations are possible:
1. Cassette-based
This is the cheapest, but also the most limited. A 20K file takes approximately 2 minutes to load from cassette, and cassette allows only limited and clumsy file-handling. With such a system it is practicable to search only a single file for instances of a particular item: this is adequate for studying the contexts of high-frequency grammatical items (e.g. prepositions, auxiliary verbs) and certain technical and semi-technical terms specific to a particular set of texts, but makes it difficult to sample properly most items with a frequency higher than 1:500, or to study a wide range of texts, without a great deal of irksome juggling of cassettes.
2. Microdrive-based
The Sinclair microdrive is a "stringy floppy" (high-speed tape loop) system that gives much of the speed and convenience of discs at a fraction of the expense. The capacity of the microdrive is between 85K and 95K, which will accommodate both the Micro-concord program and four 20K files (i.e. just under 14,000 words of text). With a microdrive it is possible to implement a turnkey system: all the user has to do is to turn on the equipment, select a cartridge and insert it in the microdrive, and press R (for RUN) and ENTER: the Micro-concord program will then autoload, ask whether 74- or 130-column printout is required, and then scan through all the files on the cartridge (or only those specified by the user) for citations of any item requested. An unsuccessful search of 80K of text (including the time needed to print out the heading and to locate and load four text-files) takes approximately 50 seconds: that is, less than half the time needed to load a single file from cassette. Experience has shown that a microdrive-based system allows the investigation of a much wider range of items than can readily be handled using cassette storage.
3. Disc-based
A number of disc interfaces are available for the Spectrum. Using the interface manufactured by Technology Research, and an 80-track double-sided Cumana disc drive, the time needed for an unsuccessful four-file search is reduced to 20 seconds. More important, the capacity of such a system (660K per disc, i.e. 110,000 words of text) makes it possible to base on a humble home computer such as the Spectrum a system that begins to approach in power and flexibility the professional concordancing packages available hitherto only on much more sophisticated and less accessible machines.
TEXT ENTRY AND CLASSIFICATION
The increased storage capacity and ease of access of systems based on the Microdrive and on disc bring to the fore questions (neglected in the early stages of development of the program) relating to the selection, entry, and classification of texts. It is part of the approach underlying the program that a large part of the responsibility for identifying and even for entering texts should remain with the students. Whoever has to do it, there is a great deal of typing to be done to fill one double-sided 80-track disc. Large-scale data-based projects in computational linguistics such as the COBUILD lexicography project at Birmingham University increasingly make use of entry methods other than the keyboard, for example the optical character reader and the reading of type-setting tapes. At least one cheap character-recognition device (the Omnireader) has appeared for use with microcomputers; however, it appears that this may be too limited as yet to offer a viable alternative to keyboard entry.
Whatever the method used for entry of texts, it is crucial, if the learner or teacher is to be able to identify precisely those texts in which he or she is interested, or to make meaningful comparisons between different types of texts, that a clear system of classification should be employed that is comprehensible both to the user and to the machine. The best approach appears to be to use the file name to code information as to text classification. For the sort of text in which students of our unit are likely to be most interested, classification needs to take account of:
Department of origin (e.g. Civil Engineering, Physics, Development Administration).
Topic (e.g. Construction, Computational Methods, Sociology).
Genre (e.g. Textbook, Research Report, Lecture, Student Essay).
A numerical key distinguishes text-files with identical classifications. Under this scheme an 8-character file-name (the maximum allowed by the TR-DOS system) reads, for example:
TRHIRE01
TR  Department: TRansportation and Environmental Planning
HI  Topic Area: HIghway Construction
RE  Genre: REsearch Report
01  Numerical key
This system makes it relatively easy for the computer to recover all the texts in the same genre or genres, for example, across a wide range of Departments and topics, and also from a number of microdrive cartridges or discs.
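As a rough illustration of how such file-names might be decoded and filtered, here is a minimal Python sketch. It assumes the layout of the TRHIRE01 example (two characters each for department, topic and genre, followed by a two-digit key); the function and class names are hypothetical, and every file-name other than TRHIRE01 is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextCode:
    department: str  # e.g. "TR"
    topic: str       # e.g. "HI"
    genre: str       # e.g. "RE"
    key: int         # numerical key, e.g. 1


def parse_file_name(name):
    """Split an 8-character file-name into its four classification fields."""
    if len(name) != 8 or not name[6:].isdigit():
        raise ValueError(f"not an 8-character classified file-name: {name!r}")
    return TextCode(name[0:2].upper(), name[2:4].upper(),
                    name[4:6].upper(), int(name[6:]))


def select(names, department=None, topic=None, genre=None):
    """Keep only the file-names that match every field that was specified."""
    chosen = []
    for name in names:
        code = parse_file_name(name)
        if department and code.department != department:
            continue
        if topic and code.topic != topic:
            continue
        if genre and code.genre != genre:
            continue
        chosen.append(name)
    return chosen
```

With a catalogue such as `["TRHIRE01", "TRHITE02", "CIHYRE01"]` (the last two are made up), `select(catalogue, genre="RE")` would pick out the research reports regardless of department or topic.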
EXTENSIONS AND DEVELOPMENTS
3. Even with the text storage available on microdrive, we have already, on occasion, encountered a problem with the very large number of citations generated by high-frequency keywords: 83K of text files, for example, produces 578 citations for "of" and 1,133 citations for "the". At present it is possible to abort a printout by using the Break key: a more elegant solution would be to give the user the option of specifying in advance a maximum number of citations, or that the program should print out every nth citation.
4. In addition to the general-purpose wild-card symbol already implemented, it was intended, in the early planning of the program, to offer two further wild cards: "any single character" and "any single character or no character". Experience in using the program has led us to give a higher priority to an "alternator" symbol (e.g. /). No juggling with wild-cards, for example, will recover all the forms of the verb "be" in a single pass through the text-files. If one were able to specify all those forms in a single input (e.g. be/being/been/am/is/are/was/were), then the task of recovering the information would be speeded up considerably in comparison with a series of searches for each variant, since the time taken by the program to perform a multiple search would still be negligible in comparison with the time needed to load files more than once from external memory. In addition to facilitating searches for variants of a single lexeme, an alternator symbol would make it easier to investigate the behaviour of lexical sets, for example patterns of transitivity and complementation with specific lists of verbs.
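The two proposals in points 3 and 4 (a cap or every-nth sampling of citations, and an alternator symbol) can be sketched together in Python. This is a hedged illustration of the idea rather than a description of the program: it uses regular expressions, ignores case entirely (unlike the routine's one-way case rule), and all names and parameters are hypothetical.

```python
import re
from itertools import islice


def compile_alternatives(query, alternator="/"):
    """Turn 'be/being/been' into one whole-word pattern, so that all the
    forms are found in a single pass through the text."""
    forms = [re.escape(form) for form in query.split(alternator) if form]
    if not forms:
        raise ValueError("query contains no search forms")
    return re.compile(r"\b(?:" + "|".join(forms) + r")\b", re.IGNORECASE)


def citations(text, query, every_nth=1, maximum=None):
    """Return (offset, matched form) pairs, optionally keeping only every
    nth hit and stopping after a maximum number of citations."""
    pattern = compile_alternatives(query)
    hits = pattern.finditer(text)
    sampled = islice(hits, 0, None, every_nth)
    return [(m.start(), m.group(0)) for m in islice(sampled, maximum)]
```

For a high-frequency item, a call such as `citations(text, "the", every_nth=10, maximum=50)` would thin the printout in advance instead of relying on the Break key.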
A CONCORDANCE-BASED METHODOLOGY
There are three potential users of a concordancing program: the linguistic researcher, the
teacher, and the language learner. While Micro-concord was written with the last in mind,
the program may be of some use to others also.
Most computer-based research into text was originally undertaken by scholars concerned
with literary studies (Hockey, 1980); in recent years there has been an increasing interest
in the application of such research to syllabus design, and the writing of grammars,
dictionaries and coursebooks. Sinclair (1985) has claimed that the effect of corpus-based
research on English-language teaching is likely to be radical:
On the one hand, there is now ample evidence of the existence of significant
language patterns which have gone largely unrecorded in centuries of study: on
the other hand there is a dearth of support for some phenomena which are regularly
put forward as normal patterns of English.
While large-scale linguistic research is likely to remain the province of mainframe computers and the massive databases which they can access, microcomputer-based programs such as Micro-concord may be able to play a subsidiary role in investigating specialized varieties of text that are neglected in the large corpora or where the classification systems of the large corpora are insufficiently delicate to recover the information required. A disc-based version of Micro-concord, for example, would form an excellent tool in the investigation of learners' writing, permitting the examination not only of recurrent patterns of error, [...]
[...] can do is to learn the correct answer by heart once it has been revealed). The concordance is inherently more open and more flexible. Without questions given in advance, it leads the learner to generate his or her own questions, and to test them out against the evidence. There is at least a prima facie case for thinking that the early exposure to authentic text and the skills of observation and inferencing developed in working with concordance output may be transferred to language learning away from the computer and outside the classroom: this is one of the many aspects of the approach that merit further investigation. What is clear is that the view of language learning as a species of research activity may cause difficulties for some language teachers, particularly if its implications are carried through in other aspects of the syllabus (see, for example, the approach to reading outlined in Johns and Davies, 1983): our experience to date in the English for Overseas Students Unit suggests, however, that the change of approach is accepted readily by most students providing it is carefully prepared and explained.
The concordance-based materials and activities we have explored to date are experimental, and do little more than scratch the surface of the new approach. A few examples may indicate some of the possibilities.
1. Pre-printed concordance output, and interactive use of the concordancer in the classroom, can provide a range of exercises and activities supplementary to, and in some cases replacing, more traditional materials. Our work in vocabulary teaching, for example, lays stress on the development of strategies for guessing unknown words from contextual clues: the multiple contexts offered by a concordance give the opportunity for the hypotheses generated by one context to be tested against other contexts. In the teaching of grammar, the concordancer is especially valuable in dealing with the crucial area where syntax overlaps with lexis. A frequent request by my students is for help with prepositions in English. One of the first experiments with output from Micro-concord was to use concordances of the half-dozen commonest prepositions, getting students to underline on the printout the head word colligating with the preposition (e.g. "depending on", "on demand"), and then to develop a system of classification for the examples they found. The reaction of the students was that this was far more helpful than the usual exercise involving filling in the missing prepositions. With the computer on hand, they soon began to investigate such further questions as whether, judging from the contexts in which they occur, "on the contrary" could be distinguished from "on the other hand", and then whether these could in turn be distinguished from "however" and "nevertheless". Similarly, a lesson that started by looking at the contexts of the preposition "in" ended with us getting concordances for way*, method*, procedure* and process* to see if and how these differed in scientific text.
2. [...] translating who had been doing a project on how it might be possible to sell a typically Chinese product such as rice wine to a European clientele. At the beginning of the session the students were given 10 minutes to write, using the word-processor, an advertisement for rice wine: their efforts were then amalgamated through the network into one large "student" text file. In the next 10 minutes they entered extracts from authentic advertisements for wine culled from magazines and newspapers: these were then amalgamated in a "copywriters" text file. Using the concordancing program we then investigated and discussed with the students the similarities and the differences between their own use of certain key items and that of the copywriters. The word "wine" itself, for example, had a high frequency in both text files, yet in the copywriters' file it was usually used as a modifier (e.g. "wine-growers", "wine district"), a usage which was absent from the students' file. Was the difference purely linguistic (the students having a poorer repertoire of structural devices) or strategic (the copywriters' purpose being to sell wine by its associations rather than directly)? The students were remarkably fond of the word "connoisseur", which appeared in a number of different contexts, yet it was absent from the copywriters' file: were the copywriters anxious to avoid connotations of elitism? In the time available, it was possible to do little more than raise such questions; even so, the session gave an opportunity to consider the relationship between language and the writer's intention, and to do so in a way which emphasized the usage of the group rather than of the individual.
3. The third, and most important, potential use of an interactive concordancer is as a learning resource to be used freely by students on their own initiative, with the role of the teacher restricted to suggesting points at which it may help to solve learning difficulties. One possibility with which we have experimented is its use in helping students to correct their written work, some mistakes being underlined and a "C" placed in the margin, signifying "You have used this word in a way which is different from how an English person would use it: if you get a concordance of the word you should be able to work out a suitable correction for yourself."
Many questions about the potential of the Micro-concord program remain unanswered. For example, it was developed for a particular type of student (adult; well motivated; a sophisticated learner with experience of research methods in his subject area) with particular needs (fairly closely specifiable in terms of target texts) in a particular learning/teaching situation (in which a great deal of emphasis is placed on developing students' learning strategies and on their responsibility for their own learning). It remains to be seen how far the research methodology outlined above would be suitable for other learners, for example children learning a foreign language at school. The writer would be particularly interested to hear from other teachers who wish to experiment (or who may, indeed, have already experimented) with a similar approach for their own students.
REFERENCES
AHMAD, K., CORBETT, G. and ROGERS, M. (1985) Using Computers with Advanced Language Learners: an Example, The Language Teacher (Tokyo) 9/3, pp. 4-7.
CORDER, S. P. (1967) The Significance of Learners' Errors, IRAL 5, pp. 161-170.
GOODMAN, K. S. (1967) Reading: a Psycholinguistic Guessing Game, Journal of the Reading Specialist, pp. 126-35.
HIGGINS, J. and JOHNS, T. (1984) Computers in Language Learning, Collins.
SINCLAIR, J. (1985) Retrospect and Prospect: Selected Issues in English in the World: Teaching and Learning the Language and Literatures, Quirk, Randolph and Widdowson, H. G. (eds.), Cambridge.
SKEHAN, P. (1981) ESP teachers, computers and research. In The ESP Teacher: Role, Development and
Prospects. ELT Document 112, British Council.