Moshe Koppel
Profiling
In real life:
1. We don't have a closed set of candidate authors
2. We don't have writing samples from each of them
We can still try to say something about the author:
  Gender
  Age group
  Linguistic background
Which is Male/Female?
My aim in this article is to show that given a relevance theoretic approach to utterance interpretation, it is possible to develop a better understanding of what some of these so-called apposition markers indicate. It will be argued that the decision to put something in other words is essentially a decision about style, a point which is, perhaps, anticipated by Burton-Roberts when he describes loose apposition as a rhetorical device. However, he does not justify this suggestion by giving the criteria for classifying a mode of expression as a rhetorical device. Nor does he specify what kind of effects might be achieved by a reformulation or explain how it achieves those effects. In this paper I follow Sperber and Wilson's (1986) suggestion that rhetorical devices like metaphor, irony and repetition are particular means of achieving relevance. As I have suggested, the corrections that are made in unplanned discourse are also made in the pursuit of optimal relevance. However, these are made because the speaker recognises that the original formulation did not achieve optimal relevance.

The main aim of this article is to propose an exercise in stylistic analysis which can be employed in the teaching of English language. It details the design and results of a workshop activity on narrative carried out with undergraduates in a university department of English. The methods proposed are intended to enable students to obtain insights into aspects of cohesion and narrative structure: insights, it is suggested, which are not as readily obtainable through more traditional techniques of stylistic analysis. The text chosen for analysis is a short story by Ernest Hemingway comprising only 11 sentences. A jumbled version of this story is presented to students who are asked to assemble a cohesive and well formed version of the story. Their re-constructions are then compared with the original Hemingway version.
Non-fiction / Male
151

Genre                    Count
Arts (Non-academic)         16
Arts (Academic)             24
Belief & Thought            24
Biography                   54
Commerce                    10
Leisure                     16
Science                     26
Soc. Sci. (Non-ac.)         52
Soc. Sci. (Ac.)             38
World Affairs               42
Experiment
Features: 400+ function words (FW); 600+ part-of-speech (POS) n-grams
Learner: exponential gradient / linear SVM
Test: 10-fold cross-validation
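The feature extraction and the 10-fold test protocol can be sketched roughly as follows. This is only an illustration: the short `FUNCTION_WORDS` list, the tokenizer, and the helper names are assumptions, not the 400+ word list or the code used in the experiment, and no learner is included.

```python
from collections import Counter
import re

# Hypothetical, tiny function-word list; the experiment uses 400+ such words.
FUNCTION_WORDS = ["the", "of", "and", "that", "for", "with", "she", "he", "in", "one"]

def fw_features(text, vocab=FUNCTION_WORDS):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in vocab]

def k_fold_indices(n_docs, k=10):
    """Yield (train, test) index lists; fold i holds out every k-th document."""
    for fold in range(k):
        test = [i for i in range(n_docs) if i % k == fold]
        train = [i for i in range(n_docs) if i % k != fold]
        yield train, test
```

Each document becomes one feature vector; a classifier is then trained on the `train` indices of each fold and scored on the held-out `test` indices, and the ten scores are averaged.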
[Figure: classification accuracy by feature set (FW, POS, FW+POS) for all documents, fiction, and non-fiction]
Non-Fiction
Male: that, one, of, PRP, AT0
Female: she, for, with, and, in, PNP
Feature Frequencies
(mean ± stderr)

Feature   Fiction male   Fiction female   Non-fiction male   Non-fiction female
PNP       732 ± 14       809 ± 15         291 ± 12           331 ± 17
he        145 ± 4.7      135 ± 4.7        47.5 ± 3.5         48.1 ± 4.3
she       67 ± 4.3       139 ± 6.9        8.73 ± 1.7         21.5 ± 2.3
AT0       735 ± 9.5      626 ± 8.7        884 ± 9.1          822 ± 12
DT0       160 ± 2.9      153 ± 2.0        220 ± 4.0          204 ± 4.6
the       520 ± 8.6      418 ± 7.5        611 ± 8.4          614 ± 12
XX0       84 ± 2.4       98 ± 2.2         54 ± 1.5           55 ± 2.3
PRP       623 ± 6.0      615 ± 5.7        767 ± 5.9          763 ± 7.0
PRF       170 ± 4.2      158 ± 3.7        355 ± 7.2          324 ± 7.9
for       55.7 ± 1.1     61.3 ± 1.0       77.9 ± 1.6         90.7 ± 1.4
with      58.6 ± 1.1     66.5 ± 1.0       56.9 ± 1.1         67.8 ± 1.4
and       234 ± 4.9      249 ± 5.5        242 ± 3.9          287 ± 5.2
Informational features
Involvedness features
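One way to read the means and standard errors in the feature frequency table is to turn each male/female gap into a z statistic. The sketch below does this for three fiction rows. Treating the two groups as independent samples is an assumption made here for illustration, and `z_score` is a hypothetical helper, not part of the original analysis.

```python
import math

def z_score(mean_a, se_a, mean_b, se_b):
    """z statistic for the difference of two means, assuming independent samples."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# feature -> (male mean, male stderr, female mean, female stderr), fiction rows
fiction = {
    "she": (67, 4.3, 139, 6.9),
    "the": (520, 8.6, 418, 7.5),
    "PRP": (623, 6.0, 615, 5.7),
}

for feature, (m, m_se, f, f_se) in fiction.items():
    # Positive z: more frequent in male-authored text; negative: female-authored.
    print(f"{feature}: z = {z_score(m, m_se, f, f_se):+.1f}")
```

Under this reading, "she" and "the" separate the two groups by many standard errors in fiction, while the PRP gap is within noise.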
Blog Corpus
85,000 blogs
Blogger-provided profiles (gender, age, occupation, astrological sign)
Harvested August 2004
Non-text ignored (formatting, quoting)
Example 1
Yesterday we had our second jazz competition. Thank God we weren't competing. We were sooo bad. Like, I was so ashamed, I didn't even want to talk to anyone after. I felt so rotton, and I wanted to cry, but...it's ok.
Example 2
My gracious boss had agreed to let me have one week off of "work." He did finally give me my report back after eight freakin' days! Now I only have the rest of this week and then one full week after my vacation to finish this damned thing.
Example 3
So about a month or two ago, I met Katy N. at a party in New York. Katy's friend, Kevin M., whom she met while living in Barcelona last year, lives in Miami and is working on getting a TV series produced. Kevin is friends with a guy named Charlie P.
Blog Corpus
Age       Female    Male   Total
unknown    12287   12259   24546
13-17       6949    4120   11069
18-22       7393    7690   15083
23-27       4043    6062   10105
28-32       1686    3057    4743
33-37        860    1827    2687
38-42        374     819    1193
43-48        263     584     847
>48          314     906    1220
Total      34169   37324   71493
Experimental Setup
Feature sets:
  Content: words (filtered by information gain on the training set)
  Style: parts of speech, function words, blog slang
Learning algorithms:
  Real-valued balanced winnow (RBW)
  Bayesian Multinomial Regression (BMR)
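Filtering content words by information gain can be sketched as below. This is a minimal version under stated assumptions: each document is reduced to a set of tokens, the feature is binary (word present or absent), and the function names are hypothetical. The exact variant used in the experiments is not specified in the slides.

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a class-count distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def info_gain(docs, labels, word):
    """Information gain of 'word occurs in document' with respect to the labels.

    docs: list of token sets; labels: list of class labels, same length.
    """
    classes = sorted(set(labels))

    def class_counts(indices):
        return [sum(1 for i in indices if labels[i] == c) for c in classes]

    all_idx = list(range(len(docs)))
    with_word = [i for i in all_idx if word in docs[i]]
    without_word = [i for i in all_idx if word not in docs[i]]
    n = len(docs)
    conditional = sum(
        len(part) / n * entropy(class_counts(part))
        for part in (with_word, without_word)
    )
    return entropy(class_counts(all_idx)) - conditional
```

Scores would be computed on the training documents only, and the highest-scoring words kept as content features, so that the test folds play no part in feature selection.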
Age: Classification
feature      10s    20s    30s
college      1.51   1.92   1.31
bar          0.45   1.53   1.11
apartment    0.18   1.23   0.55
beer         0.32   1.15   0.7
student      0.65   0.98   0.61
drunk        0.77   0.88   0.41
album        0.64   0.84   0.56
dating       0.31   0.52   0.37
semester     0.22   0.44   0.18
someday      0.35   0.4    0.28
feature      10s    20s    30s
son          0.51   0.92   2.37
local        0.38   1.18   1.85
marriage     0.27   0.83   1.41
development  0.16   0.5    0.82
tax          0.14   0.38   0.72
campaign     0.14   0.38   0.7
provide      0.15   0.54   0.69
democratic   0.13   0.29   0.59
systems      0.12   0.36   0.55
workers      0.1    0.35   0.46
Gender: Classification
BMR
[Figure: words plotted on two axes, log(male/female) and log(30s/10s); the word "husband" is labeled]
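The log(30s/10s) quantity can be computed directly from the per-age-group frequencies in the age tables. The sketch below uses four words copied from those tables. The natural logarithm is an assumption, since the slides do not state the base, and `log_ratio` is a hypothetical helper.

```python
import math

# word -> (10s, 20s, 30s) frequencies, copied from the age tables
freq = {
    "college": (1.51, 1.92, 1.31),
    "drunk": (0.77, 0.88, 0.41),
    "son": (0.51, 0.92, 2.37),
    "tax": (0.14, 0.38, 0.72),
}

def log_ratio(a, b):
    """Natural log of a/b; positive when a is the larger frequency."""
    return math.log(a / b)

for word, (teens, twenties, thirties) in freq.items():
    print(f"{word:8s} log(30s/10s) = {log_ratio(thirties, teens):+.2f}")
```

Words such as "son" and "tax" come out positive (more typical of bloggers in their 30s), while "drunk" comes out negative.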
Native Language
Given English text, can we determine the author's native language?
Try it yourself. These three samples were written by a Russian, a French and a Spanish speaker. Can you tell which is which?
1. In the second part of this outhors novel, called Time Passes, time has passed indeed and Mrs Ramsay has died.
2. There are pejudments of small groups, such as homosexuals, inmigrants, aids diseaseds, etc. But "political correctness" has have positive and negative consecuences.
3. There is one more kind of films irritating many television viewers - "soap" serials. Santa Barbara has even won "Oskar" prize.
Possible Clues
Patterns of the native language are typically reflected in how other languages are spoken (Rado61, Corder81):
  Word selection
  Syntax
  Spelling

Frequency of function words
Frequency of letter sequences (adapted from Peng+ 04)
Idiosyncrasies
We will gather idiosyncrasies data automatically.
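Of the clues listed, letter-sequence frequencies are the simplest to compute. A minimal sketch follows, assuming letter bigrams counted within words. The choice of n, the tokenization, and the function name are illustrative assumptions rather than the setup adapted from Peng+ 04.

```python
from collections import Counter

def letter_ngram_freqs(text, n=2):
    """Relative frequency of each letter n-gram, counted within words."""
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return {gram: c / total for gram, c in counts.items()}
```

Applied to a learner text, unusual sequences such as the "nm" in "inmigrants" show up with nonzero frequency where native text would rarely contain them.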
Orthographic Idiosyncrasies
Repeated letter (e.g. remmit instead of remit)
Double letter appears once (e.g. comit instead of commit)
One letter substituted for another (e.g. firsd instead of first)
Letter inversion (e.g. fisrt instead of first)
Inserted letter (e.g. friegnd instead of friend)
Missing letter (e.g. frend instead of friend)
Conflated words (e.g. stucktogether)
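Given a misspelled word and its intended form (for example, from a spell checker's top suggestion), the single-word error types above can be told apart mechanically. The function below is a hypothetical sketch, not the tool used in the study. It handles one-edit errors only and does not cover conflated words.

```python
def classify_misspelling(typed, intended):
    """Return the orthographic error type for a one-edit misspelling, else 'other'."""
    t, w = typed.lower(), intended.lower()
    if len(t) == len(w):
        diffs = [i for i in range(len(w)) if t[i] != w[i]]
        if len(diffs) == 1:
            return "substituted letter"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and t[diffs[0]] == w[diffs[1]] and t[diffs[1]] == w[diffs[0]]):
            return "letter inversion"
    elif len(t) == len(w) + 1:
        # One extra letter: find a position whose deletion yields the intended word.
        for i in range(len(t)):
            if t[:i] + t[i + 1:] == w:
                doubled = ((i > 0 and t[i] == t[i - 1])
                           or (i + 1 < len(t) and t[i] == t[i + 1]))
                return "repeated letter" if doubled else "inserted letter"
    elif len(t) + 1 == len(w):
        # One letter short: find the letter of the intended word that was dropped.
        for i in range(len(w)):
            if w[:i] + w[i + 1:] == t:
                doubled = ((i > 0 and w[i] == w[i - 1])
                           or (i + 1 < len(w) and w[i] == w[i + 1]))
                return "double letter appears once" if doubled else "missing letter"
    return "other"
```

Counting how often each error type occurs per document gives one numeric feature per type.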
Syntactic Idiosyncrasies
Sentence fragment
Run-on sentence
Repeated word
Missing word
Mismatched singular/plural
Mismatched tense
that/which confusion
Rare POS pairs (Chodorow-Leacock 00)
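Two of these checks are easy to illustrate: repeated words and rare POS pairs. The sketch below flags POS bigrams that fall under a raw count threshold in a reference set of tagged sentences. That threshold is a deliberate simplification assumed here, not the statistical measure of Chodorow-Leacock 00. The tag names and function names are hypothetical.

```python
from collections import Counter

def pos_bigram_counts(tagged_sentences):
    """Count adjacent POS-tag pairs over a reference set of tag sequences."""
    counts = Counter()
    for tags in tagged_sentences:
        counts.update(zip(tags, tags[1:]))
    return counts

def rare_pos_pairs(tags, reference_counts, min_count=1):
    """POS bigrams in `tags` seen fewer than `min_count` times in the reference."""
    return [pair for pair in zip(tags, tags[1:]) if reference_counts[pair] < min_count]

def repeated_words(tokens):
    """Indices of tokens that repeat the immediately preceding token."""
    return [i for i in range(1, len(tokens))
            if tokens[i].lower() == tokens[i - 1].lower()]
```

In practice the reference counts would come from a large tagged corpus of native English, and the tags for the learner text from an automatic tagger.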
Test Corpus
International Corpus of Learner English (Granger98)
11 countries
Subjects: same age, proficiency level
Samples: same genre, length
Actually used in study: 258 docs from each of
  France
  Spain
  Bulgaria
  Czech Rep.
  Russia
Baseline=20%
Confusion Matrix
(rows: actual; columns: classified as)

Actual       Czech   French   Bulgarian   Russian   Spanish
Czech          209        1          18        20        10
French           9      219          13        12         5
Bulgarian       14        8         211        18         7
Russian         24        8          24       194         8
Spanish         16       10          10         7       215
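Overall accuracy and per-language recall and precision follow directly from the confusion matrix. The sketch below reads the matrix with rows as the actual language and columns as the assigned one. The Czech-to-French cell does not survive in the slide text, so the value 1 used here is an assumption inferred from the 258 documents per language.

```python
LANGS = ["Czech", "French", "Bulgarian", "Russian", "Spanish"]

# Rows = actual language, columns = classified as (both in LANGS order).
# ASSUMPTION: the Czech -> French cell (1) is inferred so that the row sums to 258.
CONFUSION = [
    [209, 1, 18, 20, 10],
    [9, 219, 13, 12, 5],
    [14, 8, 211, 18, 7],
    [24, 8, 24, 194, 8],
    [16, 10, 10, 7, 215],
]

def accuracy(matrix):
    """Fraction of all documents on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))

def recall_per_class(matrix):
    """Diagonal cell divided by its row total (documents of that language)."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

def precision_per_class(matrix):
    """Diagonal cell divided by its column total (documents assigned that language)."""
    cols = list(zip(*matrix))
    return [cols[i][i] / sum(cols[i]) for i in range(len(matrix))]

for lang, r, p in zip(LANGS, recall_per_class(CONFUSION), precision_per_class(CONFUSION)):
    print(f"{lang:10s} recall={r:.3f} precision={p:.3f}")
print(f"accuracy={accuracy(CONFUSION):.3f}")
```

With five equally sized classes, these figures are to be compared against the 20% baseline.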
Spanish: c-q confusion (e.g. cuality), m-n confusion (e.g. confortable), undoubled consonant (e.g. comit)
Bulgarian: most_ADVERB, cannot (uncontracted)
Czech: doubled consonant (e.g. remmit)
French: In the second part of this outhors novel, called Time Passes, time has passed indeed and Mrs Ramsay has died.
Spanish: There are pejudments of small groups, such as homosexuals, inmigrants, aids diseaseds, etc. But "political correctness" has have positive and negative consecuences.
Russian: There is one more kind of films irritating many television viewers - "soap" serials. Santa Barbara has even won "Oskar" prize.
Real-Life Issues
Many candidate languages
Very short texts
Unpredictable English proficiency
Personality
Pennebaker data:
Students wrote essays
Same students took personality assessment tests
Accuracy Results
Open 66%
Conscientious 65%
Neurotic 63%
Extroverted 62%
Agreeable 60%
Key Features
Openness
consciousness, strange, thoughts, maybe, you
hope, feel, home, friends, football, team
Conscientiousness
school, always, high, grades
damn, bad, hate, you, more