OPTIMUM PARAMETER SELECTION FOR KLD BASED AUTHORSHIP ATTRIBUTION IN GUJARATI
Parth Mehta, Prasenjit Majumder
parth_me@daiict.ac.in, p_majumder@daiict.ac.in
PROBLEM
Given an article and a list of candidate authors, an authorship attribution system needs to predict the author who wrote the given article. The proposed method utilises the implicit feature weighting property of Kullback-Leibler Divergence (KLD) to create author profiles which do not face the following problems.

PROFILE BASED SYSTEMS
χ² Method: Completely unreliable profiles when the number of authors increases.
Delta Method [1]: The L1 norm is inefficient at capturing the distance between author profiles.
Z-Score [2]: Performance is limited by the source of the training set. If an author writes for two separate newspapers (or two columns for the same newspaper), the Z-Score based method distinguishes between the two. That might not always be the ideal case.
DATA
The articles used here are taken from supplements of the popular Gujarati newspaper Gujarat Samachar.
49 weekly periodicals
40 different authors (9 authors wrote two columns every week)
5039 total documents
9 different categories
WE PROPOSE
Keeping in mind that KLD implicitly weights the terms by their frequencies, we propose using KLD with a low number of most frequent terms because:
As the most frequent terms tend to be common across all articles, even a few articles can build a reliable author profile. On the other hand, for specific vocabulary based systems a small training set might be insufficient to represent the entire set of specific vocabulary.
Also, these frequent terms will be common across articles of different genres (written by the same author), while the same might not be true for specific vocabulary.
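The proposal above can be illustrated with a minimal sketch. The poster does not give its exact formulation, so the following are assumptions made only for illustration: additive smoothing, the direction D(query || author profile), whitespace-style token lists as input, and all function names.

```python
import math
from collections import Counter


def top_terms(docs, t):
    """Return the t most frequent terms over all training documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    return [term for term, _ in counts.most_common(t)]


def profile(docs, vocab, smoothing):
    """Smoothed relative frequencies of the vocabulary terms in docs.

    Additive smoothing is an assumption; the poster only names a
    smoothing parameter without defining the scheme.
    """
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[v] for v in vocab) + smoothing * len(vocab)
    return {v: (counts[v] + smoothing) / total for v in vocab}


def kld(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared vocabulary.

    Each term contributes p * log(p / q), so frequent terms carry more
    weight. Requires q[v] > 0 wherever p[v] > 0 (use smoothing > 0).
    """
    return sum(p[v] * math.log(p[v] / q[v]) for v in p if p[v] > 0)


def attribute(query_doc, author_docs, t=100, smoothing=0.01):
    """Return (predicted author, divergence per author) for query_doc."""
    all_docs = [d for docs in author_docs.values() for d in docs]
    vocab = top_terms(all_docs, t)
    query = profile([query_doc], vocab, smoothing)
    scores = {
        author: kld(query, profile(docs, vocab, smoothing))
        for author, docs in author_docs.items()
    }
    return min(scores, key=scores.get), scores
```

Here `author_docs` maps an author name to a list of tokenised articles, and the author whose profile has the smallest divergence from the query is returned. Because each term contributes in proportion to its probability, no separate term weighting step is needed.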
RESULTS
For a given experiment, the techniques which had all the parameters the same (except the one under experimentation) are shown in the same colour. A marked result shows a significant difference compared to the maximum value.

[Figure: Effect of variation in smoothing parameter. Y-axis: Accuracy (0 to 100). Methods: Z-Score, Delta, KLD, χ². Smoothing settings on the x-axis: 0, 0.1, 0 (L1), 0 (L2), 0.01, 0.001, 0.]

[Figure: Effect of variation in number of terms. Y-axis: Accuracy (50 to 100). Methods: Z-Score, Delta (L2), KLD. Term selections on the x-axis: tf > 10, df > 3; tf > 100, df > 3; Top 400; Top 100; tf > 10, df > 3; tf > 100, df > 3; tf > 1000, df > 3.]

[Figure: Effect of variation in size of training set. Y-axis: Accuracy (0 to 100). X-axis: N = 10, N = 40, N = Nmax. Methods: Delta (L2), Z-Score, KLD.]
EXPERIMENTS
While keeping the other parameters constant:
1. Find the best performing value of the smoothing parameter ()
2. Optimum number of terms (T)
3. Effect of variation in number of training texts (N)
We start with a reasonable guess of the parameter values, keeping in mind the results found in the literature.
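The one-parameter-at-a-time procedure above can be sketched as follows. This is only an illustration: `evaluate` is a hypothetical callable that returns an accuracy for a parameter setting, and `toy_accuracy` is an invented stand-in, not a result from the poster. The grid values are taken from the axis labels of the figures.

```python
def sweep_one_at_a_time(evaluate, start, grids, order):
    """Tune parameters one by one, holding all others fixed.

    evaluate: callable taking the parameters as keyword arguments and
              returning an accuracy (hypothetical, supplied by the caller).
    start:    initial guess for every parameter.
    grids:    candidate values for each parameter to be tuned.
    order:    the sequence in which the parameters are tuned.
    """
    best = dict(start)
    history = []
    for name in order:
        best_acc, best_val = float("-inf"), best[name]
        for value in grids[name]:
            candidate = dict(best)
            candidate[name] = value
            acc = evaluate(**candidate)
            if acc > best_acc:  # the first best value wins on ties
                best_acc, best_val = acc, value
        best[name] = best_val  # freeze it before tuning the next parameter
        history.append((name, best_val, best_acc))
    return best, history


def toy_accuracy(smoothing, terms, n_train):
    """Invented stand-in for a real train-and-test run."""
    return -abs(smoothing - 0.01) - abs(terms - 100) / 1000 + n_train / 1000


start = {"smoothing": 0.1, "terms": 400, "n_train": 40}
grids = {"smoothing": [0.1, 0.01, 0.001], "terms": [400, 100]}
best, history = sweep_one_at_a_time(
    toy_accuracy, start, grids, order=["smoothing", "terms"]
)
```

Tuning one parameter at a time needs far fewer runs than a full grid, at the cost of possibly missing interactions between parameters. The effect of the training set size N can then be measured by re-running `evaluate` with the tuned values and different `n_train`.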
ANALYSIS
For each of the 9 authors having two distinct columns, we constructed the author profiles assuming that each column is written by a separate author. We then analyse the top 50 terms that distinguish a particular profile from that of other authors.
For the author Ashok Dave, the top five terms selected from "On a wednesday afternoon" and "Encounter" are more or less the same for the KLD based method but are distinct for the Z-Score based method.
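One way to carry out this kind of comparison is sketched below. The poster does not say how the distinguishing terms were ranked, so ranking by each term's contribution p * log(p / q) to the KLD is an assumption, as are the function names and the toy profiles.

```python
import math


def top_distinguishing_terms(profile, others, k=5):
    """Rank terms by their contribution p * log(p / q) to D(profile || others).

    profile: term -> probability for one column (or author).
    others:  term -> probability pooled over the remaining authors.
    Terms with zero probability on either side are skipped.
    """
    contrib = {
        term: p * math.log(p / others[term])
        for term, p in profile.items()
        if p > 0 and others.get(term, 0) > 0
    }
    return sorted(contrib, key=contrib.get, reverse=True)[:k]


def overlap(terms_a, terms_b):
    """Number of terms shared by two top-term lists."""
    return len(set(terms_a) & set(terms_b))


# Toy profiles, invented for illustration only.
column_1 = {"a": 0.5, "b": 0.3, "c": 0.2}
column_2 = {"a": 0.45, "b": 0.35, "c": 0.2}
rest = {"a": 0.2, "b": 0.3, "c": 0.5}

shared = overlap(
    top_distinguishing_terms(column_1, rest, k=2),
    top_distinguishing_terms(column_2, rest, k=2),
)
```

A high overlap between the two columns of one author suggests that the method captures the author rather than the column.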
CONCLUSION
For Gujarati newspaper articles, KLD based authorship attribution with proper parameter selection is comparable to the current state-of-the-art Z-Score based method when a sufficient number of articles is available as a training set, and outperforms it when the size of the training set is small.
ACKNOWLEDGEMENT
This research is supported in part by the Cross Lingual Information Access project funded by D.I.T., Government of India.
REFERENCES
[1] John Burrows. Delta: A measure of stylistic difference and a guide to likely authorship. In Literary and Linguistic Computing, 2002.
[2] Jacques Savoy. Authorship attribution based on specific vocabulary. In ACM Transactions on Information Systems, 2012.
