Job Classification Based on LinkedIn Summaries

CS 224D

Eric Boucher, Clement Renault
Stanford University
Abstract
Using data provided by HiQ Labs, a company specializing in data analysis for Human Resources, we want to predict a person's job category from their job summary. Job summaries are created by users to describe their skills and tasks. Our goal is to use NLP to extract information from these free-form text fields and predict the occupation of the user.
1 Introduction

Despite the existence of some classifications, there is no clear normalization of job titles, and people often use their own words to briefly describe what they do. Summaries therefore provide an interesting platform for testing text understanding and classification techniques.
2 Task Definition

As suggested earlier, job titles appear in many variations. Thus, our first task was to classify a large number of job titles (around 20,000) into the 133 level-3 categories of the ISCO 2008 Standard Classification. From there, we trained a one-hidden-layer neural network to classify the associated summaries. Ideally, we would like our neural network to understand subtleties of language, such as power relationships, so that it can capture hierarchical positions as well.
3 Background and Related Work

We could not find previous work that treats exactly the same subject and attempts to classify summaries. However, numerous papers exist on text understanding and text classification using various machine-learning techniques. Based on what we were trying to achieve, we focused on papers regarding text understanding with neural networks. We thought that using a Recurrent Neural Network would make sense. Other interesting techniques include the extended RNN of Garen Arevian [1] and the Recurrent Convolutional Neural Network [2], both of which have shown interesting results. This last model, developed by Siwei Lai, Liheng Xu, Kang Liu and Jun Zhao, captures contextual information with a recurrent structure and constructs the representation of the text with a convolutional neural network. On the downside, text understanding and text classification are most commonly applied to texts composed of complete, grammatically correct sentences, such as news articles or short stories. Summaries are structurally different, as people often use bullet points and abbreviations to make their points more effectively.

4 Approach
4.1 Data
We use a dataset provided by the company HiQ Labs (http://www.hiqlabs.com). This company specializes in HR analytics and has compiled a large dataset of professional profiles from various public online sources. The dataset contains the LinkedIn summaries of approximately 25k people, as well as the job titles the company gathered for these profiles. An example entry is:
Senior Vice President, Financial Advisor, Portfolio Management Director

"As a licensed investment professional for more than 16 years, David leads a team that implements risk controlled wealth management strategies for high net worth clients. Specialities: Wealth Management, Portfolio Management, Certified Financial Planner, Retirement Planning, Asset Allocation, Investment Strategies, Estate Planning".
4.2 Label Acquisition

As suggested above, one of the big challenges we encountered was the fact that job titles are also filled in by users. This means that many titles are slightly different although the jobs are very similar. Our first task was therefore to regroup the job titles into a smaller number of categories. To do so we considered several approaches. First we tried clustering the job titles using binary vectors indicating which words appear in each title, followed by K-means clustering on these vectors. Although this approach gave interesting results, this classification was very important for the next steps of our project, so we decided to go a little further. In the end we represented each title as the mean of the word vectors of its words (Google's pre-trained word2vec, dimension 300), applied K-means clustering, and assigned each cluster to the closest occupation of the ISCO 2008 Standard (level 3, i.e. 133 categories), where each occupation is likewise represented by its average word vector.
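A minimal sketch of this pipeline is given below, assuming gensim's pre-trained word2vec loader and scikit-learn's KMeans; the file path, variable names, and toy inputs are illustrative, not our actual code.

    import numpy as np
    from gensim.models import KeyedVectors
    from sklearn.cluster import KMeans

    # Google's pre-trained 300-dimensional word2vec vectors (path is illustrative).
    w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

    def embed(text):
        """Represent a piece of text as the mean of its in-vocabulary word vectors."""
        vecs = [w2v[w] for w in text.lower().split() if w in w2v]
        return np.mean(vecs, axis=0) if vecs else np.zeros(300)

    # Toy inputs; the real pipeline used ~20,000 titles and 133 categories.
    titles = ["Commercial Loan Officer", "Senior Analyst Pricing and Margin Analysis"]
    isco_categories = ["Finance Professionals", "Science and Engineering Associate Professionals"]

    title_vecs = np.array([embed(t) for t in titles])
    isco_vecs = np.array([embed(c) for c in isco_categories])

    # Cluster the title vectors, then assign each cluster centroid to the
    # closest ISCO level-3 category by cosine similarity.
    km = KMeans(n_clusters=2, n_init=10).fit(title_vecs)
    unit = lambda M: M / np.linalg.norm(M, axis=1, keepdims=True)
    cluster_to_isco = (unit(km.cluster_centers_) @ unit(isco_vecs).T).argmax(axis=1)

    for title, cluster in zip(titles, km.labels_):
        print(title, "->", isco_categories[cluster_to_isco[cluster]])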
4.2.1 Results

Table 1 shows a sample of clustering results that illustrates the power of word vectors.
Input Job Title                                                 | Output Category
Senior Analyst Pricing Profitability and Margin Analysis       | Finance Professionals
Commercial Loan Officer                                         | Financial and Mathematical Associate Professionals
Youth and Emerging Adult Project Coordinator                    | Legal Social Cultural and Related Associate Professionals
Architect and Engineering Lead Data Center and Cloud Networking | Science and Engineering Associate Professionals

Table 1: Sample Clustering Results
The distribution of the different classes is given in Figure 1.
4.3 Recurrent Neural Network

We tested a variety of models and methods on the data. At first, we tried a naive Recurrent Neural Network that reads each summary word by word to predict the outcome category. Initially, we trained the word vectors on the fly, as we feared that the structural differences between summaries and the type of texts used to train commonly available word vectors would hurt the results. Faced with poor results, we then tried pre-trained word vectors (Google). Unfortunately, the results were still unconvincing.

[Figure 1: Distribution of the different classes.]
[Figure 2: Equations of the Recurrent Neural Network.]
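For a vanilla RNN classifier of the kind described above, these equations would plausibly read as follows (our reconstruction, in CS 224D notation; the exact formulation used may differ):

    h_t = \sigma\left(W^{(hh)} h_{t-1} + W^{(hx)} x_t\right)
    \hat{y} = \operatorname{softmax}\left(U h_T\right)

where x_t is the word vector of the t-th word of the summary, h_T is the final hidden state, and \hat{y} is the predicted distribution over the 133 categories.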
[Figure 3: Sketch of the RNN at a Single Timestep.]
4.4 One-Hidden-Layer Neural Network

Eventually, we tried a very simple approach and settled on a basic one-hidden-layer neural network taking as input the average of the word vectors of the summary. Surprisingly (or not), we got much improved results.
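A minimal numpy sketch of such a model is shown below; the tanh activation and the initialization scheme are our assumptions, as they are not specified here.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    class OneHiddenLayerNet:
        """Averaged summary vector (dim 300) -> hidden layer (middleDim) -> 133 classes."""
        def __init__(self, in_dim=300, middle_dim=140, n_classes=133, seed=0):
            rng = np.random.default_rng(seed)
            self.W1 = rng.normal(0.0, 0.01, (middle_dim, in_dim))
            self.b1 = np.zeros(middle_dim)
            self.W2 = rng.normal(0.0, 0.01, (n_classes, middle_dim))
            self.b2 = np.zeros(n_classes)

        def predict_proba(self, x_avg):
            h = np.tanh(self.W1 @ x_avg + self.b1)   # hidden layer (activation assumed)
            return softmax(self.W2 @ h + self.b2)    # distribution over job categories

    # Usage: x_avg is the mean of the summary's word vectors, as in Section 4.2.
    model = OneHiddenLayerNet()
    probs = model.predict_proba(np.zeros(300))       # placeholder input
    predicted_class = int(probs.argmax())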
5 Experiments and Results

5.1 Data and Job Classification
As mentioned earlier, classifying the labels before training the model was a much longer task than we had expected. However, we are quite happy with the results and believe that this part would be worth spending more time on, as it could be very useful in itself to help classify job titles into fewer categories. The closest-occupation method is interesting in its own right: for example, one could get a global overview of occupations in a country using public LinkedIn data. This would be very useful for obtaining a live overview of social strata.

[Figure 4: Equations of the One-Hidden-Layer Neural Network.]
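For the model of Section 4.4, these equations would plausibly be (our notation; the report does not specify the hidden activation f):

    \bar{x} = \frac{1}{T} \sum_{t=1}^{T} x_t, \qquad
    h = f\left(W^{(1)} \bar{x} + b^{(1)}\right), \qquad
    \hat{y} = \operatorname{softmax}\left(W^{(2)} h + b^{(2)}\right)

where the x_t are the word vectors of the summary and h has dimension middleDim.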
5.1.1 Improvements

Looking at the histogram of the 133 classes, we thought that it might be interesting to reduce the number of classes and regroup some categories to have more homogeneously distributed labels.
5.2 Recurrent Neural Network
The results of the RNN were inconclusive. We believe that the average length of the summaries created too much noise for the network to interpret. Implementing a GRU or LSTM model could improve this model significantly. Unfortunately, we did not have enough time to implement these techniques.
5.3 One-Hidden-Layer Neural Network

We found that a middleDim of around 140 yielded good results. Our best model achieved 31.7 percent accuracy on the training set and 29.4 percent on the dev set over 133 classes. Our baseline was to classify everything into the most frequent category, which represented 16 percent of our dataset.
[Figure 5: Training Accuracy and Test Accuracy vs. middleDim.]
Confusion matrices for different models are plotted in Figure 6. One can clearly see that the model has learnt some patterns, as a diagonal starts to appear when middleDim increases. We can also see that the model does not learn the last classes very well, which is explained by the very small number of samples in the classes above index 77 (see Figure 1).
5.4 Final Results and Discussion

In the end, we were able to achieve 31.7 percent accuracy on 133 classes against a baseline of 16 percent. However, accuracy is calculated in a binary way (in or out), and it would be interesting to see how the model performs under a more flexible metric. For example, we could use the cosine distance between the predicted and target categories to take into account that some job categories might actually be quite close to one another.
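Such a soft metric could reuse the average-word-vector embeddings of the category names from Section 4.2; the sketch below (with an assumed category_vecs array) scores a prediction by cosine similarity rather than exact match.

    import numpy as np

    def soft_accuracy(pred_ids, target_ids, category_vecs):
        """Mean cosine similarity between predicted and target category embeddings.
        Equals 1.0 for exact matches and degrades gracefully when the predicted
        category is semantically close to the target one."""
        scores = []
        for p, t in zip(pred_ids, target_ids):
            a, b = category_vecs[p], category_vecs[t]
            scores.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        return float(np.mean(scores))

    # category_vecs[i] = average word vector of the i-th ISCO category name.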

[Figure 6: Confusion Matrices Evolution. (a) Training set confusion matrix for middleDim = 10; (b) training set confusion matrix for middleDim = 60; (c) training set confusion matrix for middleDim = 120; (d) training set confusion matrix for best model; (e) dev set confusion matrix for best model.]
6 Conclusion
The different algorithms implemented gave mixed results. In particular, we believe that our implementation does not give the RNN a fair chance, and that changes such as implementing a GRU or LSTM would improve this model significantly. This project was an occasion to witness for ourselves the power of well-trained word vectors: we were happily surprised to obtain such interesting results from K-means on the averaged word vectors of the titles. Adding tf-idf weights could further improve the classification process.
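One way such a tf-idf weighting could look, assuming an idf table precomputed over the summary corpus (both the table and the tokenization below are our assumptions):

    import numpy as np

    def tfidf_weighted_embedding(text, w2v, idf, dim=300):
        """Average word vectors weighted by tf-idf, so that frequent,
        uninformative words contribute less to the summary vector."""
        tokens = text.lower().split()
        total, weight_sum = np.zeros(dim), 0.0
        for w in set(tokens):
            if w in w2v and w in idf:
                weight = (tokens.count(w) / len(tokens)) * idf[w]  # tf * idf
                total += weight * w2v[w]
                weight_sum += weight
        return total / weight_sum if weight_sum > 0 else total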
This model could be interesting for two main applications:

• Associating a description with a job title.

• Describing what you like in order to find a job. Indeed, students are often able to list activities that they would like to do in their job, but are rarely informed enough to know what job title would suit their needs. Similarly, job seekers might know what skills and activities from their current job they would like to keep using, but not what other position they could use them for.
References

[1] Garen Arevian (2007) Recurrent Neural Networks for Robust Real-World Text Classification. IEEE/WIC/ACM International Conference on Web Intelligence, 2-5 Nov. 2007.

[2] Siwei Lai, Liheng Xu, Kang Liu, Jun Zhao (2015) Recurrent Convolutional Neural Networks for Text Classification. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence.

[3] Ilya Sutskever (2013) Training Recurrent Neural Networks. PhD thesis, University of Toronto.

[4] Richard Socher (2015) CS224D: Deep Learning for Natural Language Processing. Stanford University.