Sohail Manzoor
(17-MS-SE-19)
Muhammad Zeshan
(17-MS-SE-02)
Hadeed Ullah
(17-MS-SE-08)
Outline
Goal of this presentation
Text Classification at Walmart
Steps of Text Classification

Traditional machine learning:
Read Documents → Feature Selection → Learning Algorithm
• Feature selection: Information Gain (IG), Chi-square, odds ratio
• Learning algorithm: Naïve Bayes, logistic regression, SVM, decision trees
• More steps, with several choices at each step
• Right choices are well established
• Major time is spent on feature engineering (see the pipeline sketch below)
• Easy to serve the model in real time

Deep learning:
Read Documents → Network Design
• Network design: CNN, RNN, number of layers
• Fewer steps
• Major time is spent on parameter tuning
• Real-time serving of the model can be challenging
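For reference, a minimal sketch of the traditional pipeline in scikit-learn; the two-document corpus, labels, and the particular pairing of chi-square selection with logistic regression are illustrative assumptions, not the setup from this talk.

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

# Toy corpus standing in for real product descriptions
docs = ["moisturizing cream for dry skin", "leather dress sandal with heel"]
labels = ["Personal Care", "Shoes"]

pipeline = Pipeline([
    ("features", TfidfVectorizer(ngram_range=(1, 2))),  # feature engineering
    ("select", SelectKBest(chi2, k=10)),                # chi-square feature selection
    ("learn", LogisticRegression()),                    # learning algorithm
])
pipeline.fit(docs, labels)
print(pipeline.predict(["hydrating body cream"]))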
Why Use Deep Learning for Text Classification
• Creates a uniform approach for all kinds of data (image, video, voice, text)
• Enables multi-modal learning from text and image
Deep Neural Networks for Text: RNN or CNN
• CNNs extract features
• They work well where feature detection is important (e.g., sentiment classification, positive/negative review classification)
• Historically, RNNs have outperformed CNNs where the length of the document is important (e.g., language translation)
• But RNNs take longer to train due to their sequential nature
• Recent research shows CNNs can outperform RNNs in accuracy on language translation:
https://code.facebook.com/posts/1978007565818999/a-novel-approach-to-neural-machine-translation/
CNN Architectures for Text Classification
1. Character-level CNN
• Zhang, X. et al., "Character-level Convolutional Networks for Text Classification," 2015, https://arxiv.org/pdf/1509.01626.pdf
• Absolutely no preprocessing of input
• More familiar deep CNN architecture: convolution and max-pooling layers followed by fully connected layers
2. Word-level CNN
• Kim, Y., "Convolutional Neural Networks for Sentence Classification," EMNLP 2014, https://arxiv.org/pdf/1408.5882.pdf
• Only word tokenization is used as preprocessing
• Uses max-pooling across the input (a Keras sketch of this architecture follows)
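A minimal sketch of the Kim-style word-level CNN in Keras (TensorFlow 2.x); the vocabulary size, sequence length, embedding width, and category count are illustrative assumptions, while the 3/4/5 filter widths, 100 filters per width, and 0.5 dropout follow the paper.

import tensorflow as tf
from tensorflow.keras import layers

vocab_size, seq_len, embed_dim, num_classes = 50000, 100, 128, 30  # assumed sizes

inputs = layers.Input(shape=(seq_len,), dtype="int32")
x = layers.Embedding(vocab_size, embed_dim)(inputs)  # word embeddings
# Parallel convolutions over 3-, 4-, and 5-word windows, each
# followed by max-pooling across the whole input.
pooled = []
for width in (3, 4, 5):
    conv = layers.Conv1D(100, width, activation="relu")(x)
    pooled.append(layers.GlobalMaxPooling1D()(conv))
x = layers.Concatenate()(pooled)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(num_classes, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])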
Character-level CNN
• Slow to train
• Slow during inference: more than 100 milliseconds on a P100 GPU
• Achieves 79% accuracy on the test set
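The "no preprocessing" input here is just a quantization of raw characters, roughly as in Zhang et al.: each character becomes a one-hot vector over a fixed alphabet, and documents are truncated or padded to a fixed length. A sketch, with an abridged alphabet:

import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%&"  # abridged
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}
MAX_LEN = 1014  # fixed input length used in the paper

def quantize(text):
    """Return a (MAX_LEN, len(ALPHABET)) one-hot matrix for text."""
    out = np.zeros((MAX_LEN, len(ALPHABET)), dtype=np.float32)
    for i, ch in enumerate(text.lower()[:MAX_LEN]):
        j = CHAR_INDEX.get(ch)
        if j is not None:  # characters outside the alphabet stay all-zero
            out[i, j] = 1.0
    return out

x = quantize("Moisturizing cream, 16 oz")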
Word-level CNN
Tokens around "moisturizing cream" are weighted high to categorize under "Personal Care / Bath & Body", and tokens around "dress sandal" are weighted high to categorize under "Clothing / Shoes". The brand "mojo moxy", which makes shoes, also got high weight.
Word Embedding
Accuracy vs. Steps
Parameter Tuning

Method                                            Accuracy
Baseline                                          85.20%
More filters, of sizes [2, 3, 4, 5, 6]            85.50%
Dropout probability increased from 0.5 to 0.75    85.97%
Batch size 2048 instead of 512                    84.91%
Batch size 64 instead of 512                      79.00%
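Each row above is a single-knob change from the baseline. Expressed as a sweep it might look like the sketch below, where train_and_eval is a hypothetical stand-in for a full Word-CNN training run and the baseline filter sizes are an assumption.

def train_and_eval(filter_sizes, dropout, batch_size):
    # placeholder: build the Word-CNN with these settings, train it,
    # and return test-set accuracy
    ...

baseline = dict(filter_sizes=[3, 4, 5], dropout=0.5, batch_size=512)
variants = [
    dict(baseline, filter_sizes=[2, 3, 4, 5, 6]),  # more filter widths -> 85.50%
    dict(baseline, dropout=0.75),                  # heavier dropout    -> 85.97%
    dict(baseline, batch_size=2048),               # larger batches     -> 84.91%
    dict(baseline, batch_size=64),                 # smaller batches    -> 79.00%
]
for params in variants:
    print(params, train_and_eval(**params))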
Scaling
Model      Inference latency
Word-CNN   4-8 milliseconds
Char-CNN   >100 milliseconds

- More than 60% of the time was spent preparing the next batch for Word-CNN on a P100
- Batch preparation can be done in parallel (a tf.data sketch follows this list)
- The TensorFlow reader can possibly be of great help
- TensorFlow compiled for SSE, AVX2, and FMA can be 4-8x faster
- Word-CNN training can be completed in 4-5 hours on tens of millions of examples on a CPU
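One way to act on the parallel batch preparation points is TensorFlow's tf.data input pipeline, which overlaps batch preparation with training. A hedged sketch; the file name and tab-separated line format are assumptions.

import tensorflow as tf

def parse_line(line):
    # assumed format: "label<TAB>space-separated token ids"
    parts = tf.strings.split(line, "\t")
    label = tf.strings.to_number(parts[0], tf.int32)
    tokens = tf.strings.to_number(tf.strings.split(parts[1], " "), tf.int32)
    return tokens, label

dataset = (
    tf.data.TextLineDataset("train.tsv")                   # hypothetical file
    .map(parse_line, num_parallel_calls=tf.data.AUTOTUNE)  # parse examples in parallel
    .padded_batch(512)                                     # pad to the longest example
    .prefetch(tf.data.AUTOTUNE)                            # prepare next batches during training
)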
Comparison against SVM
• An SVM with unigram + bigram features also achieves 85% accuracy, trained on 1/10th of the data (see the sketch below)
• Stochastic gradient descent on the full data does not reach more than 80% accuracy after the same number of epochs
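A minimal scikit-learn sketch of that SVM baseline: unigram + bigram features with a batch linear SVM, and the same features trained with stochastic gradient descent on the hinge loss; the toy corpus and labels are stand-ins for the real catalog data.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.svm import LinearSVC

docs = ["moisturizing cream", "dress sandal", "mojo moxy heels"]  # toy corpus
labels = ["Bath & Body", "Shoes", "Shoes"]

features = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(docs)  # unigrams + bigrams

svm = LinearSVC().fit(features, labels)                  # batch solver
sgd = SGDClassifier(loss="hinge").fit(features, labels)  # SGD on the same hinge loss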
Conclusion
• It is promising to see a CNN achieve state-of-the-art accuracy on a very well-studied problem with very little effort
• And the field is rapidly making progress
• Hopefully much higher accuracy soon!
We are Hiring!!!
https://www.linkedin.com/in/somnath-banerjee