
Inferential Problem Examples:

1. To test if the developed regions of a state give significantly better service
delivery quality than the underdeveloped regions. (Comparison, Explanatory)
2. To find which model is better for e-commerce: Marketplace or Inventory-led. (Comparison)
3. To determine the number of math classes attended on average by
graduating batches in BITS during their four years in college.
(Summarization)
4. To find if men get higher wages than women. (Comparison, Explanatory; a minimal
test sketch follows this list)
5. To find the revenue growth by product line and average revenue per
square foot of retail space. (Summarization)
6. To test a new medicine on a random sample of patients to see how
effective it is at curing the disease. (Comparison)
7. To compare performance with competitors by time, geography and
category. (Comparison)
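A minimal sketch of how a "Comparison" problem such as example 4 (do men get higher wages than women?) could be framed as a two-sample test. The wage figures below are made-up illustrative numbers, not real data.

```python
# Hypothetical two-sample comparison: are group A's wages significantly higher?
from scipy import stats

wages_group_a = [520, 610, 580, 495, 640, 700, 560]   # made-up sample A
wages_group_b = [480, 530, 510, 450, 600, 565, 505]   # made-up sample B

# Welch's t-test (does not assume equal variances)
t_stat, p_two_sided = stats.ttest_ind(wages_group_a, wages_group_b, equal_var=False)
# convert to a one-sided p-value for "A > B"
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2

print(f"t = {t_stat:.3f}, one-sided p = {p_one_sided:.3f}")
if p_one_sided < 0.05:
    print("Reject H0: group A's mean wage is significantly higher.")
else:
    print("Fail to reject H0 at the 5% level.")
```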

Recommender Systems
Input: "since you are looking at this item"
Output: "you might also look at these items"
Rating prediction: item and user are fixed
Item Prediction: user is fixed
#Collaborative filtering: kNN, association-rule-based prediction, matrix
factorization
User-based kNN: find like-minded users.
The average rating given by a user matters because a user can be either habitually
harsh or too liberal, so ratings should be centered on each user's mean.
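A minimal sketch of user-based kNN collaborative filtering with mean-centering, so that harsh and generous raters become comparable. The small ratings matrix is purely illustrative.

```python
import numpy as np

# rows = users, columns = items; np.nan marks "unknown", not zero
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 4.0, 1.0],
    [1.0, 2.0, np.nan, 5.0],
    [np.nan, 1.0, 1.0, 4.0],
])

user_means = np.nanmean(R, axis=1)            # each user's average rating
centered = R - user_means[:, None]            # remove per-user bias
centered = np.nan_to_num(centered, nan=0.0)   # unknowns contribute nothing

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return u @ v / denom if denom else 0.0

def predict(user, item, k=2):
    """Predict R[user, item] from the k most similar users who rated the item."""
    sims = [(cosine(centered[user], centered[other]), other)
            for other in range(R.shape[0])
            if other != user and not np.isnan(R[other, item])]
    top = sorted(sims, reverse=True)[:k]
    if not top:
        return user_means[user]
    num = sum(s * centered[other, item] for s, other in top)
    den = sum(abs(s) for s, _ in top)
    return user_means[user] + num / den

print(predict(0, 2))   # predicted rating of user 0 for item 2
```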

Association-rule-based CF: fuzzy rules.


Metrics: confidence and support.
Confidence: if x and y have occurred together 100 times and z has also occurred in
90 of those cases, the confidence of the rule {x, y} -> z is 90/100 = 0.9.
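A minimal sketch of computing support and confidence for a rule {x, y} -> z over a toy transaction list (the transactions are made up).

```python
transactions = [
    {"x", "y", "z"}, {"x", "y", "z"}, {"x", "y"}, {"x", "z"},
    {"x", "y", "z"}, {"y", "z"}, {"x", "y", "z"}, {"x", "y", "z"},
]

antecedent = {"x", "y"}
consequent = {"z"}

n = len(transactions)
count_xy = sum(antecedent <= t for t in transactions)                  # x and y together
count_xyz = sum((antecedent | consequent) <= t for t in transactions)  # x, y and z together

support = count_xyz / n            # fraction of all transactions containing x, y and z
confidence = count_xyz / count_xy  # of the transactions with x and y, how many also have z

print(f"support = {support:.2f}, confidence = {confidence:.2f}")
```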
Term-document matrix
Missing values are not zeros; they are unknowns and could be filled with the average
or some other statistic.
SVD is used to reduce dimensionality. To eliminate bias, we bring the averages to
zero (mean-center the ratings) before factorizing.
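A minimal sketch of dimensionality reduction on a ratings matrix with SVD. As one simple choice, unknowns are filled with the user's average and the averages are subtracted so the filled entries become zero; the matrix below is toy data.

```python
import numpy as np

R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 4.0, 1.0],
    [1.0, 2.0, np.nan, 5.0],
])

user_means = np.nanmean(R, axis=1)
centered = np.nan_to_num(R - user_means[:, None], nan=0.0)   # missing -> 0 after centering

U, s, Vt = np.linalg.svd(centered, full_matrices=False)
k = 2                                                         # keep k latent factors
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]                # low-rank reconstruction

predictions = approx + user_means[:, None]                    # add the per-user bias back
print(np.round(predictions, 2))
```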
# High-support rules vs. correlation of rare items
For rare items, confidence can be very high even though support is low.
How to find the K most similar images?
As dimensionality increases, fewer neighbors lie in the nearby space (the curse of
dimensionality).

# How to find similar documents?


We want the set of overlapping words between documents, which are basically
sequences of words.
Jaccard similarity between two sets: a document is represented by its set of
shingles.
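A minimal sketch of representing documents as sets of word shingles and comparing them with Jaccard similarity; the two sentences are made up.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """|A intersect B| / |A union B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

d1 = "the quick brown fox jumps over the lazy dog"
d2 = "the quick brown fox leaps over a lazy dog"

print(jaccard(shingles(d1), shingles(d2)))
```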
LSH intuition: two points that are close in the high-dimensional space will remain
close in the lower-dimensional space.
#Hashing
Randomly permute the rows; the permutation itself needs extra storage.
h(D) is the row index where the first permuted 1 is found for each column (document D).
If Di and Dj are hashed to the same value, they likely resemble each other.
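A minimal sketch of minhashing as described above: each random row permutation gives one signature value per document (the first permuted row where that document has a 1), and documents that agree on many signature values are likely similar. The characteristic matrix below is a toy example.

```python
import random

# rows = shingles, columns = documents; 1 means the shingle occurs in the document
matrix = [
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
]

def minhash_signature(matrix, num_hashes=10, seed=42):
    rng = random.Random(seed)
    n_rows, n_docs = len(matrix), len(matrix[0])
    signature = []
    for _ in range(num_hashes):
        perm = list(range(n_rows))
        rng.shuffle(perm)                       # one random row permutation
        row_values = []
        for doc in range(n_docs):
            # first row, in permuted order, where this document has a 1
            row_values.append(min(perm[r] for r in range(n_rows) if matrix[r][doc]))
        signature.append(row_values)
    return signature

sig = minhash_signature(matrix)
# the fraction of hash functions on which two documents agree estimates their Jaccard similarity
agree = sum(row[0] == row[2] for row in sig) / len(sig)
print(f"Estimated similarity of doc 0 and doc 2: {agree:.2f}")
```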

#Sentiment Analysis
Quantify qualitative things
"Unhappy" is a keyword.
Goal: understand psychology; there is a lot of scope since existing technologies are
not sufficient.
Tokenizing: e.g., splitting on white space.
#Semi-supervised learning of lexicons
Manual labeling is tough. People have actually added words manually, but a parsing
problem remains if none of these words are found in the reviews.
So can we start with a few manual labels and let the system learn more on its own?
"and": joins words of the same polarity
"but": joins words of opposite polarity
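A minimal sketch of expanding a small seed polarity lexicon with the "and = same polarity, but = opposite polarity" heuristic. The seed words, pattern, and sentences are hypothetical illustrations.

```python
import re

lexicon = {"good": +1, "tasty": +1, "bad": -1}   # tiny manually labelled seed

sentences = [
    "the service was good and fast",
    "the food was tasty but expensive",
    "the room was clean and good",
]

def expand(lexicon, sentences):
    expanded = dict(lexicon)
    for s in sentences:
        # look for "<word> and <word>" / "<word> but <word>" patterns
        for w1, conj, w2 in re.findall(r"(\w+) (and|but) (\w+)", s):
            sign = 1 if conj == "and" else -1
            if w1 in expanded and w2 not in expanded:
                expanded[w2] = expanded[w1] * sign
            elif w2 in expanded and w1 not in expanded:
                expanded[w1] = expanded[w2] * sign
    return expanded

print(expand(lexicon, sentences))
# "fast" and "clean" inherit +1 via "and"; "expensive" gets -1 from "tasty" via "but"
```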
