1 of 38 13-02-2018 02:18 PM
40 Interview Questions asked at Startups in Machine Learning / Data S... https://www.analyticsvidhya.com/blog/2016/09/40-interview-questions...
Machine learning and data science are being seen as the
drivers of the next industrial revolution happening in the world
today. This also means that there are numerous
Processing high-dimensional data on a limited-memory machine is a
strenuous task, and your interviewer would be fully aware of that.
Following are the methods you can use to tackle such a situation:

1. Since we have limited RAM, we should close all other applications on our machine, including the web browser, so that most of the memory can be put to use.
2. We can randomly sample the data set. This means we can create a smaller data set, let’s say, having 1000 variables and 300000 rows, and do the computations.
3. To reduce dimensionality, we can separate the numerical and categorical variables and remove the correlated variables. For numerical variables, we’ll use correlation. For categorical variables, we’ll use the chi-square test.
4. Also, we can use PCA and pick the components which can explain the maximum variance in the data set.
5. Using online learning algorithms like Vowpal Wabbit (available in Python) is a possible option.
6. Building a linear model using Stochastic Gradient Descent is also helpful.
7. We can also apply our business understanding to estimate which predictors can impact the response variable. But this is an intuitive approach; failing to identify useful predictors might result in a significant loss of information.
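The random sampling idea in point 2 can be done without ever holding the full data set in memory. The following is a minimal sketch, assuming the rows arrive as an iterable (for example, lines read from a file). It uses reservoir sampling (Algorithm R), which is a technique chosen here for illustration and not something the article prescribes; the function name `reservoir_sample` and the synthetic rows are hypothetical.

```python
import random

def reservoir_sample(rows, k, seed=0):
    """Return k rows chosen uniformly at random from an iterable of unknown length.

    Only k rows are held in memory at any time (Algorithm R).
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)          # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                sample[j] = row         # keep this row with probability k / (i + 1)
    return sample

# Usage: stream 100,000 synthetic "rows" and keep 5 of them.
subset = reservoir_sample((f"row-{n}" for n in range(100_000)), k=5)
print(subset)
```

Because the generator is consumed one row at a time, memory use depends on `k` and not on the size of the source.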
For points 4 and 5, make sure you read up on the methods involved. These are advanced methods.
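To make the Stochastic Gradient Descent option (point 6) concrete, here is a minimal sketch of a linear model that is updated one row at a time, so memory use stays constant no matter how many rows there are. The synthetic data, the learning rate, and the names `sgd_linear_regression` and `synthetic_rows` are assumptions for illustration only.

```python
import random

def sgd_linear_regression(stream, n_features, lr=0.01):
    """Fit y ~ w.x + b by updating the weights after every single row."""
    w = [0.0] * n_features
    b = 0.0
    for x, y in stream:
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        err = pred - y                       # gradient of 0.5 * err**2 w.r.t. pred
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b -= lr * err
    return w, b

def synthetic_rows(n, seed=0):
    """Yield (features, target) pairs from y = 3*x0 - 2*x1 + 1 (noise-free)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, 3 * x[0] - 2 * x[1] + 1

w, b = sgd_linear_regression(synthetic_rows(20_000), n_features=2, lr=0.05)
print(w, b)  # should end up close to [3, -2] and 1
```

Online learners such as Vowpal Wabbit follow the same row-at-a-time pattern, with many more refinements than this sketch shows.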
Yes, rotation (orthogonal) is necessary because it
maximizes the difference between the variance captured by the
components. This makes the components easier to interpret. Not
to forget, that’s the motive of doing PCA, where we aim to
select fewer components (than features) which can explain the
maximum variance in the data set. By doing rotation, the
relative location of the components doesn’t change; it only
changes the actual coordinates of the points.
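The last claim, that a rotation changes coordinates but not relative locations, can be checked numerically. The sketch below is only an illustration of that one property in two dimensions, not a PCA implementation; the points and the rotation angle are arbitrary assumptions.

```python
import math

def rotate(point, angle):
    """Rotate a 2-D point about the origin by `angle` radians (an orthogonal transform)."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

points = [(1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
rotated = [rotate(p, math.pi / 6) for p in points]

# Pairwise distances are identical before and after the rotation,
# even though every coordinate has changed.
before = [math.dist(p, q) for p in points for q in points]
after = [math.dist(p, q) for p in rotated for q in rotated]
distances_preserved = all(
    math.isclose(a, b, abs_tol=1e-12) for a, b in zip(before, after)
)
print(rotated)
print(distances_preserved)
```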
This question has enough hints for you to start
thinking! Since the data is spread across the median, let’s assume
it’s a normal distribution. We know that in a normal distribution,
~68% of the data lies within 1 standard deviation of the mean (or
mode, median), which leaves ~32% of the data outside that range.
Therefore, ~32% of the data would remain unaffected by
missing values.
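The ~68% and ~32% figures follow from the normal distribution's cumulative distribution function. A minimal check, assuming the data really is normally distributed as the answer does (the helper name `fraction_within` is hypothetical):

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

within = fraction_within(1)
print(f"within 1 SD: {within:.4f}, outside: {1 - within:.4f}")
# within 1 SD: 0.6827, outside: 0.3173
```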
Naive Bayes is so ‘naive’ because it assumes that all of
the features in a data set are equally important and
independent of one another.
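In code, the naive assumption shows up as a plain product: each feature contributes its own factor P(feature | class), with no interaction terms. The following is a minimal sketch on a made-up toy data set; the data, the function names, and the absence of smoothing are all simplifying assumptions.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-feature value frequencies within each class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, feature index) -> Counter of values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts

def predict(row, class_counts, value_counts):
    """Pick the class maximizing P(class) * product of P(feature_i | class)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total
        for i, value in enumerate(row):
            # Each feature contributes its own factor, independently of the others.
            score *= value_counts[(label, i)][value] / n
        scores[label] = score
    return max(scores, key=scores.get), scores

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "yes", "yes", "no"]
model = train_naive_bayes(rows, labels)
label, scores = predict(("sunny", "mild"), *model)
print(label, scores)
```

A practical implementation would add smoothing so that a value never seen with a class does not force that class's score to zero.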
Time series data often possesses linearity. On the
other hand, a decision tree algorithm is known to work best to
detect non-linear interactions. The reason the decision tree
failed to provide robust predictions is that it couldn’t map the
linear relationship as well as a regression model did.
Therefore, we learned that a linear regression model can
provide robust predictions given the data set satisfies its
linearity assumptions.
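A small experiment illustrates the point. The sketch below fits an ordinary least squares line and a depth-1 regression tree (a stump, used here as a stand-in for a full decision tree) to a perfectly linear trend, then asks both for a prediction at a future time step. The data and function names are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_stump(xs, ys):
    """Depth-1 regression tree: the split threshold that minimizes squared error."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

xs = list(range(10))                 # time steps 0..9
ys = [2 * x + 1 for x in xs]         # a perfectly linear trend

slope, intercept = fit_line(xs, ys)
threshold, left_mean, right_mean = fit_stump(xs, ys)

x_new = 20                           # a future time step
line_pred = slope * x_new + intercept
stump_pred = left_mean if x_new < threshold else right_mean
print(line_pred, stump_pred, "true:", 2 * x_new + 1)
```

The line extrapolates the trend, while the tree can only return an average of target values it saw during training, so its prediction can never exceed the largest training target.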
Low bias occurs when the model’s predicted values
are near to actual values. In other words, the model becomes
The k-means algorithm partitions a data set into clusters such that each
cluster formed is homogeneous and the points in each cluster
are close to each other. The algorithm tries to maintain enough
separability between these clusters. Due to its unsupervised
nature, the clusters have no labels.
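The partitioning described above can be sketched with Lloyd's algorithm, the usual way k-means is implemented. This is a minimal version under stated assumptions: the points are made up, the initial centroids are fixed by hand so the run is deterministic, and the iteration count is arbitrary.

```python
import math

def kmeans(points, centroids, n_iter=10):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

The output clusters are just index 0 and index 1; nothing in the algorithm gives them a meaningful label.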
Following are the methods of variable selection you
can use:
The fundamental difference is that random forest uses the
bagging technique to make predictions, whereas GBM uses boosting
techniques to make predictions.
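The contrast between the two techniques can be sketched with a very small base learner. In the illustration below, bagging fits each model on an independent bootstrap sample and averages the results, while boosting fits each model to the residuals left by the previous ones and sums the results. This is a simplified sketch, not random forest or GBM themselves: the base learner is a depth-1 regression tree on one feature, and the data, learning rate, and model counts are assumptions.

```python
import random

def fit_stump(xs, ys):
    """Depth-1 regression tree on one feature; returns a predict function."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:                      # all x identical: predict the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def bagging(xs, ys, n_models=25, seed=0):
    """Fit each stump on an independent bootstrap sample; average the predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)

def boosting(xs, ys, n_models=25, lr=0.5):
    """Fit each stump to the residuals left by the previous ones; sum the predictions."""
    models, residuals = [], list(ys)
    for _ in range(n_models):
        m = fit_stump(xs, residuals)
        models.append(m)
        residuals = [r - lr * m(x) for x, r in zip(xs, residuals)]
    return lambda x: sum(lr * m(x) for m in models)

def mse(model, xs, ys):
    """Mean squared error of a predict function on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]
bag_mse = mse(bagging(xs, ys), xs, ys)
boost_mse = mse(boosting(xs, ys), xs, ys)
print("bagged stumps  MSE:", bag_mse)
print("boosted stumps MSE:", boost_mse)
```

With such a weak base learner, averaging mainly reduces variance and leaves the stump's bias in place, whereas the sequential residual fitting keeps reducing the training error.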
In the case of linearly separable data, the convex hull represents the
outer boundaries of the two groups of data points. Once the convex hulls
are created, we get the maximum margin hyperplane (MMH) as a perpendicular
bisector between the two convex hulls. The MMH is the line which attempts
to create the greatest separation between the two groups.
Neither.
In a time series problem, k-fold can be troublesome because
there might be some pattern in year 4 or 5 which is not in year
3. Resampling the data set will separate these trends, and we
might end up validating on past years, which is incorrect.
Instead, we can use a forward chaining strategy with 5 folds,
where each fold trains on all earlier years and validates on
the year that follows them.
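A forward chaining split can be sketched in a few lines. The example assumes six yearly periods so that 5 folds are possible; the year labels and the function name `forward_chaining_folds` are hypothetical.

```python
def forward_chaining_folds(periods, n_folds):
    """Yield (train, validation) pairs where validation always follows training in time."""
    for k in range(1, n_folds + 1):
        yield periods[:k], periods[k]

years = [1, 2, 3, 4, 5, 6]
folds = list(forward_chaining_folds(years, n_folds=5))
for train, valid in folds:
    print(f"train on years {train}, validate on year {valid}")
```

Each fold only ever validates on a year that comes after everything it trained on, so no future information leaks into training.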
We don’t use Manhattan distance because
it calculates distance horizontally or vertically only, that is,
along the coordinate axes, so it is restricted to axis-aligned
movement. On the other hand, the Euclidean metric measures the
straight-line distance in any direction. Since the data points
can lie in any direction from one another, Euclidean distance is a
more viable option.
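As a quick check of how the two metrics differ, the sketch below computes both for the same pair of points. The coordinates are arbitrary and chosen only so the arithmetic is easy to verify by hand.

```python
import math

def manhattan(p, q):
    """Sum of absolute coordinate differences (axis-aligned path length)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

p, q = (0, 0), (3, 4)
print(manhattan(p, q))   # 7
print(euclidean(p, q))   # 5.0
```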
For better predictions, a categorical variable can be
considered as a continuous variable only when the variable is
ordinal in nature.
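For an ordinal variable, the conversion amounts to replacing each category with its rank. A minimal sketch, assuming a hypothetical size feature whose category names and ordering are made up for illustration:

```python
# Hypothetical ordinal feature: the category names and their order are assumptions.
SIZE_ORDER = ["small", "medium", "large"]
SIZE_TO_RANK = {label: rank for rank, label in enumerate(SIZE_ORDER)}

def encode_ordinal(values, mapping=SIZE_TO_RANK):
    """Replace each ordered category with its integer rank."""
    return [mapping[v] for v in values]

encoded = encode_ordinal(["medium", "small", "large", "medium"])
print(encoded)  # [1, 0, 2, 1]
```

The same mapping applied to a nominal variable, such as a colour, would impose an order that does not exist in the data.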
You might have been able to answer all the questions, but the
real value is in understanding them and generalizing your
knowledge on similar questions. If you have struggled with these
questions, no worries; now is the time to learn and not perform.
This article is quite old now and you might not get a prompt
response from the author. We would request you to post this
comment on Analytics Vidhya
(https://discuss.analyticsvidhya.com/) to get your queries
resolved.
Hi Gianni
Good to know, you found them helpful! All the
best.
Good collection compiled by you Mr Manish ! Kudos !
it will be very useful to the budding data
whether they face start-ups or established
Hi Prof Ravi, You are right. These questions can
be asked anywhere. But, with the growth in
machine learning startups, the chances of facing
ML algorithm related questions are higher,
though I have laid emphasis on
statistical modeling as well.
Thank you Manish. Helpful for beginners like me.
Welcome
Hi Chibole,
True, statistics is an inevitable part of machine
learning. One needs to understand statistical
concepts in order to master machine learning.
I was wondering, do you recommend
for somebody to specialize in a specific
field of ML? I mean, is it
recommended to choose between
supervised learning and
unsupervised learning algorithms,
and simply say my specialty is this
during an interview? Shouldn’t
organizations recruiting specify their
specialty requirements too?
Hi Chibole,
It’s always a good thing to establish yourself as an expert in a
specific field. This helps the recruiter to understand that you are a
detail-oriented person. In machine learning, thinking of building your
expertise in supervised learning would be good, but companies want more
than that. Considering the variety of data these days, they want someone
who can deal with unlabeled data also. In short, they look for someone
who isn’t just an expert in operating a sniper gun, but can use other
weapons also if needed.
* stastics = Statistics
Hi, really an interesting collection of answers. From a
merely statistical point of view there are some
imprecisions (e.g. Q40), but it is surely useful for job
interviews in startups and bigger firms.
Hi Nicola,
Thanks for sharing your thoughts. Tell me more
about Q40. What about it?
Don’t bother…..Noted …..you assumed normal distribution….
Hi Amit,
Thanks for your encouraging words! The
purpose of this article is to help beginners
understand the tricky side of ML interviews.
Dear Kunal,
Rgds
Hi Manish,
Great job! It is a very good collection of interview
questions on machine learning. It will be a great help if
you also publish a similar article on statistics. Thanks
Hi Sampath,
Thanks for your suggestion. I’ll surely consider
it in my forthcoming articles.
Hi Manish,
d Feature Construction.
Hi Manish,
After going through these questions I feel I am at 10% of the
knowledge required to pursue a career in Data Science.
Excellent article to read. Can you please suggest any
book or online training which gives this much deep
information? Waiting for your reply in anticipation. Thanks
a million
An awesome article for reference. Thanks a ton Manish sir
are. Please share the pdf format of this blog
ssible. Have also taken note of Karthi’s input!
© Copyright 2013-2018 Analytics Vidhya.