2) Five numbers are given: (5, 10, 15, 5, 15). Now, what would be the sum of deviations of
individual data points from their mean?
A) 10
B) 25
C) 50
D) 0
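The arithmetic behind question 2 can be checked with a short sketch. The variable names are illustrative only; the data are the five numbers from the question.

```python
from statistics import mean

data = [5, 10, 15, 5, 15]

# Mean of the five values: (5 + 10 + 15 + 5 + 15) / 5 = 10
m = mean(data)

# Deviation of each data point from the mean
deviations = [x - m for x in data]

# Deviations from the arithmetic mean always cancel out
total_deviation = sum(deviations)
print(m, deviations, total_deviation)
```

The same cancellation holds for any data set, because the mean is the point about which the signed deviations balance.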
3) If a positively skewed distribution has a median of 50, which of the following statements is
true?
A) Mean is greater than 50
B) Mean is less than 50
C) Mode is less than 50
D) Both A and C
5) Which of these measures are used to analyze the central tendency of data?
A) Mean and Normal Distribution
B) Mean, Median and Mode
C) Mode, Alpha & Range
D) Standard Deviation, Range and Mean
6) Which of the following is true about the histogram given below?
7) The standard normal curve is symmetric about 0 and the total area under it is 1.
A) Yes
B) No
C) Sometimes
D) Can’t say
10) A parcel of 12 books contains 4 books with loose binding. What is the probability that a
random selection of 6 books (without replacement) will contain 3 books with loose binding?
A) 0.24
B) 0.50
C) 0.26
D) 0.48
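Question 10 describes sampling without replacement, so one reasonable way to model it is the hypergeometric distribution. The sketch below assumes that model; the function name `hypergeom_pmf` is hypothetical.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k marked items in n draws without replacement
    from N items, of which K are marked)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# 12 books, 4 with loose binding, draw 6, want exactly 3 loose
p = hypergeom_pmf(k=3, N=12, K=4, n=6)

# C(4,3) * C(8,3) / C(12,6) = 4 * 56 / 924
print(round(p, 4))  # 0.2424
```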
11) Seventy percent of the letters received by the popular T.V. program ‘SAHELI’ are written by
ladies. What is the probability that exactly 2 prize awards out of 5 are bagged by ladies?
A) 0.2312
B) 0.1323
C) 0.2854
D) 0.1524
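Question 11 can be read as five independent awards, each going to a lady with probability 0.7. Under that assumption (which the question implies but does not state), the binomial distribution applies. The helper name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 2 of 5 awards go to ladies, with p = 0.7 per award
prob = binomial_pmf(k=2, n=5, p=0.7)

# C(5,2) * 0.7^2 * 0.3^3 = 10 * 0.49 * 0.027
print(round(prob, 4))  # 0.1323
```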
12) The average number of misprints per page of a book is 1.5. Find the probability that a
particular page is free from misprints.
A) 0
B) 0.25
C) 0.22
D) 0.15
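A common way to treat question 12 is to assume that the number of misprints on a single page follows a Poisson distribution with rate 1.5. Both the Poisson model and the per-page reading are assumptions of this sketch; `poisson_pmf` is a hypothetical helper.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(exactly k events) for a Poisson distribution with mean lam."""
    return lam**k * exp(-lam) / factorial(k)

# Probability of zero misprints on a page when the mean is 1.5 per page
p_zero = poisson_pmf(k=0, lam=1.5)

# Reduces to e^(-1.5)
print(round(p_zero, 4))  # 0.2231
```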
13) A personnel officer knows that about 20% of the applicants for a certain position are
suitable for the job. What is the probability that the 5th person interviewed will be the first one
who is suitable?
A) 0.082
B) 0.072
C) 0.080
D) 0.062
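Question 13 asks for the first success on a given trial, which matches the geometric distribution if the interviews are assumed independent. The function name `geometric_pmf` is hypothetical.

```python
def geometric_pmf(k, p):
    """P(first success occurs on trial k) with success probability p per trial."""
    return (1 - p) ** (k - 1) * p

# Four unsuitable applicants followed by one suitable applicant, p = 0.2
p_fifth = geometric_pmf(k=5, p=0.2)

# 0.8^4 * 0.2 = 0.08192
print(round(p_fifth, 3))  # 0.082
```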
14) Find the probability that a person tossing a fair coin gets third head at seventh toss.
A) 0.3
B) 0.2734
C) 0.1563
D) 0.1256
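Question 14 is an instance of the negative binomial distribution: the r-th head lands on toss n when exactly r - 1 heads occur in the first n - 1 tosses and toss n is a head. The sketch below assumes independent fair tosses; `nth_success_pmf` is a hypothetical name.

```python
from math import comb

def nth_success_pmf(r, n, p):
    """P(the r-th success occurs exactly on trial n)."""
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

# Third head on the seventh toss: C(6,2) / 2^7 = 15/128
p_third = nth_success_pmf(r=3, n=7, p=0.5)

# Fourth head on the seventh toss, for comparison: C(6,3) / 2^7 = 20/128
p_fourth = nth_success_pmf(r=4, n=7, p=0.5)

print(p_third, p_fourth)
```

As worded, the question gives 15/128 ≈ 0.1172, which is not among the listed options. The fourth-head variant gives 20/128 = 0.15625.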
15) What happens to the confidence interval when we introduce some outliers to the data?
A) Confidence interval is robust to outliers
B) Confidence interval will increase with the introduction of outliers.
C) Confidence interval will decrease with the introduction of outliers.
D) We cannot determine the confidence interval in this case.
16) A medical doctor wants to reduce the blood sugar level of all his patients by altering their diet.
He finds that the mean sugar level of all patients is 180 with a standard deviation of 18. Nine of
his patients start dieting and the mean of the sample is observed to be 175. Now, he is considering
recommending that all his patients go on a diet.
Note: He calculates 99% confidence interval.
What is the standard error of the mean?
A) 9
B) 6
C) 7.5
D) 18
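The standard error of the mean in question 16 is the population standard deviation divided by the square root of the sample size. A minimal sketch, with illustrative variable names:

```python
from math import sqrt

sigma = 18   # population standard deviation from the question
n = 9        # number of patients in the sample

# Standard error of the mean: sigma / sqrt(n)
standard_error = sigma / sqrt(n)
print(standard_error)  # 6.0
```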
18) What is the relationship between significance level and confidence level?
A) Significance level = Confidence level
B) Significance level = 1 - Confidence level
C) Significance level = 1/Confidence level
D) Significance level = sqrt (1 – Confidence level)
19) If a mechanic looks at your car engine and says there is nothing wrong with it and your car
breaks down when you leave the garage, what type of error did the mechanic make?
A) Type I
B) Type II
C) Systematic error
D) Matrix error
22) In univariate linear least squares regression, relationship between correlation coefficient
and coefficient of determination is ______ ?
A) Both are unrelated
B) The coefficient of determination is the coefficient of correlation squared
C) The coefficient of determination is the square root of the coefficient of correlation
D) Both are the same
23) We have a linear regression equation (Y = 5X + 40) for the table below.
X Y
5 45
6 76
7 78
8 87
9 79
Which of the following is the MAE (Mean Absolute Error) for this linear model?
A) 8.4
B) 10.29
C) 42.5
D) None of the above
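The MAE in question 23 is the average of the absolute differences between the observed Y values and the values predicted by Y = 5X + 40. The sketch below works through it with the table's data; the variable names are illustrative.

```python
xs = [5, 6, 7, 8, 9]
ys = [45, 76, 78, 87, 79]

# Predictions from the given line Y = 5X + 40
predictions = [5 * x + 40 for x in xs]          # 65, 70, 75, 80, 85

# Absolute error for each row
abs_errors = [abs(y - yhat) for y, yhat in zip(ys, predictions)]  # 20, 6, 3, 7, 6

# Mean absolute error: 42 / 5
mae = sum(abs_errors) / len(abs_errors)
print(mae)  # 8.4
```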
24) Suppose you are given 7 Scatter plots 1-7 (left to right) and you want to compare Pearson
correlation coefficients between variables of each scatter plot.
1. 1<2<3<4
2. 1>2>3 > 4
3. 7<6<5<4
4. 7>6>5>4
Which of the above is in the right order?
A) 1 and 3
B) 2 and 3
C) 1 and 4
D) 2 and 4
25) The method of least squares dictates that we choose a regression line where the sum of the
square of deviations of the points from the line is:
A) Maximum
B) Minimum
C) Zero
D) Positive
26) The assumption that the variance of the residuals about the predicted dependent variable
scores should be the same for all predicted scores reflects which assumption?
A) Normality
B) Homoscedasticity
C) Singularity
D) Multicollinearity
27) The percent of total variation of the dependent variable Y explained by the set of
independent variables X is measured by
A) Coefficient of Correlation
B) Coefficient of Skewness
C) Coefficient of Determination
D) Standard Error of Estimate
28)
The above output gives the regression model results and the VIF of each variable. How many
variables show high multicollinearity?
A) 3
B) 4
C) 2
D) None
29) Which of the following evaluation metrics can be used to evaluate a model while modeling
a continuous output variable?
A) AUC-ROC
B) Accuracy
C) Logloss
D) Mean-Squared-Error
30) Which of the following statements is true about outliers in linear regression?
A) Linear regression is sensitive to outliers
B) Linear regression is not sensitive to outliers
C) Can’t say
D) None of these
33) Which of the following methods do we use to best fit the data in Logistic Regression?
A) Least Square Error
B) Maximum Likelihood
C) Jaccard distance
D) Both A and B
34) Suppose you have been given a fair coin and you want to find out the odds of getting heads.
Which of the following options is true for such a case?
A) odds will be 0
B) odds will be 0.5
C) odds will be 1
D) None of these
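Odds are defined as the probability of an event divided by the probability of its complement. A short sketch for the fair coin in question 34; the function name `odds` is hypothetical.

```python
def odds(p):
    """Odds in favour of an event with probability p (requires p < 1)."""
    return p / (1 - p)

# Fair coin: P(heads) = 0.5
heads_odds = odds(0.5)
print(heads_odds)  # 1.0
```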
37) If, in a dataset with 250 positives, a LogR model classifies 200 positives correctly, the
specificity is
A) 0.8
B) 0.2
C) 1.25
D) Can’t say
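The figures in question 37 describe only the positive class. The sketch below shows which quantities each metric needs, using the standard definitions; the function names are illustrative.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

# From the question: 250 actual positives, 200 classified correctly
tp = 200
fn = 250 - tp

sens = sensitivity(tp, fn)
print(sens)  # 0.8

# Specificity depends on TN and FP, i.e. on how the actual negatives
# were classified. The question gives no information about them.
```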
38) True Positive Rate is also called
1) Specificity
2) Recall
3) Sensitivity
4) Accuracy
A) Only 3
B) Only 1
C) Both 2 and 3
D) Both 1 and 4
44) The augmented Dickey-Fuller unit root test can be used to test for
A) Normality
B) Independence
C) Stationarity
D) Invertibility
48) Which of the following machine learning algorithms can be used for imputing missing values
of both categorical and continuous variables?
A) K-NN
B) Linear Regression
C) Logistic Regression
D) None
51) The data scientists at “Mart Inc” have collected 2013 sales data for 1600 products across 10
stores in different cities. Also, certain attributes of each product and store have been defined.
The aim is to build a predictive model and find out the sales of each
product at a particular store during a defined period.
Which learning problem does this belong to?
A) Supervised learning
B) Unsupervised learning
C) Reinforcement learning
D) None
54) While constructing decision tree algorithms, attribute selection measures are used to
A) Select the splitting criteria that best separate the data
B) Reduce the dimensionality
C) Reduce the error rate
D) Rank attributes
55) A database of 5000 transactions was partitioned into fraudulent and non-fraudulent
transactions. A machine learning algorithm was then deployed on this database.
The algorithm, on completion, correctly labeled 75% of the actual fraudulent transactions as
fraudulent. Using this information, complete the table below and answer the question.
Predicted class → / Fraudulent Non-Fraudulent Total
↓Actual class
Fraudulent 500
Non-Fraudulent A
Total 4400 5000
Clustering
56) Which of the following is required by K-means clustering?
A) defined distance metric
B) number of clusters
C) initial guess as to cluster centroids
D) All of the Mentioned
57) What is the minimum number of variables/features required to perform clustering?
A) 0
B) 1
C) 2
D) 3
58) What is the best choice for the number of clusters based on the following results:
A) 1
B) 2
C) 3
D) 4
67) What is the class of the object defined by the expression x <- c(4, "a", TRUE)?
A) Numeric
B) Character
C) Integer
D) Logical
68) If I have two vectors x <- c(1, 3, 5) and y <- c(3, 2, 10), what is produced by the expression
rbind(x, y)?
A) A vector of length 2
B) a 2 by 2 matrix
C) a vector of length 3
D) a 2 by 3 matrix
69) Suppose we have a vector x <- 1:4 and y <- 2:3. What is produced by the expression x + y?
A) a numeric vector with the values 3, 5, 3, 4.
B) an integer vector with the values 3, 5, 5, 7.
C) a numeric vector with the values 1, 2, 5, 7.
D) an error.
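Question 69 relies on R's recycling rule: when two vectors of different lengths are added, the shorter one is repeated until it matches the longer one. The sketch below emulates that rule in Python for illustration only. It is not R code, and `r_add` is a hypothetical helper.

```python
from itertools import cycle

def r_add(x, y):
    """Emulate R's element-wise addition with recycling of the shorter vector."""
    longer, shorter = (x, y) if len(x) >= len(y) else (y, x)
    # zip stops when the longer vector is exhausted; cycle repeats the shorter one
    return [a + b for a, b in zip(longer, cycle(shorter))]

x = [1, 2, 3, 4]   # R: x <- 1:4
y = [2, 3]         # R: y <- 2:3

result = r_add(x, y)
print(result)  # [3, 5, 5, 7]
```

In R itself, a warning is issued only when the longer length is not a multiple of the shorter one; here 4 is a multiple of 2, so the addition is silent.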
74) How are missing values and impossible values represented in R, respectively?
A) NaN, NA
B) NA, NaN
C) NA, NULL
D) NULL, NaN