
1. Differentiate between Quantitative & Qualitative Research. What are the main stages of Quantitative Research?
Qualitative Research
Qualitative research is one which provides insights and understanding of the problem
setting. It is an unstructured, exploratory research method used to gain an in-depth
understanding of human behavior, experience, attitudes, intentions, and motivations, on
the basis of observation and interpretation, to find out the way people think and feel.
Quantitative Research
Quantitative research is a form of research that relies on the methods of the natural sciences, producing numerical data and hard facts. It aims at establishing a cause-and-effect relationship between two variables by using mathematical, computational and statistical methods.

Quantitative research | Qualitative research
The survey sample group is a large number of respondents | The survey sample group is a small number of respondents
It is carried out largely by questionnaires | It is carried out largely by personal interviews
It examines the issues marginally | It examines the issues in depth
Time undemanding | Time demanding
Deduction from results | Induction from results
Statistical data processing | Non-statistical data processing
Quantitative Research Process

1. State the research problem

Often stated as a question, the problem should be focused narrowly on the problem being
studied. For example: "What is the optimal time for taking a rectal temperature with a
digital thermometer?"
2. Define the purpose of the study
The purpose explains "why" the problem is important and what use the findings will be.
3. Review related literature
The literature review provides information about what is already known, provides
information about concepts, and how the concepts have been measured. It also identifies
gaps in knowledge that will be studied.
4. Formulate hypotheses and variables
Hypotheses are statements about two or more concepts or variables. Variables are
concepts of varying levels of abstraction that are measured, manipulated, or controlled
in a study
5. Select the research design
The design is a carefully determined, systematic, and controlled plan for finding answers
to the question of the study. This provides a "road map" for all aspects of the study,
including how to collect and analyze the data.
6. Select the population and sample
The population is the group to be studied. The sample refers to specific people or events
in the population from which data will be collected.
7. Collect the data
Sources of data may include people, literature, documents, and findings (for example, from sources such as laboratory data or measurements of vital signs). Data may be collected from interviews, questionnaires, direct measurement, or examinations (such as physical or psychological tests).
8. Analyze the data
Statistical procedures are used to analyze the data and provide answers to the research questions or hypotheses.
9. Communicate findings and conclusions
Through publications and presentations, the researcher explains the results of the study
and links them to the existing body of knowledge in the literature. The researcher also
describes the implications of the study and suggests directions for further research.
How do econometricians proceed in their analysis of an economic problem?
That is, what is their methodology? Although there are several schools of thought
on econometric methodology, we present here the traditional or classical
methodology, which still dominates empirical research in economics and other
social and behavioral sciences.
Broadly speaking, traditional econometric methodology proceeds along the
following lines:
1. Statement of theory or hypothesis.
2. Specification of the mathematical model of the theory
3. Specification of the statistical, or econometric, model
4. Obtaining the data
5. Estimation of the parameters of the econometric model
6. Hypothesis testing
7. Forecasting or prediction
8. Using the model for control or policy purposes.

To illustrate the preceding steps, let us consider the well-known Keynesian theory
of consumption.

1. Statement of Theory or Hypothesis

Keynes stated:
The fundamental psychological law . . . is that men [women] are disposed, as a rule
and on average, to increase their consumption as their income increases, but not as
much as the increase in their income.
In short, Keynes postulated that the marginal propensity to consume (MPC), the rate of
change of consumption for a unit (say, a dollar) change in income, is greater than
zero but less than 1.

2. Specification of the Mathematical Model of Consumption

Although Keynes postulated a positive relationship between consumption and

income, he did not specify the precise form of the functional relationship between
the two. For simplicity, a mathematical economist might suggest the following form
of the Keynesian consumption function:
Y = β1 + β2X,  0 < β2 < 1    (I.3.1)

where Y = consumption expenditure and X = income, and where β1 and β2, known as the parameters of the model, are, respectively, the intercept and slope coefficients. The slope coefficient β2 measures the MPC. Geometrically, Eq. (I.3.1) is as shown in Figure I.1.
This equation, which states that consumption is linearly related to income, is an example of a mathematical model of the relationship between consumption and income that is called the consumption function in economics. A model is simply a set of mathematical equations.
If the model has only one equation, as in the preceding example, it is called a single
equation model, whereas if it has more than one equation, it is known as a multiple-
equation model (the latter will be considered later in the book).
In Eq. (I.3.1) the variable appearing on the left side of the equality sign is called the
dependent variable and the variable(s) on the right side are called the independent,
or explanatory, variable(s). Thus, in the Keynesian consumption function, Eq. (I.3.1),
consumption (expenditure) is the dependent variable and income is the explanatory variable.

3. Specification of the Econometric Model of Consumption

The purely mathematical model of the consumption function given in Eq. (I.3.1) is of
limited interest to the econometrician, for it assumes that there is an exact or
deterministic relationship between consumption and income. But relationships
between economic variables are generally inexact.
Thus, if we were to obtain data on consumption expenditure and disposable (i.e.,
after tax) income of a sample of, say, 500 American families and plot these data on
a graph paper with consumption expenditure on the vertical axis and disposable
income on the horizontal axis, we would not expect all 500 observations to lie exactly
on the straight line of Eq. (I.3.1) because, in addition to income, other variables
affect consumption expenditure. For example, size of family, ages of the members
in the family, family religion, etc. are likely to exert some influence on consumption.
To allow for the inexact relationships between economic variables, the
econometrician would modify the deterministic consumption function (I.3.1) as
Y = β1 + β2X + u    (I.3.2)

where u, known as the disturbance, or error, term, is a random (stochastic) variable that has well-defined probabilistic properties. The disturbance term u may well represent all those factors that affect consumption but are not taken into account explicitly.
Equation (I.3.2) is an example of an econometric model. More technically, it is an
example of a linear regression model, which is the major concern of this book. The
econometric consumption function hypothesizes that the dependent variable Y
(consumption) is linearly related to the explanatory variable X(income) but that the
relationship between the two is not exact; it is subject to individual variation.
The econometric model of the consumption function can be depicted as shown in
Figure I.2.

4. Obtaining Data
To estimate the econometric model given in (I.3.2), that is, to obtain the numerical
values of β1 and β2, we need data. Let us look at the data given in Table I.1, which relate to the U.S. economy for the period 1982–1996.
5. Estimation of the Econometric Model
Now that we have the data, our next task is to estimate the parameters of the
consumption function. The numerical estimates of the parameters give empirical
content to the consumption function. The actual mechanics of estimating the
parameters will be discussed in Chapter 3. For now, note that the statistical
technique of regression analysis is the main tool used to obtain the estimates. Using
this technique and the data given in Table I.1, we obtain the following estimates of
β1 and β2, namely, −184.08 and 0.7064.
Thus, the estimated consumption function is:

Ŷi = −184.08 + 0.7064Xi    (I.3.3)

The hat on the Y indicates that it is an estimate.
The estimated consumption function (i.e., regression line) is shown in Figure I.3.
*As a matter of convention, a hat over a variable or parameter indicates that it is an estimated value.
As Figure I.3 shows, the regression line fits the data quite well in that the data points
are very close to the regression line. From this figure we see that for the period
1982–1996 the slope coefficient (i.e., the MPC) was about 0.70, suggesting that for
the sample period an increase in real income of 1 dollar led, on average, to an
increase of about 70 cents in real consumption expenditure.
We say on average because the relationship between consumption and income is inexact; as is clear from Figure I.3, not all the data points lie exactly on the regression line. In simple terms we can say that, according to our data, the average, or mean, consumption expenditure went up by about 70 cents for a dollar's increase in real income.
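The mechanics of least-squares estimation can be sketched in a few lines. The data below are made-up stand-ins for Table I.1 (which is not reproduced here), constructed to lie exactly on the line C = 20 + 0.7X so the recovered estimates can be checked by eye:

```python
# Minimal OLS sketch for Y = b1 + b2*X using the closed-form formulas.
# The (income, consumption) pairs are hypothetical illustration data.
def ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b2 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
         sum((xi - mx) ** 2 for xi in x)
    b1 = my - b2 * mx
    return b1, b2

income = [100.0, 200.0, 300.0, 400.0, 500.0]
consumption = [90.0, 160.0, 230.0, 300.0, 370.0]  # exactly C = 20 + 0.7*X
b1, b2 = ols(income, consumption)
print(round(b1, 4), round(b2, 4))  # 20.0 0.7
```

Applied to the actual Table I.1 data, the same mechanics yield the estimates −184.08 and 0.7064 quoted above.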

6. Hypothesis Testing
Assuming that the fitted model is a reasonably good approximation of reality, we
have to develop suitable criteria to find out whether the estimates obtained in, say,
Eq. (I.3.3) are in accord with the expectations of the theory that is being tested.
According to “positive” economists like Milton Friedman, a theory or hypothesis that
is not verifiable by appeal to empirical evidence may not be admissible as a part of
scientific enquiry.
As noted earlier, Keynes expected the MPC to be positive but less than 1. In our
example we found the MPC to be about 0.70. But before we accept this finding as
confirmation of Keynesian consumption theory, we must enquire whether this
estimate is sufficiently below unity to convince us that this is not a chance
occurrence or peculiarity of the particular data we have used. In other words, is 0.70
statistically less than 1? If it is, it may support Keynes’ theory.
Such confirmation or refutation of economic theories on the basis of sample
evidence is based on a branch of statistical theory known as statistical inference
(hypothesis testing). Throughout this book we shall see how this inference process
is actually conducted.

7. Forecasting or Prediction
If the chosen model does not refute the hypothesis or theory under consideration,
we may use it to predict the future value(s) of the dependent, or forecast, variable Y
on the basis of known or expected future value(s) of the explanatory, or predictor,
variable X.
To illustrate, suppose we want to predict the mean consumption expenditure for
1997. The GDP value for 1997 was 7269.8 billion dollars.
*Do not worry now about how these values were obtained. As we show in Chap. 3, the statistical
method of least squares has produced these estimates. Also, for now do not worry about the
negative value of the intercept.
*See Milton Friedman, “The Methodology of Positive Economics,” Essays in Positive Economics,
University of Chicago Press, Chicago, 1953.
*Data on PCE and GDP were available for 1997 but we purposely left them out to illustrate the topic
discussed in this section. As we will discuss in subsequent chapters, it is a good idea to save a portion
of the data to find out how well the fitted model predicts the out-of-sample observations.

Putting this GDP figure on the right-hand side of (I.3.3), we obtain:

Ŷ1997 = −184.0779 + 0.7064(7269.8) = 4951.3167    (I.3.4)

or about 4951 billion dollars. Thus, given the value of the GDP, the mean, or average, forecast consumption expenditure is about 4951 billion dollars. The actual value of the consumption expenditure reported for 1997 was 4913.5 billion dollars. The estimated model (I.3.3) thus overpredicted the actual consumption expenditure by about 37.82 billion dollars. We could say the forecast error is about 37.82 billion dollars, which is about 0.77 percent of the actual consumption expenditure for 1997. When we fully discuss the linear regression model in subsequent chapters, we will try to find out if such an error is “small” or “large.” But what is important for now is to note that such forecast errors are inevitable given the statistical nature of our analysis.
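The forecast in (I.3.4) and the resulting forecast error can be reproduced directly from the numbers given in the text:

```python
# Point forecast from the estimated consumption function (I.3.3).
b1, b2 = -184.0779, 0.7064
gdp_1997 = 7269.8            # billions of dollars
actual_pce_1997 = 4913.5     # reported consumption expenditure

forecast = b1 + b2 * gdp_1997
error = forecast - actual_pce_1997
print(round(forecast, 2))                              # 4951.31
print(round(error, 2))                                 # 37.81
print(round(100 * error / actual_pce_1997, 2))         # 0.77 percent
```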
There is another use of the estimated model (I.3.3). Suppose the President decides
to propose a reduction in the income tax. What will be the effect of such a policy on
income and thereby on consumption expenditure and ultimately on employment?
Suppose that, as a result of the proposed policy change, investment expenditure
increases. What will be the effect on the economy? As macroeconomic theory
shows, the change in income following, say, a dollar’s worth of change in investment
expenditure is given by the income multiplier M, which is defined as M = 1/(1 − MPC).
If we use the MPC of 0.70 obtained in (I.3.3), this multiplier becomes about M = 3.33.
That is, an increase (decrease) of a dollar in investment will eventually lead to more
than a threefold increase (decrease) in income; note that it takes time for the
multiplier to work.
The critical value in this computation is MPC, for the multiplier depends on it. And
this estimate of the MPC can be obtained from regression models such as (I.3.3).
Thus, a quantitative estimate of MPC provides valuable information for policy
purposes. Knowing MPC, one can predict the future course of income, consumption
expenditure, and employment following a change in the government’s fiscal policies.
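A one-line check of the multiplier arithmetic, M = 1/(1 − MPC):

```python
# Income multiplier implied by the estimated MPC of about 0.70.
mpc = 0.70
m = 1 / (1 - mpc)
print(round(m, 2))  # 3.33
```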

8. Use of the Model for Control or Policy Purposes

Suppose we have the estimated consumption function given in (I.3.3). Suppose further the government believes that consumer expenditure of about 4900 (billions of 1992 dollars) will keep the unemployment rate at its current level. The estimated model can then be solved for the level of income that would deliver this target: from (I.3.3), X = (4900 + 184.0779)/0.7064 ≈ 7197 billion dollars. That is, an income level of about 7197 billion dollars, given an MPC of about 0.70, will produce an expenditure of about 4900 billion dollars.

2. Differentiate between Binomial Probability Distribution and Normal Probability Distribution. What is their economic significance?


The normal (z) distribution is a continuous distribution that arises in many natural
processes. "Continuous" means that between any two data values we could (at least in
theory) find another data value. For example, men's heights vary continuously and are
the result of so many tiny random influences that the overall distribution of men's
heights in America is very close to normal. Another example is the data values that we
would get if we repeatedly measured the mass of a reference object on a pan balance—
the readings would differ slightly because of random errors, and the readings taken as a
whole would have a normal distribution.
The normal distribution is the most important distribution in statistics, since it arises
naturally in numerous applications. The key reason is that large sums of (small) random
variables often turn out to be normally distributed.
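A quick simulation (the parameter choices are arbitrary) illustrates this: sum many small uniform random variables, repeat, and check that roughly 68% of the sums fall within one standard deviation of the mean, as a normal distribution would predict:

```python
# Sums of many small independent random variables look normal: sum
# 1,000 uniforms, 5,000 times, and compare the share within one
# standard deviation to the normal benchmark of about 68%.
import random

random.seed(42)
n_terms, n_sums = 1000, 5000
sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_sums)]

mean = n_terms * 0.5               # expected value of each sum
sd = (n_terms / 12) ** 0.5         # sd of a sum of n_terms uniforms
within_1sd = sum(mean - sd <= s <= mean + sd for s in sums) / n_sums
print(round(within_1sd, 2))        # close to 0.68
```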


A binomial distribution is very different from a normal distribution, and yet if the sample
size is large enough, the shapes will be quite similar.
The key difference is that a binomial distribution is discrete, not continuous. In other
words, it is NOT possible to find a data value between any two data values.
The requirements for a binomial distribution are
1) The r.v. of interest is the count of successes in n trials
2) The number of trials (or sample size), n, is fixed
3) Trials are independent, with fixed value p = P (success on a trial)
4) There are only two possible outcomes on each trial, called "success" and "failure." (This
is where the "bi" prefix in "binomial" comes from. If there were several possible
outcomes, we would need to use a multinomial distribution to account for them, but we
don't study multinomial distributions in the beginning AP Statistics course.)

Normal distributions are continuous and have a special bell shape.

Binomial distributions are discrete ("stairsteppy"); they are close to normal only if the
sample size satisfies np ≥ 10 and nq ≥ 10.
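When the np ≥ 10 and nq ≥ 10 condition holds, the normal curve approximates binomial probabilities well. A self-contained check (n = 100, p = 0.5, chosen for illustration):

```python
# Compare an exact binomial probability with its normal approximation
# when np >= 10 and nq >= 10.
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

n, p = 100, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
exact = binom_pmf(50, n, p)        # exact P(X = 50)
approx = normal_pdf(50, mu, sigma) # normal density at the same point
print(round(exact, 4), round(approx, 4))  # the two values nearly agree
```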
Normal distributions arise in three general areas:
1) Natural processes where the data value (e.g., height) is the result of many small random influences.
2) Sampling distribution of x̄, where either the underlying distribution is normal or (more commonly) where the sample size is large enough for the CLT to take effect. Rules of thumb are on p. 606 of the textbook.
3) Repeated measurement of a fixed phenomenon (e.g., the orbital period of Mars, the
mass of a moon rock, or the height of a mountain). Most phenomena cannot be measured
precisely—even if we have an accurate pan balance or laser range finder or whatever,
there will always be some uncertainty or error in our measurement. For this reason, the
normal distribution is sometimes called the "error function." However, #3 is really just a
special case of #1.
Binomial distributions arise whenever the r.v. of interest is the count of successes in a
fixed number (n) of independent trials. The four rules are listed near the beginning of the
“binomial distribution” section, before the second set of example problems.
Importance of the normal distribution
1) It underlies the central limit theorem, which describes the relationship between the shape of the population distribution and the shape of the sampling distribution of the mean: the sampling distribution of the mean approaches normality as the sample size increases.
2) When the sample size is large, the normal distribution serves as a good approximation to many discrete distributions, such as the binomial.
3) Due to its mathematical properties it is popular and easy to work with.
4) It is used in statistical quality control for setting up control limits.
5) The whole theory of sample tests (t, F and chi-square) is based on the normal distribution.
Importance of Binomial probability distribution

1. National economic prediction:

Economists use the binomial theorem to count probabilities that depend on numerous and widely distributed variables, in order to predict the way the economy will behave in the next few years.
2. Architecture:
It allows engineers to estimate the magnitudes of projects, delivering accurate estimates not only of the costs but also of the time required to construct them. For contractors, it is an important tool for ensuring that project costings leave room for profit.
3. Weather forecasting:
The binomial theorem is also used in forecast services; disaster forecasting likewise depends on it.
4. Ranking of candidates:
The binomial distribution is popularly used to rank candidates in many competitive examinations.
3. (a) Differentiate between Regression & Correlation.

Correlation is a statistical measure that indicates the extent to which two or

more variables fluctuate together. A positive correlation indicates the extent to which
those variables increase or decrease in parallel; a negative correlation indicates the
extent to which one variable increases as the other decreases.

A statistical technique for estimating the change in the metric dependent variable due to
the change in one or more independent variables, based on the average mathematical
relationship between two or more variables is known as regression.
Key Differences between Correlation and Regression
The points given below explain the difference between correlation and regression in detail:
1. A statistical measure which determines the co-relationship or association of two
quantities is known as Correlation. Regression describes how an independent
variable is numerically related to the dependent variable.
2. Correlation is used to represent the linear relationship between two variables. On
the contrary, regression is used to fit the best line and estimate one variable on
the basis of another variable.
3. In correlation, there is no difference between dependent and independent
variables i.e. correlation between x and y is similar to y and x. Conversely, the
regression of y on x is different from x on y.
4. Correlation indicates the strength of association between variables. As opposed
to, regression reflects the impact of the unit change in the independent variable
on the dependent variable.
5. Correlation aims at finding a numerical value that expresses the relationship between variables, unlike regression, whose goal is to predict values of the random variable on the basis of the values of a fixed variable.
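The symmetry of correlation and the asymmetry of regression (points 3 and 4 above) can be checked numerically; the data below are made up for illustration:

```python
# Correlation is symmetric in x and y, while the regression slope of
# y on x differs from that of x on y (their product equals r**2).
def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def slope(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
           sum((a - mx) ** 2 for a in x)

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
print(corr(x, y) == corr(y, x))               # True: order does not matter
print(round(slope(x, y) * slope(y, x), 4))    # 0.81, the square of r = 0.9
```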
4. (a) Define coefficient of Multiple Determination (R2)

The coefficient of determination, R², is used to analyze how differences in one variable can be explained by differences in a second variable. For example, when a person gets pregnant has a direct relation to when they give birth. The coefficient of determination is related to the correlation coefficient, r: the correlation coefficient formula tells you how strong a linear relationship there is between two variables, and R squared is the square of the correlation coefficient r (hence the term “r squared”).
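R² can also be computed directly as the share of variation in y explained by the fitted line, 1 − SSE/SST, which agrees with r² in simple regression. A sketch with made-up data (where r = 0.9):

```python
# R squared as 1 - SSE/SST for a simple regression fit.
def r_squared(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    a0 = my - b * mx
    fitted = [a0 + b * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
    sst = sum((yi - my) ** 2 for yi in y)                   # total
    return 1 - sse / sst

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
print(round(r_squared(x, y), 2))  # 0.81, the square of r = 0.9
```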
5. (a) Define Multi-Collinearity. What are the consequences of Multi-Collinearity?
Multicollinearity refers to a situation with high correlation among the explanatory variables within a multiple regression model. For obvious reasons it can never appear in the simple regression model, since that model has only one explanatory variable. In chapter 8 we briefly described the consequences of including the full exhaustive set of dummy variables created from a categorical variable with several categories; we referred to that as falling into the dummy variable trap. By including the full set of dummy variables, one ends up with a perfect linear relation between the set of dummies and the constant term. When that happens we have what is called perfect multicollinearity. In this chapter we will discuss the issue of multicollinearity in more detail and focus on what is sometimes called imperfect multicollinearity, which refers to the case where a set of variables is highly correlated, but not perfectly so.
Multicollinearity can also be described as a lack of independence among the explanatory variables in a data set. It is a sample problem and a state of nature that results in relatively large standard errors for the estimated regression coefficients, but not in biased estimates.
Types of multicollinearity
There are two types of multicollinearity:

 Structural multicollinearity is a mathematical artifact caused by creating new predictors from other predictors, such as creating the predictor x² from the predictor x.
 Data-based multicollinearity, on the other hand, is a result of a poorly designed
experiment, reliance on purely observational data, or the inability to manipulate the
system on which the data are collected.
The consequences of perfect correlation among the explanatory variables are most easily explained by an example. Assume that we would like to estimate the parameters of the following model:

Y = B0 + B1X1 + B2X2 + U    (11.1)

where X2 is assumed to be a linear combination of X1 in the following way:

X2 = a + bX1    (11.2)

and where a and b are two arbitrary constants. If we substitute (11.2) into (11.1) we obtain:

Y = (B0 + aB2) + (B1 + bB2)X1 + U    (11.3)

Since (11.1) and (11.2) imply (11.3), we can only receive estimates of (B0 + aB2) and (B1 + bB2). But since these two expressions contain three unknown parameters, there is no way we can receive estimates of all three parameters in (11.1). We simply need more information, which is not available. Hence, with perfect multicollinearity it is impossible to receive estimates of the intercept and the slope coefficients.
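A numerical sketch (with arbitrary constants a = 2, b = 3) of why perfect collinearity blocks estimation: the cross-product matrix X'X becomes singular, so the normal equations have no unique solution.

```python
# Perfect multicollinearity in a small design matrix: the column x2 is
# an exact linear function of x1 (x2 = 2 + 3*x1, arbitrary constants),
# so X'X is singular and cannot be inverted.
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0 + 3.0 * v for v in x1]            # exact linear relation
X = [[1.0, a, b] for a, b in zip(x1, x2)]   # columns: constant, x1, x2

xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
print(det3(xtx))  # 0.0: the determinant vanishes, X'X is singular
```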
This was an example of the extreme case of perfect multicollinearity, which is not very
likely to happen in practice, other than when we end up in a dummy variable trap or a
similar situation. More interesting is to investigate the consequences on the parameters
and their standard errors when high correlation is present. We will start this discussion
with the sample estimator of the slope coefficient B1 in (11.1) under the assumption that
X1 and X2 is highly correlated but not perfect. The situation for the sample estimator of
B2 is identical to that of B1 so it is not necessary to look at both. The sample estimator
for B1 is given by:

The estimator b1 is a function of r which is the correlation between Y and X1, r the
correlation between X1 and X2, rY2 the correlation between Y and X2, SY and S1 which
are the standard deviations for Y and X1 respectively.
The first thing to observe is that r12 appears in both the numerator and the denominator, but that it is squared in the denominator, which makes the denominator zero in the case of perfect correlation. In the case of strong correlation, the shrinking denominator inflates the size of the expression, but since the correlation coefficient appears in the numerator as well, with a negative sign, it is difficult to say how the size of the parameter will change without further assumptions. However, it can be shown that the OLS
estimators remain unbiased and consistent, which means that estimated coefficients in
repeated sampling still will center on the population coefficient. On the other hand, this
property says nothing about how the estimator will behave in a specific sample. Therefore
we will go through an example in order to shed some light on this issue.
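The effect of the squared correlation in the denominator can be seen numerically through the variance inflation factor, VIF = 1/(1 − r12²), a standard measure built on that same term: as the correlation between the regressors approaches 1, the factor explodes, which is what drives up the standard errors.

```python
# Variance inflation factor as the regressor correlation r12 grows:
# VIF = 1 / (1 - r12**2) rises slowly at first, then explodes.
for r12 in (0.0, 0.5, 0.9, 0.99, 0.999):
    vif = 1 / (1 - r12 ** 2)
    print(r12, round(vif, 1))
```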
Causes for multicollinearity can also include:

 Insufficient data. In some cases, collecting more data can resolve the issue.
 Dummy variables may be incorrectly used. For example, the researcher may fail to
exclude one category, or add a dummy variable for every category (e.g. spring,
summer, autumn, winter).
 Including a variable in the regression that is actually a combination of two other
variables. For example, including “total investment income” when total investment
income = income from stocks and bonds + income from savings interest.
 Including two identical (or almost identical) variables. For example, weight in
pounds and weight in kilos, or investment income and savings/bond income.
6. (a) Differentiate between heteroskedasticity and homoscedasticity

Homoscedasticity can also be called homogeneity of variance, because it describes a situation in which a sequence or vector of random variables all have the same finite variance. As we probably know already, variance measures how far a set of numbers is spread out. The complementary notion is called heteroskedasticity; in short, it means that the variances are not all the same.

 In statistics, a sequence or a vector of random variables is homoscedastic /ˌhoʊmoʊskəˈdæstɪk/ if all random variables in the sequence or vector have the same finite variance.
 A collection of random variables is heteroscedastic if there are sub-populations that have different variabilities from others.
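A simulated illustration (synthetic data, not from the text) of the two notions: in the homoscedastic series the spread is constant, while in the heteroskedastic one it grows with x, so the variance of the last third far exceeds that of the first third.

```python
# Contrast homoscedastic and heteroskedastic errors by comparing the
# sample variance of the first and last thirds of each series.
import random

random.seed(0)
n = 3000
xs = [i / n for i in range(n)]
homo = [random.gauss(0, 1) for _ in xs]            # constant spread
hetero = [random.gauss(0, 1 + 5 * x) for x in xs]  # spread rises with x

def var(v):
    m = sum(v) / len(v)
    return sum((u - m) ** 2 for u in v) / len(v)

third = n // 3
print(round(var(homo[-third:]) / var(homo[:third]), 1))    # near 1
print(round(var(hetero[-third:]) / var(hetero[:third]), 1))  # well above 1
```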
6b. Use the Goldfeld-Quandt test for the detection of Heteroskedasticity
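A minimal sketch of the Goldfeld-Quandt procedure on simulated data (the data below are made up for illustration): sort observations by the regressor, omit a middle block, fit OLS separately to the low and high groups, and compare their residual sums of squares with a ratio.

```python
# Goldfeld-Quandt sketch: a residual-sum-of-squares ratio well above 1
# points toward heteroskedasticity (a formal test compares it to an F
# critical value, omitted here).
import random

random.seed(1)

def ols_rss(x, y):
    # Residual sum of squares from a simple OLS fit of y on x.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    a0 = my - b * mx
    return sum((yi - (a0 + b * xi)) ** 2 for xi, yi in zip(x, y))

# Simulated data whose error spread grows with x (heteroskedastic by design).
x = sorted(random.uniform(1, 10) for _ in range(60))
y = [2 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]

n_obs, drop = 60, 12        # omit the middle 12 observations
k = (n_obs - drop) // 2     # 24 observations in each outer group
gq = ols_rss(x[-k:], y[-k:]) / ols_rss(x[:k], y[:k])
print(gq > 1)  # expected to exceed 1 here, since the spread grows with x
```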
Linear programming (LP) (also called linear optimization) is a method to achieve the best
outcome (such as maximum profit or lowest cost) in a mathematical model whose
requirements are represented by linear relationships. Linear programming is a special
case of mathematical programming (mathematical optimization).
More formally, linear programming is a technique for the optimization of
a linear objective function, subject to linear equality and linear inequality constraints.
Its feasible region is a convex polytope, which is a set defined as the intersection of
finitely many half spaces, each of which is defined by a linear inequality. Its objective
function is a real-valued affine (linear) function defined on this polyhedron. A linear
programming algorithm finds a point in the polyhedron where this function has the
smallest (or largest) value if such a point exists.
Duality in Linear Programming
Definition: The Duality in Linear Programming states that every linear programming
problem has another linear programming problem related to it and thus can be derived
from it. The original linear programming problem is called “Primal,” while the derived
linear problem is called “Dual.”

Before solving for the dual, the original linear programming problem must be formulated in its standard form. Standard form means that all the variables in the problem should be non-negative, and that the “≥” sign is used in a minimization problem while the “≤” sign is used in a maximization problem.

The concept of duality can be well understood through the problem given below:

Maximize Z = 50x1 + 30x2

Subject to:
4x1 + 3x2 ≤ 100
3x1 + 5x2 ≤ 150
x1, x2 ≥ 0

The duality can be applied to the above original linear programming problem as:

Minimize G = 100y1 + 150y2

Subject to:
4y1 + 3y2 ≥ 50
3y1 + 5y2 ≥ 30
y1, y2 ≥ 0

The following observations were made while forming the dual linear programming problem:
1. The primal or original linear programming problem is of the maximization type while the
dual problem is of minimization type.
2. The constraint values 100 and 150 of the primal problem have become the coefficient of
dual variables y1 and y2 in the objective function of a dual problem and while the
coefficient of the variables in the objective function of a primal problem has become the
constraint value in the dual problem.
3. The first column in the constraint inequality of primal problem has become the first row
in a dual problem and similarly the second column of constraint has become the second
row in the dual problem.
4. The directions of the inequalities have also changed, i.e. in the dual problem the signs are the reverse of the primal problem's: where the primal problem had “≤”, the dual problem has “≥”.
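Strong duality for this example can be checked by brute force (a sketch, not a general LP solver): since an LP optimum, if one exists, occurs at a vertex of the feasible region, enumerating the corner points of the primal and dual problems shows that the primal maximum equals the dual minimum.

```python
# Verify that the primal maximum equals the dual minimum for the
# example above by enumerating the corner points of each feasible region.
from itertools import combinations

def intersect(l1, l2):
    # Each line is (a, b, c) for a*u + b*v = c; returns the crossing point.
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    d = a1 * b2 - a2 * b1
    if d == 0:
        return None
    return ((c1 * b2 - c2 * b1) / d, (a1 * c2 - a2 * c1) / d)

def feasible_vertices(lines, feasible):
    pts = []
    for l1, l2 in combinations(lines, 2):
        p = intersect(l1, l2)
        if p is not None and feasible(*p):
            pts.append(p)
    return pts

# Primal: maximize Z = 50x1 + 30x2.
primal_lines = [(4, 3, 100), (3, 5, 150), (1, 0, 0), (0, 1, 0)]
def primal_ok(x1, x2):
    return (x1 >= -1e-9 and x2 >= -1e-9
            and 4 * x1 + 3 * x2 <= 100 + 1e-9
            and 3 * x1 + 5 * x2 <= 150 + 1e-9)

primal_best = max(50 * x1 + 30 * x2
                  for x1, x2 in feasible_vertices(primal_lines, primal_ok))

# Dual: minimize G = 100y1 + 150y2.
dual_lines = [(4, 3, 50), (3, 5, 30), (1, 0, 0), (0, 1, 0)]
def dual_ok(y1, y2):
    return (y1 >= -1e-9 and y2 >= -1e-9
            and 4 * y1 + 3 * y2 >= 50 - 1e-9
            and 3 * y1 + 5 * y2 >= 30 - 1e-9)

dual_best = min(100 * y1 + 150 * y2
                for y1, y2 in feasible_vertices(dual_lines, dual_ok))

print(primal_best, dual_best)  # both equal 1250.0
```

Here the primal optimum is at (25, 0) and the dual optimum at (12.5, 0), and both objective values equal 1250, as duality theory requires.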