
INTRODUCTION TO REGRESSION ANALYSIS AND TYPES OF DATA

Session 02

Created by: Sibashis Chakraborty
ORIGIN

First introduced by Francis Galton in his famous “Law of Universal Regression”.

The law stated that, although there was a tendency for tall parents to have tall children and for short parents to have short children, the average height of children born of parents of a given height tended to move or “regress” toward the average height in the population as a whole.

Later the law was confirmed by Karl Pearson.

Regression to ‘Mediocrity’.
MODERN INTERPRETATION OF REGRESSION

In very general terms, Regression is concerned with describing and evaluating the functional or the causal relationship among variables.

More specifically, Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.
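
In symbols, the "mean or average value of the former in terms of the values of the latter" is a conditional mean. A minimal sketch, assuming a single explanatory variable X and a linear form (both are assumptions made for illustration), using the same β1, β2 and u that appear in the income and consumption equation of these slides:

```latex
E(Y \mid X) = \beta_1 + \beta_2 X
\qquad\text{so that}\qquad
Y = E(Y \mid X) + u = \beta_1 + \beta_2 X + u
```

Here u is the deviation of an individual Y from its conditional mean. Regression analysis estimates or predicts the left-hand side, E(Y | X), not each individual Y.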
SIMPLE REGRESSION VS MULTIPLE REGRESSION

If we are studying the dependence of a variable on only a single explanatory variable, such as that of consumption expenditure on real income, such a study is known as Simple, or two-variable, regression analysis.

If we are studying the dependence of one variable on more than one explanatory variable, as in the crop-yield, rainfall, temperature, sunshine, and fertilizer examples, it is known as Multiple regression analysis.
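
A simple (two-variable) regression can be sketched in a few lines of Python. The income and consumption figures below are illustrative values assumed for this sketch, not data from the slides, and the helper name `simple_ols` is hypothetical. The estimates use the ordinary least squares formulas: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope · x̄.

```python
# Illustrative (assumed) data: income (X) and consumption expenditure (Y).
income = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
consumption = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]


def simple_ols(x, y):
    """Return (intercept, slope) for the two-variable regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares, both in deviations from the mean.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


beta1_hat, beta2_hat = simple_ols(income, consumption)
print(f"estimated intercept: {beta1_hat:.4f}")
print(f"estimated slope:     {beta2_hat:.4f}")

# Estimated average consumption at a fixed income level of 170.
predicted_mean = beta1_hat + beta2_hat * 170
print(f"estimated mean consumption at income 170: {predicted_mean:.4f}")
```

A multiple regression follows the same idea with more than one explanatory variable, but the estimates then come from solving a system of equations rather than from a single ratio, so it is not shown here.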
REGRESSION VERSUS CAUSATION
A statistical relationship in itself cannot logically imply causation.
To understand this better let us refer to the statistical relationship
between Income and Consumption presented in our previous slide.
𝑌 = 𝛽1 + 𝛽2 𝑋 + u
As written, the relationship runs in one direction only: income (X) explains consumption (Y), and not the other way around.
Yet there is no statistical reason to assume that income does not depend upon consumption. The fact that we treat Consumption as dependent on Income is due to non-statistical considerations.
CONTINUED
“A statistical relationship, however strong and however suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other.” - M. G. Kendall and A. Stuart
REGRESSION VS CORRELATION

The primary objective of Correlation analysis is to measure the strength or degree of linear association between two variables.

In Regression analysis, we are not primarily interested in such a measure. Instead, we try to estimate or predict the average value of one variable on the basis of the fixed values of other variables.

In Regression analysis there is an asymmetry in the way the dependent and explanatory variables are treated.

In Correlation analysis, however, we treat the variables symmetrically.
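
The symmetry of correlation and the asymmetry of regression can be checked numerically. The sketch below uses assumed illustrative data and hypothetical helper names (`cross_dev`, `correlation`, `slope`). Swapping the two variables leaves the correlation coefficient unchanged, while the regression slope of y on x differs from the slope of x on y.

```python
import math

# Illustrative (assumed) data for two variables.
x = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]


def cross_dev(a, b):
    """Sum of products of deviations of a and b from their own means."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def correlation(a, b):
    """Pearson correlation coefficient between a and b."""
    return cross_dev(a, b) / math.sqrt(cross_dev(a, a) * cross_dev(b, b))


def slope(dependent, explanatory):
    """Least squares slope from regressing `dependent` on `explanatory`."""
    return cross_dev(explanatory, dependent) / cross_dev(explanatory, explanatory)


r_xy = correlation(x, y)
r_yx = correlation(y, x)          # same number: the roles are interchangeable
slope_y_on_x = slope(y, x)        # y treated as the dependent variable
slope_x_on_y = slope(x, y)        # x treated as the dependent variable

print(f"corr(x, y) = {r_xy:.4f}, corr(y, x) = {r_yx:.4f}")
print(f"slope of y on x = {slope_y_on_x:.4f}")
print(f"slope of x on y = {slope_x_on_y:.4f}")
```

The two slopes are linked to the correlation only through their product, which equals the squared correlation coefficient. Which variable is placed on the left-hand side therefore matters in regression and not in correlation.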
NOTE ON SYMMETRIC AND ASYMMETRIC TREATMENT OF VARIABLES

Asymmetric: In Regression analysis, the dependent variable is assumed to be statistical, random, or stochastic, that is, to have a probability distribution. The explanatory variables, on the other hand, are assumed to have fixed values (in repeated sampling).

Symmetric: In Correlation analysis, on the other hand, no such distinction is made between the dependent and explanatory variables.
TYPES OF DATA

Data:
- Time Series
- Cross-Section
- Pooled Data
- Panel Data
TIME SERIES DATA
Time Series data, also called Macro data, are those that are collected for the same entity over different periods of time.
Simply put, a time series dataset consists of observations on one or more variables over time.
The issue of ‘data frequency’ is important in the context of time series data. The most common data frequencies are annual, quarterly, monthly, and weekly.
CROSS-SECTION DATA
Cross-section data, also known as Micro-data, are those collected for different entities in a single period of time. A cross-sectional dataset may consist of a sample of individuals, households, firms, regions, countries, or any other type of unit at a specific point in time.
Cross-sectional data are extensively used in agricultural economics, industrial economics, labour economics, health economics, demography, etc.
POOLED DATA
Pooled, or combined, data contain elements of both Time-series and Cross-section data.
PANEL DATA
Panel data, a special type of Pooled data also called Longitudinal data, are data collected for multiple entities where each entity is observed at two or more time points.

For instance, if we collect data on some macroeconomic variables (GDP, money supply, exports, etc.) for some countries for two or more years, and arrange this data in a systematic manner, then our dataset is called a panel dataset.
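
The distinction between the data types can be sketched with a small set of records. All values, entity labels, and the `gdp` field below are invented placeholders for illustration, and `is_panel` is a hypothetical helper that encodes the definition above: several entities, each observed at two or more time points.

```python
from collections import defaultdict

# Invented placeholder records: three entities observed over three years.
records = [
    {"entity": "A", "year": 2020, "gdp": 100.0},
    {"entity": "A", "year": 2021, "gdp": 104.0},
    {"entity": "A", "year": 2022, "gdp": 109.0},
    {"entity": "B", "year": 2020, "gdp": 50.0},
    {"entity": "B", "year": 2021, "gdp": 52.0},
    {"entity": "B", "year": 2022, "gdp": 55.0},
    {"entity": "C", "year": 2020, "gdp": 75.0},
    {"entity": "C", "year": 2021, "gdp": 74.0},
    {"entity": "C", "year": 2022, "gdp": 78.0},
]

# Time series: the same entity over different periods.
time_series = [r for r in records if r["entity"] == "A"]

# Cross-section: different entities in a single period.
cross_section = [r for r in records if r["year"] == 2021]


def is_panel(rows):
    """True when there are several entities, each seen at two or more time points."""
    periods = defaultdict(set)
    for r in rows:
        periods[r["entity"]].add(r["year"])
    return len(periods) >= 2 and all(len(p) >= 2 for p in periods.values())


print("time series rows:  ", len(time_series))
print("cross-section rows:", len(cross_section))
print("full set is a panel:", is_panel(records))
```

Filtering the same records by entity gives a time series, and filtering by year gives a cross-section. The full set, with every entity followed across years, is the panel.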
UPCOMING PRESENTATIONS
In our upcoming discussions we will dive deeper into the various assumptions of Regression Analysis and their implications.
REFERENCES
Gujarati, Damodar N., Porter, Dawn C., and Gunasekar, Sangeetha. Basic Econometrics (Fifth Edition).
Bhaumik, Sankar K. Principles of Econometrics: A Modern Approach Using EViews (First Edition).
