
Uploaded by Spandan Bandyopadhyay


Department of Mathematics and Statistics

University of Maryland, Baltimore County (UMBC)

January 2018

Outline of Topics

1 What is Regression Analysis?

2 Why do we use Regression Analysis?

3 What are the types of Regressions?

4 Linear Regression

5 Logistic Regression

6 Polynomial Regression

7 Stepwise Regression

8 Ridge Regression

9 Lasso Regression

10 ElasticNet Regression

11 How to select the right Regression Model?

What is Regression Analysis?

Regression analysis investigates the relationship between a dependent (target/response) variable Y and a set of independent or predictor variables X = (X1, X2, ..., Xp). This technique is widely used for forecasting, time series modelling and finding causal relationships between variables.

A standard procedure is to collect data on (Y, X), plot them on a scatter plot (one predictor variable) or create a response surface (multiple predictor variables), and fit a line, curve or surface to the data points so as to minimize (in some sense) the distances of the data points from the fitted line or curve, typically measured with the L1 or L2 norm.
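To make the choice of norm concrete, here is a minimal Python sketch on the simplest possible fit, a single constant. The data values are invented for illustration. The mean minimizes the sum of squared deviations (L2), while the median minimizes the sum of absolute deviations (L1) and is much less affected by the outlier.

```python
from statistics import mean, median

def l2_loss(data, c):
    """Sum of squared deviations of the data from the constant c."""
    return sum((v - c) ** 2 for v in data)

def l1_loss(data, c):
    """Sum of absolute deviations of the data from the constant c."""
    return sum(abs(v - c) for v in data)

# Illustrative data with one outlier (made-up values).
data = [1, 2, 3, 4, 100]
m, med = mean(data), median(data)

print("mean (L2 fit):", m, "| median (L1 fit):", med)
print("L2 loss at mean, at median:", l2_loss(data, m), l2_loss(data, med))
print("L1 loss at mean, at median:", l1_loss(data, m), l1_loss(data, med))
```

The same trade-off carries over to fitting lines and curves: an L2 fit is pulled towards outliers more strongly than an L1 fit.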

Why do we use Regression Analysis?

Multiple benefits

It indicates whether there are significant relationships between the dependent variable and the independent variables.

It indicates the strength of the impact of multiple independent variables on a dependent variable.

Regression analysis also allows us to compare the effects of variables measured on different scales, such as the effect of price changes and the number of promotional activities. These benefits help market researchers, data analysts and data scientists to evaluate candidate variables, eliminate weak ones and select the best set of variables for building predictive models.

How many types of regression techniques?

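Many regression techniques are available. They are mostly distinguished by three factors (number of independent variables, type of dependent variables and shape of regression line).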

Regression Types?

Linear Regression: the most widely known modeling technique. The dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the regression line is linear.

It is represented by the equation Y = a + bX + e, where a is the intercept, b is the slope of the line and e is the error term. This equation can be used to predict the value of the target variable from the given predictor variable(s).

1-Linear Regression

Multiple linear regression: $Y = a + b_1 X_1 + \dots + b_p X_p + e$.

Challenge: to efficiently estimate the intercept and slope parameters, usually accomplished by the least squares method. We can evaluate model performance using the metric $R^2$.
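As an illustration of least squares estimation and the R-squared metric, here is a minimal Python sketch for the single-predictor case Y = a + bX + e. The function names and the data are made up for this example; the closed-form expressions are the standard least squares solution.

```python
def fit_simple_ols(x, y):
    """Least squares estimates (a, b) for the model y = a + b*x + e."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

def r_squared(x, y, a, b):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Illustrative (made-up) data lying close to a straight line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_simple_ols(x, y)
r2 = r_squared(x, y, a, b)
print(f"intercept={a:.3f}, slope={b:.3f}, R^2={r2:.4f}")
```

An R-squared close to 1 means the fitted line explains most of the variation in Y for these data.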

More on Linear Regression

Important Points:

There must be a linear relationship between the independent and dependent variables.

Multiple regression can suffer from multicollinearity, autocorrelation and heteroskedasticity.

Linear regression is very sensitive to outliers. They can badly distort the regression line and, eventually, the forecasted values.

Multicollinearity can increase the variance of the coefficient estimates and make the estimates very sensitive to minor changes in the model. The result is that the coefficient estimates are unstable.

In the case of multiple independent variables, we can use forward selection, backward elimination or a stepwise approach to select the most significant independent variables.

2- Logistic Regression

Logistic regression is used when the response is binary (success/failure, survival/death), to model the probability of response irrespective of the nature of the predictor variables.

$\text{odds} = \frac{p}{1-p} = \frac{\text{probability of event occurrence}}{\text{probability of event non-occurrence}}$

$\log(\text{odds}) = \ln\left(\frac{p}{1-p}\right)$

$\text{logit}(p) = \ln\left(\frac{p}{1-p}\right) = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_p X_p$
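The relationship between probability, odds and the logit can be sketched in a few lines of Python. The function names and the coefficient values used in the demonstration are hypothetical, chosen only to illustrate the formulas.

```python
import math

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

def logit(p):
    """Log-odds of probability p."""
    return math.log(odds(p))

def inverse_logit(z):
    """Map a linear predictor z back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict_probability(x, b0, b):
    """p such that logit(p) = b0 + b1*x1 + ... + bp*xp."""
    z = b0 + sum(bj * xj for bj, xj in zip(b, x))
    return inverse_logit(z)

print(odds(0.8))                 # about 4: the event is four times as likely as not
print(logit(0.5))                # log-odds of an even chance
print(inverse_logit(logit(0.3))) # round trip back to the probability
# Hypothetical coefficients, for illustration only.
print(predict_probability([1.0, 2.0], b0=-1.0, b=[0.5, 0.25]))
```

In practice the coefficients are not chosen by hand but estimated, usually by maximum likelihood.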

2- Logistic Regression

Important Points:

Widely used for classification problems

Logistic regression can handle various types of relationships because it applies a non-linear log transformation to the predicted odds.

To avoid overfitting and underfitting, we should include all significant variables. A good approach to ensure this is to use a stepwise method to estimate the logistic regression.

It requires large sample sizes because maximum likelihood estimates are less powerful at low sample sizes than ordinary least squares.

Modified minimum chi-square is an alternative method with a long and rich history.

2-Logistic Regression

The independent variables should not be correlated with each other, i.e. no multicollinearity. Interaction effects of categorical variables can, however, be included in the analysis and in the model.

If the values of the dependent variable are ordinal, it is called ordinal logistic regression.

If the dependent variable is multi-class, it is called multinomial logistic regression.

3- Polynomial Regression

$y = a + bx + cx^2 + \dots$

The best fit is a quadratic, cubic, quartic, ... curve.

3- Polynomial Regression

Important Point:

There is usually a temptation to fit a higher-degree polynomial to get a lower error, but this can result in overfitting. Always plot the relationships to see the fit, and make sure that the curve fits the nature of the problem.

4- Stepwise Regression

Stepwise regression is used when there are multiple independent variables. The selection of independent variables is done by an automatic process that involves no human intervention.

This is achieved by observing statistics such as R-square, t-statistics and the AIC metric to discern significant variables. Stepwise regression fits the regression model by adding or dropping covariates one at a time based on a specified criterion. Some of the most commonly used stepwise regression methods are listed below:

4- Stepwise Regression

Standard stepwise regression adds and removes predictors as needed at each step.

Forward selection starts with the most significant predictor in the model and adds one variable at each step.

Backward elimination starts with all predictors in the model and removes the least significant variable at each step.

The aim of this modeling technique is to maximize prediction power with the minimum number of predictor variables. It is one of the methods for handling high-dimensional data sets.

5- Ridge Regression: huge literature

Ridge regression is used when the data suffer from multicollinearity (independent variables are highly correlated). Under multicollinearity, even though the ordinary least squares (OLS) estimates are unbiased, their variances are large, so the estimates may be far from the true values. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors.

$y = a + b_1 x_1 + b_2 x_2 + \dots + e$ for multiple independent variables.

Ridge regression addresses the multicollinearity problem through a shrinkage parameter $\lambda$, choosing the estimates of the above model parameters by minimizing the penalized sum of squares

$\sum_{i=1}^{n} (y_i - a - b_1 x_{1i} - \dots - b_p x_{pi})^2 + \lambda \sum_{j=1}^{p} b_j^2$

The effect is to shrink the estimates so that they have low variance.
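The shrinkage effect is easiest to see with a single predictor, where the ridge criterion has a closed-form minimizer. The Python sketch below assumes the intercept is not penalized; the function name, the symbol `lam` for the shrinkage parameter and the data are illustrative choices, not taken from the slides.

```python
def ridge_simple(x, y, lam):
    """Minimize sum((y_i - a - b*x_i)^2) + lam * b^2 for one predictor.

    Setting the derivatives to zero gives
    b = Sxy / (Sxx + lam) and a = mean(y) - b * mean(x).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / (sxx + lam)
    a = mean_y - b * mean_x
    return a, b

# Illustrative (made-up) data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
for lam in (0, 10, 100):
    a, b = ridge_simple(x, y, lam)
    print(f"lam={lam:>3}: intercept={a:.3f}, slope={b:.3f}")
```

With `lam = 0` the result is the ordinary least squares fit; as `lam` grows, the slope is pulled towards zero but never reaches it exactly.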

6- Lasso Regression: huge literature

Similar to ridge regression, Lasso (Least Absolute Shrinkage and Selection Operator) also penalizes the absolute size of the regression coefficients. In addition, it is capable of reducing the variability and improving the accuracy of linear regression models.

Penalty function:

$\sum_{i=1}^{n} (y_i - a - b_1 x_{1i} - \dots - b_p x_{pi})^2 + \lambda \sum_{j=1}^{p} |b_j|$

Lasso regression differs from ridge regression in that it uses absolute values in the penalty function instead of squares. Penalizing (or, equivalently, constraining) the sum of the absolute values of the estimates causes some of the parameter estimates to turn out exactly zero. The larger the penalty applied, the further the estimates are shrunk towards zero. This results in variable selection out of the given p variables.
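Why an absolute-value penalty can produce exact zeros is visible in the single-predictor case, where the minimizer is a soft-thresholded version of the least squares slope. This Python sketch assumes an unpenalized intercept; the names `soft_threshold`, `lasso_simple` and `lam`, and the data, are hypothetical and used only for illustration.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_simple(x, y, lam):
    """Minimize sum((y_i - a - b*x_i)^2) + lam * |b| for one predictor.

    The minimizer is b = soft_threshold(Sxy, lam / 2) / Sxx.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = soft_threshold(sxy, lam / 2) / sxx
    a = mean_y - b * mean_x
    return a, b

# Illustrative (made-up) data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
for lam in (0, 19.9, 40):
    a, b = lasso_simple(x, y, lam)
    print(f"lam={lam}: intercept={a:.3f}, slope={b:.3f}")
```

Unlike the ridge penalty, a large enough `lam` sets the slope to exactly zero, which is what removes a variable from the model.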

6- Lasso Regression

Important Points:

The assumptions of this regression are the same as those of least squares regression, except that normality is not assumed.

It shrinks coefficients to exactly zero, which helps in feature selection.

If a group of predictors is highly correlated, Lasso picks only one of them and shrinks the others to zero.

7- ElasticNet Regression

ElasticNet is a hybrid of the Lasso and ridge regression techniques, trained with both L1 and L2 penalties as regularizer. Elastic-net is useful when there are multiple features which are correlated. Lasso is likely to pick one of these at random, while elastic-net is likely to pick both.

Important Points:

It encourages a group effect in the case of highly correlated variables.

There are no limitations on the number of selected variables.

It can suffer from double shrinkage.
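For comparison with the ridge and Lasso criteria, the elastic-net criterion can be written in the same notation. The symbols $\lambda_1$ and $\lambda_2$ for the two penalty weights are an assumption of this sketch; software packages often parameterize the same penalty differently, for example with an overall strength and a mixing ratio.

```latex
\sum_{i=1}^{n} \left( y_i - a - b_1 x_{1i} - \dots - b_p x_{pi} \right)^2
  + \lambda_1 \sum_{j=1}^{p} |b_j|
  + \lambda_2 \sum_{j=1}^{p} b_j^2
```

Setting $\lambda_2 = 0$ gives the Lasso criterion and setting $\lambda_1 = 0$ gives the ridge criterion.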

How to Select the Right Regression Model?

Besides the techniques above, there are other models such as Bayesian, ecological and robust regression.

Life is usually simple when you know only one or two techniques. For a single response variable, if the outcome is continuous, apply linear regression; if it is binary, use logistic regression. However, the more options we have at our disposal, the more difficult it becomes to choose the right one.

How to Select the Right Regression Model?

Among the many types of regression models, it is important to choose the best-suited technique based on the type of independent and dependent variables, the dimensionality of the data and other essential characteristics of the data. Below are some key factors that can be used to select the right regression model:

Data exploration is an essential part of building a predictive model. Before selecting a model, try to identify the relationships among the variables and their impact.

How to Select the Right Regression Model?

To compare the goodness of fit of different models, we can analyse different metrics such as the statistical significance of parameters, R-square, adjusted R-square, AIC, BIC, the error term and Mallows's Cp criterion. The last of these essentially checks for possible bias in the selected model by comparing it with all possible submodels (or a careful selection of them).
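Some of these metrics are simple functions of quantities that any regression fit reports. The Python sketch below uses commonly quoted textbook forms: adjusted R-square with n observations and p predictors, and the Gaussian-likelihood versions of AIC and BIC up to an additive constant, with k estimated parameters. The function names and the example numbers are illustrative assumptions, not values from the slides.

```python
import math

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def aic_gaussian(rss, n, k):
    """AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k

def bic_gaussian(rss, n, k):
    """BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)

# Made-up example: two candidate models fitted to the same 50 observations.
n = 50
small = {"rss": 120.0, "k": 3}   # fewer parameters, slightly worse fit
large = {"rss": 118.0, "k": 8}   # more parameters, slightly better fit
for name, m in (("small", small), ("large", large)):
    print(name,
          round(aic_gaussian(m["rss"], n, m["k"]), 2),
          round(bic_gaussian(m["rss"], n, m["k"]), 2))
```

Lower AIC or BIC is better. In this made-up comparison the small reduction in residual sum of squares does not compensate for the five extra parameters, so both criteria prefer the smaller model.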

Cross-validation is the best way to evaluate models used for prediction. Here the data set is divided into two groups (training and validation). The mean squared difference between the observed and predicted values on the validation group gives a measure of the prediction accuracy.
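A single train/validate split can be sketched as follows in Python. The model is a one-predictor least squares line; the helper names, the 70/30 split, the fixed random seed and the data are assumptions made for this illustration.

```python
import random

def fit_line(x, y):
    """Least squares intercept and slope for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def mean_squared_error(x, y, a, b):
    """Mean squared difference between observed and predicted values."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)

def holdout_mse(x, y, train_fraction=0.7, seed=0):
    """Fit on a random training subset, report MSE on the held-out rest."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    cut = round(len(idx) * train_fraction)
    train, valid = idx[:cut], idx[cut:]
    a, b = fit_line([x[i] for i in train], [y[i] for i in train])
    return mean_squared_error([x[i] for i in valid], [y[i] for i in valid], a, b)

# Made-up data: y = 1 + 2x plus small deterministic "noise".
x = list(range(10))
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05, 0.0]
y = [1 + 2 * xi + e for xi, e in zip(x, noise)]
validation_mse = holdout_mse(x, y)
print("validation MSE:", validation_mse)
```

Repeating the split over several folds and averaging the validation errors gives k-fold cross-validation, which is less dependent on one particular split.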

How to Select the Right Regression Model?

If the data set has multiple confounding variables, we should not choose an automatic model selection method, because we do not want to put these in a model at the same time.

It also depends on our objective. A less powerful model may be easier to implement than a highly statistically significant model.

Regression regularization methods (Lasso, Ridge and ElasticNet) work well in the case of high dimensionality and multicollinearity among the variables in the data set.

Education and Economic Growth: A Meta-Regression

Analysis

Primary analysis or secondary analysis?

The meta-regression analysis covers 56 studies with 979 estimates and shows that there is substantial

publication selection bias towards a positive impact of education on

growth. Once we account for this, we find evidence of a genuine

effect of education on economic growth.

The variation in reported estimates can be attributed to differences in

the measurement of education and study characteristics, most

importantly model specification, estimation methodology, type of data

and the research outlet where studies were published.

e.g. academic journals vs. working papers.

Some Regression Topics in Economics

Is consumption truly a "random walk"?

Estimating the male-female wage gap, and what causes it.

Does campaign funding lead to good election results?

The effect of advertising on demand for a good.

The Relationship Between Annual GDP Growth and Income

Inequality: Developed and Undeveloped Countries

GDP versus Manufacturing Output: Proof of Movement of

Standardized Processes

Economic Patterns in Voting

Economic Factors Affecting Homelessness in India

Factors Explaining Life Satisfaction Across Countries

The Economic Impact of Research and Development

Is College Worth the Money? A Look on the Effects of Bachelors

Degrees to the Unemployment Rate

Bimal Sinha (UMBC) Regression Analysis December , 2017 25 / 68

Some Regression Topics in Economics

Performance

An Examination of the Economic Effects of the Winter Olympics

Factors Affecting Corruption in Developing and Emerging Countries

Modern Day Evaluation of the Preston Curve: The Relationship

Between Life Expectancy and Income

Econometric Analysis: Effect of Barriers on Trade

Income Inequality as a Determinant of Economic Growth: A

Cross-Country Analysis

Some Regression Topics in Economics

Happiness and Traffic: An Analysis of Long Term Effects

Effect of GDP Per Capita on National Life Expectancy

Impact of Educational Attainment on Crime in the United States: A

Cross-Metropolitan Analysis

Understanding How Unique Attributes Might Affect Poverty

The Effect of Inequality on Satisfaction

Regression Analysis of Electrical Energy Consumption with

Cross-Country data

Key Steps in Economic Regression Analysis (Econometrics)

I The Model

The model and the data are the starting points of an econometric

project.

The first step in formulating a model is to select a topic of interest

and to consider the model’s scope and purpose.

State and understand objectives of the study, what boundaries to

place on the topic, what hypotheses might be tested, what variables

might be predicted, and what policies might be evaluated.

Close attention must be paid, however, to the availability of adequate

data. In particular the model must involve causal relations among

measurable variables.

I. The Model: Choice of topics?

The topic may be a particular market (the market for Pitzer graduates, the market for economists, the market for ice cream, the markets for private education), a process (economic development, inflation, unemployment), demographic phenomena (birth rates, death rates), environmental phenomena (water quality, air quality), political phenomena (elections, voting behavior of legislatures), some combination of these, or some other topic.

"Air Pollution and Population"

"Birth Rates, Death Rates, and Economic Growth in Developing Economies"

"Demand for and Supply of Higher Education"

"Differential Growth in Indian Cities"

I. The Model

"Divorce Rates, Birth Rates, and Female Participation in the Labor Force"

"Economic and Social Determinants of Infant Mortality in India"

"The Effect of Unemployment on Crime"

"Elections and Money"

"Medical School Applications"

"Police Expenditures and the Deterrence of Crime"

"The Relationship between Exports and Growth in Less Developed Countries"

"Unionization and Strike Activities"

I. The Model

dependent variable Y. But since there are many variables X that have

influence on the variable Y, it is important to include all those

variables

To ensure that the model is both interesting and manageable, it

should contain at least three to four independent variables

The model should be formulated as an algebraic, linear, stochastic

equation along with a corresponding verbal statement of the meaning

of the equation.

II. The Data

obtaining an adequate and relevant set of data is an important and

often critical part of the econometric project. Data must be available

for all the variables in the model. Huge literature to deal with missing

data!

National Statistical Abstracts, Statistical Yearbooks, or Statistical

Handbooks, published annually by most major countries provide both

summary statistics and references to primary sources.

II. The Data

wealth of data on member countries, as do statistical yearbooks of other

international organizations like the OECD. The Federal Reserve Bank of

St. Louis puts out International Economic Conditions which gives

comparative data for Canada, France, Germany, Italy, Japan, Netherlands,

Switzerland, United Kingdom, and the U.S. Various almanacs, sources on

the WWW like www.census.gov, and other reference works also abound in

statistics. Take a look at the course homepage and the economics

department homepage. All of these sources contain data on so many

topics that they may suggest a topic for the econometric project.

II. The Data

Also, it is best to avoid data sets which are too small, say fewer than thirty observations.

The data should be examined and, if necessary, refined to make them suitable for the purposes of the model.

For time-series data it may be necessary to use seasonal adjustments or perhaps to eliminate certain trends. For both time-series and cross-section data, consideration should be given to whether to divide the data into separate samples or perhaps exclude certain observations.

II. The Data

For example, in a time series it may be appropriate to exclude war years or years of a recession. In a cross-section of nations it may be inappropriate to include all countries that are UN members. The developed countries might be treated as one group and the developing countries as another group.

Dividing the data this way into subsamples not only leads to more homogeneous data sets but also facilitates the study by allowing comparative analyses.

III. The Estimation

After both the model and data have been developed, the next step is

to utilize econometric techniques to estimate the model.

We can use STATA 14 or any other statistical package for the

statistical analysis. Basic statistics packages include Minitab and

Excel. For careful work in econometrics we will want to use EViews,

STATA, SAS, TSP, LimDep, SPSS or Shazam.

Make sure that we have enough observations for all the variables and

that the dependent and independent variables show some variation

over the observations.

IV. Specification of the Model

Define and discuss the specification of the selected model. What variables are included in the model? Explain why we chose those variables and the role they play in the model. Have we included all the important variables in the model? What are the expected signs of all the coefficients?

V. Data Description

Provide complete description of all the data, their sources, refinements

used, and their possible biases or other possible weaknesses.

VI. Results

Present the estimates of the model and its related statistics such as standard errors, t statistics and the $R^2$. Discuss which coefficients are significant at the 5% and 1% levels. If relevant, include a discussion of possible serial correlation and its correction, of possible heteroscedasticity and its correction, and of possible multicollinearity and its correction. Estimate alternative models to test the robustness of the results.

VII. Discussion

Discuss the signs and magnitudes of the estimated coefficients and their

comparisons to predicted or theoretical signs and magnitudes. What have

we learned? Consider how the model might be reformulated in future

studies, and implications for future econometric research.

VIII. Conclusions

IX. Bibliography

Include complete citations of all items referred to in the paper.

X. Data

If reasonable, provide a table of all the data used. At a minimum, provide

the summary statistics for the data.

Forecasting

economics and business analytics.

Causal methods

time series methods

qualitative methods.

Forecasting

Each of these three methods has various tools and techniques that fall under it, and each is appropriate in different kinds of circumstances.

Causal methods typically involve regression analysis and some of the specialized types of regression analysis that are useful in various circumstances.

Time series methods often involve various forms of trend analysis, such as exponential smoothing and trend prediction.

Qualitative methods involve using surveys and other subjective, ad hoc methods of gathering data in order to make predictions. In causal forecasting we rely on relationships between variables.

Website for books on Regression

Statistics. https://www.pinterest.com/explore/regression-analysis/

Contents: The nature of econometrics and economic data. Part I:

REGRESSION ANALYSIS WITH CROSS-SECTIONAL DATA : The

simple regression model. Multiple regression analysis: Estimation.

Multiple regression analysis: Inference. Multiple regression analysis:

OLS asymptotics. Multiple regression analysis:

Modeling of United States Airline Fares Using the Official Airline

Guide (OAG) and Airline Origin and Destination Survey (DB1B)

A Case Study

Krishna Rama-Murthy

Master’s Thesis, Virginia Polytechnic Institute & State University, 2006

Motivation

Travel cost is one of the major factors that a traveler considers when

he/she chooses the transportation mode for the trip.

National Aeronautics and Space Administration (NASA) intends to

reduce inter-city travel time in the United States by one-half within

10 years and by two-thirds within 25 years, while keeping costs low

and improving safety.

For inter-city transportation system mode choice analysis, knowing the cost of travel by each existing transportation mode helps to assess the impact of introducing a new mode of transportation. The travel cost also helps to determine future trends in travel, such as whether there is going to be congestion or more demand for a particular mode.

NASA, in collaboration with the Federal Aviation Administration

(FAA), industry, and several universities, has launched the Small

Aircraft Transportation System (SATS) research program whose

critical task is transportation system demand estimation.


Motivation

utilizes a cost model which is split into two sub-categories:

i. Cost model for Supply side, also referred to as “Transportation Vehicle

Performance Models”

ii. Cost model for Demand side, also referred to as “Generic Fare Model”

Rama-Murthy developed this generic fare model as a demand side

cost metric. The ratio of average fare to distance (fare per mile) is

used as a measure of this cost of travel.

Compared to other transportation mode fares, air fare is not easy to typify since it is affected by many factors. To better understand the variation in the cost of air travel, Rama-Murthy formulated several statistical models.

Understanding Airfares

to the US domestic airline industry. In particular, the total numbers

of enplanements and passenger miles have more than doubled since

then, and the overall airfare has been considerably lower than it would

have been had regulation continued.

After the removal of the restrictions imposed on the airline industry during the regulation years, airfares have taken on an increasingly complex structure. Airfares are heavily influenced by factors such as

i. scale economies

ii. level of competition

iii. airport congestion, and

iv. airline marketing strategies

Understanding Airfares

i. Longer flights tend to have lower average cost because the fixed costs

associated with each flight can be spread over a longer distance.

ii. Markets with larger passenger volume tend to have lower average cost

since airlines in those markets are able to use larger planes and achieve

higher load factors.

Methodology

travel in US:

i. Non-linear model which estimates the relationship between average round-trip fare and yield

ii. Multiple regression models that try to understand the causal

relationship between average fare between any origin and destination

pair and other defined explanatory variables

A list of 685 commercial airports classified by the FAA was used in

the analysis. These airports were clustered into four separate

categories of airports based on the total number of enplanements: (1)

Large Hub, (2) Medium Hub, (3) Small Hub, and (4) Non-Hub.

Nearly 95% of the enplanements in the National Airspace System go through the Large and Medium Hubs.

Determination of Fare Class Category

analysis, the fares were grouped into two types: First and Business

class and Non-Business class.

In order to determine the proper fare class category to be used for the

analysis, a set of fare class groups was created using the fare class

categories. They are as follows:

a. Business Class - Unrestricted First Class (F), Restricted First Class (G),

Unrestricted Business Class (C) and Restricted Business Class (D)

b. Coach Class - Unrestricted Coach Class (Y) and Restricted Coach Class

(X)

c. Restricted Coach Class (X)

d. Unrestricted Coach Class (Y)

Using these categories as a basis for class determination, non-linear

regression models were generated using the distance traveled as an

independent variable.

The results are presented below:

Based on the figure above, the fare model for the Unrestricted Coach Class (Y) behaves similarly to the Business Class fare model. Hence, Unrestricted Coach Class fares were combined with Business Class fares for the analysis.


The final cluster of fare class groups that were used to develop the

models is given below:

a. Business Class - Unrestricted First Class (F), Restricted First Class (G), Unrestricted Business Class (C), Restricted Business Class (D) and Unrestricted Coach Class (Y)

b. Non-Business Class - Restricted Coach Class (X)

Model Variables

1. Round Trip Distance: This used to be a prominent independent variable for modeling airfare, but after deregulation the relationship between airfares and distance has broken down (Anderson et al., 2002).

important impact on the cost of air travel. To understand this

competition, the percentage of total number of seats offered by each

carrier, denoted by pa , is calculated.

$t_a = \sum_b f_{ab}\, s_b, \qquad s_a = \sum_a t_a, \qquad p_a = \frac{t_a}{s_a}$

where

a: total number of airlines at origin (i) airport

b: types of aircraft by each airline from origin (i) airport

$t_a$: total number of seats for each airline a from origin (i) airport

$f_{ab}$: frequency of aircraft type b offered by each airline a

$s_b$: number of seats for aircraft type b

$s_a$: total number of seats offered by an airline a from origin (i) airport

Model Variables

pair can be measured by calculating the Herfindahl Index (HI).

$HI = \sum_a p_a^2$

An HI of 1 corresponds to a monopoly, 0.5 corresponds to an industry with two equal-sized firms, 0.33 corresponds to an industry with three equal-sized firms, and so on. As a rule, any market having an HI greater than 0.4 is considered a highly concentrated market, and one with an HI less than 0.18 a less concentrated market. The higher the concentration, the more likely the fare will increase in that market segment.
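The seat-share and Herfindahl Index calculations can be sketched in Python as follows. The airline labels, frequencies and seat counts are invented for illustration, and the label used for markets between the two thresholds ("intermediate") is an assumption, since the rule above names only the two extremes.

```python
def airline_seats(schedule):
    """Total seats per airline: sum over aircraft types of frequency * seats."""
    return {airline: sum(freq * seats for freq, seats in flights)
            for airline, flights in schedule.items()}

def seat_shares(seats_by_airline):
    """Share of total seats offered by each airline."""
    total = sum(seats_by_airline.values())
    return {airline: t / total for airline, t in seats_by_airline.items()}

def herfindahl(shares):
    """Herfindahl Index: sum of squared shares."""
    return sum(p * p for p in shares)

def concentration(hi):
    """Classify a market using the 0.4 and 0.18 thresholds."""
    if hi > 0.4:
        return "highly concentrated"
    if hi < 0.18:
        return "less concentrated"
    return "intermediate"  # label assumed for this sketch

# Hypothetical origin airport: (frequency, seats per aircraft) by airline.
schedule = {"A": [(10, 150), (5, 100)], "B": [(20, 50)]}
seats = airline_seats(schedule)        # {'A': 2000, 'B': 1000}
shares = seat_shares(seats)
hi = herfindahl(shares.values())
print(seats, round(hi, 3), concentration(hi))
```

With one airline holding two thirds of the seats, the index is 5/9, well above the 0.4 threshold.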

Model Variables

3. Passenger Flows: Large passenger flows tend to reduce the average fare. However, this may not hold in cases with higher HI values, where the average fare increases instead. Hence, the relationship between the Herfindahl Index and passenger flow was observed.

Model Variables

4. Low-Cost Carrier: The presence of a low-cost carrier will tend to reduce the average fare between an Origin-Destination pair. Low-cost carriers usually don't offer business service and only offer point-to-point service, thereby reducing their operating costs.

Model Variables

following types depending on the number of enplanements. It is usually believed that traveling between Large Hubs is less expensive than traveling from other airports. Also, on a macro level, the overall supply and demand, expenses and revenue would tend to drive the costs down in large airports.

Fare Models

Using all the variables mentioned previously, a family of “Fare Models”

was created for both Business and Coach Class. They are as follows:

1. Table Function: The Table Function is the weighted average of the mean fare paid between 685 x 685 airports. The mean fare for a single Origin-Destination pair is determined using the following formula:

$\mu = \sum_x x\, p(x)$

2. Non-Linear Regression Fare Model: A generic fare per mile model is used to predict fare per mile using only the mean round trip distance traveled as an independent variable. It is a non-linear regression model, also known as the Harris Model. The model is given below:

$y = \frac{1}{a + b x^{c}}$

where y is the fare per mile ($/mile), a, b and c are the model parameters, and x is the round trip distance in statute miles.
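To show how the Harris Model is evaluated, here is a minimal Python sketch. The parameter values are hypothetical placeholders chosen so the arithmetic is easy to check; they are not the estimates from the thesis, which are not reproduced in these slides.

```python
def harris_fare_per_mile(distance, a, b, c):
    """Fare per mile y = 1 / (a + b * x**c) for round trip distance x."""
    return 1.0 / (a + b * distance ** c)

def harris_round_trip_fare(distance, a, b, c):
    """Implied round trip fare: fare per mile times distance."""
    return harris_fare_per_mile(distance, a, b, c) * distance

# Hypothetical parameters, for illustration only.
a, b, c = 1.0, 0.01, 1.0
for miles in (100, 500, 2000):
    per_mile = harris_fare_per_mile(miles, a, b, c)
    fare = harris_round_trip_fare(miles, a, b, c)
    print(f"{miles:>5} miles: {per_mile:.4f} $/mile, fare {fare:.2f}")
```

With positive b and c the fare per mile falls as distance grows, which matches the earlier observation that longer flights spread fixed costs over more miles.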


Fare Flow Model

A family of generic fare models was developed as an input for the “Fare

Flow Model”. The “Fare Flow Model” is a combination of Table Function

and the generic fare models.

Check for a fare value between the Origin-Destination pair in the Table Function.

If the fare value in the Table Function is not available, check whether one of the Origin-Destination airports is in Alaska or Hawaii.

If an Origin or Destination airport is in Alaska or Hawaii, check the distance.

If the distance is less than 1500 miles, use the Harris Model within AK & HI.

If not, check whether the distance is greater than 1500 miles and less than 3000 miles, and use the Harris Model developed for that distance category.

If the distance is greater than 3000 miles, use the Harris Model for distances greater than 3000 miles.

If neither airport of the Origin-Destination pair is in Alaska or Hawaii, check whether the distance between the airports is less than 500 miles.

If the distance is less than 500 miles, use the Harris Model developed for that category of distance and Origin-Destination pairs.

If the distance is greater than 500 miles, use the Harris Model for distances greater than 500 miles.

Finally, if the Origin-Destination pair doesn't fall in any of the above categories, use the Generic Fare model developed using the Great Circle Distance. The great circle distance is the minimum distance between the Origin-Destination airport pair.
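The selection procedure above can be sketched as a small Python function that returns which fare source applies. The function name, argument names and returned labels are hypothetical. The slides do not say what happens exactly at the 500, 1500 and 3000 mile boundaries or when the distance is unknown, so this sketch lets those cases fall through to the generic model, which is an assumption.

```python
def select_fare_source(table_fare, in_alaska_or_hawaii, distance_miles):
    """Return a label for the fare source used for one Origin-Destination pair.

    table_fare: fare from the Table Function, or None if not available.
    in_alaska_or_hawaii: True if one of the two airports is in AK or HI.
    distance_miles: distance between the airports, or None if unknown.
    """
    if table_fare is not None:
        return "table_function"
    if distance_miles is not None:
        if in_alaska_or_hawaii:
            if distance_miles < 1500:
                return "harris_ak_hi_under_1500"
            if 1500 < distance_miles < 3000:
                return "harris_ak_hi_1500_to_3000"
            if distance_miles > 3000:
                return "harris_ak_hi_over_3000"
        else:
            if distance_miles < 500:
                return "harris_under_500"
            if distance_miles > 500:
                return "harris_over_500"
    # Anything not matched above (including boundary values) uses the
    # generic model based on great circle distance.
    return "generic_great_circle_model"

print(select_fare_source(245.0, False, 300))   # table value available
print(select_fare_source(None, True, 2200))    # AK/HI, mid-distance
print(select_fare_source(None, False, 800))    # other pair, over 500 miles
```

Returning a label rather than a fare keeps the routing logic separate from the fitted models, whose parameters are not given here.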


Statistical Validation of “Fare Flow Model”

The Fare Flow Model was then tested using a nonparametric statistical test for dissimilarity between the generic fare models. The Wilcoxon Rank Sum Test is a nonparametric alternative to the two-sample t-test which is based solely on the order in which the observations from the two samples fall. The results from the Wilcoxon Rank Sum Test performed on the "Fare Flow Models" indicate that the models are dissimilar and independent of each other. The p-values imply that the differences between the models are statistically significant.
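To illustrate how the rank sum statistic depends only on the ordering of the pooled observations, here is a minimal Python sketch. It uses the large-sample normal approximation without a tie correction, which is a simplification assumed for this example and not necessarily the procedure used in the thesis; the sample values are invented.

```python
import math

def midranks(values):
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def rank_sum_test(sample1, sample2):
    """Return (W, z, p): rank sum of sample1, its z-score, two-sided p-value."""
    n1, n2 = len(sample1), len(sample2)
    ranks = midranks(list(sample1) + list(sample2))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    var_w = n1 * n2 * (n1 + n2 + 1) / 12   # no tie correction (assumption)
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return w, z, p

# Invented fare-per-mile predictions from two hypothetical models.
model_a = [0.11, 0.12, 0.13, 0.14, 0.15, 0.16]
model_b = [0.21, 0.19, 0.24, 0.22, 0.18, 0.25]
w, z, p = rank_sum_test(model_a, model_b)
print(f"W={w}, z={z:.3f}, p={p:.4f}")
```

For samples this small an exact test is preferable to the normal approximation; the sketch is meant only to show the mechanics.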

Multiple Linear Regression Model

To test the hypotheses about the factors that affect the cost of air travel, multiple regression equations were estimated on the basis of fare class.

$fc_{ij} = \beta_0 + \beta_1 d_{ij} + \beta_2 pc_{ij} + \beta_3 h_i + \beta_4 h_j + \beta_5 lc_{ij} + \beta_6 o_i + \beta_7 d_j + e_{ij}$

where

$fc_{ij}$: annual average round-trip fare for coach class between i and j

$d_{ij}$: round trip distance in statute miles between i and j

$h_i$: Herfindahl Index at the origin airport i

$h_j$: Herfindahl Index at the destination airport j

$pc_{ij}$: annual coach class type passenger flows between i and j

$lc_{ij}$: low cost carrier presence between i and j, dummy variable 0 or 1

$o_i$: origin airport type (i) [1, 2, 3, and 4]

$d_j$: destination airport type (j) [1, 2, 3, and 4]

$\beta_0, \beta_1, \dots, \beta_7$: model parameters to be estimated.

Results

Parameter Estimate for Coach Fare Class Regression, Average Coach Fare

Interpretation

showing that longer trips have more average fare value.

Competition is one of the main causes that affect airfares. The higher

the competition, fares tends to be lower. The positive sign on the

competition parameters, Herfindahl Index at the origin and

destination airport, indicate that lesser the competition more the

average fare between the O-D pair. It also shows that the

competition at the destination airport is more critical than the

competition at the origin airport.

Annual passenger flows are higher between larger airport pairs. This
flow is one of the main reasons for congestion at these large airports,
which leads to higher indirect operating costs. These costs are passed
directly on to passengers as higher fares, as indicated by the positive
sign on annual average passenger flows.
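The slides do not say how the Herfindahl Index used as the competition measure was computed. Under its standard definition it is the sum of squared market shares, so it approaches 1 for a monopoly and falls as more carriers split the market evenly. The sketch below assumes the shares are each carrier's share of traffic at an airport; the function name and the numbers are hypothetical.

```python
def herfindahl_index(volumes):
    """Sum of squared market shares, with shares expressed as fractions of the total."""
    total = sum(volumes)
    return sum((v / total) ** 2 for v in volumes)


# Hypothetical carrier traffic volumes at three airports.
print(herfindahl_index([100]))             # a single carrier: no competition
print(herfindahl_index([25, 25, 25, 25]))  # four equal carriers
print(herfindahl_index([70, 20, 10]))      # one dominant carrier
```

Because the index rises as competition falls, a positive regression coefficient on it means that fares are higher where fewer carriers compete.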

Interpretation

the US. These airlines have a successful business model to reduce

indirect operating costs, thereby offering cheaper fares. The presence
of a low-cost carrier at the origin airport tends to reduce the average
fare, as indicated by the negative sign on the low-cost carrier presence
variable.

The origin and destination airport type variables both have positive
effects, suggesting that airfares tend to be higher at smaller airports.
Again, the destination airport type is more critical than the origin
airport type.

Multiple Linear Regression Model

Business Class Fare Analysis:

where

$fb_{ij}$ : annual average round-trip fare for business class between i and j

$d_{ij}$ : round-trip distance in statute miles between i and j

$h_i$ : Herfindahl Index at the origin airport i

$h_j$ : Herfindahl Index at the destination airport j

$pc_{ij}$ : annual coach class passenger flows between i and j

$o_i$ : origin airport type (i) [1, 2, 3, and 4]

$d_j$ : destination airport type (j) [1, 2, 3, and 4]

$\beta_0, \beta_1, \ldots, \beta_6$ : model parameters to be estimated

$e_{ij}$ : residual
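The business class equation itself did not survive on this slide. Given the variable list above and the form of the coach model, it presumably reads as follows. The order in which the coefficients are assigned to the predictors is an assumption carried over from the coach equation, with the low-cost carrier term dropped.

```latex
fb_{ij} = \beta_0 + \beta_1 d_{ij} + \beta_2 pc_{ij} + \beta_3 h_i + \beta_4 h_j
          + \beta_5 o_i + \beta_6 d_j + e_{ij}
```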

Results

Fare

Interpretation

The parameter estimate for average distance has a positive sign,
showing that longer business trips have higher average fares.

The positive signs on the competition parameters, the Herfindahl Index
at the origin and destination airports, indicate that competition also
affects business class fares: the less the competition, the higher the
average fare between the O-D pair. The estimates also show that
competition at the origin airport is more critical than competition at
the destination airport for business class trips.

The annual passenger flows variable has the opposite effect on business
class fares from its effect on coach fares, indicating that an increase
in business class passenger flows tends to reduce overall operating
costs, thereby reducing the average fare.

The origin and destination airport type variables have a negative
effect on business fares, implying that business fares tend to be lower
at smaller airports.

Thank You!
