GSBS6002 Foundation of Business Analysis Assignment 1: Data Analysis Report

Postgraduate student of the University of Newcastle, Australia

Page 1 of 51

Table of Contents

Executive Summary
1. Introduction
2. Data Screening
3. Demographic Profile of Respondents
4. Data Analysis and Findings
5. Conclusion
References

Executive Summary
This data analysis report provides recommendations for decision making with regard to Divine Elegance, a fine upscale restaurant to be opened in a large metropolitan area. The data screening, demographic profile of respondents, analysis methods, results of each analysis and appropriate recommendations are discussed in detail. The ten areas of analysis and findings are as follows:

Q1: Price of entrée items. Potential patrons are willing to pay around $18 for an entrée item.

Q2: Amount spent per month by potential patrons. The average amount spent per month in restaurants is significantly less than $200.

Q3: Location of the restaurant. The best location for the restaurant is location B.

Q4: Likelihood to patronise and household income level. Likelihood to patronise is likely to be high when income level is high and low when income level is low.

Q5: Restaurant décor. The restaurant should have simple décor.

Q6: Live entertainment. The restaurant should have jazz combo live entertainment.

Q7: Advertising in radio programmes. Advertisements should be placed during Rock and Easy Listening radio programmes.

Q8: Likelihood of patronage. The significant predictors of likelihood to patronise are: Prefer Waterfront View, Prefer Formal Waitstaff Wearing Tuxedos, Prefer Large Variety of Entrées, Prefer Unusual Entrées, Prefer Simple Décor, Prefer Elegant Décor, and Prefer Jazz Combo.

Q9: Average age of probable and non-probable patrons. The average age of probable patrons is higher than that of non-probable patrons.

Q10: Gender of probable patrons. There is no evidence of a difference between men and women in the likelihood of being a probable patron.

Recommendations are also given to conduct a subsequent qualitative analysis to gather more insights.

1. Introduction

The objective of this data analysis report is to provide recommendations for decision making with regard to Divine Elegance, a fine upscale restaurant to be opened in a large metropolitan area. The collected survey data is analysed to determine a variety of factors, such as the most suitable location for the restaurant and the price of entrée items. The following sections provide details of the data screening process, the demographic profile of the respondents, the analysis methods, the results of each analysis and appropriate recommendations.

2. Data Screening

Before data analysis, a data screening process was performed on the sample data to determine whether there were any errors. One of the contingency tables (likelihood to patronise by probable patron) revealed the following inconsistency.

It is highly likely that the single case circled in red contains an error. The respondent answered "Very Unlikely" to patronising the new restaurant and yet answered "Yes" to being a probable patron. The error could have been made by the respondent or during data entry. If it is indeed an error, the case should be removed from the sample data so that it does not affect the subsequent analysis.

Note: For the purpose of this assignment, no amendment is made to the sample data.
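As an illustration, the screening check can be expressed as a small script. The script cross-tabulates the two answers and flags contradictory cases for review rather than deleting them. This is a minimal sketch: the field names, answer labels and records below are hypothetical and are not taken from the survey data file.

```python
from collections import Counter

# Hypothetical survey records: field names and answer labels are assumptions
responses = [
    {"id": 1, "likelihood": "Very Likely", "probable_patron": "Yes"},
    {"id": 2, "likelihood": "Very Unlikely", "probable_patron": "No"},
    {"id": 3, "likelihood": "Very Unlikely", "probable_patron": "Yes"},
    {"id": 4, "likelihood": "Somewhat Likely", "probable_patron": "Yes"},
]

# Cross-tabulate the two answers, like the contingency table used for screening
crosstab = Counter((r["likelihood"], r["probable_patron"]) for r in responses)

# Flag contradictory answers for checking instead of silently deleting them
inconsistent_ids = [
    r["id"]
    for r in responses
    if r["likelihood"] == "Very Unlikely" and r["probable_patron"] == "Yes"
]

for (likelihood, probable), count in sorted(crosstab.items()):
    print(f"{likelihood:<16} {probable:<4} {count}")
print("Cases to check:", inconsistent_ids)
```

Flagged cases can then be traced back to the original questionnaire before deciding whether to remove them.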

3. Demographic Profile of Respondents

The sample data consists of 400 cases. 49% of the respondents are female and almost 28% of the respondents indicated that they will probably patronise the new restaurant.

Respondents from Location B and Location C form the majority, taking up 30% and 55% respectively.

The largest group of respondents belongs to households with a before-tax income of $50,000 to $74,999, followed by $25,000 to $49,999 and then $150,000+.

68% of the respondents are married, while 23% are single. Family size of respondents ranges from one to seven.

4. Data Analysis and Findings

Definitions
Potential patrons: Respondents who answered "Yes" to the question "Do you eat at this type of restaurant at least once every two weeks?" In the sample data file, all cases are potential patrons.

Probable patrons: Respondents who answered "Yes" to the question "Probable Patron of the new restaurant?"

Q1

Price of entrée items

The following histogram shows the frequency of expected entrée item prices from the respondents. The distribution is unimodal and skewed to the right, with a mean of $18.84 and a standard deviation of $9.828. Its median and mode are both $16.

Hypothesis testing
A one-sample t-test for the mean is appropriate for comparing the mean value of a sample to an assumed population mean (Sharpe, De Veaux & Velleman, 2010).
Null hypothesis H0: μ = $18
Alternate hypothesis HA: μ ≠ $18
where μ is the average expected price of an entrée item.

Assumptions
Independence Assumption: Since the sample is random, the data values should be independent.
Randomisation Condition: The data are obtained from a random sample.
10% Condition: The sample size is less than 10% of the total population.
Normal Population Assumption: It is assumed that the population follows a Normal distribution.
Nearly Normal Condition: The sample distribution is unimodal and skewed to the right. The Q-Q plots below also indicate that the sample distribution is not Normal. Since the sample size is large, however, the Central Limit Theorem makes this a minor concern.

Method/Technique
A Student's t-model with n - 1 = 339 degrees of freedom is used and a one-sample t-test for the mean is performed.

The null hypothesis is not rejected at the 5% significance level (t(339) = 1.567, p = 0.118 > 0.05). There is insufficient evidence to suggest that the average expected price differs from $18. The 95% confidence interval for the average expected price is ($18 - $0.2131, $18 + $1.8837) = ($17.79, $19.88).

This means that potential patrons expect (and are thus willing to pay) around $18 for an entrée item, hence entrée items should be priced around this amount.
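The test statistic and confidence interval above can be reproduced from the summary statistics alone. The sketch below makes two assumptions. First, n = 340, inferred from the reported 339 degrees of freedom. Second, 1.967 is used as an approximate two-sided 5% critical value of the t-distribution with 339 df, taken from a t-table, because the Python standard library has no t quantile function.

```python
import math

T_CRIT = 1.967  # approximate t(0.975, 339) from a t-table (assumption)

def one_sample_t(mean, sd, n, mu0, t_crit):
    """One-sample t-test from summary statistics.

    Returns the t statistic, degrees of freedom and a 95% CI for the mean.
    """
    se = sd / math.sqrt(n)        # standard error of the mean
    t = (mean - mu0) / se         # distance from the hypothesised mean in SEs
    half_width = t_crit * se
    return t, n - 1, (mean - half_width, mean + half_width)

# Q1 summary statistics as reported; n = 340 is inferred, not stated
t, df, ci = one_sample_t(mean=18.84, sd=9.828, n=340, mu0=18, t_crit=T_CRIT)
reject = abs(t) > T_CRIT
print(f"t({df}) = {t:.3f}, 95% CI = (${ci[0]:.2f}, ${ci[1]:.2f}), reject H0: {reject}")
```

The mean is rounded to $18.84, so the sketch gives t of about 1.576 and an upper limit of about $19.89. The reported values are 1.567 and $19.88. The decision not to reject H0 is unchanged.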

Q2

Amount spent per month by potential patrons

The following histogram shows the frequency of the amount spent per month in restaurants by the respondents. The distribution is multimodal and skewed to the right, with a mean of $150.05 and a standard deviation of $92.706. Its median and mode are $135 and $110 respectively.

Hypothesis testing
A one-sample t-test for the mean is appropriate for comparing the mean value of a sample to an assumed population mean (Sharpe et al., 2010).
Null hypothesis H0: μ = $200
Alternate hypothesis HA: μ ≠ $200
where μ is the average amount spent per month in restaurants.

Assumptions
Independence Assumption: Since the sample is random, the data values should be independent.
Randomisation Condition: The data are obtained from a random sample.
10% Condition: The sample size is less than 10% of the total population.
Normal Population Assumption: It is assumed that the population follows a Normal distribution.
Nearly Normal Condition: The sample distribution is multimodal and skewed to the right. The Q-Q plots below also indicate that the sample distribution is not Normal.

Method/Technique
Under these conditions, a Student's t-model with n - 1 = 399 degrees of freedom is used and a one-sample t-test for the mean is performed with caution.

The null hypothesis is rejected at the 5% significance level (t(399) = -10.775, p < 0.001).

There is a statistically significant difference between the average amount spent per month in restaurants and $200. The 95% confidence interval for the average amount spent per month is ($200 - $59.0602, $200 - $40.8348) = ($140.94, $159.17).

This means that it is not realistic to expect potential patrons to spend $200 per month in restaurants on average. Instead, they are likely to spend between $141 and $159 on average. Since this is significantly lower than $200, more marketing efforts are required to attract more customers in order to sustain the business.
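The reported values can be checked by hand from the rounded summary statistics. The working below does this. The interval is reported relative to the test value (mean minus $200), so $200 is added back to both limits. The small gap between -10.78 here and the reported -10.775 comes only from rounding the mean.

```latex
\begin{aligned}
SE &= \frac{s}{\sqrt{n}} = \frac{92.706}{\sqrt{400}} \approx 4.635 \\
t_{399} &= \frac{\bar{x} - \mu_0}{SE} = \frac{150.05 - 200}{4.635} \approx -10.78 \\
\text{95\% CI for } \mu &= (200 - 59.0602,\ 200 - 40.8348) = (140.94,\ 159.17)
\end{aligned}
```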

Q3

Location of the restaurant

The following stacked bar charts (count and percentage) indicate that, in location B, a larger proportion of potential patrons are likely to patronise the restaurant than not, hence location B may be a good choice.

The following stacked bar charts (count and percentage) indicate that a larger proportion of potential patrons in location B prefer a drive of less than 30 minutes to the restaurant. In contrast, a larger proportion of potential patrons in the other locations do not prefer a drive of less than 30 minutes. This also indicates that the restaurant should be located in location B.

The following side-by-side box plots show the amount spent per month in restaurants by respondents from different locations. This indicates that respondents in Location B spend more money in restaurants per month on average.

Hypothesis testing
A one-way ANOVA test is appropriate for testing the equality of more than two independent means (Sharpe et al., 2010).
Null hypothesis H0: μA = μB = μC = μD
Alternate hypothesis HA: at least one mean is different
where μA, μB, μC and μD are the average amounts spent per month by potential patrons in locations A, B, C and D respectively.

Assumptions
Independence Assumption: Since the sample is random, the groups should be independent of each other.
Randomisation Condition: The data are from a random sample.
Equal Variance Assumption: It is assumed that the variances of each group are equal.
Similar Variance Condition: The box plots of the four groups in the sample data above indicate that their variances are not similar.
Normal Population Assumption: It is assumed that the residuals follow a Normal distribution.

Method/Technique
Under these conditions, a one-way ANOVA test is performed with caution on the means, followed by a post hoc Tukey test.

The null hypothesis is rejected at the 5% significance level (F(3, 396) = 313.333, p < 0.001). There is a statistically significant difference among the average amounts spent in restaurants per month by potential patrons from the four locations.

A Tukey post hoc test indicated that potential patrons from location B (M=$250.7250, n=120) have a significantly higher average amount spent per month than those from location C (M=$132.5455, n=220). Also, potential patrons from location C have a significantly higher average amount spent per month than those from location A or D.

It is recommended that the restaurant be located at location B because there will likely be more patrons and they are likely to spend more on meals.
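For illustration, the one-way ANOVA F statistic can be computed directly from its between-group and within-group sums of squares. The function below is a minimal standard-library sketch. It returns F and its degrees of freedom but not a p-value. The spending figures in the example are hypothetical and only show the calling pattern. With 400 respondents in four locations, the same function would give the reported degrees of freedom, 3 and 400 - 4 = 396.

```python
from statistics import mean

def one_way_anova(groups):
    """One-way ANOVA on a list of groups (each a list of numbers).

    Returns (F, df_between, df_within).
    """
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    # Between-group variation: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical monthly spends ($) by location, for illustration only
spend = {
    "A": [60, 75, 80, 70],
    "B": [240, 260, 255, 250],
    "C": [130, 140, 125, 135],
    "D": [90, 85, 95, 100],
}
f_stat, df_b, df_w = one_way_anova(list(spend.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```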

Q4

Likelihood to patronise and household income level

The following stacked bar charts (count and percentage) indicate that potential patrons with higher household income are more likely to patronise the restaurant.

The following contingency table shows the likelihood to patronise the restaurant by household income.

If likelihood to patronise is independent of income level, the expected values for each cell would be:

More than 20% of cells have expected values less than 5. This means that the conditions for a chi-square test are not satisfied. Hence the household income variable is recoded so that its rows are combined as follows:

Now, all cells have an expected value > 5.

Hypothesis testing
A chi-square test of independence is appropriate for testing whether there is an association between two categorical/ordinal variables (Sharpe et al., 2010).
Null hypothesis H0: Likelihood to patronise and household income are independent
Alternate hypothesis HA: Likelihood to patronise and household income are not independent

Assumptions
Counted Data Condition: The data are counts of respondents categorised on two categorical/ordinal variables.
Independence Assumption: Since the sample is random, the data values should be independent.
Randomisation Condition: The data are from a random sample.
Sample Size Assumption
Expected Cell Frequency Condition: No more than 20% of expected counts are less than 5, and no expected count is less than 1.

Method/Technique
The conditions are satisfied, so a χ² model with (5 - 1) × (5 - 1) = 16 df is used and a chi-square test of independence is performed.

The null hypothesis is rejected at the 5% significance level (χ²(16) = 633.842, p < 0.001).

There is a strong, statistically significant association between likelihood to patronise and household income (Cramér's V = 0.629, p < 0.001). Cramér's V measures only the strength of an association. Its direction (higher income, higher likelihood) is shown by the stacked bar charts above.

Note: Cramér's V is used instead of the phi coefficient because the table is larger than 2×2.

There is strong evidence to suggest an association between likelihood to patronise and household income. Likelihood to patronise is likely to be high when income level is high and low when income level is low. This is consistent with market research showing that frequent restaurant diners are more likely to have a household income of at least $150,000 (Casual & Fine Dining, 2008). Hence marketing efforts should be directed at potential patrons with higher income.
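The expected-count check, the test statistic and Cramér's V can be sketched together in a few lines. The function below is a minimal standard-library version of the chi-square test of independence. It returns the statistic and degrees of freedom but not a p-value. The 2 × 3 table in the usage example is hypothetical. The last line re-derives the reported Cramér's V from χ² = 633.842, n = 400 and a 5 × 5 table.

```python
import math

def chi_square_independence(table):
    """Chi-square test of independence for an r x c table of observed counts.

    Returns (chi2, df, expected, share_of_cells_below_5, cramers_v).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    rows, cols = len(table), len(table[0])
    df = (rows - 1) * (cols - 1)
    cells = [e for exp_row in expected for e in exp_row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)   # 20% rule check
    # Cramér's V measures strength of association only (0 to 1), not direction
    cramers_v = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))
    return chi2, df, expected, share_below_5, cramers_v

# Hypothetical 2 x 3 table of counts, for illustration only
observed = [[20, 30, 10], [10, 25, 35]]
chi2, df, expected, share_below_5, v = chi_square_independence(observed)
print(f"chi2({df}) = {chi2:.3f}, cells with E < 5: {share_below_5:.0%}, V = {v:.3f}")

# Cross-check of the reported effect size: 5 x 5 table, n = 400
print(round(math.sqrt(633.842 / (400 * min(5 - 1, 5 - 1))), 3))
```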

Q5

Restaurant décor

The following bar charts show potential patrons' preferences for simple and elegant décor. There are more respondents who prefer simple décor, and also more respondents who do not prefer elegant décor.

Hypothesis testing
A paired t-test is appropriate for comparing the means of two variables measured on the same cases (respondents), i.e. the test is performed on the differences between the paired variables (Sharpe et al., 2010).
Null hypothesis H0: μd = 0 (the mean scores for Prefer Simple Décor and Prefer Elegant Décor are the same)
Alternate hypothesis HA: μd ≠ 0 (the mean scores for Prefer Simple Décor and Prefer Elegant Décor are not the same)
where μd is the mean of the pairwise differences (Prefer Simple Décor minus Prefer Elegant Décor).

Assumptions
Paired Data Assumption: The data for the two variables are paired because the same respondents answered both questions.
Independence Assumption: Since the sample is random, the pairwise differences should be independent.
Randomisation Condition: The data are obtained from a random sample.
10% Condition: The sample size is less than 10% of the total population.
Normal Population Assumption: It is assumed that the population of pairwise differences follows a Normal model.
Nearly Normal Condition: The sample distribution of pairwise differences is not Normal according to the following histogram and Q-Q plots.

Method/Technique
Under these conditions, a Student's t-model with n - 1 = 399 degrees of freedom is used and a paired t-test is performed with caution.

The null hypothesis is rejected at the 5% significance level (t(399) = 8.564, p < 0.001).

The mean score for Prefer Simple Décor (M=3.58, SD=1.492) is significantly different from the mean score for Prefer Elegant Décor (M=2.33, SD=1.510). The 95% confidence interval of the difference is (0.961, 1.534).

This means that potential patrons' average preference for simple décor is likely to be between 1.0 and 1.5 survey points higher than their preference for elegant décor. Hence the restaurant should have simple décor. This runs contrary to other fine-dining restaurants that use exotic décor as a competitive weapon (Duecy, 2005, p. 65). A qualitative analysis is therefore recommended to understand the reasons (see Conclusion).
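The paired t-test works on one difference per respondent. The sketch below is a minimal standard-library version. The five pairs of scores are hypothetical. The critical value 2.776 (t with 4 df) is a table value passed in by hand. With the survey data, the same calculation on 400 pairs would use 399 df and a critical value of about 1.966.

```python
import math
from statistics import mean, stdev

def paired_t(x, y, t_crit):
    """Paired t-test on matched scores x and y from the same respondents.

    t_crit: two-sided 5% critical value for n - 1 df, taken from a t-table.
    Returns (t, df, mean_difference, (ci_low, ci_high)).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]     # one difference per respondent
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)          # standard error of the mean difference
    t = d_bar / se                            # H0: mean difference = 0
    return t, n - 1, d_bar, (d_bar - t_crit * se, d_bar + t_crit * se)

# Hypothetical 5-point scores for five respondents, for illustration only
simple = [5, 4, 3, 5, 4]
elegant = [3, 3, 2, 4, 2]
t, df, d_bar, ci = paired_t(simple, elegant, t_crit=2.776)  # t(0.975, 4) ≈ 2.776
print(f"t({df}) = {t:.3f}, mean difference = {d_bar:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Testing the differences, rather than running two separate one-sample tests, removes the between-respondent variation that both questions share.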

Q6

Live entertainment

The following bar charts show potential patrons' preferences for string quartet and jazz combo. There are more respondents who do not prefer string quartet and more respondents who prefer jazz combo.

Hypothesis testing
A paired t-test is appropriate for comparing the means of two variables measured on the same cases (respondents), i.e. the test is performed on the differences between the paired variables (Sharpe et al., 2010).
Null hypothesis H0: μd = 0 (the mean scores for Prefer String Quartet and Prefer Jazz Combo are the same)
Alternate hypothesis HA: μd ≠ 0 (the mean scores for Prefer String Quartet and Prefer Jazz Combo are not the same)
where μd is the mean of the pairwise differences (Prefer String Quartet minus Prefer Jazz Combo).

Assumptions
Paired Data Assumption: The data for the two variables are paired because the same respondents answered both questions.
Independence Assumption: Since the sample is random, the pairwise differences should be independent.
Randomisation Condition: The data are obtained from a random sample.
10% Condition: The sample size is less than 10% of the total population.
Normal Population Assumption: It is assumed that the population of pairwise differences follows a Normal model.
Nearly Normal Condition: The sample distribution of pairwise differences is not Normal according to the following histogram and Q-Q plots.

Method/Technique
Under these conditions, a Student's t-model with n - 1 = 399 degrees of freedom is used and a paired t-test is performed with caution.

The null hypothesis is rejected at the 5% significance level (t(399) = -10.030, p < 0.001).

The mean score for Prefer String Quartet (M=2.50, SD=1.420) is significantly different from the mean score for Prefer Jazz Combo (M=3.70, SD=1.221). The 95% confidence interval for the difference is (-1.426, -0.959).

This means that potential patrons' average preference for string quartet is likely to be between 1.0 and 1.4 survey points lower than their preference for jazz combo. Hence the restaurant should provide live entertainment by a jazz combo band.

Q7

Advertising in radio programmes

The following pie chart shows the percentage of potential patrons who listen to each type of radio programme. The largest portion listens to Rock (39.75%), indicating that advertising should be placed during Rock programmes.

Hypothesis testing
A one-way ANOVA test is appropriate for testing the equality of more than two independent means (Sharpe et al., 2010).
Null hypothesis H0: μC = μE = μR = μT
Alternate hypothesis HA: at least one mean is different
where μC, μE, μR and μT are the average likelihood-to-patronise scores of potential patrons who listen to Country & Western, Easy Listening, Rock and Talk/News respectively.

Assumptions
Independence Assumption: Since the sample is random, the groups should be independent of each other.
Randomisation Condition: The data are from a random sample.
Equal Variance Assumption: It is assumed that the variances of each group are equal.
Similar Variance Condition: The box plots of the four groups, shown below, indicate that their variances are not similar.
Normal Population Assumption: It is assumed that the residuals follow a Normal distribution.

Method/Technique
Under these conditions, a one-way ANOVA test is performed with caution on the means, followed by a post hoc Tukey test.

The null hypothesis is rejected at the 5% significance level (F(3, 381) = 131.581, p < 0.001). There is a statistically significant difference among the average likelihood-to-patronise scores of the different groups of potential patrons.

A Tukey post hoc test indicated that potential patrons who listen to Easy Listening (M=4.24, n=78) have a significantly higher average likelihood-to-patronise score than those who listen to Talk/News (M=3.65, n=82), Rock (M=2.72, n=159) and Country & Western (M=1.61, n=66).

This means that potential patrons who listen to Easy Listening are the most likely to patronise the restaurant. Advertisements should be placed on radio stations that broadcast Easy Listening and Rock programmes. The former will create awareness amongst those who are most likely to patronise, and the latter will reach the largest pool of potential patrons.
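The Tukey post hoc comparisons can be sketched with the Tukey-Kramer form of the test, which allows unequal group sizes. The sketch below plugs in the reported group means and sizes. It makes two assumptions. The within-group mean square (0.75) is a placeholder, because the report does not list it. The critical value 3.65 is an approximate studentized-range table value for 4 groups and about 380 df. Neither is taken from the original analysis.

```python
import math
from itertools import combinations

def tukey_kramer(means, sizes, ms_within, q_crit):
    """Pairwise Tukey-Kramer comparisons following a one-way ANOVA.

    means, sizes: dicts of group mean and group size, keyed by group name.
    ms_within: the ANOVA within-group (error) mean square.
    q_crit: studentized range critical value for k groups and N - k df,
            looked up from a table (it is not computed here).
    Returns {(group1, group2): (mean difference, q statistic, significant?)}.
    """
    results = {}
    for g1, g2 in combinations(means, 2):
        diff = means[g1] - means[g2]
        # Tukey-Kramer standard error allows unequal group sizes
        se = math.sqrt(ms_within / 2 * (1 / sizes[g1] + 1 / sizes[g2]))
        q = abs(diff) / se
        results[(g1, g2)] = (diff, q, q > q_crit)
    return results

# Group means and sizes as reported for Q7; ms_within = 0.75 is a placeholder,
# not a value from the report, and q_crit = 3.65 is an approximate table value
means = {"Easy Listening": 4.24, "Talk/News": 3.65, "Rock": 2.72, "Country & Western": 1.61}
sizes = {"Easy Listening": 78, "Talk/News": 82, "Rock": 159, "Country & Western": 66}
for pair, (diff, q, significant) in tukey_kramer(means, sizes, 0.75, 3.65).items():
    print(f"{pair[0]} vs {pair[1]}: diff = {diff:+.2f}, q = {q:.2f}, significant = {significant}")
```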

Q8

Likelihood of patronage

Multiple regression is appropriate for analysing the relationship between a dependent variable (likelihood to patronise) and multiple independent variables (variables 11–20, age, family size and gender) (Sharpe et al., 2010).

Note: The dependent variable is ordinal, so ordinal regression would be a more appropriate analysis. For the purpose of this assignment, multiple regression is performed instead.

Hypothesis Testing (F-test)
Null hypothesis H0: β1 = β2 = β3 = ... = β13 = 0
Alternate hypothesis HA: at least one βi ≠ 0
where β1 to β13 are the slope coefficients of variables 11–20, age, family size and gender.

Assumptions
Linearity Assumption: It is assumed that there is a linear relationship between the dependent variable and each of the predictor variables.
Independence Assumption: Since the sample is random, the data values should be independent.
Randomisation Condition: The data are obtained from a random sample.
Equal Variance Assumption: It is assumed that the variances of the residuals are equal.
Normality Assumption: It is assumed that the residuals follow a Normal distribution.

Method/Technique
Under these conditions, a multiple regression analysis is performed with caution.

The null hypothesis is rejected at the 5% significance level (F(13, 386) = 59.427, p < 0.001).

At least one slope coefficient is significantly different from zero. The adjusted R² indicates that about 65.6% of the variation in likelihood to patronise can be explained by the regression model.
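The reported adjusted R² can be cross-checked against the overall F statistic. For a model with p predictors and n cases, F = (R² / p) / ((1 - R²) / (n - p - 1)), which can be rearranged for R². The sketch below applies this to the reported values, F = 59.427, p = 13 and n = 400. It is a consistency check only and does not refit the model.

```python
def r2_from_f(f_stat, p, n):
    """Recover R^2 from the overall F statistic of a regression with p predictors.

    Rearranges F = (R^2 / p) / ((1 - R^2) / (n - p - 1)) for R^2.
    """
    df_resid = n - p - 1
    return f_stat * p / (f_stat * p + df_resid)

def adjusted_r2(r2, p, n):
    """Adjust R^2 for the number of predictors p and the sample size n."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

r2 = r2_from_f(59.427, p=13, n=400)
adj = adjusted_r2(r2, p=13, n=400)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")
```

The result, an adjusted R² of about 0.656, agrees with the 65.6% quoted above.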

When all the predictor variables are considered simultaneously, the regression equation is:

Likelihood to patronise = 1.4 + 0.189(Prefer Waterfront View) + 0.002(Prefer Drive Less than 30 Minutes) + 0.305(Prefer Formal Waitstaff Wearing Tuxedos) + 0.091(Prefer Unusual Desserts) - 0.194(Prefer Large Variety of Entrées) + 0.130(Prefer Unusual Entrées) - 0.288(Prefer Simple Décor) + 0.162(Prefer Elegant Décor) + 0.075(Prefer String Quartet) + 0.131(Prefer Jazz Combo) + 0.003(Age) + 0.000(Family Size) - 0.34(Gender)

The results indicate that the significant predictors of likelihood to patronise are:
Prefer Waterfront View (t = 3.141, p = 0.002)
Prefer Formal Waitstaff Wearing Tuxedos (t = 4.298, p < 0.001)
Prefer Large Variety of Entrées (t = -3.390, p = 0.001)
Prefer Unusual Entrées (t = 2.082, p = 0.038)
Prefer Simple Décor (t = -4.249, p < 0.001)
Prefer Elegant Décor (t = 2.343, p = 0.020)
Prefer Jazz Combo (t = 3.009, p = 0.003)

To construct a regression model with only significant predictors, the regression analysis is re-performed without the non-significant predictors.

The refined regression equation is:

Likelihood to patronise = 2.143 + 0.149(Prefer Waterfront View) + 0.327(Prefer Formal Waitstaff Wearing Tuxedos) - 0.206(Prefer Large Variety of Entrées) + 0.163(Prefer Unusual Entrées) - 0.336(Prefer Simple Décor) + 0.194(Prefer Elegant Décor) + 0.112(Prefer Jazz Combo)

This means that the above variables have a significant impact on whether a potential patron is likely to patronise the restaurant. Hence they should be considered in detail during the decision making process.
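To show how the refined equation would be used, the sketch below scores a single respondent. The coefficients are those of the refined model above, with negative signs for the two predictors whose t values are negative. The respondent, who answers 3 on every item, is hypothetical. As noted earlier, the outcome is ordinal, so this linear model is not bounded. Its predictions can therefore fall outside the survey scale.

```python
# Coefficients of the refined model as reported above
INTERCEPT = 2.143
COEFFICIENTS = {
    "Prefer Waterfront View": 0.149,
    "Prefer Formal Waitstaff Wearing Tuxedos": 0.327,
    "Prefer Large Variety of Entrées": -0.206,
    "Prefer Unusual Entrées": 0.163,
    "Prefer Simple Décor": -0.336,
    "Prefer Elegant Décor": 0.194,
    "Prefer Jazz Combo": 0.112,
}

def predict_likelihood(scores):
    """Predicted likelihood-to-patronise score for one respondent.

    scores: dict mapping each predictor name to the respondent's survey score.
    The linear model is unbounded, so predictions can fall outside the scale.
    """
    missing = set(COEFFICIENTS) - set(scores)
    if missing:
        raise KeyError(f"missing predictor scores: {sorted(missing)}")
    return INTERCEPT + sum(b * scores[name] for name, b in COEFFICIENTS.items())

neutral = {name: 3 for name in COEFFICIENTS}   # hypothetical respondent scoring 3 on every item
print(round(predict_likelihood(neutral), 3))
```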

Q9

Average age of probable and non-probable patrons

The following histograms and side-by-side box plots show the ages of probable and non-probable patrons. They indicate that the average age of probable patrons (M=62.15, SD=4.779) is likely to be higher than that of non-probable patrons (M=51.61, SD=9.262).

Hypothesis testing
A two-sample t-test (independent samples t-test) is appropriate for comparing the means of two independent samples (Sharpe et al., 2010).
Null hypothesis H0: μprobable = μnon-probable
Alternate hypothesis HA: μprobable ≠ μnon-probable
where μprobable is the average age of probable patrons and μnon-probable is the average age of non-probable patrons.

Assumptions
Independence Assumption: Since the sample is random, the data values within each group should be independent.
Randomisation Condition: The data are from a random sample.
10% Condition: The sample size is less than 10% of the total population.
Normal Population Assumption: It is assumed that the populations of both groups follow a Normal distribution.
Nearly Normal Condition: The histograms show that the distributions of both groups are skewed. However, since the sample size is large, this is not a concern.
Independent Groups Assumption: Probable and non-probable patrons are independent groups, and there is no reason to think that members of one group affect those of the other.

Method/Technique
A Student's t-model is used and a two-sample (independent samples) t-test for equality of means is performed.

Levene's test for equality of variances is significant (F=22.723, p<0.001), so the variances cannot be assumed equal. The equal-variances-not-assumed (Welch) t-test is therefore used.

The null hypothesis is rejected at the 5% significance level (t(365.652) = 14.868, p < 0.001). The average age of probable patrons (M=62.1532, SD=4.77912, n=111) is significantly different from the average age of non-probable patrons (M=51.6125, SD=9.26212, n=289). The 95% confidence interval of the difference is (9.14657, 11.93482).

This means that the average age of probable patrons is likely to be between 9 and 12 years higher than that of non-probable patrons. This is consistent with market research showing that a larger proportion of frequent restaurant diners belong to older age groups (Casual & Fine Dining, 2008). The restaurant should be designed to cater to the needs of elderly patrons, e.g. by providing sufficient movement space for wheelchairs.
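The Welch test statistic and its fractional degrees of freedom can be reproduced from the group summaries. The sketch below uses the Welch-Satterthwaite approximation, which is where a non-integer df such as 365.652 comes from. The critical value 1.966 is an approximate t-table value for about 366 df. It is an assumption, not a figure from the report.

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's unequal-variances two-sample t-test from summary statistics.

    Returns (t, df, se) with df from the Welch-Satterthwaite approximation.
    """
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # squared standard errors of each mean
    se = math.sqrt(v1 + v2)                      # standard error of the difference
    t = (m1 - m2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se

# Summary statistics for probable vs non-probable patrons, as reported above
t, df, se = welch_t(62.1532, 4.77912, 111, 51.6125, 9.26212, 289)
diff = 62.1532 - 51.6125
t_crit = 1.966   # approximate two-sided 5% t critical value for ~366 df (table value)
ci = (diff - t_crit * se, diff + t_crit * se)
print(f"t({df:.3f}) = {t:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```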

Q10

Gender of probable patrons

The following contingency table shows the number of probable patrons by gender.

If probable patron is independent of gender, the expected values for each cell would be:

All cells have an expected value > 5.

Hypothesis testing
A chi-square test of independence is appropriate for testing whether there is an association between two categorical variables (Sharpe et al., 2010).
Null hypothesis H0: Probable patron and gender are independent
Alternate hypothesis HA: Probable patron and gender are not independent

Assumptions
Counted Data Condition: The data are counts of respondents categorised on two categorical variables.
Independence Assumption: Since the sample is random, the data values should be independent.
Randomisation Condition: The data are from a random sample.
Sample Size Assumption
Expected Cell Frequency Condition: No more than 20% of expected counts are less than 5, and no expected count is less than 1.

Method/Technique
The conditions are satisfied, so a χ² model with (2 - 1) × (2 - 1) = 1 df is used and a chi-square test of independence is performed.

The null hypothesis is not rejected at the 5% significance level (χ²(1) = 0.285, p = 0.593 > 0.05).

There is insufficient evidence to suggest a relationship between probable patron and gender.

This means that there is no evidence that men and women differ in their likelihood of being a probable patron, hence marketing efforts should be carried out for both genders.
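For a 2 × 2 table the p-value can be obtained exactly with the standard library alone. With 1 df, the chi-square statistic is the square of a standard Normal variable, so P(χ² > c) = erfc(√(c/2)). The sketch below also rebuilds the expected counts from the margins given earlier in this report: 111 probable patrons out of 400, and about 49% female respondents. The female count of 196 is an assumption derived from that rounded percentage.

```python
import math

def chi2_p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With 1 df the statistic is the square of a standard Normal Z, so
    P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))

# Expected counts from the margins: 111 probable patrons out of 400 and
# about 49% female respondents (taken here as 196, an assumption)
n, probable, female = 400, 111, 196
expected = {
    ("probable", "female"): probable * female / n,
    ("probable", "male"): probable * (n - female) / n,
    ("not probable", "female"): (n - probable) * female / n,
    ("not probable", "male"): (n - probable) * (n - female) / n,
}
print({cell: round(e, 2) for cell, e in expected.items()})
print(f"p = {chi2_p_value_df1(0.285):.3f}")
```

The smallest expected count is about 54, well above 5, and the p-value for χ² = 0.285 matches the reported 0.593.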

5. Conclusion

In summary, it is recommended that the restaurant be located in Location B (post codes 3, 4 and 5). It should have simple décor and jazz combo live entertainment, and entrée items should be priced at around $18. More marketing efforts should be directed at both male and female potential patrons with higher income, and advertisements should be placed during Rock and Easy Listening radio programmes. Last but not least, the restaurant should cater to the needs of elderly patrons. However, there is insufficient data to provide evidence for the conditions required under the forecasting model.

A qualitative analysis (e.g. interviews and focus groups) is recommended as a follow-up to this quantitative analysis, using an explanatory sequential or an embedded approach (Creswell, 2011). Such mixed methods will help to provide more comprehensive insights and a better understanding of the key success factors.

(2,059 words computed by Microsoft Word from Introduction to Conclusion, excluding tables and charts.)

References
Casual & Fine Dining. (2008). Leisure Market Research Handbook (pp. 189–192). Richard K. Miller & Associates.

Creswell, J. W. (2011). Educational Research: Planning, Conducting, and Evaluating Quantitative and Qualitative Research (4th ed.). Upper Saddle River, New Jersey: Pearson Education Inc.

Duecy, E. (2005, July 18). Fine-dining restaurants: Exotic decors a 'competitive weapon'. Nation's Restaurant News, 39(29), 65.

Sharpe, N. D., De Veaux, R. D., & Velleman, P. (2010). Business Statistics (2nd ed.). Upper Saddle River, New Jersey: Pearson Education Inc.
