
30001 Statistics – a.y. 2018-2019

BIEF – Class 22
INSTRUCTORS
• Emilio Gregori (emilio.gregori@unibocconi.it)
Lectures: 13+13
Office hours: Monday, 4.30 p.m. to 5.30 p.m.
Room: 3-D1-12 (Roentgen building)

• Stefano Rizzelli (stefano.rizzelli@unibocconi.it)
Practical sessions: 5+5
Office hours: Thursday, 6.00 p.m. to 7.00 p.m.
Room: 3-E2-fm02 (Roentgen building)
CLASS CALENDAR (1/2)
Date Time Room Activity Instructor
Thursday 06/09/2018 10.30 - 12.00 N25 Lect. 1 Gregori
Monday 10/09/2018 08.45 - 10.15 N26 Lect. 2 Gregori
Tuesday 11/09/2018 16.15 - 17.45 N26 Lect. 3 Gregori
Wednesday 12/09/2018 12.30 - 14.00 N36 Lect. 4 Gregori
Thursday 13/09/2018 10.30 - 12.00 N25 P.S. 1 Rizzelli
Friday 14/09/2018 12.30 - 14.00 N23 Lect. 5 Gregori
Monday 17/09/2018 08.45 - 10.15 N26 Lect. 6 Gregori
Tuesday 18/09/2018 16.15 - 17.45 N26 Lect. 7 Gregori
Thursday 20/09/2018 10.30 - 12.00 N25 P.S. 2 Rizzelli
Monday 24/09/2018 08.45 - 10.15 N26 Lect. 8 Gregori
Tuesday 25/09/2018 16.15 - 17.45 N26 Lect. 9 Gregori
Thursday 27/09/2018 10.30 - 12.00 N25 P.S. 3 Rizzelli
Monday 01/10/2018 08.45 - 10.15 N26 Lect. 10 Gregori
Tuesday 02/10/2018 16.15 - 17.45 N26 Lect. 11 Gregori
Thursday 04/10/2018 10.30 - 12.00 N25 P.S. 4 Rizzelli
Monday 08/10/2018 08.45 - 10.15 N26 Lect. 12 Gregori
Tuesday 09/10/2018 16.15 - 17.45 N26 Lect. 13 Gregori
Thursday 11/10/2018 10.30 - 12.00 N25 P.S. 5 Rizzelli


CLASS CALENDAR (2/2)
Date Time Room Activity Instructor
Thursday 25/10/2018 10.30 - 12.00 N25 Lect. 14 Gregori
Monday 29/10/2018 08.45 - 10.15 N26 Lect. 15 Gregori
Tuesday 30/10/2018 16.15 - 17.45 N26 Lect. 16 Gregori
Monday 05/11/2018 08.45 - 10.15 N26 Lect. 17 Gregori
Tuesday 06/11/2018 16.15 - 17.45 N26 Lect. 18 Gregori
Thursday 08/11/2018 10.30 - 12.00 N25 P.S. 6 Rizzelli
Monday 12/11/2018 08.45 - 10.15 N26 Lect. 19 Gregori
Tuesday 13/11/2018 16.15 - 17.45 N26 Lect. 20 Gregori
Thursday 15/11/2018 10.30 - 12.00 N25 P.S. 7 Rizzelli
Monday 19/11/2018 08.45 - 10.15 N26 Lect. 21 Gregori
Tuesday 20/11/2018 16.15 - 17.45 N26 Lect. 22 Gregori
Thursday 22/11/2018 10.30 - 12.00 N25 P.S. 8 Rizzelli
Monday 26/11/2018 08.45 - 10.15 N26 Lect. 23 Gregori
Tuesday 27/11/2018 16.15 - 17.45 N26 Lect. 24 Gregori
Thursday 29/11/2018 10.30 - 12.00 N25 P.S. 9 Rizzelli
Monday 03/12/2018 08.45 - 10.15 N26 Lect. 25 Gregori
Tuesday 04/12/2018 16.15 - 17.45 N26 Lect. 26 Gregori
Thursday 06/12/2018 10.30 - 12.00 N25 P.S. 10 Rizzelli


TUTOR
• Stefano Rizzelli

Tutoring sessions: See separate document on e-learning for times (2-hour slots). No additional exercises, but group office hour in a classroom.
Info about the course on the web

International site > Programs > Current students > Courses offered in Academic Program 2018-2019 >
Code “30001” search > 30001 > click on class 22 or scroll down to BIEF/BIEM
TEXTBOOKS
• P. Newbold, W.L. Carlson, B. Thorne (2013). Statistics for Business and Economics, and Student CD, Pearson. (Eighth edition – Global edition).
• Additional Materials Document ("AMD"), posted on the web learning page of the course:
30001 STATISTICS - Material common to all classes > Materials > 2. ADDITIONAL MATERIAL > Additional Material Document (AMD) on Descriptive Statistics
• (forthcoming)
TOPICS

• Descriptive statistics
• Statistical inference:
– Random variables and introduction to statistical
inference
– Confidence intervals
– Hypothesis testing
– Correlation and linear regression
EXAMINATION RULES

WRITTEN EXAM: either partial exams or general exam

PARTIAL EXAMS
• First paper exam (max. 27 points): OCT, 19th
• Second paper exam (max. 27 points): DEC, 10th
• On line exam (max 4 points): JAN, 10th*, JAN, 29th
FINAL GRADE (max 31 points)

GENERAL EXAM
• JAN, 10th*
• JAN, 29th
• JUL, 1st
• SEP, 2nd

* DEC, 11th for exchange students only
EXAMINATION RULES: paper partial ex.
• Exercises, theoretical questions.
• First: lect. 1 to 13; Second: mainly lect. 14 to 26 (plus the first part!)
• Maximum grade: 27 (both).
• Each test is passed if the score is ≥ 15.
• Final grade: arithmetic mean of the two.
• Exam successfully passed if the final grade is ≥ 18.
EXAMINATION RULES: on line partial ex.
• Using R and Radiant, with student’s own laptop
(in a flat room).
• Exercises about analysis on a dataset.
• Topics: from lecture 1 to lecture 26.
• Max score: +4 points
• Student having passed the first paper exam =>
automatically signed up for the on line exam
• Not attending the o.l. exam => 0 points out of 4.
• Scores applied only to the paper partial exams of
the corresponding academic year (not to any general
exam).
EXAMINATION RULES: partial ex.
• Final grade: (rounded) arithmetic mean of the
two paper partial exams + points of the on line
exam.
• The score 31/30 indicates 30/30 cum laude
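The grading rule for the partial exams can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated above, not an official calculator: the function names are hypothetical, the rounding convention (halves round up) is an assumption because the slides only say "rounded", and applying the ≥ 18 threshold to the grade including the on line points is also an assumption.

```python
import math


def final_grade(first_paper: int, second_paper: int, online_points: int = 0) -> int:
    """Rounded arithmetic mean of the two paper partial exams plus on line points."""
    mean = (first_paper + second_paper) / 2
    rounded = math.floor(mean + 0.5)  # assumption: a mean ending in .5 rounds up
    return rounded + online_points


def passed(first_paper: int, second_paper: int, online_points: int = 0) -> bool:
    """Each paper test needs a score >= 15; the final grade needs to be >= 18."""
    both_papers_ok = first_paper >= 15 and second_paper >= 15
    # Assumption: the >= 18 threshold is checked on the grade including on line points.
    return both_papers_ok and final_grade(first_paper, second_paper, online_points) >= 18


def display(grade: int) -> str:
    """The score 31/30 indicates 30/30 cum laude."""
    return "30/30 cum laude" if grade == 31 else f"{grade}/30"


print(final_grade(20, 21, 2))            # mean 20.5 -> 21, plus 2 -> 23
print(display(final_grade(27, 27, 4)))   # 30/30 cum laude
```

Note that Python's built-in round() rounds halves to the nearest even number, which is why the sketch uses floor(mean + 0.5) instead.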
EXAMINATION RULES: general exam
• Exercises, theoretical questions, even about R and Radiant.
• Maximum grade: 31/30.
• Exam successfully passed if the grade is ≥ 18.
MORE ABOUT EXAMS
• Signing up for the exams: STUDENTS WHO
HAVE NOT REGULARLY SIGNED UP WILL NOT
BE ALLOWED TO TAKE ANY OF THE EXAMS.
• If the exam is successfully passed the grade
will be automatically added to the student’s
record.
• Viewing your exam: a paper examination review meeting is held on a single date.
BLACKBOARD E-LEARNING PLATFORM

30001 STATISTICS - Material common to all classes
MATERIALS
1. COURSE ORGANISATION
(…\SYLLABUS and Exam Rules)
Tool and Assignments
(Slides about MyMathLab)
2. ADDITIONAL MATERIAL
Additional Material Document (AMD) on Descriptive Statistics
SLIDES
3. EXERCISES
4. EXAMS
EXAMS OF CURRENT A.Y.
OLD EXAMS WITH SOLUTIONS
5. R manual
MOCK EXAM (Simulation of the R on line test)
(forthcoming)

2018/2019 - 30001 STATISTICS cl. 22
BIEF CLASSES
CLASS 22 (Gregori and Rizzelli)
Syllabus
Course organisation
[Class calendar]
Lectures
Practical sessions
Tutoring sessions
Other

Feel free to use any material for studying!
Check the folders regularly to get more useful information and materials!
CASE
Motion Picture Industry case
Using data to analyse the determinants of movies’ success

Statistics
A.Y. 2018-2019
Index
• Introduction

• Data description and visualization

• Hypothesis testing

• Linear regression analysis

Introduction: background

The film industry is one of the biggest contributors to the entertainment industry but often it is characterized by strong
uncertainty in predicting whether a product will be successful or not.

With the advent of Big Data and Analytics, the possibilities for analysing data (which can be collected from different sources) have increased exponentially. In a rapidly growing industry such as the motion picture industry, data analytics has opened a number of important new opportunities: analyzing past data, making creative marketing decisions, and predicting the fate of forthcoming movie releases.

Movie studios can use data analytics in many different ways, but the main goal is to give each production its best chance of success. “Success” is not a straightforward concept: it can mean ticket sales, reviews, social chatter, franchise options, critical awards, …

Introduction: case set-up and data collection

Suppose we are an independent entertainment company interested in identifying the factors that most affect movie revenue and, in general, the success of a film.

Through a data collection activity on several sources, we were able to build a dataset containing 2868
films, produced from 1980 onwards. The data sources used to create the dataset are as follows:

• IMDb (https://www.imdb.com/interfaces/): Films’ metadata and users’ ratings
• The Numbers (https://www.the-numbers.com/): Box office and budget
• Box Office Mojo (http://www.boxofficemojo.com/): Opening box office info
• Academy Awards Database (http://awardsdatabase.oscars.org/): Academy Awards data
• OMDb API (http://www.omdbapi.com/): Film plot and critics’ ratings

Introduction: main research questions

Some questions to guide the analysis:

• Can we predict movie revenue based on a certain metric or set of attributes?
• Can we predict which films will be highly rated, whether or not they are a commercial success?
• Is it possible to identify specific features shared by the movies that were a flop?
• Is there a relation between commercial success and critics’ awards, especially Academy Awards?
• Are there specific topics in a movie that have a statistical impact on box office results, user reviews or awards?
• How sharp is the divide between major film studios and the independents?

Introduction: the dataset
Here below you can find the list of the variables in the dataset, with description and notes:
N Var Variable name Description Notes
1 id_imdb Movie ID ID
2 movie_title Title
3 year Year of production Numerical discrete
4 year_bins Year of production (Classes) Categorical ordinal - 3 categories
5 studio_distrib Film distributor Categorical nominal
6 major_distrib Film distributor - Major studio Dummy (1="Yes", 0="No")
7 main_genre Main Genre Categorical nominal
8 country Main country of production Categorical nominal
9 country_group Main country of production (Grouped) Categorical nominal
10 runtime_minutes Runtime minutes Numerical discrete
11 mpaa_content_rating Motion Picture Association of America rating (film's suitability for certain audiences based on its content) Categorical ordinal
12 production_budget Production Budget (Million $) Numerical continuous
13 box_office_domestic Box Office ($) - Total Revenues US (Million $) Numerical continuous
14 box_office_worldwide Box Office ($) - Total Revenues Worldwide (Million $) Numerical continuous
15 opening First 3-day opening revenues (US) (Million $) Numerical continuous
16 production_budget_adj Production Budget ($) - Inflation Adjusted (Million $) Numerical continuous
17 box_office_domestic_adj Box Office ($) - Total Revenues US - Inflation Adjusted (Million $) Numerical continuous
18 box_office_worldwide_adj Box Office ($) - Total Revenues Worldwide - Inflation Adjusted (Million $) Numerical continuous
19 opening_adj First 3-day opening revenues (US) - Inflation Adjusted (Million $) Numerical continuous
20 theaters Number of theaters in which the movie was initially released Numerical discrete
21 box_office_budget_ratio Revenues/Budget Ratio Numerical continuous
22 opening_boxoffice_ratio Opening Revenues/Total revenues (US) Numerical continuous

Introduction: the dataset
N Var Variable name Description Notes
23 plot_topic_power The plot of the movie contains the topic "Power" Dummy (1="Yes", 0="No")
24 plot_topic_love The plot of the movie contains the topic "Love" Dummy (1="Yes", 0="No")
25 plot_topic_money The plot of the movie contains the topic "Money" Dummy (1="Yes", 0="No")
26 plot_topic_friends The plot of the movie contains the topic "Friends" Dummy (1="Yes", 0="No")
27 plot_topic_world The plot of the movie contains the topic "World" Dummy (1="Yes", 0="No")
28 plot_topic_school The plot of the movie contains the topic "School" Dummy (1="Yes", 0="No")
29 plot_topic_family The plot of the movie contains the topic "Family" Dummy (1="Yes", 0="No")
30 plot_topic_murder The plot of the movie contains the topic "Murder" Dummy (1="Yes", 0="No")
31 morgan_freeman Morgan Freeman acted in the movie Dummy (1="Yes", 0="No")
32 brad_pitt Brad Pitt acted in the movie Dummy (1="Yes", 0="No")
33 robert_deniro Robert De Niro acted in the movie Dummy (1="Yes", 0="No")
34 angelina_jolie Angelina Jolie acted in the movie Dummy (1="Yes", 0="No")
35 meryl_streep Meryl Streep acted in the movie Dummy (1="Yes", 0="No")
36 julia_roberts Julia Roberts acted in the movie Dummy (1="Yes", 0="No")
37 imdb_num_votes Internet Movie Database - Number of votes for the movie Numerical discrete
38 imdb_average_rating Internet Movie Database - Rating Numerical continuous
39 metascore_rating Metacritic - Metascore Numerical continuous
40 rotting_tomatoes_rating Rotten Tomatoes - Tomatometer score Numerical continuous
41 n_tot_awards Total number of awards (including nominations) received from different organizations (Academy, BAFTA, MTV, etc.) Numerical discrete
42 oscar_nomination_n Academy Awards (Oscar) - Number of nominations Numerical discrete
43 oscar_won_n Academy Awards (Oscar) - Number of statuettes won Numerical discrete
44 oscar_best picture Academy Awards (Oscar) - Statuette for Best Picture Dummy (1="Yes", 0="No")
45 oscar_directing Academy Awards (Oscar) - Statuette for Best Director Dummy (1="Yes", 0="No")
46 oscar_best_actor_actress Academy Awards (Oscar) - At least one statuette for acting (lead or supporting) Dummy (1="Yes", 0="No")

R manual
Install R, RStudio & Radiant!
• On your laptop (Windows, Mac)
• A.S.A.P.
• Needed for Lecture 3, next Tuesday!
SUGGESTIONS FOR PREPARING EXAMS

• Start from the exercises of the Textbook,


• thereafter: do additional exercises on the
e-learning section,
• finally: practice with the old exams!
Lecture 1

Ch. 1 § 1.1
Objectives of this lecture
• Statistical thinking and
decision making under uncertainty
• Basic concepts:
Variables & Data
Population & Sample
Parameter & Statistic
Descriptive statistics & Inferential statistics
Why statistics?

https://www.youtube.com/watch?v=wV0Ks7aS7YI&feature=youtu.be
Decision Making in an Uncertain
Environment
Everyday decisions are based on incomplete information

Examples:
Will the job market be strong when I graduate?
Will the price of Yahoo stock be higher in six months than
it is now?
Will interest rates remain low for the rest of the year if
the federal budget deficit is as high as predicted?

Data are used to assist decision making


Data

«In God we trust. All others must bring data.»
Dr. William Edwards Deming.
Statistics

Prof. Alan Agresti


Statistical thinking in decision making

[Flow diagram: Facing the problem → Definition of the problem (Theory, Experience, Literature) → Design of research (sources of data) → Data collection → Data analysis (statistical procedures) → Information (presentation) → Knowledge → Decision]
Keywords

Variable
A specific characteristic about a
set of individuals or objects
Examples:
• Private clients of a bank: gender, age, amount of deposits, level of satisfaction.
Keywords

Data
Any observations that have been
collected about a characteristic
Examples:
Gender Age Deposits Satisfaction
Male 45 23 k€ Poor
Female 24 7.6 k€ High
Female 32 84 k€ Moderate
Keywords

Population
The complete set of all items
that interest an investigator
Examples:
• Names of all registered voters in the United States;
• Incomes of all families living in Daytona Beach;
• Annual returns of all stocks traded on the New York Stock Exchange;
• Grade point averages of all the students in your university;
• For a bank: amount of deposits of all clients;
Keywords

Sample
an observed subset of the
population
Keywords

Parameter
a quantitative measure that
describes a specific characteristic
of a population
Examples:
• Proportion of private clients of the bank that are females;
• Average amount of deposits of private clients of the bank;
• Median age of private clients of the bank;
• Pct. of clients of the bank between 25 and 39 years old;
• Proportion of clients of the bank at least moderately satisfied;
Keywords

Statistic
a quantitative measure that
describes a specific characteristic
of a sample
Examples:
• Proportion of clients in the sample that are females;
• Average amount of deposits of clients in the sample;
• Median age of clients in the sample;
• Pct. of clients in the sample between 25 and 39 years old;
• Proportion of clients in the sample at least moderately satisfied;
The process of inference
• Population (N = size: # of items): a measure computed on the items of the population is called a parameter.
• Sample (n = size: # of observations): a measure computed from sample data is called a statistic.
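The distinction between a parameter and a statistic can be made concrete with a small simulation. The sketch below builds a hypothetical population of bank deposits (the values are randomly generated, not real data), computes the population mean (a parameter), then draws a sample and computes the sample mean (a statistic). The sizes N = 1000 and n = 50 and the seed are arbitrary choices for the illustration.

```python
import random
from statistics import mean

rng = random.Random(30001)  # fixed seed so the sketch is reproducible

# Hypothetical population: deposits (k€) of all N private clients of a bank.
population = [round(rng.uniform(1, 100), 1) for _ in range(1000)]
N = len(population)
parameter = mean(population)  # computed on ALL items of the population

# Sample: an observed subset of the population.
n = 50
sample = rng.sample(population, n)
statistic = mean(sample)  # computed from sample data only

print(f"N = {N}, population mean (parameter) = {parameter:.2f}")
print(f"n = {n}, sample mean (statistic)     = {statistic:.2f}")
```

The parameter is a fixed number, while the statistic changes from sample to sample; inference uses the statistic to say something about the usually unknown parameter.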
The process of inference
https://www.youtube.com/watch?v=yxXsPc0bphQ
How to…

• …manage sampling: RANDOM
• …manage errors
Random sampling
Simple random sampling is a procedure in which
• each member of the population is chosen
strictly by chance,
• each member of the population is equally
likely to be chosen,
• every possible sample of n objects is equally
likely to be chosen
The resulting sample is called a random sample
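A minimal sketch of simple random sampling, using Python's standard random.sample, which draws without replacement. The function name, the four-item population and the number of repetitions are assumptions made for the illustration. The loop checks empirically that every possible sample of n = 2 objects turns up about equally often.

```python
import random
from collections import Counter


def simple_random_sample(population, n, rng=random):
    """Draw n distinct members strictly by chance (without replacement)."""
    return rng.sample(population, n)


rng = random.Random(2018)  # fixed seed so the sketch is reproducible
population = ["A", "B", "C", "D"]

# With 4 items and n = 2 there are 6 possible samples; count how often each occurs.
counts = Counter()
for _ in range(6000):
    drawn = tuple(sorted(simple_random_sample(population, 2, rng)))
    counts[drawn] += 1

for drawn, count in sorted(counts.items()):
    print(drawn, count)  # each count should be close to 1000
```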
Systematic sampling

For systematic sampling,
• Ensure that the population is arranged in a way that is not related to the subject of interest
• Select every jth item from the population…
• …where j is the ratio of the population size to the sample size, j = N/n
• Randomly select a number from 1 to j for the first item selected
• The resulting sample is called a systematic sample


Example of Systematic sampling

Suppose you wish to sample n = 9 items from a population of N = 72.

j = N/n = 72 / 9 = 8

Randomly select a number from 1 to 8 for the first item to include in the sample; suppose this is item number 3.

Then select every 8th item thereafter:

items 3, 11, 19, 27, 35, 43, 51, 59, 67 (out of items 1, 2, …, 72)
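The example can be reproduced with a short Python sketch. The function name and the optional start argument are hypothetical, and the sketch assumes, as in the example, that N is a multiple of n.

```python
import random


def systematic_sample(population, n, start=None):
    """Select every j-th item, with j = N/n, from a random start between 1 and j."""
    N = len(population)
    j = N // n  # assumption: N is a multiple of n, as in the example (72 / 9 = 8)
    if start is None:
        start = random.randint(1, j)  # random first item, from 1 to j inclusive
    # Items are numbered from 1, so item i sits at index i - 1.
    return [population[i - 1] for i in range(start, N + 1, j)][:n]


items = list(range(1, 73))  # population of N = 72 numbered items
print(systematic_sample(items, 9, start=3))
# [3, 11, 19, 27, 35, 43, 51, 59, 67]
```

Calling systematic_sample(items, 9) without start picks the first item at random, as the procedure requires.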

The two branches of statistics (to be resumed next lecture)

What is statistics?

DESCRIPTIVE STATISTICS
A set of graphical and numerical procedures to:
• collect,
• process (analyze, summarize, present),
• interpret data

STATISTICAL INFERENCE
… developed to make predictions and estimates for drawing conclusions or making decisions about a population … based on sample results


Recap
• Statistical thinking and
decision making under uncertainty
• Basic concepts:
Variables & Data
Population & Sample
Parameter & Statistic
Descriptive statistics & Inferential statistics
Upcoming

How to process data for describing variables by tables and graphs?
DISCLAIMER
This material was prepared using some of the slides provided by Pearson Education as an appendix to the Textbook.
It must therefore be used only for teaching purposes and cannot be published, rented or sold.
The use of the material and any violation of Copyright © Pearson Education are your responsibility.
