
7/3/2013

MATH SEMINAR 2A (Statistics)

Prof. Aryee

Math Seminar 2 (First Year Statistics Seminar)

Why Statistics
Any field of study that collects data, summarizes and describes the information collected, and interprets and draws valid conclusions from that information is a candidate for statistical application.


Why Statistics
Statistics provides us with the tools to analyze data. Whether we want to detect differences between groups of people, events, or activities, reorganize data to identify hidden patterns, or create models for predicting the outcomes of future events, statistics provides a variety of tools to achieve our goals.

Why Statistics
The following are reasons for taking statistics. Statistics gives us a clearer understanding of the world around us. It provides the methods and techniques for developing knowledge and for learning from information, thus forming the basis for thinking and planning ahead.

Why Statistics
Statistics allows us to formulate questions that can be addressed by using data, and it provides the methods needed to adequately describe, summarize, analyze, and interpret a set of data and to draw valid conclusions from it to answer those questions.

Why Statistics
Proper usage of statistics helps us to critically interpret and evaluate claims as well as to make informed decisions in the face of uncertainty. The tools of statistics are widely employed in many fields of study, including business, communication, science, law, and so on.


Statistics, What Is It?


Everything dealing with the collection, analysis, interpretation, and presentation of numerical data, and with drawing valid conclusions from it, belongs to the domain of statistics.

Statistics, What Is It?


Statistics is a discipline which deals with:

a) Designing the data collection process and experiments,
b) Preparing the data collected for analysis and to aid understanding,
c) Analyzing and drawing conclusions from data, and
d) Making estimates and predictions from data.

Statistics, What Is It?


In general, statistics is a collection of methods for gathering, organizing, summarizing, describing, analyzing, interpreting, and presenting numerical data, and for drawing valid conclusions from it.
Often, the data collected have an inherent degree of variability. Statistical techniques help us deal with this variability and uncertainty in the data.
Statistics, What Is It?


The American Statistical Association describes statistics as the science of learning from data, and of measuring, controlling, and communicating uncertainty. In addition, it indicates that statistics provides the navigation essential for controlling the course of scientific and societal advances.

Statistics, Two (2) branches


Statistics is grouped under two broad categories:
1. Descriptive statistics, and
2. Inferential statistics.


What is Descriptive Statistics


The purpose of descriptive statistics is to make the collected data easier to understand. In descriptive statistics, analysis of data is directed entirely towards describing, summarizing, and interpreting the basic features or characteristics of the data actually collected.

What is Descriptive Statistics



The aim is to describe what is going on within the data or what the data collected actually shows. There is no intention to make conclusions that extend beyond the data actually collected.

What is Descriptive Statistics


Descriptive statistics techniques provide the platform for reducing data. These techniques include:

a) Numerical counts or frequencies
b) Construction of tables and graphs
c) Computation of various descriptive measures such as averages, percentages, and percentiles
d) Computation of variability measures such as range, variance, and standard deviation
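The measures listed above can be computed with Python's standard library. This is only a minimal sketch: the list of exam scores is hypothetical and invented purely for illustration, and the variable names are assumptions, not part of the seminar material.

```python
import statistics
from collections import Counter

# Hypothetical exam scores, invented purely for illustration.
scores = [55, 60, 60, 70, 75, 80, 80, 80, 90, 100]

frequencies = Counter(scores)                   # numerical counts per value
mean_score = statistics.mean(scores)            # average
quartiles = statistics.quantiles(scores, n=4)   # three quartile cut points
pct_at_least_80 = 100 * sum(s >= 80 for s in scores) / len(scores)  # percentage

score_range = max(scores) - min(scores)         # range
variance = statistics.variance(scores)          # sample variance (n - 1 divisor)
std_dev = statistics.stdev(scores)              # sample standard deviation

print("Frequencies:", dict(frequencies))
print("Mean:", mean_score, "Quartiles:", quartiles)
print("Percent scoring 80 or more:", pct_at_least_80)
print("Range:", score_range, "Variance:", variance, "Std. dev.:", round(std_dev, 2))
```

Every number produced here describes only the ten scores in the list; nothing is claimed about any larger group.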

Descriptive Statistics, Examples


How much have Americans borrowed for college? As of the first quarter of 2012, the average student loan balance for all age groups was $24,301. About one-quarter of borrowers owed more than $28,000; 10% of borrowers owed more than $54,000; 3% owed more than $100,000; and less than 1%, or 167,000 people, owed more than $200,000.
Source: http://libertystreeteconomics.newyorkfed.org/2012/03/grading-student-loans.html


Descriptive Statistics, Examples


What is left? Not enough good men for all the women available. The Daily News, USA Weekend Section, printed these statistics: 9 out of 10 on-the-job fatalities are men, 5 out of 7 victims of traffic accidents are men, 4 out of 5 homicide victims are men, at least 4 out of 5 suicides are men, 9 out of 10 HIV-related deaths are men.
Source: http://www.mendontlisten.com/StartingAgainPtrFr.html


Inferential Statistics
In inferential statistics, analysis of data is directed towards generalizing, summarizing, predicting, and drawing valid conclusions about the larger set of data from which the given sample was collected and of which the sample forms just a part.


Inferential Statistics
When we use statistical methods to draw conclusions and to make estimates, predictions, and generalizations about an entire set of data by studying only part of the data, we are dealing with inferential statistics. Inferential statistics allows us to use information from a smaller group to make inferences about the larger group from which the smaller group was taken.

Inferential Statistics
How many children die each year from child abuse? Based on data reported by CPS agencies in 2001, it is estimated that nationwide, 2,000 children died as a result of abuse or neglect. Based on this number, five to six children die each day as a result of child abuse or neglect.
Source: http://www.preventchildabuse.com/abuse.htm


Population and Sample


There are two very important underlying concepts in inferential statistics: population and sample. The paragraphs below explain these concepts. A population is the complete collection of all units, such as people, objects, events, transactions, animals, plants, or other things, whose characteristics a researcher is interested in learning about.

Population and Sample


Identify the population of interest to the researcher.
In order to determine whether or not the cost of college education is spiraling out of control, most analysts focus on tuition as the yardstick for measuring college education cost. Other costs accompanying college education, such as textbook cost, are rarely considered. A researcher wishes to estimate the textbook cost of first-year students at Seton Hall University. To do so, she randomly selected 300 first-year students and found that their average textbook cost was $350 per semester.

Answer: The population of interest is all first-year Seton Hall University students.

A sample
When a population is inaccessible or not available (due to time or money constraints), or when it is impractical or impossible to obtain a complete set of data, we draw samples. A sample is a collection of some (but not all) of the elements of the population. Thus, a sample is a subset of the population. It is usually selected to represent the population from which it was drawn.

A sample
It is important to note that different samples may give us different portions of the same population. As a result, if we already know the result of one sample and then draw a second sample from the same population, we should not expect the second to be an exact replica of the first. The difference between two or more samples drawn from the same population is called sampling variation or sampling error. Sampling variation decreases as we increase the size of our sample.
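A small simulation can illustrate how sampling variation shrinks as the sample size grows. This is a sketch under stated assumptions: the population of 10,000 textbook costs is hypothetical, generated by the code itself, and the helper name `spread_of_sample_means` is introduced here only for illustration.

```python
import random
import statistics

rng = random.Random(2013)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 textbook costs in dollars (invented values).
population = [rng.uniform(100, 600) for _ in range(10_000)]

def spread_of_sample_means(sample_size, repetitions=200):
    """Draw many random samples and return the std. dev. of their means."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(1000)
print(f"n=10:   spread of sample means = {spread_small:.2f}")
print(f"n=1000: spread of sample means = {spread_large:.2f}")
```

The sample means from samples of 1,000 cluster much more tightly than those from samples of 10, which is the decrease in sampling variation described above.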

A Random Sample
The sample taken must be based on a selection technique called random sampling. To use this technique, each member of the population must have an equal chance of being selected. A sample resulting from a random sampling technique is called a random sample. A random sample is one in which every different subset of a specified size from the population has equal probability of being selected. We can use a table of random numbers to select a random sample.
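Where the slide suggests a table of random numbers, a pseudorandom number generator can play the same role. The sketch below swaps in Python's `random.sample` for the table; the sampling frame of 1,200 student ID numbers is a hypothetical assumption.

```python
import random

rng = random.Random(42)  # seed fixed only so the sketch is repeatable

# Hypothetical sampling frame: ID numbers for 1,200 first-year students.
population_ids = list(range(1, 1201))

# random.sample draws without replacement, so every subset of size 300
# from the frame has the same probability of being selected.
sample_ids = rng.sample(population_ids, k=300)

print("Sample size:", len(sample_ids))
print("First five selected IDs:", sample_ids[:5])
```

Because the draw is without replacement, no student can appear in the sample twice.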

A unit of analysis or an element


A unit of analysis (also called an element) is a single entity of the population from which information will be collected for analysis. In a study, a statement about the population under investigation must also state the object of interest, that is, who or what (the entity: person, city, county, state, school, organization) is being investigated.

A unit of analysis or an element


In most studies, the units of analysis are the smallest units that are independent of each other. In identifying the units of analysis, you must answer the question: what things are being compared or examined by the researcher? The things being compared or examined can be at the level of an individual (say, students in a college), or groups of individuals representing different categories (such as undergraduate and graduate students).

A unit of analysis or an element


A researcher may be interested in assessing the patterns generated by the variation among those units. In social science, the most typically chosen unit of analysis is the individual person. For instance, in a Gallup Poll, the unit of analysis is the individual voter. The population being studied or examined must be clearly defined so that there is no ambiguity as to whether or not an element is a member of the population.

A unit of analysis or an element


Mr. Jermic Smith, a pollster, is interested in finding out what percentage of registered voters in the country will vote for a particular presidential candidate, Mr. John David. Jermic randomly selected 3,000 registered voters and asked them on the phone who they will vote for. Of the people polled, 1,680 (56%) informed him that they will vote for Mr. David. What is the unit of analysis?
Answer: The unit of analysis is each individual registered voter in the population of interest.

What is a parameter
A parameter is a numerical descriptive measure of a population. It is usually a single value computed by using all the values in the entire population.

A study in which all members of the population are included is called a census.


Example of a parameter
In an English class of 40 students, 24 had participated in the English as a Second Language Program, which provides coursework for comprehensive language development for students from non-English-speaking countries. The statement "60% of the students in this English class had participated in the English as a Second Language Program" is a descriptive statement. The population is the 40 students in this English class. The 60% represents a parameter of interest.

What is a statistic
A statistic is a numerical descriptive measure of a sample. It is usually a single numerical value computed by using only the sample data, and not the entire population.
Most statistical investigations lead to searching for the values of population parameters that are of interest to the investigator. If the population is not readily available, or it is impractical or impossible to obtain a complete set of data, we draw a sample and then compute the necessary descriptive statistic. We then make a statistical inference about the population parameter using the computed sample statistic.
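The difference between a parameter and a statistic can be sketched with the English class from the earlier parameter example (40 students, 24 of whom participated in the ESL program). The sample size of 10 and the variable names are assumptions made only for this illustration.

```python
import random

rng = random.Random(7)  # seed fixed only so the sketch is repeatable

# The class of 40 students: 1 = participated in the ESL program, 0 = did not.
population = [1] * 24 + [0] * 16

# Parameter: computed from every member of the population (a census).
parameter = sum(population) / len(population)

# Statistic: computed from a random sample only (hypothetical sample of 10).
sample = rng.sample(population, k=10)
statistic = sum(sample) / len(sample)

print(f"Population proportion (parameter): {parameter:.2f}")
print(f"Sample proportion (statistic):     {statistic:.2f}")
```

The parameter is fixed at 0.60, while the statistic depends on which 10 students happen to be drawn and serves as an estimate of the parameter.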


What is a variable
A variable is usually a common characteristic that an investigation focuses on after all the units of analysis in the population or sample underlying the study have been identified. A variable can also be thought of as the characteristics of the units of analysis under investigation that vary from one unit to another, taking on different values, categories, or attributes. A variable tells us what particular characteristic is being studied or is of interest to the researcher. Researchers focus on the empirical measurement of this characteristic.
What is the variable of interest?


The reputations of many businesses can be severely damaged by shipments of manufactured items that contain a large percentage of defectives. A manufacturer of alkaline batteries wants to be reasonably certain whether fewer than 5% of its batteries are defective. To do so, 300 batteries are randomly selected from a very large shipment, each is tested, and 10 defective batteries are found.
Answer: The variable of interest is the number of defective batteries.
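The sample proportion of defectives in this example can be worked out directly. This sketch only describes the sample of 300; deciding whether the whole shipment is below 5% would need a formal inference procedure, which is not shown here. The variable names are assumptions.

```python
sample_size = 300      # batteries tested (from the example)
defective_count = 10   # defective batteries found (from the example)
threshold = 0.05       # the 5% defect rate the manufacturer cares about

# Sample statistic: proportion of defective batteries in the sample.
sample_proportion = defective_count / sample_size

print(f"Sample proportion defective: {sample_proportion:.3f}")
print("Below 5% in this sample:", sample_proportion < threshold)
```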

Values of a variable
It is possible to confuse a variable's name with the different categories or attributes of which the variable consists, called the variable's values. For example, gender is a variable consisting of two different categories, namely male and female. In this example, male and female are the values we use to distinguish different people; the name of the variable, however, is gender.


Values of a variable
A variable may consist of two or more values. Suppose a question on a survey asks each person to choose the response that best reflects their marital status: Are you Married, Widowed, Divorced, Separated, or Never Married? In this case, the name of the variable is marital status. The five different categories (Married, Widowed, Divorced, Separated, and Never Married) are the values of the variable.


Values of a variable
Some variables, such as height, weight, and age, may take on very many values. Others, such as gender, may take on just a few values. Irrespective of how many values a variable may take on, you can usually determine the name of the variable by asking the question "What is this individual's ______?" For example, "What is this individual's weight?" So the name of the variable is weight. Suppose the answer is 120 pounds. Then the value of this variable is 120 pounds. In this case, 120 pounds is just one of the many values of the variable named weight.
Values of a variable
Domestic Violence: Battered women who live in poverty are often forced to choose between abusive relationships and homelessness. In a study of 777 homeless parents (the majority of whom were mothers) in ten U.S. cities, 22% said they had left their last place of residence because of domestic violence (Homes for the Homeless, 1998).

What is the population of interest in this study? Answer: The population of interest is all homeless parents in the ten U.S. cities.
What is the variable of interest being studied? Answer: The variable of interest is each homeless parent's response as to whether or not domestic violence was the reason for leaving their last place of residence.

What is the size of the sample used? Answer: The sample size is 777 homeless parents (the majority of whom were mothers) in the ten U.S. cities.

Social Research Design


Researchers in the social sciences study people, and they are interested in understanding the basic features and characteristics of people and of the groups in which they live, and how and why these characteristics are related. Once they have an explanation, they offer hypotheses about social relationships and collect facts that can shed more light on social behaviors.

Social Research Design


The formulation of hypotheses usually begins with an observation of a characteristic that differs or varies across individuals or groups. Usually, the researcher wants to understand the differences or variations among the units of analysis. For example, a researcher would like to know why some students prefer to take professor A and others prefer to take professor B, why some students prefer to sit at the back of the classroom and others prefer to sit in the front row, or why some people voted for John Kerry and others voted for President George Bush in the 2004 US presidential election.

What is a dependent Variable?


These "why" questions usually rush to mind when we observe differences between people. There are two important lessons we can learn about these "why" questions. First, each is always based on some characteristic that varies. That is, it is always based on a variable. The particular variable that constitutes our "why" question is called the dependent variable. It is the variable we want to understand, and it is usually viewed in a particular way: as the effect of some unknown cause.
What is a dependent Variable?


If we are interested in explaining why some students prefer to sit at the back rows and others prefer to sit in the front rows of the classroom, then seating preference is the name of our dependent variable. We are interested in two values of this variable: back rows and the front rows of the classroom.


What is a dependent Variable?


Explanation in social science begins by observing a characteristic that varies between subjects. The second lesson is that each "why" question (the dependent variable) implicitly requests a causal explanation for the observed differences. In other words, each "why" question is looking for what causes the differences between the respondents on this variable. For example, what causes one student to take professor A and another to take professor B, or what causes one student to sit in the front row and another to sit in the back row of the classroom?
What is an independent Variable?


Researchers use their creativity to come up with explanations for these "why" questions. These explanations involve identifying factors and reasons why something happens in a particular way. When researchers propose explanations to the "why" questions, the explanations usually involve a characteristic that varies between subjects. The variable that is proposed as an explanation to the "why" question is called the independent variable.


What is an independent Variable?


In general, the variable that the researcher selects as the causal factor in an explanation is called the independent variable. The independent variable is the variable that influences the behavior of other variables. The investigator can alter, manipulate, or control the independent variable. In a study, the independent variable is the variable the researcher identifies as being responsible for influencing or producing an effect or impact on other variables.
What is an independent Variable?


For example, a researcher may be interested in explaining why some communities have higher crime rates than others. When researchers propose an explanation, it must be stated in such a way that involves causation. Explanation for differences in crime rates, for example, may propose that the type of community, whether urban or rural, plays a causal role. We might make a statement that prevalence of crime is higher in the urban communities than in the rural communities.


What is an independent Variable?


As more and more rural communities are turned into urban communities due to increased population, different types of people, including criminals, are brought into the community, leading to an increase in the crime rate. Thus, this explanation proposes that the type of community is associated with crime rates. So the type of community, whether urban or rural, is the independent variable.


Identifying independent and dependent variables in the following hypotheses

Question 1: In comparing individuals, the mean number of hours spent watching TV will be higher among newspaper readers than nonreaders.
Answer: Independent variable: whether or not an individual reads a newspaper. Dependent variable: number of hours spent watching TV.

Question 2: In comparing candidates campaigning for elections, those who spend more money on their campaigns are more likely to win than those candidates who spend less money on their campaigns.
Answer: Independent variable: amount of money spent on the campaign. Dependent variable: whether or not the candidate won the election.

Question 3: In comparing students, those who arrive late to class are more likely to receive poor grades than those who arrive on time.
Answer: Independent variable: whether or not the student arrived late. Dependent variable: whether or not the student received a poor grade.


What is a Hypothesis?
When researchers propose an explanation to a "why" question, the explanation must be described in such a way that it can be tested with empirical data. A hypothesis, therefore, is a testable statement about the empirical relationship between the independent variable and the dependent variable (or between cause and effect).

What is a Hypothesis?
For example, we might formulate the hypothesis that students from richer communities have higher SAT scores than those from poorer communities.


What is a Hypothesis?
There are scientific procedures that must be followed to determine whether or not a hypothesis is incorrect. To do so, researchers describe a set of conditions under which the hypothesis would be rejected. To test hypotheses, we use empirical comparison. For example, using empirical data, we can compare the incomes of people having less education to the incomes of people having more education. We will learn more about the set of procedures for determining whether or not a hypothesis is incorrect.
Writing a Hypothesis
After we have determined the two variables whose relationship we are trying to examine, we can start our hypothesis by linking one category of the independent variable with one category of the dependent variable and making a statement about their relationship in terms of a "more likely" or "less likely" type of relationship. We can use the following format:

In comparing [put the name of the units of analysis here], those who are/those having [put the name of one category of the independent variable here] are more likely to [put the name of one category of the dependent variable being considered here] than those who are/those having [put the name of a different category of the independent variable, the one with the lowest percentage, here].

For example, for attitude towards gun permits, we can make a statement such as: In comparing individuals, those who are women will be more likely to favor handgun permits than those who are men.
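The fill-in-the-blank format above can be sketched as a small string template. The function name `write_hypothesis` and its parameter names are hypothetical, introduced only for illustration, and the sketch covers just the "those who are ... will be more likely to" wording used in the handgun example.

```python
def write_hypothesis(units, iv_category, dv_category, iv_comparison):
    """Fill in the 'more likely' hypothesis template with the given parts."""
    return (
        f"In comparing {units}, those who are {iv_category} "
        f"will be more likely to {dv_category} "
        f"than those who are {iv_comparison}."
    )

# Reproduce the handgun permit example from the text.
hypothesis = write_hypothesis(
    units="individuals",              # units of analysis
    iv_category="women",              # one category of the independent variable
    dv_category="favor handgun permits",  # one category of the dependent variable
    iv_comparison="men",              # a different category of the independent variable
)
print(hypothesis)
```

Writing the hypothesis this way makes the three ingredients explicit: the units of analysis, two categories of the independent variable, and one category of the dependent variable.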


Examples
A Gallup Youth Poll was conducted at a certain university to determine the topics that students most want to discuss with their parents. The findings show that 55% would most like to discuss the family financial situation, 35% would like to talk about school, and 10% would like to talk about religion. The survey was based on a sample of 500 students.
Question 1: What is the population of interest in this study?
Answer: The population of interest is all students attending that particular university.
Question 2: What is the variable of interest being studied?
Answer: The variable of interest is the topic that students most want to discuss with their parents.
Question 3: What are the values of this variable of interest?
Answer: The values of this variable are: family financial situation, school, and religion.
Question 4: What is the size of the sample used?
Answer: The sample size is 500 students.

Question 6: What descriptive statistics were used in this study?
Answer: Of the 500 students, 55% would most like to discuss the family financial situation, 35% would like to talk about school, and 10% would like to talk about religion.

Question 7: What statistical inference could be made from this study?
Answer: About 55% of all students in that university would most like to discuss the family financial situation. That is, the majority of all students would most like to discuss the family financial situation. Very few students would like to talk about religion.

Question 9: State a possible hypothesis.

(Hint: use your creativity to come up with explanations for why the majority of all students would most like to discuss the family financial situation but very few students would like to talk about religion.)

Answer: In comparing students, those who are more concerned about their own financial aid eligibility would most like to discuss the family financial situation, and those who are less concerned about their own financial aid eligibility would like to talk about religion.

Question 11: For your hypothesis, what are the independent and dependent variables?
Answer: Independent variable: degree of concern about the student's own financial aid eligibility. Dependent variable: the topic that students most want to discuss with their parents.
Question 12: For your hypothesis, list possible control variables.
Answer: Family income.

Control Variables
In conducting research, the independent variable may not be the only variable that has an effect on the dependent variable. There may be other variables that could influence the relationship among the variables. Variables that are not among the variables under investigation, but that could potentially influence or affect the relationship among them if not controlled, are called control variables. Control variables must be held constant, or must be prevented from varying; otherwise, they can confuse the interpretation of the research results. Control variables are important because they limit the focus of the research to specific subgroups.


Control Variables
Suppose in a study, we are interested in understanding why some people perform much better academically than others. A researcher may propose that class attendance plays a causal role. This may lead to a statement that students who attend class more are more likely to perform better academically, on average, than students who attend class less. In this case, the variable we want to explain (the dependent variable) is academic performance, and the variable that represents the causal factor in the explanation (the independent variable) is class attendance. However, factors other than class attendance might also affect academic performance. For instance, number of hours of study, age of student, class participation, social responsibilities, etc. might also have an influence on academic performance. As a result, the researcher can limit the study to a certain age group, or to a particular level of class participation.
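Holding a control variable constant can be sketched by restricting the comparison to one subgroup. Everything in this example is hypothetical: the student records, the field names, and the helper `mean_gpa` are invented for illustration and do not come from any real study.

```python
# Hypothetical student records, invented for illustration only.
students = [
    {"age_group": "18-22", "attendance": "high", "gpa": 3.6},
    {"age_group": "18-22", "attendance": "high", "gpa": 3.4},
    {"age_group": "18-22", "attendance": "low", "gpa": 2.8},
    {"age_group": "18-22", "attendance": "low", "gpa": 3.0},
    {"age_group": "23+", "attendance": "high", "gpa": 3.9},
    {"age_group": "23+", "attendance": "low", "gpa": 2.5},
]

def mean_gpa(records, attendance):
    """Average GPA among the records with the given attendance level."""
    values = [r["gpa"] for r in records if r["attendance"] == attendance]
    return sum(values) / len(values)

# Hold the control variable (age group) constant by keeping one subgroup.
controlled = [r for r in students if r["age_group"] == "18-22"]

high = mean_gpa(controlled, "high")
low = mean_gpa(controlled, "low")
print(f"Mean GPA, high attendance (age 18-22): {high:.2f}")
print(f"Mean GPA, low attendance (age 18-22):  {low:.2f}")
```

Because every student in the comparison belongs to the same age group, a difference between the two averages cannot be attributed to age.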
