CORE
1.1
Classifying data
Statistics is a science concerned with understanding the world through data. The first step in this process is to put the data into a form that makes it easier to see patterns or trends.
Some data
The data contained in Table 1.1 are part of a larger set of data collected from a group of university students.
Table 1.1 Student data
Height (cm)
Weight (kg)
Age (years)
57 58 62 84 64 74 60 50
18 19 18 18 18 22 19 34
86 82 96 71 90 78 88 70
ISBN: 9781107655904 Peter Jones, Michael Evans, Kay Lipson 2012 Photocopying is restricted under law and this material must not be transferred to another party
Variables
In a data set, we call the things about which we record information variables. An important first step in analysing any set of data is to identify the variables involved, their units of measurement (where appropriate) and the values they take. In this particular data set there are six variables:
• height (in centimetres)
• sex (M = male, F = female)
• weight (in kilograms)
• plays sport (1 = regularly, 2 = sometimes, 3 = rarely)
• age (in years)
• pulse rate (beats/minute)
Warning!!
It is not the variable name itself that determines whether the data are numerical or categorical; it is the way the data for the variable are recorded. For example:
• weight recorded in kilograms is a numerical variable
• weight recorded as 1 = underweight, 2 = normal weight, 3 = overweight is a categorical variable
Exercise 1A
1 What is:
a a numerical variable? Give an example.
b a categorical variable? Give an example.
2 There are two types of numerical variables. Name them.
3 Classify each of the following variables as numerical or categorical. If the variable is numerical, further classify the variable as discrete or continuous. Recording information on:
a time spent watching TV (hours)
b the TV channel most watched by students
c salary (high, medium, low)
d salary (in dollars)
e whether a person smokes (yes, no)
f the number of cigarettes smoked per day
g length of bananas (in centimetres)
h number of cars in a supermarket car park
i daily temperature in °C
j eye colour (brown, blue, . . . )
k shoe size (6, 8, 10, . . . )
l the number of children in a family
m city of residence (NY, London, . . . )
n number of people who live in your city/area
4 Classify the data for each of the variables in Table 1.1 as numerical or categorical.
1.2
The sex of 11 preschool children is as shown (F = female, M = male): F M M F F M F F F M M Construct a frequency table to display the data.
ISBN: 9781107655904 Peter Jones, Michael Evans, Kay Lipson 2012 Photocopying is restricted under law and this material must not be transferred to another party Cambridge University Press
Solution
1 Set up a table as shown. The variable Sex has two categories: Male and Female.
2 Count up the number of females (6) and males (5). Record this in the Count column.
3 Add the counts to find the total count, 11 (6 + 5). Record this in the Count column opposite Total.
4 Convert the counts into percentages. Record this in the Per cent column. For example:
percentage of females = $\frac{6}{11} \times 100\% = 54.5\%$
5 Finally, total the percentages and record.

Frequency
Sex       Count   Per cent
Female      6       54.5
Male        5       45.5
Total      11      100.0
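The counting and percentaging steps above can be checked by computer. The following is a minimal Python sketch, not part of the original method; the function name frequency_table is hypothetical. It counts each category, converts each count to a per cent rounded to one decimal place, and totals the rounded per cents, as in Steps 2 to 5.

```python
from collections import Counter

def frequency_table(values, categories):
    """Return (category, count, per cent) rows followed by a Total row.

    Per cents are rounded to one decimal place; the Total per cent is the
    sum of the rounded per cents, as in Step 5 of the solution.
    """
    counts = Counter(values)
    total = len(values)
    rows = []
    for category in categories:
        count = counts[category]
        rows.append((category, count, round(count / total * 100, 1)))
    rows.append(("Total", total, round(sum(row[2] for row in rows), 1)))
    return rows

# Sex of the 11 preschool children (F = female, M = male)
sex = "F M M F F M F F F M M".split()
table = frequency_table(sex, ["F", "M"])
for category, count, per_cent in table:
    print(f"{category:<6}{count:>6}{per_cent:>8.1f}")
```

Listing the categories explicitly (rather than letting the program choose) keeps control of their order, which matters when the categories have a natural order such as First, Second and Third.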
There are two things to note in constructing the frequency table in Example 1.
1 In setting up this frequency table, the order in which we have listed the categories Female and Male is quite arbitrary; there is no natural order. However, if the categories had been, for example, First, Second and Third, then it would make sense to list the categories in that order.
2 The Total count should always equal the total number of observations; in this case, 11. The percentages should add to 100%. However, if percentages are rounded to one decimal place, a total of 99.9 or 100.1 is sometimes obtained. This is due to rounding error. Totalling the counts and percentages helps check your counting and your percentage calculations.
How has forming a frequency table helped?
The process of forming a frequency table for a categorical variable:
• displays the data in a compact form
• tells us something about the way the data values are distributed (the pattern of the data).
Example 2
Constructing a bar chart from a frequency table

Frequency
Climate type   Count   Per cent
Cold              3      13.0
Moderate         14      60.9
Hot               6      26.1
Total            23     100.0
Construct a bar chart for this frequency table of climate types in various countries.
Solution
1 Label the horizontal axis with the variable name, Climate type. Mark the scale off into three equal intervals and label them Cold, Moderate and Hot.
2 Label the vertical axis Frequency. Scale allowing for the maximum frequency, 14. Fifteen would be appropriate. Mark the scale off in fives.
3 For each interval, draw in a bar. There are gaps between the bars to show that the categories are separate. The height of the bar is made equal to the frequency.
[Figure: bar chart of Frequency (0–15) against Climate type (Cold, Moderate, Hot)]
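Where no plotting software is at hand, a rough bar chart can be drawn with characters. This is an illustration only (the function text_bar_chart is hypothetical): it draws the bars horizontally rather than vertically, with one # for each country, so the longest bar corresponds to the largest frequency.

```python
def text_bar_chart(counts):
    """Draw one horizontal bar per category, using one '#' per unit of frequency."""
    width = max(len(label) for label in counts)
    lines = []
    for label, count in counts.items():
        lines.append(f"{label:<{width}} | {'#' * count} {count}")
    return "\n".join(lines)

# Climate types of 23 countries (Example 2); dict order gives the category order
climate = {"Cold": 3, "Moderate": 14, "Hot": 6}
chart = text_bar_chart(climate)
print(chart)
```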
The mode
One of the features of a data set that is quickly revealed with a bar chart is the mode or modal category. This is the most frequently occurring value or category, and it is given by the category with the tallest bar. For the bar chart above, the modal category is clearly Moderate. That is, for the countries considered, the most frequently occurring climate type is Moderate. However, the mode is only of interest when a single value or category in the frequency table occurs much more often than the others. Modes are of particular importance in popularity polls; for example, in answering questions such as 'Which is the most frequently watched TV station between the hours of 6.00 and 8.00 p.m.?' or 'What are the times when a supermarket is in peak demand: morning, afternoon or night?'
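Reading the mode off a frequency table amounts to finding the category (or categories) with the largest count. The sketch below uses a hypothetical helper, modes, that returns every category tied for the largest count; a result with more than one entry is a quick sign that there is no single dominant category.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often (several if there is a tie)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

climate = ["Cold"] * 3 + ["Moderate"] * 14 + ["Hot"] * 6
print(modes(climate))               # ['Moderate']
print(modes(["A", "B", "A", "B"]))  # ['A', 'B']: no single dominant category
```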
Describing a bar chart
In describing a bar chart, we focus on two things:
• the presence of a dominant category (or group of categories) in the distribution. This is given by the mode. If there is no dominant category, then this should be stated.
• the order of occurrence of each category and its relative importance.
In commenting on these features, it is usual to support your conclusions with percentages. When quoting percentages, it is also advisable to indicate at the beginning the total number of cases involved. Using the information in Example 2 to describe the distribution of climate type, you might write as follows:
Report
The climate types of 23 countries were classified as being 'cold', 'moderate' or 'hot'. The majority of the countries, 60.9%, were found to have a moderate climate. Of the remaining countries, 26.1% were found to have a hot climate, while 13.0% were found to have a cold climate.
Exercise 1B
1 a In a frequency table, what is the mode? b Identify the mode in the following data sets: i Grades: A A C B A B B B B D C ii Shoe size: 8 9 9 10 8 8 7 9 8 10 12
2 The following data identify the state of residence of a group of people, where 1 = Victoria, 2 = SA and 3 = WA.
2 1 1 1 3 1 3 1 1 3 3
a Form a frequency table (with both counts and percentages) to show the distribution of state of residence for this group of people. Use the table in Example 1 as a model.
b Construct a bar chart using Example 2 as a model.
3 The size (S = small, M = medium, L = large) of 20 cars was recorded as follows:
S S L M M M L S S M M S L S M M M S S M
a Form a frequency table (with both counts and percentages) to show the distribution of size for these cars. Use the table in Example 1 as a model.
b Construct a bar chart using Example 2 as a model.
4 The table shows the frequency distribution of School type for a number of schools. The table is incomplete.
a Write down the information missing from the table.
b How many schools are categorised as Independent?
c How many schools are there in total?
d What percentage of schools are categorised as Government?
e Use the information in the frequency table to complete the following report.
Frequency Count Percent 4 11 5 20 25 100
Report
schools were classified according to school type. The majority of these schools, %, schools. Of the remaining schools, were while were found to be schools. 20% were
5 The table shows the frequency distribution of the place of birth for 500 Australians.
a Is Place of birth a categorical or a numerical variable?
b Display the data in the form of a percentage segmented bar chart.
c Use the information in the frequency table to write a brief report.
Place of birth   Per cent
Australia          78.3
Overseas           21.8
Total             100.1
6 The table records the number of new cars sold in Australia during the first quarter of one year, categorised by type (private vehicle or commercial vehicle).
a Copy and complete the table giving the percentages correct to the nearest whole number.
b Display the data in the form of a percentage segmented bar chart.
Frequency Count Per cent 132 736 49 109
7 The table shows the frequency distribution of eye colour of 11 preschool children.
a Use the information in the table to construct a bar chart. Place the columns in order of decreasing frequency.
b Use the information in the table to construct a percentage segmented bar chart.
c Use the information in the table to write a brief report.
Frequency
Eye colour   Count   Percentage
Brown          6       54.5
Hazel          2       18.2
Blue           3       27.3
Total         11      100.0
8 Twenty-two students were asked the question, 'How often do you play sport?', with the possible responses: Regularly, Sometimes or Rarely. The distribution of responses is summarised in the frequency table.
a Write down the information missing from the table.
b Use the information in the frequency table to complete the following report.
Report
When students were asked the question, `How often do you play sport', the dominant % of the students. Of the remaining students, response was `Sometimes', given by % of the students responded that they played sport while % said that they . played sport
Example 3
The family sizes of 11 preschool children (including the child itself) are as follows:
3 3 4 4 5 3 2 4 3 5 3
Display the data in the form of a frequency table.
Solution
1 Set up a table as shown. In the data set, the variable family size takes the values 2, 3, 4 and 5. List these values under Family size in some order, here increasing.
2 Count up the number of 2s, 3s, 4s and 5s in the data set. For example, there are five 3s. Record these values in the Count column.
Frequency
Family size   Count   Per cent
2               1        9.1
3               5       45.5
4               3       27.3
5               2       18.2
Total          11      100.1
3 Add the counts to find the total count, 11. Record this value in the Count column opposite Total.
4 Convert the counts into percentages. Record them in the Per cent column. For example:
percentage of 3s = $\frac{5}{11} \times 100\% = 45.5\%$
5 Finally, total the percentages and record.
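The rounding error described after Example 1 shows up in this table: each per cent is correctly rounded to one decimal place, yet the rounded values total 100.1 rather than 100. A short Python check, offered as a sketch (the variable names are illustrative only):

```python
from collections import Counter

family_size = [3, 3, 4, 4, 5, 3, 2, 4, 3, 5, 3]
counts = Counter(family_size)
n = len(family_size)

per_cents = {}
for size in sorted(counts):
    per_cents[size] = round(counts[size] / n * 100, 1)
    print(size, counts[size], per_cents[size])

# Totalling the rounded per cents (Step 5) gives 100.1, not 100.0
total_per_cent = round(sum(per_cents.values()), 1)
print("Total", n, total_per_cent)
```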
Grouping data
Some variables can only take on a limited range of values; for example, the number of children in a family. Here, it makes sense to list each of these values individually when forming a frequency distribution. In other cases, the variable can take a large range of values; for example, age (0–100). Listing all possible ages would be tedious and would produce a large and unwieldy display. To solve this problem, we group the data into a small number of convenient intervals. Usually, the smaller the number of data values, the smaller the number of intervals.
Note that the intervals are defined so that it is quite clear into which interval each data value falls. For example, you cannot define intervals as 1–5, 5–10, 10–15, 15–20, . . . , as you would not know into which interval to put the values 5, 10, 15, etc.
Guideline for choosing the number of intervals
There are no hard and fast rules for the number of intervals to use but, usually, between five and fifteen intervals are used.
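One way to guarantee that every value falls into exactly one interval is to let each interval include its lower boundary but not its upper boundary (the '65 to <70' convention used later in this chapter). A minimal sketch, using a hypothetical function interval_for and intervals of width 5 starting at 0:

```python
import math

def interval_for(value, start, width):
    """Return (lower, upper) for the interval lower <= value < upper.

    Because each interval contains its lower boundary but not its upper one,
    a boundary value such as 10 belongs to exactly one interval.
    """
    k = math.floor((value - start) / width)
    lower = start + k * width
    return lower, lower + width

for v in [4, 5, 10, 15]:
    print(v, interval_for(v, start=0, width=5))
```

With these intervals, 5 goes into 5 to <10 and 10 goes into 10 to <15, so there is no ambiguity.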
Example 4
Grouping data
The ages of a sample of 200 people aged from 16 to 72 years are to be recorded. Group the ages into six equal-sized categories that will cover all of these ages.
Solution
1 Write down the required number of intervals.
2 Determine interval width. Ages range from 16 to 72, which covers 57 years. Six intervals will give intervals of width $\frac{57}{6} = 9.5$. Set the interval width to 10, the nearest whole number above 9.5.
3 Choose a starting point that ensures that the intervals cover the full range of values. 15 would be a suitable starting point.
4 Write down the intervals.
Starting point: 15
Intervals: 15–24, 25–34, 35–44, 45–54, 55–64, 65–74
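The arithmetic of Example 4 can be sketched in Python. This is an illustration under stated assumptions: the hypothetical function equal_intervals treats ages as whole numbers (so 16 to 72 inclusive covers 57 years, hence the + 1), rounds the width up with math.ceil, and takes the starting point as chosen by hand, as in Step 3.

```python
import math

def equal_intervals(smallest, largest, n_intervals, start):
    """Equal-width intervals for whole-number data, as in Example 4."""
    covered = largest - smallest + 1          # 16 to 72 inclusive covers 57 years
    width = math.ceil(covered / n_intervals)  # 57 / 6 = 9.5, rounded up to 10
    return [(start + i * width, start + (i + 1) * width - 1)
            for i in range(n_intervals)]

print(equal_intervals(16, 72, 6, start=15))
```

It is worth checking that the last interval reaches the largest value: here 65–74 covers 72.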
Once we know how to group data, we can form a frequency distribution for grouped data.
Example 5
A grouped frequency distribution for a continuous numerical variable
The data below give the average hours worked per week in 23 countries.
35.0, 48.0, 45.0, 43.0, 38.2, 50.0, 39.8, 40.7, 40.0, 50.0, 35.4, 38.8, 40.2, 45.0, 45.0, 40.0, 43.0, 48.8, 43.3, 53.1, 35.6, 44.1, 34.8
Form a grouped frequency table with five intervals.
Solution
1 Set up a table as shown. For five intervals and data values ranging between 34.8 and 53.1, use the intervals 30.0–34.9, 35.0–39.9, . . . , 50.0–54.9.
2 List these intervals, in ascending order, under Average hours worked.
3 Count the number of countries whose average working hours fall into each of the intervals. For example, six countries have average working hours between 35.0 and 39.9. Record these values in the Count column.
4 Add the counts to find the total count, 23. Record this value in the Count column opposite Total.
Frequency
Average hours worked   Count   Per cent
30.0–34.9                 1       4.3
35.0–39.9                 6      26.1
40.0–44.9                 8      34.8
45.0–49.9                 5      21.7
50.0–54.9                 3      13.0
Total                    23      99.9
5 Convert the counts into percentages. Record these in the Per cent column. For example, for 35.0–39.9 hours:
percentage = $\frac{6}{23} \times 100\% = 26.1\%$
6 Finally, total the percentages and record.
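Steps 1 to 6 can be reproduced in a few lines of Python as a check on hand counting. This is a minimal sketch, assuming the intervals 30.0–34.9, 35.0–39.9, . . . used in the solution. Each value is first converted to whole tenths of an hour so that a boundary value such as 35.0 is never misplaced by floating-point error.

```python
# Average hours worked per week in 23 countries (Example 5)
hours = [35.0, 48.0, 45.0, 43.0, 38.2, 50.0, 39.8, 40.7, 40.0, 50.0, 35.4,
         38.8, 40.2, 45.0, 45.0, 40.0, 43.0, 48.8, 43.3, 53.1, 35.6, 44.1, 34.8]

start, width, n_intervals = 30.0, 5.0, 5

# Work in whole tenths so that 35.0 falls in 35.0-39.9, not 30.0-34.9
start_t, width_t = round(start * 10), round(width * 10)
counts = [0] * n_intervals
for h in hours:
    index = (round(h * 10) - start_t) // width_t
    counts[index] += 1

n = len(hours)
per_cents = [round(c / n * 100, 1) for c in counts]
for i in range(n_intervals):
    lower = start + i * width
    print(f"{lower:.1f}-{lower + width - 0.1:.1f}  {counts[i]:>2}  {per_cents[i]:>5.1f}")

# The rounded per cents total 99.9, another instance of rounding error
total_per_cent = round(sum(per_cents), 1)
print(f"Total      {n:>2}  {total_per_cent:>5.1f}")
```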
There are two things to note in the frequency table in Example 5.
1 The intervals in this example are of width five. For example, the interval 35.0–39.9 is an interval of width 5.0 because it contains all values from 34.9500 . . . to 39.9499 . . . .
2 The modal interval is 40.0–44.9 hours; eight (34.8%) of the countries have working hours that fall into this interval.
How has forming a frequency table helped?
The process of forming a frequency table for a numerical variable:
• orders the data
• displays the data in a compact form
• tells us something about the way the data values are distributed (the pattern of the data)
• helps us identify the mode (the most frequently occurring value or interval of values).
The histogram
The frequency histogram, or histogram for short, is a graphical way of presenting the information in a frequency table for numerical data. Later in the chapter, you will learn about two other graphical displays for numerical data, the stem plot and the dot plot.
Constructing a histogram from a frequency table
In a frequency histogram:
• frequency (count or per cent) is shown on the vertical axis
• the values of the variable being displayed are plotted on the horizontal axis
• for continuous data, each bar in a histogram corresponds to a data interval. For discrete data, where there are gaps between values, the intervals start and end halfway between values. Empty classes or missing discrete values have bars of zero height
• the height of the bar gives the frequency (usually the count, but it can equally well be the percentage).
Example 6
Constructing a histogram from a frequency table: continuous numerical variable
Average hours worked   Frequency (count)
30.0–34.9               1
35.0–39.9               6
40.0–44.9               8
45.0–49.9               5
50.0–54.9               3
Total                  23
Solution
1 Label the horizontal axis with the variable name, Average hours worked. Mark in the scale using the beginning of each interval as the scale points: that is, 30, 35, . . .
2 Label the vertical axis Frequency. Scale allowing for the maximum frequency, 8. Ten would be appropriate. Mark in the scale in units.
3 Finally, for each interval, 30.0–34.9, 35.0–39.9, . . . , draw in a bar with the base starting at the beginning of each interval and finishing at the beginning of the next. The height of the bar is made equal to the frequency.
[Figure: histogram of Frequency (0–9) against Average hours worked (25–60)]
Example 7
Constructing a histogram from a frequency table: discrete numerical variable
Family size   Frequency (count)
2              1
3              5
4              3
5              2
Total         11
[Figure: histogram of Frequency (0–5) against Family size (1–6)]
Solution 1 Label the horizontal axis with the variable name, Family size. Mark the scale in units, so that it includes all possible values. 2 Label the vertical axis Frequency. Scale to allow for the maximum frequency, 5. Five would be appropriate. Mark the scale in units. 3 Draw in a bar for each data value. The width of each bar is 1, starting and ending halfway between data values. For example, the base of the bar representing a family size of 2 starts at 1.5 and ends at 2.5. The height of the bar is made equal to the frequency.
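The rules for placing the bars in Examples 6 and 7 can be written down directly. This sketch uses hypothetical function names and assumes the discrete values are whole numbers one unit apart, as family sizes are; it returns the left and right edge of each bar.

```python
def discrete_bar_edges(values):
    """Example 7: each bar starts and ends halfway between whole-number values."""
    return {v: (v - 0.5, v + 0.5) for v in values}

def grouped_bar_edges(interval_starts, width):
    """Example 6: each bar runs from the start of its interval to the start of the next."""
    return [(s, s + width) for s in interval_starts]

print(discrete_bar_edges([2, 3, 4, 5]))
print(grouped_bar_edges([30, 35, 40, 45, 50], width=5))
```

For a family size of 2 the bar runs from 1.5 to 2.5, matching Step 3 of Example 7.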
Constructing a histogram from raw data
It is relatively quick to construct a histogram from a pre-prepared frequency table. However, if you only have raw data (as you mostly do), it is a very slow process because you have to construct the frequency table first. Fortunately, a graphics calculator will do this for us.
How to construct a histogram using the TI-Nspire CAS
Display the following set of marks in the form of a histogram.
16 15 11 12 4 18 25 22 15 17 7 18 14 23 13 15 14 13 12 17 15 18 13 22 16 23 14
Steps
1 Start a new document: Press c and select New Document (or use / + N). If prompted to save an existing document, move the cursor to No and press enter.
2 Select Add Lists & Spreadsheet. Enter the data into a list named marks.
a Move the cursor to the name space of column A (or any other column) and type in marks as the list name. Press enter.
b Move the cursor down to row 1, type in the first data value and press enter after each entry. Continue until all the data has been entered.
3 Statistical graphing is done through the Data & Statistics application. Press / + and select Add Data & Statistics (or press c, arrow to the Data & Statistics icon and press enter).
Note: A random display of dots will appear; this is to indicate that data are available for plotting. It is not a statistical plot.
a Press e to show the list of variables. The variable marks is shown as selected. Press enter to paste the variable marks to that axis.
b A dot plot is then displayed as the default plot. To change the plot to a histogram press b >Plot Type>Histogram
Note for CX only: To add colour (or change colour), move the cursor over the plot and press / + b >Color>Fill Color.
Your screen should now look like that shown opposite. This histogram has a column (or bin) width of 2 and a starting point of 3.
4 Data analysis a Move cursor onto any column, will show and the column data will be displayed as shown opposite. b To view other column data values move the cursor to another column.
Note: If you click on a column it will be selected. To deselect any previously selected columns, move the cursor to the open area and press . Hint: If you accidentally move a column or data point, press / + to undo the move.
5 Change the histogram column (bin) width to 4 and the starting point to 2. a Press / + b to get the contextual menu as shown (below left).
Hint: Pressing / + b with the cursor on the histogram gives you access to a contextual menu that enables you to do things that relate only to histograms.
b Select Bin Settings.
c In the settings menu (below right), change the Width to 4 and the Starting Point (Alignment) to 2 as shown. Press enter.
d A new histogram is displayed with a column width of 4 and a starting point of 2, but it no longer fits the viewing window (below left). To solve this problem, press / + b >Zoom>Zoom-Data to obtain the histogram shown below right.
6 To change the frequency axis to a percentage axis, press / + b >Scale>Percent and then press enter.
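The bin settings used above can be mirrored in a short Python sketch as a check on the calculator display. This illustrates the idea only and does not describe how the TI-Nspire computes its bins internally. The hypothetical function histogram_bins treats each column as starting point ≤ value < starting point + width, and assumes no mark lies below the starting point. It also converts counts to percentages, as in Step 6.

```python
import math

def histogram_bins(data, start, width):
    """Return (lower, upper, count) for columns lower <= value < upper.

    Columns begin at `start` and continue until the largest value is covered.
    Assumes every value is at least `start`.
    """
    n_bins = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[math.floor((x - start) / width)] += 1
    return [(start + i * width, start + (i + 1) * width, count)
            for i, count in enumerate(counts)]

marks = [16, 15, 11, 12, 4, 18, 25, 22, 15, 17, 7, 18, 14, 23, 13,
         15, 14, 13, 12, 17, 15, 18, 13, 22, 16, 23, 14]

# Bin settings from Step 5: width 4, starting point 2
bins = histogram_bins(marks, start=2, width=4)
for lower, upper, count in bins:
    per_cent = round(count / len(marks) * 100, 1)   # Step 6: percentage scale
    print(f"{lower} to <{upper}: count {count}, {per_cent}%")
```

Calling histogram_bins(marks, start=3, width=2) gives the columns of the histogram with a bin width of 2 and a starting point of 3 described earlier.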
How to construct a histogram using the ClassPad
Display the following set of 27 marks in the form of a histogram.
16 15 11 12 4 18 25 22 15 17 7 18 14 23 13 15 14 13 12 17 15 18 13 22 16 23 14
Steps
1 From the application menu screen, locate the built-in Statistics application and tap its icon to open it. Tapping the menu icon from the icon panel (just below the touch screen) will display the application menu if it is not already visible.
2 Enter the data into a list named marks. To name the list:
a Highlight the heading of the first list by tapping it.
b Press k on the front of the calculator and tap the tab.
c To enter the data, type the word marks and press E.
d Type in each data value and press E or the down arrow (which is found on the cursor button on the front of the calculator) to move down to the next cell. The screen should look like the one shown opposite.
3 Set up the calculator to plot a statistical graph.
a Tap the icon from the toolbar. This opens the Set StatGraphs dialog box.
b Complete the dialog box as given below.
Draw: select On
Type: select Histogram
XList: select main\marks
Freq: leave as 1
c Tap h to confirm your selections.
Note: To make sure only this graph is drawn, select SetGraph from the menu bar at the top and confirm that there is a tick only beside StatGraph1 and no others.
4 To plot the graph:
a Tap the icon in the toolbar.
b Complete the Set Interval dialog box as follows.
HStart: type 2 (i.e. the starting point of the first interval)
HStep: type 4 (i.e. the interval width)
Tap OK to display the histogram.
Note: The screen is split into two halves, with the graph displayed in the bottom half, as shown above.
Tapping r from the icon panel allows the graph to fill the entire screen. Tap r again to return to half-screen size.
5 Tapping the icon from the toolbar places a marker (+) at the top of the first column of the histogram (see opposite) and tells us that:
a the first interval begins at 2 (xc = 2)
b for this interval, the frequency is 1 (Fc = 1).
To find the frequencies and starting points of the other intervals, use the arrow to move from interval to interval.
Exercise 1C
1 The numbers of occupants in nine cars stopped at a traffic light were:
1 1 2 1 3 1 2 1 3
What is the mode of this data set? What does this tell us?
2 The number of surviving grandparents for 11 preschool children is listed below.
0 4 4 3 2 3 4 4 4 3 3
Form a frequency table to show the distribution of the number of surviving grandparents.
3 a Write down the missing information in the frequency table.
b How many families had only one child?
c How many families had more than one child?
d What percentage of families had no children?
e What percentage of families had fewer than three children?
No. of children in family 0 1 2 3 4 Total Frequency Count % 3 10 6 2 21 47.6 28.6 9.5
4 a Salaries of women teaching in a school range from $20 106 to $63 579. Group the salaries into five equal-sized categories that cover all teaching salaries.
b The number of students in VCE Further Mathematics classes ranges from 6 to 33. Group the class sizes into six equal-sized categories that cover all Further Mathematics class sizes.
c The amount of money carried by a sample of 23 students ranges from nothing to $8.75. Group the amount of money carried by the students into five equal-sized categories that cover all amounts of money carried by the students.
5 The histogram opposite was formed by recording the number of words in 30 randomly selected sentences.
a What percentage of these sentences contained:
i 5–9 words?
ii 25–29 words?
iii 10–19 words?
iv fewer than 15 words?
Give answers correct to the nearest per cent.
b How many of these sentences contained:
i 20–24 words?
ii more than 25 words?
c What is the mode (modal interval)?
6 Use the information in the table opposite to help you construct a histogram to display population density. Use the histogram in Example 6 as a model. Label axes and mark in scales.
[Figure: histogram of Frequency (%) (0–35) against Number of words in sentence (0–30)]
Population density: 0–199, 200–399, 400–599, 600–799, 800–999, Total
Number of rooms: 4, 5, 6, 7, 8, Total
7 Use the information in the table opposite to help you construct a histogram to display the distribution of the number of rooms in the houses of 11 preschool children. Use the histogram in Example 7 as a model. Label axes and mark in scales.
a Use a graphics calculator to construct a histogram so that the first column starts at 63 and the column width is two.
b For this histogram:
i what is the starting point of the third column?
ii what is the count for the third column? What actual data values does this include?
c Redraw the histogram so that the column width is five and the first column starts at 60.
d For this histogram, what is the count in the interval 65 to <70?
9 The following data values are the numbers of children in the families of 25 VCE students:
1 6 2 5 5 3 4 1 2 7 3 4 5 3 1 3 2 1 4 4 3 9 4 3 3
a Use a graphics calculator to construct a histogram so that the column width is one and the first column starts at 0.5.
b For this histogram, what is the starting point for the fourth column and what is the count?
c Redraw the histogram so that the column width is two and the first column starts at 0.
d For this histogram:
i what is the count in the interval from 6 to less than 8?
ii what actual data value(s) does this interval include?
Shape
How is the data distributed? Is the histogram peaked; that is, do some data values tend to occur much more frequently than others, or is it relatively flat, showing that all values in the distribution occur with approximately the same frequency?
Symmetric distributions If a histogram is single-peaked, does the histogram region tail off evenly on either side of the peak? If so, the distribution is said to be symmetric (see Histogram 1).
[Figure: Histogram 1, a single-peaked symmetric distribution showing the lower tail, peak and upper tail; Histogram 2, a double-peaked symmetric distribution]
A single-peaked symmetric distribution is characteristic of the data that derive from measuring variables such as people's heights, intelligence test scores, weights of oranges in a storage bin, or any other data for which the values vary evenly around some central value. The histogram for average hours worked (see Example 6) would be classified as approximately symmetric.
The double-peaked distribution (Histogram 2) is symmetric about the dip between the two peaks. A histogram that has two distinct peaks indicates a bimodal (two modes) distribution. A bimodal distribution often indicates that the data have come from two different populations. For example, if we were studying the distance the discus is thrown by Olympic-level discus throwers, we would expect a bimodal distribution if both male and female throwers were included in the study.
Skewed distributions
Sometimes a histogram tails off primarily in one direction. Such distributions are said to be skewed. If a histogram tails off to the right, we say that it is positively skewed (Histogram 3). The distribution of salaries of workers in a large organisation tends to be positively skewed. Most workers earn a similar salary with some variation above or below this amount, but a few earn more and even fewer, such as the senior manager, earn even more. The distribution of house prices also tends to be positively skewed.
[Figure: Histogram 3, positively skewed (+ve skew) with a long upper tail; Histogram 4, negatively skewed (−ve skew) with a long lower tail]
If a histogram tails off to the left we say that it is negatively skewed (Histogram 4). The distribution of age at death tends to be negatively skewed. Most people die in old age, a few in middle age and even fewer in childhood.
Outliers
Outliers are any data values that stand out from the main body of data. These are data values that are atypically high or low. See, for example, Histogram 5, which shows an outlier. In this case it is a data value that is atypically low compared to the rest of the data values. Outliers can indicate errors made collecting or processing data; for example, a person's age recorded as 365. Alternatively, they may indicate data values that are very different from the rest of the values. For example, compared to her students' ages, a teacher's age is an outlier.
[Figure: Histogram 5, showing the main body of data and a single low outlier]
Centre
Histograms 6 to 8 display the distribution of test scores for three different classes taking the same subject. They are identical in shape, but differ in where they are located along the axis. In statistical terms we say that the distributions are centred at different points along the axis.
[Figure: Histograms 6 to 8, three identically shaped distributions along a test-score axis from 50 to 150]
But what do we mean by the centre of a distribution? This is an issue we will return to in more detail later. For the present we will take centre to be the middle of the distribution. The middle of a symmetric distribution is reasonably easy to locate by eye. Looking at Histograms 6 to 8, it would be reasonable to say that the centre or middle of each distribution lies roughly halfway between the extremes; half the observations would lie above this point and half below. Thus we might estimate that Histogram 6 (yellow) is centred at about 60, Histogram 7 (light blue) at about 100, and Histogram 8 (dark blue) at about 140.
For skewed distributions, it is more difficult to estimate the middle of a distribution by eye. The middle is not halfway between the extremes because, in a skewed distribution, the scores tend to bunch up at one end. However, if we imagine a cardboard cut-out of the histogram, the midpoint lies on the line that divides the area of the histogram into two equal areas (Histogram 9). Using this method, we would estimate the centre of the distribution to lie somewhere between 35 and 40, but closer to 35, so we might opt for 37. However, remember that this is only an estimate.
[Histogram 9: a skewed distribution on an axis from 15 to 50, with a vertical line dividing the histogram in half]
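The cardboard cut-out idea can also be carried out numerically when the frequencies are known. The Python sketch below is one way to do it, assuming equal-width intervals (so each bar's area is proportional to its frequency) and assuming the values inside an interval are spread evenly across it. The function name `histogram_midpoint` and the frequencies used are hypothetical; they are not read from Histogram 9.

```python
def histogram_midpoint(edges, counts):
    """Estimate the value that splits a histogram's area into two equal halves.

    edges  -- interval boundaries, one more entry than counts
    counts -- frequency in each interval (intervals assumed to be equal width)
    Values inside an interval are assumed to be spread evenly across it.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have exactly one more entry than counts")
    half = sum(counts) / 2              # half the total area (equal widths)
    cumulative = 0
    for i, count in enumerate(counts):
        if count > 0 and cumulative + count >= half:
            # fraction of this bar needed to reach the halfway area
            fraction = (half - cumulative) / count
            return edges[i] + fraction * (edges[i + 1] - edges[i])
        cumulative += count
    raise ValueError("counts must include at least one positive frequency")


# Made-up frequencies for the intervals 15-20, 20-25, ..., 45-50
edges = [15, 20, 25, 30, 35, 40, 45, 50]
counts = [1, 2, 4, 5, 3, 2, 1]
print(histogram_midpoint(edges, counts))
```

With these made-up frequencies, 7 of the 18 values lie below 30, so the halfway point (the 9th value's worth of area) falls two-fifths of the way into the 30-35 interval, at about 32. Like the by-eye method, this is still only an estimate.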
Spread
If the histogram is single peaked, is it narrow? This would indicate that most of the data values in the distribution are tightly clustered in a small region. Or is the peak broad? This would indicate that the data values are more widely spread out. Histograms 10 and 11 are both single peaked. Histogram 10 has a broad peak, indicating that the data values are not very tightly clustered about the centre of the distribution. In contrast, Histogram 11 has a narrow peak, indicating that the data values are tightly clustered around the centre of the distribution.
[Histogram 10: wide central region]
[Histogram 11: narrow central region]
But what do we mean by the spread of a distribution? We will return to this in more detail later. For a histogram, we will take it to be the maximum range of the distribution.
Range
$$\text{Range} = \text{largest value} - \text{smallest value}$$
For example, Histogram 10 has a spread (maximum range) of 22 (22 − 0) units, which is considerably greater than the spread of Histogram 11, which has a spread of 12 (18 − 6) units.
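For a histogram, the maximum range runs from the left-hand edge of the first non-empty interval to the right-hand edge of the last one. A minimal Python sketch of that reading follows; the function name `histogram_spread` is hypothetical, and the counts are made up so that the occupied intervals run from 6 to 18, as in Histogram 11 (the actual frequencies are not given in the text).

```python
def histogram_spread(edges, counts):
    """Maximum range of a histogram.

    Right-hand edge of the last non-empty interval minus the left-hand
    edge of the first non-empty interval.
    """
    occupied = [i for i, count in enumerate(counts) if count > 0]
    if not occupied:
        raise ValueError("the histogram contains no data")
    return edges[occupied[-1] + 1] - edges[occupied[0]]


# Made-up frequencies on intervals of width 2 from 0 to 22
edges = list(range(0, 24, 2))            # 0, 2, 4, ..., 22
counts = [0, 0, 0, 1, 3, 8, 12, 8, 3, 0, 0]
print(histogram_spread(edges, counts))   # 18 - 6 = 12
```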
Example 8
The histogram opposite shows the distribution of the number of phones per 1000 people in 85 countries. a Describe its shape and note outliers (if any). b Locate the centre of the distribution. c Estimate the spread of the distribution.
Solution
a Shape and outliers: The distribution is positively skewed. There are no outliers.
b Centre: Count up the frequencies from either end to find the middle interval. The distribution is centred in the interval 170–340 phones/1000 people.
c Spread: Use the maximum range to estimate the spread. Spread = 1020 − 0 = 1020 phones/1000 people.
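The "count up the frequencies" step in part b can be sketched in Python. This is a minimal illustration, not the book's method verbatim: the function name `middle_interval` is hypothetical, and because the text does not list the histogram's frequencies, the counts below are invented (they total 85, like the country data).

```python
def middle_interval(edges, counts):
    """Return the interval (lower, upper) that contains the middle value.

    The middle value is the ((n + 1) / 2)th value in order. For an even n
    whose two middle values fall in different intervals, the interval
    holding the upper of the two is returned.
    """
    n = sum(counts)
    if n == 0:
        raise ValueError("the histogram contains no data")
    position = (n + 1) / 2
    cumulative = 0
    for i, count in enumerate(counts):
        cumulative += count                # values counted so far, from the low end
        if cumulative >= position:
            return edges[i], edges[i + 1]


# Invented frequencies for 85 countries (not read from the histogram)
edges = [0, 170, 340, 510, 680, 850, 1020]
counts = [30, 20, 15, 10, 6, 4]
print(middle_interval(edges, counts))      # (170, 340)
```

With 85 values the middle value is the 43rd; the first interval holds 30 values and the first two hold 50, so the 43rd value lies in the second interval.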
It should be noted that, with grouped data, it is difficult to determine precisely the location of the centre of a distribution from a histogram. So, when working with grouped data, it is acceptable to state that the centre of a distribution lies in the interval 170–340. We will learn how to solve this problem later in the chapter. If you were using the histogram above to describe the distribution in a form suitable for a statistical report, you might write as follows.
Report
For the 85 countries, the distribution of the number of phones per 1000 people is positively skewed. The centre of the distribution lies somewhere in the interval 170–340 phones/1000 people. The spread of the distribution is 1020 phones/1000 people. There are no outliers.
Exercise 1D
1 Label each of the following histograms as approximately symmetric, positively skewed or negatively skewed, and identify the following: i the mode
a [Histogram A: frequency axis 0 to 20]
c [Histogram C: frequency axis 0 to 20]
d [Histogram D: frequency axis 0 to 20]
2 These three histograms show the marks obtained by a group of students in three subjects.
[Histograms for Subject A, Subject B and Subject C: frequency (0 to 10) against marks (2 to 50)]
a Is each of the distributions approximately symmetric or skewed?
b Are there any clear outliers?
c Determine the interval containing the central mark for each of the three subjects.
d In which subject was the spread of marks the least? Use the range to estimate the spread.
e In which subject did the marks vary most? Use the range to estimate the spread.
3 Label each of the following histograms as approximately symmetric, positively skewed or negatively skewed, and identify the following: i the mode(s)
a [Histogram A: frequency axis 0 to 20]
c [Histogram C: frequency axis 0 to 20]
d [Histogram D: frequency axis 0 to 20]
e [Histogram E: frequency axis 0 to 20]
f [Histogram F: frequency axis 0 to 80]
4 This histogram shows the distribution of pulse rate (in beats per minute) for 28 students. Use the histogram to complete the report below, describing the distribution of pulse rate in terms of shape, centre, spread and outliers (if any).
[Histogram: frequency (count) of pulse rate in beats per minute]
Report
For the students, the distribution of pulse rates is ________ with an outlier. The centre of the distribution lies in the interval ________ beats per minute and the spread of the distribution is ________ beats per minute. The outlier lies in the interval ________ beats per minute.
leaves increase in value as they move away from the stem. It is usually the ordered stem plot that we want, because an ordered stem plot makes it easy to find the key values.
Example 9 Constructing an ordered stem plot
University participation rates (%) in 23 countries are given below.
26 3 12 20 36 1 25 26 13 9 26 27 30 1 15 21 7 8 22 3 37 17 55
Display the data in the form of an ordered stem plot.
Solution
1 The data set has values in the units, tens, twenties, thirties and fifties. Thus, appropriate stems are 0, 1, 2, 3, 4 and 5 (stem 4 is included, even though no data value lies in the forties, so that the scale has no gaps). Write these down in ascending order, followed by a vertical line.
2 Now attach the leaves. The first data value is 26. The stem is 2 and the leaf is 6. Opposite the 2 in the stem, write down the number 6. The second data value is 3 or 03. The stem is 0 and the leaf is 3. Opposite the 0 in the stem, write down the number 3. Continue systematically working through the data following the same procedure until all points have been plotted. You will then have the unordered stem plot, as shown.
0 | 3 1 9 1 7 8 3
1 | 2 3 5 7
2 | 6 0 5 6 6 7 1 2
3 | 6 0 7
4 |
5 | 5
3 Ordering the leaves in increasing value as they move away from the stem gives the ordered stem plot, as shown.
0 | 1 1 3 3 7 8 9
1 | 2 3 5 7
2 | 0 1 2 5 6 6 6 7
3 | 0 6 7
4 |
5 | 5
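The three steps of Example 9 (choose stems, attach leaves, order the leaves) can be sketched in Python for whole-number data. This is a minimal illustration under the assumption that every value is a non-negative whole number with the tens part as the stem and the units digit as the leaf; the function name `ordered_stem_plot` is hypothetical.

```python
from collections import defaultdict


def ordered_stem_plot(data):
    """Ordered stem plot for non-negative whole numbers.

    Stem = tens part, leaf = units digit. Empty stems between the smallest
    and largest stem are kept so that the scale has no gaps.
    """
    leaves = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)     # e.g. 26 -> stem 2, leaf 6
        leaves[stem].append(leaf)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in sorted(leaves[stem]))
        lines.append(f"{stem} | {row}".rstrip())
    return lines


rates = [26, 3, 12, 20, 36, 1, 25, 26, 13, 9, 26, 27, 30, 1, 15, 21,
         7, 8, 22, 3, 37, 17, 55]
for line in ordered_stem_plot(rates):
    print(line)
```

Sorting the leaves within each stem is what turns the unordered plot of step 2 into the ordered plot of step 3.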
Using a stem plot to describe a distribution
Stem plots are just like histograms, except that you can see all the data values. This enables more precise estimates to be made of the centre and spread.
Methods for determining the centre, spread and outliers from a stem plot
Centre (middle): Count up from either end of the distribution until you find the middle value, that is, the value that has an equal number of data values on either side. This middle value is called the median.
For an odd number of data values, n, the middle value is the $\left(\frac{n+1}{2}\right)$th value. Thus, the median will be an actual data value.
For an even number of data values, n, the middle value is the $\left(\frac{n+1}{2}\right)$th value. Thus, the median will lie between two data values.
Spread (range): Subtract the smallest data value from the largest data value.
$$\text{Range} = \text{largest value} - \text{smallest value}$$
Outliers: Data values that stand out from the main body of data are called outliers. Their values can be read directly from the stem plot.
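The rules for the middle value and the range can be checked with a short Python sketch. The function names `median` and `data_range` are our own choices for illustration; the marks are the 23 test marks of Example 10, read from its ordered stem plot.

```python
def median(values):
    """Middle value: the ((n + 1) / 2)th value once the data are ordered."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no data values")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]       # an actual data value
    # even n: halfway between the (n/2)th and (n/2 + 1)th values
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


# Test marks from Example 10, read from the ordered stem plot
marks = [15, 19, 19, 19, 20, 24, 25, 27, 28, 28, 28, 30,
         33, 35, 35, 36, 38, 41, 42, 43, 43, 45, 60]
print(median(marks), data_range(marks))        # 30 45
```

For n = 23 the middle value is the 12th, which is index 11 in Python's zero-based list, hence the `- 1` in the odd case.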
Example 10 Describing a stem plot in terms of shape, centre and spread
The ordered stem plot below shows the distribution of test marks of 23 students.
Test marks
0 |
1 | 5 9 9 9
2 | 0 4 5 7 8 8 8
3 | 0 3 5 5 6 8
4 | 1 2 3 3 5
5 |
6 | 0
a Name its shape and note outliers (if any).
b Locate the centre of the distribution.
c Estimate the spread of the distribution.
d Write down the values of any outliers.
Solution
a Shape: The distribution is approximately symmetric with one outlier.
b Centre: There are 23 data values, so the middle value is the 12th value ((23 + 1)/2 = 12). Check by counting. The distribution is centred at 30 marks.
c Spread: Use the range to estimate the spread. Spread = 60 − 15 = 45 marks.
d Outlier: Read off the value of the outlier. Outlier = 60 marks.
If you were using the stem plot to describe the distribution in a form suitable for a statistical report, you might write as follows.
Report
For the 23 students, the distribution of marks is approximately symmetric with an outlier. The centre of the distribution is at 30 marks and the distribution has a spread of 45 marks. The outlier is a mark of 60.
Split stems
In some instances, using the simple process outlined above produces a stem plot that is too bunched up to give us a good overall picture of the variation in the data. This is often the case when the data values all have the same first digit or the same first one or two digits. For example, a group of 17 VCE students recently sat for a statistics test marked out of 20. The results are as shown below.
2 12 13 9 18 17 7 16 12 10 16 14 11 15 16 15 17
Using the process described in Example 9 to form a stem plot, we end up with a bunched-up plot like the one below.
0 | 2 7 9
1 | 0 1 2 2 3 4 5 5 6 6 6 7 7 8
When this happens, the stem plot scale can be stretched out by splitting the stems. Generally the stem is split into halves or fifths. For example, for the interval 10–19, the split stem system works as follows.
Single stem: 1 (10–19)
Stem split into halves: 1 (10–14), 1 (15–19)
Stem split into fifths: 1 (10–11), 1 (12–13), 1 (14–15), 1 (16–17), 1 (18–19)
In a stem plot with a single stem, the 1 represents the interval 10–19. In a stem plot with its stem split into halves, the top 1 represents the interval 10–14, while the bottom 1 represents the interval 15–19. In a stem plot with its stem split into fifths, the top 1 represents the interval 10–11, the second 1 represents the interval 12–13, the third 1 represents the interval 14–15, the fourth 1 represents the interval 16–17, while the bottom 1 represents the interval 18–19.
Comparison of stem plots with different split stems
Using a split stem plot to display the test marks can show features not revealed by a standard plot. This can be seen in the plot with the stem split into fifths, which indicates that a mark of 2 is an outlier.
Single stem
0 | 2 7 9
1 | 0 1 2 2 3 4 5 5 6 6 6 7 7 8
Stem split into halves
0 | 2
0 | 7 9
1 | 0 1 2 2 3 4
1 | 5 5 6 6 6 7 7 8
Stem split into fifths
0 |
0 | 2
0 |
0 | 7
0 | 9
1 | 0 1
1 | 2 2 3
1 | 4 5 5
1 | 6 6 6 7 7
1 | 8
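Splitting a stem amounts to deciding which sub-row each leaf belongs to: with halves, leaves 0–4 go in the first row and 5–9 in the second; with fifths, each row takes two leaf values. A minimal Python sketch, assuming non-negative whole-number data; the function name `split_stem_plot` and its `parts` parameter are hypothetical.

```python
def split_stem_plot(data, parts=1):
    """Ordered stem plot for non-negative whole numbers with split stems.

    parts = 1 keeps a single stem, 2 splits each stem into halves (leaves 0-4
    and 5-9) and 5 splits it into fifths (leaves 0-1, 2-3, 4-5, 6-7, 8-9).
    """
    if parts not in (1, 2, 5):
        raise ValueError("parts must be 1, 2 or 5")
    width = 10 // parts                       # leaf values covered by each row
    rows = {}
    for value in data:
        stem, leaf = divmod(value, 10)
        rows.setdefault((stem, leaf // width), []).append(leaf)
    stems = [stem for stem, _ in rows]
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        for part in range(parts):             # keep empty rows to preserve the scale
            leaves = sorted(rows.get((stem, part), []))
            lines.append(f"{stem} | {' '.join(map(str, leaves))}".rstrip())
    return lines


marks = [2, 12, 13, 9, 18, 17, 7, 16, 12, 10, 16, 14, 11, 15, 16, 15, 17]
for line in split_stem_plot(marks, parts=5):
    print(line)
```

Keeping the empty rows matters: with the stem split into fifths, the empty 4–5 row between the 2 and the 7 is what makes the mark of 2 stand out.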
Use the back-to-back stem plot to write a report comparing the distribution of the two sets of test marks in terms of shape, centre, spread and outliers.
Solution
Report
The distribution of the Test 1 marks is negatively skewed, while the distribution of the Test 2 marks is approximately symmetric. The two distributions have similar centres: 36.5 and 35 marks, respectively. The spread of the Test 1 marks is less than that of the Test 2 marks: 29 compared to 42. There are no outliers.
Dot plots
The simplest way to display numerical data is to form a dot plot. A dot plot consists of a number line with each data point marked by a dot. When several data points have the same value, the points are stacked on top of each other. Like stem plots, dot plots are a great way of displaying small data sets and have the advantage of being very quick to construct by hand. They are best when the data values are relatively close together.
Example 12 Constructing a dot plot
The ages (in years) of the 13 members of a sporting team are:
22 19 18 19 23 25 22 29 18 22 23 24 22
Construct a dot plot.
Solution
1 Draw in a number line, scaled to include all data values. Label the line with the variable being displayed.
[Number line from 17 to 30, labelled Age (years)]
2 Plot each data value by marking in a dot above the corresponding value on the number line.
[Dot plot of Age (years) on a number line from 17 to 30: two dots at 18, two at 19, four at 22, two at 23 and one each at 24, 25 and 29]
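The two construction steps (draw a scaled number line, then stack one dot per data value) can be imitated in plain text. The Python sketch below is only an illustration: the function name `text_dot_plot` is hypothetical, `*` stands in for a dot, and each number-line position is assumed to be three characters wide, which suits two-digit values such as these ages.

```python
from collections import Counter


def text_dot_plot(data, low, high):
    """Text dot plot for whole-number data: one '*' per data value,
    stacked above its position on a number line running from low to high."""
    counts = Counter(data)
    tallest = max(counts.values())
    lines = []
    for level in range(tallest, 0, -1):          # print the top of the stacks first
        row = "".join("  *" if counts[x] >= level else "   "
                      for x in range(low, high + 1))
        lines.append(row.rstrip())
    lines.append("".join(f"{x:>3}" for x in range(low, high + 1)))  # number line
    return lines


ages = [22, 19, 18, 19, 23, 25, 22, 29, 18, 22, 23, 24, 22]
print("\n".join(text_dot_plot(ages, 17, 30)))
print("Age (years)")
```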
Interpreting a dot plot
Dot plots are interpreted in much the same way as stem plots. However, usually there is little we can say about the shape of the distribution from a dot plot, because there are not sufficient data points for any pattern to be revealed. From the dot plot in Example 12, we see that the distribution of ages is centred at 22 years (the middle, or 7th, value) with a spread of 11 years (29 − 18 = 11).
Which graph?
One of the issues that you will face is choosing a suitable graph to display a distribution. The following guidelines might help you in your decision-making. They are guidelines only, because in some instances there may be more than one suitable graph.
Type of data | Graph | Qualifications on use
Categorical | Bar chart, Segmented bar chart | Not too many categories (4 or 5 maximum)
Numerical | Histogram | Best for medium to large data sets (n ≥ 40)
Numerical | Stem plot | Best for small to medium sized data sets (n ≤ 50)
Numerical | Dot plot | Suitable only for small data sets (n ≤ 20)
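The guideline table can be read as a simple decision rule. The Python sketch below encodes it under stated assumptions: the thresholds are taken as n ≥ 40 for a histogram, n ≤ 50 for a stem plot and n ≤ 20 for a dot plot, and the function name `suitable_graphs` is hypothetical. Because the table gives guidelines only, the function returns every graph that fits rather than a single answer.

```python
def suitable_graphs(data_type, n):
    """Graphs suggested by the guideline table for a data set of size n.

    Thresholds assumed: histogram n >= 40, stem plot n <= 50, dot plot n <= 20.
    The table gives guidelines only, so more than one graph may be returned.
    """
    if data_type == "categorical":
        # The table's caution: not too many categories (4 or 5 maximum)
        return ["bar chart", "segmented bar chart"]
    if data_type != "numerical":
        raise ValueError("data_type must be 'categorical' or 'numerical'")
    graphs = []
    if n >= 40:
        graphs.append("histogram")
    if n <= 50:
        graphs.append("stem plot")
    if n <= 20:
        graphs.append("dot plot")
    return graphs


print(suitable_graphs("numerical", 13))    # ages of the 13 team members
print(suitable_graphs("numerical", 85))    # phones per 1000 people, 85 countries
```

Where the ranges overlap (for example, 40 to 50 values), both a histogram and a stem plot are reasonable, which matches the table's note that there may be more than one suitable graph.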
Exercise 1E
1 The data below give the urbanisation rates (%) in 23 countries.
54 99 22 20 31 3 22 9 25 3 56 12 16 9 29 6 28 100 17 9 35 27 12
a Construct an ordered stem plot.
b What advantage does a stem plot have over a histogram?
2 For each of the following stem plots (A, B and C):
a name its shape and note outliers (if any)
b locate the centre of the distribution
c determine the spread of the distribution
d write down the values of outliers (if any)
[Stem plots A, B and C, each with stems 0 to 6]
3 The data below give the wrist circumference (in cm) of 15 men. 16.9 17.3 19.3 18.5 18.2 18.4 19.9 16.7 17.1 17.6 17.7 16.5 17.0 17.2 17.6 a Construct a stem plot for wrist circumference using: i stems 16, 17, 18, 19 ii these stems split into halves b Which stem plot appears to be more appropriate for the data? c Use the stem plot with split stems to help you complete the report below.
Report
For the men, the distribution of their wrist circumference is ________. The centre of the distribution is at ________ cm and it has a spread of ________ cm. There are no outliers.
4 The data below give the weight (in kg) of 22 students.
57 58 62 84 64 74 57 55 56 60 75 68 59 72 110 56 69 56 50 60 75 58
a Construct a stem plot for weight using: i stems 5, 6, 7, 8, 9, 10 and 11 ii these stems split into halves
b Use the stem plot with a split stem to write a brief report on the distribution of the weights of the students in terms of shape (and outliers), centre and spread. Use the report from Question 3 as a model.
5 The number of possessions (kicks, marks, handballs, knockouts etc.) recorded for players in a football game between Carlton and Essendon is shown below.
Carlton Essendon
10 44 32 44 19 35 11 5 24 28 21 32 21 59 21 12 19 26 23 22 29 34 22 34 36 20 14 25 16 19 32 32 14 29 8 22 21 26 44 19 21 22
a Display the data in the form of an ordered back-to-back stem plot.
b Complete the following report comparing the two distributions in terms of shape (and outliers), centre and spread.
Report
The distribution of the number of possessions is ________ for both teams. The two distributions have similar centres, at ________ and ________ possessions, respectively. The spread of the distribution is less for Carlton, ________ possessions, compared to ________ possessions for Essendon.
6 The following data give the number of children in the families of 14 VCE students: 1 6 2 5 5 3 4 4 2 7 3 4 3 4 a Construct a dot plot. b What is the mode? c What is: i the centre? ii the spread?
7 The following data give the life expectancies in years of 13 countries:
76 75 74 74 73 73 75 71 72 75 75 78 72
a Construct a dot plot. b What is the mode? c What is: i the centre? ii the spread?
8 Data have been collected for each of the following variables. The data are to be displayed graphically. In each case, decide which is the most appropriate graph. Select from bar chart, histogram, stem plot or dot plot. Sometimes more than one sort of graph is suitable.
a number of passengers in a bus (1000 buses in sample)
b amount of petrol purchased (in litres) (30 petrol purchases)
c type of petrol purchased (super, unleaded, premium)
d prices of houses sold in Melbourne over a weekend
e the number of medals won by countries winning medals at the Olympics
f state of residence of a sample of 200 Australians
g number of cigarettes smoked in a day (a sample of 120 people)
h resting pulse rates of 7 students
Review
Describing distributions of categorical variables
For a small number of categories, the distribution of a categorical variable is described in terms of the dominant category (if any), the order of occurrence of each category and its relative importance.
Mode
The mode is the value or group of values that occurs most often (frequently) in a data set. For example, for the data 2 1 1 3 3 2 5 1 6 1 1 2 1 1, the mode is 1, because it is the data value that occurs most often.
Numerical data
Numerical data arise from measuring or counting some quantity; for example, height, number of people etc. Numerical data can be discrete or continuous. Discrete data arise when you count. Continuous data arise when you measure.
A histogram is used to display the frequency distribution of a numerical variable; it is suitable for medium to large sized data sets.
A stem plot is an alternative graphical display to the histogram; it is suitable for small to medium sized data sets. The advantage of the stem plot over the histogram is that it shows the value of each data point.
A dot plot consists of a number line with each data point marked by a dot; it is suitable for small sets of data only.
The distribution of a numerical variable can be described in terms of:
• shape: symmetric or skewed (positive or negative)?
• outliers: values that appear to stand out
• centre: the midpoint of the distribution (median)
• spread: one measure is the range of values covered ($\text{Range} = \text{largest value} - \text{smallest value}$)
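Finding the mode is a matter of counting how often each value occurs. A minimal Python sketch using the standard library's `collections.Counter`; the function name `modes` is our own, and it returns a list because a data set can have more than one mode.

```python
from collections import Counter


def modes(values):
    """All values that occur most often (a data set can have more than one mode)."""
    counts = Counter(values)
    if not counts:
        raise ValueError("no data values")
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([2, 1, 1, 3, 3, 2, 5, 1, 6, 1, 1, 2, 1, 1]))   # [1]
```

In the example data, the value 1 occurs seven times, more than any other value, so it is the only mode.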
Skills check
Having completed this chapter you should be able to:
• differentiate between numerical and categorical data
• interpret the information contained in a frequency table
• identify and interpret the mode
• construct a bar chart or histogram from a frequency table
• decide when it is appropriate to use a histogram rather than a bar chart and vice versa
• construct a histogram from raw data, using a graphics calculator
• construct a dot plot and a stem plot from raw data, using split stems if required
• locate the mode of a distribution from a histogram, stem plot, dot plot or bar chart
• recognise a symmetric, positively skewed and negatively skewed histogram or stem plot
• identify potential outliers in a distribution from its histogram or stem plot
• write a brief report to describe the distribution of a numerical variable in terms of shape, centre, spread and outliers (if any)
• write a brief report to describe the distribution of a categorical variable in terms of the dominant category (if any), the order of occurrence of each category and their relative importance.
Multiple-choice questions
The following information relates to Questions 1 to 3. A survey collected information about the number of cars owned by a family and the car size (small, medium, large).
1 The variables Number of cars owned and car Size are: A both categorical variables B both numerical variables C a categorical and a numerical variable respectively D a numerical and a categorical variable respectively E neither numerical nor categorical variables
2 To graphically display the information about car size you could use a: A dot plot B stem plot C histogram D segmented bar chart E back-to-back stem plot
3 The Number of cars owned is: A a continuous numerical variable B a discrete numerical variable C a continuous categorical variable D a discrete categorical variable E none of the above
The following information relates to Questions 4 to 6. A number of teenagers were asked to nominate their favourite leisure activity. Their responses have been organised into a frequency table, as shown. Some information is missing.
[Frequency table: Leisure activity (Sport, Listening to music, Watching TV, Other, Total)]
4 The percentage of students who said that listening to music was their favourite leisure activity is: A 17.5 B 28.0 C 29.2 D 50.0 E 70.0
5 The number of students who said watching TV was their favourite leisure activity is: A 19 B 48 C 62 D 125 E 70.0
6 For the students surveyed, the most popular leisure activity is: A sport B listening to music C watching TV D other E can't tell
Questions 7 to 11 relate to the histogram shown below. This histogram displays the test scores of a class of Further Mathematics students.
[Histogram: frequency (0 to 6) against test score (6 to 28)]
7 The total number of students in the class is: A 6 B 18 C 20 D 21 E 22
8 The number of students in the class who obtained a test score less than 14 is: A 4 B 10 C 14 D 17 E 28
9 The histogram is best described as: A negatively skewed B negatively skewed with an outlier C approximately symmetric D approximately symmetric with outliers E positively skewed
10 The centre of the distribution lies in the interval: A 8–10 B 10–12 C 12–14 D 14–16 E 18–20
11 The spread of the students' marks is: A 8 B 10 C 12 D 20 E 22
12 For the stem plot shown below, the modal interval is: A 20–24 B 25–29 C 20–29 D 25 E 29
[Stem plot with split stems 1, 1, 2, 2, 3]
The following information relates to Questions 13 and 14. This percentage segmented bar chart shows the distribution of hair colour for 200 students.
[Percentage segmented bar chart: hair colour, percentage axis 0 to 100]
13 The number of students with brown hair is closest to: A 4 B 34 C 57 D 68 E 114
14 For these students, the most common hair colour is: A black B blonde C brown D red E other
15 The ages of 11 primary school children were collected. The best graph to display the distribution of ages of these children would be a: A bar chart B dot plot C histogram D segmented bar chart E stem plot
Extended-response questions
1 One hundred and twenty-one students were asked to identify their preferred leisure activity. The results of the survey are displayed in a bar chart. a What percentage of students nominated watching TV as their preferred leisure activity? b What percentage of students in total nominated either going to the movies or reading as their preferred leisure activity? c What is the most popular leisure activity for these students? How many students rated this activity as their preferred leisure activity?
[Bar chart: percentage of students (0 to 30) by preferred leisure activity: TV, Music, Movies, Reading, Other, Sport]
2 The number of people killed in natural and non-natural disasters in 1997 by world region is shown in the table below.
a Construct a bar chart.
b In which region was the: i greatest number of people killed? ii least number of people killed?
Region | Number killed
Europe | 874
Africa | 8 327
Asia | 10 551
Oceania* | 457
The Americas | 1 581
*includes Australia (41)
3 A group of 52 teenagers was asked, 'Do you agree that the use of marijuana should be legalised?' Their responses are summarised in the table opposite.
a Construct a properly labelled and scaled frequency bar chart for the data. b Complete the table by calculating the appropriate percentages, correct to one decimal place. c Use the percentages to construct a percentage segmented bar chart for the data. d Use the frequency table to help you complete the following report.
Report: In response to the question, 'Do you agree that the use of marijuana should be legalised?', 50% of the 52 students ________. Of the remaining students, ________% agreed, while ________% said that they ________.
4 The table below gives the distribution of the number of children in 50 families.
a Is the number of children in a family a numerical or categorical variable?
b Write down the missing information.
c What is the mode?
d Determine the number of families with: i three children ii two or three children iii fewer than three children
e Determine the percentage of families with: i six children ii more than six children iii fewer than six children
[Frequency table: Number of children in family (0 to 8, and Total) with Count and Per cent columns; some entries are missing. Counts as given: 5 6 19 7 2 3 0 1, Total 50. Per cents as given: 10 38 14 4 6 0 2, Total 100.]
5 Students were asked how much they spent on entertainment each month. The results are displayed in the histogram. Use this information to answer the following questions.
[Histogram: frequency (0 to 10) against monthly spending on entertainment (dollars, 90 to 140)]
a How many students: i were surveyed? ii spent $100–105 per month?
b What is the mode?
c How many students spent $110 or more per month?
d What percentage spent less than $100 per month?
i Name the shape of the distribution displayed by the histogram. ii Locate the interval containing the centre of the distribution. iii Determine the spread of the distribution using the range.
6 This stem plot displays the ages (in years) of a group of women.
[Stem plot of ages with split stems 17, 17, 18, 18, 19, 19, 20, 20; key: 17|2 = 17.2 years]
a What was the age of the youngest woman?
b In terms of age, one of the women is a possible outlier. What is her age?
c How many women were aged between 17.0 and 17.4 years, inclusive?
d How many women were 19 years old or older?
e What is the modal age category?
f What percentage of women were younger than 20 years old?
g i Name the shape of the distribution of ages, noting outliers. ii Locate the centre of the distribution. iii Determine the spread of the distribution.
7 The distribution of the waiting times of 37 cars stopped by a traffic light is as shown in the histogram below. Use the histogram to write a report on the distribution of waiting times in terms of shape, centre, spread and outliers.
[Histogram: frequency (0 to 10) of waiting times]
8 Use a graphics calculator to construct histograms for the following sets of data.
a Use intervals of width 5 starting at 90.
Monthly expenditure on entertainment (in dollars)
110 115 105 98 118 114 125 95 114 104 97 130 122 93