In the previous lesson, we learnt methods of displaying a data set. In this lesson, another graphical representation, the frequency polygon, will be discussed. A frequency polygon is a graphical device for understanding the shapes of distributions. It serves the same purpose as a histogram but is especially helpful in comparing sets of data. It is constructed from a histogram. A frequency polygon is also a good choice for displaying cumulative frequency distributions.
LEARNING OUTCOMES
Upon completion of this lesson, you should be able to:
1. Create and interpret frequency polygons;
2. Create and interpret cumulative frequency polygons;
3. Solve problems related to frequency polygons and cumulative frequency polygons;
4. Write the class limits, class boundaries and class marks for a tabulated frequency distribution.
Frequency Polygon
A frequency polygon is a pictorial representation (line graph) of a frequency distribution in which the scores (X) are plotted on the X-axis of the graph and the frequency (or relative frequency) of occurrences is plotted on the Y-axis. However, the frequencies at each value of X are represented as dots connected by lines, as opposed to bars (as in a histogram). To construct a frequency polygon:
1. Create a frequency distribution of the scores of interest.
2. Complete the X-axis as for a histogram, with the following caveat: create an X value for the score above the highest and for the score below the lowest actual X scores.
3. Complete the Y-axis as for a histogram.
4. Create a dot for each score value; its height should equal the frequency or relative frequency.
5. Connect the dots with straight lines. Connect the dots at the highest and lowest score values (X) to the X-axis at the score values created in step 2.
6. Label the frequency polygon with a title. Be sure to label the X- and Y-axes.
In short, a frequency polygon is a line graph of a frequency distribution in which the midpoints (class marks) of the classes are connected by straight lines. We can also superimpose the polygon on the same graph as the histogram by connecting the midpoints of the tops of the bars.
Example 1
Below is a polygon from a data set (the same data as in Lesson 2).
Table 1
Speed (km/h), X    Number of cars, f    Lower class boundary    Upper class boundary
45 - 49            4                    44.5                    49.5
50 - 54            14                   49.5                    54.5
55 - 59            19                   54.5                    59.5
60 - 64            7                    59.5                    64.5
65 - 69            5                    64.5                    69.5
70 - 74            4                    69.5                    74.5
75 - 79            2                    74.5                    79.5
Total              55
Figure 1
Notice that class marks have been added at each end of the scale of observed values. They have zero observations. This is done so that the area under the polygon is the same as the area under the histogram. Hence, the information that can be obtained from the frequency polygon is the same as that obtained from the histogram.
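The claim about equal areas can be checked numerically. The following Python sketch uses the Table 1 data and assumes equal class widths; the function names `polygon_points` and `area_under` are illustrative only and do not come from the lesson. It builds the vertices of the polygon (class marks, padded with one zero-frequency class at each end) and compares the area under the polygon with the area of the histogram.

```python
# Table 1: class limits (km/h) and number of cars.
classes = [(45, 49), (50, 54), (55, 59), (60, 64), (65, 69), (70, 74), (75, 79)]
freqs = [4, 14, 19, 7, 5, 4, 2]


def polygon_points(classes, freqs):
    """Return (class mark, frequency) vertices, padded with a zero class at each end.

    Assumes every class has the same width.
    """
    width = classes[1][0] - classes[0][0]            # distance between lower limits
    marks = [(lo + hi) / 2 for lo, hi in classes]    # class mark = midpoint of the limits
    xs = [marks[0] - width] + marks + [marks[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))


def area_under(points):
    """Area under the straight-line segments joining the points (trapezoid rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


points = polygon_points(classes, freqs)
width = classes[1][0] - classes[0][0]
histogram_area = width * sum(freqs)     # each bar has area width * frequency
polygon_area = area_under(points)

print(points)
print(histogram_area, polygon_area)
```

The two extra vertices fall at the class marks 42 and 82, and both areas come out to 275 for this data set. Without the padding classes, the polygon would lose the two triangles at the ends and the areas would no longer match.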
Two or more frequency polygons with the same class intervals and the same total frequency can be compared by superimposing the graphs. This is difficult to do with histograms.
If the data set is large and more classes are constructed, the frequency polygon will be smoother, and hence we can see the shape of the distribution.
Example 2
A frequency polygon for 642 mathematics test scores (maximum score = 165) is shown in Figure 2. The first label on the X-axis is 35. This represents an interval extending from 29.5 to 39.5. Since the lowest test score is 46, this interval has a frequency of 0. The point labeled 45 represents the interval from 39.5 to 49.5. There are three scores in this interval. There are 150 scores in the interval that surrounds 85.
Figure 2
You can easily discern the shape of the distribution from Figure 2. Most of the scores are between 65 and 115. It is clear that the distribution is not symmetric, inasmuch as good scores (to the right) trail off more gradually than poor scores (to the left). In the terminology of skewness (we will study shapes of distributions more systematically in Lesson 6), the distribution is skewed to the right.
Frequency polygons are also a good choice for displaying cumulative frequency distributions.
Cumulative Frequency Polygon
Another form of graphical data presentation is the cumulative frequency polygon, also known as an ogive. A cumulative frequency polygon or ogive is a variation on the frequency polygon. Although both are used to describe a relatively large set of quantitative data, the distinction is that cumulative frequency polygons show cumulative frequencies on the Y-axis, with frequencies expressed in either absolute terms (counts) or relative terms (proportions). Cumulative frequencies are useful for knowing the number or the proportion of values that fall above or below a given value.
A cumulative frequency polygon for the same test scores is shown in Figure 3. The graph is the same as before except that the Y value for each point is the number of students in the corresponding class interval plus the numbers in all lower intervals. For example, there are no scores in the interval labeled "35", three in the interval "45", and 10 in the interval "55". Therefore the Y value corresponding to "55" is 13. Since 642 students took the test, the cumulative frequency for the last interval is 642.
Figure 3
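The running total described for the test scores can be written out as a short Python sketch. Only the three intervals whose frequencies are stated in the text (labels 35, 45 and 55) are used; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies stated in the text for the first three intervals of the test scores.
labels = [35, 45, 55]
freqs = [0, 3, 10]
total = 642                      # number of students who took the test

# Cumulative frequency: each value is the frequency plus all lower intervals.
cumulative = list(accumulate(freqs))

# The same values expressed in relative terms (proportions of all students).
relative = [c / total for c in cumulative]

for label, c, r in zip(labels, cumulative, relative):
    print(label, c, round(r, 4))
```

The cumulative value for the interval labeled 55 is 0 + 3 + 10 = 13, as in the text. Dividing by 642 gives the relative cumulative frequency, which is what would be plotted if the Y-axis showed proportions instead of counts.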
Constructing a cumulative frequency polygon from a set of data
The cumulative frequency is the running total of the frequencies. On a graph, it can be represented by a cumulative frequency polygon, where straight lines join up the points, or by a cumulative frequency curve. The following steps are taken to construct a cumulative frequency graph.
Step 1: Determine the upper boundary of each class
Step 2: Determine the mid-point of each class
Step 3: Fill in the cumulative frequency
Exercise 1
1. From the data set provided in Table 2, draw the cumulative frequency polygon on graph paper. [Plot cumulative frequencies (Y-axis) against class boundaries (X-axis).]
Table 2
                            Step 1                  Step 2                                  Step 3
Interval of height (cm)     Upper class boundary    Height, X (mid-point)     Frequency     Cumulative frequency
                            79.5                                                            0
80 - 99                     99.5                    89.5                      4             4
100 - 119                   119.5                   109.5                     6             10
120 - 139                   139.5                   129.5                     3             13
140 - 159                   159.5                   149.5                     2             15
160 - 179                   179.5                   169.5                     6             21
180 - 199                   199.5                   189.5                     2             23
200 - 219                   219.5                   209.5                     4             27
220 - 239                   239.5                   229.5                     3             30
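As a check on the table, the steps above can be sketched in Python. The sketch assumes that the upper boundary of the 180 - 199 class is 199.5, consistent with the other rows, and the helper `count_below` is a hypothetical name introduced here. It rebuilds the cumulative frequencies and then reads a value off the polygon by linear interpolation, which is what happens when a value is read from the straight-line graph.

```python
from itertools import accumulate

# Step 1: upper class boundaries (79.5 is the starting point with no observations).
# Assumption: the 180 - 199 class has upper boundary 199.5, like the other rows.
upper_boundaries = [79.5, 99.5, 119.5, 139.5, 159.5, 179.5, 199.5, 219.5, 239.5]
freqs = [0, 4, 6, 3, 2, 6, 2, 4, 3]

# Step 3: the cumulative frequency is the running total of the frequencies.
cumulative = list(accumulate(freqs))

# Points to plot: cumulative frequency (Y) against upper class boundary (X).
ogive = list(zip(upper_boundaries, cumulative))


def count_below(x, ogive):
    """Estimate how many values lie below x by interpolating along the polygon."""
    if x <= ogive[0][0]:
        return 0.0
    for (x1, y1), (x2, y2) in zip(ogive, ogive[1:]):
        if x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    return float(ogive[-1][1])   # x lies beyond the last boundary


print(ogive)
print(count_below(119.5, ogive))   # at a boundary: the tabulated value
print(count_below(109.5, ogive))   # halfway through the 100 - 119 class
```

At a class boundary the estimate equals the tabulated cumulative frequency. Between boundaries it is only an estimate, because the straight line assumes the values are spread evenly within the class.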
Question 2
The following marks are obtained by 40 students in a Statistics test.
77 81 74 56 63 52 87 90 34 29
57 68 29 34 98 58 43 51 74 64
68 39 45 83 62 94 36 61 88 89
38 54 46 73 67 31 27 45 99 79
a) Complete the table by using the marks provided, with 21 - 30 as the lowest class interval.
Table 3
Interval of Marks    Upper Boundary    Marks    Frequency
21 - 30                                25.5
b) Answer the following questions by using Table 3.
(i) How many students have marks less than 30.5? _________________
(ii) How many students have marks less than 40.5? _________________
(iii) How many students have marks less than 50.5? _________________
(iv) Explain briefly how you obtain these answers.
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
c) Draw a cumulative frequency graph based on the data set provided in the table.
d) From the graph, answer the following questions:
(i) How many students have marks less than 50? _____________________________________________________________
How do you find it from the graph? (Answer orally)
(ii) If the pass mark is 60, how many students pass? _____________________________________________________________
How do you find it from the graph? (Answer orally)
(iii) How many students obtain marks between 65 and 85?
Show your work on the graph. _____________________________________________________________
(iv) If 40% of the students pass this test, what is the pass mark? _____________________________________________________________
How do you obtain your answer? Write down your work below. _____________________________________________________________ _____________________________________________________________ _____________________________________________________________
(v) Find the percentage of students with marks less than 70.
_____________________________________________________________
_____________________________________________________________
Exercise 2
Pn Mariah, the school principal, collected the following data, which show the months in which teacher absences occurred in her school.
Teachers    Total    Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sept    Oct    Nov    Dec
Male        19       1      1      2      2      3      3      2      2      1       1      1      0
Female      31       3      2      2      3      3      4      4      2      2       1      2      3
Use the table above to answer each of the following questions.
1. How many teachers were absent in the first three months of the year?
2. How many more teachers were absent in July than September?
3. Find the percentage of teachers absent in the months of November and December.
4. Determine the percentage of months that have more than 4 absentees.
Class limits, Class boundaries, Class marks.
The following terms are necessary in determining the measures of central tendency/location (the descriptive statistics mean, median and mode) that will be discussed in Lesson 4.
1. Class limits. There are two for each class. The lower class limit of a class is the smallest data value that can go into the class. The upper class limit of a class is the largest data value that can go into the class. Class limits have the same accuracy as the data values, that is, the same number of decimal places as the data values.
2. Class boundaries. They are halfway points that separate the classes. The lower class boundary of a given class is obtained by averaging the upper limit of the previous class and the lower limit of the given class. The upper class boundary of a given class is obtained by averaging the upper limit of the class and the lower limit of the next class.
3. Class marks. They are the midpoints of the classes. They are obtained by averaging the limits.
Example 3
Class          Frequency    Class limits    Class boundaries    Class mark    Class size
9.6 - 14.5     10           9.6, 14.5       9.55, 14.55         12.05         5
14.6 - 24.5    20           14.6, 24.5      14.55, 24.55        19.55         10
24.6 - 44.5    30           24.6, 44.5      24.55, 44.55        34.55         20
44.6 - 54.5    25           44.6, 54.5      44.55, 54.55        49.55         10
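The entries of Example 3 can be recomputed from the class limits alone. The Python sketch below is one way to do it, assuming the gap between the upper limit of one class and the lower limit of the next is the same throughout (0.1 here) and that the same half-gap applies below the first class and above the last. `Decimal` is used so that values such as 14.55 are exact. The class size is taken as the difference between the class boundaries.

```python
from decimal import Decimal

# Class limits from Example 3, kept as strings so Decimal stores them exactly.
limits = [("9.6", "14.5"), ("14.6", "24.5"), ("24.6", "44.5"), ("44.6", "54.5")]
limits = [(Decimal(lo), Decimal(hi)) for lo, hi in limits]

# Half the gap between an upper limit and the next lower limit (assumed uniform).
half_gap = (limits[1][0] - limits[0][1]) / 2

rows = []
for lo, hi in limits:
    lower_boundary = lo - half_gap      # same as averaging with the previous upper limit
    upper_boundary = hi + half_gap      # same as averaging with the next lower limit
    class_mark = (lo + hi) / 2          # midpoint of the class limits
    class_size = upper_boundary - lower_boundary
    rows.append((lower_boundary, upper_boundary, class_mark, class_size))

for row in rows:
    print(*row)
```

For the class 24.6 - 44.5, the class mark evaluates to (24.6 + 44.5) / 2 = 34.55, and the class sizes come out as 5, 10, 20 and 10.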
Exercise 3
Fill in the lower class limit, upper class limit, lower class boundary and upper class boundary for the following tables:
a)
Scores, X    Number of students, f    Lower Class Limit    Upper Class Limit    Lower Class Boundary    Upper Class Boundary
45 - 49      4
50 - 54      14
55 - 59      19
60 - 64      7
65 - 69      5
70 - 74      4
75 - 79      2
b)
Scores       Frequency                Lower Class Limit    Upper Class Limit    Lower Class Boundary    Upper Class Boundary
56 - 60      2
61 - 65      2
66 - 70      3
71 - 75      5
76 - 80      6
81 - 85      7
86 - 90      8
91 - 95      4
96 - 100     3