In the statistics course Part 2, our course teacher, Dr. Swapan Kumar Dhar, assigned us a report on "Bachelor's Degrees Earned by Field". The data set is secondary data collected from the website of the U.S. Census Bureau, as suggested by my course teacher. It records the number of bachelor's degrees earned in different fields of education. The task was to tabulate and plot this data set and to carry out the required analysis with statistical tools.
The problem statement was as follows. From the table "Bachelor's Degrees Earned by Field", I took a random sample of 28 fields, keeping 'Business' and 'Mathematics and statistics' fixed in the sample, and plotted the data for each field. I then described the apparent trend and obtained the trend values by the least squares method. Finally, the proportions for 'Business' and 'Mathematics and statistics' were obtained individually, because the confidence interval (CI) of each of these two subjects was required. The purpose of this report is to compare the different data sets using these statistical tools.
The data of this report is secondary data and has been collected from the website of the U.S. Census Bureau.
As mentioned in the introduction, our data set is "Bachelor's Degrees Earned by Field". From it I selected a sample of 28 fields, keeping 'Business' and 'Mathematics and Statistics' fixed in the sample. The data are shown in Table 1.
From this table, the mean of the data has been obtained for each year. The mean gives the average number of students per field who earned a bachelor's degree in the sampled fields of education. The yearly means for 1980 to 2006 are shown in Table 2.
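The yearly means in Table 2 can be reproduced from the Table 1 counts. Each year's mean is the column total divided by the 28 fields. This is a minimal sketch in Python, which is an assumption, since the report itself uses no code. The names `DEGREES` and `yearly_means` are hypothetical, and the counts are transcribed from Table 1.

```python
# Minimal sketch: yearly means of the 28 sampled fields.
# Counts transcribed from Table 1; columns are 1980, 1990, 2000, 2003, 2004, 2005, 2006.
YEARS = [1980, 1990, 2000, 2003, 2004, 2005, 2006]

DEGREES = {  # hypothetical name for the Table 1 data
    "Agriculture and natural resources": [22802, 12900, 24238, 23294, 22835, 23002, 23053],
    "Architecture and related services": [9132, 9364, 8462, 9054, 8838, 9237, 9515],
    "Area, ethnic, cultural, and gender studies": [2840, 4447, 6212, 6629, 7181, 7569, 7879],
    "Biological and biomedical sciences": [46190, 37204, 63005, 60072, 61509, 64611, 69178],
    "Business": [186264, 248568, 256070, 293545, 307149, 311574, 318042],
    "Communications technologies": [1689, 1458, 1298, 1933, 2034, 2523, 2981],
    "Computer and information sciences": [11154, 27347, 37788, 57439, 59488, 54111, 47480],
    "Education": [118038, 105112, 108034, 105790, 106278, 105451, 107238],
    "Engineering and engineering technologies": [69387, 82480, 73419, 77267, 78227, 79743, 81223],
    "Engineering": [58896, 64509, 58822, 62611, 63558, 64906, 67045],
    "Engineering technologies": [10491, 17971, 14597, 14656, 14669, 14837, 14178],
    "English language and literature/letters": [32187, 46803, 50106, 53670, 53984, 54379, 55096],
    "Family and consumer sciences/human sciences": [18411, 13514, 16321, 18166, 19172, 20074, 20775],
    "Foreign languages, literatures, and linguistics": [12480, 13133, 15886, 16901, 17754, 18386, 19410],
    "Health professions and related clinical sciences": [63848, 58983, 80863, 71223, 73934, 80685, 91973],
    "Legal professions and studies": [683, 1632, 1969, 2466, 2841, 3161, 3302],
    "Liberal arts and sciences, general studies": [23196, 27985, 36104, 40221, 42106, 43751, 44898],
    "Library science": [398, 77, 154, 99, 72, 76, 76],
    "Mathematics and statistics": [11378, 14276, 11418, 12493, 13327, 14351, 14770],
    "Military technologies": [38, 196, 7, 6, 10, 40, 33],
    "Multi/interdisciplinary studies": [11457, 16557, 28561, 28757, 29162, 30243, 32012],
    "Parks, recreation, leisure and fitness studies": [5753, 4582, 17571, 21428, 22164, 22888, 25490],
    "Philosophy and religious studies": [7069, 7034, 8535, 10344, 11152, 11584, 11985],
    "Physical sciences and science technologies": [23407, 16056, 18331, 17940, 17983, 18905, 20318],
    "Public administration and social services": [16644, 13908, 20185, 19878, 20552, 21769, 21986],
    "Security and protective services": [15015, 15354, 24877, 26189, 28175, 30723, 35319],
    "Social sciences and history": [103662, 118083, 127101, 143218, 150357, 156892, 161485],
    "Theology and religious vocations": [6170, 5185, 6789, 7926, 8126, 9284, 8548],
}


def yearly_means(table: dict[str, list[int]]) -> dict[int, float]:
    """Arithmetic mean of each year's column: total degrees / number of fields."""
    n_fields = len(table)
    return {
        year: sum(counts[i] for counts in table.values()) / n_fields
        for i, year in enumerate(YEARS)
    }


if __name__ == "__main__":
    for year, mean in yearly_means(DEGREES).items():
        print(f"{year}: {mean:,.1f}")
```

Printing to one decimal shows, for example, that the 1990 mean is exactly 35,168.5, which Table 2 rounds to 35,169.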
Table 1: Bachelor's Degrees Earned by Field
Field of Study | 1980 | 1990 | 2000 | 2003 | 2004 | 2005 | 2006
Agriculture and natural resources | 22,802 | 12,900 | 24,238 | 23,294 | 22,835 | 23,002 | 23,053
Architecture and related services | 9,132 | 9,364 | 8,462 | 9,054 | 8,838 | 9,237 | 9,515
Area, ethnic, cultural, and gender studies | 2,840 | 4,447 | 6,212 | 6,629 | 7,181 | 7,569 | 7,879
Biological and biomedical sciences | 46,190 | 37,204 | 63,005 | 60,072 | 61,509 | 64,611 | 69,178
Business | 186,264 | 248,568 | 256,070 | 293,545 | 307,149 | 311,574 | 318,042
Communications technologies | 1,689 | 1,458 | 1,298 | 1,933 | 2,034 | 2,523 | 2,981
Computer and information sciences | 11,154 | 27,347 | 37,788 | 57,439 | 59,488 | 54,111 | 47,480
Education | 118,038 | 105,112 | 108,034 | 105,790 | 106,278 | 105,451 | 107,238
Engineering and engineering technologies | 69,387 | 82,480 | 73,419 | 77,267 | 78,227 | 79,743 | 81,223
Engineering | 58,896 | 64,509 | 58,822 | 62,611 | 63,558 | 64,906 | 67,045
Engineering technologies | 10,491 | 17,971 | 14,597 | 14,656 | 14,669 | 14,837 | 14,178
English language and literature/letters | 32,187 | 46,803 | 50,106 | 53,670 | 53,984 | 54,379 | 55,096
Family and consumer sciences/human sciences | 18,411 | 13,514 | 16,321 | 18,166 | 19,172 | 20,074 | 20,775
Foreign languages, literatures, and linguistics | 12,480 | 13,133 | 15,886 | 16,901 | 17,754 | 18,386 | 19,410
Health professions and related clinical sciences | 63,848 | 58,983 | 80,863 | 71,223 | 73,934 | 80,685 | 91,973
Legal professions and studies | 683 | 1,632 | 1,969 | 2,466 | 2,841 | 3,161 | 3,302
Liberal arts and sciences, general studies | 23,196 | 27,985 | 36,104 | 40,221 | 42,106 | 43,751 | 44,898
Library science | 398 | 77 | 154 | 99 | 72 | 76 | 76
Mathematics and statistics | 11,378 | 14,276 | 11,418 | 12,493 | 13,327 | 14,351 | 14,770
Military technologies | 38 | 196 | 7 | 6 | 10 | 40 | 33
Multi/interdisciplinary studies | 11,457 | 16,557 | 28,561 | 28,757 | 29,162 | 30,243 | 32,012
Parks, recreation, leisure and fitness studies | 5,753 | 4,582 | 17,571 | 21,428 | 22,164 | 22,888 | 25,490
Philosophy and religious studies | 7,069 | 7,034 | 8,535 | 10,344 | 11,152 | 11,584 | 11,985
Physical sciences and science technologies | 23,407 | 16,056 | 18,331 | 17,940 | 17,983 | 18,905 | 20,318
Public administration and social services | 16,644 | 13,908 | 20,185 | 19,878 | 20,552 | 21,769 | 21,986
Security and protective services | 15,015 | 15,354 | 24,877 | 26,189 | 28,175 | 30,723 | 35,319
Social sciences and history | 103,662 | 118,083 | 127,101 | 143,218 | 150,357 | 156,892 | 161,485
Theology and religious vocations | 6,170 | 5,185 | 6,789 | 7,926 | 8,126 | 9,284 | 8,548
Table 2: Mean values of the 28 randomly selected fields
Field of Study | 1980 | 1990 | 2000 | 2003 | 2004 | 2005 | 2006
Agriculture and natural resources | 22,802 | 12,900 | 24,238 | 23,294 | 22,835 | 23,002 | 23,053
Architecture and related services | 9,132 | 9,364 | 8,462 | 9,054 | 8,838 | 9,237 | 9,515
Area, ethnic, cultural, and gender studies | 2,840 | 4,447 | 6,212 | 6,629 | 7,181 | 7,569 | 7,879
Biological and biomedical sciences | 46,190 | 37,204 | 63,005 | 60,072 | 61,509 | 64,611 | 69,178
Business | 186,264 | 248,568 | 256,070 | 293,545 | 307,149 | 311,574 | 318,042
Communications technologies | 1,689 | 1,458 | 1,298 | 1,933 | 2,034 | 2,523 | 2,981
Computer and information sciences | 11,154 | 27,347 | 37,788 | 57,439 | 59,488 | 54,111 | 47,480
Education | 118,038 | 105,112 | 108,034 | 105,790 | 106,278 | 105,451 | 107,238
Engineering and engineering technologies | 69,387 | 82,480 | 73,419 | 77,267 | 78,227 | 79,743 | 81,223
Engineering | 58,896 | 64,509 | 58,822 | 62,611 | 63,558 | 64,906 | 67,045
Engineering technologies | 10,491 | 17,971 | 14,597 | 14,656 | 14,669 | 14,837 | 14,178
English language and literature/letters | 32,187 | 46,803 | 50,106 | 53,670 | 53,984 | 54,379 | 55,096
Family and consumer sciences/human sciences | 18,411 | 13,514 | 16,321 | 18,166 | 19,172 | 20,074 | 20,775
Foreign languages, literatures, and linguistics | 12,480 | 13,133 | 15,886 | 16,901 | 17,754 | 18,386 | 19,410
Health professions and related clinical sciences | 63,848 | 58,983 | 80,863 | 71,223 | 73,934 | 80,685 | 91,973
Legal professions and studies | 683 | 1,632 | 1,969 | 2,466 | 2,841 | 3,161 | 3,302
Liberal arts and sciences, general studies | 23,196 | 27,985 | 36,104 | 40,221 | 42,106 | 43,751 | 44,898
Library science | 398 | 77 | 154 | 99 | 72 | 76 | 76
Mathematics and statistics | 11,378 | 14,276 | 11,418 | 12,493 | 13,327 | 14,351 | 14,770
Military technologies | 38 | 196 | 7 | 6 | 10 | 40 | 33
Multi/interdisciplinary studies | 11,457 | 16,557 | 28,561 | 28,757 | 29,162 | 30,243 | 32,012
Parks, recreation, leisure and fitness studies | 5,753 | 4,582 | 17,571 | 21,428 | 22,164 | 22,888 | 25,490
Philosophy and religious studies | 7,069 | 7,034 | 8,535 | 10,344 | 11,152 | 11,584 | 11,985
Physical sciences and science technologies | 23,407 | 16,056 | 18,331 | 17,940 | 17,983 | 18,905 | 20,318
Public administration and social services | 16,644 | 13,908 | 20,185 | 19,878 | 20,552 | 21,769 | 21,986
Security and protective services | 15,015 | 15,354 | 24,877 | 26,189 | 28,175 | 30,723 | 35,319
Social sciences and history | 103,662 | 118,083 | 127,101 | 143,218 | 150,357 | 156,892 | 161,485
Theology and religious vocations | 6,170 | 5,185 | 6,789 | 7,926 | 8,126 | 9,284 | 8,548
Mean | 31,739 | 35,169 | 39,883 | 42,972 | 44,380 | 45,527 | 46,975
Looking closely at the table, we see that all these data are arranged with respect to time, so they can be treated as a time series: a set of measurements of a particular quantity of interest, ordered over time. The data have changed over a long period, and the yearly means show an increasing tendency. The general tendency of a time series over a fairly long period of time is termed the trend, or secular trend. There are several methods for measuring the secular trend of this data set:
1. Graphical Method
2. Semi-average Method
3. Moving Average Method
4. Least Squares Method
To find the trend values of the given data, I have followed the Least Squares Method. The calculation for fitting the straight line is shown in Table 3.
Assuming the trend is linear, the trend equation is of the form $y_c = a + bx$, where a and b are the two parameters to be estimated. Applying the least squares method, a and b are obtained from the normal equations

$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$

When the origin is chosen so that $\sum x = 0$, these reduce to $a = \dfrac{\sum y}{n}$ and $b = \dfrac{\sum xy}{\sum x^2}$.

Here the number of years is n = 7, an odd number, so the origin is placed at the middle year, 2003, and the unit of x is 1 year; that is, x = t − 2003.
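The column totals of Table 3 can be checked with a short sketch. It assumes the yearly totals y exactly as printed in Table 3, and the variable names are only illustrative. The sketch also shows one detail of this data. Because the seven years are not equally spaced, using the middle year 2003 as origin gives Σx = −33, not 0.

```python
# Minimal sketch: the column totals of Table 3 (y values as printed there).
years = [1980, 1990, 2000, 2003, 2004, 2005, 2006]
y = [926_731, 1_020_205, 1_169_302, 1_268_060, 1_312_637, 1_348_141, 1_390_706]

ORIGIN = 2003                      # middle (4th) of the seven years
x = [t - ORIGIN for t in years]    # x = t - 2003, unit: 1 year

sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(xi ** 2 for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

print("x      :", x)       # [-23, -13, -3, 0, 1, 2, 3]
print("sum x  :", sum_x)   # -33
print("sum y  :", sum_y)   # 8435782
print("sum x^2:", sum_x2)  # 721
print("sum xy :", sum_xy)  # -29904347
```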
Table 3: Calculation of fitting a straight line
Year (t) | x = t − 2003 | No. of people who earned a bachelor's degree (y) | x² | xy | Trend value (y_c = a + bx)
1980 | −23 | 926,731 | 529 | −21,314,813 | 2,159,065
1990 | −13 | 1,020,205 | 169 | −13,262,665 | 1,744,302
2000 | −3 | 1,169,302 | 9 | −3,507,906 | 1,329,540
2003 | 0 | 1,268,060 | 0 | 0 | 1,205,112
2004 | 1 | 1,312,637 | 1 | 1,312,637 | 1,163,636
2005 | 2 | 1,348,141 | 4 | 2,696,282 | 1,122,159
2006 | 3 | 1,390,706 | 9 | 4,172,118 | 1,080,683
Total | −33 | 8,435,782 | 721 | −29,904,347 |
Now, from the equations,
a = 1146573.6
b = 40018.18169
Substituting each year's value of x into the trend equation gives the trend values listed in Table 3. The fitted trend line is shown in Figure 1.
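The normal equations can be solved directly with Cramer's rule. For comparison, this sketch also computes the shortcut a = Σy/n, b = Σxy/Σx². It uses Python with illustrative variable names and the same Table 3 data, and it is not the report's own computation.

The trend values listed in Table 3 match the shortcut (for example, 1,205,112 at x = 0). That shortcut is exact only when Σx = 0. With Σx = −33, the full solution gives a positive slope under these assumptions, which agrees with the increasing yearly means.

```python
# Minimal sketch: least-squares line y_c = a + b*x fitted to the Table 3 data.
years = [1980, 1990, 2000, 2003, 2004, 2005, 2006]
y = [926_731, 1_020_205, 1_169_302, 1_268_060, 1_312_637, 1_348_141, 1_390_706]
x = [t - 2003 for t in years]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(xi ** 2 for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Normal equations:
#   sum_y  = n*a     + b*sum_x
#   sum_xy = a*sum_x + b*sum_x2
# solved for a and b with Cramer's rule.
det = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / det
b = (n * sum_xy - sum_x * sum_y) / det

# Shortcut formulas, exact only when sum_x == 0 (here sum_x = -33).
a_short = sum_y / n
b_short = sum_xy / sum_x2

print(f"full normal equations: a = {a:,.2f}, b = {b:,.2f}")              # 1,287,356.08 and 17,445.77
print(f"shortcut formulas:     a = {a_short:,.2f}, b = {b_short:,.2f}")  # 1,205,111.71 and -41,476.21
for t, xi in zip(years, x):
    print(t, round(a + b * xi))   # trend value from the full solution
```

A quick check on any least-squares line with an intercept is that the residuals y − y_c sum to zero. The full solution satisfies this, but the shortcut does not when Σx ≠ 0.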
Figure 1: Fitting straight line trend
The problem statement also requires the proportion and the confidence interval (CI) of 'Business' and 'Mathematics and Statistics' individually for each year. To find the CI, the standard error (SE) of the sample proportion is also needed.
Tables 4 and 5 show the proportion and the CI for these two fields. In both tables, n = 28 is the sample size, X is the number of students in the field who earned the degree in a given year, N is the total number of students who earned degrees in that year, and P is the population proportion.
T Distribution Table Analysis
The t distribution is used for estimation when the sample size is 30 or less and the population standard deviation is not known. The t table is constructed differently from the Z table. There is a different t distribution for each number of degrees of freedom, so a complete table would be very long. The t table is therefore more compact and shows t values for only a few tail percentages (10, 5, 2 and 1).
To use the t table, we must specify the degrees of freedom we are dealing with. In this report, the population consists of 36 fields of study, from which we take a sample of 28 fields. That is:
Sample size, n = 28; degrees of freedom, df = n − 1 = 28 − 1 = 27; significance level, α = 0.05; confidence level = 1 − α = 1 − 0.05 = 0.95 = 95%.
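The critical value for df = 27 can also be computed instead of read from a printed t table. This minimal sketch uses only the Python standard library. It integrates the t density numerically with Simpson's rule and then solves P(T ≤ t) = 1 − α/2 by bisection. The helper names `t_pdf`, `t_cdf` and `t_critical` are hypothetical and do not come from any library.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): 0.5 plus a Simpson's-rule integral of the density over [0, x]."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)   # the t distribution is symmetric about 0
    h = x / steps                           # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(alpha: float, df: int) -> float:
    """Two-tailed critical value t_{alpha/2}: solves t_cdf(t) = 1 - alpha/2 by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    n = 28
    print(f"t_(0.025, {n - 1}) = {t_critical(0.05, n - 1):.3f}")   # 2.052
```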
We now look down the 0.05 column of the t table until we reach the row for 27 degrees of freedom. There we find $t_{\alpha/2} = 2.052$, which we use to set the confidence limits for the proportions of 'Business' and 'Mathematics and Statistics'.
To find the proportions of 'Business' and 'Mathematics and Statistics', we need the following formulas.
Sample proportion:
$$\hat{p} = \frac{\text{number of successes}}{\text{total number of outcomes}} = \frac{X}{N}$$
Standard error of the sample proportion:
$$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Confidence interval for the population proportion, $\hat{p} \pm t_{\alpha/2}\,SE(\hat{p})$, i.e. lower confidence limit < P < upper confidence limit:
$$\hat{p} - t_{\alpha/2}\,SE(\hat{p}) < P < \hat{p} + t_{\alpha/2}\,SE(\hat{p})$$
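The three formulas above can be applied to a single year. This minimal sketch assumes X = 186,264 'Business' degrees in 1980 (Table 1), N = 926,731 (Table 3), n = 28 and t_{α/2} = 2.052. The helper name `proportion_ci` is hypothetical.

The SE values printed in Tables 4 and 5 follow a different formula. For example, the 1980 row with X = 186,264 shows 0.0143, which matches √(p̂(1 − p̂))/n rather than √(p̂(1 − p̂)/n). The sketch follows the formula as written above and also prints the tabulated variant, so the two can be compared.

```python
import math

T_CRIT = 2.052   # t_{alpha/2} for df = 27, read from the t table
N_SAMPLE = 28    # n, the number of sampled fields


def proportion_ci(x: int, N: int, n: int = N_SAMPLE, t: float = T_CRIT):
    """Return (p_hat, SE, lower, upper) with SE = sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / N
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, p_hat - t * se, p_hat + t * se


p_hat, se, lower, upper = proportion_ci(186_264, 926_731)
print(f"p_hat = {p_hat:.4f}, SE = {se:.4f}, {lower:.4f} < P < {upper:.4f}")
# p_hat = 0.2010, SE = 0.0757, 0.0456 < P < 0.3564

# Variant that reproduces the SE column printed in Tables 4 and 5.
se_tabulated = math.sqrt(p_hat * (1 - p_hat)) / N_SAMPLE
print(f"tabulated-style SE = {se_tabulated:.4f}")   # 0.0143
```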
Table 4: The proportion and CI of the 'Mathematics and Statistics' field
Year | X | N | Proportion p̂ = X/N | SE(p̂) = √(p̂(1 − p̂)/n) | p̂ + t_{α/2} SE(p̂) | p̂ − t_{α/2} SE(p̂) | Range of CI
1980 | 11,378 | 926,731 | 0.0123 | 0.0039 | 0.02035 | 0.0042 | 0.0042 < P < 0.02035
1990 | 14,276 | 1,020,205 | 0.0140 | 0.0042 | 0.02260 | 0.0054 | 0.0054 < P < 0.02260
2000 | 11,418 | 1,169,302 | 0.0098 | 0.0035 | 0.01697 | 0.0026 | 0.0026 < P < 0.01697
2003 | 12,493 | 1,268,060 | 0.0099 | 0.0035 | 0.01709 | 0.0026 | 0.0026 < P < 0.01709
2004 | 13,327 | 1,312,637 | 0.0102 | 0.0036 | 0.01750 | 0.0028 | 0.0028 < P < 0.01750
2005 | 14,351 | 1,348,141 | 0.0106 | 0.0037 | 0.01817 | 0.0031 | 0.0031 < P < 0.01817
2006 | 14,770 | 1,390,706 | 0.0106 | 0.0037 | 0.01813 | 0.0031 | 0.0031 < P < 0.01813
From the table above, we have found the upper and lower limits of the population proportion P at the 100(1 − 0.05)% = 95% confidence level, where α = 0.05. These limits bound the proportion of all bachelor's degrees earned in each year that were earned in the 'Mathematics and Statistics' field.
From the above figure, we can report that, with 95% confidence (α = 0.05), the population proportion of the 'Mathematics and Statistics' field in a given year lies within our estimated confidence interval. The figure is based on the sample proportion of the 'Mathematics and Statistics' field in 2004.
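The same calculation can be looped over all seven years. This sketch uses the 'Mathematics and statistics' counts from Table 1 and the yearly totals N from Table 3. It applies the SE convention √(p̂(1 − p̂))/n, which is the one that matches the printed values. Under that assumption, it reproduces the intervals of Table 4, for example 0.0042 < P < 0.02035 for 1980. The helper name `tabulated_ci` is hypothetical.

```python
import math

T_CRIT = 2.052   # t_{alpha/2}, df = 27
n = 28           # number of sampled fields

years = [1980, 1990, 2000, 2003, 2004, 2005, 2006]
x_math = [11_378, 14_276, 11_418, 12_493, 13_327, 14_351, 14_770]   # Table 1
N_total = [926_731, 1_020_205, 1_169_302, 1_268_060,
           1_312_637, 1_348_141, 1_390_706]                         # Table 3


def tabulated_ci(x: int, N: int) -> tuple[float, float, float, float]:
    """(p_hat, SE, lower, upper) using SE = sqrt(p_hat(1 - p_hat)) / n."""
    p_hat = x / N
    se = math.sqrt(p_hat * (1 - p_hat)) / n
    return p_hat, se, p_hat - T_CRIT * se, p_hat + T_CRIT * se


rows = {t: tabulated_ci(x, N) for t, x, N in zip(years, x_math, N_total)}
for t, (p_hat, se, lower, upper) in rows.items():
    print(f"{t}: p_hat = {p_hat:.4f}, SE = {se:.4f}, {lower:.4f} < P < {upper:.5f}")
```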
Table 5 shows the proportion and CI of the 'Business' field. Here n = 28, X is the number of students in the 'Business' field who earned the degree each year, N is the total number of students who earned degrees in each year, and P is the population proportion.
Table 5: The proportion and CI of the 'Business' field
Year | X | N | Proportion p̂ = X/N | SE(p̂) = √(p̂(1 − p̂)/n) | p̂ + t_{α/2} SE(p̂) | p̂ − t_{α/2} SE(p̂) | Range of CI
1980 | 186,264 | 926,731 | 0.2010 | 0.0143 | 0.2304 | 0.1716 | 0.1716 < P < 0.2304
1990 | 248,568 | 1,020,205 | 0.2436 | 0.0153 | 0.2751 | 0.2122 | 0.2122 < P < 0.2751
2000 | 256,070 | 1,169,302 | 0.2190 | 0.0148 | 0.2493 | 0.1887 | 0.1887 < P < 0.2493
2003 | 293,545 | 1,268,060 | 0.2315 | 0.0151 | 0.2624 | 0.2006 | 0.2006 < P < 0.2624
2004 | 307,149 | 1,312,637 | 0.2340 | 0.0151 | 0.2650 | 0.2030 | 0.2030 < P < 0.2650
2005 | 311,574 | 1,348,141 | 0.2311 | 0.0151 | 0.2620 | 0.2002 | 0.2002 < P < 0.2620
2006 | 318,042 | 1,390,706 | 0.2287 | 0.0150 | 0.2595 | 0.1979 | 0.1979 < P < 0.2595
From the table above, we have found the upper and lower limits of the population proportion P at the 100(1 − 0.05)% = 95% confidence level, where α = 0.05. These limits bound the proportion of all bachelor's degrees earned in each year that were earned in the 'Business' field.
From the above figure, we can report that, with 95% confidence (α = 0.05), the population proportion of the 'Business' field in a given year lies within our estimated confidence interval. The figure is based on the sample proportion of the 'Business' field in 2004.
3.1 ENDING SUMMARY
This report presents a statistical analysis of bachelor's degrees earned by field, using suitable statistical tools.
After interpreting all the data, I found the following characteristics of the data set.
With 95% confidence (α = 0.05), the population proportion of each of the 'Business' and 'Mathematics and Statistics' fields in a given year lies within the estimated confidence interval for that year, based on the sample proportions of the two fields.