
Introduction

1.1 Origin of the report

In part 2 of the statistics course, our course teacher, Dr. Swapan Kumar Dhar, assigned us to prepare a report on “Bachelor’s Degrees Earned by Field”. The data set is secondary data collected from the website of the U.S. Census Bureau, as suggested by my course teacher. It records the number of bachelor’s degrees earned in different fields of education. The task was to tabulate and plot this data set and to carry out the required analysis with statistical tools.

1.2 Problem Statement

The problem statement was as follows: from the table of “Bachelor’s Degrees Earned by Field”, take a random sample of 28 fields, keeping ‘Business’ and ‘Mathematics and statistics’ fixed, and plot the data for each field; then describe the apparent trends and obtain the trend values by the least squares method. For ‘Business’ and ‘Mathematics and statistics’, the proportions are obtained individually, as it was required to find the confidence interval (CI) for these two fields. The purpose of this report is to compare the different data sets using various statistical tools.

1.3 Methodology

The data of this report is secondary data and has been collected from the website of the U.S. Census Bureau.

Discussion

As mentioned in the introduction, our data set was “Bachelor’s Degrees Earned by Field”. I selected a sample of 28 fields from the table, keeping ‘Business’ and ‘Mathematics and Statistics’ fixed. The data are shown in Table-1 below.

From this table the mean of the data has been obtained. The mean indicates the average number of students per sampled field who earned a degree in each year from 1980 to 2006. The mean is calculated separately for each year and shown in Table-2.

 

Table-1: Bachelor's Degrees Earned by Field

Field of Study                                         1980     1990     2000     2003     2004     2005     2006
Agriculture and natural resources                    22,802   12,900   24,238   23,294   22,835   23,002   23,053
Architecture and related services                     9,132    9,364    8,462    9,054    8,838    9,237    9,515
Area, ethnic, cultural, and gender studies            2,840    4,447    6,212    6,629    7,181    7,569    7,879
Biological and biomedical sciences                   46,190   37,204   63,005   60,072   61,509   64,611   69,178
Business                                            186,264  248,568  256,070  293,545  307,149  311,574  318,042
Communications technologies                           1,689    1,458    1,298    1,933    2,034    2,523    2,981
Computer and information sciences                    11,154   27,347   37,788   57,439   59,488   54,111   47,480
Education                                           118,038  105,112  108,034  105,790  106,278  105,451  107,238
Engineering and engineering technologies             69,387   82,480   73,419   77,267   78,227   79,743   81,223
  Engineering                                        58,896   64,509   58,822   62,611   63,558   64,906   67,045
  Engineering technologies                           10,491   17,971   14,597   14,656   14,669   14,837   14,178
English language and literature/letters              32,187   46,803   50,106   53,670   53,984   54,379   55,096
Family and consumer sciences/human sciences          18,411   13,514   16,321   18,166   19,172   20,074   20,775
Foreign languages, literatures, and linguistics      12,480   13,133   15,886   16,901   17,754   18,386   19,410
Health professions and related clinical sciences     63,848   58,983   80,863   71,223   73,934   80,685   91,973
Legal professions and studies                           683    1,632    1,969    2,466    2,841    3,161    3,302
Liberal arts and sciences, general studies           23,196   27,985   36,104   40,221   42,106   43,751   44,898
Library science                                         398       77      154       99       72       76       76
Mathematics and statistics                           11,378   14,276   11,418   12,493   13,327   14,351   14,770
Military technologies                                    38      196        7        6       10       40       33
Multi/interdisciplinary studies                      11,457   16,557   28,561   28,757   29,162   30,243   32,012
Parks, recreation, leisure and fitness studies        5,753    4,582   17,571   21,428   22,164   22,888   25,490
Philosophy and religious studies                      7,069    7,034    8,535   10,344   11,152   11,584   11,985
Physical sciences and science technologies           23,407   16,056   18,331   17,940   17,983   18,905   20,318
Public administration and social services            16,644   13,908   20,185   19,878   20,552   21,769   21,986
Security and protective services                     15,015   15,354   24,877   26,189   28,175   30,723   35,319
Social sciences and history                         103,662  118,083  127,101  143,218  150,357  156,892  161,485
Theology and religious vocations                      6,170    5,185    6,789    7,926    8,126    9,284    8,548

Table-2: Mean values of the 28 randomly selected fields

Field of Study                                         1980     1990     2000     2003     2004     2005     2006
Agriculture and natural resources                    22,802   12,900   24,238   23,294   22,835   23,002   23,053
Architecture and related services                     9,132    9,364    8,462    9,054    8,838    9,237    9,515
Area, ethnic, cultural, and gender studies            2,840    4,447    6,212    6,629    7,181    7,569    7,879
Biological and biomedical sciences                   46,190   37,204   63,005   60,072   61,509   64,611   69,178
Business                                            186,264  248,568  256,070  293,545  307,149  311,574  318,042
Communications technologies                           1,689    1,458    1,298    1,933    2,034    2,523    2,981
Computer and information sciences                    11,154   27,347   37,788   57,439   59,488   54,111   47,480
Education                                           118,038  105,112  108,034  105,790  106,278  105,451  107,238
Engineering and engineering technologies             69,387   82,480   73,419   77,267   78,227   79,743   81,223
  Engineering                                        58,896   64,509   58,822   62,611   63,558   64,906   67,045
  Engineering technologies                           10,491   17,971   14,597   14,656   14,669   14,837   14,178
English language and literature/letters              32,187   46,803   50,106   53,670   53,984   54,379   55,096
Family and consumer sciences/human sciences          18,411   13,514   16,321   18,166   19,172   20,074   20,775
Foreign languages, literatures, and linguistics      12,480   13,133   15,886   16,901   17,754   18,386   19,410
Health professions and related clinical sciences     63,848   58,983   80,863   71,223   73,934   80,685   91,973
Legal professions and studies                           683    1,632    1,969    2,466    2,841    3,161    3,302
Liberal arts and sciences, general studies           23,196   27,985   36,104   40,221   42,106   43,751   44,898
Library science                                         398       77      154       99       72       76       76
Mathematics and statistics                           11,378   14,276   11,418   12,493   13,327   14,351   14,770
Military technologies                                    38      196        7        6       10       40       33
Multi/interdisciplinary studies                      11,457   16,557   28,561   28,757   29,162   30,243   32,012
Parks, recreation, leisure and fitness studies        5,753    4,582   17,571   21,428   22,164   22,888   25,490
Philosophy and religious studies                      7,069    7,034    8,535   10,344   11,152   11,584   11,985
Physical sciences and science technologies           23,407   16,056   18,331   17,940   17,983   18,905   20,318
Public administration and social services            16,644   13,908   20,185   19,878   20,552   21,769   21,986
Security and protective services                     15,015   15,354   24,877   26,189   28,175   30,723   35,319
Social sciences and history                         103,662  118,083  127,101  143,218  150,357  156,892  161,485
Theology and religious vocations                      6,170    5,185    6,789    7,926    8,126    9,284    8,548
Mean                                                 31,739   35,169   39,883   42,972   44,380   45,527   46,975
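The per-year mean in Table-2 can be checked with a short sketch. It uses only the 1980 and 2006 columns as transcribed from the table; the helper name `column_mean` is introduced here purely for illustration and is not part of the original report.

```python
# Degrees earned in 1980 and 2006 for the 28 sampled fields,
# transcribed from the table in row order (assumed to be read correctly).
degrees_1980 = [
    22802, 9132, 2840, 46190, 186264, 1689, 11154, 118038, 69387, 58896,
    10491, 32187, 18411, 12480, 63848, 683, 23196, 398, 11378, 38,
    11457, 5753, 7069, 23407, 16644, 15015, 103662, 6170,
]
degrees_2006 = [
    23053, 9515, 7879, 69178, 318042, 2981, 47480, 107238, 81223, 67045,
    14178, 55096, 20775, 19410, 91973, 3302, 44898, 76, 14770, 33,
    32012, 25490, 11985, 20318, 21986, 35319, 161485, 8548,
]


def column_mean(values):
    """Arithmetic mean of one year's column: sum of the values divided by their count."""
    return sum(values) / len(values)


mean_1980 = column_mean(degrees_1980)
mean_2006 = column_mean(degrees_2006)
print(round(mean_1980), round(mean_2006))  # 31739 46975
```

Both rounded results agree with the Mean row for 1980 and 2006. Note that the 28 rows include 'Engineering' and 'Engineering technologies' as well as their combined row, so the mean is taken over all 28 listed rows exactly as tabulated.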

If we look closely at the table, we see that all these data are arranged with respect to time, so we can refer to them as a time series: a set of measurements on a particular quantity of interest, ordered over time. From the given information we can see that the data have changed over a long period of time, and the mean for each year indicates an increasing tendency in this data set. This general tendency of a time series over a fairly long period of time is termed the trend or secular trend. There are several methods to measure the secular trend of this data set:

1. Graphical Method
2. Semi-average Method
3. Moving average Method
4. Least Squares Method

To find the trend values for the given data, I have followed the Least Squares Method. The calculation for fitting the straight line is shown here.

The trend is assumed to be linear, so the trend equation is of the form y = a + bx, where a and b are two parameters. Applying the least squares method, the values of a and b are estimated as:

b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)

and

a = (Σy - bΣx) / n

When the origin is chosen so that Σx = 0, these reduce to a = Σy/n and b = Σxy/Σx².

Here the number of years (n) is 7, which is an odd number, so the middle year (2003) is taken as the origin, with x measured in units of 1 year (x = t - 2003).

Table-3: Calculation of fitting a straight line
(y = number of people who earned a Bachelor's degree)

Year (t)   x = t - 2003           y    x²            xy   Trend value (y = a + bx)
1980                -23     926,731   529   -21,314,813                  2,159,065
1990                -13   1,020,205   169   -13,262,665                  1,744,302
2000                 -3   1,169,302     9    -3,507,906                  1,329,540
2003                  0   1,268,060     0             0                  1,205,112
2004                  1   1,312,637     1     1,312,637                  1,163,636
2005                  2   1,348,141     4     2,696,282                  1,122,159
2006                  3   1,390,706     9     4,172,118                  1,080,683
Total               -33   8,435,782   721   -29,904,347

 

Now, from the equations,

a = 1146573.6

b = -40018.18169

I then obtained the trend value for each year and entered it in Table-3. The fitted trend line is shown in the following graph.

Figure 1: Fitting straight line trend
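The straight-line fit can be sketched in a few lines of code. This is a minimal illustration, not the calculation used in the report: it solves the general least-squares normal equations, because the years in Table-3 are unevenly spaced and Σx = -33 rather than 0, so the shortcut a = Σy/n, b = Σxy/Σx² does not apply. The function name `fit_line` is an assumption made for this example.

```python
# Year t and total number of bachelor's degrees y, taken from Table-3.
years = [1980, 1990, 2000, 2003, 2004, 2005, 2006]
totals = [926_731, 1_020_205, 1_169_302, 1_268_060,
          1_312_637, 1_348_141, 1_390_706]


def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x.

    Uses the general normal equations, which remain valid when sum(xs) != 0.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Shift the origin to 2003, as in Table-3.
xs = [t - 2003 for t in years]
a, b = fit_line(xs, totals)
trend = [a + b * x for x in xs]

for t, value in zip(years, trend):
    print(t, round(value))
```

Under these assumptions the sketch gives a ≈ 1,287,356 and b ≈ 17,446 per year, a rising line. These coefficients differ from the values of a and b reported above and from the trend column of Table-3, which follow the report's own calculation.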

The problem statement also required the proportion and the confidence interval (CI) of ‘Business’ and ‘Mathematics and Statistics’ individually for each year. To find the CI, the standard error (SE) of the sample proportion is also needed.

Table-4 shows the proportion and the CI for the ‘Business’ field. Here n = 28, X = the number of students who earned a degree in ‘Business’ each year, N = the total number of students who earned degrees in each year, and P = the population proportion.

T-Distribution Table Analysis

The T distribution is required for estimation whenever the sample size is 30 or less and the population standard deviation is not known. The table of T distribution values differs in construction from the Z table. The T table is more compact and shows the areas and T values for only a few percentages (10, 5, 2, and 1), because there is a different T distribution for each number of degrees of freedom and a complete table would be much lengthier.

In using the T table we must specify the degrees of freedom with which we are dealing. In this report on bachelor’s degrees earned, from a population of 36 fields we take a sample of 28 fields. That is,

Sample size, n = 28
Degrees of freedom, df = n - 1 = 28 - 1 = 27
Acceptable error (significance level), α = 0.05
Confidence level = 1 - α = 1 - 0.05 = 0.95 = 95%

Now we look down the 0.05 column of the T table until we reach the row for 27 degrees of freedom. There we find that the T value is 2.052, which we use to set the confidence limits of the proportions for BUSINESS and MATHEMATICS AND STATISTICS.

In order to find the proportions for BUSINESS and MATHEMATICS AND STATISTICS, we require the following equations.

Sample proportion, P̂ = Number of successes / Total number of outcomes = X/N

Standard error of the sample proportion, SE(P̂) = √(P̂(1 - P̂)/n)

Confidence interval of the proportion = P̂ ± t(α/2) · SE(P̂), that is, Lower confidence limit < P < Upper confidence limit
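The three formulas above can be sketched as a small function. This is an illustration under stated assumptions: the T value 2.052 and n = 28 are taken from the text, the inputs are the 'Business' count for 1980 (X = 186,264) and the 1980 total (N = 926,731), and the names `proportion_ci` and `n_inside_root` are hypothetical. The textbook standard error places n inside the square root; the tabulated 'Business' figures for 1980 (SE = 0.0143, 0.1716 < P < 0.2304) are reproduced when the root is divided by n instead, so the sketch exposes both variants for comparison.

```python
import math

T_CRIT = 2.052   # T value for df = 27 at the 0.05 level, as quoted in the text
N_FIELDS = 28    # sample size n used in the report


def proportion_ci(x, total, n=N_FIELDS, t=T_CRIT, n_inside_root=True):
    """Return (p_hat, se, lower, upper) for the proportion x / total.

    n_inside_root=True  -> se = sqrt(p_hat * (1 - p_hat) / n)   (textbook form)
    n_inside_root=False -> se = sqrt(p_hat * (1 - p_hat)) / n   (matches the tabulated values)
    """
    p_hat = x / total
    if n_inside_root:
        se = math.sqrt(p_hat * (1 - p_hat) / n)
    else:
        se = math.sqrt(p_hat * (1 - p_hat)) / n
    margin = t * se
    return p_hat, se, p_hat - margin, p_hat + margin


# 'Business', 1980: X = 186,264 degrees out of N = 926,731 in total.
textbook = proportion_ci(186_264, 926_731)
as_tabulated = proportion_ci(186_264, 926_731, n_inside_root=False)

print([round(v, 4) for v in as_tabulated])  # [0.201, 0.0143, 0.1716, 0.2304]
print([round(v, 4) for v in textbook])
```

With the textbook form the standard error is about 0.076, so the resulting interval is considerably wider than the tabulated one.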

Table-4: The proportion and CI of the ‘Business’ field

Year         X           N   p̂ = X/N   SE(p̂)   p̂ + t(α/2)·SE(p̂)   p̂ - t(α/2)·SE(p̂)   Range of CI
1980   186,264     926,731    0.2010   0.0143               0.2304               0.1716   0.1716 < P < 0.2304
1990   248,568   1,020,205    0.2436   0.0153               0.2751               0.2122   0.2122 < P < 0.2751
2000   256,070   1,169,302    0.2190   0.0148               0.2493               0.1887   0.1887 < P < 0.2493
2003   293,545   1,268,060    0.2315   0.0151               0.2624               0.2006   0.2006 < P < 0.2624
2004   307,149   1,312,637    0.2340   0.0151               0.2650               0.2030   0.2030 < P < 0.2650
2005   311,574   1,348,141    0.2311   0.0151               0.2620               0.2002   0.2002 < P < 0.2620
2006   318,042   1,390,706    0.2287   0.0150               0.2595               0.1979   0.1979 < P < 0.2595

From the table above, we have found the upper and lower limits of the population proportion at a confidence level of 100(1 - 0.05)% = 95%, where the error α = 0.05. These limits bound the proportion of all degree earners who earned their degree in the ‘Business’ field.

From the above figure, we can report with 95% confidence (α = 0.05) that the proportion for the BUSINESS field in a given year lies within our estimated confidence range. The figure is based on the sample proportion of the BUSINESS field in 2004.

Table-5 shows the proportion and the CI for the ‘Mathematics and Statistics’ field. Here n = 28, X = the number of students who earned a degree in ‘Mathematics and Statistics’ each year, N = the total number of students who earned degrees in each year, and P = the population proportion.

Table-5: The proportion and CI of the ‘Mathematics and Statistics’ field

Year         X           N   p̂ = X/N   SE(p̂)   p̂ + t(α/2)·SE(p̂)   p̂ - t(α/2)·SE(p̂)   Range of CI
1980    11,378     926,731    0.0123   0.0039              0.02035               0.0042   0.0042 < P < 0.02035
1990    14,276   1,020,205    0.0140   0.0042              0.02260               0.0054   0.0054 < P < 0.02260
2000    11,418   1,169,302    0.0098   0.0035              0.01697               0.0026   0.0026 < P < 0.01697
2003    12,493   1,268,060    0.0099   0.0035              0.01709               0.0026   0.0026 < P < 0.01709
2004    13,327   1,312,637    0.0102   0.0036              0.01750               0.0028   0.0028 < P < 0.01750
2005    14,351   1,348,141    0.0106   0.0037              0.01817               0.0031   0.0031 < P < 0.01817
2006    14,770   1,390,706    0.0106   0.0037              0.01813               0.0031   0.0031 < P < 0.01813

From the table above, we have found the upper and lower limits of the population proportion at a confidence level of 100(1 - 0.05)% = 95%, where the error α = 0.05. These limits bound the proportion of all degree earners who earned their degree in the ‘Mathematics and Statistics’ field.


From the above figure, we can report with 95% confidence (α = 0.05) that the proportion for the MATHEMATICS & STATISTICS field in a given year lies within our estimated confidence range. The figure is based on the sample proportion of the MATHEMATICS & STATISTICS field in 2004.

CONCLUSION

3.1 ENDING SUMMARY

This report presents a statistical analysis of bachelor’s degrees earned by field, carried out with suitable statistical tools.

After interpreting all the data, I have found the following characteristic of the given data set.

With 95% confidence (α = 0.05), the proportion for each field in a given year lies within our estimated confidence range. This conclusion is based on the sample proportions of the BUSINESS and MATHEMATICS & STATISTICS fields.