
Probability and Statistics Assignment 1 Vishnu Prasad V

FPM/13/10/M

In the Boston data, the primary variable of interest is MEDV, the median value of owner-occupied homes (in $1000s). To understand it, we apply descriptive statistics to the available data to find the trends and outliers it contains. MEDV has a mean of 22.53 and a standard deviation of 9.19. The distribution is positively (right) skewed, with a skewness of 1.108 and a kurtosis of 1.49; consistent with this, the mean lies above the median of 21.2.
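The skewness and kurtosis values can be reproduced with the formulas below. This is a minimal sketch that assumes the figures come from Excel's SKEW and KURT formulas (the layout of the summary table suggests Excel's Descriptive Statistics tool, but the report does not say so). The function names are hypothetical and the samples are toy data, not the Boston values.

```python
from statistics import mean, stdev


def excel_skew(xs):
    """Adjusted Fisher-Pearson sample skewness (the formula behind Excel's SKEW)."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)  # stdev() divides by n - 1 (sample standard deviation)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def excel_kurt(xs):
    """Sample excess kurtosis (the formula behind Excel's KURT); about 0 for normal data."""
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    fourth = sum(((x - m) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Toy samples for illustration only
print(excel_skew([1, 2, 3, 4, 5]))    # symmetric sample: approximately 0
print(excel_kurt([1, 2, 3, 4, 5]))    # flat sample: about -1.2
print(excel_skew([1, 1, 1, 2, 10]))   # long right tail: positive skewness
```

A positive result from `excel_skew`, as for Medv, indicates a longer right tail.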

Medv

Mean 22.53
Standard Error 0.4088
Median 21.2
Mode 50
Standard Deviation 9.19
Sample Variance 84.58
Kurtosis 1.49
Skewness 1.108
Range 45
Minimum 5
Maximum 50
Sum 11401.6
Count 506
Largest(1) 50
Smallest(1) 5
Confidence Level(95.0%) 0.80
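The standard error and the 95% confidence level in the table follow from the standard deviation and the count. The sketch below uses the normal critical value (about 1.96); Excel's tool is generally described as using a t critical value, which for 505 degrees of freedom is about 1.965, so the difference here is negligible. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values copied from the Medv summary table
mean_medv, sd, n = 22.53, 9.19, 506

std_error = sd / math.sqrt(n)          # standard error of the mean
z = NormalDist().inv_cdf(0.975)        # two-sided 95% normal critical value, ~1.96
half_width = z * std_error             # the "Confidence Level(95.0%)" figure
low, high = mean_medv - half_width, mean_medv + half_width

print(f"SE = {std_error:.4f}, half-width = {half_width:.2f}")
print(f"95% CI for the mean of Medv: ({low:.2f}, {high:.2f})")
```

The small gap between the computed standard error and the table's 0.4088 comes from the standard deviation being rounded to 9.19.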

In the box plot of Medv the whiskers are at 5 and 37, so the observations above 37 lie beyond the upper whisker and are treated as outliers. These high-end outliers are largely a consequence of the positive skew in the data.
Because we also want to understand how Medv relates to the other variables, we look at its skewness, its correlation and covariance with the other variables, and scatter plots of Medv against the most strongly related ones.
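The report does not say how the box-plot whiskers were drawn. A minimal sketch, assuming Tukey's common convention: points more than 1.5 × IQR beyond the quartiles are outliers, and each whisker ends at the most extreme point still inside that fence. Quartile conventions differ between tools, so the `method="inclusive"` choice is an assumption, and `tukey_whiskers` is a hypothetical helper.

```python
from statistics import quantiles


def tukey_whiskers(xs, k=1.5):
    """Return (lower_whisker, upper_whisker, outliers) under Tukey's k * IQR rule."""
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")  # linear-interpolation quartiles
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    outliers = [x for x in xs if x < lo_fence or x > hi_fence]
    # Whiskers stop at the most extreme observations that are not outliers
    return min(inside), max(inside), outliers


# Toy data with one large value, mimicking a right-skewed variable
print(tukey_whiskers([5, 6, 7, 8, 9, 10, 50]))
```

Applied to Medv, the returned `outliers` list would hold the observations above the upper whisker.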
The histogram shows the skewness directly: the data are concentrated between 13 and 25, a range that holds roughly 70% of the data points, with fewer values spread toward the upper end.
[Figure: Histogram of Medv, frequency and cumulative % by bin]
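A frequency table with cumulative percentages, like the one behind the histogram, and the share of values in a range (such as 13 to 25) can be computed as below. This sketch assumes Excel's histogram convention, where a value goes into the first bin whose upper edge is greater than or equal to it, plus a final "More" bin; the bin edges are not given in the report, so the helpers and the toy data are illustrative only.

```python
from bisect import bisect_left


def frequency_table(xs, upper_edges):
    """Histogram counts and cumulative % for ascending upper bin edges.

    Bin i holds values in (upper_edges[i-1], upper_edges[i]]; the last
    count is the 'More' bin for values above the final edge.
    """
    counts = [0] * (len(upper_edges) + 1)
    for x in xs:
        counts[bisect_left(upper_edges, x)] += 1
    cumulative, running = [], 0
    for c in counts:
        running += c
        cumulative.append(100 * running / len(xs))
    return counts, cumulative


def share_in_range(xs, lo, hi):
    """Fraction of observations with lo <= x <= hi."""
    return sum(lo <= x <= hi for x in xs) / len(xs)


sample = [1, 2, 2, 3, 10]  # toy data, not the Boston values
print(frequency_table(sample, [1, 2, 3]))
print(share_in_range(sample, 2, 3))
```

With the Medv column in place of `sample`, `share_in_range(medv, 13, 25)` would check the roughly 70% figure.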

To understand how Medv varies with the other variables, we computed its correlation with each of them. The row below summarizes these correlations.
crim zn indus chas nox rm age dis rad tax ptratio black lstat medv
medv -0.39 0.36 -0.48 0.18 -0.43 0.70 -0.38 0.25 -0.38 -0.47 -0.51 0.33 -0.74 1
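Each entry in this row is a Pearson correlation coefficient. A minimal sketch of the computation in plain Python (the function name is hypothetical; Python 3.10+ also ships `statistics.correlation`, which computes the same quantity):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-deviation of x and y scaled by both spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data: a perfectly increasing and a perfectly decreasing relation
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [3, 2, 1]))
```

Passing the Medv column and, say, the rm column would reproduce the 0.70 in the row above.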

The average number of rooms per dwelling (rm) is strongly and positively correlated with Medv (r = 0.70): towns where homes have more rooms on average tend to have higher median home values.
The proportion of non-retail business (industrial) acres per town (indus) is negatively correlated with Medv (r = -0.48): as the industrial share of a town increases, Medv tends to decrease.
There is also a strong negative correlation (r = -0.74) between the percentage of lower-status population (lstat) and Medv, which agrees with what one would intuitively expect.

Correlation between variables


crim zn indus chas nox rm age dis rad tax ptratio black lstat medv
crim 1
zn -0.20 1
indus 0.41 -0.53 1
chas -0.06 -0.04 0.06 1
nox 0.42 -0.52 0.76 0.09 1
rm -0.22 0.31 -0.39 0.09 -0.30 1
age 0.35 -0.57 0.64 0.09 0.73 -0.24 1
dis -0.38 0.66 -0.71 -0.10 -0.77 0.21 -0.75 1
rad 0.63 -0.31 0.60 -0.01 0.61 -0.21 0.46 -0.49 1
tax 0.58 -0.31 0.72 -0.04 0.67 -0.29 0.51 -0.53 0.91 1
ptratio 0.29 -0.39 0.38 -0.12 0.19 -0.36 0.26 -0.23 0.46 0.46 1
black -0.39 0.18 -0.36 0.05 -0.38 0.13 -0.27 0.29 -0.44 -0.44 -0.18 1
lstat 0.46 -0.41 0.60 -0.05 0.59 -0.61 0.60 -0.50 0.49 0.54 0.37 -0.37 1
medv -0.39 0.36 -0.48 0.18 -0.43 0.70 -0.38 0.25 -0.38 -0.47 -0.51 0.33 -0.74 1

Covariance between variables


crim zn indus chas nox rm age dis rad tax ptratio black lstat medv
crim 73.84
zn -40.14 542.86
indus 23.94 -85.24 46.97
chas -0.12 -0.25 0.11 0.06
nox 0.42 -1.39 0.61 0.00 0.01
rm -1.32 5.10 -1.88 0.02 -0.02 0.49
age 85.24 -373.16 124.27 0.62 2.38 -4.74 790.79
dis -6.86 32.56 -10.21 -0.05 -0.19 0.30 -44.24 4.43
rad 46.76 -63.22 35.48 -0.02 0.62 -1.28 111.55 -9.05 75.67
tax 843.15 -1234.01 831.71 -1.52 13.02 -34.52 2397.94 -189.29 1333.12 28348.62
ptratio 5.39 -19.74 5.68 -0.07 0.05 -0.54 15.91 -1.06 8.74 167.82 4.68
black -301.78 372.98 -223.14 1.13 -4.01 8.20 -701.55 55.93 -352.58 -6784.48 -34.99 8318.28
lstat 27.93 -68.65 29.52 -0.10 0.49 -3.07 120.84 -7.46 30.33 653.42 5.77 -238.20 50.89
medv -30.66 77.16 -30.46 0.41 -0.45 4.48 -97.40 4.83 -30.50 -724.82 -10.09 279.44 -48.35 84.42
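The covariance and correlation tables are linked by r = cov(x, y) / (s_x · s_y). The diagonal of the covariance table appears to divide by n rather than n - 1 (population covariance, as Excel's Covariance tool does): the medv entry 84.42 is smaller than the sample variance 84.58 by roughly the factor 505/506. This is an inference from the numbers, not something the report states. The sketch below checks two correlations using only values copied from the covariance table.

```python
import math

n = 506
var_medv = 84.42  # medv diagonal entry of the covariance table (divide-by-n)

# (covariance with medv, variance of the variable), copied from the table
table = {
    "lstat": (-48.35, 50.89),
    "rm": (4.48, 0.49),
}

results = {}
for name, (cov_xy, var_x) in table.items():
    # The 1/n factors cancel in the ratio, so population values give the usual r
    results[name] = cov_xy / math.sqrt(var_x * var_medv)
    print(f"r(medv, {name}) = {results[name]:.2f}")

# Rescaling a divide-by-n variance by n / (n - 1) gives the sample variance
sample_var_medv = var_medv * n / (n - 1)
print(f"sample variance of medv = {sample_var_medv:.2f}")
```

The recovered correlations (-0.74 and 0.70) match the correlation table; the sample variance agrees with the summary table up to rounding.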

Scatter plots make the degree of correlation between the variables visible.
The negative relationship between Medv and the percentage of lower-status population (LSTAT) can be seen in the plot of Medv vs LSTAT.

[Figure: Scatter plot, Medv vs LSTAT]
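One way to put a number on the downward trend in this plot (not used in the original analysis) is the least-squares slope, b = cov(x, y) / var(x). Because it needs only these two quantities, the slope of Medv on LSTAT can be read off the covariance table. The helper `ols_line` is a hypothetical sketch for fitting the line from raw data.

```python
def ols_line(xs, ys):
    """Least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


# From the covariance table: cov(lstat, medv) / var(lstat); the 1/n factors cancel
slope_medv_on_lstat = -48.35 / 50.89
print(f"slope of medv on lstat: {slope_medv_on_lstat:.2f}")
```

A slope of about -0.95 means each additional percentage point of lower-status population goes with a median home value roughly 0.95 lower (about $950, since Medv is in $1000s).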

The positive correlation between the average number of rooms per dwelling (rm) and Medv can be seen in the graph below: as the average number of rooms increases, Medv increases.
[Figure: Scatter plot, Medv vs rm]

The negative relationship between Medv and the proportion of non-retail business (industrial) acres (indus) can be seen in the plot of Medv vs Industry.

[Figure: Scatter plot, Medv vs Industry]

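Thus LSTAT (r = -0.74) and average number of rooms, RM (r = 0.70), show the strongest correlations with Medv in the given data. Industry, INDUS (r = -0.48), is also clearly related to Medv, although by the correlation table PTRATIO (r = -0.51) is marginally stronger.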
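Ranking the variables by the absolute value of their correlation with Medv makes this comparison explicit. The sketch below uses only the correlations copied from the correlation table; the dictionary name is illustrative.

```python
# Correlation of each variable with medv, copied from the correlation table
corr_with_medv = {
    "crim": -0.39, "zn": 0.36, "indus": -0.48, "chas": 0.18, "nox": -0.43,
    "rm": 0.70, "age": -0.38, "dis": 0.25, "rad": -0.38, "tax": -0.47,
    "ptratio": -0.51, "black": 0.33, "lstat": -0.74,
}

# Strength of a linear relation is |r|; the sign only gives its direction
ranked = sorted(corr_with_medv, key=lambda v: abs(corr_with_medv[v]), reverse=True)
for name in ranked[:4]:
    print(f"{name:8s} {corr_with_medv[name]:+.2f}")
```

The top four are lstat, rm, ptratio and indus, in that order.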
