
BISHNU PRASAD MAHALA , E.

CODE7291

The target population of the NHANES survey is "the non-institutionalized civilian resident
population of the United States".

Q1) Ans:

R code used to inspect the variables: names(Dataset_MPH), summary(Dataset_MPH), and the per-variable calls below.


table(Gender)
table(ID)
table(SurveyYr)
table(Age)
table(Education)
table(MaritalStatus)
table(HHIncome)
table(SmokeNow)
table(Diab)
table(Weight)
table(Height)
table(Height_m)
table(BMI)
summary(Gender)
summary(ID)
summary(SurveyYr)
summary(Education)
summary(MaritalStatus)
summary(HHIncome)
summary(Weight)
summary(Height)
summary(SmokeNow)
summary(Diab)
summary(Height_m)
summary(BMI)
summary(Age)
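
The per-variable calls above can be condensed into one pass over the data frame. A minimal sketch, assuming a small made-up data frame `toy` in place of Dataset_MPH (its name and values are hypothetical), that reports the class of every column and counts the variables:

```r
# Hypothetical stand-in for Dataset_MPH; the values are made up for illustration.
toy <- data.frame(
  ID     = c(1L, 2L, 3L),
  Gender = factor(c("female", "male", "male")),
  Weight = c(65.9, 66.6, 79.1)
)

# Storage class of every column (integer, factor, numeric, ...)
var_types <- sapply(toy, class)
print(var_types)

# Number of variables in the data frame
n_vars <- ncol(toy)
print(n_vars)
```

Run on the real data, the same two calls would give the variable count and the type noted in each "Comment" line.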

There are 13 variables:

"ID", "SurveyYr", "Gender", "Age", "Education", "MaritalStatus", "HHIncome",
"Weight", "Height", "SmokeNow", "Diab", "Height_m", "BMI"

1. $ ID : int 60471 58275 69349 67972 68679 60165 51961 58622 67616 57708 ...
Comment: integer variable
Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
51702   56080    60899   61542   67123    71869

2. $ SurveyYr : Factor w/ 2 levels "2009_10","2011_12": 1 1 2 2 2 1 1 1 2 1 ...
Comment: factor variable
2009_10: 277 individuals surveyed (highest)
2011_12: 223 individuals surveyed

3. $ Gender : Factor w/ 2 levels "female","male": 1 2 2 2 2 1 1 2 2 2 ...
Comment: factor variable

4. $ Age : int 61 54 74 30 43 49 36 52 63 30 ...
Comment: integer variable

5. $ Education : Factor w/ 5 levels "8th Grade","9 - 11th Grade",..: 2 3 5 4 4 3 5 3 2 1 ...
Comment: factor variable
8th Grade: 32 (lowest)
9 - 11th Grade: 77
High School: 117
Some College: 165 (highest)
College Grad: 109

6. $ MaritalStatus : Factor w/ 6 levels "Divorced","LivePartner",..: 6 1 3 4 3 3 3 3 2 2 ...
Comment: factor variable
Divorced: 67, LivePartner: 62, Married: 238 (highest), NeverMarried: 87,
Separated: 13 (lowest), Widowed: 33

7. $ HHIncome : Factor w/ 12 levels " 0-4999"," 5000-9999",..: 6 3 11 11 5 5 9 4 4 9 ...
Comment: factor variable
More99999: 103 participants (highest)

8. $ Weight : num 65.9 66.6 79.1 97.5 105.1 ...
Comment: numeric variable
Maximum: 164.10, minimum: 44.50

9. $ Height : num 155 168 177 179 177 ...
Comment: numeric variable
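Maximum: 200.4, minimum: 169.8 (the minimum should be rechecked with summary(Height): the first listed height, 155, is already lower)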

10. $ SmokeNow : Factor w/ 2 levels "No","Yes": 2 1 1 2 2 2 2 2 2 2 ...
Comment: factor variable
Not smoking: 259 (highest), smoking: 241

11. $ Diab : Factor w/ 2 levels "0","1": 2 1 2 1 1 1 1 1 1 1 ...
Comment: factor variable
Non-diabetic: 439, diabetic: 61

12. $ Height_m : num 1.55 1.68 1.77 1.79 1.77 ...
Comment: numeric variable
Maximum: 2.004

13. $ BMI : num 27.3 23.5 25.2 30.5 33.5 ...
Comment: numeric variable
Maximum: 63.89, minimum: 15.98, mean: 28.59

Q2)

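R commands used:

library(epiDisplay)  # provides tabpct()
library(skimr)       # provides skim()
library(dplyr)       # provides mutate(), filter() and %>%

agegroupbrk <- cut(Dataset_MPH$Age, breaks = c(0, 50, 80))  # the summary of Age shows that the maximum age is 80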

table(agegroupbrk)

summary(agegroupbrk)

agegroup_labeling <- cut(Dataset_MPH$Age,breaks=c(0,50,80),labels=c("Younger","Older"))

Two categories are created: Older (51 and above) and Younger (50 and below).

table(agegroup_labeling)
Dataset_MPH <-Dataset_MPH%>%mutate(agegroup=agegroup_labeling)

agegroup<-agegroup_labeling

New age group variable created

table(agegroup)

summary(agegroup)

tabpct(Gender,agegroup)
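
The way cut() treats the boundary at 50 decides who counts as "Younger". A minimal sketch, assuming a few made-up ages placed on and around the break points, that shows the default right-closed intervals (0, 50] and (50, 80]:

```r
# Hypothetical ages chosen to sit on and around the cut point of 50.
ages <- c(18, 50, 51, 80)

# Intervals are right-closed by default: (0, 50] and (50, 80].
age_group <- cut(ages, breaks = c(0, 50, 80), labels = c("Younger", "Older"))

# Show each age next to the group it was assigned to
print(data.frame(ages, age_group))
print(table(age_group))
```

An age of exactly 50 falls in "Younger", which matches the definition of "Older" as 51 and above.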

Males <- filter(Dataset_MPH,Gender=="male")

Females <- filter(Dataset_MPH,Gender=="female")

# Calculating the summary statistics (mean, median, sd, minimum, and maximum) separately for males and females.

summary(Males)

Gives the minimum, maximum, mean, and median for males.

summary(Females)

Gives the minimum, maximum, mean, and median for females.

skim(Males)

Gives the SD for males.

skim(Females)

Gives the SD for females.
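
The five statistics can also be collected in a single table per gender using base R only. A minimal sketch, assuming a made-up data frame `toy` and a helper named `five_stats` (both hypothetical, not part of the original analysis):

```r
# Hypothetical stand-in for Dataset_MPH; the values are made up for illustration.
toy <- data.frame(
  Gender = factor(c("female", "female", "male", "male", "male")),
  Age    = c(61, 49, 54, 74, 30)
)

# The five requested statistics, collected in one helper (hypothetical name)
five_stats <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), min = min(x), max = max(x))
}

# Apply the helper to Age within each level of Gender;
# the result is a matrix with one column per gender
stats_by_gender <- sapply(split(toy$Age, toy$Gender), five_stats)
print(round(stats_by_gender, 2))
```

This avoids building separate Males and Females data frames, at the cost of handling one numeric variable at a time.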


Q3.)

BMI (calculated by formula) = Weight / Height_m^2

Maximum: 63.89, minimum: 15.98, mean: 28.59, 1st quartile: 23.29, 3rd quartile: 32.36

R code:

Dataset_MPH <- mutate(Dataset_MPH,BMI_individuals = Weight/Height_m^2)
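
The formula can be checked by hand on a few rows. A minimal sketch using the first three weights and heights as displayed in the str() output above; because that display rounds the values, the results are only expected to be close to the stored BMI column, not identical:

```r
# First three weights (kg) and heights (m) as displayed by str(); the display is rounded.
weight   <- c(65.9, 66.6, 79.1)
height_m <- c(1.55, 1.68, 1.77)

# BMI = weight in kilograms divided by the square of height in metres
bmi <- weight / height_m^2
print(round(bmi, 1))
```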


Q4. Two histograms of the distribution of BMI in diabetics and non-diabetics

library(ggplot2)  # provides ggplot(), geom_histogram() and facet_grid()

Dataset_MPH <- mutate(Dataset_MPH, Diab_status = recode(Diab, "0" = "Non-diabetic", "1" = "Diabetic"))

ggplot(Dataset_MPH, aes(x = BMI)) +
  ylab("FREQUENCY") +
  ggtitle("Histograms of the distribution of BMI in diabetics and non-diabetics") +
  geom_histogram(binwidth = 1, colour = "blue", fill = "red") +
  facet_grid(Diab_status ~ .)

Diabetics <- filter(Dataset_MPH,Diab==1)

Nondiabetics <- filter(Dataset_MPH,Diab==0)

# Drawing histogram of distribution of BMI in diabetics

ggplot(Diabetics) +
  ggtitle("Histogram of distribution of BMI in diabetics") +
  geom_histogram(aes(x = BMI), binwidth = 1, colour = "blue", fill = "brown")

# Drawing histogram of distribution of BMI in non-diabetics

ggplot(Nondiabetics) +
  ggtitle("Histogram of distribution of BMI in non-diabetics") +
  geom_histogram(aes(x = BMI), binwidth = 1, colour = "red", fill = "green")

Q5.)

Classifying the individuals as: Underweight, Normal, Overweight and Obese.

R code:

max(BMI)

weight_status_standard <- cut(Dataset_MPH$BMI, breaks = c(0, 18.5, 25, 30, 64), right = FALSE)  # the maximum BMI is 63.88656, so the last break is rounded up to 64

summary(weight_status_standard)

# Classification: Underweight, Normal, Overweight and Obese

weight_status_category <- cut(Dataset_MPH$BMI, breaks = c(0, 18.5, 25, 30, 64), right = FALSE,
                              labels = c("Underweight", "Normal or Healthy Weight", "Overweight", "Obese"))

Dataset_MPH <- mutate(Dataset_MPH,Weight_status=weight_status_category)

Weight_status <- weight_status_category


Underweight: 6, Normal or Healthy Weight: 164, Overweight: 164, Obese: 166
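
With right = FALSE the intervals are closed on the left, so a BMI of exactly 25 is classed as Overweight and exactly 30 as Obese. A minimal sketch, assuming a few made-up BMI values placed on the break points, that makes this boundary behaviour visible:

```r
# Hypothetical BMI values, including the exact break points 18.5, 25 and 30.
bmi <- c(15.98, 18.5, 24.9, 25, 30, 63.89)

# right = FALSE gives the intervals [0, 18.5), [18.5, 25), [25, 30), [30, 64)
status <- cut(bmi, breaks = c(0, 18.5, 25, 30, 64), right = FALSE,
              labels = c("Underweight", "Normal or Healthy Weight", "Overweight", "Obese"))

# Show each BMI next to its assigned category
print(data.frame(bmi, status))
```

The upper break must stay above the largest observed BMI; a value equal to or above the last break would be returned as NA.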

Q6)
Statistical model
R code, in order:
names(Dataset_MPH)
tabpct(Gender,Diab)
tabpct(agegroup,Diab)
tabpct(Education,Diab)
tabpct(MaritalStatus,Diab)
tabpct(HHIncome,Diab)
tabpct(Weight_status,Diab)
tabpct(SmokeNow,Diab)

model_diab_wt<-glm(Diab ~ Weight_status,data=Dataset_MPH,family='binomial')
model_diab_wt
coef(model_diab_wt)
exp(coef(model_diab_wt))
confint(model_diab_wt)
logistic.display(model_diab_wt)

model_diab_age<-glm(Diab ~ agegroup,data=Dataset_MPH,family='binomial')
model_diab_age
coef(model_diab_age)
exp(coef(model_diab_age))
confint(model_diab_age)
logistic.display(model_diab_age)
summary(model_diab_age)

model_diab_smokenow<-glm(Diab ~ SmokeNow,data=Dataset_MPH,family='binomial')
model_diab_smokenow
coef(model_diab_smokenow)
exp(coef(model_diab_smokenow))
confint(model_diab_smokenow)
logistic.display(model_diab_smokenow)

model_diab_bmi <- glm(Diab ~ BMI, data = Dataset_MPH, family = 'binomial')
model_diab_bmi
coef(model_diab_bmi)
exp(coef(model_diab_bmi))
confint(model_diab_bmi)
logistic.display(model_diab_bmi)

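Diabetic vs BMI:
Odds ratio 1.11: the higher the BMI, the higher the odds of being diabetic (each one-unit increase in BMI multiplies the odds by about 1.11).

R code:
logistic.display(model_diab_bmi)

Diabetic vs SmokeNow:
Odds ratio 0.41 (0.23, 0.73): the ratio is below 1, so current smoking appears protective.

Diabetic vs age:
Odds ratio 5.09 (2.76, 9.41): the older age group has higher odds of being diabetic.
This is the highest odds ratio for being diabetic among the models above.

So the odds of being diabetic increase with age and BMI.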
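
Note that coef() and confint() on a binomial glm return values on the log-odds scale, so both must be exponentiated to be read as odds ratios. A minimal sketch, assuming a made-up 2x2 data set (10 of 100 diabetic in one group, 20 of 100 in the other; all names and counts are hypothetical), where the odds ratio can be checked by hand as (20 x 90) / (80 x 10) = 2.25:

```r
# Hypothetical data: 100 people per group; 10 diabetics in "No", 20 in "Yes".
toy <- data.frame(
  older = factor(rep(c("No", "Yes"), each = 100)),
  diab  = c(rep(1, 10), rep(0, 90), rep(1, 20), rep(0, 80))
)

# Logistic regression with a single binary predictor
fit <- glm(diab ~ older, data = toy, family = binomial)

# Exponentiate the coefficients to obtain odds ratios
or <- exp(coef(fit))

# Wald confidence intervals, exponentiated onto the odds-ratio scale
ci <- exp(confint.default(fit))

print(cbind(OR = or, ci))
```

The row named `olderYes` holds the odds ratio for the exposed group relative to the reference level "No"; the intercept row is the baseline odds, not an odds ratio.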

Q7)

We select the variables age, BMI, and SmokeNow for the above model,
because the literature provides evidence that these are risk factors
that increase the odds of being diabetic.
The models also show that:
Age has the strongest association with being diabetic.
Smoking has a negative association with being diabetic, meaning it appears as a protective factor.
