Completed - NHANES Survey USA Answers

BISHNU PRASAD MAHALA, E.
CODE7291
The target population of the NHANES survey is "the non-institutionalized civilian resident population of the United States."
Q1) Ans:
R code used to inspect the variables: names(Dataset_MPH), str(Dataset_MPH), summary(Dataset_MPH), etc.

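attach(Dataset_MPH)  # lets the columns be used by name, e.g. Gender instead of Dataset_MPH$Gender
# Frequency tables
table(Gender)
table(ID)
table(SurveyYr)
table(Age)
table(Education)
table(MaritalStatus)
table(HHIncome)
table(SmokeNow)
table(Diab)
table(Weight)
table(Height)
table(Height_m)
table(BMI)
# Summary statistics (counts for factors; minimum, quartiles, mean and maximum for numeric variables)
summary(Gender)
summary(ID)
summary(SurveyYr)
summary(Education)
summary(MaritalStatus)
summary(HHIncome)
summary(Weight)
summary(Height)
summary(SmokeNow)
summary(Diab)
summary(Height_m)
summary(BMI)
summary(Age)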
There are 13 variables:

"ID", "SurveyYr", "Gender", "Age", "Education", "MaritalStatus", "HHIncome",
"Weight", "Height", "SmokeNow", "Diab", "Height_m", "BMI"
1. $ ID : int 60471 58275 69349 67972 68679 60165 51961 58622 67616 57708 ...
Comment: integer variable
Min.: 51702, 1st Qu.: 56080, Median: 60899, Mean: 61542, 3rd Qu.: 67123, Max.: 71869
2. $ SurveyYr : Factor w/ 2 levels "2009_10","2011_12": 1 1 2 2 2 1 1 1 2 1 ...
Comment: factor variable
2009_10: 277 individuals surveyed (highest); 2011_12: 223 individuals surveyed
3. $ Gender : Factor w/ 2 levels "female","male": 1 2 2 2 2 1 1 2 2 2 ...
4. $ Age : int 61 54 74 30 43 49 36 52 63 30 ...
Comment: integer variable
5. $ Education : Factor w/ 5 levels "8th Grade","9 - 11th Grade",..: 2 3 5 4 4 3 5 3 2 1 ...
8th Grade: 32 (lowest), 9 - 11th Grade: 77, High School: 117, Some College: 165 (highest), College Grad: 109
6. $ MaritalStatus: Factor w/ 6 levels "Divorced","LivePartner",..: 6 1 3 4 3 3 3 3 2 2 ...
Divorced: 67, LivePartner: 62, Married: 238 (highest), NeverMarried: 87, Separated: 13 (lowest), Widowed: 33
7. $ HHIncome : Factor w/ 12 levels " 0-4999"," 5000-9999",..: 6 3 11 11 5 5 9 4 4 9 ...
More99999: 103 participants (the largest income category)
8. $ Weight : num 65.9 66.6 79.1 97.5 105.1 ...
Comment: numeric variable (kg)
Maximum: 164.10, minimum: 44.50
9. $ Height : num 155 168 177 179 177 ...
Maximum: 200.4, minimum: 169.8
10. $ SmokeNow : Factor w/ 2 levels "No","Yes": 2 1 1 2 2 2 2 2 2 2
Not smoking (No): 259 (highest), smoking (Yes): 241
11. $ Diab : Factor w/ 2 levels "0","1": 2 1 2 1 1 1 1 1 1 1 ...
Non-diabetic (0): 439, diabetic (1): 61
12. $ Height_m : num 1.55 1.68 1.77 1.79 1.77 ...
Comment: height in metres (Height / 100)
Maximum: 2.004
13. $ BMI : num 27.3 23.5 25.2 30.5 33.5 ...
Maximum: 63.89, minimum: 15.98, mean: 28.59
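The type comments above (integer, factor, numeric) follow directly from each column's class. A minimal base-R sketch, assuming a hypothetical data frame `df` built from the first three values that str() printed for five of the variables (it is not the full Dataset_MPH):

```r
# Hypothetical three-row extract, using the first values printed by str() above
df <- data.frame(
  ID       = c(60471L, 58275L, 69349L),
  SurveyYr = factor(c("2009_10", "2009_10", "2011_12"), levels = c("2009_10", "2011_12")),
  Gender   = factor(c("female", "male", "male"), levels = c("female", "male")),
  Age      = c(61L, 54L, 74L),
  Weight   = c(65.9, 66.6, 79.1)
)

# Class of every column: "integer", "factor" or "numeric", as in the comments above
col_types <- sapply(df, class)
print(col_types)

# Factors are described by counts per level, numeric variables by quantiles
print(table(df$Gender))
print(summary(df$Weight))
```

On the real data, sapply(Dataset_MPH, class) should give the same one-line overview for all 13 variables.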
Q2)
R COMMANDS USED
library(dplyr)       # mutate(), filter() and the %>% pipe
library(epiDisplay)  # tabpct()
# The summary of Age showed that the maximum age is 80
agegroupbrk <- cut(Dataset_MPH$Age, breaks = c(0, 50, 80))
table(agegroupbrk)
summary(agegroupbrk)
# Two categories created: Older (51 and above) and Younger (50 and below)
agegroup_labeling <- cut(Dataset_MPH$Age, breaks = c(0, 50, 80), labels = c("Younger", "Older"))
table(agegroup_labeling)
# New age group variable created, in the data frame and as a stand-alone vector
Dataset_MPH <- Dataset_MPH %>% mutate(agegroup = agegroup_labeling)
agegroup <- agegroup_labeling
table(agegroup)
summary(agegroup)
# Cross-tabulation of gender by age group, with percentages
tabpct(Gender, agegroup)
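cut() builds right-closed intervals by default (right = TRUE), so breaks = c(0, 50, 80) gives (0,50] and (50,80]: age 50 is Younger and age 51 is Older, which is why the Older group is described as 51 and above. Values outside the breaks (0, or above 80) become NA. A small check with hypothetical ages placed on and around the break points:

```r
# Hypothetical ages on and around the break points 0, 50 and 80
ages <- c(0L, 30L, 50L, 51L, 80L, 81L)

# Same call as above: (0,50] -> "Younger" and (50,80] -> "Older"
agegroup_demo <- cut(ages, breaks = c(0, 50, 80), labels = c("Younger", "Older"))

print(data.frame(ages, agegroup_demo))
# 0 and 81 fall outside (0, 80], so summary() counts them under NA's
print(summary(agegroup_demo))
```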
library(skimr)  # skim()
Males <- filter(Dataset_MPH, Gender == "male")
Females <- filter(Dataset_MPH, Gender == "female")
# Summary statistics (mean, median, SD, minimum and maximum) separately for males and females
# Minimum, maximum, mean and median of males
summary(Males)
# Minimum, maximum, mean and median of females
summary(Females)
# SD of males (reported by skim() in the sd column)
skim(Males)
# SD of females
skim(Females)
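skim() is used above mainly to obtain the SD. As an alternative sketch in base R (not what the original answer used), all five statistics can be computed per gender in one step with split() and sapply(). The small data frame is hypothetical: it takes the first eight Gender and Age values printed by str() in Q1. five_stats() is a helper name introduced here for illustration only.

```r
# Hypothetical extract: first eight Gender and Age values shown by str() in Q1
df <- data.frame(
  Gender = factor(c("female", "male", "male", "male", "male", "female", "female", "male")),
  Age    = c(61, 54, 74, 30, 43, 49, 36, 52)
)

# Hypothetical helper returning the five statistics requested in Q2
five_stats <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), min = min(x), max = max(x))
}

# split() gives one Age vector per gender; sapply() applies five_stats to each;
# t() puts one row per gender and one column per statistic
stats_by_gender <- t(sapply(split(df$Age, df$Gender), five_stats))
print(round(stats_by_gender, 2))
```

The same pattern works for any numeric column, e.g. replacing df$Age with Dataset_MPH$BMI and df$Gender with Dataset_MPH$Gender.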

Q3)
BMI (calculated by formula) = Weight / Height_m^2, i.e. weight in kilograms divided by the square of height in metres
Maximum: 63.89, minimum: 15.98, mean: 28.59, 1st quartile: 23.29, 3rd quartile: 32.36
R code:
Dataset_MPH <- mutate(Dataset_MPH,BMI_individuals = Weight/Height_m^2)
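As a quick check of the formula, the first three Weight and Height values printed by str() in Q1 can be plugged in. A minimal base-R sketch; the data frame `df` is hypothetical and holds the rounded values shown by str(), so the computed BMI agrees with the reported BMI only to within rounding:

```r
# First three Weight (kg), Height (cm) and BMI values shown by str() in Q1 (rounded)
df <- data.frame(
  Weight = c(65.9, 66.6, 79.1),
  Height = c(155, 168, 177),
  BMI    = c(27.3, 23.5, 25.2)
)

# Height_m is Height converted from centimetres to metres
df$Height_m <- df$Height / 100

# BMI = weight (kg) / height (m) squared, the same formula as BMI_individuals above
df$BMI_individuals <- df$Weight / df$Height_m^2

print(round(df[, c("BMI", "BMI_individuals")], 2))
```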

Q4) Two histograms of the distribution of BMI in diabetics and non-diabetics
library(ggplot2)
# Recode Diab ("0"/"1") into readable labels used as panel titles
Dataset_MPH <- mutate(Dataset_MPH, Diab_status = recode(Diab, "0" = "Non-diabetic", "1" = "Diabetic"))
# Both histograms in one plot, one panel per diabetes status
ggplot(Dataset_MPH, aes(x = BMI)) +
  geom_histogram(binwidth = 1, colour = "blue", fill = "red") +
  facet_grid(Diab_status ~ .) +
  ylab("FREQUENCY") +
  ggtitle("Histograms of the distribution of BMI in diabetics and non-diabetics")
Diabetics <- filter(Dataset_MPH, Diab == 1)
Nondiabetics <- filter(Dataset_MPH, Diab == 0)
# Drawing histogram of distribution of BMI in diabetics
ggplot(Diabetics) +
  ggtitle("Histogram of distribution of BMI in diabetics") +
  geom_histogram(aes(x = BMI), binwidth = 1, colour = "blue", fill = "brown")
# Drawing histogram of distribution of BMI in non-diabetics
ggplot(Nondiabetics) +
  ggtitle("Histogram of distribution of BMI in non-diabetics") +
  geom_histogram(aes(x = BMI), binwidth = 1, colour = "red", fill = "green")
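The recode() step only relabels the factor levels that become the panel titles. For readers working without dplyr, a base-R sketch of the same relabelling, followed by a per-group median BMI as a numeric companion to the histograms. The five rows are hypothetical, taken from the first Diab and BMI values shown by str() in Q1, so the medians say nothing about the real data.

```r
# Hypothetical extract: first five Diab codes and BMI values shown by str() in Q1
Diab <- factor(c("1", "0", "1", "0", "0"), levels = c("0", "1"))
BMI  <- c(27.3, 23.5, 25.2, 30.5, 33.5)

# Base-R equivalent of recode(Diab, "0" = "Non-diabetic", "1" = "Diabetic")
Diab_status <- factor(Diab, levels = c("0", "1"), labels = c("Non-diabetic", "Diabetic"))

# Median BMI within each diabetes group
med <- tapply(BMI, Diab_status, median)
print(med)
```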
Q5)
Classifying the individuals as: Underweight, Normal, Overweight and Obese.
R code:
max(Dataset_MPH$BMI)
# The maximum BMI is 63.88656, so 64 (rounded up) is used as the last break
weight_status_standard <- cut(Dataset_MPH$BMI, breaks = c(0, 18.5, 25, 30, 64), right = FALSE)
summary(weight_status_standard)
# Classification: Underweight, Normal, Overweight and Obese
weight_status_category <- cut(Dataset_MPH$BMI, breaks = c(0, 18.5, 25, 30, 64), right = FALSE,
                              labels = c("Underweight", "Normal or Healthy Weight", "Overweight", "Obese"))
Dataset_MPH <- mutate(Dataset_MPH, Weight_status = weight_status_category)
Weight_status <- weight_status_category

Underweight: 6, Normal or Healthy Weight: 164, Overweight: 164, Obese: 166
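right = FALSE makes every interval closed on the left, [18.5, 25), [25, 30) and so on, so a BMI of exactly 18.5, 25 or 30 moves up into the next category, matching the usual cut-offs (underweight below 18.5, normal 18.5 to below 25, overweight 25 to below 30, obese 30 and above). A sketch with hypothetical BMI values on the boundaries. One choice differs from the code above: Inf is used as the top break instead of 64. For this dataset (maximum 63.89) both give the same result, but Inf avoids NA for any BMI of 64 or more.

```r
# Hypothetical BMI values on and around the category boundaries
bmi <- c(15.98, 18.49, 18.5, 24.99, 25, 29.99, 30, 63.89)

weight_status_demo <- cut(bmi,
  breaks = c(0, 18.5, 25, 30, Inf),  # Inf instead of 64 (a variation on the code above)
  right  = FALSE,                    # intervals [a, b): the lower bound is included
  labels = c("Underweight", "Normal or Healthy Weight", "Overweight", "Obese"))
print(data.frame(bmi, weight_status_demo))

# With 64 as the last break, a BMI of exactly 64 falls outside [30, 64) and becomes NA
print(cut(64, breaks = c(0, 18.5, 25, 30, 64), right = FALSE))
```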
Q6)
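Statistical model: logistic regression (glm with family = 'binomial') of diabetes status (Diab) on one candidate predictor at a time
R code, step by step: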
names(Dataset_MPH)
# Cross-tabulations (with percentages) of each candidate predictor against Diab
tabpct(Gender, Diab)
tabpct(agegroup, Diab)
tabpct(Education, Diab)
tabpct(MaritalStatus, Diab)
tabpct(HHIncome, Diab)
tabpct(Weight_status, Diab)
tabpct(SmokeNow, Diab)
# Model 1: diabetes by weight status
model_diab_wt <- glm(Diab ~ Weight_status, data = Dataset_MPH, family = 'binomial')
model_diab_wt
coef(model_diab_wt)              # coefficients on the log-odds scale
exp(coef(model_diab_wt))         # odds ratios
confint(model_diab_wt)           # 95% CIs on the log-odds scale
logistic.display(model_diab_wt)  # odds ratios with 95% CIs
# Model 2: diabetes by age group
model_diab_age <- glm(Diab ~ agegroup, data = Dataset_MPH, family = 'binomial')
model_diab_age
coef(model_diab_age)
exp(coef(model_diab_age))
confint(model_diab_age)
logistic.display(model_diab_age)
summary(model_diab_age)
# Model 3: diabetes by current smoking
model_diab_smokenow <- glm(Diab ~ SmokeNow, data = Dataset_MPH, family = 'binomial')
model_diab_smokenow
coef(model_diab_smokenow)
exp(coef(model_diab_smokenow))
confint(model_diab_smokenow)
logistic.display(model_diab_smokenow)
# Model 4: diabetes by BMI (continuous)
model_diab_bmi <- glm(Diab ~ BMI, data = Dataset_MPH, family = 'binomial')
model_diab_bmi
coef(model_diab_bmi)
exp(coef(model_diab_bmi))
confint(model_diab_bmi)
logistic.display(model_diab_bmi)
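Results from logistic.display (odds ratio, with 95% confidence interval in brackets where reported):
BMI vs diabetes: OR 1.11 per 1-unit increase in BMI, so the higher the BMI, the higher the odds of being diabetic.

R code:
logistic.display(model_diab_bmi)
SmokeNow vs diabetes:
OR 0.41 (0.23, 0.73) for current smokers vs non-smokers, i.e. an apparently protective association
Age group vs diabetes:
OR 5.09 (2.76, 9.41) for the Older group (51 and above) vs the Younger group: the odds of being diabetic increase with age
This is the highest odds ratio for being diabetic among the odds ratios reported here.
So the odds of being diabetic increase with age and with BMI.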
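The odds ratios above are exp(coef()) of the fitted glm models. Two properties make them easy to sanity-check: for a single two-level predictor, exp(coef) equals the cross-product ratio of the 2x2 table, and for a continuous predictor such as BMI it is the OR per 1-unit increase, which compounds multiplicatively (OR^k for a k-unit increase). A sketch with made-up counts (not the NHANES data); confint.default() gives Wald intervals, which can differ slightly from the profile-likelihood intervals that confint() returns for a glm.

```r
# Made-up 2x2 counts: age group by diabetes status (illustration only)
counts <- c(Younger_no = 200, Older_no = 150, Younger_yes = 10, Older_yes = 40)
d <- data.frame(
  agegroup = factor(rep(c("Younger", "Older", "Younger", "Older"), times = counts),
                    levels = c("Younger", "Older")),  # Younger is the reference level
  Diab     = rep(c(0, 0, 1, 1), times = counts)
)

fit <- glm(Diab ~ agegroup, data = d, family = binomial)

# exp(coef) is the odds ratio for Older vs Younger
or_glm <- exp(coef(fit))[["agegroupOlder"]]
# Cross-product ratio from the table: (Older_yes * Younger_no) / (Older_no * Younger_yes)
or_table <- (counts[["Older_yes"]] * counts[["Younger_no"]]) /
            (counts[["Older_no"]] * counts[["Younger_yes"]])
print(c(or_glm = or_glm, or_table = or_table))

# Wald 95% CI on the OR scale
print(exp(confint.default(fit))["agegroupOlder", ])

# Continuous predictor: the reported OR of 1.11 per 1-unit BMI increase
# corresponds to 1.11^5 for a BMI that is 5 units higher
or_bmi_5 <- 1.11^5
print(or_bmi_5)
```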
Q7)
We selected the variables age (age group), BMI and SmokeNow for the model above,

because evidence from a literature search indicates that these are risk factors
associated with increased odds of being diabetic.
The models above also showed that
age group has the strongest association with being diabetic, and
current smoking has a negative association with diabetes (OR below 1), i.e. it appears as a protective factor in this unadjusted, one-predictor analysis.
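Q7 names the variables chosen for the model but does not show the code that fits them together. A minimal sketch, assuming the final model is a single logistic regression with all three predictors, so that each odds ratio is adjusted for the other two. The data set sim, the linear predictor lp and all coefficients are simulated and made up (they are not NHANES results); only the variable names agegroup, BMI, SmokeNow and Diab follow Dataset_MPH.

```r
set.seed(42)  # reproducible simulation
n <- 500

# Simulated predictors (hypothetical values, for illustration only)
sim <- data.frame(
  agegroup = factor(sample(c("Younger", "Older"), n, replace = TRUE),
                    levels = c("Younger", "Older")),
  BMI      = rnorm(n, mean = 28.6, sd = 6),
  SmokeNow = factor(sample(c("No", "Yes"), n, replace = TRUE), levels = c("No", "Yes"))
)

# Made-up true model on the log-odds scale, then simulated 0/1 diabetes status
lp <- -6 + 1.5 * (sim$agegroup == "Older") + 0.1 * sim$BMI - 0.5 * (sim$SmokeNow == "Yes")
sim$Diab <- rbinom(n, size = 1, prob = plogis(lp))

# One model with all three predictors: each OR is adjusted for the other two
fit <- glm(Diab ~ agegroup + BMI + SmokeNow, data = sim, family = binomial)

# Adjusted odds ratios with Wald 95% confidence intervals
adj_or <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(round(adj_or, 2))
```

On the real data the corresponding call would be glm(Diab ~ agegroup + BMI + SmokeNow, data = Dataset_MPH, family = 'binomial'); comparing its adjusted ORs with the crude ORs from Q6 shows whether, for example, the smoking association changes once age and BMI are taken into account.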
