
Do the Following Factors (Gender, Title) Predict Faculty’s Attitude Toward the Value of Improvements in the Digital Library?

Missing Data Analysis

(Ithaka Data, 2012)

Abier Akiry

Asma Marghalani

ETR 790

Fall 2019

Data Set

The data set used for this project is the Ithaka faculty data set from www.icpsr.org. The data are survey data. A total of 44,218 participants were selected via an "every nth" selection from a list of faculty members in the United States by Schonfeld, Wulfson, and Housewright (2012). A total of 5,261 individuals responded to the survey. The survey was conducted to explore faculty attitudes and perceptions toward digital research, teaching, and communication.

Variables

Dependent variable

The predicted variable is a mean composite score, built from eight items, that measures the faculty’s attitude toward the value of improvements in the digital library. Each item uses a ten-point response scale from 1 to 10. In the summary output below, the values 1, 2, 3, 4, and 5 are listed under "lowest" and the values 6, 7, 8, 9, and 10 under "highest". Specifically, the respondents were asked:

7_3) You may have the opportunity to read scholarly monographs in electronic format, either
through a library subscription database or as a standalone e-book. Certain changes in the future
may make digital versions more valuable to you. Use the scales below to rate how much more
valuable each of the following would make digital versions of scholarly monographs to you than
they are today, from 10 to 1 where 10 equals "Much more valuable than they are today" and 1
equals "Not at all more valuable than they are today." Please select one answer for each item.
A. Access to a wider range of materials in digital form
B. Improved ability to download and organize a personal collection of monographs
C. Improved ability to navigate through and among monographs
D. Improved ability to read scholarly monographs on my device of choice
E. Improved ability to highlight, annotate, and print materials as needed
F. Ability to perform computational analysis (text mining) over a corpus of electronic
monographs
G. More effective integration of images, multimedia, and graphs linked to the text
H. Certified preservation of digital scholarly monographs
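
As a minimal sketch of how a mean composite is formed from the eight items, the R snippet below uses a small made-up data frame. The values and the object name `toy` are illustrative assumptions, not Ithaka data. Each respondent's available item scores are averaged, and missing items are skipped.

```r
# Toy data: three hypothetical respondents, eight items (A to H), scale 1-10
toy <- data.frame(
  Q7_3_A = c(10, 8, NA),
  Q7_3_B = c(9, 7, 6),
  Q7_3_C = c(10, 8, 6),
  Q7_3_D = c(8, NA, 5),
  Q7_3_E = c(10, 9, 7),
  Q7_3_F = c(6, 5, 4),
  Q7_3_G = c(9, 8, 6),
  Q7_3_H = c(10, 4, 8)
)

# Mean composite per respondent; na.rm = TRUE averages only the answered items
composite <- rowMeans(toy, na.rm = TRUE)
print(composite)
```

With `na.rm = TRUE`, a respondent who skipped one item is averaged over seven items rather than being dropped.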

Independent variables

The independent variables are gender and job title, both categorical predictors. Gender has two response options (1 = male, 2 = female), and the job title question has six (1 = Professor, 2 = Associate Professor, 3 = Assistant Professor, 4 = Adjunct Professor, 5 = Lecturer, and 6 = Other).

Descriptive statistics:

The summary statistics for the dependent variables (Table 1) include the mean of each survey item: Q7_3_A, Q7_3_B, Q7_3_C, Q7_3_D, Q7_3_E, Q7_3_F, Q7_3_G, and Q7_3_H.

Table 1: Summary Statistic for Dependent Variables


Q7_3_A : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-Access to a wider range of materials in digital form Format:F2.0
n missing distinct Info Mean Gmd
5137 124 10 0.904 8.45 2.02

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 100 57 75 62 202 249 394 778 963 2257
Proportion 0.019 0.011 0.015 0.012 0.039 0.048 0.077 0.151 0.187 0.439

Q7_3_B : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-Improved ability to download and organize a personal collection of monographs Format:F2.0
n missing distinct Info Mean Gmd
5148 113 10 0.934 8.102 2.339
lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 155 72 123 75 269 319 425 847 897 1966
Proportion 0.030 0.014 0.024 0.015 0.052 0.062 0.083 0.165 0.174 0.382

Q7_3_C : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-Improved ability to navigate through and among monographs Format:F2.0
n missing distinct Info Mean Gmd
5144 117 10 0.933 8.209 2.18

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 121 63 81 75 257 300 448 926 920 1953
Proportion 0.024 0.012 0.016 0.015 0.050 0.058 0.087 0.180 0.179 0.380

Q7_3_D : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-Improved ability to read scholarly monographs on my device of choice Format:F2.0
n missing distinct Info Mean Gmd
5146 115 10 0.957 7.568 2.804

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 276 128 160 96 405 384 470 792 756 1679
Proportion 0.054 0.025 0.031 0.019 0.079 0.075 0.091 0.154 0.147 0.326

Q7_3_E : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-Improved ability to highlight, annotate, and print materials as needed Format:F2.0
n missing distinct Info Mean Gmd
5151 110 10 0.918 8.133 2.409

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 179 83 117 87 291 267 414 691 856 2166
Proportion 0.035 0.016 0.023 0.017 0.056 0.052 0.080 0.134 0.166 0.421

Q7_3_F : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-Ability to perform computational analysis (text mining) over a corpus of electronic monographs
Format:F2.0
n missing distinct Info Mean Gmd
5129 132 10 0.973 6.826 3.375

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 544 216 233 168 424 390 437 688 652 1377
Proportion 0.106 0.042 0.045 0.033 0.083 0.076 0.085 0.134 0.127 0.268

Q7_3_G : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-More effective integration of images, multimedia, and graphs linked to the text Format:F2.0
n missing distinct Info Mean Gmd
5138 123 10 0.958 7.612 2.737

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 229 124 180 121 380 364 478 822 796 1644
Proportion 0.045 0.024 0.035 0.024 0.074 0.071 0.093 0.160 0.155 0.320

Q7_3_H : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-Certified preservation of digital scholarly monographs Format:F2.0
n missing distinct Info Mean Gmd
5106 155 10 0.971 7.03 3.143

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 405 150 208 136 566 462 469 672 620 1418
Proportion 0.079 0.029 0.041 0.027 0.111 0.090 0.092 0.132 0.121 0.278

The summary statistics for the independent variables, gender and title (Table 2), show 181 missing values for gender and 96 for title.

Table 2: Summary Statistic for Independent Variables


GenderFac
n missing distinct
5080 181 2

Value       Male  Female
Frequency   2863    2217
Proportion 0.564   0.436

TitleFac
n missing distinct
5165 96 6

lowest : Professor Associate Professor Assistant Professor Adjunct Professor Lecturer
highest: Associate Professor Assistant Professor Adjunct Professor Lecturer Other

Value Professor Associate Professor Assistant Professor Adjunct Professor Lecturer Other
Frequency 1949 1358 877 365 270 346
Proportion 0.377 0.263 0.170 0.071 0.052 0.067

Description of missing data patterns

The missing data pattern (Figure 1) provides a graphical summary of the missing data patterns. A total of 4,967 cases have no missing values on any of the variables, whereas 26 cases have a high percentage of missing values. The pattern also shows the total number of missing values for each variable: for instance, 96 for title and 181 for gender. Across the whole data set, 1,266 values are missing.

Figure 1: Missing data patterns

Graphical Representations of Missing Data

This plot (Figure 2) visualizes the amount of missing data. It shows the location of the missing values in black (2.4%) and reports the percentage of present values overall (97.6%) and for each variable. No variable is missing more than 5% of its values.
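
The percentages above can be checked by hand from the per-variable missing counts reported in Tables 1 and 2. The following sketch only re-enters those counts; the object names are our own and purely illustrative.

```r
# Number of respondents and missing counts per variable (from Tables 1 and 2)
n_cases <- 5261
missing <- c(Q7_3_A = 124, Q7_3_B = 113, Q7_3_C = 117, Q7_3_D = 115,
             Q7_3_E = 110, Q7_3_F = 132, Q7_3_G = 123, Q7_3_H = 155,
             GenderFac = 181, TitleFac = 96)

# Total missing values across all ten variables
total_missing <- sum(missing)

# Overall share of missing cells in the 5261 x 10 data matrix, in percent
pct_missing_overall <- 100 * total_missing / (n_cases * length(missing))

# Share of missing values per variable, in percent
pct_missing_by_var <- 100 * missing / n_cases

print(total_missing)
print(round(pct_missing_overall, 1))
print(round(pct_missing_by_var, 2))
```

The total comes to 1266 missing values, about 2.4% of all cells, and the variable with the most missing data (gender) stays below the 5% mark.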

Figure 2: Missing data plots

The missing data plot (Figure 3) shows that both demographic variables have missing data. The plot also shows that the gender variable has the most missing data.

Figure 3: Missing data plot


Little's test for MCAR

Figure 4: MCAR test

For missing values, we first removed the cases that were missing values on every variable. The total number of missing values was 1,266; after removing these cases, it fell to 1,006. We then assessed whether the data were missing completely at random (MCAR) or missing at random (MAR). Little's MCAR test (Figure 4) gave χ²(252) = 288.48, p = 0.056, so the null hypothesis was not rejected and the missing values were assessed to be MCAR.
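
The reported p-value can be reproduced from the chi-square statistic and its degrees of freedom with base R. The sketch below also checks the drop in missing values, under the assumption (ours, not stated explicitly in the text) that the removed cases are the 26 cases that were missing on all ten variables.

```r
# Little's MCAR test statistic and degrees of freedom as reported in the text
chi_sq <- 288.48
df <- 252

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
print(round(p_value, 3))

# Assumption: 26 removed cases, each missing on all 10 variables
removed_cells <- 26 * 10
remaining_missing <- 1266 - removed_cells
print(remaining_missing)
```

Because the p-value is above .05, the test does not reject MCAR, although it is close to the cutoff.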

Tests for MAR

Before running the MAR tests, the missing values for each variable were dummy coded as 1 = value is missing and 0 = value is not missing. We then examined the relationship between this “missingness” and the other variables using logistic regression. Table 3 below lists the variables whose missingness was predicted by other variables (p < .05), with the predictors in the second column. For example, the missingness of Q7_3_G (“More effective integration of images, multimedia, and graphs linked to the text”) was predicted by the title “Professor” and the gender “Female.”

Table 3: Tests for MAR

Variable   Predictor   Estimate    Std. Error   z value   P-value
Q7_3_G     Professor   -2.12298    1.14542      -2.013    0.0441
           Female       1.22333    0.55855       2.190    0.0285
Q7_3_D     Q7_3_H       0.55503    0.27145       2.045    0.040888
Q7_3_C     Q7_3_F       0.4451     0.1827        2.437    0.0148
Q7_3_A     Q7_3_D      -0.23915    0.10701      -2.235    0.0254
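
To illustrate the logic of this check, here is a small simulated sketch in base R. All data and names (`x`, `y`, `y_miss`) are hypothetical. Missingness in `y` is made to depend on the observed variable `x`, so the logistic regression should pick up a positive slope for `x`.

```r
set.seed(790)
n <- 2000

x <- rnorm(n)   # fully observed, hypothetical predictor
y <- rnorm(n)   # hypothetical variable that will receive missing values

# Probability of being missing rises with x (missing at random by construction)
p_miss <- plogis(-2 + 1 * x)
y[runif(n) < p_miss] <- NA

# Dummy code missingness: 1 = value is missing, 0 = value is not missing
y_miss <- as.integer(is.na(y))

# Logistic regression of the missingness indicator on the observed variable
fit <- glm(y_miss ~ x, family = binomial)
print(summary(fit)$coefficients)
```

A significant predictor in such a model is evidence against MCAR for that variable, which is how the rows of Table 3 can be read.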

Imputation
Because the MCAR test was not significant (p > .05), single imputation was used to handle the missing values in this analysis. Before this step, we computed alpha, which assumes that the scale measures a single construct. The results after random single imputation of the missing values are:

10 Variables 5261 Observations
------------------------------------------------------------------------------------------------------------------
Q7_3_A
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.897 8.487 1.997 4 6 8 9 10 10 10

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 100 57 75 62 202 249 394 778 963 2381
Proportion 0.019 0.011 0.014 0.012 0.038 0.047 0.075 0.148 0.183 0.453
------------------------------------------------------------------------------------------------------------------
Q7_3_B
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.938 7.95 2.538 1 4 7 9 10 10 10

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 268 72 123 75 269 319 425 847 897 1966
Proportion 0.051 0.014 0.023 0.014 0.051 0.061 0.081 0.161 0.170 0.374
------------------------------------------------------------------------------------------------------------------
Q7_3_C
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.935 8.227 2.152 3 5 7 9 10 10 10

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 121 63 81 75 257 300 448 926 1037 1953
Proportion 0.023 0.012 0.015 0.014 0.049 0.057 0.085 0.176 0.197 0.371
------------------------------------------------------------------------------------------------------------------
Q7_3_D
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.958 7.599 2.772 1 3 6 8 10 10 10

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 276 128 160 96 405 384 470 792 871 1679
Proportion 0.052 0.024 0.030 0.018 0.077 0.073 0.089 0.151 0.166 0.319
------------------------------------------------------------------------------------------------------------------
Q7_3_E
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.912 8.172 2.386 3 5 7 9 10 10 10

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 179 83 117 87 291 267 414 691 856 2276
Proportion 0.034 0.016 0.022 0.017 0.055 0.051 0.079 0.131 0.163 0.433
------------------------------------------------------------------------------------------------------------------
Q7_3_F
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.974 6.679 3.493 1 1 5 8 10 10 10

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 676 216 233 168 424 390 437 688 652 1377
Proportion 0.128 0.041 0.044 0.032 0.081 0.074 0.083 0.131 0.124 0.262
------------------------------------------------------------------------------------------------------------------
Q7_3_G
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.96 7.574 2.734 2 3 6 8 10 10 10

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 229 124 180 121 380 487 478 822 796 1644
Proportion 0.044 0.024 0.034 0.023 0.072 0.093 0.091 0.156 0.151 0.312
------------------------------------------------------------------------------------------------------------------
Q7_3_H
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.972 7.088 3.105 1 2 5 8 10 10 10

lowest : 1 2 3 4 5, highest: 6 7 8 9 10

Value 1 2 3 4 5 6 7 8 9 10
Frequency 405 150 208 136 566 462 469 672 775 1418
Proportion 0.077 0.029 0.040 0.026 0.108 0.088 0.089 0.128 0.147 0.270
------------------------------------------------------------------------------------------------------------------
GenderFac
n missing distinct Info Mean Gmd
5261 0 2 0.731 1.421 0.4877

Value 1 2
Frequency 3044 2217
Proportion 0.579 0.421
---------------------------------------------------------------------------------------------------------------------------
TitleFac
n missing distinct Info Mean Gmd
5261 0 6 0.919 2.334 1.565

lowest : 1 2 3 4 5, highest: 2 3 4 5 6

Value 1 2 3 4 5 6
Frequency 2045 1358 877 365 270 346
Proportion 0.389 0.258 0.167 0.069 0.051 0.066
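
Comparing the frequency tables before and after imputation shows where the imputed values went. The sketch below re-enters the Q7_3_A frequencies from the two outputs (object names are our own). In this output all 124 imputed values fall on a single response value, which suggests that one randomly drawn value was used for every missing entry of the variable.

```r
# Q7_3_A frequencies for response values 1..10, before and after imputation
before <- c(100, 57, 75, 62, 202, 249, 394, 778, 963, 2257)
after  <- c(100, 57, 75, 62, 202, 249, 394, 778, 963, 2381)
names(before) <- names(after) <- 1:10

# Number of cases added to each response value by the imputation
added <- after - before
print(added[added != 0])

# Sample sizes before and after imputation
print(sum(before))
print(sum(after))
```

The same comparison for the other variables gives the same picture: the increase in one category equals that variable's number of missing values.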

Inferential Analysis

A regression on the randomly imputed data was carried out to determine the relationship between the independent variables (job title, gender) and the dependent variable (the faculty’s attitude toward the value of improvements in the digital library). With the randomly imputed data, gender is a significant predictor (p = .0001). Job title is not a significant predictor (p = .0052) (Figure 5).

Discussion

The final linear model, y = 7.34374 + 0.199227[gender] + 0.04697[title], shows that a respondent is predicted to place a higher value on improvements in e-libraries if they are a woman with a lower-ranked job title. Male professors are predicted to place the lowest value on improvements. Possible reasons why data were missing are that (1) some participants agreed to participate but did not respond to all survey items, or (2) participants began the survey but responded only to the first section of the online survey's Likert scales (the demographic questions were placed at the end of the survey).
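
As a worked example of the model, the sketch below plugs the numeric codes into the reported equation. It assumes that gender (1 = male, 2 = female) and title (1 = Professor to 6 = Other) enter the model as numbers, as the single coefficient per predictor suggests; the function name `predict_attitude` is hypothetical.

```r
# Predicted composite score from the reported coefficients
# Assumption: gender and title are used as numeric codes
predict_attitude <- function(gender, title) {
  7.34374 + 0.199227 * gender + 0.04697 * title
}

# Male (1) full professor (1): lowest predicted value
male_professor <- predict_attitude(gender = 1, title = 1)

# Female (2) respondent with title "Other" (6): highest predicted value
female_other <- predict_attitude(gender = 2, title = 6)

print(c(male_professor = male_professor, female_other = female_other))
```

The two extremes are 7.589937 and 8.024014, a difference of less than half a point on the ten-point scale.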

References
Housewright, R., Schonfeld, R. C., & Wulfson, K. (2013). Ithaka S+R US faculty survey 2012 (pp. 45-80). Ithaka S+R.

Appendix: R Code

Convert Gender and title to factors:


#Convert gender to factor
ithaka$GenderFac<-factor(ithaka$Gender,
levels=c(1,2),
labels=c("Male", "Female"))

#Convert Title to factor


ithaka$TitleFac<-factor(ithaka$Title,
levels=c(1:6),
labels=c("Professor",
"Associate Professor",
"Assistant Professor",
"Adjunct Professor",
"Lecturer",
"Other"))

Create a subset of the data and compute descriptive statistics:


#Create a subset of the ithaka data
library(dplyr)
ithakaSubset1<- dplyr::select(ithaka,
Q7_3_A,Q7_3_B,Q7_3_C,Q7_3_D,Q7_3_E,Q7_3_F,Q7_3_G,Q7_3_H,
GenderFac,TitleFac)

#Descriptive statistics using Hmisc::describe


library(Hmisc)
Hmisc::describe (ithakaSubset1)

Missing data plot using visdat package: R code


#Missing data plot using visdat package
library(visdat)
visdat::vis_miss(ithakaSubset1)

Missing data pattern plot using naniar package: R code

#Missing value plots with naniar package


library(naniar)
naniar::gg_miss_upset(ithakaSubset1,

nsets=10) #Note:nsets = number of vars

#Plotting the number of variables with missing values for each case
naniar::gg_miss_case(ithakaSubset1)
naniar::gg_miss_case(ithakaSubset1,
facet=GenderFac)
naniar::gg_miss_case(ithakaSubset1,
facet=TitleFac)

Missing data patterns


#checking missing data patterns

library(mice)
mice::md.pattern(ithakaSubset1)

Little’s test for MCAR


#Testing for MCAR
library(BaylorEdPsych)
MCARtest1<-BaylorEdPsych::LittleMCAR(ithakaSubset1)
MCARtest1$chi.square
MCARtest1$df
MCARtest1$p.value
MCARtest1$missing.patterns
MCARtest1$amount.missing

Remove those cases that have no valid responses from the dataframe: R code

#Removing cases that are missing values on all the variables
library(dplyr)
ithakaQ7_3items2<-dplyr::filter(ithakaSubset1,
!is.na(Q7_3_A) | !is.na(Q7_3_B) | !is.na(Q7_3_C) |
!is.na(Q7_3_D) | !is.na(Q7_3_E) | !is.na(Q7_3_F) |
!is.na(Q7_3_G) | !is.na(Q7_3_H) | !is.na(GenderFac) | !is.na(TitleFac))

mice::md.pattern(ithakaQ7_3items2)
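
The same kind of filter can be written more compactly in base R by counting the missing values per row. This is a sketch on a small made-up data frame (`toy` and its columns are hypothetical), not the code used for the analysis.

```r
# Toy data: row 2 is missing on every variable, row 4 only on some
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c(NA, NA, 2, 5),
  g = factor(c("Male", NA, "Female", NA))
)

# TRUE for rows in which every variable is missing
all_missing <- rowSums(is.na(toy)) == ncol(toy)

# Keep rows with at least one valid response
toy_kept <- toy[!all_missing, ]

print(toy_kept)
print(sum(is.na(toy)))       # missing cells before
print(sum(is.na(toy_kept)))  # missing cells after
```

Only the completely empty row is dropped; partially observed rows stay in the data and remain available for imputation.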

Testing for MCAR


#Testing for MCAR
library(BaylorEdPsych)
MCARtest3<-BaylorEdPsych::LittleMCAR(ithakaQ7_3items2)
MCARtest3$chi.square
MCARtest3$df
MCARtest3$p.value
MCARtest3$missing.patterns
MCARtest3$amount.missing

Create dummy variables that code “missingness” of the variables: R code


#Creating dummy (0/1) variables for missing values of Q7, Gender, and Title
library(car)
ithakaQ7_3items2$Q7_3_Amiss<-car::recode(ithakaQ7_3items2$Q7_3_A, "NA=1; else=0")
ithakaQ7_3items2$Q7_3_Bmiss<-car::recode(ithakaQ7_3items2$Q7_3_B, "NA=1; else=0")
ithakaQ7_3items2$Q7_3_Cmiss<-car::recode(ithakaQ7_3items2$Q7_3_C, "NA=1; else=0")
ithakaQ7_3items2$Q7_3_Dmiss<-car::recode(ithakaQ7_3items2$Q7_3_D, "NA=1; else=0")
ithakaQ7_3items2$Q7_3_Emiss<-car::recode(ithakaQ7_3items2$Q7_3_E, "NA=1; else=0")
ithakaQ7_3items2$Q7_3_Fmiss<-car::recode(ithakaQ7_3items2$Q7_3_F, "NA=1; else=0")
ithakaQ7_3items2$Q7_3_Gmiss<-car::recode(ithakaQ7_3items2$Q7_3_G, "NA=1; else=0")
ithakaQ7_3items2$Q7_3_Hmiss<-car::recode(ithakaQ7_3items2$Q7_3_H, "NA=1; else=0")
ithakaQ7_3items2$GenderFacmiss<-car::recode(ithakaQ7_3items2$GenderFac, "NA=1; else=0")
ithakaQ7_3items2$TitleFacmiss<-car::recode(ithakaQ7_3items2$TitleFac, "NA=1; else=0")

Logistic regression predicting “missingness” of Q7_3_A from the remaining variables in the dataframe: R code
#Logistic regression predicting missingness of Q7_3_A from Title, Gender, and the other Q7_3 items

Q7_3_AMARcheck<-glm(data=ithakaQ7_3items2,

Q7_3_Amiss~
TitleFac+GenderFac+Q7_3_B+Q7_3_C+Q7_3_D+Q7_3_E+Q7_3_F+Q7_3_G+Q7_3_H,

family=binomial)

summary(Q7_3_AMARcheck)

Logistic regression predicting “missingness” of Q7_3_B from the remaining variables in the dataframe: R code
#Logistic regression predicting missingness of Q7_3_B from Title, Gender, and the other Q7_3 items

Q7_3_BMARcheck<-glm(data=ithakaQ7_3items2,

Q7_3_Bmiss~
TitleFac+GenderFac+Q7_3_A+Q7_3_C+Q7_3_D+Q7_3_E+Q7_3_F+Q7_3_G+Q7_3_H,

family=binomial)

summary(Q7_3_BMARcheck)

Logistic regression predicting “missingness” of Q7_3_C from the remaining variables in the dataframe: R code
#Logistic regression predicting missingness of Q7_3_C from Title, Gender, and the other Q7_3 items

Q7_3_CMARcheck<-glm(data=ithakaQ7_3items2,

Q7_3_Cmiss~
TitleFac+GenderFac+Q7_3_A+Q7_3_B+Q7_3_D+Q7_3_E+Q7_3_F+Q7_3_G+Q7_3_H,

family=binomial)

summary(Q7_3_CMARcheck)

Logistic regression predicting “missingness” of Q7_3_D from the remaining variables in the dataframe: R code
#Logistic regression predicting missingness of Q7_3_D from Title, Gender, and the other Q7_3 items

Q7_3_DMARcheck<-glm(data=ithakaQ7_3items2,

Q7_3_Dmiss~
TitleFac+GenderFac+Q7_3_A+Q7_3_B+Q7_3_C+Q7_3_E+Q7_3_F+Q7_3_G+Q7_3_H,

family=binomial)

summary(Q7_3_DMARcheck)

Logistic regression predicting “missingness” of Q7_3_E from the remaining variables in the dataframe: R code
#Logistic regression predicting missingness of Q7_3_E from Title, Gender, and the other Q7_3 items

Q7_3_EMARcheck<-glm(data=ithakaQ7_3items2,

Q7_3_Emiss~
TitleFac+GenderFac+Q7_3_A+Q7_3_B+Q7_3_C+Q7_3_D+Q7_3_F+Q7_3_G+Q7_3_H,

family=binomial)

summary(Q7_3_EMARcheck)

Logistic regression predicting “missingness” of Q7_3_F from the remaining variables in the dataframe: R code
#Logistic regression predicting missingness of Q7_3_F from Title, Gender, and the other Q7_3 items

Q7_3_FMARcheck<-glm(data=ithakaQ7_3items2,

Q7_3_Fmiss~
TitleFac+GenderFac+Q7_3_A+Q7_3_B+Q7_3_C+Q7_3_D+Q7_3_E+Q7_3_G+Q7_3_H,

family=binomial)

summary(Q7_3_FMARcheck)

Logistic regression predicting “missingness” of Q7_3_G from the remaining variables in the dataframe: R code
#Logistic regression predicting missingness of Q7_3_G from Title, Gender, and the other Q7_3 items

Q7_3_GMARcheck<-glm(data=ithakaQ7_3items2,

Q7_3_Gmiss~
TitleFac+GenderFac+Q7_3_A+Q7_3_B+Q7_3_C+Q7_3_D+Q7_3_E+Q7_3_F+Q7_3_H,

family=binomial)

summary(Q7_3_GMARcheck)

Logistic regression predicting “missingness” of Q7_3_H from the remaining variables in the dataframe: R code
#Logistic regression predicting missingness of Q7_3_H from Title, Gender, and the other Q7_3 items

Q7_3_HMARcheck<-glm(data=ithakaQ7_3items2,

Q7_3_Hmiss~
TitleFac+GenderFac+Q7_3_A+Q7_3_B+Q7_3_C+Q7_3_D+Q7_3_E+Q7_3_F+Q7_3_G,

family=binomial)

summary(Q7_3_HMARcheck)

Logistic regression predicting “missingness” of Title from the remaining variables in the dataframe: R code
#Logistic regression predicting missingness of Title from Gender and the Q7_3 items

TitleFacMARcheck<-glm(data=ithakaQ7_3items2,

TitleFacmiss~
GenderFac+Q7_3_A+Q7_3_B+Q7_3_C+Q7_3_D+Q7_3_E+Q7_3_F+Q7_3_G+Q7_3_H,

family=binomial)

summary(TitleFacMARcheck)

Logistic regression predicting “missingness” of Gender from the remaining variables in the dataframe: R code
#Logistic regression predicting missingness of Gender from Title and the Q7_3 items

GenderFacMARcheck<-glm(data=ithakaQ7_3items2,

GenderFacmiss~
TitleFac+Q7_3_A+Q7_3_B+Q7_3_C+Q7_3_D+Q7_3_E+Q7_3_F+Q7_3_G+Q7_3_H,

family=binomial)

summary(GenderFacMARcheck)
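
The ten models above all follow the same template: the missingness indicator of one variable is regressed on all the other variables. As a sketch of how this could be written once and applied to every variable, the following base R loop uses a small simulated data frame. `toy`, `item1`, `item2`, and `group` are hypothetical stand-ins, not the Ithaka variables.

```r
set.seed(1)
n <- 300

# Hypothetical stand-in data: two items and one grouping factor
toy <- data.frame(
  item1 = sample(1:10, n, replace = TRUE),
  item2 = sample(1:10, n, replace = TRUE),
  group = factor(sample(c("Male", "Female"), n, replace = TRUE))
)
toy$item1[sample(n, 30)] <- NA
toy$item2[sample(n, 25)] <- NA

vars <- names(toy)
# Only variables that actually have missing values need a check
vars_with_na <- vars[colSums(is.na(toy)) > 0]

# One logistic regression per variable: missingness ~ all other variables
mar_checks <- lapply(vars_with_na, function(v) {
  d <- toy
  d$miss <- as.integer(is.na(d[[v]]))   # 1 = missing, 0 = not missing
  f <- reformulate(setdiff(vars, v), response = "miss")
  glm(f, data = d, family = binomial)
})
names(mar_checks) <- vars_with_na

# Coefficient tables, one per checked variable
print(lapply(mar_checks, function(m) round(summary(m)$coefficients, 3)))
```

Building the formula with `reformulate()` keeps the predictor list in step with the data frame, which avoids typing each right-hand side by hand.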

Create a composite (mean) “Faculty’s Attitude” score for each respondent: R code
#Compute a composite ‘Faculty’s Attitude’ (SUeB) score and add it to the dataframe
ithakaSubset1$SUeB<- rowMeans(cbind(ithakaSubset1$Q7_3_A,
ithakaSubset1$Q7_3_B,
ithakaSubset1$Q7_3_C,
ithakaSubset1$Q7_3_D,
ithakaSubset1$Q7_3_E,
ithakaSubset1$Q7_3_F,
ithakaSubset1$Q7_3_G,
ithakaSubset1$Q7_3_H),
na.rm=TRUE)

Random single imputation of missing values using the imputeR package: R code

#Create new dataframe with randomly-imputed missing values
library(imputeR)
ithakaRanImpute<-imputeR::guess(ithakaSubset1, type="random")
ithakaRanImpute<-as.data.frame(ithakaRanImpute)
#Relabel the imputed numeric gender codes as a factor
ithakaRanImpute$GenderFac<-factor(ithakaRanImpute$GenderFac,
levels=c(1,2),
labels=c("Male","Female"))
#Store the labelled title as titleFac; the numeric TitleFac column is left unchanged
ithakaRanImpute$titleFac<-factor(ithakaRanImpute$TitleFac,
levels=c(1:6),
labels=c("Professor",
"Associate Professor",
"Assistant Professor",
"Adjunct Professor",
"Lecturer",
"Other"))
Hmisc::describe(ithakaRanImpute)

Regression using the randomly-imputed data set: R code


#Regression using the randomly-imputed data set
ithakaRanImputeReg <- lm(SUeB ~ GenderFac + TitleFac,
data=ithakaRanImpute)
summary(ithakaRanImputeReg)
plot(ithakaRanImputeReg)
