Etr790 Project Paper

Do the Following Factors (gender, title) Predict the Faculty’s Attitude Toward the Value of
Improvements in Digital Library?
Missing Data Analysis
(Ithaka Data, 2012)
Abier Akiry
Asma Marghalani
ETR 790
Fall 2019
Data Set
The data set used for this project is the Ithaka faculty data set, obtained from www.icpsr.org.
The data are survey data. Schonfeld, Wulfson, and Housewright (2012) selected a total of 44,218
participants through an "every nth" selection from a list of faculty members in the United States.
A total of 5,261 individuals responded to the survey. The survey was conducted to explore faculty
attitudes toward and perceptions of digital research, teaching, and communication.
Variables
Dependent variable
The predicted variable is a mean composite score measuring the faculty’s attitude toward
the value of improvements in the digital library. The composite has eight items. Each item uses
a ten-point Likert scale; in the SPSS file, responses 1, 2, 3, 4, and 5 are grouped as lowest and
responses 6, 7, 8, 9, and 10 as highest. Specifically, the respondents were asked:
7_3) You may have the opportunity to read scholarly monographs in electronic format, either
through a library subscription database or as a standalone e-book. Certain changes in the future
may make digital versions more valuable to you. Use the scales below to rate how much more
valuable each of the following would make digital versions of scholarly monographs to you than
they are today, from 10 to 1 where 10 equals "Much more valuable than they are today" and 1
equals "Not at all more valuable than they are today." Please select one answer for each item:
A. Access to a wider range of materials in digital form
B. Improved ability to download and organize a personal collection of monographs
C. Improved ability to navigate through and among monographs
D. Improved ability to read scholarly monographs on my device of choice
E. Improved ability to highlight, annotate, and print materials as needed
F. Ability to perform computational analysis (text mining) over a corpus of electronic
monographs
G. More effective integration of images, multimedia, and graphs linked to the text
H. Certified preservation of digital scholarly monographs
Independent variables
The independent variables are gender and job title, both of which are categorical predictors.
Gender has two response options (1 = Male, 2 = Female), and the job title question has six
(1 = Professor, 2 = Associate Professor, 3 = Assistant Professor, 4 = Adjunct Professor,
5 = Lecturer, and 6 = Other).
Descriptive statistics:
Summary statistics for the dependent variables (Table 1) include the mean of each survey
item: Q7_3_A, Q7_3_B, Q7_3_C, Q7_3_D, Q7_3_E, Q7_3_F, Q7_3_G, and Q7_3_H.
Table 1: Summary Statistics for Dependent Variables

Q7_3_A : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-Access to a wider range of materials in digital form Format:F2.0
n missing distinct Info Mean Gmd
5137 124 10 0.904 8.45 2.02
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 100 57 75 62 202 249 394 778 963 2257
Proportion 0.019 0.011 0.015 0.012 0.039 0.048 0.077 0.151 0.187 0.439
Q7_3_B : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-Improved ability to download and organize a personal collection of monographs Format:F2.0
n missing distinct Info Mean Gmd
5148 113 10 0.934 8.102 2.339
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 155 72 123 75 269 319 425 847 897 1966
Proportion 0.030 0.014 0.024 0.015 0.052 0.062 0.083 0.165 0.174 0.382
Q7_3_C : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-Improved ability to navigate through and among monographs Format:F2.0
n missing distinct Info Mean Gmd
5144 117 10 0.933 8.209 2.18
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 121 63 81 75 257 300 448 926 920 1953
Proportion 0.024 0.012 0.016 0.015 0.050 0.058 0.087 0.180 0.179 0.380
Q7_3_D : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-Improved ability to read scholarly monographs on my device of choice Format:F2.0
n missing distinct Info Mean Gmd
5146 115 10 0.957 7.568 2.804
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 276 128 160 96 405 384 470 792 756 1679
Proportion 0.054 0.025 0.031 0.019 0.079 0.075 0.091 0.154 0.147 0.326
Q7_3_E : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-Improved ability to highlight, annotate, and print materials as needed Format:F2.0
n missing distinct Info Mean Gmd
5151 110 10 0.918 8.133 2.409
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 179 83 117 87 291 267 414 691 856 2166
Proportion 0.035 0.016 0.023 0.017 0.056 0.052 0.080 0.134 0.166 0.421
Q7_3_F : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-Ability to perform computational analysis (text mining) over a corpus of electronic monographs
Format:F2.0
n missing distinct Info Mean Gmd
5129 132 10 0.973 6.826 3.375
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 544 216 233 168 424 390 437 688 652 1377
Proportion 0.106 0.042 0.045 0.033 0.083 0.076 0.085 0.134 0.127 0.268
Q7_3_G : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-More effective integration of images, multimedia, and graphs linked to the text Format:F2.0
n missing distinct Info Mean Gmd
5138 123 10 0.958 7.612 2.737
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 229 124 180 121 380 364 478 822 796 1644
Proportion 0.045 0.024 0.035 0.024 0.074 0.071 0.093 0.160 0.155 0.320
Q7_3_H : You / may have the opportunity to read scholarly monographs in electronic format, / either through a
lib...-Certified preservation of digital scholarly monographs Format:F2.0
n missing distinct Info Mean Gmd
5106 155 10 0.971 7.03 3.143
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 405 150 208 136 566 462 469 672 620 1418
Proportion 0.079 0.029 0.041 0.027 0.111 0.090 0.092 0.132 0.121 0.278
Summary statistics for the independent variables (gender and title) (Table 2) show that
gender has 181 missing values and title has 96.
Table 2: Summary Statistics for Independent Variables

GenderFac
n missing distinct
5080 181 2
Value Male Female

Frequency 2863 2217
Proportion 0.564 0.436
TitleFac
n missing distinct
5165 96 6
lowest : Professor Associate Professor Assistant Professor Adjunct Professor Lecturer

highest: Associate Professor Assistant Professor Adjunct Professor Lecturer Other
Value Professor Associate Professor Assistant Professor Adjunct Professor Lecturer Other
Frequency 1949 1358 877 365 270 346
Proportion 0.377 0.263 0.170 0.071 0.052 0.067
Description of missing data patterns
The missing data pattern (Figure 1) provides a graphical summary of the missing data
patterns. A total of 4,967 cases have no missing values on any of the variables, whereas 26 cases
have a high percentage of missing values. The pattern also shows the total number of missing
values for each variable. For instance, title has 96 missing values and gender has 181. The data
set as a whole has 1,266 missing values.
Figure 1: Missing data patterns
Graphical Representations of Missing Data
This plot (Figure 2) visualizes the amount of missing data. Missing values are shown in
black (2.4% of all values), and the plot reports the percentage of present values overall (97.6%)
and for each variable. No variable is missing more than 5% of its values.
Figure 2: Missing data plots
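The percentages quoted above can be checked against the missing counts in Tables 1 and 2. The following is a minimal sketch in R; the object names (`n_missing`, `n_cases`, and so on) are illustrative and do not come from the appendix.

```r
# Missing counts per variable, copied from Tables 1 and 2
n_missing <- c(Q7_3_A = 124, Q7_3_B = 113, Q7_3_C = 117, Q7_3_D = 115,
               Q7_3_E = 110, Q7_3_F = 132, Q7_3_G = 123, Q7_3_H = 155,
               GenderFac = 181, TitleFac = 96)
n_cases <- 5261  # number of respondents

# Total missing cells, overall percentage, and per-variable percentages
total_missing <- sum(n_missing)
pct_missing   <- 100 * total_missing / (n_cases * length(n_missing))
pct_by_var    <- 100 * n_missing / n_cases

total_missing
round(pct_missing, 1)
round(max(pct_by_var), 1)
```

With these counts the total is 1,266 missing values, about 2.4% of all cells, and the largest single-variable rate (gender) stays below 5%.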
The missing data plot (Figure 3) shows that both demographic variables have missing data
and that most of the missing data come from the gender variable.
Figure 3: Missing data plot

Little's test for MCAR
Figure 4: MCAR test
To handle the missing values, we first removed the cases that were missing values on all
of the items (Figure 4). The total number of missing values fell from 1,266 to 1,006 after these
cases were removed. We then assessed whether the data were missing completely at random
(MCAR) or missing at random (MAR). Little's MCAR test gave χ²(252) = 288.48, p = 0.056, so
the null hypothesis was not rejected and the missing values were judged to be MCAR.
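The reported p-value can be recomputed from the test statistic and its degrees of freedom. This is a small sketch using base R's `pchisq`; the variable names are illustrative, and the result should land close to the reported 0.056.

```r
# Test statistic and degrees of freedom as reported for Little's MCAR test
chi_square <- 288.48
dof <- 252

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_square, df = dof, lower.tail = FALSE)
round(p_value, 3)

# TRUE means the MCAR null hypothesis is not rejected at the .05 level
p_value > 0.05
```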
Tests for MAR
Before running the MAR tests, we dummy coded the missing values for each variable
(1 = value is missing, 0 = value is not missing). We then examined this “missingness” using
logistic regression. Table 3 lists the variables whose missingness was significantly predicted
(p < .05) by other variables. For example, the missingness of Q7_3_G (“more effective
integration of images, multimedia, and graphs linked to the text”) was predicted by the title
“Professor” and the gender “Female.”
Table 3: Tests for MAR
Variable  Predictor  Estimate  Std. Error  z value  P-value
Q7_3_G    Professor  -2.12298  1.14542     -2.013   0.0441
Q7_3_G    Female      1.22333  0.55855      2.190   0.0285
Q7_3_D    Q7_3_H      0.55503  0.27145      2.045   0.040888
Q7_3_C    Q7_3_F      0.4451   0.1827       2.437   0.0148
Q7_3_A    Q7_3_D     -0.23915  0.10701     -2.235   0.0254
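Logistic regression estimates are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. The sketch below does this for the two Q7_3_G rows of Table 3; the vector names are illustrative.

```r
# Log-odds estimates for missingness of Q7_3_G, copied from Table 3
log_odds <- c(Professor = -2.12298, Female = 1.22333)

# Exponentiate to obtain odds ratios
odds_ratio <- exp(log_odds)
round(odds_ratio, 2)
```

Read this way, the Female estimate corresponds to roughly 3.4 times the odds of a missing Q7_3_G response, and the row labeled Professor to roughly 0.12 times the odds, holding the other predictors constant.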
Imputation
Single imputation was used to handle the missing values in this analysis, based on the
result of the MCAR test (p > .05). Before imputing, we computed coefficient alpha, which
assumes that the scale measures a single construct. The results after random single imputation
of the missing values are:
10 Variables 5261 Observations
------------------------------------------------------------------------------------------------------------------
Q7_3_A
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.897 8.487 1.997 4 6 8 9 10 10 10
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 100 57 75 62 202 249 394 778 963 2381
Proportion 0.019 0.011 0.014 0.012 0.038 0.047 0.075 0.148 0.183 0.453
------------------------------------------------------------------------------------------------------------------
Q7_3_B
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.938 7.95 2.538 1 4 7 9 10 10 10
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 268 72 123 75 269 319 425 847 897 1966
Proportion 0.051 0.014 0.023 0.014 0.051 0.061 0.081 0.161 0.170 0.374
------------------------------------------------------------------------------------------------------------------
Q7_3_C
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.935 8.227 2.152 3 5 7 9 10 10 10
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 121 63 81 75 257 300 448 926 1037 1953
Proportion 0.023 0.012 0.015 0.014 0.049 0.057 0.085 0.176 0.197 0.371
------------------------------------------------------------------------------------------------------------------
Q7_3_D
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.958 7.599 2.772 1 3 6 8 10 10 10
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 276 128 160 96 405 384 470 792 871 1679
Proportion 0.052 0.024 0.030 0.018 0.077 0.073 0.089 0.151 0.166 0.319
------------------------------------------------------------------------------------------------------------------
Q7_3_E
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.912 8.172 2.386 3 5 7 9 10 10 10
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 179 83 117 87 291 267 414 691 856 2276
Proportion 0.034 0.016 0.022 0.017 0.055 0.051 0.079 0.131 0.163 0.433
------------------------------------------------------------------------------------------------------------------
Q7_3_F
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.974 6.679 3.493 1 1 5 8 10 10 10
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 676 216 233 168 424 390 437 688 652 1377
Proportion 0.128 0.041 0.044 0.032 0.081 0.074 0.083 0.131 0.124 0.262
------------------------------------------------------------------------------------------------------------------
Q7_3_G
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.96 7.574 2.734 2 3 6 8 10 10 10
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 229 124 180 121 380 487 478 822 796 1644
Proportion 0.044 0.024 0.034 0.023 0.072 0.093 0.091 0.156 0.151 0.312
------------------------------------------------------------------------------------------------------------------
Q7_3_H
n missing distinct Info Mean Gmd .05 .10 .25 .50 .75 .90 .95
5261 0 10 0.972 7.088 3.105 1 2 5 8 10 10 10
lowest : 1 2 3 4 5, highest: 6 7 8 9 10
Value 1 2 3 4 5 6 7 8 9 10
Frequency 405 150 208 136 566 462 469 672 775 1418
Proportion 0.077 0.029 0.040 0.026 0.108 0.088 0.089 0.128 0.147 0.270
------------------------------------------------------------------------------------------------------------------
GenderFac
n missing distinct Info Mean Gmd
5261 0 2 0.731 1.421 0.4877
Value 1 2
Frequency 3044 2217
Proportion 0.579 0.421
---------------------------------------------------------------------------------------------------------------------------
TitleFac
n missing distinct Info Mean Gmd
5261 0 6 0.919 2.334 1.565
lowest : 1 2 3 4 5, highest: 2 3 4 5 6
Value 1 2 3 4 5 6
Frequency 2045 1358 877 365 270 346
Proportion 0.389 0.258 0.167 0.069 0.051 0.066
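The general idea of random single imputation, as reported above, is that each missing response is replaced once by a value drawn from the observed responses on the same variable. The following base-R sketch illustrates that idea on toy data. It is not the imputeR implementation used in the appendix, and `impute_random` is a hypothetical helper written for this illustration.

```r
set.seed(2019)  # arbitrary seed so the sketch is reproducible

# Hypothetical helper: fill each NA with a value drawn from the observed ones
impute_random <- function(x) {
  is_missing <- is.na(x)
  observed <- x[!is_missing]
  # Index-based sampling avoids sample()'s special case for length-1 input
  draws <- sample.int(length(observed), sum(is_missing), replace = TRUE)
  x[is_missing] <- observed[draws]
  x
}

# Toy item on the 1-10 scale with three missing responses
toy_item <- c(10, 8, NA, 9, NA, 7, 10, NA)
imputed <- impute_random(toy_item)
imputed
```

Observed responses are left unchanged; only the three missing positions receive drawn values.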
Inferential Analysis
A linear regression on the randomly imputed data was carried out to determine the
relationship between the independent variables (job title and gender) and the dependent
variable (the faculty’s attitude toward the value of improvements in the digital library).
After random imputation, gender is a significant predictor (p = .0001). Job title is reported as
not significant, although its p-value (p = .0052) is below the conventional .05 threshold
(Figure 5).
Discussion
The final linear model, y = 7.34374 + 0.199227[gender] + 0.04697[title], shows that a
respondent is more likely to place a high value on improvements in e-libraries if the respondent
is a woman with a lower-ranked job title. Male professors are the most likely to place a low
value on improvements. There are two possible reasons why data were missing: (1) some
participants agreed to participate but did not respond to all survey items, or (2) participants
began the survey but responded only to the first section of the online survey's Likert scales
(the demographic questions were placed at the end of the survey).
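To make the reported equation concrete, the sketch below evaluates it for two respondent profiles. It assumes that gender and title enter the model as the numeric codes listed under Independent variables (gender 1 to 2, title 1 to 6); `predict_attitude` is a hypothetical helper written for this illustration, not a function from the appendix.

```r
# Coefficients copied from the reported linear model
b0 <- 7.34374
b_gender <- 0.199227
b_title <- 0.04697

# Hypothetical helper; assumes numeric codes (gender 1-2, title 1-6)
predict_attitude <- function(gender, title) {
  b0 + b_gender * gender + b_title * title
}

male_professor <- predict_attitude(gender = 1, title = 1)
female_other   <- predict_attitude(gender = 2, title = 6)
round(c(male_professor, female_other), 2)
```

Under that coding assumption the predicted composite is about 7.59 for a male Professor and about 8.02 for a female respondent in the Other category, a difference of roughly 0.43 points on the ten-point scale.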
References
Housewright, R., Schonfeld, R. C., & Wulfson, K. (2013). Ithaka S+R US Faculty Survey 2012
(pp. 45-80). Ithaka S+R.
Appendix: R Code
Convert Gender and title to factors:

#Convert gender to factor
ithaka$GenderFac<-factor(ithaka$Gender,
levels=c(1,2),
labels=c("Male", "Female"))
#Convert Title to factor

ithaka$TitleFac<-factor(ithaka$Title,
levels=c(1:6),
labels=c("Professor",
"Associate Professor",
"Assistant Professor",
"Adjunct Professor",
"Lecturer",
"Other"))
Create a subset of the data and compute descriptive statistics:

#Create a subset of the ithaka data
library(dplyr)
ithakaSubset1<- dplyr::select(ithaka,
Q7_3_A,Q7_3_B,Q7_3_C,Q7_3_D,Q7_3_E,Q7_3_F,Q7_3_G,Q7_3_H,
GenderFac,TitleFac)
#Descriptive statistics using Hmisc::describe

library(Hmisc)
Hmisc::describe (ithakaSubset1)
Missing data plot using visdat package: R code

#Missing data plot using visdat package
library(visdat)
visdat::vis_miss(ithakaSubset1)
Missing data plots using naniar package: R code
#Missing value plots with naniar package

library(naniar)
naniar::gg_miss_upset(ithakaSubset1,
nsets=10) #Note:nsets = number of vars
#Plotting the number of variables with missing values for each case
naniar::gg_miss_case(ithakaSubset1)
naniar::gg_miss_case(ithakaSubset1,
facet=GenderFac)
naniar::gg_miss_case(ithakaSubset1,
facet=TitleFac)
Missing data patterns

#checking missing data patterns
library(mice)
mice::md.pattern(ithakaSubset1)
Little’s test for MCAR

#Testing for MCAR
library(BaylorEdPsych)
MCARtest1<-BaylorEdPsych::LittleMCAR(ithakaSubset1)
MCARtest1$chi.square
MCARtest1$df
MCARtest1$p.value
MCARtest1$missing.patterns
MCARtest1$amount.missing
#Create a subset of the ithaka data

library(dplyr)
ithakaQ7_3items2<- dplyr::select(ithaka,
Q7_3_A,Q7_3_B,Q7_3_C,Q7_3_D,Q7_3_E,Q7_3_F,Q7_3_G,Q7_3_H,
GenderFac,TitleFac)
If any cases have no valid responses on any variable, remove them from the dataframe:
#Removing cases that are missing values for all the items
ithakaQ7_3items2<-dplyr::filter(ithakaSubset1,
!is.na(Q7_3_A) | !is.na(Q7_3_B) | !is.na(Q7_3_C) |
!is.na(Q7_3_D) | !is.na(Q7_3_E) | !is.na(Q7_3_F) |
!is.na(Q7_3_G) | !is.na(Q7_3_H) | !is.na(GenderFac) | !is.na(TitleFac))
mice::md.pattern(ithakaQ7_3items2)
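The filter above keeps a case when at least one variable is observed. An equivalent check can be sketched in base R by counting missing values per row; the toy data frame and object names below are illustrative only.

```r
# Toy data frame: row 3 has no valid response on any variable
toy <- data.frame(
  Q7_3_A = c(10, NA, NA, 8),
  Q7_3_B = c(9, 7, NA, NA),
  GenderFac = factor(c("Male", NA, NA, "Female"))
)

# Count missing values per row and keep rows with at least one observed value
n_missing_per_row <- rowSums(is.na(toy))
toy_kept <- toy[n_missing_per_row < ncol(toy), ]

nrow(toy_kept)
```

Counting per row scales to any number of columns without listing each `!is.na()` condition by hand.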
Testing for MCAR

#Testing for MCAR
library(BaylorEdPsych)
MCARtest3<-BaylorEdPsych::LittleMCAR(ithakaQ7_3items2)
MCARtest3$chi.square
MCARtest3$df
MCARtest3$p.value
MCARtest3$missing.patterns
MCARtest3$amount.missing
Create dummy variables that code “missingness” of the variables: R code

#Creating dummy (0/1) variables for missing values of Q7
library(car)
ithakaQ7_3items2$Q7_3_Amiss<-car::recode(ithakaQ7_3items2$Q7_3_A, "NA=1; else=0")
ithakaQ7_3items2$Q7_3_Bmiss<-car::recode(ithakaQ7_3items2$Q7_3_B, "NA=1; else=0")
ithakaQ7_3items2$Q7_3_Cmiss<-car::recode(ithakaQ7_3items2$Q7_3_C, "NA=1; else=0")
ithakaQ7_3items2$Q7_3_Dmiss<-car::recode(ithakaQ7_3items2$Q7_3_D, "NA=1; else=0")
ithakaQ7_3items2$Q7_3_Emiss<-car::recode(ithakaQ7_3items2$Q7_3_E, "NA=1; else=0")
ithakaQ7_3items2$Q7_3_Fmiss<-car::recode(ithakaQ7_3items2$Q7_3_F, "NA=1; else=0")
ithakaQ7_3items2$Q7_3_Gmiss<-car::recode(ithakaQ7_3items2$Q7_3_G, "NA=1; else=0")
ithakaQ7_3items2$Q7_3_Hmiss<-car::recode(ithakaQ7_3items2$Q7_3_H, "NA=1; else=0")
ithakaQ7_3items2$GenderFacmiss<-car::recode(ithakaQ7_3items2$GenderFac, "NA=1; else=0")
ithakaQ7_3items2$TitleFacmiss<-car::recode(ithakaQ7_3items2$TitleFac, "NA=1; else=0")
Logistic regression predicting “missingness” of Q7_3_A from the remaining variables in the dataframe: R
code
#Logistic regression predicting missingness of Q7_3_A from education Title, Gender, and other Q7_3 items
Q7_3_AMARcheck<-glm(data=ithakaQ7_3items2,
Q7_3_Amiss~
TitleFac+GenderFac+Q7_3_B+Q7_3_C+Q7_3_D+Q7_3_E+Q7_3_F+Q7_3_G+Q7_3_H,
family=binomial)
summary(Q7_3_AMARcheck)
Logistic regression predicting “missingness” of Q7_3_B from the remaining variables in the dataframe: R
code
#Logistic regression predicting missingness of Q7_3_B from education Title, Gender, and other Q7_3 items
Q7_3_BMARcheck<-glm(data=ithakaQ7_3items2,
Q7_3_Bmiss~
TitleFac+GenderFac+Q7_3_A+Q7_3_C+Q7_3_D+Q7_3_E+Q7_3_F+Q7_3_G+Q7_3_H,
family=binomial)
summary(Q7_3_BMARcheck)
Logistic regression predicting “missingness” of Q7_3_C from the remaining variables in the dataframe: R
code
#Logistic regression predicting missingness of Q7_3_C from education Title, Gender, and other Q7_3 items
Q7_3_CMARcheck<-glm(data=ithakaQ7_3items2,
Q7_3_Cmiss~
TitleFac+GenderFac+Q7_3_A+Q7_3_B+Q7_3_D+Q7_3_E+Q7_3_F+Q7_3_G+Q7_3_H,
family=binomial)
summary(Q7_3_CMARcheck)
Logistic regression predicting “missingness” of Q7_3_D from the remaining variables in the dataframe: R
code
#Logistic regression predicting missingness of Q7_3_D from education Title, Gender, and other Q7_3 items
Q7_3_DMARcheck<-glm(data=ithakaQ7_3items2,
Q7_3_Dmiss~
TitleFac+GenderFac+Q7_3_A+Q7_3_B+Q7_3_C+Q7_3_E+Q7_3_F+Q7_3_G+Q7_3_H,
family=binomial)
summary(Q7_3_DMARcheck)
Logistic regression predicting “missingness” of Q7_3_E from the remaining variables in the dataframe: R
code
#Logistic regression predicting missingness of Q7_3_E from education Title, Gender, and other Q7_3 items
Q7_3_EMARcheck<-glm(data=ithakaQ7_3items2,
Q7_3_Emiss~
TitleFac+GenderFac+Q7_3_A+Q7_3_B+Q7_3_C+Q7_3_D+Q7_3_F+Q7_3_G+Q7_3_H,
family=binomial)
summary(Q7_3_EMARcheck)
Logistic regression predicting “missingness” of Q7_3_F from the remaining variables in the dataframe: R
code
#Logistic regression predicting missingness of Q7_3_F from education Title, Gender, and other Q7_3 items
Q7_3_FMARcheck<-glm(data=ithakaQ7_3items2,
Q7_3_Fmiss~
TitleFac+GenderFac+Q7_3_A+Q7_3_B+Q7_3_C+Q7_3_D+Q7_3_E+Q7_3_G+Q7_3_H,
family=binomial)
summary(Q7_3_FMARcheck)
Logistic regression predicting “missingness” of Q7_3_G from the remaining variables in the dataframe: R
code
#Logistic regression predicting missingness of Q7_3_G from education Title, Gender, and other Q7_3 items
Q7_3_GMARcheck<-glm(data=ithakaQ7_3items2,
Q7_3_Gmiss~
TitleFac+GenderFac+Q7_3_A+Q7_3_B+Q7_3_C+Q7_3_D+Q7_3_E+Q7_3_F+Q7_3_H,
family=binomial)
summary(Q7_3_GMARcheck)
Logistic regression predicting “missingness” of Q7_3_H from the remaining variables in the dataframe: R
code
#Logistic regression predicting missingness of Q7_3_H from education Title, Gender, and other Q7_3 items
Q7_3_HMARcheck<-glm(data=ithakaQ7_3items2,
Q7_3_Hmiss~
TitleFac+GenderFac+Q7_3_A+Q7_3_B+Q7_3_C+Q7_3_D+Q7_3_E+Q7_3_F+Q7_3_G,
family=binomial)
summary(Q7_3_HMARcheck)
Logistic regression predicting “missingness” of Title from the remaining variables in the dataframe: R
code
#Logistic regression predicting missingness of Title from education Gender, and other Q7_3 items
TitleFacMARcheck<-glm(data=ithakaQ7_3items2,
TitleFacmiss~
GenderFac+Q7_3_A+Q7_3_B+Q7_3_C+Q7_3_D+Q7_3_E+Q7_3_F+Q7_3_G+Q7_3_H,
family=binomial)
summary(TitleFacMARcheck)
Logistic regression predicting “missingness” of Gender from the remaining variables in the dataframe: R
code
#Logistic regression predicting missingness of Gender from education Title, and other Q7_3 items
GenderFacMARcheck<-glm(data=ithakaQ7_3items2,
GenderFacmiss~
TitleFac+Q7_3_A+Q7_3_B+Q7_3_C+Q7_3_D+Q7_3_E+Q7_3_F+Q7_3_G+Q7_3_H,
family=binomial)
summary(GenderFacMARcheck)
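The checks above all follow one pattern: code missingness as 0/1, then regress it on the other variables. The sketch below shows that pattern on simulated data in which missingness depends on gender by construction. The data, the missingness rates, and all object names are invented for illustration and are not results from the Ithaka data.

```r
set.seed(790)  # arbitrary seed so the sketch is reproducible
n <- 500

# Simulated respondents: two items on the 1-10 scale plus gender
toy <- data.frame(
  item1 = sample(1:10, n, replace = TRUE),
  item2 = sample(1:10, n, replace = TRUE),
  gender = factor(sample(c("Male", "Female"), n, replace = TRUE))
)

# Make item1 missing more often for one group (MAR by construction)
p_miss <- ifelse(toy$gender == "Female", 0.30, 0.05)
toy$item1[runif(n) < p_miss] <- NA

# Dummy code missingness, then predict it from the remaining variables
toy$item1_miss <- as.integer(is.na(toy$item1))
fit <- glm(item1_miss ~ gender + item2, data = toy, family = binomial)
summary(fit)$coefficients
```

A significant predictor in such a model, as in Table 3, indicates that missingness on the item is related to an observed variable.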
Create a composite (mean) “Faculty’s Attitude” score for each respondent: R code
#Compute a composite ‘Faculty’s Attitude’ (SUeB) score and add it to the dataframe
ithakaSubset1$SUeB<- rowMeans(cbind(ithaka$Q7_3_A,
ithaka$Q7_3_B,
ithaka$Q7_3_C,
ithaka$Q7_3_D,
ithaka$Q7_3_E,
ithaka$Q7_3_F,
ithaka$Q7_3_G,
ithaka$Q7_3_H),
na.rm=TRUE)
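Because the composite is computed with `na.rm=TRUE`, a respondent's score is the mean of whichever items that respondent answered. The toy example below shows how `rowMeans` behaves in that case; the matrix and its column names are invented for illustration.

```r
# Toy responses: three respondents, three items
items <- cbind(item_a = c(10, NA, NA),
               item_b = c(8, 6, NA),
               item_c = c(9, NA, NA))

# Mean of the answered items per respondent
composite <- rowMeans(items, na.rm = TRUE)
composite

# Number of answered items behind each composite
n_answered <- rowSums(!is.na(items))
n_answered
```

The second respondent's composite rests on a single item, and the third respondent, who answered nothing, gets NaN rather than a score.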
Random single imputation of missing values using the imputeR package:

R code
#Create new dataframe with randomly-imputed missing values
ithakaRanImpute<-imputeR::guess(ithakaSubset1, type="random")
ithakaRanImpute<-as.data.frame(ithakaRanImpute)
#Restore the factor labels on the imputed gender and title codes
ithakaRanImpute$GenderFac<-factor(ithakaRanImpute$GenderFac,
                                  levels=c(1,2),
                                  labels=c("Male","Female"))
ithakaRanImpute$TitleFac<-factor(ithakaRanImpute$TitleFac,
                                 levels=c(1:6),
                                 labels=c("Professor",
                                          "Associate Professor",
                                          "Assistant Professor",
                                          "Adjunct Professor",
                                          "Lecturer",
                                          "Other"))
Hmisc::describe(ithakaRanImpute)
Regression using the randomly-imputed data set: R code

#Regression using the randomly-imputed data set
ithakaRanImputeReg <- lm(SUeB ~ GenderFac + TitleFac,
data=ithakaRanImpute)
summary(ithakaRanImputeReg)
plot(ithakaRanImputeReg)