
Marquez, Marvin E.

Stat 243- Categorical Data Analysis

A. The ML fit of the probit model is:

Φ−1 [𝜋̂(𝑥)] = −2.32 + 0.088𝑥

In this model, the fit corresponds to a normal tolerance distribution with 𝜇 = −𝛼/𝛽 = −(−2.32)/0.088 =
26.36 and 𝜎 = 1/|𝛽| = 1/0.088 = 11.36. Here 𝜎 is the distance between the x value where 𝜋̂(𝑥) = 0.50
and the x values where 𝜋̂(𝑥) = 0.16 or 0.84; hence the curve for 𝜋̂(𝑥) is the cdf of a N(26.4, (11.4)²)
distribution.
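As a quick numerical check of part A, the following Python sketch recomputes the tolerance-distribution parameters from the fitted coefficients. It uses only the standard library; the names alpha, beta and pi_hat are labels chosen here for illustration, not part of the original solution.

```python
from statistics import NormalDist

alpha, beta = -2.32, 0.088   # probit intercept and slope from the ML fit

mu = -alpha / beta           # mean of the normal tolerance distribution
sigma = 1 / abs(beta)        # standard deviation of the tolerance distribution

tolerance = NormalDist(mu, sigma)

def pi_hat(x):
    """Estimated probability at labeling index x, i.e. Phi(alpha + beta * x)."""
    return tolerance.cdf(x)

print(round(mu, 2), round(sigma, 2))
print(round(pi_hat(mu), 2), round(pi_hat(mu + sigma), 2))
```

Evaluating pi_hat one standard deviation above and below mu gives the 0.84 and 0.16 points mentioned in the text.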

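B. Using the estimates, the instantaneous rate of change in the probability of remission at the point
where it equals 0.50 is 𝛽𝜙(0) = 𝛽/(2𝜋)^(1/2) ≈ 0.40𝛽 = 0.40(0.088) = 0.0352, where 𝜋 in this
expression is the constant 3.14159... and 𝜙 is the standard normal density.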
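The slope in part B can be checked with a short Python sketch. It relies on the fact that the derivative of Φ(𝛼 + 𝛽x) is 𝛽 times the standard normal density, evaluated at 0 when the probability is 0.50. The variable names are illustrative only.

```python
from statistics import NormalDist

beta = 0.088                  # probit slope from part A
std_normal = NormalDist()     # standard normal distribution

# d/dx Phi(alpha + beta*x) = beta * phi(alpha + beta*x);
# the probability equals 0.50 where alpha + beta*x = 0.
slope_at_half = beta * std_normal.pdf(0.0)
print(round(slope_at_half, 4))
```

Using the unrounded density 𝜙(0) ≈ 0.3989 instead of 0.40 changes the result only in the fourth decimal place.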

C. The probit fit can also be summarized by comparing the fitted values at extreme values of the
predictor or at its lower and upper quartiles. In this case, the fitted probit (z-score) increases
from -1.088 at the lower quartile of the labeling index to 0.114 at the upper quartile, so the
estimated probability of remission rises from Φ(-1.088) ≈ 0.14 to Φ(0.114) ≈ 0.55.
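Because −1.088 and 0.114 are values on the probit (z) scale, they need to be passed through the standard normal cdf to be read as probabilities. A minimal sketch, taking the two quartile values exactly as quoted in part C:

```python
from statistics import NormalDist

Phi = NormalDist().cdf               # standard normal cdf

z_lower, z_upper = -1.088, 0.114     # probit values quoted for the two quartiles
p_lower, p_upper = Phi(z_lower), Phi(z_upper)

print(round(p_lower, 3), round(p_upper, 3))
print(round(p_upper - p_lower, 3))   # change in estimated probability
```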

D. The most common interpretation of probit coefficients treats the binary response as arising from a
latent continuous variable. Here, for every one-unit increase in the labeling index (LI), such as
from 24 to 25, the conditional distribution of the latent variable (remission length) shifts up by
0.088 standard deviations. Put another way, for every 11.4-unit increase in the thymidine labeling
index, the latent variable shifts up by (11.4)(0.088) ≈ 1 standard deviation.

Variable    Coefficient   Standard Error   P>|z|   Confidence Interval
Color       -0.2919       0.13208          0.027   (-0.551, -0.033)
Width        0.2758       0.0601           0.000   (0.158, 0.394)
Constant    -5.806        1.704            0.001   (-9.14, -2.465)
Log likelihood: -94.5356
Deviance: 36.69
Prob > χ2: 0.0000

In this case, the response is the presence of satellites. Let y indicate the presence or absence of
satellites as the dependent variable. Using probit regression, we can estimate the effects of the color
and width of the horseshoe crab on the probability of having satellites. The ML fit of the model is:

Φ−1 [𝜋̂(𝑥)] = −5.806 − 0.292𝑥1 + 0.276𝑥2

where 𝑥1 is color and 𝑥2 is width.

• The global null hypothesis H0: 𝛽1 = 𝛽2 = 0 can be rejected, since the deviance of 36.7 on 2 degrees of
freedom is highly significant against the null model. Hence at least one of the predictors has an
effect.
• The negative color coefficient indicates that lighter-colored crabs have a higher estimated
probability of having satellites than darker ones.
• For every one-centimeter increase in width, the probit (z-score) of the probability of having
satellites shifts up by 0.276 standard deviations, and this relationship is statistically
significant.
• Hence both color and width are significant predictors of the presence of satellites.
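The global test can be reproduced without any statistics package, because the upper-tail probability of a chi-square variable with 2 degrees of freedom has the closed form exp(−x/2). The sketch below is illustrative; the function name is made up for this example.

```python
import math

def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 df: exp(-x/2)."""
    return math.exp(-x / 2)

lr_stat = 36.69                       # reported deviance against the null model
p_value = chi2_sf_df2(lr_stat)
critical_05 = -2 * math.log(0.05)     # 5% critical value for 2 df

print(p_value)
print(lr_stat > critical_05)
```

The p-value is far below 0.0001, which agrees with the reported Prob > χ2 of 0.0000.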
A. Fitting each predictor individually shows that sex and osteoblastic pathology have significant
effects on the dependent variable. Lymphocytic infiltration, however, yields an infinite estimate,
because the cell for high lymphocytic infiltration and not disease-free has zero frequency.

Variable    Odds Ratio   Standard Error   P>|z|   Confidence Interval
Sex         0.164        0.137            0.031   (0.03161, 0.85187)
Constant    6.5          4.937104         0.014   (1.466829, 28.80364)
Log likelihood: -27.36
Deviance: 5.88
P-value: 0.0153

Variable                 Odds Ratio   Standard Error   P>|z|   Confidence Interval
Osteoblastic Pathology   .2171946     .1487553         0.026   (.056737, .8314419)
Constant                 4.25         2.361805         0.009   (1.430079, 12.63042)
Log likelihood: -27.53
Deviance: 5.53
P-value: 0.0186

Variable                   Odds Ratio   Standard Error   P>|z|   Confidence Interval
Lymphocytic Infiltration   1.0
Constant                   1.117647     .3731253         0.739   (.5809409, 2.150193)
Log likelihood: -24.89
Deviance: 0
P-value: NA

Both sex and osteoblastic pathology have significant effects, but the intercept estimates are unreliable:
their standard errors are large and their confidence intervals are wide. This can be due to the small
sample size or to unbalanced data (i.e., many zero frequencies).
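To see where the wide intervals come from, the reported confidence limits can be rebuilt on the log-odds scale. The sketch below assumes (this is not stated in the output above) that the reported standard error is a delta-method standard error on the odds-ratio scale, so that the standard error of the log odds ratio is SE/OR. Under that assumption it can be compared with the osteoblastic pathology row.

```python
import math

def or_wald_ci(odds_ratio, se_or, z=1.96):
    """Wald interval for an odds ratio, built on the log scale.

    Assumption: se_or is the delta-method SE on the odds-ratio scale,
    so the SE of the log odds ratio is se_or / odds_ratio.
    """
    log_or = math.log(odds_ratio)
    se_log = se_or / odds_ratio
    return math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)

# Osteoblastic pathology row: odds ratio and standard error as reported
lower, upper = or_wald_ci(0.2171946, 0.1487553)
print(round(lower, 4), round(upper, 4))
```

Because the interval is symmetric only on the log scale, a large standard error there translates into limits that differ by more than an order of magnitude on the odds-ratio scale.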

B. When we attempt to fit a main-effects logistic regression model with the three predictors, the
MLE for lymphocytic infiltration is infinite. We can attribute this to the nature of the data:
besides the small sample, this predictor has a cell with zero frequency, so the data are
unbalanced. That is, high lymphocytic infiltration perfectly predicts the disease-free state. We
can also surmise that the reported MLEs are unreliable, since the intercept has a large standard
error and a wide confidence interval. Another striking finding is that, with all three predictors
in the model, neither sex nor osteoblastic pathology remains significant at the 0.05 level.

Variable                   Odds Ratio   Standard Error   P>|z|   Confidence Interval
Sex                        .1947177     .1776399         0.073   (.0325731, 1.163997)
Osteoblastic Pathology     .2951188     .22759           0.114   (.0650979, 1.337911)
Lymphocytic Infiltration   *            *                *       *
Constant                   8.229653     7.960853         0.029   (1.235897, 54.80004)
*Infinite estimates
Log likelihood: -21.398

C. The remedy in this case is exact conditional logistic regression, in which inference is based on
the exact conditional distribution of each parameter's sufficient statistic rather than on
large-sample maximum likelihood. This method is widely used when the outcome variable is binary,
the sample is small, and the cross-tabulation has empty cells. The table below shows the
estimates from the exact logistic regression model.

Variable                   Odds Ratio   Sufficient Statistic   P-value   Confidence Interval
Sex                        .2126904     16                     0.1392    (.0499248, 1.6675)
Osteoblastic Pathology     .3146999     12                     0.2182    (.0499248, 1.6675)
Lymphocytic Infiltration   6.592992     10                     0.0742    (.8508871, +Inf)
Model Score: 12.88
P-value: 0.0027

Overall, the model fits the data with a model score of 12.88, which is statistically significant
(p = 0.0027). Simply put, the three predictors jointly explain some of the variation in the probability
of the response. Individually, our predictor of interest, lymphocytic infiltration, has an estimated
odds ratio of 6.59. This means that, controlling for both sex and osteoblastic pathology, the odds of
being disease-free after three years for those with high lymphocytic infiltration are about 6.6 times
the odds for those with low lymphocytic infiltration. The p-values calculated from the exact
conditional distribution indicate that none of the predictors is individually significantly associated
with the disease-free state at the 0.05 level. Results for the intercept are not included because its
sufficient statistic was conditioned out when forming the joint distribution for the three predictors.
Unlike ordinary logistic regression, the exact conditional approach gives a finite estimate and a
confidence interval for lymphocytic infiltration, so its results are more reliable for data this sparse.
A. The logit comparing categories 1 and 2 follows from the two baseline-category logits:

log(𝜋1/𝜋2) = log(𝜋1/𝜋3) − log(𝜋2/𝜋3)

log(𝜋1/𝜋3) = 1.785 + 1.044s + 0.703r

log(𝜋2/𝜋3) = 1.554 + 0.254s − 0.106r

Hence

log(𝜋1/𝜋2) = (1.785 + 1.044s + 0.703r) − (1.554 + 0.254s − 0.106r)

log(𝜋1/𝜋2) = 0.231 + 0.790s + 0.809r
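The subtraction can be done term by term on the coefficient vectors. The following Python sketch stores each logit as an (intercept, s, r) tuple; the variable names are illustrative.

```python
# Coefficients (intercept, s, r) of the two baseline-category logits
logit_1_vs_3 = (1.785, 1.044, 0.703)
logit_2_vs_3 = (1.554, 0.254, -0.106)

# log(pi1/pi2) = log(pi1/pi3) - log(pi2/pi3), coefficient by coefficient
logit_1_vs_2 = tuple(round(a - b, 3) for a, b in zip(logit_1_vs_3, logit_2_vs_3))
print(logit_1_vs_2)
```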

B. Computing the 95% confidence interval for the odds ratio, we have:

Variable    Coeff    SE      Odds Ratio   95% CI
Gender      1.044    0.259   2.84         (1.7098, 4.7192)
Constant    1.785    168

95% CI = exp[1.044 ± (1.96 × 0.259)]
       = (exp(0.5364), exp(1.5516))
       = (1.7098, 4.7192)

Interpretation: Holding race constant, the estimated odds of believing in heaven are about 2.8 times as
high for females as for males. With 95 percent confidence, the odds ratio could be as small as 1.71 or
as large as 4.72.
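The interval is obtained by exponentiating the Wald limits for the coefficient. A minimal Python sketch of that arithmetic, using the gender coefficient and standard error from the table:

```python
import math

coef, se = 1.044, 0.259   # gender coefficient and its standard error
z = 1.96                  # normal quantile for a 95% interval

odds_ratio = math.exp(coef)
ci = (math.exp(coef - z * se), math.exp(coef + z * se))

print(round(odds_ratio, 2))
print(tuple(round(limit, 4) for limit in ci))
```

The interval is asymmetric around the odds ratio because it is symmetric on the log scale.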
C. With gender coded female = 1 and race coded black = 1, the estimated probability of believing in
heaven for white females (s = 1, r = 0) is:

𝜋1 = e^(1.785+1.044) / [1 + e^(1.785+1.044) + e^(1.554+0.254)] ≈ 0.705
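The same calculation works for any combination of gender and race. The sketch below wraps the two fitted logits in a small function; category_probs is a name invented for this example.

```python
import math

def category_probs(s, r):
    """Estimated (pi1, pi2, pi3) from the two baseline-category logits.

    Category 3 is the baseline; s = 1 for females, r = 1 for blacks.
    """
    g1 = 1.785 + 1.044 * s + 0.703 * r   # log(pi1/pi3)
    g2 = 1.554 + 0.254 * s - 0.106 * r   # log(pi2/pi3)
    denom = 1 + math.exp(g1) + math.exp(g2)
    return math.exp(g1) / denom, math.exp(g2) / denom, 1 / denom

pi1, pi2, pi3 = category_probs(s=1, r=0)   # white females
print(round(pi1, 4), round(pi2, 4), round(pi3, 4))
```

The three probabilities always sum to one, which is a useful check on the arithmetic.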

D. 𝜋1 > 𝜋2 > 𝜋3

That the estimated probability for the baseline category, 𝜋3, is less than the other two, and that 𝜋2
is less than 𝜋1, can be shown as follows.

There are three response categories, with category 3 as the baseline. Let g1 and g2 denote the linear
predictors for log(𝜋1/𝜋3) and log(𝜋2/𝜋3). The baseline category has estimated probability

𝜋3 = 1 / (1 + e^g1 + e^g2)

while categories 1 and 2 have estimated probabilities

𝜋1 = e^g1 / (1 + e^g1 + e^g2)

𝜋2 = e^g2 / (1 + e^g1 + e^g2)

All three share the same denominator, so their ordering depends only on the numerators e^g1, e^g2 and 1.
With only the intercepts, g1 = 1.785 and g2 = 1.554; since g1 > g2 > 0, we have e^g1 > e^g2 > 1. If we
add the covariate terms h1 and h2 to the linear predictors, the first probability becomes

𝜋1 = e^(g1+h1) / (1 + e^(g1+h1) + e^(g2+h2))

and the same comparison of numerators applies. Hence, using the intercepts and coefficients of the
estimated model, we can conclude that 𝜋1 > 𝜋2 > 𝜋3.

The same holds for black females: g1 + h1 = 1.785 + 1.044 + 0.703 = 3.532 and g2 + h2 = 1.554 + 0.254 −
0.106 = 1.702, so again 𝜋1 > 𝜋2 > 𝜋3.
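Because the three probabilities share one denominator, the ordering 𝜋1 > 𝜋2 > 𝜋3 holds exactly when the two linear predictors satisfy g1 > g2 > 0. The sketch below checks that condition for all four gender-by-race groups without computing any probabilities; the function and dictionary names are illustrative.

```python
from itertools import product

def linear_predictors(s, r):
    """Linear predictors for log(pi1/pi3) and log(pi2/pi3)."""
    g1 = 1.785 + 1.044 * s + 0.703 * r
    g2 = 1.554 + 0.254 * s - 0.106 * r
    return g1, g2

ordering_holds = {}
for s, r in product((0, 1), repeat=2):
    g1, g2 = linear_predictors(s, r)
    # Shared denominator: pi1 > pi2 > pi3 exactly when g1 > g2 > 0.
    ordering_holds[(s, r)] = g1 > g2 > 0

print(ordering_holds)
```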

E. As in the previous question, the estimated probability 𝜋1 is higher for females than for males
because the gender coefficients are positive, and the one for log(𝜋1/𝜋3) is the larger of the two
(1.044 versus 0.254). Since female = 1, the estimated probability for females is

𝜋1 = e^(g1+1.044) / (1 + e^(g1+1.044) + e^(g2+0.254))

where g1 and g2 are the linear predictors without the gender term. For males, gender = 0 and the
gender term drops out, since e^0 = 1; hence:

𝜋1 = e^g1 / (1 + e^g1 + e^g2)
F. In this fit, G² = 0.69 with 2 residual degrees of freedom, which indicates that the model fits
the data well. The residual df can be explained as the difference between the number of
categories and the categorical covariates, computed as 3 − 1 = 2. Comparing with the simpler
model, whose G² is 47.64, the difference is 47.64 − 0.69 = 46.95, which is referred to a
chi-square distribution with 2 degrees of freedom. Since the p-value is < 0.05, we reject the
null hypothesis that the coefficients are equal to zero. Hence opinion is statistically
associated with gender, given race.
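Both p-values in part F can be computed with the closed-form chi-square tail probability that exists for even degrees of freedom. The sketch below is a minimal standard-library version; chi2_sf_even_df is a helper written for this example, not a library function.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for positive even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form applies to positive even df only")
    half = x / 2
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

p_fit = chi2_sf_even_df(0.69, 2)   # goodness of fit of the current model
lr = 47.64 - 0.69                  # difference in G2 between the two models
p_lr = chi2_sf_even_df(lr, 2)      # likelihood-ratio test

print(round(p_fit, 3))
print(p_lr)
```

A large p-value for the first test means no evidence of lack of fit, while the tiny p-value for the second rejects the hypothesis of zero coefficients.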

Parameters      Invertebrate/Fish    Others/Fish
Intercept       0.971 (0.626)        -0.534 (0.791)
Gender (1=F)    -0.761 (0.701)       0.0312 (0.806)
Adult (1=A)     -2.498 (0.746)***    -1.22 (0.803)
Note: standard errors in parentheses. *p<0.05 **p<0.01 ***p<0.001

A. In this analysis, being an adult alligator is a significant predictor. Holding gender constant,
adults tend to feed on fish rather than invertebrates, as indicated by the negative sign of the
regression coefficient. Exponentiating the estimate gives exp(−2.498) ≈ 0.08, so the odds that
an adult alligator feeds on invertebrates rather than fish are about 0.08 times those of a
subadult, and this is highly significant (p<0.001). Neither gender nor being an adult has a
significant effect on choosing other food over fish. The estimated probabilities of the food
choice categories for adult females can be calculated as:

Invertebrate: 𝜋1 = e^(0.971−0.761−2.498) / [1 + e^(0.971−0.761−2.498) + e^(−0.534+0.0312−1.22)] = 0.079

Other: 𝜋2 = e^(−0.534+0.0312−1.22) / [1 + e^(0.971−0.761−2.498) + e^(−0.534+0.0312−1.22)] = 0.140

Fish: 𝜋3 = 1 / [1 + e^(0.971−0.761−2.498) + e^(−0.534+0.0312−1.22)] = 0.781
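The sketch below recomputes the three probabilities for adult females from the coefficient table, taking the Others/Fish gender coefficient as the 0.0312 listed there. The function name food_probs and the tuple layout are choices made for this example. Since the categories are exhaustive, the three values must sum to one.

```python
import math

# (intercept, gender, adult) for each logit against the fish baseline
invertebrate = (0.971, -0.761, -2.498)
other = (-0.534, 0.0312, -1.22)

def food_probs(female, adult):
    """Estimated food-choice probabilities for the given covariate values."""
    x = (1, female, adult)
    g_inv = sum(b * v for b, v in zip(invertebrate, x))
    g_oth = sum(b * v for b, v in zip(other, x))
    denom = 1 + math.exp(g_inv) + math.exp(g_oth)
    return {
        "invertebrate": math.exp(g_inv) / denom,
        "other": math.exp(g_oth) / denom,
        "fish": 1 / denom,
    }

adult_female = food_probs(female=1, adult=1)
print({k: round(v, 3) for k, v in adult_female.items()})
```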

B. Using only the primary food choices of fish or invertebrate and dropping the others, we have the following model:
Parameters      Invertebrate/Fish
Intercept       1.079 (0.678)
Gender (1=F)    -0.91 (0.756)
Adult (1=A)     -2.598 (0.785)***

Compare it with the corresponding column of the three-category model:

Parameters      Invertebrate/Fish
Intercept       0.971 (0.626)
Gender (1=F)    -0.761 (0.701)
Adult (1=A)     -2.498 (0.746)***

The estimated regression coefficients and their standard errors are nearly the same in the two fits,
with only small, almost negligible differences. In both models, being an adult is a strongly significant
predictor of food choice.

C. Fitting a model in which the binary adult indicator is replaced by a continuous covariate, that
is, using the exact measurement of size instead of classifying the alligator as adult or
subadult, we have:

Parameters      Invertebrate/Fish    Others/Fish
Intercept       5.64 (1.897)         -1.77 (1.705)
Gender (1=F)    -1.119728 (0.728)    0.56 (0.877)
Length          -2.927 (0.937)       0.119 (0.583)
Note: *<0.05 **<0.01 ***<0.001

In this case, length is a significant predictor of choosing invertebrates over fish. For every one-meter
increase in the alligator's length, the odds of choosing invertebrates over fish are multiplied by
exp(−2.927) ≈ 0.05. In other words, the odds of choosing invertebrates rather than fish decrease as
the length of the alligator increases.
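With a continuous covariate, the coefficient gives a multiplicative change in the odds for any length difference, not only for one meter. A small Python sketch of that relationship, using the Invertebrate/Fish length coefficient from the table (odds_multiplier is a name made up here):

```python
import math

length_coef = -2.927   # Invertebrate/Fish coefficient for length

def odds_multiplier(delta_m):
    """Factor applied to the invertebrate-vs-fish odds when length grows by delta_m meters."""
    return math.exp(length_coef * delta_m)

per_meter = odds_multiplier(1.0)
per_tenth = odds_multiplier(0.1)
print(round(per_meter, 3), round(per_tenth, 3))
```

A one-meter difference is large relative to typical differences between animals, so a smaller increment such as 0.1 m may be easier to interpret.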

Parameters     Coeff      SE
Treatment***   0.580685   0.2121478
Gender         0.541394   0.287182
Constant 1     -0.19596   0.28929
Constant 2     1.371312   0.30003
Constant 3     2.422135   0.322441
A. Fitting a cumulative logit model gives the regression coefficients and standard errors shown
above. Treatment appears to be a significant predictor of the response to the chemotherapeutic
agents: the estimated cumulative odds ratio is exp(0.581) ≈ 1.79, so the odds of response are
nearly twice as high for one treatment as for the other. Gender, however, is not a significant
predictor.
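The sketch below turns the estimates into an odds ratio and into baseline category probabilities. It rests on two assumptions that the output above does not state: the three constants are the cutpoints of a four-category cumulative logit model, and when both covariates equal 0 the cumulative probability P(Y ≤ j) equals the logistic function of the j-th cutpoint. Sign conventions for the covariate effects differ between software packages, so only the baseline group is evaluated.

```python
import math

def logistic(t):
    return 1 / (1 + math.exp(-t))

treatment_coef = 0.580685
cutpoints = (-0.19596, 1.371312, 2.422135)   # assumed to be ordered cutpoints

odds_ratio = math.exp(treatment_coef)

# Assumption: P(Y <= j) = logistic(cutpoint_j) when treatment = 0 and gender = 0.
cumulative = [logistic(c) for c in cutpoints]
category = (
    [cumulative[0]]
    + [b - a for a, b in zip(cumulative, cumulative[1:])]
    + [1 - cumulative[-1]]
)

print(round(odds_ratio, 3))
print([round(p, 3) for p in category])
```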
