
A Logic of Statistical Testing

The logic presented here requires the a priori specification of hypotheses in terms of mutually exclusive and exhaustive sets of possible values for the distribution parameters of random variables reflecting a data generation process. A priori specification is an epistemological requirement for tests to have evidential value; however, this criterion does not extend to the act of testing as described in this section. Figure 1 gives examples for a parameter that may take values on the extended real line (i.e., any number from negative infinity to positive infinity): Panel A shows the set of hypotheses underlying typical one-tailed tests; Panel B shows the hypotheses underlying two-tailed tests; Panel C shows how three hypotheses can be expressed. Each set of values represents a hypothesis about the parameter. For example, partitioning the real line into the set of negative values and the set of non-negative values can represent a two-hypothesis set (e.g., H1 and H2) about a parameter μ (e.g., H1: μ < 0 and H2: μ ≥ 0). The set of hypotheses may include both substantively derived hypotheses and a catchall negation of those hypotheses. We want to determine in which set of values the true parameter lies (i.e., which hypothesis about the parameter value is true).
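
To make the set-based representation concrete, here is a minimal sketch (my illustration, not part of the original article) in which the two hypotheses about μ are encoded as predicates over parameter values; the names H1, H2, and conforms are hypothetical, and conforms anticipates the notion of conformity defined in the next paragraph.

```python
# A minimal sketch: hypotheses as mutually exclusive, jointly exhaustive
# sets of parameter values, encoded as predicates on the parameter mu.
H1 = lambda mu: mu < 0    # H1: mu < 0
H2 = lambda mu: mu >= 0   # H2: mu >= 0, the catchall negation of H1

def conforms(estimate, hypothesis):
    # An estimate conforms to a hypothesis if it lies in the
    # hypothesis's set of parameter values.
    return hypothesis(estimate)

print(conforms(-0.3, H1))  # True: -0.3 lies in the set of negative values
print(conforms(-0.3, H2))  # False: the two sets are mutually exclusive
```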

If an estimate θ̂ (i.e., the value that the estimator yields when applied to specific data) lies in the set of values specified by a given hypothesis, then it conforms to that hypothesis. Because H and ¬H (in which "¬" denotes logical negation, so "¬H" reads "not H") represent two sets of possible values that are mutually exclusive but together constitute the full set of possible values, if the estimate θ̂ conforms to one of the hypotheses, it conforms to that hypothesis only (e.g., any point on the real lines shown in Figure 1). If the set of possible parameter values is a proper subset of the range of the estimator (e.g., the whole numbers are a proper subset of the real numbers), then it is possible for θ̂ not to conform to any hypothesis (e.g., if the hypothesis specifies a set of whole numbers, the estimate could be a fraction). If an estimate conforms to a hypothesis, then it is a plausible result if that hypothesis is true; indeed, for an unbiased estimator with a symmetric, unimodal sampling distribution, the obtained estimate is the most probable result when θ = θ̂.

I define an estimate θ̂, and by extension the underlying data, as consistent with a hypothesis if the estimate conforms to the hypothesis or if there is at least one element in the corresponding set of values that defines a data generation process that could plausibly have produced data at least as extreme as the estimate obtained. That is, θ̂ is consistent with hypothesis H if there is an element θ in the set of values corresponding to H for which p-value(θ̂; θ) is large. The term p-value(θ̂; θ) refers to the probability that the specified data generation process, with an actual parameter value of θ, produces data with an estimated value at least as extreme as θ̂ (this is the usual definition of a p value). An estimate, and therefore the data, is inconsistent with hypothesis H if θ̂ does not conform to H and p-value(θ̂; θ) is small for all elements θ in the set of values corresponding to H. Judgments as to what constitutes a large or small p value are typically made by comparison to a threshold value called a significance level. Note that, to be consistent with a hypothesis, the estimate need correspond to a large p value for only one value in the hypothesized set of values; to be inconsistent with a hypothesis, the estimate must correspond to a small p value for all values in the hypothesized set.
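
This definition can be sketched in code. The following is a hedged illustration, not the article's procedure: it assumes a normally distributed estimator with known standard error, approximates the hypothesized set by a finite grid of values, and reads "at least as extreme" two-sidedly; the names p_value and consistent and the 0.05 threshold are all assumptions of the sketch.

```python
from scipy.stats import norm

def p_value(estimate, theta, se):
    # P(estimate at least as extreme as the one obtained | true value theta),
    # assuming a normal estimator with standard error se; "at least as
    # extreme" is read here as at least as far from theta (two-sided).
    return 2 * norm.sf(abs(estimate - theta), scale=se)

def consistent(estimate, grid, se, alpha=0.05):
    # Consistent with H: some theta in H yields a large p value
    # (if the estimate conforms to H, theta = estimate gives p = 1).
    # Inconsistent with H: the p value is small for every theta in H.
    return max(p_value(estimate, t, se) for t in grid) > alpha
```

In this sketch a hypothesis is judged by the largest p value it can offer, which matches the asymmetry just noted: one large p value suffices for consistency, whereas inconsistency demands small p values everywhere in the set.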
Figure 2 shows the possibilities, defined on the real line, for the single-parameter, two-hypothesis case (H and ¬H). Panel A shows an estimate that is consistent with both hypotheses: θ̂ conforms to H, and there is also at least one parameter value in the set of values corresponding to ¬H that could plausibly have produced data with an estimate at least as extreme as that obtained (if we consider the area under the curve to the right of θ̂ to be large). In this case, the data cannot adjudicate, and neither hypothesis can be excluded. Surely θ̂, which yields a statistically nonsignificant test of ¬H, still gives some evidence for ¬H? No. Although θ̂ is consistent with certain values in ¬H, it actually lies in H and is thus even "more consistent" with values in H. Consider, for instance, that the true value is exactly the estimate θ̂, a value in H. If anything, the fact that θ̂ lies anywhere in H gives more evidence in favor of H than of ¬H, irrespective of its being consistent with values in ¬H (i.e., irrespective of the nonsignificant test of ¬H). However, making a judgment on the basis of this fact alone would not be an exercise in formal statistical testing, because it would not properly account for the fact that estimates at least as extreme as θ̂ could have been produced if the true value were in ¬H.

Panel B shows an estimate that is consistent with H but inconsistent with ¬H: Hypothesis H might have given rise to the data, but ¬H is unlikely to have done so (if we consider the area under the curve to the right of θ̂ to be small). This is evidence for H and against ¬H. Panel C shows an estimate that is consistent with ¬H but inconsistent with H: Hypothesis H is unlikely to have given rise to such an estimate (if we consider the area under the left curve to the right of θ̂ to be small), but ¬H could have. This is evidence against H and for ¬H. Panel D is not possible, because the estimate in this example must conform to at least one of the hypotheses and is thus consistent with it.
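
Continuing the hypothetical sketch above (and reusing its p_value and consistent helpers), the four panels correspond to the four combinations of the two consistency checks for H: μ ≤ 0 and ¬H: μ > 0; the grids, which truncate the real line at ±5 purely for illustration, are assumptions of the sketch.

```python
import numpy as np

H_grid = np.linspace(-5.0, 0.0, 501)     # H: mu <= 0, truncated for the sketch
notH_grid = np.linspace(0.01, 5.0, 500)  # not-H: mu > 0, truncated likewise

def adjudicate(estimate, se):
    c_H = consistent(estimate, H_grid, se)        # helper from the sketch above
    c_notH = consistent(estimate, notH_grid, se)
    if c_H and c_notH:
        return "Panel A: consistent with both; the data cannot adjudicate"
    if c_H:
        return "Panel B: evidence for H and against not-H"
    if c_notH:
        return "Panel C: evidence for not-H and against H"
    # Ruled out when the sets exhaust the real line; reachable here only
    # because the grids truncate it at +/- 5.
    return "Panel D: ruled out for exhaustive hypothesis sets"

print(adjudicate(-0.5, se=1.0))  # near the boundary: Panel A
print(adjudicate(3.5, se=1.0))   # deep inside not-H: Panel C
```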

Because the data are consistent with any hypothesis the estimate conforms to, such a hypothesis need not be statistically tested; a statistical test is required only for those hypotheses the estimate does not conform to. Because the estimate conforms to at most one hypothesis, if there are N hypotheses, then N − 1 of them must be tested statistically. In the case where the estimate conforms to no hypothesis, all N hypotheses must be tested.
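
As a small bookkeeping sketch under the same assumptions, only the hypotheses the estimate fails to conform to are queued for statistical testing; hypotheses_to_test and the (predicate, grid) pairing are hypothetical conveniences, not the article's notation.

```python
def hypotheses_to_test(estimate, hypotheses):
    # hypotheses: list of (predicate, grid) pairs describing the N sets.
    # A hypothesis the estimate conforms to needs no statistical test;
    # the others (N - 1 of them when the estimate conforms to exactly one
    # set, all N when it conforms to none) must each be tested.
    return [(pred, grid) for pred, grid in hypotheses if not pred(estimate)]
```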

Other Sources of Conceptual Errors


Another worrying mistake is the concern about overly powerful tests. This concern arises from confusing statistical aims with substantive aims, and thus confusing the statistical distance metric (based on the standard error, which reflects variation in the data generation process) with a substantive distance metric (typically based on the raw scale of the variables). Indeed, if we had perfect information, we would know whether each hypothesis was true or false; that is, we would know the parameter's actual value. That kind of knowledge should not be shunned.

A related issue is the a posteriori interpretation of power-related outcomes. Suppose we have sufficient power to discern a deviation δ from a point hypothesis θ0. If the results differ significantly from θ0, we might infer that θ differs from θ0 by at least δ; if the results are not significant, we might infer that θ differs from θ0 by no more than δ (Neyman, 1957). However, do these results constitute a formal test of the implied hypotheses θ ∈ [θ0 − δ, θ0 + δ] and θ ∉ [θ0 − δ, θ0 + δ]? No. An estimate in the rejection region of the statistic may lie between θ0 and δ, or beyond δ, and yet not be statistically significantly different from δ (see Figure 4, where θ0 = 0). If we wanted to test these new hypotheses, we would follow the steps of the testing logic, which would lead to collecting new data and centering a test on either θ0 − δ or θ0 + δ (whichever is closer to the new estimate). But, given the less-than-perfect power of the process generating the new data, we would end up with another discernible deviation δ* around this point. Following the a posteriori interpretation again would lead us to the implied hypotheses θ ∈ [θ0 − δ − δ*, θ0 + δ + δ*] and θ ∉ [θ0 − δ − δ*, θ0 + δ + δ*]. As we continue to pursue these new implied hypotheses, each time taking into account the discernible deviation due to power, we eventually (through infinite iterations, assuming the sequence of standard errors has a non-zero lower bound) arrive at the implied statement that θ ∈ (−∞, ∞), which we presume to be true a priori: we have arrived at a trivial truth rather than an informative hypothesis.
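
The divergence can be traced numerically with a hedged sketch: assuming, purely for simplicity, that every round of new data has the same standard error and hence the same discernible deviation (so δ* = δ), the implied interval widens by δ each round and eventually covers any finite value.

```python
theta0, delta = 0.0, 1.0  # hypothetical point hypothesis and deviation
half_width = 0.0
for round_num in range(1, 6):
    half_width += delta   # each iteration adds that round's deviation
    print(f"round {round_num}: implied theta in "
          f"[{theta0 - half_width:.1f}, {theta0 + half_width:.1f}]")
# With the deviations bounded below by a positive constant, the interval
# grows without bound, i.e., it tends to the trivial (-inf, inf).
```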
Can a similar construction of implied hypotheses be entertained using data-specific quantities such as CIs or severity measures (Mayo, 2010; Mayo & Spanos, 2010)? For example, could we consider our results as formally testing the hypotheses θ ∈ [lower limit, upper limit] and θ ∉ [lower limit, upper limit] implied by the CI? No, because such statements, as argued above, refer to the data and thus simply do not constitute informative hypotheses about the processes generating the data. Although power, CIs, or severity measures may inform a posteriori considerations, they do not derive their epistemic value from formal hypothesis testing.

However, we could use such data-specific values as part of a formal test. For example, with regard to the CI, we could use the underlying interval estimator (L, U) as a test statistic, which would require its sampling distribution in order to calculate the probability of the statistic assuming values at least as extreme as the data-specific CI calculated. The important point here is that the CI as calculated from the data is not a statistic (i.e., not a random variable) on which a formal test can be based; it is a single realization from the distribution of the underlying interval statistic (L, U). A formal test would be based on the distribution of (L, U). If such a test were constructed, its use would follow the logic described in this article. Nonetheless, such data-specific values are not commonly used in this way in formal statistical testing.
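
To illustrate the distinction, here is a sketch under assumed conditions (normal data with known σ, sample size n, 95% coverage; all values hypothetical): simulating the sampling distribution of the interval estimator (L, U) shows that each sample yields one realized interval, of which the CI computed from one's actual data is a single instance.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true, sigma, n, z = 0.0, 1.0, 25, 1.96  # hypothetical setup

# Each replication yields one realization (l, u) of the random interval
# (L, U); the distribution of these pairs, not any single realized CI,
# is what a formal test would have to be based on.
means = rng.normal(theta_true, sigma / np.sqrt(n), size=10_000)
L = means - z * sigma / np.sqrt(n)
U = means + z * sigma / np.sqrt(n)

print(np.mean((L <= theta_true) & (theta_true <= U)))  # close to 0.95
```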
