**A Suggested Revision to the Forthcoming 5th Edition of the
APA Publication Manual**


APAeffec.wp 5/29/00

Suggested for the 5th edition APA Publication Manual (suggested substitution for the 4th edition section on effect size).

__Effect size__. Because __p__ values are confounded
joint functions of several study features, including effect size and sample
size, calculated __p__ values are __not__ useful indices of study effects.
As emphasized by the APA Task Force on Statistical Inference (Wilkinson &
APA Task Force on Statistical Inference, 1999), "reporting and interpreting
effect sizes in the context of previously reported effects is __essential__
to good research" (p. 599, emphasis added).

Reporting effect sizes has three important benefits. First, reporting effects facilitates subsequent meta-analyses incorporating a given report. Second, effect size reporting creates a literature in which subsequent researchers can more easily formulate more specific study expectations by integrating the effects reported in related prior studies. Third, and perhaps most importantly, interpreting the effect sizes in a given study facilitates the evaluation of how a study's results fit into the existing literature, allows the explicit assessment of how similar or dissimilar results are across related studies, and potentially informs judgments regarding what study features contributed to similarities or differences in effects.

For these reasons the 1994 fourth edition of the Publication Manual "encouraged" (p. 18) effect size reporting. However, 11 empirical studies of one or two post-1994 volumes of 23 journals found that this admonition had little, if any, impact (Vacha-Haase, Nilsson, Reetz, Lance & Thompson, 2000).

The reasons why the "encouragement" was ineffective, as reflected in the literature summary presented by Vacha-Haase et al. (2000), appear to be clear. As Thompson (1999) noted, only "encouraging" effect size reporting

presents a self-canceling mixed-message. To present an "encouragement" in the context of strict absolute standards regarding the esoterics of author note placement, pagination, and margins is to send the message, "these myriad requirements count, this encouragement doesn't." (p. 162)

Consequently, this edition of the Publication Manual
incorporates as a requirement, "__Always__ provide some effect-size estimate
when reporting a __p__ value" (Wilkinson & APA Task Force on Statistical
Inference, 1999, p. 599, emphasis added).

In classical statistics, effect sizes characterize the fit of a model (e.g., a fixed-effects factorial ANOVA model) to data. Similarly, in structural equation modeling (SEM) goodness of fit indices may be thought of as effect sizes.

In a few analyses (e.g., randomization tests) effect size indices have not yet been formulated. However, confidence intervals are quite useful in these instances, just as they are even when effect sizes can be computed. Reporting confidence intervals, especially in direct comparison with the confidence intervals from related prior studies, falls squarely within the spirit of required effect size reporting. The graphic presentation of confidence intervals can be particularly helpful to readers.
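Purely as an illustration (not part of the suggested manual text), a minimal Python sketch of a confidence interval for the difference between two independent group means. It uses the large-sample normal approximation rather than the __t__ distribution, so the function name and the approximation are assumptions of this sketch, not a prescribed procedure:

```python
from statistics import NormalDist, mean, stdev

def mean_diff_ci(sample1, sample2, confidence=0.95):
    """Normal-approximation CI for the difference between two group means.

    Uses the large-sample z critical value rather than the t distribution,
    so it is only a rough sketch for small samples.
    """
    diff = mean(sample1) - mean(sample2)
    # Standard error of the difference for independent groups
    # (stdev uses the n - 1 denominator).
    se = (stdev(sample1) ** 2 / len(sample1)
          + stdev(sample2) ** 2 / len(sample2)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se
```

Plotting such intervals side by side with intervals from related prior studies is one way to realize the graphic presentation suggested above.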

Numerous effect sizes can be computed. Useful reviews of various choices are provided by Kirk (1996), Olejnik and Algina (2000), Rosenthal (1994), and Snyder and Lawson (1993). However, a brief review of the available choices may be useful. Although there is a class of effect sizes that Kirk (1996) labelled "miscellaneous" (e.g., the odds ratios that are so important in loglinear analyses), there are two major classes of effect sizes for parametric analyses.

The first class of effect sizes involves standardized mean
differences. Effect sizes in this class include indices such as Glass' __Δ__, Hedges' __g__, and Cohen's __d__. For example,
Glass' __Δ__ is computed as the difference in the two means
(i.e., experimental group mean minus control group mean) divided by the control
group standard deviation, where the SD computation uses __n__-1 as the
denominator. When the study involves matched or repeated measures designs, the
standardized difference is computed taking into account the correlation between
measures (Dunlap, Cortina, Vaslow & Burke, 1996).
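As an illustration only (not part of the suggested manual text), the computations just described can be sketched in Python; the function names are this sketch's own, and the pooled-denominator variant is one common formulation of a pooled standardized difference:

```python
from statistics import mean, stdev  # stdev uses the n - 1 denominator

def glass_delta(experimental, control):
    """Glass' delta: (experimental mean - control mean) / control-group SD,
    with the SD computed using n - 1 in the denominator."""
    return (mean(experimental) - mean(control)) / stdev(control)

def pooled_standardized_diff(group1, group2):
    """Standardized mean difference using the pooled SD of both groups
    (here pooled from the n - 1 estimators, as in Hedges' g)."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(group1) - mean(group2)) / pooled
```

With equal group variances the two indices coincide; they diverge when the experimental treatment affects variability as well as the mean, which is why Glass' __Δ__ standardizes on the control group alone.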

Of course, not all studies involve experiments or only a
comparison of group means. Because all parametric analyses are part of one
General Linear Model family, and are correlational, variance-accounted-for
effect sizes can be computed in all studies, including both experimental and
non-experimental studies. Effect sizes in this second class include indices such
as __r__^{2}, __R__^{2}, and η^{2}. For example, for regression, __R__^{2} can be computed
as the sum-of-squares explained divided by the sum-of-squares total. Or, for a
one-way ANOVA, η^{2} is computed as the
sum-of-squares explained divided by the sum-of-squares total.
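As an illustration (not part of the suggested manual text), the one-way ANOVA η^{2} just defined can be sketched directly from the sums of squares; the function name is this sketch's own:

```python
from statistics import mean

def eta_squared(*groups):
    """Eta squared for a one-way ANOVA: sum-of-squares explained
    (between groups) divided by sum-of-squares total."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    ss_total = sum((x - grand) ** 2 for x in all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total
```

The same explained-over-total ratio computed from a regression's sums of squares yields __R__^{2}, which is the General Linear Model commonality noted below.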

The General Linear Model is a powerful heuristic device (cf.
Cohen, 1968), as suggested by commonalties in variance-accounted-for effect size
formulas. However, in many applications it is advisable to convert these indices
to unsquared metrics, for reasons summarized elsewhere (cf. D'Andrade &
Dart, 1990; Ozer, 1985). When measures have intrinsically meaningful
non-arbitrary metrics, as occasionally occurs in psychology, unstandardized
effect indices may be more useful than standardized differences or
variance-accounted-for or __r__ statistics (Judd, McClelland & Culhane,
1995).

The effect sizes in these two classes--standardized differences
and __r__--can be transformed into each others' metrics. For example, a
Cohen's __d__ can be converted to an __r__ using Cohen's (1988, p. 23)
formula #2.2.6:

__r__ = __d__ / (__d__^{2} + 4)^{.5}

= 0.8 / (0.64 + 4)^{.5}

= 0.8 / (4.64)^{.5}

= 0.8 / 2.154

= 0.371

When total sample size is small or group sizes are disparate,
it is advisable to use a slightly more complicated but more precise formula
elaborated by Aaron, Kromrey and Ferron (1998):

__r__ = __d__ / [__d__^{2} + (N^{2} -
2N)/(n_{1} n_{2})]^{.5}.
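As an illustration (not part of the suggested manual text), both conversions can be sketched in Python; the function names are this sketch's own:

```python
def d_to_r(d):
    """Cohen's (1988) formula 2.2.6, which assumes equal group sizes:
    r = d / sqrt(d^2 + 4)."""
    return d / (d ** 2 + 4) ** 0.5

def d_to_r_exact(d, n1, n2):
    """Aaron, Kromrey and Ferron's (1998) more precise formula:
    r = d / sqrt(d^2 + (N^2 - 2N) / (n1 * n2)), where N = n1 + n2."""
    N = n1 + n2
    return d / (d ** 2 + (N ** 2 - 2 * N) / (n1 * n2)) ** 0.5
```

With equal and reasonably large group sizes the term (N^{2} - 2N)/(n_{1} n_{2}) approaches 4, so the two formulas converge; they diverge for small or disparate groups, as noted above.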

Or an __r__ can be converted to a __d__ using Friedman's
(1968, p. 246) formula #6:

__d__ = [2 (__r__)] / (1 - __r__^{2})^{.5}

= [2 (0.371)] / (1 - 0.1376)^{.5}

= [2 (0.371)] / (0.8624)^{.5}

= [2 (0.371)] / 0.9286

= 0.742 / 0.9286

= 0.799
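As an illustration (not part of the suggested manual text), the reverse conversion can be sketched in one line of Python; the function name is this sketch's own:

```python
def r_to_d(r):
    """Friedman's (1968) formula 6: d = 2r / sqrt(1 - r^2)."""
    return 2 * r / (1 - r ** 2) ** 0.5
```

Applied to the __r__ of .371 obtained above, the function recovers a __d__ of roughly 0.8, confirming that the two conversions are (approximately) inverses of one another.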

In addition to choosing between standardized difference and
variance-accounted-for (or __r__) effect sizes, researchers must choose
between "uncorrected" and "corrected" effect sizes. Like people, each individual
sample has its own personality, or variance that is unique to that given sample.
The effect sizes computed for a sample are inflated by capitalizing on this
"sampling error variance."

However, we know what factors contribute to sampling error
variance. Samples have more sampling error variance when (a) sample sizes are
smaller, (b) the number of observed variables is larger, and (c) the population
effect size is smaller. Because we know what factors contribute to sampling
error variance, we can estimate the amount of positive bias in a
variance-accounted-for effect size, and then estimate a "shrunken" or
"corrected" effect size with the estimating sampling error variance removed. The
"corrected" variance-accounted-for effect sizes include indices such as
"adjusted __R__^{2}," Hays' w ^{2},
and Herzberg's __R__^{2}.
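As an illustration (not part of the suggested manual text), two of these corrections can be sketched in Python. The Ezekiel/Wherry formula shown is one common formulation of "adjusted __R__^{2}," and the ω^{2} shown is the one-way ANOVA form; the function names are this sketch's own:

```python
def adjusted_r_squared(r2, n, k):
    """Ezekiel/Wherry adjusted R^2: 1 - (1 - R^2)(n - 1) / (n - k - 1),
    where n is the sample size and k the number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def omega_squared(ss_between, ss_total, df_between, ms_within):
    """Hays' omega squared for a one-way ANOVA:
    (SS_between - df_between * MS_within) / (SS_total + MS_within)."""
    return (ss_between - df_between * ms_within) / (ss_total + ms_within)
```

Both corrections shrink the uncorrected estimate, and the shrinkage grows as samples get smaller, the number of variables gets larger, or the uncorrected effect gets smaller, mirroring the three sources of sampling error variance listed above.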

No one effect size is appropriate for all research situations. However, psychology as a field will be more fully informed by inquiry in which researchers report and interpret an effect size, whatever that index may be.

It should also be noted that Cohen (1988) provided rules of
thumb for characterizing what effect sizes are small, medium, or large, as
regards his impressions of the typicality of effects in the social sciences
generally. However, he emphasized that the interpretation of effects requires
the researcher to think more narrowly in terms of a specific area of inquiry.
And the evaluation of effect sizes inherently requires an explicit researcher
personal value judgment regarding the practical or clinical importance of the
effects. Finally, it must be emphasized that if we mindlessly invoke Cohen's
rules of thumb, contrary to his strong admonitions, in place of the equally
mindless consultation of __p__ value cutoffs such as .05 and .01, we are
merely electing to be thoughtless in a new metric.

Aaron, B., Kromrey, J.D., & Ferron, J.M. (1998, November).

http://www.apa.org/journals/amp/amp548594.html
