
COMPUTING EFFECT SIZES

      The APA Task Force on Statistical Inference (Shea, 1996) and a recent series of articles on statistical significance testing (cf. Cohen, 1994; Kirk, 1996; Schmidt, 1996; Thompson, 1996, 1997) have both prompted closer scrutiny of contemporary analytic practices. The field has been moving away from emphasizing only statistical significance tests and toward emphasizing evaluations of (a) practical significance and (b) result replicability.

      The recently published fourth edition of the American Psychological Association style manual (APA, 1994) included a related and important, but largely unheralded, shift in APA editorial policy regarding the reporting of effect-size (practical-significance) statistics in quantitative research. The manual noted that:

Neither of the two types of [statistical significance] probability values reflects the importance or magnitude of an effect because both depend on sample size... You are encouraged to provide effect-size information. (APA, 1994, p. 18, emphasis added)

      However, several recent empirical studies of journal articles have shown that the APA manual's "encouraging" that effect sizes be reported has not led to changes in reporting practices (Kirk, 1996; Thompson & Snyder, 1997a, 1997b). Thus, some journal editorial policies have been revised to "require" that effect sizes be reported (cf. Thompson, 1994a).

      The present brief paper has two purposes. First, some resources that authors can consult for help on computing effect sizes are listed. Second, some of the basic available choices are briefly summarized. [For discussion of the separate result replicability issue, see Thompson (1993) or Thompson (1994b).]

Resources on Computing Effect Sizes

      Many statistical computer packages have been revised to compute effect sizes, either with all analyses or as optional output. Several excellent articles summarize the kinds of effect-size statistics that can be reported and provide formulas that can easily be implemented with a computer spreadsheet or calculator (see Snyder & Lawson, 1993; Kirk, 1996; or Friedman, 1968). Also, there are numerous books on meta-analysis, each of which presents formulas for computing effect sizes.

Effect Size Choices

      There are many, many effect sizes that can be computed. The present précis summarizes some basic choices, but cannot convey the full detail contained in the longer reports cited above.

      Most (though not all) of the available choices can be located within one of the four cells of a two-by-two classification matrix (see Snyder & Lawson, 1993). The matrix is defined by two dimensions.

      The first dimension of the matrix of choices involves effect size type. Setting aside a third "miscellaneous" category described by Kirk (1996), there are two major classes of effect sizes: (a) variance-accounted-for effect sizes analogous to a squared correlation coefficient, and (b) standardized mean differences.

      Regarding variance-accounted-for effect sizes, it has long been known that all parametric analyses are correlational (Knapp, 1978; Thompson, 1984, 1991). Thus, even in analyses which test differences in means, variance-accounted-for effect sizes can be computed. Perhaps the simplest of these involves dividing the sum-of-squares for an effect by the sum-of-squares total. For example, when this is done in an ANOVA, the resulting effect size is called eta squared. When this is done in multiple regression, the resulting effect size is called the squared multiple correlation.
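
      To make the computation concrete, the following is a minimal sketch (in Python, using small hypothetical group scores) of eta squared computed directly from the sums of squares; the data and names are illustrative only, not drawn from any study discussed here.

```python
# Sketch: computing eta squared (a variance-accounted-for effect size)
# from raw group scores, using hypothetical data.
groups = {
    "control":      [12, 14, 11, 15, 13],
    "experimental": [16, 18, 17, 15, 19],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

# Sum-of-squares total: squared deviations of every score from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

# Sum-of-squares for the effect (between groups).
ss_between = sum(
    len(scores) * ((sum(scores) / len(scores)) - grand_mean) ** 2
    for scores in groups.values()
)

eta_squared = ss_between / ss_total
print(f"eta squared = {eta_squared:.3f}")   # .667 for these hypothetical scores
```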

      Regarding standardized mean difference effect sizes, these are readily calculated when means are the focus of an analysis. One widely known effect size in this class is Glass' delta, which equals the difference between the experimental group's mean and the control group's mean, divided by the control group's standard deviation. A similar measure is Cohen's d, which equals the difference between the means of the two groups, divided by a sample-size weighted average of the standard deviations of the scores in the two groups.
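
      The following sketch (again in Python, with the same hypothetical scores) computes both measures; the pooled standard deviation for Cohen's d shown here uses the usual sample-size weighted pooling of the two groups' variances.

```python
from statistics import mean, stdev

# Hypothetical scores for two groups.
experimental = [16, 18, 17, 15, 19]
control      = [12, 14, 11, 15, 13]

# Glass' delta: mean difference divided by the control group's SD.
glass_delta = (mean(experimental) - mean(control)) / stdev(control)

# Cohen's d: mean difference divided by the pooled SD
# (a sample-size weighted average of the two groups' variances).
n1, n2 = len(experimental), len(control)
pooled_sd = (((n1 - 1) * stdev(experimental) ** 2 +
              (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)) ** 0.5
cohens_d = (mean(experimental) - mean(control)) / pooled_sd

print(f"Glass' delta = {glass_delta:.2f}, Cohen's d = {cohens_d:.2f}")
```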

      Neither of these two classes of effect sizes is inherently superior, and effect sizes can be converted back and forth across the classes. However, because variance-accounted-for effect sizes can be readily computed in all studies, even in studies that do not involve experimental groups, some researchers prefer them. Using variance-accounted-for effect sizes also reinforces the heuristic realization that all analyses are correlational (even though some designs are experimental and some are correlational). Nevertheless, the field has not definitively established a preference between these two choices.
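
      As one illustration of converting across the classes, the following sketch uses the standard approximation relating Cohen's d to a correlation coefficient (and hence to a variance-accounted-for value); the formulas assume two groups of equal size and are offered only as an example of such a conversion.

```python
import math

# Converting a standardized mean difference (Cohen's d) to a
# variance-accounted-for effect size, assuming two equal-size groups.
def d_to_r_squared(d: float) -> float:
    r = d / math.sqrt(d ** 2 + 4)   # point-biserial r, equal-n approximation
    return r ** 2

def r_to_d(r: float) -> float:
    # The reverse conversion, again assuming equal group sizes.
    return 2 * r / math.sqrt(1 - r ** 2)

print(d_to_r_squared(0.8))   # d = 0.8 corresponds to roughly r^2 = .14
print(r_to_d(0.37))          # r of about .37 corresponds to d of about 0.8
```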

      The second major dimension of choices involves correction for positive bias. Researchers can use either (a) uncorrected effect sizes or (b) corrected effect sizes.

      Since all analyses are correlational, and seek to minimize the sum-of-squares unexplained, all classical analytic methods capitalize on all the variance in sample data, including the unique and non-replicable variance called "sampling error." Thus, some effect sizes (i.e., the uncorrected effect sizes) are positively biased, and overstate the effects that would be found either in the population or in future samples.
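
      A small simulation can make this bias concrete. The following sketch (using numpy, with arbitrarily chosen sample size and number of predictors) regresses a pure-noise criterion on pure-noise predictors; although the population R squared is zero, the average sample R squared is clearly positive.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, trials = 30, 5, 2000   # arbitrary choices for the illustration

r_squares = []
for _ in range(trials):
    X = rng.normal(size=(n, k))           # predictors unrelated to y
    y = rng.normal(size=n)                # pure-noise criterion
    X1 = np.column_stack([np.ones(n), X]) # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    y_hat = X1 @ beta
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squares.append(1 - ss_res / ss_tot)

# The population R squared is zero, yet the sample average is well above zero,
# because least squares capitalizes on sampling error.
print(f"mean sample R^2 on pure noise: {np.mean(r_squares):.3f}")
```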

      Generally, three study features impact the amount of positive bias in uncorrected effect size statistics. First, bias is smaller when sample sizes are larger. Second, bias is smaller when fewer variables are used in a study. Third, bias is smaller when the true population effect sizes are larger.

      Since the influences of these study features are generally known or can be estimated with various formulas, effect sizes can be "corrected" for the amount of positive bias or "shrinkage" expected in a given study. For example, regression researchers may report an uncorrected multiple R squared, or a corrected effect size that some statistical packages call the adjusted R squared. [Corrected effect sizes are always less than or equal to uncorrected values in the same study, since uncorrected effects are always biased in the direction of overstating effects.] As another example, ANOVA researchers can compute either the uncorrected effect size eta squared, described above, or a corrected analog called Hays' omega squared.
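
      The following sketch implements both corrections: the adjusted R squared adjustment reported by many regression packages, and Hays' omega squared computed from one-way ANOVA sums of squares. The numerical values are hypothetical and serve only to show that each corrected estimate is smaller than its uncorrected counterpart.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Correct a sample R squared for positive bias, given n cases and
    k predictors (the adjustment many regression programs report)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def omega_squared(ss_between: float, ss_total: float,
                  k_groups: int, n_total: int) -> float:
    """Hays' omega squared, a corrected analog of eta squared,
    computed from one-way ANOVA sums of squares."""
    df_within = n_total - k_groups
    ms_within = (ss_total - ss_between) / df_within
    return (ss_between - (k_groups - 1) * ms_within) / (ss_total + ms_within)

# Hypothetical values: each corrected estimate is smaller than the
# uncorrected one it adjusts.
print(adjusted_r_squared(r2=0.50, n=30, k=5))        # about .40, versus R^2 = .50
print(omega_squared(ss_between=40, ss_total=60,
                    k_groups=2, n_total=10))         # .60, versus eta squared = .667
```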

      There are dozens of choices available to researchers in each of the four cells of the effect-size matrix of choices. Some effect sizes are more accurate, but are harder to calculate. The important consideration is to report and interpret effect sizes as part of analyses, and to make clear to the reader specifically which effect size is being reported.

      What makes a given effect size noteworthy depends both on (a) the context of the particular study (e.g., life-or-death medication effects vs. smiling and touching behaviors of adolescents in fast food restaurants) and (b) the value system of a given researcher. Since personal values impact this determination of noteworthiness, researchers may reasonably disagree regarding the import of a given result, but reporting effect sizes better positions all readers to more easily evaluate research results. Reporting effect sizes also facilitates the future use of results in subsequent meta-analytic studies.

References

American Psychological Association. (1994). Publication manual of the American Psychological Association (4th ed.). Washington, DC: Author.

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.

Friedman, H. (1968). Magnitude of experimental effect and a table for its rapid estimation. Psychological Bulletin, 70, 245-251.

Kirk, R. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56(5), 746-759.

Knapp, T. R. (1978). Canonical correlation analysis: A general parametric significance testing system. Psychological Bulletin, 85, 410-416.

Schmidt, F. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods, 1(2), 115-129.

Shea, C. (1996). Psychologists debate accuracy of "significance test." Chronicle of Higher Education, 42(49), A12, A16.

Snyder, P., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. Journal of Experimental Education, 61(4), 334-349.

Thompson, B. (1984). Canonical correlation analysis: Uses and interpretation. Newbury Park, CA: SAGE.

Thompson, B. (1991). A primer on the logic and use of canonical correlation analysis. Measurement and Evaluation in Counseling and Development, 24(2), 80-95.

Thompson, B. (1993). The use of statistical significance tests in research: Bootstrap and other alternatives. Journal of Experimental Education, 61(4), 361-377.

Thompson, B. (1994a). Guidelines for authors. Educational and Psychological Measurement, 54(4), 837-847.

Thompson, B. (1994b). The pivotal role of replication in psychological research: Empirically evaluating the replicability of sample results. Journal of Personality, 62(2), 157-176.

Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher, 25(2), 26-30.

Thompson, B. (1997). Editorial policies regarding statistical significance tests: Further comments. Educational Researcher, 26(5), 29-32.

Thompson, B., & Snyder, P.A. (1997a, March). Contemporary uses of statistical significance testing in counseling research. Paper presented at the annual meeting of the American Educational Research Association, Chicago. (ERIC Document Reproduction Service No. ED 408 303)

Thompson, B., & Snyder, P.A. (1997b, March). Statistical significance testing practices in the Journal of Experimental Education. Paper presented at the annual meeting of the American Educational Research Association, Chicago.


