As blog readers digest the implications of this article, I can't help but think of common misconceptions some assessment professionals hold regarding the interpretation of test-retest reliability statistics. Reflect on the following quote (from the WJ-R technical manual - the McArdle and Woodcock discussion of the SEM-based test-retest study reported in that manual) when considering the gist of this article.
- "A test does not change from one time to another; people do. There may be considerable change on some traits, but relatively little on others. Test-retest studies evaluate the tendency for change in people, not some aspect of test quality. A test that does not reflect such changes in human traits would be an insensitive measure of those traits. For example, an adult person's height is a very stable characteristic and the repeated measurement of a person's height will produce almost identical results. On the other hand, an individual's weight is much more likely to fluctuate and the reason for weighing oneself daily or weekly is to track these changes. The bathroom scale is not condemned as unreliable because it reports different weights for the same individual on different occasions (it is condemned for being too accurate!). The important point derived from the height and weight example is that, given good reliable measurements at different times, we should expect little or no change in some traits while expecting much fluctuation in others" (McGrew, Werder & Woodcock, 1991, p. 99).
Coyle, T. (in press). Test–retest changes on scholastic aptitude tests are not related to g. Intelligence.
Abstract
- This research examined the relation between test–retest changes on scholastic aptitude tests and g-loaded cognitive measures (viz., college grade-point average, Wonderlic Personnel Test, and word recall). University students who had twice taken a scholastic aptitude test (viz., Scholastic Assessment Test or American College Testing Program Assessment) during high school were recruited. The aptitude test raw scores and change scores were correlated with the g-loaded cognitive measures in two studies. The aptitude test change scores (which were mostly gains) were not significantly related to the cognitive measures, whereas the aptitude test raw scores were significantly related to those measures. Principal components analysis indicated that the aptitude test change scores had the lowest loading on the g factor, whereas the aptitude test raw scores and the cognitive measures had relatively high loadings on the g factor. These findings support the position that test–retest changes on scholastic aptitude tests do not represent changes in g. Further research is needed to determine the non-g variance components that contributed to the observed test–retest changes.
- What might have caused the observed test–retest changes, which were mostly gains? Some possibilities include practice effects, which refer to test–retest increases after repeated exposure to the test items; coaching effects, which refer to test–retest increases after being taught test-specific strategies; and regression effects, which refer to test–retest increases (or decreases) after an unusually low (or high) test score on the first testing attempt. Test–retest changes due to these effects are generally not g loaded (see Jensen, 1998, pp. 314–344). In particular, practice and coaching effects are typically welded to a specific cognitive test and do not transfer beyond that test. Similarly, regression effects are typically an anomaly of recruiting participants who have unusually low (or high) test scores on the first testing attempt. Whatever the causes of test–retest changes, spontaneous or random fluctuations in test scores could not explain the test–retest changes observed in the present study. If the observed test–retest changes had been random, then the distribution (and magnitude) of gains and losses in test scores should have been equivalent or nearly so. But this was not the case: More participants showed gains than losses, and the average gain score was larger in (absolute) magnitude than the average loss score.
- The findings of the current research are consistent with prior research indicating that test–retest changes on cognitive tests are not related to g (Jensen, 1998, pp. 314–344). In particular, the aptitude test change scores did not predict the g-loaded cognitive measures and did not load highly on the g factor. In contrast, the aptitude test raw scores did (significantly) predict the cognitive measures and did load highly on the g factor. Together, these findings support the position that test–retest changes on scholastic aptitude tests (viz., SAT and ACT) do not represent changes in g. Further research is needed to determine the non-g variance components that contributed to the observed test–retest changes.
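The logic of these excerpts can be illustrated with a small simulation. This is only a sketch under invented assumptions, not a reconstruction of Coyle's data or analysis: each simulated examinee has a latent g, both test scores reflect g plus measurement error, the retest adds a practice gain that is drawn independently of g, and a separate criterion measure (standing in for something like GPA) also reflects g. The function names, effect sizes, and noise levels are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=5000, seed=0):
    """Simulate test-retest scores where the gain is unrelated to g.

    All parameter values below are illustrative assumptions.
    """
    rng = random.Random(seed)
    first, change, criterion = [], [], []
    for _ in range(n):
        g = rng.gauss(0, 1)                  # latent general ability
        practice = rng.gauss(0.3, 0.2)       # retest gain, independent of g
        score1 = g + rng.gauss(0, 0.5)       # first attempt: g plus error
        score2 = g + practice + rng.gauss(0, 0.5)  # retest: g plus gain plus error
        first.append(score1)
        change.append(score2 - score1)       # g cancels out of the change score
        criterion.append(g + rng.gauss(0, 0.5))    # g-loaded outside measure
    return {
        "r_raw": pearson(first, criterion),
        "r_change": pearson(change, criterion),
        "gains": sum(c > 0 for c in change),
        "losses": sum(c < 0 for c in change),
    }


if __name__ == "__main__":
    result = simulate()
    print("raw score vs. criterion:    r = %.2f" % result["r_raw"])
    print("change score vs. criterion: r = %.2f" % result["r_change"])
    print("gains: %d, losses: %d" % (result["gains"], result["losses"]))
```

Under these assumptions the raw scores correlate substantially with the criterion, the change scores do not, and gains outnumber losses because the practice effect shifts the whole distribution upward. That pattern is what the excerpts describe: systematic (non-random) change that nonetheless carries no g variance.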