[Jdm-society] Jdm-society Digest, Vol 101, Issue 10
bigopp at psych.stanford.edu
Mon May 7 02:05:20 CDT 2007
I really am enjoying the thread, and I'm glad that this discussion is
happening. But I'm wondering just how big the problem really is given the
size and scope of the proposed solutions.
Eran's suggestion of a replication center, in which every study would be
replicated before being published, would be both time-consuming and
expensive. More resources would be devoted to replication than most of us
have available for running novel experiments! And I echo the concerns
that have been raised about adding requirements to the already full
graduate school curriculum.
That's not to say that these or other steps shouldn't be taken, but
rather that they shouldn't be taken until we can be sure that the problem
is really as big as one might gather from reading these emails.
Many journals are now publishing 6- or 7-study papers. These usually
consist of an initial study showing an effect, followed by several
"replicate and extend" or "converging evidence" studies (which are nearly
as good as replication in my book - perhaps better, because they reduce the
possibility of procedural artifacts). While some of the extensions may
not be as reliable, we should be fairly certain that the original study
is. Not to mention the fact that important/central studies in a field
draw the attention of other researchers who then follow up and do
additional replications.
I would think that before any wide-scale replication policy was
implemented, we would first try to identify the scope of the problem,
ideally by randomly sampling from a set of papers that are agreed to be
influential and seeing what percentage of them replicate (bearing in mind
power and other statistical issues, of course).
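To make the power caveat concrete, here is a minimal sketch, assuming the audit simply records whether each sampled study replicated. It computes a Wilson 95% interval for the replication rate. The counts (40 of 50) and the 80% average power are hypothetical numbers chosen for illustration, not data. The point is that if every sampled effect were real, the expected replication rate would be roughly the average power of the replication attempts, not 100%.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin


# Hypothetical audit: 40 of 50 randomly sampled influential studies replicated.
low, high = wilson_interval(40, 50)

# Hypothetical assumption: replications were run with ~80% power, so even if
# every effect were real we would expect only ~80% of them to replicate.
assumed_power = 0.80

print(f"replication rate 95% CI: {low:.2f}-{high:.2f}")
print("consistent with all effects being real:", low <= assumed_power <= high)
```

With these made-up numbers the interval contains 0.80, so a 40/50 replication rate would not by itself signal a problem; a rate well below the average power would.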
That said, I'd like to comment on the problem of "finicky effects". There
are a lot of them - and usually the authors are well aware of the fact
that parameters need to be set perfectly in order for them to work.
However, there is no room for reporting this in a paper. Imagine a
researcher reporting "we tried to get the effect with different stimuli
twice and didn't find the effect. Then we tried with a third set of
stimuli, and it worked. We report 3 replications in Studies 2-4, but we
actually ran 7 studies - only 3 of them worked. In two we failed to
replicate the original finding, in the other 5 we replicated Study 1, but
the extensions worked only in the 3 we report in Studies 2-4."
This description would be a reasonable approximation of what really happens
for about 80% of the papers out there. The effect is probably real - the
basic effect was found for 7/10 attempts at an alpha of .05. But nobody
ever reports this sort of thing. Reporting that is paper suicide - the
paper would be rejected on the basis of the studies that didn't work.
Because of the way reviewers and editors look at papers (looking for
reasons to reject), there's no room to report studies that don't work -
only studies that do. (Note that a study can have a very high power, but
still be reporting a finicky effect if a slight variation to the stimuli
will greatly reduce that power.)
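A rough check on the "probably real" claim: this sketch assumes each attempt is an independent test and that, if there were no effect, each attempt would have a 5% chance of reaching significance. Under those assumptions, the binomial tail probability of 7 or more significant results out of 10 is tiny. It is a simplification (attempts with varied stimuli are neither identical nor fully independent, and a two-sided test has only a 2.5% chance of a false positive in the predicted direction), so treat it as an illustration rather than a formal analysis.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance of 7+ "significant" results in 10 attempts if there were no effect
# and each attempt independently had a 5% false-positive rate.
p_null = prob_at_least(7, 10, 0.05)
print(f"{p_null:.2e}")
```

The probability comes out on the order of one in ten million, which is the intuition behind the post's point: the pattern of results is strong evidence for the effect, yet the unsuccessful attempts never get reported.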
Replication is all well and good, but an effect can be real, and
replicable, but be finicky. I would argue that not knowing which effects
are finicky (and the extent to which they are) is likely a bigger problem
than the effects not being replicable at all. And this is an issue that
could be solved by encouraging, rather than punishing, authors for
reporting the number of tries it took before they found an effect (e.g.,
the proportion of studies that didn't work compared to those that did).
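To illustrate how an effect can be both real and finicky, here is a minimal sketch using the normal approximation to the power of a two-sided, two-sample test. The sample size (100 per group) and the effect sizes (Cohen's d of 0.5 with the "right" stimuli versus 0.2 with a slightly different set) are hypothetical values chosen for illustration, not figures from any study.

```python
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-sample test for effect size d.

    Ignores the negligible chance of a significant result in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = d * (n_per_group / 2) ** 0.5
    return z.cdf(noncentrality - z_crit)


# Hypothetical: the same design, with stimuli that yield d = 0.5 vs. d = 0.2.
for d in (0.5, 0.2):
    print(f"d = {d}: power ~ {approx_power(d, 100):.2f}")
```

Under these made-up numbers, a design with roughly 94% power for the original stimuli drops to under 30% power when a stimulus change shrinks the effect, so most such attempts would fail even though the effect is real.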
Danny
Immortal until proven otherwise...