[Jdm-society] replications
Kusev, Petko
Petko.Kusev.1 at city.ac.uk
Mon May 7 17:38:47 CDT 2007
Dear All,
I completely agree with Mike Dougherty's suggestions and arguments. The idea of maintaining databases of replications is a very good one. It is all about one's own internal standards and originality in research.
Petko
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Dr Petko Kusev
Research Fellow
City University
Department of Psychology
Social Sciences Building
London
EC1V 0HB
UK
Room D431
Phone: 020 70404573
e-mail: p.kusev at city.ac.uk
Fax: +44 (0)20 7040 8580
XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
-----Original Message-----
From: jdm-society-bounces at mail.sjdm.org on behalf of Michael Dougherty
Sent: Mon 5/7/2007 22:22
To: jdm-society at mail.sjdm.org; bigopp at psych.stanford.edu
Subject: Re: [Jdm-society] Jdm-society Digest, Vol 101, Issue 10
Hi All,
I too have found this discussion kind of interesting, for many reasons.
But one reason is that it is incumbent upon individual researchers to
ensure replicability of their own novel effects. In a conversation I had
with Valerie Reyna a few weeks back, she remarked that for every result
she has published, she has 3 or 4 replications in a file drawer
somewhere. While this might be overkill (sorry Valerie), it does connote
an internal standard that we all should adhere to (when possible) in
publishing our own findings.
This does not, however, solve the other problem brought up in this
discussion, which seems to boil down to generalization. Can an effect be
replicated by an independent research group, who might not have access
to all the nuances that may be present in one's experimental design and
that might not show up in the methods? One case in point that is often
overlooked by cognitive psychologists is that different universities
have different subject populations created by different admissions
standards (cognitive capacity correlates fairly well with SATs, which
means that different universities likely have different ranges and
distributions of cognitive ability). Moreover, subtle differences in the
administration of a task can influence its reliability. Such nuances
often don't show up in method sections, but nonetheless affect our
ability to draw inferences about replicability.
Where does this leave us? One possibility is that we maintain electronic
databases where people can log their successes and failures to
replicate. This has at least two benefits: 1) we get to see how many
times results are replicated or not, without cost to journals, 2) we
would have an archive of past studies that could eventually be used in
meta-analytic studies. These meta-analytic studies would allow one to
identify (perhaps) important nuances in the experimental design, the
task, or subject populations. As a final benefit, these archived
failures to replicate could be taken as evidence for research activity,
which could (in principle) be used in tenure evaluation. (I don't know
what the implications are for the chronic non-replicator --
incompetence?).
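
As a rough illustration of the kind of log described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record fields, the names ReplicationAttempt and tally, and the example entries are hypothetical and do not describe any existing database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationAttempt:
    """One logged attempt to replicate a published effect (hypothetical schema)."""
    effect: str        # label for the original finding
    lab: str           # who ran the attempt
    n_subjects: int    # sample size, useful for later meta-analysis
    replicated: bool   # did the attempt reproduce the effect?


def tally(attempts):
    """Count (successes, failures) per effect."""
    counts = {}
    for a in attempts:
        successes, failures = counts.get(a.effect, (0, 0))
        if a.replicated:
            counts[a.effect] = (successes + 1, failures)
        else:
            counts[a.effect] = (successes, failures + 1)
    return counts


# Made-up entries, purely to show the shape of the data.
log = [
    ReplicationAttempt("effect A", "lab 1", 40, True),
    ReplicationAttempt("effect A", "lab 2", 35, False),
    ReplicationAttempt("effect A", "lab 3", 60, True),
    ReplicationAttempt("effect B", "lab 1", 50, False),
]

summary = tally(log)
for effect, (ok, failed) in summary.items():
    print(f"{effect}: {ok} replicated, {failed} failed")
```

Keeping sample size and lab on each record is what would let the same log feed a later meta-analysis of task or population differences.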
Other ideas? Someone on here (perhaps the other Mike Dougherty
(Doherty)) suggested that MS theses be based on replication. My own
bias is that theses should represent original work, with the hope of
publishing (doesn't do students a lot of good if they can't publish
their theses). However, what about undergraduate honors students?
Quality control is in the hands of the students' advisor and committee,
who can decide whether the experimental method was of sufficient quality
to be archived online.
Mike (DoUGherty)
Michael Dougherty, Ph.D
Associate Professor of Psychology
Department of Psychology
University of Maryland
College Park, MD 20742
mdougherty at psyc.umd.edu
Office phone: 301-405-8423
DAM lab phone: 301-405-8276
>>> <bigopp at psych.stanford.edu> 05/07/07 3:05 AM >>>
I really am enjoying the thread, and I'm glad that this discussion is
happening. But I'm wondering just how big the problem really is given the
size and scope of the proposed solutions.
Eran's suggestion of a replication center in which every study was
replicated before being published would be both time consuming and
expensive. More resources would be devoted to replication than most of us
have available for running novel experiments! And I echo the concerns
that have been raised about adding requirements to the already full
graduate school curriculum.
That's not to say that these or other steps shouldn't be taken, but
rather that they shouldn't be taken until we can be sure that the problem
is really as big as one might gather from reading these emails.
Many journals are now publishing 6- or 7-study papers. These usually
consist of an initial study showing an effect, and then several
"replicate and extend" or "converging evidence" studies (which are nearly
as good as replication in my book - perhaps better because they reduce the
possibility of procedural artifacts). While some of the extensions may
not be as reliable, we should be fairly certain that the original study
is. Not to mention the fact that important/central studies in a field
draw the attention of other researchers who then follow up and do
additional replications.
I would think that before any wide-scale replication policy was
implemented, we would first try to identify the scope of the problem.
Ideally by randomly sampling from a set of papers that are agreed to be
influential, and seeing what percent of them replicate. Bearing in mind
power, and other statistical issues, of course.
That said, I'd like to comment on the problem of "finicky effects". There
are a lot of them - and usually the authors are well aware of the fact
that parameters need to be set perfectly in order for them to work.
However, there is no room for reporting this in a paper. Imagine a
researcher reporting "we tried to get the effect with different stimuli
twice and didn't find the effect. Then we tried with a third set of
stimuli, and it worked. We report 3 replications in Studies 2-4, but we
actually ran 7 studies - only 3 of them worked. In two we failed to
replicate the original finding, in the other 5 we replicated Study 1, but
the extensions only worked in the 3 we report in Studies 2-4".
This descriptor would be a reasonable approximation of what really happens
for about 80% of the papers out there. The effect is probably real - the
basic effect was found for 7/10 attempts at an alpha of .05. But nobody
ever reports this sort of thing. Reporting that is paper suicide - the
paper would be rejected on the basis of the studies that didn't work.
Because of the way reviewers and editors look at papers (looking for
reasons to reject), there's no room to report studies that don't work -
only studies that do. (Note that a study can have a very high power, but
still be reporting a finicky effect if a slight variation to the stimuli
will greatly reduce that power.)
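
To make the "7/10 attempts at an alpha of .05" reasoning concrete, here is a small Python sketch of the underlying arithmetic. It assumes, purely for illustration, that the 10 attempts are independent and that under the null hypothesis each one has a .05 chance of coming out significant; the function name prob_at_least is hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """Binomial tail: probability of k or more successes in n independent tries."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance of 7 or more "significant" results out of 10 if there were no real
# effect and each test had a 5% false-positive rate (assumed independence).
p_chance = prob_at_least(7, 10, 0.05)
print(f"P(at least 7 of 10 significant by chance) = {p_chance:.1e}")
```

Under these assumptions the probability is well below one in a million, which is why a 7/10 record points to a real effect even though 3 attempts failed. The calculation says nothing about how finicky the effect is; it only rules out chance as the explanation for the successes.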
Replication is all well and good, but an effect can be real and
replicable, yet still finicky. I would argue that not knowing which effects
are finicky (and the extent to which they are) is likely a bigger problem
than the effects not being replicable at all. And this is an issue that
will be solved by encouraging rather than punishing authors for reporting
the number of tries it took before they could find an effect (e.g. the
proportion of studies that didn't work compared to those that did).
Danny
Immortal until proven otherwise...
_______________________________________________
Jdm-society mailing list
Jdm-society at mail.sjdm.org
http://www.sjdm.org/mailman/listinfo/jdm-society