From the Journal of Irreproducible Science – over 2/3s of researchers say they are unable to replicate study results
WUWT Reader “QQBoss” writes:
The BBC reports (shockingly), that the journal Nature is going to begin requiring a reproducibility checklist of authors, based on a survey performed last year where at least 70% of respondents (self-selected, of course) indicated that they were unable to reproduce expected results. As the ability to replicate studies is what allows science to demonstrate meaningfulness and continue moving the body of knowledge forward, it is surprising that it has taken this long for top of the line journals to more strongly encourage replication to establish validity.
“Replication is something scientists should be thinking about before they write the paper,” says Ritu Dhand, the editorial director at Nature.
But will they take the next step and more actively police published research and denote when it is not replicable? There needs to be an accessible list of papers that sits between valid, replicated studies and the full blown Retraction Watch. I highly doubt that most journals are willing to police themselves, so just as Retraction Watch has come into being, perhaps there needs to be a web site that aggregates the list of all papers published each year and allows researchers who are able to replicate results to make some fanfare when they are able to remove a paper from the list, since replications are usually quiet affairs.
In the face of the hockey shticks, 97%s, and PAL reviews, combined with researchers refusing to release data “because you want to find fault with it” or just handing their hard drives over to their dogs to chew on, what percentage of AGW-related studies should be listed as unreplicable, perhaps even nonredeemable?
Has Nature thought through the implications of what they are suggesting for a significant amount of the papers they have pushed through that under tougher (aka more meaningful) standards would never have seen the light of day?
More: http://www.bbc.co.uk/news/science-environment-39054778?SThisFB
h/t to the paper, Negative Results are Disappearing from Most Disciplines and Countries [2011] from master lexicographer Daniele Fanelli, whose 2009 work on scientific misconduct is also of note. He finds that “the proportion of papers that, having declared to have tested a hypothesis, reported a full or partial support has grown by more than 20% between 1990 and 2007.” Does this mean that as a species we are becoming better at guessing — or are there other forces at work?
One thing that jumped at me in Fanelli’s paper [Fig 3, p7] was the smoothness of this progression for the US authors, as compared with other countries. Richard Feynman noted “The thing that doesn’t fit is the thing that’s most interesting.” Are we seeking those things? Newton was almost right. Bereft of rigorous testing to invalidate popular hypotheses, would we be likely to notice “negative results” such as the rounding error disparities that led into quantum mechanics, today? Or would they be swept under the rug of selective funding and implied consensus?
+10
This is big. Replication crisis is going mainstream. There will be a lot of academic fakers extremely nervous right now.
I am sure computer simulations of computational climate models are fully reproducible with sufficient work (and some insider info — it is best to steal the code altogether). Therefore it makes sense to restrict climate research to that field, no pesky measurements are needed. And from that point on let’s call model runs “experiments”, see e.g. CORDEX: COordinated Regional climate Downscaling Experiment. It is an excellent paradigm.
“no pesky measurements are needed. And from that point on let’s call model runs “experiments””
This is what is utterly bizarre and antiscientific. When I object to this on other fora I’m invariably asked what problem I have with modelling and accused of making arguments from incredulity. It’s quite mind-blowingly stupid and yet the faithful see no problem with it.
There is a journal for irreproducible results called, ‘The Journal of Irreproducible Results’. It even has a website, http://www.jir.com/.
I’m a bit surprised it isn’t the leading journal for climate science. 🙂
Well, mostly because JIR is closely akin to National Lampoon but on a higher level and tends to publish papers we write when we are struggling with a concept and turn to silliness to break the log jam. That is how papers appear describing transgender in paperclips and siamese twinning in gummie bears….
PMK
I wonder, has anyone studied the reproductive habits of Legos?
SMC February 25, 2017 at 10:30 am
“I wonder, has anyone studied the reproductive habits of Legos?”
That is a study I would support. I keep asking my 9 year old how they keep showing up in my shoes……
michael
Highly recommend the famous jir paper, a quantum gravity treatment of the angels on a pinhead problem. Best piece of double entendre science satire ever. Google takes you right there.
Wouldn’t it be funny if its Internet URL changed constantly, at random? And the only way to access it would be to start guessing? HILARIOUS! At times I amaze myself. Maybe it’s why I’m the only imaginary friend I have left.
This reproducibility topic reminded me of Anthony’s attempt to reproduce Bill Nye the science guy’s experiment to demonstrate that CO2 caused warming by using a heat lamp and some jars. I don’t have any links and could use some help.
For some reason, the Google-Fu is strong in me today.
The First:
https://wattsupwiththat.com/2011/09/28/video-analysis-and-scene-replication-suggests-that-al-gores-climate-reality-project-fabricated-their-climate-101-video-simple-experiment/
The follow-on:
https://wattsupwiththat.com/climate-fail-files/gore-and-bill-nye-fail-at-doing-a-simple-co2-experiment/
The second follow-on:
https://wattsupwiththat.com/2014/08/10/bill-nye-thescienceguy-and-al-gore-not-even-wrong-on-co2-climate-101-experiment-accoding-to-paper-published-in-aip-journal/
For quite some time there has been discrimination against descriptive works in ecology. It is less probable for these to be corrupted, except in the incompetence sense, or if you play games with the raw data. Such data is also discriminated against because a lot of it is in gray literature which is less likely to become digitized and harder to justify in the modern library. Descriptive studies can sometimes lead to fruitful insights and experiments perhaps partly because you are not trying to prove a hypothesis one way or another. Simple curiosity and problem solving are still important to science. There are examples from epidemiology, albeit now helped by easier data analysis.
In estuaries, properly described as individually idiosyncratic, it is often difficult to easily reproduce results. Some long term descriptive work has been carried on by state fisheries agencies and other venues, but academic work too often gets improperly thrown into the statistical category. Maybe things will work out better as we now can better store raw data, too often presented even in descriptive studies as at least partial summaries of data. However, as noted, we need to back way off from publish or perish and value good peer review. It would help to put more researchers and administrators back into the class room where they could learn from their students and earn their pay.
Maybe “….reproducibility in submitted papers” should be part of the Data Quality Act enforcement.
So, I thought I’d put Q = U * A * dT to a test. I inserted some assumed values for the earth’s atmosphere to determine U and k, U = k/x.
Q = 1,368 W/m^2 aka solar irradiance. X = 100 km. Surface T = 288 K. ToA T = -90C (183 K).
U = 13.03, k = 1,303
Now I doubled the irradiance and thickened the atmosphere by 2.5 times and calculated the surface temperature.
Q = 2,736 W/m^2 aka solar irradiance. X = 250 km. ToA T = -90C. k = 1,303
U = 5.211, dT = 525 K, Surface T = 708 K.
This result looks just like Venus without any CO2 or RGHE hocus pocus.
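The arithmetic in the comment above can be checked with a short script. This is only a sketch of that calculation, assuming a unit area (A = 1 m^2), temperatures in kelvin with the ToA value of -90C rounded to 183 K, thickness x in km, and k held fixed between the two cases. The function names are made up for illustration.

```python
# Sketch of the Q = U * A * dT arithmetic from the comment above.
# Assumptions (illustrative only): A = 1 m^2, x in km, temperatures in K.

T_TOA = 183.0  # -90 C expressed in kelvin (rounded)


def conductance(q, t_surface, t_toa=T_TOA):
    """U = Q / dT for unit area."""
    return q / (t_surface - t_toa)


def surface_temperature(q, k, x, t_toa=T_TOA):
    """Solve Q = (k / x) * dT for the surface temperature."""
    u = k / x
    return t_toa + q / u


# Case 1: back out U and k from the assumed Earth values.
u_earth = conductance(1368.0, 288.0)
k = u_earth * 100.0  # x = 100 km

# Case 2: double the irradiance, thicken the layer 2.5 times, same k.
u_thick = k / 250.0
t_surface = surface_temperature(2736.0, k, 250.0)

print(f"U = {u_earth:.2f}, k = {k:.0f}")
print(f"U = {u_thick:.3f}, dT = {t_surface - T_TOA:.0f} K, surface T = {t_surface:.0f} K")
```

With these inputs dT comes out at 525 K, which added to the 183 K ToA value gives a surface temperature of about 708 K.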
Your Q does not seem to have the loss from radiation from the Earth. You need to account for all energy flows. What are the units of U? It can’t be just 13.03 unless U is dimensionless, which I don’t think it is. I recognize this is a quick post so full details may not be included, but can you go through what you calculated?
Nature is mentioned in a listing of media and people all involved in the Foreign Trade Council, the US equivalent of the Royal Societies we find in the UK and other countries and who in reality form the outer ring of a secret society that is preparing for World Governance based on a collective, read Communist model.
All you have to know about Globalism and how close they are to achieving their ultimate objective:
https://youtu.be/94CSNbF4XlU
Thank you for the great Edward Griffin / Quigley video, it was packed with interest.
Here’s why we’ve got to this state. I put it on Bishop Hill’s a few days ago when the BBC first reported this.
A ‘reproducibility crisis’ is inevitable because of basic statistics.
For a result to be considered plausible it must meet a 95% confidence level. But why 95%?
The answer was very sound. 95% means it’s a 1 in 20 chance of it being a spurious anomaly in the data.
(OK, we all know that. Bear with me).
Back in the 1900s to 1920s the chances of two groups both getting a 1 in 20 chance was 1 in (20 x 20) all multiplied by the number of groups looking.
E.g. 3 or 4 groups looking in 400. Thus if you do get a spurious result the chance of it being replicated by another spurious result was about 1 in a 100. Unfortunate – bad – science gets weeded out. Success!
95% confidence levels kept science on the right track… back in the 1900s to 1920s.
Now though we have a lot more research going on. Universities have more post-grads. And every town in the developed World has a university. Plus lots of universities in the Third World too. This alters the sums.
Chances of a spurious result being accepted as guidance for further research now =
30 or 40 in 400 or about 1 in 10.
But that’s for a dull subject. Exciting findings that attract research funds suddenly become 300 or 400 in 400 or in practical terms ‘pretty darn tooting to be confirmed’.
Remember, failed results don’t get published. After all, it might be the sloppy technique of the investigator. Bad luck post-grad.
In fact, for interesting results it won’t be just two spurious results that confirm each other. It becomes four or five.
Enough for a conference and a special journal.
The lucky researchers are now the world leaders in this exciting new field. They can get poached by more prestigious (and richer) universities. And they will be doubling down in that field. Others will follow. Inter-disciplinary approaches offer new opportunities to crack old problems. If you’re really lucky you can turn your approach from something that might be a dead-end towards this new advance.
But remember, 95% confidence levels can no longer prevent spurious results from being established. So all that effort is often wasted.
Hence you are bound to get a ‘reproducibility crisis’. We are publishing on too low a confidence level. And no-one would ever suggest that we should publish less. That’s not how jobs in academia work.
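The back-of-envelope sums in the comment above can be written out as a short script. It is a sketch only: the linear rule (number of groups divided by 400) is the commenter's approximation, the second function is the standard "at least one false positive among n independent tests" formula added here for comparison, and both function names are hypothetical.

```python
# Sketch of the "n groups in 400" arithmetic, assuming independent groups
# that each test at the 95% level (alpha = 0.05). Names are illustrative.

ALPHA = 0.05  # 1 in 20 chance of a spurious "significant" result


def linear_estimate(n_groups, alpha=ALPHA):
    """Commenter's rule of thumb: n_groups chances in 1/alpha^2, capped at 1."""
    return min(1.0, n_groups * alpha * alpha)


def chance_any_false_positive(n_groups, alpha=ALPHA):
    """Probability that at least one of n_groups independent tests is spurious.

    Added for comparison; it is not part of the original comment.
    """
    return 1.0 - (1.0 - alpha) ** n_groups


for n in (4, 40, 400):
    print(n, round(linear_estimate(n), 4), round(chance_any_false_positive(n), 4))
```

The linear rule is only a rough guide, which is why it is capped at 1 here. The second column shows the same trend without that problem: the more groups that look, the more likely it is that somebody finds a spurious result.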
Fortunately in climate science we do have different groups replicating the work of the others. Hence different temperature series.
The Nature discussion on the Today program was talking about medical studies, not climate data.
As usual, you miss the point in your eagerness not to question the practice of climatologists or be sceptical of anything at all.
Since few climate study authors have archived their data and methods, I do not understand how they could be independently replicated. Please explain how this is possible.
From a quick look back at the posts, I estimate about 70% specifically mention climate science. Yet you only make this comment on mine. Why is that? Why did you not say that to all the other commenters talking about climate science?
This is primarily a climate blog, and putting scientific news in the context of climate science is pretty much what it does. Calling me out for putting the climate science perspective makes no sense at all.
It looks a bit like desperation.
Because you were replying to me and I was talking about the story on the BBC. I began with “Here’s why we’ve got to this state. I put it on Bishop Hill’s a few days ago when the BBC first reported this.”
You thread-jacked my comment in your monomania. I’ve every right to respond to you contemptuously.
Brook: You have NASA, NOAA, HadCRUT and Berkeley Earth all producing independent data sets. Each is a test of replicability. You also have two independent satellite based data sets. If you are not aware of them or their methods you can look them up.
Replication is a big problem in science. Just not so much in the temperature data sets in climate science.
Ok your reason for not commenting on the 70% of other posts is reasonable, as I was responding to you. The point still stands, climate science is not one of those that suffers from the replication problem. I am glad that you do not dispute this, but merely object to me inserting this into your discussion of medical research.
On that point, your analysis is flawed because the additional research groups are not all doing the same thing, so the chance of duplicate spurious results is not as you state.
seaice1, if they aren’t doing the same thing then they aren’t replicating the results. They are doing something different.
You’re back on your “astrology is sound if I like the results” argument again.
They have to be investigating the same thing to replicate results. And they have to be using the same method to provide confidence in the method.
And even if that was wrong, it wouldn’t change the statistical fact that – with the increased number of research groups and bias against publishing negative results – spurious results will probably be confirmed.
Yes. And the sat temps do not match the GAST temps. Lack of replication.
MCourtney. Is your argument that in medical science there are too many groups trying to replicate the work of others? This is an interesting take on the issue and somewhat at odds with the article.
If you follow the link you will see that I never mentioned astrology, and it is simply another of your straw man errors.
Rather, that the level of significance taken to prove replication is too low for the number of groups trying to replicate.
And you mentioned anything that has the same conclusion – regardless of whether the method is known – as replicable.
So that includes astrology, sometimes.
You can’t defend Karl et al 2015 like that and exclude astrology too.
M Courtney, you just made up the astrology replication study. There are none. Astrology is not capable of producing one. You can’t just make stuff up then argue about it as if it were real. Show me the astrology paper that got past peer review and supported Karl15 and we can discuss it.
No. No. No.
There is no-one stupid enough to write a paper claiming that any astrology prediction replicates anything. Karl et al 15 or anything else.
No. They will not. Because Karl et al is rubbish.
That is the point I have repeatedly made and you have repeatedly missed.
You claim that it matters not one jot how a prediction is made, repeatable or explicable – Pah!
You claim that if any method reaches the same conclusion as Karl et al then the practice that created Karl et al is confirmed.
It is not.
And further more, it will never be so because Karl et al is fabricated. That’s why they hid the data by losing it in a computer fault.
However, (pay attention here) if the method that leads a conclusion is irrelevant and the repeatable logic that leads to a conclusion is irrelevant… then (Ta Daaa) the Karl et al Conclusion is Relevant.
Then, and only then, Karl et al is (maybe) not rubbish.
But how do you let Karl et al 15 in and not also believe in Astrology?
You claim that it matters not one jot how a prediction is made, repeatable or explicable – Pah!
I have never claimed this
You claim that if any method reaches the same conclusion as Karl et al then the practice that created Karl et al is confirmed.
I have never claimed this.
And further more, it will never be so because Karl et al is fabricated
So you claim.
But how do you let Karl et al 15 in and not also believe in Astrology?
By using all the usual scientific processes, checking the paper has cited and used previous work in a relevant context.
Your “logic” would seem to exclude progress in science without acknowledging astrology, which is absurd.
There are no astrology based papers supporting Karl15. You are just making it up. You are saying “if astrology could support Karl15 then you would have to accept astrology.” But the fact is that astrology cannot provide any meaningful support for karl15, so your argument is just nonsense.
One last time, if they aren’t doing the same thing then they aren’t replicating the results. They are doing something different.
This is obvious.
But you say that it doesn’t matter that they are doing something different so long as the conclusion is the same. If the conclusion is the same then Karl et al is supported. This is silly.
Astrology is something different. If astrology comes to the same conclusion then is Karl et al supported? No. Because the workings of astrology are not replicating the workings of Karl et al 2015.
Remember though, nothing can replicate the workings of Karl et al 2015. Because they fudged the ship data and then hid their working.
So how can anything provide support for Karl et al 2015 just through having the same conclusion? You say “By using all the usual scientific processes, checking the paper has cited and used previous work in a relevant context.” But there is no relevant context for Karl et al 2015. They lost the data. Therefore, the new paper may well be OK. But it has nothing to do with Karl et al 2015. It has no more to do with how Karl et al 2015 works than astrology does.
The problem with your concept of science is that it lets anything in. Yes, even astrology. Because if the “relevant context” is just having the same conclusion – what else can it be when there is no working to compare to – then anything and everything is OK if the conclusion meets your desired target.
This is why your argument that a ‘similar conclusion is support’ is a justification for astrology being science. And it is also why you are completely wrong.
Replicating other groups’ research results costs money and time, and very few researchers are willing to spend their time and money reproducing (or failing to reproduce) other researchers’ work when they can do original science.
The truth is most scientists are well aware of the results in their small subfield that are suspect. With time, articles that have irreproducible results are forgotten and no longer cited, and science just churns along, self-correcting and building up on solid knowledge. The main problem is fashion research that attracts money and resources, detracting from solid scientific research, and top journals encourage fashion research, sometimes even asking scientists to produce it. Human caused climate change is just one example of fashion research that interests people in this blog, but fashion research contaminates every single aspect of most scientific fields. Fashion research has a very high risk of being biased and violating principles of the scientific method. However, it is where the money and fame are. Fashion research is, for example, finding a fossil that is a human ancestor. If your fossil is not a human ancestor it is a lot less fashionable. Or finding an exoplanet that can support life. A lot of supposed human ancestor fossils are not, and a lot of exoplanets could not support life. In the end it is our fault that fashion science exists and contaminates research. It is just a manifestation of old anthropocentrism, like the climate is our fault.
Removing fashion science, then, would be more important than pursuing replicability? I can see your point to a degree, but without replication, how does one know if the result was a fluke or real? Reproducibility is necessary in science—note that physicists are often very good at asking for others to reproduce their outcomes if they can. They understand how important this is.
Talking from experience, that is only a problem for those not conducting research. If one is conducting research, his experiments won’t work if based on irreproducible results, so pretty soon one learns to trust only what works in his hands. So science only advances on reproducible results. Only in non-experimental fields (like theoretical physics) can irreproducible science contaminate the field for a long time, like in string theory.
A fine article posted at http://www.americanthinker.com/
which lists the causes for “the scientific decline of science in American Academia” by Leo Goldstein. The obvious factor missed is how government funds “science”. Other than that, it seems a pretty good list to this truck driver. And quite applicable to the current discussion.
http://www.americanthinker.com/articles/2017/02/nine_causes_of_scientific_decline_in_american_academia.html
There is a problem here, but the solution is not obvious. The problem is that replication is a valuable thing, but the researcher doing the replication (and whoever funds it) does not get the benefits. If the replication is successful, then the original researcher gets the benefit of everyone having confidence in the result. The wider community gets the benefit of confidence in the original result. The person who does the replication gets essentially nothing. Who is going to fund this? The only body that is supposed to represent the interests of the wider community is the Government. So are we saying that there should be a lot of Government spending in reproducing every paper that is published?
I’ve been waiting for this! Now all the deniers will see how the caring people really care when they can read how CO2 heats the air. The glaciers will melt, the seas will rise, and we’ll all DIE!!! I mean, we have an equation, right? All we have to do is loose the CO2 into the atmosphere, apply the equation, and watch the temperature go up, up, up! And it will be reproducible! AND caring!
Will Nature now require reproducible statistical significance?
Bailey finds “5 sigma” scientific models deviate by up to five orders of magnitude beyond naive normal distribution assumptions.
Bailey DC. (2017) Not Normal: the uncertainties of scientific measurements. R. Soc. open sci. 4: 160600. dx.doi.org/10.1098/rsos.160600
Demetris Koutsoyiannis (2010) shows random probability statistics severely underestimate climate persistence (Hurst-Kolmogorov dynamics).
Koutsoyiannis, D. Memory in climate and things not to be forgotten (Invited talk), 11th International Meeting on Statistical Climatology, Edinburgh, doi:10.13140/RG.2.2.17890.53445, PDF. Presentation http://www.itia.ntua.gr/en/docinfo/991/
How does this help with the massive number of papers that only regurgitate the results from a computer analysis? Isn’t the same code on a different computer going to give the same garbage?
The article does not mention climate science at all.
Possibly because it is one area where replication is commonplace.
There are several independent series published. BEST is a good example. Could the temperature series be replicated by an independent team using different methods? Yes it could.
Climate science is possibly the most replicated area of science.
seaice1:
You’re kidding (mainly yourself), right? Do tell us how many have replicated Mann’s work – with his data and methods and code? Or, from the (not so) sublime to the ridiculous, Cook’s, with his data and code?
But, of course, you don’t mean ‘replication’, do you?
I don’t want to get into the many replications of Mann’s work for now. I was mainly referring to the surface temperature data sets.
The reason you don’t is that red noise fed into Mann’s centered PCA produces hockey sticks. So all the Pages2 crap produces them. Because the method is fatally flawed. McIntyre published a peer reviewed paper on this years ago. As for temps, the GAST stuff does not replicate the sat stuff. The GAST stuff can be shown to be flawed in many ways. The fact that it comes out ‘sort of the same’ doesn’t mean it is right. The flaw is taking raw data not fit for purpose (lack of coverage, UHI, poor station quality) and thinking it can be massaged into something useful with tight error bars. It is the error bar thing that is the real reproducible problem.
seaice1:
Frit!!!!! Go away and hide behind your pusillanimity!
Yes, because my point is that the temperature data is very well replicated.
So you agree that BEST was a good replication study? Your issue is that Mann’s study has not been replicated?
Well, here is the evidence:
Robustness of the Mann, Bradley, Hughes reconstruction of Northern Hemisphere surface temperatures: Examination of criticisms based on the nature and processing of proxy climate evidence
Eugene R. Wahl · Caspar M. Ammann
http://nldr.library.ucar.edu/repository/assets/osgc/OSGC-000-000-011-900.pdf
From the abstract:
“Altogether new reconstructions over 1400–1980 are developed in both the indirect and direct analyses, which demonstrate that the Mann et al. reconstruction is robust against the proxy-based criticisms addressed. In particular, reconstructed hemispheric temperatures are demonstrated to be largely unaffected by the use or non-use of PCs to summarize proxy evidence from the data-rich North American region. When proxy PCs are employed, neither the time period used to “center” the data before PC calculation nor the way the PC calculations are performed significantly affects the results, as long as the full extent of the climate information actually in the proxy data is represented by the PC time series. Clear convergence of the resulting climate reconstructions is a strong indicator for achieving this criterion. Also, recent “corrections” to the Mann et al. reconstruction that suggest 15th century temperatures could have been as high as those of the late-20th century are shown to be without statistical and climatological merit.”
In contrast Soon and Baliunas 2003 has not been replicated.
Seaice1: It’s not at issue that the “temperature data is very well replicated”. It is all to do with whether the work of the scientist’s paper can be replicated – using the data and methods/code he/she used. When Mann released his paper (MBH98, I think it was) he took great pains to make sure his data and methods were very difficult – if not impossible – to find.
I don’t think you really understand replication.
Replication and reproducibility are technically two different things.
BOTH are important.
Reproducible typically means the author supplies all the code and data and you can reproduce the same results.. or do the experiment again and get the same result.
Replication, as you note, involves using perhaps different data and different methods and coming to the same conclusions.
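The distinction drawn above can be illustrated with a toy script. This is a sketch under stated assumptions: the "study" is an invented example (estimating a mean from noisy samples), a fixed random seed stands in for "same code and data", a different seed stands in for "different data", and the 0.1 tolerance is arbitrary. It does not cover the "different methods" part of replication.

```python
# Toy contrast between reproducing and replicating a result.
# Everything here is hypothetical and only illustrates the two terms.
import random


def run_study(seed, n=10_000, true_value=1.0):
    """Toy 'study': estimate true_value from n noisy measurements."""
    rng = random.Random(seed)
    samples = [rng.gauss(true_value, 1.0) for _ in range(n)]
    return sum(samples) / n


original = run_study(seed=1)

# Reproduction: same code and same data (same seed) give the identical number.
reproduced = run_study(seed=1)

# Replication: fresh data (different seed) should agree within the noise.
replicated = run_study(seed=2)

print("reproduced exactly:", original == reproduced)
print("replicated within tolerance:", abs(original - replicated) < 0.1)
```

A reproduction that fails points to a problem with the code or data as archived. A replication that fails points to a problem with the conclusion itself.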
“Possibly because it is one area where replication is commonplace.
…
Climate science is possibly the most replicated area of science.”
And that doesn’t ring any alarm bells with you?
Not a single one?
Really?
Ye Gods!
You’re not a scientist, are you?
caweazle666. Can you explain your comment, as it makes no sense as it is? You seem to be saying replication is a bad thing and should ring alarm bells.
I think the journal should go one step further: studies that make a prediction of a future natural event at a specific date should go on watch. If the event does happen as predicted then the journal should publish the study and add the observed data for the predicted result.
On this site somewhere someone posted a study about the relationship between earth to sun distance and wetter than normal years. I recall it stating that 2016 / 2017 would be wetter than normal. It currently looks like that prediction is coming true. I also read about a computer model that predicted past sunspot behavior based on the orbits of the 4 outer planets. It predicted that this solar cycle would be weak and possibly would have a double peak in sun spot numbers. The prediction appears to have been accurate.
That’s being very “sciencey”. Maybe trying to recoup some of their credibility lost with fake articles about species extinctions and plant die offs due to AGW? Whatever their reason….. I applaud it.
“listed as unreplicable, perhaps even nonredeemable?”
Isn’t that redundant?
It would seem so at first glance, but an unreplicable study may be redeemed if a minimal error is detected and corrected. A nonredeemable study is so rife with error, projection, etc that no amount of new work could save it.
Contemplate Pons & Fleischmann cold fusion, for example. Certainly not a sure thing now, but what was originally perceived as nonredeemable (at least, in part, because they decided to go Las Vegas with the initial report before publishing or having replication attempted) has led to small modifications by the US Navy, Texas A&M researchers, and others that have hinted at reproducibility. That doesn’t mean the E-Cat will ever work (I wonder if that is going to get me put into moderation LOL), but perhaps something will.
But how do I meet my publishing quota if I have to provide all the data? It’s so much easier to get my papers published if I can hide the fake data from prying eyes. /Sarc
“…thinking about when they write the paper.”
????
How about trying it out BEFORE you write the paper, lest the frank gets caught above the beans.
In the same way that Nature does not enforce its archiving data rule, this one won’t be enforced either. It’s a method used by journals all the time. Publish a paper, a rule, whatever, with great fanfare and then forget about it.
They mean to be ‘more scientific’ but will do ‘business as usual’ while this operates as a simple fig leaf.
TonyL (above) raises interesting points. What is meant by replication of a study? Using the same assumptions, data, math, programs, and operating systems will very likely (certainly?) produce the same results yet prove nothing. But changing assumptions, collecting new data, using different computer programs is not exactly replicating the study. Meaningful replication must involve getting into the details of the assumptions, the sources and validity of the data, and the mathematical and logical validity of the methods.
It isn’t simple, but it has been done for years in other disciplines. It should be possible in Climate Research.
These guys sure know how to reproduce …
I get it! It took me a while to figure out the connection between the spiraling human replicas and the present discussion.. Metaphor is not my strong suit (and jokes are beyond me) whether worded or pictured. But yes! Replication is indeed low hanging fruit and can be accomplished with rapid fire.
This is an excellent beginning, if and only if Nature enforces this. Words notwithstanding, Nature has never required replication in the past. Actions, not words will show whether Nature’s editors are serious.
As the submitter of the article, I highly doubt this rule will be applied to any paper that is in line with the editorial and management philosophy, which is why I think an independent web site documenting meaningful replication of published research could become something of significant value to academia.
It will be subverted to the cause like all left wing institutions. Likely to be a test against sceptics.