Brandon Shollenberger writes: I’ve been mulling over an idea I had, and I wanted to get some public feedback. What would you think of a public re-analysis of the Cook et al data set?
A key criticism of the Cook et al paper is they didn’t define the “consensus” they were looking for. There’s a lot of confusion as to whether that “consensus” position is weak (e.g. the greenhouse effect is real) or strong (e.g. humans are the primary culprits). The reason for that is Cook et al tried to combine both definitions into one rating, meaning they had no real definition.
You can see a discussion of that here.
I think it’d be interesting to examine the same data with sensible definitions. Instead of saying there’s a “97% consensus,” we could say “X% believe in global warming, Y% say humans are responsible for Z% of it.” That’d be far more informative. It’d also let us see if rating abstracts is even a plausibly useful approach for measuring a consensus.
My current thinking is to create a web site where people will be able to create accounts, log in and rate a particular subsample of the Cook et al data. I’m thinking 100 “Endorse AGW” abstracts to start with should be enough. After enough ratings have been submitted (or enough time has passed), I’ll break off the ratings, post results and start ratings on another set of abstracts.
The results would allow us to see tallies of how each abstract was rated (contrasted with the Cook et al ratings). I’m thinking I’d also allow raters to leave comments on abstracts to explain themselves, and these would be displayed as well. Finally, individual raters’ ratings could be viewed on a page to look for systematic differences in views.
What do you guys think? Would you be interested in something like this? Do you have things you’d like added or removed from it? Most importantly, do you think it’d be worth the effort? I’d be happy to create it, but it would take a fair amount of time and effort. It’d also take some money for hosting costs. I’d like to have some idea of whether it’d be worth it.
An added bonus to doing it would be I could move my blog to that site as well. Self-hosting WordPress takes more effort than using WordPress.com, but it allows for far more customization. I’d love that.
So, thoughts? Questions? Concerns?
By the way, don’t hesitate to tell me I’m a fool if you think I’m spending too much time on the Cook et al issue. I’ve been telling myself that for the last two weeks.
Source: http://hiizuru.wordpress.com/2014/05/20/a-re-analysis-of-the-consensus/
===============================================================
My opinion is that, given the vast number of people interested in this at WUWT, we could likely crowd-source this work much more accurately and quickly than Cook did, without having to fall back on a small cadre of “like-minded friends”. Both sides of the aisle can participate.
I don’t know what the result will be of such an analysis proposed by Brandon, but I do know that we can get far more participants from a much broader venue (since WUWT has almost an order of magnitude more reach than “Skeptical Science”) and that Brandon’s attention to detail will be an asset.
We already know many of the mistakes made in Cook’s work, so a re-do has an advantage out of the gate. The disadvantage may be that the gatekeepers at IOP may refuse to publish it, and the University of Queensland may issue yet another bogus legal threat, since they seem tickled that Cook’s 97% is the subject of worldwide gossip – Anthony
The statement “97% of scientists believe in climate change (or global warming)” means nothing.
What exactly do they agree on? Let’s reject the claim as meaningless and instead ask how much warming, in degrees, they estimate increased CO2 emissions will cause by a certain date (like 2050), along with the ppm value used in the estimate.
Eustace Cranch says:
May 21, 2014 at 9:08 am
No. Reality is not subject to majority vote. Don’t legitimize the idea.
———————————————————————————————-
You are correct that the universe is not a democracy and does not care what we, or anyone else, thinks.
However, there is an argument for taking the club they have been using and hitting them over the head with it, over and over again, if only to take it away from them.
Hey guys, I’m afraid I didn’t see this post until just now, so I missed a lot of comments. I don’t think I can answer all 100+ of them now. I’ll try to just address some central concerns. One is expressed by Martin A:
A lot of people seem to think this project would be done to find out what the “consensus” truly is. That’s incorrect. I don’t believe rating abstracts can tell us that. What I believe is rating abstracts can tell us how good or bad Cook et al’s methodology and data were. That brings us to a point expressed by people like Ken G:
This argument does matter. It matters quite a bit. It shouldn’t, and it certainly shouldn’t be based upon work as shoddy as that done by Cook et al, but “shouldn’t” rarely matters. Most people who hear about the “97% consensus” will never understand why Cook et al’s results are meaningless.
You can actually look to Skeptical Science to see why this is true. As John Cook has said many times, people will tend to not accept criticisms of an idea if they don’t have an alternative. Saying, “Cook et al’s work sucks,” won’t change people’s minds. Saying, “Cook et al’s work sucks; look at this to actually understand the issue,” may.
Related to these ideas is one expressed by Steve Lohr:
The point of this project would not be to find “the truth.” The point would be to provide material for people to look at in order to understand the issues. A “free-for-all mash up of ideas” is fine for that. It lets people see the ideas, then examine the data for themselves. If half of your raters have one bias while the other half have another, you can see that in the data and examine its effects.
Bruce Richardson asks a relevant question:
If people hear both, “There’s a 40% consensus,” and, “There’s a 97% consensus,” I think they’ll tend to find the “consensus” argument far less compelling. I think this is an effective way of showing people a claim of “consensus” is pretty much meaningless.
AnonyMoose raises important concerns about the backend of a project like this:
My current plan is to create this system from scratch (aside from cannibalizing code). I’d use a MySQL server with hand-crafted web pages. I’d use PHP and JavaScript for account control and database I/O. If there wound up being enough interest in the project, I might add in another component to make data analysis more practical on the server itself. Until then, I’d probably do the more complicated analyses by downloading the data and running R code on my machine. Another important concern from AnonyMoose is:
One benefit of a design like mine is it would be easy to scale. Additional projects could be added right alongside it. And since I’d intend to have ratings done in sets by subsample, we could even change what’s being looked at as we go if problems are found. Another important issue is raised by Joe Born:
This is one of the primary appeals of a system like the one I describe. There would be no reconciliation phases, no tie-breaks, no nothing. Everyone would have all the data, and they could examine any parts or combinations of it they wanted. We’d be able to look for things like rater bias by comparing raters to one another (even across specific abstracts or sets of them).
That said, I don’t think the initial system would allow too much in this regard on the server itself. Creating the code to run analyses like that in your browser would take a significant amount of time. At least in the beginning, more complicated analyses would probably require downloading the data yourself.
Even then, I would be happy to help by writing up R code to run tests on your own computer. I’m sure others would be too. And who knows, maybe some people would eventually contribute the code to make those analyses possible server-side.
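To give a flavor of what that might look like, here is a minimal sketch in R. The file name and the rater_id/abstract_id/rating columns are my assumptions about the eventual data export, not a final spec:

# Minimal sketch, assuming the site exports ratings as a CSV with
# (hypothetical) columns rater_id, abstract_id and rating.
ratings <- read.csv("ratings.csv")

# Per-rater summary: how many abstracts each person rated, and their mean rating.
by_rater <- aggregate(rating ~ rater_id, data = ratings,
                      FUN = function(x) c(n = length(x), mean = mean(x)))
print(by_rater)

# Pairwise agreement: for each pair of raters, the share of commonly
# rated abstracts on which they gave the same rating.
wide <- reshape(ratings, idvar = "abstract_id", timevar = "rater_id",
                direction = "wide")
mat <- as.matrix(wide[, -1])
agreement <- outer(seq_len(ncol(mat)), seq_len(ncol(mat)),
                   Vectorize(function(i, j) mean(mat[, i] == mat[, j], na.rm = TRUE)))

Nothing fancy, but it shows how much could be done with nothing more than the raw data.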
Steven Mosher says:
May 21, 2014 at 2:00 pm
For folks who claim there is no consensus I would say this:
A) what evidence do you have
B) why resist an attempt to properly measure the consensus present in the literature?
—————————————————————————————————————
The consensus is not “present in the literature”. If it exists, it is expressed in the literature, but the expression is not the consensus. Therefore, it cannot be measured in the literature unless you have independent evidence of a strong correlation between the actual views of scientists and the expression in the literature.
Steven Mosher says:
May 21, 2014 at 2:00 pm
The only reasons not to do it.
A) you think it’s a waste of time.. well DON’T HELP THEN
B) you are afraid of the answer.
—————————————————————————————————————
The study design is incurably flawed, both because we don’t know how the literature reflects the views of scientists and because there is no way to guard against bias among volunteers. There may be more reasons as well.
Jake J raises a concern which is important to address:
I like the idea, but I think you’ll be attacked for bias. Critics will say that, by using Watts Up With That to recruit reviewers, you effectively solicited people who would lowball the consensus numbers. If you could find ways to control for this bias, then I think it’s an interesting idea.
As I said above, I intend to allow people to examine individual raters’ ratings. Critics will be able to look at each abstract and see how different people rated it. They can then look at each person who rated it and see how they rated other abstracts. They can even look and see what the answers would have been if they removed raters they felt were biased.
Additionally, critics will be able to participate directly. I’ll invite critics to rate a set of abstracts, then compare their results to everyone else’s. I’ll then invite everyone to discuss the disagreements and see what they think.
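To make that concrete, here is a rough sketch, in R and using the same hypothetical export format as above, of the filter-and-recompute check a critic could run (the excluded rater IDs are purely illustrative):

# Sketch: recompute per-abstract majority ratings after dropping raters a
# critic considers biased.
ratings <- read.csv("ratings.csv")   # assumed columns: rater_id, abstract_id, rating
suspect <- c(3, 17)                  # hypothetical raters to exclude
filtered <- subset(ratings, !(rater_id %in% suspect))

majority <- function(d) tapply(d$rating, d$abstract_id,
                               function(x) names(which.max(table(x))))
before <- majority(ratings)
after <- majority(filtered)
cat("Abstracts whose majority rating changed:",
    sum(before[names(after)] != after), "\n")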
Mindert Eiting raises a concern which helped trigger this idea:
I don’t agree raters should be forced to practice, as a person can disregard anything they “learn” while doing it, but the opportunity to practice is definitely important.
However, the more interesting issue to me is the idea I “should use at least three raters” so I can “detect outliers.” This is an important point to me because my hope is to have more than three raters for each abstract. I’d want five, or maybe even ten. That lets us examine things like bias much more effectively, and it’s a reasonable goal since I intend to only examine ~100 abstracts at a time.
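With that many raters per abstract, outlier detection can be quite simple. A minimal sketch, again assuming the same hypothetical data layout as above:

# Sketch: flag ratings that sit far from the median rating of the same abstract.
ratings <- read.csv("ratings.csv")   # assumed columns: rater_id, abstract_id, rating
med <- ave(ratings$rating, ratings$abstract_id, FUN = median)
outliers <- ratings[abs(ratings$rating - med) >= 2, ]  # threshold is arbitrary
table(outliers$rater_id)  # which raters get flagged most often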
John Whitman raises a question about publishing results:
I don’t think results need to be published in a journal to have a strong impact, but that’s not too important. A lot of scientific work isn’t done with a publication in mind. It’s done to learn things. It’s often only after you’ve done a lot of work that you’ll be able to perform a study the results of which can be published.
Michael D suggests I might have already planned something. I won’t quote it because of how long it is, but I can confirm I have had a similar idea from the start. Aside from some details of implementation, there’s only one important difference: I would not simply remove anyone’s data. What I’d do is calculate many different results with many standards for which raters to include. I’d also encourage people to make their own suggestions about what data to filter.
Thanks for fixing my HTML tag!
Brandon Shollenberger says:
May 21, 2014 at 9:55 pm
Jake J raises a concern which is important to address:
I like the idea, but I think you’ll be attacked for bias. Critics will say that, by using Watts Up With That to recruit reviewers, you effectively solicited people who would lowball the consensus numbers. If you could find ways to control for this bias, then I think it’s an interesting idea.
As I said above, I intend to allow people to examine individual raters’ ratings. Critics will be able to look at each abstract and see how different people rated it. They can then look at each person who rated it and see how they rated other abstracts. They can even look and see what the answers would have been if they removed raters they felt were biased.
—————————————————————————————————————————
If you want to be scientific, you should be worried about bias itself, not just about being attacked for it. The way this is being framed, it looks to me like “bias is OK if it supports our views and we can get away with it”. I understand that may not be what you intended.
Your suggested solution looks like a flimsy band-aid to me. Bias is natural and normal. You will find differences between raters. It’s commendable to let critics inspect the results, but as far as I can tell there’s no objective way to interpret the differences, so you will be arguing with the critics for eternity. You can invalidate the results, but not validate them.
Maybe you should ask Mike Hulme who said that the Cook et al study was “poorly conceived, poorly designed and poorly executed”. See if he thinks your design is any better.
Dagfinn, Jake J suggested critics would cite a particular form of bias as an issue. I outlined a powerful method for examining the effect of that issue. I have no idea how you interpreted that as suggesting “bias is OK if it supports our views and we can get away with it.”
There is no need to “interpret the differences.” One doesn’t need to understand why a bias exists in order to see it or examine the effect of it. At the most basic level, one can take the “lowest” and “highest” raters as demonstrating the full range of bias. That gives you boundary conditions on what the “right” answers are.
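For illustration, a sketch of that boundary-condition check in R, using the same assumed export format as earlier (the endorsement cutoff mirrors Cook et al’s use of categories 1-3 as “endorse”; it is an assumption about how the ratings would be coded):

# Sketch: bound the headline figure by computing each rater's individual
# endorsement rate and taking the lowest and highest as the extremes.
ratings <- read.csv("ratings.csv")   # assumed columns: rater_id, abstract_id, rating
endorse_rate <- function(x) mean(x <= 3)   # ratings 1-3 treated as "endorse"
per_rater <- tapply(ratings$rating, ratings$rater_id, endorse_rate)
cat(sprintf("Endorsement rates span %.1f%% to %.1f%% across raters\n",
            100 * min(per_rater), 100 * max(per_rater)))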
There is little reason there would be extensive arguing about the results. Critics can propose whatever filters they like. Those filters would be applied and the results examined. The results would be easily verifiable, meaning the only arguments would revolve around what filters are “right.” The worst case scenario is we’d wind up with a situation where everyone could agree, “Applying filter A gives result X,” but disagree about whether A and/or X are true. Laying out people’s differences in such a clear manner would be useful. This is especially true since those differences could be easily checked against individual abstracts.
Additionally, the rating system I’ve proposed is quite simple. That greatly reduces the extent of bias. It also means inappropriate ratings are more easily verified.
I don’t quite understand the philosophising about this issue. The abstract itself clearly states that it is NOT 97% of all papers. Out of 11,944 scientific peer-reviewed papers on “global warming,” 66.4% express *NO* position on anthropogenic (man-made) global warming. Of the remainder that endorsed some position (32.6%), 97.1% suggested it was man-made – i.e. it is a majority of a third of scientists that endorse it, NOT a majority of scientists. What’s there to argue about? The media has purposefully twisted the facts to suit the media status quo.
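The arithmetic is easy to verify. A quick sketch in R, using the figures from Cook et al’s own abstract:

# Check: what fraction of ALL abstracts endorsed AGW, given that 66.4%
# took no position and 97.1% of those expressing a position endorsed it.
total <- 11944
no_position <- 0.664
endorse <- 0.971 * (1 - no_position)
cat(sprintf("~%.1f%% of all %d abstracts (~%d papers) endorse AGW\n",
            100 * endorse, total, round(endorse * total)))
# ~32.6% -- the famous "97%" applies only to the third that took a position.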
I don’t believe that there is any benefit in repeating the survey, as it will only confirm what we already know: that government funding of climate science is a corrupt process, biased towards funding submissions that infer CAGW.
So, what constitutes a “consensus” exactly? Is it just a headcount? Is one Lindzen really equal to one newly minted PhD? I would suggest a more qualitative (as opposed to quantitative) assessment is in order.
I wouldn’t bother repeating Cook’s travesty. It was a politically corrupt set-up from the start. Far better to start again and do it properly. But to do so will cost money, as it needs to be done by an accredited social research company that is not politically aligned. Material would need to be kept confidential, as many scientists still fear career backlash for coming out on the global warming issue, although this is starting to turn the other way here in Australia. Colleagues now talk openly about the damage the climate movement has done to science and the dreadful waste of public resources that could otherwise have been used productively. Let’s put it this way: the only people here that would welcome a meeting with Michael Bent Hockey Stick Mann are the ousted Green Party and a few toady followers in the leftist media. To do it again, nah, move on. Cook is a useful idiot and a jerk. Most of the world knows this now and he will never be anything else.
Don’t do it, is my advice. It is better to demonstrate that the original one was flawed.
A key criticism of the Cook et al paper is they didn’t define the “consensus” they were looking for. There’s a lot of confusion as to whether that “consensus” position is weak (e.g. the greenhouse effect is real) or strong (e.g. humans are the primary culprits).
==========================================================
AH!! Still not correct! The strong position of proponents should be “humans are the primary culprits, AND the warming effects are CATASTROPHIC!”
That is what the entire debate is about. To see a good version of a strong statement, see the Oregon Petition, where over 35,000 scientists state:
“There is no convincing scientific evidence that human release of carbon dioxide, methane, or other greenhouse gases, is causing, or will, in the foreseeable future cause CATASTROPHIC harm to earth’s atmosphere and disruption of the Earth’s climate. Moreover, there is substantial scientific evidence that increases in carbon dioxide produce many beneficial effects upon the natural plant and animal environments of the earth.”
There is no statement signed by any proponents of CAGW which states the opposite, which would read…
“There IS convincing scientific evidence that human release of carbon dioxide, methane, or other greenhouse gases, is causing, or will, in the foreseeable future cause CATASTROPHIC harm to earth’s atmosphere and disruption of the Earth’s climate. Moreover, there is substantial scientific evidence that increases in carbon dioxide produce many HARMFUL effects upon the natural plant and animal environments of the earth.”
There is not one survey of scientists that makes this claim. The 97% claim is pitiful, a farce, and fraudulent.
“I’m thinking 100 “Endorse AGW” abstracts to start with should be enough”
Again, the same error. Why leave the “C” out of AGW???? Never leave the “C” out. Without the “C”, AGW is a purely academic exercise, of NO political value. WUWT is making a mistake in leaving the C out.
Hell, the truth is that the C, the G and the W are all MIA.
Skeptics are under no compulsion to accept the rebranding of CAGW to CC (Climate Change). I encourage all skeptics to call it CAGW, and demand the proponents prove a consensus on that.
The “conflict of interest” factor is too high to properly evaluate the truth of the responses. Are the respondents presently beneficiaries of grant funds or other income based upon AGW theory? Is their department, their school, their company, etc.?
I like this idea for selfish reasons. I think it would be interesting to see the results. To see how many published papers actually quantify A’s contribution to ACC.
But I don’t think it will have any impact beyond blog & forum debates. Even if the results completely refute the 97% claim, the press will ignore it and the warmists will dismiss it. It will provide one more arrow in the quiver of counter-arguments to the 97% claim, but I doubt it will convince the true believers. It won’t stop the EPA from issuing new regs, the IPCC from creating alarmist summaries, or Obama or Kerry from demonizing fossil fuels. There is, IMO, no anthropogenic way to derail the ACC gravy train – only time and the climate itself will do that.
And even if it manages to refute Cook’s study – the 97% consensus will live on through the studies/surveys of Oreskes, Doran, Anderegg and whatever others there are. I think this has merit as an example of how subjective such studies can be. But by itself it will not accomplish much.
To augment it, it may be useful to discredit the premise and methodologies of all of the 97% studies. Yes, each has been discredited at some time in the past in some post or another. But to my knowledge there is no one place anyone can go to find all of the critiques. That, I think, would be worth having. And if it shared the same URL as Brandon’s idea, then that survey would support the arguments that the premise and methodologies are flawed.
Brandon Shollenberger says:
May 22, 2014 at 12:56 am
There is no need to “interpret the differences.” One doesn’t need to understand why a bias exists in order to see it or examine the effect of it. At the most basic level, one can take the “lowest” and “highest” raters as demonstrating the full range of bias. That gives you boundary conditions on what the “right” answers are.
——————————————————————————————————————————
You may have a point about the boundary conditions. Assuming that you have the full range of bias, that is. You would have to know that somehow. Of course, if the range is too broad, the result will be of less value. What you can’t do is correct for the overall or average bias to find the “right” answer, since you don’t know how large it is or even in what direction it goes.
The only reason this comes up is because of the indirect method of assessing the views of scientists. Asking them directly seems so obviously superior.
we could say “X% believe in global warming, Y% say humans are responsible for Z% of it.”
The first clause is meaningless.
100% of geologists believe in global warming because even in the scope of human recorded history (cave paintings) we were in an ice age. The world is warmer than it used to be — by a lot.
So, go ahead and ask the Cook question and categorical responses, as he stated them without alteration. This is your control. If you cannot replicate the control, it raises other questions about Cook’s methodology as well as yours.
Then ask the more specific quantifiable questions that really matter:
Global warming has been (Xp90, Xp50, Xp10) degrees C, from Date1 to Date2, by what measure?
Y0% say UHI accounts for at least Z0% of the signal (or it is unmentioned, null)
Y1% say CO2 accounts for at least Z1% of total global warming (or it is not mentioned, null)
Y2% say ALL GHGs account for at least Z2% of total global warming.
There is likely to be a one-to-many element here with one paper sourcing several probability estimates of Z*.
There must also be a sampling of how many such statements in the abstracts are supported by research in the paper and not just references to other papers.
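One natural way to store that one-to-many structure is a long-format table, one row per extracted statement. A hypothetical sketch in R (none of these column names come from any actual design):

# Sketch: one row per quantified statement, so a single paper can
# contribute several Z-style estimates.
statements <- data.frame(
  abstract_id = c(101, 101, 102),
  driver = c("CO2", "UHI", "all GHGs"),  # what the statement attributes warming to
  share_pct = c(50, 10, 80),             # Z: claimed share of the warming, in percent
  supported = c(TRUE, FALSE, TRUE)       # backed by the paper's own research?
)
# Y-style tallies then fall out of simple aggregation, e.g.:
with(subset(statements, driver == "CO2"), mean(share_pct >= 50))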
Brandon Shollenberger says:
May 22, 2014 at 12:56 am
Dagfinn, Jake J suggested critics would cite a particular form of bias as an issue. I outlined a powerful method for examining the effect of that issue. I have no idea how you interpreted that as suggesting “bias is OK if it supports our views and we can get away with it.”
—————————————————————————————————————————–
I absolutely didn’t intend that as an accusation. I was trying to make a point, perhaps clumsily. The point refers to discussing rater bias not as a problem in itself, but only as something you need to handle because someone else accuses you of it. So it’s about what you didn’t say. Of course, I may have missed something.
Interesting that you are broadening our scientific vision for this initiative, Brandon. In particular I find it interesting that you are motivated to investigate the validity of Cook-like processes. i.e. ask questions like “if we modified Cook’s process in xxx way, would that make it valid?” and “what do the statistics mean?”
Once you have all the machinery in place, presumably the same machinery could be used to ask other questions, such as “is there a consensus that antibiotics are good” (a loaded question!!). We may begin to see patterns such as “if a concept is widely accepted, then only papers that question that concept will be ‘original research’ and thus publishable.” Thus Cook’s foundational concepts may be more deeply flawed than his experiment design.
“if a concept is widely accepted, then only papers that question that concept will be ‘original research’ and thus publishable.”
Actually, there appears to be a bell-curve relationship between the strength and the duration of widely accepted conclusions/results/beliefs.
This New Yorker article is an eye-opener on the impact of consensus on any field of study. It goes beyond gatekeeping and beyond climate science, which the article doesn’t even mention.
http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer?currentPage=1
Dagfinn:
If critics wanted to claim bias was an issue, I’d expect them to do some ratings so they could tell us what the “right” answer is. They’d have a hard time convincing people otherwise.
I guess you could suggest there is bias in some people that didn’t participate, but I don’t think that’s an issue.
True enough, but we can look at different response types. We may not be able to tell which patterns are “right,” since popularity doesn’t give us that answer, but we can see what possibilities there are.
That could be useful, but it would be pretty much useless as a response to Cook et al.
Stephen Rasey:
I’d like to think most people understood I meant “anthropogenically induced global warming,” or “the greenhouse effect,” or the like. I would be more precise when writing rater guidelines, but I prefer not to have to worry about doing it in every blog comment.
Michael D:
Aye. It’d require creating new tables in a database, inserting the new data and modifying some interfaces, but that’s about it. We could even switch out what kind of data is being studied, such as by replacing abstracts with public statements.
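Roughly like this, as a sketch via R’s DBI package (the table and column names are made up for illustration, not an actual schema):

# Sketch: adding a second project as its own pair of tables.
library(DBI)
con <- dbConnect(RMySQL::MySQL(), dbname = "ratings", host = "localhost")
dbExecute(con, "CREATE TABLE IF NOT EXISTS project2_items (
                  item_id INT PRIMARY KEY,
                  text    TEXT NOT NULL)")
dbExecute(con, "CREATE TABLE IF NOT EXISTS project2_ratings (
                  rater_id INT NOT NULL,
                  item_id  INT NOT NULL,
                  rating   TINYINT NOT NULL,
                  comment  TEXT,
                  PRIMARY KEY (rater_id, item_id))")
dbDisconnect(con)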
That link from thallstd is very interesting and potentially relevant. Here is the money quote:
“The journals only wanted confirming data. It was too exciting an idea to disprove, at least back then.” … “after a new paradigm is proposed, the peer-review process is tilted toward positive results. But then, after a few years, the academic incentives shift—the paradigm has become entrenched—so that the most notable results are now those that disprove the theory.”
Thanks!