After almost two years and some false starts, BEST now has one paper that has finally passed peer review. The text below is from the email release sent late Saturday. According to their July 8th draft last year, the paper was previously submitted to JGR Atmospheres, but it appears to have been rejected, as they now indicate it has been published in Geoinformatics and Geostatistics, a journal I had not heard of until now.
(Added note: commenter Michael D. Smith points out it is Volume 1, Issue 1, so this appears to be a brand new journal. Also troubling: on their GIGS journal home page, the link to the PDF of their Journal Flier gives only a single page, the cover art. With such a lack of description in the front and center CV, one wonders how good this journal is.)
Also notable, Dr. Judith Curry’s name is not on this paper, though she gets a mention in the acknowledgements (along with Mosher and Zeke). I have not done any detailed analysis yet of this paper, as this is simply an announcement of its existence. – Anthony
===============================================================
Berkeley Earth has today released a new set of materials, including gridded and more recent data, new analysis in the form of a series of short “memos”, and new and updated video animations of global warming. We are also pleased that the Berkeley Earth Results paper, “A New Estimate of the Average Earth Surface Land Temperature Spanning 1753 to 2011” has now been published by GIGS and is publicly available here: http://berkeleyearth.org/papers/.
The data update includes more recent data (through August 2012), gridded data, and data for States and Provinces. You can access the data here: http://berkeleyearth.org/data/.
The set of memos includes:
- Two analyses of Hansen’s recent paper “Perception of Climate Change”
- A comparison of Berkeley Earth, NASA GISS, and Hadley CRU averaging techniques on ideal synthetic data
- Visualization of Berkeley Earth, NASA GISS, and Hadley CRU averaging techniques
and are available here: http://berkeleyearth.org/available-resources/
==============================================================
A New Estimate of the Average Earth Surface Land Temperature Spanning 1753 to 2011
Abstract
We report an estimate of the Earth’s average land surface temperature for the period 1753 to 2011. To address issues of potential station selection bias, we used a larger sampling of stations than had prior studies. For the period post 1880, our estimate is similar to those previously reported by other groups, although we report smaller uncertainties. The land temperature rise from the 1950s decade to the 2000s decade is 0.90 ± 0.05°C (95% confidence). Both maximum and minimum daily temperatures have increased during the last century. Diurnal variations decreased from 1900 to 1987, and then increased; this increase is significant but not understood. The period of 1753 to 1850 is marked by sudden drops in land surface temperature that are coincident with known volcanism; the response function is approximately 1.5 ± 0.5°C per 100 Tg of atmospheric sulfate. This volcanism, combined with a simple proxy for anthropogenic effects (logarithm of the CO2 concentration), reproduces much of the variation in the land surface temperature record; the fit is not improved by the addition of a solar forcing term. Thus, for this very simple model, solar forcing does not appear to contribute to the observed global warming of the past 250 years; the entire change can be modeled by a sum of volcanism and a single anthropogenic proxy. The residual variations include interannual and multi-decadal variability very similar to that of the Atlantic Multidecadal Oscillation (AMO).
Full paper here: http://www.scitechnol.com/GIGS/GIGS-1-101.pdf
Kev-in-Uk says:
January 20, 2013 at 11:43 pm
Kev – If you visited the grocer in search of raw carrots and you were handed a plate of cooked carrots – you would know the difference. But apparently, some climate scientists believe the carrots are raw if they were just handed the plate. There are legitimate exceptions, like AMSU data that require processing to produce the temperature number, but when it comes to thermometers, there should be a truly raw number.
The situation with modeling is similar in that some climate scientists confuse model output with measurements taken in the real world in the field.
And they want us to trust them? Hah!
Interesting that Daniel Goldber, Yong Gang Li and Yao Yi Chiang are not listed on USC’s web-site faculty/staff directory.
And Jixiang Wu, though on staff at SDSU as an assistant professor, does not mention sitting on the GIGS board, either on his directory web-page or on his curriculum vitae. This is odd, because his vitae lists 9 journals and a US government agency for which he is a reviewer. I guess I find it odd that he sits on an editorial board for an academic journal and it doesn’t appear anywhere on his otherwise extensive and detailed list.
The document I am looking at appears to be almost 2 years old though, so it is possible he just hasn’t updated it with his most recent accomplishment.
Interesting about the three USC board members though. Not being listed as faculty at USC and all.
Provenance of any data set is everything.
You would take it with a “pinch of salt” if someone said a painting was by Rembrandt because some ancestor who knew about paintings had said it was.
If the original raw data was (shamefully) lost or deleted, then unless it turns up and there is proof or provenance, any data set produced from the original “lost” data set will be, and should be, treated with the same “pinch of salt” as the “Rembrandt”. In fact even more so, as the question needs to be asked: how can scientists publish and then lose or delete their raw data?
Provenance of the data is everything.
From Phil Jones To: Michael Mann (Pennsylvania State University). July 8, 2004 (Climategate emails)
“I can’t see either of these papers being in the next IPCC report. Kevin and I will keep them out somehow — even if we have to redefine what the peer-review literature is!”
telegraph.uk UEA contentious quotes
Looks to me like BEST just did its part in redefining “peer-review literature.”
There is some info that maybe I missed.
Did the journal JGR Atmospheres reject the subject paper?
Or, did Muller et al withdraw it from JGR Atmospheres? If they withdrew it then why? If they withdrew it from JGR Atmospheres was it because the review period was becoming too long for their paper to make AR5? Did they need to get a quicker acceptance of their paper at a journal like the GIGS journal in order to have a chance at inclusion in AR5?
Did I miss that back story?
John
Anthony, it would be very interesting to know exactly who and what organization(s) are supporting this new journal… e.g., if there is any conflict of interest with the BEST folks, or if they perhaps even had something to do with getting a new journal started just to be able to publish their paper(s) in what’s ostensibly a ‘peer reviewed reputable academic journal.’
@DirkH 1/20 9:15 am: And I’d like to direct attention again to this razor sharp demolition of the BEST “scalpel” method. (referring to Rasey 12/13/12 11:00 comment specifying “wholesale decimation and counterfeiting of low frequency information happening within the BEST process.”)
Thank you for the approving plug, DirkH.
By the theorems of Fourier Analysis, the scalpel destroys the climate-science-critical lowest frequencies in the original data by shortening the surviving temperature records. BEST throws away the signal, then homogenizes the noise. Through the use of the “suture” short segments are spliced into long ones; low-frequencies appear — but from where did they come? Until I see otherwise, I must conclude that the low-frequency components (i.e. the Climate “signal”) of the 100+ year record are artifacts of the suture process. The long term trend is counterfeit – artificial, illegitimate, with a look of reality.
I have yet to see any answer of substance to the Fourier domain Low-cut filter argument. Have you? I didn’t find it in the Dec. 2012 and Jan 2013 links Mosher provided.
Now this is what consensus looks like.
James Annan and Anthony Watts on the same page regarding OMICS.
http://julesandjames.blogspot.com/2013/01/best-laugh-of-day.html
@Mosher 1/20 11:34 am: “So presumably a new improved method of data ‘homogenization’?”
nope. just a method proposed and endorsed by skeptics before they saw the result.
Are you speaking for all skeptics? I hope not. You don’t speak for me.
Endorsed by (some few) skeptics, perhaps. Who? How many? You are referring to the AGU Dec 2012 poster, right? The one that compares two potentially dodgy methods of homogenization against each other?
@Mosher 1/20 12:34 pm: The data that I hope to have up in due course would be the scalpeled data. That is, showing where all the cuts are made. In due course the data for every station will be online with charts showing where the scalpel was applied and why it was applied.
So, it is pretty clear that post-scalpel data and the temporal distribution of cuts is not yet available for review. We are to trust the sausage makers that their product is safe to consume. As both a geophysicist and a taxpayer, this turns my stomach. I smell a rat and it is probably coming from inside the sausage.
@Mosher 1/20 12:38 pm The scalpel method ( as endorsed by Willis
THAT I like to see. I wonder if Willis agrees he “endorsed” it? Link, please.
works. See the AGU poster for a double blind test of the method.
From 12:51 pm: link to Poster: http://berkeleyearth.org/images/agu-2012-poster.png. The poster does NOT assuage my concerns. It reinforces that I have not misunderstood the BEST process. “Results” amounts to comparing two untrustworthy methods with similar assumptions against each other. But thanks for the link. (My guess is that this is from the Dec. 2012 AGU…. It’s only been 21+ months coming.)
The Rohde 2013 paper uses synthetic error free data. The scalpel is not mentioned. My concern is the use of the scalpel on real, error riddled data.
@auto 1/20 1:20 pm The maths – scalpel effects and so on – seems to have been debunked already
I wouldn’t go that far. But potentially fatal flaws in the post-scalpel frequency content have yet to be addressed, judging by the AGU poster (Dec. 2012) and 1/15/2013 PDFs.
@James Baldwin Sexton: 1/21 3:03am We don’t have good analogs for it in the 20th century, but, we used 20th century temp spacing for the earlier time period that volcanoes wreaked havoc on the earth. How stupid is that? I can’t get to discussing scalpels and jack-knifes when this madness is offending my eyes!
Agreed. There is a dearth of temperature records pre-1900, but that doesn’t stop BEST from making a global temperature profile back to 1750. “Caveat emptor” is no defense here. No amount of error bars is going to cover that sin. I don’t believe things just because it comes out of a computer, but there are too many that do. And too many willing to use what is convenient.
jim2 says:
If you visited the grocer in search of raw carrots and you were handed a plate of cooked carrots instead – would you …
————
That is a great analogy. Thanks jim2.
I wouldn’t accept the carrots, of course, without knowing how the carrots were cooked.
I wouldn’t want to have a database with matlab code showing how each individual carrot was cooked so that it would be impossible to download the files of 44,000 individual carrots and then spend the next two weeks finding out the 44,000 carrots were cut into 180,000 different pieces and then each piece was cooked at 145F for 10,000 of the carrots, at 155F for 10,000 of the carrots etc.
I would just want to know “how the carrots were cooked”. A nice simple explanation with perhaps a video showing how the carrot/temperatures were cooked.
There are good ways to cook carrots and there are ways that make them too soft or too hard or don’t convert enough of the carbohydrates into sugars, that “over-cook” the carrots/temperatures.
I wouldn’t want to buy cooked carrots without an endorsement from someone saying they taste really good and were cooked to perfection. I’d rather just have the RAW carrots and cook them myself the best way to make the right kind of carrots.
Stephen Rasey says:
January 21, 2013 at 6:04 pm
Agreed. There is a dearth of temperature records pre-1900, but that doesn’t stop BEST from making a global temperature profile back to 1750. “Caveat emptor” is no defense here. No amount of error bars is going to cover that sin. I don’t believe things just because it comes out of a computer, but there are too many that do. And too many willing to use what is convenient.
===================================================
Thanks Stephen, it’s been my experience that the ones believing what comes out of computers are the ones who know the least about what computers do and how they work. And, like you, I think coloring huge error bars is simply a form of dishonesty. Likely to themselves, but maybe to others.
Re: the Stephen Rasey comments regarding filtering of low frequency information by the scalpel process (following links to comments at WUWT and CA)
Here is my attempt at rephrasing these arguments in layman’s terms with some of my thoughts added:
The filtering of frequencies may perhaps be better understood by reference to wavelengths. High frequencies have short wavelengths and low frequencies have long wavelengths. Therefore, the scalpel method filters out all frequencies that have a wavelength longer than the longest fragment. The only frequencies that the scalpel does not filter at all are those with a wavelength shorter than the shortest fragment. The frequencies with wavelengths longer than the shortest fragment and shorter than the longest fragment are partially filtered. Given that the average fragment length is about 12 years (IIRC), trends of longer than that are effectively filtered out. Any trend longer than that in the reconstruction is apparently a result of modeling and does not come directly from the data (no matter how voluminous), since low frequencies are filtered out.
High frequencies are also filtered out. The use of monthly averages (actually smooths) effectively filters out all frequencies with wavelengths shorter than one month. Deseasonalizing the data (using models) partially filters out all frequencies with wavelengths shorter than a year and longer than a month. Given that some fragments may be only a few years long, it would seem that almost all frequencies are at least partially filtered out. That would leave precious few frequencies that haven’t been filtered at all to validate all the models used by BEST.
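As a rough check on the wavelength argument above, the sketch below lists the periods that a discrete Fourier transform can represent within one fragment taken in isolation. The function name and the 144-month (12-year) fragment are illustrative assumptions, not anything taken from the BEST code. It only states the standard sampling limits and does not model how fragments are recombined.

```python
def dft_periods_months(n_months):
    """Periods (in months) of the nonzero DFT bins, up to the Nyquist bin,
    for a single fragment of n_months monthly samples."""
    # Bin k corresponds to k full cycles across the fragment.
    return [n_months / k for k in range(1, n_months // 2 + 1)]


# Hypothetical fragment: 12 years of monthly averages.
periods = dft_periods_months(144)
longest = periods[0]    # one cycle across the whole fragment
shortest = periods[-1]  # Nyquist limit: two samples per cycle

print(f"longest resolvable period:  {longest:.0f} months")
print(f"shortest resolvable period: {shortest:.0f} months")
```

Under these assumptions a single 12-year monthly fragment resolves no period longer than 12 years and none shorter than 2 months. Whether longer periods can be recovered by combining overlapping fragments is a separate question.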
How is BEST not effectively a sort of statistical homeopathy? Homeopathy, as I understand it, is basically taking a supposedly active ingredient and diluting it multiple times until almost none of it is left and then marketing it as a cure for various ailments. Doesn’t BEST dilute the information in the mountain of data that they process by filtering, at least partially, almost all of the frequencies inherent in the data and replacing them with modeled data? Isn’t BEST essentially models most of the way down, as most of the information in the data is filtered out?
P.S. I believe that various different individuals post comments at WUWT under the handle Phil.
Dear oldfossil. RE: “I’m prepared to accept whatever result they produce, even if it proves my premise wrong.”
“Blind Acceptance” of any result before all of the necessary Peer Review is done is a mistake. Even if it’s a result you expect. The whole reason for Peer Review is so others can try to duplicate and verify, or disprove, the author’s methods and theory.
Steven Mosher said:
“The good news is you can take every station in CRU, delete it, and you still have 32,000 stations. And of course the answer doesnt change.”
“But if you dont like GHCN monthly data you can delete those 7000 stations and you are left with 29000 stations. And the answer doesnt change.”
I’ve seen Steven Mosher say similar things before, and it struck me as somewhat strange, or at least it would give me pause for thought. If the answer doesn’t change with different inputs, then could the algorithm be somewhat insensitive to input data? Almost as though the answer is being driven by any smoothing algorithms.
If the answer doesn’t change with inputs, it’s obvious that the algorithms are driving the answer to show “man made” global warming.
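One way to think about this question is a toy network in which every station records the same underlying trend plus its own independent noise. The sketch below is purely synthetic: the trend, noise level, and station count are assumptions, and a plain average stands in for whatever averaging method is used in practice. It is not the BEST algorithm.

```python
import random


def ols_slope(values):
    """Ordinary least-squares slope of values against their index (years)."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def network_mean(stations):
    """Plain average across stations for each year."""
    n_years = len(stations[0])
    return [sum(s[t] for s in stations) / len(stations) for t in range(n_years)]


rng = random.Random(42)
TRUE_TREND = 0.01            # assumed trend, degrees per year
N_YEARS, N_STATIONS = 100, 2000  # assumed network size

# Every synthetic station sees the same trend plus independent noise.
stations = [
    [TRUE_TREND * t + rng.gauss(0.0, 1.0) for t in range(N_YEARS)]
    for _ in range(N_STATIONS)
]

slope_all = ols_slope(network_mean(stations))
slope_half = ols_slope(network_mean(stations[: N_STATIONS // 2]))

print(f"trend, all stations:  {slope_all:.4f}")
print(f"trend, half deleted:  {slope_half:.4f}")
```

In this toy setup the answer barely changes when half the stations are deleted, simply because the stations are redundant samples of one signal. That does not by itself show whether a real algorithm is or is not insensitive to its inputs; it only shows that an unchanged answer is what redundancy alone would produce.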
Phil says:
January 21, 2013 at 8:36 pm
Phil, that’s an interesting claim. I’ve given it some thought, and I’m not sure it’s true. Here’s my thought experiment on the matter. Suppose you have a regular cyclical signal with a period of let’s say 30 years. Suppose further that you have 150 years of that data.
Now, let’s subject it to the scalpel by copying chunks of it of random lengths, with random starting points. Some will be longer, some will be shorter, let’s assume an average fragment length of 12 years and a maximum length of 20 years. We make a number of such random copies of the original signal, say a thousand of them.
Here’s the question. From those chopped up fragments, could you reconstruct the original signal?
For me the answer is sure, no problem. As long as there is some overlap between the fragments, we can reconstruct the original signal exactly, 100% correctly.
I bring this up to show that the mere fact that we cut the data into short, 12-year fragments does NOT, as you claim, mean that “trends of longer than that are effectively filtered out”. It also does NOT mean that “the scalpel method filters out all frequencies that have a wavelength longer than the longest fragment”. They are not filtered out in the slightest. Provided there is overlap, we can reconstruct all of the variations from the fragments … and in the real world with the number of temperature station records, there is always an overlap between stations.
Now, how well we can reconstruct the full signal from the individual fragments, and the best method to do that, that’s a separate question.
But your claim, that “Any trend longer than [12 years] in the reconstruction is apparently a result of modeling”, that’s not true. Long-period trends have noise added to them by the scalpel technique, but the scalpel technique does not lose the long-period information as you claim. The long-term trends stay in the data, they are not removed as you think.
w.
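The thought experiment above can be tried numerically. The sketch below is a minimal, noise-free version under stated assumptions: yearly samples, a 30-year sine wave over 150 years, a thousand fragments of up to 20 years (about 12 on average), and, as an extra handicap that is not in the original comment, an unknown constant offset added to each fragment. Fragments are rejoined by averaging year-to-year differences and summing them, a simple first-difference approach chosen for illustration. It is not the kriging-based method BEST uses, and all function names are hypothetical.

```python
import math
import random


def make_signal(n_years=150, period=30.0):
    """Clean cyclical signal, one sample per year."""
    return [math.sin(2 * math.pi * t / period) for t in range(n_years)]


def cut_fragments(signal, n_fragments, rng, min_len=4, max_len=20):
    """Copy random chunks of the signal, each shifted by an unknown offset."""
    n = len(signal)
    fragments = []
    for _ in range(n_fragments):
        length = rng.randint(min_len, max_len)
        start = rng.randint(-(length - 1), n - 1)
        lo, hi = max(start, 0), min(start + length, n)  # clip to the record
        offset = rng.uniform(-5.0, 5.0)  # assumed per-fragment baseline shift
        fragments.append((lo, [signal[t] + offset for t in range(lo, hi)]))
    return fragments


def reconstruct(fragments, n_years):
    """Average year-to-year differences across fragments, then sum them."""
    diff_sum = [0.0] * (n_years - 1)
    diff_cnt = [0] * (n_years - 1)
    for start, values in fragments:
        for i in range(len(values) - 1):
            diff_sum[start + i] += values[i + 1] - values[i]
            diff_cnt[start + i] += 1
    if min(diff_cnt) == 0:
        raise ValueError("a year-to-year step is not covered by any fragment")
    recon = [0.0]
    for s, c in zip(diff_sum, diff_cnt):
        recon.append(recon[-1] + s / c)
    return recon


signal = make_signal()
fragments = cut_fragments(signal, n_fragments=1000, rng=random.Random(0))
recon = reconstruct(fragments, len(signal))

# Compare shapes relative to the first year, since the baseline is unknown.
max_error = max(
    abs((r - recon[0]) - (s - signal[0])) for r, s in zip(recon, signal)
)
print(f"max reconstruction error: {max_error:.2e}")
```

With clean data the reconstruction matches the original shape to within floating-point rounding, up to a constant, even though no fragment is as long as the 30-year cycle. It says nothing about how well this holds for real, error-riddled records, where errors in the year-to-year differences can accumulate along the sum.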
Willis,
I agree with the logic in your thought experiment, but it is predicated on an important assumption (which I have highlighted in bold):
Two issues: First, the problem is that there has to be some credible basis on which to reconstruct the original signal, such as the overlap you mention. The problem is that, as I understand the scalpel (and I may not understand it correctly), by definition there isn’t going to be an overlap between fragments of the record of a given station. IIRC, the scalpel is applied after combining multiple station records (where such exist), so there would be no overlap to help stitch the fragments back together. Consequently, you would have to go back to using neighboring stations (or something similar) that, hopefully, aren’t also chopped across the discontinuity that the scalpel has found to help stitch the fragments back together. That implies a model and you are back again to the same issue of just how far you can go to help you stitch the fragments back together. Are you going to use stations as far as 1200 km away from the station in question to do so?
In short, after chopping up the data into 180,000 or so fragments, these fragments need to be stitched back together again using some sort of mathematical technique that should properly be called a model and that should ideally be individually validated. After you stitch these fragments back together, one has to ask whether it is the data talking or the seamstress.
Second, it is easier to imagine stitching fragments back together when considering idealized examples, such as pristine synthetic data. When using real-life, error-riddled data, I would submit that putting the fragments back together is not trivial.
As for evidence of the lower frequencies being filtered out, I would point you to this comment over at CA (and it is worthwhile reading the whole thread):
Stephen Rasey replies:
Cheers.
Maybe this is the reason that BEST could not get into a legitimate journal.
re: amirlach says: January 21, 2013 at 8:42 pm
While I agree with your warning about the problems with blind acceptance – last I knew peer review was strictly to catch flaws in the scientific method used in each individual experiment (along with minor issues such as typos, ambiguous phrases, etc. that need correction). Peer review has essentially nothing to do with replication, verification, and validation (other than ensuring the scientific method was followed, e.g., that the methods are clearly spelled out such that other scientists can replicate if desired, and that there aren’t any gross errors in the paper).
Replication is done in separate experiments by other scientists with no conflict of interest, by identically repeating the exact experiment initially conducted. Replication is done far too infrequently, and we see the resulting problems with retracted papers, fraud, etc. that later comes to light. But what scientist is able to get funding to replicate another scientist’s work – and wants to replicate someone else’s work rather than try to come up with their own new findings? It’s a problem and the lack of sufficient follow up of this nature has been seen time and again. Validation is when other scientists attack an initial experiment’s hypothesis using different experimental methods to see if the hypothesis either holds up or fails. Verification is testing the experiment to ensure that all possible confounding factors are controlled for, that the statistics are sound, etc.
Steven Mosher,
I write this in all sincerity: you need to distance yourself from this BEST “peer review” scam, like Judith Curry already has. It is as phony as a three dollar bill.
Thanks, Phil. You are correct that the scalpel technique leaves no overlap between adjoining sections of the same station. And you are right that the variations in other stations are used to fill in the overlap.
However, you seem to think that we are trying to reconstruct an individual station. We’re not. We’re looking for larger averages … and those larger averages perforce contain the overlaps we need. However, the scalpel doesn’t use “neighboring stations” to put them back together. Instead, it uses kriging to reconstruct the original temperature field.
Now, as you point out, unavoidably there will be noise added by the process. And, as you point out, there is a mathematical model involved … but this is true no matter how you combine individual stations into any kind of overall average.
I looked at the two Fourier analyses you posted, one of the BEST land data, and the other of HadCRUT data. Since these are totally different datasets, I’m not surprised in the least that they have different Fourier analyses … were you actually expecting different datasets, one covering two and a half times the area of the other, one of which doesn’t contain the ocean, to have the same Fourier transform?
Because I sure don’t …
Here’s the thing, Phil. There’s no “right” way to do the task of averaging the planetary temperatures. Every way that you do it will have pluses and minuses. Every way that you do it involves some kind of mathematical model. But if you wish to show that the “scalpel” method is inferior to the others, you’ll have to do more than claim it. Kriging can be demonstrated mathematically to be the best technique for joining up spatially and temporally disparate data, so you are fighting an uphill battle.
Look, Phil, the scalpel method has problems, like every other method you might use. But that doesn’t make it inferior to the others as you seem to think.
w.
@Phil.
Sir, you have a way with words. Nice concepts, these:
8:36 pm: How is BEST not effectively a sort of statistical homeopathy? Homeopathy, as I understand it, is basically taking a supposedly active ingredient and diluting it multiple times until almost none of it is left and then marketing it as a cure for various ailments.
3:04 am: In short, after chopping up the data into 180,000 or so fragments, these fragments need to be stitched back together again using some sort of mathematical technique that should properly be called a model and that should ideally be individually validated. After you stitch these fragments back together, one has to ask whether it is the data talking or the seamstress.
I want to modify a key statement I made up at 5:46 pm.
By the theorems of Fourier Analysis, cutting long temperature records into shorter lengths destroys the climate-science-critical lowest frequencies in the original data. BEST attenuates and throws away the climate signal, then spatially homogenizes and krigs the remaining weather noise to deduce a climate signal.
From a climate signal “chain-of-custody” frame of reference, the process is daft.
Stephen Rasey says:
January 22, 2013 at 10:46 am
Thanks, guys. The problem is that simply claiming that the scalpel method “throws away the climate signal” doesn’t establish anything. Nor does describing it as “statistical homeopathy”, although indeed it is a lovely turn of phrase. The scalpel method is described here. It would be useful if you could quote some part of that exposition and demonstrate why it is wrong, rather than simply claiming it is wrong.
Finally, in the end all methods of averaging a bunch of temporally and spatially scattered temperature records are “wrong”, in that there is no agreed upon “right” way to do it. Even if you don’t do anything to the data you have to combine 38,000 records or something. Rather than asking “is it wrong”, it is preferable to ask “is it better than the other methods?”. The BEST folks have done their homework in that regard, as reported here. It outperforms both the CRU and the GISS methods. So if you think the scalpel method is wrong, you’ll have to point us to something better. It is indeed wrong, all global temperature averages are wrong … but the scalpel method is less wrong than the other methods we’ve invented for the purpose.
Look, it’s no secret that I’m absolutely no fan of Richard Muller, I don’t like a number of things he’s done. And Steven Mosher and I, despite being friends, often butt heads on a wide range of topics. And the journal it finally was published in is totally unknown.
But all of that is scientifically meaningless. It is totally separate and distinct from their math and logic and methods. Those stand or fall on their own, and to date, as near as I can tell, they stand up better than the competition. Always more to learn, anything could be falsified at any time, when the facts change I change my opinion … but that’s how I see it today.
All the best,
w.