After almost two years and some false starts, BEST now has one paper that has finally passed peer review. The text below is from the email release sent late Saturday. It was previously submitted to JGR Atmospheres according to their July 8th draft last year, but appears to have been rejected as they now indicate it has been published in Geoinformatics and Geostatistics, a journal I’ve not heard of until now.
(Added note: commenter Michael D. Smith points out it is Volume 1, issue 1, so this appears to be a brand new journal. Also troubling, on their GIGS journal home page, the link to the PDF of their Journal Flier gives only a single page, the cover art. With such a lack of description in the front and center CV, one wonders how good this journal is.)
Also notable, Dr. Judith Curry’s name is not on this paper, though she gets a mention in the acknowledgements (along with Mosher and Zeke). I have not done any detailed analysis yet of this paper, as this is simply an announcement of its existence. – Anthony
===============================================================
Berkeley Earth has today released a new set of materials, including gridded and more recent data, new analysis in the form of a series of short “memos”, and new and updated video animations of global warming. We are also pleased that the Berkeley Earth Results paper, “A New Estimate of the Average Earth Surface Land Temperature Spanning 1753 to 2011” has now been published by GIGS and is publicly available here: http://berkeleyearth.org/papers/.
The data update includes more recent data (through August 2012), gridded data, and data for States and Provinces. You can access the data here: http://berkeleyearth.org/data/.
The set of memos includes:
- Two analyses of Hansen’s recent paper “Perception of Climate Change”
- A comparison of Berkeley Earth, NASA GISS, and Hadley CRU averaging techniques on ideal synthetic data
- Visualizing of Berkeley Earth, NASA GISS, and Hadley CRU averaging techniques
and are available here: http://berkeleyearth.org/available-resources/
==============================================================
A New Estimate of the Average Earth Surface Land Temperature Spanning 1753 to 2011
Abstract
We report an estimate of the Earth’s average land surface
temperature for the period 1753 to 2011. To address issues
of potential station selection bias, we used a larger sampling of
stations than had prior studies. For the period post 1880, our
estimate is similar to those previously reported by other groups,
although we report smaller uncertainties. The land temperature rise
from the 1950s decade to the 2000s decade is 0.90 ± 0.05°C (95%
confidence). Both maximum and minimum daily temperatures have
increased during the last century. Diurnal variations decreased
from 1900 to 1987, and then increased; this increase is significant
but not understood. The period of 1753 to 1850 is marked by
sudden drops in land surface temperature that are coincident
with known volcanism; the response function is approximately
1.5 ± 0.5°C per 100 Tg of atmospheric sulfate. This volcanism,
combined with a simple proxy for anthropogenic effects (logarithm
of the CO2 concentration), reproduces much of the variation in
the land surface temperature record; the fit is not improved by the
addition of a solar forcing term. Thus, for this very simple model,
solar forcing does not appear to contribute to the observed global
warming of the past 250 years; the entire change can be modeled
by a sum of volcanism and a single anthropogenic proxy. The
residual variations include interannual and multi-decadal variability
very similar to that of the Atlantic Multidecadal Oscillation (AMO).
Full paper here: http://www.scitechnol.com/GIGS/GIGS-1-101.pdf
@Willis 10:32 am There’s no “right” way to do the task of averaging the planetary temperatures.
True. But there are certainly ways more wrong than other ways.
The goal is not to take the planetary temperature.
The goal is to determine the long term change in the planetary temperature.
If BEST honored the actual values of the recorded temperatures, I wouldn’t be raising this issue. BEST, through the use of the scalpel, shorter record lengths, and homogenization and kriging, is honoring the fitted slope of the segments, the relative changes, more than the actual temperatures. By doing that, BEST is turning Low-Pass Temperature records into Band-Pass relative temperature segments.
With Band-Pass signals, you necessarily get instrument drift over time without some other data to provide the low frequency control. I suspect BEST has instrument drift as a direct consequence of throwing away the low frequencies and giving low priority to actual temperatures.
In the petroleum seismic processing (CGG PDF 1 MB link) realm, the recorded signal is a band-pass time-sampled series of sound energy ground accelerations or water pressure. In the seismic model, the Signal = Convolution ( Source, Reflectivity profile). Once you deconvolve the source from the signal, you are left with a band-pass Reflectivity profile. Reflectivity is the difference in Impedance (velocity * density) of layers of the earth. It is possible to integrate the reflectivity profile to get an Impedance profile, but because the original signal is band-limited, there is great drift, accumulating error, in that integrated profile. The seismic industry gets around that drift problem by superimposing a separate low frequency, low resolution information source, the stacking or migration velocity profile estimated in the course of removing Source-Receiver Offset differences and migrating events into place.
In summary, exploration seismic processing will integrate the band-pass acceleration-difference signals to get high-frequency impedance differences, but they use a separate low-frequency data source to control drift to get usable inverted impedance (velocity with an assumed density) profiles. Of course, the quality of the low frequency control governs the quality of the final product.
In a similar vein, BEST integrates scalpeled band-pass short term temperature difference profiles, to estimate total temperature differences over a time-span. Unless BEST has a separate source to provide low-frequency data to control drift, then BEST’s integrated temperature profile will contain drift indistinguishable from a climate signal.
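The general point about drift when integrating noisy differences can be sketched numerically. This is only an illustration under stated assumptions: each difference carries independent Gaussian error, and the function name `integrated_error`, the noise level, and the series lengths are all made up for the example. It is not a model of the actual BEST algorithm.

```python
import numpy as np

def integrated_error(n_steps, noise_sd, n_trials, seed=0):
    """Spread of the end-point error after summing n_steps noisy differences."""
    rng = np.random.default_rng(seed)
    # Assumption: every difference has its own independent error.
    errors = rng.normal(0.0, noise_sd, size=(n_trials, n_steps))
    # Integrating the differences also integrates their errors (a random walk).
    return np.cumsum(errors, axis=1)[:, -1].std()

short_run = integrated_error(n_steps=100, noise_sd=0.1, n_trials=2000)
long_run = integrated_error(n_steps=1600, noise_sd=0.1, n_trials=2000)
print(short_run, long_run)   # roughly noise_sd * sqrt(n_steps) in each case
```

With nothing to pin down the low frequencies, the accumulated error grows roughly as the square root of the number of integrated steps. Whether BEST's averaging actually behaves like this kind of unconstrained integration is exactly the point in dispute in this thread.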
Low frequency content is the “whole ball game” when it comes to a climate signal. High frequency content amounts to weather and seasons. From what I have seen since April 2011, BEST is decimating the low frequency content of the temperature records via the scalpel and not returning that information back into the final product. The low frequency content in the final product must be treated as drift until the low frequency control is defined.
@Willis 10:46 am It would be useful if you could quote some part of that exposition and demonstrate why it is wrong, rather than simply claiming it is wrong.
I cannot quote what is not there.
The problem with the BEST process is what is missing: the low frequency control.
from the Rohde paper: http://berkeleyearth.org/pdf/robert-rohde-memo.pdf
Bottom of page1: By limiting the present discussion to “error-free” data, we can examine the efficacy of the different averaging techniques separate from the consideration of quality control and homogenization issues.
Page 2: The dataset includes 7280 weather stations and will provide a set of times and locations at which the climate model field can be sampled in order to produce synthetic data with a realistic spatial and temporal structure.
Page 4: As the simulated data is intrinsically free from any noise or bias, we have omitted any parts of the respective algorithms associated with quality control or homogenization….. To further reduce differences, we used the true seasonality in the GCM field [something not known in reality] as the basis for removing seasonality from each simulated time series so that slight differences in the handling of seasonality in the three algorithms would not affect our conclusions.
From the above, I have to conclude that the scalpel was not used on any time series in this test. From my Fourier point of view, this was a successful test of the kriging on the full spectrum of the data available with the low frequencies preserved.
Stephen Rasey says:
January 22, 2013 at 12:57 pm
Damn – I’ll have to try and re-read that. But I think I can see your point that the test method did not employ the scalpel to test if re-integration of the data actually retained the LF signal?
But also, did I read that right – that the test was not including any data that was adjusted/homogenised – in which case, as error free data, what was the point?
Stephen Rasey says:
January 22, 2013 at 12:05 pm
Thanks, Stephen. You keep making that same claim over and over, that low frequency information is lost. As I said above, you need to demonstrate it rather than claiming it. Repeating your claim does nothing.
w.
Stephen, thanks for hanging in there. Let me see if I can explain the issue. At some points in some temperature records, there are abrupt discontinuities. The record moseys along for thirty years, then jumps a degree or so … but not one of its neighbors does the same thing.
Typically these jumps are caused by things like a change in thermometer location, a change in thermometers, or a change in surroundings. That’s why the neighbors don’t show the jumps.
Now, there’s a case for just leaving those jumps in the record. And it is always worth noting what the raw data looks like before you touch it. The problem arises because more of those abrupt jumps in the record are upwards jumps than downwards jumps (although both exist). So if you just use raw data, you end up with an artificial long-term trend of unknown size and sign. No bueno.
Now, there’s a couple of ways to deal with that challenge. One is to see if you can calculate, from looking at the nearest neighbors, how big the erroneous jump was, and then remove that amount from the subsequent part of that particular temperature record (or add it to the earlier part). That’s how GHCN does it.
Me, I think it is preferable to just acknowledge that the data before and the data after the jump are actually two different datasets. If the thermometer moves across the airport, or if they change to an electronic sensor in a new location, you are not measuring what you were measuring before. It is a brand new record, starting with when it moved across the airport.
To me, that is the underlying logic behind cutting the records—you are just acknowledging the reality on the ground, which is that the measurements pre- and post-jump are not measuring the same thing.
Does this lose low-frequency information as you say? Yes and no. The real answer is, the low-frequency information was never there, since the two records (pre- and post-cut) were measuring different things … and not only that, because they were incorrectly combined, they have a bogus jump in the middle. Think about what that does to your analysis of long-term cycles.
So there is loss of information, but it is information that we have determined to be suspect. Since e.g. the underlying trend is not correct in the raw data, what exactly are you losing by cutting the records other than the incorrect part of the trend?
As to the issue of recombining them, remember that they are not being recombined, because they are different records. Instead, they are simply one more record added into the global average, and that is no different whether you cut or don’t cut the records into shorter sections.
I’m happy to answer questions, although I’m likely not the best person to do so …
All the best,
w.
Oh, yeah. Steven Mosher, please feel free to step in and explain if you think I’ve got something wrong.
w.
@Willis: 1:59 pm If the thermometer moves across the airport, or if they change to an electronic sensor in a new location, you are not measuring what you were measuring before. It is a brand new record, starting with when it moved across the airport.
To me, that is the underlying logic behind cutting the records—you are just acknowledging the reality on the ground, which is that the measurements pre- and post-jump are not measuring the same thing.
First, you are cherry picking with your example. For the most part, we don’t have the metadata to identify why there was a change.
Second, do we get a truer sense of the climate if we slice the climate record every time we paint the Stevenson Screen or if we preserve the entire record? I argue the latter, one record, is FAR CLOSER to the real trend.
Here is what I think is a realistic demonstration:
The Rohde paper seems to prove that the BEST process can work with Error-free Synthetic data and do a best-in-class job of homogenization. Fine. Do the analysis once with REAL data, UN-Sliced. If you get the same answer as with tens of thousands of cuts, then that is a valid demonstration that slicing doesn’t matter…. (but then why do it?)
If you get a significantly different long term trend with the uncut data… WHY might that be? And wouldn’t that be interesting? Is it because of real changes to the temp record that must be corrected? Or is it because we capture UHI effects and Stevenson Screen paintings multiple times in saw-toothed spliced records? At least let us capture that uncertainty.
@Willis 1:27pm, you need to demonstrate it rather than claiming it. Repeating your claim does nothing.
How can I practically demonstrate other than through established mathematical principles and theorems? I think I have done so to the best of my ability with what’s available.
Where have I blundered? Where did I get the math wrong? Are lowest frequencies important? Are the lowest original frequencies in the bit bucket after you slice a record? If so, has BEST shown where they come back or are preserved elsewhere? If so, where and how?
If you have been confronted with claims of a Perpetuum Mobile do you go build a demonstration, or do you argue from Thermodynamics that something is not right?
(I’m done for the day. – I’ll reply 1/23.
Thank you for your attention.)
Stephen Rasey says:
January 22, 2013 at 2:54 pm
Well, allow me to assist you, then. What you do is generate say 1,000 synthetic temperature series, with autocorrelation equal to that of temperature data.
Then you put artificial jumps into them, at various places, use the scalpel to chop them up, and see how closely you can reconstruct the underlying signal.
The problem is, we know that there are jumps in the data. Unless you think we should ignore them (in which case we have nothing to discuss), how do you plan to rectify that?
Where you have blundered is confusing the maximum frequency resolvable in an individual record, and the maximum frequency resolvable in a group of records. Let’s take one of the simplest ways of finding the average anomaly, using the first differences. If we have 38,000 overlapping records of a host of various lengths, we can take the first differences (the monthly changes in the case of temperature records), average them, and then reconstruct the average signal.
Now, the maximum signal that we can resolve by that method is not limited to the average length of the individual segments, or even the longest segments. I showed that above, and I believe you agreed. Instead, it is limited by the total length of the dataset, with an associated error estimate at each step.
When you average by that method, using the scalpel on the data has very little effect on the average. It just reduces the N for that timestep by one station. On the next timestep you pick up the new station, and the beat goes on. Unless the number of cuts in that particular month is large compared to N, the error will be small.
Think about it with sine waves. Suppose you took 38,000 sine waves, and averaged them by first differences. Then you cut each one at random somewhere in the data. Could you reconstruct the original average by averaging the cut data in the same way? Sure, with 38,000 sine waves, the cuts won’t be much noticed.
Experiment with some synthetic data, and it will be clearer. Like I say, you can use sine waves for simplicity.
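For anyone who wants to try that experiment, here is a minimal sketch. It assumes a single shared slow sine wave plus station offsets and noise, cuts every record once at random, and averages by first differences. The station count, record length, and noise levels are hypothetical, and this is the simple first-difference method described above, not the actual BEST averaging.

```python
import numpy as np

rng = np.random.default_rng(42)
n_stations, n_months = 1000, 600          # hypothetical sizes, far fewer than 38,000

t = np.arange(n_months)
common = np.sin(2 * np.pi * t / n_months)               # one cycle spanning the whole dataset
offsets = rng.normal(0.0, 5.0, size=(n_stations, 1))    # station-specific baselines
noise = rng.normal(0.0, 0.5, size=(n_stations, n_months))
records = common + offsets + noise

def reconstruct(diffs):
    """Average the first differences across stations, then integrate them."""
    mean_diff = np.nanmean(diffs, axis=0)
    return np.concatenate([[0.0], np.cumsum(mean_diff)])

diffs = np.diff(records, axis=1)          # shape (n_stations, n_months - 1)
uncut = reconstruct(diffs)

# "Scalpel": cut each record once at a random month by discarding
# the single first difference that spans the cut.
cut_diffs = diffs.copy()
cut_at = rng.integers(0, n_months - 1, size=n_stations)
cut_diffs[np.arange(n_stations), cut_at] = np.nan
cut = reconstruct(cut_diffs)

target = common - common[0]
rms_uncut = np.sqrt(np.mean((uncut - target) ** 2))
rms_cut = np.sqrt(np.mean((cut - target) ** 2))
print(rms_uncut, rms_cut)
```

Note what this does and does not test. The synthetic records contain no spurious jumps and no instrument drift, so it only checks whether cutting by itself discards a shared long-period signal under first-difference averaging. It says nothing about how drift inside the segments would be handled.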
Finally, you ask above, should we call it a new record every time we paint the Stevenson screen? I say if it makes a statistically significant difference so it is mathematically detectable as being spurious, sure. Otherwise, if you paint the screen every twenty years, you artificially introduce a spurious twenty-year cycle into the data … which is the exact problem I’m pointing at. Not cutting the data leaves spurious cycles in it, long-term cycles. If a sixty-year record has a spurious one-degree jump in the middle, it appears to contain a sixty year cycle plus a trend … but in reality no such cycle or trend exists.
So yes, we are removing information by cutting, but it is spurious information, incorrect information. If a thermometer is moved, the difference between the last of the old record and the first of the new record is WRONG. That’s the problem. I suggest removing it, because leaving it in creates spurious trends and cycles, particularly at longer time periods. YMMV.
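The sixty-year example can be put into numbers. The sketch below assumes a perfectly flat true temperature, monthly data, and a single one-degree step exactly in the middle; all of those are illustrative choices.

```python
import numpy as np

months = np.arange(720)                           # sixty years, monthly (assumed)
years = months / 12.0
record = np.where(months >= 360, 1.0, 0.0)        # flat record with a 1 deg C step at mid-point

# Least-squares trend of the uncut record, in deg C per decade.
uncut_trend = 10 * np.polyfit(years, record, 1)[0]

# Trend of each half when the record is cut at the step.
first_half_trend = 10 * np.polyfit(years[:360], record[:360], 1)[0]
second_half_trend = 10 * np.polyfit(years[360:], record[360:], 1)[0]

print(uncut_trend, first_half_trend, second_half_trend)
```

For a step of height h in the middle of a record of length T, the fitted slope is 1.5 h / T, so here the uncut record shows about 0.25 deg C per decade of trend that is not in the underlying temperature, while each half on its own shows none.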
w.
Stephen Rasey says:
January 22, 2013 at 2:30 pm
Why do we need the metadata? That is what the comparison with adjacent stations is for, to see if there is an anomalous jump. We don’t need to know what the jump is from. We just need to be able to determine that in fact it is spurious.
Also, I gave three examples (change in location, change in instrumentation, change in surroundings), not one, and they are the three most likely reasons for spurious jumps in the temperature data. How is that “cherry picking”?
w.
PS—You may not be aware of this, but an accusation of cherry picking is an accusation of deliberate misrepresentation of the data, and I don’t brook a man calling me a liar, no matter how politely. It’s a serious accusation, not one to toss around.
Willis Eschenbach says:
January 22, 2013 at 3:43 pm
You say we need to know if a shift/jump is spurious – which is of course what you see from a long term series – it should ‘jump’ out at you from a simple plot. Contrast that, say, to something like slow electronic sensor drift – which wouldn’t jump out at you without the long term plot and some curious ‘human’ intervention or double checking. At this moment in time, I fail to grasp how either would/could be ‘caught’ by the BEST method of cutting and splicing. Indeed, if a fault in the data is present, it appears to me that it could be missed altogether, or artificially amplified by the procedure – depending on the data treatment. For example, does the algorithm say that if you have five stations, and 3 show warming, but 2 show cooling – does it assign the weighting in favour of the 3 over the 2? Or does it take the next adjacent five stations to those to make a decision, etc, etc.
This is all way above my level of statistical understanding – as an engineer and geologist, I need to grasp the workings and see if they make sense. As a former (half decent) chess player, that’s how I’m thinking about it – kind of forward move planning, the consequences of consequences, etc – so, to my mind, if we weight some station, how is that treated with respect to the ‘next’ adjacent stations data treatment? hope that makes sense……does it create in effect some kind of feedback loop?
Kev-in-Uk says:
January 22, 2013 at 4:35 pm
It wouldn’t be caught by the scalpel method, or by the GHCN method either, if the change is slow … but then I know of no method to identify it if it is slow.
That kind of slow change, however, is a separate confounding problem. You can only do as much as you can do.
w.
Kev-in-Uk says:
January 22, 2013 at 4:35 pm
The process runs like this. From the nearest stations, you construct an average temperature anomaly. Then you compare the Station X anomaly to that local average temperature anomaly, but not comparing the trend.
Instead, you first subtract the local average temperature anomaly from Station X anomaly to give a series of differences. Then you use a mathematical algorithm that sweeps from one end of the difference data to the other, and compares the variance of the left hand side to the variance of the right hand side. If there is a discontinuity, it shows up as a jump in the graph when the algorithm sweeps past the jump. This procedure has nothing to do with the trends, it just reveals discontinuities. What you do with them is then your choice.
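A rough sketch of that kind of sweep is below. Two things in it are assumptions rather than anything taken from BEST or GHCN: the helper name `find_jump` is hypothetical, and the statistic compares the means of the left and right sides (a Welch-style t score) where the description above speaks of variances. The neighbour count, noise levels, and jump size are made up.

```python
import numpy as np

def find_jump(diff_series, min_seg=12):
    """Return (index, score) of the most likely step change in diff_series."""
    x = np.asarray(diff_series, dtype=float)
    best_idx, best_score = None, 0.0
    for k in range(min_seg, len(x) - min_seg):
        left, right = x[:k], x[k:]
        # Welch-style t score for a difference in means at split point k.
        se = np.sqrt(left.var(ddof=1) / len(left) + right.var(ddof=1) / len(right))
        score = abs(right.mean() - left.mean()) / se
        if score > best_score:
            best_idx, best_score = k, score
    return best_idx, best_score

rng = np.random.default_rng(1)
n = 360                                              # 30 years of monthly data (assumed)
regional = np.cumsum(rng.normal(0, 0.05, n))         # hypothetical shared regional anomaly
neighbours = regional + rng.normal(0, 0.3, (8, n))   # 8 hypothetical neighbour stations
station_x = regional + rng.normal(0, 0.3, n)
station_x[200:] += 1.0                               # artificial 1 deg C jump at month 200

# Subtracting the local average removes the shared signal, trend included.
difference = station_x - neighbours.mean(axis=0)
idx, score = find_jump(difference)
print(idx, score)
```

Because the regional signal cancels in the difference series, the sweep responds to the discontinuity and not to any trend the stations share.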
w.
@Willis 3:43 pm. an accusation of cherry picking is an accusation of deliberate misrepresentation of the data, and I don’t brook a man calling me a liar, no matter how politely.
No disrespect intended. I didn’t call you a liar, nor use any synonym for one. In my writings on WUWT I think I have shown you great respect.
In my clumsy way, I was pointing out that you gave three examples of station changes from a much larger population of possibilities. No fair minded person would ever expect conversational examples were required to be random samples of the whole.
I cherry-pick all the time in the testing of theories and processes. When you test boundary conditions, by design you do not pick the data fairly or randomly.
Once again, no disrespect was intended. I apologize.
I was curious.
Spine Journal seems to be the owner link
Predatory publishers are corrupting open access
Is Omicsonline a scientific scam? Yes it is.
Nowhere can I find any information on who are the owners, who is the CEO or who are the Board of Directors.
Stephen Rasey says:
January 23, 2013 at 7:56 am
Thank you kindly, sir, you are a gentleman, the issue is forgotten.
For the future, I would caution you about accusing people of “cherry-picking”. Perhaps you use it to mean selecting data for a particular purpose. However, that is far from the common meaning. Usually, it is intended as a slur, certainly with huge negative connotations. It implies that you are carefully selecting data, not to test limits as you suggest, but to fool people about the results of a particular analysis.
For example, “cherry picking” is listed among Wikipedia’s list of logical fallacies. Here’s a definition:
A person who is cherry picking is lying about what is going on on the ground, they are specially selecting their data to support their case while claiming (or implying) that they have taken a fair selection of data. In other words, it is a lie, they have lied (by commission or omission) about the data selection process.
That’s why I took offense. Cherry picking is definitely not something an honest scientist would do. So you accused me of not being an honest scientist.
However, that’s all for the future, as I said, you have cleared the slate entirely, we can move forwards.
I apologize for the interruption, and we now return you to your usual programming.
w.
PS—Let me point out again that
1) I gave, not just three examples of reasons for jumps in the data, but the three most common reasons. Together, they probably account for 95% of the jumps in temperature data.
2) I did not say that that was all of the possibilities, just the common ones.
So no, even by your definition I was not “cherry picking” in any sense of the word.
@Willis 3:21 pm:
This gets to the heart of the matter. I believe that many times, perhaps most, we should not create a new record even if the jump is obvious. I will follow up with an Excel chart treatment, but first I’ll describe what I think matters most.
Let me nominate the occasional “painting of a Stevenson screen” as a member of a class of events called recalibration of the temperature sensor. Other members of the class might be: weeding around the enclosure, replacement of degrading sensors, trimming of nearby trees, removal of a bird’s nest, other actions that might fall under the name “maintenance”.
A property of this “recalibration class” is that there is slow buildup of instrument drift, then quick, discontinuous offset to restore calibration. At time t=A0 the sensor is set up for use at a quality satisfactory for someone who signs the log. The station operates with some degree of human oversight. At time t=A9, a human schedules some maintenance (painting, weeding, trimming, sensor replacement, whatever). The maintenance is performed and at the time tools are packed up the station is ready to take measurements again at time t=B0. A recalibration event happened between A9 and B0. The station operates until time t=B9 when the human sees the need for more work. Tools up, work performed, tools down. t=C0 and we take measurements again. The intervals between A0-A9, B0-B9 are wide, likely many years. A9-B0 and B9-C0 recalibration events are very short, probably within a sample period. My key point is that A0-A9 and B0-B9 contain instrument drift as well as temperature record. A9-B0, B9-C0 are related to the drift estimation and correction.
At what points in the record are the temperatures most trustworthy? How can they be any other but the “tools down” points of A0, B0, C0? We go back to look at the temperature record and let BEST slice and dice with the scalpel. What if the scalpel detects a discontinuity at B0 and/or C0? Should it make a cut there? That all depends upon what happens next.
1. From everything I have read about the BEST process, it would slice the record into an A0-A9 segment and a B0-B9 segment and treat the A9-B0, B9-C0 displacement as a discontinuity and discard it. BEST will honor the A0-A9, B0-B9 trends and codify two episodes of instrument drift into real temperature trends. Not only will instrument drift and climate signal be inseparable, we have multiplied the drift in the overall record by discarding the correcting recalibration at the discontinuities.
There are at least two alternatives.
2. Don’t cut it at all. A0-A9-B0-B9-C0-C9-D0… as one full-life segment. Look at the saw and not the saw-teeth. The low frequency signal is still trustworthy. It is a good long term record with recalibration points and temporary drift error. Yes, there is an intermediate frequency that is spurious from the recalibration events. But the low frequency, the longest term trend is still solid A0-B0-C0-D0.. is as good as it gets. Intermediate points, B1, B2… B9, C1, contain some unknown degree of drift, but we have not baked it into the trend of each segment. We do not duplicate the drift. Over several segments the drift contribution to the trend will diminish instead of grow.
3. Gradually adjust each slice by the discontinuity estimated at the end of each slice. Cut the slices at A9-B0 and B9-C0. Using a trend algorithm, determine the absolute offset between the trends of A0-A9, B0-B9. That will be the estimated instrument drift at A9 and B9. We should remove the A9 instrument drift from all along the A0-A9 segment. A reasonable approach is to assume the drift had growth proportional with time. Subtract an amount of drift proportional to the time distance from the previous recalibration point. Adjust the A0-A9, B0-B9 segments such that A0′-A9′-B0′-B9′ no longer shows the recalibration discontinuity, and maintain that one long record and its lowest frequency.
When it comes to spurious events associated with recalibration events, I maintain #3 is superior to #2, which is superior to #1.
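The three options can be compared on an idealized saw-tooth. This is a sketch of the boundary case only: it assumes a constant true trend, drift that grows linearly in one direction, and a complete reset at every recalibration. All the rates and intervals are hypothetical, and option 3 is simplified to use the raw drop between adjacent samples as the drift estimate instead of a fitted trend offset.

```python
import numpy as np

months = np.arange(720)                    # 60 years of monthly readings (assumed)
years = months / 12.0
recal_months = 120                         # hypothetical: recalibration every 10 years
true_trend = 0.01                          # hypothetical real trend, deg C per year
drift_rate = 0.02                          # hypothetical instrument drift, deg C per year

segment_id = months // recal_months
since_recal = (months % recal_months) / 12.0
record = true_trend * years + drift_rate * since_recal   # saw-tooth record

def slope(x, y):
    """Least-squares slope of y against x."""
    return np.polyfit(x, y, 1)[0]

# Option 2: leave the record uncut.
uncut_slope = slope(years, record)

# Option 1: cut at every recalibration and honor each segment's own trend.
sliced_slope = np.mean([slope(years[segment_id == s], record[segment_id == s])
                        for s in np.unique(segment_id)])

# Option 3: treat the drop at each recalibration as accumulated drift and
# remove it in proportion to the time since the previous recalibration.
adjusted = record.copy()
n_seg = segment_id.max() + 1
for s in range(n_seg - 1):                 # the last segment has no closing recalibration
    idx = np.flatnonzero(segment_id == s)
    jump = record[idx[-1]] - record[idx[-1] + 1]
    frac = (months[idx] - months[idx[0]]) / (months[idx[-1]] - months[idx[0]])
    adjusted[idx] -= jump * frac
keep = segment_id < n_seg - 1
adjusted_slope = slope(years[keep], adjusted[keep])

print(uncut_slope, sliced_slope, adjusted_slope)
```

Under these assumptions the sliced segments report the true trend plus the full drift rate, the uncut record comes out close to the true trend, and the drift-adjusted record comes out closest. How far that carries over to real station data depends on how much of the real discontinuity population looks like this saw-tooth.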
Here I focused on a subset of all discontinuities; events associated with recalibration of the site. I argue that recalibration should not slice records for that is a return to trend we must not lose. I have a saw-tooth case in mind, which implies the drift is uni-directional. Reality will be much more confused. So please forgive my picking a boundary case for discussion.
Certainly major station moves require a new record. Moving a station from Point X to Point Y within an airport’s grounds? —- Tougher call. Why move it? Was it because Point X had become thermally contaminated and you wanted to restore Class 1 status (recalibration) at Point Y? By the way, when you move it, a maintenance event happens, too. If so, it argues against slicing the record and discarding the offset at the identified discontinuity.
Stephen, thanks for your thoughts.
The overwhelming majority of “recalibrations”, as you call them, will never reach statistical significance. As you point out they are things like “weeding around the enclosure, replacement of degrading sensors, trimming of nearby trees, removal of a bird’s nest”. By and large, none of those will cause a statistically significant jump in the data, including painting the screen on a regular basis. So they are a non-issue for this discussion, they just do what they do, make a very tiny “sawtooth” in the data with little effect.
As a result, I fear that what you are alluding to is a difference that makes no difference.
I am talking about the much larger events, events that are large enough to be statistically detectable as an anomaly in the record. These do not happen frequently resulting in a “saw-toothed” signal as you say. If they did, it wouldn’t be such a problem. Instead, they occur very occasionally and randomly, one or a few per record.
Now, there are thousands and thousands of records in the temperature dataset with one or two such anomalous jumps in them. As I pointed out, such anomalous jumps will be indistinguishable from a long-term trend or a long-term cycle or both. You are interested in long-term cycles … yet you focus on differences that don’t make a difference, and you gloss over the damage that leaving in the bogus data does to the very long-term issues that are of interest to you.
You think leaving in that bogus data preserves the long-term trends, where far too often they are spurious trends created by the bogus data.
w.
I thought in the past year, there was a WUWT article about the temperature effects of painting a weather-beaten screen with a fresh coat of paint. The changes were not insignificant in comparison to measured warming rates of 0.1-0.3 deg C/decade. I have not found what I was looking for.
Related links of interest
The Metrology of Thermometers
Posted on January 22, 2011 by Anthony Watts
http://wattsupwiththat.com/2011/01/22/the-metrology-of-thermometers/
Hard to believe it is that bad. Is it?
A typical day in the Stevenson Screen Paint Test
Posted on January 14, 2008 by Anthony Watts
http://wattsupwiththat.com/2008/01/14/a-typical-day-in-the-stevenson-screen-paint-test/
Which shows a daily plot of temperatures between bare wood, whitewash, latex, and air.
The curves are quite close together, but the difference in the maximum is most important.
Is there a later compilation of the data?
The Smoking Gun At Darwin Zero
Posted on December 8, 2009 by Willis Eschenbach
http://wattsupwiththat.com/2009/12/08/the-smoking-gun-at-darwin-zero/
“Smoking Gun” is what brought me to WUWT.