Analysing the complete HadCRUT3 station data yields some surprising results

From The Reference Frame, 30 July 2011 via the GWPF

HadCRUT3: 30% Of Stations Recorded A Cooling Trend In Their Whole History

The warming recorded in the HadCRUT3 data is not global. Even though the average station has 77 years of temperature history, 30% of the stations still end up with a cooling trend.

In a previous blog entry, I encouraged you to notice that the (nearly) raw data from the 5,000+ HadCRUT3 stations have been released.


Temperature trends (in °C/century, shown as colors) over the whole recorded history at the roughly 5,000 stations included in HadCRUT3. To be discussed below.

The 5,113 files cover the whole world – mostly the continents and some islands. I have fully converted the data into a format that is usable and understandable in Mathematica. There are some irregularities: the longitude, latitude, or elevation is missing for a small fraction of the stations. A very small number of stations contain some extra entries, and I have classified these anomalies as well.

As Shawn has also noticed, the worst defect is associated with the 863rd (out of 5,113) station, in Jeddah, Saudi Arabia. This one hasn’t submitted any data at all. For many stations, some months (and sometimes whole years) are missing, so you get -99 instead. This shouldn’t be confused with numbers like -78.9: believe me, stations in Antarctica have recorded average monthly temperatures as low as -78.9 °C. It’s not just a minimum experienced for an hour: it’s the monthly average.

Clearly, 110 °C of warming would be helpful over there.

I wanted to know what the actual temperature trends recorded at all the stations are – i.e. what the statistical distribution of these slopes looks like. Shawn had a good idea for avoiding the computation of temperature anomalies (i.e. the subtraction of the seasonally varying “normal temperature”): one may calculate the trends for each of the 12 months separately.
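For concreteness, here is a minimal Python sketch of that procedure (my actual analysis was done in Mathematica); it assumes each station has already been parsed into an array with one row per year – the year in the first column, the 12 monthly means in the remaining columns, and -99 marking missing months:

```python
import numpy as np

MISSING = -99.0

def monthly_slopes(data):
    """Per-month linear trends for one station, in °C/century (NaN if too few data).

    `data` is a hypothetical array with one row per year: the year in column 0
    and the 12 monthly mean temperatures in columns 1..12, -99 = missing month.
    """
    years = data[:, 0]
    slopes = np.full(12, np.nan)
    for m in range(12):
        temps = data[:, m + 1]
        ok = temps != MISSING               # carefully omit the missing entries
        if ok.sum() >= 2:                   # a slope needs at least two points
            slopes[m] = 100.0 * np.polyfit(years[ok], temps[ok], 1)[0]
    return slopes

# The station's overall trend is then simply np.nanmean(monthly_slopes(data)).
```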

To a very satisfactory accuracy, the temperature trend for the anomalies that include all the months is just the average of those 12 monthly trends. In all these calculations, you must carefully omit all the missing data – indicated by the value -99. But first, let me assure you that the stations are mostly “old enough”:

As you can see, a large majority of the 5,000 weather stations are 40–110 years old (if you take endYear minus startYear). The average age is 77 years – partly because a nonzero number of stations have more than 250 years of data. So it’s not true that the “bizarre” trends come mainly from a small number of short-lived, young stations.

Following Shawn’s idea, I computed the 12 histograms for the overall historical warming trends corresponding to the 12 months. They look like this:

[Twelve histograms of the overall warming trends, one for each month.]

You may be struck by the fact that the first histogram looks much broader than, say, the fourth one, and start to wonder why. In the end, you will realize that it’s just an illusion – the visual difference arises because the scale on the y-axis differs, and it differs because a histogram with a single “central bin” in the middle can reach a much higher maximum than one whose peak is split between two central bins. 😉
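You can convince yourself of this with a tiny numerical experiment – the same sample and the same bin width, with the bins merely shifted by half a bin. The effect is strongest when the bin width is comparable to the width of the peak:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=0.25, size=10_000)   # a sharply peaked toy distribution

centred_bins = np.arange(-2.25, 2.5, 0.5)   # 0 sits in the middle of a bin
edge_bins    = np.arange(-2.0, 2.5, 0.5)    # 0 sits on a bin edge

print(np.histogram(sample, bins=centred_bins)[0].max())   # ≈ 6800 counts in the single central bin
print(np.histogram(sample, bins=edge_bins)[0].max())      # ≈ 4800 counts in each of the two central bins
```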

This insight is easily verified if you actually sketch a basic table for these 12 histograms:

[Table: for each of the 12 months, the number of stations with a valid trend, the mean trend (°C/century), and its standard deviation.]

The columns indicate the month, starting from January; the number of stations that yielded a legitimate trend for that month; the average trend over those stations for the given month, in °C/century; and the standard deviation – the width of the histogram.

You can see that September (closely followed by October) saw the slowest warming trend across these 5,000 stations – about 0.5 °C per century – while February (closely followed by March) had the fastest trend, 1.1 °C per century or so. The monthly trends scatter somewhat randomly around 0.7 °C per century, but the trend as a function of the month looks more like a smooth, sine-like curve than white noise.

At any rate, it’s untrue that the 0.7 °C of warming in the last century is a “universal” number. In fact, for each month, you get a different figure and the maximum one is more than 2 times larger than the minimum one. The warming trends hugely depend both on the places as well as the months.

The standard deviations of the temperature trend (evaluated for a fixed month of the year but over the statistical ensemble of all the legitimate weather stations) go from 2.14 °C per century in September to 2.64 °C per century in February – the same winners and losers! This difference is much smaller than the huge “apparent” difference in histogram widths that I have just explained away. You may say that the temperatures in February tend to oscillate much more than those in September because there is a lot of potential ice – or missing ice – in the dominant Northern Hemisphere. The ice-albedo feedback and other ice-related effects amplify the noise – as well as the (largely spurious) “trends”.
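If you want to reproduce that little table, a sketch like the following is enough. It assumes a hypothetical array `trends` of shape (number of stations, 12), holding each station’s per-month slope in °C/century with NaN where a month had no usable data; note that `np.std` here is the true standard deviation, which avoids the RMS-vs-SD slip corrected in the update at the end of this post:

```python
import numpy as np

def monthly_summary(trends):
    """Rows of (month, number of stations, mean trend, standard deviation).

    `trends` is a hypothetical (n_stations, 12) array of per-month slopes
    in °C/century, with NaN marking months without a usable trend.
    """
    rows = []
    for m in range(12):
        col = trends[:, m]
        valid = col[~np.isnan(col)]
        rows.append((m + 1, valid.size, valid.mean(), valid.std()))
    return rows

# for month, n, mean, sd in monthly_summary(trends):
#     print(f"{month:2d}  {n:5d}  {mean:+5.2f}  {sd:4.2f}")
```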

Finally, you may throw all the monthly trends into one huge melting pot. You will obtain this beautiful Gauss–Lorentz hybrid bell curve:

[Histogram of all 58,579 monthly/local trends pooled together.]

It’s a histogram containing 58,579 monthly/local trends – some trends that were faster than a certain large bound were omitted but you see that it was a small fraction, anyway. The curve may be imagined to be a normal distribution with the average trend of 0.76 °C per century – note that many stations are just 40 years old or so which is why they may see a slightly faster warming. However, this number is far from being universal over the globe. In fact, the Gaussian has a standard deviation of 2.36 °C per century.

The “error of the measurement” of the warming trend is 3 times larger than the result!

If you ask a simple question – how many of the 58,579 trends, each determined by a month and a place (a weather station), are negative, i.e. cooling trends – you will find that it is 17,774 of them, i.e. 30.3 percent. Even if you compute the average trend over all months for each station, you get very similar results; after all, the trends at a given station don’t depend on the month too much. It remains true that roughly 30% of the weather stations recorded a cooling trend over their whole record.
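Pooling everything and counting the cooling trends is then a one-liner, again assuming the hypothetical `trends` array from the sketch above:

```python
import numpy as np

# `trends`: hypothetical (n_stations, 12) array of per-month slopes, NaN = missing
pooled = trends[~np.isnan(trends)]     # the ~58,000 individual monthly/local trends
print(pooled.mean(), pooled.std())     # the post quotes roughly 0.76 and 2.36 °C/century
print((pooled < 0).mean())             # fraction of cooling trends, roughly 0.30 in the post
```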

Finally, I will repeat the Voronoi graph we saw at the beginning (with sharper colors, because I redefined the color function from “x” to “tanh(x/2)”):

[The Voronoi map from the top of the post, redrawn with the tanh(x/2) color function.]


The areas are assigned to their nearest weather station – that’s what the term “Voronoi graph” means. The color is chosen according to a temperature color scheme in which the quantity determining the color is the overall warming (+, red) or cooling (−, blue) trend recorded over the whole history of the given station.
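A minimal sketch of such a nearest-station map, assuming hypothetical arrays `lons`, `lats` and `trend` (one longitude, latitude and overall trend in °C/century per station); it works in plain longitude-latitude coordinates, so spherical geometry and the date line are ignored here, which a more careful version would handle:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import cKDTree

# `lons`, `lats`, `trend` are hypothetical per-station arrays (see lead-in).
glon, glat = np.meshgrid(np.linspace(-180, 180, 721), np.linspace(-90, 90, 361))
tree = cKDTree(np.column_stack([lons, lats]))
_, nearest = tree.query(np.column_stack([glon.ravel(), glat.ravel()]))
field = np.tanh(trend[nearest] / 2).reshape(glat.shape)   # the tanh(x/2) color function

plt.pcolormesh(glon, glat, field, cmap="coolwarm", vmin=-1, vmax=1)
plt.colorbar(label="tanh(trend / 2)")
plt.show()
```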

It’s not hard to see that the number of places with a mostly blue color is substantial. The cooling stations are partly clustered although there’s still a lot of noise – especially at weather stations that are very young or short-lived and closed.

As far as I remember, this is the first time I could quantitatively calculate the actual local variability of the global warming rate. Just as I expected, it is huge – and comparable to some of my rougher estimates. Even though the global average yields an overall positive temperature trend – a warming – it is far from true that this warming trend appears everywhere.

In this sense, the warming recorded in the HadCRUT3 data is not global. Even though the average station has 77 years of temperature history, 30% of the stations still end up with a cooling trend. The warming at a given place is 0.75 ± 2.35 °C per century.

If the rate of warming in the coming 77 years or so were analogous to that of the previous 77 years, a given place would still have a 30% probability of cooling down – judging by the linear regression – over those future 77 years! However, it’s also conceivable that the noise is so substantial and the sensitivity so low that once the weather stations add another 100 years to their records, 70% of them will actually show a cooling trend.

Even if you imagine that the warming rate in the future will be twice as fast as it was in the last 77 years (on average), it would still be true that over the next 40 years or so, i.e. by 2050, almost one third of the places on the globe will experience cooling relative to 2010 or 2011! So forget about the Age of Stupid doomsday scenario around 2055: it’s more likely than not that more than 25% of places will actually be cooler in 2055 than in 2010.
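A crude way to check these fractions is to pretend the distribution of local trends is exactly Gaussian with the mean and width quoted above; the real curve has a sharper peak and fatter tails, so the empirical 30% differs a bit from the Gaussian figure, but the doubled-trend scenario comes out close to the “more than 25%” just mentioned:

```python
from scipy.stats import norm

# Gaussian approximation with the quoted mean 0.76 and width 2.36 (°C/century)
print(norm.cdf(0, loc=0.76, scale=2.36))      # ≈ 0.37: places cooling if the past trend simply continues
print(norm.cdf(0, loc=2 * 0.76, scale=2.36))  # ≈ 0.26: places still cooling even with a doubled mean trend
```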

Isn’t it remarkable? There is nothing “global” about the warming we have seen in the recent century or so.

Whether a place warms or cools depends on the location (as well as the month, as I mentioned), and the warming places only have a roughly 2-to-1 majority, while the cooling places are a sizable minority. Of course, if you calculate the change of the global mean temperature, you get a positive sign – you had to get one of the signs, because an exactly zero result is infinitely unlikely. But the actual change of the global mean temperature over the last 77 years (on average) is so tiny that the place-dependent noise still safely beats the “global warming trend”, yielding a sign of the temperature trend that depends on the place.

Imagine, just for the sake of argument, that any change of the temperature (calculated as a trend from linear regression) is bad for every place on the globe. It’s not true, but just imagine it. Then it would be a good idea to reduce the temperature change between now and, e.g., the year 2087.

Now suppose all places on the planet pay billions for special projects meant to cool the globe. In 2087, 30% of the places will find out that they have actually made their own problem worse: they got a cooling anyway, and they paid to make that cooling even stronger! 😉

Because of this subtlety, it would be obvious nonsense to try to cool the globe down even if global warming mattered, because it’s far from certain that cooling is what you would need to regulate the temperature at any given place. The regional “noise” is far larger than the trend of the global average, so every single place on Earth can neglect the changes of the global mean temperature if it wants to know the future change of its local temperature.

The temperature changes either fail to be global or they fail to be warming. There is no global warming – this term is just another name for a pile of feces.

And that’s the memo.

UPDATE:

EarlW writes in comments:

Luboš Motl has posted an update with new analysis over shorter timescales that is interesting. Also, he posts a correction showing that he calculated the RMS instead of the standard deviation for the error.

Wrong terminology in all figures for the standard deviation

Bill Zajc has discovered an error that affects all the values of the standard deviation indicated in both articles. What I called the “standard deviation” was actually the root mean square, RMS. The actual value of the SD is given by

SD² = RMS² − ⟨TREND⟩²

In the worst cases – those with the highest ⟨TREND⟩/RMS – this corresponds to nearly a 10% error: for example, 2.35 drops to about 2.2 °C per century. My sloppy calculation of the “standard deviation” was in effect assuming that the distributions had a vanishing mean, so it was really a calculation of the RMS.
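Plugging in the pooled numbers quoted earlier (RMS ≈ 2.36, mean trend ≈ 0.76 °C per century) as a quick check:

```python
import math

# RMS ≈ 2.36 and mean trend ≈ 0.76 °C/century from the pooled histogram above
print(math.sqrt(2.36**2 - 0.76**2))   # ≈ 2.23, i.e. "2.35 drops to 2.2 or so"
```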

The error of my “standard deviation” for the “very speedy warming” months is sometimes even somewhat larger than 10%. I don’t have the energy to redo all these calculations – they are very time-consuming, both for the CPU and for me. Thanks to Bill.

http://motls.blogspot.com/2011/08/hadcrut3-31-of-stations-saw-cooling.html

111 Comments
jaymam
August 4, 2011 5:08 pm

Missing value codes
Back in 1970 the specification for a program that I was told to write said to use a year of 99 to indicate end of file. I protested but was told that it was unlikely that the system would still be going in the year 1999.
“missing data – indicated by the figure -99”
I find it amazing that someone decided to use -99 for missing data when that value can occur on Earth; unlikely, but it does happen.
There appears to be no standard for indicating missing data. If there are many possible “missing code values”, a program may miss one and include wildly spurious data in the climate record. That may not be noticed, because certain scientists average a whole lot of figures together.
Here I have GISS data that uses 999.9, and NIWA that uses –
(i.e. minus blank, an excellent option)
HARRY_READ_ME mentions missing value codes of -9999 and -999 and -999.00 and -99999 and 999999 and 8888888 and -10 and -7777777
e.g.
“CHERRAPUNJI, the wettest place in the world. So here, the high values are realistic. However I did notice that the missing value code was -10 instead of -9999!”
“LoadCTS multiplies non-missing lons by 0.1, so they range from -18 to +18 with missing value codes passing through AS LONG AS THEY ARE -9999. If they are -999 they will be processed and become -99.9. “
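For what it’s worth, a defensive loader can at least mask the codes listed above explicitly. A sketch only – this list is taken from the codes mentioned in this thread, not from any official specification, and ambiguous codes like -10 obviously cannot be caught this way:

```python
import numpy as np

# Sentinels taken from the codes mentioned in this thread, not from any
# official specification; ambiguous codes like -10 cannot be caught this way.
SENTINELS = [-99, -999, -9999, -99999, 999.9, 999999, 8888888, -7777777]

def mask_missing(values, extra_sentinels=()):
    """Return a float array with known missing-value codes replaced by NaN."""
    arr = np.asarray(values, dtype=float)
    bad = np.isin(arr, list(SENTINELS) + list(extra_sentinels))
    return np.where(bad, np.nan, arr)
```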

Berényi Péter
August 4, 2011 5:17 pm

“I wanted to know what are the actual temperature trends recorded at all stations – i.e. what is the statistical distribution of these slopes. Shawn had this good idea to avoid the computation of temperature anomalies (i.e. subtraction of the seasonally varied “normal temperature”): one may calculate the trends for each of the 12 months separately”.
It always bothered me a bit that a certain peculiarity of the Gregorian calendar we are using is never taken into account when calculating century scale temperature trends for each month of the year separately.
The problem with the old Julian calendar was that the long-term average length of its annual cycle was slightly longer than the tropical year, so the vernal equinox (when night and day have the same length in spring) slowly wandered back in the calendar, at an average rate of about 0.78 day/century. This is why Pope Gregory XIII skipped 10 days in 1582 (October 4 was followed by October 15) and established a new rule for leap years saying years divisible by 100 but not by 400 are not leap years (even though they are divisible by 4). That brings the calendar much closer to reality in the long run.
However, year 2000 happened to be divisible by 400, so it was a leap year. It means for 199 years in a row (centered on 2000 AD) we are missing Gregorian correction and in this epoch (between 1900 and 2100 AD) our calendar works just like the old Julian one. It means vernal equinox shifts back in the calendar by about 1.5 days in two centuries (in a see-saw pattern due to slight overcorrection in every fourth year).
Now, few temperature time series go back to the 19th century (or earlier), therefore if you calculate monthly trends for each location, this shift can’t be ignored.
As much more land area is found in the Northern hemisphere and average density of stations is also higher there, it dominates the dataset in this respect. At places where temperature difference between winter and summer is high, average rate of warming during spring months (within the year) can be pretty high, sometimes as much as 1°C/day (but ~0.5°C/day quite often). It means for these months century scale rate of warming in our epoch is biased upward by several tenths of a degree due to the “Gregorian calendar effect”.
Of course it is just the opposite for autumn and it is entirely the other way around in the Southern hemisphere. But just remember how often one finds springtime warming rates highlighted in press releases (while ignoring fall) for locations in Europe or Northern America.
I believe this effect also explains the pattern in your “sketch of a basic table”, that is, rates for February-May being high while for September-October they are low.
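A back-of-the-envelope estimate of the size of this effect, using only the rough figures quoted above:

```python
# Both numbers are the rough figures quoted above, nothing more precise.
drift_days_per_century = 0.78     # equinox drift while the Gregorian correction is "missing"
spring_warming_per_day = 0.5      # typical within-year springtime warming rate, °C/day
print(drift_days_per_century * spring_warming_per_day)   # ≈ 0.4 °C/century of apparent trend
```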

sky
August 4, 2011 5:24 pm

Regressional trends fitted to data with strong oscillatory components are very sensitive to record duration and start-stop times. Thus there’s no such thing as “the” trend at a particular station. The analysis might be improved by putting all stations on the same time interval. Also, it would be interesting to see a stratification according to population size. After all, outside the USA, Scandinavia and Australia, the GHCN data base is largely urban.

timetochooseagain
August 4, 2011 5:44 pm

bbbeard-It’s not clear to me how you handled the fact that there will be only about one fourth as many February 29ths as other days of the year. Here’s a thought, rank days within each year, and ask what the trends are in nth warmest or coldest day from year to year. Might give some idea about how the distribution of temperatures is changing.

Richard S Courtney
August 4, 2011 5:45 pm

Berényi Péter:
Thankyou for your post at August 4, 2011 at 5:17 pm. I had failed to recognise the matter, but it is obvious to me now that you have explained it, i.e.
“However, year 2000 happened to be divisible by 400, so it was a leap year. It means for 199 years in a row (centered on 2000 AD) we are missing Gregorian correction and in this epoch (between 1900 and 2100 AD) our calendar works just like the old Julian one. It means vernal equinox shifts back in the calendar by about 1.5 days in two centuries (in a see-saw pattern due to slight overcorrection in every fourth year).
Now, few temperature time series go back to the 19th century (or earlier), therefore if you calculate monthly trends for each location, this shift can’t be ignored.”
As you say, when comparing seasonal or monthly hemispheric averages “this shift can’t be ignored”, and it may also bias global data because the hemispheres differ in their trends.
However, considering the wide spread in the data reported by Lubos Motl, the effect you report makes no difference to what he reports in his above essay.
Thankyou. I like to learn.
Richard

Allen63
August 4, 2011 5:48 pm

Good post. As others have pointed out, it’s questionable whether temperature data genuinely say anything accurate about “global warming”. Still, there is a net trend.
I downloaded the data set. The data set seems to be monthly temperatures – probably daily or twice-daily temperatures averaged by some means, with missing days “filled in” by some means. That is, it’s manipulated data, not raw data (no disparagement meant to the original post, as the post is very interesting).
Is there a complete set of the true raw data (or a limited set – say, limited to the USA)? That is, the individual daily, twice-daily, or whatever readings – uncorrected. Does such a thing even exist anymore, or in digital form, or are all existing data sets “manipulated”? Where could I download it (assuming it’s available)? Thanks.

timetochooseagain
August 4, 2011 5:50 pm

Berényi Péter-Actually there is at least one study that has examined the effect of “Gregorian calender bias”:
R. S. Cerveny, B. M. Svoma, R. C. Balling, and R. S. Vose (2008), Gregorian calendar bias in monthly temperature databases, Geophys. Res. Lett., 35, L19706, doi:10.1029/2008GL035209.

August 4, 2011 5:56 pm

The size of areas near the poles is distorted by the Mercator (I assume) projection. Can the map be re-done on an equal area projection, like maybe the Lambert cylindrical equal-area projection?

cagw_skeptic99
August 4, 2011 6:04 pm

Actually mod, the author’s name does not appear at the top of the post.

SteveSadlov
August 4, 2011 6:04 pm

Here at the leading edge of North America it is cooling. It is undeniable. May or may not mean anything. We’ll see.

bbbeard
August 4, 2011 6:31 pm

timetochooseagain:
I did a regression on the 15 average temperature days that were labeled Feb 29th; for every other date I had 59 data points. On general principles, having 1/4 as many data points means the standard error is twice as big. But when I did this I didn’t bother to estimate the standard error of the regressed slope. If I were to do this today I would update through 2010 and include the standard errors.
I was a little worried about the precessional effects that Berenyi Peter referred to, in terms of the leap-year correction. In successive years, Jan 1st, for example, falls 1/4 day “later” than in the previous year, except following a leap year, when it backs up 3/4 of a day. The only way to handle this rigorously, I think, is to partition the set of dates using mod-4 arithmetic (Jan 1st of a leap year (“4k”), Jan 1st after a leap year (“4k+1”), then “4k+2”, and “4k+3”). But it seems to me that the graph I produced shows there is so little correlation between successive dates, in terms of their regressed slope, that this correction would add little to the analysis. At worst there is a tiny bit more uncertainty for each data point in the abscissa; partitioning the dates mod 4 would double the uncertainty on the ordinate. I suppose you could argue that when I do the regression I should add 0.2422*(Year-1948)-floor((Year-1948)/4) to each date – but I doubt that would change the regressed slopes by more than a small fraction of their uncertainty. If anyone can provide a rigorous formulation I’d be glad to listen.
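In code, the proposed offset is just the following (1948 being the reference leap year of that analysis; this is only the formula from the preceding paragraph, with the century leap-year rule ignored):

```python
import math

def date_offset_days(year, ref_year=1948):
    """Fractional-day drift of a fixed calendar date, per the formula above."""
    n = year - ref_year
    return 0.2422 * n - math.floor(n / 4)
```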

naturalclimate
August 4, 2011 6:32 pm

You know if you smooth out that data enough, you won’t have to use so many colors, and then you can just call it an uptrend and be done with it.

Bart
August 4, 2011 6:40 pm

Given the haphazard pattern of temperature readings, average temperature should be calculated by fitting the data to an expansion of spherical harmonics, like we do for Earth’s gravitational potential and other quantities.

DocMartyn
August 4, 2011 7:08 pm

Your Voronoi graph is quite simply the best visual representation of a complex dataset that I have ever seen; I say that as someone who has been in research for 20 years.
Quite brilliant, well done.

Richard Hill
August 4, 2011 7:12 pm

S Mosher said…
The real work is figuring out if there is anything that explains why some places cool while other places warm. Been looking for that since 2007 when I first did a version of the analysis presented here. Or, conversely, why some warm more than others (aside from polar amplification).

It is a brilliant graph. Lubos should get a prize.
If you narrow your eyes, can you see a trail of red up the Great Rift Valley of Africa and then spreading out over the seismically active areas left towards Italy and right to Iran?
I know humans can see a pattern where there is none, but Mosh’s comment on this would be valued.

timetochooseagain
August 4, 2011 7:14 pm

bbbeard-Thanks. I am currently looking at the daily variations in some data and need to figure out how to deal with leap years. You’ve got some interesting suggestions there I hadn’t thought of.

Richard M
August 4, 2011 7:16 pm

Sounds like randomness in action. Should not be a surprise in a chaotic system. While we humans might think it should average out, the climate just plods along at its own pace. I’m not really sure there is much information to be mined here.

David Falkner
August 4, 2011 9:50 pm

Ok, I’ll finish reading in just a second. Just wanted to note for the record that the first graph makes my ocular faculties sweat like a whore in church.

David Falkner
August 4, 2011 10:11 pm

Wow, I wonder if Gavin et al would be interested in providing the possible physical reasons for 30% of the stations cooling? Especially since they seemed so dismissive of Essex et al.
http://www.realclimate.org/index.php/archives/2007/03/does-a-global-temperature-exist/
“The whole paper is irrelevant in the context of a climate change because it missed a very central point. CO2 affects all surface temperatures on Earth, and in order to improve the signal-to-noise ratio, an ordinary arithmetic mean will enhance the common signal in all the measurements and suppress the internal variations which are spatially incoherent (e.g. not caused by CO2 or other external forcings).”
The result that all temperatures on Earth are not equally affected doesn’t seem very suppressed. Or spatially incoherent. Ouch.

Berényi Péter
August 4, 2011 11:49 pm

timetochooseagain says:
August 4, 2011 at 5:50 pm
Berényi Péter-Actually there is at least one study that has examined the effect of “Gregorian calender bias”

Thank you for bringing it to my attention, I was not aware of it.
GEOPHYSICAL RESEARCH LETTERS, VOL. 35, L19706, 4 PP., 2008
doi:10.1029/2008GL035209
Gregorian calendar bias in monthly temperature databases
Randall S. Cerveny, Bohumil M. Svoma, Robert C. Balling Jr. & Russell S. Vose
Is there a copy not hiding behind a paywall?
BTW, I wonder if the BEST project would take this effect into account or ignore it as all other global temperature analyses do. They have quite a lot of daily temperature data in their dataset, so at least in theory, they could go for it. We will see, as all their data, algorithms and methods are supposed to be published online (sooner or later).
Also, in non-leap years January-June is three days shorter (181 days) than July-December (184 days), so any positive warming bias that may be present in the first half of the year due to the calendar effect gets more weight in the annual average if the correction is ignored. And then, as you say, there is an overall hemispheric difference in trends, which makes things a bit worse.
Anyway, it would be nice to see this bias quantified properly.

Patrick Davis
August 5, 2011 1:56 am

Maybe slightly O/T, but Mombasa, Kenya today, 12 °C. One of the coldest days on record…so far.

Dr A Burns
August 5, 2011 3:13 am

So where did the IPCC get those very tight error bars?

Ryan
August 5, 2011 3:32 am

I notice that the Himalayas don’t appear to be getting any warmer. I wonder why all those Himalayan glaciers are melting? Oh I forgot, they aren’t melting, somebody made that up.
I notice that most of the developed world producing all that CO2 doesn’t seem to be warming much at all. The most severe warming occurs where there aren’t many people – and even fewer thermometers, it seems.

Luboš Motl
August 5, 2011 3:56 am

Dear everyone, thanks for your kind words and interest!
Those who prefer a white-background, gadget-free, simple-design blog can bookmark the mobile version of my blog,
http://motls.blogspot.com/?m=1
Otherwise, one must be careful about the interpretation of the “error margins”. The standard deviation of over 2 °C is the error of the temperature trend “at a random place of the globe”. However, the error in the trend of the global mean temperature is much smaller than 2 °C, because a quantity obtained by averaging N quantities with the same error margin has an error margin that is sqrt(N) times smaller.
So the increase of the global mean temperature over the last 100 years or so, if the local HadCRUT3 data are at least approximately right, is of course statistically significant. As the histograms show, the expected (but not guaranteed) increase of the global mean temperature in the next century is just not very useful for predicting what will happen at any particular place of the globe – which is a largely uncorrelated question. Just as 30% of places were cooling, on average, over the last 77 years, 30% of places may cool in the next 77 years.
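To put a number on the sqrt(N) argument – treating the roughly 5,000 station trends as independent, which they are not, so the real uncertainty of the global trend is larger than this formal figure:

```python
import math

# Treats the ~5,000 station trends as independent, which they are not, so the
# true uncertainty of the global-mean trend is larger than this formal figure.
print(2.36 / math.sqrt(5113))   # ≈ 0.033 °C/century
```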
There are many variations of this work one could do – computing area-weighted averages with gridding (or an equivalent of it), drawing the diagrams on the round surface of the globe rather than on a simple longitude-latitude rectangle, and so on. But many of the calculations I did in Mathematica are pretty time-consuming – both in CPU time and in human time.
I would like to hope that someone will convert the data to his or her preferred format and continue to analyze them in many other ways. There is very little doubt that most of the actual analyses of the data from the whole globe are done by people who call themselves “climate realists” or “skeptics”. But that doesn’t mean that there aren’t crisper conclusions one may still extract from similar data.
Yours
LM

Ryan
August 5, 2011 4:02 am

By the way, I would point out that these graphs are not based on data that could remotely be called “raw”. They are based on the monthly averages that appear in the HadCRUT database.
The graphs shown here are misleading, because although they give you a distribution, it is a distribution of averages, not the distribution of the raw readings from the thermometers. In other words, although the distributions show an s.d. which is several times larger than the difference of the mean from zero, this s.d. is not nearly as wide as it would be if the raw data had not been averaged first.
Let me put it another way – these graphs show the distribution of temperatures for the whole of July, i.e. with the effect of clouds, wind direction and wind speed smoothed out in a statistically inappropriate way. If you had the raw data you could plot graphs of how the temperature trends behave for, say, the 1st of July. This would avoid any averaging and would show that any trend present sits within a much wider distribution of temperatures than even these graphs suggest. In other words, you are looking for a trend due to CO2 underlying a considerable amount of noise – primarily caused by clouds and wind direction. Now you could say that averaging is intended to filter out this noise – but it is not a statistically valid method of doing so. The whole of July could be cloudy, or the whole of July could be cloud-free, or it could alternate between cloudy and cloud-free days – in these cases averaging over a month would give very different results unrelated to CO2, because the impact of cloud is not actually being filtered out successfully by averaging. You could say that averaging over longer periods of time or over multiple sites removes this noise, but we can’t be sure of that, because we can’t be sure that the amount of cloud is not varying over time.
Another source of noise is the quantisation noise introduced by only being able to read mercury thermometers to the nearest 0.5 °C. Averaging obscures this, because if you average 31 different temperature readings over a month you will likely get a recurring decimal, which makes the accuracy look much greater than it really is. Graphs of temperature should only allow discrete values rounded to the nearest 0.5 °C for this reason.