Buoy Temperatures, First Cut

Guest Post by Willis Eschenbach

As many folks know, I’m a fan of good clear detailed data. I’ve been eyeing the buoy data from the National Data Buoy Center (NDBC) for a while. This is the data collected by a large number of buoys moored offshore all around the coast of the US. I like it because it is unaffected by location changes, time-of-observation changes, or the Urban Heat Island effect, so there’s no need to “adjust” it. However, I haven’t had the patience to download and process it, because my preliminary investigation a while back revealed that there are a number of problems with the dataset. Here’s a photo of the nearest buoy to where I live. I’ve often seen it when I’ve been commercial fishing off the coast here from Bodega Bay or San Francisco … but that’s another story.

[Figure: the Bodega Bay buoy]

And here’s the location of the buoy; it’s the large yellow diamond at the upper left:

[Figure: location of the Bodega Bay buoy]

The problems with the Bodega Bay buoy dataset, in no particular order, are as follows (a rough R cleanup sketch follows the list):

One file for each year.

Duplicated lines in a number of the years.

The number of variables changes partway through the dataset, in the middle of a year, adding a column to the record.

Time units change from hours to hours and minutes in the middle of the dataset, adding another column to the record.
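For anyone who wants to try this themselves, here is a rough sketch of the kind of cleanup involved. It is not the code in the zip file linked below, and the URL pattern and column handling are assumptions that should be checked against the station’s “Historical Data” page:

```r
# Rough sketch only -- not the posted code.  Download one year of NDBC
# standard-meteorological data for station 46013, tolerate the extra
# minutes column added in later years, and drop duplicated lines.
# The URL pattern is an assumption; check the "Historical Data" page.
read_ndbc_year <- function(year, station = "46013") {
  url <- sprintf("https://www.ndbc.noaa.gov/data/historical/stdmet/%sh%d.txt.gz",
                 station, year)
  tmp <- tempfile(fileext = ".txt.gz")
  download.file(url, tmp, quiet = TRUE)
  raw <- readLines(gzfile(tmp))
  hdr <- strsplit(trimws(sub("^#", "", raw[1])), "\\s+")[[1]]  # column names
  dat <- read.table(text = raw[!grepl("^#|^YY", raw)],         # skip header/unit lines
                    col.names = hdr, fill = TRUE, stringsAsFactors = FALSE)
  if (!"mm" %in% names(dat)) dat$mm <- 0   # early years lack a minutes column
  unique(dat)                              # remove duplicated lines
}

# Column names also drift across years (YY vs YYYY, WD vs WDIR, etc.), so a
# real merge has to reconcile them before stacking, e.g.:
# bodega <- do.call(rbind, lapply(1981:2013, read_ndbc_year))
```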

But as the I Ching says, “Perseverance furthers.” I’ve finally been able to beat my way through all of the garbage and I’ve gotten a clean time series of the air temperatures at the Bodega Bay Buoy … here’s that record:

[Figure: air temperature record, Bodega Bay buoy]

Must be some of that global warming I’ve been hearing about …

Note that there are several gaps in the data:

Year:    1986  1987  1988  1992  1997  1998  2002  2003  2011
Months:     7     1     2     2     8     2     1     1     4

Now, after writing all of that, and putting it up in draft form and almost ready to hit the “Publish” button … I got to wondering if the Berkeley Earth folks used the buoy data. So I took a look, and to my surprise, they have data from no less than 145 of these buoys, including the Bodega Bay buoy … here is the Berkeley Earth Surface Temperature dataset for the Bodega Bay buoy:

[Figure: Berkeley Earth raw data, Bodega Bay buoy]

Now, there are some oddities about this record … first, although it is superficially quite similar to my analysis, a closer look reveals a variety of differences. Could be my error, wouldn’t be the first time … or perhaps they didn’t do as diligent a job as I did of removing duplicates and such. I don’t know the answer.

Next, they list a number of monthly results as being “Quality Control Fail” … I fear I don’t understand that, for a couple of reasons. First, the underlying dataset is not monthly data, or even daily data. It is hourly data … so while the odd hourly record might be wrong, how could a whole month fail quality control? And second, the data is already checked and quality controlled by the NDBC. So what is the basis for the Berkeley Earth claim of multiple failures of quality control on a monthly basis?
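For illustration only, here is one way a whole month could end up flagged from hourly data: a completeness-plus-outlier rule applied after averaging to months. This is a guess at the kind of rule involved, not Berkeley Earth’s documented procedure, and the thresholds are arbitrary:

```r
# One way a whole month could get flagged from hourly data -- a guess at
# the kind of rule involved, NOT Berkeley Earth's documented procedure.
# Average the hourly air temperatures to months, then flag months that
# are badly incomplete or that sit far from that calendar month's
# long-term mean.  Thresholds are arbitrary.
monthly_qc <- function(time, atmp, min_hours = 500, z_max = 4) {
  mon  <- format(time, "%Y-%m")
  n    <- tapply(!is.na(atmp), mon, sum)           # valid hours per month
  tbar <- tapply(atmp, mon, mean, na.rm = TRUE)    # monthly mean air temp
  cal  <- substr(names(tbar), 6, 7)                # calendar month "01".."12"
  clim <- tapply(tbar, cal, mean, na.rm = TRUE)    # long-term monthly means
  sdev <- tapply(tbar, cal, sd,   na.rm = TRUE)    # and their spread
  z    <- (tbar - clim[cal]) / sdev[cal]           # anomaly in standard deviations
  data.frame(month = names(tbar),
             mean  = as.numeric(tbar),
             hours = as.numeric(n),
             flag  = n < min_hours | abs(z) > z_max)
}
```

Applied to the cleaned-up hourly record, something like this would tell you which months are thin or anomalous; whether Berkeley Earth’s flags correspond to anything of the sort is exactly the open question.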

Moving on, below is what they say is the appropriate way to adjust the data … let me start by saying, whaa?!? Why on earth would they think that this data needs adjusting? I can find no indication that there has been any change in how the observations are taken, or the like. I see no conceivable reason to adjust it … but nooo, here’s their brilliant plan:

[Figure: Berkeley Earth adjusted data, Bodega Bay buoy]

As you can see, once they “adjust” the station for their so-called “Estimated Station Mean Bias”, instead of a gradual cooling, there’s no trend in the data at all … shocking, I know.

One other oddity. There is a gap in their records in 1986-7, as well as in 2011 (see above), but they didn’t indicate a “record gap” (green triangle) as they did elsewhere … why not?

To me, all of this indicates a real problem with the Berkeley Earth computer program used to “adjust” the buoy data … which I assume is the same program used to “adjust” the land stations. Perhaps one of the Berkeley Earth folks would be kind enough to explain all of this …

w.

AS ALWAYS: If you disagree with someone, please QUOTE THE EXACT WORDS YOU DISAGREE WITH. That way, we can all understand your objection.

R DATA AND CODE: In a zipped file here. I’ve provided the data as an R “save” file. The code contains the lines to download the individual data files, but they’re commented out since I’ve provided the cleaned-up data in R format.

BODEGA BAY BUOY NDBC DATA: The main page for the Bodega Bay buoy, station number 46013, is here. See the “Historical Data” link at the bottom for the data.

NDBC DATA DESCRIPTION: The NDBC description file is here.

 

228 Comments
Editor
November 29, 2014 2:48 am

It just shows that if you cherry-pick data you can have any conclusion that you want. This seems to be the ongoing theme of AGW.
PS What happened to all the heat that mysteriously (and contrary to the Laws of Thermodynamics) is alleged to have disappeared into the oceans?

Bill Illis
November 29, 2014 3:01 am

Berkeley’s algorithms are (by design or by accident) predisposed to find more downspike breakpoints than upspike breakpoints.
The downs are taken out, the ups are left in.
And the algorithms are finding down breakpoints in high-quality, trusted station data where none should be found at all. Amundsen-Scott station at the South Pole has the same problem as this buoy. The station is staffed by dozens of highly qualified scientists using the best equipment and best methods possible. Yet Berkeley finds more than 20 down breakpoints in this dataset and removes them. But they find not a single breakpoint on the high side. Sorry, Berkeley should not be touching this data at all. Scientists working in -70°C temperatures would be very disheartened to know someone is just throwing out their hard work. I pointed out this particular station to Berkeley (Zeke) several months ago and they noted something was wrong and said they would look at it. Nothing was fixed, of course.
Berkeley is predisposed to adjust the temperature record up.
I vote we throw out all of the adjustments and homogenizations from Berkeley and the NCDC and the sea level satellites and all the others and just go back to the raw temperatures. The raw temperatures might not be perfect, but they are actually more likely to be closer to the real trends than the adjusted records are.

mpainter
Reply to  Bill Illis
November 29, 2014 6:02 am

Again, here is a problem that Steven Mosher can shed some light on, perhaps.

Reply to  Bill Illis
November 29, 2014 9:54 am

Exactly. It is station 166900. The 26-month quality control fails are relative to the ‘regional expectation field’ – according to a rather nasty argument I had with Mosher about the BEST treatment. Well, the next nearest station is McMurdo base, which is 1300 km away and 2700 meters lower, on the coast.
See footnote 24 to the essay When Data Isn’t in Blowing Smoke.

Reply to  Bill Illis
November 29, 2014 6:25 pm

I vote we have two records, one without adjustments and one with the adjustments, so that it’s easy to see the trend of the adjustments!

Oatley
November 29, 2014 3:35 am

The idea that Berkeley should adjust anything without an independent review is laughable on its face.

ferdberple
Reply to  Oatley
November 29, 2014 8:38 am

The simplest test for temperature adjustments is to count the number of high and low adjustments. Statistically they should even out due to chance, except for adjustments for population trends. If the pattern of adjustments does not match expectations, then likely the algorithms are wrong.
When you look at these Berkeley results, as well as the published adjustments for the other major temperature series, the adjustments are all contrary to expectations. This strongly suggests that the methodology currently being used to adjust the major temperature series is likely based on a shared faulty algorithm.
Why does Berkeley find more down breakpoints than up? Statistically this should not happen. Why do GISS adjustments show a positive trend, when adjustments for population should result in a negative trend? Statistically this should not happen.
This is really very basic quality control testing. If your adjustments don’t match expectations, then it is no good arguing they are correct because they conform to such and such theoretical method.
If your results work in theory but not in practice, then the theory is wrong. No amount of peer review can change this.
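That symmetry check is easy to run once the breakpoint directions have been tallied; a minimal sketch, with made-up placeholder counts rather than actual Berkeley Earth numbers:

```r
# Sign test on breakpoint directions.  Under the null hypothesis that an
# up-shift and a down-shift are equally likely, the split should be close
# to 50:50.  The counts are made-up placeholders -- substitute the real
# tallies extracted from the adjustment files.
n_down <- 20   # hypothetical number of downward breakpoints
n_up   <- 8    # hypothetical number of upward breakpoints
binom.test(n_down, n_down + n_up, p = 0.5)
# A small p-value says the detected breakpoints are not symmetric: either
# the underlying errors are not random, or the detector is biased.
```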

Hugh
November 29, 2014 3:58 am

[quote]As you can see, once they “adjust” the station for their so-called “Estimated Station Mean Bias”, instead of a gradual cooling, there’s no trend in the data at all … shocking, I know.[/quote]
I can’t see that. I see that there is no trend _drawn_, just three means supposedly matching three different device setups used to measure temps. Whether the setup has changed, and where that would be documented, I don’t know. It is also quite unclear to a layman how that estimated mean bias is calculated.
If there is a long gap in measurements, it is probably because the buoy broke and was fixed much later. That may affect temperature readings. Without metadata on why data is missing, it is not possible to do reliable long-term analysis. IMO.


ferdberple
Reply to  Hugh
November 29, 2014 8:49 am

I can’t see that.
=========
See Nick’s reply above: “Where they detect a discontinuity, they treat as separate stations”
Because they treat the data as three separate stations, what appears convincingly to the eye as a trend when they are combined as a single station disappears when they are treated as separate stations.
Combine that with an algorithm that finds more/bigger down-spikes than up-spikes, and you will end up splitting down-trends into separate stations with no trend, while leaving the up-trends as single stations.
After averaging, this will have the effect of creating an uptrend where there was no uptrend in the raw data. And since the adjusted result matches expectations of rising temps due to rising CO2, the researchers don’t suspect their algorithm is faulty and don’t bother to check.
Basic quality control is skipped because they get the answer they think is correct.

ferdberple
Reply to  ferdberple
November 29, 2014 8:54 am

Statistically, if the data errors are random, there should be no statistical difference between up-spike (up breakpoint) and down-spike (down breakpoint) detection in the Berkeley adjustments. If there is a statistical difference, then the errors are not random or the adjustments are faulty. However, if the errors are not random, then the Berkeley algorithm will deliver faulty results. So in either case, if there is a statistical imbalance in the spike detection, the adjusted results will be faulty.

Hugh
Reply to  Willis Eschenbach
November 30, 2014 2:26 am

Isn’t the motivation for breaking the series that the measuring instruments do change in the course of time? There is a half-life for any setup …
But do the breakpoints in this case really come from an algorithm rather than some external metadata? How could we know if they don’t tell?
On the other hand, breaking at gaps randomly should not statistically cause any systematic trend. By choosing which gaps to break at, you can fiddle with the trends whichever way you prefer. Are you afraid that happened?
As long as the procedure is not open and public, it is a little bit difficult to reproduce it and thus also difficult to trust it.

Bruce Cobb
November 29, 2014 3:59 am

All in a day’s work for “science” in the service of an ideology. Lysenkoism, on a grand scale.

Reply to  Bruce Cobb
November 29, 2014 9:59 am

Lysenkoism practiced by warmunists. H/t to former Czech president Vaclav Klaus and his book Blue Planet in Green Chains.

sleepingbear dunes
November 29, 2014 4:28 am

Last week I looked at BEST data for trends in several Asian and European countries. Some numbers at first glance just didn’t make sense. I compared the trend since 1990 in Ukraine against Germany, and there was a massive difference between the two countries even though they are separated only by Poland. I wouldn’t expect a huge difference between, say, Pittsburgh and Kansas City. I think the holiday season is a good time to look more deeply into BEST.
Where is Mosher when we need him? This post showed me more clearly than any other what might be amiss in all that we depend on.

Editor
November 29, 2014 4:33 am

Good work Willis. I am shocked. It seems Berkeley have gone so far down the rabbit hole of developing solutions to problems in pursuit of scientific excellence that common sense checks have been forgotten.

November 29, 2014 4:33 am

Why do the 3 data sets/graphs shown look like they are from different buoys? They don’t correlate to each other at all if you overlap them.
The highest temperature for graph 1 is 1983, graph 2 is 1992.5, graph 3 is 1998.
The lowest temperature for graph 1 is 1989, graph 2 is 2013, graph 3 is 2012.

Hugh
November 29, 2014 4:41 am

It is a long way from Hamburg to Kiev. Btw, I’m located due north of Kiev, and we have a much milder climate, thanks to the Baltic Sea and Atlantic influence. But we lack the heat of the Crimean summer. Hopefully we lack the war in Crimea as well.

Hugh
Reply to  Hugh
November 29, 2014 4:43 am

Comment meant to sleepingbear dunes, but misdirected thanks to tablet.

maccassar
Reply to  Hugh
November 29, 2014 5:52 am

Thanks for your reply. It may all be explained away, but it just struck me that a difference in upward trend of over 7°F per century since 1990 between Ukraine and Germany seemed large. But there may be very reasonable explanations.

November 29, 2014 4:47 am

Of all the metrics associated with climate change, the only one that seems not to be adjusted is the PSMSL tide gauges. Everything else seems to be suspect.


MikeUK
November 29, 2014 4:53 am

Looking at data from nearby buoys is probably the only way to check the validity of any adjustment.

John L.
November 29, 2014 4:55 am

Oh Mann! Haven’t we seen this “adjustment” thingy somewhere before?

November 29, 2014 5:12 am

… and again we see the climate “scientists” trying to make a weather monitoring and warning system into a climate data system. These buoys are designed to get real time data for marine weather warnings and forecasts and were the primary source of marine data before satellites. I’m sure their products have been modified over the years to match the more sophisticated mariner’s weather systems whose purpose is to keep marine traffic and coastal areas apprised of dangerous conditions. Absolute and unchanging data formatting’s not too important and long term climate archiving would be lower on the list than the format requirements for warning and forecast, but maybe now that we’re in the “information age” they can get data formats to some cleaned up standard.
These buoys, like anything else man drops into the ocean, are NOT 100% reliable. I occasionally look at their data during hurricanes (I’m in Florida, so it’s been a while) and it is not uncommon for them to lose data for one or more sensors, drop off the air or break loose from their moorings and go walkabout. (I think that’s where they got the idea for ARGO.) Some of these things are hundreds of miles offshore and, unlike NOAA’s best stations, are not “ideally sited” in a densely populated city at the junction of several 6 lane highways with a convenient asphalt parking lot around the station for maintenance vehicles. It can sometimes take months to repair these buoys, depending on the frequency of the maintenance boat’s service schedule. The same holds true for calibrations and long term accuracy.
I’m sure there’s very good data to be had from these buoys, but it was never their primary mission and one needs to remember that when wading into the data they’ve gathered.

JR
November 29, 2014 5:13 am

Thank you, Willis. I live on the Gulf Coast, and one of the things that always drives me a little crazy is that the figures given for tropical systems by NOAA are ALWAYS higher than those reported by the buoys. That is, when the location and wind speed of a tropical system are given, the wind speeds measured at any of the buoys, even when they are virtually planted in the northeast quadrant of the tropical system, are routinely lower by 20% to 30% or more, measured in mph. This is especially true of marginal systems, which seems to me to indicate an exaggeration of the number and strength of tropical systems.

Reply to  JR
November 29, 2014 5:43 am

I’ve noticed that as well during tropical events.

Reply to  JR
November 29, 2014 9:17 am

Not an expert, but I don’t believe there is a conspiracy on this particular item. The hurricane hunter aircraft cannot fly on the deck. They always measure wind speed at altitude, where the wind speed is higher than at the sea surface where the buoys are located.
They also use dropsondes to gather information from their altitude down to the sea surface. In my recollection, I believe I have even seen maps where the wind speed at the buoys is corrected upwards to reflect the maximum wind speed at altitude when the eyewall passed over the location of the buoy.
[This response is from a hurricane non-expert.]

Reply to  JR
November 29, 2014 11:44 am

Wind speeds reported by the National Hurricane Center are estimates of the highest 1-minute-average windspeed anywhere in the storm at 10 meters above the surface. And the windiest part of a hurricane is usually small, may miss every station, and the duration of the windiest part of a direct hit often, maybe usually, lasts less than half an hour. So, I consider it expected for the actual maximum 1-minute wind to be significantly greater than the highest hourly reading anywhere.

JR
Reply to  Donald L. Klipstein
November 29, 2014 12:50 pm

While what you say about the NHC is correct, the buoys in the Gulf have continuous wind readings, graphs and also measure the highest gusts in the measurement period. Consistently, the highest gust is significantly lower than the reported max 1-minute wind speed reported by the NHC.

November 29, 2014 5:30 am

Willis writes “It is hourly data … so while the odd hourly record might be wrong, how could a whole month fail quality control?”
If you lose a single hourly reading in a day’s recording, then you lose the ability to be sure of the maximum or minimum for that day. Sure, you can analyse the readings you do have and make a best guess but to be certain, a single lost hour “breaks” a day. So I expect you wouldn’t need to lose much data to put a month in question.
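A toy sketch of that point (synthetic inputs assumed, not buoy data): report a day’s max and min only when all 24 hourly readings are present.

```r
# Toy illustration of the "one lost hour breaks a day" point: compute a
# daily max/min only when all 24 hourly readings are present.  Synthetic
# inputs assumed, not buoy data.
daily_extremes <- function(time, atmp) {
  day <- as.Date(time)
  ok  <- tapply(!is.na(atmp), day, sum) == 24     # require a complete day
  data.frame(day  = as.Date(names(ok)),
             tmax = ifelse(ok, tapply(atmp, day, max), NA),
             tmin = ifelse(ok, tapply(atmp, day, min), NA))
}
```

Even a handful of such broken days could be enough to put a month’s statistics in doubt, depending on the rule used.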

Steve from Rockwood
Reply to  TimTheToolMan
November 29, 2014 5:39 am

I would expect temperature change to be gradual over a 24-hour period. Interpolation of one hour within that day would make no difference. On the other hand, if you assumed something went wrong with the sensor and sought to shift the mean of several months to years of data in order to correct for something, the source of which is not known – well, that could produce some serious errors.
Reminds me of the movie “The Adjustment Bureau” where men run around trying to change the world to fit their idea of success.

Reply to  Steve from Rockwood
November 29, 2014 5:51 am

I would expect temperature change to be gradual over a 24-hour period.

What do they do with the data when a front goes by? T-storm cells can easily drop air temps 15°F in 10 or 15 minutes around their edges, outside the precipitation area … and that is perfectly accurate data. Do they toss it? It underscores the difficulty of trying to monitor “climate” with data from a bunch of discrete sensors designed to monitor weather.

ferdberple
Reply to  TimTheToolMan
November 29, 2014 9:01 am

So I expect you wouldn’t need to lose much data to put a month in question
==========
However, if data losses are random, then no adjustments are required, as the positive and negative errors will balance over time.
In attempting to correct random errors you could actually reduce data quality, because it is very difficult to design algorithms that are truly random. In effect, adjustments add a non-random error to a time series that has only random error, which means the errors can no longer be expected to average out to zero. Instead, the adjustment errors will introduce bias.

ferdberple
Reply to  ferdberple
November 29, 2014 9:04 am

The classic example is online casinos that use random number generators. Because these are typically pseudo-random, they have been exploited to generate casino losses.

Reply to  TimTheToolMan
November 29, 2014 1:13 pm

I’ve seen buoy air temperature data in the Arctic obviously drop a minus sign, causing a 40° jump and fall over a 15-minute interval. It might be interesting to know how much it takes to knock out a whole day. The impression I got was that it was a transmission problem rather than a sensor problem.

Steve from Rockwood
November 29, 2014 5:32 am

If the Pacific Ocean was cooling, would we even know?

November 29, 2014 5:34 am

Which part of “the heat is being sucked into the DEEP ocean” did you not understand?

Reply to  probono
November 29, 2014 5:52 am

Don’t you mean teleported?

Mike M
Reply to  probono
November 29, 2014 6:39 am

Yep, heat has been getting sucked into the deep ocean for millions of years and if it ever ‘decides’ to come back out – we’re all toast!

November 29, 2014 5:42 am

Willis writes “It is hourly data … so while the odd hourly record might be wrong, how could a whole month fail quality control?”
Another thought occurred to me … if there is a “hydrogen hazard” associated with the buoys, then they probably have a lead-acid battery, which would need replacing periodically, and that could take weeks to happen, especially in winter. That is why eyeballing the graph appears to show more “cold” lost data than warm … it’d be winter and even harder to change batteries. That would itself introduce a bias, I would think.

Steve from Rockwood
Reply to  TimTheToolMan
November 29, 2014 9:27 am

I once helped design and build a calibrated temperature sensor (mainly on the software side). The device operated over a voltage range of 24-28 VDC. As the voltage dropped in the main battery the current draw increased slightly to keep the operating voltage within its correct range. The device would operate down to 17 VDC but much below 24 you could see a change in temperature that correlated to the voltage drop. We also measured and recorded voltage and warned the users not to rely on temperatures acquired outside the operating range of the device. This is trivial with today’s electronics. Even a simple 16-bit A/D can measure temperature and voltage to a few decimal places and store years worth of measurements in RAM.

November 29, 2014 6:52 am

Reblogged this on Centinel2012 and commented:
I will take your take on the dataset over theirs any day, even though I don’t know you. I have seen so much data tampering from the various government agencies that I can’t believe much of anything they publish in climate work or economics, which I also follow.

steverichards1984
November 29, 2014 7:02 am

Moored buoys do have a hard life.
Here http://www.ndbc.noaa.gov/mooredbuoy.shtml NDBC describes the buoy types, and here http://www.ndbc.noaa.gov/wstat.shtml they document faults; you can see when sensors etc. fail.
The quality control rules for data from these buoys are contained in http://www.ndbc.noaa.gov/NDBCHandbookofAutomatedDataQualityControl2009.pdf and they do have a section on how they allow for changes when a front passes over but still flag values significantly out of range.
The QA manual above does not help in the discussion of why BEST then trashes the data once received.
One concern they have noted:
“Air temperature measurements (ATMP1, ATMP2) are generally very reliable; however, it is important to note that the physical position of temperature sensors can adversely affect measurements. Air temperature housings can lead to non-representative readings in low wind conditions. Air temperature is sampled at a rate of 1 Hz during the sampling period.”

November 29, 2014 7:06 am

Simple Willis.
The station is treated as if it were a land station. Which means it’s going to be very wrong.
That’s why when you do regional or local work with the data you don’t use these stations.
Finally one last time.
The data isn’t adjusted.
It’s a regression. The model creates fitted values.
Fitted values differ from the actual data. Durrr
Lots of people like to call these fitted values adjusted data.
Think through it

Reply to  Steven Mosher
November 29, 2014 8:49 am

“Fitted values”
And the tailor always has “a certain flair”

Sciguy54
Reply to  Steven Mosher
November 29, 2014 8:51 am

“It’s a regression. The model creates fitted values.
Fitted values differ from the actual data. Durrr”
I don’t know, Steven…. “fitting” properly-collected and accurate data because it lies outside of some pre-determined range….. it causes discomfort to many honest observers. Doesn’t matter if the variation was due to a passing thunderstorm or katabatic winds…. it was legitimate, it existed, and it reflected a piece of the atmospheric energy balance puzzle at a given moment. A lot of folks will always have valid arguments for not smoothing these kinds of observations.

Steve from Rockwood
Reply to  Steven Mosher
November 29, 2014 9:31 am

When you have a near-continuous time series you don’t need regression fitting. Simple interpolation works just fine.
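As a quick toy check of that claim (a synthetic diurnal cycle, not buoy data): drop one hour, fill it by linear interpolation, and compare the daily means.

```r
# Toy check: lose one hourly reading from an idealized diurnal cycle,
# fill it by linear interpolation, and compare daily means.
hrs  <- 0:23
temp <- 12 + 3 * sin(2 * pi * (hrs - 9) / 24)      # idealized cycle, deg C
gap  <- temp
gap[15] <- NA                                      # one lost hour
filled <- approx(hrs[!is.na(gap)], gap[!is.na(gap)], xout = hrs)$y
c(full = mean(temp), interpolated = mean(filled))  # differ by ~0.004 deg C
```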

ferdberple
Reply to  Steven Mosher
November 29, 2014 12:14 pm

Fitted values differ from the actual data. Durrr
==============
Isn’t “differ from the actual data” the definition of adjusted?
Or are you arguing that changing a value by “fitting” so that it differs from the actual data in no way involves any adjustment?
A rose by any other name …

DHF
Reply to  Steven Mosher
November 29, 2014 1:48 pm

Steven Mosher
Let me first express that I appreciate that you follow this weblog and put in your comments. I regard it as very important that you, as a representative for the BEST temperature data product, participate in the discussions.
I am also happy to see that you write full sentences in your reply. I would wish however that you make some effort to put forward more complete arguments.
The intention of this reply is to try to explain why I think that your comments cannot be regarded as containing complete or proper arguments:
“The station is treated as if it were a land station. Which means it’s going to be very wrong.”
The first sentence does not say who is treating the station as a land station. Is it the BEST data product or is it Willis Eschenbach? As far as I can tell, Willis Eschenbach is not treating the data series as anything more than a temperature data series. Is it the BEST model that treats the station as a land station?
The second sentence does not give me any clue why “it” is going to be very wrong. It does not say what “it” is. How can a series of valid temperature measurements be wrong?
“That’s why when you do regional or local work with the data you don’t use these stations.”
Does this sentence mean that BEST does not use this station? It is very clear from your record, as presented by Willis, that BEST makes adjustments to this data series. Why do you perform adjustments to the data series if you do not use it? This really does not make any sense to me.
“Finally one last time. The data isn’t adjusted. It’s a regression. ”
This is what Wikipedia has to say about regression.
“In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable (or ‘criterion variable’) changes when any one of the independent variables is varied, while the other independent variables are held fixed.”
Hence, it seems reasonable to say that in regression analysis you do not perform any adjustment to your measurements. A regression is supposed to find a relationship between independent variables and a dependent variable. Willis put forward an example that shows that you perform adjustments to the data.
“The model creates fitted values. Fitted values differ from the actual data.”
Finally, I find a sentence that makes sense. The sentence seems to mean that the BEST data product creates fitted values, and that fitted values differ from actual data. This seems to be exactly what Willis has pointed out. Your fitted data differ from the measured data and, as I understand it, he can see no reason why real data should be replaced by an estimate unless you have identified a specific error and can provide a better estimate for the measurand than the measurement.
“Lots of people like to call these fitted values adjusted data.”
To me, this seems to be a correct observation. I will regard anything other than a measured value as an adjusted value.
“Think through it”
You can regard me as one who has thought about it, and I am not able to make any sense of what you write.
In “About us” at your web site I find the following:
“Berkeley Earth systematically addressed the five major concerns that global warming skeptics had identified, and did so in a systematic and objective manner. The first four were potential biases from data selection, data adjustment, poor station quality, and the urban heat island effect.”
Willis put forward a proper example where it seems that BEST performs data adjustments which do not appear to be justified. To me, this seems to be a serious observation that deserves a serious reply.
I wish that you would put some more effort into your replies, and make sure that you use full sentences and put forward proper arguments. I think the BEST temperature data product would be better represented if you took the time to formulate proper and complete arguments.

mpainter
Reply to  DHF
November 29, 2014 3:32 pm

DHF,
Yours is a thoughtful and very worthwhile comment. I hope that Mosher sees it.

Reply to  DHF
November 29, 2014 6:31 pm

+1

Reply to  Steven Mosher
November 29, 2014 3:24 pm

Well in my opinion these are precisely the kinds of stations that SHOULD be used for regional work.

Don K
Reply to  Terry
November 30, 2014 7:12 am

Terry,
I’m not remotely an expert in this, but I’m GUESSING that temperature data from a buoy is going to be heavily influenced by water surface temperatures and thus is likely to be cooler in the afternoon, warmer at night and less influenced by cloud cover than a nearby station on land. If someone tells you different, you should probably believe them, not me.

Reply to  Terry
November 30, 2014 12:42 pm

Yes.
To give you a shorthand explanation: T = C + W.
That is, in the Berkeley approach we don’t average temperatures. We create a model that PREDICTS temperatures: T = C + W. People continue to misunderstand this because they don’t look at the math.
T = C + W. We decompose the temperature into a climate portion (C) and a weather portion (W). The climate portion is estimated by creating a regression equation C = f(latitude, elevation); in other words, the climate for all locations is estimated as a function of that location’s latitude and elevation. On a global basis this regression explains over 80% of the variation. That is, 80% of a place’s temperature is determined by its latitude and altitude. Now of course you will always be able to find local places where 80% is not explained. Here are some other factors: distance from a body of water, geography conducive to cold air drainage, land cover. These are not present in the global regression, BUT if your goal were REGIONAL ACCURACY in the tens-of-kilometers range, then you might add these factors to the regression.
Continuing. With 80% of the variation explained by latitude and elevation, the remaining 20% is assigned to weather. So you have a climate field which is determined solely by the latitude and elevation (by season, of course), and you have a RESIDUAL that is assigned to the weather field, or W. Now in actuality, because the regression isn’t perfect, we know that the W field can contain some factors that should be in the climate: for example, cold-air-draining areas will have structure in their residuals. The regression model will have more error at those locations than others. In addition, the residuals near coasts will have more error in the climate field. So, if you want to get the local detail (at a scale below 50 km, say) expressed more perfectly, then you need to add variables to the climate regression. We are currently working on several improvements to handle the pathological cases: cases near water, cases in geographical areas that experience cold air drainage, and adding land-cover regressors, although these have to be indexed over time. On a global scale we know that improving the local detail doesn’t change the global answer. In other words, the local detail can be wrong, but fixing it doesn’t change the global answer. This is clear if you look at distance from coast: when you add that to the regression, some local detail will change, but the overall regression (r^2) doesn’t change. Some places in the climate get a little warmer and others get a little cooler.
Once folks understand that the approach is a prediction at its heart, a regression-based prediction, then a few things become clear. 1) The actual local data is always going to vary from the predicted local value. The predicted local values (the fitted values of the regression) are not “adjusted data.” As the documentation explains, these values are what we would expect to have measured if the site behaved as the regression predicted. 2) If you are really interested in a small area, then DON’T use data predicted from a global model. Take the raw data we provide and do a local approach. For examples, you can look at various states that use kriging to estimate their temperatures. At smaller scales you have the option of going down to 1 km or the higher resolutions required to get cold air drainage right. Further, you can actually look at individual sites and decide how to treat buoys, for example. We treat them as land stations. That means they will be horribly wrong in some cases when you look at the fitted values. Why? Because the fitting equation assumes they are over land! If you are doing a local dataset, then you would decide how you wanted to handle them. In the future I would hope to improve the treatment of buoys by adding a land class to the regression, and if that doesn’t add anything, then they would get dropped from land stations and put into a marine air temperature database.
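For readers who want to see the idea in miniature, here is a toy version of that T = C + W decomposition, with synthetic stations and made-up coefficients; it illustrates the approach described above but is in no way Berkeley Earth’s actual code:

```r
# Toy illustration of the T = C + W decomposition described above --
# synthetic stations, made-up coefficients, NOT Berkeley Earth's code.
set.seed(1)
n       <- 500
lat     <- runif(n, -60, 70)           # station latitudes, degrees
elev    <- runif(n, 0, 3000)           # station elevations, metres
weather <- rnorm(n, sd = 2)            # the part the regression can't explain
temp    <- 28 - 0.35 * abs(lat) - 0.0065 * elev + weather   # "observed" T

fit <- lm(temp ~ abs(lat) + elev)      # climate field: C = f(latitude, elevation)
summary(fit)$r.squared                 # most of the variance is explained by C
climate <- fitted(fit)                 # C  (the "fitted values")
w_field <- resid(fit)                  # W  (assigned to weather)
# The fitted value at a station is what the regression predicts, and it
# can differ noticeably from the raw readings at that station.
```

The fitted value at any one station is just the regression’s prediction, which is why it can sit well away from the raw readings at a coastal buoy.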

Bill Illis
Reply to  Steven Mosher
November 29, 2014 3:31 pm

Steven Mosher, for about the 8th time now, I am asking for a distribution of the detected breakpoints:
- the number of breakpoints that are detected as higher than the regional expectation, and how many are lower than the regional expectation;
- more importantly, the distribution of the same through time (by month or year) of the higher-than-regional-expectation breakpoints detected and the lower-than-regional-expectation breakpoints;
- it would be nice to know how much each affected the trend over time as well (but maybe that is asking for too much in computing resources).
The point being: are there more breakpoints detected on the downside than the upside, and has that changed through time? We should expect close to 50:50 in every single month throughout the entire record if the algorithm were working properly; it is described as something that should be very close to completely random. (To answer Willis’ question about a citation for my statements, it would sure be nice, and many would have expected, that there would be data available showing this that could be cited. I don’t know how you present temperature data in the manner that BEST has done without showing this important point; I’ve asked about 7 times for this information before today.)

Steve Fitzpatrick
November 29, 2014 7:17 am

Steve Mosher,
The question is whether this (and other) ocean station data are included in the land historical trend or not. Clearly (as you say) they are very wrong for land data, and so should not be included in land trends … but are they?

Steve from Rockwood
Reply to  Willis Eschenbach
November 29, 2014 9:34 am

Could be something as simple as formatting errors in the original data set with no human looking at the processed data records.

Reply to  Willis Eschenbach
November 29, 2014 12:59 pm

Like a lot of Climate Science, they overestimate their ability to interpret the data.

November 29, 2014 8:51 am

Willis: As usual … I’m afraid I had a “bias” myself when this Berkeley effort was proposed: i.e., that somehow, magically, they’d get the “result” they wanted.
As Dilbert always says to Dogbert, “You are an evil little dog!” (Image: tail WAG!). Using raw data, not manipulating it at all … and voilà: 34 years of several stations which not only do not show the alleged upward (land-based) trend, but almost the complete opposite.
Sun, cloud cover, weather patterns … NORMAL VARIATIONS account for everything. The thinning at the North Pole, MATCHED by ice growth at the South Pole (shelf, and THICK!).
Balance is maintained.
And the AGW claim is again unraveling as either “narrow vision” (i.e. select years, or manipulated data) … or anecdotal, with no consideration of “world wide” scoping.