DO-IT-YOURSELF TEMPERATURE RECONSTRUCTION

Guest essay by Dr Michael Chase

SCOPE

This article describes a simple but effective procedure for regional average temperature reconstruction, a procedure that you, yes you dear reader, can fully understand and, if you have some elementary programming skills, can implement.

To aid readability, and to avoid the risk of getting it wrong, no attempt is made in the article to give proper attribution to previous work of others, but a link is provided at the end to where a list of references can be found.

INPUTS/OUTPUTS

The inputs are records of raw monthly average surface air temperature for a region of interest, plus any station history information (metadata) that is available. Monthly rainfall totals are sometimes helpful in the analysis of temperature changes.

The outputs are, separately for each month, regional averages of two quantities:

· A: The variations of “typical” (moving-average) temperature, relative to an arbitrary reference year. Moving averages are typically taken over around 11 to 15 years.

· B: The fluctuations of temperature relative to A

The two outputs are usually plotted together as A+B (temperature variations) and A (moving average variations). Note that there is no concept here of a regional average absolute temperature.

There is some arbitrariness in the definition of moving averages A, and thereby in the definition of B (= RAW – A), but this issue goes away in A+B.
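As a minimal sketch of the A/B split for one station and one calendar month, here is a version that assumes a plain centred moving average whose window simply shrinks at the record ends. The function names are hypothetical. The article itself extends moving averages to the boundaries using extrapolated data and uses an outlier-resistant estimator.

```python
import math

def centered_moving_average(values, window=13):
    """Centred moving average of one calendar month's yearly values.

    Hypothetical simplification: near the ends the window is truncated to the
    data available (the article instead uses extrapolated station data).
    NaN values are skipped inside each window.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        chunk = [v for v in values[lo:hi] if not math.isnan(v)]
        out.append(sum(chunk) / len(chunk) if chunk else math.nan)
    return out

def decompose(raw, window=13):
    """Split raw values into A (moving average) and B (= RAW - A)."""
    a = centered_moving_average(raw, window)
    b = [r - m for r, m in zip(raw, a)]
    return a, b
```

Whatever window is chosen, A + B reproduces RAW exactly at the level of a single station. This is why the arbitrary choice of moving average has no effect on A+B.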

PROCEDURE OUTLINE

The procedure does a democratic average across stations of temperature changes and deviations, excluding periods deemed by the analyst to be anomalous, due to a variety of causes such as station moves, equipment changes and observer errors.

Initially, when presented with raw data with no indication of anomalous periods, the procedure simply produces an average over entire station records, which acts as a reference for the visual detection/confirmation of station inhomogeneities. Each station record is analysed in turn, looking for (or confirming metadata indications of) periods of anomalous temperature change, and any such periods are marked in files, which cause the software to exclude them from subsequent recalculations of the regional averages. When all stations have been processed, the final output is an estimate of the true regional averages, on the assumption that there were no systematic changes affecting the weather station network as a whole; any such changes must be dealt with by additional (bolted-on) processing.
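The exclusion step can be sketched as follows. The sketch assumes a hypothetical plain-text file format with one line per excluded period ("STATION START END", inclusive month indices, "#" for comments). It sets excluded months to NaN and then splits each record into the separate segments that the averaging treats independently. The article does not specify its file format, so every name here is illustrative only.

```python
import math

def parse_exclusions(lines):
    """Parse hypothetical 'STATION START END' lines (inclusive month indices)."""
    periods = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        station, start, end = line.split()
        periods.setdefault(station, []).append((int(start), int(end)))
    return periods

def apply_exclusions(records, periods):
    """Return copies of the station records with excluded months set to NaN."""
    cleaned = {}
    for station, values in records.items():
        values = list(values)  # copy so the raw record is left untouched
        for start, end in periods.get(station, []):
            for i in range(max(0, start), min(len(values), end + 1)):
                values[i] = math.nan
        cleaned[station] = values
    return cleaned

def split_segments(values):
    """Split a record into (start_index, values) runs of consecutive valid data."""
    segments, current, start = [], [], None
    for i, v in enumerate(values):
        if math.isnan(v):
            if current:
                segments.append((start, current))
                current = []
        else:
            if not current:
                start = i
            current.append(v)
    if current:
        segments.append((start, current))
    return segments
```

Each pass of the procedure would re-read the exclusion file, clean the records, and recompute the regional averages, so newly marked periods take effect on the next recalculation.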

KEY CONCEPTS

The procedure is described with the aid of a set of actual software outputs for a set of synthetic data inputs. The synthetic station records all have the same base “moving averages”, to which are added common sets of temperature fluctuations (referred to as “weather” in the figure titles). Each station record usually also has its own uncorrelated set of random deviations. One synthetic input has a persistent step change in temperature and a large transient perturbation.
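Synthetic inputs of this kind can be generated along the following lines. This is a sketch for one calendar month with annual values. The trend, noise levels, step size and transient timing are all hypothetical choices and are not the author's settings. They were picked only to mimic the description, which has a "high then low" step around year 50 with a transient just before it.

```python
import random

def make_synthetic_stations(n_stations=5, n_years=100, seed=1,
                            step_year=50, step_size=-0.8,
                            transient=(45, 49, 1.5)):
    """Generate one calendar month's yearly values at several synthetic stations.

    All parameter values are hypothetical, for illustration only.
    Returns (base, weather, stations).
    """
    rng = random.Random(seed)                                # reproducible
    base = [0.01 * y for y in range(n_years)]                # shared "moving average"
    weather = [rng.gauss(0.0, 1.0) for _ in range(n_years)]  # common fluctuations
    stations = []
    for _ in range(n_stations):
        # Base + common weather + each station's own uncorrelated deviations.
        stations.append([base[y] + weather[y] + rng.gauss(0.0, 0.3)
                         for y in range(n_years)])
    # Perturb the first station: a transient, then a persistent step change.
    t_start, t_end, t_size = transient
    for y in range(t_start, t_end + 1):
        stations[0][y] += t_size
    for y in range(step_year, n_years):
        stations[0][y] += step_size
    return base, weather, stations
```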

The following figure, showing monthly data, illustrates the averaging methods used to obtain the two outputs of the procedure:

[Figure: monthly data illustrating the averaging methods used to obtain outputs A and B]

Regional average temperature fluctuations (output B) are estimated by median averaging across all stations with valid moving averages, the median giving resilience to data errors and local weather extremes.

Besides being one of the desired outputs (output B) the regional average temperature fluctuations play a key role in detecting/confirming inhomogeneities: subtracting the fluctuations from station raw data enhances the signal-to-noise ratio.
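A minimal sketch of the median averaging and of the weather correction follows. It assumes per-station lists of B values (RAW minus that station's moving average) of equal length, with NaN where the moving average is invalid. The function names are hypothetical.

```python
import math
import statistics

def regional_fluctuations(station_b):
    """Median across stations of B (= RAW - A) at each time step.

    station_b: list of equal-length per-station lists; NaN marks an invalid
    moving average and is left out of that time step's median.
    """
    regional = []
    for t in range(len(station_b[0])):
        valid = [s[t] for s in station_b if not math.isnan(s[t])]
        regional.append(statistics.median(valid) if valid else math.nan)
    return regional

def weather_corrected(raw, regional_b):
    """Station raw data minus the regional fluctuations (RAW - B)."""
    return [r - b for r, b in zip(raw, regional_b)]
```

The median is what gives the resilience mentioned above. For example, the B values 0.5, 0.1 and 9.0 give a regional value of 0.5, so a single wild value of 9.0 has no effect.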

The regional-average moving averages are obtained by the “First Difference” (FD) method applied to the valid periods of station moving averages. Inhomogeneities are excluded simply by making them produce periods of invalid moving averages, effectively chopping station records into separate segments. The FD method forms inter-annual temperature differences, averages them across stations, then integrates those average differences forwards and backwards in time from an arbitrary reference year.
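The First Difference step can be sketched as follows. The sketch works on one calendar month at a time, with yearly station moving averages stored as equal-length lists and NaN marking invalid years. The function name and data layout are hypothetical and are not the author's software. Each inter-annual difference is averaged only over stations that are valid in both years, so a NaN gap splits a record into segments whose relative offset never enters the average.

```python
import math

def first_difference_average(station_ma, ref_index):
    """Regional moving-average variations by the First Difference method.

    station_ma: per-station lists of yearly moving averages (same length),
                NaN where invalid; a NaN gap chops a record into segments.
    ref_index:  index of the arbitrary reference year (result is 0 there).
    """
    n = len(station_ma[0])
    # mean_diff[t - 1] is the average change from year t-1 to year t,
    # over stations valid in both years.
    mean_diff = []
    for t in range(1, n):
        diffs = [s[t] - s[t - 1] for s in station_ma
                 if not math.isnan(s[t]) and not math.isnan(s[t - 1])]
        mean_diff.append(sum(diffs) / len(diffs) if diffs else math.nan)
    result = [math.nan] * n
    result[ref_index] = 0.0
    # Integrate forwards from the reference year...
    for t in range(ref_index + 1, n):
        result[t] = result[t - 1] + mean_diff[t - 1]
    # ...and backwards.
    for t in range(ref_index - 1, -1, -1):
        result[t] = result[t + 1] - mean_diff[t]
    return result
```

For example, take s1 = [10.0, 10.5, 11.0, 11.5] and s2 = [20.0, 20.5, nan, 31.5]. The 11-degree jump across the gap in s2 is ignored, and the result with reference index 0 is [0.0, 0.5, 1.0, 1.5]. No offset between the two stations' absolute levels is ever computed.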

Note in the figure above that the moving averages continue right up to the boundaries. This feature allows all data to be used; there is more on it below.

The figure above does not show any sources of error, such as from non-climatic temperature shifts caused by station moves and equipment changes. The following figure shows the main data display used in the visual detection/confirmation of anomalous temperature changes:

[Figure: main data display used in the visual detection/confirmation of anomalous temperature changes]

The figure above shows 12-month moving averages of regional-weather-corrected temperature variations (RAW – B) for a station (in blue), together with similar data for station and regional (red) moving averages. There is a step change in temperature around year 50, but the exact date of the step is unclear.

The following figure illustrates the method used to estimate the month of a step change in temperature; it is essentially the multi-month version of the figure above:

[Figure: multi-month display used to estimate the month of a step change]

The monthly data figure above reveals that the step change occurred early in January of year 50, with all data before then being “high” and data after being “low”. The figure above was produced with the step change marked in a file, which caused the software to break the moving averages (the red curves) at that point, thereby preventing the step change from distorting the regional averages. The data display above is also used to check for inhomogeneities with strong seasonality, since some step changes are visible in only a few calendar months.
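The article locates step months by eye. As an illustrative automated aid only, the month of a single step in a station-minus-reference series can be estimated with a least-squares two-level fit. This is a swapped-in technique and not the author's method. The sketch assumes a gap-free series and a hypothetical minimum segment length.

```python
def best_step_index(diff_series, min_seg=3):
    """Index k minimising the squared error of a two-level (single step) fit.

    The fit uses mean(diff_series[:k]) before k and mean(diff_series[k:]) from k
    onwards. diff_series is assumed to be a gap-free station-minus-reference series.
    """
    n = len(diff_series)
    best_k, best_sse = None, float("inf")
    for k in range(min_seg, n - min_seg + 1):
        left, right = diff_series[:k], diff_series[k:]
        m1, m2 = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((x - m1) ** 2 for x in left)
               + sum((x - m2) ** 2 for x in right))
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k
```

The same fit could be applied separately to each calendar month to look for steps with strong seasonality, which the article checks visually.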

So far we appear to be having an almost free algorithmic lunch: no mention of auto-detection or of temperature adjustments, a one-size-fits-all reference (the regional averages), and no need for correlation calculations or for the calculation of temperature offsets to allow different station records to be averaged. Now is the time to mention the downside of the First Difference method of averaging temperature records: end-point errors associated with truncated transient perturbations.

Transient perturbations that both start and stop within a segment of station data are not much of a problem. Even if they survive the outlier-resistant moving average estimation algorithm, they will have matching temperature up and down shifts, so they have almost no net impact on trends; the impact is not exactly zero because of the time-varying weights applied to stations in the regional average. A potentially large problem with transient perturbations arises when they occur at station record start/stop times, at internal boundaries caused by periods of missing data, or at boundaries created when records are chopped into separate segments. When a transient perturbation is truncated by a boundary it no longer has matching up and down shifts, so it may distort the trend.
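The end-point problem can be made concrete with a toy First Difference average. The toy uses plain annual values, uses the first year as the reference, and applies no smoothing or extrapolation, so it shows the unmitigated effect. The example has two stations on a flat true climate, and one of them has a +1.0 transient. Everything here is hypothetical illustration data.

```python
import math

def fd_regional(stations):
    """Minimal First Difference average with the first year as reference."""
    out = [0.0]
    for t in range(1, len(stations[0])):
        d = [s[t] - s[t - 1] for s in stations
             if not (math.isnan(s[t]) or math.isnan(s[t - 1]))]
        out.append(out[-1] + sum(d) / len(d) if d else math.nan)
    return out

nan = float("nan")
flat = [0.0] * 10
# Transient starts and stops inside the segment: matching up and down shifts.
complete = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# Same transient, but a segment boundary (NaN) removes the down shift.
truncated = [0.0, 0.0, 0.0, 1.0, 1.0, nan, 0.0, 0.0, 0.0, 0.0]

print(fd_regional([flat, complete])[-1])   # 0.0
print(fd_regional([flat, truncated])[-1])  # 0.5
```

In the truncated case the regional average keeps a permanent +0.5 offset after the transient, even though the true climate is flat.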

The following figure illustrates the resilience of the procedure to truncated transient perturbations:

[Figure: resilience of the procedure to truncated transient perturbations]

The blue and purple data are identical leading up to year 50, but the blue data has a transient perturbation just before its step change at that time. The transition defined for the blue data at year 50 creates a truncated perturbation, which would have distorted the regional average trend if the First Difference method had been applied to unsmoothed temperature data. The procedure avoids major trend distortion in this example through the smoothing inherent in moving averages and through the use of extrapolated station data derived from the regional moving averages. The extrapolated data allows moving averages to be computed more accurately right up to boundaries, with greater resilience to transient perturbations.
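The article does not spell out how the extrapolated data are derived beyond "from the regional moving averages". One plausible scheme is sketched here, and it is an assumption rather than the author's algorithm. The station segment is padded on each side with the regional moving average plus the station-minus-regional offset estimated over a few years at that end. A centred moving average then has a full window right up to the segment boundary. A plain mean is used for brevity instead of an outlier-resistant estimator, and the regional moving averages are assumed to have no gaps.

```python
def moving_average_to_boundaries(segment, regional_ma, start, window=13, overlap=5):
    """Centred moving average of a station segment, padded with extrapolated data.

    segment:     station values (one calendar month) for years start .. start+len-1
    regional_ma: regional moving averages for all years (assumed gap-free)
    overlap:     years used to estimate the station-minus-regional offset at each end
    Hypothetical scheme: extrapolated value = regional MA + local offset.
    """
    half = window // 2
    end = start + len(segment)  # one past the last year of the segment
    k = min(overlap, len(segment))
    left_off = sum(segment[i] - regional_ma[start + i] for i in range(k)) / k
    right_off = sum(segment[-k + i] - regional_ma[end - k + i] for i in range(k)) / k
    lo, hi = max(0, start - half), min(len(regional_ma), end + half)
    padded = ([regional_ma[y] + left_off for y in range(lo, start)]
              + list(segment)
              + [regional_ma[y] + right_off for y in range(end, hi)])
    out = []
    for i in range(start - lo, start - lo + len(segment)):
        j0, j1 = max(0, i - half), min(len(padded), i + half + 1)
        out.append(sum(padded[j0:j1]) / (j1 - j0))
    return out
```

For a station that runs parallel to a linear regional moving average, this returns the station's own values at every year of the segment, including the end years. A window truncated at the boundaries would instead be biased there.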

QUALITY CONTROL

In general there is no need for the extensive Quality Control adjustments that feature in some methods, but in some cases a small amount is beneficial when dealing with sparse data periods. Data deemed to be invalid, and which might lead to errors if left in place, can be set manually to NaN (Not a Number). The NaNs created, and many of those present originally, are auto-infilled using valid data either side and the latest estimate of the regional average weather fluctuations.
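A minimal sketch of such infilling is shown below. The weighting is an assumption, since the article names the ingredients but not the formula. The station-minus-regional-fluctuation difference is interpolated linearly between the nearest valid values either side, and the regional fluctuation is then added back. Gaps at the start or end of a record have no valid value on one side and are left as NaN. The function name is hypothetical.

```python
import math

def infill(station, regional_b):
    """Fill internal NaN gaps in a station series (hypothetical scheme).

    Linearly interpolates (station - regional_b) between the nearest valid
    values either side of each gap, then adds regional_b back. Leading and
    trailing gaps are left as NaN.
    """
    out = list(station)
    valid = [i for i, v in enumerate(station) if not math.isnan(v)]
    for a, b in zip(valid, valid[1:]):
        if b - a < 2:
            continue  # no gap between these two valid points
        ra = station[a] - regional_b[a]
        rb = station[b] - regional_b[b]
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            out[i] = regional_b[i] + ra + w * (rb - ra)
    return out
```

Adding the regional fluctuations back means that the infilled months follow the regional weather instead of a straight line between the neighbouring values.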

MORE INFORMATION

There is a dedicated website for this procedure, providing more information on the algorithms and software, together with real data examples:

https://diymetanalysis.wordpress.com

The website also provides references for the original First Difference method, which was just a data averaging method, and for its use in conjunction with the removal of periods of anomalous temperature change.

BIOGRAPHICAL INFORMATION

Dr Michael Chase has a PhD and several years of postdoctoral research experience in theoretical physics. He also has around 30 years of experience in developing signal processing algorithms for acoustic sensor systems.


125 thoughts on “DO-IT-YOURSELF TEMPERATURE RECONSTRUCTION”

  1. I no longer trust the official stations. My winter low temperature has been over 2° colder, on average, than the official low temperature.

    • About 10 years ago when noticeable global cooling seemed to be happening, I did an experiment that I wish I had kept the results of. Our main weather source had bought into CAGW early and I began to notice that their 14 day weather forecasts tended to predict higher temperatures by the end of the period than what resulted. I jotted down notes in front of the TV and analyzed my results and used them to make my own temperature predictions 2 weeks into the future by simply lopping off average differences. I did this for a couple of months and then intermittently over about half a year. As I recall, about 65% of the time, my forecasts were more accurate than the official. Apparently, the met folks have also fiddled the “feels like” temperatures of the summer months upward – I haven’t checked this with changes in the methods they use, but I do notice that they talk more about feels like temperatures than actual these days. I wonder if they underplay the “feels like” on the cold side (what I as an old Manitoban used to call the “Wimp Chill” factor).

  2. The only thing stopping me producing my own global average temperature is a lack of access to the data – not someone telling me yet another way to average what I can’t get hold of.
    If we can get the global data – please tell me where.

    • “If we can get the global data”
      and NOT the stuff that has already been through one of the “adjustment” processes. !!
      One then needs to look at the quality of the data. ..
      A lot of surface temperature data is just monumental CARP with massive non-climate influences.. !!
      It just gives the “adjusters” more room to “adjust”

      • using the metadata of CRN as a filter i classified all stations. if they matched crn in…population distance from airport, built human surfaces and distance from the nearest city ..then i classified them as crn-like.
        found 15000 stations that matched.

      • we can test the crappiness.
        take crn
        take the crappy stations
        compare
        adjust
        compare again.
        crn runs a bit hotter.
        after adjustment they match

      • I don’t give a stuff what you have pretended to do Mosh.
        You have proven yourself NOT fit for the purpose.

      • Sorry Andy.
        Facts are facts.
        Further. I never touch booze.
        I’m sure the moderators will inform you of the policies about lying about folks

      • Read your posts mosh. (Stop with the attacks on the person!) MOD
        (You are wasting my time once again to snip your comments; keep doing this and it will get you BACK on moderation) MOD

      • How many times did you use the word “adjust”, mosh
        So much scope for “adjustment” 😉
        Best that you just stop posting, you are giving the game away. 🙂

        • Actually AndyG55, it will be YOU who will be in the situation of “Best that you just stop posting…” because I’m getting complaints across the board about you, from both sides, from people in the middle, and from my moderation staff.
          You’ve been warned repeatedly. So, as of this moment, you are on double non-secret probation. All of your comments now go directly to the bit bucket.

    • global data has been freely available for over a decade.
      i use ghcn daily raw.
      berkeley earth has this plus thousands of other raw stations

    • Go to the NOAA site. Hunt around for the Global Historical Climate Network data set. That’ll be the land surface station data, I believe. You’ll want the raw daily data. Not sure why anyone would want the monthly averages if starting truly from scratch (as available at least). The file size is massive though.
      Alternatively, you can head to Tony Heller’s website and download his program. He posted a YouTube video on how to use the program. It should allow you to generate plots of the raw data.

    • GHCN M v3 unadjusted is here (all years). It’s about 12 MB to download.
      ERSST V5 is here. There is an ascii 1.1MB file for each year. Else you can get ncdf, with compression – 132 kB per year.

      • Nick did you ever come up with the pictures and local history of those 6 stations I asked you for ??
        NOPE. !!
        You ran away and said IT DIDN’T MATTER.

      • Evasion yet again, hey Nick.
        To be expected.
        You have just confirmed that you have NO IDEA of the quality of the data you are using, and DON’T CARE
        Thanks for that. 🙂
        TGI – TGO.

      • Pretty interesting the level of stuff Andy gets away with in the echo chamber.
        CTM…would never allow this
        Reply: Sorry Mosh, you know as well as anyone I’m not around much anymore. Looks like Anthony’s dealt with it in my absence ~ctm

      • You can look back and confirm that Nick used those exact words.
        He does not care about data quality. End of story.

      • Well, I’ve been away from moderation for the last 24 hours, and I agree. AndyG55 stepped over the line one too many times, and now is on permanent moderation for being disruptive and attacking people personally.

      • GHCN M v3 unadjusted is here (all years).

        That is a priceless oxymoron! By resorting to peculiar splicing of UHI-corrupted records (e.g., Cape Town) and other data changes to effect regional “homogenization,” GHCN v.3 is very far removed from sensible, rigorous treatment of available data. Contrary to the impression created here, there is no scientifically acceptable substitute for long, intact, uncorrupted records of actual temperature measurements.

      • “long, intact, uncorrupted records of actual temperature measurements”
        You need to read more carefully. I linked to GHCN V3 monthly unadjusted. And that is exactly what they provide.

      • “Here’s the example of GHCN V.2 data”
        No, it isn’t. You are not going to the original. The GISS interface allows various processing options. You appear to have chosen the option that says “after combining sources at the same location”, which means in this case that GISS combined the records for Observatory and Airport. GHCN V2 in such cases provided what it called duplicates. It gave each record independently, even though in many cases they were just copies of the same readings, where for example they had a local log and a head office copy. But it also included cases like Cape Town with two stations having the same WMO number. They added a suffix.
        This was done basically because they hadn’t had time to do a thorough sorting to resolve what were clerical duplicates and real location differences. That came with v3.
        If you click on the raw data option, you’ll see six of those duplicates. The main one, corresponding to what is now V3, is here.

      • You are not going to the original. The GISS interface allows various processing options. You appear to have chosen the option that says “after combining sources at the same location”, which means in this case that GISS combined the records for Observatory and Airport.

        While the V.2 I chose doesn’t present the discernible “original” data for Observatory and Airport, it does provide a quite seamless SPLICE of the two records, professionally done, IIRC, by the SA Weather Service. It is a far more reasonable treatment of historical temperature series in Capetown than the blindly spliced original data, which you show from newly-available V. 3.2 [sic!]. Moreover, it certainly is much more credible in its multidecadal behavior than the arbitrarily HOMOGENIZED series presented as V.3 GISTEMP station data, which perpetrates the fiction of a strong, persistent linear trend:
        https://data.giss.nasa.gov/cgi-bin/gistemp/show_station.cgi?id=141688160000&dt=1&ds=12

  3. Come on, if it is so easy, just give me the answer.
    Might need to remind me of the question however 🙂

  4. Please Anthony, send this as a Tweet to Mike Mann. Please. His reaction would be hilarious.
    I’ve already done a very simplistic temperature graph for the town near where I live that has records back to the early 1900s. Very simplistic. Just used highest and lowest recorded temperatures for each year.
    Highest temperatures have declined overall, while the lowest have increased (UHI). I wasn’t shocked.

    • its called global warming for a reason.
      think of it like the pay gap question
      if i told you i made more than most women would you conclude there was a pay gap.
      nope.

      • oh I thought it was now called climate change or global weirding or climate disruption or
        think of it like the unicorn question
        if you told me you made more than most women i wouldn’t believe you
        yes.
        by the way I did my local weather station incidentally to something else i needed weather data for
        and honest
        just like your BEST boss i used to be a skeptic but now i’m not
        now i know all the adjusted numbers are junk and they still don’t show anything unusual

      • What a childish and irrelevant analogy.
        You really don’t have a clue what you are talking about, do you, Mosh?

      • Actually it is really a Northern Hemisphere warming since nearly ZERO warming in the Southern Hemisphere is what you can get without the error bars.

      • And most remaining NH real data shows that the late 1930s, early 1940s was not a dissimilar temperature to now.
        That has all been REMOVED by the left-wing totalitarian AGW agenda, that people like Mosh and Nick represent/support..
        The TRUTH is irrelevant to them so long as they can bend the data how they need it.

      • Global Warming became Climate Change because the “Big Lie” could not handle the Pause.
        Have another drink Mosh.
        But sober-up before you post again.
        Cheers.

      • I find Mosher’s posts entertaining. The lack of capital letters, fragmented sentences and hidden meaning are like e.e. cummings poems, all dedicated to CAGW. A climate poet, arranging and adjusting lines and words to tell his story, just like back in the lab – though there are more pauses in his posts than in his climate story.
        And these poems are delivered Eric Cartman style, with complete and total authoritaayy, and that’s fun.
        So I look forward to these mystical messages from the Oracle of BEST.

      • Gosh Anthony weird your mods allow untrue personal attacks.
        (Not really, sometimes they are not seen for a while, you are correct that you are being picked on inappropriately) MOD

    • That is exactly what you see everywhere there are still records. Max going slightly down, mins going slightly more up. Outbreak of mildness.
      We are clearly all doomed!

      • Doomed is certainly the message from the media, but actually the true message is that the winter warming is, in the case of the UK at least, a benefit, whether the change is natural, human-caused, or, as the UK Met Office believes, a mixture of the two.
        How often recently have we had references to serious, peer-reviewed papers that point out that cold kills more than heat, and that even in the case of extreme forcing the rise in heat deaths globally is more than compensated by the drop in mortality from the cold (speaking objectively; subjectively, any person’s death from lack of air conditioning, from dehydration or from inadequate heating is to be deplored).
        If the result of AGW is, as many personal experiences suggest, a reduction in the severity of the cold weather, the human benefits are enormous. Not least in the UK, where this unusually cold winter (and it appears to be about to return) has caused a severe problem in the hospitals and surgeries and, when the statistics are released, will probably show an excess of mortality, particularly amongst the most vulnerable, over the expected figures.
        In a modern society, and one that also pretends to retain elements of Christian belief, one would expect the Govt to want to protect the most vulnerable as far as economically practicable. From a UK perspective, wasting money on subsidies to renewables to prevent extremes of summer heat that, in the UK, never happen, is surely not justifiable by any code of moral conduct or by consideration of the scientific evidence.
        The counter argument, of course, is that the UK is only part of the global environment, and in other regions the benefit of milder winters and unchanged summers may not be the case. I would expect the UK experience, however, to be valid for much of the temperate zones.

  5. Fascinating, however, I must admit my DIY skills are limited to Home Construction, and Automotive repair. In fact, I just finished replacing the water pump in my 2011 BMW 335i Conv. The convertible adds one more layer of difficulty in that you have to remove a pair of steel diagonal braces necessary to stabilize the convertible frame. Only then do you gain “access” to the ridiculously inaccessible electric water pump. I am certain that the BMW Factory Robots had a much easier time putting the original on the engine block.

    • The average time to replace a water pump is 2.3789 hours. This is a global average of all known vehicle makes and ages. Interestingly it has increased by 0.023 hours per annum over the last 40 years illustrating your point about increased complexity in modern automobile engineering. Recent data suggest that the water pump time anomaly is increasing at a faster rate and our models predict that changing a water pump in the year 2250 could typically take 2 whole days.

      • That’s the “tipping point”! After that it’s too late and the water pump can’t be changed within the lifetime of the vehicle.

      • This of course shows the ridiculousness of using averages. In the case of my first car, the time to replace the water pump was either zero or infinite using any time units selected. And yes the car was water cooled.
        There was of course no water pump but circulation was by thermo-siphon. The infinite time to replace was the time taken looking for a non-existent pump.

      • mice can breed a new generation every 16 days.
        in a little over a year, this will cover the planet 20 ft deep in mice
        fear the global mousing!

  6. “The inputs are records of raw monthly average surface air temperature…”
    If the original temperatures are diurnal, then the monthly averages are NOT raw. They have been filtered.

  7. Dr Michael Chase … has around 30 years of experience in developing signal processing algorithms for acoustic sensor systems.

    That means he actually knows what he’s doing. That particular application works or does not work and its performance is easily measured. That’s a welcome contrast with a lot of scientific work. Folks throw around great mounds of mathematics while exhibiting obvious misunderstandings of basic concepts. The following quote amuses me:

    Statistics are not more accurate when they are improperly used … link

    Some folks seem to think that more math equals more credibility. Hey dudes; GIGO!
    This comment started out celebrating the fact that Dr. Chase demonstrably knows what he’s doing. It evolved into a rant about something that makes me really grumpy. Sigh.

  8. “This article describes a simple but effective procedure for regional average temperature reconstruction, a procedure that you, yes you dear reader, can fully understand and, if you have some elementary programming skills, can implement.”
    “regional” what is meant by that?

  9. I have a 40-year record of the date of the first snow for my local area.
    Here is the raw data.
    77/78 21st Nov
    78/79 27th Nov
    79/80 19th Dec
    80/81 28th Nov
    81/82 8th Dec
    82/83 16th Dec
    83/84 11th Dec
    84/85 2nd Jan
    85/86 12th Nov
    86/87 21st Nov
    87/88 22nd Jan
    88/89 20th Nov
    89/90 12th Dec
    90/91 8th Dec
    91/92 19th Dec
    92/93 4th Jan
    93/94 20th Nov
    94/95 31st Dec
    95/96 17th Nov
    96/97 19th Nov
    97/98 2nd Dec
    98/99 5th Dec
    99/00 18th Nov
    00/01 30th Oct
    01/02 8th Nov
    02/03 4th Jan
    03/04 22nd Dec
    04/05 18th Jan
    05/06 28th Nov
    06/07 23rd Jan
    07/08 23rd Nov
    08/09 23rd Nov
    09/10 17th Dec
    10/11 25th Nov
    11/12 5th Dec
    12/13 27th Oct
    13/14 27th Jan
    14/15 26th Dec
    15/16 21st Nov
    16/17 18th Nov
    17/18 29th Nov
    So all I need now is a warmist to “adjust” it into a graph that proves that snow is becoming a thing of the past. 🙂

    • Now all we need is the Fourier analyses, but make sure you run the autocorrelation (AR1) only after you have detrended the data.

      • And all your fancy schmancy terms are too much for my little head to follow … so … I just call it statistical ju jitsu. Taking the raw data and making it fall where you want it to.

      • Yes, and time of day adjustments. When did the snow start falling? Were any snowflakes triggered or did they all glide harmoniously into safe spaces? And adjustments for whiteness are required.

      • We have a number of squirrels we feed regularly, one has two white marks on his back so my wife nicknamed him “Patch”. We throw peanuts out twice a day, Patch often comes to the window to “ask” for his!
        Being curious I conducted a scientific study of the amount of food the squirrels eat each day and plotted it on some graphs. It’s very erratic but does lend itself to Fourier analysis. The results are interesting.
        My wife says it’s just nuts.

      • Studies show climate change may, might , probably causes patchy loss of colouration in peanut fed squirrels.
        *based on one squirrel and the tree stump rings at the end of the garden where I spilled fertilizer that time.

    • For a bit of fun I did a graph with pen and paper to see how it would look.
      As you would expect, the graph does jump about a bit between years. But there does seem to have been a slight trend towards earlier first snowfall taking place, which was not what I was expecting.
      But it does follow what has been happening to the NH snow cover extent over the last 50 years. Where increasing snow cover during the fall/winter also points to the same sort of trend.

    • Just done a quick graph of the data; as you would expect, it does jump about a bit between years. But there does appear to be a slight trend towards earlier first snowfalls, which was a bit unexpected. But it does follow what’s been happening with the NH snow cover extent during the fall/winter, which also points to the trend that snow cover has started earlier in the season during the last 50 years.

    • Strictly speaking if it snows in January that’s the first snow of the year. Give that to an adjuster and you may find the average is around September, or June, or…wherever you want it to be.

      • Rhoda
        My record is based on the first snow of the winter season, which of course falls between two years, rather than the first snow of the year. That is the reason I write down 2 years for each recording. Here in England we have summers that clearly divide the winter seasons, so I am able to date the first snowfall to the fall/winter.

  10. It’s certainly possible to do your own temperature reconstruction. It’s something I think sceptics should do, using unadjusted data if preferred. It’s a way of showing how much difference adjustment makes (very little). I do it regularly; there is a review of the practice and methods here.
    This post gets a lot of things right. It’s not very clear how large the envisaged regions are. New Zealand is given as an example. If you want to go for something larger, you need a scheme for area weighting the average. Otherwise you end up with a lot of US and little Africa.
    There is always a difficulty with fixed term averages where stations might not have data there. The First Difference method described here has been popular, but I think now not so much. It can go wrong. I use a least squares fitting method, also now used by BEST. Some of the issues are discussed here.
    The method here seems to involve homogenisation, i.e. spotting breaks. I don’t have a feel for how good the method is, but I think if you want that, better use NOAA adjusted. If you do want to locate breaks, I think the use of trends is more sensitive, as described here.

    • Tony Heller does a lot of calculations with GHCN data.
      You should pay more attention Nick, You might learn something.
      Remove blinkers first though.

      • Andy, remember what several people did to expose Nick when he made dishonest posts over what Tony made on his charts last year?
        He still hasn’t improved………

      • And you have been “disingenuous” with the data.
        You have been shown up in this respect TOO MANY TIMES to be taken as anything but a minor propagandist
        There is NO REASON to believe TH does it badly, and EVERY reason to believe that you bend and twist the data to suit your agenda,

      • “You might learn something.”
        I have now, for six years, been publishing my calculation of the monthly surface temperature anomaly. I do that usually within the first ten days, and always before any others are published. When GISS comes out, I compare both numbers and maps. They agree very well. You can find those posts listed from the index here. Near the bottom of the table, the button “TempLS monthly” will bring up links to my posts, and “GISS Monthly” will bring up links to GISS.
        Do you ever see Tony Heller testing his calculations against something else?

      • AndyG,
        I post. Then comes GISS, days later. We agree. Sounds like somebody’s doing something right.
        Do you ever contribute anything? Calculations? Explanation? Anything but noisy bluster?

      • I have to say I’m getting tired as well of Andy’s personal attacks on Nick. Nick and Steve are polite and I appreciate their contributions to this site. I would hate to see them leave because some members seem to have a personal vendetta against them. Nick/Steve and the rest of us might disagree on conclusions, but it can only help that we share data and methods so that we can have a clear understanding as to why we’ve reached different conclusions from the same data.
        That’s the important thing, after all.
        Just my 2¢.

      • “Andy, remember what several people did to expose Nick when he made dishonest posts over what Tony made on his charts last year?
        He still hasn’t improved………”
        I remember it well, because I asked a simple question about Tony’s work. The next thing you know, Nick makes accusations against Tony H and then TH takes those assertions and proves them false, one by one. Nick was invited to TH’s blog to do a riposte, or an apology. Nick never showed up. I concluded that Nick is not trustworthy.

      • Nick,
        Getting similar results to GISS says you and GISS are using similar methods. Similar results do nothing to prove whether the method itself is correct, and that question is at the root of the entire conversation.

  11. NOAA’s Global Historical Climatology Network (GHCN) data set is available at this URL:
    https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/global-historical-climatology-network-ghcn
    There are several data sets, but the two of interest to me are the daily and monthly (v3) data sets. NOAA describes the GHCN thusly:

    The Global Historical Climatology Network (GHCN) is an integrated database of climate summaries from land surface stations across the globe that have been subjected to a common suite of quality assurance reviews. The data are obtained from more than 20 sources. Some data are more than 175 years old while others are less than an hour old. GHCN is the official archived dataset, and it serves as a replacement product for older NCEI-maintained datasets that are designated for daily temporal resolution (i.e., DSI 3200, DSI 3201, DSI 3202, DSI 3205, DSI 3206, DSI 3208, DSI 3210, etc.).

    The daily and monthly datasets are described as well:

    GHCN Daily
    GHCN (Global Historical Climatology Network)-Daily is an integrated database of daily climate summaries from land surface stations across the globe.
    GHCN Monthly
    Temperature dataset that contains monthly mean temperatures and is used for operational climate monitoring activities.

    The daily data set is truly the raw data.

    GHCN (Global Historical Climatology Network)-Daily is an integrated database of daily climate summaries from land surface stations across the globe. Like its monthly counterpart (GHCN-Monthly) , GHCN-Daily is comprised of daily climate records from numerous sources that have been integrated and subjected to a common suite of quality assurance reviews.
    GHCN-Daily contains records from over 100,000 stations in 180 countries and territories. NCEI provides numerous daily variables, including maximum and minimum temperature, total daily precipitation, snowfall, and snow depth; however, about one half of the stations report precipitation only. Both the record length and period of record vary by station and cover intervals ranging from less than a year to more than 175 years.

    The monthly data set has been subjected to more processing, though there are two versions: the unadjusted and the adjusted. Version 3 is the current version. I have compared the monthly averages computed from the daily data with the monthly data, and they match; what I don’t like about the monthly data is that one can’t calculate the standard deviation of each month’s daily values, and those can be pretty large, over 4°C.
    It’s also frustrating to work with the monthly data because there is no metadata file describing the stations, at least none I’ve found so far. The daily data has station IDs in the format USC00392797. The monthly data station IDs are in the format 42512345000, but only some are cross-indexed via the metadata for the daily files. That metadata file has the WMO ID along with the GHCN ID, which will match the monthly data ID IF that ID ends with three zeroes. Confusing? You bet.
    Anyway, if you’re sufficiently motivated, you can grab a whole lot of data from NOAA, and if you’re really energetic, the Oracle Corporation allows you to download full enterprise editions of its database to load on your home computer for personal use. You have to sign up for an account, but it costs you nothing as long as you only use it for development and personal use.
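A minimal sketch of getting a monthly mean and standard deviation straight from GHCN-Daily, assuming the fixed-width .dly record layout described in NOAA's GHCN-Daily readme: an 11-character station ID, year, month and element code, then 31 day-groups each made of a 5-character value (tenths of °C for temperatures, -9999 for missing) plus three 1-character flags. The `min_days` threshold and the choice to drop any day carrying a quality flag are hypothetical choices for illustration, not NOAA rules.

```python
import statistics

MISSING = -9999  # GHCN-Daily missing-value sentinel (assumed from the readme)


def parse_dly_line(line):
    """Split one fixed-width GHCN-Daily record into header fields and a list
    of 31 daily values in degrees C (None where missing or quality-flagged).

    Assumed layout: ID cols 1-11, YEAR 12-15, MONTH 16-17, ELEMENT 18-21,
    then 31 groups of VALUE (5 chars) + MFLAG + QFLAG + SFLAG (1 char each).
    """
    station = line[0:11]
    year = int(line[11:15])
    month = int(line[15:17])
    element = line[17:21]
    values = []
    for day in range(31):
        start = 21 + day * 8
        raw = int(line[start:start + 5])
        qflag = line[start + 6:start + 7]
        # Drop missing days and days that failed a quality check (non-blank QFLAG)
        if raw == MISSING or qflag.strip():
            values.append(None)
        else:
            values.append(raw / 10.0)  # tenths of a degree -> degrees C
    return station, year, month, element, values


def monthly_stats(values, min_days=20):
    """Mean and sample standard deviation of the valid daily values, or None
    if fewer than min_days are valid (hypothetical completeness rule)."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_days:
        return None
    return statistics.mean(valid), statistics.stdev(valid)


# Usage, assuming a local copy of a .dly file (hypothetical path):
# with open("USC00392797.dly") as fh:
#     for line in fh:
#         station, year, month, element, days = parse_dly_line(line)
#         if element == "TMAX":
#             print(year, month, monthly_stats(days))
```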
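The cross-indexing can be sketched as follows, assuming the 11-digit GHCN-M v3 ID is a 3-digit country code, then the 5-digit WMO station number, then a 3-digit modifier (so 42512345000 would carry WMO number 12345). The `daily_wmo` dictionary, mapping GHCN-Daily IDs to WMO numbers, is a hypothetical input you would have to extract from the daily station metadata yourself, and the example IDs are made up.

```python
def wmo_from_monthly_id(monthly_id):
    """Return the embedded 5-digit WMO number, or None if the trailing
    3-digit modifier is not '000' (assumed v3 layout: CCC WWWWW MMM)."""
    if len(monthly_id) != 11 or not monthly_id.isdigit():
        raise ValueError(f"not an 11-digit GHCN-M v3 ID: {monthly_id!r}")
    if monthly_id[8:] != "000":
        return None
    return monthly_id[3:8]


def cross_index(monthly_ids, daily_wmo):
    """Map each monthly ID to a GHCN-Daily ID via the shared WMO number.

    daily_wmo: dict of GHCN-Daily ID -> WMO number string ('' if none).
    Monthly IDs without a match map to None.
    """
    by_wmo = {wmo: daily_id for daily_id, wmo in daily_wmo.items() if wmo}
    return {mid: by_wmo.get(wmo_from_monthly_id(mid)) for mid in monthly_ids}


# Hypothetical IDs for illustration only
print(cross_index(["42512345000", "42512345001"], {"USW00099999": "12345"}))
```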
    Hope this helped. Have fun!

  12. You need to do what all other algorithm developers do.
    We double-blind test.
    Basically, there are 8 standard data sets.
    One is ground truth.
    The other 7 have had errors added to them.
    You run your algorithm on all 8,
    and you test:
    Did you adjust the series toward the ground truth?
    Did your algorithm leave the true series untouched?
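The benchmarking idea can be sketched in a few lines. Everything here is an assumption for illustration: the synthetic "truth" series, the error model (one step error per corrupted copy), and `toy_homogenize`, a deliberately crude single-break detector standing in for whatever algorithm is under test. In a genuinely blind test the person running the algorithm would not know which series is clean or where the errors were planted; here everything is visible only so the script can score itself.

```python
import random


def rmse(a, b):
    """Root-mean-square difference between two equal-length series."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


def make_benchmark(n=600, n_corrupt=7, seed=1):
    """Return a clean 'ground truth' series and 8 named cases: the truth
    itself plus n_corrupt copies, each with one artificial step error
    before a random break point (hypothetical error model)."""
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, 0.2) for _ in range(n)]
    cases = [("truth", truth)]
    for k in range(n_corrupt):
        brk = rng.randrange(n // 5, 4 * n // 5)
        step = rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 1.5)
        cases.append((f"error{k + 1}", [x + step if t < brk else x for t, x in enumerate(truth)]))
    return truth, cases


def toy_homogenize(series, threshold=0.3):
    """Toy algorithm: find the split with the largest jump in segment means
    and, if it exceeds threshold, shift the earlier segment onto the later one."""
    n = len(series)
    prefix = [0.0]
    for x in series:
        prefix.append(prefix[-1] + x)
    best_k, best_d = None, 0.0
    for k in range(n // 10, n - n // 10):
        d = (prefix[n] - prefix[k]) / (n - k) - prefix[k] / k  # later mean - earlier mean
        if abs(d) > abs(best_d):
            best_k, best_d = k, d
    if best_k is None or abs(best_d) < threshold:
        return list(series)  # no break found: leave the series untouched
    return [x + best_d if t < best_k else x for t, x in enumerate(series)]


def run_benchmark(algorithm, truth, cases):
    """Score each case: RMSE against the truth before and after the algorithm."""
    return {name: (rmse(s, truth), rmse(algorithm(s), truth)) for name, s in cases}


truth, cases = make_benchmark()
for name, (before, after) in run_benchmark(toy_homogenize, truth, cases).items():
    print(f"{name}: RMSE vs truth {before:.3f} -> {after:.3f}")
```

The two questions map onto the report directly: for the corrupted cases the "after" error should be smaller than the "before" error, and for the clean case both should stay at zero.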

    • You really have NO IDEA what the actual so-called scientists are talking about, do you, Mosh!!
      Pity you never had a science education, isn’t it.

      • Poor Nick, even you must know that Mosh has very little science in his education.
        He’s a literature, storyteller front man.

      • You need to brush up on maths and data too, Nick; realise that data quality is important.
        But any old garbage is good enough for you, isn’t it?
        The more junk you have, the more you can bend it.
        It is, after all, how you have spent your whole working life:
        data MANIPULATION and FABRICATION.

      • Andy, your contributions here in this thread have been the level of a
        playground sneer. There is no reason or justification for this level of
        silly personal denigrations. Just stop it.
        I’m also not interested in your feelings about Steven Mosher. Keep
        that stupidity out of it also, and if you have nothing to say on the subject
        under discussion, be quiet.
        I read Watts regularly, and am very pleased to have Nick Stokes commenting.
        Whether we agree or disagree, whether he is right or wrong.
        The bottom line of this is: Andy, you are contributing nothing. Nick, thank
        you for your continued participation and patience in the face of constant
        denigration from clowns with an emotional age of five.

      • michel…. Your opinion is noted.. and ignored.
        (But I can’t ignore your attack-the-person method, since that damages debate and promotes hard feelings. There is an increasing number of valid complaints about the “in your face” way of treating people you don’t agree with, which I can’t ignore either. I am an editor who can’t put you in moderation, but I have Anthony’s e-mail……) MOD
        [And Anthony read that email, along with others complaining about AndyG55, hence he’s now on permanent moderation – Anthony]

      • AndyG55 February 3, 2018 at 12:03 am
        michel…. Your opinion is noted.. and ignored.

        I agree with michel, and in my opinion your playground style hurts people who want skepticism to be taken seriously. Heller gives you total freedom to sneer; I’m not so sure how much our host likes to see this mud-throwing. You don’t argue much, but you do SHOUT and send multiple comments, as if getting the last word meant you had won. It ain’t so. You’ve got to have something sensible to say. Being polite is not a weakness.

  13. The reality is you can’t turn “bad data” into “better data” unless all the variables contributing to the “bad data” are known and their contributions are systematic. In the case of historical data this is an impossible ask. Better to design measurement systems that actually produce “good data”.

  14. This WUWT post has NO value IMO because it can mean anything to anyone.
    Science is about both a method of inquiry and succinctness in making a prediction based on a hypothesis. The succinctness arises because a prediction’s testability must come down to a straightforward yes/no answer in order to advance beyond ignorance.

      • But it is like describing a method of how to count Angels on the head of a needle.
        Seriously.
        Like George Carlin said, “The Earth is doing just fine.
        It’s been through a lot worse shit than this and we came out okay.”
        But back to Nick’s reply to my flippant comment. (yes I admit it was flippant).
        Local temp records will/can be dismissed by both sides of the debate no matter what they tell us. Too convenient.
        I can say (as an example), “Region X is cooling unequivocally.” The other side will say, “That’s not a global representation.” And vice versa. It is all just BS.
        And until you can tell the average Joe or Jill why his or her utility bills are skyrocketing because Al Gore and his climate ilk demands we pay for our carbon sins, while elites fly around in private jets and yachts to St Tropez and Tahiti… they can all just F-O.

      • “at least in USA”
        ROFLMAO… You will try anything, won’t you, Nick!!! Why always so disingenuous and devious!!!
        The USA is using gas; it has hardly been sucked in by the unreliables farce.

      • Nick ==> re: Electric pricing 1990 to 2010 is a 50% increase! — in ten years — That is skyrocketing….

      • “Electric pricing 1990 to 2010 is a 50% increase! — in ten years — “
        A now fizzled skyrocket. I presume you mean 2000 to 2010. But that change can’t be attributed to “Al Gore and his climate ilk demands”.
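Why the period matters: the same 50% total increase implies a very different annual rate over ten years than over twenty. A small sketch of the compound-growth arithmetic; the 50% figure is taken from the comment above and has not been checked against any price data.

```python
def annual_rate(total_increase, years):
    """Compound annual growth rate implied by a total fractional increase."""
    return (1.0 + total_increase) ** (1.0 / years) - 1.0


for years in (10, 20):
    print(f"50% over {years} years = {annual_rate(0.5, years):.2%} per year")
```

This gives roughly 4.1% a year over ten years, and roughly 2.0% a year over twenty.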

    • In the face of the childish personal rudeness by some contributors to this thread, may I once again thank you for your patience and your continued calm and rational posting. I often disagree, but am always pleased to find you are continuing to participate, and am irritated and embarrassed by the antics of the self important and immature attackers.

      • Well, I’m yet to figure out how it is that I’m on permanent moderation, while Andy is allowed to do this over and over again, in every thread about anything. I never carried on like that, but I did make the mistake of annoying Anthony by not just dropping things when he declared an argument to be over. It seems that who you say it to can sometimes be more important than what you actually say here.
        [most of your comments are either whining or telling us all how we should run the blog – in your case content is king – try being part of the community rather than maligning it -MOD]

      • moderator said:
        “[most of your comments are either whining or telling us all how we should run the blog – in your case content is king – try being part of the community rather than maligning it -MOD]”
        I find that slightly amusing given what is commonly allowed in articles and comments if it is directed at approved targets. But hey, it’s Anthony’s blog, and he can do what he likes.

  15. I have an old transistor radio here; it works quite well on the Medium Wave band, and I tuned it to a nice radio station playing 1960s/70s/80s popular music. Using my trusty AVO meter, I measured the instantaneous DC voltage every 5 minutes across the loudspeaker and similarly extended the exercise to various other points on the circuit board. Interestingly, there are some component junctions where the voltage varies over a small range, 0 – 0.5 V, and others where it goes up to the full 9.0 V battery level. Some points are around 8.0 – 9.0 V. Anyway, I averaged it all out and got a great answer for the whole transistor radio of 4.5 V, which is EXACTLY half the battery voltage, thus proving there is a perfect energy balance: half in the circuit and, by calculation, 9.0 minus 4.5 V, half must be the music power.
    My job interview for the position of Climate Scientist is tomorrow, no need to wish me luck, it’s in the bag!

    • Appreciating your distractions, Rev. I suggest you might want to adapt your radio to 115V and get some really hot stuff happening. That raw, electric sound!

  16. The ‘anthropogenic global warming’ hypothesis is that ‘energy’ is ‘trapped’ in the atmosphere by ‘extra’ carbon dioxide, leading to excess energy in the atmosphere. Temperature is not the correct metric for the energy content of a volume of atmosphere, because enthalpy varies with humidity. The correct measure is kilojoules per kilogram of atmosphere.
    As an example, take two equal volumes of air, one at close to 100% humidity at 75°F and the other at close to 0% humidity at 100°F: the 75°F air has roughly twice the energy content (trapped energy) of the 100°F dry air. Therefore, relative humidity is essential to calculate how much energy is in a volume of atmosphere.
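To put numbers on that example, here is a minimal sketch using a common psychrometric approximation for moist-air enthalpy, h ≈ 1.006·T + w·(2501 + 1.86·T) kJ per kg of dry air (T in °C, w the mixing ratio), with a Magnus-type formula for saturation vapour pressure and an assumed sea-level pressure of 101.325 kPa. Under these assumptions the saturated 75°F air comes out at about 72 kJ/kg and the dry 100°F air at about 38 kJ/kg. Note that enthalpy is measured from an arbitrary zero (here dry air and liquid water at 0°C), so the ratio between the two depends on that reference, while the difference does not.

```python
import math

SEA_LEVEL_KPA = 101.325  # assumed pressure; real stations differ


def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def saturation_vapour_pressure_kpa(temp_c):
    """Magnus-type approximation to saturation vapour pressure over water (kPa)."""
    return 0.61094 * math.exp(17.625 * temp_c / (temp_c + 243.04))


def moist_air_enthalpy(temp_c, rel_humidity, pressure_kpa=SEA_LEVEL_KPA):
    """Approximate specific enthalpy in kJ per kg of dry air, referenced to
    dry air and liquid water at 0 degC. rel_humidity is a fraction (0-1)."""
    vapour_kpa = rel_humidity * saturation_vapour_pressure_kpa(temp_c)
    mixing_ratio = 0.622 * vapour_kpa / (pressure_kpa - vapour_kpa)  # kg water per kg dry air
    return 1.006 * temp_c + mixing_ratio * (2501.0 + 1.86 * temp_c)


humid = moist_air_enthalpy(f_to_c(75.0), 1.0)
dry = moist_air_enthalpy(f_to_c(100.0), 0.0)
print(f"75F saturated: {humid:.1f} kJ/kg, 100F dry: {dry:.1f} kJ/kg")
```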
    Humidity does not appear to be reported by NOAA in their CRNs
    Time-of-observation bias has led to all sorts of clever adjustments so that the mathematical mean (incorrectly called average) of the highest and lowest temperatures of the day can be obtained. Firstly, averaging a temperature, an intensive variable, is incorrect. Secondly, as the values of temperature and humidity do not follow a sine wave, their arithmetic mean gives information on neither the ‘average temperature’ nor the average energy content.
    Ideally, the temperature and humidity should be reported at least hourly and the energy content calculated in kilojoules per kilogram. Reducing the continual reports from automated weather reporting stations to just maximum and minimum temperature is a hidden ‘adjustment’ and withholding of information for quantification of the correct metric for average energy content.
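A quick way to see the point about non-sinusoidal daily cycles: for hourly readings that follow a pure sine, (Tmax + Tmin)/2 equals the true hourly mean, but for a lopsided day it does not. The "spiky" diurnal shape below is entirely made up for illustration; with it the max/min midrange comes out at about 14.0 while the hourly mean is about 11.8.

```python
import math


def midrange(values):
    """(Tmax + Tmin) / 2, the traditional daily 'mean' from a max/min thermometer."""
    return (max(values) + min(values)) / 2.0


def hourly_mean(values):
    """Arithmetic mean of all readings in the day."""
    return sum(values) / len(values)


hours = range(24)
sine_day = [15 + 5 * math.sin(2 * math.pi * h / 24) for h in hours]
# Hypothetical lopsided day: cool most of the time, a short warm spike mid-afternoon
spiky_day = [10 + 8 * math.exp(-((h - 14) / 3) ** 2) for h in hours]

for name, day in [("sine", sine_day), ("spiky", spiky_day)]:
    print(f"{name}: midrange {midrange(day):.2f}  hourly mean {hourly_mean(day):.2f}")
```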

    • “Humidity does not appear to be reported by NOAA in their CRNs”
      This fact alone should tell every scientist that the GAT is a total joke. One needs the heat content of the atmosphere, not just temperature. The amount of irrigation in the USA alone would alter the USA ‘average’. Then there is the total lack of data for so much of the world prior to satellites. Even then, the poles are not fully covered. No scientist should ever use terms like “warmest ever”. Every published paper that uses temperature data should warn the reader of the huge uncertainties. They do not, which proves to me that there is massive misdirection going on, either knowingly, or by group think.

    • I have noted this, too. As someone who spent a career on latent heat and enthalpy matters, I find this almost laughable. Except something has been changing in the North, and the almost wholly politicised climate “science” will never figure it out while it ignores obvious clues.

    • >>
      Firstly, averaging a temperature, an intensive variable, is incorrect.
      <<
      Bingo! All of this averaging of temperatures is nonsense. An average temperature makes about as much sense as an average telephone number. But it’s worse than that: you can’t measure an intensive variable unless the system is in equilibrium. Meteorologists get around that difficulty by assuming LTE (local thermodynamic equilibrium) holds. It seems to work for things like altimeter settings.
      >>
      Secondly, as the values of temperature and humidity do not follow a sine wave, their arithmetic mean gives information on neither the ‘average temperature’ nor the average energy content.
      <<
      I’m not sure I’m following you here. The average of a sine wave is zero. An AC voltmeter measures the RMS (root mean square) value of a sine wave. The peak value of a sine wave is the square-root of 2 times the RMS value; and the peak-to-peak value is twice that.
      Jim
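Jim's figures for a sine wave are easy to check numerically. This sketch samples exactly one period of a hypothetical sine with a 9 V peak (the amplitude is arbitrary) and computes the mean, RMS, peak and peak-to-peak values.

```python
import math

AMPLITUDE = 9.0  # hypothetical peak value, e.g. volts
N = 10_000       # samples spread evenly over exactly one period

samples = [AMPLITUDE * math.sin(2 * math.pi * k / N) for k in range(N)]
mean = sum(samples) / N
rms = math.sqrt(sum(s * s for s in samples) / N)
peak = max(samples)
peak_to_peak = max(samples) - min(samples)

print(f"mean {mean:.6f}, rms {rms:.4f}, peak/rms {peak / rms:.4f}, p-p {peak_to_peak:.2f}")
```

The mean comes out at essentially zero, the RMS at 9/√2 ≈ 6.36, the peak-to-RMS ratio at √2, and the peak-to-peak value at 18.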

  17. All step changes are considered to be wrong data that need to be corrected, but what if the step change really occurred locally? There may be no such thing as local climate, even after thirty years, because local climate is always changing, between droughts and floods for example. The concept of a stable local climate is another sweeping generalisation or assumption that is easily accepted by some simply because it appeals to their way of looking at the world. I don’t accept that there is anything called a climate; there is only weather, even after thirty years.

    • Isn’t local climate = average weather for some predetermined amount of time? I think I have seen 30 years used, but I am unsure why that would be the magic number. It could be 50 or 100 for all I care.
      Anyway, if a place tends to be dry but occasionally floods, and if those events occur within the period used, they create an average some refer to as local climate. If that is the case, the current weather (rainfall, temperature, or really anything) is almost always too low or too high compared to the average.
      This means dry places that are occasionally wet will mostly be in droughts, even severe ones, even though that is normal. Planners who use the average climate will never have enough water storage or water transport built because they don’t understand the place is dry as hell most of the time.
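The point about dry places can be illustrated with an invented 30-year rainfall record made of mostly dry years and a handful of floods. Because a few very wet years pull the mean up, the mean sits well above the median and most years fall below it. With these made-up numbers the 30-year mean is about 326 mm, the median 201.5 mm, and 25 of the 30 years are below the mean.

```python
import statistics

# Hypothetical 30-year annual rainfall record (mm): 25 dry years, 5 flood years
rain = [180, 210, 195, 160, 230, 205, 175, 190, 220, 185, 170, 200, 215,
        165, 240, 198, 182, 207, 193, 225, 188, 172, 212, 203, 178,
        850, 1200, 960, 1100, 780]


def summarize(record):
    """Return (mean, median, number of years below the mean)."""
    mean = statistics.mean(record)
    median = statistics.median(record)
    below = sum(1 for r in record if r < mean)
    return mean, median, below


mean, median, below = summarize(rain)
print(f"mean {mean:.0f} mm, median {median} mm, {below}/{len(rain)} years below the mean")
```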

  18. Michael, your simple process is a good way for interested people to immerse themselves in the temperature data in a more instructive way. I have always had a problem with how “discontinuities” from station moves seem to be handled. The step changes seem to ‘prefer’ moving the temperature of the tail upwards to match the new record, and this has a quasi-logical basis to it. But if most station changes move stations out to the tarmac of airports, then global averages (assuming they mean anything) are going to rise. Okay, this would at least leave the long-term trend unchanged, one would think (this is the quasi I was referring to). However, the data managers also have a “rationale” to continually adjust the much earlier records downwards (they can’t decently adjust the most recent data upwards by more than a couple of tenths), collectively making the annual trend of temperatures rise more steeply (who cares if the 1850s are a degree cooler, is the thought; nobody important has their eye on this).
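The arithmetic behind this point, lowering only the early part of a record, can be shown with an ordinary least-squares trend. In this made-up example a perfectly flat 100-year record has zero trend; lowering its first 50 years by 0.5°C, with no change at all to recent values, produces a trend of about 0.75°C per century. Whether real adjustments look like this is exactly what is being argued about; the sketch only shows the arithmetic.

```python
def ols_slope(values):
    """Ordinary least-squares slope of values against their index (per step)."""
    n = len(values)
    t_mean = (n - 1) / 2.0
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


flat = [15.0] * 100  # hypothetical annual means, deg C
lowered = [v - 0.5 if year < 50 else v for year, v in enumerate(flat)]  # early half only

print(f"flat:    {ols_slope(flat) * 100:.2f} C/century")
print(f"lowered: {ols_slope(lowered) * 100:.2f} C/century")
```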
    This isn’t the worst infraction. You can change a chain of records significantly (as Hansen did) in such a way as to leave the long-term trend unchanged. What does this do? Well, let’s say, for example, that the temperature in the mid-1930s to mid-1940s was higher than during the super El Niño of 1998 (it was, in the US record, until about ten years ago, when Hansen put a stop to that, and ditto other world temperature traces). This results in hiding a levelling off of temperatures after the 30s–40s. In other words, the temperature climb relative to 1940 was non-existent (the 1930s–40s being higher than 1998) and we were then into the dreaded Pause. Since all the alarm arises from the 1979–1999 “steep” rise, it wouldn’t do to have an even steeper drop between 1940 and the 1970s (the global cooling scare that warmists have been trying to “revise”).
    The argument, of course, is that the USA represents only 3% of the earth’s land and things are happening differently everywhere else. Immediately I can add on Canada, Greenland and Scandinavia (probably Europe and Siberia too). But what about the SH? Check out Cape Town, South Africa. This has almost identical wiggles to the US one, and Paul Homewood has shown that Paraguay, Ecuador, etc. also have a US facsimile. Here are the US and South African ones. Visit Paul Homewood’s “notalotofpeopleknow.com” for the South American raw records.
    From Jennifer Marohasy:
    http://jennifermarohasy.com//wp-content/uploads/2009/06/hammer-graph-5-us-temps.jpg
    From WUWT, a guest blogger from South Africa – looks like 1937 was even the high there!!:
    https://wattsupwiththat.files.wordpress.com/2017/01/clip_image0022.gif
    As an engineer, I would say these charts corroborate each other remarkably; see Paul Homewood’s blog for the same pattern in South America.
    I believe the raw temperatures are closer to the truth than the adjusted which, by their own lights, the adjusters say it doesn’t make much difference!!

    • “I believe the raw temperatures are closer to the truth than the adjusted which, by their own lights, the adjusters say it doesn’t make much difference!!”
      If you had bothered to investigate, you would have discovered that the reporting station for “Cape town” moved from the Observatory to the airport in 1960.
      The observatory was warmer than the airport and suffered from UHI effect.
      There was an entire post on it here ….
      https://wattsupwiththat.com/2017/01/28/homogenization-of-temperature-data-makes-capetown-south-africa-have-a-warmer-climate-record/
      In which Nick Stokes debunked accusations of erasure of 1940s warmth.
      FYI, a graph he posted of the two stations while both were in operation and when the “splice” took place:
      https://s3-us-west-1.amazonaws.com/www.moyhu.org/2017/02/capes.png

      • Hello, first time commenter. I am trying to understand this graph and the adjustments. At the start point of the pink and green lines (I’m going to call it approximately 1855), there is around a 2C difference between CT unadj and CT adjusted. This can also be seen at around 2C at the peak around 1865 and the trough about 1875. However, it seems like the lines cease to track one another after this point. By 1900, I am looking at about 15.8C on CT adjusted and about 17C on CT unadj for a difference of 1.2C. Could you explain why?
        Further, you stated “The observatory was warmer than the airport and suffered from UHI effect.” If you are aware that UHI is having an effect on the recorded temperatures and you then adjust the new station (which presumably does not have a UHI effect), it seems like you are including UHI from the first station, then charting at the airport (which presumably does not yet have UHI), and then if the airport eventually experiences UHI, that effect will have been added twice.
        Could you give me an explanation of why this was done? It does not seem honest to adjust the new station record when it is known the old station experienced the UHI effect.

      • “It does not seem honest to adjust the new station record when it is known the old station experienced the UHI effect.”
        No, you have that the wrong way around. The adjusted curve is the lower one (greenish), and is adjusted to the airport values. The general convention when doing adjustments is that old readings should be adjusted relative to new, so the newest readings should have no adjustment at all.
        UHI is not an appropriate consideration at this stage. The objective is just to decide what the temperature reading actually was, for whatever reason. Then you can try attribution.
        HADCRUT/CRUTEM actually adopted the alternative approach; they regarded the airport as a new station (as it was), and the Observatory as a separate station that simply continued from 19th Cen to end 20th. That has much the same effect. What isn’t correct is to imagine that the temperature somewhere actually made a big dive in 1960. As the continuing Observatory record shows, it didn’t.
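The convention of adjusting the older record and leaving the newest readings alone can be sketched as below. This assumes annual means keyed by year and estimates the offset as a simple mean difference over the years both stations report; that is a simplification for illustration, and the station values in the example are invented, not the Cape Town data.

```python
def splice_to_newer(old, new):
    """Splice two records (dicts of year -> temperature), adjusting only the old one.

    The old record is shifted by the mean (new - old) difference over the
    overlapping years and used up to the year the new record starts; the new
    record is kept exactly as measured from then on.
    """
    overlap = sorted(set(old) & set(new))
    if not overlap:
        raise ValueError("need overlapping years to estimate the offset")
    offset = sum(new[y] - old[y] for y in overlap) / len(overlap)
    first_new_year = min(new)
    spliced = {y: t + offset for y, t in old.items() if y < first_new_year}
    spliced.update(new)  # newest readings carry no adjustment at all
    return spliced, offset


# Invented example: the old site reads 1.0 degree warmer during the overlap
observatory = {y: 17.0 for y in range(1950, 1966)}
airport = {y: 16.0 for y in range(1960, 1971)}
record, offset = splice_to_newer(observatory, airport)
print(offset, record[1950], record[1970])  # -1.0 16.0 16.0
```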

      • Nick,
        You said, “The general convention when doing adjustments is that old readings should be adjusted relative to new, so the newest readings should have no adjustment at all.” Perhaps you would care to speculate on why Karl didn’t follow that convention when he adjusted the SSTs to ‘prove’ there was no hiatus?

      • “Perhaps you would care to speculate on why Karl didn’t follow that convention”
        I think you should set out the facts there. I’m referring here to the convention that applies if you make an adjustment to a single record for a location.

      • But look how they just slid the 1960 record up to the top of the old record! WUWT? That isn’t a debunking. Notice that the bottom temperature line peaks in ~1937 and this peak is not beaten by this line until ~1997. The adjusted one jumps and makes a new peak above the former end one. The shift should have been about 1/3 less than it was. You can see the spread between the lines is much farther apart than before the split. I would be dismissed if I did this kind of work. And BTW, Tony, “bothering to investigate” isn’t simply a reading exercise. You can see what I mean by simply looking at what was done with the graph, even if it had been done by Einstein, although perhaps I shouldn’t bring this scientist’s name into this discussion.

      • “No, you have that the wrong way around.”
        How do you figure that? I stated: ” I am looking at about 15.8C on CT adjusted and about 17C on CT unadj for a difference of 1.2C”
        The green line is at approximately 15.8C in 1900 (CT adjusted) and the pink line is at approximately 17C (CT unadjusted).
        If you are simply adjusting one record to the other, then why does the difference between the lines start at 2C in 1855 and then move to 1.2C by 1900? If, as you say, “The general convention when doing adjustments is that old readings should be adjusted relative to new,” then why would there not be a consistent difference between the lines?

      • “If you are simply adjusting one record to the other, then why does the difference between the lines start at 2C in 1855 and then move to 1.2C by 1900?”
        There was a second adjustment around 1888. I haven’t looked into that, but it could be the introduction of a Stevenson Screen. But the basic thing is that the adjusted is the lower curve, and that is continuous with the airport post 1960. The curve was adjusted down to the airport level. Not that it matters after you take the anomaly.
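The remark that the level does not matter once anomalies are taken can be checked directly: subtracting each record's own base-period mean removes any constant offset applied to the whole record. A sketch with invented numbers and a hypothetical base period (the first four values). This only holds for a shift applied to the entire record, not to part of it.

```python
def anomalies(series, base_start, base_end):
    """Subtract the mean of series[base_start:base_end] from every value."""
    base = series[base_start:base_end]
    reference = sum(base) / len(base)
    return [x - reference for x in series]


unadjusted = [17.1, 17.4, 16.9, 17.8, 18.0, 17.6]  # invented annual means, deg C
adjusted = [x - 1.2 for x in unadjusted]           # the same record shifted down by 1.2

a = anomalies(unadjusted, 0, 4)
b = anomalies(adjusted, 0, 4)
print(max(abs(p - q) for p, q in zip(a, b)))  # essentially zero (floating-point rounding only)
```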

      • Thank you for the response. However, I do not see how that would explain the change. Either the Stevenson screen was adjusted for in the original records, or it wasn’t. The data recorded in 1960, at the time of the change, showed about a 1.2C difference between the 2 stations. There would be no logical reason to change the CT adjusted record due to installation of a Stevenson screen.
        My admittedly very limited experience is that Stevenson screens led to a reduction in temperature; however, that was a low-humidity situation. From what I understand, Cape Town is fairly humid.

  19. Of course, the whole idea of this is to make everything smooth and easily fitted together, averaging out all those irritating and noisy uncorrelated random deviations and transient perturbations. However, who is to say that these are not actually the signal we need to find out what the climate is doing, and possibly to get an indication of where it might go?
    Consider:
    Slowly, slowly the average rises, but the level of uncorrelated noise and the number of random transient perturbations rise a bit faster. That is, they rise faster until a natural threshold is reached; then, flip, a new climate regime establishes itself, and everyone is left puzzled as to how or why.
    Noise on the signal or a signal in itself? If you never investigate you’ll never know.
    Just like there are no real weeds just plants in the wrong place, or there is no trash just materials in the wrong place, there is no noise on the signal, just some other signals for which you have yet to discover what, why, and how.
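One simple way to start investigating whether the "noise" is growing is to track its spread through time, for example with a rolling standard deviation. A sketch, assuming evenly spaced values; in practice you would first remove the slowly varying part (for example a moving average) so that the spread reflects the fluctuations rather than the trend. The series used here is synthetic, with deliberately growing alternating swings.

```python
import statistics


def rolling_std(values, window):
    """Sample standard deviation of each full window of consecutive values."""
    if window < 2 or window > len(values):
        raise ValueError("window must be between 2 and len(values)")
    return [statistics.stdev(values[i - window:i]) for i in range(window, len(values) + 1)]


# Synthetic series: alternating swings whose size grows slowly over time
series = [(1 + t / 100) * (-1) ** t for t in range(200)]
spread = rolling_std(series, 20)
print(f"first window {spread[0]:.2f}, last window {spread[-1]:.2f}")
```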
