Friday Funny: more upside down data

Steve McIntyre famously determined that Michael Mann was using a paleo proxy data series upside down in a paper. Mann’s response (though not directly because he doesn’t speak with mere people with questions) was “it doesn’t matter”.

I wonder what CRU will have to say about this one that has been discovered? It’s bigger than just a single point on Earth.

[Image: friday-funny-flipped-hemispheres]

Bishop Hill notes:

Reader John McLean emails with details of some surprising finds he has made in the Hadley Centre’s sea-surface temperature record, HadSST. John is wondering whether others might like to take a look and confirm what he is seeing. Here’s what he has found:

http://www.bishop-hill.net/blog/2016/3/25/some-oddities-in-hadsst.html

UPDATE: 3/26 Josh adds:

A hilarious possibility but one which Zeke has already said has been fixed or was wrong. Well, that’s a relief – these things do tend to happen in Climate Science.

101 thoughts on “Friday Funny: more upside down data”

  1. I’m not finding any mix-up between NH and SH data; if there ever was a problem, presumably it’s already fixed? The >9999 values reading as ***** in the obs count file does seem to be a real bug, however.

    • Yes, I can’t see any problem with NH and SH. HadSST3-nh.dat (and -sh) is just a file of monthly averages. The numbers in the file correspond to the familiar graphs shown. NH (-nh) temperatures are higher, as expected. E.g. the 2015 average for NH was 0.737; for SH it was 0.425. The files were last updated 8 March, so I don’t think there is a recent change. It looks to me as if John McLean may have been reading the netCDF gridded file wrongly.

    • What did you check, Zeke? Your post isn’t clear. I checked the percentage coverage of the hemisphere by calculating it from the gridded data. My data was displayed to 2 decimal places so I had to round it, but even in rounded form the percentage cover for the NH matched the SH summary file rather than the NH summary file.

    • The Met Office has corrected; Hadley Centre not yet.

      to be further continued.
      ===================

  2. Minor typo in third sentence: change “that” to “than”. A very common error.
    Ian M

    • Yes, but the question is: will any scientists that projected CAGW from this data correct their analyses?


      • correct their analyses?

        There is nothing to correct. The analysis and conclusions were written first. The data is merely a way to demonstrate the conclusion. Do you actually think the warming alarmists would look at the data before they reached their conclusion?

    • There is no such thing as an “honest mistake” when you’re dealing with data reduction. There is “getting it right” and then there are errors that come from a lack of skill and knowledge. Any senior research personnel, or any institution, that allows the latter to happen is being totally dishonest.

      • But there are “typos”, mistakes made with no intent to deceive or mislead.
        The original may be more a lack of “proofreading” on “the boss’s” part. (Title a table as “nh” instead of “sh”.) Not quite what Mann did with the lake cores to keep his stick alive. Just a mistake rather than a deception or gross incompetence.
        Maybe an “honest mistake” has been dishonestly used.
        John McLean has asked “the masses” to double check what he thinks he saw.
        Would that others who claim to be “scientists” would do the same.

      • “John McLean has asked “the masses” to double check what he thinks he saw.
        Would that others who claim to be “scientists” would do the same.”

        So who’s checking?

      • Amazing that you can get pilloried more for making a mistake in a comment on a blog than for work in a paper that has been peer-reviewed and cited numerous times.

    • True. Devil’s Advocates are critical for the identification of existing and potential sources of problems.
      This way we address faulty science, bad data, shortcomings, lack of performance, poor outcomes.

      First, we must confirm and then address those issues, correct, and identify how to prevent it the next time. If the oxygen line to the surgery room keeps getting crimped and people die, we do not ignore it and shut other people down.
      A typical student error would be to mix data with different start and end dates. Once pointed out, a real scientist would fix the error and address the origination of the error.

      Here, it takes a LOT of machination to flip an entire data set, a lot of energy to refuse to address it, a committed ‘review’ group to ignore it and a large herd to trample over anyone who brings it up… because the ‘error’ supports a precept.

    • I may be mistaken (hopefully not), but I believe the phrase “good enough for government work” was not originally ironic. It was meant as approval of high quality, back in the days (World War II?) when the government really did insist on meeting quality standards for its work.

      What a distance we have fallen, if praise has become irony.

      • Ummm, Steve. Download involves right clicking on a link and then selecting ‘Save Link as’ (if you’re using Firefox). Somehow I doubt that’s what you meant.

        I started by calculating the weighting (the cosine of the latitude of the centre of the grid cell) of an individual grid cell in each latitude band where we have grid cells, and I separately aggregated the weighting of all grid cells within the hemisphere.

        I then processed the gridded HadSST3 datafile for each month, aggregating the weighting factor for all grid cells that reported data (i.e. were not flagged ‘missing data’) for each hemisphere. At the end of the month’s data I converted the aggregate weighting of reporting grid cells to a percentage of the total weighting for the hemisphere, thus giving the percentage coverage.

        I’ve used the same technique on CRUTEM and HadCRUT data at different times and my output, when rounded to the nearest whole number, matches the summary files that the CRU published via its web pages.
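The coverage calculation described in this comment can be sketched roughly as follows. This is only an illustration in Python, not the commenter's Fortran. The 36 x 72 grid of 5-degree cells and the cosine-of-centre-latitude weighting come from the thread; the function names, the assumption that row 0 is the northernmost band, and the use of `None` as the missing-data flag are made up for the example.

```python
import math

N_LAT, N_LON = 36, 72   # 5-degree cells, as described in the thread
CELL = 5.0


def row_centre_latitude(row):
    """Centre latitude of a row, assuming row 0 is the band 90N-85N."""
    return 90.0 - CELL * (row + 0.5)


def hemisphere_coverage(grid):
    """Return (nh_percent, sh_percent) coverage for one month's grid.

    grid is a list of 36 rows of 72 values. None marks a missing cell
    (an assumption; the real files use their own missing-data flag).
    """
    total = {"nh": 0.0, "sh": 0.0}
    reporting = {"nh": 0.0, "sh": 0.0}
    for row in range(N_LAT):
        lat = row_centre_latitude(row)
        weight = math.cos(math.radians(lat))  # area weight of the cell
        hemi = "nh" if lat > 0 else "sh"
        for value in grid[row]:
            total[hemi] += weight
            if value is not None:
                reporting[hemi] += weight
    return (100.0 * reporting["nh"] / total["nh"],
            100.0 * reporting["sh"] / total["sh"])
```

Comparing the two percentages from a month's gridded data against the coverage values in the -nh and -sh summary files is the kind of check the comment describes.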

      • Hi moshe; there’s an update from Tim Osborn thanking McLean and correcting the record. No cookies today.
        =============

    • Obviously a programming error that is very easy to make …

      for j in range(-90,90) do { buf:=""; for k in range(0,360) do { buf:=buf+","+data(j,k) }; buf:=buf-","; print buf }

      vs

      for j in range(90,-90) do { buf:=""; for k in range(0,360) do { buf:=buf+","+data(j,k) }; buf:=buf-","; print buf }

      Most programmers increment variables upwards through a range, so counting from 90S up to 90N would be more natural than from 90N down to 90S.
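As a runnable illustration of the row-order slip sketched in the pseudocode above, here is a toy version in Python. The four-band grid, the function names and the sample values are all hypothetical; the only point is that a writer counting up from the south and a reader expecting north-first rows will silently swap the hemispheres.

```python
LATS_N_TO_S = [67.5, 22.5, -22.5, -67.5]  # toy grid: four latitude bands


def write_rows(data, north_first):
    """Write one CSV line per latitude band in the chosen order."""
    order = LATS_N_TO_S if north_first else list(reversed(LATS_N_TO_S))
    return [",".join(str(v) for v in data[lat]) for lat in order]


def read_rows(lines, north_first):
    """Read lines back, labelling them with the order the reader assumes."""
    order = LATS_N_TO_S if north_first else list(reversed(LATS_N_TO_S))
    return {lat: [float(v) for v in line.split(",")]
            for lat, line in zip(order, lines)}


data = {67.5: [1.0, 1.0], 22.5: [2.0, 2.0],
        -22.5: [3.0, 3.0], -67.5: [4.0, 4.0]}

lines = write_rows(data, north_first=False)   # writer counts up from 90S
wrong = read_rows(lines, north_first=True)    # reader expects 90N first
right = read_rows(lines, north_first=False)   # reader matches the writer
```

Nothing in the file itself flags the mismatch: `wrong` parses without error, it just attaches southern values to northern latitudes.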

      One of my favorite programming sayings is … “If engineers were to build buildings the way programmers write programs (and manipulate data), then the first woodpecker to come along would destroy civilization!”

      Yes, my brethren are that bad. There is no other profession that I know of where competencies vary with a variance greater than an order of magnitude from the mean. A very few of us (<<1%) are ridiculously good. The rest SUCK, which is why we have project specifications, code reviews, change control, regression and acceptance testing; but only for projects that matter or that have large budgets. Most programming efforts are hack jobs, plain and simple.

      By definition, because they used graduate students paid by grant monies, any programming done with regards to climate is a hack job, and even that presupposes that they have the physics right, which we know they cannot (parameterizations, 100+ kilometer intervals between data points, etc).

      • ” Most programming efforts are hack jobs, plain and simple.”

        Because most managers don’t take code seriously. Most projects start out as: “You get started writing code, and we’ll get you the requirements later”.

      • “Paul

        March 25, 2016 at 2:40 pm

        “You get started writing code, and we’ll get you the requirements later”.”

        I have experienced this all the time with projects over the last 30 years or so, and it gets tiring very, very quickly. My most recent experience was not code; it was a Windows 10 SOE build project. Now, most installations would use SCCM 2012 to build and capture the SOE. Not in this case. No, I had to build this SOE manually, while the requirements were changing almost every hour over a couple of weeks. The “SOE” was to be captured using CloneZilla. This was a Gov’t agency.

      • Thomasedwardson, as you say, it’s an easy error to make. That said, most of us who work in IT verify that the code is doing what it’s supposed to do or we have someone else (a software tester) do that verification.
        In this case I had code doing something more complex and found that it was giving odd results so I simplified the program and checked it carefully and still the odd results persisted.

  3. Agencies that publish climate data talk about them as “Products” and it’s obvious that when it comes to quality control, they aren’t up to speed with the rest of the modern world. They seem to be living in that bygone era where the new car dealer told you, “Drive it around and keep a list of what you found wrong. Then bring it in and we’ll fix it.” But it’s worse than that. Quality Control seems to mean “Adjust the data to fit the narrative.”

    • Try looking at the coverage from (a) the HadSST3-nh.dat file and (b) calculated from the gridded data. (I didn’t make it clear enough that coverage was the issue when I emailed Bishop Hill.)

  4. Azimuth – from North or South Pole?
    A possible source of such confusion is that sometimes Azimuth has been reckoned from the South Pole in astronomy and satellite observations, instead of from the North Pole as in navigation.
    Stanford defines Azimuth:

    Azimuth, in astronomical measurement, is the number of degrees clockwise from due south (usually) to the object’s vertical circle (i.e. a great circle through the object and the zenith). For nonastronomical purposes, azimuth (or bearing) is generally measured clockwise from due north.
    e.g. see Altitude, Azimuth, and Line of Position Comprising Tables for Working Sight …Table IV page 155
    Azimuth (Wikipedia)

    Azimuth (Az), that is the angle of the object around the horizon, usually measured from the north increasing towards the east. Exceptions are, for example, ESO’s FITS convention where it is measured from the south increasing towards the west, or the FITS convention of the SDSS where it is measured from the south increasing towards the east.

    NOAA has historically inverted longitude and time zone definitions:

    Please note that this web page is the old version of the NOAA Solar Calculator. Back when this calculator was first created, we decided to use a non-standard definition of longitude and time zone, to make coordinate entry less awkward. So on this page, both longitude and time zone are defined as positive to the west, instead of the international standard of positive to the east of the Prime Meridian.
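The conventions quoted in this comment differ only by an offset or a sign, which is exactly why they are easy to mix up. A minimal sketch of the conversions in Python, with hypothetical function names and all angles in degrees:

```python
def azimuth_south_cw_to_north_cw(az):
    """Azimuth measured from due south, increasing towards the west
    (clockwise), converted to azimuth from due north, clockwise."""
    return (az + 180.0) % 360.0


def azimuth_south_ccw_to_north_cw(az):
    """Azimuth measured from due south, increasing towards the east
    (counter-clockwise), converted to azimuth from due north, clockwise."""
    return (180.0 - az) % 360.0


def longitude_west_positive_to_east_positive(lon):
    """Flip a positive-to-the-west longitude to the positive-to-the-east
    standard."""
    return -lon
```

For example, due west is 90 degrees in the first convention and 270 degrees when measured clockwise from north. None of these conversions can be detected from the numbers alone, so the convention has to be stated in the metadata.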

  5. New Job Posting at Hadley Centre:
    Wanted, Student Intern, must be right-handed with MS Excel experience and knowledge, apply at dd-mm-yyyy lon-lat.

    Ha ha ;-) Friday Funny

  6. From the Bishop-Hill article: “I think a fair question is whether Hadley Centre publishes other flawed data on SST or anything else because it looks like there’s no in-house verification that software does what it’s supposed to do.”

    As long as the data show an increasing temperature trend, the software must be doing what it is supposed to do, so there would be no need for verification;-)

  7. It’s an honest error, like the time Al Gore complained to the Washington Post that they had a satellite image of the Earth upside-down.

      • Anthony: Over at BH, Steven Mosher raised the same comment, but I noticed that the dataset he referenced – and said was error-free – was named differently to the dataset that John McLean said was in error.

        viz: Steven Mosher: “https://crudata.uea.ac.uk/cru/data/temperature/HadSST.3.1.1.0.median.nc”

        McLean: https://crudata.uea.ac.uk/cru/data/temperature/HadSST3-nh.dat

        I’m afraid Mr Mosher has managed to confuse matters (OK, me). Like you, I wish he would just come out and say what he means.

      • Harry,

        I can’t help but wonder if Mosher’s error was simple human error or deliberate deception?

      • Amusing that moshe doubted McLean’s downloading code, rather than finding the error himself.
        ===========

    • Different file.
      The ‘median’ file is gridded data that for each month has a header record followed by 36 records each of 72 fields.
      The HadSST3-nh.dat file is a summary file that says in month X of year Y the average temperature anomaly for the Northern Hemisphere was Z and in the next record says that the month’s coverage was N.
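The gridded layout described here (a header record followed by 36 records of 72 fields for each month) could be read along these lines. This is a sketch in Python under stated assumptions: fields are taken to be whitespace-separated and the header is kept as an unparsed string, neither of which is confirmed by the thread, and `read_monthly_grids` is a made-up name.

```python
def read_monthly_grids(lines, n_rows=36, n_cols=72):
    """Yield (header, grid) pairs from an iterable of text lines.

    Each month is assumed to be one header line followed by n_rows
    lines of n_cols whitespace-separated numbers.
    """
    block = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines between months
        block.append(line)
        if len(block) == n_rows + 1:
            header = block[0].strip()
            grid = [[float(x) for x in row.split()] for row in block[1:]]
            for row in grid:
                if len(row) != n_cols:
                    raise ValueError("expected %d fields, got %d"
                                     % (n_cols, len(row)))
            yield header, grid
            block = []
    if block:
        raise ValueError("file ends part-way through a month")
```

Checking the field count on every row is deliberate: a reader that silently accepts short or long rows is one way a misread of the gridded file can go unnoticed.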

  8. I followed the link and read about some of the problems.
    Data sets transposed, floating point fields instead of integer fields, field overflows.
    These are the mistakes of a high school kid just messing around.
    Maybe this sort of thing could be understood in the 1970s when there were huge numbers of people going into the field and lots of things had not been fully worked out as yet.
    I have been developing scientific software for decades so I know the issues. When confronted with issues like this perhaps the most important principle of software development is to be sure your programmers know what the hell they are doing.
    (from above)
    “First, we must confirm and then address those issues, correct, and identify how to prevent it the next time.”
    Pure bureaucratic response.
    How about knowing what the hell they are doing in the first place?
    “If the oxygen line to the surgery room keeps getting crimped and people die, we do not ignore it and shut other people down.”
    What on Earth are you doing, allowing a system where critical items can be damaged or destroyed by careless, inept, abject stupidity, even while they are in use?

    The only non-technical analogy I can come up with is this:
    You bring your car in for an oil change. The auto worker drains the oil and replaces it with water. After a mile down the road, you bring the car back because there is something very wrong with the engine.
    The excuse given is that the employee is “a hayseed just off the farm, never seen machinery before”. You think maybe that excuse would have worked in 1890, but certainly not by 1990. Then you realize that in 1890, a “hayseed just off the farm” would never have been allowed near the high-tech machinery in the first place.

    My point is that some things are just so bad, you just can’t make excuses.
    And this stuff got out into distribution, and nobody even checked.

  9. In the 1970s I learned FORTRAN in the Engineering department. ALL the exercises were deliciously designed to seduce you into the (almost) obvious traps of things like wrong type or sign or non-random random. Best programming education I ever got! (After falling into the first trap, and seeing the TA’s glee, I started to look for it. Got 100% right and A after that…)

    Later, seeing how non-Engineers were taught “easier” languages, I cringe at their laxity. The Engineering department stressed that if you got it wrong, buildings and bridges fell, people died, and you were variously sued to oblivion, driven from the field, or in prison.

    Would that the Climate Science Kids had the same rigor and did not treat things as a grad student job for a computer non-major with one intro programming class.

      • jasmr: I prefer Forth where 2+2 can = 5 ;-) (does that make me a climate scientist?)

        No, because you admitted you were wrong.

      • But the Ariane 501 was not an Algol or programming error, it was a systems integration error.

        The ESA rocket scientists had built a successful flight control system in the Ariane 4 using a redundant pair of processors for control, and a third processor to arbitrate which of the two actually controlled the rocket. Unfortunately, the 1990’s vintage CPUs were not really fast enough for the job resulting in an insufficient frame rate in the guidance system. So the engineers started optimizing and removing code to reduce the length of the control loops to increase the frame rate. Some of the code they removed was the bounds checking on the horizontal delta-V values because it was physically impossible for the Ariane 4 to accelerate quickly enough to exceed its 16 bit storage location.

        When the ESA built the more powerful Ariane 5, they simply reused the flight control systems of the Ariane 4 without modification and without proper testing. About 35 seconds into the maiden flight carrying an expensive live payload, the more powerful engines of the Ariane 5 were accelerating the vehicle quickly enough to cause an untrapped integer overflow due to the removal of the horizontal delta-V bounds checking code, which crashed the primary CPU. Not to worry, the arbitration processor simply handed control over to the secondary system … which had faulted about 200 milliseconds earlier due to the same error. Oops. Self destruct.

        The design team had failed to test the entire flight control system under the more powerful Ariane 5 flight profile. The Ariane 4 inertial reference system was not included in the test. Instead, the engineers used simulated data to test the rest of the control system. Had they used the real system, it would have faulted on the ground.

        Yep. Ariane 501 ultimately failed due to bad modeling data.

  10. dbstealey

    When referring to Climate Science, I’d recommend changing “at least 97% of today’s kids would flunk” to “at least 97% of today’s kids have flunked”

    • ‘Flunked’ is a vile derogatory word always replaced by ‘participated’ in the education lexicon nowadays. Are you some sort of racist, misogynist, homophobe or something?

      • Nah. Just an old retired CFO who was rather used to holding people accountable, whatever fancy name they wanted to put on their “participation” in misfeasance.

  11. This article (and the stuff at Bishop Hill) got me curious, so I went and grabbed the 3 ASCII files that were mentioned at the HadSST website (‘…number_of_observations.zip’, ‘…median.zip’, and ‘…measurement_and_sampling_uncertainty.zip’). I did not bother (yet) with any of the NetCDF files.

    After a very cursory overview of those files (I have no ‘R’ chops), I came to the following conclusions:
    – the meta-data for these files appears to be very minimal (may be more available if I were to read ALL the support docs/papers, but …)
    – the ‘median.zip’ and ‘measurement….zip’ files didn’t look obviously out of scope
    – the ‘number_of_observations.zip’ file was indeed curious:
    – – Yeah, it did look like the file format was legacy FORTRAN-sourced, and nobody has bothered to update it for integer-scale content (one wonders where this stuff is going to get used).
    – – The positioning of the overflow markers (field content ‘*******’) seemed to have some significance relative to the particular grid-cells that they represent (beyond their meaning of integer value > 9999.00). At least in the first several years that the overflow-markers occurred, they appeared against the same few grid-cells (I’m still mucking about in the data). Additionally, the neighboring grid-cell values for the same latitude also contained ‘large’ counts (upwards of 40%-80% of max value). The fact that these overflow values didn’t start until late 2002 makes me wonder whether the source data for these grid-cells was affected by Argo float transmitters that were improperly nattering as they phoned home (or are they really taking that many samples?), or whether there were particular problems in the source data (which appears to be at ICOADS if I followed the HadSST website documentation correctly), such as certain data sets being entered multiple times. I have seen groups of multiple entries happen before in other data systems, so it wouldn’t overly surprise me, but I’m open to other explanations. One of these days, I’m gonna learn enough R to build a grid-cell-count-histogram over a world map to see if there is some way to simplify chasing down these kinds of data problems.
    – – Depending on how this file gets used in the HadSST computations (or is it just produced for ‘historical reasons’?), I could see the overflow markers (and any other associated data issues) impacting any number of statistical inferences in the SST products. As Steve McIntyre and Willis E have said on any number of occasions, intimate knowledge of one’s detail data can be very useful.
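Fortran fills an output field with asterisks when the value does not fit the field width, which is consistent with the markers described in this comment. A reader of such a file can at least detect them rather than crash or misparse. The sketch below is in Python and assumes fixed-width fields of 7 characters, a width inferred only from the seven-asterisk marker quoted above; the function names are hypothetical.

```python
def split_fixed_width(line, width):
    """Cut a record into consecutive fields of equal width."""
    line = line.rstrip("\n")
    return [line[i:i + width] for i in range(0, len(line), width)]


def parse_count_field(field):
    """Return the count as a float, or None for an overflowed field."""
    text = field.strip()
    if text and set(text) == {"*"}:
        return None  # field was too narrow for the value written
    return float(text)


def parse_count_row(line, width=7):
    """Parse one record of observation counts."""
    return [parse_count_field(f) for f in split_fixed_width(line, width)]


def overflow_columns(line, width=7):
    """0-based column indices whose field overflowed."""
    return [i for i, v in enumerate(parse_count_row(line, width))
            if v is None]
```

Returning `None` keeps the overflow visible downstream. Substituting 9999 or zero would hide it, and the true count cannot be recovered from the file in any case.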

    • I understand what you’re saying. I didn’t use R in my analysis; I used Fortran. I need time to learn R but I don’t have that time at the moment. I’ve been writing in Fortran since 1977 so I’m very comfortable in it.
      Regarding the grid cells, I tend to call them CLAT and CLONG, the former being 1 to 36 and the latter 1 to 72, and (obviously) they correspond to latitude and longitude respectively.
      To help my analysis I have a sheet of paper with a map of the world on it. (I think I copied it from a CRU or Hadley Centre publication as an image then stretched it to fill the sheet.) The map has the left edge at the date line and on it I’ve drawn the 5 degree latitude x 5 degree longitude grid cells and labelled the X and Y axes at every 10th cell. It’s been so useful that I strongly recommend it.

  12. Big Climate looked to Wall St for inspiration on how to slice, dice and collateralize sub-prime data.

  13. I just have to reply. My husband, a carpenter for 40 years and since retired, was many a time asked to build various structures that were held up by skyhooks. These plans were approved by engineers and local councils, but unless he was a magician the task being asked was impossible and/or dangerous.
    In consultations with architect and engineers the problems were usually solved, but the first set of plans certainly did not provide the solution.

  14. I have put some comments up at the Bish’s place.
    It seems to me that the SST count of obs file is completely screwed up and the unadjusted anomalies at least partially screwed up.
    Mosher is right: the first thing to check is the code used to grab and analyse this stuff. I spent 80% of my time checking and double checking.
    If I am going wrong, I can’t find it.

    If anyone can independently check out the count of obs of SST at Mt Everest, let me know what number you get. I am getting k’s of them.

    • – EternalOptimist @ March 27, 2016 at 5:59 am

      Since all the various files seem to have data in (36 x 72) matrices (the world-5SquareDegree-grids), it wouldn’t take much for one of them to be 180-degrees out-of-sync with the others: [(1,1)..(36,72)] in one relating to [(36,72)..(1,1)] of another. I didn’t find (nor look very hard for) meta/’about’ files that talk to this kind of problem, which is why detailed meta-info is so important in making sense out of libraries of stuff. By the way, I did find a kind-of sideways reference to the two matrix rows for the earth-poles (in meta-info for another file altogether): for that particular file, matrix cell (1,1) contains the content for one pole and cell (36,72) contains that for the other pole. The rest of the grid-cells for their affiliated rows should contain the ‘no-data’ marker. Also, the in-file meta-data seems to imply the use of no-data markers not only for grid-cells with ‘no-data-acquired’, but also for grid-cells that would represent land. Graphically mapping this stuff might give an indication of data-orientation, since the no-data markers of land-forms should jump out visually quite easily.

      Thanks, Mosh, for the quick code to look at the NetCDF files (over at BH and elsewhere here, above). My problems have more to do with what that code truly says in the operation of the content of the grid-matrices and how I can apply it to the ‘…counts…txt’ file of interest with a world-map, thus my lack of R knowledge impedes my progress.

      My other concerns about the ‘…counts…txt’ file have to do with the proper use of such data in statistical analyses of the grid-cells of the other related files that have observation-content (any uses of means, medians, std-devs, variances, regressions, etc — what are the codes’ internal behaviors when dividing/multiplying by either zero or ‘large’/infinity/not-a-number for a given grid-cell — that’s neither trivial nor necessarily obvious in a given implementation — does the language-system throw errors or just ‘do-something’).
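The question of what a language does with zero, missing or not-a-number values has different answers in different systems, so it is worth testing rather than assuming. As one data point, plain Python raises an error on float division by zero but lets NaN propagate silently through a sum. A minimal sketch of a mean that makes its handling explicit, where the function name and the use of `None` for missing cells are assumptions for the example:

```python
import math


def safe_mean(values):
    """Mean of the usable values, or None if there are none.

    None and NaN entries are skipped explicitly instead of being left
    to poison the result or trigger a division by zero.
    """
    usable = [v for v in values if v is not None and not math.isnan(v)]
    if not usable:
        return None
    return sum(usable) / len(usable)


# NaN propagates without any error being raised.
nan_total = sum([1.0, float("nan")])

# Float division by zero raises instead of returning infinity.
try:
    1.0 / 0.0
    zero_division_raised = False
except ZeroDivisionError:
    zero_division_raised = True
```

Other environments behave differently (some return infinity or NaN where Python raises), which is the commenter's point: the behaviour has to be checked for the implementation actually used.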

      • The observation count data looks to be mirrored about the equator. In other words it’s been written from 90S to 90N rather than 90N to 90S. Or if you’re a Fortran programmer it’s DO n J=36,1,-1 when it should be DO n J=1,36.
        As someone said, different data uses different systems. The NOAA OI gridded data does indeed have records from 90S to 90N. In contrast HadCRUT, CRUTEM and HadSST gridded data are all 90N to 90S.
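One way to test for this kind of mirroring, along the lines suggested earlier in the thread, is to compare where a suspect grid has data against a reference grid known to run 90N to 90S, and see whether reversing the row order improves the match. A rough sketch in Python, with hypothetical function names and grids represented as lists of rows of booleans (True where a cell holds data):

```python
def agreement(a, b):
    """Fraction of cells in which two same-shaped boolean grids agree."""
    cells = [(x, y) for row_a, row_b in zip(a, b)
             for x, y in zip(row_a, row_b)]
    return sum(1 for x, y in cells if x == y) / len(cells)


def looks_mirrored(candidate, reference):
    """True if candidate matches reference better with its rows reversed."""
    flipped = candidate[::-1]  # reverse the latitude (row) order only
    return agreement(flipped, reference) > agreement(candidate, reference)
```

Because land cells carry the no-data marker, the data/no-data pattern is strongly asymmetric between hemispheres, so a real north-south flip should show up clearly. This is a heuristic check, not proof of how any file was written.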

  15. The CRU updated its files on March 30. Now, just after the table and about 30% down the page, we find …
    “Correction issued 30 March 2016. The HadSST3 NH and SH files have been replaced. The temperature anomalies were correct but the values for the percent coverage of the hemispheres were previously incorrect. The global-mean file was correct, as were all the HadCRUT4 and CRUTEM4 files. If you downloaded the HadSST3 NH or SH files before 30 March 2016, please download them again.”

    No correction yet from Hadley Centre; team leader John Kennedy is out of the office until April 6th.

    • FYI, a post at Bishop Hill by Tim Osborn that might interest you gives you credit.
      ————–
      The files on the CRU website containing the NH and SH averages of HadSST3 contained the correct temperature anomalies but the values for the percent coverage of the hemispheres were incorrect (the NH and SH values were swapped). They have been replaced with corrected files today. Thanks to John McLean for noticing that there was a problem with these files.

      The global-mean file was correct, as were the HadCRUT4 and CRUTEM4 files. If you downloaded the HadSST3 NH or SH files before 30 March 2016, please download them again.
      Mar 30, 2016 at 5:44 PM
