Challenging statistics of weather extremes

From KING ABDULLAH UNIVERSITY OF SCIENCE & TECHNOLOGY (KAUST)

By integrating previously distinct statistical paradigms into a single modeling scheme, Raphaël Huser from KAUST and Jennifer Wadsworth from Lancaster University in the UK have taken some of the guesswork out of modeling weather extremes. This could greatly improve predictions of future extreme events.

Modeling the frequency and severity of possible weather extremes, such as intense rainfall, strong winds and heat waves, must account for the spatial correlation between nearby monitoring stations: heavy rain at one station often implies similarly heavy rain nearby.

However, as the severity of an event increases, this spatial dependence can weaken: the higher the rainfall intensity, for example, the less likely it is to occur across a wide region. Some extreme events may even be confined to the area around a single station, with no correlation at all with the stations nearby.

Deciding whether the dependence changes with intensity, and to what extent, is a crucial step in model selection, but the answer is often hard to determine from the data. For those involved in predicting weather disasters, a mismatch between the chosen model and the hidden character of the data can critically undermine the accuracy of predictions.

“It is very common with wind speeds or rainfall that spatial dependence weakens as events become more extreme, and eventually vanishes,” explains Huser. “If we restrict ourselves to ‘asymptotically’ dependent models, we might overestimate the spatial dependence strength of the largest extreme events; meanwhile, if we restrict ourselves to ‘asymptotically’ independent models, we might underestimate their dependence strength.”
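
Huser's distinction can be made concrete with the empirical measure χ(u) = P(V > u | U > u): the probability that one site exceeds its u-quantile given that the other site does, where U and V are the two sites' values on a uniform (rank) scale. As u approaches 1, χ(u) tends to a positive limit for asymptotically dependent pairs and to zero for asymptotically independent ones. The Python sketch below is our own illustration, not the authors' code. It compares a correlated Gaussian pair (asymptotically independent whenever the correlation is below 1) with a hypothetical "common shock" max-stable pair built from unit Fréchet variables, whose χ(u) levels off near the mixing weight a. The function names, the correlation 0.7, the weight 0.5 and the seed are arbitrary choices.

```python
import math
import random


def chi_curve(x, y, thresholds):
    """Empirical chi(u) = P(V > u | U > u) for each u in thresholds.

    U and V are rank-transformed (pseudo-uniform) versions of x and y,
    so the result depends only on how the two series co-vary, not on margins.
    """
    n = len(x)

    def pseudo_uniform(values):
        order = sorted(range(n), key=values.__getitem__)
        u = [0.0] * n
        for rank, idx in enumerate(order):
            u[idx] = (rank + 1) / (n + 1)
        return u

    ux, uy = pseudo_uniform(x), pseudo_uniform(y)
    result = []
    for u in thresholds:
        exceed_x = [i for i in range(n) if ux[i] > u]
        joint = sum(1 for i in exceed_x if uy[i] > u)
        result.append(joint / len(exceed_x))
    return result


def gaussian_pair(n, rho, rng):
    """Two correlated standard normal 'stations': asymptotically independent for rho < 1."""
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(z2)
    return xs, ys


def common_shock_pair(n, a, rng):
    """Hypothetical max-stable pair sharing a storm-wide shock: chi(u) -> a as u -> 1."""
    def frechet():
        # Unit Frechet variable as 1/E with E ~ Exp(1); guard against E == 0.
        return 1.0 / max(rng.expovariate(1.0), 1e-300)

    xs, ys = [], []
    for _ in range(n):
        shared = a * frechet()
        xs.append(max(shared, (1.0 - a) * frechet()))
        ys.append(max(shared, (1.0 - a) * frechet()))
    return xs, ys


rng = random.Random(2018)
levels = [0.9, 0.99, 0.999]
gauss_chi = chi_curve(*gaussian_pair(200_000, 0.7, rng), levels)
shock_chi = chi_curve(*common_shock_pair(200_000, 0.5, rng), levels)
for u, g, s in zip(levels, gauss_chi, shock_chi):
    print(f"u={u}: Gaussian chi={g:.3f}  common-shock chi={s:.3f}")
```

With these settings the Gaussian column should shrink steadily as u grows, while the common-shock column stays near 0.5. Fitting only one of the two families to data that behave like the other is the over- or underestimation Huser describes.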

Building on their recent work, Huser and Wadsworth have developed an integrated statistical approach that eliminates this guesswork by combining these disparate spatial dependence models on a smooth continuum.

“Our statistical model smoothly transitions between asymptotic dependence and independence in the interior of the parameter space,” explains Huser, “which greatly facilitates statistical inference and is more general than other models, covering a different class of statistical models with application to a broader range of scenarios.”
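
A minimal sketch of how a single parameter can move a model between the two regimes, assuming a toy random-scale construction rather than the paper's exact specification (the press release gives no formulas). Each site's value is X_i = R^δ W_i^(1−δ), where R is a standard Pareto variable shared by both sites and W_1, W_2 are correlated Gaussian variables transformed to standard Pareto margins. Whichever factor has the heavier tail drives the joint extremes: for δ > 1/2 the shared R dominates and the pair is asymptotically dependent, while for δ < 1/2 the Gaussian-based W_i dominate and the dependence fades in the far tail. The switch sits in the interior of the range [0, 1], which is the property Huser highlights. All names and parameter values are illustrative.

```python
import math
import random


def pseudo_uniform(values):
    """Rank-transform a sample to (0, 1) so that only its dependence structure remains."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    u = [0.0] * n
    for rank, idx in enumerate(order):
        u[idx] = (rank + 1) / (n + 1)
    return u


def empirical_chi(x, y, level):
    """Empirical P(V > level | U > level), with U and V the rank-scale versions of x and y."""
    ux, uy = pseudo_uniform(x), pseudo_uniform(y)
    exceed = [i for i in range(len(ux)) if ux[i] > level]
    return sum(1 for i in exceed if uy[i] > level) / len(exceed)


def scale_mixture_pair(n, delta, rho, rng):
    """Hypothetical toy pair X_i = R**delta * W_i**(1 - delta) for two sites.

    R is standard Pareto and shared by both sites; W_1 and W_2 come from
    Gaussian variables with correlation rho, mapped to standard Pareto margins.
    """
    xs, ys = [], []
    for _ in range(n):
        r = 1.0 / (1.0 - rng.random())  # P(R > r) = 1/r for r >= 1
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # 1 / (1 - Phi(z)) = 2 / erfc(z / sqrt(2)) gives a standard Pareto variable.
        w1 = 2.0 / math.erfc(z1 / math.sqrt(2.0))
        w2 = 2.0 / math.erfc(z2 / math.sqrt(2.0))
        xs.append(r ** delta * w1 ** (1.0 - delta))
        ys.append(r ** delta * w2 ** (1.0 - delta))
    return xs, ys


rng = random.Random(7)
for delta in (0.2, 0.5, 0.8):
    x, y = scale_mixture_pair(100_000, delta, 0.5, rng)
    chi_99, chi_999 = empirical_chi(x, y, 0.99), empirical_chi(x, y, 0.999)
    print(f"delta={delta}: chi(0.99)={chi_99:.3f}  chi(0.999)={chi_999:.3f}")
```

Raising δ from 0.2 to 0.8 should push χ at the highest thresholds from small values toward a clearly positive level. In the approach the article describes, a parameter playing this kind of role is estimated from the data rather than fixed in advance.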

The researchers applied the modeling scheme to winter observations of extreme wave heights in the North Sea, data that a previous study had found to be highly ambiguous in their dependence class. The model handled these data well, accommodating strong spatial dependence alongside strong evidence of asymptotic independence.

“Our new statistical model bridges these two usually distinct possibilities, and crucially, learning about the dependence type becomes part of the inference process,” says Wadsworth. “This means the model can be fitted without having to select the appropriate dependence class in advance, while being flexible and easy to use.”

###

The paper: https://www.tandfonline.com/doi/full/10.1080/01621459.2017.1411813


34 thoughts on “Challenging statistics of weather extremes”

  1. Around Marble Falls, Texas, in the hill country, much of the rain is associated with thunderstorms. We had a storm about 9 years ago that deposited some 20 inches (50cm) in less than six hours. The storm was quite local, with a fairly small footprint. Just how close together weather reporting stations are does matter when events are that patchy.

    • I remember a BBC program years ago (before they got hooked on global warming) in which a man from the Met Office explained that when the weather reports talk about scattered showers, if you missed them you thought “what were they on about?”, but if you got one that was stationary or slow-moving, all you got was a wet day, and it was all from the same forecast.
      James Bull

  2. A number of actuarial organizations in North America have banded together to prepare their own, severely flawed, tracking of weather extremes in North America called the Actuaries Climate Index. http://actuariesclimateindex.org/home/ The concept is flawed from the start, but it sure makes for good alarmism.

    • Whatever the other flaws, there is one glaring one. These so-called actuaries have accepted the data, as produced by NOAA and others, without first reviewing it and testing it. When I qualified, many years ago, I would have been failed in my exams if I had not first verified the accuracy of both the data and the methodology that lay behind its production, and then been able to demonstrate that I had done so before going on to use it in any model that I was proposing to derive.

      • Solomon Green – I agree with the concern over the data – and the lack of assessment of the validity of the data. There are many other concerns about the index the actuaries created. It highlights what it calls extremes, even when they might not be extreme at all. For example, a run of temperatures in the US Midwest in winter of 45 degrees F instead of 30 degrees F might trigger an extreme event for that month. It also blends in sea level from only about 80 stations for the entire Canadian and U.S. coastline, without any attribution of how much is actual sea level rise versus subsidence, and it tracks any increase in sea level rise as an extreme event. I see it as an embarrassment to our profession.

    • Sadly, it won’t take long. The first time someone can manipulate a result to make things look worse, they will use the wording “This HAS greatly improved…”

  3. Let us assume the odds of me winning an average lottery are 100 million to one. That is to say, if I purchased a lottery ticket every year for 100 million years, I could expect to win the lottery one time, correct? So buying a lottery ticket is sort of a waste of time. I’d be dead before I would ever win.
    Well, what if 100 million people each purchase a lottery ticket? Then we would expect someone (probably not me) to win the lottery each and every year. Ergo, someone winning the lottery each year is not some sort of event, nor does it indicate that the chances of winning the lottery have somehow improved.
    So, we have around 5,000 weather stations in the U.S. that have recorded weather for 100 years. They record heat, cold, wind, rain, what have you. Now, just like the lottery, only one year in 100 years can set a record, right? There is only one “winner” out of the 100 years. For example, 1962 might be the warmest year ever in New York, while the wettest year might be 1985. Of course, in Chicago, the warmest might be 1948, and wettest might be 1901. Each station would set their record on a different year, if the weather was just random from year to year, right?
    Therefore, each year, at each weather station, there is a 1:100 chance of setting the record for some parameter, assuming a random distribution of weather events. So if we have 5,000 weather stations, we would expect 50 records to be set each year for each parameter: 50 stations would set 100-year records for high temperature, 50 stations would set 100-year all-time records for low temperature, and so on.
    Therefore, when someone says, it was the hottest year in 100 years in Fresno California, that in itself indicates….nothing about the climate. Nothing. If the distribution of weather was RANDOM we would expect to set 50 high temperature records in the U.S. each and every year, right? This year might be Fresno. Next year it would be Casper, Wyoming. The next year New York City.
    Now, we expect 50 records to be broken each year, for each parameter. However, we know that year to year, there is not a perfectly even distribution, right? That is to say you could have 100 records for high temperature broken in one year, and zero the next year, right? As long as it all evens out to around 50 weather records set per year then we would be pretty much par, statistically. Think of it like the lottery – sometimes there is no winner, right? The money rolls over to the next drawing, and the following year there are two winners.
    Or am I missing something?

    • Missing factor: the distribution of weather patterns is not random.
      It cannot change from heat wave to frost spell overnight at the same location.

      • Not saying it is random, just pointing out that random results would lead to exactly the same alarmist headlines regarding record cold, heat, rain, and wind. Records being broken at individual climate stations should be a common occurrence, and not exactly newsworthy regarding climate change.
        Is it your experience that this fact is well communicated to the public, or even vaguely understood by news organizations?
        In actuality, my expectation is that stations in specific regions would all experience certain records in the same year, but that over wider areas, nationwide, the patterns would appear more random (record cold in Alaska at the same time as record warmth in New York). That is more or less what we see in the records.

    • Nope. I think you hit the nail on the head. Of course, the Climate Fascists will attempt to find something to propagandize as an effect of “climate change,” all without the slightest bit of empirical attribution, just the usual science-by-assumption/confirmation bias/circular logic.

      • TheDismalScience guy has a very good point. Local weather records are meaningful only for that locality. On a global basis they have nothing to do with climate change… global warming… AGW… CAGW… ozone hole… etc., or any other hoax you want to dream up.

    • I think it depends on the definition of a heatwave. Two Sundays ago at my house north of Denver, we hit a high of 65, about 20 degrees warmer than average. The following Monday morning, the temp was minus 11, with a high of 10, also a departure of 20 degrees from average. While I find this fairly normal, it is a bit extreme for our area.

    • Dismal,
      Your statement that “If the distribution of weather was RANDOM we would expect to set 50 high temperature records in the U.S. each and every year, right” depends critically on the probability distribution that the temperatures are drawn from. Temperatures might be random, but if they have a Gaussian distribution then the probability of setting new temperature records decreases very rapidly with time. You would only expect 50 high temperature records if the temperature was a random variable evenly distributed between +/- infinity, something which is not physically plausible.

      • I don’t think that is correct. Or at least, you are talking about something entirely different, which is the shape of the bell curve. The shape doesn’t matter – just the outliers which become the records.
        And 0 to infinity? Okay, out of 100 years, 10 will have an average of 75 degrees, 5 will have an average of 75.3 degrees, and 1 will have an average of 75.31 degrees, it being the hottest year on record, by a margin of 0.01 degrees. There are, after all, an infinite number of decimal places, and the news has been quite reluctant to explain how the hottest year ever is only 0.01 degrees hotter than the previously hottest year, in 1934, or that such measures are beyond our statistical ability to estimate.
        What I am saying is that, with a sufficiently large sample size (enough players playing the lottery) we should be setting various weather records every single year, assuming a random distribution of highs and lows. In fact NOT setting at least some records nearly every year would be the notable outcome.
        One year will be the hottest. The distribution doesn’t matter. The hottest can be 0.0001 degrees warmer than peak of the bell curve for all I care.
        At the end of the day, we are looking at trends, and the trends have to be more than just a single station, or even a single region. I would expect that the hottest year in Boston will correlate closely with the hottest year in New York. I would be surprised if it correlated with the hottest year in Buenos Aires.
        And that is what is missing. Whenever these things get discussed, we either talk about a worldwide average (which is probably statistical mush), or we talk about one place or region warming, without noting whether other places or regions are warming at the same time, to the same degree. Or whether such warming, even if measured, is statistically valid.

      • Dismal,
        Again your statement:
        “What I am saying is that, with a sufficiently large sample size (enough players playing the lottery) we should be setting various weather records every single year, assuming a random distribution of highs and lows.”
        is just not true unless the random distribution extends to infinity and is uniformly distributed. The shape of the probability distribution will determine how likely it is that a new record is set, and assuming that there is a maximum possible temperature, the probability that a new record will be set will approach zero.

    • What I think you are missing is this. Say you are working on a research project to determine what the most overweight person weighs. You start by weighing everyone in your town. You have a pretty good list going. Then you move on to the state and the list gets bigger. It turns out that, with enough data, the more people you weigh, the less likely it becomes that you will find someone who beats everyone already weighed. If, on the whole, human weight is stable, then finding those people on the tails of the curve will get harder and harder. On the other hand, if in general the average human is getting bigger and heavier, it won’t be as hard to find the new record holders. It seems to me that state temperature records, both high and low, are going to be harder to break as more data is collected. And the increase (or decrease) from the old record is going to get smaller and smaller as well.

      • You are misunderstanding statistics.
        Say I want to run the experiment you just outlined. Weight varies in individuals – do I want to measure it once and draw a conclusion? A snapshot? But if I measure in my town just after Christmas, I might be weighting that number (pun intended) versus the middle of summer for another town. Do I measure each day and average? Do I report the max weight for each individual, or the average over a year or month? Do I average weights by house, block, city, region? How do I determine trend? And better yet, how do I report the results to the public? What if I don’t measure everyone, but just some people? What if people move to and from the town?
        Let’s do it just like we do the weather – weigh each and every person every second of every day or every year. We now have, literally, billions of measurements.
        Now Fred, he has been gaining weight, and not exercising. I might report daily, hourly, weekly weights for him that seem to trend up. He is always setting new records. But Tina, she is dieting, so I might notice the opposite trend. However, being obsessed with increasing weight measures, I might report, breathlessly, about how poor Fred has set a new high weight for April 1st, and ignore the fact that Tina didn’t, or has even been trending down.
        Depending on how I parse my data, what I report, how I average it, I might be able to make news every day with new weight records for individuals, towns, regions, individual days or months. And a person reading that news might think, my goodness, people are sure getting fat. But it is all just noise. This leads to the conclusion that reporting weight records for individuals is a rather stupid exercise – individuals will be setting records all the time, and it doesn’t tell me jack about the overall trends for the population. In fact, the overall trend in weight gain could be flat, but I would still see all sorts of record highs and lows reported all the time.
        And just like temperature, weight is bounded – you can’t weigh less than zero, or more than 900 lbs. But I can also report new records by the smallest units – to the nearest pound, half pound, ounce, gram. Fred set a new record, 190 pounds! Wait, Fred set a new record, 190.1 pounds!
        And that is what I am saying – individual weather stations, on individual days, should be setting new records all the time. Yet that is NOT how things are commonly reported. We are given to believe that individual weather station X setting a record heat for April 1 has some sort of meaning, when in fact it means next to nothing, and tells us nothing about climate.

  4. If this leads to a greater number of accurate, more “spatially regulated” monitoring stations, it could be used to map all sorts of phenomena, such as the UHI effect between urban areas and their surrounds.

  5. More rediscovery of what is obvious. Generally, the more intense a tropical cyclone, the more localised the effects. The ice skater spinning phenomenon. High intensity wind can be accompanied by high intensity rain, but sometimes not. My location (NE Oz) is currently receiving more rain over 2 days than the long-term annual minimum, with no intense winds. Cyclone Yasi (2011) high winds over a wide area, not much rain.

  6. There was “guesswork” in this stuff before? Who knew?
    But I’m still unclear on what they are doing. Their model still only works if their assumptions are right. If warming doesn’t cause more extreme events, then this is an utter waste of time. And if we don’t warm, even more so. So they are not improving the forecasts of anything, just modeling their assumption better.

    • Model assumptions? Re ‘The Scientific Method.’ First we guess, then we compute the consequences and then we compare directly to observation – don’t laugh! If it doesn’t comply with observation – you know – it’s wro-o-ng. Now this means yer hafta show yr werkings – can’t be tinkering and homogenizing ‘n addjusting and telling the plebs, ‘Plebs, yer hafta trust us – our guess conforms to observations. Our tests conform to A1 quality control measurements.’

      • Everyone should listen to the video. My eyes teared up at the greatness of this man and how he was able to speak to audiences in a clear and whimsical manner that always kept his audience in a joyful mood. That would be enough to define a man who was very special, but consider this: Feynman, a Nobel Prize winner, was involved in so many theoretical discoveries in physics that he has to be rated in the top 10 physicists of all time. Just look at some of the attendees at his first seminar (on the classical version of the Wheeler-Feynman absorber theory): Albert Einstein, Wolfgang Pauli, and John von Neumann. I wonder what Feynman thought of AGW. His presentation on what science is clearly shows that the rules of science were not followed in the global warming hypothesis.

  7. They recognise the problems; that is good. Deluding themselves that these issues can be eliminated from the models is bad. I live in Scotland; it is a small country. I used to ride sports motorcycles several hundred miles a day from the east coast to the North West Highlands, from spring to autumn. Every possible weather condition that occurs in Scotland could be encountered within 100 miles, sometimes far less.
    On other, rarer days the weather we left home with stayed with us all day, including intense rainfall. I sometimes wonder how much time some climate scientists actually spend outside in the real world as opposed to sitting in front of a computer.

  8. When they are ‘adjusting’ the past records to the present degree to show ‘Global Warming’ there are serious questions as to whether they need to actually record the present. Why not just make it all up?

  9. The Climate System is not random. It is Chaotic, which in my book may be described as a Complex of Randomness, whatever that means. To me it means that it contains many variables, often dependent upon each other, with a proportion subject to random values.
    Lancaster University, in chasing prediction, is chasing a wild goose, I fear.

  10. I’m confused by what the researchers appear to be doing or claiming. As the scale gets smaller, extreme events will be determined more and more by local topography. I don’t see how they can write a general program to deal with that unless they have a very detailed local map and details of how wind, rain etc behaves locally under various weather conditions. Even today I experienced areas a few yards apart which were building up snow drifts, and others which were scoured bare by the wind.

  11. This seems like a real advance. It would be good to get a statistician’s comment. Briggs, where are you? In Northern Aust. we have cases of one farm getting good rain while next-door farms are in drought.

    • And you think statistical techniques are going to help you predict which is which? Ha Ha Ha Ha If you were being sarcastic your clues escaped me.

  12. Now here’s a long way to repeat P. T. Barnum’s famous quote about suckers: “an integrated statistical approach that eliminates this guesswork by combining these disparate spatial dependence models on a smooth continuum.”
    In other words, how are models never guesswork?

  13. How in the hell is a “new and improved” statistical technique going to improve the prediction of future extreme events when the base models now are piss poor at that? They may be able to better simulate the extreme event if the prediction happens to be correct. However these extreme events often give little warning except in cases of hurricanes and snowstorms. In those cases we don’t need better localized predictions because a hurricane isn’t localized and a snowstorm is often very well prepared for these days. A statistical technique will NO way be able to predict any other extreme event with any more accuracy of the prediction in the 1st place. I suppose they really mean flash floods as they seem to affect more people in a devastating manner. Well the same comment as above. They have to do better at predicting the flash flood in the 1st place instead of concentrating on simulating actual event correctly. Well I suppose if they have enough money to waste they can do both. If we would stop spending money trying to ban CO2 then I would support projects like this.

  14. Take a weather record – any record will do, as long as the data behind it are solid and the distribution not too skewed (rainfall data for periods shorter than about a month are too skewed) – and define the extreme events as those outside the central 95% of the data. Now see what the average time between each high or each low extreme event is. If there are 1000 data points, you should find about 25 high and 25 low extreme events, and the average time between high or low extremes will be about 40 data periods. BUT now calculate the standard deviation on your 40-period estimate – it will be quite large – probably close to 30. So there is a 95% chance the next extreme event will occur somewhere between the next moment and 100 data periods from now. Are you sure you can find a change in the rate?
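
The record-counting arithmetic in the third comment can be checked with a short simulation. Under that comment's assumption that each station's annual values are independent draws from one fixed continuous distribution, each of the first n years is equally likely to be the largest, so year n sets an all-time record with probability 1/n whatever the shape of the distribution. With 5,000 stations that means about 500 new records expected in year 10 and about 50 in year 100. The helper name, station counts and seed below are illustrative; real station series are neither independent from year to year nor free of regional correlation, as several replies note.

```python
import random


def count_final_year_records(n_stations, n_years, draw, rng):
    """Count stations whose final year beats every earlier year on record.

    Assumes each station's annual values are independent draws from one
    fixed continuous distribution, supplied by the callable draw(rng).
    """
    records = 0
    for _ in range(n_stations):
        history = [draw(rng) for _ in range(n_years)]
        if history[-1] == max(history):  # ties have probability ~0 for continuous draws
            records += 1
    return records


rng = random.Random(1)
shapes = {
    "Gaussian": lambda r: r.gauss(15.0, 1.0),
    "uniform": lambda r: r.uniform(10.0, 20.0),
    "exponential": lambda r: r.expovariate(1.0),
}
for name, draw in shapes.items():
    in_year_10 = count_final_year_records(5_000, 10, draw, rng)
    in_year_100 = count_final_year_records(5_000, 100, draw, rng)
    # Expected under independence: 5000/10 = 500 and 5000/100 = 50.
    print(f"{name:>11}: year-10 records={in_year_10}, year-100 records={in_year_100}")
```

All three shapes should land near 500 and 50. Under this assumption the per-station record rate falls as 1/n simply because each new year has more predecessors to beat; serial correlation, correlation between neighboring stations, or a trend would all change these counts.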
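
The fourteenth comment's point about how noisy the gaps between extremes are can be quantified. Assuming, as that comment implicitly does, that successive data periods are independent, each period falls in the upper 2.5% tail with probability p = 0.025. The wait between upper-tail events then follows a geometric distribution with mean 1/p = 40 periods and standard deviation √(1 − p)/p ≈ 39.5 periods, and 95% of waits are at most 119 periods. The Python sketch below checks those figures by simulation; the helper name, series length and seed are arbitrary choices.

```python
import math
import random
import statistics


def gaps_between_exceedances(n_periods, tail_prob, rng):
    """Waiting times between successive upper-tail events in an independent series.

    Each period independently lands in the upper tail with probability tail_prob
    (0.025 for 'outside the central 95%', upper side only).
    """
    gaps, last_event = [], None
    for t in range(n_periods):
        if rng.random() < tail_prob:
            if last_event is not None:
                gaps.append(t - last_event)
            last_event = t
    return gaps


p = 0.025
theory_mean = 1.0 / p                      # geometric waiting time: mean 1/p
theory_sd = math.sqrt(1.0 - p) / p         # and standard deviation sqrt(1 - p)/p
wait_95 = math.ceil(math.log(0.05) / math.log(1.0 - p))  # smallest k with P(wait <= k) >= 0.95

gaps = gaps_between_exceedances(1_000_000, p, random.Random(5))
print(f"theory:    mean={theory_mean:.1f}, sd={theory_sd:.1f}, 95% of waits <= {wait_95}")
print(f"simulated: mean={statistics.mean(gaps):.1f}, sd={statistics.stdev(gaps):.1f}")
```

Because the spread of the waiting time is about as large as its mean, a handful of extreme events says little about whether their rate has changed, which is the comment's point.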
