Researchers Devise More Accurate Method for Predicting Hurricane Activity

Kudos to NC State for providing a complete press release with the name of the paper and the abstract included. I wish more science PR writers would follow this example rather than make the reader go hunting for these things – Anthony

For Immediate Release

Researchers from North Carolina State University have developed a new method for forecasting seasonal hurricane activity that is 15 percent more accurate than previous techniques.

“This approach should give policymakers more reliable information than current state-of-the-art methods,” says Dr. Nagiza Samatova, an associate professor of computer science at NC State and co-author of a paper describing the work. “This will hopefully give them more confidence in planning for the hurricane season.”

The researchers, including Dr. Fredrick Semazzi, hope to use their new method to improve our understanding of hurricane behavior.

Conventional models used to predict seasonal hurricane activity rely on classical statistical methods using historical data. Hurricane predictions are challenging, in part, because there are an enormous number of variables in play – such as temperature and humidity – which need to be entered for different places and different times. This means there are hundreds of thousands of factors to be considered.

The trick is in determining which variables at which times in which places are most significant. This challenge is exacerbated by the fact that we only have approximately 60 years of historical data to plug into the models.

But now researchers have developed a “network motif-based model” that evaluates historical data for all of the variables in all of the places at all of the times in order to identify those combinations of factors that are most predictive of seasonal hurricane activity. For example, some combinations of factors may correlate only to low activity, while others may correlate only to high activity.
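In code, the core idea is roughly the following. This is a minimal, hypothetical sketch with random stand-in data and a toy separation score, not the authors' algorithm; the real search covers vastly more variables.

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 60 seasons by 100 candidate variables (e.g. temperature
# or humidity at particular places and times); real searches are far larger.
X = rng.normal(size=(60, 100))
activity = rng.integers(0, 2, size=60)  # 1 = high-activity season, 0 = low

def separation(cols):
    """Toy score: gap between phase means of the averaged variable group."""
    signal = X[:, list(cols)].mean(axis=1)
    return abs(signal[activity == 1].mean() - signal[activity == 0].mean())

# Rank variable pairs by how strongly they differ between the two phases.
top_pairs = sorted(combinations(range(X.shape[1]), 2),
                   key=separation, reverse=True)[:10]
print(top_pairs)
```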

The groups of important factors identified by the network motif-based model are then plugged into a program to create an ensemble of statistical models that present the hurricane activity for the forthcoming season on a probability scale. For example, it might say there is an 80 percent probability of high activity, a 15 percent probability of normal activity and a 5 percent probability of low activity.
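A stub illustration of how ensemble votes become that probability scale (the member predictions below are hard-coded to mirror the example numbers, not produced by real models):

```python
from collections import Counter

# Stub output from a hypothetical 100-member ensemble, hard-coded to
# mirror the example probabilities above.
member_predictions = ["high"] * 80 + ["normal"] * 15 + ["low"] * 5

votes = Counter(member_predictions)
for level in ("high", "normal", "low"):
    share = 100 * votes[level] / len(member_predictions)
    print(f"{share:.0f} percent probability of {level} activity")
```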

Definitions of these activity levels vary from region to region. In the North Atlantic, which covers the east coast of the United States, high activity is defined as eight or more hurricanes during hurricane season, while normal activity is defined as five to seven hurricanes, and low activity is four or fewer.
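Those North Atlantic thresholds translate directly into code:

```python
def north_atlantic_activity(hurricanes: int) -> str:
    """Classify a season using the North Atlantic thresholds quoted above."""
    if hurricanes >= 8:
        return "high"
    if hurricanes >= 5:
        return "normal"
    return "low"

assert north_atlantic_activity(9) == "high"
assert north_atlantic_activity(6) == "normal"
assert north_atlantic_activity(4) == "low"
```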

Using cross validation – plugging in partial historical data and comparing the new method’s results to subsequent historical events – the researchers found the new method has an 80 percent accuracy rate of predicting the level of hurricane activity. This compares to a 65 percent accuracy rate for traditional predictive methods.
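A minimal sketch of that cross-validation procedure, using a placeholder classifier on synthetic data rather than the authors' network motif method (on random data the accuracy will hover near chance, unlike the 80 percent reported for the real model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))    # roughly 60 years of hypothetical predictors
y = rng.integers(0, 3, size=60)  # activity level: 0 = low, 1 = normal, 2 = high

# Fit on all years but one, predict the held-out year, repeat for every year.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=LeaveOneOut())
print(f"hindcast accuracy: {scores.mean():.0%}")
```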

In addition, using the network model, researchers have not only confirmed previously identified predictive groups of factors, but also identified a number of new predictive groups.

The researchers plan to use the newly identified groups of relevant factors to advance our understanding of the mechanisms that influence hurricane variability and behavior. This could ultimately improve our ability to predict the track of hurricanes, their severity and how global climate change may affect hurricane activity well into the future.

The paper, “Discovery of extreme events-related communities in contrasting groups of physical system networks,” was published online Sept. 4 in the journal Data Mining and Knowledge Discovery. The paper is co-authored by Samatova; Dr. Fredrick Semazzi, a professor of marine, earth and atmospheric sciences at NC State; former NC State Ph.D. students Zhengzhang Chen and William Hendrix, who are both now postdoctoral researchers at Northwestern University; former NC State Ph.D. student Isaac Tetteh, now a lecturer at Kwame Nkrumah University of Science and Technology, Ghana; Dr. Alok Choudhary of Northwestern; and Hang Guan, a student at Zhejiang University. The research was supported by grants from the National Science Foundation and the Department of Energy.

-shipman-

Note to Editors: The study abstract follows.

“Discovery of extreme events-related communities in contrasting groups of physical system networks”

Authors: Zhengzhang Chen, William Hendrix, Isaac K. Tetteh, Fredrick Semazzi and Nagiza Samatova, North Carolina State University; Hang Guan, Zhejiang University; Alok Choudhary, Northwestern University

Published: Online Sept. 4, Data Mining and Knowledge Discovery

Abstract: The latent behavior of a physical system that can exhibit extreme events such as hurricanes or rainfalls is complex. Recently, a very promising means for studying complex systems has emerged through the concept of complex networks. Networks representing relationships between individual objects usually exhibit community dynamics. Conventional community detection methods mainly focus on either mining frequent subgraphs in a network or detecting stable communities in time-varying networks. In this paper, we formulate a novel problem—detection of predictive and phase-biased communities in contrasting groups of networks, and propose an efficient and effective machine learning solution for finding such anomalous communities. We build different groups of networks corresponding to different system’s phases, such as higher or low hurricane activity, discover phase-related system components as seeds to help bound the search space of community generation in each network, and use the proposed contrast-based technique to identify the changing communities across different groups. The detected anomalous communities are hypothesized (1) to play an important role in defining the target system’s state(s) and (2) to improve the predictive skill of the system’s states when used collectively in the ensemble of predictive models. When tested on the two important extreme event problems—identification of tropical cyclone-related and of African Sahel rainfall-related climate indices—our algorithm demonstrated the superior performance in terms of various skill and robustness metrics, including 8–16 % accuracy increase, as well as physical interpretability of detected communities. The experimental results also show the efficiency of our algorithm on synthetic datasets.
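For the curious, here is a loose sketch of the contrasting-networks idea from the abstract: build one correlation network per phase and treat structure present in only one phase's network as candidate phase-biased communities. The data, thresholds, and community definition here are hypothetical simplifications, not the paper's method.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 40))            # 60 seasons by 40 climate variables
high = rng.integers(0, 2, size=60) == 1  # phase label for each season

def phase_network(rows, threshold=0.5):
    """Connect variable pairs that correlate strongly within one phase."""
    corr = np.corrcoef(rows, rowvar=False)
    g = nx.Graph()
    g.add_nodes_from(range(corr.shape[0]))
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[0]):
            if abs(corr[i, j]) > threshold:
                g.add_edge(i, j)
    return g

g_high = phase_network(X[high])
g_low = phase_network(X[~high])

# Edges present in the high phase but not the low phase; their connected
# components stand in for the phase-biased "communities".
contrast = nx.difference(g_high, g_low)  # both graphs share the same nodes
communities = [c for c in nx.connected_components(contrast) if len(c) > 1]
print(communities)
```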

22 Comments
Ray
September 11, 2012 1:12 pm

15% more precise on unpredictability is still unpredictable. /sarc

Theo Goodwin
September 11, 2012 1:22 pm

“But now researchers have developed a “network motif-based model” that evaluates historical data for all of the variables in all of the places at all of the times in order to identify those combinations of factors that are most predictive of seasonal hurricane activity.”
Hooray! This is what I taught to decision makers for many years. A model is worthless. But a history of model runs, a history of one or more variables found to be trustworthy in extrapolations, and a history of the decisions made and results found for all runs, taken together, can be darn useful in planning. Now we are getting into serious modelling.

Alvin
September 11, 2012 1:28 pm

And here’s the “money shot”, or should I say “Grant Shot”
This could ultimately improve our ability to predict the track of hurricanes, their severity and how global climate change may affect hurricane activity well into the future.

September 11, 2012 1:34 pm

I read this paragraph “Using cross validation – plugging in partial historical data and comparing the new method’s results to subsequent historical events – the researchers found the new method has an 80 percent accuracy rate of predicting the level of hurricane activity. This compares to a 65 percent accuracy rate for traditional predictive methods.”
I did not bother reading any more. When they predict what will happen in the future, and show that their predictions are 15% more accurate, then let me read some more.

Andres Valencia
September 11, 2012 2:20 pm

I will wait for an opinion from Dr. William Gray and Dr. Phil Klotzbach of Colorado State University.
Or, for North Carolina State University publishing their forecast for the 2013 season.
The proof, shall be in the pudding.

rgbatduke
September 11, 2012 2:47 pm

Actually, it looks like damn good science to me. This is the way statistics should be used — to build predictive models that are iteratively refined and cross-validated. Best of all, a good predictive model uses each new year of data as it comes in to refine and improve the model, and this one appears to be constructed to do just that. So it gets BETTER over time.
This is my thing. I’m not sure I would have built the model exactly the same way that they did, but the way they did is a very reasonable and correct way to do it, and the proof is that it WORKS BETTER than prior attempts, at least within the cross-validation set versus the training set of data. One always does have to hold one’s breath and cross one’s fingers to see if it is predictive in the long run of new data, but this model has a very good chance of doing a decent job.
rgb

Darren Potter
September 11, 2012 2:51 pm

“This challenge is exacerbated by the fact that we only have approximately 60 years of historical data to plug into the models.”
If only they had contacted Briffa and Mann they could have gotten thousands of years of proxy data to plug into their models.
/snark

rgbatduke
September 11, 2012 2:52 pm

Incidentally, nearly all predictive models and pattern recognition codes are built using one portion of the data — the training data (term from neural network training, but it applies to PR in general) — and then used to predict a second portion that was randomly held out — the trial data or cross-validation data. There isn’t really any difference between predicting next year and predicting the last five years as long as you didn’t use the last five years in your training data.
The only real worry with a model like this is that 60 years isn’t very long. They have a ton of variables and only 60 exemplars. Obviously (to me) the dimensionality of the resulting model is going to be low, and the temptation to overfit high, and in the end the statistical reliability of the resulting model somewhat reduced. So sure, worry about whether or not they can maintain 80% confidence, but at the same time recognize that even within what they’ve attempted they’ve done comparatively well. It is reasonable to expect they’ll do well predicting real world numbers.
rgb
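A minimal sketch of the training / held-out split described above, with placeholder data and a placeholder model (scores on random data will sit near chance):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 10))    # 60 "years" of hypothetical predictors
y = rng.integers(0, 2, size=60)  # above- or below-normal activity

# Hold out five years the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out accuracy: {model.score(X_test, y_test):.0%}")
```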

Richard Lewis
September 11, 2012 3:47 pm

Wow! “But now researchers have developed a “network motif-based model” that evaluates historical data for all of the variables in all of the places at all of the times.” Certainly sounds “all encompassing”!

September 11, 2012 4:09 pm

This all sounds good and even makes some sense. I’m confident improvements will result. The trick here is to make sure neither the modelers nor the result users start to believe the predictions are better than they are demonstrated to be.

Theo Goodwin
September 11, 2012 4:51 pm

Please notice that in my post above I did not use the word ‘prediction’. Looking at the history of model use for a particular purpose can improve model use for decision makers. However, it does not elevate a model to the level of science and cannot support predictions.

Bill Jamison
September 11, 2012 5:01 pm

Only time will tell if the skill level is really improved. Of course simply knowing that a given season will be more or less active doesn’t provide a huge benefit to the average person. It’s more important to traders and insurers.

Mike McMillan
September 11, 2012 6:14 pm

This sounds similar to what you’re trying to do with principal component analysis, finding the combination that counts most. PC1 shows level activity all spring and summer, then a steep rise in September, sort of a hockey stick shape.

Colonial
September 11, 2012 9:10 pm

Remember the furor over the Barbie doll who was reported to have said (when her string was pulled and released), “Math is hard”? This article contains an indication that even university professors find grade school math hard.
If the previous prediction method is 65% accurate and the new one is 80% accurate, the improvement is not, repeat NOT, 15%. Here’s how I was taught to calculate percentage improvements (in grade school, mind you):
(((0.80 – 0.65) / 0.65) * 100) = ((0.15 / 0.65) * 100) = 23.076923076923076923076923076923
(Don’t you just love the Windows calculator’s approach to significant digits?)
In other words, going from 65% accuracy to 80% accuracy is an improvement of approximately 23%, which is nowhere near 15% (except in climate science). Send ’em all back to grade school!

Breaker
September 11, 2012 11:34 pm

“for all of the variables in all of the places at all of the times in order to identify those combinations of factors that are most predictive of seasonal hurricane activity. For example, some combinations of factors may correlate only to low activity, while other may correlate only to high activity.”
May be useful and may not be. I’ve played with approaches like this in the past just to see what would happen and they easily produce cross-validated results that look quite impressive but that are useless to predict data that is not part of the set being used for cross-validation. If you have enough features and then examine all combinations of two and three features that permute out of them, you will get apparently predictive combinations, even using cross-validation. NB, when I did these experiments, I was combining binned feature sets of two or three features into a single feature and then asking if the combined binned feature was predictive of a target output. It’s been quite a while, but I believe I used information theoretic binning methods that allowed use of the target output in the binning. However, the held-out data in cross-validation was not used to set the bin boundaries.
It will be interesting to see how the derived hurricane data sets do predicting the future.
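A quick sketch of the effect Breaker describes: on pure noise, the best of many cross-validated feature pairs can still look predictive. (Placeholder data and model, not Breaker's original setup.)

```python
from itertools import combinations
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 40))    # 60 samples, 40 features of pure noise
y = rng.integers(0, 2, size=60)  # random target: nothing real to find

# Best 5-fold cross-validated accuracy over every pair of features.
best = max(
    cross_val_score(LogisticRegression(), X[:, list(pair)], y, cv=5).mean()
    for pair in combinations(range(X.shape[1]), 2)
)
print(f"best pair's cross-validated accuracy: {best:.0%}")
# Typically well above the 50 percent chance level despite the noise.
```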

September 12, 2012 6:16 am

When theoretical predictions fail, statistical extrapolations are the next best thing. Which is not to say this isn’t a worthwhile and potentially valuable exercise. Hopefully, it proves to have predictive accuracy, but science it ain’t.

Dr. Lurtz
September 12, 2012 7:03 am

Come on!! It is about time that Hurricane “prediction” uses real time data and variables. The statistical approach is a complete failure! The same type of statistical approach has been used to predict Sunspots [Sun activity]; how well has that worked out [five major revisions downward]???
You need to be a “real scientist” to identify the variables and how they affect system behavior. The “pseudo scientists”, that only use historical data to predict the future, are just that “pseudo scientists”.
The hilarious thing is that the “enormous” number of variables consists of “temperature and humidity” [did they leave out wind speed?].

tadchem
September 12, 2012 11:05 am

Someone once said “the better the history; the better the prophecy.”
If you wish to make predictions / forecasts you had better include *all* the data you can. The larger the database, the narrower the error bars on the forecasts. Even so, there is a point of diminishing returns in which the expense of expanding the database does not justify the marginal reduction of uncertainty.
If the climate were a system with Cartesian determinacy, it would simply be a matter of identifying more variables to be thrown into the pot. But Chaos Theory (combined with the Heisenberg Principle) tells us we can never have sufficient data to treat the climate as a fully deterministic system amenable to precise predictions in long-range time frames.
The line between more precise forecasts and more expensive data acquisition will be drawn by accountants.

jayhd
September 12, 2012 1:13 pm

What good is predicting the number of hurricanes? Who cares about the number. It’s their track, speed and power that really count. It only takes one category 5 hitting the mainland to ruin everyone’s year.

Editor
September 12, 2012 4:21 pm

Yes, jayhd and others are absolutely correct. Except for the odd CliSci specialist, to whom hurricanes may be huge movers of energy/heat from one part of the atmosphere to another, we humans care ONLY (well, almost only…) when and where a hurricane will strike … and 90+% of that interest is the area of landfall and the intensity when it hits.
My wife and I live on our sailboat in the northern Caribbean, and have for the last ten years. Hurricanes that pass more than 75 miles away are nothing more than “yawns in paradise”. The latest studies on track accuracy show successful 48 hour prediction to be +/- 50 miles (so, ‘somewhere in this 100 mile stretch of land’). Accurate enough for our purposes, safety at sea.
The ‘number of hurricanes’ is not a useful metric… this season is a prime example. 13 Tropical Cyclones so far… 5 hurricanes. Only two short-lived hurricanes struck land (Ernesto in the Yucatan and Isaac in Louisiana); neither of these had hurricane-force winds for even a full 24 hours. In fact, only one of the season’s hurricanes, Gordon, which wandered the Atlantic, was so classified for more than 24 hours.
Oceanic shipping, not an unimportant part of society, is affected by these ‘only at sea’ storms.

ABT
September 12, 2012 9:23 pm

RGB
I only skimmed the paper, but it looks like they were using higher-resolution temporal data to train on. It wasn’t so much a snapshot of data at a given point in the hurricane’s existence but rather all the data during the entire lifetime of the storm. That makes this study rather impressive, and the fact that they are going to be using it to try to understand physical, process-based mechanisms driving the features they see – well, that’s the correct mindset!

Brian H
September 13, 2012 12:07 am

Andres Valencia says:
September 11, 2012 at 2:20 pm
I will wait for an opinion from Dr. William Gray and Dr. Phil Klotzbach of Colorado State University.
Or, for North Carolina State University publishing their forecast for the 2013 season.
The proof, shall be in the pudding.

Yes, it’s all “synthetic data sets” so far. Let’s see some real world performance.
The expression, btw, is “The proof of the pudding is in the eating.”