Kudos to NC State for providing a complete press release with the name of the paper and the abstract included. I wish more science PR writers would follow this example rather than make the reader go hunting for these things – Anthony
For Immediate Release
Researchers from North Carolina State University have developed a new method for forecasting seasonal hurricane activity that is 15 percent more accurate than previous techniques.
“This approach should give policymakers more reliable information than current state-of-the-art methods,” says Dr. Nagiza Samatova, an associate professor of computer science at NC State and co-author of a paper describing the work. “This will hopefully give them more confidence in planning for the hurricane season.”
The researchers, including Dr. Fredrick Semazzi, hope to use their new method to improve our understanding of hurricane behavior.
Conventional models used to predict seasonal hurricane activity rely on classical statistical methods using historical data. Hurricane predictions are challenging, in part, because there are an enormous number of variables in play – such as temperature and humidity – which need to be entered for different places and different times. This means there are hundreds of thousands of factors to be considered.
The trick is in determining which variables at which times in which places are most significant. This challenge is exacerbated by the fact that we only have approximately 60 years of historical data to plug into the models.
But now researchers have developed a “network motif-based model” that evaluates historical data for all of the variables in all of the places at all of the times in order to identify those combinations of factors that are most predictive of seasonal hurricane activity. For example, some combinations of factors may correlate only to low activity, while others may correlate only to high activity.
The groups of important factors identified by the network motif-based model are then plugged into a program to create an ensemble of statistical models that present the hurricane activity for the forthcoming season on a probability scale. For example, it might say there is an 80 percent probability of high activity, a 15 percent probability of normal activity and a 5 percent probability of low activity.
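To make the probability scale concrete, here is a minimal sketch of how an ensemble's outputs could be turned into percentages. The release does not say how the researchers combine their models, so the equal-weight voting below is an assumption, and the function name and the 20 made-up votes are purely illustrative.

```python
from collections import Counter

LEVELS = ("high", "normal", "low")

def ensemble_probabilities(votes):
    """Turn one predicted level per ensemble member into a probability per level.

    Assumption: every model gets an equal vote. The paper's actual
    combination rule is not described in the release.
    """
    if not votes:
        raise ValueError("need at least one model vote")
    counts = Counter(votes)
    return {level: counts[level] / len(votes) for level in LEVELS}

# Hypothetical ensemble of 20 models (made-up votes, not real forecasts).
votes = ["high"] * 16 + ["normal"] * 3 + ["low"] * 1
probabilities = ensemble_probabilities(votes)
for level in LEVELS:
    print(f"{level}: {probabilities[level]:.0%}")
```

With these made-up votes the sketch reproduces the 80/15/5 split used as the example above.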
Definitions of these activity levels vary from region to region. In the North Atlantic, which covers the east coast of the United States, high activity is defined as eight or more hurricanes during hurricane season, while normal activity is defined as five to seven hurricanes, and low activity is four or fewer.
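The North Atlantic thresholds quoted above are simple enough to write down as a small function. This is only a sketch of the definition as stated in the release; the function name is hypothetical, and other basins would need their own cutoffs.

```python
def north_atlantic_activity_level(hurricane_count: int) -> str:
    """Classify a season using the North Atlantic thresholds from the release."""
    if hurricane_count < 0:
        raise ValueError("hurricane count cannot be negative")
    if hurricane_count >= 8:
        return "high"      # eight or more hurricanes
    if hurricane_count >= 5:
        return "normal"    # five to seven hurricanes
    return "low"           # four or fewer hurricanes

# Illustrative counts only, not observed seasons.
for count in (3, 6, 9):
    print(count, north_atlantic_activity_level(count))
```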
Using cross validation – plugging in partial historical data and comparing the new method’s results to subsequent historical events – the researchers found the new method has an 80 percent accuracy rate in predicting the level of hurricane activity. This compares to a 65 percent accuracy rate for traditional predictive methods.
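For readers unfamiliar with cross validation, the idea can be sketched in a few lines: hold back one season, predict it from the rest, and count how often the prediction matches what actually happened. The release does not specify which cross-validation scheme the researchers used, so the leave-one-out variant below is an assumption. The majority-class predictor is a deliberately trivial stand-in, not the network motif-based model, and the season labels are toy data.

```python
from collections import Counter

def majority_baseline(training_labels):
    """Placeholder predictor: always guess the most common level seen in training."""
    return Counter(training_labels).most_common(1)[0][0]

def leave_one_out_accuracy(labels, predict=majority_baseline):
    """Hold out each season in turn and score the prediction made from the others."""
    if len(labels) < 2:
        raise ValueError("need at least two seasons")
    hits = 0
    for i, actual in enumerate(labels):
        training = labels[:i] + labels[i + 1:]  # every season except the held-out one
        if predict(training) == actual:
            hits += 1
    return hits / len(labels)

# Toy labels for five imaginary seasons (not historical data).
toy_seasons = ["high", "high", "high", "normal", "low"]
print(f"accuracy: {leave_one_out_accuracy(toy_seasons):.0%}")
```

Any real predictor could be passed in through the `predict` argument; the accuracy figure is simply the share of held-out seasons whose activity level was guessed correctly.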
In addition, using the network model, researchers have not only confirmed previously identified predictive groups of factors, but identified a number of new predictive groups.
The researchers plan to use the newly identified groups of relevant factors to advance our understanding of the mechanisms that influence hurricane variability and behavior. This could ultimately improve our ability to predict the track of hurricanes, their severity and how global climate change may affect hurricane activity well into the future.
The paper, “Discovery of extreme events-related communities in contrasting groups of physical system networks,” was published online Sept. 4 in the journal Data Mining and Knowledge Discovery. The paper is co-authored by Samatova; Dr. Fredrick Semazzi, a professor of marine, earth and atmospheric science at NC State; former NC State Ph.D. students Zhengzhang Chen and William Hendrix, who are both now postdoctoral researchers at Northwestern University; former NC State Ph.D. student Isaac Tetteh, now a lecturer at Kwame Nkrumah University of Science and Technology, Ghana; Dr. Alok Choudhary of Northwestern; and Hang Guan, a student at Zhejiang University. The research was supported by grants from the National Science Foundation and the Department of Energy.
Note to Editors: The study abstract follows.
“Discovery of extreme events-related communities in contrasting groups of physical system networks”
Authors: Zhengzhang Chen, William Hendrix, Isaac K. Tetteh, Fredrick Semazzi and Nagiza Samatova, North Carolina State University; Hang Guan, Zhejiang University; Alok Choudhary, Northwestern University
Published: Online Sept. 4, Data Mining and Knowledge Discovery
Abstract: The latent behavior of a physical system that can exhibit extreme events such as hurricanes or rainfalls is complex. Recently, a very promising means for studying complex systems has emerged through the concept of complex networks. Networks representing relationships between individual objects usually exhibit community dynamics. Conventional community detection methods mainly focus on either mining frequent subgraphs in a network or detecting stable communities in time-varying networks. In this paper, we formulate a novel problem—detection of predictive and phase-biased communities in contrasting groups of networks, and propose an efficient and effective machine learning solution for finding such anomalous communities. We build different groups of networks corresponding to different system’s phases, such as higher or low hurricane activity, discover phase-related system components as seeds to help bound the search space of community generation in each network, and use the proposed contrast-based technique to identify the changing communities across different groups. The detected anomalous communities are hypothesized (1) to play an important role in defining the target system’s state(s) and (2) to improve the predictive skill of the system’s states when used collectively in the ensemble of predictive models. When tested on the two important extreme event problems—identification of tropical cyclone-related and of African Sahel rainfall-related climate indices—our algorithm demonstrated the superior performance in terms of various skill and robustness metrics, including 8–16 % accuracy increase, as well as physical interpretability of detected communities. The experimental results also show the efficiency of our algorithm on synthetic datasets.