From the INSTITUTE OF ATMOSPHERIC PHYSICS, CHINESE ACADEMY OF SCIENCES and the “pyramid schemes” department:
A new method to evaluate overall performance of a climate model
Many climate-related studies, such as detection and attribution of historical climate change, projections of future climate and environments, and adaptation to future climate change, rely heavily on the performance of climate models. Concisely summarizing and evaluating model performance is becoming increasingly important for climate model intercomparison and application, especially as more and more climate models participate in international model intercomparison projects.
Most current model evaluation metrics, e.g., root mean square error (RMSE), correlation coefficient, and standard deviation, measure model performance in simulating an individual variable. However, one often needs to evaluate a model’s overall performance in simulating multiple variables. To fill this gap, an article published in Geosci. Model Dev. presents a new multivariable integrated evaluation (MVIE) method.
“The MVIE includes three levels of statistical metrics, which can provide a comprehensive and quantitative evaluation on model performance,” says XU, the first author of the study from the Institute of Atmospheric Physics, Chinese Academy of Sciences. The first level of metrics, including the commonly used correlation coefficient, RMS value, and RMSE, measures model performance in terms of individual variables. The second level of metrics, including four newly developed statistical quantities, provides an integrated evaluation of model performance in terms of simulating multiple fields. The third level of metrics, the multivariable integrated evaluation index (MIEI), further summarizes the three statistical quantities of the second level of metrics into a single index and can be used to rank the performance of various climate models. Unlike the commonly used RMSE-based metrics, the MIEI satisfies the criterion that a model performance index should vary monotonically as model performance improves.
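For readers unfamiliar with the first-level metrics, the following is a minimal sketch of how the correlation coefficient, RMS value, and RMSE of a single variable might be computed. It is not taken from the study: the function name `first_level_metrics` is hypothetical, the correlation is assumed to be the ordinary centered (Pearson) coefficient, and all grid points are weighted equally, whereas a real evaluation would typically apply area weighting.

```python
import numpy as np

def first_level_metrics(model, obs):
    """Single-variable statistics for one simulated field against observations.

    Hypothetical helper for illustration; assumes equal weights for all points.
    Returns (correlation, rms_model, rms_obs, rmse).
    """
    model = np.asarray(model, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()

    # Centered (Pearson) pattern correlation between the two fields.
    corr = np.corrcoef(model, obs)[0, 1]

    # Root mean square value of each field (amplitude).
    rms_model = np.sqrt(np.mean(model ** 2))
    rms_obs = np.sqrt(np.mean(obs ** 2))

    # Root mean square error of the model relative to the observations.
    rmse = np.sqrt(np.mean((model - obs) ** 2))
    return corr, rms_model, rms_obs, rmse


# Toy usage: a "model" field with the right pattern but half the amplitude.
obs_field = np.array([2.0, 4.0, 6.0, 8.0])
model_field = 0.5 * obs_field
corr, rms_model, rms_obs, rmse = first_level_metrics(model_field, obs_field)
print(corr, rms_model, rms_obs, rmse)
```

In this toy case the pattern correlation is perfect while the RMS value and RMSE reveal the amplitude error, which is why the first level reports several statistics rather than one.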
According to the study, each higher level of metrics is derived from, and concisely summarizes, the level below it. “Inevitably, the higher level of metrics loses detailed statistical information in contrast to the lower level of metrics.” XU therefore suggests, “To provide a more comprehensive and detailed evaluation of model performance, one can use all three levels of metrics.”
This paper develops a multivariable integrated evaluation (MVIE) method to measure the overall performance of a climate model in simulating multiple fields. The general idea of MVIE is to group various scalar fields into a vector field and compare the constructed vector field against the observed one using the vector field evaluation (VFE) diagram. The VFE diagram was devised based on the cosine relationship between three statistical quantities: the root mean square length (RMSL) of a vector field, the vector field similarity coefficient, and the root mean square vector deviation (RMSVD). The three statistical quantities can reasonably represent the corresponding statistics between two multidimensional vector fields. Therefore, one can summarize the three statistics of multiple scalar fields using the VFE diagram and facilitate the intercomparison of model performance. The VFE diagram can illustrate how much of the overall root mean square deviation of various fields is attributable to differences in the root mean square value and how much is due to poor pattern similarity. The MVIE method can be flexibly applied to full fields (including both the mean and anomaly) or anomaly fields, depending on the application. We also propose a multivariable integrated evaluation index (MIEI), which takes the amplitude and pattern similarity of multiple scalar fields into account. The MIEI is expected to provide a more accurate evaluation of model performance in simulating multiple fields. The MIEI, VFE diagram, and commonly used statistical metrics for individual variables constitute a hierarchical evaluation methodology, which can provide a more comprehensive evaluation of model performance.
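The cosine relationship mentioned above can be illustrated with a short sketch. The definitions used here are assumptions and are not quoted from the paper: RMSL is taken as the root mean square of the vector lengths, the similarity coefficient as the uncentered normalized inner product of the two vector fields, and RMSVD as the root mean square length of the difference vectors. The helper names `normalize_by_obs_rms` and `vfe_statistics` are hypothetical, and dividing each variable by its observed RMS value before grouping is an assumed way of making fields with different units comparable. Under these definitions the three quantities obey the law of cosines, RMSVD^2 = L_model^2 + L_obs^2 - 2 * L_model * L_obs * R_v.

```python
import numpy as np

def normalize_by_obs_rms(model, obs):
    """Scale each variable (row) by the RMS of its observed field.

    Assumed preprocessing step so that variables with different units
    contribute comparably; requires each observed field to be non-zero.
    """
    scale = np.sqrt(np.mean(obs ** 2, axis=1, keepdims=True))
    return model / scale, obs / scale

def vfe_statistics(model, obs):
    """Vector-field statistics for arrays of shape (M variables, N points).

    Each column is treated as one M-dimensional vector.
    Returns (rmsl_model, rmsl_obs, similarity, rmsvd).
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n_points = model.shape[1]

    # Root mean square length of each vector field.
    rmsl_model = np.sqrt(np.sum(model ** 2) / n_points)
    rmsl_obs = np.sqrt(np.sum(obs ** 2) / n_points)

    # Uncentered similarity: inner product normalized by both field norms.
    similarity = np.sum(model * obs) / (
        np.sqrt(np.sum(model ** 2)) * np.sqrt(np.sum(obs ** 2))
    )

    # Root mean square length of the difference vectors.
    rmsvd = np.sqrt(np.sum((model - obs) ** 2) / n_points)
    return rmsl_model, rmsl_obs, similarity, rmsvd


# Toy usage with synthetic data: 3 variables on 200 points.
rng = np.random.default_rng(0)
obs = rng.normal(size=(3, 200)) * np.array([[1.0], [10.0], [100.0]])
model = 1.2 * obs + rng.normal(size=(3, 200))
model_n, obs_n = normalize_by_obs_rms(model, obs)

l_mod, l_obs, r_v, d = vfe_statistics(model_n, obs_n)
# Law-of-cosines check: both sides should agree to rounding error.
print(np.isclose(d ** 2, l_mod ** 2 + l_obs ** 2 - 2 * l_mod * l_obs * r_v))
```

Because the three quantities form a triangle, a single diagram can show whether a large RMSVD comes mainly from an amplitude mismatch (different RMSL) or from poor pattern similarity (low similarity coefficient).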