Last quarter I taught Atmospheric Sciences 101, and as a fun extra-credit activity students had the opportunity to participate in a forecasting competition in which they predicted temperatures and probability of precipitation at Sea-Tac Airport. The National Weather Service forecast is scored as well, to provide a comparison against highly trained and experienced forecasters. In addition, we averaged the predictions of all the students, producing what is known as a consensus forecast.
Now who do you think won? The pros at the Seattle National Weather Service office, or the average of the inexperienced weather newbies in my class? The answer is found below: the consensus of the students was considerably superior to the Weather Service folks.
The student consensus was number two overall and the NWS was in sixth place.
A fluke? No, it happens this way virtually EVERY YEAR. To illustrate, here are the results for 2004. In that year, the average of the students was fourth and the NWS experts were in 10th place. You will notice that some individual students sometimes came in first or ahead of the NWS; that could be just random luck due to the brevity of the forecast contest (1-1.5 months).
This phenomenon is often called the Wisdom of Crowds and has been the subject of a number of journal articles. So why might an average of the students be better than a NWS forecaster? Some possibilities include:
1. The average forecast of a group will tend to damp out forecast extremes, which produce very bad scores when they are wrong.
2. Students look at many different sources of information, using weather information in different ways and viewing many different forecasts (e.g., from various private sector groups). Forecasts derived from an average of many different sources tend to be more skillful on average.
3. Some of them might have taken a look at superior forecasts, say from weather.com or AccuWeather.
I can think of other possibilities…perhaps you can too.
This wisdom of crowds finding is closely related to why we make ensemble forecasts: running models many times, each slightly differently. The average of these many forecasts is, on average, the most skillful forecast to use.
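You can see the statistical heart of this with a toy simulation. The sketch below is purely illustrative (the verifying temperature, number of forecasters, error size, and number of days are all made-up numbers, not data from my class): if each forecaster's error is independent, averaging the group's guesses cancels much of the noise, so the consensus beats a typical individual.

```python
import random

random.seed(42)

TRUE_TEMP = 55.0      # hypothetical verifying temperature (deg F)
N_FORECASTERS = 20    # hypothetical number of student forecasters
N_DAYS = 200          # hypothetical number of contest days

consensus_errors = []
individual_errors = []

for _ in range(N_DAYS):
    # Each forecaster's guess = truth + an independent random error
    forecasts = [TRUE_TEMP + random.gauss(0, 3.0) for _ in range(N_FORECASTERS)]
    # Consensus forecast: the simple average of everyone's guess
    consensus = sum(forecasts) / len(forecasts)
    consensus_errors.append(abs(consensus - TRUE_TEMP))
    # Track one arbitrary forecaster for comparison
    individual_errors.append(abs(forecasts[0] - TRUE_TEMP))

mae_individual = sum(individual_errors) / N_DAYS
mae_consensus = sum(consensus_errors) / N_DAYS
print(f"Mean absolute error, single forecaster: {mae_individual:.2f}")
print(f"Mean absolute error, consensus:         {mae_consensus:.2f}")
```

With independent errors, the consensus error shrinks roughly like one over the square root of the group size, which is also why averaging many slightly-perturbed model runs in an ensemble pays off. In the real world forecasters' errors are correlated, so the gain is smaller, but the direction of the effect is the same.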
So next time you need a forecast, try averaging the guesses of your friends or classmates. Does this idea apply to elections? Now that is a subject I think I want to avoid.