With an ever-growing plethora of hurricane computer forecast models out there – including two new ones that became operational just last year – it can be ever more puzzling to figure out which model to believe. But the best approach remains: Don’t take any of them as gospel. Put your trust in the National Hurricane Center, or NHC, forecast.
It’s always been the case that a particular forecast model may outperform the official NHC forecast in some situations. However, the 2023 NHC Forecast Verification Report reiterates a longstanding truth: overall, it is very difficult for any one model to consistently beat the NHC forecasts for track and for intensity.
Track forecasts: NHC had an off year in 2023
During the 2023 Atlantic hurricane season, the mean NHC official track forecast errors in the Atlantic basin were notably higher than their previous 5-year means, especially at the longer lead times (3-day, 4-day, and 5-day forecasts). The average track error of 69 nautical miles (nm) for NHC’s two-day forecasts fell short of the goal of 53 nm set for 2023 as part of the Government Performance and Results Act of 1993. One long-lived and very difficult-to-forecast storm, Philippe, was responsible for a large portion of NHC’s failure to meet this goal. NHC track errors were lower than average for two of the three major hurricanes, Idalia and Lee, but higher than average for the third, Franklin.
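Track error is the distance between a forecast storm-center position and the observed center at the same time. Below is a minimal sketch of that calculation. It assumes a spherical Earth and the haversine great-circle formula, and NHC's own verification code may differ in detail. The positions are hypothetical.

```python
import math

# Mean Earth radius (6,371 km) expressed in nautical miles; assumes a spherical Earth.
EARTH_RADIUS_NM = 3440.065


def track_error_nm(fcst_lat, fcst_lon, obs_lat, obs_lon):
    """Great-circle (haversine) distance in nm between forecast and observed centers."""
    phi1, phi2 = math.radians(fcst_lat), math.radians(obs_lat)
    dphi = phi2 - phi1
    dlam = math.radians(obs_lon - fcst_lon)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def mean_track_error_nm(pairs):
    """Average error over ((fcst_lat, fcst_lon), (obs_lat, obs_lon)) pairs."""
    errors = [track_error_nm(*fcst, *obs) for fcst, obs in pairs]
    return sum(errors) / len(errors)


# Hypothetical 48-hour forecast/observed pairs (degrees; west longitude negative).
pairs = [((25.0, -80.0), (26.0, -80.0)), ((30.0, -75.0), (30.0, -75.0))]
print(round(track_error_nm(25.0, -80.0, 26.0, -80.0), 1))  # 60.0 (one degree of latitude)
print(round(mean_track_error_nm(pairs), 1))  # 30.0
```

A miss of one degree of latitude works out to about 60 nm, which gives a feel for the size of the 69-nm two-day average.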
Over the past 20 years, going by a linear average that includes the off year of 2023, one- to three-day track forecast errors have been reduced by about 75%, and four-day and five-day track forecast errors have fallen by 60%. Those numbers amount to an extraordinary accomplishment, one undoubtedly leading to huge savings in lives, damage, and emotional angst. The improvement in track forecast accuracy has slowed down in recent years, however, suggesting that forecasts may be nearing their limit in accuracy because of the chaotic nature of the atmosphere.
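Estimating a 20-year reduction "going by a linear average" means fitting a straight trend line through the annual mean errors. The line's value at the end of the period is then compared with its value at the start, which damps the effect of any single unusual year. Here is a minimal sketch. It assumes an ordinary least-squares fit, and the numbers are invented rather than taken from NHC.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trend_reduction_pct(years, errors):
    """Percent drop in error along the fitted trend line, first year to last."""
    slope, intercept = linear_fit(years, errors)
    start = slope * years[0] + intercept
    end = slope * years[-1] + intercept
    return 100.0 * (start - end) / start


# Made-up, perfectly linear annual mean errors (nm), purely for illustration.
years = [2004, 2005, 2006, 2007, 2008]
errors = [100.0, 90.0, 80.0, 70.0, 60.0]
print(round(trend_reduction_pct(years, errors), 1))  # 40.0
```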
The higher error in official NHC track forecasts in 2023 means that the forecast “cones” in 2024 are slightly larger than before (3% to 8% broader at time frames beyond 24 hours). The cone width at any forecast time is based on official forecast errors over the prior five years. As a result, the cone width is the same whether a storm is well-behaved or tough to predict, because it reflects historical error rather than the uncertainty specific to a given hurricane. The widths are calibrated so that, on average, the cone captures about two-thirds of the storm’s observed center positions. That means the observed center can be expected to stray outside the cone about a third of the time. (The two-thirds value is intended to strike a balance between over- and under-warning.)
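Each cone circle can be sketched as a percentile calculation over past errors. The sketch assumes the radius at a given lead time is the smallest historical error that has at least two-thirds of the sample at or below it. NHC's operational procedure may differ in detail, and the error sample is made up.

```python
import math
from fractions import Fraction


def cone_radius_nm(historical_errors_nm, coverage=Fraction(2, 3)):
    """Smallest historical error that has at least `coverage` of all errors at or below it."""
    errors = sorted(historical_errors_nm)
    needed = max(math.ceil(coverage * len(errors)), 1)  # errors that must fall inside
    return errors[needed - 1]


def fraction_inside(errors_nm, radius_nm):
    """Share of errors that fall within the given radius."""
    return sum(e <= radius_nm for e in errors_nm) / len(errors_nm)


# Hypothetical five-year sample of 48-hour official track errors (nm).
errors_48h = [30, 45, 52, 60, 71, 88, 95, 120, 150]
radius = cone_radius_nm(errors_48h)
print(radius)  # 88
print(round(fraction_inside(errors_48h, radius), 2))  # 0.67
```

The function sees only past errors. An erratic storm and a well-behaved one therefore get the same radius at a given lead time, which is why the cone's width says nothing about a particular storm's predictability.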
Best track models in 2023: the European and HAFS-B
As usual, the official NHC track forecasts for Atlantic storms in 2023 were tough to beat. The new HAFS-B model outperformed all the other models for 4- and 5-day forecasts, and it beat the official forecast for 5-day forecasts. The European model was the best model for the shorter lead times from 12 to 72 hours, and it outperformed the official 3-day forecasts from NHC. The UKMET and GFS ensemble were the next best models. The remainder of the models, including the GFS, HWRF, HMON, and COAMPS, trailed (Fig. 2).
To interpret Fig. 2, note that the CLIPER5 model (whose name combines the words “climatology” and “persistence” to describe the nature of its forecasts) is tough to outperform at short lead times. A hurricane tends to keep moving in the same direction and at the same speed as at its initial point (this is called persistence). For that reason, the skill curve in Fig. 2 shows relatively low skill for NHC forecasts out to one day. Skill increases for forecasts between one and three days, when persistence tends not to be a good forecast (hurricanes generally don’t move in a straight line at a constant speed for days on end). Beyond three days, NHC forecast skill starts to drop off, as the CLIPER5 model starts weighting its forecasts toward climatology, which becomes tougher to beat at long ranges.
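Skill in this kind of diagram is the percentage by which a forecast's error improves on the CLIPER5 error at the same lead time. Zero means no better than climatology and persistence, and a negative value means worse. The sketch below computes that percentage, assuming this usual definition. It also includes a crude pure-persistence forecast that extrapolates the storm's latest change in latitude and longitude in a straight line. That function is a simplification for illustration only: CLIPER5 itself is a statistical model that blends persistence with climatology. The sample numbers are hypothetical.

```python
def skill_pct(model_error, cliper_error):
    """Skill relative to CLIPER5: positive if the forecast's error is smaller than CLIPER5's."""
    return 100.0 * (cliper_error - model_error) / cliper_error


def persistence_position(lat, lon, prev_lat, prev_lon, hours_between, lead_hours):
    """Extrapolate the latest change in latitude/longitude forward in a straight line.

    A crude stand-in for "persistence": it assumes the storm keeps its recent motion
    unchanged and ignores Earth's curvature, so it is only sensible for short leads.
    """
    steps = lead_hours / hours_between
    return lat + (lat - prev_lat) * steps, lon + (lon - prev_lon) * steps


# Hypothetical mean errors (nm) at one lead time.
print(skill_pct(model_error=50.0, cliper_error=100.0))  # 50.0
print(skill_pct(model_error=120.0, cliper_error=100.0))  # -20.0
# Storm moved 1 degree north and 1 degree west in the last 6 hours; extrapolate 12 hours.
print(persistence_position(26.0, -80.0, 25.0, -79.0, hours_between=6, lead_hours=12))  # (28.0, -82.0)
```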
Here is a list of some of the top hurricane forecast models used by NHC:
- Euro: The European Centre for Medium-Range Weather Forecasts (ECMWF) global forecast model
- GFS: The National Oceanic and Atmospheric Administration (NOAA) Global Forecast System model
- UKMET: The United Kingdom Met Office’s global forecast model
- HAFS: Hurricane Analysis and Forecast System (newly added in 2023; see below)
- HMON: Hurricanes in a Multi-scale Ocean-coupled Non-hydrostatic regional model, initialized using GFS data
- HWRF: Hurricane Weather Research and Forecasting regional model, initialized using GFS data
- COAMPS: The Navy’s COAMPS-TC regional model, initialized using GFS data
NHC intensity forecasts: a decent year in 2023
Though intensity forecasts have not improved as dramatically as track forecasts over the past 30 years, intensity errors have decreased notably since around 2010. Official NHC intensity forecast errors in the Atlantic in 2023 were 10% to 18% smaller than the five-year average for time periods from 12 to 72 hours. No records for intensity accuracy were set in 2023 for forecasts at any lead time (Fig. 3), though near-record accuracy was achieved with the 24- and 48-hour forecasts.
Mean intensity forecast errors in 2023 (expressed in maximum sustained winds) were about 7 mph at 24 hours and increased to about 15 mph for five-day forecasts. The official forecasts had little bias through 24 hours but were biased slightly too high for 3-, 4-, and 5-day forecasts.
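The paragraph above distinguishes "error" from "bias." Error averages the size of each miss, while bias averages its sign. Here is a minimal sketch with hypothetical wind forecasts (mph):

```python
def mean_absolute_error(forecasts, observed):
    """Average size of the misses, ignoring their sign."""
    return sum(abs(f - o) for f, o in zip(forecasts, observed)) / len(forecasts)


def mean_bias(forecasts, observed):
    """Average signed miss; positive means forecasts were too strong on average."""
    return sum(f - o for f, o in zip(forecasts, observed)) / len(forecasts)


# Hypothetical maximum sustained wind forecasts and verifying values (mph).
fcst = [100, 80, 120]
obs = [90, 90, 115]
print(round(mean_absolute_error(fcst, obs), 2))  # 8.33
print(round(mean_bias(fcst, obs), 2))  # 1.67
```

Over- and under-forecasts partly cancel in the bias but not in the mean error. That is how a set of forecasts can have little bias yet a sizable average error.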
Best intensity model in 2023: The blends and NHC take the prize
In 2023, the official NHC intensity forecast outperformed all models at four and five days out. The IVCN consensus model outcompeted the official forecast at all of the shorter time frames, as did the HCCA consensus model at most of the shorter time frames. Notably, none of the individual models outperformed the official forecast or the IVCN and HCCA model blends at any time frame – a switch from past years, when the HMON and HWRF models sometimes did better on average than the intensity forecasts from NHC and from any model blend.
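A consensus aid such as IVCN is, at heart, an average of several member models' intensity forecasts. The sketch below assumes a plain equal-weight mean of whichever members are available. It also adds a weighted variant to illustrate the idea behind corrected consensus aids like HCCA, which weight members according to their past performance. HCCA's actual weights are not reproduced here, and the member names, weights, and winds are all hypothetical.

```python
def consensus(forecasts, weights=None):
    """Average the available member forecasts, skipping members that are missing (None).

    With weights=None this is a plain equal-weight mean. Otherwise each available
    member is multiplied by its weight and the sum is divided by the total weight.
    """
    available = {name: value for name, value in forecasts.items() if value is not None}
    if not available:
        raise ValueError("no member forecasts available")
    if weights is None:
        weights = {name: 1.0 for name in available}
    total_weight = sum(weights[name] for name in available)
    return sum(weights[name] * value for name, value in available.items()) / total_weight


# Hypothetical members and 72-hour intensity forecasts (mph); MODEL_C did not run.
members = {"MODEL_A": 100, "MODEL_B": 110, "MODEL_C": None, "MODEL_D": 120}
print(consensus(members))  # 110.0
# Hypothetical weights favoring a member with a better track record.
print(consensus(members, weights={"MODEL_A": 2.0, "MODEL_B": 1.0, "MODEL_D": 1.0}))  # 107.5
```

Averaging helps because the members' individual errors partly cancel, one reason the blends are so hard to beat.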
Over the past few years, the five top intensity models have typically been the regional/dynamical models HWRF, HMON, and COAMPS-TC (which subdivide the atmosphere into a 3-D grid around the storm and solve the atmospheric equations of fluid flow at each point on the grid), and the statistics-based LGEM and DSHP models (DSHP is the SHIPS model with inland decay of a storm factored in).
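The "inland decay" factor in DSHP reduces the forecast wind once the predicted track moves over land. A common way to represent this, associated with the Kaplan-DeMaria inland decay model, is exponential decay of the maximum wind toward a background value after landfall. The sketch assumes that form. Its default constants are illustrative assumptions rather than DSHP's operational settings, and the storm is hypothetical.

```python
import math


def inland_wind_kt(v_landfall_kt, hours_over_land, v_background_kt=26.7,
                   landfall_reduction=0.9, decay_per_hour=0.095):
    """Maximum wind after some time over land, decaying exponentially toward a background value.

    v(t) = v_b + (R * v0 - v_b) * exp(-alpha * t); all default constants are illustrative.
    """
    v0 = landfall_reduction * v_landfall_kt
    return v_background_kt + (v0 - v_background_kt) * math.exp(-decay_per_hour * hours_over_land)


# Hypothetical storm making landfall with 100-kt winds.
for hours in (0, 12, 24, 48):
    print(hours, round(inland_wind_kt(100.0, hours)))
```

The wind drops fastest in the first hours after landfall and then levels off toward the background value. That is the qualitative behavior a decay adjustment is meant to capture.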
Two of the top-performing global dynamical models for hurricane track, the European (ECMWF) and GFS models, are typically not considered by NHC forecasters when making intensity forecasts. These models have traditionally made poor intensity forecasts, and this was the case again in 2023 for the Euro, as shown by the pale blue line near the bottom of Fig. 4. However, the GFS model (dark blue line in Fig. 4) was competitive in 2023 (as it was also in 2022), holding its own with the better intensity models.
Best model for Hurricane Beryl: COAMPS
Using data compiled by SUNY Albany’s Brian Tang, we plotted the model performance for 2024’s most significant Atlantic tropical cyclone thus far, Hurricane Beryl. For track, the Navy’s COAMPS model performed brilliantly. It outperformed every other model and the official NHC forecast at every lead time, with track errors more than a factor of two smaller for 4- and 5-day forecasts. This edge was especially apparent near the end of Beryl’s long life, when COAMPS caught on early to the Texas landfall that hit the Houston area much harder than had been anticipated even a couple of days earlier. Compared with the 5-year average NHC track error, the official NHC forecast for Beryl did about average for 1-day and 2-day forecasts, and significantly worse than average for longer-range forecasts.
For intensity, all of the models and the official NHC forecast did considerably worse than the 5-year average for 1-day, 2-day, and 3-day forecasts, but considerably better for 5-day forecasts. (Storms that rapidly intensify almost always have higher-than-average intensity forecast errors.) The best overall intensity forecasts were made by the COAMPS and HWRF models. Almost all of the models (and the official forecast) tended to underpredict Beryl’s intensity. Note that the European model, the best short-range track model in 2023, is not a good intensity model.
New on the scene: the HAFS model
For the 2023 season, NHC brought two variants of the new Hurricane Analysis and Forecast System (HAFS) model into the fold of its model guidance. HAFS, which became fully operational on June 27, 2023, is now the preferred option within the National Weather Service for high-resolution track and intensity forecasts, similar to the guidance long provided by HMON and HWRF. (These two models are still being run this season, but in “legacy” mode, so the underlying code will no longer be updated.) Three years of testing (2020-2022) showed improvements of up to 10% in both track and intensity for HAFS versus HWRF.
HAFS is the hurricane-oriented element of NOAA’s Unified Forecast System (UFS), which uses a common dynamical core designed to help streamline the agency’s key modeling efforts. Also part of this unified system is the current GFS model, which provides input to the higher-resolution HAFS.
Two versions of HAFS are being run, both out to 126 hours and with maximum resolutions of 2 km in and around tropical cyclones:
- HAFS-A (for all global oceanic basins)
- HAFS-B (only for those basins monitored by NHC, including the North Atlantic and the Eastern and Central Pacific)
About ensemble models
Ensemble model runs are available for most of the top global models. An ensemble is created by running a model like the GFS or European many times, each time starting from slightly different initial conditions. The result is a set of possible forecasts whose spread indicates how uncertain the forecast is. These ensemble members are run at a lower resolution than the main high-resolution run to save computer time. The European model has 51 ensemble members, and the GFS has 31. The 0Z run of the GFS ensemble, known as GEFS, goes out to Day 35 (note: there is a delay of approximately 24 hours before the output for Days 17-35 becomes available). Ensemble forecasts for Days 17-35 should be taken with a large grain of salt for now, but they may still be useful for tracking long-term or seasonal shifts.
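The value of perturbing the initial conditions shows up even in a toy system. The sketch below uses the Lorenz-63 equations, a classic three-variable chaotic system derived from a simplified model of convection. It is not a weather or hurricane model. The sketch assumes forward-Euler time stepping, and the member count (31, echoing the GEFS), perturbation size, and run length are arbitrary choices. Members that start almost identically drift apart, and their spread measures how uncertain the forecast has become.

```python
import random


def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz-63 system one forward-Euler step."""
    x, y, z = state
    return (
        x + dt * sigma * (y - x),
        y + dt * (x * (rho - z) - y),
        z + dt * (x * y - beta * z),
    )


def run_ensemble(initial, n_members=31, perturbation=1e-3, n_steps=2000, seed=0):
    """Perturb the initial state slightly for each member, then integrate all members."""
    rng = random.Random(seed)
    members = [tuple(v + rng.gauss(0.0, perturbation) for v in initial)
               for _ in range(n_members)]
    for _ in range(n_steps):
        members = [lorenz_step(m) for m in members]
    return members


def spread(members):
    """Root-mean-square distance of the members from the ensemble mean."""
    n = len(members)
    mean = [sum(m[i] for m in members) / n for i in range(3)]
    return (sum(sum((m[i] - mean[i]) ** 2 for i in range(3)) for m in members) / n) ** 0.5


start = (1.0, 1.0, 1.0)
for steps in (0, 500, 1000, 2000):
    print(steps, round(spread(run_ensemble(start, n_steps=steps)), 4))
```

Because each call reseeds the random generator, every run starts from the same perturbed members, so the printed spreads show how one ensemble's uncertainty grows with lead time.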
Ensembles are especially useful for setups such as weak steering flow, where the varied starting conditions across a model ensemble may shed light on important features that the observing grid hasn’t yet captured directly. When the spread in a model ensemble decreases as a storm evolves, it’s a good sign that the forecast from that operational model is becoming more reliable. Keep in mind that one model’s ensemble tracks can sometimes be in tight agreement while another model’s ensemble agrees just as tightly on a completely different solution. In such a case, it’s often the different physics within each model that are driving the difference. That makes it especially important to watch how the consensus forecast evolves (the average of the forecasts from three or more separate models, such as the GFS, European, and UKMET).
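Both the consensus forecast and the idea of watching spread can be sketched with simple averages of forecast positions. The sketch makes several assumptions. Positions are averaged directly in latitude and longitude, which is adequate only when the tracks are close together and away from the 180th meridian. Spread is taken as the mean distance of the individual positions from their average, in nautical miles. All model positions are hypothetical.

```python
import math

EARTH_RADIUS_NM = 3440.065  # spherical-Earth approximation


def distance_nm(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in nautical miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def consensus_position(positions):
    """Simple average of (lat, lon) forecast positions from three or more models."""
    if len(positions) < 3:
        raise ValueError("this sketch expects at least three models")
    lats, lons = zip(*positions)
    return sum(lats) / len(lats), sum(lons) / len(lons)


def mean_spread_nm(positions):
    """Mean distance of the individual positions from their consensus position."""
    c_lat, c_lon = consensus_position(positions)
    return sum(distance_nm(lat, lon, c_lat, c_lon) for lat, lon in positions) / len(positions)


# Hypothetical 72-hour positions from three models (degrees; west longitude negative).
day3 = [(25.0, -80.0), (26.0, -80.0), (27.0, -80.0)]
print(consensus_position(day3))  # (26.0, -80.0)
print(round(mean_spread_nm(day3)))  # 40
```

Comparing the spread from one model cycle to the next is one way to put numbers on the "tightening" or "loosening" of ensembles described above.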
Tropical cyclone genesis forecasts
NHC has long issued a Tropical Weather Outlook four times per day, offering two-day and five-day forecasts of tropical cyclone genesis. The five-day forecasts have been expanded this year to cover seven days. For the Atlantic in 2023, these forecasts were pretty reliable for five-day genesis forecasts of 10 – 70%. For example, when NHC gave a 70% chance a tropical cyclone would form within five days, one actually did form about 67% of the time.
However, NHC’s genesis forecasts were too conservative at the upper end of the distribution: every Atlantic disturbance to which NHC gave an 80% or 90% chance of development did, in fact, develop.
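The reliability check described in the last two paragraphs compares each stated probability with how often genesis actually followed. Here is a sketch that assumes forecasts are grouped by their exact stated probability, using a handful of made-up outlook records:

```python
from collections import defaultdict


def reliability_table(records):
    """Map each stated probability (%) to (number of forecasts, observed genesis frequency %)."""
    tally = defaultdict(lambda: [0, 0])  # probability -> [forecasts issued, genesis events]
    for probability, formed in records:
        tally[probability][0] += 1
        tally[probability][1] += int(formed)
    return {p: (n, 100.0 * events / n) for p, (n, events) in sorted(tally.items())}


# Hypothetical five-day outlook records: (stated probability %, did genesis occur?).
records = [(10, False), (70, True), (70, True), (70, False), (90, True), (90, True)]
for probability, (count, observed) in reliability_table(records).items():
    print(f"{probability}%: {count} forecasts, genesis {observed:.0f}% of the time")
```

A perfectly reliable outlook would show observed frequencies matching the stated probabilities. Observed frequencies above the stated value, as at 80% and 90% in 2023, indicate forecasts that were too conservative.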
A 2016 study by a group of scientists led by Florida State’s Daniel Halperin, though now eight years old, is still worth noting. It found that four models can make decent forecasts of the genesis of new Atlantic tropical cyclones up to five days in advance. The model with the highest success ratio (a measure that rewards correct genesis forecasts and penalizes false alarms) was the European, followed by the UKMET, GFS, and Canadian models.
The study’s authors found that skill declined markedly for forecasts beyond two days into the future, and skill was lowest for small tropical cyclones. The European model had the lowest probability of correctly making a genesis forecast (near 20%) but had the fewest false alarms. The GFS correctly made genesis forecasts 20% to 25% of the time but had more false alarms. The Canadian model had the best chance of making a correct genesis forecast but also had the highest number of false alarms. The take-home message: When the Canadian model predicts genesis, something may be afoot, but don’t bet on it until the European model comes on board. In general, the study authors found, when two or more models make the same genesis forecast, the odds of the event actually occurring increase considerably.
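The trade-off between correct genesis forecasts and false alarms can be expressed with standard verification scores. The sketch assumes the usual definitions: probability of detection (POD) is hits / (hits + misses), and success ratio is hits / (hits + false alarms). The counts are purely hypothetical and are not the study's numbers.

```python
def probability_of_detection(hits, misses):
    """Share of actual genesis events that a model forecast in advance."""
    return hits / (hits + misses)


def success_ratio(hits, false_alarms):
    """Share of a model's genesis forecasts that verified (1 minus the false-alarm ratio)."""
    return hits / (hits + false_alarms)


# Hypothetical counts for a cautious and an aggressive model (not the study's numbers).
models = {
    "cautious": {"hits": 20, "misses": 80, "false_alarms": 10},
    "aggressive": {"hits": 40, "misses": 60, "false_alarms": 60},
}
for name, c in models.items():
    pod = probability_of_detection(c["hits"], c["misses"])
    sr = success_ratio(c["hits"], c["false_alarms"])
    print(f"{name}: POD {pod:.2f}, success ratio {sr:.2f}")
```

The cautious profile has a modest detection rate but few false alarms, like the European model in the study. The aggressive profile detects more events at the cost of many false alarms, like the Canadian. That is why agreement between the two kinds of model carries extra weight.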