Should we trust artificial intelligence to predict natural disasters?

Count weather forecasting among the numerous industries AI has the potential to transform.

Artificial intelligence is already helping improve forecasts of hurricane tracks, tornado potential, flood risk and other weather threats, but meteorologists are still wrestling with how to fully integrate AI models into daily forecasting and how much to trust the new predictions.

AI weather models are faster and cheaper than conventional, government-run models and could help meteorologists improve accuracy and issue earlier warnings at a time when weather extremes are more frequent and often catastrophic.

“In the last 12 months, we have had a tsunami of demonstrations of different AI methods being used for forecasting across a wide variety of tasks and scales,” Amy McGovern, director of the National Science Foundation’s AI Institute for Research on Trustworthy AI in Weather, Climate, and Coastal Oceanography, said in an email. “People are really seeing the power of AI and realizing that it can be used to help forecasts.”

The maturation of AI weather models comes a decade after the rivalry between two heavyweights of conventional weather modeling, the “European model” and the “American model,” first went mainstream. Now a new wave of AI models, largely developed by the private sector, appears to be equaling or exceeding the performance of conventional models operated by the world’s leading government weather agencies.

But whether and when AI models could become meteorologists’ primary forecasting tools is still up for debate. Experts say it depends not only on how easily commercially developed AI models can be transitioned into operations but also on whether they will be trusted and embraced by government agency leaders and forecasters.

“The most innovative work on the modeling itself seems to come from private companies right now more than the [government] weather services. The weather services need to maybe be paying more attention to this,” said University of Washington professor of atmospheric sciences Dale Durran, who helped build one of the earliest machine learning models for weather forecasting. “They have a lot invested in the current approach and it works pretty well, but it’s very computationally intensive.”

Extreme weather remains greatest forecasting challenge

Weather forecasts have gained about one day of useful lead time per decade: Today’s seven-day forecast is about as accurate as the three-day forecast was in the 1980s. Still, extreme weather hazards such as hurricanes, tornadoes, floods, hail, winter storms, heat waves and drought remain challenging to predict with the detail, confidence and advance notice that people need to properly prepare.

The National Oceanic and Atmospheric Administration estimates the cost of the most extreme weather and climate disasters impacting the United States during the past seven years at more than $1 trillion. Globally, the World Bank estimates that, in addition to saving tens of thousands of lives, improved forecasts and early warning systems could yield an annual economic benefit of $162 billion.

Hurricane forecasts are a prime example of both the progress that’s been made and challenges that remain. National Hurricane Center data shows a steady increase in the accuracy of hurricane track forecasts, with the average error for a three-day forecast dropping from more than 300 nautical miles in the 1980s to less than 100 nautical miles during the last few years. Hurricane intensity forecasts have improved as well, albeit at a slower pace.

Only last September, though, a difficult track forecast left some in southwest Florida confused about Hurricane Ian’s path. The Hurricane Center’s initial projections for Ian’s landfall leaned toward Florida’s Big Bend area and Tampa Bay, around 100 to 275 miles north of where the storm ultimately came ashore near Fort Myers. Even though Fort Myers was always within the center’s forecast cone, the forecast didn’t lock in on the area until about 24 hours before landfall.

Officials in Lee County, where Fort Myers is located and where there were more than 70 reported deaths related to Ian, were second-guessed for not issuing evacuation orders until the day before landfall, which may have been too late for some people to leave.

Precipitation is another area in which major forecast challenges persist. A NOAA document outlining its strategy for precipitation prediction says that unlike temperature forecasts, which have “improved greatly over the last few decades,” precipitation forecasts have not made the same strides. Accurate predictions, the report adds, “are needed by every person and business in the United States and at almost every timescale.”

A novel approach to forecasting

Computer models first started to produce useful weather forecasts in the 1950s based on complex mathematical equations that describe the day-to-day evolution of the atmosphere. Running these models requires tremendous processing power, and upgrading them is an expensive process that can take years to complete. Meanwhile, new supercomputers, such as the one the U.K. Met Office purchased from Microsoft in 2021, can cost more than $1 billion.

AI’s approach to forecasting is novel. AI models are first trained on vast amounts of historical data, which they analyze to find relationships between past observations or forecasts and the conditions that followed. To make a forecast, they ingest conditions or forecasts from conventional models, and then apply what they learned from the past. The concept is not unlike how meteorologists use their experience to anticipate how the weather might deviate from a model forecast, but AI does this at a scale and speed impossible for the human brain.
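To make the train-then-apply idea concrete, here is a deliberately tiny Python sketch of learning from past forecasts. It fits a straight-line correction between a conventional model’s past temperature forecasts and what was actually observed, then applies that correction to a new forecast. The function names and numbers are hypothetical, and real AI weather models learn far richer, nonlinear relationships across millions of grid points; this only illustrates the pattern the paragraph describes.

```python
def fit_linear_correction(past_forecasts, past_observations):
    """'Training': least-squares fit of observation = intercept + slope * forecast."""
    n = len(past_forecasts)
    mean_f = sum(past_forecasts) / n
    mean_o = sum(past_observations) / n
    # Covariance between forecasts and observations, and variance of forecasts.
    cov = sum((f - mean_f) * (o - mean_o)
              for f, o in zip(past_forecasts, past_observations))
    var = sum((f - mean_f) ** 2 for f in past_forecasts)
    slope = cov / var
    intercept = mean_o - slope * mean_f
    return intercept, slope


def apply_correction(model, new_forecast):
    """'Forecasting': apply the learned relationship to a fresh model forecast."""
    intercept, slope = model
    return intercept + slope * new_forecast


# Hypothetical history: the conventional model ran 2 degrees too warm.
past_forecasts = [20.0, 25.0, 30.0, 35.0]
past_observations = [18.0, 23.0, 28.0, 33.0]
model = fit_linear_correction(past_forecasts, past_observations)
print(apply_correction(model, 27.0))  # → 25.0
```

The learned correction removes the consistent warm bias, much as an experienced forecaster would mentally adjust a model that is known to run warm.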

Once trained, AI models can generate forecasts in seconds to a few minutes on a desktop computer, compared to more than an hour on large supercomputers for most conventional models. Retraining an AI model on recent data takes only a few hours to a few weeks, potentially accelerating what historically has been a steady — but relatively slow — pace of forecast improvement.

Most AI models now being used operationally are “hybrid” models that use forecasts from conventional models as a starting point to predict the risk of a particular weather hazard in a specific region. But a rapidly emerging new breed of “pure” AI models can directly ingest conditions and make global forecasts independent from conventional models.

It’s largely the private sector driving the development of these more ambitious global AI forecast systems. One such model, called GraphCast, has been developed by Google DeepMind, Google’s London-based AI research lab. GraphCast can generate a forecast out to 10 days in less than a minute and was found to be about 10 to 30 percent more accurate than the European model, according to an article posted in December to arXiv, an online repository of preprints that does not require peer review.

“For a while, I think people in the weather community probably thought that [AI] was never going to be able to be at a point where it can compete with the traditional approach,” Google DeepMind senior research scientist Rémi Lam said in an interview. “I think we’re at the point where the meteorologist community is now paying attention and thinking there’s something happening here.”

Microsoft, NVIDIA and China-based Huawei have also published academic articles in the past 16 months claiming their AI weather models perform as well as or better than the European model, widely considered the gold standard of conventional models. NVIDIA’s model, FourCastNet, is 50,000 times faster than conventional models and 10,000 times more energy efficient, a company spokesperson said in an email.

“Big tech firms have vastly more resources (both computational and human) to develop AI techniques and train very large models on massive datasets compared to public agencies and many research labs,” Daniel Rothenberg, an atmospheric scientist with Google’s autonomous vehicle spinoff Waymo, said in an email. “I think this is the biggest, mostly unnoticed change in AI/weather over the past few years.”

Weather services starting to see the power of AI

For Hurricane Ian, it turns out one model was more accurate every step of the way than the official forecast from the Hurricane Center, according to the center’s post-storm evaluation. It was the center’s HCCA model, a hybrid AI model that merges input from multiple models and uses machine learning to lean more heavily on the models that performed better on past forecasts.
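One simple way to lean more heavily on historically better models is to weight each model by the inverse of its past average track error and then average the models’ predicted positions. The Python sketch below assumes that scheme purely for illustration; the Hurricane Center’s actual HCCA method differs in its details, and the model names, past errors and positions here are hypothetical.

```python
def inverse_error_weights(past_errors):
    """Weight each model by 1 / (past mean error), normalized so weights sum to 1."""
    inverse = {name: 1.0 / err for name, err in past_errors.items()}
    total = sum(inverse.values())
    return {name: inv / total for name, inv in inverse.items()}


def weighted_consensus(forecasts, weights):
    """Weighted average of (latitude, longitude) forecast positions.

    Averaging latitude and longitude directly is only reasonable when the
    positions are close together, as they are in this toy example.
    """
    lat = sum(weights[name] * pos[0] for name, pos in forecasts.items())
    lon = sum(weights[name] * pos[1] for name, pos in forecasts.items())
    return lat, lon


# Hypothetical past mean track errors (nautical miles) and predicted positions.
past_errors = {"MODEL_A": 50.0, "MODEL_B": 100.0, "MODEL_C": 100.0}
forecasts = {
    "MODEL_A": (26.5, -82.0),
    "MODEL_B": (27.5, -82.5),
    "MODEL_C": (28.0, -83.0),
}
weights = inverse_error_weights(past_errors)
print(weights)
print(weighted_consensus(forecasts, weights))
```

Because MODEL_A’s past error is half that of the others, it receives half of the total weight, and the consensus position lands near 27.1 N, 82.4 W, pulled toward MODEL_A’s track rather than sitting at the plain average of all three.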

HCCA’s average three-day forecast track error for Ian was 97 nautical miles versus 116 for the official forecast. Its average two-day forecast track error was 55 nautical miles versus 67 for the official forecast. Both represent meaningful improvements, since evacuations cost roughly $1 million per mile of coastline. The model also outperformed the Hurricane Center’s official track forecast for several other storms in 2022.

AI is “directly contributing to the many forecast improvements that you’ve seen from the Hurricane Center of late,” Jamie Rhome, the center’s deputy director, said in an interview.

Another hybrid AI model, developed at Colorado State University, learns from past forecasts and verified instances of severe weather to predict heavy rain, large hail, damaging thunderstorm winds and tornadoes up to eight days in advance. The information is used by meteorologists at the National Weather Service’s Storm Prediction Center to issue more confident forecasts of severe weather outbreaks, especially at longer lead times.

“There is very limited information available to forecasters at lead times of four to eight days, basically just information about possible storm environments,” Colorado State University professor of atmospheric science Russ Schumacher, who leads the team that developed the model, said in an email. “So being able to use machine learning to provide actual probabilities of severe weather has been a big advance.”

Other hybrid AI models in development at universities and research labs are aimed at improving forecasts for a variety of weather and climate phenomena including flooding, lightning, heat waves, drought, winter precipitation type and atmospheric rivers, like the ones that dumped historic amounts of snow and rain on California this past winter.

How quickly AI models are adopted may hinge in part on whether they win the trust of government agency leaders and forecasters who have spent years mastering conventional models.

“The concern with AI is often that it’s a black box, where you just put some numbers in and get numbers out and don’t know how the process works in between,” Schumacher said.

‘Astonishing’ progress in AI models draws attention, questions

While the use of hybrid AI models gradually grows, it’s the progress and momentum of the pure AI global models that is attracting increased attention — and questions — from leading government forecast centers. Both NOAA and the European Centre for Medium-Range Weather Forecasts, which operate the American and European models, say they are in contact or collaborating with at least some of the companies building AI models to explore how they could be used in operations.

“The progress of the pure machine learning models has been quite astonishing during the last six months and many scientists in the field have been taken by surprise regarding the quality of predictions,” Peter Dueben, head of Earth system modeling at the European Center, said in an email. “However, we still need to gain more experience how to work with these models.”

Hendrik Tolman, the National Weather Service’s senior adviser for advanced modeling systems, struck a similar tone.

“The results of these kinds of approaches are amazing. However, for day-to-day operations, we simply do not have enough experience with such approaches to be able to have a clear path forward,” Tolman said in an email.

One challenge is that NOAA, the European Center and other national forecasting centers haven’t figured out how to make the atmospheric observations that are used to run the conventional models available to the AI models in real time.

However, AI weather start-ups Atmo and Zurich-based Jua both say they have developed their own data pipelines and are already operating stand-alone AI models. In May, Atmo announced “the first AI-based live global weather forecast.” Its customers include both the U.S. Air Force and countries that cannot afford conventional models.

“We essentially think it’s possible now to use an AI meteorology model to allow every country and region to operate their own forecast,” Atmo CEO Alexander Levy said in an interview.

Levy, Tolman, Dueben and other experts all point to ensemble modeling as one of the most promising applications of AI. In ensemble modeling, the same model is run multiple times, each time starting from slightly tweaked initial atmospheric conditions (and, in some systems, with small tweaks to the model itself) to represent uncertainty in the observations and the approximations the model makes. The result is a range of possible outcomes, rather than a single forecast, that meteorologists use to judge which outcome is most probable and how much confidence to place in it.

Because generating an ensemble forecast with conventional models is time-consuming and expensive, existing systems are limited to about 50 simulations. The speed and efficiency of AI could allow for ensembles that generate hundreds or even thousands of simulations in as little as a few minutes.
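The ensemble idea can be sketched with a classic toy of chaos theory, the Lorenz-63 system, standing in for the atmosphere. The Python sketch below starts 50 copies from nearly identical conditions, nudges each by a tiny random amount, steps them all forward and measures how far apart they drift. The Lorenz-63 system, the perturbation size and the step counts are illustrative assumptions, not any operational configuration.

```python
import math
import random


def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivatives of the Lorenz-63 toy 'atmosphere'."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance one state by dt using the classic fourth-order Runge-Kutta scheme."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz63(state)
    k2 = lorenz63(shifted(state, k1, dt / 2))
    k3 = lorenz63(shifted(state, k2, dt / 2))
    k4 = lorenz63(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def run_ensemble(initial, n_members, n_steps, dt=0.01, perturbation=1e-3, seed=0):
    """Run n_members copies of the model, each from a slightly perturbed start."""
    rng = random.Random(seed)
    members = [tuple(v + rng.gauss(0.0, perturbation) for v in initial)
               for _ in range(n_members)]
    for _ in range(n_steps):
        members = [rk4_step(m, dt) for m in members]
    return members


def mean_and_spread(members, index=0):
    """Ensemble mean and standard deviation of one variable (x by default)."""
    values = [m[index] for m in members]
    mean = sum(values) / len(values)
    spread = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, spread


start = (1.0, 1.0, 1.0)
for steps in (50, 500, 2000):
    members = run_ensemble(start, n_members=50, n_steps=steps)
    mean, spread = mean_and_spread(members)
    print(f"t={steps * 0.01:.1f}: mean x={mean:.2f}, spread={spread:.3f}")
```

At short lead times the members stay tightly clustered, signaling high confidence; by later lead times the tiny initial differences have grown until the members disagree widely, signaling low confidence. With a model this cheap, raising `n_members` from 50 to 1,000 is a one-argument change, which is the kind of scaling the speed of AI models may make affordable for real atmospheric forecasts.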

Larger ensembles could especially benefit what are known as subseasonal-to-seasonal forecasts, which predict trends in temperature and precipitation from two weeks to two months into the future. Such forecasts matter to many industries including agriculture, energy and water management.

Building trust in ‘black box’ models

As forecasters start to use AI models more often, Schumacher is optimistic the systems can become less opaque, less of the “black box” he described.

“There are tools that help to explain how the machine learning makes its predictions,” Schumacher said.

What Schumacher is referring to is the field of “explainable AI,” which aims to help humans better understand how and why an AI model reaches a particular conclusion. In the case of an AI weather model, a more transparent system might identify wind direction, humidity and pressure as the most important predictors of severe thunderstorms.
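One widely used explainable-AI technique is permutation importance: shuffle one input across the dataset and measure how much the model’s error grows. Inputs the model relies on cause a large jump; inputs it ignores cause none. The Python sketch below applies this to a hypothetical toy model and made-up data (scikit-learn ships a fuller version as `sklearn.inspection.permutation_importance`); it is not necessarily the method Schumacher’s team uses.

```python
import random

# Hypothetical standardized inputs; "random_noise" is deliberately irrelevant.
FEATURES = ["humidity", "pressure", "random_noise"]


def toy_model(row):
    """Hypothetical 'trained' severe-storm score that uses humidity and pressure only."""
    return 2.0 * row["humidity"] - 1.0 * row["pressure"]


def mse(model, rows, targets):
    """Mean squared error of the model over the dataset."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, features, seed=0):
    """Increase in error when each feature's column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importance = {}
    for name in features:
        shuffled = [r[name] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row, replacing only this feature with its shuffled value.
        permuted_rows = [{**r, name: v} for r, v in zip(rows, shuffled)]
        importance[name] = mse(model, permuted_rows, targets) - baseline
    return importance


rng = random.Random(42)
rows = [{name: rng.gauss(0.0, 1.0) for name in FEATURES} for _ in range(500)]
targets = [toy_model(r) for r in rows]
print(permutation_importance(toy_model, rows, targets, FEATURES))
```

Humidity should come out well ahead of pressure, and the noise column scores zero because the model never looks at it, which is the kind of ranking a forecaster could check against physical intuition.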

Schumacher suggests forecasters can build trust with AI models the same way they do with conventional models — by “looking at them closely to identify both where it succeeds and where it fails,” he said.

As for whether meteorologists should be concerned about AI taking over their jobs, Rothenberg maintains that the human element in forecasting will always remain important.

“Humans are vital for tailoring communications and effectively informing stakeholders, especially in complex emergency or severe situations where timeliness, accuracy, and actionability and trust matter above all else,” Rothenberg said.