
Technical Blog Vol. 1

by Christian Lessig


Rethinking Weather and Climate with Machine Learning

The WeatherGenerator project has an ambitious goal: to use machine learning to train a large neural network that represents the entire Earth system on time scales from minutes to decades, that is, to build a foundation model for weather and climate. The WeatherGenerator will support a wide range of applications, such as local and global weather forecasting, decadal climate projections, and impact models for sectors including wind energy, agriculture and hydrology. It will thus be a machine-learned “digital twin”, following the development of the Extreme Event and Climate Digital Twins as part of Destination Earth by the European Centre for Medium-Range Weather Forecasts (ECMWF) and its partners.

Cutting Edge Machine Learning for Earth System Modeling

For the WeatherGenerator, we build on the innovations behind Large Language Models (LLMs), such as OpenAI’s GPT series or Google’s Gemini, and adapt them to Earth system science. Just as LLMs learn language from vast amounts of text, we will train the WeatherGenerator on a very large, heterogeneous dataset drawn from observations, simulations and reanalyses. We will also use fine-tuning, the technique that turns a base model such as GPT-4 into an application such as ChatGPT. Fine-tuning lets us calibrate the WeatherGenerator’s output against data that is of very high quality but very limited in volume, such as ground station measurements, and tune the model for specific applications.
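
To make the calibration idea concrete, here is a minimal sketch under strong simplifying assumptions. It does not fine-tune a neural network; instead, a frozen base model's predictions are corrected by a small affine layer (scale and offset) fitted by least squares on a handful of station observations. The function name `fit_affine_correction` and all numbers are invented for illustration and are not taken from the WeatherGenerator code.

```python
def fit_affine_correction(predictions, observations):
    """Fit obs ~ scale * pred + offset by ordinary least squares."""
    n = len(predictions)
    mean_p = sum(predictions) / n
    mean_o = sum(observations) / n
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    if var_p == 0:
        raise ValueError("predictions must not all be identical")
    cov = sum((p - mean_p) * (o - mean_o)
              for p, o in zip(predictions, observations))
    scale = cov / var_p
    offset = mean_o - scale * mean_p
    return scale, offset


# Hypothetical 2 m temperatures (deg C) from a frozen base model ...
base_predictions = [10.0, 12.0, 14.0, 16.0]
# ... and the matching, scarce but trusted, station measurements.
station_obs = [11.5, 13.5, 15.5, 17.5]

scale, offset = fit_affine_correction(base_predictions, station_obs)

# Apply the learned correction to a new base-model prediction.
calibrated = scale * 20.0 + offset
```

In this toy case the base model is consistently 1.5 degrees too cold, so the fit recovers a scale of 1 and an offset of 1.5. Real fine-tuning would update network weights with gradient descent, but the principle is the same: a small amount of trusted data adjusts a model trained on much larger, noisier data.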

The project started in February 2025, and after only three months of collaboration we already have a first prototype that demonstrates some of the WeatherGenerator's key principles. The figure below shows a simplified view of the model, which we trained on five very different input datasets:

  • ERA5: the Copernicus fifth-generation reanalysis, which provides the best estimate of the global state of the atmosphere

  • CERRA: a high-resolution reanalysis that provides the best estimate of the state of the atmosphere over Europe at 5.5 km resolution

  • METEOSAT 11-SEVIRI: an imager on a geostationary satellite that provides observations of the atmospheric state over Africa and Europe (at approx. 50 km resolution in the version currently used)

  • NPP-ATMS: a polar-orbiting microwave sounder that observes atmospheric temperature and humidity around the globe

  • SYNOP stations: ground station measurements from around the globe, for example of surface temperature and wind

[Figure: WeatherGenerator structure]

The datasets are fused in the “assimilation engine” (grey) of the WeatherGenerator, which combines the information from the different sources into an internal estimate of the Earth system state that is better than any individual input could provide. The fusion process starts locally, within small neighbourhoods; the global Earth system state (orange) is then determined by combining the local information with attention heads. Physical predictions are obtained through output-specific prediction networks that take a location and time as input and translate the latent Earth system state into the corresponding physical value. The visualizations on the right show the current model predictions for a short-term, 6-hour forecast.
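
The three stages described above (local fusion, global combination via attention, and query-based prediction) can be sketched in a few lines. This is a toy illustration, not the project's implementation: it uses a single attention head instead of several, collapses the global state into one vector, and replaces learned networks with averaging and a fixed linear readout. The names `fuse_local`, `attention_pool` and `predict`, as well as all numbers, are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fuse_local(source_embeddings):
    """Local fusion: average the embeddings of all sources in one neighbourhood."""
    n = len(source_embeddings)
    dim = len(source_embeddings[0])
    return [sum(e[i] for e in source_embeddings) / n for i in range(dim)]


def attention_pool(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, pooled


def predict(latent, lat, lon, hours_ahead, head_weights):
    """Hypothetical output head: linear readout of latent state plus query."""
    features = latent + [lat / 90.0, lon / 180.0, hours_ahead / 24.0]
    return dot(features, head_weights)


# Toy embeddings: each neighbourhood holds one vector per available source.
neighbourhoods = [
    [[0.9, 0.1], [0.7, 0.3]],  # e.g. reanalysis and station data
    [[0.2, 0.8]],              # e.g. satellite data only
]
local_tokens = [fuse_local(n) for n in neighbourhoods]

# Combine the local tokens into one global latent vector.
query = [1.0, 0.0]
weights, global_state = attention_pool(query, local_tokens, local_tokens)

# Query the latent state at a location (lat, lon) and a 6-hour lead time.
value = predict(global_state, 52.0, 13.0, 6, [1.0, 0.5, 0.1, 0.1, 0.2])
```

The key design point this sketch tries to convey is that the prediction head is queried with a location and time, so the same latent state can be decoded at arbitrary points rather than only on a fixed grid.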

What’s next?

To meet our ambitious goals for the WeatherGenerator during this four-year project, substantial work remains to be done. We aim to learn how to time-step with our model for longer-term predictions, how to train on higher-resolution data, and how to tackle many other challenges. With the project having only just started, we are looking forward to the work ahead!
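
One common way to extend a short-range model to longer horizons is an autoregressive rollout, in which each forecast is fed back in as the next input. The sketch below illustrates that general idea only; whether the WeatherGenerator will time-step this way is not stated here. The function `step_6h` is a hypothetical stand-in for a trained 6-hour model and simply halves the state.

```python
def step_6h(state):
    """Stand-in for a trained 6-hour forecast step (here: halve every value)."""
    return [0.5 * x for x in state]


def rollout(initial_state, lead_time_hours, step_hours=6, step_fn=step_6h):
    """Apply step_fn repeatedly and return all states, including the initial one."""
    if lead_time_hours % step_hours != 0:
        raise ValueError("lead time must be a multiple of the step length")
    states = [list(initial_state)]
    for _ in range(lead_time_hours // step_hours):
        states.append(step_fn(states[-1]))
    return states


# A 24-hour forecast built from four 6-hour steps.
trajectory = rollout([8.0, -4.0], lead_time_hours=24)
final_state = trajectory[-1]  # [0.5, -0.25]
```

Because every step consumes the previous step's output, errors can accumulate over the rollout, which is one reason stable long-range time-stepping is hard.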

The WeatherGenerator is developed as open source, and the current model is available on GitHub: https://github.com/ecmwf/WeatherGenerator. We will update the blog regularly to keep you informed about our progress.