Co-design applications - Maestro

IFS numerical weather prediction system

The European Centre for Medium-Range Weather Forecasts (ECMWF) runs an operational weather forecasting workflow four times each day. This workflow, shown in Figure 1, contains ECMWF’s Integrated Forecast System (IFS) and runs an ensemble of 51 weather forecasts and a single ‘high-resolution’ forecast to predict the weather for up to 15 days. Data generated by the forecast model is then post-processed by a swarm of workflow components to generate and disseminate products to ECMWF’s member and co-operating states and other licensed customers. The high number of data exchanges required between workflow components – especially between the forecast model and the post-processing engine – creates a bottleneck. Without further technological advances, this may restrict future increases in forecast resolution, which is one of the most important means of improving the accuracy of weather forecasts.

**Figure 1**: A sketch of the operational workflow at ECMWF

Maestro is capable of orchestrating more efficient data movement between the workflow components. It can thus assist ECMWF in exploiting novel storage technologies and heterogeneous storage architectures, as well as improving its management of data dependencies. These will in turn facilitate efforts to accelerate the processing and delivery of higher resolution data to its users. Figure 2 shows an I/O benchmark that makes use of the Maestro technology. Many of the components in this workflow are open source and available at: https://github.com/ecmwf.

**Figure 2**: An example of using Maestro in an ECMWF I/O benchmark. A pool manager (PM) orchestrates the data flow between multiple producers generating a mock weather forecast and multiple consumers carrying out the post-processing.
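The pool-manager pattern in Figure 2 can be modelled, very loosely, as a broker that routes named data objects from producers to consumers. The Python sketch below is a hypothetical in-process illustration only: the `PoolManager` class and its `offer`/`demand` methods are invented for this example and are not the Maestro API, and the "forecast fields" are dummy lists.

```python
import queue
import threading


class PoolManager:
    """Hypothetical in-process broker for named data objects.

    Illustrates the producer -> pool manager -> consumer pattern only;
    it is NOT the Maestro API.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._slots = {}  # object name -> single-item queue

    def _slot(self, name):
        # Create the slot on first use, whether the producer or the consumer arrives first.
        with self._lock:
            return self._slots.setdefault(name, queue.Queue(maxsize=1))

    def offer(self, name, payload):
        """Producer side: publish a named object to the pool."""
        self._slot(name).put(payload)

    def demand(self, name, timeout=5.0):
        """Consumer side: block until the named object has been offered."""
        return self._slot(name).get(timeout=timeout)


def producer(pm, member, steps):
    for step in range(steps):
        field = [float(member + step + i) for i in range(4)]  # mock forecast field
        pm.offer(f"member{member}/step{step}", field)


def consumer(pm, member, steps, results):
    for step in range(steps):
        field = pm.demand(f"member{member}/step{step}")
        results[(member, step)] = sum(field) / len(field)  # mock post-processing


def run(members=3, steps=2):
    pm = PoolManager()
    results = {}
    threads = [threading.Thread(target=producer, args=(pm, m, steps)) for m in range(members)]
    threads += [
        threading.Thread(target=consumer, args=(pm, m, steps, results))
        for m in range(members)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each producer/consumer pair here handles one ensemble member. In the benchmark described above, producers and consumers are separate workflow components and the data movement is handled by Maestro rather than by Python queues.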

Computational Fluid Dynamics plus in-situ analysis

This application is a simplified setup of a realistic usage scenario in which multiple simulation codes are chained and combined with data analysis and visualisation pipelines. Chaining of simulation codes is often used to model different physics and/or scales: the output of the first code is used as the input of the second, with an intermediate data transformation or preparation step in between and a final post-processing step. For this simplified setup, the open source proxy application Hydro, a 2D hydrodynamics simulation code, is used to perform simulations at different scales. Two post-processing pipelines are then run on the simulation output data: a custom data analysis pipeline written in Python, and a visualisation pipeline that uses ParaView and FFmpeg to produce a movie of the simulation.

The Maestro-enabled version will consist of modifying the Hercule I/O library so that simulation data can be produced and/or consumed as Maestro CDOs. The primary objective is to execute the processing steps no longer as post-processing, but in situ in a loosely coupled fashion (also called in transit). In other words, steps 3-a), 3-b) and 3-c) are executed in parallel, with the simulation output data "streamed" from the simulation to the two processing pipelines. This is shown in the following figure.
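The difference between post-processing and in-transit processing can be illustrated with a minimal sketch, assuming a single simulation that emits one small 2D field per timestep and two consumers standing in for the analysis and visualisation pipelines. All function names are hypothetical, and bounded Python queues take the place of Maestro CDOs and Hercule, so this shows only the concurrency structure, not the real I/O path.

```python
import queue
import threading


def simulate(n_steps, sinks):
    """Hypothetical stand-in for the simulation: one 2x2 field per timestep."""
    for step in range(n_steps):
        field = [[step * 10 + i * 2 + j for j in range(2)] for i in range(2)]
        for sink in sinks:  # fan the same timestep out to every pipeline
            sink.put((step, field))
    for sink in sinks:
        sink.put(None)  # end-of-stream marker


def analysis(source, out):
    """Stand-in for the Python analysis pipeline: records the field maximum."""
    while (item := source.get()) is not None:
        step, field = item
        out.append((step, max(max(row) for row in field)))


def visualisation(source, out):
    """Stand-in for the ParaView/FFmpeg pipeline: records a frame name."""
    while (item := source.get()) is not None:
        step, _field = item
        out.append(f"frame_{step:04d}.png")  # a real pipeline would render here


def run_in_transit(n_steps=3):
    to_analysis = queue.Queue(maxsize=2)
    to_vis = queue.Queue(maxsize=2)
    stats, frames = [], []
    workers = [
        threading.Thread(target=simulate, args=(n_steps, [to_analysis, to_vis])),
        threading.Thread(target=analysis, args=(to_analysis, stats)),
        threading.Thread(target=visualisation, args=(to_vis, frames)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return stats, frames
```

The bounded queues provide back-pressure: if one pipeline falls behind, the simulation blocks on `put` instead of buffering an unbounded number of timesteps.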

Global Earth System Modelling system TerrSysMP

Global Earth System Modelling faces the challenge of having to couple different models. The Terrestrial Systems Modelling Platform (TerrSysMP) targets the simulation of interactions between lateral flow processes in river basins and the lower atmospheric layers. This is achieved by combining three model components: COSMO, CLM and ParFlow. COSMO models the atmosphere, the Community Land Model (CLM) is used to study the effect of terrestrial ecosystems on climate, and ParFlow provides hydrological models to simulate surface and subsurface flow. TerrSysMP is used to co-design Maestro so that Maestro can be used to couple these different models. A typical data transfer model between the different parts of the application is shown in the following figure:

The two main components of the simplified workflow are CLM and ParFlow. Both are simulation components that produce and consume data, and the data is transferred between them through the Oasis3-MCT coupler. Over the course of the Maestro project, the Oasis3-MCT coupler has been replaced by Maestro. Because data transfer in the Oasis3-MCT coupler is already highly optimized, Maestro is not expected to provide any performance advantage for the data transfer itself. Instead, the advantages are expected in the maintainability and generality of the process.

TerrSysMP is currently implemented by applying patches to the component applications. A separate set of patches has to be maintained for each use case, which can introduce bugs and cause use cases to fall out of sync with each other. Currently, at least three different use cases have to be kept in sync by manually reimplementing patches from one use case in another.

With Maestro, only the component applications have to be ported to use Maestro; no separate coupler needs to be implemented. Furthermore, the Maestro-enabled port of an application can also be used standalone, without any loss of functionality or performance. This allows the coupling code to be incorporated upstream, reducing its maintenance burden. The same features also make the code considerably easier to debug.

Electronic structure calculation library SIRIUS

Density Functional Theory (DFT) is an established approach for computing the electronic structure of materials. SIRIUS is a domain-specific library for implementing DFT applications. It includes compute-intensive kernels that are good candidates for GPU acceleration. On servers comprising CPUs and GPUs, this requires data to be placed in the memory attached to the CPUs and to the GPUs, and to be exchanged between both types of memory. In the case of SIRIUS this can become challenging, because the matrices involved in the calculation are usually large and therefore do not always fit into GPU memory. Maestro is used to support chunking of the data and the transfer of these chunks to the GPU.
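The chunking idea can be sketched as follows, assuming a dense real matrix stored as 8-byte floats and a fixed device-memory budget. Rows are grouped into blocks that fit within the budget and each block is processed in turn; `device_matvec` is a hypothetical stand-in for a GPU kernel, and the slicing stands in for a host-to-device transfer. How SIRIUS and Maestro actually partition and move data is not shown here.

```python
BYTES_PER_ELEMENT = 8  # assumption: double-precision real entries


def row_chunks(n_rows, n_cols, budget_bytes):
    """Yield (start, stop) row ranges whose data fits in budget_bytes."""
    rows_per_chunk = budget_bytes // (n_cols * BYTES_PER_ELEMENT)
    if rows_per_chunk == 0:
        raise ValueError("budget too small for a single row")
    for start in range(0, n_rows, rows_per_chunk):
        yield start, min(start + rows_per_chunk, n_rows)


def device_matvec(block, x):
    """Hypothetical stand-in for a GPU kernel working on one chunk."""
    return [sum(a * b for a, b in zip(row, x)) for row in block]


def chunked_matvec(matrix, x, budget_bytes):
    """Compute matrix @ x one memory-bounded chunk at a time."""
    y = []
    for start, stop in row_chunks(len(matrix), len(x), budget_bytes):
        block = matrix[start:stop]  # models transferring this chunk to the device
        y.extend(device_matvec(block, x))
    return y
```

A production version would typically overlap the transfer of the next chunk with computation on the current one (double buffering) rather than processing chunks strictly one after another.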

Repository: https://github.com/electronic-structure/SIRIUS

Montage

Montage is a portable toolkit for constructing custom, science-grade mosaics by composing multiple astronomical images. The mosaics constructed by Montage preserve the astrometry (position) and photometry (intensity) of the sources in the input images. The mosaic to be constructed is specified by the user in terms of a set of parameters, including dataset and wavelength to be used, location and size on the sky, coordinate system and projection, and spatial sampling rate. Many astronomical datasets are massive, and are stored in distributed archives that are, in most cases, remote with respect to the available computational resources. An example of one such mosaic is shown in Figure 3.

**Figure 3:** A Mosaic of Astronomical Images

Montage is an example of a class of well-specified deterministic workflows that are common in science. These workflows usually consist of a series of codes (i.e. components) connected together to perform large-scale analysis routines. The inputs to the workflow include a “template header file” that specifies the mosaic to be constructed, and several input images in the standard FITS format (a file format used throughout the astronomy community). Input images are taken from archives such as 2MASS. An example of one such workflow is shown in Figure 4.

Montage is considered to be I/O-bound because, on large workloads, it often spends 70-95% of its time waiting on I/O operations. At the workflow level, Maestro-enabled Montage components can take advantage of the fact that data transfer (in terms of CDOs) can happen through memory instead of through the global file system. This enables certain optimizations. In particular, the workflow can pre-fetch data to the nodes where it is required, reducing the time spent waiting for data.
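The benefit of pre-fetching can be illustrated with a small sketch, assuming each input image must first be read (simulated here with `time.sleep`) and then processed. While image i is being processed, a background thread already reads image i+1, so only the first read is fully exposed. `fetch` and `reproject` are hypothetical stand-ins, not Montage modules, and the latencies are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(name, latency=0.05):
    """Hypothetical stand-in for reading one input image from the file system."""
    time.sleep(latency)
    return [len(name)] * 4  # dummy pixel data


def reproject(pixels, cost=0.05):
    """Hypothetical stand-in for a compute step on one image."""
    time.sleep(cost)
    return sum(pixels)


def run_without_prefetch(names):
    # Each read is fully exposed before its compute step starts.
    return [reproject(fetch(n)) for n in names]


def run_with_prefetch(names):
    results = []
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(fetch, names[0]) if names else None
        for i in range(len(names)):
            pixels = pending.result()  # blocks only if the prefetch is not done yet
            if i + 1 < len(names):
                pending = io.submit(fetch, names[i + 1])  # start the next read early
            results.append(reproject(pixels))
    return results
```

With the latencies chosen here, four images take roughly 0.4 s sequentially and roughly 0.25 s with pre-fetching; the real gain depends on the ratio of I/O time to compute time.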

The Montage application internally uses the FITS file format to load and store image data. The FITS file format has some complexities that make it infeasible to create a serialized in-memory version of FITS data without also redesigning and reimplementing the algorithms of Montage. For example, the FITS file handle used by Montage stores a pointer to a FILE* descriptor, and the majority of the algorithms read and write data directly through this descriptor instead of writing data to memory first. Because these functions often write directly to the file, not all data can be kept in memory. To work around this, a synthetic benchmark named Mocktage has been implemented, which aims to reproduce the computational and I/O workloads of the Montage workflow while avoiding the difficulties of the FITS file format.

Mocktage

Mocktage is a staging demonstrator for Maestro. Its behavior is intended to mirror that of Montage: it uses a dummy data set whose profile matches that of Montage workflows.

Repository: https://gitlab.jsc.fz-juelich.de/maestro/mocktage