System demonstrators - Maestro | Data Orchestration

To demonstrate the capabilities of the Maestro middleware, the project is implementing a set of demonstrators:

Maestro framework use

This demonstrator showcases the coordinated startup of multiple Maestro-enabled applications, comprising producers and consumers of Core Data Objects, with artificial workloads. Each application exercises one aspect of the Maestro core functionality and can serve as example code for that aspect of the Maestro system.

Data aware runtime

Simulation codes typically rely on domain decomposition to spread the computation across multiple processes and nodes. The boundaries of these domains, often called halos, must be communicated on a regular basis. This demonstrator will show how such halos can be exchanged using Maestro. It will use the Maestro-enabled version of MPC, a high-level communication library that implements the MPI API.
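To make the halo idea concrete, here is a minimal single-process sketch in plain Python. It does not use Maestro, MPC or MPI; the function names `decompose` and `exchange_halos`, the one-dimensional periodic layout and the assumption that the field length divides evenly by the number of parts are all illustrative choices.

```python
def decompose(field, parts):
    """Split a 1D field into equal chunks, each padded with one halo cell per side.

    Assumes len(field) is divisible by parts (illustrative simplification).
    """
    size = len(field) // parts
    return [[0.0] + field[i * size:(i + 1) * size] + [0.0] for i in range(parts)]


def exchange_halos(subdomains):
    """Fill every halo cell with the adjacent interior cell of the neighbour.

    Periodic boundaries: the first and last subdomains are neighbours.
    Only halo cells are written, so updating in place is safe.
    """
    n = len(subdomains)
    for i, sub in enumerate(subdomains):
        left = subdomains[(i - 1) % n]
        right = subdomains[(i + 1) % n]
        sub[0] = left[-2]    # left halo <- last interior cell of left neighbour
        sub[-1] = right[1]   # right halo <- first interior cell of right neighbour


field = [float(x) for x in range(8)]
subdomains = decompose(field, 2)
exchange_halos(subdomains)
```

In a distributed code each subdomain would live in a different process, and the two assignments in `exchange_halos` would become communication between those processes.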

Workflow execution and optimisation demonstrator

The goal of this demonstrator is to show how Maestro’s execution framework can manage synthetic workflows. From the workflow manager to the job scheduler, a set of components is in charge of translating a workflow description enhanced with Maestro annotations into a graph of tasks to be scheduled on the HPC system. Those tasks can include both workflow applications and Maestro tasks such as the Maestro Pool Manager or a dynamically provisioned file system.
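The step from a task graph to a schedulable order can be sketched with Python's standard `graphlib` module. This is an illustration only, not Maestro's workflow format or scheduler: the task names, the dependency dictionary and the helper `schedule_waves` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: task name -> set of tasks it depends on.
workflow = {
    "pool_manager": set(),
    "provision_fs": set(),
    "simulation": {"pool_manager", "provision_fs"},
    "analysis": {"simulation"},
}


def schedule_waves(deps):
    """Group tasks into waves; tasks in one wave have no unmet dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock successors
    return waves


waves = schedule_waves(workflow)
```

Each wave contains tasks that could be submitted to the job scheduler together; here the two infrastructure tasks come first, followed by the simulation and then the analysis.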

Intelligent workload management and data preloading/staging

This demonstrator uses the static code analysis capabilities of the Parallelware tools to extract the source code information that is relevant for Maestro middleware in order to enable intelligent decision-making about data movement and data placement in complex memory hierarchies. The demonstrator showcases the command-line tool pwmaestro, which extracts information from the source code and then leverages such information to facilitate the development of Maestro-enabled applications.

Dynamic Provisioning

HPC systems provide dynamic access to compute nodes through a batch scheduler. However, little has been done to provision storage resources dynamically. Such resources are traditionally shared among all users through a fixed API. In Maestro, a dynamic provisioning mechanism that is part of the middleware allocates dedicated resources to a workflow and deploys an appropriate data manager on top of them. This set of resources can be requested explicitly in the workflow description, or granted based on quantitative and qualitative requirements expressed by the workflow (bandwidth, metadata intensity, and so on).
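Granting resources from expressed requirements can be pictured as a small matching problem. The sketch below is a hypothetical illustration, not Maestro's provisioning logic: the classes `StorageResource` and `Requirements`, their fields and the greedy strategy in `select_resources` are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageResource:
    name: str
    bandwidth_gbps: float      # quantitative property
    metadata_optimised: bool   # qualitative property


@dataclass(frozen=True)
class Requirements:
    min_bandwidth_gbps: float
    metadata_intensive: bool = False


def select_resources(pool, req):
    """Greedily pick the fastest suitable resources until the bandwidth is met.

    Returns an empty list if the pool cannot satisfy the requirements.
    """
    candidates = [r for r in pool
                  if r.metadata_optimised or not req.metadata_intensive]
    candidates.sort(key=lambda r: r.bandwidth_gbps, reverse=True)
    chosen, total = [], 0.0
    for resource in candidates:
        if total >= req.min_bandwidth_gbps:
            break
        chosen.append(resource)
        total += resource.bandwidth_gbps
    return chosen if total >= req.min_bandwidth_gbps else []


pool = [
    StorageResource("nvme-a", 10.0, False),
    StorageResource("ssd-b", 5.0, True),
    StorageResource("ssd-c", 4.0, True),
]
granted = select_resources(pool, Requirements(8.0, metadata_intensive=True))
```

With these example values, the metadata-intensive request skips `nvme-a` and is served by `ssd-b` and `ssd-c` together, while the same bandwidth request without the metadata constraint would be served by `nvme-a` alone.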

Guided I/O in pre/post-processing

The Maestro I/O (MIO) interface is a generic object-store I/O interface for the Maestro middleware, providing a gateway to various persistent storage backends. Guided I/O is one of its key features: through MIO, Maestro can pass optimisation and data-usage hints to the persistent storage backend that stores Maestro data. These hints help the backend organise data meaningfully, optimise the management of Core Data Objects, and pre- and post-process data appropriately. Besides application-supplied hints, guided I/O also draws on information gained from telemetry data. MIO includes a unified telemetry interface that allows MIO itself and the applications built on it, such as Maestro core, to encode telemetry records and store them in the object storage backend. In this way, telemetry records from the whole Maestro stack (applications, Maestro core, MIO and the storage backend) can potentially be integrated, enabling holistic analysis of the Maestro middleware.