Maestro Core is a C library that enables multi-threaded, cross-application data transfer and inspection over high-speed interconnects, using lightweight data wrappers called CDOs that carry user metadata and data semantics.
Repository: https://gitlab.jsc.fz-juelich.de/maestro/maestro-core
CORTX is Seagate’s object-storage software infrastructure, which can turn any commodity hardware with storage into a data storage solution. The CORTX Maestro I/O interface (MIO) provides the interface for the Maestro middleware to work with the CORTX object store (specifically the “Motr” component of CORTX). This exemplifies that Maestro can work with any backend storage subsystem.
Repository: https://github.com/Seagate/cortx-mio
ECMWF’s post-processing benchmark is a collection of software packages that together form part of the workflow that creates meteorological products from raw forecast data. It can be adapted to new technologies relatively easily and has been used to compare existing data-transfer technologies with those supported by Maestro Core.
Repositories:
The pwmaestro tool is a command-line tool that uses the static code analysis capabilities of the Parallelware tools (now rebranded as Codee tools) to analyse the data-access semantics of application code written in C/C++. It finds memory-related issues that impact the performance of the code, including non-consecutive array accesses that exhibit poor data locality at run time. It also provides source-code rewriting capabilities to insert calls to the Maestro API. Overall, pwmaestro is intended to facilitate the development of Maestro-enabled applications.
Link to Parallelware: https://codee.com/
RADLE is a parser for the Maestro resource description language, written in C11.
Repository: https://gitlab.jsc.fz-juelich.de/maestro/radle
Splinter is an experimental Python-based workflow executor. It was written to demonstrate the performance advantages of asynchronous, data-dependency-based workflow execution over the more traditional task-dependency-based approach. It also defines a specification for a workflow description language (IWDL); Splinter takes an IWDL workflow as input and executes it. It has experimental support for SLURM-based workload executors.
Repository: https://gitlab.jsc.fz-juelich.de/maestro/splinter
It is common for HPC systems to provide dynamic access to compute nodes through a batch scheduler, but little has been done for dynamically provisioned storage resources, which are traditionally shared among all users. A dynamic provisioning mechanism makes it possible to allocate dedicated resources to an application or workflow and to deploy an appropriate data manager on top of them.
Repository: https://github.com/eth-cscs/dynamic-resource-provisioning
SelFIe is a lightweight profiling tool developed by CEA. It performs light profiling of jobs on supercomputers and provides usage statistics to administrators. The tool has been in production since 2016 on the clusters managed by CEA. SelFIe gives a view of user jobs in terms of:
- Number of cores used (MPI or OpenMP)
- Time spent in the job: execution time, system time, CPU time
- Maximum memory used by the job processes
- Number of calls and time spent in MPI calls
- Number of calls and time spent in POSIX I/O calls
- PAPI hardware counters: number of cycles and number of instructions
- Statistics on Maestro core usage
- Statistics on MIO usage
When a process starts, SelFIe is loaded via the LD_PRELOAD mechanism; at the end of the job, it writes a summary as JSON to the system logs.
Repository: https://github.com/cea-hpc/selFIe