DEEP-ER

DEEP Extended Reach

Starting date:

01/10/2013

End date:

30/09/2016

The goal of DEEP-ER is to update the Cluster-Booster architecture introduced by the DEEP project and extend it with additional parallel I/O and resiliency capabilities. In DEEP-ER, Cluster and Booster nodes will be connected to a uniform network. Novel non-volatile memory technology will be tested at different architecture levels, building up a multi-level storage hierarchy. A software environment for parallel I/O and resiliency will be built upon this storage infrastructure. To recover applications after hardware failures, a multi-level checkpoint/restart mechanism using the available memory devices and a task-based recovery mechanism based on the OmpSs programming environment will be set up.

Framework Programme:

FP7-ICT-2013-10

General notes:

The proposed project DEEP-ER (DEEP-Extended Reach) addresses two significant Exascale challenges: the growing gap between I/O bandwidth and compute speed, and the need to significantly improve system resiliency. DEEP-ER will extend the Cluster-Booster architecture of the Dynamical Exascale Entry Platform (DEEP) project with a highly scalable I/O system and will implement an efficient mechanism to recover application tasks that fail due to hardware errors. The project will leverage new memory technology to provide increased performance and power efficiency. As a result, the I/O parts of HPC codes will run faster and scale better, and HPC applications will be able to profit from checkpointing and task restart on large systems, reducing the overhead seen today. Systems that use the DEEP-ER results can run more applications, increasing scientific throughput, and the loss of computational work through system failures will be substantially reduced.

DEEP-ER will build a prototype with the second-generation Intel® Xeon Phi processor, a uniform high-speed interconnect across Cluster and Booster, non-volatile memory on the compute nodes, and network-attached memory providing high-speed shared memory access. A highly scalable and efficient I/O system based on the Fraunhofer file system will support I/O-intensive applications, using the optimised I/O middleware SIONlib and EIOW. A multi-level checkpoint scheme will exploit scalable I/O and fast, non-volatile memory close to the nodes to reduce the overhead of saving state for long-running tasks. The OmpSs-based DEEP programming model will govern the creation of checkpoints and will either restart failed tasks from the beginning or recover their saved state, depending on task granularity.
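
The idea behind a multi-level checkpoint scheme can be illustrated with a small sketch. The code below is not the DEEP-ER software and does not use SIONlib, EIOW or OmpSs; the class `MultiLevelCheckpointer`, its parameters and the two directories are hypothetical names chosen for illustration. It assumes two tiers: a fast node-local store that receives every checkpoint, and a slower shared store that receives only every n-th one. On restart, the newest checkpoint that survived the failure is used.

```python
import json
import os
import tempfile


class MultiLevelCheckpointer:
    """Toy two-level checkpointing (hypothetical, for illustration only)."""

    def __init__(self, local_dir, global_dir, global_every=5):
        self.local_dir = local_dir      # stands in for node-local non-volatile memory
        self.global_dir = global_dir    # stands in for the shared parallel file system
        self.global_every = global_every
        os.makedirs(local_dir, exist_ok=True)
        os.makedirs(global_dir, exist_ok=True)

    def _write(self, directory, step, state):
        path = os.path.join(directory, f"ckpt_{step:08d}.json")
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"step": step, "state": state}, f)
        # Rename into place so a crash never leaves a half-written checkpoint.
        os.replace(tmp, path)

    def save(self, step, state):
        # Every checkpoint goes to the cheap local tier.
        self._write(self.local_dir, step, state)
        # Only every `global_every`-th checkpoint goes to the expensive shared tier.
        if step % self.global_every == 0:
            self._write(self.global_dir, step, state)

    def _latest(self, directory):
        names = sorted(
            n for n in os.listdir(directory)
            if n.startswith("ckpt_") and n.endswith(".json")
        )
        if not names:
            return None
        with open(os.path.join(directory, names[-1])) as f:
            return json.load(f)

    def restore(self, local_lost=False):
        # Use the local tier unless the node's storage was lost with the failure.
        candidates = []
        if not local_lost:
            candidates.append(self._latest(self.local_dir))
        candidates.append(self._latest(self.global_dir))
        candidates = [c for c in candidates if c is not None]
        if not candidates:
            return None  # nothing saved yet: the task restarts from the beginning
        return max(candidates, key=lambda c: c["step"])


def demo():
    """Save 12 steps, then restore after a soft and a hard failure."""
    with tempfile.TemporaryDirectory() as root:
        ckpt = MultiLevelCheckpointer(
            os.path.join(root, "local"), os.path.join(root, "global"), global_every=5
        )
        for step in range(1, 13):
            ckpt.save(step, {"value": step * 2})
        soft = ckpt.restore()                 # local tier still readable
        hard = ckpt.restore(local_lost=True)  # only the shared tier survives
        return soft["step"], hard["step"]
```

In this sketch, a failure that leaves local storage intact loses no completed steps, while a failure that destroys the local tier rolls back to the last shared checkpoint. The interval `global_every` trades checkpoint overhead against the amount of work that may have to be recomputed.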

Seven important HPC applications will be optimised to demonstrate the usability, performance and resiliency of the DEEP-ER Prototype. The applications come from different scientific and engineering areas and represent the requirements of simulation-based and data-intensive HPC codes.

Url:

http://www.deep-er.eu/

Funding Scheme:

Collaborative project

Partners:

13 partners from 7 different countries collaborate in the DEEP-ER project:

FORSCHUNGSZENTRUM JUELICH GMBH - Germany
INTEL GMBH - Germany
BAYERISCHE AKADEMIE DER WISSENSCHAFTEN - Germany
RUPRECHT-KARLS-UNIVERSITAET HEIDELBERG - Germany
FRAUNHOFER-GESELLSCHAFT ZUR FOERDERUNG DER ANGEWANDTEN FORSCHUNG E.V - Germany
EUROTECH SPA - Italy
BARCELONA SUPERCOMPUTING CENTER - CENTRO NACIONAL DE SUPERCOMPUTACION - Spain
XYRATEX TECHNOLOGY LIMITED - United Kingdom
CONSORZIO INTERUNIVERSITARIO CINECA - Italy
KATHOLIEKE UNIVERSITEIT LEUVEN - Belgium
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE - France
STICHTING ASTRONOMISCH ONDERZOEK IN NEDERLAND - Netherlands
UNIVERSITAET REGENSBURG - Germany

Contact

Fabio Affinito

f.affinito@cineca.it

Contact

Andrew Emerson

a.emerson@cineca.it
