Top of the FLOPS: Scaling up Europe’s Performance

With Europe currently lagging behind the United States and China in terms of the highest-performing computers, the European Union is investing in research to help us reach exascale. In this article, HPC application researcher Mike Ashworth (University of Manchester), EuroEXA coordinator Georgios Goumas (Institute of Communication and Computer Systems, National Technical University of Athens) and EuroEXA dissemination lead Peter Hopton (Iceotope) explain how EuroEXA’s groundbreaking, co-designed architecture is providing the template for exascale.

High-performance computing (HPC) is about using the biggest, fastest computers around for simulation and data processing in support of scientific research in academia, industry and government. The performance of these HPC systems has been growing exponentially over at least five decades, with the raw number crunching power increasing by about 1,000-fold every ten years. Exascale refers to the next target, the next 1,000-fold leap in compute power.

This awesome power is measured by the number of calculations a system can perform every second, expressed as floating point operations per second or ‘FLOPS’. The current world-leading systems operate at around 100 petaFLOPS, or 10¹⁷ FLOPS – the current number one system, ‘Summit’ at Oak Ridge National Laboratory in the USA, is rated at 143.5 petaFLOPS. Roughly another factor of ten is needed to get us to exascale, or 10¹⁸ FLOPS.
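The growth figures above are easy to check. The short Python sketch below uses only the numbers quoted in this article; the helper names are illustrative, and the ‘years to exascale’ value is a naive extrapolation of the historical trend, not a forecast.

```python
import math

# Figures quoted in the article (rated FLOPS).
PETA = 1e15
EXA = 1e18

summit_flops = 143.5 * PETA          # Summit's quoted rating
typical_leader_flops = 100 * PETA    # "around 100 petaFLOPS"

# A 1,000-fold increase every ten years implies this yearly growth factor.
yearly_growth = 1000 ** (1 / 10)

# Doubling time (in years) under that exponential trend.
doubling_years = math.log(2) / math.log(yearly_growth)


def gap_to_exascale(current_flops: float) -> float:
    """Factor by which performance must still grow to reach 10**18 FLOPS."""
    return EXA / current_flops


def years_to_exascale(current_flops: float) -> float:
    """Years needed to close the gap if the historical trend simply continued."""
    return math.log(gap_to_exascale(current_flops)) / math.log(yearly_growth)


if __name__ == "__main__":
    print(f"yearly growth ~ {yearly_growth:.3f}x, doubling every ~ {doubling_years:.2f} years")
    print(f"gap from Summit ~ {gap_to_exascale(summit_flops):.2f}x")
    print(f"gap from 100 petaFLOPS = {gap_to_exascale(typical_leader_flops):.0f}x")
    print(f"trend time from Summit ~ {years_to_exascale(summit_flops):.1f} years")
```

In other words, 1,000-fold per decade is close to doubling every year, and the step from Summit to exascale is nearer a factor of seven than ten.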

HPC has become critically important across a wide range of human endeavours with immeasurable impacts upon our daily lives, our economy and our quality of life. A report by the PRACE Scientific Steering Committee, The Scientific Case for Computing in Europe 2018-2026, lists the following key areas:

  • fundamental sciences
  • climate, weather and earth sciences
  • life sciences and medicine
  • infrastructure and manufacturing chemistry and materials sciences
  • complexity and data
  • next generation computing

The report concludes that ‘enhanced synergies between scientists working on hardware, algorithms, and applications is required for advancing the frontiers of science and industry in Europe for the benefit of its citizens’. A number of critical research areas and problem classes, including fine-grained weather prediction, climate change, large eddy simulation for turbulence modelling in aeronautics and the challenges of fusion energy research, are beyond current computing capabilities; they require exascale performance and beyond.

The impact of exascale computing in many of these areas has been analysed in The Opportunities and Challenges of Exascale Computing, a report from the United States Department of Energy Office of Science. This report concludes that ‘[e]xascale computing will uniquely provide knowledge leading to transformative advances for our economy, security and society in general. A failure to proceed with appropriate speed risks losing competitiveness in information technology, in our industrial base writ large, and in leading-edge science’.

Building on European Research

Launched in September 2017, EuroEXA draws upon previous European research to deliver the recipe for an exascale computer by the project’s end in 2021. Specifically, this major (€20 million) project builds on the work of three projects, ExaNoDe, ExaNeSt and ECOSCALE.

Using co-design principles, we will physically demonstrate a testbed deployment providing an expected 2–4 petaFLOPS of peak performance in an operational environment. The groundbreaking system architecture will be developed and optimized through a series of three testbed systems of increasing performance and sophistication, built over the course of the project.

EuroEXA systems will be equipped with an optimized system software stack that builds on the operating and runtime systems developed in the ExaNoDe project. They will be assessed through a wide set of HPC applications representing a diverse range of scientific subject areas. The testbed architecture will be shown to be capable of scaling to world-class peak performance in excess of 400 petaFLOPS, with an estimated peak system power of around 30 MW.
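To put the 400 petaFLOPS and 30 MW figures in context, the Python sketch below derives the implied energy efficiency, the scale-up from the 2–4 petaFLOPS testbed, and a naive extrapolation to 1 exaFLOPS. It assumes efficiency stays constant as the system grows, which is not guaranteed for a real machine; the function names are illustrative, not part of EuroEXA.

```python
# Figures quoted in the article for the scaled-up EuroEXA architecture.
PETA = 1e15
GIGA = 1e9
MEGAWATT = 1e6

testbed_peak_flops = (2 * PETA, 4 * PETA)   # expected testbed range
scaled_peak_flops = 400 * PETA              # "in excess of 400 petaFLOPS"
scaled_power_watts = 30 * MEGAWATT          # "around 30 MW peak"


def gflops_per_watt(flops: float, watts: float) -> float:
    """Energy efficiency in GFLOPS per watt."""
    return flops / watts / GIGA


def power_mw_at(target_flops: float, efficiency_gflops_per_watt: float) -> float:
    """Power in MW needed for target_flops, assuming constant efficiency."""
    return target_flops / (efficiency_gflops_per_watt * GIGA) / MEGAWATT


eff = gflops_per_watt(scaled_peak_flops, scaled_power_watts)
scale_up = [scaled_peak_flops / t for t in testbed_peak_flops]
exa_power_mw = power_mw_at(1e18, eff)

print(f"efficiency ~ {eff:.1f} GFLOPS/W")
print(f"scale-up from testbed: {scale_up[1]:.0f}x to {scale_up[0]:.0f}x")
print(f"naive power at 1 exaFLOPS ~ {exa_power_mw:.0f} MW")
```

The roughly 13 GFLOPS/W ratio is the useful number here: it shows why energy efficiency, and not just raw performance, is the constraint any exascale design has to address.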

Groundbreaking Architecture Responding to Application Needs

EuroEXA combines state-of-the-art computing components using a groundbreaking system architecture. This applies the design flexibility of UNIMEM, a scalable memory scheme first developed during the FP7 EUROSERVER project and used in ExaNeSt and ECOSCALE. The architecture delivers high levels of performance to the selected applications and balances compute and reconfigurable acceleration resources against the demands of those applications. Through co-design between the enabling technologies, the system software and the applications, EuroEXA is delivering an innovative solution that achieves both extreme data processing and extreme computing.

Work on applications started with an assessment of application requirements and has progressed to porting and optimization work, using testbed systems similar to the EuroEXA architecture, to begin offloading computational kernels onto field-programmable gate arrays (FPGAs). The project is now working to implement a rich system software and runtime stack that will ensure applications can fully exploit the novel characteristics of the underlying architecture.

EuroEXA is evolving both traditional HPC programming paradigms (such as MPI and OpenMP) and novel ones including programming support for FPGA acceleration (such as OmpSs@FPGA), task-based, multi-node programming (such as OmpSs@clusters, OpenStream, UNIMEM-based) and streaming dataflow programming (Maxeler dataflow).
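Task-based models such as OmpSs express a program as tasks with declared data dependencies and let the runtime start each task as soon as its inputs are ready. The Python sketch below illustrates only that scheduling idea, using the standard library's `graphlib` (Python 3.9+) and `concurrent.futures`; it is not the OmpSs, OpenStream or Maxeler API, and `run_task_graph` is a hypothetical helper.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_task_graph(tasks, deps, max_workers=4):
    """Run each task once all of the tasks it depends on have finished.

    tasks: dict name -> callable receiving a dict of finished results
    deps:  dict name -> set of task names it depends on
    """
    sorter = TopologicalSorter(deps)
    for name in tasks:          # include tasks that have no dependencies
        sorter.add(name)
    sorter.prepare()            # raises CycleError for circular dependencies
    results, running = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Launch every task whose inputs are now available.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name], dict(results))] = name
            # Wait for at least one running task, then release its dependants.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                results[name] = fut.result()
                sorter.done(name)
    return results


# Two independent "kernels" read the same input and feed a reduction.
tasks = {
    "load": lambda r: list(range(10)),
    "square": lambda r: [x * x for x in r["load"]],
    "double": lambda r: [2 * x for x in r["load"]],
    "sum": lambda r: sum(r["square"]) + sum(r["double"]),
}
deps = {"square": {"load"}, "double": {"load"}, "sum": {"square", "double"}}

result = run_task_graph(tasks, deps)
print(result["sum"])  # 375
```

Because `square` and `double` depend only on `load`, the scheduler is free to run them concurrently; the programmer states dependencies rather than explicit synchronization, which is the property these runtimes exploit across cores, accelerators and nodes.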

The first of three testbeds, Testbed 1, has been designed, built and installed. Testbed 1 consists of Quad-FPGA Daughter Boards (QFDB) connected by high-speed links in an innovative, state-of-the-art packaging design. A key part of the design process is co-design, with application requirements feeding into the design parameters. As a result, we have updated the specification for Testbed 2, producing a new design, the Co-Design Recommended Daughter Board (CRDB), which is forecast to deliver four times the performance.

Initial designs have also been produced for a novel, hybrid, hierarchical, low-latency, high-performance network, as well as for a custom multi-CPU (central processing unit) application-specific integrated circuit (ASIC) at the core of the Testbed 3 compute node. The ASIC features custom hardware for implementing UNIMEM global addressing and memory compression.
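With global addressing, any node can name memory on any other node with a single address, and the address itself tells the system whether an access is local or must cross the network. The Python sketch below shows one generic way to do this, splitting an address into a node field and an offset field. The field widths and all function names are assumptions for illustration only; the article does not describe UNIMEM's actual address format.

```python
# Hypothetical field widths; the real UNIMEM layout is not given here.
NODE_BITS = 16                      # up to 65,536 nodes
OFFSET_BITS = 40                    # up to 1 TiB addressable per node
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def make_global_address(node_id: int, local_offset: int) -> int:
    """Pack a node ID and a local memory offset into one global address."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id out of range")
    if not 0 <= local_offset <= OFFSET_MASK:
        raise ValueError("local_offset out of range")
    return (node_id << OFFSET_BITS) | local_offset


def split_global_address(addr: int) -> tuple[int, int]:
    """Recover (node_id, local_offset) from a global address."""
    return addr >> OFFSET_BITS, addr & OFFSET_MASK


def route(addr: int, my_node: int) -> str:
    """Decide whether an access is served locally or sent to a remote node."""
    node_id, offset = split_global_address(addr)
    if node_id == my_node:
        return f"local access at offset {offset:#x}"
    return f"remote access to node {node_id} at offset {offset:#x}"


addr = make_global_address(3, 0x1000)
print(route(addr, my_node=3))   # local access at offset 0x1000
print(route(addr, my_node=7))   # remote access to node 3 at offset 0x1000
```

Doing this split in dedicated hardware, rather than in software as here, is what lets remote memory accesses avoid the overhead of a full software communication layer.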

At an early stage, the co-design process revealed that the daughter board for the Testbed 2 system needed to be redesigned, which meant rethinking some of the tasks in the project. However, thanks to flexible and collaborative working, we adapted swiftly and turned this significant challenge into a major success.

Towards the end of 2019, we will reach a major milestone with the deployment of the Testbed 2 system. Application porting and optimization, together with the porting and optimization of parallel language runtimes, will start to reveal the benefits of the EuroEXA architecture, especially its memory and communications aspects, which are key to whole-system performance across the cluster. Moreover, ongoing work on performance modelling will provide initial evidence of how the EuroEXA testbeds can lead to effective exascale machines.

EuroEXA has received funding from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 754337.
