The development and implementation of efficient computer codes for exascale supercomputers will require combined advancement of all development environment components: compilers, automatic tuning frameworks, run-time systems, debuggers and performance monitoring and analysis tools. The exascale era poses unprecedented challenges. Because the presence of accelerators is more and more common among the fastest supercomputer and will play a role in exascale computing, compilers will need to support hybrid computer architectures and generate efficient code hiding the complexity of programming accelerators. Hand optimization of the code will be very difficult on exascale machine and will be increasingly assisted by automatic tuners. Application tuning will be more focus on parallel aspects of the computation because of large amount of available parallelism. The application workload will be distributed over million of processes, and to implement ad-hoc strategies directly in the application will be probably unfeasible while an adaptive run-time system will provide automatic load balancing. Debuggers and performance monitoring tools will deal with million processes and with huge amount of data from application and hardware counters, but they will still be required to minimize the overhead and retain scalability. In this talk, we present how the development environment of the CRESTA exascale EC project meets all these challenges by advancing the state of the art in the field.
An investigation of compiler support for hybrid GPU programming, the design concepts, and the main characteristics of the alpha prototype implementation of the CRESTA development environment components for exascale computing are presented. A performance study of OpenACC compiler directives has been carried out, showing very promising results and indicating OpenACC as viable approach for programming hybrid exascale supercomputer. A new Domain-Specific Language (DSL) has been defined for the expression of parallel auto-tuning at very large scale. The focus of on the extension of the auto-tuning approach into the parallel domain to enable tuning of communication-related aspects of application. A new adaptive run-time system has been designed to schedule processes depending on the resource availability, on the workload, and on the run-time analysis of the application performance. The Allinea DDT debugger and the Dresden University of Technology MUST MPI correctness checker are being extended to provide a unified interface, to improve scalability, and to include new disruptive technology based on statistical analysis of run-time behavior of the application for anomalies detection. The new exascale prototypes of the Dresden University of Technology Vampir, VampirTrace and Score-P performance monitoring and analysis tools have been released. The new features include the possibility of applying filtering technique before loading performance data to drastically reduce memory needs during the performance analysis. The initial evaluation study of the development environment is targeted on the CRESTA project applications to determine how the development environment could be coupled into a production suite for exascale computing.