KTH Publications
1-50 of 117
  • 1.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Fuerlinger, Karl
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
Automatic On-Line Detection of MPI Application Structure with Event Flow Graphs (2015). In: EURO-PAR 2015: PARALLEL PROCESSING, Springer Berlin/Heidelberg, 2015, p. 70-81. Conference paper (Refereed).
    Abstract [en]

    The deployment of larger and larger HPC systems challenges the scalability of both applications and analysis tools. Performance analysis toolsets provide users with means to spot bottlenecks in their applications by either collecting aggregated statistics or generating lossless time-stamped traces. While obtaining detailed trace information is the best method to examine the behavior of an application in detail, it is infeasible at extreme scales due to the huge volume of data generated. In this context, knowing the application structure, and particularly the nesting of loops in iterative applications, is of great importance as it allows, among other things, reducing the amount of data collected by focusing on important sections of the code. In this paper we demonstrate how the loop nesting structure of an MPI application can be extracted on-line from its event flow graph without the need for any explicit source code instrumentation. We show how this knowledge of the application structure can be used to compute postmortem statistics as well as to reduce the amount of redundant data collected. To that end, we present a usage scenario where this structure information is utilized on-line (while the application runs) to intelligently collect fine-grained data for only a few iterations of an application, considerably reducing the amount of data gathered.
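    As a rough illustration of the event flow graph idea described above (not the authors' implementation), the following Python sketch builds a graph whose nodes are MPI call sites and whose weighted edges count transitions between consecutive calls, then reports back edges as candidate loop structure; the call-site names are hypothetical.

```python
# Illustrative sketch only: build an event flow graph from a per-process
# sequence of MPI call sites and report back edges (candidate loops).
from collections import defaultdict

def build_event_flow_graph(call_sites):
    """call_sites: ordered list of call-site identifiers, e.g. 'MPI_Send@foo.c:42'."""
    edges = defaultdict(int)                      # (src, dst) -> transition count
    for src, dst in zip(call_sites, call_sites[1:]):
        edges[(src, dst)] += 1
    return edges

def candidate_loop_edges(edges, trace_order):
    """Edges that jump backwards in first-appearance order hint at iterative structure."""
    first_seen = {}
    for i, site in enumerate(trace_order):
        first_seen.setdefault(site, i)
    return [(s, d, n) for (s, d), n in edges.items() if first_seen[d] <= first_seen[s]]

trace = ["MPI_Init", "MPI_Irecv", "MPI_Send", "MPI_Wait",
         "MPI_Irecv", "MPI_Send", "MPI_Wait", "MPI_Finalize"]
graph = build_event_flow_graph(trace)
print(candidate_loop_edges(graph, trace))   # -> [('MPI_Wait', 'MPI_Irecv', 1)]
```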

  • 2.
    Aguilar, Xavier
    et al.
    KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Fürlinger, K.
    Laure, Erwin
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
Online MPI trace compression using event flow graphs and wavelets (2016). In: Procedia Computer Science, Elsevier, 2016, p. 1497-1506. Conference paper (Refereed).
    Abstract [en]

    Performance analysis of scientific parallel applications is essential to use High Performance Computing (HPC) infrastructures efficiently. Nevertheless, collecting detailed data of large-scale parallel programs and long-running applications is infeasible due to the huge amount of performance information generated. Even though there are no technological constraints in storing terabytes of performance data, the constant flushing of such data to disk introduces a massive overhead into the application that makes the performance measurements worthless. This paper explores the use of event flow graphs together with wavelet analysis and EZW encoding to provide MPI event traces that are orders of magnitude smaller while preserving accurate information on timestamped events. Our mechanism compresses the performance data online while the application runs, thus reducing the pressure put on the I/O system due to buffer flushing. As a result, we achieve lower application perturbation, reduced performance data output, and the possibility to monitor longer application runs.
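    A rough sketch of the compression idea, with PyWavelets standing in for the paper's wavelet stage and simple coefficient thresholding standing in for EZW encoding; the synthetic timing data and the 5% retention rate are made up for illustration.

```python
# Rough illustration (not the paper's EZW pipeline): compress a per-iteration
# timing signal with a wavelet transform and keep only the largest coefficients.
# Assumes the PyWavelets package (pywt) and NumPy are installed.
import numpy as np
import pywt

timings = np.random.default_rng(0).normal(1.0, 0.05, size=1024)  # stand-in MPI timings

coeffs = pywt.wavedec(timings, "haar", level=5)        # multi-level wavelet decomposition
flat, slices = pywt.coeffs_to_array(coeffs)

keep = int(0.05 * flat.size)                           # keep the 5% largest coefficients
threshold = np.sort(np.abs(flat))[-keep]
flat[np.abs(flat) < threshold] = 0.0                   # crude stand-in for EZW encoding

reconstructed = pywt.waverec(
    pywt.array_to_coeffs(flat, slices, output_format="wavedec"), "haar")
print("max reconstruction error:", np.max(np.abs(reconstructed - timings)))
```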

  • 3.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Fürlinger, Karl
    Ludwig-Maximilians-Universitat (LMU).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
MPI Trace Compression Using Event Flow Graphs (2014). Conference paper (Refereed).
    Abstract [en]

    Understanding how parallel applications behave is crucial for using high-performance computing (HPC) resources efficiently. However, the task of performance analysis is becoming increasingly difficult due to the growing complexity of scientific codes and the size of machines. Even though many tools have been developed over the past years to help in this task, current approaches either only offer an overview of the application discarding temporal information, or they generate huge trace files that are often difficult to handle.

    In this paper we propose the use of event flow graphs for monitoring MPI applications, a new and different approach that balances the low overhead of profiling tools with the abundance of information available from tracers. Event flow graphs are captured with very low overhead, require orders of magnitude less storage than standard trace files, and can still recover the full sequence of events in the application. We test this new approach with the NERSC-8/Trinity Benchmark suite and achieve compression ratios up to 119x.

  • 4.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Fürlinger, Karl
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
Visual MPI Performance Analysis using Event Flow Graphs (2015). In: Procedia Computer Science, E-ISSN 1877-0509, Vol. 51, p. 1353-1362. Article in journal (Refereed).
    Abstract [en]

    Event flow graphs used in the context of performance monitoring combine the scalability and low overhead of profiling methods with lossless information recording of tracing tools. In other words, they capture statistics on the performance behavior of parallel applications while preserving the temporal ordering of events. Event flow graphs require significantly less storage than regular event traces and can still be used to recover the full ordered sequence of events performed by the application. In this paper we explore the usage of event flow graphs in the context of visual performance analysis. We show that graphs can be used to quickly spot performance problems, helping to better understand the behavior of an application. We demonstrate our performance analysis approach with MiniFE, a mini-application that mimics the key performance aspects of finite-element applications in High Performance Computing (HPC).

  • 5.
    Aguilar, Xavier
    et al.
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Jordan, H.
    Heller, T.
    Hirsch, A.
    Fahringer, T.
    Laure, Erwin
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
An On-Line Performance Introspection Framework for Task-Based Runtime Systems (2019). In: 19th International Conference on Computational Science, ICCS 2019, Springer Verlag, 2019, p. 238-252. Conference paper (Refereed).
    Abstract [en]

    The expected high levels of parallelism together with the heterogeneity and complexity of new computing systems pose many challenges to current software. New programming approaches and runtime systems that can simplify the development of parallel applications are needed. Task-based runtime systems have emerged as a good solution to cope with high levels of parallelism, while providing software portability, and easing program development. However, these runtime systems require real-time information on the state of the system to properly orchestrate program execution and optimise resource utilisation. In this paper, we present a lightweight monitoring infrastructure developed within the AllScale Runtime System, a task-based runtime system for extreme scale. This monitoring component provides real-time introspection capabilities that help the runtime scheduler in its decision-making process and adaptation, while introducing minimum overhead. In addition, the monitoring component provides several post-mortem reports as well as real-time data visualisation that can be of great help in the task of performance debugging.

  • 6.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Fürlinger, Karl
    Ludwig-Maximilians-Universität München.
Online Performance Data Introspection with IPM (2014). In: Proceedings of the 15th IEEE International Conference on High Performance Computing and Communications (HPCC 2013), IEEE Computer Society, 2014, p. 728-734. Conference paper (Refereed).
    Abstract [en]

    Exascale systems will be heterogeneous architectures with multiple levels of concurrency and energy constraints. In such a complex scenario, performance monitoring and runtime systems play a major role to obtain good application performance and scalability. Furthermore, online access to performance data becomes a necessity to decide how to schedule resources and orchestrate computational elements: processes, threads, tasks, etc. We present the Performance Introspection API, an extension of the IPM tool that provides online runtime access to performance data from an application while it runs. We describe its design and implementation and show its overhead on several test benchmarks. We also present a real test case using the Performance Introspection API in conjunction with processor frequency scaling to reduce power consumption.

  • 7.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Vahtras, Olav
    KTH, School of Biotechnology (BIO), Theoretical Chemistry and Biology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Gimenez, Judit
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
Scalability analysis of Dalton, a molecular structure program (2013). In: Future Generation Computer Systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 29, no 8, p. 2197-2204. Article in journal (Refereed).
    Abstract [en]

    Dalton is a molecular electronic structure program featuring common methods of computational chemistry that are based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It specializes in, and has a leading position in, the calculation of molecular properties, and has a large world-wide user community (over 2000 licenses issued). In this paper, we present a performance characterization and optimization of Dalton. We also propose a solution that prevents the master/worker design of Dalton from becoming a performance bottleneck at larger process counts. With these improvements we obtain speedups of 4x, increasing the parallel efficiency of the code and making it possible to run it on a much larger number of cores.

  • 8.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Vahtras, Olav
    KTH, School of Biotechnology (BIO), Theoretical Chemistry and Biology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Gimenez, Judit
    Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, Barcelona, Spain.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
Scaling Dalton, a molecular electronic structure program (2011). In: Seventh International Conference on e-Science, e-Science 2011, 5-8 December 2011, Stockholm, Sweden, IEEE conference proceedings, 2011, p. 256-262. Conference paper (Refereed).
    Abstract [en]

    Dalton is a molecular electronic structure program featuring common methods of computational chemistry that are based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It specializes in, and has a leading position in, the calculation of molecular properties, and has a large world-wide user community (over 2000 licenses issued). In this paper, we present a characterization and performance optimization of Dalton that increases the scalability and parallel efficiency of the application. We also propose a solution that helps to prevent the master/worker design of Dalton from becoming a performance bottleneck for larger process numbers and increases the parallel efficiency.

  • 9.
    Ahmed, Laeeq
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Alogheli, Hiba
    Uppsala Univ, Dept Pharmaceut Biosci, Box 591, S-75124 Uppsala, Sweden..
    McShane, Staffan Arvidsson
    Uppsala Univ, Dept Pharmaceut Biosci, Box 591, S-75124 Uppsala, Sweden..
    Alvarsson, Jonathan
    Uppsala Univ, Dept Pharmaceut Biosci, Box 591, S-75124 Uppsala, Sweden..
    Berg, Arvid
    Uppsala Univ, Dept Pharmaceut Biosci, Box 591, S-75124 Uppsala, Sweden..
    Larsson, Anders
    Uppsala Univ, Dept Cell & Mol Biol, Natl Bioinformat Infrastruct Sweden NBIS, Box 596, S-75124 Uppsala, Sweden..
    Schaal, Wesley
    Uppsala Univ, Dept Pharmaceut Biosci, Box 591, S-75124 Uppsala, Sweden..
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). Royal Inst Technol KTH, Dept Elect Engn & Computat Sci, Lindstedtsvagen 5, S-10044 Stockholm, Sweden..
    Spjuth, Ola
    Uppsala Univ, Dept Pharmaceut Biosci, Box 591, S-75124 Uppsala, Sweden..
Predicting target profiles with confidence as a service using docking scores (2020). In: Journal of Cheminformatics, E-ISSN 1758-2946, Vol. 12, no 1, article id 62. Article in journal (Refereed).
    Abstract [en]

    Background: Identifying and assessing ligand-target binding is a core component in early drug discovery, as one or more unwanted interactions may be associated with safety issues. Contributions: We present an open-source, extendable web service for predicting target profiles with confidence using machine learning for a panel of 7 targets, where models are trained on molecular docking scores from a large virtual library. The method uses conformal prediction to produce valid measures of prediction efficiency for a particular confidence level. The service also offers the possibility to dock chemical structures to the panel of targets with QuickVina on an individual compound basis. Results: The docking procedure and resulting models were validated by docking well-known inhibitors for each of the 7 targets using QuickVina. The model predictions showed comparable performance to molecular docking scores against an external validation set. The implementation as publicly available microservices on Kubernetes ensures resilience, scalability, and extensibility.

  • 10.
    Ahmed, Laeeq
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Edlund, Åke
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Spjuth, O.
Using iterative MapReduce for parallel virtual screening (2013). In: 2013 IEEE 5th International Conference on Cloud Computing Technology and Science (CloudCom), IEEE Computer Society, 2013, p. 27-32. Conference paper (Refereed).
    Abstract [en]

    Virtual screening is a technique in cheminformatics used for drug discovery by searching large libraries of molecular structures. Virtual screening often uses support vector machines (SVM), a supervised machine learning technique used for regression and classification analysis. Virtual screening using SVM not only involves huge datasets, but is also computationally expensive, with a complexity that can grow at least as fast as O(n^2). SVM-based applications most commonly use MPI, which becomes complex and impractical with large datasets. As an alternative to MPI, MapReduce and its different implementations have been successfully used on commodity clusters for analysis of data for problems with very large datasets. Due to the large libraries of molecular structures in virtual screening, it is a good candidate for MapReduce. In this paper we present a MapReduce implementation of SVM-based virtual screening, using Spark, an iterative MapReduce programming model. We show that our implementation has a good scaling behaviour and opens up the possibility of using huge public cloud infrastructures efficiently for virtual screening.
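    A minimal sketch of the general pattern (not the paper's code): training an SVM classifier on molecular descriptors with Spark's RDD-based MLlib API; the input path and descriptor file format are hypothetical.

```python
# Sketch only: SVM classification of molecules with Spark MLlib (RDD API).
from pyspark import SparkContext
from pyspark.mllib.classification import SVMWithSGD
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext(appName="virtual-screening-svm")

def to_labeled_point(line):
    # hypothetical format: label followed by numeric molecular descriptors
    values = [float(x) for x in line.split(",")]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("hdfs:///screening/descriptors.csv").map(to_labeled_point)
train, test = data.randomSplit([0.8, 0.2], seed=42)

model = SVMWithSGD.train(train, iterations=100)

# fraction of correctly classified test molecules
labels_and_preds = test.map(lambda p: (p.label, model.predict(p.features)))
accuracy = labels_and_preds.filter(lambda lp: lp[0] == lp[1]).count() / float(test.count())
print("test accuracy:", accuracy)
sc.stop()
```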

  • 11.
    Ahmed, Laeeq
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Edlund, Åke
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Whitmarsh, S.
Parallel real time seizure detection in large EEG data (2016). In: IoTBD 2016 - Proceedings of the International Conference on Internet of Things and Big Data, SciTePress, 2016, p. 214-222. Conference paper (Refereed).
    Abstract [en]

    Electroencephalography (EEG) is one of the main techniques for detecting and diagnosing epileptic seizures. Due to the large size of EEG data in long-term clinical monitoring and the complex nature of epileptic seizures, seizure detection is both data-intensive and compute-intensive. Analysing EEG data for detecting seizures in real time has many applications, e.g., in automatic seizure detection or in allowing a timely alarm signal to be presented to the patient. In real-time seizure detection, seizures have to be detected with negligible delay, thus requiring lightweight algorithms. MapReduce and its variations have been effectively used for data analysis in large-dataset problems on general-purpose machines. In this study, we propose a parallel lightweight algorithm for epileptic seizure detection using Spark Streaming. Our algorithm not only classifies seizures in real time, it also learns an epileptic threshold in real time. We furthermore present the "top-k amplitude measure" as a feature for classifying seizures in the EEG, which additionally assists in reducing data size. In a benchmark experiment we show that our algorithm can detect seizures in real time with low latency, while maintaining a good seizure detection rate. In short, our algorithm provides new possibilities in using private cloud infrastructures for real-time epileptic seizure detection in EEG data.
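    A sketch of a "top-k amplitude"-style feature on a single window of EEG samples, leaving out the Spark Streaming plumbing described in the abstract; the window size, k and threshold rule are invented for illustration.

```python
# Illustrative feature only, not the paper's detection pipeline.
import numpy as np

def top_k_amplitude(window, k=10):
    """Mean of the k largest absolute amplitudes in a window of EEG samples."""
    return float(np.mean(np.sort(np.abs(window))[-k:]))

def classify_window(window, threshold, k=10):
    return top_k_amplitude(window, k) > threshold   # True -> flag as seizure-like

rng = np.random.default_rng(1)
baseline = rng.normal(0.0, 20.0, size=256)          # background EEG (microvolts)
spike = baseline.copy()
spike[100:120] += 300.0                             # injected high-amplitude burst

threshold = 3 * top_k_amplitude(baseline)           # learned online in the paper
print(classify_window(baseline, threshold))         # False
print(classify_window(spike, threshold))            # True
```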

  • 12.
    Ahmed, Laeeq
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Georgiev, Valentin
    Capuccini, Marco
    Toor, Salman
    Schaal, Wesley
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Spjuth, Ola
Efficient iterative virtual screening with Apache Spark and conformal prediction (2018). In: Journal of Cheminformatics, E-ISSN 1758-2946, Vol. 10, article id 8. Article in journal (Refereed).
    Abstract [en]

    Background: Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute-force manner, by docking and scoring all available ligands. Contribution: In this study we propose a strategy that is based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands to exclude those predicted as 'low-scoring' ligands. Then, another set of ligands is docked, the model is retrained, and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling. Results: We show on 4 different targets that conformal prediction based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining an average accuracy of 94% for the top 30 hits and achieving a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
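    A simplified sketch of the iterative screening loop described above; conformal prediction and the Spark parallelization are replaced here by a plain scikit-learn regressor and a fixed score cutoff, and dock() and featurize() are hypothetical placeholders for a docking engine and a descriptor generator.

```python
# Simplified sketch of iterative, model-guided docking (not the CPVS code).
from sklearn.svm import SVR

def iterative_screen(library, dock, featurize, batch_size=1000,
                     rounds=3, score_cutoff=-7.0):
    remaining = list(library)
    docked, scores = [], []
    for _ in range(rounds):
        if not remaining:
            break
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        docked += batch
        scores += [dock(mol) for mol in batch]            # expensive docking step
        if not remaining:
            break
        model = SVR().fit([featurize(m) for m in docked], scores)
        predictions = model.predict([featurize(m) for m in remaining])
        # keep only molecules predicted to score better than the cutoff
        remaining = [m for m, p in zip(remaining, predictions) if p < score_cutoff]
    return docked, remaining        # remaining molecules still worth docking
```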

  • 13. Akhmetova, D.
    et al.
    Kestor, G.
    Gioiosa, R.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
On the application task granularity and the interplay with the scheduling overhead in many-core shared memory systems (2015). In: Proceedings - IEEE International Conference on Cluster Computing, ICCC, IEEE, 2015, p. 428-437. Conference paper (Refereed).
    Abstract [en]

    Task-based programming models are considered one of the most promising programming model approaches for exascale supercomputers because of their ability to dynamically react to changing conditions and reassign work to processing elements. One question, however, remains unsolved: what should the task granularity of task-based applications be? Fine-grained tasks offer more opportunities to balance the system and generally result in higher system utilization. However, they also induce large scheduling overhead. The impact of scheduling overhead on coarse-grained tasks is lower, but large systems may end up imbalanced and underutilized. In this work we propose a methodology to analyze the interplay between application task granularity and scheduling overhead. Our methodology is based on three main points: 1) a novel algorithm that analyzes an application's directed acyclic graph (DAG) and aggregates tasks, 2) a fast and precise emulator to analyze the application behavior on systems with up to 1,024 cores, 3) a comprehensive sensitivity analysis of application performance and scheduling overhead breakdown. Our results show that there is an optimal task granularity between 1.2x10^4 and 10x10^4 cycles for the representative schedulers. Moreover, our analysis indicates that a suitable scheduler for exascale task-based applications should employ a best-effort local scheduler and a sophisticated remote scheduler to move tasks across worker threads.
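    A toy illustration of DAG-based task aggregation (not the paper's algorithm): chains of fine-grained tasks are merged greedily until each aggregated task reaches a minimum number of cycles, so scheduling overhead is paid less often; the granularity threshold echoes the 1.2x10^4-cycle range reported above, but the example tasks are invented.

```python
# Toy sketch: greedily merge single-predecessor/single-successor chains of
# small tasks in a DAG until each aggregated task is coarse enough.
def aggregate_chains(cost, successors, predecessors, min_cycles=12_000):
    merged = {}                                   # task -> aggregated group id
    groups = []                                   # list of (member tasks, total cycles)
    for task in cost:                             # assumes tasks listed in topological order
        if task in merged:
            continue
        members, total, current = [task], cost[task], task
        while total < min_cycles and len(successors.get(current, [])) == 1:
            nxt = successors[current][0]
            if len(predecessors.get(nxt, [])) != 1 or nxt in merged:
                break
            members.append(nxt)
            total += cost[nxt]
            current = nxt
        gid = len(groups)
        groups.append((members, total))
        for t in members:
            merged[t] = gid
    return groups

cost = {"a": 4000, "b": 3000, "c": 6000, "d": 20000}
successors = {"a": ["b"], "b": ["c"], "c": ["d"]}
predecessors = {"b": ["a"], "c": ["b"], "d": ["c"]}
print(aggregate_chains(cost, successors, predecessors))
# -> [(['a', 'b', 'c'], 13000), (['d'], 20000)]
```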

  • 14.
    Akhmetova, Dana
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Cebamanos, L.
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Rotaru, T.
    Rahn, M.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Bartsch, V.
    Simmendinger, C.
Interoperability of GASPI and MPI in large scale scientific applications (2018). In: 12th International Conference on Parallel Processing and Applied Mathematics, PPAM 2017, Springer Verlag, 2018, p. 277-287. Conference paper (Refereed).
    Abstract [en]

    One of the main hurdles to a broad adoption of PGAS approaches is the prevalence of MPI, which as a de-facto standard appears in the code base of many applications. To take advantage of PGAS APIs like GASPI without a major change in the code base, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we address this challenge by providing our study and preliminary performance results on interoperating GASPI and MPI in the performance-crucial parts of the Ludwig and iPIC3D applications. In addition, we outline a strategy for better coupling of both APIs.

  • 15.
    Akhmetova, Dana
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Iakymchuk, Roman
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Ekeberg, Örjan
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
Performance study of multithreaded MPI and OpenMP tasking in a large scientific code (2017). In: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 756-765, article id 7965119. Conference paper (Refereed).
    Abstract [en]

    With the large variety and complexity of existing HPC machines and uncertainty regarding exact future Exascale hardware, it is not clear whether existing parallel scientific codes will perform well on future Exascale systems: they may have to be largely modified or even completely rewritten from scratch. Therefore, it is now important to ensure that software is ready for Exascale computing and will utilize all Exascale resources well. Many parallel programming models try to take into account all possible hardware features and nuances. However, the HPC community does not yet have a precise answer as to whether, for Exascale computing, there should be a natural evolution of existing models interoperable with each other, or whether a disruptive approach is required. Here, we focus on the first option, particularly on a practical assessment of how some parallel programming models can coexist with each other. This work describes two API combination scenarios on the example of iPIC3D [26], an implicit Particle-in-Cell code for space weather applications written in C++ and MPI plus OpenMP. The first scenario is to enable multiple OpenMP threads to call MPI functions simultaneously, with no restrictions, using the MPI_THREAD_MULTIPLE thread safety level. The second scenario is to utilize the OpenMP tasking model on top of the first scenario. The paper reports a step-by-step methodology and experience with these API combinations in iPIC3D; provides scaling tests for these implementations with up to 2048 physical cores; discusses interoperability issues that occurred; and provides suggestions to programmers and scientists who may adopt these API combinations in their own codes.
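    A minimal mpi4py analogue of the first scenario (several threads per rank issuing MPI calls concurrently under MPI_THREAD_MULTIPLE); this is an editorial sketch, not the iPIC3D code, and the thread and communicator counts are arbitrary.

```python
# Sketch: multiple threads per rank calling MPI collectives concurrently.
# Run with e.g. `mpirun -n 4 python demo.py`.
import threading

import mpi4py
mpi4py.rc.thread_level = "multiple"      # request MPI_THREAD_MULTIPLE at init
from mpi4py import MPI

comm = MPI.COMM_WORLD
assert MPI.Query_thread() == MPI.THREAD_MULTIPLE, "MPI library lacks THREAD_MULTIPLE"

# one communicator per thread keeps concurrent collectives from mixing
comms = [comm.Dup() for _ in range(4)]

def worker(tid):
    local = comm.Get_rank() + tid
    total = comms[tid].allreduce(local, op=MPI.SUM)   # concurrent MPI call
    print(f"rank {comm.Get_rank()} thread {tid}: allreduce -> {total}")

threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
for t in threads: t.start()
for t in threads: t.join()
for c in comms: c.Free()
```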

  • 16.
    Al Ahad, Muhammed Abdullah
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Simmendinger, Christian
    T Syst Solut Res GmbH, D-70563 Stuttgart, Germany..
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
Efficient Algorithms for Collective Operations with Notified Communication in Shared Windows (2018). In: PROCEEDINGS OF PAW-ATM18: 2018 IEEE/ACM PARALLEL APPLICATIONS WORKSHOP, ALTERNATIVES TO MPI (PAW-ATM), IEEE, 2018, p. 1-10. Conference paper (Refereed).
    Abstract [en]

    Collective operations are commonly used in various parts of scientific applications. Especially in strong scaling scenarios, collective operations can negatively impact overall application performance: while the load per rank decreases with increasing core counts, time spent in e.g. barrier operations will increase logarithmically with the core count. In this article, we develop novel algorithmic solutions for collective operations such as Allreduce and Allgather(V) by leveraging notified communication in shared windows. To this end, we have developed an extension of GASPI which enables all ranks participating in a shared window to observe the entire notified communication targeted at the window. By exploiting the benefits of this extension, we deliver high-performing implementations of Allreduce and Allgather(V) on Intel and Cray clusters. These implementations clearly achieve 2x-4x performance improvements compared to the best performing MPI implementations for various data distributions.
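    A hedged analogue of the shared-window idea using standard MPI-3 shared memory in mpi4py; the paper's GASPI notified-communication extension is not shown, and the node-local reduction plus leader Allreduce below is only one conventional way to exploit a shared window.

```python
# Sketch: node-local reduction in an MPI-3 shared window, then an Allreduce
# among node leaders. Not the paper's GASPI-based algorithm.
import numpy as np
from mpi4py import MPI

world = MPI.COMM_WORLD
node = world.Split_type(MPI.COMM_TYPE_SHARED)      # ranks sharing a node
nrank, nsize = node.Get_rank(), node.Get_size()

n = 8                                              # elements per rank
itemsize = MPI.DOUBLE.Get_size()
win = MPI.Win.Allocate_shared(n * nsize * itemsize if nrank == 0 else 0,
                              itemsize, comm=node)
buf, _ = win.Shared_query(0)
shared = np.ndarray(buffer=buf, dtype="d", shape=(nsize, n))

win.Fence()
shared[nrank, :] = np.full(n, world.Get_rank(), dtype="d")   # each rank's contribution
win.Fence()

leaders = world.Split(0 if nrank == 0 else MPI.UNDEFINED)
result = np.empty(n, dtype="d")
if nrank == 0:
    node_partial = shared.sum(axis=0)              # node-local reduction in shared memory
    leaders.Allreduce(node_partial, result, op=MPI.SUM)
node.Bcast(result, root=0)                         # share the global result on the node
print(world.Get_rank(), result[0])
```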

  • 17.
    Apostolov, Rossen
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Axner, Lilit
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Agren, Hans
    Ayugade, Eduard
    Duta, Mihai
    Gelpi, Jose Luis
    Gimenez, Judit
    Goni, Ramon
    Hess, Berk
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics.
    Jamitzky, Ferdinand
    Kranzmuller, Dieter
    Labarta, Jesus
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Lindahl, Erik
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics.
    Orozco, Modesto
    Peterson, Magnus
    Satzger, Helmut
    Trefethen, Anne
Scalable Software Services for Life Science (2011). In: Proceedings of 9th HealthGrid conference, 2011. Conference paper (Refereed).
    Abstract [en]

    Life Science is developing into one of the largest e-Infrastructure users in Europe, in part due to the ever-growing amount of biological data. Modern drug design typically includes both sequence bioinformatics, in silico virtual screening, and free energy calculations, e.g. of drug binding. This development will accelerate tremendously, and puts high demands on simulation software and support services. e-Infrastructure projects such as PRACE/DEISA have made important advances on hardware and scalability, but have largely been focused on theoretical scalability for large systems, while typical life science applications rather concern small-to-medium size molecules. Here, we propose to address this by implementing new techniques for efficient small-system parallelization combined with throughput and ensemble computing to enable the life science community to exploit the largest next-generation e-Infrastructures. We will also build a new cross-disciplinary Competence Network for all of life science, to position Europe as the world-leading community for development and maintenance of this software e-Infrastructure. Specifically, we will (1) develop new hierarchical parallelization approaches explicitly based on ensemble and high-throughput computing for new multi-core and streaming/GPU architectures, and establish open software standards for data storage and exchange, (2) implement, document, and maintain such techniques in pilot European open-source codes such as the widely used GROMACS & DALTON, a new application for ensemble simulation (DISCRETE), and large-scale bioinformatics protein annotation, (3) create a Competence Centre for scalable life science software to strengthen Europe as a major software provider and to enable the community to exploit e-Infrastructures to their full extent. This Competence Network will provide training and support infrastructure, and establish a long-term framework for maintenance and optimization of life science codes.

  • 18. Appleton, O
    et al.
    Jones, B
    Kranzlmüller, D
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
The EGEE-II Project: Evolution Towards a Permanent European Grid Initiative (2008). In: Advances in Parallel Computing: High Performance Computing and Grids in Action / [ed] Lucio Grandinetti, IOS Press, 2008, 16, p. 424-435. Chapter in book (Refereed).
    Abstract [en]

    Enabling Grids for E-sciencE represents the world's largest multidisciplinary grid infrastructure today. Co-funded by the European Commission, it brings together more than 90 partners in 32 countries to produce a reliable and scalable computing resource available to the European and global research community. At present, it consists of more than 200 sites in over 40 countries and makes more than 35,000 CPUs available to users 24 hours a day, 7 days a week. This article provides an overview of EGEE, its infrastructure, middleware, applications and support structures. Based on this experience, the current state of future plans, summarized under the term European Grid Initiative (EGI), is explained; EGI represents an emerging federated model for sustainable future grid infrastructures.

  • 19.
    Ardestani, Shahrzad
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Håkansson, Carl Johan
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Livenson, I.
    Stranak, P.
    Dima, E.
    Blommesteijn, D.
    Van De Sanden, M.
B2SHARE: An open eScience data sharing platform (2015). In: Proceedings - 11th IEEE International Conference on eScience, IEEE, 2015, p. 448-453. Conference paper (Refereed).
    Abstract [en]

    Scientific data sharing is becoming an essential service for data-driven science and can significantly improve the scientific process by making reliable and trustworthy data available, thereby reducing redundant work and providing insights into related research and recent advancements. For data sharing services to be useful in the scientific process, they need to fulfill a number of requirements that not only cover discovery of and access to data, but also ensure the integrity and reliability of published data. B2SHARE, developed by the EUDAT project, provides such a data sharing service to scientific communities. For communities that wish to download, install and maintain their own service, it is also available as software. B2SHARE is developed with a focus on user-friendliness, reliability, and trustworthiness, and can be customized for different organizations and use-cases. In this paper we discuss the design, architecture, and implementation of B2SHARE. We show its usefulness in the scientific process with some case studies in the biodiversity field.

  • 20.
    Atzori, Marco
    et al.
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics.
    Köpp, Wiebke
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Chien, Wei Der
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Massaro, Daniele
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics.
    Mallor, Fermin
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics.
    Peplinski, Adam
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Rezaei, Mohamad
    PDC Center for High Performance Computing, KTH Royal Institute of Technology.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Vinuesa, Ricardo
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Weinkauf, Tino
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
In-situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst (2021). Report (Other academic).
    Abstract [en]

    In-situ visualization on HPC systems allows us to analyze simulation results that would otherwise be impossible to handle, given the size of the simulation data sets and offline post-processing execution time. We design and develop in-situ visualization with ParaView Catalyst in Nek5000, a massively parallel Fortran and C code for computational fluid dynamics applications. We perform strong scalability tests up to 2,048 cores on KTH's Beskow Cray XC40 supercomputer and assess in-situ visualization's impact on the Nek5000 performance. In our study case, a high-fidelity simulation of turbulent flow, we observe that in-situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ~21% on 2,048 cores (the relative efficiency of Nek5000 without in-situ operations is ~99%). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (which uses the Radix-kr algorithm), where a majority of the time is spent on MPI communication. We also identified an imbalance of in-situ processing time between rank 0 and all other ranks. Better scaling and load balancing in the parallel image composition would considerably improve the performance and scalability of Nek5000 with in-situ capabilities in large-scale simulations.

  • 21. Bessani, A.
    et al.
    Brandt, J.
    Bux, M.
    Cogo, V.
    Dimitrova, L.
    Dowling, Jim
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Gholami, Ali
    KTH.
    Hakimzadeh, Kamal
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Hummel, M.
    Ismail, Mahmoud
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Leser, U.
    Litton, J. -E
    Martinez, R.
    Niazi, Salman
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Reichel, J.
    Zimmermann, K.
BiobankCloud: A platform for the secure storage, sharing, and processing of large biomedical data sets (2016). In: 1st International Workshop on Data Management and Analytics for Medicine and Healthcare, DMAH 2015 and Workshop on Big-Graphs Online Querying, Big-O(Q) 2015 held in conjunction with 41st International Conference on Very Large Data Bases, VLDB 2015, Springer, 2016, p. 89-105. Conference paper (Refereed).
    Abstract [en]

    Biobanks store and catalog human biological material that is increasingly being digitized using next-generation sequencing (NGS). There is, however, a computational bottleneck, as existing software systems are not scalable and secure enough to store and process the incoming wave of genomic data from NGS machines. In the BiobankCloud project, we are building a Hadoop-based platform for the secure storage, sharing, and parallel processing of genomic data. We extended Hadoop to include support for multi-tenant studies, reduced storage requirements with erasure coding, and added support for extensible and consistent metadata. On top of Hadoop, we built a scalable scientific workflow engine featuring a proper workflow definition language focusing on simple integration and chaining of existing tools, adaptive scheduling on Apache Yarn, and support for iterative dataflows. Our platform also supports the secure sharing of data across different, distributed Hadoop clusters. The software is easily installed and comes with a user-friendly web interface for running, managing, and accessing data sets behind a secure 2-factor authentication. Initial tests have shown that the engine scales well to dozens of nodes. The entire system is open-source and includes pre-defined workflows for popular tasks in biomedical data analysis, such as variant identification, differential transcriptome analysis using RNA-Seq, and analysis of miRNA-Seq and ChIP-Seq data.

  • 22. Cameron, D.
    et al.
    Casey, J.
    Guy, L.
    Kunszt, P.
    Lemaitre, S.
    Mc Cance, G.
    Stockinger, H.
    Stockinger, K.
    Andronico, G.
    Bell, W.
    Ben-Akiva, I.
    Bosio, D.
    Chytracek, R.
    Domenici, A.
    Donno, F.
    Hoschek, W.
    Laure, Erwin
    Lucio, L.
    Millar, P.
    Salconi, L.
    Segal, B.
    Silander, M.
Replica Management in the European Data Grid Project (2004). In: Journal of Grid Computing, ISSN 1570-7873, E-ISSN 1572-9184, Vol. 2, no 4, p. 341-351. Article in journal (Refereed).
    Abstract [en]

    Within the European DataGrid project, Work Package 2 has designed and implemented a set of integrated replica management services for use by data intensive scientific applications. These services, based on the web services model, enable movement and replication of data at high speed from one geographical site to another, management of distributed replicated data, optimization of access to data, and the provision of a metadata management tool. In this paper we describe the architecture and implementation of these services and evaluate their performance under demanding Grid conditions.

  • 23. Capuccini, Marco
    et al.
    Ahmed, Laeeq
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Schaal, Wesley
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Spjuth, Ola
Large-scale virtual screening on public cloud resources with Apache Spark (2017). In: Journal of Cheminformatics, E-ISSN 1758-2946, Vol. 9, article id 15. Article in journal (Refereed).
    Abstract [en]

    Background: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the message passing interface, relying on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. Results: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, docking a publicly available target receptor against ~2.2 M compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Conclusion: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then scaling to larger libraries.
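    A sketch of the MapReduce-style pattern described above (not the paper's implementation): ligands are distributed over a Spark cluster and each one is scored by invoking an external docking tool; the `docking-tool` command, its output format, and the file paths are hypothetical placeholders.

```python
# Sketch only: distribute docking jobs over Spark and keep the best scores.
import subprocess
from pyspark import SparkContext

RECEPTOR = "/data/receptor.pdbqt"        # placeholder path

def dock_one(ligand_line):
    """Run a (hypothetical) docking tool for one ligand and return (id, score)."""
    ligand_id, ligand_path = ligand_line.split("\t")
    out = subprocess.run(["docking-tool", "--receptor", RECEPTOR,
                          "--ligand", ligand_path],
                         capture_output=True, text=True, check=True)
    return ligand_id, float(out.stdout.strip())   # assume the tool prints one score

sc = SparkContext(appName="spark-docking")
ligands = sc.textFile("hdfs:///screening/ligand_index.tsv")
top_hits = (ligands.map(dock_one)
                   .sortBy(lambda pair: pair[1])   # lower (more negative) is better
                   .take(30))
print(top_hits)
sc.stop()
```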

  • 24. Chien, Steven W. D.
    et al.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Sishtla, Chaitanya Prasad
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Santos, Luis
    Herman, Pawel
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Nrasimhamurthy, Sai
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
Characterizing Deep-Learning I/O Workloads in TensorFlow (2018). In: Proceedings of PDSW-DISCS 2018: 3rd Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 54-63. Conference paper (Refereed).
    Abstract [en]

    The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. In fact, during training, a considerably high number of relatively small files are first loaded and pre-processed on CPUs and then moved to accelerators for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve the checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use of the TensorFlow prefetcher results in a complete overlap of computation on the accelerator and the input pipeline on the CPU, eliminating the effective cost of I/O on the overall performance. The use of a burst buffer to checkpoint to a fast small-capacity storage and asynchronously copy the checkpoints to a slower large-capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to slower storage on our benchmark environment.
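    A small illustration of the input-pipeline overlap discussed above, using the tf.data API as a stand-in for the paper's benchmarks; the file paths and record schema are hypothetical.

```python
# Sketch: CPU-side parsing and prefetching overlapped with accelerator compute.
import tensorflow as tf

def parse(example_proto):
    features = {"image": tf.io.FixedLenFeature([], tf.string),
                "label": tf.io.FixedLenFeature([], tf.int64)}
    parsed = tf.io.parse_single_example(example_proto, features)
    image = tf.io.decode_jpeg(parsed["image"], channels=3)
    return tf.image.resize(image, [224, 224]) / 255.0, parsed["label"]

dataset = (tf.data.TFRecordDataset(tf.io.gfile.glob("/data/train-*.tfrecord"))
           .map(parse, num_parallel_calls=tf.data.AUTOTUNE)   # parallel CPU pre-processing
           .batch(64)
           .prefetch(tf.data.AUTOTUNE))   # overlap the input pipeline with training steps

for images, labels in dataset.take(10):
    pass   # training step on the accelerator would go here
```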

  • 25.
    Chien, Steven Wei Der
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Olshevsky, Vyacheslav
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Bulatov, Yaroslav
    South Pk Commons, San Francisco, CA USA..
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Vetter, Jeffrey S.
    Oak Ridge Natl Lab, Oak Ridge, TN USA..
TensorFlow Doing HPC: An Evaluation of TensorFlow Performance in HPC Applications (2019). In: 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 509-518. Conference paper (Refereed).
    Abstract [en]

    TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow was initially designed for developing Machine Learning (ML) applications, it in fact aims at supporting a much broader range of application kinds outside the ML domain, possibly including HPC applications. However, very few experiments have been conducted to evaluate TensorFlow performance when running HPC workloads on supercomputers. This work addresses this lack by designing four traditional HPC benchmark applications: STREAM, matrix-matrix multiply, Conjugate Gradient (CG) solver and Fast Fourier Transform (FFT). We analyze their performance on two supercomputers with accelerators and evaluate the potential of TensorFlow for developing HPC applications. Our tests show that TensorFlow can fully take advantage of high performance networks and accelerators on supercomputers. Running our TensorFlow STREAM benchmark, we obtain over 50% of theoretical communication bandwidth on our testing platform. We find an approximately 2x, 1.7x and 1.8x performance improvement when increasing the number of GPUs from two to four in the matrix-matrix multiply, CG and FFT applications respectively. All our performance results demonstrate that TensorFlow has high potential of emerging also as an HPC programming framework for heterogeneous supercomputers.

  • 26.
    Chien, Steven Wei Der
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Sishtla, Chaitanya Prasad
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Jun, Zhang
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Peng, Ivy Bo
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
An Evaluation of the TensorFlow Programming Model for Solving Traditional HPC Problems (2018). In: Proceedings of the 5th International Conference on Exascale Applications and Software, The University of Edinburgh, 2018, p. 34-. Conference paper (Refereed).
    Abstract [en]

    Computationally intensive applications such as pattern recognition and natural language processing are increasingly popular on HPC systems. Many of these applications use deep learning, a branch of machine learning, to determine the weights of artificial neural network nodes by minimizing a loss function. Such applications depend heavily on dense matrix multiplications, also called tensorial operations. The use of Graphics Processing Units (GPUs) has considerably sped up deep-learning computations, leading to a renaissance of the artificial neural network. Recently, the NVIDIA Volta GPU and the Google Tensor Processing Unit (TPU) have been specially designed to support deep-learning workloads. New programming models have also emerged for convenient expression of tensorial operations and deep-learning computational paradigms. An example of such new programming frameworks is TensorFlow, an open-source deep-learning library released by Google in 2015. TensorFlow expresses algorithms as a computational graph where nodes represent operations and edges between nodes represent data flow. Multi-dimensional data such as vectors and matrices which flow between operations are called tensors. For this reason, computation problems need to be expressed as a computational graph. In particular, TensorFlow supports distributed computation with flexible assignment of operations and data to devices such as GPUs and CPUs on different computing nodes. Computation on devices is based on optimized kernels such as MKL, Eigen and cuBLAS. Inter-node communication can be through TCP and RDMA. This work attempts to evaluate the usability and expressiveness of the TensorFlow programming model for traditional HPC problems. As an illustration, we prototyped a distributed block matrix multiplication for large dense matrices which cannot be co-located on a single device, and a Conjugate Gradient (CG) solver. We evaluate the difficulty of expressing traditional HPC algorithms using computational graphs and study the scalability of distributed TensorFlow on accelerated systems. Our preliminary result with distributed matrix multiplication shows that distributed computation on TensorFlow is extremely scalable. This study provides an initial investigation of new emerging programming models for HPC.
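    A sketch of a Conjugate Gradient solver expressed with TensorFlow ops, in the spirit of the evaluation above (not the authors' implementation); for brevity it runs a fixed number of iterations on a small random SPD system rather than a distributed problem.

```python
# Sketch: CG iteration written with TensorFlow operations.
import tensorflow as tf

def conjugate_gradient(A, b, iterations=50):
    x = tf.zeros_like(b)
    r = b - tf.linalg.matvec(A, x)
    p = r
    rs_old = tf.tensordot(r, r, axes=1)
    for _ in range(iterations):
        Ap = tf.linalg.matvec(A, p)
        alpha = rs_old / tf.tensordot(p, Ap, axes=1)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = tf.tensordot(r, r, axes=1)
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

n = 256
M = tf.random.normal([n, n], dtype=tf.float64)
A = tf.linalg.matmul(M, M, transpose_b=True) + n * tf.eye(n, dtype=tf.float64)  # SPD matrix
b = tf.random.normal([n], dtype=tf.float64)

x = conjugate_gradient(A, b)
print("residual norm:", tf.norm(b - tf.linalg.matvec(A, x)).numpy())
```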

  • 27. Doerner, K
    et al.
    Laure, Erwin
High performance computing in the optimization of software test plans (2002). In: Optimization and Engineering, ISSN 1389-4420, E-ISSN 1573-2924, Vol. 3, no 1, p. 67-87. Article in journal (Refereed).
    Abstract [en]

    Statistical software testing is an increasingly popular method in the software development cycle. An exact modeling of the usage profiles of a software system is an indispensable prerequisite for statistical testing. Recently, new techniques for obtaining optimal usage profiles even in the presence of rarely used critical functions have been introduced. Although these techniques deliver unbiased dependability estimates with a single model (instead of using multiple models, as is current practice), their applicability is hampered by their prohibitive computational complexity.

  • 28.
    Eriksson, Olivia
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Lindahl, Erik
    KTH, School of Engineering Sciences (SCI), Applied Physics, Biophysics. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Henningson, Dan S.
    KTH, School of Engineering Sciences (SCI), Mechanics, Stability, Transition and Control. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Ynnerman, Anders
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    e-Science in Scandinavia2018In: Informatik-Spektrum, ISSN 0170-6012, E-ISSN 1432-122X, Vol. 41, no 6, p. 398-404Article in journal (Refereed)
  • 29. Field, Laurence
    et al.
    Laure, Erwin
    Schulz, Markus W.
    Grid Deployment Experiences: Grid Interoperation2009In: JOURNAL OF GRID COMPUTING, ISSN 1570-7873, Vol. 7, no 3, p. 287-296Article in journal (Refereed)
    Abstract [en]

    Over recent years a number of Grid projects have emerged which have built Grid infrastructures that are now the computing backbones for various user communities. A significant number of these communities are limited to one Grid infrastructure due to the different middleware and operations procedures used. Grid Interoperation is trying to bridge these differences and enable Virtual Organizations to access resources independently of Grid project affiliation. Building upon the experiences the authors have gained while working on interoperation between EGEE and various other Grid infrastructures, as well as through co-chairing the Grid Interoperation Now (GIN) efforts of the Open Grid Forum (OGF), this paper gives an overview of Grid Interoperation and describes various methods that can be used to connect Grid infrastructures. The case is made for standardization in key areas, and for why the Grid community should move more aggressively towards standards.

  • 30. Gagliardi, F
    et al.
    Jones, B
    Laure, Erwin
    The EU datagrid project: Building and Operating a large scale Grid Infrastructure2006In: Engineering the Grid: Status and Perspective, American Scientific Publishers, 2006Chapter in book (Refereed)
    Abstract [en]

    The EU DataGrid project (EDG) aims to develop a large-scale research testbed for Grid computing. During the lifetime of the project (January 2001 - March 2004), high-level Grid middleware was developed, and a large-scale testbed has been up and running continuously since the beginning of 2002. Three application domains are using this testbed to explore the potential Grid computing has for their production environments: Particle Physics, Earth Observation and Biomedicine. The EDG testbed, spanning some 20 major sites all over Europe as well as sites in the US and Asia, is one of the largest Grid infrastructures in the world. The software developed by EDG is being exploited in many national and international Grid projects, most notably the LCG production infrastructure. In this chapter we give a brief overview of Grid computing, present the architecture of the EDG middleware, report on our experiences in deploying and operating a large-scale Grid infrastructure, and discuss how our application groups are using this infrastructure. An outlook on the future perspectives of Grid computing concludes the chapter.

  • 31.
    Gholami, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Dowling, Jim
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    A security framework for population-scale genomics analysis2015In: Proceedings of the 2015 International Conference on High Performance Computing and Simulation, HPCS 2015, IEEE conference proceedings, 2015, p. 106-114Conference paper (Refereed)
    Abstract [en]

    Biobanks store genomic material from identifiable individuals. Recently, many population-based studies have started sequencing genomic data from biobank samples and cross-linking the genomic data with clinical data, with the goal of discovering new insights into disease and clinical treatments. However, the use of genomic data for research has far-reaching implications for privacy and for the relations between individuals and society. In some jurisdictions, primarily in Europe, new laws have been or are being introduced to protect sensitive data relating to individuals, and biobank-specific laws have even been designed to regulate the handling of genomic data and to clearly define the roles and responsibilities of the owners and processors of genomic data. This paper considers the security questions raised by these developments. We introduce a new threat model that enables the design of cloud-based systems for handling genomic data in accordance with privacy legislation. We also describe the design and implementation of a security framework, based on our threat model, for BiobankCloud, a platform that supports the secure storage and processing of genomic data in cloud computing environments.

  • 32.
    Gholami, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Lind, Anna-Sara
    Reichel, Jane
    Litton, Jan-Eric
    Edlund, Åke
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Privacy Threat Modeling for Emerging BiobankClouds2014In: Procedia Computer Science: The 5th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN-2014)/ The 4th International Conference on Current and Future Trends of Information and Communication Technologies in Healthcare (ICTH 2014)/ Affiliated Workshops, Elsevier, 2014, Vol. 37, p. 489-496Conference paper (Refereed)
    Abstract [en]

    Next-generation sequencing (NGS) machines produce an increasing amount of data, which demands scalable storage and analysis of genomic data. In order to cope with this huge amount of information, many biobanks are interested in cloud computing capabilities such as on-demand elasticity of computing power and storage capacity. However, several security and privacy requirements mandated by personal data protection legislation hinder biobanks from migrating the big data generated by NGS machines to the cloud. This paper describes the privacy requirements of platform-as-a-service BiobankClouds according to the European Data Protection Directive (DPD). It identifies several key privacy threats which leave BiobankClouds vulnerable to attack. This study benefits health-care application designers in the requirements elicitation cycle when building privacy-preserving BiobankCloud platforms.

  • 33.
    Gholami, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Svensson, Gert
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Eickhoff, M.
    Brasche, G.
    ScaBIA: Scalable brain image analysis in the cloud2013In: CLOSER 2013 - Proceedings of the 3rd International Conference on Cloud Computing and Services Science, 2013, p. 329-336Conference paper (Refereed)
    Abstract [en]

    The use of cloud computing as a new paradigm has become a reality. Cloud computing leverages on-demand CPU power and storage resources while eliminating the cost of commodity hardware ownership, and it is gaining popularity among many different organizations and commercial sectors. In this paper, we present the scalable brain image analysis (ScaBIA) architecture, a new model for running statistical parametric mapping (SPM) jobs using cloud computing. SPM is one of the most popular toolkits in neuroscience for running compute-intensive brain image analysis tasks. However, issues such as sharing raw data and results, as well as scalability and performance, are major bottlenecks in the "single PC" execution model. In this work, we describe a prototype using the generic worker (GW), an e-Science-as-a-service middleware, on top of Microsoft Azure to run and manage SPM tasks. The functional prototype shows that ScaBIA provides a scalable framework for multi-job submission and enables users to share data securely using storage access keys across different organizations.

  • 34.
    Gong, Jing
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Otten, Matthew
    Fischer, Paul
    Min, Misun
    Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations2016In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 72, no 11, p. 4160-4180Article in journal (Refereed)
    Abstract [en]

    We present a hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier-Stokes solver Nek5000. The implementation is based on OpenACC and CUDA Fortran for local parallelization of the compute-intensive matrix-matrix multiplication part, which significantly minimizes modification of the existing CPU code while extending the simulation capability of the code to GPU architectures. Our discussion includes the GPU results of OpenACC interoperating with CUDA Fortran and the gather-scatter operations with GPUDirect communication. We demonstrate performance of up to 552 Tflops on 16,384 GPUs of the OLCF Cray XK7 Titan.

  • 35.
    Gong, Jing
    et al.
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Cebamanos, Luis
    Hart, Alistair
    Min, Misun
    Fischer, Paul
    NekBone with Optimized OpenACC directives2015Conference paper (Refereed)
    Abstract [en]

    Accelerators and, in particular, Graphics Processing Units (GPUs) have emerged as promising computing technologies that may be suitable for future Exascale systems. Here, we present performance results for NekBone, a benchmark of the Nek5000 code, implemented with optimized OpenACC directives and GPUDirect communications. Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flow. The optimized NekBone version reaches 78 Gflops on a single node. In addition, a performance of 609 Tflops has been reached on 16,384 GPUs of the Titan supercomputer at Oak Ridge National Laboratory.

     

  • 36.
    Gong, Jing
    et al.
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Henningson, Dan
    KTH, School of Engineering Sciences (SCI), Mechanics, Stability, Transition and Control. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Mechanics, Stability, Transition and Control. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Peplinski, Adam
    Hart, Alistair
    Doleschal, Jens
    Henty, David
    Fischer, Paul
    Nek5000 with OpenACC2015In: Solving software challenges for exascale, 2015, p. 57-68Conference paper (Refereed)
    Abstract [en]

    Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flows. We follow up on an earlier study, which ported a simplified version of Nek5000 to a GPU-accelerated system, by presenting a hybrid CPU/GPU implementation of the full Nek5000 code using OpenACC. The matrix-matrix multiplication, the Nek5000 gather-scatter operator and a preconditioned Conjugate Gradient solver have been implemented using OpenACC for multi-GPU systems. We report a speed-up of 1.3 on a single node of a Cray XK6 when using OpenACC directives in Nek5000. On 512 nodes of the Titan supercomputer, the speed-up approaches 1.4. A performance analysis of the Nek5000 code using the Score-P and Vampir performance monitoring tools shows that overlapping GPU kernels with host-accelerator memory transfers would considerably increase the performance of the OpenACC version of the Nek5000 code.

  • 37.
    Iakymchuk, Roman
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Jordan, Herbert
    University of Innsbruck, Institute of Computer Science.
    Peng, Ivy Bo
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    A Particle-in-Cell Method for Automatic Load-Balancing with the AllScale Environment2016Conference paper (Other academic)
    Abstract [en]

    We present an initial design and implementation of a Particle-in-Cell (PIC) method based on the work carried out in the European Exascale AllScale project. AllScale provides a unified programming system for the effective development of highly scalable, resilient and performance-portable parallel applications for Exascale systems. The AllScale approach is based on task-based nested recursive parallelism and provides mechanisms for automatic load balancing in PIC simulations. We present preliminary results of the AllScale-based PIC implementation and outline directions for its future development.

    Download full text (pdf)
    fulltext
  • 38.
    Iakymchuk, Roman
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Quintana-Orti, E. S.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Graillat, S.
    Towards reproducible blocked lu factorization2017In: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 1598-1607, article id 7965230Conference paper (Refereed)
    Abstract [en]

    In this article, we address the problem of reproducibility of the blocked LU factorization on GPUs, which arises from cancellations and rounding errors when dealing with floating-point arithmetic. Thanks to the hierarchical structure of linear algebra libraries, the computations carried out within this operation can be expressed in terms of the Level-3 BLAS routines as well as the unblocked variant of the factorization, while the latter is correspondingly built upon the Level-1/2 BLAS kernels. In addition, we strengthen the numerical stability of the blocked LU factorization via partial row pivoting. Therefore, we propose a double-layer bottom-up approach for ensuring reproducibility of the blocked LU factorization and provide experimental results for its underlying blocks.
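
    To make the blocked structure referred to above concrete, the following plain NumPy sketch shows a right-looking blocked LU factorization with partial row pivoting: an unblocked panel factorization (Level-1/2-type work) followed by a triangular solve and a trailing-matrix update (Level-3-type work). It only illustrates the decomposition such work builds on; it does not implement the paper's reproducible, correctly rounded arithmetic, and the function name and block size are hypothetical.

        import numpy as np

        def lu_blocked(A, nb=64):
            # Right-looking blocked LU with partial pivoting, kept in place.
            # Returns the packed L\U factors and the row permutation `piv`
            # such that A[piv, :] ~= L @ U (L unit lower, U upper triangular).
            A = A.astype(float).copy()
            n = A.shape[0]
            piv = np.arange(n)
            for k in range(0, n, nb):
                b = min(nb, n - k)
                # Unblocked panel factorization of A[k:, k:k+b] (Level-1/2 work)
                for j in range(k, k + b):
                    p = j + int(np.argmax(np.abs(A[j:, j])))   # pivot row
                    if p != j:
                        A[[j, p], :] = A[[p, j], :]            # swap full rows
                        piv[[j, p]] = piv[[p, j]]
                    A[j + 1:, j] /= A[j, j]                    # column of L
                    A[j + 1:, j + 1:k + b] -= np.outer(A[j + 1:, j], A[j, j + 1:k + b])
                if k + b < n:
                    # Level-3 updates: U12 = L11^{-1} A12, then A22 -= L21 U12
                    L11 = np.tril(A[k:k + b, k:k + b], -1) + np.eye(b)
                    A[k:k + b, k + b:] = np.linalg.solve(L11, A[k:k + b, k + b:])
                    A[k + b:, k + b:] -= A[k + b:, k:k + b] @ A[k:k + b, k + b:]
            return A, piv

    A quick sanity check for a random square matrix A0: after LU, piv = lu_blocked(A0), the expression np.allclose(A0[piv], (np.tril(LU, -1) + np.eye(len(A0))) @ np.triu(LU)) should hold.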

  • 39.
    Ivanov, Ilya
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Gong, Jing
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Akhmetova, Dana
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Peng, Ivy Bo
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Machado, Rui
    Rahn, Mirko
    Bartsch, Valeria
    Hart, Alistair
    Fischer, Paul
    Evaluation of Parallel Communication Models in Nekbone, a Nek5000 mini-application2015In: 2015 IEEE International Conference on Cluster Computing, IEEE , 2015, p. 760-767Conference paper (Refereed)
    Abstract [en]

    Nekbone is a proxy application of Nek5000, a scalable Computational Fluid Dynamics (CFD) code used for modelling incompressible flows. The Nekbone mini-application is used by several international co-design centers to explore new concepts in computer science and to evaluate their performance. We present the design and implementation of a new communication kernel in the Nekbone mini-application with the goal of studying the performance of different parallel communication models. First, a new MPI blocking communication kernel was developed to solve Nekbone problems on a three-dimensional Cartesian mesh and process topology. The new MPI implementation delivers a 13% performance improvement compared to the original implementation, and it consists of approximately 500 lines of code against the original 7,000, allowing experimentation with new approaches to Nekbone parallel communication. Second, the MPI blocking communication in the new kernel was changed to MPI non-blocking communication. Third, we developed a new Partitioned Global Address Space (PGAS) communication kernel based on the GPI-2 library. This approach reduces the synchronization among neighboring processes; in our tests on 8,192 processes, the GPI-2 communication kernel is on average 3% faster than the new MPI non-blocking kernel. In addition, we used OpenMP in all versions of the new communication kernel. Finally, we highlight the future steps for using the new communication kernel in the parent application Nek5000.
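
    As a small, self-contained illustration of the non-blocking communication pattern on a three-dimensional Cartesian process topology described above, the mpi4py sketch below exchanges one face-sized buffer with each of the six neighbours. It is not the Nekbone kernel itself (which is written in Fortran); the face size, tags and file name are hypothetical.

        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        dims = MPI.Compute_dims(comm.Get_size(), 3)        # e.g. [2, 2, 2] on 8 ranks
        cart = comm.Create_cart(dims, periods=[False] * 3, reorder=True)

        face = 16 * 16                                     # hypothetical face size
        send = [np.full(face, cart.Get_rank(), dtype='d') for _ in range(6)]
        recv = [np.empty(face, dtype='d') for _ in range(6)]

        reqs = []
        for axis in range(3):
            lo, hi = cart.Shift(axis, 1)                   # lower/upper neighbour ranks
            for i, (dst, src) in enumerate([(hi, lo), (lo, hi)]):
                tag = 2 * axis + i
                if dst != MPI.PROC_NULL:                   # post sends and receives
                    reqs.append(cart.Isend(send[tag], dest=dst, tag=tag))
                if src != MPI.PROC_NULL:
                    reqs.append(cart.Irecv(recv[tag], source=src, tag=tag))
        MPI.Request.Waitall(reqs)                          # complete all exchanges

    Running this with, for example, mpirun -n 8 python halo.py should leave each rank holding its existing neighbours' rank values in recv, with overlap between the posted sends and receives handled by the MPI library.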

  • 40.
    Ivanov, Ilya
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Machado, Rui
    Rahn, Mirko
    Akhmetova, Dana
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Gong, Jing
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Mechanics. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Henningson, Dan
    KTH, School of Engineering Sciences (SCI), Mechanics, Stability, Transition and Control. KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Fischer, Paul
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Evaluating New Communication Models in the Nek5000 Code for Exascale2015Conference paper (Other academic)
  • 41.
    Jansson, Niclas
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Karp, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Perez, Adalberto
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Mukha, Timofey
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Ju, Yi
    Max Planck Computing and Data Facility, Garching, Germany.
    Liu, Jiahui
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Pall, Szilard
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    Max Planck Computing and Data Facility, Garching, Germany.
    Weinkauf, Tino
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Schumacher, Jörg
    Technische Universität Ilmenau, Ilmenau, Germany.
    Schlatter, Philipp
    Friedrich-Alexander-Universität (FAU) Erlangen-Nürnberg, Germany.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Exploring the Ultimate Regime of Turbulent Rayleigh–Bénard Convection Through Unprecedented Spectral-Element Simulations2023In: SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Association for Computing Machinery (ACM) , 2023, p. 1-9, article id 5Conference paper (Refereed)
    Abstract [en]

    We detail our developments in the high-fidelity spectral-element code Neko that are essential for unprecedented large-scale direct numerical simulations of fully developed turbulence. Major innovations are a modular multi-backend design enabling performance portability across a wide range of GPUs and CPUs, a GPU-optimized preconditioner with task overlapping for the pressure-Poisson equation, and in-situ data compression. We carry out initial runs of Rayleigh–Bénard Convection (RBC) at extreme scale on the LUMI and Leonardo supercomputers. We show how Neko is able to strongly scale to 16,384 GPUs and obtain results that are not possible without careful consideration and optimization of the entire simulation workflow. These developments in Neko will help resolve the long-standing question regarding the ultimate regime in RBC.

  • 42.
    Jansson, Niclas
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Towards a Parallel Algebraic Multigrid Solver Using PGAS2018In: 2018 Workshop on High Performance Computing Asia, New York, NY, USA: Association for Computing Machinery (ACM), 2018, p. 31-38Conference paper (Refereed)
    Abstract [en]

    The Algebraic Multigrid (AMG) method has over the years developed into an efficient tool for solving unstructured linear systems. The need to solve large industrial problems discretized on unstructured meshes has been a key motivation for devising a parallel AMG method. Despite some success, the key part of the AMG algorithm, the coarsening step, is far from trivial to parallelize efficiently. We here introduce a novel parallelization of the inherently sequential Ruge-Stüben coarsening algorithm that retains most of the good interpolation properties of the original method. Our parallelization is based on the Partitioned Global Address Space (PGAS) abstraction, which greatly simplifies the parallelization compared to traditional message-passing-based implementations. The coarsening algorithm and solver are described in detail, and a performance study on a Cray XC40 is presented.
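
    For readers unfamiliar with the coarsening step mentioned above, the sketch below shows a simplified, sequential version of the classical Ruge-Stüben first-pass C/F splitting that such work parallelizes. The strength-of-connection computation and the second pass are omitted, and the data layout (a list of strong-influence sets) is a hypothetical simplification rather than the paper's implementation.

        import numpy as np

        def cf_split(strong):
            # strong[i] = set of points that strongly influence point i.
            # Returns an array with 1 for C-points and -1 for F-points.
            n = len(strong)
            influences = [set() for _ in range(n)]     # points that j strongly influences
            for i in range(n):
                for j in strong[i]:
                    influences[j].add(i)
            measure = np.array([len(s) for s in influences], dtype=float)
            state = np.zeros(n, dtype=int)             # 0 = undecided
            while np.any(state == 0):
                # Pick the undecided point with the largest measure as a C-point
                i = int(np.argmax(np.where(state == 0, measure, -1.0)))
                state[i] = 1
                for j in influences[i]:
                    if state[j] == 0:
                        state[j] = -1                  # strongly influenced points become F
                        for k in strong[j]:
                            if state[k] == 0:
                                measure[k] += 1        # bias their influencers towards C
                measure[i] = 0.0
            return state

    On a simple 1D chain where each interior point is strongly influenced by its two neighbours, this first pass should produce a roughly alternating C/F pattern, which is the expected coarsening for that stencil; the sequential dependence between successive C-point selections is exactly what makes the algorithm hard to parallelize.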

  • 43. Jones, B
    et al.
    Kranzlmuller, D
    Laure, Erwin
    Appleton, O
    Lessons from Europe's International Grid Initiatives: Grid Technology in Africa2006Conference paper (Refereed)
    Abstract [en]

    This paper presents the progress and recent history of Grid computing in Europe, aimed at the development of a sustainable European Grid infrastructure. This is achieved by examining the Enabling Grids for E-sciencE (EGEE) project, funded by the European Commission (EC) to build a multi-science Grid infrastructure for the European Research Area, and recently connected to the African continent through other EC funded projects. By considering these developments, this paper extracts lessons for development of Grid computing in Africa. Among these are the need to connect to the broader international Grid community to ensure the success of Grid computing, and the proposal of a new incremental model for Grid development that involves leveraging existing Grid infrastructures to support small Grid projects in less developed areas. These lessons are then applied to the African continent, with suggestions of the most effective avenues for development of an African Grid infrastructure.

  • 44. Karagiannis, F.
    et al.
    Keramida, D.
    Ioannidis, Y.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Vitlacil, Dejan
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Short, Faith
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Technological and organisational aspects of global research data infrastructures towards year 20202013In: Data Science Journal, E-ISSN 1683-1470, Vol. 12, p. GRDI1-GRDI5Article in journal (Refereed)
    Abstract [en]

    A general-purpose Global Research Data Infrastructure (GRDI) for all sciences and research purposes is not conceivable for the next decade, as there are too many discipline-specific modalities that currently prevail for such generalisation efforts to be effective. On the other hand, a more pragmatic approach is to start from what currently exists, identify best practices and key issues, and promote effective inter-domain collaboration among the different components forming an ecosystem. This will promote interoperability, data exchange, data preservation, and distributed access, among other things. This ecosystem of interoperable research data infrastructures will be composed of regional, disciplinary, and multidisciplinary components, such as libraries, archives, and data centres, offering data services for both primary datasets and publications. The ecosystem will support data-intensive science and research and stimulate the interaction among all its elements, thus promoting multidisciplinary and interdisciplinary science. This special issue includes a set of independent papers from renowned experts on organisational and technological issues related to GRDIs. These documents feed into and complement the GRDI2020 roadmap, which supports a Global Research Data Infrastructure ecosystem.

  • 45. Karlsson, A.
    et al.
    Olofsson, N.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Clements, M.
    A parallel microsimulation package for modelling cancer screening policies2017In: Proceedings of the 2016 IEEE 12th International Conference on e-Science, e-Science 2016, IEEE, 2017, p. 323-330Conference paper (Refereed)
    Abstract [en]

    Microsimulation with stochastic life histories is an important tool in the development of public policies. In this article, we use microsimulation to evaluate policies for prostate cancer testing. We implemented the microsimulations as an R package, with pre- and post-processing in R and with the simulations written in C++. Calibrating a microsimulation model with a large population can be computationally expensive. To address this issue, we investigated four forms of parallelism: (i) shared memory parallelism using R; (ii) shared memory parallelism using OpenMP at the C++ level; (iii) distributed memory parallelism using R; and (iv) hybrid shared/distributed memory parallelism using OpenMP at the C++ level and MPI at the R level. The close coupling between R and C++ offered advantages for ease of software dissemination and the use of high-level R parallelisation methods. However, this combination brought challenges when trying to use shared memory parallelism at the C++ level: the performance gained by hybrid OpenMP/MPI came at the cost of significant refactoring of the existing code. As a case study, we implemented a prostate cancer model in the microsimulation package. We used this model to investigate whether prostate cancer testing with specific re-testing protocols would reduce harms and maintain any mortality benefit from prostate-specific antigen testing. We showed that four-yearly testing would have comparable effectiveness and markedly lower costs compared with two-yearly testing and current testing. In summary, we developed a microsimulation package in R and assessed the cost-effectiveness of prostate cancer testing. We were able to scale up the microsimulations using a combination of R and C++; however, care was required when using shared memory parallelism at the C++ level.

  • 46. Kunszt, P.
    et al.
    Laure, Erwin
    CERN.
    Stockinger, H.
    Stockinger, K.
    File-Based Replica Management2005In: Future generations computer systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 21, no 1, p. 115-123Article in journal (Refereed)
    Abstract [en]

    Data replication is one of the best known strategies to achieve high levels of availability and fault tolerance, as well as minimal access times for large, distributed user communities using a world-wide Data Grid. In certain scientific application domains, the data volume can reach the order of several petabytes; in these domains, data replication and access optimization play an important role in the manageability and usability of the Grid.

  • 47. Lamanna, M.
    et al.
    Laure, Erwin
    CERN European Organization for Nuclear Research.
    Preface2008In: Journal of Grid Computing, ISSN 1570-7873, E-ISSN 1572-9184, Vol. 6, no 1, p. 1-2Article in journal (Other academic)
  • 48. Laure, Erwin
    A Java Framework for Distributed High Performance Computing2001In: Future generations computer systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 18, no 2, p. 235-251Article in journal (Refereed)
    Abstract [en]

    The past few years have dramatically changed the view of high performance applications and computing. While traditionally such applications have been targeted towards dedicated parallel machines, we see the emerging trend of building "meta-applications" composed of several modules that exploit heterogeneous platforms and employ hybrid forms of parallelism. In particular, Java has been recognized as a modern programming language for heterogeneous distributed computing. In this paper we present OpusJava, a Java based framework for distributed high performance computing (DHPC) that provides a high level component infrastructure and facilitates a seamless integration of high performance Opus (i.e., HPF) modules into larger distributed environments. OpusJava offers a comprehensive programming model that supports the exploitation of hybrid parallelism and provides high level coordination means.

  • 49.
    Laure, Erwin
    EDG.
    The EU DataGrid: Setting the Basis for Production Grids: Preface2004In: Journal of Grid Computing, ISSN 1570-7873, E-ISSN 1572-9184, Vol. 2, no 4, p. 299-300Article in journal (Other academic)
  • 50. Laure, Erwin
    The European e-Infrastructure Ecosystem2009In: Research in a connected world / [ed] Alex Voss, Elizabeth Vander Meer, David Fergusson, Edinburgh: David Fergusson , 2009Chapter in book (Other academic)