1 - 50 of 88
  • 1.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Fuerlinger, Karl
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Automatic On-Line Detection of MPI Application Structure with Event Flow Graphs, 2015. In: EURO-PAR 2015: PARALLEL PROCESSING, Springer Berlin/Heidelberg, 2015, p. 70-81. Conference paper (Refereed)
    Abstract [en]

    The deployment of larger and larger HPC systems challenges the scalability of both applications and analysis tools. Performance analysis toolsets provide users with means to spot bottlenecks in their applications by either collecting aggregated statistics or generating lossless time-stamped traces. While obtaining detailed trace information is the best method to examine the behavior of an application in detail, it is infeasible at extreme scales due to the huge volume of data generated. In this context, knowing the application structure, and particularly the nesting of loops in iterative applications, is of great importance as it allows, among other things, reducing the amount of data collected by focusing on important sections of the code. In this paper we demonstrate how the loop nesting structure of an MPI application can be extracted on-line from its event flow graph without the need for any explicit source code instrumentation. We show how this knowledge of the application structure can be used to compute postmortem statistics as well as to reduce the amount of redundant data collected. To that end, we present a usage scenario where this structure information is utilized on-line (while the application runs) to intelligently collect fine-grained data for only a few iterations of an application, considerably reducing the amount of data gathered.
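The core idea — recovering loop structure from an event flow graph rather than from instrumented source — can be sketched in a few lines. The toy example below is not the authors' implementation; the event names and the "repeated edge" heuristic are illustrative. It builds a graph from one rank's event stream and reads the iterative structure off the edges that are traversed more than once:

```python
from collections import defaultdict

def build_event_flow_graph(events):
    """Nodes are event types; weighted edges count the observed
    transitions between consecutive events in the stream."""
    graph = defaultdict(lambda: defaultdict(int))
    for a, b in zip(events, events[1:]):
        graph[a][b] += 1
    return graph

def find_loop_edges(graph):
    """Edges traversed more than once reveal the iterative (loop)
    structure without any source-code instrumentation."""
    return [(a, b, n) for a, outs in graph.items()
            for b, n in outs.items() if n > 1]

# One rank's event stream: init, then 3 iterations of
# (compute, MPI_Send, MPI_Recv), then finalize.
events = ["MPI_Init"] + ["compute", "MPI_Send", "MPI_Recv"] * 3 + ["MPI_Finalize"]
g = build_event_flow_graph(events)
loops = find_loop_edges(g)
```

The repeated edges (compute → MPI_Send, MPI_Send → MPI_Recv, MPI_Recv → compute) form exactly the cycle of the application's main loop, which is the structure the paper exploits to select which iterations to trace in detail.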

  • 2.
    Aguilar, Xavier
    et al.
    KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Fürlinger, K.
    Laure, Erwin
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Online MPI trace compression using event flow graphs and wavelets, 2016. In: Procedia Computer Science, Elsevier, 2016, p. 1497-1506. Conference paper (Refereed)
    Abstract [en]

    Performance analysis of scientific parallel applications is essential to use High Performance Computing (HPC) infrastructures efficiently. Nevertheless, collecting detailed data of large-scale parallel programs and long-running applications is infeasible due to the huge amount of performance information generated. Even though there are no technological constraints on storing terabytes of performance data, the constant flushing of such data to disk introduces a massive overhead into the application that makes the performance measurements worthless. This paper explores the use of event flow graphs together with wavelet analysis and EZW-encoding to provide MPI event traces that are orders of magnitude smaller while preserving accurate information on timestamped events. Our mechanism compresses the performance data online while the application runs, thus reducing the pressure put on the I/O system due to buffer flushing. As a result, we achieve lower application perturbation, reduced performance data output, and the possibility to monitor longer application runs.
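Why wavelets compress traces well can be seen in a minimal sketch: regular, repetitive timings produce many near-zero wavelet detail coefficients, which can be dropped. The example below uses a single-level Haar transform and plain thresholding as an illustrative stand-in for the paper's multi-level transform and EZW encoder:

```python
def haar_step(x):
    """One level of the Haar wavelet transform over an even-length series:
    pairwise averages (coarse signal) and pairwise differences (detail)."""
    avg = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]
    det = [(a - b) / 2 for a, b in zip(x[0::2], x[1::2])]
    return avg, det

def compress(x, threshold):
    """Zero out small detail coefficients; only the large ones (real
    timing anomalies) need to be stored."""
    avg, det = haar_step(x)
    return avg, [d if abs(d) >= threshold else 0.0 for d in det]

def reconstruct(avg, det):
    """Invert the Haar step, recovering an approximation of the series."""
    out = []
    for a, d in zip(avg, det):
        out.extend([a + d, a - d])
    return out

# Inter-event times of a regular communication loop with jitter and one outlier.
timings = [10.0, 10.2, 10.1, 9.9, 10.0, 10.1, 30.0, 10.0]
avg, det = compress(timings, threshold=0.5)
approx = reconstruct(avg, det)
```

After thresholding, only one detail coefficient survives (the 30.0 outlier), yet the reconstruction stays within the jitter of the original series — the trade-off between trace size and timing accuracy that the paper quantifies.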

  • 3.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Fürlinger, Karl
    Ludwig-Maximilians-Universitat (LMU).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    MPI Trace Compression Using Event Flow Graphs, 2014. Conference paper (Refereed)
    Abstract [en]

    Understanding how parallel applications behave is crucial for using high-performance computing (HPC) resources efficiently. However, the task of performance analysis is becoming increasingly difficult due to the growing complexity of scientific codes and the size of machines. Even though many tools have been developed over the past years to help in this task, current approaches either only offer an overview of the application discarding temporal information, or they generate huge trace files that are often difficult to handle.

    In this paper we propose the use of event flow graphs for monitoring MPI applications, a new and different approach that balances the low overhead of profiling tools with the abundance of information available from tracers. Event flow graphs are captured with very low overhead, require orders of magnitude less storage than standard trace files, and can still recover the full sequence of events in the application. We test this new approach with the NERSC-8/Trinity Benchmark suite and achieve compression ratios up to 119x.

  • 4.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Fürlinger, Karl
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Visual MPI Performance Analysis using Event Flow Graphs, 2015. In: Procedia Computer Science, ISSN 1877-0509, E-ISSN 1877-0509, Vol. 51, p. 1353-1362. Article in journal (Refereed)
    Abstract [en]

    Event flow graphs used in the context of performance monitoring combine the scalability and low overhead of profiling methods with lossless information recording of tracing tools. In other words, they capture statistics on the performance behavior of parallel applications while preserving the temporal ordering of events. Event flow graphs require significantly less storage than regular event traces and can still be used to recover the full ordered sequence of events performed by the application. In this paper we explore the usage of event flow graphs in the context of visual performance analysis. We show that graphs can be used to quickly spot performance problems, helping to better understand the behavior of an application. We demonstrate our performance analysis approach with MiniFE, a mini-application that mimics the key performance aspects of finite-element applications in High Performance Computing (HPC).

  • 5.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Fürlinger, Karl
    Ludwig-Maximilians-Universität München.
    Online Performance Data Introspection with IPM, 2014. In: Proceedings of the 15th IEEE International Conference on High Performance Computing and Communications (HPCC 2013), IEEE Computer Society, 2014, p. 728-734. Conference paper (Refereed)
    Abstract [en]

    Exascale systems will be heterogeneous architectures with multiple levels of concurrency and energy constraints. In such a complex scenario, performance monitoring and runtime systems play a major role to obtain good application performance and scalability. Furthermore, online access to performance data becomes a necessity to decide how to schedule resources and orchestrate computational elements: processes, threads, tasks, etc. We present the Performance Introspection API, an extension of the IPM tool that provides online runtime access to performance data from an application while it runs. We describe its design and implementation and show its overhead on several test benchmarks. We also present a real test case using the Performance Introspection API in conjunction with processor frequency scaling to reduce power consumption.

  • 6.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Vahtras, Olav
    KTH, School of Biotechnology (BIO), Theoretical Chemistry and Biology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Gimenez, Judit
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Scalability analysis of Dalton, a molecular structure program, 2013. In: Future generations computer systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 29, no 8, p. 2197-2204. Article in journal (Refereed)
    Abstract [en]

    Dalton is a molecular electronic structure program featuring common methods of computational chemistry that are based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It is specialized and has a leading position in calculation of molecular properties with a large world-wide user community (over 2000 licenses issued). In this paper, we present a performance characterization and optimization of Dalton. We also propose a solution that prevents the master/worker design of Dalton from becoming a performance bottleneck at larger process counts. With these improvements we obtain speedups of 4x, increasing the parallel efficiency of the code and making it possible to run it on a much larger number of cores.

  • 7.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Vahtras, Olav
    KTH, School of Biotechnology (BIO), Theoretical Chemistry and Biology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Gimenez, Judit
    Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, Barcelona, Spain.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Scaling Dalton, a molecular electronic structure program, 2011. In: Seventh International Conference on e-Science, e-Science 2011, 5-8 December 2011, Stockholm, Sweden, IEEE conference proceedings, 2011, p. 256-262. Conference paper (Refereed)
    Abstract [en]

    Dalton is a molecular electronic structure program featuring common methods of computational chemistry that are based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It is specialized and has a leading position in calculation of molecular properties with a large world-wide user community (over 2000 licenses issued). In this paper, we present a characterization and performance optimization of Dalton that increases the scalability and parallel efficiency of the application. We also propose a solution that helps prevent the master/worker design of Dalton from becoming a performance bottleneck at larger process counts, further increasing the parallel efficiency.

  • 8.
    Ahmed, Laeeq
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Edlund, Åke
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Spjuth, O.
    Using iterative MapReduce for parallel virtual screening, 2013. In: 2013 IEEE 5th International Conference on Cloud Computing Technology and Science (CloudCom), IEEE Computer Society, 2013, p. 27-32. Conference paper (Refereed)
    Abstract [en]

    Virtual screening is a technique in cheminformatics used for drug discovery by searching large libraries of molecule structures. Virtual screening often uses SVM, a supervised machine learning technique used for regression and classification analysis. Virtual screening using SVM not only involves huge datasets, but is also computationally expensive, with a complexity that can grow at least up to O(n^2). SVM-based applications most commonly use MPI, which becomes complex and impractical with large datasets. As an alternative to MPI, MapReduce and its different implementations have been successfully used on commodity clusters for analysis of data for problems with very large datasets. Due to its large libraries of molecule structures, virtual screening is a good candidate for MapReduce. In this paper we present a MapReduce implementation of SVM-based virtual screening, using Spark, an iterative MapReduce programming model. We show that our implementation has good scaling behaviour and opens up the possibility of using huge public cloud infrastructures efficiently for virtual screening.
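The map/reduce decomposition the abstract describes can be sketched without Spark. In the illustrative stand-alone example below, `svm_score` is a hypothetical linear stand-in for a trained SVM decision function, and the per-partition map would run in parallel on Spark executors rather than in a plain list comprehension:

```python
def svm_score(features, weights):
    """Hypothetical linear stand-in for an SVM decision function."""
    return sum(w * f for w, f in zip(weights, features))

def map_partition(partition, weights, cutoff):
    """Map side: score one partition of the library, keep likely actives."""
    return [m for m in partition if svm_score(m, weights) > cutoff]

def virtual_screen(library, weights, cutoff, n_parts=4):
    """Split the library into partitions, map each one (done in parallel
    by Spark), then reduce by concatenating the per-partition hit lists."""
    parts = [library[i::n_parts] for i in range(n_parts)]
    mapped = [map_partition(p, weights, cutoff) for p in parts]
    return [m for part in mapped for m in part]

# Toy 2-feature "molecules"; a real library would hold molecular descriptors.
library = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0)]
hits = virtual_screen(library, weights=(1.0, 2.0), cutoff=1.5)
```

Because each partition is scored independently, the structure is embarrassingly parallel — the property that makes the problem, as the abstract notes, a good candidate for MapReduce.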

  • 9.
    Ahmed, Laeeq
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Edlund, Åke
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Whitmarsh, S.
    Parallel real time seizure detection in large EEG data, 2016. In: IoTBD 2016 - Proceedings of the International Conference on Internet of Things and Big Data, SciTePress, 2016, p. 214-222. Conference paper (Refereed)
    Abstract [en]

    Electroencephalography (EEG) is one of the main techniques for detecting and diagnosing epileptic seizures. Due to the large size of EEG data in long term clinical monitoring and the complex nature of epileptic seizures, seizure detection is both data-intensive and compute-intensive. Analysing EEG data for detecting seizures in real time has many applications, e.g., in automatic seizure detection or in allowing a timely alarm signal to be presented to the patient. In real time seizure detection, seizures have to be detected with negligible delay, thus requiring lightweight algorithms. MapReduce and its variations have been effectively used for data analysis in large dataset problems on general-purpose machines. In this study, we propose a parallel lightweight algorithm for epileptic seizure detection using Spark Streaming. Our algorithm not only classifies seizures in real time, it also learns an epileptic threshold in real time. We furthermore present the "top-k amplitude measure" as a feature for classifying seizures in the EEG, which additionally assists in reducing the data size. In a benchmark experiment we show that our algorithm can detect seizures in real time with low latency, while maintaining a good seizure detection rate. In short, our algorithm provides new possibilities in using private cloud infrastructures for real time epileptic seizure detection in EEG data.
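A plausible reading of the "top-k amplitude measure" can be sketched in a few lines; the definition below (mean of the k largest absolute amplitudes per window) and the fixed threshold are this sketch's assumptions — the paper learns the threshold online from the stream:

```python
import heapq

def top_k_amplitude(window, k=3):
    """Mean of the k largest absolute amplitudes in one EEG window:
    a single number per window, which both serves as a classification
    feature and shrinks the data volume."""
    return sum(heapq.nlargest(k, (abs(v) for v in window))) / k

def detect(windows, threshold, k=3):
    """Flag each window whose top-k amplitude exceeds the threshold
    (fixed here for illustration; learned in real time in the paper)."""
    return [top_k_amplitude(w, k) > threshold for w in windows]

# One quiet window and one window with large seizure-like spikes.
quiet = [3.0, -2.0, 4.0, -1.0, 2.0, 3.0]
spike = [3.0, -90.0, 85.0, -88.0, 92.0, 3.0]
flags = detect([quiet, spike], threshold=50.0)
```

Because each window reduces to one scalar, the feature is cheap enough for the negligible-delay requirement and maps naturally onto a Spark Streaming micro-batch.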

  • 10.
    Ahmed, Laeeq
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Georgiev, Valentin
    Capuccini, Marco
    Toor, Salman
    Schaal, Wesley
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Spjuth, Ola
    Efficient iterative virtual screening with Apache Spark and conformal prediction, 2018. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 10, article id 8. Article in journal (Refereed)
    Abstract [en]

    Background: Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute force manner, by docking and scoring all available ligands. Contribution: In this study we propose a strategy that is based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands to exclude those predicted as 'low-scoring' ligands. Then, another set of ligands is docked, the model is retrained and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling. Results: We show on 4 different targets that conformal prediction based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining an accuracy for the top 30 hits of 94% on average, with a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
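The iterative dock-train-prune loop can be sketched generically. In the illustrative example below, `dock`, `train`, and `predict` are toy placeholders: the paper uses real docking software, an SVM, and conformal prediction (which supplies validity guarantees for the pruning decision that this sketch omits):

```python
def iterative_screen(ligands, dock, train, predict, batch=3, stop_frac=0.5):
    """Dock a batch, retrain the filter model on everything docked so far,
    prune ligands the model predicts as low-scoring, and repeat until a
    chosen fraction of the library has been docked."""
    scored, remaining = {}, list(ligands)
    while remaining and len(scored) < stop_frac * len(ligands):
        todo, remaining = remaining[:batch], remaining[batch:]
        scored.update({lig: dock(lig) for lig in todo})
        model = train(scored)
        remaining = [lig for lig in remaining if predict(model, lig)]
    return scored, remaining

# Toy stand-ins: the "docking score" is the ligand id itself, the "model"
# is the mean score seen so far, and prediction keeps ligands above it.
ligands = list(range(10))
scored, survivors = iterative_screen(
    ligands,
    dock=float,
    train=lambda s: sum(s.values()) / len(s),
    predict=lambda model, lig: lig > model,
)
```

The savings the paper reports come from the pruning step: every ligand the model confidently rejects is a docking run never performed.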

  • 11. Akhmetova, D.
    et al.
    Kestor, G.
    Gioiosa, R.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    On the application task granularity and the interplay with the scheduling overhead in many-core shared memory systems, 2015. In: Proceedings - IEEE International Conference on Cluster Computing, ICCC, IEEE, 2015, p. 428-437. Conference paper (Refereed)
    Abstract [en]

    Task-based programming models are considered one of the most promising programming model approaches for exascale supercomputers because of their ability to dynamically react to changing conditions and reassign work to processing elements. One question, however, remains unsolved: what should the task granularity of task-based applications be? Fine-grained tasks offer more opportunities to balance the system and generally result in higher system utilization. However, they also induce large scheduling overhead. The impact of scheduling overhead on coarse-grained tasks is lower, but large systems may end up imbalanced and underutilized. In this work we propose a methodology to analyze the interplay between application task granularity and scheduling overhead. Our methodology is based on three main points: 1) a novel task aggregation algorithm that analyzes an application's directed acyclic graph (DAG) and aggregates tasks, 2) a fast and precise emulator to analyze the application behavior on systems with up to 1,024 cores, 3) a comprehensive sensitivity analysis of application performance and scheduling overhead breakdown. Our results show that there is an optimal task granularity between 1.2x10^4 and 10x10^4 cycles for the representative schedulers. Moreover, our analysis indicates that a suitable scheduler for exascale task-based applications should employ a best-effort local scheduler and a sophisticated remote scheduler to move tasks across worker threads.
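The aggregation step can be illustrated on the simplest possible DAG, a linear chain of tasks; the greedy chain-merging below and the 12,000-cycle target are illustrative only (the paper's algorithm operates on a full DAG and its granularity range is an experimental result):

```python
def aggregate_chain(tasks, min_cycles):
    """Greedily merge consecutive fine-grained tasks of a chain until each
    aggregated task costs at least min_cycles, so that the per-task
    scheduling overhead is amortized over more useful work."""
    groups, current, cost = [], [], 0
    for name, cycles in tasks:
        current.append(name)
        cost += cycles
        if cost >= min_cycles:
            groups.append((current, cost))
            current, cost = [], 0
    if current:  # leftover tail that never reached the target size
        groups.append((current, cost))
    return groups

# Six 4,000-cycle tasks aggregated toward a 12,000-cycle granularity.
fine_tasks = [(f"t{i}", 4_000) for i in range(6)]
coarse = aggregate_chain(fine_tasks, min_cycles=12_000)
```

Raising `min_cycles` trades scheduling overhead for load balance, which is exactly the interplay the paper's sensitivity analysis quantifies.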

  • 12.
    Akhmetova, Dana
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Cebamanos, L.
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Rotaru, T.
    Rahn, M.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Bartsch, V.
    Simmendinger, C.
    Interoperability of GASPI and MPI in large scale scientific applications, 2018. In: 12th International Conference on Parallel Processing and Applied Mathematics, PPAM 2017, Springer Verlag, 2018, p. 277-287. Conference paper (Refereed)
    Abstract [en]

    One of the main hurdles for a broad distribution of PGAS approaches is the prevalence of MPI, which as a de-facto standard appears in the code base of many applications. To take advantage of PGAS APIs like GASPI without a major change to the code base, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we address this challenge by providing our study and preliminary performance results regarding interoperating GASPI and MPI on the performance-crucial parts of the Ludwig and iPIC3D applications. In addition, we outline a strategy for better coupling of both APIs.

  • 13.
    Apostolov, Rossen
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Axner, Lilit
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Agren, Hans
    Ayugade, Eduard
    Duta, Mihai
    Gelpi, Jose Luis
    Gimenez, Judit
    Goni, Ramon
    Hess, Berk
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics.
    Jamitzky, Ferdinand
    Kranzmuller, Dieter
    Labarta, Jesus
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Lindahl, Erik
    KTH, School of Engineering Sciences (SCI), Theoretical Physics, Theoretical & Computational Biophysics.
    Orozco, Modesto
    Peterson, Magnus
    Satzger, Helmut
    Trefethen, Anne
    Scalable Software Services for Life Science, 2011. In: Proceedings of 9th HealthGrid conference, 2011. Conference paper (Refereed)
    Abstract [en]

    Life Science is developing into one of the largest e-Infrastructure users in Europe, in part due to the ever-growing amount of biological data. Modern drug design typically includes both sequence bioinformatics, in silico virtual screening, and free energy calculations, e.g. of drug binding. This development will accelerate tremendously and will put high demands on simulation software and support services. e-Infrastructure projects such as PRACE/DEISA have made important advances on hardware and scalability, but have largely been focused on theoretical scalability for large systems, while typical life science applications rather concern small-to-medium size molecules. Here, we propose to address this by implementing new techniques for efficient small-system parallelization combined with throughput and ensemble computing to enable the life science community to exploit the largest next-generation e-Infrastructures. We will also build a new cross-disciplinary Competence Network for all of life science, to position Europe as the world-leading community for development and maintenance of this software e-Infrastructure. Specifically, we will (1) develop new hierarchical parallelization approaches explicitly based on ensemble and high-throughput computing for new multi-core and streaming/GPU architectures, and establish open software standards for data storage and exchange, (2) implement, document, and maintain such techniques in pilot European open-source codes such as the widely used GROMACS & DALTON, a new application for ensemble simulation (DISCRETE), and large-scale bioinformatics protein annotation, (3) create a Competence Centre for scalable life science software to strengthen Europe as a major software provider and to enable the community to exploit e-Infrastructures to their full extent. This Competence Network will provide training and support infrastructure, and establish a long-term framework for maintenance and optimization of life science codes.

  • 14. Appleton, O
    et al.
    Jones, B
    Kranzlmüller, D
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    The EGEE-II Project: Evolution Towards a Permanent European Grid Initiative, 2008. In: Advances in Parallel Computing: High Performance Computing and Grids in Action / [ed] Lucio Grandinetti, IOS Press, 2008, 16, p. 424-435. Chapter in book (Refereed)
    Abstract [en]

    Enabling Grids for E-sciencE represents the world's largest multidisciplinary grid infrastructure today. Co-funded by the European Commission, it brings together more than 90 partners in 32 countries to produce a reliable and scalable computing resource available to the European and global research community. At present, it consists of more than 200 sites in over 40 countries and makes more than 35,000 CPUs available to users 24 hours a day, 7 days a week. This article provides an overview of EGEE, its infrastructure, middleware, applications and support structures. From this experience, the current state of future plans will be explained, which is summarized under the term European Grid Initiative (EGI), and represents an emerging federated model for sustainable future grid infrastructures.

  • 15.
    Ardestani, Shahrzad
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Håkansson, Carl Johan
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Livenson, I.
    Stranak, P.
    Dima, E.
    Blommesteijn, D.
    Van De Sanden, M.
    B2SHARE: An open eScience data sharing platform, 2015. In: Proceedings - 11th IEEE International Conference on eScience, IEEE, 2015, p. 448-453. Conference paper (Refereed)
    Abstract [en]

    Scientific data sharing is becoming an essential service for data-driven science and can significantly improve the scientific process by making reliable and trustworthy data available, thereby reducing redundant work and providing insights into related research and recent advancements. For data sharing services to be useful in the scientific process, they need to fulfill a number of requirements that cover not only discovery of and access to data, but also ensure the integrity and reliability of published data. B2SHARE, developed by the EUDAT project, provides such a data sharing service to scientific communities. For communities that wish to download, install and maintain their own service, it is also available as software. B2SHARE is developed with a focus on user-friendliness, reliability, and trustworthiness, and can be customized for different organizations and use-cases. In this paper we discuss the design, architecture, and implementation of B2SHARE. We show its usefulness in the scientific process with some case studies in the biodiversity field.

  • 16. Bessani, A.
    et al.
    Brandt, J.
    Bux, M.
    Cogo, V.
    Dimitrova, L.
    Dowling, Jim
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Gholami, Ali
    KTH.
    Hakimzadeh, Kamal
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Hummel, M.
    Ismail, Mahmoud
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Leser, U.
    Litton, J. -E
    Martinez, R.
    Niazi, Salman
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Reichel, J.
    Zimmermann, K.
    BiobankCloud: A platform for the secure storage, sharing, and processing of large biomedical data sets, 2016. In: 1st International Workshop on Data Management and Analytics for Medicine and Healthcare, DMAH 2015 and Workshop on Big-Graphs Online Querying, Big-O(Q) 2015 held in conjunction with 41st International Conference on Very Large Data Bases, VLDB 2015, Springer, 2016, p. 89-105. Conference paper (Refereed)
    Abstract [en]

    Biobanks store and catalog human biological material that is increasingly being digitized using next-generation sequencing (NGS). There is, however, a computational bottleneck, as existing software systems are not scalable and secure enough to store and process the incoming wave of genomic data from NGS machines. In the BiobankCloud project, we are building a Hadoop-based platform for the secure storage, sharing, and parallel processing of genomic data. We extended Hadoop to include support for multi-tenant studies, reduced storage requirements with erasure coding, and added support for extensible and consistent metadata. On top of Hadoop, we built a scalable scientific workflow engine featuring a proper workflow definition language focusing on simple integration and chaining of existing tools, adaptive scheduling on Apache Yarn, and support for iterative dataflows. Our platform also supports the secure sharing of data across different, distributed Hadoop clusters. The software is easily installed and comes with a user-friendly web interface for running, managing, and accessing data sets behind a secure 2-factor authentication. Initial tests have shown that the engine scales well to dozens of nodes. The entire system is open-source and includes pre-defined workflows for popular tasks in biomedical data analysis, such as variant identification, differential transcriptome analysis using RNA-Seq, and analysis of miRNA-Seq and ChIP-Seq data.

  • 17. Cameron, D.
    et al.
    Casey, J.
    Guy, L.
    Kunszt, P.
    Lemaitre, S.
    Mc Cance, G.
    Stockinger, H.
    Stockinger, K.
    Andronico, G.
    Bell, W.
    Ben-Akiva, I.
    Bosio, D.
    Chytracek, R.
    Domenici, A.
    Donno, F.
    Hoschek, W.
    Laure, Erwin
    Lucio, L.
    Millar, P.
    Salconi, L.
    Segal, B.
    Silander, M.
    Replica Management in the European Data Grid Project, 2004. In: Journal of Grid Computing, ISSN 1570-7873, E-ISSN 1572-9184, Vol. 2, no 4, p. 341-351. Article in journal (Refereed)
    Abstract [en]

    Within the European DataGrid project, Work Package 2 has designed and implemented a set of integrated replica management services for use by data intensive scientific applications. These services, based on the web services model, enable movement and replication of data at high speed from one geographical site to another, management of distributed replicated data, optimization of access to data, and the provision of a metadata management tool. In this paper we describe the architecture and implementation of these services and evaluate their performance under demanding Grid conditions.

  • 18. Capuccini, Marco
    et al.
    Ahmed, Laeeq
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Schaal, Wesley
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Spjuth, Ola
    Large-scale virtual screening on public cloud resources with Apache Spark2017In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 9, article id 15Article in journal (Refereed)
    Abstract [en]

    Background: Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the message passing interface, relying on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open-source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. Results: We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, docking a publicly available target receptor against approximately 2.2 million compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Conclusion: Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then scaling to larger libraries.
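    The MapReduce dataflow described above (map a scoring function over the ligand library, then reduce to the best-scoring hits) can be sketched without Spark. Here `dock_score` is a hypothetical stand-in for a real docking engine, and a thread pool stands in for the cluster executors:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def dock_score(ligand):
    # Hypothetical stand-in for a real docking engine: a deterministic dummy
    # score per ligand string. A real pipeline would invoke the docking
    # software for each compound here.
    return sum(ord(c) for c in ligand) % 100

def screen(library, top_n=3, workers=4):
    # Map: score every ligand in parallel. Reduce: keep the top-scoring hits.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = zip(library, pool.map(dock_score, library))
        return heapq.nlargest(top_n, scored, key=lambda kv: kv[1])
```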

  • 19.
    Chien, Steven Wei Der
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST). KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Sishtla, Chaitanya Prasad
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST). KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST). KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Zhang, Jun
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST). KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Peng, Ivy Bo
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    An Evaluation of the TensorFlow Programming Model for Solving Traditional HPC Problems2018In: Proceedings of the 5th International Conference on Exascale Applications and Software, The University of Edinburgh , 2018, p. 34-Conference paper (Refereed)
    Abstract [en]

    Computationally intensive applications such as pattern recognition and natural language processing are increasingly popular on HPC systems. Many of these applications use deep learning, a branch of machine learning, to determine the weights of artificial neural network nodes by minimizing a loss function. Such applications depend heavily on dense matrix multiplications, also called tensorial operations. The use of Graphics Processing Units (GPUs) has considerably sped up deep-learning computations, leading to a renaissance of the artificial neural network. Recently, the NVIDIA Volta GPU and the Google Tensor Processing Unit (TPU) have been specially designed to support deep-learning workloads. New programming models have also emerged for the convenient expression of tensorial operations and deep-learning computational paradigms. An example of such new programming frameworks is TensorFlow, an open-source deep-learning library released by Google in 2015. TensorFlow expresses algorithms as a computational graph where nodes represent operations and edges between nodes represent data flow. Multi-dimensional data such as vectors and matrices which flow between operations are called tensors. For this reason, computational problems need to be expressed as a computational graph. In particular, TensorFlow supports distributed computation with flexible assignment of operations and data to devices such as GPUs and CPUs on different computing nodes. Computation on devices is based on optimized kernels such as MKL, Eigen, and cuBLAS. Inter-node communication can be through TCP and RDMA. This work evaluates the usability and expressiveness of the TensorFlow programming model for traditional HPC problems. As an illustration, we prototyped a distributed block matrix multiplication for large dense matrices which cannot be co-located on a single device, and a Conjugate Gradient (CG) solver. We evaluate the difficulty of expressing traditional HPC algorithms using computational graphs and study the scalability of distributed TensorFlow on accelerated systems. Our preliminary result with distributed matrix multiplication shows that distributed computation on TensorFlow is extremely scalable. This study provides an initial investigation of new emerging programming models for HPC.
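    One of the two prototyped kernels, the Conjugate Gradient solver, is a standard algorithm; a plain NumPy sketch of the iteration (not the paper's TensorFlow graph implementation) looks like this:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    # Standard CG iteration for a symmetric positive-definite matrix A:
    # minimize the A-norm of the error along successive conjugate directions.
    x = np.zeros_like(b)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)      # optimal step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:  # converged
            break
        p = r + (rs_new / rs) * p  # next conjugate direction
        rs = rs_new
    return x
```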

  • 20. Doerner, K
    et al.
    Laure, Erwin
    High performance computing in the optimization of software test plans2002In: Optimization and Engineering, ISSN 1389-4420, E-ISSN 1573-2924, Vol. 3, no 1, p. 67-87Article in journal (Refereed)
    Abstract [en]

    Statistical software testing is an increasingly popular method in the software development cycle. An exact modeling of the usage profiles of a software system is an indispensable prerequisite for statistical testing. Recently, new techniques for obtaining optimal usage profiles even in the presence of rarely used critical functions have been introduced. Although these techniques deliver unbiased dependability estimates with a single model (instead of using multiple models, as is current practice), their applicability is hampered by their prohibitive computational complexity.

  • 21. Field, Laurence
    et al.
    Laure, Erwin
    Schulz, Markus W.
    Grid Deployment Experiences: Grid Interoperation2009In: Journal of Grid Computing, ISSN 1570-7873, Vol. 7, no 3, p. 287-296Article in journal (Refereed)
    Abstract [en]

    Over recent years a number of Grid projects have emerged which have built Grid infrastructures that are now the computing backbones for various user communities. A significant number of these communities are limited to one Grid Infrastructure due to the different middleware and operations procedures used. Grid Interoperation is trying to bridge these differences and enable Virtual Organizations to access resources independent of the Grid project affiliation. Building upon the experiences the authors have gained while working on interoperation between EGEE and various other Grid infrastructures as well as through co-chairing the Grid Interoperation Now (GIN) efforts of the Open Grid Forum (OGF), this paper gives an overview of Grid Interoperation and describes various methods that can be used to connect Grid Infrastructures. The case is made for standardization in key areas and why the Grid community should move more aggressively towards standards.

  • 22. Gagliardi, F
    et al.
    Jones, B
    Laure, Erwin
    The EU datagrid project: Building and Operating a large scale Grid Infrastructure2006In: Engineering the Grid: Status and Perspective, American Scientific Publishers, 2006Chapter in book (Refereed)
    Abstract [en]

    The EU DataGrid project (EDG) aims to develop a large-scale research testbed for Grid computing. During the lifetime of the project (January 2001 - March 2004) high-level Grid middleware was developed, and a large-scale testbed has been up and running continuously since the beginning of 2002. Three application domains are using this testbed to explore the potential Grid computing has for their production environments: Particle Physics, Earth Observation, and Biomedicine. The EDG testbed, spanning some 20 major sites all over Europe as well as sites in the US and Asia, is one of the largest Grid infrastructures in the world. The software developed by EDG is being exploited in many national and international Grid projects, most notably the LCG production infrastructure. In this chapter we give a brief overview of Grid computing, present the architecture of the EDG middleware, report on our experiences in deploying and operating a large-scale Grid infrastructure, and discuss how our application groups are using this infrastructure. An outlook on the future perspectives of Grid computing concludes the chapter.

  • 23.
    Gholami, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Dowling, Jim
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    A security framework for population-scale genomics analysis2015In: Proceedings of the 2015 International Conference on High Performance Computing and Simulation, HPCS 2015, IEEE conference proceedings, 2015, p. 106-114Conference paper (Refereed)
    Abstract [en]

    Biobanks store genomic material from identifiable individuals. Recently many population-based studies have started sequencing genomic data from biobank samples and cross-linking the genomic data with clinical data, with the goal of discovering new insights into disease and clinical treatments. However, the use of genomic data for research has far-reaching implications for privacy and the relations between individuals and society. In some jurisdictions, primarily in Europe, new laws are being or have been introduced to legislate for the protection of sensitive data relating to individuals, and biobank-specific laws have even been designed to legislate for the handling of genomic data and the clear definition of roles and responsibilities for the owners and processors of genomic data. This paper considers the security questions raised by these developments. We introduce a new threat model that enables the design of cloud-based systems for handling genomic data according to privacy legislation. We also describe the design and implementation of a security framework using our threat model for BiobankCloud, a platform that supports the secure storage and processing of genomic data in cloud computing environments.

  • 24.
    Gholami, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Lind, Anna-Sara
    Reichel, Jane
    Litton, Jan-Eric
    Edlund, Åke
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Privacy Threat Modeling for Emerging BiobankClouds2014In: Procedia Computer Science: The 5th International Conference on Emerging Ubiquitous Systems and Pervasive Networks (EUSPN-2014)/ The 4th International Conference on Current and Future Trends of Information and Communication Technologies in Healthcare (ICTH 2014)/ Affiliated Workshops, Elsevier, 2014, Vol. 37, p. 489-496Conference paper (Refereed)
    Abstract [en]

    Next-generation sequencing (NGS) machines produce ever-increasing amounts of data, which demand scalable storage and analysis of genomic data. In order to cope with this huge amount of information, many biobanks are interested in cloud computing capabilities such as on-demand elasticity of computing power and storage capacity. However, several security and privacy requirements mandated by personal data protection legislation hinder biobanks from migrating the big data generated by NGS machines. This paper describes the privacy requirements of platform-as-a-service BiobankClouds according to the European Data Protection Directive (DPD). It identifies several key privacy threats which leave BiobankClouds vulnerable to attack. This study benefits health-care application designers in the requirement elicitation cycle when building privacy-preserving BiobankCloud platforms.

  • 25.
    Gholami, Ali
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Svensson, Gert
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Eickhoff, M.
    Brasche, G.
    ScaBIA: Scalable brain image analysis in the cloud2013In: CLOSER 2013 - Proceedings of the 3rd International Conference on Cloud Computing and Services Science, 2013, p. 329-336Conference paper (Refereed)
    Abstract [en]

    The use of cloud computing as a new paradigm has become a reality. Cloud computing leverages the use of on-demand CPU power and storage resources while eliminating the cost of commodity hardware ownership. Cloud computing is now gaining popularity among many different organizations and commercial sectors. In this paper, we present the scalable brain image analysis (ScaBIA) architecture, a new model to run statistical parametric mapping (SPM) jobs using cloud computing. SPM is one of the most popular toolkits in neuroscience for running compute-intensive brain image analysis tasks. However, issues such as sharing raw data and results, as well as scalability and performance, are major bottlenecks in the "single PC"-execution model. In this work, we describe a prototype using the generic worker (GW), an e-Science as a service middleware, on top of Microsoft Azure to run and manage the SPM tasks. The functional prototype shows that ScaBIA provides a scalable framework for multi-job submission and enables users to share data securely using storage access keys across different organizations.

  • 26.
    Gong, Jing
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Otten, Matthew
    Fischer, Paul
    Min, Misun
    Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations2016In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 72, no 11, p. 4160-4180Article in journal (Refereed)
    Abstract [en]

    We present a hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier-Stokes solver Nek5000. The implementation is based on OpenACC and CUDA Fortran for local parallelization of the compute-intensive matrix-matrix multiplication part, which significantly minimizes the modification of the existing CPU code while extending the simulation capability of the code to GPU architectures. Our discussion includes the GPU results of OpenACC interoperating with CUDA Fortran and the gather-scatter operations with GPUDirect communication. We demonstrate performance of up to 552 Tflops on 16,384 GPUs of the OLCF Cray XK7 Titan.
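    The compute-intensive kernel being offloaded consists of small dense matrix products applied along each axis of the per-element data. A NumPy sketch of such a spectral-element local gradient illustrates the pattern (this is illustrative only, not Nekbone's Fortran code):

```python
import numpy as np

def local_grad(u, D):
    # Spectral-element-style tensor contractions: apply the 1D operator D
    # along each axis of an (n, n, n) element array. These small dense
    # matrix-matrix products are the hot loop offloaded to the GPU.
    ur = np.einsum('ij,jkl->ikl', D, u)   # derivative in the first direction
    us = np.einsum('ij,kjl->kil', D, u)   # second direction
    ut = np.einsum('ij,klj->kli', D, u)   # third direction
    return ur, us, ut
```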

  • 27.
    Gong, Jing
    et al.
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Cebamanos, Luis
    Hart, Alistair
    Min, Misun
    Fischer, Paul
    NekBone with Optimized OpenACC directives2015Conference paper (Refereed)
    Abstract [en]

    Accelerators and, in particular, Graphics Processing Units (GPUs) have emerged as promising computing technologies which may be suitable for future Exascale systems. Here, we present performance results of NekBone, a benchmark of the Nek5000 code, implemented with optimized OpenACC directives and GPUDirect communications. Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flow. The optimized NekBone version reaches 78 Gflops on a single node. In addition, a performance result of 609 Tflops has been reached on 16,384 GPUs of the Titan supercomputer at Oak Ridge National Laboratory.


  • 28.
    Gong, Jing
    et al.
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Henningson, Dan
    KTH, School of Engineering Sciences (SCI), Mechanics, Stability, Transition and Control. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Mechanics, Stability, Transition and Control. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Peplinski, Adam
    Hart, Alistair
    Doleschal, Jens
    Henty, David
    Fischer, Paul
    Nek5000 with OpenACC2015In: Solving software challenges for exascale, 2015, p. 57-68Conference paper (Refereed)
    Abstract [en]

    Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flows. We follow up on an earlier study which ported a simplified version of Nek5000 to a GPU-accelerated system by presenting the hybrid CPU/GPU implementation of the full Nek5000 code using OpenACC. The matrix-matrix multiplication, the Nek5000 gather-scatter operator, and a preconditioned Conjugate Gradient solver have been implemented using OpenACC for multi-GPU systems. We report a speed-up of 1.3 on a single node of a Cray XK6 when using OpenACC directives in Nek5000. On 512 nodes of the Titan supercomputer, the speed-up approaches 1.4. A performance analysis of the Nek5000 code using the Score-P and Vampir performance monitoring tools shows that overlapping GPU kernels with host-accelerator memory transfers would considerably increase the performance of the OpenACC version of Nek5000.
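    The gather-scatter operator mentioned above performs direct stiffness summation over mesh points duplicated across element boundaries. A serial NumPy sketch conveys the idea (Nek5000's actual gather-scatter library is parallel and far more elaborate):

```python
import numpy as np

def gather_scatter(local_values, global_ids):
    # Direct stiffness summation, sketched serially: values of duplicated
    # mesh points (same global id) are summed ("gather") and the sum is
    # written back to every duplicate ("scatter").
    n_global = global_ids.max() + 1
    summed = np.zeros(n_global)
    np.add.at(summed, global_ids, local_values)  # gather: accumulate by id
    return summed[global_ids]                    # scatter: redistribute sums
```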

  • 29.
    Iakymchuk, Roman
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Jordan, Herbert
    University of Innsbruck, Institute of Computer Science.
    Bo Peng, Ivy
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    A Particle-in-Cell Method for Automatic Load-Balancing with the AllScale Environment2016Conference paper (Other academic)
    Abstract [en]

    We present an initial design and implementation of a Particle-in-Cell (PIC) method based on the work carried out in the European AllScale Exascale project. AllScale provides a unified programming system for the effective development of highly scalable, resilient, and performance-portable parallel applications for Exascale systems. The AllScale approach is based on task-based nested recursive parallelism, and it provides mechanisms for automatic load-balancing in PIC simulations. We provide preliminary results of the AllScale-based PIC implementation and outline directions for its future development.

  • 30.
    Ivanov, Ilya
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Gong, Jing
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Akhmetova, Dana
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Peng, Ivy Bo
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Machado, Rui
    Rahn, Mirko
    Bartsch, Valeria
    Hart, Alistair
    Fischer, Paul
    Evaluation of Parallel Communication Models in Nekbone, a Nek5000 mini-application2015In: 2015 IEEE International Conference on Cluster Computing, IEEE , 2015, p. 760-767Conference paper (Refereed)
    Abstract [en]

    Nekbone is a proxy application of Nek5000, a scalable Computational Fluid Dynamics (CFD) code used for modelling incompressible flows. The Nekbone mini-application is used by several international co-design centers to explore new concepts in computer science and to evaluate their performance. We present the design and implementation of a new communication kernel in the Nekbone mini-application with the goal of studying the performance of different parallel communication models. First, a new MPI blocking communication kernel has been developed to solve Nekbone problems in a three-dimensional Cartesian mesh and process topology. The new MPI implementation delivers a 13% performance improvement compared to the original implementation. The new MPI communication kernel consists of approximately 500 lines of code, against the original 7,000 lines, allowing experimentation with new approaches in Nekbone parallel communication. Second, the MPI blocking communication in the new kernel was changed to MPI non-blocking communication. Third, we developed a new Partitioned Global Address Space (PGAS) communication kernel based on the GPI-2 library. This approach reduces the synchronization among neighbor processes: in our tests on 8,192 processes, the GPI-2 communication kernel is on average 3% faster than the new MPI non-blocking communication kernel. In addition, we have used OpenMP in all versions of the new communication kernel. Finally, we highlight the future steps for using the new communication kernel in the parent application Nek5000.
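    The three-dimensional Cartesian process topology underlying the new kernel gives each rank six halo-exchange partners, one pair per axis. A small pure-Python sketch of the neighbor computation (mirroring what `MPI_Cart_shift` provides, under an assumed row-major rank ordering):

```python
def cart_neighbors(rank, dims, periodic=False):
    # Neighbor ranks of `rank` in a 3D Cartesian process grid `dims`:
    # one (lower, upper) pair per axis, i.e. the six halo-exchange partners.
    # Returns None where no neighbor exists (non-periodic boundary).
    nx, ny, nz = dims
    assert 0 <= rank < nx * ny * nz
    coords = (rank // (ny * nz), (rank // nz) % ny, rank % nz)

    def shift(axis, step):
        c = list(coords)
        c[axis] += step
        if periodic:
            c[axis] %= dims[axis]          # wrap around the torus
        elif not (0 <= c[axis] < dims[axis]):
            return None                    # fell off the grid boundary
        return (c[0] * ny + c[1]) * nz + c[2]

    return [(shift(a, -1), shift(a, +1)) for a in range(3)]
```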

  • 31.
    Ivanov, Ilya
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Machado, Rui
    Rahn, Mirko
    Akhmetova, Dana
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Gong, Jing
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Mechanics. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Henningson, Dan
    KTH, School of Engineering Sciences (SCI), Mechanics, Stability, Transition and Control. KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Fischer, Paul
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Evaluating New Communication Models in the Nek5000 Code for Exascale2015Conference paper (Other academic)
  • 32.
    Jansson, Niclas
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
    Towards a Parallel Algebraic Multigrid Solver Using PGAS2018In: 2018 Workshop on High Performance Computing Asia, New York, NY, USA: Association for Computing Machinery (ACM), 2018, p. 31-38Conference paper (Refereed)
    Abstract [en]

    The Algebraic Multigrid (AMG) method has over the years developed into an efficient tool for solving unstructured linear systems. The need to solve large industrial problems discretized on unstructured meshes has been a key motivation for devising a parallel AMG method. Despite some success, the key part of the AMG algorithm, the coarsening step, is far from trivial to parallelize efficiently. We here introduce a novel parallelization of the inherently sequential Ruge-Stüben coarsening algorithm that retains most of the good interpolation properties of the original method. Our parallelization is based on the Partitioned Global Address Space (PGAS) abstraction, which greatly simplifies the parallelization compared to traditional message-passing-based implementations. The coarsening algorithm and solver are described in detail, and a performance study on a Cray XC40 is presented.
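    The inherently sequential step in question is the C/F splitting. A greatly simplified serial Python sketch of the first Ruge-Stüben pass shows why it is sequential (the paper's contribution is precisely a PGAS parallelization of this loop, which the sketch does not attempt):

```python
def cf_split(strong):
    # First-pass Ruge-Stüben C/F splitting, greatly simplified:
    # `strong` maps each node to the set of nodes it strongly depends on.
    # Repeatedly promote the most influential unassigned node to a coarse
    # (C) point and mark its unassigned strong neighbors fine (F).
    weight = {i: 0 for i in strong}
    for i, deps in strong.items():
        for j in deps:              # influence = how many nodes depend on j
            weight[j] += 1
    coarse, fine = set(), set()
    unassigned = set(strong)
    while unassigned:               # sequential: each pick changes the state
        c = max(unassigned, key=lambda i: (weight[i], -i))
        coarse.add(c)
        unassigned.discard(c)
        neighbors = strong[c] | {i for i in unassigned if c in strong[i]}
        for f in neighbors & unassigned:
            fine.add(f)
            unassigned.discard(f)
    return coarse, fine
```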

  • 33. Jones, B
    et al.
    Kranzlmuller, D
    Laure, Erwin
    Appleton, O
    Lessons from Europe's International Grid Initiatives: Grid Technology in Africa2006Conference paper (Refereed)
    Abstract [en]

    This paper presents the progress and recent history of Grid computing in Europe, aimed at the development of a sustainable European Grid infrastructure. This is achieved by examining the Enabling Grids for E-sciencE (EGEE) project, funded by the European Commission (EC) to build a multi-science Grid infrastructure for the European Research Area, and recently connected to the African continent through other EC funded projects. By considering these developments, this paper extracts lessons for development of Grid computing in Africa. Among these are the need to connect to the broader international Grid community to ensure the success of Grid computing, and the proposal of a new incremental model for Grid development that involves leveraging existing Grid infrastructures to support small Grid projects in less developed areas. These lessons are then applied to the African continent, with suggestions of the most effective avenues for development of an African Grid infrastructure.

  • 34. Karagiannis, F.
    et al.
    Keramida, D.
    Ioannidis, Y.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Vitlacil, Dejan
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Short, Faith
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Technological and organisational aspects of global research data infrastructures towards year 20202013In: Data Science Journal, ISSN 1683-1470, E-ISSN 1683-1470, Vol. 12, p. GRDI1-GRDI5Article in journal (Refereed)
    Abstract [en]

    A general-purpose Global Research Data Infrastructure (GRDI) for all sciences and research purposes is not conceivable for the next decade as there are too many discipline-specific modalities that currently prevail for such generalisation efforts to be effective. On the other hand, a more pragmatic approach is to start from what currently exists, identify best practices and key issues, and promote effective inter-domain collaboration among different components forming an ecosystem. This will promote interoperability, data exchange, data preservation, and distributed access (among others). This ecosystem of interoperable research data infrastructures will be composed of regional, disciplinary, and multidisciplinary components, such as libraries, archives, and data centres, offering data services for both primary datasets and publications. The ecosystem will support data-intensive science and research and stimulate the interaction among all its elements, thus promoting multidisciplinary and interdisciplinary science. This special issue includes a set of independent papers from renowned experts on organisational and technological issues related to GRDIs. These documents feed into and complement the GRDI2020 roadmap, which supports a Global Research Data Infrastructure ecosystem.

  • 35. Karlsson, A.
    et al.
    Olofsson, N.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Clements, M.
    A parallel microsimulation package for modelling cancer screening policies2017In: Proceedings of the 2016 IEEE 12th International Conference on e-Science, e-Science 2016, IEEE, 2017, p. 323-330Conference paper (Refereed)
    Abstract [en]

    Microsimulation with stochastic life histories is an important tool in the development of public policies. In this article, we use microsimulation to evaluate policies for prostate cancer testing. We implemented the microsimulations as an R package, with pre- and post-processing in R and with the simulations written in C++. Calibrating a microsimulation model with a large population can be computationally expensive. To address this issue, we investigated four forms of parallelism: (i) shared memory parallelism using R; (ii) shared memory parallelism using OpenMP at the C++ level; (iii) distributed memory parallelism using R; and (iv) a hybrid shared/distributed memory parallelism using OpenMP at the C++ level and MPI at the R level. The close coupling between R and C++ offered advantages for ease of software dissemination and the use of high-level R parallelisation methods. However, this combination brought challenges when trying to use shared memory parallelism at the C++ level: the performance gained by hybrid OpenMP/MPI came at the cost of significant re-factoring of the existing code. As a case study, we implemented a prostate cancer model in the microsimulation package. We used this model to investigate whether prostate cancer testing with specific re-testing protocols would reduce harms and maintain any mortality benefit from prostate-specific antigen testing. We showed that four-yearly testing would have a comparable effectiveness and a marked decrease in costs compared with two-yearly testing and current testing. In summary, we developed a microsimulation package in R and assessed the cost-effectiveness of prostate cancer testing. We were able to scale up the microsimulations using a combination of R and C++; however, care was required when using shared memory parallelism at the C++ level.

  • 36. Kunszt, P.
    et al.
    Laure, Erwin
    CERN.
    Stockinger, H.
    Stockinger, K.
    File-Based Replica Management2005In: Future generations computer systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 21, no 1, p. 115-123Article in journal (Refereed)
    Abstract [en]

    Data replication is one of the best known strategies to achieve high levels of availability and fault tolerance, as well as minimal access times for large, distributed user communities using a world-wide Data Grid. In certain scientific application domains, the data volume can reach the order of several petabytes; in these domains, data replication and access optimization play an important role in the manageability and usability of the Grid.

  • 37. Lamanna, M.
    et al.
    Laure, Erwin
    CERN European Organization for Nuclear Research.
    Preface2008In: Journal of Grid Computing, ISSN 1570-7873, E-ISSN 1572-9184, Vol. 6, no 1, p. 1-2Article in journal (Other academic)
  • 38. Laure, Erwin
    A Java Framework for Distributed High Performance Computing2001In: Future generations computer systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 18, no 2, p. 235-251Article in journal (Refereed)
    Abstract [en]

    The past few years have dramatically changed the view of high performance applications and computing. While traditionally such applications have been targeted towards dedicated parallel machines, we see the emerging trend of building "meta-applications" composed of several modules that exploit heterogeneous platforms and employ hybrid forms of parallelism. In particular, Java has been recognized as a modern programming language for heterogeneous distributed computing. In this paper we present OpusJava, a Java based framework for distributed high performance computing (DHPC) that provides a high level component infrastructure and facilitates a seamless integration of high performance Opus (i.e., HPF) modules into larger distributed environments. OpusJava offers a comprehensive programming model that supports the exploitation of hybrid parallelism and provides high level coordination means. (C) 2001 Elsevier Science B.V. All rights reserved.

  • 39.
    Laure, Erwin
    EDG.
    The EU DataGrid Setting the Basis for Production Grids: Preface2004In: Journal of Grid Computing, ISSN 1570-7873, E-ISSN 1572-9184, Vol. 2, no 4, p. 299-300Article in journal (Other academic)
  • 40. Laure, Erwin
    The European e-Infrastructure Ecosystem2009In: Research in a connected world / [ed] Alex Voss, Elizabeth Vander Meer, David Fergusson, Edinburgh: David Fergusson , 2009Chapter in book (Other academic)
  • 41.
    Laure, Erwin
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Edlund, Åke
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    The e-Infrastructure Ecosystem: Providing Local Support to Global Science2012In: Large-Scale Computing, John Wiley & Sons, 2012, p. 19-34Chapter in book (Refereed)
  • 42. Laure, Erwin
    et al.
    Fisher, S.M.
    Frohner, A.
    Grandi, C.
    Kunszt, P.
    Krenek, A.
    Mulmo, O.
    Pacini, F.
    Prelz, F.
    White, J.
    Barroso, M.
    Buncic, P.
    Hemmer, F.
    Di Meglio, A.
    Edlund, Åke
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Programming the Grid using gLite2006In: Computational Methods in Science and Technology, ISSN 1505-0602, Vol. 12, no 1, p. 33-45Article in journal (Other academic)
    Abstract [en]

    The past few years have seen the creation of the first production level Grid infrastructures that offer their users a dependable service at an unprecedented scale. Depending on the flavor of middleware services these infrastructures deploy (for instance Condor, gLite, Globus, UNICORE, to name only a few) different interfaces to program the Grid infrastructures are provided. Despite ongoing efforts to standardize Grid service interfaces, there are still significant differences in how applications can interface to a Grid infrastructure. In this paper we describe the middleware (gLite) and services deployed on the EGEE Grid infrastructure and explain how applications can interface to them.

  • 43.
    Laure, Erwin
    et al.
    CERN, Geneva, Switzerland.
    Fisher, S-M
    Frohner, A.
    Grandi, C.
    Kunszt, P.
    Krenek, A.
    Mulmo, Olle
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Pacini, F.
    Prelz, F.
    White, J.
    Barroso, M.
    Buncic, P.
    Hemmer, F.
    Di Meglio, A.
    Edlund, Åke
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA.
    Programming the grid with glite2006In: Computational Methods in Science and Technology, ISSN 1505-0602, Vol. 12, no 1, p. 33-45Article in journal (Other academic)
    Abstract [en]

    The past few years have seen the creation of the first production level Grid infrastructures that offer their users a dependable service at an unprecedented scale. Depending on the flavor of middleware services these infrastructures deploy (for instance Condor, gLite, Globus, UNICORE, to name only a few) different interfaces to program the Grid infrastructures are provided. Despite ongoing efforts to standardize Grid service interfaces, there are still significant differences in how applications can interface to a Grid infrastructure. In this paper we describe the middleware (gLite) and services deployed on the EGEE Grid infrastructure and explain how applications can interface to them.

  • 44. Laure, Erwin
    et al.
    Haines, M
    Mehrotra, P
    Zima, H
    On the Implementation of the Opus Coordination Language2000In: Concurrency: Practice and Experience, Vol. 12, no 4, p. 227-249Article in journal (Refereed)
    Abstract [en]

    Opus is a new programming language designed to assist in coordinating the execution of multiple, independent program modules. With the help of Opus, coarse grained task parallelism between data parallel modules can be expressed in a clean and structured way. In this paper we address the problems of how to build a compilation and runtime support system that can efficiently implement the Opus constructs. Our design considers the often-conflicting goals of efficiency and modular construction through software re-use. In particular, we present the system requirements for an efficient Opus implementation and the Opus runtime system, and describe how they work together to provide the underlying services that the Opus compiler needs for a broad class of machines. Copyright (C) 2000 John Wiley & Sons, Ltd.

  • 45. Laure, Erwin
    et al.
    Hemmer, F
    Edlund, Åke
    KTH.
    Middleware for the Next Generation Grid Infrastructure2004Conference paper (Refereed)
    Abstract [en]

    The aim of the EGEE (Enabling Grids for E-Science in Europe) project is to create a reliable and dependable European Grid infrastructure for e-Science. The objective of the EGEE Middleware Re-engineering and Integration Research Activity is to provide robust middleware components, deployable on several platforms and operating systems, corresponding to the core Grid services for resource access, data management, information collection, authentication & authorization, resource matchmaking and brokering, and monitoring and accounting. For achieving this objective, we developed an architecture and design of the next generation Grid middleware leveraging experiences and existing components essentially from AliEn, EDG, and VDT. The architecture follows the service breakdown developed by the LCG ARDA group. Our strategy is to do as little original development as possible but rather re-engineer and harden existing Grid services. The evolution of these middleware components towards a Service Oriented Architecture (SOA) adopting existing standards (and following emerging ones) as much as possible is another major goal of our activity.

  • 46.
    Laure, Erwin
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Holmgren, Sverker
    Preface2013In: Future generations computer systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 29, no 8, p. 2115-2116Article in journal (Refereed)
  • 47.
    Laure, Erwin
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Jones, B.
    Enabling Grids for e-Science: The EGEE Project2009In: Grid computing: Infrastructure, Service, and application / [ed] Lizhe Wang, Wei Jie, Jinjun Chen, CRC Press, 2009, p. 55-74Chapter in book (Refereed)
    Abstract [en]

    Enabling Grids for E-sciencE represents the world's largest multi-disciplinary Grid infrastructure today. Co-funded by the European Commission, it brings together more than 250 resource centres from 48 countries to produce a reliable and scalable computing resource available to the European and global research community. This article provides an overview of EGEE, its infrastructure, middleware, applications, and support structures. It is intended to serve as a first source of information for similar efforts elsewhere; based on EGEE's experiences, a sustainable model for Grid operations is also discussed.

  • 48.
    Laure, Erwin
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Kao, O.
    Badia, R. M.
    Lefevre, L.
    Di Martino, B.
    Prodan, R.
    Turilli, M.
    Warneke, D.
    Topic 6: Grid, cluster and cloud computing (Introduction)2013In: Lecture Notes in Computer Science, 2013, p. 241-Conference paper (Refereed)
    Abstract [en]

    Grid and cloud computing have changed the IT landscape in the way we access and manage IT infrastructures. The use of computing resources has become essential for many applications in various areas. Both technologies provide easy-to-use and on-demand access to large-scale infrastructures. The high number of submissions to "Topic 6: Grid, Cluster and Cloud Computing" reflected the importance of the research area. The papers addressed key challenges regarding design, deployment, operation, and use of Grid and cloud infrastructures. Moreover, several innovative algorithms and methods were proposed for fundamental capabilities and services that are required in a heterogeneous environment, such as adaptability, scalability, reliability, and security, and to support applications as diverse as ubiquitous local services, enterprise-scale virtual organizations, and internet-scale distributed supercomputing. Finally, many experimental evaluations and use-cases delivered an insight into the deployment in real-world scenarios and showed interesting future application domains. Each submission was reviewed by at least four reviewers and, finally, we were able to select nine high-quality papers. The papers were grouped into four sessions that are briefly summarized in the following.

  • 49.
    Laure, Erwin
    et al.
    CERN, Europ. Org. for Nuclear Research, Geneva, Switzerland.
    Stockinger, H.
    Stockinger, K.
    Performance Engineering in Data Grids2005In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 17, no 2-4, p. 171-191Article in journal (Refereed)
    Abstract [en]

    The vision of Grid computing is to facilitate worldwide resource sharing among distributed collaborations. With the help of numerous national and international Grid projects, this vision is becoming reality and Grid systems are attracting an ever increasing user base. However, Grids are still quite complex software systems whose efficient use is a difficult and error-prone task. In this paper we present performance engineering techniques that aim to facilitate an efficient use of Grid systems, in particular systems that deal with the management of large-scale data sets in the tera- and petabyte range (also referred to as data Grids). These techniques are applicable at different layers of a Grid architecture and we discuss the tools required at each of these layers to implement them. Having discussed important performance engineering techniques, we investigate how major Grid projects deal with performance issues particularly related to data Grids and how they implement the techniques presented.

  • 50.
    Laure, Erwin
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Vitlacil, Dejan
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Data storage and management for global research data infrastructures - Status and perspectives2013In: Data Science Journal, ISSN 1683-1470, E-ISSN 1683-1470, Vol. 12, p. GRDI37-GRDI42Article in journal (Refereed)
    Abstract [en]

    In the vision of Global Research Data Infrastructures (GRDIs), data storage and management plays a crucial role. A successful GRDI will require a common globally interoperable distributed data system, formed out of data centres, that incorporates emerging technologies and new scientific data activities. The main challenge is to define common certification and auditing frameworks that will allow storage providers and data communities to build a viable partnership based on trust. To achieve this, it is necessary to find a long-term commitment model that will give financial, legal, and organisational guarantees of digital information preservation. In this article we discuss the state of the art in data storage and management for GRDIs and point out future research directions that need to be tackled to implement GRDIs.
