1 - 8 of 8
  • 1. Gardfjall, Peter
    Elmroth, Erik
    Johnsson, Lennart
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Mulmo, Olle
    Sandholm, Thomas
    Scalable Grid-wide capacity allocation with the SweGrid Accounting System (SGAS). 2008. In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 20, no. 18, p. 2089-2122. Article in journal (Refereed)
    Abstract [en]

    The SweGrid Accounting System (SGAS) allocates capacity in collaborative Grid environments by coordinating enforcement of Grid-wide usage limits as a means to offer usage guarantees and prevent overuse. SGAS employs a credit-based allocation model where Grid capacity is granted to projects via Grid-wide quota allowances that can be spent across the Grid resources. The resources collectively enforce these allowances in a soft, real-time manner. SGAS is built on service-oriented principles with a strong focus on interoperability and Web services standards. This article covers the SGAS design and implementation, which, besides addressing inherent Grid challenges (scale, security, heterogeneity, decentralization), emphasizes generality and flexibility to produce a customizable system with lightweight integration into different middleware and scheduling system combinations. We focus the discussion on the system design, a flexible allocation model, middleware integration experiences, scalability improvements via a distributed virtual banking system, and, finally, an extensive set of testbed experiments. The experiments evaluate the performance of SGAS in terms of response times, request throughput, overall system scalability, and its performance impact on the Globus Toolkit 4 job submission software. We conclude that, for all practical purposes, the quota enforcement overhead incurred by SGAS on job submissions is not a limiting factor for the job-handling capacity of the job submission software.

  • 2.
    Laure, Erwin
    CERN, Europ. Org. for Nuclear Research, Geneva, Switzerland.
    Stockinger, H.
    Stockinger, K.
    Performance Engineering in Data Grids. 2005. In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 17, no. 2-4, p. 171-191. Article in journal (Refereed)
    Abstract [en]

    The vision of Grid computing is to facilitate worldwide resource sharing among distributed collaborations. With the help of numerous national and international Grid projects, this vision is becoming reality and Grid systems are attracting an ever increasing user base. However, Grids are still quite complex software systems whose efficient use is a difficult and error-prone task. In this paper we present performance engineering techniques that aim to facilitate an efficient use of Grid systems, in particular systems that deal with the management of large-scale data sets in the tera- and petabyte range (also referred to as data Grids). These techniques are applicable at different layers of a Grid architecture and we discuss the tools required at each of these layers to implement them. Having discussed important performance engineering techniques, we investigate how major Grid projects deal with performance issues particularly related to data Grids and how they implement the techniques presented.

  • 3.
    Podobas, Artur
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Brorsson, Mats
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Faxén, Karl-Filip
    Swedish Institute of Computer Science.
    A comparative performance study of common and popular task-centric programming frameworks. 2013. In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634. Article in journal (Refereed)
    Abstract [en]

    Programmers today face a bewildering array of parallel programming models and tools, making it difficult to choose an appropriate one for each application. An increasingly popular programming model supporting structured parallel programming patterns in a portable and composable manner is the task-centric programming model. In this study, we compare several popular task-centric programming frameworks, including Cilk Plus, Threading Building Blocks, and various implementations of OpenMP 3.0. We have analyzed their performance on the Barcelona OpenMP Tasking Suite benchmarks, both on a 48-core AMD Opteron 6172 server and on a 64-core TILEPro64 embedded many-core processor. Our results show that OpenMP offers the highest flexibility for programmers, but this flexibility comes at a cost. Frameworks supporting only a specific and more restrictive model, such as Cilk Plus and Threading Building Blocks, are generally more efficient in terms of both performance and energy consumption. However, Intel's implementation of OpenMP tasks performs best and comes closest to the specialized run-time systems.

  • 4. Riedel, M.
    Laure, Erwin
    Open Grid Forum, Grid Interoperation Now (GIN), Community Group (CG).
    Geddes, N.
    et al.,
    Interoperation of World-Wide Production e-Science Infrastructures. 2009. In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 21, no. 8, p. 961-990. Article in journal (Refereed)
    Abstract [en]

    Many production Grid and e-Science infrastructures have begun to offer services to end-users during the past several years, with an increasing number of scientific applications requiring access to a wide variety of resources and services in multiple Grids. The Grid Interoperation Now (GIN) Community Group of the Open Grid Forum therefore organizes and manages interoperation efforts among those production Grid infrastructures, with the goal of realizing a world-wide Grid vision on a technical level in the near future. This contribution highlights fundamental approaches of the group and discusses open standards in the context of production e-Science infrastructures.

  • 5. Valero-Lara, P.
    Jansson, J.
    KTH, School of Computer Science and Communication (CSC).
    Heterogeneous CPU+GPU approaches for mesh refinement over Lattice-Boltzmann simulations. 2016. In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634. Article in journal (Refereed)
    Abstract [en]

    The use of mesh refinement in CFD is an efficient and widely used methodology to minimize the computational cost by solving regions of high geometrical complexity with a finer grid. In this work, the authors focus on two methods for mesh refinement over LBM simulations, one based on Multi-Domain meshing and one based on Irregular meshing. The numerical formulation is presented in detail. Two approaches, homogeneous GPU and heterogeneous CPU+GPU, are proposed for each of the refinement methods. Using both architectures, CPU and GPU, to compute the same problem raises greater challenges than the homogeneous counterpart. These challenges, and the strategies to deal with them, are described in detail in the present work. We pay particular attention to the differences between the two methodologies/implementations in terms of programmability, memory management, and performance. The size of the refined sub-domain has important consequences for both methodologies, but its influence on the Multi-Domain approach is much greater. For instance, when dealing with a large refined sub-domain, the Multi-Domain approach suffers a significant drop in performance compared with cases where the refined sub-domain is smaller. In contrast, with the Irregular approach there is no such dramatic drop in performance as the size of the refined sub-domain increases. © 2016 John Wiley & Sons, Ltd.

  • 6. Vapirev, A.
    Deca, J.
    Lapenta, G.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Hur, I.
    Cambier, J. -L
    Initial results on computational performance of Intel many integrated core, Sandy Bridge, and graphical processing unit architectures: implementation of a 1D C++/OpenMP electrostatic particle-in-cell code. 2015. In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 27, no. 3, p. 581-593. Article in journal (Refereed)
    Abstract [en]

    We present initial comparative performance results for the Intel many integrated core (MIC), Sandy Bridge (SB), and graphical processing unit (GPU) architectures. A 1D explicit electrostatic particle-in-cell code is used to simulate a two-stream instability in plasma. We compare the computation times for various numbers of cores/threads and compiler options. The parallelization is implemented via OpenMP with a maximum thread number of 128. Parallelization and vectorization on the GPU are achieved by modifying the code syntax for compatibility with CUDA. We assess the speedup due to various auto-vectorization and optimization-level compiler options. Our results show that the MIC is several times slower than SB for a single thread, and it becomes faster than SB when the number of cores increases with vectorization switched on. The compute times for the GPU are consistently about six to seven times faster than those for the MIC. Compared with SB, the GPU is about two times faster for a single thread and about an order of magnitude faster for 128 threads. The net speedup for MIC and GPU, however, is almost the same. An initial attempt to offload parts of the code to the MIC coprocessor shows that there is an optimal number of threads at which the speedup reaches a maximum.

  • 7.
    Varisteas, Georgios
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Brorsson, Mats
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. SICS Swedish Institute of Computer Science, Sweden.
    Palirria: accurate on-line parallelism estimation for adaptive work-stealing. 2015. In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634. Article in journal (Refereed)
    Abstract [en]

    We present Palirria, a self-adapting work-stealing scheduling method for nested fork/join parallelism that can be used to estimate the number of utilizable workers and self-adapt accordingly. The estimation mechanism is optimized for accuracy, minimizing the requested resources without degrading performance. We implemented Palirria for both the Linux and Barrelfish operating systems and evaluated it on two platforms: a 48-core Non-Uniform Memory Access (NUMA) multiprocessor and a simulated 32-core system. Compared with the state of the art, we observed higher accuracy in estimating resource requirements. This leads to improved resource utilization and performance on par with, or better than, executing with fixed resource allotments.

  • 8. Wang, Jiechen
    Cui, Can
    Rui, Yikang
    KTH, School of Architecture and the Built Environment (ABE), Urban Planning and Environment, Geodesy and Geoinformatics.
    Cheng, Liang
    Pu, Yingxia
    Wu, Wenzhou
    Yuan, Zhenyu
    A parallel algorithm for constructing Voronoi diagrams based on point-set adaptive grouping. 2014. In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 26, no. 2, p. 434-446. Article in journal (Refereed)
    Abstract [en]

    This paper presents a parallel algorithm for constructing Voronoi diagrams based on point-set adaptive grouping. The binary tree splitting method is used to adaptively group the point set in the plane and construct sub-Voronoi diagrams for each group. Given that the construction of Voronoi diagrams in each group consumes the majority of the time, and that construction within one group does not affect that in other groups, the use of a parallel algorithm is suitable. After constructing the sub-Voronoi diagrams, we extract the boundary points of the four sides of each sub-group and use them to construct boundary site Voronoi diagrams. Finally, the sub-Voronoi diagrams containing each boundary point are merged with the corresponding boundary site Voronoi diagrams. This produces the desired Voronoi diagram. Experiments demonstrate the efficiency of this parallel algorithm, and its time complexity is calculated as a function of the size of the point set, the number of processors, the average number of points in each block, and the number of boundary points.
