1 - 5 of 5
  • 1. Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems. University of Turku, Turku, Finland.
    Ebrahimi, Masoumeh
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems. University of Turku, Turku, Finland.
    Dytckov, Sergei
    Plosila, Juha
    In-order delivery approach for 2D and 3D NoCs. 2015. In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 71, no. 8, pp. 2877-2899. Journal article (Refereed)
    Abstract [en]

    In many applications, it is critical to guarantee the in-order delivery of requests from the master cores to the slave cores, so that the requests can be executed in the correct order without requiring buffers. Since in NoCs packets may use different paths and on the other hand traffic congestion varies on different routes, the in-order delivery constraint cannot be met without support. To guarantee the in-order delivery, traditional approaches either use dimension-order routing or employ reordering buffers at network interfaces. Dimension-order routing degrades the performance considerably while the usage of reordering buffers imposes large area overhead. In this paper, we present a mechanism allowing packets to be routed through multiple paths in the network, helping to balance the traffic load while guaranteeing the in-order delivery. The proposed method combines the advantages of both deterministic and adaptive routing algorithms. The simple idea is to use different deterministic algorithms for independent flows. This approach neither requires reordering buffers nor limits packets to use a single path. The algorithm is simple and practical with negligible area overhead over dimension-order routing. The concept is investigated in both 2D and 3D mesh networks.

  • 2. Farahnakian, Fahimeh
    Ebrahimi, Masoumeh
    Daneshtalab, Masoud
    Liljeberg, Pasi
    Plosila, Juha
    Adaptive Load Balancing in Learning-based Approaches for Many-core Embedded Systems. 2014. In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 68, no. 3, pp. 1214-1234. Journal article (Refereed)
    Abstract [en]

    Adaptive routing algorithms improve network performance by distributing traffic over the whole network. However, they require congestion information to facilitate load balancing. To provide local and global congestion information, we propose a learning method based on dual reinforcement learning approach. This information can be dynamically updated according to the changing traffic condition in the network by propagating data and learning packets. We utilize a congestion detection method which updates the learning rate according to the congestion level. This method calculates the average number of free buffer slots in each switch at specific time intervals and compares it with maximum and minimum values. Based on the comparison result, the learning rate sets to a value between 0 and 1. If a switch gets congested, the learning rate is set to a high value, meaning that the global information is more important than local. In contrast, local is more emphasized than global information in non-congested switches. Results show that the proposed approach achieves a significant performance improvement over the traditional Q-routing, DRQ-routing, DBAR and Dynamic XY algorithms.

  • 3. Gong, Jing
    KTH, School of Computer Science and Communication (CSC), Centres, Center for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Centres, Center for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Center for High Performance Computing, PDC.
    Otten, Matthew
    Fischer, Paul
    Min, Misun
    Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations. 2016. In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 72, no. 11, pp. 4160-4180. Journal article (Refereed)
    Abstract [en]

    We present a hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier-Stokes solver Nek5000. The implementation is based on OpenACC and CUDA Fortran for local parallelization of the compute-intensive matrix-matrix multiplication part, which significantly minimizes the modification of the existing CPU code while extending the simulation capability of the code to GPU architectures. Our discussion includes the GPU results of OpenACC interoperating with CUDA Fortran and the gather-scatter operations with GPUDirect communication. We demonstrate performance of up to 552 Tflops on 16,384 GPUs of the OLCF Cray XK7 Titan.

  • 4. Rahmani, Amir-Mohammad
    Liljeberg, Pasi
    Plosila, Juha
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Special section on advances in methods for adaptive multicore systems. 2014. In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 68, no. 3, pp. 1023-1026. Journal article (Refereed)
  • 5. Thoman, Peter
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Dichev, Kiril
    Queens Univ Belfast, Belfast BT7 1NN, Antrim, North Ireland.
    Heller, Thomas
    Univ Erlangen Nurnberg, D-91058 Erlangen, Germany.
    Iakymchuk, Roman
    KTH.
    Aguilar, Xavier
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Hasanov, Khalid
    IBM Ireland, Dublin 15, Ireland.
    Gschwandtner, Philipp
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Lemarinier, Pierre
    IBM Ireland, Dublin 15, Ireland.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Jordan, Herbert
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Fahringer, Thomas
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Katrinis, Kostas
    IBM Ireland, Dublin 15, Ireland.
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Center for High Performance Computing, PDC.
    Nikolopoulos, Dimitrios S.
    Queens Univ Belfast, Belfast BT7 1NN, Antrim, North Ireland.
    A taxonomy of task-based parallel programming technologies for high-performance computing. 2018. In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 74, no. 4, pp. 1422-1434. Journal article (Refereed)
    Abstract [en]

    Task-based programming models for shared memory, such as Cilk Plus and OpenMP 3, are well established and documented. However, with the increase in parallel, many-core, and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing (HPC), no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today.
