  • 51. Valero-Lara, P.
    et al.
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). Basque Center for Applied Mathematics (BCAM).
    Multi-domain grid refinement for lattice-Boltzmann simulations on heterogeneous platforms. 2015. In: Proceedings - IEEE 18th International Conference on Computational Science and Engineering, CSE 2015, IEEE Computer Society, 2015, p. 1-8. Conference paper (Refereed)
    Abstract [en]

    The main contribution of the present work consists of several parallel approaches for grid refinement based on a multi-domain decomposition for lattice-Boltzmann simulations. The proposed method for discretizing the fluid incorporates different regular Cartesian grids with non-homogeneous spatial domains, which need to communicate with each other. Three different parallel approaches are proposed: homogeneous Multicore, homogeneous GPU, and heterogeneous Multicore-GPU. Although the homogeneous implementations exhibit satisfactory results, the heterogeneous approach achieves up to 30% extra efficiency, in terms of Millions of Fluid Lattice Updates per Second (MFLUPS), by overlapping some of the steps on both architectures, Multicore and GPU. © 2015 IEEE.

  • 52. Valero-Lara, P.
    et al.
    Krishnasamy, E.
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). Basque Center for Applied Mathematics (BCAM), Spain.
    Towards HPC-embedded. Case study: Kalray and message-passing on NoC. 2017. In: Scalable Computing: Practice and Experience, ISSN 1895-1767, E-ISSN 1895-1767, Vol. 18, no 2, p. 151-160. Article in journal (Refereed)
    Abstract [en]

    Today, one of the most important challenges in HPC is the development of computers with low power consumption. In this context, new embedded many-core systems have recently emerged. One of them is Kalray. Unlike other many-core architectures, Kalray is not a co-processor; it is self-hosted. One interesting feature of the Kalray architecture is the Network on Chip (NoC) connection. Usually, communication in many-core architectures is carried out via shared memory. In Kalray, however, communication among processing elements can also take place via Message-Passing on the NoC. One of the main motivations of this work is to present the main constraints of working with the Kalray architecture. In particular, we focus on memory management and communication. We assess the use of NoC and shared memory on Kalray. Unlike shared memory, the implementation of Message-Passing on NoC is not transparent from the programmer's point of view. The synchronization among processing elements and the NoC is another challenge on the Kalray processor. Although synchronization using Message-Passing is more complex and time-consuming than using shared memory, we obtain an overall speedup close to 6 when using Message-Passing on NoC with respect to the use of shared memory. Additionally, we have measured the power consumption of both approaches. Despite being faster, the NoC approach has a higher power consumption than the approach that exploits shared memory. This additional consumption, in watts, is about 50%. However, the reduction in execution time from using the NoC also has an important impact on the overall power consumption.

  • 53. Valero-Lara, Pedro
    et al.
    Nookala, Poornima
    Pelayo, Fernando L.
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). BCAM, Bilbao, Spain.
    Dimitropoulos, Serapheim
    Raicu, Ioan
    Many-Task Computing on Many-Core Architectures. 2016. In: Scalable Computing: Practice and Experience, ISSN 1895-1767, E-ISSN 1895-1767, Vol. 17, no 1, p. 33-46. Article in journal (Refereed)
    Abstract [en]

    Many-Task Computing (MTC) is a common scenario for multiple parallel systems, such as clusters, grids, clouds and supercomputers, but it is not so popular in shared memory parallel processors. Given the spectacular growth in performance and in the number of cores integrated in many-core architectures, the study of MTC on such architectures is becoming more and more relevant. In this paper, the authors present the programming mechanisms that take advantage of such massively parallel features for the particular target of MTC. The hardware features of the two dominant many-core platforms (NVIDIA's GPUs and Intel Xeon Phi) are also analyzed for our specific framework. Given the important differences in terms of hardware and software in our two many-core platforms, we have considered different strategies based on CUDA (for GPUs) and OpenMP (for Intel Xeon Phi). We carried out several test cases based on an appropriate and widely studied benchmarking problem, matrix multiplication. Essentially, this study consisted of comparing the time consumed for computing in parallel several tasks one by one (the whole computational resources are used just to compute one task at a time) with the time consumed for computing in parallel the same set of tasks simultaneously (the whole computational resources are used for computing the set of tasks at the very same time). Finally, we compared both software-hardware scenarios to identify the most relevant computer features in each of our many-core architectures.

  • 54. Vilela De Abreu, Rodrigo
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Hoffman, Johan
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Towards the development of adaptive finite element methods for internal flow aeroacoustics. 2013. In: 19th AIAA/CEAS Aeroacoustics Conference, 2013. Conference paper (Refereed)
    Abstract [en]

    We report the latest results obtained in the development of an adaptive finite element method for computational aeroacoustics (CAA). The new methodology is based on the General Galerkin (G2) method, which has been successfully used for the computation of incompressible, turbulent flow. Here, we simulate the flow past an in-duct mixer plate and compare the results with available experimental data. The comparisons include mean velocity profiles and frequency content of the turbulent signal. No direct simulation of sound or sound wave propagation has been performed; instead, simple analogy arguments have been used to extract acoustic results from incompressible simulations by assuming a direct correlation between the computed pressure drop signal and the sound at the far field. We were able to reproduce the sound signal from experiments with our incompressible simulation and our results compared well with both the level and the broadband frequency peak of the measured sound. We suggest that the methodology presented here is mainly suitable for the prediction of sound in low Mach number pipe flows.

  • 55. Wendt, Fabian F
    et al.
    Yu, Yi-Hsiang
    Nielsen, Kim
    Ruehl, Kelley
    Bunnik, Tim
    Touzon, Imanol
    Nam, Bo Woo
    Kim, Jeong Seok
    Kim, Kyong-Hwan
    Janson, Carl Erik
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Hoffman, Johan
    et al.,
    International Energy Agency Ocean Energy Systems Task 10 Wave Energy Converter Modeling Verification and Validation. 2017. In: 12th European Wave and Tidal Energy Conference, 2017. Conference paper (Refereed)