  • 201. Mathur, Kapil K.
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Communication Primitives for Unstructured Finite Element Simulations on Data Parallel Architectures. 1992. In: Computing systems in engineering, Vol. 3, no. 1-4, pp. 63-72. Journal article (Peer-reviewed)
    Abstract [en]

    Efficient data motion is critical for high performance computing on distributed memory architectures. The value of some techniques for efficient data motion is illustrated by identifying generic communication primitives. Further, the efficiency of these primitives is demonstrated on three different applications using the finite element method for unstructured grids and sparse solvers with different communication requirements. For the applications presented, the techniques advocated reduced the communication times by a factor of between 1.5 and 3.

  • 202. Mathur, Kapil K
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Data Parallel Algorithms for the Finite Element Method. 1990. Conference paper (Peer-reviewed)
    Abstract [en]

    A data parallel implementation of the finite element method is described. The focus of the presentation is on data mapping and data motion. The essential ideas of the data parallel implementation are developed for discretizations of regular domains by Lagrange elements of arbitrary order in two and three dimensions. A generalization to irregular domains is also presented. Implementations of the data mappings for both regular and irregular domains have been made on the Connection Machine® system model CM-2. Peak performances well in excess of 2 Gflops have been measured for the evaluation of the elemental stiffness matrices. The performance of the iterative solver is in the range of 600-850 Mflops s^-1.

  • 203. Mathur, Kapil K
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Element Order and Convergence Rate of the Conjugate Gradient Method for Stress Analysis on the Connection Machine. 1989. Conference paper (Peer-reviewed)
    Abstract [en]

    A data parallel formulation of the finite element method is described. The data structures and the algorithms for stiffness matrix generation and the solution of the equilibrium equations are presented briefly. The generation of the elemental stiffness matrices requires no communication, even though each finite element is distributed over several processors. The conjugate gradient method with a diagonal preconditioner has been used for the solution of the resulting sparse linear system. This formulation has been implemented on the Connection Machine® model CM-2. The simulations reported in this article investigate the influence of the mesh discretization and the interpolation order on the convergence behavior of the conjugate gradient method. A linear dependence of the convergence behavior on the mesh discretization parameter is observed. In addition, the convergence rate depends on the interpolation order p as (p^1.6). The peak floating point rate (single-precision) for the evaluation of the stiffness matrix is approximately 2.4 Gflops s^-1. The iterative solver peaks at nearly 850 Mflops s^-1.

  • 204. Mathur, Kapil K
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Multiplication of Matrices of Arbitrary Shape on a Data Parallel Computer. 1994. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 20, no. 7, pp. 919-951. Journal article (Peer-reviewed)
    Abstract [en]

    Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. No assumption is made on the shape or size of the operands. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in-place is described in detail. We show that a level-3 DBLAS yields better performance than a level-2 DBLAS. On the Connection Machine system CM-200, blocking yields a performance improvement by a factor of up to three over level-2 DBLAS. For certain matrix shapes the systolic algorithms offer both improved performance and significantly reduced temporary storage requirements compared to the nonsystolic block algorithms.

    We show that, in order to minimize the communication time, an algorithm that leaves the largest operand matrix stationary should be chosen for matrix-matrix multiplication. Furthermore, it is shown both analytically and experimentally that the optimum shape of the processor array yields square stationary submatrices in each processor, i.e. the ratio between the length of the axes of the processing array must be the same as the ratio between the corresponding axes of the stationary matrix. The optimum processor array shape may yield a factor of five performance enhancement for the multiplication of square matrices. For rectangular matrices a factor of 30 improvement was observed for an optimum processor array shape compared to a poorly chosen processor array shape.

  • 205. Mathur, Kapil K
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    The Finite Element Method on a Data Parallel Architecture. 1989. Conference paper (Peer-reviewed)
    Abstract [en]

    A data parallel implementation of the finite element method on the Connection Machine system CM-2® is presented. This implementation assumes that the elementary unit of data is an unassembled nodal point. In the context of the CM-2, each virtual processor represents an unassembled nodal point and nodal points shared between elements are replicated on different virtual processors. An algorithm for computing each elemental stiffness matrix concurrently, as well as different elemental stiffness matrices concurrently, without inter-processor communication is presented. The performance of the elemental stiffness matrix computation is in the range 1.6–1.9 GFlops s^-1. The sparse system of linear equations that results from the finite element discretization has been solved by a conjugate gradient method with a diagonal preconditioner. The rate of convergence of the conjugate gradient iterations for boundary conditions which correspond to uniaxial deformations depends nonlinearly on the order of interpolation of the elements and linearly on the mesh discretization. Sample code segments are provided to illustrate the programming environment on a data parallel architecture.

  • 206. Mathur, Kapil K.
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    The Finite Element Method on a Data Parallel Computing System. 1989. In: International journal of high speed computing, ISSN 0129-0533, Vol. 1, no. 1, pp. 29-44. Journal article (Peer-reviewed)
    Abstract [en]

    A data parallel implementation of the finite element method on the Connection Machine system CM-2® is presented. This implementation assumes that the elementary unit of data is an unassembled nodal point. In the context of the CM-2, each virtual processor represents an unassembled nodal point and nodal points shared between elements are replicated on different virtual processors. An algorithm for computing each elemental stiffness matrix concurrently, as well as different elemental stiffness matrices concurrently, without inter-processor communication is presented. The performance of the elemental stiffness matrix computation is in the range 1.6-1.9 GFlops s^-1. The sparse system of linear equations that results from the finite element discretization has been solved by a conjugate gradient method with a diagonal preconditioner. The rate of convergence of the conjugate gradient iterations for boundary conditions which correspond to uniaxial deformations depends nonlinearly on the order of interpolation of the elements and linearly on the mesh discretization. Sample code segments are provided to illustrate the programming environment on a data parallel architecture.

  • 207. Metere, A.
    et al.
    Oppelstrup, T.
    Sarman, S.
    Laaksonen, A.
    Dzugutov, Michail
    KTH, Skolan för teknikvetenskap (SCI), Matematik (Inst.). KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Formation of the smectic-B crystal from a simple monatomic liquid. 2013. In: Physical Review E. Statistical, Nonlinear, and Soft Matter Physics, ISSN 1539-3755, E-ISSN 1550-2376, Vol. 88, no. 6, p. 062502. Journal article (Peer-reviewed)
    Abstract [en]

    We report a molecular dynamics simulation demonstrating that the smectic-B crystalline phase (Cry-B), commonly observed in mesogenic systems of anisotropic molecules, can be formed by a system of identical particles interacting via a spherically symmetric potential. The Cry-B phase forms as a result of a first-order transition from an isotropic liquid phase upon isochoric cooling at appropriate number density. Its structure, determined by the design of the pair potential, corresponds to the Cry-B structure formed by elongated particles with the aspect ratio 1.8. The diffraction pattern and the real-space structure inspection demonstrate dominance of the ABC-type of axial layer stacking. This result opens a general possibility of producing smectic phases using isotropic interparticle interaction both in simulations and in colloidal systems.

  • 208. Metere, Alfredo
    et al.
    Sarman, Sten
    Oppelstrup, Tomas
    Dzugutov, Mikhail
    KTH, Skolan för teknikvetenskap (SCI), Matematik (Inst.). KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Formation of a columnar liquid crystal in a simple one-component system of particles. 2015. In: Soft Matter, ISSN 1744-683X, E-ISSN 1744-6848, Vol. 11, no. 23, pp. 4606-4613. Journal article (Peer-reviewed)
    Abstract [en]

    We report a molecular dynamics simulation demonstrating that a columnar liquid crystal, commonly formed by disc-shaped molecules, can be formed by identical particles interacting via a spherically symmetric potential. Upon isochoric cooling from a low-density isotropic liquid state the simulated system underwent a weak first order phase transition which produced a liquid crystal phase composed of parallel particle columns arranged in a hexagonal pattern in the plane perpendicular to the column axis. The particles within columns formed a liquid structure and demonstrated a significant intracolumn diffusion. Further cooling resulted in another first-order transition whereby the column structure became periodically ordered in three dimensions transforming the liquid-crystal phase into a crystal. This result is the first observation of a columnar liquid crystal formation in a simple one-component system of particles. Its conceptual significance is in that it demonstrated that liquid crystals that have so far only been produced in systems of anisometric molecules can also be formed by mesoscopic soft-matter and colloidal systems of spherical particles with appropriately tuned interatomic potential.

  • 209. Mirkovic, Dragan
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Automatic Performance Tuning in the UHFFT Library. 2001. Conference paper (Peer-reviewed)
    Abstract [en]

    In this paper we describe the architecture-specific automatic performance tuning implemented in the UHFFT library. The UHFFT library is an adaptive and portable software library for fast Fourier transforms (FFT).

  • 210. Mirkovic, Dragan
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    CODELAB: A Developers' Tool for Efficient Code Generation and Optimization. 2003. Conference paper (Peer-reviewed)
    Abstract [en]

    In this paper we describe CODELAB, an integrated development environment (IDE) for efficient code development and optimization. The main idea behind the CODELAB IDE is to bring together several useful code development and optimization tools into an integrated and user friendly environment with a goal of simplifying the production and maintenance of high performance software libraries. We give an overview of the tool structure and provide examples that illustrate the use and the efficiency of the IDE.

  • 211. Mucci, P.
    et al.
    Smeds, Nils
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Ekman, P.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Performance monitoring with PAPI: Using the performance application programming interface. 2005. In: Dr. Dobb's journal (1989), ISSN 1044-789X, Vol. 30, no. 6, pp. 55-60. Journal article (Peer-reviewed)
    Abstract [en]

    The importance of using the performance application programming interface (PAPI) for performance monitoring is discussed. To facilitate the development of portable performance tools, PAPI provides interfaces to get information about the execution environment. It also provides methods to obtain a complete listing of what performance monitoring (PM) events are available for monitoring. PAPI's goal is to expose real hardware performance information to users, which will help in eliminating most of the guesswork regarding the root cause of a code's performance problem.

  • 212.
    Mucci, Philip J.
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Ahlin, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Danielsson, Johan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Ekman, Per
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Malinowski, Lars
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    PerfMiner: Cluster-wide collection, storage and presentation of application level hardware performance data. 2005. In: EURO-PAR 2005 PARALLEL PROCESSING, PROCEEDINGS / [ed] Cunha, JC; Medeiros, PD, 2005, Vol. 3648, pp. 124-133. Conference paper (Peer-reviewed)
    Abstract [en]

    We present PerfMiner, a system for the transparent collection, storage and presentation of thread-level hardware performance data across an entire cluster. Every sub-process/thread spawned by the user through the batch system is measured with near zero overhead and no dilation of run-time. Performance metrics are collected at the thread level using a tool built on top of the Performance Application Programming Interface (PAPI). As the hardware counters are virtualized by the OS, the resulting counts are largely unaffected by other kernel or user processes. PerfMiner correlates this performance data with metadata from the batch system and places it in a database. Through a command line and web interface, the user can make queries to the database to report information on everything from overall workload characterization and system utilization to the performance of a single thread in a specific application. This is in contrast to other monitoring systems that report aggregate system-wide metrics sampled over a period of time. In this paper, we describe our implementation of PerfMiner as well as present some results from the test deployment of PerfMiner across three different clusters at the Center for Parallel Computers at The Royal Institute of Technology in Stockholm, Sweden.

  • 213.
    Natarajan Arul, Murugan
    et al.
    KTH, Skolan för bioteknologi (BIO), Teoretisk kemi och biologi.
    Apostolov, Rossen Pavlov
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Rinkevicius, Zilvinas
    KTH, Skolan för bioteknologi (BIO), Teoretisk kemi och biologi.
    Kongsted, Jacob
    Department of Physics, Chemistry and Pharmacy, University of Southern Denmark.
    Lindahl, Erik
    KTH, Skolan för teknikvetenskap (SCI), Teoretisk fysik, Beräkningsbiofysik.
    Ågren, Hans
    KTH, Skolan för bioteknologi (BIO), Teoretisk kemi och biologi.
    Association dynamics and linear and nonlinear optical properties of an N-acetylaladanamide probe in a POPC membrane. 2013. In: Journal of the American Chemical Society, ISSN 0002-7863, E-ISSN 1520-5126, Vol. 135, no. 36, pp. 13590-13597. Journal article (Peer-reviewed)
    Abstract [en]

    Along with the growing evidence that relates membrane abnormalities to various diseases, biological membranes have been acknowledged as targets for therapy. Any such abnormality in the membrane structure alters the membrane potential, which in principle can be captured by measuring properties of specific optical probes. There exist by now many molecular probes with absorption and fluorescence properties that are sensitive to local membrane structure and to the membrane potential. To suggest new high-performance optical probes for membrane-potential imaging it is important to understand in detail the membrane-induced structural changes in the probe, the membrane association dynamics of the probe, and its membrane-specific optical properties. To contribute to this effort, we here study an optical probe, N-acetylaladanamide (NAAA), in the presence of a POPC lipid bilayer using a multiscale integrated approach to assess the probe structure, dynamics, and optical properties in its membrane-bound status and in water solvent. We find that the probe eventually assimilates into the membrane with a specific orientation where the hydrophobic part of the probe is buried inside the lipid bilayer, while the hydrophilic part is exposed to the water solvent. The computed absorption maximum is red-shifted when compared to the gas phase. The computations of the two-photon absorption and second harmonic generation cross sections of the NAAA probe in its membrane-bound state, which are the first of their kind in the literature, suggest that this probe can be used for imaging the membrane potential using nonlinear optical microscopy.

  • 214.
    Nazem, Ali
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Kootstra, Geert
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Kragic, Danica
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Djurfeldt, Mikael
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Interfacing a parallel simulation of a neuronal network to robotic hardware using MUSIC, with application to real-time figure-ground segregation. 2011. In: BMC neuroscience (Online), ISSN 1471-2202, E-ISSN 1471-2202, Vol. 12, no. Suppl 1, pp. 78-78. Journal article (Peer-reviewed)
  • 215.
    Nazem, Ali
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Kootstra, Geert
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Kragic, Danica
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP.
    Djurfeldt, Mikael
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB.
    Parallel implementation of a biologically inspired model of figure-ground segregation: Application to real-time data using MUSIC. 2011. In: Frontiers in Neuroinformatics, ISSN 1662-5196, E-ISSN 1662-5196. Journal article (Peer-reviewed)
    Abstract [en]

    MUSIC, the multi-simulation coordinator, supports communication between neuronal-network simulators, or other (parallel) applications, running in a cluster super-computer. Here, we have developed a class library that interfaces between MUSIC-enabled software and applications running on computers outside of the cluster. Specifically, we have used this component to interface the cameras of a robotic head to a neuronal-network simulation running on a Blue Gene/L supercomputer. Additionally, we have developed a parallel implementation of a model for figure ground segregation based on neuronal activity in the Macaque visual cortex. The interface enables the figure ground segregation application to receive real-world images in real-time from the robot. Moreover, it enables the robot to be controlled by the neuronal network.

  • 216. Nesson, Ted
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    ROMM Routing: A Class of Efficient Minimal Routing Algorithms. 1994. Conference paper (Peer-reviewed)
    Abstract [en]

    ROMM is a class of Randomized, Oblivious, Multi-phase, Minimal routing algorithms. ROMM routing offers a potential for improved performance compared to fully randomized algorithms under both light and heavy loads. ROMM routing also offers close to best case performance for many common permutations. These claims are supported by extensive simulations of binary cube networks for a number of routing patterns. We show that k × n buffers per node suffice to make k-phase ROMM routing free from deadlock and livelock on n-dimensional binary cubes.

  • 217. Nesson, Ted
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    ROMM routing on mesh and torus networks. 1995. Conference paper (Peer-reviewed)
    Abstract [en]

    ROMM is a class of Randomized, Oblivious, Multi-phase, Minimal routing algorithms. ROMM routing offers a potential for improved performance compared to both fully randomized algorithms and deterministic oblivious algorithms, under both light and heavy loads. ROMM routing also offers close to best case performance for many common routing problems. In previous work, these claims were supported by extensive simulations on binary cube networks [30, 31]. Here we present analytical and empirical results for ROMM routing on wormhole routed mesh and torus networks. Our simulations show that ROMM algorithms can perform several representative routing tasks 1.5 to 3 times faster than fully randomized algorithms, for medium-sized networks. Furthermore, ROMM algorithms are always competitive with deterministic, oblivious routing, and in some cases, up to 2 times faster.

  • 218.
    Netzer, Gilbert
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Johnsson, Lennart
    University of Houston.
    Ahlin, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Instrumentation for accurate energy-to-solution measurements of a Texas Instruments TMS320C6678 digital signal processor and its DDR3 memory. 2014. In: E2SC ’14: Proceedings of the 2nd International Workshop on Energy Efficient Supercomputing, 2014, pp. 89-98. Conference paper (Peer-reviewed)
    Abstract [en]

    Architectural choices for High-Performance Computing systems have once again become interesting with energy efficiency for targeted workloads now being a major decision factor. A detailed understanding of the energy consumption of major system components during code execution is critical for evolving architectures towards enhanced energy efficiency. The focus of this paper is on the measurement system hardware and software we designed and implemented for the assessment of the energy-to-solution of HPC workloads for the Texas Instruments TMS320C6678 (6678) Digital Signal Processor. The 6678’s thermal design power falls between x86 server processors and mobile CPUs and so do its floating-point and memory system capabilities. Yet, compared to those types of processors in corresponding CMOS technology, it offers a potentially significant energy advantage. The measurement system is described together with a thorough error analysis. Measurements are processed out-of-band, minimizing the impact on the measured system. Sample observations of the energy efficiency of the 6678 and its memory system are included for illustration.

  • 219.
    Netzer, Gilbert
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Johnsson, Lennart
    University of Houston.
    Ahlin, Daniel
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Stotzer, Eric
    Texas Instruments.
    Varis, Pekka
    Texas Instruments.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Exploiting DMA for Performance and Energy Optimized STREAM on a DSP. 2014. In: IPDPSW ’14: Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014, pp. 805-814. Conference paper (Peer-reviewed)
    Abstract [en]

    Energy efficiency is of major concern in HPC. DSP architectures have the potential to offer highly competitive energy efficiency for applications requiring 64-bit floating-point precision. For STREAM, we achieved 1.47 GB/J energy efficiency and 96% DDR3 memory bandwidth utilization on the Texas Instruments TMS320C6678 DSP by using its DMA engines for prefetching to avoid cache misses, which cause pipeline stalls in the DSP’s cores, and to prevent write-allocate loads, which would significantly reduce performance. The DMA engines were also used to coordinate the DSP’s cores and schedule main memory accesses to improve DDR3 bandwidth utilization. We briefly describe the instrumentation that we designed and implemented for accurate measurement of the core-related, on-chip memory, and DDR3 power consumption and the effectiveness of the DSP’s power saving mechanisms to trade off performance and energy efficiency.

  • 220. Nord, F.
    et al.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Monte Carlo option pricing with graphics processing units. 2011. In: Applications, Tools and Techniques on the Road to Exascale Computing, IOS Press, 2011, Vol. 22, pp. 143-150. Conference paper (Peer-reviewed)
    Abstract [en]

    Monte Carlo methods are common practice in financial engineering for a wide variety of problems, one being option pricing. Large clusters of computers are used to run these calculations. Growing volumes and complexity of work that needs to be performed, as well as strict requirements for fast responses, make for a pressing demand for high performance computing. We present a prototype implementation of an option pricer running on graphics cards. The prototype supports various exotic option types, quasi-Monte Carlo, and custom models for the evolution of stock prices. We conclude that graphics cards can outperform CPUs given certain conditions, and for reasonable problem sizes we find a 12x improvement over sequential code when pricing options in a production system.

  • 221.
    Offermans, Nicolas
    et al.
    KTH, Skolan för teknikvetenskap (SCI), Mekanik. KTH, Skolan för teknikvetenskap (SCI), Centra, Linné Flow Center, FLOW.
    Marin, O.
    Schanen, M.
    Gong, Jing
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Fischer, P.
    Schlatter, Philipp
    KTH, Skolan för teknikvetenskap (SCI), Centra, Linné Flow Center, FLOW. KTH, Skolan för teknikvetenskap (SCI), Mekanik.
    On the strong scaling of the spectral element solver Nek5000 on petascale systems. 2016. In: Proceedings of the 2016 Exascale Applications and Software Conference (EASC2016): April 25-29 2016, Stockholm, Sweden, Association for Computing Machinery (ACM), 2016, article id a5. Conference paper (Peer-reviewed)
    Abstract [en]

    The present work is targeted at performing a strong scaling study of the high-order spectral element fluid dynamics solver Nek5000. Prior studies such as [5] indicated a recommendable metric for strong scalability from a theoretical viewpoint, which we test here extensively on three parallel machines with different performance characteristics and interconnect networks, namely Mira (IBM Blue Gene/Q), Beskow (Cray XC40) and Titan (Cray XK7). The test cases considered for the simulations correspond to a turbulent flow in a straight pipe at four different friction Reynolds numbers Re_τ = 180, 360, 550 and 1000. Considering the linear model for parallel communication we quantify the machine characteristics in order to better assess the scaling behaviors of the code. Subsequently sampling and profiling tools are used to measure the computation and communication times over a large range of compute cores. We also study the effect of the two coarse grid solvers XXT and AMG on the computational time. Super-linear scaling due to a reduction in cache misses is observed on each computer. The strong scaling limit is attained for roughly 5,000-10,000 degrees of freedom per core on Mira, 30,000-50,000 on Beskow, with only a small impact of the problem size for both machines, and ranges between 10,000 and 220,000 depending on the problem size on Titan. This work aims at being a reference for Nek5000 users and also serves as a basis for potential issues to address as the community heads towards exascale supercomputers.

  • 222. Olsson, Pelle
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    A Data Parallel Implementation of an Explicit Method for the Compressible Navier-Stokes Equations for Three-Dimensional Channel Flow. 1990. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 14, no. 1, pp. 1-30. Journal article (Peer-reviewed)
    Abstract [en]

    The fluid flow in a three-dimensional twisted channel is modeled by both the compressible Navier-Stokes equations, and the Euler equations. A three stage Runge-Kutta method is used for integrating the system of equations in time. A second-order accurate, centered difference scheme is used for spatial derivatives of the flux variables. For both the Euler and the Navier-Stokes equations artificial viscosity introduced through fourth-order centered differences is used to stabilize the numeric scheme. By using lower order difference approximations on or close to the boundary than in the interior, the difference stencils can be evaluated at all grid points concurrently. A few different difference molecules for the boundaries, and different factorizations of the fourth-order difference operators were evaluated. With the appropriate factorization of the difference stencils, six variables per lattice point suffice for the evaluation of the difference stencils occurring in the code. The three fourth-order stencils we investigated, including three different factorizations of one of these stencils, account for three out of these six variables. The convergence rate for all stencils and their factorizations is approximately the same for the first 1000-1500 steps, at which point the residual has reached a value of 10^-2 to 10^-3. From this point on the convergence rate for one of the factorizations of the fourth-order stencil is approximately twice that of one of the unfactored stencils.

    A performance of 1.05 Gflops/s was demonstrated on a 65,536-processor Connection Machine system with 512 Mbytes of primary storage. The performance scales in proportion to the number of processors. The performance on 8k processor configurations was 135 Mflops/s, on 16k processors 265 Mflops/s and on 32k processors 525 Mflops/s. The efficiency is independent of the machine size. The evaluation of the boundary conditions accounted for less than 5% of the total time. A performance improvement by a factor of about three is expected with optimized implementations of functional kernels such as convolution and matrix-vector multiplication.
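
    The scaling claim can be checked directly from the figures quoted in this abstract. The short calculation below assumes that "8k", "16k" and "32k" mean 8,192, 16,384 and 32,768 processors and that 1.05 Gflops/s equals 1,050 Mflops/s. It reports per-processor throughput relative to the smallest configuration.

```python
# Reported performance in Mflops/s, keyed by processor count
# (processor counts for "8k", "16k", "32k" are assumed powers of two).
reported = {8192: 135.0, 16384: 265.0, 32768: 525.0, 65536: 1050.0}


def relative_efficiency(perf_by_procs):
    """Per-processor throughput relative to the smallest configuration."""
    base_procs = min(perf_by_procs)
    base_rate = perf_by_procs[base_procs] / base_procs
    return {
        procs: (perf / procs) / base_rate
        for procs, perf in sorted(perf_by_procs.items())
    }


efficiency = relative_efficiency(reported)
for procs, eff in efficiency.items():
    print(f"{procs:6d} processors: relative efficiency {eff:.3f}")
```

    On these numbers the relative efficiency stays within about 3% of the 8k value, which is consistent with the statement that efficiency is independent of machine size.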

  • 223. Olsson, Pelle
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    A study of Dissipation Operators for the Euler Equations and a Three–dimensional Channel Flow. 1989. Conference paper (Peer-reviewed)
    Abstract [en]

    Explicit methods for the solution of fluid flow problems are of considerable interest in supercomputing. These methods parallelize well. The treatment of the boundaries is of particular interest with respect to both the numeric behavior of the solution and the computational efficiency. We have solved the three-dimensional Euler equations for a twisted channel using second-order, centered difference operators, and a three-stage Runge-Kutta method for the integration. Three different fourth-order dissipation operators were studied for numeric stabilization: one positive definite [8], one positive semidefinite [3], and one indefinite. The operators only differ in the treatment of the boundary. For computational efficiency all dissipation operators were designed with a constant bandwidth in matrix representation, with the bandwidth determined by the operator in the interior. The positive definite dissipation operator results in a significant growth in entropy close to the channel walls. The other operators maintain constant entropy. Several different implementations of the semidefinite operator obtained through factoring of the operator were also studied. We show the difference both in convergence rate and robustness for the different dissipation operators, and the factorizations of the operator due to Eriksson. For the simulations in this study one of the factorizations of the semidefinite operator required 70-90% of the number of iterations required by the positive definite operator. The indefinite operator was sensitive to perturbations in the inflow boundary conditions. The simulations were performed on an 8,192-processor Connection Machine system model CM-2. Full processor utilization was achieved, and a performance of 135 Mflops/s in single precision was obtained. A performance of 1.1 Gflops/s for a fully configured system with 65,536 processors was demonstrated.

  • 224. Olsson, Pelle
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Boundary Modifications of the Dissipation Operators for the Three–Dimensional Euler Equations. 1989. In: Journal of Scientific Computing, ISSN 0885-7474, E-ISSN 1573-7691, Vol. 4, no. 2, pp. 159-195. Journal article (Peer-reviewed)
    Abstract [en]

    Explicit methods for the solution of fluid flow problems are of considerable interest in supercomputing. These methods parallelize well. The treatment of the boundaries is of particular interest with respect to both the numeric behavior of the solution and the computational efficiency. We have solved the three-dimensional Euler equations for a twisted channel using second-order, centered difference operators, and a three-stage Runge-Kutta method for the integration. Three different fourth-order dissipation operators were studied for numeric stabilization: one positive definite, one positive semidefinite, and one indefinite. The operators only differ in the treatment of the boundary. For computational efficiency all dissipation operators were designed with a constant bandwidth in matrix representation, with the bandwidth determined by the operator in the interior. The positive definite dissipation operator results in a significant growth in entropy close to the channel walls. The other operators maintain constant entropy. Several different implementations of the semidefinite operator obtained through factoring of the operator were also studied. We show the difference both in convergence rate and robustness for the different dissipation operators, and the factorizations of the operator due to Eriksson. For the simulations in this study one of the factorizations of the semidefinite operator required 70%–90% of the number of iterations required by the positive definite operator. The indefinite operator was sensitive to perturbations in the inflow boundary conditions. The simulations were performed on an 8,192-processor Connection Machine system CM-2. Full processor utilization was achieved, and a performance of 135 Mflops/sec in single precision was obtained. A performance of 1.1 Gflops/sec for a fully configured system with 65,536 processors was demonstrated.

  • 225.
    Olwal, Alex
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Numerisk Analys och Datalogi, NADA. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    DiVerdi, S.
    Rakkolainen, I.
    Höllerer, T.
    Consigalo: Multi-user face-to-face interaction on immaterial displays. 2008. In: INTETAIN 2008 - 2nd International Conference on INtelligent TEchnologies for Interactive EnterTAINment / [ed] Kruger A., Rehg J., Feiner S., ICST, 2008. Conference paper (Peer-reviewed)
    Abstract [en]

    In this paper, we describe and discuss interaction techniques and interfaces enabled by immaterial displays. Dual-sided projection allows casual face-to-face interaction between users, with computer-generated imagery in between them. The immaterial display imposes minimal restrictions on the movements or communication of the users. As an example of these novel possibilities, we provide a detailed description of our Consigalo gaming system, which creates an enhanced gaming experience featuring sporadic and unencumbered interaction. Consigalo utilizes a robust 3D tracking system, which supports multiple simultaneous users on either side of the projection surface. Users manipulate graphics that are floating in mid-air with natural gestures. We have also added a responsive and adaptive sound track to further immerse the users in the interactive experience. We describe the technology used in the system, the innovative aspects compared to previous large-screen gaming systems, the gameplay, and our lessons learned from designing and implementing the interactions, visuals and auditory feedback.

  • 226. Ortiz, Luis
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Local Basic Linear Algebra Subroutines (BLAS) on the Connection Machine System CM–200. 1993. In: International Journal of Supercomputer Applications, Vol. 7, no. 1. Journal article (Peer-reviewed)
  • 227. Otten, Matthew
    et al.
    Gong, Jing
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Mametjanov, Azamat
    Vose, Aaron
    Levesque, John
    Fischer, Paul
    Min, Misun
    An MPI/OpenACC implementation of a high-order electromagnetics solver with GPUDirect communication. 2016. In: The international journal of high performance computing applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 30, no. 3, pp. 320-334. Journal article (Peer-reviewed)
    Abstract [en]

    We present performance results and an analysis of a message passing interface (MPI)/OpenACC implementation of an electromagnetic solver based on a spectral-element discontinuous Galerkin discretization of the time-dependent Maxwell equations. The OpenACC implementation covers all solution routines, including a highly tuned element-by-element operator evaluation and a GPUDirect gather-scatter kernel to effect nearest neighbor flux exchanges. Modifications are designed to make effective use of vectorization, streaming, and data management. Performance results using up to 16,384 graphics processing units of the Cray XK7 supercomputer Titan show more than 2.5x speedup over central processing unit-only performance on the same number of nodes (262,144 MPI ranks) for problem sizes of up to 6.9 billion grid points. We discuss performance-enhancement strategies and the overall potential of GPU-based computing for this class of problems.

  • 228. Pons, Carles
    et al.
    Jimenez-Gonzalez, Daniel
    Gonzalez-Alvarez, Cecilia
    Servat, Harald
    Cabrera-Benitez, Daniel
    Aguilar, Xavier
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Fernandez-Recio, Juan
    Cell-Dock: high-performance protein-protein docking. 2012. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no. 18, pp. 2394-2396. Journal article (Peer-reviewed)
    Abstract [en]

    The application of docking to large-scale experiments or the explicit treatment of protein flexibility are part of the new challenges in structural bioinformatics that will require large computer resources and more efficient algorithms. Highly optimized fast Fourier transform (FFT) approaches are broadly used in docking programs but their optimal code implementation leaves hardware acceleration as the only option to significantly reduce the computational cost of these tools. In this work we present Cell-Dock, an FFT-based docking algorithm adapted to the Cell BE processor. We show that Cell-Dock runs faster than FTDock with maximum speedups of above 200x, while achieving results of similar quality.

  • 229. Quance, M.
    et al.
    Feng, C.
    Rojas, M.
    Putonti, C.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Fofanov, Y.
    Mimicry of Statistical Properties of Host Genomes by RNA Viruses. 2008. Conference paper (Peer-reviewed)
  • 230. Ranade, Abhiram
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    The Communication Efficiency of Meshes, Boolean Cubes, and Cube Connected Cycles for Wafer Scale Integration. 1987. Conference paper (Peer-reviewed)
    Abstract [en]

    In this paper we analyze the emulation of two-dimensional meshes, butterfly networks, and spanning trees on meshes, Boolean cubes, and Cube Connected Cycles (CCC) networks. We consider three timing models for signal propagation along a wire: constant delay, capacitive delay, and resistive delay. We also present novel layouts for hypercubes and CCCs that offer better performance for some problems, while essentially maintaining the performance for other problems. The mesh interconnection performs better on all emulations for all delay models, if the communication throughput determines the performance. With the resistive delay model, meshes also offer the best latency for all emulations. The hypercube and CCC layouts yield lower latency for emulating butterfly networks and spanning trees for the constant delay and capacitive delay models.

  • 231. Ranade, Abhiram R.
    et al.
    Bhatt, Sandeep N.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    The Fluent Abstract Machine. 1987. In: Advanced Research in VLSI, MIT Press, 1987, pp. 71-93. Chapter in book, part of anthology (Peer-reviewed)
    Abstract [en]

    The fluent abstract machine supports a very powerful programming model. In addition to arbitrary access patterns, the instruction repertoire of the fluent machine also includes the multiprefix operation and high-level set operations. The fluent machine consists of over one hundred thousand processors interconnected by a butterfly network. The efficiency of the fluent machine derives from a very simple router, which effectively eliminates the possibility of congestion. The routing hardware is extremely simple, inexpensive, and probably efficient.

  • 232. Riedel, M.
    et al.
    Memon, A. S.
    Memon, M. S.
    Mallmann, D.
    Venturi, V.
    Andreetto, P.
    Marzolla, M.
    Ferraro, A.
    Ghiselli, A.
    Hedman, Fredrik
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Shah, Zeeshan Ali
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Salzemann, J.
    Da Costa, A.
    Bloch, V.
    Breton, V.
    Kasam, V.
    Hofmann-Apitius, M.
    Snelling, D.
    van de Berghe, S.
    Li, V.
    Brewer, S.
    Dunlop, A.
    De Silva, N.
    Improving e-Science with Interoperability of the e-Infrastructures EGEE and DEISA. 2008. In: MIPRO 2008 - 31st International Convention Proceedings: Microelectronics, Electronics and Electronic Technology, MEET and Grid and Visualizations Systems, GVS, 2008, pp. 225-231. Conference paper (Peer-reviewed)
    Abstract [en]

    In the last couple of years, many e-Science infrastructures have begun to offer production services to e-Scientists with an increasing number of applications that require access to different kinds of computational resources. Within Europe, two rather different multi-national e-Science infrastructures evolved over time, namely Distributed European Infrastructure for Supercomputing Applications (DEISA) and Enabling Grids for E-SciencE (EGEE). DEISA provides access to massively parallel systems such as supercomputers that are well suited for scientific applications that require many interactions between their typically high numbers of CPUs. EGEE, on the other hand, provides access to a world-wide Grid of university clusters and PC pools that are well suited for farming applications that require fewer or even no interactions between the distributed CPUs. While DEISA uses the HPC-driven Grid technology UNICORE, EGEE is based on the gLite Grid middleware optimized for farming jobs. Both have limited adoption of open standards and therefore the two systems are technically non-interoperable, which means that no e-Scientist can easily leverage the DEISA and EGEE infrastructures with one suitable client environment for scientific applications. This paper argues that future interoperability of such large e-Science infrastructures is required to improve e-Science in general and to increase the real scientific impact of world-wide Grids in particular. We discuss the interoperability achieved by the OMII-Europe project, which fundamentally improved the interoperability between UNICORE and gLite by using open standards. We also outline one specific scientific scenario of the WISDOM initiative that actually benefits from the recently established interoperability.

  • 233. Ringholm, M.
    et al.
    Jonsson, D.
    Bast, Radovan
    KTH, Skolan för bioteknologi (BIO), Teoretisk kemi och biologi. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Gao, B.
    Thorvaldsen, A. J.
    Ekström, U.
    Helgaker, T.
    Ruud, K.
    Analytic cubic and quartic force fields using density-functional theory. 2014. In: Journal of Chemical Physics, ISSN 0021-9606, E-ISSN 1089-7690, Vol. 140, no. 3, p. 034103. Journal article (Peer-reviewed)
    Abstract [en]

    We present the first analytic implementation of cubic and quartic force constants at the level of Kohn-Sham density-functional theory. The implementation is based on an open-ended formalism for the evaluation of energy derivatives in an atomic-orbital basis. The implementation relies on the availability of open-ended codes for evaluation of one- and two-electron integrals differentiated with respect to nuclear displacements as well as automatic differentiation of the exchange-correlation kernels. We use generalized second-order vibrational perturbation theory to calculate the fundamental frequencies of methane, ethane, benzene, and aniline, comparing B3LYP, BLYP, and Hartree-Fock results. The Hartree-Fock anharmonic corrections agree well with the B3LYP corrections when calculated at the B3LYP geometry and from B3LYP normal coordinates, suggesting that the inclusion of electron correlation is not essential for the reliable calculation of cubic and quartic force constants.

  • 234. Ringholm, Magnus
    et al.
    Bast, Radovan
    KTH, Skolan för bioteknologi (BIO), Teoretisk kemi och biologi. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Oggioni, Luca
    Ekstrom, Ulf
    Ruud, Kenneth
    Analytic calculations of hyper-Raman spectra from density functional theory hyperpolarizability gradients. 2014. In: Journal of Chemical Physics, ISSN 0021-9606, E-ISSN 1089-7690, Vol. 141, no. 13, p. 134107. Journal article (Peer-reviewed)
    Abstract [en]

    We present the first analytic calculations of the geometrical gradients of the first hyperpolarizability tensors at the density-functional theory (DFT) level. We use the analytically calculated hyperpolarizability gradients to explore the importance of electron correlation effects, as described by DFT, on hyper-Raman spectra. In particular, we calculate the hyper-Raman spectra of the all-trans and 11-cis isomers of retinal at the Hartree-Fock (HF) and density-functional levels of theory, which also allows us to explore the sensitivity of the hyper-Raman spectra to the geometrical characteristics of these structurally related molecules. We show that the HF results, using B3LYP-calculated vibrational frequencies and force fields, reproduce the experimental data for all-trans-retinal well, and that electron correlation effects are of minor importance for the hyper-Raman intensities.

  • 235.
    Rinkevicius, Zilvinas
    et al.
    KTH, Skolan för kemi, bioteknologi och hälsa (CBH), Teoretisk kemi och biologi.
    Bast, Radovan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Quantum chemistry on a heterogeneous computer system: Accelerating the Kohn-Sham method for hybrid CPU/GPGPU and CPU/Intel MIC platforms. 2014. In: Abstract of Papers of the American Chemical Society, ISSN 0065-7727, Vol. 248. Journal article (Other academic)
  • 236.
    Sandholm, Thomas
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Data- och systemvetenskap, DSV. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Lai, Kevin
    Hewlett-Packard Laboratories, Information Dynamics Laboratory, Palo Alto.
    A Statistical Approach to Risk Mitigation in Computational Markets. 2007. In: Proceedings of the 16th International Symposium on High Performance Distributed Computing 2007, HPDC'07, 2007, pp. 85-96. Conference paper (Peer-reviewed)
    Abstract [en]

    We study stochastic models to mitigate the risk of poor Quality-of-Service (QoS) in computational markets. Consumers who purchase services expect both price and performance guarantees. They need to predict future demand to budget for sustained performance despite price fluctuations. Conversely, providers need to estimate demand to price future usage. The skewed and bursty nature of demand in large-scale computer networks challenges the common statistical assumptions of symmetry, independence, and stationarity. This discrepancy leads to underestimation of investment risk. We confirm this non-normal distribution behavior in our study of demand in computational markets. The high agility of a dynamic resource market requires flexible, efficient, and adaptable predictions. Computational needs are typically expressed using performance levels, hence we estimate worst-case bounds of price distributions to mitigate the risk of missing execution deadlines. To meet these needs, we use moving time windows of statistics to approximate price percentile functions. We use snapshots of summary statistics to calculate prediction intervals and estimate model uncertainty. Our approach is model-agnostic, distribution-free both in prices and prediction errors, and does not require extensive sampling nor manual parameter tuning. Our simulations and experiments show that a Chebyshev inequality model generates accurate predictions with minimal sample data requirements. We also show that this approach mitigates the risk of dropped service level performance when selecting hosts to run a bag-of-task Grid application simulation in a live computational market cluster.
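
    To illustrate the kind of distribution-free bound this abstract refers to, the sketch below applies the standard Chebyshev inequality, P(|X - mu| >= k*sigma) <= 1/k^2, to a moving window of prices. It is not the authors' implementation: the function name, the window handling and the sample prices are hypothetical, and treating the window's sample mean and standard deviation as the true moments is an additional simplifying assumption.

```python
import math
from statistics import fmean, pstdev


def chebyshev_upper_bound(prices, window, confidence):
    """Price level not exceeded with probability >= confidence (Chebyshev).

    Uses only the mean and standard deviation of the last `window` prices,
    so no distributional shape is assumed. Setting 1/k^2 = 1 - confidence
    gives k = 1 / sqrt(1 - confidence).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    recent = prices[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two prices in the window")
    k = 1.0 / math.sqrt(1.0 - confidence)
    return fmean(recent) + k * pstdev(recent)


# Hypothetical price history; only the last five values fall in the window.
history = [9.0, 1.0, 2.0, 3.0, 4.0, 5.0]
bound = chebyshev_upper_bound(history, window=5, confidence=0.75)
print(f"budget for a price of at most {bound:.3f}")
```

    The bound is conservative by construction: it holds for any distribution with the given mean and variance, which is the price paid for not assuming normality.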

  • 237.
    Sandholm, Thomas
    et al.
    KTH, Skolan för informations- och kommunikationsteknik (ICT), Data- och systemvetenskap, DSV. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Lai, Kevin
    Hewlett-Packard Laboratories, Information Dynamics Laboratory, Palo Alto.
    Prediction-Based Enforcement of Performance Contracts. 2007. In: Grid Economics and Business Models / [ed] Altmann, J; Veit, DJ, 2007, Vol. 4685, pp. 71-82. Conference paper (Peer-reviewed)
    Abstract [en]

    Grid computing platforms require automated and distributed resource allocation with controllable quality-of-service (QoS). Market-based allocation provides these features using the complementary abstractions of proportional shares and reservations. This paper analyzes a hybrid resource allocation system using both proportional shares and reservations. We also examine the use of price prediction to provide statistical QoS guarantees and to set admission control prices.

  • 238.
    Sandholm, Thomas
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. KTH, Skolan för informations- och kommunikationsteknik (ICT), Data- och systemvetenskap, DSV.
    Lai, Kevin
    Hewlett-Packard Laboratories, Information Dynamics Laboratory, Palo Alto.
    Andrade, Jorge
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Odeberg, Jacob
    KTH, Skolan för bioteknologi (BIO), Genteknologi.
    Market-Based Resource Allocation using Price Prediction in a high performance computing Grid for scientific applications. 2006. In: Proceedings of the IEEE International Symposium on High Performance Distributed Computing 2006, 2006, Vol. 15th IEEE International Symposium, pp. 132-143. Conference paper (Peer-reviewed)
    Abstract [en]

    We present the implementation and analysis of a market-based resource allocation system for computational Grids. Although Grids provide a way to share resources and take advantage of statistical multiplexing, a variety of challenges remain. One is the economically efficient allocation of resources to users from disparate organizations who have their own and sometimes conflicting requirements for both the quantity and quality of services. Another is secure and scalable authorization despite rapidly changing allocations.

    Our solution to both of these challenges is to use a market-based resource allocation system. This system allows users to express diverse quantity- and quality-of-service requirements, yet prevents them from denying service to other users. It does this by providing tools to the user to predict and trade off risk and expected return in the computational market. In addition, the system enables secure and scalable authorization by using signed money-transfer tokens instead of identity-based authorization. This removes the overhead of maintaining and updating access control lists, while restricting usage based on the amount of money transferred. We examine the performance of the system by running a bioinformatics application on a fully operational implementation of an integrated Grid market.

  • 239.
    Sarnowska, Karolina
    et al.
    University of Virginia.
    Grimshaw, Andrew
    University of Virginia.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Using Standards-based Interfaces to Share Data across Grid Infrastructures. 2009. In: 38th International Conference on Parallel Processing (ICPP 2009), 2009, pp. 254-260. Conference paper (Peer-reviewed)
  • 240.
    Schliephake, Michael
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Aguilar, Xavier
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Design and Implementation of a Runtime System for Parallel Numerical Simulations on Large-Scale Clusters. 2011. In: Proceedings Of The International Conference On Computational Science (ICCS) / [ed] Sato, M; Matsuoka, S; Sloot, PMA; VanAlbada, GD; Dongarra, J, Elsevier, 2011, Vol. 4, pp. 2105-2114. Conference paper (Peer-reviewed)
    Abstract [en]

    The execution of scientific codes will introduce a number of new challenges and intensify some old ones on new high-performance computing infrastructures. Petascale computers are large systems with complex designs using heterogeneous technologies that make the programming and porting of applications difficult, particularly if one wants to use the maximum peak performance of the system. In this paper we present the design and first prototype of a runtime system for parallel numerical simulations on large-scale systems. The proposed runtime system addresses the challenges of performance, scalability, and programmability of large-scale HPC systems. We also present initial results of our prototype implementation using a molecular dynamics application kernel.

  • 241.
    Schliephake, Michael
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Communication Performance Analysis of CRESTA’s Co-Design Application NEK5000. 2012. In: Workshop Preparing Applications for Exascale Through Co-design in International Conference on High Performance Computing, 2012. Conference paper (Peer-reviewed)
  • 242.
    Schliephake, Michael
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Towards improving the communication performance of CRESTA's co-design application NEK5000. 2012. In: Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012, IEEE, 2012, pp. 669-674. Conference paper (Peer-reviewed)
    Abstract [en]

    In order to achieve exascale performance, all aspects of applications and system software need to be analysed and potentially improved. The EU FP7 project 'Collaborative Research into Exascale Systemware, Tools & Applications' (CRESTA) uses co-design of advanced simulation applications and system software as well as related development tools as a key element in its approach towards exascale. In this paper we present first results of a co-design activity using the highly scalable application NEK5000. We have analysed the communication structure of NEK5000 and propose new, optimised collective communication operations that will improve the performance of NEK5000 and prepare it for use on the several millions of cores available in future HPC systems. The latency-optimised communication operations can also be beneficial in other contexts. For instance, we expect them to become an important building block for a runtime system providing dynamic load balancing, which is also under development within CRESTA.

  • 243. Seitz, L.
    et al.
    Rissanen, E.
    Sandholm, Thomas
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Firozabadi, B. S.
    Mulmo, Olle
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Policy administration control and delegation using XACML and delegent. 2005. Conference paper (Peer-reviewed)
    Abstract [en]

    In this paper we present a system permitting controlled policy administration and delegation using the XACML access control system. The need for these capabilities stems from the use of XACML in the SweGrid Accounting System, which is used to enforce resource allocations to Swedish research projects. Our solution uses a second access control system, Delegent, which has powerful delegation capabilities. We have implemented limited XML access control in Delegent in order to supervise modifications of the XML-encoded XACML policies. This allows us to use the delegation capabilities of Delegent together with the expressive access level permissions of XACML.

  • 244. Shalaby, Nadia
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    A Vector Space Framework for Parallel Stable Permutations. 1997. Conference paper (Peer-reviewed)
    Abstract [en]

    We establish a formal foundation for stable permutations in the domain of a parallel model of computation applicable to a customized set of complexity metrics. By means of vector spaces, we develop an algebro-geometric representation that is expressive, flexible and simple to use, and present a taxonomy categorizing stable permutations into classes of index-digit, linear, translation, affine and polynomial permutations. For each class, we demonstrate its general behavioral properties and then analyze particular examples in each class, where we derive results about its inverse, fixed instances, number of instances local and nonlocal to a processor, as well as its compositional relationships to other permutations. Such examples are bit-reversal, radix-Q exchange, radix-Q shuffle and unshuffle within the index-digit class, radix-Q butterfly and 1's complement within the translation class, binary-to-Gray and Gray-to-binary within the linear class, and arithmetic add 1, arithm...
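
    Two of the permutations named in this abstract are easy to state on integer indices. The sketch below uses textbook definitions, not the paper's vector-space formalism, and the function names are hypothetical. It shows radix-Q digit reversal (bit-reversal when the radix is 2) and the binary-to-Gray mapping with its inverse, then lists the fixed points of 3-bit bit-reversal.

```python
def digit_reversal(index, num_digits, radix=2):
    """Reverse the order of the base-`radix` digits of `index`."""
    reversed_index = 0
    for _ in range(num_digits):
        index, digit = divmod(index, radix)
        reversed_index = reversed_index * radix + digit
    return reversed_index


def binary_to_gray(index):
    """Map a binary index to its reflected Gray code."""
    return index ^ (index >> 1)


def gray_to_binary(code):
    """Inverse of binary_to_gray: XOR of all right shifts of the code."""
    index = 0
    while code:
        index ^= code
        code >>= 1
    return index


# Bit-reversal on 3-bit indices; the permutation is its own inverse.
perm = [digit_reversal(i, 3) for i in range(8)]
fixed = [i for i, p in enumerate(perm) if i == p]
print(perm)   # [0, 4, 2, 6, 1, 5, 3, 7]
print(fixed)  # [0, 2, 5, 7]
```

    The fixed points are exactly the indices whose digit strings are palindromes, which gives a small concrete instance of the "fixed instances" the abstract mentions.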

  • 245. Shalaby, Nadia
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Hierarchical Load–Balancing for Parallel Fast Legendre Transforms. 1997. Conference paper (Peer-reviewed)
    Abstract [en]

    We present a parallel Fast Legendre Transform (FLT) based on the Driscoll-Healy algorithm with computation complexity O(N log^2 N). The parallel FLT is load-balanced in a hierarchical fashion. We use a load-balanced FFT to deduce a load-balanced parallel fast cosine transform, which in turn serves as a building block for the Legendre transform engine, from which the parallel FLT is constructed. We demonstrate how the arithmetic, memory and communication complexities of the parallel FLT are hierarchically derived via the complexity of its modular blocks. 1. Introduction. Legendre transforms are ubiquitous in many disciplines of applied sciences, particularly spectral methods for the solution of partial differential equations [3]. For applications of harmonic analysis on the 2-sphere S^2, an efficient Legendre transform is as crucial to numeric computation as the fast Fourier transform (FFT) is to classical time-series analysis on R. This stems from the fact that harmonic ana...

  • 246.
    Silverstein, David N.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. Stockholm Brain Institute, Karolinska institutet, Sweden.
    A computational investigation of feedforward and feedback processing in metacontrast backward masking. 2015. In: Frontiers in Psychology, ISSN 1664-1078, E-ISSN 1664-1078, Vol. 6, article id 6. Journal article (Peer-reviewed)
    Abstract [en]

    In human perception studies, visual backward masking has been used to understand the temporal dynamics of subliminal vs. conscious perception. When a brief target stimulus is followed by a masking stimulus after a short interval of <100 ms, performance on the target is impaired when the target and mask are in close spatial proximity. While the psychophysical properties of backward masking have been studied extensively, there is still debate on the underlying cortical dynamics. One prevailing theory suggests that the impairment of target performance due to the mask is the result of lateral inhibition between the target and mask in feedforward processing. Another prevailing theory suggests that this impairment is due to the interruption of feedback processing of the target by the mask. This computational study demonstrates that both aspects of these theories may be correct. Using a biophysical model of V1 and V2, visual processing was modeled as interacting neocortical attractors, which must propagate up the visual stream. If an activating target attractor in V1 is quiesced enough with lateral inhibition from a mask, or not reinforced by recurrent feedback, it is more likely to burn out before becoming fully active and progressing through V2 and beyond. Results are presented which simulate metacontrast backward masking with an increasing stimulus interval and with the presence and absence of feedback activity. This showed that recurrent feedback diminishes backward masking effects and can make conscious perception more likely. One model configuration presented a metacontrast noise mask in the same hypercolumns as the target, and produced type-A masking. A second model configuration presented a target line with two parallel adjacent masking lines, and produced type-B masking. Future work should examine how the model extends to more complex spatial mask configurations.

  • 247.
    Silverstein, David N.
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. Stockholm Brain Institute, Karolinska Institutet, Sweden.
    Ingvar, Martin
    Karolinska Institutet, Sweden.
    A multi-pathway hypothesis for human visual fear signaling. 2015. In: Frontiers in Systems Neuroscience, ISSN 1662-5137, E-ISSN 1662-5137, Vol. 9, article ID 101. Journal article (Peer-reviewed)
    Abstract [en]

    A hypothesis is proposed for five visual fear signaling pathways in humans, based on an analysis of anatomical connectivity from primate studies and human functional connectivity and tractography from brain imaging studies. Earlier work has identified possible subcortical and cortical fear pathways known as the “low road” and “high road,” which arrive at the amygdala independently. In addition to a subcortical pathway, we propose four cortical signaling pathways in humans along the visual ventral stream. All four of these traverse through the LGN to the visual cortex (VC) and branch off at the inferior temporal area, with one projection directly to the amygdala; another traversing the orbitofrontal cortex; and two others passing through the parietal and then prefrontal cortex, one excitatory pathway via the ventral-medial area and one regulatory pathway via the ventral-lateral area. These pathways have progressively longer propagation latencies and may have progressively evolved with brain development to take advantage of higher-level processing. Using the anatomical path lengths and latency estimates for each of these five pathways, predictions are made for the relative processing times at selective ROIs and arrival at the amygdala, based on the presentation of a fear-relevant visual stimulus. Partial verification of the temporal dynamics of this hypothesis might be accomplished using experimental MEG analysis. Possible experimental protocols are suggested.

  • 248.
    Stamou, Katerina
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Hedman, Fredrik
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Iliopoulos, Anthony
    Extending UNICORE 5 authentication model by supporting proxy certificate profile extensions. 2008. In: Euro-Par 2007 Workshops: Parallel Processing / [ed] Bouge, L; Forsell, M; Traff, JL; Streit, A; Ziegler, W; Alexander, M; Childs, S, 2008, Vol. 4854, pp. 104-111. Conference paper (Peer-reviewed)
    Abstract [en]

    Authentication interoperability between the UNICORE grid middleware system and other Grid middleware systems is addressed. An approach to extending the UNICORE authentication model to support a proxy certificate (RFC3280) profile is presented. This optional feature can then be enabled based on site policy. Furthermore, the addition capacitates further advances related to authorization. With interoperability becoming a key issue in many production environments, extending the generality of UNICORE in this way opens up the possibility of direct and general interoperability scenarios.

  • 249.
    Svensson, Gert
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Söderberg, Johan
    Hifab.
    A Heat Re-Use System for the Cray XE6 and Future Systems at PDC, KTH. 2012. In: Cray User Group Final Proceedings: Greengineering the Future, Cray User Group, 2012. Conference paper (Peer-reviewed)
    Abstract [en]

    The installation of a 16-cabinet Cray XE6 in 2010 at PDC was expected to increase the total power consumption from around 800 kW to 1300 kW, an increase of 500 kW. The intention was to recover some of the power cost and become more environmentally friendly by re-using the energy from the Cray to heat nearby buildings. The custom-made system, which makes it possible to heat nearby buildings on the campus without using heat pumps, is described in detail. The principle of the system is that hot air from the Cray is sent through industrial heat exchangers placed above the Cray racks. This makes it possible to heat the water to more than 30 °C. The problems encountered and the experience gained are described, as well as projections for the savings. A method of describing a mix of different cooling requirements shows the way for future improvements and the addition of future systems.

  • 250. Szalisznyo, Krisztina
    et al.
    Silverstein, David N.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsbiologi, CB. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Duffau, Hugues
    Smits, Anja
    Pathological Neural Attractor Dynamics in Slowly Growing Gliomas Supports an Optimal Time Frame for White Matter Plasticity. 2013. In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 8, no. 7, p. e69798. Journal article (Peer-reviewed)
    Abstract [en]

    Neurological function in patients with slowly growing brain tumors can be preserved even after extensive tumor resection. However, the global process of cortical reshaping and cerebral redistribution cannot be understood without taking into account the white matter tracts. The aim of this study was to predict the functional consequences of tumor-induced white matter damage by computer simulation. A computational model was proposed, incorporating two cortical patches and the white matter connections of the uncinate fasciculus. Tumor-induced structural changes were modeled such that different aspects of the connectivity were altered, mimicking the biological heterogeneity of gliomas. The network performance was quantified by comparing memory pattern recall and the plastic compensatory capacity of the network was analyzed. The model predicts an optimal level of synaptic conductance boost that compensates for tumor-induced connectivity loss. Tumor density appears to change the optimal plasticity regime, but tumor size does not. Compensatory conductance values that are too high lead to performance loss in the network and eventually to epileptic activity. Tumors of different configurations show differences in memory recall performance with slightly lower plasticity values for dense tumors compared to more diffuse tumors. Simulation results also suggest an optimal noise level that is capable of increasing the recall performance in tumor-induced white matter damage. In conclusion, the model presented here is able to capture the influence of different tumor-related parameters on memory pattern recall decline and provides a new way to study the functional consequences of white matter invasion by slowly growing brain tumors.
