151 - 200 of 260
  • 151.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Ho, Ching-Tien
    On the Conversion between Binary Code and Binary Reflected Gray Code. 1995. In: IEEE Transactions on Computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 44, no. 1, pp. 47-53. Journal article (Peer-reviewed)
    Abstract [en]

    We present a new algorithm for conversion between binary code and binary-reflected Gray code that requires approximately 2K/3 element transfers in sequence for K elements per node, compared to K element transfers for previously known algorithms. For a binary cube of n = 2 dimensions the new algorithm degenerates to yield a complexity of K/2 + 1 element transfers, which is optimal. The new algorithm is optimal to within a multiplicative factor of 4/3 with respect to the best known lower bound for any routing strategy. We show that the minimum number of element transfers for minimum path length routing is K with concurrent communication on all channels of every node of a binary cube.
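
    The algorithm above concerns the data motion between nodes; the per-element code conversion itself is the standard one. A minimal sketch in C of the elementwise conversions (background only, not the paper's communication schedule):

        #include <stdio.h>

        /* Binary to binary-reflected Gray code: g = b XOR (b >> 1). */
        unsigned binary_to_gray(unsigned b) {
            return b ^ (b >> 1);
        }

        /* Gray to binary: prefix-XOR of the Gray bits, high to low. */
        unsigned gray_to_binary(unsigned g) {
            unsigned b = 0;
            for (; g; g >>= 1)
                b ^= g;
            return b;
        }

        int main(void) {
            for (unsigned b = 0; b < 8; b++)
                printf("%u -> Gray %u -> back %u\n",
                       b, binary_to_gray(b), gray_to_binary(binary_to_gray(b)));
            return 0;
        }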

  • 152.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Ho, Ching-Tien
    Optimal Communication Channel Utilization for Matrix Transposition and Related Permutations on Boolean Cubes. 1994. In: Journal of Discrete Applied Mathematics, Vol. 53, no. 1-3, pp. 251-274. Journal article (Peer-reviewed)
    Abstract [en]

    We present optimal schedules for permutations in which each node sends one or several unique messages to every other node. With concurrent communication on all channels of every node in binary cube networks, the number of element transfers in sequence for K elements per node is K/2, irrespective of the number of nodes over which the data set is distributed. For a succession of s permutations within disjoint subcubes of d dimensions each, our schedules yield min(K/2 + (s - 1)d, (s + 3)d, K/2 + 2d) exchanges in sequence. The algorithms can be organized to avoid indirect addressing in the internode data exchanges, a property that increases the performance on some architectures.

  • 153.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Ho, Ching-Tien
    Spanning Graphs for Optimum Broadcasting and Personalized Communication in Hypercubes. 1989. In: IEEE Transactions on Computers (Print), ISSN 0018-9340, E-ISSN 1557-9956, Vol. 38, no. 9, pp. 1249-1268. Journal article (Peer-reviewed)
    Abstract [en]

    Four different communication problems are addressed in Boolean n-cube configured multiprocessors: (1) one-to-all broadcasting: distribution of common data from a single source to all other nodes; (2) one-to-all personalized communication: a single node sending unique data to all other nodes; (3) all-to-all broadcasting: distribution of common data from each node to all other nodes; and (4) all-to-all personalized communication: each node sending a unique piece of information to every other node. Three communication graphs (spanning trees) for the Boolean n-cube are proposed for the routing, and scheduling disciplines provably optimum within a small constant factor are proposed. With appropriate scheduling and concurrent communication on all ports of every processor, routings based on these communication graphs offer a speedup of up to n/2, and O(√n) over the routings based on the spanning binomial tree for cases (2)-(4), respectively. All three spanning trees offer optimal communication times for cases (2)-(4) and concurrent communication on all ports of every processor. Timing models and complexity analysis are verified by experiments on a Boolean-cube-configured multiprocessor.
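
    As background for case (1), the spanning-binomial-tree broadcast that these routings are compared against can be sketched in a few lines of C; the schedule below is the textbook binomial tree, not the paper's improved graphs:

        #include <stdio.h>

        /* One-to-all broadcast from node 0 on a Boolean n-cube via the
         * spanning binomial tree: in step d, every node that already
         * holds the message forwards it across cube dimension d. */
        void binomial_broadcast(int n_dims) {
            int num_nodes = 1 << n_dims;
            for (int d = 0; d < n_dims; d++) {
                for (int node = 0; node < num_nodes; node++) {
                    /* A node holds the message before step d iff its id
                     * fits in the low d bits; it sends to node ^ (1<<d). */
                    if ((node >> d) == 0)
                        printf("step %d: %d -> %d\n", d, node, node ^ (1 << d));
                }
            }
        }

        int main(void) { binomial_broadcast(3); return 0; }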

  • 154.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Jacquemin, Michael
    Krawitz, Robert L.
    Communication Efficient Multi-Processor FFT. 1992. In: Journal of Computational Physics, ISSN 0021-9991, E-ISSN 1090-2716, Vol. 102, no. 2, pp. 381-397. Journal article (Peer-reviewed)
    Abstract [en]

    Computing the fast Fourier transform on a distributed memory architecture by a direct pipelined radix-2, a bisection, or a multisection algorithm all yield the same communications requirement, if communication for all FFT stages can be performed concurrently, the input data is in normal order, and the data allocation is consecutive. With a cyclic data allocation, or bit-reversed input data and a consecutive allocation, multisectioning offers a reduced communications requirement by approximately a factor of two. For a consecutive data allocation and normal input order, a decimation-in-time FFT requires that P/N + d - 2 twiddle factors be stored for P elements distributed evenly over N processors, with the axis subject to transformation distributed over 2^d processors. No communication of twiddle factors is required. The same storage requirements hold for a decimation-in-frequency FFT, bit-reversed input order, and consecutive data allocation. The opposite combination of FFT type and data ordering requires a factor of log2 N more storage for N processors. The peak performance for a Connection Machine system CM-200 implementation is 12.9 Gflops/s in 32-bit precision, and 10.7 Gflops/s in 64-bit precision for unordered transforms local to each processor. The corresponding execution rates for ordered transforms are 11.1 Gflops/s and 8.5 Gflops/s, respectively. For distributed one- and two-dimensional transforms the peak performance for unordered transforms exceeds 5 Gflops/s in 32-bit precision and 3 Gflops/s in 64-bit precision. Three-dimensional transforms execute at a slightly lower rate. Distributed ordered transforms execute at a rate of about 1/2 to 2/3 of the unordered transforms.

  • 155.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Krawitz, Robert L.
    Cooley-Tukey FFT on the Connection Machine. 1992. In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 18, no. 11, pp. 1201-1221. Journal article (Peer-reviewed)
    Abstract [en]

    We describe an implementation of the Cooley-Tukey complex-to-complex FFT on the Connection Machine. The implementation is designed to make effective use of the communications bandwidth of the architecture, its memory bandwidth, and storage with precomputed twiddle factors. The peak data motion rate achieved for the interprocessor communication stages is in excess of 7 Gbytes/s for a Connection Machine system CM-200 with 2048 floating-point processors. The peak rate of FFT computations local to a processor is 12.9 Gflops/s in 32-bit precision, and 10.7 Gflops/s in 64-bit precision. The same FFT routine is used to perform both one- and multi-dimensional FFT without any explicit data rearrangement. The peak performance for a one-dimensional FFT on data distributed over all processors is 5.4 Gflops/s in 32-bit precision and 3.2 Gflops/s in 64-bit precision. The peak performance for square, two-dimensional transforms is 3.1 Gflops/s in 32-bit precision, and for cubic, three-dimensional transforms, the peak is 2.0 Gflops/s in 64-bit precision. Certain oblong shapes yield better performance. The number of twiddle factors stored in each processor is P/2N + log2 N for an FFT on P complex points uniformly distributed among N processors. To achieve this level of storage efficiency we show that a decimation-in-time FFT is required for normal order input, and a decimation-in-frequency FFT is required for bit-reversed input order.
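
    The twiddle-factor formula above is easy to check with a little arithmetic; a small C sketch with hypothetical sizes (P = 2^20 points, N = 2^10 processors):

        #include <math.h>
        #include <stdio.h>

        /* Twiddle factors stored per processor for a P-point FFT on N
         * processors, per the formula quoted above: P/(2N) + log2(N).
         * Illustrative arithmetic only; the sizes are hypothetical. */
        int main(void) {
            long P = 1L << 20;     /* 2^20 complex points */
            long N = 1L << 10;     /* 2^10 processors     */
            long per_proc = P / (2 * N) + (long)log2((double)N);
            printf("%ld twiddle factors per processor (vs P/2 = %ld total)\n",
                   per_proc, P / 2);   /* 522 vs 524288 */
            return 0;
        }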

  • 156.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Manson, M
    On-Line Determination of Power and Measurement System Configuration, Topological Properties and Observability. 1976. Conference paper (Peer-reviewed)
  • 157.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Mathur, Kapil K
    Data Structures and Algorithms for the Finite Element Method on a Data Parallel Supercomputer. 1990. In: International Journal for Numerical Methods in Engineering, ISSN 0029-5981, E-ISSN 1097-0207, Vol. 29, no. 4, pp. 881-908. Journal article (Peer-reviewed)
    Abstract [en]

    This article describes a formulation of the finite element method and its implementation on a data parallel computing system. The Connection Machine® system, CM-2, has been used as the model architecture. Data structures, storage requirements, communication and parallel arithmetic complexity are analysed in detail for the cases when a processor represents an unassembled finite element and when a processor is assigned to an unassembled nodal point. Data parallel algorithms for the grid generation, the evaluation of the elemental stiffness matrices and for the iterative solution of the linear system are presented. The algorithm for evaluating the elemental stiffness matrices computes the matrix elements concurrently without communication. This concurrency is in addition to the inherent parallelism present among different finite elements. A conjugate gradient solver with diagonal pre-conditioner is used for the solution of the resulting linear system. Results from an implementation of the three-dimensional finite element method based on Lagrange elements are reported. For single-precision floating-point operations, the measured peak performance is approximately 2.4 Gflops s⁻¹ for evaluating the elemental stiffness matrices and approximately 850 Mflops s⁻¹ for the conjugate gradient solver. On a Connection Machine system with 16K physical processors, the time per conjugate gradient iteration for an application with 400,000 degrees of freedom is approximately 0.13 s for double-precision floating-point operations.
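
    The solver named above, conjugate gradient with a diagonal preconditioner, can be sketched serially in C; this is the generic method on a small dense system, not the paper's data parallel implementation:

        #include <stdio.h>
        #define N 4

        /* Conjugate gradient with diagonal (Jacobi) preconditioning
         * for A x = b; dense A for brevity. */
        void pcg(const double A[N][N], const double b[N], double x[N]) {
            double r[N], z[N], p[N], q[N], rz = 0.0;
            for (int i = 0; i < N; i++) {          /* x0 = 0, r = b    */
                x[i] = 0.0; r[i] = b[i];
                z[i] = r[i] / A[i][i];             /* diagonal scaling */
                p[i] = z[i]; rz += r[i] * z[i];
            }
            for (int it = 0; it < 100 && rz > 1e-20; it++) {
                double pq = 0.0;
                for (int i = 0; i < N; i++) {      /* q = A p          */
                    q[i] = 0.0;
                    for (int j = 0; j < N; j++) q[i] += A[i][j] * p[j];
                    pq += p[i] * q[i];
                }
                double alpha = rz / pq, rz_new = 0.0;
                for (int i = 0; i < N; i++) {
                    x[i] += alpha * p[i]; r[i] -= alpha * q[i];
                    z[i] = r[i] / A[i][i]; rz_new += r[i] * z[i];
                }
                double beta = rz_new / rz; rz = rz_new;
                for (int i = 0; i < N; i++) p[i] = z[i] + beta * p[i];
            }
        }

        int main(void) {
            double A[N][N] = {{4,1,0,0},{1,4,1,0},{0,1,4,1},{0,0,1,4}};
            double b[N] = {1,2,3,4}, x[N];
            pcg(A, b, x);
            for (int i = 0; i < N; i++) printf("x[%d] = %g\n", i, x[i]);
            return 0;
        }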

  • 158.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Mathur, Kapil K
    Experience with the Conjugate Gradient Method for Stress Analysis on a Data Parallel Supercomputer. 1989. In: International Journal for Numerical Methods in Engineering, ISSN 0029-5981, E-ISSN 1097-0207, Vol. 27, no. 3, pp. 523-546. Journal article (Peer-reviewed)
    Abstract [en]

    The storage requirements and performance consequences of a few different data parallel implementations of the finite element method for domains discretized by three-dimensional brick elements are reviewed. Letting a processor represent a nodal point per unassembled finite element yields a concurrency that may be one to two orders of magnitude higher for common elements than if a processor represents an unassembled finite element. The former representation also allows for higher order elements with a limited amount of storage per processor. A totally parallel stiffness matrix generation algorithm is presented. The equilibrium equations are solved by a conjugate gradient method with diagonal scaling. The results from several simulations designed to show the dependence of the number of iterations to convergence upon the Poisson ratio, the finite element discretization and the element order are reported. The domain was discretized by three-dimensional Lagrange elements in all cases. The number of iterations to convergence increases with the Poisson ratio. Increasing the number of elements in one spatial dimension increases the number of iterations to convergence linearly. Increasing the element order p in one spatial dimension increases the number of iterations to convergence as p^α, where α is 1.4-1.5 for the model problems.

  • 159.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Mathur, Kapil K
    High Performance, Scalable Scientific Software Libraries. 1994. In: Portability and Performance in Parallel Processing, John Wiley & Sons, 1994, pp. 159-208. Book chapter, part of anthology (Peer-reviewed)
    Abstract [en]

    Massively parallel processors introduce new demands on software systems with respect to performance, scalability, robustness and portability. The increased complexity of the memory systems and the increased range of problem sizes for which a given piece of software is used pose serious challenges to software developers. The Connection Machine Scientific Software Library, CMSSL, uses several novel techniques to meet these challenges. The CMSSL contains routines for managing the data distribution and provides data distribution independent functionality. High performance is achieved through careful scheduling of operations and data motion, and through the automatic selection of algorithms at run-time. We discuss some of the techniques used, and provide evidence that CMSSL has reached the goals of performance and scalability for an important set of applications.

  • 160.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Mathur, Kapil K
    Massively Parallel Computing: Mathematics and Communications Libraries. 1993. In: Parallel Supercomputing in Atmospheric Science, World Scientific, 1993, pp. 250-285. Book chapter, part of anthology (Peer-reviewed)
    Abstract [en]

    Massively parallel computing holds the promise of extreme performance. The utility of these systems will depend heavily upon the availability of libraries until compilation and run-time system technology is developed to a level comparable to what today is common on most uniprocessor systems. Critical for performance is the ability to exploit locality of reference and effective management of the communication resources. We discuss some techniques for preserving locality of reference in distributed memory architectures. In particular, we discuss the benefits of multidimensional address spaces instead of the conventional linearized address spaces, partitioning of irregular grids, and placement of partitions among nodes. Some of these techniques are supported as language directives, others as run-time system functions, and others still are part of the Connection Machine Scientific Software Library, CMSSL. We briefly discuss some of the unique design issues in this library for distribute...

  • 161.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Mathur, Kapil K
    Scientific Software Libraries for Scalable Architectures. 1994. In: Parallel Scientific Computing, Springer-Verlag New York, 1994. Book chapter, part of anthology (Peer-reviewed)
    Abstract [en]

    Massively parallel processors introduce new demands on software systems with respect to performance, scalability, robustness and portability. The increased complexity of the memory systems and the increased range of problem sizes for which a given piece of software is used pose serious challenges to software developers. The Connection Machine Scientific Software Library, CMSSL, uses several novel techniques to meet these challenges. The CMSSL contains routines for managing the data distribution and provides data distribution independent functionality. High performance is achieved through careful scheduling of arithmetic operations and data motion, and through the automatic selection of algorithms at run-time. We discuss some of the techniques used, and provide evidence that CMSSL has reached the goals of performance and scalability for an important set of applications. 1 Introduction. The main reason for large scale parallelism is performance. In order for massively parallel archit...

  • 162.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Pitsianis, Nikos
    Load-Balance in Parallel FACR. 1999. In: High Performance Algorithms for Structured Matrix Problems, Nova Biomedical, 1999, pp. 163-180. Book chapter, part of anthology (Peer-reviewed)
  • 163.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Saad, Y.
    Schultz, M.H.
    Alternating Direction Methods on Multiprocessors. 1987. In: SIAM Journal on Scientific and Statistical Computing, Vol. 8, no. 5, pp. 686-700. Journal article (Peer-reviewed)
  • 164.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Wei, Ge
    Yuen, Dave
    GPU Solutions to Multi-Scale Problems in Science and Engineering. 2011. Book (Peer-reviewed)
    Abstract [en]

    This book covers the new topic of GPU computing with many applications involved, taken from diverse fields such as networking, seismology, fluid mechanics, nano-materials, data mining, earthquakes, mantle convection, and visualization. It shows why GPU computing is important and easy to use, and offers guidance on how to implement codes in everyday situations.

  • 165.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Weiser, Uri
    Cohen, Danny
    Davis, Alan L.
    Towards a Formal Treatment of VLSI Arrays. 1981. Conference paper (Peer-reviewed)
    Abstract [en]

    This paper presents a formalism for describing the behavior of computational networks at the algorithmic level. It establishes a direct correspondence between the mathematical expressions defining a function and the computational networks which compute that function. By formally manipulating the symbolic expressions that define a function, it is possible to obtain different networks that compute the function. From this mathematical description of a network, one can directly determine certain important characteristics of computational networks, such as computational rate, performance and communication requirements. The use of this formalism for design and verification is demonstrated on computational networks for Finite Impulse Response (FIR) filters, matrix operations, and the Discrete Fourier Transform (DFT). The progression of computations can often be modeled by wave fronts in an illuminating way. The formalism supports this model. A computational network can be viewed in an abstract form that can be represented as a graph. The duality between the graph representation and the mathematical expressions is briefly introduced.

  • 166.
    Johnsson, Lennart
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Zdenek, Johan
    Mathur, Kapil K
    Thomas, J.R
    Parallel Implementation of Recursive Spectral Bisection on the Connection Machine CM-5 System. 1995. In: Parallel Computational Fluid Dynamics: New Trends and Advances, Elsevier, 1995, pp. 451-459. Book chapter, part of anthology (Peer-reviewed)
    Abstract [en]

    In this paper, we present only an abbreviated description of the parallel implementation of the RSB algorithm, followed by two decomposition examples. Details of the implementation can be found in [4].

  • 167. Karagiannis, F.
    et al.
    Keramida, D.
    Ioannidis, Y.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Vitlacil, Dejan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Short, Faith
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Technological and organisational aspects of global research data infrastructures towards year 2020. 2013. In: Data Science Journal, ISSN 1683-1470, E-ISSN 1683-1470, Vol. 12, pp. GRDI1-GRDI5. Journal article (Peer-reviewed)
    Abstract [en]

    A general-purpose Global Research Data Infrastructure (GRDI) for all sciences and research purposes is not conceivable for the next decade as there are too many discipline-specific modalities that currently prevail for such generalisation efforts to be effective. On the other hand, a more pragmatic approach is to start from what currently exists, identify best practices and key issues, and promote effective inter-domain collaboration among different components forming an ecosystem. This will promote interoperability, data exchange, data preservation, and distributed access (among others). This ecosystem of interoperable research data infrastructures will be composed of regional, disciplinary, and multidisciplinary components, such as libraries, archives, and data centres, offering data services for both primary datasets and publications. The ecosystem will support data-intensive science and research and stimulate the interaction among all its elements, thus promoting multidisciplinary and interdisciplinary science. This special issue includes a set of independent papers from renowned experts on organisational and technological issues related to GRDIs. These documents feed into and complement the GRDI2020 roadmap, which supports a Global Research Data Infrastructure ecosystem.

  • 168. Karlsson, A.
    et al.
    Olofsson, N.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Clements, M.
    A parallel microsimulation package for modelling cancer screening policies. 2017. In: Proceedings of the 2016 IEEE 12th International Conference on e-Science, e-Science 2016, IEEE, 2017, pp. 323-330. Conference paper (Peer-reviewed)
    Abstract [en]

    Microsimulation with stochastic life histories is an important tool in the development of public policies. In this article, we use microsimulation to evaluate policies for prostate cancer testing. We implemented the microsimulations as an R package, with pre- and post-processing in R and with the simulations written in C++. Calibrating a microsimulation model with a large population can be computationally expensive. To address this issue, we investigated four forms of parallelism: (i) shared memory parallelism using R; (ii) shared memory parallelism using OpenMP at the C++ level; (iii) distributed memory parallelism using R; and (iv) a hybrid shared/distributed memory parallelism using OpenMP at the C++ level and MPI at the R level. The close coupling between R and C++ offered advantages for ease of software dissemination and the use of high-level R parallelisation methods. However, this combination brought challenges when trying to use shared memory parallelism at the C++ level: the performance gained by hybrid OpenMP/MPI came at the cost of significant re-factoring of the existing code. As a case study, we implemented a prostate cancer model in the microsimulation package. We used this model to investigate whether prostate cancer testing with specific re-testing protocols would reduce harms and maintain any mortality benefit from prostate-specific antigen testing. We showed that four-yearly testing would have a comparable effectiveness and a marked decrease in costs compared with two-yearly testing and current testing. In summary, we developed a microsimulation package in R and assessed the cost-effectiveness of prostate cancer testing. We were able to scale up the microsimulations using a combination of R and C++; however, care was required when using shared memory parallelism at the C++ level.
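
    The hybrid form (iv) above, OpenMP threads inside MPI ranks, follows a common pattern; a minimal C sketch where simulate() is a hypothetical stand-in for a life-history kernel, not the package's code:

        #include <mpi.h>
        #include <omp.h>
        #include <stdio.h>

        /* Hypothetical stand-in for simulating one stochastic life history. */
        static double simulate(long id) { return (double)(id % 7); }

        int main(int argc, char **argv) {
            int provided, rank, size;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            long total = 1000000;
            double local = 0.0, global = 0.0;
            /* Distributed memory: each rank takes a strided share of the
             * histories; shared memory: OpenMP threads split that share. */
            #pragma omp parallel for reduction(+:local)
            for (long i = rank; i < total; i += size)
                local += simulate(i);

            MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0) printf("aggregate outcome: %g\n", global);
            MPI_Finalize();
            return 0;
        }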

  • 169. Kennedy, Ken
    et al.
    Mazina, Mark
    Mellor-Crummey, John
    Cooper, Keith
    Torczon, Linda
    Berman, Fran
    Chien, Andrew
    Dail, Holly
    Sievert, Otto
    Angulo, Dave
    Foster, Ian
    Gannon, Dennis
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Kesselman, Carl
    Aydt, Ruth
    Reed, Daniel
    Dongarra, Jack
    Vadhiyar, Sathish
    Wolski, Rich
    Toward a Framework for Preparing and Executing Adaptive Grid Programs. 2002. Conference paper (Peer-reviewed)
    Abstract [en]

    This paper describes the program execution framework being developed by the Grid Application Development Software (GrADS) Project. The goal of this framework is to provide good resource allocation for Grid applications and to support adaptive reallocation if performance degrades because of changes in the availability of Grid resources. At the heart of this strategy is the notion of a configurable object program, which contains, in addition to application code, strategies for mapping the application to different collections of resources and a resource selection model that provides an estimate of the performance of the application on a specific collection of Grid resources. This model must be accurate enough to distinguish collections of resources that will deliver good performance from those that will not. The GrADS execution framework also provides a contract monitoring mechanism for interrupting and remapping an application execution when performance falls below acceptable levels.
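
    The notion of a configurable object program, application code bundled with a mapping strategy and a performance model, can be sketched as a C structure; all names below are hypothetical illustrations, not GrADS interfaces:

        #include <stdio.h>

        /* Sketch: application code carries a resource-mapping strategy and
         * a performance model the framework can query before launch. */
        typedef struct {
            int node_count;
            double flops_per_node;
        } resource_set;

        typedef struct {
            void   (*run)(const resource_set *rs);
            void   (*map)(const resource_set *rs);              /* mapping strategy */
            double (*estimate_runtime)(const resource_set *rs); /* perf model       */
        } configurable_program;

        static double est(const resource_set *rs) {
            return 1e12 / (rs->node_count * rs->flops_per_node);
        }
        static void map(const resource_set *rs) { (void)rs; }
        static void run(const resource_set *rs) { (void)rs; }

        int main(void) {
            configurable_program prog = { run, map, est };
            resource_set a = { 16, 1e9 }, b = { 64, 5e8 };
            /* Select the resource collection the model predicts is faster. */
            const resource_set *best = prog.estimate_runtime(&a) <
                                       prog.estimate_runtime(&b) ? &a : &b;
            prog.map(best); prog.run(best);
            printf("chose %d nodes\n", best->node_count);
            return 0;
        }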

  • 170.
    Kootstra, Geert
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Autonoma System, CAS.
    Wilming, N.
    Schmidt, N. M.
    Djurfeldt, Mikael
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Kragic, Danica
    KTH, Skolan för datavetenskap och kommunikation (CSC), Datorseende och robotik, CVAP. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Centrum för Autonoma System, CAS.
    König, P.
    Learning and adaptation of sensorimotor contingencies: Prism-adaptation, a case study. 2012. In: From Animals to Animats 12, Springer Berlin/Heidelberg, 2012, Vol. 7426 LNAI, pp. 341-350. Conference paper (Peer-reviewed)
    Abstract [en]

    This paper focuses on learning and adaptation of sensorimotor contingencies. As a specific case, we investigate the application of prism glasses, which change visual-motor contingencies. After an initial disruption of sensorimotor coordination, humans quickly adapt. However, the scope and generalization of that adaptation is highly dependent on the type of feedback and exhibits markedly different degrees of generalization. We apply a model with a specific interaction of forward and inverse models to a robotic setup and subject it to the identical experiments that have been used in previous human psychophysical studies. Our model demonstrates both locally specific adaptation and global generalization in accordance with the psychophysical experiments. These results emphasize the role of the motor system for sensory processes and open an avenue to improve on sensorimotor processing.

  • 171. Korovinskiy, D. B.
    et al.
    Divin, A.
    Erkaev, N. V.
    Ivanova, V. V.
    Ivanov, I. B.
    Semenov, V. S.
    Lapenta, G.
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Biernat, H. K.
    Zellinger, M.
    MHD modeling of the double-gradient (kink) magnetic instability. 2013. In: Journal of Geophysical Research, ISSN 0148-0227, E-ISSN 2156-2202, Vol. 118, no. 3, pp. 1146-1158. Journal article (Peer-reviewed)
    Abstract [en]

    The paper presents a detailed numerical investigation of the "double-gradient mode," which is believed to be responsible for the magnetotail flapping oscillations: the fast vertical (normal to the layer) oscillations of the Earth's magnetotail plasma sheet with a quasiperiod of ~100-200 s. The instability is studied using the magnetotail near-equilibrium configuration. For the first time, linear three-dimensional numerical analysis is complemented with full 3-D MHD simulations. It is known that the "double-gradient mode" has unstable solutions in the region of the tailward growth of the magnetic field component normal to the current sheet. The unstable kink branch of the mode is the focus of our study. Linear MHD code results agree with the theory, and the growth rate is found to be close to the peak value provided by the analytical estimates. Full 3-D simulations are initialized with the numerically relaxed magnetotail equilibrium, similar to the linear code initial condition. The calculations show that a current layer with a tailward gradient of the normal component of the magnetic field is unstable to wavelengths longer than the curvature radius of the field lines. The segment of the current sheet with an earthward gradient of the normal component has some stabilizing effect (the same effect is registered in the linearized MHD simulations) due to the minimum of the total pressure localized in the center of the sheet. The overall growth rate is close to the theoretical double-gradient estimate averaged over the computational domain.

  • 172. Korovinskiy, D. B.
    et al.
    Divin, A. V.
    Erkaev, N. V.
    Semenov, V. S.
    Artemyev, A. V.
    Ivanova, V. V.
    Ivanov, I. B.
    Lapenta, G.
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Biernat, H. K.
    The double-gradient magnetic instability: Stabilizing effect of the guide field. 2015. In: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 22, no. 1, article id 012904. Journal article (Peer-reviewed)
    Abstract [en]

    The role of the dawn-dusk magnetic field component in stabilizing the magnetotail flapping oscillations is investigated in the double-gradient model framework (Erkaev et al., Phys. Rev. Lett. 99, 235003 (2007)), extended for magnetotail-like configurations with a non-zero guide field B_y. The contribution of the guide field is examined both analytically and by means of linearized 2-dimensional (2D) and non-linear 3-dimensional (3D) MHD modeling. All three approaches demonstrate the same properties of the instability: stabilization of current sheet oscillations for short wavelength modes, appearance of a typical (fastest growing) wavelength λ_peak of the order of the current sheet width, decrease of the peak growth rate with increasing B_y value, and total decay of the mode for B_y ~ 0.5 in lobe magnetic field units. The analytical solution and 2D numerical simulations also show a shift of λ_peak toward longer wavelengths with increasing guide field. This result is barely visible in 3D simulations. It may be accounted for by the specific background magnetic configuration, the pattern of tail-like equilibrium provided by an approximate solution of the conventional Grad-Shafranov equation. The configuration demonstrates a drastically changing radius of curvature of the magnetic field lines, R_c. This, in turn, favors the "double-gradient" mode (λ > R_c) in one part of the sheet and the classical "ballooning" instability (λ < R_c) in another part, which may result in the generation of a "combined" unstable mode. (C) 2015 AIP Publishing LLC.

  • 173.
    Korovinskiy, D.
    et al.
    Austrian Acad Sci, Space Res Inst, Schmiedlstr 6, A-8042 Graz, Austria; St Petersburg State Univ, St Petersburg 198504, Russia.
    Divin, A.
    Swedish Inst Space Phys, SE-75121 Uppsala, Sweden.
    Ivanova, V.
    St Petersburg State Univ, St Petersburg 198504, Russia.
    Erkaev, N.
    SB RAS, Inst Computat Modelling, Krasnoyarsk 660036, Russia; Siberian Fed Univ, Krasnoyarsk 660041, Russia.
    Semenov, V.
    St Petersburg State Univ, St Petersburg 198504, Russia.
    Ivanov, I.
    St Petersburg State Univ, St Petersburg 198504, Russia.
    Biernat, H.
    Austrian Acad Sci, Space Res Inst, Schmiedlstr 6, A-8042 Graz, Austria.
    Lapenta, G.
    Katholieke Univ Leuven, Ctr Plasma Astrofys, Dept Wiskunde, B-3001 Leuven, Belgium.
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    MHD Modeling of the Kink "Double-gradient" Branch of the Ballooning Instability in the Magnetotail. 2014. In: Numerical Modeling of Space Plasma Flows: ASTRONUM-2013 / [ed] Pogorelov, N.V.; Audit, E.; Zank, G.P., Astronomical Society of the Pacific, 2014, Vol. 488, pp. 149-154. Conference paper (Peer-reviewed)
    Abstract [en]

    We present a numerical investigation of the double-gradient mode, which is believed to be responsible for the magnetotail flapping oscillations: the fast vertical oscillations of the Earth's magnetotail plasma sheet (quasiperiod of ~100-200 s). It is known that this mode has an unstable solution in the region of the tailward-growing normal magnetic field component. The kink branch of the mode is the focus of our study. The instability is studied using the magnetotail near-equilibrium configuration, fixed by the approximate solution of the Grad-Shafranov equation. The linear three-dimensional numerical analysis is complemented with full 3-D MHD simulations. The results of our linearized MHD code agree with the theory, and the growth rate is found to be close to the peak value provided by an analytical estimate. Also, the eigenfunctions, calculated analytically, are very similar to the perturbations obtained numerically. The full 3D MHD simulations are initialized with the numerically relaxed magnetotail equilibrium, similar to the linear code initial condition. The calculations show that the double-gradient mode is excited in a region of small radii of curvature of the magnetic field lines, which is in accordance with the analytical predictions. In contrast to the linearized MHD simulations, non-local interactions are involved; hence, the overall growth rate turns out to be close to the theoretical estimate averaged over the computational domain.

  • 174. Kramer, D
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Hu, Y
    Local Basic Linear Algebra Subroutines (LBLAS) for the CM-5/5E. 1996. In: International Journal of Supercomputer Applications, Vol. 10, no. 4, pp. 300-335. Journal article (Peer-reviewed)
    Abstract [en]

    The Connection Machine Scientific Software Library (CMSSL) is a library of scientific routines designed for distributed memory architectures. The basic linear algebra subroutines (BLAS) of the CMSSL have been implemented as a two-level structure to exploit optimizations local to nodes and across nodes. This paper presents the implementation considerations and performance of the local BLAS, or BLAS local to each node of the system. A wide variety of loop structures and unrollings have been implemented in order to achieve a uniform and high performance, irrespective of the data layout in node memory. The CMSSL is the only existing high performance library capable of supporting both the data parallel and message-passing modes of programming a distributed memory computer. The implications of implementing BLAS on distributed memory computers are considered in this light.
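
    The loop unrollings mentioned above follow a familiar pattern; a minimal C illustration on a dot product (a typical local BLAS tactic, not CMSSL source):

        #include <stdio.h>

        /* The same dot product with the inner loop unrolled by four,
         * using independent partial sums to keep pipelines full. */
        double ddot_unrolled(int n, const double *x, const double *y) {
            double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
            int i = 0;
            for (; i + 3 < n; i += 4) {      /* four independent partials */
                s0 += x[i]     * y[i];
                s1 += x[i + 1] * y[i + 1];
                s2 += x[i + 2] * y[i + 2];
                s3 += x[i + 3] * y[i + 3];
            }
            for (; i < n; i++) s0 += x[i] * y[i];   /* remainder */
            return (s0 + s1) + (s2 + s3);
        }

        int main(void) {
            double x[5] = {1,2,3,4,5}, y[5] = {1,1,1,1,1};
            printf("%g\n", ddot_unrolled(5, x, y));  /* 15 */
            return 0;
        }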

  • 175. Krawitz, Robert
    et al.
    Frye, Roger
    Mc Donald, Doug
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    A Radix-2 FFT on the Connection Machine. 1989. Conference paper (Peer-reviewed)
    Abstract [en]

    We describe a radix-2 FFT implementation on the Connection Machine. The FFT implementation pipelines successive FFT stages to make full use of the communication capability of the network interconnecting processors when there are multiple elements assigned to each processor. Of particular interest in distributed memory architectures such as the Connection Machine is the allocation of twiddle factors to processors. We show that with a consecutive data allocation scheme and normal order input, a decimation-in-time FFT results in a factor of log2N less storage for twiddle factors than a decimation-in-frequency FFT for N processors. Similarly, with consecutive storage and bit-reversed input, a decimation-in-frequency FFT requires a factor of log2N less storage than a decimation-in-time FFT. The performance of the local FFT has a peak of about 3 Gflops/s. The "global" FFT has a peak performance of about 1.7 Gflops/s.

  • 176. Kutzner, C.
    et al.
    Apostolov, Rossen
    KTH, Skolan för teknikvetenskap (SCI), Teoretisk fysik, Beräkningsbiofysik. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Hess, Berk
    KTH, Skolan för teknikvetenskap (SCI), Teoretisk fysik, Beräkningsbiofysik.
    Grubmüller, H.
    Scaling of the GROMACS 4.6 molecular dynamics code on SuperMUC. 2014. In: Advances in Parallel Computing, ISSN 0927-5452, E-ISSN 1879-808X, Vol. 25, pp. 722-727. Journal article (Peer-reviewed)
    Abstract [en]

    Here we report on the performance of GROMACS 4.6 on the SuperMUC cluster at the Leibniz Rechenzentrum in Garching. We carried out benchmarks with three biomolecular systems, consisting of eighty thousand to twelve million atoms, each in a strong scaling test. The twelve million atom simulation system reached a performance of 49 nanoseconds per day on 32,768 cores.

  • 177. Lapenta, Giovanni
    et al.
    Pierrard, Viviane
    Keppens, Rony
    Markidis, Stefano
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Poedts, Stefaan
    Sebek, Ondrej
    Travnicek, Pavel M.
    Henri, Pierre
    Califano, Francesco
    Pegoraro, Francesco
    Faganello, Matteo
    Olshevsky, Vyacheslav
    Restante, Anna Lisa
    Nordlund, Åke
    Frederiksen, Jacob Trier
    Mackay, Duncan H.
    Parnell, Clare E.
    Bemporad, Alessandro
    Susino, Roberto
    Borremans, Kris
    SWIFF: Space weather integrated forecasting framework. 2013. In: Journal of Space Weather and Space Climate, ISSN 2115-7251, E-ISSN 2115-7251, Vol. 3, pp. A05-. Journal article (Peer-reviewed)
    Abstract [en]

    SWIFF is a project funded by the Seventh Framework Programme of the European Commission to study the mathematical-physics models that form the basis for space weather forecasting. The phenomena of space weather span a tremendous range of densities and temperatures, with scales ranging over 10 orders of magnitude in space and time. Additionally, even in local regions there are concurrent processes developing at the electron, ion and global scales, strongly interacting with each other. The fundamental challenge in modelling space weather is the need to address multiple physics and multiple scales. Here we present our approach to take existing expertise in fluid and kinetic models and produce an integrated mathematical approach and software infrastructure that allows fluid and kinetic processes to be modelled together. SWIFF also aims at using this new infrastructure to model specific coupled processes at the solar corona, in interplanetary space and in the interaction at the Earth's magnetosphere.

  • 178.
    Laure, Erwin
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Edlund, Åke
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    The e-Infrastructure Ecosystem: Providing Local Support to Global Science. 2012. In: Large-Scale Computing, John Wiley & Sons, 2012, pp. 19-34. Book chapter, part of anthology (Peer-reviewed)
  • 179. Laure, Erwin
    et al.
    Fisher, S.M.
    Frohner, A.
    Grandi, C.
    Kunszt, P.
    Krenek, A.
    Mulmo, O.
    Pacini, F.
    Prelz, F.
    White, J.
    Barroso, M.
    Buni, P.
    Hemmer, F.
    Di Meglio, A.
    Edlund, Åke
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Programming the Grid using gLite. 2006. In: Computational Methods in Science and Technology, ISSN 1505-0602, Vol. 12, no. 1, pp. 33-45. Journal article (Other academic)
    Abstract [en]

    The past few years have seen the creation of the first production level Grid infrastructures that offer their users a dependable service at an unprecedented scale. Depending on the flavor of middleware services these infrastructures deploy (for instance Condor, gLite, Globus, UNICORE, to name only a few) different interfaces to program the Grid infrastructures are provided. Despite ongoing efforts to standardize Grid service interfaces, there are still significant differences in how applications can interface to a Grid infrastructure. In this paper we describe the middleware (gLite) and services deployed on the EGEE Grid infrastructure and explain how applications can interface to them.

  • 180.
    Laure, Erwin
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Holmgren, Sverker
    Preface. 2013. In: Future Generation Computer Systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 29, no. 8, pp. 2115-2116. Journal article (Peer-reviewed)
  • 181.
    Laure, Erwin
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Jones, B.
    Enabling Grids for e-Science: The EGEE Project. 2009. In: Grid computing: Infrastructure, Service, and application / [ed] Lizhe Wang, Wei Jie, Jinjun Chen, CRC Press, 2009, pp. 55-74. Book chapter, part of anthology (Peer-reviewed)
    Abstract [en]

    Enabling Grids for E-sciencE represents the world's largest multi-disciplinary Grid infrastructure today. Co-funded by the European Commission, it brings together more than 250 resource centres from 48 countries to produce a reliable and scalable computing resource available to the European and global research community. This article provides an overview of EGEE, its infrastructure, middleware, applications and support structures. It is intended to provide a first source of information for similar efforts elsewhere; based on EGEE's experiences, a sustainable model for Grid operations is discussed.

  • 182.
    Laure, Erwin
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Kao, O.
    Badia, R. M.
    Lefevre, L.
    Di Martino, B.
    Prodan, R.
    Turilli, M.
    Warneke, D.
    Topic 6: Grid, cluster and cloud computing (Introduction). 2013. In: Lecture Notes in Computer Science, 2013, pp. 241-. Conference paper (Peer-reviewed)
    Abstract [en]

    Grid and cloud computing have changed the IT landscape in the way we access and manage IT infrastructures. The use of computing resources has become essential for many applications in various areas. Both technologies provide easy-to-use and on-demand access to large-scale infrastructures. The high number of submissions to "Topic 6: Grid, Cluster and Cloud Computing" reflected the importance of the research area. The papers addressed key challenges regarding design, deployment, operation and use of Grid and cloud infrastructures. Moreover, several innovative algorithms and methods were proposed for fundamental capabilities and services that are required in a heterogeneous environment, such as adaptability, scalability, reliability and security, and to support applications as diverse as ubiquitous local services, enterprise-scale virtual organizations, and internet-scale distributed supercomputing. Finally, many experimental evaluations and use-cases delivered an insight into deployment in real-world scenarios and showed interesting future application domains. Each submission was reviewed by at least four reviewers and, finally, we were able to select nine high-quality papers. The papers were grouped in four sessions that are briefly summarized in the following.

  • 183.
    Laure, Erwin
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Vitlacil, Dejan
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Data storage and management for global research data infrastructures - Status and perspectives. 2013. In: Data Science Journal, ISSN 1683-1470, E-ISSN 1683-1470, Vol. 12, pp. GRD37-GRDI42. Journal article (Peer-reviewed)
    Abstract [en]

    In the vision of Global Research Data Infrastructures (GRDIs), data storage and management plays a crucial role. A successful GRDI will require a common globally interoperable distributed data system, formed out of data centres, that incorporates emerging technologies and new scientific data activities. The main challenge is to define common certification and auditing frameworks that will allow storage providers and data communities to build a viable partnership based on trust. To achieve this, it is necessary to find a long-term commitment model that will give financial, legal, and organisational guarantees of digital information preservation. In this article we discuss the state of the art in data storage and management for GRDIs and point out future research directions that need to be tackled to implement GRDIs.

  • 184. Li, Peggy
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    The Tree Machine: An evaluation of program loading strategies. 1983. Conference paper (Peer-reviewed)
    Abstract [en]

    The Caltech Tree Machine has an ensemble architecture. Processors are interconnected into a binary tree. Each node executes its own code; no two nodes need to execute identical code. Nodes are synchronized by messages between adjacent nodes. Since the number of nodes is intended to be large, on the order of thousands, great care needs to be exercised in devising loading strategies to make the loading time as short as possible. A constraint is also imposed by the very limited storage associated with a processor. Nodes are assigned a type that identifies the code they shall execute. Nodes of the same type execute identical code. Tree Machine programs are frequently very regular. By exploiting this regularity, compact descriptions of the types of all nodes in the tree can be created. The limited storage of a node, and the desire to use only local information in the expansion of the compacted description, imply constraints on the compression/decompression algorithms. A loading time proportional to the height of the tree is attainable in many cases with the algorithms presented. This time is also the worst case performance for one of the algorithms. The other algorithms have a worst case performance of O(√(N/f)) and O(√(N^(1/log2 f))), where N is the total number of nodes in a tree with fanout f. The algorithms with a less favorable upper bound in some cases allow a more compact tree description than the algorithm with the best upper bound.

  • 185. Lichtenstein, Woody
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Block Cyclic Dense Linear Algebra. 1993. In: SIAM Journal on Scientific Computing, ISSN 1064-8275, E-ISSN 1095-7197, Vol. 14, no. 6, pp. 1257-1286. Journal article (Peer-reviewed)
    Abstract [en]

    Block-cyclic order elimination algorithms for LU and QR factorization and solve routines are described for distributed memory architectures with processing nodes configured as two-dimensional arrays of arbitrary shape. The cyclic-order elimination, together with a consecutive data allocation, yields good load balance for both the factorization and solution phases for the solution of dense systems of equations by LU and QR decomposition. Blocking may offer a substantial performance enhancement on architectures for which the level-2 or level-3 BLAS (basic linear algebra subroutines) are ideal for operations local to a node. High-rank updates local to a node may have a performance that is a factor of four or more higher than a rank-1 update. This paper shows that in many parallel implementations, the O(N²) work in the factorization may be of the same significance as the O(N³) work, even for large matrices. The O(N²) work is poorly load balanced in two-dimensional nodal arrays, which are shown to be optimal with respect to communication for consecutive data allocation, block-cyclic order elimination, and a simple, but fairly general, communications model. In this Connection Machine system CM-200 implementation, the peak performance for LU factorization is about 9.4 Gflops/s in 64-bit precision and 16 Gflops/s in 32-bit precision. Blocking offers an overall performance enhancement of an approximate factor of two. The broadcast-and-reduce operations fully utilize the bandwidth available in the Boolean cube network interconnecting the nodes along each axis of the two-dimensional nodal array embedded in the cube network.
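
    The block-cyclic allocation underlying these algorithms maps a global index to a (process, local index) pair; a one-dimensional C sketch in the style of ScaLAPACK-like layouts (an illustration, not the paper's code):

        #include <stdio.h>

        /* 1-D block-cyclic layout: global index g, block size nb,
         * P processes. Blocks are dealt round-robin to processes. */
        void block_cyclic(int g, int nb, int P, int *proc, int *local) {
            int blk = g / nb;                 /* which block g falls in      */
            *proc  = blk % P;                 /* owner of that block         */
            *local = (blk / P) * nb + g % nb; /* position within the process */
        }

        int main(void) {
            for (int g = 0; g < 12; g++) {
                int p, l;
                block_cyclic(g, 2, 3, &p, &l);   /* nb = 2, P = 3 */
                printf("global %2d -> proc %d, local %d\n", g, p, l);
            }
            return 0;
        }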

  • 186.
    Livenson, Ilja
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Towards Transparent Integration of Heterogeneous Cloud Storage Platforms. 2011. In: 4th International Workshop on Data-Intensive Distributed Computing, DIDC 2011, Association for Computing Machinery (ACM), 2011, pp. 27-34. Conference paper (Peer-reviewed)
  • 187.
    Livenson, Ilja
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Singer, Georg
    Srirama, Satish N.
    Norbisrath, Ulrich
    Dumas, Marlon
    Towards a model for cloud computing cost estimation with reserved resources. 2010. In: Proceedings of the 2nd International ICST Conference on Cloud Computing, CloudComp 2010, 2010. Conference paper (Peer-reviewed)
  • 188.
    Lundborg, Magnus
    et al.
    KTH, Skolan för teknikvetenskap (SCI), Teoretisk fysik. KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Apostolov, Rossen
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Spångberg, Daniel
    Gärdenäs, Anders
    van der Spoel, David
    Lindahl, Erik
    KTH, Skolan för teknikvetenskap (SCI), Teoretisk fysik, Beräkningsbiofysik. KTH, Centra, Science for Life Laboratory, SciLifeLab. KTH, Centra, SeRC - Swedish e-Science Research Centre.
    An Efficient and Extensible Format, Library, and API for Binary Trajectory Data from Molecular Simulations. 2014. In: Journal of Computational Chemistry, ISSN 0192-8651, E-ISSN 1096-987X, Vol. 35, no. 3, pp. 260-269. Journal article (Peer-reviewed)
    Abstract [en]

    Molecular dynamics simulation is an important application in theoretical chemistry, and with the large high-performance computing resources available today, the programs also generate huge amounts of output data. In particular in the life sciences, with complex biomolecules such as proteins, simulation projects regularly deal with several terabytes of data. Apart from the need for more cost-efficient storage, it is increasingly important to be able to archive data, secure the integrity against disk or file transfer errors, provide rapid access, and facilitate exchange of data through open interfaces. There is already a whole range of different formats in use, but few if any of them (including our previous ones) fulfill all these goals. To address these shortcomings, we present Trajectory Next Generation (TNG), a flexible but highly optimized and efficient file format designed with interoperability in mind. TNG provides both state-of-the-art multiframe compression and a container framework that will make it possible to extend it with new compression algorithms without modifications in programs using it. TNG will be the new file format in the next major release of the GROMACS package, but it has been implemented as a separate library and API with liberal licensing to enable wide adoption both in academic and commercial codes.

  • 189. Manson, M
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    On-Line Determination of Power and Measurement System Configuration. 1976. Conference paper (Peer-reviewed)
  • 190.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Gong, Jing
    KTH, Centra, SeRC - Swedish e-Science Research Centre. KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC. KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Schliephake, Michael
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centra, SeRC - Swedish e-Science Research Centre.
    Hart, Alistair
    Henty, David
    Heisey, Katherine
    Fischer, Paul
    OpenACC acceleration of the Nek5000 spectral element code. 2015. In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 29, no. 3, pp. 311-319. Journal article (Peer-reviewed)
    Abstract [en]

    We present a case study of porting NekBone, a skeleton version of the Nek5000 code, to a parallel GPU-accelerated system. Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flow. The original NekBone Fortran source code has been used as the base and enhanced by OpenACC directives. The profiling of NekBone provided an assessment of the suitability of the code for GPU systems, and indicated possible kernel optimizations. Porting NekBone to GPU systems required little effort and a small number of additional lines of code (approximately one OpenACC directive per 1000 lines of code). The naïve implementation using OpenACC leads to little performance improvement: on a single node, from 16 Gflops obtained with the version without OpenACC, we reached 20 Gflops with the naïve OpenACC implementation. An optimized NekBone version reaches 43 Gflops on a single node. In addition, we ported and optimized NekBone for parallel GPU systems, reaching a parallel efficiency of 79.9% on 1024 GPUs of the Titan XK7 supercomputer at the Oak Ridge National Laboratory.
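
    The directive-based approach described above can be illustrated with a minimal OpenACC loop; the sketch is in C for brevity, whereas NekBone itself is Fortran with analogous !$acc directives:

        #include <stdio.h>

        /* One pragma offloads the loop to the accelerator; without an
         * OpenACC compiler the pragma is ignored and the code runs
         * serially on the host. */
        int main(void) {
            const int n = 1 << 20;
            static float x[1 << 20], y[1 << 20];
            for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

            /* copyin/copy clauses manage host-device data movement. */
            #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
            for (int i = 0; i < n; i++)
                y[i] = 2.0f * x[i] + y[i];

            printf("y[0] = %f\n", y[0]);   /* 4.0 */
            return 0;
        }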

  • 191.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz).
    Henri, P.
    Lapenta, G.
    Divin, A.
    Goldman, M.
    Newman, D.
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), High Performance Computing and Visualization (HPCViz). KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Kinetic simulations of plasmoid chain dynamics2013Inngår i: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 20, nr 8, s. 082105-Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    The dynamics of a plasmoid chain is studied with three-dimensional Particle-in-Cell simulations. The evolution of the system with and without a uniform guide field, whose strength is 1/3 of the asymptotic magnetic field, is investigated. The plasmoid chain forms by spontaneous magnetic reconnection: the tearing instability rapidly disrupts the initial current sheet, generating several small-scale plasmoids that rapidly grow in size, coalescing and kinking. The plasmoid kink is mainly driven by the coalescence process. It is found that the presence of a guide field strongly influences the evolution of the plasmoid chain. Without a guide field, a main reconnection site dominates and smaller reconnection regions are included in larger ones, leading to a hierarchical structure of the plasmoid-dominated current sheet. In contrast, in the presence of a guide field, plasmoids have approximately the same size and the hierarchical structure does not emerge; a strong core magnetic field develops in the center of each plasmoid in the direction of the existing guide field, and a bump-on-tail instability, leading to the formation of electron holes, is detected in the proximity of the plasmoids.
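
    For readers unfamiliar with the method, the sketch below shows the particle-advance step that gives Particle-in-Cell (PIC) codes their name: fields interpolated to each particle position drive an update of particle velocity and position. It is a generic explicit update under assumed simplifications; the simulations cited here use an implicit-moment scheme, and the field gather and current deposit steps are omitted.

        /* Minimal sketch of a PIC particle push (explicit, non-relativistic). */
        typedef struct { double x[3], v[3]; } particle_t;

        void push(particle_t *p, const double E[3], const double B[3],
                  double qm, double dt) {
            /* qm = charge-to-mass ratio; E, B are the fields interpolated
               to the particle position. Lorentz force: qm*(E + v x B). */
            double vxB[3] = {
                p->v[1]*B[2] - p->v[2]*B[1],
                p->v[2]*B[0] - p->v[0]*B[2],
                p->v[0]*B[1] - p->v[1]*B[0],
            };
            for (int d = 0; d < 3; d++) {
                p->v[d] += qm * dt * (E[d] + vxB[d]);
                p->x[d] += p->v[d] * dt;
            }
        }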

  • 192.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Henri, P.
    Lapenta, G.
    Divin, A.
    Goldman, M. V.
    Newman, D.
    Eriksson, S.
    Collisionless magnetic reconnection in a plasmoid chain2012Inngår i: Nonlinear processes in geophysics, ISSN 1023-5809, E-ISSN 1607-7946, Vol. 19, nr 1, s. 145-153Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    The kinetic features of plasmoid chain formation and evolution are investigated by two-dimensional Particle-in-Cell simulations. Magnetic reconnection is initiated at multiple X points by the tearing instability. Plasmoids form and grow in size by continuously coalescing. Each chain plasmoid exhibits a strong out-of-plane core magnetic field and an out-of-plane electron current that drives the coalescing process. The disappearance of the X points in the coalescence process is due to anti-reconnection, a magnetic reconnection in which the plasma inflow and outflow are reversed with respect to the original reconnection flow pattern. Anti-reconnection is characterized by the Hall magnetic field quadrupole signature. Two new kinetic features, not reported by previous studies of plasmoid chain evolution, are revealed here. First, intense electric fields develop in-plane, normal to the separatrices, and drive the ion dynamics in the plasmoids. Second, several bipolar electric field structures are localized in the proximity of the plasmoid chain. The analysis of the electron distribution function and phase space reveals the presence of counter-streaming electron beams, unstable to the two-stream instability, and of phase-space electron holes along the reconnection separatrices.

  • 193.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Lapenta, G.
    Divin, A.
    Goldman, M.
    Newman, D.
    Andersson, L.
    Three dimensional density cavities in guide field collisionless magnetic reconnection2012Inngår i: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 19, nr 3, s. 032119-Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Particle-in-cell simulations of collisionless magnetic reconnection with a guide field reveal for the first time the three-dimensional features of the low density regions along the magnetic reconnection separatrices, the so-called cavities. It is found that structures with even lower density develop within the cavities. Because their appearance resembles the shape of ribs, these formations are here called low density ribs. Their location remains approximately fixed in time and their density progressively decreases as electron currents along the cavities evacuate them. They develop along the magnetic field lines and are supported by a strong perpendicular electric field that oscillates in space. In addition, bipolar parallel electric field structures form as isolated spheres between the cavities and the outflow plasma, along the direction of the low density ribs and of the magnetic field lines.

  • 194.
    Markidis, Stefano
    et al.
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Peng, Ivy Bo
    KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
    Larsson Träff, Jesper
    Rougier, Antoine
    Bartsch, Valeria
    Machado, Rui
    Rahn, Mirko
    Hart, Alistair
    Holmes, Daniel
    Bull, Mark
    Laure, Erwin
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    The EPiGRAM Project: Preparing Parallel Programming Models for Exascale2016Inngår i: HIGH PERFORMANCE COMPUTING, ISC HIGH PERFORMANCE 2016 INTERNATIONAL WORKSHOPS, Springer, 2016, s. 56-68Konferansepaper (Fagfellevurdert)
    Abstract [en]

    EPiGRAM is a European Commission funded project to improve existing parallel programming models so that large-scale applications can run efficiently on exascale supercomputers. The EPiGRAM project focuses on the two currently dominant petascale programming models, message-passing and PGAS, and on the improvement of two of their associated programming systems, MPI and GASPI. In EPiGRAM, we work on two major aspects of programming systems. First, we improve the performance of communication operations by decreasing memory consumption, improving collective operations and introducing emerging computing models. Second, we enhance the interoperability of message-passing and PGAS by integrating them in one PGAS-based MPI implementation, called EMPI4Re, by implementing MPI endpoints and by improving GASPI interoperability with MPI. The new EPiGRAM concepts are tested in two large-scale applications: iPIC3D, a Particle-in-Cell code for space physics simulations, and Nek5000, a computational fluid dynamics code.
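
    As a sketch of the message-passing/PGAS convergence the project targets, the program below expresses a PGAS-style one-sided write in plain, standard MPI: rank 0 puts a value directly into rank 1's memory window without rank 1 posting a receive. This uses the standard MPI RMA interface only and is not the EMPI4Re implementation or the GASPI API.

        /* PGAS-style one-sided communication via standard MPI RMA.
           Run with at least 2 ranks. */
        #include <mpi.h>

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            double local = 0.0;            /* window memory on every rank */
            MPI_Win win;
            MPI_Win_create(&local, sizeof(double), sizeof(double),
                           MPI_INFO_NULL, MPI_COMM_WORLD, &win);

            MPI_Win_fence(0, win);
            if (rank == 0) {               /* one-sided write into rank 1 */
                double v = 3.14;
                MPI_Put(&v, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
            }
            MPI_Win_fence(0, win);         /* after this, rank 1 sees v */

            MPI_Win_free(&win);
            MPI_Finalize();
            return 0;
        }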

  • 195. Marthur, Kapil K
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    All–to–All Communication on the Connection Machine system CM–2001995Inngår i: Scientific Programming, ISSN 1058-9244, E-ISSN 1875-919X, Vol. 4, nr 4, s. 251-273Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Algorithms for the Distributed Basic Linear Algebra Subprograms (DBLAS) based on all-to-all broadcast and all-to-all reduce are presented. For DBLAS, at each all-to-all step it is necessary to know the indices of the data values as well as the values themselves. This is in contrast to the more traditional applications of all-to-all broadcast (such as an N-body solver), where the identity of the data values is of little interest. Detailed schedules for all-to-all broadcast and reduction are given for the data motion of arrays mapped to the processing nodes of binary cube networks using binary encoding and binary-reflected Gray encoding. The algorithms compute the indices for the communicated data locally; no communication bandwidth is consumed for data array indices. For the Connection Machine system CM-200, Hamiltonian cycle based all-to-all communication algorithms improve performance by a factor of two to ten over a combination of tree, butterfly network, and router based algorithms. The data rate achieved for all-to-all broadcast on a 256 node Connection Machine system CM-200 is 0.3 Gbytes/s. The data motion rate for all-to-all broadcast, including the time for index computations and local data reordering, is about 2.8 Gbytes/s for a 2048 node system. Excluding the time for index computation and local memory reordering, the measured data motion rate for all-to-all broadcast is 5.6 Gbytes/s. On a Connection Machine system CM-200 with 2048 processing nodes, the overall performance of the distributed matrix-vector multiply (DGEMV) and vector-matrix multiply (DGEMV with TRANS) is 10.5 Gflops and 13.7 Gflops, respectively.
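
    The two node labelings the abstract refers to are related by a simple bit transform, shown below as a sketch. This illustrates only the binary and binary-reflected Gray encodings of node addresses, not the communication schedules of the paper.

        /* Conversion between binary code and binary-reflected Gray code. */
        #include <stdint.h>

        /* binary -> Gray: each Gray bit is the XOR of adjacent binary bits */
        uint32_t to_gray(uint32_t b) {
            return b ^ (b >> 1);
        }

        /* Gray -> binary: prefix XOR over the bits */
        uint32_t from_gray(uint32_t g) {
            uint32_t b = g;
            for (int s = 1; s < 32; s <<= 1)
                b ^= b >> s;
            return b;
        }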

  • 196. Martorell, X
    et al.
    Smeds, Nils
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Walkup, R
    Brunheroto, J R
    Almási, G
    Gunnels, J A
    DeRose, L
    Labarta, J
    Escalé, F
    Giménez, J
    Servat, H
    Moreira, J E
    Blue Gene/L performance tools2005Inngår i: IBM Journal of Research and Development, ISSN 0018-8646, E-ISSN 2151-8556, Vol. 49, nr 2-3, s. 407-424Artikkel i tidsskrift (Fagfellevurdert)
    Abstract [en]

    Good performance monitoring is the basis of modern performance analysis tools for application optimization. We are providing a variety of such performance analysis tools for the new Blue Gene®/L supercomputer. These tools can be divided into two categories: single-node performance tools and multinode performance tools. From a single-node perspective, we provide standard interfaces and libraries, such as PAPI and libHPM, that provide access to the hardware performance counters for applications running on the Blue Gene/L compute nodes. From a multinode perspective, we focus on tools that analyze Message Passing Interface (MPI) behavior. These tools work by first collecting message-passing trace data while a program runs. The trace data is then used by graphical interface tools that analyze the behavior of applications. Using the current prototype tools, we demonstrate their usefulness and applicability with case studies of application optimization.
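
    A minimal sketch of the single-node counter interface (PAPI) mentioned in the abstract: create an event set, start the counters around a code region, and read them back. Which preset events are available varies by platform, so the two events below are illustrative rather than Blue Gene/L-specific.

        /* Reading hardware performance counters with PAPI. */
        #include <stdio.h>
        #include <papi.h>

        int main(void) {
            int es = PAPI_NULL;
            long long counts[2];

            if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
                return 1;
            PAPI_create_eventset(&es);
            PAPI_add_event(es, PAPI_TOT_CYC);  /* total cycles */
            PAPI_add_event(es, PAPI_FP_OPS);   /* floating-point operations */

            PAPI_start(es);
            /* ... code region to be measured ... */
            PAPI_stop(es, counts);

            printf("cycles: %lld  fp ops: %lld\n", counts[0], counts[1]);
            return 0;
        }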

  • 197. Marzolla, M.
    et al.
    Andreetto, P.
    Venturi, V.
    Ferraro, A.
    Memon, S.
    Twedell, B.
    Riedel, M.
    Mallmann, D.
    Streit, A.
    Van De Berghe, S.
    Li, V.
    Snelling, D.
    Stamou, Katerina
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Shah, Zeeshan Ali
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Hedman, Fredrik
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Open standards-based interoperability of job submission and management interfaces across the grid middleware platforms gLite and UNICORE2007Inngår i: Proceedings - e-Science 2007, 3rd IEEE International Conference on e-Science and Grid Computing, IEEE Computer Society, 2007, s. 592-599Konferansepaper (Fagfellevurdert)
    Abstract [en]

    In a distributed Grid environment with ambitious service demands, the job submission and management interfaces provide functionality of major importance. Emerging e-Science and Grid infrastructures such as EGEE and DEISA rely on highly available services that are capable of managing scientific jobs. It is the adoption of emerging open standard interfaces that allows Grid resources to be distributed in such a way that their actual service implementations or Grid technologies are not isolated from each other, especially when these resources are deployed in different e-Science infrastructures consisting of different types of computational resources. This paper motivates the interoperability of these infrastructures and discusses solutions. We describe the adoption, by the well-known Grid technologies gLite and UNICORE, of various open standards that recently emerged from the Open Grid Forum (OGF) in the field of job submission and management. This has a fundamental impact on the interoperability between these technologies, and thus within the next-generation e-Science infrastructures that rely on them.

  • 198. Mascellaro, L.
    et al.
    Axner, Lilit
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Gong, Jing
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Monotricat® hull, first displacement naval hull navigating at speeds of planing hulls, on spray self-produced, at high hydrodynamic efficiency and energy recovery2015Inngår i: 18th International Conference on Ships and Shipping Research, NAV 2015, The European Marine Energy Centre Ltd , 2015, s. 38-47Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Since the 1950s, with the introduction of Nelson's first semi-planing hull, which made it possible to navigate in relative comfort at speeds higher than those of traditional hulls, and with the subsequent availability of more powerful engines, speeds corresponding to Froude numbers (Fn) greater than 0.6, the regime that defines planing hulls, have been reached. A clear distinction was thus created between displacement and planing hulls with respect to performance. The need for faster displacement naval units has pushed ship design towards increasingly high-performance hulls, also focusing on the use of lightweight materials such as aluminum and more powerful engines, but without substantially changing traditional hull forms. The patented Monotricat hull, with its high hydrodynamic efficiency and energy savings, overcomes this distinction between displacement and planing hulls because, unlike previous solutions, it is the first hull to combine the characteristics of displacement and planing hulls: it presents an innovative architecture that could be described as a hybrid between a monohull and a catamaran, navigating on self-produced spray. This presentation shows how the Monotricat hull is the first displacement hull that can navigate at both displacement and planing speeds, with an almost straight resistance curve, while maintaining the characteristics of a displacement hull. For these reasons the Monotricat hull is able to ensure safety, comfortable navigation, good seakeeping and maneuverability in restricted waters, stability, reduced resistance to motion, manageable costs, and regularity on its routes even in adverse weather and sea conditions. These characteristics of the hull have been studied, tested and validated by leading research institutes and universities, with improved results in each subsequent experiment reported in the present work, demonstrating a hydrodynamic efficiency gain approaching 20% compared to conventional hulls.

  • 199. Mathur, K
    et al.
    Zdenek, Johan
    Hughes, Thomas J.R
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    Massively Parallel Computing: Unstructured Finite Element Simulations1993Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Massively parallel computing holds the promise of extreme performance. Critical for achieving high performance are the ability to exploit locality of reference and the effective management of communication resources. This article describes two communication primitives and associated mapping strategies that have been used for several different unstructured, three-dimensional, finite element applications in computational fluid dynamics and structural mechanics.

  • 200. Mathur, Kapil K
    et al.
    Johnsson, Lennart
    KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.
    All–to–all Communication Algorithms for Distributed BLAS1993Konferansepaper (Fagfellevurdert)
    Abstract [en]

    Algorithms for the Distributed Basic Linear Algebra Subprograms (DBLAS) based on all-to-all broadcast and all-to-all reduce are presented. For DBLAS, at each all-to-all step it is necessary to know the indices of the data values as well as the values themselves. This is in contrast to the more traditional applications of all-to-all broadcast (such as an N-body solver), where the identity of the data values is of little interest. Detailed schedules for all-to-all broadcast and reduction are given for the data motion of arrays mapped to the processing nodes of binary cube networks using binary encoding and binary-reflected Gray encoding. The algorithms compute the indices for the communicated data locally; no communication bandwidth is consumed for data array indices. For the Connection Machine system CM-200, Hamiltonian cycle based all-to-all communication algorithms improve performance by a factor of two to ten over a combination of tree, butterfly network, and router based algorithms. The data rate achieved for all-to-all broadcast on a 256 node Connection Machine system CM-200 is 0.3 Gbytes/s. The data motion rate for all-to-all broadcast, including the time for index computations and local data reordering, is about 2.8 Gbytes/s for a 2048 node system. Excluding the time for index computation and local memory reordering, the measured data motion rate for all-to-all broadcast is 5.6 Gbytes/s. On a Connection Machine system CM-200 with 2048 processing nodes, the overall performance of the distributed matrix-vector multiply (DGEMV) and vector-matrix multiply (DGEMV with TRANS) is 10.5 Gflops and 13.7 Gflops, respectively.
