Data Movement on Emerging Large-Scale Parallel Systems
Peng, Ivy Bo. KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). ORCID iD: 0000-0003-4158-3583
2017 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Large-scale HPC systems are an important driver for solving computational problems in scientific communities. Next-generation HPC systems will grow not only in scale but also in heterogeneity. This increased system complexity entails more challenges to data movement in HPC applications. Data movement on emerging HPC systems requires asynchronous fine-grained communication and efficient data placement in the main memory. This thesis proposes innovative programming models and algorithms to prepare HPC applications for the next computing era: (1) a data streaming model that supports emerging data-intensive applications on supercomputers, (2) a decoupling model that improves parallelism and mitigates the impact of imbalance in applications, (3) a new framework and methodology for predicting the impact of large-scale heterogeneous memory systems on HPC applications, and (4) a data placement algorithm that uses a set of rules and a decision tree to determine the data-to-memory mapping in heterogeneous main memory.

The proposed approaches in this thesis are evaluated on multiple supercomputers with different processors and interconnect networks. The evaluation uses a diverse set of applications that represent conventional scientific applications and emerging data-analytics workloads on HPC systems. The experimental results on the petascale testbed show that the approaches deliver performance improvements that grow with system scale, a trend that supports their value for future HPC systems.

Abstract [sv]

Large-scale HPC systems are an important driver for solving computational problems in scientific communities. Next-generation HPC systems will grow not only in scale but also in heterogeneity. This increased system complexity brings several challenges for data movement in HPC applications. Data movement on emerging HPC systems requires asynchronous, fine-grained communication and efficient data placement in main memory.

This thesis proposes an innovative programming model and algorithm to prepare HPC applications for the next generation: (1) a data streaming model that supports emerging data-intensive applications on supercomputers, (2) a decoupling model that improves parallelism and reduces imbalance in applications, (3) a new methodology and framework for predicting the impact of large-scale, heterogeneous memory systems on HPC applications, and (4) a data placement algorithm that uses a set of rules and a decision tree to determine the data-to-memory mapping in heterogeneous main memory.

The proposed programming model in this thesis is evaluated on several supercomputers with different processors and interconnect networks. The evaluation uses a variety of applications that represent conventional scientific applications and emerging data analytics on HPC systems. Experimental results on the petascale testbed show that the programming model improves performance as the system scale increases. This trend indicates that the model is a valuable contribution to future HPC systems.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2017, p. 116
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2017:25
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-218338
ISBN: 978-91-7729-592-1 (print)
OAI: oai:DiVA.org:kth-218338
DiVA, id: diva2:1160619
Public defence
2017-12-18, F3, Lindstedtsvägen 26, Stockholm, 10:00 (English)
Note

QC 20171128

Available from: 2017-11-28. Created: 2017-11-27. Last updated: 2022-06-26. Bibliographically approved.
List of papers
1. Preparing HPC Applications for the Exascale Era: A Decoupling Strategy
2017 (English). In: 2017 46th International Conference on Parallel Processing (ICPP), IEEE Computer Society, 2017, p. 1-10, article id 8025274. Conference paper, Published paper (Refereed).
Abstract [en]

Production-quality parallel applications are often a mixture of diverse operations, such as computation- and communication-intensive, regular and irregular, tightly coupled and loosely linked operations. In conventional construction of parallel applications, each process performs all the operations, which can be inefficient and seriously limit scalability, especially at large scale. We propose a decoupling strategy to improve the scalability of applications running on large-scale systems. Our strategy separates application operations onto groups of processes and enables a dataflow processing paradigm among the groups. This mechanism is effective in reducing the impact of load imbalance and increases parallel efficiency by pipelining multiple operations. We provide a proof-of-concept implementation using MPI, the de facto programming system on current supercomputers. We demonstrate the effectiveness of this strategy by decoupling the reduce, particle communication, halo exchange and I/O operations in a set of scientific and data-analytics applications. A performance evaluation on 8,192 processes of a Cray XC40 supercomputer shows that the proposed approach can achieve up to 4x performance improvement.
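
The strategy is only described at a high level in this record; as a rough illustration of the idea, the sketch below splits MPI_COMM_WORLD into a compute group and an I/O group that exchange data in a dataflow fashion. The group sizes, tags, and producer-to-consumer mapping are my own choices, not taken from the paper.

```c
/* Minimal sketch of operation decoupling with MPI process groups.
 * Assumption: the last IO_RANKS processes act as I/O "consumers";
 * the rest compute and forward results. Requires size > IO_RANKS. */
#include <mpi.h>

#define IO_RANKS 2
#define N 1024

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int is_io = rank >= size - IO_RANKS;  /* role assignment */
    MPI_Comm group;                       /* per-role communicator for
                                             role-internal collectives */
    MPI_Comm_split(MPI_COMM_WORLD, is_io, rank, &group);

    double buf[N];
    int producers = size - IO_RANKS;
    if (!is_io) {
        for (int i = 0; i < N; i++) buf[i] = rank * i;    /* "compute" */
        int dest = producers + rank % IO_RANKS;           /* pick a consumer */
        MPI_Send(buf, N, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    } else {
        /* each consumer drains its share of producers, overlapping with
           the next compute phase on the producer side */
        for (int p = rank - producers; p < producers; p += IO_RANKS) {
            MPI_Recv(buf, N, MPI_DOUBLE, p, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            /* ... write buf to storage ... */
        }
    }
    MPI_Comm_free(&group);
    MPI_Finalize();
    return 0;
}
```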

Place, publisher, year, edition, pages
IEEE Computer Society, 2017
Series
Proceedings of the International Conference on Parallel Processing, ISSN 0190-3918
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-218333
DOI: 10.1109/ICPP.2017.9
ISI: 000426952300001
Scopus ID: 2-s2.0-85030654606
ISBN: 9781538610428
Conference
46th International Conference on Parallel Processing, ICPP 2017, Bristol, United Kingdom, 14 August 2017 through 17 August 2017
Note

QC 20171128

Available from: 2017-11-27. Created: 2017-11-27. Last updated: 2024-03-18. Bibliographically approved.
2. A Data streaming model in MPI
2015 (English). In: Proceedings of the 3rd ExaMPI Workshop at the International Conference on High Performance Computing, Networking, Storage and Analysis, SC 2015, ACM Digital Library, 2015. Conference paper, Published paper (Refereed).
Abstract [en]

The data streaming model is an effective way to tackle the challenge of data-intensive applications. As traditional HPC applications generate large volumes of data and more data-intensive applications move to HPC infrastructures, it is necessary to investigate the feasibility of combining message-passing and streaming programming models. MPI, the de facto standard for programming on HPC, cannot intuitively express the communication patterns and the functional operations required in streaming models. In this work, we designed and implemented a data streaming library, MPIStream, atop MPI to allocate data producers and consumers, to stream data continuously or irregularly, and to process data at runtime. In the same spirit as the STREAM benchmark, we developed a parallel stream benchmark to measure the data processing rate. The performance of the library largely depends on the size of the stream element, the number of data producers and consumers, and the computational intensity of processing one stream element. With 2,048 data producers and 2,048 data consumers in the parallel benchmark, MPIStream achieved a 200 GB/s processing rate on a Blue Gene/Q supercomputer. We illustrate that a streaming library for HPC applications can effectively enable irregular parallel I/O, application monitoring and threshold collective operations.
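
MPIStream's actual API is not shown in this record; the sketch below only mimics the producer/consumer pattern with plain MPI point-to-point calls. The element layout, the termination tag, and the one-producer-per-consumer pairing are assumptions for illustration.

```c
/* Producer/consumer streaming mimicked with plain MPI point-to-point.
 * Not MPIStream's API; element size, tags, and pairing are assumed. */
#include <mpi.h>

#define ELEM_DOUBLES 128   /* stream element size (assumed) */
#define DATA_TAG 1
#define STOP_TAG 2

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int half = size / 2;          /* ranks < half produce, the rest consume */
    double elem[ELEM_DOUBLES];

    if (rank < half) {            /* producer: emit a finite stream */
        for (int e = 0; e < 1000; e++) {
            for (int i = 0; i < ELEM_DOUBLES; i++) elem[i] = e + i;
            MPI_Send(elem, ELEM_DOUBLES, MPI_DOUBLE, half + rank, DATA_TAG,
                     MPI_COMM_WORLD);
        }
        MPI_Send(elem, 0, MPI_DOUBLE, half + rank, STOP_TAG, MPI_COMM_WORLD);
    } else {                      /* consumer: process until STOP arrives */
        MPI_Status st;
        double sum = 0.0;
        for (;;) {
            MPI_Recv(elem, ELEM_DOUBLES, MPI_DOUBLE, rank - half,
                     MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == STOP_TAG) break;
            for (int i = 0; i < ELEM_DOUBLES; i++) sum += elem[i]; /* "process" */
        }
    }
    MPI_Finalize();
    return 0;
}
```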

Place, publisher, year, edition, pages
ACM Digital Library, 2015
Keywords
Data-intensive, HPC, MPI, Streaming model, Data reduction, Digital storage, Functional programming, Message passing, Supercomputers, Application monitoring, Collective operations, Communication pattern, Computational intensity, Data intensive, Data-intensive application, Parallel benchmarks, Data handling
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-201832
DOI: 10.1145/2831129.2831131
Scopus ID: 2-s2.0-85009188222
ISBN: 9781450339988
Conference
3rd Workshop on Exascale MPI, ExaMPI 2015, 15 November 2015
Note

QC 20170216

Available from: 2017-02-16. Created: 2017-02-16. Last updated: 2024-03-18. Bibliographically approved.
3. A performance characterization of streaming computing on supercomputers
2016 (English). In: Procedia Computer Science, Elsevier, 2016, p. 98-107. Conference paper, Published paper (Refereed).
Abstract [en]

Streaming computing models allow for on-the-fly processing of large data sets. With the increased demand for processing large amounts of data in a reasonable period of time, streaming models are more and more used on supercomputers to solve data-intensive problems. Because supercomputers have mainly been used for compute-intensive workloads, supercomputer performance metrics focus on the number of floating-point operations per unit time and cannot fully characterize a streaming application's performance on supercomputers. We introduce the injection and processing rates as the main metrics to characterize the performance of streaming computing on supercomputers. We analyze the dynamics of these quantities in a modified STREAM benchmark developed atop an MPI streaming library in a series of different configurations. We show that after a brief transient the injection and processing rates converge to sustained rates. We also demonstrate that streaming computing performance strongly depends on the number of connections between data producers and consumers and on the processing task granularity.
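
As a rough illustration of the two metrics, the sketch below times a consumer loop to obtain a processing rate; the same pattern on the producer side yields the injection rate. The helpers receive_element() and process_element(), the element size, and the units are hypothetical, not the paper's benchmark.

```c
/* Measuring a consumer's processing rate: elements consumed per second.
 * receive_element()/process_element() are hypothetical helpers. */
#include <mpi.h>
#include <stdio.h>

#define ELEM_DOUBLES 128

int receive_element(double *elem);   /* returns 0 when the stream ends */
void process_element(double *elem);

void measure_processing_rate(void) {
    double elem[ELEM_DOUBLES];
    double t0 = MPI_Wtime();
    long elements = 0;
    while (receive_element(elem)) {
        process_element(elem);
        elements++;
    }
    double t1 = MPI_Wtime();
    double bytes = (double)elements * ELEM_DOUBLES * sizeof(double);
    printf("processing rate: %.2f elements/s (%.2f MB/s)\n",
           elements / (t1 - t0), bytes / (t1 - t0) / 1e6);
}
```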

Place, publisher, year, edition, pages
Elsevier, 2016
Keywords
Big data, Data-driven applications, High-performance computing, Streaming computing, Data handling, Supercomputers, Computing performance, High performance computing, Performance characterization, Performance metrics, Processing rates, Streaming applications, Task granularity
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-195477
DOI: 10.1016/j.procs.2016.05.301
ISI: 000579452200009
Scopus ID: 2-s2.0-84978536252
Conference
International Conference on Computational Science, ICCS 2016, 6 June 2016 through 8 June 2016
Note

Funding Details: 671500, EC, European Commission

QC 20161125

Available from: 2016-11-25. Created: 2016-11-03. Last updated: 2024-01-15. Bibliographically approved.
4. MPI Streams for HPC Applications
2017 (English). In: New Frontiers in High Performance Computing and Big Data / [ed] Geoffrey Fox, Vladimir Getov, Lucio Grandinetti, Gerhard Joubert, Thomas Sterling, IOS Press, 2017, p. 75-92. Conference paper, Published paper (Refereed).
Abstract [en]

Data streams are a sequence of data flowing between source and destination processes. Streaming is widely used for signal, image and video processing for its efficiency in pipelining and its effectiveness in reducing demand for memory. The goal of this work is to extend the use of data streams to support both conventional scientific applications and emerging data analytics applications running on HPC platforms. We introduce MPIStream, an extension to MPI, the de facto programming standard on HPC. MPIStream supports data streams either within a single application or among multiple applications. We present three use cases of MPI streams in HPC applications together with their parallel performance. We show the convenience of using MPI streams to support the needs of both traditional HPC and emerging data analytics applications running on supercomputers.

Place, publisher, year, edition, pages
IOS Press, 2017
Series
Advances in Parallel Computing, ISSN 0927-5452 ; 30
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-218334
DOI: 10.3233/978-1-61499-816-7-75
ISI: 000450329200004
Scopus ID: 2-s2.0-85046361827
ISBN: 978-1-61499-815-0, 978-1-61499-816-7
Conference
International Research Workshop on Advanced High Performance Computing Systems, July 2016, Cetraro, Italy
Note

QCR 2017. QC 20191106

Available from: 2017-11-27. Created: 2017-11-27. Last updated: 2022-06-26. Bibliographically approved.
5. Exploring Application Performance on Emerging Hybrid-Memory Supercomputers
2017 (English). In: Proceedings - 18th IEEE International Conference on High Performance Computing and Communications, 14th IEEE International Conference on Smart City and 2nd IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2016, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 473-480, article id 7828415. Conference paper, Published paper (Refereed).
Abstract [en]

Next-generation supercomputers will feature more hierarchical and heterogeneous memory systems, with different memory technologies working side-by-side. A critical question is whether, at large scale, existing HPC applications and emerging data-analytics workloads will see performance improvement or degradation on these systems. We propose a systematic and fair methodology to identify the trend of application performance on emerging hybrid-memory systems. We model the memory system of next-generation supercomputers as a combination of 'fast' and 'slow' memories and then analyze the performance and dynamic execution characteristics of a variety of workloads, from traditional scientific applications to emerging data analytics, on traditional and hybrid-memory systems. Our results show that data-analytics applications can clearly benefit from the new system design, especially at large scale. Moreover, hybrid-memory systems do not penalize traditional scientific applications, which may also show performance improvement.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
Keywords
Hybrid-memory system, Large-scale applications, Performance characterization
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:kth:diva-208452
DOI: 10.1109/HPCC-SmartCity-DSS.2016.0074
ISI: 000401700900063
Scopus ID: 2-s2.0-85013674475
ISBN: 9781509042968
Conference
18th IEEE International Conference on High Performance Computing and Communications, 14th IEEE International Conference on Smart City and 2nd IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2016, Sydney, Australia, 12 December 2016 through 14 December 2016
Note

QC 20170609

Available from: 2017-06-09. Created: 2017-06-09. Last updated: 2024-03-18. Bibliographically approved.
6. Exploring the performance benefit of hybrid memory system on HPC environments
2017 (English). In: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 683-692, article id 7965110. Conference paper, Published paper (Refereed).
Abstract [en]

Hardware accelerators have become a de facto standard to achieve high performance on current supercomputers, and there are indications that this trend will continue. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventional DRAM memory. Theoretically, HBM can provide ∼4× higher bandwidth than conventional DRAM. However, many factors impact the effective performance achieved by applications, including the application memory access pattern, the problem size, the threading level and the actual memory configuration. In this paper, we analyze the Intel KNL system and quantify the impact of the most important factors on application performance by using a set of applications that are representative of scientific and data-analytics workloads. Our results show that applications with regular memory access benefit from MCDRAM, achieving up to 3× the performance obtained using only DRAM. On the contrary, applications with random memory access patterns are latency-bound and may suffer performance degradation when using only MCDRAM. For those applications, the use of additional hardware threads may help hide latency and achieve higher aggregated bandwidth when using HBM.
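
The paper's measurement harness is not included in this record; as a sketch of the underlying mechanism, the memkind library exposes KNL's MCDRAM as a separate allocation kind, so a bandwidth-bound array can be placed in HBM while a latency-bound structure stays in DRAM. The sizes and the role assignments below are illustrative assumptions.

```c
/* Placing one array in MCDRAM (HBM) and one in DRAM via memkind.
 * Illustrative only; sizes and bandwidth/latency roles are assumed. */
#include <memkind.h>
#include <stdio.h>

int main(void) {
    size_t n = 1 << 24;
    /* stream-accessed array: bandwidth-bound, benefits from MCDRAM */
    double *hot = memkind_malloc(MEMKIND_HBW, n * sizeof(double));
    /* randomly accessed table: latency-bound, leave in DRAM */
    double *cold = memkind_malloc(MEMKIND_DEFAULT, n * sizeof(double));
    if (!hot || !cold) {        /* MEMKIND_HBW fails without HBM */
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (size_t i = 0; i < n; i++) hot[i] = cold[i] = (double)i;

    memkind_free(MEMKIND_HBW, hot);
    memkind_free(MEMKIND_DEFAULT, cold);
    return 0;
}
```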

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
Series
IEEE International Symposium on Parallel and Distributed Processing Workshops, ISSN 2164-7062
Keywords
application performance on Intel KNL, HBM, hybrid memory system, Intel Knights Landing (KNL), MCDRAM
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:kth:diva-213533
DOI: 10.1109/IPDPSW.2017.115
ISI: 000417418900077
Scopus ID: 2-s2.0-85028050020
ISBN: 9781538634080
Conference
31st IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017, Orlando, United States, 29 May 2017 through 2 June 2017
Note

QC 20170904

Available from: 2017-09-04. Created: 2017-09-04. Last updated: 2024-03-18. Bibliographically approved.
7. RTHMS: A Tool for Data Placement on Hybrid Memory System
2017 (English). In: Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management, ISMM 2017, Association for Computing Machinery (ACM), 2017, Vol. 52, no 9, p. 82-91. Conference paper, Published paper (Refereed).
Abstract [en]

Traditional scientific and emerging data analytics applications require fast, power-efficient, large, and persistent memories. Combining all these characteristics within a single memory technology is expensive, and hence future supercomputers will feature different memory technologies side-by-side. However, it is a complex task to program hybrid-memory systems and to identify the best object-to-memory mapping. We envision that programmers will probably resort to using default configurations that require only minimal interventions on the application code or system settings. In this work, we argue that intelligent, fine-grained data placement can achieve higher performance than default setups. We present an algorithm for data placement on hybrid-memory systems. Our algorithm is based on a set of single-object allocation rules and global data placement decisions. We also present RTHMS, a tool that implements our algorithm and provides recommendations about the object-to-memory mapping. Our experiments on a hybrid-memory system, an Intel Knights Landing processor with DRAM and HBM, show that RTHMS is able to achieve higher performance than the default configuration. We believe that RTHMS will be a valuable tool for programmers working on complex hybrid-memory systems.
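
The record names the algorithm only as per-object allocation rules plus global placement decisions; the decision function below is a much-simplified stand-in with invented rules and thresholds (streamed, hot objects that fit go to HBM; the rest stay in DRAM), not RTHMS's actual rule set or decision tree.

```c
/* A toy rule-based placement decision in the spirit of per-object rules.
 * Attributes and thresholds are invented for illustration only. */
#include <stdbool.h>
#include <stddef.h>

typedef enum { PLACE_HBM, PLACE_DRAM } placement_t;

typedef struct {
    size_t size_bytes;         /* object footprint */
    bool   sequential;         /* mostly streaming access? */
    double accesses_per_byte;  /* access intensity from profiling */
} object_profile_t;

placement_t place_object(const object_profile_t *o, size_t hbm_left) {
    if (o->size_bytes > hbm_left)   return PLACE_DRAM; /* won't fit in HBM */
    if (!o->sequential)             return PLACE_DRAM; /* latency-bound */
    if (o->accesses_per_byte < 1.0) return PLACE_DRAM; /* rarely touched */
    return PLACE_HBM;               /* hot, streamed, fits: use bandwidth */
}
```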

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2017
Keywords
heterogeneous memory systems, data placement, performance metrics
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-217951
DOI: 10.1145/3092255.3092273
ISI: 000414339100009
Scopus ID: 2-s2.0-85029516280
Conference
2017 ACM SIGPLAN International Symposium on Memory Management, ISMM 2017, Barcelona, Spain, June 18, 2017
Note

QC 20211109

Available from: 2017-11-21. Created: 2017-11-21. Last updated: 2024-03-18. Bibliographically approved.
8. The cost of synchronizing imbalanced processes in message passing systems
2015 (English). In: Proceedings - IEEE International Conference on Cluster Computing, ICCC, Institute of Electrical and Electronics Engineers (IEEE), 2015, p. 408-417. Conference paper, Published paper (Refereed).
Abstract [en]

Synchronization in message passing systems is achieved by communication among processes. System and architectural noise and different workloads cause processes to be imbalanced and to reach synchronization points at different times. Thus, both communication and imbalance impact synchronization performance. In this paper, we study the algorithmic properties that allow the communication in synchronization to absorb the initial imbalance among processes. We quantify the imbalance absorption properties of different barrier algorithms using a LogP Monte Carlo simulator. We find that linear and f-way tournament barriers can absorb up to 95% of random exponential imbalance with a standard deviation equal to the communication time for one message. Dissemination, butterfly and pairwise exchange barriers, on the other hand, do not absorb imbalance but can effectively bound the post-barrier imbalance. We identify that synchronization transitions from communication-dominated to imbalance-dominated when the standard deviation of the imbalance distribution exceeds twice the communication time for one message. In our study, f-way tournament barriers provided the best imbalance absorption rate and favorable communication time.
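
The simulator itself is not reproduced in this record; the sketch below is a minimal LogP-flavored Monte Carlo of a linear barrier under exponential imbalance. The parameter values, the serialized-receive approximation at the root, and the exponential sampler are my own illustrative choices.

```c
/* Monte Carlo estimate of linear-barrier completion time under a
 * simplified LogP model: o = per-message overhead, L = latency;
 * processes arrive with exponential imbalance. Illustrative only. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define P 64
#define TRIALS 10000
#define LAT 1.0          /* network latency (model units) */
#define OVH 0.5          /* per-message overhead */
#define MEAN_IMB 1.5     /* mean of the exponential imbalance */

static double exp_rand(double mean) {
    return -mean * log(1.0 - rand() / (RAND_MAX + 1.0));
}

int main(void) {
    double total = 0.0;
    for (int t = 0; t < TRIALS; t++) {
        /* gather phase: root serializes one receive per arriving process */
        double root_free = exp_rand(MEAN_IMB);        /* root's own arrival */
        for (int p = 1; p < P; p++) {
            double msg_at_root = exp_rand(MEAN_IMB) + OVH + LAT;
            root_free = fmax(root_free, msg_at_root) + OVH;
        }
        /* release phase: broadcast modeled as one latency plus overheads */
        total += root_free + OVH + LAT + OVH;
    }
    printf("mean barrier completion: %.3f model units\n", total / TRIALS);
    return 0;
}
```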

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2015
Keywords
Message Passing, Monte Carlo Simulations, Performance Modeling, Synchronization, Cluster computing, Computer architecture, Intelligent systems, Monte Carlo methods, Statistics, Absorption property, Algorithmic properties, Imbalance distributions, Message passing systems, Monte Carlo simulators, Performance Model, Synchronization performance, Synchronization points
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-186851
DOI: 10.1109/CLUSTER.2015.63
ISI: 000378648100052
Scopus ID: 2-s2.0-84959308490
ISBN: 9781467365987
Conference
IEEE International Conference on Cluster Computing, CLUSTER 2015, 8 September 2015 through 11 September 2015
Note

QC 20160615

Available from: 2016-06-15. Created: 2016-05-13. Last updated: 2024-03-18. Bibliographically approved.
9. Idle waves in high-performance computing
2015 (English). In: Physical Review E. Statistical, Nonlinear, and Soft Matter Physics, ISSN 1539-3755, E-ISSN 1550-2376, Vol. 91, no 1, p. 013306. Article in journal (Refereed). Published.
Abstract [en]

The vast majority of parallel scientific applications distribute computation among processes that are in a busy state when computing and in an idle state when waiting for information from other processes. We identify the propagation of idle waves through the processes of scientific applications that rely on local information exchange between neighboring processes. Idle waves are nondispersive and have a phase velocity inversely proportional to the average busy time. The physical mechanism enabling the propagation of idle waves is the local synchronization between two processes due to remote data dependency. This study provides a description of the large number of processes in parallel scientific applications as a continuous medium. This work is also a step towards an understanding of how localized idle periods can affect remote processes, leading to the degradation of global performance in parallel scientific applications.
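
The abstract states the scaling but not the formula; on a hedged reading, if a delay can hop to the next process only once that process finishes its current busy period, an idle wave advances one rank per average busy time, which gives the claimed inverse proportionality:

```latex
% Hedged reading of "phase velocity inversely proportional to busy time":
% one rank per busy period, so with average busy time t_b,
v_{\mathrm{phase}} \approx \frac{1}{t_{b}}
\qquad\Longrightarrow\qquad
v_{\mathrm{phase}} \propto t_{b}^{-1}.
```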

Keywords
Continuous medium, Global performance, High performance computing, Local information, Local synchronizations, Parallel scientific applications, Physical mechanism, Scientific applications
National Category
Physical Sciences
Identifiers
URN: urn:nbn:se:kth:diva-160745
DOI: 10.1103/PhysRevE.91.013306
ISI: 000348330600020
PubMedID: 25679738
Scopus ID: 2-s2.0-84921638180
Note

QC 20150302

Available from: 2015-03-02. Created: 2015-02-27. Last updated: 2024-03-18. Bibliographically approved.
10. Idle period propagation in message-passing applications
2017 (English). In: Proceedings - 18th IEEE International Conference on High Performance Computing and Communications, 14th IEEE International Conference on Smart City and 2nd IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2016, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 937-944, article id 7828475. Conference paper, Published paper (Refereed).
Abstract [en]

Idle periods on different processes of Message Passing applications are unavoidable. While the origin of idle periods on a single process is well understood as the effect of system and architectural random delays, it is unclear how these idle periods propagate from one process to another. Understanding idle period propagation in Message Passing applications is important because it allows application developers to design communication patterns that avoid idle period propagation and the consequent performance degradation in their applications. To understand idle period propagation, we introduce a methodology to trace idle periods when a process is waiting for data from a remote delayed process in MPI applications. We apply this technique to an MPI application that solves the heat equation to study idle period propagation on three different systems. We confirm that idle periods move between processes in the form of waves and that there are different stages in idle period propagation. Our methodology enables us to identify a self-synchronization phenomenon that occurs on two systems where some processes run slower than the others.
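
The paper's tracing methodology is only named here; a minimal way to record idle periods in an MPI application is to timestamp around the blocking receive, as sketched below. The logging format and the threshold are my own choices, not the paper's instrumentation.

```c
/* Tracing idle periods around a blocking receive (illustrative).
 * The measured wait also includes message transfer time; the
 * threshold and log format are assumptions. */
#include <mpi.h>
#include <stdio.h>

#define IDLE_THRESHOLD 1e-6   /* seconds; log only non-trivial waits */

static void traced_recv(void *buf, int count, MPI_Datatype type,
                        int src, int tag, MPI_Comm comm) {
    int rank;
    MPI_Comm_rank(comm, &rank);
    double t0 = MPI_Wtime();
    MPI_Recv(buf, count, type, src, tag, comm, MPI_STATUS_IGNORE);
    double idle = MPI_Wtime() - t0;
    if (idle > IDLE_THRESHOLD)
        printf("rank %d: idle %.6f s waiting on rank %d\n", rank, idle, src);
}
```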

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
Keywords
Idle period propagation, Message Passing applications, Process imbalance, Self-synchronization
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:kth:diva-208451
DOI: 10.1109/HPCC-SmartCity-DSS.2016.0134
ISI: 000401700900123
Scopus ID: 2-s2.0-85013677158
ISBN: 9781509042968
Conference
18th IEEE International Conference on High Performance Computing and Communications, 14th IEEE International Conference on Smart City and 2nd IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2016, Sydney, Australia, 12 December 2016 through 14 December 2016
Note

QC 20170609

Available from: 2017-06-09. Created: 2017-06-09. Last updated: 2024-03-18. Bibliographically approved.
11. The EPiGRAM Project: Preparing Parallel Programming Models for Exascale
2016 (English). In: High Performance Computing: ISC High Performance 2016 International Workshops, Springer, 2016, p. 56-68. Conference paper, Published paper (Refereed).
Abstract [en]

EPiGRAM is a European Commission funded project to improve existing parallel programming models so that large-scale applications can run efficiently on exascale supercomputers. The EPiGRAM project focuses on the two currently dominant petascale programming models, message-passing and PGAS, and on the improvement of two of their associated programming systems, MPI and GASPI. In EPiGRAM, we work on two major aspects of programming systems. First, we improve the performance of communication operations by decreasing memory consumption, improving collective operations and introducing emerging computing models. Second, we enhance the interoperability of message-passing and PGAS by integrating them in one PGAS-based MPI implementation, called EMPI4Re, implementing MPI endpoints and improving GASPI interoperability with MPI. The new EPiGRAM concepts are tested in two large-scale applications: iPIC3D, a Particle-in-Cell code for space physics simulations, and Nek5000, a Computational Fluid Dynamics code.

Place, publisher, year, edition, pages
Springer, 2016
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 9945
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-200050
DOI: 10.1007/978-3-319-46079-6_5
ISI: 000389802700006
Scopus ID: 2-s2.0-84992593489
ISBN: 978-3-319-46079-6, 978-3-319-46078-9
Conference
International Supercomputing Conference (ISC High Performance), June 19-23, 2016, Frankfurt, Germany
Note

QC 20170126

Available from: 2017-01-26. Created: 2017-01-20. Last updated: 2024-03-18. Bibliographically approved.
12. Energetic particles in magnetotail reconnection
2015 (English). In: Journal of Plasma Physics, ISSN 0022-3778, E-ISSN 1469-7807, Vol. 81, article id 325810202. Article in journal (Refereed). Published.
Abstract [en]

We carried out a 3D fully kinetic simulation of Earth's magnetotail magnetic reconnection to study the dynamics of energetic particles. We developed and implemented a new relativistic particle mover in iPIC3D, an implicit Particle-in-Cell code, to correctly model the dynamics of energetic particles. Before the onset of magnetic reconnection, energetic electrons are found localized close to the current sheet and accelerated by the lower hybrid drift instability. During magnetic reconnection, energetic particles are found in the reconnection region along the x-line and in the separatrix regions. The energetic electrons first appear in localized stripes of the separatrices and finally cover all the separatrix surfaces. Along the separatrices, regions with strong electron deceleration are found. In the reconnection region, two categories of electron trajectory are identified: some electrons are trapped in the reconnection region, bouncing a few times between the outflow jets, while others pass through the reconnection region without being trapped. Unlike electrons, energetic ions are localized on the reconnection fronts of the outflow jets.
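
The new relativistic mover itself is not given in this record; for orientation, a standard relativistic Boris-type update (not necessarily the scheme implemented in iPIC3D, whose mover is implicit) advances the momentum u = γv as follows:

```latex
% Standard relativistic Boris push (illustrative; the paper's new
% implicit mover may differ). Here u = \gamma v and q/m is the
% charge-to-mass ratio.
\begin{align*}
 u^{-} &= u^{n} + \tfrac{q\Delta t}{2m}\,E, &
 \gamma^{-} &= \sqrt{1 + |u^{-}|^{2}/c^{2}},\\
 t &= \tfrac{q\Delta t}{2m\gamma^{-}}\,B, &
 u' &= u^{-} + u^{-}\times t,\\
 u^{+} &= u^{-} + \tfrac{2}{1+|t|^{2}}\, u'\times t, &
 u^{n+1} &= u^{+} + \tfrac{q\Delta t}{2m}\,E,\\
 x^{n+1} &= x^{n} + \tfrac{\Delta t}{\gamma^{n+1}}\, u^{n+1}. &&
\end{align*}
```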

National Category
Physical Sciences
Identifiers
URN: urn:nbn:se:kth:diva-166497
DOI: 10.1017/S0022377814001123
ISI: 000352194000020
Scopus ID: 2-s2.0-84925863043
Funder
Swedish Research Council
Note

QC 20150518

Available from: 2015-05-18. Created: 2015-05-11. Last updated: 2024-03-18. Bibliographically approved.
13. The formation of a magnetosphere with implicit Particle-in-Cell simulations
2015 (English). In: Procedia Computer Science, Elsevier, 2015, no 1, p. 1178-1187. Conference paper, Published paper (Refereed).
Abstract [en]

We demonstrate improvements to an implicit Particle-in-Cell code, iPic3D, using the example of a dipolar magnetic field immersed in a plasma flow, and show the formation of a magnetosphere. We address the problem of modelling multi-scale phenomena during the formation of a magnetosphere by implementing an adaptive sub-cycling technique to resolve the motion of particles located close to the magnetic dipole centre, where the magnetic field intensity is maximum. In addition, we implemented new open boundary conditions to model the inflow and outflow of plasma. We present the results of a global three-dimensional Particle-in-Cell simulation and discuss the performance improvements from the adaptive sub-cycling technique.
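
The record names the technique but not its implementation; a common form of sub-cycling, sketched below with invented parameters and stand-in helpers, chooses the number of substeps per particle from the local gyrofrequency so that particles near the dipole (where B is strongest) take proportionally smaller steps.

```c
/* Adaptive sub-cycling of a particle push (illustrative sketch).
 * The substep criterion, helpers, and limits are stand-ins,
 * not iPic3D's actual code. */
#include <math.h>

#define MAX_SUBSTEPS 1024
#define STEPS_PER_GYRATION 10.0   /* resolution target (assumed) */

/* hypothetical single-step mover and field lookup provided elsewhere */
void push_particle(double *x, double *u, double qm, double dt);
double local_B_magnitude(const double *x);

void push_subcycled(double *x, double *u, double qm, double dt) {
    /* gyrofrequency omega_c = |q/m| * |B|; choose substeps so each
       substep covers ~1/STEPS_PER_GYRATION of a gyration period */
    double B = local_B_magnitude(x);
    double omega_c = fabs(qm) * B;
    int nsub = (int)ceil(dt * omega_c * STEPS_PER_GYRATION / (2.0 * M_PI));
    if (nsub < 1) nsub = 1;
    if (nsub > MAX_SUBSTEPS) nsub = MAX_SUBSTEPS;
    double sub_dt = dt / nsub;
    for (int s = 0; s < nsub; s++)
        push_particle(x, u, qm, sub_dt);   /* re-uses the base mover */
}
```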

Place, publisher, year, edition, pages
Elsevier, 2015
Keywords
Magnetosphere, Multi-scale simulations, Particle methods, Particle movers, Particle-in-Cell, Cells, Cytology, Magnetic fields, Magnetism, Magnetoplasma, Particle beam dynamics, Dipolar magnetic fields, Magnetic-field intensity, Multi-scale simulation, Open boundary condition, Particle in cell, Particle-in-cell simulations, Three dimensional particle-in-cell simulations, Plasma simulation
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-176135
DOI: 10.1016/j.procs.2015.05.288
ISI: 000373939100120
Scopus ID: 2-s2.0-84939175406
Conference
International Conference on Computational Science, ICCS 2015
Note

QC 20151202

Available from: 2015-12-02. Created: 2015-11-02. Last updated: 2024-03-18. Bibliographically approved.
14. Kinetic structures of quasi-perpendicular shocks in global particle-in-cell simulations
2015 (English). In: Physics of Plasmas, ISSN 1070-664X, E-ISSN 1089-7674, Vol. 22, no 9, article id 092109. Article in journal (Refereed). Published.
Abstract [en]

We carried out global Particle-in-Cell simulations of the interaction between the solar wind and a magnetosphere to study the kinetic collisionless physics in super-critical quasi-perpendicular shocks. After an initial simulation transient, a collisionless bow shock forms as a result of the interaction of the solar wind and a planet's magnetic dipole. The shock ramp has a thickness of approximately one ion skin depth and is followed by a trailing wave train in the shock downstream. At the downstream edge of the bow shock, whistler waves propagate along the magnetic field lines, and electron cyclotron waves are identified. A small part of the solar wind ion population is specularly reflected by the shock, while a larger part is deflected and heated by the shock. Solar wind ions and electrons are heated in the perpendicular directions. Ions are accelerated in the perpendicular direction in the trailing wave train region. This work is an initial effort to study the electron and ion kinetic effects that develop near the bow shock in a realistic magnetic field configuration.

Place, publisher, year, edition, pages
American Institute of Physics (AIP), 2015
Keywords
Cyclotrons, Electron beams, Ions, Kinetic theory, Kinetics, Magnetic fields, Magnetic levitation vehicles, Magnetism, Magnetosphere, Particle beam dynamics, Plasma shock waves, Solar wind, Electron cyclotron waves, Ion kinetic effect, Kinetic structure, Magnetic field configurations, Magnetic field line, Particle-in-cell simulations, Quasi-perpendicular shocks, Solar-wind ions, Magnetic field effects
National Category
Physical Sciences
Identifiers
URN: urn:nbn:se:kth:diva-177749
DOI: 10.1063/1.4930212
ISI: 000362571800024
Scopus ID: 2-s2.0-84942058186
Note

QC 20151130

Available from: 2015-11-30. Created: 2015-11-25. Last updated: 2024-03-18. Bibliographically approved.

Open Access in DiVA

fulltext: FULLTEXT02.pdf (3568 kB, application/pdf)
