Publications (10 of 65)
SeyyedAghaei Rezaei, S. H., Modarressi, M., Ausavarungnirun, R., Sadrosadati, M., Mutlu, O. & Daneshtalab, M. (2020). NoM: Network-on-Memory for Inter-bank Data Transfer in Highly-banked Memories. IEEE Computer Architecture Letters, 19(1), 80-83, Article ID 9078774.
2020 (English). In: IEEE Computer Architecture Letters, ISSN 1556-6056, Vol. 19, no 1, p. 80-83, article id 9078774. Article in journal (Refereed) Published
Abstract [en]

Data copy is a widely used memory operation in many programs and operating system services. In conventional computers, data copy is often carried out by two separate read and write transactions that pass data back and forth between the memory hierarchy and processor registers. Some prior mechanisms propose to avoid this unnecessary data movement by using the shared internal bus in the DRAM chip to directly copy data between two DRAM banks. While these methods outperform conventional techniques, they do not allow data copy across different DRAM channels. Hence, they offer limited benefit for emerging 3D-stacked memories (such as HMC and HBM) that contain tens of banks across multiple memory controllers. In this paper, we present Network-on-Memory (NoM), a lightweight inter-bank communication scheme that enables direct data copy within memory. NoM adopts a TDM-based circuit-switching design, where circuit setup is done by the memory controller. Compared to previous state-of-the-art approaches, NoM enables both data copy over multiple DRAM channels and concurrent copy operations. Our evaluation shows that NoM improves the performance of data-intensive workloads by 3.8X on average compared to state-of-the-art techniques.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2020
Keywords
3D-Stacked Memory, Circuit Switching, Data Copy, Memory Network, Memory Systems, Data transfer, Dynamic random access storage, Three dimensional integrated circuits, Communication schemes, Conventional computers, Conventional techniques, Data-intensive workloads, Memory controller, Memory operations, State-of-the-art approach, State-of-the-art techniques, Data communication systems
National Category
Embedded Systems
Identifiers
urn:nbn:se:kth:diva-274214 (URN); 10.1109/LCA.2020.2990599 (DOI); 000543277600002 (); 2-s2.0-85084060594 (Scopus ID)
Note

QC 20200716

Available from: 2020-07-06. Created: 2020-07-06. Last updated: 2022-06-26. Bibliographically approved.
Akbari, N., Modarressi, M., Daneshtalab, M. & Loni, E. (2018). A Customized Processing-in-Memory Architecture for Biological Sequence Alignment. In: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors. Paper presented at 29th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2018, 10 July 2018 through 12 July 2018. Institute of Electrical and Electronics Engineers Inc., 2018, Article ID 8445124.
2018 (English). In: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors, Institute of Electrical and Electronics Engineers Inc., 2018, Vol. 2018, article id 8445124. Conference paper, Published paper (Refereed)
Abstract [en]

Sequence alignment is the most widely used operation in bioinformatics. With the exponential growth of biological sequence databases, searching a database to find the optimal alignment for a query sequence (which can be on the order of hundreds of millions of characters long) requires excessive processing power and memory bandwidth. Sequence alignment algorithms can potentially benefit from the processing power of massively parallel processors due to their simple arithmetic operations, coupled with the inherent fine-grained and coarse-grained parallelism that they exhibit. However, the limited memory bandwidth of conventional computing systems prevents them from reaching the maximum achievable speedup. In this paper, we propose a processing-in-memory architecture as a viable solution to the excessive memory bandwidth demand of bioinformatics applications. The design is composed of a set of simple and lightweight processing elements, customized to the sequence alignment algorithm and integrated at the logic layer of an emerging 3D DRAM architecture. Experimental results show that the proposed architecture achieves up to 2.4x speedup and a 41% reduction in power consumption compared to a processor-side parallel implementation.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2018
Series
IEEE International Conference on Communications, ISSN 1063-6862
Keywords
Accelerator, Processing-in-memory, Sequence Alignment, Alignment, Bandwidth, Bioinformatics, Computation theory, Dynamic random access storage, Parallel processing systems, Particle accelerators, Query processing, 3d dram architectures, Bioinformatics applications, Biological sequence alignment, Massive parallel processors, Parallel implementations, Processing in memory, Proposed architectures, Sequence alignments, Memory architecture
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-238006 (URN); 10.1109/ASAP.2018.8445124 (DOI); 000447635800027 (); 2-s2.0-85053445393 (Scopus ID); 9781538674796 (ISBN)
Conference
29th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2018, 10 July 2018 through 12 July 2018
Note

QC 20190115

Available from: 2019-01-15. Created: 2019-01-15. Last updated: 2022-06-26. Bibliographically approved.
Kokhazadeh, M., Kokhazad, Z., Dehyadegari, M. & Daneshtalab, M. (2018). A Novel Two-Step Method for Stereo Vision Algorithm to Reduce Search Space. In: 26th Iranian Conference on Electrical Engineering, ICEE 2018. Paper presented at 26th Iranian Conference on Electrical Engineering, ICEE 2018, 8 May 2018 through 10 May 2018 (pp. 1681-1686). Institute of Electrical and Electronics Engineers Inc.
2018 (English). In: 26th Iranian Conference on Electrical Engineering, ICEE 2018, Institute of Electrical and Electronics Engineers Inc., 2018, p. 1681-1686. Conference paper, Published paper (Refereed)
Abstract [en]

Stereo vision is a crucial algorithm in depth detection. By comparing images of a scene taken from two viewpoints, the relative position of objects is extracted. The human visual system uses this relative shift between the left and right eyes to estimate depth information. The main goal of stereo vision is to determine the distance between objects in the scene or, in other words, to obtain depth information. This paper presents a two-step method that reduces the runtime of the stereo vision algorithm while maintaining its accuracy. Due to data dependency, implementing the method in parallel reduces performance. We have implemented this method for different values of maximum disparity and window size. The simulation results show that the proposed method is more than 6X faster than the common stereo vision algorithm. We have also implemented this method using Compute Unified Device Architecture (CUDA) on a Graphics Processing Unit (GPU), and we have shown that, due to data dependency, this method does not work well on the GPU.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2018
Keywords
CUDA, GPGPU, Real-time application, Stereo vision, Computer graphics, Computer graphics equipment, Graphics processing unit, Program processors, Stereo image processing, Compute Unified Device Architecture(CUDA), Depth information, Graphics Processing Unit (GPU), Relative positions, Stereo vision algorithms
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-247170 (URN); 10.1109/ICEE.2018.8472449 (DOI); 000482783300305 (); 2-s2.0-85055667011 (Scopus ID); 9781538649169 (ISBN)
Conference
26th Iranian Conference on Electrical Engineering, ICEE 2018, 8 May 2018 through 10 May 2018
Note

QC 20190507

Available from: 2019-05-07. Created: 2019-05-07. Last updated: 2022-06-26. Bibliographically approved.
Daneshtalab, M., Ejlali, A. & Kargahi, M. (2018). Special section on design for resilience in cyber-physical systems. In: CSI International Symposium on Real-Time and Embedded Systems and Technologies, RTEST 2018. Paper presented at 2018 CSI International Symposium on Real-Time and Embedded Systems and Technologies, RTEST 2018, Tehran, Iran, 9-10 May 2018. Institute of Electrical and Electronics Engineers (IEEE).
2018 (English). In: CSI International Symposium on Real-Time and Embedded Systems and Technologies, RTEST 2018, Institute of Electrical and Electronics Engineers (IEEE), 2018. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018
National Category
Embedded Systems
Identifiers
urn:nbn:se:kth:diva-314295 (URN); 10.1109/RTEST.2018.8397166 (DOI); 2-s2.0-85050484048 (Scopus ID)
Conference
2018 CSI International Symposium on Real-Time and Embedded Systems and Technologies, RTEST 2018, Tehran, Iran, 9-10 May 2018
Note

Part of proceedings: ISBN 978-1-5386-1475-4

QC 20220620

Available from: 2022-06-20. Created: 2022-06-20. Last updated: 2022-06-25. Bibliographically approved.
Kokhazadeh, M., Kokhazad, Z., Dehyadegari, M. & Daneshtalab, M. (2017). Accelerating stereo vision algorithm using SSE3, AVX2, and CUDA. In: 2017 25th Iranian Conference on Electrical Engineering, ICEE 2017. Paper presented at 25th Iranian Conference on Electrical Engineering, ICEE 2017, K.N. Toosi University of Technology, Tehran, Iran, 2 May 2017 through 4 May 2017 (pp. 2194-2199). Institute of Electrical and Electronics Engineers (IEEE), Article ID 7985426.
2017 (English). In: 2017 25th Iranian Conference on Electrical Engineering, ICEE 2017, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 2194-2199, article id 7985426. Conference paper, Published paper (Refereed)
Abstract [en]

Stereo vision is widely used in robotics, unmanned cars, aerial surveys, and many real-time applications. It also requires computationally expensive calculations because of stereo matching. In real-time applications, the execution time of the stereo vision depth detection algorithm is very important. This paper studies the effect of Intel SIMD instructions and CUDA on reducing the execution time of stereo vision. CUDA and SIMD instructions improve performance by exploiting data-level parallelism. We present a fast implementation of the SSD stereo vision algorithm on Intel processors using SIMD instruction sets (SSE3 and AVX2) and on an NVIDIA Graphics Processing Unit (GPU) using the CUDA language, and compare the results with a serial implementation. The algorithm is applied to different ranges of disparity (from 16 to 256), window size (from 3×3 to 15×15) and image resolution (from 256×212 to 1408×1168). We achieved a rate of 182 frames per second for a disparity of 64 and a window size of 3×3 in CUDA, 64 frames per second in AVX2 and 25 frames per second in SSE3. Experimental results show speedups of up to 5× in SSE3, 10× in AVX2 and 21× in CUDA compared to the serial implementation.
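For orientation, SSD (sum of squared differences) stereo matching with a maximum disparity and a square window can be sketched as follows. This is a minimal, unoptimized serial illustration in Python, not the implementation evaluated in the paper; the function name `ssd_disparity`, its parameters, and the assumption of a rectified grayscale image pair are all hypothetical choices made for this sketch.

```python
def ssd_disparity(left, right, max_disparity, window):
    """Return a disparity map using SSD block matching (illustrative sketch).

    left, right: equal-sized 2D lists of grayscale intensities, assumed rectified.
    max_disparity: number of candidate shifts tested (0 .. max_disparity - 1).
    window: odd side length of the square matching window, e.g. 3 for 3x3.
    """
    height, width = len(left), len(left[0])
    half = window // 2
    disparity = [[0] * width for _ in range(height)]
    # Border pixels without a full window keep disparity 0.
    for y in range(half, height - half):
        for x in range(half, width - half):
            best_cost, best_d = None, 0
            # Column x in the left image is compared with column x - d in the
            # right image; the range is clipped so the window stays in bounds.
            for d in range(min(max_disparity, x - half + 1)):
                cost = 0
                for dy in range(-half, half + 1):
                    for dx in range(-half, half + 1):
                        diff = left[y + dy][x + dx] - right[y + dy][x + dx - d]
                        cost += diff * diff
                # Keep the shift with the smallest sum of squared differences.
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y][x] = best_d
    return disparity
```

The innermost window loops perform the same arithmetic on neighbouring pixels, which is the kind of data-level parallelism that SIMD instructions and GPU threads can exploit.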

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
Keywords
AVX2, CUDA, GPU, Intel SIMD instruction set, SSD, SSE3, Stereo vision
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-217489 (URN); 10.1109/IranianCEE.2017.7985426 (DOI); 000426916500405 (); 2-s2.0-85032838288 (Scopus ID); 9781509059638 (ISBN)
Conference
25th Iranian Conference on Electrical Engineering, ICEE 2017, K.N. Toosi University of Technology, Tehran, Iran, 2 May 2017 through 4 May 2017
Note

QC 20171114

Available from: 2017-11-14. Created: 2017-11-14. Last updated: 2024-03-15. Bibliographically approved.
Rezaei, A., Daneshtalab, M. & Zhao, D. (2017). CAP-W: Congestion-aware platform for wireless-based network-on-chip in many-core era. Microprocessors and microsystems, 52, 23-33
2017 (English). In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 52, p. 23-33. Article in journal (Refereed) Published
Abstract [en]

To meet the ever-increasing demand for high speed and high bandwidth, wireless-based MCSoC is presented based on a NoC communication infrastructure. Separating the communication and computation demands, together with providing flexible topology configurations, makes wireless-based NoC a promising future MCSoC architecture. However, congestion in wireless routers reduces the benefit of high-speed wireless links and significantly increases network latency. Therefore, in this paper, a congestion-aware platform, named CAP-W, is introduced for wireless-based NoC in order to reduce congestion in the network and especially over wireless routers. The triple-layer platform of CAP-W is composed of mapping, migration, and routing layers. In order to minimize the congestion probability, the mapping layer is responsible for selecting a suitable free core as the first candidate, finding a suitable first task to be mapped onto the selected core, and allocating the other tasks with respect to contiguity. Considering the dynamic variation of application behaviors, the migration layer modifies the primary task mapping to relieve congestion. Furthermore, the routing layer balances utilization of the wired and wireless networks by separating short-distance and long-distance communications. Experimental results show a meaningful gain in congestion control of wireless-based NoC compared to state-of-the-art works.

Place, publisher, year, edition, pages
Elsevier, 2017
Keywords
Adaptive Routing Algorithm, Congestion, Dynamic Application Mapping, Dynamic Task Migration, Network-on-Chip, Wireless Network-on-Chip
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-209532 (URN); 10.1016/j.micpro.2017.05.014 (DOI); 000407984000003 (); 2-s2.0-85019713847 (Scopus ID)
Note

QC 20170620

Available from: 2017-06-20. Created: 2017-06-20. Last updated: 2024-03-15. Bibliographically approved.
Hojabr, R., Modarressi, M., Daneshtalab, M., Yasoubi, A. & Khonsari, A. (2017). Customizing Clos Network-on-Chip for Neural Networks. IEEE Transactions on Computers, 66(11), 1865-1877
2017 (English). In: IEEE Transactions on Computers, ISSN 0018-9340, E-ISSN 1557-9956, Vol. 66, no 11, p. 1865-1877. Article in journal (Refereed) Published
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-216597 (URN); 10.1109/TC.2017.2715158 (DOI); 000412566600003 (); 2-s2.0-85021824532 (Scopus ID)
Note

QC 20171116

Available from: 2017-11-16. Created: 2017-11-16. Last updated: 2023-03-28. Bibliographically approved.
Maabi, S., Safaei, F., Rezaei, A., Daneshtalab, M. & Zhao, D. (2017). ERFAN: Efficient reconfigurable fault-tolerant deflection routing algorithm for 3-D Network-on-Chip. In: International System on Chip Conference. Paper presented at 29th IEEE International System on Chip Conference, SOCC 2016, 6 September 2016 through 9 September 2016 (pp. 306-311). IEEE Computer Society.
2017 (English). In: International System on Chip Conference, IEEE Computer Society, 2017, p. 306-311. Conference paper (Refereed)
Abstract [en]

With shrinking transistor dimensions and growing circuit complexity, Three-Dimensional Network-on-Chip (3-D NoC) is presented as a promising solution in the electronics industry. As the number of system components on a chip increases, so does the probability of failure. Therefore, proposing fault tolerance mechanisms is an important goal in emerging technologies. In this paper, two efficient fault-tolerant routing algorithms for 3-D NoC are presented. The presented algorithms significantly improve performance parameters in exchange for a small area overhead. Simulation results show that, even in the presence of faults, network latency is decreased in comparison with state-of-the-art works. In addition, network reliability is improved.

Place, publisher, year, edition, pages
IEEE Computer Society, 2017
Keywords
3-D NoC, Deflection Routing Algorithm, Fault Tolerance, Reliability, TSV, Distributed computer systems, Fault tolerant computer systems, Network architecture, Programmable logic controllers, Routers, Routing algorithms, Servers, Three dimensional integrated circuits, Deflection routings, Electronic industries, Emerging technologies, Fault tolerance mechanisms, Fault-tolerant routing algorithm, Performance parameters, Probability of failure, Three-dimensional networks, Network-on-chip
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-216529 (URN); 10.1109/SOCC.2016.7905497 (DOI); 000403576000054 (); 2-s2.0-85019108151 (Scopus ID); 9781509013661 (ISBN)
Conference
29th IEEE International System on Chip Conference, SOCC 2016, 6 September 2016 through 9 September 2016
Note

QC 20171201

Available from: 2017-12-01. Created: 2017-12-01. Last updated: 2024-03-15. Bibliographically approved.
Majd, A., Sahebi, G., Daneshtalab, M., Plosila, J. & Tenhunen, H. (2017). Hierarchal Placement of Smart Mobile Access Points in Wireless Sensor Networks Using Fog Computing. In: Proceedings - 2017 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017. Paper presented at 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017, 6 March 2017 through 8 March 2017 (pp. 176-180). Institute of Electrical and Electronics Engineers Inc.
2017 (English). In: Proceedings - 2017 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017, Institute of Electrical and Electronics Engineers Inc., 2017, p. 176-180. Conference paper, Published paper (Refereed)
Abstract [en]

Recent advances in computing and sensor technologies have facilitated the emergence of increasingly sophisticated and complex cyber-physical systems and wireless sensor networks. Moreover, integration of cyber-physical systems and wireless sensor networks with other contemporary technologies, such as unmanned aerial vehicles (i.e. drones) and fog computing, enables the creation of completely new smart solutions. Building upon the concept of a Smart Mobile Access Point (SMAP), which is a key element of a smart network, we propose a novel hierarchical placement strategy for SMAPs to improve the scalability of SMAP-based monitoring systems. SMAPs predict communication behavior based on information collected from the network, and select the best approach to support the network at any given time. In order to improve network performance, they can autonomously change their positions. Therefore, the placement of SMAPs plays an important role in such systems. Initial placement of SMAPs is an NP problem. We solve it using a parallel implementation of the genetic algorithm with an efficient evaluation phase. The adopted hierarchical placement approach is scalable: it enables the construction of arbitrarily large SMAP-based systems.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2017
Keywords
cyber-physical systems, evolutionary computing, fog computing wireless sensor networks, genetic algorithms, multi-objective optimization, multi-population, parallel approaches, parallel programming, placement, smart mobile access point, Cyber Physical System, Embedded systems, Fog, Hierarchical systems, Multiobjective optimization, Optimization, Scalability, Unmanned aerial vehicles (UAV), Mobile access, Multi population, Wireless sensor networks
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-216520 (URN); 10.1109/PDP.2017.27 (DOI); 000403395100022 (); 2-s2.0-85019635352 (Scopus ID); 9781509060580 (ISBN)
Conference
25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017, 6 March 2017 through 8 March 2017
Note

QC 20171201

Available from: 2017-12-01. Created: 2017-12-01. Last updated: 2022-06-26. Bibliographically approved.
Rezaei, A., Zhao, D., Daneshtalab, M. & Zhou, H. (2017). Multi-objective Task Mapping Approach for Wireless NoC in Dark Silicon Age. In: Proceedings - 2017 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017. Paper presented at 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017, 6 March 2017 through 8 March 2017 (pp. 589-592). Institute of Electrical and Electronics Engineers Inc.
2017 (English). In: Proceedings - 2017 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017, Institute of Electrical and Electronics Engineers Inc., 2017, p. 589-592. Conference paper, Published paper (Refereed)
Abstract [en]

Hybrid Wireless Network-on-Chip (HWNoC) provides high bandwidth, low latency and flexible topology configurations, making this emerging technology a scalable communication fabric for future Many-Core System-on-Chips (MCSoCs). On the other hand, dark silicon is dominating the chip footage of upcoming MCSoCs, since Dennard scaling fails due to the voltage scaling problem that results in higher power densities. Moreover, congestion avoidance and hot-spot prevention are two important challenges of HWNoC-based MCSoCs in the dark silicon age. Therefore, in this paper, a novel task mapping approach for HWNoC is introduced in order to, first, balance the usage of wireless links by avoiding congestion over wireless routers and, second, spread temperature across the whole chip by utilizing dark silicon. Simulation results show significant improvement in both congestion and temperature control of the system compared to state-of-the-art works.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2017
Keywords
Dark Silicon, Mapping, Temperature, Wireless NoC, Distributed computer systems, Routers, Silicon, System-on-chip, Voltage scaling, Wireless interconnects, Congestion avoidance, Dark silicons, Emerging technologies, Flexible topology, Hybrid wireless networks, Scalable communication, State of the art, Wireless routers, Network-on-chip
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-216516 (URN); 10.1109/PDP.2017.12 (DOI); 000403395100089 (); 2-s2.0-85019625866 (Scopus ID); 9781509060580 (ISBN)
Conference
25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017, 6 March 2017 through 8 March 2017
Note

Funding details: 1441695, NSF; Funding details: 1533656, NSF; Funding text: This work is partially supported by NSF under 1441695 and 1533656.

QC 20171201

Available from: 2017-12-01. Created: 2017-12-01. Last updated: 2022-06-26. Bibliographically approved.
Organisations
Identifiers
ORCID iD: orcid.org/0000-0001-6289-1521