Publications (10 of 18)
Akbari, N., Modarressi, M., Daneshtalab, M. & Loni, E. (2018). A Customized Processing-in-Memory Architecture for Biological Sequence Alignment. In: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors. Paper presented at 29th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2018, 10 July 2018 through 12 July 2018. Institute of Electrical and Electronics Engineers Inc., 2018, Article ID 8445124.
A Customized Processing-in-Memory Architecture for Biological Sequence Alignment
2018 (English) In: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors, Institute of Electrical and Electronics Engineers Inc., 2018, Vol. 2018, article id 8445124. Conference paper, Published paper (Refereed)
Abstract [en]

Sequence alignment is the most widely used operation in bioinformatics. With the exponential growth of biological sequence databases, searching a database for the optimal alignment of a query sequence (which can be on the order of hundreds of millions of characters long) requires excessive processing power and memory bandwidth. Sequence alignment algorithms can potentially benefit from massively parallel processors, owing to their simple arithmetic operations and the inherent fine-grained and coarse-grained parallelism they exhibit. However, the limited memory bandwidth of conventional computing systems prevents them from reaching the maximum achievable speedup. In this paper, we propose a processing-in-memory architecture as a viable solution to the excessive memory bandwidth demand of bioinformatics applications. The design is composed of a set of simple, lightweight processing elements, customized to the sequence alignment algorithm and integrated into the logic layer of an emerging 3D DRAM architecture. Experimental results show that the proposed architecture achieves up to 2.4x speedup and a 41% reduction in power consumption compared to a processor-side parallel implementation.
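The abstract does not name the alignment algorithm or describe the processing-element datapath. As a hedged illustration only, the sketch below computes a Smith-Waterman local-alignment score with a linear gap penalty (a common choice for this kind of accelerator, assumed here rather than confirmed by the paper). It fills the score matrix anti-diagonal by anti-diagonal, because each cell on one anti-diagonal depends only on the two preceding ones; that independence is one source of the fine-grained parallelism the abstract mentions. The function name and the scoring parameters are hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local-alignment score of a and b (linear gap penalty).

    Hypothetical illustration; the scoring scheme is an assumption.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    # Sweep anti-diagonals d = i + j. Every cell on diagonal d depends only on
    # diagonals d-1 and d-2, so the inner loop's iterations are independent
    # and could be assigned to separate processing elements.
    for d in range(2, rows + cols - 1):
        for i in range(max(1, d - cols + 1), min(rows, d)):
            j = d - i
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            H[i][j] = max(0, diag, up, left)  # 0 restarts the local alignment
            best = max(best, H[i][j])
    return best
```

For example, `smith_waterman("GATTACA", "TTAC")` scores the exact local match `TTAC` as 4 matches at +2 each.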

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2018
Series
Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors, ISSN 1063-6862
Keywords
Accelerator, Processing-in-memory, Sequence Alignment, Alignment, Bandwidth, Bioinformatics, Computation theory, Dynamic random access storage, Parallel processing systems, Particle accelerators, Query processing, 3D DRAM architectures, Bioinformatics applications, Biological sequence alignment, Massive parallel processors, Parallel implementations, Processing in memory, Proposed architectures, Sequence alignments, Memory architecture
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-238006 (URN); 10.1109/ASAP.2018.8445124 (DOI); 000447635800027 (ISI); 2-s2.0-85053445393 (Scopus ID); 9781538674796 (ISBN)
Conference
29th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2018, 10 July 2018 through 12 July 2018
Note

QC 20190115

Available from: 2019-01-15; Created: 2019-01-15; Last updated: 2019-05-13; Bibliographically approved
Kokhazadeh, M., Kokhazad, Z., Dehyadegari, M. & Daneshtalab, M. (2018). A Novel Two-Step Method for Stereo Vision Algorithm to Reduce Search Space. In: 26th Iranian Conference on Electrical Engineering, ICEE 2018. Paper presented at 26th Iranian Conference on Electrical Engineering, ICEE 2018, 8 May 2018 through 10 May 2018 (pp. 1681-1686). Institute of Electrical and Electronics Engineers Inc.
A Novel Two-Step Method for Stereo Vision Algorithm to Reduce Search Space
2018 (English) In: 26th Iranian Conference on Electrical Engineering, ICEE 2018, Institute of Electrical and Electronics Engineers Inc., 2018, p. 1681-1686. Conference paper, Published paper (Refereed)
Abstract [en]

Stereo vision is a crucial algorithm for depth detection. By comparing images of a scene taken from two viewpoints, the relative positions of objects are extracted. The human visual system uses this relative shift between the left and right eyes to estimate depth. The main goal of stereo vision is to determine the distance between objects in the scene or, in other words, to obtain depth information. This paper presents a two-step method that reduces the runtime of the stereo vision algorithm while maintaining its accuracy. Because of data dependencies, implementing the method in parallel reduces performance. We have implemented the method for different values of maximum disparity and window size. Simulation results show that the proposed method is more than 6x faster than conventional stereo vision. We have also implemented the method using Compute Unified Device Architecture (CUDA) on a Graphics Processing Unit (GPU), and we show that, due to data dependencies, the method does not perform well on the GPU.
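The abstract does not describe the two steps of the proposed method, so the sketch below shows only a conventional exhaustive baseline of the kind such methods are compared against, assuming sum-of-absolute-differences (SAD) block matching on rectified grayscale images stored as lists of rows. Its cost grows with the number of pixels, the maximum disparity, and the square of the window size, which are the two parameters the abstract varies. The function `sad_disparity` and its arguments are hypothetical.

```python
def sad_disparity(left, right, max_disp, win):
    """Brute-force SAD block matching (hypothetical baseline, not the paper's method).

    For each left-image pixel, test every disparity d in [0, max_disp] and keep
    the d whose (2*win+1) x (2*win+1) window has the smallest sum of absolute
    differences against the right image. Border pixels are left at 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(win, h - win):
        for x in range(win, w - win):
            best_cost, best_d = None, 0
            # The candidate match lies at column x - d on the same row;
            # cap d so the right-image window stays inside the image.
            for d in range(0, min(max_disp, x - win) + 1):
                cost = 0
                for dy in range(-win, win + 1):
                    for dx in range(-win, win + 1):
                        cost += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```

In a synthetic test where the left image is the right image shifted two columns to the right, interior pixels recover a disparity of 2.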

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2018
Keywords
CUDA, GPGPU, Real-time application, Stereo vision, Computer graphics, Computer graphics equipment, Graphics processing unit, Program processors, Stereo image processing, Compute Unified Device Architecture (CUDA), Depth information, Graphics Processing Unit (GPU), Relative positions, Stereo vision algorithms
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-247170 (URN); 10.1109/ICEE.2018.8472449 (DOI); 2-s2.0-85055667011 (Scopus ID); 9781538649169 (ISBN)
Conference
26th Iranian Conference on Electrical Engineering, ICEE 2018, 8 May 2018 through 10 May 2018
Note

QC 20190507

Available from: 2019-05-07; Created: 2019-05-07; Last updated: 2019-05-07; Bibliographically approved
Kokhazadeh, M., Kokhazad, Z., Dehyadegari, M. & Daneshtalab, M. (2018). A novel two-step method for stereo vision to reduce search space. In: 26th Iranian Conference on Electrical Engineering (ICEE 2018). Paper presented at 26th Iranian Conference on Electrical Engineering (ICEE), 8-10 May 2018, Mashhad, Iran (pp. 1681-1686). IEEE
A novel two-step method for stereo vision to reduce search space
2018 (English) In: 26th Iranian Conference on Electrical Engineering (ICEE 2018), IEEE, 2018, p. 1681-1686. Conference paper, Published paper (Refereed)
Abstract [en]

Stereo vision is a crucial algorithm for depth detection. By comparing images of a scene taken from two viewpoints, the relative positions of objects are extracted. The human visual system uses this relative shift between the left and right eyes to estimate depth. The main goal of stereo vision is to determine the distance between objects in the scene or, in other words, to obtain depth information. This paper presents a two-step method that reduces the runtime of the stereo vision algorithm while maintaining its accuracy. Because of data dependencies, implementing the method in parallel reduces performance. We have implemented the method for different values of maximum disparity and window size. Simulation results show that the proposed method is more than 6x faster than conventional stereo vision. We have also implemented the method using Compute Unified Device Architecture (CUDA) on a Graphics Processing Unit (GPU), and we show that, due to data dependencies, the method does not perform well on the GPU.

Place, publisher, year, edition, pages
IEEE, 2018
Series
Iranian Conference on Electrical Engineering, ISSN 2164-7054
Keywords
Stereo vision, CUDA, GPGPU, Real-time application
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-260236 (URN); 10.1109/ICEE.2018.8472449 (DOI); 000482783300305 (ISI); 2-s2.0-85055667011 (Scopus ID); 978-1-5386-4916-9 (ISBN)
Conference
26th Iranian Conference on Electrical Engineering (ICEE), 8-10 May 2018, Mashhad, Iran
Note

QC 20190927

Available from: 2019-09-27; Created: 2019-09-27; Last updated: 2019-09-27; Bibliographically approved
Hojabr, R., Modarressi, M., Daneshtalab, M., Yasoubi, A. & Khonsari, A. (2017). Customizing Clos Network-on-Chip for Neural Networks. IEEE Transactions on Computers, 66(11), 1865-1877
Customizing Clos Network-on-Chip for Neural Networks
2017 (English) In: IEEE Transactions on Computers, ISSN 0018-9340, E-ISSN 1557-9956, Vol. 66, no 11, p. 1865-1877. Article in journal (Refereed) Published
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-216597 (URN); 10.1109/TC.2017.2715158 (DOI); 000412566600003 (ISI); 2-s2.0-85021824532 (Scopus ID)
Note

QC 20171116

Available from: 2017-11-16; Created: 2017-11-16; Last updated: 2018-01-13; Bibliographically approved
Majd, A., Sahebi, G., Daneshtalab, M., Plosila, J. & Tenhunen, H. (2017). Hierarchal Placement of Smart Mobile Access Points in Wireless Sensor Networks Using Fog Computing. In: Proceedings - 2017 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017. Paper presented at 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017, 6 March 2017 through 8 March 2017 (pp. 176-180). Institute of Electrical and Electronics Engineers Inc.
Hierarchal Placement of Smart Mobile Access Points in Wireless Sensor Networks Using Fog Computing
2017 (English) In: Proceedings - 2017 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017, Institute of Electrical and Electronics Engineers Inc., 2017, p. 176-180. Conference paper (Refereed)
Abstract [en]

Recent advances in computing and sensor technologies have facilitated the emergence of increasingly sophisticated and complex cyber-physical systems and wireless sensor networks. Moreover, integrating cyber-physical systems and wireless sensor networks with other contemporary technologies, such as unmanned aerial vehicles (i.e., drones) and fog computing, enables the creation of completely new smart solutions. Building on the concept of a Smart Mobile Access Point (SMAP), a key element of a smart network, we propose a novel hierarchical placement strategy for SMAPs that improves the scalability of SMAP-based monitoring systems. SMAPs predict communication behavior from information collected from the network and select the best approach to support the network at any given time. To improve network performance, they can autonomously change their positions; the placement of SMAPs therefore plays an important role in such systems. Initial placement of SMAPs is an NP problem. We solve it using a parallel implementation of the genetic algorithm with an efficient evaluation phase. The adopted hierarchical placement approach is scalable: it enables the construction of arbitrarily large SMAP-based systems.
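The paper solves SMAP placement with a parallel genetic algorithm whose encoding and objectives the abstract does not give. The sketch below is a deliberately small, sequential, single-objective stand-in, not the authors' method: it places `k` access points in a unit square to minimize the worst distance from any sensor to its nearest access point, using tournament selection, uniform crossover, Gaussian mutation, and elitism. The fitness function, the names `coverage_cost` and `ga_place`, and all parameter values are hypothetical. The per-individual evaluation loop is the part a parallel GA would typically distribute across workers.

```python
import math
import random


def coverage_cost(aps, sensors):
    """Hypothetical fitness: worst sensor-to-nearest-access-point distance."""
    return max(min(math.dist(s, a) for a in aps) for s in sensors)


def ga_place(sensors, k, pop_size=30, generations=100, mut_sigma=0.05, seed=0):
    """Toy GA placing k access points in the unit square (hypothetical sketch)."""
    rng = random.Random(seed)

    def random_individual():
        return [(rng.random(), rng.random()) for _ in range(k)]

    def mutate(ind):
        # Gaussian jitter on every coordinate, clamped to the unit square
        return [(min(1.0, max(0.0, x + rng.gauss(0, mut_sigma))),
                 min(1.0, max(0.0, y + rng.gauss(0, mut_sigma)))) for x, y in ind]

    def crossover(p1, p2):
        # uniform crossover: each access point is inherited from either parent
        return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]

    def tournament(pop, costs):
        # binary tournament: the lower-cost of two random individuals wins
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[i] if costs[i] <= costs[j] else pop[j]

    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # evaluation phase: independent per individual, hence parallelizable
        costs = [coverage_cost(ind, sensors) for ind in pop]
        elite = pop[min(range(pop_size), key=costs.__getitem__)]
        # elitism keeps the best individual, so the best cost never increases
        pop = [elite] + [mutate(crossover(tournament(pop, costs), tournament(pop, costs)))
                         for _ in range(pop_size - 1)]
    costs = [coverage_cost(ind, sensors) for ind in pop]
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]
```

Because of elitism, running more generations from the same seed can only match or improve the best cost of the initial population.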

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2017
Keywords
cyber-physical systems, evolutionary computing, fog computing, wireless sensor networks, genetic algorithms, multi-objective optimization, multi-population, parallel approaches, parallel programming, placement, smart mobile access point, Cyber Physical System, Embedded systems, Fog, Hierarchical systems, Multiobjective optimization, Optimization, Scalability, Unmanned aerial vehicles (UAV), Mobile access, Multi population, Wireless sensor networks
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-216520 (URN); 10.1109/PDP.2017.27 (DOI); 000403395100022 (ISI); 2-s2.0-85019635352 (Scopus ID); 9781509060580 (ISBN)
Conference
25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017, 6 March 2017 through 8 March 2017
Note

QC 20171201

Available from: 2017-12-01; Created: 2017-12-01; Last updated: 2017-12-01; Bibliographically approved
Rezaei, A., Zhao, D., Daneshtalab, M. & Zhou, H. (2017). Multi-objective Task Mapping Approach for Wireless NoC in Dark Silicon Age. In: Proceedings - 2017 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017. Paper presented at 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017, 6 March 2017 through 8 March 2017 (pp. 589-592). Institute of Electrical and Electronics Engineers Inc.
Multi-objective Task Mapping Approach for Wireless NoC in Dark Silicon Age
2017 (English) In: Proceedings - 2017 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017, Institute of Electrical and Electronics Engineers Inc., 2017, p. 589-592. Conference paper (Refereed)
Abstract [en]

Hybrid Wireless Network-on-Chip (HWNoC) provides high bandwidth, low latency, and flexible topology configurations, making this emerging technology a scalable communication fabric for future Many-Core Systems-on-Chip (MCSoCs). On the other hand, dark silicon is coming to dominate the chip area of upcoming MCSoCs, since Dennard scaling fails due to the voltage scaling problem, which results in higher power densities. Moreover, congestion avoidance and hot-spot prevention are two important challenges for HWNoC-based MCSoCs in the dark silicon age. Therefore, this paper introduces a novel task mapping approach for HWNoC that, first, balances the usage of wireless links by avoiding congestion at wireless routers and, second, spreads temperature across the whole chip by utilizing dark silicon. Simulation results show significant improvement in both congestion and temperature control of the system compared to state-of-the-art works.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2017
Keywords
Dark Silicon, Mapping, Temperature, Wireless NoC, Distributed computer systems, Routers, Silicon, System-on-chip, Voltage scaling, Wireless interconnects, Congestion avoidance, Dark silicons, Emerging technologies, Flexible topology, Hybrid wireless networks, Scalable communication, State of the art, Wireless routers, Network-on-chip
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-216516 (URN); 10.1109/PDP.2017.12 (DOI); 000403395100089 (ISI); 2-s2.0-85019625866 (Scopus ID); 9781509060580 (ISBN)
Conference
25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2017, 6 March 2017 through 8 March 2017
Note

Funding details: 1441695, NSF, National Science Foundation; Funding details: 1533656, NSF, National Science Foundation; Funding text: This work is partially supported by NSF under 1441695 and 1533656.

QC 20171201

Available from: 2017-12-01; Created: 2017-12-01; Last updated: 2017-12-01; Bibliographically approved
Momenzadeh, E., Modarressi, M., Mazloumi, A. & Daneshtalab, M. (2017). Parallel forwarding for efficient bandwidth utilization in networks-on-chip. In: 30th International Conference on Architecture of Computing Systems, ARCS 2017. Paper presented at 30th International Conference on Architecture of Computing Systems, ARCS 2017, 3 April 2017 through 6 April 2017 (pp. 152-163). Springer Verlag
Parallel forwarding for efficient bandwidth utilization in networks-on-chip
2017 (English) In: 30th International Conference on Architecture of Computing Systems, ARCS 2017, Springer Verlag, 2017, p. 152-163. Conference paper, Published paper (Refereed)
Abstract [en]

Networks-on-chip (NoCs) provide a scalable and power-efficient communication infrastructure for a wide range of computing chips, from fully customized multi/many-processor systems-on-chip (MPSoCs) to general-purpose chip multiprocessors (CMPs). A common aspect of almost all NoC workloads is the varying size of the data transmitted by each transaction: while large data blocks are transferred as multi-flit packets, part of the traffic consists of short data segments (control data) that do not even fill a single flit. In conventional NoCs, the switch allocator grants a switch output (and the link connected to it) to a single flit in each cycle, even if the flit is shorter than the link bit-width. In this paper, we propose a novel NoC architecture that enables routers to send two short flits simultaneously on the same link, utilizing link bandwidth that would otherwise be wasted. To this end, new crossbar, virtual channel (VC), and switch allocator architectures are presented to support parallel forwarding of short packets on NoC links. Simulation results using synthetic and realistic workloads show that the proposed architecture improves NoC performance by up to 24%.
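As a simplified model of the idea, not of the paper's router microarchitecture, the sketch below counts how many link cycles a queue of flits needs when each cycle carries exactly one flit, versus when two consecutive flits whose combined width fits within the link may share a cycle. In the proposed router the pairing is decided by the new switch allocator across inputs and virtual channels; pairing only neighbors in a single queue is an assumption made for brevity. The function name and the flit sizes are hypothetical.

```python
def link_cycles(flit_bits, link_width, pair_short=False):
    """Count link cycles needed to send flits in FIFO order (hypothetical model).

    Conventional (pair_short=False): one flit per cycle, however short it is.
    Paired (pair_short=True): two consecutive flits share one cycle when their
    combined size fits within the link width.
    """
    cycles, i = 0, 0
    while i < len(flit_bits):
        if (pair_short and i + 1 < len(flit_bits)
                and flit_bits[i] + flit_bits[i + 1] <= link_width):
            i += 2  # both short flits traverse the link in the same cycle
        else:
            i += 1  # a single flit occupies the whole link this cycle
        cycles += 1
    return cycles
```

For the hypothetical trace `[128, 32, 32, 128, 48, 64]` on a 128-bit link, the conventional count is 6 cycles and the paired count is 4, because the two 32-bit flits and the 48/64-bit flits each share a cycle.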

Place, publisher, year, edition, pages
Springer Verlag, 2017
Keywords
Bandwidth utilization, Heterogeneous packet size, Network-on-Chip, Bandwidth, Communication channels (information theory), Computer architecture, Network architecture, Routers, Servers, System-on-chip, Band-width utilization, Efficient bandwidth, General purpose chips, NoC architectures, Packet size, Parallel forwarding, Power-efficient communications, Proposed architectures
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-207432 (URN); 10.1007/978-3-319-54999-6_12 (DOI); 2-s2.0-85014843599 (Scopus ID); 9783319549989 (ISBN)
Conference
30th International Conference on Architecture of Computing Systems, ARCS 2017, 3 April 2017 through 6 April 2017
Note

QC 20170523

Available from: 2017-05-23; Created: 2017-05-23; Last updated: 2017-05-23; Bibliographically approved
Rezaei, A., Daneshtalab, M., Zhao, D. & Modarressi, M. (2017). SAMi: Self-aware migration approach for congestion reduction in NoC-based MCSoC. In: International System on Chip Conference. Paper presented at 29th IEEE International System on Chip Conference, SOCC 2016, 6 September 2016 through 9 September 2016 (pp. 145-150). IEEE Computer Society
SAMi: Self-aware migration approach for congestion reduction in NoC-based MCSoC
2017 (English) In: International System on Chip Conference, IEEE Computer Society, 2017, p. 145-150. Conference paper (Refereed)
Abstract [en]

Many-Core Systems-on-Chip (MCSoCs) require an efficient task migration approach to reach system performance objectives such as load balancing, communication optimization, fault tolerance, and temperature control. In this paper, an efficient self-aware migration approach is introduced for NoC-based MCSoCs, using a centralized feedback controller to control congestion across the system. The proposed approach is divided into four main steps: predicting the behavior of the application, defining reliable triggers to initiate task migration, introducing cost comparison functions, and presenting a streamlined control mechanism to migrate tasks. Experimental results affirm that the proposed self-aware migration approach helps achieve significant throughput and system utilization while efficiently controlling system congestion.
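Of the four steps listed above, the sketch below illustrates only two, a migration trigger and a cost comparison, under a hypothetical linear cost model: a task leaves its core only when that core's congestion exceeds a threshold and the expected congestion relief, scaled by `gain_per_unit`, outweighs the cost of moving the task's state. The `Task` type, the congestion values, and all parameters are assumptions for illustration; the paper's behavior predictor and feedback controller are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    core: int
    state_bytes: int  # hypothetical: data that must move if the task migrates


def migration_decision(task: Task, congestion: List[float], threshold: float,
                       cost_per_byte: float, gain_per_unit: float) -> Optional[int]:
    """Return a target core for `task`, or None to keep it where it is."""
    # Trigger: act only when the task's current core is congested.
    if congestion[task.core] <= threshold:
        return None
    # Candidate target: the least congested core.
    target = min(range(len(congestion)), key=congestion.__getitem__)
    # Cost comparison: expected relief versus the cost of moving the state.
    gain = (congestion[task.core] - congestion[target]) * gain_per_unit
    cost = task.state_bytes * cost_per_byte
    return target if gain > cost else None
```

With congestion `[0.9, 0.2, 0.5]` and a threshold of 0.7, a task on core 0 with a small state moves to core 1, while the same task with a large enough state stays put because moving it costs more than it relieves.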

Place, publisher, year, edition, pages
IEEE Computer Society, 2017
Keywords
Congestion, Feedback Controller, MCSoC, NoC, Performance, Task Migration, Controllers, Distributed computer systems, Fault tolerance, Feedback, Feedback control, Functions, Programmable logic controllers, Semiconductor device manufacture, System-on-chip, Traffic congestion, Network-on-chip
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-216528 (URN); 10.1109/SOCC.2016.7905455 (DOI); 000403576000026 (ISI); 2-s2.0-85019114536 (Scopus ID); 9781509013661 (ISBN)
Conference
29th IEEE International System on Chip Conference, SOCC 2016, 6 September 2016 through 9 September 2016
Note

QC 20171201

Available from: 2017-12-01; Created: 2017-12-01; Last updated: 2017-12-01; Bibliographically approved
Daneshtalab, M. & Palesi, M. (Eds.). (2016). Message from the chairs. Paper presented at 4th ACM International Workshop on Many-Core Embedded Systems, MES 2016; Seoul; South Korea; 19 June 2016. Association for Computing Machinery, 18-22-June-2016
Message from the chairs
2016 (English) Conference proceedings (editor) (Refereed)
Place, publisher, year, edition, pages
Association for Computing Machinery, 2016
Series
ACM International Conference Proceeding Series
National Category
Embedded Systems
Identifiers
urn:nbn:se:kth:diva-197220 (URN); 2-s2.0-84991051936 (Scopus ID)
Conference
4th ACM International Workshop on Many-Core Embedded Systems, MES 2016; Seoul; South Korea; 19 June 2016
Note

QC 20161207

Available from: 2016-12-07; Created: 2016-11-30; Last updated: 2016-12-07; Bibliographically approved
Majd, A., Abdollahi, M., Sahebi, G., Abdollahi, D., Daneshtalab, M., Plosila, J. & Tenhunen, H. (2016). Multi-Population Parallel Imperialist Competitive Algorithm for Solving Systems of Nonlinear Equations. In: 2016 International Conference on High Performance Computing & Simulation (HPCS 2016). Paper presented at 14th International Conference on High Performance Computing & Simulation (HPCS), 18-22 July 2016, Innsbruck, Austria (pp. 767-775). IEEE
Multi-Population Parallel Imperialist Competitive Algorithm for Solving Systems of Nonlinear Equations
2016 (English) In: 2016 International Conference on High Performance Computing & Simulation (HPCS 2016), IEEE, 2016, p. 767-775. Conference paper, Published paper (Refereed)
Abstract [en]

The widespread importance of optimization and of solving NP-hard problems, such as systems of nonlinear equations, is indisputable across a diverse range of sciences. Nonlinear equations are used widely, with applications in economics, engineering, chemistry, mechanics, medicine, and robotics. There are various methods for solving systems of nonlinear equations; one of the most popular is Evolutionary Computing (EC). This paper presents an evolutionary algorithm, the Parallel Imperialist Competitive Algorithm (PICA), which uses a multi-population technique to solve systems of nonlinear equations. To demonstrate the efficiency of the proposed approach, several well-known problems are used. The results indicate that PICA has a high success rate and a fast convergence rate.
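A minimal sketch of the underlying single-population, sequential Imperialist Competitive Algorithm, applied to a system of nonlinear equations recast as minimizing the sum of squared residuals. It is not the paper's PICA: the multi-population and parallel aspects are omitted, colonies are dealt to empires round-robin, and the winner of the imperialistic competition is picked uniformly at random rather than in proportion to empire power. The test system, the function names `residual_cost` and `ica`, and every parameter value are hypothetical.

```python
import random


def residual_cost(x):
    """Sum of squared residuals for a hypothetical test system:
    x0^2 + x1^2 - 4 = 0 and x0 - x1 = 0 (roots at x0 = x1 = +/- sqrt(2))."""
    f1 = x[0] ** 2 + x[1] ** 2 - 4.0
    f2 = x[0] - x[1]
    return f1 * f1 + f2 * f2


def ica(cost, dim, lo, hi, n_countries=40, n_imp=4, iters=200,
        beta=2.0, p_revolution=0.1, xi=0.1, seed=0):
    """Simplified single-population ICA (hypothetical sketch)."""
    rng = random.Random(seed)

    def rand_country():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    countries = sorted((rand_country() for _ in range(n_countries)), key=cost)
    imps = countries[:n_imp]  # lowest-cost countries become imperialists
    colonies = [countries[n_imp + e::n_imp] for e in range(n_imp)]  # round-robin
    best = imps[0]

    for _ in range(iters):
        for e in range(len(imps)):
            for k, col in enumerate(colonies[e]):
                # assimilation: move each coordinate toward the imperialist
                new = [c + beta * rng.random() * (m - c) for c, m in zip(col, imps[e])]
                if rng.random() < p_revolution:  # revolution: random restart
                    new = rand_country()
                new = [min(hi, max(lo, v)) for v in new]
                if cost(new) < cost(imps[e]):  # colony overthrows its imperialist
                    colonies[e][k], imps[e] = imps[e], new
                else:
                    colonies[e][k] = new
        # imperialistic competition: the weakest empire (highest total cost)
        # loses its weakest colony to a randomly chosen rival empire
        if len(imps) > 1:
            total = [cost(imps[e]) + xi * (sum(map(cost, colonies[e])) / len(colonies[e])
                                           if colonies[e] else 0.0)
                     for e in range(len(imps))]
            weak = max(range(len(imps)), key=total.__getitem__)
            winner = rng.choice([e for e in range(len(imps)) if e != weak])
            if colonies[weak]:
                worst = max(range(len(colonies[weak])), key=lambda k: cost(colonies[weak][k]))
                colonies[winner].append(colonies[weak].pop(worst))
            else:  # an empire with no colonies collapses into the winner
                colonies[winner].append(imps[weak])
                del imps[weak], colonies[weak]
        best = min([best] + imps, key=cost)  # the best-so-far never gets worse
    return best, cost(best)
```

A call such as `ica(residual_cost, 2, -3.0, 3.0)` returns the best point found and its residual cost; because the best-so-far is kept, more iterations from the same seed never return a worse cost than the initial population.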

Place, publisher, year, edition, pages
IEEE, 2016
Keywords
parallel imperialist competitive algorithm (PICA), multi-population technique, evolutionary computing (EC), super linear performance, nonlinear equations, multi objective optimization
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-200032 (URN); 10.1109/HPCSim.2016.7568412 (DOI); 000389590600104 (ISI); 2-s2.0-84991660856 (Scopus ID); 978-1-5090-2088-1 (ISBN)
Conference
14th International Conference on High Performance Computing & Simulation (HPCS), 18-22 July 2016, Innsbruck, Austria
Note

QC 20170130

Available from: 2017-01-30; Created: 2017-01-20; Last updated: 2017-01-30; Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-6289-1521
