Achieving memory access equalization via round-trip routing latency prediction in 3D many-core NoCs
2015 (English). In: Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI, IEEE, 2015, 398-403 p. Conference paper (Refereed). Text
3D many-core NoCs are emerging architectures for future high-performance single chips because they integrate many processor cores and memories by stacking multiple layers. In such an architecture, because processor cores and memories reside in different locations (center, corner, edge, etc.), memory accesses behave differently due to their different communication distances, and the performance (latency) gap between memory accesses grows as the network size is scaled up. As a result, some memory accesses may suffer very high latencies, degrading system performance. To achieve high performance, it is crucial to reduce the number of memory accesses with very high latencies. However, this should be done with care, since shortening the latency of one memory access can worsen the latency of another as a result of shared network resources. Therefore, the goal should be to narrow the latency difference among memory accesses. In this paper, we address this goal by prioritizing memory access packets based on predictions of the round-trip routing latencies of memory accesses. The communication distance and the number of occupied items in the buffers along the remaining routing path are used to predict the round-trip latency of a memory access. The predicted round-trip routing latency is used as the basis for arbitrating the memory access packets, so that a memory access with potentially high latency is transferred as early and as fast as possible, thus equalizing memory access latencies as much as possible. Experiments with varied network sizes and packet injection rates show that our approach achieves the goal of memory access equalization and outperforms classic round-robin arbitration in terms of maximum latency, average latency, and LSD. In the experiments, the maximum improvements in the maximum latency, the average latency, and the LSD are 80%, 14%, and 45%, respectively. © 2015 IEEE.
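The arbitration idea in the abstract can be illustrated with a minimal sketch. The abstract names the two inputs to the prediction (communication distance and occupied buffer items on the remaining path) but not how they are combined, so the linear cost model, the constants `PER_HOP_CYCLES` and `PER_ITEM_CYCLES`, and the names `Packet`, `predicted_round_trip`, and `arbitrate` are all assumptions made for illustration, not the paper's actual formulation.

```python
from dataclasses import dataclass
from typing import List

# Assumed cost constants (hypothetical; the abstract gives no values).
PER_HOP_CYCLES = 2   # cycles charged per remaining hop
PER_ITEM_CYCLES = 1  # cycles charged per occupied buffer item ahead


@dataclass
class Packet:
    name: str
    hops_remaining: int        # hops left on the round trip
    buffered_ahead: List[int]  # occupied items in each buffer on the remaining path


def predicted_round_trip(packet: Packet) -> int:
    """Estimate remaining round-trip latency from distance and buffer occupancy.

    A simple linear model, assumed here for illustration only.
    """
    distance_cost = PER_HOP_CYCLES * packet.hops_remaining
    queueing_cost = PER_ITEM_CYCLES * sum(packet.buffered_ahead)
    return distance_cost + queueing_cost


def arbitrate(candidates: List[Packet]) -> Packet:
    """Grant the packet with the highest predicted latency.

    This contrasts with round-robin, which ignores predicted latency.
    """
    return max(candidates, key=predicted_round_trip)


if __name__ == "__main__":
    near = Packet("near", hops_remaining=2, buffered_ahead=[0, 1])
    far = Packet("far", hops_remaining=8, buffered_ahead=[3, 2, 4])
    winner = arbitrate([near, far])
    print(winner.name, predicted_round_trip(winner))
```

Under these assumed constants, the distant, congested packet wins arbitration over the nearby one, which is the behavior the abstract describes: accesses likely to be slow are served first so that latencies across accesses move closer together.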
Place, publisher, year, edition, pages
IEEE, 2015. 398-403 p.
Computer architecture, Ports (Computers), Routing, SDRAM, Switches, System-on-chip, Three-dimensional displays, Application specific integrated circuits, Dynamic random access storage, Forecasting, Network architecture, Network-on-chip, Routers, Communication distance, Emerging architectures, Memory access latency, Round-robin arbitration, Three-dimensional display, Memory architecture
Identifiers
URN: urn:nbn:se:kth:diva-186848; DOI: 10.1109/ISVLSI.2015.8; ISI: 000377094100071; ScopusID: 2-s2.0-84957042740; OAI: oai:DiVA.org:kth-186848; DiVA: diva2:928912
IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2015, 8 July 2015 through 10 July 2015
QC 20160517. 2016-05-17, 2016-05-13, 2016-06-28. Bibliographically approved