Exploiting locality in OpenMP task scheduling
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Future multi- and many-core processors are likely to have tens of cores arranged in a tiled architecture, where each tile houses a processing core and a bank of the shared last-level cache. The physical distribution of tiles on the processor die gives rise to a Distributed Shared Cache (DSC) architecture in which cache access latencies are non-uniform and depend on the physical distance between core and cache bank. To maximize cache capacity and favor design simplicity, the address space on a tiled processor is likely to be divided and mapped, either statically or dynamically, onto the distributed last-level cache such that each cache bank homes certain cache blocks. Given this architecture, an efficient OpenMP 3.0 task scheduler can minimize miss latencies by scheduling tasks on tiles that are physically closer to the cache banks homing task-relevant data.
This master's thesis deals with the design and implementation of a locality-aware, user-level runtime OpenMP 3.0 task scheduler for a simulated tiled multicore architecture. Guided by programmer hints, the scheduler extracts locality information about the data referenced by a task and schedules the task on the core closest to the L2 slice homing the largest amount of that data. Initial performance comparisons against a work-first, randomized work-stealing, Cilk-like scheduler and a breadth-first, randomized work-stealing scheduler revealed problems with the locality-aware scheduler and motivate deeper exploration of programmer locality characterization and feedback-based extraction of locality information.
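The placement decision described above can be illustrated with a minimal sketch. It is not the thesis implementation: the 16-tile count, the 4096-byte home granularity, the static round-robin address-to-bank mapping, and the names `home_tile`, `region_hint`, and `pick_tile` are all assumptions made for illustration. Given programmer hints listing the memory regions a task references, it counts how many bytes each L2 slice homes and returns the tile homing the most. Because each tile houses both a core and a cache bank, that tile's core is the closest one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_TILES        16u    /* assumed number of tiles (e.g. a 4x4 mesh) */
#define HOME_GRAIN_BYTES 4096u  /* assumed granularity of the home mapping   */

/* Assumed static mapping: consecutive grains are interleaved
 * round-robin across the L2 slices of the tiles. */
static unsigned home_tile(uintptr_t addr) {
    return (unsigned)((addr / HOME_GRAIN_BYTES) % NUM_TILES);
}

/* Hypothetical programmer hint: one contiguous region the task references. */
typedef struct {
    uintptr_t base;   /* first byte of the region */
    size_t    bytes;  /* length of the region     */
} region_hint;

/* Return the tile whose L2 slice homes the most hinted bytes.
 * Ties resolve to the lowest-numbered tile; no hints yields tile 0. */
static unsigned pick_tile(const region_hint *hints, size_t n) {
    size_t homed[NUM_TILES] = {0};
    assert(hints != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        uintptr_t addr = hints[i].base;
        uintptr_t end  = addr + hints[i].bytes;
        /* Walk the region grain by grain, crediting each home tile. */
        while (addr < end) {
            uintptr_t grain_end = (addr / HOME_GRAIN_BYTES + 1) * HOME_GRAIN_BYTES;
            uintptr_t stop = grain_end < end ? grain_end : end;
            homed[home_tile(addr)] += (size_t)(stop - addr);
            addr = stop;
        }
    }

    unsigned best = 0;
    for (unsigned t = 1; t < NUM_TILES; t++) {
        if (homed[t] > homed[best]) {
            best = t;
        }
    }
    return best;
}
```

A runtime built along these lines would call `pick_tile` when a task is created and enqueue the task on the chosen tile's queue. With a fine-grained interleaved mapping, large contiguous regions spread almost evenly across all banks, so the choice carries little information; this is one reason the quality of the locality hints matters.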
Place, publisher, year, edition, pages
2010. 76 p.
Identifiers
URN: urn:nbn:se:kth:diva-26318
OAI: oai:DiVA.org:kth-26318
DiVA: diva2:371721
Brorsson, Mats, Professor