As CPU chips integrate more processor cores, computer systems are evolving from multi-core to many-core. Utilizing these cores fully and efficiently is a great challenge. With message passing and native support for concurrent programming, Erlang is a convenient language for developing applications on these systems. The scalability of applications depends on the performance of the underlying Erlang runtime system, or virtual machine (VM). This thesis presents a study of the scalability of the Erlang VM on TILEPro64, a many-core processor with 64 cores. The purpose is to study the implementation of the parallel Erlang VM, investigate its performance, identify bottlenecks, and provide optimization suggestions. To achieve this goal, the VM is tested with a set of benchmark programs. The problems discovered are then examined more closely with methods such as profiling and tracing.
The results show that the current version of the Erlang VM achieves good scalability on the processor with most of the benchmarks used. The maximum speedup ranges from about 40 to 50 on 60 cores. Synchronization overhead caused by contention is a major bottleneck of the system, so scalability can be improved by reducing lock contention. Another major problem is that, with a benchmark program containing a huge amount of message passing, the parallel version of the virtual machine running on one core is much slower than the sequential version. Further analysis indicates that synchronization latency induced by uncontended locks is one of the main reasons. Low-overhead locks and lock-free structures or algorithms are recommended for improving the performance of the Erlang VM. Our evaluation results suggest that Erlang is ready to be used to develop applications on many-core systems.