Combining analytics framework and Cloud schedulers in order to optimise resource utilisation in a distributed Cloud
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Analytics frameworks were initially created to run on bare-metal hardware, so they contain scheduling mechanisms to optimise the distribution of CPU load and data allocation. Generally, the scheduler is part of the analytics framework's resource manager. Several resource managers are used in industry and in the open-source community, and each can serve different analytics frameworks. For example, Spark was initially built to run on Mesos, Hadoop now uses YARN, and Spark is also available as a YARN application. Cloud environments such as OpenStack, on the other hand, contain their own mechanisms for distributing resources between users and services.
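To make "distribution of CPU load and data allocation" concrete, the toy model below places each task on a node that already stores the data block it reads (data locality) and otherwise falls back to the node with the most free slots (load balancing). It is a minimal sketch under stated assumptions: `Node`, `place_task` and the slot model are hypothetical names chosen for illustration, not the API of Spark, Mesos or YARN, and not the method of this thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical worker node as seen by an analytics scheduler."""
    name: str
    free_slots: int                                 # task slots still available
    blocks: set[str] = field(default_factory=set)   # data blocks stored locally


def place_task(block: str, nodes: list[Node]) -> str:
    """Choose a node for a task that reads `block` and reserve one slot on it.

    Prefer nodes holding the block locally (data locality); if none has a
    free slot, fall back to the least-loaded node (most free slots)."""
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        raise RuntimeError("no free slots in the cluster")
    local = [n for n in candidates if block in n.blocks]
    pool = local if local else candidates
    chosen = max(pool, key=lambda n: n.free_slots)
    chosen.free_slots -= 1
    return chosen.name


nodes = [Node("n1", 2, {"b1"}), Node("n2", 4, {"b2"}), Node("n3", 1, set())]
print([place_task(b, nodes) for b in ["b1", "b1", "b1", "b3"]])
# prints ['n1', 'n1', 'n2', 'n2']
```

The first two tasks run where `b1` is stored; once `n1` is full, the third task gives up locality and goes to the least-loaded node, which is the trade-off a framework scheduler makes between data placement and CPU load.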
While analytics applications are increasingly being migrated to the cloud, the scheduling decisions for running an analytics job are still made in isolation by the different scheduler layers (the cloud/infrastructure scheduler versus the analytics resource manager). This can seriously degrade the performance of the analytics jobs or of other services running in the same infrastructure, and it limits load-balancing and autoscaling capabilities. This master's thesis identifies which scheduling decisions should be taken at each layer (Infrastructure, Platform and Software), and which metrics are required from the environment when multiple schedulers are used, in order to get the best performance and maximise resource utilisation.
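The isolation problem can be illustrated with a toy two-layer model, assuming an analytics resource manager that only sees free executor slots per VM, while the infrastructure layer knows how loaded each physical host is, including load from other tenants' services. `Host`, `VM`, `pick_isolated` and `pick_cross_layer` are hypothetical names, and using host CPU utilisation as the shared metric is an assumption made for this sketch; it is not the set of decisions or metrics identified in the thesis.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Physical machine, visible only to the infrastructure (cloud) layer."""
    name: str
    cpu_capacity: float  # cores
    cpu_used: float      # cores used by all tenants, including other services

    @property
    def utilisation(self) -> float:
        return self.cpu_used / self.cpu_capacity


@dataclass
class VM:
    """Virtual machine as registered with the analytics resource manager."""
    name: str
    host: Host
    free_slots: int      # all the platform layer sees


def pick_isolated(vms: list[VM]) -> VM:
    # Platform-layer view only: the VM with the most free slots.
    # Raises ValueError if no VM has a free slot.
    return max((v for v in vms if v.free_slots > 0), key=lambda v: v.free_slots)


def pick_cross_layer(vms: list[VM]) -> VM:
    # Also uses an infrastructure metric: prefer the VM on the least-utilised
    # host, breaking ties by the larger number of free slots.
    return min((v for v in vms if v.free_slots > 0),
               key=lambda v: (v.host.utilisation, -v.free_slots))


busy = Host("h1", cpu_capacity=16, cpu_used=15)  # shared with another service
idle = Host("h2", cpu_capacity=16, cpu_used=4)
vms = [VM("vm-a", busy, free_slots=8), VM("vm-b", idle, free_slots=4)]
print(pick_isolated(vms).name, pick_cross_layer(vms).name)
# prints vm-a vm-b
```

In isolation the platform layer picks `vm-a` because it advertises more slots, even though its host is about 94% busy; once one infrastructure metric is shared across the layers, the same task lands on the lightly loaded host.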
Place, publisher, year, edition, pages: 2015, 51 p.
Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-177582
OAI: oai:DiVA.org:kth-177582
DiVA: diva2:873440