A Study of NoSQL and NewSQL databases for data aggregation on Big Data
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Sensor data analysis at Scania deals with large amounts of data collected from vehicles. Each time a Scania vehicle enters a workshop, a large number of variables are collected and stored in an RDBMS at high speed. Sensor data is numeric and is stored in a data warehouse. Ad hoc analyses are performed on this data using Business Intelligence (BI) tools like SAS. The challenges of using a traditional database are studied to identify areas for improvement. Sensor data is huge and growing at a rapid pace, so it can be categorized as Big Data for Scania. This problem is studied to define the ideal properties of a high-performance, scalable database solution, and distributed database products are surveyed to find candidates for it. A desirable solution is a distributed computational cluster in which most computations are done locally on the storage nodes, to fully utilize each machine's memory and CPU and to minimize network load. There is a plethora of distributed database products categorized under NoSQL and NewSQL. A large variety of NoSQL products manage organizations' data in a distributed fashion. NoSQL products typically offer advantages such as improved scalability, and disadvantages such as a lack of BI tool support and weaker consistency. NewSQL databases are an emerging category of distributed databases: relational data stores designed to meet the demand for high performance and scalability. In this paper, an exploratory study was performed to find suitable products in these two categories. One product from each category was selected, based on a comparative study, for practical implementation, and production data was imported into both solutions. Performance for a common use case (median computation) was measured and compared. Based on these comparisons, recommendations are provided for a suitable distributed product for sensor data analysis.
Place, publisher, year, edition, pages
2013. 56 p.
Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-143345
OAI: oai:DiVA.org:kth-143345
DiVA: diva2:706302
Subject / course
Information and Software Systems
Master of Science - Software Engineering of Distributed Systems
Montelius, Johan, Associate Professor