Publications (10 of 99)
Koubarakis, M., Bereta, K., Bilidas, D., Giannousis, K., Ioannidis, T., Pantazi, D.-A., . . . Fleming, A. (2019). From Copernicus big data to extreme earth analytics. In: Advances in Database Technology - EDBT. Paper presented at the 22nd International Conference on Extending Database Technology, EDBT 2019, Lisbon, Portugal, 26-29 March 2019 (pp. 690-693). OpenProceedings
2019 (English). In: Advances in Database Technology - EDBT, OpenProceedings, 2019, p. 690-693. Conference paper, Published paper (Refereed)
Abstract [en]

Copernicus is the European programme for monitoring the Earth. It consists of a set of systems that collect data from satellites and in-situ sensors, process this data and provide users with reliable and up-to-date information on a range of environmental and security issues. The data and information processed and disseminated put Copernicus at the forefront of the big data paradigm, giving rise to all the relevant challenges, the so-called 5 Vs: volume, velocity, variety, veracity and value. In this short paper, we discuss the challenges of extracting information and knowledge from huge archives of Copernicus data. We propose to achieve this by scale-out distributed deep learning techniques that run on very big clusters offering virtual machines and GPUs. We also discuss the challenges of achieving scalability in the management of the extreme volumes of information and knowledge extracted from Copernicus data. The envisioned scientific and technical work will be carried out in the context of the H2020 project ExtremeEarth, which starts in January 2019.

Place, publisher, year, edition, pages
OpenProceedings, 2019
Series
Advances in Database Technology - EDBT, ISSN 2367-2005
National Category
Other Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-251874 (URN), 10.5441/002/edbt.2019.88 (DOI), 2-s2.0-85064893710 (Scopus ID), 9783893180813 (ISBN)
Conference
22nd International Conference on Extending Database Technology, EDBT 2019, Lisbon, Portugal, 26-29 March 2019
Note

QC 20190528

Available from: 2019-05-28. Created: 2019-05-28. Last updated: 2019-05-28. Bibliographically approved.
Ismail, M., Bonds, A., Niazi, S., Haridi, S. & Dowling, J. (2019). Scalable Block Reporting for HopsFS. In: 2019 IEEE International Congress on Big Data (BigData Congress). Paper presented at the IEEE International Congress on Big Data, IEEE BigData Congress 2019, Milan, Italy, July 8-13, 2019 (pp. 157-164).
2019 (English). In: 2019 IEEE International Congress on Big Data (BigData Congress), 2019, p. 157-164. Conference paper, Published paper (Refereed)
Abstract [en]

Distributed hierarchical file systems typically decouple the storage of the file system’s metadata from the data (file system blocks) to enable the scalability of the file system. This decoupling, however, requires the introduction of a periodic synchronization protocol to ensure the consistency of the file system’s metadata and its blocks. Apache HDFS and HopsFS implement a protocol, called block reporting, where each data server periodically sends ground truth information about all its file system blocks to the metadata servers, allowing the metadata to be synchronized with the actual state of the data blocks in the file system. The network and processing overhead of the existing block reporting protocol, however, increases with cluster size, ultimately limiting cluster scalability. In this paper, we introduce a new block reporting protocol for HopsFS that reduces the protocol bandwidth and processing overhead by up to three orders of magnitude, compared to HDFS/HopsFS’ existing protocol. Our new protocol removes a major bottleneck that prevented HopsFS clusters from scaling to tens of thousands of servers.
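
To make the overhead concrete, the toy model below sketches the baseline full block report that the paper improves on: every data server periodically resends its complete block list, so report size and the diff work at the metadata server grow with the number of blocks stored. This is an illustrative plain-Scala sketch, not HopsFS code; the names (BlockReport, MetadataServer) are hypothetical.

```scala
// Toy model of HDFS-style full block reports: each data server periodically
// ships its complete block list to the metadata server, even when nothing
// has changed. Illustrative only; not the paper's optimized protocol.
object BlockReportToy {
  type BlockId = Long

  final case class BlockReport(dataServerId: Int, blocks: Set[BlockId])

  class MetadataServer {
    // Ground truth as last reported by each data server.
    private var reported = Map.empty[Int, Set[BlockId]]

    def receive(r: BlockReport): Unit = {
      val previous = reported.getOrElse(r.dataServerId, Set.empty[BlockId])
      val added    = r.blocks diff previous
      val removed  = previous diff r.blocks
      reported += r.dataServerId -> r.blocks
      println(s"server ${r.dataServerId}: +${added.size} -${removed.size} blocks")
    }
  }

  def main(args: Array[String]): Unit = {
    val ms = new MetadataServer
    ms.receive(BlockReport(1, Set(1L, 2L, 3L)))     // full report
    ms.receive(BlockReport(1, Set(1L, 2L, 3L, 4L))) // resent in full for one new block
  }
}
```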

National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-254924 (URN), 10.1109/BigDataCongress.2019.00035 (DOI), 978-1-7281-2771-2 (ISBN)
Conference
IEEE International Congress on Big Data, IEEE BigData Congress 2019, Milan, Italy, July 8-13, 2019
Note

QC 20190902

Available from: 2019-07-09. Created: 2019-07-09. Last updated: 2019-09-02. Bibliographically approved.
Kalavri, V., Vlassov, V. & Haridi, S. (2018). High-Level Programming Abstractions for Distributed Graph Processing. IEEE Transactions on Knowledge and Data Engineering, 30(2), 305-324
2018 (English). In: IEEE Transactions on Knowledge and Data Engineering, ISSN 1041-4347, E-ISSN 1558-2191, Vol. 30, no 2, p. 305-324. Article in journal (Refereed), Published
Abstract [en]

Efficient processing of large-scale graphs in distributed environments has been an increasingly popular topic of research in recent years. Inter-connected data that can be modeled as graphs appear in application domains such as machine learning, recommendation, web search, and social network analysis. Writing distributed graph applications is inherently hard and requires programming models that can cover a diverse set of problems, including iterative refinement algorithms, graph transformations, graph aggregations, pattern matching, ego-network analysis, and graph traversals. Several high-level programming abstractions have been proposed and adopted by distributed graph processing systems and big data platforms. Even though significant work has been done to experimentally compare distributed graph processing frameworks, no qualitative study and comparison of graph programming abstractions has been conducted yet. In this survey, we review and analyze the most prevalent high-level programming models for distributed graph processing, in terms of their semantics and applicability. We review 34 distributed graph processing systems with respect to the graph processing models they implement and we survey applications that appear in recent distributed graph systems papers. Finally, we discuss trends and open research questions in the area of distributed graph processing.
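
As one concrete example of the abstractions the survey compares, the sketch below shows the vertex-centric ("think like a vertex") model on a single machine: in each superstep, every vertex adopts the smallest label among itself and its neighbours, converging to connected components. Systems such as Pregel, Giraph, or Flink Gelly distribute this loop across a cluster; the code is purely illustrative.

```scala
// Single-machine sketch of the vertex-centric model (label propagation for
// connected components). One loop iteration corresponds to one BSP superstep.
object VertexCentricToy {
  def connectedComponents(edges: Seq[(Int, Int)]): Map[Int, Int] = {
    // Adjacency lists over the undirected version of the edge set.
    val neighbours = (edges ++ edges.map(_.swap)).groupMap(_._1)(_._2)
    var labels  = neighbours.keys.map(v => v -> v).toMap
    var changed = true
    while (changed) {
      // Synchronous update: every vertex reads only the previous superstep's labels.
      val updated = neighbours.map { case (v, ns) =>
        v -> (labels(v) +: ns.map(labels)).min
      }
      changed = updated != labels
      labels = updated
    }
    labels
  }

  def main(args: Array[String]): Unit =
    println(connectedComponents(Seq(1 -> 2, 2 -> 3, 4 -> 5)))
    // Map(1 -> 1, 2 -> 1, 3 -> 1, 4 -> 4, 5 -> 4) (entry order may vary)
}
```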

Place, publisher, year, edition, pages
IEEE Computer Society, 2018
Keywords
Distributed graph processing, large-scale graph analysis, big data
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-221918 (URN), 10.1109/TKDE.2017.2762294 (DOI), 000422711800008 (ISI), 2-s2.0-85040652305 (Scopus ID)
Note

QC 20180131

Available from: 2018-01-31. Created: 2018-01-31. Last updated: 2018-02-02. Bibliographically approved.
Niazi, S., Ronström, M., Haridi, S. & Dowling, J. (2018). Size Matters: Improving the Performance of Small Files in Hadoop. Paper presented at Middleware ’18, ACM, Rennes, France (pp. 14).
2018 (English). Conference paper, Published paper (Refereed)
Abstract [en]

The Hadoop Distributed File System (HDFS) is designed to handle massive amounts of data, preferably stored in very large files. The poor performance of HDFS in managing small files has long been a bane of the Hadoop community. In many production deployments of HDFS, almost 25% of the files are less than 16 KB in size and as much as 42% of all the file system operations are performed on these small files. We have designed an adaptive tiered storage using in-memory and on-disk tables stored in a high-performance distributed database to efficiently store and improve the performance of the small files in HDFS. Our solution is completely transparent, and it does not require any changes in the HDFS clients or the applications using the Hadoop platform. In experiments, we observed up to 61 times higher throughput in writing files, and for real-world workloads from Spotify our solution reduces the latency of reading and writing small files by factors of 3.15 and 7.39, respectively.
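
The tiering decision itself is easy to illustrate. The sketch below routes writes under a 16 KB threshold (the small-file size the abstract cites) to a database-backed tier and larger files to ordinary block storage; the types and in-memory "stores" are hypothetical stand-ins, not the actual HopsFS implementation.

```scala
// Minimal sketch of size-based tiering: small files live in database
// tables, large files in regular block storage. Hypothetical stand-in code.
object TieredStorageToy {
  sealed trait Location
  case object StoredInDatabase extends Location // in-memory/on-disk DB tables
  case object StoredInBlocks   extends Location // ordinary HDFS datanode blocks

  val SmallFileThreshold: Int = 16 * 1024       // 16 KB, per the paper's workload stats

  private var dbTier    = Map.empty[String, Array[Byte]]
  private var blockTier = Map.empty[String, Array[Byte]]

  /** Route a write to the appropriate tier, transparently to the client. */
  def write(path: String, data: Array[Byte]): Location =
    if (data.length <= SmallFileThreshold) { dbTier += path -> data; StoredInDatabase }
    else                                   { blockTier += path -> data; StoredInBlocks }

  def read(path: String): Option[Array[Byte]] =
    dbTier.get(path).orElse(blockTier.get(path))

  def main(args: Array[String]): Unit = {
    println(write("/logs/tiny.json", new Array[Byte](512)))       // StoredInDatabase
    println(write("/data/big.parquet", new Array[Byte](1 << 20))) // StoredInBlocks
  }
}
```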

National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-238597 (URN)
Conference
Middleware ’18, ACM, Rennes, France
Note

QC 20181106

Available from: 2018-11-05. Created: 2018-11-05. Last updated: 2018-11-05. Bibliographically approved.
Yalew, S. D., Maguire Jr., G. Q., Haridi, S. & Correia, M. (2017). Hail to the Thief: Protecting Data from Mobile Ransomware with ransomSafeDroid. In: Gkoulalas-Divanis, A., Correia, M. P. & Avresky, D. R. (Eds.), 2017 IEEE 16th International Symposium on Network Computing and Applications, NCA 2017. Paper presented at the 16th IEEE International Symposium on Network Computing and Applications, NCA 2017, Cambridge, United States, 30 October 2017 through 1 November 2017 (pp. 351-358). Institute of Electrical and Electronics Engineers (IEEE), 2017
2017 (English). In: 2017 IEEE 16th International Symposium on Network Computing and Applications, NCA 2017 / [ed] Gkoulalas-Divanis, A., Correia, M. P. & Avresky, D. R., Institute of Electrical and Electronics Engineers (IEEE), 2017, Vol. 2017, p. 351-358. Conference paper, Published paper (Refereed)
Abstract [en]

The growing popularity of Android and the increasing amount of sensitive data stored in mobile devices have led to the dissemination of Android ransomware. Ransomware is a class of malware that makes data inaccessible by blocking access to the device or, more frequently, by encrypting the data; to recover the data, the user has to pay a ransom to the attacker. A solution for this problem is to back up the data. Although backup tools are available for Android, these tools may be compromised or blocked by the ransomware itself. This paper presents the design and implementation of RANSOMSAFEDROID, a TrustZone-based backup service for mobile devices. RANSOMSAFEDROID is protected from malware by leveraging the ARM TrustZone extension and running in the secure world. It periodically backs up files to a secure local persistent partition and pushes these backups to external storage to protect them from ransomware. Initially, RANSOMSAFEDROID does a full backup of the device filesystem; afterwards, it does incremental backups that save the changes since the last backup. As a proof-of-concept, we implemented a RANSOMSAFEDROID prototype and provide a performance evaluation using an i.MX53 development board.
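
The full-then-incremental backup strategy can be sketched with a content-hash diff: the first round copies every file, and later rounds copy only files whose hash changed since the last backup. This illustrates only the backup logic from the abstract; the TrustZone isolation that actually protects the copies is out of scope here.

```scala
import java.security.MessageDigest

// Toy full-then-incremental backup selection via content hashes.
// Illustrative only; not the RANSOMSAFEDROID implementation.
object IncrementalBackupToy {
  private def sha256(data: Array[Byte]): String =
    MessageDigest.getInstance("SHA-256").digest(data).map("%02x".format(_)).mkString

  // Hash of each file as of the last completed backup round.
  private var lastSeen = Map.empty[String, String]

  /** Return the paths that must be copied in this backup round. */
  def filesToBackup(snapshot: Map[String, Array[Byte]]): Set[String] = {
    val changed = snapshot.collect {
      case (path, bytes) if !lastSeen.get(path).contains(sha256(bytes)) => path
    }.toSet
    lastSeen = snapshot.map { case (p, b) => p -> sha256(b) }
    changed
  }

  def main(args: Array[String]): Unit = {
    val v1 = Map("a.txt" -> "hello".getBytes("UTF-8"), "b.txt" -> "world".getBytes("UTF-8"))
    println(filesToBackup(v1))                                          // full: both files
    println(filesToBackup(v1 + ("b.txt" -> "WORLD".getBytes("UTF-8")))) // incremental: b.txt only
  }
}
```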

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-225237 (URN), 10.1109/NCA.2017.8171377 (DOI), 000426971900053 (ISI), 2-s2.0-85046532213 (Scopus ID), 9781538614655 (ISBN)
Conference
16th IEEE International Symposium on Network Computing and Applications, NCA 2017, Cambridge, United States, 30 October 2017 through 1 November 2017
Note

QC 20180403

Available from: 2018-04-03. Created: 2018-04-03. Last updated: 2018-05-22. Bibliographically approved.
Niazi, S., Ismail, M., Haridi, S., Dowling, J., Grohsschmiedt, S. & Ronström, M. (2017). HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases. In: 15th USENIX Conference on File and Storage Technologies, FAST 2017, Santa Clara, CA, USA, February 27 - March 2, 2017. Paper presented at the 15th USENIX Conference on File and Storage Technologies, FAST 2017, Santa Clara, CA, USA, February 27 - March 2, 2017 (pp. 89-103). USENIX Association
2017 (English). In: 15th USENIX Conference on File and Storage Technologies, FAST 2017, Santa Clara, CA, USA, February 27 - March 2, 2017, USENIX Association, 2017, p. 89-103. Conference paper, Published paper (Refereed)
Abstract [en]

Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next-generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS’ single-node in-memory metadata service with a distributed metadata service built on a NewSQL database. By removing the metadata bottleneck, HopsFS enables an order of magnitude larger and higher-throughput clusters compared to HDFS. Metadata capacity has been increased to at least 37 times HDFS’ capacity, and in experiments based on a workload trace from Spotify, we show that HopsFS supports 16 to 37 times the throughput of Apache HDFS. HopsFS also has lower latency for many concurrent clients, and no downtime during failover. Finally, as metadata is now stored in a commodity database, it can be safely extended and easily exported to external systems for online analysis and free-text search.
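
The core idea, metadata as database rows, can be sketched briefly: each inode becomes a row keyed by (parent id, name), so resolving a path is a chain of primary-key lookups that a NewSQL database can serve transactionally. The Inode fields below are illustrative; HopsFS's real schema in MySQL Cluster (NDB) is considerably richer.

```scala
// Sketch: file system metadata as rows keyed by (parentId, name).
// A real deployment would hold this table in a NewSQL database and wrap
// multi-row operations (rename, delete) in transactions.
object MetadataAsRowsToy {
  final case class Inode(id: Long, parentId: Long, name: String, isDir: Boolean)

  private val inodes = Map[(Long, String), Inode](
    (0L, "user")  -> Inode(1, 0, "user", isDir = true),
    (1L, "alice") -> Inode(2, 1, "alice", isDir = true),
    (2L, "data")  -> Inode(3, 2, "data", isDir = false)
  )

  /** Resolve a path with one key lookup per component, starting at the root. */
  def resolve(path: String): Option[Inode] =
    path.split('/').filter(_.nonEmpty)
      .foldLeft(Option(Inode(0, -1, "/", isDir = true))) { (dir, component) =>
        dir.flatMap(d => inodes.get((d.id, component)))
      }

  def main(args: Array[String]): Unit =
    println(resolve("/user/alice/data")) // Some(Inode(3,2,data,false))
}
```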

Place, publisher, year, edition, pages
USENIX Association, 2017
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-205355 (URN), 000427295900007 (ISI)
Conference
15th USENIX Conference on File and Storage Technologies, FAST 2017, Santa Clara, CA, USA, February 27 - March 2, 2017
Funder
EU, FP7, Seventh Framework Programme, 317871; Swedish Foundation for Strategic Research, E2E-Clouds
Note

QC 20170424

Available from: 2017-04-13. Created: 2017-04-13. Last updated: 2018-11-05. Bibliographically approved.
Kroll, L., Carbone, P. & Haridi, S. (2017). Kompics Scala: Narrowing the gap between algorithmic specification and executable code (short paper). In: Proceedings of the 8th ACM SIGPLAN International Symposium on Scala. Paper presented at the ACM SIGPLAN International Symposium on Scala (pp. 73-77). ACM Digital Library
2017 (English). In: Proceedings of the 8th ACM SIGPLAN International Symposium on Scala, ACM Digital Library, 2017, p. 73-77. Conference paper, Published paper (Refereed)
Abstract [en]

Message-based programming frameworks facilitate the development and execution of core distributed computing algorithms today. Their twofold aim is to expose a programming model that minimises logical errors incurred during translation from an algorithmic specification to an executable program, and to provide an efficient runtime for event pattern-matching and scheduling of distributed components. Kompics Scala is a framework that allows for a direct, streamlined translation from a formal algorithm specification to practical code by reducing the cognitive gap between the two representations. Furthermore, its runtime decouples event pattern-matching and component execution logic, yielding clean, predictable behaviours. Our evaluation shows a low and constant performance overhead of Kompics Scala compared to similar frameworks that otherwise fail to offer the same level of model clarity.
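
The following is not actual Kompics Scala API but a plain-Scala sketch of the style it enables: each "upon event" clause of a textbook algorithm maps to one pattern-matching handler, here expressed as a partial function over an event type.

```scala
// Plain-Scala analogue of "upon event" handler clauses via pattern matching.
// The event types and handler are hypothetical, not Kompics Scala API.
object UponEventStyleToy {
  sealed trait Event
  final case class Broadcast(msg: String)          extends Event
  final case class Deliver(from: Int, msg: String) extends Event

  // upon event <Broadcast | m> do ...   /   upon event <Deliver | p, m> do ...
  val handler: PartialFunction[Event, Unit] = {
    case Broadcast(m)  => println(s"sending '$m' to all processes")
    case Deliver(p, m) => println(s"delivered '$m' from process $p")
  }

  def main(args: Array[String]): Unit =
    List(Broadcast("hello"), Deliver(2, "hello")).foreach(handler)
}
```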

Place, publisher, year, edition, pages
ACM Digital Library, 2017
Keywords
component model, message-passing, distributed systems architecture
National Category
Information Systems
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-218781 (URN), 10.1145/3136000.3136009 (DOI), 2-s2.0-85037137982 (Scopus ID), 978-1-4503-5529-2 (ISBN)
Conference
ACM SIGPLAN International Symposium on Scala
Note

QC 20180111

Available from: 2017-11-30. Created: 2017-11-30. Last updated: 2018-01-13. Bibliographically approved.
Carbone, P., Gévay, G. E., Hermann, G., Katsifodimos, A., Soto, J., Markl, V. & Haridi, S. (2017). Large-scale data stream processing systems. In: Handbook of Big Data Technologies (pp. 219-260). Springer International Publishing
2017 (English). In: Handbook of Big Data Technologies, Springer International Publishing, 2017, p. 219-260. Chapter in book (Other academic)
Abstract [en]

In our data-centric society, online services, decision making, and many other activities increasingly depend on trends and patterns extracted from data. A broad class of societal-scale data management problems requires system support for processing unbounded data with low latency and high throughput. Large-scale data stream processing systems perceive data as infinite streams and are designed to satisfy such requirements. They have further evolved substantially, both in terms of expressive programming model support and in terms of efficient and durable runtime execution on commodity clusters. Expressive programming models offer convenient ways to declare continuous data properties and applied computations, while hiding details of how these data streams are physically processed and orchestrated in a distributed environment. Execution engines provide a runtime for such models, further allowing for scalable yet durable execution of any declared computation. In this chapter we introduce the major design aspects of large-scale data stream processing systems, covering programming model abstraction levels and runtime concerns. We then present a detailed case study on stateful stream processing with Apache Flink, an open-source stream processor that is used for a wide variety of processing tasks. Finally, we address the main challenges of disruptive applications that large-scale data streaming enables, from a systemic point of view.
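
As a taste of the programming-model side, here is a minimal Flink streaming job in the classic Scala API (it assumes a flink-streaming-scala dependency): a declarative pipeline over a stream in which partitioning (keyBy) and incremental aggregation are left entirely to the runtime.

```scala
import org.apache.flink.streaming.api.scala._

// Minimal declarative streaming pipeline: the program only declares the
// computation; the runtime handles distribution, partitioning and state.
object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env
      .fromElements("big data", "data stream", "stream processing")
      .flatMap(_.split("\\s+")) // tokenize each record as it arrives
      .map(word => (word, 1))
      .keyBy(_._1)              // hash-partition by word across the cluster
      .sum(1)                   // continuously updated running count per word
      .print()

    env.execute("streaming word count")
  }
}
```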

Place, publisher, year, edition, pages
Springer International Publishing, 2017
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-216552 (URN), 10.1007/978-3-319-49340-4_7 (DOI), 2-s2.0-85019960984 (Scopus ID), 9783319493404 (ISBN), 9783319493398 (ISBN)
Note

QC 20171108

Available from: 2017-11-08. Created: 2017-11-08. Last updated: 2017-11-08. Bibliographically approved.
Carbone, P., Ewen, S., Fora, G., Haridi, S., Richter, S. & Tzoumas, K. (2017). State Management in Apache Flink: Consistent Stateful Distributed Stream Processing. Proceedings of the VLDB Endowment, 10(12), 1718-1729
2017 (English). In: Proceedings of the VLDB Endowment, ISSN 2150-8097, E-ISSN 2150-8097, Vol. 10, no 12, p. 1718-1729. Article in journal (Refereed), Published
Abstract [en]

Stream processors are emerging in industry as an apparatus that drives analytical but also mission-critical services handling the core of persistent application logic. Thus, apart from scalability and low latency, a rising system need is first-class support for application state together with strong consistency guarantees, and adaptivity to cluster reconfigurations, software patches and partial failures. Although prior systems research has addressed some of these specific problems, the practical challenge lies in how such guarantees can be materialized in a transparent, non-intrusive manner that relieves the user from unnecessary constraints. Such needs served as the main design principles of state management in Apache Flink, an open-source, scalable stream processor. We present Flink's core pipelined, in-flight mechanism which guarantees the creation of lightweight, consistent, distributed snapshots of application state, progressively, without impacting continuous execution. Consistent snapshots cover all needs for system reconfiguration, fault tolerance and version management through coarse-grained rollback recovery. Application state is declared explicitly to the system, allowing efficient partitioning and transparent commits to persistent storage. We further present Flink's backend implementations and mechanisms for high availability, external state queries and output commit. Finally, we demonstrate how these mechanisms behave in practice with metrics and large-deployment insights, exhibiting the low performance trade-offs of our approach and the general benefits of exploiting asynchrony in continuous, yet sustainable, system deployments.
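
Explicit state declaration is straightforward to illustrate with Flink's public API. In the sketch below (classic Scala API, assuming a Flink dependency), a keyed running sum registers a ValueState with the runtime; because the state is declared to the system, it is included in the distributed snapshots the paper describes and restored on rollback recovery. It would be applied as stream.keyBy(_._1).flatMap(new RunningSum).

```scala
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

// Keyed running sum whose state is declared to Flink and therefore covered
// by checkpointed snapshots and rollback recovery.
class RunningSum extends RichFlatMapFunction[(String, Long), (String, Long)] {
  @transient private var sum: ValueState[Long] = _

  override def open(parameters: Configuration): Unit =
    sum = getRuntimeContext.getState(
      new ValueStateDescriptor[Long]("running-sum", createTypeInformation[Long]))

  override def flatMap(in: (String, Long), out: Collector[(String, Long)]): Unit = {
    val next = sum.value() + in._2 // an absent value unboxes to 0L in Scala
    sum.update(next)
    out.collect((in._1, next))
  }
}
```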

National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-220296 (URN), 000416494000011 (ISI), 2-s2.0-85036646347 (Scopus ID)
Note

QC 20171222

Available from: 2017-12-22. Created: 2017-12-22. Last updated: 2018-01-13. Bibliographically approved.
Yalew, S. D., Mendonca, P., Maguire Jr., G. Q., Haridi, S. & Correia, M. (2017). TruApp: A TrustZone-based Authenticity Detection Service for Mobile Apps. In: 2017 IEEE 13th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob). Paper presented at the 13th IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), October 9-11, 2017, Rome, Italy. IEEE
2017 (English). In: 2017 IEEE 13th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), IEEE, 2017. Conference paper, Published paper (Refereed)
Abstract [en]

In less than a decade, mobile apps became an integral part of our lives. In several situations it is important to provide assurance that a mobile app is authentic, i.e., that it is indeed the app produced by a certain company. However, this is challenging, as such apps can be repackaged, the user may be malicious, or the app may be tampered with by an attacker. This paper presents the design of TRUAPP, a software authentication service that provides assurance of the authenticity and integrity of apps running on mobile devices. TRUAPP provides such assurance, even if the operating system is compromised, by leveraging the ARM TrustZone hardware security extension. TRUAPP uses a set of techniques (static watermarking, dynamic watermarking, and cryptographic hashes) to verify the integrity of the apps. The service was implemented on a hardware board that emulates a mobile device, which was used to perform a thorough experimental evaluation of the service.
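
A toy version of the hash-based part of such a check compares a package's digest against a known-good reference. The real service runs inside the TrustZone secure world and combines hashing with static and dynamic watermarking; the reference store below is hypothetical.

```scala
import java.security.MessageDigest

// Toy authenticity check: compare an app package's SHA-256 digest against a
// trusted reference hash. Illustrative only; not the TRUAPP implementation.
object AppIntegrityToy {
  private def sha256Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("SHA-256").digest(bytes).map("%02x".format(_)).mkString

  // Hypothetical reference hashes, e.g. published by the app developer.
  private val trustedHashes = Map(
    "com.example.bank" -> sha256Hex("official apk bytes".getBytes("UTF-8"))
  )

  def isAuthentic(packageName: String, apkBytes: Array[Byte]): Boolean =
    trustedHashes.get(packageName).contains(sha256Hex(apkBytes))

  def main(args: Array[String]): Unit = {
    println(isAuthentic("com.example.bank", "official apk bytes".getBytes("UTF-8"))) // true
    println(isAuthentic("com.example.bank", "tampered apk".getBytes("UTF-8")))       // false
  }
}
```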

Place, publisher, year, edition, pages
IEEE, 2017
Series
IEEE International Conference on Wireless and Mobile Computing Networking and Communications-WiMOB, ISSN 2160-4886
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-222218 (URN), 10.1109/WiMOB.2017.8115820 (DOI), 000419818000108 (ISI), 2-s2.0-85041407068 (Scopus ID), 978-1-5386-3839-2 (ISBN)
Conference
13th IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), October 9-11, 2017, Rome, Italy
Note

QC 20180205

Available from: 2018-02-05. Created: 2018-02-05. Last updated: 2019-04-15. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-6718-0144
