Publications (10 of 95)
Kalavri, V., Vlassov, V. & Haridi, S. (2018). High-Level Programming Abstractions for Distributed Graph Processing. IEEE Transactions on Knowledge and Data Engineering, 30(2), 305-324
High-Level Programming Abstractions for Distributed Graph Processing
2018 (English). In: IEEE Transactions on Knowledge and Data Engineering, ISSN 1041-4347, E-ISSN 1558-2191, Vol. 30, no 2, p. 305-324. Article in journal (Refereed). Published.
Abstract [en]

Efficient processing of large-scale graphs in distributed environments has been an increasingly popular topic of research in recent years. Inter-connected data that can be modeled as graphs appear in application domains such as machine learning, recommendation, web search, and social network analysis. Writing distributed graph applications is inherently hard and requires programming models that can cover a diverse set of problems, including iterative refinement algorithms, graph transformations, graph aggregations, pattern matching, ego-network analysis, and graph traversals. Several high-level programming abstractions have been proposed and adopted by distributed graph processing systems and big data platforms. Even though significant work has been done to experimentally compare distributed graph processing frameworks, no qualitative study and comparison of graph programming abstractions has been conducted yet. In this survey, we review and analyze the most prevalent high-level programming models for distributed graph processing, in terms of their semantics and applicability. We review 34 distributed graph processing systems with respect to the graph processing models they implement and we survey applications that appear in recent distributed graph systems papers. Finally, we discuss trends and open research questions in the area of distributed graph processing.
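
To make the vertex-centric ("think like a vertex") abstraction discussed in the survey concrete, the sketch below runs connected-components label propagation as synchronous supersteps on a single machine. It is a minimal illustration of the programming model only; the class and method names are invented here and do not come from any of the 34 surveyed systems.

```scala
// Minimal single-machine sketch of the vertex-centric model: label propagation
// for connected components, run as synchronous supersteps.
object VertexCentricSketch {
  type VertexId = Int
  // Each vertex knows its neighbours and its currently smallest known component label.
  final case class Vertex(id: VertexId, neighbours: Seq[VertexId], var label: Int)

  def connectedComponents(graph: Seq[Vertex], maxSupersteps: Int = 30): Map[VertexId, Int] = {
    // Superstep 0: every vertex announces its own label to its neighbours.
    var inbox: Map[VertexId, Seq[Int]] =
      graph.flatMap(v => v.neighbours.map(n => n -> v.label))
        .groupBy(_._1).view.mapValues(_.map(_._2)).toMap
    var step = 0
    var active = true
    while (active && step < maxSupersteps) {
      val outgoing = scala.collection.mutable.Buffer.empty[(VertexId, Int)]
      active = false
      for (v <- graph) {
        // Vertex "compute": adopt the smallest label seen so far, then notify neighbours.
        val candidate = (inbox.getOrElse(v.id, Seq.empty) :+ v.label).min
        if (candidate < v.label) {
          v.label = candidate
          v.neighbours.foreach(n => outgoing.append(n -> candidate))
          active = true
        }
      }
      inbox = outgoing.groupBy(_._1).view.mapValues(_.map(_._2).toSeq).toMap
      step += 1
    }
    graph.map(v => v.id -> v.label).toMap
  }

  def main(args: Array[String]): Unit = {
    val g = Seq(Vertex(1, Seq(2), 1), Vertex(2, Seq(1, 3), 2),
                Vertex(3, Seq(2), 3), Vertex(4, Seq.empty, 4))
    println(connectedComponents(g)) // vertices 1-3 converge to label 1, vertex 4 keeps 4
  }
}
```

In a distributed system following this model, the inbox/outbox exchange would be replaced by messages routed between workers that each own a partition of the vertices.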

Place, publisher, year, edition, pages
IEEE COMPUTER SOC, 2018
Keywords
Distributed graph processing, large-scale graph analysis, big data
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-221918 (URN), 10.1109/TKDE.2017.2762294 (DOI), 000422711800008 (), 2-s2.0-85040652305 (Scopus ID)
Note

QC 20180131

Available from: 2018-01-31. Created: 2018-01-31. Last updated: 2018-02-02. Bibliographically approved.
Yalew, S. D., Maguire Jr., G. Q., Haridi, S. & Correia, M. (2017). Hail to the Thief: Protecting Data from Mobile Ransomware with ransomSafeDroid. In: Gkoulalas-Divanis, A., Correia, M. P. & Avresky, D. R. (Eds.), 2017 IEEE 16th International Symposium on Network Computing and Applications, NCA 2017. Paper presented at 16th IEEE International Symposium on Network Computing and Applications, NCA 2017, Cambridge, United States, 30 October 2017 through 1 November 2017 (pp. 351-358). Institute of Electrical and Electronics Engineers (IEEE), 2017
Hail to the Thief: Protecting Data from Mobile Ransomware with ransomSafeDroid
2017 (English). In: 2017 IEEE 16th International Symposium on Network Computing and Applications, NCA 2017 / [ed] Gkoulalas-Divanis, A., Correia, M. P. & Avresky, D. R., Institute of Electrical and Electronics Engineers (IEEE), 2017, Vol. 2017, p. 351-358. Conference paper, Published paper (Refereed).
Abstract [en]

The growing popularity of Android and the increasing amount of sensitive data stored in mobile devices have led to the dissemination of Android ransomware. Ransomware is a class of malware that makes data inaccessible by blocking access to the device or, more frequently, by encrypting the data; to recover the data, the user has to pay a ransom to the attacker. A solution for this problem is to back up the data. Although backup tools are available for Android, these tools may be compromised or blocked by the ransomware itself. This paper presents the design and implementation of RANSOMSAFEDROID, a TrustZone-based backup service for mobile devices. RANSOMSAFEDROID is protected from malware by leveraging the ARM TrustZone extension and running in the secure world. It backs up files periodically to a secure local persistent partition and pushes these backups to external storage to protect them from ransomware. Initially, RANSOMSAFEDROID performs a full backup of the device filesystem; it then performs incremental backups that save the changes since the last backup. As a proof of concept, we implemented a RANSOMSAFEDROID prototype and provide a performance evaluation using an i.MX53 development board.
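
The full-then-incremental policy described above can be sketched, outside of TrustZone, as ordinary file copying driven by modification timestamps. This is an illustrative sketch with hypothetical paths and plain JVM I/O, not the RANSOMSAFEDROID implementation, which runs the equivalent logic inside the secure world.

```scala
import java.nio.file.{Files, Path, Paths, StandardCopyOption}
import scala.jdk.CollectionConverters._

object IncrementalBackupSketch {
  // Copy every regular file under `source` that changed after `since` (epoch millis)
  // into `target`, preserving the relative directory structure.
  def backup(source: Path, target: Path, since: Long): Long = {
    val startedAt = System.currentTimeMillis()
    Files.walk(source).iterator().asScala
      .filter(p => Files.isRegularFile(p))
      .filter(p => Files.getLastModifiedTime(p).toMillis > since)
      .foreach { p =>
        val dest = target.resolve(source.relativize(p).toString)
        Files.createDirectories(dest.getParent)
        Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING)
      }
    startedAt // becomes the `since` value of the next incremental round
  }

  def main(args: Array[String]): Unit = {
    val src = Paths.get("/data/app-files")   // hypothetical device data directory
    val dst = Paths.get("/backup/app-files") // hypothetical protected backup partition
    var last = backup(src, dst, since = 0L)  // first call: full backup (everything is newer than 0)
    // Subsequent calls only copy files modified since the previous round.
    last = backup(src, dst, since = last)
  }
}
```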

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-225237 (URN), 10.1109/NCA.2017.8171377 (DOI), 000426971900053 (), 2-s2.0-85046532213 (Scopus ID), 9781538614655 (ISBN)
Conference
16th IEEE International Symposium on Network Computing and Applications, NCA 2017, Cambridge, United States, 30 October 2017 through 1 November 2017
Note

QC 20180403

Available from: 2018-04-03. Created: 2018-04-03. Last updated: 2018-05-22. Bibliographically approved.
Niazi, S., Ismail, M., Haridi, S., Dowling, J., Grohsschmiedt, S. & Ronström, M. (2017). HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases. In: 15th USENIX Conference on File and Storage Technologies, FAST 2017, Santa Clara, CA, USA, February 27 - March 2, 2017. Paper presented at 15th USENIX Conference on File and Storage Technologies, FAST 2017, Santa Clara, CA, USA, February 27 - March 2, 2017 (pp. 89-103). USENIX Association
HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases
2017 (English). In: 15th USENIX Conference on File and Storage Technologies, FAST 2017, Santa Clara, CA, USA, February 27 - March 2, 2017, USENIX Association, 2017, p. 89-103. Conference paper, Published paper (Refereed).
Abstract [en]

Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next-generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS's single-node in-memory metadata service with a distributed metadata service built on a NewSQL database. By removing the metadata bottleneck, HopsFS enables clusters that are an order of magnitude larger and have higher throughput than HDFS. Metadata capacity has been increased to at least 37 times HDFS's capacity, and in experiments based on a workload trace from Spotify, we show that HopsFS supports 16 to 37 times the throughput of Apache HDFS. HopsFS also has lower latency for many concurrent clients, and no downtime during failover. Finally, as metadata is now stored in a commodity database, it can be safely extended and easily exported to external systems for online analysis and free-text search.
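
The core idea, file-system metadata as rows in a transactional database rather than objects in a single namenode's heap, can be sketched with plain JDBC. HopsFS itself targets MySQL Cluster (NDB) through a different access layer, and the table layout, connection URL and credentials below are hypothetical.

```scala
import java.sql.DriverManager

object MetadataInSqlSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection string and credentials; HopsFS talks to MySQL Cluster
    // (NDB) through a native access layer rather than plain JDBC.
    val conn = DriverManager.getConnection("jdbc:mysql://metadata-db:3306/hops", "user", "pass")
    try {
      // The namespace lives in a table: one row per inode. (Illustrative schema,
      // not the HopsFS schema.)
      conn.createStatement().execute(
        """CREATE TABLE IF NOT EXISTS inodes (
          |  id        BIGINT PRIMARY KEY,
          |  parent_id BIGINT NOT NULL,
          |  name      VARCHAR(255) NOT NULL,
          |  is_dir    BOOLEAN NOT NULL,
          |  size      BIGINT NOT NULL DEFAULT 0,
          |  UNIQUE (parent_id, name)
          |)""".stripMargin)

      // A metadata operation such as "create file" becomes a short transaction that
      // touches a handful of rows, so many metadata servers can run it concurrently.
      conn.setAutoCommit(false)
      val insert = conn.prepareStatement(
        "INSERT INTO inodes (id, parent_id, name, is_dir, size) VALUES (?, ?, ?, ?, ?)")
      insert.setLong(1, 42L)
      insert.setLong(2, 1L)          // parent directory's inode id
      insert.setString(3, "part-00000")
      insert.setBoolean(4, false)
      insert.setLong(5, 0L)
      insert.executeUpdate()
      conn.commit()
    } finally conn.close()
  }
}
```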

Place, publisher, year, edition, pages
USENIX Association, 2017
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-205355 (URN), 000427295900007 ()
Conference
15th USENIX Conference on File and Storage Technologies, FAST 2017, Santa Clara, CA, USA, February 27 - March 2, 2017
Funder
EU, FP7, Seventh Framework Programme, 317871; Swedish Foundation for Strategic Research, E2E-Clouds
Note

QC 20170424

Available from: 2017-04-13. Created: 2017-04-13. Last updated: 2018-04-03. Bibliographically approved.
Kroll, L., Carbone, P. & Haridi, S. (2017). Kompics Scala: Narrowing the gap between algorithmic specification and executable code (short paper). In: Proceedings of the 8th ACM SIGPLAN International Symposium on Scala. Paper presented at ACM SIGPLAN International Symposium on Scala (pp. 73-77). ACM Digital Library
Kompics Scala: Narrowing the gap between algorithmic specification and executable code (short paper)
2017 (English). In: Proceedings of the 8th ACM SIGPLAN International Symposium on Scala, ACM Digital Library, 2017, p. 73-77. Conference paper, Published paper (Refereed).
Abstract [en]

Message-based programming frameworks facilitate the development and execution of core distributed computing algorithms today. Their twofold aim is to expose a programming model that minimises logical errors incurred during translation from an algorithmic specification to an executable program, and to provide an efficient runtime for event pattern-matching and scheduling of distributed components. Kompics Scala is a framework that allows for a direct, streamlined translation from a formal algorithm specification to practical code by reducing the cognitive gap between the two representations. Furthermore, its runtime decouples event pattern-matching from component execution logic, yielding clean, predictable behaviours. Our evaluation shows a low and constant performance overhead for Kompics Scala compared to similar frameworks that otherwise fail to offer the same level of model clarity.
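
The claim about narrowing the gap between specification and code rests on writing event handlers as pattern matches, so that each "upon event <E> do ..." clause of the pseudocode becomes one case. The toy below, plain Scala and not the actual Kompics Scala API, shows the shape of that correspondence; the event types and handler are invented for illustration.

```scala
// Toy sketch: "upon event <Request> do ...", "upon event <Timeout> do ..." become match cases.
object EventMatchingSketch {
  sealed trait Event
  final case class Request(id: Long, payload: String) extends Event
  final case class Timeout(id: Long) extends Event
  case object Shutdown extends Event

  // The handler maps the current component state and one event to the next state.
  def handler(state: Map[Long, String]): PartialFunction[Event, Map[Long, String]] = {
    case Request(id, payload) =>
      println(s"storing request $id")
      state + (id -> payload)
    case Timeout(id) =>
      println(s"dropping request $id after timeout")
      state - id
    case Shutdown =>
      println("shutting down")
      Map.empty
  }

  def main(args: Array[String]): Unit = {
    val events = Seq(Request(1, "a"), Request(2, "b"), Timeout(1), Shutdown)
    val finalState = events.foldLeft(Map.empty[Long, String]) { (st, ev) => handler(st)(ev) }
    println(finalState)
  }
}
```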

Place, publisher, year, edition, pages
ACM Digital Library, 2017
Keywords
component model, message-passing, distributed systems architecture
National Category
Information Systems
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-218781 (URN), 10.1145/3136000.3136009 (DOI), 2-s2.0-85037137982 (Scopus ID), 978-1-4503-5529-2 (ISBN)
Conference
ACM SIGPLAN International Symposium on Scala
Note

QC 20180111

Available from: 2017-11-30. Created: 2017-11-30. Last updated: 2018-01-13. Bibliographically approved.
Carbone, P., Gévay, G. E., Hermann, G., Katsifodimos, A., Soto, J., Markl, V. & Haridi, S. (2017). Large-scale data stream processing systems. In: Handbook of Big Data Technologies (pp. 219-260). Springer International Publishing
Large-scale data stream processing systems
2017 (English). In: Handbook of Big Data Technologies, Springer International Publishing, 2017, p. 219-260. Chapter in book (Other academic).
Abstract [en]

In our data-centric society, online services, decision making, and other aspects are increasingly becoming heavily dependent on trends and patterns extracted from data. A broad class of societal-scale data management problems requires system support for processing unbounded data with low latency and high throughput. Large-scale data stream processing systems perceive data as infinite streams and are designed to satisfy such requirements. They have further evolved substantially both in terms of expressive programming model support and also efficient and durable runtime execution on commodity clusters. Expressive programming models offer convenient ways to declare continuous data properties and applied computations, while hiding details on how these data streams are physically processed and orchestrated in a distributed environment. Execution engines provide a runtime for such models further allowing for scalable yet durable execution of any declared computation. In this chapter we introduce the major design aspects of large scale data stream processing systems, covering programming model abstraction levels and runtime concerns. We then present a detailed case study on stateful stream processing with Apache Flink, an open-source stream processor that is used for a wide variety of processing tasks. Finally, we address the main challenges of disruptive applications that large-scale data streaming enables from a systemic point of view.
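
As a concrete taste of the high-level programming model the chapter describes, the sketch below uses Apache Flink's (legacy) Scala DataStream API to declare a continuous word count over an unbounded socket source; the host and port are placeholders. The program only states what to compute, and the runtime decides how the operators are parallelised and scheduled.

```scala
import org.apache.flink.streaming.api.scala._

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Unbounded text source (e.g. fed by `nc -lk 9999`); host and port are placeholders.
    val lines: DataStream[String] = env.socketTextStream("localhost", 9999)

    lines
      .flatMap(_.toLowerCase.split("\\W+").filter(_.nonEmpty))
      .map(word => (word, 1))
      .keyBy(_._1)          // partition the stream by word
      .sum(1)               // keep a running count per word
      .print()

    env.execute("streaming word count")
  }
}
```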

Place, publisher, year, edition, pages
Springer International Publishing, 2017
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-216552 (URN), 10.1007/978-3-319-49340-4_7 (DOI), 2-s2.0-85019960984 (Scopus ID), 9783319493404 (ISBN), 9783319493398 (ISBN)
Note

QC 20171108

Available from: 2017-11-08. Created: 2017-11-08. Last updated: 2017-11-08. Bibliographically approved.
Carbone, P., Ewen, S., Fóra, G., Haridi, S., Richter, S. & Tzoumas, K. (2017). State Management in Apache Flink®: Consistent Stateful Distributed Stream Processing. Proceedings of the VLDB Endowment, 10(12), 1718-1729
State Management in Apache Flink®: Consistent Stateful Distributed Stream Processing
2017 (English). In: Proceedings of the VLDB Endowment, ISSN 2150-8097, E-ISSN 2150-8097, Vol. 10, no 12, p. 1718-1729. Article in journal (Refereed). Published.
Abstract [en]

Stream processors are emerging in industry as an apparatus that drives analytical but also mission-critical services handling the core of persistent application logic. Thus, apart from scalability and low latency, a rising system need is first-class support for application state together with strong consistency guarantees, and adaptivity to cluster reconfigurations, software patches and partial failures. Although prior systems research has addressed some of these specific problems, the practical challenge lies in how such guarantees can be materialized in a transparent, non-intrusive manner that relieves the user from unnecessary constraints. Such needs served as the main design principles of state management in Apache Flink, an open-source, scalable stream processor. We present Flink's core pipelined, in-flight mechanism which guarantees the creation of lightweight, consistent, distributed snapshots of application state, progressively, without impacting continuous execution. Consistent snapshots cover all needs for system reconfiguration, fault tolerance and version management through coarse-grained rollback recovery. Application state is declared explicitly to the system, allowing efficient partitioning and transparent commits to persistent storage. We further present Flink's backend implementations and mechanisms for high availability, external state queries and output commit. Finally, we demonstrate how these mechanisms behave in practice with metrics and large-deployment insights exhibiting the low performance trade-offs of our approach and the general benefits of exploiting asynchrony in continuous, yet sustainable system deployments.
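
Two of the ingredients above, state declared explicitly to the system and periodic consistent snapshots taken without stopping the pipeline, surface directly in Flink's public APIs. The sketch below (Scala, with placeholder data and an arbitrarily chosen 10-second checkpoint interval) declares a per-key ValueState and enables checkpointing; it illustrates the user-facing side only and does not reproduce the snapshotting mechanism itself.

```scala
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

object DeclaredStateSketch {

  class RunningSum extends RichFlatMapFunction[(String, Long), (String, Long)] {
    @transient private var sum: ValueState[java.lang.Long] = _

    override def open(parameters: Configuration): Unit =
      // Declaring the state makes it known to Flink, so it is partitioned by key
      // and included in every checkpoint/savepoint.
      sum = getRuntimeContext.getState(
        new ValueStateDescriptor[java.lang.Long]("running-sum", classOf[java.lang.Long]))

    override def flatMap(in: (String, Long), out: Collector[(String, Long)]): Unit = {
      val current = Option(sum.value()).map(_.longValue()).getOrElse(0L) + in._2
      sum.update(current)
      out.collect((in._1, current))
    }
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.enableCheckpointing(10000) // take a consistent snapshot every 10 s

    env.fromElements(("a", 1L), ("b", 2L), ("a", 3L)) // placeholder input
      .keyBy(_._1)
      .flatMap(new RunningSum)
      .print()

    env.execute("declared state sketch")
  }
}
```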

National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-220296 (URN), 000416494000011 (), 2-s2.0-85036646347 (Scopus ID)
Note

QC 20171222

Available from: 2017-12-22. Created: 2017-12-22. Last updated: 2018-01-13. Bibliographically approved.
Yalew, S. D., Mendonca, P., Maguire Jr., G. Q., Haridi, S. & Correia, M. (2017). TruApp: A TrustZone-based Authenticity Detection Service for Mobile Apps. In: 2017 IEEE 13th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob). Paper presented at 13th IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), October 9-11, 2017, Rome, Italy (pp. 791-799). IEEE
TruApp: A TrustZone-based Authenticity Detection Service for Mobile Apps
2017 (English). In: 2017 IEEE 13th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), IEEE, 2017, p. 791-799. Conference paper, Published paper (Refereed).
Abstract [en]

In less than a decade, mobile apps have become an integral part of our lives. In several situations it is important to provide assurance that a mobile app is authentic, i.e., that it is indeed the app produced by a certain company. However, this is challenging, as such apps can be repackaged, the user may be malicious, or the app may have been tampered with by an attacker. This paper presents the design of TRUAPP, a software authentication service that provides assurance of the authenticity and integrity of apps running on mobile devices. TRUAPP provides such assurance, even if the operating system is compromised, by leveraging the ARM TrustZone hardware security extension. TRUAPP uses a set of techniques (static watermarking, dynamic watermarking, and cryptographic hashes) to verify the integrity of the apps. The service was implemented on a hardware board that emulates a mobile device, which was used to perform a thorough experimental evaluation of the service.
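
Of the three techniques listed, the cryptographic-hash check is the simplest to illustrate. The sketch below hashes an installed package and compares it with a reference digest; in TRUAPP the reference value and the comparison would live inside the TrustZone secure world, whereas here the file path and the expected digest are placeholders.

```scala
import java.nio.file.{Files, Paths}
import java.security.MessageDigest

object ApkHashCheckSketch {
  // SHA-256 digest of a file, rendered as lowercase hex.
  def sha256Hex(path: String): String = {
    val digest = MessageDigest.getInstance("SHA-256")
    val bytes = Files.readAllBytes(Paths.get(path))
    digest.digest(bytes).map(b => f"$b%02x").mkString
  }

  def main(args: Array[String]): Unit = {
    val expected = "0000000000000000000000000000000000000000000000000000000000000000" // placeholder
    val actual = sha256Hex("/data/app/example.apk") // hypothetical package path
    if (actual == expected) println("package digest matches the reference")
    else println(s"integrity check failed: $actual")
  }
}
```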

Place, publisher, year, edition, pages
IEEE, 2017
Series
IEEE International Conference on Wireless and Mobile Computing Networking and Communications-WiMOB, ISSN 2160-4886
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-222218 (URN), 000419818000108 (), 978-1-5386-3839-2 (ISBN)
Conference
13th IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), October 9-11, 2017, Rome, Italy
Note

QC 20180205

Available from: 2018-02-05. Created: 2018-02-05. Last updated: 2018-03-07. Bibliographically approved.
Zeng, J., Barreto, J., Haridi, S., Rodrigues, L. & Romano, P. (2016). The Future(s) of Transactional Memory. In: Proceedings of the International Conference on Parallel Processing. Paper presented at 45th International Conference on Parallel Processing, ICPP 2016, 16 August 2016 through 19 August 2016 (pp. 442-451). Institute of Electrical and Electronics Engineers (IEEE)
The Future(s) of Transactional Memory
2016 (English). In: Proceedings of the International Conference on Parallel Processing, Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 442-451. Conference paper, Published paper (Refereed).
Abstract [en]

This work investigates how to combine two powerful abstractions for managing concurrent programming: Transactional Memory (TM) and futures. The former hides from programmers the complexity of synchronizing concurrent access to shared data, via the familiar abstraction of atomic transactions. The latter serves to schedule and synchronize the parallel execution of computations whose results are not immediately required. While TM and futures are two widely investigated topics, the problem of how to exploit these two abstractions in synergy is still largely unexplored in the literature. This paper fills this gap by introducing Java Transactional Futures (JTF), a Java-based TM implementation that allows programmers to use futures to coordinate the execution of parallel tasks, while leveraging transactions to synchronize accesses to shared data. JTF provides simple and intuitive semantics regarding the admissible serialization orders of the futures spawned by transactions, by ensuring that the results produced by a future are always consistent with those that one would obtain by executing the future sequentially. Our experimental results show that the use of futures in a TM not only unlocks parallelism within transactions, but also reduces the cost of conflicts among top-level transactions in high-contention workloads.
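
The "futures" half of the combination is standard: a computation whose result is not yet needed is spawned, the caller keeps working, and the result is claimed later. The sketch below shows only that half with plain Scala Futures; it deliberately does not attempt JTF's contribution, the transactional serialization guarantees for futures spawned inside transactions, and none of the names below come from the JTF API.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FuturesSketch {
  def main(args: Array[String]): Unit = {
    // Spawn an expensive computation; control returns to the caller immediately.
    val partial: Future[Long] = Future {
      (1L to 10000000L).sum
    }

    // The caller continues with other work while the future runs in parallel.
    val local = (1L to 1000L).map(_ * 2).sum

    // The result is only awaited at the point where it is actually required.
    val total = local + Await.result(partial, 10.seconds)
    println(s"total = $total")
  }
}
```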

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2016
Keywords
Concurrent programming, Futures, Transactional Memory, Abstracting, Computer programming, Java programming language, Semantics, Storage allocation (computer), Atomic transaction, Concurrent access, High contentions, Parallel executions, Parallel task, Concurrency control
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-195320 (URN), 10.1109/ICPP.2016.57 (DOI), 000387089600050 (), 2-s2.0-84990909847 (Scopus ID), 9781509028238 (ISBN)
Conference
45th International Conference on Parallel Processing, ICPP 2016, 16 August 2016 through 19 August 2016
Note

QC 20161109

Available from: 2016-11-09. Created: 2016-11-02. Last updated: 2017-01-10. Bibliographically approved.
Rahimian, F., Payberah, A. H., Girdzijauskas, S., Jelasity, M. & Haridi, S. (2015). A Distributed Algorithm for Large-Scale Graph Partitioning. ACM Transactions on Autonomous and Adaptive Systems, 10(2), Article ID 12.
A Distributed Algorithm for Large-Scale Graph Partitioning
2015 (English). In: ACM Transactions on Autonomous and Adaptive Systems, ISSN 1556-4665, E-ISSN 1556-4703, Vol. 10, no 2, article id 12. Article in journal (Refereed). Published.
Abstract [en]

Balanced graph partitioning is an NP-complete problem with a wide range of applications. These applications include many large-scale distributed problems, such as the optimal storage of large sets of graph-structured data over several hosts. However, in very large-scale distributed scenarios, state-of-the-art algorithms are not directly applicable because they typically involve frequent global operations over the entire graph. In this article, we propose a fully distributed algorithm called JA-BE-JA that uses local search and simulated annealing techniques for two types of graph partitioning: edge-cut partitioning and vertex-cut partitioning. The algorithm is massively parallel: there is no central coordination, each vertex is processed independently, and only the direct neighbors of a vertex and a small subset of random vertices in the graph need to be known locally. Strict synchronization is not required. These features allow JA-BE-JA to be easily adapted to any distributed graph-processing system, from data centers to fully distributed networks. We show that the minimal edge-cut value empirically achieved by JA-BE-JA is comparable to that of state-of-the-art centralized algorithms such as Metis. In particular, on large social networks, JA-BE-JA outperforms Metis. We also show that JA-BE-JA computes very low vertex-cuts, which prove significantly more effective than edge-cuts for processing most real-world graphs.
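
The local-search rule at the heart of JA-BE-JA can be shown with a toy sequential version: each vertex carries a partition colour, and two vertices exchange colours when the exchange increases the number of neighbours sharing their colour (and therefore lowers the edge-cut); because colours are swapped rather than copied, partition balance is preserved. The sketch below omits the decentralised execution and the simulated-annealing temperature of the real algorithm; all names and the example graph are made up for illustration.

```scala
import scala.util.Random

object PartitionSwapSketch {
  type Graph = Map[Int, Seq[Int]] // vertex -> neighbours

  // Number of v's neighbours (other than `partner`) that carry colour `c`.
  private def sameColour(g: Graph, colour: Map[Int, Int], v: Int, c: Int, partner: Int): Int =
    g(v).count(n => n != partner && colour(n) == c)

  def localSearch(g: Graph, initial: Map[Int, Int], rounds: Int, rnd: Random): Map[Int, Int] = {
    var colour = initial
    val vertices = g.keys.toIndexedSeq
    for (_ <- 1 to rounds) {
      val v = vertices(rnd.nextInt(vertices.size))
      val u = vertices(rnd.nextInt(vertices.size))
      if (v != u && colour(v) != colour(u)) {
        val before = sameColour(g, colour, v, colour(v), u) + sameColour(g, colour, u, colour(u), v)
        val after  = sameColour(g, colour, v, colour(u), u) + sameColour(g, colour, u, colour(v), v)
        // Swap only if it increases same-colour neighbours, i.e. lowers the edge-cut.
        if (after > before)
          colour = colour + (v -> colour(u)) + (u -> colour(v))
      }
    }
    colour
  }

  def main(args: Array[String]): Unit = {
    // Two triangles {1,2,3} and {4,5,6}; the initial colouring cuts many edges.
    val g: Graph = Map(1 -> Seq(2, 3), 2 -> Seq(1, 3), 3 -> Seq(1, 2),
                       4 -> Seq(5, 6), 5 -> Seq(4, 6), 6 -> Seq(4, 5))
    val init = Map(1 -> 0, 2 -> 1, 3 -> 0, 4 -> 1, 5 -> 0, 6 -> 1)
    println(localSearch(g, init, rounds = 200, new Random(42)))
  }
}
```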

Place, publisher, year, edition, pages
ACM, 2015
National Category
Engineering and Technology
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-170128 (URN), 10.1145/2714568 (DOI), 2-s2.0-84930973235 (Scopus ID)
Projects
End to End Clouds
Funder
Swedish Foundation for Strategic Research
Note

QC 20150629

Available from: 2015-06-26. Created: 2015-06-26. Last updated: 2017-12-04. Bibliographically approved.
Carbone, P., Fóra, G., Ewen, S., Haridi, S. & Tzoumas, K. (2015). Lightweight Asynchronous Snapshots for Distributed Dataflows.
Lightweight Asynchronous Snapshots for Distributed Dataflows
2015 (English). Report (Other academic).
Abstract [en]

Distributed stateful stream processing enables the deployment and execution of large scale continuous computations in the cloud, targeting both low latency and high throughput. One of the most fundamental challenges of this paradigm is providing processing guarantees under potential failures. Existing approaches rely on periodic global state snapshots that can be used for failure recovery. Those approaches suffer from two main drawbacks. First, they often stall the overall computation which impacts ingestion. Second, they eagerly persist all records in transit along with the operation states which results in larger snapshots than required. In this work we propose Asynchronous Barrier Snapshotting (ABS), a lightweight algorithm suited for modern dataflow execution engines that minimises space requirements. ABS persists only operator states on acyclic execution topologies while keeping a minimal record log on cyclic dataflows. We implemented ABS on Apache Flink, a distributed analytics engine that supports stateful stream processing. Our evaluation shows that our algorithm does not have a heavy impact on the execution, maintaining linear scalability and performing well with frequent snapshots. 
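
The sketch below simulates, single-threaded, the barrier-alignment step of ABS for one operator with two input channels on an acyclic topology: once a channel delivers the barrier, its subsequent records are buffered; when every channel has delivered it, only the operator state is snapshotted and the barrier is forwarded. The channel names, snapshot id and record values are made up for illustration, and the real algorithm runs asynchronously inside the dataflow engine.

```scala
import scala.collection.mutable

object BarrierAlignmentSketch {
  sealed trait Msg
  final case class Record(value: Long) extends Msg
  final case class Barrier(snapshotId: Int) extends Msg

  def main(args: Array[String]): Unit = {
    // Two input channels of one operator; barrier 7 arrives at different positions.
    val channels: Map[String, List[Msg]] = Map(
      "in-1" -> List(Record(1), Barrier(7), Record(10)),
      "in-2" -> List(Record(2), Record(3), Barrier(7), Record(20)))

    val queues = channels.view.mapValues(_.to(mutable.Queue)).toMap
    var state = 0L                              // operator state: a running sum
    val blocked = mutable.Set.empty[String]     // channels that already delivered the barrier
    val buffered = mutable.Buffer.empty[Record] // post-barrier records held back during alignment

    while (queues.values.exists(_.nonEmpty)) {
      // Channels that have not delivered the barrier yet are processed normally.
      for ((name, q) <- queues if q.nonEmpty && !blocked(name)) q.dequeue() match {
        case Record(v) => state += v
        case Barrier(id) =>
          blocked += name
          if (blocked.size == queues.size) {         // barrier seen on every input: aligned
            println(s"snapshot $id: state = $state") // persist only the operator state
            blocked.clear()                          // forward the barrier downstream, unblock,
            buffered.foreach(r => state += r.value)  // then process the held-back records
            buffered.clear()
          }
      }
      // Records arriving on already-blocked channels are buffered, not processed.
      for ((name, q) <- queues if q.nonEmpty && blocked(name)) q.dequeue() match {
        case r: Record  => buffered += r
        case _: Barrier => () // a later barrier would start the next alignment; not modelled here
      }
    }
    println(s"final state = $state")
  }
}
```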

Publisher
p. 8
Series
TRITA-ICT ; 2015:08
Keywords
fault tolerance, distributed computing, stream processing, dataflow, cloud computing, state management
National Category
Computer Systems
Research subject
Information and Communication Technology; Computer Science
Identifiers
urn:nbn:se:kth:diva-170185 (URN), 978-91-7595-651-0 (ISBN)
Note

QC 20150630

Available from: 2015-06-28. Created: 2015-06-28. Last updated: 2015-06-30. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-6718-0144
