  • 1.
    Abbas, Zainab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Al-Shishtawy, Ahmad
    RISE SICS, Stockholm, Sweden.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS. RISE SICS, Stockholm, Sweden.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Short-Term Traffic Prediction Using Long Short-Term Memory Neural Networks (2018). Conference paper (Refereed)
    Abstract [en]

    Short-term traffic prediction allows Intelligent Transport Systems to proactively respond to events before they happen. With the rapid increase in the amount, quality, and detail of traffic data, new techniques are required that can exploit the information in the data in order to provide better results while being able to scale and cope with increasing amounts of data and growing cities. We propose and compare three models for short-term road traffic density prediction based on Long Short-Term Memory (LSTM) neural networks. We have trained the models using real traffic data collected by the Motorway Control System in Stockholm, which monitors highways and collects flow and speed data per lane every minute from radar sensors. In order to deal with the challenge of scale and to improve prediction accuracy, we propose to partition the road network into road stretches and junctions, and to model each of the partitions with one or more LSTM neural networks. Our evaluation results show that partitioning of roads improves the prediction accuracy by reducing the root mean square error by a factor of 5. We show that we can reduce the complexity of the LSTM network by limiting the number of input sensors, on average to 35% of the original number, without compromising the prediction accuracy.
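
    As a concrete illustration of the modeling approach described above, here is a minimal Python sketch of an LSTM density predictor for one road partition. The layer size, lookback window, sensor count, and Keras API choice are illustrative assumptions, not the paper's configuration.

        import numpy as np
        from tensorflow import keras

        lookback, n_sensors = 10, 8  # assumed: 10 past minutes, 8 sensors in one partition
        model = keras.Sequential([
            keras.layers.LSTM(64, input_shape=(lookback, n_sensors)),  # encodes recent history
            keras.layers.Dense(n_sensors),  # predicted next-minute density per sensor
        ])
        model.compile(optimizer="adam", loss="mse")  # the reported RMSE is the square root of this loss

        # toy stand-in for per-lane flow/density readings from radar sensors
        x = np.random.rand(256, lookback, n_sensors).astype("float32")
        y = np.random.rand(256, n_sensors).astype("float32")
        model.fit(x, y, epochs=2, batch_size=32, verbose=0)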

  • 2.
    Abbas, Zainab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Kalavri, Vasiliki
    Systems Group, ETH, Zurich, Switzerland.
    Carbone, Paris
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Streaming Graph Partitioning: An Experimental Study (2018). In: Proceedings of the VLDB Endowment, ISSN 2150-8097, E-ISSN 2150-8097, Vol. 11, no 11, p. 1590-1603. Article in journal (Refereed)
    Abstract [en]

    Graph partitioning is an essential yet challenging task for massive graph analysis in distributed computing. Common graph partitioning methods scan the complete graph to obtain structural characteristics offline, before partitioning. However, the emerging need for low-latency, continuous graph analysis led to the development of online partitioning methods. Online methods ingest edges or vertices as a stream, making partitioning decisions on the fly based on partial knowledge of the graph. Prior studies have compared offline graph partitioning techniques across different systems. Yet, little effort has been put into investigating the characteristics of online graph partitioning strategies.

    In this work, we describe and categorize online graph partitioning techniques based on their assumptions, objectives, and costs. Furthermore, we perform an experimental comparison across different applications and datasets, using a unified distributed runtime based on Apache Flink. Our experimental results show that model-dependent online partitioning techniques such as low-cut algorithms offer better performance for communication-intensive applications such as bulk synchronous iterative algorithms, albeit at higher partitioning costs. Otherwise, model-agnostic techniques trade off data locality for lower partitioning costs and balanced workloads, which is beneficial when executing data-parallel single-pass graph algorithms.
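
    For orientation, a toy Python sketch of one well-known streaming vertex-partitioning heuristic, Linear Deterministic Greedy (LDG): each arriving vertex goes to the partition holding most of its neighbors, discounted by how full that partition is. This standalone sketch is illustrative; the paper's study runs several such strategies on a unified Apache Flink runtime.

        from collections import defaultdict

        def ldg_partition(stream, k, capacity):
            """stream yields (vertex, neighbors); each vertex is assigned on arrival."""
            assignment, sizes = {}, defaultdict(int)
            for v, neighbors in stream:
                def score(p):
                    placed = sum(1 for u in neighbors if assignment.get(u) == p)
                    return placed * (1.0 - sizes[p] / capacity)
                # tie-break toward the least-loaded partition
                best = max(range(k), key=lambda p: (score(p), -sizes[p]))
                assignment[v] = best
                sizes[best] += 1
            return assignment

        stream = [(1, [2, 3]), (2, [1]), (3, [1, 2]), (4, [3])]
        print(ldg_partition(stream, k=2, capacity=2))  # e.g. {1: 0, 2: 0, 3: 1, 4: 1}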

  • 3.
    Abbas, Zainab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Sigurdsson, Thorsteinn Thorri
    KTH.
    Al-Shishtawy, Ahmad
    RISE Res Inst Sweden, Stockholm, Sweden.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Evaluation of the Use of Streaming Graph Processing Algorithms for Road Congestion Detection (2018). In: 2018 IEEE INT CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, UBIQUITOUS COMPUTING & COMMUNICATIONS, BIG DATA & CLOUD COMPUTING, SOCIAL COMPUTING & NETWORKING, SUSTAINABLE COMPUTING & COMMUNICATIONS / [ed] Chen, JJ; Yang, LT, IEEE COMPUTER SOC, 2018, p. 1017-1025. Conference paper (Refereed)
    Abstract [en]

    Real-time road congestion detection allows improving traffic safety and route planning. In this work, we propose to use streaming graph processing algorithms for road congestion detection and evaluate their accuracy and performance. We represent road infrastructure sensors in the form of a directed weighted graph and adapt the Connected Components algorithm and some existing graph processing algorithms, originally used for community detection in social network graphs, to the task of road congestion detection. In our approach, we detect Connected Components or communities of sensors with similarly weighted edges that reflect different states in the traffic, e.g., free flow or a congested state, in regions covered by the detected sensor groups. We have adapted and implemented the Connected Components and community detection algorithms for detecting groups in the weighted sensor graphs in both batch and streaming fashion. We evaluate our approach by building and processing the road infrastructure sensor graph for Stockholm's highways using real-world data from the Motorway Control System operated by the Swedish traffic authority. Our results indicate that the Connected Components and DenGraph community detection algorithms can detect congestion with an accuracy of up to approximately 94% for Connected Components and up to approximately 88% for DenGraph. The Louvain Modularity algorithm for community detection fails to detect congestion regions for the sparsely connected graphs representing the roads considered in this study. The Hierarchical Clustering algorithm using speed and density readings is able to detect congestion, but without finer details such as shockwaves.
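
    A toy Python sketch of the batch Connected Components step on a weighted sensor graph, using union-find: adjacent sensors whose readings both indicate congestion are merged into one congested region. The 40 km/h threshold and the edge rule are illustrative assumptions, not the paper's calibration.

        def find(parent, x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        def congested_components(sensors, edges, speeds, threshold=40.0):
            """edges: (u, v) pairs of adjacent sensors; speeds: km/h reading per sensor."""
            parent = {s: s for s in sensors}
            for u, v in edges:
                if speeds[u] < threshold and speeds[v] < threshold:
                    parent[find(parent, u)] = find(parent, v)  # merge congested neighbors
            groups = {}
            for s in sensors:
                if speeds[s] < threshold:
                    groups.setdefault(find(parent, s), []).append(s)
            return list(groups.values())  # each group is one congested region

        speeds = {"s1": 25.0, "s2": 30.0, "s3": 80.0, "s4": 20.0}
        edges = [("s1", "s2"), ("s2", "s3"), ("s3", "s4")]
        print(congested_components(list(speeds), edges, speeds))  # [['s1', 's2'], ['s4']]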

  • 4.
    Alferez, Mauricio
    et al.
    Univ Luxembourg, Interdisciplinary Ctr Secur Reliabil & Trust SnT, 2 Ave JF Kennedy, L-1855 Luxembourg, Luxembourg.
    Acher, Mathieu
    Univ Rennes, DiverSE Team Inria Rennes, IRISA, CNRS, Rennes, France.
    Galindo, Jose A.
    Univ Seville, Dept Comp Languages & Syst, Seville, Spain.
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Benavides, David
    Univ Seville, Dept Comp Languages & Syst, Seville, Spain.
    Modeling variability in the video domain: language and experience report (2019). In: Software quality journal, ISSN 0963-9314, E-ISSN 1573-1367, Vol. 27, no 1, p. 307-347. Article in journal (Refereed)
    Abstract [en]

    In an industrial project, we addressed the challenge of developing a software-based video generator such that consumers and providers of video processing algorithms can benchmark them on a wide range of video variants. This article aims to report on our positive experience in modeling, controlling, and implementing software variability in the video domain. We describe how we have designed and developed a variability modeling language, called VM, resulting from close collaboration with industrial partners over 2 years. We expose the specific requirements and the advanced variability constructs we developed and used to characterize and derive variations of video sequences. The results of our experiments and industrial experience show that our solution is effective for modeling complex variability information and supports the synthesis of hundreds of realistic video variants. From the software language perspective, we learned that basic variability mechanisms are useful but not enough; attributes and multi-features are of primary importance; meta-information and specific constructs are relevant for scalable and purposeful reasoning over variability models. From the video domain and software perspective, we report on the practical benefits of a variability approach. With more automation and control, practitioners can now envision benchmarking video algorithms over large, diverse, controlled, yet realistic datasets (videos that mimic real recorded videos), something impossible at the beginning of the project.

  • 5.
    Apolonia, Nuno
    et al.
    Universitat Politecnica de Catalunya (UPC) Barcelona, Spain.
    Antaris, Stefanos
    Girdzijauskas, Šarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Pallis, G.
    Dikaiakos, Marios
    SELECT: A distributed publish/subscribe notification system for online social networks (2018). In: Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 970-979, article id 8425250. Conference paper (Refereed)
    Abstract [en]

    Publish/subscribe (pub/sub) mechanisms constitute an attractive communication paradigm in the design of large-scale notification systems for Online Social Networks (OSNs). To accommodate the large-scale workloads of notifications produced by OSNs, pub/sub mechanisms require thousands of servers distributed on different data centers all over the world, incurring large overheads. To eliminate the pub/sub resources used, we propose SELECT - a distributed pub/sub social notification system over peer-to-peer (P2P) networks. SELECT organizes the peers on a ring topology and provides an adaptive P2P connection establishment algorithm where each peer identifies the number of connections required, based on the social structure and user availability. This allows propagating messages to the social friends of the users using a reduced number of hops. The presented algorithm is an efficient heuristic for an NP-hard problem which maps workload graphs to structured P2P overlays, inducing an overall, close to theoretical, minimal number of messages. Experiments show that SELECT reduces the number of relay nodes by up to 89% versus state-of-the-art pub/sub notification systems. Additionally, we demonstrate the advantage of SELECT against socially-aware P2P overlay networks and show that the communication between two socially connected peers is reduced on average by at least 64% in hop count, while achieving 100% communication availability even under high churn.

  • 6.
    Bahri, Leila
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Carminati, Barbara
    Ferrari, Elena
    Univ Insubria, Dept Theoret & Appl Sci, Varese, Italy.
    Knowledge-based approaches for identity management in online social networks (2018). In: WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, ISSN 1942-4787, Vol. 8, no 5, article id e1260. Article, review/survey (Refereed)
    Abstract [en]

    When we meet a new person, we start by introducing ourselves. We share our names, and other information about our jobs, cities, family status, and so on. This is how socializing and social interactions can start: we first need to identify each other. Identification is a cornerstone in establishing social contacts. We identify ourselves and others by a set of civil (e.g., name, nationality, ID number, gender) and social (e.g., music taste, hobbies, religion) characteristics. This identification process, carried out seamlessly in face-to-face interactions, is challenged in the virtual realms of socializing, such as in online social network (OSN) platforms. New identities (i.e., online profiles) can be created without being subject to any level of verification, making it easy to create fake information and forge fake identities. This has led to a massive proliferation of accounts that represent fake identities (i.e., that do not map to physically existing entities) and that poison the online socializing environment with fake information and malicious behavior (e.g., child abuse, information stealing). Within this milieu, users in OSNs are left unarmed against the challenging task of identifying the real person behind the screen. OSN providers and research bodies have dedicated considerable effort to the study of the behavior and features of fake OSN identities, trying to find ways to detect them. Other research initiatives have explored possible techniques to enable identity validation in OSNs. Both kinds of approach rely on extracting knowledge from the OSN and exploiting it to achieve identity management in their realms. We provide a review of the most prominent works in the literature. We define the problem, provide a taxonomy of related attacks, and discuss the available solutions and approaches for knowledge-based identity management in OSNs. This article is categorized under: Fundamental Concepts of Data and Knowledge > Human Centricity and User Interaction; Application Areas > Internet and Web-Based Applications; Application Areas > Society and Culture.

  • 7.
    Bahri, Leila
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Trust Mends Blockchains: Living up to Expectations (2019). In: IEEE 39th International Conference on Distributed Computing Systems (ICDCS), Dallas, July 7-10, 2019. Conference paper (Refereed)
    Abstract [en]

    At the heart of Blockchains is the trustless leader election mechanism for achieving consensus among pseudo-anonymous peers, without the need of oversight from any third party or authority whatsoever. So far, two main mechanisms are being discussed: proof-of-work (PoW) and proof-of-stake (PoS). PoW relies on the demonstration of computational power, and comes at the cost of huge energy wastage in return for a stake in crypto-currency. PoS tries to address this by relying on owned stake (i.e., the amount of crypto-currency) in the system. In both cases, Blockchains are limited to systems with a financial basis. This forces non-crypto-currency Blockchain applications to resort to a “permissioned” setting only, effectively centralizing the system. However, non-crypto-currency permissionless blockchains could enable secure and self-governed peer-to-peer structures for numerous emerging application domains, such as education and health, where some trust exists among peers. This creates a new possibility for valuing trust among peers and capitalizing on it as the basis (stake) for reaching consensus. In this paper we show that there is a viable way for permissionless non-financial blockchains to operate in completely decentralized environments and achieve leader election through proof-of-trust (PoT). In our PoT construction, peer trust is extracted from a trust network that emerges in a decentralized manner and is used as a waiver for the effort to be spent on PoW, thus dramatically reducing the total energy expenditure of the system. Furthermore, our PoT construction is resilient to the risk of small cartels monopolizing the network (as happens with the mining-pool phenomenon in PoW) and is not vulnerable to sybils. We evaluate security guarantees, and perform an experimental evaluation of our construction, demonstrating up to 10-fold energy savings compared to PoW without trading off any of the decentralization characteristics, with further guarantees against risks of monopolization.
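
    A minimal Python sketch of the proof-of-trust idea: trust earned in the network waives part of the proof-of-work difficulty. The hash puzzle, the [0, 1] trust scale, and the linear waiver rule are illustrative assumptions, not the paper's construction.

        import hashlib

        def pot_difficulty(base_bits, trust, max_waiver_bits=12):
            """Assumed waiver rule: trust in [0, 1] removes up to max_waiver_bits of work."""
            return max(1, base_bits - int(trust * max_waiver_bits))

        def mine(header, difficulty_bits):
            """Toy PoW: find a nonce whose SHA-256 hash has difficulty_bits leading zero bits."""
            target = 1 << (256 - difficulty_bits)
            nonce = 0
            while int.from_bytes(hashlib.sha256(f"{header}:{nonce}".encode()).digest(), "big") >= target:
                nonce += 1
            return nonce

        # a trusted peer searches ~2^10 nonces here; an untrusted one would face ~2^20
        print(mine("block-42", pot_difficulty(20, trust=0.9)))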

  • 8.
    Bahri, Leila
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    When Trust Saves Energy - A Reference Framework for Proof-of-Trust (PoT) Blockchains (2018). In: WWW '18 Companion Proceedings of The Web Conference 2018, ACM Digital Library, 2018, p. 1165-1169. Conference paper (Refereed)
    Abstract [en]

    Blockchains are attracting the attention of many technical, financial, and industrial parties as a promising infrastructure for achieving secure peer-to-peer (P2P) transactional systems. At the heart of blockchains is proof-of-work (PoW), a trustless leader election mechanism based on demonstration of computational power. PoW provides blockchain security in trustless P2P environments, but comes at the expense of wasting huge amounts of energy. In this research work, we question this energy expenditure of PoW under blockchain use cases where some form of trust exists between the peers. We propose a Proof-of-Trust (PoT) blockchain where peer trust is valuated in the network based on a trust graph that emerges in a decentralized fashion and that is encoded in and managed by the blockchain itself. This trust is then used as a waiver for the difficulty of PoW; that is, the more trust you prove in the network, the less work you do.

  • 9.
    Baudry, Benoit
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Harrand, Nicolas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Schulte, E.
    Timperley, C.
    Tan, S. H.
    Selakovic, M.
    Ugherughe, E.
    A spoonful of DevOps helps the GI go down (2018). In: Proceedings - International Conference on Software Engineering, IEEE Computer Society, 2018, p. 35-36. Conference paper (Refereed)
    Abstract [en]

    DevOps emphasizes a high degree of automation at all phases of the software development lifecycle. Meanwhile, Genetic Improvement (GI) focuses on the automatic improvement of software artifacts. In this paper, we discuss why we believe that DevOps offers an excellent technical context for easing the adoption of GI techniques by software developers. We also discuss A/B testing as a prominent and clear example of GI taking place in the wild today, albeit one with human-supervised fitness and mutation operators.

  • 10.
    Borlenghi, Simone
    et al.
    KTH, School of Engineering Sciences (SCI), Applied Physics, Materials and Nanophysics.
    Boman, Magnus
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS. RISE SICS, Electrum 229, SE-16429 Kista, Sweden.
    Delin, Anna
    KTH, School of Industrial Engineering and Management (ITM), Materials Science and Engineering, Applied Material Physics. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Modeling reservoir computing with the discrete nonlinear Schrödinger equation (2018). In: Physical review. E, ISSN 2470-0045, E-ISSN 2470-0053, Vol. 98, no 5, article id 052101. Article in journal (Refereed)
    Abstract [en]

    We formulate, using the discrete nonlinear Schrödinger equation (DNLS), a general approach to encode and process information based on reservoir computing. Reservoir computing is a promising avenue for realizing neuromorphic computing devices. In such computing systems, training is performed only at the output level by adjusting the output from the reservoir with respect to a target signal. In our formulation, the reservoir can be an arbitrary physical system, driven out of thermal equilibrium by an external driving. The DNLS is a general oscillator model with broad application in physics, and we argue that our approach is completely general and does not depend on the physical realization of the reservoir. The driving, which encodes the object to be recognized, acts as a thermodynamic force, one for each node in the reservoir. Currents associated with these thermodynamic forces in turn encode the output signal from the reservoir. As an example, we consider numerically the problem of supervised learning for pattern recognition, using as a reservoir a network of nonlinear oscillators.
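
    For reference, a common textbook form of the driven DNLS on a network of coupled nodes, in LaTeX; the coupling J, nonlinearity \nu, and driving f_n(t) are generic notation and may differ from the paper's conventions:

        i\,\frac{d\psi_n}{dt} = -J \sum_{m \in \mathcal{N}(n)} \psi_m
                                + \nu\,\lvert\psi_n\rvert^{2}\,\psi_n + f_n(t)

    Here \psi_n is the complex amplitude at node n, \mathcal{N}(n) its neighbors in the reservoir network, and f_n(t) the external driving that encodes the input pattern; the output is read from the currents these drivings induce.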

  • 11. Bousse, Erwan
    et al.
    Leroy, Dorian
    Combemale, Benoit
    Wimmer, Manuel
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Omniscient debugging for executable DSLs (2018). In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 137, p. 261-288. Article in journal (Refereed)
    Abstract [en]

    Omniscient debugging is a promising technique that relies on execution traces to enable free traversal of the states reached by a model (or program) during an execution. While a few General-Purpose Languages (GPLs) already have support for omniscient debugging, developing such a complex tool for any executable Domain-Specific Language (DSL) remains a challenging and error-prone task. A generic solution must support a wide range of executable DSLs independently of the metaprogramming approaches used for implementing their semantics, and must be efficient to ensure good responsiveness. Our contribution relies on a generic omniscient debugger supported by efficient generic trace management facilities. To support a wide range of executable DSLs, the debugger provides a common set of debugging facilities, and is based on a pattern to define runtime services independently of metaprogramming approaches. Results show that our debugger can be used with various executable DSLs implemented with different metaprogramming approaches. As compared to a solution that copies the model at each step, it is on average six times more efficient in memory, and at least 2.2 times faster when exploring past execution states, while only slowing down the execution 1.6 times on average.

  • 12.
    Broman, David
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Hybrid Simulation Safety: Limbos and Zero Crossings (2018). In: Principles of Modeling: Essays Dedicated to Edward A. Lee on the Occasion of His 60th Birthday, Springer, 2018, p. 106-121. Chapter in book (Refereed)
    Abstract [en]

    Physical systems can be naturally modeled by combining continuous and discrete models. Such hybrid models may simplify the modeling task of complex systems, as well as increase simulation performance. Moreover, modern simulation engines can often efficiently generate simulation traces, but how do we know that the simulation results are correct? If we detect an error, is the error in the model or in the simulation itself? This paper discusses the problem of simulation safety, with a focus on hybrid modeling and simulation. In particular, two key aspects are studied: safe zero-crossing detection and deterministic hybrid event handling. The problems and solutions are discussed and partially implemented in Modelica and Ptolemy II.

  • 13.
    Broman, David
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Siek, J. G.
    United States.
    Gradually typed symbolic expressions (2017). In: PEPM 2018 - Proceedings of the ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, Co-located with POPL 2018, Association for Computing Machinery (ACM), 2017, p. 15-29. Conference paper (Refereed)
    Abstract [en]

    Embedding a domain-specific language (DSL) in a general-purpose host language is an efficient way to develop a new DSL. Various kinds of languages and paradigms can be used as host languages, including object-oriented, functional, statically typed, and dynamically typed variants, all having their pros and cons. For deep embedding, statically typed languages enable early checking and potentially good DSL error messages, instead of reporting runtime errors. Dynamically typed languages, on the other hand, enable flexible transformations, thus avoiding extensive boilerplate code. In this paper, we introduce the concept of gradually typed symbolic expressions that mix static and dynamic typing for symbolic data. The key idea is to combine the strengths of dynamic and static typing in the context of deep embedding of DSLs. We define a gradually typed calculus <*>, formalize its type system and dynamic semantics, and prove type safety. We introduce a host language called Modelyze that is based on <*>, and evaluate the approach by embedding a series of equation-based domain-specific modeling languages, all within the domain of physical modeling and simulation.

  • 14.
    Carbone, Paris
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Scalable and Reliable Data Stream Processing (2018). Doctoral thesis, monograph (Other academic)
    Abstract [en]

    Data-stream management systems have for long been considered as a promising architecture for fast data management. The stream processing paradigm poses an attractive means of declaring persistent application logic coupled with state over evolving data. However, despite contributions in programming semantics addressing certain aspects of data streaming, existing approaches have been lacking a clear, universal specification for the underlying system execution. We investigate the case of data stream processing as a general-purpose scalable computing architecture that can support continuous and iterative state-driven workloads. Furthermore, we examine how this architecture can enable the composition of reliable, reconfigurable services and complex applications that go even beyond the needs of scalable data analytics, a major trend in the past decade.

    In this dissertation, we specify a set of core components and mechanisms to compose reliable data stream processing systems while adopting three crucial design principles: blocking-coordination avoidance, programming-model transparency, and compositionality. Furthermore, we identify the core open challenges among the academic and industrial state of the art and provide a complete solution using these design principles as a guide. Our contributions address the following problems: I) Reliable Execution and Stream State Management, II) Computation Sharing and Semantics for Stream Windows, and III) Iterative Data Streaming. Several parts of this work have been integrated into Apache Flink, a widely-used, open-source scalable computing framework, and supported the deployment of hundreds of long-running large-scale production pipelines worldwide.

  • 15.
    Castañeda Lozano, Roberto
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS. RISE SICS (Swedish Institute of Computer Science).
    Constraint-Based Register Allocation and Instruction Scheduling (2018). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to improve latency or throughput) are central compiler problems. This dissertation proposes a combinatorial optimization approach to these problems that delivers optimal solutions according to a model, captures trade-offs between conflicting decisions, accommodates processor-specific features, and handles different optimization criteria.

    The use of constraint programming and a novel program representation enables a compact model of register allocation and instruction scheduling. The model captures the complete set of global register allocation subproblems (spilling, assignment, live range splitting, coalescing, load-store optimization, multi-allocation, register packing, and rematerialization) as well as additional subproblems that handle processor-specific features beyond the usual scope of conventional compilers.

    The approach is implemented in Unison, an open-source tool used in industry and research that complements the state-of-the-art LLVM compiler. Unison applies general and problem-specific constraint solving methods to scale to medium-sized functions, solving functions of up to 647 instructions optimally and improving functions of up to 874 instructions. The approach is evaluated experimentally using different processors (Hexagon, ARM and MIPS), benchmark suites (MediaBench and SPEC CPU2006), and optimization criteria (speed and code size reduction). The results show that Unison generates code of slightly to significantly better quality than LLVM, depending on the characteristics of the targeted processor (1% to 9.3% mean estimated speedup; 0.8% to 3.9% mean code size reduction). Additional experiments for Hexagon show that its estimated speedup has a strong monotonic relationship to the actual execution speedup, resulting in a mean speedup of 5.4% across MediaBench applications.

    The approach contributed by this dissertation is the first of its kind that is practical (it captures the complete set of subproblems, scales to medium-sized functions, and generates executable code) and effective (it generates better code than the LLVM compiler, fulfilling the promise of combinatorial optimization). It can be applied to trade compilation time for code quality beyond the usual optimization levels, explore and exploit processor-specific features, and identify improvement opportunities in conventional compilers.
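
    To make the register-assignment core concrete, here is a toy Python backtracking search that maps interfering variables to k registers. This is only a didactic stand-in: Unison's actual constraint-programming model also covers spilling, live range splitting, coalescing, rematerialization, and instruction scheduling.

        def assign_registers(variables, interferes, k):
            """interferes: set of frozenset pairs of variables live at the same time."""
            assignment = {}
            def ok(v, r):
                return all(assignment.get(u) != r
                           for u in variables if frozenset((u, v)) in interferes)
            def solve(i):
                if i == len(variables):
                    return True
                v = variables[i]
                for r in range(k):
                    if ok(v, r):
                        assignment[v] = r
                        if solve(i + 1):
                            return True
                        del assignment[v]
                return False  # no k-register assignment; a real allocator would spill
            return assignment if solve(0) else None

        interf = {frozenset(p) for p in [("a", "b"), ("b", "c")]}
        print(assign_registers(["a", "b", "c"], interf, k=2))  # {'a': 0, 'b': 1, 'c': 0}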

  • 16.
    Castañeda Lozano, Roberto
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS. RISE SICS (Swedish Institute of Computer Science).
    Carlsson, Mats
    RISE SICS (Swedish Institute of Computer Science).
    Hjort Blindell, Gabriel
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Schulte, Christian
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Combinatorial Register Allocation and Instruction Scheduling (2018). Report (Other academic)
    Abstract [en]

    This paper introduces a combinatorial optimization approach to register allocation and instruction scheduling, two central compiler problems. Combinatorial optimization has the potential to solve these problems optimally and to exploit processor-specific features readily. Our approach is the first to leverage this potential in practice: it captures the complete set of program transformations used in state-of-the-art compilers, scales to medium-sized functions of up to 1000 instructions, and generates executable code. This level of practicality is reached by using constraint programming, a particularly suitable combinatorial optimization technique. Unison, the implementation of our approach, is open source, used in industry, and integrated with the LLVM toolchain.

    An extensive evaluation of estimated speed, code size, and scalability confirms that Unison generates better code than LLVM while scaling to medium-sized functions. The evaluation uses systematically selected benchmarks from MediaBench and SPEC CPU2006 and different processor architectures (Hexagon, ARM, MIPS). Mean estimated speedup ranges from 1% to 9.3% and mean code size reduction ranges from 0.8% to 3.9% for the different architectures. Executing the generated code on Hexagon confirms that the estimated speedup indeed results in actual speedup. Given a fixed time limit, Unison solves optimally functions of up to 647 instructions, delivers improved solutions for functions of up to 874 instructions, and achieves more than 85% of the potential speed for 90% of the functions on Hexagon.

    The results in this paper show that our combinatorial approach can be used in practice to trade compilation time for code quality beyond the usual compiler optimization levels, fully exploit processor-specific features, and identify improvement opportunities in existing heuristic algorithms.

  • 17.
    Castañeda Lozano, Roberto
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS. RISE SICS (Swedish Institute of Computer Science).
    Schulte, Christian
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Survey on Combinatorial Register Allocation and Instruction Scheduling (2018). In: ACM Computing Surveys, ISSN 0360-0300, E-ISSN 1557-7341. Article in journal (Refereed)
    Abstract [en]

    Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible at the expense of increased compilation time.

    This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques that are most applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization.

  • 18.
    Castegren, Elias
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Clarke, D.
    Fernandez-Reyes, K.
    Wrigstad, T.
    Yang, A. M.
    Attached and detached closures in actors (2018). In: AGERE 2018 - Proceedings of the 8th ACM SIGPLAN International Workshop on Programming Based on Actors, Agents, and Decentralized Control, co-located with SPLASH 2018, Association for Computing Machinery (ACM), 2018, p. 54-61. Conference paper (Refereed)
    Abstract [en]

    Expressive actor models combine aspects of functional programming into the pure actor model enriched with futures. Such functional features include first-class closures which can be passed between actors and chained on futures. Combined with mutable objects, this opens the door to race conditions. In some situations, closures may not be evaluated by the actor that created them yet may access fields or objects owned by that actor. In other situations, closures may be safely fired off to run as a separate task. This paper discusses the problem of who can safely evaluate a closure to avoid race conditions, and presents the current solution to the problem adopted by the Encore language. The solution integrates with Encore’s capability type system, which influences whether a closure is attached and must be evaluated by the creating actor, or whether it can be detached and evaluated independently of its creator. Encore’s current solution to this problem is not final or optimal. We conclude by discussing a number of open problems related to dealing with closures in the actor model.

  • 19.
    Castegren, Elias
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Wrigstad, Tobias
    Uppsala Univ, Uppsala, Sweden.
    OOlong: A Concurrent Object Calculus for Extensibility and Reuse (2018). In: ACM SIGAPP Applied Computing Review, ISSN 1559-6915, E-ISSN 1931-0161, Vol. 18, no 4, p. 47-60. Article in journal (Refereed)
    Abstract [en]

    We present OOlong, an object calculus with interface inheritance, structured concurrency, and locks. The goal of the calculus is extensibility and reuse. The semantics are therefore available in a version for LaTeX typesetting (written in Ott), a mechanised version for doing rigorous proofs in Coq, and a prototype interpreter (written in OCaml) for typechecking and running OOlong programs.

  • 20.
    Corcoran, Diarmuid
    et al.
    KTH. Ericsson AB.
    Andimeh, Loghman
    Ericsson AB.
    Ermedahl, Andreas
    Ericsson AB.
    Kreuger, Per
    RISE SICS AB.
    Schulte, Christian
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Data Driven Selection of DRX for Energy Efficient 5G RAN (2017). In: 2017 13TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM), IEEE, 2017. Conference paper (Refereed)
    Abstract [en]

    The number of connected mobile devices is increasing rapidly, with more than 10 billion expected by 2022. Their total aggregate energy consumption poses a significant concern to society. The current 3GPP (3rd Generation Partnership Project) LTE/LTE-Advanced standard incorporates an energy-saving technique called discontinuous reception (DRX). It is expected that 5G will use an evolved variant of this scheme. In general, selecting a single DRX configuration per device is non-trivial. This paper describes how to improve the energy efficiency of mobile devices by selecting DRX based on the traffic profile per device. Our particular approach uses a two-phase data-driven strategy which tunes the selection of DRX parameters based on a smart, fast energy model. The first phase involves the off-line selection of viable DRX combinations for a particular traffic mix. The second phase involves an on-line selection of DRX from this viable list. The method attempts to guarantee that latency is not worse than a chosen threshold. Alternatively, longer battery life for a device can be traded against increased latency. We built a lab prototype of the system to verify that the technique works and scales on a real LTE system. We also designed a sophisticated traffic generator based on actual user data traces. Complementary method verification has been made by exhaustive off-line simulations on recorded LTE network data. Our approach shows significant device energy savings, which have the aggregated potential over billions of devices to make a real contribution to green, energy-efficient networks.

  • 21. Cremona, F.
    et al.
    Lee, E. A.
    Lohstroh, M.
    Masin, M.
    Broman, David
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Tripakis, S.
    Hybrid Co-simulation: It's about time (2018). In: Proceedings - 21st ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, MODELS 2018, Association for Computing Machinery, Inc, 2018. Conference paper (Refereed)
    Abstract [en]

    Model-based design methodologies are commonly used in industry for the development of cyber-physical systems (CPSs). There are many different languages, tools, and formalisms for model-based design, each with its strengths and weaknesses. Instead of accepting the weaknesses of a particular tool, an alternative is to embrace heterogeneity and develop tool integration platforms and protocols to leverage the strengths from other environments. A fairly recent attempt in this direction is an open interface standard called Functional Mock-up Interface (FMI), which is focused on the convenient exchange and co-simulation of simulation models (Functional Mock-up Units, FMUs), primarily between component suppliers and OEMs. As it stands, FMI has reached acceptance in industry, but its specification (version 2.0) provides only limited support for hybrid co-simulation, that is, simulating systems that mix continuous and discrete behaviors, which are commonly used to model CPSs. This paper identifies FMI's time representation based on floating-point numbers as a key problem, because it does not support well the discrete events that typically occur at the cyber-physical boundary; it is only suitable for modeling continuous dynamics without discrete behaviors. While time is a central concept in reasoning about the physical world, it is largely abstracted away when reasoning about the cyber world. As a result, the engineering methods for CPSs have misaligned abstractions between the physics domain, the mathematical domain used to model physics, the computational domain used to implement these mathematical abstractions for simulation, and the computational domain used on the cyber side of CPSs. The most common resolution for this conundrum is to adopt the naive Newtonian ideal model of time, where time is a real number known everywhere and advancing uniformly. But ironically, Newtonian time proves not so practical for hybrid co-simulation. The obvious reason is that digital computers do not work with real numbers. Whereas real numbers have infinite precision, their floating-point representation does not. This discrepancy leads to unpredictable quantization errors that may accumulate. Although real numbers can be compared for equality (e.g., to define “simultaneity”), it rarely makes sense to do so for floating-point numbers. We show that the approach taken in FMI (and many other modeling frameworks), which embraces a naive Newtonian physical model of time and a cyber-approximation of this model using floating-point numbers, is inadequate for CPSs; it leads to models with unnecessarily inexplicable, nondeterministic, and complex behaviors. Our analysis concludes that a model of time that solely uses integers solves many of these problems. Specifically, we propose to use a 64-bit unsigned integer representation with arbitrary resolution, given as a power of ten, allowing model parameters specified in decimal to be represented exactly (given ample resolution). Integer arithmetic is computationally efficient, and, for well-chosen resolutions, this representation will tolerate very long simulations without overflow. It is also easily converted to and from floating-point representations, albeit not losslessly. Given the vast range of time scales used across different simulation models, we believe that choosing a fixed universal time resolution does not make sense. Instead, we describe an algorithm that picks an adequate time resolution for a particular model, and we provide procedures for the time quantization needed to reconcile discrepancies between the internal time representations of co-simulated FMUs. We propose concrete extensions to the FMI standard for the support of hybrid co-simulation that enable the use of integer time, automatic choice of time resolution, and the use of absent signals. We explain in detail how these extensions can be implemented modularly within the frameworks of existing simulation environments, with support for legacy FMUs and superdense time.
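
    A small Python sketch of the integer time representation argued for above: time is counted in ticks of a power-of-ten resolution and kept within a 64-bit unsigned range. The nanosecond default is an illustrative choice, not a value fixed by the proposal.

        def to_ticks(seconds, resolution_exp=-9):
            """Quantize a floating-point time to integer ticks of 10**resolution_exp seconds."""
            ticks = round(seconds * 10 ** (-resolution_exp))
            if not 0 <= ticks < 2 ** 64:
                raise OverflowError("time not representable at this resolution")
            return ticks

        def to_seconds(ticks, resolution_exp=-9):
            """Convert back; lossy in general, which is why the simulator keeps integer time."""
            return ticks * 10.0 ** resolution_exp

        # integer ticks compare exactly for simultaneity, unlike floats (0.1 + 0.2 != 0.3)
        assert to_ticks(0.1) + to_ticks(0.2) == to_ticks(0.3)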

  • 22.
    Durieux, Thomas
    et al.
    Univ Lille, Lille, France; INRIA, Le Chesnay, France.
    Hamadi, Youssef
    Ecole Polytech, Palaiseau, France.
    Yu, Zhongxing
    Univ Lille, Lille, France; INRIA, Le Chesnay, France.
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Monperrus, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Theoretical Computer Science, TCS.
    Exhaustive Exploration of the Failure-oblivious Computing Search Space (2018). In: 2018 IEEE 11TH INTERNATIONAL CONFERENCE ON SOFTWARE TESTING, VERIFICATION AND VALIDATION (ICST), IEEE Press, 2018, p. 139-149. Conference paper (Refereed)
    Abstract [en]

    High availability of software systems requires automated handling of crashes in the presence of errors. Failure-oblivious computing is one technique that aims to achieve high availability. We note that failure-obliviousness has not yet been studied in depth, and there are very few studies that help understand why failure-oblivious techniques work. In order for failure-oblivious computing to have an impact in practice, we need to deeply understand failure-oblivious behaviors in software. In this paper, we study, design, and perform an experiment that analyzes the size and the diversity of the failure-oblivious behaviors. Our experiment consists of exhaustively computing the search space of 16 field failures of large-scale open-source Java software. The outcome of this experiment is a much better understanding of what really happens when failure-oblivious computing is used, and this opens new promising research directions.

  • 23.
    Edman, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Sequential Pattern Mining on Electronic Medical Records for Finding Optimal Clinical Pathways (2018). Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Electronic Medical Records (EMRs) are digital versions of paper charts, used to record the treatment of different patients in hospitals. Clinical pathways are used as guidelines for how to treat different diseases, determined by observing outcomes from previous treatments. Sequential pattern mining is a branch of data mining where the mined data is organized in sequences. It is a common research topic in data mining, with many new variations on existing algorithms being introduced frequently. In a previous report, the sequential pattern mining algorithm PrefixSpan was used to mine patterns in EMRs to verify or suggest new clinical pathways. It was found to only be able to verify pathways partially. One of the reasons stated for this was that PrefixSpan was too inefficient to be able to mine at a low enough support to consider some items. In this report, CSpan is used instead, since it is reported to outperform PrefixSpan by up to two orders of magnitude, in order to improve runtime and thereby address the problems mentioned in the previous work. The results show that CSpan did indeed improve the runtime and the algorithm was able to mine at a lower minimum support. However, the output was only barely improved.
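
    For orientation, a compact Python sketch of the PrefixSpan idea, simplified to sequences of single events (roughly how event-coded medical records can be viewed); CSpan adds closed-pattern pruning on top of this scheme and is not reproduced here.

        def prefixspan(db, min_support):
            """db: list of event sequences; returns (pattern, support) pairs."""
            results = []
            def mine(prefix, projected):
                counts = {}
                for seq in projected:  # each sequence supports an item at most once
                    for item in set(seq):
                        counts[item] = counts.get(item, 0) + 1
                for item, count in counts.items():
                    if count >= min_support:
                        results.append((prefix + [item], count))
                        # project: keep the suffix after the first occurrence of item
                        suffixes = [seq[seq.index(item) + 1:]
                                    for seq in projected if item in seq]
                        mine(prefix + [item], [s for s in suffixes if s])
            mine([], db)
            return results

        db = [["admit", "xray", "surgery", "discharge"],
              ["admit", "surgery", "discharge"],
              ["admit", "xray", "discharge"]]
        print(prefixspan(db, min_support=2))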

  • 24.
    Engelin, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    CapsNet Comprehension of Objects in Different Rotational Views: A comparative study of capsule and convolutional networks (2018). Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Capsule network (CapsNet) is a new and promising approach to computer vision. In the small amount of research published so far, it has been shown to be good at generalizing complex objects and to perform well even when the images are skewed or the objects are seen from unfamiliar viewpoints. This thesis further tests this ability of CapsNet by comparing it to convolutional networks (ConvNets) on the task of understanding images of clothing in different rotational views. Even though the ConvNets have a higher classification accuracy than CapsNets, the results indicate that CapsNets are better at understanding the clothes when viewed in different rotational views.

  • 25.
    Fernando, Tharidu
    et al.
    KTH.
    Gureev, Nikita
    KTH.
    Matskin, Mihhail
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Zwick, M.
    Natschlager, T.
    WorkflowDSL: Scalable Workflow Execution with Provenance for Data Analysis Applications (2018). In: Proceedings - International Computer Software and Applications Conference, IEEE Computer Society, 2018, p. 774-779. Conference paper (Refereed)
    Abstract [en]

    Data analysis projects typically use different programming languages (from Python for prototyping to C++ for support of runtime constraints) at their different stages, by different experts. This creates a need for a data processing framework that is re-usable across multiple programming languages and supports collaboration of experts. In this work, we discuss the implementation of a framework which uses a Domain Specific Language (DSL), called WorkflowDSL, that enables domain experts to collaborate on fine-tuning workflows. The framework includes support for parallel execution without any specialized code. It also provides a provenance capturing framework that enables users to analyse past executions and retrieve the complete lineage of any data item generated. A graph database is used for storing provenance data, and its advantages compared to relational databases are demonstrated. Experiments performed using a real-world scientific workflow from the bioinformatics domain and industrial data analysis models show that users were able to execute workflows efficiently when using WorkflowDSL for workflow composition and Python for task implementations. Moreover, we show that capturing provenance data can be useful for analysing past workflow executions.

  • 26.
    Gammerman, Alexander
    et al.
    Royal Holloway Univ London, Egham, Surrey, England.
    Vovk, Vladimir
    Royal Holloway Univ London, Egham, Surrey, England.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Carlsson, Lars
    Stena Line AB, Gothenburg, Sweden.
    Conformal and probabilistic prediction with applications: editorial (2019). In: Machine Learning, ISSN 0885-6125, E-ISSN 1573-0565, Vol. 108, no 3, p. 379-380. Article in journal (Other academic)
  • 27.
    Ghoorchian, Kambiz
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Spatio-Temporal Multiple Geo-Location Identification on Twitter (2018). In: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018 / [ed] Abe, N; Liu, H; Pu, C; Hu, X; Ahmed, N; Qiao, M; Song, Y; Kossmann, D; Liu, B; Lee, K; Tang, J; He, J; Saltz, J, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 3412-3421. Conference paper (Refereed)
    Abstract [en]

    Twitter Geo-tags that indicate the exact location of messages have many applications, from localized opinion mining during elections to efficient traffic management in critical situations. However, less than 6% of Tweets are Geo-tagged, which limits the implementation of those applications. There are two groups of solutions: content-based and network-based. The first group uses location-indicative factors like URLs and topics, extracted from the content of tweets, to infer Geo-location for non-geoactive users, whereas the second group benefits from friendship ties in the underlying social network graph. Friendship ties are better predictors compared to content information because they are less noisy and often follow natural human spatial movement patterns. However, their prediction accuracy is still limited because they ignore the temporal aspects of human behavior and always assume a single location per user. This research aims to extend the current network-based approaches by taking users' temporal dimension into account. We assume multiple locations per user during different time-slots and hypothesize that location predictability varies depending on the time and the properties of the social membership group. Thus, we propose a hierarchical solution to apply temporal categorizations on top of social network partitioning for multiple location prediction for users in Online Social Networks (OSNs) like Twitter. Given a large-scale Twitter dataset, we show that users' location predictability exhibits different behavior in different time-slots and different social groups. We find that there are specific conditions under which users are more predictable in terms of Geo-location. Our solution outperforms the state-of-the-art by improving the prediction accuracy by 16.6% in terms of Median Error Distance (MED) over the same recall.

  • 28. Gomes, Claudio
    et al.
    Thule, Casper
    Broman, David
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Larsen, Peter Gorm
    Vangheluwe, Hans
    Co-Simulation: A Survey (2018). In: ACM Computing Surveys, ISSN 0360-0300, E-ISSN 1557-7341, Vol. 51, no 3, article id 49. Article in journal (Refereed)
    Abstract [en]

    Modeling and simulation techniques are today extensively used both in industry and science. Parts of larger systems are, however, typically modeled and simulated by different techniques, tools, and algorithms. In addition, experts from different disciplines use various modeling and simulation techniques. Both these facts make it difficult to study coupled heterogeneous systems. Co-simulation is an emerging enabling technique, where global simulation of a coupled system can be achieved by composing the simulations of its parts. Due to its potential and interdisciplinary nature, co-simulation is being studied in different disciplines but with limited sharing of findings. In this survey, we study and survey the state-of-the-art techniques for co-simulation, with the goal of enhancing future research and highlighting the main challenges. To study this broad topic, we start by focusing on discrete-event-based co-simulation, followed by continuous-time-based co-simulation. Finally, we explore the interactions between these two paradigms, in hybrid co-simulation. To survey the current techniques, tools, and research challenges, we systematically classify recently published research literature on co-simulation, and summarize it into a taxonomy. As a result, we identify the need for finding generic approaches for modular, stable, and accurate coupling of simulation units, as well as for expressing the adaptations required to ensure that the coupling is correct.

  • 29.
    Gomez-Boix, Alejandro
    et al.
    Univ Rennes, INRIA, CNRS, IRISA, Rennes, France.
    Laperdrix, Pierre
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Hiding in the Crowd: an Analysis of the Effectiveness of Browser Fingerprinting at Large Scale (2018). In: WEB CONFERENCE 2018: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW2018), Association for Computing Machinery (ACM), 2018, p. 309-318. Conference paper (Refereed)
    Abstract [en]

    Browser fingerprinting is a stateless technique which consists in collecting a wide range of data about a device through browser APIs. Past studies have demonstrated that modern devices present so much diversity that fingerprints can be exploited to identify and track users online. With this work, we want to evaluate whether browser fingerprinting is still effective at uniquely identifying a large group of users when analyzing millions of fingerprints over a few months. We collected 2,067,942 browser fingerprints from one of the top 15 French websites. The analysis of this novel dataset sheds new light on the ever-growing browser fingerprinting domain. The key insight is that the percentage of unique fingerprints in our dataset is much lower than what was reported in the past: only 33.6% of fingerprints are unique, as opposed to over 80% in previous studies. We show that non-unique fingerprints tend to be fragile: if some features of the fingerprint change, it is very probable that the fingerprint will become unique. We also confirm that the current evolution of web technologies is benefiting users' privacy significantly, as the removal of plugins substantially brings down the rate of unique desktop machines.

  • 30.
    Hakimzadeh, Kamal
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dowling, Jim
    Logical Clocks AB, Stockholm, Sweden.
    Karamel: A System for Timely Provisioning Large-Scale Software Across IaaS Clouds2019Conference paper (Refereed)
    Abstract [en]

    Cloud-native systems and application software platforms are becoming increasingly complex, and, ideally, they are expected to be quick to launch, elastic, portable across different cloud environments and easily managed. However, as cloud applications increase in complexity, so do the resultant challenges in configuring such applications, and orchestrating the deployment of their constituent services on potentially different cloud operating systems and environments.

    This paper presents a new orchestration system called Karamel that addresses these challenges by providing a cloud-independent orchestration service for deploying and configuring cloud applications and platforms across different environments. In Karamel, we model configuration routines with their dependencies in composable modules, and we achieve a high level of configuration/deployment parallelism by using techniques such as DAG traversal control logic, dataflow variable binding, and parallel actuation. In Karamel, complex distributed deployments are specified declaratively in a compact YAML syntax, and cluster definitions can be validated using an external artifact repository (GitHub).
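    As a rough illustration of the deployment model described above (a dependency DAG plus parallel actuation), the sketch below runs configuration tasks concurrently as soon as their dependencies complete. Task names and bodies are hypothetical; this is not Karamel's actual engine or API.

```python
# Illustrative DAG-parallel actuation: run each configuration task as soon
# as its dependencies are done. Task names and bodies are hypothetical.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

tasks = {  # task -> set of tasks it depends on
    "install_jdk": set(),
    "install_db": set(),
    "start_namenode": {"install_jdk"},
    "start_datanode": {"install_jdk", "start_namenode"},
    "init_schema": {"install_db"},
}

def run(task):
    print("running", task)   # a real system would ssh/configure here
    return task

done, running = set(), {}
with ThreadPoolExecutor(max_workers=4) as pool:
    while len(done) < len(tasks):
        for name, deps in tasks.items():
            if name not in done and name not in running and deps <= done:
                running[name] = pool.submit(run, name)   # parallel actuation
        finished, _ = wait(running.values(), return_when=FIRST_COMPLETED)
        for name in [n for n, f in running.items() if f in finished]:
            done.add(name)
            running.pop(name)
```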

  • 31.
    Hakimzadeh, Kamal
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dowling, Jim
    Logical Clocks AB, Stockholm, Sweden.
    Ops-Scale: Scalable and Elastic Cloud Operations by a Functional Abstraction and Feedback Loops2019Conference paper (Refereed)
    Abstract [en]

    Recent research has proposed new techniques to streamline the autoscaling of cloud applications, but little effort has been made to advance configuration management (CM) systems for such elastic operations. Existing practices use CM systems, from the DevOps paradigm, to automate operations. However, these practices still require human intervention to program ad hoc procedures to fully automate reconfiguration. Moreover, even after careful programming of cloud operations, the backing models are insufficient for re-running such programs unchanged on other platforms, which implies the overhead of rewriting them.

    We argue that CM programs can be designed to be deployment-agnostic and highly elastic with well-defined abstractions. In this paper, we introduce our abstraction based on declarative functional programming, and we demonstrate it using a feedback loop control mechanism. Our proposal, called Ops-Scale, is a family of cloud operations that are derived by making a functional abstraction over existing configuration programs. The hypothesis in this paper is twofold: 1) it should be possible to make a highly declarative CM system rich enough to capture fine-grained reconfigurations of autoscaling automatically, and 2) a program written for a specific deployment can be re-used in other deployments. To test this hypothesis, we have implemented an open source configuration engine called Karamel that is already used in industry for large-scale cluster deployments. Results show that, at scale, Ops-Scale can capture a polynomial order of reconfiguration growth in a fully automated manner. In practice, recent deployments have demonstrated that Karamel can provision clusters of 100 virtual machines consisting of many layers of distributed services on Google's IaaS Cloud in less than 10 minutes.
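    A minimal sketch of the feedback-loop idea, assuming an invented load metric and per-node capacity: the controller observes the current state, computes the desired cluster size, and emits the reconfiguration actions needed to converge. This is illustrative only, not Ops-Scale's interface.

```python
# Toy feedback loop: observe load, derive the desired cluster size, emit the
# reconfiguration actions needed to converge. Metric, capacity, and action
# names are invented.
import math

def desired_size(load, capacity_per_node=100):
    return max(1, math.ceil(load / capacity_per_node))

def reconcile(current_nodes, load):
    target = desired_size(load)
    if target > current_nodes:
        return [("add_node", i) for i in range(current_nodes, target)]
    if target < current_nodes:
        return [("remove_node", i) for i in range(target, current_nodes)]
    return []   # steady state: nothing to reconfigure

print(reconcile(current_nodes=2, load=450))   # scale out to 5 nodes
```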

  • 32.
    Hakimzadeh, Kamal
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Nicholson, P. K.
    Lugones, D.
    Auto-scaling with apprenticeship learning2018In: SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing, Association for Computing Machinery (ACM), 2018, p. 512-512Conference paper (Refereed)
  • 33.
    Halin, Axel
    et al.
    Univ Namur, Fac Comp Sci, NaDI, PReCISE, Namur, Belgium..
    Nuttinck, Alexandre
    CETIC, Charleroi, Belgium..
    Acher, Mathieu
    Univ Rennes, IRISA, CNRS, INRIA, Rennes, France..
    Devroey, Xavier
    Delft Univ Technol, SERG, Delft, Netherlands..
    Perrouin, Gilles
    Univ Namur, Fac Comp Sci, NaDI, PReCISE, Namur, Belgium..
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Test them all, is it worth it?: Assessing configuration sampling on the JHipster Web development stack2019In: Journal of Empirical Software Engineering, ISSN 1382-3256, E-ISSN 1573-7616, Vol. 24, no 2, p. 674-717Article in journal (Refereed)
    Abstract [en]

    Many approaches for testing configurable software systems start from the same assumption: it is impossible to test all configurations. This motivated the definition of variability-aware abstractions and sampling techniques to cope with large configuration spaces. Yet, there is no theoretical barrier that prevents the exhaustive testing of all configurations by simply enumerating them, if the effort required to do so remains acceptable. Not only that: we believe there is a lot to be learned by systematically and exhaustively testing a configurable system. In this case study, we report on the first-ever endeavour to test all possible configurations of the industry-strength, open source configurable software system JHipster, a popular code generator for web applications. We built a testing scaffold for the 26,000+ configurations of JHipster using a cluster of 80 machines during 4 nights, for a total of 4,376 hours (182 days) of CPU time. We find that 35.70% of configurations fail, and we identify the feature interactions that cause the errors. We show that sampling strategies (like dissimilarity and 2-wise): (1) are more effective at finding faults than the 12 default configurations used in the JHipster continuous integration; (2) can be too costly and exceed the available testing budget. We cross this quantitative analysis with the qualitative assessment of JHipster's lead developers.
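    For intuition, the sketch below contrasts two of the sampling strategies mentioned above on a toy configuration space (the real JHipster space has 26,000+ configurations): plain random sampling versus a greedy dissimilarity heuristic that picks the configuration farthest, by Hamming distance, from those already chosen. The option names are invented.

```python
# Random versus greedy dissimilarity sampling over a toy configuration space.
import itertools
import random

options = {"db": ["sql", "mongo"], "auth": ["jwt", "oauth"], "ui": ["react", "angular"]}
configs = [dict(zip(options, vals)) for vals in itertools.product(*options.values())]

def hamming(a, b):
    return sum(a[k] != b[k] for k in a)

def dissimilarity_sample(pool, n):
    sample = [random.choice(pool)]
    while len(sample) < n:
        # Greedily add the configuration farthest from the current sample.
        sample.append(max(pool, key=lambda c: min(hamming(c, s) for s in sample)))
    return sample

print(random.sample(configs, 3))          # baseline: uniform random sample
print(dissimilarity_sample(configs, 3))   # spread-out sample
```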

  • 34.
    Hammar, Kim
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Deep Text Mining of Instagram Data Without Strong Supervision2018Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    With the advent of social media, our online feeds increasingly consist of short, informal, and unstructured text. This data can be analyzed for the purpose of improving user recommendations and detecting trends. The sheer volume of unstructured text that is available makes the intersection of text processing and machine learning a promising avenue of research. Current methods that use machine learning for text processing are in many cases dependent on annotated training data. However, considering the heterogeneity and variability of social media, obtaining strong supervision for social media data is in practice both difficult and expensive. In light of this limitation, a belief that has shaped this thesis is that text mining methods that can be applied without strong supervision are of greater practical interest.

    This thesis investigates unsupervised methods for scalable processing of text from social media. Particularly, the thesis targets a classification and extraction task in the fashion domain on the image-sharing platform Instagram. Instagram is one of the largest social media platforms, containing both text and images. Still, research on text processing in social media is to a large extent limited to Twitter data, and little attention has been paid to text mining of Instagram data. The aim of this thesis is to broaden the scope of state-of-the-art methods for information extraction and text classification to the unsupervised setting, working with informal text on Instagram. Its main contributions are (1) an empirical study of text from Instagram; (2) an evaluation of word embeddings for Instagram text; (3) a distributed implementation of the FastText algorithm; (4) a system for fashion attribute extraction in Instagram using word embeddings; and (5) a multi-label clothing classifier for Instagram text, built with deep learning techniques and minimal supervision.

    The empirical study demonstrates that the text distribution on Instagram exhibits the long-tail phenomenon, that the text is just as noisy as has been reported in studies on Twitter text, and that comment sections are multi-lingual. In experiments with word embeddings for Instagram, the importance of hyperparameter tuning is manifested and a mismatch between pre-trained embeddings and social media is observed. Furthermore, it is confirmed that word embeddings are a useful asset for information extraction. Experimental results show that word embeddings beat a baseline that uses Levenshtein distance on the task of extracting fashion attributes from Instagram. The results also show that the distributed implementation of FastText reduces the time it takes to train word embeddings by a factor that scales with the number of machines used for training. Finally, our research demonstrates that weak supervision can be used to train a deep classifier, achieving an F1 score of 0.61 on the task of classifying clothes in Instagram posts based only on the associated text, which is on par with human performance.
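    The contrast between the two matching strategies can be sketched in a few lines: edit distance ranks candidates by surface form, whereas embeddings rank by approximate meaning. The 3-dimensional "embeddings" below are fabricated for illustration; real ones would come from a model such as FastText.

```python
# Edit distance ranks by surface form; embeddings rank by meaning.
# The 3-d "embeddings" are fabricated for illustration.
import math

def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

emb = {"jeans": [0.8, 0.2, 0.1], "denim": [0.9, 0.1, 0.0], "beans": [0.1, 0.9, 0.2]}

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

query = "jeans"
# "beans" is one edit away but unrelated; "denim" is related but four edits away.
print(sorted(emb, key=lambda w: levenshtein(query, w)))        # beans before denim
print(sorted(emb, key=lambda w: -cosine(emb[query], emb[w])))  # denim before beans
```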

  • 35.
    Hammar, Kim
    et al.
    KTH.
    Jaradat, Shatha
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dokoohaki, Nima
    KTH.
    Matskin, Mihhail
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Deep Text Mining of Instagram Data Without Strong Supervision2018In: Proceedings - 2018 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2018, IEEE, 2018, p. 158-165Conference paper (Refereed)
    Abstract [en]

    With the advent of social media, our online feeds increasingly consist of short, informal, and unstructured text. This textual data can be analyzed for the purpose of improving user recommendations and detecting trends. Instagram is one of the largest social media platforms, containing both text and images. However, most of the prior research on text processing in social media is focused on analyzing Twitter data, and little attention has been paid to text mining of Instagram data. Moreover, many text mining methods rely on annotated training data, which in practice is both difficult and expensive to obtain. In this paper, we present methods for unsupervised mining of fashion attributes from Instagram text, which can enable a new kind of user recommendation in the fashion domain. In this context, we analyze a corpus of Instagram posts from the fashion domain, introduce a system for extracting fashion attributes from Instagram, and train a deep clothing classifier with weak supervision to classify Instagram posts based on the associated text. With our experiments, we confirm that word embeddings are a useful asset for information extraction. Experimental results show that information extraction using word embeddings outperforms a baseline that uses Levenshtein distance. The results also show the benefit of combining weak supervision signals using generative models instead of majority voting. Using weak supervision and generative modeling, an F1 score of 0.61 is achieved on the task of classifying the image contents of Instagram posts based solely on the associated text, which is on par with human performance. Finally, our empirical study provides one of the few available studies on Instagram text and shows that the text is noisy, that the text distribution exhibits the long-tail phenomenon, and that comment sections on Instagram are multi-lingual.

  • 36.
    Håkansson, Anne
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Ipsum - An Approach to Smart Volatile ICT-Infrastructures for Smart Cities and Communities2018In: Procedia Computer Science, Elsevier B.V., 2018, p. 2107-2116Conference paper (Refereed)
    Abstract [en]

    Information and Communication Technology (ICT)-infrastructures are increasingly important enabling technology within the Smart society, with Smart cities and communities. An ICT-infrastructure handles data and information and encompasses devices and networks, protocols and procedures, including the Internet, Internet of Things and Cyber-Physical Systems. The current challenge for ICT-infrastructures is delivering services and applications that are requested by users, such as residents, public organisations and institutions. These services and applications must be combined to enhance and enrich the environment and provide personalised services. This requires radical changes in technology, such as dynamic ICT-infrastructures that can dynamically provide requested services in order to build Smart societies. This paper is about pursuing Smart and connected cities and communities by creating Smart volatile ICT-infrastructures for Smart cities and communities, called Ipsum. The infrastructure is multidisciplinary and includes different kinds of hardware, software and artificial intelligence techniques, depending on the available parts and the services to be delivered. The goal is to provide a powerful, smart and cost-saving volatile ICT-infrastructure with person-centred, ubiquitous and malleable parts, i.e., devices, sensors and services. Volatile means constituting, in real time, a volatile network of devices and deploying it into cities and communities. Ipsum will be Smart everywhere by collaborating with several different hardware and software systems and cooperating to perform complex tasks. By including ubiquitous and malleable parts in the infrastructure, Ipsum can facilitate an informed and engaged populace.

  • 37. Ingmar, Linnea
    et al.
    Schulte, Christian
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Making Compact-Table Compact2018In: 24th International Conference on the Principles and Practice of Constraint Programming, CP 2018, Springer, 2018, Vol. 11008, p. 210-218Conference paper (Refereed)
    Abstract [en]

    The compact-table propagator for table constraints appears to be a strong candidate for inclusion into any constraint solver due to its efficiency and simplicity. However, successful integration into a constraint solver based on copying rather than trailing is not obvious: while the underlying bit-set data structure is sparse for efficiency it is not compact for memory, which is essential for a copying solver. The paper introduces techniques to make compact-table an excellent fit for a copying solver. The key is to make sparse bit-sets dynamically compact (only their essential parts occupy memory and their implementation is dynamically adapted during search) and tables shared (their read-only parts are shared among copies). Dynamically compact bit-sets reduce peak memory by 7.2% and runtime by 13.6% on average and by up to 66.3% and 33.2%. Shared tables even further reduce runtime and memory usage. The reduction in runtime exceeds the reduction in memory and a cache analysis indicates that our techniques might also be beneficial for trailing solvers. The proposed implementation has replaced Gecode’s original implementations as it runs on average almost an order of magnitude faster while using half the memory.
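    A minimal Python rendering of the sparse bit-set idea (Gecode's implementation is in C++; this sketch only illustrates the principle): only the indices of non-zero words are tracked, so intersection skips, and permanently drops, empty words.

```python
# The sparse bit-set principle in illustrative Python: only indices of
# non-zero words are kept, so operations skip empty regions.
WORD = 64

class SparseBitSet:
    def __init__(self, nbits):
        nwords = (nbits + WORD - 1) // WORD
        self.words = [(1 << WORD) - 1] * nwords   # all bits set initially
        self.index = list(range(nwords))          # positions of non-zero words

    def intersect_with(self, mask_words):
        """Keep only bits also set in mask_words; drop emptied words."""
        alive = []
        for i in self.index:
            self.words[i] &= mask_words[i]
            if self.words[i]:
                alive.append(i)
        self.index = alive

    def is_empty(self):
        return not self.index

s = SparseBitSet(128)
s.intersect_with([0b1010, 0])      # second word empties and is dropped
print(s.index, bin(s.words[0]))    # [0] 0b1010
```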

  • 38.
    Ismail, Mahmoud
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Ronström, Mikael
    Haridi, Seif
    Dowling, Jim
    ePipe: Near Real-Time Polyglot Persistence of HopsFS Metadata2019In: 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2019, p. 92-101Conference paper (Refereed)
    Abstract [en]

    Distributed OLTP databases are now used to manage metadata for distributed file systems, but they cannot also efficiently support complex queries or aggregations. To solve this problem, we introduce ePipe, a databus that both creates a consistent change stream for a distributed, hierarchical file system (HopsFS) and eventually delivers the correctly ordered stream with low latency to downstream clients. ePipe can be used to provide polyglot storage for file system metadata, allowing metadata queries to be handled by the most efficient engine for that query. For file system notifications, we show that ePipe achieves up to 56X throughput improvement over HDFS INotify and Trumpet with up to 3 orders of magnitude lower latency. For Spotify’s Hadoop workload, we show that ePipe can replicate all file system changes from HopsFS to Elasticsearch with an average replication lag of only 330 ms.
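    One ingredient of such a databus, eventual delivery in correct order, can be sketched with a small reordering buffer that releases events strictly by commit sequence number. The event payloads and numbering scheme below are illustrative, not ePipe's actual protocol.

```python
# A reordering buffer that turns out-of-order change events into a stream
# ordered by commit sequence number. Payloads and numbering are illustrative.
import heapq

def ordered_stream(events):
    """events: iterable of (seq, op); yields ops strictly in seq order."""
    buf, next_seq = [], 0
    for seq, op in events:
        heapq.heappush(buf, (seq, op))
        while buf and buf[0][0] == next_seq:   # release the contiguous prefix
            yield heapq.heappop(buf)[1]
            next_seq += 1

arrivals = [(1, "mkdir /a"), (0, "create /f"), (3, "rm /f"), (2, "mv /a /b")]
print(list(ordered_stream(arrivals)))
# ['create /f', 'mkdir /a', 'mv /a /b', 'rm /f']
```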

  • 39.
    Issa, Shady
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS. INESC-ID, Instituto Superior Tecnico, Universidade de Lisboa.
    Techniques for Enhancing the Efficiency of Transactional Memory Systems2018Doctoral thesis, monograph (Other academic)
    Abstract [en]

    Transactional Memory (TM) is an emerging programming paradigm that drastically simplifies the development of concurrent applications by relieving programmers from a major source of complexity: how to ensure correct, yet efficient, synchronization of concurrent accesses to shared memory. Despite the large body of research devoted to this area, existing TM systems still suffer from severe limitations that hamper both their performance and energy efficiency.

    This dissertation tackles the problem of how to build efficient implementations of the TM abstraction by introducing innovative techniques that address three crucial limitations of existing TM systems by: (i) extending the effective capacity of Hardware TM (HTM) implementations; (ii) reducing the synchronization overheads in Hybrid TM (HyTM) systems; (iii) enhancing the efficiency of TM applications via energy-aware contention management schemes.

    The first contribution of this dissertation, named POWER8-TM (P8TM), addresses what is arguably one of the most compelling limitations of existing HTM implementations: the inability to process transactions whose footprint exceeds the capacity of the processor's cache. By leveraging, in an innovative way, two hardware features provided by IBM POWER8 processors, namely Rollback-only Transactions and Suspend/Resume, P8TM can achieve up to 7x performance gains in workloads that stress the capacity limitations of HTM.

    The second contribution is Dynamic Memory Partitioning-TM (DMP-TM), a novel Hybrid TM (HyTM) that offloads the cost of detecting conflicts between HTM and Software TM (STM) to off-the-shelf operating system memory protection mechanisms. DMP-TM's design is agnostic to the STM algorithm and has the key advantage of allowing efficient integration of highly scalable STM implementations that would otherwise demand expensive instrumentation of the HTM path. This allows DMP-TM to achieve up to 20x speedups compared to state-of-the-art HyTM solutions in uncontended workloads.

    The third contribution, Green-CM, is an energy-aware Contention Manager (CM) that has two main innovative aspects: (i) a novel asymmetric design, which combines different back-off policies in order to take advantage of Dynamic Frequency and Voltage Scaling (DVFS) hardware capabilities, available in most modern processors; (ii) an energy-efficient implementation of a fundamental building block for many CM implementations, namely, the mechanism used to back off threads for a predefined amount of time. Thanks to its innovative design, Green-CM can reduce the Energy Delay Product by up to 2.35x with respect to state-of-the-art CMs.

    All the techniques proposed in this dissertation share an important common feature that is essential to preserve the ease of use of the TM abstraction: the reliance on on-line self-tuning mechanisms that ensure robust performance even in presence of heterogeneous workloads, without requiring prior knowledge of the target workloads or architecture.
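    As a toy illustration of the back-off building block mentioned for Green-CM, the sketch below retries a transaction under two back-off policies: an aggressive short wait and a polite longer sleep (which, on real hardware, leaves the core idle so DVFS can lower its frequency). All parameters are invented.

```python
# Two back-off policies for retrying a conflicting transaction: an aggressive
# short wait and a polite longer sleep. All parameters are invented.
import random
import time

def backoff_retry(run_tx, policy="spin", max_attempts=10):
    for attempt in range(max_attempts):
        if run_tx():
            return True
        if policy == "spin":                 # aggressive threads: short waits
            delay = 1e-6 * (2 ** attempt)
        else:                                # polite, energy-friendly threads
            delay = 1e-4 * (2 ** attempt)
        time.sleep(min(delay, 0.01))
    return False

# A toy transaction that commits with probability 0.5 per attempt:
print(backoff_retry(lambda: random.random() < 0.5))
```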

  • 40.
    Jaradat, Shatha
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dokoohaki, Nima
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Hammar, Kim
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Matskin, Mihhail
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dynamic CNN models for fashion recommendation in Instagram2019In: Proceedings - 16th IEEE International Symposium on Parallel and Distributed Processing with Applications, 17th IEEE International Conference on Ubiquitous Computing and Communications, 8th IEEE International Conference on Big Data and Cloud Computing, 11th IEEE International Conference on Social Computing and Networking and 8th IEEE International Conference on Sustainable Computing and Communications, ISPA/IUCC/BDCloud/SocialCom/SustainCom 2018, Institute of Electrical and Electronics Engineers Inc., 2019, p. 1144-1151Conference paper (Refereed)
    Abstract [en]

    Instagram as an online photo-sharing and social-networking service is becoming more powerful in enabling fashion brands to ramp up their business growth. Nowadays, a single post by a fashion influencer attracts a wealth of attention and a multitude of followers who are curious to know more about the brands and style of each clothing item sitting inside the image. To this end, the development of efficient Deep CNN models that can accurately detect styles and brands has become a research challenge. In addition, current techniques need to cope with inherent fashion-related data issues: clothing details inside a single image only cover a small proportion of the large and hierarchical space of possible brands and clothing item attributes. In order to cope with these challenges, one can argue that neural classifiers should become adapted to large-scale and hierarchical fashion datasets. As a remedy, we propose two novel techniques that incorporate the valuable social media textual content to support the visual classification in a dynamic way. The first method is adaptive neural pruning (DynamicPruning), in which the clothing item category detected from the post's text analysis activates the possible range of connections of the clothing attributes' classifier. The second method (DynamicLayers) is a dynamic framework in which multiple attribute-classification layers exist and a suitable attribute classifier layer is activated dynamically, based upon the text mined for the image. Extensive experiments on a dataset gathered from Instagram and a baseline fashion dataset (DeepFashion) have demonstrated that our approaches can improve the accuracy by about 20% when compared to base architectures. It is worth highlighting that with DynamicLayers we have gained 35% accuracy for the task of multi-class multi-labeled classification compared to the other model.
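    A minimal PyTorch-style sketch of the DynamicLayers idea, with invented category names and layer sizes (this is not the paper's code): one attribute-classifier head per clothing category, where the category mined from the post's text selects the active head.

```python
# One attribute-classifier head per clothing category; the category mined
# from the post's text selects the active head. Names and sizes are invented.
import torch
import torch.nn as nn

class DynamicLayers(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.heads = nn.ModuleDict({
            "dress": nn.Linear(feat_dim, 12),   # 12 hypothetical dress attributes
            "shoes": nn.Linear(feat_dim, 8),    # 8 hypothetical shoe attributes
        })

    def forward(self, image_features, category):
        # Text analysis of the post decides which head is active.
        return torch.sigmoid(self.heads[category](image_features))

model = DynamicLayers()
feats = torch.randn(1, 512)             # stand-in for CNN image features
print(model(feats, "dress").shape)      # torch.Size([1, 12])
```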

  • 41.
    Jaradat, Shatha
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dokoohaki, Nima
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Hammar, Kim
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Wara, Ummal
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Matskin, Mihhail
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dynamic CNN Models For Fashion Recommendation in Instagram2018In: 2018 IEEE INT CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, UBIQUITOUS COMPUTING & COMMUNICATIONS, BIG DATA & CLOUD COMPUTING, SOCIAL COMPUTING & NETWORKING, SUSTAINABLE COMPUTING & COMMUNICATIONS / [ed] Chen, JJ; Yang, LT, IEEE COMPUTER SOC, 2018, p. 1144-1151Conference paper (Refereed)
    Abstract [en]

    Instagram as an online photo-sharing and social-networking service is becoming more powerful in enabling fashion brands to ramp up their business growth. Nowadays, a single post by a fashion influencer attracts a wealth of attention and a multitude of followers who are curious to know more about the brands and style of each clothing item sitting inside the image. To this end, the development of efficient Deep CNN models that can accurately detect styles and brands has become a research challenge. In addition, current techniques need to cope with inherent fashion-related data issues: clothing details inside a single image only cover a small proportion of the large and hierarchical space of possible brands and clothing item attributes. In order to cope with these challenges, one can argue that neural classifiers should become adapted to large-scale and hierarchical fashion datasets. As a remedy, we propose two novel techniques that incorporate the valuable social media textual content to support the visual classification in a dynamic way. The first method is adaptive neural pruning (DynamicPruning), in which the clothing item category detected from the post's text analysis activates the possible range of connections of the clothing attributes' classifier. The second method (DynamicLayers) is a dynamic framework in which multiple attribute-classification layers exist and a suitable attribute classifier layer is activated dynamically, based upon the text mined for the image. Extensive experiments on a dataset gathered from Instagram and a baseline fashion dataset (DeepFashion) have demonstrated that our approaches can improve the accuracy by about 20% when compared to base architectures. It is worth highlighting that with DynamicLayers we have gained 35% accuracy for the task of multi-class multi-labeled classification compared to the other model.

  • 42. Johansson, U.
    et al.
    Linusson, H.
    Löfström, T.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Interpretable regression trees using conformal prediction2018In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 97, p. 394-404Article in journal (Refereed)
    Abstract [en]

    A key property of conformal predictors is that they are valid, i.e., their error rate on novel data is bounded by a preset level of confidence. For regression, this is achieved by turning the point predictions of the underlying model into prediction intervals. Thus, the most important performance metric for evaluating conformal regressors is not the error rate, but the size of the prediction intervals, where models generating smaller (more informative) intervals are said to be more efficient. State-of-the-art conformal regressors typically utilize two separate predictive models: the underlying model providing the center point of each prediction interval, and a normalization model used to scale each prediction interval according to the estimated level of difficulty for each test instance. When using a regression tree as the underlying model, this approach may cause test instances falling into a specific leaf to receive different prediction intervals. This clearly deteriorates the interpretability of a conformal regression tree compared to a standard regression tree, since the path from the root to a leaf can no longer be translated into a rule explaining all predictions in that leaf. In fact, the model cannot even be interpreted on its own, i.e., without reference to the corresponding normalization model. Current practice effectively presents two options for constructing conformal regression trees: to employ a (global) normalization model, and thereby sacrifice interpretability; or to avoid normalization, and thereby sacrifice both efficiency and individualized predictions. In this paper, two additional approaches are considered, both employing local normalization: the first approach estimates the difficulty by the standard deviation of the target values in each leaf, while the second approach employs Mondrian conformal prediction, which results in regression trees where each rule (path from root node to leaf node) is independently valid. An empirical evaluation shows that the first approach is as efficient as current state-of-the-art approaches, thus eliminating the efficiency vs. interpretability trade-off present in existing methods. Moreover, it is shown that if a validity guarantee is required for each single rule, as provided by the Mondrian approach, a penalty with respect to efficiency has to be paid, but it is only substantial at very high confidence levels.
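    The first locally normalized approach can be sketched in a few lines: the leaf's target standard deviation serves as the difficulty estimate that scales the interval. The one-split "tree", data, and tiny calibration set below are purely illustrative.

```python
# Locally normalized conformal intervals: the standard deviation of the
# leaf's training targets scales the interval. All data is illustrative.
import statistics

train = [(1.0, 3.1), (1.5, 2.9), (4.0, 9.8), (4.2, 10.4)]   # (x, y) pairs
leaf = lambda x: 0 if x < 3 else 1                          # a one-split tree

leaves = {i: [y for x, y in train if leaf(x) == i] for i in (0, 1)}
center = {i: statistics.mean(ys) for i, ys in leaves.items()}   # point prediction
sigma = {i: statistics.stdev(ys) for i, ys in leaves.items()}   # leaf difficulty

# Calibration: normalized nonconformity scores |y - y_hat| / sigma_leaf.
calibration = [(2.0, 3.4), (4.5, 9.5)]
scores = sorted(abs(y - center[leaf(x)]) / sigma[leaf(x)] for x, y in calibration)
q = scores[-1]   # crude high quantile for this tiny calibration set

x_new = 4.1
half_width = q * sigma[leaf(x_new)]
print(f"interval: [{center[leaf(x_new)] - half_width:.2f}, "
      f"{center[leaf(x_new)] + half_width:.2f}]")
```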

  • 43.
    Johansson, Ulf
    et al.
    Jonkoping Univ, Dept Comp Sci & Informat, Jonkoping, Sweden..
    Lofstrom, Tuve
    Jonkoping Univ, Dept Comp Sci & Informat, Jonkoping, Sweden..
    Linusson, Henrik
    Univ Boras, Dept Informat Technol, Boras, Sweden..
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Efficient Venn predictors using random forests2019In: Machine Learning, ISSN 0885-6125, E-ISSN 1573-0565, Vol. 108, no 3, p. 535-550Article in journal (Refereed)
    Abstract [en]

    Successful use of probabilistic classification requires well-calibrated probability estimates, i.e., the predicted class probabilities must correspond to the true probabilities. In addition, a probabilistic classifier must, of course, also be as accurate as possible. In this paper, Venn predictors, and their special case Venn-Abers predictors, are evaluated for probabilistic classification, using random forests as the underlying models. Venn predictors output multiple probabilities for each label, i.e., the predicted label is associated with a probability interval. Since all Venn predictors are valid in the long run, the size of the probability intervals is very important, with tighter intervals being more informative. The standard solution when calibrating a classifier is to employ an additional step, transforming the outputs from a classifier into probability estimates, using a labeled data set not employed for training of the models. For random forests, and other bagged ensembles, it is, however, possible to use the out-of-bag instances for calibration, making all training data available for both model learning and calibration. This procedure has previously been successfully applied to conformal prediction, but was here evaluated for the first time for Venn predictors. The empirical investigation, using 22 publicly available data sets, showed that all four versions of the Venn predictors were better calibrated than both the raw estimates from the random forest and the standard techniques Platt scaling and isotonic regression. Regarding both informativeness and accuracy, the standard Venn predictor calibrated on out-of-bag instances was the best setup evaluated. Most importantly, calibrating on out-of-bag instances, instead of using a separate calibration set, resulted in tighter intervals and more accurate models on every data set, for both the Venn predictors and the Venn-Abers predictors.

  • 44. Kalavri, Vasiliki
    et al.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Haridi, Seif
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    High-Level Programming Abstractions for Distributed Graph Processing2018In: IEEE Transactions on Knowledge and Data Engineering, ISSN 1041-4347, E-ISSN 1558-2191, Vol. 30, no 2, p. 305-324Article in journal (Refereed)
    Abstract [en]

    Efficient processing of large-scale graphs in distributed environments has been an increasingly popular topic of research in recent years. Inter-connected data that can be modeled as graphs appear in application domains such as machine learning, recommendation, web search, and social network analysis. Writing distributed graph applications is inherently hard and requires programming models that can cover a diverse set of problems, including iterative refinement algorithms, graph transformations, graph aggregations, pattern matching, ego-network analysis, and graph traversals. Several high-level programming abstractions have been proposed and adopted by distributed graph processing systems and big data platforms. Even though significant work has been done to experimentally compare distributed graph processing frameworks, no qualitative study and comparison of graph programming abstractions has been conducted yet. In this survey, we review and analyze the most prevalent high-level programming models for distributed graph processing, in terms of their semantics and applicability. We review 34 distributed graph processing systems with respect to the graph processing models they implement and we survey applications that appear in recent distributed graph systems papers. Finally, we discuss trends and open research questions in the area of distributed graph processing.
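    As a flavor of the vertex-centric ("think like a vertex") abstraction reviewed in such surveys, the sketch below runs single-source shortest paths in supersteps: each vertex folds incoming messages into its state and, if improved, messages its out-neighbours. This is a single-process illustration of the model; real systems distribute it.

```python
# Vertex-centric (Pregel-style) single-source shortest paths, in supersteps.
def pregel_sssp(edges, source, supersteps=10):
    """edges: {(src, dst): weight} on a directed graph."""
    vertices = {v for e in edges for v in e}
    dist = {v: float("inf") for v in vertices}
    messages = {source: [0.0]}          # superstep 0: source receives distance 0
    for _ in range(supersteps):
        next_messages = {}
        for v, msgs in messages.items():
            best = min(msgs)
            if best < dist[v]:          # vertex program: improve local state,
                dist[v] = best          # then message out-neighbours
                for (src, dst), w in edges.items():
                    if src == v:
                        next_messages.setdefault(dst, []).append(best + w)
        if not next_messages:           # no messages left: all vertices halt
            break
        messages = next_messages
    return dist

edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 5.0}
print(pregel_sssp(edges, "a"))          # {'a': 0.0, 'b': 1.0, 'c': 3.0}
```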

  • 45.
    Karlsson, Vide
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Evaluation of Keyword-Based Text Categorization Algorithms2016Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    Supervised learning algorithms have been used for automatic text categorization with very good results. But supervised learning requires a large amount of manually labeled training data, and this is a serious limitation for many practical applications. Keyword-based text categorization does not require manually labeled training data and has therefore been presented as an attractive alternative to supervised learning. The aim of this study is to explore whether there are other limitations for using keyword-based text categorization in industrial applications. This study also tests whether a new lexical resource, based on the paradigmatic relations between words, could be used to improve existing keyword-based text categorization algorithms. An industry-motivated use case was created to measure practical applicability. The results showed that none of the five examined algorithms was able to meet the requirements of the industry-motivated use case. But it was possible to modify one algorithm, proposed by Liebeskind et al. (2015), to meet the requirements. The new lexical resource produced relevant keywords for text categorization, but there was still a large variance in the algorithm's capacity to correctly categorize different text categories. The categorization capacity was also generally too low to meet the requirements of many practical applications. Further studies are needed to explore how the algorithm's categorization capacity could be improved.
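    For reference, the core of keyword-based categorization needs no labeled training data at all: each category is described by a keyword list, and a document is assigned to the category whose keywords it matches best. The categories and keywords below are toy examples, not the thesis' algorithms.

```python
# Keyword-based categorization: each category is a keyword list, and a
# document goes to the best-matching category. Toy examples only.
keywords = {
    "sports": {"match", "goal", "team", "league"},
    "finance": {"stock", "market", "shares", "profit"},
}

def categorize(text):
    tokens = set(text.lower().split())
    scores = {cat: len(tokens & kws) for cat, kws in keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None   # abstain when nothing matches

print(categorize("The team scored a late goal to win the match"))   # sports
```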

  • 46.
    Kefato, Zekarias
    et al.
    Trento University.
    Sheikh, Nasrullah
    Trento University.
    Bahri, Leila
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Soliman, Amira
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Montresor, Alberto
    Trento University.
    CAS2VEC: Network-Agnostic Cascade Prediction in Online Social Networks2018In: The 5th International Symposium on Social Networks Analysis, Management and Security (SNAMS-2018), IEEE, 2018Conference paper (Refereed)
  • 47.
    Kefato, Zekarias
    et al.
    Trento University.
    Sheikh, Nasrullah
    Trento University.
    Bahri, Leila
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Soliman, Amira
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Montresor, Alberto
    Trento University.
    CaTS: Network-Agnostic Virality Prediction Model to Aid Rumour Detection2018Conference paper (Refereed)
  • 48.
    Kefato, Zekarias T.
    et al.
    Univ Trento, Trento, Italy..
    Sheikh, Nasrullah
    Univ Trento, Trento, Italy..
    Bahri, Leila
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Soliman, Amira
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Montresor, Alberto
    Univ Trento, Trento, Italy..
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    CAS2VEC: Network-Agnostic Cascade Prediction in Online Social Networks2018Conference paper (Refereed)
    Abstract [en]

    Effectively predicting whether a given post or tweet is going to become viral in online social networks is of paramount importance for several applications, such as trend and break-out forecasting. While several attempts towards this end exist, most of the current approaches rely on features extracted from the underlying network structure over which the content spreads. Recent studies have shown, however, that prediction can be effectively performed with very little structural information about the network, or even with no structural information at all. In this study we propose a novel network-agnostic approach called CAS2VEC, that models information cascades as time series and discretizes them using time slices. For the actual prediction task we have adopted a technique from the natural language processing community. The particular choice of the technique is mainly inspired by an empirical observation on the strong similarity between the distribution of discretized values occurrence in cascades and words occurrence in natural language documents. Thus, thanks to such a technique for sentence classification using convolutional neural networks, CAS2VEC can predict whether a cascade is going to become viral or not. We have performed extensive experiments on two widely used real-world datasets for cascade prediction, that demonstrate the effectiveness of our algorithm against strong baselines.
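    The discretization step at the heart of CAS2VEC can be sketched directly: an observed cascade (a list of event timestamps) becomes a fixed-length sequence of per-slice counts, which a convolutional classifier can then consume. The window and slice parameters below are illustrative.

```python
# Discretize a cascade into fixed-length per-slice counts for a CNN-style
# classifier. Window and slice counts are illustrative parameters.
def cascade_to_slices(timestamps, window=60.0, n_slices=12):
    width = window / n_slices
    counts = [0] * n_slices
    for t in timestamps:
        if 0 <= t < window:
            counts[int(t // width)] += 1
    return counts

retweets = [0.5, 1.2, 1.3, 4.0, 4.1, 4.2, 4.3, 30.0, 55.9]
print(cascade_to_slices(retweets))   # [7, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
```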

  • 49.
    Khan, Amin M.
    et al.
    Department of Computer Science, UiT The Arctic University of Norway. Tromsø, Norway.
    Freitag, Felix
    Department of Computer Architecture. Universitat Politecnica de Catalunya. Barcelona, Spain .
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Ha, Phuong Hoai
    Department of Computer Science, UiT The Arctic University of Norway. Tromsø, Norway.
    Demo Abstract: Towards IoT Service Deployments on Edge Community Network Microclouds2018Conference paper (Refereed)
    Abstract [en]

    Internet of Things (IoT) services for personal devices and smart homes provided by commercial solutions are typically proprietary and closed. These services give end users little control, for instance over the ownership of their data and the enabling services, which hinders the solutions' wider acceptance. In this demo paper, we argue for an approach to deploy professional IoT services on user-controlled infrastructure at the network edge. The users would benefit from the ability to choose the most suitable service from different IoT service offerings, such as the one that satisfies their privacy requirements, and third-party service providers could offer more tailored IoT services at customer premises. We conduct the demonstration on microclouds, which have been built with the Cloudy platform in the Guifi.net community network. The demonstration is conducted from the perspective of end users who wish to deploy professional IoT data management and analytics services in volunteer microclouds.

  • 50.
    Khan, Amin M.
    et al.
    UiT Arctic Univ Norway, Dept Comp Sci, Tromso, Norway; Hitachi Vantara Corp, Lisbon, Portugal.
    Freitag, Felix
    Univ Politecn Cataluna, Dept Comp Architecture, Barcelona, Spain..
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Ha, Phuong Hoai
    Demo Abstract: Towards IoT Service Deployments on Edge Community Network Microclouds2018In: IEEE INFOCOM 2018 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (INFOCOM WKSHPS), IEEE, 2018Conference paper (Refereed)
    Abstract [en]

    Internet of Things (IoT) services for personal devices and smart homes provided by commercial solutions are typically proprietary and closed. These services give end users little control, for instance over the ownership of their data and the enabling services, which hinders the solutions' wider acceptance. In this demo paper, we argue for an approach to deploy professional IoT services on user-controlled infrastructure at the network edge. The users would benefit from the ability to choose the most suitable service from different IoT service offerings, such as the one that satisfies their privacy requirements, and third-party service providers could offer more tailored IoT services at customer premises. We conduct the demonstration on microclouds, which have been built with the Cloudy platform in the Guifi.net community network. The demonstration is conducted from the perspective of end users who wish to deploy professional IoT data management and analytics services in volunteer microclouds.
