  • 51.
    Håkansson, Anne
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Ipsum - An Approach to Smart Volatile ICT-Infrastructures for Smart Cities and Communities, 2018. In: Procedia Computer Science, Elsevier B.V., 2018, p. 2107-2116. Conference paper (Refereed)
    Abstract [en]

    Information and Communication Technology (ICT) infrastructures are increasingly important as enabling technology within the Smart society, with Smart cities and communities. An ICT-infrastructure handles data and information and encompasses devices and networks, protocols and procedures, including the Internet, the Internet of Things and Cyber-Physical Systems. The current challenge for ICT-infrastructures is delivering the services and applications requested by users, such as residents, public organisations and institutions. These services and applications must be combined to enhance and enrich the environment and provide personalised services. This requires radical changes in technology, such as dynamic ICT-infrastructures that dynamically provide the requested services needed to build Smart societies. This paper pursues Smart and connected cities and communities by creating a Smart volatile ICT-infrastructure for Smart cities and communities, called Ipsum. The infrastructure is multidisciplinary and includes different kinds of hardware, software and artificial intelligence techniques, depending on the available parts and the services to be delivered. The goal is to provide a powerful, smart and cost-saving volatile ICT-infrastructure with person-centred, ubiquitous and malleable parts, i.e., devices, sensors and services. Volatile means that a network of devices is constituted in real time and deployed into cities and communities. Ipsum will be Smart everywhere by collaborating with several different hardware and software systems that cooperate to perform complex tasks. By including ubiquitous and malleable parts in the infrastructure, Ipsum can facilitate an informed and engaged populace.

  • 52. Ingmar, Linnea
    et al.
    Schulte, Christian
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Making Compact-Table Compact, 2018. In: 24th International Conference on the Principles and Practice of Constraint Programming, CP 2018, Springer, 2018, Vol. 11008, p. 210-218. Conference paper (Refereed)
    Abstract [en]

    The compact-table propagator for table constraints appears to be a strong candidate for inclusion into any constraint solver due to its efficiency and simplicity. However, successful integration into a constraint solver based on copying rather than trailing is not obvious: while the underlying bit-set data structure is sparse for efficiency it is not compact for memory, which is essential for a copying solver. The paper introduces techniques to make compact-table an excellent fit for a copying solver. The key is to make sparse bit-sets dynamically compact (only their essential parts occupy memory and their implementation is dynamically adapted during search) and tables shared (their read-only parts are shared among copies). Dynamically compact bit-sets reduce peak memory by 7.2% and runtime by 13.6% on average and by up to 66.3% and 33.2%. Shared tables even further reduce runtime and memory usage. The reduction in runtime exceeds the reduction in memory and a cache analysis indicates that our techniques might also be beneficial for trailing solvers. The proposed implementation has replaced Gecode’s original implementations as it runs on average almost an order of magnitude faster while using half the memory.
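The "dynamically compact" sparse bit-set idea can be illustrated with a small sketch (illustrative Python with invented names; Gecode's actual implementation is in C++): possibly non-zero words are kept in a dense prefix so that only the essential part occupies memory and needs copying, and emptied words are swapped out of the prefix rather than stored.

```python
# Sketch of a dynamically compact sparse bit-set: only the dense prefix
# words[0..limit] may be non-zero, so a copying solver copies just that
# prefix. Emptied words are swapped past `limit` (the sparse-set trick).
class SparseBitSet:
    def __init__(self, n_bits):
        n_words = (n_bits + 63) // 64
        self.words = [(1 << 64) - 1] * n_words   # all bits initially set
        extra = n_words * 64 - n_bits
        if extra:                                 # clear bits beyond n_bits
            self.words[-1] >>= extra
        self.index = list(range(n_words))         # dense position -> word id
        self.limit = n_words - 1                  # last possibly non-zero word

    def intersect(self, mask_words):
        """Intersect with a mask given as {word_id: word}; drop emptied words."""
        i = 0
        while i <= self.limit:
            w = self.words[i] & mask_words.get(self.index[i], 0)
            if w:
                self.words[i] = w
                i += 1
            else:
                # swap the emptied word out of the dense prefix
                self.words[i], self.words[self.limit] = self.words[self.limit], self.words[i]
                self.index[i], self.index[self.limit] = self.index[self.limit], self.index[i]
                self.limit -= 1

    def is_empty(self):
        return self.limit < 0
```

After an intersection, words that became zero no longer occupy the prefix, which is what keeps copies small during search.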

  • 53.
    Ismail, Mahmoud
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Bonds, August
    KTH.
    Niazi, Salman
    Logical Clocks AB.
    Haridi, Seif
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dowling, Jim
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Scalable Block Reporting for HopsFS, 2019. In: 2019 IEEE International Congress on Big Data (BigData Congress), 2019, p. 157-164. Conference paper (Refereed)
    Abstract [en]

    Distributed hierarchical file systems typically decouple the storage of the file system’s metadata from the data (file system blocks) to enable the scalability of the file system. This decoupling, however, requires the introduction of a periodic synchronization protocol to ensure the consistency of the file system’s metadata and its blocks. Apache HDFS and HopsFS implement a protocol, called block reporting, where each data server periodically sends ground truth information about all its file system blocks to the metadata servers, allowing the metadata to be synchronized with the actual state of the data blocks in the file system. The network and processing overhead of the existing block reporting protocol, however, increases with cluster size, ultimately limiting cluster scalability. In this paper, we introduce a new block reporting protocol for HopsFS that reduces the protocol bandwidth and processing overhead by up to three orders of magnitude, compared to HDFS/HopsFS’ existing protocol. Our new protocol removes a major bottleneck that prevented HopsFS clusters scaling to tens of thousands of servers.
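One way a block-reporting protocol can avoid shipping ground truth about every block is a bucket-digest exchange; the following is a hypothetical sketch of that general idea (invented names and bucket count, not the exact protocol of the paper): the data server sends one small digest per bucket of blocks, and only mismatching buckets are transferred in full.

```python
# Hypothetical bucketed block report: compare per-bucket digests first,
# then re-read only the buckets whose digests disagree.
import hashlib

N_BUCKETS = 4

def bucket_of(block_id):
    return block_id % N_BUCKETS

def digest(blocks):
    h = hashlib.sha256()
    for b in sorted(blocks):
        h.update(b.to_bytes(8, "big"))
    return h.hexdigest()

def make_report(blocks):
    buckets = {i: [] for i in range(N_BUCKETS)}
    for b in blocks:
        buckets[bucket_of(b)].append(b)
    return {i: digest(bs) for i, bs in buckets.items()}, buckets

def reconcile(metadata_view, datanode_blocks):
    """Return only the buckets the metadata server must re-read in full."""
    md_digests, _ = make_report(metadata_view)
    dn_digests, dn_buckets = make_report(datanode_blocks)
    return {i: dn_buckets[i] for i in range(N_BUCKETS)
            if md_digests[i] != dn_digests[i]}
```

When the two sides agree, only the fixed-size digests cross the network, which is how this style of protocol decouples reporting cost from the total number of blocks.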

  • 54.
    Ismail, Mahmoud
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Ronström, Mikael
    Haridi, Seif
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dowling, Jim
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    ePipe: Near Real-Time Polyglot Persistence of HopsFS Metadata, 2019. In: 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2019, p. 92-101. Conference paper (Refereed)
    Abstract [en]

    Distributed OLTP databases are now used to manage metadata for distributed file systems, but they cannot also efficiently support complex queries or aggregations. To solve this problem, we introduce ePipe, a databus that both creates a consistent change stream for a distributed, hierarchical file system (HopsFS) and eventually delivers the correctly ordered stream with low latency to downstream clients. ePipe can be used to provide polyglot storage for file system metadata, allowing metadata queries to be handled by the most efficient engine for that query. For file system notifications, we show that ePipe achieves up to 56X throughput improvement over HDFS INotify and Trumpet with up to 3 orders of magnitude lower latency. For Spotify’s Hadoop workload, we show that ePipe can replicate all file system changes from HopsFS to Elasticsearch with an average replication lag of only 330 ms.
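The "correctly ordered stream" delivery can be pictured with a generic reordering buffer (a sketch of the general technique, not ePipe's implementation): out-of-order change events are buffered and released strictly by sequence number.

```python
# Generic reordering buffer: hold events that arrive early and release
# them to downstream consumers strictly in sequence order.
def ordered_deliver(events):
    """events: iterable of (seq, payload), possibly out of order."""
    pending, out, next_seq = {}, [], 0
    for seq, payload in events:
        pending[seq] = payload
        while next_seq in pending:          # flush every run that is now contiguous
            out.append(pending.pop(next_seq))
            next_seq += 1
    return out
```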

  • 55.
    Issa, Shady
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS. INESC-ID, Instituto Superior Tecnico, Universidade de Lisboa.
    Techniques for Enhancing the Efficiency of Transactional Memory Systems, 2018. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    Transactional Memory (TM) is an emerging programming paradigm that drastically simplifies the development of concurrent applications by relieving programmers from a major source of complexity: how to ensure correct, yet efficient, synchronization of concurrent accesses to shared memory. Despite the large body of research devoted to this area, existing TM systems still suffer from severe limitations that hamper both their performance and energy efficiency.

    This dissertation tackles the problem of how to build efficient implementations of the TM abstraction by introducing innovative techniques that address three crucial limitations of existing TM systems by: (i) extending the effective capacity of Hardware TM (HTM) implementations; (ii) reducing the synchronization overheads in Hybrid TM (HyTM) systems; (iii) enhancing the efficiency of TM applications via energy-aware contention management schemes.

    The first contribution of this dissertation, named POWER8-TM (P8TM), addresses what is arguably one of the most compelling limitations of existing HTM implementations: the inability to process transactions whose footprint exceeds the capacity of the processor's cache. By leveraging, in an innovative way, two hardware features provided by IBM POWER8 processors, namely Rollback-only Transactions and Suspend/Resume, P8TM can achieve up to 7x performance gains in workloads that stress the capacity limitations of HTM.

    The second contribution is Dynamic Memory Partitioning-TM (DMP-TM), a novel Hybrid TM (HyTM) that offloads the cost of detecting conflicts between HTM and Software TM (STM) to off-the-shelf operating system memory protection mechanisms. DMP-TM's design is agnostic to the STM algorithm and has the key advantage of allowing for integrating, in an efficient way, highly scalable STM implementations that would, otherwise, demand expensive instrumentation of the HTM path. This allows DMP-TM to achieve up to 20x speedups compared to state of the art HyTM solutions in uncontended workloads.

    The third contribution, Green-CM, is an energy-aware Contention Manager (CM) that has two main innovative aspects: (i) a novel asymmetric design, which combines different back-off policies in order to take advantage of Dynamic Frequency and Voltage Scaling (DVFS) hardware capabilities, available in most modern processors; (ii) an energy efficient implementation of a fundamental building block for many CM implementations, namely, the mechanism used to back-off threads for a predefined amount of time. Thanks to its innovative design, Green-CM can reduce the Energy Delay Product by up to 2.35x with respect to state of the art CMs.

    All the techniques proposed in this dissertation share an important common feature that is essential to preserve the ease of use of the TM abstraction: the reliance on on-line self-tuning mechanisms that ensure robust performance even in the presence of heterogeneous workloads, without requiring prior knowledge of the target workloads or architecture.

  • 56.
    Jaradat, Shatha
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dokoohaki, Nima
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Hammar, Kim
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Wara, Ummal
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Matskin, Mihhail
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dynamic CNN Models For Fashion Recommendation in Instagram, 2018. In: 2018 IEEE International Conference on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications / [ed] Chen, J. J. and Yang, L. T., IEEE Computer Society, 2018, p. 1144-1151. Conference paper (Refereed)
    Abstract [en]

    Instagram as an online photo-sharing and social-networking service is becoming more powerful in enabling fashion brands to ramp up their business growth. Nowadays, a single post by a fashion influencer attracts a wealth of attention and a multitude of followers who are curious to know more about the brands and style of each clothing item sitting inside the image. To this end, the development of efficient Deep CNN models that can accurately detect styles and brands has become a research challenge. In addition, current techniques need to cope with inherent fashion-related data issues. Namely, clothing details inside a single image only cover a small proportion of the large and hierarchical space of possible brands and clothing item attributes. In order to cope with these challenges, one can argue that neural classifiers should become adapted to large-scale and hierarchical fashion datasets. As a remedy, we propose two novel techniques that incorporate the valuable social media textual content to support the visual classification in a dynamic way. The first method is adaptive neural pruning (DynamicPruning), in which the clothing item category detected from the post's text analysis is used to activate the possible range of connections of the clothing attributes' classifier. The second method (DynamicLayers) is a dynamic framework in which multiple attribute-classification layers exist and a suitable attribute classifier layer is activated dynamically based upon the text mined from the image. Extensive experiments on a dataset gathered from Instagram and a baseline fashion dataset (DeepFashion) have demonstrated that our approaches can improve the accuracy by about 20% when compared to base architectures. It is worth highlighting that with DynamicLayers we have gained 35% accuracy for the task of multi-class multi-labeled classification compared to the other model.

  • 57.
    Jaradat, Shatha
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dokoohaki, Nima
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Wara, Ummal
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Goswami, Mallu
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Hammar, Kim
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Matskin, Mihhail
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    TALS: A framework for text analysis, fine-grained annotation, localisation and semantic segmentation, 2019. In: Proceedings - International Computer Software and Applications Conference, IEEE Computer Society, 2019, Vol. 8754470, p. 201-206. Conference paper (Refereed)
    Abstract [en]

    With around 2.77 billion users using online social media platforms nowadays, it is becoming more attractive for business retailers to reach and to connect to more potential clients through social media. However, providing more effective recommendations to grab clients’ attention requires a deep understanding of users’ interests. Given the enormous amounts of text and images that users share in social media, deep learning approaches play a major role in performing semantic analysis of text and images. Moreover, object localisation and pixel-by-pixel semantic segmentation image analysis neural architectures provide an enhanced level of information. However, to train such architectures in an end-to-end manner, detailed datasets with specific meta-data are required. In our paper, we present a complete framework that can be used to tag images in a hierarchical fashion, and to perform object localisation and semantic segmentation. In addition to this, we show the value of using neural word embeddings in providing additional semantic details to annotators to guide them in annotating images in the system. Our framework is designed to be a fully functional solution capable of providing fine-grained annotations, essential localisation and segmentation services while keeping the core architecture simple and extensible. We also provide a fine-grained labelled fashion dataset that can be a rich source for research purposes.

  • 58. Johansson, U.
    et al.
    Linusson, H.
    Löfström, T.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Interpretable regression trees using conformal prediction, 2018. In: Expert Systems with Applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 97, p. 394-404. Article in journal (Refereed)
    Abstract [en]

    A key property of conformal predictors is that they are valid, i.e., their error rate on novel data is bounded by a preset level of confidence. For regression, this is achieved by turning the point predictions of the underlying model into prediction intervals. Thus, the most important performance metric for evaluating conformal regressors is not the error rate, but the size of the prediction intervals, where models generating smaller (more informative) intervals are said to be more efficient. State-of-the-art conformal regressors typically utilize two separate predictive models: the underlying model providing the center point of each prediction interval, and a normalization model used to scale each prediction interval according to the estimated level of difficulty for each test instance. When using a regression tree as the underlying model, this approach may cause test instances falling into a specific leaf to receive different prediction intervals. This clearly deteriorates the interpretability of a conformal regression tree compared to a standard regression tree, since the path from the root to a leaf can no longer be translated into a rule explaining all predictions in that leaf. In fact, the model cannot even be interpreted on its own, i.e., without reference to the corresponding normalization model. Current practice effectively presents two options for constructing conformal regression trees: to employ a (global) normalization model, and thereby sacrifice interpretability; or to avoid normalization, and thereby sacrifice both efficiency and individualized predictions. In this paper, two additional approaches are considered, both employing local normalization: the first approach estimates the difficulty by the standard deviation of the target values in each leaf, while the second approach employs Mondrian conformal prediction, which results in regression trees where each rule (path from root node to leaf node) is independently valid. 
An empirical evaluation shows that the first approach is as efficient as current state-of-the-art approaches, thus eliminating the efficiency vs. interpretability trade-off present in existing methods. Moreover, it is shown that if a validity guarantee is required for each single rule, as provided by the Mondrian approach, a penalty with respect to efficiency has to be paid, but it is only substantial at very high confidence levels.
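The Mondrian variant described above can be sketched in a few lines: each leaf is its own conformal category, so its interval half-width is computed from the calibration residuals of that leaf alone (a minimal illustrative sketch; function and variable names are invented).

```python
# Per-leaf (Mondrian) conformal interval half-widths: every rule, i.e.
# root-to-leaf path, gets an interval that is valid on its own.
import math

def mondrian_intervals(leaf_ids_cal, y_cal, y_hat_cal, eps):
    """Half-widths at significance eps from calibration instances per leaf."""
    by_leaf = {}
    for leaf, y, yh in zip(leaf_ids_cal, y_cal, y_hat_cal):
        by_leaf.setdefault(leaf, []).append(abs(y - yh))   # nonconformity
    widths = {}
    for leaf, scores in by_leaf.items():
        scores.sort()
        # split-conformal quantile index: ceil((1 - eps) * (n + 1)) - 1
        k = math.ceil((1 - eps) * (len(scores) + 1)) - 1
        widths[leaf] = scores[min(k, len(scores) - 1)]
    return widths
```

A test instance falling into leaf `l` then receives the interval "leaf prediction ± widths[l]", so all instances in the same leaf share one interval and the rule stays interpretable.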

  • 59.
    Johansson, Ulf
    et al.
    Jonkoping Univ, Dept Comp Sci & Informat, Jonkoping, Sweden..
    Lofstrom, Tuve
    Jonkoping Univ, Dept Comp Sci & Informat, Jonkoping, Sweden..
    Linusson, Henrik
    Univ Boras, Dept Informat Technol, Boras, Sweden..
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Efficient Venn predictors using random forests, 2019. In: Machine Learning, ISSN 0885-6125, E-ISSN 1573-0565, Vol. 108, no 3, p. 535-550. Article in journal (Refereed)
    Abstract [en]

    Successful use of probabilistic classification requires well-calibrated probability estimates, i.e., the predicted class probabilities must correspond to the true probabilities. In addition, a probabilistic classifier must, of course, also be as accurate as possible. In this paper, Venn predictors, and its special case Venn-Abers predictors, are evaluated for probabilistic classification, using random forests as the underlying models. Venn predictors output multiple probabilities for each label, i.e., the predicted label is associated with a probability interval. Since all Venn predictors are valid in the long run, the size of the probability intervals is very important, with tighter intervals being more informative. The standard solution when calibrating a classifier is to employ an additional step, transforming the outputs from a classifier into probability estimates, using a labeled data set not employed for training of the models. For random forests, and other bagged ensembles, it is, however, possible to use the out-of-bag instances for calibration, making all training data available for both model learning and calibration. This procedure has previously been successfully applied to conformal prediction, but was here evaluated for the first time for Venn predictors. The empirical investigation, using 22 publicly available data sets, showed that all four versions of the Venn predictors were better calibrated than both the raw estimates from the random forest, and the standard techniques Platt scaling and isotonic regression. Regarding both informativeness and accuracy, the standard Venn predictor calibrated on out-of-bag instances was the best setup evaluated. Most importantly, calibrating on out-of-bag instances, instead of using a separate calibration set, resulted in tighter intervals and more accurate models on every data set, for both the Venn predictors and the Venn-Abers predictors.
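The core Venn mechanism, trying each possible label for the test object and reading off empirical label frequencies within its category, can be shown in a deliberately simplified binary-label sketch (an illustration of the general idea, not the paper's random-forest or out-of-bag setup; names are invented).

```python
# Simplified binary Venn predictor: the category of an object is the
# underlying model's prediction for it. For each tentative label of the
# test object, compute the empirical frequency of the positive class in
# its category; the spread over tentative labels is the interval.
def venn_interval(cal_pred, cal_true, test_pred, labels=(0, 1)):
    freqs = []
    for tried in labels:
        preds = list(cal_pred) + [test_pred]
        trues = list(cal_true) + [tried]
        group = [t for p, t in zip(preds, trues) if p == test_pred]
        freqs.append(sum(1 for t in group if t == 1) / len(group))
    return min(freqs), max(freqs)
```

Tighter intervals are more informative, which is why the paper's finding that out-of-bag calibration shrinks them matters.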

  • 60. Jonsson, Leif
    et al.
    Broman, David
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Magnusson, Måns
    UC Berkeley, Sweden.
    Sandahl, Kristian
    UC Berkeley, Sweden.
    Villani, Mattias
    UC Berkeley, Sweden.
    Eldh, Sigrid
    Automatic Localization of Bugs to Faulty Components in Large Scale Software Systems using Bayesian Classification, 2016. In: 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS 2016), IEEE, 2016, p. 425-432. Conference paper (Refereed)
    Abstract [en]

    We suggest a Bayesian approach to the problem of reducing bug turnaround time in large software development organizations. Our approach is to use classification to predict where bugs are located in components. This classification is a form of automatic fault localization (AFL) at the component level. The approach only relies on historical bug reports and does not require detailed analysis of source code or detailed test runs. Our approach addresses two problems identified in user studies of AFL tools. The first problem concerns the trust in which the user can put in the results of the tool. The second problem concerns understanding how the results were computed. The proposed model quantifies the uncertainty in its predictions and all estimated model parameters. Additionally, the output of the model explains why a result was suggested. We evaluate the approach on more than 50000 bugs.
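The paper's model quantifies uncertainty and explains its predictions; as a rough stand-in for the underlying idea of classifying bug reports to components from historical reports alone, here is a tiny multinomial naive Bayes sketch (invented names and data, not the authors' model).

```python
# Stand-in for component-level fault localization: a multinomial naive
# Bayes classifier trained on (bug-report words, faulty component) pairs.
import math
from collections import Counter, defaultdict

def train(reports):
    """reports: list of (words, component) pairs from historical bugs."""
    comp_docs = Counter()
    comp_words = defaultdict(Counter)
    vocab = set()
    for words, comp in reports:
        comp_docs[comp] += 1
        comp_words[comp].update(words)
        vocab.update(words)
    return comp_docs, comp_words, vocab

def predict(model, words):
    comp_docs, comp_words, vocab = model
    total = sum(comp_docs.values())
    best, best_lp = None, -math.inf
    for comp in comp_docs:
        lp = math.log(comp_docs[comp] / total)          # log prior
        denom = sum(comp_words[comp].values()) + len(vocab)
        for w in words:                                  # Laplace smoothing
            lp += math.log((comp_words[comp][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = comp, lp
    return best
```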

  • 61. Kalavri, Vasiliki
    et al.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Haridi, Seif
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    High-Level Programming Abstractions for Distributed Graph Processing, 2018. In: IEEE Transactions on Knowledge and Data Engineering, ISSN 1041-4347, E-ISSN 1558-2191, Vol. 30, no 2, p. 305-324. Article in journal (Refereed)
    Abstract [en]

    Efficient processing of large-scale graphs in distributed environments has been an increasingly popular topic of research in recent years. Inter-connected data that can be modeled as graphs appear in application domains such as machine learning, recommendation, web search, and social network analysis. Writing distributed graph applications is inherently hard and requires programming models that can cover a diverse set of problems, including iterative refinement algorithms, graph transformations, graph aggregations, pattern matching, ego-network analysis, and graph traversals. Several high-level programming abstractions have been proposed and adopted by distributed graph processing systems and big data platforms. Even though significant work has been done to experimentally compare distributed graph processing frameworks, no qualitative study and comparison of graph programming abstractions has been conducted yet. In this survey, we review and analyze the most prevalent high-level programming models for distributed graph processing, in terms of their semantics and applicability. We review 34 distributed graph processing systems with respect to the graph processing models they implement and we survey applications that appear in recent distributed graph systems papers. Finally, we discuss trends and open research questions in the area of distributed graph processing.

  • 62.
    Karlsson, Vide
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Utvärdering av nyckelordsbaserad textkategoriseringsalgoritmer [Evaluation of keyword-based text categorization algorithms], 2016. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Supervised learning algorithms have been used for automatic text categorization with very good results. But supervised learning requires a large amount of manually labeled training data, and this is a serious limitation for many practical applications. Keyword-based text categorization does not require manually labeled training data and has therefore been presented as an attractive alternative to supervised learning. The aim of this study is to explore whether there are other limitations to using keyword-based text categorization in industrial applications. This study also tests whether a new lexical resource, based on the paradigmatic relations between words, could be used to improve existing keyword-based text categorization algorithms. An industry-motivated use case was created to measure practical applicability. The results showed that none of the five examined algorithms was able to meet the requirements of the industrially motivated use case. But it was possible to modify one algorithm, proposed by Liebeskind et al. (2015), to meet the requirements. The new lexical resource produced relevant keywords for text categorization, but there was still a large variance in the algorithm's capacity to correctly categorize different text categories. The categorization capacity was also generally too low to meet the requirements of many practical applications. Further studies are needed to explore how the algorithm's categorization capacity could be improved.
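The baseline technique being evaluated reduces to very little code, which is exactly its appeal: no labeled training data is needed. A minimal sketch (with hypothetical keyword lists): assign a document every category whose keyword list overlaps its tokens.

```python
# Minimal keyword-based text categorization: a document gets every
# category whose keyword list shares at least one token with it.
def categorize(text, keywords):
    tokens = set(text.lower().split())
    return [cat for cat, kws in keywords.items() if tokens & set(kws)]
```

The study's point is that making such a matcher accurate enough for industrial use is the hard part, not implementing it.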

  • 63.
    Kefato, Zekarias
    et al.
    Trento University.
    Sheikh, Nasrullah
    Trento University.
    Bahri, Leila
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Soliman, Amira
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Montresor, Alberto
    Trento University.
    CaTS: Network-Agnostic Virality Prediction Model to Aid Rumour Detection, 2018. Conference paper (Refereed)
  • 64.
    Kefato, Zekarias T.
    et al.
    Univ Trento, Trento, Italy..
    Sheikh, Nasrullah
    Univ Trento, Trento, Italy..
    Bahri, Leila
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Soliman, Amira
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Montresor, Alberto
    Univ Trento, Trento, Italy..
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    CAS2VEC: Network-Agnostic Cascade Prediction in Online Social Networks, 2018. Conference paper (Refereed)
    Abstract [en]

    Effectively predicting whether a given post or tweet is going to become viral in online social networks is of paramount importance for several applications, such as trend and break-out forecasting. While several attempts towards this end exist, most of the current approaches rely on features extracted from the underlying network structure over which the content spreads. Recent studies have shown, however, that prediction can be effectively performed with very little structural information about the network, or even with no structural information at all. In this study we propose a novel network-agnostic approach called CAS2VEC, that models information cascades as time series and discretizes them using time slices. For the actual prediction task we have adopted a technique from the natural language processing community. The particular choice of the technique is mainly inspired by an empirical observation on the strong similarity between the distribution of discretized values occurrence in cascades and words occurrence in natural language documents. Thus, thanks to such a technique for sentence classification using convolutional neural networks, CAS2VEC can predict whether a cascade is going to become viral or not. We have performed extensive experiments on two widely used real-world datasets for cascade prediction, that demonstrate the effectiveness of our algorithm against strong baselines.
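The time-slice discretization that CAS2VEC applies to cascades can be sketched directly (a minimal illustration; parameter names are invented): the reshare timestamps observed within a window are binned into fixed-width slices, and the resulting count sequence is what the convolutional classifier consumes in place of a sentence.

```python
# Discretize an information cascade (sorted reshare timestamps within
# an observation window) into a fixed-length sequence of counts.
def discretize(timestamps, window, n_slices):
    width = window / n_slices
    counts = [0] * n_slices
    for t in timestamps:
        if 0 <= t < window:            # ignore events outside the window
            counts[int(t // width)] += 1
    return counts
```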

  • 65.
    Khan, Amin M.
    et al.
    UiT Arctic Univ Norway, Dept Comp Sci, Tromso, Norway.;Hitachi Vantara Corp, Lisbon, Portugal..
    Freitag, Felix
    Univ Politecn Cataluna, Dept Comp Architecture, Barcelona, Spain..
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Ha, Phuong Hoai
    Demo Abstract: Towards IoT Service Deployments on Edge Community Network Microclouds, 2018. In: IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), IEEE, 2018. Conference paper (Refereed)
    Abstract [en]

    Internet of Things (IoT) services for personal devices and smart homes provided by commercial solutions are typically proprietary and closed. These services provide little control to the end users, for instance to take ownership of their data and enabling services, which hinders these solutions' wider acceptance. In this demo paper, we argue for an approach to deploy professional IoT services on user-controlled infrastructure at the network edge. The users would benefit from the ability to choose the most suitable service from different IoT service offerings, like the one which satisfies their privacy requirements, and third-party service providers could offer more tailored IoT services at customer premises. We conduct the demonstration on microclouds, which have been built with the Cloudy platform in the Guifi. net community network. The demonstration is conducted from the perspective of end users, who wish to deploy professional IoT data management and analytics services in volunteer microclouds.

  • 66. Kolbay, B.
    et al.
    Mrazovic, Petar
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Larriba-Pey, J. L.
    Analyzing last mile delivery operations in barcelona’s urban freight transport network2018In: 2nd EAI International Conference on ICT Infrastructures and Services for Smart Cities, IISSC 2017 and 2nd International Conference on Cloud, Networking for IoT systems, CN4IoT 2017, Springer Verlag , 2018, p. 13-22Conference paper (Refereed)
    Abstract [en]

    Barcelona has recently started a new strategy, AreaDUM, to control and understand Last Mile Delivery. The strategy is to provide freight delivery vehicle drivers with a mobile app that has to be used every time their vehicle is parked in one of the designated AreaDUM surface parking spaces in the streets of the city. This provides a significant amount of data about the activity of the freight delivery vehicles, their patterns, the occupancy of the spaces, etc. In this paper, we provide a preliminary set of analytics, preceded by the procedures employed for the cleansing of the dataset. During the analysis we show that some data blur the results; using a simple strategy to detect when a vehicle parks repeatedly in close-by parking slots, we are able to obtain different, yet more reliable, results. We show that this behavior is common among users, with 80% prevalence. We conclude that we need to analyse and understand user behavior further, with the purpose of providing predictive algorithms to find parking slots and smart routing algorithms to minimize traffic.
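    The repeated close-by parking behavior mentioned above can be detected with a simple merging pass over one vehicle's parking events. The sketch below is a hypothetical illustration: the field names and the distance/time thresholds are assumptions, not the paper's actual cleansing procedure.

```python
# Hypothetical sketch: merge consecutive parking events of one vehicle
# that are closer than max_dist metres and separated by less than
# max_gap seconds, treating them as a single continued stay.
from math import hypot

def merge_repeated_parkings(parkings, max_dist=50.0, max_gap=300):
    merged = []
    for p in sorted(parkings, key=lambda p: p["start"]):
        if merged:
            prev = merged[-1]
            close = hypot(p["x"] - prev["x"], p["y"] - prev["y"]) <= max_dist
            soon = p["start"] - prev["end"] <= max_gap
            if close and soon:
                prev["end"] = max(prev["end"], p["end"])  # extend the stay
                continue
        merged.append(dict(p))
    return merged

events = [
    {"x": 0, "y": 0, "start": 0, "end": 600},
    {"x": 10, "y": 0, "start": 700, "end": 1200},    # moved 10 m, 100 s gap
    {"x": 500, "y": 0, "start": 5000, "end": 5600},  # genuinely new stop
]
print(len(merge_repeated_parkings(events)))  # → 2
```

    Collapsing such re-parks before computing occupancy statistics is one way the "blurred" results described above could be corrected.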

  • 67. Koubarakis, M.
    et al.
    Bereta, K.
    Bilidas, D.
    Giannousis, K.
    Ioannidis, T.
    Pantazi, D. -A
    Stamoulis, G.
    Haridi, Seif
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Bruzzone, L.
    Paris, C.
    Eltoft, T.
    Krämer, T.
    Charalabidis, A.
    Karkaletsis, V.
    Konstantopoulos, S.
    Dowling, J.
    Kakantousis, T.
    Datcu, M.
    Dumitru, C. O.
    Appel, F.
    Bach, H.
    Migdall, S.
    Hughes, N.
    Arthurs, D.
    Fleming, A.
    From copernicus big data to extreme earth analytics2019In: Advances in Database Technology - EDBT, OpenProceedings, 2019, p. 690-693Conference paper (Refereed)
    Abstract [en]

    Copernicus is the European programme for monitoring the Earth. It consists of a set of systems that collect data from satellites and in-situ sensors, process this data and provide users with reliable and up-to-date information on a range of environmental and security issues. The data and information processed and disseminated put Copernicus at the forefront of the big data paradigm, giving rise to all the relevant challenges, the so-called 5 Vs: volume, velocity, variety, veracity and value. In this short paper, we discuss the challenges of extracting information and knowledge from huge archives of Copernicus data. We propose to achieve this by scale-out distributed deep learning techniques that run on very big clusters offering virtual machines and GPUs. We also discuss the challenges of achieving scalability in the management of the extreme volumes of information and knowledge extracted from Copernicus data. The envisioned scientific and technical work will be carried out in the context of the H2020 project ExtremeEarth, which starts in January 2019.

  • 68.
    Kroll, Lars
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Segeljakt, Klas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Schulte, Christian
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Haridi, Seif
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Carbone, P.
    Arc: An IR for batch and stream programming2019In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Association for Computing Machinery (ACM), 2019, p. 53-58Conference paper (Refereed)
    Abstract [en]

    In big data analytics, there is currently a large number of data programming models and their respective frontends, such as relational tables, graphs, tensors, and streams. This has led to a plethora of runtimes that typically focus on the efficient execution of just a single frontend. This fragmentation manifests itself today in highly complex pipelines that bundle multiple runtimes to support the necessary models. Hence, joint optimization and execution of such pipelines across these frontend-bound runtimes is infeasible. We propose Arc as the first unified Intermediate Representation (IR) for data analytics that incorporates stream semantics based on a modern specification of streams, windows and stream aggregation, in order to combine batch and stream computation models. Arc extends Weld, an IR for batch computation, and adds support for partitioned, out-of-order stream and window operators, which are the most fundamental building blocks in contemporary data streaming.

  • 69. Laperdrix, P.
    et al.
    Avoine, G.
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Nikiforakis, N.
    Morellian analysis for browsers: Making web authentication stronger with canvas fingerprinting2019In: Detection of Intrusions and Malware, and Vulnerability Assessment: 16th International Conference, DIMVA 2019, Gothenburg, Sweden, June 19–20, 2019, Proceedings / [ed] Roberto Perdisci, Clémentine Maurice, Giorgio Giacinto, Magnus Almgren, Springer Verlag , 2019, p. 43-66Conference paper (Refereed)
    Abstract [en]

    In this paper, we present the first fingerprinting-based authentication scheme that is not vulnerable to trivial replay attacks. Our proposed canvas-based fingerprinting technique utilizes one key characteristic: it is parameterized by a challenge, generated on the server side. We perform an in-depth analysis of all parameters that can be used to generate canvas challenges, and we show that it is possible to generate unique, unpredictable, and highly diverse canvas-generated images each time a user logs onto a service. With the analysis of images collected from more than 1.1 million devices in a real-world large-scale experiment, we evaluate our proposed scheme against a large set of attack scenarios and conclude that canvas fingerprinting is a suitable mechanism for stronger authentication on the web.

  • 70. Lin, X.
    et al.
    Buyya, R.
    Yang, L.
    Tari, Z.
    Choo, K. -KR.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Yao, L.
    Yin, H.
    Wang, W.
    Message from the BDCloud 2018 Chairs2019In: 16th IEEE International Symposium on Parallel and Distributed Processing with Applications, 17th IEEE International Conference on Ubiquitous Computing and Communications, 8th IEEE International Conference on Big Data and Cloud Computing, 11th IEEE International Conference on Social Computing and Networking and 8th IEEE International Conference on Sustainable Computing and Communications, ISPA/IUCC/BDCloud/SocialCom/SustainCom 2018, p. XXIX-XXX, article id 8672358Article in journal (Refereed)
  • 71.
    Linusson, Henrik
    et al.
    Department of Information Technology, University of Borås, Sweden.
    Norinder, Ulf
    Swetox, Karolinska Institutet, Unit of Toxicology Sciences, Sweden.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS. Department of Computer and Systems Sciences, Stockholm University, Sweden.
    Johansson, Ulf
    Högskolan i Jönköping, JTH, Datateknik och informatik.
    Löfström, Tuve
    Högskolan i Jönköping, JTH. Forskningsmiljö Datavetenskap och informatik.
    On the calibration of aggregated conformal predictors2017In: Proceedings of Machine Learning Research: Volume 60: Conformal and Probabilistic Prediction and Applications, 13-16 June 2017, Stockholm, Sweden / [ed] Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, and Harris Papadopoulos, 2017, p. 154-173Conference paper (Refereed)
    Abstract [en]

    Conformal prediction is a learning framework that produces models that associate with each of their predictions a measure of statistically valid confidence. These models are typically constructed on top of traditional machine learning algorithms. An important result of conformal prediction theory is that the models produced are provably valid under relatively weak assumptions—in particular, their validity is independent of the specific underlying learning algorithm on which they are based. Since validity is automatic, much research on conformal predictors has been focused on improving their informational and computational efficiency. As part of the efforts in constructing efficient conformal predictors, aggregated conformal predictors were developed, drawing inspiration from the field of classification and regression ensembles. Unlike early definitions of conformal prediction procedures, the validity of aggregated conformal predictors is not fully understood—while it has been shown that they might attain empirical exact validity under certain circumstances, their theoretical validity is conditional on additional assumptions that require further clarification. In this paper, we show why validity is not automatic for aggregated conformal predictors, and provide a revised definition of aggregated conformal predictors that gains approximate validity conditional on properties of the underlying learning algorithm.
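    For context, the sketch below shows the calibration step of a single split (inductive) conformal predictor for regression, the building block that aggregated conformal predictors combine. The residuals and the point prediction are toy assumptions, and this is the standard textbook construction, not the revised aggregated definition the paper proposes.

```python
# Sketch of split conformal calibration: the interval half-width is the
# conformal quantile of nonconformity scores |y - y_hat| measured on a
# held-out calibration set. Data below is made up for illustration.
import math

def conformal_interval(calibration_residuals, point_prediction, alpha=0.1):
    """Prediction interval valid at level 1 - alpha under exchangeability."""
    scores = sorted(calibration_residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))   # rank of the conformal quantile
    q = scores[min(k, n) - 1]
    return point_prediction - q, point_prediction + q

residuals = [0.1, 0.3, 0.2, 0.5, 0.4, 0.25, 0.15, 0.35, 0.45, 0.05]
lo, hi = conformal_interval(residuals, point_prediction=10.0, alpha=0.2)
print((lo, hi))
```

    Validity of this single predictor follows from exchangeability alone; the paper's point is that averaging several such predictors over different calibration splits does not automatically inherit that guarantee.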

  • 72.
    Liu, Hongyi
    et al.
    KTH, School of Industrial Engineering and Management (ITM), Production Engineering.
    Fang, Tongtong
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Zhou, Tianyu
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Wang, Lihui
    KTH, School of Industrial Engineering and Management (ITM), Production Engineering, Production Systems.
    Towards Robust Human-Robot Collaborative Manufacturing: Multimodal Fusion2018In: IEEE Access, E-ISSN 2169-3536, Vol. 6, p. 74762-74771Article in journal (Refereed)
    Abstract [en]

    Intuitive and robust multimodal robot control is the key toward human-robot collaboration (HRC) for manufacturing systems. Multimodal robot control methods were introduced in previous studies. These methods allow human operators to control robots intuitively without programming brand-specific code. However, most multimodal robot control methods are unreliable because the feature representations are not shared across multiple modalities. To address this problem, a deep learning-based multimodal fusion architecture is proposed in this paper for robust multimodal HRC manufacturing systems. The proposed architecture consists of three modalities: speech command, hand motion, and body motion. Three unimodal models are first trained to extract features, which are then fused for representation sharing. Experiments show that the proposed multimodal fusion model outperforms the three unimodal models. This paper indicates a great potential to apply the proposed multimodal fusion architecture to robust HRC manufacturing systems.

  • 73.
    Lundberg, Johannes
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Safe Kernel Programming with Rust2018Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
    Abstract [en]

    Writing bug-free computer code is a challenging task in a low-level language like C. While C compilers are getting better and better at detecting possible bugs, they still have a long way to go. For application programming we have higher-level languages that abstract away details in memory handling and concurrent programming. However, a lot of an operating system's source code is still written in C and the kernel is exclusively written in C. How can we make writing kernel code safer? What are the performance penalties we have to pay for writing safe code? In this thesis, we answer these questions using the Rust programming language. A Rust Kernel Programming Interface is designed and implemented, and a network device driver is then ported to Rust. The Rust code is analyzed to determine its safety, and the two implementations are benchmarked and compared for performance. It is shown that a kernel device driver can be written entirely in safe Rust code, but the interface layer requires some unsafe code. Measurements show unexpected minor performance improvements with Rust.

  • 74. Magnusson, M.
    et al.
    Jonsson, L.
    Villani, M.
    Broman, David
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Sparse Partially Collapsed MCMC for Parallel Inference in Topic Models2018In: Journal of Computational And Graphical Statistics, ISSN 1061-8600, E-ISSN 1537-2715, Vol. 27, no 2, p. 449-463Article in journal (Refereed)
    Abstract [en]

    Topic models, and more specifically the class of latent Dirichlet allocation (LDA), are widely used for probabilistic modeling of text. Markov chain Monte Carlo (MCMC) sampling from the posterior distribution is typically performed using a collapsed Gibbs sampler. We propose a parallel sparse partially collapsed Gibbs sampler and compare its speed and efficiency to state-of-the-art samplers for topic models on five well-known text corpora of differing sizes and properties. In particular, we propose and compare two different strategies for sampling the parameter block with latent topic indicators. The experiments show that the increase in statistical inefficiency from only partial collapsing is smaller than commonly assumed, and can be more than compensated by the speedup from parallelization and sparsity on larger corpora. We also prove that the partially collapsed samplers scale well with the size of the corpus. The proposed algorithm is fast, efficient, exact, and can be used in more modeling situations than the ordinary collapsed sampler. Supplementary materials for this article are available online.
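    For reference, the sequential baseline that the proposed parallel sparse partially collapsed sampler improves upon, a fully collapsed Gibbs sampler for LDA, can be sketched as below. The corpus, hyperparameters and topic count are toy assumptions; the sparse, partially collapsed and parallel aspects of the paper are not shown.

```python
# Minimal fully collapsed Gibbs sampler for LDA: theta and phi are
# integrated out and only the topic indicators z are resampled.
import random

def collapsed_gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, iters=50, seed=0):
    rng = random.Random(seed)
    z = [[rng.randrange(K) for _ in doc] for doc in docs]  # topic indicators
    ndk = [[0] * K for _ in docs]            # per-document topic counts
    nkw = [[0] * V for _ in range(K)]        # per-topic word counts
    nk = [0] * K                             # per-topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                  # remove token's current topic
                ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                # unnormalised full conditional p(z_di = t | rest)
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + V * beta) for t in range(K)]
                r = rng.random() * sum(weights)
                t, acc = K - 1, 0.0
                for j in range(K):           # draw from the conditional
                    acc += weights[j]
                    if r < acc:
                        t = j
                        break
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return z, ndk, nkw

docs = [[0, 0, 1, 1], [2, 3, 2, 3], [0, 1, 0, 2]]  # word ids, vocab size 4
z, ndk, nkw = collapsed_gibbs_lda(docs, K=2, V=4)
```

    The inner loop's dependence on shared count matrices is exactly what makes this sampler hard to parallelise, which motivates only partially collapsing the model.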

  • 75.
    Morin, B.
    et al.
    SINTEF Digital, Oslo, Norway.
    Høgenes, J.
    Song, H.
    Harrand, Nicolas Yves Maurice
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Engineering software diversity: A model-based approach to systematically diversify communications2018In: Proceedings - 21st ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, MODELS 2018, Association for Computing Machinery, Inc , 2018, p. 155-165Conference paper (Refereed)
    Abstract [en]

    Automated diversity is a promising means of increasing the security of software systems. However, current automated diversity techniques operate at the bottom of the software stack (operating system and compiler), yielding a limited amount of diversity. We present a novel Model-Driven Engineering approach to the diversification of communicating systems, building on abstraction, model transformations and code generation. This approach generates significant amounts of diversity with low overhead, and addresses a large number of communicating systems, including small communicating devices.

  • 76.
    Mrazovic, Petar
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Crowdsensing-driven Route Optimisation Algorithms for Smart Urban Mobility2018Doctoral thesis, monograph (Other academic)
    Abstract [en]

    Urban mobility is often considered one of the main facilitators of greener and more sustainable urban development. However, it nowadays requires a significant shift towards cleaner and more efficient urban transport that would support an increased social and economic concentration of resources in cities. A high priority for cities around the world is to support residents’ mobility within urban environments while at the same time reducing congestion, accidents, and pollution. However, developing a more efficient and greener (or, in one word, smarter) urban mobility is one of the most difficult topics to face in large metropolitan areas. In this thesis, we approach this problem from the perspective of the rapidly evolving ICT landscape, which allows us to build mobility solutions without the need for large investments or sophisticated sensor technologies.

    In particular, we propose to leverage Mobile Crowdsensing (MCS) paradigm in which citizens use their mobile communication and/or sensing devices to collect, locally process and analyse, as well as voluntary distribute geo-referenced information. The mobility data crowdsensed from volunteer residents (e.g., events, traffic intensity, noise and air pollution, etc.) can provide valuable information about the current mobility conditions in the city, which can, with the adequate data processing algorithms, be used to route and manage people flows in urban environments.

    Therefore, in this thesis we combine two very promising Smart Mobility enablers – MCS and journey/route planning, and thus bring together to some extent distinct research challenges. We separate our research objectives into two parts, i.e., research stages: (1) architectural challenges in designing MCS systems and (2) algorithmic challenges in MCS-driven route planning applications. We aim to demonstrate a logical research progression over time, starting from fundamentals of human-in-the-loop sensing systems such as MCS, to route optimisation algorithms tailored for specific MCS applications. While we mainly focus on algorithms and heuristics to solve NP-hard routing problems, we use real-world application examples to showcase the advantages of the proposed algorithms and infrastructures.

  • 77.
    Mrazovic, Petar
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Eser, Elif
    Bilkent Univ, Dept Comp Engn, Ankara, Turkey..
    Ferhatosmanoglu, Hakan
    Univ Warwick, Dept Comp Sci, Coventry, W Midlands, England..
    Larriba-Pey, Josep L.
    Univ Politecn Cataluna, Dept Comp Architecture, Barcelona, Spain..
    Matskin, Mihhail
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Multi-vehicle Route Planning for Efficient Urban Freight Transport2018In: 2018 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS (IS) / [ed] JardimGoncalves, R Mendonca, JP Jotsov, V Marques, M Martins, J Bierwolf, R, IEEE , 2018, p. 744-753Conference paper (Refereed)
    Abstract [en]

    Urban parking spaces for loading/unloading are typically over-occupied, which shifts delivery operations to traffic lanes and pavements, increases traffic, generates noise, and causes pollution. We present a data-analytics-based routing optimization that improves the circulation of vehicles and the utilization of parking spaces. We formalize this new problem and develop a novel multi-vehicle route planner that avoids congestion at loading/unloading areas and minimizes the total duration. We present the developed tool with an illustration and analysis for urban freight in the city of Barcelona, which monitors tens of thousands of deliveries every day. Our system includes an effective evaluation of candidate routes that treats the waiting times and further delays of other deliverers as a first-class citizen in the optimization. A two-layer local search is proposed with a greedy randomized adaptive method for variable neighborhood search. Our approach is applied and validated over data collected across Barcelona's urban freight transport network, which contains 3,704,034 parking activities. The results show that our solution significantly improves the use of available parking spaces and the circulation of vehicles. The analysis also provides useful insights on how to manage delivery routes and parking spaces for sustainable urban freight transport and city logistics.

  • 78.
    Mrazovic, Petar
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Larriba-Pey, J. L.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS. Dept. of Computer Architecture, UPC Polytechnic University of Catalonia, Barcelona, Spain.
    Matskin, Mihhail
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    A Deep Learning Approach for Estimating Inventory Rebalancing Demand in Bicycle Sharing Systems2018In: Proceedings - International Computer Software and Applications Conference, IEEE Computer Society , 2018, p. 110-115Conference paper (Refereed)
    Abstract [en]

    Meeting user demand is one of the most challenging problems arising in public bicycle sharing systems. Various factors, such as daily commuting patterns or topographical conditions, can lead to an unbalanced state where the numbers of rented and returned bicycles differ significantly among the stations. This can cause spatial imbalance of the bicycle inventory, which becomes critical when stations run completely empty or full and thus prevent users from renting or returning bicycles. To prevent such service disruptions, we propose to forecast user demand in terms of the expected number of bicycle rentals and returns, and accordingly to estimate the number of bicycles that need to be manually redistributed among the stations by maintenance vehicles. As opposed to traditional solutions to this problem, which rely on short-term demand forecasts, we aim to maximise the time within which the stations remain balanced by forecasting user demand multiple steps ahead of time. We propose a multi-input multi-output deep learning model based on Long Short-Term Memory networks to forecast user demand over long future horizons. An experimental study conducted on a real-world dataset confirms the efficiency and accuracy of our approach.
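    Once multi-step forecasts of rentals and returns are available, turning them into a rebalancing quantity for a single station can be sketched as below. The function, the level thresholds and the forecast numbers are illustrative assumptions, not the paper's model; capacity clipping and truck routing are ignored.

```python
# Sketch: walk the forecast horizon and compute how many bikes must be
# added (positive) or removed (negative) now so the station stays within
# [min_level, max_level] at every forecast step.
def rebalancing_demand(inventory, capacity, rentals, returns,
                       min_level=2, max_level=None):
    if max_level is None:
        max_level = capacity - 2      # keep some free docks for returns
    add_needed = remove_needed = 0
    level = inventory
    for rent, ret in zip(rentals, returns):
        level += ret - rent           # projected inventory after this step
        add_needed = max(add_needed, min_level - level)
        remove_needed = max(remove_needed, level - max_level)
    return add_needed - remove_needed

# Station with 6 bikes and 20 docks; made-up forecasts over six steps.
rentals = [3, 2, 2, 1, 0, 0]
returns = [0, 1, 0, 0, 1, 1]
print(rebalancing_demand(6, 20, rentals, returns))  # → 3 bikes to add now
```

    Forecasting many steps ahead, as the paper argues, is what lets a single redistribution keep the station balanced over the whole horizon rather than step by step.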

  • 79.
    Murray, Lawrence
    et al.
    Uppsala University.
    Lundén, Daniel
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Kudlicka, Jan
    Uppsala University.
    Broman, David
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Schön, Thomas
    Delayed Sampling and Automatic Rao-Blackwellization of Probabilistic Programs2018In: Proceeding of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018), PMLR , 2018Conference paper (Refereed)
    Abstract [en]

    We introduce a dynamic mechanism for the solution of analytically-tractable substructure in probabilistic programs, using conjugate priors and affine transformations to reduce variance in Monte Carlo estimators. For inference with Sequential Monte Carlo, this automatically yields improvements such as locally-optimal proposals and Rao–Blackwellization. The mechanism maintains a directed graph alongside the running program that evolves dynamically as operations are triggered upon it. Nodes of the graph represent random variables, edges the analytically-tractable relationships between them. Random variables remain in the graph for as long as possible, to be sampled only when they are used by the program in a way that cannot be resolved analytically. In the meantime, they are conditioned on as many observations as possible. We demonstrate the mechanism with a few pedagogical examples, as well as a linear-nonlinear state-space model with simulated data, and an epidemiological model with real data of a dengue outbreak in Micronesia. In all cases one or more variables are automatically marginalized out to significantly reduce variance in estimates of the marginal likelihood, in the final case facilitating a random-weight or pseudo-marginal-type importance sampler for parameter estimation. We have implemented the approach in Anglican and a new probabilistic programming language called Birch.
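    The variance reduction from keeping a conjugate variable in the graph can be illustrated with a normal-normal pair: marginalising the latent mean analytically gives the marginal likelihood in closed form, whereas sampling it first yields a Monte Carlo estimate with variance. The model and numbers below are toy assumptions, not the paper's examples or the Birch implementation.

```python
# Toy illustration of the benefit of conjugate marginalisation.
# Model: m ~ N(0, 1),  y | m ~ N(m, 1),  observed y = 0.5.
import math, random

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

y = 0.5

# Delayed sampling keeps m latent: the marginal likelihood is available
# exactly, since y ~ N(0, 1 + 1) in closed form.
exact = normal_pdf(y, 0.0, 2.0)

# The naive alternative samples m first, giving a noisy estimate.
rng = random.Random(1)
n = 20000
mc = sum(normal_pdf(y, rng.gauss(0.0, 1.0), 1.0) for _ in range(n)) / n

print(exact, mc)
```

    In a particle filter this difference compounds over time steps, which is why automatic marginalisation can sharply reduce the variance of marginal-likelihood estimates.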

  • 80.
    Natarajan, Saranya
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Broman, David
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Timed C: An Extension to the C Programming Language for Real-Time Systems2018In: 24TH IEEE REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM (RTAS 2018) / [ed] Pellizzoni, R, IEEE, 2018, p. 227-239Conference paper (Refereed)
    Abstract [en]

    The design and implementation of real-time systems require that both the logical and the temporal behavior are correct. There exist several specialized languages and tools that use the notion of logical time, as well as industrial-strength languages such as Ada and RTSJ that incorporate direct handling of real time. Although these languages and tools have been shown to be good alternatives for safety-critical systems, most commodity real-time and embedded systems are today implemented in the standard C programming language. Such systems typically target proprietary bare-metal platforms, standard POSIX-compliant platforms, or open-source operating systems. It is, however, error prone to develop large, reliable, and portable systems based on these APIs. In this paper, we present an extension to the C programming language, called Timed C, with a minimal set of language primitives, and show how a retargetable source-to-source compiler can be used to compile and execute simple, expressive, and portable programs. To evaluate our approach, we conduct a case study of a CubeSat satellite. We implement the core timing aspects in Timed C, and show portability by compiling the on-board software both to flight hardware and to low-cost experimental platforms.

  • 81.
    Niazi, Salman
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Scaling Distributed Hierarchical File Systems Using NewSQL Databases2018Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    For many years, researchers have investigated the use of database technology to manage file system metadata, with the goal of providing extensible typed metadata and support for fast, rich metadata search. However, earlier attempts failed mainly due to the reduced performance introduced by adding database operations to the file system’s critical path. Recent improvements in the performance of distributed in-memory online transaction processing databases (NewSQL databases) led us to re-investigate the possibility of using a database to manage file system metadata, but this time for a distributed, hierarchical file system, the Hadoop Distributed File System (HDFS). The single-host metadata service of HDFS is a well-known bottleneck for both the size of HDFS clusters and their throughput.

    In this thesis, we detail the algorithms, techniques, and optimizations used to develop HopsFS, an open-source, next-generation distribution of HDFS that replaces the main scalability bottleneck in HDFS, the single-node in-memory metadata service, with a no-shared-state distributed system built on a NewSQL database. In particular, we discuss how we exploit recent high-performance features of NewSQL databases, such as application-defined partitioning, partition-pruned index scans, and distribution-aware transactions, as well as more traditional techniques such as batching and write-ahead caches, to enable a revolution in distributed hierarchical file system performance.

    HDFS’ design is optimized for the storage of large files, that is, files ranging from megabytes to terabytes in size. However, in many production deployments of HDFS, it has been observed that almost 20% of the files in the system are less than 4 KB in size and as much as 42% of all the file system operations are performed on files less than 16 KB in size. HopsFS introduces a tiered storage solution to store files of different sizes more efficiently. The tiers range from the highest tier, where an in-memory NewSQL database stores very small files (<1 KB), to the next tier, where small files (<64 KB) are stored in solid-state drives (SSDs), also using a NewSQL database, to the largest tier, the existing Hadoop block storage layer for very large files. Our approach is based on extending HopsFS with an inode stuffing technique, where we embed the contents of small files with the metadata and use database transactions and database replication guarantees to ensure the availability, integrity, and consistency of the small files. HopsFS enables significantly larger cluster sizes, more than an order of magnitude higher throughput, and significantly lower client latencies for large clusters.

    Lastly, coordination is an integral part of distributed file system operation protocols. We present a novel leader election protocol for partially synchronous systems that uses NewSQL databases as shared memory. Our work enables HopsFS, which uses a NewSQL database, to avoid the operational overhead of managing an additional third-party service for leader election, while delivering performance comparable to a leader election implementation using a state-of-the-art distributed coordination service, ZooKeeper.
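    The size-based tier selection described in the thesis can be sketched as a simple dispatch on file size. The thresholds (<1 KB in-memory, <64 KB SSD) follow the text, while the function name and tier labels below are assumptions for illustration.

```python
# Sketch of HopsFS-style tiered storage selection by file size.
KB = 1024

def select_tier(file_size_bytes):
    if file_size_bytes < 1 * KB:
        return "in-memory-db"   # inode stuffing: data lives with metadata
    if file_size_bytes < 64 * KB:
        return "ssd-db"         # small files in SSD-backed NewSQL tables
    return "hdfs-blocks"        # large files stay on the block layer

print([select_tier(s) for s in (512, 4 * KB, 10 * KB * KB)])
```

    The point of the dispatch is that reads and writes of tiny files never touch the block layer at all, which is where the latency and throughput gains come from.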

  • 82.
    Niazi, Salman
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Ismail, Mahmoud
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Haridi, Seif
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Dowling, Jim
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases2019In: Encyclopedia of Big Data Technologies / [ed] Sherif Sakr, Albert Y. Zomaya, Springer, 2019, p. 16-32Chapter in book (Refereed)
    Abstract [en]

    Modern NewSQL database systems can be used to store fully normalized metadata for distributed hierarchical file systems, and provide high throughput and low operational latencies for the file system operations.

  • 83.
    Niazi, Salman
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Ronström, Mikael
    Haridi, Seif
    KTH.
    Dowling, Jim
    KTH.
    Size Matters: Improving the Performance of Small Files in Hadoop2018Conference paper (Refereed)
    Abstract [en]

    The Hadoop Distributed File System (HDFS) is designed to handle massive amounts of data, preferably stored in very large files. The poor performance of HDFS in managing small files has long been a bane of the Hadoop community. In many production deployments of HDFS, almost 25% of the files are less than 16 KB in size and as much as 42% of all the file system operations are performed on these small files. We have designed an adaptive tiered storage using in-memory and on-disk tables stored in a high-performance distributed database to efficiently store and improve the performance of small files in HDFS. Our solution is completely transparent, and it does not require any changes in the HDFS clients or the applications using the Hadoop platform. In experiments, we observed up to 61 times higher throughput in writing files, and for real-world workloads from Spotify our solution reduces the latency of reading and writing small files by a factor of 3.15 and 7.39, respectively.

  • 84.
    Olsson, Jakob
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Measuring the Technical and Process Benefits of Test Automation based on Machine Learning in an Embedded Device (2018). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Learning-based testing (LBT) is a testing paradigm that combines model-based testing with machine learning algorithms to automate the modeling of the system under test (SUT), test case generation, test case execution and verdict construction. A tool that implements LBT, called LBTest, has been developed at the CSC school at KTH.

    LBTest combines machine learning algorithms with off-the-shelf equivalence and model checkers, and models user requirements in propositional linear temporal logic.

    In this study, it is investigated whether LBT is suitable for testing a micro bus architecture within an embedded telecommunication device. Furthermore, ideas to further automate the testing process by designing a data model to automate user requirement generation are explored.

  • 85.
    Oz, Isil
    et al.
    Izmir Inst Technol, Comp Engn Dept, TR-35430 Gulbahce, Urla Izmir, Turkey..
    Bhatti, Muhammad Khurram
    Informat Technol Univ, Lahore 54000, Punjab, Pakistan..
    Popov, Konstantin
    SICS Swedish ICT AB, SE-16429 Stockholm, Sweden..
    Brorsson, Mats
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Regression-Based Prediction for Task-Based Program Performance (2019). In: Journal of Circuits, Systems and Computers, ISSN 0218-1266, Vol. 28, no 4, article id 1950060. Article in journal (Refereed)
    Abstract [en]

    As multicore systems evolve by increasing the number of parallel execution units, parallel programming models have been released to exploit parallelism in applications. The task-based programming model uses task abstractions to specify parallel tasks and schedules tasks onto processors at runtime. In order to increase efficiency and achieve the highest performance, it is necessary to identify which runtime configuration is needed and how processor cores should be shared among tasks. Exploring the design space for all possible scheduling and runtime options, especially for large input data, becomes infeasible and requires statistical modeling. Regression-based modeling determines the effects of multiple factors on a response variable, and makes predictions based on statistical analysis. In this work, we propose a regression-based modeling approach to predict task-based program performance for different scheduling parameters with variable data size. We execute a set of task-based programs by varying the runtime parameters, and conduct a systematic measurement of the factors influencing execution time. Our approach uses executions with different configurations for a set of input data, and derives regression models to predict execution time for larger input data. Our results show that the regression models provide accurate predictions for validation inputs, with a mean error rate as low as 6.3%, and 14% on average, among four task-based programs.
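    As a toy illustration of regression-based performance prediction (not the paper's actual model, features, or measurements), one can fit execution time against a derived feature such as work per core and extrapolate to a larger input; every number below is invented:

    ```python
    import numpy as np

    # Hypothetical measurements: (cores, input size in MB) -> execution time (s).
    X = np.array([[2, 100], [2, 200], [4, 100], [4, 200], [8, 100], [8, 200]],
                 dtype=float)
    y = np.array([10.0, 19.5, 5.2, 10.1, 2.8, 5.3])

    # Model execution time as t ~ b0 + b1 * (size / cores): work split per core.
    features = np.column_stack([np.ones(len(X)), X[:, 1] / X[:, 0]])
    beta, *_ = np.linalg.lstsq(features, y, rcond=None)

    def predict(cores, size_mb):
        """Predict execution time for an unseen runtime configuration."""
        return beta[0] + beta[1] * (size_mb / cores)

    # Extrapolate to an input larger than any in the training set.
    t = predict(8, 400)
    ```

    The paper's approach works in this spirit: fit on small-input executions across runtime configurations, then predict for larger inputs.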

  • 86.
    Palmkvist, Viktor
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Broman, David
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Creating domain-specific languages by composing syntactical constructs (2019). In: 21st International Symposium on Practical Aspects of Declarative Languages, PADL 2019, Springer, 2019, Vol. 11372, p. 187-203. Conference paper (Refereed)
    Abstract [en]

    Creating a programming language is a considerable undertaking, even for relatively small domain-specific languages (DSLs). Most approaches to ease this task either limit the flexibility of the DSL or consider entire languages as the unit of composition. This paper presents a new approach using syntactical constructs (also called syncons) for defining DSLs in much smaller units of composition while retaining flexibility. A syntactical construct defines a single language feature, such as an if statement or an anonymous function. Each syntactical construct is fully self-contained: it specifies its own concrete syntax, binding semantics, and runtime semantics, independently of the rest of the language. The runtime semantics are specified as a translation to a user-defined target language, while the binding semantics allow name resolution before expansion. Additionally, we present a novel approach for dealing with syntactical ambiguity that arises when combining languages, even if the languages are individually unambiguous. The work is implemented and evaluated in a case study, where small subsets of OCaml and Lua have been defined and composed using syntactical constructs.

  • 87.
    Peiro Sajjad, Hooman
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Methods and Algorithms for Data-Intensive Computing: Streams, Graphs, and Geo-Distribution (2019). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Struggling with the volume and velocity of Big Data has attracted much interest in the stream processing paradigm, a paradigm in the area of data-intensive computing that provides methods and solutions to process data in motion. Today's Big Data includes geo-distributed data sources. In addition, a major part of today's Big Data requires exploring complex and evolving relationships among data, which complicates any reasoning on the data. This thesis addresses challenges raised by geo-distributed streaming data and by data with complex and evolving relationships.

    Many organizations provide global-scale applications and services that are hosted on servers and data centers located in different parts of the world. Therefore, the data that needs to be processed is generated in different geographical locations. This thesis advocates distributed stream processing in geo-distributed settings to improve performance, including better response time and lower network cost, compared to centralized solutions. In this thesis, we conduct an experimental study of Apache Storm, a widely used open-source stream processing system, on a geo-distributed infrastructure made of near-the-edge resources. The resources that host the system's components are connected by heterogeneous network links. Our study exposes a set of issues and bottlenecks of deploying a stream processing system on a geo-distributed infrastructure. Inspired by these results, we propose a novel method for grouping geo-distributed resources into computing clusters, called micro data centers, in order to mitigate the effect of network heterogeneity for distributed stream processing applications. Next, we focus on the windowed aggregation of geo-distributed data streams, which has been widely used in stream analytics. We propose to reduce the bandwidth cost by coordinating windowed aggregations among near-the-edge data centers. We leverage intra-region links and design a novel low-overhead coordination algorithm that optimizes communication cost for data aggregation. Then, we propose a system, called SpanEdge, that provides an expressive programming model to unify the programming of stream processing applications on a geo-distributed infrastructure, together with a run-time system to manage (schedule and execute) stream processing applications across data centers. Our results show that SpanEdge can optimally deploy stream processing applications in a geo-distributed infrastructure, which significantly reduces bandwidth consumption and response latency.

    With respect to data with complex and evolving relationships, this thesis aims at effective and efficient processing of inter-connected data. There exist several domains, such as social network analysis, machine learning, and web search, in which data streams are modeled as linked entities of nodes and edges, namely a graph. Because of the inter-connection among the entities in graph data, processing of graph data is challenging. The inter-connection among the graph entities makes it difficult to distribute the graph among multiple machines to process the graph at scale. Furthermore, in a streaming setting, the graph structure and the graph elements can continuously change as the graph elements are streamed. Such a dynamic graph requires incremental computing methods that can avoid redundant computations on the whole graph. This thesis proposes incremental computing methods for streaming graph processing that can speed up processing while still obtaining high-quality results. In this thesis, we introduce HoVerCut, an efficient framework for boosting streaming graph partitioning algorithms. HoVerCut is Horizontally and Vertically scalable. Our evaluations show that HoVerCut speeds up the partitioning process significantly without degrading the quality of partitioning. Finally, we study unsupervised representation learning in dynamic graphs. Graph representation learning seeks to learn low-dimensional vector representations for the graph elements, i.e., edges and vertices, and for the whole graph. We propose novel and computationally efficient incremental algorithms. The computation complexity of our algorithms depends on the extent and rate of changes in a graph and on the graph density. The evaluation results show that our proposed algorithms can achieve results competitive with the state-of-the-art static methods while being computationally efficient.

  • 88.
    Peiro Sajjad, Hooman
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Docherty, Andrew
    Data61, CSIRO.
    Tyshetskiy, Yuriy
    Data61, CSIRO.
    Efficient Representation Learning Using Random Walks for Dynamic Graphs. Manuscript (preprint) (Other academic)
    Abstract [en]

    An important part of many machine learning workflows on graphs is vertex representation learning, i.e., learning a low-dimensional vector representation for each vertex in the graph. Recently, several powerful techniques for unsupervised representation learning have been demonstrated to give state-of-the-art performance in downstream tasks such as vertex classification and edge prediction. These techniques rely on random walks performed on the graph in order to capture its structural properties. These structural properties are then encoded in the vector representation space.

    However, most contemporary representation learning methods only apply to static graphs, while real-world graphs are often dynamic and change over time. Static representation learning methods are not able to update the vector representations when the graph changes; therefore, they must re-generate the vector representations on an updated static snapshot of the graph, regardless of the extent of the change in the graph. In this work, we propose computationally efficient algorithms for vertex representation learning that extend random walk based methods to dynamic graphs. The computation complexity of our algorithms depends upon the extent and rate of changes (the number of edges changed per update) and on the density of the graph. We empirically evaluate our algorithms on real-world datasets for the downstream machine learning tasks of multi-class and multi-label vertex classification. The results show that our algorithms can achieve results competitive with the state-of-the-art methods while being computationally efficient.
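    A minimal sketch of the random-walk machinery such methods build on, with a naive incremental twist: when an edge arrives, only walks starting at the changed endpoints are regenerated. The graph, walk length, and affected-vertex rule are illustrative simplifications, not the authors' algorithm:

    ```python
    import random

    # Toy undirected graph as adjacency lists (vertex ids are illustrative).
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

    def random_walk(graph, start, length, rng):
        """One fixed-length uniform random walk, of the kind typically fed
        to a skip-gram style embedding model."""
        walk = [start]
        for _ in range(length - 1):
            walk.append(rng.choice(graph[walk[-1]]))
        return walk

    rng = random.Random(42)
    walks = {v: random_walk(graph, v, 5, rng) for v in graph}

    # Edge (1, 3) arrives: update adjacency lists and regenerate walks only
    # for the affected endpoints -- a crude stand-in for the incremental idea.
    graph[1].append(3)
    graph[3].append(1)
    for v in (1, 3):
        walks[v] = random_walk(graph, v, 5, rng)
    ```

    The point is that the cost of the update scales with the number of changed edges rather than with the size of the whole graph.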

  • 89.
    Peiro Sajjad, Hooman
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Liu, Ying
    Stockholm University.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Optimizing Windowed Aggregation over Geo-Distributed Data Streams (2018). In: 2018 IEEE International Conference on Edge Computing (EDGE), IEEE Computer Society Digital Library, 2018, p. 33-41. Conference paper (Refereed)
    Abstract [en]

    Real-time data analytics is essential since more and more applications require online decision making in a timely manner. However, efficient analysis of geo-distributed data streams is challenging. This is because data needs to be collected from all edge data centers, which aggregate data from local sources, in order to process most of the analytic tasks. Thus, most of the time edge data centers need to transfer data to a central data center over a wide area network, which is expensive. In this paper, we advocate a coordinated approach among edge data centers in order to handle these analytic tasks efficiently and hence reduce the communication cost among data centers. We focus on the windowed aggregation of data streams, which has been widely used in stream analytics. In general, aggregation of data streams among edge data centers in the same region reduces the amount of data that needs to be sent over cross-region communication links. Building on state-of-the-art research, we leverage intra-region links and design a low-overhead coordination algorithm that optimizes communication cost for data aggregation. Our algorithm has been evaluated using synthetic and Big Data Benchmark datasets. The evaluation results show that our algorithm reduces the bandwidth cost by up to ~6x, as compared to the state-of-the-art solution.
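    The windowed-aggregation idea can be sketched as follows: each edge data center pre-aggregates its local stream per tumbling window, and a regional coordinator merges the partial sums, so only one merged value per window needs to cross the expensive inter-region link. Stream contents, window size, and data center names are illustrative:

    ```python
    from collections import defaultdict

    # Toy (timestamp, value) streams at two edge data centers in one region.
    dc_a = [(0, 3), (1, 5), (4, 2), (5, 7)]
    dc_b = [(1, 1), (2, 4), (6, 6)]

    WINDOW = 4  # tumbling window length in time units

    def local_windowed_sum(stream):
        """Pre-aggregate a stream into per-window partial sums at the edge."""
        sums = defaultdict(int)
        for ts, v in stream:
            sums[ts // WINDOW] += v
        return sums

    # Each edge DC sends one partial sum per window over cheap intra-region
    # links, and a regional aggregator merges them; only the merged results
    # need to cross the expensive inter-region link.
    partial_a = local_windowed_sum(dc_a)
    partial_b = local_windowed_sum(dc_b)

    merged = defaultdict(int)
    for partial in (partial_a, partial_b):
        for w, s in partial.items():
            merged[w] += s
    ```

    Here the region ships two merged window sums instead of seven raw tuples; with many sources per window, the savings grow accordingly.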

  • 90.
    Rodriguez-Cancio, Marcelino
    et al.
    University of Rennes 1.
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    White, Jules
    Vanderbildt University.
    Images of Code: Lossy Compression for Native Instructions (2018). Conference paper (Refereed)
    Abstract [en]

    Developers can use lossy compression on images and many other artifacts to reduce size and improve network transfer times. Native program instructions, however, are typically not considered candidates for lossy compression, since arbitrary losses in instructions may dramatically affect program output. In this paper we show that lossy compression of compiled native instructions is possible in certain circumstances. We demonstrate that the instruction sequence of a program can be lossily translated into a separate but equivalent program with instruction-wise differences, which still produces the same output. We contribute the novel insight that it is possible to exploit such instruction differences to design lossy compression schemes for native code. We support this idea with sound and unsound program transformations that improve the performance of compression techniques such as Run-Length Encoding (RLE), Huffman and LZ77. We also show that large areas of code can endure tampered instructions with no impact on the output, a result consistent with previous works from various communities.
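    To make the compression side concrete, here is plain run-length encoding over a byte sequence, together with a hypothetical output-preserving rewrite that lengthens runs. The byte values are illustrative, and the claim that the rewrite preserves program output is an assumption of the example, not verified machine code:

    ```python
    def rle_encode(data):
        """Run-length encode a byte sequence into (value, count) pairs."""
        runs = []
        for b in data:
            if runs and runs[-1][0] == b:
                runs[-1] = (b, runs[-1][1] + 1)
            else:
                runs.append((b, 1))
        return runs

    # Hypothetical code region: the bytes differ instruction-wise, but the
    # rewrite is assumed (for this example) to leave program output unchanged.
    original    = bytes([0x90, 0x40, 0x90, 0x48, 0x90])  # no runs: 5 RLE pairs
    transformed = bytes([0x90, 0x90, 0x90, 0x90, 0x90])  # one run: 1 RLE pair

    saved_pairs = len(rle_encode(original)) - len(rle_encode(transformed))
    ```

    The paper's contribution is finding program transformations for which such rewrites are safe (or acceptably unsound), so that standard compressors like RLE, Huffman, or LZ77 then do better.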

  • 91.
    Rodriguez-Cancio, Marcelino
    et al.
    Vanderbilt Univ, 221 Kirkland Hall, Nashville, TN 37235 USA..
    Combemale, Benoit
    Univ Toulouse, INRIA, Toulouse, France..
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Approximate Loop Unrolling (2019). In: CF '19 - Proceedings of the 16th ACM International Conference on Computing Frontiers, ACM, 2019, p. 94-105. Conference paper (Refereed)
    Abstract [en]

    We introduce Approximate Unrolling, a compiler loop optimization that reduces execution time and energy consumption, exploiting code regions that can endure some approximation and still produce acceptable results. Specifically, this work focuses on counted loops that map a function over the elements of an array. Approximate Unrolling transforms loops similarly to Loop Unrolling. However, unlike its exact counterpart, our optimization does not unroll loops by adding exact copies of the loop's body. Instead, it adds code that interpolates the results of previous iterations.
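    A sketch of the core idea in Python rather than at the compiler level: unroll a map-style loop by two, compute the function only on even iterations, and interpolate the odd ones from their computed neighbours. The function, spacing, and error bound are illustrative, not the paper's benchmarks:

    ```python
    import math

    def approx_unrolled_map(xs, f):
        """Loop unrolled by 2: apply f on even indices only, and interpolate
        odd indices from their neighbours instead of calling f again."""
        n = len(xs)
        ys = [0.0] * n
        for i in range(0, n, 2):          # exact iterations
            ys[i] = f(xs[i])
        for i in range(1, n, 2):          # approximated iterations
            ys[i] = (ys[i - 1] + ys[i + 1]) / 2.0 if i + 1 < n else ys[i - 1]
        return ys

    xs = [x * 0.1 for x in range(16)]
    exact = [math.sin(x) for x in xs]
    approx = approx_unrolled_map(xs, math.sin)
    max_err = max(abs(a - b) for a, b in zip(exact, approx))
    ```

    Half of the calls to `f` are replaced by a cheap average, trading a small, bounded error for time and energy when the mapped function is smooth.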

  • 92.
    Safinianaini, Negar
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Kaldo, Viktor
    Department of Psychology, Faculty of Health and Life Sciences, Linnaeus University, Växjö, Sweden;Centre for Psychiatry Research, Department of Clinical Neuroscience, Karolinska Institutet, and Stockholm Health Care Services, Stockholm County Council, Stockholm, Sweden.
    Gated Hidden Markov Models for Early Prediction of Outcome of Internet-Based Cognitive Behavioral Therapy (2019). In: 17th Conference on Artificial Intelligence in Medicine, AIME 2019, Cham: Springer Verlag, 2019, p. 160-169. Conference paper (Refereed)
    Abstract [en]

    Depression is a major threat to public health and its mitigation is considered to be of utmost importance. Internet-based Cognitive Behavioral Therapy (ICBT) is one of the employed treatments for depression. However, for the approach to be effective, it is crucial that the outcome of the treatment is accurately predicted as early as possible, to allow for its adaptation to the individual patient. Hidden Markov models (HMMs) have been commonly applied to characterize systematic changes in multivariate time series within health care. However, they have limited capabilities in capturing long-range interactions between emitted symbols. For the task of analyzing ICBT data, one such long-range interaction concerns the dependence of state transition on fractional change of emitted symbols. Gated Hidden Markov Models (GHMMs) are proposed as a solution to this problem. They extend standard HMMs by modifying the Expectation Maximization algorithm; for each observation sequence, the new algorithm regulates the transition probability update based on the fractional change, as specified by domain knowledge. GHMMs are compared to standard HMMs and a recently proposed approach, Inertial Hidden Markov Models, on the task of early prediction of ICBT outcome for treating depression; the algorithms are evaluated on outcome prediction up to 7 weeks before ICBT ends. GHMMs are shown to outperform both alternative models, with an improvement in AUC ranging from 12 to 23%. These promising results indicate that considering the fractional change of the observation sequence when updating state transition probabilities may indeed have a positive effect on early prediction of ICBT outcome.

  • 93.
    Simonsson, Jesper
    et al.
    KTH.
    Zhang, Long
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Morin, Brice
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Monperrus, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Observability and Chaos Engineering on System Calls for Containerized Applications in Docker. Manuscript (preprint) (Other academic)
    Abstract [en]

    In this paper, we present a novel fault injection system called ChaosOrca for system calls in containerized applications. ChaosOrca aims at evaluating a given application's self-protection capability with respect to system call errors. The unique feature of ChaosOrca is that it conducts experiments under production-like workload without instrumenting the application. We exhaustively analyze all kinds of system calls and utilize different levels of monitoring techniques to reason about the behaviour under perturbation. We evaluate ChaosOrca on three real-world applications: a file transfer client, a reverse proxy server and a micro-service oriented web application. Our results show that it is promising to detect weaknesses of resilience mechanisms related to system call issues.

  • 94.
    Soliman, Amira
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Graph-based Analytics for Decentralized Online Social Networks (2018). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Decentralized Online Social Networks (DOSNs) have been introduced as a privacy preserving alternative to existing online social networks. DOSNs remove the dependency on a centralized provider and operate as distributed information management platforms. Current efforts of providing DOSNs are mainly focused on designing the required building blocks for managing the distributed network and supporting the social services (e.g., search, content delivery). However, there is a lack of reliable techniques for enabling complex analytical services (e.g., spam detection, identity validation) that comply with the decentralization requirements of DOSNs. In particular, there is a need for decentralized data analytic techniques and machine learning (ML) algorithms that can successfully run on top of DOSNs.


    In this thesis, we empower decentralized analytics for DOSNs through a set of novel algorithms. Our algorithms allow decentralized analytics to work effectively on top of a fully decentralized topology, where the data is fully distributed and nodes have access to their local knowledge only. Furthermore, our algorithms and methods are able to extract and exploit the latent patterns in the social user interaction networks and effectively combine them with the shared content, yielding significant improvements for the complex analytic tasks. We argue that community identification is at the core of the learning and analytical services provided for DOSNs. We show in this thesis that knowledge of community structures and information dissemination patterns, embedded in the topology of social networks, has the potential to greatly enhance data analytic insights and improve results. At the heart of this thesis lies a community detection technique that successfully extracts communities in a completely decentralized manner. In particular, we show that multiple complex analytic tasks, like spam detection and identity validation, can be successfully tackled by harvesting the information from the social network structure. This is achieved by using a decentralized community detection algorithm which acts as the main building block for the community-aware learning paradigm that we lay out in this thesis. To the best of our knowledge, this thesis represents the first attempt to bring complex analytical services, which require decentralized iterative computation over distributed data, to the domain of DOSNs. The experimental evaluation of our proposed algorithms using real-world datasets confirms the ability of our solutions to generate efficient ML models in a massively parallel and highly scalable manner.

  • 95.
    Soliman, Amira
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Stad: Stateful Diffusion for Linear Time Community Detection. Manuscript (preprint) (Other academic)
    Abstract [en]

    Community detection is one of the preeminent topics in network analysis. Communities in real-world networks vary in their characteristics, such as their internal cohesion and size. Despite the large variety of methods proposed to detect communities so far, most existing approaches fall into the category of global approaches. Specifically, these global approaches adapt their detection model to approximate the global structure of the whole network, instead of performing approximation at the community level. Global techniques tune their parameters to a "one size fits all" model, so they are quite successful at extracting communities in homogeneous cases but suffer with heterogeneous community size distributions.

    In this paper, we present Stad, a stateful diffusion approach for community detection. Stad boosts diffusion with a conductance-based function that acts as a tuning parameter to control the diffusion speed. In contrast to existing diffusion mechanisms, which operate with a global and fixed speed, Stad introduces stateful diffusion to treat every community individually. In particular, Stad controls the diffusion speed at the node level, such that each node determines the diffusion speed associated with every possible community membership independently. Thus, Stad is able to extract communities more accurately in heterogeneous cases by dropping the "one size fits all" model. Furthermore, Stad employs a vertex-centric approach that is fully decentralized, highly scalable, and requires no global knowledge. As such, Stad can be successfully applied in distributed environments, such as large-scale graph processing or decentralized machine learning. Results on both real-world and synthetic datasets show that Stad outperforms the state-of-the-art techniques, not only on heterogeneous community sizes but also by achieving up to twice the accuracy of the state-of-the-art techniques.

  • 96.
    Soliman, Amira
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Rahimian, Fatemeh
    RISE SICS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Stad: Stateful Diffusion for Linear Time Community Detection (2018). In: 38th IEEE International Conference on Distributed Computing Systems, 2018. Conference paper (Refereed)
    Abstract [en]

    Community detection is one of the preeminent topics in network analysis. Communities in real-world networks vary in their characteristics, such as their internal cohesion and size. Despite the large variety of methods proposed to detect communities so far, most existing approaches fall into the category of global approaches. Specifically, these global approaches adapt their detection model to approximate the global structure of the whole network, instead of performing approximation at the community level. Global techniques tune their parameters to a “one size fits all” model, so they are quite successful at extracting communities in homogeneous cases but suffer with heterogeneous community size distributions. In this paper, we present Stad, a stateful diffusion approach for community detection. Stad boosts diffusion with a conductance-based function that acts as a tuning parameter to control the diffusion speed. In contrast to existing diffusion mechanisms, which operate with a global and fixed speed, Stad introduces stateful diffusion to treat every community individually. In particular, Stad controls the diffusion speed at the node level, such that each node determines the diffusion speed associated with every possible community membership independently. Thus, Stad is able to extract communities more accurately in heterogeneous cases by dropping the “one size fits all” model. Furthermore, Stad employs a vertex-centric approach that is fully decentralized, highly scalable, and requires no global knowledge. As such, Stad can be successfully applied in distributed environments, such as large-scale graph processing or decentralized machine learning. Results on both real-world and synthetic datasets show that Stad outperforms the state-of-the-art techniques, not only on heterogeneous community sizes but also by achieving up to twice the accuracy of the state-of-the-art techniques.

  • 97.
    Soto Valero, César
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Benelallam, Amine
    Univ Rennes, Inria, CNRS, IRISA, Rennes, France.
    Harrand, Nicolas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Barais, Olivier
    Univ Rennes, Inria, CNRS, IRISA, Rennes, France.
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    The Emergence of Software Diversity in Maven Central (2019). In: 16th International Conference on Mining Software Repositories, Montréal, QC, Canada: IEEE conference proceedings, 2019, p. 333-343. Conference paper (Refereed)
    Abstract [en]

    Maven artifacts are immutable: an artifact that is uploaded on Maven Central cannot be removed nor modified. The only way for developers to upgrade their library is to release a new version. Consequently, Maven Central accumulates all the versions of all the libraries that are published there, and applications that declare a dependency on a library can pick any version. In this work, we hypothesize that the immutability of Maven artifacts and the ability to choose any version naturally support the emergence of software diversity within Maven Central. We analyze 1,487,956 artifacts that represent all the versions of 73,653 libraries. We observe that more than 30% of libraries have multiple versions that are actively used by the latest artifacts. In the case of popular libraries, more than 50% of their versions are used. We also observe that more than 17% of libraries have several versions that are significantly more used than the other versions. Our results indicate that the immutability of artifacts in Maven Central does support a sustained level of diversity among versions of libraries in the repository.

  • 98.
    Soto-Valero, César
    et al.
    Universidad Central de Las Villas.
    Bourcier, Johann
    University of Rennes.
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Detection and Analysis of Behavioral T-patterns in Debugging Activities (2018). Conference paper (Refereed)
  • 99.
    Sozinov, Konstantin
    et al.
    KTH.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Human Activity Recognition Using Federated Learning (2018). In: 2018 IEEE Int Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications / [ed] Chen, JJ Yang, LT, IEEE Computer Soc, 2018, p. 1103-1111. Conference paper (Refereed)
    Abstract [en]

    State-of-the-art deep learning models for human activity recognition use large amounts of sensor data to achieve high accuracy. However, training such models in a data center using data collected from smart devices leads to high communication costs and possible privacy infringement. In order to mitigate the aforementioned issues, federated learning can be employed to train a generic classifier by combining multiple local models trained on data originating from multiple clients. In this work we evaluate federated learning for training a human activity recognition classifier and compare its performance to centralized learning by building two models, namely a deep neural network and a softmax regression, trained on both synthetic and real-world datasets. We study communication costs as well as the influence of erroneous clients with corrupted data in a federated learning setting. We have found that federated learning for the task of human activity recognition is capable of producing models with slightly worse, but acceptable, accuracy compared to centralized models. In our experiments federated learning achieved an accuracy of up to 89% compared to 93% in centralized training for the deep neural network. The global model trained with federated learning on skewed datasets achieves accuracy comparable to centralized learning. Furthermore, we identified an important issue of clients with corrupted data and proposed a federated learning algorithm that identifies and rejects erroneous clients. Lastly, we have identified a trade-off between communication cost and the complexity of a model. We show that more complex models, such as deep neural networks, require more communication in federated learning settings for human activity recognition compared to less complex models, such as multinomial logistic regression.
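    A generic federated-averaging sketch in the spirit of the setup described above: clients train logistic-regression models locally, and a server averages the updates weighted by client dataset size. The data, model, and hyperparameters are invented for illustration and are not the paper's configuration:

    ```python
    import numpy as np

    def local_update(weights, X, y, lr=0.1, steps=20):
        """One client's local training: a few gradient descent steps of
        logistic regression starting from the current global weights."""
        w = weights.copy()
        for _ in range(steps):
            p = 1.0 / (1.0 + np.exp(-X @ w))        # predicted probabilities
            w -= lr * X.T @ (p - y) / len(y)         # gradient of log-loss
        return w

    def federated_round(weights, clients):
        """FedAvg-style round: local training, then server-side averaging
        weighted by each client's dataset size."""
        updates = [local_update(weights, X, y) for X, y in clients]
        sizes = np.array([len(y) for _, y in clients], dtype=float)
        return np.average(updates, axis=0, weights=sizes)

    rng = np.random.default_rng(0)

    def make_client(n):
        """Synthetic client data sharing one underlying labeling rule."""
        X = rng.normal(size=(n, 2))
        y = (X[:, 0] + X[:, 1] > 0).astype(float)
        return X, y

    clients = [make_client(50), make_client(80)]
    w = np.zeros(2)
    for _ in range(10):
        w = federated_round(w, clients)
    ```

    Raw sensor data never leaves the clients; only model weights are exchanged, which is the privacy and communication argument the abstract makes.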

  • 100. Thomas, Denez
    et al.
    Harrand, Nicolas
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Bossis, Bruno
    Code{strata} sonifying software complexity (2018). In: TEI 2018 - Proceedings of the 12th International Conference on Tangible, Embedded, and Embodied Interaction, Association for Computing Machinery (ACM), 2018, p. 617-621. Conference paper (Refereed)
    Abstract [en]

    Code{strata} is an interdisciplinary collaboration between art studies researchers (Rennes 2) and computer scientists (INRIA, KTH). It is a sound installation: a computer system unit made of concrete that sits on a wooden desk. The purpose of this project is to question the opacity and simplicity of high-level interfaces used in daily gestures. It takes the form of a 3-D sonification of a full software trace that is collected when performing a copy and paste command in a simple text editor. The user may hear, through headphones, a poetic interpretation of what happens in a computer, behind the graphical interfaces. The sentence "Copy and paste" is played back in as many pieces as there are nested functions called during the execution of the command.
