kth.se Publications
  • 1.
    Abbas, Zainab
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Scalable Streaming Graph and Time Series Analysis Using Partitioning and Machine Learning. 2021. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    Recent years have witnessed a massive increase in the amount of data generated by the Internet of Things (IoT) and social media. Processing huge amounts of this data poses non-trivial challenges in terms of the hardware and performance requirements of modern-day applications. The data we are dealing with today is of massive scale, high intensity and comes in various forms. MapReduce was a popular and clever choice for handling big data using a distributed programming model, which made the processing of huge volumes of data possible using clusters of commodity machines. However, MapReduce was not a good fit for performing complex tasks, such as graph processing, iterative programs and machine learning. Modern data processing frameworks, which are widely used to process complex data and perform complex analysis tasks, overcome the shortcomings of MapReduce. Some of these popular frameworks include Apache Spark for batch and stream processing, Apache Flink for stream processing and TensorFlow for machine learning.

    In this thesis, we deal with complex analytics on data modeled as time series, graphs and streams. Time series are commonly used to represent temporal data generated by IoT sensors. Analysing and forecasting time series, i.e. extracting useful characteristics and statistics of data and predicting data, is useful for many fields, including neurophysiology, economics, environmental studies and transportation. Another useful data representation we work with is graphs. Graphs are complex data structures used to represent relational data in the form of vertices and edges. Graphs are present in various application domains, such as recommendation systems, road traffic analytics, web analysis and social media analysis. Due to the increasing size of graph data, a single machine is often not sufficient to process the complete graph. Therefore, the computation, as well as the data, must be distributed. Graph partitioning, the process of dividing graphs into subgraphs, is an essential step in distributed graph processing of large scale graphs because it enables parallel and distributed processing.

    The majority of data generated from IoT and social media originates as a continuous stream, such as series of events from a social media network, time series generated from sensors, financial transactions, etc. The stream processing paradigm refers to the processing of streaming data that is continuous and possibly unbounded. Combining both graphs and streams leads to an interesting and rather challenging domain of streaming graph analytics. Graph streams refer to data that is modelled as a stream of edges or vertices with adjacency lists representing relations between entities of continuously evolving data generated by a single or multiple data sources. Streaming graph analytics is an emerging research field with great potential due to its capabilities of processing large graph streams with limited amounts of memory and low latency.

    In this dissertation, we present graph partitioning techniques for scalable streaming graph and time series analysis. First, we present and evaluate the use of data partitioning to enable data parallelism in order to address the challenge of scale in large spatial time series forecasting. We propose a graph partitioning technique for large scale spatial time series forecasting of road traffic as a use-case. Our experimental results on traffic density prediction for a real-world sensor dataset using Long Short-Term Memory Neural Networks show that the partitioning-based models take 12x lower training time when run in parallel compared to the unpartitioned model of the entire road infrastructure. Furthermore, the partitioning-based models have 2x lower prediction error (RMSE) compared to the entire road model. Second, we showcase the practical usefulness of streaming graph analytics for large spatial time series analysis with the real-world task of traffic jam detection and reduction. We propose to apply streaming graph analytics by performing useful analytics on traffic data streams at scale with high throughput and low latency. Third, we study, evaluate, and compare the existing state-of-the-art streaming graph partitioning algorithms. We propose a uniform analysis framework built using Apache Flink to evaluate and compare partitioning features and characteristics of streaming graph partitioning methods. Finally, we present GCNSplit, a novel ML-driven streaming graph partitioning solution that uses a small and constant in-memory state (bounded state) to partition (possibly unbounded) graph streams. Our results demonstrate that GCNSplit provides high-throughput partitioning and can leverage data parallelism to sustain input rates of 100K edges/s. GCNSplit exhibits a partitioning quality, in terms of graph cuts and load balance, that matches that of the state-of-the-art HDRF (High Degree Replicated First) algorithm while storing three orders of magnitude smaller partitioning state.

  • 2.
    Abbas, Zainab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Al-Shishtawy, Ahmad
    RISE SICS, Stockholm, Sweden.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. RISE SICS, Stockholm, Sweden.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Short-Term Traffic Prediction Using Long Short-Term Memory Neural Networks. 2018. Conference paper (Refereed)
    Abstract [en]

    Short-term traffic prediction allows Intelligent Transport Systems to proactively respond to events before they happen. With the rapid increase in the amount, quality, and detail of traffic data, new techniques are required that can exploit the information in the data in order to provide better results while being able to scale and cope with increasing amounts of data and growing cities. We propose and compare three models for short-term road traffic density prediction based on Long Short-Term Memory (LSTM) neural networks. We have trained the models using real traffic data collected by the Motorway Control System in Stockholm, which monitors highways and collects flow and speed data per lane every minute from radar sensors. In order to deal with the challenge of scale and to improve prediction accuracy, we propose to partition the road network into road stretches and junctions, and to model each of the partitions with one or more LSTM neural networks. Our evaluation results show that partitioning of roads improves the prediction accuracy by reducing the root mean square error by a factor of 5. We show that we can reduce the complexity of the LSTM network by limiting the number of input sensors, on average to 35% of the original number, without compromising the prediction accuracy.

  • 3.
    Abbas, Zainab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Ivarsson, Jón Reginbald
    KTH.
    Al-Shishtawy, A.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Scaling Deep Learning Models for Large Spatial Time-Series Forecasting. 2019. In: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019, Institute of Electrical and Electronics Engineers Inc., 2019, p. 1587-1594. Conference paper (Refereed)
    Abstract [en]

    Neural networks are used for different machine learning tasks, such as spatial time-series forecasting. Accurate modelling of a large and complex system requires large datasets to train a deep neural network, which poses a challenge of scale because training the network and serving the model are computationally and memory intensive. One example of a complex system that produces a large number of spatial time-series is a large road sensor infrastructure deployed for traffic monitoring. The goal of this work is twofold: 1) To model a large number of spatial time-series from road sensors; 2) To address the scalability problem in a real-life task of large-scale road traffic prediction, which is an important part of an Intelligent Transportation System. We propose a partitioning technique to tackle the scalability problem that enables parallelism in both training and prediction: 1) We represent the sensor system as a directed weighted graph based on the road structure, which reflects dependencies between sensor readings, and weighted by sensor readings and inter-sensor distances; 2) We propose an algorithm to automatically partition the graph taking into account dependencies between spatial time-series from sensors; 3) We use the generated sensor graph partitions to train a prediction model per partition. Our experimental results on traffic density prediction using Long Short-Term Memory (LSTM) Neural Networks show that the partitioning-based models take 2x, if run sequentially, and 12x, if run in parallel, less training time, and 20x less prediction time compared to the unpartitioned model of the entire road infrastructure. The partitioning-based models take 100x less total sequential training time compared to single sensor models, i.e., one model per sensor. Furthermore, the partitioning-based models have 2x less prediction error (RMSE) compared to both the single sensor models and the entire road model.

  • 4.
    Abbas, Zainab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Kalavri, Vasiliki
    Systems Group, ETH, Zurich, Switzerland.
    Carbone, Paris
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Streaming Graph Partitioning: An Experimental Study. 2018. In: Proceedings of the VLDB Endowment, E-ISSN 2150-8097, Vol. 11, no 11, p. 1590-1603. Article in journal (Refereed)
    Abstract [en]

    Graph partitioning is an essential yet challenging task for massive graph analysis in distributed computing. Common graph partitioning methods scan the complete graph to obtain structural characteristics offline, before partitioning. However, the emerging need for low-latency, continuous graph analysis led to the development of online partitioning methods. Online methods ingest edges or vertices as a stream, making partitioning decisions on the fly based on partial knowledge of the graph. Prior studies have compared offline graph partitioning techniques across different systems. Yet, little effort has been put into investigating the characteristics of online graph partitioning strategies.

    In this work, we describe and categorize online graph partitioning techniques based on their assumptions, objectives and costs. Furthermore, we employ an experimental comparison across different applications and datasets, using a unified distributed runtime based on Apache Flink. Our experimental results showcase that model-dependent online partitioning techniques such as low-cut algorithms offer better performance for communication-intensive applications such as bulk synchronous iterative algorithms, albeit at higher partitioning costs. In contrast, model-agnostic techniques trade off data locality for lower partitioning costs and balanced workloads, which is beneficial when executing data-parallel single-pass graph algorithms.
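
    The trade-off described in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the toy edge stream and the simple greedy rule are assumptions made for illustration, and the greedy rule is a simplified stateful heuristic rather than any specific published algorithm. A stateless hash partitioner stands in for a model-agnostic method, and a greedy vertex-cut rule that keeps per-vertex state stands in for a model-dependent one.

```python
from collections import defaultdict


def hash_partition(edges, k):
    """Stateless assignment: hash the endpoints of each edge (illustrative hash)."""
    return [(u * 31 + v) % k for u, v in edges]


def greedy_partition(edges, k):
    """Stateful greedy vertex-cut (simplified): prefer a partition that already
    holds an endpoint of the edge, breaking ties by the current load."""
    replicas = defaultdict(set)  # vertex -> partitions that hold a copy of it
    load = [0] * k
    assignment = []
    for u, v in edges:
        common = replicas[u] & replicas[v]
        either = replicas[u] | replicas[v]
        candidates = common or either or set(range(k))
        p = min(candidates, key=lambda i: (load[i], i))
        replicas[u].add(p)
        replicas[v].add(p)
        load[p] += 1
        assignment.append(p)
    return assignment


def replication_factor(edges, assignment):
    """Average number of partitions in which a vertex is replicated."""
    replicas = defaultdict(set)
    for (u, v), p in zip(edges, assignment):
        replicas[u].add(p)
        replicas[v].add(p)
    return sum(len(parts) for parts in replicas.values()) / len(replicas)


def load_imbalance(assignment, k):
    """Largest partition size divided by the ideal (perfectly even) size."""
    load = [assignment.count(i) for i in range(k)]
    return max(load) / (len(assignment) / k)


# Toy stream: two disjoint triangles (hypothetical data).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
greedy = greedy_partition(edges, 2)
hashed = hash_partition(edges, 2)
print(replication_factor(edges, greedy), load_imbalance(greedy, 2))
print(replication_factor(edges, hashed), load_imbalance(hashed, 2))
```

    On this toy stream the greedy rule keeps each triangle in one partition, while the hash rule replicates vertices across partitions. The greedy rule needs state that grows with the number of vertices, and on a single connected stream it can place every edge in one partition, which is one reason practical algorithms add an explicit balance term.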

  • 5.
    Abbas, Zainab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Sigurdsson, Thorsteinn Thorri
    KTH.
    Al-Shishtawy, Ahmad
    RISE Res Inst Sweden, Stockholm, Sweden.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Evaluation of the Use of Streaming Graph Processing Algorithms for Road Congestion Detection. 2018. In: 2018 IEEE INT CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, UBIQUITOUS COMPUTING & COMMUNICATIONS, BIG DATA & CLOUD COMPUTING, SOCIAL COMPUTING & NETWORKING, SUSTAINABLE COMPUTING & COMMUNICATIONS / [ed] Chen, JJ Yang, LT, IEEE COMPUTER SOC, 2018, p. 1017-1025. Conference paper (Refereed)
    Abstract [en]

    Real-time road congestion detection allows improving traffic safety and route planning. In this work, we propose to use streaming graph processing algorithms for road congestion detection and evaluate their accuracy and performance. We represent road infrastructure sensors in the form of a directed weighted graph and adapt the Connected Components algorithm and some existing graph processing algorithms, originally used for community detection in social network graphs, for the task of road congestion detection. In our approach, we detect Connected Components or communities of sensors with similarly weighted edges that reflect different states in the traffic, e.g., free flow or congested state, in regions covered by detected sensor groups. We have adapted and implemented the Connected Components and community detection algorithms for detecting groups in the weighted sensor graphs in both batch and streaming modes. We evaluate our approach by building and processing the road infrastructure sensor graph for Stockholm's highways using real-world data from the Motorway Control System operated by the Swedish traffic authority. Our results indicate that the Connected Components and DenGraph community detection algorithms can detect congestion with accuracy of up to approximately 94% for Connected Components and up to approximately 88% for DenGraph. The Louvain Modularity algorithm for community detection fails to detect congestion regions for sparsely connected graphs, representing roads that we have considered in this study. The Hierarchical Clustering algorithm using speed and density readings is able to detect congestion without details, such as shockwaves.
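
    As a rough illustration of grouping sensors by Connected Components over a stream of weighted edges, the sketch below uses a union-find structure. It is an assumption-laden simplification and not the authors' implementation: the edge weight is assumed to be an average speed between two neighbouring sensors, and a single fixed threshold (a hypothetical parameter) stands in for the "similarly weighted edges" criterion from the abstract.

```python
def congested_components(edge_stream, speed_threshold):
    """Group sensors linked by slow edges, processing edges one at a time.

    edge_stream yields (sensor_a, sensor_b, speed) tuples; only edges whose
    speed is below speed_threshold connect sensors into a congested group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, speed in edge_stream:
        if speed < speed_threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two groups

    groups = {}
    for sensor in list(parent):
        groups.setdefault(find(sensor), set()).add(sensor)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical readings (km/h) along one road stretch.
stream = [
    ("s1", "s2", 25),
    ("s2", "s3", 30),
    ("s3", "s4", 95),
    ("s4", "s5", 90),
    ("s5", "s6", 20),
]
print(congested_components(stream, speed_threshold=40))
```

    Because union-find only ever merges groups, this sketch suits an append-only stream; handling edges whose weight later rises above the threshold would need a different structure or periodic recomputation.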

  • 6.
    Abbas, Zainab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Sottovia, Paolo
    Huawei Munich Research Centre, Munich, Germany.
    Hassan, Mohamad Al Hajj
    Huawei Munich Research Centre, Munich, Germany.
    Foroni, Daniele
    Huawei Munich Research Centre, Munich, Germany.
    Bortoli, Stefano
    Huawei Munich Research Centre, Munich, Germany.
    Real-time Traffic Jam Detection and Congestion Reduction Using Streaming Graph Analytics. 2020. In: 2020 IEEE International Conference on Big Data (Big Data), Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 3109-3118. Conference paper (Refereed)
    Abstract [en]

    Traffic congestion is a problem in day-to-day life, especially in big cities. Various traffic control infrastructure systems have been deployed to monitor and improve the flow of traffic across cities. Real-time congestion detection can serve many useful purposes, including sending warnings to drivers approaching the congested area and daily route planning. Most of the existing congestion detection solutions combine historical data with continuous sensor readings and rely on data collected from multiple sensors deployed on the road, measuring the speed of vehicles. In contrast, we present a framework that works in a pure streaming setting where historical data is not available before processing. The traffic data streams, possibly unbounded, arrive in real-time. Moreover, the data used in our case is collected only from sensors placed on the intersections of the road. Therefore, we investigate creating a real-time congestion detection and reduction solution that works on traffic streams without any prior knowledge. The goal of our work is 1) to detect traffic jams in real-time, and 2) to reduce the congestion in the traffic jam areas. In this work, we present a real-time traffic jam detection and congestion reduction framework: 1) We propose a directed weighted graph representation of the traffic infrastructure network for capturing dependencies between sensor data to measure traffic congestion; 2) We present online traffic jam detection and congestion reduction techniques built on a modern stream processing system, i.e., Apache Flink; 3) We develop dynamic traffic light policies for controlling traffic in congested areas to reduce the travel time of vehicles. Our experimental results indicate that we are able to detect traffic jams in real-time and deploy new traffic light policies which result in 27% less travel time at best and 8% less travel time on average compared to the travel time with default traffic light policies. Our scalability results show that our system is able to handle high-intensity streaming data with high throughput and low latency.

  • 7.
    Abdalmoaty, Mohamed
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control).
    Eriksson, Oscar
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Bereza-Jarocinski, Robert
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control).
    Broman, David
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Hjalmarsson, Håkan
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control).
    Identification of Non-Linear Differential-Algebraic Equation Models with Process Disturbances. 2021. In: Proceedings The 60th IEEE conference on Decision and Control (CDC), Institute of Electrical and Electronics Engineers (IEEE), 2021. Conference paper (Refereed)
    Abstract [en]

    Differential-algebraic equations (DAEs) arise naturally as a result of equation-based object-oriented modeling. In many cases, these models contain unknown parameters that have to be estimated using experimental data. However, often the system is subject to unknown disturbances which, if not taken into account in the estimation, can severely affect the model's accuracy. For non-linear state-space models, particle filter methods have been developed to tackle this issue. Unfortunately, applying such methods to non-linear DAEs requires a transformation into a state-space form, which is particularly difficult to obtain for models with process disturbances. In this paper, we propose a simulation-based prediction error method that can be used for non-linear DAEs where disturbances are modeled as continuous-time stochastic processes. To the authors' best knowledge, there are no general methods successfully dealing with parameter estimation for this type of model. One of the challenges in particle filtering methods is the random variation in the minimized cost function due to the nature of the algorithm. In our approach, a similar phenomenon occurs and we explicitly consider how to sample the underlying continuous process to mitigate this problem. The method is illustrated numerically on a pendulum example. The results suggest that the method is able to deliver consistent estimates.

  • 8.
    Akhavan Rahnama, Amir Hossein
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    The Blame Problem in Evaluating Local Explanations and How to Tackle It. 2024. In: Artificial Intelligence. ECAI 2023 International Workshops - XAI^3, TACTIFUL, XI-ML, SEDAMI, RAAIT, AI4S, HYDRA, AI4AI, 2023, Proceedings, Springer Nature, 2024, p. 66-86. Conference paper (Refereed)
    Abstract [en]

    The number of local model-agnostic explanation techniques proposed has grown rapidly recently. One main reason is that the bar for developing new explainability techniques is low due to the lack of optimal evaluation measures. Without rigorous measures, it is hard to have concrete evidence of whether the new explanation techniques can significantly outperform their predecessors. Our study proposes a new taxonomy for evaluating local explanations: robustness, evaluation using ground truth from synthetic datasets and interpretable models, model randomization, and human-grounded evaluation. Using this proposed taxonomy, we highlight that all categories of evaluation methods, except those based on the ground truth from interpretable models, suffer from a problem we call the “blame problem.” In our study, we argue that this category of evaluation measure is a more reasonable method for evaluating local model-agnostic explanations. However, we show that even this category of evaluation measures has further limitations. The evaluation of local explanations remains an open research problem.

  • 9.
    Aler, Ricardo
    et al.
    Univ Carlos III Madrid, Avda Univ 30, Leganes 28911, Spain.
    Valls, Jose M.
    Univ Carlos III Madrid, Avda Univ 30, Leganes 28911, Spain.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Study of Hellinger Distance as a splitting metric for Random Forests in balanced and imbalanced classification datasets. 2020. In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 149, article id 113264. Article in journal (Refereed)
    Abstract [en]

    Hellinger Distance (HD) is a splitting metric that has been shown to have an excellent performance for imbalanced classification problems for methods based on Bagging of trees, while also showing good performance for balanced problems. Given that Random Forests (RF) use Bagging as one of two fundamental techniques to create diversity in the ensemble, it could be expected that HD is also effective for this ensemble method. The main aim of this article is to carry out an extensive investigation of important aspects of the use of HD in RF, including handling of multi-class problems, hyper-parameter optimization, metrics comparison, probability estimation, and metrics combination. In particular, HD is compared to other commonly used splitting metrics (Gini and Gain Ratio) in several contexts: balanced/imbalanced and two-class/multi-class. Two aspects related to classification problems are assessed: classification itself and probability estimation. HD is defined for two-class problems, but there are several ways in which it can be extended to deal with multi-class problems, and this article studies the performance of the available options. Finally, even though HD can be used as an alternative to other splitting metrics, there is no reason to limit RF to use just one of them. Therefore, the final study of this article is to determine whether selecting the splitting metric using cross-validation on the training data can improve results further. Results show HD to be a robust measure for RF, with some weakness for balanced multi-class datasets (especially for probability estimation). Combination of metrics is able to result in a more robust performance. However, experiments of HD with text datasets show Gini to be more suitable than HD for this kind of problem.
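
    For readers unfamiliar with the metric, the following is a minimal sketch of the two-class Hellinger distance of a binary split, in the form commonly used for Hellinger distance decision trees. The function name and the count-based interface are assumptions made for illustration and are not taken from the article.

```python
import math


def hellinger_split(left_pos, left_neg, right_pos, right_neg):
    """Two-class Hellinger distance of a binary split.

    Each branch contributes (sqrt(share of all positives in the branch)
    - sqrt(share of all negatives in the branch)) squared; the result is
    the square root of the sum over both branches.
    """
    pos = left_pos + right_pos
    neg = left_neg + right_neg
    if pos == 0 or neg == 0:
        return 0.0  # only one class present, so no split is informative
    total = 0.0
    for p, n in ((left_pos, left_neg), (right_pos, right_neg)):
        total += (math.sqrt(p / pos) - math.sqrt(n / neg)) ** 2
    return math.sqrt(total)


# A perfect split scores sqrt(2) regardless of class imbalance,
# while a split that mirrors the class ratio in both branches scores 0.
print(hellinger_split(10, 0, 0, 90))
print(hellinger_split(5, 45, 5, 45))
```

    Because the score depends only on within-class proportions, scaling the size of one class leaves it unchanged, which is the property usually cited for its suitability on imbalanced data.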

  • 10.
    Alferez, Mauricio
    et al.
    Univ Luxembourg, Interdisciplinary Ctr Secur Reliabil & Trust SnT, 2 Ave JF Kennedy, L-1855 Luxembourg, Luxembourg.
    Acher, Mathieu
    Univ Rennes, DiverSE Team Inria Rennes, IRISA, CNRS, Rennes, France.
    Galindo, Jose A.
    Univ Seville, Dept Comp Languages & Syst, Seville, Spain.
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Benavides, David
    Univ Seville, Dept Comp Languages & Syst, Seville, Spain.
    Modeling variability in the video domain: language and experience report. 2019. In: Software quality journal, ISSN 0963-9314, E-ISSN 1573-1367, Vol. 27, no 1, p. 307-347. Article in journal (Refereed)
    Abstract [en]

    In an industrial project, we addressed the challenge of developing a software-based video generator such that consumers and providers of video processing algorithms can benchmark them on a wide range of video variants. This article aims to report on our positive experience in modeling, controlling, and implementing software variability in the video domain. We describe how we have designed and developed a variability modeling language, called VM, resulting from close collaboration with industrial partners over 2 years. We present the specific requirements and the advanced variability constructs we developed and used to characterize and derive variations of video sequences. The results of our experiments and industrial experience show that our solution is effective for modeling complex variability information and supports the synthesis of hundreds of realistic video variants. From the software language perspective, we learned that basic variability mechanisms are useful but not enough; attributes and multi-features are of primary importance; meta-information and specific constructs are relevant for scalable and purposeful reasoning over variability models. From the video domain and software perspective, we report on the practical benefits of a variability approach. With more automation and control, practitioners can now envision benchmarking video algorithms over large, diverse, controlled, yet realistic datasets (videos that mimic real recorded videos), something impossible at the beginning of the project.

  • 11.
    Alkathiri, Abdul Aziz
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Giaretta, Lodovico
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Sahlgren, Magnus
    Decentralized Word2Vec Using Gossip Learning. 2021. In: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021), 2021. Conference paper (Refereed)
    Abstract [en]

    Advanced NLP models require huge amounts of data from various domains to produce high-quality representations. It is useful then for a few large public and private organizations to join their corpora during training. However, factors such as legislation and user emphasis on data privacy may prevent centralized orchestration and data sharing among these organizations. Therefore, for this specific scenario, we investigate how gossip learning, a massively-parallel, data-private, decentralized protocol, compares to a shared-dataset solution. We find that the application of Word2Vec in a gossip learning framework is viable. Without any tuning, the results are comparable to a traditional centralized setting, with a reduction in ground-truth similarity scores as low as 4.3%. Furthermore, the results are up to 54.8% better than independent local training.
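
    The decentralized exchange pattern behind gossip protocols can be sketched as pairwise averaging of parameter vectors. This is a deliberately reduced illustration and not the protocol evaluated in the paper: in gossip learning as usually described, nodes also train on their private data between exchanges, which is omitted here. The function name, the toy one-dimensional "models" and the peer selection rule are assumptions.

```python
import random


def gossip_average(models, rounds, seed=0):
    """Repeatedly pick two random nodes and replace both of their parameter
    vectors with the element-wise mean. No node ever sees raw training data,
    only parameters, and no central coordinator is involved."""
    rng = random.Random(seed)
    models = [list(m) for m in models]  # copy so the input is left untouched
    for _ in range(rounds):
        i, j = rng.sample(range(len(models)), 2)
        merged = [(a + b) / 2 for a, b in zip(models[i], models[j])]
        models[i] = merged
        models[j] = list(merged)
    return models


# Three nodes that start with different (hypothetical) parameters.
start = [[0.0], [4.0], [8.0]]
result = gossip_average(start, rounds=200)
print(result)
```

    Each exchange preserves the sum of the parameters across nodes, so the nodes drift towards the global mean of their starting values without any node acting as a central server.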

  • 12.
    Alkhatib, Amr
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Ennadir, Sofiane
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Johansson, Ulf
    Dept. of Computing, Jönköping University, Sweden.
    Approximating Score-based Explanation Techniques Using Conformal Regression. 2023. In: Proceedings of the 12th Symposium on Conformal and Probabilistic Prediction with Applications, COPA 2023, ML Research Press, 2023, p. 450-469. Conference paper (Refereed)
    Abstract [en]

    Score-based explainable machine-learning techniques are often used to understand the logic behind black-box models. However, such explanation techniques are often computationally expensive, which limits their application in time-critical contexts. Therefore, we propose and investigate the use of computationally less costly regression models for approximating the output of score-based explanation techniques, such as SHAP. Moreover, validity guarantees for the approximated values are provided by the employed inductive conformal prediction framework. We propose several non-conformity measures designed to take the difficulty of approximating the explanations into account while keeping the computational cost low. We present results from a large-scale empirical investigation, in which the approximate explanations generated by our proposed models are evaluated with respect to efficiency (interval size). The results indicate that the proposed method can significantly improve execution time compared to the fast version of SHAP, TreeSHAP. The results also suggest that the proposed method can produce tight intervals, while providing validity guarantees. Moreover, the proposed approach allows for comparing explanations of different approximation methods and selecting a method based on how informative (tight) the predicted intervals are.
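
    The inductive conformal step mentioned in this abstract can be sketched in a few lines. This is a generic illustration under stated assumptions, not the paper's method: it uses the plain absolute residual as the non-conformity measure, whereas the paper proposes measures that account for approximation difficulty. The function name and the calibration values are hypothetical; the residuals are assumed to be differences between true explanation scores and a regression model's approximations on a held-out calibration set.

```python
import math


def conformal_interval(calibration_residuals, prediction, confidence=0.9):
    """Inductive conformal regression with absolute-residual scores.

    The half-width is the ceil((n + 1) * confidence)-th smallest absolute
    residual on the calibration set; if that rank exceeds n, the calibration
    set is too small for the requested confidence and the interval is infinite.
    """
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * confidence)
    if rank > n:
        return (-math.inf, math.inf)
    half_width = scores[rank - 1]
    return (prediction - half_width, prediction + half_width)


# Hypothetical calibration residuals for one approximated feature score.
residuals = [0.5, -1.0, 2.0, -0.25, 1.5, -3.0, 0.75]
print(conformal_interval(residuals, prediction=10.0, confidence=0.75))
```

    Under the usual exchangeability assumption, intervals built this way cover the true value with at least the requested probability, and narrower intervals indicate a more informative approximation.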

  • 13.
    Alkhatib, Amr
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Johansson, Ulf
    Dept. of Computing, Jönköping University, Sweden.
    Assessing Explanation Quality by Venn Prediction. 2022. In: Proceedings of the 11th Symposium on Conformal and Probabilistic Prediction with Applications, COPA 2022, ML Research Press, 2022, p. 42-54. Conference paper (Refereed)
    Abstract [en]

    Rules output by explainable machine learning techniques naturally come with a degree of uncertainty, as the complex functionality of the underlying black-box model often can be difficult to approximate by a single, interpretable rule. However, the uncertainty of these approximations is not properly quantified by current explanatory techniques. The use of Venn prediction is here proposed and investigated as a means to quantify the uncertainty of the explanations and thereby also allow for competing explanation techniques to be evaluated with respect to their relative uncertainty. A number of metrics of rule explanation quality based on uncertainty are proposed and discussed, including metrics that capture the tendency of the explanations to predict the correct outcome of a black-box model on new instances, how informative (tight) the produced intervals are, and how certain a rule is when predicting one class. An empirical investigation is presented, in which explanations produced by the state-of-the-art technique Anchors are compared to explanatory rules obtained from association rule mining. The results suggest that the association rule mining approach may provide explanations with less uncertainty towards the correct label, as predicted by the black-box model, compared to Anchors. The results also show that the explanatory rules obtained through association rule mining result in tighter intervals and are closer to either one or zero compared to Anchors, i.e., they are more certain towards a specific class label.

  • 14.
    Alkhatib, Amr
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Boström, Henrik
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Vazirgiannis, Michalis
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Explaining Predictions by Characteristic Rules. 2023. In: Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2022, Part I / [ed] Amini, MR; Canu, S; Fischer, A; Guns, T; Novak, PK; Tsoumakas, G, Springer Nature, 2023, Vol. 13713, p. 389-403. Conference paper (Refereed)
    Abstract [en]

    Characteristic rules have been advocated for their ability to improve interpretability over discriminative rules within the area of rule learning. However, the former type of rule has not yet been used by techniques for explaining predictions. A novel explanation technique, called CEGA (Characteristic Explanatory General Association rules), is proposed, which employs association rule mining to aggregate multiple explanations generated by any standard local explanation technique into a set of characteristic rules. An empirical investigation is presented, in which CEGA is compared to two state-of-the-art methods, Anchors and GLocalX, for producing local and aggregated explanations in the form of discriminative rules. The results suggest that the proposed approach provides a better trade-off between fidelity and complexity compared to the two state-of-the-art approaches; CEGA and Anchors significantly outperform GLocalX with respect to fidelity, while CEGA and GLocalX significantly outperform Anchors with respect to the number of generated rules. The effects of changing the format of the explanations of CEGA to discriminative rules and of using LIME and SHAP as local explanation techniques instead of Anchors are also investigated. The results show that the characteristic explanatory rules still compete favorably with rules in the standard discriminative format. The results also indicate that using CEGA in combination with either SHAP or Anchors consistently leads to a higher fidelity compared to using LIME as the local explanation technique.

  • 15.
    Alsayfi, Majed S.
    et al.
    King Abdelaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia..
    Dahab, Mohamed Y.
    King Abdelaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia..
    Eassa, Fathy E.
    King Abdelaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia..
    Salama, Reda
    King Abdelaziz Univ, Fac Comp & Informat Technol, Dept Informat Technol, Jeddah 21589, Saudi Arabia..
    Haridi, Seif
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Al-Ghamdi, Abdullah S.
    King Abdelaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia.;King Abdelaziz Univ, Fac Comp & Informat Technol, Dept Informat Technol, Jeddah 21589, Saudi Arabia..
    Big Data in Vehicular Cloud Computing: Review, Taxonomy, and Security Challenges. 2022. In: ELEKTRONIKA IR ELEKTROTECHNIKA, ISSN 1392-1215, Vol. 28, no 2, p. 59-71. Article, review/survey (Refereed)
    Abstract [en]

    Modern vehicles equipped with various smart sensors are no longer only a means of transportation; they have also become a means of collecting, creating, computing, processing, and transferring data while traveling through modern and rural cities. A traditional vehicular ad hoc network (VANET) cannot handle the enormous and complex data collected by modern vehicle sensors (e.g., cameras, lidar, and global positioning systems (GPS)) because these data require rapid processing, analysis, management, storage, and uploading to trusted national authorities. Furthermore, integrating VANET with cloud computing yields a new concept, vehicular cloud computing (VCC), which overcomes the limitations of VANET, brings new services and applications to vehicular networks, and generates a massive amount of data compared to the data collected by individual vehicles alone. Therefore, this study explores the importance of big data in VCC. First, we provide an overview of traditional vehicular networks and their limitations. Then we investigate the relationship between VCC and big data, focusing on how VCC can generate, transmit, store, upload, and process big data to share it among vehicles on the road. Subsequently, a new taxonomy of big data in VCC is presented. Finally, the security challenges in big data-based VCCs are discussed.

  • 16.
    Alsayfi, Majed S.
    et al.
    Taibah Univ, Coll Comp Sci & Engn, Dept Comp Sci, Medina 42353, Saudi Arabia.;King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia..
    Dahab, Mohamed Y.
    King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia..
    Eassa, Fathy E.
    King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia..
    Salama, Reda
    King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Informat Technol, Jeddah 21589, Saudi Arabia..
    Haridi, Seif
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Al-Ghamdi, Abdullah S.
    King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Informat Syst, Jeddah 21589, Saudi Arabia..
    Securing Real-Time Video Surveillance Data in Vehicular Cloud Computing: A Survey. 2022. In: IEEE Access, E-ISSN 2169-3536, Vol. 10, p. 51525-51547. Article in journal (Refereed)
    Abstract [en]

    Vehicular ad hoc networks (VANETs) have received a great amount of interest, especially in wireless communications technology. In VANETs, vehicles are equipped with various intelligent sensors that can collect real-time data from inside the vehicle and from surrounding vehicles. These real-time data require powerful computation, processing, and storage. However, VANETs cannot manage these real-time data because of the limited storage capacity of the on-board unit (OBU). To address this limitation, a new concept is proposed in which a VANET is integrated with cloud computing to form vehicular cloud computing (VCC) technology. VCC can manage real-time services, such as real-time video surveillance data that are used for monitoring critical events on the road. These real-time video surveillance data include highly sensitive data that should be protected against intruders in the networks because any manipulation, alteration, or sniffing of data will affect a driver's life by causing improper decision-making. The security and privacy of real-time video surveillance data are major challenges in VCC. Therefore, this study reviews the importance of the security and privacy of real-time video data in VCC. First, we provide an overview of VANETs and their limitations. Second, we provide a state-of-the-art taxonomy for real-time video data in VCC. Then, the importance of real-time video surveillance data in both fifth generation (5G) and sixth generation (6G) networks is presented. Finally, the challenges and open issues of real-time video data in VCC are discussed.

  • 17.
    Angelovska, Marina
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science.
    Sheikholeslami, Sina
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Dunn, Bas
    Bol Com, Utrecht, Netherlands.
    Payberah, Amir H.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Siamese Neural Networks for Detecting Complementary Products. 2021. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, Association for Computational Linguistics, 2021, p. 65-70. Conference paper (Refereed)
    Abstract [en]

    Recommender systems play an important role in e-commerce websites as they improve the customer journey by helping the users find what they want at the right moment. In this paper, we focus on identifying a complementary relationship between the products of an e-commerce company. We propose a content-based recommender system for detecting complementary products, using Siamese Neural Networks (SNN). To this end, we implement and compare two different models: Siamese Convolutional Neural Network (CNN) and Siamese Long Short-Term Memory (LSTM). Moreover, we propose an extension of the SNN approach to handling millions of products in a matter of seconds, and we reduce the training time complexity by half. In the experiments, we show that Siamese LSTM can predict complementary products with an accuracy of ~85% using only the product titles.

  • 18.
    Antaris, Stefanos
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Enabling Enterprise Live Video Streaming with Reinforcement Learning and Graph Neural Networks. 2022. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Over the last decade, video has become by far the most popular way the world consumes content. Due to this increased popularity, video has become a strategic tool for enterprises. More specifically, enterprises organize live video streaming events for both internal and external purposes in order to attract large audiences and disseminate important information. However, streaming a high-quality video internally in large multinational corporations, with thousands of employees spread around the world, is a challenging task. The main challenge is to prevent catastrophic network congestion in the enterprise network when thousands of employees attend a high-quality video event simultaneously. Given that large enterprises invest a significant amount of their annual budget in live video streaming events, it is essential to ensure that the office network will not be congested and each viewer will have a high quality of experience during the event.

    To address this challenge, large enterprises employ distributed live video streaming solutions to distribute high-quality video content between viewers of the same network. Such solutions rely on prior knowledge of the enterprise network topology to efficiently reduce the network bandwidth requirements during the event. Given that such knowledge is not always feasible to acquire, the distributed solutions must detect the network topology in real-time during the event. However, distributed solutions require a service to detect the network topology in the first minutes of the event, also known as the joining phase. Failing to promptly detect the enterprise network topology negatively impacts the event’s performance. In particular, distributed solutions may establish connections between viewers of different offices with limited network capacity. As a result, the enterprise network will be congested, and the employees will drop the event from the beginning of the event if they experience video quality issues.

    In this thesis, we investigate and propose novel machine learning models allowing the enterprise network topology service to detect the topology in real-time. In particular, we investigate the network distribution of live video streaming events caused by the distributed software solutions. In doing so, we propose several graph neural network models to detect the network topology in the first minutes of the event. Live video streaming solutions can adjust the viewers’ connections to distribute high-quality video content between viewers of the same office, avoiding the risk of network congestion. We compare our models with several baselines in real-world datasets and show that our models achieve significant improvement via empirical evaluations.

    Another critical factor for the efficiency of live video streaming events is the enterprise network topology service latency. Distributed live video streaming solutions require minimum latency to infer the network topology and adjust the viewers’ connections. We study the impact of the graph neural network size on the model’s online inference latency and propose several knowledge distillation strategies to generate compact models. Therefore, we create models with significantly fewer parameters, reducing the online inference latency while achieving high accuracy in the network topology detection task. Compared with state-of-the-art approaches, our proposed models have several orders of magnitude fewer parameters while maintaining high accuracy.

    Furthermore, we address the continuously evolving enterprise network topology problem. Modern enterprise networks frequently change their topology to manage their business needs. Therefore, distributed live video streaming solutions must capture the network topology changes and adjust their network topology detection service in real time. To tackle this problem, we propose several novel machine learning models that exploit historical events to assist the models in detecting the network topology in the first minutes of the event. We investigate the distribution of the viewers participating in the events. We propose efficient reinforcement learning and meta-learning techniques to learn the enterprise network topology for each new event. By applying meta-learning and reinforcement learning, we can generalize network topology changes and ensure that every viewer will have a high-quality experience during an event. Compared with baseline approaches, we achieved superior performance in establishing connections between viewers of the same office in the first minutes of the event. Therefore, we ensure that distributed solutions provide a high return on investment in every live video streaming event without risking any enterprise network congestion. 

  • 19.
    Antaris, Stefanos
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. Hive Streaming AB, Stockholm, Sweden..
    Rafailidis, Dimitrios
    Univ Thessaly, Volos, Greece..
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    A Deep Graph Reinforcement Learning Model for Improving User Experience in Live Video Streaming. 2021. In: 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) / [ed] Chen, Y; Ludwig, H; Tu, Y; Fayyad, U; Zhu, X; Hu, X; Byna, S; Liu, X; Zhang, J; Pan, S; Papalexakis, V; Wang, J; Cuzzocrea, A; Ordonez, C, Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 1787-1796. Conference paper (Refereed)
    Abstract [en]

    In this paper we present a deep graph reinforcement learning model to predict and improve the user experience during a live video streaming event, orchestrated by an agent/tracker. We first formulate the user experience prediction problem as a classification task, accounting for the fact that most of the viewers at the beginning of an event have poor quality of experience due to low-bandwidth connections and limited interactions with the tracker. In our model we consider different factors that influence the quality of user experience and train the proposed model on diverse state-action transitions when viewers interact with the tracker. In addition, provided that past events have various user experience characteristics we follow a gradient boosting strategy to compute a global model that learns from different events. Our experiments with three real-world datasets of live video streaming events demonstrate the superiority of the proposed model against several baseline strategies. Moreover, as the majority of the viewers at the beginning of an event has poor experience, we show that our model can significantly increase the number of viewers with high quality experience by at least 75% over the first streaming minutes. Our evaluation datasets and implementation are publicly available at https://publicresearch.z13.web.core.windows.net

  • 20.
    Antaris, Stefanos
    et al.
    KTH. HiveStreaming AB, Stockholm, Sweden..
    Rafailidis, Dimitrios
    Maastricht Univ, Maastricht, Netherlands..
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    EGAD: Evolving Graph Representation Learning with Self-Attention and Knowledge Distillation for Live Video Streaming Events. 2020. In: 2020 IEEE international conference on big data (big data) / [ed] Wu, XT; Jermaine, C; Xiong, L; Hu, XH; Kotevska, O; Lu, SY; Xu, WJ; Aluru, S; Zhai, CX; Al-Masri, E; Chen, ZY; Saltz, J, Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 1455-1464. Conference paper (Refereed)
    Abstract [en]

    In this study, we present a dynamic graph representation learning model on weighted graphs to accurately predict the network capacity of connections between viewers in a live video streaming event. We propose EGAD, a neural network architecture to capture the graph evolution by introducing a self-attention mechanism on the weights between consecutive graph convolutional networks. In addition, we account for the fact that neural architectures require a huge number of parameters to train, thus increasing the online inference latency and negatively influencing the user experience in a live video streaming event. To address the problem of the high online inference latency caused by a vast number of parameters, we propose a knowledge distillation strategy. In particular, we design a distillation loss function, aiming to first pretrain a teacher model on offline data, and then transfer the knowledge from the teacher to a smaller student model with fewer parameters. We evaluate our proposed model on the link prediction task on three real-world datasets, generated by live video streaming events. The events lasted 80 minutes and each viewer exploited the distribution solution provided by the company Hive Streaming AB. The experiments demonstrate the effectiveness of the proposed model in terms of link prediction accuracy and number of required parameters, when evaluated against state-of-the-art approaches. In addition, we study the distillation performance of the proposed model in terms of compression ratio for different distillation strategies, where we show that the proposed model can achieve a compression ratio up to 15:100, preserving high link prediction accuracy. For reproduction purposes, our evaluation datasets and implementation are publicly available at https://stefanosantaris.github.io/EGAD.

  • 21.
    Antaris, Stefanos
    et al.
    KTH. HiveStreaming AB, Stockholm, Sweden..
    Rafailidis, Dimitrios
    Univ Thessaly, Volos, Greece..
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Knowledge distillation on neural networks for evolving graphs. 2021. In: Social Network Analysis and Mining, ISSN 1869-5450, E-ISSN 1869-5469, Vol. 11, no 1, article id 100. Article in journal (Refereed)
    Abstract [en]

    Graph representation learning on dynamic graphs has become an important task in several real-world applications, such as recommender systems and email spam detection. To efficiently capture the evolution of a graph, representation learning approaches employ deep neural networks with a large number of parameters to train. Due to the large model size, such approaches have high online inference latency. As a consequence, such models are challenging to deploy in an industrial setting with a vast number of users/nodes. In this study, we propose DynGKD, a distillation strategy to transfer the knowledge from a large teacher model to a small student model with low inference latency, while achieving high prediction accuracy. We first study different distillation loss functions to separately train the student model with various types of information from the teacher model. In addition, we propose a hybrid distillation strategy for evolving graph representation learning to combine the teacher's different types of information. Our experiments with five publicly available datasets demonstrate the superiority of our proposed model against several baselines, with an average relative drop of 40.60% in terms of RMSE in the link prediction task. Moreover, our DynGKD model achieves a compression ratio of 21:100, accelerating inference with a speed-up factor of ×30 when compared with the teacher model. For reproduction purposes, we make our datasets and implementation publicly available at https://github.com/stefanosantaris/DynGKD.

  • 22.
    Antaris, Stefanos
    et al.
    Hive Streaming AB, Sweden.
    Rafailidis, Dimitrios
    University of Thessaly, Greece.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Meta-reinforcement learning via buffering graph signatures for live video streaming events. 2021. In: Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2021, Association for Computing Machinery (ACM), 2021, p. 385-392. Conference paper (Refereed)
    Abstract [en]

    In this study, we present a meta-learning model to adapt the predictions of the network's capacity between viewers who participate in a live video streaming event. We propose the MELANIE model, where an event is formulated as a Markov Decision Process, performing meta-learning on reinforcement learning tasks. By considering a new event as a task, we design an actor-critic learning scheme to compute the optimal policy on estimating the viewers' high-bandwidth connections. To ensure fast adaptation to new connections or changes among viewers during an event, we implement a prioritized replay memory buffer based on the Kullback-Leibler divergence of the reward/throughput of the viewers' connections. Moreover, we adopt a model-agnostic meta-learning framework to generate a global model from past events. As viewers scarcely participate in several events, the challenge resides in how to account for the low structural similarity of different events. To combat this issue, we design a graph signature buffer to calculate the structural similarities of several streaming events and adjust the training of the global model accordingly. We evaluate the proposed model on the link weight prediction task on three real-world datasets of live video streaming events. Our experiments demonstrate the effectiveness of our proposed model, with an average relative gain of 25% against state-of-the-art strategies. For reproduction purposes, our evaluation datasets and implementation are publicly available at https://github.com/stefanosantaris/melanie

  • 23.
    Apolonia, Nuno
    et al.
    Universitat Politecnica de Catalunya (UPC) Barcelona, Spain.
    Antaris, Stefanos
    Girdzijauskas, Šarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Pallis, G.
    Dikaiakos, Marios
    SELECT: A distributed publish/subscribe notification system for online social networks. 2018. In: Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 970-979, article id 8425250. Conference paper (Refereed)
    Abstract [en]

    Publish/subscribe (pub/sub) mechanisms constitute an attractive communication paradigm in the design of large-scale notification systems for Online Social Networks (OSNs). To accommodate the large-scale workloads of notifications produced by OSNs, pub/sub mechanisms require thousands of servers distributed on different data centers all over the world, incurring large overheads. To eliminate the pub/sub resources used, we propose SELECT, a distributed pub/sub social notification system over peer-to-peer (P2P) networks. SELECT organizes the peers on a ring topology and provides an adaptive P2P connection establishment algorithm where each peer identifies the number of connections required, based on the social structure and user availability. This allows to propagate messages to the social friends of the users using a reduced number of hops. The presented algorithm is an efficient heuristic to an NP-hard problem which maps workload graphs to structured P2P overlays inducing overall, close to theoretical, minimal number of messages. Experiments show that SELECT reduces the number of relay nodes up to 89% versus the state-of-the-art pub/sub notification systems. Additionally, we demonstrate the advantage of SELECT against socially-aware P2P overlay networks and show that the communication between two socially connected peers is reduced on average by at least 64% hops, while achieving 100% communication availability even under high churn.

  • 24.
    Apolonia, Nuno
    et al.
    KTH, School of Information and Communication Technology (ICT). Universitat Politecnica de Catalunya (UPC) Barcelona, Spain.
    Freitag, Felix
    Universitat Politècnica de Catalunya. Barcelona, Spain.
    Navarro, Leandro
    Universitat Politècnica de Catalunya, BarcelonaTECH, Barcelona, Spain.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Socially aware microcloud service overlay optimization in community networks. 2019. In: Software, practice & experience, ISSN 0038-0644, Vol. 49, no 1, article id 13. Article in journal (Refereed)
    Abstract [en]

    Community networks are a growing cooperative effort by citizens to build and maintain Internet infrastructure in regions where it is not available. In addition, to bring cloud services to community networks (CNs), microclouds were introduced as an edge cloud computing model in which members cooperate using resources. Therefore, enhancing routing for services in CNs is an attractive paradigm that benefits the infrastructure. The problem is the growing consumption of resources for disseminating messages in the CN environment. This is because the services that build their overlay networks are oblivious to the underlying workload patterns that arise from social cooperation in CNs. In this paper, we propose Select in Community Networks (SELECTinCN), which enhances the overlay creation for pub/sub systems over peer-to-peer (P2P) networks. Moreover, SELECTinCN includes social information based on cooperation within CNs by exploiting the social aspects of the community of practice. Our work organizes the peers in a ring topology and provides an adaptive P2P connection establishment algorithm, where each peer identifies the number of connections needed based on the social structure and user availability. This allows us to propagate messages using a reduced number of hops, thus providing an efficient heuristic to an NP-hard problem that maps the workload graph to the structured P2P overlays, resulting in a number of messages close to the theoretical minimum. Experiments show that, by using the community of practice information, SELECTinCN reduces the number of relay nodes by up to 89% versus the state-of-the-art pub/sub notification systems given as baseline.

  • 25.
    Armgarth, Astrid
    et al.
    Linköping Univ, Dept Sci & Technol, Lab Organ Elect, S-60174 Norrköping, Sweden.;RISE Res Inst Sweden AB, Printed Elect, S-60221 Norrköping, Sweden..
    Pantzare, Sandra
    RISE Res Inst Sweden AB, Printed Elect, S-60221 Norrköping, Sweden..
    Arven, Patrik
    J2 Holding AB, Elect Engn, S-59533 Mjolby, Sweden..
    Lassnig, Roman
    RISE Res Inst Sweden AB, Printed Elect, S-60221 Norrköping, Sweden..
    Jinno, Hiroaki
    RIKEN, Ctr Emergent Matter Sci, 2-1 Hirosawa, Wako, Saitama 3510198, Japan.;Univ Tokyo, Elect & Elect Engn & Informat Syst, Bunkyo Ku, 7-3-1 Hongo, Tokyo 1138656, Japan..
    Gabrielsson, Erik O.
    Linköping Univ, Dept Sci & Technol, Lab Organ Elect, S-60174 Norrköping, Sweden..
    Kifle, Yonatan
    Linköping Univ, Dept Elect Engn, S-58183 Linköping, Sweden..
    Cherian, Dennis
    Linköping Univ, Dept Sci & Technol, Lab Organ Elect, S-60174 Norrköping, Sweden..
    Sjostrom, Theresia Arbring
    Linköping Univ, Dept Sci & Technol, Lab Organ Elect, S-60174 Norrköping, Sweden..
    Berthou, Gautier
    Res Inst Sweden AB, RISE SICS, Kista, Sweden..
    Dowling, Jim
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Someya, Takao
    RIKEN, Ctr Emergent Matter Sci, 2-1 Hirosawa, Wako, Saitama 3510198, Japan.;Univ Tokyo, Elect & Elect Engn & Informat Syst, Bunkyo Ku, 7-3-1 Hongo, Tokyo 1138656, Japan..
    Wikner, J. Jacob
    Linköping Univ, Dept Elect Engn, S-58183 Linköping, Sweden..
    Gustafsson, Goran
    RISE Res Inst Sweden AB, Printed Elect, S-60221 Norrköping, Sweden..
    Simon, Daniel T.
    Linköping Univ, Dept Sci & Technol, Lab Organ Elect, S-60174 Norrköping, Sweden..
    Berggren, Magnus
    Linköping Univ, Dept Sci & Technol, Lab Organ Elect, S-60174 Norrköping, Sweden..
    A digital nervous system aiming toward personalized IoT healthcare. 2021. In: Scientific Reports, E-ISSN 2045-2322, Vol. 11, no 1, article id 7757. Article in journal (Refereed)
    Abstract [en]

    Body area networks (BANs), cloud computing, and machine learning are platforms that can potentially enable advanced healthcare outside the hospital. By applying distributed sensors and drug delivery devices on/in our body and connecting to such communication and decision-making technology, a system for remote diagnostics and therapy is achieved with additional autoregulation capabilities. Challenges with such autarchic on-body healthcare schemes relate to integrity and safety, and interfacing and transduction of electronic signals into biochemical signals, and vice versa. Here, we report a BAN, comprising flexible on-body organic bioelectronic sensors and actuators utilizing two parallel pathways for communication and decision-making. Data, recorded from strain sensors detecting body motion, are both securely transferred to the cloud for machine learning and improved decision-making, and sent through the body using a secure body-coupled communication protocol to auto-actuate delivery of neurotransmitters, all within seconds. We conclude that both highly stable and accurate sensing (from multiple sensors) are needed to enable robust decision making and limit the frequency of retraining. The holistic platform resembles the self-regulatory properties of the nervous system, i.e., the ability to sense, communicate, decide, and react accordingly, thus operating as a digital nervous system.

  • 26.
    Arsalan, Muhammad
    et al.
    Tech Univ Carolo Wilhelmina Braunschweig, Braunschweig, Germany..
    Di Matteo, Davide
    KTH.
    Imtiaz, Sana
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. KRY Int AB, Stockholm, Sweden..
    Abbas, Zainab
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. KRY Int AB, Stockholm, Sweden..
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Issakov, Vadim
    Tech Univ Carolo Wilhelmina Braunschweig, Braunschweig, Germany..
    Energy-Efficient Privacy-Preserving Time-Series Forecasting on User Health Data Streams. 2022. In: Proceedings - 2022 IEEE 21st International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2022, Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 541-546. Conference paper (Refereed)
    Abstract [en]

    Health monitoring devices are gaining popularity both as wellness tools and as a source of information for healthcare decisions. In this work, we use Spiking Neural Networks (SNNs) for time-series forecasting due to their proven energy-saving capabilities. Thanks to their design that closely mimics the natural nervous system, SNNs are energy-efficient in contrast to classic Artificial Neural Networks (ANNs). We design and implement an energy-efficient privacy-preserving forecasting system on real-world health data streams using SNNs and compare it to a state-of-the-art system with a Long Short-Term Memory (LSTM) based prediction model. Our evaluation shows that SNNs trade off accuracy (2.2x greater error) for a smaller model (19% fewer parameters and 77% less memory consumption) and 43% less training time. Our model is estimated to consume 3.36 μJ of energy, which is significantly less than traditional ANNs. Finally, we apply ε-differential privacy for enhanced privacy guarantees on our federated learning-based models. With differential privacy of ε = 0.1, our experiments report an increase in the measured average error (RMSE) of only 25%.
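
    As an illustration of what an epsilon (ε) privacy budget means in practice, the sketch below adds Laplace noise to a vector of numbers. The abstract does not say which mechanism the authors use or where in the federated pipeline the noise is applied, so the Laplace mechanism, the function names, and the sensitivity value are all assumptions made for this example.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b = sensitivity / epsilon used by the Laplace mechanism."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def privatize(values, sensitivity: float, epsilon: float, rng: random.Random):
    """Return a noisy copy of `values` (hypothetical helper, not from the paper).

    Assumes `sensitivity` bounds the L1 change in `values` caused by one user.
    """
    scale = laplace_scale(sensitivity, epsilon)
    return [v + sample_laplace(scale, rng) for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    # A smaller epsilon means a larger noise scale, i.e. stronger privacy.
    print(laplace_scale(1.0, 0.1), laplace_scale(1.0, 1.0))
    print(privatize([0.5, -0.2, 1.3], sensitivity=1.0, epsilon=0.1, rng=rng))
```

    The sketch only shows the budget-to-noise relationship: lowering ε raises the noise scale, which is the general reason a stricter privacy setting tends to increase forecasting error.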

  • 27.
    Asratyan, Albert
    et al.
    KTH.
    Sheikholeslami, Sina
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    A Parallel Chain Mail Approach for Scalable Spatial Data Interpolation. 2021. In: 2021 IEEE International Conference on Big Data (Big Data), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 306-314. Conference paper (Refereed)
    Abstract [en]

    Deteriorating air quality is a growing concern that has been linked to many health-related issues. Its monitoring is a good first step to understanding the problem. However, it is not always possible to collect air quality data from every location. Various data interpolation techniques are used to assist with populating sparse maps with more context, but many of these algorithms are computationally expensive. This work introduces a three-step Chain Mail algorithm that uses kriging (without any modifications to the base algorithm) and achieves up to ×100 execution time improvement with minimal accuracy loss (relative RMSE of 3%) by running concurrent interpolation executions. This approach can be described as a multiple-step parallel interpolation algorithm that includes specific regional border data manipulation for achieving greater accuracy. It does so by interpolating geographically defined data chunks in parallel and sharing the results with their neighboring nodes to provide context and compensate for lack of knowledge of the surrounding areas. Combined with a serverless cloud architecture, this approach opens doors to interpolating large data sets in a matter of minutes while remaining cost-efficient. The effectiveness of the three-step Chain Mail approach depends on the equal point distribution among all nodes and the resolution of the parallel configuration. In general, it offers a good balance between execution speed and accuracy.

  • 28.
    Attieh, Joseph
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. University of Helsinki, Helsinki, Finland; Huawei Technologies Oy., Helsinki, Finland.
    Woubie Zewoudie, Abraham
    Silo AI, Helsinki, Finland.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Flanagan, Adrian
    Huawei Technologies Oy., Helsinki, Finland.
    Bäckström, Tom
    Aalto University, Espoo, Finland.
    Optimizing the Performance of Text Classification Models by Improving the Isotropy of the Embeddings Using a Joint Loss Function. 2023. In: Document Analysis and Recognition: ICDAR 2023 / [ed] Gernot A. Fink, Rajiv Jain, Koichi Kise, and Richard Zanibbi, Cham: Springer Nature, 2023, p. 121-136. Conference paper (Refereed)
    Abstract [en]

    Recent studies show that the spatial distribution of the sentence representations generated from pre-trained language models is highly anisotropic. This results in a degradation in the performance of the models on the downstream task. Most methods improve the isotropy of the sentence embeddings by refining the corresponding contextual word representations, then deriving the sentence embeddings from these refined representations. In this study, we propose to improve the quality of the sentence embeddings extracted from the [CLS] token of the pre-trained language models by improving the isotropy of the embeddings. We add one feed-forward layer between the model and the downstream task layers, and we train it using a novel joint loss function. The proposed approach results in embeddings with better isotropy, that generalize better on the downstream task. Experimental results on 3 GLUE datasets with classification as the downstream task show that our proposed method is on par with the state-of-the-art, as it achieves performance gains of around 2–3% on the downstream tasks compared to the baseline.

  • 29.
    Avetisyan, A.
    et al.
    Ivar, J.
    Pozin, B. A.
    Petrenko, A. K.
    Cavalli, A. R.
    Arlazarov, V.
    Avdoshin, S.
    Batovrin, V. K.
    Bershadsky, A. M.
    Boichenko, A.
    Juris, B.
    Vasenin, V. A.
    Schlingloff, H.
    Kalyanov, G.
    Kantorovich, G. G.
    Korolev, A.
    Kosolapov, M. S.
    Kostogryzov, A. I.
    Koznov, D. V.
    Kuznetsov, S.
    Prokhorov, S.
    Soyfer, V. A.
    Starykh, V. A.
    Stolyarov, G. K.
    Stupnikov, S. A.
    Telnov, Y.Ph.
    Shmid, A.
    Kajko-Mattsson, Mira Miroslawa
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Wentzl, W.
    Mayer, W. H.
    Yevtushenko, N.
    Zakharov, V.
    Zmeev, O.
    Preface. 2019. In: APSSE 2019 Actual Problems of System and Software Engineering: Proceedings of the 6th International Conference Actual Problems of System and Software Engineering, CEUR-WS, 2019, Vol. 2514, p. 1-2. Conference paper (Refereed)
  • 30.
    Babaheidarian, P.
    et al.
    Salimi, S.
    Papadimitratos, Panagiotis
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Communication Systems, CoS. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Scalable Security in Interference Channels with Arbitrary Number of Users. 2020. In: Proceedings of 2020 International Symposium on Information Theory and its Applications, ISITA 2020, Institute of Electrical and Electronics Engineers Inc., 2020, p. 402-406. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present an achievable security scheme for an interference channel with an arbitrary number of users. In this model, each receiver should be able to decode its intended message while remaining ignorant of messages intended for other receivers. Our scheme relies on transmitters to collectively ensure the confidentiality of the transmitted messages using a cooperative jamming technique and lattice alignment. The asymmetric compute-and-forward framework is used to perform the decoding operation. The proposed scheme is the first asymptotically optimal achievable scheme for this security scenario that scales to an arbitrary number of users and works for any finite-valued SNR. Also, our scheme achieves the upper bound on the sum secure degrees of freedom of 1 without using external helpers, and thus the achievable rates lie within a constant gap from the sum secure capacity.

  • 31.
    Bahri, Leila
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Carminati, B.
    Ferrari, E.
    Decentralized privacy preserving services for Online Social Networks. 2018. In: Online Social Networks and Media, ISSN 2468-6964, Vol. 6, p. 18-25. Article in journal (Refereed)
    Abstract [en]

    Current popular and widely adopted Online Social Networks (OSNs) all follow a logically centered architecture, by which one single entity owns unprecedented collections of personal data in terms of amount, variety, geographical span, and richness in detail. This clearly constitutes one of the major threats to users' privacy and to their right to be left alone. Decentralization has then been considered as the panacea to privacy issues, especially in the realms of OSNs. However, with a more thoughtful consideration of the issue, it could be argued that decentralization, if not designed and implemented carefully and properly, can have more serious implications for users' privacy rather than bringing radical solutions. Moreover, research on Decentralized Online Social Networks (DOSNs) has shown that there are more challenges to their realization that need proper attention and more innovative technical solutions. In this paper, we discuss the issues related to privacy preservation between centralization and decentralization, and we provide a review of available research work on decentralized privacy preserving services for social networks.

  • 32.
    Bahri, Leila
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Carminati, Barbara
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Ferrari, Elena
    Univ Insubria, Dept Theoret & Appl Sci, Varese, Italy.
    Knowledge-based approaches for identity management in online social networks. 2018. In: Wiley Interdisciplinary Reviews. Data Mining and Knowledge Discovery, ISSN 1942-4787, Vol. 8, no 5, article id e1260. Article, review/survey (Refereed)
    Abstract [en]

    When we meet a new person, we start by introducing ourselves. We share our names, and other information about our jobs, cities, family status, and so on. This is how socializing and social interactions can start: we first need to identify each other. Identification is a cornerstone in establishing social contacts. We identify ourselves and others by a set of civil (e.g., name, nationality, ID number, gender) and social (e.g., music taste, hobbies, religion) characteristics. This seamlessly carried out identification process in face-to-face interactions is challenged in the virtual realms of socializing, such as in online social network (OSN) platforms. New identities (i.e., online profiles) could be created without being subject to any level of verification, making it easy to create fake information and forge fake identities. This has led to a massive proliferation of accounts that represent fake identities (i.e., not mapping to physically existing entities), and that poison the online socializing environment with fake information and malicious behavior (e.g., child abuse, information stealing). Within this milieu, users in OSNs are left unarmed against the challenging task of identifying the real person behind the screen. OSN providers and research bodies have dedicated considerable effort to the study of the behavior and features of fake OSN identities, trying to find ways to detect them. Some other research initiatives have explored possible techniques to enable identity validation in OSNs. Both kinds of approach rely on extracting knowledge from the OSN, and exploiting it to achieve identification management in their realms. We provide a review of the most prominent works in the literature. We define the problem, provide a taxonomy of related attacks, and discuss the available solutions and approaches for knowledge-based identity management in OSNs. 
    This article is categorized under: Fundamental Concepts of Data and Knowledge > Human Centricity and User Interaction; Application Areas > Internet and Web-Based Applications; Application Areas > Society and Culture

  • 33.
    Bahri, Leila
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Blockchain technology: Practical P2P computing (Tutorial). 2019. In: Proceedings - 2019 IEEE 4th International Workshops on Foundations and Applications of Self* Systems, FAS*W 2019, Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 249-250, article id 8791982. Conference paper (Refereed)
    Abstract [en]

    Blockchain technology comes with the promise to revolutionize the way current IT systems are organized as well as to revise how trust is perceived in the wider society. In spite of the wide attention that crypto-currencies (such as Bitcoin) have attracted, Blockchain technology is more likely to make an impact beyond ongoing speculations on crypto-currencies. Decentralized identity management, transparent supply-chain systems, and IoT governance and security are only a few examples of research challenges for which this technology may hold substantial potential. Blockchain technology has emerged at the intersection of two well-established research areas: peer-to-peer (P2P) computing and cryptography. In this tutorial, we provide a general overview of the main components behind this technology, we present the differences between the types of Blockchain available today, and we give a high-level discussion of its potential and limitations as well as possible research challenges.

  • 34.
    Bahri, Leila
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Trust mends blockchains: Living up to expectations. 2019. In: Proceedings - International Conference on Distributed Computing Systems, 2019, p. 1358-1368. Conference paper (Refereed)
    Abstract [en]

    At the heart of Blockchains is the trustless leader election mechanism for achieving consensus among pseudo-anonymous peers, without the need of oversight from any third party or authority whatsoever. So far, two main mechanisms are being discussed: proof-of-work (PoW) and proof-of-stake (PoS). PoW relies on demonstration of computational power, and comes with the markup of huge energy wastage in return of the stake in crypto-currency. PoS tries to address this by relying on owned stake (i.e., amount of crypto-currency) in the system. In both cases, Blockchains are limited to systems with a financial basis. This forces non-crypto-currency Blockchain applications to resort to a "permissioned" setting only, effectively centralizing the system. However, non-crypto-currency permissionless Blockchains could enable secure and self-governed peer-to-peer structures for numerous emerging application domains, such as education and health, where some trust exists among peers. This creates a new possibility for valuing trust among peers and capitalizing it as the basis (stake) for reaching consensus. In this paper we show that there is a viable way for permissionless non-financial Blockchains to operate in completely decentralized environments and achieve leader election through proof-of-trust (PoT). In our PoT construction, peer trust is extracted from a trust network that emerges in a decentralized manner and is used as a waiver for the effort to be spent for PoW, thus dramatically reducing total energy expenditure of the system. Furthermore, our PoT construction is resilient to the risk of small cartels monopolizing the network (as it happens with the mining-pool phenomenon in PoW) and is not vulnerable to sybils. We evaluate security guarantees, and perform experimental evaluation of our construction, demonstrating up to 10-fold energy savings compared to PoW without trading off any of the decentralization characteristics, with further guarantees against risks of monopolization.

  • 35.
    Bahri, Leila
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Trust Mends Blockchains: Living up to Expectations. 2019. In: IEEE 39th International Conference on Distributed Computing Systems (ICDCS), Dallas, July 7-10 2019, 2019. Conference paper (Refereed)
    Abstract [en]

    At the heart of Blockchains is the trustless leader election mechanism for achieving consensus among pseudo-anonymous peers, without the need of oversight from any third party or authority whatsoever. So far, two main mechanisms are being discussed: proof-of-work (PoW) and proof-of-stake (PoS). PoW relies on demonstration of computational power, and comes with the markup of huge energy wastage in return of the stake in crypto-currency. PoS tries to address this by relying on owned stake (i.e., amount of crypto-currency) in the system. In both cases, Blockchains are limited to systems with a financial basis. This forces non-crypto-currency Blockchain applications to resort to a “permissioned” setting only, effectively centralizing the system. However, non-crypto-currency permissionless Blockchains could enable secure and self-governed peer-to-peer structures for numerous emerging application domains, such as education and health, where some trust exists among peers. This creates a new possibility for valuing trust among peers and capitalizing it as the basis (stake) for reaching consensus. In this paper we show that there is a viable way for permissionless non-financial Blockchains to operate in completely decentralized environments and achieve leader election through proof-of-trust (PoT). In our PoT construction, peer trust is extracted from a trust network that emerges in a decentralized manner and is used as a waiver for the effort to be spent for PoW, thus dramatically reducing total energy expenditure of the system. Furthermore, our PoT construction is resilient to the risk of small cartels monopolizing the network (as it happens with the mining-pool phenomenon in PoW) and is not vulnerable to sybils. We evaluate security guarantees, and perform experimental evaluation of our construction, demonstrating up to 10-fold energy savings compared to PoW without trading off any of the decentralization characteristics, with further guarantees against risks of monopolization.

  • 36.
    Bahri, Leila
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    When Trust Saves Energy - A Reference Framework for Proof-of-Trust (PoT) Blockchains. 2018. In: WWW '18 Companion Proceedings of the The Web Conference 2018, ACM Digital Library, 2018, p. 1165-1169. Conference paper (Refereed)
    Abstract [en]

    Blockchains are attracting the attention of many technical, financial, and industrial parties, as a promising infrastructure for achieving secure peer-to-peer (P2P) transactional systems. At the heart of blockchains is proof-of-work (PoW), a trustless leader election mechanism based on demonstration of computational power. PoW provides blockchain security in trustless P2P environments, but comes at the expense of wasting huge amounts of energy. In this research work, we question this energy expenditure of PoW under blockchain use cases where some form of trust exists between the peers. We propose a Proof-of-Trust (PoT) blockchain where peer trust is valuated in the network based on a trust graph that emerges in a decentralized fashion and that is encoded in and managed by the blockchain itself. This trust is then used as a waiver for the difficulty of PoW; that is, the more trust you prove in the network, the less work you do.
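
    The idea that trust acts as a waiver on PoW difficulty can be sketched as follows. The abstract does not specify how trust is computed from the trust graph or how it maps to difficulty, so the normalized trust score in [0, 1], the linear waiver, and all function names here are hypothetical choices for illustration only.

```python
import hashlib


def effective_difficulty(base_bits: int, trust: float, max_waiver_bits: int) -> int:
    """Leading zero bits a peer must find, after a trust-based waiver.

    `trust` is assumed to be normalized to [0, 1]; the linear waiver is an
    assumption of this sketch, not the construction from the paper.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    waiver = int(trust * max_waiver_bits)
    return max(1, base_bits - waiver)


def meets_difficulty(digest: bytes, bits: int) -> bool:
    """True if the first `bits` bits of the digest are zero."""
    value = int.from_bytes(digest, "big")
    return value >> (len(digest) * 8 - bits) == 0


def mine(block: bytes, bits: int) -> int:
    """Brute-force a nonce whose SHA-256 digest meets the difficulty."""
    nonce = 0
    while True:
        digest = hashlib.sha256(block + nonce.to_bytes(8, "big")).digest()
        if meets_difficulty(digest, bits):
            return nonce
        nonce += 1


if __name__ == "__main__":
    for trust in (0.0, 0.5, 1.0):
        bits = effective_difficulty(base_bits=16, trust=trust, max_waiver_bits=12)
        nonce = mine(b"example block", bits)
        print(f"trust={trust:.1f} bits={bits} nonce={nonce}")
```

    Each waived bit halves the expected number of hash attempts, which is why even a modest waiver translates into a large reduction in work for trusted peers in this toy model.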

  • 37.
    Balliu, Musard
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Bobadilla, Sofia
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Ekstedt, Mathias
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Monperrus, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Ron Arteaga, Javier
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Sharma, Aman
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Skoglund, Gabriel
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Soto Valero, César
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Wittlinger, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Challenges of Producing Software Bill of Materials for Java. 2023. In: IEEE Security and Privacy, ISSN 1540-7993, E-ISSN 1558-4046, Vol. 21, no 6, p. 12-23. Article in journal (Refereed)
    Abstract [en]

    Software bills of materials (SBOMs) promise to become the backbone of software supply chain hardening. We deep-dive into six tools and the SBOMs they produce for complex open source Java projects, revealing challenges regarding the accurate production and usage of SBOMs.

  • 38.
    Balliu, Musard
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Bobadilla, Sofia
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Ekstedt, Mathias
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering.
    Monperrus, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Ron Arteaga, Javier
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Sharma, Aman
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Skoglund, Gabriel
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Soto Valero, César
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Wittlinger, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Software Bill of Materials in Java. 2023. In: SCORED 2023 - Proceedings of the 2023 Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses, Association for Computing Machinery (ACM), 2023, p. 75-76. Conference paper (Refereed)
    Abstract [en]

    Modern software applications are virtually never built entirely in-house. As a matter of fact, they reuse many third-party dependencies, which form the core of their software supply chain [1]. The large number of dependencies in an application has turned into a major challenge for both security and reliability. For example, to compromise a high-value application, malicious actors can choose to attack a less well-guarded dependency of the project [2]. Even when there is no malicious intent, bugs can propagate through the software supply chain and cause breakages in applications. Gathering accurate, up-to-date information about all dependencies included in an application is, therefore, of vital importance.

  • 39.
    Barbette, Tom
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Communication Systems, CoS.
    Wu, Erfan
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Communication Systems, CoS.
    Kostic, Dejan
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Communication Systems, CoS.
    Maguire Jr., Gerald Q.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Communication Systems, CoS.
    Papadimitratos, Panagiotis
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Chiesa, Marco
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Cheetah: A High-Speed Programmable Load-Balancer Framework with Guaranteed Per-Connection-Consistency. 2022. In: IEEE/ACM Transactions on Networking, ISSN 1063-6692, E-ISSN 1558-2566, Vol. 30, no 1, p. 354-367. Article in journal (Refereed)
    Abstract [en]

    Large service providers use load balancers to dispatch millions of incoming connections per second towards thousands of servers. There are two basic yet critical requirements for a load balancer: uniform load distribution of the incoming connections across the servers, which requires support for advanced load balancing mechanisms, and per-connection-consistency (PCC), i.e., the ability to map packets belonging to the same connection to the same server even in the presence of changes in the number of active servers and load balancers. Yet, simultaneously meeting these requirements has been an elusive goal. Today's load balancers minimize PCC violations at the price of non-uniform load distribution. This paper presents Cheetah, a load balancer that supports advanced load balancing mechanisms and PCC while being scalable, memory efficient, and fast at processing packets, and while offering resilience to clogging attacks comparable to that of today's load balancers. The Cheetah LB design guarantees PCC for any realizable server selection load balancing mechanism and can be deployed in both stateless and stateful manners, depending on operational needs. We implemented Cheetah on both a software and a Tofino-based hardware switch. Our evaluation shows that a stateless version of Cheetah guarantees PCC, has negligible packet processing overheads, and can support load balancing mechanisms that reduce the flow completion time by a factor of 2-3×.
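
    To make the PCC requirement concrete, the sketch below counts how many existing connections a naive hash-modulo balancer would remap when one server is added, and contrasts it with a simple per-connection table. This is not Cheetah's mechanism, which the abstract does not detail; the helper names, the connection tuples, and the server labels are assumptions for illustration.

```python
import hashlib


def hash_mod_server(conn, servers):
    """Pick a server by hashing the connection tuple modulo the pool size."""
    key = repr(conn).encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return servers[h % len(servers)]


def pcc_violations(conns, before, after):
    """Count connections remapped although their old server is still active."""
    violations = 0
    for conn in conns:
        old = hash_mod_server(conn, before)
        new = hash_mod_server(conn, after)
        if old in after and old != new:
            violations += 1
    return violations


class StatefulBalancer:
    """Baseline that pins each connection to the server of its first packet.

    The sketch ignores table eviction and server removal.
    """

    def __init__(self, servers):
        self.servers = list(servers)
        self.table = {}

    def set_servers(self, servers):
        self.servers = list(servers)

    def route(self, conn):
        if conn not in self.table:
            self.table[conn] = hash_mod_server(conn, self.servers)
        return self.table[conn]


if __name__ == "__main__":
    conns = [("10.0.0.%d" % i, 40000 + i, "192.0.2.1", 80) for i in range(200)]
    before = ["s0", "s1", "s2", "s3"]
    after = before + ["s4"]
    print("hash-mod PCC violations:", pcc_violations(conns, before, after))
```

    The per-connection table preserves PCC across pool changes but at the cost of memory that grows with the number of connections, which is the kind of trade-off the abstract refers to.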

  • 40.
    Basloom, Huda
    et al.
    King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia.
    Dahab, Mohamed
    King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia.
    Al-Ghamdi, Abdullah Saad
    King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Informat Syst, Jeddah 21589, Saudi Arabia.
    Eassa, Fathy
    King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia.
    Alghamdi, Ahmed Mohammed
    Univ Jeddah, Coll Comp Sci & Engn, Dept Software Engn, Jeddah 21493, Saudi Arabia.
    Haridi, Seif
    KTH, School of Engineering Sciences (SCI), Physics. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    A Parallel Hybrid Testing Technique for Tri-Programming Model-Based Software Systems. 2023. In: Computers, Materials and Continua, ISSN 1546-2218, E-ISSN 1546-2226, Vol. 74, no 2, p. 4501-4530. Article in journal (Refereed)
    Abstract [en]

    Recently, researchers have shown increasing interest in combining more than one programming model into systems running on high performance computing systems (HPCs) to achieve exascale by applying parallelism at multiple levels. Combining different programming paradigms, such as Message Passing Interface (MPI), Open Multi-Processing (OpenMP), and Open Accelerators (OpenACC), can increase computation speed and improve performance. During the integration of multiple models, the probability of runtime errors increases, making their detection difficult, especially in the absence of testing techniques that can detect these errors. Numerous studies have been conducted to identify these errors, but no technique exists for detecting errors in three-level programming models. Despite the increasing research that integrates the three programming models, MPI, OpenMP, and OpenACC, a testing technology to detect runtime errors, such as deadlocks and race conditions, which can arise from this integration has not been developed. Therefore, this paper begins with a definition and explanation of runtime errors that result from integrating the three programming models and that compilers cannot detect. For the first time, this paper presents a classification of operational errors that can result from the integration of the three models. This paper also proposes a parallel hybrid testing technique for detecting runtime errors in systems built in the C++ programming language that use the triple programming models MPI, OpenMP, and OpenACC. This hybrid technology combines static technology and dynamic technology, given that some errors can be detected using static techniques, whereas others can be detected using dynamic technology. The hybrid technique can detect more errors because it combines two distinct technologies.
    The proposed static technology detects a wide range of error types in less time, whereas a portion of the potential errors that may or may not occur depending on the operating environment are left to the dynamic technology, which completes the validation.

  • 41.
    Basloom, Huda Saleh
    et al.
    King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21514, Saudi Arabia.
    Dahab, Mohamed Yehia
    King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21514, Saudi Arabia; Agr Res Ctr ARC, Giza 12619, Egypt.
    Alghamdi, Ahmed Mohammed
    Univ Jeddah, Coll Comp Sci & Engn, Dept Software Engn, Jeddah 21493, Saudi Arabia.
    Eassa, Fathy Elbouraey
    King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21514, Saudi Arabia.
    Al-Ghamdi, Abdullah Saad Al-Malaise
    King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Informat Syst, Jeddah 21589, Saudi Arabia.
    Haridi, Seif
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Errors Classification and Static Detection Techniques for Dual-Programming Model (OpenMP and OpenACC). 2022. In: IEEE Access, E-ISSN 2169-3536, Vol. 10, p. 117808-117826. Article in journal (Refereed)
    Abstract [en]

    Recently, incorporating more than one programming model into a system designed for high performance computing (HPC) has become a popular solution to implementing parallel systems. Since traditional programming languages, such as C, C++, and Fortran, do not support parallelism at the level of multi-core processors and accelerators, many programmers add one or more programming models to achieve parallelism and accelerate computation efficiently. These models include Open Accelerators (OpenACC) and Open Multi-Processing (OpenMP), which have recently been used with various models, including Message Passing Interface (MPI) and Compute Unified Device Architecture (CUDA). Due to the difficulty of predicting the behavior of threads, runtime errors cannot be predicted. The compiler cannot identify runtime errors such as data races, race conditions, deadlocks, or livelocks. Many studies have been conducted on the development of testing tools to detect runtime errors when using programming models, such as the combinations of OpenACC with MPI models and OpenMP with MPI. Although more applications use OpenACC and OpenMP together, no testing tools have been developed to test these applications to date. This paper presents a testing tool for detecting runtime errors using a static testing technique. This tool can detect actual and potential runtime errors during the integration of the OpenACC and OpenMP models into systems developed in C++. This tool implements error dependency graphs, which are proposed in this paper. Additionally, a dependency graph of the errors is provided, along with a classification of runtime errors that result from combining the two programming models mentioned earlier.

  • 42.
    Baudry, Benoit
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Chen, Zimin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Etemadi, Khashayar
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Fu, Han
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Ginelli, Davide
    Univ Milano Bicocca, Comp Sci, I-20166 Milan, Italy..
    Kommrusch, Steve
    Colorado State Univ, Machine Learning, Ft Collins, CO 80523 USA..
    Martinez, Matias
    Univ Polytech Hauts De France, F-59260 Valenciennes, France..
    Monperrus, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Ron Arteaga, Javier
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. KTH Royal Inst Technol, Software Engn, S-11428 Stockholm, Sweden..
    Ye, He
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Yu, Zhongxing
    Shandong Univ, Sch Comp Sci & Technol, Jinan 266237, Peoples R China..
    A Software-Repair Robot Based on Continual Learning2021In: IEEE Software, ISSN 0740-7459, E-ISSN 1937-4194, Vol. 38, no 4, p. 28-35Article in journal (Refereed)
    Abstract [en]

    Software bugs are common, and correcting them accounts for a significant portion of the costs in the software development and maintenance process. In this article, we discuss R-Hero, our novel system for learning how to fix bugs based on continual training.

  • 43.
    Baudry, Benoit
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Harrand, Nicolas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Schulte, E.
    Timperley, C.
    Tan, S. H.
    Selakovic, M.
    Ugherughe, E.
    A spoonful of DevOps helps the GI go down2018In: Proceedings - International Conference on Software Engineering, IEEE Computer Society , 2018, p. 35-36Conference paper (Refereed)
    Abstract [en]

    DevOps emphasizes a high degree of automation at all phases of the software development lifecycle. Meanwhile, Genetic Improvement (GI) focuses on the automatic improvement of software artifacts. In this paper, we discuss why we believe that DevOps offers an excellent technical context for easing the adoption of GI techniques by software developers. We also discuss A/B testing as a prominent and clear example of GI taking place in the wild today, albeit one with human-supervised fitness and mutation operators.

  • 44.
    Baudry, Benoit
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Monperrus, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Dynamic Analysis in the Browser2019Other (Other (popular science, discussion, etc.))
  • 45.
    Baudry, Benoit
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Monperrus, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Science-changing Code2021Other (Other (popular science, discussion, etc.))
  • 46.
    Baudry, Benoit
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Monperrus, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Testing beyond coverage2021In: Increment, ISSN 2832-6598, Vol. Feb, no 16Article in journal (Other (popular science, discussion, etc.))
  • 47.
    Baudry, Benoit
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Toady, Tim
    KTH.
    Monperrus, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.
    Long Live Software Easter Eggs!2022In: Queue, ISSN 1542-7730, Vol. 20, no 2, p. 31-42Article in journal (Refereed)
    Abstract [en]

    It's a period of unrest. Rebel developers, striking from continuous deployment servers, have won their first victory. During the battle, rebel spies managed to push an epic commit in the HTML code of https://pro.sony. Pursued by sinister agents, the rebels are hiding in commits, buttons, tooltips, API, HTTP headers, and configuration screens. 

  • 48.
    Behravesh, Rasoul
    et al.
    Fdn Bruno Kessler, Digital Society Ctr, SNESE Unit, Trento, Italy.;Univ Bologna, Dept Elect Elect & Informat Engn, I-40126 Bologna, Italy..
    Rao, Akhila
    Res Inst Sweden AB, Connected Intelligence, S-16440 Stockholm, Sweden..
    Perez-Ramirez, Daniel F.
    Res Inst Sweden AB, Connected Intelligence, S-16440 Stockholm, Sweden..
    Harutyunyan, Davit
    Robert Bosch GmbH, Corp Res, D-70465 Gerlingen, Germany..
    Riggio, Roberto
    Univ Politecn Marche, Informat Engn Dept, I-60121 Ancona, Italy..
    Boman, Magnus
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Machine Learning at the Mobile Edge: The Case of Dynamic Adaptive Streaming Over HTTP (DASH)2022In: IEEE Transactions on Network and Service Management, ISSN 1932-4537, E-ISSN 1932-4537, Vol. 19, no 4, p. 4779-4793Article in journal (Refereed)
    Abstract [en]

    Dynamic Adaptive Streaming over HTTP (DASH) is a standard for delivering video in segments and adapting each segment's bitrate (quality), to adjust to changing and limited network bandwidth. We study segment prefetching, informed by machine learning predictions of bitrates of client segment requests, implemented at the network edge. We formulate this client segment request prediction problem as a supervised learning problem of predicting the bitrate of a client's next segment request, in order to prefetch it at the mobile edge, with the objective of jointly improving the video streaming experience for the users and network bandwidth utilization for the service provider. The results of extensive evaluations showed a segment request prediction accuracy of close to 90% and reduced video segment access delay with a cache hit ratio of 58%, and reduced transport network load by lowering the backhaul link utilization by 60.91%.

  • 49.
    Benelallam, A.
    et al.
    Harrand, Nicolas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Soto Valero, César
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Baudry, Benoit
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Barais, O.
    The maven dependency graph: A temporal graph-based representation of maven central2019In: IEEE International Working Conference on Mining Software Repositories, Institute of Electrical and Electronics Engineers (IEEE) , 2019, Vol. 2019-May, p. 344-348, article id 8816814Conference paper (Refereed)
    Abstract [en]

    The Maven Central Repository provides an extraordinary source of data to understand complex architecture and evolution phenomena among Java applications. As of September 6, 2018, this repository includes 2.8M artifacts (compiled pieces of code implemented in a JVM-based language), each of which is characterized by metadata such as exact version, date of upload and list of dependencies towards other artifacts. Today, one who wants to analyze the complete ecosystem of Maven artifacts and their dependencies faces two key challenges: (i) this is a huge data set; and (ii) dependency relationships among artifacts are not modeled explicitly and cannot be queried. In this paper, we present the Maven Dependency Graph. This open source data set provides two contributions: a snapshot of the whole Maven Central taken on September 6, 2018, stored in a graph database in which we explicitly model all dependencies; and an open source infrastructure to query this huge data set.
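
    As a rough illustration of what modeling dependencies explicitly makes possible, the sketch below keeps artifacts and their dependency edges in a small in-memory graph and answers two queries: the transitive dependencies of an artifact, and the artifacts uploaded before a given date. It is only a toy stand-in under stated assumptions, not the authors' graph database or query infrastructure; the class name, method names, and the `org.example` coordinates are all hypothetical.

```python
from datetime import date


class DependencyGraph:
    """Toy in-memory graph of artifacts (hypothetical; not the paper's schema)."""

    def __init__(self):
        self.release_date = {}   # artifact coordinates -> upload date
        self.deps = {}           # artifact coordinates -> set of direct dependencies

    def add_artifact(self, coords, released):
        self.release_date[coords] = released
        self.deps.setdefault(coords, set())

    def add_dependency(self, source, target):
        # Explicit edge: `source` depends on `target`.
        self.deps.setdefault(source, set()).add(target)

    def transitive_dependencies(self, coords):
        # Depth-first traversal over the explicit dependency edges.
        seen = set()
        stack = [coords]
        while stack:
            current = stack.pop()
            for dep in self.deps.get(current, ()):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return seen

    def released_before(self, cutoff):
        # Temporal query: artifacts uploaded strictly before `cutoff`.
        return sorted(c for c, d in self.release_date.items() if d < cutoff)


# Made-up example data (assumption: coordinates follow group:artifact:version).
graph = DependencyGraph()
graph.add_artifact("org.example:app:1.0", date(2018, 5, 1))
graph.add_artifact("org.example:lib:2.1", date(2017, 3, 10))
graph.add_artifact("org.example:util:0.9", date(2016, 1, 20))
graph.add_dependency("org.example:app:1.0", "org.example:lib:2.1")
graph.add_dependency("org.example:lib:2.1", "org.example:util:0.9")

app_deps = graph.transitive_dependencies("org.example:app:1.0")
older = graph.released_before(date(2017, 12, 31))
```

    With edges stored explicitly, a question such as "what does this artifact ultimately depend on?" becomes a plain graph traversal rather than a scan over per-artifact metadata.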

  • 50.
    Bereza-Jarocinski, Robert
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control).
    Eriksson, Oscar
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Abdalmoaty, Mohamed R-H
    Uppsala Univ, Div Syst & Control, S-75105 Uppsala, Sweden..
    Broman, David
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Hjalmarsson, Håkan
    KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control).
    Stochastic Approximation for Identification of Non-Linear Differential-Algebraic Equations with Process Disturbances2022In: 2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), Institute of Electrical and Electronics Engineers (IEEE) , 2022, p. 6712-6717Conference paper (Refereed)
    Abstract [en]

    Differential-algebraic equations, commonly used to model physical systems, are the basis for many equation-based object-oriented modeling languages. When systems described by such equations are influenced by unknown process disturbances, estimating unknown parameters from experimental data becomes difficult. This is because of problems with the existence of well-defined solutions and the computational tractability of estimators. In this paper, we propose a way to minimize a cost function, whose minimizer is a consistent estimator of the true parameters, using stochastic gradient descent. This approach scales significantly better with the number of unknown parameters than other currently available methods for the same type of problem. The performance of the method is demonstrated through a simulation study with three unknown parameters. The experiments show a significantly reduced variance of the estimator, compared to an output error method neglecting the influence of process disturbances, as well as an ability to reduce the estimation bias of parameters that the output error method particularly struggles with.
