Maguire Jr., Gerald Q., professor
ORCID iD: orcid.org/0000-0002-6066-746X
Publications (10 of 304)
Barbette, T., Tang, C., Yao, H., Kostic, D., Maguire Jr., G. Q., Papadimitratos, P. & Chiesa, M. (2020). A High-Speed Load-Balancer Design with Guaranteed Per-Connection-Consistency. In: USENIX Association (Ed.), 17th USENIX Symposium on Networked Systems Design and Implementation. Paper presented at NSDI'20 (pp. 667-683). Santa Clara, CA, USA.
2020 (English). In: 17th USENIX Symposium on Networked Systems Design and Implementation / [ed] USENIX Association, Santa Clara, CA, USA, 2020, p. 667-683. Conference paper, Published paper (Refereed)
Abstract [en]

Large service providers use load balancers to dispatch millions of incoming connections per second towards thousands of servers. There are two basic yet critical requirements for a load balancer: uniform load distribution of the incoming connections across the servers and per-connection-consistency (PCC), i.e., the ability to map packets belonging to the same connection to the same server even in the presence of changes in the number of active servers and load balancers. Yet, meeting both these requirements at the same time has been an elusive goal. Today's load balancers minimize PCC violations at the price of non-uniform load distribution.

This paper presents Cheetah, a load balancer that supports uniform load distribution and PCC while being scalable, memory efficient, resilient to clogging attacks, and fast at processing packets. The Cheetah LB design guarantees PCC for any realizable server selection load balancing mechanism and can be deployed in both a stateless and stateful manner, depending on the operational needs. We implemented Cheetah on both a software and a Tofino-based hardware switch. Our evaluation shows that a stateless version of Cheetah guarantees PCC, has negligible packet processing overheads, and can support load balancing mechanisms that reduce the flow completion time by a factor of 2–3×.
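To make the PCC requirement concrete, the following minimal Python sketch contrasts a stateless hash-modulo dispatcher, which remaps existing connections when a server is added, with a dispatcher that remembers each connection's first assignment. It is an illustration only and is not Cheetah's design; the names `hash_mod_pick`, `StatefulBalancer`, and `count_pcc_violations` are hypothetical.

```python
import hashlib


def stable_hash(key):
    """Deterministic 64-bit hash of a connection identifier (illustrative)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def hash_mod_pick(conn_id, servers):
    """Stateless dispatch: the choice depends on the current server count."""
    return servers[stable_hash(conn_id) % len(servers)]


class StatefulBalancer:
    """Hypothetical dispatcher that pins each connection to its first server."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.table = {}  # connection id -> server chosen on first packet

    def pick(self, conn_id):
        # Only connections not seen before are hashed; known ones are looked up.
        if conn_id not in self.table:
            self.table[conn_id] = hash_mod_pick(conn_id, self.servers)
        return self.table[conn_id]

    def add_server(self, server):
        self.servers.append(server)


def count_pcc_violations(before, after):
    """Number of connections whose server changed between two snapshots."""
    return sum(1 for conn in before if before[conn] != after[conn])
```

Comparing the two mappings before and after growing a four-server pool to five shows the trade-off: the hash-modulo version moves existing connections, while the table-based version keeps them at the cost of per-connection memory.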

Place, publisher, year, edition, pages
Santa Clara, CA, USA, 2020
Keywords
load-balancer, cheetah, high-speed, connection consistency, pcc, p4, fastclick
National Category
Communication Systems; Computer Systems; Computer Sciences
Research subject
Computer Science; Telecommunication
Identifiers
urn:nbn:se:kth:diva-268968 (URN); 978-1-939133-13-7 (ISBN)
Conference
NSDI'20
Funder
Swedish Foundation for Strategic Research, TCC; EU, European Research Council, 770889
Note

QC 20200302

Available from: 2020-03-01; Created: 2020-03-01; Last updated: 2020-03-02; Bibliographically approved
Farshin, A., Roozbeh, A., Maguire Jr., G. Q. & Kostic, D. (2019). Make the Most out of Last Level Cache in Intel Processors. In: Proceedings of the Fourteenth EuroSys Conference (EuroSys'19), Dresden, Germany, 25-28 March 2019. Paper presented at EuroSys'19. ACM Digital Library.
2019 (English). In: Proceedings of the Fourteenth EuroSys Conference (EuroSys'19), Dresden, Germany, 25-28 March 2019, ACM Digital Library, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

In modern (Intel) processors, Last Level Cache (LLC) is divided into multiple slices and an undocumented hashing algorithm (aka Complex Addressing) maps different parts of memory address space among these slices to increase the effective memory bandwidth. After a careful study of Intel’s Complex Addressing, we introduce a slice-aware memory management scheme, wherein frequently used data can be accessed faster via the LLC. Using our proposed scheme, we show that a key-value store can potentially improve its average performance by ∼12.2% and ∼11.4% for 100% & 95% GET workloads, respectively. Furthermore, we propose CacheDirector, a network I/O solution which extends Direct Data I/O (DDIO) and places the packet’s header in the slice of the LLC that is closest to the relevant processing core. We implemented CacheDirector as an extension to DPDK and evaluated our proposed solution for latency-critical applications in Network Function Virtualization (NFV) systems. Evaluation results show that CacheDirector makes packet processing faster by reducing tail latencies (90-99th percentiles) by up to 119 µs (∼21.5%) for optimized NFV service chains that are running at 100 Gbps. Finally, we analyze the effectiveness of slice-aware memory management to realize cache isolation.

Place, publisher, year, edition, pages
ACM Digital Library, 2019
Keywords
Slice-aware Memory Management, Last Level Cache, Non-Uniform Cache Architecture, CacheDirector, DDIO, DPDK, Network Function Virtualization, Cache Partitioning, Cache Allocation Technology, Key-Value Store.
National Category
Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-244750 (URN); 10.1145/3302424.3303977 (DOI); 000470898700008 (); 2-s2.0-85063919722 (Scopus ID); 9781450362818 (ISBN)
Conference
EuroSys'19
Projects
Time-Critical Clouds; ULTRA; WASP
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research; EU, Horizon 2020, 770889
Note

QC 20190226

Available from: 2019-02-24; Created: 2019-02-24; Last updated: 2019-07-29; Bibliographically approved
Farshin, A., Roozbeh, A., Maguire Jr., G. Q. & Kostic, D. (2019). Make the Most out of Last Level Cache in Intel Processors. In: Proceedings of the Fourteenth EuroSys Conference (EuroSys'19), Dresden, Germany, 25-28 March 2019. Paper presented at Fourteenth EuroSys Conference (EuroSys'19), Dresden, Germany, 25-28 March 2019.
2019 (English). In: Proceedings of the Fourteenth EuroSys Conference (EuroSys'19), Dresden, Germany, 25-28 March 2019, 2019. Conference paper, Poster (with or without abstract) (Refereed) [Artistic work]
Keywords
Slice-aware Memory Management, Last Level Cache, Non-Uniform Cache Architecture, CacheDirector, DDIO, DPDK, Network Function Virtualization, Cache Partitioning, Cache Allocation Technology, Key-Value Store.
National Category
Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-269086 (URN)
Conference
Fourteenth EuroSys Conference (EuroSys'19), Dresden, Germany, 25-28 March 2019
Projects
Time-Critical Clouds; ULTRA; WASP
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research; EU, Horizon 2020, 770889
Note

QC 20200304

Available from: 2020-03-03; Created: 2020-03-03; Last updated: 2020-03-04; Bibliographically approved
Barbette, T., Katsikas, G. P., Maguire Jr., G. Q. & Kostic, D. (2019). RSS++: load and state-aware receive side scaling. In: ACM (Ed.), Proceedings of the 15th International Conference on emerging Networking EXperiments and Technologies. Paper presented at CoNEXT '19: The 15th International Conference on Emerging Networking Experiments And Technologies, Orlando, United States, 9-12 December 2019. Orlando, FL, USA: Association for Computing Machinery (ACM).
2019 (English). In: Proceedings of the 15th International Conference on emerging Networking EXperiments and Technologies / [ed] ACM, Orlando, FL, USA: Association for Computing Machinery (ACM), 2019. Conference paper, Published paper (Refereed)
Abstract [en]

While the current literature typically focuses on load-balancing among multiple servers, in this paper, we demonstrate the importance of load-balancing within a single machine (potentially with hundreds of CPU cores). In this context, we propose a new load-balancing technique (RSS++) that dynamically modifies the receive side scaling (RSS) indirection table to spread the load across the CPU cores in a more optimal way. RSS++ incurs up to 14x lower 95th percentile tail latency and orders of magnitude fewer packet drops compared to RSS under high CPU utilization. RSS++ allows higher CPU utilization and dynamic scaling of the number of allocated CPU cores to accommodate the input load, while avoiding the typical 25% over-provisioning. RSS++ has been implemented for both (i) DPDK and (ii) the Linux kernel. Additionally, we implement a new state migration technique, which facilitates sharding and reduces contention between CPU cores accessing per-flow data. RSS++ keeps the flow state in groups that can be migrated at once, leading to a 20% higher efficiency than a state-of-the-art shared flow table.
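The idea of rewriting an RSS indirection table to even out per-core load can be sketched as follows. This is a simple greedy heuristic written for illustration, not the algorithm from the paper; the table is modelled as a list mapping each hash bucket to a core, and `core_loads`, `rebalance`, and `max_moves` are hypothetical names.

```python
def core_loads(table, bucket_load, n_cores):
    """Sum the measured load of every bucket assigned to each core."""
    loads = [0] * n_cores
    for bucket, core in enumerate(table):
        loads[core] += bucket_load[bucket]
    return loads


def rebalance(table, bucket_load, n_cores, max_moves=8):
    """Greedily move buckets from the most loaded core to the least loaded one.

    table[b] is the core that currently receives hash bucket b.
    bucket_load[b] is the load observed for bucket b (for example, packets).
    Returns a new table; the input list is left unchanged.
    """
    table = list(table)
    for _ in range(max_moves):
        loads = core_loads(table, bucket_load, n_cores)
        hot = max(range(n_cores), key=loads.__getitem__)
        cold = min(range(n_cores), key=loads.__getitem__)
        gap = loads[hot] - loads[cold]
        # Only moves that strictly shrink the hot/cold gap are considered.
        candidates = [
            b for b, core in enumerate(table)
            if core == hot and 0 < bucket_load[b] < gap
        ]
        if not candidates:
            break
        # Prefer the bucket whose load is closest to half of the gap.
        best = min(candidates, key=lambda b: abs(bucket_load[b] - gap / 2))
        table[best] = cold
    return table
```

Because a bucket is moved only when its load is smaller than the gap between the two cores, the maximum per-core load never increases from one step to the next. Migrating a bucket also implies migrating the flow state that belongs to it, which this sketch does not model.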

Place, publisher, year, edition, pages
Orlando, FL, USA: Association for Computing Machinery (ACM), 2019
Keywords
networking, load-balancing, packet scheduling, high-speed networking, intra-server load-balancing, receive side scaling, network function virtualization, RSS++
National Category
Communication Systems; Computer Systems; Computer Sciences
Research subject
Information and Communication Technology; Computer Science
Identifiers
urn:nbn:se:kth:diva-263941 (URN); 10.1145/3359989.3365412 (DOI); 2-s2.0-85077231875 (Scopus ID); 978-1-4503-6998-5 (ISBN)
Conference
CoNEXT '19: The 15th International Conference on Emerging Networking Experiments And Technologies, Orlando, United States, 9-12 December 2019
Funder
Swedish Foundation for Strategic Research, TCC; EU, European Research Council, 770889
Note

QC 20191126

Available from: 2019-11-20; Created: 2019-11-20; Last updated: 2020-03-09; Bibliographically approved
Bogdanov, K., Reda, W., Maguire Jr., G. Q., Kostic, D. & Canini, M. (2018). Fast and accurate load balancing for geo-distributed storage systems. In: SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing. Paper presented at 2018 ACM Symposium on Cloud Computing, SoCC 2018, Carlsbad, United States, 11 October 2018 through 13 October 2018 (pp. 386-400). Association for Computing Machinery (ACM).
2018 (English). In: SoCC 2018 - Proceedings of the 2018 ACM Symposium on Cloud Computing, Association for Computing Machinery (ACM), 2018, p. 386-400. Conference paper, Published paper (Refereed)
Abstract [en]

The increasing density of globally distributed datacenters reduces the network latency between neighboring datacenters and allows replicated services deployed across neighboring locations to share workload when necessary, without violating strict Service Level Objectives (SLOs). We present Kurma, a practical implementation of a fast and accurate load balancer for geo-distributed storage systems. At run-time, Kurma integrates network latency and service time distributions to accurately estimate the rate of SLO violations for requests redirected across geo-distributed datacenters. Using these estimates, Kurma solves a decentralized rate-based performance model enabling fast load balancing (in the order of seconds) while taming global SLO violations. We integrate Kurma with Cassandra, a popular storage system. Using real-world traces along with a geo-distributed deployment across Amazon EC2, we demonstrate Kurma’s ability to effectively share load among datacenters while reducing SLO violations by up to a factor of 3 in high load settings or reducing the cost of running the service by up to 17%.
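As a rough illustration of how network latency and service time distributions can be combined into an estimated SLO violation rate, the sketch below pairs every latency sample with every service-time sample and counts the combinations that exceed the SLO. It assumes the two quantities are independent and is not Kurma's actual model; `slo_violation_rate` and its parameters are hypothetical names.

```python
from itertools import product


def slo_violation_rate(network_ms, service_ms, slo_ms):
    """Estimate the fraction of redirected requests that would miss the SLO.

    network_ms: sampled network latencies to the remote datacenter (ms).
    service_ms: sampled service times at the remote datacenter (ms).
    slo_ms:     latency bound a request must meet (ms).
    Assumes independence, so every (latency, service) pair is equally likely.
    """
    total = len(network_ms) * len(service_ms)
    if total == 0:
        raise ValueError("both sample lists must be non-empty")
    violations = sum(
        1 for net, svc in product(network_ms, service_ms) if net + svc > slo_ms
    )
    return violations / total
```

With latency samples of 10 and 20 ms, service-time samples of 5, 15, and 40 ms, and a 30 ms SLO, three of the six combinations exceed the bound, so the estimate is 0.5.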

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2018
Keywords
Cloud Computing, Distributed Systems, Server Load Balancing, Service Level Objectives, Wide Area Networks
National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-241481 (URN); 10.1145/3267809.3267820 (DOI); 000458692200031 (); 2-s2.0-85059006718 (Scopus ID); 9781450360111 (ISBN)
Conference
2018 ACM Symposium on Cloud Computing, SoCC 2018, Carlsbad, United States, 11 October 2018 through 13 October 2018
Funder
EU, Horizon 2020, 770889; Swedish Foundation for Strategic Research
Note

QC 20190123

Available from: 2019-01-23; Created: 2019-01-23; Last updated: 2019-10-17; Bibliographically approved
Bogdanov, K., Reda, W., Kostic, D., Maguire Jr., G. Q. & Canini, M. (2018). Kurma: Fast and Efficient Load Balancing for Geo-Distributed Storage Systems: Evaluation of Convergence and Scalability.
2018 (English). Report (Other academic)
Abstract [en]

This report provides an extended evaluation of Kurma, a practical implementation of a geo-distributed load balancer for backend storage systems. In this report we demonstrate the ability of distributed Kurma instances to accurately converge to the same solutions within 1% of the total datacenter’s capacity and the ability of Kurma to scale up to 8 datacenters using a single CPU core at each datacenter.

National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-222289 (URN)
Note

QR 20180212

Available from: 2018-02-05; Created: 2018-02-05; Last updated: 2018-02-12; Bibliographically approved
Roozbeh, A., Soares, J., Maguire Jr., G. Q., Wuhib, F., Padala, C., Mahloo, M., . . . Kostic, D. (2018). Software-Defined "Hardware" Infrastructures: A Survey on Enabling Technologies and Open Research Directions. IEEE Communications Surveys and Tutorials, 20(3), 2454-2485
2018 (English). In: IEEE Communications Surveys and Tutorials, ISSN 1553-877X, E-ISSN 1553-877X, Vol. 20, no 3, p. 2454-2485. Article in journal (Refereed) Published
Abstract [en]

This paper provides an overview of software-defined "hardware" infrastructures (SDHI). SDHI builds upon the concept of hardware (HW) resource disaggregation. HW resource disaggregation breaks today's physical server-oriented model where the use of a physical resource (e.g., processor or memory) is constrained to a physical server's chassis. SDHI extends the definition of software-defined infrastructures (SDI) and brings greater modularity, flexibility, and extensibility to cloud infrastructures, thus allowing cloud operators to employ resources more efficiently and allowing applications not to be bounded by the physical infrastructure's layout. This paper aims to be an initial introduction to SDHI and its associated technological advancements. This paper starts with an overview of the cloud domain and puts into perspective some of the most prominent efforts in the area. Then, it presents a set of differentiating use-cases that SDHI enables. Next, we state the fundamentals behind SDI and SDHI, and elaborate why SDHI is of great interest today. Moreover, it provides an overview of the functional architecture of a cloud built on SDHI, exploring how the impact of this transformation goes far beyond the cloud infrastructure level in its impact on platforms, execution environments, and applications. Finally, an in-depth assessment is made of the technologies behind SDHI, the impact of these technologies, and the associated challenges and potential future directions of SDHI.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018
Keywords
CR-software-defined infrastructure, resource disaggregation, cloud infrastructure, rack-scale, hyperscale computing, disaggregated DC
National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-235270 (URN); 10.1109/COMST.2018.2834731 (DOI); 000443030500033 (); 2-s2.0-85046804138 (Scopus ID)
Funder
Swedish Foundation for Strategic Research; Knut and Alice Wallenberg Foundation
Note

QC 20180919

Available from: 2018-09-19; Created: 2018-09-19; Last updated: 2018-11-23; Bibliographically approved
Yalew, S. D., Maguire Jr., G. Q., Haridi, S. & Correia, M. (2017). Hail to the Thief: Protecting Data from Mobile Ransomware with ransomSafeDroid. In: Gkoulalasdivanis, A; Correia, MP; Avresky, DR (Ed.), 2017 IEEE 16th International Symposium on Network Computing and Applications, NCA 2017. Paper presented at 16th IEEE International Symposium on Network Computing and Applications, NCA 2017, Cambridge, United States, 30 October 2017 through 1 November 2017 (pp. 351-358). Institute of Electrical and Electronics Engineers (IEEE), 2017.
2017 (English). In: 2017 IEEE 16th International Symposium on Network Computing and Applications, NCA 2017 / [ed] Gkoulalasdivanis, A; Correia, MP; Avresky, DR, Institute of Electrical and Electronics Engineers (IEEE), 2017, Vol. 2017, p. 351-358. Conference paper, Published paper (Refereed)
Abstract [en]

The growing popularity of Android and the increasing amount of sensitive data stored in mobile devices have led to the dissemination of Android ransomware. Ransomware is a class of malware that makes data inaccessible by blocking access to the device or, more frequently, by encrypting the data; to recover the data, the user has to pay a ransom to the attacker. A solution for this problem is to back up the data. Although backup tools are available for Android, these tools may be compromised or blocked by the ransomware itself. This paper presents the design and implementation of RANSOMSAFEDROID, a TrustZone-based backup service for mobile devices. RANSOMSAFEDROID is protected from malware by leveraging the ARM TrustZone extension and running in the secure world. It periodically backs up files to a secure local persistent partition and pushes these backups to external storage to protect them from ransomware. Initially, RANSOMSAFEDROID does a full backup of the device filesystem, then it does incremental backups that save the changes since the last backup. As a proof-of-concept, we implemented a RANSOMSAFEDROID prototype and provide a performance evaluation using an i.MX53 development board.
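The full-then-incremental backup pattern mentioned in the abstract can be sketched in a few lines. This is a generic illustration that detects changes by comparing SHA-256 digests of file contents; it does not reflect how RANSOMSAFEDROID is implemented, and the in-memory `files` dictionary and the function names are hypothetical.

```python
import hashlib


def digest(data):
    """SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def full_backup(files):
    """Copy every file and record a digest index for later comparison.

    files maps a path (str) to the file's contents (bytes).
    """
    snapshot = dict(files)
    index = {path: digest(data) for path, data in files.items()}
    return snapshot, index


def incremental_backup(files, index):
    """Return only what changed since the index was recorded.

    The result is (changed, deleted, new_index): files that are new or whose
    contents differ, paths that disappeared, and the refreshed digest index.
    """
    changed = {
        path: data for path, data in files.items()
        if index.get(path) != digest(data)
    }
    deleted = sorted(path for path in index if path not in files)
    new_index = {path: digest(data) for path, data in files.items()}
    return changed, deleted, new_index
```

A caller would run `full_backup` once, then pass the returned index to each `incremental_backup` call and keep the refreshed index for the next round.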

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-225237 (URN); 10.1109/NCA.2017.8171377 (DOI); 000426971900053 (); 2-s2.0-85046532213 (Scopus ID); 9781538614655 (ISBN)
Conference
16th IEEE International Symposium on Network Computing and Applications, NCA 2017, Cambridge, United States, 30 October 2017 through 1 November 2017
Note

QC 20180403

Available from: 2018-04-03; Created: 2018-04-03; Last updated: 2018-05-22; Bibliographically approved
Yalew, S. D., Mendonca, P., Maguire Jr., G. Q., Haridi, S. & Correia, M. (2017). TruApp: A TrustZone-based Authenticity Detection Service for Mobile Apps. In: 2017 IEEE 13TH INTERNATIONAL CONFERENCE ON WIRELESS AND MOBILE COMPUTING, NETWORKING AND COMMUNICATIONS (WIMOB). Paper presented at 13th IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), OCT 09-11, 2017, Rome, ITALY. IEEE.
2017 (English). In: 2017 IEEE 13TH INTERNATIONAL CONFERENCE ON WIRELESS AND MOBILE COMPUTING, NETWORKING AND COMMUNICATIONS (WIMOB), IEEE, 2017. Conference paper, Published paper (Refereed)
Abstract [en]

In less than a decade, mobile apps became an integral part of our lives. In several situations it is important to provide assurance that a mobile app is authentic, i.e., that it is indeed the app produced by a certain company. However, this is challenging, as such apps can be repackaged, the user may be malicious, or the app may be tampered with by an attacker. This paper presents the design of TRUAPP, a software authentication service that provides assurance of the authenticity and integrity of apps running on mobile devices. TRUAPP provides such assurance, even if the operating system is compromised, by leveraging the ARM TrustZone hardware security extension. TRUAPP uses a set of techniques (static watermarking, dynamic watermarking, and cryptographic hashes) to verify the integrity of the apps. The service was implemented in a hardware board that emulates a mobile device, which was used to do a thorough experimental evaluation of the service.

Place, publisher, year, edition, pages
IEEE, 2017
Series
IEEE International Conference on Wireless and Mobile Computing Networking and Communications-WiMOB, ISSN 2160-4886
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-222218 (URN); 10.1109/WiMOB.2017.8115820 (DOI); 000419818000108 (); 2-s2.0-85041407068 (Scopus ID); 978-1-5386-3839-2 (ISBN)
Conference
13th IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), OCT 09-11, 2017, Rome, ITALY
Note

QC 20180205

Available from: 2018-02-05; Created: 2018-02-05; Last updated: 2019-04-15; Bibliographically approved
Olivecrona, H., Maguire, G. Q., Noz, M. E., Zeleznik, M. P., Kesteris, U. & Weidenhielm, L. (2016). A CT method for following patients with both prosthetic replacement and implanted tantalum beads: preliminary analysis with a pelvic model and in seven patients. Journal of Orthopaedic Surgery and Research, 11, Article ID 27.
2016 (English). In: Journal of Orthopaedic Surgery and Research, ISSN 1749-799X, E-ISSN 1749-799X, Vol. 11, article id 27. Article in journal (Refereed) Published
Abstract [en]

Background: Radiostereometric analysis (RSA) is often used for evaluating implanted devices over time. Following patients who have had tantalum beads implanted as markers in conjunction with joint replacements is important for longitudinal evaluation of these patients and for those with similar implants. As doing traditional RSA imaging is exacting and limited to specialized centers, it is important to consider alternative techniques for this ongoing evaluation. This paper studies the use of computed tomography (CT) to evaluate over time tantalum beads which have been implanted as markers.

Methods: The project uses both a hip model implanted with tantalum beads, acquired in several orientations, at two different CT energy levels, and a cohort of seven patients. The model was evaluated twice by the same observer with a 1-week interval. All CT volumes were analyzed using a semi-automated 3D volume fusion (spatial registration) tool which provides landmark-based fusion of two volumes, registering a target volume with a reference volume using a rigid body 3D algorithm. The mean registration errors as well as the accuracy and repeatability of the method were evaluated.

Results: For the relative movement in the model at 120 kVp, the mean registration error was 0.16 mm, the maximum value of repeatability was 0.02 degrees and 0.1 mm, and the accuracy was 0.36 degrees and 0.13 mm; at 100 kVp, the corresponding values were 0.21 mm, 0.04 degrees and 0.01 mm, and 0.39 degrees and 0.12 mm. For the patients, the mean registration errors per patient ranged from 0.08 to 0.35 mm. These results are comparable to those in typical clinical RSA trials. This technique successfully evaluated two patients who would have been lost from the cohort if only RSA were used.

Conclusions: The proposed technique can be used to evaluate patients with tantalum beads over time without the need for stereoradiographs. Further, the effective dose associated with CT is decreasing.

Place, publisher, year, edition, pages
BioMed Central, 2016
Keywords
Radiostereometry, Longitudinal studies, CT analysis, RSA
National Category
Medical Image Processing
Identifiers
urn:nbn:se:kth:diva-183622 (URN); 10.1186/s13018-016-0360-7 (DOI); 000370777100001 (); 26911571 (PubMedID); 2-s2.0-84959129104 (Scopus ID)
Note

QC 20160319

Available from: 2016-03-19; Created: 2016-03-18; Last updated: 2020-03-09; Bibliographically approved