kth.se Publications
Maguire Jr., Gerald Q., professor. ORCID iD: orcid.org/0000-0002-6066-746X
Publications (10 of 328)
Verardo, G., Barreira, D., Chiesa, M., Kostic, D. & Maguire Jr., G. Q. (2023). Fast Server Learning Rate Tuning for Coded Federated Dropout. In: Goebel, R., Yu, H., Faltings, B., Fan, L. & Xiong, Z. (Eds.), FL 2022: Trustworthy Federated Learning. Paper presented at 1st International Workshop on Trustworthy Federated Learning (FL), JUL 23, 2022, Vienna, AUSTRIA (pp. 84-99). Springer Nature, 13448
Fast Server Learning Rate Tuning for Coded Federated Dropout
2023 (English). In: FL 2022: Trustworthy Federated Learning / [ed] Goebel, R., Yu, H., Faltings, B., Fan, L. & Xiong, Z., Springer Nature, 2023, Vol. 13448, p. 84-99. Conference paper, Published paper (Refereed)
Abstract [en]

In Federated Learning (FL), clients with low computational power train a common machine learning model by exchanging parameter updates instead of transmitting potentially private data. Federated Dropout (FD) is a technique that improves the communication efficiency of an FL session by selecting a subset of model parameters to be updated in each training round. However, compared to standard FL, FD produces considerably lower accuracy and has a longer convergence time. In this chapter, we leverage coding theory to enhance FD by allowing different sub-models to be used at each client. We also show that by carefully tuning the server learning rate hyper-parameter, we can achieve higher training speed while still reaching the same final accuracy as the no-dropout case. Evaluations on the EMNIST dataset show that our mechanism achieves 99.6% of the final accuracy of the no-dropout case while requiring 2.43x less bandwidth to reach this level of accuracy.
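
To make the mechanism above concrete, the following minimal Python sketch illustrates the general idea of assigning each client its own sub-model mask and scaling the aggregated update by a tunable server learning rate. The mask construction, aggregation rule, and candidate learning rates are illustrative assumptions for this example, not the algorithm published in the paper.

```python
# Hedged sketch: per-client sub-model masks plus a tunable server learning rate.
# The shifted-mask pattern and the tuning loop below are assumptions for the
# example, not the paper's coded construction or tuning procedure.
import numpy as np

rng = np.random.default_rng(0)
model_size, num_clients, keep = 10, 5, 6  # keep 6 of 10 parameters per client

def coded_masks(model_size, num_clients, keep):
    """Give each client a distinct parameter subset so that, jointly,
    the clients cover the full model (a stand-in for the coded assignment)."""
    masks = []
    for c in range(num_clients):
        idx = (np.arange(keep) + c * keep) % model_size  # shifted, overlapping subsets
        mask = np.zeros(model_size, dtype=bool)
        mask[idx] = True
        masks.append(mask)
    return masks

def aggregate(global_w, client_updates, masks, server_lr):
    """FedAvg-style aggregation restricted to each client's sub-model,
    scaled by the server learning rate."""
    delta = np.zeros_like(global_w)
    counts = np.zeros_like(global_w)
    for upd, mask in zip(client_updates, masks):
        delta[mask] += upd[mask]
        counts[mask] += 1
    counts[counts == 0] = 1
    return global_w + server_lr * delta / counts

global_w = rng.normal(size=model_size)
masks = coded_masks(model_size, num_clients, keep)
updates = [rng.normal(scale=0.1, size=model_size) for _ in range(num_clients)]
for lr in (0.5, 1.0, 2.0):  # candidate server learning rates to compare
    print(lr, np.linalg.norm(aggregate(global_w, updates, masks, lr) - global_w))
```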

Place, publisher, year, edition, pages
Springer Nature, 2023
Series
Lecture Notes in Artificial Intelligence, ISSN 2945-9133
Keywords
Federated Learning, Hyper-parameters tuning, Coding Theory
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-330513 (URN), 10.1007/978-3-031-28996-5_7 (DOI), 000999818400007 (ISI), 2-s2.0-85152560522 (Scopus ID)
Conference
1st International Workshop on Trustworthy Federated Learning (FL), JUL 23, 2022, Vienna, AUSTRIA
Note

QC 20230630

Available from: 2023-06-30 Created: 2023-06-30 Last updated: 2023-06-30. Bibliographically approved
Barbette, T., Wu, E., Kostic, D., Maguire Jr., G. Q., Papadimitratos, P. & Chiesa, M. (2022). Cheetah: A High-Speed Programmable Load-Balancer Framework with Guaranteed Per-Connection-Consistency. IEEE/ACM Transactions on Networking, 30(1), 354-367
Cheetah: A High-Speed Programmable Load-Balancer Framework with Guaranteed Per-Connection-Consistency
2022 (English). In: IEEE/ACM Transactions on Networking, ISSN 1063-6692, E-ISSN 1558-2566, Vol. 30, no 1, p. 354-367. Article in journal (Refereed), Published
Abstract [en]

Large service providers use load balancers to dispatch millions of incoming connections per second towards thousands of servers. There are two basic yet critical requirements for a load balancer: uniform load distribution of the incoming connections across the servers, which requires support for advanced load balancing mechanisms, and per-connection-consistency (PCC), i.e., the ability to map packets belonging to the same connection to the same server even in the presence of changes in the number of active servers and load balancers. Yet, simultaneously meeting these requirements has been an elusive goal. Today's load balancers minimize PCC violations at the price of non-uniform load distribution. This paper presents Cheetah, a load balancer that supports advanced load balancing mechanisms and PCC while being scalable, memory efficient, fast at processing packets, and as resilient to clogging attacks as today's load balancers. The Cheetah LB design guarantees PCC for any realizable server selection load balancing mechanism and can be deployed in both a stateless and a stateful manner, depending on operational needs. We implemented Cheetah on both a software and a Tofino-based hardware switch. Our evaluation shows that a stateless version of Cheetah guarantees PCC, has negligible packet processing overheads, and can support load balancing mechanisms that reduce the flow completion time by a factor of 2-3×.
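
As an illustration of how a connection-carried cookie can decouple per-connection-consistency from the server-selection policy, here is a minimal Python sketch. The header field, hashing scheme, and least-loaded policy are assumptions made for the example and do not reproduce the Cheetah switch implementation.

```python
# Hedged sketch of the general "cookie encodes the server choice" idea behind
# per-connection-consistency; the obfuscation, policy, and connection key below
# are illustrative assumptions, not the Cheetah data plane.
import hashlib

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def pick_server_least_loaded(loads):
    # Any server-selection policy can be plugged in here.
    return min(range(len(loads)), key=lambda i: loads[i])

def obfuscate(index, conn_id, secret=b"lb-secret"):
    # Hide the server index so clients cannot trivially target one server.
    h = hashlib.blake2b(conn_id.encode() + secret, digest_size=2).digest()
    return index ^ int.from_bytes(h, "big")

def deobfuscate(cookie, conn_id, secret=b"lb-secret"):
    h = hashlib.blake2b(conn_id.encode() + secret, digest_size=2).digest()
    return cookie ^ int.from_bytes(h, "big")

loads = [3, 1, 2]
conn = "198.51.100.7:44321->203.0.113.5:443"

# First packet of a connection: run the load-balancing policy, stamp a cookie.
idx = pick_server_least_loaded(loads)
cookie = obfuscate(idx, conn)

# Later packets carry the cookie, so the mapping survives policy or pool changes.
assert servers[deobfuscate(cookie, conn)] == servers[idx]
```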

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
Cloud networks, Layer 4 load balancing, P4, Per-connection-consistency, Programmable networks, QUIC, Stateful classification, Stateless load balancing, TCP, Electric power plant loads, Network layers, Servers, Load modeling, Load-Balancing, Programmable network, QUIC., Resilience, Hash functions
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-312304 (URN), 10.1109/TNET.2021.3113370 (DOI), 000732385800001 (ISI), 2-s2.0-85116873307 (Scopus ID)
Note

QC 20220530

Available from: 2022-05-30 Created: 2022-05-30 Last updated: 2022-06-25. Bibliographically approved
Ghasemirahni, H., Barbette, T., Katsikas, G. P., Farshin, A., Roozbeh, A., Girondi, M., . . . Kostic, D. (2022). Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets. In: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022. Paper presented at 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), APR 04-06, 2022, Renton, WA (pp. 807-827). USENIX - The Advanced Computing Systems Association
Packet Order Matters! Improving Application Performance by Deliberately Delaying Packets
2022 (English). In: Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2022, USENIX - The Advanced Computing Systems Association, 2022, p. 807-827. Conference paper, Published paper (Refereed)
Abstract [en]

Data centers increasingly deploy commodity servers with high-speed network interfaces to enable low-latency communication. However, achieving low latency at high data rates crucially depends on how the incoming traffic interacts with the system's caches. When packets that need to be processed in the same way are consecutive, i.e., exhibit high temporal and spatial locality, caches deliver great benefits.

In this paper, we systematically study the impact of temporal and spatial traffic locality on the performance of commodity servers equipped with high-speed network interfaces. Our results show that (i) the performance of a variety of widely deployed applications degrades substantially with even the slightest lack of traffic locality, and (ii) a traffic trace from our organization reveals poor traffic locality as networking protocols, drivers, and the underlying switching/routing fabric spread packets out in time (reducing locality). To address these issues, we built Reframer, a software solution that deliberately delays packets and reorders them to increase traffic locality. Despite introducing μs-scale delays of some packets, we show that Reframer increases the throughput of a network service chain by up to 84% and reduces the flow completion time of a web server by 11% while improving its throughput by 20%.
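
The core idea of deliberately delaying and reordering packets can be sketched in a few lines of Python: buffer a small window of packets, then release them grouped by flow so that packets needing the same processing become consecutive. The flow key, window size, and packet representation below are assumptions for illustration only, not Reframer's actual data path.

```python
# Hedged sketch of the "briefly delay, then release packets grouped by flow"
# idea; real systems operate on NIC batches with microsecond budgets.
from collections import defaultdict

def reframe(packets, window=32):
    """Buffer up to `window` packets, then emit them grouped by flow so that
    packets sharing processing state become consecutive (better cache locality)."""
    out, buf = [], []
    for pkt in packets:
        buf.append(pkt)
        if len(buf) >= window:
            out.extend(flush(buf))
            buf = []
    out.extend(flush(buf))
    return out

def flush(buf):
    by_flow = defaultdict(list)
    for pkt in buf:
        by_flow[(pkt["src"], pkt["dst"], pkt["proto"])].append(pkt)
    for flow_pkts in by_flow.values():  # release one flow at a time
        yield from flow_pkts

# Interleaved traffic from two flows becomes two consecutive runs.
pkts = [{"src": "a" if i % 2 else "b", "dst": "x", "proto": 6, "seq": i}
        for i in range(8)]
print([(p["src"], p["seq"]) for p in reframe(pkts, window=8)])
```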

Place, publisher, year, edition, pages
USENIX - The Advanced Computing Systems Association, 2022
Keywords
packet ordering, spatial and temporal locality, packet scheduling, batch processing, high-speed networking
National Category
Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-304656 (URN), 000876762200046 (ISI), 2-s2.0-85140983450 (Scopus ID)
Conference
19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), APR 04-06, 2022, Renton, WA
Projects
ULTRA, WASP, Time-Critical Clouds
Funder
Swedish Foundation for Strategic Research; Knut and Alice Wallenberg Foundation; EU, European Research Council
Note

QC 20230619

Available from: 2021-11-09 Created: 2021-11-09 Last updated: 2023-06-19. Bibliographically approved
Katsikas, G. P., Barbette, T., Kostic, D., Maguire Jr., G. Q. & Steinert, R. (2021). Metron: High-Performance NFV Service Chaining Even in the Presence of Blackboxes. ACM Transactions on Computer Systems, 38(1-2), 1-45, Article ID 3.
Metron: High-Performance NFV Service Chaining Even in the Presence of Blackboxes
2021 (English). In: ACM Transactions on Computer Systems, ISSN 0734-2071, E-ISSN 1557-7333, Vol. 38, no 1-2, p. 1-45, article id 3. Article in journal (Refereed), Published
Abstract [en]

Deployment of 100 Gigabit Ethernet (GbE) links challenges the packet processing limits of commodity hardware used for Network Functions Virtualization (NFV). Moreover, realizing chained network functions (i.e., service chains) necessitates the use of multiple CPU cores, or even multiple servers, to process packets from such high speed links.

Our system Metron jointly exploits the underlying network and commodity servers' resources: (i) it offloads part of the packet processing logic to the network, (ii) it uses smart tagging to set up and exploit the affinity of traffic classes, and (iii) it uses tag-based hardware dispatching to carry out the remaining packet processing at the speed of the servers' cores, with zero inter-core communication. Moreover, Metron transparently integrates, manages, and load balances proprietary "blackboxes" together with Metron service chains.

Metron realizes stateful network functions at the speed of 100 GbE network cards on a single server, while elastically and rapidly adapting to changing workload volumes. Our experiments demonstrate that Metron service chains can coexist with heterogeneous blackboxes, while still leveraging Metron's accurate dispatching and load balancing. In summary, Metron achieves (i) 2.75-8× better efficiency, (ii) up to 4.7× lower latency, and (iii) 7.8× higher throughput than OpenBox, a state-of-the-art NFV system.
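
As a rough illustration of the tag-based dispatching idea described above, the Python sketch below tags packets with a traffic class upstream and then selects a core purely from that tag, with no shared state between cores. The class table, tag values, and core count are hypothetical; Metron itself performs this with programmable switches and NIC hardware dispatching.

```python
# Hedged sketch of tag-based dispatching: an upstream element attaches a
# traffic-class tag, and the tag alone selects the core, so no inter-core
# hand-off is needed. Values below are illustrative assumptions.
NUM_CORES = 4

traffic_classes = {            # (dst_port, proto) -> traffic-class tag
    (80, "tcp"): 0,
    (443, "tcp"): 1,
    (53, "udp"): 2,
}

def classify_and_tag(pkt):
    """Runs 'in the network' (e.g., a programmable switch): attach a tag."""
    pkt["tag"] = traffic_classes.get((pkt["dport"], pkt["proto"]), 3)
    return pkt

def dispatch(pkt):
    """Runs on the server: the tag maps directly to a core, no shared state."""
    return pkt["tag"] % NUM_CORES

pkt = classify_and_tag({"dport": 443, "proto": "tcp"})
print("core", dispatch(pkt))
```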

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
elasticity, service chains, hardware offloading, accurate dispatching, 100 GbE, load balancing, tagging, blackboxes, NFV
National Category
Communication Systems; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-298691 (URN), 10.1145/3465628 (DOI), 000679809300003 (ISI), 2-s2.0-85111657554 (Scopus ID)
Projects
European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 770889); Swedish Foundation for Strategic Research (SSF)
Note

QC 20210712

Available from: 2021-07-11 Created: 2021-07-11 Last updated: 2024-03-15
Farshin, A., Barbette, T., Roozbeh, A., Maguire Jr., G. Q. & Kostic, D. (2021). PacketMill: Toward Per-Core 100-Gbps Networking. In: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). Paper presented at 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’21), 19–23 April, 2021, Virtual/Online. ACM Digital Library
PacketMill: Toward Per-Core 100-Gbps Networking
2021 (English). In: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), ACM Digital Library, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

We present PacketMill, a system for optimizing software packet processing, which (i) introduces a new model to efficiently manage packet metadata and (ii) employs code-optimization techniques to better utilize commodity hardware. PacketMill grinds the whole packet processing stack, from the high-level network function configuration file to the low-level userspace network (specifically DPDK) drivers, to mitigate inefficiencies and produce a customized binary for a given network function. Our evaluation results show that PacketMill increases throughput (by up to 36.4 Gbps, i.e., 70%) and reduces latency (by up to 101 µs, i.e., 28%), and enables nontrivial packet processing (e.g., a router) at ~100 Gbps, when new packets arrive >10× faster than main memory access times, while using only one processing core.
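
As a loose analogy for specializing the processing path from a high-level configuration, the Python sketch below resolves a pipeline description into a single composed function ahead of time instead of interpreting it per packet. The element names and pipeline are hypothetical; PacketMill itself rewrites C/LLVM code and DPDK metadata layouts, which this sketch does not attempt to reproduce.

```python
# Hedged sketch of build-time specialization from a high-level configuration:
# the element chain is resolved once, then every packet runs a fixed chain.
def decrement_ttl(pkt):
    pkt["ttl"] -= 1
    return pkt

def rewrite_dst_mac(pkt):
    pkt["dst_mac"] = "aa:bb:cc:dd:ee:ff"
    return pkt

ELEMENTS = {"DecTTL": decrement_ttl, "SetMAC": rewrite_dst_mac}

def specialize(config):
    """Resolve the element chain once, at 'build time'."""
    stages = [ELEMENTS[name] for name in config]
    def run(pkt):
        for stage in stages:   # fixed chain applied to each packet
            pkt = stage(pkt)
        return pkt
    return run

router = specialize(["DecTTL", "SetMAC"])
print(router({"ttl": 64, "dst_mac": None}))
```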

Place, publisher, year, edition, pages
ACM Digital Library, 2021
Keywords
PacketMill, X-Change, Packet Processing, Metadata Management, 100-Gbps Networking, Middleboxes, Commodity Hardware, LLVM, Compiler Optimizations, Full-Stack Optimization, FastClick, DPDK.
National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-289665 (URN), 10.1145/3445814.3446724 (DOI), 000829871000001 (ISI), 2-s2.0-85104694209 (Scopus ID)
Conference
26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’21), 19–23 April, 2021, Virtual/Online
Projects
Time-Critical Clouds, ULTRA, WASP
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research; EU, Horizon 2020, 770889
Note

Part of proceedings: ISBN 978-1-4503-8317-2

QC 20210210

Available from: 2021-02-10 Created: 2021-02-10 Last updated: 2024-03-15. Bibliographically approved
Katsikas, G. P., Barbette, T., Chiesa, M., Kostic, D. & Maguire Jr., G. Q. (2021). What you need to know about (Smart) Network Interface Cards. In: Springer International Publishing (Ed.), Proceedings Passive and Active Measurement - 22nd International Conference, PAM 2021. Paper presented at Passive and Active Measurement - 22nd International Conference, PAM 2021, Virtual Event, March 29 - April 1, 2021. Springer Nature
What you need to know about (Smart) Network Interface Cards
2021 (English). In: Proceedings Passive and Active Measurement - 22nd International Conference, PAM 2021 / [ed] Springer International Publishing, Springer Nature, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

Network interface cards (NICs) are fundamental components of modern high-speed networked systems, supporting multi-100 Gbps speeds and increasing programmability. Offloading computation from a server's CPU to a NIC frees a substantial amount of the server's CPU resources, making NICs key to offering competitive cloud services. Therefore, understanding the performance benefits and limitations of offloading a networking application to a NIC is of paramount importance.

In this paper, we measure the performance of four different NICs from one of the largest NIC vendors worldwide, supporting 100 Gbps and 200 Gbps. We show that while today's NICs can easily support multi-hundred-gigabit throughputs, performing frequent update operations of a NIC's packet classifier, as network address translators (NATs) and load balancers would do for each incoming connection, results in a dramatic throughput reduction of up to 70 Gbps or complete denial of service. Our conclusion is that none of the tested NICs can support high-speed networking applications that require keeping track of a large number of frequently arriving incoming connections. Furthermore, we show a variety of counter-intuitive performance artefacts, including the performance impact of using multiple tables to classify flows of packets.
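
The measurement methodology can be pictured with a small Python sketch that tracks lookup throughput while rules are inserted into a (here simulated) flow classifier. The classifier, rule format, and timing loop are stand-ins chosen for illustration; the paper measures real NIC hardware.

```python
# Hedged sketch of interleaving classifier rule updates with packet forwarding
# and measuring the resulting lookup rate. All components are simulated.
import time, random

class FlowTable:
    def __init__(self):
        self.rules = {}
    def insert(self, match, action):
        time.sleep(0.0001)          # stand-in for the per-rule update cost
        self.rules[match] = action
    def lookup(self, pkt):
        return self.rules.get(pkt, "default")

table = FlowTable()
packets = [random.randrange(1000) for _ in range(20000)]

start, forwarded, inserted = time.perf_counter(), 0, 0
for i, pkt in enumerate(packets):
    table.lookup(pkt)
    forwarded += 1
    if i % 100 == 0:                # interleave rule updates with forwarding
        table.insert(pkt, "srv%d" % (inserted % 4))
        inserted += 1
elapsed = time.perf_counter() - start
print("%.0f lookups/s with %d rule inserts" % (forwarded / elapsed, inserted))
```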

Place, publisher, year, edition, pages
Springer Nature, 2021
Series
Lecture Notes in Computer Science ; 12671
Keywords
Network interface cards, hardware classifier, offloading, rule operations, performance, benchmarking, 100 GbE
National Category
Computer Systems; Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-292353 (URN), 10.1007/978-3-030-72582-2_19 (DOI), 000788003900019 (ISI), 2-s2.0-85107297942 (Scopus ID)
Conference
Passive and Active Measurement - 22nd International Conference, PAM 2021, Virtual Event, March 29 - April 1, 2021
Funder
European Commission, 770889; Swedish Foundation for Strategic Research, TCC
Note

QC 20220524

Available from: 2021-03-30 Created: 2021-03-30 Last updated: 2022-06-25
Barbette, T., Tang, C., Yao, H., Kostic, D., Maguire Jr., G. Q., Papadimitratos, P. & Chiesa, M. (2020). A High-Speed Load-Balancer Design with Guaranteed Per-Connection-Consistency. In: USENIX Association (Ed.), Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020. Paper presented at 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020, Santa Clara, 25-27 February 2020 (pp. 667-683). Santa Clara, CA, USA: USENIX Association
A High-Speed Load-Balancer Design with Guaranteed Per-Connection-Consistency
2020 (English). In: Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020 / [ed] USENIX Association, Santa Clara, CA, USA: USENIX Association, 2020, p. 667-683. Conference paper, Published paper (Refereed)
Abstract [en]

Large service providers use load balancers to dispatch millions of incoming connections per second towards thousands of servers. There are two basic yet critical requirements for a load balancer: uniform load distribution of the incoming connections across the servers and per-connection-consistency (PCC), i.e., the ability to map packets belonging to the same connection to the same server even in the presence of changes in the number of active servers and load balancers. Yet, meeting both these requirements at the same time has been an elusive goal. Today's load balancers minimize PCC violations at the price of non-uniform load distribution.

This paper presents Cheetah, a load balancer that supports uniform load distribution and PCC while being scalable, memory efficient, resilient to clogging attacks, and fast at processing packets. The Cheetah LB design guarantees PCC for any realizable server selection load balancing mechanism and can be deployed in both a stateless and stateful manner, depending on the operational needs. We implemented Cheetah on both a software and a Tofino-based hardware switch. Our evaluation shows that a stateless version of Cheetah guarantees PCC, has negligible packet processing overheads, and can support load balancing mechanisms that reduce the flow completion time by a factor of 2–3×.

Place, publisher, year, edition, pages
Santa Clara, CA, USA: USENIX Association, 2020
Keywords
load-balancer, cheetah, high-speed, connection consistency, pcc, p4, fastclick
National Category
Communication Systems; Computer Systems; Computer Sciences
Research subject
Computer Science; Telecommunication
Identifiers
urn:nbn:se:kth:diva-268968 (URN), 000570979600040 (ISI), 2-s2.0-85091845586 (Scopus ID)
Conference
17th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2020, Santa Clara, 25-27 February 2020
Funder
Swedish Foundation for Strategic Research, TCC; EU, European Research Council, 770889
Note

Part of proceedings: ISBN 978-1-939133-13-7

QC 20200302

Available from: 2020-03-01 Created: 2020-03-01 Last updated: 2022-06-26. Bibliographically approved
Lundblad, H., Karlsson-Thur, C., Maguire Jr., G. Q., Noz, M. E., Zeleznik, M. P. & Weidenhielm, L. (2020). Can Na18F PET/CT bone scans help when deciding if early intervention is needed in patients being treated with a TSF attached to the tibia: insights from 41 patients. European Journal of Orthopaedic Surgery & Traumatology, 31(2), 349-364
Can Na18F PET/CT bone scans help when deciding if early intervention is needed in patients being treated with a TSF attached to the tibia: insights from 41 patients.
2020 (English). In: European Journal of Orthopaedic Surgery & Traumatology, ISSN 1633-8065, E-ISSN 1432-1068, Vol. 31, no 2, p. 349-364. Article in journal (Refereed), Published
Abstract [en]

PURPOSE: To demonstrate the usefulness of positron emission tomography (PET)/computed tomography (CT) bone scans for gaining insight into healing bone status earlier than CT or X-ray alone.

METHODS: Forty-one prospective patients being treated with a Taylor Spatial Frame were recruited. We registered data obtained from successive static CT scans for each patient, to align the broken bone. Radionuclide uptake was calculated over a spherical volume of interest (VOI). For all voxels in the VOI, histograms and cumulative distribution functions of the CT and PET data were used to assess the type and progress of new bone growth and radionuclide uptake. The radionuclide uptake difference per day between the PET/CT scans was displayed in a scatter plot. Superimposing CT and PET slice data and observing the spatiotemporal uptake of 18F- in the region of healing bone by a time-sequenced movie allowed qualitative evaluation.

RESULTS: Numerical evaluation, particularly the shape and distribution of Hounsfield Units and radionuclide uptake in the graphs, combined with visual evaluation and the movies enabled the identification of six patients needing intervention as well as those not requiring intervention. Every revised patient proceeded to a successful treatment conclusion.

CONCLUSION: Numerical and visual evaluation based on all the voxels in the VOI may aid the orthopedic surgeon to assess a patient's progression to recovery. By identifying slow or insufficient progress at an early stage and observing the uptake of 18F- in specific regions of bone, it might be possible to shorten the recovery time and avoid unnecessary late complications.
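
For readers interested in the per-voxel analysis described in the Methods, the short Python sketch below computes a histogram and cumulative distribution over all voxels inside a spherical volume of interest, using synthetic data in place of registered CT/PET volumes. The array shape, radius, and bin count are arbitrary assumptions for illustration.

```python
# Hedged sketch of the VOI analysis: gather all voxels inside a spherical
# volume of interest and summarize them with a histogram and CDF.
import numpy as np

volume = np.random.default_rng(1).normal(300, 150, size=(64, 64, 64))  # stand-in for HU values

def spherical_voi(shape, center, radius):
    z, y, x = np.ogrid[:shape[0], :shape[1], :shape[2]]
    dist2 = (z - center[0])**2 + (y - center[1])**2 + (x - center[2])**2
    return dist2 <= radius**2

mask = spherical_voi(volume.shape, center=(32, 32, 32), radius=10)
voxels = volume[mask]

hist, edges = np.histogram(voxels, bins=50)
cdf = np.cumsum(hist) / voxels.size        # cumulative distribution over the VOI

print("voxels in VOI:", voxels.size)
print("median value:", float(np.median(voxels)))
```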

Place, publisher, year, edition, pages
Springer Nature, 2020
Keywords
Complex tibia fractures, NaF-18 bone scans, Orthopedic surgery, PET/CT, Taylor Spatial Frame, Tibia osteotomies
National Category
Orthopaedics; Radiology, Nuclear Medicine and Medical Imaging
Identifiers
urn:nbn:se:kth:diva-281353 (URN), 10.1007/s00590-020-02776-2 (DOI), 000692824700018 (ISI), 32889671 (PubMedID), 2-s2.0-85090317268 (Scopus ID)
Note

QC 20200929

Available from: 2020-09-18 Created: 2020-09-18 Last updated: 2022-06-25. Bibliographically approved
Liu, L., Yani, S., Maguire Jr., G. Q., Li, Y. & Simeonidou, D. (2020). Hardware-Efficient ROADM Design with Fiber-Core Bypassing for WDM/SDM Networks. In: 2020 optical fiber communications conference and exposition (OFC). Paper presented at 2020 optical fiber communications conference and exposition (OFC). IEEE
Hardware-Efficient ROADM Design with Fiber-Core Bypassing for WDM/SDM Networks
2020 (English). In: 2020 optical fiber communications conference and exposition (OFC), IEEE, 2020. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

An SDM/WDM ROADM with low port-count WSSs is proposed. Fiber-core bypassing reduces both the number and the port count of the WSSs in the implementation. Together with the developed routing, core, and wavelength assignment algorithm, the design requires less hardware without compromising network performance.

Place, publisher, year, edition, pages
IEEE, 2020
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-300247 (URN), 000676346200529 (ISI), 2-s2.0-85085218860 (Scopus ID)
Conference
2020 optical fiber communications conference and exposition (OFC)
Note

QC 20210831

Available from: 2021-08-31 Created: 2021-08-31 Last updated: 2022-06-25. Bibliographically approved
Farshin, A., Roozbeh, A., Maguire Jr., G. Q. & Kostic, D. (2020). Optimizing Intel Data Direct I/O Technology for Multi-hundred-gigabit Networks. In: Proceedings of the Fifteenth EuroSys Conference (EuroSys'20), Heraklion, Crete, Greece, April 27-30, 2020. Paper presented at Fifteenth EuroSys Conference (EuroSys'20), Heraklion, Crete, Greece, April 27-30, 2020.
Optimizing Intel Data Direct I/O Technology for Multi-hundred-gigabit Networks
2020 (English). In: Proceedings of the Fifteenth EuroSys Conference (EuroSys'20), Heraklion, Crete, Greece, April 27-30, 2020. Conference paper, Poster (with or without abstract) (Refereed) [Artistic work]
Abstract [en]

Digitalization across society is expected to produce a massive amount of data, leading to the introduction of faster network interconnects. In addition, many Internet services require high throughput and low latency. However, having faster links alone does not guarantee high throughput or low latency. Therefore, it is essential to perform holistic system optimization to take full advantage of the faster links and provide high-performance services. Intel Data Direct I/O (DDIO) is a recent technology that was introduced to facilitate the deployment of high-performance services based on fast interconnects. We evaluated the effectiveness of DDIO for multi-hundred-gigabit networks. This paper briefly discusses our findings on DDIO, which show the necessity of optimizing/adapting it to address the challenges of multi-hundred-gigabit-per-second links.

Keywords
Data Direct I/O technology, DDIO, Optimizing, Characteristic, Multi-hundred-gigabit networks.
National Category
Communication Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-272720 (URN)
Conference
Fifteenth EuroSys Conference (EuroSys'20), Heraklion, Crete, Greece, April 27-30, 2020.
Projects
Time-Critical Clouds, ULTRA, WASP
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research; EU, Horizon 2020, 770889
Note

QC 20200626

Available from: 2020-04-27 Created: 2020-04-27 Last updated: 2022-06-26. Bibliographically approved