kth.sePublications KTH
Change search
Link to record
Permanent link

Direct link
Ghodsi, Ali
Publications (10 of 13) Show all publications
Ananthanarayanan, G., Ghodsi, A., Shenker, S. & Stoica, I. (2019). Effective straggler mitigation: Attack of the Clones. In: Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2013: . Paper presented at 10th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2013, 2 April 2013 through 5 April 2013 (pp. 185-198). USENIX Association
Open this publication in new window or tab >>Effective straggler mitigation: Attack of the Clones
2019 (English)In: Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2013, USENIX Association , 2019, p. 185-198Conference paper, Published paper (Refereed)
Abstract [en]

Small jobs, that are typically run for interactive data analyses in datacenters, continue to be plagued by disproportionately long-running tasks called stragglers. In the production clusters at Facebook and Microsoft Bing, even after applying state-of-the-art straggler mitigation techniques, these latency sensitive jobs have stragglers that are on average 8 times slower than the median task in that job. Such stragglers increase the average job duration by 47%. This is because current mitigation techniques all involve an element of waiting and speculation. We instead propose full cloning of small jobs, avoiding waiting and speculation altogether. Cloning of small jobs only marginally increases utilization because workloads show that while the majority of jobs are small, they only consume a small fraction of the resources. The main challenge of cloning is, however, that extra clones can cause contention for intermediate data. We use a technique, delay assignment, which efficiently avoids such contention. Evaluation of our system, Dolly, using production workloads shows that the small jobs speedup by 34% to 46% after state-of-the-art mitigation techniques have been applied, using just 5% extra resources for cloning.

Place, publisher, year, edition, pages
USENIX Association, 2019
Keywords
Clone cells, Systems analysis, Data centers, Facebook, Interactive data analysis, MicroSoft, Mitigation techniques, Production workloads, Running tasks, State of the art, Cloning
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-268581 (URN)2-s2.0-85076690211 (Scopus ID)
Conference
10th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2013, 2 April 2013 through 5 April 2013
Note

QC 20200414

Part of ISBN 9781931971003

Available from: 2020-04-14 Created: 2020-04-14 Last updated: 2024-10-15Bibliographically approved
Bailis, P., Ghodsi, A., Hellerstein, J. M. & Stoica, I. (2013). Bolt-on causal consistency. In: SIGMOD '13 Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data: . Paper presented at 2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013; New York, NY; United States; 22 June 2013 through 27 June 2013 (pp. 761-772). Association for Computing Machinery (ACM)
Open this publication in new window or tab >>Bolt-on causal consistency
2013 (English)In: SIGMOD '13 Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, Association for Computing Machinery (ACM), 2013, p. 761-772Conference paper, Published paper (Refereed)
Abstract [en]

We consider the problem of separating consistency-related safety properties from availability and durability in distributed data stores via the application of a "bolt-on" shim layer that upgrades the safety of an underlying general-purpose data store. This shim provides the same consistency guarantees atop a wide range of widely deployed but often inflexible stores. As causal consistency is one of the strongest consistency models that remain available during system partitions, we develop a shim layer that upgrades eventually consistent stores to provide convergent causal consistency. Accordingly, we leverage widely deployed eventually consistent infrastructure as a common substrate for providing causal guarantees. We describe algorithms and shim implementations that are suitable for a large class of application-level causality relationships and evaluate our techniques using an existing, production-ready data store and with real-world explicit causality relationships.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2013
Series
Proceedings of the ACM SIGMOD International Conference on Management of Data, ISSN 0730-8078
Keywords
Causal consistency, Eventual consistency, Separation of concerns
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-134068 (URN)10.1145/2463676.2465279 (DOI)2-s2.0-84880526717 (Scopus ID)978-145032037-5 (ISBN)
Conference
2013 ACM SIGMOD Conference on Management of Data, SIGMOD 2013; New York, NY; United States; 22 June 2013 through 27 June 2013
Note

QC 20131118

Available from: 2013-11-18 Created: 2013-11-15 Last updated: 2024-03-18Bibliographically approved
Panda, A., Scott, C., Ghodsi, A., Koponen, T. & Shenker, S. (2013). CAP for networks. In: HotSDN 2013 - Proceedings of the 2013 ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking: . Paper presented at 2013 2nd ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking, HotSDN 2013, 16 August 2013 through 16 August 2013, Hong Kong (pp. 91-96). Association for Computing Machinery (ACM)
Open this publication in new window or tab >>CAP for networks
Show others...
2013 (English)In: HotSDN 2013 - Proceedings of the 2013 ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking, Association for Computing Machinery (ACM), 2013, p. 91-96Conference paper, Published paper (Refereed)
Abstract [en]

The CAP theorem showed that it is impossible for datastore systems to achieve all three of strong consistency, availability and partition tolerance. In this paper we investigate how these trade-offs apply to software-defined networks. Specifically, we investigate network policies such as tenant isolation and middlebox traversal, and prove that it is impossible for implementations to enforce them without sacrificing availability. We conclude by distilling practical design lessons from our observations.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2013
Keywords
Availability, Correctness, Distributed controllers, Software defined network
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-133277 (URN)10.1145/2491185.2491186 (DOI)2-s2.0-84883672874 (Scopus ID)978-145032056-6 (ISBN)
Conference
2013 2nd ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking, HotSDN 2013, 16 August 2013 through 16 August 2013, Hong Kong
Note

QC 20131030

Available from: 2013-10-30 Created: 2013-10-29 Last updated: 2024-03-18Bibliographically approved
Bailis, P. & Ghodsi, A. (2013). Eventual Consistency Today: Limitations, Extensions, and Beyond. Communications of the ACM, 56(5), 55-63
Open this publication in new window or tab >>Eventual Consistency Today: Limitations, Extensions, and Beyond
2013 (English)In: Communications of the ACM, ISSN 0001-0782, E-ISSN 1557-7317, Vol. 56, no 5, p. 55-63Article in journal (Refereed) Published
Abstract [en]

Brewer's conjecture'based on his experiences building infrastructure for some of the first Internet search engines at Inktomi'states that distributed systems requiring always on, highly available operation cannot guarantee the illusion of coherent, consistent single-system operation in the presence of network partitions, which cut communication between active servers. Moreover, even without partitions, a system that chooses availability over consistency enjoys benefits of low latency. If a server can safely respond to a user's request when it is partitioned from all other servers, then it can also respond to a user's request without contacting other servers even when it is able to do so. Eventual consistency as an available alternative. Given the CAP impossibility result, distributed-database designers sought weaker consistency models that would enable both availability and high performance.

National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-123107 (URN)10.1145/2447976.2447992 (DOI)000318241500018 ()2-s2.0-84877898372 (Scopus ID)
Note

QC 20130603

Available from: 2013-06-03 Created: 2013-06-03 Last updated: 2024-03-18Bibliographically approved
Bailis, P., Fekete, A., Ghodsi, A., Hellerstein, J. M. & Stoica, I. (2013). HAT, not CAP: Towards highly available transactions. In: 14th Workshop on Hot Topics in Operating Systems, HotOS 2013: . Paper presented at 14th Workshop on Hot Topics in Operating Systems, HotOS 2013, 13 May 2013 through 15 May 2013. USENIX Association
Open this publication in new window or tab >>HAT, not CAP: Towards highly available transactions
Show others...
2013 (English)In: 14th Workshop on Hot Topics in Operating Systems, HotOS 2013, USENIX Association , 2013Conference paper, Published paper (Refereed)
Abstract [en]

While the CAP Theorem is often interpreted to preclude the availability of transactions in a partition-prone environment, we show that highly available systems can provide useful transactional semantics, often matching those of today's ACID databases. We propose Highly Available Transactions (HATs) that are available in the presence of partitions. HATs support many desirable ACID guarantees for arbitrary transactional sequences of read and write operations and permit low-latency operation.

Place, publisher, year, edition, pages
USENIX Association, 2013
Keywords
Highly available systems, Low latency, Write operations, Semantics
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-306494 (URN)2-s2.0-84889691105 (Scopus ID)
Conference
14th Workshop on Hot Topics in Operating Systems, HotOS 2013, 13 May 2013 through 15 May 2013
Note

QC 20211217

Available from: 2021-12-17 Created: 2021-12-17 Last updated: 2022-06-25Bibliographically approved
Fayazbakhsh, S. K., Lin, Y., Tootoonchian, A., Ghodsi, A., Koponen, T., Maggs, B., . . . Shenker, S. (2013). Less pain, most of the gain: Incrementally deployable ICN. Paper presented at Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013; Hong Kong; China; 12-16 August 2013. Computer communication review, 43(4), 147-158
Open this publication in new window or tab >>Less pain, most of the gain: Incrementally deployable ICN
Show others...
2013 (English)In: Computer communication review, ISSN 0146-4833, E-ISSN 1943-5819, Vol. 43, no 4, p. 147-158Article in journal (Refereed) Published
Abstract [en]

Information-Centric Networking (ICN) has seen a significant resurgence in recent years. ICN promises benefits to users and service providers along several dimensions (e.g., performance, security, and mobility). These benefits, however, come at a non-trivial cost as many ICN proposals envision adding significant complexity to the network by having routers serve as content caches and support nearest-replica routing. This paper is driven by the simple question of whether this additional complexity is justified and if we can achieve these benefits in an incrementally deployable fashion. To this end, we use trace-driven simulations to analyze the quantitative benefits attributed to ICN (e.g., lower latency and congestion). Somewhat surprisingly, we find that pervasive caching and nearest-replica routing are not fundamentally necessary - -most of the performance benefits can be achieved with simpler caching architectures. We also discuss how the qualitative benefits of ICN (e.g., security, mobility) can be achieved without any changes to the network. Building on these insights, we present a proof-of-concept design of an incrementally deployable ICN architecture.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2013
Keywords
information-centric networking, internet architecture
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-133407 (URN)10.1145/2534169.2486023 (DOI)000327465900020 ()2-s2.0-84891612475 (Scopus ID)978-145032056-6 (ISBN)
Conference
Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013; Hong Kong; China; 12-16 August 2013
Note

QC 20131104

Available from: 2013-11-04 Created: 2013-10-31 Last updated: 2022-09-19Bibliographically approved
Ghodsi, A., Sekar, V., Zaharia, M. & Stoica, I. (2012). Multi-Resource Fair Queueing for Packet Processing. Computer communication review, 42(4), 1-12
Open this publication in new window or tab >>Multi-Resource Fair Queueing for Packet Processing
2012 (English)In: Computer communication review, ISSN 0146-4833, E-ISSN 1943-5819, Vol. 42, no 4, p. 1-12Article in journal (Refereed) Published
Abstract [en]

Middleboxes are ubiquitous in today's networks and perform a variety of important functions, including IDS, VPN, firewalling, and WAN optimization. These functions differ vastly in their requirements for hardware resources (e.g., CPU cycles and memory bandwidth). Thus, depending on the functions they go through, different flows can consume different amounts of a middlebox's resources. While there is much literature on weighted fair sharing of link bandwidth to isolate flows, it is unclear how to schedule multiple resources in a middlebox to achieve similar guarantees. In this paper, we analyze several natural packet scheduling algorithms for multiple resources and show that they have undesirable properties. We propose a new algorithm, Dominant Resource Fair Queuing (DRFQ), that retains the attractive properties that fair sharing provides for one resource. In doing so, we generalize the concept of virtual time in classical fair queuing to multi-resource settings. The resulting algorithm is also applicable in other contexts where several resources need to be multiplexed in the time domain.

Keywords
Fair Queueing, Middleboxes, Scheduling
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-104365 (URN)10.1145/2377677.2377679 (DOI)000309217600001 ()2-s2.0-84894599599 (Scopus ID)
Note

QC 20121105

Available from: 2012-11-05 Created: 2012-11-01 Last updated: 2024-03-18Bibliographically approved
Ghodsi, A., Sekar, V., Zaharia, M. & Stoica, I. (2012). Multi-resource fair queueing for packet processing. In: SIGCOMM'12 - Proceedings of the ACM SIGCOMM 2012 Conference Applications, Technologies, Architectures, and Protocols for Computer Communication: . Paper presented at ACM SIGCOMM 2012 Conference Applications, Technologies, Architectures, and Protocols for Computer Communication, SIGCOMM 2012, 13 August 2012 through 17 August 2012, Helsinki (pp. 1-12). ACM
Open this publication in new window or tab >>Multi-resource fair queueing for packet processing
2012 (English)In: SIGCOMM'12 - Proceedings of the ACM SIGCOMM 2012 Conference Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM , 2012, p. 1-12Conference paper, Published paper (Refereed)
Abstract [en]

Middleboxes are ubiquitous in today's networks and perform a variety of important functions, including IDS, VPN, firewalling, and WAN optimization. These functions differ vastly in their requirements for hardware resources (e.g., CPU cycles and memory bandwidth). Thus, depending on the functions they go through, different flows can consume different amounts of a middlebox's resources. While there is much literature on weighted fair sharing of link bandwidth to isolate flows, it is unclear how to schedule multiple resources in a middlebox to achieve similar guarantees. In this paper, we analyze several natural packet scheduling algorithms for multiple resources and show that they have undesirable properties. We propose a new algorithm, Dominant Resource Fair Queuing (DRFQ), that retains the attractive properties that fair sharing provides for one resource. In doing so, we generalize the concept of virtual time in classical fair queuing to multi-resource settings. The resulting algorithm is also applicable in other contexts where several resources need to be multiplexed in the time domain.

Place, publisher, year, edition, pages
ACM, 2012
Keywords
fair queueing, fairness, middleboxes, scheduling, CPU cycles, Fair queuing, Fair sharing, Hardware resources, Link bandwidth, Memory bandwidths, Multi-resource, Multiple resources, Packet processing, Packet scheduling algorithm, Time domain, Virtual-time, Algorithms, Communication, Computer architecture, Distributed computer systems, Packet networks, Queueing theory, Network architecture
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-105329 (URN)10.1145/2342356.2342358 (DOI)2-s2.0-84866524258 (Scopus ID)978-145031419-0 (ISBN)
Conference
ACM SIGCOMM 2012 Conference Applications, Technologies, Architectures, and Protocols for Computer Communication, SIGCOMM 2012, 13 August 2012 through 17 August 2012, Helsinki
Note

QC 20121120

Available from: 2012-11-20 Created: 2012-11-20 Last updated: 2024-03-18Bibliographically approved
Ananthanarayanan, G., Ghodsi, A., Wang, A., Borthakur, D., Kandula, S., Shenker, S. & Stoica, I. (2012). PACMan: Coordinated memory caching for parallel jobs. In: Proceedings of NSDI 2012: 9th USENIX Symposium on Networked Systems Design and Implementation. Paper presented at 9th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2012, 25 April 2012 through 27 April 2012, San Jose, California (pp. 267-280). USENIX Association
Open this publication in new window or tab >>PACMan: Coordinated memory caching for parallel jobs
Show others...
2012 (English)In: Proceedings of NSDI 2012: 9th USENIX Symposium on Networked Systems Design and Implementation, USENIX Association , 2012, p. 267-280Conference paper, Published paper (Refereed)
Abstract [en]

Data-intensive analytics on large clusters is important for modern Internet services. As machines in these clusters have large memories, in-memory caching of inputs is an effective way to speed up these analytics jobs. The key challenge, however, is that these jobs run multiple tasks in parallel and a job is sped up only when inputs of all such parallel tasks are cached. Indeed, a single task whose input is not cached can slow down the entire job. To meet this "all-or-nothing" property, we have built PACMan, a caching service that coordinates access to the distributed caches. This coordination is essential to improve job completion times and cluster efficiency. To this end, we have implemented two cache replacement policies on top of PACMan's coordinated infrastructure fb-LIFE that minimizes average completion time by evicting large incomplete inputs, and LFU-F that maximizes cluster efficiency by evicting less frequently accessed inputs. Evaluations on production workloads from Facebook and Microsoft Bing show that PACMan reduces average completion time of jobs by 56% and 51% (small interactive jobs improve by 77%), and improves efficiency of the cluster by 47% and 54%, respectively.

Place, publisher, year, edition, pages
USENIX Association, 2012
Keywords
Systems analysis, All or nothings, Cache replacement policy, Caching services, Completion time, Distributed cache, Internet services, Large clusters, Production workloads, Efficiency
National Category
Computer Engineering
Identifiers
urn:nbn:se:kth:diva-314740 (URN)2-s2.0-84919827070 (Scopus ID)
Conference
9th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2012, 25 April 2012 through 27 April 2012, San Jose, California
Note

QC 20220623

Part of proceedings: ISBN  978-931971-92-8

Available from: 2022-06-23 Created: 2022-06-23 Last updated: 2022-06-25Bibliographically approved
Bailis, P., Fekete, A., Ghodsi, A., Hellerstein, J. M. & Stoica, I. (2012). The potential dangers of causal consistency and an explicit solution. In: Proceedings of the 3rd ACM Symposium on Cloud Computing, SoCC 2012: . Paper presented at 3rd ACM Symposium on Cloud Computing, SoCC 2012, 14 October 2012 through 17 October 2012, San Jose, CA. ACM
Open this publication in new window or tab >>The potential dangers of causal consistency and an explicit solution
Show others...
2012 (English)In: Proceedings of the 3rd ACM Symposium on Cloud Computing, SoCC 2012, ACM , 2012Conference paper, Published paper (Refereed)
Abstract [en]

Causal consistency is the strongest consistency model that is available in the presence of partitions and provides useful semantics for human-facing distributed services. Here, we expose its serious and inherent scalability limitations due to write propagation requirements and traditional dependency tracking mechanisms. As an alternative to classic potential causality, we advocate the use of explicit causality, or application-defined happens-before relations. Explicit causality, a subset of potential causality, tracks only relevant dependencies and reduces several of the potential dangers of causal consistency.

Place, publisher, year, edition, pages
ACM, 2012
Keywords
Causality, Convergence, Data dependencies, Explicit causality, Scalability, Semantic knowledge, Weak consistency
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-111813 (URN)10.1145/2391229.2391251 (DOI)2-s2.0-84870505762 (Scopus ID)978-145031761-0 (ISBN)
Conference
3rd ACM Symposium on Cloud Computing, SoCC 2012, 14 October 2012 through 17 October 2012, San Jose, CA
Note

QC 20130114

Available from: 2013-01-14 Created: 2013-01-14 Last updated: 2024-03-18Bibliographically approved
Organisations

Search in DiVA

Show all publications