Computing services are deeply integrated into modern society and used by millions of people daily. To meet this demand, many popular services are implemented and deployed as geo-distributed applications on top of third-party virtualized cloud providers. However, the nature of such deployments leads to variable performance. To deliver a high quality of service, these systems strive to adapt to ever-changing conditions by monitoring changes in state and making informed run-time decisions, such as selecting server peers, placing replicas, and redirecting requests.

In this dissertation, we seek to improve the quality of run-time decisions made by geo-distributed systems. We attempt to achieve this through: (1) a better understanding of the underlying deployment conditions, (2) systematic and thorough testing of the decision logic implemented in these systems, and (3) a clear view of the network and system state that allows services to make better-informed decisions.

First, we validate the decision logic used in popular storage systems by examining their replica selection algorithms. We introduce GeoPerf, a tool that uses symbolic execution and modeling to systematically test replica selection algorithms. We used GeoPerf to test two popular storage systems and found one bug in each. Then, using measurements across EC2, we observed a persistent correlation between network paths and network latency. Based on these observations, we introduce EdgeVar, a tool that decouples routing-based and congestion-based changes in network latency. This additional information improves latency estimation and increases the stability of network path selection. Next, we introduce Tectonic, a tool that tracks an application’s requests and responses at both the user and kernel levels. In combination with EdgeVar, it decomposes end-to-end request completion time into three components: network routing, network congestion, and service time.

Finally, we demonstrate how this decomposition of request completion time can be leveraged in practice by developing Kurma, a fast and accurate load balancer for geo-distributed storage systems. At run time, Kurma integrates network latency and service time distributions to accurately estimate the rate of Service Level Objective (SLO) violations for requests redirected between geo-distributed datacenters. Using real-world data, we demonstrate Kurma’s ability to effectively share load among datacenters while reducing SLO violations by a factor of up to 3 in high-load settings, or reducing the cost of running the service by up to 17%.

The techniques described in this dissertation are important for current and future geo-distributed services that strive to provide the best quality of service to their customers while minimizing the cost of operating the service.
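To make the idea of combining latency and service time distributions concrete, the following is a minimal illustrative sketch, not Kurma’s actual implementation: it estimates, by Monte Carlo sampling, the probability that a redirected request’s network round-trip time plus its service time exceeds an assumed SLO. All names (SLO_MS, estimate_violation_rate) and the synthetic distributions are hypothetical placeholders for measured per-path RTTs and per-datacenter service times.

```python
# Hypothetical sketch: estimate the SLO violation rate of redirecting
# requests to a remote datacenter by combining an empirical network RTT
# distribution with a service time distribution.
import random

SLO_MS = 100.0  # assumed example SLO of 100 ms (placeholder value)

def estimate_violation_rate(rtt_samples, service_samples, n=100_000, seed=0):
    """Monte Carlo estimate of P(rtt + service_time > SLO_MS)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(n):
        # Draw one RTT sample and one service time sample independently
        # and check whether their sum misses the SLO.
        total = rng.choice(rtt_samples) + rng.choice(service_samples)
        if total > SLO_MS:
            violations += 1
    return violations / n

if __name__ == "__main__":
    # Illustrative synthetic distributions; a real deployment would use
    # measured RTTs and service times instead.
    rtt_samples = [random.gauss(60, 10) for _ in range(10_000)]
    service_samples = [random.expovariate(1 / 20) for _ in range(10_000)]
    rate = estimate_violation_rate(rtt_samples, service_samples)
    print(f"Estimated SLO violation rate: {rate:.3f}")
```

A load balancer could, in principle, compute such an estimate per candidate datacenter and redirect load only where the predicted violation rate stays below a target; this is offered purely as an illustration of the estimation step described above.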