We study adaptive-playout scheduling for Voice over IP using the framework of stochastic impulse control theory. We use the Wiener process to model the fluctuation of the buffer length in the absence of control. In this context, the control signal consists of length units that correspond to inserting or dropping a pitch cycle. We define an optimality criterion that has an adjustable trade-off between average buffering delay and average control signal (the length of the pitch cycles added plus the length of the pitch cycles dropped), and show that a band control policy is optimal for this criterion. The band control policy maintains the buffer length within a band region by imposing impulse control (inserted or dropped pitch cycles) whenever the bounds of the band are reached. One important property of the band control policy is that it incurs no packet-loss through buffering if there are no out-of-order packet-arrivals. Experiments performed on both synthetic and real network-delay traces show that the proposed playout scheduling algorithm outperforms two recent algorithms in most cases.
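The band control idea can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the algorithm evaluated in the paper: the Wiener process is discretized into Gaussian increments, the impulse is treated as a continuous length rather than a whole number of pitch cycles, and the names `lower`, `upper`, `reset_low`, and `reset_high` are hypothetical parameters chosen for illustration.

```python
import random


def simulate_band_control(steps, sigma, lower, upper, reset_low, reset_high,
                          start, seed=0):
    """Simulate a discretized Wiener-process buffer under a band policy.

    All parameter names are illustrative assumptions. Lengths are in
    arbitrary time units; `sigma` is the per-step standard deviation.
    """
    if not lower <= reset_low <= reset_high <= upper:
        raise ValueError("need lower <= reset_low <= reset_high <= upper")
    rng = random.Random(seed)
    length = start
    trajectory = []
    total_control = 0.0
    for _ in range(steps):
        # Uncontrolled fluctuation: one Gaussian increment per step.
        length += rng.gauss(0.0, sigma)
        if length <= lower:
            # Buffer too short: insert signal (impulse up to reset_low).
            total_control += reset_low - length
            length = reset_low
        elif length >= upper:
            # Buffer too long: drop signal (impulse down to reset_high).
            total_control += length - reset_high
            length = reset_high
        trajectory.append(length)
    mean_delay = sum(trajectory) / steps
    mean_control = total_control / steps
    return trajectory, mean_delay, mean_control
```

Calling `simulate_band_control(5000, 2.0, 10.0, 60.0, 20.0, 40.0, 30.0)` returns the controlled trajectory together with the average buffer length and the average control length per step, the two quantities that the optimality criterion trades off. Because time is discretized, an impulse here also absorbs any overshoot beyond the band edge.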
We study packet-loss concealment for speech based on autoregressive modelling using a rigorous minimum mean square error (MMSE) approach. The effect of the model estimation error on predicting the missing segment is studied and an upper bound on the mean square error is derived. Our experiments show that the upper bound is tight when the estimation error is less than the signal variance. We also consider the usage of perceptual weighting on prediction to improve speech quality. A rigorous argument is presented to show that perceptual weighting is not useful in this context. We create simple and practical MMSE-based systems using two signal models: a basic model capturing the short-term correlation and a more sophisticated model that also captures the long-term correlation. Subjective quality comparison tests show that the proposed MMSE-based system provides state-of-the-art performance.
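For a known autoregressive model driven by zero-mean white noise, the MMSE prediction of future samples given the past is the conditional mean, which is obtained by running the AR recursion with the innovation set to zero. The sketch below illustrates only this textbook step, assuming the coefficients are already estimated and that only past samples are used; the function name and arguments are hypothetical and do not reproduce the systems described in the abstract.

```python
def ar_mmse_extrapolate(history, coeffs, n_missing):
    """Extrapolate `n_missing` samples from an AR model.

    Assumed model: x[n] = sum_k coeffs[k] * x[n - 1 - k] + e[n], with e[n]
    zero-mean and independent of the past. The conditional mean of the
    missing samples follows by iterating the recursion with e[n] = 0.
    """
    order = len(coeffs)
    if len(history) < order:
        raise ValueError("history must contain at least len(coeffs) samples")
    buf = list(history[-order:]) if order else []
    predicted = []
    for _ in range(n_missing):
        # coeffs[0] multiplies the most recent sample, coeffs[1] the one before.
        value = sum(a * buf[-1 - k] for k, a in enumerate(coeffs))
        predicted.append(value)
        buf.append(value)
    return predicted
```

For a first-order model with coefficient 0.5 and last received sample 8.0, `ar_mmse_extrapolate([1.0, 8.0], [0.5], 3)` gives `[4.0, 2.0, 1.0]`: the prediction decays toward the mean as the gap grows, which is why long losses are concealed with progressively attenuated output.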
We propose a new adaptive playout scheme for VoIP. The k-Erlang distribution is introduced to model the packet interarrival time distribution. A cost function is proposed for the next played out packet in the buffer based on modelling packet-arrival times with the k-Erlang distribution. The cost function essentially balances the average buffering delay and the packet-loss rate. The optimal playout length of the packet is determined by minimizing the cost function and realized by either inserting or dropping pitch cycles from the packet. Our real-world data experiments show that our scheme outperforms two reference methods for both low-jitter and high-jitter cases.
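To make the delay versus loss trade-off concrete, the sketch below evaluates the probability that a k-Erlang interarrival time exceeds a waiting time and then minimizes a toy cost over a grid. The cost `d + loss_weight * P(T > d)` is a hypothetical stand-in chosen for illustration; the abstract does not give the paper's actual cost function, and the names `best_wait`, `loss_weight`, `max_wait`, and `step` are assumptions.

```python
import math


def erlang_survival(t, k, rate):
    """P(T > t) for a k-Erlang interarrival time T with the given rate."""
    if t <= 0:
        return 1.0
    x = rate * t
    term = 1.0   # x**n / n! for n = 0
    total = 1.0
    for n in range(1, k):
        term *= x / n
        total += term
    return math.exp(-x) * total


def best_wait(k, rate, loss_weight, max_wait, step):
    """Grid-search the wait d that minimizes d + loss_weight * P(T > d).

    This toy cost is an illustrative assumption, not the paper's criterion.
    """
    n_points = int(round(max_wait / step)) + 1
    candidates = [i * step for i in range(n_points)]
    return min(candidates,
               key=lambda d: d + loss_weight * erlang_survival(d, k, rate))
```

With `k = 1` the distribution reduces to the exponential, so the toy cost has the closed-form minimizer `log(loss_weight * rate) / rate`, which gives a quick sanity check on the grid search.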
This paper proposes a stratified approach for camera calibration using spheres. Previous works have exploited epipolar tangents to locate frontier points on spheres for estimating the epipolar geometry. It is shown in this paper that other than the frontier points, two additional point features can be obtained by considering the bitangent envelopes of a pair of spheres. A simple method for locating the images of such point features and the sphere centers is presented. An algorithm for recovering the fundamental matrix in a plane plus parallax representation using these recovered image points and the epipolar tangents from three spheres is developed. A new formulation of the absolute dual quadric as a cone tangent to a dual sphere with the plane at infinity being its vertex is derived. This allows the recovery of the absolute dual quadric, which is used to upgrade the weak calibration to a full calibration. Experimental results on both synthetic and real data are presented, which demonstrate the feasibility and the high precision achieved by our proposed algorithm.
In this paper we consider the rate region of the vector Gaussian one-helper distributed source coding problem. In particular, we derive optimality conditions under which a weighted sum rate is minimum by using a contradiction-based argument. When the sources are specified to be scalar, the optimality conditions can always be constructed for any weighted sum rate. In the derivation of the optimality conditions, we introduce a new concept of "source enhancement", which can be viewed as a dual to the well-known "channel enhancement" technique. In particular, source enhancement refers to the operation of increasing the covariance matrix of a Gaussian source in a partial ordering sense. This new technique makes the derivation of the optimality conditions straightforward.
Multimedia communications over packet networks, and in particular the voice over IP (VoIP) application, have become an integral part of society. However, the unreliable and heterogeneous nature of packet networks has led to a best-effort delivery of services. Delay, limitation of bandwidth, and packet-loss rate all affect the quality of service (QoS). In this thesis, we address two important network impairments in the design of robust multimedia communication systems: packet delay-variation and packet-loss.
Paper A considers the mitigation of the effect of packet delay-variation for audio communications by introducing a buffer at the receiver side. A new adaptive playout scheduling approach is proposed to control the buffering length, or, equivalently, the packet playout deadlines, in response to varying network conditions. A Wiener process is used to model the fluctuation of the buffering length without any playout adjustment. The playout scheduling problem is then reformulated as a stochastic impulse control problem by taking the playout adjustment as the control signal. The proposed approach is shown to be the optimal solution to the new control problem. It is demonstrated experimentally that the proposed approach provides improved perceived conversational quality.
Papers B, C and D address the packet-loss issue. Paper B focuses on the design of a low-complexity packet-loss concealment (PLC) method that is compatible with existing speech codecs for VoIP applications. The new method is rigorously motivated based on the autoregressive (AR) speech model and the minimum mean squared error (MMSE) criterion. The effect of model estimation error on the prediction of the missing speech segment is also considered and an upper bound for the prediction error is derived. Both the theoretical and experimental results provide insight into the performance of the heuristically designed PLC methods. On the other hand, Papers C and D consider an active packet-loss-resilient coding scheme, namely multiple description coding (MDC). In general, MDC can be used for the transmission of any media data. Paper C derives a simple and accurate approximation of the rate-distortion lower bound of a particular multiple-description scenario and then demonstrates that the performance loss of some practical MD systems can be evaluated easily with the new approximation. Paper D studies the performance limit of a vector Gaussian multiple description scenario. An outer bound to the rate-distortion region is derived, and the outer bound is tight when the problem specializes to the scalar Gaussian case.
The rate region of the two-terminal vector Gaussian CEO problem is studied. A lower bound on the rate region is derived. It is obtained by lower-bounding a weighted sum rate for each supporting hyperplane of the rate region. The bound is in the form of a closed-form expression rather than the form of an optimization problem.
In this work, we study the rate region of the vector Gaussian multiple description problem with individual and central quadratic distortion constraints. In particular, we derive an outer bound to the rate region of the L-description problem. The bound is obtained by lower bounding a weighted sum rate for each supporting hyperplane of the rate region. The key idea is to introduce at most L-1 auxiliary random variables and further impose upon the variables a Markov structure according to the ordering of the description weights. This makes it possible to greatly simplify the derivation of the outer bound. In the scalar Gaussian case, the complete rate region is fully characterized by showing that the outer bound is tight. In this case, the optimal weighted sum rate for each supporting hyperplane is obtained by solving a single maximization problem. This contrasts with existing results, which require solving a min-max optimization problem.
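The supporting-hyperplane argument can be written out as follows. This is a generic sketch of how weighted-sum-rate lower bounds yield an outer bound, with notation introduced here as an assumption: $\mathcal{R}$ is the rate region for fixed distortion constraints, $\boldsymbol{\mu}$ is a nonnegative weight vector, and $\underline{R}(\boldsymbol{\mu})$ is whatever lower bound has been derived for that weight vector.

```latex
% Assumed notation: \mathcal{R} is the rate region for fixed distortion
% constraints; \underline{R}(\boldsymbol{\mu}) is a derived lower bound.
\begin{align}
  \underline{R}(\boldsymbol{\mu})
    &\le \min_{(R_1,\dots,R_L)\in\mathcal{R}} \sum_{l=1}^{L} \mu_l R_l
    \quad \text{for every } \boldsymbol{\mu} = (\mu_1,\dots,\mu_L) \ge 0, \\
  \mathcal{R} \subseteq \mathcal{R}_{\mathrm{out}}
    &= \bigcap_{\boldsymbol{\mu} \ge 0}
       \Bigl\{ (R_1,\dots,R_L) :
         \sum_{l=1}^{L} \mu_l R_l \ge \underline{R}(\boldsymbol{\mu}) \Bigr\}.
\end{align}
```

If the region is closed and convex, as a time-sharing argument typically gives, then equality in the first line for every weight vector implies that the outer bound coincides with the rate region itself.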
This paper studies the tight rate-distortion bound for K-channel symmetric multiple-description coding for a memoryless Gaussian source. We find that the product of a function of the individual side distortions (for single received descriptions) and the central distortion (for K received descriptions) is asymptotically independent of the redundancy among the descriptions. Using this property, we analyze the asymptotic behaviors of two different practical multiple-description lattice vector quantizers (MDLVQ). Our analysis includes the treatment of an MDLVQ system from a new geometric viewpoint, which results in an expression for the side distortions using the normalized second moment of a sphere of higher dimensionality than the quantization space. The expression of the distortion product derived from the lower bound is then applied as a criterion to assess the performance losses of the considered MDLVQ systems.
We study adaptive-playout scheduling for VoIP using the framework of stochastic impulse control theory. A Wiener process is introduced to model the fluctuation of the buffer length in the absence of control. In this context, the control signal consists of length units that correspond to inserting or dropping a pitch cycle. We define an optimality criterion that has an adjustable trade-off between average buffering delay and average control length (the length of the pitch cycles added plus the length of the pitch cycles dropped). The clock-drift effect is treated in a unified manner within this framework. A band control policy is shown to be optimal. The algorithm does not require knowledge of the clock drift. It maintains the buffer length within a band region by imposing impulse control (inserted or dropped pitch cycles) whenever the bounds of the band are reached. Our experiments show that the proposed method outperforms a popular reference method.
This paper studies the tight rate-distortion bound for L-channel symmetric multiple-description coding of a vector Gaussian source with two levels of receivers. Each of the first-level receivers obtains κ (κ < L) of the L descriptions. The second-level receiver obtains all L descriptions. We find that when the theory is applied to the scalar Gaussian source, the product of a function of the side distortions (corresponding to the first-level receivers) and the central distortion (corresponding to the second-level receiver) is asymptotically independent of the redundancy among the descriptions. Using this property, we analyze the asymptotic behavior of a practical multiple-description lattice vector quantizer (MDLVQ). Our analysis includes the treatment of the MDLVQ system from a new geometric viewpoint, which results in an expression for the side distortions using the normalized second moment of a sphere of higher dimensionality than the quantization space. The expression of the distortion product derived from the lower bound is then applied as a criterion to assess the performance loss of the considered MDLVQ system. In principle, the efficiency of other practical MD systems can also be evaluated using the derived distortion product.