In highly dynamic wireless networks, communications face several challenges. First, noise levels between nodes may be difficult to predict a priori. Second, a Byzantine attacker hidden in the network, with knowledge of the network topology and observation of all transmissions, can inject corrupted packets at arbitrarily chosen locations. Since transmissions are usually in bits, and wireless hardware typically uses modulation alphabets whose sizes are powers of two (e.g., BPSK, QPSK, 16-QAM, 64-QAM), we address this problem by studying coding for networks experiencing worst-case bit errors, with network codes over binary extension fields. We demonstrate that in this setup prior network error-correcting schemes can be arbitrarily far from achieving the optimal network throughput. We propose a new transform metric for errors under the considered model. Using this metric, we derive analogues of many classical results from coding theory. Specifically, we obtain new Hamming-type, Plotkin-type, and Elias-Bassalygo-type upper bounds on the network capacity. A commensurate lower bound is shown via Gilbert-Varshamov (GV)-type codes for error correction. The GV codes used to attain the lower bound can be non-coherent, that is, they require neither prior knowledge of the network topology nor of the network coding kernels. We also propose a computationally efficient concatenation scheme whose rate is characterized by a Zyablov-type lower bound, together with a generalized minimum-distance decoding algorithm that decodes up to half the minimum distance of the concatenated codes. The end-to-end nature of our design allows our codes to be overlaid on classical distributed random linear network codes. A further advantage of the end-to-end strategy over link-by-link error correction is that it reduces the computational cost of error correction at internal nodes.
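As a point of reference for the style of bounds involved, the sketch below evaluates the classical Hamming (sphere-packing) upper bound and Gilbert-Varshamov lower bound on binary code size in the ordinary Hamming metric; the bounds in the paper are analogues of these under the proposed transform metric, so the exact expressions below are illustrative only.

```python
from math import comb

def hamming_ball(n, r):
    # Number of binary words within Hamming distance r of a fixed word.
    return sum(comb(n, i) for i in range(r + 1))

def hamming_upper_bound(n, d):
    # Classical sphere-packing bound: A(n, d) <= 2^n / V(n, floor((d-1)/2)).
    return 2**n // hamming_ball(n, (d - 1) // 2)

def gv_lower_bound(n, d):
    # Classical Gilbert-Varshamov bound: A(n, d) >= 2^n / V(n, d-1).
    return 2**n // hamming_ball(n, d - 1)

# Illustrative parameters only (not from the paper).
n, d = 15, 5
print(gv_lower_bound(n, d), hamming_upper_bound(n, d))
```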
The problem of symmetric private information retrieval (SPIR) from a coded database distributively stored among colluding servers is studied. Specifically, the database comprises K files stored across N servers using an (N, M)-MDS storage code. A user wants to retrieve one file from the database by communicating with the N servers, without revealing the identity of the desired file to any server. Furthermore, the user shall learn nothing about the other K - 1 files in the database. In the T-colluding SPIR problem (hence called TSPIR), any T out of the N servers may collude, that is, they may share their interactions with the user to guess the identity of the requested file. We show that for linear schemes, the information-theoretic capacity of the MDS-TSPIR problem, defined as the maximum number of information bits of the desired file retrieved per downloaded bit, equals 1 - (M+T-1)/N, provided the servers share common randomness (unavailable to the user) of size at least (M+T-1)/(N-M-T+1) times the file size. Otherwise, the capacity equals zero.
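A minimal numerical check of the stated expressions, with parameter values chosen purely for illustration:

```python
def mds_tspir_capacity(N, M, T):
    # Linear-scheme capacity 1 - (M+T-1)/N, meaningful when N > M + T - 1.
    return 1 - (M + T - 1) / N

def min_randomness_ratio(N, M, T):
    # Minimum common randomness size relative to the file size: (M+T-1)/(N-M-T+1).
    return (M + T - 1) / (N - M - T + 1)

# Illustrative values: N = 6 servers, (6, 2)-MDS storage, T = 2 colluding servers.
N, M, T = 6, 2, 2
print(mds_tspir_capacity(N, M, T))    # 0.5
print(min_randomness_ratio(N, M, T))  # 1.0
```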
The problem of private information retrieval (PIR) is to retrieve one message out of K messages replicated at N databases, without revealing the identity of the desired message to the databases. We consider the problem of PIR with colluding databases and eavesdroppers, named ETPIR. Specifically, any T out of the N databases may collude, that is, they may share their interactions with the user to guess the identity of the requested message. An eavesdropper is curious about the content of the messages and can tap the incoming and outgoing transmissions between the user and any E databases. The databases share some common randomness, unknown to the eavesdropper and the user, and use it to generate the answers so that the eavesdropper learns no information about the K messages. The capacity is defined as the maximum retrieval rate, i.e., the number of information bits of the desired message retrieved per downloaded bit. In our previous work [1], we found that when E ≥ T, the capacity equals 1 - E/N. In this work, we focus on the case E ≤ T. We derive an outer bound (converse) and an inner bound (achievability) on the optimal achievable rate. The gap between the derived inner and outer bounds vanishes as the number of messages K tends to infinity.
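A short numerical illustration of the E ≥ T regime from [1], with hypothetical parameter values:

```python
def etpir_capacity_e_geq_t(N, E):
    # Capacity 1 - E/N when the eavesdropping strength E is at least the
    # collusion threshold T (result from [1]); requires E < N.
    return 1 - E / N

# Illustrative values: N = 5 databases, E = 2 tapped databases (with T <= E).
print(etpir_capacity_e_geq_t(5, 2))  # 0.6
```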
A user wants to retrieve a file from a database without revealing the identity of the retrieved file to the operator of the database (server), which is known as the problem of private information retrieval (PIR). If it is further required that the user obtain no information about the other files in the database, the concept of symmetric PIR (SPIR) is introduced to guarantee privacy for both parties. For SPIR, the server(s) need to access some randomness independent of the database, in order to protect the content of the undesired files from the user. The information-theoretic capacity of SPIR is defined as the maximum number of information bits of the desired file retrieved per downloaded bit. In this paper, the problem of SPIR is studied for a distributed storage system with N servers (nodes), where all data (including the files and the randomness) are stored in a distributed way. Specifically, the files are stored by an (N, K_c)-MDS storage code, and the randomness is stored distributedly such that any K_c servers hold independent randomness. We consider two scenarios regarding the ability of the storage nodes to cooperate. In the first scenario, the storage nodes do not communicate or collude. It is shown that the SPIR capacity for MDS-coded storage (hence called MDS-SPIR) is 1 - K_c/N, when the total amount of randomness at the distributed nodes (unavailable to the user) is at least K_c/(N - K_c) times the file size; otherwise, the MDS-SPIR capacity equals zero. The second scenario is the T-colluding SPIR problem (hence called TSPIR), where any T out of the N servers may collude, that is, they may share their interactions with the user to guess the identity of the requested file. In the special case with K_c = 1, i.e., the database replicated at each node, the capacity of TSPIR is shown to be 1 - T/N, with the ratio of the total randomness size to the file size being at least T/(N - T). For TSPIR with MDS-coded storage (called MDS-TSPIR for short), when restricted to schemes with additive randomness, where the servers add the randomness to their answers regardless of the queries received, the capacity is proved to equal 1 - (K_c + T - 1)/N, with total randomness at least (K_c + T - 1)/(N - K_c - T + 1) times the file size. The MDS-TSPIR capacity for general schemes remains an open problem.
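The following sketch simply evaluates the three capacity expressions stated above for illustrative parameter values (the specific numbers are not from the paper):

```python
def mds_spir_capacity(N, Kc):
    # Non-colluding MDS-SPIR: 1 - Kc/N (requires Kc < N).
    return 1 - Kc / N

def tspir_replicated_capacity(N, T):
    # TSPIR with replicated storage (Kc = 1): 1 - T/N.
    return 1 - T / N

def mds_tspir_additive_capacity(N, Kc, T):
    # MDS-TSPIR restricted to additive-randomness schemes: 1 - (Kc + T - 1)/N.
    return 1 - (Kc + T - 1) / N

N, Kc, T = 8, 3, 2   # illustrative values
print(mds_spir_capacity(N, Kc))               # 0.625
print(tspir_replicated_capacity(N, T))        # 0.75
print(mds_tspir_additive_capacity(N, Kc, T))  # 0.5
```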
The capacity of symmetric private information retrieval (SPIR) with N servers and K messages, each coded by an (N, M)-MDS code, has been characterized as C_MDS-SPIR = 1 - M/N. A critical assumption for this result is that the randomness is similarly coded by an (N, M)-MDS code, i.e., the code parameters of the messages and of the randomness are matched. In this work, we are interested in the mismatched case, and as a preliminary result, we establish the capacity of the mismatched MDS-coded SPIR problem in an extreme setting, where the messages are coded by an (N, M)-MDS code and the randomness is replicated (i.e., coded by an (N, 1)-MDS code). The capacity is shown to be C_mis-MDS-SPIR = (1 - 1/N) · (1 + (M-1)/N · (1 + M/N + ... + (M/N)^(K-2)))^(-1). Interestingly, C_mis-MDS-SPIR > C_MDS-SPIR, so mismatched coded randomness (with more redundancy) is strictly beneficial. Further, mismatched SPIR exhibits properties similar to those of PIR.
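A small sketch that evaluates both expressions and confirms C_mis-MDS-SPIR > C_MDS-SPIR for one illustrative choice of parameters:

```python
def mds_spir_capacity(N, M):
    # Matched case: C = 1 - M/N.
    return 1 - M / N

def mismatched_mds_spir_capacity(N, M, K):
    # Mismatched case (replicated randomness):
    # C = (1 - 1/N) * (1 + (M-1)/N * sum_{k=0}^{K-2} (M/N)^k)^(-1)
    geometric = sum((M / N) ** k for k in range(K - 1))
    return (1 - 1 / N) / (1 + (M - 1) / N * geometric)

N, M, K = 4, 2, 3   # illustrative values
print(mds_spir_capacity(N, M))                # 0.5
print(mismatched_mds_spir_capacity(N, M, K))  # ~0.545 > 0.5
```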
The capacity of symmetric private information retrieval with K messages, N servers (out of which any T may collude), and an omniscient Byzantine adversary (who can corrupt any B answers) is shown to be 1 - (T+2B)/N [1], under the requirement of zero probability of error. In this work, we show that by weakening the adversary slightly (either providing secret low-rate channels between the servers and the user, or limiting the observation of the adversary), and allowing vanishing probability of error, the capacity increases to 1 - (T+B)/N.
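A minimal comparison of the two capacity expressions, with parameter values chosen only for illustration:

```python
def byzantine_tspir_capacity_zero_error(N, T, B):
    # Omniscient adversary, zero probability of error [1]: 1 - (T + 2B)/N.
    return 1 - (T + 2 * B) / N

def byzantine_tspir_capacity_weakened(N, T, B):
    # Weakened adversary, vanishing probability of error: 1 - (T + B)/N.
    return 1 - (T + B) / N

N, T, B = 10, 2, 1   # illustrative values
print(byzantine_tspir_capacity_zero_error(N, T, B))  # 0.6
print(byzantine_tspir_capacity_weakened(N, T, B))    # 0.7
```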
In many current data processing systems, the objective is often not to reproduce the data, but to compute answers to queries based on the data. The similarity identification task is to identify the items in a database that are similar to a given query item with respect to a certain metric. The problem of compression for similarity identification has been studied in [1]. Unlike classic compression problems, the focus is not on reconstructing the original data. Instead, the compression rate is determined by the desired reliability of the answers. Specifically, the identification rate of a compression scheme characterizes the minimum compression rate that can be achieved while guaranteeing reliable answers with respect to a given similarity threshold. In this paper, we study component-based quadratic similarity identification for correlated sources. The blocks are first decorrelated by the Karhunen-Loève transform, and the decorrelated data is then processed by a distinct D-admissible system for each component. We derive the identification rate of the component-based scheme for block-correlated Gaussian sources. In addition, we characterize the identification rate in a special setting where the individual component similarity thresholds are unknown and only the similarity threshold of the overall scheme is given. Furthermore, we prove that block-correlated Gaussian sources are the "most difficult" to code under this special setting.
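The decorrelation step can be illustrated as follows; this sketch only shows the Karhunen-Loève transform of correlated Gaussian blocks (the D-admissible identification systems applied per component are not modeled here), and all parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical block-correlated Gaussian source: blocks of length 4 with
# covariance Sigma, from which we draw 10000 sample blocks.
Sigma = np.array([[1.0, 0.6, 0.3, 0.1],
                  [0.6, 1.0, 0.6, 0.3],
                  [0.3, 0.6, 1.0, 0.6],
                  [0.1, 0.3, 0.6, 1.0]])
blocks = rng.multivariate_normal(np.zeros(4), Sigma, size=10000)

# Karhunen-Loeve transform: project onto the eigenvectors of the covariance.
eigvals, eigvecs = np.linalg.eigh(Sigma)
decorrelated = blocks @ eigvecs

# The transformed components are (approximately) uncorrelated, with variances
# given by the eigenvalues; each component would then be handled by its own
# D-admissible identification scheme.
print(np.round(np.cov(decorrelated, rowvar=False), 2))
print(np.round(eigvals, 2))
```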
The task of similarity identification is to identify items in a database which are similar to a given query item with respect to a given metric. The identification rate of a compression scheme characterizes the minimum rate that can be achieved while guaranteeing reliable answers with respect to a given similarity threshold [1]. In this paper, we study prediction-based quadratic similarity identification for autoregressive processes. We use an ideal linear predictor to remove linear dependencies in the autoregressive processes, and the similarity identification is conducted on the residuals. We show that the relation between the distortion of the query and database processes and the distortion of their residuals is characterized by a sequence of eigenvalues. We derive the identification rate of our prediction-based approach for autoregressive Gaussian processes. We also characterize the identification rate for the special case where only the smallest value in the sequence of eigenvalues is required to be known, and derive an analytical upper bound on it by approximating a sequence of matrices with a sequence of Toeplitz matrices.
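The prediction step can be sketched as follows; this only illustrates how an ideal one-step linear predictor turns an autoregressive Gaussian process into (approximately) white residuals (the quadratic identification scheme run on those residuals is not modeled), with all parameters hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical AR(1) Gaussian process: x[n] = a * x[n-1] + w[n].
a, n_samples = 0.8, 100000
w = rng.standard_normal(n_samples)
x = np.zeros(n_samples)
for n in range(1, n_samples):
    x[n] = a * x[n - 1] + w[n]

# Ideal linear predictor for AR(1): x_hat[n] = a * x[n-1];
# the residuals x[n] - x_hat[n] recover the innovations w[n].
residuals = x[1:] - a * x[:-1]

# The residuals are (approximately) uncorrelated: lag-1 autocorrelation near
# zero, while the original process has lag-1 autocorrelation near a = 0.8.
def lag1_corr(s):
    return np.corrcoef(s[:-1], s[1:])[0, 1]

print(round(lag1_corr(x), 3))          # ~0.8
print(round(lag1_corr(residuals), 3))  # ~0.0
```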