On the representation of wavefronts localized in space-time and wavenumber-frequency domains

of Pacific Summer Water layer variations and ice cover on Beaufort Sea underwater sound ducting Abstract: This Letter reports evidence suggesting a representation system for transient waves with band limited spectra, referred to here as localized waves in the space-time and wavenumber-frequency domains. A theoretical analysis with a transient monopole shows that the wavenumber-frequency pressure spectrum is distributed over hyperbolic regions of propagat-ing waves and evanescent waves. An experimental analysis is performed, applying dictionary learning to reverberant sound ﬁelds measured with a microphone array in three rooms. The learned components appear to be related by analytical transformations in the spectra, suggesting a partitioning characterized by hyperbolic dispersion curves and multiple directions and times of arrival. V C 2021 Author(s). All article content, except where otherwise noted, is licensed a Creative Commons Attribution (CC BY) .


Introduction
Acoustic waves emitted by a transient sound source interact with the surrounding environment, undergoing a variety of transformations due to interference, scattering, and absorption. 1 In practice, these transformations change the spectrum of the original wave, often resulting in a bandpass filtering effect in terms of incidence angles that are attenuated. An example is edge diffraction: transient waves scattered from a finite edge are confined to a localized region of space-time. 2 If, additionally, the edge has a frequency-dependent impedance, even some wavelengths can be attenuated. In this context, we are interested in all such phenomena that result in transient acoustic waves that have a band limited spectrum: we shall refer to them as waves localized in space-time and wavenumber-frequency domains.
The notion of localization is present in the acoustics literature. One of the simplest examples is the case of harmonic plane waves-localized in the wavenumber-frequency domain, which are well suited to represent steady-state responses 3 but ineffective to represent waves localized in space-time. Another example of space-wavenumber localization is the effect due to finite aperture and sensor density introduced by microphone array techniques such as near-field acoustic holography. 4 Due to measurement constraints, the finiteness (density) of the array compromises the localization of waves in the wavenumber (space) domain.
The state-of-the-art of representation systems can be summarized as follows. To represent non-stationary acoustic waves, space-time can be partitioned into spatiotemporal patches, usually followed by localized Fourier transforms. 5-7 A suitable choice of patch size allows one to localize the waves in space-time. Transient waves can also be represented in space and wavenumber domains with curvelets, 8 without the possibility of modelling, however, band-limitedness in the frequency domain. As far as the authors know, the closest approach to representing acoustic waves localized in space-time and wavenumber-frequency domains is the use of shearlets to represent room impulse responses. 9 Machine learning can also be used to train representations from acoustic data. 10 Notable examples of application include the learning of underwater sound speed profiles using K-SVD, 11 the use of sparse matrix factorization for scene classification, 12 the estimation of scattering effects for spatial audio rendering, 2 as well as the representation of steady-state sound fields in rooms with dictionary learning. 13 Despite the fact that some of the state-of-the-art methods are able to represent (or learn to represent) acoustic waves, it is not clear how to analytically design a representation system departing from the underlying architecture of the data.
The main contribution of this Letter is to report the architecture of pressure data in the wavenumber-frequency domain, as a first step towards the analytical design of a representation system for localized waves. The hypothesis is that a) Author to whom correspondence should be addressed, ORCID: 0000-0001-5723-9571. b) ORCID: 0000-0003-2153-9630. the patterns governing this architecture follow a spectral partitioning into band limited regions of hyperbolic dispersion curves and intervals of phase velocities. The hypothesis is tested with a theoretical analysis of the pressure spectra of a transient monopole source, as well as with an experimental analysis applying dictionary learning on reverberant sound fields.

Theoretical analysis
Let us consider the dispersion relation of acoustic waves in a homogeneous medium at constant temperature and in the absence of flow 3 which is obtained by Fourier transforming the linearized wave equation in Cartesian coordinates: @ tt P ¼ c 2 r 2 P, where P is the sound pressure and t is time. Here, the constant k ¼ x=c is the acoustic wavenumber, x is the angular frequency, c is the sound speed, and k i is the wavenumber along the i ¼ x; y; z spatial coordinates. Assuming without loss of generality that waves propagate along the z-axis, their wavenumber is k z ¼ 6 ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi . This leads to the concept of radiation circle: 3 k 2 ¼ k 2 x þ k 2 y . Acoustic waves are said to propagate to the far-field when k z is real, i.e., k 2 ! k 2 x þ k 2 y , while evanescent waves decay exponentially in the near-field when k z is imaginary, i.e., k 2 < k 2 x þ k 2 y . By letting k vary, the radiation circle becomes a radiation cone; in other words, the radiation cone is defined as the extrusion of the radiation circle with a variable frequency x 2 ðÀ1; 1Þ.

Analysis with a transient monopole
Let us consider two spatial dimensions (x, y) and time t. Assume a monopole sound source, located at (x s , y s ), is emitting a transient pulse towards a continuous, infinite line of receivers located at y ¼ 0, as shown in Fig. 1(a). The pulse radiated by the source is Pðx; y; tÞ Translations of the line of receivers along the x and y axes correspond to trivial (phase) transformations in the wavenumber-frequency domain ðk x ; xÞ. A non-trivial transformation in space-time that leads to a modulus change in the wavenumber-frequency domain is a rotation around the normal to the (x, y) plane [see Fig. 1 Consider a rigid rotation with an angle a 2 ðÀp=2; p=2Þ. The distance between the source and a generic point ðx cos a; x sin aÞ on the line of receivers is now a function of a, The Fourier transform of the pressure with respect to space x and time t, as a function of a, reads By defining r s ðaÞ ¼ x s cos a þ y s sin a; u ¼ x À r s ðaÞ, and g 2 ðaÞ ¼ x 2 s þ y 2 s À r s ðaÞ 2 , then rðx; aÞ ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi , and the integral becomes  cos k x u ð Þdu: From the table of integrals, 14 the pressure spectrum becomes where H ð1Þ 0 is the complex conjugate of the zero-th order Hankel function of the first kind. The pressure spectrum is singular whenever the argument of H ð1Þ 0 is zero: and this happens whenever Eq. (1) is satisfied, i.e., at the radiation cone ðx=cÞ 2 ¼ k 2 x ; and/or when gða c Þ ¼ 0 where a c ¼ tan À1 ðy s =x s Þ.

Partitioning the wavenumber-frequency domain
Let us now analyze Eq. (5) more into detail by considering four values of a. Figures 2(a)-2(d) show the corresponding sound pressure level spectra. It can be observed that there are contours of constant sound pressure level, which follow hyperbolic trajectories asymptotically approaching the radiation cone (depicted in diagonals). One can also observe that the spectrum becomes increasingly broadband as a approaches p=2 or, more generally, a c . Figures 2(e)-2(g) show the partitioning of the wavenumber-frequency domain of three state-of-the-art representation systems: (e) Cartesian, (f) directional sub-bands, 6 and (g) shearlets. 9 It can be seen that these partitioning approaches do not follow closely the structure of the spectra in Figs. 2(a)-2(d).
We propose a different partitioning of the wavenumber-frequency domain, shown in Fig. 2(h), into band limited regions described by hyperbolic dispersion curves and lines of constant phase velocity. The hyperbolic curves are naturally linked with the physics of Eq. (5), and can be described analytically by bounding the square root in the argument of the Hankel function to intervals ½a; b; 0 < a < b; which gives the hyperbolic regions a jðx=cÞ 2 À k 2 x j b. In order to make these hyperbolic regions band limited, i.e., able to represent band limited waves, we need lines of constant phase velocity. The physical motivation is that sound sources have directivity patterns in practice, 3 resulting in band limited spectra within intervals of phase velocities. This is equivalent to the so-called directional sub-bands, 6 shown in Fig. 2(f), which are closely related to the direction and time of arrival of the waves. 7 These intervals of phase velocities can be described analytically as follows: x=v 1 k x x=v 2 ; v 1 > v 2 > 0.

Experimental analysis
The experimental data consists of three sets of room impulse responses measured with a 1D microphone array. 9 Reverberant sound fields comprise rich training data involving absorption, diffraction, and interference phenomena, which results in waves with various wavelengths, directions and times of arrival. An omnidirectional source is placed in three furnished rooms, at 1.5 m height and 2 m away from the array. The spatiotemporal impulse responses are obtained with sine sweeps in the frequency bandwidth 100-4500 Hz, with sampling rate f s ¼ 11:25 kHz. The array is placed at 1.2 m height, and has 100 positions spaced by Dx ¼ 3 cm, resulting in a wavenumber bandwidth of 209 rad/m.  The space-time datasets are first pre-processed for dictionary learning. The data is divided into non-overlapping patches of 32 Â 32 space-time samples each. A Tukey window and a 2D Fourier transform is applied to each patch, resulting in pressure spectra patches in the wavenumber-frequency domain. The wavenumber-frequency resolution of each patch is Dk ¼ 6:5 rad/m and Dx ¼ 2209 rad/s. Each spectrum is then vectorized and allocated into the columns of an observation matrix B 2 C MÂN , where M ¼ 1024 is the number of space-time samples, and N ¼ 1485 is the total number of space-time patches (each of size 32 Â 32). All impulse responses, approximately 1.5 Â 10 6 samples, are considered for the training set.
The pre-processed data is then fed to the dictionary learning algorithm. The main idea is to learn a limited number of components, defined over wavenumber-frequency patches of size 32 Â 32, that best represent the variance of the data. There is a variety of dictionary learning algorithms, 15 and in this Letter we choose sparse principal component analysis. In contrast to conventional principal component analysis (PCA), sparse PCA learns a redundant set of loading vectors, i.e., no longer orthogonal, and it ensures that each data point can be represented using a small number of components. Redundant representations are known to benefit from better performance against noise and missing data, as well as a more parsimonious representation of the data. 16 The dictionary learning is formulated as the following optimization problem: which is solved by means of a soft thresholding algorithm using the MATLAB toolbox SPASM. 17 Here, b i 2 C M is the ith column of the observation matrix B, U 2 C MÂQ is the dictionary matrix whose columns are the Q learned components, w i 2 C Q contains the weights of the components, l is a regularization parameter, and jj Á jj p ¼ ½+j Á j p 1=p is the vector ' p norm, 0 < p < 1. The choice of regularization parameter is 0 < l < M, and it determines the sparsity of the learned components: l out of M non-zero coefficients. Figures 3(a)-3(r) show the magnitude of Q ¼ 18 learned components, with 12.5% non-zero coefficients; while Fig. 3(s) shows the variance of the learned components versus Q, with 12.5% and 50% non-zero coefficients. It can be seen that a less sparse (denser representation) requires more learned components (higher Q) in order to explain the same variance. Moreover, the decay of the variance is irregular for Q < 15, and it becomes nearly exponential for Q ! 15.
The components plotted in Figs. 3(a)-3(r) have overlaid the cone, and the proposed partitioning shown in Fig.  2(h). The magnitude of the learned components is symmetric [mirrored with respect to the ðk x ; xÞ axes], attributed to the fact that the training data is real. Moreover, components are learned both in the far-field and the near-field regions of the wavenumber-frequency domain. In particular, the evanescent components learned are localized in low frequency regions, which is physical since the measurement array is not too close to the boundaries of the room, 9 and near-fields decay stronger with increasing frequency. 3 The hypothesis under study can be examined with the results in Figs. 3(a)-3(r). To facilitate the analysis, we have highlighted band limited regions in the spectra with boxes A, B, C, and D. How close the learned components follow the proposed partitioning can be studied by looking at how the components relate to each other. For instance, the regions A, B, and C, appear to be related by means of a curved transformation in ðk x ; xÞ from one component to another [e.g., compare Fig. 3(b) with Fig. 3(e) and Fig. 3(h) with Fig. 3(j)]. This indicates a strong structure is present in the data, and suggests a resemblance to the hyperbolic dispersion curves observed in Figs. 2(a)-2(d). A similar observation can be drawn for the near-field regions indicated by D in Figs. 3(l) and 3(p). Most of the remaining components [see, e.g., Figs. 3(c) and 3(i) and Figs. 3(g) and 3(m)], also contain regions related by such curved transformations. More evidently, the components are also shifted diagonally within band limited intervals of phase velocities. These intervals correspond to multiple wave directions and times of arrival in the array, which is a natural feature of the training data: reverberant sound fields in rooms. 7 The outcome of the dictionary learning yields some components that have combinations of active regions in the partitioning: we refer to these as "interfering components." For example, in Fig. 3(p), two distinct band limited components can be seen in the right-hand side of the spectrum (one of which is indicated by region D), which have their symmetric counterparts with respect to ðk x ; xÞ in the left-hand side of the spectrum. This suggests the interference of band limited waves with two distinct directions of arrival. Analogously in Fig. 3(b), two band limited components (A and B, plus their symmetric counterparts) are seen within the same phase velocity interval, which suggests the interference of waves with the same direction of arrival but two distinct wavelengths. It should be noted, however, that a representation system including interfering components would be ineffective at representing individual waves. Therefore, future work handcrafting the representation system should involve only individual waves, the linear superposition of which results in their interference.

Conclusions
This Letter shows evidence suggesting a new representation system for transient band limited acoustic waves. A theoretical analysis of the transient pressure emitted by a point source shows that the wavenumber-frequency spectrum is distributed along regions of hyperbolic dispersion curves. An experimental analysis is performed using dictionary learning of reverberant sound fields measured with a microphone array. These analyses suggest a new spectral partitioning into band limited regions described by hyperbolic dispersion curves and intervals of phase velocities. The promising result is that the components learned in the experimental analysis appear to be related by analytical transformations of these band limited regions in the spectra. Future work involves handcrafting a representation system based on such analytical transformations, which we suspect might lead to an efficient implementation and a sparser representation than the state-of-the-art approaches.