Multitaper Reassignment Letter


Time-frequency analysis seeks to decompose a one-dimensional signal along two dimensions, a time axis and a frequency axis; the best-known time-frequency representation is the musical score, which notates frequency vertically and time horizontally. These methods are extremely important in fields ranging from quantum mechanics (1–5) and engineering (6, 7) to animal vocalizations (8, 9), radar (10), sound analysis and speech recognition (11–13), geophysics (14, 15), shaped laser pulses (16–18), the physiology of hearing, and musicography.

§ Hainsworth, S. W., Macleod, M. D. & Wolfe, P. J. (2001) IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Oct. 21–24, Mohonk, NY, p. 26.

Results and Discussion

The auditory nerve preserves information about the phases of oscillations much more accurately than information about their amplitudes, a feature that inspired temporal theories of pitch perception (33–37). Let us consider what types of computation would be simple to perform given this information. We shall idealize the cochlea as splitting a sound signal x(t) into many component signals χ(t, ω) indexed by frequency ω:

χ(t, ω) = ∫ x(τ) e^{−iω(τ−t)} e^{−(t−τ)²/(2σ²)} dτ, [1]

where χ is the Gabor transform (19) or short-time Fourier transform (STFT) of the signal x(t). The parameter σ is the temporal resolution or time scale of the transform, and its inverse is the frequency resolution or bandwidth. The STFT χ is a smooth function of both t and ω and is strongly correlated for Δt < σ or Δω < 1/σ. In polar coordinates it decomposes into magnitude and phase, χ(t, ω) = |χ|(t, ω) e^{iφ(t,ω)}. A plot of |χ|² as a function of (t, ω) is called the spectrogram (3, 38), sonogram (8), or Husimi distribution (2, 4) of the signal x(t). We call φ(t, ω) the phase of the STFT; it is well defined for all (t, ω) except where |χ| = 0. We shall base our representation on φ.

We can easily derive two quantities from φ: the time derivative of the phase, called the instantaneous frequency (31), and the current time minus the frequency derivative of the phase (the local group delay), which we call the instantaneous time:

ω_ins(t, ω) = ∂φ/∂t,  t_ins(t, ω) = t − ∂φ/∂ω. [2]

Neural circuitry can compute or estimate these quantities from the information in the auditory nerve: the time derivative, as the time interval between action potentials in one given fiber of the auditory nerve, and the frequency derivative, from a time interval between action potentials in nearby fibers, which are tonotopically organized (34).

Any neural representation that requires explicit use of ω or t is unnatural, because it entails “knowing” the numerical values of both the central frequencies of fibers and the current time. Eq. 2 affords a way out: given an estimate of a frequency and one of a time, one may plot the instantaneous estimates against each other, making only implicit use of (t, ω), namely, as the indices in an implicit plot. So for every pair (t, ω), the pair (t_ins, ω_ins) is computed from Eq. 2, and the two components are plotted against each other in a plane that we call (abusing notation) the (t_ins, ω_ins) plane. More abstractly, Eq. 2 defines a transformation

T_{x}: (t, ω) → (t_ins, ω_ins). [3]

The transformation is signal-dependent because φ has to be computed from Eq. 1, which depends on the signal x; hence the subscript {x} on T.

The transformation given by Eqs. 2 and 3 has optimum time-frequency localization properties for simple signals (27, 28). The values of the estimates for simple test signals are given in Table 1.

So for a simple tone of frequency ω₀, the whole (t, ω) plane is transformed into a single line, (t, ω₀); similarly, for a “click,” a Dirac delta function localized at time t₀, the plane is transformed into the line (t₀, ω); and for a frequency sweep whose frequency increases linearly with time as αt, the plane collapses onto the line ω_ins = α t_ins. (The full expression for the frequency-sweep case is given in the Appendix.) So for these simple signals the transformation (t, ω) → (t_ins, ω_ins) has a simple interpretation as a projection onto a line that represents the signal. The transformation’s time-frequency localization properties are optimal, because these simple signals, independently of their slope, are represented by lines of zero thickness. Under the STFT the same signals transform into strokes with a Gaussian profile, of vertical thickness 1/σ (tones) and horizontal thickness σ (clicks).

These considerations lead to a careful restatement of the uncertainty principle. In optics it is well known that there is a difference between precision and resolution. Resolution refers to the ability to establish that there are two distinct objects at a certain distance, whereas precision refers to the accuracy with which a single object can be tracked. The wavelength of light limits resolution, but not precision. Similarly, the uncertainty principle limits the ability to separate a sum of signals as distinct objects, rather than the ability to track a single signal. The best-known distribution with optimal localization, the Wigner–Ville distribution (1), achieves optimal localization at the expense of infinitely long range in both frequency and time. Because it is bilinear, the Wigner transform of a sum of signals causes the signals to interfere or beat, no matter how far apart they are in frequency or time, seriously damaging the resolution of the transform. This nonlocality makes it unusable in practice and led to the development of Cohen’s class. In contrast, it is readily seen from Eq. 1 that the instantaneous time-frequency reassignment cannot cause a sum of signals to interfere when they are further apart than a Fourier uncertainty ellipsoid; therefore, it can resolve signals as long as they are further apart than the Fourier uncertainty ellipsoid, which is the optimal case. Thus, reassignment with instantaneous time-frequency estimates has optimal precision (unlimited) and optimal resolution (strict equality in the uncertainty relation).

We shall now derive the formulas needed to implement this method numerically. First, the derivatives of the transformation defined by Eq. 2 should be carried out analytically. The Gaussian window gives the STFT a complex analytic structure: defining z = t/σ − iσω, we can write the STFT as

χ(t, ω) = e^{−z²/2} e^{−(σω)²/2} F(z),  with F(z) = ∫ x(τ) e^{−τ²/(2σ²)} e^{τz/σ} dτ.

So, up to the factor e^{(σω)²/2}, the STFT is an analytic function of z (29). Defining χ̃(t, ω) as the transform of Eq. 1 with x(τ) replaced by τx(τ), we obtain Eq. 2 in closed form:

t_ins = Re(χ̃χ*)/|χ|²,  ω_ins = ω + Im(χ̃χ*)/(σ²|χ|²),

where χ* is the complex conjugate of χ. So the mapping is a quotient of convolutions. Therefore, computing the instantaneous time-frequency transformation requires only twice the numerical effort of an STFT.
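The quotient-of-convolutions computation lends itself to a direct numerical sketch. The Python below is our own illustration, not the authors' code: it assumes a window-centred phase convention χ(t, ω) = ∫ x(τ) e^{−iω(τ−t)} e^{−(t−τ)²/(2σ²)} dτ, builds χ and the τ-weighted transform χ̃ column by column, and forms t_ins = Re(χ̃χ*)/|χ|² and ω_ins = ω + Im(χ̃χ*)/(σ²|χ|²).

```python
import numpy as np

def instantaneous_estimates(x, sigma, dt=1.0):
    """Instantaneous time/frequency estimates from two Gaussian-window STFTs.

    Sketch of the quotient-of-convolutions formula: chi is the STFT of x(tau)
    with a window-centred phase, chi_t the same transform of tau*x(tau); then
        t_ins = Re(chi_t * conj(chi)) / |chi|^2
        w_ins = w + Im(chi_t * conj(chi)) / (sigma^2 * |chi|^2).
    Sign and normalization conventions are our assumptions, not published code.
    """
    n = len(x)
    tau = np.arange(n) * dt
    w = 2 * np.pi * np.fft.fftfreq(n, d=dt)              # frequency grid
    chi = np.empty((n, n), dtype=complex)                # rows: w, columns: t
    chi_t = np.empty((n, n), dtype=complex)
    for i, t in enumerate(tau):                          # O(n^2); fine for a sketch
        win = np.exp(-(t - tau) ** 2 / (2 * sigma ** 2))
        phase = np.exp(-1j * np.outer(w, tau - t))       # window-centred phase
        chi[:, i] = phase @ (x * win) * dt
        chi_t[:, i] = phase @ (tau * x * win) * dt
    q = chi_t * np.conj(chi) / (np.abs(chi) ** 2 + 1e-300)
    t_ins = q.real
    w_ins = w[:, None] + q.imag / sigma ** 2
    return t_ins, w_ins, chi
```

For a pure tone e^{iω₀τ}, the estimates collapse to t_ins = t and ω_ins = ω₀, the tone row of Table 1.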

Any transformation F: (ω, t) → (ω_ins, t_ins) can be used to transform a distribution in the (ω, t) plane to its corresponding distribution in the (ω_ins, t_ins) plane. If the transformation is invertible and smooth, the usual case for a coordinate transformation, this change of coordinates is done by multiplying the distribution, evaluated at the “old coordinates” F⁻¹(ω_ins, t_ins), by the Jacobian of F. Similarly, the transformation T given by Eqs. 2 and 3 transforms a distribution f in the (ω, t) plane to the (ω_ins, t_ins) plane, called a “reassigned f” (28–30, 38)§. However, because T is neither invertible nor smooth, the reassignment requires an integral approach, best visualized as the numerical algorithm shown in Fig. 1: generate a fine grid in the (t, ω) plane, map every element of this grid to its estimate (t_ins, ω_ins), and then create a two-dimensional histogram of the latter. If we weight the histogrammed points by a positive-definite distribution f(t, ω), the histogram g(t_ins, ω_ins) is the reassigned or remapped f; if the points are unweighted (i.e., f = 1), we have reassigned the uniform (Lebesgue) measure. We call the class of distributions so generated the reassignment class. Reassignment has two essential ingredients: a signal-dependent transformation of the time-frequency plane, in our case the instantaneous time-frequency mapping defined by Eq. 2, and the distribution being reassigned. We shall for the moment consider two distributions: that obtained by reassigning the spectrogram |χ|², and that obtained by reassigning 1, the Lebesgue measure. Later, we shall extend the notion of reassignment and reassign χ itself to obtain a complex reassigned transform rather than a distribution.

Fig. 1.

Reassignment. T_{x} transforms a fine grid of points in (t, ω) space into a set of points in (t_ins, ω_ins) space; we histogram these points by counting how many fall within each element of a grid in (t_ins, ω_ins) space. The contribution of each point to the count in a bin may be unweighted, as shown above, or the counting may be weighted by a function g(t, ω), in which case we say we are computing the reassigned g. The weighting function is typically the sonogram from Eq. 1. An unweighted count can be viewed as reassigning 1, or more formally, as the reassigned Lebesgue measure. For a given grid size in (t_ins, ω_ins) space, as the grid of points in the original (t, ω) space becomes finer, the values in the histogram converge to limiting values.
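In code, the histogram step of Fig. 1 reduces to a weighted two-dimensional histogram. A minimal sketch (our own interface, built on NumPy's histogram2d):

```python
import numpy as np

def reassign_histogram(t_ins, w_ins, weights=None, bins=64, ranges=None):
    """Reassignment as a weighted 2-D histogram, as in Fig. 1.

    t_ins, w_ins: arrays of instantaneous estimates on a fine (t, w) grid.
    weights: the distribution f(t, w) being reassigned (e.g. the spectrogram
    |chi|^2); weights=None histograms unweighted points, i.e. reassigns the
    Lebesgue measure. Interface and bin handling are our own choices.
    """
    f = None if weights is None else np.ravel(weights)
    hist, t_edges, w_edges = np.histogram2d(
        np.ravel(t_ins), np.ravel(w_ins), bins=bins, range=ranges, weights=f)
    return hist, t_edges, w_edges
```

As the input grid is refined with the output bins held fixed, the bin values converge, as noted in the caption.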

Neurons could implement a calculation homologous to the method shown in Fig. 1, e.g., by using varying delays (39) and the “many-are-equal” logical primitive (40), which computes histograms.

Despite its highly desirable properties, the unwieldy analytical nature of the reassignment class has prevented its wide use. Useful signal estimation requires us to know what the transformation does to both signals and noise. We shall now demonstrate the usefulness of reassignment by proving some important results for white noise. Fig. 2 shows the sonogram and reassigned sonogram of a discrete realization of white noise. In the discrete case, the signal is assumed to repeat periodically, and a sum replaces the integral in Eq. 1. If the signal has N discrete values we can compute N frequencies by Fourier transformation, so the time-frequency plane has N² “pixels,” which, having been derived from only N numbers, are correlated (19). Given a discrete realization of white noise, i.e., a vector of N independent Gaussian random numbers, the STFT has exactly N zeros on the fundamental tile of the (t, ω) plane, so, on average, the area per zero is 1. These zeros are distributed with uniform density, although they are not independently distributed.

Fig. 2.

Analysis of a discrete white-noise signal, consisting of N independent identically distributed Gaussian random variables. (Left) |χ|, represented by false colors; red and yellow show high values, and black shows zero. The horizontal axis is time and the vertical axis is frequency, as in a musical score. Although the spectrogram of white noise has a constant expectation value, its value on a specific realization fluctuates as shown here. Note the black dots pockmarking the figure; the zeros of χ determine the local structure of the reassignment transformation. (Center) The reassigned spectrogram concentrates in a thin, froth-like structure and is zero (black) elsewhere. (Right) A composite picture showing reassigned distributions and their relationship to the zeros of the STFT; the green channel of the picture shows the reassigned Lebesgue measure, the red channel displays the reassigned sonogram, and the blue channel shows the zeros of the original STFT. Note that both distributions have similar footprints (resulting in yellow lines), with the reassigned histogram tracking the high-intensity regions of the sonogram and forming a froth- or Voronoi-like pattern surrounding the zeros of the STFT.

Because the zeros of the STFT are discrete, the spectrogram is almost everywhere nonzero. In Fig. 2 the reassigned distributions are mostly zero or near zero: nonzero values concentrate in a froth-like pattern covering the ridges that separate neighboring zeros of the STFT. The Weierstrass representation theorem permits us to write the (analytic) STFT of white noise as a product over the zeros times the exponential of an entire function of quadratic type:

G(z) = e^{Q(z)} ∏_i E(z/z_i),

where Q(z) is a quadratic polynomial, the z_i are the zeros of the STFT, and E is a Weierstrass elementary factor. The phase is φ = Im ln G, and hence the instantaneous estimates in Eq. 2 become sums of magnetic-like interactions,

ω_ins = (1/σ) Im[ Q′(z) + Σ_i (d/dz) ln E(z/z_i) ],

where each summand behaves like 1/(z − z_i), the field of a vortex located at the zero z_i, and similarly for the instantaneous time; so the slow manifolds of the transformation T, where the reassigned representation has its support, are given by equations representing equilibria of magnetic-like terms.

The reassigned distributions lie on thin strips between the zeros, which occupy only a small fraction of the time-frequency plane; see Appendix for an explicit calculation of the width of the stripes in a specific case. The fraction of the time-frequency plane occupied by the support of the distribution decreases as the sequence becomes longer, as in Fig. 3; therefore, reassigned distributions are sparse in the time-frequency plane. Sparse representations are of great interest in neuroscience (41–43), particularly in auditory areas, because most neurons in the primary auditory cortex A1 are silent most of the time (44–46).

Fig. 3.

The complex reassigned representation is sparse. We generated white-noise signals with N samples and computed both their STFT and its complex reassigned transform on the N × N time-frequency square. The magnitude squared of either transform is its “energy distribution.” (Left) The probability distribution of the energy for both transforms, computed from 1,000 realizations for N = 2,048. The energy distribution of the STFT (blue) agrees exactly with the expected e^−x law (see the log-linear plot inset). The energy distribution of the complex reassigned transform (red) is substantially broader, having many more events that are either very small or very large; we show in gray the power law x^−2 for comparison. For the complex reassigned transform most elements of the 2,048 × 2,048 time-frequency plane are close to zero, whereas a few elements have extremely large values. (Right) Entropy of the energy distribution of both transforms; this entropy may be interpreted as the natural logarithm of the fraction of the time-frequency plane that the footprint of the distribution covers. For each N, we analyzed 51 realizations of the signal and displayed them as dots on the graph. The entropy of the STFT remains constant, close to its theoretical value of 0.42278, as N increases, whereas the entropy of the complex reassigned transform decreases linearly with the logarithm of N. The representation covers a smaller and smaller fraction of the time-frequency plane as N increases.

Signals superposed on noise move the zeros away from the representation of the pure signal, creating crevices. This process is shown in Fig. 4. When the signal is strong and readily detectable, its reassigned representation detaches from the underlying “froth” of noise; when the signal is weak, the reassigned representation merges into the froth, and if the signal is too weak its representation fragments into disconnected pieces.

Fig. 4.

Detection of a signal in a background of noise. Shown are the reassigned distributions and zeros of the Gabor transform, as in Fig. 2 Right. The signal analyzed here is x = ζ(t) + A sin(ω₀t), where ζ(t) is Gaussian white noise and has been kept the same across the panels. As the signal strength A is increased, a horizontal line appears at frequency ω₀. We can readily observe that the zeros that are far from ω₀ are unaffected; as A is increased, the zeros near ω₀ are repelled and form a crevice whose width increases with A. For intermediate values of A a zigzagging curve appears in the vicinity of ω₀. Note that because the instantaneous time-frequency reassignment is rotationally invariant in the time-frequency plane, detection of a click or a frequency sweep operates through the same principles, even though the energy of a frequency sweep is spread over a large portion of the spectrum.

Distributions are not explicitly invertible: they retain information about features of the original signal but irretrievably lose some information, for instance about phases. It would be desirable to reassign the full STFT χ rather than just its spectrogram |χ|²; note also that the auditory system preserves accurate timing information all of the way to primary auditory cortex (47). We shall now extend the reassignment class to complex-valued functions. To do this we need to reassign phase information, which requires more care than reassigning positive values, because complex values with rapidly rotating phases can cancel through destructive interference. We build a complex-valued histogram in which each occurrence of (t_ins, ω_ins) is weighted by χ(t, ω), but we must transform the phases so that the preimages of (t_ins, ω_ins) add coherently. The expected phase change from (t, ω) to (t_ins, ω_ins) is (ω + ω_ins)(t_ins − t)/2, i.e., the average frequency times the time difference; this correction is exact for linear frequency sweeps. Therefore, we reassign χ by histogramming (t_ins, ω_ins) weighted by χ(t, ω) e^{i(ω + ω_ins)(t_ins − t)/2}. Unlike standard reassignment, the weight for complex reassignment depends on both the point of origin and the destination.
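The complex-valued histogram with the coherent phase correction χ(t, ω) e^{i(ω + ω_ins)(t_ins − t)/2} can be sketched as follows; the array layout and binning details are our own choices, and only the phase-correction factor comes from the text.

```python
import numpy as np

def complex_reassign(chi, t_grid, w_grid, t_ins, w_ins, t_edges, w_edges):
    """Complex reassignment: histogram chi with the coherent phase correction.

    chi[i, j] = chi(t_grid[i], w_grid[j]); each point contributes
        chi(t, w) * exp(1j * (w + w_ins) * (t_ins - t) / 2)
    to the bin containing (t_ins, w_ins). Array layout and binning are our
    own choices; only the phase-correction factor comes from the text.
    """
    out = np.zeros((len(t_edges) - 1, len(w_edges) - 1), dtype=complex)
    T, W = np.meshgrid(t_grid, w_grid, indexing="ij")
    weight = chi * np.exp(1j * (W + w_ins) * (t_ins - T) / 2.0)
    ti = np.clip(np.searchsorted(t_edges, t_ins.ravel()) - 1, 0, out.shape[0] - 1)
    wi = np.clip(np.searchsorted(w_edges, w_ins.ravel()) - 1, 0, out.shape[1] - 1)
    np.add.at(out, (ti, wi), weight.ravel())              # coherent accumulation
    return out
```

Because all preimages of a bin are rotated to a common phase reference before accumulation, contributions from a linear sweep add constructively instead of cancelling.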

The complex reassigned STFT now shares an important attribute of χ that neither |χ|2 nor any other positive-definite distribution possesses: explicit invertibility. This inversion is not exact and may diverge significantly for spectrally dense signals. However, we can reconstruct simple signals directly by integrating on vertical slices, as in Fig. 5, which analyzes a chirp (a Gaussian-enveloped frequency sweep). Complex reassignment also allows us to define and compute synchrony between frequency bands: only by using the absolute phases can we check whether different components appearing to be harmonics of a single sound are actually synchronous.

Fig. 5.

Reconstruction of a chirp from the complex reassigned STFT. (Upper Left) STFT of a chirp; intensity represents magnitude, and hue represents complex phase. The spacing between lines of equal phase narrows toward the upper right, corresponding to the linearly increasing frequency. (Upper Right) Complex reassigned STFT of the same signal. The width of this representation is one pixel; the oscillation follows the same pattern. (Lower) A vertical integral of the STFT (blue) reconstructs the original signal exactly; the vertical integral of the complex reassigned transform (green) agrees with the original signal almost exactly. (Note the integral must include the mirror-symmetric, complex conjugate negative frequencies to reconstruct real signals.) (Lower Right) Full range of the chirp. (Lower Left) A detail of the rising edge of the waveform, showing the green and blue curves superposing point by point.

We defined the transformation for a single value of the bandwidth σ. Nothing prevents us from varying this bandwidth or using many bandwidths simultaneously, and indeed the auditory system appears to do so, because bandwidth varies across auditory nerve fibers and is furthermore volume-dependent. Performing reassignment as a function of time, frequency, and bandwidth we obtain a reassigned wavelet representation, which we shall not cover in this article. We shall describe a simpler method: using several bandwidths simultaneously and highlighting features that remain the same as the bandwidth is changed. When we intersect the footprints of representations for multiple bandwidths we obtain a consensus only for those features that are stable with respect to the analyzing bandwidth (31), as in Fig. 6. For spectrally more complex signals, distinct analyzing bandwidths resolve different portions of the signal. Yet the lack of predictability in the auditory stream precludes choosing the right bandwidths in advance. In Fig. 6, the analysis proceeds through many bandwidths, but only those bands that are locally optimal for the signal stand out as salient through consensus. Application of consensus to real sounds is illustrated in Fig. 7. This principle may also support robust pattern recognition in the presence of primary auditory sensors whose bandwidths depend on the intensity of the sound.
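The consensus idea can be sketched as an intersection of footprints: a cell of the time-frequency grid is kept only if it is active in the representation computed at every analysing bandwidth. The thresholding rule below is our simplified reading of the text, not the authors' implementation.

```python
import numpy as np

def consensus(maps, frac=0.1):
    """Intersection of time-frequency footprints across bandwidths.

    maps: reassigned distributions computed on a common (t, w) grid, one per
    analysing bandwidth sigma. A cell survives only if it exceeds
    frac * max(map) in every map. This thresholded-intersection rule is a
    simplified sketch of the consensus idea, not the authors' implementation.
    """
    mask = np.ones_like(maps[0], dtype=bool)
    for m in maps:
        mask &= m > frac * m.max()
    return mask
```

Features stable across bandwidths survive the intersection; features tied to one particular σ (e.g. the smeared clicks of a narrow-band analysis) are vetoed by the other maps.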

Fig. 6.

Consensus finds the best local bandwidth. Analysis of a signal x(t) composed of a series of harmonic stacks followed by a series of clicks; the separation between the stacks times the separation between the clicks is near the uncertainty limit 1/2, so no single σ can simultaneously analyze both. If the analyzing bandwidth is small (Center), the stacks are well resolved from one another, but the clicks are not. If the bandwidth is large (i.e., the temporal localization is high, Left), the clicks are resolved but the stacks merge. Using several bandwidths (Right) resolves both simultaneously.

Table 1.

The values of the estimates for simple test signals

Signal                          (t_ins, ω_ins)
Tone, x(τ) = e^{iω₀τ}           (t, ω₀)
Click, x(τ) = δ(τ − t₀)         (t₀, ω)
Sweep, frequency αt             the line ω_ins = α t_ins

1. Introduction

Oscillatory signals occur in a wide range of fields, including geophysics, biology, medicine, finance and social dynamics. They often consist of several different oscillatory components, the nature, time-varying behaviour and interaction of which reflect properties of the underlying system. In general, we want to assess the number, strength and rate of oscillation of the different components constituting the signal, to separate noise from signal, and to isolate individual components; efficient and robust extraction of this information from an observed signal will help us better describe and quantify the underlying dynamics that govern the system. For each of the quantities of interest listed, we thus want an estimator that is consistent, that has (ideally) small variance and that produces results robust to different types of noise.

If the observed signal f can be written as a finite sum of so-called harmonic components, i.e.

f(t) = Σ_{ℓ=1}^{L} a_ℓ cos(2π ξ_ℓ t),

where a_ℓ > 0 (respectively, ξ_ℓ > 0) represents the strength or amplitude (respectively, frequency) of the ℓth component, then one can recover the a_ℓ and ξ_ℓ from time samples of f(t) via the Fourier transform of f, defined by f̂(ξ) = ∫ f(t) e^{−i2πξt} dt. (If the ξ_ℓ are all integer multiples of a common base frequency 1/t₀, then the integral can be taken over an interval of length t₀; when this is not the case, one can resort to integrals over long time intervals and average. Typically, only discrete samples of f are known, rather than the continuous-time course f(t), and the integrals are estimated by quadrature.) However, oscillatory signals of interest often have more complex behaviour. We shall be interested in particular in signals that are still combinations of ‘elementary’ oscillations, but in which both the amplitude and the frequency of the components are no longer constant; they can be written as

f(t) = Σ_{k=1}^{K} A_k(t) cos(2π φ_k(t)),  (1.1)

where A_k(t) > 0 and φ′_k(t) > 0 for all k, but A_k(t) and φ′_k(t) are not constants. One can compute the Fourier transform f̂ of such signals, and recover f from f̂ (this can be validly done for a much wider class of functions), but it is now less straightforward to determine the time-varying amplitudes A_k(t) and the so-called ‘instantaneous frequencies’ φ′_k(t) from f̂. Although the time-local behaviour of the oscillations, and their deviation from perfect periodicity, cannot be captured by the Fourier transform in an easily ‘readable’ way, an accurate description of this instantaneous behaviour is nevertheless important in many applications, both to understand the system producing the signal and to predict its future behaviour. Examples in the medical field include studies of the circadian [1,2] and cortical rhythms [3], or of heart rate [4,5] and respiratory variability [6,7], all widely studied to understand physiology and predict clinical outcomes.
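For the purely harmonic case, the Fourier recovery described above takes a few lines of NumPy. The peak-picking helper below is illustrative (its names are ours); it assumes the sampling interval contains a whole number of periods of each component, so that spectral leakage vanishes.

```python
import numpy as np

def harmonic_peaks(samples, dt, k=1):
    """Recover (frequency, amplitude) pairs of a finite harmonic sum.

    For f(t) = sum_l a_l cos(2*pi*xi_l*t) sampled over an interval containing
    a whole number of periods of every component, the one-sided FFT magnitude
    peaks exactly at the xi_l with height a_l/2. Helper names are illustrative.
    """
    n = len(samples)
    spec = np.abs(np.fft.rfft(samples)) / n               # one-sided magnitude
    freqs = np.fft.rfftfreq(n, d=dt)
    idx = np.argsort(spec[1:])[::-1][:k] + 1              # k largest non-DC bins
    return [(freqs[i], 2 * spec[i]) for i in sorted(idx)]
```

When the ξ_ℓ are not commensurate with the sampling interval, leakage spreads each peak over neighbouring bins and this exact recovery degrades, which is one reason the time-varying model (1.1) needs the more refined tools discussed next.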

The last 50 years have seen many approaches, in applied harmonic analysis and signal processing, to develop useful analysis tools for signals of this type; this is the domain of time–frequency (TF) analysis. Several algorithms and associated theories have been developed and widely applied (see e.g. the overview [8]); well-known examples include the short-time Fourier transform (STFT), continuous wavelet transform (CWT) and Wigner–Ville distribution (WVD). The main idea is often to ‘localize’ a portion of the signal in time, and then ‘measure’ the oscillatory behaviour of this portion. For example, given a function f ∈ L², the windowed transform or STFT associated with a window function h(t) can be defined as

V_f^{(h)}(t, η) = ∫ f(s) h(s − t) e^{−i2πη(s−t)} ds,  (1.2)

where t is the time, η is the frequency and h is the window function chosen by the user—a commonly used choice is the Gaussian function with kernel bandwidth σ > 0, i.e. h(t) = (2πσ)^{−1/2} e^{−t²/(2σ²)}. (The overall phase factor e^{i2πηt} is not always present in the STFT, leading to the name modified short-time Fourier transform (mSTFT) for this particular form in [9].)
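A direct-quadrature sketch of (1.2) with the Gaussian window; the discretization and grid handling below are our own choices, and only the formula is from the text.

```python
import numpy as np

def mstft(f, t_grid, eta_grid, sigma, dt):
    """Modified STFT of (1.2) with a Gaussian window, by direct quadrature.

    V(t, eta) = integral f(s) h(s - t) exp(-i*2*pi*eta*(s - t)) ds, with
    h(s) = (2*pi*sigma)**-0.5 * exp(-s**2 / (2*sigma**2)). The phase factor
    exp(i*2*pi*eta*t) is included, following the mSTFT convention of the
    text; the discretization and the grids are our own choices.
    """
    s = np.arange(len(f)) * dt
    V = np.empty((len(t_grid), len(eta_grid)), dtype=complex)
    for i, t in enumerate(t_grid):
        win = (2 * np.pi * sigma) ** -0.5 * np.exp(-(s - t) ** 2 / (2 * sigma ** 2))
        V[i] = np.exp(-2j * np.pi * np.outer(eta_grid, s - t)) @ (f * win) * dt
    return V
```

For a complex tone e^{i2πη₀s}, |V(t, η)| is a Gaussian bump in η centred on η₀, illustrating the Heisenberg smearing discussed below.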

Other, more specialized methods, targeting in particular signals of type (1.1), include the empirical mode decomposition [10], ensemble empirical mode decomposition [11], the sparsity approach [12], iterative convolution filtering [13,14], the approximation approach [15], non-local mean approach [16], time-varying autoregression and moving average approach [17] as well as the synchrosqueezing transforms (SSTs) introduced and studied by some of us [9,18–21].

All TF methods that target reasonably large classes of functions (as opposed to functions with such specific models that complete characterization requires only fitting a small number of parameters) must face the Heisenberg uncertainty principle, limiting how accurately oscillatory information can be captured over short time intervals; for toy signals specially designed to have precise TF properties (e.g. chirps), this typically expresses itself by a ‘blurring’ or ‘smearing out’ of their TF representation, regardless of the analysis tool used. Reassignment methods [22–24], introduced in 1978 and recently attracting more attention again, were proposed to analyse and possibly counter this. Their main idea is to analyse the local behaviour in the TF plane of portions of the representation, and determine nearby possible TF concentration candidates that best explain it; each small portion is then reallocated to its ‘right’ place in the TF plane, to obtain a more concentrated TF representation that, one hopes, gives a faithful and precise rendering of the TF properties of the signal. Reassignment methods can be applied to very general TF representations [8,23]; they can be adaptive as well [25]. It has been argued recently [16] that reassignment methods can be viewed as analogues of ‘non-local means’ techniques commonly applied in image processing; this provides an intuitive explanation for their robustness to noise.

The SST can be viewed as a special reassignment method [23–25]. In SST, the STFT or CWT coefficients are reassigned only in the frequency ‘direction’ [19,21,26]; this preserves causality, making it possible to reconstruct each component with a real-time implementation [27]. The STFT-based SST of f is defined as

S_f(t, ξ) = ∫ V_f^{(h)}(t, η) g_α(ξ − Ω_f(t, η)) dη,  (1.3)

where g_α is an ‘approximate δ-function’ (i.e. g is smooth and has fast decay, with ∫ g(t) dt = 1, so that g_α(t) := (1/α) g(t/α) tends weakly to the delta measure δ as α→0), and with Ω_f defined by

Ω_f(t, η) := (∂_t V_f^{(h)}(t, η)) / (2πi V_f^{(h)}(t, η)),  (1.4)

wherever V_f^{(h)}(t, η) ≠ 0. The SST for CWT is defined similarly; see [19,28], or §2. SST was proposed originally for sound signals [18,29]; its theoretical properties have been studied extensively [19,26–28,30–32], including its stability to different types of noise [28,33]. Several variations of the SST have been proposed [31,34–37]; in particular, the SST approach can also be used for other TF representations, such as the wave packet transform [35] and S-transform [38], and it can be extended to two-dimensional signals (such as images) [39,40]. In addition, its practical usefulness has been demonstrated in a wide range of fields, including medicine [5–7,41–44], mechanics [37,45,46], finance [47,48], geography [38,49–51], denoising [31], atomic physics [52–54] and image analysis [55,56].
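A minimal discrete sketch of the frequency-direction reassignment of (1.3)–(1.4): the phase derivative is estimated by a finite difference in t, and the smoothing kernel g_α is replaced by hard binning to the nearest frequency slot, a common discrete shortcut rather than the paper's exact definition.

```python
import numpy as np

def sst(V, eta_grid, dt):
    """Frequency-direction synchrosqueezing of an mSTFT matrix V[t, eta].

    The reassignment rule Omega of (1.4) is estimated as a finite difference
    of the phase in t; each coefficient is then moved to the nearest bin of
    eta_grid, i.e. g_alpha in (1.3) is replaced by hard binning -- a common
    discrete shortcut, not the paper's exact definition. Threshold is ours.
    """
    T, E = V.shape
    S = np.zeros_like(V)
    dphase = np.angle(V[1:] * np.conj(V[:-1]))            # phase increment in t
    omega = dphase / (2 * np.pi * dt)                     # estimate of Omega
    for i in range(T - 1):
        for j in range(E):
            if abs(V[i, j]) < 1e-8:                       # hard threshold
                continue
            k = int(np.argmin(np.abs(eta_grid - omega[i, j])))
            S[i, k] += V[i, j]                            # squeeze along frequency
    return S
```

For a pure tone, every coefficient in the smeared Gaussian ridge carries the same phase velocity, so all of them are squeezed into the single correct frequency bin.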

The SST approach can extract the instantaneous frequency and reconstruct the constitutional oscillatory components of a signal of type (1.1) in the presence of noise [28,33]. However, its performance suffers when the signal-to-noise ratio (SNR) becomes low: as the noise level increases, and even before it completely obscures the main concentration in the TF plane of the signal, spurious concentration areas appear elsewhere in the TF plane, caused by correlations introduced by the overcomplete STFT or CWT analysis tool. The effect of these misleading perturbations, which downgrade the quality of the results, can be countered, to some extent, by multitapering.

Multitapering is a technique originally proposed to reduce the variance, and hence stabilize, power spectrum estimation in the spectral analysis of stationary signals [57–59]. Sampling the signal during only a finite interval leads to artefacts, traditionally reduced by tapering; an unfortunate side effect of tapering is to diminish the impact of samples at the extremes of the time interval. Thomson [57] showed that one can nevertheless exploit optimally the information provided by the samples at the extremities, by using several orthonormal functions as tapers: the average of the corresponding power spectra is a good estimator with reduced variance. This technique has since been applied widely [58,60–63]. Multitapering was later extended to non-stationary TF analysis by combining it with reassignment [64–66]: a more robust ‘combined’ reassigned TF representation is obtained by picking orthonormal ‘windows’ (used to isolate portions of the TF representation when working with a reassignment method), and averaging the reassigned TF representations determined by each of the individual windows. Heuristically, the concentration for a ‘true’ constituting component of the signal will be in similar locations in the TF plane for each of the individual reassigned TF representations, whereas the spurious concentrations, artefacts of correlations between noise and the windowing function, tend not to be co-located and have a diminished impact when averaged. In the SST context, a similar multitaper idea was used by one of us in a study of anaesthesia depth [5,44], in which J different window functions h_j, j = 1,…,J, are considered, and the multitaper SST (MTSST) is computed as the average of the individual SSTs:

S^{MT}_f = (1/J) Σ_{j=1}^{J} S^{(h_j)}_f.

Using multiple tapers reduces artefacts, and the MTSST remains ‘readable’ at higher noise levels than a ‘simple’ SST [5,44]. To suppress noise artefacts better, it is tempting to consider increasing J.
However, the area in the TF plane over which the signal TF information is ‘smeared out’ also increases (linearly) with J, and a balance needs to be observed; in the multitaper reassignment method of [64], for instance, six Hermite functions were used (i.e. J=6).
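Hermite functions are the standard orthonormal tapers for this construction. The sketch below generates them by the standard recurrence and averages per-window SST maps; the grid, scale parameter, and discrete normalization are our own choices.

```python
import numpy as np

def hermite_windows(n, J, sigma):
    """First J orthonormal Hermite functions on an n-point grid.

    Hermite functions are the standard multitaper windows for reassignment
    (six were used in [64]); the recurrence is
        psi_{k+1} = sqrt(2/(k+1)) * x * psi_k - sqrt(k/(k+1)) * psi_{k-1}.
    Grid placement, scale, and discrete normalization are our own choices.
    """
    x = (np.arange(n) - n / 2) / sigma
    h = [np.pi ** -0.25 * np.exp(-x ** 2 / 2)]
    if J > 1:
        h.append(np.sqrt(2.0) * x * h[0])
    for k in range(1, J - 1):
        h.append(np.sqrt(2.0 / (k + 1)) * x * h[k] - np.sqrt(k / (k + 1.0)) * h[k - 1])
    h = np.array(h[:J])
    return h / np.linalg.norm(h, axis=1, keepdims=True)   # discrete orthonormality

def mtsst(sst_maps):
    """Multitaper SST: the plain average of the J per-window SST maps."""
    return np.mean(sst_maps, axis=0)
```

The trade-off in the text shows up here directly: higher-order Hermite windows occupy ever larger TF footprints, so increasing J beyond a handful smears the signal's own concentration as well.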

In this paper, we introduce a new approach to obtain better concentrated time–frequency representations, which we call ConceFT, for ‘concentration of frequency and time’. It is based on STFT- or CWT-based SSTs, but the approach could be applied to yet other TF decomposition tools. The ConceFT algorithm will be defined precisely below, in §2. Like MTSST, ConceFT starts from a multilayered time–frequency representation, but instead of averaging the SST results obtained from STFT or CWT for orthonormal windows, which can be viewed as elements in a vector space of time–frequency functions, it considers many different projections in this same vector space, and averages the corresponding SSTs; for more details, see §2; figures illustrating the result of STFT-based ConceFT can be found in the Introduction section of the electronic supplementary material. Section 3 studies the theoretical properties of ConceFT, and explains how it can provide reliable results under challenging SNR conditions; finally, in §4, we provide several numerical results.

2. The ConceFT algorithm

We start by briefly reviewing SST. In the Introduction, we defined the STFT-based SST, discussed in more detail in [21,26]; to show that the situation is very similar for CWT-based SST, we discuss that case here; see [19,28] for details. We start with the wavelet ψ with respect to which the CWT will be computed, which must necessarily have mean zero, that is, ∫ ψ(t) dt = 0; let us also pick it to be a Schwartz function. We shall assume that we are dealing with real signals f; in this case, the symmetry in ξ of f̂ makes it possible to consider only the ‘positive-frequency part’ of f, by picking ψ so that its Fourier transform is supported on the positive half-line. (The approach can be extended easily to handle complex signals as well, but the notation becomes a bit heavier.) Then the continuous wavelet transform of a tempered distribution f, with the variables a, b standing for scale and time location, is defined as the inner product of f with ψ_{(a,b)}(t) = |a|^{−1/2} ψ((t − b)/a):

W_f(a, b) = ⟨f, ψ_{(a,b)}⟩.

Even if the Fourier transform of f is very concentrated around some frequency ω₀, the magnitude of the CWT will be spread out over a range of scales a corresponding to a neighbourhood of ω₀. However, the phase information of W_f(a, b) will still hold a ‘fingerprint’ of ω₀ on that whole neighbourhood, in that W_f(a, b) will show oscillatory behaviour in b, with frequency ω₀, for a range of different a. This is the motivation for the SST, which shifts the CWT coefficients ‘back’, according to certain reassignment rules determined by the phase information. More concretely, we set a threshold Γ > 0, and then define

ω_f(a, b) := (∂_b W_f(a, b)) / (2πi W_f(a, b)) for |W_f(a, b)| > Γ,

leaving ω_f(a, b) undefined where |W_f(a, b)| ≤ Γ, where ∂_b is the partial derivative with respect to b (see the electronic supplementary material for a remark concerning robust numerical implementation). The hard threshold Γ can be adjusted for best reduction of the numerical error and noise influence. For example, when the noise level is known or estimated with reasonable accuracy, Γ can be chosen so as to reduce the influence of the noise most effectively; see [33] for a discussion.
The CWT-based SST then moves the CWT coefficient to the ‘right’ frequency slot, using Ω_f as guideline:

S_f(b,ξ) = ∫_(A_(Γ,f)(b)) W_f(a,b) (1/α) g((ξ − Ω_f(a,b))/α) a^(−3/2) da,   (2.1)

where 0 < α ≪ 1 is chosen by the user, g is a smooth function such that g_α(⋅) := (1/α)g(⋅/α) → δ in the weak sense as α → 0, and the factor a^(−3/2) is introduced to ensure that the integral of S_f(b,ξ) over ξ yields a close approximation to the original f(b). For more details, we refer the reader to [19,28].
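To make the preceding definitions concrete, the following is a minimal numpy sketch of CWT-based SST. It assumes a Morlet-type analytic wavelet (relaxing the compact-support conditions used later) and discretizes (2.1) by nearest-bin reassignment; all function names, the relative threshold, and the discretization choices are illustrative, not those of [19,28].

```python
import numpy as np

def cwt(x, dt, scales, omega0=6.0):
    """CWT via FFT with an analytic Morlet-type wavelet (illustrative choice)."""
    n = len(x)
    xi = 2 * np.pi * np.fft.fftfreq(n, d=dt)           # angular frequency grid
    X = np.fft.fft(x)
    W = np.empty((len(scales), n), dtype=complex)
    for i, a in enumerate(scales):
        # hat(psi)(a*xi), kept on positive frequencies only (analytic wavelet)
        psi_hat = np.exp(-0.5 * (a * xi - omega0)**2) * (xi > 0)
        W[i] = np.fft.ifft(X * np.conj(psi_hat)) * np.sqrt(a)
    return W

def synchrosqueeze(W, scales, freqs, dt, rel_gamma=0.05):
    """Reassign each CWT coefficient to the frequency bin nearest the
    phase-based estimate Omega = Im(d_b W / W) / (2*pi)."""
    gamma = rel_gamma * np.abs(W).max()                # hard threshold Gamma
    dW = np.gradient(W, dt, axis=1)                    # partial_b W
    with np.errstate(all='ignore'):
        omega = np.imag(dW / W) / (2 * np.pi)          # instantaneous frequency, Hz
    S = np.zeros((len(freqs), W.shape[1]), dtype=complex)
    df = freqs[1] - freqs[0]
    for i, a in enumerate(scales):
        wgt = a ** -1.5                                # the a^(-3/2) factor in (2.1)
        for b in range(W.shape[1]):
            if abs(W[i, b]) > gamma and np.isfinite(omega[i, b]):
                k = int(round((omega[i, b] - freqs[0]) / df))
                if 0 <= k < len(freqs):
                    S[k, b] += W[i, b] * wgt
    return S

# demo: the energy of a 10 Hz tone should concentrate near the 10 Hz bin
fs = 100.0
tt = np.arange(0.0, 8.0, 1.0 / fs)
scales = np.geomspace(0.03, 0.3, 40)
freqs = np.linspace(5.0, 20.0, 61)
W = cwt(np.cos(2 * np.pi * 10.0 * tt), 1.0 / fs, scales)
S = synchrosqueeze(W, scales, freqs, 1.0 / fs)
```

The hard threshold is taken relative to the largest coefficient here purely for convenience; as the text notes, in practice Γ would be tuned to the (estimated) noise level.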

Although both the CWT and its derived SST depend on the choice of the reference wavelet ψ, this dependence is much less pronounced for the SST; CWT-based SSTs corresponding to different reference wavelets lead to different but very similar TF representations. (Theoretical reasons for this can be found in [19,28].) In particular, the dominant components in the TF representations are very similar. Moreover, even when the signal is contaminated by noise, these dominant components in the TF representations are not significantly disturbed [28]. However, the distribution of artefacts across the TF representation, induced by the noise, as seen in e.g. the middle left panel of figure S.1, varies from one reference wavelet to another; this can be intuitively explained by observing that the CWT is essentially a convolution with (scaled versions of) the reference wavelet, so that the wavelet transforms of independent and identically distributed (i.i.d.) noise based on different orthogonal reference wavelets are independent. These observations lead to the idea of a multitaper SST algorithm [5,44]. In brief, given J orthonormal reference wavelets ψ_j, j=1,…,J, one determines the reassignment rules Ω_f^(j), as well as the corresponding SSTs S_f^(j), and then defines the MTSST by

S_f^MT(b,ξ) = (1/J) Σ_(j=1)^J S_f^(j)(b,ξ).   (2.2)

This suggests that averaging over a large number of orthonormal reference wavelets would smooth out completely the TF artefacts induced by the noise, as originally discussed for the reassignment method [64]. However, in order for reassignment to make sense, the reference function, whether it is the window h for STFT or the wavelet ψ for CWT, must itself be fairly well concentrated in time and frequency, so that inner products with modulated window functions or scaled wavelets do not mix up different components and behaviours of the signal.
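The independence of the noise-induced artefact patterns can be checked empirically in the analogous STFT setting. The sketch below (the window shapes, lengths, and hop are arbitrary choices, not prescribed by the text) computes spectrograms of the same i.i.d. Gaussian noise with two orthonormal Hermite-type windows and measures the correlation between the two artefact patterns, which should be close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, win, hop = 4096, 65, 16
t = (np.arange(win) - win // 2) / 8.0
g = np.exp(-t**2 / 2)                 # Gaussian window (Hermite order 0)
h = t * np.exp(-t**2 / 2)             # first Hermite window, orthogonal to g
g /= np.linalg.norm(g)
h /= np.linalg.norm(h)

def spectrogram(x, w):
    frames = [x[i:i + len(w)] * w for i in range(0, len(x) - len(w), hop)]
    return np.abs(np.fft.rfft(frames, axis=1))**2

noise = rng.standard_normal(n)
P_g, P_h = spectrogram(noise, g), spectrogram(noise, h)
corr = np.corrcoef(P_g.ravel(), P_h.ravel())[0, 1]
```

Since the two windows are orthogonal, the STFT coefficients of the same white noise under each window are uncorrelated at every TF point, so the two spectrograms show essentially unrelated artefact patterns.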
On the other hand, there is a limit to how many orthonormal functions can be ‘mostly’ supported in a concentrated region in the TF plane: by a rule of thumb generalizing the Nyquist sampling density, one can find, for a region R in the TF plane, only about |R| orthonormal functions that are mostly concentrated on R, where |R| denotes the area of R [67]. This limits how many different orthonormal ψ_j can be used in MTSST.

ConceFT uses the different TF ‘views’ provided by the CWT transforms in a different way, exploiting the nonlinearity of the SST operation. (See the electronic supplementary material for a sketch of an alternative way in which one could extend multitaper CWT; it is not pursued in this paper, however.) For each choice of ψ, the collection of CWTs W_f^ψ, where f ranges over the class of signals of interest, spans a subspace of the space of all reasonably smooth functions of the two variables a, b. Different orthonormal ψ_j generate different such subspaces; combined, they generate a larger subspace, in which one can define an infinite number of ‘sections’, each corresponding to the collection of CWTs generated by one reference wavelet. Each linear combination of the ψ_j defines such a CWT space, in which one can carry out the corresponding SST operation. For ψ^(r) = Σ_(j=1)^J r_j ψ_j, where Σ_j r_j² = 1, one has W_f^(r) = Σ_(j=1)^J r_j W_f^(ψ_j); because synchrosqueezing is a highly nonlinear operation, the corresponding S_f^(r) are, however, not linear combinations of the S_f^(ψ_j). In practice, the artificial concentrations in the TF plane, triggered by fortuitous correlations between the noise and the (overcomplete) ψ_(a,b), occur at locations sufficiently different, for different choices of the vector r=(r1,…,rJ), that averaging over many choices of r successfully suppresses noise artefacts.
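The two halves of this argument, linearity of the transform in the reference function versus nonlinearity of what is done afterwards, are easy to verify numerically. In this STFT sketch (random data and windows, chosen purely for illustration), the transform with a combined window equals the same combination of transforms, while already the squared magnitudes fail to combine linearly.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(512)
g, h = rng.standard_normal(32), rng.standard_normal(32)   # two stand-in windows
r1, r2 = np.cos(0.7), np.sin(0.7)                          # r1**2 + r2**2 = 1

def stft(x, w, hop=8):
    frames = [x[i:i + len(w)] * w for i in range(0, len(x) - len(w), hop)]
    return np.fft.rfft(frames, axis=1)

combined = stft(x, r1 * g + r2 * h)
mixture = r1 * stft(x, g) + r2 * stft(x, h)
# the transform itself is linear in the window ...
lin_ok = np.allclose(combined, mixture)
# ... but the squared magnitude (and a fortiori the SST) is not
sq_ok = np.allclose(np.abs(combined)**2,
                    r1**2 * np.abs(stft(x, g))**2 + r2**2 * np.abs(stft(x, h))**2)
```

The cross term 2 r1 r2 Re(S_g conj(S_h)) is what survives squaring; it is exactly this nonlinearity that makes the random combinations of ConceFT genuinely different views rather than redundant averages.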

More precisely, the CWT-based ConceFT algorithm proceeds as follows:

  • Take J orthonormal reference wavelets, ψ1,…,ψJ, in the Schwartz space, with good concentration in the TF plane.

  • Pick N random vectors r_n, n=1,…,N, of unit norm in ℝ^J; that is, uniformly select N samples in the unit sphere S^(J−1).

  • For each n=1,…,N, define ψ^(n) := Σ_(j=1)^J (r_n)_j ψ_j, and the corresponding CWT W_f^(n)(a,b) := ⟨f, (ψ^(n))_(a,b)⟩.

  • Select the threshold Γ>0 and the approximation parameter α>0, and evaluate, for each n between 1 and N, the corresponding CWT-based SST of f by computing the reassignment rule Ω_f^(n), and hence S_f^(n), as defined above, with the minor adjustment that when the expression in the reassignment rule denominator has a negative real part, we switch to the vector −r_n.

  • The final ConceFT representation of f is then the average

C_f(b,ξ) = (1/N) Σ_(n=1)^N S_f^(n)(b,ξ).   (2.3)

In practice, J could be as small as 2, while N could be chosen as large as the user wishes.
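The steps above can be sketched end to end; this is a hypothetical minimal implementation in the STFT-based variant (which §1 defines analogously), with J = 2 Hermite-type tapers and nearest-bin reassignment. For simplicity it averages the magnitudes of the N synchrosqueezed transforms rather than the complex values of (2.3), and it omits the sign adjustment for r_n; all names and parameter values are illustrative.

```python
import numpy as np

def hermite_windows(win=129, sigma=10.0):
    """Two discretely orthonormalized Hermite-type tapers (J = 2)."""
    t = np.arange(win) - win // 2
    g0 = np.exp(-(t / sigma)**2 / 2)
    g1 = (t / sigma) * g0
    return g0 / np.linalg.norm(g0), g1 / np.linalg.norm(g1)

def stft_sst(x, w, fs, rel_gamma=0.05):
    """STFT-based SST with hop 1: reassign each coefficient to the bin
    nearest its phase-derived instantaneous-frequency estimate."""
    win = len(w)
    F = np.array([np.fft.rfft(x[i:i + win] * w)
                  for i in range(len(x) - win)])       # (time, frequency)
    dF = np.gradient(F, 1.0 / fs, axis=0)              # partial derivative in b
    with np.errstate(all='ignore'):
        omega = np.imag(dF / F) / (2 * np.pi)          # estimate in Hz
    freqs = np.fft.rfftfreq(win, 1.0 / fs)
    df, gamma = freqs[1] - freqs[0], rel_gamma * np.abs(F).max()
    S = np.zeros_like(F)
    for b in range(F.shape[0]):
        for k in range(F.shape[1]):
            if abs(F[b, k]) > gamma and np.isfinite(omega[b, k]):
                j = int(round(omega[b, k] / df))
                if 0 <= j < F.shape[1]:
                    S[b, j] += F[b, k]
    return S, freqs

def conceft(x, fs, N=8, seed=0):
    """Average SST magnitudes over N random unit combinations of the tapers."""
    rng = np.random.default_rng(seed)
    g0, g1 = hermite_windows()
    acc = 0.0
    for _ in range(N):
        theta = rng.uniform(0.0, 2.0 * np.pi)          # uniform on S^1 for J = 2
        S, freqs = stft_sst(x, np.cos(theta) * g0 + np.sin(theta) * g1, fs)
        acc = acc + np.abs(S)
    return acc / N, freqs

# demo: a 25 Hz tone
fs = 200.0
tt = np.arange(0.0, 2.56, 1.0 / fs)
C, freqs = conceft(np.cos(2 * np.pi * 25.0 * tt), fs, N=4)
```

For J = 2, sampling r_n uniformly on S^1 reduces to drawing one angle, which is why only θ is random here.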

The square of the magnitude of C_f, i.e. |C_f(b,ξ)|², can be of interest in its own right, as an estimated time-varying power spectrum (tvPS) of f.

STFT-based ConceFT representations are defined entirely analogously, based on the STFT reassignment rule given in §1.

3. Theoretical results

In this section, we list and explain theoretical results about CWT-based ConceFT. The detailed mathematical computations and proofs can be found in the electronic supplementary material. Entirely similar results hold for STFT-based ConceFT; as they are established by the same arguments, we skip those details. We start by recalling the structure of our signal space, as introduced in [19,28]. We emphasize that this is, to a large extent, a purely phenomenological model, constructed so as to reflect the fairly (but not exactly) periodic nature of many signals of interest, in particular (but not only) those of a physiological origin (see the discussion in [6]).

A single-component or intrinsic-mode-type (IMT) function has the following form:

F(t) = A(t) cos(2π φ(t)),   (3.1)

where both the amplitude modulation A(t) and the phase function φ(t) are reasonably smooth; in addition, both A(t) and the derivative φ′(t) (or the ‘instantaneous frequency’) are strictly positive at all times as well as bounded; finally, we assume that A and φ′ vary in time at rates that are slow compared with the instantaneous frequency of F itself. For the precise mathematical formulation of these conditions, we refer to the electronic supplementary material; this precise formulation invokes a few parameters, one of which, ϵ, bounds the ratio of the rate of change of A and φ′. This parameter will play a role in our estimates below. (Although we are assuming the signal to be real-valued here, all this can easily be adapted to the complex case by replacing the cosine with the corresponding complex exponential; the discussion in the remainder of this section can be adapted similarly.) We also consider signals that contain several IMT components, that is, functions of the type

G(t) = Σ_(ℓ=1)^L F_ℓ(t) = Σ_(ℓ=1)^L A_ℓ(t) cos(2π φ_ℓ(t)),   (3.2)

where each F_ℓ is an IMT function, and we assume in addition that the instantaneous frequencies φ_ℓ′(t) are ordered (higher ℓ corresponding to larger φ_ℓ′) and well separated:

φ_(ℓ+1)′(t) − φ_ℓ′(t) ≥ d [φ_(ℓ+1)′(t) + φ_ℓ′(t)]   (3.3)

for all ℓ=1,…,L−1, for some d with 0<d<1. We denote by 𝒜_(ϵ,d) the set of all such functions G; it provides a flexible adaptive harmonic model space for a wide class of signals of interest. (Strictly speaking, they are not ‘truly’ harmonic, if harmonicity is interpreted, as it often is, as ‘having components with frequencies that are integer multiples of a fundamental frequency’.)
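For concreteness, a signal in this model class can be synthesized as follows; the particular amplitudes and instantaneous frequencies are arbitrary illustrative choices, and the separation constant d of (3.3) is evaluated numerically.

```python
import numpy as np

fs = 1000.0
t = np.arange(0.0, 4.0, 1.0 / fs)

# two IMT components: slowly varying amplitudes and instantaneous frequencies
A1 = 1.0 + 0.1 * np.cos(np.pi * t)
A2 = 1.2 + 0.1 * np.sin(np.pi * t)
if1 = 50.0 + 4.0 * np.sin(2 * np.pi * 0.25 * t)    # phi_1'(t), in Hz
if2 = 120.0 + 10.0 * t                             # phi_2'(t), a slow chirp
phi1 = np.cumsum(if1) / fs                         # phases by integration
phi2 = np.cumsum(if2) / fs

G = A1 * np.cos(2 * np.pi * phi1) + A2 * np.cos(2 * np.pi * phi2)

# largest d satisfying (3.3): phi_2' - phi_1' >= d * (phi_1' + phi_2')
d = np.min((if2 - if1) / (if2 + if1))
```

Here the amplitudes and frequencies vary on a scale of seconds while the oscillations themselves are at tens of hertz, matching the slow-modulation requirement, and the two frequency tracks never approach each other, so d stays well inside (0, 1).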

Next, we turn to the noise model for which we prove our main theoretical result. For the purposes of this theoretical discussion, we use a simple additive Gaussian white noise (even though the approach works for much more challenging noise models as well; see the electronic supplementary material). That is, we consider our noisy signals to be of the form

Y = G + σΦ,   (3.4)

where G is in 𝒜_(ϵ,d), Φ is a Gaussian white noise with standard deviation 1 and σ>0 is the noise level. Y is then a generalized random process, since white noise Φ is by definition a random tempered distribution. We could extend this, introducing also a trend and a more general noise model as in [28], the wave-shape function used in [30], or the generalized IMT functions of [68] that model oscillatory signals with fast-varying instantaneous frequency. Since none of these generalizations would significantly affect the mathematical analysis, we restrict ourselves to the model (3.4).

Finally, we describe the wavelets ψ1,…,ψJ with respect to which we compute the CWT of Y. For the convenience of the theoretical analysis, we assume that they are smooth functions with fast decay, and that their Fourier transforms are all real functions with compact support, supp ψ̂_j ⊂ [1−Δ_j, 1+Δ_j], where 0<Δ_j<1. We also assume that the ψ1,…,ψJ form an orthonormal set, that is, ⟨ψ_i, ψ_j⟩ = δ_(i,j), where δ_(i,j) is the Kronecker delta. To build appropriate linear combinations of the ψ_j, we define, for any unit-norm vector r=(r1,…,rJ) in ℝ^J, the corresponding combination ψ^(r) := Σ_(j=1)^J r_j ψ_j. It is convenient to characterize intervals for the scale a such that the support of ψ̂^(r)(a⋅) overlaps φ_ℓ′(b), where Δ := max_j Δ_j; we thus introduce the notation Z_ℓ(b) := {a : |a φ_ℓ′(b) − 1| ≤ Δ}. It then follows from the definition of the CWT as the inner product between the signal and scaled, translated versions of the wavelets that [19,28]

W_Y^(r)(a,b) = (a^(1/2)/2) A_ℓ(b) e^(2πi φ_ℓ(b)) ψ̂^(r)(a φ_ℓ′(b)) + E(a,b)  for a ∈ Z_ℓ(b),   (3.5)

where E(a,b) collects the approximation error, controlled by ϵ, and the contribution of the noise σΦ.
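A pair of functions satisfying these support and orthonormality conditions can be built directly in the Fourier domain; the bump profile below is one arbitrary choice (not the authors' construction), and the even/odd factorization guarantees orthogonality. By Parseval, and because the ψ̂_j are real, the time-domain inner products ⟨ψ_i, ψ_j⟩ can be evaluated entirely on the Fourier side, which is what the Gram matrix check does.

```python
import numpy as np

Delta = 0.5
xi = 1.0 + Delta * np.linspace(-1.0, 1.0, 4001)    # grid over [1-Delta, 1+Delta]
u = (xi - 1.0) / Delta                             # support mapped to [-1, 1]

# smooth bump compactly supported on (-1, 1)
bump = np.where(np.abs(u) < 1,
                np.exp(-1.0 / np.maximum(1.0 - u**2, 1e-300)), 0.0)

psi1_hat = bump.copy()        # even in u
psi2_hat = bump * u           # odd factor makes it orthogonal to psi1_hat

dxi = xi[1] - xi[0]
psi1_hat /= np.sqrt(np.sum(psi1_hat**2) * dxi)     # L2 normalization
psi2_hat /= np.sqrt(np.sum(psi2_hat**2) * dxi)

# Gram matrix of {psi1, psi2}, computed in the Fourier domain via Parseval
gram = np.array([[np.sum(a * b) * dxi for b in (psi1_hat, psi2_hat)]
                 for a in (psi1_hat, psi2_hat)])
```

Since both ψ̂_j vanish outside [1−Δ, 1+Δ] and off the positive half-line, the corresponding time-domain wavelets are smooth, fast-decaying and analytic, as the theoretical setup requires.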
