Multitask Noisy Speech Enhancement System

Noise whitening

Noise (and other additive distortions) usually has a non-flat amplitude spectrum. The noise whitening module equalizes the spectrum of the signal so that it resembles the flat spectrum of white noise. It works like an automatic filter that boosts low-level spectral components and attenuates high-level ones. After whitening, de-emphasis is applied (high frequencies are attenuated) so that the spectrum of the processed signal resembles that of a speech signal, which has little energy in the high-frequency range. Together, whitening and de-emphasis improve the quality of the speech signal. The module is especially useful when a constant "whistling" is present in the recording.
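A minimal sketch of the two operations described above, assuming the noise magnitude spectrum is already known and that de-emphasis is a simple first-order low-pass tilt. The function names `whiten` and `deemphasize` and the 1 kHz default corner frequency are hypothetical and are not taken from the module itself.

```python
import numpy as np


def whiten(spectrum_mag, noise_mag, eps=1e-12):
    """Scale each bin by mean(noise) / noise so the noise floor becomes flat.

    Bins where the noise is weak are boosted, bins where it is strong
    (e.g. a constant whistle) are attenuated; mean(noise) keeps the level.
    """
    noise_mag = np.asarray(noise_mag, dtype=float)
    gain = np.mean(noise_mag) / (noise_mag + eps)
    return np.asarray(spectrum_mag, dtype=float) * gain


def deemphasize(spectrum_mag, freqs_hz, corner_hz=1000.0):
    """Assumed first-order low-pass tilt: about 0 dB below corner_hz,
    falling by roughly 6 dB per octave well above it."""
    f = np.asarray(freqs_hz, dtype=float)
    return np.asarray(spectrum_mag, dtype=float) / np.sqrt(1.0 + (f / corner_hz) ** 2)


noise = np.array([1.0, 1.0, 10.0, 1.0])   # flat noise plus one tonal "whistle"
flat = whiten(noise, noise)                # approximately [3.25, 3.25, 3.25, 3.25]
speech_like = deemphasize(flat, [0.0, 1000.0, 2000.0, 4000.0])
```

Whitening the noise against itself removes the tonal peak (the peak bin is attenuated, the others are boosted), and the de-emphasis then restores a falling, speech-like spectral tilt.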

The algorithm first finds a part of the recording that contains only noise and no speech. This part is segmented, and the spectrum of each segment is calculated. The noise estimate is the noise spectrum averaged over all segments and then smoothed (low-pass filtered). This estimate is inverted and used to restore the speech signal.
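The estimation steps can be sketched as follows. How the noise-only part is located is not specified, so this sketch simply assumes that the lowest-energy frames contain only noise; it also uses the FFT instead of the DCT and omits the smoothing step for brevity. `frame_signal` and `noise_inverse_filter` are hypothetical names.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def noise_inverse_filter(x, frame_len=256, hop=128, quiet_fraction=0.2, eps=1e-12):
    """Estimate the noise magnitude spectrum and return its inverse.

    Assumption: the lowest-energy frames are treated as the noise-only part.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)
    n_quiet = max(1, int(quiet_fraction * len(frames)))
    quiet = frames[np.argsort(energy)[:n_quiet]]
    # Average power spectrum over the noise-only frames (FFT used for brevity).
    spectra = np.fft.rfft(quiet * np.hanning(frame_len), axis=1)
    noise_psd = np.mean(np.abs(spectra) ** 2, axis=0)
    # Inverse magnitude: large where the noise is weak, small where it is strong.
    return 1.0 / np.sqrt(noise_psd + eps)
```

The returned curve has one value per FFT bin (`frame_len // 2 + 1` values); multiplying a frame's spectrum by it flattens the estimated noise floor.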

Averaging the noise spectrum over the segments yields an estimate of the noise power spectrum. For a stationary random signal x_n, the power spectrum is related to the autocorrelation function by the discrete Fourier transform (the Wiener–Khinchin theorem).

The average signal power is obtained by integrating the power spectral density (PSD) over the band limited by the Nyquist frequency; the PSD itself describes how that power is distributed over frequency.
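For reference, one common textbook form of these relations is given below. The notation is an assumption ($M$ segments $x_m[n]$ of length $N$, window $w[n]$, sampling frequency $f_s$ and period $T = 1/f_s$, autocorrelation $R_{xx}[l]$), and the module's own formulas may use a different normalisation; the averaged periodogram is shown up to a scale factor and with a DFT rather than a DCT.

```latex
\hat{S}_{xx}(k) = \frac{1}{M} \sum_{m=1}^{M} \bigl| X_m(k) \bigr|^{2},
\qquad
X_m(k) = \sum_{n=0}^{N-1} w[n]\, x_m[n]\, e^{-j 2\pi k n / N}

S_{xx}(f) = T \sum_{l=-\infty}^{\infty} R_{xx}[l]\, e^{-j 2\pi f l T},
\qquad
R_{xx}[l] = \mathrm{E}\{ x_n\, x_{n+l} \}

P = \int_{-f_s/2}^{f_s/2} S_{xx}(f)\, df = R_{xx}[0]
```

The last identity holds because the integral of $e^{-j 2\pi f l T}$ over the Nyquist band equals $f_s$ for $l = 0$ and zero for every other integer $l$.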

In practice, the averaged spectrum is obtained by calculating the spectra of the segmented signal (segments may overlap and may be weighted with a window function). The averaged spectrum is the mean spectral energy over all segments. The spectrum of each segment is calculated using the discrete cosine transform. The averaged spectrum is then smoothed with a moving average filter in the cepstral domain to avoid extreme variations in spectral magnitude.
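A sketch of this averaging, assuming Hann-windowed frames with 50% overlap and an orthonormal DCT-II computed as a matrix product. The text does not define "moving average filter in the cepstral domain" precisely; here it is interpreted as a moving average applied to the log spectrum across frequency, which is equivalent to weighting the cepstrum with a low-pass lifter. `dct2_matrix` and `averaged_dct_spectrum` are hypothetical names.

```python
import numpy as np


def dct2_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (row k = basis function k)."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)  # scale the DC row so the basis is orthonormal
    return c


def averaged_dct_spectrum(x, frame_len=256, hop=128, smooth_bins=9, eps=1e-12):
    """Average DCT spectral energy over frames, then smooth the log spectrum.

    smooth_bins must be odd so the output keeps frame_len bins.
    """
    if smooth_bins % 2 == 0:
        raise ValueError("smooth_bins must be odd")
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    basis = dct2_matrix(frame_len)
    energy = np.zeros(frame_len)
    for i in range(n_frames):
        seg = x[i * hop : i * hop + frame_len] * window
        energy += (basis @ seg) ** 2
    energy /= n_frames
    # Assumed interpretation: moving average of the log spectrum across bins.
    log_spec = np.log(energy + eps)
    kernel = np.ones(smooth_bins) / smooth_bins
    padded = np.pad(log_spec, smooth_bins // 2, mode="edge")
    smoothed = np.convolve(padded, kernel, mode="valid")
    return np.exp(smoothed)
```

Smoothing in the log domain tames isolated spectral peaks and dips more evenly than smoothing the linear energy would, which matches the stated goal of avoiding extreme magnitude variations.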

The following parameters may be set by the user.

  • Filtering depth - for low values the signal is barely altered; for high values the signal approaches white noise. It is recommended to start with the filtering depth at its middle value, increase it until the distortions are removed, and then decrease it until satisfactory speech quality is obtained.
  • Pre-emphasis depth - higher values give more attenuation in the high-frequency range.
  • Pre-emphasis slope - the frequency above which the de-emphasis attenuates the signal.
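The three parameters could map onto a gain curve roughly as follows. The mapping is an assumption: filtering depth is taken as an exponent between 0 (no change) and 1 (full whitening), "Pre-emphasis depth" as the maximum high-frequency attenuation in dB, and "Pre-emphasis slope" as the corner frequency in Hz. `WhiteningParams` and `whitening_curve` are hypothetical names.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class WhiteningParams:
    filtering_depth: float = 0.5         # assumed range: 0 = unchanged, 1 = fully white
    deemphasis_depth_db: float = 6.0     # assumed: max high-frequency attenuation (dB)
    deemphasis_freq_hz: float = 1000.0   # assumed: frequency where attenuation starts


def whitening_curve(noise_psd, freqs_hz, p, eps=1e-12):
    """Per-bin gain combining partial whitening and de-emphasis."""
    noise_mag = np.sqrt(np.asarray(noise_psd, dtype=float) + eps)
    # Partial whitening: the exponent moves from no change (0) to full inverse (1).
    gain = noise_mag ** (-p.filtering_depth)
    gain = gain / np.exp(np.mean(np.log(gain)))  # geometric-mean normalisation
    # De-emphasis: attenuation rises smoothly above the corner frequency and
    # saturates at deemphasis_depth_db (half of it exactly at the corner).
    ratio = (np.asarray(freqs_hz, dtype=float) / p.deemphasis_freq_hz) ** 2
    atten_db = p.deemphasis_depth_db * ratio / (1.0 + ratio)
    return gain * 10.0 ** (-atten_db / 20.0)
```

With both depths at 0 the curve is all ones and the signal passes unchanged, matching the described behaviour for low filtering depth; with `filtering_depth=1` the noise magnitude multiplied by the curve is flat before de-emphasis.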