Multitask Noisy Speech Enhancement System

www.denoise.net

Recorder

The Recorder is an application for recording speech signals to the hard disk of a personal computer. It is a separate application so that the speech signal can be recorded simultaneously from different sources on several computers, and the resulting recordings can then be used together in the Browser and the Restorer applications.

The application records the signals received at the soundcard inputs. With a multichannel soundcard, signals from up to four sources can be recorded simultaneously (or up to 16 sources, depending on the application version and the soundcard type).

After the user starts the recording, the whole process is automatic. To save disk space, the phonation detector analyses the signal from the soundcard input, and only those fragments of the input signal that contain speech are recorded to disk (each detected fragment containing speech is written to a separate sound file).

Block diagram of the Recorder application:

The user selects a soundcard input as the source for each recording channel in the application. The number of recording channels may be set in the application options (four channels by default). The sampling frequency and the bit depth may be defined by the user. After the recording process is started, the monophonic signals from the selected sources are recorded to memory buffers. The signal samples stored in the buffers are analysed by the phonation detector (a separate detector for each recording channel). The phonation detector computes the variance of the samples in the memory buffer and decides whether the buffer contains speech and should be written to a sound file.
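The variance test described above can be illustrated with a minimal sketch. It assumes samples normalised to the range -1.0 to 1.0 and a single decision threshold. The function names and the threshold value are hypothetical and are not taken from the Recorder application.

```python
from statistics import pvariance


def buffer_variance(samples):
    """Return the population variance of one buffer of signal samples."""
    return pvariance(samples)


def contains_speech(samples, threshold):
    """Classify a buffer as speech when its variance exceeds the threshold.

    The threshold is a hypothetical tuning value chosen by the caller.
    """
    return buffer_variance(samples) > threshold


# A silent buffer has zero variance; an alternating +/-0.5 signal has 0.25.
silence = [0.0] * 100
tone = [0.5, -0.5] * 50
print(contains_speech(silence, threshold=0.01))
print(contains_speech(tone, threshold=0.01))
```

A single threshold is a simplification here: a low-level background signal has a small variance, so the threshold separates it from louder speech fragments.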

The phonation detector:

A new recording session is created when the application is started or when the user starts a new session from the application menu. After the recording process is started, all signal fragments that the phonation detector classifies as containing speech are recorded to separate sound files. The name of each sound file contains the date and time of the recording, the channel number and the source identifier. A session file contains the names of all sound files recorded during the session. The session is automatically saved to disk and may be opened later in the Browser application. The recording session is closed when the user shuts down the application or starts a new session.
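The naming scheme can be sketched as follows. The text states which fields the file name contains but not how they are laid out, so the timestamp pattern, the separators, the `.wav` extension and the one-name-per-line session file are all assumptions made for illustration.

```python
from datetime import datetime


def sound_file_name(start, channel, source_id):
    """Build a file name from the recording time, channel and source.

    The layout (YYYYMMDD_HHMMSS_chN_source.wav) is hypothetical.
    """
    stamp = start.strftime("%Y%m%d_%H%M%S")
    return f"{stamp}_ch{channel}_{source_id}.wav"


def session_file_text(file_names):
    """Return hypothetical session file contents: one sound file name per line."""
    return "\n".join(file_names) + "\n"


name = sound_file_name(datetime(2004, 3, 15, 14, 30, 5), 2, "line1")
print(name)
print(session_file_text([name]), end="")
```

Putting the timestamp first with the most significant field on the left means that sorting the names alphabetically also sorts them chronologically.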

The sound files recorded during the current session are presented graphically in the application window. Each timeline presents the sound files recorded in one recording channel, and each sound file is shown as a box (rectangle). If the user clicks on any of the boxes, the corresponding sound file is played back (sound files can be played even during the recording process) or the waveform of the recorded sound is displayed, depending on the option selected in the toolbar. The user can zoom in to see a selected part of the recording session in detail, either by selecting the fragment with the mouse or by using the zoom slider. The sound files may also be selected using the list boxes in the bottom part of the window. Clicking on the recording time inside a list box starts playback of the file.

The output of the phonation detector in each recording channel is displayed in the plot next to the timeline. The dashed lines indicate the threshold levels of the detector. If the detector output level rises above the upper threshold (red line), the signal is classified as containing speech and the signal samples are recorded to disk. If the detector output level falls below the lower threshold (blue line) and remains there for the defined sustain time, the signal is classified as noise and the recording to disk is stopped. The values of both threshold levels and the sustain time may be set in the options window. A green field near the plot indicates that the signal is being recorded to disk.
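The two-threshold rule with a sustain time behaves like a hysteresis gate, which can be sketched as below. The class name is hypothetical, and the sustain time is counted in consecutive buffers rather than in seconds, which is an assumption. Whether the buffer on which the sustain time expires is still written to disk is not stated in the text; this sketch stops on that buffer.

```python
class HysteresisGate:
    """Hypothetical two-threshold speech gate with a sustain counter."""

    def __init__(self, upper, lower, sustain_buffers):
        if lower > upper:
            raise ValueError("lower threshold must not exceed upper threshold")
        self.upper = upper
        self.lower = lower
        self.sustain_buffers = sustain_buffers
        self.recording = False
        self._below = 0  # consecutive buffers spent below the lower threshold

    def update(self, level):
        """Feed one detector output value; return True while recording."""
        if level > self.upper:
            # Rising above the upper threshold starts (or continues) recording.
            self.recording = True
            self._below = 0
        elif self.recording:
            if level < self.lower:
                self._below += 1
                if self._below >= self.sustain_buffers:
                    # Level stayed low for the whole sustain time: stop.
                    self.recording = False
                    self._below = 0
            else:
                # Between the thresholds: keep recording, reset the counter.
                self._below = 0
        return self.recording


gate = HysteresisGate(upper=0.5, lower=0.2, sustain_buffers=3)
levels = [0.1, 0.6, 0.3, 0.1, 0.1, 0.6, 0.1, 0.1, 0.1, 0.1]
print([gate.update(v) for v in levels])
```

The gap between the thresholds and the sustain time keep short pauses between words from splitting one utterance into many small sound files.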

Each recording session (the session file and all the sound files belonging to the session) is stored in a separate folder on disk. The name of the folder contains the date and time of the recording, so it is easy to find a session recorded at a given time. The Recorder application automatically monitors the amount of free space on the hard disk and alerts the user if the disk is almost full. The amount of free disk space is shown in the main application window. A separate window presents the free disk space together with information on all recorded sessions (time of the recording, number of sound files, size on disk, folder name). A selected session may be deleted from disk. Sessions can be archived using a CD recorder and a separate application.
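Free-space monitoring of this kind can be sketched with the Python standard library. The text does not say when the disk counts as "almost full", so the 5% limit below is an assumption, and the function names are hypothetical.

```python
import shutil


def is_almost_full(free_bytes, total_bytes, min_free_fraction=0.05):
    """Return True when the free share of the disk drops below the limit.

    The default limit of 5% is an assumed value, not one from the application.
    """
    if total_bytes == 0:
        return True
    return free_bytes / total_bytes < min_free_fraction


def disk_almost_full(path, min_free_fraction=0.05):
    """Check the disk that holds `path`, using shutil.disk_usage."""
    usage = shutil.disk_usage(path)
    return is_almost_full(usage.free, usage.total, min_free_fraction)


if disk_almost_full("."):
    print("Warning: the disk is almost full")
```

Separating the threshold test from the system call makes the rule easy to check with fixed numbers, for example `is_almost_full(4, 100)` for a disk with 4% free.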

If an internet connection is available, the Recorder application can be used as part of a distributed recording system. The speech signal may be recorded simultaneously on multiple personal computers connected to different communication channels that transmit the same speech signal. Multiple versions of the recording are useful in the restoration process (in the Restorer application) and for improving the listening conditions (in the Browser application). To synchronise the recordings made on the individual computers, the system clock of each computer is synchronised with a selected computer (which acts as the control server of the system) or with an internet time server. Additionally, short text messages may be sent between the recording computers (the server needs to know the IP addresses of all recording computers, so the login procedure must be completed before the recording process is started).
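The text does not say how the clocks are synchronised. One common approach, used by the Network Time Protocol, estimates the clock offset from four timestamps of a single request and reply. The sketch below shows only that arithmetic, under the assumption that the network delay is the same in both directions. The function names are hypothetical.

```python
def clock_offset(t0, t1, t2, t3):
    """Estimate how far the server clock is ahead of the local clock.

    t0: request sent (local clock)     t1: request received (server clock)
    t2: reply sent (server clock)      t3: reply received (local clock)
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def round_trip_delay(t0, t1, t2, t3):
    """Return the network round-trip time, excluding server processing time."""
    return (t3 - t0) - (t2 - t1)


# Example: the server clock is 5 s ahead, each direction takes 0.1 s,
# and the server needs 0.02 s to answer.
t0, t1, t2, t3 = 100.0, 105.1, 105.12, 100.22
print(round(clock_offset(t0, t1, t2, t3), 6))
print(round(round_trip_delay(t0, t1, t2, t3), 6))
```

Adding the estimated offset to the local timestamps would place recordings made on different computers on a common time axis.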