REPAIR
Declipping is the process of removing or reducing the perceptual effect of clipping from clipped audio. It consists of two steps: clipping detection, where samples are flagged as clipped or not clipped, and sample replacement, where all clipped samples are replaced with estimates of the original signal (or at least, estimates that remove the perceptual effect of clipping). Note that clipping is not a one-to-one mapping: multiple input amplitude values can be mapped to the same output amplitude value. The original signal therefore cannot, in general, be reconstructed without making assumptions about its properties.
To detect clipping, we inspect the amplitude histogram of the signal. A clipped signal will have a local maximum near the clipping threshold, since all values beyond the threshold get mapped at or near the clipping threshold. A non-clipped signal should have a peak at the DC value and decrease monotonically to zero in both directions. Looking for the unnatural local maxima allows us to estimate the clipping threshold, and wherever the signal reaches or rises above the clipping threshold, we detect clipping.
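The histogram check described above can be sketched roughly as follows. This is a simplified illustration, not the published detector: it assumes symmetric clipping (so it works on absolute amplitudes), and the function names, the bin count, and the `ratio` and `n_neighbours` parameters are hypothetical choices made for this example.

```python
import numpy as np

def estimate_clip_threshold(x, n_bins=200, ratio=3.0, n_neighbours=5):
    """Return an estimated clipping threshold, or None if no clipping is found.

    Assumption: clipping is symmetric, so the histogram of |x| is inspected.
    """
    mag = np.abs(np.asarray(x, dtype=float))
    peak = mag.max()
    if peak == 0.0:
        return None
    # Histogram of absolute amplitudes; the last bin always contains the peak.
    counts, edges = np.histogram(mag, bins=n_bins, range=(0.0, peak))
    top = n_bins - 1
    # Average population of the bins just below the outermost bin.
    neighbours = counts[top - n_neighbours:top].mean()
    # A natural signal thins out toward its peak; a pile-up in the outermost
    # bin is the "unnatural local maximum" that indicates clipping.
    if counts[top] > ratio * max(neighbours, 1.0):
        return edges[top]
    return None

def clipping_mask(x, threshold):
    """Flag samples at or above the estimated threshold as clipped."""
    return np.abs(np.asarray(x, dtype=float)) >= threshold
```

Note that a signal whose amplitude distribution naturally piles up near its peak (a pure sine wave, for example) can trigger this simple test even when it is not clipped, so the heuristic only makes sense for material that matches the histogram shape assumed in the text.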
Our declipping algorithm assumes that the signal on either side of a clipping region is stationary: the statistical properties of the signal do not change over time (at least for short periods of time, say 30 milliseconds). This allows us to declip by linearly interpolating between the blocks on either side of the clipped region. We do this bin by bin in the frequency domain. We also assume that clipping does not modify the phase of the signal. We arrived at this assumption heuristically, by creating the phase examples below.
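As a rough illustration of the bin-by-bin interpolation, the sketch below fills in the FFT magnitudes of the clipped blocks from the unclipped blocks on either side and keeps the clipped signal's phase. It is a minimal sketch under those two stated assumptions, not the algorithm from the AES paper; the function name and the array layout (blocks in rows, frequency bins in columns) are hypothetical.

```python
import numpy as np

def interpolate_clipped_blocks(mag_before, mag_after, phase_clipped):
    """Estimate spectra for clipped blocks.

    mag_before, mag_after: FFT magnitudes (one value per bin) of the
        unclipped blocks immediately before and after the clipped region.
    phase_clipped: FFT phases of the clipped blocks, shape (n_blocks, n_bins).
    """
    mag_before = np.asarray(mag_before, dtype=float)
    mag_after = np.asarray(mag_after, dtype=float)
    phase_clipped = np.asarray(phase_clipped, dtype=float)
    n_blocks = phase_clipped.shape[0]
    # Interpolation weights strictly between 0 and 1, one per clipped block.
    w = (np.arange(1, n_blocks + 1) / (n_blocks + 1))[:, None]
    # Linear interpolation of the magnitude, independently for every bin.
    mag = (1.0 - w) * mag_before + w * mag_after
    # Reuse the clipped phase, per the assumption that clipping leaves it intact.
    return mag * np.exp(1j * phase_clipped)
```

The returned complex spectra would then be inverse-transformed and recombined into a time-domain signal; that step depends on the block size, window, and overlap, which are not specified here.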
For a formal description of our algorithm, see our AES publication: Laguna, Christopher, and Alexander Lerch. "An Efficient Algorithm for Clipping Detection and Declipping Audio." (2016). This is available at http://www.gtcmt.gatech.edu/publications/music-informatics.
Audio examples: Declipping | Phase Tests
The declipping examples demonstrate the performance of the declipping algorithm. The phase tests demonstrate that clipping distorts a signal's magnitude spectrum more than its phase. The input and clipped signals are split into blocks, and a reconstructed signal is created by combining the original signal's FFT magnitude with the clipped signal's FFT phase. As you can hear, the reconstructed signal sounds extremely similar to the original signal.
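The phase test can be reproduced along these lines. The sketch assumes non-overlapping rectangular blocks and a block size of 1024 samples, neither of which is stated above; the function name is hypothetical.

```python
import numpy as np

def magnitude_phase_hybrid(original, clipped, block=1024):
    """Combine the original signal's FFT magnitude with the clipped signal's phase."""
    original = np.asarray(original, dtype=float)
    clipped = np.asarray(clipped, dtype=float)
    # Trim both signals to a whole number of blocks.
    n = (min(len(original), len(clipped)) // block) * block
    orig_blocks = original[:n].reshape(-1, block)
    clip_blocks = clipped[:n].reshape(-1, block)
    # Per-block spectra of the real-valued signals.
    orig_spec = np.fft.rfft(orig_blocks, axis=1)
    clip_spec = np.fft.rfft(clip_blocks, axis=1)
    # Original magnitude, clipped phase.
    hybrid = np.abs(orig_spec) * np.exp(1j * np.angle(clip_spec))
    # Back to the time domain, blocks laid end to end.
    return np.fft.irfft(hybrid, n=block, axis=1).reshape(-1)
```

If the hybrid signal sounds close to the original, the clipped phase carried most of what was needed, which is the observation the phase tests are meant to support.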
Noise removal is the process of removing stationary noise from a signal. This is accomplished by first estimating statistical properties of the noise from the signal. Then, an estimate of the original signal is obtained by minimizing a cost function. We use a cost function proposed by Wolfe and Godsill (Wolfe, Patrick J., and Simon J. Godsill. "Efficient alternatives to the Ephraim and Malah suppression rule for audio signal enhancement." EURASIP Journal on Advances in Signal Processing 2003.10 (2003): 1-9.) that jointly estimates the original signal's magnitude and phase using the maximum a posteriori (MAP) estimator.
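For orientation, the joint MAP rule can be written as a per-bin gain applied to the noisy spectrum. The sketch below uses the closed form that follows from a complex Gaussian noise model with a Rayleigh amplitude prior and a uniform phase prior; the paper should be consulted for the authoritative statement. The names `xi` (a priori signal-to-noise ratio) and `gamma` (a posteriori signal-to-noise ratio) are notation assumed for this example, and how they are estimated is not covered here.

```python
import numpy as np

def joint_map_gain(xi, gamma):
    """Suppression gain of the joint MAP amplitude-and-phase estimator.

    xi:    a priori SNR per frequency bin (signal variance / noise variance).
    gamma: a posteriori SNR per bin (|noisy bin|^2 / noise variance), > 0.
    Under the assumed model the MAP phase equals the noisy phase, so the
    enhanced bin is simply gain * noisy_bin.
    """
    xi = np.asarray(xi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    root = np.sqrt(xi ** 2 + 2.0 * (1.0 + xi) * xi / gamma)
    return (xi + root) / (2.0 * (1.0 + xi))
```

For large `gamma` the gain approaches `xi / (1 + xi)`, the familiar Wiener gain, which is a convenient sanity check.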
To estimate the noise profile, we prompt the user to highlight sections of the signal containing only noise. From these sections, we estimate the Fast Fourier Transform (FFT) magnitude components of the noise by averaging the FFT magnitudes over all blocks selected by the user. We automatically suggest noise-only sections of the audio by looking for sections of the signal with a root-mean-square (RMS) value lower than a threshold, which is calculated as a percentage of the peak amplitude in the signal.
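Both steps, suggesting noise-only blocks and averaging their FFT magnitudes, might look like the following sketch. The block size, the 5% default for the RMS threshold, the non-overlapping rectangular blocks, and the function names are all assumptions made for this example.

```python
import numpy as np

def _to_blocks(x, block):
    """Split a signal into non-overlapping blocks, dropping any remainder."""
    x = np.asarray(x, dtype=float)
    n = len(x) // block
    return x[:n * block].reshape(n, block)

def suggest_noise_blocks(x, block=1024, fraction=0.05):
    """Return a boolean mask of blocks whose RMS is below fraction * peak."""
    blocks = _to_blocks(x, block)
    rms = np.sqrt(np.mean(blocks ** 2, axis=1))
    peak = np.max(np.abs(x))
    return rms < fraction * peak

def noise_profile(x, noise_mask, block=1024):
    """Average the FFT magnitudes of the blocks marked as noise-only."""
    blocks = _to_blocks(x, block)
    spectra = np.abs(np.fft.rfft(blocks[noise_mask], axis=1))
    return spectra.mean(axis=0)
```

In an interactive tool, the mask returned by `suggest_noise_blocks` would only be a starting point that the user can accept or adjust before the profile is computed.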
The perceptual equalizer is a filter whose parameters are selected by moving sliders associated with a few terms that correspond to different frequency ranges. The implementation is nothing more than a few biquad filters:
Boom: A low-shelf filter with a cutoff frequency of 60 Hz.
Warmth: A peak filter with a middle frequency of 300 Hz.
Brightness: A high-shelf filter with a cutoff frequency of 9000 Hz.
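One way to realize these three filters is with the widely used Audio EQ Cookbook biquad formulas, sketched below. The text does not say which design formulas, gains, or Q values are actually used, so the formulas, the `q` default, the slider-to-decibel mapping implied by `gain_db`, and all names here are assumptions for illustration.

```python
import numpy as np

# Slider name -> (filter type, frequency in Hz), taken from the list above.
BANDS = {
    "boom": ("lowshelf", 60.0),
    "warmth": ("peak", 300.0),
    "brightness": ("highshelf", 9000.0),
}

def biquad_coeffs(kind, f0, gain_db, fs, q=0.7071):
    """Return normalized (b, a) biquad coefficients (Audio EQ Cookbook forms)."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    c = np.cos(w0)
    alpha = np.sin(w0) / (2.0 * q)
    s = 2.0 * np.sqrt(A) * alpha
    if kind == "peak":
        b = [1 + alpha * A, -2 * c, 1 - alpha * A]
        a = [1 + alpha / A, -2 * c, 1 - alpha / A]
    elif kind == "lowshelf":
        b = [A * ((A + 1) - (A - 1) * c + s),
             2 * A * ((A - 1) - (A + 1) * c),
             A * ((A + 1) - (A - 1) * c - s)]
        a = [(A + 1) + (A - 1) * c + s,
             -2 * ((A - 1) + (A + 1) * c),
             (A + 1) + (A - 1) * c - s]
    elif kind == "highshelf":
        b = [A * ((A + 1) + (A - 1) * c + s),
             -2 * A * ((A - 1) + (A + 1) * c),
             A * ((A + 1) + (A - 1) * c - s)]
        a = [(A + 1) - (A - 1) * c + s,
             2 * ((A - 1) - (A + 1) * c),
             (A + 1) - (A - 1) * c - s]
    else:
        raise ValueError("unknown filter type: " + kind)
    b = np.array(b, dtype=float)
    a = np.array(a, dtype=float)
    # Normalize so that a[0] == 1.
    return b / a[0], a / a[0]

def gain_at(b, a, f, fs):
    """Linear magnitude response of the biquad at frequency f (Hz)."""
    z1 = np.exp(-1j * 2.0 * np.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)
```

For example, `biquad_coeffs(*BANDS["boom"], gain_db=6.0, fs=44100)` yields a low shelf that boosts the lowest frequencies by about 6 dB; the coefficients can then be passed to any standard IIR filtering routine, and the three filters applied in series.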