Lab 5: Spectral Peaks

A "peak" is defined as a local maximum of the magnitude spectrum. In practice, the peak search is constrained only by a frequency range and a magnitude threshold.

Due to the sampled nature of the spectrum returned by the FFT, each peak is accurate only to within half a sample. A spectral sample represents a frequency interval of fs/N Hz, where fs is the sampling rate and N is the FFT size. Zero-padding in the time domain increases the number of spectral samples per Hz and thus increases the accuracy of the simple peak detection (see previous section). However, to obtain frequency accuracy on the level of 0.1% of the distance from the top of an ideal peak to its first zero crossing (in the case of a rectangular window), the zero-padding factor required is 1000.
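As a quick numeric illustration of the bin spacing fs/N, here is a minimal MATLAB/Octave sketch. The values of fs, N, and the zero-padding factor zp are assumptions chosen only for the example; they do not come from the lab code.

```matlab
% Assumed example values, chosen only for illustration.
fs = 44100;                 % sampling rate in Hz
N  = 1024;                  % FFT size without zero-padding
zp = 4;                     % zero-padding factor (FFT size becomes zp*N)

binHz    = fs / N;          % width of one spectral sample in Hz
maxErrHz = binHz / 2;       % worst-case error when taking the nearest bin
binHzZp  = fs / (zp * N);   % bin width after zero-padding

fprintf('bin width %.2f Hz, max error %.2f Hz, zero-padded bin width %.2f Hz\n', ...
        binHz, maxErrHz, binHzZp);
```

With these values one bin spans about 43 Hz, so a peak read directly from the nearest bin can be off by about 21.5 Hz.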

A more efficient spectral interpolation scheme is to zero-pad only enough so that quadratic (or other simple) spectral interpolation, using only samples immediately surrounding the maximum-magnitude sample, suffices to refine the estimate to 0.1% accuracy.

The frequency and magnitude of a peak are obtained from the magnitude spectrum expressed in dB. The phase of the peak is then read from the unwrapped phase spectrum at the position corresponding to the peak frequency.
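The refinement described above can be sketched with the common three-point parabolic fit around the largest sample. This is one possible form of quadratic interpolation, not necessarily the one intended by the lab. The spectrum below is synthetic (an exact parabola with its maximum at bin 5.3) and the phase values are made up; the names iploc, ipmag, and ipphase are illustrative.

```matlab
% Synthetic data: mX in dB, pX an unwrapped phase spectrum (column vectors).
k  = (1:9)';
mX = -2 * (k - 5.3).^2;      % exact parabola, true maximum at bin 5.3
pX = 0.1 * k;                % made-up linear phase

[~, ploc] = max(mX);         % integer bin of the largest sample
a = mX(ploc - 1);            % left neighbour
b = mX(ploc);                % maximum sample
c = mX(ploc + 1);            % right neighbour

p       = 0.5 * (a - c) / (a - 2*b + c);   % vertex offset in bins, within [-0.5, 0.5]
iploc   = ploc + p;                        % interpolated location (fractional bin)
ipmag   = b - 0.25 * (a - c) * p;          % interpolated magnitude in dB
ipphase = interp1(k, pX, iploc);           % phase by linear interpolation
```

With 1-based bin indices, the peak frequency in Hz would be (iploc - 1)*fs/N, where N is the FFT size. On real spectra the fit is only approximate, which is why some zero-padding is still used.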

5.1 Peak detection

Find the spectral peaks of a sound starting from the following code that implements analysis/synthesis using the STFT:
  1. Find the locations, ploc, of the local maxima above a given threshold in each magnitude spectrum by finding changes of slope. Example: with mX a column vector of N bins, convert the spectrum to differences, specder = diff([threshold; mX(:); threshold]), and then find the up-down changes, ploc = find(mX(1:N) >= threshold & specder(1:N) >= 0 & specder(2:N+1) <= 0). Here N is the number of bins in mX, which need not equal the FFT size.
  2. Find the magnitudes, pmag, and phases, pphase, at the obtained locations: pmag = mX(ploc), pphase = pX(ploc)
  3. Plot the peak values on top of the magnitude and phase spectra at each frame. 
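The three steps above can be tried on a single frame as follows. This is a minimal sketch on made-up data: the spectrum values, the placeholder phase, and the threshold are assumptions, and in the lab mX and pX would come from the STFT code.

```matlab
% Made-up magnitude spectrum in dB (column vector); not real data.
mX = [-80; -60; -20; -50; -70; -30; -10; -40; -90];
pX = linspace(0, 2, numel(mX))';   % placeholder phase spectrum
threshold = -45;                   % dB, assumed value
N = numel(mX);                     % number of bins in mX

% Pad both ends with the threshold so that the edge bins can be tested too.
specder = diff([threshold; mX; threshold]);    % N+1 slope values

% A peak is a bin at or above the threshold whose left slope is
% non-negative and whose right slope is non-positive.
ploc   = find(mX >= threshold & specder(1:N) >= 0 & specder(2:N+1) <= 0);
pmag   = mX(ploc);
pphase = pX(ploc);

% Plot the peaks on top of the magnitude and phase spectra.
subplot(2, 1, 1); plot(1:N, mX, 'b', ploc, pmag, 'rx');   title('magnitude (dB)');
subplot(2, 1, 2); plot(1:N, pX, 'b', ploc, pphase, 'rx'); title('phase');
```

Because the comparisons use >= and <=, every bin of a flat region above the threshold is reported as a peak. Strict inequalities would avoid that, at the cost of missing plateaus entirely.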

5.2 Sinusoidal synthesis from peak values

Synthesize a sound from the peak information using the inverse FFT.
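  1. Generate a complex spectrum from all the obtained peak magnitudes and phases, X(ploc) = pmag .* exp(i.*pphase). If pmag is expressed in dB, convert it to linear magnitude first, 10.^(pmag/20). For a real output signal, also fill the negative-frequency bins with the complex conjugates of the peaks.
  2. Write a complete function that takes a sound, finds the spectral peaks, and synthesizes an output sound from that information. Example: y = stpt(x, w, N, H, threshold) (stpt stands for short-time peak transform)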
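The per-frame synthesis step can be sketched as follows. It is a single-frame illustration under stated assumptions, not the complete stpt function: the test signal (a cosine centred exactly on one bin, with a rectangular window), the threshold value, and the helper names hN and m are all chosen for the example.

```matlab
% Test frame: a cosine centred exactly on bin k0 (0-based), rectangular window.
N  = 64;
k0 = 8;
n  = (0:N-1)';
x  = cos(2*pi*k0*n/N);
threshold = -60;                        % dB, assumed value

X  = fft(x);
hN = N/2 + 1;                           % bins 1..hN cover 0..fs/2
mX = 20*log10(abs(X(1:hN)) + eps);      % magnitude in dB (eps avoids log of 0)
pX = unwrap(angle(X(1:hN)));            % unwrapped phase

specder = diff([threshold; mX; threshold]);
ploc   = find(mX >= threshold & specder(1:hN) >= 0 & specder(2:hN+1) <= 0);
pmag   = mX(ploc);
pphase = pX(ploc);

% Rebuild the spectrum from the peaks only. pmag is in dB here, so it is
% converted back to linear magnitude before forming the complex values.
Y = zeros(N, 1);
Y(ploc) = 10.^(pmag/20) .* exp(1i*pphase);

% Mirror the peaks into the negative-frequency bins as complex conjugates so
% that the inverse FFT is real. DC (bin 1) and Nyquist (bin hN) have no mirror.
m = ploc(ploc > 1 & ploc < hN);
Y(N + 2 - m) = conj(Y(m));

y = real(ifft(Y));                      % resynthesized frame
```

In this idealized case the single peak reproduces the frame almost exactly. Inside a full stpt function this step would run once per frame, with the windowing and overlap-add handled by the surrounding STFT code.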

5.3 Coding and transforming sounds

The spectral peaks can capture most of the relevant perceptual information of a sound and thus they can be used for audio coding or for transforming sounds.
  1. Choose a group of sounds with different characteristics, such as harmonic, noisy, and percussive sounds, and perform analysis/synthesis with different parameters. The goal is to get the best possible resynthesized sound, that is, the one that sounds closest to the original.
  2. Compute the number of peaks used to code each sound. If we could store all the values of a peak (location, magnitude and phase) with 16 bits (one sample), what would be the resulting compression ratio for the different sounds? Explain the results: why do you get different values for the different sounds?
  3. Apply transformations to the peak values before resynthesis, for example multiplying the peak locations by a small factor (transposition?).
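The arithmetic behind items 2 and 3 can be sketched as follows. All numbers are hypothetical placeholders (signal length, frame count, peaks per frame, peak values, scaling factor). The compression ratio uses the assumption stated in item 2 that each peak costs 16 bits.

```matlab
% --- Compression ratio (hypothetical numbers) ---
numSamples    = 44100;          % 1 s of 16-bit mono audio at 44.1 kHz
numFrames     = 172;            % hypothetical number of analysis frames
peaksPerFrame = 20;             % hypothetical average peak count per frame

bitsOriginal = numSamples * 16;
bitsCoded    = numFrames * peaksPerFrame * 16;   % 16 bits per peak, as assumed in item 2
ratio        = bitsOriginal / bitsCoded;

% --- Scaling the peak locations (hypothetical peaks) ---
N      = 1024;                  % FFT size
hN     = N/2 + 1;               % last usable bin (fs/2)
ploc   = [10; 20; 300];         % 1-based peak bins
pmag   = [-20; -30; -40];       % dB
factor = 2;                     % frequency scaling factor

% Bin k (1-based) sits at frequency (k-1)*fs/N, so scale k-1 rather than k.
tloc = round((ploc - 1) * factor) + 1;
keep = tloc <= hN;              % discard peaks pushed beyond fs/2
tloc = tloc(keep);
tmag = pmag(keep);
```

Rounding to integer bins limits the precision of the frequency change. The sketch also leaves open how the peak phases should be adjusted after the peaks are moved.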