A "peak" is defined as a local maximum of the magnitude spectrum, and the only practical constraints to be made in the peak search are to have a frequency range and a magnitude threshold.

Due to the sampled nature of the spectrum returned by the FFT, each peak is accurate only to within half a sample. A spectral sample represents a frequency interval of fs/N Hz, where fs is the sampling rate and N is the FFT size. Zero-padding in the time domain increases the number of spectral samples per Hz and thus increases the accuracy of the simple peak detection (see previous section). However, to obtain frequency accuracy on the level of 0.1% of the distance from the top of an ideal peak to its first zero crossing (in the case of a rectangular window), the zero-padding factor required is 1000.

A more efficient spectral interpolation scheme is to zero-pad only enough so that quadratic (or other simple) spectral interpolation, using only samples immediately surrounding the maximum-magnitude sample, suffices to refine the estimate to 0.1% accuracy.

The frequency and magnitude of a peak is obtained from the magnitude spectrum expressed in dB. Then the phase value of the peak is measured by reading the value of the unwrapped phase spectrum at the position resulting from the frequency of the peak.

- function y = stft(x, w, N, H)
- % x: input sound, w: window, N: FFT size, H: hop size
- M = length(w); % window size
- soundlength = length(x);
- pin = 0;
- pend = soundlength - M;
- y = zeros(soundlength,1); % output array
- fftbuffer = zeros (N,1);
- while pin<pend
- x1 = x(pin+1:pin+M) .* w(1:M); % window the signal
- fftbuffer(1:(M+1)/2) = x1((M+1)/2: M); % zero-phase window
- fftbuffer(N-(M-1)/2+1:N) = x1(1:(M-1)/2);
- X = fft(fftbuffer); % compute the FFT
- mX = abs(X); % magnitude spectrum
- pX = angle(X); % phase spectrum
- X = mX.*cos(pX)+i*mX.*sin(pX); % compute complex spectrum
- fftbuffer = real(ifft(X)); % inverse FFT
- x1((M+1)/2:M) = fftbuffer(1:(M+1)/2);
- x1(1:(M-1)/2) = fftbuffer(N-(M-1)/2+1:N);
- y(pin+1:pin+M) = y(pin+1:pin+M) + x1(1:M); % overlap-add
- pin = pin + H;
- end

- Find the locations, ploc, of the local maxima above a given threshold in each magnitude spectrum by finding changes of slope. Ex: convert spectrum to differences, specder = diff([threshold; mX(:); threshold]) and then find up-down changes, ploc = find(mX(1:N) >= threshold & specder(1:N)>= 0 & specder(2:N+1) <= 0)
- Find the magnitudes, pmag, and phases, pphase, of the obtained locations. pmag = mX(ploc), pphase = pX(ploc)
- Plot the peak values on top of the magnitude and phase spectra at each frame.

- Generate a complex spectrum from all the obtained peak magnitudes and phases, X(ploc) = pmag .*exp(i.*pphase)
- Write a complete function that takes a sound, finds the spectral peaks, and synthesizes an output sound from that information. ex: y = stpt (x, w, N, H, threshold) (stpt for short-time peak transform)

- Choose a group of sounds with different characteristics, such as harmonic, noisy, percusive, .... and perform analysis/synthesis with different paramenters with the goal to get the best possible resynthesized sound, that is, the one that sounds the closest to the original.
- Compute the number of peaks used to code each sound. If we could store all the values of a peak (location, magnitude and phase) with 16 bits (one sample), what would be the resulting compression ratio for the different sounds? Explain the results, why do you get different values for the different sounds?
- Apply transformations to the peak values before resynthesis, for example multiplying the peak locations by a small factor (transposition?).