Lab 5: Spectral Peaks
A "peak" is defined as a local maximum of the magnitude spectrum, and the only practical constraints
to be made in the peak search are to have a frequency range and a magnitude threshold.
Due to the sampled nature of the spectrum returned by the FFT, each peak is accurate only
to within half a sample. A spectral sample represents a frequency interval of fs/N Hz, where
fs is the sampling rate and N is the FFT size. Zero-padding in the time domain increases the
number of spectral samples per Hz and thus increases the accuracy of the simple peak
detection (see previous section). However, to obtain frequency accuracy on the level
of 0.1% of the distance from the top of an ideal peak to its first zero
crossing (in the case of a rectangular window), the zero-padding factor required is 1000.
A more efficient spectral interpolation scheme is to zero-pad only enough so that quadratic (or other simple)
spectral interpolation, using only samples immediately surrounding the maximum-magnitude sample, suffices to
refine the estimate to 0.1% accuracy.
The frequency and magnitude of a peak is obtained from the magnitude
spectrum expressed in dB. Then the phase value of the peak is measured
by reading the value of the unwrapped phase spectrum at the position
resulting from the frequency of the peak.
4.1 Peak detection
Find the spectral peaks of a sound starting from the following code that implements analysis/synthesis using the STFT:
- function y = stft(x, w, N, H)
- % x: input sound, w: window, N: FFT size, H: hop size
- M = length(w); % window size
- soundlength = length(x);
- pin = 0;
- pend = soundlength - M;
- y = zeros(soundlength,1); % output array
- fftbuffer = zeros (N,1);
- while pin<pend
- x1 = x(pin+1:pin+M) .* w(1:M); % window the signal
- fftbuffer(1:(M+1)/2) = x1((M+1)/2: M); % zero-phase window
- fftbuffer(N-(M-1)/2+1:N) = x1(1:(M-1)/2);
- X = fft(fftbuffer); % compute the FFT
- mX = abs(X); % magnitude spectrum
- pX = angle(X); % phase spectrum
- X = mX.*cos(pX)+i*mX.*sin(pX); % compute complex spectrum
- fftbuffer = real(ifft(X)); % inverse FFT
- x1((M+1)/2:M) = fftbuffer(1:(M+1)/2);
- x1(1:(M-1)/2) = fftbuffer(N-(M-1)/2+1:N);
- y(pin+1:pin+M) = y(pin+1:pin+M) + x1(1:M); % overlap-add
- pin = pin + H;
- Find the locations, ploc, of the local maxima above a given threshold in each magnitude spectrum by finding changes of slope. Ex: convert spectrum to differences, specder = diff([threshold; mX(:); threshold]) and then find up-down changes, ploc = find(mX(1:N) >= threshold & specder(1:N)>= 0 & specder(2:N+1) <= 0)
- Find the magnitudes, pmag, and phases, pphase, of the obtained locations. pmag = mX(ploc), pphase = pX(ploc)
- Plot the peak values on top of the magnitude and phase spectra at each frame.
5.2 Sinusoidal synthesis from peak valuesSynthesize a sound from the peak information using the inverse FFT.
- Generate a complex spectrum from all the obtained peak magnitudes and phases, X(ploc) = pmag .*exp(i.*pphase)
a complete function that takes a sound, finds the spectral peaks, and
synthesizes an output sound from that information. ex: y = stpt (x, w, N, H, threshold) (stpt for short-time peak transform)
5.3 Coding and transforming soundsThe
spectral peaks can capture most of the relevant perceptual information
of a sound and thus they can be used for audio coding or
for transforming sounds.
- Choose a group of sounds with
different characteristics, such as harmonic, noisy, percusive, .... and
perform analysis/synthesis with different paramenters with the goal to
get the best possible resynthesized sound, that is, the one that sounds the closest
to the original.
the number of peaks used to code each sound. If we could store all the
values of a peak (location, magnitude and phase) with 16
bits (one sample), what would be the resulting compression ratio for the different sounds? Explain
the results, why do you get different values for the different
transformations to the peak values before resynthesis, for example
multiplying the peak locations by a small factor (transposition?).