CERL Sound Group, University of Illinois
103 S. Mathews, Urbana IL 61801, USA
lemur@uiuc.edu
We base our work on the considerable strength of sinusoidal modeling and synthesis (or additive synthesis). Sinusoidal modeling and synthesis are computationally very expensive, but they are more powerful and more flexible than other methods, because fine control can be exercised over the behavior of each individual frequency component.
Figure 1. Example of Lemur analysis data from a cello tone, displayed in Lemur's editing window.
Figure 2. Lemur analysis of a sound with strong noise components, showing the proliferation of short, jittery tracks.
A fundamental problem we encountered with this approach is that the MQ analysis is being applied to a noisy signal that it cannot possibly model well, and its failure is being collected as the residual. The sinusoidal analysis is, therefore, cluttered with jittery tracks that are trying to model noise, and any stretched synthesis from this analysis will produce the wormy artifacts described above. Serra's algorithm addresses this problem by attempting to remove the wormy tracks from the analysis data before performing spectral subtraction and residual construction. In our experience, however, it is not always possible to remove the entire noise contribution from an MQ-style analysis. The worminess corrupts the tracks representing the sinusoidal components.
Stochastic models do not fully address the problem of time and frequency scale modification of the noise components in the modeled sound. Clearly, the raw residual signal cannot easily be modified, because it is simply an audio signal of the same length as the original signal. Neither can the short-time spectra be stretched without introducing audible artifacts caused by phase discontinuities at frame boundaries. A separate process is needed for modifying the stochastic part of the model. The problem of performing modified synthesis from short-time spectra has a long history (Allen 1977, Crochiere 1980, Portnoff 1980). Until the introduction of the MQ algorithm, all the solutions to this problem were pitch tracking algorithms that worked well only for monophonic, harmonic sounds.
The greatest shortcoming we find in this and other stochastic methods of accommodating noise, including those which represent noise energy in fixed frequency bands, is that they do not provide a unified representation of sinusoidal and noise components, and therefore don't allow all components of the sound to be edited and manipulated together.
Sinusoidal frequency modulation, implemented using the complex phase modulation equation,

$$x(t) = e^{j(\omega_c t + I \sin(\omega_m t))},$$

generates spectral sidebands at frequencies that are the sum and difference of the carrier frequency, $\omega_c$, and integer multiples of the modulating frequency, $\omega_m$ (i.e. frequencies $\omega_c \pm k\omega_m$) (Dodge and Jerse 1985). For very small indices of modulation, only the first sidebands (at frequencies $\omega_c \pm \omega_m$) are of significant amplitude (Stremler 1982).
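The sideband behaviour described above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the complex phase modulation form x[n] = exp(j(w_c n + I sin(w_m n))), places the carrier and modulator exactly on DFT bins to avoid leakage, and uses hypothetical names (`complex_fm`, `bin_magnitude`) and parameter values chosen only for illustration.

```python
import cmath
import math

def complex_fm(n_samples, carrier_bin, mod_bin, index):
    """Return x[n] = exp(j(w_c n + I sin(w_m n))) with bin-centred frequencies."""
    w_c = 2.0 * math.pi * carrier_bin / n_samples
    w_m = 2.0 * math.pi * mod_bin / n_samples
    return [cmath.exp(1j * (w_c * n + index * math.sin(w_m * n)))
            for n in range(n_samples)]

def bin_magnitude(x, k):
    """Magnitude of DFT bin k, normalised by the signal length."""
    n_samples = len(x)
    w = -2.0 * math.pi * k / n_samples
    acc = sum(v * cmath.exp(1j * w * n) for n, v in enumerate(x))
    return abs(acc) / n_samples

# Illustrative values (assumptions): small modulation index, arbitrary bins.
N, CARRIER, MOD, INDEX = 1024, 200, 16, 0.1
x = complex_fm(N, CARRIER, MOD, INDEX)

carrier = bin_magnitude(x, CARRIER)            # component at w_c
first = bin_magnitude(x, CARRIER + MOD)        # upper first sideband, w_c + w_m
lower_first = bin_magnitude(x, CARRIER - MOD)  # lower first sideband, w_c - w_m
second = bin_magnitude(x, CARRIER + 2 * MOD)   # second sideband, w_c + 2 w_m

print(carrier, first, lower_first, second)
```

For a small index the sideband amplitudes follow the Bessel functions J_k(I), so the first sidebands come out close to I/2 and the second sidebands are smaller by a further factor of roughly I/4, which is why they can be neglected in the narrowband case.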
We use narrowband (small modulation index) frequency modulation with a filtered noise modulator to implement a bandwidth enhanced oscillator for additive synthesis. We compute the contribution of the noise modulation by expressing it in the Taylor series expansion,

$$x(t) = e^{j(\omega_c t + I s(t))} = e^{j\omega_c t}\left[1 + jI s(t) - \frac{I^2 s^2(t)}{2!} - \frac{jI^3 s^3(t)}{3!} + \cdots\right],$$

where $s(t)$ is the noise modulator, and $I$ is the index of modulation. For small indices of modulation, we can neglect the higher order terms in the Taylor expansion. For $I = 0$ (no modulation, corresponding to zero bandwidth), $x(t)$ reduces to a simple sinusoid. The spectral contribution from noise modulation can be computed by taking the Fourier transform of the time domain equation, giving (up to constant scale factors)

$$X(\omega) = \delta(\omega - \omega_c) * \left[\delta(\omega) + jI S(\omega) - \frac{I^2}{2!}\, S(\omega) * S(\omega) - \frac{jI^3}{3!}\, S(\omega) * S(\omega) * S(\omega) + \cdots\right],$$

where $\delta(\omega - \omega_c)$ is the delta function produced by a pure sinusoid of frequency $\omega_c$, and $S(\omega)$ is the spectrum of the noise modulator. For very small modulation indices, only the first $S(\omega)$ term will be significant, so the effect of modulating with noise is to reshape the spectrum of the oscillator by adding approximately a scaled copy of the noise modulator's spectrum centered at the carrier frequency, $\omega_c$. When $I = 0$, the convolution reduces to the delta function, $\delta(\omega - \omega_c)$, the spectrum of a sine wave at frequency $\omega_c$.
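As a rough check on how small the index must be for the first-order approximation to hold, the truncation error can be bounded with the standard remainder estimate for the complex exponential. This is a sketch under two assumptions that are not stated in the text: the carrier frequency is written $\omega_c$, and the noise modulator is normalised so that $|s(t)| \le 1$.

```latex
\begin{align*}
x(t) &= e^{j\omega_c t}\, e^{jI s(t)}
      \approx e^{j\omega_c t}\bigl[1 + jI s(t)\bigr], \\
\bigl| e^{j\theta} - (1 + j\theta) \bigr| &\le \frac{\theta^2}{2}
      \quad \text{for real } \theta, \\
\Bigl| x(t) - e^{j\omega_c t}\bigl[1 + jI s(t)\bigr] \Bigr|
      &\le \frac{I^2 s^2(t)}{2} \le \frac{I^2}{2}.
\end{align*}
```

With $I = 0.1$, for example, the neglected terms are at most 0.005 in magnitude, against a first-order term of magnitude up to 0.1.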
If the spectrum of the modulator rolls off smoothly, as in the case of low pass filtered noise (we have used a four tap averaging lowpass filter), then the frequency modulation will produce an approximately bell-shaped spectrum centered at the carrier frequency, as shown in Figure 3. The exact shape will be determined by the spectrum of the modulator, and the bandwidth will be proportional to the index of modulation.
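A bandwidth enhanced oscillator of this kind might be sketched as follows. This is an illustration under stated assumptions rather than Lemur's actual code: the noise source is uniform white noise in [-1, 1], the lowpass filter is a plain four-tap moving average, the oscillator is the complex form exp(j(w_c n + I s[n])), and all function names and parameter values are hypothetical.

```python
import cmath
import math
import random

def four_tap_average(noise):
    """Four-tap moving-average lowpass filter (zero initial state)."""
    out = []
    for n in range(len(noise)):
        taps = [noise[n - k] for k in range(4) if n - k >= 0]
        out.append(sum(taps) / 4.0)
    return out

def bandwidth_enhanced_oscillator(n_samples, carrier_bin, index, seed=0):
    """x[n] = exp(j(w_c n + I s[n])), with s[n] lowpass-filtered white noise."""
    rng = random.Random(seed)
    white = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    s = four_tap_average(white)
    w_c = 2.0 * math.pi * carrier_bin / n_samples
    return [cmath.exp(1j * (w_c * n + index * s[n])) for n in range(n_samples)]

def power_spectrum(x):
    """Direct O(N^2) DFT power spectrum; a unit sinusoid on a bin gives 1."""
    n_samples = len(x)
    spectrum = []
    for k in range(n_samples):
        w = -2.0 * math.pi * k / n_samples
        acc = sum(v * cmath.exp(1j * w * n) for n, v in enumerate(x))
        spectrum.append(abs(acc / n_samples) ** 2)
    return spectrum

def band_power(spectrum, carrier_bin, lo, hi):
    """Total power at bin offsets lo..hi (inclusive) on both sides of the carrier."""
    n_samples = len(spectrum)
    total = 0.0
    for offset in range(lo, hi + 1):
        total += spectrum[(carrier_bin + offset) % n_samples]
        total += spectrum[(carrier_bin - offset) % n_samples]
    return total

# Illustrative values (assumptions), kept small so the direct DFT stays fast.
N, CARRIER = 256, 64
pure = power_spectrum(bandwidth_enhanced_oscillator(N, CARRIER, index=0.0))
noisy = power_spectrum(bandwidth_enhanced_oscillator(N, CARRIER, index=0.2))

near = band_power(noisy, CARRIER, 1, N // 8)           # close to the carrier
far = band_power(noisy, CARRIER, N // 4, N // 2 - 1)   # far from the carrier
print(pure[CARRIER], noisy[CARRIER], near, far)
```

With a zero index all of the power stays in the carrier bin. With a small non-zero index a little power moves out of the carrier, and most of it lands close to the carrier because the moving-average filter concentrates the modulator's energy at low frequencies.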
Figure 3. Spectral snapshot of the signal produced by a bandwidth enhanced oscillator. The spectrum on the left corresponds to a partial with zero bandwidth (a sinusoid). The spectrum on the right corresponds to a partial with non-zero bandwidth.
Lemur Pro for the Macintosh or PowerMac is available by anonymous ftp at www.cerlsoundgroup.org, in pub/lemur.
The authors may be contacted at lemur@uiuc.edu.
References
Jont B. Allen, "Short Term Spectral Analysis, Synthesis, and Modification by Discrete Fourier Transform," IEEE Trans. Acous, Speech, Signal Processing, vol ASSP-25, pp. 235-238, 1977.
R.E. Crochiere, "A Weighted Overlap-Add Method of Short-Time Fourier Analysis/Synthesis," IEEE Trans. Acous, Speech, Signal Processing, vol ASSP-28, pp. 99-102, 1980.
Charles Dodge and Thomas Jerse, Computer Music, New York, NY: Schirmer Books, 1985.
Kelly Fitz, William Walker, and Lippold Haken, "Extending the McAulay-Quatieri Analysis for Synthesis With a Limited Number Of Oscillators," Proc. Intl. Computer Music Conf., 1992, pp. 381-382.
Kelly Fitz, Lippold Haken, and Bryan Holloway, "Lemur - A Tool for Timbre Manipulation," to appear in Proc. Intl. Computer Music Conf., 1995.
Adrian Freed, Xavier Rodet, and Philippe Depalle, "Synthesis and Control of Hundreds of Sinusoidal Partials on a Desktop Computer without Custom Hardware," Proc. ICSPAT, 1993.
Robert C. Maher, "An Approach for the Separation of Voices in Composite Musical Signals," Ph.D. dissertation, Dept. of Computer Science, University of Illinois at Urbana-Champaign, 1989.
Robert J. McAulay and Thomas Quatieri, "Speech Analysis/Synthesis Based on a Sinusoidal Representation," IEEE Trans. Acous, Speech, Signal Processing, vol ASSP-34, pp. 744-754, 1986.
Michael R. Portnoff, "Time-Frequency Representation of Digital Signals and Systems Based on Short-Time Fourier Analysis," IEEE Trans. Acous, Speech, Signal Processing, vol ASSP-28, pp. 55-69, 1980.
Xavier Serra, "A System for Sound Analysis/Transformation/Synthesis Based on a Deterministic Plus Stochastic Decomposition," Ph.D. dissertation, Dept. of Music, Stanford University, Stanford CA, 1989.
Ferrel G. Stremler, Introduction to Communication Systems, Reading, MA: Addison-Wesley Publishing Co., 1982.
Macintosh(R) is a registered trademark of the Apple Computer Corporation.