Edwin Tellman, Lippold Haken, Bryan Holloway
CERL Sound Group
University of Illinois at Urbana-Champaign
We present an algorithm for morphing between sounds (timbre interpolation) using Lemur analysis/synthesis. The Lemur representation is made up of amplitude- and frequency-varying sinusoids. Our timbre morphing involves time-scale modification of sounds (to morph between differing attack rates and vibrato rates), as well as amplitude and frequency modification of individual sinusoidal components. Lemur is available via anonymous ftp at unicorn.cerl.uiuc.edu in the directory /pub/lemur.
Timbre morphing is the process of combining two or more sounds to create a new sound with an intermediate timbre. For instance, if a long, loud sound with a fast, narrow vibrato is morphed with a short, quiet sound with a slow, wide vibrato, the morphed sound should be of medium length and medium loudness, with an intermediate vibrato speed and width. This process differs from simply mixing the two sounds: only a single sound, with some of the characteristics of each of the original sounds, is audible as the morphed sound. Morphing can be used to create interesting new sounds that have some of the characteristics of familiar, naturally occurring sounds, and to provide more realistic synthesis of natural instrument tones. The latter is useful when increasing the loudness of a quiet sound or artificially adding vibrato, because morphing provides the continuous spectral changes that naturally occur.
Previous work in timbre morphing includes [Grey, 1975], [Schindler, 1984], and [Haken, 1992]. Here we study the morphing of Lemur files describing sounds which may contain unequal numbers of features and non-harmonic components.
3. The Lemur Representation
An implementation of the McAulay-Quatieri sinusoidal technique [McAulay and Quatieri, 1986] was done by Rob Maher [Maher, 1989] and James Beauchamp. Our analyses use an extended technique, implemented as Lemur, a program for the Apple Macintosh. The technique models sounds as sinusoids with time-varying amplitudes and frequencies, called partials. Partials can be "born" and can "die," so the number of partials in an analysis may vary over time. A typical Lemur graph shows time along the horizontal axis, frequency along the vertical axis, and different shades of gray to represent amplitude. However, the graphs presented here show detailed amplitude deviation and no frequency deviation.
4. Our Algorithm
We investigated the morphing of pitched sounds with vibrato. This discussion describes the morphing of two sounds. The principles discussed, however, can be applied to any number and proportions (or time-varying proportions) of original sounds.
The amplitude envelopes and frequency envelopes of corresponding partials of the two sounds to be morphed are averaged together. Partials are matched by looking for partials in the two sounds for which the ratio of analyzed frequency to that sound's fundamental frequency is approximately equal. If a partial in one sound has no corresponding partial in the other sound, it is morphed with a partial of zero magnitude whose frequency is determined by the ratio of the fundamentals.
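As an illustration, the matching rule can be sketched as follows. This is a minimal sketch under stated assumptions, not the Lemur implementation: each partial is reduced to a single (frequency, amplitude) pair, "approximately equal" is taken to mean that the frequency ratios differ by at most a tolerance, and the names `match_partials` and `tol` are hypothetical.

```python
def match_partials(partials_a, f0_a, partials_b, f0_b, tol=0.1):
    """Pair partials whose frequency/fundamental ratios agree within tol.

    Each partial is a (frequency_hz, amplitude) tuple. Returns a list of
    ((freq_a, amp_a), (freq_b, amp_b)) pairs. The tolerance and the greedy
    nearest-ratio search are assumptions made for this sketch.
    """
    pairs = []
    unused_b = list(partials_b)
    for freq_a, amp_a in partials_a:
        ratio_a = freq_a / f0_a
        # Closest remaining partial of sound B, compared by frequency ratio.
        best = min(unused_b, key=lambda p: abs(p[0] / f0_b - ratio_a),
                   default=None)
        if best is not None and abs(best[0] / f0_b - ratio_a) <= tol:
            unused_b.remove(best)
            pairs.append(((freq_a, amp_a), best))
        else:
            # No counterpart in B: pair with a silent partial whose
            # frequency follows from the ratio of the fundamentals.
            pairs.append(((freq_a, amp_a), (ratio_a * f0_b, 0.0)))
    # Partials of B that found no counterpart in A get a silent partner too.
    for freq_b, amp_b in unused_b:
        ratio_b = freq_b / f0_b
        pairs.append(((ratio_b * f0_a, 0.0), (freq_b, amp_b)))
    return pairs
```

Because the comparison uses ratios rather than harmonic numbers, a non-harmonic partial (for example at 2.4 times the fundamental) is handled the same way as a harmonic one.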
Since the analysis frequency cannot be accurate for very quiet partials, the frequency used in morphing may not be exactly the frequency from the Lemur analysis. If a quiet partial with inaccurate frequency information is morphed with a loud partial with accurate frequency information, this results in a medium amplitude partial with audibly inaccurate frequency information. This problem is avoided by using the frequency of the nearest harmonic for very quiet partials and using the analysis frequency for loud partials. As a partial gradually increases in volume, more of the frequency used in morphing is taken from the analysis, so there is no abrupt change between a calculated frequency and an analyzed frequency.
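The gradual transition between the harmonic frequency and the analyzed frequency could be sketched as below. The text does not give the weighting function, so the linear ramp in decibels, the thresholds `quiet_db` and `loud_db`, the linear blend in hertz, and the function name `morph_frequency` are all assumptions chosen for illustration.

```python
import math

def morph_frequency(analysis_hz, amp, f0_hz, quiet_db=-60.0, loud_db=-40.0):
    """Frequency to use in morphing for one partial at one instant.

    Below quiet_db the nearest harmonic of f0_hz is used; above loud_db the
    analyzed frequency is used; in between the two are blended linearly.
    Both thresholds are hypothetical values, with amp relative to 1.0.
    """
    # Nearest harmonic of the fundamental (never below the fundamental).
    harmonic_hz = max(1, round(analysis_hz / f0_hz)) * f0_hz
    if amp <= 0.0:
        return harmonic_hz
    amp_db = 20.0 * math.log10(amp)
    # Weight of the analyzed frequency: 0 when quiet, 1 when loud.
    w = (amp_db - quiet_db) / (loud_db - quiet_db)
    w = min(1.0, max(0.0, w))
    return (1.0 - w) * harmonic_hz + w * analysis_hz
```

Because the weight changes continuously with amplitude, a partial that fades in moves smoothly from the calculated frequency to the analyzed one, with no abrupt jump.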
Interpolation of both frequency and amplitude is done on a log scale.
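Interpolating on a log scale amounts to taking a weighted geometric mean. The sketch below shows this for a single pair of values. The name `log_interp` and the `floor` parameter are hypothetical; the floor is an assumption added here so that a zero-magnitude partial does not produce the logarithm of zero.

```python
import math

def log_interp(value_a, value_b, weight_b, floor=1e-6):
    """Interpolate two non-negative values on a log scale.

    weight_b = 0 returns value_a, weight_b = 1 returns value_b. Values
    below `floor` (an assumed lower bound) are raised to it first.
    """
    a = max(value_a, floor)
    b = max(value_b, floor)
    return math.exp((1.0 - weight_b) * math.log(a) + weight_b * math.log(b))
```

With equal weights, the point halfway between 220 Hz and 880 Hz is 440 Hz, one octave from each, rather than the arithmetic mean of 550 Hz. Likewise, halfway between amplitudes of 1.0 and 0.01 is 0.1, which is the midpoint in decibels.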
We considered sounds which had any number of "features." We distinguish between two different types of features:
Unique features: Specific points on each sound such as the start of the attack, the peak of the attack, the loudest point, the start of the decay, etc. Each of these points in the original sounds should be algorithmically lined up to create the morphed sound.
Repeatable features: Features in one sound which don't necessarily correspond exactly to a specific feature in the other sound. For instance, it is not necessary that the fifth vibrato cycle from one sound is interpolated with the fifth vibrato cycle in the second sound. It is important, however, that vibrato peaks in the two sounds match up in the interpolation so the morphed sound has a single vibrato rate. It is assumed that repeatable features may be skipped or repeated as necessary. This implies that the frequency and amplitude at the beginning of each repeatable feature should be approximately equal to the frequency and amplitude at the beginning of adjacent features.
When morphing, the length of the current morphed feature is calculated as a weighted average of the length of the current feature in the two original sounds. The partial envelopes for the two original sounds are then stepped through at a rate such that the ends of the current feature in each sound are reached at the same time. The same principle is applied at a larger scale when the end of a repeatable feature is reached. At this point, the number of repeatable features before the next unique feature or the end of the morphed sound is calculated. This number is a weighted average of the number of repeatable features remaining in each of the original sounds before the next unique feature or the end of the sound. If a sound doesn't have enough repeatable features remaining, the most recent feature may be repeated. Conversely, if a sound has too many features, the next feature(s) are skipped.
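The timing rules above can be sketched as follows. This is an illustrative sketch rather than the original implementation: the function names are hypothetical, rounding the weighted count to an integer is an assumption, and so is the choice of which surplus features to skip (here, the ones after the first `target`).

```python
def feature_timing(len_a, len_b, weight_b):
    """Length of the morphed feature and the read rates for each sound.

    A rate of 2.0 means that sound's envelopes are stepped through twice
    as fast as output time, so both features end at the same moment.
    """
    morphed_len = (1.0 - weight_b) * len_a + weight_b * len_b
    return morphed_len, len_a / morphed_len, len_b / morphed_len

def repeat_count(remaining_a, remaining_b, weight_b):
    """Weighted average of the repeatable features left in each sound,
    rounded to a whole number (the rounding rule is an assumption)."""
    return round((1.0 - weight_b) * remaining_a + weight_b * remaining_b)

def feature_indices(remaining, target):
    """Which of a sound's remaining repeatable features to play, in order.

    With too few features the last one is repeated; with too many, the
    surplus is skipped.
    """
    if remaining <= 0:
        return []
    return [min(i, remaining - 1) for i in range(target)]
```

For example, with equal weights, a sound with six vibrato cycles remaining and a sound with two yield four morphed cycles: the first sound skips two of its cycles, and the second repeats its last cycle twice.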
[Grey, 1975] John M. Grey. "An Exploration of Musical Timbre." Ph.D. Dissertation, CCRMA, Dept. of Music, Stanford University, Stanford, CA, Report No. STAN-M-2, 1975.
[Schindler, 1984] Keith W. Schindler. Dynamic Timbre Control for Real-Time Digital Synthesis. Computer Music Journal, 8 (1): pp. 28-42, 1984.
[McAulay and Quatieri, 1986] R.J. McAulay and T.F. Quatieri. Speech Analysis/Synthesis Based on a Sinusoidal Representation. IEEE Transactions on Acoustics, Speech and Signal Processing, 34 (4): August, 1986.
[Maher, 1989] Robert T. Maher. "An Approach For The Separation Of Voices In Composite Musical Signals." Ph.D. Dissertation, University of Illinois at Urbana-Champaign, 1989.
[Haken, 1992] Lippold Haken. Computational Methods for Real-Time Fourier Synthesis. IEEE Transactions on Signal Processing, 40 (9): pp. 2327-2329, Sept. 1992.
Figure 1: A morph between two violin sounds which does not take into account the vibratos of the two sounds. The resulting sound has an erratic vibrato which doesn't resemble that of either original sound.
Figure 2: A morph between the same two sounds which interpolates between the vibratos of the original sounds. Its vibrato looks very similar to both of the original vibratos, but has an intermediate speed and width.
Figure 3: A violin gradually changing to a clarinet and then back to a violin. The clarinet sound has a very small and slow vibrato, so the vibrato gradually decreases in magnitude and slows down (its cycles become wider in time) toward the middle of the sound, and then the process reverses towards the end of the sound.