Testing with real music: "Dirty Blue"
Next, I fed the encoders real music. The test file was a four-second clip from "Dirty Blue" by Adam Makowicz with Phil Woods, a jazz instrumental that includes drums, bass, and saxophone.
Since the encoders are nearly identical in the low-frequency range, the graph above shows only the information above 10 kHz. At a bit rate of 128 kbps, the Blade and LAME encoders politely bow out at 16 kHz. The FhG encoder strives mightily all the way out to 20 kHz, but this results in obvious errors in the power spectra. The Xing encoder has the oddest behavior: most of the high-frequency information past 16 kHz is gone except for two spikes of energy around 18 kHz. Curious indeed.
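The graphs in this section compare power spectra above 10 kHz. Below is a minimal sketch of that kind of measurement. It assumes mono floating-point samples and a single analysis frame. The function names `power_spectrum` and `effective_cutoff`, the periodic Hann window, and the -60 dB floor used to define a "cutoff" are illustrative choices, not the tools behind these graphs. It uses a naive DFT, so it runs with only the Python standard library.

```python
import cmath
import math


def power_spectrum(samples, rate):
    """Return (frequency_hz, power) pairs for one Hann-windowed frame.

    Naive O(N^2) DFT so the sketch needs only the standard library; a real
    measurement would use an FFT and average many frames over the clip.
    """
    n = len(samples)
    # Periodic Hann window: tapers the frame edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    frame = [s * w for s, w in zip(samples, window)]
    spectrum = []
    for k in range(n // 2 + 1):  # bins from DC up to the Nyquist frequency
        acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        spectrum.append((k * rate / n, abs(acc) ** 2 / n))
    return spectrum


def effective_cutoff(spectrum, floor_db=-60.0, min_freq=10_000.0):
    """Highest frequency >= min_freq whose power is within floor_db of the peak.

    The -60 dB floor is an arbitrary illustrative choice. Returns None when
    nothing at or above min_freq clears the floor.
    """
    peak = max(p for _, p in spectrum)
    threshold = peak * 10 ** (floor_db / 10)  # convert dB to a power ratio
    loud = [f for f, p in spectrum if f >= min_freq and p >= threshold]
    return max(loud) if loud else None
```

Consider a 512-sample frame at 44.1 kHz that holds a strong 2 kHz tone plus a tone 40 dB weaker near 15 kHz. The sketch reports a cutoff just above 15 kHz for that frame. With the strong tone alone, it reports `None`. An encoder that bows out at 16 kHz would show up the same way, because no bin above its cutoff clears the floor.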
Blade meets "Dirty Blue"
As I increased the bit rate, the power spectra of the encoded files matched the power spectrum of the test signal more closely in all cases except that of the Xing encoder. The graph above shows the power spectra of the Blade encoder for the "Dirty Blue" test file encoded at five bit rates (128, 160, 192, 256, and 320 kbps). For 160 kbps and above, this graph is also representative of the behavior of the LAME and FhG encoders. As you can see, the response extends out to 20 kHz for bit rates of 160 kbps and higher. In addition, the error in the spectra is obvious at 160 kbps but gets much smaller at higher bit rates. The graph below shows the same test for the Xing encoder.
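Sometimes it helps to have a single number for how closely an encoded file's spectrum tracks the original over a band. One option is the mean absolute difference in decibels across the bins in that band. This is an assumption for illustration, not the author's metric. `band_error_db` is a hypothetical helper. Both spectra must come from the same frame length and sample rate so that their bins line up.

```python
import math


def band_error_db(reference, encoded, f_lo, f_hi, eps=1e-12):
    """Mean absolute difference, in dB, between two power spectra over [f_lo, f_hi].

    Both inputs are lists of (frequency_hz, power) pairs with identical bins.
    eps keeps log10 finite for bins an encoder has zeroed out entirely.
    """
    diffs = []
    for (f_ref, p_ref), (f_enc, p_enc) in zip(reference, encoded):
        if f_ref != f_enc:
            raise ValueError("spectra must share the same frequency bins")
        if f_lo <= f_ref <= f_hi:
            # Power ratio expressed in decibels; sign dropped so that too much
            # and too little energy both count as error.
            diffs.append(abs(10 * math.log10((p_enc + eps) / (p_ref + eps))))
    if not diffs:
        raise ValueError("no bins fall inside the requested band")
    return sum(diffs) / len(diffs)
```

Averaging in dB gives a bin that is 10 times too strong the same weight as one that is 10 times too weak. That suits spectra spanning several orders of magnitude. The `eps` term matters for an encoder that removes content outright: a zeroed bin produces a large but finite error instead of a math domain error.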
Xing and "Dirty Blue"
The Xing encoder never quite makes it out to 20 kHz, even at 320 kbps. Why does it cut off the music file and not the pink noise? At these frequencies, note the difference in power between the pink noise and the music test file: past 16 kHz, the power of the music file is about two orders of magnitude lower than that of the pink noise. Perhaps the Xing encoder considers this content "below threshold" or "masked" and has changed the frequency spectrum accordingly. Can we hear this high-frequency hash as well as we can see it in the power spectrum? I'll address that question in the listening tests.
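The graphs plot power, so "two orders of magnitude" can be restated in decibels. This uses the power form of the decibel, 10 log10, rather than the 20 log10 used for amplitudes:

```latex
\Delta L = 10\log_{10}\frac{P_{\text{pink}}}{P_{\text{music}}}
         = 10\log_{10}\left(10^{2}\right)
         = 20\ \text{dB}
```

So past 16 kHz, the music sits roughly 20 dB below the pink noise. That margin is at least consistent with the guess that the Xing encoder may treat this content as masked.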
Accuracy and Mean Squared Error (MSE)
Before we do the listening tests, we have to address the question of accuracy. Now, you wouldn't expect the decoded MP3 waveforms to match the test waveform bit for bit, since MP3 is a perceptually based compression scheme. The point of the encoder is to sound the same, not necessarily to measure the same. Still, arguably the encoder that most accurately reproduces the original waveform should also sound the best.
The graph above shows the MSE (arbitrary units) between the original waveform and the encoded signal at various bit rates. Note that the Fraunhofer encoder has the lowest MSE at 128 kbps by a large margin. FhG continues to have the least error up to a bit rate of 256 kbps, at which point all the encoders have essentially the same MSE. Returns diminish as the bit rate increases past 192 kbps: improvements continue at higher bit rates, but the relative gains become smaller and smaller.
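Below is a minimal sketch of an MSE measurement. It assumes both waveforms are mono sample lists at the same sample rate. The textbook definition over N aligned samples is MSE = (1/N) Σ (x[n] − x̂[n])², where x is the original and x̂ the decoded waveform. The article reports MSE in arbitrary units and does not say how it was normalized or aligned, so the helpers `mse`, `aligned_mse`, and `relative_gains` are hypothetical. Decoded MP3 output typically starts with some extra samples of delay. For that reason, `aligned_mse` searches a range of integer lags before comparing; without that step, even a perfect encoder could score a large error.

```python
def mse(original, decoded):
    """Mean squared error between two equal-length sample sequences."""
    if len(original) != len(decoded) or not original:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)


def aligned_mse(original, decoded, max_lag):
    """Smallest MSE over integer delays 0..max_lag of the decoded signal.

    For each lag, drops the first `lag` decoded samples and compares the
    overlapping region. Returns (best_lag, best_mse); ties keep the smaller lag.
    """
    best = None
    for lag in range(max_lag + 1):
        n = min(len(original), len(decoded) - lag)
        if n <= 0:
            break
        err = mse(original[:n], decoded[lag:lag + n])
        if best is None or err < best[1]:
            best = (lag, err)
    if best is None:
        raise ValueError("decoded signal too short for any lag")
    return best


def relative_gains(mse_by_bitrate):
    """Fractional MSE reduction for each step up in bit rate.

    Takes {bitrate_kbps: mse} and returns [(from_kbps, to_kbps, fraction), ...].
    """
    rates = sorted(mse_by_bitrate)
    return [
        (lo, hi, (mse_by_bitrate[lo] - mse_by_bitrate[hi]) / mse_by_bitrate[lo])
        for lo, hi in zip(rates, rates[1:])
    ]
```

`relative_gains` converts a table of measured MSE values per bit rate into the fractional improvement at each step up. Shrinking fractions in that output are one way to express the diminishing returns past 192 kbps described above.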
Listening tests
Numbers can only
tell you so much. Trying to choose an MP3 encoder by measuring its
bandwidth and waveform MSE is like trying to choose a car by only
clocking its 0-60 mph acceleration time and counting the number of
cup holders. We've still got to talk feel. So how do these
encoders sound? For this test I used the following audio tracks:
Beethoven String Quartet Op. 59 No. 3, "Razumovsky," from Key to the Quartets, performed by the Emerson String Quartet
String
instruments are harmonically rich and display a wide array of
tonalities and dynamics. In addition to their distinctive sound,
aggressive bowing brings out sharp transients with complex overtones
which might make for difficult encoding.
"Setting Sun," from Dig Your Own Hole, Chemical Brothers
This song is a
wall of sound from top to bottom. In the upper frequencies there are
plenty of percussive tracks, frequency sweeps, and some constant high
pitched tones. In addition, there's a deep, gated bass drum that
solidly cuts through the music. Plenty here to trip up an encoder.
"Tears in Heaven," from Eric Clapton Unplugged, Eric Clapton
This recording
has great "presence," in the sense that it gives you a
feeling of being right there during the recording. On my
stereo the imaging is wide and deep, with well defined space between
instruments. Also, there are plenty of audio cues to listen for when
doing critical comparisons. At the beginning of the track you can
hear someone's foot tapping with short but distinctive reverberations
coming from the wooden stage. The guitars produce sharp and clean
attack transients. Clapton sings with backup singers, and vocal
tracks can be revealing. Finally, there's a high pitched bell with a
long and clear decay.
Copyright 2000 arstechnica.com