In Computer Audio, More Samples Matter
A Little More Sampling Can Go a Long Way
By Trevor Marshall
January 08, 2001
Last month (DVD Audio Part I Of II: DVD Audio CDs and DVD Audio Part II: Sounding Good), I briefly discussed some of the technologies used in both DVD-audio and DVD-video recordings. I stated that a 96-KHz sampling rate was overkill, and received a number of e-mails disagreeing with my opinion.
This letter from Michael Zarky contains a pretty good summary of all the issues that readers raised.
Subject: Digital-Music Quality
I enjoyed your recent Byte.com columns about the DVD music formats. I wish to object to your reliance on the numbers game when you question the need for much greater sampling rates/data depths. No doubt the sound is good enough for most people, who have grown up with amplified sound and don't know how to use their ears, I see you have a background in rock/amplified music. But a significant number of people who listen critically to acoustic music (I've been a harpsichord builder) find CD-quality audio harsh; so clearly there is room for improvement. A little digital overkill might help, I've never met any serious listener who thinks recorded/reproduced music can equal the sound of a live concert.
Reliance On Numbers
Engineers spend much of their lives trying to make their world fit into the straitjacket of scientific laws and principles that they were taught at universities. But the most ticklish engineering problems are those that cannot easily be numerically analyzed.
So, is a 96-KHz sampling rate twice as good as the 48-KHz rate I recommended? And even better than the 44,100-Hz rate used on audio CDs? I firmly believe the answer is no, even though the numbers paint a deceptively simple concept around which even the most junior marketer can construct an effective spin campaign.
I clearly did not make this point very well in my previous column, so let's look at it from another angle.
One of the most basic waveforms is the square wave. The square wave is a synthetic waveform that doesn't occur in nature, but which is easily created with a signal generator.
In digital audio format it consists of a series of successive numbers, either +X or -X in size (X is the amplitude). The transitions between levels generate a spectrum containing a lot of odd harmonic energy.
These transitions also act as an impulse, and, in much the same way as a striker on a bell causes the bell to ring, even after the striking impulse has been removed, the electronic impulses will energize any circuitry that is resonant and cause it to "ring."
I used the Digital Signal Generator software written by He Lingsong, of China's Huazhong University, to create a file of 1-KHz square wave values sampled at 44100 Hz.
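For readers who would rather script the test signal, here is a minimal Python sketch of a comparable file. It is not the Digital Signal Generator program mentioned above; the amplitude value and the function names are hypothetical choices made for illustration, and the output is assumed to be 16-bit mono PCM.

```python
import io
import struct
import wave

SAMPLE_RATE = 44100   # audio-CD sampling rate, in Hz
FREQUENCY = 1000      # square-wave frequency, in Hz
AMPLITUDE = 16000     # hypothetical value for X; anything up to 32767 fits in 16 bits


def square_wave(num_samples, amplitude=AMPLITUDE):
    """Return num_samples values, each either +amplitude or -amplitude."""
    samples = []
    for n in range(num_samples):
        # Position within the current cycle, scaled so a full cycle equals SAMPLE_RATE.
        phase = (n * FREQUENCY) % SAMPLE_RATE
        # First half of each cycle is +X, second half is -X.
        samples.append(amplitude if 2 * phase < SAMPLE_RATE else -amplitude)
    return samples


def write_wav(target, samples):
    """Write 16-bit mono PCM samples to a filename or a file-like object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


one_second = square_wave(SAMPLE_RATE)
buffer = io.BytesIO()                  # pass a filename such as "square.wav" to save to disk
write_wav(buffer, one_second)
```

Because 44,100 is not a whole multiple of 1000, each cycle spans 44.1 samples, so the high and low runs alternate between 22 and 23 samples long. The integer phase arithmetic keeps that pattern exact instead of letting floating-point rounding drift.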
(The digital square wave .wav file has been loaded into Cool Edit 2000, so that the incoming digital values and the audio equivalent may easily be seen. Click the thumbnail for a full-size view.)
Cool Edit 2000 has a mode in which you can visualize the input digital waveform, together with the analog (audible) representation. By loading the square wave into Cool Edit we can see the original (yellow) values that have come from the file. But some analog artifacts have also appeared (in green), superimposed over the digital values.
These superimposed damped sinusoidal waves come from two sources.
All Sound Can Be Decomposed Into Sine Waves
First, the 1-KHz square wave is actually made up of an (essentially) infinite number of sinusoids at 1 KHz, 3 KHz, 5 KHz, and so on. When some of these sinusoids are removed from the square wave, the transition between the levels becomes less steep, and the "ringing" appears. This effect occurs whenever you filter the square wave to remove its higher-frequency components. In this case, Cool Edit has applied a 20-KHz lowpass filter to the digital values, and we can see that the resulting analog signal is different from the incoming digital waveform.
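The effect of discarding the upper harmonics can be reproduced with a few lines of Python. This is only a sketch under a strong assumption: it models an ideal "brick wall" filter that keeps every odd harmonic below 20 KHz untouched and removes everything above, which is not necessarily how Cool Edit's filter behaves. The function name and constants are illustrative.

```python
import math

FUNDAMENTAL = 1000     # square-wave frequency, in Hz
CUTOFF = 20000         # idealized lowpass cut-off, in Hz
POINTS = 10000         # time points evaluated across one cycle


def band_limited_square(t, fundamental=FUNDAMENTAL, cutoff=CUTOFF):
    """Sum the odd harmonics of a unit square wave that lie below cutoff."""
    total = 0.0
    harmonic = 1
    while harmonic * fundamental < cutoff:
        # Each odd harmonic contributes with amplitude 1/harmonic.
        total += math.sin(2 * math.pi * harmonic * fundamental * t) / harmonic
        harmonic += 2
    return 4 / math.pi * total


period = 1 / FUNDAMENTAL
wave_values = [band_limited_square(i * period / POINTS) for i in range(POINTS)]
peak = max(wave_values)
print(f"peak of band-limited wave: {peak:.3f} (ideal square wave: 1.000)")
```

With the harmonics from 1 KHz to 19 KHz retained, the reconstructed wave overshoots the flat top of the ideal square wave just after each transition and then settles in a decaying ripple, which is the same kind of "ringing" visible in the Cool Edit display.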
The number one dilemma for any audio engineer, whenever the input signal has clearly been changed, is whether that change is audible.
As Michael wrote, "No doubt the sound is good enough for most people ... But a significant number of people who listen critically to acoustic music find CD-quality audio harsh; so clearly there is room for improvement."
Looking at the waveforms, there has definitely been a change to the input. But if, as is generally accepted, the human ear can only hear to 20 KHz, surely any changes above that frequency would not make any audible change to the output sound?
I don't believe this is true. I have personally fed two sine waves, at 19 KHz and 20 KHz, into two different amplifiers and two separate tweeters. When the volume gets high enough, my ears become non-linear, and I can hear the 1-KHz difference frequency quite clearly.
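The arithmetic behind that difference tone can be checked numerically. The sketch below is a toy model, not a model of the ear: it assumes the non-linearity is a simple squared term of hypothetical strength `K2`, and it measures the 1-KHz component with a single-bin DFT. The simulation rate and all names are illustrative choices.

```python
import cmath
import math

SIM_RATE = 192000      # simulation rate in Hz, high enough to avoid aliasing
NUM_SAMPLES = 1920     # 10 ms, a whole number of cycles for every tone involved
K2 = 0.1               # hypothetical strength of the second-order non-linearity


def two_tones(n):
    """Equal-level 19-KHz and 20-KHz sine waves at sample index n."""
    t = n / SIM_RATE
    return math.sin(2 * math.pi * 19000 * t) + math.sin(2 * math.pi * 20000 * t)


def amplitude_at(samples, frequency):
    """Amplitude of one frequency component, measured with a single-bin DFT."""
    acc = sum(
        s * cmath.exp(-2j * math.pi * frequency * n / SIM_RATE)
        for n, s in enumerate(samples)
    )
    return 2 * abs(acc) / len(samples)


linear = [two_tones(n) for n in range(NUM_SAMPLES)]
distorted = [x + K2 * x * x for x in linear]   # add the squared (non-linear) term

print("1-KHz level, linear path:    ", amplitude_at(linear, 1000))
print("1-KHz level, non-linear path:", amplitude_at(distorted, 1000))
```

In the linear path there is essentially nothing at 1 KHz. Once the squared term is added, the product of the two tones yields a component at the 1-KHz difference frequency whose level scales with `K2`, even though both original tones sit at the top edge of the audible range.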
So, here we have a second cause for degradation, the possible generation of audible components by the 20-KHz digital filter itself. These can come both from its intended action as a sharp cut-off filter, which gives rise to "ringing," and from non-linearity and other imperfections in the filter's implementation.
I therefore decided to record my square wave onto an audio CD so that I could see what the output waveform of several CD players looked like. I used DART32 to create an audio-CD compatible image of the square wave, burning it onto a CD-R with NERO 5.0. (If you want to burn a test CD yourself, I have zipped the file, and it may be downloaded from my website [a 256-Kbyte archive].)
(This is a photograph of an oscilloscope screen showing the output when playing the square wave CD in a quality player. Click the thumbnail for a full-size view.)
Here is the output of the CD waveform from my (old) Sony CDP-620ES audiophile CD player. You can see that it bears a striking resemblance to the Cool Edit visualization, but that the sinusoid is a little less damped.
(The waveform was taken with a Hewlett Packard 2-GHz digital sampling oscilloscope and an old digital camera.)
When I played the CD on my daughter's cheap portable Panasonic player I got a different waveform.
(This is a photograph of an oscilloscope screen showing the output when playing the square wave CD in a low-cost player. Click the thumbnail for a full-size view.)
Not surprisingly, there are more artifacts visible in the output of this CD player. Significant overshoot and an unusual damping characteristic are indicative of problems that will probably be audible, even to a novice.
(This is a photograph of an oscilloscope screen showing the output when playing the square wave CD in a Genica player. Click the thumbnail for a full-size view.)
Finally, here is the waveform from a portable Genica MP3 player (in the standard audio CD mode, of course). Even though the Genica is inexpensive, it is generating the fewest artifacts from the CD square wave, and the waveform is remarkably close to what Cool Edit predicted.
The Waveforms Are Different, Who Cares?
It is pretty obvious that the waveforms measured from each of these CD players are different. I have assumed that the Genica is producing the cleanest signal, but I may be totally incorrect. You see, we are dealing with 10 percent artifacts in the square wave, much larger numbers than the (typical) 0.5 percent threshold for audible harmonic distortion. It is generally accepted that the artifacts we are seeing are at frequencies beyond the limits of human hearing, and therefore cannot be heard. But it is possible that somebody may be able to hear differences between the sound reproduced by these CD players. In fact, it is probable.
The problems exhibited by these players are related to the sharp cut-off lowpass filter required to let 20-KHz audio be processed, while signals at and above the (44,100/2 =) 22,050-Hz Nyquist limit are removed. The lowpass filter transitions from full amplitude to zero amplitude in about (22,050-20,000 =) 2050 Hz.
With a 48-KHz sampling rate, the filter is required to do its work between 20,000 Hz and (48,000/2 =) 24,000 Hz. That (24,000-20,000 =) 4000-Hz band allows a much more gradual cut-off slope than the 2050 Hz available at the 44,100-Hz rate, and such filters are much easier to implement.
So an analysis of the numbers actually shows that filter quality is not a linear function of the sampling rate. 96 KHz is not twice as good as 48 KHz; it is approximately 10 times better. Even 48 KHz is twice as good as 44.1 KHz. It was on this basis that I described the 96-KHz sampling rate as "overkill."
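The transition-band arithmetic is easy to tabulate. The short Python sketch below assumes, as the discussion above does, that the filter must pass everything up to 20 KHz and reach zero by the Nyquist limit; the function and variable names are illustrative only.

```python
AUDIO_BAND_EDGE = 20000  # highest frequency the filter must pass, in Hz


def transition_band(sample_rate, band_edge=AUDIO_BAND_EDGE):
    """Width in Hz between the audio band edge and the Nyquist limit."""
    nyquist = sample_rate / 2
    return nyquist - band_edge


widths = {rate: transition_band(rate) for rate in (44100, 48000, 96000)}
for rate, width in widths.items():
    print(f"{rate} Hz sampling: Nyquist {rate / 2:.0f} Hz, transition band {width:.0f} Hz")

print(f"48 KHz vs 44.1 KHz: {widths[48000] / widths[44100]:.2f}x wider")
print(f"96 KHz vs 48 KHz:   {widths[96000] / widths[48000]:.2f}x wider")
print(f"96 KHz vs 44.1 KHz: {widths[96000] / widths[44100]:.2f}x wider")
```

Measured by transition-band width alone, 48 KHz comes out a little under twice as roomy as 44.1 KHz, while 96 KHz is 7 times roomier than 48 KHz and nearly 14 times roomier than 44.1 KHz, so the rough tenfold figure sits between those two comparisons. Transition width is only one proxy for filter difficulty, so these ratios should be read as an indication rather than a measure of audible quality.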
Can Recorded Music Sound Like Live Music?
Michael said, "I've never met any serious listener who thinks recorded/reproduced music can equal the sound of a live concert," and I fully agree.
But there is a point at which the recorded entertainment experience becomes pleasing in its own right. Surround Sound can take us a lot closer to that point than stereo ever did. And there is no doubt that audio is easier to distribute in a digital format without the significant degradations in quality that used to occur from blunt gramophone needles, and the like.
Selecting Computer Audio Components
I received a note from fellow Byte.com columnist Bill Nicholls, saying:
I want to compliment you on those audio columns. I am interested in digital audio, albeit with limited knowledge.
Your columns have made a lot clear to me, and I hope to graduate to being able to manipulate audio on my system in the near future.
It would be very helpful if you would write a column or two on setting up such a system, specifically on the audio card(s) and other audio equipment needed, and software options. Your short notes on software in the latest column are just too brief.
One of my plans for next year is to digitally capture live music directly on a computer for later processing. I don't know enough to do that yet, and would appreciate any help/pointers you could give.
Bill, I apologize for the brevity of my software descriptions. There always seems so much to say and so little space to say it! But I agree that outfitting a computer for digital audio can get a little frustrating, and so I will make a few comments on the equipment that I use.
The weakest point in any audio-reproduction chain has always been the loudspeaker.
And having more of them does make a difference. More speakers injecting bass sounds into the listening room introduces acoustic diversity that tends to hide the imperfections of each speaker. Additionally, imperfections in the mid-range tend to be less noticeable in a multi-speaker environment. So much so, in fact, that a typical surround-sound system uses much lower-cost speakers than would be required in a good stereo installation.
There are a lot of low-cost speakers being promoted for computer audio right now, and even Dell's new THX-certified computer systems are much less capable than an audiophile THX system would be. My advice here is to get the largest speakers you can fit into your computer environment. There are several low-end surround-sound systems intended for living rooms that have small satellite speakers. Listen to them at your local Electronics Superstore. The going price from Yamaha and Sony was around $400 the last time I looked.
I personally prefer Wharfedale speakers (such as the "Opal50" and "Opal 30") and hack the cross-over networks so that they sound half-reasonable. I also like some of the speakers from Infinity, but I had better stop right here. I could spend months talking about my speaker preferences and how to hack cross-overs. I tend to be a very critical listener. And an inveterate hardware hacker.
When auditioning your speakers (you will audition the most important part of your system before you buy it, won't you?), make sure you take along your own test CDs. I use Carl Orff's "Carmina Burana" (Deutsche Grammophon 447437-2) to test clarity, the ability to pick out individual components from a complex sound source, and Heart's "Mistral Wind" to make sure the speakers can play loudly without getting too harsh. Finally, Pink Floyd's "Wish You Were Here" has a bass torture test at the start of track two, when each door to "the machine" is opened in turn. Your sub-woofer must be able to play this track without becoming too distressed.
There is a new Soundblaster Live card from Creative, which has a digital output that connects to the integrated surround-sound speaker systems. Make sure you get a five-speaker system (not the old four-speaker configurations) plus sub-woofer. The surround digital signals are carried in a format called S/PDIF along a normal RCA video cable, and most computer speaker systems decode this signal directly, with a series of internal power amplifiers to drive the speaker array.
When I first transcribed a live performance from my digital-audio tape recorder to my computer, I noticed that the sound became a lot harsher. At that time, I had an early SoundBlaster card. After I changed to a Yamaha sound card, I found that the playback quality was indistinguishable from the signal at the DAT's audio-output terminals. The sound card you use does matter. I will try to locate a few multi-channel cards over the coming month and let you know how I think they performed in my next column.
Just as no one person is expected to completely comprehend the design of a computer's motherboard, so the design of computer-video and computer-audio systems is a lot more complex than it appears at first sight. As each computer technology matures, the integration of advanced functions into standard computers becomes seamless and invisible to the user (take, for example, the recent integration of USB functionality onto motherboards). But computer video and audio are still in their infancy. As early adopters, we have the opportunity to drive the eventual direction of these technologies. Unfortunately, the pioneers are always the ones with the arrows in their backs....