Recording & Rendering
My talk is about
Earlier this year it became very clear to me how strongly the loudspeaker's polar frequency response and the speaker placement in the room determine what we hear when we listen to a 2-channel sound presentation.
First of all I must emphasize that playback of a recording over two loudspeakers
can only create an auditory illusion of the original event.
Two-channel playback produces loudspeaker cross-talk signals at the ears. The left speaker signal reaches both the left and the right ear, and similarly for the right speaker.
Stereophonic sound recording and reproduction is a different process from binaural recording, as with dummy heads and headphones. Nor is stereo an attempt at sound field reconstruction, which would take many more loudspeakers.
With two loudspeakers we can only hope to provide a sufficient range of auditory cues to allow us to recreate in our mind the recorded acoustic event. In the process we must minimize those auditory cues that mislead us into a different experience, like listening to two loudspeakers in a room.
It has been my experience that the best strategy for designing loudspeakers, and for their optimum placement in a room, is to minimize misleading cues.
When properly done it is actually possible to create a fairly convincing illusion of listening into a different acoustic space.
There are some obvious and well known misleading cues that the loudspeakers
themselves can contribute:
- Non-flat on-axis frequency response
- Resonances in loudspeaker drivers and cabinets
- Non-linear distortion and the generation of spectral components that were not in the original
And then there are the potentially misleading cues that the loudspeaker can generate in combination with the room due to:
- The off-axis sound radiation and the resulting room reflections. Typically the off-axis and on-axis frequency response curves are different.
I believe this area has not been sufficiently studied and is not fully understood
What needs to be considered about a listening room's contribution to the
perceived sound of a 2-channel reproduction is:
- The temporal symmetry with which sounds are reflected back to the listener
- The delay with which the reflections arrive at the listener's ears relative to the direct sound
- The spectral content of the reflections,
- and the rate at which they decay
In addition we have potentially the room modes or resonances at lower frequencies.
A lot of attention has been focused on modes and bass reproduction, but not so much on reflections and their effect upon imaging and spatial perception.
It has been my experience, and that of many others, that for optimum stereo
- The loudspeaker-listener triangle should be set up symmetrical to the room boundaries
- The loudspeakers should be out in the room and at least 1 m or 3 feet away from large reflecting surfaces
What is a new insight to me is that the loudspeakers should have a uniform polar response such as
- an acoustically small dipole or open baffle loudspeaker,
- or an acoustic point source, a monopole or an acoustically small omni-directional loudspeaker.
These are the two loudspeaker types that I had designed for different
The monopole is a 3-way system in non-resonant enclosures. It radiates omni-directionally up to around 3 kHz and then becomes forward pointing due to the size of the tweeter. Crossovers are at 1 kHz and 100 Hz.
The dipole is a 3-way open baffle speaker with conventional dynamic drivers. Crossovers are at 1.4 kHz and 120 Hz.
Speakers were measured outdoors on a tower and on the ground.
Both loudspeakers have a flat on-axis frequency response when measured outdoors under free-field conditions. A 4pi to 2pi transition between 200 Hz and 100 Hz extends the flat response to half-space conditions for the low frequencies.
All listening tests were done in my living room.
For my personal enjoyment the critical stereo listening position is A, but I also listen a lot from B, further away.
Both loudspeakers, dipole D and monopole M were also measured in this room.
The layout shows the listening triangles, symmetrical to the room boundaries.
The speakers are 2 m and more out from the wall behind them.
The room extends behind the listener and is acoustically open or dead in that direction.
Reverberation time is around 450 ms. The room is fairly live.
Both loudspeakers, M and D, sound confusingly similar under these conditions.
The surprising observation led me to investigate how this similarity in sound perception might be possible.
The frequency response measurement from position A shows the effect of the
room upon the direct loudspeaker signal.
A 200 ms time record has been analyzed.
It includes various reflections and room resonances, but is not long enough in duration to fully resolve all room modes.
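As a rough aside on why a 200 ms record cannot fully resolve all room modes: the frequency resolution of a time record is about the reciprocal of its length. The sketch below is only that arithmetic. The function name is made up for illustration, and any window applied to the record would widen the figure further.

```python
def frequency_resolution_hz(record_length_s):
    """Approximate spacing of resolvable frequencies for a time record."""
    return 1.0 / record_length_s


# The 200 ms record analyzed for the measurement at position A.
resolution = frequency_resolution_hz(0.200)
print(f"200 ms record: about {resolution:.0f} Hz resolution")
```

Room modes that lie closer together than this, or that ring for longer than the record, are smeared together in such a measurement.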
Clearly, dipole and monopole responses look different. But also the corresponding left and right speakers measure differently.
The 1/3rd octave smoothed responses show this more clearly.
There is little indication of the flat free-field outdoor measurement response in these data.
Now let's look at the response in the time domain to see the contributions from reflections.
How many reflections are generated by a loudspeaker and which direction do they come from?
For example, here is a dipole like D in a room corner. Every image of D contributes sound via reflection. In some cases the reflection is direct, as from the side wall S; in other cases the sound bounces around between rear wall, side wall and floor, R+S+F.
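The images of a source in a corner can be sketched with simple mirror geometry. The following is a minimal illustration, not the procedure used for the talk: the coordinates, the speed of sound and all names are assumptions, only the three corner surfaces are considered, and the dipole's directional pattern is ignored.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room temperature

# Corner surfaces and the coordinate each one mirrors (assumed geometry):
# rear wall R is the plane y = 0, side wall S is x = 0, floor F is z = 0.
SURFACES = (("R", 1), ("S", 0), ("F", 2))


def corner_images(source):
    """Return {label: position} for the 7 mirror images of a source in a corner."""
    images = {}
    for count in range(1, len(SURFACES) + 1):
        for combo in itertools.combinations(SURFACES, count):
            position = list(source)
            for _, axis in combo:
                position[axis] = -position[axis]
            label = "+".join(name for name, _ in combo)
            images[label] = tuple(position)
    return images


def extra_delay_ms(source, image, listener):
    """Delay of a reflection relative to the direct sound, in milliseconds."""
    extra_path = math.dist(image, listener) - math.dist(source, listener)
    return extra_path / SPEED_OF_SOUND * 1000.0


# Hypothetical positions in metres: x from side wall, y from rear wall, z height.
speaker = (1.5, 2.0, 1.0)
listener = (2.5, 5.0, 1.0)

images = corner_images(speaker)
delays = {label: extra_delay_ms(speaker, pos, listener)
          for label, pos in images.items()}
for label, delay in sorted(delays.items(), key=lambda item: item[1]):
    print(f"{label:6s} {delay:5.1f} ms after the direct sound")
```

Each label names the surfaces involved, so "S" is the single bounce off the side wall and "R+S+F" is the image that involves all three surfaces. A real room adds the remaining walls and the ceiling, which is one reason many more reflections are measured than a corner model predicts.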
To measure reflections I use a 4-cycle, Blackman window shaped toneburst.
The burst covers about 1 octave in frequency.
The envelope of the burst signal on a logarithmic scale will be used to visualize reflections.
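A test signal of this kind can be sketched in a few lines. This is an illustration under assumptions, not the measurement software used for the talk: the sample rate, the function names and the -80 dB display floor are all made up, and the window is the standard Blackman formula.

```python
import math


def blackman_toneburst(freq_hz, cycles=4, sample_rate=48000):
    """Return (samples, window) for a Blackman-shaped sine burst."""
    n = round(cycles * sample_rate / freq_hz)
    samples, window = [], []
    for i in range(n):
        phase = 2.0 * math.pi * i / (n - 1)
        w = 0.42 - 0.5 * math.cos(phase) + 0.08 * math.cos(2.0 * phase)
        window.append(w)
        samples.append(w * math.sin(2.0 * math.pi * freq_hz * i / sample_rate))
    return samples, window


def level_db(values, floor_db=-80.0):
    """Rectify and convert to dB relative to the largest value, with a floor."""
    peak = max(abs(v) for v in values)
    floor = peak * 10.0 ** (floor_db / 20.0)
    return [20.0 * math.log10(max(abs(v), floor) / peak) for v in values]


# A 3 kHz burst; for the clean burst the envelope is simply the window.
burst, window = blackman_toneburst(3000.0)
envelope_db = level_db(window)
print(f"{len(burst)} samples, {1000.0 * len(burst) / 48000:.2f} ms long")
```

For a signal recorded in the room the envelope is not known in advance and has to be derived from the recording, for example by rectification as in `level_db` or by an energy-time-curve calculation.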
The dipole loudspeaker has a separate rear tweeter, because the front tweeter
is closed in the back.
Initially no rear tweeter was used. Under outdoor, free-field measurement conditions it had no effect upon the on-axis frequency response. Also it was invisible for off-axis angles up to +/-60 degrees.
In the room, though, it contributes to the sound at the listening position via reflections.
The smoothed frequency response at position A shows the rear tweeter on and off.
The reflection pattern changes whether the rear tweeter is ON, OFF or Reversed in polarity.
The different conditions are audible in terms of timbre and imaging with rear tweeter ON or OFF.
They are audible as strange imaging when the rear tweeter is switched from ON to Reversed, which gives monopole-like off-axis behavior.
There is some correlation to the predicted reflections from the room corner, but there are many more reflections.
Let's look at dipole and monopole reflection patterns during the first 50 ms.
The left most peak is always the direct signal. It is the envelope of the 3 kHz burst. It is followed by the room reflections.
You see immediately that the reflection
magnitude is lower for the monopole M. That is a result of being closer to
the microphone than the dipole D and normalization to the direct signal.
You also see differences between left and right speakers because the room furnishings are not symmetrical.
It is interesting to look at the power spectrum of left dipole and monopole that corresponds to the time domain presentation shown here.
Here we see that the total power spectrum of direct and reflected signals
follows the spectral envelope of the direct signal except for some
The spectrum is very similar for dipole and monopole, only noise and distortion are higher for the monopole due to a less capable tweeter.
Thus, even though the specific reflection patterns for dipole and monopole look different, their spectral content is dominated by the direct signal and its reflections.
So far we only looked at the first 50 ms. Now let's extend the time to 400 ms and place the microphone further out into the room to position B.
You immediately see the decay of the reflections.
It is shown here for two frequency regions, an octave around 3 kHz and an octave around 800 Hz.
At 3 kHz the monopole has lower reflections because the tweeter becomes forward directional.
At 800 Hz, though, you can clearly see the larger amount of room reflections generated by the monopole compared to the dipole.
One might estimate the decay rate, but the envelope is rather ragged. I am of the opinion that reverberation time is not a very meaningful parameter for acoustically small spaces like we have here.
It is illustrative to look at the reflection pattern for different frequency regions.
Here I display the full-wave rectified toneburst and its reflections, because the envelope as derived by the energy-time-curve shows artifacts when the signal to noise ratio is not high enough.
The display is on a linear amplitude scale.
As we go from 3.2 kHz, to 1.6 kHz, to 800 Hz, to 400 Hz, to 200 Hz and finally to 100 Hz, you can see how the direct signal gradually changes from a single spike to the discernible cycles of the toneburst.
Note that the visual multiplicity of individual reflections at 3.2 kHz gets gradually integrated into fewer and fewer variations.
Time resolution is lost as the reflections overlap more and more. The room begins to respond as a whole as we go down to low frequencies.
Reflections lose their meaning as a descriptor and room modes or resonances become important.
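The loss of time resolution follows from the length of the burst itself. The arithmetic below is a small sketch: a 4-cycle burst lasts 4/f seconds, and the assumed speed of sound of 343 m/s converts that into the distance sound travels while the burst is still playing. The function names are made up for illustration.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed
CYCLES = 4  # cycles per toneburst, as used in the measurements


def burst_length_ms(freq_hz, cycles=CYCLES):
    """Duration of a toneburst with the given number of cycles."""
    return 1000.0 * cycles / freq_hz


def path_covered_m(freq_hz, cycles=CYCLES):
    """Distance sound travels during one burst."""
    return SPEED_OF_SOUND * cycles / freq_hz


for freq in (3200, 1600, 800, 400, 200, 100):
    print(f"{freq:5d} Hz: burst lasts {burst_length_ms(freq):5.2f} ms, "
          f"sound travels {path_covered_m(freq):5.2f} m in that time")
```

At 3.2 kHz the burst lasts 1.25 ms, so reflections arriving a few milliseconds apart show up as separate spikes. At 100 Hz it lasts 40 ms, during which sound travels almost 14 m, so in a living room many reflections arrive before the direct burst has even ended.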
We have seen some of the reflection patterns for a dipole and a monopole. They are different when viewed in the time domain though their spectral content is similar. We have also seen that the measured frequency response is different for M and D. Neither measurement gives an indication of the great similarity between M and D that is heard in the room.
To summarize the similarity:
The dipolar and monopolar loudspeakers sound almost identical in their spectral balance and clarity, despite the differences in measured room response and burst response.
Phantom imaging is very similar, but with greater depth for the dipole.
Loudspeakers and room "disappear" so to speak.
This to me is a most surprising result. I have demonstrated it many times to visitors seated in A or B. They can switch instantly between monopole and dipole.
People have got up from their chairs and walked over to the speakers to hear which one is playing.
I would not have expected such similarity because the two loudspeakers follow different concepts and even use different quality drive elements.
But, and this is the key point, both speakers have the same on-axis frequency response under free-space conditions. While one is a dipole and the other is essentially an omni they each have an off-axis response that is an attenuated version of the on-axis response.
Thus the room is illuminated with spectral uniformity by both speakers. The monopole is like a bare light bulb, the dipole like two flash lights back to back.
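The idea of an off-axis response that is an attenuated copy of the on-axis response can be illustrated with textbook polar patterns. This is an idealized sketch, not measured data from either speaker: an ideal monopole radiates equally in all directions, an ideal dipole follows a cosine pattern, and neither pattern depends on frequency. The function names and the -60 dB floor are assumptions.

```python
import math


def monopole_gain(angle_deg):
    """Ideal omni-directional source: same level in every direction."""
    return 1.0


def dipole_gain(angle_deg):
    """Ideal dipole: cosine (figure-of-eight) pattern."""
    return abs(math.cos(math.radians(angle_deg)))


def to_db(gain, floor_db=-60.0):
    """Convert a linear gain to dB, limited to a display floor."""
    if gain <= 0.0:
        return floor_db
    return max(20.0 * math.log10(gain), floor_db)


for angle in (0, 30, 60, 90, 120, 150, 180):
    print(f"{angle:3d} deg: monopole {to_db(monopole_gain(angle)):6.1f} dB, "
          f"dipole {to_db(dipole_gain(angle)):6.1f} dB")
```

Because the attenuation at each angle is one number for all frequencies, every reflection in this idealized picture carries the same spectrum as the direct sound, only weaker. Real drivers approximate this over a limited range, as with the monopole becoming forward pointing above about 3 kHz.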
So here is the hypothesis.
Confusing cues from the room are minimized if the reflections are:
1 - if they are left-right symmetrical,
2 - if they are delayed >6 ms, and
3 - if they are attenuated copies of the direct sound in spectral content
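To relate the 6 ms figure to distances in a room, the delay can be converted into extra path length. The sketch below assumes a speed of sound of 343 m/s and considers only the simplest case, a wall directly behind the speaker with the listener on the line through the speaker perpendicular to that wall. The function names are made up for illustration.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def extra_path_m(delay_ms):
    """Extra distance a reflection travels for a given delay."""
    return SPEED_OF_SOUND * delay_ms / 1000.0


def rear_wall_delay_ms(distance_to_wall_m):
    """Delay of the reflection from a wall directly behind the speaker.

    The reflected sound travels to the wall and back before it follows
    the path of the direct sound, so the extra path is twice the distance.
    """
    return 2.0 * distance_to_wall_m / SPEED_OF_SOUND * 1000.0


print(f"6 ms delay = {extra_path_m(6.0):.2f} m of extra path")
for distance in (0.5, 1.0, 2.0):
    print(f"speaker {distance:.1f} m from wall: "
          f"reflection delayed {rear_wall_delay_ms(distance):.1f} ms")
```

A delay of 6 ms corresponds to about 2 m of extra path. In this simple case a speaker 1 m from the wall gives a delay just under 6 ms, and the 2 m spacing used in my room gives nearly 12 ms.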
I think a strong case can be made for our ability to sort out different auditory cues and to focus our attention to hear what is of interest at the moment.
It goes back millions of years and is evolutionary programming of the processor between two ears.
Hearing must have evolved out of the need for survival.
For that it is essential to know the direction and the distance from which a threat is coming. So attention is paid to cues that tell direction and distance.
Those cues must be sorted out in different surroundings, like in an open savanna or a thick forest where reflections and reverberation have different acoustic characteristics.
It is also likely that we learned to integrate reflections of the threat with its direct sound and to mask stationary, non-threatening sounds, thus adapting to different situations.
Listening in rooms is just a blink on the evolutionary time scale, and so we still use the same adaptive and hard-wired processes to tell direction and distance, and we mask what is not relevant.
Of course, much of the hearing process has been studied and the precedence effect clearly is at play here.
The precedence effect shows up in 3 phenomena in a room with multiple reflections:
1 - as a localization effect, where the direct and reflected sound are heard as a single entity from the location of the direct sound.
2 - as the Haas effect, where a direct sound is integrated with a delayed sound for increased loudness
3 - as de-reverberation. We are normally not much aware of reverberated sound even when its energy is larger than that of the direct sound.
Reading the literature on sound perception I conclude that the specific case of 2-channel sound reproduction in an acoustically small space, like a living room, deserves further investigation.
Stereophonic listening, though, is a complicated case to study because of the variability of conditions and the many parameters that influence it.
Given the hypothesis of symmetry of reflections, of delays >6 ms and in
particular of the spectral content of the reflections in order that the
room is not heard, there are then a number of requirements upon
loudspeakers and rooms.
They are often not met and so become impediments to creating a realistic impression of a 2-channel recorded acoustic event.
1 - The polar response of typical box loudspeakers is omni-directional at low frequencies and becomes increasingly forward directional with higher frequencies
2 - Many loudspeakers have insufficient dynamic range and distort at high levels
3 - Speakers are placed too close to walls and not symmetrical with respect to the room
4 - Rooms are acoustically treated, but the treatment primarily attenuates high frequencies
5 - Electronic room equalization is based on in-room measurements that correlate poorly with perception
And finally, but most importantly:
6 - Recordings that were done with too many microphones and in synthesized acoustic spaces, so there is no coherent acoustic space in the recording to begin with.
On the other hand, if loudspeakers, room and recording are appropriately
Two-channel playback in a normal living space can provide an experience that is fully satisfying.
Loudspeakers and room disappear and the illusion of listening into a different space takes over.
Thank you for your attention!
The illusion of listening into a different space can only be created if the recording contains the necessary cues.
The following are comments from a recording engineer's perspective.
A new look at Recording for Stereo
The ORION is qualified to serve as a standard in monitor loudspeakers. They do not depend upon room standardization or special acoustic treatment to enhance uniformity of result. A normal room with normal acoustics will optimize the listener's "room subtraction filters", reducing both the room's influence and the effect of differences between rooms. Once the value and utility of this system was recognized and accepted, some conclusions involving acoustical recording have formed as a result.
Stereo, it turns out, is a less restricted and far more capable medium than previously thought. When well executed, there is a reassuring timelessness to it, and surprise at how complete the stereo experience can be. Who knew there was this much gold left to be mined? Even sub-optimal recordings benefit and are given new life.
One can also appreciate more easily that the audience perspective is the reality, the true reference. It represents a widely shared experience that ought to be respected rather than ignored. For illustration, I have watched a few young conscientious conductors begin a passage in rehearsal, then quickly run back into the auditorium for a few moments to judge the real balance. They know where reality resides, and it isn't the podium. In the end, of course, there is no practical aural escape from the podium, but that dash for a brief glimpse of reality remains telling. Yet despite this, it is the podium perspective that has dominated stereo recording from the beginning. There are reasons for this, but most are no longer valid.
More technically, the newly demonstrated importance of a uniform polar pattern in loudspeakers logically confirms a similar importance in microphones.
We simultaneously require coherence and incoherence (spaciousness, broadly defined) in stereo recordings and two microphones, whatever their patterns or however arrayed, cannot satisfy this requirement. They must inevitably produce a compromise that cannot fulfill the potential of stereo.
New recording techniques for increased realism are required to respond to this new level of accuracy in monitoring. As an example, a recording system that addresses these observations would require four microphones that separate the contradictory requirements: two for coherent information and two for incoherent information. Or more simply, one pair for the cause and one pair for the effect. Such a system is under evaluation. And this time, for the very first time I think, a plausible standard can be employed to evaluate the process and the result.
Don Barringer, 11/2007
Also see and listen to "Accurate sound reproduction from two loudspeakers in a living room" under Publications #23.