Virtual Sound Environment

by Kristal Cazella

Throughout the centuries, artists have strived to recreate or replicate the natural world through painting, sculpture, and other media. While visual forms are perhaps the most ready examples, audible media are far from excluded. The advancement of technology has opened a multitude of possibilities on all fronts, though the focus here is on computer-generated sound. Beyond simply recreating a sound, modern techniques and equipment allow us to become immersed in a virtual world that our senses can interpret as being as “real” as its non-virtual counterpart. This is achieved through a sense of position and well-engineered composition.

The main purpose of spatial audio is to create a virtual sound that can be perceived in more than one dimension (two or three), giving it a sense of realism. “The listener perceives as if the signals reproduced at the listener’s ears would have been produced by a specific source located at an intended position” (Bai & Lee). This is typically applied toward reproducing sounds and effects found in the physical world in a virtual space, though this “reality” can be manipulated as much as creativity allows. This flexibility, and the divorce of the invisible virtual world from the visible physical world, allow combinations and scenarios that might otherwise be infeasible or even impossible.

Spatial audio can make technologies such as mobile phones, video games, and home theater more effective. Most of its applications are communication- or entertainment-related.

As expected of almost any audible medium, “the rendering of spatial audio is either by headphones or loudspeakers. Headphones reproduction is straightforward, but suffers from several shortcomings such as in-head localization, front-back reversal, and discomfort to wear. While loudspeakers do not have the same problems as the headphones, another issue adversely affects the performance of spatial audio rendering using loudspeakers. The issue frequently encountered in loudspeaker reproduction is the crosstalk in the contralateral paths from the loudspeakers to the listener’s ears, which may obscure source localization. To overcome the problem, crosstalk cancellation systems (CCS) that seek to minimize, if not totally eliminate, the crosstalks have been studied extensively by researchers” (Bai & Lee). Repeatable positioning becomes more and more difficult the further the physical source of sound is moved from the listener’s ear. This is because the “sweet spot” shrinks as distance and angular separation increase, and as more sources are added.
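The idea behind crosstalk cancellation can be sketched in a few lines. The example below is only an illustration of the textbook approach (inverting the matrix of speaker-to-ear transfer functions at a single frequency), not the specific system studied by Bai & Lee. The transfer values in `H` and the helper names `invert_2x2` and `matmul_2x2` are made up for the example.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix of complex numbers given as nested tuples."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("transfer matrix is singular at this frequency")
    return ((d / det, -b / det), (-c / det, a / det))


def matmul_2x2(p, q):
    """Multiply two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


# Hypothetical transfer functions at one frequency.
# Rows are ears (left, right); columns are loudspeakers (left, right).
# The off-diagonal entries are the crosstalk (contralateral) paths.
H = ((1.0 + 0.0j, 0.4 - 0.2j),
     (0.4 - 0.2j, 1.0 + 0.0j))

# The cancellation filter is the inverse of H, so speakers-then-room
# behaves like a direct wire to each ear at this frequency.
C = invert_2x2(H)
effective = matmul_2x2(H, C)

for row in effective:
    print([round(abs(v), 6) for v in row])
```

Because `H` depends on where the listener's ears are relative to the loudspeakers, a filter computed for one position only cancels crosstalk near that position, which is one way to think about the “sweet spot.”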

When reproducing a “real world” sound environment, it makes sense to use the same number of dimensions the real world has: three. “A 3D sound field is generally synthesized by considering three factors, the horizontal component, the vertical component, and the distance from the sound source” (Ito et al.). Together these three values locate a source in space, much as x, y, and z do in Cartesian coordinates. Even in nature, though, some situations play out so nearly on a single plane that a third dimension is redundant. Two-dimensional applications typically just drop the vertical component. “The sound field in the horizontal direction is realized through the interaural intensity difference (IID) and the interaural time difference (ITD). Distance is represented by the attenuation of the sound volume in proportion to the distance from the sound source” (Ito et al.). The closer the source, the louder it sounds; the farther away, the quieter. Loudness is the most readily noticed cue for distance, but a subtler change in frequency may also be present depending on movement and placement. We do not have just one receptor for sound, however; our hearing is stereo and depends on the difference between its two inputs. “The IID is the following phenomenon. When a sound source is placed on the right side of the head, a stronger sound is heard in the right ear and a weaker sound is heard in the left ear. This phenomenon is realized in implementation by adjusting the sound volumes reaching the left and right ears in accordance with the distance from the sound source. The ITD is the following phenomenon. When a sound source is placed on the right side of the head, as above, the sound arrives at the right ear slightly earlier than at the left ear.
The time difference is calculated as ITD = a/c (θ + sin θ) … where a is the head radius, θ is the angle between the front direction and the sound source direction, and c is the sound velocity” (Ito et al.). Without these added mathematical tweaks, each ear would act as an independent microphone, resulting in an unrealistic effect, as if somehow your physical head were nonexistent and sound could move freely through the space between the ears. In short, the result would be unconvincing and unnatural.
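To make these cues concrete, here is a minimal sketch of the ITD formula quoted above, read as ITD = a/c (θ + sin θ), together with per-ear volume scaled by distance. The head radius, the speed of sound, the inverse-distance law, the 1 m reference distance, and all function names are assumptions chosen for illustration; they are not taken from Ito et al.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C
HEAD_RADIUS = 0.0875    # m, assumed typical adult head radius


def itd_seconds(angle_rad, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """ITD = a/c * (theta + sin(theta)); theta is measured from straight ahead."""
    return head_radius / c * (angle_rad + math.sin(angle_rad))


def distance_gain(distance_m, reference_m=1.0):
    """Assumed inverse-distance law: gain 1.0 at or inside the reference distance."""
    return reference_m / max(distance_m, reference_m)


def ear_gains(source_xy, head_radius=HEAD_RADIUS):
    """IID sketch: scale each ear's volume by that ear's distance to the source.

    The listener faces +y; the ears sit at (-a, 0) and (+a, 0).
    Returns (left_gain, right_gain).
    """
    x, y = source_xy
    left_distance = math.hypot(x + head_radius, y)
    right_distance = math.hypot(x - head_radius, y)
    return distance_gain(left_distance), distance_gain(right_distance)


if __name__ == "__main__":
    # Source directly to the right of the listener, 2 m away.
    print(f"ITD at 90 degrees: {itd_seconds(math.pi / 2) * 1000:.3f} ms")
    print("ear gains (left, right):", ear_gains((2.0, 0.0)))
```

With these assumed constants, a source directly to one side arrives roughly 0.66 ms earlier at the nearer ear, and that ear also receives the slightly louder signal.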

[Figure: ItoImage.bmp (Ito et al.)]

Layering multiple applications of this technique over each other could in theory construct a believable virtual landscape, but with a large number of separate sound sources this would be inefficient and taxing on the system. Consolidating some of the sources would cost the machine generating the virtual environment less time and work. Whether we realize it or not, our minds often group sounds instead of always tracking individual sources. For instance, a distant crowd is recognized as one sound source rather than as each individual within it. We process visual information in a similar way. “The level-of-[detail] (LOD) technique, which is common in CG applications, is used in the synthesis of the sound field in order to reduce the computational cost when the number of sound sources is increased. LOD is a technique for compromising between realism and drawing speed by providing a fine representation to objects near the user and a coarse representation to far objects. The idea of LOD is applied to grouping of the sound sources, so that multiple sound sources far from the listener are integrated in processing in order to reduce the computational cost” (Ito et al.). Because we naturally focus on things close to us, the listener would likely not notice a difference. This is a fine example of how the precision of technology can exceed our own ability to sense it. There is no need to spend time or resources on something imperceptible and thus useless to the listener, especially when those resources could be better applied toward quality or toward an additional element that enhances the experience.
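The grouping idea can be sketched as follows. This is one possible reading of it, not the method from Ito et al.: sources within an assumed `near_radius` of the listener stay individual, while farther sources that fall in the same grid cell are merged into one source at their average position. The `Source` class, the grid-cell scheme, the default distances, and the choice to simply sum gains are all assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Source:
    x: float
    y: float
    gain: float


def group_distant_sources(sources, listener=(0.0, 0.0),
                          near_radius=10.0, cell_size=20.0):
    """Keep near sources individual; merge far sources that share a grid cell."""
    lx, ly = listener
    near = []
    cells = {}
    for s in sources:
        if math.hypot(s.x - lx, s.y - ly) <= near_radius:
            near.append(s)
        else:
            key = (math.floor(s.x / cell_size), math.floor(s.y / cell_size))
            cells.setdefault(key, []).append(s)

    merged = []
    for members in cells.values():
        # One coarse source per cell: average position, summed gain (assumed).
        cx = sum(m.x for m in members) / len(members)
        cy = sum(m.y for m in members) / len(members)
        merged.append(Source(cx, cy, sum(m.gain for m in members)))
    return near + merged


if __name__ == "__main__":
    scene = [
        Source(1.0, 1.0, 1.0),      # close to the listener, kept as-is
        Source(100.0, 100.0, 0.5),  # two members of a distant "crowd"
        Source(105.0, 102.0, 0.5),
        Source(-200.0, 50.0, 0.3),  # a lone distant source
    ]
    for s in group_distant_sources(scene):
        print(s)
```

In this example four sources become three, and the saving grows with the size of the distant crowd, since each merged group needs only one pass through the positioning calculations.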

Space is a vital component of our virtual environment, but what fills the space is just as important. An essential element of a believable environment is the sound itself. It takes an active observer to identify all of the sounds in a given space. This includes “noise”, which was discussed in detail over the first few chapters of Audio Culture. “To adapt everyday sound is considered mostly from the point of view of eliminating what’s negative: minimize noise. And much is known about techniques of noise abatement, but not so much … on what exactly constitutes noise in everyday environments” (Kittay). As class discussions have demonstrated, noise has many differing definitions. One example is “‘unwanted’ or disturbing sounds, that either are unpleasant in and of themselves, or that impede effective (and ‘wanted’) communication” (Kittay).

Sound design is an art in itself, using an aural medium as a painter might use paint. Where the noise abatement described above is subtractive, sound design, like painting, is additive. “To the extent that sound design is to involve adding something positive, it usually involves using an audio system to put on soft, recorded music for peripheral, or ‘background,’ rather than focused listening. Very little systematic attention, however, is paid to sounds that are neither speech nor music. Examples are environmental sounds (rain, birds, bells), sounds of adjacent human activity, and continuous low frequency sounds such as drones” (Kittay). Until I mentioned it at this moment, you might not have been aware of the immediate hum of a computer fan or the air conditioner, or perhaps a ticking clock or watch, or passing traffic outside. Little attention is paid to them not because they are unimportant, but because we have become accustomed to them. Compared with speech communication, background sounds have little immediate meaning and so become low-priority in our minds. This is common for sounds that are repetitive in nature, such as a steady ticking or hum. “As recording engineers know so well, each room already has its distinctive ‘room tone.’ These are not things necessarily to be eliminated; they can even be enhanced or manipulated when one wants to ‘sense’ one’s environment, or needs to do so for cognitive purposes” (Kittay). By influencing the listener’s natural prioritization of sounds, one can manipulate the impact of a sound and the experience of the environment, perhaps giving it a surreal quality.

Such attention to detail is what makes a soundscape more believable and sets an artist above the norm. When these virtual props are combined with the virtual positioning techniques and placed around the listener, a believable virtual sound space is created. Technology and calculations mimic nature in order to trick our senses and place us in an otherwise non-existent location within cyberspace. This is the goal.


"Objective and subjective analysis of effects of listening angle on crosstalk cancellation in spatial sound reproduction" by Mingsian Bai and Chih-Chung Lee

"Design and Implementation of 3D Interface for Digital City" by Hideaki Ito, Siewling Teh, Hideyuki Nakanishi, and Toshihide Hagawa

"The Sound Surround" by Jeffrey Kittay
