A Systematic Review of The Most Appropriate Methods of Achieving Spatially Enhanced Audio For Headphone Use

Pinecone Research Labs
Benjamin Costerton

May, 2013

Abstract
The purpose of the dissertation is to identify an appropriate method of improving spatial awareness of audio for headphone users. Spatial cues experienced in everyday life are often lost when using headphones. The focus is on users consuming audio content on portable media devices such as phones and tablets.

A systematic review of existing consumer products was conducted, as well as an investigation of current research projects in spatial audio. Several key elements thought to be the most crucial in achieving improved spatial awareness for audio were isolated early in the research. These key subject areas were explored through interviews with current researchers in the areas of spatial audio and psychoacoustics.

The research found issues surrounding the use of pre-user implemented HRTF data (HRTF data applied before the audio reaches the listener, rather than measured from the listener's own ears), the issues involved in realising externalisation over headphones, and the limitations of using binaural recording techniques.

Several elements and challenges involved with spatial audio are discussed. The research concludes with suggested areas of further research into spatially enhanced audio, allowing it to act as a knowledge base for continuing study.
Contents
1 Introduction
1.1 Primary research areas
2 Literature Review
2.1 HRTFs, ITD, ILD and EQ
2.2 EQ
2.3 Concha Resonance/Externalisation
2.4 Audio/Visual Scene
2.5 Timbral Issues - A Possible Result of Tricking the Brain
2.6 Re-recording 5.1 in Binaural
2.7 Bit-depth Conversion
2.8 End User
3 Methodology
3.1 Introduction
3.2 Summary
3.3 Extended Methodology
3.4 Method of Research Analysis
4 Results
4.1 Introduction
5 Discussion
5.1 Timbral Issues
5.2 Timbral Issues Related to Headphones
5.3 5.1 Surround to Binaural
5.4 Issues with 5.1 Surround - Binaural Re-recording
5.5 Audio/Visual Scene
5.6 End User Manipulation
6 Conclusion
6.1 HRTFs
6.2 Externalisation
6.3 Re-recording Techniques
6.4 Suggested Further Research
7 Glossary
8 References
9 Bibliography
1 Introduction

In 2013 surround sound can be experienced not only in cinemas but also at home while watching movies (Anon, (F). n.d), playing video games (Anon, (M). n.d) and using the computer (Anon, (N). n.d). In some cases it can also be experienced in the car (Anon, (O). 2013). However, surround sound is not yet found in portable media. MP3 players, tablets, phones and other portable media devices that are commonly used with headphones do not have access to the same surround audio experience.

Personal digital media is used every day at home, at work, on trains and buses, in the car, and in many other places. A 2012 market survey found that 30.5% of the UK population owns a smartphone or tablet, and estimates show that by 2016 this could rise to 65.1% (Anon, (K). 2013). In a 2011 study on portable media usage, 558 smartphone users and 419 tablet users were asked where they would commonly use their devices. When asked about using devices on a short commute (“on a bus or tube” (Evans, S. 2011)), 51% of smartphone users and 26% of tablet users said they use their devices. 66% of smartphone users and 55% of tablet users said they would use their devices on longer commutes (“Long train journey or on plane” (Evans, S. 2011)), and 53% of smartphone users and 24% of tablet users stated they would use their devices while “waiting for a bus/train, in a queue” (Evans, S. 2011). The study also showed that 34% of smartphone users and 36% of tablet users regularly use their devices for ʻMusic stored on deviceʼ, 12% of smartphone users and 24% of tablet users regularly use their devices for ʻfilms stored on deviceʼ, and finally 20% of smartphone users and 28% of tablet users say they use their devices for ʻcasual gamingʼ. To have privacy while listening to content on such devices, users need to incorporate headphones into their set-up. A 34% increase in the sale of wireless headphones in 2012 (Anon, (L). n.d) could be an indication of the need for more immersive audio on our devices.

The soundtracks that accompany video on these devices are in most cases a two-channel mix derived from the original multi-channel (5.1, 7.1, etc.) mixes found on DVD and Blu-ray. Commonly this is done via Lt/Rt (Left total/Right total) down-mixing, for example: “When the mixer completes the final surround mix for a film, he processes the film through a Dolby DS4 processor...The result is a 2-channel print master ready to be converted into optical soundtrack” (Purcell, J. 2007 p.314). These mixes normally limit spatial perception to a horizontal field, only allowing listeners to hear differences from left to right. With the use of headphones, it is possible to achieve extremely realistic 3D sound, as an individual's own spatial perception can be recreated using the speakers covering his/her ears. This level of 3D realism in sound can be found in binaural media. Binaural recordings reproduce the spatial cues that the brain deciphers every day. As a result, the brain can be led to believe it is hearing sound from all directions while only listening to the two speakers of a pair of headphones. The dissertation explores the possibilities of producing a more realistic 2.0 stereo soundtrack to accompany visual content on portable media devices.

1.1 Primary research areas

The research has taken the following areas of personal media consumption and audio format into consideration. Surround audio in a consumer sense has been relatively successful in that several hardware developers such as Sony (Anon, (F). n.d), Samsung (Anon, (G). n.d), LG
(Anon, (H). n.d), Panasonic (Anon, (I). n.d) and Philips (Anon, (J). n.d) have their own 5.1 home cinema systems on the market. However, these companies will have to consider how many speakers consumers will want in their homes before they feel uncomfortable. There are currently headphones on the market with multiple drivers in each ear-cup, designed specifically to achieve surround sound in headphones. The Razer Tiamat 7.1 features "10 discrete drivers built-in to deliver the ultimate 7.1 Surround Sound experience" (Anon, (A). n.d). The manufacturer Razer sees it as one answer to achieving surround sound for headphones.

The research has explored the possibilities of achieving improved spatial characteristics for 2.0 stereo headphones, but has not explored any multi-channel headphone options. The dissertation focuses on changing end-user and content-creator technology as little as possible. For this reason, demanding a change in headphone usage has not been considered.

An area of study that is heavily focused on in the research is binaural audio. The term binaural audio is used when recordings have been made in a way that replicates how a human hears and localises sound. "The term binaural stereo is usually reserved for signals that have been recorded or processed to represent the amplitude and timing characteristics of the sound pressures at two human ears" (Rumsey, F. 2001 p.13). Binaural recordings are commonly made using a dummy head. The head features microphones where you would find eardrums, along with artificial ear canals and replicas of human pinnae (the part of the ear visible from outside the head). The two microphones placed inside the head record sound independently while being subjected to all the spatial cues that humans hear every day and use to localise sound. To experience the desired effect, binaural recordings must be listened to over headphones. This ensures that the signals captured on the left of the stereo image are only played to your left ear, and the right signals only to your right ear. When listening to binaural recordings you are listening to what the microphones on the respective side of the dummy head recorded, together with all the spatial cues captured along with it. Because of this you will be able to hear sound in 3D over a normal pair of stereo headphones, using only the basic spatial information that your brain deciphers every day.

Headphones will therefore always play a significant part in the results of binaural audio, as well as of spatially improved recordings and synthesis technology. The dissertation acknowledges that the frequency response of the headphones used will affect the end results of spatially improved audio. It also acknowledges that timbral issues may arise with different headphone choices. "[Are timbral issues brought about by the use of BRIR and HRTF data] any worse than the difference between some cheap headphones that you get with an mp3 player versus some nice Sennheisers" (R, Mason. Personal communication, 2013). The dissertation, however, does not discuss the consumer habits of headphone usage. Nor does it explore in great detail the implications of using different headphones with audio designed to achieve a greater spatial impression. This would have to be explored in a separate body of research, because there are too many factors involved and too many implications for implementing the results of this research. It is important to note that the research will not be looking into the use of binaural soundtracks for DVD/Blu-ray releases, and will not be looking to replace a 2.0 stereo mix on DVD/Blu-ray. Instead the dissertation focuses on digital versions of content designed for use on portable devices such as tablets and mp3 players, where there is a sole consumer present.

The dissertation does not discuss in any great detail the use of head-tracking, as this would require changes to end-user hardware. Instead the research focuses on
methods of achieving spatial awareness where changing or manipulating end-user hardware is kept to a minimum. There are issues surrounding the 'audio scene' and its relationship to what is seen on the screen, and with how the relationship between the two is broken when users move their heads. There are currently non-portable solutions that use head tracking to shift the audio scene with the user, such as the Realiser A8 developed by Smyth Research (Anon, (E). n.d). However, the research does not focus in any great detail on these areas.

The dissertation concludes with the factors found to contribute most significantly to a spatially enhanced audio track, which will need to be considered in order to achieve one. The dissertation explores methods of making content originally mixed for a multi-speaker setup available to portable media users. The aim is a form closer to the original mix than a 2.0 stereo down-mix, one that improves the end-user experience.

The use of ambisonic recordings is also taken into consideration during the research because of the format's flexibility in reproducing a 3D space over several formats. However, as this adds limitations and would require changes within the post-production process for the audio content, it is not considered in any great detail in this research.

The underlying rule throughout the dissertation is to make as little change as possible to the methods used in content creation and to end-user habits. There is a strong emphasis on how appropriate each proposed method of achieving a spatially improved soundtrack is for both content creators and end-users.

2 Literature Review

2.1 HRTFs, ITD, ILD, and EQ

As the space between each person's eardrums, the length of his/her ear canals, and the shape of the pinna differ, each person will have a unique set of head-related transfer functions, or HRTFs. HRTFs are a collection of measurements describing how a particular head hears sound arriving from sources in different directions. The information given in an HRTF shows how the brain-ear combination can decipher, among other things, which direction a sound is coming from. There is a problem with collecting HRTFs, however. No two humans have the same shaped head, so each person's HRTFs will be different. Recording HRTF values for binaural reproduction will therefore only create an accurate representation for the subject used in recording. This can cause problems for binaural reproduction, as the listener's brain will not be used to these alien sound cues. "People that have tried experiments where they are given another person's HRTF, by blocking their own pinnae and feeding signals directly to the ear canal, have found that their localising ability is markedly reduced. After a short time, though, they appear to adapt to the new information." (Rumsey, F. 2001 p. 26). These HRTFs are made up of several pieces of information, including interaural time difference (ITD), interaural level difference (ILD), changes in equalisation (EQ), and resonances from parts of the ear such as the concha and the ear canal.

The ITD and ILD are key measurements for shifting the location of a phantom audio image. However, on their own it is hard to achieve anything more than horizontal panning. Changing the ITD and ILD of sound cues can shift the phantom image of sound, as seen in conventional panning techniques. As well as the size of people's heads being inconsistent, there are some common problems when measuring time differences. While measuring localisation cues, time
difference in this example, attention needs to be paid to the frequencies that our ears, or a binaural microphone, will be hearing. Rumsey explains in his book Spatial Audio (2001) that the ability to localise sound depends on frequency. Humans are only sensitive to a difference in phase at low frequencies, generally no higher than 1 kHz (Rumsey, F. 2001). Rumsey goes on to state that "[Soundwaves] also give ambiguous information above about 700 Hz where the distance between the ears is equal to half a wavelength of the sound" (Rumsey, F. 2001 p. 23). This 'ambiguous' information arises because humans are less able to localise sound once half a wavelength of the sound is shorter than the distance between the two ears. If the ears are hearing different cycles of the waveform at the same time, the brain is less able to determine which ear is leading and which ear is lagging behind. This causes confusion about the location of sound sources.

2.2 EQ

A possible key factor in producing usable HRTF data for spatially enhanced audio is EQ. Research has shown that there are changes in EQ depending on the direction as well as the elevation of a sound source (Han, H.L. 1994). These changes rely on different parts of our outer ear (Helix, Fossa, Antihelix, and Concha). Sound arriving at the head is always greeted by an outer ear that does not change in everyday life. EQ could therefore be used to alter the perception of audio in the vertical and horizontal fields, and to assist in height and front/back perception.

H.L. Han explains (Han, H.L. 1994) what effect different parts of the ear have on altering the EQ of sound sources at different heights at a 90° azimuth. "...the concha is the most active part in funneling sound if the source is on ear level or up, and that the part between the concha and helix acts as an acoustic amplifier for sounds coming from below" (Han, H.L. 1994, p.19). He goes on to state that "shoulder reflections do not play a significant role in the high frequencies... any effects of such a reflection must be insignificant compared with the effects of the ear canal and the pinna" (Han, H.L. 1994, p. 19). H.L. Han's research goes on to show that the concha is the most important part of the ear when it comes to detecting early reflections, and is therefore important for ITD and ILD. "Covering part F [Fossa] with eardown has very little effect on the first arriving sounds, while covering the concha C makes the double peak [initial sound and early reflections shown on a graph] fuse into one. Therefore only the concha is responsible for splitting up the first arriving sound" (Han, H.L. 1994, p. 19). These results show that as sound arrives at the ear from different directions, the EQ of the sound changes, and so does the perception of location. Sound arriving from different heights is affected by different parts of the outer ear. In the same way, sound arriving from different directions at 0° elevation should also be affected by the outer ear. The outer ear looks clearly different when viewed from 30°/-30° and 110°/-110°, the common locations of the L/R and Ls/Rs speakers in a 5.1 speaker set-up. Differences should therefore be expected in the EQ response of sound sources behind and in front of the head. If these differences in EQ are experienced when listening to sound cues in front of and behind the head, it should be possible to use this EQ information to aid the reproduction of sound at different locations relative to the head.

2.3 Concha Resonance/Externalisation

As the research focuses on a situation where the listener is mainly using headphones, the headphones used will play a big part in the success of the final reproduction. A known issue with headphone use is the sense of
externalisation, or more importantly the lack of it. "The so-called concha resonance (that created by the main cavity in the centre of the pinna) is believed to be responsible for creating a sense of externalization" (Rumsey, F. 2001 p.26). The majority of headphones supplied with devices such as iPods/mp3 players, phones and tablets are in-ear designs, meaning that the driver sits in the concha rather than over the entire ear. With these designs, sound will not resonate in the concha before entering the ear canal, and this sense of externalisation will not occur. As a result, listening to audio using such headphones can make sound sources seem to come from inside the head, an undesirable characteristic.

2.4 Audio/Visual Scene

One issue surrounding the use of binaural audio for visual applications is the relationship between two scenes. The 'audio scene' is the audio heard over headphones, and the 'visual scene' is the visual content seen on the screen. While watching visual media with loudspeakers, the relationship between the two scenes is locked, assuming the speakers are not moving. With headphone use this is not the case. Moving the head 90° to the left while listening to audio over headphones will break this audio-visual connection, meaning the location of the sound cues will not match the location of the relevant visual cues on screen.

While using an iPod this may not be as much of an issue. There is still no 'locked' audio-visual relationship. However, because this is a personal experience, users tend to hold the device in front of them and keep a more fixed relationship to the screen. Technology does exist today that offers some assistance with the breakdown of audio/visual scenes in modern media. Companies such as Vuzix manufacture 'glasses' that feature a miniature display behind each lens, meaning you can watch video inside the glasses with the impression of a very large screen (Anon, (B). n.d). The glasses also feature ear buds close to the ears, locking the audio and visual scenes together. Users no longer have the freedom to explore their surroundings using this technology, but it may be a solution to pairing the audio and visual back together. However, consideration will have to be given to whether users will want the audio and visual scenes to be locked together in this format. In the real world the audio and visual scenes are locked together, but humans also have the freedom to explore. In everyday life humans can move their heads slightly to better hear a sound cue, and are therefore able to locate a given sound. Using technology such as the Vuzix glasses removes this functionality and possibly the ability to localise sound.

2.5 Timbral Issues - A possible result of tricking the brain

Much binaural reproduction, and much work on improving spatial awareness of multi-channel audio over headphones, is based on understanding and manipulating how the brain-ear combination localises sound. Once this is understood, the location cue for a chosen sound source can be manipulated, resulting in the creation of a phantom image. This is a similar process to stereo panning using two speakers. Although sound is only coming from two locations, phantom images can be produced in between the speakers due to differences in time and intensity.

This manipulation of sound cues will most likely bring about some challenges. Timbral issues can be experienced when sound is manipulated to create a sense of space for six discrete channels of audio on a single pair of headphones. Research carried out by BBC R&D for the radio play 'Private Peaceful', produced in binaural, used BRIR to
measure a 3D space and apply it to multi-channel audio (Pike, C. 2013). When pre-user implemented HRTF data is used, timbral issues will most likely be encountered. These arise from differences between the equipment used to capture the HRTF data, the content manipulation stage, and the HRTF data of the end user. "Badly implemented HRTF data can give rise to poor timbral quality" (Rumsey, F. 2011 p.1). As each end user's pinnae and brain-ear combination differ, timbral issues will affect the audio quality and could possibly render the audio unusable.

2.6 Re-recording 5.1 in binaural

One approach to creating a spatially improved soundtrack is to place a binaural head in the centre of a surround sound speaker arrangement and record the output of a surround mix. It is possible to place sound in a 3D environment through the use of multi-channel speaker set-ups, and sound can be recorded in 3D using microphones such as binaural heads. Putting the two together therefore seems a natural thing to do. Using a space to manipulate a sound, which is then re-recorded, is also nothing new. It has been done through the use of echo chambers in recording studios for many years. An echo chamber was famously used by Irving Townsend during the post-production process of Miles Davis's 1959 album 'Kind of Blue'. "[the effect of the echo chamber on Kind of Blue is] just a bit of sweetening. At 30th Street, a line was run from the mixing console down into a low-ceilinged, concrete basement room - about twelve by fifteen feet in size - where we set up a speaker and a good omnidirectional microphone." (Kahn, A, 2002, p.102). Major motion picture companies do not seem to be manipulating entire mixes using similar techniques. However, this approach of using a space to influence the sound of a recording has been experimented with, with varying degrees of success.

Jorge Miramontes presents an example of this by recording the 5.1 soundtrack from a DVD with a pair of in-ear binaural microphones (Miramontes, J. n.d). Listening trials would be needed to establish whether the two film excerpts recorded in this fashion achieve a sense of space in the opinion of an average listener. As mentioned previously, when using binaural microphones all audio being recorded is subjected to one set of HRTFs, and the outcome may not translate well to every listener. It is worth pointing out, though, that this is the case with all methods of improving spatial awareness in soundtracks. This binaural re-recording experiment could be repeated with equipment more favoured in binaural study, such as the Neumann KU 100 or the KEMAR, and a professionally calibrated speaker set-up. If so, the results might offer a solution for achieving 5.1 reproduction over headphones.

2.7 Bit-depth Conversion

In today's media content environment everything is designed to be compatible with portable media, which means small file sizes. This has been the case with audio for a long time. Since the use of audio on computers and the adoption of mp3 and similar formats, audio file sizes have become very small. However, as audio file size is reduced, so is quality, through the introduction of distortion and other compression artefacts.

As discussed earlier in the dissertation, and by Francis Rumsey in his paper 'Whose Head is it Anyway?' (Rumsey, F. 2011), timbral quality is key to achieving successful binaural reproduction. Rumsey states that "In practice, though, implementations [of binaural synthesis] can be far from ideal...the reasons for this are numerous, but can include the lack of
variation due to head movement (which help to resolve front-back dilemmas), distortion in the spectral cues, and poor headphone equalization" (Rumsey, F. 2011 p.1). Reducing the file size of media damages its quality and introduces negative implications for spatial audio. Some consumers might be happy to listen to audio at such qualities, but it could be damaging to spatially enhanced audio. Binaural synthesis and other techniques for improving spatial awareness in audio attempt to replicate something that is natural: "of course, all audio listening is really binaural, as long as the listener has two active ears" (Rumsey, F. 2011). So when anything unnatural is introduced, compression artefacts for example, the binaural effect can completely disappear.

2.8 End User

In recent years there has been growing interest in exploring further possibilities of audio in everyday media consumption, including making surround audio available to the masses. The gaming industry has adopted technology that allows users to hear in surround via external processors and, in some cases, headsets featuring more than the conventional two drivers. Turtle Beach and Razer are two companies specialising in audio for video games. Their product lines boast the ability to experience surround sound over stereo headsets, and include headsets featuring multiple speakers. Turtle Beach supply the gaming market with hardware processors such as the DSS2, which takes an optical audio input from an Xbox 360, PS3 or PC and outputs stereo on a 3.5 mm TRS jack (Anon, (P). n.d). The unit features Dolby Surround Sound processing to give the user a surround sound experience. Both Razer and Turtle Beach also offer headsets with multiple drivers in each ear-cup, such as the Turtle Beach Z6a (Anon, (Q). n.d) and the Razer Tiamat 7.1 (Anon, (A). n.d). These headsets allow surround signals to be fed to separate drivers independently, offering a different surround experience over headphones from that of the processors. Both these technologies are designed with gaming in mind, enhancing the user experience by immersing users within the gaming environment. The rise of this technology in recent years shows evidence of a desire for a more immersive, surrounding experience in modern content consumption. These products are also not cheap. Prices range from $79.99 RRP for a Dolby Surround processor (Anon, (P). n.d) up to $179.99 RRP for a top-end Razer headset (Anon, (A). n.d), prices that in 2012 surpassed the gaming units they are intended for use with.

More convenient surround sound options are also appearing in other consumer environments. MPEG Surround was released in 2006 and allows multi-channel (5.1 and 7.1) audio to be compressed to bit rates as low as 64 kb/s. This allows flexibility for surround sound on devices with smaller storage capabilities, such as iPods and tablets (Anon, (C). n.d). The compressed file will also work anywhere AAC, HE-AAC, and MPEG Layer 2 codecs are used (Anon, (C). n.d), meaning there should not be many limitations on its use in today's consumer market. More interesting, though, is the ability to play MPEG Surround files over stereo systems. The technology is completely backwards compatible: it makes its own stereo or mono down-mix of the multi-channel audio when used without the MPEG Surround decoder.

Another piece of technology, aimed more towards audio professionals, is the Focusrite VRM (Virtual Reference Monitoring) Box. The VRM Box gives users the experience of "listening to your mix on multiple sets of speakers, in different rooms, just using headphones" (Anon, (D). n.d). The technology works through the use of convolution reverb to put a piece of audio inside a virtual space. The commercial aim of the product is to help mixing engineers
monitor how their mix will translate to different spaces and over different monitors, but it may also have further use in this research. Convolution reverb will be discussed in the dissertation as a means of moving audio into a different location. While studying devices such as the Turtle Beach DSS2 (mentioned above), one key aspect found to be missing is the sense of space, or externalisation. Externalisation is discussed throughout the dissertation, and convolution reverb, as found in the VRM Box, is one approach to achieving it. For this technology to become directly relevant to the dissertation, it will have to be realised at a more commercially available level, rather than being aimed primarily at audio professionals.

3. Methodology

3.1 Introduction

The aim of this research was to investigate ways of achieving a realistic spatial 2.0 soundtrack to accompany video. As of 2013 there has been very recent research into this area (Pike, C. 2012), although major motion picture production companies do not seem to have changed the way in which they work with stereo soundtracks in recent years. The majority of the research for the dissertation has been qualitative in nature. It was conducted by analysing past research documents and exploring the findings with current researchers in this area. The only challenge the dissertation faced was not being able to discuss the research with industry professionals. The primary data was collected by analysing past research papers, which allowed the dissertation to develop hypotheses. However, these hypotheses were backed up only by the opinions and theories of past research. Although in most cases those opinions and theories are supported by relevant data and references, the dissertation required discussion with industry professionals and, in some cases, with the individuals who carried out the research on which the hypotheses are based.

3.2 Summary

This research paper has been conducted using qualitative research methods, including analysis of past research into 3D sound, multi-channel audio, binaural audio, and related areas of study. Interviews have also been conducted with researchers currently working in this area of study, and with those who have published papers within it. The research started by analysing past work on 3D audio over stereo
headphones and on surround sound. The dissertation has been able to analyse relevant topics of study and contribute them to the research as topics that would later be discussed during interviews. Early research into binaural audio allowed the dissertation to explore the possibilities of reproducing multi-channel audio in a binaural environment. Through knowledge acquired from past literature by researchers such as Rumsey (Rumsey, F. 2001), the dissertation was able to construct theory on the subject.

Using interviews, the dissertation has been able to discuss and, more importantly, evolve theories using input from current researchers. These interviews have helped construct more solid research and, in some places, validate the text analysis. They have also aided the text analysis by allowing a second opinion, or a different point of view, to influence the knowledge supporting the theory. The research has also been able to establish how the results will be validated through appropriateness. Through data analysis and interviews, the dissertation has been able to identify key aspects of making the results and theories discussed appropriate for the consumer market. The dissertation has noticed trends in what consumers will be comfortable adapting in their listening/viewing experience of portable digital media, and therefore what future developments in this research will have to abide by. Using this understanding of appropriateness in the audio/visual portable digital market, the dissertation will be able to validate any results found.

3.3 Extended Methodology

The dissertation has been constructed using grounded theory. Grounded theory is a method of qualitative research in which a theory is constructed through data analysis, followed by testing hypotheses against further research and/or theory. "The purpose of grounded theory is theory construction, rather than description or application of existing theories" (Silverman, D. 2011 p. 67). As the dissertation is based around grounded theory, it has benefited from not starting out with a defined hypothesis. Instead it has been able to develop one over time through data analysis and theory construction.

The dissertation started as quite a focused piece of work, developing out of a single research topic. Through the methods of grounded theory it grew into a wider area of study with several sub-topics. Because of this, the research was able to remain very focused throughout the length of the paper. "It makes sense to begin by trying to develop a detailed analysis of a very limited amount of data (intensive analysis). This should provide a good initial grasp of the phenomena with which you are concerned" (Silverman, D. 2011 p. 62). This meant that the whole collection of data did not need to be analysed, and therefore time was saved during data collection. This does, however, impose some issues on the data collection for the research.

As the research started by analysing a small collection of research papers, early hypotheses were drawn and interviews were conducted based on these hypotheses. The majority of research material acquired for the dissertation was chosen because of issues or topics that arose during analysis of research carried out by a very small number of people. This can in turn bias the research towards the opinions of a very small number of people. Coolican states that "most qualitative workers realise and admit that what their report is one version among many that are possible" (Coolican, H. 2008 p. 563). It was therefore important to be careful during the early stages of the research, when the dissertation was examining theories of 5.1 conversion for binaural audio. Problems in these theories were pointed out very early on in the research, but it was important not to let the outcome of this theory establish a route for the rest
of the research. Instead, that theory was allowed to reach a conclusion while the research carried on with other topics that influenced the overall research, such as ambisonics and wavefield synthesis. These other research topics were kept independent of each other's results, as long as they were not directly affected by those results. This meant that the research could continue on track without allowing one avenue of thought to direct the entire dissertation.

As the research started by conducting data analysis, several hypotheses that had emerged needed to be discussed. Interviews were conducted to aid in the evaluation of existing hypotheses, allowing them to develop along with continuing knowledge acquisition. By using this workflow the dissertation has evolved theories while being influenced by other researchers' work through comment and discussion. This method of research was similar to that of grounded theory. Silverman (Silverman, D. 2011) explains that grounded research involves simultaneous coding, memo writing, data analysis and theoretical sampling, while constructing and adapting theory. It is a method that has allowed the dissertation to develop and adapt several hypotheses.

The type of interview the research involves can be most closely related to constructionism in a semi-structured interview. A set collection of questions or topics that needed to be discussed was introduced, while the interviewees were left enough space to express their own opinions in a descriptive manner, as opposed to a very 'yes-or-no' fashion. Having interviews in the research allows other researchers' theories and results to be taken into account. As the research was largely influenced by grounded theory, this is both a positive and a negative. It is important to note that "What an interview produces is a particular representation or account of an individual's views or opinions" (Silverman, D. 2011. p.168).

Care had to be taken when evaluating the interviewees' contributions to the research. By interviewing a researcher on a chosen topic, for example Chris Pike on the conversion of 5.1 to binaural, the research is asking what he thinks about the topic and what his research has produced. During this interview Chris Pike referred to an article which backed up his results on the subject, allowing the research to take a referenced opinion into consideration. However, the validity of other researchers' work had to be considered before that work could influence the dissertation and further research. It was important to state that results presented by other researchers that were not backed up by theory were based on personal opinion.

3.4 Method of Research Analysis

The analysis for this paper shows signs of theoretical sampling, in that the research has constantly been collecting, coding and analysing results while drawing conclusions along the way. The research did not begin with predefined deadlines, or with separate time frames for data collection and analysis. Because of this, analysis and data collection took place simultaneously. Silverman explains grounded theory as "a continuous movement between data, memos and theory so that data analysis is theoretically based and theory is grounded in data" (Silverman, D. 2011 p. 71). Performing analysis this way has allowed the dissertation to analyse the data using comments and further data collected through interviews and discussions of hypotheses. This meant that, after analysis was complete, all of the conclusions and results had been backed up by tested theories. Every conclusion reached in the research has been discussed, or contains elements of discussion, with other researchers in the respective areas.
4. Results

4.1 Introduction

During the dissertation several means and techniques of creating a realistic 2.0 soundtrack for portable media have been explored, and this section outlines the results found in this research.

Below is a table including all methods of creating a realistic 2.0 soundtrack explored in the research, as well as all technical requirements of using these methods (see fig. 1). The table details what action is needed, if any, in order to further explore any particular method.
The dissertation has explored many of the most discussed methods of achieving improved spatial awareness over headphones, and conclusions have been drawn. Many of these topics are examined in the discussion section, where the points raised in the table above are explored in greater detail.

5 Discussion

5.1 Timbral Issues

In January 2012 BBC R&D worked together with BBC Radio 4 to produce a binaural production of Private Peaceful, the book by Michael Morpurgo (Brun, R. 2012). The 88-minute dramatisation featured a reproduction of a 5.1 speaker system and had four variations. At the start of each variation the listener would hear a series of test signals, allowing them to choose which version gave them the best spatial experience. By doing this, BBC R&D accepted that the success of the binaural reproduction would vary between listeners, and therefore provided different mixes based on different sets of HRTF data. The release of Private Peaceful had an accompanying survey which all listeners were asked to complete. It asked how successful the binaural reproduction had been for each listener and which version (1-4) the listener thought was most successful.

During an interview with Chris Pike from BBC R&D in September 2012, Pike stated that "you may get good spatial impression but timbral coloration is often an issue" (Pike, C. Personal Communication, 2012). Timbral colouration is mentioned in a large amount of spatial enhancement research. It is sometimes seen as the result of misuse or an insufficient amount of HRTF data when reproducing binaural audio, or of the end-user simply not responding well to the collected HRTF data. Francis Rumsey states in the 2011 article 'Whose head is it anyway?' that "badly implemented HRTFs can give rise to poor timbral quality, poor externalisation, and a host of other unwanted results" (Rumsey, F. 2011. p.1). Getting the HRTF data correct is clearly key to making the final product a success, and making the HRTF data as extensive as possible may leave less room for errors such as timbral issues. The HRTFs used for Private Peaceful were designed by
measuring impulse responses in a reverberant room, in order to capture a sense of space. However, the result is not very external and there are obvious timbral issues, as pointed out by Pike. To achieve a sense of externalisation while retaining the original timbral quality of the audio, it would appear that more extensive work on producing HRTFs is needed. The HRIR (head-related impulse response) data used for Private Peaceful achieves good localisation in the horizontal plane. There is possibly a need for more attention to EQ data, and to what can be learnt from the role of the concha in binaural reproduction.

Juha Merimaa (Merimaa, J. 2010), of Sennheiser Research Laboratories in California, discusses using HRTF filters and EQ to reduce timbral issues in his paper 'Modification of HRTF Filters to Reduce Timbral Effects in Binaural Synthesis, Part 2: Individual HRTFs' (2010). His research found that modifying HRTF filters to reduce timbral issues did not affect the spatial localisation previously achieved with the data, when tested on a panel of listeners. This shows that there are ways of reducing timbral effects in audio that has been processed with HRTF data, but it does mean further EQ manipulation of the audio. If this route is to be explored further, researchers will have to accept that the audio is being heavily manipulated to achieve a greater sense of spatial awareness. This further manipulation will cause irreversible changes to the audio, something content creators may not be happy with. Consideration will have to be given to how much manipulation is appropriate and to what extent, if any, this will affect the end user's experience.

It is important to consider the room in which the BRIR and HRTF data has been collected, as different rooms will influence the end results. The dissertation has focused on the differences between collecting HRTF and BRIR data and re-recording surround audio using binaural recording techniques (this topic is discussed in more detail later in the discussion). One immediately apparent difference between these techniques is the sense of externalisation present in the end results. Later discussion of the subject has suggested that this outcome is mainly a result of the space in which the HRTF data, or the binaural re-recording, was captured. "[Experiencing externalisation] depends how reverberant the room is that the impulse response, or recording, was made in" (Mason, R. Personal communication, 2013). Dr Mason goes on to explain that achieving a usable sense of space can be "down to the fact that it's not your own pinna, and you're having to use someone else's ears effectively" (Mason, R. Personal communication, 2013). This last point applies to many research topics. When recording a set of HRTF data, only a limited number of measurements can be taken for distribution, and end-users will have to find the best results for themselves. The best HRTF data for any individual will be the information collected from their own pinnae, which is not something that content creators for mobile applications are currently doing. Because of this, timbral issues may be unavoidable when using non-personal HRTF data, or when distributing any audio that has already been affected by spatial manipulation. The most feasible route to improving spatial awareness in audio may be to explore head tracking or other methods of collecting HRTF data at the user end.

5.2 Timbral Issues Related to Headphones

The headphones used by consumers will inevitably have an impact on the end results. One issue surrounding headphone use is the wide range in quality of consumer-level headphones. Many mp3 players and tablets are traditionally
supplied with low-budget earphones, and these can cause problems for spatially enhanced audio.

Ideal listening conditions will most likely be experienced with headphones designed and calibrated to give as flat a frequency response as possible, in order to reduce colouration of the audio the user is listening to. In most circumstances this has not seemed enough of a problem for end-users to invest in headphones that let them hear audio exactly as the creator of the content intended. Instead they continue to use bundled headphones or, in some cases, invest in headphones endorsed and branded by certain artists.

The dissertation has previously discussed the timbral effects present when using BRIR and HRTF data to create spatially improved audio, the techniques used by Chris Pike and BBC R&D. The results exhibited timbral issues, and therefore this method may not yet be a successful way of creating spatially enhanced audio for headphones. However, timbral issues are also introduced by the choice of headphones. "[Are timbral issues brought about by the use of BRIR and HRTF data] any worse than the difference between some cheap headphones that you get with an mp3 player versus some nice Sennheisers" (Mason, R. Personal communication, 2013).

The point raised by Dr Mason in this interview is that even if a successful reproduction of audio in a more realistic 3D space is achieved, the effect could be damaged by the end user's choice of headphones. Cheaper headphones, and indeed more expensive headphones with EQ colouration, will influence how the audio is heard by the end user. This paper discusses research into constructing HRTF and BRIR data through measuring impulse responses and EQ, among other methods. Given that research, it is clear that this data would need to remain intact for a successful reproduction of the spatial quality for the end-user. Headphones that add colouration to the audio will make disruptive changes, which can spoil any chance of experiencing an improved spatial soundtrack.

5.3 5.1 Surround to Binaural

The dissertation has found that using HRTFs is currently the most favoured method of improving spatial awareness in audio (Pike, C. 2012), (Rumsey, F. 2011), (Merimaa, J. 2010), but there are other methods. One less explored method involves re-recording a surround sound speaker set-up with a binaural microphone set-up, producing a recording that should simulate the experience of being surrounded by audio. By re-recording the audio in this environment, all of the reflections that help to construct the space in which 'the listener' (in this case a binaural microphone) is placed will be recorded along with the audio itself. These recordings capture the time and intensity differences experienced by the listener. They also capture any EQ differences caused by the positioning of the speakers relative to the height of the listener's ears, and any resonance of the audio in the outer or inner ear of the listener.

Using re-recording techniques like this loses control over the space of the room: once the audio is recorded, the size of the room cannot be changed. The audio is also tied to one set of HRTF data, one room and a fixed position, so head tracking would not be an option for audio re-recorded in this manner. As re-recording techniques use a single impersonal set of HRTFs, users will likely find the recordings confusing at first, lowering the chances of success. Having one output from this technique means the end user will not be able to choose which binaural
recording works best for them, as was possible with the Private Peaceful radio drama produced by BBC R&D and BBC Radio 4. A single binaural re-recording of multi-channel audio avoids distributing multiple versions of the audio, which would not fit onto formats such as DVD and Blu-ray. However, it also limits how well the audio is received by end users, and therefore will not work as a simple solution to the problem.

These techniques are, however, seen in other areas of audio production. Walter Murch, sound designer and re-recording mixer for films such as Apocalypse Now (1979) and the Godfather trilogy (1972, 1974, 1990), has discussed his use of 'worldizing'. In this technique, sounds or lines of dialogue are played over a loudspeaker in a desired location and then re-recorded for use in post production (Murch, W. n.d). The re-recorded lines of dialogue were then balanced against the original recordings until the desired effect was produced. This example shows how the space required for the audio could not be designed in post production, and therefore needed to be sourced and recorded. This example dates back to a time when digital reverb was not at everyone's disposal, and the only way to replicate a space was to record it. The technique will not in itself solve spatial awareness for multi-channel audio, but it does increase the sense of externalisation in the same way that convolution reverb will. A further example has been discussed previously in this paper: the work of Irving Townsend on the 1959 album Kind of Blue by Miles Davis. Once the album had been mixed, every component of the mix was sent to an echo chamber (Kahn, A, 2002, p.102). This example shows how an entire body of commercial work was processed through re-recording techniques, and the longevity of this piece of work suggests that the process does not detract from the usability of the final product. Again, it should be mentioned that this technique was used on a mono sound source. Even though the re-recording technique can achieve reverb, and therefore assist in externalisation, it does not achieve spatial awareness of multiple sound sources or solve the issues of impersonal HRTFs.

5.4 Issues with 5.1 Surround - Binaural re-recording

It is, however, important to point out that using re-recording techniques with multi-channel audio is a more complex method of processing than that used on Kind of Blue, where the mix was only just experimenting with two-channel stereo.

Consideration will have to be given to the equipment used in the re-recording, what limitations this equipment will have, and what effects it will have on the audio, for example colouration or loss of frequencies. As previously discussed, using such recording methods for multi-channel audio involves binaural microphones, so choices will have to be made as to which speaker arrangement and which microphone are used. As previously mentioned, having an impersonal set of HRTF data with binaural recordings can introduce timbral and spatial problems for users. Decisions would also have to be made about the number of binaural recordings produced for any media, and how these mixes will be introduced into the market. Content creators and distributors would have to consider whether an optional 'binaural mix' on a DVD, Blu-ray or download would be well received by the end-user market.

5.5 Audio Visual Scene

As discussed earlier when looking at technology such as the Vuzix Wrap 1200 (Anon, (B). n.d), it is important to consider the relationship between audio and video
in consumer technology. This relationship differs from everyday life, where audio and visual scenes are normally locked together. For example, as a car drives from point A to point B, the visual cue (the car) and the sound cue (the sound produced by the car) move together and therefore have a fixed relationship. The challenge for the audio/visual scene relationship in binaural synthesis, and indeed in all headphone use, is that the audio scene is fixed to the user's head and the visual scene is not. Turning away from the visual source on display will therefore have no effect on the audio that accompanies it, which removes some very basic human functionality.

One solution to audio-visual scene problems is head tracking. "Head tracking is a means by which the head movements of the listener can be monitored by the replay system" (Rumsey, F. 2001 p. 72). The replay system has to track where the user's head is and in which direction it is facing, and modify the HRTFs in real time. Head tracking may become very useful in the future of 3D audio, as it can simulate normal physical actions performed in everyday hearing. For example, when a sound is heard from a direction that needs to be investigated, humans will turn their heads, sometimes only slightly, to get a better audio image of the sound source. With head tracking this is possible, because the centre of the audio scene becomes controllable. This is therefore a downside to using spatially enhanced recordings without head tracking. If a sound cue is away from the centre of a mix in spatially enhanced audio without head tracking, the sound source cannot be moved towards the centre of the audio scene. It will remain fixed in whatever location it was placed in during the mix. But is this really much of a problem, given that the creators of the content already control where sound is placed in multi-channel loudspeaker set-ups? Whether users often feel the need to explore their audio surroundings when listening to a multi-channel speaker set-up at the movies or in home theatre is not discussed in the dissertation. The discussion of the audio visual scene is based on a 'default position', in which users of mobile devices hold their device in front of them, in their line of sight, for a pleasant and comfortable viewing experience. The same 'default' position argument is used when discussing speaker set-ups, in which users sit within the speaker arrangement and in front of a screen.

Head tracking also requires somewhat more sophisticated hardware than is found in smartphones and tablets in 2013. This is because of the hardware needed to track the head and alter HRTF data in real time; see the Smyth Research Realiser A8 for example (Anon, (E). n.d). In his book Spatial Audio, Francis Rumsey describes experiments carried out at the Institut für Rundfunktechnik in 1999. He notes that using head tracking for the localisation of sound improved the usability of binaural audio. "Front-back reversals were virtually eliminated. Even more interestingly, they found that substituting the dummy head for a simple sphere microphone (no pinnae) produced very similar results, suggesting that the additional spectral cues provided by the pinnae were of relatively low importance compared with the effect of head rotation" (Rumsey, F. 2001 p. 73). These results suggest that head movement matters more than the spectral cues of the human head alone. To achieve a full reproduction of 3D audio over headphones, head tracking may therefore be hard to ignore.

5.6 End user manipulation

It has become apparent that delivering ready-to-go spatially improved multi-channel audio raises many issues. Early in the research, head tracking was ruled out as a research topic and as a possible method of improving 3D spatial awareness for end-users, because end-user technology would have to be adapted. However, it has become apparent that this may be the only way to avoid the issues involved in manipulating audio before it reaches the user, as discussed previously.

Smyth Research has developed a hardware unit designed to measure a listener's HRTF data and store it for use in media playback (Anon, (E). n.d). This technology lets users avoid the previously discussed issues of using impersonal HRTF data. The technology also accommodates head movement during viewing by "monitoring the position of the listener's head every five milliseconds" (Anon, (E). n.d). This combination of personal HRTF data and head tracking could then be the answer for improving spatial awareness in audio, but it does not solve the problem for portable media. It may, however, be possible to realise similar results on portable media. Most popular mobile phones in 2013, such as the Apple iPhone, feature accelerometer technology (Anon, (R). 2013). The accelerometer lets the phone know its tilt relative to gravitational pull and, with calibration, its spatial relationship to the user. Methods of head tracking are also making their way onto mobile platforms (Lim, J. 2011). If this technology could be calibrated to the position of the user, it would theoretically know its positional relationship to the user's head. With the addition of personalised HRTF data obtained through a calibration process, it could then manipulate audio to achieve an increased sense of spatial awareness. This head tracking technology, combined with the convolution reverb technology previously discussed in relation to the VRM Box (Anon, (D). n.d), may be the most feasible and appropriate method of achieving spatially enhanced audio on a portable media device.
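The head-tracked rendering proposed above can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the method used by the Smyth Research Realiser A8 or any other product. The HRIR set (`TOY_HRIRS`) is a hypothetical toy: two-tap filters at four horizontal azimuths stand in for measured, ideally personalised, data. The filter is chosen by nearest neighbour rather than interpolated, and elevation and room response are ignored. The 48 kHz sample rate is also an assumption; the 5 ms update period is the figure quoted for the Realiser. Head yaw is assumed to come from a head tracker, because an accelerometer on its own measures tilt relative to gravity and cannot report rotation about the vertical axis.

```python
from typing import Dict, List, Tuple

# Assumed playback rate; 5 ms is the head-tracking update period quoted for the Realiser A8.
SAMPLE_RATE = 48_000
UPDATE_PERIOD_MS = 5
BLOCK_SIZE = SAMPLE_RATE * UPDATE_PERIOD_MS // 1000  # samples rendered per head-tracker update

# Hypothetical toy HRIR set: azimuth in degrees -> (left-ear IR, right-ear IR).
# Convention: positive azimuth = source to the listener's left.
# Real data would be measured (ideally from the listener's own ears) and much longer.
HrirPair = Tuple[List[float], List[float]]
TOY_HRIRS: Dict[int, HrirPair] = {
    0: ([1.0, 0.0], [1.0, 0.0]),     # straight ahead: identical ears
    90: ([1.0, 0.0], [0.0, 0.5]),    # left: right (far) ear one sample later and quieter
    -90: ([0.0, 0.5], [1.0, 0.0]),   # right: left (far) ear one sample later and quieter
    180: ([0.7, 0.0], [0.7, 0.0]),   # behind: toy attenuation only
}


def wrap_degrees(angle: float) -> float:
    """Wrap an angle into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def relative_azimuth(source_az: float, head_yaw: float) -> float:
    """Direction of a world-fixed source (e.g. on screen) relative to the nose.

    head_yaw uses the same convention as azimuth: positive = head turned left.
    """
    return wrap_degrees(source_az - head_yaw)


def nearest_hrir(hrirs: Dict[int, HrirPair], azimuth: float) -> HrirPair:
    """Return the measured pair closest in angle (no interpolation)."""
    closest = min(hrirs, key=lambda az: abs(wrap_degrees(az - azimuth)))
    return hrirs[closest]


def convolve(signal: List[float], ir: List[float]) -> List[float]:
    """Direct linear convolution; output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(ir):
            out[n + k] += x * h
    return out


def render_block(block: List[float], source_az: float, head_yaw: float,
                 hrirs: Dict[int, HrirPair]) -> Tuple[List[float], List[float]]:
    """Render one mono block binaurally for the current head orientation.

    The HRIR pair is re-selected every block, so the source stays fixed in the
    world rather than moving with the head. Overlap-adding each block's tail
    into the next, and crossfading when the filter changes, are omitted.
    """
    left_ir, right_ir = nearest_hrir(hrirs, relative_azimuth(source_az, head_yaw))
    return convolve(block, left_ir), convolve(block, right_ir)


if __name__ == "__main__":
    # Source at the centre of the screen; listener has turned 90 degrees to the left.
    left, right = render_block([1.0, 0.0], source_az=0.0, head_yaw=90.0, hrirs=TOY_HRIRS)
    print("block size:", BLOCK_SIZE)
    print("left ear:", left)    # quieter and one sample later than the right ear
    print("right ear:", right)
```

Turning the head 90 degrees to the left moves a source fixed at the centre of the screen to the listener's right, so the right ear receives the stronger, earlier signal. A practical renderer would interpolate between measured directions and crossfade when the filter changes. It could also apply a room impulse response as a further convolution stage, in the spirit of the convolution reverb used by the VRM Box.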
6 Conclusion

The following concludes the key areas of discussion found to be most relevant. Each subject featured in the conclusion has been previously discussed in the dissertation. The conclusion also identifies what have been seen as the most appropriate areas of further research.

6.1 HRTFs

The majority of methods employing a one-size-fits-all solution will without doubt face problems at an end-user level. The dissertation has focused throughout on improving spatial awareness for multi-channel audio on portable media with the end user in mind. Methods such as measuring HRTF data or binaural re-recording may achieve success on a theoretical level, but it has been shown that this may not be the case for every end-user. Several pieces of research discussed here have commented that timbral issues are the main problem in achieving greater spatial awareness (Pike, C. Personal Communication. 2012). The solution to creating better spatial awareness for end-users will rely heavily on the appropriateness of whatever technology is decided on. If timbral issues remain a worry even after creating four separate mixes (Pike, C. Personal Communication. 2012), implementing HRTFs prior to distribution may not be appropriate for the end-user.

The dissertation has concluded that however extensive the HRTF data collected at a content creation level, it may never be close enough to the personal HRTF data that end users can obtain through technology such as the Smyth Research Realiser A8 (Anon, (E). n.d). The most appropriate method of making multi-channel audio accessible to end-users over headphones would therefore be to distribute multi-channel audio for decoding at an end-user level.

It is also worth noting that badly implemented HRTFs could quickly prevent technology that strives to improve spatial awareness over headphones from being accepted. If end-users do not experience an improvement in their spatial awareness of audio from the first use, as the technology would advertise, then the credibility of the technology would be greatly damaged, and end users could lose faith in achieving greater spatial awareness.

6.2 Externalisation

Methods of achieving externalisation through technology such as the VRM Box from Focusrite have been discussed. However, as with many aspects of spatial audio, what works for one individual may not work for another. As explained by Dr Russell Mason, "for some listeners it can work fine, as in people get externalisation and they can imagine it being there, wherever the loudspeakers are arranged, without head-tracking. [For] other listeners it never seems to work, they never seem to get away from the fact that they have headphones on, so therefore it must be inside my head... People are used to listening on headphones and getting [audio] inside their head so therefore they can't believe it's anything different" (Mason, R. Personal communication, 2013). Any results will always depend on acceptance from end-users. As Dr Mason stated, some users will simply never accept externalisation while listening to audio over headphones.

6.3 Re-recording Techniques

A binaural re-recording of multi-channel audio would create a 3D image, and
21
depending on the room would capture a personalised HRTF data for the end-user
good sense of externalisation, but through a similar calibration set-up seen
unfortunately the same issues surrounding with the Smyth Research Realiser A8, and
a one-size-fits-all HRTF recording still be able to track head movements of the
apply with re-recording. Many end-users user, together with room and monitor
will undoubtedly experience issues characteristics collected through
because "its not your own pinna, and you convolution reverb and BRIR, similar to
are having to use someone else's ears what is seen with the VRM Box, then this
effectively" (Mason, R. Personal may give the user the best chance of
communication, 2013). As discussed experiencing an improved spatial
previously, using someone else's pinna awareness while listening to audio on
can cause confusion and timbral issues for portable media devices.
many users.
By having an application implement the
Another key point made at the beginning spatial characteristics needed to improve
of the dissertation was to cause as little spatial awareness at the end-user stage,
disturbance to how content creators and nothing is being asked from the content
end-users work and consume their audio creators. Access to multi-channel audio is
content. By introducing binaural re- possible through the use of DVD and Blu-
recording techniques as a possible ray, and although this information would
solution, content creators are being asked need to be extracted from such formats,
to introduce a further stage in the using this multi-channel audio would mean
distribution process. Asking the production no effort would have to be made by
companies to introduce such a process content creators/distributors to allow end-
would not appear to be an appropriate users to experience improved spatial
option. awareness in portable media.
Having an app deliver the spatially

enhanced audio content also allows end-
6.4 Suggested Further Research users to opt-in to this new form of
experiencing audio content. It makes the
The most feasible chance of experiencing use of headtracking more appropriate to
an improved spatial awareness when end-users, as like with the Smyth
listening to audio over headphones or ear- Research Realiser A8 (Anon, (E). n.d), the
buds is to adapt the listening experience of end-user will be aware of the need of head
the end-user. Some levels of success have tracking hardware before making a
been seen while introducing hardware to decision on whether to adopt the
the end-user such as the Smyth Research technology. By doing so the technology
Realiser A8 (Anon, (E). n.d), and the would be staying true to the strong
Focusrite VRM Box. Both pieces of emphasis that changing consumer habits
technology are designed to enhance the would have to be kept as minimal as
listening experience but are not portable possible.
solutions.
There are of course several factors that
As explained in the discussion the most would have to be considered in continuing
appropriate and feasible way of improving research into this area. Suggested further
spatial awareness in portable media may research topics into creating a mobile app
rely on a piece of software, or app, to will include;
achieve what is seen with the Smyth
Research Realiser A8 (Anon, (E). n.d) and - Whether a mobile phone/tablet will be
Focusrite VRM Box (Anon, (D). n.d). If a able to accurately track the
mobile app was able to create
22
movements of a user's head, or whether
additional hardware will need to be
used in order to do so.
- Developing an appropriate method for a

mobile application to collect
personalised HRTF data.
- And finally having a mobile application

gain access to multi channel audio
including 2.0 stereo and 5.1 surround.
23
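The app-based approach suggested above (personalised HRTFs, head tracking and convolution applied to multi-channel audio at the end-user stage) can be sketched as a head-tracked "virtual loudspeaker" renderer. This is an illustrative sketch only, resting on assumptions that are not part of this dissertation: nominal ITU-R BS.775 loudspeaker angles for 5.1, a head-yaw angle supplied by some tracker, and a hypothetical placeholder function `toy_hrir` that models only a spherical-head time delay and a broadband level difference. The names `toy_hrir`, `add_into` and `render_binaural` are hypothetical; a real application would substitute measured, ideally personalised, HRIRs or BRIRs.

```python
import math

# Assumed nominal 5.1 layout (ITU-R BS.775), azimuth in degrees,
# 0 = straight ahead, positive = towards the listener's right.
SPEAKER_AZIMUTHS = {"L": -30.0, "R": 30.0, "C": 0.0, "Ls": -110.0, "Rs": 110.0}

SAMPLE_RATE = 48000       # Hz, assumed
HEAD_RADIUS_M = 0.0875    # assumed average head radius
SPEED_OF_SOUND = 343.0    # m/s in air at about 20 degrees C


def convolve(signal, ir):
    """Direct-form linear convolution; output has len(signal) + len(ir) - 1 samples."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += x * h
    return out


def toy_hrir(azimuth_deg):
    """Hypothetical placeholder returning a (left, right) impulse-response pair.

    It models only an interaural time delay (spherical-head approximation) and a
    broadband level drop of up to 6 dB at the far ear. It has no pinna filtering,
    so it cannot convey elevation or front/back cues the way a measured HRIR does.
    """
    az = math.radians((azimuth_deg + 180.0) % 360.0 - 180.0)  # wrap to [-180, 180)
    lateral = abs(math.asin(math.sin(az)))  # fold rear angles onto 0..90 degrees
    itd = HEAD_RADIUS_M / SPEED_OF_SOUND * (lateral + math.sin(lateral))
    delay = int(round(itd * SAMPLE_RATE))
    far_gain = 10 ** (-6.0 * abs(math.sin(az)) / 20.0)
    near = [1.0] + [0.0] * delay
    far = [0.0] * delay + [far_gain]
    # A source on the right (az > 0) reaches the right ear first.
    return (far, near) if az > 0 else (near, far)


def add_into(dst, src):
    """Sum src into dst sample by sample, growing dst if src is longer."""
    for i, v in enumerate(src):
        if i == len(dst):
            dst.append(0.0)
        dst[i] += v


def render_binaural(channels, head_yaw_deg=0.0, hrir=toy_hrir):
    """Render 5.1 channels (dict of name -> sample list) to (left, right) ear signals.

    head_yaw_deg is the tracked head rotation (positive = turned right). Subtracting
    it from each loudspeaker azimuth keeps the virtual loudspeakers fixed in the room.
    """
    left, right = [], []
    for name, samples in channels.items():
        if name == "LFE":
            # Non-directional: sent equally to both ears, without level alignment.
            add_into(left, samples)
            add_into(right, samples)
            continue
        ir_left, ir_right = hrir(SPEAKER_AZIMUTHS[name] - head_yaw_deg)
        add_into(left, convolve(samples, ir_left))
        add_into(right, convolve(samples, ir_right))
    return left, right
```

Because each virtual loudspeaker's angle is counter-rotated by the tracked head yaw, turning the head 30 degrees to the right moves the centre channel towards the left ear, so the loudspeakers appear to stay in the room rather than moving with the headphones. Passing BRIRs instead of the placeholder through the `hrir` argument is one way the room and monitor characteristics discussed in 6.4 could be added.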
7 Glossary:

2.0 Stereo - Stereo sound featuring two channels of audio designed to be played in front of the listener, equidistant from the listener's head.

5.1 Surround - Surround sound format consisting of three front channels, two rear channels, and a sub channel (LFE).

7.1 Surround - Surround sound format consisting of three front channels, two side channels, two rear channels, and a sub channel (LFE).

LFE - Low Frequency Effects. This channel is the ".1" in a multi-channel set-up.

Binaural - Binaural refers to audio recordings that attempt to replicate the human hearing experience as closely as possible. Binaural recordings are often made using a dummy head featuring microphones where the eardrums are found in a human. More sophisticated models feature moulds of real human ears and synthetic ear canals. In some cases shoulders are also added to the dummy head to replicate the human body more accurately.

Dummy head - A model of a human head featuring microphones where the eardrums would be found. More sophisticated models add further human physical characteristics.

Binaural Synthesis - Post-production techniques designed to process stereo recordings so that they sound more realistic in relation to human hearing.

Ambisonics - Multi-channel recording technique designed to reproduce the acoustic properties and directionality of sound in recordings.

HRTF - Head Related Transfer Function. A collection of personal measurements documenting how an individual hears sound in relation to acoustic properties and directionality.

HRIR - Head Related Impulse Response. The time-domain form of an HRTF: the impulse response measured at the ears of a listener or binaural head for a sound arriving from a given direction, normally without room reflections.

Impulse Response - The response of a room recorded after it is excited by a very short sound, normally a clap or a starting pistol. The recording captures the acoustic properties of the room for use in convolution reverb.

ITD - Inter-aural Time Difference. The difference in the time at which a sound reaches each of the two ears.

ILD/IID - Inter-aural Level Difference (also Inter-aural Intensity Difference). The difference in level between the two ears for a sound reaching the head.

EQ - Equalisation. The balance of frequency content in a sound, or the process of adjusting it.

Phantom Image - A sound source with no physical existence, created between physical sound sources through the use of panning.

BRIR - Binaural Room Impulse Responses. Impulse responses of a room recorded with a binaural head, capturing the room's acoustic properties as heard at the two ears, for use in post-production.
Pinna - The outermost part of the ear; the part of the ear visible outside the body.

Timbre - The characteristic quality of a sound.
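The ITD and Phantom Image entries can be made concrete with two textbook approximations, sketched below under stated assumptions: Woodworth's spherical-head formula, ITD ≈ (a / c)(θ + sin θ), using an assumed head radius a of 8.75 cm and a speed of sound c of 343 m/s, and the sine/cosine (constant-power) pan law often used to place a phantom image between two loudspeakers. Neither formula is taken from this dissertation, real heads are not spheres, and the function names are illustrative only.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average adult head radius
SPEED_OF_SOUND = 343.0   # m/s in air at about 20 degrees C


def woodworth_itd(azimuth_deg):
    """Spherical-head ITD estimate, in seconds, for a distant source.

    Woodworth's formula ITD = (a / c) * (theta + sin(theta)) applies to lateral
    angles of 0..90 degrees, so rear angles are folded onto the front first.
    """
    theta = abs(math.asin(math.sin(math.radians(azimuth_deg))))
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))


def constant_power_pan(position):
    """Return (left_gain, right_gain) placing a phantom image between two speakers.

    position runs from -1 (fully left) to +1 (fully right). The sine/cosine law
    keeps left_gain**2 + right_gain**2 == 1, so the summed power stays constant
    as the image moves.
    """
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)
```

With these assumed values, a source at 90 degrees gives an ITD of roughly 0.66 ms, and a centred phantom image (position 0) receives equal gains of about 0.707, i.e. about 3 dB down, from each loudspeaker.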
8 References:

Anon, (A). (n.d) Razer Tiamat 7.1 [Online]. Razer. Available from: http://www.razerzone.com/store/razer-tiamat-71 [Accessed 09 March 2013].

Anon, (B). (n.d) Vuzix Wrap 1200 [Online]. Vuzix. Available from: http://www.vuzix.com/consumer/products_wrap_1200.html [Accessed 20 February 2013].

Anon, (C). (n.d) MPEG Surround [Online]. MPEG Surround. Available from: http://www.mpegsurround.com/index.html [Accessed 23 January 2013].

Anon, (D). (n.d) VRM Box [Online]. Focusrite. Available from: http://uk.focusrite.com/usb-audio-interfaces/vrm-box [Accessed 21 March 2013].

Anon, (E). (n.d) Realiser A8 [Online]. Smyth Research. Available from: http://smyth-research.com/products.html [Accessed 21 March 2013].

Anon, (F). (n.d) BDV-E2100 3D Blu-ray™ Home Cinema System [Online]. Sony. Available from: http://www.sony.co.uk/product/hch-systems-with-blu-ray-disc/bdv-e2100 [Accessed 11 April 2013].

Anon, (G). (n.d) HT-D550W; 5.1ch DVD Home Theatre System [Online]. Samsung. Available from: http://www.samsung.com/hk_en/consumer/tv-av/home-theater/5-1-home-theatre-set/HT-D550W/ZK?pid=hk_en_5.1hometheatresetsubtype_keyvisual1_ht-d550w/zk_20130123 [Accessed 11 April 2013].

Anon, (H). (n.d) LG BH9520TW Cinema 3D Sound Home Cinema System [Online]. LG. Available from: http://www.lg.com/uk/home-entertainment/lg-BH9520TW-home-cinema-system [Accessed 11 April 2013].

Anon, (I). (n.d) SC-BTT290 3D Blu-ray Home Cinema [Online]. Panasonic. Available from: http://www.panasonic.co.uk/html/en_GB/Products/Home+Entertainment/Blu-ray+Home+Cinema/SC-BTT290/Overview/9406265/index.html [Accessed 11 April 2013].

Anon, (J). (n.d) 5.1 Home theater 3D Blu-ray, dock iPod/iPhone [Online]. Philips. Available from: http://www.philips.co.uk/c/home-cinema-systems/3d-blu-ray-dock-ipod-iphone-hts5562_12/prd/ [Accessed 11 April 2013].

Anon, (K). (2013) In the UK, Will Mobile Payment Go Mainstream? [Online]. eMarketer. Available from: http://www.emarketer.com/Article/UK-Will-Mobile-Payments-Go-Mainstream/1009641 [Accessed 11 April 2013].

Anon, (L). (n.d) Half of Tablet and Smartphone Users Are Using These Devices to Listen to Music, According to The NPD Group [Online]. NPD Group. Available from: https://www.npd.com/wps/portal/npd/us/news/press-releases/half-of-tablet-and-smartphone-users-are-using-these-devices-to-listen-to-music-according-to-the-npd-group/ [Accessed 11 April 2013].

Anon, (M). (n.d) Complete 5.1 Surround Sound System Licenced for Xbox 360 HD Game Console [Online]. Pioneer. Available from: http://www.pioneerelectronics.com/PUSA/Home/Home-Theater-Systems/HTS-GS1 [Accessed 11 April 2013].

Anon, (N). (n.d) Logitech Speaker System Z906 [Online]. Logitech. Available from: http://www.logitech.com/en-gb/product/speaker-system-Z906 [Accessed 11 April 2013].

Anon, (O). (2013) A6 allroad quattro; Audio & Communication [Online]. Audi. Available from: http://www.audi.co.uk/new-cars/a6/a6-allroad-quattro/audio-and-communication/bose-surround-sound.html [Accessed 11 April 2013].

Anon, (P). (n.d) Ear Force DSS2 Surround Sound Processor [Online]. Turtle Beach. Available from: http://www.turtlebeach.com/product-detail/dolby-processor-accessories/ear-force-dss2/33 [Accessed 12 April 2013].

Anon, (Q). (n.d) Ear Force Z6A Multi-Speaker Surround Sound [Online]. Turtle Beach. Available from: http://www.turtlebeach.com/product-detail/pc-headsets/ear-force-z6a/44 [Accessed 13 April 2013].

Anon, (R). (2013) iPhone; Tech Specs [Online]. Apple. Available from: http://www.apple.com/uk/iphone/specs.html [Accessed 15 April 2013].

Apocalypse Now. (1979) Directed by Francis Ford Coppola. 153 mins. Zoetrope Studios. DVD.

Brun, R. (2012) Private Peaceful: Drama in surround sound [Online]. BBC. Available from: http://www.bbc.co.uk/blogs/radio4/2012/02/private_peaceful.html [Accessed 13 April 2013].

Coolican, H. (2008) Research Methods and Statistics in Psychology. Bookpoint Ltd. Oxon.

Evans, S. (2011) Portable Devices [Online]. Harris Interactive. Available from: http://www.harrisinteractive.com/vault/HI_UK_Corp-%20Portable-Device-Research.pdf [Accessed 11 April 2013].

Han, H.L. (1994) Measuring a Dummy Head in Search of Pinna Cues. AES. USA.

Kahn, A. (2002) Kind of Blue: The Making of a Masterpiece. Granta Publications. London.

Lim, J. (2011) Olaworks knows your face, maybe better than you do [Online]. TechNode. Available from: http://technode.com/2011/08/10/olaworks-knows-your-face-maybe-better-than-you-do/ [Accessed 24 March 2013].

Merimaa, J. (2010) Modifications of HRTF Filters to Reduce Timbral Effects in Binaural Synthesis, Part 2: Individual HRTFs. AES. USA.

Miles Davis. (1959) Kind of Blue. Columbia Records. California.

Murch, W. (n.d) Walter Murch Articles [Online]. Filmsound.org. Available from: http://filmsound.org/murch/murch.htm [Accessed 08 February 2013].

Purcell, J. (2007) Dialogue Editing for Motion Pictures. Focal Press. Oxford.

Rumsey, F. (2001) Spatial Audio. Focal Press. Oxford.

Rumsey, F. (2011) Whose head is it anyway? AES. USA.

Silverman, D. (2011) Interpreting Qualitative Data. Sage Publications Ltd. London.

The Godfather. (1972) Directed by Francis Ford Coppola. 175 mins. Paramount Pictures. DVD.

The Godfather: Part 2. (1974) Directed by Francis Ford Coppola. 200 mins. Paramount Pictures. DVD.

The Godfather: Part 3. (1990) Directed by Francis Ford Coppola. 162 mins. Paramount Pictures. DVD.
9 Bibliography:

Anon, (2009) Video Compression & H.264 enhancements. BBC. London.

Begault, D. (1999) Auditory and non-auditory factors that potentially influence virtual acoustic imagery. AES. USA.

Cheng, C. Wakefield, G. (2001) Introduction to head-related transfer functions (HRTFs): Representations of HRTFs in time, frequency, and space. AES. USA.

Grothe, B. Pecka, M. McAlpine, D. (2010) Mechanisms of sound localization in mammals. American Physiological Society. Bethesda.
