
Michele Fiorentino*

Saverio Debernardis
Antonio E. Uva
Giuseppe Monno
Dipartimento di Meccanica,
Matematica e Management
(DMMM)
Politecnico di Bari
70126 Bari, Italy

Augmented Reality Text Style Readability with See-Through Head-Mounted Displays in Industrial Context

Abstract
The application of augmented reality in industrial environments requires an effective
visualization of text on a see-through head-mounted display (HMD). The main contribution of this work is an empirical study of text styles as viewed through a monocular
optical see-through display on three real workshop backgrounds, examining four colors
and four different text styles. We ran 2,520 test trials with 14 participants using a
mixed design and evaluated completion time and error rates. We found that both presentation mode and background influence the readability of text, but there is no interaction effect between these two variables. Another interesting aspect is that the presentation mode differentially influences completion time and error rate. The present
study allows us to draw some guidelines for an effective use of AR text visualization in
industrial environments. We suggest maximum contrast when reading time is important, and the use of colors to reduce errors. We also recommend a colored billboard
with transparent text where colors have a specific meaning.

Presence, Vol. 22, No. 2, Spring 2013, 171–190. doi:10.1162/PRES_a_00146
© 2013 by the Massachusetts Institute of Technology

*Correspondence to fiorentino@poliba.it.

1 Introduction

A valuable application of augmented reality (AR) in an industrial context is to superimpose technical information on the real world. The main advantage
of this approach when compared to paper/screen-based documentation is that
the added graphics are co-located and visualized in real time. This feature is very
useful to support complex maintenance or assembly processes, where most of the personnel's time is spent retrieving technical task instructions, localizing parts,
and operating on them in the right order (Uva, Cristiano, Fiorentino, &
Monno, 2010). In this case, a solution could be offered by head-mounted displays (HMDs) that allow task instructions to be superimposed on the real-world
view of the operator. Two main technologies are available for HMDs: (1) video
and (2) optical see-through; these technologies have different trade-offs as
described by van Krevelen and Poelman (2010). An optical see-through HMD
could be the ideal candidate for industrial use because of its real-environment awareness and ergonomics. However, optical see-through systems require a comprehensive study of visual perception: the color and brightness of the real
environment visually conflict with the color and/or contrast of the superimposed graphical elements (see Figure 1).

Figure 1. Text and crosshair superimposed on the HMD during the user tests.

The main problem of the current technology is that only bright objects can be overlaid on the background. In practice, dark colors appear semi-transparent and mix in with the background. This makes the use of see-through HMDs
very challenging, especially in outdoor environments,
where the brightness of the background overcomes the
brightness of the display.
Industrial environments are usually indoors and characterized by controlled lighting, as specified by the standard ISO 8995-1 (ISO, 2002), but visibility problems commonly arise in the readability of technical text labels, and these limit the effectiveness of this technology.
Literature on the readability of simple text in optical
see-through HMDs is scattered among different disciplines (computer graphics, human–computer interaction, etc.) and usually addresses general problems
without satisfying the specific requirements and constraints of industrial workspaces (e.g., standard color
coding, industrial practice, and workshop backgrounds).
It is common practice in industry to follow standard or
personalized color coding rules. For example, the 5S,
one of the most popular workplace organization methods, suggests the use of colors in the workspace to enforce
sorting, straightening, systematic cleaning, standardizing, and sustaining (Hirano, 1995). A very common
practice of industrial data visualization which can be supported by AR technology is the use of shop floor paper
tags (see Figure 2). The tags carry important production
information in a cheap, simple, and effective way by text and color coding. In an aerospace facility, red tags mean defective products, green tags identify items to be repaired, and yellow tags classify products that passed quality control tests and are ready to be shipped out.

Figure 2. Example of an industrial color tag commonly used in manufacturing.
Another relevant example of color coding in industrial
practice is in piping. The standard colors for industrial
piping are given in ASME A13.1 (ASME, 2007) which
describes the content of the pipes, potential hazards, and
the direction of flow. Properly labeled pipes improve
safety and productivity by providing employees and
emergency responders with key information. An AR-based visualization system can be very supportive by filtering the technical database and displaying or labeling
only the pipes of interest to the user. A further industrial
reference is the standard ISO 3864 (ISO, 2011a), which
defines safety colors and safety signs for graphical symbols. It describes design principles for safety signs and
markings, product safety labels, graphical symbols, and
their colorimetric and photometric properties. Another
well-known color coding scheme in industry is the
OSHA safety color code for marking physical hazards
(29 CFR 1910.144; OSHA, 2007). This standard states
that red identifies fire protection equipment, emergency
stop devices, and containers holding dangerous materials. Yellow indicates physical safety hazards, such as striking against, stumbling, falling, tripping, and so on.
All these standards refer specifically to colors of
printed/painted supports, and not as visualized on digital displays. In order to fulfill color coding when passing
to an AR-based information system, a methodical
approach is needed. In fact, specific guidelines are
needed for AR devices, where color perception could change, as opposed to printed signs; and as color perception varies, so would readability. Industrial applications
would benefit from these guidelines.
Previous work reports general optimization of visualization, without providing color-based readability
guidelines. The main goal of the presented work is to
study the readability of textual information in indoor
industrial environments with an optical see-through
HMD. Different colors and text styles were combined
to investigate textual visualization on industrial backgrounds. For this purpose, we developed an open-source test workspace to support readability test experiments and made it available to the academic community
(Fiorentino, 2012).
The paper is organized as follows. In Section 2, we
present previous literature, followed by the description
of our approach in Section 3. In Sections 4 and 5, we
present the design of experiments, the results, and a
related discussion. Finally, we present a conclusion and
future work in Section 6.
2 Related Work

The readability of text is strictly related to aspects of human cognition and perception. In particular,
human beings are sensitive to the contrast between text
and the background on which text is superimposed
(Legge, Parish, Leubker, & Wurm, 1990). In fact, the
ISO 9241-3 standard (ISO, 1993) recommends a minimum luminance
ratio of 3:1 and a preferred value of 10:1. Text readability is a complex problem and involves different sciences
(e.g., cognitive research, psycholinguistics, and human
factors). Physiological and psychological effects influence text readability on displays, as demonstrated by
Fukuzimi, Yamazaki, Kamijo, and Hayashi (1998). They
studied the physical parameters that influence human
color perception on CRT displays: dominant wavelength, stimulus purity, and luminance. They analyzed
results from subjective evaluations combining: (1) some
dominant wavelengths, (2) stimulus purities, and (3)
luminance. They also studied the readability of colors
using an objective method using measurements from
electroencephalogram signals. Their results demon-

strated that an optimal stimulus purity exists in each


dominant wavelength, and further, that it is independent of luminance.
Harrison and Vicente (1996) explored text readability
in the design of transparent 2D menus superimposed
over different graphical user interface (GUI) background
content. They presented a novel anti-interference (AI)
font that uses luminance values to create a contrasting
outline. Their work includes an empirical evaluation of
the effect of varying transparency levels, visual interference produced by different types of background content,
and the performance of AI fonts on text-menu selection
tasks. Testing demonstrated that the closer the shade and hue of the background are to the text color, the higher the interference and the worse the resulting performance. AI fonts produced a substantially
flatter performance curve, shifted toward better (i.e.,
faster) performance, especially at higher transparency levels (i.e., over 50%), which is exactly the condition we
have in a see-through display.
A basic study on the perception of gray text on a nonuniform gray background was presented by Petkov and
Westenberg (2003). They conducted psychophysical
experiments to demonstrate that the spatial frequency of
the patterns in the background has a relevant effect on
readability. The masking effect of the background is
higher when its characteristic pattern width is comparable to the letter stroke (or weight), while the letter size
shows no main effect. Their research can be a valid justification for the use of the outline style, and even more so of a billboard, which removes the textured background pattern around the text strokes. Nevertheless, this
research does not address color issues.
A more specific study on text readability for AR applications was presented by Leykin and Tuceryan (2004)
using a calibrated desktop CRT monitor at an approximate distance of 50 cm from the user's head. They
implemented seven real-time supervised classifiers,
trained them with user data, and evaluated whether text
placed on a particular background was readable or not
on the screen. They concluded that textured background
variations affect readability only when the text contrast is
low. Their study considered only the luminance information of grayscale images and not different colors.


An interesting study of augmented reality viewability in outdoor environments was conducted by Gabbard,
Swan, and Hix (2006). They evaluated text legibility
using an optical see-through display and different text
styles superimposed on matte-finished printed poster
backgrounds (40 in × 60 in). They used six text drawing
styles, three static and three active (meaning that the text
color changed depending upon the presented background poster), six backgrounds (pavement, granite, red
brick, sidewalk, foliage, and sky), and three distances
(1 m, 2 m, and 4 m from the user). Their approach presented three different active algorithms to determine the
best color to use: Complement, Maximum HSV Complement, and Maximum Brightness Contrast. They
chose blue text to replace black text, which is impossible
to produce on see-through displays. Their most important finding was the empirical evidence that user performance is significantly affected by background texture,
text drawing style, and their interaction. The billboard
drawing style (blue text on pure white), and green text,
provided the fastest performance. Visually complex background textures performed very well (red brick) and
intermediately well (foliage), contradicting the initial
hypothesis that a complex background must reduce performance. Surprisingly, the active text drawing styles did
not perform better than the static styles in the practical
tests. Their final guidelines suggested the use of fully
saturated green labels, and the avoidance of fully saturated red labels. An important aspect for our research is
that the error rate was very small (1.9%), and they did
not analyze it further.
Tanaka, Kishino, Miyamae, Terada, and Nishio
(2008) proposed an unusual approach to address optical
see-through HMD limitations by using a fixed camera
mounted on the visor and directed forward. The camera
faced two mirrors that separated the left and right view.
Their approach was based on using the camera to evaluate the visibility in the user's periphery, and on suggesting a head turn whenever this would lead to
better conditions. Their visibility model considered: (1)
the average of RGB and HSV color spaces, (2) the variances in RGB, YCbCr, and HSV color spaces, (3) how
information was tied to a precise area, and (4) which
movements were possible. However, their layout strategy expressly did not preserve the registration of the digital information on the real objects.
Jankowski, Samp, Irzynska, Jozwowicz, and Decker
(2010) explored the effects of varying four text drawing
styles (plain, billboard, anti-interference, and shadow),
image polarity (positive when dark characters are on a
light-colored panel and conversely for negative), and
two backgrounds: the first one with videos recorded in
urban and outdoor environments and the second one
recorded in 3D video games. They found that there was little difference in reading performance between video and 3D backgrounds. Furthermore, they concluded that negative presentation is faster and more accurate than positive presentation. Moreover, billboard styles proved to be the easiest to read and the most immune to background distractions.
From the presented works, we can conclude that the
knowledge on the readability of text on HMDs is scattered among different disciplines, application fields, and hardware setups, and, at the moment, it is not adequate to provide standard and reliable guidelines for application developers. In particular, we found no previous work addressing specific industrial environments.
This study draws inspiration from Gabbard's experiments, which address mainly outdoor environments and textures. Our idea is to apply a similar approach to the industrial context. Therefore, the motivation of this work is to
study and find effective text styles for monocular optical
see-through presentation, specifically for the industrial
context.
3 Our Approach

We used a mixed-design approach to examine user performance in a text-identification task similar to Gabbard's test (Gabbard, Swan, Hix, Kim, & Fitch, 2007).
Gabbard considered text as one of the most fundamental
graphical elements in any user interface and therefore the
identification task is text-based (as opposed to icon-,
line-, or bitmap-based).
Because of the limited previous work on displaying information in AR in an industrial context, we wanted to
focus on text readability in real workshop scenarios. Specifically, we designed an experiment that abstracted the

short reading tasks that are very common in technical AR applications in industry. For this study, we used a
low-level identification and visual search task, since we
did not want to address the semantics (e.g., cognitively
understanding the contents/meaning of the text). We
simply evaluated whether or not users could quickly and
accurately read information (i.e., text legibility), asking
the user to perform the following tasks:

- Scan a meaningless short random text string.
- Identify a target letter.
- Count the occurrences of the target letter.
- Provide a response.

The user is asked to perform these tasks in different presentation modes, obtained with the text styles and colors
used to convey mandatory information for the aforementioned industrial motivations. In this work, we limited
text styles to four types: (1) simple text, (2) text with
outline, (3) text with billboard, and (4) text with outline
and billboard. The text, outline, and billboard can all be
of different colors. The combination of text styles and
colors creates the presentation modes used for our
experiment, which will be detailed in Sections 4 and 5.
We also want to make it clear that the experiment task is
a foreground-only activity, and we did not measure anything about the user's awareness of background content
or changes (i.e., this was not a divided attention task).
In the initial stage, we ran preliminary tests needed to
detect the most significant parameters to be used as the
independent variables of our experiments. We used an
optical see-through HMD, the Liteye LE 750A, an 800 × 600 OLED display. However, the parameters involved
in the visualization of the text (text font, color, size,
position, etc.) are too numerous for extensive user tests.
For this reason, we developed a specific software tool,
called "HMD test," written in C++ using the Qt library,
which has two main functions: editing the parameters
(editor mode) and running the user tests (player mode).
In the editing phase, the user can interactively change
all the parameters of the text visualization with a simple
GUI and preview the final effect in real time on the monitor (see Figure 3).

Figure 3. HMD testbed in editor mode: the user can design the text style and preview it on the screen or HMD.
If the HMD is connected to the second video port,
the user can preview the visualization directly; otherwise, he or she can simulate the result by loading a background image on a desktop screen.
The user can change and test the following text style
parameters: font size, text color and transparency, text
billboard color and transparency, outline width, and outline color and transparency. In our preliminary phase, we
simulated different configurations using a library of 100
pictures downloaded from the internet using Google
Images with specific keywords (e.g., workshop, shop floor, manufacturing; see Figure 4). In this way, we
were able to evaluate the experimental settings and to
plan the user tests.

Figure 4. A sample of the images used in the preliminary design phase.
The HMD testbed automated the execution of tests in
player mode: the test configurations are retrieved from
the test template file, shuffled randomly, and then displayed on the HMD. The application acquires and archives the following data in a simple log file: subject
username, time and date of the test, displayed text
strings, text style, user's answer, and response time.
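Since the log format itself is not published with the paper, the following Python sketch illustrates one way such a player-mode loop could work; `configs` and `run_trial` are hypothetical stand-ins for the template contents and the HMD trial routine, not the tool's actual interfaces.

```python
import csv
import random
import time

def run_session(subject, configs, run_trial, log_path):
    """Sketch of the player-mode loop: shuffle the test configurations and
    append one log record per trial with the fields listed above (subject
    username, date/time, displayed strings, text style, user's answer,
    response time). `configs` and `run_trial` are illustrative assumptions,
    not the authors' actual interfaces."""
    order = list(configs)
    random.shuffle(order)                  # randomized presentation order
    with open(log_path, "a", newline="") as log:
        writer = csv.writer(log)
        for cfg in order:
            strings, answer, rt_ms = run_trial(cfg)   # show trial on the HMD
            writer.writerow([subject,
                             time.strftime("%Y-%m-%d %H:%M:%S"),
                             strings, cfg, answer, rt_ms])
```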
During the test, the performance, the progress bar,
the current score, and the top score are displayed on the
service desktop screen to monitor the test (see Figure
The score is added to motivate the user to maximize performance during the test; since it is not visible to the participants, it cannot influence the test results. The
software is publicly available on our website
(Fiorentino, 2012). We are interested in comparing the
results from other researchers using different display
configurations.

Figure 5. Player mode: during the test, the service screen (not seen by the participant) shows the list of configurations (center), progress bar (lower left), and the user performance and top score.
4 Design of Experiment

We differentiate our experiments from Gabbard's tests by the use of real industrial backgrounds. Our
software displays two different text blocks on the HMD
view area. The upper text block is composed of three
randomly generated strings with alternating uppercase
and lowercase letters, while the lower block consists of
three strings of capital letters. In the upper block, one of
the three sets of letter pairs consists of the same letter,
given once as uppercase and once as lowercase (e.g.,
mM, Pp). This is the target letter. Each user has to identify the target letter and count how many times it appears in the lower block. The participants input the result on a provided numeric keypad. The possible answers are 1, 2, 3, or 0 in the case of an unreadable or not-found letter. The alphabet is restricted to
the following letters: C, K, M, O, P, S, U, V, W, X, Z.
These letters have similar glyphs in uppercase and lowercase; therefore, this restriction makes the difficulty associated with the target identification uniform. Our
software generates and visualizes the text blocks on the
HMD and records response time and user errors. A
crosshair viewfinder is displayed, and the user must point
to a specific target in the real scene (see Figure 1). This
solution avoids the chance that users may turn the HMD
to a more favorable position (i.e., to choose a specific
background point).
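To make the trial structure concrete, the following Python sketch generates stimuli under one plausible reading of the description above; the pair layout and string lengths are our illustrative assumptions, not the tool's actual implementation.

```python
import random

# Letters whose uppercase and lowercase glyphs look alike (from the paper).
ALPHABET = "CKMOPSUVWXZ"

def make_trial(rng: random.Random, lower_len: int = 9):
    """One plausible stimulus generator: an upper block of three letter
    pairs, exactly one of which repeats the same letter in both cases
    (the target, e.g., 'mM'), and a lower block of capitals (simplified
    here to a single string) in which the target appears 1-3 times."""
    target = rng.choice(ALPHABET)
    others = [c for c in ALPHABET if c != target]
    # Distractor pairs use two different letters, so only the target pair
    # repeats the same letter in both cases.
    pairs = [a.lower() + b for a, b in
             (rng.sample(others, 2), rng.sample(others, 2))]
    pairs.append(target.lower() + target)      # the same-letter target pair
    rng.shuffle(pairs)
    upper_block = " ".join(pairs)

    occurrences = rng.randint(1, 3)            # the correct answer
    lower = [rng.choice(others) for _ in range(lower_len - occurrences)]
    lower += [target] * occurrences
    rng.shuffle(lower)
    return upper_block, "".join(lower), occurrences

# Example: make_trial(random.Random(42)) might yield ('kO xZ mM', 'KUMXCWPSM', 1).
```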

4.1 Measures
We focused on the following independent variables of the experiment (see Table 1) and the dependent variables (see Table 2) that we collected for the subsequent
statistical analysis.
Apart from measuring efficiency (completion time)
and effectiveness (error rate), a 5-point Likert scale was used to measure user preferences with a post-experiment questionnaire.

Table 1. Independent Variables of Our Experiment

Independent variable   Value   Details
Participant users      14      11 males, 3 females
Backgrounds            3       Testbed frame, tool workbench, engine
Text styles            4       Text only, with outline, with billboard, with outline and billboard
Colors                 4       Black, green, red, white (when applicable)
Repetitions            5       Five for each presentation mode and background
Total trials           2,520   14 × 3 × 12 × 5

Table 2. Dependent Variables of Our Designed Experiment

Dependent variable   Definition
Completion time      In ms
Error rate           Correct task completion = 1, wrong task completion = 0
4.2 Backgrounds
Most of the industrial backgrounds we have
encountered, especially those related to production facilities, present some common characteristics. They are indoors, uniformly lit, quite dirty, and mainly gray in
color. They often present sparse saturated colors (e.g.,
tools, signs, etc.). We used three real-world backgrounds: (1) testbed frame, (2) tool workbench, and (3) motorbike engine (see Figure 6).

Figure 6. The three real backgrounds used in the tests: testbed frame (left), tool workbench (center), motorbike engine (right).

We chose the backgrounds with the intention of providing three different luminance profiles: negative (testbed), positive (tool workbench), and neutral (engine). We took pictures from the user's point of view with a digital camera; Figure 7 shows the related histograms.

Figure 7. Normalized luminance histograms (number of pixels for each luminance value) of the pictures of the backgrounds used in the test: negative (testbed frame), positive (tool workbench), and neutral (engine).

The test was carried out with the user sitting on a swivel chair, in order to keep the same head position for all participants: about 50 cm in height and 60 cm in depth from the background center point (see Figure 8). All tests were performed in a laboratory with fully shaded windows and artificial lighting (overhead fluorescent lights). We measured the illuminance with a lux meter and registered an average illumination over the work area of about 300 lux.

Figure 8. The experiment setups for the three backgrounds.
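For reference, the normalized histograms of Figure 7 can be reproduced from such photographs along the following lines (a sketch; the Rec. 601 luma weights are what PIL's grayscale conversion applies, an assumption on our part rather than a detail stated in the paper).

```python
import numpy as np
from PIL import Image

def luminance_histogram(path: str) -> np.ndarray:
    """Normalized luminance histogram (share of pixels per luminance
    level, as plotted in Figure 7). PIL's 'L' mode applies the Rec. 601
    weights L = 0.299 R + 0.587 G + 0.114 B."""
    lum = np.asarray(Image.open(path).convert("L"))
    hist = np.bincount(lum.ravel(), minlength=256)
    # Normalize so the bars sum to 1; images of any size become comparable.
    return hist / hist.sum()
```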

4.3 Colors
Our setup required users to stay concentrated for the whole task. Studies on the length of human sustained
attention reported a maximum of around 20 min for
adults (Cornish & Dukette, 2009). To keep the experiment time within 20 min, we limited the color range to
only four options. In particular, following the specifications defined by the ISO 3864 standard (colors for safety
signs; ISO, 2011a), we decided to use the red and the
green as safety colors (i.e., colors with special properties
to which a safety meaning is attributed), and white and
black as contrast colors. Apart from their general messages (safety and prohibition), green and red are worth a
deeper investigation in AR setup because the literature

178 PRESENCE: VOLUME 22, NUMBER 2

Figure 5. Player mode: during the test, the service screen (not seen by the participant) shows the list of configurations
(center), progress bar (lower left), and the user performance and top score.

Table 1. Independent Variables of Our Experiment


Independent variables
Participant users
Backgrounds
Text styles
Colors
Repetitions
Total trials

14
3
4
4
5
2,520

11 males, 3 females
Testbed frame, tool workbench, engine
Text only, with outline, with billboard, with outline and billboard
Black, green, red, white (when applicable)
Five for each presentation mode and background
14  3  12  5

Table 2. Dependent Variable of Our Designed Experiment


Dependent variables
Completion time
Error rate

in ms
Correct task completion 1,
wrong task completion 0

reports green as one of the best colors for reading, and


red among the worst performing colors on CRT displays
(Fukuzimi et al., 1998). Colors are displayed on our
uncalibrated optical see-through HMD. We tested the
following colors, defined in an RGB color space.






White: RGB (255,255,255).


Red: RGB (255,0,0).
Green: RGB (0,255,0).
Black: RGB (0,0,0).

An important issue is the visualization of the color black on optical see-through HMDs. In fact, RGB (0,0,0) means, in additive color composition, that all the display pixels are off, so it will be transparent on an optical see-through device. Therefore, in this work, when we speak about the color black, we mean "no added color": the background color bleeding through these transparent pixels. In our presentation
modes, the color black can only be used as text or as outline on a differently colored billboard, because a black
billboard is equivalent to no billboard. Our experimental
indoor environment has a low illuminance (around 300
lux); therefore, the transparent stroke of black text or of
the outline is perceived as dark enough to be called
black. With this meaning, we considered black as a color in our experiment. Our purpose is to study its performance as a contrast color to the background colors (i.e., white, red, and green), as indicated by ISO standards (ISO, 2004, 2011a, 2011b).
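To make the "no added color" argument concrete, a simplified first-order model of optical see-through blending (our formulation, not an equation from the paper, ignoring combiner reflections and display nonlinearities) is:

L_perceived(λ) ≈ L_display(λ) + τ · L_background(λ), with τ ≈ 0.7,

where τ is the optical transmissivity of the combiner (assuming the viewer's 70/30 transmission specification means roughly 70% of background light passes through). For black text, L_display = 0 and the perceived stroke is just the attenuated background, which is why it reads as dark in a 300-lux room.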

4.4 Text Styles


Font type and size were not considered as variables.
We chose the sans serif Helvetica font because it was
used in most of the readability experiments in the literature, and we chose 22 points as the font height, the smallest size that we could clearly read in the pre-test phase. Instead, we focused on four different text styles. There are two well-known techniques in the literature to isolate text from a variable background: the outline, inspired by Harrison's AI font, and the billboard, which

has proven effective but costly in terms of pixels. We used four main text styles in our experiments: the first, and simplest, is text only; the second is text with a 2-point-wide outline; the third is text with a rectangular billboard; and the fourth is text with a combination of outline and billboard.
Table 3 shows the 12 presentation modes used in the experiments, preliminarily selected from the possible combinations of text color (black, green, red, white), outline mode (black, i.e., transparent; green; red), and billboard mode (green, red, white).

Table 3. The 12 Presentation Modes Used in the Experiments

Mode   Text color   Outline color   Billboard color
1      Black        -               Green
2      Black        -               Red
3      Black        -               White
4      Green        -               -
5      Green        -               White
6      Green        Black           White
7      Red          -               White
8      Red          -               -
9      Red          Black           White
10     White        -               -
11     White        Green           -
12     White        Red             -

4.5 Participants
Fourteen unpaid participants were recruited for the
study among undergraduates in technical subjects. They
were 11 males and three females with the following age
distribution: seven from 21 to 25 and seven from 26 to
30, with an average age of 25. Six participants wore
glasses but none had color deficiency. All were right-eye
dominant. The users wore the HMD in front of the right
eye, and they received adequate instruction and performed a trial session. The participants could discontinue
the test at any time and the break time was not limited.
The subjects performed a total of 2,520 trials (14 participants × 12 presentation modes × 3 backgrounds × 5 repetitions), ensuring a Latin square design. Each subject saw, on each background, a total of 60 visualization queries. At the end of the complete trial, each participant filled out a questionnaire to detect particular problems and to collect evaluations and opinions.
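The construction of the balanced orderings is not detailed in the paper; for reference, a standard balanced Latin square generator (a sketch of one common construction, not necessarily the authors' exact procedure) could assign the 12 presentation modes as follows.

```python
def balanced_latin_square(n: int) -> list[list[int]]:
    """Rows of a balanced Latin square for an even n: each condition
    appears once per row position, and each ordered pair of neighbors
    occurs equally often across rows (the usual counterbalancing device)."""
    square = []
    for row in range(n):
        seq = []
        for col in range(n):
            if col % 2 == 0:
                seq.append((row + col // 2) % n)       # walk forward
            else:
                seq.append((row - col // 2 - 1) % n)   # interleave backward
        square.append(seq)
    return square

# With 12 modes and 14 participants, two of the 12 orders must repeat:
orders = balanced_latin_square(12)
assignments = [orders[p % 12] for p in range(14)]
```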

4.6 Apparatus

Our hardware system for experimental tests consisted of the following.

- Notebook: HP Pavilion dv6-6150sl Entertainment Notebook PC, Intel Core i5-2410M at 2.30 GHz, 4 GB of DDR3 RAM, AMD Radeon HD 6770M graphics card, running Windows 7 and the HMD test software.
- Viewer: Liteye LE 750A, OLED display, 800 × 600 at 60 Hz, contrast 100:1, transmission 70/30, luminance 300 cd/m², 28° diagonal FOV, mounted on an ergonomic support and connected by VGA (see Figure 9). We set the diopter adjustment to 0 for all users.
- Wireless numeric keypad: Targus model AKP02EU, battery powered, to collect participants' answers.

Figure 9. The optical see-through HMD used in our experiments (Liteye LE 750A).
4.7 Hypotheses

Prior to conducting the study, we formulated the following hypotheses.
H1. Different workshop backgrounds will affect user test performance (completion time and error): text readability is background-dependent.
H2. The presentation mode will affect performance.
H3. Text style will affect performance.
H4. Text color will affect performance.
H5. Outline color will affect performance.
H6. Billboard color will affect performance.

5 Results

We analyzed the acquired data to evaluate the main effect of background, color, and text style on readability
performance. We used quantitative and qualitative data.
The completion time and error rate were quantitative
data, while the subjective responses were the qualitative
data. In a preliminary phase of the analysis, we removed the outliers with Tukey's outlier filter based on the interquartile range. To make statistical inferences, we first inquired whether the completion-time data followed a normal distribution. We used the Shapiro–Wilk normality test (the AS R94 algorithm), which rejected the normal distribution for all the samples (p < .05). The skewness analysis showed a positive value for all samples: this is typical of task-completion-time measures, which follow a lognormal distribution. We log10-transformed all the completion times prior to statistical analysis. To evaluate homoscedasticity, we applied the Levene test, because this test does not require equal sizes for all the groups.
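For readers wishing to replicate this preprocessing, the steps map directly onto scipy; the sketch below runs on synthetic lognormal samples standing in for the per-background completion times, so the printed statistics are illustrative only.

```python
import numpy as np
from scipy import stats

def preprocess(times_ms):
    """Tukey outlier filter on the interquartile range, then log10
    transform (task completion times are typically lognormal)."""
    x = np.asarray(times_ms, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    fence = 1.5 * (q3 - q1)
    return np.log10(x[(x >= q1 - fence) & (x <= q3 + fence)])

# Illustrative synthetic samples (~6,000 ms medians), not the study's data:
rng = np.random.default_rng(0)
samples = [preprocess(rng.lognormal(mean=8.7, sigma=0.4, size=840))
           for _ in range(3)]
print(stats.shapiro(samples[0]))   # Shapiro-Wilk normality test (AS R94)
print(stats.levene(*samples))      # Levene test tolerates unequal sizes
```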
As to the error rate, the faults considered in our analysis are users' wrong answers. We used the method of N × 2 contingency tables to do statistical inference (p < .05) on the error data, with the following error rate definition:

ER% = (number of errors / number of targets) × 100

Each sample of 12 modes could have 70 possible errors (14 participants × 5 repetitions), that is, the number of targets in the error rate definition.
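The N × 2 contingency approach can be illustrated with scipy; the counts below are approximate reconstructions from the error percentages reported in Section 5.1 (assuming 840 targets per background), not the original raw data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: backgrounds; columns: (errors, correct) out of 840 targets each.
# Counts reconstructed from the reported percentages, so approximate.
table = np.array([[56, 784],    # testbed frame    (~6.67% errors)
                  [61, 779],    # tool workbench   (~7.26% errors)
                  [55, 785]])   # motorbike engine (~6.55% errors)
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)   # small chi-square, p > .05: no significant difference
```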
In the following sections, we detail the results as to
the background effect, the text style effect, and the color
effect, with a discussion on how to optimize readability when a color message is required, such as for safety warnings.

5.1 Background Effect


With regard to completion times, the ANOVA showed a main effect of background, F(2, 2442) = 49.377; p < .001. Figure 10 shows the box plot of the completion times. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually. Considering the mean completion time, the engine background had times 13% lower than the tool workbench and 17% lower than the testbed frame.

Figure 10. Box plot of the completion times for the three backgrounds (the X marks the mean of samples).
The application of the Shapiro–Wilk test revealed that the answer distributions for the three backgrounds were not all normal, and homoscedasticity was not verified. Therefore, we applied the Friedman test, which is more appropriate than ANOVA in these conditions. The Friedman test showed a significant difference among the three backgrounds (see Table 4). As the post hoc pair comparison we used the Wilcoxon signed-rank test with Bonferroni correction (α = 0.017), which confirmed that the engine background had the lowest response time (with respect to the testbed frame, Z = 22.601, p < .001; with respect to the tool workbench, Z = 23.907, p < .001), while the testbed frame background had statistically the highest answer time (with respect to the tool workbench, Z = 23.526, p < .001). An explanation could be that neutral backgrounds are the best for text readability. As a result, we can confirm
the hypothesis H1 relative to the completion time. Text readability depends on the background.

Table 4. Background Data Analysis

                    Testbed frame          Tool workbench         Engine
Shapiro–Wilk test   W = 0.996, p = .031    W = 0.997, p = .097    W = 0.995, p = .015
Levene test         F(2, 2442) = 9.895; p < .001
Mean rank           2.95                   2.01                   1.03
Friedman test       χ²(2) = 1518.9; p < .001

As to the error rates for the three different backgrounds, we computed an average error of 6.67% on the testbed frame background, 7.26% for the tool workbench background, and 6.55% on the engine background (see Figure 11).

Figure 11. Box plot of error rate for the three backgrounds (the X marks the mean of samples).

Comparing the three sample error rates with contingency tables, we did not find statistically significant differences among the three backgrounds, χ²(2) = 0.3869 < 5.991. Unlike the completion-time analysis, this result, limited to error rates, does not support hypothesis H1.
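The non-parametric comparisons above map onto scipy as in the following sketch; the three arrays are synthetic stand-ins for trial-matched, log-transformed times, so the printed statistics are illustrative only.

```python
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = {name: rng.normal(loc, 0.1, 800)      # stand-in log10 times
          for name, loc in [("testbed", 3.83), ("workbench", 3.81),
                            ("engine", 3.76)]}

print(stats.friedmanchisquare(*groups.values()))   # omnibus test

alpha = 0.05 / 3   # Bonferroni correction for three pairs -> 0.017
for a, b in combinations(groups, 2):               # post hoc Wilcoxon tests
    stat, p = stats.wilcoxon(groups[a], groups[b])
    print(f"{a} vs {b}: W={stat:.0f}, p={p:.4f}",
          "significant" if p < alpha else "n.s.")
```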

5.2 Presentation Mode Effect


With regard to completion times, the results in Table 5 about normality and homoscedasticity called for the application of the Welch ANOVA test for all of the 12 combinations, and the Games–Howell test for the 66 pair comparisons. Differences among all samples were statistically shown; thus, hypothesis H2 was supported (see Figure 12).
The fastest presentation mode resulted from black text, no outline, and white billboard (mode 3). The best performance of mode 3 was statistically confirmed against modes 9, 7, 4, 11, and 8 (d = 0.092, p ≤ .001). It is important to note that mode 5 (green, no outline, white) and mode 6 (green, black, white) showed no statistical difference, F(1) = 0.012, p = .912, while mode 7 (red, no outline, white) and mode 9 (red, black, white) are not strongly different, F(1) = 3.962; p = .047. This confirms that black is effectively "not a color," since it shows no main effect if used as an outline.
An error-rate comparison (see Figure 13) among all 12 combinations revealed a statistically significant difference, χ²(11) = 37.811 > 19.675, allowing us to accept hypothesis H2; but in this case, the performance distribution is different from the completion-time analysis. The best performing modes were modes 2, 5, and 12. An interesting result is that presentation modes displaying red text (modes 7, 8, and 9) scored poorly, both for completion times and error rates.
5.2.1 Interaction Background–Presentation Mode. We tested the interaction between the background and the presentation-mode effects with a two-way unbalanced ANOVA, which showed that there is no interaction effect, F(22, 2427) = 0.778; p = .756. Every presentation mode displays results respecting the background ranking (with the partial exception of modes 2 and 8), as shown in the radar plot in Figure 14.


Table 5. Completion-Time Analysis of the Presentation Modes, Sorted by the Mean Response Time

Presentation mode        Mean response time (ms)   Shapiro–Wilk test
3 (black, -, white)      5,441                     W(205) = 0.996, p = .841
12 (white, red, -)       5,909                     W(209) = 0.989, p = .119
1 (black, -, green)      6,108                     W(207) = 0.992, p = .306
6 (green, black, white)  6,109                     W(203) = 0.992, p = .385
10 (white, -, -)         6,112                     W(197) = 0.994, p = .599
5 (green, -, white)      6,139                     W(208) = 0.995, p = .764
2 (black, -, red)        6,239                     W(206) = 0.990, p = .189
8 (red, -, -)            6,728                     W(206) = 0.990, p = .189
11 (white, green, -)     6,864                     W(204) = 0.990, p = .184
4 (green, -, -)          6,966                     W(209) = 0.995, p = .763
7 (red, -, white)        7,276                     W(205) = 0.997, p = .927
9 (red, black, white)    7,914                     W(204) = 0.996, p = .813

Levene test: F(11, 2451) = 4.359, p < .001. Welch ANOVA test: F(11) = 12.653, p < .001.

Figure 12. Box plot of completion time for each presentation mode (the X marks the mean of samples).


Figure 13. Box plot of error rate for each presentation mode (the X marks the mean of samples).

Figure 14. Radar plot of response times (ms) for the three backgrounds in the 12 presentation modes.


Table 6. Text Style Comparison

                     Text only (T)        Text and outline (TO)   Text and billboard (TB)   Text, outline, and billboard (TOB)
Presentation modes   4, 8, 10             11, 12                  1, 2, 3, 5, 7             6, 9
Shapiro–Wilk test    W = 0.996, p = .101  W = 0.994, p = .092     W = 0.997, p = .072       W = 0.996, p = .406
Levene test          F(3, 2249) = 5.944; p < .001
Mean (ms)            6,622                6,368                   6,237                     6,887
Welch ANOVA test     F(3) = 5.977; p = .001
Games–Howell test    TB better than TOB (d = 0.042, p = .001); TB better than T (d = 0.026, p = .022); TO better than TOB (d = 0.034, p = .035)

5.3 Text Style Effect

To analyze the effect of text styles, we gathered all the presentation modes into four groups, as presented in Table 6. The completion-time analysis showed that the data did not pass the homoscedasticity test; therefore, we used the Welch ANOVA test, which revealed a significant difference among the styles (see Table 6). This result allowed us to accept hypothesis H3.
The Games–Howell post hoc test showed clearly that the text and billboard style performed better than the text-only style and the text, outline, and billboard style. The text and outline style is better only than the worst style: text, outline, and billboard (see Table 6). As to error rates, there is no significant difference among the text styles, χ²(3) = 5.14 < 7.82.

5.4 Color Effect

5.4.1 Text Color. We explored the text colors by collecting all data in four groups: black (modes 1, 2, and 3), green (4, 5, 6), red (7, 8, 9), and white (10, 11, 12). The results are represented in Figure 15. The comparison of these four samples showed that the color black seems to outperform all other colors, ANOVA: F(3, 2448) = 25.420, p < .001; but indeed, the good performance can be attributed to the presence of the billboard, which is always associated with black text, as reported in Section 5.3. Therefore, we removed the black group and proceeded to the comparison of the green, red, and white text colors.
The Shapiro–Wilk test showed that all groups had a
normal distribution, but the Levene test showed that the
variances were different (see Table 7).
In this case, we applied the Welch ANOVA test to compare the three samples. There was a statistically significant difference in the text color choice; thus, hypothesis H4 is accepted. The white text group had the lowest answer time (see Figure 16). Games–Howell post hoc tests allowed pairwise comparisons, and they revealed that: (1) the green text color group is statistically better than red (d = 0.052, p < .001); and (2) the white text color is better than red (d = 0.061, p < .001).
As to error rate, there were statistically significant differences among the three text color samples: χ²(2) = 7.563 > 5.991. The green text group has the minimum average error rate, at 6.03%, compared to 6.34% for the white text group and 9.68% for the red text group (see Figure 16). The black (transparent) text group has an average error rate of 5.24%. This is in accord with hypothesis H4.
Moreover, the red text group did not perform as well
as the other colors, as confirmed by completion-time
results and as reported in the previous literature.
5.4.2 Outline Color. As stated in Section 5.2,
the color black, for the reasons discussed in that section,
shows no main effect if used as the outline. Therefore,


Figure 15. Box plot of scattered completion-time data for each text color group (the X marks the mean of samples).

Table 7. Text Color Analysis

                    Green                      Red                        White
Shapiro–Wilk test   W(617) = 0.998, p = .653   W(611) = 0.997, p = .248   W(606) = 0.996, p = .103
Levene test         F(2, 1831) = 3.688, p = .025
Means (ms)          6,412                      7,228                      6,281
Welch ANOVA test    F(2) = 21.344, p < .001

we could compare only the green and the red outlines (mode 11 and mode 12) applied to white text. The statistical results support hypothesis H5 (12 better than 11) for both the completion-time (Games–Howell post hoc test: d = 0.065, p = .002) and error-rate (see Figure 16) analyses. The red outline performs better than the green outline, probably because of a higher contrast between the text and the outline.

Figure 16. Box plot of error rate for each grouped text color presentation mode (the X marks the mean of samples).

5.4.3 Billboard Color. Next, we wanted to analyze which was the best billboard among the three combinations available under the black text group. Therefore, we kept the black text and focused our attention on conveying the color information using the billboards (green, red, and white). For these presentation modes we had three normal distributions and homogeneity of variance (see Table 8).

Table 8. Comparison of Billboard Colors (Black Text)

                    Green                      Red                        White
Shapiro–Wilk test   W(207) = 0.992, p = .306   W(206) = 0.990, p = .189   W(205) = 0.996, p = .841
Levene test         F(2, 615) = 0.126, p = .882
Means (ms)          6,109                      6,237                      5,445
One-way ANOVA       F(2, 615) = 7.061, p = .001

The one-way ANOVA revealed a statistically significant difference among modes 1, 2, and 3, that is, black text, no outline, and green, red, and white billboards, thus accepting hypothesis H6 on billboard color influence (see Figure 17). Tukey post hoc tests confirmed statistically significant differences between mode 3 and mode 1 (d = 0.050, p = .009), and between mode 3 and mode 2 (d = 0.059, p = .017), showing that the best presentation mode is black text over a white billboard.

Figure 17. Box plot of scattered completion-time data for black text color and the three different billboard colors (the X marks the mean of samples).

5.5 Qualitative Results

The post-experiment questionnaire was composed of two parts, both using a Likert scale. The subjects were presented with the stimuli as a reminder. In the first part, the participant had to mark every presentation mode with a vote from 1 to 5. In the second part, the participant answered questions about his or her opinions using five judgment values: not at all, a little, on average, enough, much. Figure 18 shows the cumulative responses of the user interviews.

Figure 18. Cumulative marks given by participants at the end of their test trials (range 1–5).

5.6 Discussion

A first result is that the real industrial background (300 lux) influenced text readability with regard to
completion time. This is in accordance with previous
work that used different setups: a printed poster and
video on display monitors. Unlike the completion time,
the analysis of the error rates showed that background
did not have a significant influence. This last aspect
should be further investigated, because our results are in
contrast with general expectation and previous results
(e.g., Jankowski et al., 2010). Gabbard's tests (Gabbard
et al., 2006), which are closer to our setup, revealed an
error rate that was not significant, and therefore they
ignored it in the statistics. Our tests, on the contrary,
showed higher error rates (6.54% vs. 1.9%). The engine
background performed better than the other two. This
result may depend on several factors, including luminance profile, which, in the specific case, is neutral,
unlike the other two. The presentation mode showed a
main effect on both completion time and error rate. This
result is in accordance with the literature. A non-trivial
statistical outcome is that there is no interaction effect
between background and presentation mode. This result
is in contrast to previous findings in outdoor environments. Gabbard, Zedlitz, Swan, and Winchester (2010) found strong interactions between background and display color. However, these outcomes were reported for
outdoor environments with 800–1000 lux. In our opinion, our result justifies the effort of finding an optimal presentation mode, since it will be independent of the
background when the luminance of the display is much
brighter than the environment, as in an indoor industrial
background.
Among all the presentation modes, the text style
revealed a main effect on completion time but not on
error rates. Post hoc analysis showed that the billboard is
more effective than outline or text only, in accordance
with the literature. The good performance of the billboard has a drawback in terms of scene occlusion.
The results achieved revealed that completion time, error rate, and user interviews are not consistent in defining a unique ranking of presentation modes. According
to response time, the best results are obtained by mode
3 (black text, white billboard), and by mode 12 (white
text, red outline). The third-best performer is mode 1
(black text, green billboard), very similar to mode 3. An
explanation can be found in the higher contrast between
text and background in accordance with the ISO recommendations about text readability. We validated, in the
industrial context, the principle of using maximum contrast in order to achieve fast readability on a see-through

HMD. Therefore, our results suggest either black text and white billboard or white text only when reading time
is important, for example, for maintenance instructions.
In contrast to the results obtained from completion
time, the error rates showed a different ranking. The best
results are obtained by mode 2 (black text, red billboard), followed by mode 5 (green text, white billboard), and by mode 12 (white text, red outline).
Although our results suggest the use of colors when the information is critical and accuracy is mandatory (e.g., warning signals), deeper study is necessary. Also, the qualitative ranking of presentation modes obtained from the user interviews is not in concert with the users' quantitative performance in terms of completion time and/or error rate. This is quite interesting, since it shows that the user is not able to choose the best presentation mode in terms of performance. We therefore suggest preventing users from freely customizing visualization preferences in AR application design. The only result that is confirmed by
completion time, error rate, and user interview is that
presentation modes displaying red text perform poorly,
as already shown in the literature (e.g., Gabbard et al.,
2007; Fukuzimi et al., 1998). In industrial applications,
it can be necessary to convey specific color information
along with the textual description. In this case, our tests
recommend the use of a specific color for the billboard
and black (transparent) for the text.


6 Conclusion

We presented an empirical study on the readability of text styles using an optical see-through HMD in different industrial scenarios. A preliminary test, supported
by a software tool implemented by the authors, was used
to explore a large number of configurations against a gallery of industrial images taken from the internet. We
selected and tested 12 presentation modes using four
main colors (black/transparent, white, red, and green),
four different text styles (text only, text and outline, text
and billboard, text and outline and billboard), and three
different real workshop backgrounds (testbed frame, a
tool workbench, and a motorbike engine). We ran 2,520
test trials with 14 participants who were interviewed after
the experiment.
The first finding of this work is that both the presentation mode and the background influence the readability
of text, but there is no interaction effect between these
two variables. An important result is that an optimal presentation mode will work well, independent of the background in indoor industrial lighting conditions (300
lux). We also note that the user is not able to choose the
best performing presentation mode, and therefore we
recommend that an AR application should not allow the
user to customize the visualization preferences. Another
interesting aspect is that the presentation mode differentially influences completion time and error rate.
The present study allows us to draw some guidelines
for an effective use of AR text visualization in industrial
environments. In particular, we suggest maximum contrast styles, such as black text and white billboard or
white text only, when reading time is important, and the
use of colors when avoiding errors in readability is critical. We also suggest a colored billboard with black text
where colors have a specific meaning. Billboards provide
the best performance, but at the cost of scene occlusion.
Future investigation is needed to explore billboard area
optimization. Apart from black and white colors, we
tested only red and green. Future work will involve testing with other colors such as blue, yellow, orange, and
so on. As a final remark, our findings are HMD-device-dependent, and for this reason, we provide our software
and the test configurations presented in this paper on

our website, in order to allow other researchers to collect and compare results using different devices.

Acknowledgments
The authors would like to thank Michele Mazzoccoli and
Michele Gattullo for the useful help provided in the test design
and execution, and all the students who took part in the test.

References

ASME. (2007). Scheme for the identification of piping systems (ASME A13.1). Retrieved September 4, 2012 from https://www.asme.org/products/codes-standards/scheme-for-the-identification-of-piping-systems
Cornish, D., & Dukette, D. (2009). The essential 20: Twenty components of an excellent health care team (pp. 72–73). Pittsburgh, PA: RoseDog Books.
Fiorentino, M. (2012). HMD test [software]. Retrieved from Polytechnic of Bari, Department of Mechanics, Mathematics and Management, Vr3Lab website: http://www.dimeg.poliba.it/vr3lab/
Fukuzimi, S., Yamazaki, T., Kamijo, K., & Hayashi, Y. (1998). Physiological and psychological evaluation for visual display colour readability: A visual evoked potential study and a subjective evaluation study. Ergonomics, 41(1), 89–108. doi:10.1080/001401398187341
Gabbard, J. L., Swan, J. E., II, & Hix, D. (2006). The effects of text drawing styles, background textures, and natural lighting on text legibility in outdoor augmented reality. Presence: Teleoperators and Virtual Environments, 15(1), 16–32. doi:10.1162/pres.2006.15.1.16
Gabbard, J. L., Swan, J. E., II, Hix, D., Kim, S.-J., & Fitch, G. (2007). Active text drawing styles for outdoor augmented reality: A user-based study and design implications. Proceedings of the IEEE Virtual Reality Conference, VR '07, 35–42. doi:10.1109/VR.2007.352461
Gabbard, J. L., Zedlitz, J., Swan, J. E., II, & Winchester, W. W., III. (2010). More than meets the eye: An engineering study to empirically examine the blending of real and virtual color spaces. Technical Papers, Proceedings of IEEE Virtual Reality, 10, 79–86. doi:10.1109/VR.2010.5444808
Harrison, B. L., & Vicente, K. J. (1996). An experimental evaluation of transparent menu usage. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Common Ground, CHI '96, 391–398. doi:10.1145/238386.238583
Hecht, E. (1987). Optics (2nd ed.). Reading, MA: Addison-Wesley.
Hirano, H. (1995). 5 pillars of the visual workplace. Cambridge, MA: Productivity Press.
ISO. (1993). Ergonomic requirements for office work with visual display terminals (VDTs). Part 3: Visual display requirements. ISO 9241-3. Geneva: ISO.
ISO. (2002). Lighting of work places. Part 1: Indoor. ISO 8995-1. Geneva: ISO.
ISO. (2004). Graphical symbols. Safety colours and safety signs. Part 2: Design principles for product safety labels. ISO 3864, Part 2. Geneva: ISO.
ISO. (2011a). Graphical symbols. Safety colours and safety signs. Part 1: Design principles for safety signs and safety markings. ISO 3864, Part 1. Geneva: ISO.
ISO. (2011b). Graphical symbols. Safety colours and safety signs. Part 4: Colorimetric and photometric properties of safety sign materials. ISO 3864, Part 4. Geneva: ISO.
Jankowski, J., Samp, K., Irzynska, I., Jozwowicz, M., & Decker, S. (2010). Integrating text with video and 3D graphics: The effects of text drawing styles on text readability. Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI '10), 1321–1330. doi:10.1145/1753326.1753524
Legge, G. E., Parish, H. D., Leubker, A., & Wurm, H. L. (1990). Psychophysics of reading. XI. Comparing color contrast and luminance contrast. Journal of the Optical Society of America A, 7(10), 2002–2010. doi:10.1364/JOSAA.7.002002
Leykin, A., & Tuceryan, M. (2004). Automatic determination of text readability over textured backgrounds for augmented reality systems. Proceedings of the 3rd IEEE/ACM International Symposium on Mixed and Augmented Reality, ISMAR '04, 224–230. doi:10.1109/ISMAR.2004.22
OSHA. (2007). Safety color code for marking physical hazards. U.S. Department of Labor Regulations (Standards-29 CFR), 1910.144. Washington, DC: OSHA.
Petkov, N., & Westenberg, M. A. (2003). Suppression of contour perception by band-limited noise and its relation to nonclassical receptive field inhibition. Biological Cybernetics, 88(3), 236–246. doi:10.1007/s00422-002-0378-2
Tanaka, K., Kishino, Y., Miyamae, M., Terada, T., & Nishio, S. (2008). An information layout method for an optical see-through head-mounted display focusing on the viewability. Proceedings of the 7th IEEE/ACM International Symposium on Mixed and Augmented Reality, 139–142. doi:10.1109/ISMAR.2008.4637340
Uva, A. E., Cristiano, S., Fiorentino, M., & Monno, G. (2010). Distributed design review using tangible augmented technical drawings. Computer-Aided Design, 42(5), 364–372. doi:10.1016/j.cad.2008.10.015
van Krevelen, D. W. F., & Poelman, R. (2010). A survey of augmented reality technologies, applications and limitations. The International Journal of Virtual Reality, 9(2), 1–20.

