1 Introduction
The ability to quantify the visual quality of an image in a
manner that agrees with human vision is a crucial step for
any system that processes consumer images. Over the past
several decades, research on this front has given rise to a
variety of computational methods of image quality assessment. So-called full-reference quality assessment methods
take as input an original image and a distorted version of
that image, and yield as output a prediction of the visual
quality of the distorted image relative to the original image.
The effectiveness of an image quality assessment method is
Paper 09070SSPR received May 1, 2009; revised manuscript received Jul. 15, 2009; accepted for publication Jul. 30, 2009; published online Jan. 7, 2010. This paper is a revision of a paper presented at the SPIE conference on Image Quality and System Performance VI, January 2009, San Jose, California. The paper presented there appears unrefereed in SPIE Proceedings Vol. 7242.
1017-9909/2010/19(1)/011006/21/$25.00 © 2010 SPIE and IS&T.
011006-1
Downloaded from SPIE Digital Library on 22 Feb 2010 to 139.78.79.37. Terms of Use: http://spiedl.org/terms
Larson and Chandler: Most apparent distortion: full-reference image quality assessment
Fig. 1 When judging the quality of a distorted image containing near-threshold distortions, one tends to rely primarily on visual detection, often using point-by-point comparisons with the original image in an attempt to locate any visible differences. (a) Close-up of original image Bikes; (b) close-up of image Bikes distorted via JPEG-2000 compression (DMOS = 17.7); and (c) close-up of image Bikes distorted by additive Gaussian white noise (DMOS = 23.5). DMOS values indicate differential mean opinion scores from the LIVE image database.25
Fig. 2 When judging the quality of a distorted image containing clearly visible suprathreshold distortions, one tends to rely much less on visual detection and much more on overall image appearance in an attempt to recognize image content in the presence of the dominating distortions. (a) Close-up of image Bikes distorted via Gaussian blurring; (b) and (d) close-ups of image Bikes distorted via JPEG-2000 compression; and (c) close-up of image Bikes distorted by additive Gaussian white noise. DMOS values indicate differential mean opinion scores from the LIVE image database.25
Journal of Electronic Imaging
combine PSNR with a measure of structure based on differences in wavelet modulus maxima corresponding to low- and high-frequency bands. Structural approaches to quality
assessment have also been applied to video40,41 and wireless applications.42
2.3 Methods Based on Other Measures
Other measures of image quality have been proposed that
operate based on statistical and/or information-theoretic
measures. For example, in Ref. 43, Sheikh and Bovik quantify image quality based on their visual information fidelity (VIF) criterion, which operates under the premise that the HVS has evolved based on
the statistical properties of the natural environment. Accordingly, the quality of the distorted image can be quantified based on the amount of information it provides about
the original. VIF models images as realizations of a mixture
of marginal Gaussian densities of wavelet subbands, and
quality is then determined based on the mutual information
between the subband coefficients of the original and distorted images.
In Ref. 44, Liu and Yang apply supervised learning to
derive a measure of image quality based on decision fusion.
A training set of images and subjective ratings is used to
determine an optimal linear combination of four methods of
quality assessment: PSNR, SSIM, VIF, and VSNR. The
learning is performed via canonical correlation analysis and
images/subjective ratings from the Laboratory for Image
and Video Engineering (LIVE) image database25 (University of Texas at Austin) and the A57 image database.20 The
authors demonstrate that the resulting approach is competitive with VIF.
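The paper learns the combination via canonical correlation analysis; as a hedged sketch of the same decision-fusion idea, the weights of a linear combination of metric scores can be fit with ordinary least squares (a simpler stand-in; the function names and toy data below are illustrative, not from Ref. 44):

```python
import numpy as np

# Illustrative decision fusion: learn a linear combination of several
# quality metrics' scores to predict subjective ratings. Least squares
# is used here as a stand-in for the canonical correlation analysis
# described in the text.
def fit_fusion_weights(metric_scores, subjective_ratings):
    """metric_scores: (n_images, n_metrics); returns (weights, intercept)."""
    X = np.column_stack([metric_scores, np.ones(len(metric_scores))])
    w, *_ = np.linalg.lstsq(X, subjective_ratings, rcond=None)
    return w[:-1], w[-1]

def fused_score(weights, intercept, scores):
    return scores @ weights + intercept

# Toy training set: four metric outputs per image with known ratings.
rng = np.random.default_rng(0)
scores = rng.random((20, 4))
true_w = np.array([0.5, 1.0, -0.3, 0.2])
ratings = scores @ true_w + 0.1
w, b = fit_fusion_weights(scores, ratings)
```

Because the toy ratings are exactly linear in the metric scores, the fitted weights recover the generating coefficients.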
In Ref. 24, Shnayderman, Gusev, and Eskicioglu measure image quality based on a singular value decomposition
(SVD). The Euclidean distance is measured between the
singular values of an original image block and the singular
values of the corresponding distorted image block; the collection of block-wise distances constitutes a local distortion
map. An overall scalar value of image quality is computed
as the average absolute difference between each block's
distance and the median distance over all blocks. Shnayderman, Gusev, and Eskicioglu report that their SVD-based
method performs better than SSIM on a suite of test images.
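The block-wise SVD distance described above can be sketched as follows (an illustrative implementation assuming 8 × 8 blocks; the function name and details are ours, not Ref. 24's):

```python
import numpy as np

# Sketch of the SVD-based measure described above: the distortion map
# holds the Euclidean distance between the singular values of
# corresponding blocks, and the global score is the mean absolute
# deviation of those distances from their median.
def msvd(original, distorted, block=8):
    h, w = original.shape
    dists = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            s_org = np.linalg.svd(original[i:i+block, j:j+block],
                                  compute_uv=False)
            s_dst = np.linalg.svd(distorted[i:i+block, j:j+block],
                                  compute_uv=False)
            dists.append(np.linalg.norm(s_org - s_dst))
    dists = np.array(dists)
    return np.mean(np.abs(dists - np.median(dists)))

img = np.arange(256.0).reshape(16, 16)
```

Identical images yield a zero score, and any per-block change in the singular values raises it.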
2.4 Summary of Existing Methods
Methods that operate based only on the energy of the distortions, such as MSE and PSNR, are attractive due to their
mathematical simplicity. However, these methods have also
been shown to be relatively poor predictors of visual
quality,45 particularly when comparing images containing
different types of distortions. Methods that take into account properties of the human visual system have demonstrated great success at predicting quality for images containing near-threshold distortions. However, these methods
generally perform less well for highly distorted images containing suprathreshold distortions unless properties of suprathreshold vision are also taken into account (e.g., Refs. 19 and 20). In contrast, methods based on structural and/or
statistical principles have demonstrated success for images
containing suprathreshold distortions. However, because
3 Algorithm
[Fig. 3 flow diagram: both images are converted to perceived luminance/contrast (conversion to L and CSF filtering); the images are subtracted to form the error image; local RMS contrasts (C_org for the original, a modified RMS contrast C_err for the error) are compared in log space, ln(C_err) versus ln(C_org), to classify each block as visible or not visible; the resulting visibility map weights the local squared error (LMSE, D), and the 2-norm of the weighted map yields d_detect.]
Fig. 3 Evolution of original and distorted image Cactus during the calculation of ddetect. Also shown is
Cerr plotted versus Corg in log space and the corresponding detection thresholds used to determine
visibility.
L = (b + kI)^γ,   (1)

where L denotes the luminance image, and where the parameters b, k, and γ are constants specific to the device on which the images are to be displayed. For 8-bit pixel values and an sRGB display,46 the values for these parameters are given by b = 0, k = 0.02874, and γ = 2.2. Equation (1) is applied to both I_org and I_dst to yield L_org and L_dst, respectively.
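As a minimal sketch, Eq. (1) with the stated sRGB constants might be implemented as follows (the function name is ours):

```python
import numpy as np

# Eq. (1): convert 8-bit pixel values to luminance for an sRGB-like
# display, using the constants given in the text
# (b = 0, k = 0.02874, gamma = 2.2).
def pixels_to_luminance(I, b=0.0, k=0.02874, gamma=2.2):
    return (b + k * I.astype(np.float64)) ** gamma

I = np.array([[0, 128, 255]], dtype=np.uint8)
L = pixels_to_luminance(I)
```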
Next, we take into account the nonlinear HVS response to luminance by converting luminances into perceived luminances (relative lightness) via

L' = L^{1/3}.   (2)

[Eq. (3): the contrast sensitivity function H(f, θ), a piecewise function of radial spatial frequency f (in cycles/degree) and orientation θ with a separate "otherwise" branch; see Appendix A in Sec. 6.]
The filter is applied in the frequency domain via Î = F⁻¹[Ĥ(u, v) F(L')], where F and F⁻¹ denote the DFT and inverse DFT, respectively. The quantity Ĥ(u, v) denotes a DFT-based version of H(f, θ), where (u, v) are the DFT indices (see Appendix A in Sec. 6). At this point, Î_org and Î_err represent the original and error images with values that are linearly proportional to both perceived luminance and perceived contrast. Î_err can be considered to be the distortions in the image that the HVS could detect if the distortions were viewed against a uniform background (i.e., no masking) rather than being viewed against the image.
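The mechanics of this DFT-domain filtering step can be sketched as follows; note that the band-pass filter below is a generic stand-in chosen only to illustrate the operation, not the paper's CSF:

```python
import numpy as np

# Sketch of the frequency-domain filtering step: I = F^-1[ H_hat * F(L) ].
# The paper's CSF H(f, theta) is defined in Appendix A; a generic
# band-pass shape is used here purely for illustration.
def csf_filter(L, csf_hat):
    spectrum = np.fft.fft2(L)
    return np.real(np.fft.ifft2(csf_hat * spectrum))

M, N = 64, 64
u = np.fft.fftfreq(M)[:, None]
v = np.fft.fftfreq(N)[None, :]
r = np.hypot(u, v)                     # normalized radial frequency
csf_hat = r * np.exp(-8.0 * r)         # assumed band-pass shape (not the paper's CSF)
L = np.random.default_rng(1).random((M, N))
out = csf_filter(L, csf_hat)
```

Because the stand-in filter has zero response at DC, the filtered image has (numerically) zero mean, mirroring the removal of the mean luminance term.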
Contrast masking. To account for the fact that the presence of an image can reduce the detectability of distortions, we employ a simple spatial-domain measure of contrast masking. First, a local contrast map is computed for the original image by dividing Î_org into 16 × 16 blocks with 75% overlap between neighboring blocks, and then measuring the rms contrast of each block, where rms contrast is measured in the lightness domain. The rms contrast of block p of Î_org is computed via

C_org(p) = σ_org(p) / μ_org(p),   (4)

where σ_org(p) and μ_org(p) denote the standard deviation and mean of block p. A modified rms contrast of the error is measured analogously via

C_err(p) = σ_err(p) / μ_org(p).   (5)

The two contrasts are compared in log space to yield the visibility map:

ξ(p) = { ln C_err(p) − ln C_org(p),   if ln C_err(p) > ln C_org(p),
       { 0,                            otherwise.   (6)

The overall detection-based distortion is computed by combining the visibility map with a map of local errors:

d_detect = [ (1/P) Σ_p (ξ(p) D(p))² ]^{1/2},   (7)

where P denotes the total number of blocks, and where D(p) is the local squared error of block p measured in the lightness domain:

D(p) = (1/16²) Σ_{(i,j)∈N_p} Î_err(i, j)².   (8)
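The contrast-masking stage described above can be sketched as follows, assuming 16 × 16 blocks with a step of 4 pixels (75% overlap); the handling of near-zero means and contrasts via a small epsilon is our assumption:

```python
import numpy as np

# Sketch of the masking stage: per-block mean and standard deviation,
# rms contrasts of original and error, and a visibility value per block
# wherever the log contrast of the error exceeds that of the original.
def block_stats(img, block=16, step=4):
    means, stds = [], []
    h, w = img.shape
    for i in range(0, h - block + 1, step):
        for j in range(0, w - block + 1, step):
            blk = img[i:i+block, j:j+block]
            means.append(blk.mean())
            stds.append(blk.std())
    return np.array(means), np.array(stds)

def detect_map(L_org, L_err):
    mu_org, sigma_org = block_stats(L_org)
    _, sigma_err = block_stats(L_err)
    eps = 1e-12                      # assumed guard against log(0)
    C_org = sigma_org / (mu_org + eps)
    C_err = sigma_err / (mu_org + eps)
    return np.where(np.log(C_err + eps) > np.log(C_org + eps),
                    np.log(C_err + eps) - np.log(C_org + eps), 0.0)

rng = np.random.default_rng(2)
L_org = rng.random((32, 32)) + 1.0
xi_clean = detect_map(L_org, np.zeros_like(L_org))
xi_noisy = detect_map(L_org, rng.normal(0.0, 0.5, L_org.shape))
```

A zero error image produces an all-zero visibility map, while strong noise exceeds the masking contrast of most blocks.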
Fig. 4 Perceived distortion in the high-quality regime is determined based largely on masking. Given an original image (a), and a distorted image (b), a map denoting the locations of visible distortions is computed (c), which is then multiplied by local MSE measured in the lightness domain (d). The resulting visibility-weighted local MSE (e) is used to determine d_detect. In this image, the only visible distortions occur in the form of blocking in the background and ringing around the branches, whereas the worms and the interior of the tree mask these errors. Because local MSE does not take into account masking, the local MSE map in (d) indicates that the greatest distortions occur in the areas of greatest energy. However, by augmenting local MSE with the distortion visibility map, the final visibility-weighted local MSE map (e) accurately reflects the locations and amounts of visible distortions.
Figures 4(c)-4(e) show the computed visibility map, local MSE map, and visibility-weighted local MSE map ξ(p) D(p), respectively. Notice from Fig. 4(c) that the visibility map does well at capturing the visible artifacts. Notice from Fig. 4(d) that because local MSE does not take into account masking, the local MSE map indicates that the greatest distortions occur in the areas of greatest energy (worms, tree), whereas the distortions in these regions are masked. Finally, notice from Fig. 4(e) that the visibility-weighted local MSE map performs well at indicating both the locations and perceived intensities of the visible distortions. In particular, this latter map captures both the blocking in the background and the ringing around the branches.
Although the spatial-domain model of masking employed here is certainly not a proper model of how masking
is effected in the HVS, it provides a reasonable tradeoff
between accuracy and computational efficiency. In particular, standard deviation can be approximated efficiently using a fast, separable convolution. Furthermore, although
MSE on its own is not always a veridical indicator of perceived distortion, when MSE is computed locally and combined with a visibility map, it can perform well in the highquality regime. In Sec. 4.5, we demonstrate that, despite its
simplicity, this model is quite effective at predicting the
[Fig. 5 flow diagram: the original (org) and distorted (dst) images are decomposed into log-Gabor subbands at five scales and four orientations; block-based subband statistics (variance, skewness, kurtosis) of the distorted image are differenced against those of the original to form a statistical difference map η(p); the 2-norm of this map yields d_appear.]

The log-Gabor filters are defined in the frequency domain via

G_{s,o}(r, θ) = exp[ −(log(r/r_s))² / (2(log(σ_s/r_s))²) ] exp[ −(θ − θ_o)² / (2σ_o²) ],   (9)

where r_s and θ_o denote the center frequency of scale s and the center orientation of band o, and where σ_s and σ_o control the radial and angular bandwidths. The statistical difference map is computed via

η(p) = Σ_{s=1}^{5} Σ_{o=1}^{4} w_s [ |σ_{s,o}^{org}(p) − σ_{s,o}^{dst}(p)| + 2|ς_{s,o}^{org}(p) − ς_{s,o}^{dst}(p)| + |κ_{s,o}^{org}(p) − κ_{s,o}^{dst}(p)| ],   (10)

where σ_{s,o}(p), ς_{s,o}(p), and κ_{s,o}(p) denote the block-based standard deviation, skewness, and kurtosis of the subband coefficients at scale s and orientation o, and where w_s is a scale-dependent weight. The overall appearance-based distortion is then given by

d_appear = [ (1/P) Σ_p η(p)² ]^{1/2}.   (11)
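One scale/orientation of this appearance stage might be sketched as follows; the log-Gabor parameter values (center frequency and bandwidth terms) are illustrative assumptions, not the paper's:

```python
import numpy as np

# Build one frequency-domain log-Gabor filter (cf. Eq. (9)), filter an
# image, and compare block statistics as in Eq. (10) for a single block.
def log_gabor(shape, r_s=0.25, sigma_ratio=0.55, theta_o=0.0, sigma_theta=0.6):
    M, N = shape
    u = np.fft.fftfreq(M)[:, None]
    v = np.fft.fftfreq(N)[None, :]
    r = np.hypot(u, v)
    theta = np.arctan2(v, u)
    radial = np.exp(-np.log(np.maximum(r, 1e-12) / r_s) ** 2
                    / (2 * np.log(sigma_ratio) ** 2))
    radial[0, 0] = 0.0                       # zero DC response
    angular = np.exp(-((theta - theta_o) ** 2) / (2 * sigma_theta ** 2))
    return radial * angular

def subband(img, G):
    # Magnitude of the complex subband (even/odd responses combined).
    return np.abs(np.fft.ifft2(G * np.fft.fft2(img)))

def stat_diff(a, b):
    # One-block version of Eq. (10): weighted std/skew/kurtosis differences.
    def stats(x):
        mu, sd = x.mean(), x.std()
        z = (x - mu) / (sd + 1e-12)
        return sd, (z ** 3).mean(), (z ** 4).mean()
    s1, k1, q1 = stats(a)
    s2, k2, q2 = stats(b)
    return abs(s1 - s2) + 2 * abs(k1 - k2) + abs(q1 - q2)

rng = np.random.default_rng(3)
img = rng.random((64, 64))
G = log_gabor(img.shape)
eta_same = stat_diff(subband(img, G), subband(img, G))
```

Identical subbands give a zero statistical difference, as expected for an undistorted image.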
Fig. 6 Perceived distortion in the low-quality regime is determined based largely on changes in the appearance of the image's subject matter. Given an original image (a), and a distorted image (b), a map of local changes in log-Gabor filter-response statistics is computed (c). This statistical difference map is used to determine d_appear. Note that the difference map is based on statistics from multiple scales, so distortions are not completely localized.
are combined to yield an overall measure of perceived distortion, applicable for all images, based on how apparently
distorted the image appears.
High-quality images should obtain their rating mostly
from ddetect and low-quality images from dappear. We hypothesize that in the transition between high- and low-quality
assessment, the HVS uses a mixture of strategies. This hypothesis makes sense for images in which some regions
may appear to be of high quality, while other regions may
contain clearly suprathreshold distortions. Images compressed with JPEG and JPEG-2000, for example, fall into
this category. Figure 7 shows an example. In comparison to the original image shown in Fig. 7(a), the distorted image shown in Fig. 7(b) contains a mixture of what one would consider high-quality and low-quality regions. The left sidewall of the bridge and the mountain in the background exhibit statistical appearance changes (severe blurring),
while the high-contrast trees and brighter areas of the sky
largely mask the distortions. In addition, the middle slats of
the bridge look somewhat normal, whereas the more distant
slats appear clearly degraded.
To capture the interacting strategies, we propose using a weighted geometric mean of d_detect and d_appear, given by

MAD = (d_detect)^α (d_appear)^{1−α},   (12)

where MAD ∈ [0, ∞) denotes the overall perceived distortion. The weight α ∈ [0, 1] is chosen based on the predicted overall level of distortion. For low levels of distortion, MAD should obtain its value mostly from d_detect (i.e., α should approach a value of 1). For high levels of distortion, MAD should obtain its value mostly from d_appear (α should approach a value of 0).
Although the optimal technique of selecting α remains an area of future research, we have found that selecting α based on d_detect can yield good performance. Here, α is computed via

α = 1 / [1 + β₁ (d_detect)^{β₂}],   (13)
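Equations (12) and (13) can be sketched directly; the β₁ and β₂ values below are placeholders chosen only to illustrate the behavior (the paper's fitted constants are not reproduced here):

```python
# Sketch of the MAD combination: blend d_detect and d_appear with a
# weight alpha that approaches 1 for low distortion (detection-driven)
# and 0 for high distortion (appearance-driven).
def mad_index(d_detect, d_appear, beta1=0.5, beta2=0.1):
    alpha = 1.0 / (1.0 + beta1 * d_detect ** beta2)
    return d_detect ** alpha * d_appear ** (1.0 - alpha), alpha

score_low, alpha_low = mad_index(d_detect=0.01, d_appear=0.5)
score_high, alpha_high = mad_index(d_detect=50.0, d_appear=5.0)
```

As d_detect grows, alpha shrinks, shifting the geometric mean toward d_appear.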
f(x) = (τ₁ − τ₂) / (1 + exp[−(x − τ₃)/|τ₄|]) + τ₂.   (14)
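The four-parameter logistic of Eq. (14), used to linearize objective scores before computing correlation and outlier statistics, can be sketched with placeholder τ values:

```python
import numpy as np

# Eq. (14): four-parameter logistic transform of objective scores x.
# The tau values below are arbitrary placeholders for illustration;
# in practice they are fitted to each database.
def logistic_fit(x, tau1, tau2, tau3, tau4):
    return (tau1 - tau2) / (1.0 + np.exp(-(x - tau3) / np.abs(tau4))) + tau2

x = np.linspace(0.0, 100.0, 11)
y = logistic_fit(x, tau1=100.0, tau2=0.0, tau3=50.0, tau4=10.0)
```

The transform is monotonic, so rank-order metrics such as SROCC are unaffected while linear CC benefits from the nonlinearity correction.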
Table 1 Performances of MAD and other quality assessment algorithms on images from the TID,
LIVE, Toyama, and CSIQ databases. The best performances are bolded. Italic entries are statistically
the same as the best performer for the particular database.
                  PSNR     SSIM     MS-SSIM  NQM      VSNR     VIF      MAD
CC      TID       0.5355   0.6520   0.8390   0.6103   0.6820   0.8055   0.8306
        LIVE      0.8707   0.9378   0.9330   0.9096   0.9233   0.9595   0.9683
        Toyama    0.6353   0.7970   0.8924   0.8917   0.8708   0.9136   0.8951
        CSIQ      0.7998   0.8149   0.8972   0.7418   0.8002   0.9252   0.9502
        Average   0.7103   0.8004   0.8904   0.7883   0.8191   0.9009   0.9110
SROCC   TID       0.5245   0.6448   0.8528   0.6243   0.7046   0.7496   0.8340
        LIVE      0.8763   0.9473   0.9437   0.9051   0.9278   0.9633   0.9675
        Toyama    0.6126   0.7864   0.8864   0.8886   0.8610   0.9080   0.8908
        CSIQ      0.8056   0.8367   0.9137   0.7401   0.8105   0.9192   0.9466
        Average   0.7048   0.8038   0.8992   0.7895   0.8260   0.8850   0.9097
OR      LIVE      68.16%   59.18%   61.87%   63.80%   58.79%   54.56%   41.46%
        Toyama    22.02%   14.29%   8.33%    6.55%    9.52%    5.36%    7.14%
        CSIQ      34.30%   33.49%   24.48%   37.30%   31.06%   22.63%   18.01%
        Average   41.49%   35.65%   31.56%   35.88%   33.13%   27.52%   22.21%
OD      LIVE      4943.3   2814.1   2960.0   3616.8   3246.8   1890.4   1369.8
        Toyama    12.9526  6.6174   1.8967   1.6174   3.6299   1.3249   1.7753
        CSIQ      3178.0   2896.2   1528.4   4351.9   3325.3   1218.2   626.2
be given around the opinion scores associated with the variability of observers. This variability is normally quantified using the standard deviation of all subjective ratings for a particular image, σ_subj. With this in mind, the outlier ratio is defined as61
R_out = N_false / N_total,   (15)
where N_false is the number of predictions outside two standard deviations (±2σ_subj) of the DMOS or MOS, and N_total is the total number of predicted ratings. The range of ±2σ_subj was chosen because it contains 95% of all the subjective quality scores for a given image.
In addition to knowing if a predicted rating is an outlier,
it is also informative to know how far outside of the error bars (±2σ_subj) the outlier falls. To quantify this, we propose a new measure, termed the outlier distance, which is the distance from an outlier to the closest error bar. The outlier distance d_out is defined as
d_out = Σ_{x∈X_false} ( |f(x) − OS(x)| − 2σ_subj(x) ),   (16)

where X_false denotes the set of outlier predictions, f(x) the logistically transformed objective prediction, and OS(x) the subjective opinion score of image x.
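Equations (15) and (16) together might be sketched as follows (variable names are ours):

```python
import numpy as np

# Sketch of the outlier ratio and outlier distance: a prediction is an
# outlier when it falls outside +/- 2*sigma_subj of the subjective
# score; the outlier distance sums how far each outlier lies beyond
# its nearest error bar.
def outlier_stats(predicted, subjective, sigma_subj):
    err = np.abs(predicted - subjective)
    outliers = err > 2.0 * sigma_subj
    r_out = outliers.sum() / len(predicted)
    d_out = np.sum(err[outliers] - 2.0 * sigma_subj[outliers])
    return r_out, d_out

pred = np.array([10.0, 20.0, 35.0])
subj = np.array([10.0, 21.0, 30.0])
sig = np.array([1.0, 1.0, 1.0])
r_out, d_out = outlier_stats(pred, subj, sig)
```

In this toy case only the third prediction (error 5, error bar 2) is an outlier, so the ratio is 1/3 and its distance beyond the bar is 3.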
Fig. 8 The overall logistically transformed fits for MAD on the Toyama (R = 0.895), LIVE (R = 0.968), TID (R = 0.831), and CSIQ (R = 0.950) image databases, plotted as MOS or DMOS versus linearized MAD. The average σ_subj is 0.70, 15.8, 0.79, and 9.33 for each database, respectively. All the databases have differing dynamic ranges for MOS and DMOS, as seen from the differing y axes.
Table 2 Statistical significance relationships (ratio of residual variances) between the algorithms on all the databases. A value less than 1 denotes that the algorithm in the row has smaller residuals than the algorithm in the column; a value greater than 1 denotes larger residuals. Bold entries are statistically significant with confidence greater than 95%. Italicized entries indicate the algorithm in the row is statistically worse than the algorithm in the column. An * appears next to the statistically best-performing algorithms in the database.

TID       PSNR    SSIM    MS-SSIM  NQM     VSNR    VIF     MAD
PSNR      -       1.2406  2.4088   1.1365  1.3334  2.0306  2.2994
SSIM      0.8060  -       1.9416   0.9161  1.0748  1.6868  1.8535
MS-SSIM*  0.4151  0.5150  -        0.4718  0.5535  0.8430  0.9546
NQM       0.8799  1.0916  2.1194   -       1.1732  1.7867  2.0232
VSNR      0.7500  0.9304  1.8065   0.8524  -       1.5229  1.7245
VIF       0.4925  0.6110  1.1862   0.5597  0.6566  -       1.1324
MAD*      0.4349  0.5395  1.0475   0.4943  0.5799  0.8831  -
JB stat.  20.1    98.9    31.6     282.6   559.2   283.1   438.3

LIVE      PSNR    SSIM    MS-SSIM  NQM     VSNR    VIF     MAD
PSNR      -       2.0052  1.8680   1.4012  1.6394  3.0448  3.8744
SSIM      0.4987  -       0.9316   0.6988  0.8176  1.5185  1.9822
MS-SSIM   0.5353  1.0734  -        0.7501  0.8776  1.6300  2.0741
NQM       0.7137  1.4310  1.3331   -       1.1700  2.1730  2.7650
VSNR      0.6100  1.2231  1.1394   0.8547  -       1.8573  2.3633
VIF       0.3284  0.6586  0.6135   0.4602  0.5384  -       1.2724
MAD*      0.2581  0.5176  0.4821   0.3617  0.4231  0.7859  -
JB stat.  11.8    2.6     7.1      180.0   20.0    4.8     123.5

Toyama    PSNR    SSIM    MS-SSIM  NQM     VSNR    VIF     MAD
PSNR      -       1.6852  2.9284   2.9284  2.4672  3.6057  3.0002
SSIM      0.6116  -       1.7909   1.7909  1.5088  2.2051  1.8348
MS-SSIM*  0.3415  0.5584  -        1.0000  0.8425  1.2313  1.0245
NQM*      0.3415  0.5584  1.0000   -       0.8425  1.2313  1.0245
VSNR      0.4053  0.6628  1.1869   1.1869  -       1.4615  1.2160
VIF*      0.2773  0.4535  0.8122   0.8122  0.6842  -       0.8321
MAD*      0.3333  0.5450  0.9761   0.9761  0.8223  1.2018  -
JB stat.  8.0     2.9     0.0      2.8     18.1    2.9     13.0

CSIQ      PSNR    SSIM    MS-SSIM  NQM     VSNR    VIF     MAD
PSNR      -       1.0730  1.8486   0.8012  1.0018  2.5020  3.7087
SSIM      0.9320  -       1.7229   0.7467  0.9337  2.8318  3.4564
MS-SSIM   0.5410  0.5804  -        0.4334  0.5419  1.3534  2.0062
NQM       1.2481  1.8392  2.8073   -       1.2504  3.1228  4.6289
VSNR      0.9982  1.0710  1.8452   0.7997  -       2.4974  3.7020
VIF       0.3997  0.4289  0.7389   0.3202  0.4004  -       1.4823
MAD*      0.2696  0.2893  0.4985   0.2160  0.2701  0.6746  -
JB stat.  2.2     38.6    26.9     43.9    45.6    32.2    17.4
small, one can assume that the data follow a Gaussian distribution. Larger values of the JB statistic denote larger
deviations from Gaussianity.
Table 2 shows the summary for overall statistical performance of each algorithm on TID, LIVE, Toyama, and
CSIQ. Each entry is the ratio of the residual variance of the
algorithm in the row to the algorithm in the column. An
F-test was performed between each pair of algorithms.
Bold entries denote that the algorithm in the row has statistically smaller residuals than the algorithm in the column
with confidence greater than 95%. Italicized entries denote
that the algorithm in the row has statistically greater residual variances with the same confidence. Plain text entries denote that there is statistically no difference between
the residuals of the two predictions. Starred algorithms in
column 2 indicate that no other algorithm performs statistically better on the particular database.
Table 2 also contains the JB statistic measure of Gaussianity. Italicized JB entries denote that the residuals can be
deemed Gaussian with 99% confidence. A large JB statistic
indicates an increased departure from Gaussianity, usually due to the presence of outliers.
On TID, MAD and MS-SSIM are statistically the best-performing algorithms. It is notable that none of the algorithm residuals can be deemed Gaussian with high confidence. For instance, MAD has one of the highest JB statistics due to a large number of outliers at high quality (see Fig. 8). It is surprising, then, that MAD stays competitive with MS-SSIM, which has considerably more Gaussian residuals. That is to say, the variance of MAD is inflated by the existence of outliers, while the variance of MS-SSIM arises from a more even, but wider, clustering across quality ranges.
On LIVE, MAD is statistically the best-performing algorithm. Again, MAD is also deemed highly non-Gaussian, as denoted by the JB statistic. The high JB value is attributable to several outliers (see Fig. 8) in the LIVE database group that inflate the variance of MAD, similar to the high-quality outliers on TID.
On CSIQ, MAD is statistically the best-performing algorithm. Notice that MAD has slightly more Gaussian residuals than every algorithm except PSNR. While none of
the algorithms except PSNR can be considered Gaussian,
the JB statistics between algorithms are much more consistent. There are no algorithms that are wildly non-Gaussian
as seen in other databases. Because of the similarity in
Gaussianity for the algorithms, CSIQ provides the most
unbiased application of the F-test. Under these conditions,
MAD performs well.
On Toyama, statistical significance is highly nonselective. MAD, MS-SSIM, NQM, and VIF have statistically the same performance. Even so, it is of note that MAD performs comparatively well. Toyama seems to show the least preference toward MAD, in contrast to the other database analyses. Perhaps Toyama shows the least preference because it contains the smallest number of distortion types. This is investigated in the next section.
In summary, MAD was shown to have significantly
smaller residuals on the CSIQ and LIVE databases. For
Toyama and TID, the residuals of MAD were shown to be
drawn from the same distribution as the best-performing
algorithm with 95% confidence. For Toyama and TID, there
Table 3 SROCC of MAD and other quality assessment algorithms on multiple databases using single types of distortion. Bold entries are the best performers in the database for the particular type of distortion. Italicized entries are the second-best performers. The last row shows the number of times the SROCC was above 0.95.

Distortion    DB    PSNR   SSIM   MS-SSIM  NQM    VSNR   VIF    MAD    Images
Blur          CSIQ  0.929  0.924  0.972    0.958  0.945  0.975  0.966  150
              LIVE  0.781  0.951  0.959    0.859  0.942  0.973  0.899  145
              TID   0.868  0.937  0.961    0.885  0.933  0.955  0.914  100
Awgn          CSIQ  0.936  0.925  0.947    0.939  0.924  0.957  0.960  150
              LIVE  0.985  0.969  0.973    0.986  0.979  0.985  0.971  145
              TID   0.911  0.825  0.809    0.768  0.773  0.880  0.863  100
Jpeg          CSIQ  0.888  0.922  0.962    0.953  0.900  0.970  0.966  150
              LIVE  0.881  0.975  0.979    0.957  0.966  0.984  0.949  175
              TOY   0.284  0.627  0.835    0.889  0.797  0.907  0.895  84
              TID   0.901  0.897  0.935    0.907  0.917  0.917  0.941  100
Jpeg2000      CSIQ  0.936  0.921  0.969    0.963  0.948  0.967  0.977  150
              LIVE  0.895  0.961  0.965    0.944  0.956  0.969  0.938  169
              TOY   0.860  0.915  0.947    0.904  0.925  0.956  0.955  84
              TID   0.830  0.877  0.974    0.953  0.952  0.971  0.972  100
Contrast      CSIQ  0.862  0.740  0.952    0.948  0.869  0.936  0.917  116
              TID   0.613  0.629  0.640    0.727  0.424  0.819  0.492  100
1/f noise     CSIQ  0.934  0.894  0.933    0.911  0.908  0.951  0.954  150
Jpeg2000      LIVE  0.893  0.955  0.930    0.800  0.905  0.965  0.883  145
 pack. loss   TID   0.777  0.847  0.852    0.726  0.791  0.851  0.840  100
Jpeg          TID   0.766  0.819  0.874    0.737  0.805  0.858  0.851  100
 pack. loss
Quantiz.      TID   0.870  0.804  0.854    0.821  0.827  0.796  0.850  100
Denoised      TID   0.938  0.926  0.957    0.945  0.929  0.919  0.945  100
Awgncolor     TID   0.907  0.825  0.806    0.749  0.779  0.878  0.839  100
Corrnoise     TID   0.923  0.849  0.820    0.772  0.766  0.870  0.898  100
Masknoise     TID   0.849  0.818  0.816    0.707  0.729  0.870  0.736  100
Hifreqnoise   TID   0.932  0.858  0.868    0.901  0.881  0.907  0.897  100
Impulse       TID   0.918  0.759  0.687    0.762  0.647  0.883  0.512  100
Pattern       TID   0.593  0.678  0.734    0.680  0.572  0.761  0.838  100
Block         TID   0.585  0.891  0.762    0.235  0.193  0.832  0.161  100
Mean shift    TID   0.697  0.757  0.737    0.525  0.372  0.513  0.589  100
# > 0.95            1      5      11       6      4      13     8      NA
Table 4 SROCC for d_detect and d_appear and other algorithms on images from the multiple databases containing mostly near-threshold distortions (DMOS or MOS in the top fifth of the quality range of the database) and images that are highly distorted (DMOS or MOS in the bottom fifth of the quality range of the database). Bold entries have the best performance, statistically.

                                 SSIM   MS-SSIM  NQM    VSNR   VIF    d_detect  d_appear  Images
High-quality  CSIQ (DMOS < 20)   0.576  0.717    0.430  0.563  0.649  0.684     0.471     316
images        LIVE (DMOS < 25)   0.532  0.532    0.375  0.634  0.557  0.696     0.182     193
              TID (MOS > 6)      0.318  0.216    0.473  0.338  0.508  0.334     0.258     227
Low-quality   CSIQ (DMOS > 65)   0.306  0.576    0.555  0.452  0.782  0.561     0.752     177
images        LIVE (DMOS > 80)   0.558  0.598    0.620  0.513  0.445  0.479     0.786     112
              TID (MOS < 2.5)    0.214  0.491    0.232  0.194  0.631  0.128     0.622     125
Table 5 SROCC/linear CC, respectively, for d_detect versus MAD, d_appear versus MAD, and d_appear versus d_detect on multiple databases. The mean and standard deviation of α on each database are also shown.

        d_detect vs. MAD  d_appear vs. MAD  d_appear vs. d_detect  α mean   α std
TID     0.8206/0.9111     0.9558/0.9895     0.6481/0.8784          0.394    0.0721
TOY     0.9912/0.9752     0.7402/0.8520     0.6552/0.9537          0.6493   0.0414
LIVE    0.9523/0.5439     0.9588/0.9787     0.8522/0.6073          0.4216   0.1207
CSIQ    0.9406/0.9050     0.9410/0.9903     0.8079/0.8866          0.4714   0.1735
Fig. 9 Example of measuring the local block-based contrast in an image containing a low-contrast object (here, a solid disk) placed on a high-contrast background, as is often found in natural images. In (a), the contrast of the block is low. In (c), the contrast of the block is high. In (b), the contrast of the block would normally be higher than the contrast measured in (a), despite the fact that this object border is not effective at masking.68 Here, we overcome this limitation by taking the effective local contrast to be the minimum contrast of all four subblocks. Thus in (b), since subblock p2,1 has the lowest contrast, the effective local contrast is measured to be just as low as that measured in (a).
The DFT indices (u, v) are converted to radial spatial frequency f in cycles per degree via f ∝ [(u/(M/2))² + (v/(N/2))²]^{1/2}, with the proportionality set by the assumed viewing distance and display resolution [Eq. (17)], and to orientation via θ = arctan(v/u) [Eq. (18)]. The DFT-based version of the contrast sensitivity function is then obtained by substituting these quantities into H, i.e., Ĥ(u, v) = H[f(u, v), θ(u, v)] [Eq. (19)].
A.2
given by
σ_org(p) = min{σ(p₁,₁), σ(p₁,₂), σ(p₂,₁), σ(p₂,₂)}.   (20)

The quantities σ(p₁,₁), σ(p₁,₂), σ(p₂,₁), and σ(p₂,₂) correspond to the standard deviations of the four subblocks of block p, as illustrated in Fig. 9.
This modified standard deviation is used to compensate for the fact that edges and object boundaries are not effective at masking, despite the fact that these regions exhibit high contrast.67 As shown in Fig. 9,68 by defining the effective contrast based on the minimum contrast of each block's four subblocks, the resulting contrast around edges/object borders is measured to be relatively low.
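The minimum-subblock rule and the situation of Fig. 9 can be sketched as:

```python
import numpy as np

# Effective standard deviation of a block as the minimum of the
# standard deviations of its four subblocks, so that an isolated edge
# inside the block does not inflate the masking estimate.
def effective_std(block):
    h, w = block.shape
    subs = [block[:h//2, :w//2], block[:h//2, w//2:],
            block[h//2:, :w//2], block[h//2:, w//2:]]
    return min(s.std() for s in subs)

# A block whose left half is flat and right half is textured: the plain
# std of the whole block is high, but the effective std stays low.
blk = np.zeros((16, 16))
blk[:, 8:] = np.random.default_rng(4).random((16, 8)) * 10.0
```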
A.3 Mapping Contrast Ratios to Visibility
To determine the locations at which the distortions in the distorted image are visible, Eq. (6) is used to compare C_err with C_org to yield the visibility map ξ(p), as described in Sec. 3.1.1.
The first statement in Eq. (6) handles the case in which the log contrast of the error is greater than the log contrast of the image in block p, and both of these contrasts are
c̃_{s,o}(u, v) = G_{s,o}( [(u/(M/2))² + (v/(N/2))²]^{1/2}, arctan(v/u) ) Ĩ(u, v),   (21)

where G_{s,o} denotes the log-Gabor filter of Eq. (9) evaluated at the DFT indices, and where Ĩ denotes the DFT of the original or distorted image.
This equation is equivalent to convolving I in the spatial domain with both even- and odd-symmetric log-Gabor filter kernels. The inverse DFT yields complex-valued subband coefficients c_{s,o} ∈ C^{M×N}, in which the real part of the coefficients corresponds to the even-symmetric filter outputs, and the imaginary part of the coefficients corresponds to the odd-symmetric filter outputs.
For each subband, we collapse the real and imaginary components of each coefficient (even and odd filter output) into a single magnitude via
|c_{s,o}(i, j)| = { Re[c_{s,o}(i, j)]² + Im[c_{s,o}(i, j)]² }^{1/2}.   (22)
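This per-coefficient magnitude is a minimal operation; a sketch:

```python
import numpy as np

# Collapse complex subband coefficients into magnitudes: the real part
# carries the even-symmetric response and the imaginary part the
# odd-symmetric response.
def collapse_magnitude(coeffs):
    return np.sqrt(coeffs.real ** 2 + coeffs.imag ** 2)

c = np.array([[3.0 + 4.0j, 1.0 + 0.0j]])
mag = collapse_magnitude(c)
```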
and evaluation," in Vision Models for Target Detection and Recognition, E. Peli, Ed., pp. 245-283, World Scientific, New York (1995).
9. C. J. van den Branden Lambrecht, "A working spatio-temporal model of the human visual system for image representation and quality assessment applications," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., pp. 2291-2294 (May 1996).
10. Y. Lai and C. J. Kuo, "Image quality measurement using the Haar wavelet," Proc. SPIE 3169, 127 (1997).
11. M. Miyahara, K. Kotani, and V. R. Algazi, "Objective picture quality scale (PQS) for image coding," IEEE Trans. Commun. 46(9), 1215-1226 (1998).
12. W. Osberger, N. Bergmann, and A. Maeder, "An automatic image quality assessment technique incorporating higher level perceptual factors," in Proc. IEEE Int. Conf. Image Process. 3, 414-418 (1998).
13. S. Winkler, "A perceptual distortion metric for digital color images," in Proc. IEEE Int. Conf. Image Process. 3, 399-403 (1998).
14. A. Bradley, "A wavelet visible difference predictor," IEEE Trans. Image Process. 8, 717-730 (May 1999).
15. S. Winkler, "Visual quality assessment using a contrast gain control model," in IEEE Signal Process. Soc. Workshop Multimedia Signal Process., pp. 527-532 (Sep. 1999).
16. J. Lubin, M. Brill, A. De Vries, and O. Finard, "Method and apparatus for assessing the visibility of differences between two image sequences," U.S. Patent 5,974,159 (1999).
17. JNDmetrix technology, Sarnoff Corp., see http://www.sarnoff.com/.
18. N. Damera-Venkata, T. D. Kite, W. S. Geisler, B. L. Evans, and A. C. Bovik, "Image quality assessment based on a degradation model," IEEE Trans. Image Process. 9, 636-650 (2000).
19. M. Carnec, P. L. Callet, and D. Barba, "An image quality assessment method based on perception of structural information," in ICIP 2003 2, 185-188 (2003).
20. See http://foulard.ece.cornell.edu/VSNR.html.
21. Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Trans. Image Process. 13, 600-612 (2004).
22. H. R. Sheikh, A. C. Bovik, and G. de Veciana, "An information fidelity criterion for image quality assessment using natural scene statistics," IEEE Trans. Image Process. 14(12), 2117-2128 (2005).
23. G. Zhai, W. Zhang, X. Yang, and Y. Xu, "Image quality assessment metrics based on multi-scale edge presentation," in IEEE Workshop Signal Process. Syst. Design Implement., pp. 331-336 (2005).
24. A. Shnayderman, A. Gusev, and A. M. Eskicioglu, "An SVD-based grayscale image quality measure for local and global assessment," IEEE Trans. Image Process. 15(2), 422-429 (2006).
25. H. R. Sheikh, Z. Wang, A. C. Bovik, and L. K. Cormack, "Image and video quality assessment research at LIVE," see http://live.ece.utexas.edu/research/quality/.
26. R. L. DeValois and K. K. DeValois, Spatial Vision, Oxford University Press, Oxford, UK (1990).
27. J. Schulkin, Cognitive Adaptation, 1st ed., Cambridge University Press, Boston, MA (2008).
28. C. C. Taylor, Z. Pizlo, J. P. Allebach, and C. A. Bouman, "Image quality assessment with a Gabor pyramid model of the human visual system," Proc. SPIE 3016(1), 58-69 (1997).
29. P. Le Callet, A. Saadane, and D. Barba, "Frequency and spatial pooling of visual differences for still image quality assessment," Proc. SPIE 3959, 595-603 (2000).
30. T. N. Pappas, T. A. Michel, and R. O. Hinds, "Supra-threshold perceptual image coding," in Proc. ICIP, pp. 237-240 (1996).
31. M. P. Eckert and A. P. Bradley, "Perceptual quality metrics applied to
still image compression, Signal Process. 70, 177200 1998.
A. Ninassi, O. L. Meur, P. L. Callet, and D. Barba, Which semi-local
visual masking model forwavelet based image quality metric? in
ICIP, pp. 11801183 2008.
N. Graham, Visual Pattern Analyzers, Oxford University Press, New
York 1989.
A. B. Watson and J. A. Solomon, A model of visual contrast gain
control and pattern masking, J. Opt. Soc. Am. A 14, 23782390
1997.
W. Zeng, S. Daly, and S. Lei, An overview of the visual optimization tools in jpeg 2000, Signal Process. Image Commun. 17, 85104
2001.
D. M. Chandler and S. S. Hemami, Dynamic contrast-based quantization for lossy wavelet image compression, IEEE Trans. Image
Process. 144, 397410 2005.
Z. Wang, E. P. Simoncelli, and A. C. Bovik, Multi-scale structural
similarity for image quality assessment, in Proc. Asilomar Conf.
Signals, Syst. Computers, Vol. 2, p. 1398 Nov. 2003.
C. L. Yang, W. R. Gao, and L. M. Po, Discrete wavelet transformbased structural similarity for image quality assessment, in ICIP, pp.
377380 2008.
M. Zhang and X. Mou, A psychovisual image quality metric based
on multi-scale structure similarity, in ICIP, pp. 381384 2008.
Z. Wang, L. Lu, and A. C. Bovik, Video quality assessment based on
structural distortion measurement, Signal Process. Image Commun.
192, pp. 121132 2004.
41. S. Ye, K. Su, and C. Xiao, "Video quality assessment based on edge structural similarity," Image Signal Process. Congress 3, 445–448 (2008).
42. U. Engelke and H. J. Zepernick, "Pareto optimal weighting of structural impairments for wireless imaging quality assessment," in ICIP, pp. 373–376 (2008).
43. H. R. Sheikh and A. C. Bovik, "Image information and visual quality," IEEE Trans. Image Process. 15(2), 430–444 (2006).
44. M. Liu and X. Yang, "A new image quality approach based on decision fusion," in Fuzzy Syst. Knowledge Discovery, 4th Intl. Conf., 4, 10–14 (2008).
45. B. Girod, "What's wrong with mean-squared error?" in Digital Images and Human Vision, A. B. Watson, Ed., pp. 207–220 (1993).
46. M. Stokes, M. Anderson, S. Chandrasekar, and R. Motta, "A standard default color space for the Internet: sRGB," see http://www.w3.org/Graphics/Color/sRGB (1996).
47. S. J. Daly, "Subroutine for the generation of a human visual contrast sensitivity function," Eastman Kodak Tech. Report 233203y (1987).
48. T. Mitsa and K. Varkur, "Evaluation of contrast sensitivity functions for the formulation of quality measures incorporated in halftoning algorithms," in IEEE Intl. Conf. Acoustics, Speech, Signal Process., pp. 301–304 (1993).
49. D. M. Chandler and S. S. Hemami, "Effects of natural images on the detectability of simple and compound wavelet subband quantization distortions," J. Opt. Soc. Am. A 20, 1167–1180 (July 2003).
50. A. K. Jain and F. Farrokhnia, "Unsupervised texture segmentation using Gabor filters," Pattern Recogn. 24, 1167–1186 (May 1991).
51. D. J. Field, "Relations between the statistics of natural images and the response properties of cortical cells," J. Opt. Soc. Am. A 4, 2379–2394 (1987).
52. B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: a strategy employed by V1?" Vision Res. 37, 3311–3325 (1997).
53. F. A. A. Kingdom, A. Hayes, and D. J. Field, "Sensitivity to contrast histogram differences in synthetic wavelet-textures," Vision Res. 41, 585–598 (1995).
54. D. M. Chandler and S. S. Hemami, "VSNR: a wavelet-based visual signal-to-noise ratio for natural images," IEEE Trans. Image Process. 16(9), 2284–2298 (2007).
55. N. Ponomarenko, F. Battisti, K. Egiazarian, J. Astola, and V. Lukin, "Metrics performance comparison for color image database," in 4th Intl. Workshop Video Process. Quality Metrics Consumer Electron., 6, p. 6 (Jan. 2009).
56. H. R. Sheikh, Z. Wang, A. C. Bovik, and L. K. Cormack, "Image and video quality assessment research at LIVE," see http://live.ece.utexas.edu/research/quality/.
57. Y. Horita, K. Shibata, and Z. M. Parvez Sazzad, "Subjective quality assessment Toyama database," see http://mict.eng.u-toyama.ac.jp/mict/ (2008).
58. See http://vision.okstate.edu/csiq/.
59. E. C. Larson and D. M. Chandler, "Most apparent distortion: a dual strategy for full-reference image quality assessment," Proc. SPIE 7242, 72420S (2009).
60. VQEG, "Final report from the Video Quality Experts Group on the validation of objective models of video quality assessment, Phase II," see http://www.vqeg.org (Aug. 2003).
61. H. Sheikh, M. Sabir, and A. Bovik, "A statistical evaluation of recent full reference image quality assessment algorithms," IEEE Trans. Image Process. 15, 1349–1364 (2006).
62. Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multi-scale structural similarity for image quality assessment," in 37th IEEE Asilomar Conf. Signals, Syst. Computers, Vol. 2, p. 1398 (2003).
63. M. D. Gaubatz, D. M. Rouse, and S. S. Hemami, "MeTriX MuX," see http://foulard.ece.cornell.edu/gaubatz/metrix_mux/.
64. A. K. Bera and C. M. Jarque, "Efficient tests for normality, homoscedasticity and serial independence of regression residuals," Econ. Lett. 6, 255–259 (1980).
65. E. Peli, L. E. Arend, G. M. Young, and R. B. Goldstein, "Contrast sensitivity to patch stimuli: effects of spatial bandwidth and temporal presentation," Spatial Vis. 7, 1–14 (1993).
66. A. B. Watson, G. Y. Yang, J. A. Solomon, and J. Villasenor, "Visibility of wavelet quantization noise," IEEE Trans. Image Process. 6, 1164–1175 (1997).
67. D. M. Chandler, M. D. Gaubatz, and S. S. Hemami, "A patch-based structural masking model with an application to compression," EURASIP J. Image Video Process. 2009, 649316 (2009).
68. S. S. Hemami, D. M. Chandler, B. C. Chern, and J. A. Moses, "Suprathreshold visual psychophysics and structure-based visual masking," Proc. SPIE 6077, 60770O (2006).
69. D. M. Chandler and S. S. Hemami, "Suprathreshold image compression based on contrast allocation and global precedence," Proc. SPIE 5007, 73–86 (2003).
Eric C. Larson received his BS and MS in electrical engineering from Oklahoma State University, Stillwater, in 2006 and 2008, respectively, where he specialized in image processing and perception. His main research area is the analysis of psychovisual signal processing in computer systems, especially pertaining to human-computer interaction. He is also active in the areas of supported signal processing and machine learning for healthcare and environmental applications. He is active in signal processing education, and is a member of IEEE and HKN. He is currently pursuing his PhD in electrical and computer engineering at the University of Washington in Seattle.