Beruflich Dokumente
Kultur Dokumente
shifeng.chen@siat.ac.cn h.yan@cityu.edu.hk
Shenzhen Institutes of Advanced Technology Department of Electronic Engineering
CAS, China City University of Hong Kong
Otter Shark
Shark
Fish
Hamster
Shrew
Dolphin
Hamster Ray
Shrew
Ray
Figure 1: Feature spaces w/o and w/ Semantic hierarchy. (a): Feature space without hierarchy can only learn the similarity
within fine-level classes. (b): Using semantic hierarchy, the learnt feature space preserves the similarity between classes.
ABSTRACT semantic labels or convert them to similar/dissimilar labels for train-
Convolutional neural networks have been widely used in content- ing. The natural semantic hierarchy structures are ignored in the
based image retrieval. To better deal with large-scale data, the deep training stage of the deep hashing model. In this paper, we present
hashing model is proposed as an effective method, which maps an effective algorithm to train a deep hashing model that can pre-
an image to a binary code that can be used for hashing search. serve a semantic hierarchy structure for large-scale image retrieval.
However, most existing deep hashing models only utilize fine-level Experiments on two datasets show that our method improves the
fine-level retrieval performance. Meanwhile, our model achieves
state-of-the-art results in terms of hierarchical retrieval.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation KEYWORDS
on the first page. Copyrights for components of this work owned by others than the Deep neural networks, learn to hash, hierarchical image retrieval
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission ACM Reference Format:
and/or a fee. Request permissions from permissions@acm.org.
Xuefei Zhe, Le Ou-Yang, Shifeng Chen, and Hong Yan. 2019. Semantic
Conference’17, July 2017, Washington, DC, USA
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Hierarchy Preserving Deep Hashing for Large-scale Image Retrieval. In
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00 Proceedings of ACM Conference (Conference’17). ACM, New York, NY, USA,
https://doi.org/10.1145/nnnnnnn.nnnnnnn 5 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
Conference’17, July 2017, Washington, DC, USA Xuefei Zhe, Le Ou-Yang, Shifeng Chen, and Hong Yan
mAP@all mAHP@2.5k
Methods
32-bit 64-bit 128-bit 32-bit 64-bit 128-bit
DPSH 0.1657 0.1829 0.1767 0.2334 0.2506 0.2218
DSDH 0.1790 0.2128 0.1987 0.2532 0.3201 0.3301
DCWH 0.7680 0.8178 0.8008 0.4443 0.5066 0.4616
Ours 0.8067 0.8259 0.8144 0.8595 0.8667 0.8630
class gaps. d(·, ·) is the Hamming distance function. To solve this 4 EXPERIMENTS
discrete optimization problem, we follow the optimization method
4.1 Implementation Details
in DCWH [24] that firstly relaxes the r n to [−α, α] where α is set to
1.1 as [24]. Then the distance d(·, ·) can be replaced with a Euclidean We implement our model with an open source deep learning frame-
distance. The loss function can be derived as work, MXNet [3]. We use Resnet-50 [8] as our base neural networks.
The models are pre-trained on the ImageNet [18] and are down-
K exp{− 2σ1 2 ∥r n − µynk ∥}
Õ loaded from the official website of MXNet. The final parameters of
L =− log ÍC k
k
exp{− 1 ∥r , µ ∥} (2) fully-connected layer are initialized by the Xavier method with the
k =1 i=1 2σk 2 n ki
magnitude of 1. The mini-batch stochastic gradient descent (SGD)
+ {ReLU (−α − r n ) + ReLU (r n − α)}, with momentum is used for training. The learning rate is decayed
where η 1 is a regulation term. ReLU is the rectified linear unit from 10−4 to 10−5 . Other hyperparameters are set based on the
defined as ReLU (x) = max(0, x). This loss function is differentiable dataset specifically following the stand cross-validation. We con-
and the classical back-propagation can be used for optimizing the ducted experiments on CIFAR-100 [11] and North American birds
parameters of CNN, Θ datasets. Images are firstly resized to 240 × 240, and then randomly
We update the fine-level (leaf nodes) class centers {µ 0i } with the cropped to 224 × 224 as the input of networks. Experiments are
training data using the following formulation. run on a single Nvidia GTX-1080 GPU server. The results of DPSH,
DSDH, and DCWH are based on the codes given by the original
N 0i
1 Õ authors. Other results are cited from their own papers.
µ 0i = f (x n , Θ), (3)
N 0i n=1
4.2 Evaluation Metrics
where N 0i is the number of images X n that belong to the i-th class in
the lowest level of hierarchy. The upper class centers are calculated We present both results of the single level retrieval and hierarchi-
with their own lower level class centers as follows cal retrieval performance. The fine level retrieval performance is
Ck i evaluated with the mean average precision (mAP@all). As for the
1 Õ hierarchical retrieval performance, we calculate the mean average
µ ki = µ , (4)
Cki c=1 (k −1),c Hierarchical Precision (mAHP@K) and the normalized Discounted
where Cki is the number of level-k classes belonging to the i-th Cumulative Gain (nDCG) [9] measurements.
class at level k. By such a recursive calculation, we can save com- Considering xq is a query image with label yq , the ordered re-
putation cost and eliminate the influence of imbalanced training trieval results is denoted as {(x 1 , y1 ), (x 2 , y2 ), . . . , (x N , y N )}. The
data between classes. To ensure that the parent class gaps are larger hierarchical precision (HP) [4, 16] at N-th returning images is de-
than the child ones, the hyperparameters σk should be chosen to fined as Ín
i=1 Sim(yq , yi )
satisfy the following equation, HP@N = , (5)
maxo ni=1 Sim(yq , yoi )
Í
σk2 < σk2+1
where Sim is score function for measuring the similarity between
The two-stage strategy described in [24] is used to reduce the the classes based on the category hierarchy. The denominator is
quantization error. The whole algorithm is summarized in Algo- the largest summation that can be achieved by the best N database
rithm 1. items.
Algorithm 1 Algorithm
4.3 CIFAR-100
Initialize CNN parameters Θ with a pre-trained model CIFAR-100 comprses of 60, 00 tinny images. It has a two-level se-
repeat mantic hierarchy. Each fine class contains 500 training images and
1. Compute features r n = f (x n , Θ) with Θ 100 for testing. The 100 fine classes are grouped into 20 coarse cat-
2. Update the fine-level class centers {µ 0i} with Eq. 3 egories. Each coarse class is consisted by 20 fine ones. The official
3. Update upper-level class centers with Eq. 4 training set is used for training deep hashing models. During the
4. Using {µ ki } and loss function Eq. 2 to optimize Θ for N testing, the training set serves as the database, and the testing im-
iterations ages are used as query ones. DPSH, DSDH, and DCWH are trained
until Convergence with fine labels. Our method uses both fine and coarse labels for
training.
Conference’17, July 2017, Washington, DC, USA Xuefei Zhe, Le Ou-Yang, Shifeng Chen, and Hong Yan
mAHP@352 mAHP@1087
Methods
64-bit 128-bit 256-bit 64-bit 128-bit 256-bit
DCWH 0.5939 0.5940 0. 6511 0.5215 0.5110 5733
Ours 0.8323 0.8372 0.8275 0.8631 0.8806 0.8797
We present the mAP@all as the fine level retrieval perfor- Table 4: MAP of NABirds
mance. Considering each coarse categories has 2500 images, we eval-
uate the hierarchical retrieval performance with mAHP@2,500. mAP@all
Both results are presented in Table 1. It can be found that, for Methods
64-bit 128-bit 256-bit
general retrieval performance, our method is slightly higher than
DCWH. At 32-bit, the mAP of our method is 3.87% higher than DCWH 0.5113 0.5538 0.5957
the one of DCWH, and 0.8% and 1.36% higher for 64- and 128-bit, Ours 0.6425 0.6590 0.6358
respectively. However, our method achieves significantly better re-
sults than the previous best one. Our method obtains 0.8631 mAHP
in average, which is 39.22% higher than DCWH’s 0.4708. The huge to supervise the training of deep hashing models. The fine level
difference of mAP and mAHP of DCWH indicates that the semantic retrieval performance is measured with mAP@all and the results
hierarchical Hamming space is hard to learn if only use the fine are presented in Table 4. It can be observed that by using the se-
level class labels. mantic hierarchical information, our method significantly improves
the fine level retrieval performance. The best performance of our
Table 3: nDCG@100 on CIFAR-100 method is 0.6590 at 128-bit, which is 6.33% higher than the best
of DCWH at 256-bit. The average performance gain is 9.21% for
nDCG@100 tested three bit numbers.
Methods
32-bit 64-bit In order to evaluate the hierarchical retrieval performance better,
DPSH∗ 0.5650 0.5751 we present the mAHP at different numbers of return images, which
SHDH∗ 0.6141 0.6460 are the median value (352) and the mean value (1087) of images for
DCWH 0.7894 0.8022 the highest-level categories. Both results are shown in Table 2. For
Ours 0.8189 0.8242 mAHP@352, our method obtains the best performance, 0.8372 at
128-bit, which is 23.42% higher than DCWH’s. For mAHP@1, 087,
We also provide nDCG@100 for easy comparison with SHDH [20], the average performance gain achieved by our method is 36.7%. In
which also utilizes the hierarchical label information for deep hash- conclusion, by utilizing the hierarchical information our method not
ing models. We follow the testing protocol in SHDH and present only significantly improves the hierarchical retrieval performance
the nDCG@100 result of DCWH and ours in Table 3. The results but also obtains better fine level retrieval results.
of DPSH ∗ and SH DH ∗ are cited from [20]. It can be found from
the table that our method surpasses SHDH with a clear margin 5 CONCLUSION
under nDCG@100 metric. Our model surpasses SHDH by 21.48% In this work, we address the problem of deep hashing learning with
and 17.82% for 32-bit and 64-bit, respectively. semantic hierarchical labels. We propose a novel hierarchical deep
hashing model based on a multi-level Gaussian loss function The
4.4 North America Birds proposed loss function allows our model to utilize the semantic
The North America Birds dataset [17] collects more than 48, 000 hierarchical labels. Meanwhile, it takes advantage of class-level
images from more than 550 visual categories of North America birds similarity learning as deep class-wise hashing [24], which enables
species. These categories are organized taxonomically into a five- our method to achieve much better performance than SHDH [20]
level hierarchy. Because all the categories belong to one root node that is based on pair-wise similarity learning.
“bird” that does not provide additional information gain, we ignore Experiments on two important hierarchical image datasets, CIFAR-
the final level and only use lower four-level hierarchy. We use the 100 and NABirds, show that our method surpasses other state-of-
official split that has 23, 929 images for training and 24, 633 testing the-art deep hashing methods in terms of the fine-level retrieval
ones are used for query images. The bounding box annotations are performance. Moreover, with help of the semantic hierarchy, our
not used in this experiment. We directly resize the original images method obtains significantly better hierarchical retrieval perfor-
to 240 × 240 and then randomly crop them to 224 × 224 as inputs of mance, which indicates that our model can provide more user-
the network. Randomly mirroring and randomly jittering are used desired retrieval results.
for data augmentation.
Considering that the performances of DPSH and DSDH signifi- REFERENCES
cantly decrease when the number of categories increases, we only [1] Yue Cao, Mingsheng Long, Jianmin Wang, Han Zhu, and Qingfu Wen. 2016. Deep
Quantization Network for Efficient Image Retrieval.. In AAAI. 3457–3463.
present the results of DCWH and ours. The main difference between [2] Z. Cao, M. Long, J. Wang, and P. S. Yu. 2017. HashNet: Deep learning to hash by
DCWH and ours is that we utilize the multi-level class information continuation. ArXiv preprint arXiv:1702.00758 (2017).
Semantic Hierarchy Preserving Deep Hashing
for Large-scale Image Retrieval Conference’17, July 2017, Washington, DC, USA
[3] Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Joint Conference on Artificial Intelligence (IJCAI). 1711–1717.
Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. Mxnet: A flexible and [15] Yeqing Li, Chen Chen, Wei Liu, and Junzhou Huang. 2014. Sub-Selective Quanti-
efficient machine learning library for heterogeneous distributed systems. arXiv zation for Large-Scale Image Search.. In AAAI. 2803–2809.
preprint arXiv:1512.01274 (2015). [16] Yuhao Ma, Meina Kan, Shiguang Shan, and Xilin Chen. 2018. Hierarchical
[4] Jia Deng, Alexander C Berg, and Li Fei-Fei. 2011. Hierarchical semantic indexing Training for Large Scale Face Recognition with Few Samples Per Subject. In 2018
for large scale image retrieval. (2011). 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2401–2405.
[5] Anuvabh Dutt, Denis Pellerin, and Georges Quénot. 2017. Improving Image [17] Cornell Lab of Ornithology. [n. d.]. NABirds Dataset. http://dl.allaboutbirds.org/
Classification using Coarse and Fine Labels. In Proceedings of the 2017 ACM on nabirds Accessed:2019-01-25.
International Conference on Multimedia Retrieval. ACM, 438–442. [18] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma,
[6] Christiane Fellbaum. 2010. WordNet. In Theory and applications of ontology: Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C.
computer applications. Springer, 231–243. Berg, and Li Fei-Fei. 2015. ImageNet large scale visual recognition challenge.
[7] Yunchao Gong, Svetlana Lazebnik, Albert Gordo, and Florent Perronnin. 2013. International Journal of Computer Vision 115, 3 (2015), 211–252.
Iterative quantization: A procrustean approach to learning binary codes for [19] Xiang Cheng Rong Bi Sen Su, Gang Chen. 2017. Deep Supervised Hashing with
large-scale image retrieval. IEEE Transactions on Pattern Analysis and Machine Nonlinear Projections. In Proceedings of the International Joint Conference on
Intelligence 35, 12 (2013), 2916–2929. Artificial Intelligence (IJCAI). 2786–2792.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual [20] Dan Wang, Heyan Huang, Chi Lu, Bo-Si Feng, Liqiang Nie, Guihua Wen, and
learning for image recognition. In CVPR. 770–778. Xian-Ling Mao. 2017. Supervised Deep Hashing for Hierarchical Labeled Data.
[9] Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation arXiv preprint arXiv:1704.02088 (2017).
of IR techniques. ACM Transactions on Information Systems 20, 4 (2002), 422–446. [21] Xiaofang Wang, Yi Shi, and Kris M Kitani. 2016. Deep supervised hashing with
[10] Mohammad Ahangar Kiasari, Minho Lee, and Jixiang Shen. 2016. Content-based triplet labels. In Asian Conference on Computer Vision. 70–84.
image retrieval by using deep kernel Machine with Gaussian Mixture Model. In [22] Huei-Fang Yang, Kevin Lin, and Chu-Song Chen. 2017. Supervised Learning
Neural Networks (IJCNN), 2016 International Joint Conference on. IEEE, 3371–3377. of Semantics-Preserving Hash via Deep Convolutional Neural Networks. IEEE
[11] Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features Transactions on Pattern Analysis and Machine Intelligence (2017). in press.
from tiny images. (2009). University of Toronto Technical Report. [23] Ting Yao, Fuchen Long, Tao Mei, and Yong Rui. 2016. Deep Semantic-Preserving
[12] Brian Kulis, Prateek Jain, and Kristen Grauman. 2009. Fast similarity search for and Ranking-Based Hashing for Image Retrieval.. In Proceedings of the Interna-
learned metrics. IEEE Transactions on Pattern Analysis and Machine Intelligence tional Joint Conference on Artificial Intelligence (IJCAI). 3931–3937.
31, 12 (2009), 2143–2157. [24] Xuefei Zhe, Shifeng Chen, and Hong Yan. 2018. Deep Class-Wise Hash-
[13] Qi Li, Zhenan Sun, Ran He, and Tieniu Tan. 2017. Deep Supervised Discrete ing: Semantics-Preserving Hashing via Class-wise Loss. arXiv preprint
Hashing. arXiv preprint arXiv:1705.10999 (2017). arXiv:1803.04137 (2018).
[14] Wu-Jun Li, Sheng Wang, and Wang-Cheng Kang. 2016. Feature learning based
deep supervised hashing with pairwise labels. In Proceedings of the International