Abstract:
Automatic image annotation has attracted considerable research interest, and finding the correlations among labels and images effectively is a critical task in multi-label learning. Most existing multi-label learning methods exploit label correlation only in the output label space, leaving the connection between labels and image features untouched. In image annotation, semi-supervised learning, which incorporates a large amount of unlabeled data along with a small amount of labelled data, is regarded as an effective tool to reduce the burden of manual annotation. However, some unlabeled data in semi-supervised models contain outliers that negatively affect the training stage. These outliers can cause over-fitting, especially when a small amount of training data is used. This paper proposes an automatic image annotation method, called modified MLDL with hierarchical sparse coding, to solve these problems. The method prevents the over-fitting associated with semi-supervised approaches by using sparse representation to maximize the correlation between the data. A Tree Conditional Random Field is applied to construct the hierarchical structure of an image. The result is multi-label set prediction for a query image and semantic retrieval of images. Experimental results using the LabelMe and Caltech datasets confirm the effectiveness of this method.
I. INTRODUCTION
Image annotation has become an important field of research. Its aim is to denote the meanings and concepts associated with images. Manual annotation, however, has become impractical, costly, and time consuming, so the annotation process needs to be automated. The information and features extracted from images do not always reflect the image content, and this semantic gap, i.e. the lack of coincidence between the extracted information and the image semantics, is the main challenge for automatic systems.

A. Main Objectives
Specifically, the aims are:
• To bridge the semantic gap (by maximizing the correlation) between the low-level features of images and high-level semantic concepts, so as to automatically annotate an image with appropriate keywords, such as multiple labels, which reflect the visual content of the image.
• To enhance the relationship between labels and visual features.
• To improve the performance, speed, and accuracy while reducing the computational cost.
The main challenge is to suppress the outliers between the visual data and the semantic concept of the image. The main aim of the project is to enhance the relationship between them and improve the performance of the system.

B. S. Moran and V. Lavrenko, "Sparse kernel learning for image annotation," in Proceedings of International Conference on Multimedia Retrieval, ACM, 2014 [1]

Generative models mainly consist of mixture models and topic models. Mixture models usually define a joint distribution over image visual features and labels. To annotate a new image, mixture models compute the conditional probability of each label given the visual features of the image; a fixed number of labels with the highest probability are used for annotation. This work addresses the gap by formulating a sparse kernel learning framework for the continuous relevance model (CRM), dubbed SKL-CRM, which greedily selects an optimal combination of kernels. The kernel learning framework rapidly converges to an annotation accuracy that substantially outperforms a host of state-of-the-art annotation models. The authors draw two surprising conclusions: firstly, if the kernels are chosen correctly, only a very small number of features is required to achieve superior performance over models that utilize a full suite of feature types; secondly, the standard default selection of kernels commonly used in the literature is sub-optimal, and it is much better to adapt the kernel choice to the feature type and image dataset.
Advantages
• Comparatively less running time.
Limitations
• The generative data may not be optimal for the image annotation task.
• The parameter estimation process is computationally expensive.

C. M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, "TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation," in IEEE International Conference on Computer Vision, 2009 [2]

TagProp, short for Tag Propagation, is a nearest-neighbor model that predicts tags by taking a weighted combination of the tag absence/presence among neighbors. The weights for neighbors are determined based on either the neighbor rank or its distance, and are set automatically by maximizing the likelihood of the annotations in a set of training images. With rank-based weights the k-th neighbor always receives a fixed weight, whereas distance-based weights decay exponentially with the distance. TagProp combines a collection of image similarity metrics that cover different aspects of image content, such as local shape descriptors or global color histograms.
Advantages
• The tag prediction model is conceptually simple.
• It can annotate images by ranking the tags for a given image, or perform keyword-based retrieval by ranking images for a given tag.
Limitations
• Lack of correlation between image and label features.
• When the number of training examples is limited, the nearest neighbor-based models may fail.
• The similarity between training samples and the query sample is computed only according to the visual features of the images.

D. "Multi-Instance Multi-Label Learning for Image Classification with Large Vocabularies," Oksana Yakhnenko, Google Inc, New York, NY, USA, and Vasant Honavar, Iowa State University, Ames, IA, USA, 2014 [4]

This paper introduces a learning algorithm for Multiple Instance Multiple Label (MIML) learning. The algorithm trains a set of discriminative multiple-instance classifiers and models the correlations among labels by finding a low-rank weight matrix, thus forcing the classifiers to share weights. MIML learning is a generalization of supervised learning. The model is discriminative, trained to maximize the probability of the labels present in the image and minimize the probability of the labels absent from it.
Advantages
• An easy-to-implement MIML learning algorithm that can be used to train MIML classifiers for image annotation on large datasets and in settings where the vocabulary of possible labels is large.
• The MIML algorithm is scalable to settings with a large number of images and a large vocabulary of possible labels.
Limitations
• Compared to semi-supervised learning methods, supervised learning methods have lower performance.
• With this approach the training cost is extremely high, because an enormous amount of training data must be labeled manually.

E. Multi-Label Dictionary Learning with label consistency regularization and partial-identical label embedding (MLDL) [5]

This work incorporates the dictionary learning technique into multi-label learning and designs a label consistency regularization term to learn a better representation of features. In the output label space, it designs the partial-identical label embedding, in which samples with exactly the same label set cluster together, and samples with partially identical label sets can collaboratively represent each other.
Advantages
• Shows better performance than existing methods.
• Strengthens the relationship between labels and visual features.
• Effective for semantic retrieval and image annotation.
Limitations
• Lack of correlation.
• More computation time needed.
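As an illustration of the nearest-neighbor family surveyed above, the following is a minimal sketch in the spirit of TagProp, not the authors' implementation: the rank weights here are fixed geometric values rather than learned by maximizing the training-annotation likelihood, and the toy data is invented.

```python
import numpy as np

def propagate_tags(train_feats, train_tags, query, k=3, decay=0.5):
    """Score tags for `query` as a rank-weighted vote over its k nearest
    training images (TagProp-style; weights fixed, not learned)."""
    # Euclidean distances from the query to every training image.
    dists = np.linalg.norm(train_feats - query, axis=1)
    order = np.argsort(dists)[:k]      # indices of the k nearest neighbors
    weights = decay ** np.arange(k)    # fixed geometric rank weights
    weights /= weights.sum()           # normalize so scores lie in [0, 1]
    # Weighted combination of the neighbors' tag presence/absence vectors.
    return weights @ train_tags[order]

# Toy data: 4 training images, 3-dim features, vocabulary of 3 tags.
X = np.array([[0.0, 0, 0], [0.1, 0, 0], [5.0, 5, 5], [5.1, 5, 5]])
T = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]])  # binary tag matrix
scores = propagate_tags(X, T, query=np.array([0.05, 0.0, 0.0]), k=2)
print(scores)  # tag 0 is supported by both nearest neighbors
```

Ranking the entries of `scores` gives the annotation for the query; ranking images by a single tag's score gives keyword-based retrieval, as described in the TagProp section.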
TABLE I
STUDY OF EXISTING METHODS

Sl. no | Title | Advantages | Disadvantages
1 | Sparse kernel learning for image annotation | Comparatively less running time | The generative data may not be optimal for the image annotation task; parameter estimation is computationally expensive
2 | TagProp | Tag prediction model is conceptually simple; annotates images by ranking the tags for a given image | Lack of correlation between image and label features; when the number of training examples is limited, nearest neighbor-based models may fail
3 | MIML using a discriminative method | Easy-to-implement MIML learning algorithm; scales to a large number of images and a large vocabulary of possible labels | Low performance; training cost is extremely high
4 | Multi-Label Dictionary Learning with label consistency regularization and partial-identical label embedding (MLDL) | Annotates images with more correct labels; annotates different images using the same label with relatively proper weights; effective for semantic retrieval | Time complexity; lack of accurate annotation

Table I shows a study of some notable and recent research work on automatic image annotation.

IV. PROPOSED SYSTEM
This section describes the proposed approach of modified multi-label dictionary learning (MLDL) using hierarchical sparse coding [5]. In the training stage, feature vectors are first created from the datasets; the dictionary is then learned with hierarchical sparse coding. The hierarchical model used here is the Tree Conditional Random Field (TCRF), which is robust to scale variance. In the testing stage, the feature values of a query image are extracted and, using the trained dictionary, a score is calculated against the database dictionary and the maximum value is selected. The performance of the proposed method is evaluated and compared with existing methods; experimental results using two datasets confirm its performance.

A. Feature Vector Creation
Sparse representation based methods have produced interesting image annotation results, and the dictionary used for sparse coding plays a main role in them. The dictionary can be constructed directly from the original training samples, but the original samples contain much redundancy and noise that affect prediction. To obtain an effective representation of the samples, the dictionaries need to be learned adaptively.
The main feature descriptors used here are SSIM, GIST, LBP, HOG, SIFT, and color descriptors. The histogram of oriented gradients (HOG) is a feature descriptor used for object detection; it applies overlapping local contrast normalization for improved accuracy and is computed on a dense grid of uniformly spaced cells (local descriptors). GIST summarizes the gradient information (scales and orientations) for different parts of an image, providing a rough description of the scene. The Scale-Invariant Feature Transform (SIFT) detects and uses a large number of features from the images, which reduces the contribution of errors caused by local variation. The uniform local binary pattern (LBP) is a visual descriptor extracted from a greyscale image; LBP features encode local pattern or texture information, which can be used for tasks such as classification, detection, and recognition. The self-similarity descriptor (SSIM) is a local descriptor that finds the structural similarity of images; each image is divided into blocks in order to find the similarity among them.

B. Hierarchical Sparse Coding
In consideration of the drawbacks of a supervised approach, we propose an automatic image annotation method in which an effective semi-supervised method, semi-CCA, and sparse representation collaboratively suppress the outliers in the image data. The first step is to generate subspaces that maximize the correlation between image features and label features.
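The feature vector creation step described earlier can be sketched as follows. For brevity, this uses simplified stand-ins for the descriptors named in that section — a basic 8-neighbor LBP code histogram, a HOG-like gradient-orientation histogram over the whole image, and an intensity histogram in place of the color descriptor — rather than the full SSIM/GIST/SIFT pipeline; the toy image is invented.

```python
import numpy as np

def lbp_hist(img):
    """Histogram of basic 8-neighbor local binary pattern codes."""
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.int32)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit   # one bit per neighbor
    return np.bincount(code.ravel(), minlength=256) / code.size

def grad_orient_hist(img, bins=9):
    """HOG-like magnitude-weighted histogram of gradient orientations."""
    gy, gx = np.gradient(img.astype(float))
    ang = np.arctan2(gy, gx)                 # orientation in [-pi, pi]
    mag = np.hypot(gx, gy)
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-12)

def feature_vector(img):
    """Concatenate the simplified descriptors into one feature vector."""
    inten, _ = np.histogram(img, bins=16, range=(0, 256))
    return np.concatenate([lbp_hist(img), grad_orient_hist(img),
                           inten / img.size])

img = (np.arange(64).reshape(8, 8) * 4).astype(np.int32)  # toy greyscale image
v = feature_vector(img)
print(v.shape)  # 256 LBP bins + 9 orientation bins + 16 intensity bins
```

In the real pipeline these vectors, computed for all training images, would form the raw samples from which the dictionary is adaptively learned.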
D. Sparse Representation
A semi-supervised approach is used, which exploits unlabeled data in addition to the labelled data. However, outliers still exist in the unlabeled data. We therefore introduce a method that generates subspaces using a semi-supervised approach called semi-supervised Canonical Correlation Analysis (semi-CCA), and suppresses such outliers using the Regularized Orthogonal Matching Pursuit (ROMP) sparse algorithm. Fig. 2 shows the sparse representation.
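The ROMP step can be illustrated with a compact implementation. This is a generic, textbook-style sketch of the Needell–Vershynin algorithm, not code from this paper; the dictionary, signal, and sparsity level in the demo are made up.

```python
import numpy as np

def romp(D, y, sparsity, tol=1e-6, max_iter=20):
    """Regularized Orthogonal Matching Pursuit (sketch).

    D        : dictionary with unit-norm columns, shape (m, n)
    y        : signal to represent, shape (m,)
    sparsity : target number of nonzero coefficients
    """
    m, n = D.shape
    support, r = [], y.copy()
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol or len(support) >= sparsity:
            break
        u = D.T @ r
        # Candidate set: the `sparsity` largest correlations (nonzero only).
        cand = [j for j in np.argsort(-np.abs(u))[:sparsity]
                if abs(u[j]) > 0 and j not in support]
        if not cand:
            break
        # Regularization step: among candidates sorted by magnitude, pick the
        # group with comparable coordinates (max <= 2 * min) maximizing energy.
        cand.sort(key=lambda j: -abs(u[j]))
        best, best_energy = None, -1.0
        for i in range(len(cand)):
            group, energy = [], 0.0
            for j in cand[i:]:
                if abs(u[cand[i]]) > 2 * abs(u[j]):
                    break
                group.append(j)
                energy += u[j] ** 2
            if energy > best_energy:
                best, best_energy = group, energy
        support = sorted(set(support) | set(best))
        # Orthogonal projection: least-squares fit on the current support.
        x_s, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        r = y - D[:, support] @ x_s
    x = np.zeros(n)
    if support:
        x[support] = np.linalg.lstsq(D[:, support], y, rcond=None)[0]
    return x

D = np.eye(4)                        # trivial orthonormal dictionary
y = np.array([3.0, 0.0, 2.0, 0.0])
x_hat = romp(D, y, sparsity=2)
print(x_hat)                         # recovers [3, 0, 2, 0]
```

Coefficients of a query's sparse code over the learned dictionary then serve as the scores from which the maximum is selected in the testing stage.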
Let X_L = {x_n}, n = 1, ..., N, and Y_L = {y_n}, n = 1, ..., N, be the N labeled data and X_U = {x_m}, m = 1, ..., M, the M unlabeled data, where X and Y indicate the image features and label features respectively; D is the learned dictionary and Z is the query image. The aim of CCA, and of semi-CCA, is to find the optimum subspace that maximizes the correlation between the projected x and y. Semi-CCA can be expressed as the combination of two factors: CCA on the labelled data and PCA on all data, including the unlabeled data. The semi-CCA formulation is therefore obtained by combining the two problems. In the second step, subspace generation and sparse representation, the image feature and the label feature are connected via latent variables Z in the subspace.
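The combination described above, CCA on the labeled image/label pairs blended with a PCA term over all image features, can be written as one generalized eigenproblem. The block-matrix form below follows a commonly cited semi-CCA formulation, but the blending weight beta, the ridge term, and the toy data are illustrative assumptions, not values from this paper.

```python
import numpy as np

def semi_cca(XL, YL, XU, beta=0.5, dim=1, reg=1e-6):
    """Semi-supervised CCA (sketch): blend CCA on labeled (XL, YL) pairs
    with PCA on all image features (XL plus unlabeled XU). Inputs are
    assumed centered. Returns projections Wx, Wy onto a `dim`-dim subspace."""
    dx, dy = XL.shape[1], YL.shape[1]
    Xall = np.vstack([XL, XU])
    Sxy = XL.T @ YL / len(XL)
    Sxx = XL.T @ XL / len(XL)
    Syy = YL.T @ YL / len(YL)
    Sxx_all = Xall.T @ Xall / len(Xall)
    # Generalized eigenproblem  B w = lam * C w :
    # beta weights the CCA part, (1 - beta) the PCA part on all data.
    B = beta * np.block([[np.zeros((dx, dx)), Sxy],
                         [Sxy.T, np.zeros((dy, dy))]]) \
        + (1 - beta) * np.block([[Sxx_all, np.zeros((dx, dy))],
                                 [np.zeros((dy, dx)), np.eye(dy)]])
    C = beta * np.block([[Sxx, np.zeros((dx, dy))],
                         [np.zeros((dy, dx)), Syy]]) \
        + (1 - beta + reg) * np.eye(dx + dy)
    # Solve via C^{-1} B (adequate for a small sketch).
    vals, vecs = np.linalg.eig(np.linalg.solve(C, B))
    W = vecs[:, np.argsort(-vals.real)[:dim]].real
    return W[:dx], W[dx:]          # Wx projects images, Wy projects labels

rng = np.random.default_rng(0)
z = rng.normal(size=(50, 1))                    # shared latent variable
XL = np.hstack([z, rng.normal(size=(50, 1))])   # labeled image features
YL = np.hstack([z, rng.normal(size=(50, 1))])   # label features
XU = rng.normal(size=(100, 2))                  # unlabeled image features
Wx, Wy = semi_cca(XL - XL.mean(0), YL - YL.mean(0), XU - XU.mean(0))
print(Wx.shape, Wy.shape)
```

Projecting image and label features with Wx and Wy places both in the common subspace in which the sparse representation of the previous subsection is computed.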
REFERENCES
[1] S. Moran and V. Lavrenko, "Sparse kernel learning for image annotation," in Proceedings of International Conference on Multimedia Retrieval, ACM, 2014.