Introduction
Scale-Invariant Feature Transform (SIFT)
- A feature extraction method developed by David G. Lowe
- The method should locate features reliably
- The extraction should be robust, handling different types of changes between images:
  - Illumination
  - Affine transform
  - Scale
  - Rotation
Algorithm
The steps of the SIFT algorithm:
1. Scale-space extrema detection
   - Search over scales and image locations
   - Locate local extrema
2. Keypoint localization
   - Select keypoints from the local extrema
   - Keypoints are selected based on measures of their stability
3. Orientation assignment
   - Orientations are assigned to each keypoint based on local image gradient directions
4. Keypoint descriptor
   - Local image gradients are measured at the selected scale in the region around each keypoint
   - These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination
The first stage in keypoint detection is to find the local extrema in scale-space. It consists of a cascade filtering approach: the creation of octaves and of scale-space images for each octave.
The scale space of an image is defined as the convolution
L(x, y, σ) = G(x, y, σ) * I(x, y), (1)
where the Gaussian kernel is
G(x, y, σ) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²)). (2)
The procedure is repeated, multiplying the scale by the factor k a total of s times. Then the difference of Gaussians (DoG),
D(x, y, σ) = L(x, y, kσ) − L(x, y, σ), (3)
is calculated for adjacent blurred images. According to Lowe, to achieve stable keypoints one should set s = 3 and k = 2^(1/s).
The procedure to calculate the differences of Gaussians is then repeated for each octave. The creation of the next octave:
- Select the Gaussian blurred image whose blur is twice that of the original (2σ)
- Subsample that image and use the output as the starting point for the next octave
- Subsampling is done by selecting every second pixel from the rows and columns of the image
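The octave construction above can be sketched in numpy. This is a minimal illustration, not Lowe's exact implementation: the blur is a simple separable convolution, the function names and the base scale σ₀ = 1.6 are assumptions (1.6 is the value Lowe suggests).

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, truncated at about 3 sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur: 1-D convolution along rows, then columns."""
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def build_octave(img, sigma0=1.6, s=3):
    """One octave: s+3 blurred images give s+2 DoG images (eq. 3).
    Also returns the starting image for the next octave."""
    k = 2 ** (1.0 / s)
    blurred = [blur(img, sigma0 * k**i) for i in range(s + 3)]
    dogs = [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
    # the image with blur 2*sigma0 is subsampled by taking every second pixel
    next_base = blurred[s][::2, ::2]
    return dogs, next_base
```

With s = 3 each octave yields five DoG images, and the subsampled base image has half the width and height of the input.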
Keypoint localization
Elimination of keypoints
Eliminate some points from the candidate list of keypoints by finding those that have low contrast or are poorly localised on an edge:
- Contrast thresholding
- Cornerness thresholding
To detect the local maxima and minima of D(x, y, σ), each point is compared with its 26 neighbours: 8 in the same scale and 9 in each of the scales above and below. If its value is the minimum or maximum of these, the point is an extremum.
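The 26-neighbour test can be written as a short numpy check. A minimal sketch under the assumption that the three DoG images are passed in as separate arrays and that (y, x) is not on the image border; the strict-uniqueness check is an illustrative choice.

```python
import numpy as np

def is_extremum(dog_prev, dog_cur, dog_next, y, x):
    """True if dog_cur[y, x] is the unique min or max of the 3x3x3 cube
    spanning the point's 26 neighbours across three adjacent DoG images."""
    cube = np.stack([d[y-1:y+2, x-1:x+2] for d in (dog_prev, dog_cur, dog_next)])
    v = dog_cur[y, x]
    return (v == cube.max() or v == cube.min()) and (cube == v).sum() == 1
```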
To localize the keypoint with sub-pixel accuracy, the DoG function is expanded in a Taylor series around the sample point:
D(X) = D + (∂D/∂X)ᵀ X + ½ Xᵀ (∂²D/∂X²) X, (4)
where X = (x, y, σ)ᵀ is the offset from the sample point. The location of the extremum, X̂, is found by setting the derivative of this expansion to zero:
X̂ = −(∂²D/∂X²)⁻¹ ∂D/∂X. (5)
If X̂ is larger than 0.5 in any dimension, the extremum lies closer to a different sample point; in that case the sample point is changed and the interpolation is performed again around it.
Figure: (a) The 233×189 pixel original image. (b) The initial 832 keypoint locations at maxima and minima of the difference-of-Gaussian function. (c) After applying a threshold on minimum contrast, 729 keypoints remain. (d) The final 536 keypoints that remain following an additional threshold on the ratio of principal curvatures.
Contrast thresholding
The function value at the extremum, D(X̂), is useful for rejecting unstable extrema with low contrast. It can be obtained by substituting equation (5) into (4), giving
D(X̂) = D + ½ (∂D/∂X)ᵀ X̂. (6)
If the function value at X̂ is below a threshold value, the point is excluded. For (c), all extrema with a value of |D(X̂)| < 0.03 were discarded.
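Equations (4)-(6) and the contrast test can be sketched with finite differences on a stack of three DoG images. This is an illustrative sketch: the function name and the derivative layout are assumptions, and border handling is omitted.

```python
import numpy as np

def refine_and_score(dog, y, x, s_idx=1):
    """Sub-pixel refinement on dog[3, H, W]: returns the offset X_hat (eq. 5)
    and the interpolated value D(X_hat) (eq. 6) at (s_idx, y, x)."""
    # first derivatives in (x, y, sigma) by central differences
    dD = np.array([
        (dog[s_idx, y, x+1] - dog[s_idx, y, x-1]) / 2,
        (dog[s_idx, y+1, x] - dog[s_idx, y-1, x]) / 2,
        (dog[s_idx+1, y, x] - dog[s_idx-1, y, x]) / 2,
    ])
    # second derivatives for the 3x3 Hessian
    v = dog[s_idx, y, x]
    dxx = dog[s_idx, y, x+1] - 2*v + dog[s_idx, y, x-1]
    dyy = dog[s_idx, y+1, x] - 2*v + dog[s_idx, y-1, x]
    dss = dog[s_idx+1, y, x] - 2*v + dog[s_idx-1, y, x]
    dxy = (dog[s_idx, y+1, x+1] - dog[s_idx, y+1, x-1]
           - dog[s_idx, y-1, x+1] + dog[s_idx, y-1, x-1]) / 4
    dxs = (dog[s_idx+1, y, x+1] - dog[s_idx+1, y, x-1]
           - dog[s_idx-1, y, x+1] + dog[s_idx-1, y, x-1]) / 4
    dys = (dog[s_idx+1, y+1, x] - dog[s_idx+1, y-1, x]
           - dog[s_idx-1, y+1, x] + dog[s_idx-1, y-1, x]) / 4
    H = np.array([[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]])
    x_hat = -np.linalg.solve(H, dD)            # eq. (5)
    value = v + 0.5 * dD @ x_hat               # eq. (6)
    return x_hat, value
```

A candidate would then be kept only if abs(value) is at least the contrast threshold (0.03 in Lowe's example).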
Cornerness thresholding
A poorly defined peak in the difference-of-Gaussian function will have a large principal curvature across the edge but a small one in the perpendicular direction. The principal curvatures can be computed from a 2×2 Hessian matrix, H, computed at the location and scale of the keypoint:
H = [ Dxx  Dxy ]
    [ Dxy  Dyy ]  (7)
The derivatives are estimated by taking differences of neighboring sample points. The eigenvalues of H are proportional to the principal curvatures of D.
Let α be the eigenvalue with the largest magnitude and β the smaller one:
Tr(H) = Dxx + Dyy = α + β, (8)
Det(H) = Dxx Dyy − (Dxy)² = αβ. (9)
Let r be the ratio between the largest-magnitude eigenvalue and the smaller one, so that α = rβ. Then
Tr(H)² / Det(H) = (α + β)² / (αβ) = (rβ + β)² / (rβ²) = (r + 1)² / r. (10)
The quantity (r + 1)²/r is at a minimum when the two eigenvalues are equal, and it increases with r. Therefore, to check that the ratio of principal curvatures is below some threshold r, we only need to check
Tr(H)² / Det(H) < (r + 1)² / r. (11)
The transition from (c) to (d) was obtained with r = 10.
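Inequality (11) avoids computing eigenvalues explicitly; it needs only the trace and determinant of H. A minimal sketch (the function name is an assumption):

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if the ratio of principal curvatures is below r
    (eq. 11). A non-positive determinant means curvatures of opposite sign,
    so the point is rejected outright."""
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False
    return tr * tr / det < (r + 1) ** 2 / r
```

Equal curvatures give the minimum ratio 4 and pass easily; a strongly edge-like point (one curvature much larger than the other) fails.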
Figure: Left: the point in the middle is the keypoint candidate. The orientations of the points in the square area around it are precomputed using pixel differences. Right: each bin in the histogram covers 10 degrees, so 36 bins cover the whole 360 degrees. The value of each bin is the sum of the gradient magnitudes of all precomputed points whose orientation falls in that bin.
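The 36-bin orientation histogram can be sketched as follows. This is an illustrative version: it omits the Gaussian weighting of magnitudes and the peak interpolation that the full method applies, and the function name is an assumption.

```python
import numpy as np

def orientation_histogram(patch, num_bins=36):
    """Gradient orientation histogram over a square patch around a keypoint.
    Gradients come from pixel differences; each votes with its magnitude
    into one of 36 ten-degree bins."""
    dy = patch[2:, 1:-1] - patch[:-2, 1:-1]
    dx = patch[1:-1, 2:] - patch[1:-1, :-2]
    mag = np.hypot(dx, dy)
    ang = np.degrees(np.arctan2(dy, dx)) % 360.0
    bins = (ang // (360.0 / num_bins)).astype(int) % num_bins
    hist = np.zeros(num_bins)
    np.add.at(hist, bins.ravel(), mag.ravel())
    return hist
```

The keypoint orientation is taken from the bin with the largest magnitude sum; for a purely horizontal intensity ramp all gradients point along x, so bin 0 dominates.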
Keypoint descriptor
- Keypoint samples are accumulated into orientation histograms summarizing the contents over 4×4 subregions
- The best result is obtained with a 4×4 array of histograms with 8 orientation bins in each
- As a result, a 4×4×8 = 128-element feature vector is generated for each keypoint
Orientation invariance
- To achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative to the keypoint orientation
- For efficiency, the gradients are precomputed for all levels of the pyramid
- A Gaussian weighting function with σ equal to one half the width of the descriptor window is used to assign a weight to the magnitude of each sample point
- The purpose of the Gaussian window is:
  - to avoid sudden changes in the descriptor with small changes in the position of the window
  - to give less emphasis to gradients that are far from the center of the descriptor, as these are most affected by misregistration errors
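The Gaussian weighting window can be sketched directly. A minimal illustration assuming a 16×16 descriptor window centered on the keypoint; the function name is an assumption.

```python
import numpy as np

def descriptor_weights(width=16):
    """Gaussian weight for each sample in a width x width descriptor window,
    with sigma equal to half the window width, centered on the keypoint."""
    sigma = width / 2.0
    c = (width - 1) / 2.0          # continuous center of the grid
    y, x = np.mgrid[0:width, 0:width]
    return np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
```

Each gradient magnitude in the window is multiplied by the corresponding weight before being accumulated into the subregion histograms, so samples near the center contribute most.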
Boundary effects
To avoid boundary effects, trilinear interpolation is used to distribute the value of each gradient sample into adjacent histogram bins. In other words, each entry into a bin is multiplied by a weight of 1 − d for each dimension, where d is the distance of the sample from the central value of the bin, measured in units of the histogram bin spacing.
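The per-dimension weighting can be shown in isolation. A minimal sketch (the function name is an assumption): for one dimension, a sample at a continuous bin coordinate is split between the two nearest bins with weights 1 − d and d; in full trilinear interpolation this is applied independently in x, y, and orientation, and the sample's magnitude is multiplied by the product of the three weights.

```python
import math

def bin_weights(pos, num_bins):
    """Split a sample at continuous bin coordinate `pos` between the two
    nearest bins; each weight is 1 - d, where d is the distance to that
    bin center in units of the bin spacing. Bins wrap around (orientation)."""
    lo = int(math.floor(pos))
    d = pos - lo
    return [(lo % num_bins, 1.0 - d), ((lo + 1) % num_bins, d)]
```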
Effect of illumination
The feature vector is modified to reduce the effects of illumination change:
- First, the vector is normalized to unit length
- Second, the values in the unit feature vector are thresholded
- Then the vector is renormalized to unit length
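The three normalization steps can be sketched in a few lines. The clip threshold of 0.2 is the value Lowe uses; the function name is an assumption.

```python
import numpy as np

def normalize_descriptor(vec, clip=0.2):
    """Illumination normalization of a SIFT descriptor: normalize to unit
    length, clip large components (Lowe uses 0.2), renormalize."""
    v = np.asarray(vec, dtype=float)
    v = v / np.linalg.norm(v)     # unit length: invariance to contrast change
    v = np.minimum(v, clip)       # damp large gradients (non-linear illumination)
    return v / np.linalg.norm(v)  # renormalize to unit length
```

Unit-length normalization cancels affine contrast changes, while clipping reduces the influence of large gradient magnitudes caused by non-linear illumination effects such as saturation.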
- Model images of planar objects
- Recognition of 3D objects
- Recognising panoramas
- People re-detection [Hu et al., 2008]
Object recognition
Recognising panoramas
References
David G. Lowe, Object Recognition from Local Scale-Invariant Features, Proc. of the International Conference on Computer Vision, 1999.
David G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 2004.
M. Brown and D. G. Lowe, Recognising Panoramas, International Conference on Computer Vision, 2002.
Andrea Vedaldi, SIFT for Matlab, http://www.vlfeat.org/~vedaldi/code/sift.html
Cosmin Ancuti and Philippe Bekaert, SIFT-CCH: Increasing the SIFT Distinctness by Color Co-occurrence Histograms, Proc. of the 5th IEEE International Symposium on Image and Signal Processing and Analysis, 2007.
K. Mikolajczyk and C. Schmid, A Performance Evaluation of Local Descriptors, IEEE PAMI, 2005.
Alaa E. Abdel-Hakim and Aly A. Farag, CSIFT: A SIFT Descriptor with Color Invariant Characteristics.
Y. Ke and R. Sukthankar, PCA-SIFT: A More Distinctive Representation for Local Image Descriptors, Proc. of CVPR, 2004.
Y. Yang and S. Newsam, Comparing SIFT Descriptors and Gabor Texture Features for Classification of Remote Sensed Imagery, IEEE International Conference on Image Processing, 2008.
Lei Hu, Shuqiang Jiang, Qingming Huang, Yizhou Wang, and Wen Gao, People Re-detection Using AdaBoost with SIFT and Color Correlogram, International Conference on Image Processing (ICIP), 2008.