Unsupervised Domain Adaptation

000
001
002
Unsupervised Domain Adaptation in the Wild:

Dealing with Asymmetric Label Sets
000
001
002
003
003
004
Anonymous ECCV submission
004
005
005
Paper ID 925
006
006
007
007
008
008
009
Abstract. The goal of domain adaptation is to adapt models learned

on a source domain to a particular target domain. Most methods for unsupervised domain adaptation proposed in the literature to date, assume
that the set of classes present in the target domain is identical to the set
of classes present in the source domain. This is a restrictive assumption
that limits the practical applicability of unsupervised domain adaptation
techniques in real world settings (in the wild). Therefore, we relax this
constraint and propose a technique that allows the set of target classes
to be a subset of the source classes. This way, large publicly available
annotated datasets with a wide variety of classes can be used as source,
even if the actual set of classes in target can be more limited and, maybe
most importantly, unknown beforehand.
To this end, we propose an algorithm that orders a set of source subspaces
that are relevant to the target classification problem. Our method then
chooses a restricted set from this ordered set of source subspaces. As an
extension, even starting from multiple source datasets with varied sets
of categories, this method automatically selects an appropriate subset of
source categories relevant to a target dataset. Empirical analysis on a
number of source and target domain datasets shows that restricting the
source subspace to only a subset of categories does indeed substantially
improve the eventual target classification accuracy over the baseline that
considers all source classes.
010
011
012
013
014
015
016
017
018
019
020
021
022
023
024
025
026
027
028
Keywords: Domain Adaptation
030
034
035
036
037
038
039
040
041
042
043
044
011
012
013
014
015
016
017
018
019
020
021
022
023
024
025
026
027
028
030
031
031
033
010
029
029
032
009
Introduction
Suppose we have access to a state-of-the-art classifier trained with thousands of

categories as in Imagenet and would like our household robot to use it to recognize objects. Unfortunately, as the distribution of samples in our house varies
from that used in the Imagenet dataset, our state-of-the-art classifier will not
perform that well. This is generally the case, as machine learning algorithms
usually assume that the training and test instances come from the same distribution. This assumption does not hold for several real world computer vision
problems. The distribution of image samples in our training set, termed Source
domain, is different from the distribution of samples at test time, termed Target domain. This shift in the underlying data distribution is called domain shift.
Visual domain shift between two domains can be caused, among others, by differences in resolution, viewpoints, illumination, clutter and backgrounds. Domain
032
033
034
035
036
037
038
039
040
041
042
043
044
2
045
046
047
048
049
050
051
052
053
054
055
056
057
058
059
060
061
062
063
064
065
066
067
068
069
070
071
072
073
074
075
076
077
078
079
080
081
ECCV-16 submission ID 925
shift problems are very pertinent in computer vision tasks because of the fact
that each dataset has its own bias. That is why results from one dataset cannot
easily be transferred to another, as shown by Torralba and Efros [1]. A number of domain adaptation techniques have been proposed in the literature [26]
that allow for adapting a source classifier to a target domain. However, all these
methods assume that we a priori choose the categories in our source dataset that
are applicable in our target dataset. This is not always applicable, as illustrated
by our household robot example, where we would have to carefully modify the
source dataset for each robot and for each house in which we want to deploy the
robot. We would rather prefer an automatic solution that carefully chooses the
subset of source categories best suited for a particular target dataset. This is the
problem that we address in this paper.
Our main contribution is a general domain adaption algorithm that works
even when the original label set of source and target data are different. We
achieve this by iteratively modifying the source subspace. In each step we choose
one more category from the source dataset to better adapt to the target dataset.
This requires us also to define a stopping criterion that indicates when it is
no longer useful to add a new category from the source domain. The method
proposed relies on projection error between source and target domains to obtain
the ordering of source category classes and evaluates local and global minima
based stopping criteria. Through this paper we make the following contributions:
We consider the case of adapting a large/varied source dataset to a target
domain dataset. This is achieved by identifying an ordered set of source
categories.
We evaluate local and global minima based stopping criteria and show that
choosing a restricted set of source categories aids adaptation to the target
dataset.
We also further consider the case where multiple source datasets are present
and the optimal combination of classes from these is selected and adapted
to a particular target dataset.
The remainder of the paper is organized as follows. In the next section we
discuss the related work, and in section 3 we recapitulate some background
material. In section 4, the proposed method is discussed in detail. In section 5
we consider the setting where multiple source datasets are adapted to a particular
target dataset. Experimental evaluation is presented in section 6, including the
performance evaluation and a detailed analysis. We conclude in section 7.
084
085
086
087
088
089
046
047
048
049
050
051
052
053
054
055
056
057
058
059
060
061
062
063
064
065
066
067
068
069
070
071
072
073
074
075
076
077
078
079
080
081
082
082
083
045
Related Work
Though domain adaptation is a relatively new topic within the computer vision
community, it has long been studied in machine learning, speech processing and
natural language processing communities [7, 8]. We refer to [6] for a recent review
on visual domain adaptation. There are basically two classes of domain adaptation algorithms: semi-supervised and unsupervised domain adaptation. Both
083
084
085
086
087
088
089

090
091
092
093
these classes assume the source domain is a labeled dataset. Semi-supervised domain adaptation requires some labeled target examples during learning, whereas
unsupervised domain adaptation does not require any labeled target examples.
In this work, we will focus on the unsupervised setting.
096
097
098
099
100
101
102
103
104
Deep adaptation With the availability of convolutional neural networks (CNNs),

a few methods specific to CNNs have been proposed to adapt the deep network
parameters. In [9], Oquab et al. showed that an image representation learned
by CNN on a large dataset can be transferred for other tasks by fine tuning the
deep network. These however assume availability of labeled data in the target
dataset. We work in the unsupervised setting. In [10],the hidden representations
of all the task-specific layers are embedded to a RKHS for explicit matching
of the mean embeddings of different domain distributions. In the present work,
we focus on the asymmetric label setting and can benefit from such work that
investigates unsupervised domain adaptation of deep networks.
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
Subspace-based methods In our work we mainly work with subspace based

methods using deeply learned feature representations. Subspace-based methods [24] have shown promising results for unsupervised domain adaptation
problems. In [2], source and target subspaces are represented as separate points
on a Grassmann manifold. The data is then projected onto a set of in-between
subspaces along the geodesic path, so as to obtain a domain invariant representation. One main issue with this approach is that we dont know how many in
between subspaces need to be sampled along the path. To overcome this challenge, Gong et al. [3] proposed a geodesic flow kernel based approach that captures the incremental change from source to target subspace along the geodesic
path. In [4], Fernando et al. propose to learn a transformation by minimizing
the Frobenius norm of the difference function which directly aligns source and
target subspaces. Anant et al. [11] consider domain adaptation in a hierarchical
setting and show that using different subspaces at different levels of hierarchy
improves adaptation. Another direction of particular relevance to our work is the
idea of continuously evolving domains as proposed in Hoffman et al. [12]. The
key idea in their approach is to consider continuous temporal evolution of the
target domain. We also use projection errors as our criterion for searching for
subspaces in this work. However, none of the methods mentioned above try to
address unsupervised domain adaptation task with an asymmetric distribution
of labels. With the availability of large labeled source datasets, investigation of
this setting merits attention and is carried out in this paper.
130
Background
133
134
095
096
097
098
099
100
101
102
103
104
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
129
130
131
131
132
093
128
128
129
092
105
105
106
091
094
094
095
090
In this section we formally define the learning task in domain adaptation and
give some background about the Subspace Alignment approach for domain adaptation on which we build up our method.
132
133
134
4
135
3.1
Learning Task
135
136
136
137
138
139
140
141
142
143
144
145
We have a set S of labeled source data and a set T of unlabeled target data.
Both S and T are D dimensional. The source label set is denoted by Cs =
{Cs1 , . . . , Csm } and the target label set by Ct = {Ct1 , . . . , Ctn }, where m is
the number of classes in the source domain and n is the number of classes in
the target domain. Generally, domain adaptation algorithms assume source and
target domains have the same label set i.e. Cs = Ct . Source S is drawn from
a source distribution Ds and target T is drawn from target distribution Dt . In
general Ds 6= Dt . The aim in domain adaptation is to generalize a model learned
on source domain Ds to target domain Dt .
3.2
Subspace Alignment
150
151
152
153
154
155
The subspace alignment approach [4] for domain adaptation first projects the
source data (S) and target data (T ) to their respective d-dimensional subspaces
using Principal Component Analysis. Let us denote these subspaces by Xs for
source and Xt for target (Xs , Xt RDd ). Then subspace alignment directly
aligns the two subspaces Xs and Xt using a linear transformation function. Let
us denote the transformation matrix by M . To learn M , subspace alignment
minimizes following Bergmann divergence:
F (M ) = ||Xs M Xt ||2F
157
163
164
165
166
171
Xs M Xt0 .
where A =
an SVM classifier.
This similarity measure is then used as a kernel to train
176
177
178
179
147
149
150
151
152
153
154
155
157
159
161
162
163
164
165
166
168
169
170
171
172
Approach
173
174
174
175
145
167
Sim(ys , yt ) = ys0 Ayt
172
173
144
160
168
170
143
where k kF denotes Frobenius norm. Using the fact that Xs and Xt are intrinsically regularized and Frobenius norm is invariant to orthonormal operations,
it can be easily shown that the solution for the given optimization problem
is M = Xs0 Xt [4]. We denote a matrix transpose operation by 0 . The target
aligned source subspace is then given by Us = Xs Xs0 Xt , (Us RDd ). To compare source data point ys with a target data point yt (ys , yt RD1 ), the
Subspace Alignment approach uses the following similarity measure:
167
169
142
158
159
162
141
M = argminM (F (M ))
158
161
140
156
156
160
139
148
148
149
138
146
146
147
137
In this section we introduce our algorithm to adapt for asymmetrically distributed label sets between source and target class. The algorithm can be divided
into two steps. The first step iteratively expands the source subspace to discover
target categories by minimizing projection errors and the second step uses these
categories to train a source classifier in the expanded subspace.
175
176
177
178
179

180
181
182
183
184
185
186
187
188
189
190
In Figure 1, we illustrate the steps involved in the proposed approach. Evolving subspace algorithm evolves source subspace by iteratively adding categories
that minimizes the projection error between source and target subspace. This
iterative algorithm gives an ordering on the source categories. It also outputs
the projection error after each iteration of the algorithm. We first select K categories from this ordering which minimize the projection error. We can then use
any domain adaptation algorithm to learn a classifier using these selected source
categories to adapt to target.
We first define two different projection error criteria that are used later to
order the categories from source. Then we describe the two steps of our algorithm.
180
181
182
183
184
185
186
187
188
189
190
191
191
192
192
193
193
194
194
195
195
196
196
197
197
198
198
199
199
200
200
201
201
202
202
203
Fig. 1: Brief overview of evolving subspace algorithm.
204
206
206
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
204
205
205
207
203
4.1
Projection Errors
207
Projection Error is a key part of our algorithm as it decides which categories

minimize the discrepancy between source and target subspaces. We introduce
two different projection errors. Given the d-dimensional source subspace Xs and
the d-dimensional target subspace Xt (Xs , Xt RDd ), let us denote by Us the
target aligned source subspace i.e. Us = Xs Xs0 Xt ( RDd ) [4]. Similar to the
subspace alignment criterion described in Section 3, our first projection error
is the following Bregmann matrix divergence (we call it Subspace Alignment
Error):
SubspaceAlignmentError = ||Us Xt ||F
208
where F denotes Frobenius norm. This subspace alignment error is a subspace

based criterion. Intuitively, domains may share principal components. Subspace
alignment aligns principle components which makes adaptation possible. If a category is present in both source and target then corresponding category specific
subspaces for source and target will share principal components and subspace
alignment error will be small for such categories. This makes subspace alignment
error a good projection error for our problem. Our second criterion is Reprojection Error introduced by Hoffman et al. [12]. Reprojection error is a sampling
217
209
210
211
212
213
214
215
216
218
219
220
221
222
223
224
6
225
226
227
228
based criterion which uses source examples instead of source subspace for computing projection error. Hoffman et al. [12] compute the Reprojection Error of
the source data. We instead use the following Reprojection error of target data
on target aligned source subspace (Us ):
225
ReprojectionError = ||T (T Us )Us0 ||F
229
where T ( R
) is the target data. Reprojection error is a sampling based
criterion. Intuitively, reprojection error tries to measure the discrepancy between
original target data matrix and reconstructed target data matrix after projecting
them onto target aligned source subspace. Algorithm 1 and algorithm 2 give
the functions for computing Subspace Alignment Error and Reprojection Error
respectively which we will use as a sub-routine in the next section.
231
229
230
231
232
233
234
235
236
Nt D
241
242
243
244
245
246
Algorithm 1 Computing the Subspace Alignment Error

Input: Set of categories denoted by C
SC Source Examples with labels C
T Target Examples
Xt P CA(T )
Xs P CA(SC )
Us Xs Xs0 Xt
Output: ||Us Xt ||F
251
252
253
254
255
256
Target Subspace
Source Subspace on given label set
Target aligned source subspace
Subspace Alignment Error
261
262
263
264
265
266
267
268
269
234
235
236
239
243
244
245
246
247
248
Algorithm 2 Computing the Reprojection Error

Input: Set of categories denoted by C
SC Source Examples with labels C
T Target Examples
Xt P CA(T )
Xs P CA(SC )
Us Xs Xs0 Xt
Output: ||T (T Us )Us0 ||F
249
250
251
252
//
//
//
//
Target Subspace
Source Subspace on given label set
Target aligned source subspace
Reprojection Error
253
254
255
256
257
258
258
260
233
242
//
//
//
//
257
259
232
241
248
250
230
240
247
249
228
238
238
240
227
237
237
239
226
4.2
Step 1: Evolving Subspace
In this step we evolve the source subspace into the target subspace by iteratively
adding source categories which minimizes the projection error of the subspace
specific to these categories. Algorithm 3 gives the different steps involved. We
start with an empty list of categories which we denote by selectedCategories. In
each outer iteration of the algorithm we add a category to this list. The inner iteration uses Algorithm 1,2 to compute the projection errors and selects a category
with minimum error from the categories not selected so far. The algorithm finally
outputs the list of categories added after each iteration (selectedCategories) and
the list of corresponding projection errors (errors) at the end of each iteration.
Let us denote by Us the target aligned source subspace on the categories in the
259
260
261
262
263
264
265
266
267
268
269

270
271
272
273
274
275
276
277
278
279
280
281
list selectedCategories. Intuitively, if we add a source category Ci to the list

selectedCategories and the category Ci is also present in the target label set,
then adding Ci will bring Us and the target subspace Xt closer to each other.
We use projection errors for deciding this closeness of the two subspaces. If projection error increases on adding category Ci then the examples in Ci are not
similar to the target examples. This algorithm naturally gives us an ordering
on the source categories: the order in which categories are selected based on
minimizing the projection errors. Ideally, we expect categories present in target
domain to occur first in the list selectedCategories. Hence we expect the projection error to first decrease after each iteration until all categories which are
present in both source and target are added and then it must increase as we add
other categories.
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
272
273
274
275
276
277
278
279
280
281
283
283
285
271
282
282
284
270
Algorithm 3 Evolving source subspace to learn target categories

selectedCategories [ ]
// Empty List
unselectedSet Cs
// Set of categories in source
errors [ ]
// Stores projection errors
while unselectedSet is not empty do
minP rojectionError INFINITE
selectedCategory 1
for Category c in unselectedSet do
error Projection Error on selctedCategories {c} (Algorithm 1,2)
if error < minP rojectionError then
minP rojectionError error
selectedCategory c
end if
end for
Remove selectedCategory from unselectedSet
Append selectedCategory to selectedCategories
Append error for selectedCategory to errors
end while
Output: selectedCategories, errors
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
301
302
302
303
304
305
306
307
308
309
310
311
312
313
314
4.3
Step 2: Training Source Classifier
Step 1 gives us an ordering on the categories of the source domain. Let us

denote this ordering by selectedCategories = [C1 , C2 . . . , Cm ], where m is
the number of classes in the source label set. Let the projection errors after
adding each category in step 1 be errors = [E1 , E2 , . . . , Em ]. As discussed before
we expect values in errors to first decrease and then increase. We propose to
use this intuition and select the first K categories of the list selectedCategories
such that EK ( errors) is a minimum. We then use the categories Ctrain =
[C1 , C2 , . . . , CK ] for training a source classifier in the target aligned subspace.
Ideally, we should get a single global minimum in errors, but this is not true in
practice and we get many local minima. Hence we propose to use two methods to
303
304
305
306
307
308
309
310
311
312
313
314
8
315
316
317
318
select K. The first method is to select K such that EK is the first local minimum
in errors. The second method selects K such that EK is the global minimum in
errors. We discuss advantages and disadvantages of using local minima vs using
global minima in section 6.3. Algorithm 4 outlines the steps described here.
316
317
318
319
319
320
320
321
315
Algorithm 4 Training Source Classifier
322
Input:
// Output of Algorithm 3
selectedCategories = [C1 , C2 . . . , Cm ]
errors = [E1 , E2 , . . . , Em ]
K Index in errors with Global minima/first local minima.
// Categories selected for training
Ctrain [C1 , C2 . . . , CK ]
Strain Source examples with labels Ctrain
XStrain P CA(Strain )
// Source Subspace
Xt P CA(T )
// Target Subspace
US XStrain XS0 train Xt
// Target Aligned source space
trainingData Strain US
// Source data in aligned space
Classif ier Train classifier on trainingData
We then use this Classif ier to predict on target data in target space Xt .
Output: Classif ier
323
324
325
326
327
328
329
330
331
332
333
321
322
323
324
325
326
327
328
329
330
331
332
333
334
334
335
335
336
336
337
Multi-Source Domain Adaptation
339
339
340
341
342
343
344
345
Simplicity of our algorithm allows us to extend our algorithm to multi-source

domain adaptation where we have more than one source domains. The aim in
multi-source domain adaptation is to generalize these multiple source domains
to a single target domain. This is particularly useful in practice, as a non-expert
may not know which source dataset is to be preferred for his given target data
(i.e., most similar to it).
5.1
Problem Description
352
353
354
355
356
357
358
359
342
343
344
345
348
349
349
351
341
347
347
350
340
346
346
348
337
338
338
Given labeled source domains S1 , S2 , . . . , Sp and a target domain T , the aim in

multi-source domain adaptation is to generalize a model learned on these source
domains to target domain T . Let L = {C1 , C2 , . . . , Cm } denote the label set of
source and target domains. A common approach for using multiple sources is
to combine all the sources into a single source and then learn a classifier using
single source domain adaptation. This approach ignores the difference between
different source distributions. Another common approach is to train a classifier
for each source domain separately and combine these classifiers. In contrast, our
algorithm automatically picks an optimal combination of categories from the
different source domains.
350
351
352
353
354
355
356
357
358
359

360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
5.2
Our Approach
We again use the idea of evolving source subspace by selecting categories which
minimizes projection errors. For this problem, the same categories coming from
different source domains are considered as different. Let us modify our label
set L to L = {C11 , C12 , . . . , Cpm } where Cij represents category Cj of source
domain Si . We combine all the source domains and apply our single source
adaptation algorithm given in the previous section (Section 4) to this combined
source domain with label set L . The evolving step (Algorithm 3) of our approach
will give an ordering on label set L , and the training step (Algorithm 4) will
train a classifier on first K categories of this ordering which minimizes overall
projection error. Although our algorithm does not require the knowledge of the
target label set, for the multi-source setting we evaluate with both known and
unknown target label set. When using the target label set information, we add
an additional constraint in our training step (Algorithm 4) that the category set
Ctrain used for training the source classifier should have each target category
present at least once. If a target category Cj is such that Cij is not in training
set Ctrain for all source domains Si (i 1 . . . p) then we select the Cij which
occurs first in the ordering. We give experimental results for multi-source domain
adaptation in Section 6.6.
381
382
383
384
385
386
387
388
389
Experimental Results
This section is devoted to experimental results to validate our proposed algorithm. The performance evaluation has been done on numerous object classification datasets. SVM is used as the final classifier on transformed source data.
Apart from comparing the empirical performances on several datsets, we also
demonstrate the effect of varying the size of the target label set and analyze the
effect of underestimating/overestimating the number of target categories on the
adaptation. We intend to make our code available on acceptance.
392
393
394
395
396
397
398
399
6.1
Datasets
Cross-Dataset Testbed [13]: Tommasi et al. [13] have preprocessed several

existing datasets for evaluating domain adaptation algorithms. They have proposed two setups: the dense set and the sparse set for cross-dataset analysis.
Here we use the dense set for the evaluation of our algorithm. The dense set
consists of four datasets Caltech256 [14], Bing[15], Sun[16], and ImageNet[17]
which share 114 categories in themselves with 40 of the categories having more
than 20 images per category. We use these 40 categories for the performance
evaluation of the proposed algorithm.
402
403
404
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
380
381
382
383
384
385
386
387
388
389
391
392
393
394
395
396
397
398
399
400
400
401
362
390
390
391
361
379
379
380
360
Office-Dataset [18]: This datasets consists of three domains: Amazon (images obtained from Amazon website), DSLR (High resolution images obtained
from DSLR camera), Webcam (Low resolution images obtained from a Webcam) each having 34 categories. As done in [3],[4] we use this dataset along with
401
402
403
404
10
405
406
407
408
409
410
411
412
Caltech256 [14] which together have 10 overlapping classes for the performance
evaluation in multi-source setting.
For all the domains, we use DeCAF [19] features extracted at layer 7 of the
Convolutional Neural Network. These are the activation values of 7th layer of the
Convolutional Neural Network trained on 1000 Imagenet classes. In the following
sections we present evaluation of our algorithm on Bing[15] and Caltech256[14]
datasets of dense setups. We give results for other datasets of dense setup in
supplementary material.
415
416
417
418
419
420
421
422
423
424
406
407
408
409
410
411
412
413
413
414
405
6.2
Effect of Number of Source Categories on Adaptation
414
In this subsection, we demonstrate the effect of varying the number of source

categories on the performance of domain adaptation. Bing is used as source
dataset and Caltech as target. We fix the size of the label set in the target data,
setting it to 10. We apply our algorithm and greedily choose the source categories
one by one in successive steps from the source dataset. The maximum no. of
source categories which can be chosen is set to 30 categories. Figure 2 shows the
result for this experiment. We show both accuracy (blue) and reprojection errors
(green) in Figure 2. The maximum accuracy is achieved when the number of
selected source categories is 13 and the minimum rerprojection error is achieved
when the number of selected source categories is 11.
415
416
417
418
419
420
421
422
423
424
425
425
426
426
Effect of Number of Source Categories on Adaptation
100
427
2000
427
428
428
429
429
75
433
434
435
436
Accuracy on Target (%)
432
431
Maximum
Accuracy
50
1500
437
438
430
1750
431
432
433
434
435
436
437
Global Minima of
Reprojection Error
25
Reprojection Error
430
1250
438
439
439
440
440
441
442
0
0
10
15
20
25
1000
30
Number of Source Categories used from Ordering
445
446
447
448
449
442
443
443
444
441
Fig. 2: Performance of our adaptation algorithm (left y-axis) and reprojection

error (right y-axis) as we change the number of source categories (x-axis) used
for training the source classifier.
It can be observed from the figure 2 that for small number of source categories
accuracy of the adapted classifier is low as a number of target categories are
missing while learning the source subspace. It can also be clearly seen that
444
445
446
447
448
449

450
451
452
453
454
455
456
457
458
459
460
11
having too many categories in source classifier which are not present in target
doesnt help in improving classification accuracy because it increases the distance
between source and target subspace. As mentioned earlier in section 3.1, we
expect the domain adaptation algorithm to perform best when source and target
label sets are the same because under these conditions the distance between the
subspaces would be the smallest. All the target categories are covered in first 13
source categories of the ordering which justifies the maximum value of accuracy
for 13 source categories. The plot also justifies our intuition for projection error
as it first decreases, attains a minimum and then increases.
6.3
Discovered Classes
Classes not discoverd
Extra classes discoverd
25
25
473
20
476
457
458
15
15
10
10
465
466
467
468
10
15
20
25
Actual number of target classes
30
35
40
(a) Using Reprojection Error and First

Local Minima
469
470
0
10
15
20
25
30
35
40
471
(b) Using Subspace Alignment Error and

First Local Minima
472
473
474
474
475
456
464
20
471
472
455
463
468
470
454
462
Discovered Classes
467
469
453
461
Source: Bing (40 classes), Target: Caltech, SA Error, Local Minima
30
464
466
452
460
Source: Bing (40 classes), Target: Caltech, Reprojection Error, Local Minima
30
463
465
451
459
Analysis of Evolution Step
461
462
450
Source: Bing (40 classes), Target: Caltech, Reprojection Error, Global Minima
35
Source: Bing (40 classes), Target: Caltech, SA Error, Global Minima
475
35
Discovered Classes
30
Discovered Classes
30
476
477
477
25
25
20
20
480
15
15
480
481
10
10
481
482
482
478
478
479
483
484
485
486
487
488
489
490
491
492
493
494
10
15
20
25
30
35
40
(c) Using Reprojection Error and Global

Minima
479
483
0
10
15
20
25
30
35
40
(d) Using Subspace Alignment Error and

Global Minima
Fig. 3: Performance of Evolution Step for different sizes of the target label set (xaxis). Blue: Number of target categories covered in final selected set Ctrain . Red:
Number of target categories missed in final set. Green: Number of categories not
present in target but selected in final set.
Our measure for analyzing the evolution step is based on the number of
target categories present in the final list of selected categories by our algorithm.
Let us assume that the source categories are (Csrc = [C1 , C2 , . . . , Cm ]) and
484
485
486
487
488
489
490
491
492
493
494
12
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
our algorithm selects (Ctrain = [C1 , C2 , . . . , CK ]) out of Csrc after the final
evolution step of the algorithm. Figure 3 shows the performance for Bing as
source dataset and Caltech as target dataset. X-axis represents true number
of categories in target dataset. For each point on x-axis we average over 10
permutation to get number of target categories correctly discovered, number of
target categories missed and number of non-target categories in final selected set
Ctrain . Ideally, we expect selected categories Ctrain to be the label set of target
domain. Hence, we expect a x = y line in the plot.
Reprojection Error (Figure 3a,3c) and Subspace Alignment Error (Figure
3b,3d) both show similar trends. For computing K we propose to use either
the first local minimum or the global minimum of projection errors. It can be
clearly seen in Figure 3a,3b that the number of non-target categories discovered
using the first local minimum is very small (Green curve) but it highly underestimates the number of target categories for large number of target classes (Blue vs
Red curve). Hence, using the first local minimum discovers very small number of
wrong classes but misses a lot of correct classes. Using the global minimum (Figure 3c,3d ) for selecting K doesnt underestimate for large number of categories
in target. However, for small number of categories using the global minimum
overestimates. Although, we use global minimum for rest of the experiments,
one can use first local minimum if one knows beforehand that the number of
categories in target is small. As subspace alignment error and reprojection error
show similar performance, we use reprojection error for the rest of the experiments. (For other pairs of source and target dataset in dense setup we give plots
for this experiment in supplementary material).
495
6.4
519
Benchmark Comparison with SA as adaptation technique
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
520
520
521
496
Next, we compare the performance of our algorithm with two extreme setups:
First is the performance of domain adaptation algorithm while all available
source categories are considered with respect to the target categories to learn
the domain invariant representation and second is the performance of domain
adaptation algorithm when only those source categories are considered which
also belong to target categories (i.e., using an oracle). Here, we use subspace
alignment algorithm [4] for the final domain adaptation. Figure 4 shows the performance of our algorithm with subspace alignment based domain adaptation
algorithm. Bing is used as source (40 categories), Caltech is used as target and
for each point on x-axis accuracies are averaged over 10 experiments. As we have
discussed in Section 6.2, we expect a domain adaptation algorithm to perform
best when source and target categories have same label set. Green bars in Figure
4 justify our intuition. Performance of categories given by our algorithm (blue
bars) always outperform the case when we use all source categories for doing the
domain adaptation and training the classifier (red bars).
6.5
Experiments with GFK as adaptation technique
It is important to note that once the categories from the source have been selected using the criteria proposed in section 4, any domain adaptation algorithm
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539

540
541
542
543
544
545
546
13
can be used afterwards. So far, we choose to use subspace alignment based domain adaptation approach to evaluate our algorithm. Here, we also provide an
experiment with geodesic flow kernel based approach [3] to validate our claim.
We use Webcam and Amazon domains of Office-Caltech dataset as target and
source respectively. Results given in table 1 show that GFK classifier trained using categories predicted by our algorithm outperforms the GFK classifier which
uses all available source categories in almost all of the cases.
540
541
542
543
544
545
546
547
547
100
548
549
all available source categories (40)

source categories from our algorithm
source categories=target categories (Oracle)
95
548
549
550
550
90
551
552
551
552
85
553
Accuracy (%)
553
554
555
556
80
554
555
75
556
70
557
557
558
558
65
559
559
60
560
560
55
561
562
50
563
561
562
2
10
16
19
Number of Target Categories
24
30
565
566
567
Fig. 4: Performance of our approach with subspace alignment based domain

adaptation algorithm on Bing-Caltech dataset, with Bing as Source (40 categories) and Caltech as Target.
570
571
572
573
574
575
576
577
Number of Target Source Categories = Target Available Source Predicted Target Categories
Categories
Categories (Oracle)
Categories
(Our algorithm)
2
93.6701
64.0528
68.2893
3
92.8030
64.6888
78.0935
4
86.9177
64.3291
71.9271
5
83.6397
68.1922
74.8842
6
79.4589
68.7642
69.9612
7
79.7467
71.3108
73.4498
8
78.8076
72.0334
72.3195
9
77.0064
73.4751
73.5467
10
73.7835
73.4263
71.7411
580
Table 1: Performance of our algorithm with GFK (geodesic flow kernel) as adaptation technique. Source Datset:Webcam, Target Dataset:Amazon
583
584
567
569
570
571
572
573
574
575
576
577
579
580
581
581
582
566
578
578
579
565
568
568
569
563
564
564
6.6
Multi-Source Domain Adaptation
One more important application of our approach is in the case of multiple source
domain adaptation. These settings occur very often in real world scenarios. Con-
582
583
584
14
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
sider a case when you have lots of source domains available and have no information about the domain/domains which is/are most similar to the target. We
apply our algorithm to choose the corresponding source categories from the best
possible source domain to match with the respective target labels. This section
is devoted to the experiments on multi-source domain adaptation setting using our proposed approach. We use Office-Caltech dataset for this experiment.
There are 10 overlapping categories in four domains of Office-Caltech dataset.
We consider one of the four domains as target and rest three as source domains.
Both source and target categories are assumed to be known. As discussed before
we consider same category from different sources as different, we relabel source
categories from 1 to 30 (10 for each source domain). Now, we run our algorithm
with the extra constraint to select all the target categories at least once from
some source domain. The experimental results for this setting has been reported
in table 2. Along with the performance of our algorithm we list performance of
single source domain adaptation with all three sources combined into a single
source and also the performance of three single source domain adaptation classifier each with one of three sources. Note that our algorithm either outperforms
or performs very comparable to all the other benchmarks mentioned above on
Office-Caltech dataset. If we remove the constraint of selecting all the target
categories at least once then we get the same results as that for the constrained
case for DSLR, Amazon and Caltech as target datasets. For Webcam, number
of target categories are underestimated and we get a minima at 7 instead of 10.
607
Table 2: Multi-Source domain adaptation results.
608
609
610
611
612
613
Target Domain Source

DSLR
DSLR
Amazon
61.27
Webcam
82.91
Caltech
62.61
Source Source Source

Source
Source
Number of
Amazon Webcam Caltech Rest three domains Selected by our Algorithm categories
86.00
97.33 78.00
70.67
97.33
10
64.62
59.82
62.05
74.44
16
72.95
58.01
66.55
81.49
10
91.62
70.87
91.74
90.85
17
Conclusion
618
619
620
621
622
623
624
625
626
627
628
629
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
615
616
616
617
586
614
614
615
585
In this paper, we provide a method to adapt classifiers from a source domain

with a large category set to a target domain with few number of classes. The
main contribution is a method to obtain an ordered set of category specific source
subspaces that match a particular target domain. The stopping criterion makes
such a ordering practically useful (it could be a preprocessing step to enable
a number of other domain adaptation techniques). Its use in the multi-source
domain adaptation setting can serve widening the scope of domain adaptation
to real world (in the wild) settings where any combination of source domains
can be made available.
In future, we would be interested in exploring domain adaptation in the wild
setting for more challenging scenarios such as weakly supervised segmentation
and in per sample domain adaptation settings.
617
618
619
620
621
622
623
624
625
626
627
628
629

630
15
References
631
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
630
1. Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: Computer Vision and
Pattern Recognition (CVPR), 2011 IEEE Conference on, IEEE (2011) 15211528
2. Gopalan, R., Li, R., Chellappa, R.: Domain adaptation for object recognition: An
unsupervised approach. In: Computer Vision (ICCV), 2011 IEEE International
Conference on, IEEE (2011) 9991006
3. Gong, B., Shi, Y., Sha, F., Grauman, K.: Geodesic flow kernel for unsupervised
domain adaptation. In: Computer Vision and Pattern Recognition (CVPR), 2012
IEEE Conference on, IEEE (2012) 20662073
4. Fernando, B., Habrard, A., Sebban, M., Tuytelaars, T.: Subspace alignment for
domain adaptation. arXiv preprint arXiv:1409.5241 (2014)
5. Baktashmotlagh, M., Harandi, M., Lovell, B., Salzmann, M.: Unsupervised domain adaptation by domain invariant projection. In: Proceedings of the IEEE
International Conference on Computer Vision. (2013) 769776
6. Patel, V.M., Gopalan, R., Li, R., Chellappa, R.: Visual domain adaptation: A
survey of recent advances. Signal Processing Magazine, IEEE 32(3) (2015) 5369
7. Margolis, A.: A literature review of domain adaptation with unlabeled data. Tec.
Report (2011) 142
8. Glorot, X., Bordes, A., Bengio, Y.: Domain adaptation for large-scale sentiment
classification: A deep learning approach. In: Proceedings of the 28th International
Conference on Machine Learning (ICML-11). (2011) 513520
9. Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level
image representations using convolutional neural networks. In: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition. (2014) 17171724
10. Long, M., Wang, J.: Learning transferable features with deep adaptation networks.
arXiv preprint arXiv:1502.02791 (2015)
11. Raj, A., Namboodiri, V.P., Tuytelaars, T.: Mind the gap: Subspace based hierarchical domain adaptation. arXiv preprint arXiv:1501.03952 (2015)
12. Hoffman, J., Darrell, T., Saenko, K.: Continuous manifold based adaptation for
evolving visual domains. In: Computer Vision and Pattern Recognition (CVPR),
2014 IEEE Conference on, IEEE (2014) 867874
13. Tommasi, T., Tuytelaars, T.: A testbed for cross-dataset analysis. In: Computer
Vision-ECCV 2014 Workshops, Springer (2014) 1831
14. Griffin, G., Holub, A., Perona, P.: Caltech-256 object category dataset. (2007)
15. Bergamo, A., Torresani, L.: Exploiting weakly-labeled web images to improve
object classification: a domain adaptation approach. In: Advances in Neural Information Processing Systems. (2010) 181189
16. Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: Sun database: Large-scale
scene recognition from abbey to zoo. In: Computer vision and pattern recognition
(CVPR), 2010 IEEE conference on, IEEE (2010) 34853492
17. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale
hierarchical image database. In: Computer Vision and Pattern Recognition, 2009.
CVPR 2009. IEEE Conference on, IEEE (2009) 248255
18. Saenko, K., Kulis, B., Fritz, M., Darrell, T.: Adapting visual category models to
new domains. In: Computer VisionECCV 2010. Springer (2010) 213226
19. Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.:
Decaf: A deep convolutional activation feature for generic visual recognition. arXiv
preprint arXiv:1310.1531 (2013)
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674

Unsupervised Domain Adaptation

Hochgeladen von

Dokumentinformationen

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Unsupervised Domain Adaptation

Hochgeladen von

Copyright:

Verfügbare Formate

000

Unsupervised Domain Adaptation in the Wild:

Anonymous ECCV submission

Abstract. The goal of domain adaptation is to adapt models learned

Keywords: Domain Adaptation

Suppose we have access to a state-of-the-art classifier trained with thousands of

ECCV-16 submission ID 925

ECCV-16 submission ID 925

Deep adaptation With the availability of convolutional neural networks (CNNs),

Subspace-based methods In our work we mainly work with subspace based

ECCV-16 submission ID 925

This similarity measure is then used as a kernel to train

Sim(ys , yt ) = ys0 Ayt

ECCV-16 submission ID 925

Fig. 1: Brief overview of evolving subspace algorithm.

Projection Error is a key part of our algorithm as it decides which categories

where F denotes Frobenius norm. This subspace alignment error is a subspace

ECCV-16 submission ID 925

ReprojectionError = ||T (T Us )Us0 ||F

Algorithm 1 Computing the Subspace Alignment Error

Algorithm 2 Computing the Reprojection Error

Step 1: Evolving Subspace

ECCV-16 submission ID 925

list selectedCategories. Intuitively, if we add a source category Ci to the list

Algorithm 3 Evolving source subspace to learn target categories

Step 2: Training Source Classifier

Step 1 gives us an ordering on the categories of the source domain. Let us

ECCV-16 submission ID 925

Algorithm 4 Training Source Classifier

Multi-Source Domain Adaptation

Simplicity of our algorithm allows us to extend our algorithm to multi-source

Given labeled source domains S1 , S2 , . . . , Sp and a target domain T , the aim in

ECCV-16 submission ID 925

Cross-Dataset Testbed [13]: Tommasi et al. [13] have preprocessed several

ECCV-16 submission ID 925

Effect of Number of Source Categories on Adaptation

In this subsection, we demonstrate the effect of varying the number of source

Effect of Number of Source Categories on Adaptation

Accuracy on Target (%)

Number of Source Categories used from Ordering

Fig. 2: Performance of our adaptation algorithm (left y-axis) and reprojection

ECCV-16 submission ID 925

(a) Using Reprojection Error and First

(b) Using Subspace Alignment Error and

Analysis of Evolution Step

Source: Bing (40 classes), Target: Caltech, SA Error, Global Minima

(c) Using Reprojection Error and Global

(d) Using Subspace Alignment Error and

ECCV-16 submission ID 925

Benchmark Comparison with SA as adaptation technique

Experiments with GFK as adaptation technique

ECCV-16 submission ID 925

all available source categories (40)

Fig. 4: Performance of our approach with subspace alignment based domain

Multi-Source Domain Adaptation

Table 2: Multi-Source domain adaptation results.

ECCV-16 submission ID 925

Target Domain Source

Source Source Source

In this paper, we provide a method to adapt classifiers from a source domain

ECCV-16 submission ID 925

Das könnte Ihnen auch gefallen