Róbert Busa-Fekete^{1,2}  busarobi@inf.u-szeged.hu
Balázs Szörényi^{2,3}  szorenyi@inf.u-szeged.hu
Paul Weng^{4}  paul.weng@lip6.fr
Weiwei Cheng^{1}  cheng@mathematik.uni-marburg.de
Eyke Hüllermeier^{1}  eyke@mathematik.uni-marburg.de

^1 Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str., 35032 Marburg, Germany
^2 Research Group on Artificial Intelligence, Hungarian Academy of Sciences and University of Szeged, Hungary
^3 INRIA Lille - Nord Europe, SequeL project, 40 avenue Halley, 59650 Villeneuve d'Ascq, France
^4 Pierre and Marie Curie University (UPMC), 4 place Jussieu, 75005 Paris, France
Abstract
We consider the problem of reliably selecting an optimal subset of fixed size from a given set of choice alternatives, based on noisy information about the quality of these alternatives. Problems of a similar kind have been tackled by means of adaptive sampling schemes called racing algorithms. However, in contrast to existing approaches, we do not assume that each alternative is characterized by a real-valued random variable, and that samples are taken from the corresponding distributions. Instead, we only assume that alternatives can be compared in terms of pairwise preferences. We propose and formally analyze a general preference-based racing algorithm that we instantiate with three specific ranking procedures and corresponding sampling schemes. Experiments with real and synthetic data are presented to show the efficiency of our approach.
1. Introduction
$$\sum_{i \in I} \sum_{j \neq i} \mathbb{I}\{ o_j \prec o_i \} \tag{1}$$

$$\bar y_{i,j} = \frac{1}{n} \sum_{\ell=1}^{n} y_{i,j}^{\ell} \tag{2}$$

$$y_i = \frac{1}{K-1} \sum_{k \neq i} y_{i,k} < \frac{1}{K-1} \sum_{k \neq j} y_{j,k} = y_j \tag{4}$$
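Assuming the pairwise probabilities y_{i,j} are known, the Copeland (CO) and sum-of-expectations (SE) scores underlying these rankings can be illustrated with a small Python sketch (toy code, not the authors' implementation; the matrix `y` is hypothetical):

```python
def copeland_scores(y):
    """Copeland (CO) score: the number of options o_j that o_i beats,
    i.e., #{j != i : y[i][j] > 1/2}."""
    K = len(y)
    return [sum(1 for j in range(K) if j != i and y[i][j] > 0.5)
            for i in range(K)]

def se_scores(y):
    """Sum-of-expectations (SE) score y_i = (1/(K-1)) * sum_{k != i} y[i][k]."""
    K = len(y)
    return [sum(y[i][k] for k in range(K) if k != i) / (K - 1)
            for i in range(K)]

# Reciprocal toy matrix: y[i][j] = P(o_i beats o_j), with y[i][j] = 1 - y[j][i].
y = [[0.5, 0.8, 0.6],
     [0.2, 0.5, 0.7],
     [0.4, 0.3, 0.5]]

co = copeland_scores(y)  # o_0 beats both others, o_1 beats one, o_2 none
se = se_scores(y)        # mean winning probabilities against the rest
```

Both scores induce the same top option here, but in general CO and SE can disagree, which is exactly why the paper treats them as separate ranking models.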
The RW ranking is directly motivated by the PageRank algorithm (Brin & Page, 1998), which has been well studied in social choice theory (Altman & Tennenholtz, 2008; Brandt & Fischer, 2007) and rank aggregation (Negahban et al., 2012).
Algorithm 1 PBR (fragment, lines 10-22; the earlier lines sample the pairs in A and update the ȳ_{i,j} according to (2)):

10: for i, j = 1 → K do                                   ▷ Update confidence bounds C, U, L
11:   c_{i,j} = √( (1/(2 n_{i,j})) log(2K² n_max / δ) )   ▷ Hoeffding bound
12:   u_{i,j} = ȳ_{i,j} + c_{i,j},  ℓ_{i,j} = ȳ_{i,j} − c_{i,j}
13-21: depending on the ranking model, call one of
       (A, B) = SSCO(A, Ȳ, K, κ, U, L)                    ▷ Sampling strategy for CO
       (A, B, D) = SSSE(A, Ȳ, K, κ, U, L, D)              ▷ Sampling strategy for SE
       (A, B) = SSRW(Ȳ, K, κ, C)                          ▷ Sampling strategy for RW
22: return B
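The Hoeffding-based interval of line 11 can be computed as in the following sketch (parameter names are hypothetical; the union-bound factor 2K²n_max/δ is taken from the algorithm):

```python
import math

def hoeffding_radius(n_ij, K, n_max, delta):
    """Half-width c_ij of the confidence interval for the mean of n_ij
    samples in [0, 1], with the 2*K^2*n_max union-bound factor of Algorithm 1."""
    return math.sqrt(math.log(2 * K * K * n_max / delta) / (2 * n_ij))

# Interval [l_ij, u_ij] around an estimate ybar_ij = 0.6 after 50 comparisons.
c = hoeffding_radius(n_ij=50, K=10, n_max=1000, delta=0.05)
l_ij, u_ij = 0.6 - c, 0.6 + c
```

The radius shrinks as O(1/sqrt(n_ij)), so pairs that are sampled longer get tighter intervals, which is what eventually lets the sampling strategies below remove them from A.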
4. Sampling Strategies
4.1. Copeland's Ranking (≻_CO)
The preference relation specified by the matrix Ȳ is obviously reciprocal, i.e., y_{i,j} = 1 − y_{j,i} for i ≠ j. Therefore, when using ≻_CO for ranking, the optimization task (3) can be reformulated as follows:

$$\operatorname*{argmax}_{I \subseteq [K],\, |I| = \kappa} \sum_{i \in I} \sum_{j \neq i} \mathbb{I}\{ y_{i,j} > 1/2 \} \tag{5}$$
In this section, we introduce a general preference-based racing (PBR) algorithm that provides the basic statistics needed to solve the selection problem (3), notably estimates ȳ_{i,j} of the y_{i,j} and corresponding confidence intervals. It contains a subroutine that implements sampling strategies for the different ranking models described in Section 2.3. The pseudocode of PBR is shown in Algorithm 1. The set A contains all pairs of options that still need to be sampled; it is initialized with all K² − K pairs of indices. The set B contains the indices of the current top-κ solution. The algorithm samples those Y_{i,j} with (i, j) ∈ A (lines 6-8). Then, it maintains the ȳ_{i,j} given in (2) for each pair of options (lines 9-10). We denote the confidence interval of y_{i,j} by [ℓ_{i,j}, u_{i,j}].
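The sampling loop of lines 6-10 can be simulated by drawing each Y_{i,j} as a Bernoulli variable with success probability y_{i,j} and maintaining running means, as in this toy stand-in (not the paper's code):

```python
import random

def sample_preference_matrix(y_true, n, seed=0):
    """Simulate n rounds in which every pair (i, j) is compared once per
    round; ybar[i][j] is the running mean of the Bernoulli outcomes,
    mirroring the statistic maintained in (2)."""
    rng = random.Random(seed)
    K = len(y_true)
    wins = [[0] * K for _ in range(K)]
    for _ in range(n):
        for i in range(K):
            for j in range(K):
                if i != j and rng.random() < y_true[i][j]:
                    wins[i][j] += 1
    return [[wins[i][j] / n if i != j else 0.5 for j in range(K)]
            for i in range(K)]

# Degenerate example: o_0 always beats o_1, so the estimate is exact.
ybar = sample_preference_matrix([[0.5, 1.0], [0.0, 0.5]], n=50)
```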
Procedure 2 SSCO(A, Ȳ, K, κ, U, L)
1: for i = 1 → K do
2:   z_i = |{j : u_{i,j} < 1/2, j ≠ i}|
3:   w_i = |{j : ℓ_{i,j} > 1/2, j ≠ i}|
4: C = {i : K − κ ≤ |{j : K − z_j ≤ w_i}|}        ▷ Select
5: D = {i : κ ≤ |{j : K − w_j ≤ z_i}|}            ▷ Discard
6: for (i, j) ∈ A do
7:   if (i, j ∈ C ∪ D) ∨ (1/2 ∉ [ℓ_{i,j}, u_{i,j}]) then
8:     A = A \ {(i, j)}                            ▷ Stop updating ȳ_{i,j}
9: B = the top-κ options, i.e., those for which the corresponding rows of Ȳ contain the most entries above 1/2
10: return (A, B)
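In Python, the select/discard rule of Procedure 2 (as reconstructed here; the exact threshold comparisons are a best-effort reading of the procedure) looks as follows:

```python
def ssco_select_discard(U, L, K, kappa):
    """Sketch of lines 1-5 of Procedure 2: z_i counts options that surely
    beat o_i (u_ij < 1/2), w_i counts options o_i surely beats (l_ij > 1/2).
    o_i is selected once it surely beats at least as many options as all but
    K - kappa rivals can possibly beat, and discarded in the symmetric case."""
    z = [sum(1 for j in range(K) if j != i and U[i][j] < 0.5) for i in range(K)]
    w = [sum(1 for j in range(K) if j != i and L[i][j] > 0.5) for i in range(K)]
    C = {i for i in range(K)
         if K - kappa <= sum(1 for j in range(K) if K - z[j] <= w[i])}
    D = {i for i in range(K)
         if kappa <= sum(1 for j in range(K) if K - w[j] <= z[i])}
    return C, D

# Fully resolved toy instance: o_0 surely beats o_1 and o_2, o_1 beats o_2.
U = [[0.5, 0.9, 0.9], [0.3, 0.5, 0.9], [0.3, 0.3, 0.5]]
L = [[0.5, 0.7, 0.7], [0.1, 0.5, 0.7], [0.1, 0.1, 0.5]]
C, D = ssco_select_discard(U, L, K=3, kappa=1)  # selects o_0, discards the rest
```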
Procedure 3 SSSE(A, Ȳ, K, κ, U, L, D)
1: G = {i : i appearing in A}                      ▷ Active options
2: B̃ = {1, …, K} \ (G ∪ D)                         ▷ Already selected
3: for all i ∈ G do
4:   ℓ_i = (1/(K−1)) Σ_{j ∈ G\{i}} ℓ_{i,j}
5:   u_i = (1/(K−1)) Σ_{j ∈ G\{i}} u_{i,j}
6: K̃ = |G|,  κ̃ = κ − |B̃|                          ▷ Reduced problem
7: B = B̃ ∪ {i : K̃ − κ̃ ≤ |{j ∈ G : u_j < ℓ_i}|}
8: D = D ∪ {i : κ̃ ≤ |{j ∈ G : u_i < ℓ_j}|}
9: for (i, j) ∈ A do
10:  if (i ∈ B ∪ D) then
11:    A = A \ {(i, j)}                            ▷ Stop updating ȳ_{i,j}
12: for i = 1 → K do ȳ_i = (1/(K−1)) Σ_{j≠i} ȳ_{i,j}
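Lines 3-5 of Procedure 3, which average the pairwise bounds over the active set G into per-option SE bounds, can be sketched as:

```python
def se_bounds(L_mat, U_mat, G, K):
    """Per-option SE confidence bounds over the active set G:
    l_i and u_i average the pairwise lower/upper bounds l_ij, u_ij
    against the other active options (1/(K-1) normalization as in the text)."""
    l = {i: sum(L_mat[i][j] for j in G if j != i) / (K - 1) for i in G}
    u = {i: sum(U_mat[i][j] for j in G if j != i) / (K - 1) for i in G}
    return l, u

# Same toy interval matrices as before: o_0 clearly ahead of o_1 and o_2.
U = [[0.5, 0.9, 0.9], [0.3, 0.5, 0.9], [0.3, 0.3, 0.5]]
L = [[0.5, 0.7, 0.7], [0.1, 0.5, 0.7], [0.1, 0.1, 0.5]]
l, u = se_bounds(L, U, G=[0, 1, 2], K=3)
```

An option i is then separated from j as soon as u_j < l_i, which is exactly the comparison used in lines 7-8 of the procedure.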
Despite important differences between the value-based and the preference-based racing approach, the expected number of samples taken by the latter can be upper-bounded in much the same way as Even-Dar et al. (2002) did for the former.²
Theorem 1. Let O = {o_1, …, o_K} be a set of options such that Δ_{i,j} = y_{i,j} − 1/2 ≠ 0 for all i, j ∈ [K]. The expected number of pairwise comparisons taken by PBR-CO is bounded by

$$\sum_{i=1}^{K} \sum_{j \neq i} \left\lceil \frac{1}{2\Delta_{i,j}^{2}} \log \frac{2K^{2} n_{\max}}{\delta} \right\rceil .$$
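To get a feel for the magnitude of this bound, the sketch below evaluates it for K = 10 options with a uniform gap Δ_{i,j} = 0.1 (illustrative numbers, not taken from the paper's experiments):

```python
import math

def pbr_co_sample_bound(deltas, n_max, delta):
    """Evaluate the Theorem 1 bound: the sum over ordered pairs (i, j),
    i != j, of ceil(log(2 K^2 n_max / delta) / (2 * Delta_ij ** 2))."""
    K = len(deltas)
    log_term = math.log(2 * K * K * n_max / delta)
    return sum(math.ceil(log_term / (2 * deltas[i][j] ** 2))
               for i in range(K) for j in range(K) if i != j)

K = 10
gaps = [[0.1] * K for _ in range(K)]
bound = pbr_co_sample_bound(gaps, n_max=1000, delta=0.05)
```

The 1/Δ² dependence dominates: halving every gap roughly quadruples the bound, while K and n_max only enter logarithmically inside the per-pair term.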
Theorem 2. Let O = {o_1, …, o_K} be a set of options. Assume, without loss of generality, that o_i ≺_SE o_j iff i < j, and that y_i ≠ y_j for all 1 ≤ i ≠ j ≤ K. Let

$$b_i = \left\lceil \frac{4}{(y_i - y_{K-\kappa+1})^{2}} \log \frac{2K^{2} n_{\max}}{\delta} \right\rceil \quad \text{for } i \in [K-\kappa]$$

and

$$b_j = \left\lceil \frac{4}{(y_j - y_{K-\kappa})^{2}} \log \frac{2K^{2} n_{\max}}{\delta} \right\rceil \quad \text{for } j = K-\kappa+1, \dots, K.$$
$$\operatorname*{argmax}_{I \subseteq [K],\, |I| = \kappa} \sum_{i \in I} \sum_{j \neq i} \mathbb{I}\{ y_j < y_i \} \tag{6}$$

The stochastic matrix S = [s_{i,j}]_{K×K} is calculated as s_{i,j} = y_{i,j} / Σ_ℓ y_{ℓ,i}. Assuming that we know confidence bounds c_{i,j} for a given confidence level δ for each element of the matrix Ȳ = [ȳ_{i,j}]_{K×K}, Aslam & Decatur (1998) provide simple bounds for propagating error via some basic operations (see Lemmas 1-2). Using their results, a direct calculation yields that s_{i,j} ∈ [ŝ_{i,j} − ĉ_{i,j}, ŝ_{i,j} + ĉ_{i,j}], where Ŝ = [ŝ_{i,j}]_{K×K} is the stochastic matrix calculated as ŝ_{i,j} = ȳ_{i,j} / Σ_ℓ ȳ_{ℓ,i} and

$$\hat c_{i,j} = \frac{3K \max_k c_{i,k}}{\sum_\ell \bar y_{\ell,i}} \tag{7}$$

Here Â# = (I − Ŝ + 1v̂^T)^{−1} − 1v̂^T. Moreover, we have ‖S − Ŝ‖ ≤ 2‖Ĉ‖₁ with probability at least 1 − δ/K, since this inequality requires all s_{i,j} to be within the confidence interval given in (7), and therefore all y_{i,j} must be within the confidence intervals of the ȳ_{i,j}.
Recall our original optimization task, namely to select a subset of options as follows:

$$\operatorname*{argmax}_{I \subseteq [K],\, |I| = \kappa} \sum_{i \in I} \sum_{j \neq i} \mathbb{I}\{ v_j < v_i \} \tag{10}$$

Let σ be the sorting permutation that puts the elements of v̂ in descending order. Now, if |v̂_{σ(κ)} − v̂_{σ(κ+1)}| > 2‖Ĉ‖₁‖Â#‖_max is fulfilled, then we can stop sampling, since |v_i − v̂_i| ≤ ‖Ĉ‖₁‖Â#‖_max for 1 ≤ i ≤ K with probability 1 − δ/K²; therefore, the confidence term δ has to be divided by K². The pseudo-code of the RW sampling strategy is shown in Procedure 4.
Procedure 4 SSRW(Ȳ, K, κ, C)
   …
   B = the κ options i for which the v̂_i are largest
8: return (A, B)
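The RW score vector v̂ is the stationary distribution of the stochastic matrix built from the (estimated) preference matrix. A toy power-iteration sketch follows; the per-column normalization is an assumption here, and the paper's exact indexing of s_{i,j} may differ:

```python
def rw_scores(y, iters=1000):
    """Power iteration for the stationary vector v of a column-stochastic
    matrix S with S[i][j] = y[i][j] / sum_l y[l][j] (assumed normalization),
    so that v = S v and v sums to 1."""
    K = len(y)
    col = [sum(y[l][j] for l in range(K)) for j in range(K)]
    S = [[y[i][j] / col[j] for j in range(K)] for i in range(K)]
    v = [1.0 / K] * K
    for _ in range(iters):
        v = [sum(S[i][j] * v[j] for j in range(K)) for i in range(K)]
    return v

# Same reciprocal toy matrix as before; o_0 dominates pairwise.
y = [[0.5, 0.8, 0.6],
     [0.2, 0.5, 0.7],
     [0.4, 0.3, 0.5]]
v = rw_scores(y)
```

Since S is column-stochastic, the iteration preserves the total mass, and the κ options with the largest entries of v form the candidate top-κ set B.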
Figure 1. The accuracy (y-axis) of the different racing methods (PBR-CO, PBR-SE, PBR-RW vs. uniform sampling, each with δ = 0.1) versus empirical sample complexity. The algorithms were run with n_max ∈ {100, 500, 1000, 5000, 10000}. The lowest empirical sample complexity is achieved by setting n_max = 100, and the sample complexity grows with n_max.
…by y¹_{i,j}, …, y²⁰_{i,j}, and we take the corresponding frequency distribution as the (ground-truth) probability distribution of Y_{i,j}. The rankings of the teams with respect to ≻_CO, ≻_SE and ≻_RW, computed from the expectations y_{i,j} = E[Y_{i,j}], are also shown in Table 1. While the team of Munich (Bayern München) dominates the Bundesliga regardless of the ranking model, the follow-up positions may vary depending on which method is chosen.
Table 1:

Team             W    L    T    CO   SE   RW
B. München       77   33   30   *1   *1   *1
B. Dortmund      56   49   35   *3   *2    5
B. Leverkusen    55   49   36    5    4   *2
VfB Stuttgart    55   53   32   *2    5    4
Schalke 04       54   47   39    4   *3   *3
W. Bremen        52   51   37    6    6    6
VfL Wolfsburg    44   66   30    7    7    7
Hannover 96      30   75   35    8    8    8

6. A Special Case

In this section, we consider a setting that is in a sense in-between the value-based and the preference-based one. Like in the former, each option o_i is associated with a random variable X_i.
…k ∈ N₊; in the second, each X_i obeys a uniform distribution U[0, d_i], where d_i ∼ U[0, 10k] and k ∈ N₊; in the third, each X_i obeys a Bernoulli distribution Bern(1/2) + d_i, where d_i ∼ U[0, k/5] and k ∈ N₊. In every scenario, the goal is to rank the distributions by their means. Note that the complexity of the TKS problem is controlled by the parameter k, with a higher k indicating a less complex task; we varied k between 1 and 10. Besides, we used the parameters K = 10, κ = 5, n_max = 300, and δ = 0.05.
$$\bar y_{i,j} = \frac{1}{n_i n_j} \sum_{\ell=1}^{n_i} \sum_{\ell'=1}^{n_j} \left[ \mathbb{I}\{ x_j^{\ell'} \prec x_i^{\ell} \} + \frac{1}{2}\, \mathbb{I}\{ x_i^{\ell} \perp x_j^{\ell'} \} \right] \tag{11}$$

$$c_{i,j} = \sqrt{ \frac{1}{2 \min(n_i, n_j)} \ln \frac{2}{\delta} } .$$
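Estimator (11) can be sketched directly; the preference test `prec(a, b)` (true iff a is strictly preferred to b) is a hypothetical helper standing in for the partial order ≺:

```python
def pairwise_estimate(x_i, x_j, prec):
    """Two-sample U-statistic in the spirit of (11): the fraction of sample
    pairs in which o_i's draw is preferred to o_j's, with incomparable
    (or tied) pairs counted with weight 1/2."""
    total = 0.0
    for a in x_i:
        for b in x_j:
            if prec(a, b):            # o_i's sample wins
                total += 1.0
            elif not prec(b, a):      # neither preferred: weight 1/2
                total += 0.5
    return total / (len(x_i) * len(x_j))

# For real-valued samples, '>' is a total order; exact ties get weight 1/2.
est = pairwise_estimate([1, 2], [0, 2], prec=lambda a, b: a > b)
```

Because the statistic only depends on the order (or incomparability) of sample pairs, it is invariant to the magnitude of any shift applied to one of the samples, which is the property exploited in the Bernoulli discussion below.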
In the Bernoulli case, one may wonder why the sample complexity of PBR-CO hardly changes with k (see the red point cloud in Figure 2(c)). This can be explained by the fact that the two-sample U-statistic (11) does not depend on the magnitude of the drift d_i (as long as it is smaller than 1).
Recall that the setup described above is more general than the original value-based one and, therefore, that the PBR framework is more widely applicable than the value-based Hoeffding race (HR).⁵ Nevertheless, it is interesting to compare their empirical sample complexity in the standard numerical setting, where both algorithms can be used.

7. Related Work
Figure 2. The accuracy (precision; y-axis) is plotted against the empirical sample complexities (number of samples; x-axis) for the Hoeffding race algorithm (HR) and PBR (PBR-CO, PBR-SE, PBR-RW), with the complexity parameter k shown below the markers. Each result is the average of 1000 repetitions.
…bandit algorithms. The authors propose such an algorithm and prove an upper bound on the expected sample complexity. In this paper, we borrowed their technique and used it in the complexity analysis of PBR-CO and PBR-SE.

For future work, there are still a number of theoretical questions to be addressed, as well as interesting variants of our setting. For example, inspired by Kalyanakrishnan et al. (2012), we plan to consider a variant that seeks to find a ranking that is close to the reference ranking (such as ≻_CO) in terms of a given rank distance, thereby distinguishing between correct and incorrect solutions in a more gradual manner than the (binary) top-k criterion.
Acknowledgments
This work was supported by the German Research
Foundation (DFG) as part of the Priority Programme 1527, and by the ANR-10-BLAN-0215 grant
of the French National Research Agency.
References

Akrour, R., Schoenauer, M., and Sebag, M. Preference-based policy learning. In Proceedings ECML/PKDD 2011, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp. 12-27, 2011.

Aslam, J. A. and Decatur, S. E. General bounds on statistical query learning and PAC learning with noise via hypothesis boosting. Inf. Comput., 141(2):85-118, 1998.

Brandt, F. and Fischer, F. PageRank as a weak tournament solution. In Proceedings of the 3rd International Conference on Internet and Network Economics, pp. 300-305, 2007.

Braverman, M. and Mossel, E. Noisy sorting without resampling. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 268-276, 2008.

Brin, S. and Page, L. The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1-7):107-117, 1998.

Cheng, W., Fürnkranz, J., Hüllermeier, E., and Park, S.-H. Preference-based policy iteration: Leveraging preference learning for reinforcement learning. In Proceedings ECML/PKDD 2011, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp. 414-429, 2011.

Langville, A. N. and Meyer, C. D. Deeper inside PageRank. Internet Mathematics, 1(3):335-380, 2004.

Maron, O. and Moore, A. W. Hoeffding races: accelerating model selection search for classification and function approximation. In Advances in Neural Information Processing Systems, pp. 59-66, 1994.
$$\left[\, \underbrace{\bar y_{i,j} - \sqrt{\frac{1}{2 n_{i,j}} \log \frac{2K^{2} n_{\max}}{\delta}}}_{\ell_{i,j}} \,,\; \underbrace{\bar y_{i,j} + \sqrt{\frac{1}{2 n_{i,j}} \log \frac{2K^{2} n_{\max}}{\delta}}}_{u_{i,j}} \,\right] \tag{12}$$

$$\left\lceil \frac{1}{2\Delta_{i,j}^{2}} \log \frac{2K^{2} n_{\max}}{\delta} \right\rceil .$$
B. Proof of Theorem 2

Proof. One can get a confidence interval for $y_i = \frac{1}{K-1}\sum_{j\neq i} y_{i,j}$ based on the confidence intervals of the $\bar y_{i,j}$. More precisely, put $c_i = \frac{1}{K-1}\sum_{j\neq i} c_{i,j}$, and observe that, as $\bar y_{i,j} \in [y_{i,j} - c_{i,j},\, y_{i,j} + c_{i,j}]$ with probability at least $1 - \delta/(2K^2 n_{\max})$ for any $1 \le j \le K$ in any of the $n_{\max}$ rounds, the interval

$$\big[\, \underbrace{\bar y_i - c_i}_{\ell_i} \,,\; \underbrace{\bar y_i + c_i}_{u_i} \,\big]$$

contains $y_i$ with probability at least $1 - \delta/(2K n_{\max})$ in any round. Therefore, denoting by E the event that each $y_i$ is within the confidence interval of $\bar y_i$ throughout the whole run of algorithm PBR-SE, it holds that $P[E] \ge 1 - \delta/2$. Also note that whenever E holds, then $\bar y_i - c_i \le y_i \le y_j \le \bar y_j + c_j$ for $1 \le i < j \le K$, and thus none of $o_1, \dots, o_{K-\kappa}$ gets selected and none of $o_{K-\kappa+1}, \dots, o_K$ gets discarded.

If for some $i \le K-\kappa$ and $j \ge K-\kappa+1$ the number of samples for each of $\bar y_{i,1}, \dots, \bar y_{i,K}$ and $\bar y_{j,1}, \dots, \bar y_{j,K}$ is at least $\frac{4}{(y_i - y_j)^2}\log\frac{2K^2 n_{\max}}{\delta}$, then event E implies $\bar y_i + c_i < y_i + 2c_i \le y_j - 2c_j < \bar y_j - c_j$ by Hoeffding's bound. Therefore, with probability at least $1-\delta$, after sampling $\bar y_{i,k}$ at least $b_i$ times for $i = 1, \dots, K-\kappa$ and $k = 1, \dots, K$, and sampling $\bar y_{j,k}$ at least $b_j$ times for $j = K-\kappa+1, \dots, K$ and $k = 1, \dots, K$, algorithm PBR-SE selects $o_{K-\kappa+1}, \dots, o_K$, discards $o_1, \dots, o_{K-\kappa}$, and thus terminates and returns the optimal solution. ∎
B.1. The pseudo-code of the PBR framework for the setup described in Section 6

In this setup, a set of random variables X_1, …, X_K is given as input. Each random variable X_i takes values in a set that is partially ordered by a preference relation ≺.

Procedure 5 PBR(X_1, …, X_K, κ, n_max, δ)
1: B = D = ∅
2: A = {(i, j) | 1 ≤ i, j ≤ K, i ≠ j}
3: for i = 1 → K do
4:   n_i = 0
5: while (n_i ≤ n_max) ∧ (|A| > 0) do
6:   for all i appearing in A do
7:     x_i^{(n_i)} ∼ X_i
8:     n_i = n_i + 1
9:   for all (i, j) ∈ A do
10:    update ȳ_{i,j} with the new samples according to (11)
11:    c_{i,j} = √( (1/(2 min(n_i, n_j))) log(2K² n_max / δ) )
12:    u_{i,j} = ȳ_{i,j} + c_{i,j},  ℓ_{i,j} = ȳ_{i,j} − c_{i,j}
13-16: depending on the ranking model, call one of
       (A, B, D) = SSSE(A, Ȳ, K, κ, U, L, D)       ▷ Sampling strategy for SE
       (A, B) = SSRW(Ȳ, K, κ, C)                   ▷ Sampling strategy for RW
17: return B