
EE364a Convex Optimization I March 16–17 or 17–18, 2012

Final exam

Prof. S. Boyd

This is a 24 hour take-home final. Please turn it in at Bytes Cafe in the Packard building, 24 hours after you pick it up.

You may use any books, notes, or computer programs (e.g., Matlab, CVX), but you may not discuss the exam with anyone until March 19, after everyone has taken the exam. The only exception is that you can ask us for clarification, via the course staff email address. We've tried pretty hard to make the exam unambiguous and clear, so we're unlikely to say much.

Please assemble your solutions in order (problem 1, problem 2, problem 3, ...), starting a new page for each problem. Put everything associated with each problem (e.g., text, code, plots) together; do not attach code or plots at the end of the final.

We will deduct points from long needlessly complex solutions, even if they are correct. Our solutions are not long, so if you find that your solution to a problem goes on and on for many pages, you should try to figure out a simpler one. We expect neat, legible exams from everyone, including those enrolled Cr/N.

When a problem involves computation you must give all of the following: a clear discussion and justification of exactly what you did, the Matlab source code that produces the result, and the final numerical results or plots.

You’ll ﬁnd Matlab ﬁles containing problem data in

All problems have equal weight.

Be sure you are using the most recent version of CVX, which is Version 1.22 (build 829). You can check this using the command cvx_version.

Be sure to check your email often during the exam, just in case we need to send out an important announcement.


1. Optimal political positioning. A political constituency is a group of voters with similar views on a set of political issues. The electorate (i.e., the set of voters in some election) is partitioned (by a political analyst) into $K$ constituencies, with (nonnegative) populations $P_1, \ldots, P_K$. A candidate in the election has an initial or prior position on each of $n$ issues, but is willing to consider (presumably small) deviations from her prior positions in order to maximize the total number of votes she will receive. We let $x_i \in \mathbf{R}$ denote the change in her position on issue $i$, measured on some appropriate scale. (You can think of $x_i < 0$ as a move to the 'left' and $x_i > 0$ as a move to the 'right' on the issue, if you like.) The vector $x \in \mathbf{R}^n$ characterizes the changes in her position on all issues; $x = 0$ represents the prior positions. On each issue she has a limit on how far in each direction she is willing to move, which we express as $l \preceq x \preceq u$, where $l \prec 0$ and $u \succ 0$ are given.

The candidate's position change $x$ affects the fraction of voters in each constituency that will vote for her. This fraction is modeled as a logistic function,

$$f_k = g(w_k^T x + v_k), \quad k = 1, \ldots, K.$$

Here $g(z) = 1/(1 + \exp(-z))$ is the standard logistic function, and $w_k \in \mathbf{R}^n$ and $v_k \in \mathbf{R}$ are given data that characterize the views of constituency $k$ on the issues. Thus the total number of votes the candidate will receive is

$$V = P_1 f_1 + \cdots + P_K f_K.$$

The problem is to choose $x$ (subject to the given limits) so as to maximize $V$. The problem data are $l$, $u$, and $P_k$, $w_k$, and $v_k$ for $k = 1, \ldots, K$.
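As an illustration of the vote model only (not a solution method), the following Python sketch evaluates the fractions $f_k$ and the total $V$ for a given position change $x$. The function name total_votes and the small data set are hypothetical, chosen just to make the example run; the logistic function is taken as $g(z) = 1/(1+\exp(-z))$, so that $g(0) = 1/2$.

```python
import math

def g(z):
    """Standard logistic function, g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def total_votes(x, P, W, v):
    """Return (V, votes), where votes[k] = P[k] * g(W[k]^T x + v[k])."""
    votes = []
    for P_k, w_k, v_k in zip(P, W, v):
        z = sum(w_ki * x_i for w_ki, x_i in zip(w_k, x)) + v_k
        votes.append(P_k * g(z))
    return sum(votes), votes

# Hypothetical data (assumption): K = 2 constituencies, n = 2 issues.
P = [1000.0, 2000.0]
W = [[1.0, -0.5], [0.2, 0.3]]
v = [0.0, 0.0]

V_prior, votes_prior = total_votes([0.0, 0.0], P, W, v)  # prior positions
V_moved, votes_moved = total_votes([0.1, 0.0], P, W, v)  # small move on issue 1
```

With $v_k = 0$ and $x = 0$, exactly half of each constituency votes for the candidate, which gives a quick sanity check on any implementation of $V$.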

(a) The general political positioning problem. Show that the objective function $V$ need not be quasiconcave. (This means that the general optimal political positioning problem is not a quasiconvex problem, and therefore also not a convex problem.) In other words, choose problem data for which $V$ is not a quasiconcave function of $x$.

(b) The partisan political positioning problem. Now suppose the candidate focuses only on her core constituencies, i.e., those for which a significant fraction will vote for her. In this case we interpret the $K$ constituencies as her core constituencies; we assume that $v_k \geq 0$, which means that with her prior position $x = 0$, at least half of each of her core constituencies will vote for her. We add the constraint that $w_k^T x + v_k \geq 0$ for each $k$, which means that she will not take positions that alienate a majority of voters from any of her core constituencies. Show that the partisan political positioning problem (i.e., maximizing $V$ with the additional assumptions and constraints) is convex.

(c) Numerical example. Find the optimal positions for the partisan political positioning problem with data given in opt_pol_pos_data.m. Report the number of votes from each constituency under the politician's prior positions ($x = 0$) and optimal positions, as well as the total number of votes $V$ in each case. You may use the function

$$g_{\mathrm{approx}}(z) = \min\{1,\; g(i) + g'(i)(z - i) \text{ for } i = 0, 1, 2, 3, 4\}$$

as an approximation of $g$ for $z \geq 0$. (The function $g_{\mathrm{approx}}$ is also an upper bound on $g$ for $z \geq 0$.) For your convenience, we have included function definitions for $g$ and $g_{\mathrm{approx}}$ (g and gapx, respectively) in the data file. You should report the results (votes from each constituency and total) using $g$, but be sure to check that these numbers are close to the results using $g_{\mathrm{approx}}$ (say, within one percent or so).
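The approximation above is the pointwise minimum of the constant 1 and the tangent lines to $g$ at $i = 0, \ldots, 4$. The Python sketch below is one way to write it outside Matlab; it is not the definition shipped in the data file. It assumes $g(z) = 1/(1+\exp(-z))$ and uses the identity $g'(z) = g(z)(1 - g(z))$; the grid used for the comparison is an arbitrary choice.

```python
import math

def g(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def g_prime(z):
    """Derivative of the logistic function: g'(z) = g(z) * (1 - g(z))."""
    return g(z) * (1.0 - g(z))

def gapx(z):
    """min{1, g(i) + g'(i) * (z - i) for i = 0, ..., 4}."""
    tangents = [g(i) + g_prime(i) * (z - i) for i in range(5)]
    return min([1.0] + tangents)

# Compare the two functions on a grid over [0, 10] (grid choice is arbitrary).
grid = [0.01 * j for j in range(1001)]
is_upper_bound = all(gapx(z) >= g(z) - 1e-12 for z in grid)
max_rel_err = max((gapx(z) - g(z)) / g(z) for z in grid)
```

Because $g$ is concave for $z \geq 0$, each tangent line lies above $g$ there, which is why the minimum is an upper bound on that range; the bound is not claimed for $z < 0$.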


2. Portfolio optimization with qualitative return forecasts. We consider the risk-return portfolio optimization problem described on pages 155 and 185 of the book, with one twist: We don't precisely know the mean return vector $\bar p$. Instead, we have a range of possible values for each asset, i.e., we have $l, u \in \mathbf{R}^n$ with $l \preceq \bar p \preceq u$. We use $l$ and $u$ to encode various qualitative forecasts we have about the mean return vector $\bar p$. For example, $l_7 = 0.02$ and $u_7 = 0.20$ means that we believe the mean return for asset 7 is between 2% and 20%.

Define the worst-case mean return $R^{\mathrm{wc}}$, as a function of portfolio vector $x$, as the worst (minimum) value of $\bar p^T x$, over all $\bar p$ consistent with the given bounds $l$ and $u$.

(a) Explain how to find a portfolio $x$ that maximizes $R^{\mathrm{wc}}$, subject to a budget constraint and risk limit,

$$\mathbf{1}^T x = 1, \qquad x^T \Sigma x \leq \sigma_{\max}^2,$$

where $\Sigma \in \mathbf{S}^n_{++}$ and $\sigma_{\max} \in \mathbf{R}_{++}$ are given.

(b) Solve the problem instance given in port_qual_forecasts_data.m. Give the optimal worst-case mean return achieved by the optimal portfolio $x^\star$. In addition, construct a portfolio $x^{\mathrm{mid}}$ that maximizes $c^T x$ subject to the budget constraint and risk limit, where $c = (1/2)(l + u)$. This is the optimal portfolio assuming that the mean return has the midpoint value of the forecasts. Compare the midpoint mean returns $c^T x^{\mathrm{mid}}$ and $c^T x^\star$, and the worst-case mean returns of $x^{\mathrm{mid}}$ and $x^\star$. Briefly comment on the results.
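For checking reported numbers, the worst-case mean return of a fixed portfolio can be evaluated directly from its definition. The Python sketch below is an illustration under stated assumptions, not part of the exam: it uses the fact that the minimum of $\bar p^T x$ over a box is attained coordinate by coordinate, and compares the result against brute-force enumeration of the box vertices. The function names and the two-asset data are hypothetical.

```python
from itertools import product

def worst_case_return(x, l, u):
    """Minimum of p^T x over l <= p <= u, taken one coordinate at a time."""
    return sum(min(l_i * x_i, u_i * x_i) for x_i, l_i, u_i in zip(x, l, u))

def worst_case_by_vertices(x, l, u):
    """Same quantity by enumerating all 2^n vertices of the box (small n only)."""
    return min(
        sum(p_i * x_i for p_i, x_i in zip(p, x))
        for p in product(*zip(l, u))
    )

# Hypothetical two-asset example (assumption): a long and a short position.
l = [0.02, -0.10]
u = [0.20, 0.10]
x = [1.5, -0.5]

r_wc = worst_case_return(x, l, u)
r_wc_check = worst_case_by_vertices(x, l, u)
```

The vertex enumeration is exponential in $n$ and is included only as a cross-check on small examples.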


3. Learning a quadratic pseudo-metric from distance measurements. We are given a set of $N$ pairs of points in $\mathbf{R}^n$, $x_1, \ldots, x_N$, and $y_1, \ldots, y_N$, together with a set of distances $d_1, \ldots, d_N > 0$.

The goal is to find (or estimate or learn) a quadratic pseudo-metric $d$,

$$d(x, y) = \left( (x - y)^T P (x - y) \right)^{1/2},$$

with $P \in \mathbf{S}^n_+$, which approximates the given distances, i.e., $d(x_i, y_i) \approx d_i$. (The pseudo-metric $d$ is a metric only when $P \succ 0$; when $P \succeq 0$ is singular, it is a pseudo-metric.)

To do this, we will choose $P \in \mathbf{S}^n_+$ that minimizes the mean squared error objective

$$\frac{1}{N} \sum_{i=1}^N \left( d_i - d(x_i, y_i) \right)^2.$$

(a) Explain how to find $P$ using convex or quasiconvex optimization. If you cannot find an exact formulation (i.e., one that is guaranteed to minimize the total squared error objective), give a formulation that approximately minimizes the given objective, subject to the constraints.

(b) Carry out the method of part (a) with the data given in quad_metric_data.m. The columns of the matrices X and Y are the points $x_i$ and $y_i$; the row vector d gives the distances $d_i$. Give the optimal mean squared distance error. We also provide a test set, with data X_test, Y_test, and d_test. Report the mean squared distance error on the test set (using the metric found using the data set above).
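To make the objective concrete, here is a minimal Python sketch that evaluates $d(x, y)$ and the mean squared error for a given $P$. It only evaluates the definitions and does not fit $P$. The function names are hypothetical, and points are stored as a list of vectors rather than as matrix columns as in the data file.

```python
import math

def pseudo_metric(P, x, y):
    """d(x, y) = sqrt((x - y)^T P (x - y)) for a symmetric PSD matrix P."""
    diff = [x_i - y_i for x_i, y_i in zip(x, y)]
    n = len(diff)
    quad = sum(diff[i] * P[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(quad, 0.0))  # guard against tiny negative round-off

def mean_squared_error(P, xs, ys, d):
    """(1/N) * sum_i (d_i - d(x_i, y_i))^2 over the N point pairs."""
    errs = [(d_i - pseudo_metric(P, x_i, y_i)) ** 2
            for x_i, y_i, d_i in zip(xs, ys, d)]
    return sum(errs) / len(errs)

# With P = I the pseudo-metric is the Euclidean distance.
P_identity = [[1.0, 0.0], [0.0, 1.0]]
d_euclid = pseudo_metric(P_identity, [3.0, 4.0], [0.0, 0.0])

# With a singular P, two distinct points can be at distance zero.
P_singular = [[1.0, 0.0], [0.0, 0.0]]
d_zero = pseudo_metric(P_singular, [0.0, 1.0], [0.0, 5.0])

mse = mean_squared_error(P_identity, [[3.0, 4.0]], [[0.0, 0.0]], [5.0])
```

The singular case shows why $d$ is only a pseudo-metric when $P$ is not positive definite.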


4. Optimal parimutuel betting. In parimutuel betting, participants bet nonnegative amounts on each of $n$ outcomes, exactly one of which will actually occur. (For example, the outcome can be which of $n$ horses wins a race.) The total amount bet by all participants on all outcomes is called the pool or tote. The house takes a commission from the pool (typically around 20%), and the remaining pool is divided among those who bet on the outcome that occurs, in proportion to their bets on the outcome. This problem concerns the choice of the amount to bet on each outcome.

Let $x_i \geq 0$ denote the amount we bet on outcome $i$, so the total amount we bet on all outcomes is $\mathbf{1}^T x$. Let $a_i > 0$ denote the amount bet by all other participants on outcome $i$, so after the house commission, the remaining pool is $P = (1 - c)(\mathbf{1}^T a + \mathbf{1}^T x)$, where $c \in (0, 1)$ is the house commission rate. Our payoff if outcome $i$ occurs is then

$$p_i = \frac{x_i}{x_i + a_i} P.$$

The goal is to choose $x$, subject to $\mathbf{1}^T x = B$ (where $B$ is the total amount to be bet, which is given), so as to maximize the expected utility

$$\sum_{i=1}^n \pi_i U(p_i),$$

where $\pi_i$ is the probability that outcome $i$ occurs, and $U$ is a concave increasing utility function, with $U(0) = 0$. You can assume that $a_i$, $\pi_i$, $c$, $B$, and the function $U$ are known.

Explain how to find an optimal $x$ using convex or quasiconvex optimization. If you use a change of variables, be sure to explain how your variables are related to $x$.

Remarks.

To carry out this betting strategy, you'd need to know $a_i$, and then be the last participant to place your bets (so that $a_i$ don't subsequently change). You'd also need to know the probabilities $\pi_i$. These could be estimated using sophisticated machine learning techniques or insider information.

The formulation above assumes that the total amount to bet (i.e., $B$) is known. If it is not known, you could solve the problem above for a range of values of $B$ and use the value of $B$ that yields the largest optimal expected utility.
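As a worked instance of the payoff and expected-utility formulas in this problem, the Python sketch below evaluates them for a fixed bet vector $x$. It does not choose $x$. The function names, the two-outcome data, and the utility $U(p) = \log(1 + p)$ (concave, increasing, with $U(0) = 0$) are assumptions made only for illustration.

```python
import math

def payoffs(x, a, c):
    """p_i = x_i / (x_i + a_i) * P, with P = (1 - c) * (sum(a) + sum(x))."""
    pool = (1.0 - c) * (sum(a) + sum(x))
    return [x_i / (x_i + a_i) * pool for x_i, a_i in zip(x, a)]

def expected_utility(x, a, c, pi, U):
    """sum_i pi_i * U(p_i) for the bet vector x."""
    return sum(pi_i * U(p_i) for pi_i, p_i in zip(pi, payoffs(x, a, c)))

# Hypothetical data (assumption): two outcomes, 20% commission.
a = [1.0, 1.0]      # amounts bet by all other participants
x = [1.0, 1.0]      # our bets, total B = 2
pi = [0.5, 0.5]     # outcome probabilities
c = 0.2

p = payoffs(x, a, c)                               # remaining pool is 3.2
eu = expected_utility(x, a, c, pi, math.log1p)     # U(p) = log(1 + p)
```

Since $a_i > 0$, the ratio $x_i / (x_i + a_i)$ is well defined even when $x_i = 0$, in which case the payoff for that outcome is zero.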


5. Polyhedral cone questions. You are given matrices $A \in \mathbf{R}^{n \times k}$ and $B \in \mathbf{R}^{n \times p}$.

Explain how to solve the following two problems using convex optimization. Your solution can involve solving multiple convex problems, as long as the number of such problems is no more than linear in the dimensions $n$, $k$, $p$.

(a) How would you determine whether $A\mathbf{R}^k_+ \subseteq B\mathbf{R}^p_+$? This means that every nonnegative linear combination of the columns of $A$ can be expressed as a nonnegative linear combination of the columns of $B$.

(b) How would you determine whether $A\mathbf{R}^k_+ = \mathbf{R}^n$? This means that every vector in $\mathbf{R}^n$ can be expressed as a nonnegative linear combination of the columns of $A$.


6. Resource allocation in stream processing. A large data center is used to handle a stream of $J$ types of jobs. The traffic (number of instances per second) of each job type is denoted $t \in \mathbf{R}^J_+$. Each instance of each job type (serially) invokes or calls a set of processes. There are $P$ types of processes, and we describe the job-process relation by the $P \times J$ matrix

$$R_{pj} = \begin{cases} 1 & \text{job } j \text{ invokes process } p \\ 0 & \text{otherwise.} \end{cases}$$

The process loads (number of instances per second) are given by $\lambda = Rt \in \mathbf{R}^P$, i.e., $\lambda_p$ is the sum of the traffic from the jobs that invoke process $p$.

The latency of a process or job type is the average time that it takes one instance to complete. These are denoted $l^{\mathrm{proc}} \in \mathbf{R}^P$ and $l^{\mathrm{job}} \in \mathbf{R}^J$, respectively, and are related by $l^{\mathrm{job}} = R^T l^{\mathrm{proc}}$, i.e., $l^{\mathrm{job}}_j$ is the sum of the latencies of the processes called by $j$. Job latency is important to users, since $l^{\mathrm{job}}_j$ is the average time the data center takes to handle an instance of job type $j$. We are given a maximum allowed job latency:

$$l^{\mathrm{job}} \preceq l^{\max}.$$
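The two linear relations above, process load $\lambda = Rt$ and job latency $l^{\mathrm{job}} = R^T l^{\mathrm{proc}}$, can be illustrated with a short Python sketch. The function names and the tiny instance (two processes, two job types) are hypothetical; the process latencies are simply given numbers here rather than values of a latency function.

```python
def process_loads(R, t):
    """lambda = R t: load on each process from the jobs that invoke it."""
    return [sum(R[p][j] * t[j] for j in range(len(t))) for p in range(len(R))]

def job_latencies(R, l_proc):
    """l_job = R^T l_proc: each job's latency is the sum over its processes."""
    n_jobs = len(R[0])
    return [sum(R[p][j] * l_proc[p] for p in range(len(R)))
            for j in range(n_jobs)]

# Hypothetical instance (assumption): job 1 invokes processes 1 and 2,
# job 2 invokes only process 2. Rows are processes, columns are jobs.
R = [[1, 0],
     [1, 1]]
t = [2.0, 3.0]            # job traffic, instances per second
l_proc = [0.1, 0.2]       # process latencies, seconds

lam = process_loads(R, t)
l_job = job_latencies(R, l_proc)
l_max = [0.5, 0.5]
latency_ok = all(lj <= lm for lj, lm in zip(l_job, l_max))
```

Process 2 is invoked by both job types, so its load is the sum of both traffic values, while job 1 pays the latency of both processes.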

The process latencies depend on the process load and also how much of $n$ different resources are made available to them. These resources might include, for example, number of cores, disk storage, and network bandwidth. Here, we represent amounts of these resources as (nonnegative) real numbers, so $x_p \in \mathbf{R}^n_+$ represents the resources allocated to process $p$. The process latencies are given by

$$l^{\mathrm{proc}}_p = \psi_p(x_p, \lambda_p), \quad p = 1, \ldots, P,$$

where $\psi_p : \mathbf{R}^n \times \mathbf{R} \to \mathbf{R} \cup \{\infty\}$ is a known (extended-valued) convex function. These functions are nonincreasing in their first (vector) arguments, and nondecreasing in their second arguments (i.e., more resources or less load cannot increase latency). We interpret $\psi_p(x_p, \lambda_p) = \infty$ to mean that the resources given by $x_p$ are not sufficient to handle the load $\lambda_p$.

We wish to allocate a total resource amount $x^{\mathrm{tot}} \in \mathbf{R}^n_{++}$ among the $P$ processes, so we have $\sum_{p=1}^P x_p \preceq x^{\mathrm{tot}}$. The goal is to minimize the objective function

$$\sum_{j=1}^J w_j \left( t^{\mathrm{tar}}_j - t_j \right)_+,$$

where $t^{\mathrm{tar}}_j$ is the target traffic level for job type $j$, $w_j > 0$ give the priorities, and $(u)_+$ is the nonnegative part of a vector, i.e., $((u)_+)_i = \max\{u_i, 0\}$. (Thus the objective is a weighted penalty for missing the target job traffic.) The variables are $t \in \mathbf{R}^J_+$ and $x_p \in \mathbf{R}^n_+$, $p = 1, \ldots, P$. The problem data are the matrix $R$, the vectors $l^{\max}$, $x^{\mathrm{tot}}$, $t^{\mathrm{tar}}$, and $w$, and the functions $\psi_p$, $p = 1, \ldots, P$.
(a) Explain why this is a convex optimization problem.

(b) Solve the problem instance with data given in res_alloc_stream_data.m, with latency functions

$$\psi_p(x_p, \lambda_p) = \begin{cases} 1/(a_p^T x_p - \lambda_p) & a_p^T x_p > \lambda_p, \; x_p \succeq x_p^{\min} \\ \infty & \text{otherwise,} \end{cases}$$

where $a_p \in \mathbf{R}^n_{++}$ and $x_p^{\min} \in \mathbf{R}^n_{++}$ are given data. The vectors $a_p$ and $x_p^{\min}$ are stored as the columns of the matrices A and x_min, respectively. Give the optimal objective value and job traffic. Compare the optimal job traffic with the target job traffic.


7. Probability bounds. Consider random variables $X_1, X_2, X_3, X_4$ that take values in $\{0, 1\}$. We are given the following marginal and conditional probabilities:

$$\mathbf{Prob}(X_1 = 1) = 0.9, \quad \mathbf{Prob}(X_2 = 1) = 0.9, \quad \mathbf{Prob}(X_3 = 1) = 0.1,$$
$$\mathbf{Prob}(X_1 = 1, X_4 = 0 \mid X_3 = 1) = 0.7,$$
$$\mathbf{Prob}(X_4 = 1 \mid X_2 = 1, X_3 = 0) = 0.6.$$

Explain how to find the minimum and maximum possible values of $\mathbf{Prob}(X_4 = 1)$, over all (joint) probability distributions consistent with the given data. Find these values and report them.

Hints. (You should feel free to ignore these hints.)

CVX supports multidimensional arrays; for example, variable p(2,2,2,2) declares a 4-dimensional array of variables, with each of the four indices taking the values 1 or 2.

The function sum(p,i) sums a multidimensional array p along the ith index.

The expression sum(a(:)) gives the sum of all entries of a multidimensional array a. You might want to use the function definition sum_all = @(A) sum(A(:));, so sum_all(a) gives the sum of all entries in the multidimensional array a.
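The array operations in these hints amount to summing entries of a joint distribution over subsets of the $2^4$ outcomes. As a language-neutral illustration, the Python sketch below stores a joint distribution as a dictionary keyed by $(x_1, x_2, x_3, x_4)$ and computes a marginal and a conditional probability from it. The helper names are hypothetical, and the uniform distribution is used purely as an example; it is not consistent with the data given in the problem.

```python
from itertools import product

def prob(p, event):
    """Total mass of the outcomes (x1, x2, x3, x4) for which event is true."""
    return sum(mass for outcome, mass in p.items() if event(*outcome))

def cond_prob(p, event, given):
    """Prob(event | given) = Prob(event and given) / Prob(given)."""
    joint = prob(p, lambda *o: event(*o) and given(*o))
    return joint / prob(p, given)

# Illustrative joint distribution (assumption): uniform over {0,1}^4.
outcomes = list(product((0, 1), repeat=4))
p_uniform = {o: 1.0 / len(outcomes) for o in outcomes}

total_mass = prob(p_uniform, lambda x1, x2, x3, x4: True)
p_x4 = prob(p_uniform, lambda x1, x2, x3, x4: x4 == 1)
p_cond = cond_prob(
    p_uniform,
    lambda x1, x2, x3, x4: x1 == 1 and x4 == 0,
    lambda x1, x2, x3, x4: x3 == 1,
)
```

Each quantity here is a sum (or a ratio of sums) of entries of the joint distribution, which mirrors what sum(p,i) and sum_all do on a CVX array.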


8. Perturbing a Hamiltonian to maximize an energy gap. A finite dimensional approximation of a quantum mechanical system is described by its Hamiltonian matrix $H \in \mathbf{S}^n$. We label the eigenvalues of $H$ as $\lambda_1 \leq \cdots \leq \lambda_n$, with corresponding orthonormal eigenvectors $v_1, \ldots, v_n$. In this context the eigenvalues are called the energy levels of the system, and the eigenvectors are called the eigenstates. The eigenstate $v_1$ is called the ground state, and $\lambda_1$ is the ground energy. The energy gap (between the ground and next state) is $\eta = \lambda_2 - \lambda_1$.

By changing the environment (say, applying external fields), we can perturb a nominal Hamiltonian matrix to obtain the perturbed Hamiltonian, which has the form

$$H = H^{\mathrm{nom}} + \sum_{i=1}^k x_i H_i.$$

Here $H^{\mathrm{nom}} \in \mathbf{S}^n$ is the nominal (unperturbed) Hamiltonian, $x \in \mathbf{R}^k$ gives the strength or value of the perturbations, and $H_1, \ldots, H_k \in \mathbf{S}^n$ characterize the perturbations. We have limits for each perturbation, which we express as $|x_i| \leq 1$, $i = 1, \ldots, k$. The problem is to choose $x$ to maximize the gap $\eta$ of the perturbed Hamiltonian, subject to the constraint that the perturbed Hamiltonian $H$ has the same ground state (up to scaling, of course) as the unperturbed Hamiltonian $H^{\mathrm{nom}}$. The problem data are the nominal Hamiltonian matrix $H^{\mathrm{nom}}$ and the perturbation matrices $H_1, \ldots, H_k$.

(a) Explain how to formulate this as a convex or quasiconvex optimization problem. If you change variables, explain the change of variables clearly.

(b) Carry out the method of part (a) for the problem instance with data given in hamiltonian_gap_data.m. Give the optimal perturbations, and the energy gap for the nominal and perturbed systems. The data $H_i$ are given as a cell array; H{i} gives $H_i$.
