3. In the context of the state equations of LSTM, we have seen that $h_t = o_t \odot \sigma(s_t)$, where
$h_t, o_t, s_t \in \mathbb{R}^n$. What is the derivative of $h_t$ w.r.t. $s_t$?
A. Vector
B. Tensor
C. Matrix
4. Continuing the previous question, how many non-zero entries does the derivative of $h_t$
w.r.t. $s_t$ have?
A. No non-zero entries
B. $n$
C. $n^2 - n$
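Questions 3 and 4 can be checked numerically. The following is a minimal sketch, assuming the product in the LSTM output equation is element-wise and using central finite differences; the helper names `h` and `jacobian` and the sample vectors are illustrative, not part of the original quiz.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(o, s):
    # Element-wise product of o_t and sigma(s_t) (assumed form of the output equation).
    return [oi * sigmoid(si) for oi, si in zip(o, s)]

def jacobian(o, s, eps=1e-6):
    # J[i][j] approximates d h_i / d s_j using central differences.
    n = len(s)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s_plus = list(s)
        s_minus = list(s)
        s_plus[j] += eps
        s_minus[j] -= eps
        h_plus, h_minus = h(o, s_plus), h(o, s_minus)
        for i in range(n):
            J[i][j] = (h_plus[i] - h_minus[i]) / (2 * eps)
    return J

# Arbitrary sample values with n = 4 (hypothetical, for illustration only).
o = [0.3, 0.9, 0.5, 0.7]
s = [0.1, -1.2, 2.0, 0.4]
J = jacobian(o, s)

# Count entries that are numerically non-zero.
nonzero = sum(1 for row in J for v in row if abs(v) > 1e-9)
print(len(J), "x", len(J[0]), "Jacobian with", nonzero, "non-zero entries")
```

Because each $h_i$ depends only on $s_i$, every off-diagonal entry of the computed array is zero, which is what the count at the end reports.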
5. In the context of LSTMs, the gradient of $L_t(\theta)$ w.r.t. $\theta_i$ vanishes when
A. the gradient flowing through at least one path from $L_t(\theta)$ to $\theta_i$
vanishes.
B. the gradients flowing through each and every path from $L_t(\theta)$ to
$\theta_i$ vanish.
6. Which of the following options represents the full set of equations for the GRU gates, where
$s_t$ represents the state of the GRU and $h_t$ refers to the intermediate output?
A. $o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o)$
   $i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)$
B. $o_t = \sigma(W_o s_{t-1} + U_o x_t + b_o)$
   $i_t = \sigma(W_i s_{t-1} + U_i x_t + b_i)$
7. Consider a GRU where the input $x \in \mathbb{R}^m$ and the state of the GRU $s \in \mathbb{R}^n$ at any time step
$t$. What is the total number of parameters in this GRU?
A. $n^2 + nm + 2n$
B. $3 \times (n^2 + nm + n)$
C. $n + 3 \times (n^2 + nm + n)$
D. $4 \times (n^2 + nm + n)$
Solution: From the equations of the GRU, we know that its parameters are
$W_o, W_i, W, U_o, U_i, U, b_o, b_i, b$, where each $W \in \mathbb{R}^{n \times n}$, each $U \in \mathbb{R}^{n \times m}$ (so that it can multiply $x \in \mathbb{R}^m$) and
each $b \in \mathbb{R}^n$. Each of the three groups $(W, U, b)$ contributes $n^2 + nm + n$ parameters. Therefore, Option B is the correct answer.
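The count in the solution can be reproduced with a few lines of arithmetic. This is a small sketch, assuming exactly the nine parameters listed in the solution (three square matrices, three input matrices and three bias vectors); the function name `gru_param_count` and the sample sizes are hypothetical.

```python
def gru_param_count(n, m):
    # One (W, U, b) group: W has n*n entries, U has n*m entries, b has n entries.
    per_group = n * n + n * m + n
    # The GRU in the solution has three such groups: (Wo, Uo, bo), (Wi, Ui, bi), (W, U, b).
    return 3 * per_group

# Example with state size n = 4 and input size m = 3 (arbitrary values).
total = gru_param_count(4, 3)
print(total)
```

Swapping the shape of $U$ between $n \times m$ and $m \times n$ does not change the total, since both have $nm$ entries.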
8. Consider the following statements in the context of LSTMs:
9. Consider the RNN with the following equations:
$s_t = \sigma(U x_t + W s_{t-1} + b)$
$y_t = O(V s_t + c)$
where $s_t$ is the state of the network at timestep $t$ and the parameters $W, U, V, b, c$ are
shared across timesteps. The loss $L_t(\theta)$ is defined as:
$L_t(\theta) = -\log(y_{tc})$
where $y_{tc}$ is the predicted probability of the true output at timestep $t$. Given the above
RNN, find $\frac{\partial L_t(\theta)}{\partial s_t}$ at $t = 4$.
A. $\frac{\partial L_4(\theta)}{\partial s_4} = -\frac{O(V s_4 + c)}{O'(V s_4 + c)}$
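10. Considering the same RNN setup as defined in the previous question, find $\frac{\partial L(\theta)}{\partial V}$.
A. $\frac{\partial L(\theta)}{\partial V} = -s_t \frac{O(V s_t + c)}{O'(V s_t + c)}$
B. $\frac{\partial L(\theta)}{\partial V} = -s_t \frac{O'(V s_t + c)}{O(V s_t + c)}$
C. $\frac{\partial L(\theta)}{\partial V} = \sum_{t=1}^{T} -s_t \frac{O(V s_t + c)}{O'(V s_t + c)}$
D. $\frac{\partial L(\theta)}{\partial V} = \sum_{t=1}^{T} -s_t \frac{O'(V s_t + c)}{O(V s_t + c)}$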
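A candidate expression for this gradient can be tested against finite differences. The sketch below makes strong simplifying assumptions that are not in the original question: all quantities are scalars, and the output function O is taken to be the logistic sigmoid. It compares the chain-rule derivative of the per-timestep loss $-\log O(V s_t + c)$ with a numerical estimate; the values of `v`, `c` and `states` are made up.

```python
import math

def O(z):
    # Assumed output function: logistic sigmoid (the quiz leaves O unspecified).
    return 1.0 / (1.0 + math.exp(-z))

def O_prime(z):
    # Derivative of the logistic sigmoid.
    return O(z) * (1.0 - O(z))

def loss_t(v, s_t, c):
    # Per-timestep loss -log O(v * s_t + c) in the scalar case.
    return -math.log(O(v * s_t + c))

def analytic_grad_t(v, s_t, c):
    # Chain rule applied to -log O(z) with z = v * s_t + c.
    z = v * s_t + c
    return -s_t * O_prime(z) / O(z)

def numeric_grad_t(v, s_t, c, eps=1e-6):
    # Central finite difference with respect to v.
    return (loss_t(v + eps, s_t, c) - loss_t(v - eps, s_t, c)) / (2 * eps)

v, c = 0.8, -0.3
states = [0.2, 0.5, 0.9, 0.4]  # hypothetical s_1 .. s_4

analytic = [analytic_grad_t(v, s, c) for s in states]
numeric = [numeric_grad_t(v, s, c) for s in states]

# If L(theta) denotes the sum of the per-timestep losses, the per-timestep
# gradients add up because V is shared across timesteps.
total_analytic = sum(analytic)
total_numeric = sum(numeric)
print(analytic, total_analytic)
```

In this scalar setting the two estimates agree to numerical precision at each timestep. Whether the final answer includes the sum over $t$ depends on whether $L(\theta)$ is read as the total loss over all $T$ timesteps.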