Kathleen Hazel Reyes HOMEWORK # 2 EE 265

PART C: VECTOR MACHINES

NOTE
Operation      Latency    VAR
ADD            1 cycle    X
MULT           2 cycles   Y
LOAD / STORE   4 cycles   M

       CYCLE
Instr  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
LV1    F  D  R  M1 M2 M3 M4 W
LV1             R  M1 M2 M3 M4 W
LV1                R  M1 M2 M3 M4 W
LV1                   R  M1 M2 M3 M4 W
SLTSV     F  D  -  -  -  -  -  -  -  -  R  X1 W
SLTSV                                      R  X1 W
SLTSV                                         R  X1 W
SLTSV                                            R  X1 W
LV2          F  D  -  -  -  -  -  -  -  -  -  -  -  -  -  R  M1 M2 M3 M4 W
LV2                                                          R  M1 M2 M3 M4 W
LV2                                                             R  M1 M2 M3 M4 W
LV2                                                                R  M1 M2 M3 M4 W
MUL.1           F  D  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  R  Y1 Y2 W
MUL.1                                                                                      R  Y1 Y2 W
MUL.1                                                                                         R  Y1 Y2 W
MUL.1                                                                                            R  Y1 Y2 W
SV 1               F  D  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  R  M1 M2 M3 M4 W
SV 1                                                                                                         R  M1 M2 M3 M4 W
SV 1                                                                                                            R  M1 M2 M3 M4 W
SV 1                                                                                                               R  M1 M2 M3 M4 W
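The stall pattern in this diagram can be reproduced with a short Python sketch. It assumes the model that the stall dashes suggest, not anything stated in the assignment: no chaining, one instruction fetched per cycle, the 4 elements of each vector instruction entering R on consecutive cycles, and each instruction's first R waiting until the cycle after the previous instruction's last W. `PROGRAM` and `schedule` are hypothetical names used only for illustration.

```python
# Sketch of the serialized (no-chaining) vector pipeline in the timing diagram.
# Assumed model: F and D take one cycle each, instruction i is fetched in cycle i+1,
# the 4 elements enter R on consecutive cycles, and an instruction's first R must
# wait until the cycle after the previous instruction's last W.
PROGRAM = [("LV1", 4), ("SLTSV", 1), ("LV2", 4), ("MUL.1", 2), ("SV 1", 4)]


def schedule(program, elements=4):
    """Return {name: (first_R_cycle, last_W_cycle)} for each instruction."""
    timing = {}
    prev_last_w = 0
    for i, (name, latency) in enumerate(program):
        decode_cycle = i + 2                       # F in cycle i+1, D in cycle i+2
        first_r = max(decode_cycle + 1, prev_last_w + 1)
        # The last element reads (elements - 1) cycles after the first one,
        # executes for `latency` cycles, then writes back in the next cycle.
        last_w = first_r + (elements - 1) + latency + 1
        timing[name] = (first_r, last_w)
        prev_last_w = last_w
    return timing


for name, (r, w) in schedule(PROGRAM).items():
    print(f"{name:6} first R in cycle {r:2}, last W in cycle {w:2}")
```

Under these assumptions the first R cycles come out as 3, 12, 18, 27 and 34, which line up with the dashes in the diagram, and the last SV element writes back in cycle 42. The sketch tracks only the first R and last W of each instruction because, without chaining, those two cycles decide when the next instruction may start.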
16 banks are needed to sustain 4 words per cycle.
Each memory bank stays busy for 4 cycles after an access, so 4 words/cycle × 4 cycles = 16 banks must be busy at the same time.
With 16 banks, 20 words can be processed in 5 cycles, as shown in the table.
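The bank count follows from simple arithmetic: each bank can start a new access only once per busy period, so sustaining w words per cycle with a bank busy time of b cycles needs at least w × b banks. A minimal Python sketch of that rule (`min_banks` is a hypothetical helper):

```python
def min_banks(words_per_cycle, bank_busy_cycles):
    """Smallest number of banks that can start `words_per_cycle` new accesses
    every cycle when each bank stays busy for `bank_busy_cycles` cycles."""
    return words_per_cycle * bank_busy_cycles


print(min_banks(4, 4))  # 16 banks for 4 words/cycle with a 4-cycle bank busy time
```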

Stride: 1
       BANK #
CYCLE  0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
1      1    2    3    4
2      busy busy busy busy 5    6    7    8
3      busy busy busy busy busy busy busy busy 9    10   11   12
4      busy busy busy busy busy busy busy busy busy busy busy busy 13   14   15   16
5      17   18   19   20   busy busy busy busy busy busy busy busy busy busy busy busy
6      busy busy busy busy 21   22   23   24   busy busy busy busy busy busy busy busy
7      busy busy busy busy busy busy busy busy 25   26   27   28   busy busy busy busy
8      busy busy busy busy busy busy busy busy busy busy busy busy 29   30   31   32
32 words can be processed in 8 cycles: 32/8 = 4 words/cycle.
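The bank tables can be cross-checked with a cycle-by-cycle Python sketch. It assumes that word i sits at address i × stride and maps to bank (address mod 16), that a bank accessed in cycle c can be accessed again in cycle c + 4, and that up to 4 words issue per cycle in order, with issue stopping for the cycle at the first word whose bank is still busy. `words_delivered` is a hypothetical helper, not part of the assignment.

```python
def words_delivered(stride, cycles, banks=16, busy=4, width=4):
    """Count the words a strided vector access issues within `cycles` cycles.

    Assumed model: word i is at address i * stride and lives in bank
    address % banks; a bank accessed in cycle c can be accessed again in
    cycle c + busy; at most `width` words issue per cycle, in order, and
    issue stops for the cycle at the first word whose bank is still busy.
    """
    next_free = [1] * banks  # earliest cycle each bank can accept an access
    word = 0
    for cycle in range(1, cycles + 1):
        for _ in range(width):
            bank = (word * stride) % banks
            if next_free[bank] > cycle:
                break  # in-order issue: this word must wait for its bank
            next_free[bank] = cycle + busy
            word += 1
    return word


for stride in (1, 2, 3, 4):
    n = words_delivered(stride, 8)
    print(f"stride {stride}: {n} words in 8 cycles = {n / 8} words/cycle")
```

For stride 1 the sketch gives 20 words after 5 cycles and 32 words after 8 cycles, matching the table. For strides 2, 3 and 4 it gives 16, 32 and 8 words in 8 cycles, i.e. 2, 4 and 1 words/cycle. Its per-cycle pattern differs from the stride-3 table, which issues more than 4 words in a cycle, but the averages agree.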

Stride: 2
       BANK #
CYCLE  0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
1      1         2
2      busy      busy      3         4
3      busy      busy      busy      busy      5         6
4      busy      busy      busy      busy      busy      busy      7         8
5      9         10        busy      busy      busy      busy      busy      busy
6      busy      busy      11        12        busy      busy      busy      busy
7      busy      busy      busy      busy      13        14        busy      busy
8      busy      busy      busy      busy      busy      busy      15        16
8 words can be processed in 4 cycles: 8/4 = 2 words/cycle.

Stride: 3
       BANK #
CYCLE  0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
1      1              2              3              4              5              6
2      busy      7    busy      8    busy      9    busy      10   busy      11   busy
3      busy 12   busy busy 13   busy busy 14   busy busy 15   busy busy 16   busy busy
4      busy busy busy busy busy busy busy busy busy busy busy busy busy busy busy busy
16 words can be processed in 4 cycles: 16/4 = 4 words/cycle. All banks are busy in cycle 4.
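Stride 3 keeps all 16 banks in use because 3 and 16 share no common factor, so the first 16 strided addresses land on 16 different banks before any bank repeats. A small Python check, assuming word i is at address 3i and bank = address mod 16 (`bank_order` is a hypothetical helper):

```python
def bank_order(stride, banks=16):
    """Bank index of words 0..banks-1 when address = i * stride and bank = address % banks."""
    return [(i * stride) % banks for i in range(banks)]


order = bank_order(3)
print(order)                             # [0, 3, 6, 9, 12, 15, 2, 5, 8, 11, 14, 1, 4, 7, 10, 13]
print(sorted(order) == list(range(16)))  # True: every bank is visited exactly once
```

The order matches the table: words 1 to 6 go to banks 0, 3, ..., 15, word 7 wraps around to bank 2, and word 12 lands in bank 1.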

Stride: 4
       BANK #
CYCLE  0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
1      1
2      busy                2
3      busy                busy                3
4      busy                busy                busy                4
5      5                   busy                busy                busy
6      busy                6                   busy                busy
7      busy                busy                7                   busy
8      busy                busy                busy                8
8 words can be processed in 8 cycles: 8/8 = 1 word/cycle.

Stride: 5
BANK #
CYCLE
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1 2 3
2 busy 4 busy 5 busy 6
3 busy 7 busy busy 8 busy busy 9 busy
4 busy 10 busy busy busy 11 busy busy busy 12 busy busy
5 idle 13 busy busy busy idle 14 busy busy busy idle 15 busy busy busy
6 16 busy busy busy idle 17 busy busy busy idle 18 busy busy busy idle
7 busy busy busy idle 19 busy busy busy idle 20 busy busy busy idle 21
8 busy busy idle 22 busy busy busy idle 23 busy busy busy idle 24 busy
9 busy idle 25 busy busy busy idle 26 busy busy busy idle 27 busy busy
10 idle 28 busy busy busy idle 29 busy busy busy idle 30 busy busy busy
With stride = 5, the bank latency will be 5. Thus the performance will be worse, because this exceeds the original bank latency of 4 cycles: 33/10 = 3.3 words/cycle.

Stride: 6
BANK #
CYCLE
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 1
2 busy 2
3 busy busy 3
4 busy busy busy 4
5 idle busy busy busy 5
6 idle idle busy busy busy 6
7 7 idle idle busy busy busy
8 busy 8 idle idle busy busy
9 busy busy 9 idle idle busy
10 busy busy busy 10 idle idle
16/10 = 1.6 words/cycle.

Given the same number of banks and the same bank busy time, stride = 2 can effectively deliver only 2 words/cycle, making it less efficient than stride = 1: it touches only the 8 even-numbered banks, and 8 banks that are each busy for 4 cycles can supply at most 8/4 = 2 words/cycle.
Strides 3 and 4 effectively deliver 4 and 1 word/cycle, respectively (stride = 4 touches only banks 0, 4, 8 and 12, so 4/4 = 1 word/cycle).
Stride = 3 performs about as well as stride = 1. Initially it delivers 6 words in cycle 1 and 5 words in each of cycles 2 and 3, but by cycle 4 every bank is busy, so on average it delivers 4 words/cycle.
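A closed-form cross-check of these numbers, assuming the usual idealized model: a stride-s access pattern over 16 banks only ever touches 16 / gcd(16, s) distinct banks, each of which can serve one word per 4-cycle busy period, and the memory port cannot exceed 4 words/cycle. `steady_state_bandwidth` is a hypothetical helper.

```python
from math import gcd


def steady_state_bandwidth(stride, banks=16, busy=4, port_width=4):
    """Words/cycle in steady state under the idealized bank-conflict model."""
    distinct_banks = banks // gcd(banks, stride)          # banks reachable with this stride
    return min(float(port_width), distinct_banks / busy)  # each bank: one word per busy period


for s in (1, 2, 3, 4):
    print(s, steady_state_bandwidth(s))
```

This gives 4, 2, 4 and 1 words/cycle for strides 1 to 4, the same values as the tables.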

For strides greater than 4, the latency of the memory banks exceeds 4 cycles, making the memory subsystem less efficient.
