Beruflich Dokumente
Kultur Dokumente
3, MARCH 2014
675
I. I NTRODUCTION
number of cores, become the real problem [3]. The networkon-chip (NoC) design paradigm [4] is recognized as the most
viable way to tackle with scalability and variability issues that
characterize the ultradeep submicronmeter era.
Nowadays, the on-chip communication issues are as relevant
as, and in some cases more relevant than, the computationrelated issues [4]. In fact, the communication subsystem
increasingly impacts the traditional design objectives, including cost (i.e., silicon area), performance, power dissipation,
energy consumption, reliability, etc. As technology shrinks,
an ever more significant fraction of the total power budget
of a complex many-core system-on-chip (SoC) is due to the
communication subsystem.
In this paper, we focus on techniques aimed at reducing
the power dissipated by the network links. In fact, the power
dissipated by the network links is as relevant as that dissipated
by routers and network interfaces (NIs) and their contribution
is expected to increase as technology scales [5]. In particular,
we present a set of data encoding schemes operating at flit
level and on an end-to-end basis, which allows us to minimize both the switching activity and the coupling switching
activity on links of the routing paths traversed by the packets.
The proposed encoding schemes, which are transparent with
respect to the router implementation, are presented and discussed at both the algorithmic level and the architectural
level, and assessed by means of simulation on synthetic and
real traffic scenarios. The analysis takes into account several
aspects and metrics of the design, including silicon area, power
dissipation, and energy consumption. The results show that by
using the proposed encoding schemes up to 51% of power
and up to 14% of energy can be saved without any significant
degradation in performance and with 15% area overhead in
the NI.
The rest of this paper is organized as follows. We briefly
discuss related works in Section II, while Section III presents
an overview of the proposed data encoding schemes. The
proposed data encoding schemes along with possible hardware
implementations and their analysis are described in Section IV.
In Section V, the results for the hardware overhead, power and
energy savings, and performance reduction of the proposed
data encoding schemes are compared with those of other
approaches. Finally, this paper is concluded in Section VI.
II. R ELATED W ORKS AND C ONTRIBUTIONS
In the next several years, the availability of chips with 1000
cores is foreseen [6]. In these chips, a significant fraction of
1063-8210 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
676
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 22, NO. 3, MARCH 2014
TABLE I
E FFECT OF O DD I NVERSION ON C HANGE OF T RANSITION T YPES
Time
t 1
t
Normal
Odd Inverted
Type I
Types II, III, and IV
00, 11 00, 11, 01, 10 01, 10 00, 11 00, 11, 01, 10 01, 10
10, 01 01, 10, 00, 11 11, 00 11, 00 00, 11, 01, 10 10, 01
T1*
T1**
T1*** Type III
Type IV
Type II
t 1
t
Type II
01, 10
10, 01
Type I
01, 10
11, 00
t 1
t
Type III
00, 11
11, 00
Type I
00, 11
10, 01
t 1
t
Type IV
00, 11, 01, 10
00, 11, 01, 10
Type I
00, 11, 01, 10
01, 10, 00, 11
In addition, the scheme was based on the hop-by-hop technique, and therefore, encoding/decoding is performed in each
node.
The scheme presented in [26] dealt with reducing the
coupling switching. In this method, a complex encoder counts
the number of Type I (Table I) transitions with a weighting
coefficient of one and the number of Type II transitions with
the weighting coefficient of two. If the number is larger
than half of the link width, the inversion will be performed.
In addition to the complex encoder, the technique only works
on the patterns whose full inversion leads to the link power
reduction while not considering the patterns whose full inversions may lead to higher link power consumption. Therefore,
the link power reduction achieved through this technique is
not as large as it could be. This scheme was also based on the
hop-by-hop technique.
In another coding technique presented in [25], bunches of
four bits are encoded with five bits. The encoded bits were
isolated using shielding wires such that the occurrence of
the patterns 101 and 010 were prevented. This way, no
simultaneous Type II transitions in two adjacent pair bits
are induced. This technique effectively reduces the coupling
switching activity. Although the technique reduces the power
consumption considerably, it increases the data transfer time,
and, hence, the link energy consumption. This is due to the fact
that for each four bits, six bits are transmitted which increases
the communication traffic. This technique was also based on
the hop-by-hop approach.
A coding technique that reduces the coupling switching
activity by taking the advantage of end-to-end encoding for
wormhole switching has been presented in [23]. It is based on
lowering the coupling switching activity by eliminating only
Type II transitions.
In this paper, we present three encoding schemes.
In Scheme I, we focus on reducing Type I transitions while
in Scheme II, both Types I and II transitions are taken into
account for deciding between half and full invert, depending
the amount of switching reduction. Finally, in Scheme III,
we consider the fact that Type I transitions show different
behaviors in the case of odd and even inverts and make the
inversion which leads to the higher power saving.
JAFARZADEH et al.: DATA ENCODING TECHNIQUES FOR REDUCING ENERGY CONSUMPTION IN NoC
677
(1)
(2)
(3)
(4)
A. Scheme I
In scheme I, we focus on reducing the numbers of Type I
transitions (by converting them to Types III and IV transitions) and Type II transitions (by converting them to Type I
transition). The scheme compares the current data with the
previous one to decide whether odd inversion or no inversion
of the current data can lead to the link power reduction.
1) Power Model: If the flit is odd inverted before being
transmitted, the dynamic power on the link is
+ K 1 T1 + K 2 T2 + K 3 T3 + K 4 T4 Cc (5)
P T01
, T1 , T2 , T3 , and T4 , are the self-transition activity,
where T01
and the coupling transition activity of Types I, II, III, and IV,
respectively. Table I reports, for each transition, the relationship between the coupling transition activities of the flit when
transmitted as is and when its bits are odd inverted. Data are
organized as follows. The first bit is the value of the generic
ith line of the link, whereas the second bit represents the value
of its (i + 1)th line. For each partition, the first (second) line
represents the values at time t 1 (t).
As Table I shows, if the flit is odd inverted, Types II, III,
and IV transitions convert to Type I transitions. In the case of
Type I transitions, the inversion leads to one of Types II, III, or
Type IV transitions. In particular, the transitions indicated as
T1 , T1 , and T1 in the table convert to Types II, III, and IV
transitions, respectively. Also, we have T01
= T00(odd) +
T01(even) where odd/even refers to odd/even lines. Therefore,
(5) can be expressed as
P T00(odd) + T01(even) Cs
+[K 1 (T2 +T3 +T4 )+ K 2 T1 + K 3 T1 + K 4 T1 ]Cc . (6)
678
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 22, NO. 3, MARCH 2014
(a)
(b)
Fig. 1. Encoder architecture scheme I. (a) Circuit diagram [27]. (b) Internal
view of the encoder block (E).
(7)
(8)
Ty = T2 + T1 T1
(9)
(10)
(11)
2) Proposed Encoding Architecture: The proposed encoding architecture, which is based on the odd invert condition
defined by (12), is shown in Fig. 1. We consider a link width
of w bits. If no encoding is used, the body flits are grouped
in w bits by the NI and are transmitted via the link. In
our approach, one bit of the link is used for the inversion
bit, which indicates if the flit traversing the link has been
inverted or not. More specifically, the NI packs the body
flits in w 1 bits [Fig. 1(a)]. The encoding logic E, which
is integrated into the NI, is responsible for deciding if the
inversion should take place and performing the inversion if
needed. The generic block diagram shown in Fig. 1(a) is the
same for all three encoding schemes proposed in this paper
and only the block E is different for the schemes. To make
the decision, the previously encoded flit is compared with the
current flit being transmitted. This latter, whose w bits are the
concatenation of w 1 payload bits and a 0 bit, represents
the first input of the encoder, while the previous encoded flit
represents the second input of the encoder [Fig. 1(b)]. The
w 1 bits of the incoming (previous encoded) body flit are
indicated by X i (Yi ), i = 0, 1, . . . , w 2. The wth bit of the
previously encoded body flit is indicated by inv which shows
if it was inverted (inv = 1) or left as it was (inv = 0). In the
encoding logic, each Ty block takes the two adjacent bits of
the input flits (e.g., X 1 X 2 Y1 Y2 , X 2 X 3 Y2 Y3 , X 3 X 4 Y3 Y4 , etc.)
and sets its output to 1 if any of the transition types of Ty is
detected. This means that the odd inverting for this pair of bits
leads to the reduction of the link power dissipation (Table I).
The Ty block may be implemented using a simple circuit.
The second stage of the encoder, which is a majority voter
block, determines if the condition given in (12) is satisfied
(a higher number of 1s in the input of the block compared
to 0s). If this condition is satisfied, in the last stage, the
inversion is performed on odd bits. The decoder circuit simply
inverts the received flit when the inversion bit is high.
B. Scheme II
In the proposed encoding scheme II, we make use of both
odd (as discussed previously) and full inversion. The full
inversion operation converts Type II transitions to Type IV
transitions. The scheme compares the current data with the
previous one to decide whether the odd, full, or no inversion
of the current data can give rise to the link power reduction.
1) Power Model: Let us indicate with P, P , and P the
power dissipated by the link when the flit is transmitted with
no inversion, odd inversion, and full inversion, respectively.
The odd inversion leads to power reduction when P < P
and P < P. The power P is given by [23]
P T1 + 2T4 .
(13)
(14)
(15)
JAFARZADEH et al.: DATA ENCODING TECHNIQUES FOR REDUCING ENERGY CONSUMPTION IN NoC
679
lg w
2
(a)
lg w
2
lg w
2
(b)
Fig. 2.
Fig. 3. Decoder architecture Scheme II. (a) Circuit diagram. (b) Internal
view of the decoder block (D).
(19)
680
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 22, NO. 3, MARCH 2014
TABLE II
E FFECT OF E VEN I NVERSION ON C HANGE OF T RANSITION T YPES
Time
Normal
Even Inverted
Type I
Types II, III, and
t 1 01, 10 00, 11, 01, 10 00, 11 01, 10 00, 11, 01, 10
t 00, 11 10, 01, 11, 00 01, 10 10, 01 00, 11, 01, 10
T1*
T1**
T1*** Type II
Type IV
lg w
2
IV
00, 11
11, 00
Type III
lg w
2
t 1
t
Type II
01, 10
10, 01
Type I
01, 10
00, 11
t 1
t
Type III
00, 11
11, 00
Type I
00, 11
01, 10
t 1
t
Type IV
00, 11, 01, 10
00, 11, 01, 10
Type I
00, 11, 01, 10
10, 01, 11, 00
Defining
lg w
2
lg w
2
Te = T2 + T1 T1
(20)
(22)
(23)
Fig. 4.
(30)
(24)
Tr = T3 + T4 + T1
Te = T2 + T1 T1 .
(25)
(26)
(27)
JAFARZADEH et al.: DATA ENCODING TECHNIQUES FOR REDUCING ENERGY CONSUMPTION IN NoC
15%
% Area Increase
% Power Increase
10%
100%
TypeI Decrease
681
TypeII Decrease
50%
5%
0%
H32
H16
H08
H04
HF32
HF16
HF08
HF04
OEF32
OEF16
OEF08
OEF04
SCS32
SCS16
SCS08
SCS04
SC32
SC16
SC08
SC04
BI32
B16
BI08
BI04
CDBI32
CDBI16
CDBI08
CDBI04
FPC
-50%
50%
40%
30%
20%
10%
0%
-10%
H32
H16
H08
H04
HF32
HF16
HF08
HF04
OEF32
OEF16
OEF08
OEF04
SCS32
SCS16
SCS08
SCS04
SC32
SC16
SC08
SC04
BI32
B16
BI08
BI04
CDBI32
CDBI16
CDBI08
CDBI04
FPC
H32
H16
H8
H4
HF32
HF16
HF8
HF4
OEF32
OEF16
OEF8
OEF4
SCS32
SCS16
SCS8
SCS4
SC32
SC16
SC8
SC4
BI32
BI16
BI8
BI4
CDBI32
CDBI16
CDBI8
CDBI4
FPC
0%
Fig. 7.
682
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 22, NO. 3, MARCH 2014
80%
PDF
Video
Music
Text
Picture
30%
-20%
Fig. 8.
Total power saving using different data encoding schemes for several data streams.
30%
20%
10%
0%
-10%
-20%
-30%
PDF
Video
Music
Text
Picture
Total energy saving using different data encoding schemes for several data streams.
show the highest power savings. This shows that our proposed
schemes (along with the FPC technique) provide more power
reductions when compared with other schemes. In the case of
OEF4, we achieve the highest energy reduction of about 14%.
All the proposed techniques provide some energy reductions.
Among the schemes, BI and CDBI increase the total energy
consumption. Interestingly, while the power reduction of FPC
was impressive, its energy reduction is not noteworthy. This is
due to the fact that this technique makes use of more than one
control bit for coding. Since the bandwidth of the data transfer
is fixed, having a higher number of control bits leads to a
lower transfer rate of the actual data, increasing the number
of flits, traffic, and the transfer time. Therefore, the total energy
consumption increases for this scheme.
The power (energy) savings obtained when different data
sets including PDF, video, music, text, and picture are used
as the workloads are given in Fig. 8 (Fig. 9). As can be
observed, when the partitioning is based on 4-bit width, all the
three proposed schemes show energy savings for all the data
streams considered in this paper. Also, in the case of OEF4, the
saving is the largest among all the encoding schemes. For this
encoding scheme, the maximum of energy and power more
than 20% and 60%, respectively, was achieved for the picture
workload.
Finally, it should be pointed out, in general, that the efficacy
of any encoding schemes depends on workload data patterns
which are transmitted via the bus. In the cases of the encoding
techniques proposed in this paper, the switching activities
of two consequent flits of the actual data affect the amount
of savings. These activities may vary from one application
to another. Higher (lower) activities provide more (fewer)
opportunities for the power saving by the proposed encoding
schemes. Specifically, if an application is dominated by integer
computation, there could be cases that the MSB bits are not
always toggled. This is due to the fact that the range of
numbers may be limited, and hence, the sign bits may remain
unchanged. Of course, this also depends on the representation
of the signed numbers. For these applications, the proposed
encoding schemes may provide lower power/energy savings.
In these cases, one may apply the coding technique only to
H32
H08
HF32
HF08
OEF32
OEF08
SCS32
SCS08
SC32
SC08
BI32
BI08
60%
%Total Power Saving
Fig. 9.
50%
40%
30%
20%
10%
0%
0%
10%
20%
30%
40%
50%
H16
H04
HF16
HF04
OEF16
OEF04
SCS16
SCS04
SC16
SC04
B16
BI04
60%
Fig. 10. Increase of the completion time versus increase of power dissipation.
JAFARZADEH et al.: DATA ENCODING TECHNIQUES FOR REDUCING ENERGY CONSUMPTION IN NoC
60%
683
40%
20%
0%
H32
H16
H08
H04
HF32
HF16
HF08
HF04
OEF32
OEF16
OEF08
OEF04
SCS32
SCS16
SCS08
SCS04
SC32
SC16
SC08
SC04
BI32
B16
BI08
BI04
CDBI32
CDBI16
CDBI08
CDBI04
FPC
-20%
(a)
Fig. 11.
60%
(b)
40%
20%
0%
H32
H16
H08
H04
HF32
HF16
HF08
HF04
OEF32
OEF16
OEF08
OEF04
SCS32
SCS16
SCS08
SCS04
SC32
SC16
SC08
SC04
BI32
B16
BI08
BI04
CDBI32
CDBI16
CDBI08
CDBI04
FPC
-20%
Fig. 12.
Fig. 13.
684
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 22, NO. 3, MARCH 2014
JAFARZADEH et al.: DATA ENCODING TECHNIQUES FOR REDUCING ENERGY CONSUMPTION IN NoC
685