
Memory-Based Computing for DSP Applications

Pramod Meher
Institute for Infocomm Research
Singapore
outline
trends in memory technology
memory-based computing: advantages and examples
DA-based computation for DSP applications
lookup-table design for constant multiplication
DA-based vs LUT-multiplier-based implementations
memory-based evaluation of nonlinear functions
conclusions
12/17/2010 Institute for Infocomm Research, Singapore
trends in memory technology
application-specific memories [1-4]:
low-power memories for mobile devices and consumer products
high-speed memories for multimedia applications
wide-temperature memories for automotive applications
high-reliability memories for biomedical instruments
radiation-hardened memory for space applications
trends in memory technology
RAM-logic integration
several nonvolatile RAM types are emerging: ferroelectric RAM (FeRAM), magnetoresistive RAM (MRAM), and varieties of phase-change memory (PCM) [4-6]
the upcoming/new memories provide faster access and consume less power [4-6]
they can be embedded directly into the structure of microprocessors or integrated in the functional elements of dedicated processors [7]
trends in memory technology
memory placement [7-11]
the traditional concept of memory as a standalone subsystem is changing
memory is now embedded within the logic components
either the processor has been moved to the memory or the memory has been moved to the processor
these relocations result in higher bandwidth, lower power consumption, and less access delay
memory-based computing?
a class of dedicated systems, where the computational functions are performed by lookup tables (LUTs) instead of actual calculations
close to human-like computing
simple to design, and more regular compared with multiply-accumulate structures
have potential for high-throughput and reduced-latency implementation
involve less dynamic power consumption due to minimization of switching activities
memory-based computations: examples
inner-product computation using distributed arithmetic (DA) [12]
direct implementation of constant multiplications [13]
well suited for digital filtering and orthogonal transformations for digital signal processing
implementation of fixed and adaptive FIR filters and transforms
other applications: evaluation of trigonometric functions, sigmoid and other nonlinear functions
DA to calculate inner-product: example
$X = [X_0\ X_1\ X_2]^T$ and $A = [A_0, A_1, A_2]^T$ are 3-point vectors; A is constant.
Let $X_0$, $X_1$ and $X_2$ be 4-bit integers:
$X_0 = [x_0(3)\ x_0(2)\ x_0(1)\ x_0(0)]$
$X_1 = [x_1(3)\ x_1(2)\ x_1(1)\ x_1(0)]$
$X_2 = [x_2(3)\ x_2(2)\ x_2(1)\ x_2(0)]$
inner-product of X and A: $A \cdot X = A_0 X_0 + A_1 X_1 + A_2 X_2$
with the bit-level partial sums $P_i = A_0 x_0(i) + A_1 x_1(i) + A_2 x_2(i)$ for i = 0, 1, 2, 3, the inner-product becomes:
$A \cdot X = P_0 + 2P_1 + 4P_2 + 8P_3$
LUT for inner-product using DA [12]
x_2(i) x_1(i) x_0(i)   partial sum
0 0 0                  0
0 0 1                  A_0
0 1 0                  A_1
0 1 1                  A_1 + A_0
1 0 0                  A_2
1 0 1                  A_2 + A_0
1 1 0                  A_2 + A_1
1 1 1                  A_2 + A_1 + A_0
[Figure: the bit streams x_0(i), x_1(i), x_2(i), for i = 0, 1, 2, 3, drive a 3-to-8 line decoder that addresses the LUT; the LUT output is added to a right-shifted accumulator to produce the inner-product A.X.]
2^N LUT words are required for an N-point inner-product. For N = 32, this exceeds 10^9 words!
For L-bit inputs, computation time = L cycles, with cycle time $T = T_{MEM} + T_{ADD} + T_{FF}$.
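The table lookup and shift-accumulate steps above can be sketched in software as follows. This is only an illustrative model under stated assumptions: the inputs are unsigned integers, the function names `build_da_lut` and `da_inner_product` and the sample values are hypothetical, and the LUT word is shifted left by the bit position instead of right-shifting an accumulator as the hardware does (the two are arithmetically equivalent).

```python
def build_da_lut(coeffs):
    """Word at address b is the sum of coeffs[k] over the set bits k of b
    (bit 0 of the address selects A_0, as in the table above)."""
    n = len(coeffs)
    return [sum(coeffs[k] for k in range(n) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(lut, xs, bits):
    """Bit-serial DA for unsigned `bits`-bit inputs:
    one LUT read and one shift-add per bit position."""
    acc = 0
    for i in range(bits):
        # Address = i-th bits of all inputs: x_2(i) x_1(i) x_0(i).
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> i) & 1) << k
        acc += lut[addr] << i  # contributes 2^i * P_i
    return acc


A = [3, -5, 7]   # constant vector (illustrative values)
X = [9, 4, 13]   # 4-bit unsigned inputs (illustrative values)
lut = build_da_lut(A)
result = da_inner_product(lut, X, bits=4)
print(result)  # 3*9 - 5*4 + 7*13 = 98
```

The loop runs once per input bit, which mirrors the L-cycle computation time stated above.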
LUT compaction for DA [12]
x_2(i) x_1(i) x_0(i)   conventional LUT content   OBC LUT content
0 0 0                  0                          -(A_2 + A_1 + A_0)
0 0 1                  A_0                        -(A_2 + A_1 - A_0)
0 1 0                  A_1                        -(A_2 - A_1 + A_0)
0 1 1                  A_1 + A_0                  -(A_2 - A_1 - A_0)
1 0 0                  A_2                        (A_2 - A_1 - A_0)
1 0 1                  A_2 + A_0                  (A_2 - A_1 + A_0)
1 1 0                  A_2 + A_1                  (A_2 + A_1 - A_0)
1 1 1                  A_2 + A_1 + A_0            (A_2 + A_1 + A_0)
Desired partial sum of product = [OBC value + (A_2 + A_1 + A_0)]/2
half the number of LUT words are saved if OBC is used
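The saving comes from the fact that, in the OBC column, complementary addresses hold words that differ only in sign. A minimal sketch of that idea, assuming integer coefficients and using hypothetical helper names (`build_obc_half_lut`, `obc_partial_sum`), stores only the addresses whose top bit is 1 and negates the word read at the complemented address otherwise:

```python
def build_obc_half_lut(coeffs):
    """Store OBC words only for addresses whose most significant bit is 1.
    The OBC word is the sum of +A_k where the address bit is 1 and -A_k where it is 0."""
    n = len(coeffs)
    half = 1 << (n - 1)
    lut = []
    for low in range(half):
        addr = low | half
        lut.append(sum(c if (addr >> k) & 1 else -c
                       for k, c in enumerate(coeffs)))
    return lut


def obc_partial_sum(half_lut, coeffs_sum, addr, n):
    """Recover the conventional partial sum from the half-size OBC LUT."""
    half = 1 << (n - 1)
    mask = (1 << n) - 1
    if addr & half:
        obc = half_lut[addr & (half - 1)]
    else:
        # Complementary address holds the same word with opposite sign.
        obc = -half_lut[(~addr & mask) & (half - 1)]
    # [OBC value + (A_2 + A_1 + A_0)] / 2; the numerator is always even.
    return (obc + coeffs_sum) // 2


A = [3, -5, 7]  # illustrative values
half_lut = build_obc_half_lut(A)
sums = [obc_partial_sum(half_lut, sum(A), addr, len(A)) for addr in range(8)]
print(len(half_lut), sums)
```

For the three-coefficient example this keeps 4 words instead of 8, at the cost of a conditional negation and one extra addition of the constant offset.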
linear convolution / FIR filtering [13]
N-tap FIR filter equation:
$y[n] = h[0]\,x[n] + h[1]\,x[n-1] + \cdots + h[N-1]\,x[n-N+1]$
[Figure: direct-form FIR filter for N=4, with delay elements D producing x[n-1], x[n-2], x[n-3], multipliers for h[0] to h[3], and adders giving y[n].]
4-point inner-product; the weights are constant.
address   LUT content
0000      0
0001      h[0]
0010      h[1]
0011      h[1] + h[0]
0100      h[2]
0101      h[2] + h[0]
0110      h[2] + h[1]
0111      h[2] + h[1] + h[0]
1000      h[3]
1001      h[3] + h[0]
1010      h[3] + h[1]
1011      h[3] + h[1] + h[0]
1100      h[3] + h[2]
1101      h[3] + h[2] + h[0]
1110      h[3] + h[2] + h[1]
1111      h[3] + h[2] + h[1] + h[0]
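A software sketch of the 4-tap case may help connect the LUT above with the filter equation. It assumes unsigned input samples and zero initial state; `da_fir` and the sample values are hypothetical names and data chosen for illustration.

```python
def da_fir(h, samples, bits):
    """FIR filtering by distributed arithmetic with a 2^N-word LUT.
    Address bit k selects h[k], which multiplies x[n-k]."""
    n_taps = len(h)
    lut = [sum(h[k] for k in range(n_taps) if (addr >> k) & 1)
           for addr in range(1 << n_taps)]
    window = [0] * n_taps  # x[n], x[n-1], ..., x[n-N+1]
    out = []
    for x in samples:
        window = [x] + window[:-1]
        y = 0
        for i in range(bits):
            addr = 0
            for k, xk in enumerate(window):
                addr |= ((xk >> i) & 1) << k
            y += lut[addr] << i  # shift-accumulate the LUT word
        out.append(y)
    return out


h = [1, 2, 3, 4]
x = [5, 0, 7, 15, 2]       # 4-bit unsigned samples
y = da_fir(h, x, bits=4)
print(y)  # [5, 10, 22, 49, 53]
```

No multiplier appears in the loop: each output costs one LUT read and one shift-add per input bit.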
DA-based adaptive filtering [14]
example: 4-tap adaptive FIR filter
[Figure: direct-form 4-tap FIR filter with delay elements D producing x[n-1], x[n-2], x[n-3], multipliers for h[0] to h[3], and adders giving y[n]; the error e[n], formed from y[n] and d[n], drives the weight-update block.]
4-point inner-product; the weights are not constant.
LUT for adaptive filter: example [14]
[Figure: LUT contents for the adaptive filter, with columns "address" and "LUT values".]
bits of the same place value of the filter coefficients are used as addresses
DA-based inner-product of long vectors
for N = MP:
$A \cdot X = \sum_{n=0}^{N-1} A_n X_n = \sum_{n=0}^{M-1} A_n X_n + \sum_{n=M}^{2M-1} A_n X_n + \cdots + \sum_{n=(P-1)M}^{PM-1} A_n X_n$
[Figure: inner-product units 1, 2, ..., P whose outputs are added to give the inner-product A.X.]
P LUTs of 2^M words and (P-1) adders are required for an N-point inner-product.
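The decomposition can be illustrated with a short sketch that splits the vectors into P segments of M points, gives each segment its own 2^M-word LUT, and adds the segment results. The function name `segmented_da_inner_product` and the data are hypothetical, and unsigned inputs are assumed.

```python
def segmented_da_inner_product(coeffs, xs, seg_len, bits):
    """Split an N-point inner product into P = N / seg_len short ones.
    Returns (inner product, total number of LUT words used)."""
    assert len(coeffs) == len(xs) and len(coeffs) % seg_len == 0
    total, lut_words = 0, 0
    for start in range(0, len(coeffs), seg_len):
        a_seg = coeffs[start:start + seg_len]
        x_seg = xs[start:start + seg_len]
        # One LUT of 2^M words per segment.
        lut = [sum(a_seg[k] for k in range(seg_len) if (addr >> k) & 1)
               for addr in range(1 << seg_len)]
        lut_words += len(lut)
        for i in range(bits):
            addr = 0
            for k, x in enumerate(x_seg):
                addr |= ((x >> i) & 1) << k
            total += lut[addr] << i
    return total, lut_words


A = [1, 2, 3, 4, 5, 6]
X = [1, 2, 3, 4, 5, 6]   # 3-bit unsigned inputs
value, words = segmented_da_inner_product(A, X, seg_len=3, bits=3)
print(value, words)  # 91 16
```

For N = 6 and M = 3 the two segment LUTs hold 16 words in total, compared with 2^6 = 64 words for a single LUT.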
large order FIR filter using DA [15]
[Figure: 1-D systolic structure. The input samples x[n], x[n-1], ..., x[n-N+2], x[n-N+1] are held in an input shift-register and pass through a bit-serial word-parallel converter, which feeds the M-bit vectors (b_n)_{i,0}, (b_n)_{i,1}, ..., (b_n)_{i,(P-1)}, for i = 0, 1, ..., L-1, to a row of P processing elements (PEs) followed by an output cell; the feeds to successive PEs are staggered by delays of Δ up to (P-1)Δ. Each PE adds a word read from its ROM to the incoming partial result. The output cell initializes S and Count to 0, shift-accumulates its input into S over L cycles (Count = 0 to L-1), then outputs S and resets S and Count to 0.]
(b_n)_{i,j}: (j+1)th segment of the bit-vector of ith bits of the input
large order FIR filter: a 2-D design [15]
[Figure: 2-D systolic structure. The input samples pass through a bit-parallel word-serial converter into L serial-in parallel-out shift-registers, one per bit level, with the feeds to successive levels delayed by Δ, ..., (L-2)Δ, (L-1)Δ. Each shift-register drives a row of P PEs (M-bit addresses, first PE input 0, feeds staggered by Δ up to (P-1)Δ). Each row ends in an SA cell that combines its Xin and Yin inputs, with a factor of 2, into Yout; the last SA cell delivers the output. Each PE adds a word read from its ROM to the incoming partial result.]
circular convolution using DA [16]
circular convolution of two N-point sequences {x(n)} and {h(n)} is :
circular convolution for N=4:
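The equations for this slide did not survive extraction. As a reminder only, the sketch below uses the standard textbook definition, y(m) = sum over k of h(k) x((m - k) mod N), which may differ in indexing convention from the form used in [16]. The function name `circular_convolution` is hypothetical. Each output is an N-point inner-product of {h(k)} with a circularly shifted copy of the input, which is what makes the DA approach applicable.

```python
def circular_convolution(x, h):
    """Standard N-point circular convolution:
    y(m) = sum_k h(k) * x((m - k) mod N)."""
    n = len(x)
    assert len(h) == n
    return [sum(h[k] * x[(m - k) % n] for k in range(n)) for m in range(n)]


x = [1, 2, 3, 4]
print(circular_convolution(x, [1, 0, 0, 0]))  # [1, 2, 3, 4]
print(circular_convolution(x, [0, 1, 0, 0]))  # [4, 1, 2, 3]

# For N = 4, row m of the circulant input matrix is x((m - k) mod 4), k = 0..3.
rows = [[x[(m - k) % 4] for k in range(4)] for m in range(4)]
```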
cyclic convolution using DA: a 2-D design [16]
[Figure: 2-D systolic structure. The input samples pass through a bit-parallel word-serial converter; the first, second, ..., L-th bit-streams of the input sequence {x(n)} each enter a circularly right-shift buffer, with successive bit-streams delayed by Δ, ..., (L-2)Δ, (L-1)Δ. Each buffer drives a row of P PEs (M-bit addresses, first PE input 0, feeds staggered by Δ up to (P-1)Δ) ending in an SA cell; the last SA cell delivers the output samples.]
computation of sinusoidal transforms [17-20]
N-point sinusoidal transforms like the DFT, DCT and DHT are given by
where the transform kernel is defined as
computation of N-point sinusoidal transforms involves multiplication of an N x N kernel matrix with N-point input vectors
this involves N inner-products of the N-point input vector with the rows of the kernel matrix
the matrix-vector product requires N inner-product computation units by the DA approach
for prime values of N, the N x N kernel matrix is transformed to an (N-1)-point cyclic convolution
multiplication using look-up-table
LUT to multiply a 4-bit word X with a constant A
address word X   product word
0000             0
0001             A
0010             2A
0011             3A
0100             4A
0101             5A
0110             6A
0111             7A
1000             8A
1001             9A
1010             10A
1011             11A
1100             12A
1101             13A
1110             14A
1111             15A
[Figure: the L-bit input X addresses an LUT of 2^L words, which outputs the product AX.]
multiplication of an L-bit number X with a constant A requires an LUT of 2^L words
multiplication time = memory latency
LUT size increases exponentially with input size.
optimization for constant multiplications
odd-multiple storage (OMS) scheme
anti-symmetric product coding (APC) scheme
input coding (IC) scheme
combined techniques
odd-multiple storage scheme [21]
all multiples of A:
address word   product word
0000           0
0001           A
0010           2A
0011           3A
0100           4A
0101           5A
0110           6A
0111           7A
1000           8A
1001           9A
1010           10A
1011           11A
1100           12A
1101           13A
1110           14A
1111           15A
odd multiples of A only:
address word   product word
0001           A
0011           3A
0101           5A
0111           7A
1001           9A
1011           11A
1101           13A
1111           15A
Only the odd multiples of the constant need to be stored in the LUT.
Even multiples can be derived from the stored words.
Only half the number of product words need to be stored.
odd-multiple storage scheme [21]
a memory unit of (2^L)/2 words of (W+L)-bit width is used to store the odd multiples of the constant A
a barrel shifter producing a maximum of (L-1) left shifts is used to derive all the even multiples of A
the L-bit input word is mapped to an (L-1)-bit address of the LUT by an encoder
the control bits for the barrel shifter are derived by a control circuit to perform the necessary shifts of the LUT output
a RESET signal is generated by the same control circuit to reset the LUT output when X = 0
if only the magnitude part is used as the address, the LUT size is reduced to half
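The roles of the encoder, barrel shifter and RESET signal can be mimicked in a few lines. This is a behavioural sketch only: the address mapping used here (odd part of X, shifted right by one) is an assumption made for illustration and need not match the encoder of [21], and the names `build_oms_lut` and `oms_multiply` are hypothetical.

```python
def build_oms_lut(a, bits):
    """Store only the odd multiples of a: word j holds (2j + 1) * a.
    The LUT has 2^(bits-1) words, i.e. half of the full 2^bits table."""
    return [(2 * j + 1) * a for j in range(1 << (bits - 1))]


def oms_multiply(lut, x):
    """Multiply via the odd-multiple LUT plus left shifts."""
    if x == 0:
        return 0  # plays the role of the RESET signal
    shift = 0
    while x & 1 == 0:  # strip trailing zeros: x = odd_part * 2^shift
        x >>= 1
        shift += 1
    # x is now odd; (x - 1) / 2 indexes the stored odd multiple.
    return lut[x >> 1] << shift  # the shift models the barrel shifter


A = 37  # illustrative constant
lut = build_oms_lut(A, bits=4)
products = [oms_multiply(lut, x) for x in range(16)]
print(len(lut), products[12])  # 8 words; 12 * 37 = 444
```

For L = 4 the largest shift occurs at X = 8 (three left shifts), consistent with the maximum of (L-1) shifts noted above.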
anti-symmetric product coding [22]
instead of 32 words, only 17 words need to be stored in the LUT
useful for high-precision multiplication and inner-product computation
[Figure or table with labels u and v not recovered.]
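Since the table for this slide is missing, the following is only one plausible reading of the 17-word figure for a 5-bit input, not necessarily the coding of [22]. It assumes that u = A·X and v = A·(32 - X) are the products for X and its two's complement, so that (u + v)/2 = 16A and the stored magnitude is |v - u|/2 = |16 - X|·A, which takes 17 distinct values. The names `build_apc_lut` and `apc_multiply` are hypothetical.

```python
def build_apc_lut(a, bits=5):
    """Store k * a for k = 0 .. 2^(bits-1): 17 words when bits = 5."""
    half = 1 << (bits - 1)
    return [k * a for k in range(half + 1)]


def apc_multiply(lut, x, bits=5):
    """Product = 16A plus or minus the stored word for |x - 16| (bits = 5)."""
    half = 1 << (bits - 1)
    offset = lut[half]            # 16A is itself one of the stored words
    word = lut[abs(x - half)]     # |v - u| / 2 under the stated assumption
    return offset + word if x >= half else offset - word


A = 23  # illustrative constant
lut = build_apc_lut(A)
products = [apc_multiply(lut, x) for x in range(32)]
print(len(lut), products[5], products[27])  # 17 words; 115 and 621
```

The add-or-subtract step replaces roughly half of the storage, which is the trade the slide points to.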
high-precision LUT-multiplier [22]
when the width of the input multiplicand X is large, direct implementation of the LUT-multiplier involves a very large LUT
but the input word X can be decomposed into a certain number of segments or sub-words, X = (X_1, X_2, ..., X_T), and fed to separate LUTs
the partial products pertaining to the different sub-words can be read from the LUTs and shift-added to obtain the product values
[Figure: generalized architecture for high-precision LUT-based multiplier for L = S(T-1) + S.]
input coding scheme: example [23]
X = (1 0 1 1 0 1 0 1 1 1 0 0 0 1 1 1).
We can decompose it into four words as
X = (1 0 1 1) (0 1 0 1) (1 1 0 0) (0 1 1 1).
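The decomposition above, combined with the shift-add of partial products described for the high-precision LUT-multiplier, can be sketched as follows. The sketch assumes equal 4-bit segments and an unsigned input, and it reuses a single 16-word table in software where hardware would use one LUT per segment; `split_segments` and `segmented_lut_multiply` are hypothetical names.

```python
def split_segments(x, seg_bits, n_segs):
    """Return the sub-words of x, most significant segment first."""
    mask = (1 << seg_bits) - 1
    return [(x >> (t * seg_bits)) & mask for t in reversed(range(n_segs))]


def segmented_lut_multiply(a, x, seg_bits, n_segs):
    """Multiply a * x by reading one partial product per sub-word
    from a 2^seg_bits-word LUT and shift-adding the results."""
    lut = [a * s for s in range(1 << seg_bits)]
    mask = (1 << seg_bits) - 1
    product = 0
    for t in range(n_segs):
        sub = (x >> (t * seg_bits)) & mask
        product += lut[sub] << (t * seg_bits)
    return product


X = 0b1011010111000111          # the 16-bit example word above
segments = split_segments(X, seg_bits=4, n_segs=4)
print([format(s, "04b") for s in segments])  # ['1011', '0101', '1100', '0111']
result = segmented_lut_multiply(19, X, seg_bits=4, n_segs=4)
print(result == 19 * X)  # True
```

Four 16-word tables replace a single table of 2^16 words, at the cost of three shift-add operations.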
input coding scheme: basic concepts
input coding scheme: a case for L=5
combining input coding with OMS
combining input coding with OMS: multiplier for L=5
combining input coding with OMS
DA-LUT vs LUT-multiplier-based designs
each output of an N-tap FIR filter involves the computation of one N-point inner-product
one sample can be processed by the DA approach in each cycle using L LUTs of 2^N words and (L-1) adders
the LUT-multiplier-based approach requires N LUTs of 2^L words each and (N-1) adders to have the same throughput
for N = L and the same throughput, both approaches have similar performance
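The resource counts quoted above are easy to tabulate. The sketch below simply evaluates those expressions for given N and L; the function names `da_cost` and `lut_multiplier_cost` are hypothetical, and no LUT-size optimization (OMS, APC, OBC) is taken into account.

```python
def da_cost(n_taps, bits):
    """DA approach at one sample per cycle: L LUTs of 2^N words, (L-1) adders."""
    return {"luts": bits, "words_per_lut": 2 ** n_taps, "adders": bits - 1}


def lut_multiplier_cost(n_taps, bits):
    """LUT-multiplier approach: N LUTs of 2^L words, (N-1) adders."""
    return {"luts": n_taps, "words_per_lut": 2 ** bits, "adders": n_taps - 1}


def total_words(cost):
    return cost["luts"] * cost["words_per_lut"]


for n_taps, bits in [(8, 8), (16, 8), (8, 16)]:
    da = total_words(da_cost(n_taps, bits))
    lm = total_words(lut_multiplier_cost(n_taps, bits))
    print(n_taps, bits, da, lm)
```

With N = L the two counts coincide, and the balance tips toward whichever of N and L is smaller in the exponent.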
LUT-multiplier-based FIR filter [21]
segmented memory core for N multiplications using OMS and APC [FIR 2010
[Figure: latency chart of the DA-based and LUT-multiplier-based FIR filters.]
15% less area than the DA-based design for the same throughput rate.
LUT design for non-linear functions [24]
example: sigmoid function
for a range Δx of values of x, one value of tanh(x) needs to be stored
the range Δx = 2δ, where δ is the maximum permissible value of error
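A rough sketch of this rule: because the slope of tanh(x) never exceeds 1, storing the value at the midpoint of each interval of width 2δ keeps the error within δ. The uniform intervals, the use of odd symmetry so that only |x| is used as the address, the cutoff `x_max`, and the function names are all assumptions made for illustration; the optimized table of [24] may be organized differently.

```python
import math


def build_tanh_lut(delta, x_max):
    """One stored value per interval of width 2 * delta on [0, x_max)."""
    step = 2 * delta
    n_words = math.ceil(x_max / step)
    # Store tanh at the midpoint of each interval.
    return [math.tanh((j + 0.5) * step) for j in range(n_words)], step


def tanh_from_lut(x, lut, step):
    """Look up tanh(x); inputs beyond the table saturate to the last word."""
    j = min(int(abs(x) / step), len(lut) - 1)
    y = lut[j]
    return y if x >= 0 else -y  # odd symmetry: tanh(-x) = -tanh(x)


delta = 0.01
lut, step = build_tanh_lut(delta, x_max=4.0)
worst = max(abs(tanh_from_lut(k / 1000.0, lut, step) - math.tanh(k / 1000.0))
            for k in range(-6000, 6001))
print(len(lut), worst <= delta)
```

With δ = 0.01 and a cutoff at x = 4 the table holds about 200 words, and the saturation error beyond the cutoff is already far below δ.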
LUT design for non-linear functions
conclusions
memory technology is advancing quite fast, and efficient memories for different applications have been emerging over the years
memory elements can be embedded directly into the structure of the microprocessor or integrated in the functional elements of dedicated processors
the memory-based approach can be used for computation-intensive, frequently used DSP tools
the DA approach as well as LUT-based multiplication can be used for memory-based implementation of digital filters
conclusions
both approaches can be used for the computation of discrete sinusoidal transforms by transforming the kernel matrix to cyclic convolution form
the DA approach can be used for reduced-hardware realization
when hardware is not a major constraint, LUT-based multipliers can be used for a simple and straightforward implementation of FIR filters
a new approach to the reduction of LUT size for multiplication has been proposed recently, where the memory size is reduced significantly
LUTs can be designed for efficient evaluation of nonlinear functions, like sinusoidal and hyperbolic functions, logarithms, and multiple-precision arithmetic
references
[1] K. Itoh, S. Kimura, and T. Sakata, VLSI memory technology: Current status and future trends, in Proc. 25th European Solid-State Circuits Conference, Sept. 1999, pp. 3-10.
[2] B. Prince, Trends in scaled and nanotechnology memories, in Proc. IEEE 2004 Conference on Custom Integrated Circuits, Nov. 2005.
[3] R. Barth, ITRS commodity memory roadmap, in Proc. International Workshop on Memory Technology, Design and Testing, July 2003, pp. 61-63.
[4] Kinam Kim, Memory Technologies for Mobile Era, in Proc. Asian Solid-State Circuits Conference, Nov. 2005, pp. 7-11.
[5] International Technology Roadmap for Semiconductors. [Online]. Available: http://public.itrs.net/
[6] S. Lai, Non-volatile memory technologies: The quest for ever lower cost, in Proc. IEEE International Electron Devices Meeting, Dec. 2008, pp. 1-6.
references
[7] D. G. Elliott, M. Stumm, W. M. Snelgrove, C. Cojocaru, and R. McKenzie, Computational RAM: implementing processors in memory, IEEE Trans. Design & Test of Computers, vol. 16, no. 1, pp. 32-41, Jan.-Mar. 1999.
[8] M. Wang, K. Suzuki, A. Sakai, and W. Dai, Memory and logic integration for System-in-a-Package, in Proc. 4th International Conference on ASIC, Oct. 2001, pp. 843-847.
[9] T. Furuyama, Trends and challenges of large scale embedded memories, in Proc. IEEE 2004 Conference on Custom Integrated Circuits, Oct. 2004, pp. 449-456.
[10] C. Trigas, S. Doll, and J. Kruecken, MRAM and Microprocessor System-In-Package: Technology Stepping Stone to Advanced Embedded Devices, IEEE Custom Integrated Circuits Conf., 2004, pp. 71-79.
[11] US Patent 5790839, System integration of DRAM macros and logic cores in a single chip architecture.
references
[12] S. A. White, Applications of the distributed arithmetic to digital signal processing: A tutorial review, IEEE ASSP Magazine, vol. 6, no. 3, pp. 5-19, July 1989.
[13] H.-R. Lee, C.-W. Jen, and C.-M. Liu, On the design automation of the memory-based VLSI architectures for FIR filters, IEEE Trans. Consumer Electronics, vol. 39, no. 3, pp. 619-629, Aug. 1993.
[14] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, LMS Adaptive Filters Using Distributed Arithmetic for High Throughput, IEEE Trans. Circuits & Systems-I, vol. 52, no. 7, pp. 1327-1337, July 2005.
[15] P. K. Meher, S. Chandrasekaran, and A. Amira, FPGA Realization of FIR Filters by Efficient and Flexible Systolization Using Distributed Arithmetic, IEEE Trans. Signal Processing, pp. 3009-3017, July 2008.
[16] P. K. Meher, Hardware-Efficient Systolization of DA-based Calculation of Finite Digital Convolution, IEEE Trans. Circuits & Systems-II, pp. 707-711, Aug. 2006.
references
[17] J.-I. Guo, C.-M. Liu, and C.-W. Jen, The efficient memory-based VLSI array design for DFT and DCT, IEEE Trans. Circuits and Syst. II: Analog and Digital Signal Process., vol. 39, no. 10, pp. 723-733, Oct. 1992.
[18] H.-C. Chen, J.-I. Guo, T.-S. Chang, and C.-W. Jen, A memory-efficient realization of cyclic convolution and its application to discrete cosine transform, IEEE Trans. Circuits Syst. for Video Technol., vol. 15, no. 3, pp. 445-453, Mar. 2005.
[19] D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and T. Stouraitis, Systolic algorithms and a memory-based design approach for a unified architecture for the computation of DCT/DST/IDCT/IDST, IEEE Trans. Circuits Syst.-I: Regular Papers, vol. 52, no. 6, pp. 1125-1137, Jun. 2005.
[20] P. K. Meher, J. C. Patra, and M. N. S. Swamy, High-throughput memory-based architecture for DHT using a new convolutional formulation, IEEE Trans. Circuits Syst. II: Express Briefs, vol. 54, no. 7, pp. 606-610, July 2007.
references
[21] P. K. Meher, New Approach to Look-up-Table Design and Memory-Based Realization of FIR Digital Filter, IEEE Trans. Circuits & Systems-I, pp. 592-603, March 2010.
[22] P. K. Meher, LUT Optimization for Memory-Based Computation, IEEE Trans. Circuits & Systems-II, pp. 285-289, April 2010.
[23] P. K. Meher, Novel Input Coding Technique for High-Precision LUT-Based Multiplication for DSP Applications, The 18th IEEE/IFIP International Conference on VLSI and System-on-Chip (VLSI-SoC 2010), pp. 201-206, Madrid, Spain, September 2010.
[24] P. K. Meher, An Optimized Lookup-Table for the Evaluation of Sigmoid Function for Artificial Neural Networks, The 18th IEEE/IFIP International Conference on VLSI and System-on-Chip (VLSI-SoC 2010), pp. 91-95, Madrid, Spain, September 2010.