Figure: Instruction → O1, O2, …, Oj → Result
Points to be considered
• Choice of moduli set
• Computation time and area requirements for the
following blocks:
• Binary to RNS conversion
• RNS to Binary conversion
• Multiplication
• Scaling
• Base extension
• Sign detection
• Comparison
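Binary to RNS conversion
Modulo adder (figure): an n-bit adder forms X + Y; an (n+1)-bit adder adds the two's complement of $m_i$, i.e. $(2^n - m_i)$, to obtain $X + Y - m_i$; its sign bit drives the select input of a 2:1 MUX, which outputs $(X+Y) \bmod m_i$.
• Delay = $nD_{FA} + (n+1)D_{FA} + D_{MUX}$, which can be reduced to $(n+2)D_{FA} + D_{MUX}$
• Area = $nA_{FA} + 2(n+1)A_{FA} + nA_{2:1MUX}$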
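The modulo adder can be mimicked bit by bit to check the select logic. This is a sketch under our own assumptions: both operands are already reduced ($X, Y < m_i$), $m_i < 2^n$, and the subtraction of $m_i$ is done in an (n+2)-bit two's complement word (our choice of width) so that its sign bit is visible. The name `mod_add` is hypothetical.

```python
def mod_add(x: int, y: int, m: int, n: int) -> int:
    """(x + y) mod m via add, add-two's-complement-of-m, and a sign-controlled 2:1 select.

    Assumes 0 <= x, y < m < 2**n (operands already reduced).
    """
    assert 0 <= x < m and 0 <= y < m and m < (1 << n)
    width = n + 2                        # x + y < 2**(n+1), plus one sign bit
    mask = (1 << width) - 1
    s = x + y                            # first adder: X + Y
    tc_m = ((1 << width) - m) & mask     # two's complement of m in `width` bits
    diff = (s + tc_m) & mask             # second adder: X + Y - m, wrapping like hardware
    sign = (diff >> (width - 1)) & 1     # sign bit of X + Y - m
    return s if sign else diff           # 2:1 MUX: negative difference -> keep X + Y
```

For example, `mod_add(5, 4, 7, 3)` returns 2.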
Modulo multiplier (figure): a multiplier forms XY; a divider divides XY by $m_i$; the quotient is thrown away and the remainder is $(XY) \bmod m_i$.
• Area = area of the multiplier + area of the divider
• Delay = delay of the multiplier + delay of the divider
• The divider can be restoring or non-restoring.
• The processor word length must be 2n bits (to hold XY).
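The multiply-then-divide approach can be sketched as a full 2n-bit product followed by bit-serial restoring division that keeps only the remainder. The sketch assumes unsigned operands $X, Y < m_i < 2^n$; the function names are hypothetical, and the restoring step is written as "trial subtract, keep the old partial remainder if the result is negative".

```python
def restoring_divide(dividend: int, divisor: int, width: int) -> tuple:
    """Bit-serial restoring division of a `width`-bit dividend; returns (quotient, remainder)."""
    quotient, rem = 0, 0
    for i in reversed(range(width)):
        rem = (rem << 1) | ((dividend >> i) & 1)   # bring down the next dividend bit
        trial = rem - divisor                       # trial subtraction
        if trial >= 0:
            rem = trial                             # keep the difference, quotient bit 1
            quotient = (quotient << 1) | 1
        else:
            quotient <<= 1                          # "restore": keep rem, quotient bit 0
    return quotient, rem


def modmul_restoring(x: int, y: int, m: int, n: int) -> int:
    """(x * y) mod m: 2n-bit product, then division; the quotient is thrown away."""
    product = x * y                                 # needs a 2n-bit word
    _quotient, remainder = restoring_divide(product, m, 2 * n)
    return remainder
```

A non-restoring divider would skip the "keep the old value" step and instead add the divisor back on the next iteration; the remainder it produces is the same after a final correction.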
Brickell's Algorithm based Modulo Multipliers
• Maximum word length is (n+1) bits when one bit is taken at a time.
• A higher radix is feasible.
• Area intensive.
• Other methods exist, such as redundant arithmetic and non-overlapping multibit recoding.
• Example: $(13 \times 15) \bmod 23$
• We do not want to compute it in the straightforward manner (full product, then reduction).
• Write b = 13 in binary form:
• $b_3b_2b_1b_0 = 1101$
• Repeat, starting from the MSB:
• $Old = (2 \cdot Old + b_i \cdot A) \bmod 23$
EXAMPLE
• $b_3b_2b_1b_0 = 1101$; $A = 15$, $m_i = 23$
• $P = (2 \cdot 0 + 1 \cdot 15) \bmod 23 = 15$
• $P = (2 \cdot 15 + 1 \cdot 15) \bmod 23 = 22$
• $P = (2 \cdot 22 + 0 \cdot 15) \bmod 23 = 21$
• $P = (2 \cdot 21 + 1 \cdot 15) \bmod 23 = 11$
• Before reduction, $P < 3 \cdot 23$, i.e. $P < 3m_i$ (because $Old < m_i$ and $A < m_i$).
• The modulo reduction needs two comparisons: is $P \ge m_i$? is $P \ge 2m_i$?
• The answer is P, $P - m_i$ or $P - 2m_i$, chosen from the signs of $P - m_i$ and $P - 2m_i$.
• Example: for 45 mod 23 the candidates are 45, 45 - 23 = 22 and 45 - 46 = -1; since $P - 2m_i$ is negative and $P - m_i$ is positive, $P - m_i = 22$ is the correct result.
• Multiple-precision arithmetic is needed in PC-based implementations.
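The bit-serial multiply-and-reduce loop can be written out directly. This is a minimal sketch that follows the worked example step by step (MSB first, three candidates chosen by sign); it makes no claim to reproduce every detail of Brickell's published algorithm, and the name `interleaved_modmul` is hypothetical.

```python
def interleaved_modmul(a: int, b: int, m: int, n: int) -> int:
    """(a * b) mod m, scanning the n bits of b from the MSB.

    Each step forms P = 2*Old + b_i*A (P < 3m because Old, A < m) and
    reduces it by choosing among P, P - m and P - 2m using their signs.
    """
    assert 0 <= a < m and 0 <= b < (1 << n)
    old = 0
    for i in reversed(range(n)):
        b_i = (b >> i) & 1
        p = 2 * old + b_i * a          # partial result, P < 3m
        p_minus_m = p - m              # in hardware these two subtractions
        p_minus_2m = p - 2 * m         # run in parallel with separate adders
        if p_minus_2m >= 0:            # P - 2m non-negative: take it
            old = p_minus_2m
        elif p_minus_m >= 0:           # only P - m non-negative
            old = p_minus_m
        else:                          # both negative: P is already < m
            old = p
    return old
```

`interleaved_modmul(15, 13, 23, 4)` passes through 15, 22, 21 and 11, the same values as the example above, and returns 11.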
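Architecture for Modmul
Figure (ModMUL): 2·Old, $b_i \cdot A$ (A or zero), and the two's complements (TC) of $m_i$ and $2m_i$ feed three (n+2)-bit adders that form P, $P - m_i$ and $P - 2m_i$; a 3:1 MUX selects the reduced value, and latches hold Old and the result.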
• Computation time = $n[(n+2)D_{FA} + D_{MUX}]$
• Area = $3(n+2)A_{FA} + A_{3:1MUX} + nA_{AND}$
Modmul for IDEA
• IDEA (International Data Encryption Algorithm) uses $(xy) \bmod (2^{16}+1)$ as a programmable S-Box (Substitution Box), where x and y are 16-bit words.
• Ideal for DSPs.
• Compute $P = xy$, a 32-bit word.
• Subtract the most significant 16-bit word from the least significant 16-bit word. If the result is negative, add $(2^{16}+1)$.
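The low-minus-high trick works because $2^{16} \equiv -1 \pmod{2^{16}+1}$, so $P = hi \cdot 2^{16} + lo \equiv lo - hi$. The sketch below handles ordinary residues 0 to 65535 only; IDEA itself also treats the all-zero word as $2^{16}$, which this hypothetical `idea_mulmod` does not model.

```python
IDEA_MODULUS = (1 << 16) + 1   # 2^16 + 1 = 65537


def idea_mulmod(x: int, y: int) -> int:
    """(x * y) mod (2^16 + 1) for 16-bit x, y using 2^16 = -1 (mod 2^16 + 1)."""
    assert 0 <= x < (1 << 16) and 0 <= y < (1 << 16)
    p = x * y                  # 32-bit product
    lo = p & 0xFFFF            # least significant 16-bit word
    hi = p >> 16               # most significant 16-bit word
    r = lo - hi                # subtract the MSB word from the LSB word
    if r < 0:
        r += IDEA_MODULUS      # negative: add 2^16 + 1
    return r
```

Only one 16-bit subtraction and a conditional add replace the 32-by-17-bit division.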
RNS to Binary Conversion
• CRT based
• MRC based
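• CRT: RNS $\{m_1, m_2, m_3\}$, residues $\{x_1, x_2, x_3\}$
• Define $M = m_1m_2m_3$ and $M_i = M/m_i$
• The decoded binary number is
$X = \left[M_1\left((1/M_1) \bmod m_1\right)x_1 + M_2\left((1/M_2) \bmod m_2\right)x_2 + M_3\left((1/M_3) \bmod m_3\right)x_3\right] \bmod M$
where $(1/M_i) \bmod m_i$ is the multiplicative inverse of $M_i$ modulo $m_i$.
• e.g. {3,5,7}: M = 105, M1 = 35, M2 = 21, M3 = 15
• (1/35) mod 3 = 2, (1/21) mod 5 = 1, (1/15) mod 7 = 1
• X = [70x1 + 21x2 + 15x3] mod 105
• Consider (1,2,3): X = (70 + 42 + 45) mod 105 = 157 mod 105 = 52
• In general the $M_i$ are large. The constants $M_i\left((1/M_i) \bmod m_i\right)$ are stored, and conversion involves multiplying these large numbers by the $x_i$ in parallel and adding the products modulo M.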
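The CRT formula translates almost line for line into code. A minimal sketch (function names are hypothetical); Python's `pow(a, -1, m)` (Python 3.8+) supplies the inverse $(1/M_i) \bmod m_i$, and the precomputed list plays the role of the stored constants.

```python
from math import prod


def crt_constants(moduli):
    """Stored constants M_i * ((1/M_i) mod m_i) with M_i = M / m_i."""
    M = prod(moduli)
    return [(M // m) * pow(M // m, -1, m) for m in moduli]


def crt_to_binary(residues, moduli):
    """X = [sum_i M_i ((1/M_i) mod m_i) x_i] mod M."""
    M = prod(moduli)
    return sum(c * x for c, x in zip(crt_constants(moduli), residues)) % M
```

For {3, 5, 7} the constants come out as [70, 21, 15], and `crt_to_binary([1, 2, 3], [3, 5, 7])` returns 52, as in the example.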
CRT Implementation
Figure: each residue $x_i$ is multiplied by its stored constant $M_i\left((1/M_i) \bmod m_i\right)$, and the three products are added in a mod-M adder to give X.
• A mod-M adder may involve n subtractions for an n-moduli system.
• Delay = $D_{multiplier} + D_{mod\ M\ adder}$
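MRC Example RNS {7,8,9}, residues $(r_1, r_2, r_3) = (1, 2, 3)$ for moduli $(m_1, m_2, m_3) = (7, 8, 9)$
• Subtract $r_3$: $p = (r_1 - r_3) \bmod m_1 = (1 - 3) \bmod 7 = 5$, $q = (r_2 - r_3) \bmod m_2 = (2 - 3) \bmod 8 = 7$
• Multiply by the inverses of $m_3$: $U_A = p \cdot ((1/9) \bmod 7) \bmod 7 = 5 \cdot 4 \bmod 7 = 6$, $U_B = q \cdot ((1/9) \bmod 8) \bmod 8 = 7 \cdot 1 \bmod 8 = 7$
• Subtract $U_B$: $r = (U_A - U_B) \bmod m_1 = (6 - 7) \bmod 7 = 6$
• Multiply by the inverse of $m_2$: $U_C = r \cdot ((1/8) \bmod 7) \bmod 7 = 6 \cdot 1 \bmod 7 = 6$
• $X = U_C \cdot 72 + U_B \cdot 9 + r_3 = 6 \cdot 72 + 7 \cdot 9 + 3 = 498$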
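The same mixed radix conversion can be sketched for any number of moduli. In this hypothetical helper the first modulus in the list supplies the least significant digit, so listing the moduli as (9, 8, 7) reproduces the order of the example above ($r_3$ first).

```python
def mixed_radix_digits(residues, moduli):
    """Digits a_1..a_L with X = a_1 + a_2*m_1 + a_3*m_1*m_2 + ... (a_j < m_j)."""
    digits = []
    for j, (r, m) in enumerate(zip(residues, moduli)):
        t = r
        for k in range(j):
            # subtract the earlier digit, then multiply by (1/m_k) mod m_j
            t = ((t - digits[k]) * pow(moduli[k], -1, m)) % m
        digits.append(t)
    return digits


def mrc_to_binary(residues, moduli):
    """Weighted sum of the mixed radix digits."""
    x, weight = 0, 1
    for a, m in zip(mixed_radix_digits(residues, moduli), moduli):
        x += a * weight
        weight *= m
    return x
```

`mixed_radix_digits([3, 2, 1], [9, 8, 7])` gives [3, 7, 6] (that is $r_3$, $U_B$, $U_C$), and `mrc_to_binary` turns them into 498. Unlike CRT, only small mod-$m_i$ operations are needed, at the cost of a sequential chain.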
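Reverse conversion for $\{2^n - 1, 2^n, 2^n + 1\}$, residues $(x_1, x_2, x_3)$
• $M = 2^n(2^{2n} - 1)$, $M_1 = 2^n(2^n + 1)$, $M_2 = 2^{2n} - 1$, $M_3 = 2^n(2^n - 1)$
• $(1/M_1) \bmod (2^n - 1) = 2^{n-1}$, $(1/M_2) \bmod 2^n = -1$, $(1/M_3) \bmod (2^n + 1) = 2^{n-1} + 1$
• The n LSBs of the result B are $x_2$. Divide by $2^n$ to get the 2n MSBs of the result as
$\frac{B - x_2}{2^n} = \left[2^{n-1}(2^n + 1)x_1 - 2^n x_2 + (2^{n-1} + 1)(2^n - 1)x_3\right] \bmod (2^{2n} - 1)$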
• Example {7,8,9} (n = 3): $[(32+4)x_1 - 8x_2 + (36-1)x_3] \bmod 63$ yields the 6 MSBs
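The formula for the 2n MSBs can be checked numerically. This sketch uses Python's big integers, so it verifies the arithmetic rather than modelling any adder structure; the name `reverse_convert_2n` is hypothetical.

```python
def reverse_convert_2n(x1: int, x2: int, x3: int, n: int) -> int:
    """RNS {2^n - 1, 2^n, 2^n + 1} -> binary B.

    The n LSBs of B are x2; the 2n MSBs are
    [2^(n-1)(2^n+1) x1 - 2^n x2 + (2^(n-1)+1)(2^n-1) x3] mod (2^(2n) - 1).
    """
    msbs = (
        (1 << (n - 1)) * ((1 << n) + 1) * x1
        - (1 << n) * x2
        + ((1 << (n - 1)) + 1) * ((1 << n) - 1) * x3
    ) % ((1 << (2 * n)) - 1)
    return (msbs << n) | x2      # concatenate the 2n MSBs with the n LSBs
```

For {7, 8, 9} and (1, 2, 3), the MSBs are (36 - 16 + 105) mod 63 = 62, and B = 62 · 8 + 2 = 498.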
Realization
• Andraros and Ahmad: four 2n-bit words to be added using two levels of adders of rotated bits.
• Piestrak suggested a two-level CSA followed by a CPA with end-around carry for adding the four 2n-bit words
• Delay = $(4n+2)D_{FA}$, Area = $6nA_{FA}$
• A low-delay version was also suggested: delay $(2n+2)D_{FA} + D_{MUX}$, with $2n$ 2:1 MUXes needed.
• Dhurkadas (NPOL, Cochin) suggested a simplification in which only three 2n-bit words are added
• Delay = $(4n+2)D_{FA}$, Area = $4nA_{FA}$
• Bhardwaj, Premkumar and Srikanthan [1998] suggested using n-bit adders, e.g. n-bit carry-select adders
• Wang et al. [2002]: three converters using 2n-bit as well as n-bit adders.
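The general technique behind these realizations (carry-save adders whose carries wrap around, then a CPA with end-around carry) can be sketched for addition modulo $2^k - 1$ with $k = 2n$. Multiplying by $2^r$ modulo $2^k - 1$ is a k-bit rotation and negation is a one's complement, which is why the words are built from rotated bits. This is not a gate-level reproduction of any cited design; all helper names, and the six-word split in `msbs_789`, are our own illustrative choices.

```python
def rotl(x: int, r: int, k: int) -> int:
    """x * 2^r mod (2^k - 1) as a k-bit left rotation (x < 2^k, 0 <= r < k)."""
    mask = (1 << k) - 1
    return ((x << r) | (x >> (k - r))) & mask


def csa_eac(a: int, b: int, c: int, k: int):
    """Carry-save adder mod 2^k - 1: a + b + c = s + 2*cy, and 2*cy is a rotation."""
    s = a ^ b ^ c
    cy = (a & b) | (a & c) | (b & c)
    return s, rotl(cy, 1, k)           # the carry-out of the MSB wraps to the LSB


def cpa_eac(a: int, b: int, k: int) -> int:
    """Carry-propagate adder with end-around carry; result in 0 .. 2^k - 2."""
    mask = (1 << k) - 1
    t = a + b
    t = (t & mask) + (t >> k)          # feed the carry-out back into the LSB
    return 0 if t == mask else t       # all ones is the second zero of mod 2^k - 1


def add_mod_2k_minus_1(words, k: int) -> int:
    """Add k-bit words mod 2^k - 1: a chain of CSAs, then one end-around-carry CPA."""
    words = list(words) + [0] * max(0, 2 - len(words))
    s, c = words[0], words[1]
    for w in words[2:]:
        s, c = csa_eac(s, c, w, k)     # each CSA level reduces three words to two
    return cpa_eac(s, c, k)


def msbs_789(x1: int, x2: int, x3: int) -> int:
    """[(32+4)x1 - 8x2 + (32+4-1)x3] mod 63 for {7, 8, 9}, built from rotated words."""
    k = 6
    mask = (1 << k) - 1
    words = [
        rotl(x1, 5, k), rotl(x1, 2, k),    # (32 + 4) x1
        ~rotl(x2, 3, k) & mask,            # -8 x2 as a one's complement
        rotl(x3, 5, k), rotl(x3, 2, k),    # (32 + 4) x3
        ~x3 & mask,                        # -x3
    ]
    return add_mod_2k_minus_1(words, k)
```

`msbs_789(1, 2, 3)` gives 62, the 6 MSBs of 498. With four input words the loop performs exactly two CSA levels, matching the two-level structure described above.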
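{7,8,9} example $(x_1, x_2, x_3)$
$x_1$, $x_2$: 3 bits; $x_3$: 4 bits
$x_{12}x_{11}x_{10}$, $x_{22}x_{21}x_{20}$, $x_{33}x_{32}x_{31}x_{30}$
$\frac{B - x_2}{2^n} = \left[2^{n-1}(2^n + 1)x_1 - 2^n x_2 + (2^{n-1} + 1)(2^n - 1)x_3\right] \bmod (2^{2n} - 1)$
$\frac{B - x_2}{2^n} = \left[(2^{2n-1} + 2^{n-1})x_1 - 2^n x_2 + (2^{2n-1} + 2^{n-1} - 1)x_3\right] \bmod (2^{2n} - 1)$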
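• $[(32+4)x_1 - 8x_2 + (36-1)x_3] \bmod 63$:
$Y = (x_{33} + x_{32})'$
$X_{3x} = x_{30} + x_{33}$, since $x_{30}$ and $x_{33}$ are never both 1 ($x_3 \le 8$, so $x_{33} = 1$ forces the other bits of $x_3$ to 0)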
Other three, four and five moduli sets
• $\{2^n, 2^n-1, 2^{n-1}-1\}$ Hiasat and Abdel-Aty-Zohdy; Wang, Wang, Swamy and Ahmad: not better than the popular moduli set, but multipliers etc. are simpler
• $\{2^n, 2^n-1, 2^{n+1}-1\}$ Ananda Mohan: better in area or time; multipliers are simpler
• $\{2^n, 2^{2n}-1, 2^{2n}+1\}$ Ananda Mohan: better than the Cao et al. four-moduli set, which has one large modulus
• $\{2^n, 2^n-1, 2^n+1, 2^{n+1}-1\}$ Vinod and Premkumar
• $\{2^n, 2^n-1, 2^n+1, 2^{n+1}-1\}$ Bhardwaj, Srikanthan, Ananda Mohan and Premkumar: area and time intensive
• $\{2^n, 2^n-1, 2^n+1, 2^{2n}+1\}$ Cao et al.: better than other four-moduli sets, but one modulus is bigger in size
• $\{2^n-3, 2^n-1, 2^n+1, 2^n+3\}$ Sheu et al.: uses ROM, not attractive
• $\{2^{n-1}-1, 2^n-1, 2^n, 2^n+1, 2^{n+1}-1\}$ Cao et al. 2007: increases the cardinality to 5 with a dynamic range (DR) of 5n bits, but RNS to binary conversion is slower and more area consuming
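When comparing candidate sets, two quick checks are whether the moduli are pairwise coprime and how many bits the dynamic range $M$ covers. A small sketch with hypothetical helper names; the dynamic range is reported as the bit length of $M$.

```python
from itertools import combinations
from math import gcd, prod


def check_moduli_set(moduli):
    """(pairwise coprime?, dynamic range in bits = bit length of M)."""
    coprime = all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))
    return coprime, prod(moduli).bit_length()


def five_moduli(n):
    """{2^(n-1) - 1, 2^n - 1, 2^n, 2^n + 1, 2^(n+1) - 1}."""
    return [2 ** (n - 1) - 1, 2 ** n - 1, 2 ** n, 2 ** n + 1, 2 ** (n + 1) - 1]
```

`check_moduli_set(five_moduli(6))` gives (True, 30), i.e. the 5n-bit dynamic range. For odd n the check fails: for example, with n = 5, $2^{n-1}-1 = 15$ and $2^{n+1}-1 = 63$ share the factor 3, since $\gcd(2^a-1, 2^b-1) = 2^{\gcd(a,b)}-1$.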
Comparison of various converters for three moduli sets
• M2 $\{2^k, 2^k-1, 2^{k-1}-1\}$, M1 $\{2^k-1, 2^k, 2^k+1\}$
• M4 $\{2^k, 2^k-1, 2^{k+1}-1\}$, M3 $\{2^k-1, 2^k, 2^k+1, 2^{k+1}-1\}$
Base Extension
• Needed in scaling or division.
• Uses MRC first to divide, followed by base extension.
• CRT can be used but is cumbersome.
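Example: {3,5,7}, 52 = (1,2,3); scale by 7
• Subtract $x_3 = 3$: $(1 - 3) \bmod 3 = 1$, $(2 - 3) \bmod 5 = 4$
• Multiply by $(1/7) \bmod 3 = 1$ and $(1/7) \bmod 5 = 3$: $1 \cdot 1 \bmod 3 = 1$, $4 \cdot 3 \bmod 5 = 2$, i.e. $\lfloor 52/7 \rfloor = 7 = (1, 2)$ in {3,5}
• First base extension step: copy the mod-5 residue 2 into the mod-7 channel; in the mod-3 channel, $(1 - 2) \bmod 3 = 2$ and $2 \cdot ((1/5) \bmod 3) \bmod 3 = 2 \cdot 2 \bmod 3 = 1$
• Second base extension step: $(2 + 1 \times 5) \bmod 7 = 0$, so the scaled result is $(1, 2, 0)$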
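Scaling by the last modulus followed by MRC base extension can be sketched as below. Assumptions: the divisor is the last modulus, and the hypothetical helpers take the mixed radix digits of the remaining channels in the order given (3, then 5), whereas the worked example starts from 5; the extended residue is the same either way.

```python
def mixed_radix_digits(residues, moduli):
    """Digits a_1..a_L with X = a_1 + a_2*m_1 + a_3*m_1*m_2 + ..."""
    digits = []
    for j, (r, m) in enumerate(zip(residues, moduli)):
        t = r
        for k in range(j):
            t = ((t - digits[k]) * pow(moduli[k], -1, m)) % m
        digits.append(t)
    return digits


def scale_by_last_modulus(residues, moduli):
    """floor(X / m_L) in RNS, then base extension to recover the residue mod m_L.

    X - x_L is exactly divisible by m_L, so every other channel computes
    (x_i - x_L) * ((1/m_L) mod m_i) mod m_i.
    """
    *rest_res, r_last = residues
    *rest_mod, m_last = moduli
    q = [((r - r_last) * pow(m_last, -1, m)) % m for r, m in zip(rest_res, rest_mod)]
    # Base extension: evaluate a_1 + a_2*m_1 + ... modulo m_last (Horner form).
    ext = 0
    for a, m in zip(reversed(mixed_radix_digits(q, rest_mod)), reversed(rest_mod)):
        ext = (ext * m + a) % m_last
    return q + [ext]
```

`scale_by_last_modulus([1, 2, 3], [3, 5, 7])` returns [1, 2, 0], the residues of 7 = floor(52/7).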
RSA using RNS/ECC
• Needs computation of $P^Q \bmod N$
• e.g. $10^{23} \bmod 37 = (10^{16})(10^4)(10^2)(10^1) \bmod 37$
• Successive squaring mod 37 and multiplication mod 37 of the selected results.
• Needs $(XY) \bmod N$ as the basic step, where X, Y, N are 1024-bit numbers.
• RNS can be used.
• The Montgomery technique has been used to find $(X'Y'/M) \bmod N$, where M is the product of the moduli in RNS.
• Needs two RNS dynamic ranges M and M′ that are mutually prime, and a redundant modulus
• Determine q such that $(X'Y' + qN)$ is a multiple of M.
• Extend q to the RNS with dynamic range M′.
• Find $r = (X'Y' + qN)/M$ in the second RNS.
• Do base extension to the first RNS.
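Both ideas can be tried at toy size. `square_and_multiply` reproduces the $10^{23} \bmod 37$ example with ordinary integers. `rns_montgomery` follows the four listed steps with our own illustrative choices B = {3, 5, 7} (M = 105), B2 = {11, 13} (M′ = 143) and N = 37; base extension here is done exactly by mixed radix conversion, swapped in for the redundant-modulus method mentioned above. All names are hypothetical.

```python
from math import prod


def square_and_multiply(base: int, exp: int, mod: int) -> int:
    """base^exp mod `mod`: successive squaring, multiplying in squares whose exponent bit is 1."""
    result, square = 1, base % mod
    while exp:
        if exp & 1:
            result = (result * square) % mod   # multiply in a selected square
        square = (square * square) % mod       # successive squaring
        exp >>= 1
    return result


def base_extend(res, base, target):
    """Exact base extension via mixed radix digits (not the redundant-modulus method)."""
    digits = []
    for j, (r, m) in enumerate(zip(res, base)):
        t = r
        for k in range(j):
            t = ((t - digits[k]) * pow(base[k], -1, m)) % m
        digits.append(t)
    out = []
    for mt in target:
        v = 0
        for a, m in zip(reversed(digits), reversed(base)):
            v = (v * m + a) % mt               # Horner evaluation modulo the new modulus
        out.append(v)
    return out


def rns_montgomery(x, x2, y, y2, N, B, B2):
    """Return r = (x*y + q*N) / M in both bases, so that r = x*y/M (mod N).

    x, y: residues in base B; x2, y2: residues of the same values in base B2.
    Assumes gcd(N, M) = gcd(M, prod(B2)) = 1, x, y < N < M and 2N < prod(B2).
    """
    M = prod(B)
    # 1. q = -x*y/N mod M, computed channel by channel in base B
    q = [(-a * b * pow(N, -1, m)) % m for a, b, m in zip(x, y, B)]
    # 2. extend q to the second base B2
    q2 = base_extend(q, B, B2)
    # 3. x*y + q*N is a multiple of M, so dividing by M = multiplying by (1/M) mod m
    r2 = [((a * b + c * N) * pow(M, -1, m)) % m for a, b, c, m in zip(x2, y2, q2, B2)]
    # 4. base extension back to the first base
    return base_extend(r2, B2, B), r2
```

The result r is below 2N, so it can feed the next multiplication directly; a real RSA design uses 1024-bit N and many more moduli.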
Sign Detection and Comparison
• Sign detection is difficult.
• Detecting the sign requires conversion to a binary number.
• Comparison is also difficult: it requires conversion to binary numbers or sequential techniques such as comparing mixed radix digits.
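Comparison by mixed radix digits can be sketched as follows: the digits form a positional representation, so two numbers compare like their digit strings, most significant digit first. Sign detection then reduces to comparing against $\lceil M/2 \rceil$, under the assumed symmetric convention that values $\ge \lceil M/2 \rceil$ represent negatives. Function names are hypothetical.

```python
from math import prod


def mixed_radix_digits(residues, moduli):
    """Digits a_1..a_L with X = a_1 + a_2*m_1 + a_3*m_1*m_2 + ... (a_j < m_j)."""
    digits = []
    for j, (r, m) in enumerate(zip(residues, moduli)):
        t = r
        for k in range(j):
            t = ((t - digits[k]) * pow(moduli[k], -1, m)) % m
        digits.append(t)
    return digits


def rns_compare(a, b, moduli):
    """-1, 0 or 1: compare mixed radix digits, most significant first."""
    for da, db in zip(reversed(mixed_radix_digits(a, moduli)),
                      reversed(mixed_radix_digits(b, moduli))):
        if da != db:
            return 1 if da > db else -1
    return 0


def rns_is_negative(a, moduli):
    """Symmetric convention: X >= ceil(M/2) stands for the negative value X - M."""
    M = prod(moduli)
    half = [((M + 1) // 2) % m for m in moduli]
    return rns_compare(a, half, moduli) >= 0
```

The loop over digits is inherently sequential, which is why comparison and sign detection remain slow in RNS.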
Applications
• FIR Filters (ensure that the RNS dynamic range is larger than that of the filter)
• Digital Frequency Synthesis
• Video Filters
• 2-D filters
• NTTs (Number Theoretic Transforms)
• Cryptography
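The FIR dynamic-range condition can be made concrete. In this sketch each residue channel runs the same filter with mod-$m_i$ multiply-accumulates, and CRT maps the result back to a signed integer. Assumptions: integer samples and coefficients, the symmetric signed range, and a conservative worst-case bound $\max|x| \cdot \sum|h|$ as the range check; the function names are hypothetical.

```python
from math import prod


def fir_direct(x, h):
    """Reference FIR with zero initial state: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_rns(x, h, moduli):
    """Same FIR run independently in each residue channel, then CRT back to signed ints."""
    M = prod(moduli)
    bound = max(abs(v) for v in x) * sum(abs(c) for c in h)   # worst-case |y[n]|
    if 2 * bound >= M:
        raise ValueError("RNS dynamic range too small for this filter")
    channels = []
    for m in moduli:
        ym = []
        for n in range(len(x)):
            acc = 0
            for k in range(len(h)):
                if n - k >= 0:
                    acc = (acc + (h[k] % m) * (x[n - k] % m)) % m   # mod-m MAC
            ym.append(acc)
        channels.append(ym)
    consts = [(M // m) * pow(M // m, -1, m) for m in moduli]     # CRT constants
    y = []
    for residues in zip(*channels):
        v = sum(c * r for c, r in zip(consts, residues)) % M
        y.append(v - M if v >= (M + 1) // 2 else v)            # symmetric signed range
    return y
```

With moduli {7, 8, 9} (M = 504), short filters with small taps fit comfortably; inputs of magnitude 100 with taps [1, 2, 3] would exceed the range and are rejected.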
Applications of RNS
• [5] Freking, W.L., and Parhi, K.K., "Low-power FIR digital filters using residue arithmetic," in Conf. Record 31st Asil. Conf. Signals, Syst. and Comput. (ACSSC 1997), vol. 1, Pacific Grove, CA, USA [1997], 739-743.
• [6] D'Amora, A. et al., "Reducing power dissipation in complex digital filters by using the quadratic residue number system," in Conf. Record 34th Asil. Conf. Signals, Syst. Comput. (ACSSC 2000), vol. 2, Pacific Grove, CA, USA [2000], 879-883.
• [7] Cardarilli, G.C. et al., "Low-power implementation of polyphase filters in Quadratic Residue Number system," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS 2004), vol. 2, Vancouver, BC, Canada [2004], 725-728.
• [8] Shanbhag, N.R., and Siferd, R.E., "A single-chip pipelined 2-D FIR filter using residue arithmetic," IEEE JSSC-26 [1991], 796-805.
• [9] Toivonen, T., and Heikkilä, J., "Video filtering with Fermat number theoretic transforms using residue number system," IEEE CSVT-16 [2006], 128-135.
• [10] Schwemmlein, J., Posch, K.C., and Posch, R., "RNS-modulo reduction upon a restricted base value set and its applicability to RSA cryptography," Computers & Security 17 [1998], 637-650.
• [11] Nozaki, H., Motoyama, M., Shimbo, A., and Kawamura, S., "Implementation of RSA algorithm based on RNS Montgomery multiplication," in C. Paar (ed.), Cryptographic Hardware and Embedded Systems (CHES), Springer-Verlag, Berlin, Germany [2001], 364-376.
• [12] Bajard, J.-C., Didier, L.-S., and Kornerup, P., "An RNS Montgomery modular multiplication algorithm," IEEE C-47 [1998], 766-776.
• [13] Bajard, J.-C., and Imbert, L., "A full RNS implementation of RSA," IEEE C-53 [2004], 769-774.
• [14] Schinianakis, D.M., Kakarountas, A.P., and Stouraitis, T., "A new approach to elliptic curve cryptography: an RNS architecture," IEEE MELECON, Benalmádena (Málaga), Spain, May 16-19 [2006], 1241-1245.
• [15] Yang, L.-L., and Hanzo, L., "A residue number system based parallel communication scheme using orthogonal signaling: Part I: System outline," IEEE VT-51 [2002], 1534-1546.
• [16] Chaves, R., and Sousa, L., "RDSP: A RISC DSP based on residue number system," in Proc. Euro. Symp. Digital System Design: Architectures, Methods, and Tools, Antalya, Turkey [2003], 128-135.
• [17] Wei, W. et al., "RNS application for digital image processing," in 4th IEEE Int. Workshop Syst.-on-Chip for Real Time Applications, Banff, Alta., Canada [2004], 77-80.
Conclusion
• Very mature today
• Can be used in place of custom DSP blocks
• Research on newer moduli sets with high cardinality and faster reverse conversion is of interest