
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/332174707

FPGA Implementation of an RNS based Elliptic Curve Cryptography processor

Conference Paper · April 2019


2 authors:

Shubham Anand Sakthivel Sm


VIT University VIT University Chennai

All content following this page was uploaded by Shubham Anand on 27 November 2019.



FPGA Implementation of an RNS based
Elliptic Curve Cryptography processor
Shubham Anand (1), Dattatray A (2), S. M. Sakthivel (3)
(1) Student, M.Tech VLSI Design, VIT Chennai, Tamil Nadu.
(2) Student, M.Tech VLSI Design, VIT Chennai, Tamil Nadu.
(3) Professor, School of Electronics Engineering, VIT Chennai, Tamil Nadu.
E-mail: sakthivel.sm@vit.ac.in

Abstract: The growth of digital transactions, along with many other applications in which data spying can have severe consequences, poses the challenge of designing a reliable security system that needs less hardware without compromising the level of security. Many cryptographic systems have been proposed and implemented, and much research is still being done to improve them in all aspects. Elliptic curve cryptography (ECC) is an asymmetric cryptographic technique that uses smaller keys and is capable of providing security equivalent to other known techniques such as Rivest-Shamir-Adleman (RSA). In this paper, we implement a processor capable of performing the point addition and point doubling operations required in ECC for encryption. The processor computes all arithmetic operations using the residue number system (RNS) technique, which not only makes the data more secure but also requires less hardware for implementation, as it deals with remainders that consist of far fewer bits than the actual number.

Keywords: elliptic curve cryptography (ECC), residue number system (RNS), field-programmable gate array (FPGA).

I. INTRODUCTION

Many cryptographic systems exist today, and much research is devoted to building a reliable security system that meets present-day security requirements. More advanced levels of security require more hardware and thus lead to higher power consumption and longer delays in the design. Security therefore comes at the cost of hardware, so reducing the hardware requirements of a cryptographic system is equally important. At the same time, the reduction in hardware must not compromise the level of security.

ECC, a cryptographic technique that has existed for a long time, recently became very popular when the need for reducing hardware became prominent. The reason for its popularity is the key size it requires for encryption. According to researchers, a 164 bit key used with ECC yields the same level of security as a 1024 bit key in any other cryptographic system. A smaller key size implies that ECC requires much less hardware for implementation than any other system, without any compromise in the level of security.

As digital systems have grown in scale, carry propagation has always been a concern, and several redundant number schemes have been derived to overcome it. The most popular of them are carry save arithmetic (CSA), redundant signed digits (RSD) and residue number systems (RNS). Depending on the application, any of these can be used to overcome the carry propagation problem. Of these, RNS performs all arithmetic operations on the remainders with respect to a base consisting of a set of relatively prime numbers. Since remainders can be represented by fewer bits than the actual number, the hardware requirements of an RNS based ALU are small.

In the literature, many ECC processors have been proposed targeting binary fields, prime fields, or dual field operations. In prime fields it is very important to achieve carry free arithmetic in order to avoid lengthy datapaths, and in our work we use RNS for this purpose. Since both ECC and RNS reduce the hardware requirements of a design, our objective was to combine the strengths of both to bring the hardware requirements down even further. In this work we have designed an elliptic curve cryptographic processor that uses an application specific instruction set and performs all arithmetic operations using RNS. The work is implemented on a Xilinx Spartan3E (xc3s250e-4tq144) FPGA. The design consists of an RNS based arithmetic unit comprising an RNS generator, modulo adder, modulo subtractor, modulo multiplier, modulo divider and RNS to binary converter. A control unit consisting of a finite state machine controls the flow of instructions to perform the desired operations. Two data buses and memories are also included in the design.

The rest of the paper is organized as follows. Section II briefly explains the concepts required to understand the design, namely ECC and RNS; the implementations of the RNS generator and the RNS to binary converter are also shown in this section. Section III describes the proposed architecture of the design. The design and implementation results of the arithmetic unit are presented in Section IV. Section V describes the control unit design, while Section VI contains

SET CONFERENCE, VIT CHENNAI, 2019


information about the implementation results. Sections VII and VIII contain the conclusion and references.

II. CRITICAL CONCEPTS

A. Elliptic curve cryptography
ECC is based on elliptic curve theory and is a public key encryption technique capable of generating smaller and more efficient cryptographic keys. It uses the properties of the elliptic curve equation to generate keys, whereas most other techniques rely on the product of very large prime numbers. The ability of ECC to provide equivalent security with less computing power and lower battery usage makes it very useful for mobile applications.
ECC relies on a trapdoor function for its operation. A trapdoor function is one that is easy to compute in the forward direction but requires a very large amount of computation, or is practically impossible to compute, in the reverse direction. Many elliptic curves are possible, each with its own curve equation. The designed processor supports the NIST P256 curve, whose equation is given by

E: $y^2 = x^3 + ax + b$    (1)

Equation (1) is known as the reduced Weierstrass equation. Points on the elliptic curve are defined by affine coordinates (x, y), which are always integers. To encrypt a point in ECC, a common operation known as point multiplication is performed, in which a point on the curve is multiplied by a scalar. A scalar multiplication consists of a series of point addition and point doubling operations. Using the geometrical properties of the curve, the points are encrypted through a series of arithmetic operations on the coordinates: addition, subtraction, multiplication and division. Based on the work presented in [1], the coordinates obtained after point addition and point doubling can be evaluated through the algebraic expressions given in equations (2), (3), (4) and (5). Assume two points P(x1, y1) and Q(x2, y2). A point addition with P and Q as operands results in a point R = P + Q with coordinates (x3, y3). The coordinates x3 and y3 can be evaluated as

$x_3 = \lambda^2 - x_1 - x_2$    (2)
$y_3 = \lambda (x_1 - x_3) - y_1$    (3)

where $\lambda = (y_2 - y_1)/(x_2 - x_1)$ is the slope of the line through P and Q.

If a point doubling operation is performed on point P, the resulting coordinates x4 and y4 can be evaluated as

$x_4 = \lambda^2 - 2x_1$    (4)
$y_4 = \lambda (x_1 - x_4) - y_1$    (5)

where $\lambda = (3x_1^2 + a)/(2y_1)$ is the slope of the tangent at P.

The number of times the operation is performed serves as the private key. Even if the initial and final coordinates are known, it is very difficult to compute how many times the operation was performed, so ECC can provide a very high level of security even with a small private key.

B. Residue number system
The residue number system has existed for centuries; its origin is attributed to the third century book Suan-Ching. RNS is a carry free system capable of performing arithmetic operations such as addition, subtraction and multiplication as parallel operations, and it is a very popular technique for resolving the carry propagation problem in digital system design. RNS is defined in terms of a set of relatively prime moduli that serves as the base for RNS computations. If P denotes a moduli set such that P = (p1, p2, ..., pL), then the greatest common divisor GCD(pi, pj) = 1 for i ≠ j. To convert a number X to RNS form, its remainder is calculated with respect to each element of the moduli set, giving X = (x1, x2, ..., xL) where xi = X mod pi. There are many strategies for choosing the RNS moduli set, depending on the application and the range of the operands. Generally a moduli set of the form {2^n + 1, 2^n, 2^n - 1} is preferred. In our work we use the moduli set {32, 5, 3}. The main reason for choosing this moduli set is that in a hardware description language like Verilog the modulus operator is not synthesizable, so alternative algorithms are needed to obtain the remainders for the moduli set. Since the remainder with respect to a power of two is easy to compute using the algorithm shown below, we chose the fifth power of two as one element of the moduli set. It is not possible to take all elements as powers of two, because the elements of the moduli set would then not be relatively prime, so we chose 5 and 3 as the other elements of our moduli set.

Algorithm 1: Computing remainders from the fifth power of two
Input: [10:0] X;
Output: [4:0] Z;

1: Z = X [4:0];

It can be inferred from Algorithm 1 that computing remainders with respect to powers of two is easy to code and requires very little hardware. No comparably simple rule is available for computing remainders with respect to 5 and 3, so in our work we created a small memory, generated by a Perl script, which contains the precomputed remainders for 5 and 3. Figure 1 and Figure 2 show the RTL and the simulation results, respectively, of the module that generates the RNS values for an input. It can be inferred from the RTL that the module computing the remainders for 5 and 3 consists of a multiplexer based ROM and a sequential device such as a D flip flop, whereas the module computing the remainder for 32 needs just a single sequential element. As soon as the positive edge of the clock signal arrives, the remainders 33 % 32, 33 % 5 and 33 % 3 are computed and the results 1, 3 and 0 are provided at the output ports rem32, rem5 and rem3 respectively. Every input from the data bus is first converted to its RNS form, and the arithmetic operations are performed on the RNS form only. Since numbers in this form require fewer bits for representation, the arithmetic operations performed on them require less hardware.

Figure 1: RTL schematic of RNS generator

Figure 2: Simulation results for generating RNS of 33

After all the arithmetic operations have been performed, the results sent to the data bus must be in binary form. There are several methods for computing the binary representation from the RNS representation. One of them is described in Algorithm 2, in which the binary weight of each element of the moduli set is used to compute the equivalent binary representation.

Algorithm 2: Computing equivalent binary representation from RNS form
Input: rem32, rem5, rem3
Output: Z;

1. X1 = rem32 * 225;
2. X2 = rem5 * 96;
3. X3 = rem3 * 160;
4. X4 = X1 + X2 + X3;
5. Z = X4 % 480;

The simulation results of the RNS to binary converter are shown in Figure 3 for the RNS input {7, 2, 1} and the moduli set {32, 5, 3}.

Figure 3: Simulation of an RNS to binary converter

III. ARCHITECTURE
The proposed architecture of the processor consists of an RNS based arithmetic unit (AU), including a modular adder, modular subtractor and modular multiplier, fed by an RNS generator; the output of the AU is provided to an RNS to binary converter. The architecture has two data buses, one providing the inputs and the other storing the outputs. A control unit consisting of a finite state machine controls the arithmetic unit. The processor supports the NIST recommended P256 prime curve. Two other controllers, for performing point addition and point doubling, act as sub-controllers and are controlled by the main controller.
The input from the data bus is first converted to RNS form and provided to the AU. Under the influence of the controllers, the AU performs the required arithmetic operations, and its output is fed to the RNS to binary converter, which drives the output data bus.

Figure 4: Proposed architecture of the Processor
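As a rough software illustration of the conversion steps on either side of the AU (a Python sketch, not the authors' Verilog), the forward conversion of Algorithm 1 and the reverse conversion of Algorithm 2 can be modelled as below. The function names are hypothetical, the `%` operator stands in for the ROM lookup used for the moduli 5 and 3, and the weights are derived from the moduli set rather than hard-coded so that they can be checked against the constants 225, 96 and 160. Python 3.8 or later is assumed for `math.prod` and the modular inverse form of `pow`.

```python
from math import gcd, prod

MODULI = (32, 5, 3)      # moduli set used in the paper
RANGE = prod(MODULI)     # dynamic range: 32 * 5 * 3 = 480


def check_coprime(moduli):
    """Return True when every pair of moduli has a GCD of 1."""
    return all(gcd(a, b) == 1
               for i, a in enumerate(moduli)
               for b in moduli[i + 1:])


def to_rns(x):
    """Forward conversion: binary integer -> (rem32, rem5, rem3)."""
    rem32 = x & 0b11111  # Algorithm 1: keep the five low-order bits
    rem5 = x % 5         # stands in for the precomputed ROM lookup
    rem3 = x % 3         # stands in for the precomputed ROM lookup
    return (rem32, rem5, rem3)


def crt_weights(moduli):
    """Weight of modulus m: (M / m) times the inverse of (M / m) modulo m."""
    total = prod(moduli)
    return tuple((total // m) * pow(total // m, -1, m) for m in moduli)


def from_rns(residues):
    """Reverse conversion following the steps of Algorithm 2."""
    weights = crt_weights(MODULI)
    return sum(r * w for r, w in zip(residues, weights)) % RANGE
```

For example, `to_rns(33)` gives `(1, 3, 0)` and `from_rns((7, 2, 1))` gives `7`, matching the values reported for Figures 2 and 3. The round trip is only exact for inputs below the dynamic range of 480.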
IV. ARITHMETIC UNIT
The arithmetic unit of the design performs the arithmetic operations required for computing the coordinates in a point addition or point doubling operation. The AU works under the influence of the control unit, which specifies the arithmetic operation required at a particular time. The AU consists of the following modules:
1. Modulo adder
2. Modulo subtractor
3. Modulo multiplier
A. Modulo adder
The modulo adder unit performs addition over two RNS numbers. It works on the simple algorithm described in Algorithm 3. RNS addition for the moduli set {32, 5, 3} requires one 5 bit, one 3 bit and one 2 bit adder. Reducing the result of each adder modulo the corresponding element of the moduli set gives the result of the addition in RNS form.

Algorithm 3: Modular addition
Inputs: X (remx32, remx5, remx3)
        Y (remy32, remy5, remy3)
Output: Z (remz32, remz5, remz3)

1. A = remx32 + remy32;
2. B = remx5 + remy5;
3. C = remx3 + remy3;
4. remz32 = A % 32; remz5 = B % 5; remz3 = C % 3;

The RTL and simulation results for the modulo adder are shown in Figure 5 and Figure 6 respectively.

Figure 5: RTL schematic of RNS based modulo adder

Figure 6: Simulation of RNS based modulo adder

The simulation in Figure 6 shows the addition of {6, 1, 0} and {7, 2, 1}, with the result {13, 3, 1} obtained at the output port.

B. Modulo subtractor
The modulo subtractor unit performs subtraction over two RNS numbers. Its algorithm is described in Algorithm 4.

Algorithm 4: Modular subtraction
Inputs: X (remx32, remx5, remx3)
        Y (remy32, remy5, remy3)
Output: Z (remz32, remz5, remz3)

1. A = remx32 - remy32;
2. B = remx5 - remy5;
3. C = remx3 - remy3;
4. remz32 = A % 32; remz5 = B % 5; remz3 = C % 3;

The RTL and simulation results for the modular subtractor of the AU are shown in Figure 7 and Figure 8 respectively.

Figure 7: RTL schematic of modular subtractor
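Algorithms 3 and 4 can be tried out in software with a short Python sketch such as the one below. It is only an illustration under stated assumptions: the function names are hypothetical, and the subtractor adds the modulus before reducing so that the intermediate value never goes negative, which is one common way to handle a borrow and is not necessarily what the authors' RTL does.

```python
MODULI = (32, 5, 3)  # moduli set used in the paper


def rns_add(x, y, moduli=MODULI):
    """Algorithm 3: add residue by residue, then reduce each channel."""
    return tuple((a + b) % m for a, b, m in zip(x, y, moduli))


def rns_sub(x, y, moduli=MODULI):
    """Algorithm 4: subtract residue by residue, then reduce each channel.

    Adding m first keeps the intermediate value non-negative (assumption:
    this mirrors how an unsigned hardware subtractor would be corrected).
    """
    return tuple((a - b + m) % m for a, b, m in zip(x, y, moduli))
```

With the operands of Figure 6, `rns_add((6, 1, 0), (7, 2, 1))` returns `(13, 3, 1)`. Each channel is computed independently of the others, which is the property that removes carry propagation between channels.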
Figure 8: Simulation of modulo subtractor

C. Modulo multiplier
The modulo multiplier unit performs multiplication over two RNS numbers. Its algorithm is described in Algorithm 5.

Algorithm 5: Modular multiplication
Inputs: X (remx32, remx5, remx3)
        Y (remy32, remy5, remy3)
Output: Z (remz32, remz5, remz3)

1. A = remx32 * remy32;
2. B = remx5 * remy5;
3. C = remx3 * remy3;
4. remz32 = A % 32; remz5 = B % 5; remz3 = C % 3;

The RTL and simulation results for the modular multiplier of the AU are shown in Figure 9 and Figure 10 respectively.

Figure 9: RTL of RNS based modular multiplier

Figure 10: Simulation of modulo multiplier

D. Arithmetic unit top module
As can be inferred from the simulations of the components of the AU, each arithmetic computation needs one clock cycle to compute its output. When the AU output drives the RNS to binary converter, care must therefore be taken to give the converter enough delay to sample the correct output. For this purpose we use a clock divider circuit between the AU and the RNS to binary converter, as shown in the RTL schematic in Figure 11.

Figure 11: RTL schematic of top module of AU

The two operands on which the arithmetic operation is to be performed are provided at the inp1 and inp2 terminals of the design. The design has one more input port named "operation", whose value determines the type of operation performed on the operands: 00 for modular addition, 01 for modular subtraction and 10 for modular multiplication. The output of the AU is fed to an RNS to binary converter, which produces the output of the design in binary form. The operation port of the design is driven by a main controller, which issues the particular sequence of arithmetic instructions needed to perform operations such as point addition and point multiplication.
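The behaviour of the AU top module can be mimicked with the Python sketch below. It assumes the operation encoding stated in the text (00 add, 01 subtract, 10 multiply) and the weights of Algorithm 2; the function names `arithmetic_unit`, `rns_to_binary` and `top_module` are hypothetical, and the one-cycle latency and clock divider are not modelled.

```python
MODULI = (32, 5, 3)       # moduli set used in the paper
WEIGHTS = (225, 96, 160)  # binary weights listed in Algorithm 2
RANGE = 480               # 32 * 5 * 3

# Encoding of the "operation" port as described in the text.
OP_ADD, OP_SUB, OP_MUL = 0b00, 0b01, 0b10


def arithmetic_unit(inp1, inp2, operation):
    """Apply the selected operation to two RNS operands, channel by channel."""
    if operation == OP_ADD:
        return tuple((a + b) % m for a, b, m in zip(inp1, inp2, MODULI))
    if operation == OP_SUB:
        # + m keeps the intermediate value non-negative before reduction
        return tuple((a - b + m) % m for a, b, m in zip(inp1, inp2, MODULI))
    if operation == OP_MUL:
        return tuple((a * b) % m for a, b, m in zip(inp1, inp2, MODULI))
    raise ValueError(f"unsupported operation code: {operation!r}")


def rns_to_binary(residues):
    """Reverse conversion following Algorithm 2."""
    return sum(r * w for r, w in zip(residues, WEIGHTS)) % RANGE


def top_module(inp1, inp2, operation):
    """AU followed by the RNS to binary converter."""
    return rns_to_binary(arithmetic_unit(inp1, inp2, operation))
```

For instance, with 13 = (13, 3, 1) and 7 = (7, 2, 1) in RNS form, `top_module((13, 3, 1), (7, 2, 1), OP_MUL)` returns `91`, which is still inside the dynamic range of 480. Products that exceed that range wrap around modulo 480.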
V. CONTROL UNIT
The control unit comprises a main controller and two sub-controllers for performing point addition and point doubling. The main controller is a finite state machine designed to fetch, read and execute the instructions that perform the desired operation. The procedures for point addition and point doubling are carried out by the two sub-controllers, working under the influence of the main controller.
A. Main controller

The basic function of the main controller is to fetch, read and execute instructions, using a memory resource to hold them. The instruction set given in Table 1 is used for controlling the functionality of the AU.

Instruction   Operation
INI           Initialize registers
CMP           Compare the given values with one another, based on the state, and set the not-equal flag
JMP           Jump to the given instruction address in case the not-equal flag is set
WPA           Perform point addition
WPD           Perform point doubling
FIN           Finish and produce output at the output data bus

Table 1. Instruction set

The FSM of the main controller consists of the following states: 1. INIT; 2. Instruction Read; 3. Load Coordinates; 4. Jump; 5. Wait; 6. Compare; and 7. Finish. Hence a three bit binary state assignment is used. The state diagram of the control unit and the RTL obtained after synthesis are presented in Figure 12 and Figure 13 respectively.

Figure 13. RTL schematic of the main controller

First, a start signal is given to the controller, which moves control from the default initial state (INIT) to the instruction read state. The function of this state is to decode the instruction and, depending on the instruction, send control to the corresponding state. The six instructions of the instruction set are given in a specified order to perform the desired operation. The first instruction should be INI, which sends control to the load coordinates state, in which the inputs from the input data bus are presented at the input ports of the AU. An input known as "finish loading inputs" goes high once all the inputs have been presented to the AU and brings control back to instruction read. The next instruction should be CMP, which compares the inputs for equality, since equal inputs can lead to an undefined form in the equations provided for point addition and point doubling. If the input coordinates are not equal, the not-equal flag is set, which indicates that the other operations can be applied to the provided input coordinates. The design is then ready to perform point addition and point doubling, which are carried out by the instructions WPA and WPD respectively. While the AU is computing the results for point addition or point doubling, a signal PA or PD is asserted, which brings the controller to the wait state. The controller remains in the wait state until PA or PD is set low. One JMP instruction is provided to move the instruction pointer to any desired location. Finally, the FIN instruction is used to place the computed results on the output data bus. After the computations, the controller goes back to the initial (INIT) state.
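The control flow described above can be traced with a small software model. The Python sketch below is a simplification under several assumptions that are not taken from the paper: the program is a list of (instruction, operand) pairs, only JMP uses its operand, the sub-controllers finish immediately so WAIT is visited once per WPA or WPD, and the names `State`, `DECODE` and `run` are hypothetical.

```python
from enum import Enum


class State(Enum):
    """The seven states listed for the main controller."""
    INIT = 0
    INSTRUCTION_READ = 1
    LOAD_COORDINATES = 2
    JUMP = 3
    WAIT = 4
    COMPARE = 5
    FINISH = 6


# State entered from INSTRUCTION_READ for each instruction of Table 1.
DECODE = {
    "INI": State.LOAD_COORDINATES,
    "CMP": State.COMPARE,
    "JMP": State.JUMP,
    "WPA": State.WAIT,
    "WPD": State.WAIT,
    "FIN": State.FINISH,
}


def run(program, p, q, max_steps=100):
    """Return the list of states visited while executing `program`.

    `p` and `q` are the two input points compared by CMP.
    """
    trace = [State.INIT]   # the start signal leaves INIT
    pc = 0                 # instruction pointer
    not_equal = False      # flag written by CMP, read by JMP
    for _ in range(max_steps):
        trace.append(State.INSTRUCTION_READ)
        instruction, operand = program[pc]
        state = DECODE[instruction]
        trace.append(state)
        pc += 1
        if state is State.COMPARE:
            not_equal = p != q
        elif state is State.JUMP and not_equal:
            pc = operand   # jump only when the not-equal flag is set
        elif state is State.FINISH:
            trace.append(State.INIT)
            return trace
    raise RuntimeError("program did not reach FIN")
```

As a usage example, the program `[("INI", None), ("CMP", None), ("JMP", 4), ("WPD", None), ("WPA", None), ("FIN", None)]` skips WPD and goes straight to WPA when the two input points differ. This ordering is an illustrative choice, not the instruction sequence used by the authors.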
Figure 12. State diagram of the controller

B. Point adder/doubler controller
These are sub-controllers working under the influence of the main controller. They determine the flow of instructions given to the AU for performing either point addition or point doubling on the input coordinates. This unit uses the modular adder, modular multiplier and modular subtractor to obtain the resulting coordinates after point addition or point doubling. The flow of instructions for performing point addition is shown in Figure 14, and the flow for point doubling is shown in Figure 15.
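To make the arithmetic that the sub-controllers sequence more concrete, the Python sketch below evaluates the standard affine point addition and point doubling formulas for a curve of the form y^2 = x^3 + ax + b over a prime field. Everything specific in it is an assumption for illustration: the toy curve y^2 = x^3 + 2x + 2 over GF(17) replaces NIST P256, ordinary modular arithmetic replaces the RNS data path, and the function names are hypothetical. Python 3.8 or later is assumed for the modular inverse form of `pow`.

```python
P_MOD = 17   # toy prime field (assumption, not the P256 prime)
A, B = 2, 2  # toy curve coefficients: y^2 = x^3 + 2x + 2


def on_curve(point):
    """Check that an affine point satisfies the curve equation."""
    x, y = point
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def point_add(p, q):
    """R = P + Q for two points with different x coordinates."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("point_add needs x1 != x2; use point_double for P == Q")
    # slope of the line through P and Q (the division is a modular inverse)
    lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def point_double(p):
    """R = 2P for a point with a non-zero y coordinate."""
    x1, y1 = p
    if y1 == 0:
        raise ValueError("tangent is vertical when y1 == 0")
    # slope of the tangent at P
    lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    x4 = (lam * lam - 2 * x1) % P_MOD
    y4 = (lam * (x1 - x4) - y1) % P_MOD
    return (x4, y4)
```

On this toy curve, doubling G = (5, 1) gives (6, 3), and adding G to that result gives (10, 6). The two error cases correspond to the undefined forms that the CMP instruction is meant to guard against.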
Figure 14. Flow of instructions for performing point addition

Figure 15. Flow of instructions for performing point doubling

VI. IMPLEMENTATION RESULTS

The design is implemented on a Xilinx Spartan3E (xc3s250e-4tq144) FPGA. The most recent journal paper in the literature in which an RSD based ECC processor is designed [1] reports a maximum operating frequency of 160 MHz. Synthesis of the arithmetic unit designed in this work with the Xilinx ISE design tool gave a maximum operating frequency of 197.044 MHz. A comparison of the design summary of each sub-unit of our RNS based AU with an RSD based AU [1] is provided in Table 2.

                          Modular adder    Modular subtractor    Modular multiplier
                          RNS     RSD      RNS     RSD           RNS     RSD
Delay (ns)                4.014   4.79     4.014   4.79          1.897   6.24
Maximum frequency (MHz)   249     208      249     208           527     160

Table 2. Comparison of RNS and RSD based AU

Table 2 shows that the RNS based units achieve a considerably lower delay than the RSD based units, which follows from the reduced hardware requirements of RNS. Large operations such as multiplication in particular are very costly with RSD and can be implemented easily with RNS. The main reason for such a large difference in the delays is that throughout our work we have used the moduli set {32, 5, 3}, which gives a range of 480. Hence the design is functionally correct only for inputs of at most nine bits (values below 480), whereas the RSD design presented in [1] is designed for a larger range of inputs. A larger range can also be achieved in RNS, and if a proper moduli set is chosen, RNS can provide better results.
The overall design requirements of the AU proposed in this work are summarized in Table 3.

Component                     Utilization
Number of slices              244
Number of slice flip flops    66
Number of 4 input LUTs        452
Number of bonded IOBs         40
Number of BRAMs               4
Number of Mult18X18SIOs       4
Number of Global clocks       1

Table 3. Design summary of AU

VII. CONCLUSION

Elliptic curve cryptography and the residue number system each serve as a very effective technique for reducing the hardware requirements, and thus the delay, of a design. In this work we combined the strengths of both techniques to build a reliable security processor that uses very little hardware for encrypting a point. The proposed design is capable of encrypting inputs with coordinates of at most nine bits. The range of the input can be increased by increasing the values in the moduli set. All the computations in the design are carried out for the NIST P256 elliptic curve, and the implementation showed the shortest data path, with a maximum frequency of 197 MHz.

VIII. REFERENCES

[1] H. Marzouqi, M. Al-Qutayri, K. Salah, D. Schinianakis and T. Stouraitis, "A High-Speed FPGA Implementation of an RSD-Based ECC Processor," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 24, no. 1, pp. 151-164, Jan. 2016.
[2] N. Koblitz, "Elliptic Curve Cryptosystems," Mathematics of Computation, vol. 48, no. 177, pp. 203-209, Jan. 1987.
[3] Y. Wang and R. Li, "A Unified Architecture for Supporting Operations of AES and ECC," 2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming, Tianjin, 2011, pp. 185-189.
[4] S. Chung, J. Lee, H. Chang and C. Lee, "A high-performance elliptic curve cryptographic processor over GF(p) with SPA resistance," 2012 IEEE International Symposium on Circuits and Systems, Seoul, 2012, pp. 1456-1459.
[5] D. Karakoyunlu, F. K. Gurkaynak, B. Sunar and Y. Leblebici, "Efficient and side-channel-aware implementations of elliptic curve cryptosystems over prime fields," in IET Information Security, vol. 4, no. 1, pp. 30-43, March 2010.
[6] P. Barrade, "Series connection of supercapacitors: comparative study of solutions for the active equalization of the voltages," Proc. Int. Conf. on Modeling and Simulation of Electric Machines, Converters and Systems, 2002.
[7] J. Vliegen et al., "A compact FPGA-based architecture for elliptic curve cryptography over prime fields," ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors, Rennes, 2010, pp. 313-316.
[8] S. Wei and C. Jiang, "Residue Signed-Digit Arithmetic and the Conversions between Residue and Binary Numbers for a Four-Moduli Set," 2012 11th International Symposium on Distributed Computing and Applications to Business, Engineering & Science, Guilin, 2012, pp. 436-440.
[9] H. M. Yassine, "Fast arithmetic based on residue number system architectures," 1991 IEEE International Symposium on Circuits and Systems, Singapore, 1991, pp. 2947-2950, vol. 5.
[10] M. E. Kaihara and N. Takagi, "A hardware algorithm for modular multiplication/division," in IEEE Transactions on Computers, vol. 54, no. 1, pp. 12-21, Jan. 2005.
