
Using Matlab to Aid the Implementation of a Fast RSA Cryptocore

2008 The MathWorks, Inc.

Carsten Siggaard, Senior Consultant, Danish Technological Institute (DTI)

Danish Technological Institute (DTI)

Knowledge Development

Knowledge Transfer

Knowledge Application

Why Implement RSA on a Field Programmable Gate Array?


FPGAs are inherently parallel, so they can outperform general-purpose processors even though they run at a much lower clock speed. Consider a system that uses RSA encryption: if the encryption is placed on a separate FPGA, the CPU on this platform is free to perform other tasks. RSA is a difficult algorithm to implement on FPGAs, much more difficult than the Advanced Encryption Standard (Rijndael, AES) or Blowfish. Therefore, if you can implement RSA, you can implement virtually any encryption standard. The core calculations in RSA are the same as those performed in other cryptographic schemes, such as Diffie-Hellman key exchange and ElGamal.

Major Results
The core can perform 35,000 cryptographic operations per second.
50% usage (on Xilinx XC4SX35); 1024-bit message, 1024-bit modulus, 5-bit public exponent.

At 200 MHz, 50,000 operations per second can be performed. Compare with an AMD Opteron at 2.8 GHz: 26,000 ops/s (max. 90 W).
Power consumption: 1 W (Xilinx power estimator using simulated data).
Theoretical maximum: 3,150,000 ops/s (Altera Stratix IV E with 1360 16-bit multipliers).
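The throughput figures above can be related with a little arithmetic. The Python sketch below assumes that one operation always costs the same number of clock cycles, whatever the clock frequency. Under that assumption, 50,000 ops/s at 200 MHz means about 4,000 cycles per operation. The quoted 35,000 ops/s would then correspond to a clock of roughly 140 MHz, which is an inference and not a figure stated on the slide.

```python
# Back-of-the-envelope check of the quoted throughput figures.
# Assumption: one RSA operation takes a fixed number of clock cycles,
# independent of the clock frequency.

CLOCK_HZ = 200e6             # clock of the 200 MHz build
FPGA_OPS_PER_S = 50_000      # operations per second at 200 MHz
OPTERON_OPS_PER_S = 26_000   # AMD Opteron 2.8 GHz figure


def cycles_per_operation(clock_hz: float, ops_per_s: float) -> float:
    """Clock cycles spent on one cryptographic operation."""
    return clock_hz / ops_per_s


def clock_for_throughput(cycles: float, ops_per_s: float) -> float:
    """Clock frequency (Hz) needed to reach a given throughput."""
    return cycles * ops_per_s


cycles = cycles_per_operation(CLOCK_HZ, FPGA_OPS_PER_S)
print(f"cycles per operation: {cycles:.0f}")  # 4000
print(f"clock for 35,000 ops/s: {clock_for_throughput(cycles, 35_000) / 1e6:.0f} MHz")  # 140
print(f"speed-up over Opteron: {FPGA_OPS_PER_S / OPTERON_OPS_PER_S:.2f}x")  # 1.92
```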

Used Toolboxes and Blocksets


Matlab
Fixed Point Toolbox: modelling large integers.

Simulink
Fixed Point Blockset: modelling (large) integers.
Stateflow: implementing the controller.
hdlCoder: generating generic VHDL code.
Xilinx Sysgen: hardware in the loop (HIL).

Development Issues
In cryptography, numbers are usually either bit fields or integers modulo n, so use a toolbox such as Fixed Point Toolbox to model them. Then:
1. Model the algorithm in Matlab.
2. Model the algorithm in Simulink/Stateflow, and compare its results with those of the Matlab model.
3. Generate the code and run it.
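The "model twice and compare" flow can be sketched as a small differential test. This is an illustration in Python, not the author's Matlab/Simulink setup. Python's built-in three-argument `pow()` plays the reference model. The hypothetical `model_under_test` stands in for the second model. It raises to the full power first and reduces once at the end, which is only practical because the public exponent is small (5 bits).

```python
# Sketch of a differential test: random stimuli are fed to a reference model
# and to a model under test, and mismatches are counted.
# Assumptions: pow() is the reference; model_under_test is a hypothetical
# stand-in for the second model; exponents are limited to 5 bits.
import random


def reference_model(x: int, e: int, n: int) -> int:
    return pow(x, e, n)


def model_under_test(x: int, e: int, n: int) -> int:
    # Full-precision power followed by a single reduction; the intermediate
    # value grows to roughly e * x.bit_length() bits.
    return (x ** e) % n


def compare_models(num_vectors: int, bits: int, seed: int = 1) -> int:
    """Return the number of mismatching results over random stimuli."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(num_vectors):
        n = rng.getrandbits(bits) | (1 << (bits - 1)) | 1  # odd, full-width modulus
        x = rng.randrange(n)
        e = rng.randrange(3, 32)  # at most a 5-bit exponent
        if reference_model(x, e, n) != model_under_test(x, e, n):
            mismatches += 1
    return mismatches


print(compare_models(num_vectors=100, bits=1024))  # 0
```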

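RSA Key Exchange (RFC 4432)

Bob's public key: (b, p); Bob holds a, b and p.
The sender generates random bytes K and puts K into the message m.
The sender encrypts: $c = m^b \bmod p$
Bob decrypts: $m = c^a \bmod p$, recovering K.
Signed exchange hash.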
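A toy walk-through of this key-transport step, using textbook RSA in Python. Several assumptions apply:
- The parameters are tiny and insecure.
- The slide's p is used as the RSA modulus, which in RSA is a product of two primes.
- No padding is applied, whereas RFC 4432 wraps K with RSAES-OAEP before encryption.
- The value 65 simply stands in for K packed into m.

```python
# Textbook RSA with tiny parameters, following the slide's notation:
# public key (b, p), private exponent a, c = m^b mod p, m = c^a mod p.
# Assumptions: insecure toy primes, no OAEP padding, Python 3.8+ for pow(x, -1, n).

PRIME_1, PRIME_2 = 61, 53
p = PRIME_1 * PRIME_2                 # RSA modulus (3233)
b = 17                                # Bob's public exponent
phi = (PRIME_1 - 1) * (PRIME_2 - 1)   # Euler's totient of p (3120)
a = pow(b, -1, phi)                   # Bob's private exponent (2753)


def encrypt(m: int) -> int:
    """Sender: c = m^b mod p, using Bob's public key (b, p)."""
    return pow(m, b, p)


def decrypt(c: int) -> int:
    """Bob: m = c^a mod p, using the private exponent a."""
    return pow(c, a, p)


m = 65                 # stand-in for the random bytes K packed into m
c = encrypt(m)
print(c, decrypt(c))   # 2790 65
```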

What is the engine in RSA, Diffie-Hellman and ElGamal?

$X^n \bmod m$
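The engine $X^n \bmod m$ is commonly computed with binary (left-to-right) square-and-multiply exponentiation. The Python sketch below uses the slide's names X, n and m in lower case. Python's `%` operator stands in for whatever reduction a hardware design would use.

```python
def mod_exp(x: int, n: int, m: int) -> int:
    """Compute x^n mod m by left-to-right square-and-multiply.

    For each exponent bit, from the most significant down, the running
    result is squared, and multiplied by x when the bit is 1. Every
    intermediate product is reduced mod m, so nothing grows beyond
    roughly twice the width of m.
    """
    if m == 1:
        return 0
    result = 1
    for bit in bin(n)[2:]:                # exponent bits, MSB first
        result = (result * result) % m    # square
        if bit == "1":
            result = (result * x) % m     # multiply
    return result


assert mod_exp(4, 13, 497) == pow(4, 13, 497)  # 445
```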

The Usual Approach


To calculate the exponentiation modulo m, repeatedly:
1. Compute X*X (square and multiply).
2. Reduce modulo m by trial division or Barrett's algorithm.

For small numbers this can be done efficiently; for large numbers it becomes considerably harder.
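Barrett's algorithm replaces the division in step 2 with a multiplication by a precomputed constant, followed by a shift. The Python sketch below is one common single-shift variant:
- k is the bit length of m, and the constant is $\mu = \lfloor 2^{2k} / m \rfloor$.
- The input must satisfy $0 \le x < 2^{2k}$, which covers the product of two residues.
- The quotient estimate is at most one too small, so at most one correction subtraction is needed.

The helper names are hypothetical.

```python
from typing import Tuple


def barrett_setup(m: int) -> Tuple[int, int]:
    """Precompute (k, mu) once per modulus: k = bit length, mu = floor(4^k / m)."""
    k = m.bit_length()
    mu = (1 << (2 * k)) // m
    return k, mu


def barrett_reduce(x: int, m: int, k: int, mu: int) -> int:
    """Return x mod m for 0 <= x < 2^(2k) without dividing by m."""
    assert 0 <= x < (1 << (2 * k)), "input outside the Barrett range"
    q = (x * mu) >> (2 * k)   # estimate of floor(x / m), low by at most 1
    r = x - q * m
    while r >= m:             # correction step (runs at most once here)
        r -= m
    return r


m = 3233
k, mu = barrett_setup(m)
x = 3000 * 3100               # product of two residues mod m
assert barrett_reduce(x, m, k, mu) == x % m
```

Because $\mu$ depends only on the modulus, it is computed once per key, and each reduction then costs two multiplications, a shift and a subtraction.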

The Montgomery Algorithm


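Inputs: $x$, $y$, $n'$, $r$, $n$

$t = x \cdot y$
$m = (t \cdot n') \bmod r$
$m_2 = m \cdot n$
$y_1 = t + m_2$
$y_2 = y_1 / r$

For $x = a r$ and $y = b r$, this calculates $(a r) \cdot (b r) \cdot r^{-1} \bmod n$.

Result is $(a b r) \bmod n$.
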
Be aware of timing/power attacks!
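The datapath above can be sketched in Python using the diagram's names t, m, m2, y1 and y2. The slide lists n' without defining it, so this sketch assumes the standard choices:
- $n' = -n^{-1} \bmod r$ and $r = 2^k > n$, with n odd.
- The usual final conditional subtraction, which brings $y_2$ from $[0, 2n)$ into $[0, n)$.

That subtraction depends on the data, and it is a known source of the timing leakage warned about above.

```python
def montgomery_setup(n: int, k: int):
    """Return (r, n') with r = 2^k and n' = -n^-1 mod r (n odd, n < r; Python 3.8+)."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def mont_mul(x: int, y: int, n: int, n_prime: int, k: int) -> int:
    """Montgomery product x * y * r^-1 mod n for 0 <= x, y < n, r = 2^k."""
    r_mask = (1 << k) - 1
    t = x * y                                  # t  = x * y
    m = ((t & r_mask) * n_prime) & r_mask      # m  = (t * n') mod r
    m2 = m * n                                 # m2 = m * n
    y1 = t + m2                                # y1 is divisible by r
    y2 = y1 >> k                               # y2 = y1 / r, lies in [0, 2n)
    return y2 - n if y2 >= n else y2           # data-dependent final subtraction


n, k = 3233, 12                  # toy odd modulus, r = 4096 > n
r, n_prime = montgomery_setup(n, k)
a, b = 65, 2790
a_r, b_r = a * r % n, b * r % n  # Montgomery forms a*r and b*r
abr = mont_mul(a_r, b_r, n, n_prime, k)
assert abr == a * b * r % n                          # (a r)(b r) r^-1 = a b r mod n
assert mont_mul(abr, 1, n, n_prime, k) == a * b % n  # leave Montgomery form
```

Division by r is just a right shift and "mod r" is a bit mask, which is why Montgomery multiplication avoids trial division entirely.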


Matlab Development
Matlab's built-in GCD is based on floating point (double), so a GCD that uses the fi type must be created.

$R^2 \bmod n$ must be calculated: create a function which uses the fi type.

A helper function generates stimulus structures for Simulink. The Montgomery algorithm was developed in Matlab so that its results could be compared with the results from Simulink.
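An integer-only extended Euclidean algorithm is enough to derive both Montgomery constants. This is a Python analogue of the fi-based helpers described above: Python integers never fall back to double precision. The helper names are hypothetical, and the sketch assumes $R = 2^k > n$ with n odd and $n' = -n^{-1} \bmod R$.

```python
def extended_gcd(a: int, b: int):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b, using integers only."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def montgomery_constants(n: int, k: int):
    """Return (n', R^2 mod n) for an odd modulus n and R = 2^k > n."""
    R = 1 << k
    g, s, _ = extended_gcd(n, R)   # s*n + t*R = 1, hence s = n^-1 mod R
    if g != 1:
        raise ValueError("n must be odd (coprime to R)")
    n_prime = (-s) % R             # n' = -n^-1 mod R
    r2 = (R * R) % n               # R^2 mod n
    return n_prime, r2


n_prime, r2 = montgomery_constants(3233, 12)
assert (n_prime * 3233 + 1) % 4096 == 0
```

$R^2 \bmod n$ is what maps an operand into Montgomery form: a Montgomery product of a and $R^2 \bmod n$ gives $a R^2 R^{-1} = a R \bmod n$.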


Important topics for the NumericType and fimath objects!


Be aware of the rounding and overflow modes; they are intended for signal processing.
Be aware of how the numbers expand during the calculation, because the precision affects both correctness and performance.
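How much the numbers expand can be worked out in advance. The Python sketch below assumes unsigned operands. It first confirms that the product of two w-bit operands needs 2w bits. It then shows that the Montgomery intermediate $y_1 = t + m \cdot n$ needs one bit more, for $n < r = 2^k$, $x, y < n$ and $m < r$. A word length one bit too short would silently corrupt the result.

```python
def product_bits(w: int) -> int:
    """Bits needed for the product of two unsigned w-bit operands."""
    max_operand = (1 << w) - 1
    return (max_operand * max_operand).bit_length()


def montgomery_sum_bits(k: int) -> int:
    """Bits needed for y1 = x*y + m*n in the worst case allowed by
    n < r = 2^k, x, y < n and m < r."""
    n_max = (1 << k) - 1
    r = 1 << k
    worst = (n_max - 1) * (n_max - 1) + (r - 1) * n_max
    return worst.bit_length()


for w in (16, 1024):
    print(w, product_bits(w), montgomery_sum_bits(w))
```

This prints `16 32 33` and `1024 2048 2049`: for a 1024-bit modulus the intermediate sum needs 2049 bits, not 2048.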


The Engine: Schoolbook Multiplication
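Schoolbook multiplication splits each large operand into digits and multiplies every digit of one operand by every digit of the other, propagating carries. The Python sketch below uses 16-bit digits (limbs) to echo the 16-bit hardware multipliers mentioned earlier. The slide does not state the digit size, so that choice and all helper names are assumptions. Each accumulation $r_{i+j} + a_i b_j + \text{carry}$ stays within 32 bits, since $(B-1) + (B-1)^2 + (B-1) = B^2 - 1$ for $B = 2^{16}$.

```python
LIMB_BITS = 16                    # assumed digit size (16-bit multipliers)
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(x: int, num_limbs: int) -> list:
    """Split x into num_limbs little-endian 16-bit digits."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(num_limbs)]


def from_limbs(limbs: list) -> int:
    """Reassemble an integer from little-endian 16-bit digits."""
    return sum(d << (LIMB_BITS * i) for i, d in enumerate(limbs))


def schoolbook_mul(a: list, b: list) -> list:
    """Multiply two limb vectors row by row (one row per digit of a).

    Each step forms a 16x16-bit partial product, adds it to the running
    column value and the carry, keeps the low 16 bits and passes the
    high bits on as the next carry.
    """
    result = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            acc = result[i + j] + ai * bj + carry   # always < 2^32
            result[i + j] = acc & LIMB_MASK
            carry = acc >> LIMB_BITS
        result[i + len(b)] = carry                  # top digit of this row
    return result


x, y = 0x123456789ABC, 0xFEDCBA987654   # two 48-bit operands = 3 limbs each
product = from_limbs(schoolbook_mul(to_limbs(x, 3), to_limbs(y, 3)))
assert product == x * y
```

A 1024-bit operand is 64 such digits, so a full product needs 64 × 64 = 4096 digit multiplications, which hardware can spread across many multipliers in parallel.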


HW in the LOOP

JTAG


Perspectives
The title is "Using Matlab to Aid the Implementation of a Fast RSA Cryptocore". The title should have been "Using Matlab to Do the Implementation of a Fast RSA Cryptocore".

An advanced encryption algorithm can be implemented using Matlab/Simulink. For commercial SSL offload engines, certification is a must. The core can be provided as an off-the-shelf service.


Conclusion
Correct use of Simulink with the hdlCoder results in a FAST and efficient core. Simulink runs faster than a comparable VHDL simulation,
so more tests can be performed in the same time. Using a faster, model-based approach makes programming more efficient.

The result is virtually generic: you do not need to spend time digging into subtle VHDL constructs. You must, however, know how Simulink blocks map to HDL blocks, and the result will also depend on your synthesis tool!

Questions?
http://www.teknologisk.dk Teknologisk Institut, Denmark
Taastrup, Aarhus, Kolding, Herning, Odense, Hirtshals

http://www.teknologiskinstitut.se Teknologisk Institut AB (formerly SIFU)
FIRMA 2000, Poland
Swedcert AB
carsten.siggaard@teknologisk.dk

