Day1 Siggaard

Using Matlab to Aid the Implementation of a Fast RSA Cryptocore
2008 The MathWorks, Inc.
Carsten Siggaard, Senior Consultant Danish Technological Institute (DTI)
Danish Technological Institute (DTI)
Knowledge Development
Knowledge Transfer
Knowledge Application
Why Implement RSA on a Field Programmable Gate Array?

FPGAs are inherently parallel, which can make them faster than general-purpose processors even at a much lower clock speed.
Consider a system using RSA encryption: if the encryption is placed on a separate FPGA, the CPU on the platform is free to perform other tasks.
RSA is difficult to implement on FPGAs, much more difficult than the Advanced Encryption Standard (Rijndael, AES) or Blowfish. Therefore, if you can implement RSA, virtually any encryption standard can be implemented.
The core calculations in RSA are the same as those performed in other cryptographic schemes such as Diffie-Hellman key exchange and El-Gamal.
Major Results
The core can perform 35,000 cryptographic operations per second
50% usage (on Xilinx XC4SX35)
1024-bit message, 1024-bit modulus, 5-bit public exponent
At 200 MHz, 50,000 operations per second can be performed
Compare with AMD Opteron 2.8 GHz: 26,000 ops/s (max 90 W)
Power consumption: 1 W (Xilinx power estimator using simulated data)
Theoretical maximum: 3,150,000 ops/s (Altera Stratix IV E with 1360 16-bit multipliers)
Used Toolboxes and Blocksets

Matlab
Fixed Point Toolbox: modelling large integers.
Simulink
Fixed Point Blockset: modelling (large) integers.
Stateflow: used to implement the controller.
hdlCoder: generating generic VHDL code.
Xilinx Sysgen: for HIL (hardware in the loop).
Development Issues
In cryptography all numbers are usually either bit fields or integers modulo n. Therefore use a toolbox like Fixed Point Toolbox to model these numbers.
1. Model the algorithm in Matlab.
2. Model the algorithm in Simulink/Stateflow, and compare the results with those from the Matlab model.
3. Generate the code and run it.
RSA Key Exchange (RFC4432)

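Bob holds a, b, p and sends his public key (b, p).
The sender generates random bytes K and puts K into the message m.
The sender computes $c = m^b \bmod p$ and sends c.
Bob recovers $m = c^a \bmod p$.
Signed exchange hash.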
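The exchange can be traced with toy numbers. The following is a minimal Python sketch using the slide's symbols (a is Bob's private exponent, b the public exponent, p the modulus). The primes 61 and 53, the exponent 17 and the value of K are illustrative assumptions, far too small for real use, and the padding required by RFC 4432 is left out.

```python
# Toy RSA key transport using the slide's symbols.
# Assumptions: tiny primes, no padding, K already encoded as an integer m < p.
q1, q2 = 61, 53                        # hypothetical secret primes (illustration only)
p = q1 * q2                            # modulus, called p on the slide
b = 17                                 # Bob's public exponent
a = pow(b, -1, (q1 - 1) * (q2 - 1))    # Bob's private exponent (needs Python 3.8+)

K = 1234                               # "random bytes K", fixed here for repeatability
m = K                                  # put K into message m
c = pow(m, b, p)                       # sender: c = m^b mod p
m_recovered = pow(c, a, p)             # Bob: m = c^a mod p
```

Both sides end up with the same K, and both directions are one modular exponentiation each.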
What is the engine in RSA, Diffie-Hellman and El-Gamal
$X^n \bmod m$
Discrete logarithm modulo m is DIFFICULT
The Usual approach

To calculate an exponentiation modulo m, repeatedly do:
1. X*X (square and multiply)
2. Reduce modulo m by trial division or Barrett's algorithm
For small numbers this can be done efficiently.
For large numbers this becomes difficult.
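The two steps above can be sketched in Python as a left-to-right square and multiply loop. The function name `modexp` is an assumption made for illustration, and the `%` operator stands in for the reduction step that is costly in hardware.

```python
def modexp(x: int, n: int, m: int) -> int:
    """Return x**n mod m by left-to-right square and multiply."""
    result = 1 % m
    for bit in bin(n)[2:]:                # exponent bits, most significant first
        result = (result * result) % m    # step 1: square, step 2: reduce
        if bit == "1":
            result = (result * x) % m     # multiply by the base, then reduce
    return result
```

Every exponent bit costs one squaring plus one reduction, and every set bit costs one more multiplication and reduction, so the cost of the reduction dominates for 1024-bit operands.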
The Montgomery Algorithm

Calculates $(a \cdot r)(b \cdot r)\, r^{-1} \bmod n$
Result is $(a \cdot b \cdot r) \bmod n$
[Block diagram: inputs x, y, n', r, n; intermediate signals t, m, m2, y1, y2; operations *, +, /]
Be aware of timing/power attacks!
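A minimal Python sketch of a Montgomery multiplication follows. It reuses the signal names from the slide (x, y, n', r, n, t, m, m2, y1, y2), but the mapping of those names onto the textbook Montgomery reduction is an assumption, as are the function names and the choice r = 2^bits.

```python
def montgomery_setup(n: int, bits: int):
    """Return (r, n_prime) with r = 2**bits and n * n_prime == -1 (mod r)."""
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r     # requires odd n (Python 3.8+)
    return r, n_prime

def montgomery_multiply(x: int, y: int, n: int, n_prime: int, bits: int) -> int:
    """Return x * y * r^-1 mod n for 0 <= x, y < n and r = 2**bits."""
    mask = (1 << bits) - 1
    t = x * y
    m = (t * n_prime) & mask           # m = t * n' mod r
    m2 = m * n
    y1 = (t + m2) >> bits              # t + m2 is a multiple of r, so this is exact
    y2 = y1 - n if y1 >= n else y1     # conditional subtraction: data-dependent timing
    return y2

# Usage with toy values (assumed for illustration).
n, bits = 97, 8
r, n_prime = montgomery_setup(n, bits)
a, b = 5, 7
ar, br = (a * r) % n, (b * r) % n      # operands in Montgomery form
abr = montgomery_multiply(ar, br, n, n_prime, bits)
ab = montgomery_multiply(abr, 1, n, n_prime, bits)   # leave Montgomery form
```

The division by r is a shift and the reduction modulo r is a mask, so no trial division is needed. The final conditional subtraction is one place where execution time can depend on the data, which is the kind of leak the warning above refers to.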
Matlab Development
Matlab's built-in GCD is based upon floats (double).
A GCD must be created which uses the fi type.
$R^2 \bmod n$ must be calculated.
Create a function which uses the fi type.
A helper function generates stimuli structures for Simulink.
The Montgomery algorithm was developed in Matlab so that its results could be compared with the results from Simulink.
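The integer-only helpers described above can be sketched in Python, whose integers are arbitrary precision and therefore avoid the double-precision limit. The function names are assumptions made for illustration, and this is not the author's fi-based Matlab code.

```python
def extended_gcd(a: int, b: int):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g, integers only."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def montgomery_constants(n: int, bits: int):
    """Return (n_prime, r2) for R = 2**bits: n*n_prime == -1 (mod R), r2 = R^2 mod n."""
    R = 1 << bits
    g, s, _ = extended_gcd(n, R)       # s*n + t*R == g
    if g != 1:
        raise ValueError("n must be odd")
    n_prime = (-s) % R                 # negated inverse of n modulo R
    r2 = (R * R) % n                   # used to convert operands into Montgomery form
    return n_prime, r2
```

A Montgomery multiplication of a by R^2 mod n yields a*R mod n, which is why this constant is computed once per modulus.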
Important topics for the NumericType and fimath objects!

Be aware of the round and overflow modes; they are intended for signal processing.
Be aware of how the numbers expand during the calculation, because:
Precision has an impact on correctness.
Precision has an impact on performance.
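The effect of word growth and of the overflow mode can be illustrated with a small Python sketch. The helper `quantize` and the 16-bit word length are assumptions made for illustration; the sketch mimics wrap and saturate behaviour on unsigned integers and does not use the Fixed Point Toolbox API.

```python
def quantize(value: int, word_length: int, overflow: str) -> int:
    """Fit an unsigned integer into word_length bits by wrapping or saturating."""
    limit = 1 << word_length
    if overflow == "wrap":
        return value % limit
    if overflow == "saturate":
        return min(max(value, 0), limit - 1)
    raise ValueError("overflow must be 'wrap' or 'saturate'")

W = 16
x = (1 << W) - 1                         # largest W-bit value
product = x * x
product_bits = product.bit_length()      # a product needs up to 2*W bits
sum_bits = (x + x).bit_length()          # a sum needs up to W + 1 bits
wrapped = quantize(product, W, "wrap")
saturated = quantize(product, W, "saturate")
```

Saturation is a sensible choice for signals but gives a wrong result in modular arithmetic, while too short a word length silently drops bits. Too long a word length costs multipliers and speed.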
The Engine Schoolbook multiplication
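Schoolbook multiplication on word-sized limbs can be sketched in Python as follows. The 16-bit word size is an assumption, chosen to match the 16-bit multipliers mentioned in the results, and the function names are hypothetical.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1

def to_words(value: int, count: int) -> list:
    """Split an integer into count little-endian 16-bit words."""
    return [(value >> (WORD_BITS * i)) & WORD_MASK for i in range(count)]

def from_words(words) -> int:
    """Reassemble an integer from little-endian 16-bit words."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))

def schoolbook_multiply(a, b):
    """Multiply two little-endian word lists; the result has len(a) + len(b) words."""
    result = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            acc = result[i + j] + ai * bj + carry   # one 16x16-bit multiply
            result[i + j] = acc & WORD_MASK
            carry = acc >> WORD_BITS
        result[i + len(b)] = carry
    return result
```

A 1024-bit operand consists of 64 such words, so one full product takes 64 * 64 = 4096 word multiplications. These are independent of each other apart from the carries, which is what makes the method attractive on a device with many parallel multipliers.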
HW in the LOOP
JTAG
Perspectives
The title is "Using Matlab to Aid the Implementation of a Fast RSA Cryptocore".
The title should have been "Using Matlab to Do the Implementation of a Fast RSA Cryptocore".
An advanced encryption algorithm can be implemented using Matlab/Simulink.
For commercial SSL offload engines, certification is a must.
The core can be implemented as an off-the-shelf service.
Conclusion
Correct use of Simulink with the hdlCoder results in a fast and efficient core.
Simulink runs faster than a comparable VHDL simulation:
More tests can be performed in the same time.
Using a faster model-based approach makes programming more efficient.
The result is virtually generic.
You do not need to spend time digging into subtle VHDL constructs.
You must have knowledge of the mapping from Simulink blocks into HDL blocks, and the result will also depend on your synthesis tool!
Questions ?
http://www.teknologisk.dk Teknologisk Institut Denmark
Taastrup Aarhus Kolding, Herning, Odense, Hirtshals
http://www.teknologiskinstitut.se Teknologisk Institut AB, Formerly SIFU FIRMA 2000 Poland Swedcert AB carsten.siggaard@teknologisk.dk
