, Steve Liu
$\left(\frac{1}{2} + \frac{\sum_{i=1}^{5}\left(p_i - 1/2\right)}{5}\right)^{M}$. Since
the transformed LCGs have $N$ columns, the probability of the output of the NLCG being the same for two iterations $j$ and $k$ is $\left(\frac{1}{2}\right)^{MN}$. With large $M$ and $N$, this probability can be made arbitrarily small. This means that the composite sequence (of length $MN$) is maximal. For our implementation with $M = 32$ and $N = 4096$, this probability is $\left(\frac{1}{2}\right)^{32 \times 4096} \approx 0$.
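For scale, the probability quoted for $M = 32$ and $N = 4096$ can be evaluated directly. Taking it to be $2^{-32 \times 4096}$, as the numbers in the text indicate, the arithmetic is:

$$\left(\tfrac{1}{2}\right)^{32 \times 4096} = 2^{-131072} = 10^{-131072 \log_{10} 2} \approx 10^{-39456.6} \approx 2.5 \times 10^{-39457}$$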
Figure 1: 4096-bit NLCG with Encoded Majority Function Blocks
[Diagram labels: 32-bit LCG_i (bits 0 to 31, i = 0 to 4095); 12-bit LFSR; Index of M Block; 4096-1 Mux; Majority function Encoder; Encoded Majority Function blocks f_0 to f_4095; Function Selection Block M; NLCG Output.]
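To make the data flow in Figure 1 more concrete, the following Python sketch models one NLCG iteration: a bank of N independent 32-bit LCGs, a 12-bit LFSR that produces selection indices, and a bitwise majority function that combines selected LCG outputs. It is a simplified illustration under stated assumptions, not the authors' CUDA kernel. The LCG constants, the LFSR taps, the index arithmetic, and the plain 3-input majority (standing in for the paper's encoded majority functions, whose encoding is not given in this section) are all hypothetical.

```python
"""Hypothetical, simplified model of the Figure 1 data flow (not the authors' kernel).

Assumptions not taken from the paper: the LCG constants, the LFSR taps, the
seeding, the index arithmetic, and the use of a plain 3-input bitwise majority
in place of the paper's encoded majority functions.
"""
MASK32 = (1 << 32) - 1
LCG_A, LCG_C = 1664525, 1013904223  # assumed full-period constants mod 2**32
LFSR_MASK = 0xE08  # assumed Galois taps for x^12 + x^11 + x^10 + x^4 + 1


def lcg_step(states):
    """Advance every 32-bit LCG once: x <- (A*x + C) mod 2**32."""
    return [(LCG_A * x + LCG_C) & MASK32 for x in states]


def lfsr12_step(state):
    """One step of a 12-bit Galois LFSR; a nonzero state stays nonzero."""
    out = state & 1
    state >>= 1
    if out:
        state ^= LFSR_MASK
    return state


def lfsr_period(seed=1):
    """Count steps until the LFSR returns to seed (4095 if the taps are maximal)."""
    state, steps = lfsr12_step(seed), 1
    while state != seed:
        state = lfsr12_step(state)
        steps += 1
    return steps


def majority3(a, b, c):
    """Bitwise majority of three words: each output bit is the 2-of-3 vote."""
    return (a & b) | (a & c) | (b & c)


def nlcg_step(states, lfsr):
    """One illustrative iteration over N LCGs; returns (states, lfsr, outputs).

    For output word i, the LFSR supplies an index that picks two partner
    LCGs whose outputs are combined with LCG i by majority3.
    """
    states = lcg_step(states)
    n = len(states)
    outputs = []
    for i in range(n):
        lfsr = lfsr12_step(lfsr)
        j = (i + lfsr) % n
        k = (j + 1 + (lfsr & 0xFF)) % n
        outputs.append(majority3(states[i], states[j], states[k]))
    return states, lfsr, outputs


if __name__ == "__main__":
    n = 4096  # N independent 32-bit LCGs, as in the text
    states = [(2654435761 * (i + 1)) & MASK32 for i in range(n)]  # toy seeding
    states, lfsr, words = nlcg_step(states, lfsr=1)
    print(len(words), hex(words[0]))
```

Because a nonzero LFSR state never becomes zero, the indices it produces range over 1 to 4095; a design that must address all 4096 function blocks would have to handle that detail explicitly.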
IV. Experimental Results
We report our results both in terms of quantity of num-
bers generated (throughput) and the quality of random
numbers (results of NIST, Diehard and Dieharder bench-
marks).
A. Quantity of Random Numbers
We implemented the primality-testing program independently and compared the results of its CPU and GPU versions. We then integrated the primality tester with the LCG kernel and measured the time taken to generate a fixed quantity of random numbers.
The CPU used for comparison was a 64-bit Intel Xeon processor with 8 cores and 2 threads per core, 16 GB of memory and 8 MB of L3 cache, clocked at 2.9 GHz and running the Ubuntu 11.10 distribution of the Linux operating system. For the GPU, we used an NVIDIA Kepler GeForce GTX 680 card with 8 multiprocessors (MPs), 192 CUDA cores per MP, and 2 GB of global memory. Table 1 compares the CPU and GPU versions of the NLCG generator with 4096-bit output. Note that we assume the consuming application (cryptography, a Monte Carlo method, or any other application that consumes random numbers) resides in the same kernel as the NLCG generator itself. The GPU-NLCG is thus integrated with the consuming function, which removes the overhead of transferring the large volume of random numbers generated every second to the CPU. The results reported in Table 1 correspond to k = 1000 rounds of the Miller-Rabin algorithm and 4096-bit random numbers.
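For reference, the primality tester applies the Miller-Rabin algorithm with k rounds. A minimal CPU-side Python sketch follows; the small-prime pre-filter and the use of Python's `random` module to pick bases are assumptions of this sketch, not details of the authors' GPU implementation. A composite number survives a single round with a random base with probability at most 1/4, so k = 1000 rounds bound the error probability by 4^(-1000).

```python
import random


def miller_rabin(n: int, k: int = 1000, rng=random) -> bool:
    """Return True if n is a probable prime after k Miller-Rabin rounds."""
    if n < 2:
        return False
    # Cheap pre-filter by small primes (an assumption of this sketch).
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    # Write n - 1 = d * 2**r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(k):
        a = rng.randrange(2, n - 1)  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is definitely composite
    return True


if __name__ == "__main__":
    print(miller_rabin(2**127 - 1, k=20))  # a known Mersenne prime
```

With k = 1000, every 4096-bit candidate that passes costs 1000 modular exponentiations, so the choice of k directly affects the time per accepted number.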
Table 1: Results for CPU and GPU versions of NLCG Program
Program    # Random numbers   Time     Speedup
NLCG_CPU   10^9               3.01 h   -
NLCG_GPU   10^9               61.9 s   175x
NLCG_CPU   2×10^9             6.22 h   -
NLCG_GPU   2×10^9             124 s    180x
The throughput calculated from Table 1 comes to 66.1 Gbps for the GPU version and 337 Mbps for the CPU version of the program. Thus, the GPU-NLCG can sustain the data rates of communication standards such as OC-768, which operates at 40 Gbps.
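The throughput figures follow from Table 1 as (random numbers generated × 4096 bits) / time. A small Python check of that arithmetic, using only the values in Table 1, is:

```python
HOUR = 3600.0
BITS_PER_NUMBER = 4096  # width of each NLCG output

# (program, random numbers generated, wall time in seconds), from Table 1
ROWS = [
    ("NLCG_CPU", 1e9, 3.01 * HOUR),
    ("NLCG_GPU", 1e9, 61.9),
    ("NLCG_CPU", 2e9, 6.22 * HOUR),
    ("NLCG_GPU", 2e9, 124.0),
]


def throughput_gbps(count: float, seconds: float, bits: int = BITS_PER_NUMBER) -> float:
    """Generated bits per second, in Gbps (10**9 bits per second)."""
    return count * bits / seconds / 1e9


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Ratio of CPU to GPU wall time for the same amount of output."""
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    for name, count, secs in ROWS:
        print(f"{name}  {count:.0e} numbers: {throughput_gbps(count, secs):.3f} Gbps")
    print(f"speedup for 1e9 numbers: {speedup(ROWS[0][2], ROWS[1][2]):.1f}x")
    print(f"speedup for 2e9 numbers: {speedup(ROWS[2][2], ROWS[3][2]):.1f}x")
```

The 2×10^9 GPU row gives about 66.06 Gbps and the 10^9 row about 66.17 Gbps; applied to the CPU rows, the same formula gives roughly 0.37 to 0.38 Gbps.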
B. Quality of Random Numbers
We used three benchmark tools to assess the quality of the random numbers generated by our NLCG: Marsaglia's Diehard battery of tests of randomness [11], Dieharder (an open-source random number/uniform deviate generator and tester [12]), and NIST, a statistical test suite for the validation of RNGs and PRNGs for cryptographic applications [10]. The Dieharder test suite incorporates many of the tests included in NIST, is available as a standard software package in Ubuntu's software repository, and can test either P-bit bitstrings (with P user-specified) or double-precision floating-point numbers in the range [0.0, 1.0). For the Diehard and NIST suites, we had to split each 4096-bit random number into 64-bit chunks for testing. As required by each of the three test suites, we generated binary output files from the NLCG with 8 million samples each.
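Splitting each 4096-bit output into 64-bit chunks and writing them to a raw binary file can be sketched as follows. The chunk order (least significant word first) and the little-endian byte order are assumptions of this sketch; the text does not state which ordering was used.

```python
import struct


def split_words(value: int, bits: int = 4096, word_bits: int = 64) -> list:
    """Split a bits-wide non-negative integer into word_bits-wide chunks.

    Chunks are returned least significant first (an assumed ordering).
    """
    if value < 0 or value >> bits:
        raise ValueError("value does not fit in the given width")
    mask = (1 << word_bits) - 1
    return [(value >> (i * word_bits)) & mask for i in range(bits // word_bits)]


def pack_number(value: int, bits: int = 4096) -> bytes:
    """Encode one output as consecutive little-endian unsigned 64-bit words."""
    words = split_words(value, bits)
    return struct.pack(f"<{len(words)}Q", *words)


def write_samples(path: str, values, bits: int = 4096) -> None:
    """Write the 64-bit chunks of every value to a raw binary file."""
    with open(path, "wb") as f:
        for v in values:
            f.write(pack_number(v, bits))
```

Each 4096-bit number becomes 64 chunks, or 512 bytes, in the output file.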
The results of all three test suites are reported as p-values. The p-value of a particular test is compared with the following two parameters to decide the outcome of that test:
• Weak threshold: the test is deemed weakly passed if the p-value is less than the weak threshold. The default is 0.005 for Dieharder tests. NIST does not use this parameter.
• Fail threshold: the test is deemed certainly failed if the p-value is less than the fail threshold. Dieharder recommends 0.000001, while NIST refers to this parameter as α and recommends α = 0.01 for all tests.
An α of 0.001 indicates that one would expect one sequence in 1000 sequences to be rejected by the test if the sequence was random. For a p-value ≥ 0.001, a sequence would be considered to be random with a confidence of 99.9%. For a p-value < 0.001, a sequence would be considered to be non-random with a confidence of 99.9%.
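The decision rules above can be written as two small Python functions. This sketch follows the one-sided comparisons described in the text; Dieharder itself may also flag p-values that lie suspiciously close to 1, so its documentation should be checked before relying on this classification for real results.

```python
DIEHARDER_WEAK = 0.005     # Dieharder default weak threshold (from the text)
DIEHARDER_FAIL = 0.000001  # Dieharder recommended fail threshold (from the text)
NIST_ALPHA = 0.01          # NIST recommended significance level (from the text)


def dieharder_verdict(p: float, weak: float = DIEHARDER_WEAK,
                      fail: float = DIEHARDER_FAIL) -> str:
    """Classify a p-value using the one-sided thresholds described above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p < fail:
        return "FAILED"
    if p < weak:
        return "WEAK"
    return "PASSED"


def nist_passes(p: float, alpha: float = NIST_ALPHA) -> bool:
    """NIST rule: the sequence appears random if p >= alpha."""
    return p >= alpha


if __name__ == "__main__":
    print("DH_rank_32x32:", dieharder_verdict(0.00505979))  # PASSED
    print("NIST_Monobit:", nist_passes(0.13499881))         # True
```

For example, the DH_rank_32x32 p-value of 0.00505979 in Table 2 lies just above the 0.005 weak threshold, so it is classified as passed.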
We used the default sample sizes recommended by the individual tests. For the GPU-NLCG, all 15 tests in Diehard, the 12 major tests and their 162 variants in NIST, and the 25 major tests and their 209 variants in Dieharder passed. Table 2 lists the results of important tests from all three test suites. In this table, a test with the prefix NIST comes from the NIST suite, while the prefixes DHr and DH refer to the Dieharder and Diehard suites, respectively. Column 2, t-samples, lists the total number of random samples consumed by the test under consideration.
Table 2: Results from Statistical Test Suites. (All tests pass.)
Test-name              t-samples   p-value
NIST_Runs              100000      0.54890228
NIST_Serial            100000      0.21857258
NIST_BlockFrequency    100000      0.13538794
NIST_Monobit           100000      0.13499881
DHr_Bitstream          2097152     0.87918116
DHr_Squeeze            100000      0.36519217
DHr_Sums               100         0.35587054
DHr_rgb_lagged_sum     1000000     0.24969625
DHr_rgb_permutations   100000      0.17348102
DHr_min_distance       10000       0.53072256
DHr_3d_Sphere          4000        0.71973711
DH_rank_32x32          40000       0.00505979
DH_dna                 2097152     0.23968055
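To show how one of the p-values in Table 2 is produced, here is a Python sketch of the NIST frequency (monobit) test, the test behind NIST_Monobit. It follows the formula in NIST SP 800-22: map each bit to ±1, sum them to get S_n, form s_obs = |S_n|/√n, and return erfc(s_obs/√2). NIST recommends sequences of at least 100 bits for this test; the 10-bit input below is only the small worked example from the NIST documentation.

```python
import math


def monobit_p_value(bits: str) -> float:
    """NIST SP 800-22 frequency (monobit) test on a string of '0'/'1' characters."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a non-empty string of '0' and '1'")
    n = len(bits)
    s_n = sum(1 if b == "1" else -1 for b in bits)  # S_n: +1 per one, -1 per zero
    s_obs = abs(s_n) / math.sqrt(n)                  # normalized test statistic
    return math.erfc(s_obs / math.sqrt(2))          # p-value


if __name__ == "__main__":
    print(f"p-value = {monobit_p_value('1011010101'):.6f}")  # 0.527089
```

A strongly biased input such as 100 ones gives a p-value far below any of the thresholds discussed above.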
V. Conclusion
We present a GPU-based parallelized implementation of a simple linear congruential generator and extend it to serve as a non-linear generator. The resulting implementation, which we call GPU-NLCG, is an inexpensive method for fast random number generation with variable bit-size on any GPU-based platform. The GPU-NLCG is well suited for fast key generation in cryptography applications. The NLCG uses 4096 independent LCGs, whose outputs are permuted using 4096 encoded majority functions selected randomly per bit per iteration. The throughput of our system is limited not by the generator but by the bandwidth of the CPU-GPU PCIe bus. Hence, an FPGA- or ASIC-based implementation of our NLCG may yield even higher throughputs.
References
[1] E. Zenner, Cryptanalysis of LFSR-based pseudorandom generators - a survey, 2004.
[2] Severance, Frank (2001). System Modeling and Simulation, John Wiley & Sons, Ltd. pp. 86. ISBN 0-471-49694-4.
[3] Joan Boyar (1989). Inferring sequences produced by pseudo-random number generators, Journal of the ACM 36 (1): 129-141.
[4] Kent Lin, Sunil Khatri (2010). VLSI Implementation of a Non-Linear Feedback Shift Register for High-Speed Cryptography Applications, Proceedings of the 20th symposium on Great lakes symposium on VLSI, Pages 381-384.
[5] J. Eichenauer, J. Lehn, A non-linear congruential pseudo random number generator, Statist. Papers 27 (1986) 315-326.
[6] Shuang Gao and Gregory D. Peterson, GASPRNG: GPU Accelerated Scalable Parallel Random Number Generator Library, Computer Physics Communications.
[7] Reversing the Mersenne Twister RNG Temper Function. http://b10l.com/?p=24
[8] G. Marsaglia. Xorshift RNGs. J. of Statistical Software, 8(14):1-6, 2003.
[9] Panneton, François (October 2005). On the xorshift random number generators, ACM Transactions on Modeling and Computer Simulation (TOMACS) Vol. 15 (Issue 4).
[10] NIST computer security resource center. http://csrc.nist.gov/groups/ST/toolkit/rng/index.html
[11] The Marsaglia Random Number CDROM including the Diehard Battery of Tests of Randomness. http://www.stat.fsu.edu/pub/diehard/
[12] Dieharder: A testing and benchmarking tool for random number generators. Dieharder page for Ubuntu
[13] Data Encryption Standard (DES), U.S. Department of Commerce/National Institute of Standards and Technology. http://csrc.nist.gov/groups/ST/toolkit/rng/index.html
[14] Announcing the Advanced Encryption Standard (AES). Federal Information Processing Standards Publication 197. United States National Institute of Standards and Technology (NIST). November 26, 2001.
[15] New Directions in Cryptography. W. Diffie and M. E. Hellman, IEEE Transactions on Information Theory, vol. IT-22, Nov. 1976, pp. 644-654.
[16] Rivest, R.; A. Shamir; L. Adleman (1978). A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Communications of the ACM 21 (2): 120-126.