Abstract—Functional verification is the most difficult and time-consuming step in the VLSI design flow, owing to the rapidly increasing complexity and scale of chips. The key problem of VLSI functional verification is improving efficiency and coverage. For the Cache, an important component of microprocessors, an FPGA-based pseudo-random functional verification method is proposed in this paper. The testbench of this method is synthesizable, and field programmable gate array (FPGA) emulation is integrated to improve the efficiency of verification. Functional verification coverage is increased by automatically generating constraint-directed pseudo-random test stimuli. The method is applied to real chips and compared with the pseudo-random software simulation method. The results show that our method is faster by about three orders of magnitude and finds more bugs in the designs.

Keywords-Cache; functional verification; pseudo-random; field programmable gate array

I. INTRODUCTION

Functional verification remains one of the largest challenges in modern VLSI design flows, taking up to 70% [1] of the total design time, owing to the dramatically increasing complexity and number of transistors on a single silicon chip. Functional verification checks that the register transfer level (RTL) code or gate-level netlist conforms to the specification of the VLSI chip. Its aim is to find and locate the bugs in the circuits. In recent years, new methods have been developed to cope with the verification challenge. Software simulation and hardware emulation are the most popular functional verification techniques. The main advantage of software simulation is that errors in the design are easy to localize. However, software simulation is slow: with typical design sizes exceeding the half-million synthesized gates mark, its runtime is too long and its functional coverage is generally not very high. The most popular hardware emulation technique is FPGA-based emulation. It is generally a few orders of magnitude faster than software simulation, and it can generally reach a high coverage rate. However, its debug process is difficult and its testbench must be synthesizable. Therefore, hardware emulation is generally applied in the functional verification flow of chips when the efficiency and coverage rate of functional verification need to be improved.

The Cache is a very important component of microprocessors. Its main function is to bridge the gap between the main memory and the processor cores. According to the type of stimulus, functional verification methods for the Cache can be divided into two categories: directed verification and random verification.

Broadly speaking, functional verification of the Cache faces three problems. Firstly, the huge verification space is the main bottleneck. In general, the functional complexity and design scale of the Cache are very large. With the directed verification method, it is very difficult for the verification team to write so many testbenches. Secondly, automatic correctness checking of the verification results is necessary. The complex function of the Cache and the large scale of the testbench make checking the correctness of results difficult, time-consuming and error-prone. Thirdly, high efficiency and a high coverage rate are the main goals of functional verification of the Cache. When the directed verification method is adopted, the verification team has to write a testbench for each functional point of the Cache, which consumes much time in the verification flow. Furthermore, some unpredictable combinational conditions and corner cases are very difficult to cover with directed testbenches. The random verification method can solve these three problems. It generates testbenches of flexible scale and easily covers many unexpected functional corner cases, so it can reach a higher verification coverage rate. However, completely random test stimuli tend to cover the same function points more than once, and these unnecessary repeated tests reduce the efficiency of verification.

To address the above problems in the verification flow of the Cache, a synthesizable pseudo-random functional verification method is proposed in this paper. Firstly, to improve the functional coverage rate, the pseudo-random stimulus generation technique is adopted. Secondly, all of the testbenches are synthesizable, so the FPGA-based hardware emulation approach can be applied to achieve a substantial improvement in verification efficiency. Thirdly, the constraint-directed random test generation technique is introduced to reduce unnecessary repeated tests. The [...]
The FSM contains five states: IDLE, BUILD, IDGEN, SEND and UPDATE. The workflow and the transitions between these states are described as follows.

1) IDLE: This is the initial state of the FSM. If the initialization of the mirror image memory is finished, the enable signal for pseudo-random number generation is set to 1 and sent to the random number generator, and the next state is BUILD. Otherwise, the FSM stays in IDLE.

2) BUILD: In this state, the constraint-directed pseudo-random signals, such as the address and the write data, are built. On receiving the pseudo-random number, the module takes the lowest n bits as the write data to be sent to the Cache. The next m bits of the random number are used as the access address. The value of m is determined by the depth d of the mirror image memory, more precisely $m = \lceil \log_2 d \rceil$. The highest bit gives the type of operation, indicating whether the current operation on the Cache is a read or a write. For a read, the next state is IDGEN; otherwise, the next state is SEND.

3) IDGEN: In this state, the module checks whether any idle read ID exists. If the set of idle IDs is empty, the FSM stays in the current state. Otherwise, the module selects an idle read ID, adds it to the used ID set and removes it from the idle ID set. The matching list between used IDs and addresses is updated at the same time. The next state is SEND.

4) SEND: This state sends the control and data signals to the Cache. The send valid signal is set to 1. At the same time, all other signals, such as the generated address, write data, operation type and read ID, are sent to the Cache. The next state is UPDATE.

5) UPDATE: In this state, the data written to the Cache are mirrored in the mirror image memory. If the current operation is a write, the write data are simultaneously stored at the same address of the mirror image memory. The next state is IDLE.

[Figure 2: state diagram of the constraints directed pseudo-random test stimuli generator, with nodes IDLE, BUILD, ID Generation, Send Packets and UPDATE, and edge labels Initialization, Read Op, Write Op and Update Over.]

Figure 2. The finite state machine

The automatic error checker is composed of an error judgment module and a timeout reporting module. The error judgment module collects and judges three types of errors. The first type is the read data error. When the checker receives read data, it compares them with the data at the same address in the mirror image memory. If they are unequal, there is a fault in the data read from the Cache. The second type is the read ID error. In general, double data rate (DDR) SDRAM is employed as the main memory of the CPU, and the read data returned from DDR SDRAM may be out of order. Therefore, the test stimuli generator always allocates a read ID together with the address for each read operation. As mentioned above, it also maintains a used ID set. When read data and an ID are returned from the Cache, the error checker determines whether this ID belongs to the used ID set; an ID outside the set indicates a read ID error. The third type is the error correcting code (ECC) check error. ECC is now generally applied in DDR SDRAM and in the Cache. When read data are received from the Cache, the error checker computes the ECC parity and judges whether a one-bit or a two-bit fault exists.

The timeout reporting module judges whether the read data are returned from the Cache within a time limit after the read operation is started. In this module, a 64-bit counter is provided for each read ID. When an idle read ID is put into use, the corresponding counter starts from 0 and increases by 1. When that read ID and its data are received from the Cache, the counter stops and is set to 0. Otherwise, the counter keeps accumulating until it reaches the preset timeout value. If a timeout occurs, this module reports an error signal to the system.

The pseudo-random number generator produces the random numbers. It adopts a typical 64-bit PRBS pseudo-random number generating algorithm, which is illustrated in Fig. 3. If one 64-bit random number is not enough for all the signals, the generator uses different seeds to build multiple random numbers. The input signals of this algorithm are the clock signal clk, the reset signal rst_n, the enable signal enable and the 64-bit seed signal seed_data. The output signal is a 64-bit pseudo-random number called prbs_o.

    module prbs_gen_64bit (clk, rst_n, enable, seed_data, prbs_o);
        input         clk;        // clock
        input         rst_n;      // active-low synchronous reset, loads the seed
        input         enable;     // advance the LFSR by one step when high
        input  [64:1] seed_data;  // 64-bit seed
        output [64:1] prbs_o;     // 64-bit pseudo-random output

        reg [64:1] lfsr_q;        // LFSR state, bits numbered 64 down to 1

        always @(posedge clk) begin
            if (!rst_n) begin
                lfsr_q <= seed_data;
            end
            else if (enable) begin
                // ring shift; bit 64 is also XORed into bits 64, 62 and 61
                lfsr_q[64]   <= lfsr_q[64] ^ lfsr_q[63];
                lfsr_q[63]   <= lfsr_q[62];
                lfsr_q[62]   <= lfsr_q[64] ^ lfsr_q[61];
                lfsr_q[61]   <= lfsr_q[64] ^ lfsr_q[60];
                lfsr_q[60:2] <= lfsr_q[59:1];
                lfsr_q[1]    <= lfsr_q[64];
            end
        end

        assign prbs_o = lfsr_q[64:1];
    endmodule

In Fig. 3, while the reset signal is active, the 64-bit register lfsr_q is initialized with the seed data. When the reset signal is inactive, the algorithm waits for the enable signal. If the enable signal is 1, the linear feedback shift mode is used to generate the pseudo-random number. In other words, the value of the 64-bit lfsr_q in the current clock cycle is obtained by a ring shift of its value in the previous cycle, with some bits modified. The 64th bit is the exclusive OR of the 63rd and 64th bits of the previous lfsr_q. Similarly, the 62nd bit is the exclusive OR of the 61st and 64th bits, and the 61st bit is the exclusive OR of the 60th and 64th bits. This algorithm has two advantages. The first is that all of the code is synthesizable and can be applied in FPGA verification. The second is that the algorithm can generate different pseudo-random numbers by changing the seed data, to obtain more stimuli and improve the test coverage.

[...] read data are returned or the counter reaches the preset value. The other branch handles receiving data from the Cache. When the read data and ID arrive, the address is obtained from the matching list between used IDs and addresses, according to the ID. Then the data are read from the corresponding address of the mirror image memory and compared with the read data from the Cache. At the same time, the read ID error and the ECC parity error are checked and reported.

IV. EVALUATION RESULTS AND ANALYSIS

To evaluate the effectiveness and efficiency of the method, Caches of three capacities, 128KB, 256KB and 512KB, are implemented. Pseudo-random software simulation is a very popular verification technique for the Cache. Therefore, the proposed FPGA-based pseudo-random verification method is compared with the pseudo-random software simulation method. The designs under test (DUT) are the 128KB Cache, the 256KB Cache and the 512KB Cache.
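The five-state generator FSM described above (IDLE, BUILD, IDGEN, SEND, UPDATE) can be summarized as a synthesizable skeleton. This is only a sketch under stated assumptions, not the authors' implementation: the module name stim_fsm, the signals init_done, id_free and is_read, the polarity of the operation-type bit, and the default widths N and D are all hypothetical. The ID bookkeeping and the mirror image memory write are left out.

```verilog
// Hypothetical skeleton of the five-state stimulus generator FSM.
// All names and default widths are illustrative assumptions.
// Requires N + M <= 63 so the fields fit below bit 64, and D >= 2.
module stim_fsm #(
    parameter N = 32,         // assumed write-data width n
    parameter D = 1024,       // assumed mirror image memory depth d
    parameter M = $clog2(D)   // address width m = ceil(log2 d)
) (
    input  wire         clk,
    input  wire         rst_n,       // active-low synchronous reset
    input  wire         init_done,   // mirror image initialization finished
    input  wire         id_free,     // at least one idle read ID exists
    input  wire [64:1]  prbs_i,      // word from the PRBS generator
    output reg          prbs_enable, // advances the PRBS generator
    output reg          send_valid,  // request strobe towards the Cache
    output reg          is_read,     // operation type (polarity assumed)
    output reg  [M-1:0] addr,        // access address
    output reg  [N-1:0] wr_data      // write data
);
    localparam IDLE = 3'd0, BUILD = 3'd1, IDGEN = 3'd2,
               SEND = 3'd3, UPDATE = 3'd4;
    reg [2:0] state;

    always @(posedge clk) begin
        if (!rst_n) begin
            state       <= IDLE;
            prbs_enable <= 1'b0;
            send_valid  <= 1'b0;
        end else begin
            prbs_enable <= 1'b0;   // both strobes default to 0
            send_valid  <= 1'b0;
            case (state)
                IDLE: if (init_done) begin
                    prbs_enable <= 1'b1;  // generator steps after BUILD samples
                    state       <= BUILD;
                end
                BUILD: begin
                    wr_data <= prbs_i[N:1];      // lowest n bits
                    addr    <= prbs_i[N+M:N+1];  // next m bits
                    is_read <= prbs_i[64];       // highest bit
                    state   <= prbs_i[64] ? IDGEN : SEND;
                end
                IDGEN: if (id_free) state <= SEND;  // wait for an idle read ID
                SEND: begin
                    send_valid <= 1'b1;
                    state      <= UPDATE;
                end
                UPDATE:  state <= IDLE;  // mirror memory write would go here
                default: state <= IDLE;
            endcase
        end
    end
endmodule
```

In this sketch BUILD samples the current PRBS word and the enable pulse advances the generator for the next request, so each pass through BUILD sees a new word. Changing D re-derives the address width automatically, which keeps the slicing consistent if the mirror image memory depth differs between runs.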
[Figure: flowchart of the verification flow. Boxes, in order: Mirror image initialization; Pseudo-random number generation; Build the control and data signal; Allocate read ID; Send signals to Cache; Update mirror image; Timeout report. Boxes of the parallel receive branch: Receive reading data; Obtain address by read ID; Read data and compare; Errors check.]

The software simulation environment is the Cadence NC-Verilog simulator. The experiments were conducted on a 2.9 GHz Intel Xeon machine with 64 GB of memory running the Linux operating system. The platform of the synthesizable pseudo-random verification method is an FPGA board based on a Xilinx Virtex-6 565T FPGA device. The synthesizable pseudo-random verification method is implemented in RTL Verilog. The ISE tool is used to perform synthesis, layout and routing. Finally, the bit stream of the DUT and testbench is generated and downloaded to the FPGA verification board. The FPGA device runs at 60 MHz.

The evaluation results of the software simulation method and of the pseudo-random verification method on FPGA are listed in Table I. For the pseudo-random software simulation method, Table I shows the number of requests sent by the testbench within 10 seconds (Reqs no.) and the number of bugs found. The last two columns give the number of requests sent by the testbench within 10 seconds (Reqs no.) and the number of bugs found by the pseudo-random verification method on FPGA. The Reqs no. columns are used to evaluate the efficiency of the two verification methods.
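On the FPGA side, a Reqs no. figure of this kind could be collected with a small counter that tallies request strobes over a fixed window. The following is a hypothetical sketch, not taken from the paper: the module name req_counter, the send_valid strobe and both parameters are assumptions, with the defaults chosen to match the 60 MHz clock and 10-second window mentioned in the text.

```verilog
// Hypothetical sketch: count requests issued during a fixed time window.
// send_valid is assumed to be high for exactly one cycle per request.
module req_counter #(
    parameter [63:0] CLK_HZ   = 64'd60_000_000, // clock frequency in Hz
    parameter [63:0] WINDOW_S = 64'd10          // window length in seconds
) (
    input  wire        clk,
    input  wire        rst_n,      // active-low synchronous reset
    input  wire        send_valid, // one-cycle strobe per request
    output reg  [63:0] reqs_no,    // requests counted so far
    output reg         done        // high once the window has elapsed
);
    // Number of clock cycles in the measurement window.
    localparam [63:0] WINDOW_CYCLES = CLK_HZ * WINDOW_S;

    reg [63:0] cycles;             // cycles elapsed inside the window

    always @(posedge clk) begin
        if (!rst_n) begin
            cycles  <= 64'd0;
            reqs_no <= 64'd0;
            done    <= 1'b0;
        end else if (!done) begin
            cycles <= cycles + 64'd1;
            if (send_valid)
                reqs_no <= reqs_no + 64'd1;
            // Stop after exactly WINDOW_CYCLES counted cycles.
            if (cycles == WINDOW_CYCLES - 64'd1)
                done <= 1'b1;
        end
    end
endmodule
```

With the default parameters the window is 60,000,000 x 10 = 600,000,000 cycles, so a generator that issues at most one request per cycle cannot exceed that count. The software simulation figure would have to be measured separately against wall-clock time.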
TABLE I. VERIFICATION RESULTS OF TWO METHODS ON CACHE
[...] and miss buffer requests leads to another bug. The last one is an address conflict between input queue and miss buffer requests. These three bugs lie deep in the design and are very difficult to uncover.

The 256KB and 512KB Caches are modified from the 128KB Cache. For them, software simulation found 2 bugs introduced by the modification; the two bugs of the 256KB Cache are identical to the two bugs of the 512KB Cache. The FPGA-based pseudo-random verification method, however, found 5 bugs. Two of them are the same as the faults found by the software simulation method, and the other three are similar to the bugs found in the 128KB Cache. In summary, the FPGA-based pseudo-random verification method achieves a substantial improvement in coverage and can cover more corner cases and combinational conditions. This method can help to find more faults in the designs, improve the efficiency of verification, and shorten the time to market of chips.

V. CONCLUSIONS

For the functional verification of the Cache, we propose an FPGA-based pseudo-random verification method. This method has two features. First, the testbench is synthesizable, and FPGA emulation is introduced to improve the efficiency of verification. Second, the pseudo-random stimuli are generated automatically, which increases the functional verification coverage. The method is compared with the pseudo-random software simulation method. The results show that our method is faster by about three orders of magnitude and finds more bugs in the designs. One direction for future work is to explore more aggressive techniques to improve the efficiency of locating the bugs. Another is to apply our method to more designs.

ACKNOWLEDGMENT

The authors would like to thank all peer reviewers for their valuable comments and suggestions. This work is supported by the National Natural Science Foundation of China under grant No. 61103083 and 61133007, and the National High Technology Research and Development Program of China (863 Program) under grant No. 2012AA01A301.

REFERENCES

[1] P. Bose, D.H. Albonesi, and D. Marculescu, “Guest editors’ introduction: power and complexity aware design,” IEEE Micro, vol. 23, pp. 8-11, 2003.
[2] Patrick G, Christian L, Serge P, and Arnaud V, “Comparison between random and pseudo-random generation for BIST of delay, stuck-at and bridging faults,” Proc. 6th IEEE International On-Line Testing Workshop, pp. 121-126, 2000.
[3] Liang Z, Yan X, Wang J, and Xu Z, “A dynamic random instruction and stimulus generation for functional verification of embedded processor,” Proc. 5th International Conference on ASIC, pp. 459-462, 2003.
[4] Mike B, Darren G, and Tim B, “A comparison of three verification techniques: directed testing, pseudo-random testing and property checking,” Proc. 39th Design Automation Conference, pp. 819-823, 2002.
[5] Qin X, Mishra P, “Automated generation of directed tests for transition coverage in cache coherence protocols,” Proc. 2012 Design, Automation and Test in Europe, pp. 3-8, 2012.