2014 15th International Microprocessor Test and Verification Workshop

A random instruction sequence generator for ARM® based systems

Shajid Thiruvathodi
Systems and Software Group
ARM Embedded Technologies Pvt. Ltd.
Bangalore, India
shajid.thiruvathodi@arm.com

Deepak Yeggina
Systems and Software Group
ARM Embedded Technologies Pvt. Ltd.
Bangalore, India
deepak.yeggina@arm.com

Abstract— Random instruction sequence (RIS) tools are widely used across the industry for processor verification and validation. These tools are often used to find design bugs in a relatively stable but not yet mature RTL design. RIS tools are very effective in generating test scenarios that are hard to envision. However, completely random instruction sequences are often of little test value for exposing corner cases in the design, especially if the bug involves a sequence of events happening in a narrow timing window. Macros can help enhance the test quality of the generated instruction sequences by providing controlled randomness around a specific sequence of instructions targeting a specific area in the processor architecture.

Index Terms—Random instruction sequences, macros, directed testing

I. INTRODUCTION

Random instruction sequences (RIS) are an industry-adopted technique to verify processors. They are an effective way to tackle the huge verification space of modern general purpose superscalar processors. The verification is constrained by the quality of the generator, the amount of time available and the amount of computation resources at hand. Most of the time, both the computation resources and the time available are limited. To make effective use of the available time and resources, RIS generators have knobs to tune the generator to target specific areas of interest or concern in the design. At ARM, we have developed a bare metal RIS generator tool, which is an MP-aware random instruction sequence generator. This tool is focused on finding bugs in the memory system (MMU/caches) of a shared memory multiprocessor system and its interconnects. The tool is capable of running on simulators, hardware accelerated emulators, FPGA or silicon environments. Because of the slow execution speeds available in a simulator environment, the tool generates a number of instruction sequences or tests offline on a host machine and packs these generated tests as part of the final executable image. While running this image on the simulator, the tests are launched in a pattern which stresses the design. On faster execution environments like emulators, FPGA or silicon, the tool is capable of generating tests on the target platform directly and executing them. This paper describes the tool used for generating the random instructions and also the capability of inserting directed instruction sequence blocks into these random instruction sequences.

II. THE TOOL

This section discusses 3 main aspects of the tool.
A. Instruction sequence generator
B. Bare metal kernel
C. Results checking engine

A. Instruction sequence generator

This component of the RIS tool generates a random instruction sequence that constitutes a test. The instructions generated are primarily targeted at the memory system, though this is not a limitation of the generator component. The full ARM ISA can be added to the list of supported instructions with minimal changes.

The test generation is controlled through a set of configuration files. The configuration files specify the memory areas usable by the test, the user-controlled partition of source, target and base registers to be used in the test, and the instruction weights.

1) Memory targeted by the test

The tool generates memory access instructions targeting two types of memory areas: common areas and private areas. Loads and stores can be performed by all processors to the common memory areas, but these are not checked because the values in these areas are non-deterministic. The private areas are allotted to each CPU in the system. Loads and stores can be performed by the respective CPUs to their private memory areas, and these are the checkable memory regions of the test. Memory system related operations like data preload, cache maintenance operations, barriers etc. can also be performed on these memory areas.
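The distinction between common and private areas can be sketched as a small data model. This is a minimal illustration in Python, not the tool's implementation: the names `MemoryArea` and `build_layout`, the base address, and the contiguous placement of areas are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MemoryArea:
    name: str
    base: int             # start address (hypothetical placement)
    size: int             # size of the area (units assumed to be bytes)
    owner: Optional[int]  # CPU index for a private area, None for a common area

    @property
    def checkable(self) -> bool:
        # Only private areas hold deterministic values, so only they are checked.
        return self.owner is not None

    def accessible_by(self, cpu: int) -> bool:
        # Common areas are open to every CPU, private areas only to their owner.
        return self.owner is None or self.owner == cpu


def build_layout(n_cpus: int, n_private: int, n_common: int,
                 size: int = 64, base: int = 0x8000_0000) -> List[MemoryArea]:
    """Lay out the common areas first, then each CPU's private areas."""
    areas = []
    addr = base
    for i in range(n_common):
        areas.append(MemoryArea(f"common{i}", addr, size, None))
        addr += size
    for cpu in range(n_cpus):
        for i in range(n_private):
            areas.append(MemoryArea(f"cpu{cpu}_private{i}", addr, size, cpu))
            addr += size
    return areas
```

With two CPUs, two private areas per CPU and two common areas, `build_layout(2, 2, 2)` yields six areas, of which the four private ones are checkable.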

1550-4093/15 $31.00 © 2015 IEEE
DOI 10.1109/MTV.2014.20
Fig. 1: An example test memory layout

In the example of Fig. 1 there are 2 private memory areas per CPU and 2 common memory areas. The bi-directional arrows represent loads and stores performed to these memory areas from the generated test cases.

2) Resource Partition

The memory access instructions use a general purpose register for specifying the base address on which the operation is performed. This is called the base register. The addresses in the base registers are assigned by the generator at run-time. Users can control the base registers to be used for different memory areas. Each memory area should have a base register assigned in the resource partition configuration file.

The value in the target register of a load operation can be predictable or unpredictable. If the load is performed from a private memory area, the value in the register is always predictable. If the load is performed from a common memory area, the value in the register is unpredictable. Predictable values can be checked at the end of the test (checking is explained later in this paper, in a separate section). The predictable registers can also be made the source of stores to the private and common memory areas. The unpredictable registers can be the source of stores to common memory areas only. The resource partition configuration file describes all the base registers, source and target registers, and predictable and unpredictable registers. The tool treats these registers as resources and uses them with memory access instructions operating on common or private memory areas. This register allocation is called the resource partition.

P2C2 : {
    PRIVATE_BASE_REGS : [
        {
            REGISTER : 8
            SIZE : 64
            WEIGHT : 30
        },
        {
            REGISTER : 10
            SIZE : 64
            WEIGHT : 40
        }
    ]

    COMMON_BASE_REGS : [
        {
            REGISTER : 7
            SIZE : 64
            WEIGHT : 25
        },
        {
            REGISTER : 9
            SIZE : 64
            WEIGHT : 5
        }
    ]

    ALIAS_REGS : [11, 12]

    INDEX_REGS : [6]

    PREDICTABLE_TARGET_REGS : [0, 1, 2, 3]
    UNPREDICTABLE_TARGET_REGS : [4, 5]

    PREDICTABLE_TARGET_S_REGS : [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
    UNPREDICTABLE_TARGET_S_REGS : [0, 1, 2, 3, 4, 5, 6, 7]

    PREDICTABLE_TARGET_D_REGS : [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
    UNPREDICTABLE_TARGET_D_REGS : [0, 1, 2, 3]

    PREDICTABLE_TARGET_Q_REGS : [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
    UNPREDICTABLE_TARGET_Q_REGS : [0, 1]

Fig. 2: Example configuration file specifying resource partition.

Fig. 2 above shows a sample resource partition configuration file that reflects a resource partition with 2 common and 2 private memory regions.

PRIVATE_BASE_REGS contains the registers (REGISTER field) that will be used to hold the addresses of the private regions. It also has the size (SIZE field) of the private memory areas.

The register lists that start with PREDICTABLE will be used as the target for load instructions from, and the source for store instructions to, the private areas. These have predictable values at the end of the test. The register lists that start with UNPREDICTABLE will be used as the target for loads from, or the source of stores to, the common area. These registers have unpredictable values at the end of the test.

3) Instructions

The tool supports all of the memory system related instructions available in the ARM ISA. A subset of the data processing instructions available in the ARM ISA is also supported. The instructions that are generated can be tuned by specifying weights for the instructions in another configuration file. The number of instruction sequences in a test, the number of tests, and the number of re-runs of the tests can also be configured using the configuration file.

Fig. 3 shows an example of the test snippet generated using the resource partition shown in Fig. 2. Load and store operations using the common base and private base registers are generated in the test.
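As a rough illustration of how a resource partition could drive generation, the Python sketch below picks a base register by weight and then a data register that respects the predictable/unpredictable rules described above. The register numbers, sizes and weights are copied from Fig. 2; everything else is an assumption made for the example: the function names, treating the four WEIGHT values as one distribution, reading SIZE as bytes, restricting offsets to word alignment, and limiting the instruction choice to LDR and STR.

```python
import random

# Register numbers, sizes and weights copied from the P2C2 partition in Fig. 2.
PARTITION = {
    "private_base": [(8, 64, 30), (10, 64, 40)],  # (register, size, weight)
    "common_base": [(7, 64, 25), (9, 64, 5)],
    "predictable": [0, 1, 2, 3],
    "unpredictable": [4, 5],
}


def generate_access(rng, part=PARTITION):
    """Return one access as (mnemonic, data_reg, base_reg, offset, area)."""
    bases = [("private", b) for b in part["private_base"]]
    bases += [("common", b) for b in part["common_base"]]
    # Assumption: all base-register weights form a single distribution.
    weights = [b[2] for _, b in bases]
    area, (base_reg, size, _) = rng.choices(bases, weights=weights, k=1)[0]
    mnemonic = rng.choice(["LDR", "STR"])
    if area == "private":
        # Loads from and stores to private areas use predictable registers.
        pool = part["predictable"]
    elif mnemonic == "LDR":
        # A load from a common area leaves an unpredictable value behind.
        pool = part["unpredictable"]
    else:
        # Stores to common areas may take either kind of source register.
        pool = part["predictable"] + part["unpredictable"]
    data_reg = rng.choice(pool)
    offset = rng.randrange(0, size, 4)  # word-aligned offset inside the area
    return mnemonic, data_reg, base_reg, offset, area


def render(access):
    """Format an access as an ARM-style load/store with immediate offset."""
    mnemonic, data_reg, base_reg, offset, _ = access
    return f"{mnemonic} r{data_reg}, [r{base_reg}, #{offset}]"
```

Calling `render(generate_access(random.Random()))` repeatedly produces lines of the form `LDR r0, [r8, #12]`. The actual output format of the tool is not shown in this text, so the rendering here is only indicative.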
Fig. 3: An example test snippet generated using the resource partition shown in Fig. 2.

4) Memory Attributes

The ARM architecture supports various attributes for memory access. The configuration file has support for specifying weights for these memory attributes. Based on the weights, memory attributes are picked for the common and private memory areas of the test. The attributes for the memory used by the test are selected by the generator during the generation of the test. The behavior of an instruction can also depend on the attributes of the memory area. For example, a memory access instruction will fault on an unaligned access to a region of memory marked as Device. The tool can generate additional instructions to make sure that the memory accesses to these regions are always aligned.

B. Bare metal kernel

The generated tests are compiled along with a minimal bare metal kernel which handles the setup of the tests before they are run.

The information regarding the number of memory areas, the size of each area and the general purpose registers used for accessing the private and common memory areas of the test is passed to the kernel at compile time through the resource partition configuration file.

The kernel is responsible for page table setup, exception handling, setting up the test execution context, launching the test cases and saving the checkable values in the registers and memory areas. Additional checks in the kernel's exception handling function detect memory accesses outside of the allocated test memory areas.

C. Results checking engine

The tool uses two-pass consistency checks to determine the result of a test. The results of the first execution of the test case are considered as reference values. Before the first execution of the test case, the source registers, target registers and memory areas are initialized with random values. After the first execution, the predictable context of the test is saved. The predictable context consists of the predictable source and target registers and the memory values in the private memory areas of each core.

For the second execution of the test, the registers and the memory areas are initialized to the same random values as for the first execution.

The predictable context values after both executions are compared. Any inconsistency or mismatch is flagged as a test failure; otherwise the test is declared passed and the kernel launches another test.

III. FUNCS

Functions (FUNCs) are long running directed tests that can be inserted into a random sequence as a call. These are usually written in C++. The FUNCs bring in another level of randomization to the generated instruction stream. An example of a function is a cache evictor function that gets invoked from the test case in the course of a test run. Execution jumps to the start of the function through a branch instruction inserted in the random instruction sequence and, once the function returns, resumes at the instruction immediately after the branch. Functions give users the capability to write complex algorithms for stressing/verifying the CPUs, but on the other hand, a call to a function involves some overhead of saving and restoring the intermediate context of the test.

Fig. 4: An example test snippet showing a call to a function.

The call to the FUNC involves a 5 instruction overhead, marked with arrows, before and after the call.

IV. MACROS

In spite of the various knobs available in the generator, it is very hard to generate some specific test sequences. To overcome the shortcomings of the generators in these areas, verification engineers usually resort to directed testing of these areas. Directed tests are manpower intensive with a limited amount of reuse across projects. Usually, directed tests cannot be run together with randomly generated instructions, hence they lose out on the benefits of randomly generated stimulus.

For example, consider the lock acquire/release sequence in a multiprocessor system. How would a directed test be able to do the following things?
1. Generate a varying number of instructions in between lock acquire and lock release.
2. Cause the lock acquire to miss the cache.
3. Cause the lock release to miss the cache.
4. Cause the lock acquire to issue a page table walk.
The events listed above are a small subset of all the events that can be crossed with the lock acquire/release scenario. Other examples of hard to generate test sequences are:
1. Message passing scenarios involving multiple CPUs
2. Wait for event/interrupt scenarios

At ARM, we have come up with a way to mix the directed scenarios with randomly generated code. For this purpose we have come up with a custom language which is very close to ARM assembly language. We call this language the Macro Language (ML).

ML gives verification engineers the ability to introduce interesting/desired instruction snippets in the randomly generated instruction sequence. These snippets are introduced into the sequence without affecting the ‘flow’ (that is, without the need to save and restore registers), resulting in tighter code sequences.

ML is very similar to ARM assembly, so there is no steep learning curve involved. The test writer needs to understand the resources and memory model provided by the tool. The language provides a number of directives which help the test writer to control the tool.

The snippet of code written in ML is called a Macro. Once a macro has been written, the generator parses and renders it randomly during the instruction generation phase. This results in a test sequence being run under varying conditions. “Rendering of macro” refers to the act of converting an abstract specification of a scenario into actual instructions. Here are 2 case studies of the use of macros.

a. Case Study 1
An example is the randomization of the registers. The base register used for loads and stores in the macro could match the ones being used by the generated test. Hence the areas of memory that the macro and the test act on could be the same. One scenario could be that the macro is acting on data which is present in the same cache line as the one the test just loaded or stored.

Here is an example of an atomic-increment macro.

All macros start with the macro directive and end with the mend directive. There are special declarative statements in the macros for choosing the registers used by the macro. The directives reg_cbase and reg_ctarget are used for picking registers used for addressing the common area and the unpredictable registers respectively. The align directive is used for aligning the value present in the register specified. In the above example the registers picked for x will be aligned to 4 bytes. Corresponding directives, reg_ptarget among them, can be used for picking the register that addresses the private area and the predictable target registers.

When the above macro is rendered it is inserted into the stream of random instructions without affecting the flow, giving a tight instruction sequence. Here is a snippet of a generated test which has the above macro rendered into it.
This is a tight atomic increment sequence generated from the macro without the need for any save and restore instructions. The instructions (in bold) generated from the macro use the same register R7 used by the test. This gives a tighter sequence of instructions without the overhead involved in calling a FUNC.
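The idea of rendering can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the template format below is hypothetical and is not ML syntax, the instruction body is a conventional LDREX/STREX increment loop rather than the macro from this case study, and the names `POOLS`, `ATOMIC_INC`, `render_macro` and `splice` are invented for the example. The register pools are taken from the partition in Fig. 2.

```python
import random

# Register pools taken from the P2C2 partition in Fig. 2.
POOLS = {
    "cbase": [7, 9],            # COMMON_BASE_REGS
    "ctarget": [4, 5],          # UNPREDICTABLE_TARGET_REGS
    "pbase": [8, 10],           # PRIVATE_BASE_REGS
    "ptarget": [0, 1, 2, 3],    # PREDICTABLE_TARGET_REGS
}

# Hypothetical template format, NOT actual ML syntax. Each symbolic register
# is declared with the pool it must be drawn from.
ATOMIC_INC = {
    "regs": {"x": "cbase", "t": "ctarget", "s": "ctarget"},
    "body": [
        "1:",
        "LDREX r{t}, [r{x}]",
        "ADD r{t}, r{t}, #1",
        "STREX r{s}, r{t}, [r{x}]",
        "CMP r{s}, #0",
        "BNE 1b",
    ],
}


def render_macro(macro, rng, pools=POOLS):
    """Bind each symbolic register to a concrete one and emit instructions."""
    by_pool = {}
    for name, pool in macro["regs"].items():
        by_pool.setdefault(pool, []).append(name)
    binding = {}
    for pool, names in by_pool.items():
        # Sample without replacement so symbols from one pool stay distinct.
        picks = rng.sample(pools[pool], len(names))
        binding.update(zip(names, picks))
    return [line.format(**binding) for line in macro["body"]]


def splice(test, rendered, rng):
    """Insert the rendered macro at a random point, with no save/restore."""
    pos = rng.randrange(len(test) + 1)
    return test[:pos] + rendered + test[pos:]
```

Because the macro draws its registers from the same pools as the surrounding test, the rendered instructions can land on a base register the test is already using, which is the overlap this case study relies on.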
b. Case Study 2
Here is a high level description of a bug related to page
table and ASID switch (context switch).

The bug is that the load after the context switch still hits the TLB of the old context. The LDRNE is speculatively executed by the core, but the CMP operation results in the opposite condition, EQ. The load is cancelled but the page table walk continues in the background. The page table walk returns after the context switch and gets allocated in the TLB. So the TLB now holds a virtual to physical address mapping that belongs to the older context. The LDR marked by the arrow gets its value from the older mapping.

Adding a FUNC to perform the context switching sequence would not hit the bug. This was because the FUNC added its own register save overhead, which made it hard to meet the timing. Here is the macro that hit the bug.

The figure above shows the rendered macro. The instructions in bold are the exact instructions rendered by the macro. There is no overhead of saving and restoring the registers used by the ‘test’. The sequence is very tight, which resulted in exposing the bug.

V. CONCLUSION
Macros provide a technique for inserting well-crafted blocks of code into a random sequence of instructions. This capability allows the tool to cover scenarios that would be very hard to generate if the tool were to generate only random instruction sequences. There are interesting ways in which macros can be extended. Instructions could be grouped based on their functionality. Instead of specifying an instruction in the macro, we could specify an instruction group, and an instruction from that group would be picked when the macro is rendered. This would give varying flavors of instruction sequences for a macro. Users could also create their own instruction groups and use them in the macros. An instruction sequence length could also be specified along with the macro. This would let the tool pick an instruction from the group repeatedly, based on the length of the sequence. These instruction sequences are hard to generate with the tool. Macros could be scaled to work on multi-core processors.
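The instruction-group extension described above could look roughly like the following Python sketch. The grouping, the group names and the `expand_group` helper are hypothetical; only the mnemonics themselves are standard ARM instructions.

```python
import random

# Hypothetical instruction groups; a user could define further groups.
GROUPS = {
    "load": ["LDR", "LDRB", "LDRH"],
    "store": ["STR", "STRB", "STRH"],
    "barrier": ["DMB", "DSB", "ISB"],
}


def expand_group(group, length, rng, groups=GROUPS):
    """Pick `length` mnemonics from a group, one independent draw per slot."""
    return [rng.choice(groups[group]) for _ in range(length)]
```

A macro line naming the group `load` with a sequence length of 3 would then render differently on each use, for example as three loads of mixed widths.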
