
AN AREA-EFFICIENT PATTERN MATCHING METHOD FOR MEMORY REDUCTION APPLICATIONS

CHAPTER-1
INTRODUCTION
Advances in computer networks and storage subsystems continue to push the rate at
which data streams must be processed between and within computer systems. Meanwhile,
the content of such data streams is subjected to ever increasing scrutiny, as components at
all levels mine the streams for patterns that can trigger time-sensitive action. Discovering
patterns such as credit card numbers, currency values, or telephone numbers requires a
specification mechanism more general than fixed strings.
While there is a well-developed theory for regular expressions and their
implementation via Finite-State Machines (FSMs), using regular expressions for high-
performance pattern matching remains difficult and is an area of ongoing research. In this
project, a memory-efficient pattern matching algorithm is proposed that significantly reduces
the number of states and transitions by merging pseudo-equivalent states while maintaining
the correctness of string matching. In addition, the new algorithm is complementary to other
memory reduction approaches and provides further reductions in memory needs.
It is becoming increasingly common for network devices to handle packets based
on the contents of packet payloads. Example applications include intrusion detection,
firewalls, and web proxies. These packet content inspection and filtering devices rely on a
fast multi-pattern matching algorithm to detect predefined keywords or
signatures in the packets. Unfortunately, because these signature sets are large (e.g., thousands
of patterns) and complex, multi-pattern matching is often a performance bottleneck. To sustain
the required throughput, fast string matching is therefore necessary; in this setting, the set of
rules is given statically.
The main purpose of a signature-based network intrusion detection system is to
prevent malicious network attacks by identifying known attack patterns. Due to the
increasing complexity of network traffic and the growing number of attacks, an intrusion
detection system must be efficient, flexible and scalable.


1.1 BASICS OF PATTERN MATCHING


In many information retrieval and text-editing applications it is necessary to be able
to locate quickly some or all occurrences of user-specified patterns of words and phrases in
text. This section describes a simple, efficient algorithm to locate all occurrences of any of a
finite number of keywords and phrases in an arbitrary text string.
The approach should be familiar to those acquainted with finite automata. The
algorithm consists of two parts. In the first part we construct from the set of keywords a
finite state pattern matching machine; in the second part we apply the text string as
input to the pattern matching machine.
The machine signals whenever it has found a match for a keyword. Using finite
state machines in pattern matching applications is not new, but their use seems to be
frequently shunned by programmers.
Part of the reason for this reluctance on the part of programmers may be due to
the complexity of programming the conventional algorithms for constructing finite
automata from regular expressions, particularly if state minimization techniques are
needed. In this project we show that an efficient finite state pattern matching machine can
be constructed quickly and simply from a restricted class of regular expressions, namely
those consisting of finite sets of keywords. Our approach combines the ideas of the Aho-
Corasick algorithm with those of finite state machines.
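
To make the two-part approach concrete, the following minimal Python sketch (an illustration only, not the hardware design developed in this project) first builds a keyword-matching machine from a finite set of keywords and then applies a text string as input to it. The function names build_machine and search are our own choices for illustration.

from collections import deque

def build_machine(keywords):
    """Part 1: construct the pattern matching machine from the keyword set."""
    goto = [{}]              # goto[state][char] -> next state (a trie of the keywords)
    output = [set()]         # keywords recognized on entering a state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(word)
    fail = [0] * len(goto)   # failure transitions, computed breadth-first
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] |= output[fail[nxt]]
    return goto, fail, output

def search(text, goto, fail, output):
    """Part 2: apply the text as input; report (start_index, keyword) for each match."""
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]          # failure transition
        state = goto[state].get(ch, 0)
        for word in output[state]:
            matches.append((i - len(word) + 1, word))
    return matches

goto, fail, output = build_machine(["bcdf", "pcdg", "cd"])
print(search("xxpcdgyybcdf", goto, fail, output))   # finds "pcdg", "bcdf" and both "cd"s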

1.2 MOTIVATION
A network intrusion detection system inspects packet contents against
thousands of predefined malicious or suspicious patterns. Because traditional software-
only pattern matching approaches can no longer meet the high throughput of today's
networks, many hardware approaches have been proposed to accelerate pattern matching.
Among hardware approaches, memory-based architectures have attracted a lot of attention
because of their easy reconfigurability and scalability.
In order to accommodate the increasing number of attack patterns and meet the
throughput requirement of networks, a successful network intrusion detection system
must have a memory-efficient pattern-matching algorithm and hardware design.


1.3 NEED OF THE WORK


String and pattern matching algorithms are now used everywhere in
daily life. Some of the main applications are security systems, employee attendance
systems, and banking applications. For example, storing the records of millions of
customers and matching them against a given input, as in banking, requires a large
memory, and the complexity grows with the size of the data set. The concept proposed here
can be applied to such applications.
In this project, I propose a pattern-matching algorithm that significantly
reduces the memory requirement. The new algorithm (the Aho-
Corasick algorithm combined with merg_FSM) reduces the number of states and
transitions by merging pseudo-equivalent states while maintaining the correctness of string
matching.


CHAPTER-2
NETWORK INTRUSION DETECTION SYSTEM
Since gaining popularity, intrusion detection systems have
begun to be used frequently as one component of an effective layered security model for
an organization. Various alterations of what started as monitoring systems spurred
interest, and intrusion detection quickly became known as an important computer security
tool for individual computers as well as for computer networks. Today these systems are used in
many places both inside and outside security perimeters and in many different ways.
What remains essential is that the information collected through detection can be turned
into powerful intelligence if put to use to strengthen computer security in the areas of
intrusion prevention, preemption, deterrence, deflection, and countermeasures.
Understandably, a protected system or network is only as secure as its defences are
strong. In the intrusion detection systems that we focus on in this thesis, we show how
pattern matching is a critical ability, and that it must be a strength of the system.
Unfortunately, in the past it has been identified as a visible and exploitable weakness, and
as such, has been the topic of much specialized research for some years now. Although
intrusion detection systems have various uses, as explained further in this chapter, many
types of these systems rely heavily on pattern matching within certain core components.
Moreover, pattern matching is widely used in many computer security applications.
In this chapter we give the background for intrusion detection systems (IDSs).
This content, along with the state of the art in pattern matching algorithms used within
IDSs, is important to grasp in order to understand the contributions of this thesis.
Intrusion detection covers a broad range of digital security because IDSs have a wide
range of uses. In general, these systems automate the process of extracting intelligence
about past or present actions that attempt to compromise the confidentiality, integrity, or
availability of a resource. The definition of an intrusion in this context is not fixed, but
rather is a concept that changes depending on the administration or objective of the
system.


More specifically, the intelligence and information provided by an IDS is contingent
upon how the system is being used, and is as important as the chosen IDS itself. Indeed,
there are many ways to use IDSs. If and when an IDS discovers an intrusion, regardless
of how it has been defined, it is common for a system to make a record or report of the
intrusion, typically by way of logging or generating an alert that is sent off to an
appropriate party. More and more, these systems are built to act not only as a judge of
intrusions, but also to react to them as we show in the next section. In the subsequent
sections of this chapter we review a classification of intrusion detection systems that is
familiar to the intrusion detection community. Considering this classification helps to
narrow the focus of the context in which an action or inaction constitutes an intrusion.
An understanding of this classification will clarify the scope of the contributions
of this thesis. We examine more closely the domain of IDSs where our contributions are
positioned. Section 2.1 discusses passive versus reactive IDSs. Section 2.2 discusses
misuse-based versus anomaly-based IDSs. Section 2.3 discusses network-based versus
host-based IDSs. It is in this section that we dig deeper showing that the implementation
and comparison work of this thesis is contained in the intersection between network-
based and misuse-based IDSs. Section 2.3. focuses on this specific intersection under
examination, and when it is appropriate to use systems in this class. Section 2.3. deals
with best practices of how network-based IDSs like these are used within networks.
Finally, Section 2.3.gives an overview of the architecture of an example IDS in this class
which illustrates and motivates the importance of the algorithms used within such
systems like those discussed in the following chapters.

2.1 PASSIVE VERSUS REACTIVE IDSS


Traditionally, intrusion detection systems were passive monitoring systems. As
the name indicates, the nature of detection does not involve any form of response to the
intrusion. The model upon which classic IDSs are supposed to have been built is,
therefore, that of a passive system. A passive system is one in which sensors detect
intrusions and report them to the system's reporting engine which, depending on its
capability and configuration, could format and log the intrusion to a database, a file, or
even a computer console.


A passive system may also signal an alarm of some kind, but its distinguishing
characteristic is that it does not deal directly with the intrusion to stop it or prevent future
intrusions of the same sort in any way. To the contrary, in a reactive IDS a response to
the suspicious activity is performed by, for example, logging off a user or by
reprogramming a firewall to block network traffic from the suspected malicious source.
If the system takes action to directly affect the current or future intrusions, it may
be designated as an intrusion prevention system (IPS). For example, in the context of a
network-based intrusion prevention system, the system may be able to directly terminate
or rate-limit connections. This differs from a reactive intrusion detection
system only in that it performs the action itself. A reactive intrusion detection system, in
contrast to the above example, might have signaled a firewall or another network
appliance to terminate the malicious connection under suspicion. Firewalls and even
application layer firewalls differ in the sense that they do not usually have the capability
to search for anomalies or specific content patterns (or keywords) called signatures
(discussed further in Section 2.2) to the same degree as intrusion detection and prevention
systems do. Despite the classification that we present here, we point out that it is of
course not impossible to make a firewall with all the same capabilities that an IDS
possesses. It is often simply a matter of which names and terms are chosen to best market
the system. Furthermore, although here the example of an intrusion prevention system is
network-based, intrusion prevention systems could be host-based acting to deny
potentially malicious activity.

2.2 MISUSE-BASED VERSUS ANOMALY-BASED IDSS


Misuse detection is very comparable to classical or first-generation virus
scanning. It involves dealing with the input to the system and searching it for what the
IDS rules refer to as patterns of misuse, however those rules define them. Accordingly,
we note that in the context of IDSs, the term misuse does not have to refer to an
attack by an insider or authorized user. A pattern in this context may be very simple, like
looking for a specific string of bytes at a given position or at any position.


It may also be rather complex like matching a regular expression, for example,
involving the presence of one string and the subsequent omission of another string within
a certain range of bytes. It may not even involve what is typically thought of as a pattern,
but instead search for a predefined harmful state that constitutes an abuse. These misuse
patterns are very often of the same nature as patterns in strings or regular expressions,
and in the area of IDSs are referred to as signatures. Consequently, misuse-based
intrusion detection systems are also known as signature-based intrusion detection systems
(and sometimes knowledge- or rule-based IDSs).
The nature of the patterns present in an intrusion detection system depends on
the power of the system itself and its intended use. Naturally, a system that can search for
matches of complex patterns must have a more complex language to allow the system's
users to describe the patterns. In general, systems that match complicated
patterns would also be expected to take longer to process the input than systems that can
only match simpler patterns. These patterns or signatures are predefined and preloaded
into a system before it starts processing input. When the system starts processing input
and, in effect, searching for possible signature matches, the signature
complexity and the number of signatures may greatly affect the speed of processing
which, depending on the circumstances, may be very important.
Because the misuse-based IDSs are only as good as their signatures, the
effectiveness of the system is clearly evident offline by simply looking at the
completeness of the rule set or signatures it will search for. Often this list of signatures is
referred to as a database of signatures. With this style of IDS the primary resource that is
updated frequently is the database of signatures. Although the terms signature
and rule are often used interchangeably, they are different; it is customary for
a rule to hold a signature along with supplementary information, such as the alert to report
if the signature is encountered in a search.
While it is true that these kinds of systems only detect known attack classes, this does
not mean they are not valuable for detecting new attacks. It is easy to assume
that known attack-class signatures identify only old attacks, yet this is not necessarily the
case. Known attack classes are simply ones for which a vulnerability exists.


It may or may not have been already exploited in a specific attack. Also,
depending on the level of sophistication of the system's signatures, it may be able to stop
whole classes of possible attacks with a certain set of signatures. By stark contrast to
misuse detection, anomaly-based (also called behavior based) intrusion detection systems
do not rely on definitions of what is suspected as malicious, incorrect, or abnormal.
Rather they are programmed to identify what is normal; hence, they should also identify
what falls outside this range.
Typically the system's heuristics of what is normal are learned through self-
learning and keeping statistics, but rules may also be input by a user. Anomaly IDSs
can, thus, be characterized as identifying unknown actions, and consequently, the output
from such a system may be harder to interpret. Unfortunately, it is also possible that the
conditioning process of learning normal behavior is corrupted if the initial conditioning
happens during an attack or another anomaly. Sometimes specification-based detection is
considered as another classification, which is slightly different from misuse and anomaly
detection. Instead of detecting bad or anomalous states, it aims to detect states that are
not known to be good; that is, it detects actions that violate a specification of valid
actions (often on a per-program basis). It can be thought of as anomaly detection with all
good behavior pre-programmed (specified) rather than having the IDS learn the typical
and normal behavior.

2.3 HOST-BASED VERSUS NETWORK-BASED IDSS


A host-based intrusion detection system (HIDS) consists of an application,
generally software, on a machine that is designed to inspect input actions that are
internal to the machine like system calls, application and audit logs, file-system
modifications, and other host activities and states. A commonality often seen in HIDSs is
the use of an object or checksum database that catalogs the last or known good states of
the objects being monitored. Attackers that know of a HIDS on their target system
may try to circumvent the HIDS's detection by covering up traces of their attacks through
modifying entries in this database so as not to set off alarms during the next HIDS scan.
For this reason a HIDS database needs to be strongly, often cryptographically, protected.


A network-based intrusion detection system (NIDS) may take the form of an
independent network appliance or device tapped into the network with associated
processing capabilities. It monitors network activity, and therefore, its input is solely in
the form of the traffic on the network. Since attacks on networks or on machines
within them frequently originate outside of the network in question, NIDSs have a wide range of
possible attacks to detect from the outside (ingress).
These typically include, but are not limited to, denial of service (DoS) attacks,
port-scans, spreading viruses, and attempts to break into or exploit vulnerabilities in
computer systems by malicious individuals, worms, or other malware self-spreading on
the network. However, NIDSs can also help to warn about or guard against attacks within
the network and sensitive data leaving the relevant network.


CHAPTER-3
PATTERN MATCHING ALGORITHMS LITERATURE

In this chapter we will discuss different pattern matching algorithms in detail. In
order to accommodate the increasing number of attack patterns and meet the throughput
requirement of networks, a successful network intrusion detection system must have
a memory-efficient pattern-matching algorithm and hardware design.

3.1 SINGLE-KEYWORD PATTERN MATCHING ALGORITHMS


Pattern matching algorithms solve the general keyword pattern matching
problem. That is, given a fixed and finite non-empty set of keywords and an input
string, they find all occurrences of any of the keywords in the input string [8]. In
this problem the input string is finite as well, but often a set of (multiple) input strings is
used as input when searching for the keywords. In our case in particular, the input strings
will be packets in the detection engine of the NIDS, and therefore, there will be many of
them to process rapidly. This implies an alphabet size of 256 (quite large), and the size of
the set of keywords on the other hand will be extremely small in comparison to the
number of input strings. Furthermore, the set of keywords is known before the algorithm
begins processing the input. Should this not be the case, if efficient modifications to the
keyword set are needed, the searching process is known as dynamic string matching [6].

Herein, we refer to computation performed on the set of keywords before
processing the input as offline computation, pre-computation, or preprocessing. Because
the time involved in pattern matching (processing the input in search of matches) will far
outweigh the pre-computation time, the performance of the pre-computation is not
emphasized. This is typical when analyzing a pattern matching algorithm, partly
because the length of the input to be processed, which may not even be available at pre-
computation time, dominates the length of the algorithm parameters to be preprocessed
(i.e., the keyword set). Thus, our assumptions about the keyword set and input string set
are common assumptions.


The general keyword pattern matching problem has a specific instance that has
been shown empirically to be easier to solve than the general problem. Before examining
algorithms of interest to solve the general problem in Chapter 4, we present some classic
algorithms solving this special case as necessary background and a stepping stone to
understanding the solutions to the general problem. This special situation is the case
when the size of the keyword set is one. This is also known as a singleton keyword set
[8].
For the discussion of the single-keyword pattern matching algorithms we use the
convention that the keyword x has length m and the input string y has length n. The
lengths represent the number of characters and in our case a character is any possible 8-
bit configuration, thus, taking one byte. We do not consider the case of multiple input
strings as the algorithms are simply repeated for all input strings. Of course customarily,
the pre-computation only needs to be done once before processing the first of the input
strings. Note that in our pseudocode for the pattern matching algorithms presented in this
thesis, we use the convention of outputting all the indexes of the character (byte) in y that
matches the leftmost character in the keyword x (or in a keyword from the keyword set in
the multiple-pattern matching algorithms presented in following chapters).
The two key criteria we examine with the description of each algorithm are the
running-time performance of the algorithm and the memory space required. The running-
time performance, also referred to as time complexity, is measured in the number of
machine steps, and in this case we are primarily concerned with character or byte
comparisons. Of course, fewer steps correspond to a faster and more favorable algorithm.
The time complexity will be considered for the average and worst cases of the algorithm's
execution. By this we mean that the performance may depend on the keyword and the
input as well as the algorithm itself. Another way to think of this is that the time
complexity changes depending on the algorithm parameters. The average- and worst-case
time complexities are the cases when the parameters cause the algorithm's performance
either to behave as expected on average or to degrade to its worst
performance, whereby the time and memory complexity may increase.


The second criterion, the amount of memory consumed while the algorithm runs,
is considered only in addition to the space necessary to store the keyword and input. Of
course the keyword (and, in the next chapter, the keyword set) must always be stored;
thus, we take this for granted and ignore it in our analysis. Often the same assumption is
made about the input, although the input may actually be coming in on-the-fly, and most
algorithms need only to keep a certain amount, sometimes called a window, of the
input. Generally, usage of less memory is favorable.

3.2 BRUTE FORCE ALGORITHM


The brute force algorithm's methodology is very simple to understand. The brute
force algorithm consists of checking, at every position in the input between positions 0 and
n - m (left to right), whether an occurrence of the pattern starts there or not. After each
attempt, the algorithm shifts the pattern by exactly one position to the right [6].
Algorithm 3.1 gives the pseudocode for this simple approach to single-keyword pattern
matching.
This algorithm is considered naive because it does no pre-computation on
the keyword. However, that is advantageous if memory space is a concern, since it keeps only a
small constant (not a function of n or m) amount of information about its current position. The
worst-case time complexity of this algorithm is O(m(n - m + 1)) (i.e., O(nm)) [5], and the
expected number of comparisons is at most 2(n - m + 1) (i.e., O(n)) for randomly chosen strings
[2]. That is, on average we perform a constant number of comparisons at each position.
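
Since the pseudocode of Algorithm 3.1 is not reproduced here, the following Python sketch (an illustration, not the literal Algorithm 3.1) shows the brute force method under the conventions above, reporting the index in y at which each occurrence of x begins.

def brute_force(x, y):
    """Report every index in y where the keyword x begins (naive left-to-right scan)."""
    m, n = len(x), len(y)
    matches = []
    for i in range(n - m + 1):           # candidate start positions 0 .. n - m
        j = 0
        while j < m and y[i + j] == x[j]:
            j += 1                       # compare character by character
        if j == m:                       # all m characters matched at position i
            matches.append(i)
        # the pattern is then shifted by exactly one position to the right
    return matches

print(brute_force("abc", "xabcabcy"))    # -> [1, 4]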

3.3 KARP-RABIN ALGORITHM


Rabin and Karp [9] proposed a pattern matching algorithm that also generalizes to
two-dimensional pattern matching [2]. This algorithm is based on the work of Rabin and,
specifically, uses the Rabin fingerprinting technique [6], which can be most simply
thought of as a number-theoretic notion for the purposes of the discussion here. Rabin's
fingerprinting technique is used in this algorithm much like a hash function. It has
special mathematical properties, discussed further below, that account for it sometimes
being referred to as a rolling hash.


A Rabin fingerprint is a short tag like a hash value for a larger input [5]. These
fingerprints share the same property with output values from hash functions like MD5 in
that if two fingerprints are different, then the corresponding inputs that were used to
create them are different. Furthermore, there is only a small chance of two different
objects having the same fingerprint. We call this property a small probability of collision.

Consider the fingerprint of the keyword and the set of all (n - m + 1) fingerprints made of the
(n - m + 1) substrings of length m found in the input string of length n. We can compare these
fingerprints instead of actually comparing the keyword against the portion of the input.

This saves comparisons at each position (compare to the inner loop in Algorithm
3.1) because instead of comparing things of length m, we compare only the fingerprints.
Moreover, if we used a normal hash function to calculate the fingerprints it
would typically take O(m) time to generate a fingerprint (linear in the length of the input to be
hashed); however, using Rabin's fingerprinting method we can calculate a fingerprint on
an input of length m in constant time using a few simple machine operations. This is
not the case in general, but in the case at hand we can incrementally update the
fingerprint result as the window of length m slides over the input. During this time we
maintain only two fingerprints: the current one for the window and the one for the
keyword, which together take O(size of fingerprint) memory space.
Consider two byte strings for the input and the keyword. The first fingerprint
over the first m bytes (0 ... m - 1) of the n-byte input is calculated in time O(m), but when
we shift the window to calculate the fingerprint of the input for the next m bytes (1 ... m)
the fingerprint can be updated in constant time. This is true for all subsequent positions as
we shift the window over the input string. The key to the speed of this algorithm is that
the full fingerprint calculation is done only twice: first for the keyword itself and once for
the first m bytes of the input. The first of these operations can be considered pre-
computation since it is only done once. The second of these operations will have to be
done once per input string to be searched, but we assume this is still only once in our
pseudocode (only one input string y is accepted). Hence, we say that the pre-computation
phase of the Karp-Rabin algorithm takes time O(m).


As the window slides through the input updating the fingerprint result, the result and the
keyword's fingerprint can be compared to check for a match. This sliding and comparison process
will happen exactly n - m + 1 times and each step can be done in constant time; therefore, the
matching phase of this algorithm runs in expected time O(n - m + 1).

Unfortunately, a match between the fingerprints does not guarantee a
pattern match, because there is a (negligibly small) probability that two Rabin fingerprints
are the same even when the sources from which they were created are different (for a
discussion of this probability see Broder [5]). In order to guarantee a match, the actual
keyword and the window portion of the input must be directly compared. When the
fingerprints match but the keyword is not matched it is referred to as a spurious hit [2]. In
theory, if we had a spurious hit at every position of the input checked, we would have a
worst-case time complexity of O(m(n - m + 1)), which is no better than the brute force
algorithm.
In practice, spurious hits can be made very infrequent by fine-tuning the
parameters to the Rabin fingerprinting algorithm. Therefore, if we do not concern
ourselves with this unlikely worst case the Karp-Rabin algorithm is a very good and
simple algorithm to implement once the Rabin fingerprinting algorithm is handled.
Lastly, it is also common to see implementations that use a simpler function than the
original Rabin fingerprint function. This works and keeps the same time complexity so
long as the new function maintains the constant time update property so that a window
slide to the next position and the corresponding fingerprint update is done in constant
time.
The update to the fingerprint at each position taking place in constant time is
theoretical, but in practice this operation may be simple or complex depending on the
chosen function; thus, this is an important consideration when changing the function.
Algorithm 3.2 demonstrates the Karp-Rabin method using a simpler function than the
Rabin fingerprinting technique. This simpler function is from Cormen et al. [2] and uses a
prime q; arithmetic computations are performed modulo q.


For efficiency in this algorithm with this simpler function, the prime value q
should be chosen such that 256q fits just inside one computer word, which allows all the
necessary computations to be performed with single-precision arithmetic [4]. The
constant 256 is chosen to match the alphabet size for our purposes.
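
The following Python sketch illustrates the Karp-Rabin method with the simpler modular hash described above. The radix d = 256 matches our alphabet size; the small prime q = 101 is chosen only for readability here and does not follow the single-word sizing rule just mentioned.

def karp_rabin(x, y, d=256, q=101):
    """Karp-Rabin matching with a rolling hash computed modulo a prime q."""
    m, n = len(x), len(y)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)                 # weight of the leading character in the window
    hx = hy = 0
    for i in range(m):                   # pre-computation phase: O(m)
        hx = (d * hx + ord(x[i])) % q
        hy = (d * hy + ord(y[i])) % q
    matches = []
    for i in range(n - m + 1):
        if hx == hy and y[i:i + m] == x: # direct comparison rules out spurious hits
            matches.append(i)
        if i < n - m:                    # slide the window: constant-time update
            hy = (d * (hy - ord(y[i]) * h) + ord(y[i + m])) % q
    return matches

print(karp_rabin("abc", "xabcabcy"))     # -> [1, 4]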


CHAPTER-4
FINITE STATE MACHINE
This chapter describes the construction of various finite state machines and the
procedure to design the transition table.

4.1 FINITE STATE MACHINE


A finite state machine is usually specified in the form of a transition table, much
like the one shown in Table 4.1 below

Table 4.1 Set of transition rules for a finite state machine

  Condition              Effect
  Current state   Input    Output   Next state
  q0              ---      1        q2
  q1              ---      0        q0
  q2              0        0        q3
  q2              1        0        q1
  q3              0        0        q0
  q3              1        0        q1

For each control state of the machine the table specifies a set of transition rules.
There is one rule per row in the table, and usually more than one rule per state. The
example table contains transition rules for control states named q0, q1, q2, and q3. Each
transition rule has four parts, each part corresponding to one of the four columns in the
table. The first two are conditions that must be satisfied for the transition rule to be
executable. They specify the control state in which the machine must be and a condition on
the environment of the machine, such as the value of an input signal. The last two
columns of the table define the effect of the application of a transition rule.


They specify how the environment of the machine is changed, e.g., how the
value of an output signal changes, and the new state that the machine reaches if the transition
rule is applied. In the traditional finite state machine model, the environment of the
machine consists of two finite and disjoint sets of signals: input signals and output
signals. Each signal has an arbitrary, but finite, range of possible values.
The condition that must be satisfied for the transition rule to be executable is then
phrased as a condition on the value of each input signal, and the effect of the transition
can be a change of the values of the output signals. The machine in Table 4.1 illustrates
that model. It has one input signal, named In, and one output signal, named Out.
A dash in one of the first two columns is used as a shorthand to indicate a don't-
care condition (one that always evaluates to the boolean value true). A transition rule, then,
with a dash in the first column applies to all states of the machine, and a transition rule
with a dash in the second column applies to all possible values of the input signal. Dashes
in the last two columns can be used to indicate that the execution of a transition rule does
not change the environment. A dash in the third column means that the output signal does
not change, and similarly, a dash in the fourth column means that the control state
remains unaffected. In each particular state of the machine there can be zero or more
transition rules that are executable. If no transition rule is executable, the machine is said
to be in an end state. If precisely one transition rule is executable, the machine makes a
deterministic move to a new control state. If more than one transition rule is executable a
nondeterministic choice is made to select a transition rule. A nondeterministic choice in
this context means that the selection criterion is undefined. Without further information
either option is to be considered equally likely. From here on, we will call machines that
can make such choices nondeterministic machines. Table 4.2 illustrates the concept.
Two transition rules are defined for control state q1. If the input signal is one, only the
first rule is executable. If the input signal is zero, however, both rules will be executable
and the machine will move either to state q0 or to state q3.


Table 4.2 State transitions for q1 in the finite state machine

  Current state   Input    Output   Next state
  q1              ---      0        q0
  q1              0        0        q3

The behavior of the machine in Table 4.2 is more easily understood when
represented graphically in the form of a state transition diagram, as shown in Figure 4.1.

Fig 4.1 State transition diagram of the finite state machine


The control states are represented by circles, and the transition rules are specified
as directed edges. The edge labels are of the type C/E, where C specifies the transition
condition (e.g., the required set of input values) and E the corresponding effect (e.g., a
new assignment to the set of output values). Compared to Turing machines, the above
definition of a finite state machine is intuitively the simplest. There are many variants of this
basic model that differ in the way that the environment of the machines is defined and thus in
the definition of the conditions and the effects of the transition rules. For truly finite state
systems, of course, the environment must be finite state as well (e.g., it could be defined
as another finite state machine). If this requirement is dropped, we obtain the well-known
Turing Machine model. It is used extensively in theoretical computer science as the
model of choice in, for instance, the study of computational complexity.


The Turing machine can be seen as a generalization of the finite state machine
model, although Turing's work predates that of Mealy and Moore by almost two decades.
The environment in the Turing machine model is a tape of infinite length. The tape
consists of a sequence of squares, where each square can store one of a finite set of tape
symbols. All tape squares are initially blank. The machine can read or write one tape
square at a time, and it can move the tape left or right, also by one square at a time.
Initially the tape is empty and the machine points to an arbitrary square. The condition of
a transition rule now consists of the control state of the finite state machine and the tape
symbol that can be read from the square that the machine currently points to. The effect
of a transition rule is the potential output of a new tape symbol onto the current square, a
possible left or right move, and a jump to a new control state. The tape is general enough
to model a random access memory, be it an inefficient one. Table 4.3 illustrates this type
of finite state machine.

Table 4.3 Finite state machine with two output signals

  Condition              Effect
  Current state   Input    Output   Next state
  q0              0        1/L      q1
  q0              1        1/R      q2
  q1              0        1/R      q0
  q1              1        1/L      ---
  q2              0        1/R      q1
  q2              1        1/L      q3
  q3              ---      ---      ---

This machine has two output signals: one is used to overwrite the current square
on the tape with a new symbol, and one is used to move the tape left or right one square.
State q3 is an end state. It is fairly hard to define an extension of this variant of the model
with a practical method for modeling the controlled interaction of multiple finite state
machines.


The obvious choice would be to let one machine read a tape that is written by
another, but this is not very realistic. Furthermore, the infinite number of potential states
for the environment means that many problems become computationally intractable. For
the study of protocol design problems, therefore, we must explore other variants of the
finite state machine.

4.2 COMMUNICATING FINITE STATE MACHINES


Consider what happens if we allow overlap of the sets of input and output signals
of a finite state machine of the type shown in Table 4.1. In all fairness, we cannot say
what will happen without first considering in more detail what a signal is. We assume
that signals have a finite range of possible values and can change value only at precisely
defined moments. The machine executes a two-step algorithm. In the first step, the input
signal values are inspected and an arbitrary executable transition rule is selected. In the
second step, the machine changes its control state in accordance with that rule and
updates its output signals. These two steps are repeated forever. If no transition rule is
executable, the machine will continue cycling through its two-step algorithm without
changing state, until a change in the input signal values, effected by another finite state
machine, makes a transition possible.
A signal, then, has a state, much like a finite state machine. It can be interpreted
as a variable that can only be evaluated or assigned to at precisely defined moments. The
behavior of the machine from Table 4.1 is now fully defined, even if we assume a
feedback from the output to the input signal. In this case the machine will loop through
the following sequence of three states forever: q0, q2, q1. At each step, the machine
inspects the output value that was set in the previous transition. The behavior of the
machine is independent of the initial value of the input signal. We can build elaborate
systems of interacting machines in this way, connecting the output signals of one
machine to the input signals of another.
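
The three-state cycle described above can be checked with a short simulation (a Python sketch, not part of the design itself) that encodes the rules of Table 4.1 and feeds each output value back as the next input value.

# Transition rules of Table 4.1 as (state, input) -> (output, next state).
# A dash in the input column is a don't-care, expanded here to both input values.
RULES = {
    ("q0", 0): (1, "q2"), ("q0", 1): (1, "q2"),
    ("q1", 0): (0, "q0"), ("q1", 1): (0, "q0"),
    ("q2", 0): (0, "q3"), ("q2", 1): (0, "q1"),
    ("q3", 0): (0, "q0"), ("q3", 1): (0, "q1"),
}

def run_with_feedback(start="q0", inp=0, steps=9):
    """Feed the output signal back into the input signal and trace the visited states."""
    state, trace = start, [start]
    for _ in range(steps):
        out, state = RULES[(state, inp)]
        inp = out                        # feedback: the output becomes the next input
        trace.append(state)
    return trace

print(run_with_feedback(inp=0))          # q0, q2, q1, q0, q2, q1, ...
print(run_with_feedback(inp=1))          # the same cycle, independent of the initial input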
The machines must share a common clock for their two-step algorithm, but
they are not otherwise synchronized. If further synchronization is required, it must be
realized with a subtle system of handshaking on the signals connecting the machines.


Such handshaking has three noticeable features: it is a hard problem, it has been solved, and,
from the protocol designer's point of view, it is irrelevant. Most systems provide a designer
with higher-level synchronization primitives to build a protocol. An example of such
synchronization primitives is the send and receive operations defined in PROMELA.

ASYNCHRONOUS COUPLING
In protocol design, finite state machines are most useful if they
can directly model the phenomena in a distributed computer system.
There are two different and equally valid ways of doing this, based on an
asynchronous or a synchronous communication model. With the asynchronous model, the
machines are coupled via bounded FIFO (first-in first-out) message queues. The signals of
a machine are now abstract objects called messages. The input signals are retrieved from
input queues, and the output signals are appended to output queues. All queues, and the
sets of signals, are still finite, so we have not given up the finiteness of our model.
Synchronization is achieved by defining both input and output signals to be conditional
on the state of the message queues. If an input queue is empty, no input signal is available
from that queue, and the transition rules that require one are unexecutable. If an output
queue is full, no output signal can be generated for that queue, and the transition rules
that produce one are similarly unexecutable.
From this point on we restrict the models we are considering to those with no
more than one synchronizing event per transition rule; that is, a single rule can specify an
input or an output, but not both. The reason for this restriction is twofold. First, it
simplifies the model. We do not have to consider the semantics of complicated
composites of synchronizing events that may be inconsistent (e.g., two outputs to the
same output queue that can accommodate only one of the two). Second, it models the real
behavior of a process in a distributed system more closely. Note that the execution of a
transition rule is an atomic event of the system. In most distributed systems a single send
or receive operation is guaranteed to be an atomic event. It is therefore appropriate not to
assume yet another level of interlocking in our basic system model.
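
As a small illustration of these executability rules, the following Python fragment (a sketch; the encoding of queues and rules is our own) makes an output action executable only when the destination queue has a free slot, and an input action executable only when a matching message is waiting at the head of its queue. A rule carries at most one synchronizing event, as restricted above.

from collections import deque

class BoundedQueue:
    """A bounded FIFO message queue coupling two machines."""
    def __init__(self, slots):
        self.slots, self.contents = slots, deque()
    def can_send(self):    return len(self.contents) < self.slots
    def can_receive(self): return len(self.contents) > 0
    def send(self, msg):   self.contents.append(msg)
    def receive(self):     return self.contents.popleft()

def executable(rule, queues):
    """rule = ('in' | 'out', queue name, message): one synchronizing event per rule."""
    kind, qname, msg = rule
    if kind == "in":                     # input: a matching message must be waiting
        q = queues[qname]
        return q.can_receive() and q.contents[0] == msg
    return queues[qname].can_send()      # output: the queue must not be full

queues = {"ch": BoundedQueue(slots=1)}
rule = ("out", "ch", "Merg0")            # message name borrowed from Table 4.4 below
if executable(rule, queues):
    queues["ch"].send("Merg0")
print(list(queues["ch"].contents))       # ['Merg0']; a second output to 'ch' is now unexecutable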


Table 4.4 Acknowledgement and retransmission of the output

  Current state   Input    Output   Next state
  q0              ---      Merg0    q1
  q1              Ack1     ---      q0
  q1              Ack0     ---      q2
  q2              ---      Merg1    q3
  q2              Ack0     ---      q2
  q3              Ack1     ---      q0

The table can model the possibility of retransmissions in this way, though not
their probability. Fortunately, this is exactly the modeling power we need in a system that
must analyze protocols independently of any assumptions on the timing or speed of
individual processes. The last received message can be accepted as correct in states q1
and q4. A state transition diagram for Tables 4.4 and 4.5 is given in Figure 4.2. The
timeout option in the sender would produce an extra self-loop on states q1 and q3.

Fig 4.2 FSM for transition table 4.4 Fig 4.3 FSM for table 4.5

We do not have parameter values in messages just yet. In the above model the
value of the alternating bit is therefore tagged onto the name of each message.


4.3 FORMAL DESCRIPTION


Let us now see if we can tidy up the informal definitions discussed so far. A
communicating finite state machine can be defined as an abstract demon that accepts
input symbols, generates output symbols, and changes its inner state in accordance with
some predefined plan. For now, these symbols or messages are defined as abstract
objects without
contents. We will consider the extensions required to include value transfer in Section
7.8. The finite state machine demons communicate via bounded FIFO queues that map
the output of one machine upon the input of another. Let us first formally define the
concept of a queue. A message queue is a triple (S, N, C), where:
S is a finite set called the queue vocabulary
N is an integer that defines the number of slots in the queue,
and C is the queue contents, an ordered set of elements from S.
The elements of S and C are called messages. They are uniquely named, but
otherwise undefined abstract objects. If more than one queue is defined, we require that
the queue vocabularies be disjoint. Let M be the set of all message queues; a superscript m
is used to identify a single queue, and an index n is used to identify a slot within that
queue. C_n^m, then, is the nth message in the mth queue. A system vocabulary V can
be defined as the union of all queue vocabularies, plus a null element. Given the set of
queues M, numbered from 1 to M, the system vocabulary V is then the union of the
individual queue vocabularies S^1, ..., S^M together with the null element. Now, let us define
a communicating finite state machine. A communicating finite state machine is a tuple
(Q, q0, M, T), where Q is a finite, non-empty set of states, q0 is an element of Q, the initial
state, M is a set of message queues, as defined above, and T is a state transition relation.
Relation T takes two arguments, T(q, a), where q is the current state and a is an action. So
far, we allow just three types of actions: inputs, outputs, and a null action.
The executability of the first two types of actions is conditional on the state of the
message queues. If executed, they both change the state of precisely one message queue.
Beyond this, it is immaterial, at least for our current purposes, what the precise definition
of an input or an output action is. The transition relation T defines a set of zero or more
possible successor states in set Q for current state q.


This set will contain precisely one state, unless non-determinism is modeled, as in
Table 4.2. When T(q, a) is not explicitly defined, we assume it is empty. T(q, null), with the
null action, specifies spontaneous transitions. A sufficient condition for these transitions to be
executable is that the machine be in state q.
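
A compact way to picture these definitions is the following Python sketch, which records the tuple (Q, q0, M, T) together with the executability conditions for input, output, and null actions. It is an informal illustration of the formal model, not an implementation used in this project.

from dataclasses import dataclass, field

@dataclass
class MessageQueue:
    """A message queue (S, N, C): vocabulary, number of slots, ordered contents."""
    S: frozenset
    N: int
    C: list = field(default_factory=list)

@dataclass
class CommunicatingFSM:
    """A communicating finite state machine (Q, q0, M, T)."""
    Q: frozenset          # finite, non-empty set of states
    q0: str               # initial state, an element of Q
    M: dict               # named message queues
    T: dict               # (state, action) -> set of possible successor states

    def executable(self, state, action):
        kind, qname, msg = action        # action = ('in' | 'out' | 'null', queue, message)
        if (state, action) not in self.T:
            return False                 # T(q, a) not defined: treated as the empty set
        if kind == "in":                 # input: the expected message must be at the head
            q = self.M[qname]
            return bool(q.C) and q.C[0] == msg
        if kind == "out":                # output: the queue must have a free slot
            return len(self.M[qname].C) < self.M[qname].N
        return True                      # null action: a spontaneous transition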

4.4 MINIMIZATION OF MACHINES


Consider the finite state machine shown in Table 4.5, with the corresponding
state transition diagram in Figure 4.3.

Table 4.5 Acknowledgement and retransmission of the output

  Condition              Effect
  Current state   Input    Output   Next state
  q0              merg1    ---      q1
  q0              merg0    ---      q2
  q1              ---      ack      q0
  q2              ---      ack      q0

Though this machine has fewer states than the machine from Table 4.4, it
certainly looks like it behaves no differently. Two machines are said to be equivalent if
they can generate the same sequence of output symbols when offered the same sequence
of input symbols. The key word here is can. The machines we study can make
nondeterministic choices between transition rules if more than one is executable at the
same time.
This non-determinism means that even two equal machines can behave differently
when offered the same input symbols. The rule for equivalence is that the machines must
have equivalent choices to be in equivalent states.


States within a single machine are said to be equivalent if the machine can be
started in any one of these states and generate the same set of possible sequences of
outputs when offered any given test sequence of inputs. The definition of an appropriate
equivalence relation for states, however, has to be chosen with some care. Consider the
following PROMELA process
proctype A()
{
   if
   :: q?a -> q?b
   :: q?a -> q?c
   fi
}
Under the standard notion of language equivalence that is often defined for
deterministic finite state machines, this would be equivalent to
proctype B()
{
   q?a;
   if
   :: q?b
   :: q?c
   fi
}
since the set of all input sequences (the language) accepted by both machines is
the same: it contains two sequences of two messages each, { q?a;q?b , q?a;q?c }. The
behavior of the two processes, however, is very different. The input sequence q?a;q?b,
for instance, is always accepted by process B but may lead to an unspecified reception in
process A. For nondeterministic communicating finite state machines, therefore, processes
A and B are not equivalent. The definitions given below will support that notion. In the
following discussion of equivalence, state minimization, and machine composition, we
will focus exclusively on the set of control states Q and the set of transitions T of the
finite state machines. Specifically, the internal state of the message queues in set M is
considered to be part of the environment of a machine and not contributing to the state of
the machine itself.


That this is a safe assumption needs some motivation. Consider, as an extreme
case, a communicating finite state machine that accesses a private message queue to store
internal state information. It can do so by appending messages with state information in
the queue and by retrieving that information later. The message queue is internal and
artificially increases the number of states of the machine.


CHAPTER-5
PROPOSED ALGORITHM

5.1 REVIEW OF AC ALGORITHM


This section reviews the AC algorithm. Among all memory-based architectures,
the AC algorithm has been widely adopted for string matching because the algorithm
can effectively reduce the number of state transitions and therefore the memory size.
Using the same example as in Fig. 5.1, Fig. 5.2 shows the state transition diagram derived
from the AC algorithm, where the solid lines represent the valid transitions while the
dotted lines represent a new type of state transition called the failure transitions. The
failure transition is explained as follows.
Given a current state and an input character, the AC machine first
checks whether there is a valid transition for the input character; otherwise, the machine
jumps to the next state pointed to by the failure transition. Then, the machine recursively
considers the same input character until the character causes a valid transition. For example,
when the AC machine is in state 1 and the input character is 'c', the machine jumps to
state 2; otherwise, it takes the failure transition to state 0.

Fig 5.1 Finite state machine


Table 5.1 Transition table for the state machine of Fig. 5.1

  State     Input Character   Next State   Failure State   Match Vector
  State 0   b                 1            0               00
  State 0   p                 5            0               00
  State 1   c                 2            0               00
  State 2   d                 3            0               00
  State 3   f                 4            0               01
  State 5   c                 6            0               00
  State 6   d                 7            0               00
  State 7   g                 8            0               10

According to the AC state table (Table 5.1), there is no valid transition from state 1
on the input character 'p'. When there is no valid transition, the AC machine takes a
failure transition back to state 0. Then in the next cycle, the AC machine reconsiders the
same input character in state 0 and finds a valid transition to state 5. This example shows
that an AC machine may take more than one cycle to process an input character. In
Fig. 5.3, the double-circled nodes indicate the final states of patterns: state 4, the
final state of the first string pattern bcdf, stores the match vector 01, and state 8, the
final state of the second string pattern pcdg, stores the match vector 10. Except for the
final states, the other states store the match vector 00 to simply express that those states are
not final states.
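
The behavior just described can be illustrated with a small Python sketch that encodes Table 5.1 (the encoding and the name ac_step are ours, for illustration): processing 'p' in state 1 takes the failure transition to state 0 in one cycle and the valid transition to state 5 in the next, i.e., two cycles for a single input character.

# Table 5.1 as (state, character) -> next state; all failure transitions point to state 0.
GOTO = {
    (0, 'b'): 1, (0, 'p'): 5,
    (1, 'c'): 2, (2, 'd'): 3, (3, 'f'): 4,
    (5, 'c'): 6, (6, 'd'): 7, (7, 'g'): 8,
}
FAIL = {s: 0 for s in range(9)}
FINAL = {4: "bcdf", 8: "pcdg"}

def ac_step(state, ch):
    """Process one input character; failure transitions may cost extra cycles."""
    cycles = 0
    while (state, ch) not in GOTO and state != 0:
        state = FAIL[state]              # failure transition, same character reconsidered
        cycles += 1
    return GOTO.get((state, ch), 0), cycles + 1

state = 1                                # e.g. after reading 'b'
state, cycles = ac_step(state, 'p')      # no valid transition from state 1 on 'p'
print(state, cycles)                     # -> 5 2: failure to state 0, then a valid move to 5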
Due to the common substrings of string patterns, the compiled AC machine may
have states with similar transitions. Despite the similarity, those similar states are not
equivalent states and cannot be merged directly. In this section, we first show that
functional errors can be created if those similar states are merged directly. Then, we
propose a mechanism that can rectify those functional errors after merging those similar
states. In Fig. 5.3, states 2 and 6 are similar because they have identical input transitions
and identical failure transitions to state 0. Also, states 3 and 7 are similar. Note that merging
similar states directly results in an erroneous state machine. As shown in Fig. 5.4, the
state machine merges the similar states 2 and 6 to become state 26, and merges the
similar states 3 and 7 to become state 37.
Again, we refer to the state machine that merges the similar states as
the merg_FSM. Given an input string pcdf, the original AC state machine shown in
Fig. 5.2 moves from state 0, through state 5, state 6, state 7, and then takes a failure
transition to state 0. On the other hand, the merg_FSM moves from state 0, through state
5, state 26, state 37, and finally reaches state 4, which indicates the final state of the
pattern bcdf. As a result of merging similar states, the input string pcdf is mistaken as
a match of the pattern bcdf. This example shows that the merg_FSM may cause false
positive results.
The merg_FSM is a different machine from the original state machine but with a
smaller number of states and transitions. A direct implementation of merg_FSM has a
smaller memory than the original state machine in the memory architecture. Our
objective is to modify the AC algorithm so that we can store only the state transition table
of merg_FSM in memory while the overall system still functions correctly as the original
AC state machine does. The overall architecture of our state traversal machine is shown
in Fig. 5.6. The new state traversal mechanism guides the state machine to traverse the
merg_FSM and provides the same correct results as the original AC state machine.
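
The false positive described above can be reproduced with a small Python sketch (illustrative only) that traverses the merg_FSM of Fig. 5.4 without any additional path information: the input pcdf wrongly reaches state 4, the final state of the pattern bcdf.

# merg_FSM: states 2 and 6 merged into 26, states 3 and 7 merged into 37.
MERG_GOTO = {
    (0, 'b'): 1, (0, 'p'): 5,
    (1, 'c'): 26, (5, 'c'): 26,
    (26, 'd'): 37,
    (37, 'f'): 4, (37, 'g'): 8,
}
FINAL = {4: "bcdf", 8: "pcdg"}

def naive_merged_match(text):
    """Traverse the merged machine; all failure transitions simply return to state 0."""
    state, hits = 0, []
    for ch in text:
        if (state, ch) not in MERG_GOTO and state != 0:
            state = 0                    # failure transition
        state = MERG_GOTO.get((state, ch), 0)
        if state in FINAL:
            hits.append(FINAL[state])
    return hits

print(naive_merged_match("pcdf"))        # -> ['bcdf']: a false positive, pcdf is not a pattern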

5.2 PROPOSED ALGORITHM (AC_ALGORITHM+MERG_FSM)


In the previous example, state 26 represents two different states (state 2 and state
6) and state 37 represents two different states (state 3 and state 7). We have shown that
directly merging similar states leads to an erroneous state machine. To obtain a correct
result, when state 26 is reached, we need a mechanism to determine whether, in the original AC
state machine, it is state 2 or state 6. Similarly, when state 37 is reached, we need
to know whether, in the original AC state machine, it is state 3 or state 7. In this example,
we can differentiate state 2 from state 6 if we can memorize the precedent state of state 26.
If the precedent state of state 26 is state 1, we know that in the original AC state machine
it is state 2. On the other hand, if the precedent state of state 26 is state 5, the original state is
state 6. This example shows that if we can memorize the precedent state entering the
merged states, we can differentiate all merged states. In the
following section, we discuss how the precedent path vector can be retained during the
state traversal in the merg_FSM.
First of all, we would like to mention that in a traditional AC state machine, a
final state stores the corresponding match vector which is one-hot encoded.

Fig 5.2 Transitions in the AC algorithm

Table 5.2 Transition table of the AC algorithm for Fig. 5.2

  State     Input character   Next state   Failure state   Match vector
  State 0   b                 1            0               01_0
  State 0   p                 5            0               10_0
  State 1   c                 2            0               01_0
  State 2   d                 3            0               01_0
  State 3   f                 4            0               01_1
  State 5   c                 6            0               10_0
  State 6   d                 7            0               10_0
  State 7   g                 8            0               10_1

For example, in Fig. 5.2, state 4, the final state of the first string pattern bcdf,
stores the match vector 01, and state 8, the final state of the second string pattern pcdg,
stores the match vector 10. Except for the final states, the other states store the zero match
vector simply to express that those states are not final states. One-hot encoding of the match
vector is necessary because a final state may represent more than one matched string pattern.


Therefore, the width of the match vector is equal to the number of string patterns. As shown in Fig. 5.1, the majority of the memory in the match vector column stores the zero vector {00}, which is not efficient.
In Fig. 5.2, state 2 and state 6, as well as state 3 and state 7, are reached on the same input characters (c and d, respectively). Therefore, we directly merge states 2 and 6 into a single state 26, and merge states 3 and 7 into a single state 37.

Fig. 5.3 Transitions in the proposed algorithm (AC algorithm + merg_FSM)

Table 5.3 Transition table of the proposed algorithm for Fig. 5.3

State Input character Next state Failure state Match vector


State 0 b 1 0 01_0
State 0 p 5 0 10_0
State 1 c 26 0 11_0
State 5 c 26 0 11_0
State 26 d 37 0 11_0
State 37 f 4 0 01_1
State 37 g 8 0 10_1

In our design, we reuse those memory spaces storing zero vectors {00} to store
useful path information called pathVec. First, each bit of the pathVec corresponds to a
string pattern. Then, if there exists a path from the initial state to a final state, which
matches a string pattern, the corresponding bit of the pathVec of the states on the path
will be set to 1. Otherwise, they are set to 0. Consider the string pattern bcdf whose
final state is state 4 in Fig. 5.2. The path from state 0, via states 1, 2, 3 to the final state 4
matches the first string pattern bcdf. Therefore, the first bit of the pathVec of the states
on the path, {state 0, state 1, state 2, state 3, and state 4}, is set to 1. Similarly, the path
from state 0, via states 5, 6, 7 to the final state 8 matches the second string pattern
pcdg. Therefore, the second bit of the pathVec of the states on the path, {state 0, state
5, state 6, state 7, and state 8}, is set to 1. In addition, we add one more bit, called ifFinal, to indicate whether the state is a final state. For example, because states 4 and 8 are final states, the ifFinal bits of states 4 and 8 are set to 1, and the ifFinal bits of all other states are set to 0. As shown in Fig. 5.3, each state stores the pathVec and ifFinal in the form pathVec_ifFinal. Compared with the original AC state machine in Fig. 5.2, the machine in Fig. 5.3 adds only one additional bit to each state and reduces the design complexity.


Example: State 4 stores pathVec_ifFinal as 01_1, and state 8 stores pathVec_ifFinal as 10_1.
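To make the traversal mechanism concrete, the VHDL sketch below shows one possible implementation of the match-decision part of the traversal: a register (named pre_reg here, mirroring the port names reported in Chapter 8) accumulates the bitwise AND of the pathVec of every visited state, and when a state with ifFinal = 1 is reached the surviving bit identifies which pattern matched. This is only an illustrative sketch under these assumptions, not the exact project code; the transition-table lookup of the merg_FSM is assumed to be handled elsewhere.

-- Hedged sketch: match-decision logic for the merg_FSM traversal.
-- pre_reg accumulates (bitwise AND) the pathVec of every state visited since
-- the initial state; when a state with ifFinal = '1' is entered, the surviving
-- bit of the accumulated vector tells which original pattern was matched.
library ieee;
use ieee.std_logic_1164.all;

entity path_tracker is
  port (
    clk, rst    : in  std_logic;
    state_valid : in  std_logic;                     -- a new state was entered this cycle
    path_vec    : in  std_logic_vector(1 downto 0);  -- pathVec of the state just entered
    if_final    : in  std_logic;                     -- ifFinal bit of the state just entered
    match       : out std_logic;                     -- '1' when some pattern has matched
    match_id    : out std_logic_vector(1 downto 0)); -- one-hot id of the matched pattern
end entity path_tracker;

architecture rtl of path_tracker is
  signal pre_reg : std_logic_vector(1 downto 0) := (others => '1');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        pre_reg  <= (others => '1');       -- restart the path from the initial state
        match    <= '0';
        match_id <= (others => '0');
      elsif state_valid = '1' then
        -- keep only the pattern paths still consistent with the input seen so far
        pre_reg <= pre_reg and path_vec;
        if if_final = '1' then
          match    <= '1';
          match_id <= pre_reg and path_vec;  -- e.g. "01" for bcdf, "10" for pcdg
          pre_reg  <= (others => '1');       -- ready for the next traversal
        else
          match <= '0';
        end if;
      end if;
    end if;
  end process;
end architecture rtl;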
In this project, I proposed a pattern matching algorithm which can significantly reduce the memory requirement. The new algorithm (Aho-Corasick algorithm + merg_FSM) significantly reduces the number of states and transitions by merging pseudo-equivalent states while maintaining the correctness of string matching.


CHAPTER-6
VHDL DESCRIPTION AND FPGA OVERVIEW

This chapter deals with the basic VHDL implementation of a design and the basic concepts of FPGAs.

6.1 VHDL DESCRIPTION


High-level specifications are the requirements needed to understand and begin the design. In this stage the designer's main aim is to capture the behavior of the design using mostly behavioral constructs of the HDL. The next step after capturing the design's functionality is to partition the design and write synthesizable code that infers the available primitives from the library.
Then comes the synthesis step, which is target driven; here the target device is an FPGA. Implementation is the process of placing and routing the design on the FPGA. It is mostly tool driven and requires no manual intervention from the designer, who only needs to specify a constraint file for the design, if any.
Fig. 6.1 shows the design flow: high-level specifications, behavioral description, RTL design and RTL simulation, synthesis targeting the FPGA device, implementation (place and route), bit file generation, and finally dumping the bit file onto the FPGA.

Fig. 6.1 Design Flow Chart
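As a small, hedged illustration of the behavioral-description stage of this flow, the VHDL sketch below describes a simple 8-bit up-counter (similar in spirit to the counter design used in the ModelSim tutorial of Chapter 7); the entity and port names are chosen for illustration only, not taken from the project sources.

-- Hedged example: a behavioral 8-bit up-counter used only to illustrate the
-- "behavioral description" stage of the flow in Fig. 6.1.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity counter8 is
  port (
    clk   : in  std_logic;
    reset : in  std_logic;                       -- synchronous, active-high reset
    count : out std_logic_vector(7 downto 0));
end entity counter8;

architecture behavioral of counter8 is
  signal count_int : unsigned(7 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        count_int <= (others => '0');
      else
        count_int <= count_int + 1;              -- wraps naturally at 255
      end if;
    end if;
  end process;

  count <= std_logic_vector(count_int);
end architecture behavioral;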


Fig. 6.2 Bit File Burning

The final stage, after the place and route has completed successfully and the mapping and other reports are satisfactory, is the bit file generation phase. This bit file has to be downloaded onto the FPGA via a JTAG cable. Fig 6.2 shows a simple setup for this.

6.2 FPGA OVERVIEW


An FPGA is a device that contains a matrix of reconfigurable gate array logic circuitry. When an FPGA is configured, the internal circuitry is connected in a way that creates a hardware implementation of the software application. Unlike processors, FPGAs use dedicated hardware for processing logic and do not have an operating system.
FPGAs are truly parallel in nature so different processing operations do not have to
compete for the same resources. As a result, the performance of one part of the
application is not affected when additional processing is added. Also, multiple control
loops can run on a single FPGA device at different rates. FPGA-based control systems
can enforce critical interlock logic and can be designed to prevent I/O forcing by an
operator. However, unlike hard-wired printed circuit board (PCB) designs which have
fixed hardware resources, FPGA-based systems can literally rewire their internal circuitry
to allow reconfiguration after the control system is deployed to the field. FPGA devices
deliver the performance and reliability of dedicated hardware circuitry.


A single FPGA can replace thousands of discrete components by incorporating millions of logic gates in a single integrated circuit (IC) chip. The internal resources of an FPGA chip consist of a matrix of configurable logic blocks (CLBs) surrounded by a periphery of I/O blocks, as shown in Fig. 6.3. Signals are routed within the FPGA matrix by programmable interconnect switches and wire routes.

Fig.6.3 Internal structure of FPGA


In an FPGA, logic blocks are implemented using multiple levels of low fan-in gates, which gives a more compact design compared to an implementation with two-level AND-OR logic. An FPGA provides its user a way to configure:
1. The interconnection between the logic blocks, and
2. The function of each logic block.
A logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of a transistor or as complex as that of a microprocessor. It can be used to implement different combinations of combinational and sequential logic functions. Logic blocks of an FPGA can be implemented by any of the following:

1. Transistor pairs

2. Combinational gates like basic NAND gates or XOR gates

3. n-input Lookup tables

4. Multiplexers

5. Wide fan-in AND-OR structure.


Routing in FPGAs consists of wire segments of varying lengths which can be interconnected via electrically programmable switches. The density of logic blocks used in an FPGA depends on the length and number of wire segments used for routing. The number of segments used for interconnection is typically a tradeoff between the density of logic blocks and the amount of area used up for routing. A simplified version of the FPGA internal architecture with routing is shown in Fig. 6.4.

Fig.6.4 FPGA structure


6.2.1 Why do we need FPGAs?
By the early 1980s, large scale integrated (LSI) circuits formed the backbone of most of the logic circuits in major systems. Microprocessors, bus/IO controllers, system timers, etc. were implemented using integrated circuit fabrication technology. Random glue logic or interconnect was still required to help connect the large integrated circuits in order to:
1. Generate global control signals (for resets etc.), and
2. Route data signals from one subsystem to another.
Systems typically consisted of a few large scale integrated components and a large number of SSI (small scale integration) and MSI (medium scale integration) components. Initial attempts to solve this problem led to the development of custom ICs, which were to replace the large amount of interconnect.


This reduced system complexity and manufacturing cost, and improved performance. However, custom ICs have their own disadvantages: they are relatively expensive to develop, and they delay the product's time to market because of the increased design time. There are two kinds of costs involved in the development of custom ICs:

1. Cost of development and design

2. Cost of manufacture

Therefore the custom IC approach was only viable for products with very high volume which were not time-to-market sensitive. FPGAs were introduced as an alternative to custom ICs for implementing an entire system on one chip and to provide the flexibility of reprogrammability to the user. The introduction of FPGAs resulted in an improvement of density relative to discrete SSI/MSI components (within around 10x of custom ICs). Another advantage of FPGAs over custom ICs is that, with the help of computer aided design (CAD) tools, circuits can be implemented in a short amount of time (no physical layout process, no mask making, no IC manufacturing).
6.2.2 Evolution of FPGA
In the world of digital electronic systems, there are three basic kinds of devices:
memory, microprocessors, and logic. Memory devices store random information such as
the contents of a spreadsheet or database. Microprocessors execute software instructions
to perform a wide variety of tasks such as running a word processing program or video
game. Logic devices provide specific functions, including device-to-device interfacing,
data communication, signal processing, data display, timing and control operations, and
almost every other function a system must perform.
The first type of user-programmable chip that could implement logic circuits was
the Programmable Read-Only Memory (PROM), in which address lines can be used as
logic circuit inputs and data lines as outputs. Logic functions, however, rarely require
more than a few product terms, and a PROM contains a full decoder for its address
inputs. PROMs are thus an inefficient architecture for realizing logic circuits, and so are rarely used in practice for that purpose.


The devices that came as a replacement for PROMs are programmable logic arrays, or PLAs for short. Logically, a PLA is a circuit that allows implementing Boolean functions in sum-of-products form. The typical implementation consists of input buffers for all inputs, the programmable AND-matrix followed by the programmable OR-matrix, and output buffers. The input buffers provide both the original and the inverted values of each PLA input. The input lines run horizontally into the AND matrix, while the so-called product-term lines run vertically. Therefore, the size of the AND matrix is twice the number of inputs times the number of product-terms.
When PLAs were introduced in the early 1970s, by Philips, their main drawbacks
were that they were expensive to manufacture and offered somewhat poor speed-
performance. Both disadvantages were due to the two levels of configurable logic,
because programmable logic planes were difficult to manufacture and introduced
significant propagation delays. To overcome these weaknesses, Programmable Array
Logic (PAL) devices were developed. PALs provide only a single level of
programmability, consisting of a programmable wired AND plane that feeds fixed OR-
gates. PALs usually contain flip-flops connected to the OR-gate outputs so that sequential
circuits can be realized. These are often referred to as Simple Programmable Logic
Devices (SPLDs).
With the advancement of technology, it has become possible to produce devices
with higher capacities than SPLDs. As chip densities increased, it was natural for the
PLD manufacturers to evolve their products into larger (logically, but not necessarily
physically) parts called Complex Programmable Logic Devices (CPLDs). For most
practical purposes, CPLDs can be thought of as multiple PLDs (plus some programmable
interconnect) in a single chip. The larger size of a CPLD allows it to implement either more logic equations or a more complicated design.

Fig. 6.5 Internal structure of CPLD


A block diagram of a hypothetical CPLD is shown in Fig. 6.5. Each of the four logic blocks shown there is the equivalent of one PLD. However, in an actual CPLD there may be more (or fewer) than four logic blocks. These logic blocks are themselves composed of macrocells and interconnect wiring, just like an ordinary PLD.
Unlike the programmable interconnect within a PLD, the switch matrix within a
CPLD may or may not be fully connected. In other words, some of the theoretically
possible connections between logic block outputs and inputs may not actually be
supported within a given CPLD. The effect of this is most often to make 100% utilization
of the macro cells very difficult to achieve. Some hardware designs simply won't fit
within a given CPLD, even though there are sufficient logic gates and flip-flops
available. Because CPLDs can hold larger designs than PLDs, their potential uses are
more varied. They are still sometimes used for simple applications like address decoding,
but more often contain high-performance control-logic or complex finite state machines.
At the high-end (in terms of numbers of gates), there is also a lot of overlap in potential
applications with FPGAs. Traditionally, CPLDs have been chosen over FPGAs whenever
high-performance logic is required. Because of its less flexible internal architecture, the
delay through a CPLD (measured in nanoseconds) is more predictable and usually
shorter.
The development of the FPGA was distinct from the SPLD/CPLD evolution just described, as is apparent from the FPGA architecture shown earlier. FPGAs offer the highest amount of logic density, the most features, and the highest performance. The largest FPGA now shipping, part of the Xilinx Virtex line of devices, provides eight
million "system gates" (the relative density of logic). These advanced devices also offer
features such as built-in hardwired processors (such as the IBM Power PC), substantial
amounts of memory, clock management systems, and support for many of the latest, very
fast device-to-device signaling technologies. FPGAs are used in a wide variety of
applications ranging from data processing and storage, to instrumentation,
telecommunications, and digital signal processing. The value of programmable logic has
always been its ability to shorten development cycles for electronic equipment
manufacturers and help them get their product to market faster.


6.3 FPGA STRUCTURAL CLASSIFICATION


The basic structure of an FPGA includes logic elements, programmable interconnects and memory. The arrangement of these blocks is specific to a particular manufacturer. On the basis of the internal arrangement of blocks, FPGAs can be divided into three classes:
6.3.1 Symmetrical arrays
This architecture consists of logic elements (called CLBs) arranged in the rows and columns of a matrix, with interconnect laid out between them. This symmetrical matrix is surrounded by I/O blocks which connect it to the outside world. Each CLB consists of an n-input look-up table and a pair of programmable flip-flops. I/O blocks also control functions such as tri-state control and output transition speed. Interconnects provide the routing paths; direct interconnects between adjacent logic elements have smaller delay than the general purpose interconnect.
6.3.2 Row based architecture
Row based architecture consists of alternating rows of logic modules and programmable interconnect tracks. Input/output blocks are located in the periphery of the rows. One row may be connected to adjacent rows via vertical interconnect. Logic modules can be implemented in various combinations: combinatorial modules contain only combinational elements, while sequential modules contain combinational elements along with flip-flops. A sequential module can implement complex combinatorial-sequential functions. Routing tracks are divided into smaller segments connected by anti-fuse elements between them.
6.3.3 Hierarchical PLDs
This architecture is designed in a hierarchical manner, with the top level containing only logic blocks and interconnects. Each logic block contains a number of logic modules, and each logic module has combinatorial as well as sequential functional elements. Each of these functional elements is controlled by the programmed memory. Communication between logic blocks is achieved by programmable interconnect arrays. Input/output blocks surround this scheme of logic blocks and interconnects.


6.4 FPGA CLASSIFICATION ON USER PROGRAMMABLE SWITCH TECHNOLOGIES
FPGAs are based on an array of logic modules and a supply of uncommitted wires
to route signals. In gate arrays these wires are connected by a mask design during
manufacture. In FPGAs, however, these wires are connected by the user and therefore
must use an electronic device to connect them. Three types of devices have been
commonly used to do this, pass transistors controlled by an SRAM cell, a flash or
EEPROM cell to pass the signal, or a direct connect using anti-fuses. Each of these interconnect devices has its own advantages and disadvantages, and this has a major effect on the design, architecture, and performance of the FPGA. The classification of FPGAs by user programmable switch technology is given in Fig. 6.6 below.

Fig. 6.6 FPGA Classification on user programmable technology


6.4.1 SRAM Based
The major advantage of SRAM based devices is that they are infinitely re-programmable and can be soldered into the system and have their function changed quickly by merely changing the contents of a PROM. They therefore have simple development mechanics. They can also be changed in the field by uploading new application code, a feature attractive to designers. This does, however, come with a price, as the interconnect element has high impedance and capacitance as well as consuming much more area than other technologies. Hence wires are very expensive and slow. The FPGA architect is therefore forced to make large, inefficient logic modules (typically a look-up table or LUT).


The other disadvantages are that they need to be reprogrammed each time power is applied, need an external memory to store the program, and require a large area. Fig. 6.7 shows two applications of SRAM cells: controlling the gate nodes of pass-transistor switches and controlling the select lines of multiplexers that drive logic block inputs. The figure gives an example of the connection of one logic block (represented by the AND-gate in the upper left corner) to another through two pass-transistor switches, and then a multiplexer, all controlled by SRAM cells. Whether an FPGA uses pass-transistors or multiplexers or both depends on the particular product.

Fig. 6.7 SRAM-controlled Programmable Switches.


6.4.2 Antifuse Based
The Antifuse based cell is the highest density interconnect by being a true cross
point. Thus the designer has a much larger number of interconnects so logic modules can
be smaller and more efficient. Place and route software also has a much easier time.
These devices however are only one-time programmable and therefore have to be thrown
out every time a change is made in the design. The Antifuse has an inherently low
capacitance and resistance such that the fastest parts are all Antifuse based. The
disadvantage of the antifuse is the requirement to integrate the fabrication of the antifuses
into the IC process, which means the process will always lag the SRAM process in
scaling. Antifuses are suitable for FPGAs because they can be built using modified
CMOS technology.


6.4.3 EEPROM Based


The EEPROM/FLASH cell in FPGAs can be used in two ways, as a control
device as in an SRAM cell or as a directly programmable switch. When used as a switch
they can be very efficient as interconnect and can be reprogrammable at the same time.
They are also non-volatile, so they do not require an extra PROM for loading. They do, however, have their drawbacks: the EEPROM process is complicated and therefore also lags SRAM technology.

6.5 VIRTEX-5 FAMILIES


Virtex-5 devices are produced on a 65-nm triple-oxide technology, using 300 mm (12 inch) wafer technology. Virtex-5 also adopts the ASMBL architecture (used in Virtex-4) and offers several platforms with a different mix of features, each platform having devices of different densities:
Virtex-5 LX for general logic applications.
Virtex-5 LXT for logic with advanced serial connectivity.
Virtex-5 SXT for signal processing applications with advanced serial connectivity.
Virtex-5 TXT for performance systems with double density advanced serial connectivity.
Virtex-5 FXT for high-performance embedded systems with advanced serial
connectivity.

6.6 SYSTEM BLOCKS COMMON TO ALL VIRTEX-5 FAMILIES


6.6.1 Configurable logic blocks
A CLB element contains two slices (Fig. 6.8). These two slices do not have direct connections to each other, and each slice is organized as a column. Each slice, in a different column, has an independent carry chain. Each CLB element is connected to a switch matrix to access the general routing matrix (GRM).


Fig.6.8 Arrangement of slices within the CLB


Slices contain four look-up tables, four storage elements, multiplexers, and carry
logic. Some slices, called SLICEM, also contain distributed RAM and 32-bit registers.
Slices without distributed RAM and 32-bit registers are called SLICEL. Each CLB can
contain zero or one SLICEM.
6.6.2 Look up table (LUT)
LUTs can implement Boolean functions, distributed RAM, ROM and shift registers. The function generators are implemented as six-input look-up tables. There are two independent outputs for each LUT. Each LUT can implement one six-input Boolean function or two five-input Boolean functions, as long as the two functions share their inputs. Only one output is used for the six-input function; both outputs are used when two five-input Boolean functions are implemented. In addition to the basic LUTs, slices contain three multiplexers that are used to combine up to four function generators to provide any function of seven or eight inputs in a slice. SLICEM LUTs can implement distributed RAM elements (Table 6.1): single-port 32x1 bit RAM, dual-port 32x1 bit RAM, quad-port 32x2 bit RAM, simple dual-port 32x6 bit RAM, single-port 64x1 bit RAM, dual-port 64x1 bit RAM, quad-port 64x1 bit RAM, simple dual-port 64x3 bit RAM, single-port 128x1 bit RAM, dual-port 128x1 bit RAM and single-port 256x1 bit RAM.


Table 6.1 Distributed RAM configuration


RAM Number of LUTs
32x1 single port 1
32x1 dual port 2
32x2 quad port 4
32x6 simple dual port 4
64x1 single port 1
64x1 dual port 2
64x1 quad port 4
64x3 simple dual port 4
128x1 single port 2
128x1 dual port 4
256x1 single port 4

A 64x1 bit ROM can be implemented in a LUT of either a SLICEM or a SLICEL. Three configurations are available: ROM 64x1 bit, ROM 128x1 bit and ROM 256x1 bit. ROM contents are loaded at device configuration. A SLICEM LUT can also be configured as a 32-bit shift register, and each such LUT can delay serial data anywhere from one to 32 clock cycles.
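As a hedged illustration of the shift-register use of a SLICEM LUT, the VHDL sketch below describes a serial delay line with a dynamically selectable tap; synthesis tools commonly map such a description onto LUT-based shift registers, giving a delay of one to 32 clock cycles per LUT. The generic and signal names are assumptions made for illustration only.

-- Hedged example: a serial delay line that synthesis tools commonly place in
-- SLICEM LUTs configured as shift registers (up to 32 bits per LUT).
library ieee;
use ieee.std_logic_1164.all;

entity lut_delay is
  generic (DEPTH : positive := 32);              -- up to 32 for a single LUT
  port (
    clk   : in  std_logic;
    ce    : in  std_logic;                       -- clock enable
    sel   : in  integer range 0 to DEPTH - 1;    -- dynamic tap (delay) select
    d_in  : in  std_logic;
    d_out : out std_logic);
end entity lut_delay;

architecture rtl of lut_delay is
  signal taps : std_logic_vector(DEPTH - 1 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        taps <= taps(DEPTH - 2 downto 0) & d_in; -- shift in one bit per cycle
      end if;
    end if;
  end process;

  d_out <= taps(sel);                            -- read the selected tap
end architecture rtl;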
6.6.3 Multiplexers
LUTs and associated multiplexers can implement: 4:1 multiplexers, using one
LUT; 8:1 multiplexers, using two LUTs; and 16:1 multiplexers, using four LUTs.
6.6.4 Storage elements
The storage elements in a slice can be configured as either edge-triggered D-type flip-flops or level-sensitive latches. The control signals clock, clock enable and set/reset are common to all storage elements in a slice.
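For illustration, the short VHDL sketch below describes one storage element as an edge-triggered D flip-flop with a clock enable and a synchronous set/reset, i.e. the control signals mentioned above; it is a generic inference-style sketch, not code taken from this project.

-- Hedged example: a D flip-flop with clock enable and synchronous reset,
-- the kind of storage element configuration described in this section.
library ieee;
use ieee.std_logic_1164.all;

entity dff_ce is
  port (
    clk, ce, rst : in  std_logic;
    d            : in  std_logic;
    q            : out std_logic);
end entity dff_ce;

architecture rtl of dff_ce is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then          -- synchronous set/reset
        q <= '0';
      elsif ce = '1' then        -- clock enable
        q <= d;
      end if;
    end if;
  end process;
end architecture rtl;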


6.6.5 Carry logic


A CLB has two separate carry chains that are cascadable to form wider add/subtract logic. The carry chain runs upward and has a height of four bits per slice. For each bit, there is a carry multiplexer and a dedicated XOR gate for adding/subtracting the operands with the selected carry bit. The dedicated carry path and the carry multiplexer can also be used to cascade function generators for implementing wide logic functions.
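As a hedged example, a plain adder description such as the VHDL sketch below is typically mapped by synthesis onto the dedicated carry chain (carry multiplexers plus XOR gates) described above; the entity name and the 16-bit operand width are arbitrary choices for illustration.

-- Hedged example: a plain adder that synthesis tools typically implement on
-- the dedicated CLB carry chain.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder16 is
  port (
    a, b : in  std_logic_vector(15 downto 0);
    sum  : out std_logic_vector(15 downto 0);
    cout : out std_logic);
end entity adder16;

architecture rtl of adder16 is
  signal result : unsigned(16 downto 0);        -- one extra bit for the carry out
begin
  result <= ('0' & unsigned(a)) + ('0' & unsigned(b));
  sum    <= std_logic_vector(result(15 downto 0));
  cout   <= result(16);
end architecture rtl;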

6.7 XILINX VIRTEX- 5 LX50T

Fig. 6.9 Xilinx Virtex 5 LX50T [11]


6.7.1 Overview
The Genesys circuit board is a complete, ready-to-use digital circuit development platform based on a Xilinx Virtex 5 LX50T. The large on-board collection of high-end peripherals, including Gbit Ethernet, HDMI video, a 64-bit DDR2 memory array, and audio and USB ports, makes the Genesys board an ideal host for complete digital systems, including embedded processor designs based on Xilinx's MicroBlaze.
The Virtex5-LX50T is optimized for high performance logic and offers:
7,200 slices, each containing four 6-input LUTs and eight flip-flops
1.7Mbits of fast block RAM
12 digital clock managers
six phase-locked loops
500MHz+ clock speeds


The Genesys board includes Digilent's newest Adept USB2 system, which offers
device programming, real-time power supply monitoring, automated board tests, virtual
I/O, and simplified user-data transfer facilities. A second USB programming port, based
on the Xilinx programming cable, is also built into the board.
6.7.2 Features
Xilinx Virtex 5 LX50T FPGA, 1136-pin BGA package
256Mbyte DDR2 SODIMM with 64-bit wide data
10/100/1000 Ethernet PHY and RS-232 serial port
Multiple USB2 ports for programming, data, and hosting
HDMI video up to 1600x1200 and 24-bit color
AC-97 Codec with line-in, line-out, mic, and headphone
Real-time power monitors on all power rails
16Mbyte Strata Flash for configuration and data storage
Programmable clocks up to 400MHz
112 I/Os routed to expansion connectors
GPIO includes eight LEDs, two buttons, two-axis navigation switch, eight slide
switches, and a 16x2 character LCD
Ships with a 20W power supply and USB cable.
6.7.3 Configuration
After power-on, the FPGA on the Genesys board must be configured (or
programmed) before it can perform any functions. A USB-connected PC can configure
the board using the JTAG interface anytime power is on, or a file can be automatically
transferred from the Strata Flash ROM at power-on. An on-board "mode" jumper selects
which programming mode will be used.
Both Digilent and Xilinx freely distribute software that can be used to program
the FPGA and the Flash ROM. Configuration files stored in the ROM use the Byte
Peripheral Interface (BPI) mode. In BPI UP mode, the FPGA loads configuration data
from the Strata Flash in an ascending direction starting at address 000000. In BPI DOWN
mode, configuration data loads in a descending direction starting at address 03FFFF.


Fig. 6.10 Xilinx iMPACT USB port & JTAG Header

Once transferred, programming files are stored in SRAM-based memory cells within the FPGA. These SRAM cells define the FPGA's logic functions and circuit connections until they are erased, either by removing power or asserting the PROG_B input.
FPGA configuration files transferred using the JTAG interface use the .bit and .svf file types, and BPI files use the .bit, .bin, and .mcs file types. Xilinx's ISE WebPack and EDK software can create .bit, .svf, .bin, or .mcs files from VHDL, Verilog, or schematic-based source files (EDK is used for MicroBlaze embedded processor-based designs). Digilent's Adept software and Xilinx's iMPACT software can be used to program the Genesys board from a PC's USB port.
During FPGA programming, a .bit or .svf file is transferred from the PC to the FPGA using the USB-JTAG port. When programming the ROM, a .bit, .bin, or .mcs file is transferred to the ROM in a two-step process. First, the FPGA is programmed with a circuit that can transfer data from the USB-JTAG port into the ROM, and then data is transferred to the ROM via the FPGA circuit (this complexity is hidden and a simple program ROM interface is shown).


After the ROM has been programmed, it can automatically configure the FPGA at
a subsequent power-on or reset event if the Mode jumpers are set to the proper BPI mode.
A programming file stored in the Strata Flash ROM will remain until it is overwritten,
regardless of power-cycle events.
6.7.4 Adept System and iMPACT USB ports
The Genesys board includes two USB peripheral ports: one for Adept software and another for Xilinx's iMPACT software. Either port can program the FPGA and Strata Flash, but Adept offers a simplified user interface and many additional features such as automated board test and user-data transfers. The Adept port is also compatible with iMPACT if the Digilent Plug-In for Xilinx Tools is installed on the host PC. Here we use only the Xilinx USB port, which is based on the Xilinx USB programming cable and can be accessed by all Xilinx CAD tools and iMPACT.

Fig. 6.11 Adept and iMPACT USB Ports


6.7.5 Power supplies
The Genesys board requires an external 5V 4A or greater power source with a
coax centre-positive 2.1mm internal-diameter plug (a suitable supply is provided as a part
of the Genesys kit). Voltage regulator circuits from Texas Instruments create the required
3.3V, 2.5V, 1.8V, 1.0V, and 0.9V supplies from the main 5V supply.


Genesys power supplies are controlled by a logic-level switch (SW9) that enables/disables the power supply controller ICs. A power-good LED (LD8), driven by the power-good outputs on all supplies, indicates that all supplies are operating within 10% of nominal.

Fig. 6.12 Power Supply


A load switch (the TPS51100) passes the input voltage VU to the "Vswt" node,
depending on the state of the power switch. Vswt is assumed to be 5V, and is used by
many systems on the board including the LCD, HDMI ports, I2C bus, and USB host.
Vswt is also available at expansion connectors, so that any connected boards can be
turned off along with the Genesys board.
6.7.6 DDR2 Memory
A single small outline dual in-line memory module (SODIMM) connector is provided
and loaded with a Micron MT4HTF3264HY-667D3 (or equivalent) single-rank
unregistered 256Mbyte DDR2 module (additional address lines and chip selects are
routed, so that similar SODIMMs with densities up to 2GB may be used). Serial Presence
Detect (SPD) using an IIC interface to the DDR DIMM is also supported. The Genesys
board has been tested for DDR2 operation at a 400MHz data rate. Faster data rates might
be possible but are not tested.


DDR2 memory expansion
The DDR2 interface supports user installation of SODIMM modules with more memory, since higher order address and chip select signals are also routed from the SODIMM to the FPGA.
DDR2 clock signal
Two matched length pairs of DDR2 clock signals are broadcast from the FPGA to
the SODIMM. The FPGA design is responsible for driving both clock pairs with low
skew. The delay on the clock trace is designed to match the delay of the other DDR2
control signals.
DDR2 signaling
All DDR2 SDRAM control signals are terminated through 47 Ω resistors to a 0.9V VTT reference voltage. The FPGA DDR2 interface supports SSTL18 signalling, and all DDR2 signals are controlled impedance. The DDR2 data mask and strobe signals are matched length within byte groups. The ODT functionality of the SODIMM should be utilized.
6.7.7 Flash memory
The Genesys board uses a 256Mbit Numonyx P30 parallel flash memory device
(organized as 16-bit by 16Mbytes) for non-volatile storage of FPGA configuration files.
Configuration files are stored using the byte-peripheral interface mode (BPI) in either up
or down configurations.
A single FPGA configuration file requires less than 16Mbits, leaving 140Mbits
available for user data. Data can be transferred to/from the Flash by user applications, or
by facilities built into the Adept software. A reference design on the Digilent website
provides an example of driving the Flash memory.

Fig. 6.13 Flash Memory


Table 6.2 Flash Memory pin Specification


Address Signals Data signals

A0: K12 A13: K16 D0: AD19 D13: AH12


A1: K13 A14: K21 D1: AE19 D14: AH22
A2: H23 A15: J22 D2: AE17 D15: AG22
A3: G23 A16: L16 D3: AF16
A4: H12 A17: L15 D4: AD20
A5: J12 A18: L20 D5: AE21
A6: K22 A19: L21 D6: AE16
A7: K23 A20: AE23 D7: AF15
A8: K14 A21: AE22 D8: AH13
A9: L14 A22: AG12 D9: AH14
A10: H22 A23: AF13 D10: AH19
A11: G22 A24: AG23 D11: AH20
A12: J15 D12: AG13

6.7.8 Ethernet PHY


The Genesys board includes a Marvell Alaska Tri-mode PHY (the 88E1111) paired
with a Halo HFJ11-1G01E RJ-45 connector. Both MII and GMII interface modes are
supported at 10/100/1000 Mb/s. Default settings used at power-on or reset are:
MII/GMII mode to copper interface
Auto Negotiation Enabled, advertising all speeds, preferring Slave
MDIO interface selected, PHY MDIO address = 00111
No asymmetric pause, no MAC pause, automatic crossover enabled
Energy detect on cable disabled (Sleep Mode disabled), interrupt polarity LOW
EDK-based designs can access the PHY using either the xps_ethernetlite IP core for
10/100 Mbps designs, or the xps_ll_temac IP core for 10/100/1000 Mbps designs. The
xps_ll_temac IP core uses the hard Ethernet MAC hardware core included in the Virtex 5
FPGA.


6.7.9 USB host


A Cypress CY7C67300 USB controller provides the Genesys board with USB
host and peripheral capability. The CY7C67300 includes two serial interface engines
(SIE) that can be used independently. SIE1 is connected to a Type A USB host connector
(J8), and SIE2 is connected to a Type B USB peripheral connector (J9).
The USB controller has an internal microprocessor to assist in processing USB
commands; a dedicated IIC EEPROM (IC9) is available for storing firmware. Firmware
can be developed for the processor and/or written to the EEPROM using the Cypress
CY3663 EZ-OTG/EZ-Host development kit available from Cypress. To assist with
debug, the USB controller's two-wire serial port is connected to two FPGA pins (USBRX
to FPGA pin V9, USB-TX to FPGA pin W7) using LVCMOS33 I/O standards. Jumper
JP14 can be installed to prevent the USB controller from executing firmware stored in the
IIC EEPROM.
To access the USB host controller, EDK designs can use the xps_epc IP core.
Reference designs posted on the Digilent website show an example for reading characters
from a USB keyboard connected to the USB host interface.

Fig.6.14 USB host cypress CY7C67300


6.7.10 Video output
Video output is accomplished using a Chrontel CH7301C DVI transmitter device
connected to a standard Type A HDMI connector (J3). DVI and HDMI share a common
TMDS signaling standard, so a simple adaptor can be used to convert the HDMI
connector to a DVI connector (VGA signals are not available on the HDMI connector).

The Chrontel CH7301C (IC3) supports up to 1600 X 1200 resolutions with 24-bit
color. Status and control information can be moved between the FPGA and the CH7301C
using an I2C bus (SCL to FPGA pin U8, and SDA to FPGA pin V8, both using the
LVCMOS33 I/O standard).
The I2C bus is also routed to the HDMI connector to allow direct
communications with external monitors. EDK designs can use the xps_tft IP core (and its
associated driver) to access the Chrontel device. The xps_tft core reads video data from
the DDR2 memory, and sends it to the Chrontel device for display on an external
monitor. The IP core is capable of resolutions of 640X480 at 18 bits per pixel.
An EDK reference design available on the Digilent website (and included as a part of the User Demo) reads a bitmap file from the StrataFlash memory and displays it on the monitor. A second EDK reference design (included in the User test available through Adept) displays a gradient colour bar and text in the centre of the screen. An ISE reference design is available that displays a colour bar; this reference design provides an example of using the DVI circuit with an ISE project.


CHAPTER-7
OVERVIEW OF MODEL-SIM SIMULATOR
This chapter describes the basic procedure for simulating VHDL designs using the ModelSim software.
7.1 SIMULATION (MODEL-SIM)
ModelSim is a verification and simulation tool for VHDL, Verilog, SystemVerilog, and mixed-language designs. This section provides a brief conceptual overview of the ModelSim simulation environment. It is divided into four topics, which are covered in the following sections:
Basic simulation flow
Project flow
Multiple library flow
Debugging tools

7.2 BASIC SIMULATION FLOW:


The following Fig 7.1 shows the basic steps for simulating a design in ModelSim.
Creating the Working Library
In ModelSim, all designs are compiled into a library. You typically start a new
simulation in ModelSim by creating a working library called "work". "Work" is
the library name used by the compiler as the default destination for compiled
design units.
Compiling Your Design
After creating the working library, you compile your design units into it. The
ModelSim library format is compatible across all supported platforms. You can
simulate your design on any platform without having to recompile your design.
Loading the Simulator with Your Design and Running the Simulation
With the design compiled, you load the simulator with your design by invoking the simulator on a top-level module (Verilog) or a configuration or entity/architecture pair (VHDL).


Assuming the design loads successfully, the simulation time is set to zero, and you enter a run command to begin simulation.
Debugging Your Results
If you don't get the results you expect, you can use ModelSim's robust debugging environment to track down the cause of the problem.

Create a working library

Compile design files

Load and run simulation

Debug results

Fig 7.1 Basic simulation flow

7.3 PROJECT FLOW:


A project is a collection mechanism for an HDL design under specification or test. Even though you don't have to use projects in ModelSim, they may ease interaction with the tool and are useful for organizing files and specifying simulation settings. The following Fig 7.2 shows the basic steps for simulating a design within a ModelSim project. As you can see, the flow is similar to the basic simulation flow. However, there are two important differences:
You do not have to create a working library in the project flow; it is done for you automatically.
Projects are persistent. In other words, they will open every time you invoke ModelSim unless you specifically close them.


Create project

Add files to the project

Compile design files

Run simulation

Debug results

Fig 7.2 Steps for simulation of a design

7.4 MULTIPLE LIBRARY FLOW


Modelsim uses libraries in two ways: 1) as a local working library that contains
the compiled version of your design; 2) as a resource library. The contents of your
working library will change as you update your design and recompile.
A resource library is typically static and serves as a parts source for your design.
You can create your own resource libraries, or they may be supplied by another design
team or a third party (e.g., a silicon vendor).
You specify which resource libraries will be used when the design is compiled,
and there are rules to specify in which order they are searched. A common example of
using both a working library and a resource library is one where your gate-level design
and test bench are compiled into the working library, and the design references gate-level
models in a separate resource library.


Create a working library

Compile design files

Link to resource libraries

Run simulation

Debug results

Fig 7.3 Basic Steps for simulating with multiple libraries.


You can also link to resource libraries from within a project. If you are using a
project, you would replace the first step above with these two steps: create the project
and add the test bench to the project.

7.5 DEBUGGING TOOLS


Modelsim offers numerous tools for debugging and analyzing your
design. Several of these tools are:
Using projects
Working with multiple libraries
Setting breakpoints and stepping through the source code
Viewing waveforms and measuring time
Viewing and initializing memories
Creating stimulus with the Waveform Editor
Automating simulation


Design Files
The sample design for this lesson is a simple 8-bit, binary up-counter with an
associated
testbench. The pathnames are as follows:
Verilog <install_dir>/examples/tutorials/verilog/basicSimulation/counter.v
and tcounter.v
VHDL <install_dir>/examples/tutorials/vhdl/basicSimulation/counter.vhd
and tcounter.vhd
This lesson uses the Verilog files counter.v and tcounter.v. If you have a VHDL
license, use counter.vhd and tcounter.vhd instead. Or, if you have a mixed license, feel
free to use the Verilog testbench with the VHDL counter or vice versa.
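For reference, a minimal stimulus-only testbench in the same spirit as tcounter.vhd might look like the sketch below; this is an assumption about the general form, not the actual tutorial file, and it instantiates the illustrative counter8 entity sketched in Chapter 6.

-- Hedged sketch of a simple testbench: generates a 10 ns clock and an initial
-- reset pulse for the counter8 sketch from Chapter 6.
library ieee;
use ieee.std_logic_1164.all;

entity tb_counter8 is
end entity tb_counter8;

architecture sim of tb_counter8 is
  signal clk   : std_logic := '0';
  signal reset : std_logic := '1';
  signal count : std_logic_vector(7 downto 0);
begin
  -- device under test (direct entity instantiation from the work library)
  dut : entity work.counter8
    port map (clk => clk, reset => reset, count => count);

  clk   <= not clk after 5 ns;   -- 100 MHz clock
  reset <= '0' after 20 ns;      -- release reset after two clock cycles
end architecture sim;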

7.6 CREATE THE WORKING DESIGN LIBRARY


Before you can simulate a design, you must first create a library and compile the
source code into that library.
1. Create a new directory and copy the design files for this lesson into it. Start by creating a new directory for this exercise (in case other users will be working with these lessons).
Verilog: Copy the counter.v and tcounter.v files from /<install_dir>/examples/tutorials/verilog/basicSimulation to the new directory.
VHDL: Copy the counter.vhd and tcounter.vhd files from /<install_dir>/examples/tutorials/vhdl/basicSimulation to the new directory.


2. Start ModelSim if necessary.
a. Type vsim at a UNIX shell prompt or use the ModelSim icon in Windows. Upon opening ModelSim for the first time, you will see the Welcome to ModelSim dialog. Click Close.
b. Select File > Change Directory and change to the directory you created in step 1.
3. Create the working library.
a. Select File > New > Library.
This opens a dialog where you specify physical and logical names for the library (Fig 7.4). You can create a new library or map to an existing library. We'll be doing the former.

Fig 7.4 Creating a New Library Dialog


b. Type work in the Library Name field (if it isn't already entered automatically).
c. Click OK.
ModelSim creates a directory called work and writes a specially-formatted file named _info into that directory. The _info file must remain in the directory to distinguish it as a ModelSim library. Do not edit the folder contents from your operating system; all changes should be made from within ModelSim. ModelSim also adds the library to the list in the Workspace (Fig 7.5) and records the library mapping for future reference in the ModelSim initialization file modelsim.ini.


Fig 7.5 Work Library in the Workspace


When you pressed OK in step 3 above, the following was printed to the Transcript:
vlib work
vmap work
These two lines are the command-line equivalents of the menu selections you made. Many command-line equivalents will echo their menu-driven functions in this fashion.

7.7 COMPILE THE DESIGN


With the working library created, you are ready to compile your source files. You
can compile by using the menus and dialogs of the graphic interface, as in the Verilog
example below, or by entering a command at the ModelSim> prompt.
1. Compile counter.v and tcounter.v.
a. Select Compile > Compile. This opens the Compile Source Files dialog (Fig 7.6). If the Compile menu option is not available, you probably have a project open. If so, close the project by making the Workspace pane active and selecting File > Close from the menus.
b. Select both the counter.v and tcounter.v modules from the Compile Source Files dialog and click Compile. The files are compiled into the work library.
c. When compile is finished, click Done.


Fig 7.6 Compile Source Files Dialog


2. View the compiled design units.
a. On the Library tab, click the + icon next to the work library and you will see two design units (Fig 7.7). You can also see their types (Modules, Entities, etc.) and the path to the underlying source files (scroll to the right if necessary).

Fig 7.7 Verilog Modules Compiled into work Library

7.8 LOAD THE DESIGN


1. Load the test_counter module into the simulator.


a. In the Workspace, click the + sign next to the work library to show the files
contained there.
b. Double-click test_counter to load the design.
You can also load the design by selecting Simulate > Start Simulation in the menu bar.
This opens the Start Simulation dialog. With the Design tab selected, click the + sign next
to the work library to see the counter and test_counter modules. Select the test_counter
module and click OK (Figure 7-8).

Fig 7.8 Loading Design with Start Simulation Dialog


When the design is loaded, you will see a new tab in the Workspace named sim that displays the hierarchical structure of the design (Fig 7.9). You can navigate within the hierarchy by clicking on any line with a + (expand) or - (contract) icon. You will also see a tab named Files that displays all files included in the design.

Fig 7.9 Workspace Sim Tab Displays Design Hierarchy.


2. View design objects in the Objects pane.


a. Open the View menu and select Objects. The command line equivalent is:
view objects
The Objects pane (Fig 7.10) shows the names and current values of data objects in the current region selected in the Workspace. Data objects include signals, nets, registers, constants and variables not declared in a process, generics, and parameters.

Fig 7.10 Object Plane Displays Design Objects


Run the Simulation
Now you will open the Wave window, add signals to it, then run the simulation.
1. Open the Wave debugging window.
a. Enter view wave at the command line.
You can also use the View > Wave menu selection to open a Wave window. The Wave window is one of several windows available for debugging. To see a list of the other debugging windows, select the View menu. You may need to move or resize the windows to your liking. Window panes within the Main window can be zoomed to occupy the entire Main window or undocked to stand alone. For details, see Navigating the Interface.
2. Add signals to the Wave window.
a. In the Workspace pane, select the sim tab.
b. Right-click test_counter to open a popup context menu.
c. Select Add > To Wave > All items in region (Fig 7.11).
All signals in the design are added to the Wave window.


3. Run the simulation.


a. Click the Run icon in the Main or Wave window toolbar.
The simulation runs for 100 ns (the default simulation length) and waves are drawn in
the Wave window.
b. Enter run 500 at the VSIM> prompt in the Main window.
The simulation advances another 500 ns for a total of 600 ns (Fig 7.12.).

Fig 7.11 Using the Popup Menu to Add Signals to Wave Window

Fig 7.12 Waves Drawn in Wave Window


c. Click the Run -All icon on the Main or Wave window toolbar. The simulation continues running until you execute a break command or it hits a statement in your code (e.g., a Verilog $stop statement) that halts the simulation.
d. Click the Break icon. The simulation stops running.

7.9 SET BREAKPOINTS AND STEP THROUGH THE SOURCE


Next you will take a brief look at one interactive debugging feature of the ModelSim environment. You will set a breakpoint in the Source window, run the simulation, and then step through the design under test. Breakpoints can be set only on lines with red line numbers.
1. Open counter.v in the Source window.
a. Select the Files tab in the Main window Workspace.
b. Click the + sign next to the sim filename to see the contents of vsim.wlf dataset.
c. Double-click counter.v (or counter.vhd if you are simulating the VHDL files) to
open it in the Source window.
2. Set a breakpoint on line 36 of counter.v (or, line 39 of counter.vhd for VHDL).
a. Scroll to line 36 and click in the BP (breakpoint) column next to the line number.
A red ball appears in the line number column at line number 36 (Figure 7-13), indicating
that a breakpoint has been set.

Fig 7.13 Setting Breakpoint in Source Window


3. Disable, enable, and delete the breakpoint.


a. Click the red ball to disable the breakpoint. It will become a black ball.
b. Click the black ball again to re-enable the breakpoint. It will become a red ball.
c. Click the red ball with your right mouse button and select Remove Breakpoint 36.
d. Click in the line number column next to line number 36 again to re-create
the breakpoint.
4. Restart the simulation.
a. Click the Restart icon to reload the design elements and reset the simulation time to
zero.
The Restart dialog that appears gives you options on what to retain during the
restart

Fig 7.14 Restart Dialog


b. Click the Restart button in the Restart dialog.
c. Click the Run -All icon.
The simulation runs until the breakpoint is hit. When the simulation hits the breakpoint, it stops running, highlights the line with a blue arrow in the Source view (Fig 7.15), and issues a Break message in the Transcript pane.
When a breakpoint is reached, typically you want to know one or more signal values. You have several options for checking values: look at the values shown in the Objects window (Fig 7.16).


Fig 7.15 Blue Arrow Indicates Where Simulation Stopped.

Fig 7.16 Values Shown in Objects Window


Set your mouse pointer over a variable in the Source window and a yellow box
will appear with the variable name and the value of that variable at the time of the
selected cursor in the Wave window

Fig 7.17 Parameter Name and Value in Source Examine Window


Use the Examine command at the VSIM> prompt to output a variable value to the Main window Transcript (i.e., examine count).
Try out the step commands. Click the Step icon on the Main window toolbar; this single-steps the debugger. Experiment on your own. Set and clear breakpoints and use the Step, Step Over, and Continue Run commands until you feel comfortable with their operation.

7.10 NAVIGATING THE INTERFACE


The Main window is composed of a number of "panes" and sub-windows that
display various types of information about your design, simulation, or debugging session.
You can also access other tools from the Main window that display in stand-alone
windows (e.g., the Dataflow window).

Fig 7.18 The main window


Here are a few important points to keep in mind about the ModelSim interface:
Windows/panes can be resized, moved, zoomed, undocked, etc., and the changes are persistent. You have a number of options for re-sizing, re-positioning, undocking/redocking, and generally modifying the physical characteristics of windows and panes. When you exit ModelSim, the current layout is saved so that it appears the same the next time you invoke the tool.


Refer to the Main Window section in the User's Manual for more information.
Menus are context sensitive. The menu items that are available and how certain
menu items behave depend on which pane or window is active. For example, if
the sim tab in the Workspace is active and you choose Edit from the menu bar,
the Clear command is disabled. However, if you click in the Transcript pane and
choose Edit, the Clear command is enabled. The active pane is denoted by a blue
title bar.
Let us try a few things.
1. Zoom and undock panes.
Click the Zoom/Unzoom icon in the upper right corner of the Workspace pane
(Figure 7-18).

Fig 7.19 Window/Pane Control Icons


CHAPTER-8
SIMULATION RESULTS
The patterns stored in memory are b c d f and p c d g. If the given input pattern matches one of the patterns stored in the memory, the status port shows whether the pattern is matched and also indicates which pattern in the memory it matched. Pre_reg indicates the path through which the input has traversed: if the input b c d f traverses the first path, Pre_reg indicates 01, and if the input p c d g traverses the second path, Pre_reg indicates 10. The If_final port indicates the final state of a pattern: If_final is 0 while the final state (character) has not been reached, and 1 once the final state is reached.
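
The RTL source of the design is not reproduced in this section, so the following is only a minimal illustrative Verilog sketch written to match the behaviour described above; the module name, port widths, and state encoding are assumptions, while the state names (S0, S1, S26, S37, S4, S5, S8) and the ports status, Pre_reg, and If_final follow this chapter's description. It shows how the two stored patterns b c d f and p c d g can share the middle states S26 and S37 (the merge_FSM idea), with Pre_reg remembering which entry path was taken.

// Illustrative sketch only -- not the project's actual RTL.
// Merged FSM for the two stored patterns "b c d f" and "p c d g".
module merge_fsm_sketch (
    input  wire       clk,
    input  wire       rst,
    input  wire [7:0] char_in,    // one input character per clock
    output reg  [1:0] Pre_reg,    // 01: first path, 10: second path
    output reg        If_final,   // 1 when a final state is reached
    output reg  [1:0] status      // 00: not matched, 01: pattern one, 10: pattern two
);
    // assumed binary state encoding
    localparam S0  = 3'd0,  // start state
               S1  = 3'd1,  // saw 'b'
               S5  = 3'd2,  // saw 'p'
               S26 = 3'd3,  // shared: saw 'c'
               S37 = 3'd4,  // shared: saw 'd'
               S4  = 3'd5,  // final state for "bcdf"
               S8  = 3'd6;  // final state for "pcdg"

    reg [2:0] state;

    always @(posedge clk or posedge rst) begin
        if (rst) begin
            state    <= S0;
            Pre_reg  <= 2'b00;
            If_final <= 1'b0;
            status   <= 2'b00;
        end else begin
            case (state)
                S0: begin
                    If_final <= 1'b0;
                    status   <= 2'b00;
                    if (char_in == "b")      begin state <= S1; Pre_reg <= 2'b01; end
                    else if (char_in == "p") begin state <= S5; Pre_reg <= 2'b10; end
                    else                     begin state <= S0; Pre_reg <= 2'b00; end
                end
                S1, S5: state <= (char_in == "c") ? S26 : S0;   // shared state S26
                S26:    state <= (char_in == "d") ? S37 : S0;   // shared state S37
                S37: begin
                    // Pre_reg selects which final character is expected.
                    if (Pre_reg == 2'b01 && char_in == "f") begin
                        state <= S4;  If_final <= 1'b1; status <= 2'b01;
                    end else if (Pre_reg == 2'b10 && char_in == "g") begin
                        state <= S8;  If_final <= 1'b1; status <= 2'b10;
                    end else
                        state <= S0;
                end
                default: state <= S0;   // S4 / S8: return to the start state
            endcase
        end
    end
endmodule

Note that this sketch simply returns to S0 on a mismatch; the actual design may additionally implement AC-style failure transitions.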

Fig 8.1 Simulation result of pattern one matched


The above waveform is the simulated result when the given input pattern is b c d f. The states traverse as shown in Fig 5.3 and Table 5.3: the starting state is S0, and the machine traverses the path S1, S26, S37, S4. Once it reaches state S4, the If_final port indicates 1, the Pre_reg port indicates 01 because the input is traversed along the first path, and the status port indicates that pattern one is matched.

Fig 8.2 Simulation result of pattern two matched


The above waveform is the simulated result when the given input pattern is p c d g. The states traverse as shown in Fig 5.3 and Table 5.3: the starting state is S0, and the machine traverses the path S5, S26, S37, S8. Once it reaches state S8, the If_final port indicates 1, the Pre_reg port indicates 10 because the input is traversed along the second path, and the status port indicates that pattern two is matched.
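
As a rough illustration of how the waveforms in Fig 8.1 to Fig 8.3 could be reproduced, the following is a hypothetical Verilog testbench for the sketch given earlier (the project's actual testbench is not shown here). It drives b c d f, then p c d g, and finally a pattern that is not stored (b c d g), printing status, Pre_reg, and If_final after each one.

// Hypothetical testbench for the merge_fsm_sketch module sketched above.
module tb_merge_fsm_sketch;
    reg        clk = 1'b0;
    reg        rst = 1'b1;
    reg  [7:0] char_in = 8'h00;
    wire [1:0] Pre_reg;
    wire       If_final;
    wire [1:0] status;

    merge_fsm_sketch dut (
        .clk(clk), .rst(rst), .char_in(char_in),
        .Pre_reg(Pre_reg), .If_final(If_final), .status(status)
    );

    always #5 clk = ~clk;          // free-running clock

    // apply one character just after the falling edge,
    // so it is stable at the next rising edge
    task send(input [7:0] c);
        begin
            @(negedge clk) char_in = c;
        end
    endtask

    initial begin
        @(negedge clk) rst = 1'b0;

        // pattern one: expect status = 01, Pre_reg = 01, If_final = 1
        send("b"); send("c"); send("d"); send("f");
        @(posedge clk) #1 $display("bcdf : status=%b Pre_reg=%b If_final=%b",
                                   status, Pre_reg, If_final);
        send(8'h00);               // idle character returns the FSM to S0

        // pattern two: expect status = 10, Pre_reg = 10, If_final = 1
        send("p"); send("c"); send("d"); send("g");
        @(posedge clk) #1 $display("pcdg : status=%b Pre_reg=%b If_final=%b",
                                   status, Pre_reg, If_final);
        send(8'h00);               // idle character again

        // a pattern that is not stored: expect status = 00 (not matched)
        send("b"); send("c"); send("d"); send("g");
        @(posedge clk) #1 $display("bcdg : status=%b Pre_reg=%b If_final=%b",
                                   status, Pre_reg, If_final);

        $stop;                     // pause the simulation
    end
endmodule

The $stop at the end hands control back to the simulator, which is also a convenient point to examine signal values in ModelSim, much like hitting the breakpoint described in the previous chapter.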

Fig 8.3 Simulation result of pattern not matched


As described above, the patterns stored in memory are b c d f and p c d g, and the status port shows whether the given input pattern matches one of them and, if so, which pattern it matched. If any input (a wrong pattern) that is not stored in memory is applied, the status port indicates that the pattern is not matched.


The total equivalent gate count for the previous design is 596 (Fig 8.4).


Fig 8.4 Design summary report of previous algorithm

The total gate count required for the proposed design is 395 (Fig 8.5).

Fig 8.5 Design summary report of the proposed AC algorithm


CONCLUSION

We have presented a high-speed and scalable pattern matching algorithm that uses multi-character transitions on finite state automata to increase the throughput, and leverages a clever transition optimization technique to reduce the memory requirements. The experimental results show that the proposed AC algorithm with merge_FSM reduces the gate count when compared with the existing AC algorithm. However, this algorithm can work efficiently for string matching applications only. Since the number of states in the merge_FSM can be drastically smaller than in the original FSM, it results in a much smaller memory size. The previous algorithm required a gate count of 596, whereas our algorithm achieves the same functionality using only 395 gates, so we have reduced both the gate count and the design complexity.


FUTURE SCOPE

The proposed architecture can be extended with a variety of algorithms to address security-related issues. Nowadays, string/pattern matching algorithms are used everywhere in daily life; some of the main applications are security, employee attendance systems, and banking. For example, if we want to store the data of millions of customers and match it against a given input (e.g., in banking), a large memory is required and the complexity increases; the proposed concept can be applied to this type of application. The project can also be extended to implement more transactions, such as out-of-order handling.


REFERENCES
[1] C.-H. Lin and S.-C. Chang, "Efficient pattern matching algorithm for memory reduction applications," IEEE Transactions on VLSI Systems, vol. 19, no. 1, pp. 112-115, January 2012.
[2] R. Sidhu and V. K. Prasanna, "Fast regular expression matching using FPGAs," in Proc. 9th Annu. IEEE Symp. Field-Programmable Custom Computing Machines (FCCM), 2001, pp. 227-238.
[3] L. Tan and T. Sherwood, "A high throughput string matching architecture for intrusion detection and prevention," in Proc. 32nd Annu. Int. Symp. Computer Architecture (ISCA), 2005, pp. 112-122.
[4] N. Tuck, T. Sherwood, B. Calder, and G. Varghese, "Deterministic memory-efficient string matching algorithms for intrusion detection," in Proc. IEEE INFOCOM, Mar. 2004, pp. 2628-2639.
[5] F. Yu, Z. Chen, Y. Diao, T. V. Lakshman, and R. H. Katz, "Fast and memory-efficient regular expression matching for deep packet inspection," in Proc. ACM/IEEE Symp. Architectures for Networking and Communications Systems (ANCS), 2006, pp. 93-102.
[6] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Transactions on Information Theory, vol. 23, no. 3, pp. 337-343, 1977.
[7] J. van Lunteren, "High-performance pattern-matching for intrusion detection," in Proc. IEEE INFOCOM, 2006, pp. 68-71.
[8] S. Dharmapurikar, P. Krishnamurthy, T. Sproull, and J. Lockwood, "Deep packet inspection using parallel Bloom filters," IEEE Micro, vol. 24, no. 1, pp. 52-61, 2004.
[9] H. Song and J. W. Lockwood, "Efficient packet classification for network intrusion detection using FPGA," in Proc. 2005 ACM/SIGDA 13th Int. Symp. Field-Programmable Gate Arrays, 2005, pp. 238-245.
[10] FPGA overview, available at www.digilentinc.com.
[11] An overview on Xilinx ISE, available at http://www.xilinx.com/support/documentation/user_guides/ug190.pdf
