Domenico Ficara
domenico.ficara@iet.unipi.it
Stefano Giordano
s.giordano@iet.unipi.it
Fabio Vitucci
fabio.vitucci@iet.unipi.it
Gianni Antichi
gianni.antichi@iet.unipi.it
Gregorio Procissi
g.procissi@iet.unipi.it
Andrea Di Pietro
andrea.dipietro@iet.unipi.it
ABSTRACT
Modern network devices need to perform deep packet inspection at high speed for security and application-specific
services. Finite Automata (FAs) are used to implement regular expression matching, but they require a large amount
of memory. Many recent works have proposed improvements
to address this issue.
This paper presents a new representation for deterministic finite automata (orthogonal to previous solutions), called
Delta Finite Automata (δFA), which considerably reduces
states and transitions and requires only one transition per character, thus allowing fast matching. Moreover, a new
state encoding scheme is proposed and the overall
algorithm is tested for use in the packet classification area.
General Terms
Algorithms, Design, Security
Keywords
DFA, Intrusion Prevention, Deep Packet Inspection, Regular Expressions, Packet Classification
1. INTRODUCTION
The remainder of the paper is organized as follows. Section 2 discusses related work on pattern matching and DFAs.
Sec.3 describes our algorithm, starting from a
motivating example, and sec.4 shows the integration of our
scheme with previous ones. Sec.5 then illustrates the encoding
scheme for states, and the subsequent section
shows its integration with δFA. Finally, sec.8 presents
the experimental results, while sec.9 shows the applicability
of δFA to packet classification.
2. RELATED WORK
Figure 1: (a) The DFA. (b) The D²FA. (c) The δFA.
3.
3.1 A motivating example
In this section we introduce δFA, a D²FA-inspired automaton that preserves the advantages of D²FA and requires
a single memory access per input char. To clarify the rationale behind δFA and its differences from D²FA, we analyze
the same example used by Kumar et al. in [14]: fig.1(a)
represents a DFA on the alphabet {a, b, c, d} that recognizes
the regular expressions (a+), (b+c) and (c*d+).
Figure 1(b) shows the D²FA for the same set of regular expressions. The main idea is to reduce the memory footprint
of states by storing only a limited number of transitions for
each state, plus a default transition to be taken for every input
char for which no transition is defined. When, for example, in fig.1(b) the D²FA is in state 3 and the input is d, the
default transition to state 1 is taken. State 1 knows which
state to go to upon input d, therefore we jump to state 4. In
this example, taking a default transition costs 1 more hop
(1 more memory access) for a single input char. However,
it may happen that, even after taking a default transition,
the destination state for the input char is not specified and
another default transition must be taken, and so on. The
works in [14] and [6] show how to limit the number of
hops in default paths and propose refined algorithms to choose the best default paths. In the example, the
total number of transitions is reduced to 9 in the D²FA
(less than half of the equivalent DFA, which has 20 edges),
thus achieving a remarkable compression.
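The default-transition lookup described above can be sketched as follows. This is a minimal illustration, not the implementation of [14]: the tables `LABELED` and `DEFAULT` and the function `d2fa_step` are hypothetical names, and only the path "state 3, input d, default to state 1, then state 4" is taken from the text; all other entries are invented for the example.

```python
# Hypothetical D2FA fragment: each state stores a few labeled edges plus
# one default edge. Only the path 3 --default--> 1 --d--> 4 follows the
# text; every other entry is made up for illustration.
LABELED = {
    1: {"a": 2, "b": 3, "c": 5, "d": 4},  # root of the default tree
    2: {},
    3: {"c": 5},
    4: {},
    5: {},
}
DEFAULT = {1: None, 2: 1, 3: 1, 4: 1, 5: 1}


def d2fa_step(state, char):
    """Return (next_state, memory_accesses) for one input character."""
    accesses = 1  # reading the current state
    while char not in LABELED[state]:
        if DEFAULT[state] is None:
            raise KeyError(f"no transition for {char!r}")
        state = DEFAULT[state]  # follow the default edge: one extra hop
        accesses += 1
    return LABELED[state][char], accesses
```

With these tables, `d2fa_step(3, "d")` needs two memory accesses, while `d2fa_step(3, "c")` needs one, which is the per-character cost that δFA aims to avoid.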
3.2
This requires, however, the introduction of a supplementary structure that locally stores the transition set of the
current state. The main idea is to let this local transition
set evolve as a new state is reached: if there is no difference
from the previous state for a given character, then the corresponding transition defined in the local memory is taken.
Otherwise, the transition stored in the state is chosen. In
all cases, as a new state is read, the local transition set is
updated with all the stored transitions of the state. The
δFA shown in fig.1(c) only stores the transitions that must
be defined for each state in the original DFA.
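The evolution of the local transition set can be sketched as follows. This is a minimal illustration under stated assumptions: the 3-state DFA over {a, b} is hypothetical (it is not the automaton of fig.1), the stored transitions in `STORED` were derived by hand from it, and the names `delta_fa_run` and `dfa_run` are introduced here only for the example.

```python
# Hypothetical 3-state DFA over {a, b}, used only to check the sketch.
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}

# Transitions actually stored per state in the compressed automaton.
# The initial state keeps its full row; the other states keep only the
# transitions that differ from those of at least one parent state.
STORED = {
    0: {"a": 1, "b": 0},
    1: {"b": 2},
    2: {"b": 0},
}


def delta_fa_run(text, initial=0):
    """Walk the compressed automaton, reading one state per character."""
    local = dict(STORED[initial])  # local transition set
    state = initial
    visited = []
    for char in text:
        state = local[char]          # always resolved from the local set
        local.update(STORED[state])  # apply the differences of the new state
        visited.append(state)
    return visited


def dfa_run(text, initial=0):
    """Reference walk on the uncompressed DFA."""
    state, visited = initial, []
    for char in text:
        state = DFA[state][char]
        visited.append(state)
    return visited
```

On the input "abba" both walks visit states 1, 2, 0, 1, while the compressed automaton stores 4 transitions instead of 6.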
3.3
knowledge we have from the previous state and this transition must be stored in the tc table. On the other hand, when
all of the states that lead to the child state for a given character share the same transition, we can omit
that transition. In alg.1 this is done by using the special
symbol LOCAL TX.
3.3.1 Construction
In alg.1 the pseudo-code for creating a δFA from an N-state DFA (for a character set of C elements) is shown.
The algorithm works with the transition table t[s, c] of the
input DFA (i.e., an N × C matrix that has a row per state and
where the i-th item in a given row stores the state number
to reach upon reading input char i). The final result
is a compressible transition table tc[s, c] that stores, for
each state, only the transitions required by the δFA. All the
other cells of the tc[s, c] matrix are filled with the special
LOCAL TX symbol and can simply be eliminated by using
a bitmap, as suggested in [24] and [4]. The details of our
suggested implementation can be found in section 7.
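The construction rule can be sketched as follows. This is not a transcription of alg.1, which is not reproduced here, but one possible formulation of the rule stated in the text: a transition of a state is stored only if at least one parent state disagrees on it. The name `build_delta_fa`, the use of `None` as the LOCAL TX marker, and the choice of storing the full row of the initial state are assumptions of this sketch.

```python
LOCAL_TX = None  # marker: transition is inherited from the local set


def build_delta_fa(t, initial=0):
    """Build the compressible table tc from the DFA table t (N x C)."""
    n_states, n_chars = len(t), len(t[0])

    # parents[s] = states that have at least one edge leading to s
    parents = [set() for _ in range(n_states)]
    for p in range(n_states):
        for c in range(n_chars):
            parents[t[p][c]].add(p)

    tc = [[LOCAL_TX] * n_chars for _ in range(n_states)]
    for s in range(n_states):
        for c in range(n_chars):
            # Store t[s][c] if some parent would hand over a different
            # transition for c; the initial state keeps its whole row so
            # that the local transition set can be initialized from it.
            if s == initial or any(t[p][c] != t[s][c] for p in parents[s]):
                tc[s][c] = t[s][c]
    return tc
```

For the hypothetical table `t = [[1, 0], [1, 2], [1, 0]]`, the result keeps 4 of the 6 original transitions; the remaining cells hold the marker and could be dropped with a bitmap.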
3.4
Lookup
Equivalent states
[Figure: lookup example; the local transition set is shown at t=0, t=1, t=2 and t=3]
4.
One of the main advantages of our δFA is that it is orthogonal to many other schemes. Indeed, very recently,
two major DFA compression techniques have been proposed,
namely H-cFA [13] and XFA [21, 20]. Both schemes
address, in a very similar way, the issue of state blow-up in
DFAs for multiple regular expressions, which makes them candidates
for adoption in platforms that provide a limited amount of
memory, such as network processors, FPGAs or ASICs. The idea
behind XFAs and H-cFA is to trace the traversal of
certain states that correspond to closures by means of a
small scratch memory. Normally those states would lead to
state blow-up; in XFAs and H-cFA, flags and counters are
shown to significantly reduce the number of states.
Since our main concern is to show the wide range of
possible applications of our δFA, we report in fig.2(a) a
simple example (again taken from a previous paper [13]). In
the example, the aim is to recognize the regular expressions
.*ab[a]*c and .*def, and labels also include conditions and
operations that act on a flag (set/reset with +/-1) and a
counter n (for more details refer to [13]). A DFA would need
20 states and a total of 120 transitions, the corresponding
H-cFA (fig.2(a)) uses 6 states and 38 transitions, while the
δFA representation of the H-cFA (fig.2(b)) requires only 18
transitions. Specifically, the application of δFA to H-cFA
and XFA (which is tested in sec.8) is obtained by storing
the instructions specified in the edge labels only once per
state. Moreover, edges are also considered different when
their specified instructions are different.
5.
Data set    p1char (%)   rcomp (%)   acc    TS (KB)
Snort34     96           59          1.52   27
Cisco30     89           67          1.62   7
Cisco50     83           61          1.52   13
Cisco100    78           59          1.58   36
Bro217      96           80          1.13   11
5.1
As claimed above, the implementation of Char-State compression requires a lookup in an indirection table which
should be small enough to be kept in local memory. If several
Figure 2(a): The H-cFA. Dashed and dotted edges have the same labels,
respectively c, 1|(1 and n = 0) and a, 1. Not all edges
are shown, to keep the figure readable. The real number of
transitions is 38.
6. C-S IN δFA
instead stored within the state) and a list of the pointers for
the specified transitions, which, again, can be considered as
20 bit offset values.
If the number of specified transitions within a state is
small enough, the use of a fixed-size bitmap is not optimal:
in these cases, it is possible to use a more compact structure,
composed of a plain list of character-pointer pairs. Note
that this solution saves memory when fewer than
32 transitions have to be updated in the local table.
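The threshold of 32 transitions can be checked with a short size comparison. This is a sketch under assumptions that are not stated in the text: a 256-symbol alphabet (hence a 256-bit bitmap) and one byte to name the character in each character-pointer pair. Only the 20-bit pointer width comes from the text, and any other per-state fields are assumed to be identical in both layouts.

```python
ALPHABET_SIZE = 256  # assumption: one bitmap bit per 8-bit input character
CHAR_BITS = 8        # assumption: one byte identifies the character in a pair
POINTER_BITS = 20    # offset width mentioned in the text


def bitmap_state_bits(n_transitions):
    """Fixed-size bitmap followed by one pointer per specified transition."""
    return ALPHABET_SIZE + n_transitions * POINTER_BITS


def list_state_bits(n_transitions):
    """Plain list of (character, pointer) pairs."""
    return n_transitions * (CHAR_BITS + POINTER_BITS)


def list_is_smaller(n_transitions):
    return list_state_bits(n_transitions) < bitmap_state_bits(n_transitions)
```

Under these assumptions the list costs 8 extra bits per transition and the bitmap costs 256 bits once, so the list is smaller exactly when fewer than 32 transitions are specified.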
Since in a state data structure a pointer is associated with
a unique character, in order to integrate Char-State compression in this scheme it is sufficient to replace each absolute pointer with a relative-id. The only additional structure consists of a character-length correspondence list, where
the length of the relative-ids associated with each character is stored; this information is necessary to parse the
pointer lists in the node and in the local transition set. However, since the maximum length of the identifiers is generally lower than 16 bits (as is evident from figure 4), 4 bits
for each character are sufficient. The memory footprint of
the character-length table is well compensated by the corresponding compression of the local transition set, composed
of short relative identifiers (our experimental results show
a compression of more than 50%). Furthermore, if a double indirection scheme for the translation of relative-ids is
adopted, a table indicating the number of unique identifiers
for each character (the threshold value we mentioned in section 5.1) will be necessary, in order to parse the indirection
table. This last table (which will be at most as big as the compressed local transition table) can be kept in local memory,
thus not affecting the performance of the algorithm.
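The replacement of absolute pointers with relative-ids can be sketched as follows. This is a minimal illustration, assuming that the relative-id of a state for a given character is simply its position among the distinct states reachable through that character; the actual assignment used by Char-State compression may differ. The names `build_char_state_tables` and `encode` are hypothetical.

```python
def build_char_state_tables(t):
    """Return (indirection, id_bits) for a DFA transition table t (N x C).

    indirection[c][rel_id] is the absolute state for relative-id rel_id
    on character c; id_bits[c] is the relative-id width for character c
    (the content of the character-length correspondence list).
    """
    n_chars = len(t[0])
    indirection, id_bits = [], []
    for c in range(n_chars):
        targets = sorted({row[c] for row in t})  # states reachable on c
        indirection.append(targets)
        # ceil(log2(k)) bits are enough to number k distinct targets
        id_bits.append((len(targets) - 1).bit_length())
    return indirection, id_bits


def encode(t, indirection):
    """Rewrite every absolute pointer of t as a per-character relative-id."""
    return [
        [indirection[c].index(nxt) for c, nxt in enumerate(row)]
        for row in t
    ]
```

For the hypothetical table `t = [[1, 0], [1, 2], [1, 0]]`, the first character reaches a single state and needs 0 bits, while the second reaches two states and needs 1 bit per pointer.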
7. IMPLEMENTATION
8. EXPERIMENTAL RESULTS
Dataset    # of regex   ASCII length range   % Regex w/ wildcards (*,+,?)   Original DFA # of states   Original DFA # of transitions
Snort24    24           6-70                 83.33                          13886                      3554816
Cisco30    30           4-37                 10                             1574                       402944
Cisco50    50           2-60                 10                             2828                       723968
Cisco100   100          2-60                 7                              11040                      2826240
Bro217     217          5-76                 3.08                           6533                       1672448
Dataset    D²FA DB=∞   D²FA DB=14   D²FA DB=10   D²FA DB=6   D²FA DB=2   BEC-CRO   δFA trans.   δFA dup. states
Snort24    98.92       98.92        98.91        98.48       89.59       98.71     96.33        0
Cisco30    98.84       98.84        98.83        97.81       79.35       98.79     90.84        7.12
Cisco50    98.76       98.76        98.76        97.39       76.26       98.67     84.11        1.1
Cisco100   99.11       99.11        98.93        97.67       74.65       98.96     85.66        11.75
Bro217     99.41       99.40        99.07        97.90       76.49       99.33     93.82        11.99
Dataset    D²FA DB=∞   D²FA DB=14   D²FA DB=10   D²FA DB=6   D²FA DB=2   BEC-CRO   δFA + C-S
Snort24    95.97       95.97        95.94        94.70       67.17       95.36     95.02
Cisco30    97.20       97.20        97.18        95.21       55.50       97.11     91.07
Cisco50    97.18       97.18        97.18        94.23       51.06       97.01     87.23
Cisco100   97.93       97.93        97.63        95.46       51.38       97.58     89.05
Bro217     98.37       98.34        95.88        95.69       53          98.23     92.79
Finally, table 4 reports the results we obtained by applying δFA and C-S to one of the most promising approaches for
regular expression matching: XFAs [21, 20] (thus obtaining
a δFA-compressed XFA). The data set (courtesy of Randy Smith) is composed of single regular expressions with a number of closures
that would lead to a state blow-up. The XFA representation limits the number of states (as shown in the table). By
adopting δFA and C-S we can also reduce the number of
transitions with respect to XFAs and hence achieve a further size reduction. In detail, the reduction achieved is
more than 90% (except for a single case) in terms of number of transitions, which corresponds to roughly 90% memory compression (last column in the table). The memory
requirements, both for XFAs and for their δFA-compressed versions, are obtained by
storing the instructions specified in the edge labels only
once per state.
Dataset    # of states   # of trans. (XFA)   # of trans. (XFA with δFA and C-S)   Compr. %
c2663-2    14            3584                318                                  92
s2442-6    12            3061                345                                  74.5
s820-10    23            5888                344                                  94.88
s9620-1    19            4869                366                                  92.70
Figure 6: Comparison of speed performance and
space requirements for the different algorithms.
ments in a qualitative graph (proportions are not to scale). It is evident that our solution almost achieves
the compression of D²FA and BEC-CRO, while it offers
higher speed (comparable to that of a DFA). Moreover, by combining our
scheme with other ones, a general performance increase is
obtained, as shown by the integration with XFA or H-cFA.
9.
Algorithm    ACL1 100 Mem. ref.    FW1 100 Mem. ref.    IPC1 100 Mem. ref.
             mean     max          mean     max         mean     max
HyperCuts2   12.97    23           12.65    29          6.96     16
Hypercuts4   10.2     17           11.5     26          5.76     13
HiCuts2      6.17     17           15.02    28          6.1      17
Hicuts4      6.47     16           11.5     26          4.85     18
BV           24       31           22       28          26       33
δFA          9.78     13           8.69     13          12.2     13
10.
In this paper, we have presented a new compressed representation for deterministic finite automata, called Delta
Finite Automata. The algorithm considerably reduces the
number of states and transitions and is based on the observation that most adjacent states share several common
transitions, so it is convenient to store only the differences
between them. Furthermore, it is orthogonal to previous solutions, thus allowing for higher compression rates. Another fundamental feature of δFA is that it requires only
one state transition per character (keeping the characteristic
of standard DFAs), thus allowing fast string matching.
A new encoding scheme for states has also been proposed
(which we refer to as Char-State), which exploits the association of many states with a few input chars. Such a compression scheme can be efficiently integrated into the δFA
algorithm, allowing a further memory reduction with a negligible increase in the state lookup time.
Finally, the integration of both schemes has also been proposed for application in the field of packet classification, by
representing the classification rule set through regular expressions. The experimental runs have shown good results
in terms of lookup speed, but also an issue of excessive
memory consumption, which we plan to address by exploiting the recent techniques presented in [13] and [21, 20].
Acknowledgements
We would like to thank Michela Becchi for her extensive support and for her useful regex-tool. We are grateful to Sailesh
Kumar, William Eatherton and John Williams (Cisco) for
having provided the regular expression set used in Cisco devices. We thank Randy Smith for his valuable suggestions
and for the XFAs used in our tests. Finally, we would like to express our gratitude to Haoyu Song for his freely available packet
classifier implementations, which helped us in our work, and to
the anonymous reviewers for their insightful comments.
11. REFERENCES