
Fast Template Matching System Using VHDL

Richard Tate and James Northern III, Member, IEEE

Abstract— The image template matching problem is a fundamental problem with many practical applications in image processing, pattern recognition, and computer vision. It is a useful operation for filtering, edge detection, image registration, and object detection. Template matching is the process of determining the presence and the location of a reference image of an object inside a scene image under analysis by a spatial cross-correlation process. Conventional cross-correlation type algorithms are computationally expensive. In this paper, a fast image template matching method based on template modeling is proposed. Taking advantage of fast searching techniques, the method can achieve high computing efficiency and good matching results. Furthermore, the performance of this method is compared with that of conventional matching algorithms. Theoretical analysis and simulation results show that the proposed algorithm is very effective. A simple hardware architecture is presented and implemented, which executes matching for a 4 x 4 template on a 16 x 16 target image.

Index Terms—Image recognition, signal processing, template matching, optimization.

I. INTRODUCTION

THE ability to quickly detect and identify people and objects has recently become a high priority for various departments and agencies within the U.S. government, including the Department of Homeland Security and the Department of Defense. Consequently, the demand for adaptable, robust target recognition systems has stimulated significant interest among researchers in the science community. Typically, target recognition systems classify video sequences by integrating detection, correlation, and final image recognition. In our research, we have focused on correlation techniques that use templates to match and verify targets.

This paper describes the use of templates to locate images within another image, together with methods to accelerate object detection. When templates are used as the image recognition mask, the matching process requires a binary comparison of the template mask to the search image. In this paper a small matching system is presented as a prototype for much larger systems. The system presented here uses a 16 x 16 image and multiple 4 x 4 templates. It requires the cross-correlation matching of a 16 x 16 array with a 4 x 4 array, which would require 256 individual matches. The system also requires multiple templates to perform accurate template matching. Multiple templates are needed to represent different orientations of a 3-D image [1]. The 2-D templates must be able to represent 3-D characteristics: rotation in the plane, different viewing directions, and shifts of the object within the template. In this paper we describe the system using one template for simplicity, but the simulation has been set up to process 16 templates. The 16 templates require 4096 matches (16 x 256) to perform a complete search of the 16 x 16 image. An image load time should also be added to the final processing time. It can be ignored in these simulations because it does not affect the final comparison of performance. The prototype can be expanded to accommodate a larger number of templates, which can further increase runtime.

Images and templates are represented using 1's and 0's. The 1's represent bright locations on the image, and the 0's represent dark locations. The binary images are developed using edge detection and thresholding. This topic is not discussed in this paper, but it is covered in the references.

The system developed at Prairie View A&M University uses gate-level logic to perform complex array matching. The system performs a bit-level cross-correlation comparison match, template regeneration, thresholding, and sorting. With 262144 templates and all of the processes just mentioned, the matching becomes the bottleneck of the system.

A. ATR Algorithm
Automatic target recognition (ATR) is an application from the field of pattern recognition. It is the process of searching through images to find a target. The images can be produced by RADAR, by satellite photography, or by other image-generating systems.
The ATR system first collects SAR images from the sensors. These images are then passed to the Focus of Attention (FOA) subsystem. The FOA determines the areas where targets may be located and then extracts the corresponding images.
The FOA stage identifies interesting images and composes a list of targets suspected to be in each image. Because it has access to range and altitude information, the FOA algorithm also determines the elevation for that image without having to identify the target first. The FOA tasks use the SLD stage to evaluate the likelihood that the suspected targets are actually in the given image, and to determine their position. To do so, the FOA defines tasks for the SLD stage. Each task is composed of an image, a suspected target with its elevation, one or two orientation intervals, and a few parameters [3].

Dr. J. Northern and Richard Tate are with the Department of Electrical and Computer Engineering, Prairie View A&M University, Prairie View, TX 77446 USA (phone: 936-261-9915; fax: 936-261-9930; e-mail:

978-1-4244-2077-3/08/$25.00 ©2008 IEEE.

This paper will discuss methods of reordering and eliminating searches to reduce the number of searches needed to complete the matching process. The overall system layout is described first, followed by the template regeneration circuit in section 2. Section 3 shows the effectiveness of the template regeneration circuit.


This section gives an overview of the system for template matching and template regeneration. Figure 1 illustrates this process. The matching structure allows mid-simulation changes to the template layout and parallel matching with multiple templates.
The input binary image is passed to a buffer RAM. From there, a parallel-to-serial pixel shifter shifts it into a template matching module. The data from the template matching module is sent to two separate threshold units, one for the main circuit and one for the secondary circuit. If the main circuit threshold is reached and the secondary circuit threshold is not met, the value from the template matching is passed to a Storage RAM. If both the main and secondary circuit thresholds are reached, or only the secondary circuit threshold is reached, the template regeneration module is activated. Activating the secondary circuit stops the main circuit and holds the search image at one location. Once the template regeneration module has finished comparing all the templates in the library, the value is passed to a Storage RAM. After that value is passed, matching resumes at the next position in the search image.

Figure 1. VHDL Modules

A. Cross-Correlation Comparison
Our matching process uses cross-correlation comparison, in which the template is matched against each binary bit in the image. Figure 2 shows a diagram of the template relative to the size of the image.

Figure 2. Search Image and Template

The correlation can be described mathematically by the cross-correlation function in (eq. 1).

$$M(x_1, y_1) = \sum_{c_x=1}^{16} \sum_{c_y=1}^{16} I(c_x, c_y)\, T(c_x + x_1,\; c_y + y_1) \qquad \text{(eq. 1)}$$

Here $x_1$ and $y_1$ are the offsets of the template in the horizontal and vertical directions, $I$ represents the image, and $T$ represents the template.
As the template mask is passed over the underlying image, a comparison result is formed. Figure 3 shows the overlap of the two images. The matching is performed using exclusive-NOR gates and a multiplexer to provide the necessary comparison.
One input of each exclusive-NOR gate receives the image bit. The other input receives the control bit for the comparison, which is used to compare the incoming image bit to a zero or a one. The exclusive-NOR gates are arranged in two groups of 16: one group represents logic '1' and the other represents logic '0'.

Figure 3. Overlay of template and search image

The multiplexers control the data flow from the outputs of the exclusive-NOR gates. The multiplexer also provides expandability and allows parallel matching. The 32 outputs from the exclusive-NOR gate arrays are connected to the multiplexer. Its selector input is controlled by the template input, which is sixteen bits wide. The multiplexer outputs a 16-bit string that represents the hits and misses for the comparison match. For example, the multiplexer output "1011011111011111" corresponds to 13 hits and 3 misses. In section IV we explain how this allows mid-simulation adjustment of the 4x4 template.
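The comparison and routing described above can be mimicked in software. The sketch below is only an illustration, and the following are our assumptions rather than details from the paper:
- each 4 x 4 window and template is flattened row by row into a 16-character '0'/'1' string;
- the function names `xnor_match`, `hit_count`, `window_bits`, and `route` are hypothetical;
- the image window in the usage example is hypothetical, chosen so that the output reproduces the "1011011111011111" example from the text.

```python
# Sketch of the XNOR hit/miss comparison and the two-threshold routing.
# Assumptions (not from the paper): row-major 16-character '0'/'1' strings;
# all function names are hypothetical.


def xnor_match(window: str, template: str) -> str:
    """Hit/miss string: '1' where image and template bits agree (XNOR)."""
    if len(window) != len(template):
        raise ValueError("window and template must have the same length")
    return "".join("1" if w == t else "0" for w, t in zip(window, template))


def hit_count(match: str) -> int:
    """Number of hits, i.e. the number of '1' bits in the hit/miss string."""
    return match.count("1")


def window_bits(image: list[str], row: int, col: int, size: int = 4) -> str:
    """Flatten the size x size window whose top-left corner is (row, col)."""
    return "".join(image[r][col:col + size] for r in range(row, row + size))


def route(main_met: bool, secondary_met: bool) -> str:
    """Dispatch after the main and secondary threshold units."""
    if secondary_met:
        return "regenerate"  # both thresholds met, or only the secondary one
    if main_met:
        return "store"       # main met, secondary not met -> Storage RAM
    return "none"            # neither threshold met


if __name__ == "__main__":
    template = "0000000001100010"  # first template of set 1 (from the text)
    window = "0100100001000010"    # hypothetical 4 x 4 image window
    match = xnor_match(window, template)
    print(match, hit_count(match), 16 - hit_count(match))
    # prints: 1011011111011111 13 3
```

A string per window keeps the bit order visible. In hardware, the same result is produced in parallel by the 16 exclusive-NOR gates selected through the multiplexer.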
B. Computation of Hit Rate and Threshold
Once the matching is completed, the matching output is used to calculate the hit rate. The hit rate module consists of chained half adders that sum the sixteen bits and output a five-bit representation of the total number of hits. Five bits are needed because the count ranges from 0 ("00000") to 16 ("10000"). That is one more value than four bits can represent.
The output value is then compared to the stored threshold value. If the threshold value is met, the hit count, the column, the row, and an address are passed to the output RAM, and a new address is then generated. If the threshold is not met, the threshold unit sends a 'Z' signal to the RAM. Once the image switches back to read, or to the set of templates being processed, the threshold address value returns to zero.

III. CREATING THE TEMPLATE
Using the output of the multiplexers and the hit count, we can generate a best-fit template for a location within the image that achieves a predetermined threshold. The purpose is to create a template that correctly matches the image position currently being cross-correlation matched. Although this template would provide a perfect match for that location in the image, it must still be compared with the templates in the predetermined set of available templates. The number of regenerated templates must also be limited. Otherwise, matching newly created templates that would not provide better matches creates a bottleneck within the system.

A. Threshold
To reduce the number of templates that are created, we use a threshold to determine when a template should be generated. The threshold is based on the difference between bits within a set of templates. A set is made up of templates that have bits in common, and the sets are stored in a Library. For this paper we used templates that have a two-bit difference between each template in the set. Using the two-bit difference, we can reduce the number of templates that the system creates, and hence the number of useless searches performed. Note that this reduction is present only when an image causes the Template Regeneration Module to be activated. The threshold is based on the maximum difference in the bits in the Library Module. In the earlier example, with a pixel change of two, the threshold for that set of pixels is fourteen or greater. A template is therefore generated for matching whenever the hit count is greater than or equal to fourteen.

Figure 4. Four sets of 4x4 templates stored in the Library Module (set 1, set 2, set 3, set 4)

In Figure 4, the templates in each set have at least two binary bits in common. The threshold here would be set to a hit count of 14 or greater. This threshold prevents regeneration of templates that do not exist in the set of templates. If a hit count of 16 is reached, no template generation is needed. Each set represents a different template matching array.
Template creation uses the output of the multiplexer and the threshold. Continuing the example above, suppose the threshold is passed while using set 1, with the first template as the initial run template. A possible output of the multiplexer is then "1111110111111101". This output is compared with the current template. Template bits that correspond to 1's remain the same, and bits that correspond to 0's are inverted. From the first template in binary form, "0000000001100010", and the multiplexer output "1111110111111101", we can create a new template:

Current template:                0000000001100010
Output of the matching circuit:  1111110111111101
New template:                    0000001001100000

This new template is then matched against the templates in the set. If a correct match is found, a 100% match ratio is calculated at that position. If a template is found with a higher hit rate than the current template, that template is marked as having the highest hit rate value for that position in the image.
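The threshold test and the regeneration rule can be sketched in software as follows. These details are our assumptions:
- templates and the multiplexer output are 16-character '0'/'1' strings;
- the function names are hypothetical;
- the two-template "set" in the test is hypothetical, since the Library contents are not listed in the text.

The rule itself follows the text: regenerate only when the hit count is at least 14 but below 16, keep the template bits at hits, and invert them at misses.

```python
# Sketch of the hit-count threshold and the template regeneration rule.
# Assumptions (not from the paper): 16-character '0'/'1' strings;
# all function names are hypothetical.

THRESHOLD = 14  # two-bit difference within a set -> 16 - 2 = 14 hits


def hit_count_5bit(match: str) -> str:
    """Hit count as a five-bit binary string (0 -> '00000', 16 -> '10000')."""
    return format(match.count("1"), "05b")


def should_regenerate(match: str, threshold: int = THRESHOLD) -> bool:
    """Regenerate when the threshold is met but the match is not perfect."""
    hits = match.count("1")
    return threshold <= hits < len(match)


def regenerate(template: str, match: str) -> str:
    """Keep template bits at hits ('1'); invert them at misses ('0')."""
    return "".join(t if m == "1" else ("1" if t == "0" else "0")
                   for t, m in zip(template, match))


def best_in_set(new_template: str, templates: list[str]) -> tuple[int, int]:
    """Index and hit count of the set member that best matches new_template."""
    scores = [sum(a == b for a, b in zip(new_template, t)) for t in templates]
    best = max(range(len(templates)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    template = "0000000001100010"  # first template of set 1 (from the text)
    match = "1111110111111101"     # multiplexer output (from the text)
    new = regenerate(template, match)
    print(hit_count_5bit(match), should_regenerate(match), new)
    # prints: 01110 True 0000001001100000
```

Each multiplexer bit is the exclusive-NOR of an image bit and a template bit. Inverting the missed bits therefore reproduces the image window itself, which is why the regenerated template matches that location perfectly.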
This section describes the experimental simulation of the Template Regeneration Module. The simulation tool used for evaluation is Modelsim XE III, with the clock set to 200 MHz. The testbench was created in the Xilinx ISE testbench waveform editor. The circuit uses one library module (16 templates) and an image of 16 x 16 pixels, where one pixel equals one bit. A target was placed to trigger the template regeneration module. The target image was placed at row five, column four of the search image, reading from right to left. This is equivalently column 13, row 5, reading from left to right, as shown in Figure 5. The starting template was the first template in set 1, and the templates were processed in order from top to bottom. The target image was created to be identical to the second template in set 2.
Two types of simulations were performed: the first with the original template matching circuit, and the second with the template regeneration circuit. This test provided a gauge of the performance of the overall circuit. Additional simulations changed the shape of the trigger image to evaluate the performance of the template generation circuit.

Figure 5. Search image and templates

A simulation without a known target image was also performed to show all cross-correlation comparisons for the entire image. The time required to write the image data into the Buffer RAM is not included in the simulation time because it does not change the result of the comparison between the two circuits.

A. Original Template Matching Circuit
In our simulation, the output data shows that after 13,800 microseconds the highest hit count was found using template four in set one. This is the total time needed to completely cross-correlate the entire image. For the image and the given set of templates, this process required 1350 searches. The template moves 68 times within the image to identify the region of interest (ROI) for the target.

# of template matches to ROI = (4x16)+4 = 68

The number of searches needed to complete the search for five entire templates is:

# of searches = [((4x16)+4)+(16x16x5)]
# of searches = 1350

B. Template Generation Circuit
For the template generation circuit, the simulation data shows that the first template matched the target at 34.5 microseconds, with a threshold of fourteen. Matching the sixth template, which is the best match for that particular target, took an additional five clock cycles. In this example, the template regeneration circuit identifies the correct template 18 times as fast as the original template matching circuit. The regeneration circuit has to process 73 searches:

# of searches = [((4x16)+4)+5] = 73

The original template matching circuit needs 1350 searches, while the template regeneration circuit needs only 73. The template regeneration circuit therefore finds the target faster.
Figures 6 and 7 are taken from the Modelsim XE III simulation. The red portion of each simulation represents one clock cycle after row five, column four (from right to left) of the search image is processed. In the simulation with the template regeneration module, the circuit enters the secondary circuit. The result of that matching appears two clock cycles after the main circuit processes the (5, 4) position. The original template matching circuit continues to perform uninterrupted template matching.

Figure 6. Template Regeneration

Figure 7. Original template matching library
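The search-count arithmetic above can be checked with a few lines of Python. We assume that one "search" is one 16-bit template comparison, and the function names are hypothetical. Evaluated literally, the bracketed expression for the original circuit gives 68 + 1280 = 1348, slightly below the 1350 searches reported. The sketch therefore computes the printed expression, and it uses the reported 1350 only for the speed-up ratio.

```python
# Sketch of the search-count arithmetic from the simulation section.
# Assumption (not from the paper): one "search" is one 16-bit template
# comparison; function names are hypothetical.

ROI_MOVES = (4 * 16) + 4  # template moves needed to reach the ROI


def searches_original(num_templates: int = 5) -> int:
    """ROI moves plus a full 16 x 16 scan per template, as printed."""
    return ROI_MOVES + 16 * 16 * num_templates


def searches_regeneration(extra: int = 5) -> int:
    """ROI moves plus the extra comparisons made by the regeneration circuit."""
    return ROI_MOVES + extra


if __name__ == "__main__":
    print(ROI_MOVES, searches_regeneration(), searches_original())
    # prints: 68 73 1348
    print(round(1350 / searches_regeneration(), 1))  # vs. reported 1350
    # prints: 18.5
```

The ratio of about 18.5 is consistent with the "18 times as fast" figure quoted for the regeneration circuit.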

Template matching is a very important task that must be
performed effectively within an ATR system. The goal of the
design was to create a sub-system within the ATR
environment that could do mid-simulation changes to the
template matching circuit, parallel matching, and template
regeneration during the simulation. In this paper a template
matching system was presented. The system was developed
using Xilinx and simulated in Modelsim XE III. The system
was tested using various conditions to ensure performance.
Analysis of the data shows the system to be effective in
reducing search time to locate a possible target.

ACKNOWLEDGMENT
This material is based upon work supported by the USAF/AFRL at Wright Patterson AFB Sensors Directorate under Contract No. FA8650-05-D-1912. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the USAF/AFRL, Wright Patterson AFB.
REFERENCES

[1] C. F. Olson and D. P. Huttenlocher, "Automatic Target Recognition by Matching Oriented Edge Pixels," IEEE Transactions on Image Processing, vol. 6, no. 1, January 1997.

[2] Kang-Ngee et al., "High Performance Automatic Target Recognition Through Data-Specific VLSI," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, no. 3, September 1998.

[3] Y. H. Cho, "Optimized Automatic Target Recognition Algorithm on Scalable Myrinet/Field Programmable Array Nodes," 34th IEEE Asilomar Conference on Signals, Systems, and Computers, Monterey, CA, October 2000.

[4] A. DeHon, "DPGA Utilization and Application," Proceedings of the 1996 International Symposium on Field Programmable Gate Arrays, ACM/SIGDA, February 1996.

[5] R. Amerson, R. J. Carter, W. B. Culbertson, P. Kuekes, and G. Snider, "Teramac - Configurable Custom Computing," Proceedings of the 1995 Symposium on FPGAs for Custom Computing Machines, pp. 180-188, April 1995.