
Hardware Implementation of Real-Time Image Processing on FPGA

Ms. Aayushi Jain
M Tech (Embedded System and VLSI Design), Dept. of Electronics and Communication
Gyan Ganga Institute of Science and Technology, Jabalpur
Email: jainayushi1709@gmail.com

Prof. Sunil Shah
Dept. of Electronics and Communication
Gyan Ganga Institute of Science and Technology, Jabalpur
Email: sunil.ggits@gmail.com

Abstract— The hardware architecture presented in this work is suitable for the efficient implementation of machine vision systems. This architecture supports robust, high-speed, low-latency, and low-power smart camera applications. Configuring the imaging sensor for a reduced synchronization overhead can either increase the maximum frame speed and, simultaneously, reduce the latency, or reduce the power consumption at a maintained frame speed. The majority of the dissipated dynamic power stems from the clock nets. Aggressive parallelization of computation and memory accesses while keeping the clock nets at a lower frequency therefore appears to be a good strategy for achieving low dynamic power in DSP systems on an FPGA. Static power can be controlled by selecting an FPGA device whose size matches the size of the application.

Keywords— Image processing using FPGA; hardware implementation; digital image processing; VHDL image processor

Motivations for Hardware Image Processing

As explained in the previous section, a micro-controller/DSP processor executes the steps of an algorithm sequentially. If multiple hardware circuits can be designed to carry out different algorithm steps in parallel, there will be a considerable increase in overall execution speed. Suppose a system has to be designed such that the brightness of the incoming frames has to be increased. The brightness of an image can be increased by multiplying each pixel gray level by a constant α and then adding a gain constant β to it, as in equation 1.1:

g(i, j) = f(i, j) × α + β          (1.1)

In a typical micro-controller/DSP processor based design, this involves storing the frames in a buffer and then applying the operation in equation 1.1 to each pixel gray level in a loop. Suppose each addition instruction takes 12 clock cycles and each multiplication instruction takes 36 clock cycles; then the total number of clock cycles required to process one pixel is 48. If the incoming frames are of size 100 × 100, such a design needs 100 × 100 × 48 clock cycles to process the entire frame. Now suppose there are 10,000 adder and multiplier circuits, one corresponding to each pixel. In such a design, all the pixels can be processed in parallel, and the whole operation can be completed in just two clock cycles. In principle, such a system would therefore be (100 × 100 × 48) / 2 = 240,000 times faster than the micro-controller/DSP processor based system. In actual designs, the algorithm will be divided into parallel blocks that are executed simultaneously.
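To make the cost of the sequential approach concrete, the per-pixel loop described above can be sketched in C as follows. This is only an illustrative software model (the 8-bit data type, floating-point coefficients and saturation handling are assumptions, not taken from the paper); a hardware design would instead replicate or pipeline the multiply-add datapath.

#include <stdint.h>

#define WIDTH  100
#define HEIGHT 100

/* Sequential brightness adjustment: g(i,j) = f(i,j) * alpha + beta.
 * Every pixel is visited in a loop, so the cycle count grows with
 * WIDTH * HEIGHT, exactly as argued in the text. */
void adjust_brightness(const uint8_t f[HEIGHT][WIDTH],
                       uint8_t g[HEIGHT][WIDTH],
                       float alpha, float beta)
{
    for (int i = 0; i < HEIGHT; i++) {
        for (int j = 0; j < WIDTH; j++) {
            float v = f[i][j] * alpha + beta;   /* one multiply + one add per pixel */
            if (v > 255.0f) v = 255.0f;         /* clamp to the 8-bit gray-level range */
            if (v < 0.0f)   v = 0.0f;
            g[i][j] = (uint8_t)v;
        }
    }
}

On an FPGA, the same multiply-add (typically in fixed-point) would be instantiated many times or deeply pipelined so that many pixels are processed per clock, which is the source of the speed-up argued above.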
Figure: Fundamental steps of an image processing system.

In a nutshell, the significant increase in processing speed is the major motivation behind hardware image processing. If the processing time is shorter, the power consumption is also reduced. Hence it can be concluded that hardware image processing systems give better performance in time-critical applications. In the current scenario, most image processing algorithms run in a sequential environment; research on FPGA based image processing therefore has great significance and scope in time-critical applications.

Parallelism

Most image processing algorithms have inherent parallelism in them. The processing speed can be improved by executing their steps concurrently. In principle, every algorithm step could be implemented on a separate processor. But if each step depends on the results of previous steps, the processors will have to wait for those results, and the reduction in the response time of the system will be very small. For practical implementations on a parallel architecture, the algorithm should therefore contain a significant number of parallel operations. This is known as Amdahl's law [1]. Let s be the proportion of the algorithm that has to be executed sequentially, and let p be the proportion that can be executed in parallel using N different processors. Then the best possible speed-up that can be obtained is bounded by

Speedup ≤ 1 / (s + p/N)

Equality can only be achieved if no additional overhead, such as communication, is introduced by converting the sequential algorithm into a parallel one. In practical scenarios, the actual speed-up is always less than the number of processors N. As N increases, the execution speed also increases, but ideally, as N tends to infinity, the overall execution speed depends solely on the proportion of the algorithm that has to be executed sequentially:

Speedup ≤ 1 / s

Thus, to achieve a significant speed-up, the proportion of the algorithm that can be executed in parallel should be large. Fortunately, most image processing algorithms are largely parallel in nature.
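As a quick worked example of this bound (the numbers are illustrative, not taken from the paper), the short C program below evaluates Amdahl's speed-up limit for a few processor counts.

#include <stdio.h>

/* Best-case speed-up from Amdahl's law for a sequential fraction s
 * and N processing elements working on the parallel fraction p = 1 - s. */
static double amdahl_speedup(double s, double n)
{
    return 1.0 / (s + (1.0 - s) / n);
}

int main(void)
{
    double s = 0.05;                      /* assume 5% of the work is inherently sequential */
    int counts[] = { 1, 10, 100, 10000 };
    for (int k = 0; k < 4; k++)
        printf("N = %5d  ->  speedup <= %.2f\n", counts[k], amdahl_speedup(s, counts[k]));
    printf("N -> infinity ->  speedup <= %.2f\n", 1.0 / s);   /* limiting case */
    return 0;
}

With s = 0.05 the speed-up saturates near 20 no matter how many parallel units the FPGA provides, which is why the fraction of parallel work matters more than the raw resource count.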
Why FPGA

An FPGA based design is inherently parallel in nature. Different algorithm steps are mapped to different hardware modules in the FPGA, which operate concurrently. The main reasons for choosing an FPGA as an embedded image processing platform are given below:

• Parallel operation
• Speed of execution
• Flexibility
• Low power design

Literature Survey

[1.] Jing Gu and Yang Huayu, "Real-time Image Collection and Processing System Design," Fifth International Conference on Instrumentation and Measurement, Computer, Communication and Control, 2015.

A combined FPGA and DSP structure can satisfy the requirements of a real-time image acquisition system well. Image data from dim regions is processed with a logarithmic stretching algorithm, which effectively enhances the image and makes its unevenly distributed gray levels clearer. A parallel JPEG compression algorithm is used to block the two-dimensional images before the FPGA sends the data, reducing the processing time. Although JPEG compression greatly reduces the amount of information, it preserves the details of the original image and achieves the expected target.
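For reference, logarithmic stretching of the kind mentioned above is usually formulated as g = c · log(1 + f); the sketch below is a generic software version of that mapping (the scaling constant and the 8-bit range are assumptions, and the cited paper's exact variant may differ).

#include <math.h>
#include <stdint.h>

/* Logarithmic gray-level stretching: dark regions are expanded while
 * bright regions are compressed, enhancing detail in dim images. */
void log_stretch(const uint8_t *in, uint8_t *out, int num_pixels)
{
    /* Scale so that an input of 255 still maps to 255 at the output. */
    const double c = 255.0 / log(1.0 + 255.0);
    for (int i = 0; i < num_pixels; i++)
        out[i] = (uint8_t)(c * log(1.0 + (double)in[i]));
}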
[2.] Gopinath Mahale, Hamsika Mahale, Arnav Goel, S. K. Nandy, S. Bhattacharya, and Ranjani Narayan, 28th International Conference on VLSI Design and 2015 14th International Conference on Embedded Systems, 2015.

The objective of this paper is to come up with a scalable, modular hardware solution for real-time Face Recognition (FR) on large databases. Existing hardware solutions use algorithms with low recognition accuracy that are suitable for real-time response. In addition, the database size for these solutions is limited by on-chip resources, making them unsuitable for practical real-time applications. Due to their high computational complexity, the authors do not adopt the algorithms in the literature with superior recognition accuracy. Instead, they come up with a combination of Weighted Modular Principal Component Analysis (WMPCA) and a Radial Basis Function Neural Network (RBFNN), which outperforms the algorithms used in existing hardware solutions on face databases with high illumination and pose variation. They propose a hardware solution for real-time FR which uses parallel streams to perform independent modular computations.

This hardware solution for real-time face recognition addresses the problems of low recognition accuracy and small supported database size in existing solutions. The WMPCA/RBFNN combination shows better recognition accuracy on images with considerable variations in pose and illumination, and the modular hardware architecture supports high frame-rate requirements. The proposed architecture scales with respect to database size and image dimensions thanks to the availability of large external memory, and a novel storage format on the external memory is used to minimize memory latencies. The FPGA emulation of the hardware solution is able to perform face recognition on images of dimension 128 × 128 at 450 recognitions per second with 450 classes of face images. Due to the improved recognition algorithm and the scalable parallel architecture, the proposed FRS is well suited for real-time surveillance applications.

[3.] Khursheed Khursheed, Muhammad Imran, Abdul Wahid Malik, Mattias O'Nils, Najeem Lawal, and Benny Thörnberg, "Exploration of Tasks Partitioning Between Hardware, Software and Locality for a Wireless Camera Based Vision Sensor Node," Sixth International Symposium on Parallel Computing in Electrical Engineering.

Performing more and more tasks in software increases the energy requirement of the vision sensor node, so a task-partitioning strategy that keeps many modules in software should be avoided. Similarly, shifting more tasks to hardware results in increased hardware cost as well as increased design and development time. The authors show that partitioning tasks between hardware and software at the vision sensor node affects its energy requirement. Considering this, their results show that the most suitable strategy for their specific application is to perform vision tasks such as image capture, background subtraction, segmentation, morphology and TIFF Group 4 compression on the FPGA and then send the results using the transceiver embedded in the SENTIO32 platform. Bubble removal, labeling and feature extraction are performed at the central base station. In this way the power requirement of the vision sensor node is reduced, which results in a lifetime of 5.1 years for the node at a sample rate of 5 minutes. A general conclusion of their results is that the highest system lifetime is achieved with a full FPGA solution for the processing steps; however, this requires FPGA technology with large memory resources and either low sleep power or low configuration energy.
[4.] Lan Shi, David Hadlich, Christopher Soell, Thomas Ussmueller and Robert Weigel, "A Tone Mapping Algorithm Suited for Analog-Signal Real-Time Image Processing," 978-1-5090-0493-5/16 © 2016 IEEE.

This work presents a Tone Mapping Operator (TMO) which adjusts the High Dynamic Range (HDR) of image sensor data to the limited dynamic range of conventional displays using analog signal processing. It is based on Photographic Tone Reproduction (PTR) and is suitable for analog circuit design in a CMOS image sensor, in order to reduce the hardware cost and operation time of real-time image processing. The proposed analog TMO does not require access to any other pixel of the image sensor and operates in the analog domain at the focal plane. Furthermore, the proposed TMO provides well tone-mapped image quality, given suitable calibration and user settings. It is especially beneficial for customer-tailored applications that strongly require real-time processing with low power consumption, e.g., security monitoring, traffic monitoring, advanced driving systems, and medical technology.

Further research will investigate the CMOS integrated-circuit design of the proposed analog TMO. Analog circuit noise, inaccuracies and overflow problems, especially in the arithmetic circuits, will be considered in the circuit design. Furthermore, a stage pipeline to speed up the analog computation could also be implemented as an enhancement in a next step.
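Since the TMO above is based on Photographic Tone Reproduction, a brief software sketch of the global PTR (Reinhard) operator may help make the idea concrete. This is a generic digital formulation with an assumed key value of 0.18; the cited paper's per-pixel analog circuit realization is not reproduced here.

#include <math.h>
#include <stddef.h>

/* Global Photographic Tone Reproduction (Reinhard) operator.
 * lum_in  : HDR luminance values (linear, > 0)
 * lum_out : tone-mapped luminance in [0, 1)
 * key     : subjective "key" of the scene, commonly 0.18          */
void ptr_global(const double *lum_in, double *lum_out, size_t n, double key)
{
    const double eps = 1e-6;
    double log_sum = 0.0;

    /* Log-average ("world") luminance of the frame. */
    for (size_t i = 0; i < n; i++)
        log_sum += log(eps + lum_in[i]);
    double lw_avg = exp(log_sum / (double)n);

    /* Scale to the chosen key, then compress high values toward 1. */
    for (size_t i = 0; i < n; i++) {
        double lm = key * lum_in[i] / lw_avg;   /* scaled luminance */
        lum_out[i] = lm / (1.0 + lm);           /* displayable range */
    }
}

Note that the frame-wide log-average computed here is exactly the kind of other-pixel access the cited analog TMO avoids by relying on calibration and user settings instead.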
In this proposal, we describe a hardware architecture for real-time image component labeling and the computation of image component feature descriptors. These descriptors are object-related properties used to describe each image component. Embedded machine vision systems demand robust performance and power efficiency as well as minimum area utilization, depending on the deployed application. In the proposed architecture, the hardware modules for component labeling and feature calculation run in parallel. A CMOS image sensor will be used to capture the images. The architecture will be synthesized and implemented on a Xilinx Virtex-6 FPGA. The developed architecture will be capable of processing as many as 350 video frames per second of size 640 × 480 pixels, and dynamic power consumption will be kept to a minimum.
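To clarify what the labeling module computes, the sketch below is a plain two-pass connected component labeling routine with union-find, plus one simple feature descriptor (pixel area) per component. It is only a software reference model under assumed conventions (binary input, 4-connectivity, row-major storage, 640 × 480 frames); the proposed hardware architecture itself is not shown here.

#include <stdint.h>

#define W 640
#define H 480

static int parent[W * H / 2 + 1];   /* union-find table for provisional labels */

static int find_root(int x) { while (parent[x] != x) x = parent[x] = parent[parent[x]]; return x; }
static void merge(int a, int b) { a = find_root(a); b = find_root(b); if (a != b) parent[b] = a; }

/* Two-pass 4-connected component labeling of a binary image.
 * labels[] receives one label per pixel (0 = background); area[] must have
 * room for one entry per provisional label and receives the pixel count of
 * each resolved label. Returns the number of provisional labels used.     */
int label_components(const uint8_t *bin, int *labels, int *area)
{
    int next = 1;
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            int idx = y * W + x;
            if (!bin[idx]) { labels[idx] = 0; continue; }
            int left = (x > 0) ? labels[idx - 1] : 0;
            int up   = (y > 0) ? labels[idx - W] : 0;
            if (!left && !up) { parent[next] = next; labels[idx] = next++; }
            else if (left && up) { labels[idx] = left; merge(left, up); }
            else labels[idx] = left ? left : up;
        }
    }
    /* Second pass: resolve merged labels and accumulate a feature (area). */
    for (int i = 0; i < next; i++) area[i] = 0;
    for (int i = 0; i < W * H; i++) {
        if (labels[i]) labels[i] = find_root(labels[i]);
        area[labels[i]]++;
    }
    return next - 1;   /* upper bound on the number of components */
}

In the proposed architecture the equivalent labeling and feature-extraction logic would run as parallel hardware modules fed directly by the CMOS sensor stream, rather than as nested loops over a stored frame.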
SUMMARY

FPGAs are often used as implementation platforms for real-time image processing applications because their structure can exploit spatial and temporal parallelism. Such parallelization is subject to the processing mode and hardware constraints of the system.

Using high-level languages and compilers to hide the constraints and automatically extract parallelism from the code does not always produce an efficient mapping to hardware. The code is usually adapted from a software implementation and thus has the disadvantage that the resulting implementation is fundamentally based on a serial algorithm.

Manual mapping removes the 'software mindset' restriction, but the designer must instead deal more closely with timing, resource and bandwidth constraints, which complicate the mapping process. Timing or processing constraints can be met using pipelining; this only adds latency rather than changing the algorithm, which is why automated pipelining is possible. Meeting bandwidth constraints, on the other hand, is more difficult, because the underlying algorithm may need to be completely redesigned, an impossible task for a compiler. This paper presented some general techniques for evaluating complex expressions to help deal with resource constraints by reducing logic block usage.

Resources                Bailey et al. [9, 27, 28]   Kim et al. [10]
Logic (LUTs)             958                         18170
Block RAMs               4                           52
Max. clock frequency     40.6 MHz                    29.51 MHz
Frame speed (frames/s)   110                         72

Window operations require local caching and control mechanisms, but the underlying algorithm remains the same. Global operations such as chain coding require random access to memory and cannot easily be implemented in stream processing modes; this forces the designer to reformulate the algorithm.
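As an illustration of the local caching that window operations need (a generic example, not taken from the paper): a 3×3 filter over a streamed image only ever needs the two previous rows, so a hardware design keeps two line buffers instead of a full frame store. The C sketch below models that row-buffer behaviour for a 3×3 box filter, with the 640-pixel row width assumed to match the streamed image.

#include <stdint.h>
#include <string.h>

#define W 640

/* Process one incoming row of a streamed image with a 3x3 box filter,
 * using only two cached rows (the "line buffers" a hardware pipeline
 * would keep in block RAM) instead of a whole frame buffer.
 * out_row holds the filtered values for the centre row of the window. */
void box3x3_row(const uint8_t *new_row, uint8_t *out_row,
                uint8_t row_buf[2][W], int rows_seen)
{
    if (rows_seen >= 2) {                 /* a full 3-row window is available */
        for (int x = 1; x < W - 1; x++) {
            int sum = 0;
            for (int dx = -1; dx <= 1; dx++)
                sum += row_buf[0][x + dx] + row_buf[1][x + dx] + new_row[x + dx];
            out_row[x] = (uint8_t)(sum / 9);
        }
        out_row[0] = out_row[1];          /* simple border handling */
        out_row[W - 1] = out_row[W - 2];
    }
    /* Shift the cache: the oldest row is discarded, the new row is kept. */
    memcpy(row_buf[0], row_buf[1], W);
    memcpy(row_buf[1], new_row, W);
}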
References

[1.] H. C. van Assen, H. A. Vrooman, M. Egmont-Petersen et al., "Automated calibration in vascular X-ray images using the accurate localization of catheter marker bands," Investigative Radiology, vol. 35, no. 4, pp. 219–226, 2000.
[2.] D. Meng, C. Yun-feng, and W. Qing-xian, "Autonomous craters detection from planetary image," in Proceedings of the 3rd International Conference on Innovative Computing Information and Control (ICICIC '08), June 2008.
[3.] C. Steger, M. Ulrich, and C. Wiedemann, Machine Vision Algorithms and Applications, Wiley-VCH, New York, NY, USA, 2008.
[4.] D. G. Bailey, C. T. Johnston, and N. Ma, "Connected components analysis of streamed images," in 2008 International Conference on Field Programmable Logic and Applications (FPL), pp. 679–682, September 2008.
[5.] M. G. Pinheiro, "Image descriptors based on the edge orientation," in Proceedings of the 4th International Workshop on Semantic Media Adaptation and Personalization (SMAP '09), pp. 73–78, December 2009.
[6.] T. Jabid, M. H. Kabir, and O. Chae, "Local Directional Pattern (LDP)—a robust image descriptor for object recognition," in Proceedings of the 7th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS '10), pp. 482–487, Boston, Mass, USA, September 2010.
[7.] A. A. Mohamed and R. V. Yampolskiy, "An improved LBP algorithm for avatar face recognition," in Proceedings of the 23rd International Symposium on Information, Communication and Automation Technologies (ICAT '11), Sarajevo, Bosnia and Herzegovina, October 2011.
[8.] Y. Yoon, K.-D. Ban, H. Yoon, and J. Kim, "Blob extraction based character segmentation method for automatic license plate recognition system," in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC '11), pp. 2192–2196, Anchorage, Alaska, USA, October 2011.
[9.] C. T. Johnston and D. G. Bailey, "FPGA implementation of a single pass connected components algorithm," in Proceedings of the 4th IEEE International Symposium on Electronic Design, Test and Applications (DELTA '08), pp. 228–231, Hong Kong, China, January 2008.
[10.] N. Ma, D. G. Bailey, and C. T. Johnston, "Optimised single pass connected components analysis," in Proceedings of the International Conference on ICECE Technology (FPT '08), pp. 185–192, December 2008.
[11.] K. Benkrid, S. Sukhsawas, D. Crookes, and A. Benkrid, "An FPGA-Based Image Connected Component Labeller," in Field Programmable Logic and Application, vol. 2778 of Lecture Notes in Computer Science, pp. 1012–1015, 2003.
[12.] M. Jabłoński and M. Gorgoń, "Handel-C implementation of classical component labelling algorithm," in Proceedings of the EUROMICRO Symposium on Digital System Design (DSD '04), pp. 387–393, September 2004.
[13.] M. Mylona, D. Holding, and K. Blow, "DES developed in Handel-C," in Proceedings of the London Communications Symposium, 2002.
[14.] F. Chang, C.-J. Chen, and C.-J. Lu, "A linear-time component-labeling algorithm using contour tracing technique," Computer Vision and Image Understanding, vol. 93, no. 2, pp. 206–220, 2004.
[15.] M. F. Ercan and Yu-Fai Fung, "Connected component labeling on a one dimensional DSP array," in Proceedings of the IEEE Region 10 Conference (TENCON '99), vol. 2, pp. 1299–1302, December 1999.
[16.] H. C. van Assen, M. Egmont-Petersen, and J. H. C. Reiber, "Accurate object localization in gray level images using the center of gravity measure: accuracy versus precision," IEEE Transactions on Image Processing, vol. 11, no. 12, pp. 1379–1384, 2002.
[17.] Digilent Inc., http://www.digilentinc.com/.
[18.] Aptina Imaging Corporation, http://www.aptina.com/.
[19.] Visual Applets, Silicon Software GmbH, http://www.silicon-software.info/en/.
