Optimizing Signal and Image Processing Applications Using Intel Libraries

Jerome Landre (a) and Frederic Truchetet (b)
(a) CReSTIC, Univ. de Reims-Champagne-Ardenne, I.U.T., 10 rue de Quebec, Troyes, France
(b) Le2i, Univ. de Bourgogne, I.U.T., 12 rue de la Fonderie, Le Creusot, France
ABSTRACT
This paper presents optimized signal and image processing libraries from Intel Corporation. Intel Integrated
Performance Primitives (IPP) is a low-level signal and image processing library developed by Intel Corporation
to optimize code on Intel processors. The Open Computer Vision library (OpenCV) is a high-level library
dedicated to computer vision tasks. This article describes the use of both libraries to build flexible and
efficient signal and image processing applications.
1. INTRODUCTION
When creating a signal or image processing application, code optimization is very important to ensure good
overall performance [1]. FPGA (Field Programmable Gate Array) and DSP (Digital Signal Processor) reconfigurable
processors allow hardware optimizations [2-4] leading to real-time image processing applications. However, these
processors are expensive, and reconfiguring them is not easy once they are used on a production line.
Driven by multimedia needs, personal computers keep becoming more powerful and less expensive. With
64-bit processors and operating systems, new prospects for real-time image processing are emerging.
With new digital cameras and video cameras, image sizes are increasing sharply (from four to more than ten
megapixels per image). Image processing programmers need tools to optimize their code in order to perform
real-time operations on large images. Intel Corporation develops both processors and software tools designed
to get the best performance from these processors.
Intel Corporation offers IPP [5] and OpenCV [6], which provide low-level and high-level computer vision and
image processing functions. IPP is a commercial, proprietary library that works only on Intel processors (x86,
Pentium, Pentium II, III, 4, Xeon, Itanium, Itanium 2, Core Duo, Core 2 Duo...) on several platforms (Windows,
Linux and MacOS). OpenCV is free and open source and can be compiled on many different platforms (Windows,
Linux, MacOS, FreeBSD...). Both OpenCV and IPP can work on a full image, on a region of interest (ROI) or on a
channel of interest (COI), in which case operations are performed on a subset of the image only.
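The ROI and COI ideas can be illustrated with a minimal Python sketch. It is not IPP or OpenCV code: the function name `apply_on_roi`, its parameters and the nested-list image layout are assumptions made only for this example. The operation is applied to a single channel inside a rectangle, and the rest of the image is left untouched.

```python
def apply_on_roi(image, roi, coi, op):
    """Apply op to channel `coi` of every pixel inside `roi`, in place.

    image: list of rows, each row a list of [r, g, b] pixels (hypothetical layout).
    roi:   (x, y, width, height) rectangle, in pixels.
    coi:   index of the channel of interest (0, 1 or 2).
    """
    x0, y0, width, height = roi
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            image[y][x][coi] = op(image[y][x][coi])
    return image


# 4x4 black RGB image
img = [[[0, 0, 0] for _ in range(4)] for _ in range(4)]
# Set the green channel to 255 inside a 2x2 ROI whose top-left corner is (1, 1)
apply_on_roi(img, roi=(1, 1, 2, 2), coi=1, op=lambda value: 255)
```

Only 4 of the 16 pixels are visited here, which is why restricting work to an ROI or a COI reduces computing time.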
Many advanced signal processing instruction sets are hardwired in Intel processors: MMX (Multimedia Extensions),
MMX2, SSE (Streaming Single instruction multiple data Extensions), SSE2, SSE3... IPP offers a software
layer that gives access to these hardwired functions.
Figure 1 shows how OpenCV and IPP interact. Applications using OpenCV interact directly with a non-Intel
processor (d), or with an Intel processor when IPP is unavailable (c). When IPP is present, OpenCV automatically
uses IPP optimized functions on Intel processors (b). Applications can also use IPP without OpenCV
on Intel processors (a).
The IPP and OpenCV libraries are an important part of the first author's Ph.D. thesis, in which they were applied
to image indexing and retrieval using wavelets. A first tutorial was written for IPP and OpenCV programmers [7].
This paper is organized as follows. Section 2 describes the IPP library. Section 3 introduces the
Intel OpenCV library. In section 4, a set of performance tests illustrates computing time optimization.
Section 5 describes the speedup obtained with a more recent processor for our tests. Section 6 is a general
conclusion on Intel libraries.
Figure 1. IPP and OpenCV interactions on Intel and non-Intel architectures.
2. INTEL PERFORMANCE PRIMITIVES

IPP is a set of four low-level libraries: signal processing, image processing, small matrices computation and
cryptography. They are developed by Intel Corporation to optimize code for the Intel processor architecture.
IPP works with many datatypes, from 8-bit unsigned integers to 64-bit floating point data. IPP supports
floating point complex data up to 32 bits. IPP does not have any graphical user interface (GUI), so an
external GUI must be used to visualize results.
The IPP library is composed of several dynamic libraries, one per Intel processor family (PIII, P4, Core Duo...).
Users can statically link these libraries to their applications, choosing only one processor model. Users can also
link the libraries dynamically, so that an IPP-based program selects the best dynamic library for the processor
detected at run time. Static linking does not require IPP on the destination computer, whereas dynamic linking
requires IPP to be installed on it.
The libraries are upward compatible: a program statically linked with the Pentium library will work
on Pentium, Pentium II, III, 4 and the latest 64-bit processors, but with lower performance than if it had been
linked with the library for a latest-generation Core 2 Duo processor. Programmers thus choose both the
destination processor and the linking method.
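The selection logic described above can be sketched in a few lines of Python. This is only a model of the idea, not the IPP dispatcher: the level names in `CPU_LEVELS` and the function `pick_library` are hypothetical and chosen for this example.

```python
# Processor levels ordered from oldest to newest (illustrative names only).
CPU_LEVELS = ["pentium", "pentium2", "pentium3", "pentium4", "core2"]


def pick_library(available, cpu):
    """Return the most recent optimized library that the CPU can run.

    available: set of levels for which an optimized library is present.
    cpu:       level of the processor detected at run time.
    """
    cpu_rank = CPU_LEVELS.index(cpu)
    # Upward compatibility: code built for level i runs on any level >= i.
    usable = [lvl for lvl in CPU_LEVELS[: cpu_rank + 1] if lvl in available]
    if not usable:
        raise RuntimeError("no compatible library for " + cpu)
    return usable[-1]


# Dynamic linking: every variant is installed, the best match is selected.
dynamic_choice = pick_library(set(CPU_LEVELS), "pentium4")
# Static linking against the Pentium variant only: the program still runs on
# a newer processor, but without the newer optimizations.
static_choice = pick_library({"pentium"}, "core2")
```

In this model, `dynamic_choice` is the Pentium 4 variant, while `static_choice` stays at the Pentium variant even on a Core 2 Duo, which is the performance trade-off mentioned in the text.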
2.1. Signal processing library

This part of the library is dedicated to signal processing tasks. Many functions exist for classical and
advanced signal processing. Classical statistical, logical, arithmetic, filtering and sampling functions are
present. Advanced signal functions are also available: Fourier, wavelet and discrete cosine transforms.
Functions for specific signal processing tasks, such as speech recognition and coding or MP3 encoding and
decoding, cover a wide range of signal representations. String handling functions and fixed-accuracy arithmetic
functions add to the set of useful functions available to programmers.
2.2. Image processing library

Image processing functions are also optimized to take advantage of the hardwired functions integrated into Intel
processors. Arithmetic, logical, threshold, statistical and geometric functions on images are present in this
part of the library. Color conversions, wavelets, Fourier transforms and image filtering allow a wide set of
applications. All standard datatypes are allowed, from unsigned 8-bit integers (Ipp8u) to 64-bit floating point
values (Ipp64f).
Computer vision is an important part of IPP, with motion estimation, corner and edge detection, motion
templates... Multimedia is also covered, with MPEG-1, MPEG-2, DV, MPEG-4, H.263 and H.264 video
coding and decoding. IPP is a complete set of functions to build vision and image processing applications.
2.3. Small matrices library

This part of the IPP library deals with matrix computation limited to small matrices. For large matrices, Intel
offers another library, the Math Kernel Library, which is not described in this paper.
The small matrices library includes computation on matrices and vectors. Multiplication, addition, subtraction,
distances, transposition and trace of matrices can be computed. Least squares problem solving and linear system
solution functions are also available.
As its name indicates, this library is limited to small matrices, with a maximum size of 6x6. The functions it
provides are nevertheless sufficient for programming CAD and 3D applications.
2.4. Cryptography
Cryptography functions cover many different types of cryptographic algorithms: DES, Blowfish, AES...
Many hashing and public-key functions are available: MD5, SHA1, SHA256, RSA...
Big number arithmetic and prime number generation functions are also present. The cryptography library is an
extension of the signal processing library. Intel Centrino processors, dedicated to wireless network
connections, can use these hardwired functions for wireless security cryptography.
3. OPENCV
OpenCV is a high-level image processing and computer vision library developed by Intel Corporation. It is
available free of charge for many platforms: Intel architecture, AMD processors, Unix systems...
OpenCV offers functions working on high-level data structures such as sets, trees, graphs, vectors, matrices and
images. OpenCV supports many datatypes, from unsigned 8-bit integers to 64-bit floating point values. OpenCV
allows any number of channels for an image, which is useful when working on multichannel images.
OpenCV offers computer vision methods. It also offers a graphical user interface (GUI) to read, visualize and
save images from an image file or from any video stream. It allows video processing on the fly by applying
transforms directly to video buffers before visualization.
4. PERFORMANCE EVALUATION
Four performance tests were carried out to evaluate IPP and OpenCV efficiency. Several image processing tasks
were applied to images of increasing size. There are four images; the first one is a 1024x768 pixel RGB
color image, which is a standard size for computer displays (XGA). The other sizes considered are 1600x1200,
2048x2048 and 4096x4096, to show IPP performance on large images.
Tests were performed on a 32-bit Pentium 4 at 2 GHz with 512 MB of RAM, running Linux Fedora Core 5,
GCC 4.0, IPP 5.0 and OpenCV 0.9.9, which is a relatively slow configuration compared with recent 64-bit
processors. However, computing times remain very good even on this processor. OpenCV alone, OpenCV+IPP and
IPP alone are compared to show the advantages of IPP for real-time imaging.
4.1. test 1 : Laplacian edge detector

Edge detection is often used to detect the contours of objects in images. It is a simple filtering
algorithm, but it requires one convolution per pixel. A 3x3 filter was chosen for our test. Times are
given in seconds and image sizes in millions of pixels.
Figure 3a shows computing time for a Laplacian edge detector using OpenCV alone, OpenCV+IPP and IPP
alone. Edge detection can be processed in real time on a video at 30 frames per second up to a 1600x1200 image
size (on this slow architecture). In this case, processing one frame must not take more than 1/30 of a second,
i.e. about 0.033 second. Only IPP offers such performance; OpenCV is slower, even when linked with IPP.
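To make the cost of this test concrete, here is a minimal pure-Python sketch of a 3x3 Laplacian filter and of the 30 frames per second time budget. It is not the code used for the measurements: the 4-neighbour kernel, the border handling and the names `laplacian` and `is_real_time` are assumptions chosen for this illustration.

```python
# 3x3 Laplacian kernel (4-neighbour variant, an assumption for this sketch)
KERNEL = [[0, 1, 0],
          [1, -4, 1],
          [0, 1, 0]]


def laplacian(gray):
    """Convolve a grayscale image (list of rows) with KERNEL.

    Border pixels are left at 0 to keep the sketch short.
    """
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            acc = 0
            # Nine multiply-accumulate operations per output pixel
            for ky in range(3):
                for kx in range(3):
                    acc += KERNEL[ky][kx] * gray[y + ky - 1][x + kx - 1]
            out[y][x] = acc
    return out


def is_real_time(seconds_per_frame, fps=30):
    """True if one frame is processed within the 1/fps time budget."""
    return seconds_per_frame <= 1.0 / fps


# A single bright pixel on a dark background
image = [[0] * 5 for _ in range(5)]
image[2][2] = 10
edges = laplacian(image)
```

With nine multiply-accumulate operations per pixel, a 1600x1200 image needs roughly 17 million of them per frame, all of which must fit in the 1/30 second budget.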
Figure 2. a) Original image - b) Laplacian edge detector
4.2. test 2 : Color conversion

When dealing with image processing applications, it is often necessary to change the colorspace of the image
before processing. Color conversion is a simple but slow task. In the first part of the test, our images are
converted from RGB to the Luv colorspace. In the second part, images are converted from RGB to the HLS colorspace.
Figure 3b shows computing time for color conversions of images of increasing size using OpenCV alone,
OpenCV+IPP and IPP alone. Again, IPP has the advantage on all images. Real-time color conversion is not possible
in this test. IPP does not accelerate computation at all in this case.
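As an illustration of the per-pixel work involved, the following sketch converts 8-bit RGB pixels to HLS with Python's standard `colorsys` module. It is not the IPP or OpenCV conversion: the helper names `rgb8_to_hls` and `convert_image` and the choice of expressing hue in degrees are assumptions for this example, and the Luv conversion is not shown.

```python
import colorsys


def rgb8_to_hls(r, g, b):
    """Convert one 8-bit RGB pixel to (hue in degrees, lightness, saturation)."""
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, l, s


def convert_image(pixels):
    """Convert every pixel; the cost grows linearly with the pixel count."""
    return [rgb8_to_hls(r, g, b) for (r, g, b) in pixels]


# Pure red, pure green and a mid gray
hls = convert_image([(255, 0, 0), (0, 255, 0), (128, 128, 128)])
```

Each pixel requires several comparisons, divisions and branches, which helps explain why such a simple operation becomes slow on multi-megapixel images.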
Figure 3. a) Laplacian edge detector performances. - b) Color conversion.
4.3. test 3 : Wavelet transforms

The wavelet transform allows multiresolution decomposition of images. A wavelet transform was tested to
demonstrate IPP efficiency. It is based on an integer lifting scheme (S transform) proposed by Calderbank et
al. [8]
Figure 4a shows the wavelet transform computing time for our images. Again, IPP is faster than OpenCV.
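For readers unfamiliar with the S transform, a one-level, one-dimensional version can be sketched as follows. This is a plain Python illustration of the integer lifting idea, not the IPP-based implementation that was timed; the function names are chosen for this example. Each pair of samples is replaced by its integer average (approximation) and its difference (detail), and the inverse recovers the original integers exactly.

```python
def s_transform(signal):
    """One level of the integer S transform on an even-length list of ints."""
    approx, detail = [], []
    for i in range(0, len(signal), 2):
        a, b = signal[i], signal[i + 1]
        d = b - a          # detail: difference of the pair
        s = a + d // 2     # approximation: floor((a + b) / 2)
        approx.append(s)
        detail.append(d)
    return approx, detail


def inverse_s_transform(approx, detail):
    """Exact inverse of s_transform."""
    out = []
    for s, d in zip(approx, detail):
        a = s - d // 2
        out.extend([a, a + d])
    return out


samples = [5, 8, 3, 3, 200, 17, 0, 255]
low, high = s_transform(samples)
restored = inverse_s_transform(low, high)
```

For an image, the same step would typically be applied to the rows and then to the columns, and repeated on the approximation band to obtain the multiresolution decomposition.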
4.4. test 4 : K-means classification

Classification is a time-consuming task in image processing. A classification task (using the K-means algorithm)
with a varying number of classes (K between 2 and 9) on one million two-dimensional vectors is shown in figure
4b. OpenCV using IPP optimized functions is faster than OpenCV alone. IPP has no function to compute K-means
directly, so IPP alone is not included in this test.
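The structure of the algorithm shows why it is expensive. The sketch below is a plain Python version of the classical K-means iteration (Lloyd's algorithm) on 2D vectors. It is not the OpenCV implementation used in the test; the function name, the fixed iteration count and the random initialization are assumptions for this example.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on 2D points; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center (squared Euclidean distance)
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2,
            )
        # Update step: each center moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centers, labels


# Two synthetic, well-separated clouds of 200 points each
rng = random.Random(1)
cloud = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
cloud += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(200)]
centers, labels = kmeans(cloud, k=2)
```

Every iteration computes K distances for each vector, so the cost grows with both the number of vectors and K, which is consistent with testing K from 2 to 9 on one million vectors.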
Figure 4. a) Wavelet transform. - b) One million 2D vector classification using K-means.
5. TOWARDS 64 BITS
Tests were ported to an Intel Core Duo (64 bits) 2 GHz processor with 1 GB of RAM running Linux Fedora Core 5
(32 bits, because the 64-bit version of this operating system was not available at the time of writing)
in order to compare computing times. Results are presented in figure 5. They are expressed as the mean
of the speedups obtained over the image sizes (tests 1, 2 and 3) and as the mean of the speedups obtained over
the numbers of clusters (test 4).
% of speedup   OpenCV    OpenCV+IPP   IPP
test 1         22.78%    27.11%       38.7%
test 2         17.34%    19.55%       23.31%
test 3         24.19%    26.13%       26.76%
test 4         23.28%    25.20%       NA
Figure 5. Results of tests on a 64 bits processor.
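The paper does not state the formula behind the percentages. One plausible reading, shown here purely as an assumption, is the relative reduction in computing time between the two machines, averaged over the image sizes. The function names and the timings below are placeholders for illustration, not measurements from these tests.

```python
def speedup_percent(t_old, t_new):
    """Relative reduction of computing time, in percent (assumed definition)."""
    return 100.0 * (t_old - t_new) / t_old


def mean_speedup(times_old, times_new):
    """Mean of the per-image speedups for one test and one library."""
    gains = [speedup_percent(a, b) for a, b in zip(times_old, times_new)]
    return sum(gains) / len(gains)


# Placeholder timings in seconds, one per image size (NOT data from the paper)
old_times = [1.0, 2.0, 4.0, 8.0]
new_times = [0.8, 1.5, 3.0, 6.0]
result = mean_speedup(old_times, new_times)
```

Under this definition, a value of 38.7% would mean that the newer processor needs on average about 61% of the time required by the older one.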
The amount of RAM differs between the two machines, but this does not influence computing time because
images are held in RAM without any swapping in both cases. The second-level cache was 2 MB on the Core Duo
and only 512 KB on the P4, which increases the computing power of the Core Duo. IPP is much
more efficient on 64-bit processors than on 32-bit ones, allowing real-time image processing tasks on images up
to 1600x1200 pixels.
6. CONCLUSION
This article presents Intel libraries available to programmers for building efficient image processing
applications. Several tests were performed to evaluate IPP and OpenCV efficiency and interoperability. On a
64-bit hardware architecture (with a 32-bit operating system), the maximum speedup between the P4 and the Core
Duo is about 38%, which is a good result. It means that new processors are able to work on large images (up to
1600x1200) in real time.
IPP can be used without OpenCV, but OpenCV complements IPP with a GUI for visualization and several high-level
computer vision techniques not available in IPP. For every computation made by OpenCV, optimized IPP
functions are used if available to get the best results in terms of computing time and accuracy.
OpenCV and IPP are very powerful tools for building real-time image processing applications that take advantage
of Intel technologies: Hyper-Threading, SSE3, Core Duo, 64-bit architecture... Another advantage is that
every processor from Intel Corporation can run IPP and OpenCV, which guarantees upward compatibility.
Applications written for the Pentium 4 processor will work on next-generation Intel processors without
changing the code (although the source code must be recompiled). Real-time computing combined with real upward
compatibility is a very important advantage for industrial applications.
Intel also develops a compiler to enhance C/C++ code optimization at the software level. In a future publication,
the Intel C++ compiler will be tested to compare its performance with that of GCC-compiled code.
REFERENCES
1. M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis and Machine Vision, PWS Publishing, second
ed., 1999.
2. M. Leeser, S. Coric, E. Miller, H. Yu, and M. Trepanier, Parallel-beam backprojection: an FPGA
implementation optimized for medical imaging, J. VLSI Signal Process. Syst. 39(3), pp. 295-311, 2005.
3. B. Girau and C. Torres-Huitzil, FPGA implementation of an integrate-and-fire LEGION model for image
segmentation, in European Symposium on Artificial Neural Networks, pp. 173-178, 2006.
4. S. Qureshi, Embedded image processing on the TMS320C6000 DSP: examples in code composer studio and
MATLAB, Springer Science+Business Media, New York, 2005.
5. INTEL, Integrated Performance Primitives for Intel Architecture, INTEL Corporation, 2000-2006.
http://www.intel.com/software/products/perflib.
6. INTEL, Open Source Computer Vision Library, INTEL Corporation, 1999-2006.
http://www.sourceforge.net/projects/opencvlibrary.
7. J. Landre, Programming with Intel IPP (integrated performance primitives) and Intel OpenCV (open computer
vision) under GNU Linux, http://jlandre.ifrance.com, July 2003. Version 0.4.
8. R. Calderbank, I. Daubechies, W. Sweldens, and B.-L. Yeo, Wavelet transforms that map integers to
integers, Applied and Computational Harmonic Analysis (ACHA) 5(3), pp. 332-369, 1998.
