
Real-Time Video and Image Processing for Object Tracking using DaVinci Processor


A dissertation submitted in partial fulfillment of
the requirements for the degree of
Master of Technology
by
Badri Narayan Patro
(Roll No. 09307903)

Under the guidance of


Prof. V. Rajbabu

DEPARTMENT OF ELECTRICAL ENGINEERING


INDIAN INSTITUTE OF TECHNOLOGY BOMBAY
July 15, 2012

This work is dedicated to my family and friends.


I am thankful for their motivation and support.

Abstract
A video surveillance system is primarily designed to track key objects or people exhibiting suspicious behavior as they move from one position to another, and to record this for possible future use.
Object tracking is an essential part of surveillance systems. As part of this project, an algorithm
for object tracking in video based on image segmentation and blob detection and identification
was implemented on Texas Instruments' (TI's) TMS320DM6437 DaVinci multimedia processor. Using background subtraction, all objects present in the image can be detected irrespective
of whether they are moving or not. With the help of image segmentation, the subtracted image is filtered and freed from salt-and-pepper noise. The segmented image is then processed to detect and
identify the blobs present, which are subsequently tracked. Object tracking is carried out
by feature extraction and center-of-mass calculation in the feature space of the image segmentation
results of successive frames. Consequently, this algorithm can be applied to multiple moving
and still objects in the case of a fixed camera.
In this project we develop and demonstrate a framework for real-time implementation
of image and video processing algorithms, such as object tracking and image inversion, using the
DaVinci processor. More specifically, we track a single object and two objects present in the scene
captured by a CC camera that acts as the video input device, and the output is displayed on an LCD
display. The tracking happens in real time, consuming 30 frames per second (fps), and is robust
to background and illumination changes. The performance of single object tracking using background subtraction and blob detection was very efficient in speed and accuracy compared to
a PC (Matlab) implementation of a similar algorithm. Execution times for different blocks of
single object tracking were estimated using the profiler, and the accuracy of the detection was verified using the debugger provided by TI Code Composer Studio (CCS). We demonstrate that the
TMS320DM6437 processor provides at least a ten-times speed-up and is able to track a moving
object in real time.


Contents
Abstract
List of Figures
List of Tables
List of Abbreviations

1  Introduction

2  DaVinci Digital Media Processor
   2.1  DaVinci Processor and Family
        2.1.1  DaVinci Family
        2.1.2  DaVinci vs. OMAP
   2.2  Introduction to TMS320DM6437
        2.2.1  Main Components of TMS320DM6437 DVDP
        2.2.2  DM6437
        2.2.3  CPU Core: TMS320C64x+ DSP
        2.2.4  Ethernet and PCI Interface
        2.2.5  External Memory: On-Board Memory
        2.2.6  Code Composer Studio
   2.3  DaVinci Software Architecture
        2.3.1  Basic working functionality of the DaVinci processor
        2.3.2  Signal Processing Layer (SPL)
        2.3.3  Input Output Layer (IOL)
        2.3.4  Application Layer (APL)
   2.4  VPSS and EPSI APIs
        2.4.1  VPSS
        2.4.2  EPSI APIs
   2.5  APIs and Codec Engine (CE)
        2.5.1  xDM and xDAIS
        2.5.2  VISA
        2.5.3  Codec Engine (CE)
   2.6  Video Processing Tokens
        2.6.1  Video standard
        2.6.2  Video timing
        2.6.3  Video Resolution
        2.6.4  Video Sampling Format
        2.6.5  Video IO Interface
        2.6.6  Summary

3  Target Object Tracking
   3.1  Introduction
        3.1.1  Conventional approaches for target tracking
   3.2  Image preprocessing
   3.3  Background subtraction
   3.4  Image segmentation
   3.5  Blob detection and identification
        3.5.1  Basic steps of blob detection
        3.5.2  Blob detection and identification method
   3.6  Feature extraction for blobs
   3.7  Tracking: Centroid calculation

4  Video and Image Processing Algorithms on TMS320DM6437
   4.1  Implementation of single target tracking on TMS320DM6437
        4.1.1  Debugging and profiling results
   4.2  Implementation of multiple object tracking on DM6437
        4.2.1  Debugging and profiling results
   4.3  Implementation of object tracking algorithm in Matlab

5  Summary

6  Appendix
   6.1  Appendix A: Real-Time Video Processing using the Matlab Simulink interface with CCS 3.3 on the DM6437 DVDP
        6.1.1  Introduction
        6.1.2  Hardware Requirements
        6.1.3  Software Requirements
        6.1.4  Configuration Parameters for C6000 Hardware
   6.2  Appendix B: Edge Detection using the Video and Image Library
   6.3  Appendix C: Video Processing Tokens
        6.3.1  Video standard (NTSC and PAL)
        6.3.2  Video timing (interlaced vs. progressive)
        6.3.3  Video resolution (HD, ED, SD)
        6.3.4  Video file format (YUV420, YCbCr)
        6.3.5  Video IO interface (composite, component, S-Video)
   6.4  Appendix D: YUV to RGB Conversion
        6.4.1  YUV format
        6.4.2  8-bit YUV formats for video
        6.4.3  Color space conversion
   6.5  Appendix E: Single object tracking on DM6437 code

List of Figures
1.1   Object tracking for visual surveillance system
2.1   TMS320DM6437 hardware block diagram
2.2   TMS320DM6437 hardware component diagram
2.3   DaVinci SW architecture
2.4   Video Processing Subsystem
2.5   APIs of xDM
2.6   VISA work flow
2.7   CE interface
2.8   CE algorithm
3.1   Steps in an object tracking system
3.2   Image segmentation
3.3   Feature extraction
4.1   Single object tracking
4.2   EVM board setup
4.3   EVM board setup
4.4   Result of single object tracking algorithm
4.5   Result of single object tracking algorithm
4.6   Multiple object tracking
4.7   Result of multi-object tracking algorithm
4.8   Result of multi-object tracking algorithm
4.9   Debugging results of three-target tracking
4.10  Different steps for object tracking using segmentation and pattern matching
4.11  Flow chart of object tracking based on segmentation and pattern matching
4.12  Feature extraction
4.13  Result of object tracking algorithm
4.14  Results of tracking algorithm
6.1   DM6437 board
6.2   Open Simulink library browser
6.3   Video capture
6.4   Add video display
6.5   Video capture configuration
6.6   Video display configuration
6.7   Video preview
6.8   Video and image toolbox
6.9   Target selection
6.10  Video complement
6.11  Video Sobel edge detection
6.12  Simulink configuration 1
6.13  Simulink configuration 2
6.14  Simulink configuration 3
6.15  Video and image library
6.16  4:4:4, 4:2:2 and 4:2:0 YCbCr color sampling formats, respectively
6.17  YUY2 memory layout
6.18  UYVY memory layout
6.19  RGB to UYVY conversion
6.20  YUV sampling
6.21  Picture aspect ratio
6.22  Pixel aspect ratio

List of Tables
4.1  Single object tracking profiler data
4.2  Multiple object tracking profiler data
6.1  RGB and YCbCr values for various colors using BT.601

List of Abbreviations
APL     Application Layer
DVDP    Digital Video Development Platform
DVSDK   Digital Video Software Development Kit
EPSI    Embedded Peripheral Software Interface
EVM     Evaluation Module
GPP     General Purpose Processor
HD      High Definition
IOL     Input Output Layer
NTSC    National Television System Committee
PAL     Phase Alternating Line
SD      Standard Definition
SPL     Signal Processing Layer
VISA    Video Image Speech Audio
VPBE    Video Processing Back End
VPFE    Video Processing Front End
VPSS    Video Processing Sub System
xDAIS   eXpressDSP Algorithm Interface Standard
xDM     eXpressDSP Digital Media

Chapter 1
Introduction
Surveillance systems are used for monitoring, screening and tracking of activities in public
places such as banks, in order to ensure security. Various aspects such as screening of objects and
people, biometric identification, video surveillance and maintaining a database of potential
threats are used for monitoring the activity. Moving object tracking in video has attracted
a great deal of interest in computer vision [1]. For object recognition, navigation systems and
surveillance systems, object tracking is a first step. Object tracking methods [13,16] may
broadly be categorized as segmentation-based, template-based, probabilistic
and pixel-wise methods. In segmentation-based tracking, or blob detection, the basic idea is to
detect points and/or regions in the image that are either brighter or darker than their surroundings. These methods are easy to implement and fast to compute, but may lack accuracy for some
applications [21]. Template-based methods match the appearance directly from frame
to frame; they offer a great deal of accuracy but are computationally expensive [22].
Probabilistic methods use an intelligent search strategy for tracking the target object, while pixel-based
methods use similarity-matching techniques to track the target object.
Most tracking algorithms are based on evaluating the difference between the current image
and a previous image or a background image [2]. However, algorithms based on image differences
have problems in the following cases: (1) still objects are included in the tracking task;
(2) multiple moving objects are present in the same frame; (3) the camera is moving;
(4) occlusion of objects occurs. These problems can be addressed by an object tracking algorithm
[1] based on image segmentation and pattern matching. Here we use a novel image
segmentation algorithm in order to extract all objects in the input image.

In all these applications fixed cameras are used with respect to a static background (e.g.
a stationary surveillance camera), and the common approach of background subtraction is used to
obtain an initial estimate of moving objects. First, background modeling is performed to yield a reference model. This reference model is used in background subtraction, in which each video
frame is compared against the reference model to determine possible variations. Pixel-level variations between the current video frame and the reference frame signify the
existence of moving objects. These variations, which represent the foreground pixels, are
further processed for object localization and tracking. Ideally, background subtraction should
detect real moving objects with high accuracy, limiting false negatives (objects not detected) as
much as possible. At the same time, it should extract as many pixels of the moving objects as
possible, while avoiding shadows, static objects and noise.
Image segmentation is the process of identifying the components of an image. Segmentation
involves operations [3] such as thresholding, boundary detection and region growing. Thresholding is the process of reducing the number of grey levels in the image; many algorithms exist for thresholding [19,20]. Boundary detection finds edges in the image, and any differential operator can
be used for boundary detection [1,2].
Blobs are binary objects, i.e. connected sets of pixels that are in the same state. Blobs can be differentiated
on the basis of color, shape, area, perimeter, etc. An image with various shapes and colors has
to undergo various processes before actual blob detection. Blob detection is a cornerstone of
object detection and recognition. A critical aspect of blob detection [7] is the precision of the detected pixels at the blob's border. Usually a blob shows a descending intensity gradient at
the border. In threshold-based detection this causes false positives and leads
to imprecise results if the image material contains too much noise. The number of identified
pixels belonging to the blob determines the estimate of the blob's characteristics. A possible solution is
parallel processing with detection procedures at different image resolutions [11], but this
requires multiple copies of the image data transformed into different resolutions.
In addition, the results for the multiple copies have to be merged into one overall result. Other
approaches use fixed parameters for the detection of blobs, for example reducing the
application environment to a fixed background [18] in order to apply foreground-background segmentation. These methods are vulnerable to changing parameters, such as illumination conditions or
changing perspectives.
A precise computation of the blob's center point depends strongly on the precision of
the blob detection. Especially for blobs that are not perfectly circular or square, the
detection and computation methods need to be exact. Precision is an important factor because
even a small number of false-positive pixels shifts the computed center point. This
causes a large error in the computation of the position and orientation of the light-emitting
device.
For feature extraction and tracking, a common method to compute the center point
of a blob is the bounding box, which refers to the minimum and maximum positions in the XY-coordinate system. The inner-circle method creates a circle of maximum size that fits into
the blob area without intersecting the area around it. Neither method solves the precision problem
for blobs that are not perfectly circular or square. A very common
method is the center of mass, which relates the number of pixels to the coordinates of the
pixels [10]. It computes the center of the blob based on the number of pixels and weights each pixel
by a related value, for example its brightness.
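As an illustration of the center-of-mass method described above, the following minimal C sketch (not taken from the thesis code) computes an intensity-weighted centroid of a blob from a binary mask and a grayscale frame; the function name and the row-major array layout are assumptions made for this example.

/* Intensity-weighted center of mass of one blob.
 * mask   : 1 where the pixel belongs to the blob, 0 elsewhere
 * gray   : 8-bit luminance values used as weights
 * width, height : frame dimensions (row-major layout assumed)
 * cx, cy : outputs, centroid coordinates in pixels
 * Returns 0 on success, -1 if the blob is empty.                    */
static int blob_center_of_mass(const unsigned char *mask,
                               const unsigned char *gray,
                               int width, int height,
                               float *cx, float *cy)
{
    long   sum_w = 0;          /* sum of weights                 */
    double sum_x = 0.0;        /* weighted sum of x coordinates  */
    double sum_y = 0.0;        /* weighted sum of y coordinates  */
    int x, y;

    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            if (mask[y * width + x]) {
                int w = gray[y * width + x];   /* brightness as weight */
                sum_w += w;
                sum_x += (double)w * x;
                sum_y += (double)w * y;
            }
        }
    }
    if (sum_w == 0)
        return -1;             /* no blob pixels found */
    *cx = (float)(sum_x / sum_w);
    *cy = (float)(sum_y / sum_w);
    return 0;
}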
It is possible to increase precision by using a higher resolution, but this is the point
where GPP architectures reach their performance barrier, which is a major problem in computer
vision. With more data to analyze, the maximum frame rate goes down and the system can no
longer achieve real-time processing speed.

Figure 1.1: Object tracking for visual surveillance system

Video surveillance systems require high-resolution video, large bandwidth and high
computational speed at low cost and power. DaVinci devices are suitable for surveillance applications as they provide ASIC-like low cost and power for the complex processing together with high-performance programmable DSPs. They also provide function accelerators in the video
processing subsystem (VPSS) for common video processing tasks such as encoding, decoding
and display. DaVinci devices can be used across all product lines, i.e. digital video recorders
(DVR), digital video servers (DVS) and surveillance IP modules. Surveillance applications
are constantly evolving, adding new features such as analytics, image stabilization, image recognition, motion estimation and target tracking. The most remarkable improvements of the TMS320DM6437 over the TMS320DM642 [59] are a new C64x+ core, more level-1 (L1) memory and a new DMA architecture. Two DMA controllers are available within the DM6437 DSP: the
IDMA can perform transfers from/to internal memory only, while the EDMA3 can perform
transfers from/to all kinds of memory. The main disadvantage of the DM6437 is a smaller level-2 (L2) memory.
The TMS320DM6437, based on a TI multimedia Digital Signal Processor (DSP), can be used for
real-time object tracking in video [17][19], as it provides the highly efficient H.264 video coding standard
for compression and supports wireless intelligent video monitoring systems (WiFi and
WiMAX) [21]. The system can perform real-time processing of video data acquired from a CCD camera by using the Codec Engine framework to call the video processing algorithm library, which
implements image encoding and video object tracking according to the user algorithm
and provides output on a display device. In this project, a video object tracking approach
based on image segmentation [1], background subtraction, and blob detection and identification
was implemented on the TMS320DM6437. A mixture of thresholding-based segmentation, centroid-based tracking and a novel blob detection and identification algorithm is introduced
to improve video object tracking, providing fast and accurate tracking.

Chapter 2
DaVinci Digital Media Processor
2.1  DaVinci Processor and Family

The DaVinci technology is a family of processors integrated with a software and hardware
tools package, providing a flexible solution for a host of applications from cameras to phones to hand-held devices to automotive gadgets. DaVinci technology is the combination of raw processing
power and the software needed to simplify and speed up the production of digital multimedia and
video equipment.
DaVinci technology consists of:
DaVinci Processors: Scalable, programmable DSPs and DSP-based SoCs (systems on
chip) built from DSP cores, accelerators, peripherals and ARM processors, optimized
to match the performance, price and feature requirements of a spectrum of digital-video end equipment, e.g. TMS320DM6437 and TMS320DM6467.
DaVinci Software: Interoperable, optimized video and audio codecs leveraging the DSP and integrated accelerators, and APIs within operating systems (Linux)
for rapid software implementation, e.g. Codec Engine, DSP/BIOS, NDK, and audio and video
codecs.
DaVinci Development Tools/Kits: Complete development kits along with reference
designs, e.g. the DM6437 DVSDK, Code Composer Studio, Green Hills tools and virtual Linux.
DaVinci video processor solutions are tailored for digital audio, video, image and vision applications. The DaVinci platform includes a general purpose processor (GPP), video accelerators, an optional DSP, and related peripherals.

2.1.1  DaVinci Family

Dual-core (ARM and DSP) models: The DaVinci processors that have a dual-core (ARM + DSP) model are the DM6443, DM6446 and DM6467.
- DM6443: contains an ARM9 + TI C64x+ DSP + DaVinci video (decode); used for video acceleration and networking for display.
- DM6446: contains an ARM9 + TI C64x+ DSP + DaVinci video (encode and decode); used for video acceleration and networking for capture and display.
- DM6467: contains an ARM9 + TI C64x+ DSP + DaVinci video (encode and decode); used for video acceleration and networking for high-definition capture and display.
DSP-only models: The DaVinci chips that contain only a DSP are the DM643x and DM64x; both have the TI C64x+ as the DSP core.
ARM-only models: The processors that contain only an ARM architecture are the DM335, DM355, DM357 and DM365.
- DM335 is a pin-compatible variant (of the DM355) without the MJCP.
- DM355 contains an ARM9 + DaVinci video (encode and decode) with an MPEG4/JPEG co-processor (MJCP).
- DM357 is a pin-compatible DM6446 variant with the DSP replaced by a dedicated video co-processor (HMJCP).
- DM365 is an enhanced DM355 that adds a second, high-definition video co-processor (HDVICP).
The DaVinci family of processors now scales from multiple-core devices (e.g. DM644x)
to single-core DSP devices (e.g. DM643x) to single-core ARM devices (e.g. DM355).
These processors are available today (TMS320DM647, TMS320DM648, TMS320DM643x,
TMS320DM6446, TMS320DM355, TMS320DM6467).

2.1.2  DaVinci vs. OMAP

A DaVinci processor gives better DSP-core performance, whereas an OMAP processor gives better ARM-core performance. The DaVinci processor is more suitable
for DSP-centric designs; on the other hand, OMAP has a much more powerful ARM core, so
for general purpose processing (i.e. GPP/ARM) an OMAP design is much more suitable. For
OMAP3 in particular the DSP will always be slower than the ARM, whereas for the DM6467 the
ARM will always be slower than the DSP.

2.2  Introduction to TMS320DM6437

The TMS320DM6437 Digital Video Development Platform (DVDP) includes a high-performance,
software-programmable Digital Media Processor (DMP), which reduces the time-to-market for
development of new multimedia applications. The DMP, being programmable, provides the important feature of flexibility, which helps in easy development and debugging of multimedia applications. The kit is designed to support various video-over-IP applications such as set-top boxes,
surveillance cameras and digital video recorders, and offers encoding and decoding flexibility for various
industry standards available in the market.
There are many advantages of using a DMP over an FPGA or ASIC. An ASIC has a few disadvantages, amongst which its higher cost and longer time-to-market are crucial for mass production
and marketing. FPGAs, too, are considerably harder to program for certain applications. A DMP
provides a good balance in terms of cost, flexibility, ease of programming and time-to-market compared to the other two. The new DaVinci-series processor included in this DVDP supports
a new set of extended instructions that helps to increase its overall performance.

2.2.1  Main Components of TMS320DM6437 DVDP

This platform supports both stand-alone and PCI-based evaluation and development of any application that uses TI's DaVinci processors. The TMS320DM6437 (DM6437) contains a single
C64x+ core along with a video input macro (VPFE) and a video output macro (VPBE). Key features of
this platform are:
1. TI's DM6437 processor with an operating frequency of 600 MHz

2. 10/100 Mbps Ethernet / PCI bus interface


3. 128 Mbytes of DDR2 SDRAM
4. 16 Mbyte non-volatile Flash memory + 64 Mbyte NAND Flash + 2 Mbyte SRAM
5. One video decoder (TVP5146M2) that supports composite or S-video
6. Configurable BIOS load option
7. Embedded JTAG Emulation Interface
8. Four video DAC outputs that support component, composite and RGB
9. AIC33 stereo codec
10. Four LEDs and a four-position DIP switch for user input/output testing.

Figure 2.1: TMS320DM6437 hardware block diagram

2.2.2  DM6437

The DM6437 is a high-performance processor from TI's DaVinci family that supports clock rates
of 400, 500, 600, 660 and 700 MHz. The DSP core contains eight functional units, two general-purpose register files and two data paths. The eight functional units can execute eight instructions
simultaneously. Each functional unit is assigned a dedicated task: multiplication, arithmetic,
logical and branch operations, loading data from memory into registers, and storing data from registers
into memory. The two general-purpose register files, namely A and B, contain 32 32-bit registers each, providing a total of 64 registers. These registers support data types that include packed
8-bit data, packed 16-bit data, 32-bit data, 40-bit data and 64-bit data values. For values
exceeding 32 bits, a pair of registers is used to represent 40-bit and 64-bit data values.

2.2.3  CPU Core: TMS320C64x+ DSP

The TMS320C64x+ DSPs are the highest-performance fixed-point DSP generation in the TMS320C6000 DSP platform. The CPU core consists of a Level 1 Program (L1P) cache, a Level 1 Data
(L1D) cache and a Level 2 (L2) unified cache. The L1P cache has a size of 32 Kbytes and can be
configured either as memory-mapped RAM or as a direct-mapped cache. The L1D cache has a size of 80
Kbytes, out of which 48 Kbytes are configured as memory-mapped RAM and 32 Kbytes can be configured either as memory-mapped RAM or as a 2-way set-associative cache. The L2 cache can be up to 256
Kbytes in size and is shared between program and data. L2 memory can be configured as stand-alone SRAM or as a combination of cache and SRAM. The size of the L2 cache can be varied by
changing the system configuration. These changes can be performed in the GEL file
(evmdm6437.gel) by changing the parameter CACHE_L2CFG. If the value of this parameter
is set to 0, no L2 cache is configured and the whole memory is used as SRAM.
By changing the value of CACHE_L2CFG to 1, 2, 3 or 7, one gets an L2 cache size of 32KB,
64KB, 128KB or 256KB respectively [5].
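The same L2 mode can also be selected from application code by writing the mode value into the C64x+ L2CFG register instead of editing the GEL file. The sketch below is illustrative only: the register address (0x01840000 on the C64x+ megamodule) and the bit-field position are assumptions that should be checked against the DM6437 data sheet; the mode values mirror the CACHE_L2CFG settings quoted above.

/* Illustrative only: select the L2 cache mode by writing the C64x+
 * L2CFG register directly (address assumed to be 0x01840000; verify
 * against the device documentation).  Mode values follow the text:
 * 0 = all SRAM, 1 = 32KB, 2 = 64KB, 3 = 128KB, 7 = 256KB cache.     */
#define L2CFG (*(volatile unsigned int *)0x01840000)

static void set_l2_cache_mode(unsigned int mode)
{
    L2CFG = (L2CFG & ~0x7u) | (mode & 0x7u);  /* L2MODE assumed in bits 2:0 */
    (void)L2CFG;                              /* read back so the write lands */
}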
The eight functional units (.M1, .L1, .D1, .S1, .M2, .L2, .D2, and .S2) are each capable
of executing one instruction every clock cycle. The .M functional units perform all multiply
operations. The .S and .L units perform a general set of arithmetic, logical, and branch functions. The .D units primarily load data from memory to the register file and store results from
the register file into memory.

2.2.4  Ethernet and PCI Interface

The Ethernet interface on DM6437 provides an interface between the board and the external
network. This interface supports both 10 Mbps and 100 Mbps network connections. The Peripheral Component Interconnect (PCI) provides an interface to connect DM6437 with other
PCI-compliant devices. This connection provides an easy way for movement of data from one
device to another.

Figure 2.2: TMS320DM6437 hardware component diagram

2.2.5  External Memory: On-Board Memory

The TMS320DM6437 board includes 128 Mbytes of DDR2 SDRAM, which is used for
storage of program, video or data. The board also contains 16 Mbytes of NOR Flash, 2 Mbytes
of SRAM and 64 Mbytes of NAND Flash. The NAND and NOR Flash are used mainly as boot loaders,
while the SRAM is mainly used for debugging application code.

2.2.6  Code Composer Studio

Code Composer Studio (CCS) provides an Integrated Development Environment (IDE) that incorporates the software tools used to develop applications targeted at Texas Instruments Digital
Signal Processors. CCS includes tools for code generation, such as a C compiler, an assembler
and a linker. It has graphical capabilities and supports real-time debugging, providing an easy-to-use software tool to build and debug programs.

The C compiler compiles a C source program with extension .c to produce an assembly source file with extension .asm. The assembler assembles an .asm source file to produce a
machine-language object file with extension .obj. The linker combines object files and object
libraries as input to produce an executable file with extension .out. This executable file uses the common object file format (COFF), popular in Unix-based systems and adopted
by several makers of digital signal processors.
To create an application project, one can add the appropriate files to the project. Compiler/linker options can readily be specified. A number of debugging features are available,
including setting breakpoints and watching variables; viewing memory, registers, and mixed C
and assembly code; graphing results; and monitoring execution time. One can step through a
program in different ways (step into, over, or out).
Real-time analysis can be performed using real-time data exchange (RTDX). RTDX allows
data exchange between the host PC and the target DVDP, as well as analysis in real time
without stopping the target. Key statistics and performance can be monitored in real time.
Communication with on-chip emulation support occurs through the Joint Test Action Group (JTAG)
interface to control and monitor program execution. The DM6437 EVM board includes a JTAG
interface through the USB port.
CCS provides a single IDE to develop an application by offering following features:
Programming DSP using C/C++
Ready-to-use built-in functions for video and image processing
Run-time debugging on the hardware
Debugging an application using software breakpoints

2.3  DaVinci Software Architecture

The DaVinci software architecture consists of three layers: the Signal Processing Layer (SPL), the Input-Output Layer (IOL) and the Application Layer (APL). The Signal Processing Layer takes care of all the
processing functions, while all the input and output functions are grouped into the Input-Output Layer. The third layer, the Application Layer, is
the most important part when developing a new algorithm. Most of the time, however, we will develop
components for either the SPL or the IOL.

2.3.1  Basic working functionality of the DaVinci processor

Consider a video capture driver as an example: it reads data from a video port or peripheral and starts filling a memory buffer. When this input buffer is full, the IOL generates an interrupt
to the APL and a pointer to the full buffer is passed to the APL. The APL picks up
this buffer pointer and in turn generates an interrupt to the SPL, passing it the pointer. The SPL
then processes the data in this input buffer and, when complete, generates an interrupt back to
the APL and passes the pointer of the output buffer that it created. The APL passes this output
buffer pointer to the IOL, commanding it to display it or send it out on the network. Note that
only pointers are passed while the buffers remain in place, so the overhead of passing the pointers is
negligible.
All three layers, together with the different APIs, drivers and components, are shown
in Figure 2.3.
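The pointer hand-off described above can be pictured with the conceptual C sketch below. It is not DaVinci driver code: the function names, the Frame type and the buffer sizes are invented purely to illustrate that only pointers move between the IOL, APL and SPL while the buffers stay in place (image inversion stands in for the SPL algorithm).

#include <stddef.h>

typedef struct {
    unsigned char *data;   /* pixel data stays in this buffer */
    int            size;   /* number of bytes in the buffer   */
} Frame;

static unsigned char capture_mem[720 * 480 * 2];   /* buffers never move */
static unsigned char output_mem[720 * 480 * 2];
static Frame in_frame  = { capture_mem, sizeof(capture_mem) };
static Frame out_frame = { output_mem,  sizeof(output_mem)  };

/* IOL: pretend the capture driver has just filled its buffer */
static Frame *iol_capture_full_buffer(void) { return &in_frame; }

/* SPL: process the input buffer and hand back a pointer to the output */
static Frame *spl_process(Frame *in)
{
    int i;
    for (i = 0; i < in->size && i < out_frame.size; i++)
        out_frame.data[i] = (unsigned char)(255 - in->data[i]); /* e.g. inversion */
    return &out_frame;
}

/* IOL: display or transmit the processed buffer */
static void iol_display(Frame *out) { (void)out; }

/* APL: only pointers are exchanged between the layers */
static void apl_main_loop(int iterations)
{
    while (iterations-- > 0) {
        Frame *in  = iol_capture_full_buffer();
        Frame *out = spl_process(in);
        iol_display(out);
    }
}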

2.3.2  Signal Processing Layer (SPL)

The SPL consists of all the signal processing functions or algorithms that run on the device. For
example, a video codec such as MPEG4-SP or H.264 runs in this layer. These algorithms
are wrapped with the eXpressDSP Digital Media (xDM) API. Between xDM and VISA (Video, Image,
Speech, Audio) sit the Codec Engine, Link and DSP/BIOS. Memory buffers, along with
their pointers, provide input and output to the xDM functions. This decouples the SPL from all
other layers. The Signal Processing Layer (SPL) presents the VISA APIs to all other layers. The
main components of the SPL are xDM, xDAIS, the VISA APIs and the Codec Engine interface.

2.3.3  Input Output Layer (IOL)

The Input Output Layer (IOL) covers all the peripheral drivers and generates buffers for input or
output data. Whenever a buffer is full or empty, an interrupt is generated to the APL. Typically,
these buffers reside in shared memory, and only pointers are passed from the IOL to the APL and
eventually to the SPL. The IOL is delivered as drivers integrated into an operating system such
as Linux or WinCE; in the case of Linux, these drivers reside in the kernel space. The IOL presents the OS-provided APIs as well as the EPSI APIs to all
other layers. The IOL contains the Video Processing Subsystem (VPSS) device driver used for video
capture and display, a USB driver to capture video to USB-based media, a UART serial port driver used for debugging through a console application, the Ethernet (EMAC) driver needed when captured video is sent
over the network, the I2C driver used internally by the VPFE driver as its communication protocol, the Multichannel Audio Serial
Port (McASP) driver used for audio processing, and the Multichannel Buffered
Serial Port (McBSP) driver used for buffering stream data.

Figure 2.3: DaVinci SW Architecture

2.3.4  Application Layer (APL)

The Application Layer interacts with the IOL and the SPL. It makes calls to the IOL for data input and
output, and to the SPL for processing. The Sample Application Thread (SAT) is a sample application
component that shows how to call the EPSI and VISA APIs and how to interface with the SPL and IOL as
built-in library functions. All other application components are left to the developer, who may
develop them or leverage the vast open-source community software. These include, but are not
limited to, graphical user interfaces (GUI), middleware and networking stacks. The master thread
is the highest-level thread, such as an audio or video thread, that handles the opening of I/O
resources (through the EPSI API), the creation of processing algorithm instances (through the VISA
API), as well as the freeing of these resources. Once the necessary resources for a given task
are acquired, the master thread specifies an input source for the data (usually a driver or file), the
processing to be performed on the input data (such as compression or decompression) and an
output sink for the processed data (usually a driver or file).
The Network Developer's Kit (NDK) provides services such as an HTTP server, DHCP client/server,
DNS server, etc. that reside in the application layer. Note that these services use the socket interface of the NDK, which resides in the I/O layer, so the NDK spans both layers.

2.4  VPSS and EPSI APIs

VPSS is a video processing driver present in the IOL. The Embedded Peripheral Software Interface (EPSI)
spans both the APL and IOL layers; the APL makes calls to the IOL for data input and output
through these drivers and APIs.

2.4.1  VPSS

The VPSS provides an input interface, the Video Processing Front End (VPFE), on the DM6437 for
external imaging peripherals such as image sensors, video decoders and digital cameras in order
to capture the image, and an output interface, the Video Processing Back End (VPBE), for display
devices such as analog SDTV displays, digital LCD panels and HDTV video encoders, in order to
display the processed image. A block-level diagram of the VPSS (Video Processing Sub System), which
mainly consists of the VPFE and VPBE subsystems, is shown in Figure 2.4.

Figure 2.4: Video Processing Subsystem

The common buffer memory and direct memory access (DMA) controls ensure efficient use of the DDR2 memory controller burst bandwidth and of the other peripherals. The shared
buffer memory logic provides the primary source/sink to all of the VPFE and VPBE modules. The VPSS
uses DDR2 bandwidth efficiently because of both its large bandwidth requirements and the real-time requirements of the VPSS modules.
1. Video Processing Front End (VPFE)
The VPFE block is comprised of a charge-coupled device (CCD) controller (CCDC),
preview engine image pipe (IPIPE), hardware 3A statistic generator (H3A), resizer and
histogram.
The CCD controller is responsible for accepting raw, unprocessed image/video data from a
sensor (CMOS or CCD). The preview engine image pipe (IPIPE) is responsible for transforming raw image/video data from the sensor into YCbCr
4:2:2 data, which can easily be handled for compression or display. Typically, the output of
the preview engine is used both for video compression and for display on an external
display device, such as an NTSC/PAL analog encoder or a digital LCD. The resizer takes its input from the preview engine or from DDR2 and can resize it to 720x480
pixels per frame. The output of the resizer module is sent to the SDRAM/DDRAM, after which the resizer is again free for further processing from the preview engine pipe.
The H3A module is designed to support the control loops for auto focus (AF), auto white
balance (AWB) and auto exposure (AE) by collecting metrics about the imaging/video
data. The AF engine extracts and filters RGB data from the input image/video data and
provides either the accumulation or the peaks of the data in a specified region, while the AE/AWB
engine accumulates the values and checks for saturated values in a sub-sampling of the
video data. The histogram module represents the luminance intensity distribution of the pixels of the image/frame.
2. Video Processing Back End (VPBE)
The VPBE is responsible for displaying the processed image on different display devices such as a TV, LCD or HDTV. The VPBE block comprises the on-screen display (OSD) and the video encoder (VENC) modules.
The OSD is a graphics accelerator that is responsible for resizing images to either NTSC or PAL format (640x480 to 720x576) for the output devices, and it combines display windows into a single display frame, which helps the VENC
module to output the video data. The primary function of the OSD module is to gather and combine video data
and display/bitmap data and then pass it to the video encoder (VENC) in YCbCr format.
The VENC takes the display frame from the OSD and formats it into the desired output format and output signals (including data, clocks, sync, etc.) required to interface to display devices. The VENC consists of three primary sub-blocks:
the analog video encoder, which generates the required signals to interface to NTSC/PAL systems and also includes the video A/D converter; the timing generator, responsible for generating the
specific timing required for analog video output; and the digital LCD controller, which
supports various LCD display formats and YUV outputs for interfacing to high-definition video
encoders and/or DVI/HDMI interface devices.

2.4.2  EPSI APIs

Device driver APIs vary from OS to OS (Linux, DSP/BIOS, WinCE, etc.); for example, the
device driver APIs for Linux are different from the device driver APIs for DSP/BIOS. Using
device driver APIs directly creates portability issues when an application is migrated from one OS
to another, for example from the DM6446 with Linux to the DM6437 with DSP/BIOS. EPSI is therefore a common
interface across all OSes, with a separate glue layer that maps the EPSI APIs to device-driver-specific APIs. Each OS has a separate glue layer, called EPSI to Driver Mapping (EDM), for
each device. The DSP/BIOS EDM glue layer maps the EPSI APIs to the DSP/BIOS device driver
APIs as shown in Figure 2.3. The definition of the EPSI APIs does not mask or prevent the direct usage of
device driver APIs.
The different EPSI APIs are DEV_open(), DEV_read(), DEV_write(), DEV_close(),
DEV_control(), DEV_getBuffer() and DEV_returnBuffer().
1. VPFE_OPEN: This function initializes the device and returns a handle. The FVID handle
is then used to configure the video input (composite, component, s-video), video standard
(NTSC, PAL, SECAM, AUTO), video file format (UYVY, YUYV, YUV420, etc.). These
configurations represent the actual physical connection and the file format supported by
the driver.
2. VPFE_GETBUFFER: The VPFE driver has a queue of buffers for capturing video frames.
Whenever a buffer is filled up, it is moved to the back of the queue. Instead of the FVID_dequeue()
and FVID_queue() pair of API calls for checking whether a buffer is full or not, the FVID
APIs provide another API, FVID_exchange(). This API removes a buffer from the VPFE
buffer queue, takes a buffer from the application and adds that buffer to the VPFE buffer queue.
The buffer dequeued from the VPFE buffer queue is returned to the application. In other words, this
function exchanges a processed buffer for a buffer that is ready for processing. Once the
buffer is exchanged, it is available to the application for further processing; the buffer to be
processed is accessed via the vpfeHdl->buf structure. (A sketch of the resulting capture loop is given after this list.)
FVID_queue() returns an empty buffer to the driver to be filled (input driver) or passes
a full buffer to the driver to be displayed (output driver). FVID_dequeue() acquires a full
buffer from the driver (input driver) or acquires an empty buffer from the driver for the application to
fill (output driver). If no buffers are available in the stream, this call can block until
a buffer is made available by the driver. pBuf is passed by reference so that FVID_dequeue()
can modify it with the address of the returned buffer.
3. VPFE_RETURNBUFFER: In Linux, VPFE_getBuffer() dequeues a buffer from the
VPFE queue, and it needs to be returned to the VPFE queue using the VPFE_returnBuffer()
function. In DSP/BIOS, however, the buffer is dequeued and queued at once using the
FVID_exchange() API inside the VPFE_getBuffer() IOL API, so there is no need to
return the buffer to the VPFE queue.
4. VPFE_CLOSE: This function is used to uninitialize the VPFE device. The buffers allocated by the VPFE driver are freed using the FVID_free() API, and the device is then uninitialized using the FVID_delete() API.
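Putting the EPSI calls above together, a capture-process-display loop on the DM6437 might look like the following sketch. The VPFE_* names follow the EPSI functions described in this section, but the handle typedefs, the VPBE_* display counterparts, the VideoFrame structure and process_frame() are placeholders introduced for illustration; they are not the exact DVSDK signatures.

typedef struct { unsigned char *buf; int width; int height; } VideoFrame; /* placeholder */
typedef void *VPFE_Handle;   /* placeholder handle types */
typedef void *VPBE_Handle;

extern VPFE_Handle VPFE_open(void);                                  /* EPSI (item 1) */
extern VideoFrame *VPFE_getBuffer(VPFE_Handle h);                    /* EPSI (item 2) */
extern void        VPFE_returnBuffer(VPFE_Handle h, VideoFrame *f);  /* EPSI (item 3) */
extern void        VPFE_close(VPFE_Handle h);                        /* EPSI (item 4) */
extern VPBE_Handle VPBE_open(void);                                  /* assumed display side */
extern void        VPBE_putBuffer(VPBE_Handle h, VideoFrame *f);
extern void        VPBE_close(VPBE_Handle h);
extern void        process_frame(VideoFrame *f);                     /* user algorithm */

void capture_process_display(int nframes)
{
    VPFE_Handle fe = VPFE_open();        /* configure input, video standard, format */
    VPBE_Handle be = VPBE_open();

    while (nframes-- > 0) {
        VideoFrame *f = VPFE_getBuffer(fe);  /* FVID_exchange happens inside        */
        process_frame(f);                    /* e.g. background subtraction, blobs  */
        VPBE_putBuffer(be, f);               /* hand the processed buffer to VPBE   */
        VPFE_returnBuffer(fe, f);            /* no-op under DSP/BIOS (see item 3)   */
    }
    VPFE_close(fe);
    VPBE_close(be);
}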

2.5  APIs and Codec Engine (CE)

The DaVinci processor has three sets of APIs: eXpressDSP Digital Media (xDM), Video Image
Speech Audio (VISA) and the EPSI APIs, together with a transparent interface, the Codec Engine (CE).
The Codec Engine is a piece of software, developed by Texas Instruments, that manages the system
resources and translates VISA calls into xDM calls. EPSI was discussed in the previous section.

2.5.1  xDM and xDAIS

When we want to create a basic eXpressDSP-compliant algorithm, it requires a standard interface. This standard interface is provided by xDAIS and xDM.
xDAIS
An eXpressDSP Algorithm Interface Standard (xDAIS) [33] algorithm is a module that
implements the abstract interface IALG (internal algorithm). The IALG API allows
the user application to allocate memory for the algorithm and to share memory between
algorithms. To satisfy the xDAIS standard, the algorithm has to provide the basic resource allocation/deallocation, initialization and start/stop APIs (a lifecycle sketch is given at the end of this section). These
APIs are:
- algAlloc()
- algInit()
- algActivate()
- algDeactivate()
- algFree()
xDM
The xDM standard defines a uniform set of APIs for multimedia compression algorithms
(codecs), with the main intent of providing ease of replaceability and insulating the application from component-level changes. xDM components may run on either the DSP or the
ARM processor.
xDAIS is the base class, which has various API functions, i.e. algAlloc(), algInit(),
etc., while xDM is the child class inherited from this base class (xDAIS); it contains
all the functions of the base class plus its own run-time process and control APIs,
algProcess() and algControl(), as shown in Figure 2.5.
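To make the xDAIS lifecycle above concrete, the following self-contained C mock walks through the call order an application (or the Codec Engine) follows. The types and function bodies are invented stand-ins, not TI's ialg.h definitions; only the call sequence mirrors the APIs listed above.

#include <stdio.h>
#include <stdlib.h>

/* Mock of the xDAIS algorithm lifecycle.  Only the call order
 * (alloc -> init -> activate -> ... -> deactivate -> free) mirrors
 * the IALG-style APIs listed above; everything else is illustrative. */
typedef struct { void *scratch; int scratch_size; } MyAlg;

static MyAlg *algAlloc(int scratch_size)            /* resource allocation */
{
    MyAlg *alg = (MyAlg *)malloc(sizeof(MyAlg));
    alg->scratch = malloc(scratch_size);
    alg->scratch_size = scratch_size;
    return alg;
}
static void algInit(MyAlg *alg)       { (void)alg; } /* set default parameters */
static void algActivate(MyAlg *alg)   { (void)alg; } /* restore scratch state  */
static void algDeactivate(MyAlg *alg) { (void)alg; } /* save scratch state     */
static void algFree(MyAlg *alg)       { free(alg->scratch); free(alg); }

int main(void)
{
    MyAlg *alg = algAlloc(1024);   /* Codec Engine: query and grant resources */
    algInit(alg);                  /* one-time initialization                 */
    algActivate(alg);              /* before each run of the algorithm        */
    /* ... algProcess()/algControl() would be called here (xDM layer) ...     */
    algDeactivate(alg);            /* after processing                        */
    algFree(alg);                  /* reclaim resources                       */
    printf("xDAIS-style lifecycle completed\n");
    return 0;
}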

2.5.2  VISA

The CE presents a simple and consistent set of interfaces to the application developer called VISA,
which stands for Video, Imaging, Speech and Audio.

Figure 2.5: APIs of xDM

Figure 2.6: VISA work flow

Let us take the example of a video codec application: the Codec Engine software translates the create, control, process and delete VISA APIs into their respective xDM
APIs, while managing the system resources and inter-processor communication. The create() API
creates an instance of an xDM algorithm and allocates the resources required for the algorithm to
run. The create() API queries the xDM algorithm for the resources that it needs and, based on the
algorithm's requirements, allocates them with the help of the Codec Engine. The CE checks the resource pool and responds to the queries sent by the create() API. Note that xDM-compliant functions
cannot allocate resources directly; they can only request resources. The Codec Engine is
always in control of the resources and manages them across multiple functions running in the SPL,
as shown in Figure 2.5.
The control() API allows the APL to modify parameters that control the algorithm; for example, in an MPEG4 video codec we may want to change the bit-rate or resolution. The process() API
transforms the input buffer into the output buffer, e.g. the encode or decode function. For example,
an MPEG4 algorithm would use the input buffer, encode it and create an encoded frame in an
output buffer. The delete() API deletes the algorithm instance and reclaims the resources.
The process and control APIs of VISA are a direct reflection of the low-level process and
control functions of the xDM algorithm. As a result, low-level control of
codecs is provided along with a high-level abstraction of the details. Figure 2.5 shows the specific
VISA and xDM APIs. The APL has to understand only these four APIs, and the signature of
these APIs is held constant, which makes it easy to replace one codec with another.
VISA - Four SPL Functions: the complexities of the Signal Processing Layer (SPL) are abstracted
into four functions: _create, _delete, _process and _control.
Create: creates an instance of an algorithm; that is, it allocates the required memory and
initializes the algorithm.
/* allocate and initialize video decoder on the engine */
dec = VIDDEC_create(ce, decoderName, &vdecParams);
Process: invokes the algorithm by calling the algorithm's processing function, passing descriptors for the input and output buffers.
/* decode the frame */
status = VIDDEC_process(dec, &encodedBufDesc, &outBufDesc,
&decInArgs, &decOutArgs);
Control: used to change algorithm settings; algorithm developers can provide user-controllable parameters.
/* Set Dynamic Params for Decoder */
status = VIDDEC_control(dec, XDM_SETPARAMS, &decDynParams,
&decStatus);
Delete: deletes an instance of an algorithm. The opposite of "create", this frees the memory
set aside for a specific instance of an algorithm.
/* teardown the codecs */
VIDDEC_delete(dec);
The codec engine software basically translates these create, control, process and delete APIs to
their respective xDM APIs, while managing the system resources and inter-processor communication.
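The four VISA calls quoted above can be strung together as follows. This is a hedged sketch rather than the thesis application: the engine name "decodeEngine", the codec name "viddec_copy" and the exact header paths depend on the DVSDK configuration, and error handling is omitted; only the Engine_* and VIDDEC_* call pattern follows the Codec Engine/VISA usage described in this chapter.

#include <ti/sdo/ce/Engine.h>         /* Codec Engine (path per DVSDK release) */
#include <ti/sdo/ce/video/viddec.h>   /* VISA video decoder (per DVSDK release) */

void decode_sketch(void)
{
    Engine_Handle        ce;
    VIDDEC_Handle        dec;
    VIDDEC_Params        vdecParams;
    VIDDEC_DynamicParams decDynParams;
    VIDDEC_Status        decStatus;
    VIDDEC_InArgs        decInArgs;
    VIDDEC_OutArgs       decOutArgs;
    XDM_BufDesc          encodedBufDesc, outBufDesc;  /* filled by the application */

    /* xDM structures carry their own size field */
    vdecParams.size   = sizeof(vdecParams);
    decDynParams.size = sizeof(decDynParams);
    decStatus.size    = sizeof(decStatus);
    decInArgs.size    = sizeof(decInArgs);
    decOutArgs.size   = sizeof(decOutArgs);

    /* open the engine and create a decoder instance (VISA create) */
    ce  = Engine_open("decodeEngine", NULL, NULL);        /* engine name assumed */
    dec = VIDDEC_create(ce, "viddec_copy", &vdecParams);  /* codec name assumed  */

    /* optionally change run-time settings (VISA control) */
    VIDDEC_control(dec, XDM_SETPARAMS, &decDynParams, &decStatus);

    /* per-frame work would loop here (VISA process) */
    VIDDEC_process(dec, &encodedBufDesc, &outBufDesc, &decInArgs, &decOutArgs);

    /* tear down (VISA delete) and close the engine */
    VIDDEC_delete(dec);
    Engine_close(ce);
}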


2.5.3  Codec Engine (CE)

The Codec Engine is a set of APIs used to instantiate and run xDAIS algorithms. A VISA
interface is provided as well for interacting with xDM-compliant xDAIS algorithms. The Codec
Engine (CE) manages the system resources and functions: all the lower-level management and control functions are handled by the Codec Engine, which manages
the xDM component and abstracts the application developer from the signal processing layer.
The CE interface to an algorithm is the same regardless of whether the operating system
is Linux, VxWorks, DSP/BIOS or WinCE, and regardless of the processor (ARM or DSP) hardware. The core Engine
APIs (CE APIs) are:
Engine_open: open a Codec Engine;
Engine_close: close a Codec Engine;
Engine_getCpuLoad: obtain the percentage of CPU usage;
Engine_getLastError: obtain the last error code caused by an operation;
Engine_getUsedMem: obtain the memory usage.
Codec Creation (Instantiation)

Figure 2.7: CE interface

As shown in Figure 2.7, when the application calls VIDENC_create() (a VISA interface), it sends a request to create an instance via the Codec Engine. The Codec Engine gathers the algorithm's resource requirements via the IALG (algorithm) and IDMA (DMA) interfaces, secures resources from the resource pool, and grants them to the algorithm through the same interfaces. The xDAIS APIs (algNumAlloc, algAlloc, algInit, algFree,
algActivate, algDeactivate, algMoved) are then used to create an instance of the codec.

Figure 2.8: CE algorithm

The Codec Engine provides a robust, consistent interface for dynamically creating/deleting
algorithms and accessing/controlling algorithm instances. This allows algorithms of the same
class to be exchanged easily without any modification to the application code, and also allows the
same application code to be used across a variety of platforms without modification, as shown
in Figure 2.8.

2.6  Video Processing Tokens

There are five different video processing tokens: video standard, video
timing, video resolution, video sampling format and video IO interface.

2.6.1  Video standard

NTSC (National Television System Committee) and PAL (Phase Alternating Line) are the two
video standards. The differences between the two are described in Appendix C.

2.6.2  Video timing

There are two types of video timing: interlaced and progressive. The difference
between interlaced and progressive timing is presented in Appendix C.


2.6.3  Video Resolution

The different types of video resolution are standard definition (SD), enhanced definition (ED) and high
definition (HD). The differences between these three resolutions are presented in Appendix C.

2.6.4  Video Sampling Format

The different sampling formats are 4:4:4 YUV, 4:2:2 YUV and 4:2:0 YUV. Details of the sampling formats, color conversion, the 8-bit formats and the benefits of YUV over other formats
are presented in Appendix C.

2.6.5  Video IO Interface

There are three types of video IO interface: composite, component and S-video.
A basic explanation of composite, S-video and component interfaces is presented in Appendix C.

2.6.6  Summary

In this chapter we discussed the details of the DaVinci family and how it differs from the OMAP
processors. We also provided an introduction to the software architecture, APIs and major subsystems of the DaVinci processor TMS320DM6437.


Chapter 3
Target Object Tracking
3.1  Introduction

Object tracking can be defined as the process of segmenting an object of interest from a video
scene and keeping track of its motion, orientation, occlusion, etc. in order to extract useful
information. Object tracking algorithms are applied in different applications such as automated
video surveillance, robot vision, traffic monitoring and animation.

3.1.1  Conventional approaches for target tracking

There are two major components of a visual tracking system [7]: target representation and localization, and filtering and data association. Target representation and localization is mostly
a bottom-up process. These methods provide a variety of tools for identifying the moving object,
and successfully locating and tracking the target object depends on the algorithm. Typically the
computational complexity of these algorithms is low. The following are some common target
representation and localization algorithms:
Point tracking (Blob tracking): Segmentation of object interior (e.g. blob detection,
block-based correlation).
Kernel-based tracking (Mean-shift tracking): An iterative localization procedure based
on the maximization of a similarity measure such as Bhattacharyya coefficient.
Silhouette tracking (Contour tracking): Detection of the object boundary (e.g. active contours, the condensation algorithm or the watershed algorithm).

Filtering and data association is mostly a top-down process, which involves incorporating prior
information about the scene or object, dealing with object dynamics, and evaluating different
hypotheses. The computational complexity of these algorithms is usually much higher. The
following are some common filtering and data association algorithms:
Kalman filter: An optimal recursive Bayesian filter for linear functions subjected to
Gaussian noise.
Particle filter: Useful for sampling the underlying state-space distribution of non-linear
and non-Gaussian processes.
The major steps for object tracking are shown in Figure 3.1.

Figure 3.1: Steps in an object tracking system

The different steps in object tracking are:
1. Image preprocessing.
2. Background subtraction
3. Image segmentation.
4. Blob detection and identification.
5. Feature extraction.
6. Object tracking.


3.2  Image preprocessing

The image captured by a surveillance camera is affected by various sources of system noise, and the output
data format may be uncompressed or compressed. In order to remove the noise, preprocessing
of the image is essential. Preprocessing of the image includes filtering and noise removal.
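One common preprocessing choice for the salt-and-pepper noise mentioned in the abstract is a 3x3 median filter. The thesis does not prescribe a specific filter here, so the C sketch below is only an illustration of such a step; the function names and the border handling are choices made for this example.

/* 3x3 median filter: one common way to suppress salt-and-pepper noise.
 * Row-major 8-bit images; border pixels are copied unchanged.          */
static unsigned char median9(unsigned char w[9])
{
    int i, j;
    for (i = 0; i < 9; i++)            /* tiny exchange sort            */
        for (j = i + 1; j < 9; j++)
            if (w[j] < w[i]) { unsigned char t = w[i]; w[i] = w[j]; w[j] = t; }
    return w[4];                       /* middle value of the nine      */
}

void median_filter_3x3(const unsigned char *src, unsigned char *dst,
                       int width, int height)
{
    int x, y;
    for (y = 0; y < height; y++) {
        for (x = 0; x < width; x++) {
            if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
                dst[y * width + x] = src[y * width + x];   /* keep border */
                continue;
            }
            {
                unsigned char win[9];
                int k = 0, dy, dx;
                for (dy = -1; dy <= 1; dy++)
                    for (dx = -1; dx <= 1; dx++)
                        win[k++] = src[(y + dy) * width + (x + dx)];
                dst[y * width + x] = median9(win);
            }
        }
    }
}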

3.3  Background subtraction

Background subtraction [14,16] is a widely used approach for detecting moving objects in videos
from static cameras. The rationale of this approach is to detect the moving objects from
the difference between the current frame and a reference frame, often called the background
image or background model. The background image must be a representation of the scene with no moving objects and must be kept regularly updated so as to adapt to
varying luminance conditions and geometry settings.
The main motivation for the background subtraction is to detect all the foreground objects
in a frame sequence from fixed camera. In order to detect the foreground objects, the difference
between the current frame and an image of the scenes static background is compared with a
threshold. The detection equation is expressed as:

    |frame(i) - background(i)| > Th                                    (3.1)


The background image varies due to many factors such as illumination changes (gradual or
sudden changes due to clouds in the background), changes due to camera oscillations, changes
due to high-frequencies background objects (such as tree branches, sea waves etc.).
The basic methods for background subtraction are
1. Frame difference:

    |frame(i) - frame(i-1)| > Th                                       (3.2)

Here the previous frame is used as the background estimate. This evidently works only under
particular conditions of object speed and frame rate, and is very sensitive to the threshold
Th.
2. Average or median: the background image is obtained as the average or the median of the
previous n frames. This method is rather fast, but needs a large memory; the memory
requirement is n * size(frame).
3. Background obtained as the running average:

    B(i+1) = alpha * F(i) + (1 - alpha) * B(i)                         (3.3)

where alpha, the learning rate, is typically 0.05, and no memory beyond the background image itself is required. A minimal sketch of this update combined with the detection rule (3.1) is given below.
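The following is a minimal sketch (not the implementation used later in this work) of the running-average background model of Eq. (3.3) combined with the detection rule of Eq. (3.1) for an 8-bit grayscale frame; the frame size, ALPHA and TH values are illustrative assumptions.

#define ROWS  480
#define COLS  720
#define ALPHA 0.05f   /* learning rate, as suggested in the text */
#define TH    45.0f   /* detection threshold (assumed value)     */

static float         background[ROWS][COLS]; /* B(i)                   */
static unsigned char foreground[ROWS][COLS]; /* binary foreground mask */

void update_and_detect(const unsigned char frame[ROWS][COLS])
{
    int r, c;
    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++) {
            float diff = (float)frame[r][c] - background[r][c];
            /* Eq. (3.1): |frame(i) - background(i)| > Th */
            foreground[r][c] = (diff > TH || diff < -TH) ? 255 : 0;
            /* Eq. (3.3): B(i+1) = alpha*F(i) + (1-alpha)*B(i) */
            background[r][c] = ALPHA * (float)frame[r][c]
                             + (1.0f - ALPHA) * background[r][c];
        }
}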
There are various other approaches that aim to maximize speed, limit memory requirements, or achieve the highest possible accuracy under varying circumstances.
These approaches include the running Gaussian average, the temporal median filter, mixture of Gaussians (MoG), kernel density estimation (KDE), sequential kernel density approximation (SKDA), co-occurrence of image
variations, and eigenbackgrounds. Since all the approaches target real-time performance, a lower
bound on speed always exists. A short comparison of these approaches follows.
Speed: the median and running average give the fastest speed [14]; mixture of Gaussians, KDE, eigenbackgrounds, SKDA and optimized mean-shift give intermediate speed, while standard mean-shift is the slowest.
Memory: average, median, KDE [14] and mean-shift consume the most memory; mixture of Gaussians, eigenbackgrounds and SKDA consume an intermediate amount, and the running
average consumes very little memory.
Accuracy: mixture of Gaussians and eigenbackgrounds provide good accuracy, while simple methods such as the standard average, running average and median can
provide acceptable accuracy in specific applications.

3.4

Image segmentation

Image segmentation [3] is essential and critical to image processing and pattern recognition. The basic approaches to image segmentation include thresholding, clustering, edge detection
and region growing. Image segmentation is the process of dividing an image into regions that
have the same characteristics and then extracting the regions of interest. It has applications in target
tracking, automatic control, biomedical image analysis, agricultural engineering and other fields
of image processing. In this process, the union of any two adjacent regions is
not homogeneous. If P(.) is a homogeneity predicate defined on groups of connected pixels [1],
then segmentation is a partition of the set F into connected subsets [3] or regions (S1, S2, ..., Sn)
such that

    F = S1 ∪ S2 ∪ ... ∪ Sn,   with   Si ∩ Sj = ∅   for i ≠ j.

The uniformity predicate P(Si) = true for all regions Si, and P(Si ∪ Sj) = false when i ≠ j and
Si and Sj are neighbors.


The various steps in image segmentation are :
1. Thresholding [3]
Thresholding classifies the image histogram by one or several thresholds. The pixels
are classified based on which gray-scale class their values fall into. Thresholding involves
deciding a gray-scale value that distinguishes the different classes; this gray-scale value is
called the threshold. Threshold-based classification can be divided into global-threshold
and local-threshold approaches. Global thresholding obtains one threshold from the entire
image information and divides the entire image with it; local thresholding obtains thresholds
for different regions and divides each region based on its own threshold.
In threshold segmentation, selecting the threshold is the key step. In traditional segmentation,
the threshold is determined from a one-dimensional histogram. However, a one-dimensional histogram
only reflects the distribution of image gray scale, without the spatial correlation between
image pixels. This may lead to segmentation errors and unsatisfactory results. Other image
segmentation algorithms include region growing, edge detection, clustering, etc. Among
these, thresholding and region growing are generally not used alone, but as part of a longer
processing chain. The disadvantage of region growing is its inherent dependence on the selection of
the seed region and the order in which pixels and regions are examined; the resulting segments
produced by region splitting may appear too square due to the splitting scheme.
2. Novel image segmentation approach: A novel image segmentation algorithm proposed
in [6] uses a crisp fuzzifier, a smoothing filter and a median filter, in the order shown in the figure
below.
In this approach, the crisp fuzzifier is used for finding the relevant gray-value information,
and the output of the crisp fuzzifier is then processed in order to eliminate isolated points

Figure 3.2: Image segmentation

and noise. Isolated points are eliminated and noise is removed using a binary
smoothing filter and a median filter, respectively. The processed image is then input to
the object detection and identification [6],[8],[9],[10] algorithm to search for image blobs.
Each blob is enclosed by a rectangle which is elastic in nature, i.e., it can stretch in the vertical,
horizontal and downward directions until the whole blob is enclosed. The
process is then repeated for all blobs present in the image. The statistical features [1] of each blob, such as the approximate location of the center of gravity, the size of the
rectangular enclosure, the actual size or pixel count of the blob, and the volume bounded by
the membership function values, are calculated. Tracking is carried out by calculating the
center of mass [7] of the identified blobs. The image segmentation process is discussed
next.
The image filtering algorithm is as follows:
(a) Read the pixel data of the input video frame and process it through the crisp
fuzzifier. If a pixel value p lies in the range PL to PH (PL <= p <= PH),
the pixel is assigned a membership function value of 1.0; otherwise it is set to
zero. The process is repeated until all data are processed, where PL and PH are the
lower and upper limits of the pixel values of a color.
(b) The image data obtained in step 1 is input to a binary smoothing filter which is used
to
i. remove isolated points;
ii. fill in small holes in otherwise dark areas; and
iii. fill in small notches in straight-edge segments.


(c) The image data obtained in step 2 is filtered using a median filter to remove noise
in the blobs and to force points with very distinct intensities to be more like their
neighbors. The median filter is a nonlinear digital filtering technique: it replaces the
center value in the window with the median of all the pixel values in the window.
Its performance is not better than a Gaussian blur for high levels of noise,
whereas for speckle noise and salt-and-pepper (impulsive) noise it is particularly effective. A minimal sketch of this filtering chain is given below.
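As an illustration, the following is a minimal sketch (assumed, not the thesis implementation) of the crisp fuzzifier and the 3x3 median filter described above; the binary smoothing step is omitted for brevity, and the frame size and the PL/PH limits are illustrative assumptions.

#define ROWS 480
#define COLS 720

/* Crisp fuzzifier: membership 1 (stored as 255) if PL <= p <= PH, else 0. */
void fuzzify(const unsigned char in[ROWS][COLS], unsigned char out[ROWS][COLS],
             unsigned char PL, unsigned char PH)
{
    int r, c;
    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++)
            out[r][c] = (in[r][c] >= PL && in[r][c] <= PH) ? 255 : 0;
}

/* Median of nine values (simple insertion sort). */
static unsigned char median9(unsigned char v[9])
{
    int i, j;
    for (i = 1; i < 9; i++)
        for (j = i; j > 0 && v[j] < v[j - 1]; j--) {
            unsigned char t = v[j]; v[j] = v[j - 1]; v[j - 1] = t;
        }
    return v[4];
}

/* 3x3 median filter; border pixels are copied unchanged. */
void median3x3(const unsigned char in[ROWS][COLS], unsigned char out[ROWS][COLS])
{
    int r, c, k;
    unsigned char w[9];
    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++) {
            if (r == 0 || c == 0 || r == ROWS - 1 || c == COLS - 1) {
                out[r][c] = in[r][c];
            } else {
                for (k = 0; k < 9; k++)
                    w[k] = in[r - 1 + k / 3][c - 1 + k % 3];
                out[r][c] = median9(w);
            }
        }
}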

3.5

Blob detection and identification

For different kinds of blobs, different detection methods are needed. A blob detection algorithm
needs to fulfill a number of requirements, listed below:
Reliability / noise insensitivity: a low-level vision method must be robust against under-
and over-segmentation caused by noise.
Accuracy: in many applications highly accurate results at sub-pixel resolution are needed.
Scalability: the algorithm should be scalable, so that blobs of different sizes can also be
detected.
Speed: the algorithm should be applicable to real-time processing.
Few and semantically meaningful parameters for initialization: the algorithm's parameters should be easy to understand for non-experts, and the delivered results should be
predictable when changing the parameters.
Another important aspect of a blob detection algorithm is its capability of extracting geometric and radiometric attributes to allow a subsequent classification of blobs.

3.5.1

Basic steps of blob detection

The blob detection in real time is the first step for tracking application. Blob detection can be
divided into three major steps [12]:
1. Blob detection (segmentation).


2. Blob specification.
3. Blob trajectory.
Blob detection
The aim of blob detection is to determine the center point of the blobs in the current frame
encoded in XY-coordinates. In this project, a blob consists of white and light gray pixels while
the background pixels are black. The number of blobs in the video frames can vary, which
complicates the conditions for the detection approach. To simplify the problem, an upper bound
for the number of blobs to detect has been defined. Two simple constraints are sufficient to
decide if a pixel belongs to a blob:
1. Is the brightness value of the pixel above a specified threshold?
2. Is the pixel adjacent to pixels of a detected blob?
The threshold is represented as a natural number value. It can be configured by the user or
computed by averaging arbitrary attribute values of all pixels in the current frame. Usually the
color value or the brightness value is used for this estimation process. The averaging requires
the application of a frame buffer to allow multiple processing of the same frame or a continuous
adjustment of the threshold, while performing the blob detection with some initial threshold
value. To combine detected pixels into blobs, a test of pixel adjacency needs to be performed.
One way is to check every single pixel in the frame for its adjacency to pixels which are already
detected and assigned to a blob. For the adjacency, it is common to distinguish between a four-pixel
neighborhood and an eight-pixel neighborhood; a minimal labeling sketch based on the four-pixel neighborhood is given below.
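The following minimal sketch (an assumption, not the implementation used in this work) groups thresholded pixels into blobs using a four-pixel-neighborhood flood fill; the frame size and the foreground value (255) are illustrative assumptions.

#include <string.h>

#define ROWS 480
#define COLS 720
#define MAX_STACK (ROWS * COLS)

static int labels[ROWS][COLS];                 /* 0 = not yet assigned */
static int stack_r[MAX_STACK], stack_c[MAX_STACK];

/* Returns the number of blobs found in the binary mask. */
int label_blobs(const unsigned char mask[ROWS][COLS])
{
    static const int dr[4] = { -1, 1, 0, 0 };
    static const int dc[4] = { 0, 0, -1, 1 };
    int r, c, k, top, blob = 0;

    memset(labels, 0, sizeof(labels));
    for (r = 0; r < ROWS; r++) {
        for (c = 0; c < COLS; c++) {
            if (mask[r][c] != 255 || labels[r][c] != 0)
                continue;
            /* new blob found: flood fill its 4-connected neighbors */
            blob++;
            top = 0;
            stack_r[top] = r; stack_c[top] = c; top++;
            labels[r][c] = blob;
            while (top > 0) {
                int cr, cc;
                top--;
                cr = stack_r[top]; cc = stack_c[top];
                for (k = 0; k < 4; k++) {
                    int nr = cr + dr[k], nc = cc + dc[k];
                    if (nr >= 0 && nr < ROWS && nc >= 0 && nc < COLS &&
                        mask[nr][nc] == 255 && labels[nr][nc] == 0) {
                        labels[nr][nc] = blob;
                        stack_r[top] = nr; stack_c[top] = nc; top++;
                    }
                }
            }
        }
    }
    return blob;
}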
Blob specification
Blob detection step generates a binary image where white represents the foreground objects .
The main goal of blob specification step is to assign a label that identifies the different white
blobs, and to track them.
The parameters[12] taken into account during specification are:
Multi-blob tracking: is used as it is needed to track more than one blob in our surveillance
applications.


Distance: the distance between detected blobs. For good tracking performance, the smallest possible value (a distance of 1) is used; if the distance between
blobs is less than this value, only one blob will be detected.
Area of the blob: the number of pixels included in the blob.
Color: the color feature of each blob.
Width and length of the blob.
Blob trajectory
After tracking, a trajectory consisting of the temporal sequence of the detected center points
is generated for each blob. The trajectory of a blob provides its direction of motion and its size
over time; a minimal data-structure sketch is given below.
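A minimal data-structure sketch (assumed, not taken from the implementation) for storing the per-blob parameters and the trajectory of center points could look as follows; the capacities are illustrative.

#define MAX_BLOBS   8
#define MAX_FRAMES  1024

typedef struct {
    int cx, cy;               /* center point in the frame (x, y)      */
} Point;

typedef struct {
    int area;                 /* pixel count of the blob               */
    int width, height;        /* bounding-box dimensions               */
    int y_color;              /* luminance (Y) color feature           */
    Point track[MAX_FRAMES];  /* temporal sequence of center points    */
    int track_len;
} Blob;

static Blob blobs[MAX_BLOBS];

/* Append the newest center point of blob i to its trajectory. */
void record_position(int i, int cx, int cy)
{
    if (blobs[i].track_len < MAX_FRAMES) {
        blobs[i].track[blobs[i].track_len].cx = cx;
        blobs[i].track[blobs[i].track_len].cy = cy;
        blobs[i].track_len++;
    }
}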

3.5.2

Blob detection and identification method

Various approaches developed for blob detection can be grouped into the following categories:
scale-space analysis, matched filters or template matching, watershed detection, sub-pixel-precise blob detection, and effective maxima line detection, each having its own advantages and disadvantages. A new method for blob detection and identification is presented below.
The program operates based on the following assumptions:
1. The blobs do not overlap.
2. The rectangular enclosure for each blob does not intersect from each other.
The algorithm works based on a scanning search technique. The search starts from (0,0)
of the processed image frame and then proceeds from left to right, i.e., from column 0
to the last column of the image data, and from top to bottom, i.e., row by row up
to the last row of the image. The image data is the output of image segmentation. The steps
of the algorithm are as follows:
1. Read the image frame data, p. If the data p is zero, read the next data until a nonzero value
is read.
2. If a nonzero data is obtained, then a blob is detected and the following decisions are made:

(a) If it is the first blob then calculate statistics of the blob i.e., centroid measured with
respect to, point (0,0), area of rectangle, actual area, and volume. Actual area is the
actual nonzero count of pixels in the rectangle, and the volume is the volume formed
by the membership function values of the blob.
(b) If it is not the first blob, the data point is checked whether or not it is part of an
existing blobs rectangular enclosure (Note that only one rectangle will satisfy this
condition due to assumption (2)). If it is part of the rectangle, then move the search
pointer to the upper right corner of the rectangle plus one and continue the search
until a new blob is detected. If it is not part of existing rectangle then a new blob is
detected. Once new blob is detected, the blob is enclosed in a rectangle.
3. The process is repeated until all blobs are detected and enclosed in rectangles.
Each time the first nonzero point p of a blob is detected, the point is initially enclosed by a
rectangle of size 2x2 (a 2x2 square). The point p is located at the upper-left corner of the
rectangle, and the starting point's left-dimension count, right-dimension
count and bottom-dimension count are initialized for this rectangle.
The search for expansion is done within the boundaries of the image frame. The search is
conducted to the left, to the right and to the bottom of the square, in that order.
1. For search-left, the data points immediately to the left of the rectangle are checked. If a
nonzero pixel value is detected, the left-dimension of the rectangle is increased by one
(left expansion). If not the left-dimension stays the same.
2. For search-right, the data points immediately to the right of the rectangular are checked.
if non zero pixel value is detected, the right-dimension of the rectangle is increased by
one (right expansion). If not the right- dimension stays the same.
3. For the search-bottom, the data points immediately below the bottom are checked. If a
nonzero pixel value is detected, the bottom-dimension is increased by one. If not, the
following steps are performed: (a) search lower-left, and (b) search lower-right. For
the lower-left search, if a nonzero pixel is detected, the bottom-dimension is increased by
one, and if the left-dimension did not change in (1), the left-dimension is also increased
by one; otherwise the left-dimension remains the same. If the lower-left pixel is zero, the
bottom-dimension remains unchanged. For the lower-right search, if a nonzero pixel
is detected, the bottom-dimension is increased by one, and if the right-dimension did
not change in (2), the right-dimension is also increased by one; otherwise the right-dimension
remains the same. If the lower-right pixel is zero, the bottom-dimension remains
unchanged.
4. The process is repeated until all the dimensions (left-dimension, right-dimension, and
bottom-dimension) remain unchanged. This indicates that a blob is fully enclosed in a
rectangle. Note that the size of the rectangular enclosure stretches to the left, to the right,
or to the bottom as the search procedures discussed above are successively executed.
Thus, one can see that the size of the rectangle keeps expanding until the whole blob is enclosed; a minimal sketch of this rectangle-growing search is given below.
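The following minimal sketch (an assumption, not the thesis code) illustrates the rectangle-growing search around a seed pixel; for brevity it merges the lower-left and lower-right cases of step 3 into the plain left/right/bottom expansions, and the frame size is an illustrative assumption.

#define ROWS 480
#define COLS 720

typedef struct { int rlow, rhigh, clow, chigh; } Rect;

Rect grow_rectangle(const unsigned char seg[ROWS][COLS], int r0, int c0)
{
    Rect box = { r0, r0 + 1, c0, c0 + 1 };   /* initial 2x2 enclosure */
    int changed = 1;
    while (changed) {
        int i;
        changed = 0;
        /* expand left */
        if (box.clow > 0)
            for (i = box.rlow; i <= box.rhigh; i++)
                if (seg[i][box.clow - 1]) { box.clow--; changed = 1; break; }
        /* expand right */
        if (box.chigh < COLS - 1)
            for (i = box.rlow; i <= box.rhigh; i++)
                if (seg[i][box.chigh + 1]) { box.chigh++; changed = 1; break; }
        /* expand bottom */
        if (box.rhigh < ROWS - 1)
            for (i = box.clow; i <= box.chigh; i++)
                if (seg[box.rhigh + 1][i]) { box.rhigh++; changed = 1; break; }
    }
    return box;
}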

3.6

Feature extraction for blobs

In this section, we describe the extracted features of identified blob. Figure 3.3 shows an example of a blob.
1. Blob area: By counting the number of pixels included in blob i of the t-th frame, we
calculate the area of the object ai (t).

Figure 3.3: Feature extraction

2. Blob width and height: We extract the positions of the pixels P_xmax (or P_xmin) which
have the maximum (or minimum) x-component:

    P_xmax = (X_max,x , X_max,y),                                      (3.4a)
    P_xmin = (X_min,x , X_min,y)                                       (3.4b)

where X_max,x, X_max,y, X_min,x and X_min,y are the x- and y-coordinates of the
rightmost and leftmost boundary of segment i, respectively. In addition, we also extract

    P_ymax = (Y_max,x , Y_max,y),                                      (3.5a)
    P_ymin = (Y_min,x , Y_min,y)                                       (3.5b)

Then we calculate the width w and the height h of the object as follows:

    w_i(t) = X_max,x - X_min,x,                                        (3.6a)
    h_i(t) = Y_max,y - Y_min,y                                         (3.6b)

3. Position: We define the position of each object in the frame as follows:

    x_i(t) = (X_max,x + X_min,x) / 2,                                  (3.7a)
    y_i(t) = (Y_max,y + Y_min,y) / 2                                   (3.7b)

4. Color: Using the image data at P_xmax, P_xmin, P_ymax and P_ymin, we define the color
feature of each object for the Y (luminance) component as

    Y_i(t) = [Y(P_xmax) + Y(P_xmin) + Y(P_ymax) + Y(P_ymin)] / 4,      (3.8)

as well as by equivalent equations for the U and V components. A minimal sketch of the feature computation is given below.
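The following is a minimal sketch (assumed, not the thesis code) of how the features of Eqs. (3.4)-(3.8) can be computed for one labeled blob; the frame size is an illustrative assumption.

#define ROWS 480
#define COLS 720

typedef struct {
    int area, width, height, x, y, y_color;
} BlobFeatures;

BlobFeatures extract_features(const int labels[ROWS][COLS],
                              const unsigned char Y[ROWS][COLS], int blob_id)
{
    BlobFeatures f = { 0, 0, 0, 0, 0, 0 };
    int r, c;
    int xmin = COLS, xmax = -1, ymin = ROWS, ymax = -1;
    int y_at_xmin = 0, y_at_xmax = 0, y_at_ymin = 0, y_at_ymax = 0;

    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++) {
            if (labels[r][c] != blob_id)
                continue;
            f.area++;                               /* blob area a_i(t) */
            if (c < xmin) { xmin = c; y_at_xmin = Y[r][c]; }
            if (c > xmax) { xmax = c; y_at_xmax = Y[r][c]; }
            if (r < ymin) { ymin = r; y_at_ymin = Y[r][c]; }
            if (r > ymax) { ymax = r; y_at_ymax = Y[r][c]; }
        }
    f.width   = xmax - xmin;                        /* Eq. (3.6a) */
    f.height  = ymax - ymin;                        /* Eq. (3.6b) */
    f.x       = (xmax + xmin) / 2;                  /* Eq. (3.7a) */
    f.y       = (ymax + ymin) / 2;                  /* Eq. (3.7b) */
    f.y_color = (y_at_xmin + y_at_xmax + y_at_ymin + y_at_ymax) / 4;  /* Eq. (3.8) */
    return f;
}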

3.7

Tracking: Centroid calculation

For the computation of the blob's center point, different methods can be applied. The bounding-box
method estimates the center point of the blob by keeping track of the minimal and maximal XY coordinates
of the blob. If a new pixel is adjacent to or within the range of the estimated XY coordinates of the blob,
it is added to the blob and the min/max values are adjusted.
For the computation of the center point, the bounding box does not offer enough information about
the blob to extract very precise results, but the computation can be implemented very efficiently and
will not cause significant performance issues.
    BLOB's_X_center_position = minX_position + (maxX_position - minX_position) / 2      (3.9a)
    BLOB's_Y_center_position = minY_position + (maxY_position - minY_position) / 2      (3.9b)

Division by two can be realized by a bit shift, and the other mathematical operations, such as addition and
subtraction, are not computationally expensive either. The center coordinates are strongly affected by the pixels at the blob's border. With a threshold-based detection the pixels at the
border can flicker, and this effect becomes even stronger for blobs in motion: movement of the device by the user causes motion blur on the blob's shape. These effects
increase the flickering of pixels at the blob's border and cause a flickering of the computed
center point. To reduce these flickering effects, another method for computing the
blob center point is the center-of-mass method. All pixels of the detected blob are taken into
account for the computation of the center point. The algorithm uses the coordinate values of
the detected pixels as a weighted sum and calculates an averaged center coordinate:
    BLOB's_X_center_position = sum(X_position_of_all_BLOB_pixels) / number_of_all_BLOB_pixels    (3.10a)
    BLOB's_Y_center_position = sum(Y_position_of_all_BLOB_pixels) / number_of_all_BLOB_pixels    (3.10b)

To get an even higher precision, the brightness values of the pixels can be applied as weights
as well, which increases the precision of the center point of a blob:

    center_position = sum(pixel_position * pixel_brightness) / sum(pixel_brightness)    (3.11)
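A minimal sketch (assumed, not the thesis code) of the brightness-weighted center-of-mass computation of Eqs. (3.10)-(3.11) for one labeled blob is given below; the frame size is an illustrative assumption.

#define ROWS 480
#define COLS 720

void blob_center(const int labels[ROWS][COLS],
                 const unsigned char Y[ROWS][COLS],
                 int blob_id, int *cx, int *cy)
{
    long long sum_x = 0, sum_y = 0, sum_w = 0;
    int r, c;
    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++)
            if (labels[r][c] == blob_id) {
                long long w = Y[r][c];       /* brightness weight, Eq. (3.11) */
                sum_x += (long long)c * w;
                sum_y += (long long)r * w;
                sum_w += w;
            }
    if (sum_w > 0) {
        *cx = (int)(sum_x / sum_w);
        *cy = (int)(sum_y / sum_w);
    } else {
        *cx = *cy = 0;                       /* blob not found */
    }
}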

Since the available video material is converted into greyscale, the border of the blobs can
show a significant flickering, dependent on the threshold value and the color gradient. To avoid
a similar flickering in the computed center-position of the blob, the application of the proposed
averaging is recommended. This can be realized by applying a running summation of the values
during the detection phase. The division by the number of pixels can be done after all merging
procedures for the blobs have been executed.


Chapter 4
Video and Image Processing Algorithms
on TMS320DM6437
In this project we develop and demonstrate a framework for real-time object tracking using the
Texas Instruments (TI's) TMS320DM6437 DaVinci multimedia processor. More specifically,
we track a single object and two objects present in the scene captured by a camera that acts as the
video input device. The tracking happens in real time, consuming 30 frames per second (fps), and
is robust to background and illumination changes. The approach involves an off-line detection algorithm as a pre-processing step. As part of the real-time tracking, the proposed approach
performs background subtraction, image segmentation, blob detection and identification, feature
extraction, and calculation of the center of mass for target tracking. In addition, real-time implementations
of basic image processing algorithms such as image inversion, edge detection using the Sobel operator, and image compression were carried out on the DaVinci digital media processor DM6437.
The input was captured using a CCD camera and the output is displayed on an LCD display. Blob
detection and identification was comparatively slow due to the high computational complexity of multiple
blob detection, identification and centroid calculation, the limited speed of the processor, and the coding style. Execution times for the different blocks of single object tracking
were estimated using the profiler, and the accuracy of the detection was verified using the debugger provided by
TI Code Composer Studio (CCS).
There are two approaches for running video processing algorithms on the TMS320DM6437: a real-time
approach using a CCD camera and a display, and a file-based approach in which a file is read
from the hard drive, processed (e.g., encoded/decoded or otherwise transformed), and the result is stored
back to the hard drive. In the second case there is no need to configure the capture and display
channels or to check for PAL or NTSC. The first case is used for real-time video preview and video
encoding/decoding, whereas the second is used for video recording, video copying, or encoding/decoding to/from a file.

4.1

Implementation of Single target tracking on TMS320DM6437

This section describes the steps to implement the single-object target tracking algorithm on
the TMS320DM6437 hardware kit containing the DaVinci DSP processor. The tracking algorithm
implemented in this project consists of steps such as capturing a frame, copying the frame, extracting
the Y component, removing noise, dilation, processing of the image, and region-of-interest extraction.
A given image consists of a luminance (Y) and a chrominance (C) component. The luminance
component mainly represents the gray scale of a pixel. This information is essential in further
processing using this algorithm. The chrominance component is not essential, as the algorithm
is independent of such information. The Y component is extracted from the frame and stored in
a two dimensional array. After this, the Y components are processed for noise removal.
The objective of this part of the project is to track the movement of a single object using the TMS320DM6437
EVM board. Tracking of a single object consists of the following steps: capturing the input frame, copying the
frame, frame subtraction, and tracking. The captured input frame buffer is stored into an array,
the array is copied into another array to hold the foreground image, and the array is then processed for
frame subtraction. The frame subtraction function first delays the frame by storing it into another array, then
subtracts the incoming frame from the delayed frame, and the resulting frame is segmented using
thresholding. After segmentation we proceed to feature extraction, where we find
the centroid of the moving object, and we then track the moving object through
the movement of its centroid, as shown in Figure 4.1.
The working principle of the single-target tracking module on the TMS320DM6437 is given below.

Step 0: The objective is to track the movement of a single object using the TMS320DM6437.

Step 1: Start CCS v3.3 using the desktop shortcut icon. Before starting CCS, make sure
that the DM6437 kit is connected to the PC (via the USB emulation cable) or an external emulator,


Figure 4.1: Single object tracking

the input from the CCD camera is connected to the video input port of the DM6437 board via a composite-to-BNC connector, the output to the display device is connected to one of the three
output ports of the kit via a composite cable (the input/output connection can also be made using
the S-video interface), and all peripherals and the board are powered on.

Step 2: Load the video_preview.pjt project from the DVSDK example directory.

Step 3: In the video_preview(void) function, we first declare

FVID_Frame *frameBuffTable[FRAME_BUFF_CNT];
FVID_Frame *frameBuffPtr;

FVID_Frame is a structure which contains interlaced (I) and progressive (P) frame buffers, lines,
bpp, pitch, color format (YCbCr422, RGB888, RGB565), etc. frameBuffTable is an array
of pointers of type FVID_Frame; its size is 6, i.e., 3 buffers
for capture frames and 3 for display frames. The capture or display frame queue can be increased from
3 to any number by increasing the value of the macro
FRAME_BUFF_CNT. frameBuffPtr is a pointer to an FVID_Frame.
These definitions are provided in fvid.h.
Step 4: GIO_Handle is a pointer to a global input/output object (GIO_Obj); it is used to declare pointers to the mini-driver channel objects (hGioVpfeCcdc, hGioVpbeVid0, hGioVpbeVenc).

GIO_Handle hGioVpfeCcdc;
GIO_Handle hGioVpbeVid0;
GIO_Handle hGioVpbeVenc;

PSP stands for Platform Support Package. The PSP configuration parameter structures are used to set the
video display (VPBEOsdConfigParams) and capture (VPFECcdcConfigParams)
driver configuration parameters to their defaults, and also to set the TVP5146 video decoder configuration
(VPFE_TVP5146_ConfigParams).

PSP_VPFE_TVP5146_ConfigParams tvp5146Params =
    VID_PARAMS_TVP5146_DEFAULT;
PSP_VPFECcdcConfigParams vpfeCcdcConfigParams =
    VID_PARAMS_CCDC_DEFAULT_D1;

Step 5: This step creates the video input channel (CCDC) and the video output channel (Vid0 or VENC). FVID_create() allocates and initializes a GIO_Obj structure;
it returns a non-NULL GIO_Handle on success and NULL on failure.
/* create video input channel */
hGioVpfeCcdc = FVID_create("/VPFE0",IOM_INOUT,NULL,
&vpfeChannelParams,NULL);
/* create video output channel, plane 0 */
hGioVpbeVid0 = FVID_create("/VPBE0",IOM_INOUT,NULL,
&vpbeChannelParams,NULL);
Step 6: FVID_allocBuffer() is used by the application to allocate a frame buffer using the
driver's memory allocation routines.


/* allocate some frame buffers */


result = FVID_allocBuffer(hGioVpfeCcdc, &frameBuffTable[i]);

Step 7: In this block of code the video capture and video display channels are primed with a queue of
size 3; the queue size can be increased by increasing the value of the macro FRAME_BUFF_CNT.
/* prime up the video capture channel */
FVID_queue(hGioVpfeCcdc, frameBuffTable[0]);
FVID_queue(hGioVpfeCcdc, frameBuffTable[1]);
FVID_queue(hGioVpfeCcdc, frameBuffTable[2]);
/* prime up the video display channel */
FVID_queue(hGioVpbeVid0, frameBuffTable[3]);
FVID_queue(hGioVpbeVid0, frameBuffTable[4]);
FVID_queue(hGioVpbeVid0, frameBuffTable[5]);
Step 8: The code for single object tracking is written between these two function calls:

FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);
/* processing code goes between these two calls */
FVID_exchange(hGioVpbeVid0, &frameBuffPtr);

FVID_exchange() exchanges one driver-managed buffer for another driver-managed buffer.
This operation is similar to an FVID_free()/FVID_alloc() pair but has less overhead since it involves only one call into the driver.
In the call FVID_exchange(hGioVpfeCcdc, &frameBuffPtr), hGioVpfeCcdc is the GIO_Handle of the capture channel and &frameBuffPtr is the address
of the frame buffer pointer.
Step 9: Example of single object tracking
FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);
extract_uyvy ((frameBuffPtr->frame.frameBufferPtr));
copy_frame();

frame_subtract();
tracking();
write_uyvy ((frameBuffPtr->frame.frameBufferPtr));
/* display the video frame */
FVID_exchange(hGioVpbeVid0, &frameBuffPtr);
1. Function A: extract_uyvy((frameBuffPtr->frame.frameBufferPtr)) captures the
input frame. In this function the captured input frame is stored in arrays. The
captured frame is a standard-definition NTSC color image in YUV422 sampling
format, as defined in fvid.h, so we declare an array of size 480*720 for Y
and arrays of size 480*360 for U and V. The extraction logic is:

I_u1[r][c]     = *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 0);
I_y1[r][2*c]   = *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 1);
I_v1[r][c]     = *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 2);
I_y1[r][2*c+1] = *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 3);
In the parameter (frameBuffPtr->frame.frameBufferPtr), frame is a union which contains
the Y/C frame buffers (iFrm, pFrm), the raw frame buffers (riFrm, rpFrm) and the raw frame
buffer pointer (frameBufferPtr). frame.frameBufferPtr accesses the member pointer
frameBufferPtr of the union frame, and frameBuffPtr->frame.frameBufferPtr dereferences
the structure pointer frameBuffPtr of type FVID_Frame to reach that union member.
2. Function B: copy_frame() copies the original frame into another set of arrays that hold the
foreground image; the copied arrays (I_y, I_u, I_v) are later written into the display
frame.

I_u[r][c]     = I_u1[r][c];
I_y[r][2*c]   = I_y1[r][2*c];
I_v[r][c]     = I_v1[r][c];
I_y[r][2*c+1] = I_y1[r][2*c+1];

3. Function C: frame_subtract(). The frame subtraction function consists of three parts:
frame delay, frame subtraction, and image segmentation.

Delay frame: the frame is delayed by storing it into another set of arrays in a concurrent loop:

I_u2[r][c]     = I_u1[r][c];
I_y2[r][2*c]   = I_y1[r][2*c];
I_v2[r][c]     = I_v1[r][c];
I_y2[r][2*c+1] = I_y1[r][2*c+1];

Subtract frame: the delayed frame is subtracted from the incoming frame and the result is stored into
another set of arrays:

I_u3[r][c]     = I_u1[r][c]     - I_u2[r][c];
I_y3[r][2*c]   = I_y1[r][2*c]   - I_y2[r][2*c];
I_v3[r][c]     = I_v1[r][c]     - I_v2[r][c];
I_y3[r][2*c+1] = I_y1[r][2*c+1] - I_y2[r][2*c+1];

Image segmentation: the resulting subtracted frame is segmented using thresholding. In the YUV422 format, for the luminance component, gray level 0 (black) corresponds to the value 16
and gray level 1 (white) corresponds to the value 235.

if ((I_u3[m][n] < 45     || I_u3[m][n] > 200)     &&
    (I_y3[m][2*n] < 45   || I_y3[m][2*n] > 200)   &&
    (I_v3[m][n] < 45     || I_v3[m][n] > 200)     &&
    (I_y3[m][2*n+1] < 45 || I_y3[m][2*n+1] > 200)) {
    I_u3[m][n]     = 128;
    I_y3[m][2*n]   = 16;
    I_v3[m][n]     = 128;
    I_y3[m][2*n+1] = 16;
} else {
    I_u3[m][n]     = 128;
    I_y3[m][2*n]   = 235;
    I_v3[m][n]     = 128;
    I_y3[m][2*n+1] = 235;
}

4. Function D: tracking(). This function comprises three parts: feature extraction, tracking, and drawing the rectangle.

Feature extraction: we extract the feature of the moving object, namely the position of its centroid (centroid_x, centroid_y), by accumulating the coordinates of the foreground pixels:

{
    cent_x = cent_x + m;
    cent_y = cent_y + n;
    cent_z = cent_z + 1;
}
centroid_x = (cent_x / cent_z);
centroid_y = (cent_y / cent_z);

Tracking object: the movement of the object is simply the movement of the calculated centroid.

Creating the rectangle: once the position (x, y) of the centroid is found, a rectangle is drawn around the centroid of the object:

for (p = centroid_x-10; p < centroid_x+10; p++) {
    for (q = centroid_y-10; q < centroid_y+10; q++) {
        if (p == centroid_x-10 || p == centroid_x+9 ||
            q == centroid_y-10 || q == centroid_y+9) {
            I_u[p][q]     = 255;
            I_y[p][2*q]   = 255;
            I_v[p][q]     = 255;
            I_y[p][2*q+1] = 255;
        } else {
            I_u[p][q]     = I_u[p][q];
            I_y[p][2*q]   = I_y[p][2*q];
            I_v[p][q]     = I_v[p][q];
            I_y[p][2*q+1] = I_y[p][2*q+1];
        }
    }
}

5. Function E: write_uyvy((frameBuffPtr->frame.frameBufferPtr)). In this function the processed arrays are copied into the output frame that is
going to be displayed. The displayed frame is a standard-definition NTSC color image in YUV422 sampling format, as defined in fvid.h.
The display logic is:

*(((unsigned char *)currentFrame) + r*720*2 + 4*c + 0) = I_u[r][c];
*(((unsigned char *)currentFrame) + r*720*2 + 4*c + 1) = I_y[r][2*c];
*(((unsigned char *)currentFrame) + r*720*2 + 4*c + 2) = I_v[r][c];
*(((unsigned char *)currentFrame) + r*720*2 + 4*c + 3) = I_y[r][2*c+1];
Step 10: FVID_exchange for the display frame:
FVID_exchange(hGioVpbeVid0, &frameBuffPtr);
Here hGioVpbeVid0 is the GIO_Handle of the display channel and &frameBuffPtr is the address
of the frame buffer pointer.

4.1.1

Debugging and profiling results

Profiling results
Profiling is used to measure code performance and ensure efficient use of the DSP target's
resources during debug and development sessions. Profiling was applied to the different functions of
single object tracking, and the time taken to execute each function was measured from
its inclusive and exclusive cycle counts, access count, and the processor clock frequency; a worked example is given below.
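As a sanity check on the profiler data (assuming the cycle counts are taken at the CPU clock), the inclusive time can be recomputed as

    t = inclusive cycles / f_CPU = 157644657 / (700 * 10^6) ≈ 0.225 s

for tracking_fun, which matches the 0.22521 s reported in Table 4.1 and indicates that the figures correspond to a 700 MHz CPU clock.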
Object tracking setup
1. Verify that board jumper JP1 is set to the correct display format, either NTSC or PAL, according to the DM6437 DVDP getting started guide.

Table 4.1: Single object tracking profiler data

Function name                          Access count   Incl cycle    Excl cycle   Incl time taken (s)
write_uyvy_fun                         7007           23402017      22402007     0.000005
copy_frame_fun                         -              17944199      17944199     0.02563
frame_subtract_fun                     -              38848159      38848159     0.05550
tracking_fun                           -              157644657     94935656     0.22521
read_JP1_fun                           -              632311        1804         0.00090
extract_uyvy_fun                       9007           33402017      33402017     0.000005
main_fun                               -              78070         300          0.00011
video_preview_fun                      -              180255249     46289        0.25751
centriod_loop                          8594           156441001     94241711     0.00026
Rectangle_loop                         24             151250796     91165143     0.009003
frame_buffer_int_loop                  -              5542          5542         0.000001
allocate_frame_buffers_loop            -              29750         11315        0.000007
while_loop_vid_capture_disp_all_fun    -              5655          -            0.000008

2. Verify that the board jumpers and switches are set according to the getting started guide so that the
boot mode is EMIF boot.
3. Connect a composite video cable from an NTSC video camera to the EVM
board's Video In RCA jack J5, as shown in Figure 4.2(a).
4. Connect a composite video cable from a video display to the EVM board's DAC D Video
Out RCA jack J4, as shown in Figure 4.2(b).
5. Use the on-board USB emulator cable to connect the EVM's USB connector to a PC, as shown
in Figure 4.3(a). The USB connection enables debugging via Code Composer Studio.
6. Plug in the video camera and the video LCD display.
7. Connect the provided +5 V power supply to an AC power source, and connect it
to the EVM board's power connector as shown in Figure 4.3(b).

Figure 4.2: EVM board setup: (a) video input connection, and (b) video output connection

Figure 4.3: EVM board setup: (a) USB on-board emulator, and (b) power connection

Debugging results
Debugging results of single object tracking are shown in Figures 4.4 and 4.5. Figure 4.4(a) shows the DVDP EVM6437 board, and Figure 4.4(b) shows the complete setup for
object tracking, where the CCD camera input is given to the EVM board through a composite
connector. The output is shown on an LCD display connected with a composite cable
to the output port of the EVM board. The original ball which is going to be
tracked is shown in Figure 4.4(c).
Figure 4.5 shows the debugging results of single ball tracking. The result of the
background subtraction without any filtering is shown in Figure 4.5(a), the result after filtering is shown in Figure 4.5(b), and Figure 4.5(c) shows the
debugging result in Code Composer Studio.

Figure 4.4: Debugging results: (a) TMS320DM6437 EVM board, (b) target tracking setup with the DM6437 board, and (c) original ball for tracking

Figure 4.5: Debugging results: (a) results of background subtraction without filtering, (b) background subtraction with filtering, and (c) debugging output for I_y1_u1_v1

4.2

Implementation of multiple object tracking on DM6437

The objective of this part of the project is to track the movement of two objects using the TMS320DM6437
EVM board. Tracking of multiple objects consists of the following steps: capturing the input frame, copying the
frame, frame subtraction, blob detection and identification, feature extraction, and tracking. The captured input frame buffer is stored into an array, the array is copied into another array to hold
the foreground image, and the array is then processed for frame subtraction. The frame subtraction function first delays the frame by storing it into another array, then subtracts the incoming frame from the delayed
frame, and the resulting frame is segmented using thresholding. The blobs present in the frame are then detected and identified, their features are extracted, and the
blobs are tracked through centroid calculation.

Figure 4.6: Multiple object tracking

Step 1: The code for multiple object tracking is written between these two function calls:

FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);
/* processing code goes between these two calls */
FVID_exchange(hGioVpbeVid0, &frameBuffPtr);

FVID_exchange() exchanges one driver-managed buffer for another driver-managed buffer.
This operation is similar to an FVID_free()/FVID_alloc() pair but has less overhead since it involves only one call into the driver.
In the call FVID_exchange(hGioVpfeCcdc, &frameBuffPtr), hGioVpfeCcdc is the GIO_Handle of the capture channel and &frameBuffPtr is the address
of the frame buffer pointer.
Step 2: Example of Multiple object tracking
FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);
extract_uyvy ((frameBuffPtr->frame.frameBufferPtr));
copy_frame();
frame_subtract();
write_uyvy ((frameBuffPtr->frame.frameBufferPtr));
FVID_exchange(hGioVpbeVid0, &frameBuffPtr);
1. Function A: extract_uyvy((frameBuffPtr->frame.frameBufferPtr)) captures the
input frame. In this function the captured input frame is stored in arrays. The
captured frame is a standard-definition NTSC color image in YUV422 sampling
format, as defined in fvid.h, so we declare an array of size 480*720 for Y
and arrays of size 480*360 for U and V. The extraction logic is:

I_u1[r][c]     = *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 0);
I_y1[r][2*c]   = *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 1);
I_v1[r][c]     = *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 2);
I_y1[r][2*c+1] = *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 3);


In the parameter (frameBuffPtr->frame.frameBufferPtr), frame is a union which contains
the Y/C frame buffers (iFrm, pFrm), the raw frame buffers (riFrm, rpFrm) and the raw frame
buffer pointer (frameBufferPtr). frame.frameBufferPtr accesses the member pointer
frameBufferPtr of the union frame, and frameBuffPtr->frame.frameBufferPtr dereferences
the structure pointer frameBuffPtr of type FVID_Frame to reach that union member.
2. Function B: copy_frame() copies the original frame into another set of arrays that hold the
foreground image; the copied arrays (I_y, I_u, I_v) are later written into the display
frame.

I_u[r][c]     = I_u1[r][c];
I_y[r][2*c]   = I_y1[r][2*c];
I_v[r][c]     = I_v1[r][c];
I_y[r][2*c+1] = I_y1[r][2*c+1];

3. Function C: frame_subtract(). In the multiple-object case the frame subtraction function consists of several parts: frame delay, frame subtraction, image segmentation, blob detection and identification, feature extraction, and tracking.

Delay frame: the frame is delayed by storing it into another set of arrays in a concurrent loop:

I_u2[r][c]     = I_u1[r][c];
I_y2[r][2*c]   = I_y1[r][2*c];
I_v2[r][c]     = I_v1[r][c];
I_y2[r][2*c+1] = I_y1[r][2*c+1];

Subtract frame: the delayed frame is subtracted from the incoming frame and the result is stored into
another set of arrays:

I_u5[r][c]     = I_u1[r][c]     - I_u2[r][c];
I_y5[r][2*c]   = I_y1[r][2*c]   - I_y2[r][2*c];
I_v5[r][c]     = I_v1[r][c]     - I_v2[r][c];
I_y5[r][2*c+1] = I_y1[r][2*c+1] - I_y2[r][2*c+1];


Image segmentation: the resulting subtracted frame is segmented using thresholding. In the YUV422 format, for the luminance component, gray level 0 (black) corresponds to the value 16
and gray level 1 (white) corresponds to the value 235.

if ((I_u5[m][n] < 45     || I_u5[m][n] > 200)     &&
    (I_y5[m][2*n] < 45   || I_y5[m][2*n] > 200)   &&
    (I_v5[m][n] < 45     || I_v5[m][n] > 200)     &&
    (I_y5[m][2*n+1] < 45 || I_y5[m][2*n+1] > 200)) {
    I_u5[m][n]     = 128;
    I_y5[m][2*n]   = 16;
    I_v5[m][n]     = 128;
    I_y5[m][2*n+1] = 16;
} else {
    I_u5[m][n]     = 128;
    I_y5[m][2*n]   = 235;
    I_v5[m][n]     = 128;
    I_y5[m][2*n+1] = 235;
}

Blob detection and identification:

(a) Read the image frame data p. If p is zero, read the next data until a nonzero value is read.
(b) Each time the first nonzero point p of a blob is detected, the point is initially enclosed
by a 2x2 rectangle. The rectangle is then expanded by searching the values on its boundary;
if a nonzero value is found, the height (rhigh) or width (chigh) of the rectangle is increased.

if (chigh != LAST_COL && rlow < LAST_ROW) {        /* expand along horizontal */
    for (rtemp = rlow; rtemp <= rhigh; rtemp++) {
        if (I_y3[rtemp][chigh+1] > 16) {
            chigh = chigh + 1;
            flag  = 1;
            break;
        }
    }
}
if (rhigh != LAST_ROW && clow < LAST_COL) {        /* expand along vertical */
    for (ctemp = clow; ctemp <= chigh; ctemp++) {
        if (I_y3[rhigh+1][ctemp] > 16) {
            rhigh = rhigh + 1;
            flag  = 1;
            break;
        }
    }
}
Feature extraction: for each detected blob, the statistics of the blob are calculated: the centroid measured with respect to point (0,0), the area of the rectangle, the actual area (count), and the volume. The actual area is the nonzero pixel count inside
the rectangle, and the volume is formed by the membership function
values of the blob. The rectangle area is computed as:

count = 0;
LL = rhigh - rlow + 1;
LH = chigh - clow + 1;
t_area = LL * LH;

The actual area of the blob is obtained as:

for (ix = rlow; ix <= rhigh; ix++)
    for (jx = clow; jx <= chigh; jx++)
        if (I_y3[ix][jx] > 16)
            count = count + 1;

We also find the maximum length and width of the blob and store the blob dimensions (rlow, clow, rhigh, chigh) into an array:

if (count > (t_area/2) && (t_area > 100)) {    /* criterion for selecting a blob */
    iblob = iblob + 1;
    arr[k]   = rlow;
    arr[k+1] = clow;
    arr[k+2] = rhigh;
    arr[k+3] = chigh;
    k = k + 4;
}


4. Function D: tracking(). This function comprises three parts: feature extraction, tracking, and drawing the rectangles.

Feature extraction: for each detected blob we extract the position of its centroid (centroid_x, centroid_y) by accumulating the coordinates of its foreground pixels:

for (a = 1; a <= 4*iblob; a = a + 4) {
    for (m = arr[a]; m <= arr[a+2]; m++) {
        for (n = arr[a+1]; n <= arr[a+3]; n++) {
            if (I_y3[m][n] < 45 || I_y3[m][n] > 200) {
                I_y3[m][n] = 16;
            } else {
                cent_x = cent_x + m;
                cent_y = cent_y + n;
                cent_z = cent_z + 1;
            }
        }
    }
    centroid_x[a] = (cent_x / cent_z);
    centroid_y[a] = (cent_y / cent_z);
}

Tracking: the movement of each object is simply the movement of its calculated centroid.

Creating the rectangles: once the position (x, y) of each centroid is found, a rectangle is drawn around the centroid of each object:

for (l = 1; l <= 4*iblob; l = l + 4) {
    for (p = centroid_x[l]-10; p < centroid_x[l]+10; p++) {
        for (q = centroid_y[l]-10; q < centroid_y[l]+10; q++) {
            if (p == centroid_x[l]-10 || p == centroid_x[l]+9 ||
                q == centroid_y[l]-10 || q == centroid_y[l]+9) {
                I_y[p][q] = 235;
            } else {
                I_y[p][q] = I_y[p][q];
            }
        }
    }
}

5. Function E: write_uyvy((frameBuffPtr->frame.frameBufferPtr)). In this function the processed foreground arrays are copied into the output frame that is
going to be displayed. The display frame is a standard-definition NTSC color image in YUV422 sampling format,
as defined in fvid.h. The logic for writing the frame to the display is:

*(((unsigned char *)currentFrame) + r*720*2 + 4*c + 0) = I_u[r][c];
*(((unsigned char *)currentFrame) + r*720*2 + 4*c + 1) = I_y[r][2*c];
*(((unsigned char *)currentFrame) + r*720*2 + 4*c + 2) = I_v[r][c];
*(((unsigned char *)currentFrame) + r*720*2 + 4*c + 3) = I_y[r][2*c+1];

Step 3: FVID_exchange for the display frame:

FVID_exchange(hGioVpbeVid0, &frameBuffPtr);

Here hGioVpbeVid0 is the GIO_Handle of the display channel and &frameBuffPtr is the address
of the frame buffer pointer.

4.2.1

Debugging and profiling results

Profiling results
Profiling is used to measure code performance and ensure efficient use of the DSP target's
resources during debug and development sessions. Profiling was applied to the different functions of
multiple object tracking, and the time taken to execute each function was measured from
its inclusive and exclusive cycle counts, access count, and the processor clock frequency.
Debugging results
Debugging results of multiple object tracking are shown in Figures 4.7, 4.8 and
4.9. Two-ball tracking is shown in Figure 4.7(a), Figure 4.7(b) shows two-object tracking, and the result of
the background subtraction with filtering is shown in Figure 4.7(c). Figure 4.8 shows the
debugging results of two-object tracking: Figure 4.8(a) shows the tracking results, Figure 4.8(b) shows the result of the background subtraction of two objects
with filtering, and Figure 4.8(c) shows the debugging result of two-object tracking in Code Composer Studio.
The result of three-object tracking is shown in Figure 4.9.


Table 4.2: Multiple object tracking profiler data

Function name                          Access count   Incl cycle    Excl cycle   Incl time taken (s)
write_uyvy_fun                         7007           23402017      22402007     0.000005
copy_frame_fun                         -              17944199      17944199     0.02563
frame_subtract_fun                     -              988491609     98848159     0.141212
read_JP1_fun                           -              632311        1804         0.00090
extract_uyvy_fun                       9007           33402017      33402017     0.000005
main_fun                               -              78070         300          0.00011
video_preview_fun                      -              3802552121    46289        1.114650
centriod_loop                          200            656441001     94241711     0.061174
Rectangle_loop                         24             651240712     91165143     0.038765
frame_buffer_int_loop                  -              5542          5542         0.000001
allocate_frame_buffers_loop            -              29750         11315        0.000007
while_loop_vid_capture_disp_all_fun    -              5655          -            0.000008

Figure 4.7: Debugging results: (a) two-ball tracking, (b) two-object tracking, and (c) results of background subtraction with filtering

Figure 4.8: Debugging results: (a) two-object tracking results, (b) results of background subtraction of two objects with filtering, and (c) debugging result of two-object tracking


Figure 4.9: Debugging results of Three target tracking


4.3

Implementation of the object tracking algorithm in Matlab

In the proposed object tracking algorithm, a number of features are extracted for all segmented
objects. Then pattern matching with the objects of the previous frame is carried out. A high-level
flow chart of the proposed algorithm is shown in Figure 4.11. This algorithm is implemented in
Matlab. The detailed processing consists of the following steps, illustrated in Figure 4.10.

Figure 4.10: Different steps for object tracking using segmentation and pattern matching

Step 1: With the image segmentation algorithm, we extract all objects in the input image.
Step 2: Then we extract the coordinates of four object pixels, as indicated in Figure 4.12(a).
P_xmax and P_xmin have the maximum and minimum x-component, while P_ymax and
P_ymin have the maximum and minimum y-component, respectively.

Step 3: Next we calculate the characteristic features of the object to be tracked, based on the image
segment, namely the object position (x, y), object size (width, height), color information (R, G, B), and object area.
The object position (x, y), width w and height h are calculated as

    w = X_max,x - X_min,x,        h = Y_max,y - Y_min,y,
    x = (X_max,x + X_min,x)/2,    y = (Y_max,y + Y_min,y)/2

The object area is determined by counting the number of its constituent pixels. As object
color information, the average RGB data of the four pixels P_xmax, P_xmin, P_ymax and P_ymin
is used. The feature extraction calculation is illustrated in Figure 4.12(a).


Figure 4.11: Flow chart of object tracking based on segmentation & pattern matching.

Step 4: A minimum-distance search in the feature space is performed between each
object in the current frame and all objects in the preceding frame. We then identify each
object in the current frame with the object in the preceding frame which has the minimum
distance, in other words the most similar object.

Step 5: Next, we calculate the motion vector (mx(t-1), my(t-1)) from the difference in
position between the object in the current frame and the matching object in the preceding
frame. By adding the motion vector (mx(t-1), my(t-1)) to the current position (x(t-1),
y(t-1)) of the object, we determine an estimate of the object's position (x(t), y(t)) in the
next frame. This estimated position is used instead of the extracted position (x(t-1),
y(t-1)) for pattern matching after a start-up phase, from the third frame onwards. A minimal sketch of this matching and prediction step is given below.
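The following minimal sketch (written in C as an illustration, not taken from the Matlab implementation) shows Steps 4 and 5: a minimum-distance (Manhattan) search in the feature space, followed by motion-vector prediction of the next position. The equal feature weighting and the fields of the Object record are assumptions.

#include <math.h>

#define MAX_OBJ 16

typedef struct {
    double x, y;          /* position            */
    double w, h;          /* width, height       */
    double area;          /* pixel count         */
    double r, g, b;       /* average color       */
    double mx, my;        /* motion vector       */
} Object;

/* Manhattan distance in feature space (equal weights assumed). */
static double feature_distance(const Object *a, const Object *b)
{
    return fabs(a->x - b->x) + fabs(a->y - b->y) +
           fabs(a->w - b->w) + fabs(a->h - b->h) +
           fabs(a->area - b->area) +
           fabs(a->r - b->r) + fabs(a->g - b->g) + fabs(a->b - b->b);
}

/* For each current object, find the most similar previous object,
 * then update the motion vector and predict the next position. */
void match_and_predict(Object cur[], int n_cur, const Object prev[], int n_prev)
{
    int i, j;
    for (i = 0; i < n_cur; i++) {
        int best = -1;
        double best_d = 1e30;
        for (j = 0; j < n_prev; j++) {
            double d = feature_distance(&cur[i], &prev[j]);
            if (d < best_d) { best_d = d; best = j; }
        }
        if (best >= 0) {
            cur[i].mx = cur[i].x - prev[best].x;   /* motion vector  */
            cur[i].my = cur[i].y - prev[best].y;
            cur[i].x += cur[i].mx;                 /* predicted x(t) */
            cur[i].y += cur[i].my;                 /* predicted y(t) */
        }
    }
}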

Figure 4.12: Different templates: (a) feature extraction, (b) estimation of positions in the next frame

Step 6: By carrying out this matching procedure with all segments obtained for the current frame, we can identify all objects one by one and maintain tracking of all objects
between frames.

Results: A video frame is captured and image thresholding is carried out according to the algorithm presented in paper [1]. A region-growing technique is applied to obtain the different
regions in the image. Initially the object is marked as a point and then it is allowed to grow to
form a segment. This segment is searched for in the next frame to track the target. Figure 4.13(a) shows the Y component of the raw video frame, Figure 4.13(b) shows the image
after thresholding, and Figure 4.13(c) shows the portion selected for tracking. The process of
region growing is shown in Figure 4.14(a), the image segment is shown in Figure 4.14(b), and tracking of the segment is shown in Figure 4.14(c).

Figure 4.13: Different templates: (a) Y color image, (b) thresholding, and (c) selected region becomes a point

Figure 4.14: Different templates: (a) region growing, (b) segmented region, and (c) tracking the segment

Summary: Simulation results for single object tracking and multiple object tracking were
verified on the DaVinci processor and in Matlab using suitable algorithms. A video frame is captured,
and then background subtraction followed by image segmentation using thresholding is carried out according to the algorithm presented in paper [1]. The segmented
image is processed for blob detection and identification, followed by feature extraction, and
tracking is carried out according to the algorithm presented in paper [1]. A point-based image segmentation was used, and pattern matching was carried out using the Manhattan distance to track
human beings in a video.


Chapter 5
Summary
In this dissertation, the various components of target tracking have been discussed in detail. A real-time object tracking algorithm was implemented on the TMS320DM6437 with input from a CCD
camera and output to an LCD display. Profiling of the different functions of single object tracking and
multiple object tracking was carried out with the help of CCS. The DaVinci processor helps
to reuse APIs, use the video and image processing library, and optimize coding efficiency.
An object tracking algorithm for video based on image segmentation and pattern matching was
implemented in Matlab for comparison. This algorithm was simulated in Matlab and the results of
the simulation were verified for segmentation and tracking of a person.
This dissertation has primarily focused on the implementation of video and image processing algorithms on the DaVinci processor. The aim was to implement the object tracking algorithm on the TMS320DM6437 processor. For the object tracking model, background subtraction,
image segmentation, blob detection and identification, and tracking were implemented. After successful implementation of the algorithms, tracking is carried out with the help of center-of-mass
calculation and its movement. In the case of multiple moving objects, inaccuracy was observed in the simple
algorithm; to avoid this problem, blob detection and identification were implemented. A
GMM implementation on the processor was very slow and memory consuming, so a novel
method for blob detection and identification was adopted. In this algorithm, background subtraction is implemented first, then image segmentation, followed by blob detection and identification,
then feature extraction, and finally tracking using center-of-mass calculation.
Profiling of the different functions of single object tracking was carried out with the help of CCS.
Based on the profiling results, the tracking implemented on the processor is much
faster than the Matlab implementation.

The proposed approach can be improved by using a better background subtraction model
and a better image segmentation algorithm, such as region growing or template matching. Further
improvement can be obtained by using a Kalman filter or a particle filter.


Chapter 6
APPENDIX
6.1

APPENDIX A: Real-Time Video Processing using the Matlab

Simulink interface with CCS Studio 3.3 on the DM6437 DVDP

6.1.1

Introduction

This appendix describes real-time video processing (image inversion, edge detection, and median
filtering) using the Matlab Simulink interface with CCS Studio 3.3 on the DM6437 DVDP.

6.1.2

Hardware Requirements

Texas Instruments DM6437 Digital Video Development Platform (DVDP)


PC
Video camera
LCD Display

6.1.3

Software Requirements:

Mathworks Products
MATLAB R2008a
Simulink


Image and video Processing Toolbox


Signal Processing Toolbox
Signal Processing Blockset for Simulink
Real Time Workshop (w/o Embedder Coder)
Link for Code Composer Studio
Embedded Target for TI C6000 DSP.
Texas Instruments Products
- Code Composer Studio(CCS) v3.3
Hardware Setup
1. Connect the EVM6437 eval board to the PC using the USB cable.
2. Connect power to the board.
3. Don't press any buttons on the board.
4. Ensure that all of the software products are installed.
Start by creating a new model in Simulink.
The procedure for capture/display video using the DM6437 is shown in Figure 5.28

Figure 6.1: DM6437 board

1. Open the Simulink library browser as shown in figure 5.29


2. In the new window, add the "Video Capture" and "Video Display" from the "DM6437
EVM Board Support" group of the "Target Support Package TC6" Blockset as shown in
Figure 5.30.

Figure 6.2: Open simulink lib browser 1

Figure 6.3: Video capture

3. Double-click the Video Capture block and change the Sample Time (and the Video
capture mode only if you are using the components in the PAL/NTSC mode) as shown in
Figure 5.32.
4. Double-click the Video Display block and change the Video capture mode only if you
are using the components in the PAL2 mode as shown in Figure.
5. Save the model as Video_preview.mdl as shown in figure 5.34.
6. Add the "Complement block" from the "Sources" group of the Video and Image Processing Blockset as shown in figure 5.35.
7. Connect the blocks as shown in Figure 5.34.
8. Select the target from C6000 lib that is EVM6437 as shown in figure 5.36.
9. Different Simulink (.mdl) files are shown in figures 5.37 and 5.38; these are Video_complement.mdl
and Video_edge_sobal.mdl.
10. Generate code & create project. Double-click the " Generate code &.." block.


Figure 6.4: Add video display 1

Figure 6.5: Video capture conf

11. Build the project. Double-click the Build Project block.


12. Load the project. Double-click the Load Project block.
13. Run the target. Double-click the Run block.

6.1.4

Configuration Parameters for C6000 Hardware

1. Launch Matlab
2. At the Matlab command line, type "simulink" to launch Simulink.
3. Create a new model in Simulink.
4. To open the Configuration Parameters, select Simulation -> Configuration Parameters,
as shown in figure 5.41.
5. In the Select tree, chose the Real-Time Workshop category.


Figure 6.6: Video display conf

Figure 6.7: Video preview 1

6. For Target Selection, choose the file ti c6000.tlc. Real-Time Workshop will automatically change the Make command and Template makefile selections as shown in figure
5.39.
7. Choose the Optimization category in the Select tree. For Simulation and Code generation, unselect Block reduction optimization and Implement logic signals as shown in
figure 5.40.
8. Choose the TI C6000 target selection category. Set Code generation target type to DM6437 DVDP.
9. Choose the TI C6000 compiler. Set Symbolic debugging.
10. In the Select tree, choose the Debug category. Select Verbose build here.
11. In the Select tree, choose the Solver category. Ensure that Solver is set to Fixed type / discrete.

Figure 6.8: Video image toolbox

Figure 6.9: Target selection



Figure 6.10: Video complement

Figure 6.11: Video sobal edge detection 1

Figure 6.12: Simulink conf 1


Figure 6.13: Simulink conf 2

Figure 6.14: Simulink conf 3


6.2

APPENDIX B: Edge Detection using the Video and Image Processing Library

The Texas Instruments C64x+ IMGLIB is an optimized image/video processing function library for C programmers using TMS320C64x+ devices. It includes many C-callable, assembly-optimized, general-purpose image/video processing routines. These routines are used in real-time applications where optimal execution speed is critical; using them ensures execution speeds considerably faster than equivalent code written in standard ANSI C.
In addition, by providing ready-to-use DSP functions, TI IMGLIB can significantly shorten
image/video processing application development time.
In Code Composer Studio, IMGLIB can be added by selecting Add Files to Project
from the Project menu and choosing imglib2.l64P from the list of libraries under the c64plus
folder inside the imglib_v2xx folder. Also ensure that the project is linked against the correct run-time support
library (rts64plus.lib). An alternative way to include the above two libraries in your project is to add the
following lines to your linker command file: -lrts64plus.lib -limglib2.l64P. The include directory
contains the header files that must be included in the C code when an IMGLIB2
function is called from C, and should be added to the "include path" in the CCS build options. The
Image and Video Processing Library (IMGLIB) [] has about 70 building-block kernels
that can be used for image and video processing applications. IMGLIB includes:
Compression and Decompression : DCT, motion estimation, quantization, wavelet Processing
Image Analysis: Boundary and perimeter estimation, morphological operations, edge
detection, image histogram, image thresholding
Image Filtering & Format Conversion: image convolution, image Correlation, median
filtering, color space conversion
VLIB is a software library from TI with more than 40 kernels that accelerates video analytics development and increases performance by up to 10 times. These 40+ kernels provide the ability to perform:
Background Modeling & Subtraction
Object Feature Extraction

Tracking & Recognition


Low-level Pixel Processing
Edge detection using the video and image library is shown in the figure below.

Figure 6.15: Video and image library

Step 1: Open the video preview project, video_preview.pjt.
Step 2: Add these two headers for the Sobel and median filter functions:
#include <C:\dvsdk_1_01_00_15\include\IMG_sobel_3x3_8.h>
#include <C:\dvsdk_1_01_00_15\include\IMG_median_3x3_8.h>
Step 3: Add the following calls with these parameters. frameBuffPtr points to a frame structure; the frame buffer pointer is accessed as frameBuffPtr->frame.frameBufferPtr, and 480 and 1440 are the height and the width (in bytes, i.e. 720 pixels x 2 bytes) of the interleaved frame.
FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);
IMG_sobel_3x3_8((frameBuffPtr->frame.frameBufferPtr),
                (frameBuffPtr->frame.frameBufferPtr), 480, 1440);
IMG_median_3x3_8((frameBuffPtr->frame.frameBufferPtr),
                 8, (frameBuffPtr->frame.frameBufferPtr));
FVID_exchange(hGioVpbeVid0, &frameBuffPtr);

6.3  APPENDIX C: Video Processing Tokens

6.3.1  Video standard (NTSC & PAL)

NTSC (National Television System Committee) video consists of 29.97 interlaced frames per second. Each frame consists of a total of 525 scan lines, of which 486 make up the visible raster. The remainder (the vertical blanking interval) are used for synchronization and vertical retrace.

PAL (Phase Alternating Line) is an analogue television colour encoding system used in broadcast television systems in many countries. Analogue television standards further specify frame rates, image resolution and audio modulation. For a discussion of the 625-line/50-field (25 frame) per second television standard, see 576i. The term PAL is often used informally to refer to a 625-line/50 Hz (576i) television system and to differentiate it from a 525-line/60 Hz (480i) NTSC system.
PAL specifies 786 pixels per line, 625 lines per screen, 25 frames per second, and a primary power of 220 volts. PAL delivers 625 lines at 50 half-frames per second, while NTSC delivers 525 lines of resolution at 60 half-frames per second.
Difference between NTSC and PAL
NTSC is the video system or standard used in North America and most of South America. In NTSC, 30 frames are transmitted each second, and each frame is made up of 525 individual scan lines. PAL is the predominant video system or standard used in most other regions. In PAL, 25 frames are transmitted each second, and each frame is made up of 625 individual scan lines.
720 x 576 = 414,720 pixels for 4:3 aspect ratio PAL.
720 x 480 = 345,600 pixels for 4:3 aspect ratio NTSC.
Frame and Field Rates
In video, PAL is higher in resolution (576 horizontal lines) than NTSC (480 horizontal lines), but NTSC updates the on-screen image more frequently than PAL (30 times per second versus 25 times per second). NTSC video is lower in resolution than PAL video, but because the screen updates more frequently, motion is rendered better in NTSC video than in PAL video; there is less visible jerkiness in NTSC. When video source material is transferred to DVD, it is usually transferred in the format it was created in, PAL or NTSC, and the resulting image has either higher temporal resolution (more frames per second, NTSC) or higher spatial resolution (more lines per image, PAL).
Movies: Movies the world over are shown at a frame rate of 24 frames per second; that is, 24 images are projected onto the cinema screen every second. On the issue of resolution, PAL DVDs have a compelling advantage over NTSC DVDs: 576 pixels of vertical resolution versus 480 pixels. That is a 20 percent increase in resolution for a PAL DVD as compared to an NTSC DVD, and increased resolution translates into a better-looking image.

6.3.2  Video timing (Interlaced vs Progressive)

Progressive displays paint the lines of an image consecutively, one after another; interlaced displays paint first one half of the image (the odd lines), then the other half (the even lines). On a CRT, each image is displayed starting at the top left corner of the display and moving to the right edge of the display. The scanning then moves down one line and repeats scanning left-to-right; this process is repeated until the entire screen is refreshed.
Interlacing was used to reduce the amount of information sent for each image: by transferring the odd-numbered lines, followed by the even-numbered lines, the amount of information sent for each image was halved.
A progressive display has no limit on the line-to-line changes, so it is capable of providing a higher resolution image (vertically) without flicker. LCD, plasma and computer displays, for example, are progressive.

6.3.3  Video Resolution (HD, ED, SD)

Video resolution is one of those "fuzzy" things in life. It is common to see video resolutions of 720 x 480 or 1920 x 1080. However, those are just the number of horizontal samples and vertical scan lines and do not necessarily convey the amount of useful information.
For example, an analog video signal can be sampled at 13.5 MHz to generate 720 samples per line. Sampling the same signal at 27 MHz would generate 1440 samples per line. However, only the number of samples per line has changed, not the resolution of the content. Therefore, video is usually measured using "lines of resolution": in essence, how many distinct black and white vertical lines can be seen across the display. This number is then normalized to a 1:1 display aspect ratio (dividing the number by 3/4 for a 4:3 display, or by 9/16 for a 16:9 display). Aspect ratio is the ratio of picture width to its height, e.g. 4:3 or 16:9.
Standard Definition (SD) is usually defined as having 480 or 576 interlaced active scan lines, and is commonly called "480i" or "576i" respectively. For a fixed-pixel (non-CRT) consumer display with a 4:3 aspect ratio, this translates into an active resolution of 720 x 480i or 720 x 576i. For a 16:9 aspect ratio, this translates into an active resolution of 960 x 480i or 960 x 576i.


Enhanced Definition (ED): Enhanced-definition video is usually defined as having 480 or 576 progressive active scan lines, and is commonly called "480p" or "576p" respectively. The difference between SD and ED is that SD is interlaced while ED is progressive.
High Definition (HD): is usually defined as having 720 progressive (720p) or 1080 interlaced (1080i) active scan lines. CRT-based HDTVs with a 4:3 aspect ratio and LCD/plasma 16:9 displays commonly have resolutions such as 1024 x 1024p, 1280 x 768p or 1024 x 768p. For a fixed-pixel (non-CRT) consumer display with a 16:9 aspect ratio, HD translates into an active resolution of 1280 x 720p or 1920 x 1080i, respectively.

6.3.4  Video file format (YUV420, YCbCr)

The three most popular color models are RGB (used in computer graphics); YIQ, YUV or YCbCr (used in video systems); and CMYK (used in color printing). The YUV color space is used by the PAL, NTSC and SECAM (Sequentiel Couleur Avec Mémoire, or Sequential Color with Memory) composite color video standards. The black-and-white system used only luma (Y) information; color information (U and V) was added in such a way that a black-and-white receiver would still display a normal black-and-white picture. For digital RGB values with a range of 0-255, Y has a range of 0-255, U a range of 0 to +/-112, and V a range of 0 to +/-157.
YCbCr and its close relatives Y'UV, YUV and YPbPr are designed to be efficient at encoding RGB values so they consume less space while retaining the full perceptual value. YCbCr is a scaled and offset version of the YUV color space. Y is defined to have a nominal 8-bit range of 16-235; Cb and Cr are defined to have a nominal range of 16-240. There are several YCbCr sampling formats, such as 4:4:4, 4:2:2, 4:1:1 and 4:2:0, which are described below and shown in figure 3.1.
4:4:4 YCbCr sampling format: Each sample has a Y, a Cb and a Cr value. Each component is typically 8 bits (consumer applications) or 10 bits (pro-video applications). Each sample therefore requires 24 bits (or 30 bits for pro-video applications).
4:2:2 YCbCr format: For every two horizontal Y samples, there is one Cb and one Cr sample. Each component is typically 8 bits (consumer applications) or 10 bits (pro-video applications). Each sample therefore requires 16 bits (or 20 bits for pro-video applications). To display 4:2:2 YCbCr data, it is first converted to 4:4:4 YCbCr data, using interpolation to generate the missing Cb and Cr samples.

4:2:0 YCbCr format: Rather than the horizontal-only 2:1 reduction of Cb and Cr used by 4:2:2, 4:2:0 YCbCr implements a 2:1 reduction of Cb and Cr in both the vertical and horizontal directions. It is commonly used for video compression.

Figure 6.16: 4:4:4 YCbCr, 4:2:2 YCbCr and 4:2:0 YCbCr color sampling formats, respectively

YUV12 (YCbCr 4:1:1) is also used in some consumer video and DV video compression applications. The RGB-to-YCbCr conversion equations differ for SDTV and HDTV. Gamma-corrected RGB is notated as R'G'B'. Other color formats are YIQ, YDbDr, YPbPr, xvYCC, HSV and HSI.
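As a quick back-of-the-envelope check of these sampling formats, the short C sketch below computes the storage needed for one 720 x 480 frame (the D1 frame size used on the DM6437 in this work) at 8 bits per sample for 4:4:4, 4:2:2 and 4:2:0. The program and its printed figures are only an illustrative calculation and are not part of the board software.

#include <stdio.h>

/* Bytes needed to store one 720x480 frame, 8 bits per sample,
 * for the common YCbCr sampling patterns discussed above.      */
int main(void)
{
    const long width = 720, height = 480;
    const long luma = width * height;                 /* one Y sample per pixel      */

    long chroma_444 = 2 * width * height;             /* Cb + Cr at full resolution  */
    long chroma_422 = 2 * (width / 2) * height;       /* Cb + Cr halved horizontally */
    long chroma_420 = 2 * (width / 2) * (height / 2); /* halved in both directions   */

    printf("4:4:4 frame: %ld bytes\n", luma + chroma_444); /* 1,036,800 */
    printf("4:2:2 frame: %ld bytes\n", luma + chroma_422); /*   691,200 */
    printf("4:2:0 frame: %ld bytes\n", luma + chroma_420); /*   518,400 */
    return 0;
}

The 4:2:2 figure (691,200 bytes) matches the interleaved UYVY frame handled by the capture/display code on the board.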

6.3.5  Video I/O Interface (Composite, Component, S-Video)

Composite Video
Composite video is the format of an analog television (picture only) signal before it is combined with a sound signal and modulated onto an RF carrier. In contrast to component (YPbPr) video, it contains all required video information, including color, in a single line-level signal. Like component video, composite video cables do not carry audio and are often paired with audio cables.
A video stream is composed of a Y signal for luminance, or black-and-white values, and a C signal for chrominance, or color. The Y signal provides brightness and contrast, allowing for deep rich blacks and startlingly bright whites. The quality of this signal is especially evident in low-lit scenes, where a degraded signal will translate to faded blacks and muted whites, making it difficult to differentiate scenery or action. The color signal (RGB for red, green and blue) carries the information needed to create changing hues. Composite video is so named because the Y/C signals are compressed and channelled through a single wire, to be separated by a comb filter inside the television set; that is, the color video signal is a linear combination of the luminance of the picture and a modulated subcarrier that carries the chrominance, or color, information, a combination of hue and saturation.
S-Video
The RCA phono connector or BNC connector (pro-video market) transfers a composite NTSC or PAL video signal, made by adding the intensity (Y) and color (C) video signals together. The television then has to separate these Y and C video signals in order to display the picture. The problem is that the Y/C separation process at the decoder side is never perfect. Many video components now support a 4-pin S1 S-Video connector. This connector keeps the intensity (Y) and color (C) video signals separate, eliminating the Y/C separation process in the TV. As a result, the picture is sharper and has less noise.
Separate Video, more commonly known as S-Video and Y/C, is often referred to by JVC (who introduced the DIN connector) as both an S-VHS connector and as Super Video. It is an analog video transmission scheme in which video information is encoded on two channels: luma (luminance, intensity, gray, "Y") and chroma (color, "C").
More recently, S-Video has been superseded by component video, which isolates not only the Y signal on its own cable, but the red and blue signals as well, while green values are inferred from the other data streams. Component video requires three cables plus audio cables, for a total of five cables. The latest enhancement in audiovisual interfaces is the High-Definition Multimedia Interface (HDMI), a true digital interface that combines video and audio into a single cable while preserving perfect signal integrity. This all-digital standard is the most desirable interface currently available.


6.4  APPENDIX D: YUV to RGB Conversion

6.4.1  YUV format

Digital video is often encoded in a YUV format. An RGB color is encoded using three values:
red, green, and blue. Although RGB is a common way to represent colors, other coordinate
systems are possible. The term YUV refers to a family of color spaces, all of which encode
brightness information separately from color information. Like RGB, YUV uses three values
to represent any color. These values are termed Y, U, and V. (In fact, this use of the term
"YUV" is technically inaccurate. In computer video, the term YUV almost always refers to one
particular color space named YCbCr, discussed in chapter 2. However, YUV is often used as a
general term for any color space that works along the same principles as YCbCr.)
The Y' component, also called luma, represents the brightness value of the color. The prime symbol (') is used to differentiate luma from a closely related value, luminance, which is designated Y. Luminance is derived from linear RGB values, whereas luma is derived from non-linear (gamma-corrected) RGB values. Luminance is a closer measure of true brightness, but luma is more practical to use for technical reasons. The prime symbol is frequently omitted, but YUV color spaces always use luma, not luminance.
Y' = 0.299 R' + 0.587 G' + 0.114 B'                                (6.1)

This formula reflects the fact that the human eye is more sensitive to certain wavelengths of
light than others, which affects the perceived brightness of a color. Blue light appears dimmest,
green appears brightest, and red is somewhere in between. This formula also reflects the physical characteristics of the phosphors used in early televisions. A newer formula, taking into
account modern television technology, is used for high-definition television:
Y' = 0.2125 R' + 0.7154 G' + 0.0721 B'                             (6.2a)

The luma equation for standard-definition television is defined in a specification named ITU-R BT.601. For high-definition television, the relevant specification is ITU-R BT.709. The U and V components, also called chroma values or color-difference values, are derived by subtracting the Y' value from the blue and red components of the original RGB color:

U = B' - Y'                                                        (6.2b)

V = R' - Y'                                                        (6.2c)
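The following small C sketch applies equations (6.1)-(6.2c) to gamma-corrected RGB values in the range [0, 1]. The function name rgb_to_yuv_analog and the sample input are invented for illustration; this is only a sketch of the formulas above, not board code.

#include <stdio.h>

/* Equations (6.1)-(6.2c) applied to gamma-corrected RGB values in [0,1]. */
static void rgb_to_yuv_analog(double r, double g, double b,
                              double *y, double *u, double *v)
{
    *y = 0.299 * r + 0.587 * g + 0.114 * b;  /* luma, BT.601 weights   */
    *u = b - *y;                             /* blue color difference  */
    *v = r - *y;                             /* red color difference   */
}

int main(void)
{
    double y, u, v;
    rgb_to_yuv_analog(1.0, 0.0, 0.0, &y, &u, &v);    /* pure red */
    printf("Y'=%.3f U=%.3f V=%.3f\n", y, u, v);      /* 0.299 -0.299 0.701 */
    return 0;
}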


Benefits of YUV
Analog television uses YUV partly for historical reasons. Analog color television signals were
designed to be backward compatible with black-and-white televisions. The color television
signal carries the chroma information (U and V) superimposed onto the luma signal. Black-and-white televisions ignore the chroma and display the combined signal as a grayscale image.
(The signal is designed so that the chroma does not significantly interfere with the luma signal.)
Color televisions can extract the chroma and convert the signal back to RGB.
YUV has another advantage that is more relevant. The human eye is less sensitive to
changes in hue than changes in brightness. As a result, an image can have less chroma information than luma information without sacrificing the perceived quality of the image. For example,
it is common to sample the chroma values at half the horizontal resolution of the luma samples.
In other words, for every two luma samples in a row of pixels, there is one U sample and one
V sample. Assuming that 8 bits are used to encode each value, a total of 4 bytes are needed for
every two pixels (two Y, one U, and one V), for an average of 16 bits per pixel, or a third less
than the equivalent 24-bit RGB encoding.
YUV is not inherently any more compact than RGB. Unless the chroma is downsampled,
a YUV pixel is the same size as an RGB pixel. Also, the conversion from RGB to YUV is not
lossy. If there is no downsampling, a YUV pixel can be converted back to RGB with no loss
of information. Downsampling makes a YUV image smaller and also loses some of the color
information. If performed correctly, however, the loss is not perceptually significant.
YUV in Computer Video
The formulas listed previously for YUV are not the exact conversions used in digital video.
Digital video generally uses a form of YUV called YCbCr. Essentially, YCbCr works by
scaling the YUV components to the following ranges: Y' to [16...235] and Cb, Cr to [16...240]. These ranges assume 8 bits of precision for the YCbCr components. Here is the exact derivation of YCbCr, using the BT.601 definition of luma:
Start with RGB values in the range [0...1]. In other words, pure black is 0 and pure white
is 1. Importantly, these are non-linear (gamma corrected) RGB values. Calculate the luma.
For BT.601, Y = 0.299R + 0.587G + 0.114B, as described earlier. Calculate the intermediate
chroma difference values (B' - Y') and (R' - Y'). These values have a range of +/-0.886 for (B' - Y') and +/-0.701 for (R' - Y'). Scale the chroma difference values as follows:
Pb = (0.5 / (1 - 0.114)) * (B' - Y')                               (6.3a)

Pr = (0.5 / (1 - 0.299)) * (R' - Y')                               (6.3b)

These scaling factors are designed to give both values the same numerical range, +/-0.5. Together with luma, they define a color space named YPbPr. This color space is used in analog component video. Scale the YPbPr values to get the final YCbCr values:

Y' = 16 + 219 * Y'                                                 (6.4a)

Cb = 128 + 224 * Pb                                                (6.4b)

Cr = 128 + 224 * Pr                                                (6.4c)

The following table shows RGB and YCbCr values for various colors, again using the BT.601
definition of luma.
Table 6.1: RGB and YCbCr values for various colors using BT.601

Color      R    G    B    Y'   Cb   Cr
Black      0    0    0    16   128  128
Red        255  0    0    81   90   240
Green      0    255  0    145  54   34
Blue       0    0    255  41   240  110
Cyan       0    255  255  170  166  16
Magenta    255  0    255  106  202  222
Yellow     255  255  0    210  16   146
White      255  255  255  235  128  128

As this table shows, Cb and Cr do not correspond to intuitive ideas about color. For example, pure white and pure black both contain neutral levels of Cb and Cr (128). The highest and lowest values for Cb are blue and yellow, respectively. For Cr, the highest and lowest values are red and cyan. Note that, for the purposes of this appendix, the term U is equivalent to Cb and the term V is equivalent to Cr.

6.4.2  8-Bit YUV Formats for Video

In this case, 8 bits per pixel location are used to encode the Y channel, and 8 bits per sample are used to encode each U or V chroma sample. However, most YUV formats use fewer than 24 bits per pixel on average, because they contain fewer samples of U and V than of Y. This appendix does not cover YUV formats with 10-bit or higher Y channels.
Chroma channels can have a lower sampling rate than the luma channel, without any
dramatic loss of perceptual quality. A notation called the "A:B:C" notation is used to describe
how often U and V are sampled relative to Y:
4:4:4 formats, 32 bits per pixel - no downsampling of the chroma channels. AYUV is a YUV 4:4:4 format in which each pixel is encoded as four consecutive bytes.
4:2:2 formats, 16 bits per pixel - 2:1 horizontal downsampling, with no vertical downsampling. Every scan line contains four Y samples for every two U or V samples.
4:2:0 formats, 16 or 12 bits per pixel - 2:1 horizontal downsampling, with 2:1 vertical downsampling. IMC1 and IMC3 are 16-bpp 4:2:0 formats, while IMC2, IMC4, YV12 and NV12 are 12-bpp 4:2:0 formats.
4:1:1 formats, 12 bits per pixel - 4:1 horizontal downsampling, with no vertical downsampling. Every scan line contains four Y samples for each U and V sample. 4:1:1 sampling is less common than the other formats and is not discussed in detail here.
The following diagrams show how chroma is sampled for each of the downsampling rates. Luma samples are represented by a cross, and chroma samples are represented by a circle.
Two 4:2:2 formats are recommended, with the following FOURCC codes: YUY2 and UYVY.
Both are packed formats, where each macropixel is two pixels encoded as four consecutive
bytes. This results in horizontal downsampling of the chroma by a factor of two. In YUY2

format, the data can be treated as an array of unsigned char values, where the first byte contains
the first Y sample, the second byte contains the first U (Cb) sample, the third byte contains the
second Y sample, and the fourth byte contains the first V (Cr) sample, as shown in the following
diagram.

Figure 6.17: YUY2 memory layout

If the image is addressed as an array of little-endian WORD values, the first WORD contains the first Y sample in the least significant bits (LSBs) and the first U (Cb) sample in the
most significant bits (MSBs). The second WORD contains the second Y sample in the LSBs and
the first V (Cr) sample in the MSBs.
YUY2 is the preferred 4:2:2 pixel format for DirectX Video Acceleration (DirectX VA). It is expected to be an intermediate-term requirement for DirectX VA accelerators supporting 4:2:2 video.
The UYVY format is the same as the YUY2 format except that the byte order is reversed; that is, the chroma and luma bytes are flipped (Figure 6.18). If the image is addressed as an array of two little-endian WORD values, the first WORD contains U in the LSBs and Y0 in the MSBs, and the second WORD contains V in the LSBs and Y1 in the MSBs.

Figure 6.18: UYVY memory layout

Figure 6.19: RGB2UYVY
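To make the byte layout concrete, the following minimal C sketch unpacks one UYVY macropixel into its two luma and shared chroma samples. The function name unpack_uyvy and the test values are invented for this illustration and are not part of the board code, but the byte order follows the UYVY layout described above.

#include <stdio.h>

/* Unpack one UYVY macropixel (4 bytes -> 2 pixels sharing Cb/Cr),
 * following the byte order described above: U0 Y0 V0 Y1.         */
static void unpack_uyvy(const unsigned char *mp,
                        unsigned char *y0, unsigned char *y1,
                        unsigned char *cb, unsigned char *cr)
{
    *cb = mp[0];   /* shared Cb (U)  */
    *y0 = mp[1];   /* first luma     */
    *cr = mp[2];   /* shared Cr (V)  */
    *y1 = mp[3];   /* second luma    */
}

int main(void)
{
    /* one macropixel of mid-grey video (Y=128, neutral chroma) */
    const unsigned char buf[4] = { 128, 128, 128, 128 };
    unsigned char y0, y1, cb, cr;
    unpack_uyvy(buf, &y0, &y1, &cb, &cr);
    printf("Y0=%d Y1=%d Cb=%d Cr=%d\n", y0, y1, cb, cr);
    return 0;
}

The extract_uyvy() routine in Appendix E performs the same unpacking over a whole 720 x 480 frame.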


Surface Definitions
This section describes the 8-bit YUV formats that are recommended for video rendering. These
fall into several categories:

Figure 6.20: YUV sampling

First, you should be aware of the following concepts:


Surface origin: For the YUV formats described in this article, the origin (0,0) is always
the top left corner of the surface.
Stride: The stride of a surface, sometimes called the pitch, is the width of the surface in
bytes. Given a surface origin at the top left corner, the stride is always positive.
Alignment: The alignment of a surface is at the discretion of the graphics display driver.
Packed format versus planar format. YUV formats are divided into packed formats and planar
formats. In a packed format, the Y, U, and V components are stored in a single array. Pixels are
organized into groups of macropixels, whose layout depends on the format. In a planar format,
the Y, U, and V components are stored as three separate planes.
Picture Aspect Ratio
Picture aspect ratio defines the shape of the displayed video image. Picture aspect ratio is
notated X:Y, where X:Y is the ratio of picture width to picture height. Most video standards use
either 4:3 or 16:9 picture aspect ratio. The 16:9 aspect ratio is commonly called widescreen.
Picture aspect ratio is also called display aspect ratio (DAR).

Figure 6.21: Picture aspect ratio

Pixel Aspect Ratio


Pixel aspect ratio (PAR) measures the shape of a pixel. When a digital image is captured, the
image is sampled both vertically and horizontally, resulting in a rectangular array of quantized
samples, called pixels or pels. Pixel aspect ratio also applies to the display device. The physical
shape of the display device and the physical pixel resolution (across and down) determine the
PAR of the display device. Computer monitors generally use square pixels. If the image PAR
and the display PAR do not match, the image must be scaled in one dimension, either vertically
or horizontally, in order to display correctly. The following formula relates PAR, display aspect
ratio (DAR), and image size in pixels:

DAR = PAR x (image_width_in_pixels / image_height_in_pixels)       (6.5)

Here is a real-world example: NTSC-M analog video contains 480 scan lines in the active image area. ITU-R Rec. BT.601 specifies a horizontal sampling rate of 704 visible pixels per line, yielding a digital image with 704 x 480 pixels. The intended picture aspect ratio is 4:3, yielding a PAR of 10:11.
DAR: 4:3
Width in pixels: 704
Height in pixels: 480
PAR: 10/11
where 4/3 = (704/480) x (10/11).


To display this image correctly on a display device with square pixels, you must scale
either the width by 10/11 or the height by 11/10.

Figure 6.22: Pixel aspect ratio
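As a small numerical check of equation (6.5), the sketch below recomputes the DAR and the square-pixel width for the 704 x 480 example above. The code is purely illustrative.

#include <stdio.h>

/* Equation (6.5): DAR = PAR * width / height, checked for the
 * 704x480 NTSC example above (PAR = 10/11, expected DAR = 4/3). */
int main(void)
{
    const double width = 704.0, height = 480.0;
    const double par = 10.0 / 11.0;
    const double dar = par * width / height;

    printf("DAR = %.4f (4/3 = %.4f)\n", dar, 4.0 / 3.0);
    /* To show the image on square pixels, scale the width by PAR: 704 * 10/11 = 640 */
    printf("square-pixel width = %.0f\n", width * par);
    return 0;
}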

6.4.3  Color Space Conversion
Conversion from one YCbCr space to another requires the following steps.
1. Inverse quantization: Convert the YCbCr representation to a YPbPr representation, using the source nominal range.
2. Upsampling: Convert the sampled chroma values to 4:4:4 by interpolating chroma values.
3. YUV to RGB conversion: Convert from YPbPr to non-linear RGB, using the source
transfer matrix.
4. Inverse transfer function: Convert non-linear RGB to linear RGB, using the inverse of
the transfer function.
5. RGB color space conversion: Use the color primaries to convert from the source RGB
space to the target RGB space.
6. Transfer function: Convert linear RGB to non-linear RGB, using the target transfer
function.

7. RGB to YUV conversion: Convert RGB to YPbPr, using the target transfer matrix.
8. Downsampling: Convert 4:4:4 to 4:2:2, 4:2:0, or 4:1:1 by filtering the chroma values.
9. Quantization: Convert YPbPr to YCbCr, using the target nominal range.
Steps 1 to 4 occur in the source color space, and steps 6 to 9 occur in the target color space.
In the actual implementation, intermediate steps can be approximated and adjacent steps can be
combined. There is generally a trade-off between accuracy and computational cost.
For example, converting from BT.601 to BT.709 requires the following stages:
1. Inverse quantization: YCbCr(601) to YPbPr(601)
2. Upsampling: YPbPr(601)
3. YUV to RGB: YPbPr(601) to RGB(601)
4. Inverse transfer function: RGB(601) to RGB(601)
5. RGB color space conversion: RGB(601) to RGB(709)
6. Transfer function: RGB(709) to RGB(709)
7. RGB to YUV: RGB(709) to YPbPr(709)
8. Downsampling: YPbPr(709)
9. Quantization: YPbPr(709) to YCbCr(709)
Converting RGB888 to YUV 4:4:4
In the case of computer RGB input and 8-bit BT.601 YUV output, we believe that the formulas
given in the previous section can be reasonably approximated by the following:
Y = (( 66 * R + 129 * G +  25 * B + 128) >> 8) +  16               (6.6a)

U = ((-38 * R -  74 * G + 112 * B + 128) >> 8) + 128               (6.6b)

V = ((112 * R -  94 * G -  18 * B + 128) >> 8) + 128               (6.6c)

These formulas produce 8-bit results using coefficients that require no more than 8 bits of
(unsigned) precision. Intermediate results will require up to 16 bits of precision.
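A minimal C rendering of the integer approximation in equations (6.6a)-(6.6c) is sketched below. The function name rgb_to_ycbcr and the test value are invented for illustration; the shift-based arithmetic simply follows the formulas above.

#include <stdio.h>

/* Integer approximation of equations (6.6a)-(6.6c): 8-bit computer RGB
 * in, 8-bit BT.601 YCbCr out.  Intermediate values fit in 16 bits.     */
static void rgb_to_ycbcr(int r, int g, int b, int *y, int *cb, int *cr)
{
    *y  = (( 66 * r + 129 * g +  25 * b + 128) >> 8) +  16;
    *cb = ((-38 * r -  74 * g + 112 * b + 128) >> 8) + 128;
    *cr = ((112 * r -  94 * g -  18 * b + 128) >> 8) + 128;
}

int main(void)
{
    int y, cb, cr;
    rgb_to_ycbcr(255, 255, 255, &y, &cb, &cr);       /* pure white */
    printf("Y=%d Cb=%d Cr=%d\n", y, cb, cr);         /* 235 128 128, as in Table 6.1 */
    return 0;
}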

Converting 8-bit YUV to RGB888


From the original RGB-to-YUV formulas, one can derive the following relationships for BT.601.
Y = round( 0.256788 * R + 0.504129 * G + 0.097906 * B) +  16       (6.7a)

U = round(-0.148223 * R - 0.290993 * G + 0.439216 * B) + 128       (6.7b)

V = round( 0.439216 * R - 0.367788 * G - 0.071427 * B) + 128       (6.7c)

Therefore, given C, D and E after subtracting the constants:
C = Y - 16
D = U - 128
E = V - 128


the formulas to convert YUV to RGB can be derived as follows:
R = clip(round(1.164383 * C + 1.596027 * E))                       (6.8a)

G = clip(round(1.164383 * C - 0.391762 * D - 0.812968 * E))        (6.8b)

B = clip(round(1.164383 * C + 2.017232 * D))                       (6.8c)

where clip() denotes clipping to a range of [0..255]. We believe these formulas can be
reasonably approximated by the following:

R = clip((298 * C + 409 * E + 128) >> 8)                           (6.9a)

G = clip((298 * C - 100 * D - 208 * E + 128) >> 8)                 (6.9b)

B = clip((298 * C + 516 * D + 128) >> 8)                           (6.9c)

These formulas use some coefficients that require more than 8 bits of precision to produce
each 8-bit result, and intermediate results will require more than 16 bits of precision.
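The integer approximation in equations (6.9a)-(6.9c) can likewise be written as a small C routine, sketched below. The function names ycbcr_to_rgb and clip are invented for this illustration; the test input is the "blue" row of Table 6.1.

#include <stdio.h>

static int clip(int x)               /* clamp to the 8-bit range [0..255] */
{
    return x < 0 ? 0 : (x > 255 ? 255 : x);
}

/* Integer approximation of equations (6.9a)-(6.9c): BT.601 YCbCr in,
 * 8-bit computer RGB out.                                             */
static void ycbcr_to_rgb(int y, int cb, int cr, int *r, int *g, int *b)
{
    int c = y  - 16;
    int d = cb - 128;
    int e = cr - 128;

    *r = clip((298 * c           + 409 * e + 128) >> 8);
    *g = clip((298 * c - 100 * d - 208 * e + 128) >> 8);
    *b = clip((298 * c + 516 * d           + 128) >> 8);
}

int main(void)
{
    int r, g, b;
    ycbcr_to_rgb(41, 240, 110, &r, &g, &b);      /* "blue" row of Table 6.1 */
    printf("R=%d G=%d B=%d\n", r, g, b);         /* expect approximately 0 0 255 */
    return 0;
}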
To convert 4:2:0 or 4:2:2 YUV to RGB, we recommend converting the YUV data to 4:4:4
YUV, and then converting from 4:4:4 YUV to RGB. The sections that follow present some
methods for converting 4:2:0 and 4:2:2 formats to 4:4:4.
Conversion between RGB and 4:4:4 YUV
We first describe conversion between RGB and 4:4:4 YUV. To convert 4:2:0 or 4:2:2 YUV to
RGB, we recommend converting the YUV data to 4:4:4 YUV, and then converting from 4:4:4
YUV to RGB. The AYUV format, which is a 4:4:4 format, uses 8 bits each for the Y, U, and V
samples. YUV can also be defined using more than 8 bits per sample for some applications.


Two dominant YUV conversions from RGB have been defined for digital video. Both are
based on the specification known as ITU-R Recommendation BT.709. The first conversion is
the older YUV form defined for 50-Hz use in BT.709. It is the same as the relation specified
in ITU-R Recommendation BT.601, also known by its older name, CCIR 601. It should be
considered the preferred YUV format for standard-definition TV resolution (720 x 576) and
lower-resolution video. It is characterized by the values of two constants, Kr = 0.299 and Kb = 0.114.
Converting 4:2:0 YUV to 4:2:2 YUV
Converting 4:2:0 YUV to 4:2:2 YUV requires vertical upconversion by a factor of two. This
section describes an example method for performing the upconversion. The method assumes
that the video pictures are progressive scan.
Note: the 4:2:0 to 4:2:2 interlaced-scan conversion process presents atypical problems and is difficult to implement. This appendix does not address the issue of converting interlaced scan from 4:2:0 to 4:2:2.
Converting 4:2:2 YUV to 4:4:4 YUV
Converting 4:2:2 YUV to 4:4:4 YUV requires horizontal upconversion by a factor of two. The method described previously for vertical upconversion can also be applied to horizontal upconversion. For MPEG-2 and ITU-R BT.601 video, this method will produce samples with the correct phase alignment.
Converting 4:2:0 YUV to 4:4:4 YUV
To convert 4:2:0 YUV to 4:4:4 YUV, you can simply follow the two methods described
previously. Convert the 4:2:0 image to 4:2:2, and then convert the 4:2:2 image to 4:4:4. You
can also switch the order of the two upconversion processes, as the order of operation does not
really matter to the visual quality of the result.
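As an illustration of horizontal chroma up-conversion, the sketch below doubles a row of chroma samples using simple linear interpolation. This is only an example filter assumed for illustration; it is not necessarily the exact interpolation method referred to in the preceding paragraphs.

#include <stdio.h>

/* A simple horizontal 2x chroma up-conversion (4:2:2 -> 4:4:4) using
 * linear interpolation between neighbouring chroma samples.           */
static void upsample_chroma_2x(const unsigned char *in, int in_len,
                               unsigned char *out)
{
    int i;
    for (i = 0; i < in_len; i++) {
        out[2 * i] = in[i];                       /* co-sited sample       */
        if (i + 1 < in_len)                       /* interpolated sample   */
            out[2 * i + 1] = (unsigned char)((in[i] + in[i + 1] + 1) / 2);
        else
            out[2 * i + 1] = in[i];               /* replicate at the edge */
    }
}

int main(void)
{
    const unsigned char cb422[4] = { 100, 120, 140, 160 };
    unsigned char cb444[8];
    int i;
    upsample_chroma_2x(cb422, 4, cb444);
    for (i = 0; i < 8; i++)
        printf("%d ", cb444[i]);                  /* 100 110 120 130 140 150 160 160 */
    printf("\n");
    return 0;
}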
Summary
Demo code for video capture, display, encoder and decoder, video copy and video preview was implemented on the DM6437 board. Image inversion and edge detection using the Sobel operator were also implemented on the DaVinci board.

6.5  APPENDIX E: Single object tracking code on the DM6437

/* * ======== video_preview.c ======== */

/* runtime include files */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
/* BIOS include files */
#include <std.h>
#include <gio.h>
#include <tsk.h>
#include <trc.h>
/* PSP include files */
#include <psp_i2c.h>
#include <psp_vpfe.h>
#include <psp_vpbe.h>
#include <fvid.h>
#include <psp_tvp5146_extVidDecoder.h>
/* CSL include files */
#include <soc.h>
#include <cslr_sysctl.h>
/* BSL include files */
#include <evmdm6437.h>
#include <evmdm6437_dip.h>
/* Video Params Defaults */
#include <vid_params_default.h>

/* This example supports either PAL or NTSC depending on position of JP1 */
#define STANDARD_PAL   0
#define STANDARD_NTSC  1
#define FRAME_BUFF_CNT 6

static int read_JP1(void);

static CSL_SysctlRegsOvly sysModuleRegs =
    (CSL_SysctlRegsOvly)CSL_SYS_0_REGS;

//*******************************************************
// USER DEFINED FUNCTIONS
//*******************************************************
void extract_uyvy(void *currentFrame);
void write_uyvy(void *currentFrame);
void tracking();
void copy_frame();
void frame_substract();

//*******************************************************
// VARIABLE ARRAYS
//*******************************************************
/* Y, U and V planes: I_y/I_u/I_v hold the output frame, I_*1 the current
 * capture, I_*2 the previous capture, I_*3 the frame difference; the
 * I_*4 planes are declared but not used by the functions listed here.   */
unsigned char I_y[480][720];
unsigned char I_u[480][360];
unsigned char I_v[480][360];
unsigned char I_y1[480][720];
unsigned char I_u1[480][360];
unsigned char I_v1[480][360];
unsigned char I_y2[480][720];
unsigned char I_u2[480][360];
unsigned char I_v2[480][360];
unsigned char I_y3[480][720];
unsigned char I_u3[480][360];
unsigned char I_v3[480][360];
unsigned char I_y4[480][720];
unsigned char I_u4[480][360];
unsigned char I_v4[480][360];
////////////////////////
/* * ======== main ======== */
void main() {
    printf("Video Preview Application\n");
    fflush(stdout);
    /* Initialize BSL library to read jumper switches: */
    EVMDM6437_DIP_init();
    /* VPSS PinMuxing:
     *   CI10SEL  - No CI[1:0]
     *   CI32SEL  - No CI[3:2]
     *   CI54SEL  - No CI[5:4]
     *   CI76SEL  - No CI[7:6]
     *   CFLDSEL  - No C_FIELD
     *   CWENSEL  - No C_WEN
     *   HDVSEL   - CCDC HD and VD enabled
     *   CCDCSEL  - CCDC PCLK, YI[7:0] enabled
     *   AEAW     - EMIFA full address mode
     *   VPBECKEN - VPBECLK enabled
     *   RGBSEL   - No digital outputs
     *   CS3SEL   - LCD_OE/EM_CS3 disabled
     *   CS4SEL   - CS4/VSYNC enabled
     *   CS5SEL   - CS5/HSYNC enabled
     *   VENCSEL  - VCLK,YOUT[7:0],COUT[7:0] enabled
     *   AEM      - 8bEMIF + 8bCCDC + 8 to 16bVENC */
    sysModuleRegs->PINMUX0 &= (0x005482A3u);
    sysModuleRegs->PINMUX0 |= (0x005482A3u);
    /* PCIEN = 0: PINMUX1 - Bit 0 */
    sysModuleRegs->PINMUX1 &= (0xFFFFFFFEu);
    sysModuleRegs->VPSSCLKCTL = (0x18u);
    return;
}
/* * ======== video_preview ======== */
void video_preview(void) {
    FVID_Frame *frameBuffTable[FRAME_BUFF_CNT];
    FVID_Frame *frameBuffPtr;
    GIO_Handle hGioVpfeCcdc;
    GIO_Handle hGioVpbeVid0;
    GIO_Handle hGioVpbeVenc;
    int status = 0;
    int result;
    int i;
    int standard;
    int width;
    int height;

    /* Set video display/capture driver params to defaults */
    PSP_VPFE_TVP5146_ConfigParams tvp5146Params =
        VID_PARAMS_TVP5146_DEFAULT;
    PSP_VPFECcdcConfigParams vpfeCcdcConfigParams =
        VID_PARAMS_CCDC_DEFAULT_D1;
    PSP_VPBEOsdConfigParams vpbeOsdConfigParams =
        VID_PARAMS_OSD_DEFAULT_D1;
    PSP_VPBEVencConfigParams vpbeVencConfigParams;

    standard = read_JP1();
    /* Update display/capture params based on video standard (PAL/NTSC) */
    if (standard == STANDARD_PAL) {
        width  = 720;
        height = 576;
        vpbeVencConfigParams.displayStandard =
            PSP_VPBE_DISPLAY_PAL_INTERLACED_COMPOSITE;
    }
    else {
        width  = 720;
        height = 480;
        vpbeVencConfigParams.displayStandard =
            PSP_VPBE_DISPLAY_NTSC_INTERLACED_COMPOSITE;
    }
    vpfeCcdcConfigParams.height = vpbeOsdConfigParams.height = height;
    vpfeCcdcConfigParams.width  = vpbeOsdConfigParams.width  = width;
    vpfeCcdcConfigParams.pitch  = vpbeOsdConfigParams.pitch  = width * 2;

    /* init the frame buffer table */
    for (i = 0; i < FRAME_BUFF_CNT; i++) {
        frameBuffTable[i] = NULL;
    }
    /* create video input channel */
    if (status == 0) {
        PSP_VPFEChannelParams vpfeChannelParams;
        vpfeChannelParams.id     = PSP_VPFE_CCDC;
        vpfeChannelParams.params =
            (PSP_VPFECcdcConfigParams *)&vpfeCcdcConfigParams;
        hGioVpfeCcdc = FVID_create("/VPFE0", IOM_INOUT, NULL,
                                   &vpfeChannelParams, NULL);
        status = (hGioVpfeCcdc == NULL ? -1 : 0);
    }
    /* create video output channel, plane 0 */
    if (status == 0) {
        PSP_VPBEChannelParams vpbeChannelParams;
        vpbeChannelParams.id     = PSP_VPBE_VIDEO_0;
        vpbeChannelParams.params =
            (PSP_VPBEOsdConfigParams *)&vpbeOsdConfigParams;
        hGioVpbeVid0 = FVID_create("/VPBE0", IOM_INOUT, NULL,
                                   &vpbeChannelParams, NULL);
        status = (hGioVpbeVid0 == NULL ? -1 : 0);
    }
    /* create video output channel, venc */
    if (status == 0) {
        PSP_VPBEChannelParams vpbeChannelParams;
        vpbeChannelParams.id     = PSP_VPBE_VENC;
        vpbeChannelParams.params =
            (PSP_VPBEVencConfigParams *)&vpbeVencConfigParams;
        hGioVpbeVenc = FVID_create("/VPBE0", IOM_INOUT, NULL,
                                   &vpbeChannelParams, NULL);
        status = (hGioVpbeVenc == NULL ? -1 : 0);
    }
    /* configure the TVP5146 video decoder */
    if (status == 0) {
        result = FVID_control(hGioVpfeCcdc,
            VPFE_ExtVD_BASE + PSP_VPSS_EXT_VIDEO_DECODER_CONFIG,
            &tvp5146Params);
        status = (result == IOM_COMPLETED ? 0 : -1);
    }
    /* allocate some frame buffers */
    if (status == 0) {
        for (i = 0; i < FRAME_BUFF_CNT && status == 0; i++) {
            result = FVID_allocBuffer(hGioVpfeCcdc, &frameBuffTable[i]);
            status = (result == IOM_COMPLETED &&
                      frameBuffTable[i] != NULL ? 0 : -1);
        }
    }
    /* prime up the video capture channel */
    if (status == 0) {
        FVID_queue(hGioVpfeCcdc, frameBuffTable[0]);
        FVID_queue(hGioVpfeCcdc, frameBuffTable[1]);
        FVID_queue(hGioVpfeCcdc, frameBuffTable[2]);
    }
    /* prime up the video display channel */
    if (status == 0) {
        FVID_queue(hGioVpbeVid0, frameBuffTable[3]);
        FVID_queue(hGioVpbeVid0, frameBuffTable[4]);
        FVID_queue(hGioVpbeVid0, frameBuffTable[5]);
    }
    /* grab first buffer from input queue */
    if (status == 0) {
        FVID_dequeue(hGioVpfeCcdc, &frameBuffPtr);
    }
    /* loop forever performing video capture, tracking and display */
    while (status == 0) {
        /* grab a fresh video input frame */
        FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);
        extract_uyvy(frameBuffPtr->frame.frameBufferPtr); /* UYVY -> Y,U,V planes            */
        copy_frame();                                     /* copy capture to output planes   */
        frame_substract();                                /* difference with previous frame  */
        tracking();                                       /* locate object and draw its box  */
        write_uyvy(frameBuffPtr->frame.frameBufferPtr);   /* Y,U,V planes -> UYVY            */
        /* display the video frame */
        FVID_exchange(hGioVpbeVid0, &frameBuffPtr);
    }
}
/* * ======== read_JP1 ========
 * Read the PAL/NTSC jumper.
 * Retry, as I2C sometimes fails: */
static int read_JP1(void)
{
    int jp1 = -1;
    while (jp1 == -1) {
        jp1 = EVMDM6437_DIP_get(JP1_JUMPER);
        TSK_sleep(1);
    }
    return (jp1);
}
void extract_uyvy(void *currentFrame)
{   int r, c;
    /* split the interleaved UYVY buffer (720x480, 2 bytes/pixel) into planes */
    for (r = 0; r < 480; r++) {
        for (c = 0; c < 360; c++) {
            I_u1[r][c]     = *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 0);
            I_y1[r][2*c]   = *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 1);
            I_v1[r][c]     = *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 2);
            I_y1[r][2*c+1] = *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 3);
        }
    }
}

void write_uyvy(void *currentFrame)
{   int r, c;
    /* pack the output planes back into the interleaved UYVY frame buffer */
    for (r = 0; r < 480; r++) {
        for (c = 0; c < 360; c++) {
            *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 0) = I_u[r][c];
            *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 1) = I_y[r][2*c];
            *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 2) = I_v[r][c];
            *(((unsigned char *)currentFrame) + r*720*2 + 4*c + 3) = I_y[r][2*c+1];
        }
    }
}

void copy_frame()
{   int r, c;
    /* copy the captured planes into the output planes that will be displayed */
    for (r = 0; r < 480; r++) {
        for (c = 0; c < 360; c++) {
            I_u[r][c]     = I_u1[r][c];
            I_y[r][2*c]   = I_y1[r][2*c];
            I_v[r][c]     = I_v1[r][c];
            I_y[r][2*c+1] = I_y1[r][2*c+1];
        }
    }
}

void frame_substract()
{   int r, c;
    /* difference between the current (I_*1) and previous (I_*2) frame */
    for (r = 0; r < 480; r++) {
        for (c = 0; c < 360; c++) {
            I_u3[r][c]     = I_u1[r][c]     - I_u2[r][c];
            I_y3[r][2*c]   = I_y1[r][2*c]   - I_y2[r][2*c];
            I_v3[r][c]     = I_v1[r][c]     - I_v2[r][c];
            I_y3[r][2*c+1] = I_y1[r][2*c+1] - I_y2[r][2*c+1];
        }
    }
    /* save the current frame as the previous frame for the next iteration */
    for (r = 0; r < 480; r++) {
        for (c = 0; c < 360; c++) {
            I_u2[r][c]     = I_u1[r][c];
            I_y2[r][2*c]   = I_y1[r][2*c];
            I_v2[r][c]     = I_v1[r][c];
            I_y2[r][2*c+1] = I_y1[r][2*c+1];
        }
    }
}

void tracking()
{   int m, n, p, q;
    int cent_x, cent_y, cent_z;
    int centroid_x, centroid_y;

    cent_x = 0;
    cent_y = 0;
    cent_z = 0;
    /* Threshold the difference frame: pixels whose Y, U and V differences are
     * all small (below 45) or wrapped-around negative (above 200) are treated
     * as background.                                                          */
    for (m = 0; m < 480; m++) {
        for (n = 0; n < 360; n++) {
            if ((I_u3[m][n] < 45     || I_u3[m][n] > 200)     &&
                (I_y3[m][2*n] < 45   || I_y3[m][2*n] > 200)   &&
                (I_v3[m][n] < 45     || I_v3[m][n] > 200)     &&
                (I_y3[m][2*n+1] < 45 || I_y3[m][2*n+1] > 200)) {
                /* background: neutral chroma, dark luma */
                I_u3[m][n]     = 128;
                I_y3[m][2*n]   = 16;
                I_v3[m][n]     = 128;
                I_y3[m][2*n+1] = 16;
            }
            else {
                /* foreground: accumulate for the centre of mass */
                cent_x = cent_x + m;
                cent_y = cent_y + n;
                cent_z = cent_z + 1;
            }
        }
    }
    if (cent_z == 0) {
        return;    /* no foreground pixels: nothing to track in this frame */
    }
    centroid_x = (cent_x / cent_z);
    centroid_y = (cent_y / cent_z);
    /* draw a 20x20 white box centred on the centroid in the output frame */
    for (p = centroid_x - 10; p < centroid_x + 10; p++) {
        for (q = centroid_y - 10; q < centroid_y + 10; q++) {
            if (p == centroid_x - 10 || p == centroid_x + 9 ||
                q == centroid_y - 10 || q == centroid_y + 9) {
                I_u[p][q]     = 255;
                I_y[p][2*q]   = 255;
                I_v[p][q]     = 255;
                I_y[p][2*q+1] = 255;
            }
        }
    }
}


References
[1] Morimoto. T., Kiriyama. O., Harada. Y., Adachi. H., Koide. T., and Mattausch. H. J.,
Object tracking in video images based on image segmentation and pattern matching, Proc.
of IEEE Int. Symp. on Cir. and Syst., 2005, pp. 3215-3218.
[2] Yamaoka. K., Morimoto. T., Adachi. H., Koide. T., and Mattausch. H. J., Image segmentation and pattern matching based FPGA/ASIC implementation architecture of real-time object
tracking, Asia and south pacific conference on design automation, 2006, pp. 176-181.
[3] Qiaowei. L., Shuangyuan. Y., and Senxing. Z., Image segmentation and major approaches,
IEEE International Conference on Computer Science and Automation Engineering, 2011,
pp. 465-468.
[4] Patra. D., Santosh. K. K., and Chakraborty. D., Object tracking in video images using
hybrid segmentation method and pattern matching, Annual IEEE India Conference, 2009,
pp. 1-4.
[5] Watve. A. K., Object tracking in video scenes, M. Tech. seminar, IIT Kharagpur, India,
2010.
[6] Uy. D. L.,An algorithm for image clusters detection and identification based on color for
an autonomous mobile robot, Research report submitted to Hampton university, Verginia,
1994
[7] Bochem. A., Herpers. R., and Kent. K. B., Acceleration of Blob Detection within Images in Hardware, Research report, University of New Brunswick, 2009, World Wide Web,
http://www.cs.unb.ca/tech-reports/documents/TR_10_205.pdf.
[8] Kaspers, A.,Blob Detection, Research report, Image Sciences Institute, UMC Utrecht,
May 5, 2011.

[9] Gupta. M., Cell Identification by Blob Detection, International Journal of Advances in
Electronics Engineering, vol. 2, Issue 1, 2012.
[10] Hinz. S., Fast and subpixel precise blob detection and attribution, IEEE International
Conference on Image Processing, 2005, vol.3, pp. 457-60.
[11] Francois. A. R., Real-time multi-resolution blob tracking, Technical Report IRIS-04423, Institute for Robotics and Intelligent Systems, University of Southern California, July 2004.
[12] Mancas. M., Augmented Virtual Studio, Tech. rep. 4. 2008. pp. 1-3.
[13] Dharamadhat. T., Thanasoontornlerk. K., and Kanongchaiyos. P., Tracking object in
video pictures based on background subtraction and image matching, IEEE International
Conference on Robotics and Biomimetics, 2008, pp. 1255-1260.
[14] Piccardi. M., Background subtraction techniques: a review, IEEE International Conference on Systems, Man and Cybernetics, 2004, vol.4, pp. 3099-3104.
[15] Andrews. A., Targeting multiple objects in real time, B.E thesis, University of Calgary,
Canada, October, 1999.
[16] Saravanakumar. S., Vadivel. A., and Saneem. A. C. G., Multiple human object tracking
using background subtraction and shadow removal techniques, International Conference on
Signal and Image Processing, 2010, pp. 79-84.
[17] ZuWhan. K., Real time object tracking based on dynamic feature grouping with background subtraction, IEEE Conference on Computer Vision and Pattern Recognition, 2008,
pp. 1-8.
[18] Isard. M., and MacCormick. J., BraMBLe: a Bayesian multiple-blob tracker, Eighth
IEEE International Conference on Computer Vision, 2001, vol.2, pp. 34-41.
[19] Gonzales. R. C., and Woods. R. E., Digital Image Processing-Second Edition, Prentice
Hall, 2002.
[20] Haralick. R. M., and Shapiro. L. G., Computer and Robot Vision, volume I, Addison-Wesley, 1992, pp. 28-48.

[21] Castagno. R., Ebrahimi. T., and Kunt. M.,Video Segmentation Based on Multiple Features for Interactive Multimedia Applications, IEEE Transactions on Circuits and Systems
for Video Technology, vol. 8, pp. 562-571, September 1998.
[22] Kenako. T., and Hori. O.,Feature selection for reliable tracking using template matching, Proc. IEEE Intl. Conference on Computer Vision and Pattern Recognition, 2003, vol.
1, pp. 796-802.
[23] Bochem. A., Herpers. R., and Kent. K. B., Hardware Acceleration of BLOB Detection
for Image Processing, Third International Conference on Advances in Circuits, Electronics
and Micro-Electronics, 2010, pp. 28-33.
[24] Mostafa. A., Mehdi. A., Mohammad. H., and Ahmad. A., Object Tracking in Video
Sequence Using Background Modeling, Proc. IEEE Workshop on Application of Computer
Vision, 2011, pp. 967-974.
[25] Babu. R. V., and Makur A., Object-based Surveillance Video Compression using Foreground Motion Compensation Int. Conf. on Control, Automation, Robotics and Vision,
2006, pp. 1-6.
[26] Comaniciu. D., Ramesh. V., and Meer. P., Real-time tracking of non-rigid objects using
mean shift, Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2000,
vol.2, pp. 142-149.
[27] Foresti. G. L., A real-time system for video surveillance of unattended outdoor environments, IEEE Trans. Circuits and Systems for Vid. Tech., vol. 8, no. 6, pp. 697-704, 1998.
[28] Elbadri. M., Peterkin. R., Groza. V., Ionescu. D., and Saddik. El. A., Hardware support
of JPEG, Canadian Conf. on Electrical and Computer Engineering, 2005, pp. 812-815.
[29] Deng. M., Guan. Q., and Xu. S., Intelligent video target tracking system based on DSP,
Int. Conf. on Computational Problem-Solving, 2010, pp. 366-369.
[30] Liping. K., Zhefeng. Z., Gang. X., The Hardware Design of Dual-Mode Wireless Video
Surveillance System Based on DM6437, Second Inte. Conf. on Networks Security Wireless
Communications and Trusted Computing, 2010, pp. 546-549.


[31] Pescador. F., Maturana. G., Garrido. M. J., Juarez. E., and Sanz. C., An H.264 video
decoder based on a DM6437 DSP, Digest of Technical Papers International Conference on
Consumer Electronics, 2009, pp. 1-2.
[32] Wang. Q., Guan. Q. , Xu. S., and Tan. F., A network intelligent video analysis system
based on multimedia DSP, Int. Conf. on Communications, Circuits and Systems, 2010, pp.
363-367.
[33] Kim. C., and Hwang. J. N., Object-based video abstraction for video surveillance system, IEEE Trans. Circuits and Systems for Video Technology, vol. 12, no. 12, pp. 1128-1138, 2002.
[34] Nishi. T., and Fujiyoshi. H., Object-based video coding using pixel state analysis, IEEE
Intl. Conference on Pattern Recognition, 2004.
[35] William. K. P., Digital Image Processing (second edition), John Wiley & Sons, New
York, 1991.
[36] Wallace. G. K., The JPEG still picture compression standard, IEEE Transactions on
Consumer Electronics, vol.38, no.1, pp. xviii-xxxiv, Feb 1992.
[37] Seol. S. W., An automatic detection and tracking system of moving objects using double
differential based motion estimation, Proc. of Int. Tech. Conf. Circ./Syst., Comput. and
Comms., 2003, pp. 260-263.
[38] Dwivedi. V., Jpeg Image Compression and Decompression with Modeling of DCT Coefficients on the Texas Instrument Video Processing Board TMS320DM6437, Master of
science, California State University, Sacramento, Summer 2010.
[39] Kapadia. P., Car License Plate Recognition Using Template Matching Algorithm, Master's Project Report, California State University, Sacramento, Fall 2010.
[40] Gohil. N., Car License Plate Detection, Masters Project Report, California State University, Sacramento, Fall 2010.
[41] Texas Instruments Inc., TMS320DM6437 DVDP Getting Started Guide, Texas, July
2007.


[42] Texas Instrument Inc., TMS320DM6437 Digital Media Processor, Texas, pp. 211-234,
June 2008.
[43] Texas Instruments Inc., TMS320C64x+ DSP Cache Users Guide, Literature Number:
SPRU862A, Table 1-6, pp. 23, October 2006.
[44] Texas Instrument Inc., TMS320DM643x DMP Peripherals Overview Reference Guide,
pp. 15-17, June 2007.
[45] Texas Instrument Inc., TMS320C6000 Programmers Guide, Texas, pp. 37-84, March
2000.
[46] Xilinx Inc., The Xilinx LogiCORE IP RGB to YCrCb Color-Space Converter, pp. 1-5,
July 2010.
[47] Texas Instruments Inc., How to Use the VPBE and VPFE Driver on TMS320DM643x.
Dallas, Texas, November 2007.
[48] Texas Instrument Inc., TMS320C64X+ DSP Cache , User Guide, pp. 14-26, February
2009.
[49] Texas Instruments technical Reference, TMS320DM6437 Evaluation Module, Spectrum Digital , 2006.
[50] Keith. Jack., Video Demystified: A Handbook for the Digital Engineer, 4th Edition,
Llh Technology Pub,1995.
[51] Pawate. B. I., Developing Embedded Software using DaVinci & OMAP Technology, Morgan & Claypool, 2009.
[52] Bovik. Al., Handbook of Image & Video Processing, Academic Press Series, Department of Electrical and Computer Engineering, UTA Texas, 1999.
[53] Stephens. L. B., Student Thesis on Image Compression Algorithms, California State
University, Sacramento, August 1996
[54] Berkeley Design Technology, Inc.,The Evolution of DSP Processors, World Wide Web,
http://www.bdti.com/articles/evolution.pdf, Nov. 2006.


[55] Berkeley Design Technology, Inc., Choosing a Processor: Benchmark and Beyond,
World Wide Web,http://www.bdti.com/articles/20060301_TIDC_Choosing.pdf, Nov. 2006.
[56] University of Rochester, DSP Architectures: Past, Present and Future, World Wide
Web, http://www.ece.rochester.edu/research/wcng/papers/CAN_r1.pdf, Nov. 2006.
[57] Steven. W. Smith., The Scientist and Engineers Guide to Digital Signal Processing,
Second Edition, California Technical Publishing, 1999.
[58] Texas Instruments Inc., TMS320DM642 Technical Overview, Dallas, Texas, September 2002.


Acknowledgments
I express my sincere thanks and deep sense of gratitude to my supervisor, Prof. V. Rajbabu, for his invaluable guidance, inspiration, unremitting support, encouragement and stimulating suggestions during the preparation of this report. His persistence and inspiration during the ups and downs in research, and his clarity and focus during the uncertainties, have been very helpful to me. Without his continuous encouragement and motivation, the present work would not have seen the light of day.
I acknowledge with thanks all EI lab members and TI-DSP lab members at IIT Bombay who have directly or indirectly helped me throughout my stay at IIT. I would also like to thank the department staff, central library staff and computer facility staff for their assistance.
I would like to express my sincere thanks to Mr. Ajay Nandoriya and Mr. K.S Nataraj for their help and support during the project work.
My family members are, of course, a source of faith and moral strength. I acknowledge the shower of blessings and love of my parents, Mr. Rajiba Lochana Patro and Mrs. Uma Rani Patro, and also Godaborish Patro and Madhu Sundan Patro, for their unrelenting moral support in difficult times. I wish to express my deep gratitude towards all of my friends and colleagues for providing constant moral support; their support made my stay at the institute pleasant. I have enjoyed every moment that I spent with all of you.
And finally, I am thankful to God, in whom I trust.

Date:

Badri Narayan Patro
