
B.E.

PROJECT REPORT
On

ESTIMATING THE COUNT OF PEOPLE FROM A VIDEO


Submitted by,

Prasad Sunil Udawant (B120023170)


Tejas Suresh Pandhare (B120023102)
Pratik Gahininath Kekane (B120023117)

Project Guide

Prof. P. Mahajani
(Internal Guide)

Sponsored by

College

Year: 2015-2016
Maharashtra Institute of Technology, Pune 38
Department of Electronics and Telecommunication

MAEER's

MAHARASHTRA INSTITUTE OF TECHNOLOGY, PUNE.

CERTIFICATE
This is to certify that the Project entitled
ESTIMATING THE COUNT OF PEOPLE FROM A VIDEO
has been carried out successfully by
Prasad Sunil Udawant (B120023170)
Tejas Suresh Pandhare (B120023102)
Pratik Gahininath Kekane (B120023117)
during the Academic Year 2015-2016 in partial fulfilment of their course of study for the
Bachelor's Degree in
Electronics and Telecommunication
as per the syllabus prescribed by
Savitribai Phule Pune University.

Prof. Mrs. P. Mahajani


Internal Guide

Dr. G.N. Mulay


Head of Department
(Electronics and Telecommunications)
MIT, Pune.

ABSTRACT
There is ever-increasing pressure to provide services for a growing human
population. On many occasions, managing crowds becomes critical, especially at public
places such as pilgrimage centres, malls and tourist spots. This is where technology
comes to our aid.
There are existing technologies to detect people in an enclosed
environment. These technologies use sensors such as infrared and thermal detectors. They
have varying accuracies and drawbacks. Nowadays, there is a growing trend towards
video-based solutions. With image processing providing a variety of processing techniques
and efficient algorithms, it promises high accuracy. We make use of the DaVinci
video processor (DM6437), which features the Very Long Instruction Word (VLIW)
architecture developed by Texas Instruments. The video processing back end
and front end (VPBE and VPFE) are video-specific platforms that enable easy
processing of real-time video. The features of the DaVinci combined with image
segmentation help detect the number of people. We use techniques like
histograms and K-means along with edge detection to ensure reliability of the count.
On implementation of our proposed model, we will be able to detect the
number of people in a specific environment in response to a real-time video input.
This opens the door to a number of control actions depending on the application
at hand.

ACKNOWLEDGEMENT
We sincerely thank our final year mentor Prof. (Mrs.) P. Mahajani for
all her support and help. She gave shape to our abstract idea with stimulating
suggestions and encouragement, resulting in a successful project. Her timely
guidance was the reason for the systematic progress of the project.
We also appreciate the role of all the other staff members, departmental
facilities and HoD Sir who helped in approving our project by conducting several
review sessions to track our progress and direct us in the correct path.

LIST OF ABBREVIATIONS
APL - Application Layer
DVDP - Digital Video Development Platform
DVSDK - Digital Video Software Development Kit
EPSI - Embedded Peripheral Software Interface
EVM - Evaluation Module
GPP - General Purpose Processor
HD - High Definition
IOL - Input Output Layer
NTSC - National Television System Committee
PAL - Phase Alternating Line
SD - Standard Definition
SPL - Signal Processing Layer
VISA - Video, Image, Speech, Audio
VPBE - Video Processing Back End
VPFE - Video Processing Front End
VPSS - Video Processing Sub System
xDAIS - eXpressDSP Algorithm Interface Standard
xDM - eXpressDSP Digital Media

LIST OF FIGURES

Serial No.  Figure Name
2.1  People count using Face Recognition
2.2  People count using PIR Sensor
3.1  System Block Diagram
3.2  Video Scanning Methods
3.3  Block Diagram of DaVinci Processor
3.4  Configuration Switch S3 Summary
3.5  Operating system layers in DaVinci Processor
3.6  Functional Block Diagram
3.7  VPFE Functional Block Diagram
3.8  VPBE Functional Block Diagram
4.1  System Flowchart
4.2  Background Subtraction
4.3  Erosion of a binary image with a disk structuring element
4.4  Dilation of a binary image with a disk structuring element
4.5  Opening of binary image
5.1  Color Bar Output
5.2  Edge Segmentation Output
A.1  YCbCr Sampling

CONTENTS
Chapter 1. Introduction ... 8
1.1 Scope of project ... 10
1.2 Organization of report ... 11
Chapter 2. Literature Survey ... 12
2.1 Present Scenario ... 14
Chapter 3. System Development ... 16
3.1 System specifications ... 16
3.2 System block diagram & description ... 17
3.3 System block components ... 19
    Video standards ... 19
    Color CCD Camera ... 21
    TMS320DM6437 DaVinci Video processor ... 24
3.4 Complexities involved ... 33
Chapter 4. System Design ... 35
4.1 Image preprocessing ... 36
4.2 Background Subtraction ... 36
4.3 Image Segmentation ... 38
4.4 Morphological operations ... 39
Chapter 5. Implementation of system & Results ... 41
5.1 Test Code for Color Bars ... 43
5.2 Edge Detection using IMGLIB ... 47
5.3 Background Subtraction on DM6437 ... 49
Chapter 6. References ... 52
APPENDIX A: Video File Format ... 53

Chapter 1

INTRODUCTION

Surveillance systems are used for monitoring, screening and tracking of
activities in public places such as banks, in order to ensure security. Various aspects,
like screening objects and people, biometric identification, video surveillance, and
maintaining a database of potential threats, are used for monitoring such
activity. Moving object tracking in video has attracted a great deal of interest in
computer vision. For object recognition, navigation systems and surveillance
systems, object tracking is the first step. Object tracking methods may broadly
be categorized as segmentation-based, template-based, probabilistic
and pixel-wise. In segmentation-based tracking, or blob detection, the basic idea is
to detect points and/or regions in the image that are either brighter or darker
than their surroundings. These methods are easy to implement and fast to compute but
may lack accuracy for some applications. Template-based methods match the direct
appearance from frame to frame. These methods offer a great deal of accuracy but
are computationally expensive. The probabilistic method uses an intelligent searching
strategy for tracking the target object. Similarly, similarity-matching techniques
are used for tracking the target object in pixel-based methods.
Most tracking algorithms are based on difference evaluation between
the current image and a previous image or a background image. However, algorithms
based on the difference of images have problems in the following cases:
(1) Still objects included in the tracking task exist.
(2) Multiple moving objects are present in the same frame.
(3) The camera is moving.
(4) Occlusion of objects occurs.
This can be addressed by using an algorithm for object tracking based on image
segmentation and pattern matching. We instead use a novel image
segmentation algorithm in order to extract all objects in the input image.
The background subtraction method uses the difference between the
current image and a background image to detect moving objects. The algorithm is
simple, but it is very sensitive to changes in the external environment and has poor
anti-interference ability. However, it can provide the most complete object
information when the background is known. It receives the most attention
due to its computationally affordable implementation and its accurate detection of
moving entities. In this project, under a single static camera condition, we combine
dynamic background modelling with a threshold selection method based on
background subtraction, and update the current frame on the basis of exact detection of
the object. This method is effective in improving moving object detection.

Any motion detection system based on background subtraction needs to handle a
number of critical situations such as:
1. Image noise, due to a poor-quality image source.
2. Gradual variations of the lighting conditions in the scene.
3. Small movements of non-static objects such as tree branches and bushes
blowing in the wind.
4. Undeviating variations of the objects in the scene, such as cars that park (or
depart after a long period).
5. Sudden changes in the light conditions (e.g., sudden rain), or the presence of
a light switch (the change from daylight to artificial lights in the evening).
6. Movements of objects in the background that leave parts of it different from the
background model.
7. Shadow regions that are projected by foreground objects and are detected as
moving objects.
8. Multiple objects moving in the scene for both long and short periods.
The main objective of this project is to develop an algorithm that can
detect human motion at a certain distance for object tracking applications. We carry
out various tasks such as motion detection, background modelling and subtraction,
and foreground detection.

1.1 SCOPE OF PROJECT
The motivation for such a system is to gather data about how many people
are inside a building at a given time. This will help owners decide how to set up
fire-extinguishing equipment or the size and placement of fire exits. Building
owners are required by law to have enough of this equipment based on how
many people can gather inside. A computer vision counting system has
the advantage of not disrupting the flow of traffic as contact-based systems
might do, and is more robust than simple photoelectric cells. Knowing when and
how many customers are inside a shopping mall could also help to optimize
labor scheduling and system controls, and to monitor the effectiveness of
promotional events. Optimization of security measures is also a possible
benefit: knowing how many security guards should be assigned, and which
hot-spots inside the mall they should patrol.

Our work will hopefully answer questions like:
1. Can we achieve better separation of groups into individuals?
Exploring algorithms and methods so that each individual can be
detected, tracked and counted.
2. Can we find features to discriminate people from objects?
Features need to be found and combined to make good decisions about
foreground objects.
3. Can our proposed algorithm detect moving as well as stationary
humans?

1.2 ORGANIZATION OF THE REPORT


Chapter 1: Introduction and scope
This chapter provides an overview of the basic functionality of the system and
describes its scope of expansion.
Chapter 2: Literature Survey and present scenario
We present the literature survey of the work done in this field so far as well
as the present scenario.
Chapter 3: System Block Diagram and Flow Chart
Explains in detail the design and development process of the system. Includes
system specifications, block diagram, description of each block, and system flow
chart.
Chapter 4: System Design
It includes all the algorithms used for image pre-processing, feature
extraction, classification and recognition.
Chapter 5: Results and Conclusion
It includes all test codes and functions run on MATLAB & CCS v3.3 to verify
correct working of the hardware & the various stages of the proposed algorithm
on a test video. It also includes the future scope of the project based on the conclusion.
Chapter 6: References and Appendix
Includes important references related to the work done on the topic:
datasheets, books for implementation of the proposed system, and websites
referred to as and when doubts & queries arose.


Chapter 2

LITERATURE SURVEY

[A] Human Detection in Video, Muhammad Usman Ghani Khan and Atif Saeed,
Journal of Theoretical and Applied Information Technology, May 2007.
Their proposed algorithm comprises the following steps:
1. Converting a video sequence into individual images.
2. Accessing the sequential images and detecting the important features.
3. Allocating those regions (if any) giving indications of human presence; one such
indication is a human-skin-like colour.
4. Applying a movement detection test to all of the allocated regions.
5. Applying a face detector to the detected moving objects to decide whether each is a
face or not.
[B] Fast People Counting Using Head Detection from Skeleton Graph, IEEE
International Conference on Advanced Video and Signal Based Surveillance.
In this paper, the authors present a new method for counting people. This
method is based on head detection after a segmentation of the human body by a
skeleton graph process. The skeleton silhouette is computed and decomposed into a
set of segments corresponding to the head, torso and limbs. This structure captures
the minimal information about the skeleton shape.
[C] Real-time people counting system using video camera by Roy-Erland Berg.
In this thesis, experiments have been carried out on a people counting system in an
effort to enhance the accuracy when separating and counting groups of people and
non-human objects. This system features automatic color equalization, adaptive
background subtraction, a shadow detection algorithm and Kalman tracking.
[D] Real-Time Video and Image Processing for Object Tracking using Da
Vinci Processor, dissertation submitted by Badri Narayan Patro, M.Tech, IIT
Bombay, under the guidance of Prof. V. Rajbabu.
In this project they developed and demonstrated a framework for
real-time implementation of image and video processing algorithms, such as object
tracking and image inversion, using the DaVinci processor. More specifically, they track
a single object and two objects present in the scene captured by a CCD camera that acts
as the video input device, and the output is displayed on an LCD display. The tracking
happens in real time, consuming 30 frames per second (fps), and is robust to
background and illumination changes. The performance of single object tracking
using background subtraction and blob detection was very efficient in speed and
accuracy as compared to a PC (MATLAB) implementation of a similar algorithm.
Execution times for different blocks of single object tracking were estimated using
the profiler, and the accuracy of the detection was verified using the debugger provided
by TI Code Composer Studio (CCS). They demonstrate that the TMS320DM6437
processor provides at least a ten-times speed-up and is able to track a moving object
in real time.

2.1 PRESENT SCENARIO


Presently there are a good many different algorithms
available to make such a system work. Some of them make use of multiple cameras
while others make use of a single camera.
A very obvious way to get the count of people in a video is by
recognizing the face of each individual. But this puts a lot of limitations on the system
and makes it impractical to work with. Some systems implement a method to
detect human skin and faces from colored images. These systems are based on the
detection of all pixels in colored images which are probably human skin via a
reference skin-color matrix. The image then goes through some modifications to
enhance the face detection.

Figure 2.1: People count using Face Recognition


Pyroelectric infrared (PIR) sensors are well-known occupancy detectors.
They have been widely employed for human tracking systems, due to their low
cost and power consumption, small form factor, and unobtrusive, privacy-preserving
interaction. In particular, a dense array of PIR sensors having digital
output and the modulated visibility of Fresnel lenses can provide capabilities for
tracking human motion, identifying walking subjects and counting people entering or
leaving the entrance of a room or building. However, the analog output signal of PIR
sensors involves more aspects beyond simple people presence, including the
distance of the body from the PIR sensor, the velocity of the movement (i.e.,
direction and speed), body shape and gait (i.e., a particular way or manner of
walking). Thus, we can leverage discriminative features of the analog output signal
of PIR sensors in order to develop various applications for indoor human tracking
and localization.
A number of systems are based upon a similar approach, i.e. feature-based
regression. This involves detection of humans based upon the features extracted
from the background and foreground of the image. The interpretation of these features
varies from system to system, and many follow a complex mathematical model
to retrieve some meaningful information.

Figure 2.2: People count using PIR Sensor

Chapter 3

SYSTEM DEVELOPMENT

3.1 System Specifications


Our proposed algorithm takes video as an input via a CCD-based
camera. Its features, like a wide dynamic range, spatial resolution, spectral
bandwidth and acquisition speed, serve our purpose of getting a low-noise input. One of
the obtained video frames is used for processing on a specialized video processing
board popularly known as DaVinci. While the primary software we have used
to support our hardware is Code Composer Studio, we use MATLAB only to test and
verify the concept before applying it to our main system.

Software:
Code Composer Studio v3.3
MATLAB 2014a (for testing purposes only)

Hardware:
TMS320DM6437 DaVinci Video processor
CRT TV Box
Color CCD camera for video surveillance

3.2 SYSTEM BLOCK DIAGRAM & DESCRIPTION



Figure 3.1: System Block Diagram

System Block Description:
A CCD camera will be installed in an area which is to be brought under video
surveillance. It will be installed at a minimum height and a minimum angle which
allow it to detect people of varying heights.
The video stream, which is in PAL form, is then fed to the DaVinci video
processor. This device includes a Video Processing Sub-System (VPSS) with
two configurable video/imaging peripherals:
1) Video processing front-end (VPFE) input, used for video capture,
2) Video processing back-end (VPBE) output.
The video processing front end comprises the CCD controller, a
preview engine, a histogram module, auto-exposure/white balance, a focus
module, and a resizer. Common video decoders, CMOS sensors and CCDs can
be easily interfaced to the CCDC. The previewer is a real-time image processing
engine that takes raw image data from either a CMOS sensor or a CCD and
converts it from RGB to YUV422 format. The resizer accepts image data for
vertical and horizontal resizing.
The video processing back end comprises the on-screen display
and the video encoder (VENC). The VENC provides 4 analog DACs that run at
54 MHz, providing a means for composite NTSC/PAL and/or component
video output. It also provides a digital output interface to RGB devices.
Thus the features of the processor, like the third-generation, high-performance,
advanced VelociTI VLIW architecture developed by TI, make it
an excellent choice for digital media applications.
The CRT TV box, on obtaining the video, converts the processed
frames into the monitor-compatible video format VGA.


3.3 SYSTEM BLOCK COMPONENTS


1) Video Standards:
Progressive scan captures, transmits and displays an image in a way
similar to text on page, line by line, top to bottom.
The interlaced scan in a CRT display also completes such a scan, but in
two phases (or two fields-viz. odd and even). The first field displays the first and all
odd numbered lines from top left corner to bottom right corner. The second pass
displays the second and all even numbered lines, filling in the gaps in first (odd field)
scan. This scanning by alternate lines is called interlacing.
A field is an image that contains only half of line needed to make
complete picture and therefore, it saves bandwidth.

Figure 3.2: Video Scanning Methods


The two video standards that are currently employed in any television system are the
following:
A] NTSC (National Television System Committee)
> 29.97 interlaced frames of video per second
> Scans 525 lines per frame. Out of these 525 lines, 480 form the visible raster and the
others are for synchronization and vertical retrace
> Gives higher temporal resolution than PAL
> The screen updates more frequently, and hence motion is rendered better in NTSC than
in PAL video
B] PAL (Phase Alternating Line)
> PAL alternates the chroma phase between each line of video such that if there are
any drifts in chroma decoding they average out between lines. NTSC doesn't have
this protection and as a result its chroma reproduction can be wrong; however, PAL
can be accused of having less chroma detail.
> PAL specifies 768 pixels per line, 625 lines or 50 fields (25 frames) per second
> PAL gives higher spatial resolution than NTSC
> PAL video is of higher resolution than NTSC video


Difference between NTSC and PAL:

NTSC is the video system or standard used in North America and parts of South
America. In NTSC, 30 frames are transmitted each second, and each frame is
made up of 525 individual scan lines. PAL is the predominant video system or
standard used in most other regions. In PAL, 25 frames are transmitted each
second, and each frame is made up of 625 individual scan lines.
720*576 = 414,720 pixels per frame for a 4:3 aspect ratio using PAL.
720*480 = 345,600 pixels per frame for a 4:3 aspect ratio using NTSC.

2) Color CCD camera


DESCRIPTION
MODEL NO: MCB2200
1/3" Color Camera
PAL, Audio
12 V DC, 3 W max

SPECIFICATIONS
Pick-up device: SONY 1/3" interline transfer color CCD
Picture Elements: NTSC: 510*492; PAL: 500*582 (Standard Res.) / NTSC: 768*494; PAL: 752*582 (High Res.)
Horizontal Resolution: 380 TV lines (Standard Resolution) / 480 TV lines (High Resolution)
Sensitivity: 0.3 lux / F1.2 (Standard Resolution) / 0.5 lux / F1.2 (High Resolution)
S/N Ratio: Over 48 dB
Electronic Shutter: 1/60 (1/50) to 1/100,000 s
Auto Iris: Video / Direct Drive switch
Auto Gain Control: On/Off switch
Gamma Correction: 0.45
Video Output: BNC, VBS 1.0 Vp-p, 75 ohm
Power Source: DC 12 V only or AC 24 V / DC 12 V
Sync. Mode: Internal sync.
Lens Mount: C/CS mount
Power Consumption: 3 W max.

SPECIAL FEATURES

1) Electronic Shutter ON/OFF :


ES ON:
The camera continuously adjusts the shutter speed from 1/60 (NTSC) or
1/50 (PAL) second down to 1/100,000 second according to the luminance conditions
of the scene.
ES OFF:
The shutter speed is fixed at 1/60 (NTSC) or 1/50 (PAL) second. Set ES OFF
when an auto-iris lens is used or flicker is observed under a very bright fluorescent
lamp. Otherwise, turn ES ON for optimum performance.

2) Back Light Compensation ON/OFF:


When BLC is turned on, the AGC, ES & IRIS operating point is
determined by averaging over the center area instead of the entire field of
view, so that a dimly lit foreground object in the center area can be clearly
distinguished from a brightly lit background.
BLC should not be used unless it is needed to compensate for backlighting.

3) Automatic Gain Control ON/OFF :


AGC ON:
The sensitivity increases automatically when the light level is low.
AGC OFF:
A low-noise picture is obtained under low-light conditions.


3) TMS320DM6437 DaVinci Video processor

Figure 3.3: Block Diagram of DaVinci Processor


Description
The DaVinci EVM is a development board that enables evaluation of, and
design with, the DaVinci processors. The EVM serves both as a working
platform and as a reference design.
The DaVinci family consists of DSP-based system-on-a-chip processors
designed to handle today's video- and connectivity-driven applications. The
DaVinci EVM is a reference platform that highlights the on-chip capabilities.
Board features include:
Features
256 MB of SDRAM
16 MB of linear Flash memory
Composite video inputs (1 decoder)
Composite and component video outputs
AIC33 stereo codec
Stereo analog audio inputs and outputs
S/PDIF digital audio outputs
USB 2.0 host connector
10/100 Ethernet PHY
Infrared remote interface
9-pin UART
SD/MMC/MS serial media card support
CompactFlash/SM/xD parallel media card support
ATA hard disk interface

FIGURE 3.4 : Configuration Switch S3 Summary


Da Vinci Processor and Family

The DaVinci technology is a family of processors integrated with a
software and hardware tools package, giving a flexible solution for a host of
applications from cameras to phones to hand-held devices to automotive gadgets.
DaVinci technology is the combination of raw processing power and the software
needed to simplify and speed up the production of digital multimedia and video
equipment.
Da Vinci technology consists of:
Da Vinci Processors:
Scalable, programmable DSPs and DSP-based SoCs (systems on chip) tailored
from DSP cores, accelerators, peripherals and ARM processors, optimized to
match performance, price and feature requirements in a spectrum of
digital video end equipment, e.g. TMS320DM6437 and TMS320DM6467.

Da Vinci Software:
Interoperable, optimized video and audio codecs leveraging the DSP and
integrated accelerators, plus APIs within operating systems (Linux) for rapid
software implementation, e.g. Codec Engine, DSP/BIOS, NDK, and audio and video codecs.
Da Vinci Development Tools/Kits:
Complete development kits along with reference designs: DM6437 DVSDK,
Code Composer Studio, Green Hills, and virtual Linux. The DaVinci video
processor solutions are tailored for digital video, image and vision
applications. The Da Vinci platform includes a general purpose processor
(GPP), a DSP and video accelerators.


Basic working functionality of the Da Vinci processor


Consider, for example, the video capture driver: it reads data
from a video port or peripheral and starts filling a memory buffer. When this input
buffer is full, an interrupt is generated by the IOL to the APL and a pointer to this
full buffer is passed to the APL. The APL picks up this buffer pointer and in turn
generates an interrupt to the SPL and passes the pointer. The SPL now processes the
data in this input buffer and, when complete, generates an interrupt back to the APL
and passes the pointer of the output buffer that it created. The APL passes this output
buffer pointer to the IOL, commanding it to display it or send it out on the network.
Note that only pointers are passed while the buffers remain in place; the overhead
of passing the pointers is negligible. These three layers, the different APIs and the
different drivers and components are shown in Figure 3.5.

Figure 3.5: Operating system layers in DaVinci Processor
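To make the buffer-passing pattern concrete, the following is a minimal C sketch of an application-layer loop that shuttles buffer pointers between the I/O layer and the signal processing layer. The function names iol_get_full_buffer, spl_process and iol_queue_for_display are hypothetical placeholders, not actual DaVinci APIs; in the real project these steps are performed through EPSI/FVID and VISA calls.

#include <stddef.h>

/* Hypothetical IOL/SPL entry points -- placeholders for EPSI and VISA calls. */
extern void* iol_get_full_buffer(void);           /* blocks until a capture buffer is full     */
extern void* spl_process(void* inBuf);            /* runs the algorithm, returns output buffer */
extern void  iol_queue_for_display(void* outBuf); /* hands the processed buffer back to IOL    */

void apl_main_loop(void)
{
    for (;;) {
        void* inBuf  = iol_get_full_buffer();     /* pointer from IOL, buffer stays in place   */
        void* outBuf = spl_process(inBuf);        /* SPL works on the data, only pointers move */
        iol_queue_for_display(outBuf);            /* IOL displays or transmits the result      */
    }
}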

>>Signal Processing Layer (SPL):


The SPL consists of all the signal processing functions or algorithms that
run on the device. For example, a video codec, such as MPEG4-SP or H.264, will
run in this layer. These algorithms are wrapped with the eXpressDSP Digital Media (xDM)
API. Between xDM and the VISA (Video, Image, Speech, Audio) APIs sit the Codec
Engine, Link and DSP/BIOS. Memory buffers, along with their pointers, provide
input and output to the xDM functions. This decouples the SPL from all other layers.
The Signal Processing Layer (SPL) presents the VISA APIs to all other layers. The main
components of the SPL are xDM, xDAIS, the VISA APIs and the Codec Engine interface.
>> Input output Layer (IOL):
The Input Output Layer (IOL) covers all the peripheral drivers and
generates buffers for input or output data. Whenever a buffer is full or empty, an
interrupt is generated to the APL. Typically, these buffers reside in shared memory,
and only pointers are passed from the IOL to the APL and eventually to the SPL. The IOL
is delivered as drivers integrated into an operating system such as Linux or
WinCE. In the case of Linux, these drivers reside in the kernel space of the Linux OS.
The Input Output Layer (IOL) presents the OS-provided APIs as well as the EPSI APIs
to all other layers. The IOL contains the Video Processing Subsystem (VPSS) device driver
used for video capture and display, and a USB driver to capture video to USB-based
media. Debugging is done using the UART serial port driver through a console
application. When the captured video is to be sent over the network, the Ethernet
(EMAC) driver is needed, and the VPFE driver internally uses the I2C driver as its
communication protocol. For the audio processing system the Multichannel Audio
Serial Port (McASP) driver is used, and for buffering of stream data the Multichannel
Buffered Serial Port (McBSP) driver is used.
>>Application Layer (APL):
The Application Layer interacts with the IOL and the SPL. It makes calls to the IOL
for data input and output, and to the SPL for processing. The Sample Application Thread
(SAT) is a sample application component that shows how to call the EPSI and VISA
APIs and how to interface with the SPL and IOL as built-in library functions. All other
application components are left to the developer, who may develop them or leverage
the vast open-source community software. These include, but are not limited to,
graphical user interfaces (GUIs), middleware, networking stacks, etc. The master thread
is the highest-level thread, such as an audio or video thread, that handles the opening
of I/O resources (through the EPSI API), the creation of processing algorithm instances
(through the VISA API), as well as the freeing of these resources. Once the necessary
resources for a given task are acquired, the master thread specifies an input source
for data (usually a driver or file), the processing to be performed on the input data
(such as compression or decompression) and an output destination for the processed data
(usually a driver or file).
The Network Developer's Kit (NDK) provides services such as an HTTP
server, DHCP client/server, DNS server, etc. that reside in the application layer.
Note that these services use the socket interface of the NDK, which resides in the
I/O layer, so the NDK spans both layers.

Figure 3.6: Functional Block Diagram


Video Processing Sub Systems :

1. Video Processing Front End (VPFE):


The VPFE block is comprised of the charge-coupled device (CCD)
controller (CCDC), the preview engine image pipe (IPIPE), the hardware 3A statistics
generator (H3A), the resizer and the histogram module. The CCD controller is responsible
for accepting raw, unprocessed image/video data from a sensor (CMOS or CCD). The
preview engine image pipe (IPIPE) is responsible for transforming raw
(unprocessed) image/video data from a sensor (CMOS or CCD) into YCbCr 4:2:2
data, which is easily handled for compression or display. Typically, the output of
the preview engine is used both for video compression and for displaying it on an
external display device, such as an NTSC/PAL analog encoder or a digital LCD. The
output of the preview engine or DDR2 is the input to the resizer, which can resize it
to 720x480 pixels per frame. The output of the resizer module is sent to the
SDRAM/DDRAM, and the resizer is then free for the preview engine pipe to use for
further processing. The H3A module is designed to support the control loops for auto focus
(AF), auto white balance (AWB), and auto exposure (AE) by collecting metrics
about the imaging/video data: the AF engine extracts and filters RGB data from
the input image/video data and provides either the accumulation or the peaks of the data
in a specified region, and the AE/AWB engine accumulates the values and checks for
saturated values in a sub-sampling of the video data. The histogram module allows the
luminance intensity distribution of the pixels of the image/frame to be represented.


FIGURE 3.7: VPFE BLOCK DIAGRAM

2. Video Processing Back End (VPBE):


The VPBE is responsible for displaying the processed image on different
display devices such as a TV, LCD or HDTV. The VPBE block is comprised of the
on-screen display (OSD) and the video encoder (VENC) modules. The OSD is a graphics
accelerator which is responsible for resizing images to either NTSC format or
PAL format (640x480 to 720x576) on the output devices; it combines display
windows into a single display frame, which helps the VENC module to output the video
data. The primary function of the OSD module is to gather and combine video data
and display/bitmap data and then pass them to the video encoder (VENC) in YCbCr
format. The VENC converts the display frame from the OSD into the correctly formatted,
desired output signals in order to interface with different display devices. The VENC
takes the display frame from the on-screen display (OSD) and formats it into the
desired output format and output signals (including data, clocks, sync, etc.) that are
required to interface to display devices. The VENC consists of three primary sub-blocks:
the analog video encoder, which generates the signals required to interface to an
NTSC/PAL system and also includes the video D/A converters; the timing generator,
responsible for generating the specific timing required for the analog video output; and
lastly the digital LCD controller, which supports various LCD display formats and YUV
outputs for interfacing to high-definition video encoders and/or DVI/HDMI interface
devices.

Figure 3.8: VPBE Functional block diagram


3.4 COMPLEXITIES INVOLVED


1) Positioning the camera:
Getting a full skeleton of all the people under surveillance without much
ambiguity using a single camera is not that simple. It may happen that the camera is
installed in such a fashion that the captured video misses a person. So proper
precaution must be taken to bring the area under surveillance in a way that people
in close proximity are not hidden.
2) Identifying human beings from the input video:
There are a number of approaches which people have worked upon, such as face
detection algorithms. But these put many restrictions on the expected system, one of
them being the need to capture the face of every human under video surveillance,
which is impractical. Thus, we opted for a physique-based algorithm which detects the
skeleton of a human after subtracting the background from the input frame.

3) Counting stationary people:

We started off by identifying the foreground by subtracting the background frame
from an input video frame. This gave us all the moving objects in the foreground, and
among those we were able to count people. But this left stationary people uncounted.
To overcome this flaw we use a reference image of the area under
surveillance. This facilitates detection of both moving and stationary people.
4) Separating people from a group of people:
The most ambiguous situation in video processing applications is overlapping edges.
These overlapping edges arise on account of people in close proximity. In our
system, this leads to overlapping of the respective skeletons, which in turn leads to an
erroneous count of people. To handle such situations we have included head
detection and pose estimation in our algorithm.


Chapter 4

SYSTEM DESIGN

Introduction
People counting algorithms are applied to different applications such as automated
video surveillance, traffic monitoring, stampede management, etc. They involve various
image processing algorithms such as image segmentation and morphological image
processing.

FIGURE 4.1: SYSTEM FLOWCHART


The major steps for object tracking are as shown in Figure 4.1. Here are the
different steps in people counting:
1. Image preprocessing.
2. Background subtraction.
3. Image segmentation. (Thresholding).
4. Morphological Operation (opening).
5. Blob detection and analysis (connected component labeling).
6. Count the number of people.

4.1 Image Preprocessing


The image captured by a surveillance camera is affected by various system noises,
and the output data format may be uncompressed or compressed. In order to remove the
noise, preprocessing of the image is essential. Preprocessing of the image includes
filtering and noise removal.

4.2 Background Subtraction


Background subtraction is a widely used approach for detecting moving objects in
videos from static cameras. The rationale of this approach is to detect the
moving objects from the difference between the current frame and a reference frame,
often called the background image or background model. It is required that the
background image be a representation of the scene with no moving objects, and it
must be kept regularly updated so as to adapt to the varying luminance conditions and
geometry settings.
The main motivation for background subtraction is to detect all the foreground
objects in a frame sequence from a fixed camera. In order to detect the foreground
objects, the difference between the current frame and an image of the scene's static
background is compared with a threshold. The detection equation is expressed as:

|frame(i) - background(i)| > Threshold        (4.1)

The background image varies due to many factors such as illumination changes
(gradual or sudden, e.g. due to clouds), changes due to camera
oscillations, and changes due to high-frequency background objects (such as tree
branches, sea waves, etc.).
The basic methods for background subtraction are:
1. Frame difference
|frame(i) - frame(i - 1)| > Threshold        (4.2)
Here the previous frame is used as the background estimate. This evidently works only
under particular conditions of object speed and frame rate, and is very sensitive to the
threshold.

2. Average or median
The background image is obtained as the average or the median of the previous n frames.
This method is rather fast, but needs a large amount of memory: the memory requirement
is n * size(frame).
3. Background obtained as the running average
B(i + 1) = α * F(i) + (1 - α) * B(i)        (4.3)
where α, the learning rate, is typically 0.05, and there are no additional memory
requirements.
We use an adaptive background subtraction algorithm:

FIGURE 4.2: BACKGROUND SUBTRACTION


Other methods are:

Median and running average give the fastest speed [14]. Mixture of Gaussians,
KDE, eigenbackgrounds, SKDA and optimized mean-shift give intermediate
speed, while standard mean-shift gives the slowest speed.
For memory requirements, average, median, KDE [14] and mean-shift consume the
most memory. Mixture of Gaussians, eigenbackgrounds and SKDA consume an
intermediate amount, and running average consumes very little memory.
For the accuracy parameter, mixture of Gaussians and eigenbackgrounds
provide good accuracy, and simple methods such as the standard average,
running average, and median can provide acceptable accuracy in specific
applications.
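As an illustration of the running-average model and the detection rule of equations (4.1) and (4.3), the following is a minimal C sketch, not the code that runs on the DM6437, assuming 8-bit grayscale frames stored as plain byte arrays and a float background buffer; the ALPHA and THRESHOLD values are illustrative.

#include <math.h>

#define ALPHA      0.05f   /* learning rate from equation (4.3)       */
#define THRESHOLD  30.0f   /* detection threshold from equation (4.1) */

/* Update the running-average background and return a binary foreground mask.
 * frame:      current grayscale frame (numPixels bytes)
 * background: running-average model, updated in place (numPixels floats)
 * mask:       output, 255 = foreground, 0 = background (numPixels bytes)    */
void background_subtract(const unsigned char* frame, float* background,
                         unsigned char* mask, int numPixels)
{
    int i;
    for (i = 0; i < numPixels; i++) {
        float diff = fabsf((float)frame[i] - background[i]);
        mask[i] = (diff > THRESHOLD) ? 255 : 0;                              /* eq. (4.1) */
        background[i] = ALPHA * frame[i] + (1.0f - ALPHA) * background[i];   /* eq. (4.3) */
    }
}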

4.3 Image Segmentation


Thresholding:
Thresholding means classifying the image histogram by one or several
thresholds. The pixels are classified based on gray scale values lying within a gray
scale class. The process of thresholding involves deciding a gray scale value to
distinguish different classes, and this gray scale value is called the threshold.
Threshold-based classification can be divided into global thresholding and
local thresholding. Global thresholding involves obtaining a threshold from the
entire image information and dividing the entire image by it. Local thresholding
involves obtaining thresholds in different regions and dividing each region based on
its own threshold.
In threshold segmentation, selecting the threshold is the key. In traditional
segmentation, the threshold is determined by a one-dimensional histogram. However,
a one-dimensional histogram only reflects the distribution of image gray scale, without
the spatial correlation between image pixels. This may lead to errors in segmentation and
unsatisfactory results. Other image segmentation algorithms include region growing,
edge detection, clustering, etc. Among these, thresholding and region growing are
generally not used alone, but as part of a series of processing steps. The disadvantage
is the inherent dependence on the selection of the seed region and the order in
which pixels and regions are examined; the resulting segments from region splitting
may appear too square due to the splitting scheme.
The background subtraction image is a gray-scale image, so it has to be
transformed into a binary image to make the segmented image (i.e. to separate
the foreground and the background). To transform a gray-scale image (255 values) into a
binary image (2 values) a threshold must be applied. All pixel values smaller than
this threshold are viewed as the background of the scene (value 0). This eliminates
a lot of noisy pixels, which most of the time have a value close to the background, and
also eliminates some of the pixels that represent the shadows cast by the moving
objects. In fact, in a gray-scale image the shadow of an object usually does not change
the feature (intensity) of a pixel by much, so in the background subtraction image the
shadow has a small value.
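A minimal sketch of this thresholding step is shown below. It assumes the background-subtraction result is available as an 8-bit difference image; the threshold value is an illustrative choice, not the value tuned in this project.

/* Convert a gray-scale difference image into a binary image.
 * Pixels below the threshold become background (0); the rest become
 * foreground (255). diff and binary may point to the same buffer.   */
void threshold_image(const unsigned char* diff, unsigned char* binary,
                     int numPixels, unsigned char threshold)
{
    int i;
    for (i = 0; i < numPixels; i++)
        binary[i] = (diff[i] < threshold) ? 0 : 255;
}

A typical call for a full PAL-sized gray-scale frame would be, for example, threshold_image(diffFrame, binFrame, 720 * 576, 30).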

4.4 Morphological Operations :

1. Erosion:
The first morphological operation used is erosion. It is a basic
operation and its primary effect is to erode away the boundaries of the different
foreground regions. Thus the foreground objects become smaller (some of them may
vanish entirely) and holes in objects become bigger. Let X be a subset of E and let B
denote the structuring element. The morphological erosion is defined by:

X ⊖ B = { z ∈ E | B_z ⊆ X }

where B_z is the structuring element B translated by z. In outline, all the pixels of the
foreground at which the translated structuring element B is totally contained in the
foreground will be contained in the eroded object. For example, consider a 3x3 square
structuring element having its morphological center the same as its geometrical center.


To compute a binary erosion, all the pixels of the foreground must be
processed. For each pixel of the foreground, the algorithm places the structuring element
(with the center of the structuring element on the pixel) and tests whether the structuring
element is completely contained in the foreground. If it is not, the current pixel is
considered background; on the contrary, if it is, the current pixel is contained in the
eroded foreground.

Figure 4.3: Erosion of a binary image with a disk structuring element


2. Dilation:
Like erosion, dilation is the second basic operation, and its primary
effect is to dilate the boundaries of the different foreground regions. Thus the
foreground objects become bigger and holes in objects become smaller (some of
them may totally disappear).
Let X be a subset of E and let B denote the structuring element. The
morphological dilation is defined by:

X ⊕ B = { z ∈ E | B_z ∩ X ≠ ∅ }

(for a symmetric structuring element B, which is the case for the disk and square
elements used here).

Figure 4.4: Dilation of binary image


In outline, all the pixels of the background which touch the foreground regions when
the structuring element B is placed on them will be contained in the dilated object.
For example, consider a 3x3 square structuring element having its
morphological center the same as its geometrical center. To
compute a binary dilation, all the pixels of the background must be processed. For
each pixel of the background, the algorithm places the structuring element (with the
center of the structuring element on the pixel) and tests whether the structuring element
touches at least one pixel of the foreground. If it does, the current pixel is
considered foreground; on the contrary, if it does not, the pixel stays a
background pixel.


3) Opening:

Figure 4.5: Opening of binary image


The opening operation is a combination of the two basic operations (erosion and
dilation). It is the dilation of the erosion, and its primary purpose is to eliminate
noise (small objects). This operation will also separate blobs which are linked by a
thin layer of pixels. Let X be a subset of E and let B denote the structuring element.
The morphological opening is defined by:

X ∘ B = (X ⊖ B) ⊕ B
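The following is a minimal C sketch of binary erosion, dilation and opening with a 3x3 square structuring element, written for clarity rather than speed. It is not the optimized IMGLIB routine used on the DM6437; the frame dimensions are illustrative assumptions and pixels outside the image are treated as background.

#include <string.h>

#define W 720   /* frame width  (assumed) */
#define H 576   /* frame height (assumed) */

/* 3x3 erosion: a pixel stays foreground (255) only if its whole
 * 3x3 neighbourhood is foreground. Border pixels become background. */
static void erode3x3(const unsigned char* in, unsigned char* out)
{
    int x, y, dx, dy;
    memset(out, 0, W * H);
    for (y = 1; y < H - 1; y++)
        for (x = 1; x < W - 1; x++) {
            int keep = 1;
            for (dy = -1; dy <= 1 && keep; dy++)
                for (dx = -1; dx <= 1; dx++)
                    if (in[(y + dy) * W + (x + dx)] == 0) { keep = 0; break; }
            out[y * W + x] = keep ? 255 : 0;
        }
}

/* 3x3 dilation: a pixel becomes foreground if any neighbour is foreground. */
static void dilate3x3(const unsigned char* in, unsigned char* out)
{
    int x, y, dx, dy;
    memset(out, 0, W * H);
    for (y = 1; y < H - 1; y++)
        for (x = 1; x < W - 1; x++) {
            int hit = 0;
            for (dy = -1; dy <= 1 && !hit; dy++)
                for (dx = -1; dx <= 1; dx++)
                    if (in[(y + dy) * W + (x + dx)] != 0) { hit = 1; break; }
            out[y * W + x] = hit ? 255 : 0;
        }
}

/* Opening = erosion followed by dilation; tmp is a scratch buffer of W*H bytes. */
void open3x3(const unsigned char* in, unsigned char* out, unsigned char* tmp)
{
    erode3x3(in, tmp);
    dilate3x3(tmp, out);
}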

Blob Analysis:
Once the segmentation is done, another image processing step must be
applied to the binary image. In fact, in order to count objects, the first step
is to identify all the objects in the scene and calculate their features.
This process is called blob analysis. It consists of analyzing the binary image,
finding all the blobs present and computing statistics for each one. Typically, the
blob features calculated are area (the number of pixels which compose
the blob), perimeter, location and blob shape. In this process, it is
possible to filter the different blobs by their features. For example, if the
blobs being searched for must have a minimum area, some blobs can be eliminated
by this algorithm if they don't respect this constraint (this limits the
number of blobs and thus reduces the computing operations). Two different kinds
of connectivity can be defined in the blob analysis algorithm depending on the
application: one takes only the pixels adjacent along the vertical and the
horizontal as touching pixels, and the other also includes diagonally adjacent
pixels.
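As an illustration of the blob counting step, the following is a minimal sketch of 4-connectivity connected-component labeling using an explicit stack (a simple flood fill). It is not the project's production code; the minimum-area filter and the use of heap buffers are illustrative assumptions.

#include <stdlib.h>

/* Count blobs in a binary image (255 = foreground) using 4-connectivity.
 * Blobs smaller than minArea pixels are treated as noise and ignored.    */
int count_blobs(const unsigned char* binary, int w, int h, int minArea)
{
    int count = 0, i;
    unsigned char* visited = (unsigned char*)calloc(w * h, 1);
    int* stack = (int*)malloc(w * h * sizeof(int));
    if (visited == NULL || stack == NULL) { free(visited); free(stack); return -1; }

    for (i = 0; i < w * h; i++) {
        int top, area;
        if (binary[i] != 255 || visited[i])
            continue;
        /* flood-fill one blob starting at pixel i */
        top = 0; area = 0;
        stack[top++] = i;
        visited[i] = 1;
        while (top > 0) {
            int p = stack[--top], x = p % w, y = p / w;
            area++;
            /* push the 4-connected neighbours that are unvisited foreground */
            if (x > 0     && binary[p - 1] == 255 && !visited[p - 1]) { visited[p - 1] = 1; stack[top++] = p - 1; }
            if (x < w - 1 && binary[p + 1] == 255 && !visited[p + 1]) { visited[p + 1] = 1; stack[top++] = p + 1; }
            if (y > 0     && binary[p - w] == 255 && !visited[p - w]) { visited[p - w] = 1; stack[top++] = p - w; }
            if (y < h - 1 && binary[p + w] == 255 && !visited[p + w]) { visited[p + w] = 1; stack[top++] = p + w; }
        }
        if (area >= minArea)
            count++;
    }
    free(visited);
    free(stack);
    return count;
}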

Chapter 5

RESULTS & CONCLUSION

5.1 TEST CODE FOR COLOR BARS


Generate a colorbars test box, height and width will define the size of the generated
colorbar buffer, the return value is a pointer to the buffer

#include <stdlib.h>  /* for malloc() */

/* Generate an eight-bar test pattern in interleaved YCbCr 4:2:2 (Cb Y Cr Y ...).
 * The buffer holds height * width * 2 bytes (2 bytes per pixel).
 * The original hand-unrolled version has been restructured around lookup
 * tables; the Y/Cb/Cr values per bar are unchanged.                        */
void* generate_colorbars(
    int height,   /* height of colorbar buffer */
    int width)    /* width of colorbar buffer  */
{
    /* 75% color bars: white, yellow, cyan, green, magenta, red, blue, black */
    static const unsigned char barY[8]  = {180, 162, 131, 112,  84,  65,  35,  16};
    static const unsigned char barCb[8] = {128,  44, 156,  72, 184, 100, 212, 128};
    static const unsigned char barCr[8] = {128, 142,  44,  58, 198, 212, 114, 128};

    int xx = 0;  /* local horizontal (byte) counter */
    int yy = 0;  /* local vertical counter          */
    unsigned char* localBoxBuffPtr;  /* buffer pointer to fill in with color bar values */

    localBoxBuffPtr = (unsigned char*)malloc(height * width * 2);  /* allocate the buffer */
    if (localBoxBuffPtr == NULL)
        return NULL;

    for (yy = 0; yy < height; yy += 1) {
        for (xx = 0; xx < width * 2; xx += 1) {
            int bar = xx / ((width * 2) / 8);   /* which of the 8 vertical bars            */
            unsigned char value;
            if (bar > 7)
                bar = 7;                        /* guard against rounding at the right edge */

            switch (xx % 4) {                   /* byte order within a pixel pair: Cb Y0 Cr Y1 */
            case 0:  value = barCb[bar]; break; /* Cb */
            case 2:  value = barCr[bar]; break; /* Cr */
            default: value = barY[bar];  break; /* Y0 or Y1 */
            }
            localBoxBuffPtr[(yy * width * 2) + xx] = value;
        }
    }
    return localBoxBuffPtr;
} /* End generate_colorbars() */

FIGURE 5.1 : COLOR BAR OUTPUT


5.2 EDGE DETECTION USING IMGLIB


The Texas Instruments C64x+ IMGLIB is an optimized image/video
processing function library for C programmers using TMS320C64x+ devices. It
includes many C-callable, assembly-optimized, general-purpose image/video
processing routines. These routines are used in real-time applications where optimal
execution speed is critical. Using these routines assures execution speeds
considerably faster than equivalent code written in standard ANSI C. In
addition, by providing ready-to-use DSP functions, TI IMGLIB can significantly
shorten image/video processing application development time.
In Code Composer Studio, IMGLIB can be added by selecting
"Add Files to Project" from the Project menu and choosing imglib2.l64P from the list
of libraries under the c64plus folder in imglib_v2xx. Also ensure that the project is
linked to the correct run-time support library (rts64plus.lib). An alternative way to
include the above two libraries in your project is to add the following lines to your
linker command file: -lrts64plus.lib -limglib2.l64P. The include directory contains the
header files that must be included in the C code when you call an IMGLIB2
function from C code, and it should be added to the "include path" in the CCS build
options. The Image and Video Processing Library (IMGLIB) contains about
70 building-block kernels that can be used for image and video processing
applications.
IMGLIB includes:
Compression and Decompression : DCT, motion estimation, quantization,
wavelet Processing
Image Analysis: Boundary and perimeter estimation, morphological
operations, edge detection, image histogram, image thresholding
Image Filtering & Format Conversion: image convolution, image
Correlation, median filtering, color space conversion
VLIB is a software library of more than 40 kernels from TI that accelerates video
analytics development and increases performance by up to 10 times. These 40+ kernels
provide the ability to perform:
Background Modeling & Subtraction
Object Feature Extraction
Tracking & Recognition
Low-level Pixel Processing
Step 1: Open the video preview project, video_preview.pjt.
Step 2: Add these two headers for the Sobel and median filter functions.

#include <C:\dvsdk_1_01_00_15\include\IMG_sobel_3x3_8.h>
#include <C:\dvsdk_1_01_00_15\include\IMG_median_3x3_8.h>

Step 3: Add the two calls below with the following parameters. frameBuffPtr is a
pointer to a frame structure; to access the frame buffer pointer, use
frameBuffPtr->frame.frameBufferPtr. The numeric arguments are the frame dimensions,
with 1440 being the width of a line in bytes (720 pixels x 2 bytes per pixel) and 480
(576 for a full PAL frame) the number of lines processed.

FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);
IMG_sobel_3x3_8((frameBuffPtr->frame.frameBufferPtr),
                (frameBuffPtr->frame.frameBufferPtr), 480, 1440);
IMG_median_3x3_8((frameBuffPtr->frame.frameBufferPtr), 8,
                 (frameBuffPtr->frame.frameBufferPtr));
FVID_exchange(hGioVpbeVid0, &frameBuffPtr);

FIGURE 5.2: EDGE DETECTION OUTPUT WITH WHITE BACKGROUND


5.3 Background Subtraction On DM6437


The simple way to access the frame buffer itself is to reach into the structure
with frameBuffPtr->frame.frameBufferPtr. This will return the address of the
current frame you recently swapped or plan to swap with FVID_exchange(). You
can also typecast it to a type you find more useful; below is a small example of
extracting the frame buffer pointer from an FVID_exchange() call.
int* framepointer;
FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);
framepointer = (int*)(frameBuffPtr->frame.frameBufferPtr);

Note that the frame you are now pointing to is an interleaved YCbCr 4:2:2 stream, so
every other byte is a Y, every 4th byte is a Cb, and every other 4th byte is a Cr (i.e.
Cb Y Cr Y Cb Y Cr Y Cb Y ...), which also means the size of the buffer will be your
frame width x height x 2 bytes per pixel.
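The layout above can be expressed directly as index arithmetic. The sketch below is a hypothetical helper for reading one pixel's luma and chroma from such a buffer; it is not part of the original project code and assumes the Cb Y Cr Y byte order described above.

/* Read the Y, Cb, Cr values of pixel (x, y) from an interleaved
 * YCbCr 4:2:2 buffer laid out as Cb Y Cr Y Cb Y Cr Y ...
 * Each pair of horizontally adjacent pixels shares one Cb and one Cr. */
void get_pixel_ycbcr(const unsigned char* buf, int width, int x, int y,
                     unsigned char* Y, unsigned char* Cb, unsigned char* Cr)
{
    int line = y * width * 2;        /* 2 bytes per pixel              */
    int pair = (x / 2) * 4;          /* start of the 4-byte pixel pair */

    *Cb = buf[line + pair + 0];
    *Cr = buf[line + pair + 2];
    *Y  = buf[line + (x * 2) + 1];   /* luma sits on the odd offsets   */
}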

/* The buffers below are assumed to be declared globally elsewhere in the
 * project; 829440 = 720 * 576 * 2 bytes, i.e. one PAL YCbCr 4:2:2 frame. */
unsigned char arr[829440];   /* stored background frame */
unsigned char arr1[829440];  /* subtraction result      */

void imagebw(void* currentFrame, int x, int y)
{
    int xx = 0;
    /* overwrite every alternate byte (starting at offset 1) with the neutral value 0x80 */
    for (xx = 1; xx < (x * y) * 2; xx += 2)
        *(((unsigned char*)currentFrame) + xx) = 0x80;
}

void copyframe(void* currentFrame)   /* store background model */
{
    int xx = 0;
    for (xx = 0; xx < 829440; xx += 1)
        arr[xx] = *(((unsigned char*)currentFrame) + xx);
}

void writeframe(void* currentFrame)  /* return frame to frame buffer */
{
    int xx = 0;
    for (xx = 0; xx < 829440; xx += 1)
        *(((unsigned char*)currentFrame) + xx) = arr1[xx];
}

void subtract(void* currentFrame)    /* frame subtraction */
{
    int xx = 0;
    for (xx = 0; xx < 829440; xx += 1)
        arr1[xx] = *(((unsigned char*)currentFrame) + xx) - arr[xx];
}
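A hedged sketch of how these helpers might be chained inside the capture/display loop is shown below. It mirrors the FVID_exchange pattern from Section 5.2, using the same hGioVpfeCcdc, hGioVpbeVid0 and frameBuffPtr objects from the video preview project; the exact initialization and loop structure of the original project may differ.

/* Sketch of the per-frame loop: grab a frame, subtract the stored
 * background, and send the difference frame out for display.        */
void background_subtraction_loop(void)
{
    int firstFrame = 1;

    while (1) {
        FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);          /* capture */

        if (firstFrame) {
            copyframe(frameBuffPtr->frame.frameBufferPtr);   /* store background once */
            firstFrame = 0;
        }

        subtract(frameBuffPtr->frame.frameBufferPtr);        /* arr1 = frame - background    */
        writeframe(frameBuffPtr->frame.frameBufferPtr);      /* copy result back into buffer */

        FVID_exchange(hGioVpbeVid0, &frameBuffPtr);          /* display */
    }
}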

5.4 CONCLUSION
We have used the background subtraction method, utilized blob analysis
with edge detection algorithms, and arrived at an estimate of the count of people in
a video. This technique gives accurate results in an environment where people
aren't carrying objects and aren't too close to each other.
So, in a nutshell, we can successfully count people in a restricted
environment, but to make this system more generic we would have to resort to other
techniques.


5.5 FUTURE SCOPE


The system can be improved to make sure that objects carried by people
in a video are not counted as separate people.
Also, if people are close to each other in the input video, then with the
present system their count will not be accurate, as the blob analysis will be faulty. This
can be improved in future systems.


CHAPTER 6

REFERENCES

[1] Real-time people counting system using video camera, Roy-Erland Berg, 30 May 2007.
[2] Real-Time People Counting System Using Video Camera, Damien Lefloch, Master's
thesis, Master of Computer Science, Image and Artificial Intelligence, 2007.
Supervisors: Jon Y. Hardeberg, Faouzi Alaya Cheikh, Pierre Gouton.
[3] TI E2E Community - https://e2e.ti.com/
[4] https://processors.wiki.ti.com
[5] Fast People Counting Using Head Detection From Skeleton Graph, Djamel Merad,
Kheir-Eddine Aziz, Nicolas Thome, 2010 Seventh IEEE International Conference on
Advanced Video and Signal Based Surveillance.
[6] Digital Image Processing, Rafael C. Gonzalez and Richard E. Woods.
[7] Handbook of Image and Video Processing, edited by Al Bovik.
[8] TMS320C64x+ DSP Image/Video Processing Library (v2.0.1) Programmer's Guide,
Texas Instruments.
[9] TMS320DM6437 DVDP Getting Started Guide.
[10] TMS320DM6437 Datasheet.
[11] Spectrum Digital Technical Reference Manual, DM6437.


APPENDIX
A) Video File Formats
The three most popular classes of color models are RGB (used in computer
graphics); YIQ, YUV and YCbCr (used in video systems); and CMYK (used in color
printing). The YUV color space is used by the PAL, NTSC and SECAM (Séquentiel
Couleur à Mémoire, or Sequential Color with Memory) composite color video
standards. The black-and-white system used only luma (Y) information; color
information (U and V) was added in such a way that a black-and-white receiver
would still display a normal black-and-white picture. For digital RGB values with a
range of 0 to 255, Y has a range of 0 to 255, U a range of 0 to +/-112, and V a range of 0
to +/-157. YCbCr, or its close relatives YUV and YPbPr, are designed to be efficient at
encoding RGB values so that they consume less space while retaining the full perceptual
value. YCbCr is a scaled and offset version of the YUV color space. Y is defined to have
a nominal 8-bit range of 16-235; Cb and Cr are defined to have a nominal range of 16-240.
There are several YCbCr sampling formats, such as 4:4:4, 4:2:2, 4:1:1, and 4:2:0.
Now, if we filter a 4:4:4 YCbCr signal by subsampling the chroma by a
factor of two horizontally, we end up with 4:2:2, which implies that there are four luma
values for every two chroma values on a given video line. Each (Y, Cb) or (Y, Cr)
pair represents one pixel value. Another way to say this is that a chroma pair
coincides spatially with every other luma value. 4:2:2 YCbCr qualitatively shows
little loss in image quality compared with its 4:4:4 YCbCr source, even though it
represents a saving of 33% in bandwidth over 4:4:4 YCbCr. 4:2:2 YCbCr is the
foundation of the ITU-R BT.601 video recommendation, and it is the most common
format for transferring digital video between subsystem components.
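As a minimal sketch of that horizontal chroma subsampling, the function below converts one line of planar 4:4:4 data to the 4:2:2 layout by keeping every other chroma sample. It is an illustrative assumption, not code from this project; a production converter would usually average neighbouring chroma samples instead of dropping them.

/* Convert one video line from planar 4:4:4 (one Cb and one Cr per pixel)
 * to 4:2:2 (one Cb and one Cr per pair of pixels) by dropping every
 * second chroma sample. width must be even.                           */
void line_444_to_422(const unsigned char* cb444, const unsigned char* cr444,
                     unsigned char* cb422, unsigned char* cr422, int width)
{
    int x;
    for (x = 0; x < width; x += 2) {
        cb422[x / 2] = cb444[x];   /* keep the chroma of the even pixel */
        cr422[x / 2] = cr444[x];
    }
}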

FIGURE A.1 : YCbCr Sampling

