PROJECT REPORT
On
Project Guide
Prof. P. Mahajani
(Internal Guide)
Sponsored by
College
Year: 2015-2016
Maharashtra Institute of Technology, Pune 38
Department of Electronics and Telecommunication
MAEER's
CERTIFICATE
This is to certify that the Project entitled
ESTIMATING THE COUNT OF PEOPLE FROM A VIDEO
has been carried out successfully by
Prasad Sunil Udawant (B120023170)
Tejas Suresh Pandhare (B120023102)
Pratik Gahininath Kekane (B120023117)
during the Academic Year 2015-2016 in partial fulfilment of their course of study for
Bachelor's Degree in
Electronics and Telecommunication
as per the syllabus prescribed by the
Shri Savitribai Phule Pune University.
ABSTRACT
There is ever-increasing pressure to provide services for an ever-growing
human population. On many occasions, managing crowds becomes critical,
especially at public places like pilgrimage centers, malls, and tourist spots.
This is where technology comes to help us out.
There are existing technologies to detect people in an enclosed
environment. These technologies use sensors such as infrared and thermal
detectors, with varying accuracies and drawbacks. Nowadays there is an
increasing trend toward video-based solutions. With image processing providing
a variety of processing techniques and efficient algorithms, high accuracy can
be achieved. We are making use of the Da Vinci video processor (DM6437),
developed by Texas Instruments, which features a Very Long Instruction Word
(VLIW) architecture. The video processing back end and front end (VPBE and
VPFE) are video-specific peripherals that enable easy processing of real-time
video. The features of the Da Vinci, combined with image segmentation, help
detect the number of people. We use techniques like histogram analysis and
K-means clustering along with edge detection to ensure a reliable count.
On implementation of our proposed model, we will be able to detect the
number of people in a specific environment in response to a real-time video
input. This will open the door to a number of control actions, depending on
the application at hand.
ACKNOWLEDGEMENT
We sincerely thank our final-year mentor Prof. (Mrs.) P. Mahajani for
all her support and help. She gave shape to our abstract idea with stimulating
suggestions and encouragement, resulting in a successful project. Her timely
guidance was the reason for the systematic progress of the project.
We also appreciate the role of all the other staff members, the
departmental facilities, and the Head of Department, who helped in approving
our project and conducted several review sessions to track our progress and
guide us along the correct path.
LIST OF ABBREVIATIONS
APL - Application Layer
DVDP - Digital Video Development Platform
DVSDK - Digital Video Software Development Kit
EPSI - Embedded Peripheral Software Interface
EVM - Evaluation Module
GPP - General Purpose Processor
HD - High Definition
IOL - Input Output Layer
NTSC - National Television System Committee
PAL - Phase Alternating Line
SD - Standard Definition
SPL - Signal Processing Layer
VISA - Video, Image, Speech, Audio
VPBE - Video Processing Back End
VPFE - Video Processing Front End
VPSS - Video Processing Sub-System
xDAIS - eXpressDSP Algorithm Interface Standard
xDM - eXpressDSP Digital Media
LIST OF FIGURES
Serial no.   Figure Name
4.1          System Flowchart
4.2          Background Subtraction
A.1          YCbCr Sampling
CONTENTS
Chapter 1. Introduction
1.1 Scope of Project
1.2 Organization of Report
Chapter 2. Literature Survey
2.1 Present Scenario
Chapter 3. System Development
3.1 System Specifications
3.2 System Block Diagram & Description
3.3 System Block Components
    Video Standards
    Color CCD Camera
    TMS320DM6437 Da Vinci Video Processor
3.4 Complexities Involved
Chapter 4. System Design
4.1 Image Preprocessing
4.2 Background Subtraction
4.3 Image Segmentation
4.4 Morphological Operations
Chapter 5. Implementation of System & Results
5.1 Test Code for Color Bars
5.2 Edge Detection using IMGLIB
5.3 Background Subtraction on DM6437
Chapter 6. References
Appendix A: Video File Format
Chapter 1
INTRODUCTION
1.1 SCOPE OF PROJECT
The motivation for such a system is to gather data about how many people
are inside a building at a given time. This will help the owners plan
fire-extinguishing equipment or the size and placement of fire exits. Building
owners are required by law to have enough of this equipment based on how
many people can gather inside. A computer-vision counting system has
the advantage of not disrupting the flow of traffic as contact-based systems
might, and it is more robust than simple photoelectric cells. Knowing when and
how many customers are inside a shopping mall could also help to optimize
labor scheduling and system controls and monitor the effectiveness of
promotional events. Optimization of security measures is another possible
benefit: knowing how many security guards should be assigned, and which
hot-spots inside the mall they should patrol.
Our work will hopefully answer questions like:
1. Can we achieve better separation of groups into individuals?
Exploring algorithms and methods so that each individual can be
detected, tracked and counted.
2. Can we find features to discriminate people from objects?
Features need to be found and combined to make good decisions about
foreground objects.
3. Can our proposed algorithm detect moving as well as stationary
humans?
Chapter 2
LITERATURE SURVEY
tracking and image inversion using the Da Vinci processor. More specifically,
single-object and two-object tracking is performed on the scene captured by a
CCD camera, which acts as the video input device, and the output is displayed
on an LCD display. The tracking happens in real time, consuming 30 frames per
second (fps), and is robust to background and illumination changes. The
performance of single-object tracking using background subtraction and blob
detection was very efficient in speed and accuracy compared to a PC (MATLAB)
implementation of a similar algorithm. Execution times for different blocks of
single-object tracking were estimated using the profiler, and the accuracy of
the detection was verified using the debugger provided by TI Code Composer
Studio (CCS). We demonstrate that the TMS320DM6437 processor provides at
least a ten-times speed-up and is able to track a moving object in real time.
PIR sensors are attractive because of their low cost and power consumption,
small form factor, and unobtrusive, privacy-preserving interaction. In
particular, a dense array of PIR sensors with digital output and the modulated
visibility of Fresnel lenses can provide capabilities for tracking human
motion, identifying walking subjects, and counting people entering or leaving
the entrance of a room or building. However, the analog output signal of PIR
sensors carries more information than simple people presence, including the
distance of the body from the PIR sensor, the velocity of the movement (i.e.,
direction and speed), and body shape and gait (i.e., a particular way or
manner of walking). Thus, we can leverage discriminative features of the
analog output signal of PIR sensors in order to develop various applications
for indoor human tracking and localization.
A number of systems are based upon a similar approach, i.e., feature-based
regression. This involves detection of humans based upon features extracted
from the background and foreground of the image. The interpretation of these
features varies from system to system, and many follow a complex mathematical
model to retrieve meaningful information from them.
Chapter 3
SYSTEM DEVELOPMENT
Software:
Code Composer Studio v3.3
MATLAB 2014a (for testing purposes only)
Hardware:
TMS320DM6437 DA VINCI Video processor
CRT TV Box
Color CCD camera for video surveillance
System Block Description:
The CCD camera will be installed in an area which is to be brought under video
surveillance. It will be installed at a minimum height and a minimum angle
that allow it to detect people of varying heights. The video stream, which is
in PAL form, is then fed to the Da Vinci video processor. This device includes
a Video Processing Sub-System (VPSS) with two configurable video/imaging
peripherals:
1) the Video Processing Front End (VPFE), used for video capture, and
2) the Video Processing Back End (VPBE), used for video output.
The video processing front end comprises a CCD controller, a preview engine,
a histogram module, auto-exposure/white balance, an auto-focus module, and a
resizer. Common video decoders, CMOS sensors, and CCDs can
The two video standards currently employed in television systems are as
follows:
A] NTSC (National Television System Committee)
> 29.97 interlaced frames of video per second
> Scans 525 lines per frame. Of these 525 lines, 480 form the visible raster
and the others are for synchronization and vertical retrace
> Gives higher temporal resolution than PAL
> The screen updates more frequently, hence motion is rendered better in NTSC
than in PAL video
B] PAL (Phase Alternating Line)
> PAL alternates the chroma phase between each line of video so that any
drifts in chroma decoding average out between lines. NTSC doesn't have this
protection, and as a result its chroma reproduction can be wrong; however,
PAL can be accused of having less chroma detail
> PAL specifies 768 pixels per line and 625 lines, or 50 fields (25 frames)
per second
> PAL gives higher spatial resolution than NTSC
> PAL video is of higher resolution than NTSC video
Picture Elements:
Horizontal Resolution:
Sensitivity:
S/N Ratio: Over 48 dB
Electronic Shutter: 1/60 (1/50) to 1/100,000 s
Auto Iris: On/Off switch
Gamma Correction: 0.45
Video Output:
Power Source:
Sync. Mode: Internal sync.
Lens Mount: C/CS mount
Power Consumption: 3 W max.
SPECIAL FEATURES
view, so that a dimly lit foreground object in the center area can be clearly
distinguished from a brightly lit background.
BLC should not be used unless it is needed to compensate for backlighting.
Da Vinci Software:
Interoperable, optimized video and audio codecs leveraging the DSP and
integrated accelerators, with APIs within operating systems (Linux) for rapid
software implementation: the Codec Engine, DSP/BIOS, the NDK, and audio and
video codecs.
Da Vinci Development Tools/Kits:
Complete development kits along with reference designs: the DM6437 DVSDK,
Code Composer Studio, Green Hills tools, and virtual Linux. Da Vinci video
processor solutions are tailored for digital video, image, and vision
applications. The Da Vinci platform includes a general purpose processor
(GPP), video ac
Engine, Link and DSP/BIOS. Memory buffers, along with their pointers, provide
input and output to the xDM functions. This decouples the SPL from all other
layers. The Signal Processing Layer (SPL) presents VISA APIs to all other
layers. The main components of the SPL are xDM, xDAIS, the VISA APIs, and the
Codec Engine interface.
>> Input Output Layer (IOL):
The Input Output Layer (IOL) covers all the peripheral drivers and
generates buffers for input or output data. Whenever a buffer is full or
empty, an interrupt is generated to the APL. Typically, these buffers reside
in shared memory, and only pointers are passed from the IOL to the APL and
eventually to the SPL. The IOL is delivered as drivers integrated into an
operating system such as Linux or WinCE. In the case of Linux, these drivers
reside in the kernel space of the Linux OS. The IOL presents the OS-provided
APIs as well as the EPSI APIs to all other layers. The IOL contains: the
Video Processing Subsystem (VPSS) device driver, used for video capture and
display; a USB driver, to capture video to USB-based media; a UART serial
port driver, used for debugging via a console application; an Ethernet (EMAC)
driver, needed when captured video is sent over the network; an I2C driver,
used internally by the VPFE driver as its communication protocol; a
Multichannel Audio Serial Port (McASP) driver for the audio processing
system; and a Multichannel Buffered Serial Port (McBSP) driver for buffering
stream data.
>> Application Layer (APL):
The Application Layer interacts with the IOL and SPL. It makes calls to
the IOL for data input and output, and to the SPL for processing. The Sample
Application Thread (SAT) is a sample application component that shows how to
call the EPSI and VISA APIs and interface with the SPL and IOL as built-in
library functions. All other application components are left to the
developer, who may develop them or leverage the vast body of open-source
community software. These include, but are not limited to, graphical user
interfaces (GUIs), middleware, networking stacks, etc. The master thread is
the highest-level thread, such as an audio or video thread, that handles the
opening of I/O resources (through the EPSI API) and the creation of
processing algorithm instances (through the VISA API), as well as the freeing
of these resources. Once the necessary resources for a given task are
acquired, the master thread specifies an input source for data (usually a
driver or file), the processing to be performed on the input data (such as
compression or decompression), and an output destination for the processed
data (usually a driver or file).
The Network Developer's Kit (NDK) provides services such as an HTTP
server, DHCP client/server, DNS server, etc. that reside in the application
layer. Note that these services use the socket interface of the NDK, which
resides in the I/O layer, so the NDK spans both layers.
The NTSC/PAL system also includes a video D/A converter; second, a timing
generator, responsible for generating the specific timing required for analog
video output; and lastly a digital LCD controller, which supports various LCD
display formats and YUV outputs for interfacing to high-definition video
encoders and/or DVI/HDMI interface devices.
erroneous count of people. To avoid such situations we have included head
detection and pose estimation in our algorithm.
Chapter 4
SYSTEM DESIGN
Introduction
People counting algorithms are applied to different applications such as
automated video surveillance, traffic monitoring, and stampede management.
They involve various image processing algorithms such as image segmentation
and morphological image processing.
The basic formulation flags a pixel as foreground when it differs from the
background model by more than a threshold:
|frame(i) - background| > Threshold
(4.1)
The background image varies due to many factors, such as illumination changes
(gradual, or sudden changes due to clouds in the background), changes due to
camera oscillations, and changes due to high-frequency background objects
(such as tree branches, sea waves, etc.).
The basic methods for background subtraction are:
1. Frame difference
|frame(i) - frame(i - 1)| > Threshold
(4.2)
Here the previous frame is used as the background estimate. This evidently
works only under particular conditions of object speed and frame rate, and is
very sensitive to the threshold.
2. Average or median
Background image is obtained as the average or the median of the previous n frames.
This method is rather fast, but needs large memory. The memory requirement is n
*size(frame).
3. Background obtained as the running average
B(i + 1) = α * F(i) + (1 - α) * B(i)
(4.3)
where α, the learning rate, is typically 0.05. This method imposes no
additional memory requirements.
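Methods 1 and 3 above can be sketched in C as follows. The function names, the 0/255 mask encoding, and the float background buffer are our own illustration, not the DM6437 implementation:

```c
#include <stdlib.h>

/* Method 1: frame difference against the previous frame. */
void frame_difference(const unsigned char *frame, const unsigned char *prev,
                      unsigned char *mask, int n, int threshold)
{
    int i;
    for (i = 0; i < n; i++)
        mask[i] = (abs((int)frame[i] - (int)prev[i]) > threshold) ? 255 : 0;
}

/* Method 3: running-average background, B(i+1) = a*F(i) + (1-a)*B(i). */
void update_background(const unsigned char *frame, float *background,
                       int n, float alpha)   /* alpha is typically 0.05 */
{
    int i;
    for (i = 0; i < n; i++)
        background[i] = alpha * frame[i] + (1.0f - alpha) * background[i];
}
```

Note that the running average needs only one float buffer the size of a frame, versus n frames for the average/median method.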
We are using an adaptive background subtraction algorithm:
transformed into a binary image to obtain the segmented image (i.e.,
separation of the foreground and the background). To transform a gray-scale
image (256 values) into a binary image (2 values), a threshold must be
applied. All pixel values smaller than this threshold are viewed as the
background of the scene (value 0). This eliminates a lot of noisy pixels,
which most of the time have a value close to the background, and it also
eliminates some of the pixels which represent the shadows made by the moving
objects. In fact, in a gray-scale image, the shadow of an object most of the
time does not change the feature (intensity) of the pixel by much, so in the
background subtraction this shadow has a small value.
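The thresholding step described above can be sketched as follows (the function name and the 0/255 encoding are our own choices):

```c
/* Pixels below the threshold become background (0), the rest
   foreground (255). */
void threshold_to_binary(const unsigned char *gray, unsigned char *binary,
                         int n, unsigned char threshold)
{
    int i;
    for (i = 0; i < n; i++)
        binary[i] = (gray[i] < threshold) ? 0 : 255;
}
```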
1. Erosion:
The first morphological operation used is erosion. It is a basic
operation, and its primary effect is to erode away the boundaries of the
different foreground regions. Thus foreground objects become smaller (small
ones may vanish entirely) and holes in objects become bigger. Let X be a
subset of E and let B denote the structuring element. The morphological
erosion is defined by:
X ⊖ B = { z ∈ E | Bz ⊆ X }
where Bz is B translated by z. In outline, only those pixels of the
foreground object at which the structuring element B is totally contained in
the object remain in the eroded object. For example, consider a 3x3 square
structuring element having its morphological center the same as its
geometrical center.
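Erosion with that 3x3 square structuring element can be sketched as below. Setting border pixels to background is a simplification of this sketch, not a requirement of the definition:

```c
/* 3x3 erosion on a 0/255 binary image: an output pixel stays foreground
   only if its whole 3x3 neighborhood is foreground. */
void erode_3x3(const unsigned char *in, unsigned char *out, int w, int h)
{
    int x, y, dx, dy;
    for (y = 0; y < h; y++) {
        for (x = 0; x < w; x++) {
            unsigned char keep = 255;
            if (x == 0 || y == 0 || x == w - 1 || y == h - 1)
                keep = 0;                       /* border: background */
            else
                for (dy = -1; dy <= 1; dy++)
                    for (dx = -1; dx <= 1; dx++)
                        if (in[(y + dy) * w + (x + dx)] == 0)
                            keep = 0;           /* element not contained */
            out[y * w + x] = keep;
        }
    }
}
```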
2. Dilation:
Like erosion, dilation is a basic operation; its primary effect is to
dilate the boundaries of the different foreground regions. Thus foreground
objects become bigger and holes in objects become smaller (small holes may
totally disappear).
Let X be a subset of E and let B denote the structuring element. The
morphological dilation is defined by:
X ⊕ B = { x + b | x ∈ X, b ∈ B }
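The dilation counterpart of the erosion sketch can be written as below; clipping the neighborhood at the image border is our own choice for this sketch:

```c
/* 3x3 dilation on a 0/255 binary image: an output pixel becomes
   foreground if any pixel in its 3x3 neighborhood is foreground. */
void dilate_3x3(const unsigned char *in, unsigned char *out, int w, int h)
{
    int x, y, dx, dy;
    for (y = 0; y < h; y++) {
        for (x = 0; x < w; x++) {
            unsigned char hit = 0;
            for (dy = -1; dy <= 1; dy++)
                for (dx = -1; dx <= 1; dx++) {
                    int nx = x + dx, ny = y + dy;
                    if (nx >= 0 && ny >= 0 && nx < w && ny < h &&
                        in[ny * w + nx] == 255)
                        hit = 255;              /* element touches object */
                }
            out[y * w + x] = hit;
        }
    }
}
```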
3) Opening:
An opening is an erosion followed by a dilation with the same structuring
element; it removes small noise blobs while largely preserving the size of
the remaining foreground objects.
Blob Analysis:
Once the segmentation is done, another image processing step must be
run on the binary image. In fact, in order to count objects, the first step
is to identify all the objects in the scene and calculate all their features.
This process is called blob analysis. It consists of analyzing the binary
image, finding all the blobs present, and computing statistics for each one.
Typically, the blob features calculated are area (the number of pixels which
compose the blob), perimeter, location, and blob shape. In this process, it
is possible to filter the different blobs by their features. For example, if
the blobs searched for must have a minimum area, some blobs can be eliminated
by this algorithm if they don't respect this constraint (this limits the
number of blobs and thus reduces the computing operations). Two different
kinds of connectivity can be defined in the blob analysis algorithm depending
on the application. One takes only the adjacent pixels along the vertical and
the horizontal as touching pixels, and the other also includes diagonally
adjacent pixels.
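A minimal sketch of this blob counting, using 4-connectivity and the minimum-area filter described above, is shown below. All names are ours, and a full implementation would also accumulate perimeter, location, and shape per blob:

```c
#include <stdlib.h>

/* Count 4-connected foreground (255) components of at least min_area
   pixels, using an explicit stack flood fill (no recursion).
   Returns the blob count, or -1 on allocation failure. */
int count_blobs(const unsigned char *binary, int w, int h, int min_area)
{
    int *stack = (int *)malloc(w * h * sizeof(int));
    unsigned char *seen = (unsigned char *)calloc(w * h, 1);
    int count = 0, i;
    if (!stack || !seen) { free(stack); free(seen); return -1; }
    for (i = 0; i < w * h; i++) {
        int top, area;
        if (binary[i] != 255 || seen[i]) continue;
        top = 0; area = 0;
        stack[top++] = i; seen[i] = 1;
        while (top > 0) {                       /* flood fill one blob */
            int p = stack[--top], x = p % w, y = p / w;
            area++;
            if (x > 0     && binary[p-1] == 255 && !seen[p-1]) { seen[p-1] = 1; stack[top++] = p-1; }
            if (x < w - 1 && binary[p+1] == 255 && !seen[p+1]) { seen[p+1] = 1; stack[top++] = p+1; }
            if (y > 0     && binary[p-w] == 255 && !seen[p-w]) { seen[p-w] = 1; stack[top++] = p-w; }
            if (y < h - 1 && binary[p+w] == 255 && !seen[p+w]) { seen[p+w] = 1; stack[top++] = p+w; }
        }
        if (area >= min_area)                   /* filter by blob area */
            count++;
    }
    free(stack); free(seen);
    return count;
}
```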
Chapter 5
IMPLEMENTATION OF SYSTEM & RESULTS

5.1 TEST CODE FOR COLOR BARS

The following routine fills a display buffer with the eight standard
75%-amplitude color bars (white, yellow, cyan, green, magenta, red, blue,
black) in the interleaved YCbCr 4:2:2 format expected by the VPBE:

void* generate_colorbars(
    int height,    // height of colorbar buffer
    int width)     // width of colorbar buffer
{
    /* Standard 75% color-bar values as (Y, Cb, Cr):
       white, yellow, cyan, green, magenta, red, blue, black */
    static const unsigned char barY[8]  = {180, 162, 131, 112,  84,  65,  35,  16};
    static const unsigned char barCb[8] = {128,  44, 156,  72, 184, 100, 212, 128};
    static const unsigned char barCr[8] = {128, 142,  44,  58, 198, 212, 114, 128};
    int xx = 0;                    // local horizontal counter
    int yy = 0;                    // local vertical counter
    unsigned char* localBoxBuffPtr;// buffer pointer to fill with color-bar values

    localBoxBuffPtr = (unsigned char*)malloc(height*width*2); // allocate the buffer
    if (localBoxBuffPtr == NULL)
        return NULL;

    for (yy = 0; yy < height; yy++) {
        for (xx = 0; xx < width*2; xx++) {
            int bar = xx / ((width*2)/8); // which of the 8 vertical bars
            unsigned char v;
            if (bar > 7) bar = 7;
            switch (xx % 4) {             // interleaved Cb Y0 Cr Y1 stream
            case 0:  v = barCb[bar]; break;  // byte is Cb
            case 2:  v = barCr[bar]; break;  // byte is Cr
            default: v = barY[bar];  break;  // byte is Y0 or Y1
            }
            localBoxBuffPtr[yy*width*2 + xx] = v;
        }
    }
    return localBoxBuffPtr;
} // End generate_colorbars()
5.2 EDGE DETECTION USING IMGLIB

#include <C:\dvsdk_1_01_00_15\include\IMG_sobel_3x3_8.h>
#include <C:\dvsdk_1_01_00_15\include\IMG_median_3x3_8.h>

Step 3: Add these two caller functions with the following parameters.
frameBuffPtr is a pointer to a frame structure; to access the frame buffer
pointer, use frameBuffPtr->frame.frameBufferPtr. 576 and 1440 are the height
and the width (in bytes) of a full PAL frame.

FVID_exchange(hGioVpfeCcdc, &frameBuffPtr);
IMG_sobel_3x3_8((frameBuffPtr->frame.frameBufferPtr), (frameBuffPtr->frame.frameBufferPtr), 480, 1440);
IMG_median_3x3_8((frameBuffPtr->frame.frameBufferPtr), 8, (frameBuffPtr->frame.frameBufferPtr));
FVID_exchange(hGioVpbeVid0, &frameBuffPtr);
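For reference, the operation the IMGLIB Sobel kernel performs on an 8-bit image can be written in portable C as below. This is our own sketch of the standard 3x3 Sobel magnitude (|Gx| + |Gy|, saturated to 255), not the optimized TI kernel:

```c
#include <stdlib.h>

/* Portable 3x3 Sobel reference: interior pixels get the saturated
   sum of the horizontal and vertical gradient magnitudes. */
void sobel_3x3_ref(const unsigned char *in, unsigned char *out,
                   int cols, int rows)
{
    int x, y;
    for (y = 1; y < rows - 1; y++) {
        for (x = 1; x < cols - 1; x++) {
            const unsigned char *p = in + y * cols + x;
            int gx = -p[-cols-1] + p[-cols+1]
                     - 2*p[-1]   + 2*p[1]
                     - p[cols-1] + p[cols+1];
            int gy = -p[-cols-1] - 2*p[-cols] - p[-cols+1]
                     + p[cols-1] + 2*p[cols]  + p[cols+1];
            int mag = abs(gx) + abs(gy);
            out[y * cols + x] = (mag > 255) ? 255 : (unsigned char)mag;
        }
    }
}
```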
Note that the frame you are now pointing to is an interleaved YCbCr 4:2:2
stream, so every other byte is a Y, every fourth byte is a Cb, and every
other fourth byte is a Cr (i.e., Cb Y Cr Y Cb Y Cr Y Cb Y ...), which also
means the size of the buffer will be your frame width x height x 2 bytes per
pixel.
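Given that layout, the luma plane can be pulled out of the interleaved stream as in this sketch (the function name is ours):

```c
/* Extract the Y plane from an interleaved Cb Y0 Cr Y1 (4:2:2) buffer:
   luma lives at every odd byte. The source is width*height*2 bytes. */
void extract_luma(const unsigned char *uyvy, unsigned char *y_plane,
                  int width, int height)
{
    int i, n = width * height;
    for (i = 0; i < n; i++)
        y_plane[i] = uyvy[2 * i + 1];   /* Y0, Y1, Y0, Y1, ... */
}
```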
5.3 BACKGROUND SUBTRACTION ON DM6437

unsigned char arr[829440];          /* 720 x 576 x 2 bytes: one PAL 4:2:2 frame */

void copyframe(void *currentFrame)  /* store current frame as the background model */
{
    int xx;
    for (xx = 0; xx < 829440; xx++)
        arr[xx] = *(((unsigned char*)currentFrame) + xx);
}
5.4 CONCLUSION
We have used the background subtraction method, applied blob analysis
with edge detection algorithms, and arrived at an estimate of the count of
people in a video. This technique gives accurate results in an environment
where people aren't carrying objects and aren't too close to each other.
So, in a nutshell, we can successfully count people in a restricted
environment, but to make this system more generic we would have to resort to
other techniques.
CHAPTER 6
REFERENCES
[1] Real-Time People Counting System Using Video Camera, by Roy-Relend Berg.
Presented 2007-05-30.
[2] Real-Time People Counting System Using Video Camera, by Damien Lefloch.
Master's thesis, Master of Computer Science, Image and Artificial
Intelligence, 2007. Supervisors: Jon Y. Hardeberg, Faouzi Alaya Cheikh,
Pierre Gouton.
[5] Fast People Counting Using Head Detection From Skeleton Graph, by Djamel
Merad, Kheir-Eddine Aziz, and Nicolas Thome. 2010 Seventh IEEE International
Conference on Advanced Video and Signal Based Surveillance.
[6] Digital Image Processing, by Rafael C. Gonzalez and Richard E. Woods.
[7] Handbook of Image and Video Processing, edited by Al Bovik.
[8] TMS320C64x+ DSP Image/Video Processing Library (v2.0.1) Programmer's
Guide, Texas Instruments.
[9] TMS320DM6437 DVDP Getting Started Guide.
[10] TMS320DM6437 Datasheet.
[11] Spectrum Digital Technical Reference Manual, DM6437.
APPENDIX
A] Video File Formats
The three most popular color models are RGB (used in computer
graphics); YIQ, YUV, and YCbCr (used in video systems); and CMYK (used in
color printing). The YUV color space is used by the PAL, NTSC and SECAM
(Séquentiel Couleur Avec Mémoire, or Sequential Color with Memory) composite
color video standards. The black-and-white system used only luma (Y)
information; color information (U and V) was added in such a way that a
black-and-white receiver would still display a normal black-and-white
picture. For digital RGB values with a range of 0-255, Y has a range of
0-255, U a range of 0 to +/-112, and V a range of 0 to +/-157. YCbCr and its
close relatives YUV, YIQ, and YPbPr are designed to encode RGB values
efficiently so they consume less space while retaining the full perceptual
value. YCbCr is a scaled and offset version of the YUV color space. Y is
defined to have a nominal 8-bit range of 16-235; Cb and Cr are defined to
have a nominal range of 16-240. There are several YCbCr sampling formats,
such as 4:4:4, 4:2:2, 4:1:1, and 4:2:0.
Now, if we filter a 4:4:4 YCbCr signal by subsampling the chroma by a
factor of two horizontally, we end up with 4:2:2, which implies that there
are four luma values for every two chroma values on a given video line. Each
(Y, Cb) or (Y, Cr) pair represents one pixel value. Another way to say this
is that a chroma pair coincides spatially with every other luma value. 4:2:2
YCbCr qualitatively shows little loss in image quality compared with its
4:4:4 YCbCr source, even though it represents a saving of 33% in bandwidth
over 4:4:4 YCbCr. 4:2:2 YCbCr is the foundation for the ITU-R BT.601 video
recommendation, and it is the most common format for transferring digital
video between subsystem components.
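That horizontal subsampling step can be sketched as follows, assuming a planar chroma buffer for clarity (our own simplification; real 4:2:2 streams interleave Cb Y Cr Y as noted in Chapter 5):

```c
/* Convert one 4:4:4 chroma plane to 4:2:2 by averaging each horizontal
   pair of samples (rounded); apply to Cb and Cr alike, keep all Y.
   width must be even. */
void subsample_422(const unsigned char *c444, unsigned char *c422,
                   int width, int height)
{
    int x, y;
    for (y = 0; y < height; y++)
        for (x = 0; x < width; x += 2)
            c422[y * (width / 2) + x / 2] =
                (unsigned char)((c444[y * width + x] +
                                 c444[y * width + x + 1] + 1) / 2);
}
```

Halving one of the three planes horizontally is where the 33% bandwidth saving quoted above comes from.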