
3D Res (2014) 5:5

DOI 10.1007/s13319-013-0005-0

3DR EXPRESS

Employing Light Field Cameras in Surveillance: An Analysis of Light Field Cameras in a Surveillance Scenario
Rogério Seiji Higa • Yuzo Iano • Ricardo Barroso Leite • Roger Fredy Larico Chavez • Rangel Arthur

Received: 12 July 2013 / Revised: 10 November 2013 / Accepted: 8 December 2013


© 3D Research Center, Kwangwoon University and Springer-Verlag Berlin Heidelberg 2014

Abstract Light field cameras are becoming a new trend in photography. They have characteristics, such as refocusing and depth estimation, that could overcome some problems present in surveillance videos. The main advantage is that the light field camera allows focus reconstruction, increasing the depth of field. However, these new cameras have different processing power and storage requirements than conventional cameras. In this paper, it is shown how these cameras work and how they could be employed in surveillance. Images from a standard surveillance camera and from a plenoptic camera are compared, and some issues about the design of the light field camera are discussed. Finally, we compare the performance of video compression standards applied to plenoptic image sequences. The peak signal-to-noise ratio (PSNR) results show that MPEG-4 is the best choice among the codecs tested.

Keywords Light field · Surveillance · Plenoptic · Depth of field · Depth map · Framework

R. S. Higa (✉) · Y. Iano · R. B. Leite · R. F. L. Chavez · R. Arthur
State University of Campinas, Campinas, SP, Brazil
e-mail: rhiga@decom.fee.unicamp.br
Y. Iano
e-mail: yuzo@decom.fee.unicamp.br
R. B. Leite
e-mail: rleite@decom.fee.unicamp.br
R. F. L. Chavez
e-mail: rlarico@decom.fee.unicamp.br
R. Arthur
e-mail: rangel@ft.unicamp.br

1 Introduction

The image resolution of surveillance cameras has increased over the years, but this does not guarantee an image good enough to recognize the face of a person of interest [1]. A high-resolution sensor alone is not enough for good image quality. The sensor/lens set is what determines the image quality.

One of the difficulties with normal surveillance cameras is that the depth of field is limited: if two objects at different depths are in the scene, one of them will be in focus and the other will be out of focus. Sometimes the surveillance videos seen on the news are focused on the front of the shop, the expected point of interest, but the subject of interest is at the back of the scene, where the image is blurred. The depth of field depends on the size of the aperture; it increases as the size of the aperture diminishes. However, a smaller aperture blocks more light, so the captured image has more noise. The shutter speed also has to be slower, which in video applications is limited by the frame rate.

Light field cameras are becoming a new trend in photography, with the capacity to adjust the focus post-capture as the most acclaimed feature. These
features are ideal to overcome some problems present in surveillance videos.

Light field cameras can increase the depth of field by capturing the angle information from the light rays. There is a tradeoff between the angle information and the image resolution, so the image generated by the light field camera has a lower effective resolution than the sensor resolution. There are many types of cameras that try to sample the plenoptic function. The coded aperture cameras [2, 3] use a mask on the aperture plane of a lens, which encodes the light that reaches the sensor. The convolution of the encoder mask and decoder mask results in a Dirac delta function at the origin. Another approach puts a mask near the sensor and uses the principles of modulation to gather more information in the captured image [4]. A camera design that uses a lattice-focal lens, built from pieces of lens with different focal lengths, is proposed in [5]. Each piece of lens has a different transfer function for each depth.

The camera used in our experiments was the Raytrix R5 color GigE model, which has a microlens array in front of the sensor and is based on integral photography [6, 7]. The digital light field camera was proposed by Ng [8] and consists of microlenses that are positioned at a distance f from the sensor and focused on infinity, where f is the focal length. In the design proposed by Lumsdaine and Georgiev [9], the microlenses are at 4/3 f, focusing on the image produced by the main lens. This design allows the maximum exploitation of the captured information. Since then, various companies have started research in this area; the first commercial light field camera was made by Raytrix [10], followed by Lytro [11]. Although the drawback of images captured by light field cameras is the loss of sensor resolution, image quality does not depend on resolution alone. For example, an out-of-focus high-resolution image can be worse than an in-focus low-resolution image.

2 Depth of Field

The depth of field is the range of distances over which an object appears sharp in the image. If the object is not on the focus plane, its image gets blurred. A way to diminish the blurriness is to reduce the size of the aperture stop; this blocks some of the rays and makes the circle of confusion smaller. Figure 1 shows two situations with the same circle of confusion size, which means that the sharpness of the objects is almost the same.

However, considering the positions of the objects relative to the camera, the distance between the objects is larger in the second situation. A smaller aperture allows the objects to be farther from the focal plane while maintaining the same sharpness, as shown in Fig. 1b. If the objects in Fig. 1a were placed at the same distances as in Fig. 1b, the circle of confusion would be bigger, decreasing the sharpness.

Decreasing the size of the aperture is a standard way to increase the depth of field, but this lowers the quantity of light that enters the camera. A surveillance camera has to work day and night, so the lower amount of light is a problem in night situations.

3 Plenoptic Function

The plenoptic function describes the light distribution over space. It has been described as a 7-dimensional function (5 spatial dimensions, 1 color dimension, and 1 temporal dimension), as defined in Eq. 1.

$$P = P(\theta, \phi, \lambda, V_x, V_y, V_z, t), \tag{1}$$

where $(V_x, V_y, V_z)$ is the observer position, $\theta$ and $\phi$ are the azimuth and elevation angles, $\lambda$ is the wavelength of the light and $t$ is the time. In RGB colorspace, $\lambda$ unfolds into three dimensions.

A two-plane parametrization of the plenoptic function was proposed in [13]. This parametrization has 4 spatial dimensions and enables a better understanding of the plenoptic function. Figure 2 shows how this parametrization is made. The plenoptic function analysis is made using only two dimensions, as it is hard to analyze things in four dimensions. Figure 3 shows three objects at different depths; the x-axis represents the x–y plane and the u-axis represents the u–v plane. Most of the analysis made in this context can be extrapolated to the full 4D version, since the behavior is similar in the other axes. The inclination of the light field pattern indicates the depth of the object. The angle of inclination of the red object's light field is larger than that of the green object, although they are at the same distance from the blue object. This is because the parameterization is not linear in relation to the depth.

In an ideal pinhole camera, there is only one point sampling per pixel, which corresponds to the center of the squares on the right side of Fig. 3. A real pinhole
camera has a sampling more like the squares shown in the figure. The vertical length represents the size of the pinhole, while the horizontal length represents the size of the pixels.

Fig. 1 Depth of field with two different apertures [12]. Diminishing the size of the aperture increases the depth of field. a Large aperture and b small aperture

Fig. 2 Plenoptic function parameterization using two planes

Fig. 3 Plenoptic function sampling of a pinhole camera

In a standard lens camera, the sampling is made by integrating all the incident rays into a pixel. In comparison to the pinhole camera, it samples more light from the scene, since the lens concentrates the light coming from the focal plane. If the lens is focused on the green object, the plenoptic representation of the sampling corresponds to the area inside a parallelogram, as shown in Fig. 4a. The inclination of the parallelogram represents the focal plane of the lens.

In all previous situations, the sampling occurs along the u-axis. The light field information along the x-axis
is lost. Figure 5 shows how a plenoptic camera samples the light field along both axes.

Imagine a sensor in the place of the microlens, where the size of a pixel is the size of a microlens, as shown in Fig. 6a. All rays falling on that pixel are output as a single value and the information about the angle of incidence is lost. However, if a microlens is put in the place of the sensor, the rays are separated again and a sensor with a higher resolution can be used to store the intensity of these rays, as shown in Fig. 6b.

Fig. 4 Plenoptic function sampling of standard lens camera. a Focus on green object, b focus on blue object and c focus on red object

Fig. 5 Plenoptic function sampling of a light field camera

Fig. 6 The microlens recovers the angle of incidence. a All rays focusing on the sensor are integrated, and the angle of incidence is lost. b Putting a microlens in the place of the sensor allows the recovery of the angle of incidence using a higher resolution sensor. a Sensor and b microlens

4 Light Field Camera

Section 3 has shown how a light field camera samples the plenoptic function and captures more information. This section explains how to render the image captured by a light field camera into a conventional image. It also presents how to extract depth information and render a view image. The camera used in this work is the Raytrix GigE R5 model, one of the first commercial light field cameras.

4.1 Rendering

The resolution of the image produced by the rendering algorithm described in [8] is limited by the number of microlenses, since one pixel is rendered per microlens. In the full resolution rendering described in [10], a patch is taken from each of the microimages in the plenoptic image, and then all of the patches are tiled together to form a rendered view. This results in more resolution, rendering more pixels per microlens. Figure 7 illustrates the process. The size of the patch is what determines the focal plane (Fig. 8).
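The patch-tiling idea of Sect. 4.1 can be illustrated with a minimal sketch. It is not the rendering code used in this work: it assumes, for brevity, that the microimages lie on a square grid aligned with the pixel grid, that the pitch is an integer number of pixels, and that no flipping or blending of patches is required. The function name `render_from_patches` and its parameters are hypothetical.

```python
def render_from_patches(plenoptic, pitch, patch):
    """Tile the central patch x patch block of every microimage.

    plenoptic: 2D list of pixel values.
    pitch:     microimage size in pixels (square grid, an assumption).
    patch:     side of the block taken from each microimage; changing it
               changes which depth plane appears in focus.
    """
    if not 1 <= patch <= pitch:
        raise ValueError("patch must be between 1 and pitch")
    rows = len(plenoptic) // pitch      # number of microimage rows
    cols = len(plenoptic[0]) // pitch   # number of microimage columns
    offset = (pitch - patch) // 2       # top-left corner of the central block
    out = [[0] * (cols * patch) for _ in range(rows * patch)]
    for my in range(rows):
        for mx in range(cols):
            top = my * pitch + offset
            left = mx * pitch + offset
            for dy in range(patch):
                for dx in range(patch):
                    out[my * patch + dy][mx * patch + dx] = plenoptic[top + dy][left + dx]
    return out
```

With `patch = 1` the output has one pixel per microimage, and larger patches give more pixels per microlens, which is the resolution gain described above. A hexagonal grid, as used by the camera in this work, would need the center computation of the calibration step instead of the simple `my * pitch` indexing.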

Fig. 7 Process of rendering the focused image

Fig. 8 Example of rendering. a Focus on near object and b focus on far object

4.2 Calibration

The calibration method used here is suggested by the Raytrix camera manual. A white image is captured with an aperture smaller than the ideal size, so that the white circles are well defined. The image rendering process needs to know the centers of the microimages because, if the centers are not accurately matched, the rendered image will present artifacts. Figure 9 shows these artifacts.

Fig. 9 Rendering with wrong parameters

4.2.1 Hexagonal Grid

On a square grid, the position of a microlens can be found by simple horizontal and vertical translations, but on a hexagonal grid it is not that simple. The horizontal position depends on which line of the grid it is on. Equation 2 shows a method for calculating the microimage centers based on the distance between the centers.

$$
\begin{aligned}
c_x &= o_x \,\mathrm{pitch} + 0.5\,(o_y \bmod 2)\,\mathrm{pitch},\\
c_y &= \frac{\sqrt{3}}{2}\, o_y \,\mathrm{pitch},
\end{aligned}
\tag{2}
$$

where $(c_x, c_y)$ are the center coordinates of the microimages, pitch is the distance between two microimages, and $o_x$ and $o_y$ are the horizontal and vertical indices, respectively. Notice that it is presumed that the central microlens is at coordinate (0,0) and the grid is perfectly aligned with the horizontal axis. Translation and rotation of the coordinates are applied to fit the calibration image.

4.2.2 White Image

The localization of the centers of the microimages is made by capturing a white image as shown in Fig. 10.
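As an illustration of how the hexagonal-grid centers of Eq. 2 can be generated and then fitted to a calibration image, the following sketch evaluates the equation and applies a rotation and a translation. The function names, the rotation angle and the origin are hypothetical parameters, not values from the calibration performed in this work.

```python
import math

def microimage_center(ox, oy, pitch):
    """Evaluate Eq. 2 for the microimage with grid indices (ox, oy)."""
    # Odd rows are shifted by half a pitch. Python's % returns 1 for
    # negative odd indices as well, so rows above the central one work.
    cx = ox * pitch + 0.5 * (oy % 2) * pitch
    cy = math.sqrt(3) / 2 * oy * pitch
    return cx, cy

def to_image_coords(center, angle_rad, origin):
    """Rotate a grid center by angle_rad and translate it to origin.

    origin is the pixel position of the central microimage (assumed
    to be known from the white image).
    """
    cx, cy = center
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    x = cos_a * cx - sin_a * cy + origin[0]
    y = sin_a * cx + cos_a * cy + origin[1]
    return x, y
```

A quick sanity check of the geometry is that neighbouring centers in adjacent rows, such as indices (0, 0) and (0, 1), are exactly one pitch apart, as expected for a regular hexagonal arrangement.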

Fig. 10 White image used for calibration

The MATLAB function regionprops is used to find the centers of the white circles. Figure 11 shows the results of this step. Also, from Fig. 11 it can be seen that the shape of the microimage is a pentagon. This is because the main lens had five aperture blades. The shape of the microimage is controlled by the shape of the aperture.

Fig. 11 Center found after processing

Once the center coordinates are found, the next step is to find the central microimage coordinate and compensate for any rotation in the image. Then, the pitch parameters of Eq. 2 are adjusted to match the points discovered in Fig. 11.

Using a simple white image allows the calibration to be performed anywhere. Any change in the main lens adjustments affects the calibration parameters. Surveillance cameras are normally installed in hard-to-access places, so a simple calibration method is well suited.

4.3 Depth Estimation and All-In-Focus Rendering

Fig. 12 Light field image

Fig. 13 Depth estimation

Depth estimation is usually made by finding the disparity between two images (Fig. 12). The same technique can be applied to plenoptic images, finding the disparity between two microimages [14]. The algorithm is simple when working on a rectangular grid, but a little more complex on a hexagonal grid. The method for estimating the depth described in [15] is more suitable for hexagonal grids. It analyzes the edges of the microimages: if they get smoother, then the patch is probably at the correct size. Figure 13 shows a depth estimation using this algorithm. The depth information can be used to help people detection algorithms [16].
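The disparity idea mentioned above can be illustrated with a small one-dimensional block-matching sketch. This is neither the method of [14] nor that of [15]; it is a generic sum-of-absolute-differences search, and the function name `best_shift` and the sample rows are hypothetical.

```python
def best_shift(row_a, row_b, max_shift):
    """Return the shift s in [0, max_shift] that minimizes the mean
    absolute difference between row_a[i + s] and row_b[i].

    row_a and row_b are pixel rows taken from two neighbouring
    microimages; a larger shift means a larger disparity.
    """
    best, best_cost = 0, float("inf")
    for s in range(max_shift + 1):
        overlap = len(row_a) - s
        if overlap <= 0:
            break
        cost = sum(abs(row_a[i + s] - row_b[i]) for i in range(overlap)) / overlap
        if cost < best_cost:          # strict test keeps the smallest shift on ties
            best, best_cost = s, cost
    return best

# The same small edge appears two pixels apart in the two rows.
row_a = [0, 0, 5, 9, 5, 0, 0, 0]
row_b = [5, 9, 5, 0, 0, 0, 0, 0]
print(best_shift(row_a, row_b, 3))
```

Note that for two perfectly flat rows every shift has the same cost, so the search has no basis for preferring one disparity over another; matching of this kind relies on image texture.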

The depth estimation is not perfect because it is based on the assumption that the image has textures; otherwise the estimation goes wrong. As can be seen in Fig. 14, in regions where there is no texture the depth estimation behaves unexpectedly.

The all-in-focus rendering is based on the depth estimation, so if the estimation fails, the rendering is compromised. A few artifacts near the wire can be seen in Fig. 14.

Fig. 14 All-in-focus rendering

5 Framework

The framework proposed here uses the light field camera to increase the depth of field of surveillance cameras. The Raytrix R5 color GigE light field camera is employed as a surveillance camera and the raw feed is saved into a computer for later processing. This is not the best scenario in relation to the bandwidth used, as the raw images require a high bitrate stream.

A second possible approach is to adapt existing cameras to produce plenoptic images. Olympus received a patent for an adapter that turns Four Thirds standard cameras into plenoptic ones [17]. The same kind of adapter can be made for surveillance cameras. The advantage of this approach would be a fast conversion of the existing surveillance cameras into light field cameras. The disadvantage is that the effective resolution would be lower, and a high power computer would be needed to view all the streams.

A third approach would be to build a light field surveillance camera from the start, performing the light field processing inside the camera and sending two streams: one with an all-in-focus image, and the other with a depth estimation image. Toshiba is developing a small sensor specialized for light field capture [18], which can be used for surveillance cameras. The advantages of this approach would be the better image quality using a higher resolution sensor, real time processing, and less storage space. The disadvantage is that the camera will need high processing power or specialized hardware; also, the all-in-focus image could be affected by rendering artifacts.

In our approach, the light field camera is used as a surveillance camera as in the second approach, but the plenoptic image stream is sent without processing. The advantages are that the information is not degraded by any processing and that a higher resolution sensor is used. The disadvantages are that the storage space needed will be much higher and that high processing power is needed to view the surveillance video stream in real time.

The experimental setup is shown in Fig. 15, where the Pelco Sarix 2.1 MP surveillance camera is below the Raytrix R5 color GigE plenoptic camera.

Fig. 15 Experimental setup using conventional and plenoptic cameras

5.1 Surveillance Image Resolution

In the context of surveillance, which is better: a high resolution image with no special attention to focus settings, or a low resolution but sharp image? Figure 16 shows a comparison between these two situations. The out-of-focus image was taken with a surveillance camera and the low resolution image was rendered from the image taken with the light field camera. The surveillance camera resolution is 1,920 × 1,080 px and the rendered image
is 640 × 480 px. The face of the person on the right in Fig. 16a is unrecognizable, but in the rendered image of Fig. 16c the face can be recognized even with the low resolution.

This comparison was made because light field cameras have an effective resolution lower than the sensor resolution. As stated earlier, the resolution alone does not guarantee a good quality image. A low resolution image is still useful for surveillance as long as it is in focus.

Fig. 16 Using the light field camera for surveillance in an indoor environment. Surveillance out of focus. a Surveillance, b plenoptic image and c rendered from plenoptic image

5.2 Light Field Surveillance

The result of using a light field camera for surveillance is shown in Fig. 16. Figure 16b is the original plenoptic image captured, and Fig. 16c is the image rendered from the original plenoptic image with the focus on the subject of interest. Because the indoor environment has less light available, Fig. 16c is darker than the image from the standard surveillance camera. The Raytrix light field camera uses a fixed f/8 aperture, while the surveillance camera has an automatic f/1.9 aperture. The surveillance camera has the advantage of letting more light in, so its images have less noise and are brighter.

When comparing the depth of field of these cameras in Fig. 16a, c, it must be taken into consideration that the Raytrix uses a smaller aperture, which gives a larger depth of field. The increase in depth of field is therefore not only due to the light field camera characteristics. Even with the increased depth of field, however, an out-of-focus object could still be blurred, so the advantage of refocusing after capture still holds. This is useful when the automatic focus settles on the wrong object, as in Fig. 16a, where it is focused on the poster.

5.3 Performance of Existing Codecs on Plenoptic Images

Compression is an important topic that affects how much footage the surveillance camera can store. The creation of a new standard is not easy and takes a long time, so knowing the performance of existing compression standards on plenoptic images is useful. The comparison is made not only on the raw plenoptic images, but also on the processed rendered views. The question is whether the rendered views will have stronger degradation than the raw image or not.

The plenoptic image sequences shown in Fig. 17 were processed using three different codecs (MJPEG2000, MJPEG and MPEG-4). The quality comparison is made on both plenoptic and rendered
images using the PSNR. The quality settings of the codecs were adjusted so that the compressed files would have roughly the same size for each sequence. The objective is to use the same bits per pixel for each codec. The size depends not only on the quality parameter, but also on the content of the sequence, and, because of this, exactly the same size for all compressed sequences is not possible.

Fig. 17 Plenoptic image sequences used in the compression tests. a Sequence 1, b Sequence 2 and c Sequence 3

Fig. 18 Performance of existing compression standards applied on plenoptic images. a Sequence 1, b Sequence 2 and c Sequence 3
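For reference, the two quantities used in this comparison, the PSNR and the bits per pixel of a compressed sequence, can be computed as in the following sketch. It assumes 8-bit images stored as 2D lists (peak value 255); the function names and the example numbers are hypothetical and are not taken from the experiments.

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf               # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

def bits_per_pixel(file_size_bytes, width, height, frames):
    """Average number of bits spent per pixel in a compressed sequence."""
    return 8.0 * file_size_bytes / (width * height * frames)
```

The same `psnr` function applies to both cases discussed here: comparing a decoded plenoptic frame with the original frame, or comparing a view rendered from the decoded frame with the view rendered from the original.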

Figure 18 shows that the compression standards [19] give good results, even though they do not take advantage of the plenoptic image geometry. The filled markers show the performance of the compression on the plenoptic images. The hollow markers compare the image rendered from the plenoptic image without compression with an image rendered from a plenoptic image extracted from the compressed video sequence. The MPEG-4 standard showed the best performance overall, followed by MJPEG2000 and MJPEG.

The objective of this section is to show that, if a normal surveillance camera were adapted using a microlens array, or if existing codecs were used on light field cameras, the plenoptic images obtained through the compressed camera stream would have an acceptable quality.

6 Conclusion

This work has analyzed how the light field camera can be used in a surveillance application. The optics of the plenoptic camera allows focus reconstruction and increases the depth of field.

Light field cameras can improve people detection algorithms and the depth information could be used to better segment the scene, but the depth estimation algorithms are not perfect and may fail when there are no edges in the image.

The size of the aperture controls how much light enters the camera, and on a light field camera it also controls the size of the microimage. A standard surveillance camera has a typical maximum aperture of f/1.9, which is ideal for low light situations. Light field cameras have to match the microlens and main lens apertures, otherwise the microimages would get bigger and interfere with each other. The Raytrix camera used has a fixed aperture of approximately f/8, which is about four stops slower than f/1.9 and therefore admits roughly 18 times less light than the surveillance camera. So the camera used is not suited for surveillance applications, mainly because of the small fixed aperture.

The desirable characteristics of a light field surveillance camera would include a large aperture opening (low f-number) and a main lens compatible with the surveillance camera field of view. A careful microlens design would allow more light inside the camera and make the light field camera suitable for surveillance. The Lytro camera, for example, has a fixed f/2 aperture, so such a design is possible.

Another concern in surveillance is the video storage, since the video can be used as evidence. The size of the image captured by the Raytrix camera is 2,560 × 1,920 px, which is five times more information than a HD surveillance camera. The storage has to be increased five times or the system will record five times less footage. The experiments showed that existing compression standards could be used with the light field camera, so a fast solution for storing plenoptic images would not be an issue, but dedicated compression could increase the efficiency and reduce the space needed by the plenoptic image.

The future of light field cameras is very promising, with new results appearing very frequently. The application of this technology to surveillance cameras seems a logical next step to improve security systems, but, as mentioned, the camera must be designed with this application in mind.

Acknowledgments The authors would like to thank the following institutions for making this work possible: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Programa de Formação de Recursos Humanos em TV Digital (CAPES-RHTVD), Centro de Pesquisa e Desenvolvimento em Tecnologias Digitais para Informação e Comunicação - Rede Nacional de Ensino (CTIC-RNP), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Pesquisa do Estado de São Paulo (Fapesp), Fundo de Apoio ao Ensino, Pesquisa e à Extensão - State University of Campinas (Faepex-Unicamp).

References

1. Kovesi, P. (2009). Video surveillance: Legally blind? In Digital image computing: techniques and applications, DICTA 2009, 1–3 December, 2009, Melbourne. IEEE Computer Society. doi:10.1109/DICTA.2009.41.
2. Zhou, C., & Nayar, S. K. (2009). What are good apertures for defocus deblurring? In IEEE international conference on computational photography.
3. Marcia, R. F., Harmany, Z. T., & Willett, R. M. (2009). Compressive coded aperture imaging. In Proc. SPIE 7246, Computational Imaging VII. doi:10.1117/12.803795.
4. Veeraraghavan, A., Raskar, R., Agrawal, A., Mohan, A., & Tumblin, J. (2007). Dappled photography: Mask enhanced cameras for heterodyned light fields and coded aperture refocusing. ACM Transactions on Graphics, 26(3), 1–12. doi:10.1145/1276377.1276463.
5. Levin, A., Hasinoff, S. W., Green, P., Durand, F., & Freeman, W. T. (2009). 4D frequency analysis of computational cameras for depth of field extension. ACM Transactions on Graphics, 28(3), 97:1–97:14. doi:10.1145/1531326.1531403.
6. Ives, F. (1903). Parallax stereogram and the process of making same. US Patent 725,567.
7. Lippmann, G. (1908). La Photographie Integrale. Comptes Rendus Academie des Sciences, 146(9), 446–451.
8. Ng, R. (2006). Digital light field photography. Dissertation, Stanford University.
9. Lumsdaine, A., & Georgiev, T. (2008). Full resolution lightfield rendering. Adobe Tech Report, January 2008.
10. Perwass, C., & Wietzke, L. (2012). Single lens 3D-camera with extended depth-of-field. In Proc. SPIE 8291, Human vision and electronic imaging XVII, 829108. doi:10.1117/12.909882.
11. Lytro. (2013). Accessed November 11, 2013, from https://www.lytro.com/.
12. Higa, R. (2013). What is depth of field? Accessed November 11, 2013, from http://www.photographycompendium.com/what-is-depth-of-field/.
13. Levoy, M., & Hanrahan, P. (1996). Light field rendering. In Proceedings of the 23rd annual conference on computer graphics and interactive techniques, SIGGRAPH'96 (pp. 31–42). New York: ACM. doi:10.1145/237170.237199.
14. Georgiev, T., & Lumsdaine, A. (2010). Reducing plenoptic camera artifacts. Computer Graphics Forum, 29(6), 1955–1968. doi:10.1111/j.1467-8659.2010.01662.x.
15. Wanner, S., Fehr, J., & Jahne, B. (2011). Generating EPI representations of 4D light fields with a single lens focused plenoptic camera. Advances in Visual Computing, Lecture Notes in Computer Science, 6938, 90–101.
16. Salas, J. & Tomasi, C. (2011). People detection using color and depth images. In Proceedings of the third Mexican conference on pattern recognition, MCPR'11 (pp. 127–135). Berlin: Springer. http://dl.acm.org/citation.cfm?id=2026143.2026160.
17. (2013). Olympus patent: Light field adapter for micro four-thirds cameras. Accessed June, 2013, from http://lightfield-forum.com/2013/06/olympus-patent-light-field-adapter-for-micro-four-thirds-cameras/.
18. Alabaster, J. (2013). Toshiba demos 'Lytro chip,' converts phones to light-field cameras. Accessed June 2013, from http://www.techhive.com/article/2029395/toshiba-demos-lytro-chip-converts-phones-to-lightfield-cameras.html.
19. Higa, R. S., Chavez, R. F. L., Leite, R. B., Arthur, R., & Iano, Y. (2013). Plenoptic image compression comparison between JPEG, JPEG2000 and SPITH. Cyber Journals: JSAT, 3(6).
