
VIDEO ENHANCEMENT BY TEMPORAL INTEGRATION BASED ON GLOBAL SEGMENTATION

REPRESENTATION OF INTERFRAME SPATIAL IMAGE TRANSFORMATION


Y. Nakazawa, T. Komatsu, and T. Saito

Department of Electrical Engineering, Kanagawa University


3-27-1 Rokkakubashi, Kanagawa-ku, Yokohama, 221, JAPAN
E-mail: kurikuri@cc.kanagawa-u.ac.jp

ABSTRACT
We present image processing algorithms for the generic temporal-integration video-enhancement approach based on the global segmentation representation of motion, and demonstrate their usefulness by experimental simulations. As a specific case, we take up the interlaced-to-progressive scan conversion problem, form the interlaced-to-progressive transform along the line of the temporal integration, and then experimentally evaluate it. The experimental simulations demonstrate that the temporal-integration approach is very promising as a basic means of video enhancement.

1. INTRODUCTION
With the coming drastic changes in the information and communication environment surrounding us, the importance of the role of multimedia information, especially visual information, will increase very rapidly, and a wide variety of applications in the field of video sequence processing will involve editing, enhancing, handling, retrieving and recognizing visual information. In this paper, among the various applications of advanced video sequence processing, we take up the spatial-resolution enhancement of a video sequence.

The motivation of our study on the spatial-resolution enhancement of a video sequence is as follows. To increase the spatial resolution of a high-resolution imaging device such as a CCD imager, reducing the pixel size is the most straightforward way, but this approach renders the imaging device much more sensitive to shot noise. To keep shot noise invisible on a monitor, the pixel-size reduction must be limited, and current CCD technology has almost reached this limit. Hence, a new approach is required to increase spatial resolution beyond the resolution bound imposed physically by shot noise.

One promising approach towards improving spatial resolution is to produce an improved-resolution moving image sequence by integrating multiple consecutive frames of a moving image sequence. Here we take up this approach, and we refer to it as the temporal-integration video-enhancement approach. Its principle is that we achieve an increase in the sampling rate by integrating samples of the imaged object gathered from a given input moving image sequence in which the object appears moving. Here we apply this principle to the conversion problem in which progressively-scanned moving images are reproduced from standard interlacedly-scanned TV images; this scan conversion is often referred to as the interlaced-to-progressive transform.

In this paper, we deal with image processing for the generic temporal-integration video-enhancement approach, and present its computational algorithms. Moreover, we take up the interlaced-to-progressive scan conversion problem as a specific case of the video enhancement problem and form the interlaced-to-progressive transform along the line of the temporal integration.

0-7803-3258-X/96/$5.00 © 1996 IEEE

2. IMAGE PROCESSING FOR GENERIC TEMPORAL-INTEGRATION VIDEO-ENHANCEMENT APPROACH

Image processing for the generic temporal-integration video-enhancement approach includes three major concepts: the global segmentation representation of interframe or interfield spatial image transformation, the sub-pixel registration, and the high-resolution reconstruction. Interframe or interfield spatial image transformation usually represents image motion, and hence for simplicity we often use the term image motion instead of the term interframe or interfield spatial image transformation where appropriate. The global segmentation representation of motion divides a moving image sequence into multiple global image regions, each of which undergoes separate coherent motion represented well with a parametric model of motion such as an affine or a perspective transformation model. The sub-pixel registration establishes precise sub-pixel interframe correspondence between pixels that appear in two image frames arbitrarily chosen out of an observed moving image sequence. The high-resolution reconstruction reconstructs an improved-resolution and/or improved-SNR moving image sequence with uniformly-spaced samples by integrating the non-uniformly accumulated samples composed of samples showing the sub-pixel interframe correspondence.
There is probably no unsolved serious problem with the high-resolution reconstruction, and a number of algorithms are available for it [1], [2]. We can construct an interpolation algorithm which takes several degradation factors into account by extending the image reconstruction algorithm based on projections onto convex sets (POCS). At present, we believe that the POCS-based iterative interpolation method is fairly flexible and best suited to the temporal-integration video-enhancement approach.
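The flavour of such a POCS-based iterative interpolation can be sketched in one dimension. The sketch below is a toy simplification, not the paper's algorithm: it assumes each low-resolution observation is the high-resolution signal circularly shifted by a known integer displacement and box-averaged by a known factor, and it repeatedly projects the estimate onto each data-consistency set in turn.

```python
import numpy as np

def pocs_reconstruct(lowres_obs, shifts, factor, n_iter=500):
    """Toy 1-D POCS-style iterative interpolation (illustrative sketch).

    Observation k is assumed to be the high-resolution signal
    circularly shifted by shifts[k] samples and decimated by `factor`
    with box averaging.  One POCS sweep projects the current estimate
    onto each data-consistency set in turn: the residual of every
    low-res sample is added back to the high-res samples that
    produced it (the orthogonal projection onto that constraint set).
    """
    n_hi = len(lowres_obs[0]) * factor
    x = np.zeros(n_hi)
    for _ in range(n_iter):
        for obs, s in zip(lowres_obs, shifts):
            aligned = np.roll(x, -s)                          # align estimate to this observation
            model = aligned.reshape(-1, factor).mean(axis=1)  # box-average decimation
            aligned += np.repeat(obs - model, factor)         # project onto {x : model(x) = obs}
            x = np.roll(aligned, s)
    return x
```

Because the constraint sets are affine, the alternating projections converge to an estimate consistent with every observation; additional degradation factors (blur, noise bounds) would simply add further convex sets.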
On the other hand, the global segmentation representation of motion and the sub-pixel registration are interdependent and difficult to solve completely. In addition, in some cases the requirements on the accuracy of motion estimation are quite different between the two. The global segmentation representation of motion only requires us to estimate global motion roughly, say with half- or quarter-pixel accuracy at most, but the sub-pixel registration demands that we estimate local motion much more precisely, namely with fractional-pixel accuracy, preferably one-fifth-pixel accuracy or better. To cope with this predicament, we adopt a two-stage approach. In the first stage, we recover a rough global segmentation representation of motion from a given input moving image sequence. In the second stage, we employ another, more accurate motion analysis method.
Here, we briefly present a typical accurate motion analysis method, referred to as the quadrilateral-patch-based method [5], for the second stage. In this method, we first define a region of interest (ROI) within a segmented image region given by the first stage for a certain specified image frame, and cover the ROI with quadrilateral patches. Then we spatially transform the quadrilateral patches according to the estimated affine warping model to track them from frame to frame, and thus we put the quadrilateral patches at their proper initial positions in each image frame. Furthermore, we describe local image motion with sub-pixel accuracy inside the ROI as deformation of the quadrilateral patches, and we deform the quadrilateral patches by shifting their grid points by sub-pixel displacement vectors. Thus we describe an image warp between the two image frames as deformation of the quadrilateral patches, and then we perform the sub-pixel registration by warping observed image frames to a certain specified image frame with the resultant warping functions recovered piecewise within each quadrilateral patch. As the piecewise-recovered warping function, we use the perspective transformation, which describes the mapping of a two-dimensional plane onto the image plane of an arbitrarily moved camera and which facilitates a planar quadrilateral-to-quadrilateral mapping.
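The planar quadrilateral-to-quadrilateral mapping can be illustrated by solving for the eight parameters of a perspective transform from four corner correspondences. This is a standard construction (not code from the paper); each point pair contributes two linear equations, and the ninth parameter is fixed to 1.

```python
import numpy as np

def quad_to_quad(src, dst):
    """Solve the eight parameters of the planar perspective transform
    (homography) mapping the four corners of quadrilateral `src`
    onto those of `dst`.  Each corner pair gives two linear equations
    in h11..h32, with h33 fixed to 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp(H, p):
    """Apply the perspective transform to a 2-D point (homogeneous divide)."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]
```

In the patch-based method, one such transform would be recovered per deformed patch and used to warp the observed frame onto the reference frame.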
In a specific case that does not require us to estimate local interframe or interfield spatial image transformation extremely precisely, we need not take the two-stage approach. The interlaced-to-progressive scan conversion problem exemplifies such a case: the requisite accuracy for the estimation of local spatial image transformation will be quarter-pixel accuracy at most. Hence, in the case of the interlaced-to-progressive transform, we need not separate the sub-pixel registration from the global segmentation representation of spatial image transformation. However, in this case, we should represent spatial image transformation between successive image fields, which requires us to take the phase shift of scan lines between odd and even image fields into consideration and then to compensate for the phase shift with a standard interpolation method composed of a simple linear operation such as the bi-linear interpolation technique.
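The phase-shift compensation amounts to resampling one field's scan lines at the vertical positions of the opposite-parity field. A minimal sketch, assuming the opposite field's lines sit half a line below and using only the vertical (1-D) core of bi-linear interpolation:

```python
import numpy as np

def shift_field_half_line(field):
    """Resample a field's scan lines at positions half a line below
    its own lines -- i.e. at the positions of the opposite-parity
    field -- by linear interpolation in the vertical direction,
    replicating the last line at the bottom edge."""
    below = np.vstack([field[1:], field[-1:]])  # neighbour line one step down
    return 0.5 * (field + below)
```

After this compensation, odd and even fields can be compared on a common sampling grid when representing interfield spatial image transformation.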

3. GLOBAL SEGMENTATION REPRESENTATION OF MOTION
3.1. Direct Approach vs. Gradual Migration Approach
With regard to the global segmentation representation of motion, we can take up two different approaches: a gradual migration approach and a direct approach. The gradual migration approach first estimates a local optical flow representation of motion, and then recovers a global segmentation representation of motion by applying a clustering algorithm using global parametric models of motion. The gradual migration approach often provides erroneous image segmentation along motion boundaries, mainly owing to the generalized aperture problem in the estimation of local image motion [4], and hence it would seem that the gradual migration approach is not suitable for the temporal-integration video-enhancement approach.

Here, we adopt the direct approach. The direct approach, operating on an observed moving image sequence straightforwardly, performs both global segmentation and estimation of global parametric models of motion, which approximate the true transformations of the segmented global image regions, without recovering a local optical flow representation of motion. Recently we have presented an alternately iterative grid search algorithm for the direct approach, but the algorithm often provides undesirable noisy segmentation results [4]. To solve this problem, we newly present some additional techniques.

3.2. Basic Computational Algorithm: Specific Case of Two Image Regions
First we briefly describe the direct approach for the specific case where the image is composed of two image regions undergoing different affine transformations:

  p' = T_i(p):  x' = a_i x + b_i y + c_i,  y' = d_i x + e_i y + f_i,  i = 1, 2,   (1)

where the 2-D vectors p = (x, y), p' = (x', y') denote the image coordinates of the pixel P in the present image frame and those of the warped pixel P' in the next image frame, respectively. The direct approach can be easily extended to a more general case. In this specific case, we perform global segmentation of the image into two image regions and estimation of their corresponding two different affine warping models simultaneously, and we formulate the problem as the minimization of the cost function defined by

  E(T_1, T_2) = Σ_i min{ C_i^(f)(T_1), C_i^(b)(T_1), C_i^(f)(T_2), C_i^(b)(T_2) },   (2)

  C_i^(f)(T): sum of squared forward prediction errors,
  C_i^(b)(T): sum of squared backward prediction errors,
  I(p_i, F_j): intensity value of the pixel P_i in the j-th image frame F_j,

and then we perform the minimization of the cost with respect to the two affine warping models T_1, T_2. Here we use (2K+1) consecutive image frames around the present image frame F_0, and for each pixel P_i in the present image frame F_0 we produce four different sums of squared interframe prediction errors { C_i^(f)(T_1), C_i^(b)(T_1), C_i^(f)(T_2), C_i^(b)(T_2) }, which are computed with the two different affine warping models T_1, T_2 and their inverse affine warping models T_1^(-1), T_2^(-1). For each pixel P_i, we select the minimum value out of the four squared prediction error sums, and this minimization gives us an initial segmentation of each pixel P_i into two image regions.
With regard to this minimization, we should notice the
following two points:
(1) the use of more than three consecutive image frames might
stabilize this initial segmentation, and
(2) the use of the backward interframe predictions together with the
forward interframe predictions might allow us to make interframe
correspondence even in the image region uncovered by the other
image regions corresponding to the foreground moving objects.
Furthermore we sum up the pixelwise-minimized squared
interframe prediction errors over the entire image frame, and thus
we obtain the cost function defined over the entire image frame.
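The construction of the four error sums, their pixelwise minimum, and the resulting cost and initial segmentation can be sketched as follows. This is an illustrative toy, not the paper's implementation; `predict(f, T, k)` is an assumed helper that warps the k-th neighbouring frame onto the present frame under model T (negative k meaning backward prediction via the inverse model).

```python
import numpy as np

def cost_and_labels(frame0, frames_fwd, frames_bwd, models, predict):
    """For every pixel of frame0, form the four sums of squared
    interframe prediction errors under models T1, T2 (forward and
    backward), take the pixelwise minimum -- which also yields an
    initial two-region segmentation -- and sum the minima over the
    frame to obtain the cost of Eq. 2."""
    sums = []
    for T in models:                                  # T1, then T2
        for frames, sign in ((frames_fwd, 1), (frames_bwd, -1)):
            e = np.zeros(frame0.shape)
            for k, f in enumerate(frames, start=1):
                e += (predict(f, T, sign * k) - frame0) ** 2
            sums.append(e)                            # C^(f)(T), then C^(b)(T)
    stacked = np.stack(sums)                          # order: f(T1), b(T1), f(T2), b(T2)
    labels = np.argmin(stacked, axis=0) // 2          # 0 -> region of T1, 1 -> region of T2
    return stacked.min(axis=0).sum(), labels
```

The backward sums matter exactly as noted above: a pixel uncovered by a foreground object may find a valid correspondence only in past frames, so its minimum is taken over both prediction directions.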
We optimize the two different affine warping models T_1, T_2 so that the cost function of Eq. 2 may be minimized by alternately iterating two different refinement processes. The two refinement processes are organized as follows: first we divide the six affine parameters of each affine warping model T_i into two sub-sets, namely, the sub-set composed of the two affine parameters { c_i, f_i } corresponding to the translation and the sub-set composed of the four affine parameters { a_i, b_i, d_i, e_i } corresponding to the other transformations such as rotation, dilation and shearing, and then we refine the two sub-sets of the affine parameters alternately in an iterative fashion. To update each sub-set of the affine parameters so that the cost function of Eq. 2 may be decreased monotonously, we employ the standard steepest descent algorithm.
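The alternation between the two parameter sub-sets can be sketched generically. The sketch assumes a caller-supplied gradient of the cost with respect to all six affine parameters; it is a stand-in for the paper's optimizer, showing only the block-alternating steepest-descent structure.

```python
import numpy as np

def alternate_refine(grad, theta0, blocks, step=0.1, n_outer=60, n_inner=5):
    """Alternately refine sub-sets of the six affine parameters by
    steepest descent: within each outer iteration, one block (e.g. the
    translation pair {c, f}, then the linear part {a, b, d, e}) is
    updated while the other is held fixed, so that for a small enough
    step the cost decreases monotonously."""
    theta = np.array(theta0, float)
    for _ in range(n_outer):
        for idx in blocks:
            for _ in range(n_inner):
                theta[idx] -= step * grad(theta)[idx]  # move this sub-set only
    return theta
```

Splitting the translation from the linear part in this way keeps each inner descent well conditioned, since the two sub-sets affect the warp at very different scales.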

3.3. Fuzzy Pixel and Fuzzy Image Region
The basic algorithm for the direct approach does not take the spatial coherence property into consideration, and hence it cannot handle image regions with few salient features, such as flat image regions. For such image regions, we had better defer determining the segmentation, or rather, not perform any segmentation at all. To cope with this problem, we introduce the concept of a fuzzy pixel and that of a fuzzy image region. We extract fuzzy pixels from the present image frame F_0 while selecting the minimum value out of the foregoing four different squared interframe prediction errors in Eq. 2, and we define a fuzzy pixel as a pixel whose pixelwise-minimized squared interframe prediction error gives salient priority to neither of the two different affine warping models. Concretely, we judge whether the pixel P_i in the present image frame F_0 is a fuzzy pixel or not according to the rule:

  if | min{ C_i^(f)(T_1), C_i^(b)(T_1) } − min{ C_i^(f)(T_2), C_i^(b)(T_2) } | < T_f, then the pixel P_i is a fuzzy pixel,

where the value of the threshold T_f is non-negative; in the iterative minimization of the cost function of Eq. 2, we initially give a small value to T_f and increase the value monotonously with iteration to make it converge to a certain prescribed positive value. Thus the present image frame is segmented into three image regions: two global image regions undergoing the different affine warping models T_1, T_2 and a fuzzy image region.

3.4. Temporal Projection and Temporal Smoothing
The affine warping models and the segmentation of successive input image frames will be similar, because a rigid object's shape and motion change slowly from frame to frame. To preserve temporal coherency and temporal continuity, we introduce a technique referred to as the temporal projection, where we use the current resultant global segmentation representation of motion as an initial setup for the segmentation of the next input image frame. On the other hand, the resultant global segmentation representation of motion given by the basic algorithm often looks very noisy, and so, to stabilize the global segmentation representation of motion, we moreover introduce a technique referred to as the temporal smoothing, where we smooth out and integrate the segmented image regions temporally over consecutive image frames with the standard temporal median filter while performing registration on the segmented image regions according to the estimated affine warping models.

3.5. Hierarchical Estimation
The basic algorithm needs a vast amount of computation. To reduce its computational complexity, we form the foregoing alternately iterative refinement algorithm for minimizing the cost function of Eq. 2 according to the hierarchical estimation framework, which is composed of two basic components: the pyramid construction and the coarse-to-fine refinement. We employ the Gaussian pyramid as the pyramid construction, while as regards the coarse-to-fine refinement we transmit the values of the estimated affine parameters from one level to the next level, where the transmitted values are then used as an initial estimate. As for the sub-set of the four affine parameters { a, b, d, e }, at the top Gaussian level their estimates almost converge within a few iterations, and at the succeeding levels their estimates are updated by only a small quantity.

3.6. Extension to a More General Case of Three or More Image Regions
The above describes the direct approach only for the specific case where we segment the image into two global image regions. We can easily extend the direct approach so that it is applicable to a more general case. For that purpose, we can adopt three different strategies: a parallel approach, a sequential approach and a hierarchical approach. We consider that the hierarchical approach is best suited to the temporal-integration video-enhancement approach, and hence we adopt the hierarchical approach, which splits an existing segmented image region where a certain homogeneity test is not satisfied into two distinct image regions. As for the homogeneity test, we test whether the estimated affine warping model approximates the true image motion in each segmented image region sufficiently well or not, that is to say, whether the minimized value of the cost function of Eq. 2 falls below a prescribed threshold value or not.
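The hierarchical splitting strategy can be sketched as a recursion driven by the homogeneity test. In the sketch below, `fit`, `residual`, and the median-coordinate split are hypothetical stand-ins for the paper's two-region direct algorithm; only the test-and-split control flow is the point.

```python
import numpy as np

def hierarchical_split(pixels, fit, residual, threshold):
    """Hierarchical-strategy sketch: fit one warping model to a region;
    if the minimized residual fails the homogeneity test (stays above
    `threshold`), split the region in two and recurse on each part."""
    model = fit(pixels)
    if residual(pixels, model) <= threshold or len(pixels) < 2:
        return [(pixels, model)]            # homogeneous: keep as one region
    xs = pixels[:, 0]
    left = pixels[xs <= np.median(xs)]      # placeholder split at the median x
    right = pixels[xs > np.median(xs)]
    if len(left) == 0 or len(right) == 0:   # cannot split further
        return [(pixels, model)]
    return (hierarchical_split(left, fit, residual, threshold)
            + hierarchical_split(right, fit, residual, threshold))
```

In the full algorithm the split itself would be produced by the two-region segmentation of Section 3.2 rather than by a fixed coordinate cut.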

4. EXPERIMENTAL SIMULATIONS
As a specific case, we take up the interlaced-to-progressive scan conversion problem, where for the present image field the global segmentation representation of interfield spatial image transformation is recovered from the successive image fields before and after the present image field, and then a progressively-scanned image with double the scan lines is reconstructed by integrating the non-uniformly accumulated samples composed of samples showing the sub-pixel interfield correspondence. Fig. 1(a) shows the original image field of the test image sequence, where the magazine swings like a pendulum whereas the background appears to shift horizontally. Fig. 1(b) shows the resultant global segmentation representation of interfield spatial image transformation. Fig. 1(c) shows part of the original interlacedly-scanned image field, vertically interpolated and magnified by two times with the standard bi-linear interpolation technique. Fig. 1(d) shows part of the reconstructed progressively-scanned image. The interlaced-to-progressive transform formed along the line of the temporal integration works very well, and produces high-quality progressively-scanned images.

Figure 1: Application to the interlaced-to-progressive scan conversion problem. (a) Original image field. (b) Global segmentation representation of interfield spatial image transformation. (c) Part of the original interlacedly-scanned image field. (d) Part of the reconstructed progressively-scanned image.

5. CONCLUSIONS
We present computational algorithms of image processing for the generic temporal-integration video-enhancement approach, and demonstrate their usefulness by experimental simulations. Here we take up the interlaced-to-progressive scan conversion problem, and form the interlaced-to-progressive transform along the line of the temporal integration. The experimental simulations demonstrate that the temporal-integration approach is very promising as a basic means of video enhancement.

REFERENCES
[1] T. Komatsu, T. Igarashi, K. Aizawa, and T. Saito, "Very High Resolution Imaging Scheme with Multiple Different-Aperture Cameras," Signal Process.: Image Commun., 5, pp. 511-526, 1993.
[2] A. J. Patti, M. I. Sezan, and A. M. Tekalp, "High Resolution Image Reconstruction from a Low-Resolution Image Sequence in the Presence of Time-Varying Motion Blur," Proc. IEEE 1994 Int. Conf. Image Process., pp. 343-347, 1994.
[3] J. Y. A. Wang and E. H. Adelson, "Layered Representation for Motion Analysis," Proc. IEEE 1993 Conf. Comput. Vision Patt. Recog., pp. 361-366, 1993.
[4] T. Saito, T. Komatsu, and Y. Akimoto, "Two Approaches toward Global Motion Segmentation for Mid-Level Moving Image Representation," Proc. Second Asian Conf. Comput. Vision, pp. II:311-II:315, 1995.
[5] Y. Nakazawa, T. Saito, T. Komatsu, T. Sekimori, and K. Aizawa, "Two Approaches for Image-Processing Based High Resolution Image Acquisition," Proc. IEEE 1994 Int. Conf. Image Process., pp. 1147-1151, 1994.
